
ARISTOTLE UNIVERSITY OF THESSALONIKI

SCHOOL OF ENGINEERING

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

DOCTORAL DISSERTATION

Development of New Model-Based Adaptive Predictive Control Algorithms and

Their Implementation on Real-Time Embedded Systems

Vincent Andrew Akpan

B.Sc. (Physics), M. Tech. (Instrumentation)

Supervisor: Professor George Hassapis

THESSALONIKI, GREECE, 2011.

Development of New Model-Based Adaptive Predictive Control Algorithms and

Their Implementation on Real-Time Embedded Systems

Doctoral Dissertation

Vincent Andrew Akpan

Examination Committee:

George Hassapis, Professor, Department of Electrical and Computer Engineering, School of Engineering,

Aristotle University of Thessaloniki, T.K. 54124 Thessaloniki, Greece.

Alkiviadis Hatzopoulos, Professor, Department of Electrical and Computer Engineering, School of Engineering,

Aristotle University of Thessaloniki, T.K. 54124 Thessaloniki, Greece.

Loukas Petrou, Associate Professor, Department of Electrical and Computer Engineering, School of Engineering,

Aristotle University of Thessaloniki, T.K. 54124 Thessaloniki, Greece.

Vasilios Petridis, Professor, Department of Electrical and Computer Engineering, School of Engineering,

Aristotle University of Thessaloniki, T.K. 54124 Thessaloniki, Greece.

Zoe Doulgeri, Professor, Department of Electrical and Computer Engineering, School of Engineering, Aristotle

University of Thessaloniki, T.K. 54124 Thessaloniki, Greece.

John Theocharis, Professor, Department of Electrical and Computer Engineering, School of Engineering,

Aristotle University of Thessaloniki, T.K. 54124 Thessaloniki, Greece.

Olga Kosmidou, Associate Professor, Department of Electrical and Computer Engineering, School of

Engineering, Democritus University of Thrace, T.K. 67100 Xanthi, Greece.


ABSTRACT

This dissertation deals with the development of two new neural network-based model identification algorithms and two new model predictive control (MPC) algorithms, which are combined to form model-based adaptive control strategies. In addition, two new computing platforms for the implementation of these algorithms and their corresponding strategies are proposed. The overall strategies consist of an online model identification part and a model-based predictive control part. The proposed model identification algorithms are, in essence, two new algorithms for training a dynamic feedforward neural network (DFNN) that serves as the model of a nonlinear dynamic system. The proposed training algorithms are the adaptive recursive least squares (ARLS) algorithm and the modified Levenberg-Marquardt algorithm (MLMA). The proposed algorithms for the predictive control part are the adaptive generalized predictive control (AGPC) and the nonlinear adaptive model predictive control (NAMPC). The two real-time platforms for the implementation of the combined operation of the identification and predictive control algorithms, with the purpose of forming an adaptive control strategy, are a service-oriented architecture (SOA) cluster network based on the device profile for web services (DPWS) and a Xilinx Virtex-5 FPGA (field programmable gate array) development board.

The proposed control strategies have been applied to three applications, namely: the fluidized bed furnace reactor (FBFR) of the steam deactivation unit (SDU) used for preparing catalyst for evaluation in a fluid catalytic cracking (FCC) pilot plant; an activated sludge wastewater treatment plant (AS-WWTP) conforming to the European wastewater treatment standards; and the auto-pilot control unit of a nonlinear F-16 fighter aircraft. The neural network models for these three applications were validated using one-step, five-step and ten-step ahead prediction simulations as well as with the Akaike's final prediction error (AFPE) estimate algorithm. The performances of the proposed ARLS and MLMA algorithms were compared with those of the backpropagation with momentum (BPM) and incremental backpropagation (INCBP) algorithms. Furthermore, the performance of PID control of the FBFR model identified by means of the ARLS and MLMA network training algorithms was compared with that of PID control of the first principles model of the same process.

The AGPC and NAMPC control of the considered applications, with model identification performed by the ARLS and MLMA algorithms, was implemented on a conventional multicore personal computer (PC) and on an SOA cluster of multicore PCs. Hardware-in-the-loop simulations have been performed by linking the PC or SOA implementations with MATLAB simulations of the processes. An AGPC implementation with neural networks trained using the MLMA algorithm has been made on a Xilinx Virtex-5 FPGA.

The hardware-in-the-loop simulations have shown that the proposed algorithms and their SOA or FPGA implementations can achieve shorter execution times than other algorithms of similar performance. They are therefore better suited than these algorithms to the control of processes that require short sampling times for stable operation.
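
As a brief illustration of the k-step ahead prediction validation and the AFPE estimate cited above, the following minimal MATLAB-style sketch shows one common formulation; the function and variable names (nn_predict, net, na, nb) are hypothetical placeholders, not taken from this work, and the precise formulations used here are developed in Chapter 3.

    % Sketch of k-step ahead prediction validation for a one-step NNARX
    % predictor, followed by the standard (unregularized) form of Akaike's
    % final prediction error. Hypothetical names: y, u are the measured
    % output/input sequences; na, nb are the regressor orders;
    % nn_predict(net, phi) returns the network's one-step prediction.
    k    = 5;                          % prediction horizon (5-step ahead)
    N    = length(y);
    n0   = max(na, nb) + 1;            % first sample with a full regressor
    yhat = nan(N, 1);
    for t = n0 : N - k
        ypast = y(1:t);                % measured history up to time t
        for j = 1:k                    % recurse on the model's own outputs
            phi   = [ypast(end:-1:end-na+1); u(t+j-1:-1:t+j-nb)];
            ypast = [ypast; nn_predict(net, phi)];
        end
        yhat(t+k) = ypast(end);        % k-step ahead prediction of y(t+k)
    end
    e    = y(n0+k:N) - yhat(n0+k:N);   % k-step ahead prediction errors
    V    = mean(e.^2);                 % mean squared prediction error
    d    = numel(net.weights);         % number of adjustable weights (assumed field)
    AFPE = V * (N + d) / (N - d);      % Akaike's final prediction error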


ACKNOWLEDGEMENT

My sincere appreciation and gratitude go to my supervisor, Professor George Hassapis, who conceived and supervised the work contained in this dissertation. I also thank him for his technical and financial support, encouragement and fatherly role throughout the course of this work. I will always remain grateful to him for his advice, suggestions, insightful comments, patience and untiring efforts in reading through my manuscripts and making the necessary corrections, from conception through algorithm development, problem formulation, implementation, and the several simulations and analyses that have resulted in this dissertation.

I also thank Professor Alkiviadis Hatzopoulos and Associate Professor Loukas Petrou for their co-supervisory roles in this work. My sincere thanks go to Associate Professor Loukas Petrou for his technical support and comments, as well as the effort and time he devoted to this work from inception to completion.

I specially acknowledge and thank the Greek State Scholarships' Foundation (I.K.Y.), which provided the scholarship as well as the major funding for this research. I also thank the Federal Government of Nigeria for its financial support of the Bilateral Educational Agreement with I.K.Y., and the Federal University of Technology, Akure, Nigeria, for the financial support that made this scholarship a reality, leading to the successful completion of my doctoral degree programme. My acknowledgement also goes to the Ambassador of Nigeria to Greece, His Excellency (Dr.) Etim U. Uyie, for his love, care and financial assistance.

My special thanks go to the staff of the School of Electrical and Computer Engineering, AUTH, Greece. I gratefully acknowledge Dr. Andreas Simeonidis for his comments and encouragement, and Mr. George Voukalis for his technical assistance. I also wish to thank my colleagues at the Laboratory of Computer Systems Architecture: Maria Koukourli, Ioakeim Samaras, Babis Serenis, Manos Tsardoulias and Nikos Sismanis, for their technical support, comments and contributions towards the successful completion of this project.

I am highly indebted to my mother, Mrs. Cecilia Andrew Akpan; my mother-in-law, Mrs. Titilayo Nathaniel Oyewo; and my siblings Justine, Sylvester, Emmanuel and Justina for their sacrifices and prayers.

Words are not enough to thank my wife, Mrs. Rachael Oyenike Vincent–Akpan, for all her sacrifices, financial support, prayers and encouragement throughout the period of this study. Just know that I love you.

Finally, I am most grateful to God Almighty for His infinite mercy, divine grace and sound health.

Vincent Andrew Akpan

July, 2011.


TABLE OF CONTENTS

CONTENTS PAGES

Abstract i

Acknowledgement ii

Table of Contents iii

List of Figures xiii

List of Tables xxiii

List of Acronyms xxv

Chapter 1 Introduction 1

1.1 Introduction 1

1.2 Research Objectives 3

1.3 Scientific Contributions 4

1.4 Thesis Organization and Structure 6

1.5 Scientific Publications 7

Chapter 2 Background of the Research 9

2.1 Introduction 9

2.2 Model Predictive Control (MPC) 11

2.2.1 Historical Background of MPC 11

2.2.2 Overview of MPC Strategy 12

2.3 MPC Process Models 15

2.4 Neural Networks: An Overview 18

2.4.1 Neural Networks 18

2.4.2 Multilayer Perceptron (MLP) Neural Networks 19

2.4.3 Supervised and Unsupervised Learning Methods Using Neural Networks 20

2.4.3.1 Dynamic Neural Networks for Supervised Learning 21

2.4.3.1.1 Dynamic Feedforward Multilayer Perceptron Neural Networks (DFNN) 21

2.4.3.1.2 Recurrent Neural Networks (RNN) 22

2.4.3.1.2.1 The Hopfield Network 23

2.4.3.1.2.2 The Jordan Network 25

2.4.3.1.2.3 The Elman Network 26

2.4.3.1.3 Tapped Delay Neural Networks 27

2.4.3.2 Neural Networks Based on Unsupervised Learning 28

2.4.3.2.1 Generalized Regression Neural Network (GRNN) 30


2.4.3.2.2 Radial Basis Function Neural Network (RBFNN) 31

2.4.4 Basic Neural Networks Training Algorithms 34

2.4.4.1 The Backpropagation (BP) Algorithm with Momentum 34

2.4.4.2 Teacher-Forced Real-Time Recurrent Learning (RTRL) Algorithm 39

2.5 System Description and Neural Network-Based Nonlinear Model Predictors 42

2.5.1 General System Description and Mathematical Notations 42

2.5.1.1 Remarks on the Disturbance Model 44

2.5.2 The Neural Network-Based Nonlinear Model Predictors 45

2.5.2.1 Neural Network-Based Auto-Regressive with Exogenous Inputs (NNARX)

Model Predictor 46

2.5.2.2 Neural Network-Based Auto-Regressive with Moving Average and Exogenous

Inputs (NNARMAX) Model Predictor 46

2.5.2.3 Neural Network-Based Output–Error (NNOE) Model Predictor 47

2.6 Implementation of MPC Algorithms 48

2.6.1 Computer Implementation of MPC Algorithms 48

2.6.2 FPGA Implementation of MPC Algorithms 49

2.6.3 Remarks on the Reviewed MPC Implementation Strategies 50

2.7 Switched Ethernet Architecture and Service Oriented Architecture (SOA) Technologies 51

2.7.1 The Architecture of the Switched Ethernet 51

2.7.2 SOA Technologies 52

2.7.2.1 The Jini Technology 52

2.7.2.2 The UPnP Technology 52

2.7.2.3 The DPWS Technology 52

2.8 Programmable Logic Devices and Field Programmable Gate Array (FPGA) Technologies 53

2.8.1 The Xilinx Virtex Series FPGA Family Members 55

2.8.2 Comparison of the Xilinx General-Purpose, Defense-Grade, Space-Grade Virtex-4

and Virtex-5 FPGA Product Family Members 57

2.8.3 The Xilinx Virtex-5 XC5VFX70T ML507 FPGA Development Board 59

2.9 Application of MPC 61

2.10 Practical Problems with MPC Algorithms 63

2.11 Neural Network-Based Control Schemes and MPC 64

2.11.1 Direct Adaptive Control 65

2.11.1.1 Direct Inverse Control with Generalized Training 66

2.11.1.2 Direct Inverse Control with Specialized Training 66

2.11.1.3 Indirect Inverse Control 66

2.11.1.4 Internal Model Control (IMC) 66


2.11.1.5 Feedback Linearization 67

2.11.1.6 Feedforward Control 68

2.11.1.7 Optimal Control 68

2.11.2 Indirect Adaptive Control 69

2.11.2.1 Indirect Adaptive Control Based on Instantaneous Linearization 70

2.11.2.2 Nonlinear Predictive Control (NPC) 72

2.11.3 Modular Neural Network (MNN) Controller Design 73

2.11.4 Back-Propagation Through Time (BPTT) Control Scheme 73

2.11.5 Neural Network-Based Adaptive Critic Designs (ACD) 74

2.12 State-of-the-Art in Neural Network-Based MPC: Neural Network Training, Model

Identification, Adaptive Control and MPC Implementation 75

2.12.1 Neural Network and Training Methods 76

2.12.2 Neural Network Model Identification for MPC Design 77

2.12.3 Neural Network–Based MPC Algorithms 80

2.12.4 MPC Implementation 82

Chapter 3 Neural Network-Based Nonlinear Model Identification Algorithms 87

3.1 Introduction 87

3.2 Dynamic Neural Network Model Identification 88

3.2.1 Parallel Model Identification 88

3.2.2 Series-Parallel Model Identification 89

3.2.3 Remarks on the Basic Neural Network Training Algorithms 89

3.2.3.1 Backpropagation Algorithms and Its Variations 89

3.2.3.2 Teacher-Forced Real-Time Recurrent Learning 90

3.2.4 The Architecture for the Teacher-Forced Dynamic Feedforward Neural Network 91

3.3 Training Algorithms for Neural Network Model Identification 92

3.3.1 Formulation of the Model Identification Problem 92

3.3.2 The Proposed Neural Network Model Identification Schemes 94

3.3.3 Backpropagation (BP) Techniques 97

3.3.4 The Gauss-Newton Second-Order Approximation Method 97

3.3.4.1 Computing the Gradient of the Network ψ[k, θ(k)] 98

3.3.4.2 Computing the Partial Derivatives φ(k) 100

3.3.4.3 Second-Order Expansion and the Gauss-Newton Search Direction 102

3.3.5 The Adaptive Recursive Least Squares (ARLS) Algorithm 103

3.3.6 The Modified Levenberg-Marquardt Algorithm (MLMA) 108

3.3.7 Training Parameters and Criteria for Evaluating the Neural Network (NN) Model 113


3.3.8 Scaling the Training Data and Rescaling the Trained Network 114

3.4 Neural Network-Based Validation Algorithms 115

3.4.1 One-Step Ahead Prediction Validation 115

3.4.2 k-Step Ahead Prediction Validation 116

3.4.3 Akaike’s Final Prediction Error Estimate 116

Chapter 4 Neural Network-Based Adaptive Model Predictive Control Algorithms 118

4.1 Introduction 118

4.2 The Objective Function 119

4.3 Adaptive Generalized Predictive Control (AGPC) Algorithm 120

4.3.1 Instantaneous Linearization of a Deterministic Nonlinear Neural Network ARX Model 121

4.3.2 Instantaneous Linearization of a Stochastic Nonlinear Neural Network ARMAX Model 122

4.3.3 The AGPC Algorithm 124

4.4 Nonlinear Adaptive Model Predictive Control (NAMPC) Algorithm 129

4.5 Tuning the Neural Network-Based Model Predictive Controllers 139

Chapter 5 Development of Real-Time Implementation Platforms for the Neural

Network-Based Nonlinear Model Identification and Adaptive Model

Predictive Control Algorithms 141

5.1 Introduction 141

5.2 The Description of the Proposed Network Control System (NCS) 142

5.2.1 Bounded Transmission Delay 142

5.2.2 Interoperability at the Application Level 145

5.3 The Development of Real-Time Embedded Processor System Platform 146

5.3.1 Overview of Embedded Processor Systems and Design Considerations 146

5.3.1.1 Why Embed a Processor Inside an FPGA? 146

5.3.1.2 Some Advantages and Disadvantages of FPGA Embedded Processor System 147

5.3.1.3 Xilinx’s Embedded Hard PowerPC™440 and MicroBlaze Soft Processors 148

5.3.1.4 Standard Industry Benchmark for FPGA Embedded Processors and Xilinx’s FPGA

Embedded Processors Benchmark Performances 149

5.3.1.5 Design Considerations for the Proposed FPGA Embedded Processor System 149

5.3.1.5.1 Compiler Optimization and Parameters 150

5.3.1.5.2 Memory Types 150

5.3.1.5.3 Optimization Specific to an FPGA Embedded Processor 152

5.3.2 The PowerPC™ 440 Embedded Processor System Development Using Xilinx Integrated

Software Environment (ISE) and Xilinx Platform Studio (XPS) 153


5.3.3 MicroBlaze Embedded Processor System Development Using the Xilinx Integrated

Software Environment (ISE) and the Xilinx Platform Studio (XPS) 161

5.3.4 Software Development and Performance Verification of the PowerPC™440 and

MicroBlaze™ Embedded Processor Systems Using the Xilinx Software Development

Kit (Xilinx SDK) 164

5.3.5 MicroBlaze™ Dhrystone Benchmark Performance Evaluation 167

5.3.6 Comparison of the Device Utilization for the PowerPC™440 and MicroBlaze™ Embedded

Processor Systems 169

Chapter 6 Case Studies and Simulation Results 171

6.1 Introduction 171

6.2 The Model Identification and Control of the Fluidized Bed Furnace Reactor (FBFR) Process 173

6.2.1 The Fluidized Bed Furnace Reactor (FBFR) Process Description and Mathematical Model 173

6.2.1.1 The Fluidized Bed Furnace Reactor (FBFR) Process 174

6.2.1.2 The Control Problem of the Fluidized Bed Furnace Reactor (FBFR) Process 176

6.2.1.3 FBFR Experiment and Training Data Acquisition 177

6.2.1.4 Statement of the FBFR Neural Network Model Identification and Control Problem 177

6.2.2 Training the Neural Network that Models the FBFR Process 179

6.2.2.1 Validating the Trained Network that Models the FBFR Process 181

6.2.2.1.1 Validation by the One-Step Ahead Predictions Simulation 182

6.2.2.1.2 K-Step Ahead Prediction Simulations for the FBFR Process 182

6.2.2.1.3 The Akaike’s Final Prediction Error (AFPE) Estimates for the FBFR Process 184

6.2.2.2 Online Closed-Loop Identification with PID Control 185

6.2.3 Validation and Dynamic Performance Comparison of the Proposed MLMA algorithm

with Backpropagation with momentum (BPM) and Incremental Backpropagation

(INCBP) Algorithms 189

6.2.3.1 Network Training Using BPM, INCBP and the Proposed MLMA Algorithms 190

6.2.3.2 Validating the Trained Network by BPM, INCBP and MLMA Algorithms 191

6.2.3.2.1 One-Step Ahead Predictions Simulation for the FBFR Process 192

6.2.3.2.2 K-Step Ahead Prediction Simulations for the FBFR Process 195

6.2.3.2.3 The Akaike’s Final Prediction Error (AFPE) Estimates for the FBFR Process 196

6.2.3.3 Performance Comparison of the BPM, INCBP and the MLMA Algorithms 196

6.2.4 Validation and Performance Evaluation of the Proposed AGPC and NAMPC Algorithms

for the Model-Based Adaptive MPC of the FBFR Process 197

6.2.4.1 Comparison of Simulation Results for the Control Performance of AGPC and

NAMPC for the FBFR Process Identification and Control 199


6.2.4.2 Computation Time for the Neural Network Identification and Control of the FBFR

Process 201

6.2.5 Implementation of the PID and NAMPC algorithms Over the Service-Oriented Architecture

Cluster Network and their Performance Evaluation 201

6.2.5.1 Results of the Closed-Loop Simulation 204

6.2.5.2 Worst Case Overall Control Loop Delay Introduced by a DPWS-Based Traditional

Ethernet Network 207

6.2.5.3 Worst Case Overall Control Loop Delay Introduced by the Proposed Service-Oriented

Architecture (SOA) Cluster Network Based on the DPWS 208

6.3 Activated Sludge Wastewater Treatment Plant (AS-WWTP) 210

6.3.1 An Overview of the AS-WWTP Process 210

6.3.1.1 Statement of the Activated Sludge Wastewater Treatment Plant (AS-WWTP) Problem 210

6.3.1.2 Statement of the Activated Sludge Wastewater Treatment Plant (AS-WWTP) Neural

Network Model Identification and Control Problem 213

6.3.1.3 Experiment with the BSM1 for AS-WWTP Process Neural Network Training Data

Acquisition 215

6.3.2 Training the Neural Network that Models the AS-WWTP Aerobic Reactor 215

6.3.2.1 Validating the Trained Network that Models the AS-WWTP Process 217

6.3.2.1.1 Validation by the One-Step Ahead Predictions Simulation 217

6.3.2.1.2 K-Step Ahead Prediction Simulations for the AS-WWTP Process 220

6.3.2.1.3 Akaike's Final Prediction Error (AFPE) Estimates for the AS-WWTP Process 221

6.3.2.2 Online Closed-Loop Identification and Control with AGPC Controller 221

6.3.3 Validation and Dynamic Performance Comparison of the BPM, INCBP and Proposed

ARLS Algorithms for the Model Identification of the Aerobic Reactor of the AS-WWTP

Process 224

6.3.3.1 Network Training Using the BPM, INCBP and the Proposed ARLS Algorithms 224

6.3.3.2 Validating the Trained Network by BPM, INCBP and ARLS Algorithms 226

6.3.3.2.1 One-Step Ahead Predictions Simulation for the AS-WWTP Process 227

6.3.3.2.2 K-Step Ahead Prediction Simulations for the AS-WWTP Process 230

6.3.3.2.3 The Akaike’s Final Prediction Error (AFPE) Estimates for the AS-WWTP

Neural Network Model 230

6.3.3.3 Performance Comparison of the BPM, INCBP and the ARLS Algorithms 230

6.3.4 Validation and Performance Evaluation of the Proposed AGPC and NAMPC Algorithms for

Model-Based Adaptive Control of the AS-WWTP Process 231

6.3.4.1 Comparison of Simulation Results for the Control Performance of AGPC and

NAMPC for the AS-WWTP Process Identification and Control 232


6.4 Neural Network-Based Model Identification and Adaptive Predictive Auto-Pilot Control of a

Nonlinear F-16 Fighter Aircraft 235

6.4.1 Formulation of the Nonlinear F-16 Aircraft Control Problem 235

6.4.1.1 Simulations of the Nonlinear F-16 Aircraft for Training Data Acquisition 239

6.4.2 Neural Network Identification of the Nonlinear F-16 Aircraft Model 241

6.4.2.1 Performance Comparison of the Neural Network Model Identification Based on

the ARLS and the MLMA Algorithms 242

6.4.3 Validation of the Trained Neural Network for Modeling the Nonlinear F-16 Aircraft 244

6.4.3.1 Trained Network Validation by the One-Step Ahead Predictions Simulation 245

6.4.3.2 K–Step Ahead Prediction Simulations 249

6.4.3.3 The Akaike’s Final Prediction Error (AFPE) Estimates 251

6.4.4 Closed-Loop Model-Based Adaptive Control of the Nonlinear F-16 Aircraft 252

6.4.5 Evaluation and Performance Comparison of the AGPC and NAMPC Algorithms for the

Control of the Nonlinear F-16 Aircraft 256

6.4.5.1 Comparison of Simulation Results of AGPC and NAMPC Controllers 256

6.4.5.2 Computation Time for the Neural Network Model Identification and Adaptive Control

of the Nonlinear F-16 Aircraft Auto-Pilot System 257

6.5 Real-Time Implementation of the Neural Network-Based Adaptive Generalized Predictive Control

(AGPC) Algorithm for Nonlinear F-16 Fighter Aircraft Auto-Pilot Control System on a Xilinx

Virtex-5 FX70T ML507 FPGA Board 259

6.5.1 Model-Based Approach for the FPGA Implementation of the AGPC Algorithm 261

6.5.2 Hardware Synthesis of the AGPC Algorithm Using the Xilinx AccelDSP 263

6.5.2.1 Discussions on the Generated Hardware Model of the AGPC Algorithm 270

6.5.2.2 Remarks on the Generated Hardware Model of the AGPC Algorithm 271

6.5.3 Model-Based Implementation of the Synthesized AGPC Algorithm Using Xilinx System

Generator for DSP 272

6.5.4 Hardware-in-the-Loop Co-Simulation of the System Generator Model of the Synthesized

AGPC Algorithm on Xilinx Virtex-5 FX70T ML507 FPGA Board 275

6.5.5 Generation and Integration of an AGPC Co-Processor Intellectual Property (IP) Core

with an Embedded PowerPC™440 Processor System 278

6.5.6 Real-Time Implementation of the Embedded PowerPC™440 Processor and AGPC

Co-Processor System on Xilinx Virtex-5 FX70T ML507 FPGA Board 288

6.5.6.1 Closed-Loop Control of the Nonlinear F-16 Aircraft Using the Simulink and Neural

Network Models with the Embedded PowerPC™440 Processor–AGPC Co-Processor

System on Xilinx Virtex-5 FX70T ML507 FPGA Board 290


Chapter 7 Conclusion 295

7.1 Evaluation and Discussion of Results 295

7.1.1 The Temperature Control of a Fluidized Bed Furnace Reactor (FBFR) 296

7.1.2 The Soluble Oxygen Control in the Aerobic Reactor of an Activated Sludge

Wastewater Treatment Plant (AS-WWTP) 297

7.1.3 The Nonlinear F-16 Fighter Aircraft Auto-Pilot Control 298

7.1.4 Real-Time Embedded PowerPC™440 Processor–AGPC Co-Processor System

Implementation on Xilinx Virtex-5 FX70T ML507 FPGA Board 298

7.2 Open Issues 300

7.3 Recommendations 301

REFERENCES 303

Appendix A: Overview of the Xilinx FPGA and Embedded System Design Tools 333

Appendix A–1: Overview of the Xilinx Model-Based Design Flow of an Embedded System 333

Appendix A–2: Algorithm Development Using the Xilinx AccelDSP Synthesis Tool 335

Appendix A–3: Model-Based System Design and Development Using the Xilinx System

Generator for DSP 336

Appendix A–4: Xilinx Embedded Development Kit (EDK) Design Tools 338

Appendix A–5: Importing and Integrating an EDK Processor with a System Generator Model 340

Appendix A–6: Exporting and Integrating a System Generator Model with an EDK Processor 341

Appendix A–7: Xilinx ISE™ Foundation: Design Implementation Tool 341

Appendix A–8: Major Simulink and System Generator for DSP Hardware Block Description used in Modeling and Synthesis of the Adaptive MPC 343

Appendix A–9: PowerPC™ 440 Embedded Processor 352

Appendix A–9.1: The PowerPC™ 440 Core Block Diagram 353

Appendix A–9.2: The PowerPC™ 440 Embedded Processor Organization 354

Appendix A–9.3: PowerPC™ 440 Embedded Processor Block Components,

Buses and Controllers 356

Appendix A–9.4: Processor Interfaces 362

Appendix A–10: MicroBlaze™ Embedded Processor 365

Appendix A–11: XPS Synthesis and ISE Device Utilization Summaries for the PowerPC™440

and MicroBlaze™ Embedded Processors Design 368

Appendix A–11.1: XPS Synthesis Summary for PowerPC™440 Processor Design 368

Appendix A–11.2: ISE Device Utilization Summary for PowerPC™440 Processor 369

Appendix A–11.3: XPS Synthesis Summary for MicroBlaze™ Processor Design 371

Appendix A–11.4: ISE Device Utilization Summary for MicroBlaze™ Processor 372


Appendix B: The Mathematical Model of the Fluidized Bed Furnace Reactor 374

Appendix B–1: The Fluidized Bed Furnace Reactor (FBFR) 374

Appendix B–2: MATLAB Program for the Fluidized Bed Furnace Reactor (FBFR) Model 376

Appendix B–3: MATLAB Script for Simulation of the FBFR Model 378

Appendix C: The Activated Sludge Wastewater Treatment Plant (AS-WWTP) Process Description

and Model 380

Appendix C–1: Introduction 380

Appendix C–2: AS-WWTP Process Description 383

Appendix C–3: AS-WWTP Process Model 384

Appendix C–4: General Characteristics of the Biological Reactors 386

Appendix C–5: General Characteristics of the Secondary Settler 387

Appendix C–6: The Simulink Model of the BSM1 for Evaluating the Performance of AS-WWTPs 392

Appendix C–7: The AS-WWTP Operational Considerations for the Biological Reactors 393

Appendix C–8: Criteria for Evaluating and Assessing the Performances of the AS-WWTP Control 394

Appendix C–9: Constraints Imposed on the AS-WWTP Based on the Benchmark 397

Appendix C–10: Controller Performance Evaluation and Assessment Criteria Based on the

Benchmark 399

Appendix C–11: MATLAB C Program for the Mixing Tank, combiner.c 401

Appendix C–12: MATLAB C Program for the First-order Reaction Flow, hyddelayv2.c 403

Appendix C–13: MATLAB C Program for the BSM No.1 for the Bioreactors, asm1.c 406

Appendix C–14: MATLAB C Program for the Secondary Settler, settler1dv4.c 410

Appendix C–15: Initialization of the Benchmark Simulation Model no. 1 (BSM1) 419

Appendix C–16: Initialization of the Benchmark Simulation Model no. 1 (BSM1)

Secondary Clarifier (Settler) 422

Appendix C–17: Initialization of the Dissolved Oxygen and Nitrate PI-Controller 425

Appendix C–18: Food-to-Microorganism Ratio 426

Appendix C–19: Computation of the Sludge Age 427

Appendix C–20: Influent Quality (IQ) 428

Appendix C–21: Effluent Quality (EQ) 429

Appendix D: The Nonlinear F-16 Aircraft Description and Model 430

Appendix D–1: The Nonlinear F-16 Aircraft Description and Anatomy 430

Appendix D–2: The Nonlinear F-16 Aircraft Equation of Motions 432

Appendix D–3: The Nonlinear F-16 Aircraft Modeled Using Simulink 435

Appendix D–4: Static, Dynamic and Total Pressures 440


Appendix D–5: The MATLAB C Program for the Nonlinear F-16 Aircraft Model, nlpant.c 442

Appendix D–6: The MATLAB Program for the F-16 Model Trimming Routine, trim_F16.m 453

Appendix D–7: The MATLAB Program for Computing the Initial States of the Nonlinear

F-16 Model Used in the Trimming Routine, trimfun.m 455

Appendix D–8: MATLAB script for the Simulation of the Nonlinear F-16 Aircraft,

F-16_Simulations.m 458

Appendix D–9: MATLAB Script for Implementing the Nonlinear F-16 Aircraft Simulation,

F16_aircraft_model.m 460

Appendix E: Embedded PowerPC™440 Processor–AGPC Co-Processor System XPS

Synthesis and Xilinx ISE™ Device Utilization Summaries 462

Appendix E–1: XPS Synthesis Summary for the Embedded PowerPC™440 Processor–AGPC

Co-Processor System 462

Appendix E–2: Xilinx ISE™ Device Utilization Summary for the Embedded PowerPC™440

Processor–AGPC Co-Processor System 463

Appendix E–3: Summary and Table of Contents of the Embedded PowerPC™440

Processor–AGPC Co-Processor System 465

Appendix E–4: The AGPC Co-Processor (f16_nagpc_ipcore_plbw_0) System Device Utilization 466

Appendix E–5: The EDK Processor API for the AGPC Co-Processor IP Core Drivers and

Software Development Guide 468

Appendix E–6: Software for Initializing the Embedded System Driver and Implementing the

Embedded PowerPC™440 Processor and the AGPC Co-Processor System on

Virtex-5 FX70T ML507 FPGA Board 480


List of Figures

Fig. 2.1: Basic structure of MPC scheme 13

Fig. 2.2: The general MPC control strategy 14

Fig. 2.3: A nonlinear model of a neuron 19

Fig. 2.4: Feedforward multilayer perceptron neural network with one hidden and output layer 20

Fig. 2.5: Dynamic feedforward neural network (DFNN) structure 21

Fig. 2.6: The schematic diagram of the Hopfield network 24

Fig. 2.7: The basic architecture of the Jordan network 25

Fig. 2.8: Unfolding action of recurrent neural networks with additional layer at each time step 25

Fig. 2.9: The basic architecture of the Elman network 26

Fig. 2.10: Tapped delayed neural network (TDNN) 27

Fig. 2.11: Generalized regression neural network (GRNN) 31

Fig. 2.12: Radial basis function neural network (RBFNN) 32

Fig. 2.13: NNARX model predictor 46

Fig. 2.14: NNARMAX model predictor 46

Fig. 2.15: NNOE model predictor 46

Fig. 2.16: The Virtex-5 ML507 FPGA embedded system development board: (a) Top view and

(b) Bottom view 60

Fig. 2.17: Model reference adaptive control scheme: U(k) is the control input, R(k) is the desired reference, E(k) is the error between the reference model and the system output Y(k) 65

Fig. 2.18: The principle of internal model control (IMC) implemented with two neural networks: a model of the system (M) and an inverse model (C), with disturbance d(k) acting on the output of the system 67

Fig. 2.19: Indirect model-based adaptive control scheme: U(k) is the control input, R(k) is the desired reference, E(k) is the error between the reference model and the system output Y(k) 69

Fig. 2.20: Indirect control based on instantaneous linearization of the neural network model 70

Fig. 2.21: Basic structure of the backpropagation through time (BPTT) control scheme 73

Fig. 2.22: The structure of an action-dependent heuristic dynamic programming form of adaptive

critic design (ACD) 74

Fig. 3.1: Neural network parallel model identification structure 88

Fig. 3.2: Neural network series-parallel model identification structure 88

Fig. 3.3: Teacher-forced dynamic feedforward neural network (TF-DFNN) architecture 91

Fig. 3.4: The architecture of the dynamic feedforward neural network (DFNN) model 94

Fig. 3.5: Neural network model identification based on the teacher-forcing method for (a) NNARX


and (b) NNARMAX model predictors 94

Fig. 4.1: The proposed NN-based AGPC scheme 120

Fig. 4.2: The proposed NN-based NAMPC strategy with a NN model 130

Fig. 5.1: General structure of the proposed network control system (NCS) 143

Fig. 5.2: Structure of a SOAP message 144

Fig. 5.3: The Xilinx ISE “New Project Summary” and the BSB Welcome, System, and Processor

design stages for the embedded PowerPC™440 processor system 154

Fig. 5.4: The BSB: the Peripheral and Summary design stages for the embedded PowerPC™440

processor system 156

Fig. 5.5: The XPS graphical user interface (GUI) for the creation and initial compilation of the

embedded processor system 157

Fig. 5.6: A section of the Xilinx ISE™ graphical user interface from where the PowerPC™440

embedded processor system design is instantiated 159

Fig. 5.7: The block diagram of the PowerPC™440 embedded processor system with associated memory

types, peripherals, clock generator, buses, hardware and software specifications and key/symbols 160

Fig. 5.8: The BSB: the Peripheral and Summary design stages for the embedded MicroBlaze™

processor system 162

Fig. 5.9: The block diagram of the MicroBlaze™ embedded processor system with associated memory types,

peripherals, clock generator, buses, hardware and software specifications and key/symbols 163

Fig. 5.10: Xilinx software development kit graphical user interface for software development

and programming the Virtex-5 ML507 FPGA using the “Debug on Hardware” option 165

Fig. 5.11: The MicroBlaze™ processor: (a) memory and (b) peripheral test results on the

HyperTerminal window 166

Fig. 5.12: The XPS for creating, compiling and initializing the Dhrystone benchmark program to

load from on-board BRAM for benchmark performance evaluation of MicroBlaze™

embedded processor on Virtex-5 ML507 FPGA 168

Fig. 6.1: Simplified diagram of the steam deactivation unit (SDU) of the FCC pilot plant with the FBFR 175

Fig. 6.2: Schematic of the vertical cross-section of the cylindrical fluidized bed furnace reactor (FBFR) 175

Fig. 6.3: Temperature distribution and variation across the six sections of the FBFR system:

(a) temperature distribution across the FBFR and (b) temperature variations showing

the maximum temperatures for Tirw, Tbrwh, Th and Tormw with the minimum

and maximum temperatures for Tri and Tins 176

Fig. 6.4: Neural network-based FBFR model identification schemes (a) NNARX model and

(b) NNARMAX model 178

Fig. 6.5: Network convergence using the ARLS and the MLMA algorithms (performance index vs. epoch) 180

Fig. 6.6: Comparison of Tri and Th training data predictions by the network trained using ARLS and


MLMA algorithms for 10 Epochs 181

Fig. 6.7: Comparison of Tri and Th training data predictions by the network trained using ARLS and

MLMA algorithms for 100 Epochs 181

Fig. 6.8: Comparison of Tri and Th test data predictions by the network trained using ARLS and

MLMA algorithms for 10 Epochs 183

Fig. 6.9: Comparison of Tri and Th test data predictions by the network trained using ARLS and

MLMA algorithms for 100 Epochs. 183

Fig. 6.10: 5-step ahead Tri and Th output predictions (red--*) comparison with original unscaled training

data (blue-) using the network trained using ARLS and MLMA algorithms for 10 epochs 184

Fig. 6.11: 5-step ahead Tri and Th output predictions (red--*) comparison with original unscaled training

data (blue-) using the network trained using ARLS and MLMA algorithms for 100 epochs 184

Fig. 6.12: The PID control scheme 185

Fig. 6.13: PID control performance with the first principles validated model of the FBFR process:

(a) Th and (b) Tri output predictions, and (c) Th and (d) Tri predictions without disturbances

on the model 187

Fig. 6.14: PID control performance with the first principles validated model of the FBFR process

under disturbances: (a) Th and (b) Tri output predictions, and (c) HRP and (d) DWP control

signals 187

Fig. 6.15: Closed-loop PID control performance of the FBFR process using NN model trained with ARLS

(dotted red lines) and MLMA (black dashed dotted lines) for 10 epochs: (a) Th and (b) Tri

output predictions, (c) HRP and (d) DWP control signals 188

Fig. 6.16: Closed-loop PID control performance of the FBFR process using NN model trained with ARLS

(dotted red lines) and MLMA (black dashed dotted lines) for 100 epochs: (a) Th and (b) Tri

output predictions, (c) HRP and (d) DWP control signals 188

Fig. 6.17: Network convergence using the BPM, INCBP and the MLMA algorithms (performance

index vs. epoch) 191

Fig. 6.18: Comparison of (a) Tri and (b) Th training data predictions by the network trained using

backpropagation with momentum (BPM), incremental backpropagation (INCBP), and the

MLMA algorithms 193

Fig. 6.19: Comparison of (a) Tri and (b) Th test data predictions by the network trained using

backpropagation with momentum (BPM), incremental backpropagation (INCBP), and the

MLMA algorithms 194

Fig. 6.20: Comparison of the 5-step ahead output predictions (red --*) of the NN for (a) Tri and (b) Th

when it is trained by the BPM, (INCBP), and the MLMA algorithms with the original unscaled

training data (blue-) 195

Fig. 6.21: FBFR temperature predictions by AGPC (blue--) and NAMPC (red -) for (a) Th and (b) Tri


with the manipulated signals (c) HRP and (d) DWP to track the desired reference signal

(green .-) 200

Fig. 6.22: Computation time for the parallel implementation of the identification and control strategies at each time sample: (a) AGPC for the FBFR process and (b) NAMPC for the FBFR process 201

Fig. 6.23: FBFR temperature predictions by the PID controller (blue--) and NAMPC (red) for (a) Th and

(b) Tri with the control signals (c) HRP and (d) DWP for tracking the reference signal

(pink -.-) together with output prediction errors in (e) and (f) for Th and Tri respectively due to

both controllers for k = 350 samples 205

Fig. 6.24: The Dtr1 + Dtr2 delay between the FBFR process and the control system obtained by NS-2 207

Fig. 6.25: Online identification and control of the FBFR process over the DPWS implemented over a

traditional Ethernet network: (a) Th and (b) Tri predictions with their respective control signals

(c) HRP and (d) DWP 208

Fig. 6.26: Online identification and control of the FBFR process over the proposed Fieldbus: (a) Th and

(b) Tri predictions with their respective control signals (c) HRP and (d) DWP 208

Fig. 6.27: Computation time for the FBFR model identification and control at each time sample 209

Fig. 6.28: The AS-WWTP with dissolved oxygen concentration and the nitrate control loops 212

Fig. 6.29: The neural network model identification scheme for AS-WWTP based on NNARMAX model 214

Fig. 6.30: Network convergence for the AS-WWTP using the ARLS and the MLMA algorithms 217

Fig. 6.31: Comparison of soluble oxygen (SO) data predictions with the training data by the network

trained using ARLS and MLMA algorithms for 10 Epochs 218

Fig. 6.32: Comparison of soluble oxygen (SO) data predictions with the training data by the network

trained using ARLS and MLMA algorithms for 100 Epochs 218

Fig. 6.33: Comparison of soluble oxygen (SO) validation data predictions by the network trained using

ARLS and MLMA algorithms for 10 Epochs 219

Fig. 6.34: Comparison of soluble oxygen (SO) validation data predictions by the network trained using

ARLS and MLMA algorithms for 100 Epochs 219

Fig. 6.35: 5-step ahead soluble oxygen (SO) output predictions (red--*) comparison with original

unscaled training data (blue-) using the network trained using ARLS and MLMA algorithms

for 10 Epochs 220

Fig. 6.36: 5-step ahead soluble oxygen (SO) output predictions (red--*) comparison with original

unscaled training data (blue -) using the network trained using ARLS and MLMA algorithms

for 100 Epochs 220

Fig. 6.37: The closed-loop AGPC scheme used for the soluble oxygen (SO) in order to evaluate the

online model identification based on ARLS and MLMA algorithms 222

Fig. 6.38: Closed-loop AGPC control performance of the AS-WWTP process using NN model

trained with ARLS and MLMA for (a) 10 and (c) 100 epochs with their respective control


signals in (b) and (d) 222

Fig. 6.39: Network convergence using the BPM, INCBP and the ARLS algorithms (performance

index vs. epoch) 226

Fig. 6.40: Comparison of the one-step ahead prediction of the soluble oxygen (SO) training data by the

network trained using backpropagation with momentum (BPM), incremental

backpropagation (INCBP), and the proposed ARLS algorithms 227

Fig. 6.41: Comparison of the one-step ahead prediction of the soluble oxygen (SO) validation data

using the network trained with backpropagation with momentum (BPM), incremental

backpropagation (INCBP), and the proposed ARLS algorithms 228

Fig. 6.42: Comparison of the ten-step ahead prediction of the soluble oxygen (SO) unscaled training

data by the network trained with backpropagation with momentum (BPM),

incremental backpropagation (INCBP), and the proposed ARLS algorithms 229

Fig. 6.43: The closed-loop NAMPC scheme used for the soluble oxygen (SO) in order to evaluate the

online model identification based on ARLS and MLMA algorithms 231

Fig. 6.44: The soluble oxygen predictions and the oxygen transfer coefficient control by (a) AGPC and NAMPC with the control signal (b) for the manipulated variable, oxygen transfer coefficient (KLa5), for the alternating AS-WWTP process. Computation time for the parallel implementation of the identification and control strategies for the AS-WWTP process at each sampling instant: (c) AGPC with an average computation time of 0.6594 seconds and (d) NAMPC with an average computation time of 1.7316 seconds 233

Fig. 6.45: The soluble oxygen predictions and control by AGPC and NAMPC with the control signal (b) for the manipulated variable, oxygen transfer coefficient (KLa5), for the AS-WWTP process with sinusoidal disturbances 234

Fig. 6.46: The F-16 aircraft surfaces for the control of the thrust, roll rate (p), pitch rate (q) and yaw rate (r): (xb, yb, zb) are the body axes, (u, v, w) are the velocities along the body axes, (L) is the rolling moment, (M) is the pitching moment, (N) is the yawing moment, (xn, yn, zn) is the navigation frame, Ocm is the center of mass, and (φ, ϑ, ψ) are the Euler angles for aileron, elevator and rudder deflections respectively 236

Fig. 6.47: Definition of the angle of attack α (α > 0) and sideslip β (β > 0); xn, yn, zn are the North, East and Down orientations of the navigation frame 236

Fig. 6.48: The desired reference trajectories for the roll rate (p), pitch rate (q) and the yaw rate (r)

as well as the thrust (t) for the nonlinear F-16 aircraft 237

Fig. 6.49: Convergence of the NN used to model the F-16 aircraft when it is trained with the ARLS

and the MLMA algorithms (sum of squares error vs. epoch) 243

Fig. 6.50: Comparison of the output predictions of the scaled training data using the network trained by


ARLS and MLMA: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and

(d) thrust prediction for 20 epochs 245

Fig. 6.51: Comparison of the output predictions of the scaled training data using the network trained by ARLS and MLMA: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction for 100 Epochs 246

Fig. 6.52: Comparison of the unscaled data predictions of the network trained by ARLS and MLMA: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction for 20 Epochs 247

Fig. 6.53: Comparison of the output predictions of the unscaled validation data using the network trained

by ARLS and MLMA: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction

and (d) thrust prediction for 100 Epochs 248

Fig. 6.54: 5-step ahead output predictions (red--*) comparison with the original unscaled training data

(blue -) using the Network trained by the ARLS and MLMA algorithms for 20 Epochs: (a) roll

rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction 249

Fig. 6.55: 5-step ahead output predictions (red--*) comparison with the original unscaled training data

(blue -) using the Network trained by the ARLS and MLMA algorithms for 100 Epochs: (a) roll

rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction 250

Fig. 6.56: The nonlinear F-16 model: (a) neural network model identification and (b) neural

network-based adaptive control scheme using the NAMPC control strategy 252

Fig. 6.57: Responses of controlled variables and time variations of the manipulated variables when NN

is trained with ARLS and MLMA algorithms for 20 epochs: (a) roll rate, pitch rate, yaw rate and

thrust and (b) aileron deflection, elevator deflection, rudder deflection and the throttle setting 254

Fig. 6.58: Responses of controlled variables and time variations of the manipulated variables when NN is

trained with ARLS and MLMA algorithms for 100 epochs: (a) roll rate, pitch rate, yaw rate and

the thrust and (b) aileron deflection, elevator deflection, rudder deflection and the throttle setting 255

Fig. 6.59: The AGPC (blue--) and NAMPC (red -) responses of (a) roll rate p, (b) pitch rate q, (c) yaw rate r,

and (d) the throttle setting t with their respective manipulated variables (e) aileron deflections,

(f) elevator deflections, (g) rudder deflections and (h) the thrust command to track the reference

signal (green .-) 257

Fig. 6.60: Computation time for the parallel implementation of the identification and control strategies for the nonlinear F-16 auto-pilot control system at each time sample: (a) AGPC and (b) NAMPC 258

Fig. 6.61: The proposed scheme for the FPGA implementation, verification and performance evaluation

of a neural network-based adaptive generalized predictive control (AGPC) algorithm on a

Xilinx Virtex-5 FX70T ML507 FPGA board 260

Fig. 6.62: The block diagram for the proposed model-based design flow for the FPGA implementation

of the AGPC algorithm on Virtex-5 FX70T ML507 FPGA development board 262


Fig. 6.63: The block diagram of the AGPC modeling and Synthesis using MATLAB and Xilinx

AccelDSP synthesis tool 264

Fig. 6.64: AccelDSP design flow to generate the System Generator block model that encrypts the

AGPC algorithm 265

Fig. 6.65: Floating-point simulation results of the F-16 aircraft control using the MATLAB AGPC

algorithm with a total computation time of 104.8105 seconds 267

Fig. 6.66: AccelDSP fixed-point simulation of the F-16 aircraft control using the C++ AGPC algorithm with a total computation time of 100.17 seconds 267

Fig. 6.67: The System Generator block model of the AGPC algorithm generated by the Xilinx AccelDSP synthesis tool. Output sequences 1, 2 and 3 correspond to aileron – roll, elevator – pitch and rudder – yaw respectively 269

Fig. 6.68: The AccelDSP Synthesis Tool description of the generated hardware model of the AGPC

algorithm “agpc_acceldsp_model” 269

Fig. 6.69: The complete System Generator model for the generated hardware model

“agpc_acceldsp_model” for the AGPC algorithm 273

Fig. 6.70: The nonlinear F-16 aircraft control simulation results using the System Generator model

of the AGPC algorithm of Fig. 6.69 274

Fig. 6.71: (a) System Generator token (left) and the six System Generator compilation options with the

Hardware Co-Simulation options for Virtex-5 ML507 and (b) Hardware Co-Simulation block 274

Fig. 6.72: The System Generator model of the AGPC algorithm for the nonlinear F-16 aircraft auto-pilot

control with the generated Hardware Co-Simulation block 276

Fig. 6.73: Hardware-in-the-loop co-simulation results produced by the generated Hardware

Co-Simulation block model evaluated on the Xilinx Virtex-5 ML507 FPGA board over

JTAG cable. In the top plots, the output predictions (yellow) are compared to the reference

signal (red). The bottom plots are the control signals. (a), (b), and (c) are the simulation results

for the aileron-roll, elevator-pitch and rudder-yaw prediction and control respectively 277

Fig. 6.74: The System Generator model for the AGPC algorithm with the EDK Processor block used to generate the AGPC Co-Processor IP core. The model here is renamed "f16_nagpc_ipcore" to distinguish it from Fig. 6.69 279

Fig. 6.75: The generated AGPC Co-processor IP core that will be integrated with a PowerPC™440

processor system 281

Fig. 6.76: The XPS graphical user interface for connecting and configuring the embedded PowerPC™440 processor and the AGPC Co-Processor system 283

Fig. 6.77: The Xilinx ISE™ software for the synthesis, implementation and generation of the

programming file for the embedded PowerPC™440 processor – AGPC Co-Processor system 284

Fig. 6.78: The block diagram of the embedded PowerPC™440–AGPC Co-Processor system with their


associated memory types, peripherals, clock generator, buses, hardware and software

specifications and key/symbols 286

Fig. 6.79: Xilinx SDK GUI for software development and programming the Virtex-5 FX70T ML507

FPGA board for embedded PowerPC™440 processor–AGPC Co-Processor system 289

Fig. 6.80: Memory allocation and generation of the linker script for the embedded PowerPC™440

processor–AGPC Co-processor system 290

Fig. 6.81: F-16 aircraft auto-pilot closed-loop control simulations on the embedded PowerPC™440

processor–AGPC Co-processor system running on Virtex-5 FPGA for 14,560 samples 291

Fig. 6.82: F-16 aircraft auto-pilot closed-loop control simulations on the embedded PowerPC™440

processor–AGPC Co-processor system running on Virtex-5 FPGA for 58,240 samples 291

Fig. 6.83: Computation time by the embedded PowerPC™440 processor–AGPC Co-processor system

at each time sample: (a) 14,560 samples for first control simulation and (b) 58,240 samples

for second control simulation 293

Fig. A.1: Embedded system design flow: IP – Intellectual Property, AD – algorithm developer, SE – system

engineer, HSE – hardware/software engineer, NDSPHE – Non-DSP hardware engineer, EDK–

Embedded Development Kit, XPS – Xilinx Platform Studio, XSDK – Xilinx Software

Development Kit, RTM – RTL Top-Level Module, ISE – Integrated Software Environment 333

Fig. A.2: System modeling, development, simulation and validation 334

Fig. A.3: AccelDSP design routine at the Electronic System Level (ESL) 334

Fig. A.4: From system specification and algorithm/model development to Xilinx AccelDSP synthesis

design flow option implementations 335

Fig. A.5: System Generator token (left) and the six System Generator compilation options (right) with

available Hardware Co-Simulation options without the Virtex-5 ML507 FPGA board 336

Fig. A.6: HDL Co-Simulation with ModelSim and FPGA Hardware-in-the-Loop (HIL) Simulation with

ISE using System Generator in MATLAB/Simulink modeling environment 337

Fig. A.7: The basic embedded system design flow using the Xilinx EDK via the Xilinx ISE™ 338

Fig. A.8: EDK Embedded processor import and export options within the Xilinx System Generator 339

Fig. A.9: Basic structure, interface and communication between an embedded processor system and an IP core, user-defined or custom logic 339

Fig. A.10: Typical Xilinx ISE™ design implementation flowchart 342

Fig. A.11: The internal architecture of the DSP48E multiplier for embedding into Virtex-5 FPGAs 351

Fig. A.12: Including the DSP48E into an FPGA with a non-DSP48 hardware primitive using the "Use Synthesizable Model" option, highlighted with broken red lines 351

Fig. A.13: The Pipeline parameters tab for pipelining the Xilinx DSP48E embedded multiplier 351

Fig. A.14: The PowerPC™ 440 Core system on a chip with two-level bus structure and additional

peripherals 352


Fig. A.15: The PowerPC™ 440 embedded processor core block diagram 353

Fig. A.16: The logical organization of the PowerPC™ 440 embedded processor 354

Fig. A.17: The seven-stage pipelines included in the PowerPC™ 440 embedded processor core CPU 355

Fig. A.18: Power PC™ 440 Embedded Processor Block in Virtex-5 FPGAs 356

Fig. A.19: The architectural implementation of the embedded PowerPC™ processor and connection

to the associated peripherals in the Virtex-5 ML507 FX70T FPGA as well as the Virtex-5

FPGA family members 362

Fig. A.20: The architecture of the Xilinx MicroBlaze™ processor core, the core interfaces and buses 366

Fig. C.1: The schematic of the AS-WWTP process 381

Fig. C.2: Open-loop steady-state benchmark simulation model No.1 (BSM1) with constant influent 392

Fig. C.3: Simulink model of the bioreactor model 393

Fig. C.4: Simulink model of the flow splitter 393

Fig. C.5: Simulink model of the secondary settler 393

Fig. D.1: The four right positive control deflections of the nonlinear F-16 aircraft control surfaces with the direction of positive thrust, roll rate (p), pitch rate (q), yaw rate (r), the body axes (xb, yb, zb), the velocities (u, v, w) along the body axes, the rolling moment (L), pitching moment (M), yawing moment (N), the navigation frame (xn, yn, zn), the center of mass Ocm, and the Euler angles (φ, ϑ, ψ) for aileron, elevator and rudder deflections respectively 430

Fig. D.2: The navigation frame and the Euler angles 431

Fig. D.3: The Euler angles and frame transformation 431

Fig. D.4: Definition of the angle of attack and sideslip, α > 0 and β > 0 431

Fig. D.5: The schematic of the Simulink® model of the nonlinear F-16 aircraft of Fig. D.1 435

Fig. D.6: The Simulink model of the F-16 aircraft cockpit of Fig. D.5 435

Fig. D.7: The Simulink model of the leading edge flap for the F-16 aircraft 437

Fig. D.8: The Simulink model for creating the (qbar) and (ps) signals for the F-16 aircraft 437

Fig. D.9: The Simulink actuator model for the aileron, elevator, rudder, thrust and the leading edge flap

for the F-16 aircraft 437

Fig. D.10: The aileron, elevator, rudder and thrust disturbances model. The step time “Step1”, “Step2”

and “Step3” for aileron, elevator, rudder and thrust are all set to 1, 3 and 5 respectively 437

Fig. D.11: The Simulink model of the F-16 nonlinear dynamics together with its inputs defined by the

MATLAB Function “nlplant.c” given in Appendix D – 5 438

Fig. D.12: The F-16 aircraft state outputs sampled at 0.5 second using the Simulink zero-order-hold

(ZOH) block 438

Fig. D.13: Static (ps) and total (pT) pressures together with the airflows va, vb and vc 441

Fig. D.14: The measurement of static (ps), dynamic (qbar) and total (pT) pressures using the pitot tube 441


List of Tables

Table 2.1: Comparison of the Xilinx General-Purpose, Defense-Grade, Space-Grade

Virtex-4 and Virtex-5 FPGA Product Family Members in terms of their available

hardware resources and capabilities 58

Table 2.2: Summary of linear MPC applications by area (estimates based on vendor survey; estimates do not include applications by companies who have licensed vendor technology) – Source: [Qin and Badgwell, 2003] 61

Table 2.3: Summary of nonlinear MPC applications by area (estimates based on vendor survey; estimates do not include applications by companies who have licensed vendor technology) – Source: [Qin and Badgwell, 2003] 62

Table 3.1: Iterative Algorithm for Estimating the Covariance Noise Matrix 96

Table 3.2: An algorithm for placing the roots of the time-varying filter of a NNARMAX model

predictor within the unit circle for stability 99

Table 3.3: Iterative algorithm for selecting the Levenberg-Marquardt parameter λτ 110

Table 3.4: The modified Levenberg-Marquardt algorithm (MLMA) incorporating the Trust Region algorithm for updating θ̂(k) 112

Table 4.1: Iterative algorithm for selecting λτ for guaranteed positive definiteness of the Gauss–Newton Hessian matrix 136

Table 4.2: The implementation steps for the nonlinear adaptive model predictive control (NAMPC)

algorithm 138

Table 5.1: The Xilinx platform studio (XPS) PowerPC™440 and MicroBlaze™ embedded processor

systems synthesis summary 169

Table 5.2: The Xilinx ISE™ device utilization summary used by the PowerPC™440 and

MicroBlaze™ embedded processor systems 170

Table 6.1: Summary of training results for ARLS and MLMA algorithms 180

Table 6.2: Input and output constraints on the PID control of the FBFR process 185

Table 6.3: Summary of training results for the BPM, INCBP and the MLMA algorithms 191

Table 6.4: Constraints for the FBFR Process 198

Table 6.5: The AGPC and the NAMPC Tuning parameters for the FBFR Process 198

Table 6.6: FBFR Process Constraints 203

Table 6.7: Tuning Parameters 203

Table 6.8: Size (in bytes) of the DPWS messages from the FBFR process 206

Table 6.9: Summary of the training results by ARLS and MLMA algorithms for the AS-WWTP process 217

Table 6.10: The AGPC process control and tuning parameters for the AS-WWTP process 222

Table 6.11: Summary of training results for the BPM, INCBP and the ARLS algorithms 226


Table 6.12: Constraints on the soluble oxygen (SO) concentration control in the aerobic reactor

of the AS-WWTP Process 232

Table 6.13: The AGPC and the NAMPC tuning parameters for the SO control in the aerobic reactor of the AS-WWTP process 232

Table 6.14: Nonlinear F-16 aircraft model simulation parameters for data acquisition 240

Table 6.15: Summary of training results using the ARLS and MLMA algorithms for nonlinear F-16 aircraft 243

Table 6.16: Input and output constraints on the nonlinear F-16 aircraft 253

Table 6.17: Tuning parameters for the NAMPC controller 253

Table 6.18: Constraints for the nonlinear F-16 aircraft 256

Table 6.19: Tuning parameters for GPC and NAMPC controllers 256

Table 6.20: The total resources used by the AccelDSP Synthesis and System Generator for DSP

modeling tools for synthesizing, modeling and generating the AGPC Co-Processor system 285

Table 6.21: Comparison of the hardware resources used by the Xilinx platform studio (XPS) for the

AGPC Co-Processor systems synthesis 287

Table 6.22: Comparison of the hardware resources used by the Xilinx ISE™ for the implementation

of the AGPC Co-Processor system 287

Table 6.23: Summary of the computation times at various stages of the AGPC Co-Processor system

development as well as the complete embedded PowerPC™ processor–AGPC Co-Processor

system 293

Table C.1: The AS-WWTP Nomenclatures and Parameter Definitions 383

Table C.2: Stoichiometric parameters with their units and values 385

Table C.3: Kinetic parameters with their units and values 385

Table C.4: The double-exponential settling velocity function parameters with their definition,

units and values 388

Table C.5: Numerical values of the constraints, available control handles and their limitations 398


List of Acronyms

ACD Adaptive Critic Design

AFPE Akaike’s Final Prediction Error

AGPC Adaptive Generalized Predictive Control

AIL_PRED Roll Rate Output Predictions

AIL_REF Roll Rate Reference Signal

AIL_ROLL_CONT Aileron Control Signal

ALU Arithmetic Logic Unit

API Application Programming Interface

APU Auxiliary Processing Unit

ARGMC Adaptive Robust Generic Model Controller

ARIX Integrated Autoregressive with Exogenous Inputs

ARLS Adaptive Recursive Least Squares

ARMAX Autoregressive Moving Average with Exogenous inputs

ARX Autoregressive with Exogenous Inputs

AS Address Space

ASIC Application-Specific Integrated Circuit

ASP Activated Sludge Process

ASSP Application Specific Standard Part

AS-WWTP Activated Sludge Wastewater Treatment Plant

BFGS Broyden-Fletcher-Goldfarb-Shanno

BOD Biochemical Oxygen Demand

BP Backpropagation

BPM Backpropagation with momentum

BPTT Backpropagation Through Time

BRAM Block RAM

BSB Base System Builder

BSM1 Benchmark Simulation Model Number 1

BTAC Branch Target Address Cache

CAD Computer-Aided Design

CAE Computer-Aided Engineering

CARIMA Controlled Autoregressive Integrated moving average

COD Chemical Oxygen Demand

CPERI Chemical Process Engineering Research Institute


CPLD Complex Programmable Logic Device

CPU Central Processing Unit

CR Condition Register

CSMA/CD Carrier Sense Multiple Access with Collision Detection

CV Control Variable

DAMRC Direct Adaptive Model Reference Control

DBCR Debug Counter Register

DCC Data Cache Controller

DCE Data Circuit-Terminating Equipment

DCR Device Configuration Register (and Device Control Register)

DCS Distributed Control System

DDR SRAM Double Data Rate Static Random Access Memory

DEC Decrementer

DFMLPNN Dynamic Feedforward Multilayer Perceptron Neural Network

DFNN Dynamic Feedforward Neural Network

DISS Decode/Issue

DLL Data Link Layer

DMA Direct Memory Access

DMC Dynamic Matrix Control

DMIPs Dhrystone Million Instructions Per Second

DO Dissolved Oxygen

DPPC Dynamic Performance Predictive Control

DPWS Device Profile for Web Services

DSP Digital Signal Processor (Digital Signal Processing)

DTE Data Terminal Equipment

DTLB Data Shadow Translation Lookaside Buffer

DWP Deionized Water Pump

EDIF Electronic Design Interchange Format

EDK Embedded Development Kit

ELEV_PITCH_CONT Elevator Control Signal

ELEV_PRED Pitch Rate Output Predictions

ELEV_REF Pitch Rate Reference Signal

EXE1/AGEN Execute stage 1 and generate load/store address

EXE2/CRD Execute stage 2


FBFR Fluidized Bed Furnace Reactor

FCC Fluid Catalytic Cracking

FIT Fixed Interval Timer

FNN Feedforward Neural Network

FPGA Field Programmable Gate Array

FPU Floating-Point Unit

FSL Fast Simplex Link

GAL Generic Array Logic

GCC GNU Compiler Collection

GMVC Generalized Minimum Variance Control

GNU GNU's Not Unix (a Unix-like operating system)

GPC Generalized Predictive Control

GPR General Purpose Register

GRNN Generalized Regression Neural Network

GUI Graphical User Interface

HDL Hardware Description Language

HIECOM Hierarchical Constraint Control

HIL Hardware-in-the-Loop

HRP High Resistance Potentiometer

HTTP Hypertext Transfer Protocol

HW Co-Sim Hardware Co-Simulation

I/O Input-Output

IBM International Business Machines

IC Integrated Circuit

ICC Instruction Cache Controller

ICI Initial Control Input

ICT Information and Communication Technology

IDCOM Identification and Command

IFTH Fetch instructions from instruction cache

IMC Internal Model Control

INCBP Incremental Backpropagation

IP Internet Protocol (and Intellectual Property)

IPO Initial Predicted Output

ISE Integrated Software Environment

ITLB Instruction Shadow Translation Lookaside Buffer


JTAG Joint Test Action Group

LEF Leading Edge Flap

LMA Levenberg-Marquardt Algorithm

LMB Local Memory Block

LMS Least Mean Squares

LQG Linear Quadratic Gaussian

LQGPC Linear Quadratic Generalized Predictive Control

LQR Linear Quadratic Regulator

LUT Look-Up-Table

MAC Model Algorithmic Control (and Media Access Control)

MDM Microprocessor Debug Module

MHz Megahertz

MIMO Multiple-Inputs Multiple-Outputs

MLMA Modified Levenberg-Marquardt Algorithm

MLP Multilayer Perceptron

MLSS Mixed Liquor Suspended Solids

MLVSS Mixed Liquor Volatile Suspended Solids

MMU Memory Management Unit

MNN Modular Neural Network

MPC Model Predictive Control

MPHC Model Predictive Heuristic Control

Mp-QP Multi-Parametric Quadratic Programming

MRAC Model Reference Adaptive Control

MSE Mean Square Error

MSR Machine State Register

MSR[DS] Data Access Address Space

MSR[IS] Instruction Fetch Address Space

MURHAC Multivariable Receding Horizon Adaptive Control

MUSMAC Multistep Multivariable Adaptive Control

MV Manipulated Variable

MVPE Mean Value of K-Step Ahead Prediction Error

NAMPC Nonlinear Adaptive Model Predictive Control

NCF Netlist Constraint File

NCS Network Control System

NGC Netlist with Logical Design Data and Constraints


NMPC Nonlinear Model Predictive Control

NN Neural Network

NNARMAX Neural Network-Based Nonlinear Autoregressive Moving Average with Exogenous Inputs

NNARX Neural Network-Based Nonlinear Autoregressive with Exogenous Inputs

NNOE Neural Network-Based Nonlinear Output Error

NPC Nonlinear Predictive Control

OE Output Error

OPB On-Chip Peripheral Bus

OSI Open Systems Interconnection

OTP One-Time Programmable

PAL Programmable Array Logic

PAO Phosphorus-Accumulating Organisms

PC Preview Control

PCI Peripheral Component Interconnect

PCT Predictive Control Technology

PDCD Pre-decode

PFC Predictive Functional Control

PHA Poly-β-Hydroxy Alkanoates

PID Proportional-Integral-Derivative (and Process Identity)

PLB Processor Local Bus

PLC Programmable Logic Controller

PNN Probabilistic Neural Network

QDMC Quadratic Dynamic Matrix Control

RACC Register Access

RAM Random Access Memory

RAS Recycled (Returned) Activated Sludge

RBF Radial Basis Function

RBFNN Radial Basis Function Neural Network

RISC Reduced Instruction Set Computer

RLS Recursive Least Squares

RMPCT Robust Model Predictive Control Technology

RNN Recurrent Neural Network

RTL Register Transfer Level

RTRL Real Time Recurrent Learning

RUDD_PRED Yaw Rate Output Predictions


RUDD_REF Yaw Rate Reference Signal

RUDD_YAW_CONT Rudder Control Signal

SDK Software Development Kit

SDRAM Synchronous Dynamic Random Access Memory

SDU Steam Deactivation Unit

SLC Single-Loop Controller

SO Soluble Oxygen

SOA Service-Oriented Architecture

SOAP Simple Object Access Protocol

SoC System-on-a-Chip

SQP Sequential Quadratic Programming

SRAM Static RAM

TCP Transmission Control Protocol

TCR Timer Control Register

TDL Tapped Delay Lines

TDNN Tapped Delayed Neural Network

TLB Translation Lookaside Buffer

TSR Timer Status Register

UART Universal Asynchronous Receiver and Transmitter

UAV Unmanned Aerial Vehicle

UCS Unit Cell Size

UDP User Datagram Protocol

UPC Unified Predictive Control

UPnP Universal Plug and Play

VFA Volatile Fatty Acid

VHDL Very-High-Speed Integrated Circuit Hardware Description Language

WAS Waste Activated Sludge

WB WriteBack

WS Web Services

WSDL Web Services Description Language

WWTP Wastewater Treatment Plant

XCL Xilinx Cache Link

XML Extensible Markup Language

XPS Xilinx Platform Studio

XST Xilinx Synthesis Tool


CHAPTER 1

INTRODUCTION

1.1 Introduction

Model predictive control (MPC) is an established advanced control strategy based on the optimization of

an objective function within a specified horizon and has been recognized as the winning alternative for

constrained multivariable control systems ([Dones et al., 2010]; [Maciejowski, 2002]; [Normey-Rico and

Camacho, 2007]; [Seborg et al., 2004]; [Wang, 2009]). Its main strength shows when it is applied to problems with
a large number of manipulated and controlled variables, constraints imposed on both manipulated and controlled
variables, changing control objectives and/or equipment failures, and time delays [Grimble and Ordys, 2001].

MPC was originally developed in the 1970s [García et al., 1989] to meet the specialized needs of power plants

and petroleum industries but it is now widely adopted in industries as an effective means to deal with large

multivariable constrained control problems.

The most straightforward MPC design techniques are those that are based on a linear mathematical model

of the controlled process [Muske and Rawlings, 1993]. However, the characteristics of many industrial

applications in areas such as robotics, aerospace, batch processing, petrochemicals, automotives, chemicals, etc.,
are highly nonlinear and time-varying in nature. In these cases the linear MPC design techniques result in
inefficient control algorithms [Kalra and Georgakis, 1994] and methods based on nonlinear models of the
processes are preferred ([Dones et al., 2010]; [Potočnik and Grabec, 2002]). In either of the linear or nonlinear

cases, the use of a model of the process does not fully reflect the actual process operation over long periods of

time. Therefore, the algorithms obtained by MPC design techniques which are based on a mathematical model of

the controlled process [Muske and Rawlings, 1993] are not very efficient because these methods cannot guarantee

stable control outside the range of the model validity ([Kalra and Georgakis, 1994]; [Su and Wu, 2009]). For these
reasons, adaptive algorithms which could be based on a continuous model updating process and redesign of the
MPC strategy before a new control action is applied to the real plant would result in a better plant performance.

Up to now, the development of such algorithms has been largely restricted to systems with large sampling times
because of their high computation time [Dones et al., 2010]. When this high computation time is longer than the

time constant of the controlled variables, the application of such algorithms is of no use. These MPC algorithms

include many calculations that can be executed in parallel and therefore their execution time can be significantly

reduced below the time constants of the controlled variables in a number of industrial applications, especially in

the chemical and petrochemical industry, if parallel processing techniques are applied. However, until the recent
past, using parallel computing facilities merely for control applications was not economically feasible.

The recent development and availability of multi-core processors, the Service Oriented Architecture

(SOA) for clustering multicore processors and the Field Programmable Gate Array (FPGA) technologies at very


competitive prices prompt a rethinking of the possibility of developing adaptive MPC algorithms. These MPC control

strategies would first involve the frequent updating of the model used to design the MPC algorithms, at every

sampling instant if possible, and next the application of the design method by using the updated model to

reconfigure the algorithm and compute the next control action by using the reconfigured algorithm.

Even if this approach is used, the use of traditional modeling methods used in several variations of the

MPC designs ([Camacho and Bordons, 2007]; [Grimble and Ordys, 2001]; [Maciejowski, 2002]) cannot model

accurately the strong interactions among the process variables as well as the short and tight operating constraints.

The best approach would be the use of highly complicated validated models of groups of nonlinear differential

and partial differential equations, and the invention of new MPC design methods based on these models. However,
the computational burden of modeling dynamic systems with relatively short sampling intervals becomes too
enormous to be handled even by the new multi-core, clustering and FPGA technologies. In order to exploit these

technologies, instead of using groups of differential equations, one could consider developing other accurate

nonlinear models, whose computational burden would of course be higher than that of the linear models but less

than that of the groups of differential equations. If, however, this computational burden is kept to a certain level,

then the development of model-based adaptive MPC control algorithms might become feasible for certain classes

of applications with the current multi-core computers, service-oriented architecture (SOA) clustering networks

and FPGA technologies.

A recent approach to modeling nonlinear dynamical systems is the use of neural networks (NN). The

application of neural networks (NN) for model identification and adaptive control of dynamic systems has been

studied extensively ([Jin and Su, 2008]; [Mjalli, 2006]; [Narendra and Parthasarathy, 1990]; [Nørgaard et al.,

2000]; [Omidvar and Elliott, 1997]; [Salahshoor et al., 2010]; [Sarangapani, 2006]; [Spooner et al., 2002]; [Su

and Wu, 2009]; [Suárez et al., 2010]; [Yu and Yu, 2007]). As demonstrated in [Nørgaard et al., 2000], [Omidvar

and Elliott, 1997], [Sarangapani, 2006] and [Spooner et al., 2002], neural networks can approximate any nonlinear

function to an arbitrarily high degree of accuracy. The adjustment of the NN parameters results in differently shaped

nonlinearities achieved through a gradient descent approach on an error function that measures the difference

between the output of the NN and the output of the true system for given input data or input-output data pairs

(training data).

In the absence of operating data from the transient and steady state operation of the system to be

controlled, data for training and testing the NN model can be obtained from the system by simulating the

validated model of the groups of differential equations which are usually derived from the first principles on

which the operation of the physical process is based. Such approaches are reported in [Jin and Su, 2008], [Su and

Wu, 2009], [Suárez et al., 2010], [Guarneri et al., 2008] and [Yüzgeç et al., 2008]. The use of nonlinear NN
models can replace the first-principles model equally well and can reduce the computational burden, as argued in
[Yüzgeç et al., 2008] and [Lu and Tsai, 2008]. This is because a nonlinear discrete NN model of high accuracy is
available immediately after, or at each instant of, the network training process.


The aim of the research work presented in this thesis was to provide new model-based adaptive MPC
algorithms and computer system architectures for their implementation, with the purpose of achieving algorithm
execution times well below the limits of the sampling times that are required for the stable operation of typical

industrial processes. The specific research objectives and the claimed scientific contributions are presented in the

next sections.

1.2 Research Objectives

The following are the specific objectives of the research:

1. To develop new and efficient but less computationally intensive neural network-based model identification

algorithms for modeling nonlinear dynamical systems. In this framework, two neural network-based

identification algorithms are proposed, namely: the adaptive recursive least squares (ARLS) algorithm and the

modified Levenberg-Marquardt algorithm (MLMA).

2. To develop new and efficient but less computationally intensive neural network-based model predictive control

(MPC) algorithms for nonlinear dynamical system control. In this research, two MPC algorithms are

proposed, namely: the neural network-based adaptive generalized predictive control (AGPC) and neural

network-based nonlinear adaptive model predictive control (NAMPC). The AGPC is based on the recursive

solution of a Diophantine equation combined with a constrained sequential quadratic programming (SQP)

optimization technique to obtain the AGPC optimal control signal. The nonlinear adaptive model predictive

control (NAMPC) algorithm on the other hand is based on the trust-region method which uses the full-

Newton method and guarantees the positive definiteness of the second-order Hessian matrix by determining λ

iteratively in an open neighbourhood of a global minimum. The NAMPC optimal control signal is obtained

by a direct nonlinear second-order optimization technique.

3. Performance evaluation of the proposed neural network-based identification and control algorithms by applying

them on three highly nonlinear dynamic systems considered as the case studies of this work, namely:

i) A fluidized bed furnace reactor (FBFR) of the steam deactivation unit (SDU) used for preparing catalyst for

evaluation in a fluid catalytic cracking (FCC) pilot plant;

ii) An activated sludge wastewater treatment plant (AS-WWTP) in accordance with the European wastewater

treatment standard, and

iii) The auto-pilot control unit of a nonlinear F-16 fighter aircraft.


4. To propose computer system architectures for the implementation of the proposed neural network-based

identification and model-based predictive control algorithms and assess the feasibility of the implementation

by performing hardware-in-the-loop (HIL) simulations. The following two architectures are proposed:

i) A service-oriented architecture (SOA) network based on the device profile for web services (DPWS); and

ii) A real-time embedded processor platform based on the Xilinx Virtex-5 FX70T ML507 FPGA board.

1.3 Scientific Contributions

1. A new proposed neural network-based adaptive recursive least squares (ARLS) model identification algorithm

can be used to effectively identify nonlinear dynamic systems with sampling intervals as low as 2.5 minutes.

2. A new proposed neural network-based modified Levenberg-Marquardt algorithm (MLMA) for model
identification, which can be used to accurately identify the models of time-varying systems with
sampling intervals as low as 5 seconds.

3. A new adaptive generalized predictive control (AGPC) algorithm with low computational requirements which

can be used for the adaptive control of linear systems as well as systems with smooth nonlinearities and short

sampling intervals.

4. A new nonlinear adaptive model predictive control (NAMPC) algorithm which can be used to efficiently

control nonlinear systems with non-smooth nonlinearities at the expense of higher computation time than the

adaptive generalized predictive control (AGPC) algorithm.

5. A networked control system which utilizes the service-oriented architecture (SOA) technology based on device

profile for web services (DPWS) for the implementation of the proposed algorithms in (1) to (4) with reduced

data transmission overhead compared to that of a conventional SOA architecture.

6. A comparison of the processing speed and hardware resource utilization for the realization of an embedded

MicroBlaze soft processor system versus an embedded PowerPC™440 hard processor system on a Xilinx
Virtex-5 FPGA. This comparison has shown that the IBM PowerPC™440 processor outperforms the

Xilinx MicroBlaze processor in terms of processing speed and hardware resource utilization, rendering it more

appropriate for applications such as the proposed algorithms in (1) to (4). The FPGA-industry-standard
Dhrystone benchmark validation tests demonstrate the compliance and high performance of the embedded
processor system designs reported in this dissertation.


7. A proposed new technique for the synthesis and generation of a hardware intellectual property (IP) core for the

adaptive generalized predictive control (AGPC) algorithm using the Xilinx AccelDSP synthesis tool.

8. The development of a System Generator model of the AGPC using the AGPC IP core in (7) above and an FPGA
hardware-in-the-loop simulator for verifying the MPC-based control system using the Xilinx Virtex-5 FX70T

ML507 FPGA board.

9. Hardware and software techniques for embedding a hard-core PowerPC™ 440 processor in a Xilinx FPGA and

programming an adaptive generalized predictive control (AGPC) algorithm on an FPGA with either the
PowerPC core or the MicroBlaze processor by using the Embedded Development Kit (EDK). This design
methodology presents a new way of integrating and embedding MPC algorithms into embedded processor

systems.

10. The closed-loop implementation and application of the proposed model identification and adaptive MPC

control algorithms listed in (1) to (4) to the three industrial case studies as follows:

i) The temperature control of the fluidized bed furnace reactor (FBFR) implementation using a personal

computer;

ii) The temperature control of the fluidized bed furnace reactor (FBFR) over the proposed service-oriented

architecture (SOA) cluster network based on the device profile for web services (DPWS);

iii) The dissolved oxygen concentration control in the third aerobic reactor of an activated sludge wastewater

treatment plant (AS-WWTP) implementation using a personal computer;

iv) The auto-pilot control system of a nonlinear F-16 fighter aircraft implementation using a personal

computer system; and

v) The embedded PowerPC™440 processor–AGPC co-processor system implementation on a Xilinx Virtex-5

FX70T ML507 FPGA board for the auto-pilot control system of a nonlinear F-16 fighter aircraft.

These closed-loop implementations have demonstrated that: 1) the proposed neural network-based model

identification and adaptive control algorithms can be applied for the control of any nonlinear dynamic system,

2) the proposed service-oriented architecture (SOA) cluster network based on the device profile for web

services (DPWS) can be used in an industrial network control system environment with the algorithms listed in

(1) to (4), and 3) the FPGA is a suitable platform for implementing the algorithms listed in (1) to (4) for the

nonlinear F-16 aircraft auto-pilot control.


1.4 Thesis Organization and Structure

The thesis is organized into seven (7) chapters and an appendix under four different headings. The thesis

begins with the Introduction in Chapter 1 and ends with Chapter 7 where the evaluation of results, discussions,

conclusions and recommendation are given. The rest of the thesis is organized as follows.

In this Chapter 1, the problem to be researched, the objectives of the research and the scientific
contributions made were presented.

Chapter 2 gives a literature survey on the researched problem and a concise and comprehensive treatment

of the basic concepts and the background knowledge required for reading this thesis.

Chapter 3 presents the two neural network-based identification algorithms proposed in this thesis, namely:

the adaptive recursive least squares (ARLS) and the modified Levenberg-Marquardt algorithm (MLMA). Three

validation techniques are also introduced and briefly discussed. The parameters for evaluating the performance of

the training algorithms are introduced and discussed in this chapter.

Then in Chapter 4, two model-based predictive control algorithms are proposed and formulated, namely:

the adaptive generalized predictive control (AGPC) and the nonlinear adaptive model predictive control

(NAMPC).

Chapter 5 presents the development of the computing platforms on which the neural network-based nonlinear
model identification and adaptive model predictive control algorithms can be realized.

Chapter 6 investigates the performance of the proposed algorithms when they are applied to three

industrial case studies, namely:

i) The temperature control of the fluidized bed furnace reactor (FBFR) of the steam deactivation unit of a

fluid catalytic cracking (FCC) pilot plant;

ii) The soluble oxygen (dissolved oxygen concentration) control of the third aerobic reactor of an

activated sludge wastewater treatment plant (AS-WWTP);

iii) The auto-pilot control system of a nonlinear F-16 fighter aircraft.

The performances are assessed by employing hardware-in-the-loop (HIL) simulation techniques.

Chapter 7 concludes the thesis. In this chapter the results are evaluated and discussed. Conclusions are

drawn and recommendations for further work are given. The main contributions and some limitations of the work

presented in this thesis are highlighted and briefly discussed.


Appendix A gives an overview of the Xilinx FPGA synthesis, model-based and embedded system design
tools, including a brief description of some blocks used from the MATLAB/Simulink and System Generator for DSP
libraries; a detailed description of the IBM PowerPC™ 440 embedded processor is also given.

Appendix B gives the mathematical model of the first case study; that is, the fluidized bed furnace reactor

(FBFR) of the steam deactivation unit (SDU) of a fluid catalytic cracking (FCC) pilot plant.

Appendix C discusses the second case study which is considered in this work; that is, the activated sludge

wastewater treatment plant (AS-WWTP) together with its complete model description within the framework of

the European wastewater management requirement.

In Appendix D, the main components of the nonlinear F-16 fighter aircraft together with their

descriptions and mathematical model are presented.

Finally, Appendix E presents the synthesis results and device utilization for mapping an adaptive

generalized predictive control (AGPC) to an embedded FPGA processor system. The embedded processor

application programmer interface (API) for the synthesized AGPC algorithm is also presented in this Appendix E.

1.5 Scientific Publications

The following is the list of the scientific publications that have been made within the framework of this

work. Four papers have been published in refereed conference proceedings, two in refereed journals, and one
has been submitted. These papers are listed under the following three categories:

1). Published Papers in Refereed Conference Proceedings

1. [Akpan and Hassapis, 2009] Akpan, V. A. and Hassapis, G. (2009). “Adaptive predictive control using

recurrent neural network identification”. In the Proceedings of the 17th Mediterranean Conference on

Control and Automation, Thessaloniki, Greece, 24 – 26, June 2009, pp. 61 – 66.

2. [Akpan and Hassapis, 2010] Akpan, V. A. and Hassapis, G. D. (2010). “Adaptive Recurrent Neural

Network Training Algorithm for Nonlinear Model Identification using Supervised Learning”. In the

Proceedings of the 2010 American Control Conference (ACC2010), Baltimore, Maryland, USA, 30 June –

02 July, 2010, pp. 4937 – 4942.

3. [Akpan et al., 2010] Akpan, V. A., Samaras, I. K., and Hassapis, G. D. (2010). “Implementation of Neural

Network-Based Nonlinear Adaptive Model Predictive Control over a Service-Oriented Computer

Network”. In the Proceedings of the 2010 American Control Conference (ACC2010), Baltimore,

Maryland, USA, 30 June – 02 July 2010, pp. 5495 – 5500.


4. [Samaras et al., 2009] Samaras, I. K., Gialelis, J. V., Hassapis, G. D. and Akpan, V. A. (2009). “Utilizing

semantic web services in factory automation towards integrating resource constrained devices into

enterprise information systems”. In the Proceedings of the 14th IEEE International Conference on

Emerging Technologies and Factory Automation (ETFA’2009), Palma de Mallorca, Spain, 22 – 26 Sept.,

2009, pp. 1 – 8.

2). Published Papers in Refereed Journals

1. [Akpan and Hassapis, 2011] Akpan, V. A. and Hassapis, G. D. (2011). “Nonlinear model identification and

adaptive model predictive control using neural networks”. ISA Transactions, vol. 50, no. 2, pp. 177 – 194.

2. [Akpan and Hassapis, 2011] Akpan, V. A. and Hassapis, G. D. (2011). “Training dynamic feedforward

neural networks for online nonlinear model identification and control applications”. International Reviews

of Automatic Control: Theory & Applications, vol. 4, no. 3, pp. 335 – 350.

3). Papers Submitted

1. [Akpan et al., 2011] Akpan, V. A., Samaras, I. K., and Hassapis, G. D. (2011). “A service-oriented

architecture cluster network for industrial control applications”. European Journal of Control (Submitted).


CHAPTER 2

BACKGROUND OF THE RESEARCH

2.1 Introduction

In this chapter, concise and comprehensive background knowledge is presented on the dynamic modeling

of industrial processes, the design of control algorithms and platforms and techniques used for their

implementation. This knowledge is considered fundamental for the non-expert to understand the research work

presented in this thesis. In the following, the major contents of each of the 11 sections of this chapter are briefly

presented.

Section 2.2 introduces the historical background of modern control engineering and extends these

concepts to model predictive control (MPC). The argument for introducing MPC is briefly highlighted. An

overview of the MPC is first presented and then the basic MPC control strategy is illustrated schematically.

In Section 2.3, the three basic types of process models used for modeling systems intended for MPC

applications, namely: empirical, state-space and input-output models, are briefly introduced. It is argued in this section

that neural networks (NN) have proven to be universal approximators for modeling nonlinear systems based on

the input-output model type.

Neural networks (NN) are discussed in Section 2.4. The section begins with an overview of NN and the
concept of the perceptron, covering the single-layer perceptron and the multilayer perceptron (MLP). The various forms of

supervised and unsupervised methods using NN are mentioned and the methods that are widely used in dynamic

system modeling for MPC designs and applications are briefly discussed with emphasis on their advantages and

disadvantages over the other methods considered. Although, both the supervised and unsupervised methods are

applicable in diverse application areas, it is argued in this section that the supervised method is suitable for the

present work in which system models are used for MPC design. The concept of NN training is briefly introduced

using two basic NN training algorithms, namely: 1) backpropagation with momentum (BPM) for training static

and feedforward NN, and 2) teacher-forced method derived from real-time recurrent learning for training dynamic

NN in real-time. Based on the remarks from the two training methods, the teacher-forced RTRL method is

adopted for use in the present work; and its proposed structure is presented and briefly discussed.

For the input-output model type adopted in the current work, nonlinear dynamical system modeling using

NN typically involves mapping a set of input-output data to a particular model type or structure. Thus, in Section

2.5, a general system description is presented in terms of a family of dynamic model structures and the meaning

of the mathematical notations that define the model structures.

A brief remark on disturbances, which are characteristic of industrial processes, is also given in this

section. On the basis of the disturbance model, three model structures are identified and briefly discussed for use


with NN in this study, namely: neural network-based nonlinear autoregressive with exogenous inputs (NNARX)

model, neural network-based nonlinear autoregressive moving average with exogenous inputs (NNARMAX)

model, and the neural network-based nonlinear output error (NNOE) model.

Having introduced the basic MPC strategy and modeling techniques, the different technologies, platforms

and techniques for implementing MPC algorithms are investigated in Section 2.6. In this section, computer

implementation of MPC algorithms, related technologies for implementing MPC algorithms, and field

programmable gate array (FPGA) implementation of MPC algorithms are considered. It is argued in the remarks

on the reviewed implementation strategies that service oriented network based fieldbuses and FPGAs could be

novel platforms for implementing and evaluating the performance of MPC algorithms. The former is suitable for

industrial network control systems (NCS) while the latter is suitable for the implementation of MPC algorithms

targeting real-time embedded control applications.

Section 2.7 gives an overview of the switched Ethernet architecture which is used in this study for

realizing a service-oriented architecture (SOA) cluster network based on device profile for web services (DPWS)

and outlines the benefits of their utilization in a NCS.

The field programmable gate array (FPGA), the second platform for implementing the MPC algorithms,

forms the discussion of Section 2.8. After careful analysis and product evaluation, Xilinx FPGAs are selected
for use in this work. An overview of the Xilinx model-based and embedded system design flow, together with the
relevant tools and design capabilities, is introduced and discussed in this section.

The applications of MPC are outlined in Section 2.9 while practical problems associated with current

MPC algorithms are highlighted in Section 2.10.

In Section 2.11, several NN control techniques proposed in literature to address some of the issues raised

in the Section 2.10 are presented. In this section, these NN-based techniques are classified and discussed under

four main classes, namely: direct adaptive control, indirect adaptive control, modular neural network (MNN)

controller design, backpropagation through time (BPTT) control scheme, and neural network-based adaptive

control designs. The advantages and disadvantages of the different control techniques are highlighted with some

remarks.

Section 2.12 concludes this chapter with state-of-the-art review of the main concepts of this chapter. The

successes and drawbacks of existing techniques on NN-based MPC and their implementations are reviewed.

Then, general frameworks of the new techniques proposed in this work to improve on existing techniques and at

the same time address the reported drawbacks are presented.


2.2 Model Predictive Control (MPC)

2.2.1 Historical Background of MPC

The development of modern optimal control methods can be traced back to the work of Kalman in the

early 1960s [Kalman, 1960a], who studied a Linear Quadratic Regulator (LQR) designed to minimize a quadratic

objective function. The solution to the LQR problem was shown to have powerful stabilizing properties for any

reasonable linear plant. A dual theory was then developed to estimate the plant states from noisy input and output

measurements, using what is now known as the Kalman Filter [Kalman, 1960b]. The combined LQR and Kalman

filter is called the Linear Quadratic Gaussian (LQG) controller. Constraints on the process inputs, states, and outputs
were not considered in the development of the LQG theory. Although the LQG theory provides an elegant and

powerful solution to the problems of controlling unconstrained linear plants, it had little impact on control

technology development in the process industries. The most significant reasons that the LQG theory failed have

been related to the culture of the industrial process control communities at that time in which instrument

technicians and control engineers either had no exposure to LQG concepts or regarded the LQG as impractical.

This led to the development in industry by [Cutler and Ramaker, 1980] of a more flexible, very powerful

and general criterion-based control method which does not suffer from the above-mentioned drawbacks and in
which the optimization problem is solved on-line at each control interval ([Grimble and Ordys, 2001]; [Qin and
Badgwell, 2003]). This new method was based on a linear model of the plant and it was called Dynamic Matrix

Control (DMC). The basic idea is to use the convolution of the time-domain step-responses of the process to

predict the future controlled variables (CVs) and to obtain the optimal movement of the manipulated variables

(MVs). Each different version of this new method is now generally referred to as Model Predictive Control

(MPC) ([Clarke et al., 1987a]; [Clarke et al., 1987b]; [Clarke and Mohtadi, 1989]; [García and Morshedi, 1986];
[García et al., 1989]; [Richalet et al., 1978]). In addition to the development of the MPC method, new process

identification technologies have also been proposed and developed to allow quick estimation of empirical

dynamic models from test data, thereby substantially reducing the cost of model development (see for example

[Camacho and Bordons, 2007]; [Ljung, 1999]; [Maciejowski, 2002]; [Normey-Rico and Camacho, 2007];

[Omidvar and Elliott, 1997]; [Nørgaard et al., 2000]; [Seborg et al., 2004]; [Spooner et al., 2002]; and [Wang,

2009]).

It has been shown in [Lewis and Syrmos, 2003] that the two optimal control design methods, LQG and

MPC, can guarantee closed-loop stability; but MPC explicitly takes into account the process model and

constraints, which is an important feature in many industrial processes ([Camacho and Bordons, 2007];

[Maciejowski, 2002]; [Seborg et al., 2004]; [Zheng and Morari, 1995]). Together with proportional-integral-

derivative (PID) controllers of different degree of freedom (DOF) [Normey-Rico and Camacho, 2007], MPC is

the most widely used control technique in process control industries ([Camacho and Bordons, 2007]; [Seborg et

al., 2004]; [Wang, 2009]). Indeed, in its basic unconstrained form, MPC is closely related to LQG control. In the


constrained case, however, MPC leads to an optimization problem which is solved on-line at each sampling

interval, and takes full advantage of the computational power available in today’s control computer hardware.

Although the development and application of MPC was driven by the process industry, the idea of controlling a
system by solving a sequence of open-loop dynamic optimization problems was not new. Propoi [Propoi, 1963]
described a moving horizon controller in 1963, while Lee and Markus [Lee and Markus, 1967] anticipated in 1967
the current MPC practice and the model predictive heuristic control later proposed by Richalet and co-workers
[Richalet et al., 1978]. Nowadays, MPC applications extend to a number of other embedded system
applications for biomedical instrumentation, telecommunication systems, and automotive controls ([Bemporad and
Morari, 1999], [Dones et al., 2010], [Froisy, 1994], [Qin and Badgwell, 2003]).

2.2.2 Overview of MPC Strategy

MPC is an established advanced control strategy based on the optimization of an objective function

within a specified horizon and has been recognized as the winning alternative for difficult multivariable control

systems with tight constraints ([García et al., 1989] and [Zheng and Morari, 1995]). MPC refers to a wide class of

optimal control algorithms that make use of explicit process model to predict future plant behaviour.

Although MPC is suitable for almost any kind of problem, it displays its main strength when applied to
problems with a large number of manipulated and controlled variables, constraints imposed on both manipulated

and controlled variables, changing control objectives and/or equipment (sensor/actuator) failures, and long time

delays. The many algorithms associated with MPC include: Model Predictive Heuristic Control (MPHC)

[Richalet et al., 1978], Dynamic Matrix Control (DMC), Quadratic Dynamic Matrix Control (QDMC), Model

Algorithmic Control (MAC), Predictive Functional Control (PFC), Preview Control (PC), Generalized Predictive

Control (GPC), Linear Quadratic Generalized Predictive Control (LQGPC), Dynamic Performance Predictive

Control (DPPC), Predictive Control Technology (PCT), Hierarchical Constraint Control (HIECOM),

Identification and Command (IDCOM), Multistep Multivariable Adaptive Control (MUSMAC), Multivariable

Receding Horizon Adaptive Control (MURHAC), Robust Model Predictive Control Technology (RMPCT), Unified
Predictive Control (UPC), etc. More details on these algorithms can be found in [Bemporad and Morari, 1999];

[Camacho and Bordons, 2007], [Froisy, 1994]; [Grimble and Ordys, 2001]; [Qin and Badgwell, 2003].

The most significant feature that distinguishes MPC from other control algorithms is its receding horizon control
approach and its long-range prediction concept. In addition, the superior performance of MPC in handling

constraint violations in a natural and systematic way, where constraints can be incorporated directly into the

objective function, makes it theoretically a perfect real-time optimal control strategy.

While the many MPC algorithms differ in certain details, the main ideas behind them are very similar.

The basic structure of the MPC scheme is shown in Fig. 2.1, where $R'(k)$, $R(k)$, $E(k)$, $U(k)$, $Y(k)$, $\hat{Y}(k)$ and $d(k)$ are the desired reference signal, filtered reference signal, prediction error, control input, system output, predicted output and noise/input disturbances respectively, and $k$ is the sampling index based on the new measurement data sample. The MPC scheme of Fig. 2.1 uses the explicit process model to predict the process output and calculates such a process input that makes the predicted output follow the desired reference signal according to the MPC strategy of Fig. 2.2.

In the basic MPC scheme ([Maciejowski, 2002]; [Normey-Rico and Camacho, 2007]; [Seborg et al., 2004]; [Wang, 2009]), the filtered reference signal $R(k)$ is calculated from the desired reference signal $R'(k)$ via a pre-filter to compensate for disturbances on the reference signal [Clarke et al., 1987b]. Here, the filtered reference signal is calculated by using a first-order low-pass digital filter defined as follows:

$$R(k) = \frac{B_m}{A_m} R'(k) \qquad (2.1)$$

where $R'(k)$ and $R(k)$ are the desired and filtered reference signals respectively; $A_m$ and $B_m$ are the denominator and numerator polynomials of the filter. In this way, the MPC is designed, in part, based on the filter tracking error capability, where $A_m$ and $B_m$ serve as tuning parameters used to improve the robustness and internal stability of the MPC controller respectively.
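To make the role of the pre-filter concrete, the following Python sketch applies a first-order low-pass filter of the form (2.1) to a step change in the desired reference signal. The pole value $a$, and the choice $A_m = 1 - aq^{-1}$, $B_m = 1 - a$ (which gives unity DC gain), are illustrative assumptions and not parameter values used elsewhere in this thesis.

import numpy as np

def prefilter_reference(r_desired, a=0.7):
    # First-order low-pass pre-filter R(k) = (Bm/Am) R'(k), cf. (2.1), realized
    # as the difference equation R(k) = a*R(k-1) + (1 - a)*R'(k).
    # The pole a = 0.7 is an illustrative assumption.
    r_filtered = np.zeros(len(r_desired))
    for k in range(len(r_desired)):
        prev = r_filtered[k - 1] if k > 0 else r_desired[0]
        r_filtered[k] = a * prev + (1.0 - a) * r_desired[k]
    return r_filtered

# Usage: smooth a unit step change in the set point.
r_prime = np.concatenate([np.zeros(5), np.ones(15)])
print(prefilter_reference(r_prime).round(3))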

The main idea behind the MPC strategy is based on a receding horizon principle illustrated in Fig. 2.2 and

can be summarized as follows with the assumption that an explicit, stable, proper and deterministic discrete-time

model of the process is available:

(i) At the current sampling time $k$, the NN model predictor uses the past $m$ inputs, $n$ outputs and the current system information to identify the nonlinear discrete-time NN model of the system.

(ii) Assuming that the identified NN model is stable, proper and deterministic, the NAMPC strategy uses the identified NN model to accurately predict the current system output $\hat{Y}(k)$ at the same sampling instant $k$. However, the AGPC uses a linear model based on the instantaneous linearization of the NN model around the current system operating point.

Fig. 2.1: Basic structure of MPC scheme.


(iii) At time $k + N_u - N_d$, the MPC strategy calculates a sequence of control inputs $U(k + N_u - N_d \mid k)$ consisting of the current input $U(k \mid k)$ and the future inputs $U(N_u - N_d \mid k)$. The current input $U(k) = U(k \mid k)$ is held constant after $N_u$ control moves, where $N_u$ is the maximum control horizon. The input $U(k)$ is calculated in such a way that a set of $\eta$-step ahead predicted outputs $\hat{Y}(k + \eta \mid k)$ approaches the desired reference signal in an optimal manner over a specified prediction horizon $\eta \in [N_d, N_p]$, where $N_d$ and $N_p$ are the minimum and maximum prediction horizons respectively, with $N_d$ usually taken as 1, that is, $N_d = 1$.

(iv) The predicted values are used to calculate the control moves by minimizing an objective function of the form:

$$J(U(k)) = \left[R(k) - \hat{Y}(k)\right]^T \kappa \left[R(k) - \hat{Y}(k)\right] + U^T(k)\,\rho\,U(k) \qquad (2.2)$$

subject to the constraints

$$\Delta U(k + N_u - N_d) = 0, \quad \eta \in [N_d, N_p] \qquad (2.3)$$

$$\Delta U_{\min} \le \Delta U(k) \le \Delta U_{\max}, \quad Y_{\min} \le Y(k) \le Y_{\max} \qquad (2.4)$$

where

$R(k) = [R(k + N_d) \;\; \cdots \;\; R(k + N_p)]^T$,

$\hat{Y}(k) = [\hat{Y}(k + N_d \mid k) \;\; \cdots \;\; \hat{Y}(k + N_p \mid k)]^T$,

Fig. 2.2: The general MPC control strategy.

$E(k) = R(k) - \hat{Y}(k) = [E(k + N_d \mid k) \;\; \cdots \;\; E(k + N_p \mid k)]^T$,

$\Delta U(k) = [\Delta U(k) \;\; \cdots \;\; \Delta U(k + N_u - N_d)]^T$

where $\Delta U$ is the change in the control signal vector; $\kappa$ and $\rho$ are the two weighting matrices penalizing changes on $\hat{Y}(k)$ and $U(k)$ in (2.2).

Although a sequence of $N_u$ moves is calculated at each sampling instant, only the first control move $U(k) = U(k \mid k)$ is actually implemented and applied to control the process. The remaining control signals are not applied because at the next sampling instant $k = k + 1$ a new output $Y(k+1)$ is known based on new measurements. The MPC strategy enters a new optimization loop while the remaining control signals $U(N_u - 1 \mid k)$ are used to initialize the optimizer. This is indeed the receding horizon principle inherent in the MPC strategy.
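A minimal numerical sketch of this receding horizon loop, written in Python, is given below. It is not the AGPC or NAMPC algorithm developed later in this thesis: the scalar linear plant $y(k+1) = 0.9y(k) + 0.1u(k)$, the horizons, the weights (here penalizing control increments, a common variant of (2.2)) and the input bounds are all illustrative assumptions. The sketch only shows how a cost of the form (2.2) is minimized anew at every sampling instant and how only the first control move is applied.

import numpy as np
from scipy.optimize import minimize

def predict(y0, u_seq):
    # Illustrative plant model (an assumption): y(k+1) = 0.9*y(k) + 0.1*u(k).
    y, out = y0, []
    for u in u_seq:
        y = 0.9 * y + 0.1 * u
        out.append(y)
    return np.array(out)

def mpc_step(y0, r, Nu=3, Np=8, kappa=1.0, rho=0.01, u_prev=0.0):
    # One receding-horizon step: minimize a (2.2)-like cost, return the first move.
    def cost(u_moves):
        # Hold the input constant after Nu moves, per the control-horizon idea.
        u_seq = np.concatenate([u_moves, np.full(Np - Nu, u_moves[-1])])
        y_hat = predict(y0, u_seq)
        du = np.diff(np.concatenate([[u_prev], u_seq]))
        return kappa * np.sum((r - y_hat) ** 2) + rho * np.sum(du ** 2)
    res = minimize(cost, np.zeros(Nu), bounds=[(-5.0, 5.0)] * Nu)  # input constraints
    return res.x[0]  # only U(k|k) is applied; the rest warm-start the next loop

# Closed loop: the optimization is re-solved at every sampling instant.
y, u = 0.0, 0.0
for k in range(20):
    u = mpc_step(y, r=1.0, u_prev=u)
    y = 0.9 * y + 0.1 * u  # the plant moves one step; a new measurement arrives
print(round(y, 3))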

2.3 MPC Process Models

MPC is a model-based control strategy which depends on the accuracy of the process model. Accurate

process model predictions can provide early warnings of potential problems, whereas an inaccurate process model
will not only lead to process control failure but can also lead to expensive economic losses and environmental
hazards.

The majority of the MPC algorithms used in industry are based on a linear mathematical model of the

controlled process ([Camacho and Bordons, 2007], [Froisy, 1994], [Seborg et al., 2004], [Qin and Badgwell,

2003]). However, the characteristics of many industrial systems are highly nonlinear. Therefore, algorithms

obtained by MPC design techniques which are based on a linear mathematical model of the controlled process are

not very efficient because these methods cannot guarantee stable control outside the range of the model validity

([Kalra and Georgakis, 1994], [Su and Wu, 2009]). For these reasons, adaptive algorithms which would be based on

a continuous updating of the nonlinear process model and redesign of the MPC strategy online before a new

control action is applied to the process would be the preferred ones. As noted in [Seborg et al., 2004], the benefits

of MPC algorithms lie in the accuracy of the process model; and for this reason, accurate nonlinear identification

of the process model forms an integral part of MPC development.

In practice, every possible form of process modeling appears in any given MPC formulation. A detailed

treatment of MPC formulation using linear models can be found in [Camacho and Bordons, 2007], [Maciejowski,

2002], [Normey-Rico and Camacho, 2007], [Seborg et al., 2004], and [Wang, 2009]. The most commonly used

linear models are the impulse response (convolution) model, the step response model, the transfer function model,

and the state-space models.


There are two main reasons for using linear models in MPC. On one hand, the control of a linear process

is relatively easy and, on the other hand, linear models provide good results when the plant operating point is in

the neighborhood of the model validity. If this is not the case, nonlinear models must be used. The control of

nonlinear systems is more complex, especially when the nonlinear process exhibits a dead time, because the

difficulties of handling dead times have to be added to the difficulties of controlling nonlinear processes.

Nonlinear systems are systems whose outputs are nonlinear functions of their inputs. Models can be
classified into first-principles input-output models, derived from the physicochemical laws governing the

relationship among their variables, or empirical models, derived from data obtained from the process. A major

mathematical obstacle to a complete theory of nonlinear processes is the lack of a superposition principle for

nonlinear systems. Because of this, the determination of models from process input/output data becomes a very

difficult task. The number of plant tests required to identify a nonlinear plant is much greater than that needed for

a linear plant. In an ideal situation, if the plant is linear, only a step test has to be performed in order to know the

step response of the plant. Because of the superposition principle, the response to a different size step can be

obtained by multiplying the response to the step test by the ratio of the step sizes. The use of nonlinear models for

MPC design is currently an active area of research which is also an objective in the current work. Some

approaches for modeling nonlinear systems for MPC design includes the empirical, state space, and the input-

output approaches ([Camacho and Bordons, 2007], [Ljung, 1999], [Maciejowski, 2002], [Normey-Rico and

Camacho, 2007], [Rossiter, 2004], [Zhu, 2001]). Next, we provide brief descriptions of the type of the linear and

nonlinear input-output modeling technique that have been used in this work and may be needed by the reader to

understand this research work.
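As a small numerical illustration of the superposition argument made above, the following Python fragment scales a unit-step response by the ratio of the step sizes to obtain the response to a larger step; the first-order response shape is an assumption made only for this example.

import numpy as np

t = np.arange(0.0, 10.0, 0.5)
unit_step_response = 1.0 - np.exp(-t)           # measured once from a step test
response_to_3x_step = 3.0 * unit_step_response  # scaled by the ratio of step sizes
print(response_to_3x_step[-1].round(3))         # settles near 3.0 for a linear plant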

Discrete-Time Input-Output Models

The nonlinear discrete-time models used for control can be viewed as mappings between those variables

that are available for predicting system behaviour up to the current time and those to be predicted at or after that

instant. This kind of model can be represented as a Nonlinear AutoRegressive Moving Average with eXogenous

input (NARMAX) model given by the following general form:

$$Y(k) = J[U(k-d-1), \ldots, U(k-d-m),\, Y(k-1), \ldots, Y(k-n),\, D(k-1), \ldots, D(k-n)] \qquad (2.8)$$

where $J(\cdot,\cdot,\cdot)$ is a nonlinear mapping function, $Y(k)$ is the output, $U(k)$ is the input, $d$ is the delay and $D(k)$ is the noise/disturbance input at a sampling instant $k$. The suitability of this model depends on the choice of the function $J(\cdot,\cdot,\cdot)$ and the order parameters $m$ and $n$. It should be noted that this equation covers a wide range of descriptions, depending mainly on the function $J(\cdot,\cdot,\cdot)$. Different choices of this function give rise to certain models

that are able to provide attractive formulations for predictive control ([Camacho and Bordons, 1993], [Camacho


and Bordons, 2007]). In the input-output models, system delay can be included in the model for delaying the plant

inputs or plant outputs for specific time steps.
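As a sketch of how the regressor implied by (2.8) can be assembled in practice, the following Python fragment collects the delayed input, output and disturbance terms at sample k. The orders m and n, the delay d, and the toy mapping J(.) used in the usage example are illustrative assumptions.

import numpy as np

def narmax_regressor(U, Y, D, k, d=0, m=2, n=2):
    # Regressor of the NARMAX form (2.8) at sample k:
    # [U(k-d-1),...,U(k-d-m), Y(k-1),...,Y(k-n), D(k-1),...,D(k-n)].
    past_u = [U[k - d - i] for i in range(1, m + 1)]
    past_y = [Y[k - i] for i in range(1, n + 1)]
    past_d = [D[k - i] for i in range(1, n + 1)]
    return np.array(past_u + past_y + past_d)

# Usage: the nonlinear mapping J(.) takes this vector as its argument; here a
# toy J(.) (an assumption) generates the output sequence recursively.
rng = np.random.default_rng(0)
U, D, Y = rng.standard_normal(50), 0.01 * rng.standard_normal(50), np.zeros(50)
for k in range(2, 50):
    phi = narmax_regressor(U, Y, D, k)
    Y[k] = 0.5 * np.tanh(phi @ np.ones(len(phi)))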

Several input-output model structures exist in literature for modeling nonlinear systems intended for MPC

designs and they include the Volterra models ([Bibes et al., 2005], [Floriani et al., 2000], [Li and Kashiwagi,

2005], [Ogunfunmi, 2007]); Wiener models ([Kalafatis et al., 1997], [Ogunfunmi, 2007]); Hammerstein models

([Al-Duwaish and Karim, 1997], [Bai, 2002], [Gómez and Baeyens, 1998], [Ogunfunmi, 2007], [Vörös, 1997]),

Hammerstein-Wiener models ([Bai, 1998], [Bai, 2002], [Zhu, 2002]), Wiener-Hammerstein model ([Enqvist and

Ljung, 2005], [Vandersteen et al., 1997]); local model networks [Camacho and Bordons, 2007]; neural network

models ([Hagan et al., 1996], [Haykin, 1999], [Ljung, 1999]), and neural network and fuzzy models ([Gupta et

al., 2003], [Spooner et al., 2002], [Tsoukalas and Uhrig, 1997]).

Two major problems [Ogunfunmi, 2007] are associated with the Volterra modeling approach: the first
concerns the measurement of the Volterra kernels of a given system, as no exact method of isolating the
individual Volterra operators exists, while the second concerns the large eigenvalue spread, which implies that
slow convergence and large misadjustments may be expected, especially for least mean squares (LMS)-type
adaptive algorithms. A problem with the Wiener modeling approach is that for even m all the homogeneous
function kernels with odd index numbers are equal to zero, while for odd m all the homogeneous function kernels
with even index numbers are equal to zero [Ogunfunmi, 2007]. The problem associated with the Hammerstein
model is that, because the Hammerstein model can be considered as a diagonal Volterra model whose off-diagonal
coefficients are all zero, the model is restricted to a reduced number of parameters only [Camacho and
Bordons, 2007]. Also, the parameterization (2.5) is not unique, since any parameter matrices scaled by some nonzero
scalar parameter provide the same input-output equation [Gómez and Baeyens, 1998]. The main difficulty in the
Hammerstein-Wiener (and Wiener-Hammerstein) model approaches is that the parameters of these models cannot
be uniquely determined, due to the difficulties in measuring the intermediate signal of the linear part ([Zhu, 2002],
[Haykin, 1999]). Although the local model networks technique allows the use of a linear predictive controller, thus
avoiding the problems associated with the computation time and optimality of a nonlinear solution, the
identification of local operating regimes can be a difficult task [Camacho and Bordons, 2007]. The neural network
(NN) modeling approach is an increasingly suitable and promising method for modeling complex
nonlinear systems, and has been recommended in many MPC and MPC-based adaptive control articles

([Camacho and Bordons, 2007], [Maciejowski, 2002], [Normey-Rico and Camacho, 2007], [Seborg et al., 2004],

[Spooner et al., 2002], and [Wang, 2009]) for MPC applications. Neural network (NN) models can be used to

capture the nonlinear dynamics of processes. Neural networks, coupled with training techniques, are excellent
and attractive tools for constructing models of nonlinear dynamic systems since they have the inherent ability to

approximate any nonlinear function to an arbitrary degree of accuracy. The NN modeling approach is proposed as

the modeling tool for nonlinear dynamical systems in this work and is briefly introduced in the next sub-section.


2.4 Neural Networks: An Overview

2.4.1 Neural Networks

A neural network is a massively parallel distributed processor made up of simple processing neurons (also

called units), which has a natural propensity for storing experiential knowledge acquired from its environment

through a learning process and making it available for use ([Gupta et al., 2003], [Hagan et al., 1996], [Haykin,

1999]). Neural networks are composed of many of these neurons. The standard neuron, shown in Fig. 2.3, is a processing element whose output $\hat{y}_i$ is calculated by multiplying its inputs $u_l$ by individual weights $w_{i,l}$, adding the bias term $w_{i,0}$, which takes care of the offset in the process model, summing up the results to obtain $a$ as a function of the input signals, and applying a nonlinear activation function $f_i(a)$ to the sum $a$. The mathematical interpretation can be expressed as follows:

$$a_i = \sum_{j=1}^{l} w_{i,j} u_j + w_{i,0} \qquad (2.9)$$

$$\hat{y}_i = f_i(a_i) = f_i\!\left( \sum_{j=1}^{l} w_{i,j} u_j + w_{i,0} \right) \qquad (2.10)$$

where $u_1, u_2, \ldots, u_l$ are the input signals; $w_{i,1}, w_{i,2}, \ldots, w_{i,l}$ are the synaptic weights of neuron $i$; and $w_{i,0}$ is the weight applied to a fixed input clamped to $+1$, called the bias. The activation function $f_i(v_i)$, in terms of the activation potential $v_i$, is most often monotonic and can take any of the following forms:

Logistic (standard sigmoid): $\; f_j(a) = \dfrac{1}{1 + e^{-\alpha a}} \qquad (2.11)$

Hyperbolic tangent sigmoid: $\; f_j(a) = \dfrac{2}{1 + e^{-2a}} - 1 \qquad (2.12)$

Linear: $\; f_j(a) = a \qquad (2.13)$

Signum (step): $\; f_j(a) = \mathrm{sgn}(a) = \begin{cases} 1, & a > 0 \\ 0, & a = 0 \\ -1, & a < 0 \end{cases} \qquad (2.14)$

where $\alpha$ is the slope parameter of the logistic sigmoid function.

The adjustment of the weights is called the training or learning process. For the simple single-layer neuron of Fig. 2.3 given by (2.10), the perceptron learning rule is adopted and stated as follows:

Fig. 2.3 A nonlinear model of a neuron.

1. For $j = 1$ to $l$: initialize the weights $w_{i,j}$ with small random values and set $w_{i,0}$ to a small initial number. End For.
2. Compute (2.10).
3. If $\hat{y}_i \neq y_i$ (the desired output), then
4. For $j = 1$ to $l$: (i) modify the weight, $w_{i,j} = w_{i,j} + y_i \cdot u_j$; (ii) modify the bias, $w_{i,0} = w_{i,0} + y_i$ (since the bias input is clamped to $+1$). End For.
5. Go to 2.
6. If $\hat{y}_i = y_i$, End.
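As an illustration, the following minimal Python sketch implements the perceptron rule above for a single neuron with a signum activation (2.14); the training data and loop structure are hypothetical and only meant to mirror steps 1-6.

```python
import numpy as np

def perceptron_train(U, Y, max_epochs=100):
    """Train a single signum neuron with the perceptron rule (steps 1-6)."""
    n_samples, n_inputs = U.shape
    w = np.random.uniform(-0.1, 0.1, n_inputs)  # step 1: small random weights
    w0 = 0.01                                   # step 1: small initial bias
    for _ in range(max_epochs):
        errors = 0
        for u, y in zip(U, Y):
            y_hat = np.sign(w @ u + w0)         # step 2: compute (2.10)
            if y_hat != y:                      # step 3: compare with target
                w += y * u                      # step 4(i): weight update
                w0 += y                         # step 4(ii): bias update (input +1)
                errors += 1
        if errors == 0:                         # step 6: all patterns correct
            break
    return w, w0

# Hypothetical linearly separable data (AND-like, with +/-1 targets)
U = np.array([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])
Y = np.array([1., -1., -1., -1.])
w, w0 = perceptron_train(U, Y)
```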

The original perceptron (Fig. 2.3) proposed by Rosenblatt [Rosenblatt, 1959] and the later Adaline (adaptive linear element) proposed by Widrow and Hoff [Widrow and Hoff, 1960] were later critically analyzed by Minsky and Papert [Minsky and Papert, 1969].

2.4.2 Multilayer Perceptron (MLP) Neural Networks

The single-input neuron of Fig. 2.3 can be combined with other single-input neurons to form several layers and architectures in different fashions ([Gupta et al., 2003], [Hertz et al., 1991], [Hagan et al., 1996], [Haykin, 1999]), resulting in a more complex network. The most common of these architectures is the multilayer perceptron (MLP) neural network, where the output of the previous unit serves as the input to the next unit and so on. An MLP NN with two, three or four layers of units is usually referred to as a two-, three- or four-layer network. The typical structure of a two-layer multi-input multi-output network is shown in Fig. 2.4, where the first layer is the hidden layer, since it lies between the inputs $u_l$ and the output layer that produces the outputs $\hat{y}_i(k)$. The MLP NN is fully connected, since all its inputs and all units in one layer are connected to all units in the next layer, and this network is often referred to as a feedforward network due to its structure.

Fig. 2.4: Feedforward multilayer perceptron neural network with a hidden and an output layer.

For a given set of inputs $u_l$, the $i$th output $\hat{y}_i(k)$ of the MLP NN of Fig. 2.4 can be expressed mathematically as:

$$\hat{y}_i(k) = F_i\!\left( \sum_{j=1}^{n_h} W_{i,j}\, f_j\!\left( \sum_{l=1}^{n_u} w_{j,l} u_l + w_{j,0} \right) + W_{i,0} \right) \qquad (2.15)$$

and the term $a$ defined in (2.9) is given here, from (2.15), as

$$a_j = \sum_{l=1}^{n_u} w_{j,l} u_l + w_{j,0} \qquad (2.16)$$

where the $w_{j,l}$ and the $W_{i,j}$ are the hidden and output weights, respectively; $w_{j,0}$ and $W_{i,0}$ are the hidden and output biases; $j \in [1, n_h]$ indexes the hidden neurons; $l \in [1, n_u]$ indexes the inputs; and $i \in [1, n_o]$ indexes the output neurons, whose number corresponds to the number of system outputs. Every $f_j(\cdot)$ is a hidden-layer sigmoidal activation function, which can be (2.11) or (2.12), and $F_i(\cdot)$ is an output-layer linear activation function, which can be (2.13) or (2.14).
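To make (2.15) concrete, the following sketch (a minimal illustration with hypothetical layer sizes and randomly initialized weights) computes the forward pass of the two-layer MLP with a hyperbolic tangent hidden layer (2.12) and a linear output layer (2.13).

```python
import numpy as np

n_u, n_h, n_o = 3, 5, 2          # hypothetical input/hidden/output sizes
w  = np.random.randn(n_h, n_u)   # hidden weights w_{j,l}
w0 = np.random.randn(n_h)        # hidden biases  w_{j,0}
W  = np.random.randn(n_o, n_h)   # output weights W_{i,j}
W0 = np.random.randn(n_o)        # output biases  W_{i,0}

def mlp_forward(u):
    """Two-layer MLP forward pass, eq. (2.15)."""
    a = w @ u + w0               # hidden activations, eq. (2.16)
    h = np.tanh(a)               # sigmoidal hidden layer, eq. (2.12)
    return W @ h + W0            # linear output layer, eq. (2.13)

y_hat = mlp_forward(np.array([0.5, -1.0, 2.0]))
```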

2.4.3 Supervised and Unsupervised Learning Methods Using Neural Networks

This sub-section briefly discusses supervised and unsupervised NNs, with emphasis on the types that are widely used in dynamic system modeling and adaptive control applications. The types of supervised learning methods discussed here are: the dynamic feedforward multilayer perceptron neural network (DFMLPNN), the recurrent neural network (RNN) and the tapped delay neural network (TDNN). The two types of unsupervised methods of interest discussed here are the generalized regression neural network (GRNN) and the radial basis function neural network (RBFNN).

Fig. 2.5 Dynamic feedforward neural network (DFNN) structure.

These two types are among the most widely used in constructing models for neural control applications (see [Narendra and Parthasarathy, 1990], [Ronco and Gawthrop, 1997]).

2.4.3.1 Dynamic Neural Networks for Supervised Learning

Supervised learning methods utilize the dynamic structures of the multilayer perceptron (MLP) NN introduced above. These structures include: the dynamic feedforward MLP NN (DFNN); the recurrent neural networks (RNN), such as the Hopfield, Jordan and Elman networks; and the tapped delay neural networks (TDNN). Although RNNs and TDNNs are extensions of the basic MLP NN, they are sometimes referred to as dynamic neural networks, since they contain feedback connections as well as temporal memory units for storing previous information.

2.4.3.1.1 Dynamic Feedforward Multilayer Perceptron Neural Networks (DFNN)

The dynamic feedforward neural network (DFNN) can be constructed from the basic MLP NN if some temporal tapped delay line (TDL) memory units and/or some internal feedback loops, such as self-feedback and self-recurrence, are incorporated into the MLP network as illustrated in Fig. 2.5 [Gupta et al., 2003]. As shown in the figure, the weighted summation associated directly with the state feedback signals and the synaptic weights of the hidden unit, denoted by $w_{j,l}$ and $w_{j,0}$, is first computed. Then the nonlinear operation $f_j(\cdot)$, using the activation function given by either (2.11) or (2.12), is applied to this summation. Note that the output of this hidden unit is the same as the internal state of the system. The network output is obtained by summing the output of the hidden unit weighted by the synaptic weights denoted by $W_{i,j}$ and $W_{i,0}$, and applying a linear activation function $F_i(\cdot)$ given by (2.13) or (2.14). The lateral recurrences block consists of the initial weights of the network and multipliers for multiplying the network input-output signals with the weights.

The mathematical description of the output of the network of Fig. 2.5 can be expressed as:

$$\hat{y}_i(k) = F_i(b_i) = F_i\!\left( \sum_{j=1}^{n_h} W_{i,j} f_j(a_j) + W_{i,0} \right) \qquad (2.17)$$

where

$$a_j = \sum_{l=1}^{n_l} w_{j,l} \varphi_l(k) + w_{j,0} \qquad (2.18)$$

where for simplicity $\alpha = 1$, but its significance as an adaptation parameter is discussed later in sub-section 2.4.4.1; $\varphi(k) = [\varphi_{l_m}(k)\;\; \varphi_{l_n}(k)]$ and $l = l_m + l_n$ is the length of the input vector to the network.

2.4.3.1.2 Recurrent Neural Networks (RNN)

In the context of model development for dynamic systems, when the MLP feed-forward neural network discussed in the previous sub-section is augmented with external feedback loops, the network is referred to as a recurrent neural network (RNN). Unlike the feed-forward NN, where there is an algebraic relationship between the input and output, the RNN architecture contains internal time-delayed feedback connections.

In recurrent neural networks the activation values are repeatedly updated until a stable point is reached, after which the weights are adapted. There are, however, recurrent networks where the learning rule is applied after each propagation (where an activation value traverses each weight only once), while external inputs are included in each propagation. In such networks, the recurrent connections can be regarded as extra inputs to the network, the values of which are computed by the network itself.

Suppose a network must be constructed that generates a control command depending on an external input, which is a time series $U(k), U(k-1), U(k-2), \ldots, U(k-m)$. With a feed-forward NN there are two approaches: 1) create inputs $\varphi(k-1), \varphi(k-2), \ldots, \varphi(k-m)$ which constitute the first $m$ values of the input vector; or 2) create inputs $\varphi, \varphi', \varphi'', \ldots, \varphi^{(m)}$, i.e., besides the current input $U(k)$, the first up to the $m$th derivatives are also added as inputs. Naturally, the computation of these derivatives is not a trivial task for higher-order derivatives. The disadvantage of these two approaches is that the input dimensionality of the feed-forward network is multiplied by the window length $m$, leading to a very large network which is slow and difficult to train, as pointed out in [Bengio et al., 1994], [Pearlmutter, 1995], [Song, 2010], and [Williams and Zipser, 1989]. The RNN provides a solution to this problem due to its recurrent connections. A window of inputs no longer needs to be presented; instead, the network is supposed to learn the influence of the previous time steps itself. One of the earliest RNNs is the auto-associator ([Anderson, 1977] and [Kohonen, 1977]), which consisted of a pool of neurons with weighted connections between each pair of units $i$ and $j$ ($i \neq j$). In 1982, Hopfield [Hopfield, 1982] brought together several earlier ideas concerning these networks and presented a complete mathematical analysis based on Ising-spin models [Amit et al., 1986]. However, the two most common RNN designs are the Jordan network [Jordan, 1986a] and the Elman network [Elman, 1990].

2.4.3.1.2.1 The Hopfield Network

The Hopfield dynamic neural network is a nonlinear dynamic system that has the potential for exhibiting a wide range of complex behaviours, depending on how the network parameters are chosen. The Hopfield network shown in Fig. 2.6 consists of a set of $n$ interconnected neurons which update their activation values asynchronously and independently of the other neurons. All neurons are both input and output neurons. The activation values are binary. Originally, Hopfield [Hopfield, 1982] chose activation values of 1 and 0, but using the values $+1$ and $-1$ presents some advantages discussed below.

Suppose that the state of the system is given by the activation values $\hat{Y}(k) = \hat{Y}_i(k)$. The net input $a_i(k+1)$ of a neuron $i$ at cycle $k+1$ is a weighted sum given by:

$$a_i(k+1) = \sum_{j \neq i} U_j(k) w_{j,i} + w_{i,0} \qquad (2.19)$$

A simple thresholding is then applied to the net input to obtain the new activation value $\hat{Y}_i(k+1)$ at time $k+1$, defined as:

$$\hat{Y}_i(k+1) = \begin{cases} +1, & \text{if } a_i(k+1) > U_i \\ -1, & \text{if } a_i(k+1) < U_i \\ \hat{Y}_i(k), & \text{otherwise} \end{cases} \qquad (2.20)$$

i.e., $\hat{Y}_i(k+1) = f(a_i(k+1))$. A neuron $i$ in the Hopfield network is called stable at time $k$ if, in accordance with equations (2.19) and (2.20),

$$\hat{Y}_i(k) = f(a_i(k+1)) \qquad (2.21)$$

In the Hopfield scheme, a state $\alpha$ is called stable if, when the network is in state $\alpha$, all neurons are stable. A pattern $U^p(k)$ is called stable if, when $U^p(k)$ is clamped, all neurons are stable. When the extra restriction $w_{j,i} = w_{i,j}$ is imposed, the behaviour of the dynamic system can be described by the following computational energy function [Hopfield, 1984]:

$$\Im(k) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \hat{Y}_j(k) \hat{Y}_i(k) w_{i,j} - \sum_{i=1}^{l} w_{i,0} \hat{Y}_i(k) \qquad (2.22)$$

The advantage of a $+1/-1$ model over a $1/0$ model is the symmetry of the states of the network: when some pattern $U(k)$ is stable, its inverse is also stable, whereas in the $1/0$ model this is not always true.

Fig. 2.6 The schematic diagram of the Hopfield network.

For example, the pattern $00\cdots00$ is always stable, but $11\cdots11$ need not be stable. Similarly, both a pattern and its inverse have the same energy in the $+1/-1$ model. Thus, removing the restriction of bidirectional connections (i.e., $w_{j,i} = w_{i,j}$) results in a system that is not guaranteed to settle to a stable state.
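As a brief illustration of the update rule (2.20) and the energy function (2.22), the following sketch (a minimal example with zero thresholds $U_i$, zero biases, and a symmetric weight matrix, all hypothetical) performs asynchronous updates and tracks the energy, which is non-increasing under these assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                 # impose the symmetry w_{j,i} = w_{i,j}
np.fill_diagonal(W, 0.0)          # no self-connections
Y = rng.choice([-1.0, 1.0], n)    # random initial +/-1 state

def energy(Y):
    """Computational energy function, eq. (2.22), with zero biases."""
    return -0.5 * Y @ W @ Y

for _ in range(50):               # asynchronous updates, eq. (2.20)
    i = rng.integers(n)
    a = W[i] @ Y                  # net input, eq. (2.19), threshold U_i = 0
    if a > 0:
        Y[i] = 1.0
    elif a < 0:
        Y[i] = -1.0               # if a == 0 the state is unchanged
print(energy(Y))
```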

Although the primary application of the Hopfield network is as an associative memory, it suffers from the following two problems: 1) the stored patterns may become unstable due to oscillations; and 2) spurious stable states appear (i.e., stable states which do not correspond to stored patterns). Solutions to these two problems have been proposed independently by [Bruce et al., 1986] and by ([Hopfield, 1984], [Hopfield et al., 1983]); however, the undesired patterns are only weakly removed and may become unstable again. This is a major obstacle to most applications of the Hopfield network, which require the network to be a stable system with multiple asymptotically stable equilibrium points.

Fig. 2.7 The basic architecture of the Jordan network.

Fig. 2.8 Unfolding action of recurrent neural networks with an additional layer at each time step.

2.4.3.1.2.2 The Jordan Network

One of the earliest RNNs is the Jordan network ([Jordan, 1986a], [Jordan, 1986b]) shown in Fig. 2.7. In the Jordan network, the activation values of the output units are fed back into the input layer through a set of extra input units called the state units. There are as many state units as there are output units in the network.

Fig. 2.9 The basic architecture of the Elman network.

The connections between the output and state units have a fixed weight of $+1$, and learning takes place only in the connections between the input and hidden units as well as between the hidden and output units. Because the Jordan network uses the back-propagation network (to be presented later) with the outputs clocked back to the inputs to generate temporal sequences, it represents a restricted class of supervised (or teacher-forced) recurrent networks. Moreover, due to the recurrent feedback to the state layer, the network "unfolds in time" by one layer at each time step, as illustrated in Fig. 2.8. Since the feedback connections in theory have infinite memory, training the network might require large storage memory depending on the number of steps (also called the truncation depth). As Fig. 2.8 shows, the network becomes complex and presents difficulties in training as well as in stability and convergence analysis.

2.4.3.1.2.3 The Elman Network

Another widely used dynamic RNN for modeling and control is the Elman network [Elman, 1990]. In the Elman network a set of context units is introduced; these are extra input units (neurons) whose activation values are fed back from the hidden units. Thus the network is very similar to the Jordan network, except that: 1) the hidden units instead of the output units are fed back; and 2) the extra input units have no self-connections. The basic architecture of the Elman network is shown in Fig. 2.9. The hidden units are connected to the context units with a fixed weight of value $+1$. The major learning process can be summarized as follows:

1) The context units are set to 0; $k = 1$;
2) Pattern $x^p$ is clamped and the feed-forward calculations are performed once;
3) The back-propagation learning rule is applied;
4) $k \leftarrow k + 1$; go to 2.

Fig. 2.10 Tapped delay neural network (TDNN).

The context units at step $k$ thus always hold the activation values of the hidden units at step $k-1$. In an Elman network, the hidden layer outputs are fed back through a one-step delay to dummy input nodes. The network also unfolds in time as illustrated in Fig. 2.8, with increased complexity similar to that discussed for the Jordan network.
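A minimal sketch (hypothetical layer sizes and randomly initialized trainable weights) of one Elman forward step, showing that the context units are simply a one-step-delayed copy of the hidden activations fed back with the fixed weight $+1$:

```python
import numpy as np

n_in, n_h, n_out = 2, 4, 1            # hypothetical layer sizes
W_in  = np.random.randn(n_h, n_in)    # input -> hidden (trainable)
W_ctx = np.random.randn(n_h, n_h)     # context -> hidden (trainable)
W_out = np.random.randn(n_out, n_h)   # hidden -> output (trainable)
context = np.zeros(n_h)               # step 1: context units set to 0

def elman_step(u):
    """One forward pass; the hidden-to-context feedback has fixed weight +1."""
    global context
    h = np.tanh(W_in @ u + W_ctx @ context)
    context = h.copy()                # context(k+1) = hidden(k)
    return W_out @ h

y = elman_step(np.array([0.3, -0.7]))
```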

2.4.3.1.3 Tapped Delay Neural Network (TDNN)

Tapped delay neural networks (TDNNs), first described in [Lang and Hinton, 1988] and [Waibel et al., 1989], can learn temporal behavior by using not only the present inputs but also past inputs. The basic structure of a TDNN is shown in Fig. 2.10, with time-delayed memory elements called tapped delay lines (TDL). TDNNs accomplish this by simply delaying the input signal while the hidden and output neurons are replicated across time. The NN architecture is usually a standard multilayer feedforward NN, but it can also be a radial basis function (RBF) network, probabilistic neural network (PNN), generalized regression neural network (GRNN), or another feed-forward network architecture. Since the TDNN has no recurrent feedback terms, it can easily be trained with standard algorithms such as the backpropagation (BP) algorithm to be discussed later. Although TDNNs are most suitable for classifying temporal patterns that consist of a sequence of fixed-dimensional feature vectors, such as phonemes [Haykin, 1999], they also find application in adaptive signal processing [Tsoukalas and Uhrig, 1997], and they have been widely used in identification and control applications (see [Narendra and Parthasarathy, 1990]).

As shown in Fig. 2.10, the TDNN consists of a multilayer feedforward neural network (MFNN) with some backward-shift delay operators. Compared with the Hopfield, Jordan and Elman networks, although the TDNN has time-delayed feedback of the network outputs, its training presents less complexity and requires less computational time and memory. The main disadvantage of using a TDNN is that prior knowledge about the system is necessary. Since there is no state feedback, the TDNN can be trained with the basic backpropagation algorithm.

Consider that $\hat{Y}(k) \in \Re$ in Fig. 2.10 is an internal state variable at time instant $k$. The delayed states $\hat{Y}(k), \hat{Y}(k-1), \ldots, \hat{Y}(k-n)$ fed as inputs to the TDNN can be used to define specific applications such as time-series prediction and analysis, identification and adaptive control, communication channels, nonlinear input-output function approximation, etc. Since this work is concerned with identification and adaptive control, the input-output equations of the TDNN illustrated in Fig. 2.10, with relative degree one and $\eta$-step-ahead output predictions, can be expressed as follows:

$$\hat{Y}(k+1) = \hat{J}[\theta(k), \varphi(k)] \qquad (2.23)$$

$$\hat{Y}(k+\eta) = \hat{J}[\theta(k), \varphi(k)] \qquad (2.24)$$

where $\hat{J}(\cdot,\cdot)$ is a nonlinear continuous and differentiable function, $\theta(k)$ denotes the weights and biases of the network parameters, and $\varphi(k) = [\varphi_{l_m}(k), \varphi_{l_n}(k)]$ denotes the state vector, with $\varphi_{l_n}(k) = [Y(k), Y(k-1), \ldots, Y(k-n)]$ containing the time-delayed samples of the NN outputs and $\varphi_{l_m}(k) = [U(k-d), U(k-d-1), \ldots, U(k-d-m)]$ containing the current NN inputs, and $d$ the system delay. The sampling time $T$ is assumed to be one, i.e. $T = 1$.
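The following sketch (hypothetical signal arrays and orders) builds the regressor $\varphi(k)$ of (2.23) from past inputs and outputs, which is the only data-handling step a TDNN needs beyond a standard feedforward pass:

```python
import numpy as np

def build_regressor(U, Y, k, m, n, d=1):
    """Assemble phi(k) = [U(k-d),...,U(k-d-m), Y(k),...,Y(k-n)], cf. eq. (2.23)."""
    phi_u = [U[k - d - i] for i in range(m + 1)]   # delayed inputs
    phi_y = [Y[k - i] for i in range(n + 1)]       # delayed outputs
    return np.array(phi_u + phi_y)

# Hypothetical recorded input/output signals
U = np.sin(0.1 * np.arange(100))
Y = np.cos(0.1 * np.arange(100))
phi = build_regressor(U, Y, k=50, m=3, n=2)        # fed to the feedforward MLP
```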

2.4.3.2 Neural Networks Based on Unsupervised Learning

The supervised learning methods just discussed are applied to cases where the inputs and target outputs are available, or can be obtained, for NN training, and they usually involve multiple layers of neurons. An unsupervised learning rule is one in which no target outputs are given; such networks are mostly single-layer. The discussion in this section explores unsupervised learning methods based on the Hebbian learning rule [Hebb, 1949], which states that "when the synaptic input and the neuron output are both active, the strength of the connection between the input and the output is enhanced." This implies that if the output of the single-layer network is active when the input is active, the weight connecting the two active nodes is enhanced. This allows the network to associate relationships between inputs and outputs, hence the name associative networks. The simplest unsupervised Hebb rule can be expressed mathematically as follows:


$$W_{AB_n} = W_{AB_o} + \beta x y \qquad (2.25)$$

where $W_{AB}$ is the weight connecting input $A$ to output $B$, $\beta$ is the learning constant, $x$ is the input, and $y$ is the output. The learning constant $\beta$ controls the rate at which the network learns. If $\beta$ is made large, few observations are needed to learn an association, but if $\beta$ is made small, many observations are needed.

If the weights between active neurons are only allowed to be enhanced, as in the equation above, there is no limit to their magnitude. Therefore, Grossberg [Grossberg, 1982] proposed the following weight-changing rule, which allows both learning and forgetting to be implemented:

$$W_{AB_n} = (1 - \alpha) W_{AB_o} + \beta x y \qquad (2.26)$$

In (2.26), $\alpha$ is the forgetting constant and controls the rate at which the memory of old information $(W_{AB_o})$ is allowed to decay away or be forgotten. Using this update rule, the network constantly forgets old information and continuously learns new information. The values of $\beta$ and $\alpha$ control the speed of learning and forgetting respectively and are usually set in the interval $[0, 1]$. The updating rule can be rewritten as:

$$\Delta W_{AB} = -\alpha W_{AB} + \beta x y^T \qquad (2.27)$$

This rule limits the magnitude of the weights to a value determined by $\alpha$ and $\beta$. Thus, solving (2.27) by setting $\Delta W_{AB} = 0$, we find the maximum weight to be $\beta/\alpha$ when $x$ and $y$ are both active.
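A minimal sketch (hypothetical scalar input and output streams) of the decaying Hebb rule (2.26), illustrating that the weight saturates near $\beta/\alpha$ when both signals stay active:

```python
alpha, beta = 0.1, 0.5       # forgetting and learning constants in [0, 1]
w = 0.0
for _ in range(200):
    x, y = 1.0, 1.0          # both input and output active
    w = (1 - alpha) * w + beta * x * y   # eq. (2.26)
print(w)                     # approaches beta / alpha = 5.0
```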

Several types of unsupervised learning methods exist, such as the instar and outstar learning rules, which can be combined to form an associative memory ([Kohonen, 1977], [Tsoukalas and Uhrig, 1997]); the bi-directional associative memory (BAM), composed of an associative network architecture with a crossbar structure [Kosko, 1988]; competitive networks [Hertz et al., 1991]; self-organizing maps (SOM) [Kohonen, 1982, 1984, 1995]; the probabilistic neural network (PNN) [Masters, 1993a, 1993b]; the radial basis function (RBF) network; and the generalized regression neural network (GRNN) [Specht, 1991].

While these neural networks can be used interchangeably for diverse applications, their superior performance is obtained only in certain applications. For example, the SOM (also called the Kohonen network) maps a high-dimensional input vector into a smaller-dimensional pattern of dimension one or two. The PNN is a Bayesian classifier integrated into a neural network architecture and is best suited for pattern classification where a representative training set is available. The RBF network has been proven to be a universal function approximator [Park and Sandberg, 1991] and can perform similar function mappings as an MLP neural network, although its architecture and functionality differ. The GRNN is best suited for function approximation tasks such as system modeling and prediction. For more detail on these networks see [Gupta et al., 2003], [Hagan et al., 1996] and [Haykin, 1999].

Among the above-mentioned unsupervised methods, the use of the radial basis function (RBF) network for modeling and identification of nonlinear dynamic plants has been reported in [Seng et al., 2002], while the application of the combination of the RBF with the GRNN in identification and control has been reported in [Seng et al., 1998]. Also, the use of the GRNN combined with model-based and/or direct adaptive control in diverse applications has been reported in ([Aggelogiannaki and Sarimveis, 2006], [Manzie et al., 2000], [Ronco and Gawthrop, 1997], [Salahshoor et al., 2010], [Venkateswarlu and Rao, 2005]). A theoretical comparison of the RBF and GRNN by [Tsai et al., 2002] has indicated that the RBFNN has some promising advantages and superior performance over the GRNN when it is used for modeling a process for MPC design. Nonetheless, these two types of unsupervised learning are briefly considered in the following two sub-sections, due to their frequent use in MPC designs and applications.

2.4.3.2.1 Generalized Regression Neural Network (GRNN)

The generalized regression neural network (GRNN) is a feed-forward NN best suited to modeling and prediction tasks, although it can also be used for pattern classification [Specht, 1991]. A typical GRNN is composed of four layers, as illustrated in Fig. 2.11. The first layer is the input layer and is fully connected to the second layer, the pattern layer, which has one neuron for each input pattern. This layer performs the same function as the first layer of RBF neurons: its output is a measure of the distance of the input from the stored patterns. The third layer is the summation layer and is composed of two types of neurons: S-summation neurons and a single D-summation neuron (division). An S-summation neuron computes the sum of the weighted outputs of the pattern layer, while the D-summation neuron computes the sum of the non-weighted outputs of the pattern neurons. There is one S-summation neuron for each output neuron and a single D-summation neuron. The last layer is the output layer; it divides the output of each S-summation neuron by the output of the D-summation neuron. The output of a GRNN is the conditional mean given by:

$$\hat{Y}(k) = \frac{\displaystyle\sum_{p=1}^{N} W_p^T \exp\!\left( -\frac{D_p^2}{2\sigma^2} \right)}{\displaystyle\sum_{p=1}^{N} \exp\!\left( -\frac{D_p^2}{2\sigma^2} \right)} \qquad (2.28)$$

where $W_p$ is the weight for pattern $p \in [1, N]$, $N$ is the number of input patterns, and the exponential term is a Gaussian function of the distance $D_p$ with width constant $\sigma$. Note that the calculation of the Gaussian is performed in the pattern layer denoted by P in Fig. 2.11, the multiplication and summation with the weight vector are performed in the summation layer denoted by S, and the division is performed in the output layer denoted by O.

The GRNN learning phase is similar to that of a probabilistic neural network (PNN). It does not learn iteratively as most neural networks do; instead, it learns by storing each input pattern in the pattern layer and calculating the weights in the summation layer. The equations for the weight calculations are given by the two following expressions:

(i) The pattern layer weights, set to the input patterns:

$$W_p = U(k)^T \qquad (2.29)$$

(ii) The summation layer weight matrix, which uses the training target outputs. Specifically, the matrix is the target output values $Y(k)$ appended with a vector of ones that connects the pattern layer to the D-summation neuron:

$$W_s = [\,Y(k) \;\; \text{ones}\,] \qquad (2.30)$$

Fig. 2.11: Generalized regression neural network (GRNN).
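Because the GRNN stores patterns rather than iterating, prediction reduces to the weighted average (2.28). The following sketch (hypothetical one-dimensional training data and an assumed width) implements it directly:

```python
import numpy as np

def grnn_predict(x, U_train, Y_train, sigma):
    """GRNN conditional-mean output, eq. (2.28)."""
    D2 = np.sum((U_train - x) ** 2, axis=1)      # squared distances D_p^2
    g = np.exp(-D2 / (2.0 * sigma ** 2))         # pattern-layer Gaussians
    return (Y_train * g).sum() / g.sum()         # S-sum divided by D-sum

# Hypothetical training data: noisy samples of a sine function
U_train = np.linspace(0, 2 * np.pi, 40).reshape(-1, 1)
Y_train = np.sin(U_train).ravel() + 0.05 * np.random.randn(40)
y_hat = grnn_predict(np.array([1.0]), U_train, Y_train, sigma=0.3)
```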

The recall performance of the network is very dependent on the width parameter. A small width parameter gives good recall of the training patterns but poor generalization; a larger width parameter gives better generalization but poorer recall. The choice of a good width parameter is necessary for good performance. Usually, the largest width parameter that gives good recall is optimal.

It should be noted that, with the proper choice of training data and width parameter, the network is able to generalize with very few training parameters. However, if no information about the function to be approximated is known, then a large training set must be chosen to guarantee the representation of the system behavior. This would make the network very large (many pattern nodes) and would require large memory and long recall times. Clustering techniques can be used to select a representative training set, thus reducing the number of pattern nodes. Finally, it is good practice to check the trained network for correct generalization through simulation over the trained region.

2.4.3.2.2 Radial Basis Function Neural Networks (RBF)

A radial basis function neural network (RBFNN), having the typical structure illustrated in Fig. 2.12, has been proven to be a universal function approximator [Park and Sandberg, 1991] and is an alternative to the feedforward MLP NN. The RBFNN is a multidimensional nonlinear function that maps the inputs to the outputs depending on the distance between the input vector and the center vector.

Fig. 2.12: Radial basis function neural network (RBFNN).

Therefore, it can perform similar function mappings as an MLP, but its architecture and functionality are very different. In the following, a brief discussion of the RBFNN is first presented, and then the differences between the feedforward MLP NN and the RBFNN are discussed.

Consider that the RBFNN shown in Fig. 2.12, with $m$-dimensional input $\varphi(k) \in \Re^m$ and a single output $\hat{y}(k) \in \Re$, can be represented by the weighted summation of a finite number of radial basis functions as follows:

$$\hat{y}(k) = f[\varphi(k)] = \sum_{j=1}^{m} w_j \, \phi\!\left( \left\| \varphi_j(k) - c_j(k) \right\| \right), \quad j = 1, 2, \ldots, m \qquad (2.31)$$

where $\phi(\cdot)$ is an arbitrary nonlinear function, $\|\cdot\|$ is the norm that is usually assumed to be Euclidean, $c_j(k) \in \Re^m$ denotes known vectors that represent the centers of the radial basis functions, $w_j$ is the weight parameter, and $\phi(\|\varphi_j(k) - c_j(k)\|)$ is the radial basis function of $\varphi_j(k)$ at time $k$, obtained by shifting $\varphi_j(k)$ by $c_j(k)$. However, if the individual elements of the input vector belong to different classes, then a weighted norm can be introduced from [Gupta et al., 2003] as follows:

$$\hat{y}(k) = f[\varphi(k)] = \sum_{j=1}^{m} w_j \, \phi\!\left( \left\| \varphi_j(k) - c_j(k) \right\|_{K_j} \right) \qquad (2.32)$$

where $K_j \in \Re^{m \times m}$ is a weight matrix and the weighted Euclidean norm is given by:

$$\left\| \varphi_j(k) - c_j(k) \right\|_{K_j}^2 = \left[ K_j \left( \varphi_j(k) - c_j(k) \right) \right]^2 = \left( \varphi_j(k) - c_j(k) \right)^T K_j^T K_j \left( \varphi_j(k) - c_j(k) \right) \qquad (2.33)$$

An RBF network is a two-layer network that has different types of neurons in the hidden layer and the output layer. The hidden layer, which corresponds to the MLP hidden layer, performs a non-linear local mapping. This layer contains radial basis function neurons, which most commonly use a Gaussian activation function $g(x)$. These functions are centered over receptive fields, which are areas in the input space that activate the local radial basis neurons.

$$g_j[\varphi_j(k)] = \exp\!\left( -\frac{\left( \varphi_j(k) - \mu_j \right)^2}{\sigma_j^2} \right) \qquad (2.34)$$

where $\mu_j$ is the center of a region called a receptive field, $\sigma_j$ is the width of the receptive field, and $g_j[\varphi_j(k)]$ is the output of the $j$th neuron. The basic design methods for RBFNNs can be summarized from [Haykin, 1999] as:

1) Random selection of fixed centers,

2) Self-organized selection of centers,

3) Supervised selection of centers, and

4) Regularized interpolation exploiting the connection between the RBF network and the Watson-Nadaraya

regression kernel.
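To make the construction concrete, the sketch below is a minimal illustration (hypothetical one-dimensional data, randomly selected fixed centers as in design method 1 above, and an assumed width): it evaluates the Gaussian hidden layer (2.34) and solves for the linear output weights by least squares, as described for the output layer in the next paragraph.

```python
import numpy as np

# Hypothetical 1-D training data and randomly selected fixed centers
X = np.linspace(-3, 3, 50).reshape(-1, 1)
Y = np.sinc(X).ravel()
centers = X[np.random.choice(len(X), 10, replace=False)]  # centers c_j
sigma = 0.8                                               # receptive-field width

def design_matrix(X):
    """Gaussian hidden-layer outputs g_j, eq. (2.34), for every input."""
    D2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-D2 / sigma ** 2)

G = design_matrix(X)
w, *_ = np.linalg.lstsq(G, Y, rcond=None)     # output weights by least squares
y_hat = design_matrix(np.array([[0.5]])) @ w  # RBF prediction, eq. (2.31)
```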

The output layer is a layer of standard linear neurons that performs a linear transformation of the hidden node outputs. This layer is equivalent to a linear output layer in an MLP, but the weights are usually solved for using a least squares algorithm rather than trained by back-propagation. The output layer may, or may not, contain biases.

Receptive fields center on areas of the input space where input vectors lie, and serve to cluster similar input vectors. If an input vector $\varphi(k)$ lies near the center of a receptive field $(\mu)$, then that hidden node will be activated. If an input vector lies between two receptive field centers, but inside the receptive field width $(\sigma)$, then both hidden nodes will be partially activated. When input vectors lie far from all receptive fields, there is no hidden layer activation and the RBF output equals the value of the output layer bias.

A radial basis function (RBF) network is a local network that is trained in a supervised manner. This contrasts with an MLP network, which is a global network. An MLP network performs a global mapping, meaning that all the inputs influence the output, while an RBF network performs a local mapping, meaning that only inputs near a receptive field produce activation.

The ability to recognize whether an input is near the training set, or in an untrained region of the input space, gives the RBF a significant benefit over the standard MLP. Since networks generalize improperly and arbitrarily when operating in regions outside the training area, no confidence should be placed in their outputs in those regions. When using an MLP, one cannot judge whether or not the input vector comes from these untrained regions and, therefore, whether the output contains significant information. On the other hand, an RBF can tell the user when the network is operating outside its training region, so the user knows when to ignore the output. This ability makes the RBF the network of choice for safety-critical applications or for applications with a high financial impact.

Designing an RBF neural network requires the selection of the width parameter of the radial basis function; this decision is not required for an MLP. The width should be chosen so that the receptive fields overlap, but no single function covers the entire input space. This means that several radial basis neurons have some activation for each input, but not all radial basis neurons are highly active for a single input.

Another choice to be made is the number of radial basis neurons. Depending on the training algorithm used to implement the RBF, this may, or may not, be a decision made by the designer. For example, the MATLAB Neural Network Toolbox [The MathWorks, 2010a] has two training algorithms. The first algorithm centers a radial basis neuron on each input vector; this leads to an extremely large network for input data composed of many patterns. The second algorithm incrementally adds radial basis neurons to reduce the training error to the preset goal. There are several network architectures that will meet a specified error criterion; they consist of different combinations of the radial basis function widths and the number of radial basis functions in the network.

The maximum number of neurons is the number of input patterns; the minimum is related to the error tolerance and the complexity of the mapping, and must be determined experimentally. A more complex map and a smaller tolerance require more neurons. The minimum width constant should allow the receptive fields to overlap the input patterns, and the maximum should not cover the entire input space. Excessively large widths can sometimes give good results for data with no noise, but such systems usually fail under real-world conditions in which noise exists. The reason the system can train well in noise-free cases is that a linear method is used to solve for the second-layer weights. The use of a regression method will minimize the error, but usually at the expense of large weights and significant over-fitting; this over-fitting becomes apparent when there is noise in the system. A smaller width will do a better job of alerting that an input vector is outside the training space, while a larger width may result in a network of smaller size and faster execution time.

2.4.4 Basic Neural Network Training Algorithms

The basic training algorithm for feedforward neural networks is the backpropagation (BP) algorithm and its variations, such as the delta rule, the generalized delta rule, and backpropagation with momentum (BPM). The basic training algorithms for recurrent neural networks are the backpropagation through time (BPTT) and the real-time recurrent learning (RTRL) algorithms. While the BPTT is an extension of the BP algorithm, the RTRL utilizes the generality of the BPTT to derive a real-time training algorithm for online applications, while not suffering from the growing memory requirements caused by the unfolding action of the BPTT.

In the following, two algorithms are presented: the variation of the BP algorithm called backpropagation with momentum (BPM), and the teacher-forced real-time recurrent learning algorithm derived from the RTRL method. The former is presented first to facilitate the derivation of the latter.

2.4.4.1 The Back-Propagation (BP) Algorithm with Momentum


An important generalization of the perceptron training algorithm was presented by Widrow and Hoff [Widrow and Hoff, 1960] as the least mean squares (LMS) learning procedure, also known as the delta rule. While the perceptron learning rule uses the output of the activation function (either $-1$ or $1$) for learning, the delta rule uses the net output without further mapping into the output values $-1$ or $1$. The adaptive operation introduced by the delta rule [Widrow and Hoff, 1960] implies a mechanism by which the weights $W_i$ can be adjusted, usually iteratively, to attain their correct values.

Minsky and Papert [Minsky and Papert, 1969] showed that a two-layer feedforward network can overcome many restrictions, but did not provide a learning rule for adjusting the weights from the input to the hidden layers. The problem is how to determine the coefficients $W_i$, $i = 1, 2, \ldots, n$, such that the input-output response is correct for a large number of data sets. The idea behind the solution is that the errors for the units of the hidden layer are determined by back-propagating the errors of the units of the output layer. For this reason the method is often called the back-propagation learning rule, and it can also be considered a generalization of the delta rule for non-linear activation functions and multilayer networks. It should be noted that when linear activation functions are used, a multi-layer network is no more powerful than a single-layer network.

For a given set of input-output training data pairs

$$Z^N = \left\{ [U(p), Y(p)] \right\}, \quad p = 1, 2, 3, \ldots, N \qquad (2.35)$$

where $U(p)$ and $Y(p)$ are the input and desired (or target) output values. For every given input-output data pair, the output of the network $\hat{Y}(k)$ differs from the target value $Y(k)$ by $[Y(k) - \hat{Y}(k)]$ at the time instant $k$. The delta rule uses a cost (or error) function based on these differences to adjust the weights. The error function $E$ is defined as the total squared error:

$$E(k) = \sum_{p=1}^{N} E^p(k) = \frac{1}{N} \sum_{p=1}^{N} \left[ Y^p(k) - \hat{Y}^p(k) \right]^2 \qquad (2.36)$$

where the index $p$ ranges over the set of $N$ input patterns and $E^p(k)$ represents the error on pattern $p$. In high-dimensional input spaces, the least mean squares (LMS) procedure is used to find the values of all the weights that minimize the error function by a method called gradient descent. The activation units are considered a set of nonlinear continuous and differentiable functions $f_j(a_j^p)$ of the total inputs, given by (2.11) or (2.12), so that the network output can be expressed as:

$$\hat{Y}_j^p(k) = f_j(a_j^p) \qquad (2.37)$$

where

$$a_j^p = \sum_{l=1}^{m} w_{j,l} \varphi_l(k) + w_{j,0} \qquad (2.38)$$

where $j \in [1, n_h]$ indexes the hidden neurons and $l \in [1, m]$, with $m$ the length of the input vector to the network. The idea is to make a change in each weight proportional to the negative of the derivative of the error, as measured on the current pattern, with respect to that weight. Thus for the generalized delta rule we have:


$$\Delta w_{j,l}^p = -\gamma \frac{\partial E^p(k)}{\partial w_{j,l}} \qquad (2.39)$$

where $\gamma$ is a proportionality constant called the learning rate. Let the error measure $E^p$ be defined as the total quadratic error for pattern $p$ at the output units:

$$E^p(k) = \frac{1}{2} \sum_{o=1}^{N_o} \left[ Y_o^p(k) - \hat{Y}_o^p(k) \right]^2 \qquad (2.40)$$

where $Y_o^p(k)$ is the desired output for unit $o$ when pattern $p$ is clamped. The partial derivative of $E^p(k)$ in (2.39) can also be expressed as:

$$\frac{\partial E^p}{\partial w_{j,l}} = \frac{\partial E^p}{\partial a_j^p} \frac{\partial a_j^p}{\partial w_{j,l}} \qquad (2.41)$$

From Eq. (2.38), the second factor in (2.41) is the output of the unit sending the signal through the weight $w_{j,l}$:

$$\frac{\partial a_j^p}{\partial w_{j,l}} = \hat{Y}_l^p(k) \qquad (2.42)$$

By defining

$$\delta_j^p = -\frac{\partial E^p}{\partial a_j^p} \qquad (2.43)$$

an update rule equivalent to the delta rule is obtained, resulting in a gradient descent on the error surface if the weight changes are made according to ([Phansalkar and Sastry, 1994]; [Wang et al., 2004]; [Yu et al., 1993]):

$$\Delta w_{j,l}^p = \gamma \delta_j^p \hat{Y}_l^p(k) \qquad (2.44)$$

where $\delta_j^p = Y_j^p(k) - \hat{Y}_j^p(k)$ is the difference between the target output and the actual output for pattern $p$.

The task now is to determine the value of $\delta_j^p$ for each unit $j$ in the network. There is a simple recursive computation of these $\delta$'s, which can be implemented by propagating error signals backward through the network. To compute $\delta_j^p$ in (2.43), the partial derivative is expressed, using the chain rule, as the product of two factors: one reflecting the change in error as a function of the output of the unit, and one reflecting the change in the output as a function of the change in the input:

$$\delta_j^p = -\frac{\partial E^p}{\partial a_j^p} = -\frac{\partial E^p}{\partial \hat{Y}_j^p} \frac{\partial \hat{Y}_j^p}{\partial a_j^p} \qquad (2.45)$$

The second factor in (2.45) is computed using (2.37):

$$\frac{\partial \hat{Y}_j^p}{\partial a_j^p} = f_j'(a_j^p) \qquad (2.46)$$


which is the derivative of the squashing function $f_j(\cdot)$ for the $j$th unit, evaluated at the net input $a_j^p$ to that unit. To compute the first factor of (2.45), two cases are considered.

First, assume that unit $j$ is an output unit $j = o$ of the network. In this case, it follows from the definition of $E^p$ in (2.40) that

$$\frac{\partial E^p}{\partial \hat{Y}_o^p} = -\left[ Y_o^p(k) - \hat{Y}_o^p(k) \right] \qquad (2.47)$$

Substituting (2.46) and (2.47) into (2.45) results in the expression

$$\delta_o^p = \left[ Y_o^p(k) - \hat{Y}_o^p(k) \right] f_o'(a_o^p) \qquad (2.48)$$

for any output unit $o$.

Secondly, assume that unit $j$ is a hidden unit $j = h$. The contribution of the unit to the output error of the network is not readily known. However, the error measure can be written as a function of the net inputs from the hidden to the output layer, $E^p = E^p(a_1^p, a_2^p, \ldots, a_j^p, \ldots)$, and the chain rule can be applied to obtain:

$$\frac{\partial E^p}{\partial \hat{Y}_h^p} = \sum_{o=1}^{N_o} \frac{\partial E^p}{\partial a_o^p} \frac{\partial a_o^p}{\partial \hat{Y}_h^p} = \sum_{o=1}^{N_o} \frac{\partial E^p}{\partial a_o^p} w_{ho} = -\sum_{o=1}^{N_o} \delta_o^p w_{ho} \qquad (2.49)$$

where $\hat{Y}_h^p$ is the output of the hidden layer for pattern $p$. Substituting (2.49) into (2.45) results in the following expression:

$$\delta_h^p = f_h'(a_h^p) \sum_{o=1}^{N_o} \delta_o^p w_{ho} \qquad (2.50)$$

Equations (2.48) and (2.50) give a recursive procedure for computing the $\delta$'s for all units in the network, which are then used to compute the weight changes according to (2.44). This procedure constitutes the generalized delta rule for a feed-forward network of nonlinear units.

The idea behind the back-propagation algorithm is that, when a learning pattern is clamped, the activation values are propagated to the output units, and the actual network output is compared with the desired output values, which results in an error $E_o(k)$ for each of the output units. The task then is to bring $E_o(k)$ to zero by changing the connections in the neural network in such a way that, at the next iteration, the error $E_o(k)$ will be zero for this particular pattern, by adapting the incoming weights (based on the delta rule) according to:

$$\Delta w_{ho} = \left[ Y_o(k) - \hat{Y}_o(k) \right] \hat{Y}_h(k) \qquad (2.51)$$

It should be noted that applying (2.51) alone changes only the hidden-to-output weights, not the weights from the input to the hidden units. In order to adapt the weights from the input to the hidden units, the delta rule is applied again, so as to exploit the universal approximation capability of the feed-forward network. However, the unknown value of $\delta$ for the hidden units is obtained using the chain rule, which distributes the error of an output unit $o$ to all the hidden units that it connects to; that is, a hidden unit $h$ receives a delta from each output unit $o$ equal to the delta of that output unit multiplied by the weight of the connection between those units, with the derivative of the activation function $f_j'(\cdot)$ applied, i.e.

$$\delta_h = f_j' \sum_{o}^{N_o} \delta_o w_{ho} \qquad (2.52)$$

The application of the generalized delta rule involves two processes: 1) the input $\varphi(k)$ is presented and propagated forward through the network to compute the output values $\hat{Y}_o^p(k)$ for each output unit; this output is compared with its desired value $Y_o(k)$, resulting in an error signal $\delta_o^p$ for each output unit; and 2) a backward pass through the network, during which the error signal is passed to each unit in the network and the appropriate weight changes are calculated.

Thus, the back-propagation with momentum (BPM) algorithm can be summarized as follows ([Phansalkar and Sastry, 1994]; [Wang et al., 2004]; [Yu et al., 1993]):

1) The weight of a connection is adjusted by an amount proportional to the product of the error signal $\delta$ on the unit receiving the input and the output of the unit sending the signal along the connection:

$$\Delta w_{j,l}^p = \gamma \delta_j^p \hat{Y}_l^p(k) \qquad (2.53)$$

2) If the unit is an output unit, the error signal is given by:

$$\delta_o^p = \left[ Y_o^p(k) - \hat{Y}_o^p(k) \right] f'(a_o^p) \qquad (2.54)$$

For the logistic sigmoidal activation function $f_j(\cdot)$ defined in (2.11), the output $\hat{Y}^p(k)$ can be expressed as:

$$\hat{Y}^p(k) = f(a^p) = \frac{1}{1 + e^{-a^p}} \qquad (2.55)$$

so that the derivative of (2.55) can be expressed as:

$$f'(a^p) = \frac{\partial}{\partial a^p} \left( \frac{1}{1 + e^{-a^p}} \right) = \frac{e^{-a^p}}{\left( 1 + e^{-a^p} \right)^2} = \hat{Y}^p(k) \left[ 1 - \hat{Y}^p(k) \right] \qquad (2.56)$$

such that the error signal for an output unit can be expressed as:

$$\delta_o^p = \left[ Y_o^p(k) - \hat{Y}_o^p(k) \right] \hat{Y}_o^p(k) \left[ 1 - \hat{Y}_o^p(k) \right] \qquad (2.57)$$

3) The error signal for a hidden unit is determined recursively in terms of the error signals of the units to which it is directly connected and the weights of those connections. For the sigmoid activation function:

$$\delta_h^p = f'(a_h^p) \sum_{o=1}^{N_o} \delta_o^p w_{ho} = \hat{Y}_h^p \left[ 1 - \hat{Y}_h^p(k) \right] \sum_{o=1}^{N_o} \delta_o^p w_{ho} \qquad (2.58)$$

4) The learning procedure requires that the change in weight be proportional to $\partial E^p / \partial w$, as in (2.39). True gradient descent requires that infinitesimal steps be taken. For practical purposes, the learning rate $\gamma$ in (2.39) is chosen as large as possible without leading to oscillation. To avoid oscillation at large $\gamma$, the change in weight is made dependent on the past weight change by adding a momentum term:

$$\Delta w_{j,l}(k+1) = \gamma \delta_j^p \hat{Y}_j^p(k) + \alpha \Delta w_{j,l}(k) \qquad (2.59)$$

where $k$ indexes the presentation number and $\alpha$ is a constant which determines the effect of the previous weight change. When no momentum term is used, it can take a long time before the minimum is reached with a low learning rate, whereas for high learning rates the minimum is never reached because of the oscillations. When a momentum term is added, the minimum is reached faster ([Phansalkar and Sastry, 1994]; [Wang et al., 2004]; [Yu et al., 1993]).
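A compact sketch of one BPM training run, implementing steps 1)-4) above; the network sizes, data, learning rate and momentum constant are hypothetical, and logistic units are used throughout so that (2.57) and (2.58) apply.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical one-hidden-layer network and a single training pattern
rng = np.random.default_rng(1)
w_h = rng.standard_normal((4, 2))   # input -> hidden weights
w_o = rng.standard_normal((1, 4))   # hidden -> output weights
dw_h, dw_o = np.zeros_like(w_h), np.zeros_like(w_o)
gamma, alpha = 0.5, 0.9             # learning rate and momentum constant

u, y = np.array([0.2, -0.4]), np.array([1.0])
for _ in range(100):
    h = sigmoid(w_h @ u)                                # forward pass
    y_hat = sigmoid(w_o @ h)
    delta_o = (y - y_hat) * y_hat * (1 - y_hat)         # eq. (2.57)
    delta_h = h * (1 - h) * (w_o.T @ delta_o)           # eq. (2.58)
    dw_o = gamma * np.outer(delta_o, h) + alpha * dw_o  # eq. (2.59)
    dw_h = gamma * np.outer(delta_h, u) + alpha * dw_h
    w_o += dw_o
    w_h += dw_h
```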

2.4.4.2 Teacher-Forced Real-Time Recurrent Learning (RTRL) Algorithm

Real-time recurrent learning (RTRL) is a class of learning algorithms in which the adjustments of the synaptic weights of a fully connected recurrent neural network are made in real time, while the network continues to perform its signal processing function. RTRL utilizes the generality of backpropagation through time (BPTT) while not suffering from its growing memory requirements for arbitrarily long training sequences. It is similar to the approach proposed by [McBride and Narendra, 1965] for tuning the parameters of general dynamic systems. The RTRL algorithm described in the following follows [Williams and Zipser, 1989] with unrestricted architectures (see [Haykin, 1999], [Pearlmutter, 1990] and [Pearlmutter, 1995] for more details). The approach is based on the teacher forcing method, where the actual output $\hat{Y}^p(k)$ of a unit is replaced by the teacher signal $Y^p(k)$ in subsequent computations of the behaviour of the network, whenever such a value exists.

Let the network have $n$ units with $m$ external input lines. Let $\hat{Y}(k)$ denote the $n$ outputs of the units in the network at time $k$, and let $U(k)$ denote the $m$ external input signals to the network at time $k$. Concatenating $\hat{Y}(k)$ and $U(k)$, the vector $Z(k)$ of dimension $m + n$ is formed, with $Z^N$ denoting the set of indices $p$ such that $Z_p$ is the output of a unit in the network, and $I$ the set of indices $p$ for which $Z_p$ is an external input; the indices on $\hat{Y}(k)$ and $U(k)$ are chosen to correspond to those of $Z(k)$, so that:

$$Z_p(k) = \begin{cases} U_p(k), & \text{if } p \in I \\ Y_p(k), & \text{if } p \in T(k) \\ \hat{Y}_p(k), & \text{if } p \in Z^N - T(k) \end{cases} \qquad (2.60)$$

Let $W$ denote the weight matrix for the network, with a unique weight between every pair of units and also from each input line to each unit. The net input to the $p$th unit at time $k$, for $p \in Z^N$, is given by:

$$a_p(k) = \sum_{l \in Z^N \cup I} w_{p,l} Z_l(k) \qquad (2.61)$$

so that the output at the next time step can be expressed as:

$$\hat{Y}_p(k+1) = f_p(a_p(k)) \qquad (2.62)$$

where $f_p(\cdot)$ is the unit's activation function; note that the external input at time $k$ does not influence the output of any unit until time $k+1$. Let $T(k)$ denote the set of indices $p \in Z^N$ for which there exists a specified target value $Y_p(k)$ of the $p$th output unit at time $k$, so that the time-varying error $E_p(k)$ can be expressed as:

$$E_p(k) = \begin{cases} Y_p(k) - \hat{Y}_p(k), & \text{if } p \in T(k) \\ 0, & \text{otherwise} \end{cases} \qquad (2.63)$$

This allows the target values to be specified at different times. Let the overall network error at time $k$ be given by:

$$J(k) = \frac{1}{2} \sum_{p=1}^{m+n} \left[ E_p(k) \right]^2 \qquad (2.64)$$

The minimization of (2.64) is achieved using the gradient descent method, by adjusting the weights $W$ along the negative of $\nabla_W J^{total}(k_0, k_1)$, based on the following objective function:

$$J^{total}(k_0, k_1) = \sum_{k=k_0+1}^{k_1} J(k) \qquad (2.65)$$

so that the overall weight change for any particular weight $w_{i,j}$ in the network can be written as:

$$\Delta w_{i,j} = \sum_{k=k_0+1}^{k_1} \Delta w_{i,j}(k) \qquad (2.66)$$

where $\Delta w_{i,j}(k) = -\alpha \, \partial J(k)/\partial w_{i,j}$ and $\alpha$ is a fixed positive learning rate, so that the gradient with respect to the weight

becomes:

$$-\frac{\partial J(k)}{\partial w_{i,j}} = \sum_{p \in Z^N} E_p(k) \frac{\partial \hat{Y}_p(k)}{\partial w_{i,j}} \qquad (2.67)$$

where $\partial \hat{Y}_p(k)/\partial w_{i,j}$ can be computed by differentiating (2.61) and (2.62) to obtain:


$$\frac{\partial \hat{Y}_p(k+1)}{\partial w_{i,j}} = f_p'(a_p(k)) \left[ \sum_{l \in Z^N - T(k)} w_{p,l} \frac{\partial \hat{Y}_l(k)}{\partial w_{i,j}} + \delta_{i,p} Z_j(k) \right] \qquad (2.68)$$

where $\delta_{i,p}$ denotes the Kronecker delta.

The assumption that the initial state of the network has no functional dependence on the weights implies that $\partial \hat{Y}_l(k_0)/\partial w_{i,j} = 0$. These equations hold for all $p \in Z^N$, $i \in Z^N$ and $j \in Z^N \cup I$. We can therefore define a dynamic system with variables $P_{i,j}^p$ given by:

$$P_{i,j}^p(k+1) = f_p'(a_p(k)) \left[ \sum_{l \in Z^N - T(k)} w_{p,l} P_{i,j}^l(k) + \delta_{i,p} Z_j(k) \right] \qquad (2.69)$$

with the initial conditions $P_{i,j}^p(k_0) = 0$. It follows that $P_{i,j}^p(k) = \partial \hat{Y}_p(k)/\partial w_{i,j}$ for every time step $k$ and all appropriate $i$, $j$ and $p$. The actual algorithm then consists of computing, at each time step $k$ from $k_0$ to $k_1$, the quantities $P_{i,j}^p(k)$ using (2.69) and the initial conditions, and then using the errors $E_p(k)$ to compute the weight changes:

$$\Delta w_{i,j}(k) = \alpha \sum_{p \in Z^N} E_p(k) P_{i,j}^p(k) \qquad (2.70)$$

The overall correction to be applied to each weight $w_{i,j}$ in the network is simply the sum of these individual $\Delta w_{i,j}(k)$ values for each time step along the trajectory. When each unit in the network uses the logistic squashing function, the derivative

$$f_p'(a_p(k)) = \hat{Y}_p(k+1) \left[ 1 - \hat{Y}_p(k+1) \right] \qquad (2.71)$$

is used in equation (2.69). The corresponding $P_{i,j}^p$ values are set to zero after they have been used to compute $\Delta w_{i,j}$.

The algorithm increments each weight $w_{i,j}$ by the amount $\Delta w_{i,j}(k)$ given by (2.70) at each time step $k$, without accumulating the values elsewhere and making the weight changes at some later time. The actual dynamics of the teacher-forced network when trained by the RTRL algorithm are given by (2.60) to (2.62). The main points of the teacher-forced RTRL are: 1) the desired output values $Y_p(k)$ are used in place of the actual network outputs $\hat{Y}_p(k)$ to compute future network activities; and 2) the corresponding $P_{i,j}^p(k)$ values are set to zero after each computation of the $\Delta w_{i,j}(k)$ values. The actual discrete-time RTRL algorithm can be found in [Williams and Zipser, 1989], while the continuous-time version of the RTRL algorithm can be found in [Pearlmutter, 1995]. A more elaborate description of the RTRL algorithm based on a state-space formulation is given in [Haykin, 1999].
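The following condensed sketch (a single fully connected layer of hypothetical size, logistic units, one external input, and an invented target signal) illustrates the core RTRL bookkeeping: propagating the sensitivities $P_{i,j}^p$ with (2.69) and applying the weight updates (2.70) at every step. The teacher-forcing substitution of (2.60) is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 1                                 # units and external inputs (hypothetical)
W = 0.1 * rng.standard_normal((n, n + m))   # weights from [unit outputs, inputs]
P = np.zeros((n, n, n + m))                 # sensitivities P[p, i, j] = dY_p/dw_ij
Y = np.zeros(n)
alpha = 0.05

for k in range(200):
    z = np.concatenate([Y, [np.sin(0.1 * k)]])   # Z(k): unit outputs + input
    a = W @ z                                    # net inputs, eq. (2.61)
    Y_new = 1.0 / (1.0 + np.exp(-a))             # logistic outputs, eq. (2.62)
    fprime = Y_new * (1 - Y_new)                 # eq. (2.71)
    # Sensitivity recursion, eq. (2.69): only the first n entries of z
    # (the unit outputs) carry sensitivities; inputs have zero derivative.
    P_new = np.einsum('pl,lij->pij', W[:, :n], P)
    for p in range(n):
        P_new[p, p, :] += z                      # Kronecker-delta term
    P = fprime[:, None, None] * P_new
    E = np.zeros(n)                              # eq. (2.63): target on unit 0 only
    E[0] = np.cos(0.1 * k) - Y_new[0]
    W += alpha * np.einsum('p,pij->ij', E, P)    # eq. (2.70)
    Y = Y_new
```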


2.5 System Mathematical Models and Neural Network-Based Nonlinear Model Predictors

2.5.1 General System Mathematical Models and Mathematical Notations

The method of representing the behaviour of dynamical systems by vector difference or differential mathematical relationships is well established in system and control theory ([Goodwin and Sin, 1984], [Haykin, 1999], [Ljung, 1999], [Camacho and Bordons, 2007], [Narendra and Annaswamy, 1989], [Narendra and Parthasarathy, 1990], [Seborg et al., 2004], [Wang, 2009]). These relationships constitute the so-called mathematical model of the system.

One very common method of modeling the behaviour of a $p$-input $q$-output multivariable plant in discrete time is by the family of the following general mathematical relationships [Ljung, 1999]:

$$A(z^{-1}) Y(k) = z^{-d} \frac{B(z^{-1})}{F(z^{-1})} U(k) + \frac{C(z^{-1})}{D(z^{-1})} e(k) \qquad (2.72)$$

where $Y(k)$ is the vector of the $q$ outputs at the time instant $k$ responding to the input vector $U(k)$; $e(k)$ is the noise disturbance vector; and $A(z^{-1})$, $B(z^{-1})$, $C(z^{-1})$, $D(z^{-1})$ and $F(z^{-1})$ are polynomial matrices given by

$$\begin{aligned}
A(z^{-1}) &= I + A_1 z^{-1} + \cdots + A_{n_a} z^{-n_a} \\
B(z^{-1}) &= B_0 + B_1 z^{-1} + \cdots + B_{n_b} z^{-n_b} \\
C(z^{-1}) &= I + C_1 z^{-1} + \cdots + C_{n_c} z^{-n_c} \\
D(z^{-1}) &= I + D_1 z^{-1} + \cdots + D_{n_d} z^{-n_d} \\
F(z^{-1}) &= I + F_1 z^{-1} + \cdots + F_{n_f} z^{-n_f}
\end{aligned} \qquad (2.73)$$

$d$ is the system delay; $A$, $C$, $D$ and $F$ are monic polynomial matrices in the backward shift operator $z^{-1}$, of dimensions $n_a \times n_a$, $q \times n_c$, $q \times n_d$ and $q \times n_f$ and of degrees $n_a$, $n_c$, $n_d$ and $n_f$, respectively; $B$ is an $n_b \times p$ stable polynomial matrix (i.e. all its zeros are inside the unit circle) of degree $r$. The term monic implies that the leading coefficients of $A$, $C$, $D$ and $F$ are identity matrices of appropriate dimension, to avoid division by zero, and also because the magnitude of $e(k)$ can be adjusted to compensate for this if necessary. In this discussion, it is assumed that: 1) the time delay $d$ of the system is known, i.e. $d = 1$; 2) the coefficients of the polynomial matrices $A(z^{-1})$, $B(z^{-1})$, $C(z^{-1})$, $D(z^{-1})$ and $F(z^{-1})$ are unknown; 3) the polynomial matrices $A(z^{-1})$, $B(z^{-1})$, $C(z^{-1})$, $D(z^{-1})$ and $F(z^{-1})$ are relatively prime; and 4) the upper bound on the order of each polynomial matrix is known or can be specified exactly.

Since the noise term ( )e k enters the general model equation (2.72) as a direct error term, the model of

(2.72) is often called an equation error model (Goodwin and Sin, 1984], [Ljung, 1999] and [Zhu, 2001]).


Depending on how the five parameters A, B, C, D and F are combined, several model structures can be obtained from (2.72).

The choice of the models that will represent the noise disturbances is as important as the choice of the system model. Depending on the different assumptions made about the spectral density of the noise $e(k)$ and how the noise is assumed to enter the system given by (2.72), 32 different model structures can be derived from (2.72) based on the combination of the five parameters A, B, C, D and F [Ljung, 1999]. However, the model structures considered in the present work are limited to the structures derived from the combination of the four parameters A, B, C and F, that is, ignoring the D parameter in (2.72). The reason for choosing these four parameters is that, as the literature shows, they are adequate for the modeling needs of model predictive control (MPC) for a wide range of dynamical systems. The combination of A and B results in an AutoRegressive with eXogenous inputs (ARX) model, the combination of A, B and C gives an AutoRegressive Moving Average with eXogenous inputs (ARMAX) model, and the combination of B and F corresponds to the output error (OE) model; the three structures are summarized below. The output error (OE) model is a form of equation error model ([Goodwin and Sin, 1984] and [Ljung, 1999]) and can also take the form based on A, B, C and D which is widely used in the MPC literature ([Maciejowski, 2002], [Camacho and Bordons, 2007] and [Normey-Rico and Camacho, 2007]). Rather than using A, B, C and D to describe the output error (OE) model, the choice of using B and F is adopted in this work for the output error (OE) model [Ljung, 1999].
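For concreteness, the three structures adopted here correspond to the following special cases of (2.72), with the unused polynomial matrices set to identity:

$\text{ARX:}\quad A(z^{-1})\,Y(k) = z^{-d}B(z^{-1})\,U(k) + e(k), \qquad C = D = F = I$
$\text{ARMAX:}\quad A(z^{-1})\,Y(k) = z^{-d}B(z^{-1})\,U(k) + C(z^{-1})\,e(k), \qquad D = F = I$
$\text{OE:}\quad Y(k) = z^{-d}\,\frac{B(z^{-1})}{F(z^{-1})}\,U(k) + e(k), \qquad A = C = D = I$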

Let $\theta(k)$ be a parameter vector which encapsulates the model parameters given in (2.73), defined as:

$\theta(k) = \left[-A_1, \ldots, -A_{n_a},\; B_0, \ldots, B_{n_b},\; C_1, \ldots, C_{n_c},\; D_1, \ldots, D_{n_d},\; F_1, \ldots, F_{n_f}\right]^T$    (2.74)

Since the exact value of the parameter vector $\theta(k)$ in (2.74) is unknown, a parameterized set of model structures $\Theta$ can be defined as a set of candidate models given as:

$\Theta : \theta(k) \in \vartheta \subset \Re^{\nu} \rightarrow \hat{\theta}(k)$    (2.75)

where $\vartheta$ is some subset of $\Re^{\nu}$ inside which the search for a model is carried out; $\nu$ is the dimension of $\theta(k)$; $\hat{\theta}(k)$ is the desired model associated with the parameter vector $\theta(k)$ and contained in the set of models $\Theta = \{\theta_1(k), \theta_2(k), \ldots, \theta_\tau(k)\}$. Each member of this set is a distinct value of $\theta(k)$; and $\tau = 1, 2, \ldots, \mathrm{maxiter}$ is the number of iterations required to determine $\hat{\theta}(k)$ from $\Theta$.

Thus, the minimum variance (one-step-ahead) predictor of (2.72) at time $k$, based on the system information up to time $k-1$, can be expressed as

$\hat{Y}(k \,|\, k-1, \theta(k-1)) = \frac{B(z^{-1})}{F(z^{-1})}\,\frac{D(z^{-1})}{C(z^{-1})}\, z^{-d}\, U(k) + \left[1 - \frac{D(z^{-1})}{C(z^{-1})}\, A(z^{-1})\right] Y(k)$    (2.76)


Note the inclusion of $\theta(k)$ as an argument to indicate that the model structure represents a set of models. For notational convenience, the $k-1$ will be omitted henceforth. The prediction error $\varepsilon(k, \theta)$ can be computed directly from (2.72) and (2.76) as follows:

$\varepsilon(k, \theta(k)) = Y(k) - \hat{Y}(k, \theta(k)) = \frac{D(z^{-1})}{C(z^{-1})}\left[A(z^{-1})\,Y(k) - z^{-d}\,\frac{B(z^{-1})}{F(z^{-1})}\,U(k)\right]$    (2.77)

By introducing

$d(k, \theta(k)) = z^{-d}\,\frac{B(z^{-1})}{F(z^{-1})}\,U(k)$    (2.78)

and

$v(k, \theta(k)) = A(z^{-1})\,Y(k) - d(k, \theta(k))$    (2.79)

and using (2.78) and (2.79), equation (2.77) can be expressed as

$\varepsilon(k, \theta(k)) = Y(k) - \hat{Y}(k, \theta(k)) = \frac{D(z^{-1})}{C(z^{-1})}\, v(k, \theta(k))$    (2.80)

Let the regression vector (the so-called state vector) derived from the difference equation form of (2.72) be:

$\varphi(k, \theta(k)) = \big[\,Y(k-1), \ldots, Y(k-n_a),\; U(k-d), \ldots, U(k-d-n_b),\; \varepsilon(k-1, \theta(k)), \ldots, \varepsilon(k-n_c, \theta(k)),\; d(k-1, \theta(k)), \ldots, d(k-n_f, \theta(k)),\; v(k-d, \theta(k)), \ldots, v(k-d-n_d, \theta(k))\,\big]^T$    (2.81)

Using the parameter vector given in (2.74) and the regression vector in (2.81) above, equations (2.78) and (2.80) can be expressed respectively as:

$d(k, \theta(k)) = B_0\, U(k-d) + \cdots + B_{n_b}\, U(k-d-n_b) - F_1\, d(k-1, \theta(k)) - \cdots - F_{n_f}\, d(k-n_f, \theta(k))$    (2.82)

$\varepsilon(k, \theta(k)) = -C_1\, \varepsilon(k-1, \theta(k)) - \cdots - C_{n_c}\, \varepsilon(k-n_c, \theta(k)) + v(k, \theta(k)) + D_1\, v(k-1, \theta(k)) + \cdots + D_{n_d}\, v(k-n_d, \theta(k))$    (2.83)

Inserting $v(k, \theta(k))$ from (2.83) and substituting $d(k, \theta(k))$ from (2.82) into (2.77) gives

$\varepsilon(k, \theta(k)) = Y(k) - \varphi^T(k, \theta(k))\, \theta(k)$    (2.84)

Thus, the one-step-ahead predictor can then be expressed as:

$\hat{Y}(k, \theta(k)) = \varphi^T(k, \theta(k))\, \theta(k)$    (2.85)
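Because (2.85) is linear in the parameters, a batch least-squares fit of an ARX special case illustrates the predictor and the prediction error (2.84); the following is a minimal sketch (the SISO plant coefficients and data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical SISO ARX plant: y(k) = 1.5 y(k-1) - 0.7 y(k-2) + 0.5 u(k-1) + e(k).
N = 500
u = rng.normal(size=N)
y = np.zeros(N)
for k in range(2, N):
    y[k] = 1.5 * y[k - 1] - 0.7 * y[k - 2] + 0.5 * u[k - 1] + 0.01 * rng.normal()

# ARX regression vector phi(k) (a special case of (2.81)) and predictor (2.85).
Phi = np.array([[y[k - 1], y[k - 2], u[k - 1]] for k in range(2, N)])
Yv = y[2:]
# Least squares minimizes the sum of squared prediction errors eps(k, theta).
theta, *_ = np.linalg.lstsq(Phi, Yv, rcond=None)
eps = Yv - Phi @ theta                 # prediction error, eq. (2.84)
print(theta)                           # approximately [1.5, -0.7, 0.5]
```

In the sign convention of (2.74), the recovered vector corresponds to $[-A_1, -A_2, B_1]^T$ for this scalar example.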

2.5.1.1 Remarks on the Disturbance Model

The disturbance model, i.e. the second term in (2.72), plays a significant role in modeling the overall system behaviour. Let the disturbance model be defined as

$D_M = \frac{C(z^{-1})}{D(z^{-1})}\, e(k)$    (2.86)

In the MPC literature, the model (2.86) is usually called the CARIMA (controlled auto-regressive and integrated moving average) model ([Camacho and Bordons, 2007], [Clarke and Mohtadi, 1989], [Clarke et al., 1987a & 1987b], [Maciejowski, 2002]). In practice, $e(k)$ cannot be measured, but it can be estimated as deterministic or stochastic noise ([Goodwin and Sin, 1984], [Ljung, 1999], [Narendra and Annaswamy, 1989]).

The deterministic case is simply achieved by setting $C(z^{-1}) = D(z^{-1}) = 1$, with the assumption that $e(k)$ is a zero-mean white noise with finite variance while its first few terms are made non-zero. An additional assumption on $e(k)$ is that it is independent of past inputs and that it can be characterized by some probability function [Ljung, 1999]. With these assumptions on (2.86) and setting $F(z^{-1}) = 1$ in (2.72), equation (2.72) essentially reduces to an autoregressive with exogenous inputs (ARX) model structure, which is stable for a wide range of operations.

The stochastic case is somewhat more involved. Consider the case of modeling a stationary, zero-mean white noise process, namely $E\{e^2(k)\} = \sigma^2$, $E\{e(k)\,e(k-\lambda)\} = 0$ for all $\lambda \neq 0$, the probability distribution of $e(k)$ being the same for all $k$, and each $e(k)$ being independent of $e(\lambda)$ if $\lambda \neq k$; where the term $E\{\cdot\}$ implies the expectation or mean value of its argument. Then, if $C(z^{-1})/D(z^{-1})$ is an asymptotically stable transfer function, (2.86) will be a stationary process with spectral density given by

$\Phi_{SD}(\omega) = \sigma^2\, \frac{\left|C(e^{j\omega T})\right|^2}{\left|D(e^{j\omega T})\right|^2}$    (2.87)

where $\sigma^2$ is the noise variance. Note that since $\left|C(e^{-j\omega T})\right|^2 = C(e^{-j\omega T}) \cdot C(e^{j\omega T})$, it is always possible to choose $C(z^{-1})$ such that all its roots lie inside the unit disc, i.e. without restricting the spectral densities which can be modeled in this way. Also for the same reason, the factors of $C(z^{-1})$ do not affect the spectral density. This property guarantees a useful way of selecting $C(z^{-1})$ with roots inside the unit circle for models with moving-average terms, such as the ARMAX and OE models introduced later.
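A short numerical sketch of evaluating (2.87) on a frequency grid is given below (the first-order coefficients of C and D are hypothetical):

```python
import numpy as np

# Hypothetical first-order polynomials C(z^-1) = 1 + c1 z^-1, D(z^-1) = 1 + d1 z^-1.
c1, d1, sigma2, T = 0.5, -0.8, 1.0, 1.0
w = np.linspace(1e-3, np.pi, 512)            # frequency grid for omega*T in (0, pi]
z = np.exp(-1j * w * T)                      # z^-1 evaluated on the unit circle
Phi_SD = sigma2 * np.abs(1 + c1 * z) ** 2 / np.abs(1 + d1 * z) ** 2   # eq. (2.87)
# Reflecting a root of C through the unit circle leaves |C|^2, and hence Phi_SD,
# unchanged up to a gain that sigma^2 can absorb, so C may always be chosen stable.
```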

2.5.2 The Neural Network-Based Nonlinear Model Predictors

In formulating nonlinear model structures for use in a dynamic neural network (NN), as pointed out in ([Ljung, 1999], [Nørgaard et al., 2000]), a simple but effective approach with several advantages is to reuse the structures of the linear models just discussed above while letting the internal architecture be the feedforward dynamic neural network (FDNN) developed in Sections 2.4.3.1.1 and 2.4.4.4. The one-step-ahead nonlinear predictor of (2.85) can be expressed as:

$\hat{Y}(k \,|\, \theta(k)) = J(Z^N, \varphi(k), \theta(k))$    (2.88)

where $J(Z^N, \varphi(k), \theta(k))$ is a nonlinear function of its arguments that can be realized by a neural network, which is assumed to have a feedforward structure, and $Z^N$ is the input-output data pair obtained from prior plant operation over a period of time $NT$, defined as:

$Z^N = \left\{\,[U(1), Y(1)], \ldots, [U(N), Y(N)]\,\right\}, \quad N = 1, 2, \ldots, z$    (2.89)

where $N$ is the number of input-output data pairs, $T$ is the sampling period of the system and $z$ is the total number of samples.

2.5.2.1 Neural Network-Based Auto-Regressive with eXogenous Inputs (NNARX) Model Predictor

The Neural Network-based Auto-Regressive with eXogenous inputs (NNARX) model predictor can be obtained from (2.76) by setting $C(z^{-1}) = D(z^{-1}) = F(z^{-1}) = 1$, so that the NNARX model predictor from (2.88) takes the following form:

$\hat{Y}(k, \theta(k)) = J(Z^N, \varphi(k), \theta(k))$    (2.90)

with $\varphi(k) = [Y(k-1), \ldots, Y(k-n_a), U(k-d), \ldots, U(k-d-n_b)]^T$ a new regression vector and $\theta(k) = [-A_1, \ldots, -A_{n_a}, B_0, \ldots, B_{n_b}]^T$ containing the adjustable parameters of the network. Although $B(z^{-1})/A(z^{-1})$ may have poles, there is still a purely algebraic relationship between the output predictions and the past inputs and measured outputs; consequently, the predictor will always be stable even if the system itself is unstable. This is a very important feature of the ARX model structure [Ljung, 1999]. The structure of the NNARX model predictor is shown in Fig. 2.13; a minimal numerical sketch follows.
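The sketch below (Python/NumPy) assembles the NNARX regressor and evaluates the one-step-ahead prediction; the orders, delay, hidden-layer size and random weights are hypothetical, and the one-hidden-layer network merely stands in for the FDNN:

```python
import numpy as np

def nnarx_regressor(y, u, k, na, nb, d):
    """phi(k) = [y(k-1)..y(k-na), u(k-d)..u(k-d-nb)]^T for a SISO plant."""
    past_y = [y[k - i] for i in range(1, na + 1)]
    past_u = [u[k - d - j] for j in range(0, nb + 1)]
    return np.array(past_y + past_u)

def mlp_predict(phi, W1, b1, W2, b2):
    """One-step-ahead NNARX prediction by a one-hidden-layer network."""
    h = np.tanh(W1 @ phi + b1)
    return W2 @ h + b2

# Hypothetical sizes: na = 2, nb = 1, d = 1 -> regressor of length 4.
na, nb, d, nh = 2, 1, 1, 8
rng = np.random.default_rng(2)
W1, b1 = rng.normal(scale=0.3, size=(nh, na + nb + 1)), np.zeros(nh)
W2, b2 = rng.normal(scale=0.3, size=nh), 0.0

y = rng.normal(size=20)
u = rng.normal(size=20)
k = 10
y_hat = mlp_predict(nnarx_regressor(y, u, k, na, nb, d), W1, b1, W2, b2)
```

Since only measured outputs enter the regressor, the prediction is a purely feedforward evaluation, which is the algebraic stability property noted above.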

Fig. 2.13: NNARX model predictor. Fig. 2.14: NNARMAX model predictor. Fig. 2.15: NNOE model predictor.

2.5.2.2 Neural Network-Based Auto-Regressive with Moving Average and eXogenous Inputs (NNARMAX) Model Predictor

The Neural Network-based Auto-Regressive with Moving Average and eXogenous inputs (NNARMAX) model predictor can be obtained directly from (2.76) by setting $D(z^{-1}) = F(z^{-1}) = 1$, so that the NNARMAX model predictor from (2.88) takes the following form:


$\hat{Y}(k, \theta(k)) = J(Z^N, \varphi(k, \theta(k)), \theta(k))$    (2.91)

where $\varphi(k, \theta(k)) = [Y(k-1), \ldots, Y(k-n_a), U(k-d), \ldots, U(k-d-n_b), \varepsilon(k-1, \theta(k)), \ldots, \varepsilon(k-n_c, \theta(k))]^T$ is the new regression vector and $\theta(k) = [-A_1, \ldots, -A_{n_a}, B_0, \ldots, B_{n_b}, C_1, \ldots, C_{n_c}]^T$ contains the adjustable parameters of the network. Due to the presence of the $C(z^{-1})$ polynomial in (2.72), the predictor now has poles; thus $C(z^{-1})$ must have its roots inside the unit circle for the predictor to be stable. Also, the poles imply that the regression vector depends on the model parameters, which makes the estimation of the model parameters $\theta(k)$ more complicated due to the feedback structure. The model dependency on $C(z^{-1})$ is indicated by including $\theta(k)$ as an argument in $\varphi(k, \theta(k))$.

The typical structure of an NNARMAX model predictor is shown in Fig. 2.14. The stability of the predictor of the linear ARMAX model depends on the values of the roots of the $C(z^{-1})$ polynomial. The approach of (2.87) can be employed to initialize the disturbance in a natural way. These problems can be partially alleviated when the predictor is used in conjunction with the teacher-forced RTRL method discussed earlier, since the actual system outputs, rather than the predicted outputs, will be used in the subsequent computations of the recurrent network in a feedforward fashion.

2.5.2.3 Neural Network-Based Output-Error (NNOE) Model Predictor

The output error model structure has been widely used in situations where the only noise affecting the system is white measurement noise ([Camacho and Bordons, 2007], [Maciejowski, 2002]). The Neural Network-based Output Error (NNOE) model predictor can be obtained directly from (2.76) by setting $A(z^{-1}) = C(z^{-1}) = D(z^{-1}) = 1$. In this case (2.88) takes the following form:

$\hat{Y}(k, \theta(k)) = J(Z^N, \varphi(k, \theta(k)), \theta(k))$    (2.92)

where $\varphi(k, \theta(k)) = [\hat{Y}(k-1, \theta(k)), \ldots, \hat{Y}(k-n_a, \theta(k)), U(k-d), \ldots, U(k-d-n_b)]^T$ is the new regression vector and $\theta(k) = [B_0, \ldots, B_{n_b}, -F_1, \ldots, -F_{n_f}]^T$ contains the adjustable parameters of the network. The regressor in the NNOE model depends on past output predictions, as shown in Fig. 2.15. The model dependency on $F(z^{-1})$ is indicated by including $\theta(k)$ as an argument in $\varphi(k, \theta(k))$. Thus, for the predictor to be stable, the roots of F must lie inside the unit circle. The discussion based on (2.87) can be employed to choose and initialize the disturbance model. Although the NNOE model predictor has the same problems as the NNARMAX model predictor, these problems are partially alleviated when the network is combined with the teacher-forced RTRL structure discussed earlier [Haykin, 1999].

The NNOE structure depicted in Fig. 2.15 is sometimes considered a fully recurrent neural network and is used for formulating the real-time recurrent learning algorithm and/or the recurrent nonlinear ARX (NARX) model [Haykin, 1999]. A recursive simulation sketch is given below.
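The following sketch illustrates the recurrent character of the NNOE regressor: past predictions, not measurements, are fed back. The SISO orders and random weights are hypothetical, and the toy one-hidden-layer network is the same stand-in used in the NNARX sketch:

```python
import numpy as np

def nnoe_simulate(u, net, na, nb, d):
    """NNOE prediction: past *predictions* y_hat, not measurements, enter the
    regressor, so the model runs as a parallel (recurrent) simulator."""
    N = len(u)
    y_hat = np.zeros(N)
    for k in range(max(na, d + nb) + 1, N):
        phi = np.concatenate([
            y_hat[k - na:k][::-1],          # y_hat(k-1) .. y_hat(k-na)
            u[k - d - nb:k - d + 1][::-1],  # u(k-d) .. u(k-d-nb)
        ])
        y_hat[k] = net(phi)
    return y_hat

# Hypothetical network: a one-hidden-layer MLP with random weights.
rng = np.random.default_rng(3)
na, nb, d, nh = 2, 1, 1, 8
W1, b1 = rng.normal(scale=0.3, size=(nh, na + nb + 1)), np.zeros(nh)
W2 = rng.normal(scale=0.3, size=nh)
net = lambda phi: float(W2 @ np.tanh(W1 @ phi + b1))
y_sim = nnoe_simulate(rng.normal(size=50), net, na, nb, d)
```

Because each prediction depends on earlier predictions, any instability in the learned dynamics propagates through the simulation, which is why the roots of F must lie inside the unit circle.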

2.6 Implementation of MPC Algorithms

2.6.1 Computer Implementation of MPC Algorithms

The early forms of implementing MPC algorithms were based on the use of process control computers. For a brief historical overview of early MPC implementation techniques, see Appendix A of [Seborg et al., 2004]. Technological advancements in the last three decades have led to the development of sophisticated distributed control systems (DCS) and NCS for implementing MPC designs. The revolutionary developments in microelectronics and telecommunications have led to the evolution of distributed computer networks for MPC implementation in the process industries, where computers are configured into a network for process control. Industrial process control networks can be distributed both geographically and logically. A geographically distributed network implies computers that are physically located in different plant areas to control nearby plants, whereas a logically distributed network implies the distribution of control functions over more than one computer or device to control the plant. The control applications often utilize a variety of digital devices such as workstations, personal computers, single-loop controllers (SLC), and programmable logic controllers (PLC). In a distributed control system (DCS) or NCS, a fieldbus, which is a low-cost protocol, can be used to perform the necessary communication between the DCS (or NCS) and the plant's sensors (or smart devices) efficiently and in a platform-independent fashion.

MPC was originally developed to meet the specialized needs of power plants and petroleum refineries, but it now finds applications in wide-ranging areas including robotics, aerospace, chemicals, food processing, as well as the paper and pulp industry. MPC, as control based on online optimization, has long been recognized as the winning alternative for constrained multivariable system control, but its applicability has been limited to slow systems with long sampling times because of the large optimization problem which must be solved repeatedly at each sampling instant; a sketch of this receding-horizon computation is given below. Even with the currently available multicore computers and processors, several techniques and implementation strategies are still being explored to speed up the online computation of the MPC optimization for its application to nonlinear dynamical systems with relatively short sampling times. These techniques are presented in the next two sub-sections.
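As a deliberately small illustration of this recurring load (the first-order model, horizon and weights are hypothetical, and SciPy's general-purpose minimizer stands in for a dedicated QP/NLP solver), the loop below re-solves the finite-horizon problem at every sample and applies only the first move:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical first-order plant y(k+1) = a*y(k) + b*u(k).
a, b = 0.9, 0.1
Np, lam, u_max = 10, 0.01, 2.0       # prediction horizon, move penalty, input bound

def horizon_cost(useq, y0, ref):
    """Finite-horizon cost simulated over the model (the MPC optimization)."""
    y, J = y0, 0.0
    for u in useq:
        y = a * y + b * u
        J += (y - ref) ** 2 + lam * u ** 2
    return J

y, ref, u_prev = 0.0, 1.0, 0.0
for k in range(30):                  # closed-loop run
    res = minimize(horizon_cost, np.full(Np, u_prev), args=(y, ref),
                   bounds=[(-u_max, u_max)] * Np)   # re-solved at *every* sample
    u = res.x[0]                     # receding horizon: apply only the first move
    y = a * y + b * u                # plant update (model assumed perfect here)
    u_prev = u
```

The constrained optimization inside the loop is exactly the computation that must finish within one sampling period, which motivates the platforms discussed next.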


2.6.2 FPGA Implementation of MPC Algorithms

In its basic form, model predictive control (MPC) is a computationally intensive online optimization

control strategy based on the system process model. Since this online optimization must be repeated at each

sample time, the computation must be completed within the sampling time of the system under control. The

online optimization is even more involved for the nonlinear system control. Thus, extensive research has been on

ways to speed up the MPC computation while some research seeks alternative implementation of the MPC

algorithm. For example, Tøndel and co-workers ([Tøndel et al., 2003] and [Hegrenæs et al., 2005]) have shown

that it is possible to adopt multi-parametric quadratic programming (mp-QP) as an alternative to the online MPC

optimization for fast systems with relatively short sampling time. Felt [Felt, 2006] applied stochastic techniques to

speed up the MPC online computation and recommended its parallel implementation. Multi-modal and

decentralized techniques [Magni and Scattolini, 2006] as well as multiplexed MPC [Ling et al., 2005] and

distributed nonlinear MPC [Dunbar and Desa, 2005] have been reported to speed up the MPC online optimization

problem in sequential computations. Efforts made to reduce the MPC computational load has led to the extension

of MPC into multirate situations with additional causality constraints ([Halldrosson and Unbehauen, 2001], [Ling

et al., 2004], [Sheng et al., 2002]).

In the last few years there have been significant efforts towards the implementation of MPC as a system-on-a-chip (SoC), as an application-specific integrated circuit (ASIC), on a digital signal processor (DSP), and/or on a field programmable gate array (FPGA). The main difference between ASIC and FPGA implementations is that an ASIC can be fast with low power consumption and can include both analog and digital signals, but the ASIC implementation is expensive with a long development cycle. FPGA implementations are less expensive and more flexible with a short development cycle, but with limited speed, which can be improved using a multiple-processor architecture.

Several papers on MPC implementation on FPGAs have been reported ([Bleris et al., 2006], [Daniel and Ruano, 1999], [Garcia et al., 2004], [He and Ling, 2005], [Johansen et al., 2007], [Ling et al., 2006], [Ling et al., 2008], [Vouzis et al., 2006]). In these papers FPGAs have been shown to have great potential for meeting the needs of real-time computational and optimization problems. In [Johansen et al., 2007], the explicit MPC optimization is solved off-line and during runtime the solutions are only invoked from a local memory. The major disadvantage of this approach is that the memory requirements increase exponentially as the size of the problem increases, so it is applicable only to small-scale embedded applications. In [Daniel and Ruano, 1999], the computational load on the DSP processor designed for the unconstrained generalized predictive controller (GPC) would increase if the proposed architecture were applied to constrained nonlinear MPC (NMPC). Moreover, the GPC suffers from instability when the process operates outside the neighbourhood of its operating range. In [He and Ling, 2005], [Ling et al., 2006] and [Ling et al., 2008] a novel technique for FPGA hardware design tailored to solve constrained MPC problems is presented, and hardware-in-the-loop (HIL) simulation is used to verify the functionality and performance of the design. The problems envisaged with the linear MPC used in the last three papers are computational efficiency and performance degradation outside the validity regime of the process model when compared with a constrained nonlinear MPC control system. In [Bleris et al., 2006], [Garcia et al., 2004] and [Vouzis et al., 2006], a hardware/software FPGA implementation of MPC has been presented based on a logarithmic number system (LNS) architecture, with several precautions and limitations regarding its precision.

In general terms, an embedded system [Ganssle and Barr, 2003] is composed of an embedded processor (or microcontroller) system with dedicated hardware and software portions of the design properly partitioned within the embedded system. As pointed out in [Fletcher, 2005] and recommended in [Xilinx, 2010], an embedded processor offers several advantages, such as customization, obsolescence mitigation, component and cost reduction, and hardware acceleration, for the implementation/application when compared to typical microprocessors. The FPGA implementations of MPC algorithms reported so far are all based on linear process models, and these MPC algorithms are implemented directly using Xilinx Logic Cells only [Xilinx, 2010]. The FPGA implementation of the MPC algorithm proposed in this work is developed and modeled as a hardware peripheral. The hardware peripheral is validated via hardware-in-the-loop co-simulation with an FPGA board. Next, the validated hardware peripheral is exported and attached to a pre-designed embedded processor system to complete the hardware portion. Then, software is written for initializing the hardware drivers and executing the MPC algorithm embedded in the processor system. To the best of my knowledge at the time of this writing, no neural network-based and/or constrained MPC implementation on an FPGA embedded processor system has been reported, and this is one of the objectives being pursued in the present study.

2.6.3 Remarks on the Reviewed MPC Implementation Strategies

As noted in Section 2.6.1, a fieldbus-based network control system (NCS) could be a possible platform for running model predictive control (MPC) algorithms for nonlinear systems control. Fieldbuses have greater functionality, resulting in reduced setup time, improved control, combined functionality of separate devices, and smart sensor diagnostic capabilities, together with the added advantage of digital communication which allows the control system to be completely distributed. Depending on the constraints on the computation time, a more advanced platform with emphasis on computational efficiency could be sought. In view of the above requirements, two implementation platforms are proposed in this work. The first platform is based on a service oriented architecture (SOA) which utilizes the device profile for web services (DPWS) on a clustered computer network. The second real-time implementation platform is based on an FPGA. The selection of an FPGA for MPC implementation depends on the MPC formulation and intended application ([Shoukry et al., 2010a], [Shoukry et al., 2010b] and [Meloni et al., 2010]). The term “MPC formulation” implies whether the process model to be used is linear or nonlinear and whether the MPC algorithm is adaptive or non-adaptive. Also, the term “intended application” implies whether the application is single-input single-output (SISO) or multiple-input multiple-output (MIMO) and whether the system has slow dynamics with a large sample time or fast dynamics with a relatively short sample time.

The choice of an FPGA, as well as its vendor, is critical in an embedded system design. Oldfield and Dorf [Oldfield and Dorf, 1995] outlined an efficient 2-stage, 14-point guideline for choosing an FPGA for implementing an embedded application from a number of FPGA manufacturer catalogues. The survey on FPGA computing power conducted by [Guccione, 2000], together with the additional selection considerations provided by [Fletcher, 2005], places the Xilinx FPGAs [Xilinx, 2010] in a favourable position for the present application.

In the next two sub-sections, an overview of the switched Ethernet architecture and the available service oriented architecture (SOA) technologies, as well as a brief overview of programmable logic devices with emphasis on FPGAs, are presented.

2.7 Switched Ethernet Architecture and SOA Technologies

2.7.1 The Architecture of the Switched Ethernet

The Ethernet specification defines a number of wiring and signaling standards for the physical layer of the open systems interconnection (OSI) networking model, as well as a media access control (MAC) algorithm and a common addressing format at the data link layer (DLL). However, it is impossible to guarantee a bounded message transmission time, mostly due to Ethernet's weakness in handling collisions.

In switched Ethernet, data is transmitted and received on different wires and the hub is replaced by an Ethernet switch. The carrier sense multiple access with collision detection (CSMA/CD) MAC protocol is no longer used in switched Ethernet. The switch regenerates the information and forwards it only to the port on which the destination is attached. It complies with the IEEE 802.3 MAC protocol when relaying the frames and creates a different collision domain per switch port, whereas in hubs all the nodes have to share the bandwidth of a half-duplex connection. If a frame is already being transmitted on the output port, the newly received frame is queued and will be transmitted when the medium becomes idle. In addition, all cables are point to point, from one station to a switch and vice versa, so it is possible to have dedicated bandwidth on each point-to-point connection, with every node running in full-duplex mode with no collisions. This characteristic renders switched Ethernet appropriate for industrial applications where the response time is a crucial matter. Furthermore, apart from the cases in which overflows occur in the switches [Decotignie, 2005], transmission bounds can be predicted. However, overflows may occur if, for example, the combined traffic destined to the same destination exceeds the capacity of the link between the switch and the destination. The excess traffic accumulates in the switch until its output buffer overflows. In such a case no strict bound can be placed on the transmission delay.


2.7.2 SOA Technologies

Nowadays, service-oriented architecture (SOA) has become the state-of-the-art solution for implementing autonomous and interoperable systems, as it provides web-based and modular implementation of complex and distributed software systems [Erl, 2005]. The interoperability at the application level that it offers, due to its loosely coupled nature, renders it a desirable element when developing information and communication technology (ICT) systems.

Several device-level SOA technologies have been proposed, most notably Jini [Jini, 2010], universal plug-n-play (UPnP) [UPnP, 2010] and the device profile for web services (DPWS) [DPWS, 2006].

2.7.2.1 The Jini Technology

Jini [Jini, 2010] offers the ability to register, discover, and use services. However, it is deeply rooted in Java and therefore is not designed for complete language and platform independence.

2.7.2.2 The UPnP Technology

The UPnP architecture leverages Internet and Web technologies, including the Internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), simple object access protocol (SOAP), and extensible markup language (XML). However, it is not fully compatible with web services (WS) technology. Furthermore, it uses specific protocols for discovery and event notification purposes.

2.7.2.3 The DPWS Technology

DPWS has adopted WS technology [Jammes and Smit, 2005a] and therefore provides plug-n-play connectivity and complete language and platform independence. For these reasons, DPWS is the preferred implementation vehicle for SOA technology in the present study.

DPWS utilizes Internet and web technologies including IP, TCP, UDP, HTTP, the simple object access protocol (SOAP) and the extensible markup language (XML), as well as the web services description language (WSDL) 1.1. As documented in [Jammes and Smit, 2005b], the core WS standards are the following: WSDL, XML Schema, SOAP, WS-Addressing, WS-Policy, WS-MetadataExchange and WS-Security. Apart from the standard core WS specifications, DPWS adds WS-Discovery for service discovery and WS-Eventing for subscription mechanisms. A detailed description of these protocols can be found in [DPWS, 2006] and [Jammes and Smit, 2005b].


2.8 Programmable Logic Devices and Field Programmable Gate Array (FPGA) Technologies

A typical embedded system design involves a significant amount of custom logic circuitry with pre-designed hardware components, such as a processor, memory units and various types of input/output (I/O) interfaces, as well as integrated modular software partitioned to the various hardware components within the embedded system and running under a real-time operating system [Xilinx, 2010]. For multivariable systems with relatively short sampling times, the embedded system design becomes a complex and complicated task when real-time constraints must be satisfied.

Traditionally, early systems were controlled by mechanical means using cams, gears, levers and other basic mechanical devices. As system complexity increased, the programmable logic controller (PLC) was introduced, which provided an easy way to control systems using “ladder logic” (and sometimes C) programs and can be reprogrammed rather than rewiring the control system. Unlike general-purpose computers, PLCs are rugged computers that control parts of an industrial control system and can typically withstand shock, vibration, elevated temperatures, and electrical noise, which are characteristic of industrial and manufacturing systems [Jack, 2003].

Furthermore, as the computational burden for the control of multivariable systems coupled with strong interactive nonlinearities and complexities increased, new computing platforms began to evolve as programmable logic devices offering logic manipulation with varying computational efficiency. These computing platforms have evolved within four decades from simple programmable logic devices (PLDs), such as programmable array logic (PAL), generic array logic (GAL) and complex programmable logic devices (CPLDs), to the currently most widely used field programmable gate arrays (FPGAs). The PLDs (PAL, GAL and CPLD) and FPGAs are all programmable devices, which means that they are integrated circuits used to create a circuit in which the internal design is not defined until after the device has been programmed.

It is important to note here that the PLC and the PLD are two distinct entities. While the former is the brain of an industrial manufacturing process, the latter is an integrated circuit that can be used to implement a digital logic design in hardware [Ganssle and Barr, 2003]. A similarity is that both are programmable. We also note that PLDs contain a relatively limited number of logic gates, and the functions they can implement are fewer and simpler when compared to FPGAs.

Moreover, the internal architecture of PLDs is predetermined by the manufacturer, but PLDs are created in such a way that they can be configured in the field to perform a variety of different functions. On the other hand, FPGAs are digital integrated circuits that contain configurable (programmable) blocks of logic gates along with configurable interconnects between these blocks. Depending on how the FPGAs are implemented, some may be programmed only once (one-time programmable, OTP), while others may be reprogrammed several times. The term “field programmable” refers to the fact that FPGA programming takes place in the field, as opposed to devices whose internal functionality is hardwired by the manufacturer, such as application-specific integrated circuits (ASICs) and application-specific standard parts (ASSPs) [Maxfield, 2004]. Thus, FPGAs can be configured or reprogrammed while residing in a higher-level system or in an electronic system that has been deployed to the outside world.

ASICs and ASSPs are also programmable devices that can contain hundreds of millions of logic gates and can be used to implement incredibly large and complex functions. ASICs and ASSPs are based on the same design processes and manufacturing technologies. Both are custom-designed and tailored to address specific applications, the only difference being that an ASIC is designed and built for a specific company, while an ASSP is designed for multiple customers. Although ASICs offer the ultimate in terms of logic gates, complexity, and performance, designing and building one is an extremely time-consuming and expensive process, with the added disadvantage that the final design cannot be modified once it has been fabricated (frozen in silicon).

FPGAs lie between PLDs and ASICs because their functionality can be customized in the field like PLDs, and they can contain millions of logic gates that can be used to implement extremely large and complex functions that could previously be realized using ASICs only ([Dubey, 2009], [Kilts, 2007]). In comparison to an ASIC, an FPGA is cheaper even in small quantities, implementing design changes is much easier, and the time to complete the overall system design, implementation, verification and deployment (i.e., time to market) is shorter.

FPGAs are a good choice for implementing digital systems because, as noted in [Akpan, 2010] and [Cardenas and Troncoso, 2008]: 1) FPGAs offer large logic capacity, exceeding several million equivalent logic gates, and include dedicated memory resources; 2) they include special hardware circuitry that is often needed in digital systems, such as digital signal processing (DSP) blocks (with multiply-and-accumulate functionality) and phase-locked loops (PLLs) (or delay-locked loops (DLLs)) that support complex clocking schemes; and 3) they also support a wide range of interconnection standards, such as double data rate synchronous dynamic random access memory (DDR SDRAM), peripheral component interconnect (PCI) and high-speed serial protocols. In addition to the above features, FPGAs provide a significant benefit as “off-the-shelf” chips that are programmed by the end user.

On an FPGA, hard and soft processors are available for implementation ([Akpan, 2010], [Kilts, 2007], [Xilinx, 2010]). A hard processor is a pre-designed circuit that is fabricated within the FPGA chip. A more flexible alternative is to use a soft processor. In this case, the processor exists as code written in a hardware description language (HDL), and it is implemented along with the rest of the system by using the logic and memory resources of the FPGA fabric. One disadvantage of this approach is that hardware resources in the FPGA fabric are consumed by the processor even though these resources may be needed by the rest of the system. It is also possible to include multiple soft processors in the FPGA when desired ([Virtex-4, 2010], [Virtex-5, 2010], [Virtex-6], [Virtex-7]).

FPGAs are semiconductor devices containing programmable logic components and programmable interconnects. The programmable logic components can be programmed to duplicate the functionality of basic logic gates such as AND, OR, XOR and NOT, or more complex combinatorial functions such as decoders or simple math functions. In most FPGAs, these programmable logic components (or logic blocks, in FPGA parlance) also include memory elements, which may be simple flip-flops or more complete blocks of memory. A hierarchy of programmable interconnects allows the logic blocks of an FPGA to be interconnected as needed by the system designer, somewhat like a one-chip programmable breadboard. These logic blocks and interconnects can be programmed after the manufacturing process by the engineer/designer (hence the term "field programmable"), so that the FPGA can perform whatever logical function is needed. Applications of FPGAs include DSP, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bio-informatics, computer hardware emulation and a growing range of other areas. FPGAs now find applications in areas that require the massive parallelism offered by their architectures ([Maxfield, 2004], [Kilts, 2007]).

In the traditional approach to designing such systems, a new integrated circuit (IC) chip is created for the custom logic circuits, but each pre-designed component is included as a separate chip. Because many products contain hardware and software components, the difficulty of generating a design from a set of requirements and specifications increases as the product becomes more complex. These difficulties led to the development of electronic system level (ESL) design and verification, which is an algorithm modeling methodology that focuses on a higher abstraction level, using high-level languages such as C, C++, or MATLAB® to model the entire behaviour of the system with no initial link to its implementation ([Moretti, 2003], [Martin, 2002]). ESL design and verification has evolved into an industry-standard complementary methodology that enables designers to design, verify and debug embedded systems either as a custom system-on-a-chip, a system-on-FPGA or a system-on-board. The details of the model-based ESL design and verification techniques used in this work, which combine MATLAB/Simulink from The MathWorks with AccelDSP and System Generator for DSP from Xilinx, are discussed in Appendix A.

2.8.1 The Xilinx Virtex Series FPGA Family Members

A careful study of Xilinx's latest product list by function categories, together with the data sheets and performance capabilities of Xilinx FPGAs published in [Guccione, 2000] and [Xilinx, 2011], reveals that one of Xilinx's Virtex FPGA family members can be considered for use in this work. In this discussion, the Virtex, Virtex-II, Virtex-II Pro and Virtex-II Pro X are not included, since they have been superseded by the Virtex-4 FPGA family members. Among Xilinx's Virtex FPGA family members are the Virtex-4, Virtex-5, Virtex-6 and Virtex-7. The main differences and application areas of the Xilinx Virtex family members can be summarized as follows:

1). The Virtex-4 FPGA family members include [Virtex-4, 2010]:

i). The Virtex-4 LX, optimized as a high-performance solution for logic applications.

ii). The Virtex-4 SX, optimized as a high-performance solution for digital signal processing (DSP) applications.

iii). The Virtex-4 FX, optimized as a high-performance, full-featured solution for embedded system platform applications.


2). The Virtex-5 FPGA family members include [Virtex-5, 2009]:

i). The Virtex-5 LX, optimized for high-performance general logic applications.

ii). The Virtex-5 LXT, optimized for high-performance logic applications with advanced serial connectivity.

iii). The Virtex-5 SXT, optimized for high-performance digital signal processing (DSP) applications with advanced serial connectivity.

iv). The Virtex-5 TXT, optimized for high-performance systems with advanced serial connectivity.

v). The Virtex-5 FXT, optimized as a high-performance, full-featured solution for embedded system platform applications with advanced serial connectivity.

3). The Virtex-6 FPGA family members include [Virtex-6, 2011]:

i). The Virtex-6 CXT, optimized for high-performance digital signal processing applications with low-power serial connectivity and high-performance GTX transceivers for niche applications.

ii). The Virtex-6 LXT, optimized for high-performance logic applications with advanced serial connectivity.

iii). The Virtex-6 SXT, optimized as a high-performance solution for digital signal processing (DSP) applications.

iv). The Virtex-6 HXT, optimized for applications that require ultra-high-speed serial connectivity; it offers the highest bandwidth with advanced serial connectivity.

4). The Virtex-7 FPGA family members include [Virtex-7, 2011]:

i). The Virtex-7 T offers 12.5 Gb/s advanced serial connectivity, the greatest parallel input/output bandwidth and ultra-high-end logic capacity for advanced systems requiring the highest performance and highest bandwidth connectivity.

ii). The Virtex-7 XT offers extended capabilities, including 13.1 Gb/s serial connectivity, a higher DSP-to-logic ratio, and a higher block RAM-to-logic-cell ratio for advanced systems requiring the highest performance and highest bandwidth connectivity.

iii). The Virtex-7 HT combines 28 Gb/s and 13.1 Gb/s serial connectivity for 400G communications line cards for advanced systems requiring the highest performance and highest bandwidth connectivity.

Since the FPGA development in this work is geared towards a platform which would support embedded processor system development, it is obvious that the Virtex-4 and Virtex-5 FPGA family members would be appropriate for the embedded processor system design desired in this work.

However, in addition to the Virtex-4 and Virtex-5 FPGA family members introduced above, Xilinx has also recently introduced additional Virtex-4 and Virtex-5 FPGA family members, namely the “Defense-grade” Virtex-4Q and Virtex-5Q FPGAs as well as the “Space-grade” Virtex-4QV and Virtex-5QV FPGAs. The difference between the Virtex-4 and Virtex-5 FPGAs above and their Defense-grade and Space-grade counterparts lies in their application environments. While the Virtex-4 and Virtex-5 FPGAs serve general purposes in their specific application areas in diverse environments, the Defense-grade Virtex-4Q [Virtex-4Q, 2010] and Virtex-5Q [Virtex-5Q, 2010] FPGA family members, as the name implies, are for mission-critical aerospace and defense applications. The radiation-hardened Space-grade Virtex-4QV [Virtex-4QV, 2010] and Virtex-5QV [Virtex-5Q, 2010] FPGA family members, on the other hand, are for applications involving sensor processing, reconfigurable computing platforms, and modem and communication systems.

While the general-purpose, Defense-grade and Space-grade categories of Xilinx's Virtex-4 and Virtex-5 FPGA family members still retain their respective designations, they differ in some ways in terms of their available hardware resources and consequently their costs.

2.8.2 Comparison of the Xilinx General-Purpose, Defense-Grade and Space-Grade Virtex-4 and Virtex-5 FPGA Product Family Members

In the following, some comparisons between the general-purpose Virtex-4 and Virtex-5 FPGA family members are made in terms of their available hardware resources, while references to the Defense-grade and Space-grade members are made, for completeness, where there are differences. An overview of the hardware resources embedded within the general-purpose Virtex-4 and Virtex-5 FPGA family members and some of their capabilities is shown in Table 2.1. References to the respective Defense-grade and Space-grade FPGA family members, relative to the general-purpose Virtex-4 and Virtex-5 FPGAs, are made using the superscripts discussed in the 17-point notes below Table 2.1.

In the following discussion, since the available FPGA hardware resources increase from the first device part number on the left towards the right for the Virtex-4 and Virtex-5 FPGA family members, the Virtex-4 XC4VFX20 and Virtex-5 XC5VFX30T form the basis for the comparisons, and they are simply referred to as the Virtex-4 and Virtex-5 FPGAs respectively, except where otherwise stated.

Thus, comparing the combined least resources provided by the configurable logic blocks (CLBs) in Table 2.1 for the Virtex-4 and Virtex-5 FPGA family members, it is obvious that those provided by the latter exceed those provided by the former. As noted in Note (1), the combined logic cells and slices in the Virtex-4 are 111,072, whereas the Virtex-5 slice count computed from Note (3) is 81,920. Again, comparing the arrays [rows x columns] and the maximum distributed random access memory (RAM) indicates that the Virtex-5 has more available hardware resources.

Another key difference between the Virtex-4 and Virtex-5 FPGAs is that while the Virtex-4 utilizes XtremeDSP slices (see Note (4) under Table 2.1) to implement a primitive of the DSP48 complex multipliers, the Virtex-5 embeds enhanced DSP48E slices (see Note (5) under Table 2.1) with optional bitwise logical functionality, dedicated cascade connections, and resources twice the amount of those in Virtex-4 FPGAs.


While the Virtex-4 FPGAs use Xesium clock technology with single digital clock managers (DCMs), sometimes combined with additional phase-matched clock dividers (PMCDs), the Virtex-5 FPGAs employ powerful clock management tiles (CMTs), each consisting of two DCM blocks for zero-delay buffering,

Table 2.1: Comparison of the Xilinx general-purpose, Defense-grade and Space-grade Virtex-4 and Virtex-5 FPGA product family members in terms of their available hardware resources and capabilities.

Virtex-4 FPGA family members(17): XC4VFX12 | XC4VFX20 | XC4VFX40 | XC4VFX60(13) | XC4VFX100 | XC4VFX140(13)
Virtex-5 FPGA family members(17): XC5VFX30T(15) | XC5VFX70T | XC5VFX100T | XC5VFX130T(16) | XC5VFX200T

Available hardware resources (Virtex-4 columns first, then Virtex-5 columns, in the order above):

CLBs(1) - Array(2) [rows x cols]: 64x24 | 64x36 | 96x52 | 128x52 | 160x68 | 192x84 ; 80x38 | 160x38 | 160x56 | 200x56 | 240x68
CLBs - Logic cells: 12,312 | 19,224 | 41,904 | 56,880 | 94,896 | 142,128 ; NA
CLBs - Slices: 5,472 | 8,544 | 18,624 | 25,280 | 42,176 | 63,168 ; NA
CLBs - Virtex-5 slices(3): NA ; 5,120 | 11,200 | 16,000 | 20,480 | 30,720
CLBs - Max distributed RAM (Kb): 86 | 134 | 291 | 395 | 659 | 987 ; 380 | 820 | 1,240 | 1,580 | 2,280
XtremeDSP slices(4): 32 | 32 | 48 | 128 | 160 | 192 ; NA
DSP48E slices(5): NA ; 64 | 128 | 256 | 320 | 384
Block RAM - 18-Kb blocks(6): 36 | 68 | 144 | 232 | 376 | 552 ; 136 | 296 | 456 | 596 | 912
Block RAM - 36-Kb blocks: NA ; 68 | 148 | 228 | 298 | 456
Block RAM - max block RAM (Kb): 648 | 1,224 | 2,592 | 4,176 | 6,768 | 9,936 ; 2,448 | 5,328 | 8,208 | 10,728 | 16,416
Digital clock managers (DCMs): 4 | 4 | 8 | 12 | 12 | 20 ; NA
Phase-matched clock dividers (PMCDs): 0 | 0 | 4 | 8 | 8 | 8 ; NA
Clock management tiles (CMTs)(7): NA ; 2 | 6 | 6 | 6 | 6
PowerPC processor blocks(8): 1 | 1 | 2 | 2 | 2 | 2 ; 1 | 1 | 2 | 2 | 2
Endpoint blocks for PCI Express: NA ; 1 | 3 | 3 | 3 | 4
Ethernet MACs(9): 2 | 2 | 4 | 4 | 4 | 4 ; 4 | 4 | 4 | 6 | 8
Max RocketIO transceiver blocks(10) - MGT: NA | 8 | 12 | 16 | 20 | 24 ; NA
Max RocketIO transceiver blocks - GTP: NA ; NA
Max RocketIO transceiver blocks - GTX: NA ; 8 | 16 | 16 | 20 | 24
Total input/output (I/O) blocks(11): 9 | 9 | 11 | 13 | 15 | 17 ; 12 | 19 | 20 | 24 | 27
Max user input/output (I/O)(12),(14): 320 | 320 | 448 | 576 | 768 | 896 ; 360 | 640 | 680 | 840 | 960

Notes on the superscripts indicated in Table 2.1:

(1) In the Virtex-4 FPGAs, one CLB block contains four slices with a 64-bit maximum.
(2) In the Virtex-4 FPGAs, some of the row/column array is used by the processors in the FX devices.
(3) Virtex-5 slices are organized differently from previous generations. Each Virtex-5 FPGA slice contains four look-up tables (LUTs) and four flip-flops, whereas previous generations of FPGAs had two LUTs and two flip-flops.
(4) Each XtremeDSP slice contains one 18x18 multiplier, an adder, and an accumulator.
(5) Each DSP48E slice in the Virtex-5 FPGAs contains a 25x18 multiplier, an adder, and an accumulator.
(6) Block RAMs are functionally 36 Kbits in size. Each block can also be used as two independent 18-Kbit blocks.
(7) Each clock management tile (CMT) contains two digital clock managers (DCMs) and one phase-locked loop (PLL).
(8) PowerPC 405 processor block(s) are embedded in the Virtex-4, Virtex-4Q and Virtex-4QV FX FPGA members, while the enhanced PowerPC 440 processor block(s) are embedded in the Virtex-5, Virtex-5Q and Virtex-5QV FXT FPGA members.
(9) The Virtex-5 FPGA family members contain separate Ethernet media access controllers (Ethernet MACs) per device.
(10) RocketIO GTP transceivers in the Virtex-5 FPGAs are designed to run from 100 Mbits/s to 3.75 Gbits/s, and RocketIO GTX transceivers from 150 Mbits/s to 6.5 Gbits/s, whereas those in the Virtex-4 FPGAs are designed to run from 622 Mbits/s to 6.5 Gbits/s only.
(11) This total number of input/output (I/O) banks includes configuration Bank 0.
(12) The maximum user input/output (max user I/O) ports do not include the RocketIO transceiver I/O ports.
(13) Only the Virtex-4 XC4VFX60 and XC4VFX140 have counterparts in the Virtex-4Q and Virtex-4QV FPGAs, i.e. the XQ4VFX60 and XQ4VFX140 as well as the XQR4VFX60 and XQR4VFX140, respectively.
(14) The Virtex-4 XC4VFX60 has 896 max user I/O ports, whereas each of its Virtex-4Q and Virtex-4QV counterparts has 768 max user I/O ports.
(15) Of the five designations of the Virtex-5 FPGA members, only the Virtex-5 XC5VFX30T counterpart, that is the XQ5VFX30T, is not available among the Virtex-5Q FPGA members.
(16) Of the five designations of the Virtex-5 FPGA members, only the Virtex-5 XC5VFX130T counterpart, that is the XQR5VFX130T, is available among the Virtex-5QV FPGA members, with 836 max user I/O ports against 840 in the XC5VFX130T. The Virtex-5QV XQR5VFX130T FPGA member has 18 max RocketIO GTX transceivers designed to run from 150 Mbits/s to 3.75 Gbits/s only, against 20 in the XC5VFX130T designed to run in the two ranges given in (10) above. The Virtex-5QV XQR5VFX130T FPGA member does not include any embedded PowerPC processor block(s).
(17) Apart from the features listed above, the general-purpose, Defense-grade and Space-grade Virtex-4 and Virtex-5 series of FPGA family members have approximately the same hardware resources as in Table 2.1, except for their operational environments.


frequency synthesis and clock phase shifting, as well as a phase-locked loop (PLL) which offers the function of the PMCDs together with input jitter filtering.

The PowerPC processor block in the Virtex-4 is the IBM PowerPC™ 405 core, while that in the Virtex-5 is the IBM PowerPC™ 440 core ([IBM PPC405C, 2006]; [XPPC405C Virtex-4, 2010]). The PowerPC™ 405 core is a scalar, 5-stage-pipeline, 32-bit reduced instruction set computer (RISC) central processing unit (CPU) core providing up to 400 MHz and 608 DMIPS (Dhrystone million instructions per second) performance as implemented in IBM's advanced 90-nm copper CMOS technology. The PowerPC™ 405 processor core employs the scalable and flexible Power Architecture technology optimized for embedded applications with a 16-bit x 16-bit MAC. The PowerPC™ 440, on the other hand, is also a 32-bit RISC CPU core, but with support for two instructions per clock and providing up to 667 MHz and 1,334 DMIPS performance as implemented in IBM's advanced 90-nm copper CMOS technology ([IBM PPC440C, 2006]; [XEPB Virtex-5, 2010]). The PowerPC™ 440 core integrates a superscalar 7-stage pipeline with out-of-order issue, execution and completion. The PowerPC™ 440 employs the scalable and flexible Book E enhanced Power Architecture optimized for embedded applications with single-cycle-throughput 32-bit x 32-bit MACs. For a detailed description of the PowerPC™ 440 embedded processor core, processor block architecture, organization, associated peripherals, and controllers, see Appendix 9. Thus, the PowerPC™ 440 processor cores embedded in the Virtex-5 offer more enhancements and higher performance with lower power compared to the Virtex-4 family members. Finally, the Virtex-5 FPGAs offer a higher number of total input-output blocks as well as maximum user input-output ports when compared to the Virtex-4 family of FPGAs.

2.8.3 The Xilinx Virtex-5 XC5VFX70T ML507 FPGA Development Board

The comparison of the hardware resources available on the Virtex-5 FXT FPGA family members with those available on the Virtex-4 FX FPGA family members makes the former suitable for use in developing the proposed embedded PowerPC™ processor system platform in this work. Since this work is concerned with the development of an embedded processor system platform for the implementation of an adaptive model predictive control algorithm, a multi-processor system is not desired at the current state of the work; rather, a single embedded processor system is exploited for this implementation. Among the Xilinx embedded system development boards based on the Virtex-5 FXT FPGA family members is the Virtex-5 XC5VFX70T ML507 FPGA development board. For simplicity and convenience, this board shall be referred to as the Virtex-5 ML507 FPGA board. The top and bottom views of the Virtex-5 ML507 FPGA board that will be used in this work are shown in Fig. 2.16(a) and (b) respectively.

The ML507 is a general-purpose FPGA, RocketIO™ GTX, and embedded system development board that provides a feature-rich, general-purpose evaluation and development platform, includes on-board memory and industry-standard connectivity interfaces, and delivers a versatile development platform for embedded applications. The key physical features of the Virtex-5 ML507 FPGA board shown in Fig. 2.16 include:


XC5VFX70T FFG1136 FPGA, 256-MB DDR2 SODIMM, 1-Mbit ZBT SRAM, 32-MB linear flash card, System ACE™ CF technology (CompactFlash), platform flash, SPI flash, JTAG programming interface, external clocking (2 differential pairs), 2 universal serial bus (USB) interfaces for host development platform and peripherals, 2 PS/2 keyboard inputs, mouse input, RJ-45 10/100/1000-Mbit/s port for networking, RS-232 (male) serial port, 2 audio-in lines, microphone input, 2 audio-out lines, amplifier, SPDIF, piezo speaker, rotary encoder, video input, DVI/VGA video output, single-ended and differential I/O expansion, 8 GPIO DIP switches, 8 light-emitting diode (LED) output displays, 5 push buttons, MII, GMII, RGMII and SGMII Ethernet PHY interfaces, 1 endpoint PCI Express® edge connector, GTX: SFP (1000Base-X), GTX: SMA (RX and TX differential pairs), GTX: SGMII, GTX: PCIe™, GTX: SATA (dual host connections), GTX clock synthesis chips, a header for a second serial port, a second platform flash PROM (32 Mb) for large devices, a Mictor trace port, a BDM debug port, and a soft-touch port.

Fig. 2.16: The Virtex-5 ML507 FPGA embedded system development board: (a) top view and (b) bottom view.

2.9 Application of MPC

The first generation of MPC was originally proposed and developed by two independent industrial research groups, at Adersa by [Richalet et al., 1978] and at Shell Oil by [Cutler and Ramaker, 1980] respectively, to meet the specialized control needs of the petroleum refining industry, but it now finds application in a variety of control areas. The second generation of MPC began with the generalized predictive control (GPC) proposed by Clarke and co-workers [Clarke et al., 1987] as an adaptive MPC. Since its inception 30 years ago [Dones et al., 2010], MPC has been applied to a wide range and broad classes of applications. Several surveys on the industrial applications of MPC have been reported ([Froisy, 1994], [Bemporad and Morari, 1999], [Qin and Badgwell, 2003]); for example, Goodhart [Goodhart et al., 2000] recorded more than 1000 successful MPC applications worldwide, while Seborg [Seborg et al., 2004] recorded over 4500 MPC applications worldwide. A simple database search for MPC conducted by Morari [Morari, 1994] generated 128 references for the years 1991 – 1993, while the same search for the years 1991 – 1998 generated 2802. A summary of linear and nonlinear MPC applications based on a survey conducted by [Qin and Badgwell, 2003] and reported until 1999 gives estimates of 4,542 and 93 applications respectively, as Table 2.2 and Table 2.3 show.

Table 2.2: Summary of linear MPC applications by area (estimates based on vendor survey; estimates do not include applications by companies who have licensed vendor technology) – Source: [Qin and Badgwell, 2003]

Area | Aspen Technology | Honeywell Hi-Spec | Adersa | Invensys | SGS | Total
Refining | 1200 | 480 | 280 | 25 | – | 1985
Petrochemicals | 450 | 80 | – | 20 | – | 550
Chemicals | 100 | 20 | 3 | 21 | – | 144
Pulp and paper | 18 | 50 | – | – | – | 68
Air and gas | – | 10 | – | – | – | 10
Utility | – | 10 | – | 4 | – | 14
Mining/Metallurgy | 8 | 6 | 7 | 16 | – | 37
Food processing | – | – | 41 | 10 | – | 51
Polymer | 17 | – | – | – | – | 17
Furnaces | – | – | 42 | 3 | – | 45
Aerospace/Defense | – | – | 13 | – | – | 13
Automotive | – | – | 7 | – | – | 7
Unclassified | 40 | 40 | 1045 | 26 | 450 | 1601
Total | 1833 | 696 | 1438 | 125 | 450 | 4542


A quick comparison of Table 2.2 and Table 2.3 shows the dominance of linear MPC applications over their nonlinear counterpart. Generally speaking, however, most systems are highly nonlinear. The resurgence and subsequent inclusion of neural networks and fuzzy-rule-based logic techniques into MPC algorithms have enhanced the efficient development and application of nonlinear MPC (NMPC) to the modeling and control of more advanced nonlinear industrial systems. Recent developments in MPC applications have been geared towards the re-design of nonlinear and adaptive MPC algorithms with several challenges. Although the list of nonlinear and adaptive MPC applications is endless, some recent applications include: i) an adaptive liquid-liquid extraction process, which has an advantage over distillation and other separation techniques, using a neural network-based GPC [Mjalli, 2006]; ii) a pneumatic system control using the GPC based on a linearized neural network ARMAX (AutoRegressive Moving Average with eXternal input) model [Song and Liu, 2006]; iii) the control of a chemical process rig using an adaptive neural network with online multivariable NMPC [Yu et al., 2006]; iv) the model-based control of a fluidized furnace reactor using MPC with a validated first-principles process model [Voutetakis et al., 2006]; v) the control of a fast turbocharged spark ignition engine with variable camshaft timing using neural network-based MPC [Colin et al., 2007]; vi) multi-rate MPC of a chemical reactor using neural network models [Yu and Yu, 2007]; vii) the temperature control of a variable-frequency oil-cooling machine using neural network-based adaptive GPC [Lu and Tsai, 2008]; viii) an exothermic chemical semi-batch reactor control using MPC with a neural network [Samek and Macku, 2008]; ix) the control of an industrial baker's yeast drying process using neural network-based MPC [Yüzgeç et al., 2008]; x) the application of MPC to a wastewater treatment process [Shen et al., 2008]; xi) the direct application of NMPC to the control of fed-batch cultures for a biopharmaceutical process [Hafidi et al., 2008]; xii) the use of GPC as a position-torque strategy for the control of pneumatic artificial robot muscles [Chikh et al., 2010]; xiii) circuit control using MPC for the processing of mined ore [Coetzee et al., 2010]; xiv) the combination of a radial basis function-type neural network and a genetic algorithm with MPC for the adaptive control of a hybrid motor-boat [Salahshoor et al., 2010]; and xv) the adaptive control of the concentration of grape juice using neural network MPC [Suárez et al., 2010].

Table 2.3: Summary of nonlinear MPC applications by areas (estimates based on vendor survey; estimates do not include applications by companies who have licensed vendor technology) – Source: [Qin and Badgwell, 2003]

Area              Adersa   Aspen        Continental   DOT        Pavilion       Total
                           Technology   Controls      Products   Technologies
Air and gas       –        –            18            –          –              18
Chemicals         2        –            15            –          5              22
Food processing   –        –            –             –          9              9
Polymers          –        1            –             5          15             21
Pulp and paper    –        –            –             –          1              1
Refining          –        –            –             –          13             13
Utility           –        5            2             –          –              7
Unclassified      1        –            1             –          –              2
Total             3        6            36            5          43             93


Again, a comparison of Table 2.2 and Table 2.3 indicates that until 1999 there were no reports of nonlinear MPC applications in petrochemicals, mining and metallurgy, aerospace/defense or automotive. After 2006, some MPC applications have been proposed for the aerospace/defense and automobile industries which deal with nonlinearity by combining neural networks with simple PID control [Savran et al., 2006] or reinforcement learning algorithms [Jiang and Kamel, 2007], or by using recurrent neurocontrollers trained with simultaneous perturbation stochastic approximation and adaptive critics [Prokhorov, 2007].

2.10 Practical Problems with MPC Algorithms

Model predictive control (MPC) is an advanced digital control technique that has been developed for the control of constrained multivariable systems with dead time (i.e. time delay), which are encountered in most real-world industrial processes. A comprehensive treatment of dead-time control using MPC can be found in [Normey-Rico and Camacho, 2007]. In the basic MPC formulation, it is assumed that the system under control will not react to a control input until the time delay has elapsed. MPC algorithms are thus suitable for systems with long time constants but unsuitable for systems with fast dynamics, whose control requires short sampling times of the controlled variables: the high computational demands of MPC cannot be met within such short sampling periods.

As pointed out in [Camacho and Bordons, 2007], [Froisy, 1994], [Hugo, 2000], [Qin and Badgwell, 2003] and [Seborg et al., 2004], many commercial industrial MPC packages are based on linear process models which might not accurately represent or cover the entire operating range of the system [Kalra and Georgakis, 1994]. Maciejowski [Maciejowski, 2002] proposed the use of neural networks for modeling nonlinear dynamics and their use in MPC design. As mentioned earlier, the majority of MPC research employing NNs uses a linearized NN model for GPC design (the GPC being a celebrated class of MPC algorithms with low computational requirements). The comparison of GPC schemes using linear and nonlinear models has also been investigated by [Roa et al., 2009], and it has been shown that the latter outperform the former. Introducing NNs into the MPC scheme for nonlinear MPC design increases the overall design effort in terms of proper model and training algorithm selection and the arrangement of the NN model, among others.

Despite the success of MPC, Hugo [Hugo, 2000] has noted several limitations of it in industrial applications related to controller justification, installation and maintenance expenses, controller structure ill-conditioning, and operator interface. As pointed out in [Froisy, 1994], [Hugo, 2000], [Maciejowski, 2002], [Qin and Badgwell, 2003] and [Seborg et al., 2004], ill-conditioning is a major drawback in MPC implementation and has disabled many MPC controllers. Ill-conditioning arises from the multivariable nature of MPC and occurs when the available inputs have very similar effects on two or more outputs. As a result, the process gain matrix is nearly singular, and large input movements are required to control the outputs


independently. Consequently, it becomes imperative to check for ill-conditioning at each control sequence. Although some strategies have been proposed in [Hugo, 2000], [Maciejowski, 2002], [Qin and Badgwell, 2003] and [Seborg et al., 2004] to address this issue, these strategies basically utilize linear models off-line.

The following statement is posed in the abstract of [Nikolaou, 2001]: “After several years of

efforts, the constrained model predictive control (MPC), the de facto standard algorithm for advanced control in

process industries, has finally succumbed to rigorous analysis. Successful practical implementations of MPC

were already in place almost two decades before a rigorous stability proof for constrained MPC was published.

What is then the importance of recent theoretical results for practical MPC applications?” In conclusion,

Nikolaou [Nikolaou, 2001] pointed out that several open issues need to be addressed such as robustness,

adaptation, nonlinearity handling, performance monitoring, model building, computation, and implementation. In

addition, Qin and Badgwell [Qin and Badgwell, 2003] pointed out that current MPC suffers from the following: limited model choice, sub-optimal feedback, lack of nominal stability, sub-optimal or inefficient solution of the dynamic optimization, difficulty in tuning the controller for stable processes in the presence of severe constraints, difficulty in exploiting MPC to handle significantly larger and/or faster processes, and model uncertainty. Other issues raised in [Qin and Badgwell, 2003] concern current model identification techniques; these include poor plant test signals, the lack of tools to determine whether the collected plant data represent the system dynamics for MPC design, and the lack of model validation methods to ascertain the accuracy of the model for MPC design. There is thus a need to develop improved identification algorithms which are not based solely on least squares and convolution models.

In the light of the above discussion, neural networks were selected for modeling nonlinear systems in the current work. Given information from prior tests performed on the system in order to obtain the input-output training data, the main MPC problem can be summarized as follows: obtain a suitable dynamic neural network model of the nonlinear system; find an efficient training algorithm; and design an MPC algorithm by using the NN model.
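To fix ideas, the following is a minimal sketch of how these three ingredients interact in an adaptive model-based loop. It deliberately simplifies: a first-order linear plant stands in for the nonlinear system, a least-squares fit stands in for NN training, and a one-step horizon stands in for the multi-step MPC formulations developed later in this work; all names and numerical values are illustrative assumptions.

import numpy as np

def identify_model(window):
    # Least-squares fit of y(k) = a*y(k-1) + b*u(k-1) over the sliding window.
    Phi = np.array([[yp, up] for (yp, up, _) in window])
    Y = np.array([yn for (_, _, yn) in window])
    return np.linalg.lstsq(Phi, Y, rcond=None)[0]          # estimates [a, b]

def solve_mpc(a, b, y_now, r, rho=0.01):
    # One-step-ahead MPC: minimize (r - (a*y + b*u))**2 + rho*u**2 over u.
    return b * (r - a * y_now) / (b * b + rho)

a_true, b_true = 0.9, 0.5                  # "unknown" plant, used only to simulate
rng = np.random.default_rng(1)
window, y = [], 0.0
for _ in range(5):                         # short open-loop test to seed the window
    u = rng.standard_normal()
    y_new = a_true * y + b_true * u
    window.append((y, u, y_new))
    y = y_new
for k in range(50):                        # adaptive model-based control loop
    a_hat, b_hat = identify_model(window)
    u = solve_mpc(a_hat, b_hat, y, r=1.0)
    u += 0.01 * rng.standard_normal()      # small dither keeps the data informative
    y_new = a_true * y + b_true * u        # apply the first control move to the plant
    window = window[1:] + [(y, u, y_new)]  # first-in first-out window update
    y = y_new
print(round(y, 2))                         # output settles near the reference 1.0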

Since 1990, the application of NNs to nonlinear system control has been studied extensively. In the next sub-section, the basic NN-based control schemes are briefly discussed and comparisons are made with the NN-based MPC control strategy.

2.11 Neural Network-Based Control Schemes and MPC

Based on the NN learning method used, which can be supervised learning, unsupervised learning or reinforcement learning, four (4) NN-based control schemes can be identified, namely: neural network-based adaptive control, modular NN controller design, the back-propagation through time control scheme, and adaptive critic designs. The first two schemes utilize the supervised learning method; the third utilizes the unsupervised learning method, whereas the fourth scheme can employ supervised or unsupervised learning methods. In the sub-sections that follow, these four methods are briefly introduced and a brief comparison with the MPC scheme is made.

2.11.1 Direct Adaptive Control

Direct adaptive control is an offspring of model reference adaptive control (MRAC) with the incorporation of a neural network (NN), as illustrated in Fig. 2.17. The implicit assumption is that the designer is sufficiently familiar with the system under consideration [Narendra and Annaswamy, 1989]. “Direct” implies that the controller is a NN used without identifying a model of the system ([Haykin, 1999], [Nørgaard et al., 2000], [Spooner et al., 2002]). In this scheme, the adaptation mechanism is designed to adjust the NN, causing it to match some unknown nonlinear controller that will stabilize the system and make the closed-loop system achieve its performance objective.

The control technique adjusts the controller characteristics so that the controller/system combination performs like a reference model. The desired response Ŷ(k) for the plant is supplied by the output of the stable reference model, which is produced in response to the reference R(k). The desired response Ŷ(k) is therefore a function of the reference signal and the state of the reference model.

This form of controller design has been studied extensively (see for example [Levin and Narendra, 1993], [Narendra and Parthasarathy, 1990], and [Narendra and Parthasarathy, 1992]). Over the years, several configurations of the controller have resulted in several different control schemes with different names (for example, see [Gupta et al., 2003], [Hines, 1997], [Omidvar and Elliot, 1997], [Nørgaard et al., 2000], [Sarangapani, 2006], [Spooner et al., 2002]). In the following, seven types of direct adaptive control techniques that are widely used in industrial control applications are briefly discussed together with their advantages and disadvantages.

Fig. 2.17: Model reference adaptive control scheme: U(k) is the control input, R(k) is the desired reference, ε(k) is the error between the reference model output Ŷ(k) and the system output Y(k).


2.11.1.1 Direct Inverse Control with Generalized Training

In the direct inverse control with generalized training scheme ([Nørgaard et al., 2000] and [Gupta et al., 2003]), the neural network is trained off-line to minimize the mean error between a control signal applied to the system in an initial experiment and the control signal produced by the neural network. After the neural network correctly models the inverse of the system, it is used as a forward controller. Although the closed-loop response is fast with little computational load, the resulting deadbeat response is not always a good choice. In addition, more a priori knowledge about the system is required for control purposes.
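The following is a minimal sketch of generalized inverse training under simplifying assumptions: a first-order linear plant and a linear-in-parameters map standing in for the neural network; the plant coefficients and the regressor choice are illustrative.

import numpy as np

# Initial open-loop experiment on a toy first-order plant.
rng = np.random.default_rng(0)
a, b = 0.7, 0.4                            # plant: y(k+1) = a*y(k) + b*u(k)
ys, us = [0.0], []
for k in range(200):
    u = rng.standard_normal()
    us.append(u)
    ys.append(a * ys[-1] + b * u)

# Generalized training: fit the inverse map (y(k+1), y(k)) -> u(k) so that the
# mean error between applied and reproduced control signals is minimized.
Phi = np.array([[ys[k + 1], ys[k]] for k in range(200)])
w = np.linalg.lstsq(Phi, np.array(us), rcond=None)[0]

# Use the trained inverse as a forward controller: request y(k+1) = r.
y, r = 0.0, 1.0
for k in range(10):
    u = w @ np.array([r, y])               # reference inserted for the future output
    y = a * y + b * u
print(round(y, 3))                          # reaches r in one step (deadbeat behaviour)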

2.11.1.2 Direct Inverse Control with Specialized Training

The direct inverse control scheme with specialized training, unlike the direct inverse control scheme with generalized training, utilizes a recursive network as the inverse model, and the training is done on-line. The recursive training paradigm follows from the recursive pseudo-linear regression method described in [Ljung, 1999]. With this training method the controller can be optimized for a specific trajectory, and the method is suitable for time-varying systems. However, there are several drawbacks with this approach, namely: 1) the method does not work for systems with an unstable inverse, which often occurs when using high sampling frequencies; 2) there is a lack of tuning options; and 3) the controller shows high sensitivity to disturbances and noise. A detailed treatment of this control method can be found in [Nørgaard et al., 2000].

2.11.1.3 Indirect Inverse Control

Unlike the two direct inverse schemes above, where the network is trained to model the inverse of the system, in the indirect inverse control scheme the neural network is first trained as a forward model of the system (in the general case). Next, the trained network is inverted and used as a controller. This method is also subject to the problems of the direct inverse methods mentioned above. An additional requirement is that the inverted model (i.e. the controller) must be well-damped.

2.11.1.4 Internal Model Control (IMC)

Internal model control (IMC) based on neural networks requires a forward model as well as the inverse model of the system. The principle of the IMC with an output disturbance d(k) is illustrated in Fig. 2.18. Note that, instead of a direct output feedback, the error between the system output and the model output is fed back. If the model is perfect and no disturbances are acting on the system, the feedback signal will be zero. The controller will then be a feedforward from the reference R(k). The concept and a detailed treatment of IMC can be found in [Morari and Zafiriou, 1989]. The NN-based approach to IMC is discussed in [Nørgaard et al., 2000].

IMC has been widely used for the control of chemical processes and other applications. It is a restrictive class of control algorithms with several limitations. Due to the structure of the IMC, it is difficult to ensure that the inverse model is trained on a realistic data set. Moreover, IMC requires the system to be open-loop stable, and it is difficult to impose constraints.

2.11.1.5 Feedback Linearization

Feedback linearization was proposed as a method for designing pole placement types of controllers for a particular class of nonlinear systems. Feedback linearization is commonly formulated in a continuous-time framework and expressed in a canonical form. Thus, the neural network used for modeling the system must have a particular structure in order to implement the controller. This type of controller is closely related to the model-reference adaptive controller, in which a nonlinear controller is designed such that the closed-loop system behaves linearly according to a specified transfer function model. The consequence is that feedback linearization controllers are subject to the limitations and problems of the direct and indirect inverse control schemes. Feedback linearization is simple to implement because only a model of the system is required, and tuning of the closed loop is possible without retraining the network. The method, however, does not provide any additional parameters for tuning the controller. Since it addresses a restricted class of control techniques, it is difficult to determine whether an unknown system belongs to this class. In addition, model structure selection is complicated since two neural networks are usually used to retain the canonical nature of the system.

Fig. 2.18: The principle of internal model control (IMC) implemented with two neural networks: a model of the system (M) and an inverse model acting as the controller (C), with a disturbance d(k) acting on the output of the system.


2.11.1.6 Feedforward Control

Unlike the adaptive controllers discussed until now, the feedforward controller does not have feedback but is governed only by the reference signal. However, feedback can be included, with the feedforward used for reference tracking and the feedback used for stabilizing the system and suppressing disturbances. The following three types of feedforward control exist [Nørgaard et al., 2000]: 1) static feedforward, where the feedforward controller is governed only by the reference signal without feedback, usable when stability problems do not exist; 2) dynamic feedforward, where the feedforward controller is an inverse model, with the inverse-model-related problems discussed above; and 3) steady-state feedforward, where the feedforward controller is a function of the steady-state reference, returning the steady-state gain of the inverse model multiplied by the reference.

Although the implementation of the feedforward controller is simple and reference tracking is improved without any increase in noise sensitivity, a fundamental requirement is that a feedback controller be present. This control method does not reduce the effect of disturbances acting on the system, and a poor feedback design may reduce the performance of the system instead of enhancing it.

2.11.1.7 Optimal Control

Optimal control is a goal-directed control technique that deals with the problem of finding a control law that minimizes a certain criterion in order to obtain the control input or signal. In contrast with the control techniques discussed until now, which amount to pole placement with full zero cancellation and can result in poorly damped or unstable systems, optimal control is an optimization strategy which uses a specialized recursive algorithm to train the neural network. The optimal control strategy minimizes a criterion of the following form:

J = \sum_{p=1}^{N} \left\{ \left[ R_p(k) - Y_p(k) \right]^2 + \rho\, U_p^2(k) \right\}, \qquad \rho \ge 0   (2.93)

where R(k), U(k), Y(k) and ρ are the desired reference, the system input, the system output, and a tuning parameter for penalizing changes in the control input, respectively; N specifies the length (finite horizon) of the desired reference at each time step k.

The controller is implemented as an extension to the specialized training of the direct inverse model controller. The extension consists of the second term in (2.93), which penalizes squared control inputs at the price of some deterioration in reference tracking. Compared with the direct and indirect inverse model controllers, the goal-directed optimal controller is easy to tune, applicable to a large class of time-varying systems and suitable for designing controllers with a specified reference trajectory. A major drawback is that the network must be trained again each time the penalty factor ρ is changed or modified. Because the network is trained on-line, initialization of the network is difficult.


2.11.2 Indirect Adaptive Control

In the direct adaptive control discussed above, the neural network (NN) is the controller itself and there is no method to directly adjust the controller weights with the purpose of reducing the error. Although its implementation is simple, the on-line retraining of the network every time a design parameter is modified is difficult. The design of indirect adaptive control, by contrast, is always model-based. The basic principle of the indirect adaptive control scheme is illustrated in Fig. 2.19. This scheme usually employs a NN to model the physical system, which from now on will be called the identified NN model. Then, this model is employed for controller design based on a NN model of the physical system. The identified NN model is trained in advance, but it then becomes a part of the control strategy, which involves the computation of the output error between the physical system and the prediction of the NN model every time an updated measurement of the output of the physical system is received. Next, the adjustment of its parameters is performed so that the error is minimized. By using the modified NN model, new controller parameters are computed so that the deviation of the system output from the required reference is also minimized ([Hines, 1997], [Tsoukalas and Uhrig, 1997]).

The influence of the nonlinearity of the NN model has led to different types of indirect controller designs and implementations for different applications. The different architectures for indirect adaptive controller design which have been reported in the literature ([Gupta et al., 2003], [Hines, 1997], [Omidvar and Elliot, 1997], [Nørgaard et al., 2000], [Sarangapani, 2006], [Spooner et al., 2002]) can be grouped into two main classes, namely: indirect adaptive control design based on finding a linear approximation of the physical system, and indirect adaptive control design based on the tuning of the initial nonlinear NN model every time an updated output measurement becomes available. The former class comprises the pole placement controller, generalized minimum variance control (GMVC) and the well-known generalized predictive controller (GPC), whereas the latter class comprises the nonlinear predictive controller (NPC). These two classes are briefly discussed in the following two sub-sections and some remarks on their performance are made.

Fig. 2.19: Indirect model-based adaptive control scheme: U(k) is the control input, R(k) is the desired reference, ε(k) is the error between the prediction model output Ŷ(k) and the system output Y(k).


In the indirect adaptive control scheme shown in Fig. 2.19, the neural identification model is used to model the system, which is assumed here to be nonlinear. If necessary, this model may be updated to track the system. The error signal is then backpropagated through the identification model to train the neural network controller so that the system response is equal to that of the reference model. Note that this method uses two neural networks, one for model or system identification and the other for model reference adaptive control (MRAC) [Hines, 1997].

2.11.2.1 Indirect Adaptive Control Based on Instantaneous Linearization

The idea behind instantaneous linearization is to extract a linear model from the nonlinear NN model at each sampling instant. The linearization of nonlinear models is a technique often used to simplify the design of controllers for nonlinear systems. The basic principle of controller design based on instantaneous linearization is shown in Fig. 2.20. This control scheme is adapted from Fig. 2.19 except that the nonlinear NN model is extracted and linearized around a specific operating regime. The characteristics of the nonlinearities and the size of the operating regime determine whether it is sufficient to use a single linear model, or whether it is necessary to linearize around a larger set of operating points with banks of controllers and some switching rules, as in the so-called gain scheduling controllers [Åström and Wittenmark, 1995].
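As a concrete illustration, the following sketch extracts the local linear coefficients from a one-hidden-layer NN model by numerical differentiation of the model output with respect to the regressor at the current operating point; the network, its random weights and the regressor values are illustrative assumptions, and an analytic derivative would normally replace the finite differences.

import numpy as np

def nn_model(phi, W1, b1, W2, b2):
    # One-hidden-layer NN model y_hat = f(phi) of the regressor phi.
    return float(W2 @ np.tanh(W1 @ phi + b1) + b2)

def instantaneous_linearization(f, phi, eps=1e-6):
    # Local coefficients d y_hat / d phi_i at the current operating point,
    # obtained here by central finite differences.
    coeffs = np.zeros_like(phi)
    for i in range(phi.size):
        d = np.zeros_like(phi)
        d[i] = eps
        coeffs[i] = (f(phi + d) - f(phi - d)) / (2.0 * eps)
    return coeffs

rng = np.random.default_rng(0)
W1 = 0.5 * rng.standard_normal((6, 4)); b1 = np.zeros(6)
W2 = 0.5 * rng.standard_normal((1, 6)); b2 = np.zeros(1)
f = lambda phi: nn_model(phi, W1, b1, W2, b2)
phi_now = np.array([0.8, 0.7, 0.2, 0.1])   # [y(k-1), y(k-2), u(k-1), u(k-2)]
print(instantaneous_linearization(f, phi_now))   # extracted linear model parameters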

Assuming that a nonlinear NN model based on any of the structures described in Section 2.5 has been obtained through an off-line training process, instantaneous linearization has been applied in [Nørgaard et al., 2000] to directly obtain an approximate pole placement controller with compensation for disturbances based on the so-called internal model principle. In the pole placement scheme, the desired behaviour is specified in terms of a transfer function or a set of closed-loop poles, with possible oscillation- and damping-related problems.

Another type of controller design based on instantaneous linearization is the so-called generalized minimum variance controller (GMVC). Unlike the pole placement technique, the GMVC is specified in terms of a criterion which is minimized to obtain the control signal U(k).

Fig. 2.20: Indirect control based on instantaneous linearization of the neural network model.

The criterion-based design may be advantageous in that tuning becomes simple and intuitive. It appears to be a natural way of designing controllers for stochastic systems in the context of self-tuning regulators [Åström and Wittenmark, 1995]. The basic GMVC is usually designed to solve a servo problem rather than only a regulation problem. Assuming that the NNARMAX model discussed in Section 2.5.2.2 has been obtained, the objective might be to compensate for stochastic disturbances. To cover a wider class of systems, the criterion that is often minimized by the GMVC is of the form:

J_G(k) = E\left\{ \left[ P_G(z^{-1}) \hat{Y}(k+d) - W_G(z^{-1}) R(k) \right]^2 + \left[ Q_G(z^{-1}) U(k) \right]^2 \,\middle|\, I_k \right\}   (2.93)

where P_G, W_G and Q_G are rational transfer functions, and I_k = \{ Y(k), Y(k-1), \ldots, Y(0), U(k), U(k-1), \ldots, U(0) \} is the information set. A simplified form of (2.93) is given in [Isermann et al., 1992] and [Nørgaard et al., 2000] as:

J(k) = E\left\{ \left[ \hat{Y}(k+d) - R(k) \right]^2 + \rho \left[ \Delta U(k) \right]^2 \,\middle|\, I_k \right\}   (2.94)

The minimizing solution for (2.94) can be expressed as:

\bar{R}_G(z^{-1}) = \Delta R_G(z^{-1}) B(z^{-1}) + \rho\, \Delta C(z^{-1}), \qquad T_G(z^{-1}) = C(z^{-1})   (2.95)

where R_G(z^{-1}) and S_G(z^{-1}) can be found from the following Diophantine equation [Åström and Wittenmark, 1995]:

C(z^{-1}) = A(z^{-1})\, \Delta R_G(z^{-1}) + z^{-d} S_G(z^{-1})

with \deg(R_G) = d - 1 and \deg(S_G) = \max(n_a, n_c) - d.

The GMV control is a fairly general approach with many design parameters and can be used to control systems with an unstable inverse by choosing suitable design parameters. The reason for penalizing the differenced control inputs is to accomplish the desired integral action; that is, Δ becomes a factor of \bar{R}_G(z^{-1}). This scheme is well-suited for real-time control in that the linearization and the controller design can be performed in between time samples. It also provides a useful physical interpretation of the dynamics of the system. A major disadvantage of this method is that the linearized model might be valid only in a very narrow region around the current operating point.

The third class of controllers based on linearization considered here is the so-called generalized predictive control (GPC) proposed by Clarke and co-workers [Clarke et al., 1987]. The GPC is a class of model predictive control (MPC) algorithms; it was proposed in the context of adaptive control and has been widely used in many advanced and industrial control applications with successful results. It is a criterion-based design and follows the basic structure shown in Fig. 2.33. The GPC includes an integral action to account for regular constant disturbances as well as for an operating-point-dependent bias component, which shall be defined as d(k'). This is achieved by 1) considering a criterion in which the differenced control inputs are weighted, and 2) a careful selection of the predictor. The general form of the criterion minimized by the GPC can be expressed as:

J(U(k)) = \left[ R(k) - \hat{Y}(k) \right]^T \left[ R(k) - \hat{Y}(k) \right] + \rho\, \tilde{U}^T(k)\, \tilde{U}(k)   (2.96)

where \tilde{U}(k) denotes the vector of differenced control inputs. Note that the difference between (2.2) and (2.96) is the absence of the weighting matrix κ in (2.96) for penalizing changes on \hat{Y}(k).

Several strategies for the implementation of GPC have been exploited and there is a rich literature on the subject. The GPC can be tuned by quite intuitive means, is well-suited for controlling systems with time delay, and can stabilize unstable systems or systems for which the inverse is unstable. It compensates for disturbances with known characteristics and can handle input and/or output constraints in a natural way. The major drawback of the GPC is that it relies on linearized models which may be valid only in a certain operating regime of the system under control. In addition, the GPC may suffer from sensitivity to noise.
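For illustration, the following is a minimal sketch of the unconstrained GPC move computation for a first-order linear model; the model, horizon and weighting are illustrative assumptions, and the constrained, adaptive formulation proposed in this work (Chapter 4) is considerably more involved.

import numpy as np

def gpc_control(a, b, y_now, u_prev, r, N=5, rho=0.1):
    # Unconstrained GPC sketch for y(k+1) = a*y(k) + b*u(k): minimize
    # (R - Y_hat)'(R - Y_hat) + rho*dU'dU over the vector dU of control moves.
    f = np.zeros(N)                    # free response with u held at u_prev
    y = y_now
    for i in range(N):
        y = a * y + b * u_prev
        f[i] = y
    # Step-response coefficients of this first-order model: s_i = b*(1+a+...+a^i).
    s = np.array([b * sum(a ** j for j in range(i + 1)) for i in range(N)])
    G = np.zeros((N, N))               # dynamic matrix mapping dU to predictions
    for i in range(N):
        for j in range(i + 1):
            G[i, j] = s[i - j]
    R = np.full(N, r)
    dU = np.linalg.solve(G.T @ G + rho * np.eye(N), G.T @ (R - f))
    return u_prev + dU[0]              # receding horizon: apply the first move only

y, u = 0.0, 0.0
for k in range(20):
    u = gpc_control(0.9, 0.5, y, u, r=1.0)
    y = 0.9 * y + 0.5 * u
print(round(y, 3))                     # settles near the reference 1.0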

A new adaptive generalized predictive control (AGPC) strategy is proposed in this work which is derived

from the GPC controller. The proposed strategy is presented in Chapter 4.

2.11.2.2 Nonlinear Predictive Control

The nonlinear predictive control (NPC) is also a class of model predictive control (MPC) algorithms and has the same basic structure as that depicted in Fig. 2.19. It is an outgrowth of GPC and can have the same criterion defined by (2.96) in the general case, but it employs the nonlinear NN model directly for control. The minimization of the NPC criterion for output predictions and control input computation constitutes a complex nonlinear optimization problem. When real-time issues are taken into account, the algorithm computation must be completed within the sampling period of the measurements of the system outputs. The sampling period is usually very short for systems with short time constants. Thus the algorithm must be numerically robust and be able to run automatically, since there will be no operator interference or tuning parameter adjustments to achieve satisfactory convergence.

The NPC has advantages similar to those of the GPC, except that it can cover the entire operating range of nonlinear dynamic systems. However, the NPC is computationally demanding when compared to the GPC. The criterion may have several local minima and hence an efficient minimization algorithm is required. Ad hoc fixes for handling stopping, numerical problems, local minima, convergence, etc., might conflict with real-time demands if a global minimizer (i.e. the new control inputs) of the criterion cannot be found within the sampling period of the system under control.

Although extensive work on NPC (also called neuro-control) design and application has been reported ([Omidvar and Elliot, 1997], [Nørgaard et al., 2000], [Sarangapani, 2006], [Si et al., 2004], [Spooner et al., 2002]), it is still an active area of research with respect to the issues highlighted in Section 2.10. In Chapter 4 of this work, a new nonlinear adaptive model predictive control (NAMPC) algorithm is proposed, and its complete formulation as well as implementation details are also presented in that chapter.

2.11.3 Modular Neural Network (MNN) Controller Design

Modular neural networks perform a spatial clustering of the input space in order to select and dispatch the input vector to various modules. The most advanced algorithm in this respect is the local model network (LMN). The LMN consists of a number of single-layer NNs, each having the same architecture and trained to model the system behavior around a local operating point (local plant), together with a gating system often composed of radial basis functions (RBFs). The various local models are activated by a time-varying input vector X that usually corresponds to the vector of parameters of the NARMAX model given in (2.8). Note that the NARMAX model is a lagged version of the NARX model, and other model structures can be used for this purpose as well.
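The following is a minimal sketch of this gating idea, assuming Gaussian (RBF-type) validity functions and two local linear models with illustrative centers and parameters; the output is the normalized, gated combination of the local model outputs.

import numpy as np

def lmn_predict(x, centers, widths, local_params):
    # RBF validity functions gate a set of local linear models; the output
    # is their normalized weighted sum.
    rho = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * widths ** 2))
    w = rho / rho.sum()                           # normalized gating weights
    y_locals = local_params @ np.append(x, 1.0)   # each row: [theta_i, bias_i]
    return w @ y_locals

centers = np.array([[0.0, 0.0], [1.0, 1.0]])      # local operating points
widths = np.array([0.5, 0.5])
local_params = np.array([[0.9, 0.4, 0.0],         # local model around (0, 0)
                         [0.6, 0.7, 0.1]])        # local model around (1, 1)
x = np.array([0.2, 0.1])                          # regressor, e.g. [y(k-1), u(k-1)]
print(lmn_predict(x, centers, widths, local_params))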

The construction of the local models into a multiple-model structure, and the controller structure including possible switching techniques, are discussed in [Ronco and Gawthrop, 1997]. A comprehensive treatment of the

modular and multiple model methods with several extensions and remarks on their implementation can be found

in the book “Predictive Modular Neural Networks” [Petridis and Kehagias, 1998]. This form of control has been

extended to include fuzzy rule-based logic to enhance switching between multiple models and controllers as

discussed in [Petridis and Kehagias, 1998] and [Ronco and Gawthrop, 1997]. Recent extensions of the multiple

models technique to adaptive control using neural networks have also been reported [Chen and Narendra, 2001].

2.11.4 Back-Propagation Through Time (BPTT) Control Scheme

The backpropagation through time (BPTT) algorithm, besides being used for recurrent neural network training, can also be used for neural control [Werbos, 1990]. The BPTT can be used to move a system from one state to another state in a finite number of steps, provided that the system is controllable. The typical structure of this control scheme is shown in Fig. 2.21, where X is the state vector, U is the control signal, C is the controller, P is the plant model, X′(k) is the desired target state, and E(k) is the error signal.

Fig. 2.21: Basic structure of the backpropagation through time (BPTT) control scheme.

Generally, the BPTT training takes place in the following two steps:

1) The plant motion stage, where the plant model takes k time steps from the initial state X(0) to the state X(k), as shown in Fig. 2.21.

2) The weight adjustment stage, where the weights of the controller are adjusted to make the final state approach the target state.

In the BPTT scheme, the neural network must be trained so that the error signals can be propagated through it to the controller; then the controller can be trained with the BPTT algorithm. Note that there is only one set of weights to be adjusted since there is only one controller. Thus, several iterations are executed until the desired performance is achieved.
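The sketch below illustrates the two stages for a toy linear plant model with a linear state-feedback map standing in for the NN controller; the plant matrices, horizon and learning rate are illustrative assumptions, and a finite-difference gradient stands in for the gradient that BPTT would propagate back through the unfolded controller-plant chain of Fig. 2.21.

import numpy as np

def rollout(K, x0, A, B, n_steps):
    # Plant motion stage: unfold the controller-plant pair for n_steps,
    # with u(k) = -K.x(k) and x(k+1) = A x(k) + B u(k).
    x = x0.copy()
    for _ in range(n_steps):
        x = A @ x + B * (-(K @ x))
    return x

def bptt_like_training(x0, x_target, A, B, n_steps=10, lr=0.02, iters=300):
    # Weight adjustment stage: adjust the single shared set of controller
    # weights K so that the final state approaches the target state; a
    # central finite difference stands in for the backpropagated gradient.
    K, eps = np.zeros(2), 1e-5
    for _ in range(iters):
        grad = np.zeros_like(K)
        for i in range(K.size):
            d = np.zeros_like(K); d[i] = eps
            e_p = np.sum((rollout(K + d, x0, A, B, n_steps) - x_target) ** 2)
            e_m = np.sum((rollout(K - d, x0, A, B, n_steps) - x_target) ** 2)
            grad[i] = (e_p - e_m) / (2.0 * eps)
        K -= lr * grad
    return K

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double-integrator plant model
B = np.array([0.0, 0.1])
K = bptt_like_training(np.array([1.0, 0.0]), np.zeros(2), A, B)
print(rollout(K, np.array([1.0, 0.0]), A, B, 10))   # final state driven toward target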

2.11.5 Neural Network-Based Adaptive Critic Designs

Neural network-based adaptive critic designs can be supervised or unsupervised [Sarangapani, 2006]. However, in the more general case, a typical neural network-based adaptive critic design (ACD) consists of three neural networks: the critic network (for evaluation), the model network (for prediction), and the action network (for decision), as well as the real plant, as illustrated in the action-dependent heuristic dynamic programming (ADHDP) form of ACD ([Liu, 2002]; [Mohagheghi et al., 2006]) shown in Fig. 2.22. Consider a discrete-time nonlinear time-varying system described by the following equation:

x(k+1) = F[x(k), u(k), k]   (2.97)

where x ∈ ℝⁿ is the state vector and u ∈ ℝᵐ denotes the control action.

Fig. 2.22: The structure of an action-dependent heuristic dynamic programming form of adaptive critic design.

Suppose that the performance index or cost associated with the system (2.97) can be expressed as

J(x(\tau), \tau) = \sum_{p=\tau}^{\infty} \gamma^{p-\tau}\, U[x(p), u(p), p]   (2.98)

where U is the utility or local cost function and γ ∈ [0, 1] is the discount factor. Note that J depends on the initial time τ and the initial state x(τ), and it is referred to as the cost-to-go of state x(τ). The objective then is to choose the control sequence u(p), p = τ, τ+1, τ+2, …, such that the function J in (2.98) is minimized.

The training of the critic network is achieved by minimizing the error function of the following form:

E_c(k) = \sum_k \left[ \hat{J}(k) - U(k) - \gamma \hat{J}(k+1) \right]   (2.99)

where \hat{J}(k) = \hat{J}[x(k), u(k), k, W_c] and W_c denotes the NN parameters (i.e. the weights) of the critic network. If E_c(k) = 0 for all k, then (2.99) takes the form:

\hat{J}(k) = U(k) + \gamma \hat{J}(k+1) = \sum_{p=k}^{\infty} \gamma^{p-k}\, U(p)   (2.100)

Thus, training a NN which minimizes (2.99) produces an estimate of the cost-to-go (2.98) in the form defined by (2.100). The minimization of (2.99) can be achieved with the standard back-propagation algorithm.
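A minimal sketch of this critic update is given below, assuming for brevity a critic that is linear in its parameters, \hat{J}(k) = W_c · φ(k), and a semi-gradient step in which the bootstrapped target U(k) + γ\hat{J}(k+1) is held fixed; the feature vectors and step size are illustrative assumptions.

import numpy as np

def critic_update(Wc, phi_k, phi_k1, utility, gamma=0.95, lr=0.1):
    # One gradient step on the squared critic error of (2.99) for a critic
    # that is linear in its parameters, J_hat(k) = Wc . phi(k). The target
    # utility + gamma*J_hat(k+1) is held fixed (a semi-gradient step).
    e_c = (Wc @ phi_k) - utility - gamma * (Wc @ phi_k1)
    return Wc - lr * 2.0 * e_c * phi_k

Wc = np.zeros(3)                                 # critic weights
phi_k = np.array([1.0, 0.5, 0.2])                # features of (x(k), u(k))
phi_k1 = np.array([0.9, 0.4, 0.2])               # features of (x(k+1), u(k+1))
for _ in range(500):
    Wc = critic_update(Wc, phi_k, phi_k1, utility=0.3)
print(round(Wc @ phi_k - 0.3 - 0.95 * (Wc @ phi_k1), 6))   # TD error driven to ~0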

Next, the action network is trained with the objective of minimizing \hat{J}(k+1) through the use of the action signal u(k) = u[x(k), k, W_A]. At the end of the training process, the action network generates control signals which are optimal. The training of the action network is done through weight updates while keeping the weights of the critic and the model networks fixed.

Most often in ACDs, decisions have to be taken without an accurate assessment of the effectiveness of the control actions. The critic network evaluates the result of each control action from the action network: if it is good, the action is reinforced, and if it is poor, the action is weakened. This way of training the action network is called reinforcement learning [Sutton and Barto, 1998]. It is essentially a trial-and-error method which uses active exploration when the gradient of the evaluation function with respect to the control action is not available. Several techniques for training neural network-based ACDs are discussed in [Omidvar et al., 1997], [Prokhorov, 2007], [Sarangapani, 2006] and [Si et al., 2004].

2.12 State-of-the-Art in Neural Network-Based MPC: Neural Network Training,

Model Identification, Adaptive Control and MPC Implementation

This sub-section critically examines the main concepts of all the material discussed in this chapter. A state-of-the-art overview of neural network-based MPC strategies is given in terms of neural network training, model identification techniques, neural network-based MPC schemes, and MPC implementation techniques. The successes and drawbacks of these approaches are presented and discussed. Then, the new strategies proposed in this research are presented, which aim to improve on the reported successes while correcting the reported drawbacks.

2.12.1 Neural Network and Training Methods

As it has been widely discussed and demonstrated in [Narendra and Parthasarathy, 1990], [Nørgaard et

al., 2000], [Omidvar and Elliott, 1997], [Spooner et al., 2002] and [Chiong, 2010], the accuracy of a nonlinear NN

model depends on the structure of the nonlinear function that will be used to describe the dynamics of the system

to be identified; the NN architecture; the selection of the inputs to the NN; the NN training algorithm; and the NN

model identification scheme.

Over time, different architectures of NNs have evolved (see [Gupta et al., 2003], [Hagan et al., 1996] and [Haykin, 1999]) and it has been shown that a multilayer perceptron (MLP) NN with one hidden layer and an output layer is capable of approximating any continuous nonlinear function reasonably well, to an arbitrary degree of accuracy. The most widely used NN architecture for dynamic system modeling is the dynamic feedforward NN (DFNN) ([Yüzgeç et al., 2008]; [Colin et al., 2007]; [Samek and Macku, 2008]; [Pan and Wang, 2008]). The use of recurrent NNs (RNNs) for modeling nonlinear dynamic systems has also been reported in several research papers, such as in [Lu and Tsai, 2008], [Pan and Wang, 2008], [Wang and Thomas, 2006] and [Zhang and Quan, 2009].
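For reference, the forward pass of such a one-hidden-layer MLP is sketched below; the layer sizes, the tanh hidden activation and the random weights are illustrative assumptions.

import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    # One-hidden-layer MLP: y = W2 * tanh(W1 x + b1) + b2.
    return W2 @ np.tanh(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
W1 = 0.5 * rng.standard_normal((8, 3)); b1 = np.zeros(8)   # 3 inputs, 8 hidden units
W2 = 0.5 * rng.standard_normal((1, 8)); b2 = np.zeros(1)   # 1 output unit
print(mlp_forward(np.array([0.1, -0.2, 0.3]), W1, b1, W2, b2))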

On the other hand, recurrent neural networks (RNNs) are more powerful than DFNNs because they

contain the basic FNN structure with feedback connections from the output to the input layer via a state layer (the

so-called Jordan network [Pearlmutter, 1995]) or from the output unit to the input unit of the hidden layer via a

context layer as in the so-called Elman network [Song, 2010]. However, training these networks presents

difficulties due to their feedback structures ([Bengio et al., 1994]; [Pearlmutter, 1990 and 1995]). The usual

methods for training RNNs are: 1) the real-time recurrent learning (RTRL) discussed in [Williams and Zipser,

1989] with further extensions in [Pearlmutter, 1995] is an unrestrictive on-line, exact, stable but computationally

expensive method for determining the derivatives of the state functions of a dynamic system with respect to the

internal parameters of the system; and 2) the backpropagation through time (BPTT) [Werbos, 1990] where the

network is unfolded into a multilayer feedforward network that increases by one at each time step with growing

memory requirements as shown in [Pearlmutter, 1995] and [Song, 2010]. Both RTRL and BPTT are variations of

the backpropagation (BP) algorithm originally proposed by [Rumelhart et al., 1986] as a basic gradient descent

algorithm detailed in [Boyd and Vandenberghe, 2007], [Antoniou and Lu, 2007], [Dennis and Schnabel, 1996],

and [Kelley, 1995 and 1999].

However, Williams and Zipser [Williams and Zipser, 1989] extended the RTRL to the teacher-forced RTRL method. According to this method, the feedbacks to the network input (state layer) in subsequent computations are the actual outputs from the system rather than the network outputs. This architecture places the system in parallel with the NN model, resulting in a series-parallel model structure which has been shown in [Narendra and Parthasarathy, 1990], [Omidvar and Elliott, 1997] and [Spooner et al., 2002] to be suitable for stable adaptive control system design. By teacher-forcing the output units, no real recurrent paths exist, so that the basic BP through a single time step can be used for training the resulting network in a feedforward fashion. The main disadvantage of this method (i.e., that the state to be retrained must be maintained by the network across time) is alleviated in control applications if a small stack, proposed in [Dias et al., 2005], or a sliding window, proposed in [Pearlmutter, 1995] and [Savran et al., 2006], of the output values at previous time steps is cycled back to the state layer during network training.

2.12.2 Neural Network Model Identification for MPC Design

Neural network model identification for use in MPC applications has been reported using feedforward

neural network (FNN) ([Colin et al., 2007]; [Pan and Wang, 2008]; [Samek and Macku, 2008]; [Yüzgeç et al.,

2008]) and recurrent neural network (RNN) ([Jin and Su, 2008]; [Lu and Tsai, 2008]; [Wang and Thomas, 2006];

[Zhang and Quan, 2009]). The MPC methods reported in these papers all use the linearized form of the identified nonlinear NN model and employ the generalized predictive control (GPC) techniques originally proposed by Clarke and co-workers [Clarke et al., 1987a and 1987b], except that in [Lu and Tsai, 2008] the GPC control actions are updated online in real-time, for a relatively slow process with a long sampling time, using a recursive least squares (RLS) adaptation rule obtained from a network trained with the basic BP algorithm. Although the GPC was proposed by Clarke and co-workers [Clarke et al., 1987a and 1987b] in the context of adaptive control, it is well known that the control performance can degrade if the system is operated outside its valid linear operating region.

Since 1990, the design of control algorithms for nonlinear dynamical systems which is based on the use

of neural network (NN) models of the plant dynamics has been studied extensively with some successful results

reported in research papers such as the ones in [Chiong, 2010], [Narendra and Parthasarathy, 1990], [Narendra

and Parthasarathy, 1992], [Nørgaard et al., 2000], [Petridis and Kehagias, 1998] and [Ronco and Gawthrop,

1997]. In these cases, the major issues that arise are the structure of the nonlinear function that will be used to

describe the dynamics of the system, the NN architecture, the training algorithm for adjusting the NN parameters,

the way the controller and the NN are used within the control loop, and how the control algorithm will be

updated. Neural network (NN) parameters, for instance, are the weights and biases of the network. The adjustment of the NN parameters is achieved by minimizing the errors between the outputs of the neural network and the desired outputs ([Gupta et al., 2003]; [Hagan, 1996]; [Haykin, 1999]). This minimization is called the neural network training process and is performed by a training algorithm.


Over the years, different NN architectures and training algorithms have been proposed in several research

papers such as those in ([Chiong, 2010]; [Hagan and Menhaj, 1994]; [Haykin, 1999]; [Wu, 2008]; [Narendra and

Parthasarathy, 1990]; [Nørgaard et al., 2000]; [Petridis and Kehagias, 1998]; [Prokhorov, 2007]; [Rumelhart et al.,

1986]; [Werbos, 1990]; [Williams and Zipser, 1989]). Among these, the most common NN architecture is the

feedforward NN (FNN) while the backpropagation (BP) algorithm ([Rumelhart et al., 1986]; [Werbos, 1990]) and

the Levenberg-Marquardt algorithm (LMA) ([Hagan and Menhaj, 1994]; [Marquardt, 1963]) are two common

NN training algorithms. Although the BP algorithm has been widely used in training FNNs due to the simplicity and ease of its implementation, it has been reported to be characterized by large numbers of network parameters, poor convergence, long training cycles (epochs) and lack of robustness when compared to the Levenberg-Marquardt algorithm (LMA).

Another method which is increasingly being used for model identification or parameter estimation incorporating neural networks is the recursive least squares (RLS) algorithm. While the backpropagation and the Levenberg-Marquardt algorithms evaluate the entire training data set at each epoch in an off-line manner, the recursive least squares algorithms evaluate each input-output data pair as new measurements are acquired, in an on-line manner. Thus, RLS algorithms have been widely used in adaptive control, adaptive filtering, adaptive signal processing, adaptive prediction, etc. ([Goodwin and Sin, 1984], [Ljung, 1999]). The major challenge in implementing the RLS algorithm is that the on-line computation must be organized in such a way that the processing of the measurements from one sampling instant is completed within one sampling interval with certainty. The use of RLS for on-line model and parameter estimation has been studied

extensively ([Chen, 2009], [Goodwin and Sin, 1984], [Ljung, 1999], [Salgado et al., 1988]). The extensions of the

RLS with neural networks are recent developments ([Bouchard, 2001], [Lu and Tsai, 2008], [Mirikitani and

Nikolaev, 2007]). The integration of the RLS into the neural network framework has resulted in several formulations of neural network-based RLS algorithms [Bouchard, 2001]. Attempts have also been made to incorporate the RLS algorithm into the Levenberg-Marquardt algorithm for training recurrent neural networks [Mirikitani and Nikolaev, 2007]. A close study of the algorithm proposed in [Mirikitani and Nikolaev, 2007] reveals that it is basically the Levenberg-Marquardt algorithm with an extra regularization term and a recursive covariance update rule, at the expense of high computational complexity and requirements. However, as noted by Bouchard [Bouchard, 2001], the two key issues that arise from the use of neural network-based RLS algorithms for adaptive control applications are: 1) they do not solve the problem of finding a nonlinear model of the system that will be valid for most control signals, and 2) they are characterized by poor numerical robustness and slow convergence.

One way to overcome these problems is to formulate the RLS algorithm as an off-line model identification or parameter estimation algorithm, where the entire input-output data set is evaluated a specific number of times at each time sample as new measurements are acquired and added to the data set. Although this idea is not new in the general framework of RLS algorithm formulations ([Ljung, 1999], [Salgado et al., 1988]), it can be extended to the neural network schemes for formulating RLS algorithms. Since the neural network is a data-driven application, numerical robustness and convergence can be improved under the assumption that the computations will be completed within the sampling time of the system under consideration. Also, rather than training the network to infinity as in [Chen, 2009] or for long-term stability as in [Bouchard, 2001], a nonlinear model of the system can be identified for a specified number of iterations that will be suitable and valid for control applications. Since new measurements are acquired on-line at each time sample, the next issue to be addressed concerns the weight update for the time-varying system. The point here is that, as new measurements are acquired, old information must be discarded to avoid redundancy during the weight update, as this could lead to an inaccurate process model. The effect of an inaccurate process model is more severe in a model predictive control strategy, as is evident in [Bouchard, 2001], [Lu and Tsai, 2008], and [Mirikitani and Nikolaev, 2007].

The basic RLS algorithms are well known to have optimal properties when the parameters are time-invariant, in which case the gain of the RLS algorithm converges to zero ([Goodwin and Sin, 1984], [Narendra and Driollet, 2001]). In the neural network framework, the data must be properly excited in order for the RLS algorithm to track time-varying signals. However, as proposed in [Salgado et al., 1988], if an adaptive scheme is incorporated to discard obsolete information as new information is acquired, time-varying signals can be tracked accurately even when the data are poorly excited. In [Salgado et al., 1988], three methods for discarding obsolete information in basic RLS algorithms are proposed, namely: the forgetting factor method, the constant trace method, and the exponential forgetting and resetting method. Simulation studies in [Salgado et al., 1988] show that the exponential forgetting and resetting method is superior to the other two. With this method, the gain of the algorithm does not converge to zero.
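To make the discussion concrete, the following is a minimal sketch of a recursive least squares update with an exponential forgetting factor for a linear-in-the-parameters model; the scalar-output setting, variable names and forgetting factor value are illustrative assumptions, and this is not the ARLS algorithm proposed in this work.

import numpy as np

def rls_update(theta, P, phi, y, lam=0.98):
    # One RLS step with exponential forgetting: lam < 1 discounts old data.
    Pphi = P @ phi
    K = Pphi / (lam + phi @ Pphi)          # gain vector
    e = y - phi @ theta                    # prediction error
    theta = theta + K * e                  # parameter update
    P = (P - np.outer(K, Pphi)) / lam      # covariance update with forgetting
    return theta, P

# Example: track the parameters of y(k) = a*y(k-1) + b*u(k-1) + noise.
rng = np.random.default_rng(0)
a_true, b_true = 0.8, 1.5
theta = np.zeros(2)
P = 1e3 * np.eye(2)                        # large initial covariance = low confidence
y_prev, u_prev = 0.0, 0.0
for k in range(200):
    u = rng.standard_normal()              # persistently exciting input
    y = a_true * y_prev + b_true * u_prev + 0.01 * rng.standard_normal()
    phi = np.array([y_prev, u_prev])
    theta, P = rls_update(theta, P, phi, y)
    y_prev, u_prev = y, u
print(theta)                               # approaches [0.8, 1.5]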

Irrespective of whether the backpropagation (BP) algorithm, the Levenberg-Marquardt algorithm (LMA) or the recursive least squares (RLS) algorithm is used, an alternative way to train a NN with the purpose of modeling the behaviour of a controlled process could be to place the true system in parallel with the network, establish a feedback loop from the true system output to the NN inputs, and then apply the BP or LMA training algorithms. In the literature ([Gupta et al., 2003]; [Haykin, 1999]; [Pearlmutter, 1995]; [Williams and Zipser, 1989]), this approach is referred to as the teacher forcing method. As pointed out earlier and noted also in [8], the difficulty with this method is that the state to be retrained must be manifested throughout the network training process.

In this work, to overcome this problem, the FNN is augmented with appropriate tapped delay line (TDL) memory units to form a dynamic FNN (DFNN), while a short temporal window is also created to store not just the current states but also a few states from the previous time steps, in a first-in first-out fashion. Both the BP and the LMA algorithms are standard algorithms where the entire input-output data set is evaluated in batch mode at each epoch. Recursive least squares algorithms, on the other hand, are sequential algorithms where each input-output data pair is evaluated at each epoch. In order to adapt the standard algorithms for DFNN training, the same approximation obtainable in the sequential methods must be made. The main drawback in adapting standard algorithms for online training is the difficulty of inverting the Hessian matrix and of searching for a region where it is guaranteed to be positive definite. This drawback has been investigated extensively, and different techniques to solve this problem have been reported by many authors, including [Antoniou and Lu, 2007], [Boyd and Vandenberghe, 2007], [Chiong, 2010], [Dennis and Schnabel, 1996], [Fletcher, 1987], [Kelley, 1995 and 1999], [Marquardt, 1963], [Salgado et al., 1988], [Scales, 1985], [Wu, 2008].
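The following is a minimal sketch of these two memory mechanisms, assuming an NNARX-type regressor with two past outputs and two past inputs and a window of 20 patterns; the names and sizes are illustrative.

from collections import deque
import numpy as np

def make_regressor(y_hist, u_hist, na=2, nb=2):
    # Tapped-delay-line (TDL) regressor for a DFNN in NNARX form:
    # phi(k) = [y(k-1), ..., y(k-na), u(k-1), ..., u(k-nb)].
    return np.array(list(y_hist)[-na:][::-1] + list(u_hist)[-nb:][::-1])

# Short temporal window of training patterns, updated first-in first-out.
window = deque(maxlen=20)                    # holds (phi, y) training pairs
y_hist, u_hist = deque(maxlen=2), deque(maxlen=2)
for y0, u0 in [(0.0, 0.1), (0.05, -0.2)]:    # seed the delay lines
    y_hist.append(y0); u_hist.append(u0)

def on_new_sample(u, y):
    # Called at each sampling instant: form the regressor from the delay
    # lines, push the new pattern into the sliding stack (the oldest pattern
    # drops out automatically), then shift the delay lines.
    phi = make_regressor(y_hist, u_hist)
    window.append((phi, y))
    y_hist.append(y); u_hist.append(u)

on_new_sample(0.3, 0.12)
print(window[-1])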

Such a Hessian matrix is formed during the solution of the optimization problem that a training algorithm involves. In this work, a modified LMA (MLMA) is proposed which alleviates the above drawback and is adapted to the DFNN architecture. In addition, a new adaptive recursive least squares (ARLS) identification algorithm is proposed for training the DFNN.

2.12.3 Neural Network–Based MPC Algorithms

Furthermore, a nonlinear adaptive model predictive control (NAMPC) algorithm is presented which depends on the neural network model of the process obtained from the online identification of a nonlinear model of the controlled process. Then, by using the updated model, it solves the optimal control problem online in order to determine the control strategy from which the next control action will be applied. The model identification is based on a neural network which is trained by using data from the prior operation of the plant and then used online to adjust itself to monitored changes of the plant dynamics over time.

The use of a neural network model instead of the actual nonlinear plant model derived from first

principles and expressed by ordinary and/or partial differential equations simplifies considerably the computations

of MPC actions at every sampling instant and makes it appropriate for use in an adaptive MPC algorithm ([Colin

et al., 2007]; [Lu and Tsai, 2008]). The simulation of differential equations and dynamic models to obtain training and test data for neural network model development, for use in different applications, has been reported together with its merits ([Guarneri et al., 2008]; [Savran et al., 2006]). While this approach allows for perturbations of

the mathematical model to account for plant uncertainty, it also reduces computational burden for real-time

control implementation ([Colin et al., 2007]; [Lu and Tsai, 2008]; [Prokhorov, 2007]) as a nonlinear discrete

neural network model is available immediately after the network training ([Nørgaard et al., 2000]; [Sarangapani,

2006]).

The optimization of the proposed MPC control actions can be based on Newton's method, the quasi-Newton method or the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method described in the literature ([Dennis and Schnabel, 1996], [Fletcher, 1987], [Kelley, 1995 and 1999], [Scales, 1985]). While the BFGS method approximates the derivatives of the Hessian matrix, the quasi-Newton method assumes a fixed step size in the search for the minimum. On the other hand, it is well known that Newton's method does not suffer from poor scaling of the Jacobian matrix and converges rapidly, without the need for a line search, when the initial solution is near the optimal solution. The well-known problems associated with Newton's method are that the Hessian matrix occasionally becomes ill-conditioned or singular and is not guaranteed to be positive definite in an open neighbourhood of a minimum. The Levenberg-Marquardt algorithm, by contrast, is known for robustness and fast convergence, provided that the Levenberg-Marquardt parameter (λ) is suitably selected.

The backpropagation algorithm is the most common algorithm for neural network training. It is a pure gradient algorithm and is characterized by poor performance, which has led to the adoption of the Newton method. While the first-order derivative (the Jacobian matrix) used by the Newton method is readily computed, the second-order derivative (the Hessian matrix) is not guaranteed to be positive definite. Thus, the Gauss-Newton method approximates the Hessian matrix based on the Jacobian matrix. In order to obtain the optimal network parameters θ̂(k) (where k denotes the time sample), the Gauss-Newton Hessian matrix must be inverted, and this sometimes causes the optimal network parameters to be trapped in local minima due to ill-conditioning or singularity of the Hessian matrix. The need to alleviate this difficulty led to the Levenberg-Marquardt formulation, which adds a non-negative parameter (λ) to the diagonal of the Gauss-Newton Hessian together with a new update rule. The backpropagation (BP) algorithm and the Levenberg-Marquardt algorithm (LMA) are both standard algorithms where θ̂(k) is updated after the entire data set has been evaluated, whereas the recursive algorithms are sequential algorithms where θ̂(k) is updated after a single input-output data pair is evaluated.

In this work, two neural network training algorithms are proposed, namely the adaptive recursive least squares (ARLS) algorithm and the modified Levenberg-Marquardt algorithm (MLMA). The ARLS is by nature an online (recursive) estimation algorithm but is proposed here in a standard form, while the MLMA, a standard algorithm, is proposed here as an online algorithm; their similarity is that both are operated and implemented as standard algorithms. The difficulties in adapting the MLMA for online identification stem from the need to invert the Hessian matrix and to search for the region in which the optimal value of the nonlinear function can be found in an open neighbourhood of the global minimum.

The proposed ARLS and MLMA algorithms for nonlinear neural network model identification use both the series-parallel and parallel identification schemes. The ARLS and MLMA algorithms are used to approximate neural network-based nonlinear autoregressive with exogenous inputs (NNARX), nonlinear autoregressive moving average with exogenous inputs (NNARMAX), and nonlinear output error (NNOE) model structures through minimization procedures. Initially, the neural network (NN) is trained off-line to determine suitable network parameters. Next, the identification scheme is applied online for NN model identification at each sampling instant. Ideally, adaptive training requires that one input-output data pair be evaluated at each sample time, but because the NN is data-driven, a model updated from a single pair will not capture the underlying system dynamics. Thus, a sliding stack window is used to store a short history of the training patterns, and this stack is updated continuously at each sample time in a first-in first-out fashion. With the proposed technique, old data are discarded as new data are progressively added, according to an exponential forgetting and resetting algorithm incorporated in the proposed ARLS. Moreover, the ARLS is approximated by a Gauss-Newton method without the inversion of the Gauss-Newton Hessian matrix. In this way the computational burden of matrix inversion is avoided and, at the same time, the convergence and stability of the network training in the presence of nonlinearities and uncertainties is guaranteed, owing to the continuously differentiable nature of the hidden layer activation function of the MLP NN.
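As an illustration of the sliding stack window just described, the following minimal Python sketch (illustrative only; the class name, window width and forgetting factor are assumptions, and the full exponential forgetting and resetting algorithm of the ARLS also involves covariance resetting logic not shown here) maintains a first-in first-out stack of the most recent input-output pairs together with exponentially decaying sample weights:

    import numpy as np

    class SlidingStackWindow:
        """FIFO stack of the most recent training pairs with exponential forgetting."""
        def __init__(self, width=50, lam=0.98):
            self.width = width          # number of stored input-output pairs
            self.lam = lam              # forgetting factor, 0 < lam <= 1
            self.u, self.y = [], []

        def update(self, u_k, y_k):
            # The newest pair is appended; the oldest is discarded when full.
            self.u.append(u_k); self.y.append(y_k)
            if len(self.u) > self.width:
                self.u.pop(0); self.y.pop(0)

        def sample_weights(self):
            # Exponentially decaying weights; the newest sample receives weight 1.
            n = len(self.u)
            return np.array([self.lam ** (n - 1 - i) for i in range(n)])

    # At each sampling instant: push the new measurement pair, then retrain the
    # network on the windowed data using the returned sample weights.
    win = SlidingStackWindow(width=50)
    for k in range(100):
        win.update(u_k=np.random.randn(), y_k=np.random.randn())
        w = win.sample_weights()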

In order to utilize the above-mentioned NNARX, NNARMAX and NNOE models trained with the proposed ARLS and MLMA algorithms, two new adaptive control algorithms have been proposed, namely adaptive generalized predictive control (AGPC) and nonlinear adaptive model predictive control (NAMPC). The AGPC is adapted from the generalized predictive control (GPC) originally formulated by Clarke and co-workers [Clarke et al., 1987a and 1987b]. The GPC is a well-celebrated class of MPC algorithms with comparatively low computational requirements. The AGPC is formulated as an adaptive neural network-based constrained GPC algorithm, based on the instantaneous linearization of an identified NN model in an adaptive context by solving a set of recursive Diophantine equations ([Camacho and Bordons, 2007]; [Clarke et al., 1987a and 1987b]; [Maciejowski, 2002]); a sequential quadratic programming (SQP) technique is then applied in order to obtain the optimal control signal for the AGPC.

Unlike the online unconstrained minimization used by the MLMA algorithm, the proposed NAMPC is an online constrained optimization control strategy based on the full Newton method. It is well known that the full Newton Hessian matrix is not guaranteed to be positive definite in an open neighbourhood of a global minimum during its inversion. Thus, the NAMPC incorporates a new iterative strategy that guarantees positive definiteness of the full Newton Hessian matrix for online adaptation. The adaptive updating rule used by the NAMPC is based on the Levenberg-Marquardt method, a second-order trust-region optimization method known for robustness and fast convergence, as reported in the literature ([Guarneri et al., 2008]; [Wu, 2008]). A difficulty with the Levenberg-Marquardt method is the selection of the adaptive updating parameter, as noted by many researchers ([Chiong, 2010]; [Dias et al., 2005]; [Hagan and Menhaj, 1994]). In this work, a modification is made to the Levenberg-Marquardt method by incorporating into the NAMPC a strategy for iteratively selecting the updating parameter and updating the control sequence accordingly. The main difference between the proposed NAMPC and the AGPC is that the former uses a nonlinear NN model directly to compute the optimal control signal, whereas the latter uses a linearized version of a nonlinear NN model.
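The iterative selection of the updating parameter can be pictured with the following hedged Python sketch. It is not the dissertation's exact NAMPC rule; it merely shows the generic Levenberg-Marquardt safeguard referred to above: λ is increased until the regularized Hessian passes a Cholesky (positive definiteness) test, and the step over the control sequence is accepted only if it decreases the cost.

    import numpy as np

    def lm_step(H, g, lam):
        """Solve (H + lam*I) du = -g, raising lam until H + lam*I is positive definite."""
        n = H.shape[0]
        while True:
            try:
                np.linalg.cholesky(H + lam * np.eye(n))   # fails if not positive definite
                break
            except np.linalg.LinAlgError:
                lam *= 10.0
        return np.linalg.solve(H + lam * np.eye(n), -g), lam

    def lm_minimize(cost, grad, hess, u0, lam=1e-2, iters=50):
        """Minimize cost(u) over the control sequence u with adaptive selection of lam."""
        u = u0.copy()
        for _ in range(iters):
            du, lam = lm_step(hess(u), grad(u), lam)
            if cost(u + du) < cost(u):
                u, lam = u + du, max(lam / 10.0, 1e-12)   # accept: move toward Gauss-Newton
            else:
                lam *= 10.0                                # reject: move toward gradient descent
        return u

    # Toy usage with a quadratic cost standing in for the MPC cost over the horizon:
    Q = np.array([[3.0, 1.0], [1.0, 2.0]]); b = np.array([1.0, -1.0])
    u_opt = lm_minimize(lambda u: 0.5 * u @ Q @ u + b @ u,
                        lambda u: Q @ u + b, lambda u: Q, np.zeros(2))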

2.12.4 MPC Implementation

The next issue is the implementation and validation of the proposed identification and adaptive MPC algorithms. Traditionally, industrial systems are controlled using programmable logic controllers (PLCs), which provide an easy way of implementing system control using ladder logic (and sometimes C programs). PLCs can be used to control part of an industrial control system, and they can typically withstand the shock, vibration, elevated temperatures and electrical noise that are characteristic of industrial control environments. With the advent of communication technologies and smart sensors, computers are now the most widely and commonly used platform for implementing varieties of control algorithms, including MPC algorithms. Depending on the number of computers used for control, their configuration and the way they communicate with the system under control, a networked control system (NCS) can be established.

In the context of networked control systems, it is now common in industrial control to use computer networks for passing measurements collected from sensors to the controllers, that is, the computing facilities which execute the control algorithms, and for transmitting the commands produced by the controllers to the actuators which adjust the values of the controlled variables. Furthermore, a second level of computer networks is used for the communication of the controllers with higher-level computers which perform operations management and supervisory or cell control. Usually, systems that consist of sensors, actuators, controllers and supervisory computers which communicate with each other over a computer network are called networked control systems (NCSs). The use of computer networks in industrial control applications has the benefits of reduced wiring and ease of maintenance; such networks are usually built with special architectures and protocols ([Chow and Tipsuwan, 2001]; [DDCMC, 1999]; [Lee et al., 2006]) which provide a bounded transmission delay. However, they suffer from high hardware and software cost, and from the inability to be linked directly to ordinary IP networks built around the IEEE 802.3 Ethernet technology, which would allow them to communicate with ordinary computing facilities that are available off-the-shelf at low cost or are already used in an industrial organization [Jammes and Smit, 2005b] for office or computer-aided design/computer-aided engineering (CAD/CAE) functions. Such facilities can offer enhanced computing power, sophisticated graphics and mathematical processing software. The recent advances in service-oriented architectures (SOA) for networks [Jammes and Smit, 2005b], built again over IP networks and offered in the form of standardized off-the-shelf solutions, make the replacement of the special high-cost architectures of the NCS with such architectures even more attractive. SOA offers a high degree of flexibility, interoperability, ease of use and application development over the IP protocol, and complete language and platform independence. However, as might be expected, it offers all these advantages at the cost of higher communication overhead. To use SOA in an NCS, ways must be found to implement these concepts at lower communication cost, so that the time limits imposed on the exchange of information in an NCS can be met. To this end, a protocol stack was developed to embed the SOA technology, based on web services (WS), into sensors and actuators. It was called the device profile for web services (DPWS) [DPWS, 2006] and presented an SOA implementation with reduced bandwidth requirements. However, to extend the SOA concepts to advanced control strategies, such as predictive and adaptive control, a further reduction in the bandwidth requirements is needed [Cucinotta et al., 2009].

As demonstrated in this work, such a reduction can be achieved if the proposed new computer network architecture for NCSs is used. In this architecture the DPWS technology is modified by introducing a new format for the exchange of messages in the network and is combined with the use of switched Ethernet. In this way an overall bounded transmission delay among the sensing and actuating devices is achieved. The performance of this network has been evaluated by considering the fluidized bed furnace reactor (FBFR) of the steam deactivation unit (SDU) of a fluid catalytic cracking (FCC) pilot plant. The performance evaluation involves the implementation of the proposed identification and adaptive predictive control algorithms for the model identification and adaptive control of the FBFR process, with the purpose of investigating whether the closed-loop identification and control satisfy the real-time constraints of the process.

Over the years, as the complexity of industrial systems has increased, the complexity of control algorithms has also increased considerably, and hence so have the computational requirements. The complexity of industrial control systems has led to their decomposition into different sub-systems, giving rise to multivariable control systems having multiple inputs and multiple outputs (MIMO) together with several constraints. MPC algorithms have been recognized for handling difficult multivariable control systems with hard constraints. However, the computational requirements inherent in MPC algorithms have limited their use to multivariable systems with large sampling times. The advent of multi-core computers has made the extension of MPC algorithms to the control of systems with short sampling times a recent research interest in diverse fields, especially in aircraft autopilot control, unmanned aerial vehicles (UAVs) and robotics. Despite the computational power of modern multi-core computers, the nonlinear MPC (NMPC) computations are hardly completed within the sampling times of the three application areas just mentioned, owing to their relatively short time constants compared with the computationally intensive optimization associated with NMPC algorithms.

Recently, investigations and surveys on the use of FPGAs in industrial control applications have been reported ([Malinowski and Yu, 2011]; [Meloni et al., 2010]; [Monmasson et al., 2011]), where it has been proposed that FPGAs can be configured to solve computationally intensive tasks for real-time applications. For example, an FPGA-based framework for the prototyping of multi-core embedded architectures has been proposed in [Meloni et al., 2010], although no embedded processor was designed or implemented. Comparisons of embedded system design for industrial applications using FPGAs, microprocessors, microcontrollers, application-specific integrated circuits (ASICs) and digital signal processors (DSPs) indicate that FPGAs are more suitable for such tasks, and several references are provided as justification ([Malinowski and Yu, 2011]; [Monmasson et al., 2011]). It is noted that no embedded processors have been incorporated in these reported papers.

Furthermore, significant efforts have also been made towards FPGA implementation of predictive controllers ([Lin-Shi et al., 2007]; [Naouar et al., 2008]; [Pérez et al., 2009]). While linear proportional-integral (PI) controller FPGA implementations have been demonstrated in these papers under the name of predictive control, an additional optimization technique has been used in [Lin-Shi et al., 2007] to select the optimal control signal. Although no embedded processors were included in these FPGA implementations, the results obtained in these papers show the computational efficiency of the FPGA.

While PI controllers may not provide efficient control performance, as is evident in these papers ([Lin-Shi et al., 2007]; [Naouar et al., 2008]; [Pérez et al., 2009]), several techniques have been proposed in the literature for the FPGA implementation of MPC algorithms with diverse objectives. A close study of the papers published on MPC implementations on FPGAs reveals that different implementations have different objectives: 1) some aim to achieve efficient control performance in tracking the desired reference signal with a reduction in computation time ([Bleris et al., 2006]; [Joos and Fichter, 2011]); 2) some aim to create a portable MPC algorithm as a system-on-a-chip (SoC) for modularity ([He and Ling, 2005]; [Shoukry et al., 2010]); 3) some are implemented to achieve a reduction in the MPC computation time ([He et al., 2006]; [Jian et al., 2010]; [Ling et al., 2008]); and 4) some simply explore the possibility of implementing an MPC algorithm on an FPGA ([Ling et al., 2006]; [Shoukry et al., 2010b]).

Some common features in all these papers on FPGA implementation of MPC algorithms are the following: 1) they are all based on the use of a static first-principles model of the system, which may degrade the performance of the MPC when operated outside the operating region of the system; 2) the majority of the papers implement linear MPC for the control of highly nonlinear systems, and in most cases no constraints are imposed, which degrades the results presented; 3) all the FPGA implementations are simple ones in which look-up tables (LUTs) are used for multiply-and-add matrix operations, without employing dedicated multipliers such as the XtremeDSP slices, DSP48E slices or any other DSP multiplier for enhanced computational efficiency; 4) as reported in [Fletcher, 2005], an embedded processor improves the performance of FPGA-based designs, whereas the implementations discussed so far do not include any embedded processor; 5) although it is sometimes necessary to off-load certain computationally intensive parts of an algorithm to a dedicated computing machine, the majority of the reported papers implement a relatively small portion of the MPC on their so-called FPGA, which could easily be implemented by a single XtremeDSP or DSP48E slice; and 6) there are several misuses of MPC and FPGA terminology, such as calling a linear quadratic Gaussian (LQG) controller a nonlinear MPC [Joos and Fichter, 2005], or calling a set of registers an embedded multicore system ([Meloni et al., 2010]; [Shoukry et al., 2010a]), etc.

In this work, two real-time platforms are proposed for the implementation of the proposed neural network-based nonlinear model identification and adaptive MPC algorithms. The two proposed real-time implementation platforms are: 1) a service-oriented architecture (SOA) cluster computer network based on the device profile for web services (DPWS), and 2) a Xilinx Virtex-5 FX70T ML507 FPGA development system incorporating an IBM PowerPC™ 440 embedded hard processor core with the MPC core as a co-processing system.

The embedded system design proposed in this work is approached as a digital signal processing problem from a model-based design viewpoint. Thus, the proposed embedded processor system design begins with the modeling and synthesis of the adaptive MPC algorithm using the Xilinx AccelDSP synthesis tool in conjunction with MATLAB from The MathWorks [MathWorks, 2009]. This synthesis results in the generation of an AccelDSP block model, which is used to design a complete System Generator model of the adaptive MPC algorithm using the Xilinx System Generator for DSP, with additional blocks from the Xilinx System Generator for DSP and Simulink libraries; Simulink is also from The MathWorks [MathWorks, 2009]. In order to evaluate how the developed System Generator model of the adaptive MPC algorithm will perform when deployed onto the FPGA, a hardware co-simulation block representative of the adaptive MPC System Generator model is generated, a hardware-in-the-loop co-simulation with the Xilinx Virtex-5 FX70T ML507 FPGA development board in closed loop is performed, and its performance is compared with that of the System Generator model of the adaptive MPC. If the observed control performance is acceptable, the embedded system design proceeds to the next stage; otherwise, the adaptive MPC modeling and synthesis using the AccelDSP tool is repeated. Next, using the System Generator token from the Xilinx System Generator for DSP library, a complete intellectual property (IP) core that encapsulates the adaptive MPC algorithm is generated, which can readily be integrated and attached as a co-processor to an embedded processor system.

However, prior to the synthesis of the adaptive MPC algorithm using the Xilinx AccelDSP synthesis tool, two processor systems are designed using the Xilinx Platform Studio (XPS), based on: 1) the Xilinx MicroBlaze soft-core embedded processor system, and 2) the IBM PowerPC™ 440 hard-core processor system; and performance comparisons are made between them. As shown in Chapter 5, the IBM PowerPC™ 440 embedded processor system is preferred to the Xilinx MicroBlaze embedded processor system.

Thus, the generated adaptive MPC IP core is exported from the MATLAB/Simulink–System Generator for DSP environment and integrated with the IBM PowerPC™ 440 embedded processor system within the Xilinx Platform Studio (XPS). Software for initializing and implementing the complete embedded system is also written within the XPS environment. Finally, the complete hardware-software embedded PowerPC™ 440 system and the integrated MPC algorithm are exported to the Xilinx software development platform (Xilinx SDK) for software development and for programming the Xilinx Virtex-5 FX70T ML507 FPGA. The proposed embedded processor system is evaluated on the control of a nonlinear F-16 fighter aircraft, which is the third and last case study considered in this work.


CHAPTER 3

NEURAL NETWORK MODEL IDENTIFICATION ALGORITHMS

3.1 Introduction

The term neural network (NN) model identification refers to the technique used for finding the weights of a neural network which will predict the time changes of the outputs of a physical system, i.e. an industrial plant, when the same time sequences of inputs that are fed to the physical system are also fed to the inputs of the NN. Also, the errors computed between the outputs predicted by the NN and the physical system outputs are taken into consideration for improving the accuracy of the prediction by updating the weights at successive iterations. This technique involves, first, the selection of a structure for feeding inputs and output errors to the NN inputs and, second, an algorithm (training algorithm) that will find the weights at each successive iteration.

In this chapter, two new algorithms are presented for training two different families of neural networks in two known identification structures, which are explained in Sections 3.2.1 and 3.2.2. The developed algorithms are modifications of existing training algorithms and have been made with the purpose of reducing the computation load that the other algorithms require. The computation load reduction will make them appropriate for use as part of an adaptive MPC control strategy applied almost every time new samples of the controlled system outputs are taken.

The considered neural network (NN) families are: 1) the neural network autoregressive with exogenous inputs (NNARX) networks, and 2) the neural network autoregressive moving average with exogenous inputs (NNARMAX) networks.

The proposed training algorithms have been given the names: 1) the adaptive recursive least squares (ARLS) algorithm, and 2) the modified Levenberg-Marquardt algorithm (MLMA). They are extensions of the basic backpropagation algorithm but are derived from the Gauss-Newton method. In order to understand their derivation, the reader must be familiar with the backpropagation and Gauss-Newton algorithms. For this reason these algorithms are briefly explained in Sections 3.3.3 and 3.3.4 respectively.

The success of any training algorithm depends on how the derivatives of the network outputs with respect to the inputs are ordered and computed ([Gupta et al., 2003], [Haykin, 1999], [Nørgaard et al., 2000]). The approach for these computations is presented in Sections 3.3.4.1 and 3.3.4.2, with extensions which make them applicable to the two identification structures considered. The adaptive recursive least squares (ARLS) and modified Levenberg-Marquardt (MLMA) algorithms are presented in Sections 3.3.5 and 3.3.6 respectively.

In order to assess how well the considered neural networks (NNs) predict a modeled system when they are trained by the developed algorithms, three model validation methods were used in this work. These are the one-step ahead and K-step (multi-step) ahead output predictions, as well as Akaike's final prediction error (AFPE) estimate of the average generalization error. To help the reader understand the contents of the chapters which present three case studies of NN-based modeling and model validation, these three methods are highlighted in Section 3.4.

Fig. 3.1: Neural network parallel model identification scheme. Fig. 3.2: Neural network series-parallel model identification scheme. (Block diagrams: in both schemes the system and a dynamic neural network (DNN) are driven by the input U(k) through tapped delay lines; the parallel scheme feeds the DNN's own delayed predictions Ŷ(k−1), …, Ŷ(k−n) back to its input, whereas the series-parallel scheme feeds back the measured outputs Y(k−1), …, Y(k−n); the prediction error ε(k) = Y(k) − Ŷ(k) drives the training.)
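The two prediction-based validation modes just mentioned can be summarized in the following hedged Python sketch (the model f is an arbitrary stand-in for a trained one-step-ahead NN predictor; the orders n_a, n_b are assumptions): in one-step-ahead prediction every regressor is built from measured outputs, whereas in K-step-ahead prediction the model's own outputs are fed back K times.

    import numpy as np

    def one_step_predictions(f, y, u, n_a, n_b):
        # Regressors contain only measured outputs (series-parallel evaluation).
        return np.array([f(y[k - n_a:k], u[k - n_b:k])
                         for k in range(max(n_a, n_b), len(y))])

    def k_step_predictions(f, y, u, n_a, n_b, K):
        # From each start point, predictions are recursively fed back K times.
        preds = []
        for k in range(max(n_a, n_b), len(y) - K):
            y_sim = list(y[k - n_a:k])                    # seed with measured outputs
            for j in range(K):
                y_sim.append(f(np.array(y_sim[-n_a:]), u[k + j - n_b:k + j]))
            preds.append(y_sim[-1])
        return np.array(preds)

    # Toy usage with a linear stand-in model:
    f = lambda yp, up: 0.6 * yp[-1] + 0.3 * up[-1]
    y = np.sin(0.1 * np.arange(100)); u = np.cos(0.1 * np.arange(100))
    e1 = y[2:] - one_step_predictions(f, y, u, n_a=2, n_b=2)   # one-step residuals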

3.2 Dynamic Neural Network Model Identification

In neural network-based system identification, the internal weights and biases of the neural network are adjusted to make the predictions of the network output close, in some sense, to the measured (or actual) outputs taken at some time other than the current system operation. In this case, the NN model structures and predictors introduced in Section 2.5 of Chapter 2, as well as state-space techniques ([Goodwin and Sin, 1984], [Haykin, 1999], [Ljung, 1999], [Zhu, 2001]), can be used. Regarding dynamic model identification, that is, the adjustment of the weights and biases by using data taken at the time instant that they are produced by the current system operation, there are basically two NN model identification structures, namely the parallel and the series-parallel structures ([Gupta et al., 2003], [Narendra and Parthasarathy, 1990], [Omidvar and Elliot, 1997]).

3.2.1 Parallel Model Identification

The parallel model identification structure, shown in Fig. 3.1, has direct feedback from the network outputs to its inputs. It uses the previous output prediction to compute the deviation from the actual system output in order to find the new weights of the network. On the basis of these new weights and the network configuration, the next output prediction is computed. Because of this feedback, it has no guarantee of stability and may require dynamic backpropagation training ([Omidvar and Elliot, 1997], [Werbos, 1990]). This structure may be used if the actual plant outputs are not available or if more than one-step ahead predictions are required. Although it may not be expected that the identification scheme will be perfect, this mode of operation provides a viable way to make short-term predictions. However, in many cases the objective is not to make specific predictions concerning the system but rather to train the network [Omidvar and Elliot, 1997]. In this case, if the identification model is accurate, it will exhibit the behaviour of the true system. Since experimental input-output data pairs obtained from prior plant operation or from the simulation of the actual plant mathematical model will be available, this form of model identification is not considered in this work.

3.2.2 Series-Parallel Model Identification

The series-parallel model identification structure is shown in Fig. 3.2. It does not use feedback from the output; instead, it uses the actual plant output to estimate future system outputs. Therefore, static backpropagation training can be used to train this form of NN structure, and there are proofs of guaranteed stability and convergence. This structure has been found to be the preferred choice for the development of NN-based models intended for stable adaptive control systems ([Gupta et al., 2003], [Narendra and Parthasarathy, 1990], [Omidvar and Elliot, 1997], [Spooner et al., 2002], [Pearlmutter, 1995]). Note that this structure is similar to the teacher-forcing method discussed in Section 2.4.4.4 of Chapter 2. This method is the main focus in this work.
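To make the series-parallel (teacher-forced) structure concrete, the following minimal Python sketch (an assumption-laden illustration, not code from this work) assembles an NNARX-type training set in which every regressor is composed of measured plant outputs and inputs only, never the network's own predictions; the orders n_a, n_b and delay d follow the notation used later in Eq. (3.6).

    import numpy as np

    def nnarx_training_set(y, u, n_a, n_b, d=1):
        """Regressors phi(k) = [y(k-1),...,y(k-n_a), u(k-d),...,u(k-d-n_b)]
        and teacher-forced targets y(k), built from measured data only."""
        start = max(n_a, d + n_b)
        Phi, T = [], []
        for k in range(start, len(y)):
            past_y = y[k - n_a:k][::-1]                 # y(k-1), ..., y(k-n_a)
            past_u = u[k - d - n_b:k - d + 1][::-1]     # u(k-d), ..., u(k-d-n_b)
            Phi.append(np.concatenate([past_y, past_u]))
            T.append(y[k])                              # target: measured output
        return np.array(Phi), np.array(T)

    # Toy usage: rows of Phi feed the network input; T supplies the targets.
    y = np.sin(0.05 * np.arange(300)); u = np.random.randn(300)
    Phi, T = nnarx_training_set(y, u, n_a=2, n_b=2)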

3.2.3 Remarks on the Basic Neural Network Training Algorithms

3.2.3.1 Back-Propagation Algorithm and Its Variations

Despite the widespread use of the back-propagation learning algorithm in a number of cases, there are certain aspects of the algorithm which limit its use in control applications. The first is its long training time (cycles or epochs). The other is the fact that, as the network is trained, the weights can be adjusted to very large values. The total input of a hidden unit can then reach very high (either positive or negative) values, and because of the sigmoid activation function the unit will have an activation very close to zero or very close to one. As can be seen from (2.57) and (2.58), the weight adjustments, which are proportional to $\hat{Y}_j^p(k)\,[1 - \hat{Y}_j^p(k)]$, will then be close to zero, and the training process can come to a virtual standstill. Also, the error surface of a complex network is full of hills and valleys. Because of the gradient descent, the network can get trapped in a local minimum when there is a lower minimum nearby.


Probabilistic methods can help to avoid this trap, but they tend to be slow. Another suggested possibility is to increase the number of hidden units. Although this may work well, because the higher dimensionality of the error space makes the chance of getting trapped smaller, it appears that there is some upper limit on the number of hidden units which, when exceeded, again results in the network being trapped in local minima. This can also lead to network overtraining and result in peaking effects.

Several advanced algorithms based on back-propagation learning and its variations have been proposed over the years, and many researchers have devised improvements of, and extensions to, the basic back-propagation learning algorithm ([Wang et al., 2004]; [Werbos, 1990]). One major improvement is the addition of adaptation parameters, which has resulted in the widely used backpropagation with momentum (BPM) version of the algorithm discussed in Section 2.4.4.1 ([Phansalkar and Sastry, 1994]; [Yu et al., 2006]), sketched below. Another obvious improvement is the replacement of the primitive steepest descent method with a direction set minimization method, which has led to several other minimization algorithms ([Chiong, 2010]; [Hagan and Menhaj, 1994]; [Wu, 2008]). One of the objectives of this work is an extension of the back-propagation learning algorithm by investigating a new minimization technique of the direction set in order to achieve a global minimum of the joint weights, i.e. weights and biases.
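For reference, a minimal Python sketch of the BPM update rule mentioned above (standard textbook form; the learning rate mu and momentum coefficient beta are generic symbols, not values taken from this work):

    import numpy as np

    def bpm_update(theta, grad, velocity, mu=0.01, beta=0.9):
        """One backpropagation-with-momentum step:
        v <- beta*v - mu*dJ/dtheta;  theta <- theta + v."""
        velocity = beta * velocity - mu * grad
        return theta + velocity, velocity

    # Toy usage on J(theta) = 0.5*||theta||^2, for which dJ/dtheta = theta:
    theta, v = np.array([1.0, -2.0]), np.zeros(2)
    for _ in range(100):
        theta, v = bpm_update(theta, grad=theta, velocity=v)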

3.2.3.2 Teacher-Forced Real-Time Recurrent Learning

Teacher-forced real-time recurrent learning (RTRL) consists of jamming the desired output values into the network units as the network is being trained; thus, the output units are forced to have the correct states even during the network training phase. This technique is applied to discrete-time clocked networks because only in this case does the concept of changing the state of the output unit at each time step make sense ([Pearlmutter, 1990]; [Pearlmutter, 1995]).

The errors in the teacher-forced RTRL method are usually measured before the output units are forced, and not after. Thus, this method leads to faster training because it is assumed that the network has correctly learned the dynamics in the previous time steps [Haykin, 1999]. As noted in [Zipser and Williams, 1989] and [Pearlmutter, 1995], the computation time is radically reduced in contrast with the original computationally intensive RTRL algorithm. Note that since the actual outputs of the system are fed back rather than the network outputs, no real recurrent path exists in the network; the backpropagation algorithm can readily be applied to train the network as a dynamic feedforward neural network (DFNN). In addition, the training does not diverge in an unstable way. In fact, if the network is trained in a closed-loop manner with the true system, the weights at the beginning of the training are randomly distributed and the predicted state may be quite different from the desired one. These erroneous outputs are recursively fed back to the input layer to predict new states, and so on, until convergence occurs [Guarneri et al., 2008].


Fig. 3.3: Teacher-forced dynamic feedforward neural network (TF-DFNN) architecture. (Block diagram: the system and the network run in parallel; tapped-delay-line (TDL) units feed the past outputs Y(k−1), …, Y(k−n) and past inputs U(k−1), …, U(k−m), together with U(k), to the network, whose hidden layer f_j(·) with weights w_{j,l} and output layer F_i(·) with weights W_{i,j} carry self-feedback, self-recurrence and lateral recurrence loops; the network produces Ŷ(k+1), and the prediction error ε(k) drives the training (learning) algorithm.)

The main disadvantage of the teacher-forced RTRL approach is that the input vector to the network, which describes the current state to be retained in time by the network, must be made available; otherwise, new persistent internal representations of temporal structures may be difficult to create during network training [Pearlmutter, 1990, 1995]. However, in the usual control theory setting, this problem can be partially alleviated by cycling back to the inputs not just the immediate output from the previous time step but also those from a small number of previous time steps. The trade-off between using hidden units to encapsulate temporally hidden structures and using a temporal window of values which must contain the desired information is problem dependent, and depends in essence on how long a hidden variable can remain hidden without being manifested in the observable state variables. The second problem is the difficulty encountered when a large number of hidden units is used in the network [Pearlmutter, 1990]. Again, in control applications, this problem can possibly be alleviated in two ways: 1) by training the network with a large experimental data set covering the entire operating range of the real system, which will allow the network to converge even with a small number of hidden units; and/or 2) by training the network with a suitable second-order algorithm.

3.2.4 Architecture for the Teacher-Forced Dynamic Feedforward Neural Network

The architecture of the teacher-forced dynamic feedforward neural network (TF-DFNN) that has been developed and used in this work is shown in Fig. 3.3. As shown in the figure, the physical system is placed in parallel with the neural network (NN). This is the so-called series-parallel architecture. The input to the NN is a state vector of the previous inputs and outputs supplied via a bank of tapped delay line (TDL) memory units. With this architecture, the state to be retained is constituted by the previous states in time, with old values replaced by the new outputs from the system in a first-in first-out fashion. Thus, the TDL memory units form a sliding window for the state vector. Note that in the proposed architecture the system time delay has been assumed to be one (d = 1).

The NN training or learning process basically involves the use of a set of input-output data pairs which are obtained from a specific nonlinear function modeling the behaviour of the physical system. A more detailed treatment and discussion of the proposed DFNN architecture, together with its mathematical description and the training (learning) algorithms, will be presented successively in the next two chapters.

3.3 Training Algorithms for Neural Network Model Identification

3.3.1 Formulation of the Model Identification Problem

It was shown in Section 2.5 of Chapter 2 that the output $Y(k)$ of a p-input q-output discrete-time nonlinear multivariable system at time $k$, responding to input $U(k)$ with disturbance $d(k)$, can be represented in the following form:

$$Y(k) = J[\varphi(k, \theta(k)), \theta(k)] + d(k) \qquad (3.1)$$

where $J(\cdot,\cdot,\cdot)$ is a nonlinear function of its arguments and $Z^N$ is the set of input-output data pairs (observations) obtained from prior system operation or experiment over a period of time $NT$, defined as

$$Z^N = \left\{ [U(1), Y(1)], \ldots, [U(N), Y(N)] \right\}, \quad k = 1, 2, \ldots, N \qquad (3.2)$$

where $N$ is the number of input-output data pairs, $T$ is the sampling period of the system, and $\varphi(k, \theta(k))$ is the regression (state) vector given as:

$$\varphi(k,\theta(k)) = \big[ Y(k-1), \ldots, Y(k-n_a),\; U(k-d), \ldots, U(k-d-n_b),\; \varepsilon(k-1,\theta(k)), \ldots, \varepsilon(k-n_c,\theta(k)),\; d(k-1,\theta(k)), \ldots, d(k-n_f,\theta(k)),\; v(k-1,\theta(k)), \ldots, v(k-n_d,\theta(k)) \big]^T \qquad (3.3)$$

$\theta(k)$ is an unknown parameter vector which contains the model parameters of (3.1) to be determined. Since $\theta(k)$ is unknown, let a parameterized set of model structures $\Theta$ that contains a set of candidate models be defined as:

$$\Theta : \theta(k) \in \mathcal{D}_\theta \subset \Re^{\nu} \rightarrow \hat{\theta}(k) \qquad (3.4)$$

where $\mathcal{D}_\theta$ is some subset of $\Re^{\nu}$ inside which the search for a model is carried out; $\nu$ is the dimension of $\theta(k)$; $\hat{\theta}(k)$ is the desired model associated with the unknown parameter vector $\theta(k)$ contained in the set of model structures $\Theta = \{\theta_1(k), \theta_2(k), \ldots, \theta_\tau(k)\}$; $\theta_1(k), \theta_2(k), \ldots, \theta_\tau(k)$ are distinct values of $\theta(k)$; and $\tau = 1, 2, \ldots, maxiter$ is the number of iterations required to determine $\hat{\theta}(k)$ from within the set of models in $\Theta$.


As in Section 2.5 of Chapter 2, the one-step ahead predictor form of model (3.1), based on the information up to time $k-1$, can be expressed in the following compact form ([Goodwin and Sin, 1984], [Ljung, 1999], [Zhu, 2001]):

$$\hat{Y}(k \mid k-1, \theta(k)) = J[\varphi(k, \theta(k)), \theta(k)] \qquad (3.5)$$

Note that, as discussed in Chapter 2, the nonlinear model structures and predictors considered in this work are those based on the NNARX and NNARMAX models. These models are distinguished in terms of the regression vector used in (3.5). The regression vectors corresponding to the NNARX and NNARMAX models for use in the predictor model of (3.5) are given respectively as:

$$\varphi(k) = [Y(k-1), \ldots, Y(k-n_a),\; U(k-d), \ldots, U(k-d-n_b)]^T \qquad (3.6)$$

$$\varphi(k,\theta(k)) = [Y(k-1), \ldots, Y(k-n_a),\; U(k-d), \ldots, U(k-d-n_b),\; \varepsilon(k-1,\theta(k)), \ldots, \varepsilon(k-n_c,\theta(k))]^T \qquad (3.7)$$

Note that $\varphi(k)$ in (3.6) does not contain $\theta(k)$ as an argument, which indicates that the predictor is a pure dynamic feedforward NN without feedback (see Fig. 2.13), whereas $\varphi(k,\theta(k))$ in (3.7) contains $\theta(k)$, with feedback loops as shown in Fig. 2.14. The $\theta(k)$ in (3.7) indicates the dependence of $\varphi(k,\theta(k))$ on the error estimates, which act as moving average inputs. Although the form of the NNARMAX regressor in (3.7) corresponds to the series-parallel scheme of Fig. 3.2, so that the teacher-forcing method can be imposed, it also contains additional feedback which depends on past error estimates (see Fig. 2.14). On the other hand, the NNARX model structure (see Fig. 2.13) can easily be configured into the series-parallel structure of Fig. 3.2, and it has been the most widely used structure in the literature.

Let the parameter vector corresponding to the true system (3.1) be $\theta(k) = \theta_0(k)$, so that (3.1) becomes

$$Y(k) = J[\varphi(k, \theta_0(k)), \theta_0(k)] + d(k) \qquad (3.8)$$

Assuming that $n_a$, $n_b$ and $n_c$ are known, and given (3.2), the unknown parameter vector $\theta(k)$ can be determined in such a way that $\hat{Y}(k \mid k-1, \theta(k)) \approx Y(k)$, using a suitable minimization procedure, in terms of the error $\varepsilon[k, \theta(k)]$ between (3.8) and (3.5), defined as:

$$\varepsilon[k, \theta(k)] = Y(k) - \hat{Y}(k, \theta(k)) \qquad (3.9)$$

Then, the minimization problem of (3.9) can be formulated as a mean square error (MSE) type cost function stated as:

$$J(Z^N, \theta(k)) = \frac{1}{2N} \sum_{k=1}^{N} \left[ \varepsilon(k, \theta(k)) \right]^2 \qquad (3.10)$$

The minimization of (3.10) can be approached in several ways, such as by using numerical optimization algorithms, recursive algorithms, or evolutionary and genetic algorithms. In this work, two adaptive minimization algorithms are proposed for solving (3.10).


3.3.2 The Proposed Neural Network Model Identification Schemes

The minimization of (3.10) is approached here by considering $\hat{\theta}(k)$ as the vector of weights and biases of a dynamic feedforward neural network (DFNN) having the generalized internal architecture shown in Fig. 3.4. The proposed NN model identification scheme is illustrated in Fig. 3.5 for (a) NNARX and (b) NNARMAX model-type DFNNs. The NN identification schemes of Fig. 3.5(a) and (b) for the NNARX and NNARMAX model predictors, respectively, follow from the series-parallel identification scheme of Fig. 3.2 and are based on the teacher-forcing method illustrated in Fig. 3.3. The input vector to the DFNNs of Fig. 3.5 is the regression vector: the part originating from the system inputs to the NNARX and NNARMAX model predictors is given by $\varphi_{n_b}(k) = [U(k-1), \ldots, U(k-d-n_b)]$, the part originating from the system outputs is given by $\varphi_{n_a}(k) = [Y(k-1), \ldots, Y(k-n_a)]$, and the part due to the prediction errors of the NNARMAX model predictor is $\varphi_{n_c}(k,\theta(k)) = [\varepsilon(k,\theta(k)), \ldots, \varepsilon(k-n_c,\theta(k))]$. These input regression vectors are concatenated into $\varphi_l(k)$ for the NNARX model predictor, and likewise for the NNARMAX model predictor, as shown in Fig. 3.4.


Fig. 3.4: The architecture of the dynamic feedforward neural network (DFNN) model.


Fig. 3.5: Neural network model identification schemes: (a) NNARX and (b) NNARMAX model predictors.


Noting that the predictor output depends on the structure of the regression vector, the output of the NN model for the general architecture of Fig. 3.4 can be expressed as:

$$\hat{Y}_i(k) = F_i\left( \sum_{j=1}^{n_h} W_{i,j}\, f_j(a_j) + W_{i,0} \right) \qquad (3.11)$$

$$a_j = \sum_{l=1}^{n_\varphi} w_{j,l}\, \varphi_l(k) + w_{j,0} \qquad (3.12)$$

where $\hat{Y}(k)$ is the vector $[\hat{y}_1(k), \hat{y}_2(k), \ldots, \hat{y}_i(k)]^T$; $n_h$ and $n_\varphi$ are the number of hidden layer neurons and the number of regressors respectively; $i$ is the number of outputs; $w_{j,l}$ and $W_{i,j}$ are the hidden and output weights respectively; $w_{j,0}$ and $W_{i,0}$ are the hidden and output biases; $F_i(\cdot)$ is a linear activation function for the output layer; and $f_j(\cdot)$ is a hyperbolic tangent activation function for the hidden layer, defined here as:

$$f_j(a) = \frac{2}{1 + e^{-2a}} - 1 \qquad (3.13)$$

The term bias is interpreted here as a weight acting on an input clamped to 1. Also, the parameter $\hat{\theta}(k)$ is the collection of all the network weights and biases in (3.11) and (3.12).
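A direct transcription of (3.11)-(3.13) into a minimal Python sketch may help to fix the notation (the matrix shapes are assumptions; the output activation F_i is taken as the identity, consistent with the linear output layer stated above):

    import numpy as np

    def dfnn_forward(phi, w_hid, b_hid, W_out, b_out):
        """Forward pass of the two-layer DFNN of Eqs. (3.11)-(3.13).
        phi: regressors, shape (n_phi,); w_hid: w_{j,l}, shape (n_h, n_phi);
        b_hid: w_{j,0}; W_out: W_{i,j}, shape (n_y, n_h); b_out: W_{i,0}."""
        a = 2.0 / (1.0 + np.exp(-2.0 * (w_hid @ phi + b_hid))) - 1.0  # Eq. (3.13)
        return W_out @ a + b_out, a        # linear F_i applied to Eq. (3.11)

    # Toy usage with the small random initialization described in the text:
    rng = np.random.default_rng(0)
    n_phi, n_h, n_y = 6, 8, 2
    y_hat, a = dfnn_forward(rng.normal(size=n_phi),
                            0.1 * rng.normal(size=(n_h, n_phi)), np.zeros(n_h),
                            0.1 * rng.normal(size=(n_y, n_h)), np.zeros(n_y))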

Covariance Noise Matrix Estimate and Regularization by Weight Decay

Note that the disturbance $d(k)$ in (3.1) is unknown but can in most cases be estimated recursively as a covariance matrix for the noise, $\Gamma[\theta(k)] = \mathrm{E}\{d(k)\, d^T(k)\}$. Using $\Gamma[\theta(k)]$, Equation (3.10) can be rewritten as:

$$J(Z^N, \theta(k)) = \frac{1}{2N} \sum_{k=1}^{N} \varepsilon^T[k, \theta(k)]\, \Gamma^{-1}[\theta(k)]\, \varepsilon[k, \theta(k)] \qquad (3.14)$$

The weights $\theta(k)$ are initialized to a small random vector $v$ whose dimension depends on the number of neurons in the hidden layer and the dimension of the regression vector. Next, the network is trained to minimize (3.14), using the algorithms that are proposed later in this chapter, to obtain $\hat{\theta}(k)$. Note that $\Gamma[\theta(k)]$ is adjusted simultaneously with $\theta(k)$ in the search for $\hat{\theta}(k)$, and afterwards the noise component is estimated as $\hat{\Gamma}[\theta(k)]$ according to some minimum satisfactory termination error $esp$. The iterative algorithm for the covariance noise estimate is summarized in Table 3.1. In the implementation of this iterative algorithm it is assumed that the covariance matrix is symmetric and positive definite, and thus a check for this is also required. Note that the covariance noise matrix estimation algorithm of Table 3.1 is implemented at each time sample until the $\hat{\Gamma}[\theta(k)]$ term has reduced significantly, based on the conditions set out in Step 7). Note also that the covariance noise matrix is re-initialized at each time sample.


In the model identification problem formulation, it is obvious that $\theta(k)$ contains many parameters. As pointed out in ([Gupta et al., 2003], [Ljung, 1999], [Omidvar, 1997], [Spooner et al., 2002]), it is well known that minimizing (3.14) may be ill-conditioned, in the sense that the Hessian $R[\theta(k)] = J''(Z^N, \theta(k))$ may be an ill-conditioned or non-positive definite matrix. Thus, it has been proposed that a norm $\alpha I$ can be added to control the effective number of parameters that are used in the minimization of (3.14). This technique is called regularization, and the norm $\alpha I$ is called the regularization (or weight decay) parameter, usually implemented as a diagonal matrix ([Ljung, 1999], [Nørgaard et al., 2000], [Sjöberg and Ljung, 1995], [Sjöberg et al., 1995]). Here, by introducing the weight decay parameter $D = \alpha_d I = [\alpha_h I;\ \alpha_o I]$ into (3.14), the resulting expression becomes

$$J(Z^N, \theta(k)) = \frac{1}{2N} \sum_{k=1}^{N} \varepsilon^T[k, \theta(k)]\, \Gamma^{-1}[\theta(k)]\, \varepsilon[k, \theta(k)] + \theta^T(k)\, D\, \theta(k) \qquad (3.15)$$

where $\alpha_d = [\alpha_h;\ \alpha_o]$ is the penalty norm, $\alpha_h$ and $\alpha_o$ are the weight decay values for the input-to-hidden and hidden-to-output layers respectively, and $I$ is an identity matrix. Equation (3.15) is referred to in the literature as the weighted criterion, and the second term is called the regularization (or weight decay) term.

Regularization by weight decay reduces modeling errors and improves the robustness and performance of second-order training algorithms for a finite data set $Z^N$ ([Ljung, 1999], [Sjöberg and Ljung, 1995]). With the weighted criterion and an appropriate choice of $\alpha_d$, it is possible to accurately estimate all the many parameters contained in the parameterized model. As noted in [Ljung, 1999], a large value of $\alpha$ will lock more parameters to the vicinity of $\hat{\theta}(k)$, will remove ill-conditioning, and will improve convergence of the training algorithm.

Table 3.1: Iterative Algorithm for Estimating the Covariance Noise Matrix

1) Given initial network weights $\theta^{(0)}(k)$ and $j = j_{max}$.
2) For $k = 1$ to the number of samples $N$, do:
3) Initialize $\Gamma^{(0)}[\theta(k)] = I$.
4) Set $j = 1$.
5) Train the network for $\tau$ iterations with a suitable training algorithm, using $\Gamma^{(j-1)}[\theta(k)]$, to obtain $\theta^{(j)}(k) \leftarrow \theta^{(\tau)}(k) \approx \hat{\theta}(k)$.
6) Estimate the covariance matrix for the noise using

$$\hat{\Gamma}^{(j)}[\theta(k)] = \frac{1}{2N} \sum_{k=1}^{N} \varepsilon[k, \theta^{(j)}(k)]\, \varepsilon^T[k, \theta^{(j)}(k)]$$

7) If $\hat{\Gamma}^{(j)}[\theta(k)] \geq esp$, where $esp$ is the convergence criterion, set $j = j + 1$ and go to Step 5); else set $\hat{\theta}(k) = \theta^{(j)}(k)$ and end the loop over $j$.
8) End for $k$.
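The inner loop of Table 3.1 for a single time sample can be sketched in Python as follows (a hedged illustration: the train and residuals callables are placeholders for the weighted ARLS/MLMA training step and the prediction-error computation, and the convergence test on the norm of the estimate is one possible reading of Step 7):

    import numpy as np

    def estimate_noise_covariance(train, residuals, theta0, n_y, esp=1e-3, j_max=20):
        """Iterative covariance noise estimate of Table 3.1 for one time sample.
        train(theta, Gamma) -> updated weights; residuals(theta) -> (N, n_y) errors."""
        Gamma = np.eye(n_y)                       # Step 3: re-initialized each sample
        theta = theta0
        for j in range(j_max):                    # Steps 4-7
            theta = train(theta, Gamma)           # Step 5: weighted training
            E = residuals(theta)                  # eps[k, theta^(j)(k)], k = 1..N
            Gamma = (E.T @ E) / (2.0 * len(E))    # Step 6: covariance estimate
            if np.linalg.norm(Gamma) < esp:       # Step 7: stop once noise term is small
                break
        return theta, Gamma

    # Toy usage with placeholder callables:
    train = lambda th, G: th                      # stand-in for the training algorithm
    residuals = lambda th: 0.01 * np.random.randn(100, 2)
    theta, Gamma = estimate_noise_covariance(train, residuals, np.zeros(10), n_y=2)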


In the weighted criterion given by (3.15), note that both $\Gamma^{-1}[\theta(k)]$ and $D$ are adjusted simultaneously together with $\theta(k)$ and used to update $\hat{\theta}(k)$. The next task is how to train the proposed neural network model identification schemes illustrated in Fig. 3.5.

3.3.3 Backpropagation (BP) Techniques

The minimization of (3.15) is based on an iterative procedure which starts with a randomly initialized $\theta(k) = \theta^{(0)}(k)$ and updates $\hat{\theta}(k)$ iteratively according to the following typical updating rule:

$$\hat{\theta}(k) = \theta^{(\tau)}(k) + \Delta\theta^{(\tau)}(k) \qquad (3.16)$$

where $\theta^{(\tau)}(k)$ denotes the adjustable parameters of the network at the current iteration $\tau$ and $\Delta\theta^{(\tau)}(k)$ is the search direction. The iteration of (3.16) is terminated when certain criteria or stopping conditions are satisfied.

As mentioned earlier, the most commonly used method for updating $\hat{\theta}(k)$ is the BP algorithm introduced in Chapter 2 ([Rumelhart et al., 1986], [Werbos, 1990]). This algorithm uses the gradient method and sets $\Delta\theta^{(\tau)}(k)$ directly proportional to the negative of the gradient of (3.10) evaluated at $\theta(k) = \theta^{(\tau)}(k)$. Using (3.10) and (3.16), the BP algorithm can be stated as:

$$\hat{\theta}(k) = \theta^{(\tau)}(k) - \mu^{(\tau)} \left. \frac{dJ[\theta(k)]}{d\theta(k)} \right|_{\theta(k) = \theta^{(\tau)}(k)} \qquad (3.17)$$

where the second term in (3.17) corresponds to the search direction $\Delta\theta^{(\tau)}(k)$ and $\mu^{(\tau)}$ is the step size. NN training using the BP algorithm, with an approximate steepest descent rule, has been reported to be characterized by poor performance in terms of poor convergence, poor robustness, long training cycles, and trapping in the several local minima of the cost function ([Chiong, 2010], [Guarneri et al., 2008], [Song, 2010], [Wu, 2008]). To improve the performance of the BP algorithm, the Gauss-Newton method has been widely used as a starting point. The ARLS and MLMA algorithms proposed in this work are both derived from the Gauss-Newton method.

3.3.4 The Gauss-Newton Second-Order Approximation Method

The quadratic criterion used for deriving the search direction of the Gauss-Newton method is briefly reviewed here, to facilitate the formulation of the ARLS and MLMA algorithms. The Gauss-Newton method uses a linear approximation $\varepsilon_\tau[k, \theta(k)]$ to the error $\varepsilon[k, \theta(k)]$ in (3.9), which can be expressed as:

$$\varepsilon[k, \theta(k)] \approx \varepsilon[k, \theta^{(\tau)}(k)] + \left. \frac{d\varepsilon[k, \theta(k)]}{d\theta(k)} \right|_{\theta(k) = \theta^{(\tau)}(k)}^{T} \left( \theta(k) - \theta^{(\tau)}(k) \right) = \varepsilon[k, \theta^{(\tau)}(k)] - \left. \frac{d\hat{Y}(k \mid \theta(k))}{d\theta(k)} \right|_{\theta(k) = \theta^{(\tau)}(k)}^{T} \left( \theta(k) - \theta^{(\tau)}(k) \right) \qquad (3.18)$$

where the coefficient of $\theta(k)$ in the second term is the negative derivative of the network output $\hat{Y}(k, \theta(k))$ with respect to $\theta(k)$, evaluated at $\theta(k) = \theta^{(\tau)}(k)$. Let $\psi[k, \theta^{(\tau)}(k)]$ be the derivative of the network output with respect to $\theta(k)$ appearing in (3.18), defined as

$$\psi[k, \theta(k)] = \frac{d\hat{Y}(k \mid \theta(k))}{d\theta(k)} \qquad (3.19)$$

The computation of (3.19) is carried out in a backpropagation fashion, for the input-to-hidden layer and for the hidden-to-output layer respectively, for the two-layer NN considered in this work. The computation of these derivatives of the network output with respect to the weights $\theta(k)$ is the only component that complicates the implementation of the training algorithms proposed in this work.

3.3.4.1 Computing the Gradient of the Network ψ[k,θ(k)]

Note that the two NN model predictors considered in this work are the NNARX and the NNARMAX model predictors. Their main distinguishing feature is the structure of the regression vector $\varphi(k, \theta(k))$. Using (3.6) and (3.7), the derivatives for the NNARX and the NNARMAX model predictors can be expressed from (3.19) as in the following computations. For notational convenience, the covariance matrix and the weight decay terms are not included, but their inclusion in the final result is straightforward, as shown in sub-section 3.3.4.3.

The Derivative of the NNARX Model Predictor

Using (3.19), the derivative for the NNARX model predictor can be expressed as

$$\psi_{NNARX}[k, \theta(k)] = \frac{d\hat{Y}(k, \theta(k))}{d\theta(k)} \qquad (3.20)$$

Note that in the NNARX model predictor the regression vector (3.6) does not depend on the weights, as there is no feedback, so it is straightforward to compute the partial derivatives of the network, denoted $\phi(k)$, which constitute (3.20). Thus, Equation (3.20) can be rewritten as

$$\psi_{NNARX}[k, \theta(k)] = \frac{d\hat{Y}(k \mid \theta(k))}{d\theta(k)} = \frac{\partial \hat{Y}(k \mid \theta(k))}{\partial \theta(k)} = \phi(k) \qquad (3.21)$$


The Derivative of the NNARMAX Model Predictor

In the NNARMAX model predictor, the regression vector (3.7) depends on the a posteriori error estimates. Thus, using (3.19), the derivative for the NNARMAX model predictor can be expressed as

$$\psi_{NNARMAX}[k, \theta(k)] = \frac{d\hat{Y}(k \mid \theta(k))}{d\theta(k)} = \frac{\partial \hat{Y}(k \mid \theta(k))}{\partial \theta(k)} - \frac{\partial \hat{Y}(k \mid \theta(k))}{\partial \varepsilon(k-1, \theta(k))} \frac{\partial \hat{Y}(k-1 \mid \theta(k))}{\partial \theta(k)} - \cdots - \frac{\partial \hat{Y}(k \mid \theta(k))}{\partial \varepsilon(k-n_c, \theta(k))} \frac{\partial \hat{Y}(k-n_c \mid \theta(k))}{\partial \theta(k)} \qquad (3.22)$$

Thus, Equation (3.22) can be expressed equivalently as

$$\psi_{NNARMAX}[k, \theta(k)] = \phi(k) - C_1(k)\, \psi_{NNARMAX}[k-1, \theta(k)] - \cdots - C_{n_c}(k)\, \psi_{NNARMAX}[k-n_c, \theta(k)] \qquad (3.23)$$

Suppose we let $C(k, z^{-1}) = 1 + C_1(k)\, z^{-1} + \cdots + C_{n_c}(k)\, z^{-n_c}$; then Equation (3.23) can be reduced to the following form:

$$\psi_{NNARMAX}[k, \theta(k)] = \frac{1}{C(k, z^{-1})}\, \phi(k) \qquad (3.24)$$

As can be seen from (3.24), the gradient is calculated by filtering the partial derivatives with the time-varying filter $1/C(k, z^{-1})$, which depends on the prediction errors based on the predicted outputs.

As it can be seen from (3.24), the gradient is calculated by filtering the partial derivatives with the time-varying

filter 11 ( , )C k z− which depends on the prediction errors based on the predicted outputs.

Due to the feedback signals, the NNARMAX model predictor may be unstable if the system to be

identified is not stable since the roots of (3.24) may, in general, not lie within the unit circle. One approach

proposed here to ensure that the predictor becomes stable is summarized in the algorithm of Table 3.2. Thus, this

algorithm ensures that roots of 1( , )C k z− lies within the unit circle before the weights are updated by a suitable

training algorithm such as the ones proposed later in this work.

Table 3.2: An algorithm for placing the roots of the time-varying filter of an NNARMAX model predictor within the unit circle for stability.

1) Given the network weights $\theta(k) = \theta^{(0)}(k)$, the time-varying filter $C(k, z^{-1}) = C^{(0)}(k, z^{-1})$ and the regression vector $\varphi(k, \theta(k))$.
2) Compute the roots of $C(k, z^{-1})$ as $C_{Roots}(k, z^{-1})$ and the number of roots of $C_{Roots}(k, z^{-1})$ as $l_{CRoots}$.
3) Compute the absolute values $|C_{Roots}(k, z^{-1})| = \mathrm{abs}(C_{Roots}(k, z^{-1}))$.
4) For $i = 1$ to $l_{CRoots}$:
   if $\mathrm{abs}(C_{Roots}^{(i)}(k, z^{-1})) > 1$, set
   $$C_{Roots}^{(i)}(k, z^{-1}) = \frac{1}{C_{Roots}^{(i)}(k, z^{-1})}$$
   end if; end for.
5) Recompute $C(k, z^{-1})$ using the (reflected) roots obtained in Step 4).
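A compact numerical rendering of Table 3.2, assuming real filter coefficients and using NumPy's root finding (an illustrative sketch, not the dissertation's implementation):

    import numpy as np

    def stabilize_C(c):
        """Reflect roots of C(k, z^{-1}) = 1 + c1*z^{-1} + ... + cnc*z^{-nc}
        that lie outside the unit circle, per Table 3.2, and rebuild C."""
        roots = np.roots(c)                                         # Step 2: roots of C
        roots = np.where(np.abs(roots) > 1.0, 1.0 / roots, roots)   # Step 4: reflect
        c_new = np.real(np.poly(roots))                             # Step 5: rebuild polynomial
        return c_new / c_new[0]                                     # keep leading coefficient 1

    # Toy usage: C(z^{-1}) = 1 - 1.5 z^{-1} has a root at 1.5 (unstable predictor);
    # the reflected filter is approximately 1 - 0.6667 z^{-1}.
    c_stable = stabilize_C(np.array([1.0, -1.5]))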


3.3.4.2 Computing the Partial Derivatives $\phi(k)$

From the computation of $\psi[k, \theta(k)]$ for the NNARX and NNARMAX model predictors in (3.21) and (3.24) respectively, it is evident that while $\psi[k, \theta(k)]$ differs between the predictors, $\phi(k)$ is the same in all cases. Let $\hat{Y}_h(k \mid \theta(k))$ and $\hat{Y}_o(k \mid \theta(k))$ be the outputs of the input-to-hidden and hidden-to-output layers when the regressor's dependency on the weights is ignored. Also, let $\hat{Y}_\varphi(k \mid \theta(k))$ be a generic output of the network when the regression vector depends on the weights.

Let the output of the $i$th unit of the general network, with an arbitrary number of hidden units $j$ in the hidden layer, shown in Fig. 3.3 and given by (3.11) and (3.12), be re-expressed here as

$$\hat{Y}_i(k \mid \theta(k)) = F_i\left( \sum_{j=1}^{n_h} W_{i,j}\, f_j\left( \sum_{l=1}^{n_\varphi} w_{j,l}\, \varphi_l(k, \theta(k)) + w_{j,0} \right) + W_{i,0} \right) = F_i\left( \sum_{j=0}^{n_h} W_{i,j}\, a_j(k) \right) \qquad (3.25)$$

where

$$a_j(k) = f_j\left( \sum_{l=0}^{n_\varphi} w_{j,l}\, \varphi_l(k, \theta(k)) \right) \qquad (3.26)$$

with the conventions $\varphi_0(k) \equiv 1$ and $a_0(k) \equiv 1$, so that the biases $w_{j,0}$ and $W_{i,0}$ are absorbed into the sums.

Computing the Partial Derivative Ignoring Regressor Dependencies on the Weights

In this case, it is assumed that the derivatives of the network output predictions with respect to the weights can be computed by ignoring the regressor's dependency on the weights. The partial derivatives for the input-to-hidden and hidden-to-output layers can then be expressed respectively as:

$$\frac{\partial \hat{Y}_h(k \mid \theta(k))}{\partial w_{j,l}} = \begin{cases} W_{i,j} \left[ 1 - a_j^2(k) \right] \varphi_l(k, \theta(k)), & l > 0 \\ W_{i,j} \left[ 1 - a_j^2(k) \right], & l = 0 \end{cases} \qquad (3.27)$$

and

$$\frac{\partial \hat{Y}_o(k \mid \theta(k))}{\partial W_{i,j}} = \begin{cases} a_j(k), & j > 0,\ i = n_o \\ 1, & j = 0,\ i = n_o \\ 0, & i \neq n_o \end{cases} \qquad (3.28)$$

Computing the Partial Derivative with Regressor Dependencies on the Weights

In this case, the derivative of the network output predictions with respect to the weights is computed under the assumption that the regressor depends on the weights. In other words, the instantaneous Jacobian, or actual gain matrix, is the derivative of the network output with respect to the inputs for a given input-output pair. Thus, the Jacobian is required for the portion of the inputs for which the regression vector depends on the weights for a given input-output pair. For the network given by (3.25) and (3.26), this derivative of the network can be expressed as:

$$\frac{\partial \hat{Y}_\varphi(k \mid \theta(k))}{\partial \varphi_l(k, \theta(k))} = \sum_{j=1}^{n_h} W_{i,j}\, w_{j,l} \left[ 1 - f_j^2\left( \sum_{l=1}^{n_\varphi} w_{j,l}\, \varphi_l(k, \theta(k)) + w_{j,0} \right) \right] = \sum_{j=1}^{n_h} W_{i,j}\, w_{j,l} \left[ 1 - a_j^2(k, \theta(k)) \right] \qquad (3.29)$$

Computing and Backpropagating the Output Derivatives within the Network

Now, considering (3.25) with (3.26), the partial derivative $\phi_{j,i,l}(k)$ of the network outputs with respect to the weights in the input-to-hidden layer can be expressed as

$$\phi_{j,i,l}(k) = \frac{\partial \hat{Y}_i(k \mid \theta(k))}{\partial w_{j,l}} = \varphi_l(k)\, f_j'\left( \sum_{l=0}^{n_\varphi} w_{j,l}\, \varphi_l(k) \right) W_{i,j}\, F_i'\left( \sum_{j=0}^{n_h} W_{i,j}\, a_j(k) \right) \qquad (3.30)$$

Thus, the gradient for the input-to-hidden layer weights can be expressed as

$$G(w_{j,l}) = \sum_{k=1}^{N} \varphi_l(k)\, f_j'\left( \sum_{l=0}^{n_\varphi} w_{j,l}\, \varphi_l(k) \right) \sum_{i=0}^{n_o} W_{i,j}\, \delta_i(k) = \sum_{k=1}^{N} \varphi_l(k)\, \delta_j(k) \qquad (3.31)$$

where

$$\delta_j(k) = f_j'\left( \sum_{l=0}^{n_\varphi} w_{j,l}\, \varphi_l(k) \right) \sum_{i=0}^{n_o} W_{i,j}\, \delta_i(k) \qquad (3.32)$$

Also, the partial derivative $\phi_{j,i}(k)$ of the network output with respect to the weights in the hidden-to-output layer is given by

$$\phi_{j,i}(k) = \frac{\partial \hat{Y}_i(k \mid \theta(k))}{\partial W_{i,j}} = a_j(k)\, F_i'\left( \sum_{j=0}^{n_h} W_{i,j}\, a_j(k) \right) \qquad (3.33)$$

The gradient for the hidden-to-output layer weights can therefore be expressed as

$$G(W_{i,j}) = \sum_{k=1}^{N} a_j(k)\, F_i'\left( \sum_{j=0}^{n_h} W_{i,j}\, a_j(k) \right) \left[ Y(k) - \hat{Y}(k \mid \theta(k)) \right] = \sum_{k=1}^{N} a_j(k)\, \delta_i(k) \qquad (3.34)$$

where

$$\delta_i(k) = F_i'\left( \sum_{j=0}^{n_h} W_{i,j}\, a_j(k) \right) \left[ Y(k) - \hat{Y}(k \mid \theta(k)) \right] \qquad (3.35)$$

Chapter 3 Neural Network Model Identification Algorithms

102

where f_j'(\cdot) and F_i'(\cdot) are the first-order derivatives of their respective arguments. The two terms \delta_j(k) and \delta_i(k) are the sensitivities of the weighted criterion or performance index (3.15) to changes in the net inputs of the units. Note the internal recurrence between the output and hidden layers initiated by (3.35) and (3.32), where the hidden-to-output derivative of (3.35) is backpropagated to the input-to-hidden layer via (3.32).
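A minimal sketch of these backpropagated sensitivities follows, assuming the tanh-hidden/linear-output network of the earlier sketch (so that F_i' = 1 and f_j' = 1 - a_j^2); the function and variable names are illustrative:

import numpy as np

def sensitivities(y, y_hat, a, W):
    """delta_i(k) of Eq. (3.35) and delta_j(k) of Eq. (3.32) for one data pair."""
    delta_out = y - y_hat                          # F' = 1 for a linear output layer
    delta_hid = (1.0 - a**2) * (W.T @ delta_out)   # backpropagate through tanh
    return delta_out, delta_hid

def gradients(phi, a, delta_out, delta_hid):
    """Per-sample gradient terms of (3.31) and (3.34) as outer products of the
    layer inputs with the corresponding sensitivities."""
    G_hidden_output = np.outer(delta_out, a)    # a_j(k) * delta_i(k)
    G_input_hidden = np.outer(delta_hid, phi)   # phi_l(k) * delta_j(k)
    return G_input_hidden, G_hidden_output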

3.3.4.3 Second-Order Expansion and the Gauss-Newton Search Direction

Thus, during the computation of the partial derivatives of the network outputs with respect to the weights, when the regressors' dependence on the weights is ignored, the weights of the input-to-hidden and hidden-to-output layers are updated according to (3.27) and (3.28) respectively. However, when the regressor depends on the weights, the computation specified in (3.29) is employed. Finally, the weights are backpropagated in accordance with the sensitivity terms given by (3.32) and (3.35) to obtain the output predictions with respect to the joint input-to-hidden and hidden-to-output weights.

The Gauss-Newton method uses the linear approximation error of (3.9), evaluated at \theta(k) = \theta_\tau(k), to search for \hat{\theta}(k) in the neighborhood J_\tau(\theta(k)) of the local minimum. Substituting \psi[k,\theta(k)] in (3.19) with \theta(k) = \theta_\tau(k) into (3.18) gives:

\varepsilon(k,\theta(k)) = \varepsilon(k,\theta_\tau(k)) - [\theta(k) - \theta_\tau(k)]^T \psi[k,\theta_\tau(k)]   (3.36)

The half mean square of the linear expansion (3.36) over time k gives the Gauss-Newton linear quadratic criterion defined as

J_\tau(Z^N, \theta(k)) = \frac{1}{2N} \sum_{k=1}^{N} \varepsilon_\tau^2[k, \theta(k)]   (3.37)

where \varepsilon_\tau denotes the linearized error of (3.36).

Expanding (3.37) using (3.18) gives

J_\tau(Z^N, \theta(k)) = \frac{1}{2N} \sum_{k=1}^{N} \big( \varepsilon[k,\theta_\tau(k)] - [\theta(k) - \theta_\tau(k)]^T \psi[k,\theta_\tau(k)] \big)^2

which can be simplified to obtain the following expression:

J_\tau(Z^N, \theta(k)) = \frac{1}{2N} \sum_{k=1}^{N} \varepsilon^2[k,\theta_\tau(k)] - \frac{1}{N} \sum_{k=1}^{N} \varepsilon[k,\theta_\tau(k)]\, [\theta(k) - \theta_\tau(k)]^T \psi[k,\theta_\tau(k)] + \frac{1}{2N} \sum_{k=1}^{N} [\theta(k) - \theta_\tau(k)]^T \psi[k,\theta_\tau(k)]\, \psi^T[k,\theta_\tau(k)]\, [\theta(k) - \theta_\tau(k)]   (3.38)

where \psi[k,\theta_\tau(k)] in (3.38) denotes the derivative of the network output with respect to \theta(k) evaluated at \theta(k) = \theta_\tau(k). By re-arranging the terms in (3.38), Equation (3.37) can be expressed equivalently as:


J_\tau(Z^N, \theta(k)) = J_\tau(Z^N, \theta_\tau(k)) - \frac{1}{N} \sum_{k=1}^{N} [\theta(k) - \theta_\tau(k)]^T \psi[k,\theta_\tau(k)]\, \varepsilon[k,\theta_\tau(k)] + \frac{1}{2N} \sum_{k=1}^{N} [\theta(k) - \theta_\tau(k)]^T \psi[k,\theta_\tau(k)]\, \psi^T[k,\theta_\tau(k)]\, [\theta(k) - \theta_\tau(k)]   (3.39)

Furthermore, substituting J_\tau(Z^N, \theta(k)) for J_\tau(Z^N, \theta_\tau(k)) in (3.39) and setting

G[\theta_\tau(k)] = -\frac{1}{N} \sum_{k=1}^{N} \psi[k,\theta_\tau(k)]\, \varepsilon[k,\theta_\tau(k)]   (3.40)

and

R[\theta_\tau(k)] = \frac{1}{N} \sum_{k=1}^{N} \psi[k,\theta_\tau(k)]\, \psi^T[k,\theta_\tau(k)]   (3.41)

gives the quadratic criterion expressed as

J_\tau(Z^N, \theta(k)) = J_\tau(Z^N, \theta_\tau(k)) + G^T[\theta_\tau(k)]\, [\theta(k) - \theta_\tau(k)] + \frac{1}{2} [\theta(k) - \theta_\tau(k)]^T R[\theta_\tau(k)]\, [\theta(k) - \theta_\tau(k)]   (3.42)

where G[\theta_\tau(k)], the gradient matrix, is the first-order coefficient that measures the gradient of the cost function (3.15), and R[\theta_\tau(k)] is the second-order coefficient, which is exactly the Gauss-Newton Hessian matrix.

By re-introducing the covariance matrix for the noise and the weight decay term, G[\theta_\tau(k)] and R[\theta_\tau(k)] in (3.40) and (3.41) can be expressed respectively in terms of the weighted criterion of (3.15) as

G[\theta_\tau(k)] = -\frac{1}{N} \sum_{k=1}^{N} \psi[k,\theta_\tau(k)]\, \Gamma^{-1}[\theta_\tau(k)]\, \varepsilon[k,\theta_\tau(k)] + D\,\theta_\tau(k)   (3.43)

and

R[\theta_\tau(k)] = \frac{1}{N} \sum_{k=1}^{N} \psi[k,\theta_\tau(k)]\, \Gamma^{-1}[\theta_\tau(k)]\, \psi^T[k,\theta_\tau(k)] + D   (3.44)

Next, by substituting \Delta\theta_\tau(k) = \theta(k) - \theta_\tau(k) into (3.42) above and setting its derivative to zero, the Gauss-Newton search direction can be obtained and expressed as follows

\Delta\theta_\tau(k) = -R^{-1}[\theta_\tau(k)]\, G[\theta_\tau(k)]   (3.45)

and the resulting Gauss-Newton updating rule becomes:

\hat{\theta}(k) = \theta_\tau(k) + \Delta\theta_\tau(k)   (3.46)
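A compact sketch of one batch Gauss-Newton update, Eqs. (3.43)-(3.46), is shown below, assuming \Gamma = I and a diagonal weight decay D = d I, with the per-sample derivatives \psi and errors \varepsilon already stacked; all names are illustrative:

import numpy as np

def gauss_newton_step(theta, Psi, eps, d=0.0):
    """theta: (p,) current iterate; Psi: (N, p) rows are psi^T[k, theta_tau(k)];
    eps: (N,) prediction errors; d: scalar weight decay."""
    N = len(eps)
    G = -(Psi.T @ eps) / N + d * theta               # gradient, Eq. (3.43)
    R = (Psi.T @ Psi) / N + d * np.eye(len(theta))   # GN Hessian, Eq. (3.44)
    d_theta = -np.linalg.solve(R, G)                 # search direction, Eq. (3.45)
    return theta + d_theta                           # update, Eq. (3.46)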

3.3.5 The Adaptive Recursive Least Squares (ARLS) Algorithm

The recursive algorithm is relevant in situations where: 1) the network has a very large number of parameters, 2) the training data set is very large and there is a lack of sufficient memory, and 3) there is a high degree of redundancy in the data set. The only disadvantage is the excessive computation time when compared to batch or standard algorithms. However, for a small data set Z^N, the standard calculations can be repeated as the data size increases and more data is acquired. But as N becomes larger, the computation may become prohibitive because the dimensions of (3.5), (3.43) and (3.44) depend on N. In this section, an adaptive recursive least squares (ARLS) algorithm is proposed that estimates and updates \hat{\theta}(k) in (3.46) at each time sample as a new data pair is added to Z^N. The proposed ARLS is formulated as a standard algorithm, but it computes the inverse of (3.44) as in (3.45) directly to obtain \hat{\theta}(k) in (3.46).

The proposed ARLS algorithm is derived from (3.45) and (3.46) under the assumptions that: 1) a new input-output data pair is progressively added to Z^N in a first-in first-out fashion according to (3.2) at each time step k such that 0 < k < N; 2) Equation (3.44) is non-singular for all k; 3) the parameter vector \hat{\theta}(k) is updated after a complete sweep through Z^N; 4) the parameters of the physical system \theta(k) vary slowly; and 5) all of Z^N is repeated \tau times.

Since the size of Z^N will be increasing successively, and since it is assumed that the size will increase by one at each time step, let the time index k = N be such that 1 \le k \le N. Thus, Equation (3.15) can be expressed as the following weighted criterion:

J_\tau(Z^N, \theta(k)) = \frac{1}{2k} \sum_{\iota=1}^{k} \pi^{k-\iota} \big( \varepsilon^T[\iota,\theta_\tau(\iota)]\, \Gamma^{-1}[\theta(\iota)]\, \varepsilon[\iota,\theta_\tau(\iota)] + \theta^T(\iota)\, D\, \theta(\iota) \big)   (3.47)

where \pi \in [0,1) is an exponential forgetting and resetting parameter that gives more recent data a higher weight during the optimization process.

Also, let \theta(k-1) denote the least squares estimate from the criterion (3.47) based on the data pair at time k-1; then the updating rule for the ARLS algorithm at time k, subject to the value of the parameter vector \theta(k-1) at time k-1, can be expressed for (3.47) from (3.46) as:

\hat{\theta}(k) = \theta_\tau(k \mid k-1) - R^{-1}[\theta_\tau(k)]\, G[\theta_\tau(k \mid k-1)]   (3.48)

Assuming that the parameter vector \theta(k-1) actually minimized (3.47) at time k-1, the gradient matrix G[\theta_\tau(k)] can be expressed as

G[\theta_\tau(k)] = -\frac{1}{k} \sum_{\iota=1}^{k} \big( \psi[\iota,\theta_\tau(\iota-1)] \cdot \Gamma^{-1}[\theta(\iota-1)] \cdot \varepsilon[\iota,\theta_\tau(\iota-1)] + D\,\theta(\iota-1) \big)
= -\frac{1}{k} \sum_{\iota=1}^{k-1} \big( \psi[\iota,\theta_\tau(\iota-1)] \cdot \Gamma^{-1}[\theta(\iota-1)] \cdot \varepsilon[\iota,\theta_\tau(\iota-1)] + D\,\theta(\iota-1) \big) - \frac{1}{k} \big( \psi[k,\theta_\tau(k-1)] \cdot \Gamma^{-1}[\theta(k-1)] \cdot \varepsilon[k,\theta_\tau(k-1)] + D\,\theta(k-1) \big)
= \frac{k-1}{k}\, G[Z^{k-1}, \theta_\tau(k-1)] - \frac{1}{k} \big( \psi[k,\theta_\tau(k-1)] \cdot \Gamma^{-1}[\theta(k-1)] \cdot \varepsilon[k,\theta_\tau(k-1)] + D\,\theta(k-1) \big)   (3.49)


Since it is assumed that \theta(k-1) minimized (3.47) at time k-1, the first term on the right hand side of (3.49) above equals zero. Hence, Equation (3.49) reduces to

G[\theta_\tau(k)] = -\frac{1}{k} \big( \psi[k,\theta_\tau(k-1)] \cdot \Gamma^{-1}[\theta(k-1)] \cdot \varepsilon[k,\theta_\tau(k-1)] + D\,\theta(k-1) \big)   (3.50)

and the Gauss-Newton Hessian matrix R[\theta_\tau(k)] becomes

R[\theta_\tau(k)] = \frac{1}{k} \sum_{\iota=1}^{k} \psi[\iota,\theta_\tau(\iota-1)] \cdot \Gamma^{-1}[\theta(\iota-1)] \cdot \psi^T[\iota,\theta_\tau(\iota-1)] + D
= R[\theta_\tau(k-1)] + \frac{1}{k} \big( \psi[k,\theta_\tau(k-1)] \cdot \Gamma^{-1}[\theta(k-1)] \cdot \psi^T[k,\theta_\tau(k-1)] - R[\theta_\tau(k-1)] + D \big)   (3.51)

where \psi[k,\theta_\tau(k-1)] is the derivative of the network outputs with respect to \theta(k) evaluated at \theta(k) = \theta_\tau(k), as defined by (3.19) but at time k-1.

Moreover, by ignoring the fact that \psi[\iota,\theta_\tau(\iota-1)] should, in principle, be evaluated at \theta_\tau(k-1) and not at \theta_\tau(\iota-1), the update for the ARLS algorithm can thus be expressed by the following two expressions:

\hat{\theta}(k) = \theta_\tau(k-1) + R^{-1}[\theta_\tau(k-1)]\, G[\theta_\tau(k-1)]   (3.52)

R[\theta_\tau(k)] = R[\theta_\tau(k-1)] + \frac{1}{k} \big( \psi[k-1,\theta_\tau(k-1)] \cdot \Gamma^{-1}[\theta(k-1)] \cdot \psi^T[k-1,\theta_\tau(k-1)] - R[\theta_\tau(k-1)] + D \big)   (3.53)

In order to avoid the inversion of R[\theta_\tau(k)] in (3.52), let P(k) be a covariance matrix that will be used to compute the inverse of (3.53) in (3.52), defined as

P(k) = \Big( \sum_{\iota=1}^{k} R[\iota,\theta_\tau(\iota)] \Big)^{-1}, \quad k = 1, 2, \ldots   (3.54)

where P(k) is a diagonal matrix. Equation (3.54) above can also be expressed equivalently as

P^{-1}(k) = \sum_{\iota=1}^{k} R[\iota,\theta_\tau(\iota)]   (3.55)

By pulling the last term out of the summation in (3.55), the equation can be rewritten as:

P^{-1}(k) = \sum_{\iota=1}^{k-1} R[\theta_\tau(\iota)] + R[\theta_\tau(k)]   (3.56)

and hence

P^{-1}(k) = P^{-1}(k-1) + R[\theta_\tau(k)]   (3.57)

Now, using (3.45), rewritten here again as

\Delta\theta_\tau(k) = -R^{-1}[\theta_\tau(k)]\, G[\theta_\tau(k)] = -\Big( \sum_{\iota=1}^{k} R[\theta_\tau(\iota)] \Big)^{-1} \sum_{\iota=1}^{k} G[\theta_\tau(\iota)] = -P(k) \sum_{\iota=1}^{k} G[\theta_\tau(\iota)] = -P(k) \Big( \sum_{\iota=1}^{k-1} G[\theta_\tau(\iota)] + G[\theta_\tau(k)] \Big)   (3.58)

Hence,


\theta_\tau(k-1) = -P(k-1) \sum_{\iota=1}^{k-1} G[\theta_\tau(\iota)]

so that

-P^{-1}(k-1)\, \theta_\tau(k-1) = \sum_{\iota=1}^{k-1} G[\theta_\tau(\iota)]   (3.59)

Now, replacing P^{-1}(k-1) in (3.59) above with the result in (3.57) gives the following expression

-\big( P^{-1}(k) - R[\theta_\tau(k)] \big)\, \theta_\tau(k-1) = \sum_{\iota=1}^{k-1} G[\theta_\tau(\iota)]   (3.60)

Then, using the result from (3.58) gives the following expression

\theta_\tau(k) = P(k) \big( P^{-1}(k) - R[\theta_\tau(k)] \big)\, \theta_\tau(k-1) - P(k)\, G[\theta_\tau(k)]
= \theta_\tau(k-1) - P(k) \big( R[\theta_\tau(k)]\, \theta_\tau(k-1) + G[\theta_\tau(k)] \big)
= \theta_\tau(k-1) + P(k)\, \psi(k) \big( Y(k) - \psi^T(k)\, \theta_\tau(k-1) \big)   (3.61)

Equation (3.61) provides a method to compute the estimate of the parameter vector \theta_\tau(k) at each time step k from the past estimate \theta_\tau(k-1) and the latest received data pair Z^k. Note that Y(k) - \psi^T(k)\, \theta_\tau(k-1) in (3.61) is the error in predicting Y(k) using \theta_\tau(k-1).

To update \theta_\tau(k) in (3.61), the value of P(k) is required, and it may be convenient to reuse (3.57). But this would, of course, require that the inverse of the matrix P^{-1}(k) be computed at each time step k (i.e., each time a new input-output data pair is received). Clearly, this computation is not desirable for real-time implementation and must be avoided. An alternative to the matrix inversion is to use the following matrix inversion lemma:

Lemma 3.1: If A \in \Re^{n \times n}, B \in \Re^{n \times m}, C \in \Re^{m \times m} and D \in \Re^{m \times n}, then

[A + BCD]^{-1} = A^{-1} - A^{-1} B [C^{-1} + D A^{-1} B]^{-1} D A^{-1}   (3.62)

provided that A, C and [C^{-1} + D A^{-1} B] are non-singular square matrices.

Thus, the matrix inversion Lemma 3.1 above can be used to remove the need to compute the inverse of P^{-1}(k) that comes from (3.57), so that it can be used in (3.61) to update \theta_\tau(k). Notice from (3.57) that

P(k) = \big( P^{-1}(k-1) + R[\theta_\tau(k)] \big)^{-1}   (3.63)

and, by applying Lemma 3.1 with A = P^{-1}(k-1), B = \psi[k,\theta_\tau(k-1)], C = \Gamma^{-1}[\theta_\tau(k)] and D = \psi^T[k,\theta_\tau(k-1)], Equation (3.57) can then be expressed as

P(k) = P(k-1) - \frac{P(k-1)\, \psi[k,\theta_\tau(k-1)]\, \psi^T[k,\theta_\tau(k-1)]\, P(k-1)}{\Gamma[\theta_\tau(k)] + \psi^T[k,\theta_\tau(k-1)]\, P(k-1)\, \psi[k,\theta_\tau(k-1)]}   (3.64)


which together with (3.61) is called the weighted recursive least squares algorithm. Basically, the matrix inversion

lemma turns the inversion of a matrix into the inversion of a scalar (i.e., the denominator on the right hand side of

(3.64) is a scalar).

The next task is to initialize and implement the weighted recursive least squares algorithm above. One approach that is often used is to set \theta(0) = 0 and P(0) = P_0, where P_0 = \alpha I for some large \alpha > 0 [Spooner et al., 2002]. In this case, the initial covariance matrix P(0) = P_0 may be selected, and \theta(0) = 0 is chosen as the initial guess of the true parameter vector \theta(k). Another choice, proposed in [Nørgaard et al., 2000], is to choose \alpha such that 10^4 < \alpha < 10^8.

The approach proposed in this work is to initialize \theta(0) as a small random matrix and then select P(0) such that the following are satisfied: 1) an upper bound for P(k), i.e. a non-zero lower bound for P^{-1}(k); and 2) an upper bound for P^{-1}(k), i.e. a non-zero lower bound for P(k). To satisfy these two conditions in conjunction with the weighted criterion of (3.47), the initial covariance matrix P(0) is selected here such that:

e_{min} I < P(0) < e_{max} I   (3.65)

and

e_{max} / e_{min} \simeq 10^4   (3.66)

where e_{min} and e_{max} are the minimum and maximum eigenvalues of P(0), given respectively by

e_{min} = \frac{\gamma - \alpha}{2\delta'} \Big( -1 + \sqrt{1 + 4\beta\delta' / (\gamma - \alpha)^2} \Big), \qquad e_{max} = \frac{\gamma}{2\delta'} \Big( 1 + \sqrt{1 + 4\beta\delta' / \gamma^2} \Big)   (3.67)

where \alpha, \beta, \delta' and \pi are four design parameters which are selected such that the following conditions are satisfied [Salgado et al., 1988]:

0 < \gamma - \alpha < 1, \quad \beta > 0, \quad \delta' > 0, \quad (\gamma - \alpha)^2 + 4\beta\delta' < (1 - \alpha)^2   (3.68)

where \alpha \in [0.1, 0.5] adjusts the gain of (3.52); \delta' \in [0, 0.01] is a small constant that is inversely related to the maximum eigenvalue of P(k); \pi \in [0.9, 0.99] is the exponential forgetting factor, which is selected such that \gamma \simeq (1-\pi)/\pi; and \beta \in [0, 0.01] is a small constant which is related to the minimum e_{min} and maximum e_{max} eigenvalues in (3.65). Also, the values of \alpha, \beta, \delta' and \pi in (3.68) are selected such that e_{max}/e_{min} \simeq 10^4 in (3.66).

Thus, the new formula for the computation of the covariance matrix, based on the exponential forgetting and resetting parameter in (3.47), can be expressed as follows:

P(k) = \frac{1}{\pi} \big( P(k-1) - \Lambda(k)\, \psi^T[k,\theta_\tau(k-1)]\, P(k-1) \big) + \beta I - \delta' P^2(k-1)   (3.69)

where \Lambda(k) is the adaptation factor given by

\Lambda(k) = \frac{\alpha\, P(k-1)\, \psi[k,\theta_\tau(k-1)]}{\Gamma[\theta_\tau(k-1)] + \psi^T[k,\theta_\tau(k-1)]\, P(k-1)\, \psi[k,\theta_\tau(k-1)]}   (3.70)

and I is a diagonal matrix of appropriate dimension. Thus, the complete update for the parameter vector \hat{\theta}(k) is given from (3.61) as

\hat{\theta}(k) = \hat{\theta}(k-1) + \Lambda(k)\, \Gamma^{-1}[\theta_\tau(k-1)] \big[ Y(k) - \hat{Y}(k \mid \theta_\tau(k-1)) \big]   (3.71)

which, together with (3.69) and (3.70), is called the adaptive recursive least squares (ARLS) algorithm.
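A sketch of one ARLS time update, Eqs. (3.69)-(3.71), is given below for a scalar-output model with \Gamma = 1; the default values of \pi, \alpha, \beta and \delta' are illustrative choices within the ranges of (3.68), and all names are assumptions of this sketch:

import numpy as np

def arls_step(theta, P, psi, y, y_hat, pi=0.95, alpha=0.3, beta=0.005, delta_p=0.005):
    """theta: (p,) previous estimate; P: (p, p) previous covariance P(k-1);
    psi: (p,) derivative psi[k, theta_tau(k-1)]; y, y_hat: measured and predicted output."""
    denom = 1.0 + psi @ P @ psi                  # Gamma + psi^T P psi, Eq. (3.70)
    Lam = alpha * (P @ psi) / denom              # adaptation factor Lambda(k)
    theta_new = theta + Lam * (y - y_hat)        # parameter update, Eq. (3.71)
    P_new = (P - np.outer(Lam, psi @ P)) / pi \
            + beta * np.eye(len(theta)) - delta_p * P @ P   # EFRA-type update, Eq. (3.69)
    return theta_new, P_new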

Note that in the proposed algorithm, the exponential forgetting and resetting parameter \pi discards old information as new data is acquired, to avoid redundant information in the model \hat{\theta}(k). In the formulation of the ARLS, (3.44), which resulted in (3.51), as well as the terms in the matrix inversion Lemma 3.1, were assumed to be non-singular. The two conditions stated above for selecting P(0) are then reflected in (3.69).

The first condition suggests the addition of a positive definite term to the P^{-1}(k) update, and the second condition suggests the addition of a positive definite term to P(k). In this algorithm, the first term in (3.69) relates to the exponential forgetting and resetting parameter in the weighted criterion of (3.47), the second term corresponds to the usual recursive least squares algorithm, the third term provides the positive definite addition to the P(k) update, and the fourth term provides the positive definite addition to the P^{-1}(k) update. In this way, the positive definiteness of the matrix inversion lemma is guaranteed. Note that after \hat{\theta}(k) is obtained, the algorithm of Table 3.1 is also implemented to estimate the noise covariance matrix \Gamma[\hat{\theta}_\tau(k)], and the weights are finally updated according to the conditions specified in Step 7).

3.3.6 The Modified Levenberg-Marquardt Algorithm (MLMA)

The Levenberg-Marquardt algorithm (LMA) utilizes the general form of (3.46), but with a modification to (3.45) by the inclusion of the non-negative scalar parameter \lambda_\tau on the diagonal of R[\theta_\tau(k)], with a new iterative updating rule as follows ([Dennis and Schnabel, 1996]; [Hagan and Menhaj, 1996]; [Marquardt, 1963]):

\hat{\theta}(k) = \theta_\tau(k) + \Delta\theta_\tau(k)   (3.72)

\Delta\theta_\tau(k) = -\big( R[\theta_\tau(k)] + \lambda_\tau I \big)^{-1} G[\theta_\tau(k)]   (3.73)

where I is a diagonal matrix, and G[\theta_\tau(k)] and R[\theta_\tau(k)] are:

G[\theta_\tau(k)] = -\frac{1}{N} \sum_{k=1}^{N} \psi[k,\theta_\tau(k)] \cdot \Gamma^{-1}[\theta_\tau(k)] \cdot \varepsilon[k,\theta_\tau(k)] + D\,\theta_\tau(k)   (3.74)

R[\theta_\tau(k)] = \frac{1}{N} \sum_{k=1}^{N} \psi[k,\theta_\tau(k)] \cdot \Gamma^{-1}[\theta_\tau(k)] \cdot \psi^T[k,\theta_\tau(k)] + D   (3.75)

and \psi[k,\theta_\tau(k)] is the derivative of the network outputs with respect to \theta(k) evaluated at \theta(k) = \theta_\tau(k), as given in (3.19).

Proof of the Non-Negative Property of \lambda_\tau

Recall the quadratic criterion given by (3.42). Substituting \theta(k) = \theta_\tau(k) + \Delta\theta_\tau(k) into (3.42), and noting that \Delta\theta_\tau(k) = \theta(k) - \theta_\tau(k), results in the following expression

J_\tau(Z^N, \theta_\tau(k) + \Delta\theta_\tau(k)) = J_\tau(Z^N, \theta_\tau(k)) + G^T[\theta_\tau(k)]\, \Delta\theta_\tau(k) + \frac{1}{2} \Delta\theta_\tau^T(k)\, R[\theta_\tau(k)]\, \Delta\theta_\tau(k)   (3.76)

From (3.73), we can write

\big( R[\theta_\tau(k)] + \lambda_\tau I \big)\, \Delta\theta_\tau(k) = -G[\theta_\tau(k)]   (3.77)

By subtracting (3.76) from (3.9) around the current iterate \tau and using (3.77), one obtains

J(Z^N, \theta_\tau(k)) - J_\tau(Z^N, \theta_\tau(k) + \Delta\theta_\tau(k)) = -G^T[\theta_\tau(k)]\, \Delta\theta_\tau(k) - \frac{1}{2} \Delta\theta_\tau^T(k)\, R[\theta_\tau(k)]\, \Delta\theta_\tau(k)
= \Delta\theta_\tau^T(k) \big( R[\theta_\tau(k)] + \lambda_\tau I \big)\, \Delta\theta_\tau(k) - \frac{1}{2} \Delta\theta_\tau^T(k)\, R[\theta_\tau(k)]\, \Delta\theta_\tau(k)
= \frac{1}{2} \Delta\theta_\tau^T(k)\, R[\theta_\tau(k)]\, \Delta\theta_\tau(k) + \lambda_\tau\, \Delta\theta_\tau^T(k)\, \Delta\theta_\tau(k)   (3.78)

If \lambda_\tau is non-negative and R[\theta_\tau(k)] is positive definite, then

J(Z^N, \theta_\tau(k)) - J_\tau(Z^N, \theta_\tau(k) + \Delta\theta_\tau(k)) \ge 0   (3.79)

The Search Direction with the Levenberg-Marquardt Method

The parameter \lambda_\tau characterizes a hybrid of search directions and has several effects: 1) for relatively small values of \lambda_\tau, for example \lambda_\tau = 0, Equation (3.73) reduces to the search direction of the Gauss-Newton method, and \big( R[\theta_\tau(k)] + \lambda_\tau I \big)^{-1} may become ill-conditioned or non-positive definite; 2) for large values of \lambda_\tau, the identity matrix dominates the right product matrix of (3.73) to produce the search direction of the steepest descent algorithm (with step 1/\lambda_\tau), which requires a descent search method; and 3) for intermediate values of \lambda_\tau, Equation (3.73) characterizes a hybrid of the search directions used respectively by the gradient method and the Gauss-Newton method.

Despite the fact that (3.15) is a quadratic weighted criterion, the convergence of the LMA may be slow since \theta_\tau(k) contains many parameters of different magnitudes, especially if these magnitudes are large, as in most cases ([Ljung, 1999], [Sjöberg and Ljung, 1995]). This problem can be alleviated by adding a scaling matrix S_\tau = sI (where s is the scaling parameter and I is an identity matrix) which is adjusted simultaneously with \theta_\tau(k). Furthermore, to ensure that the Gauss-Newton Hessian matrix and its inverse \big( R[\theta_\tau(k)] + \lambda_\tau I \big)^{-1} will always be positive definite, instead of checking the direct Levenberg-Marquardt form \big( R[\theta_\tau(k)] + \lambda_\tau I \big)^{-1} proposed in [Marquardt, 1963], the form proposed here together with the scaling matrix S_\tau = sI is given as

V[\theta_\tau(k)] = R[\theta_\tau(k)] + \lambda_\tau S_\tau^T S_\tau   (3.80)

which will ensure that (3.80) is always positive definite, with fast convergence based on a suitable choice of \lambda_\tau.

Table 3.3: Iterative algorithm for selecting the Levenberg-Marquardt parameter \lambda_\tau.

Initialize m_k \in [0.5, \eta] = [0.5, 1, 2, 4, 6, 8] of length kl.
Let [sm, sn] = size(V[\theta_\tau(k)]). Set L_{a,a}(\theta(k)) = -1.
Evaluate (3.80).
for i = 1 to sm
  while iter < kl
    for kn = 1 to kl
      for a = 1 to sn
        L_{a,a}(\theta(k)) = \big( V_{a,a}(\theta(k)) - \sum_{j=1}^{a-1} L_{a,j}^2(\theta(k)) \big)^{1/2}   (3.81)
      end for a.
      if L_{a,a}(\theta(k)) < 0, \lambda = \lambda \cdot m_k(1, kn) and recompute (3.80);
        set a = a + 1 and recompute (3.81)
      else, for b = a+1 to sn
        L_{b,a}(\theta(k)) = \frac{1}{L_{a,a}(\theta(k))} \big( V_{b,a}(\theta(k)) - \sum_{j=1}^{a-1} L_{a,j}(\theta(k))\, L_{b,j}(\theta(k)) \big)   (3.82)
      end for b, end if L_{a,a}(\theta(k)), end for kn.
      iter = iter + 1
      if iter > kl and L_{a,a}(\theta(k)) < 0, break, end.
      Set \lambda_\tau \leftarrow \lambda and recompute (3.80) using \lambda_\tau.
    end while iter, end for sn.


This problem is one reason why the LMA has not been used online. Thus, if the Gauss-Newton matrix is singular or indefinite, the Cholesky factorization in the algorithm of Table 3.3 will fail, and the value of \lambda_\tau is increased until non-singularity and/or positive definiteness is achieved. The proposed algorithm can therefore be used effectively to check whether or not a symmetric matrix is positive definite. Once the positive definiteness of (3.80) is achieved, the Cholesky factors L_{b,a}(\theta(k)) in (3.82) are produced, whose inverse is guaranteed to be positive definite as well.

Different from other methods ([Chiong, 2010], [Dennis and Schnabel, 1996], [Kelley, 1999], [Nørgaard et al., 2000]), the method proposed here uses the Cholesky factorization algorithm ([Antoniou and Lu, 2007], [Boyd and Vandenberghe, 2007]), which is implemented in such a way as to iteratively select \lambda_\tau to guarantee the positive definiteness of (3.80) for online application. First, (3.80) is computed and the check is performed. If (3.80) is positive definite, the algorithm is terminated; otherwise \lambda is increased iteratively until this is achieved. The method is summarized in Table 3.3. The key parameters in the algorithm are \eta and how \lambda_\tau is updated at each iteration. Next, the Cholesky factors L_{b,a}(\theta(k)) given by (3.82) in Table 3.3 are reused to compute the search direction from (3.73) in a two-stage forward and backward substitution procedure given respectively as:

L_{b,a}[\theta(k)]\, \Delta\tilde{\theta}_\tau(k) = G[\theta_\tau(k)]   (3.83)

\Delta\theta_\tau(k) = \big( L_{b,a}^T[\theta(k)] \big)^{-1}\, \Delta\tilde{\theta}_\tau(k)   (3.84)
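A sketch of the positive-definiteness test behind Table 3.3 combined with the two-stage triangular solves (3.83)-(3.84) is given below. Instead of the element-wise recursion (3.81)-(3.82), it relies on numpy's Cholesky routine failing on an indefinite matrix, which is the same test expressed differently; all names and default values are assumptions of this sketch:

import numpy as np

def lm_direction(R, G, lam=1e-3, s=1.0, factor=2.0, max_tries=20):
    """Inflate lam until V = R + lam * (sI)^T (sI) of (3.80) admits a Cholesky
    factor L, then solve L z = -G (forward) and L^T dtheta = z (backward)."""
    S2 = (s * s) * np.eye(R.shape[0])
    for _ in range(max_tries):
        try:
            L = np.linalg.cholesky(R + lam * S2)   # fails if V is not positive definite
            z = np.linalg.solve(L, -G)             # forward substitution, Eq. (3.83)
            return np.linalg.solve(L.T, z), lam    # backward substitution, Eq. (3.84)
        except np.linalg.LinAlgError:
            lam *= factor                          # enlarge lam and retry
    raise RuntimeError("V(theta(k)) could not be made positive definite")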

The convergence of the LMA using (3.72)-(3.75), (3.83) and (3.84) with the algorithm of Table 3.3 may again be slow if the initial guess \theta^{(0)}(k) is too far from the optimum value \hat{\theta}(k). Thus, the LMA is sometimes combined with the trust region method, so that the search for \hat{\theta}(k) is constrained to a trusted region \delta_\tau. The problem can be defined as:

\theta_\tau(k) = \arg\min_{\theta(k) \in \Theta} J_\tau(Z^N, \theta(k))   (3.85)

subject to

\| S_\tau (\theta(k) - \theta_\tau(k)) \| \le \delta_\tau   (3.86)

where J_\tau(Z^N, \theta(k)) is the second-order Gauss-Newton quadratic criterion given by (3.42), which is expected to be valid only in a neighborhood around the current iterate evaluated at \theta(k) = \theta_\tau(k). Thus, with this modified Levenberg-Marquardt approach and the trust region method, and using the results from (3.84), Equation (3.72) is rewritten as

\hat{\theta}(k) = \theta_\tau(k) + \Delta\theta_\tau(k)   (3.87)

The choice of selecting and/or adjusting \delta_\tau and \lambda_\tau has led to the coding of several algorithms ([Chiong, 2010], [Colin et al., 2007], [Dennis and Schnabel, 1996], [Guarneri et al., 2008], [Fletcher, 1987], [Nørgaard et al., 2000], [Wu, 2008]). Instead of adjusting \delta_\tau directly, this work develops on the indirect approach proposed in [Fletcher, 1987], but reuses the \lambda_\tau computed in Table 3.3 to update the weighted criterion (3.15).


The idea here is to observe how well the reduction in the criterion matches the reduction predicted by the approximate J_\tau(Z^N, \theta(k)), and then adjust \lambda_\tau according to this reduction. Here, \lambda_\tau is adjusted according to the accuracy of the ratio \alpha_\tau between the actual reduction (ared) and the theoretically predicted decrease (pdec) of (3.15) and (3.42), using (3.85) subject to (3.86). The ratio of this accuracy can be defined as:

\alpha_\tau = \frac{ared}{pdec} = \frac{ J(Z^N, \theta_\tau(k)) - J(Z^N, \theta_\tau(k) + \Delta\theta_\tau(k)) }{ J(Z^N, \theta_\tau(k)) - J_\tau(Z^N, \theta_\tau(k) + \Delta\theta_\tau(k)) }   (3.88)

Table 3.4: The modified Levenberg-Marquardt algorithm (MLMA) incorporating the trust region algorithm for computing and updating \hat{\theta}(k).

1) Specify \tau, \tau_{max}, D, \lambda_{max} \in [1, 10^3], s \in [0.1, 10^{-2}], m and n for \varphi(k,\theta(k)), \lambda_\tau \in [0.1, 10^{-3}], \delta_\tau \in [0.1, 10^{-4}].
2) Initialize the weights \theta(k) = \theta^{(0)}(k) and the time-varying filter C(k, z^{-1}) = C^{(0)}(k, z^{-1}) with appropriate dimensions.
3) While \tau = 1, Do.
4) Evaluate J(\theta(k)) using (3.15) for the a priori estimate.
5) For the NNARMAX model predictor, ensure that the roots of C(k, z^{-1}) are within the unit circle using the algorithm of Table 3.2 with the regression vector \varphi(k,\theta(k)) in (3.7).
6) Compute G[\theta_\tau(k)] from (3.43) using (3.31) and (3.34), while R[\theta_\tau(k)] is computed from (3.44).
7) Evaluate V[\theta_\tau(k)] in (3.80) using the algorithm of Table 3.3, and use the Cholesky factors from (3.82) to determine the search direction \Delta\theta_\tau(k) using (3.84).
8) Re-evaluate J(\theta(k)) using (3.15) for the a posteriori estimate.
9) Evaluate J_\tau(Z^N, \theta(k)) in (3.42) and (3.85) subject to (3.86).
10) Evaluate the ratio \alpha_\tau in (3.88).
11) Update \lambda_\tau according to the following conditions on \alpha_\tau:
If \alpha_\tau > 0.75, then \lambda_\tau \leftarrow 0.5 \lambda_\tau and Go To 12).
If \alpha_\tau < 0.25, then \lambda_\tau \leftarrow 2 \lambda_\tau and Go To 12).
12) If \| S_\tau (\theta(k) - \theta_\tau(k)) \| \le \delta_\tau, \lambda_\tau < \lambda_{max} and ared > 0:
Accept \Delta\theta_\tau(k) in (3.84), set \theta_\tau(k) \leftarrow \theta_\tau(k) + \Delta\theta_\tau(k) and Go To 13).
Else \tau \leftarrow \tau + 1, \lambda_\tau \leftarrow \lambda_{\tau+1}, \theta_\tau(k) \leftarrow \theta_{\tau+1}(k) and Go To 3).
13) Accept \hat{\theta}(k) \leftarrow \theta_\tau(k) in (3.87).
* This algorithm is implemented in Step 5) of the algorithm of Table 3.1.


Now, if the ratio is close to one, J_\tau(Z^N, \theta_\tau(k) + \Delta\theta_\tau(k)) may be a reasonable approximation to J(Z^N, \theta(k)), and \lambda_\tau should be reduced by some factor, thereby increasing the trust region indirectly. On the other hand, if \alpha_\tau is small or negative, then \lambda_\tau should be increased by some factor, thereby reducing the trust region indirectly.

The complete modified Levenberg-Marquardt trust region algorithm for updating \hat{\theta}(k) is summarized in Table 3.4. Note that after \hat{\theta}(k) is obtained, the algorithm of Table 3.1 is implemented until the conditions set out in Step 7) of the Table 3.1 algorithm are satisfied.
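The \lambda_\tau adjustment of Steps 10)-12) in Table 3.4, driven by the accuracy ratio \alpha_\tau of (3.88), can be sketched as follows; the thresholds 0.25/0.75 and the factors 2 and 0.5 are taken from the table, while the function name and interface are illustrative:

def update_lambda(J_old, J_new, J_quad_new, lam):
    ared = J_old - J_new        # actual reduction of the criterion (3.15)
    pdec = J_old - J_quad_new   # decrease predicted by the quadratic model (3.42)
    alpha = ared / pdec if pdec != 0.0 else 0.0
    if alpha > 0.75:
        lam *= 0.5              # good model: enlarge the trust region indirectly
    elif alpha < 0.25:
        lam *= 2.0              # poor model: shrink the trust region indirectly
    accept = ared > 0.0         # accept the step only if the criterion decreased
    return lam, accept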

3.3.7 Training Parameters and Criteria for Evaluating the Neural Network (NN) Model

In order to train the neural network (NN) models proposed in this work, the training parameters for the adaptive recursive least squares (ARLS) algorithm and the modified Levenberg-Marquardt algorithm (MLMA) must be specified. It is also necessary to specify the parameters of the NN structures of Fig. 3.3 and Fig. 3.4 in terms of the NN inputs, the number of hidden and output layer neurons, and the activation function of each layer. To fulfill these requirements, the correct number of regressors as well as the number of hidden neurons must first be selected, but unfortunately there are no specific rules for these selections ([Gupta et al., 2003], [Hagan et al., 1996], [Haykin, 1999]). In many applications ([Gomm et al., 1997], [Guarneri et al., 2008], [Vieira et al., 2005], [Zamarreño and Vega, 1999]), the number of neurons for the hidden layer is selected by a trial-and-error method, usually starting with a small initial network. Even the so-called optimal brain surgeon (OBS) [Haykin, 1999] and optimal brain damage (OBD) [Nørgaard et al., 2000] rules for automating the network architecture require an initial large network.

In this study, the number of past inputs (n_b), past outputs (n_a) and estimated prediction errors (n_c) that constitute the regression vector of length n_\varphi is selected initially by a trial-and-error method starting with a small initial value. Note that the regression vector is the input to the NN. For the two-layer NN considered in this work, \alpha_h, \alpha_o and the number of hidden layer neurons n_h are also selected using this same method, starting with small initial values, while the number of output neurons n_o corresponds exactly to the number of system outputs. The parameter vector \theta(k) is initialized to a small random matrix \theta(0) of appropriate dimension \nu. The initial \Gamma^{(0)}[k,\theta(k)] is initialized as a diagonal matrix of dimension n_o; the number of iterations j for the algorithm of Table 3.1 and the number of iterations \tau for implementing the ARLS and MLMA identification algorithms must also be specified. In addition, where the NNARMAX model predictor is used, the time-varying filter C^{(0)}(k, z^{-1}) is initialized as a diagonal matrix of dimension n_c consisting of small random numbers, and the number of iterations i for the algorithm of Table 3.2 must be specified.


The training parameters for the adaptive recursive least squares (ARLS) algorithm discussed in Section 3.3.5 that must be defined include \pi, \alpha, \beta and \delta', which are used to compute \gamma as well as e_{min} and e_{max} for choosing the initial diagonal of P(0). Also, the training parameters for the modified Levenberg-Marquardt algorithm (MLMA) that must be specified include \lambda, s and \delta.

The performance and evaluation of the trained network is subject to satisfying certain criteria during the network training process. The criteria considered in this work include the following (a sketch of the resulting stopping test is given after this list):

1) The Minimum Criterion, crit_{min}: This is the minimum value of the weighted mean square error (MSE) cost function given by (3.15), evaluated at each iteration until the network training is terminated. Equation (3.15) shall simply be referred to as the criterion, and the minimum value of the criterion shall be called the performance index. The performance index is the criterion which shall be used to judge the convergence rate of the training algorithms at each iteration. The training algorithms shall be terminated if the minimum value of the criterion is less than a specified value crit_{min}.

2) The Change in Criterion, \Delta crit: This is the change in the criterion evaluated between two successive iterations. If the change in the criterion between two successive iterations is less than \Delta crit, then the training algorithms will be terminated.

3) The Maximum Gradient, grad_{max}: This relates to the value of the largest element in the gradient term evaluated at any given iteration. If the value of the largest element in the gradient terms given by (3.50) and (3.74) for the ARLS and MLMA algorithms respectively is less than grad_{max}, then the network training will be terminated.

4) The Change in Weight, \Delta\theta_{max}: This is the largest value of the change in the parameter vector \theta_\tau(k) between two successive iterations. If the largest parameter change is below \Delta\theta_{max}, then the training algorithms will be terminated.
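A minimal sketch combining the four stopping tests follows; the threshold names and their default values are illustrative assumptions, not values prescribed by this work:

import numpy as np

def should_stop(J, J_prev, G, theta, theta_prev,
                crit_min=1e-6, d_crit=1e-9, grad_max=1e-5, d_theta_max=1e-8):
    return (J < crit_min                                         # 1) minimum criterion
            or abs(J_prev - J) < d_crit                          # 2) change in criterion
            or np.max(np.abs(G)) < grad_max                      # 3) maximum gradient
            or np.max(np.abs(theta - theta_prev)) < d_theta_max) # 4) change in weights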

3.3.8 Scaling the Training Data and Rescaling the Trained Network

Due to the fact that the inputs and outputs of a process may, in general, have different physical units and magnitudes, the scaling of all signals to the same variance is necessary to prevent the signals of largest magnitude from dominating the identified model. Moreover, scaling improves the numerical robustness of the training algorithm, leads to faster convergence and gives better models. The training data are scaled to unit variance using their mean values and standard deviations according to the following equations:

U^{(s)}(k) = \frac{U(k) - \bar{U}(k)}{\sigma_U(k)}, \qquad Y^{(s)}(k) = \frac{Y(k) - \bar{Y}(k)}{\sigma_Y(k)}   (3.89)

where \bar{U}(k), \bar{Y}(k) and \sigma_U(k), \sigma_Y(k) are the means and standard deviations of the input and output training data pair, and U^{(s)}(k) and Y^{(s)}(k) are the scaled inputs and outputs respectively. Also, after the network training, the joint weights are rescaled according to the expression

\hat{Y}(k, \hat{\theta}(k)) = \hat{Y}^{(s)}(k, \hat{\theta}(k))\, \sigma_Y(k) + \bar{Y}(k)   (3.90)

so that the trained network can work with other unscaled validation data and test data not used for training. However, for notational convenience, U(k) = U^{(s)}(k) and Y(k) = Y^{(s)}(k) shall be used.

3.4 Neural Network-Based Validation Algorithms

A trained network generalizes well if it is able to predict correctly both data that were used for training and data that were not used during training. This process is called network validation. During the validation process, tests are performed to assess to what extent the developed model represents the operational dynamics of the underlying system. The NN models identified by the proposed ARLS and MLMA algorithms are validated in this work using three validation methods, namely: 1) the one-step ahead output predictions of the scaled training and unscaled validation data, as well as unscaled test data from the operation of the real system under consideration, where such test data are available; 2) the K-step (that is, multi-step) ahead output predictions of the unscaled training data; and 3) the Akaike final prediction error (AFPE) estimate of the average generalization error based on the weighted (or regularized) criterion.

3.4.1 One-Step Ahead Prediction Validation

The one-step ahead prediction of the network performance is the most widely used validation method, mostly because the predictors designed in most cases are one-step ahead model predictors. With this method, the unscaled training data and validation data, as well as any available test data, can be used to simulate the trained network for network output prediction comparison and the assessment of the corresponding prediction errors. While the one-step ahead validation is simple and gives intuitive insight into the performance of the trained network, it might not reveal the suitability or accuracy of the trained network for more distant predictions, such as the multi-step ahead predictions which are widely used in controller design. Thus, the K-step ahead validation might be useful.

3.4.2 K-Step Ahead Output Prediction Simulation

The K-step ahead output prediction simulation is a validation technique which is particularly useful as far as model validation for predictive control strategies is concerned, and in situations where the sampling rate of the plant is high compared to the dynamics and time constants of the controlled variables. In the K-step ahead prediction simulations, the output predictions of the trained network are compared with the unscaled training data, where the output of the one-step ahead NN predictor is calculated K steps ahead.

The inspection of one-step ahead output predictions might not reveal the model inaccuracy for multi-step or distant output predictions. The K-step ahead predictor follows from the one-step ahead NN model predictor given in (3.5), except that \varphi(k,\theta(k)) = \hat{\varphi}(k+K, \hat{\theta}(k)), so that the K-step ahead predictor takes the following form:

\hat{Y}(k+K \mid k, \hat{\theta}(k)) = \hat{J}\big( Z^N, \hat{\varphi}(k+K, \hat{\theta}(k)), \hat{\theta}(k) \big)   (3.91)

where \hat{J} is now known in terms of \hat{\theta}(k), and \hat{\varphi}(k+K, \hat{\theta}(k)) is the K-step ahead predictor regression vector

\hat{\varphi}(k+K, \hat{\theta}(k)) = \big[ U(k+K-1), \ldots, U(k+K-n_b), Y(k), \ldots, Y(k - \max(n_a - K, 0)), \hat{Y}(k+K-1 \mid k, \hat{\theta}(k)), \ldots, \hat{Y}(k+K-\min(K, n_a) \mid k, \hat{\theta}(k)) \big]^T   (3.92)

The mean value of the K-step ahead prediction error (MVPE) between the outputs predicted by the trained network and the outputs of the actual system, in terms of the unscaled training data set, is computed as follows:

MVPE = mean\Big( \sum_{k=K}^{N} \frac{ Y(k) - \hat{Y}(k+K \mid k, \hat{\theta}(k)) }{ Y(k) } \Big) \times 100\%   (3.93)

where Y(k) corresponds to the actual outputs and \hat{Y}(k+K \mid k, \hat{\theta}(k)) is the K-step ahead output prediction.
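A sketch of K-step ahead prediction with a trained one-step NNARX predictor follows, where past measured outputs in the regressor are progressively replaced by the model's own predictions as in (3.92); the `predictor` callable and the data arrays are illustrative assumptions, and the input delay d is absorbed into n_b for brevity:

import numpy as np

def k_step_predict(predictor, u, y, k, K, na, nb):
    """Roll the one-step predictor K steps forward from time k."""
    y_sim = y.astype(float).copy()
    for j in range(1, K + 1):
        t = k + j
        phi = np.concatenate([y_sim[t - na:t][::-1],   # Y-part: predictions replace data
                              u[t - nb:t][::-1]])      # U-part: known inputs
        y_sim[t] = predictor(phi)
    return y_sim[k + K]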

3.4.3 Akaike’s Final Prediction Error Estimate

The Akaike final prediction error (AFPE) estimate is used to verify the accuracy of the trained network by computing an estimate of the average generalization error. The AFPE estimate is useful for selecting a suitable model based on the weighted criterion by trying different values of the weight decay parameter D in (3.15). A smaller value of the AFPE estimate indicates that the trained network models and captures all the dynamics of the underlying system; it also measures how well the identified model will perform when presented with actual data taken during normal system operation. The AFPE estimate algorithm used here is adopted from [Ljung, 1999], [Sjöberg and Ljung, 1995], and [Nørgaard et al., 2000].


The average generalization error \hat{F}(Z^N, \hat{\theta}(k)), as a function of the model quality, is considered here to be the expectation E\{J(\hat{\theta}, Z^N)\} [Ljung, 1999] of the one-step ahead NN predictor error \varepsilon(k,\hat{\theta}) with respect to the regression vector \varphi(k) and the noise contribution v(k). For the regularized criterion considered in this work, with multiple weight decay parameter D in (3.15), the AFPE estimate of \hat{F}(Z^N, \hat{\theta}(k)) has been derived in [Ljung, 1999] and [Sjöberg and Ljung, 1995] as a function of the noise variance \hat{\sigma}_e^2, and is given here as:

\hat{F}(Z^N, \hat{\theta}(k)) = \frac{\hat{\sigma}_e^2}{2} \Big( 1 + \frac{p_a}{N} \Big) + \gamma   (3.94)

where \hat{\sigma}_e^2 is the noise variance given as

\hat{\sigma}_e^2 = \frac{2N}{N + p_a - p_b} \big( J(Z^N, \hat{\theta}(k)) - \gamma \big)

p_a = tr\Big( R[\hat{\theta}(k)]\, \big( R[\hat{\theta}(k)] + D \big)^{-1} R[\hat{\theta}(k)]\, \big( R[\hat{\theta}(k)] + D \big)^{-1} \Big)

p_b = tr\Big( R[\hat{\theta}(k)]\, \big( R[\hat{\theta}(k)] + \tfrac{1}{N} D \big)^{-1} \Big)

and \gamma is a positive quantity that improves the accuracy of the estimate, computed by the relationship:

\gamma = \frac{1}{N}\, \hat{\theta}^T(k)\, D\, \big( R[\hat{\theta}(k)] + \tfrac{1}{N} D \big)^{-1} R[\hat{\theta}(k)]\, \big( R[\hat{\theta}(k)] + \tfrac{1}{N} D \big)^{-1} D\, \hat{\theta}(k)

where tr(\cdot) is the trace of its argument, computed as the sum of the diagonal elements of its argument.

In (3.94), the \varepsilon(k,\hat{\theta}(k)) portion of (3.15) is evaluated using the identified NN model \hat{\theta}(k) and taking the expectation E\{J(Z^N, \hat{\theta}(k))\} with respect to \varphi(k,\theta(k)) and d(k). The expectation E\{\cdot\} is the mean or average value of its argument. The above AFPE estimate algorithm is evaluated on unscaled validation data as well as on unscaled test data, where available.
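A numerical sketch of the AFPE estimate (3.94) follows, assuming a diagonal weight decay D = d I and the p_a, p_b and \gamma expressions above; the function interface is an illustrative assumption:

import numpy as np

def afpe(J, R, theta, d, N):
    """J: weighted criterion (3.15) at theta_hat; R: Gauss-Newton Hessian
    R[theta_hat(k)]; theta: trained weights; d: weight decay; N: data length."""
    p = len(theta)
    D = d * np.eye(p)
    RD = np.linalg.inv(R + D)         # (R + D)^{-1}, used in p_a
    RDN = np.linalg.inv(R + D / N)    # (R + D/N)^{-1}, used in p_b and gamma
    p_a = np.trace(R @ RD @ R @ RD)
    p_b = np.trace(R @ RDN)
    gamma = (theta @ (D @ RDN @ R @ RDN @ D) @ theta) / N
    sigma2 = 2.0 * N * (J - gamma) / (N + p_a - p_b)   # noise variance estimate
    return 0.5 * sigma2 * (1.0 + p_a / N) + gamma      # Eq. (3.94)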


CHAPTER 4

NEURAL NETWORK-BASED ADAPTIVE MODEL PREDICTIVE

CONTROL ALGORITHMS

4.1 Introduction

This chapter presents the two new NN-based adaptive MPC algorithms developed in this work, both consisting of an online process identification part and a predictive control part. Both parts are executed at each sampling instant. The predictive control part of the first algorithm is a nonlinear adaptive model predictive control (NAMPC) strategy, and the control part of the second algorithm is an adaptive generalized predictive control (AGPC) strategy. In the identification parts of both algorithms, the process model is approximated by a neural network which is trained either by the modified Levenberg-Marquardt algorithm (MLMA) or by the adaptive recursive least squares (ARLS) algorithm presented in Chapter 3.

In the proposed NN-based identification and control scheme, any deviations in system outputs at each

sampling instant due to disturbances ( )d k or change in the operating conditions are considered to correspond to

equivalent model changes. As discussed in Section 2.11.1.2 of Chapter 2, both AGPC and NAMPC are indirect

adaptive control strategies where the NN is used as a nonlinear approximator (or identifier in adaptive control

literature [Omidvar and Elliot, 1997], [Spooner et al., 2002]) of the system at each sampling instant k . The

identified NN model is then employed in the AGPC and NAMPC design at the same sampling instant.

The proposed NN-based AGPC algorithm is based on the instantaneous linearization of an identified

nonlinear NN which models the process at each sampling instant, the subsequent solution of a set of recursive

Diophantine equations and the application of a quadratic programming method to obtain the control signal under

tight input and output constraints. On the other hand, the proposed NAMPC is a constrained online optimization

control strategy based on the full Newton method ([Dennis and Schnabel, 1996], [Kelley, 1995], [Kelley, 1999],

[Omidvar and Elliot, 1997], [Scales, 1985]). It is well known that the full Newton Hessian matrix is not guaranteed to be positive definite in an open neighbourhood of a global minimum during its inversion. Thus, the NAMPC incorporates a new iterative strategy for guaranteed positive definiteness of the full Newton Hessian matrix, making the online optimization computations always feasible. The optimization algorithm used by the

NAMPC is based on the Levenberg-Marquardt method which is a second-order trust region optimization method

known for robustness and fast convergence ([Fletcher, 1987], [Hagan and Menhaj, 1994], [Marquardt, 1963],

[Wu, 2008]). A difficulty with the Levenberg-Marquardt method is the selection of the adaptation parameter for

updating the minimization process ([Chiong, 2010], [Dennis and Schnabel, 1996], [Kelley, 1999], [Scales, 1985]).

In this work, a modification is made to the Levenberg-Marquardt method by incorporating into the NAMPC a


new strategy for: 1) iteratively selecting the adaptation parameter and 2) iteratively updating the control sequence

on the basis of the ratio between an actual and predicted decrease in the objective function being minimized.

The remaining sections of the chapter are organized as follows. The objective function to be minimized, introduced in Section 2.2, is restated in Section 4.2. Techniques for the instantaneous linearization of the NNARX and NNARMAX model predictors are presented in Sections 4.3.1 and 4.3.2 respectively. Section 4.3.3 explains the formulation and development of the AGPC algorithm based on the linearized NNARX model, while the NAMPC algorithm is presented in Section 4.4. Finally, Section 4.5 highlights intuitive tuning methods for the two proposed controllers and proposes some criteria which can enhance efficient controller performance.

4.2 The Objective Function

The two developed MPC algorithms follow from the discussions presented in Section 2.2 of Chapter 2. As defined in Section 2.2, the desired reference signal, calculated and filtered using a first-order low-pass digital filter, is given by:

R(k) = \frac{B_m}{A_m} R'(k)   (4.1)

where R'(k) and R(k) are the desired and filtered reference signals respectively, and A_m and B_m are the denominator and numerator polynomials of the filter. The MPC design is based partly on the filter tracking error capability, where A_m and B_m serve as tuning parameters used to improve the robustness and internal stability respectively of the MPC controller.

Furthermore, following the discussions in Section 2.2, the predicted values used to calculate the control moves are obtained by minimizing an objective function of the form given by (2.2), re-expressed here again:

J(U(k)) = \big[ R(k) - \hat{Y}(k) \big]^T \kappa \big[ R(k) - \hat{Y}(k) \big] + \rho\, U^T(k)\, U(k)   (4.2)

subject to the constraints

\Delta U(k + \eta) = 0, \quad N_u \le \eta \le N_p - N_d   (4.3)

\Delta U_{min} \le \Delta U(k) \le \Delta U_{max}, \quad Y_{min} \le Y(k) \le Y_{max}   (4.4)

where

R(k) = [R(k+N_d) \; \cdots \; R(k+N_p)]^T,
\hat{Y}(k) = [\hat{Y}(k+N_d \mid k) \; \cdots \; \hat{Y}(k+N_p \mid k)]^T,
E(k) = R(k) - \hat{Y}(k) = [E(k+N_d \mid k) \; \cdots \; E(k+N_p \mid k)]^T,
\Delta U(k) = [\Delta U(k) \; \cdots \; \Delta U(k+N_u-N_d)]^T


where \Delta U is the change in the control signal, and \kappa and \rho are two weighting matrices penalizing changes in \hat{Y}(k) and U(k) in (4.2). Note that, although a sequence of N_u moves is calculated at each sampling instant, only the first control move U(k) = U(k \mid k) is actually implemented and applied to control the process. The remaining control signals are not applied, because at the next sampling instant k = k+1 a new output Y(k+1) is known from new measurements. The MPC strategy then enters a new optimization loop, while the remaining control signals U(N_u - 1 \mid k) are used to initialize the optimizer.
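This receding-horizon logic can be sketched as follows, where `plant`, `identify` and `solve_qp` are illustrative placeholders for the controlled process, the online identification step (ARLS or MLMA) and the constrained minimization of (4.2) subject to (4.3)-(4.4):

import numpy as np

def mpc_loop(plant, identify, solve_qp, u_init, steps):
    U = np.array(u_init, dtype=float)   # current guess of the control sequence
    outputs = []
    for k in range(steps):
        theta = identify(k)             # update the NN model at this sampling instant
        U = solve_qp(theta, U)          # optimize the N_u future moves
        outputs.append(plant(U[0]))     # apply only the first move U(k|k)
        U = np.append(U[1:], U[-1])     # shifted remainder warm-starts step k+1
    return outputs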

4.3 Adaptive Generalized Predictive Control (AGPC) Algorithm

The generalized predictive control (GPC) algorithm is a class of MPC originally proposed by Clarke et al. [Clarke et al., 1987a and 1987b] in the context of adaptive control, and it has been widely used for industrial system control based on linear models ([Jin and Su, 2008], [Lu and Tsai, 2008], [Maciejowski, 2002], [Mjalli, 2006], [Normey-Rico and Camacho, 2007], [Salahshoor et al., 2010], [Seborg et al., 2004], [Su and Wu, 2009], [Suárez et al., 2010], [Wang, 2009], [Yu et al., 2006]). The theoretical background of the proposed AGPC follows from the work of [Clarke et al., 1987a and 1987b], with the incorporation of a neural network identification scheme. The proposed NN-based AGPC scheme is shown in Fig. 4.1 and follows from the MPC discussions of Chapter 2, except that it uses a linearized NN model based on the instantaneous linearization of the nonlinear NN model. Here we assume that the nonlinear NN model given by (3.23) in Chapter 3 represents the underlying nonlinear system of (3.5) obtained at each sampling instant, so that the system (3.1) can be expressed in terms of the model predictor (3.5) as:

\hat{Y}(k, \hat{\theta}(k)) = \hat{J}\big( \hat{\varphi}(k, \hat{\theta}(k)), \hat{\theta}(k) \big)   (4.5)

[Fig. 4.1: The proposed NN-based AGPC scheme. Block diagram: the desired reference R'(k) passes through a first-order low-pass filter to give R(k); a neural network model (NNARX or NNARMAX) identifies the system; linear model parameters are extracted and fed to an \eta-step ahead output predictor; the optimizer, subject to the constraints, computes U(k), which drives the system with output Y(k) under the disturbance d(k) and prediction error E(k).]


4.3.1 Instantaneous Linearization of a Deterministic Nonlinear Neural Network ARX Model

Suppose that the input-output model of the system to be controlled is available and can be defined as the prediction model of (4.5). The regression vector for the NNARX model can be expressed from (3.6) as

\varphi(k) = \big[ Y(k-1), \ldots, Y(k-n_a), U(k-1), \ldots, U(k-d-n_b) \big]^T   (4.6)

Here, the idea of the instantaneous linearization is to extract a linear model from the nonlinear neural network model at each sample time. The extracted linear model is then used for the controller design.

By interpreting the regression vector \varphi(k) of (4.6) as the vector defining the state of the system, at time k = k' the nonlinear system of (4.5) is linearized around the current state \varphi(k') to obtain the approximate linearized model, expressible as

Y(k) = -A_1 Y(k-1) - \cdots - A_{n_a} Y(k-n_a) + B_0 U(k-d) + \cdots + B_{n_b} U(k-d-n_b)   (4.7)

where A_{n_a} and B_{n_b} are the linearized model parameters obtained from (4.5) by taking the partial derivatives of the nonlinear model with respect to the output and input parts of the regression vector respectively, as follows. Let A_\tau and B_\tau of orders n_a and n_b respectively be the two parameters associated with A_{n_a} and B_{n_b} in (4.5), defined as:

A_\tau = -\frac{\partial \hat{J}(\varphi(k))}{\partial Y(k-\tau)} \Big|_{\varphi(k) = \varphi(k')}, \quad \tau = 0, 1, \ldots, n_a

B_\tau = \frac{\partial \hat{J}(\varphi(k))}{\partial U(k-d-\tau)} \Big|_{\varphi(k) = \varphi(k')}, \quad \tau = 0, 1, \ldots, n_b   (4.8)

where Y(k-n_a) and U(k-n_b) in (4.7) are taken as deviations from the current operating point, defined respectively by

Y(k-\tau) := Y(k-\tau) - Y(k'-\tau), \quad \tau = 0, 1, \ldots, n_a
U(k-\tau) := U(k-\tau) - U(k'-\tau), \quad \tau = 0, 1, \ldots, n_b   (4.9)

Note that the coefficients of A_\tau and B_\tau correspond to those of A(z^{-1}) and B(z^{-1}) defined in (2.73) and rewritten here as

A(z^{-1}) = I + A_1 z^{-1} + \cdots + A_{n_a} z^{-n_a}
B(z^{-1}) = B_0 + B_1 z^{-1} + \cdots + B_{n_b} z^{-n_b}   (4.10)

By separating out the \varphi_{n_a}(k') and \varphi_{n_b}(k') components of the current regression vector \varphi(k') at time k', which constitute the bias term \hat{d}(k') around the current operating point, the linearized model from (4.5) can then be expressed equivalently as:

Y(k) = \big[ 1 - A(z^{-1}) \big] Y(k) + z^{-d} B(z^{-1}) U(k) + e(k) + \hat{d}(k')   (4.11)


where e(k) is a deterministic white noise and the bias term \hat{d}(k') is given by

\hat{d}(k') = Y(k') + A_1 Y(k'-1) + \cdots + A_{n_a} Y(k'-n_a) - B_0 U(k'-d) - \cdots - B_{n_b} U(k'-d-n_b)   (4.12)

The linearized model parameters are extracted from the NN model at each sampling instant as the derivative of the network outputs with respect to the inputs \varphi_\tau(k) = [\varphi_{n_a}(k) \; \varphi_{n_b}(k)]^T. The derivative of the NN outputs with respect to the inputs \varphi_\tau(k) is given from (3.15), with \theta(k) = \hat{\theta}(k), by

\frac{\partial \hat{Y}(k)}{\partial \varphi(k)} = \sum_{j=1}^{n_h} W_j\, w_{j,l} \Big[ 1 - f^2\Big( \sum_{l=1}^{n_a + n_b} w_{j,l}\, \varphi_l(k) + w_{j,0} \Big) \Big]   (4.13)

where f(\cdot) is the hyperbolic tangent sigmoid function defined in (3.16).
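A sketch of this extraction, combining (4.8) with the derivative (4.13) for a single-output tanh network, is shown below: the gradient of the NN output with respect to the regressor is split into the -A_i (output) and B_j (input) coefficients. The names follow the earlier forward-pass sketch and are illustrative:

import numpy as np

def linearize(phi, w, w0, W, na, nb):
    """Return A = [A_1..A_na] and B = [B_0..B_{nb-1}] at the operating point phi(k')."""
    a = np.tanh(w @ phi + w0)                 # hidden activations a_j
    dy_dphi = ((1.0 - a**2) * W.ravel()) @ w  # Eq. (4.13) for one output
    A = -dy_dphi[:na]                         # dY/dY(k-tau) with the sign of (4.8)
    B = dy_dphi[na:na + nb]                   # dY/dU(k-d-tau)
    return A, B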

Here, the approximate linearized model given by (4.11) can be interpreted as a linear model affected by an integrated white noise and a constant disturbance \hat{d}(k') which depends on the current operating point. Thus, Equation (4.11) can be expressed by the following integrated autoregressive with exogenous input (ARIX) model:

A(z^{-1}) Y(k) = z^{-d} B(z^{-1}) U(k) + \frac{e(k)}{\Delta}   (4.14)

where \Delta = 1 - z^{-1} is an integrator, included to compensate for the linearization errors as well as for the deterministic noise and the constant disturbance \hat{d}(k'). Note that in this case the constant disturbance term is evaluated depending on the current operating point. Note also that (4.14) corresponds to a more general form where C(z^{-1}) = I + C_1 z^{-1} + \cdots + C_{n_c} z^{-n_c}, defined and discussed in Section 2.5.1, can be truncated and absorbed into the A(z^{-1}) and B(z^{-1}) polynomial matrices. A detailed treatment of this technique can be found in [Camacho and Bordons, 2007], [Clarke et al., 1987b], [Goodwin and Sin, 1984], [Ljung, 1999], [Maciejowski, 2002], [Normey-Rico and Camacho, 2007], [Rossiter, 2004] and [Wang, 2009]. Note that by removing the integrator term \Delta, Equation (4.14) corresponds exactly to an ARX model with e(k) assumed to be white noise [Ljung, 1999].

4.3.2 Instantaneous Linearization of a Stochastic Nonlinear Neural Network ARMAX Model

Unlike the instantaneous linearization of the NNARX model affected by deterministic noise, the instantaneous linearization is here applied to systems affected by stochastic noise. Consider that (4.5) corresponds to the one-step ahead predictor of a nonlinear neural network autoregressive moving average with exogenous input (NNARMAX) model of a nonlinear system. The regression vector for the NNARMAX model can be expressed from (3.7) as:

\varphi(k, \theta(k)) = \big[ Y(k-1), \ldots, Y(k-n_a), U(k-1), \ldots, U(k-d-n_b), \varepsilon(k-1, \theta(k)), \ldots, \varepsilon(k-n_c, \theta(k)) \big]^T   (4.15)

Next, by interpreting the regression vector \varphi(k,\theta(k)) of the NNARMAX model predictor of (4.5) as a vector defining the state of the nonlinear system, at time k = k' the nonlinear system of (4.5) is linearized around the current state \varphi(k', \theta(k')) to obtain the approximate linearized model predictor, expressible as:

Y(k) = -A_1 Y(k-1) - \cdots - A_{n_a} Y(k-n_a) + B_0 U(k-d) + \cdots + B_{n_b} U(k-d-n_b) + C_1 \varepsilon(k-1, \theta(k)) + \cdots + C_{n_c} \varepsilon(k-n_c, \theta(k))   (4.16)

where A_{n_a}, B_{n_b} and C_{n_c} are the linearized model parameters obtained from (4.16) by taking the partial derivatives of the nonlinear model with respect to the output, input and noise parts of the regression vector respectively, as follows. Let A_\tau, B_\tau and C_\tau of orders n_a, n_b and n_c respectively denote the three parameters associated with A_{n_a}, B_{n_b} and C_{n_c} in (4.5), defined as:

A_\tau = -\frac{\partial \hat{J}(\varphi(k,\theta(k)))}{\partial Y(k-\tau)} \Big|_{\varphi(k) = \varphi(k')}, \quad \tau = 0, 1, \ldots, n_a

B_\tau = \frac{\partial \hat{J}(\varphi(k,\theta(k)))}{\partial U(k-d-\tau)} \Big|_{\varphi(k) = \varphi(k')}, \quad \tau = 0, 1, \ldots, n_b

C_\tau = \frac{\partial \hat{J}(\varphi(k,\theta(k)))}{\partial \varepsilon(k-\tau)} \Big|_{\varphi(k) = \varphi(k')}, \quad \tau = 0, 1, \ldots, n_c   (4.17)

where Y(k-n_a), U(k-n_b) and \varepsilon(k-n_c, \theta(k)) in (4.16) are taken as deviations from the current operating point, defined respectively by

Y(k-\tau) := Y(k-\tau) - Y(k'-\tau), \quad \tau = 0, 1, \ldots, n_a
U(k-\tau) := U(k-\tau) - U(k'-\tau), \quad \tau = 0, 1, \ldots, n_b
\varepsilon(k-\tau, \theta(k)) := \varepsilon(k-\tau, \theta(k)) - \varepsilon(k'-\tau, \theta(k')), \quad \tau = 0, 1, \ldots, n_c   (4.18)

Note that the coefficients of A_\tau, B_\tau and C_\tau correspond to those of A(z^{-1}), B(z^{-1}) and C(z^{-1}) defined in (2.73) and rewritten here as

A(z^{-1}) = I + A_1 z^{-1} + \cdots + A_{n_a} z^{-n_a}
B(z^{-1}) = B_0 + B_1 z^{-1} + \cdots + B_{n_b} z^{-n_b}
C(z^{-1}) = I + C_1 z^{-1} + \cdots + C_{n_c} z^{-n_c}   (4.19)

By separating out the \varphi_{n_a}(k', \theta(k')), \varphi_{n_b}(k', \theta(k')) and \varphi_{n_c}(k', \theta(k')) components of the current regression vector \varphi(k', \theta(k')) at time k = k', which constitute the bias term \hat{d}(k') around the current operating point, the approximate linearized model from (4.5) can then be expressed equivalently as

Y(k) = \big[ 1 - A(z^{-1}) \big] Y(k) + z^{-d} B(z^{-1}) U(k) + \big[ C(z^{-1}) - 1 \big] \varepsilon(k, \theta(k)) + \hat{d}(k')   (4.20)


where the third term on the right side of (4.20) constitutes the stochastic noise term affecting the system, and the last term \hat{d}(k') is the bias term given as

\hat{d}(k') = Y(k') + A_1 Y(k'-1) + \cdots + A_{n_a} Y(k'-n_a) - B_0 U(k'-d) - \cdots - B_{n_b} U(k'-d-n_b) - C_1 \varepsilon(k'-1) - \cdots - C_{n_c} \varepsilon(k'-n_c)   (4.21)

The linearized model parameters are extracted from the linearized neural network model at each sampling instant as the derivatives of the network outputs with respect to the inputs \varphi_\tau(k,\theta(k)) = [\varphi_{n_a}(k,\theta(k)) \; \varphi_{n_b}(k,\theta(k)) \; \varphi_{n_c}(k,\theta(k))]^T. The derivative of the neural network outputs with respect to the inputs \varphi_\tau(k,\theta(k)) is given from (3.15), with \theta(k) = \hat{\theta}(k), by

\frac{\partial \hat{Y}(k \mid \theta(k))}{\partial \varphi(k,\theta(k))} = \sum_{j=1}^{n_h} W_j\, w_{j,l} \Big[ 1 - f^2\Big( \sum_{l=1}^{n_a + n_b + n_c} w_{j,l}\, \varphi_l(k,\theta(k)) + w_{j,0} \Big) \Big]   (4.22)

where f(\cdot) is the hyperbolic tangent sigmoid function defined in (3.16).

Here, the approximate linearized model given by (4.20) can be interpreted as a linear model affected by an integrated white noise and a time-varying disturbance \hat{d}(k') which depends on the current operating point. Thus, Equation (4.20) can be expressed by the following controlled autoregressive integrated moving average (CARIMA) model given in [Clarke et al., 1987a], which is widely used in most GPC formulations:

A(z^{-1}) Y(k) = z^{-d} B(z^{-1}) U(k) + C(z^{-1}) \frac{e(k)}{\Delta}   (4.23)

where \Delta = 1 - z^{-1} is an integrator, included to compensate for the linearization errors as well as the integrated moving average disturbance \hat{d}(k'), and e(k) is a white noise sequence independent of past control inputs. In this case as well, the disturbance term is evaluated depending on the current operating point or region. Note again that by removing the integrator term \Delta, Equation (4.23) corresponds to an ARMAX model.

4.3.3 The AGPC Algorithm

The AGPC Predictor

Consider the linearized ARIX model given by (4.14) in Section 4.3.1. Note that the formulations presented below also apply to the linearized CARIMA model given by (4.23). There are several methods of computing the output predictions using (4.14) to determine the sequence of future control signals, such as those discussed in [Albertos and Ortega, 1989], [Clarke et al., 1987a and 1987b], [Camacho and Bordons, 2007], and [Rossiter, 2004].

Although it is not an objective of this study to compare and contrast the different methods of solving (4.14) or (4.23), it is worth making a few remarks about existing methods. A popular method of solution is the recursion of the Diophantine equation [Clarke et al., 1987a and 1987b]. Rossiter [Rossiter, 2004] argued that the method based on the recursion of the Diophantine equation: 1) tends to obscure how the predictions are obtained, 2) appears confusing and can pose difficulty for those not familiar with GPC, and 3) has an unclear historical motivation; he then introduced a method based on matrix manipulations which appears more difficult from a computational efficiency viewpoint. Albertos and Ortega [Albertos and Ortega, 1989] proposed two methods based on: 1) a multirate state-space formulation of the process model, and 2) iteration of the process difference equations, where both methods involve matrix inversion without provisions for ill-conditioning, robustness and stability of the system. The method used in [Camacho and Bordons, 2007] follows closely from that in [Clarke et al., 1987a and 1987b]. While all these methods seek the same result, a simplified approach based on the Diophantine equation, followed by its recursion, is presented below.

To obtain the $\eta$-step ahead predictor for the AGPC criterion given in (4.2), consider the output predictions from the ARIX model of (4.14) at time $k+\eta$, derived from the linearized deterministic input-output NNARX model, given as

$A(z^{-1})\,\Delta Y(k+\eta) = z^{-d}B(z^{-1})\,\Delta U(k+\eta) + e(k+\eta) \qquad (4.24)$

Note that a similar result to (4.24) for the CARIMA model of (4.23) can be obtained, where the second term in (4.23) is incorporated into the noise term $e(k+\eta)$ in (4.24) (see for example [Clarke et al., 1987a and 1987b], [Camacho and Bordons, 2007], [Maciejowski, 2002], and [Rossiter, 2004]). Thus, in order to solve (4.24), the following Diophantine equation is introduced

$1 = A(z^{-1})\,\Delta\, E_\eta(z^{-1}) + z^{-\eta}F_\eta(z^{-1}) \qquad (4.25)$

where $E_\eta(z^{-1})$ and $F_\eta(z^{-1})$ are polynomial matrices of degree $\eta - N_d$ and $n_a$ respectively, given by

$E_\eta(z^{-1}) = E_{\eta,0} + E_{\eta,1}z^{-1} + \cdots + E_{\eta,\eta-1}z^{-(\eta-1)}$

$F_\eta(z^{-1}) = F_{\eta,0} + F_{\eta,1}z^{-1} + \cdots + F_{\eta,n_a}z^{-n_a}$

By multiplying both sides of (4.24) by $E_\eta(z^{-1})$ and using (4.25) one obtains

$Y(k+\eta) = z^{-d}E_\eta(z^{-1})B(z^{-1})\,\Delta U(k+\eta) + F_\eta(z^{-1})\,Y(k) + E_\eta(z^{-1})\,e(k+\eta) \qquad (4.26)$

Since the degree of $E_\eta(z^{-1})$ is $\eta - N_d$, it follows from (4.26) that the noise terms are all in the future. Note that the only terms in (4.26) still to be determined are the sequence of future control increments $\Delta U(k+\eta)$. Thus, the minimum variance predictor of (4.26) can be expressed as the expectation conditioned on the information available up to time $k$, with $E\{e(k)\} = 0$, given as:

$\hat{Y}(k+\eta\,|\,k) = G_\eta(z^{-1})\,\Delta U(k+\eta-d) + F_\eta(z^{-1})\,Y(k) \qquad (4.27)$

where $G_\eta(z^{-1}) = E_\eta(z^{-1})B(z^{-1})$ is a polynomial matrix of degree $n_b + \eta - N_d$. Suppose that both sides of (4.25) are multiplied by $B(z^{-1})/[\Delta A(z^{-1})]$ to obtain


$\dfrac{B(z^{-1})}{\Delta A(z^{-1})} = B(z^{-1})E_\eta(z^{-1}) + z^{-\eta}\,\dfrac{B(z^{-1})F_\eta(z^{-1})}{\Delta A(z^{-1})} \qquad (4.28)$

Comparing (4.27) and (4.28), it is obvious that the first $\eta$ terms in $G_\eta(z^{-1})$ are the first $\eta$ coefficients of the step response of $B(z^{-1})/[\Delta A(z^{-1})]$. This implies that the first $\eta - N_d$ terms in $G_\eta(z^{-1})$ must equal the first $\eta - N_d$ terms in $G_{\eta+1}(z^{-1})$.

Since the control inputs are the only known quantities in (4.27), Equation (4.27) can be expressed in an equivalent form that shows the contributions of all the responses:

$\hat{Y}(k+\eta\,|\,k) = G^0(z^{-1})\,\Delta U(k+\eta-d)$
$\qquad\qquad +\; z^{(\eta-N_d)}\big[G_\eta(z^{-1}) - G^0(z^{-1})\big]\,\Delta U(k)$
$\qquad\qquad +\; F_\eta(z^{-1})\,Y(k), \qquad N_d \le \eta \le N_p \qquad (4.29)$

where $G^0(z^{-1}) = G_0 + G_1 z^{-1} + \cdots + G_{\eta-N_d}z^{-(\eta-N_d)}$. Note that the first row in (4.29) depends only on the future values of the control signal and can be interpreted as the forced response, that is, the response obtained when the initial conditions are zero. The second row depends on the future deterministic disturbances. The third row depends on the past values of the process outputs, measured disturbances and input variables, and corresponds to the free response of the process when the control signals and measured disturbances are kept constant.

In order to obtain the AGPC control law, it is first necessary to solve the Diophantine equation given in (4.25) to obtain the value of $F_\eta(z^{-1})$ associated with (4.29). The solution method is usually called the recursion of the Diophantine equation, proposed and discussed in [Clarke et al., 1987a and 1987b].

Recursion of the Diophantine Equation

As pointed out in Section 2.5.1, $A(z^{-1})$ is monic. Thus, the solution to (4.25) begins with an initial solution by setting $E_\eta(z^{-1})$ equal to 1 at $\eta = 1$ and computing $F_\eta(z^{-1})$ as follows:

$1 = \bar{A}(z^{-1})E_1(z^{-1}) + z^{-1}F_1(z^{-1}) \qquad (4.30)$

where $\bar{A}(z^{-1}) = \Delta A(z^{-1})$. Obviously, the first solution becomes

$E_1(z^{-1}) = 1 \quad\text{and}\quad F_1(z^{-1}) = z\big[1 - \bar{A}(z^{-1})\big] \qquad (4.31)$

Next, assuming that the solution to

$1 = \bar{A}(z^{-1})E_\eta(z^{-1}) + z^{-\eta}F_\eta(z^{-1}) \qquad (4.32)$

exists for some $\eta$, Equation (4.32) can be expressed at time step $\eta + 1$ as:


$1 = \bar{A}(z^{-1})E_{\eta+1}(z^{-1}) + z^{-(\eta+1)}F_{\eta+1}(z^{-1}) \qquad (4.33)$

Subtracting (4.32) from (4.33), the following relationship is obtained:

$0 = \bar{A}(z^{-1})\big[E_{\eta+1}(z^{-1}) - E_\eta(z^{-1})\big] + z^{-\eta}\big[z^{-1}F_{\eta+1}(z^{-1}) - F_\eta(z^{-1})\big] \qquad (4.34)$

Since the degree of $E_{\eta+1}(z^{-1}) - E_\eta(z^{-1})$ in (4.34) equals $\eta$, it may be convenient to define

$E_{\eta+1}(z^{-1}) - E_\eta(z^{-1}) = \tilde{E}_{\eta+1}(z^{-1}) + \xi_{\eta+1}^{\eta}\,z^{-\eta} \qquad (4.35)$

where $\xi_{\eta+1}^{\eta}$ is the coefficient of $z^{-\eta}$ in $E_{\eta+1}(z^{-1})$. Using (4.35), Equation (4.34) can be expressed equivalently as

$0 = \bar{A}(z^{-1})\tilde{E}_{\eta+1}(z^{-1}) + z^{-\eta}\big[z^{-1}F_{\eta+1}(z^{-1}) - F_\eta(z^{-1}) + \bar{A}(z^{-1})\,\xi_{\eta+1}^{\eta}\big] \qquad (4.36)$

Due to the monic nature of $\bar{A}(z^{-1})$, it is evident that $\tilde{E}_{\eta+1}(z^{-1}) = 0$, so that (4.35) becomes

$E_{\eta+1}(z^{-1}) = E_\eta(z^{-1}) + \xi_{\eta+1}^{\eta}\,z^{-\eta} \qquad (4.37)$

and

$\xi_{\eta+1}^{\eta} = F_{\eta,0} \qquad (4.38)$

so that (4.36) can be expressed as

$F_{\eta+1}(z^{-1}) = z\big[F_\eta(z^{-1}) - \bar{A}(z^{-1})\,\xi_{\eta+1}^{\eta}\big] \qquad (4.39)$

Recall that the first row of (4.29) corresponds to the future values of the control signals. Thus, using (4.37) to (4.39), the first $\eta + 1$ terms of $G_{\eta+1}(z^{-1})$ can be computed as follows:

$G_{\eta+1}(z^{-1}) = B(z^{-1})E_{\eta+1}(z^{-1}) = G_\eta(z^{-1}) + z^{-\eta}B(z^{-1})F_{\eta,0} \qquad (4.40)$

Note that the first $\eta$ terms of $G_{\eta+1}(z^{-1})$ in (4.40) are identical to those of $G_\eta(z^{-1})$ in (4.29). Thus, with these procedures the predictions of (4.29) can be obtained recursively, and these are then used to compute the optimal control signal.
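As an illustration of the recursion (4.30)-(4.40), the following MATLAB sketch computes $E_\eta$, $F_\eta$ and $G_\eta$ for a SISO example; the plant polynomials and horizon are hypothetical placeholders and the sketch is not an excerpt from the implementation described in this work:

A = [1 -1.5 0.7];  B = [0.5 0.2];  Np = 5;   % hypothetical A(z^-1), B(z^-1), horizon
Abar = conv(A, [1 -1]);                      % Abar = Delta*A(z^-1)
na   = numel(Abar) - 1;
E = zeros(Np, Np);  F = zeros(Np, na);       % coefficient storage, eta = 1..Np
E(1,1) = 1;                                  % E_1 = 1, Eq. (4.31)
F(1,:) = -Abar(2:end);                       % F_1 = z[1 - Abar], Eq. (4.31)
G = cell(Np,1);  G{1} = conv(B, E(1,1));     % G_1 = B*E_1
for eta = 1:Np-1
    xi = F(eta,1);                           % xi = F_{eta,0}, Eq. (4.38)
    E(eta+1,1:eta) = E(eta,1:eta);           % Eq. (4.37): E_{eta+1} = E_eta + xi*z^-eta
    E(eta+1,eta+1) = xi;
    F(eta+1,:) = [F(eta,2:end) 0] - xi*Abar(2:end);   % Eq. (4.39)
    G{eta+1} = conv(B, E(eta+1,1:eta+1));    % Eq. (4.40): G_{eta+1} = B*E_{eta+1}
end

The rows of F and the cells of G supply the free- and forced-response coefficients needed to build the compact form (4.41) below.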

The AGPC Control Law and the Optimal Control Signal

To obtain the AGPC control law as well as the sequence of optimal future control signals $U(k)$, Equation (4.29) is expressed in the following compact form:

$\hat{Y}(k) = \Psi(k)\,U(k) + H(k) \qquad (4.41)$

where

$\hat{Y}(k) = [\hat{Y}(k+N_d)\ \ \hat{Y}(k+N_d+1)\ \cdots\ \hat{Y}(k+N_p)]^T$

$U(k) = [\Delta U(k)\ \ \Delta U(k+1)\ \cdots\ \Delta U(k+N_u-N_d)]^T$


$H(k) = [H(k+N_d)\ \ H(k+N_d+1)\ \cdots\ H(k+N_p)]^T$

$H(k+\eta) = z^{(\eta-N_d)}\big[G_\eta(z^{-1}) - G_0 - G_1 z^{-1} - \cdots - G_{\eta-N_d}z^{-(\eta-N_d)}\big]\,\Delta U(k) + F_\eta(z^{-1})\,Y(k)$

and $\Psi(k)$ is a $(N_p - N_d + 1) \times N_u$ polynomial matrix defined as

$\Psi(k) = \begin{bmatrix} G_0 & 0 & \cdots & 0 \\ G_1 & G_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ G_{N_p-N_d} & G_{N_p-N_d-1} & \cdots & G_{N_p-N_d-N_u+1} \end{bmatrix}$

Substituting (4.41) into the objective function in (4.2) gives

$\hat{J}(k,U(k)) = \big[R(k) - \Psi(k)U(k) - H(k)\big]^T\kappa\,\big[R(k) - \Psi(k)U(k) - H(k)\big] + \rho\,U(k)^T U(k)$

which can also be expressed in quadratic form as

$\hat{J}(k,U(k)) = \tfrac{1}{2}\,U^T\Gamma\,U + b^T U + H_0 \qquad (4.42)$

where

$\Gamma = 2(\Psi^T\kappa\Psi + \rho I), \qquad b^T = 2(H - R)^T\kappa\Psi, \qquad H_0 = (H - R)^T\kappa\,(H - R) \qquad (4.43)$

The minimization of (4.42) can be expressed as

$U(k) = \arg\min_U \hat{J}(k,U(k)) \qquad (4.44)$

subject to (4.4), defined here for the AGPC case as

$\Delta U_{\min} \le \Delta U(k) \le \Delta U_{\max}$
$Y_{\min} \le \Psi\,U(k) + H \le Y_{\max} \qquad (4.45)$

Equation (4.44) is solved subject to (4.45) using MATLAB, based on the quadratic programming technique described in [Boyd and Vandenberghe, 2006] and [Antoniou and Lu, 2007]. Note that only the first vector $\Delta U(k)$ of the solution $U(k)$ is applied to control the system, because at the next sample time $k+1$ a new measurement as well as a new prediction vector $\hat{Y}$ will be available and a new $U(k)$ will be computed and applied. Thus, at each sampling time, the complete AGPC algorithm is repeated.
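A minimal MATLAB sketch of this constrained solve is given below; all numerical values are hypothetical placeholders, and quadprog (from the Optimization Toolbox) stands in for the quadratic programming routine described above:

Nu = 3;  Ny = 8;  u_prev = 0;               % example horizons and last applied input
Psi  = tril(ones(Ny, Nu));                  % placeholder step-response matrix of (4.41)
Hfree = zeros(Ny,1);  R = ones(Ny,1);       % free response H(k) and filtered reference
kappa = eye(Ny);  rho = 0.1;
Gamma = 2*(Psi'*kappa*Psi + rho*eye(Nu));   % Eq. (4.43)
b     = 2*Psi'*kappa*(Hfree - R);           % Eq. (4.43)
dUmin = -0.5*ones(Nu,1);  dUmax = 0.5*ones(Nu,1);
Ymin  = -2*ones(Ny,1);    Ymax  = 2*ones(Ny,1);
Aineq = [Psi; -Psi];                        % output constraints of (4.45)
bineq = [Ymax - Hfree; Hfree - Ymin];
dU = quadprog(Gamma, b, Aineq, bineq, [], [], dUmin, dUmax);   % Eq. (4.44)
u  = u_prev + dU(1);                        % receding horizon: apply only the first move

The receding-horizon application of only dU(1) mirrors the repetition of the complete AGPC algorithm at each sampling time.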

Thus, the AGPC algorithm executed at each sample time can be summarized in the following stepwise procedure:

Step 1: Compute and filter the desired reference signal using (4.1).

Step 2: Extract the linearized NN model parameters using (4.7) for the NNARX or (4.16) for the NNARMAX model to obtain (4.14) or (4.23) respectively, as shown in Sections 4.3.1 and 4.3.2.

Step 3: Compute the current system output using (4.14) and evaluate the criterion using (4.2).

Step 4: Solve the Diophantine equation (4.25) recursively as follows:


Since $A(z^{-1})$ is monic, initialize the recursion by setting $E_1(z^{-1}) = 1$.
Solve (4.25) to obtain $F_1(z^{-1}) = z[1 - \bar{A}(z^{-1})]$.
Set $\eta = 1$.
while $\eta < N_p$, do
 1) Evaluate $\xi_{\eta+1}^{\eta} = F_{\eta,0}$
 2) for $\tau = 0$ to $n_a$, do
   $F_{\eta+1,\tau} = F_{\eta,\tau+1} - \bar{A}_{\tau+1}F_{\eta,0}$
  end for $\tau$,
 3) $\eta = \eta + 1$, Go To 1).
end while $\eta$.

Step 5: Compute the $\eta$-step ahead output predictions using (4.29) and the solutions of (4.40).

Step 6: for $\tau' = 1$ to $N_u$
 Substitute the solution of Step 5 (that is, from (4.29)) into (4.2), which results in (4.41).
end for $\tau'$.

Step 7: Using the MATLAB quadratic programming solver, evaluate (4.44) subject to (4.45) to obtain the optimal control signal $U(k)$.

4.4 Nonlinear Adaptive Model Predictive Control (NAMPC) Algorithm

The developed NAMPC presented in this Section follows from Fig. 2.1 and Fig. 2.2 introduced in Section 2.2 of Chapter 2, except that the "process model" block in Fig. 2.1 is replaced by a neural network (NN) as shown in Fig. 4.2. Unlike the AGPC, which is based on the use of a linearized NN model for the controller design, the NAMPC utilizes the nonlinear NN model directly for the controller design. Thus, at each sampling instant, a neural network model is obtained through a training procedure using either the modified Levenberg-Marquardt algorithm (MLMA) or the adaptive recursive least squares (ARLS) training algorithm proposed in Chapter 3. To compute the output of the system (3.5) based on the identified NN model, we assume that $N_d = d = 1$ and that (3.71) using the ARLS algorithm or (3.87) using the MLMA algorithm approximates (3.5), so that $\theta(k) = \hat{\theta}(k)$. For notational convenience, $n = n_a$ and $m = n_b$ are used here for the NAMPC formulation. Although the formulation is based on the NNARX model, its extension to the NNARMAX model is essentially the same, being based on the NN model. Following the MPC discussion in Section 2.2, at time $k$ we assume that the NN model is available, so that the one-step-ahead model predictor of (3.5) can be expressed as:


$\hat{Y}(k) = J\big(U(k-d), \ldots, U(k-d-m),\ Y(k-1), \ldots, Y(k-n)\big) \qquad (4.46)$

Using (4.46), the $\eta$-step ahead model predictor becomes:

$\hat{Y}(k+\eta\,|\,k) = J\big(\hat{Y}(k+\eta-1), \ldots, \hat{Y}(k+\eta-\min(\eta,n)),\ Y(k-1), \ldots, Y(k-\max(n-\eta,0)),\ U(k+\eta-d), \ldots, U(k+\eta-d-m)\big) \qquad (4.47)$

Next, we calculate the predicted outputs of the system (4.47) based on the NN model of the system (4.46) in terms of (3.15), so that (4.47) becomes:

$\hat{Y}(k+\eta\,|\,k) = \displaystyle\sum_{j=1}^{n_h} W_{i,j}\,f_j\big(a(\eta,j)\big) + W_{i,0} \qquad (4.48)$

where $f_j(\cdot)$ is given by (3.16), except that here the argument is $a(\eta,j)$, which is given below as

$a(\eta,j) = \displaystyle\sum_{\tau=1}^{\min(\eta,n)} w_{j,\tau}\,\hat{Y}(k+\eta-\tau) + \sum_{\tau=\min(\eta,n)+1}^{n} w_{j,\tau}\,Y(k+\eta-\tau) + \sum_{\tau=0}^{m} w_{j,n+\tau+1}\,U(k+\eta-d-\tau) + w_{j,0} \qquad (4.49)$
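To illustrate how (4.48)-(4.49) are evaluated recursively over the prediction horizon, the following MATLAB sketch computes the $\eta$-step ahead predictions for a SISO NNARX model; the weights, model orders and input sequence are hypothetical placeholders, and, as a simplification, inputs before time $k$ are clamped to $U(k)$:

n = 2; m = 1; d = 1; nh = 5; Np = 10;          % hypothetical orders, hidden units, horizon
w  = 0.1*randn(nh, n+m+1);  w0 = zeros(nh,1);  % hidden-layer weights w_{j,.} and biases
Wout = 0.1*randn(1, nh);    W0 = 0;            % output-layer weights and bias
Ypast = zeros(n,1);                            % Ypast(i) = Y(k-i+1), measured outputs
Useq  = 0.5*ones(Np+m+1, 1);                   % Useq(i+1) = U(k+i), candidate inputs
Yhat  = zeros(Np,1);
for eta = 1:Np
    phi = zeros(n+m+1, 1);
    for tau = 1:n                              % output regressors of (4.49)
        if eta - tau >= 1
            phi(tau) = Yhat(eta - tau);        % predicted outputs, tau <= min(eta,n)
        else
            phi(tau) = Ypast(tau - eta + 1);   % measured outputs, tau > min(eta,n)
        end
    end
    for tau = 0:m                              % input regressors of (4.49)
        idx = eta - d - tau;                   % offset of U(k+idx)
        phi(n + tau + 1) = Useq(max(idx, 0) + 1);  % clamp pre-horizon inputs (simplification)
    end
    Yhat(eta) = Wout*tanh(w*phi + w0) + W0;    % Eq. (4.48)
end

These predictions Yhat enter the criterion (4.2) that the nonlinear optimizer of Fig. 4.2 minimizes.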

Unlike the AGPC case, the proposed NAMPC computes the optimal future control signal by nonlinear optimization using the trained neural network model at each sampling instant, based on the control scheme shown in Fig. 4.2. The nonlinear optimization is based on the full Newton method with a new strategy that guarantees the positive definiteness of the second-order Hessian matrix.

Using the $\eta$-step ahead model predictor given in (4.48), the nonlinear minimization problem of (4.2) can be expressed as follows:

$U(k) = \arg\min_U \hat{J}(k,U(k)) \qquad (4.50)$

subject to the constraints in (4.3) and (4.4), where the global minimizer $U(k)$, with the iterates $U^{(\tau)}(k)$ belonging to the feasible set of control sequences, is such that:

[Fig. 4.2: The proposed NN-based NAMPC strategy with a NN model. The reference $R(k)$ is passed through a first-order low-pass filter to give $R'(k)$; a constrained nonlinear optimizer produces $U(k)$ for the system (subject to disturbances $d(k)$) using the error $E(k)$ between $R'(k)$ and the predictions of the neural network model (NNARX or NNARMAX) of the system output $Y(k)$.]


$\hat{J}(k,U(k)) \le \hat{J}(k,U^{(\tau)}(k)), \qquad \forall\ U^{(\tau)}(k)$ in the feasible set.

The algorithm proposed for solving (4.50) uses the full Newton method based on the Levenberg-Marquardt algorithm with the following adaptive updating rule:

$U(k) = U^{(\tau)}(k) + \zeta^{(\tau)}(k) \qquad (4.51)$

which is applied for updating the sequence of future optimal control signals $U(k)$; $U^{(\tau)}(k)$ is the current iterate of the control sequence, and $\zeta^{(\tau)}(k)$ is the search direction given by the following expression:

$\zeta^{(\tau)}(k) = -\big[H[U^{(\tau)}(k)] + \lambda^{(\tau)}I\big]^{-1}G[U^{(\tau)}(k)] \qquad (4.52)$

where $\lambda^{(\tau)}$ is the adaptation parameter, $I$ is a diagonal (identity) matrix, and $G[U^{(\tau)}(k)]$ and $H[U^{(\tau)}(k)]$ are the Jacobian and Newton Hessian matrices given respectively as:

$G[U^{(\tau)}(k)] = \left.\dfrac{\partial \hat{J}(k,U(k))}{\partial U(k)}\right|_{U(k)=U^{(\tau)}(k)} = -2\kappa\,\Phi^T E(k) + 2\rho\left(\dfrac{\partial \Delta U(k)}{\partial U(k)}\right)^{\!T}\Delta U(k) \qquad (4.53)$

$H[U^{(\tau)}(k)] = \left.\dfrac{\partial^2 \hat{J}(k,U(k))}{\partial U(k)^2}\right|_{U(k)=U^{(\tau)}(k)} = -2\kappa\,\dfrac{\partial}{\partial U(k)}\!\left[\left(\dfrac{\partial \hat{Y}(k)}{\partial U(k)}\right)^{\!T} E(k)\right] + 2\rho\left(\dfrac{\partial \Delta U(k)}{\partial U(k)}\right)^{\!T}\dfrac{\partial \Delta U(k)}{\partial U(k)} \qquad (4.54)$

The $\Phi$ in (4.53) is the matrix of partial derivatives of the predicted outputs (4.48) with respect to the control inputs. The computation of $\Phi$ and the future control signals is explained below.

Computation of Φ and the Future Control Signal

Consider the optimal control signal, search direction, Jacobian matrix and Hessian matrix given by (4.51) through (4.54), respectively. To simplify the computation of $\Phi$ in (4.53), the control signal $U(k)$ is decomposed into past and future control signals in $\hat{a}(\eta,j)$, given below as:

$\hat{a}(\eta,j) = \displaystyle\sum_{\tau=1}^{\min(\eta-N_d,\,n)} w_{j,\tau}\,\hat{Y}(k+\eta-\tau) + \sum_{\tau=1}^{\min(\eta-N_d-N_u+2,\,m+1)} w_{j,n+\tau}\,\hat{U}(k+N_u-1)$
$\qquad + \displaystyle\sum_{\tau=\eta-N_d-N_u+3}^{\min(\eta-N_d+1,\,m+1)} w_{j,n+\tau}\,\hat{U}(k+\eta-N_d-\tau+1) + \sum_{\tau=\eta-N_d+1}^{\min(\eta,\,n)} w_{j,\tau}\,Y(k+\eta-\tau)$
$\qquad + \displaystyle\sum_{\tau=\min(\eta,\,n)+1}^{n} w_{j,\tau}\,Y(k+\eta-\tau) + \sum_{\tau=\eta-N_d+2}^{m+1} w_{j,n+\tau}\,U(k+\eta-N_d-\tau+1) + w_{j,0} \qquad (4.55)$


The first three sums of (4.55) depend on future control signals, while the last three sums depend on past control signals.

The computation of $\partial \Delta U(k)/\partial U(k)$ appearing in (4.53) and (4.54) follows from the constraint specified in (4.3) for the control horizon, so that this derivative evaluated at $U(k) = U^{(\tau)}(k)$ can be expressed directly as:

$\left.\dfrac{\partial \Delta U(k)}{\partial U(k)}\right|_{U(k)=U^{(\tau)}(k)} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix} \qquad (4.56)$

Note that since $\Delta U(k) = U(k) - U(k-1)$, Equation (4.56) is a matrix of dimension $N_u \times N_u$.

Next, the computation of $\partial \hat{Y}(k)/\partial U(k)$ appearing in (4.54), evaluated at $U(k) = U^{(\tau)}(k)$, is expressed as:

$\left.\dfrac{\partial \hat{Y}(k)}{\partial U(k)}\right|_{U(k)=U^{(\tau)}(k)} = \begin{bmatrix} \dfrac{\partial \hat{Y}(k+N_d)}{\partial U(k)} & \cdots & \dfrac{\partial \hat{Y}(k+N_d)}{\partial U(k+N_u-1)} \\ \vdots & & \vdots \\ \dfrac{\partial \hat{Y}(k+N_p)}{\partial U(k)} & \cdots & \dfrac{\partial \hat{Y}(k+N_p)}{\partial U(k+N_u-1)} \end{bmatrix} \qquad (4.57)$

Thus, Equation (4.57) is a polynomial matrix of dimension $(N_p - N_d + 1) \times N_u$. Note that the past control signals do not contribute to the output predictions, because new measurements will be available at the next sampling instant $k+1$. Thus, $\Phi$ is computed based only on the future control signals for all $\eta \in [N_d, N_p]$ and for all $l \in [0, \min(\eta-1, N_u-1)]$, and can be defined from (4.57) as:

$\Phi = \dfrac{\partial \hat{Y}(k+\eta)}{\partial U(k+l)} = \displaystyle\sum_{j=1}^{n_h} W_j\, f'\big(\hat{a}(\eta,j)\big)\cdot \hat{b}(\eta,l,j) \qquad (4.58)$

where $\hat{a}(\eta,j)$ is computed from (4.55), while $\hat{b}(\eta,l,j)$ for all $l \in [0, \min(\eta-1, N_u-1)]$ is computed using the first three sums of $\hat{a}(\eta,j)$ in (4.55), since these terms correspond to the future control signals, and is defined here as:

$\hat{b}(\eta,l,j) = \displaystyle\sum_{\tau=1}^{\min(\eta-N_d,\,n)} w_{j,\tau}\,\dfrac{\partial \hat{Y}(k+\eta-\tau)}{\partial U(k+l)} + \sum_{\tau=1}^{\min(\eta-N_d-N_u+2,\,m+1)} w_{j,n+\tau}\,\dfrac{\partial \hat{U}(k+N_u-1)}{\partial U(k+l)} + \sum_{\tau=\eta-N_d-N_u+3}^{\min(\eta-N_d+1,\,m+1)} w_{j,n+\tau}\,\dfrac{\partial \hat{U}(k+\eta-N_d-\tau+1)}{\partial U(k+l)} \qquad (4.59)$

Note that from (4.59), the three partial derivative terms can be evaluated respectively as follows:

$\dfrac{\partial \hat{Y}(k+\eta-\tau)}{\partial U(k+l)} = 0, \qquad l \ge \eta - \tau - N_d + 1 \qquad (4.60)$


$\dfrac{\partial \hat{U}(k+N_u-1)}{\partial U(k+l)} = \begin{cases} 1, & l = N_u - 1 \\ 0, & \text{otherwise} \end{cases} \qquad (4.61)$

$\dfrac{\partial \hat{U}(k+\eta-N_d-\tau+1)}{\partial U(k+l)} = \begin{cases} 1, & l = \eta - N_d - \tau + 1 \\ 0, & \text{otherwise} \end{cases} \qquad (4.62)$

Using (4.60) to (4.62), Equation (4.59) can be reduced to the following form, defined by $\hat{c}(\eta,l,j)$ given below as:

$\hat{c}(\eta,l,j) = \begin{cases} \displaystyle\sum_{\tau=1}^{\min(\eta-N_d-l,\,n)} w_{j,\tau}\,\dfrac{\partial \hat{Y}(k+\eta-\tau)}{\partial U(k+l)} + \displaystyle\sum_{\tau=1}^{\min(\eta-N_d-N_u+2,\,m+1)} w_{j,n+\tau}, & l = N_u - 1 \\[10pt] \displaystyle\sum_{\tau=1}^{\min(\eta-N_d-l,\,n)} w_{j,\tau}\,\dfrac{\partial \hat{Y}(k+\eta-\tau)}{\partial U(k+l)} + w_{j,\,n+\eta-N_d-l+1}, & \max(0,\,\eta-N_d-m) \le l \le N_u - 2 \\[10pt] \displaystyle\sum_{\tau=1}^{\min(\eta-N_d-l,\,n)} w_{j,\tau}\,\dfrac{\partial \hat{Y}(k+\eta-\tau)}{\partial U(k+l)}, & 0 \le l < \max(0,\,\eta-N_d-m) \end{cases} \qquad (4.63)$

Thus, using (4.63), the computation of the first-order derivative in (4.58) can be reduced to

$\Phi = \dfrac{\partial \hat{Y}(k+\eta)}{\partial U(k+l)} = \displaystyle\sum_{j=1}^{n_h} W_j\, f'\big(\hat{a}(\eta,j)\big)\cdot \hat{c}(\eta,l,j) \qquad (4.64)$

Note that $f'(\hat{a}(\eta,j))$ is the first-order derivative of (3.16) evaluated for the $\hat{a}(\eta,j)$ contained in (4.64), which can be expressed in a simplified form as

$f'\big(\hat{a}(\eta,j)\big) = 1 - f^2\big(\hat{a}(\eta,j)\big) \qquad (4.65)$

Finally, the computation of the second-order derivative $\dfrac{\partial}{\partial U(k)}\!\left[\left(\dfrac{\partial \hat{Y}(k)}{\partial U(k)}\right)^{\!T} E(k)\right]$ appearing in (4.54), evaluated at $U(k) = U^{(\tau)}(k)$, begins with the vector inside the brackets, which is expressed as follows:

$\left(\dfrac{\partial \hat{Y}(k)}{\partial U(k)}\right)^{\!T} E(k) = \begin{bmatrix} \dfrac{\partial \hat{Y}(k+N_d)}{\partial U(k)}E(k+N_d) + \cdots + \dfrac{\partial \hat{Y}(k+N_p)}{\partial U(k)}E(k+N_p) \\ \vdots \\ \dfrac{\partial \hat{Y}(k+N_d)}{\partial U(k+N_u-1)}E(k+N_d) + \cdots + \dfrac{\partial \hat{Y}(k+N_p)}{\partial U(k+N_u-1)}E(k+N_p) \end{bmatrix} \qquad (4.66)$

Next, taking the derivative of (4.66) with respect to $U(k)$ results in the following $N_u \times N_u$ matrix:

$\dfrac{\partial}{\partial U(k)}\!\left[\left(\dfrac{\partial \hat{Y}(k)}{\partial U(k)}\right)^{\!T} E(k)\right] = \dfrac{\partial^2 \hat{Y}(k)}{\partial U(k)\,\partial U(k)^T}\,E(k) - \left(\dfrac{\partial \hat{Y}(k)}{\partial U(k)}\right)^{\!T}\dfrac{\partial \hat{Y}(k)}{\partial U(k)} \qquad (4.67)$


The second term in (4.67) is formed by multiplying the vectors derived from (4.66). In terms of the network parameters, the second-order derivatives defined by (4.67) can be calculated using (4.64) for all $\eta \in [N_d, N_p]$, for all $l \in [0, \min(\eta-1, N_u-1)]$ and for all $p \in [0, l]$ as follows:

$\dfrac{\partial^2 \hat{Y}(k+\eta)}{\partial U(k+l)\,\partial U(k+p)} = \displaystyle\sum_{j=1}^{n_h} W_j\left[\hat{b}_1 + \hat{b}_2\,\dfrac{\partial \hat{d}(\eta,p,j)}{\partial U(k+l)}\right] \qquad (4.68)$

where $\hat{b}_1 = f''\big(\hat{a}(\eta,j)\big)\cdot\hat{c}(\eta,l,j)\cdot\hat{d}(\eta,p,j)$, $\hat{b}_2 = f'\big(\hat{a}(\eta,j)\big)$,

$\dfrac{\partial \hat{d}(\eta,p,j)}{\partial U(k+l)} = \displaystyle\sum_{\tau=1}^{\min(\eta-N_d-l,\,n)} w_{j,\tau}\,\dfrac{\partial^2 \hat{Y}(k+\eta-\tau)}{\partial U(k+l)\,\partial U(k+p)}$

and $\hat{d}(\eta,p,j)$ is again computed from the first three sums of $\hat{a}(\eta,j)$ in (4.55) and is defined as

$\hat{d}(\eta,p,j) = \displaystyle\sum_{\tau=1}^{\min(\eta-N_d,\,n)} w_{j,\tau}\,\dfrac{\partial \hat{Y}(k+\eta-\tau)}{\partial U(k+p)} + \sum_{\tau=1}^{\min(\eta-N_d-N_u+2,\,m+1)} w_{j,n+\tau}\,\dfrac{\partial \hat{U}(k+N_u-1)}{\partial U(k+p)} + \sum_{\tau=\eta-N_d-N_u+3}^{\min(\eta-N_d+1,\,m+1)} w_{j,n+\tau}\,\dfrac{\partial \hat{U}(k+\eta-N_d-\tau+1)}{\partial U(k+p)} \qquad (4.69)$

Note that $f'$ is the first-order derivative of (3.16) evaluated for the $\hat{a}(\eta,j)$ contained in (4.68), as expressed in (4.65), while $f''$ is the second-order derivative of (3.16), evaluated again for $\hat{a}(\eta,j)$ in (4.68), as expressed below:

$f''\big(\hat{a}(\eta,j)\big) = -2f\big(\hat{a}(\eta,j)\big)\,f'\big(\hat{a}(\eta,j)\big) = -2f\big(\hat{a}(\eta,j)\big)\big[1 - f^2\big(\hat{a}(\eta,j)\big)\big] \qquad (4.70)$
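For reference, these activation derivatives are one-liners in MATLAB; the following fragment (illustrative only) defines the function handles used when assembling $\Phi$ in (4.64) and the second-order terms in (4.68):

f   = @(a) tanh(a);                 % the hyperbolic tangent sigmoid of (3.16)
fp  = @(a) 1 - f(a).^2;             % f'(a), Eq. (4.65)
fpp = @(a) -2*f(a).*(1 - f(a).^2);  % f''(a) = -2 f(a) f'(a), Eq. (4.70)

Evaluating fp and fpp at the activations $\hat{a}(\eta,j)$ gives the factors multiplying $\hat{c}(\eta,l,j)$ and $\hat{d}(\eta,p,j)$ in the Jacobian and Hessian assembly.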

Computing the Optimal Control Signal

From the computation of $\partial \Delta U(k)/\partial U(k)$ appearing in (4.56), subject to the constraints specified in (4.3) for the control horizon, it is obvious that the Jacobian matrix in (4.53) is positive definite. Also, the second term of the Hessian matrix in (4.54) is positive definite, whereas the first term may not necessarily be positive definite. Occasionally, this term may become non-positive definite and ill-conditioned or close to being singular. This is the well-known problem with the Newton method: the Hessian is not guaranteed to be positive definite in an open neighborhood of a global minimum. Thus, it is necessary to check that the Hessian is positive definite before updating the sequence of control inputs. This check will also ensure that the optimization is in a descent direction towards the minimum. Instead of performing the check for positive definiteness directly on the Hessian matrix $H[U^{(\tau)}(k)]$ as given in (4.54), the check is performed on $H[U^{(\tau)}(k)]$ together with $\lambda^{(\tau)}$, expressed in the following form from (4.52) as:


$V^{(\tau)}(k) = H[U^{(\tau)}(k)] + \lambda^{(\tau)}I \qquad (4.71)$

to obtain a value of $\lambda^{(\tau)}$ that will satisfy this condition. An immediate solution could be to increase or select sufficiently large $\kappa$ and $\rho$ to ensure that (4.54) remains positive definite, but this is not a feasible approach with respect to the criteria for controller tuning outlined in Section 4.5. This problem has led to the formulation of many algorithms to ensure the positive definiteness of the Hessian matrix (4.54), which can be summarized under three main approaches: 1) the Levenberg-Marquardt method ([Antoniou and Lu, 2007], [Chiong, 2010], [Omidvar and Elliot, 1997]); 2) the quasi-Newton algorithm based on the so-called BFGS (Broyden-Fletcher-Goldfarb-Shanno) method ([Kelly, 1995], [Kelley, 1999], [Song et al., 2006]); and 3) the direct application of the Gauss-Newton method introduced in Chapter 3 ([Dennis and Schnabel, 1996], [Scales, 1985]).

One alternative could be to use the Gauss-Newton method introduced in Chapter 3, which was used for the development of the identification algorithms, but based instead on the first-order approximation of the error between the desired reference and the output. However, due to the weighting matrices $\kappa$ and $\rho$, the errors may often be large, and consequently the convergence may be slow, since the optimization is a large-residual problem [Dennis and Schnabel, 1996].

Another alternative could be to use the quasi-Newton approach to construct a positive definite approximation of the inverse Hessian matrix based on the BFGS method, using information embedded in the previous evaluations of the Jacobian matrix (4.53) and the criterion (4.2). Because the BFGS method approximates the Newton search direction, it must be complemented with a line search to ensure the convergence of the algorithm and to guarantee the validity of the BFGS update ([Dennis and Schnabel, 1996], [Kelly, 1995], [Kelley, 1999], [Scales, 1985]). Although the method may have good local convergence when the Hessian is difficult to derive by analytical means, it has several weaknesses: it is difficult to implement and may become complex, unmanageable and computationally expensive as the control and prediction horizons increase. In addition, the gradient of the Jacobian matrix is necessary and vital for pointing out valid descent directions, and thus approximating the gradient and the inverse Hessian matrices may lead to poor results. Moreover, the line search requires too many evaluations of the criterion. The full Newton method employed here does not approximate either the Jacobian or the Hessian matrices and employs the dynamic properties of the Levenberg-Marquardt method [Marquardt, 1963]. A major issue lies in guaranteeing the positive definiteness of the Hessian matrix given in (4.54).

The NAMPC proposed here incorporates a new algorithm, given in Table 4.1, which is based on the Cholesky factorization method ([Antoniou and Lu, 2007], [Boyd and Vandenberghe, 2007]) but implemented iteratively to ensure that (4.71) is always positive definite. The algorithm first computes (4.71) and checks for positive definiteness. If this condition is not satisfied, the algorithm iteratively selects a new $\lambda^{(\tau)}$ to compute (4.71) and terminates when (4.71) becomes positive definite, without over- or under-conditioning the Hessian matrix. Once the positive definiteness of (4.71) is achieved, the Cholesky factors $L_{b,a}(k)$ are produced by the algorithm of Table 4.1.


Table 4.1: Iterative algorithm for selecting $\lambda^{(\tau)}$ for guaranteed positive definiteness of the Hessian matrix

Initialize: $\lambda_a = 0.5$, $\lambda_b = 1$, $\lambda_c = 2$, $\lambda_d = 4$, $\lambda_e = 6$, $km = [\lambda_a, \lambda_b, \lambda_c, \lambda_d, \lambda_e]$, $kl = \mathrm{length}(km)$, $iter = 0$, $sm = \mathrm{size}(V^{(\tau)}(k))$, $L^{(\tau)}_{a,a}(k) = -1$, $p = 1$ to $(N_u)^2$ in steps of $(N_u + 1)$.
Compute $V^{(\tau)}(k)$ using (4.71)
for $sn = 1$ to $sm$
 while $iter < kl$ or $L^{(\tau)}_{a,a} < 0$, do
  for $kn = 1$ to $kl$, do
   for $a = 1$ to $p$, do
    $L_{a,a}(k) = \Big(V^{(\tau)}_{a,a}(k) - \sum_{j=1}^{a-1} L^2_{a,j}(k)\Big)^{1/2}$
    if $L^{(\tau)}_{a,a} < 0$, break, end if
    for $b = a+1$ to $p$, do
     $L_{b,a}(k) = \dfrac{1}{L_{a,a}}\Big(V^{(\tau)}_{b,a}(k) - \sum_{j=1}^{a-1} L_{b,j}(k)\,L_{a,j}(k)\Big)$
    end for $b$
   end for $a$
   if $L^{(\tau)}_{a,a} < 0$, set $\lambda = \lambda \cdot km(1,kn)$, re-compute (4.71) and the diagonal factors $L_{a,a}(k)$ for $a = 1$ to $p$
   else re-compute the factors $L_{a,a}(k)$ and $L_{b,a}(k)$ for $a = 1$ to $p$ and $b = a+1$ to $p$
   end if
  end for $kn$
  $iter = iter + 1$
  if $iter > kl$ and $L_{a,a}(k) < 0$, break, end if
  Set $\lambda^{(\tau)} \leftarrow \lambda$ and re-compute $V^{(\tau)}(k)$ and $L_{b,a}(k)$ using $\lambda^{(\tau)}$.
 end while $iter$
end for $sn$

The inverse of these Cholesky factors is guaranteed to be positive definite for use in computing the search direction $\zeta^{(\tau)}(k)$ in (4.52). The key parameters in the algorithm are the elements of $km$ and how $\lambda^{(\tau)}$ is to be updated at each iteration.


Having satisfied the positive definiteness of (4.71), it is also necessary to determine the optimal control signal as a global minimum, with the assumption that the optimization is in the descent direction. Thus, the computation of the search direction $\zeta^{(\tau)}(k)$ given by (4.52) is necessary. Here, the Cholesky factors $L_{b,a}(k)$ obtained from Table 4.1 are reused to compute $\zeta^{(\tau)}(k)$ in a two-stage forward and backward substitution procedure, given respectively as:

$L_{b,a}[U^{(\tau)}(k)]\,\bar{\zeta}^{(\tau)}(k) = -G[U^{(\tau)}(k)] \qquad (4.72)$

$\zeta^{(\tau)}(k) = \big(L^T_{b,a}[U^{(\tau)}(k)]\big)^{-1}\,\bar{\zeta}^{(\tau)}(k) \qquad (4.73)$
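A compact MATLAB sketch of this safeguard and two-stage solve is shown below; it uses the built-in chol in place of the hand-coded factorization of Table 4.1, and the Hessian, gradient and initial $\lambda$ are hypothetical placeholders:

Nu = 3;
H = randn(Nu);  H = (H + H')/2;          % symmetric Newton Hessian of (4.54)
Grad = randn(Nu, 1);                     % Jacobian G[U(k)] of (4.53)
lambda = 1e-3;
[L, flag] = chol(H + lambda*eye(Nu), 'lower');
while flag > 0                           % not positive definite: inflate lambda
    lambda = 2*lambda;                   % cf. the iterative selection of Table 4.1
    [L, flag] = chol(H + lambda*eye(Nu), 'lower');
end
y    = L  \ (-Grad);                     % forward substitution, cf. (4.72)
zeta = L' \ y;                           % backward substitution, cf. (4.73): direction (4.52)

Doubling lambda is only one possible schedule; Table 4.1 instead cycles through the multipliers stored in $km$.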

Because the Hessian matrix is now guaranteed to be positive definite following the algorithm of Table 4.1, the search direction will be a descent direction and the optimization will converge faster, given a reasonably good initial control input that is close to the global minimum $U(k)$. A widely used method is to combine the Levenberg-Marquardt method with the trust region approach proposed in [Fletcher, 1987]. The method is formulated as a minimization problem around the trust region, given as:

$U^{(\tau)}(k) = \arg\min_U \hat{J}(k,U(k)) \qquad (4.74)$

subject to

$\big\|U^{(\tau)}(k) - U^{(\tau-1)}(k)\big\| \le \delta^{(\tau)} \qquad (4.75)$

where $\delta^{(\tau)}$ is the trust region radius within which $U(k)$ can be found and $U^{(\tau-1)}(k)$ is the control signal at the previous iteration. While in the BFGS method a fixed value of $\delta^{(\tau)}$ must be selected, the NAMPC proposed in this work adjusts $\delta^{(\tau)}$ indirectly according to the value of $\lambda^{(\tau)}$ obtained from the algorithm of Table 4.1.

The last issue to be addressed is how $U(k)$ is to be updated and the step size of the next iteration. Many algorithms have been proposed for this purpose ([Chiong, 2010], [Hagan and Menhaj, 1994], [Omidvar and Elliot, 1997]), and in some cases a fixed step size is used, as in [Colin et al., 2007], [Dennis and Schnabel, 1996], [Kelley, 1995 and 1999], and [Scales, 1985]. For example, in [Colin et al., 2007] a fixed step size of 4 is chosen, whereas Dennis and Schnabel [Dennis and Schnabel, 1996] selected the step size as 1 and suggested the name damped Gauss-Newton.

Here we reuse $\lambda^{(\tau)}$ for this purpose. Note that $\lambda^{(\tau)}$ characterizes a hybrid adaptation parameter and has several effects ([Chiong, 2010], [Hagan and Menhaj, 1994], [Marquardt, 1963], [Fletcher, 1987], [Wu, 2008]): 1) for large values of $\lambda^{(\tau)}$, Equation (4.51) becomes the steepest descent algorithm (with step $1/\lambda^{(\tau)}$), which requires a descent search method; and 2) for small values of $\lambda^{(\tau)}$, Equation (4.51) reduces to the Gauss-Newton method, where the Hessian may become non-positive definite or ill-conditioned and the algorithm of Table 4.1 is used.

The approach used in this work builds on the indirect method proposed independently in [Fletcher, 1987] and [Moré, 1983], which has been suggested to outperform the original Levenberg-Marquardt method [Marquardt, 1963]. The idea here is to observe how well the actual reduction in the criterion (4.2) matches the theoretically predicted reduction, and then to adjust $\lambda^{(\tau)}$ accordingly.


Table 4.2: The implementation steps for the nonlinear adaptive model predictive control (NAMPC) algorithm.

Step 1: Initialize $\lambda \in [10^{-3}, 1]$, $\delta \in [10^{-4}, 10^{-1}]$ and specify the maximum number of iterations ($U_{iter}$). Set $\tau = 0$.
Step 2: Given the neural network model of the system to be controlled, extract the neural network model parameters.
Step 3: Specify the initial sequence of future control inputs $U^{(0)}(k)$ and initial predicted outputs $\hat{Y}(k)$. Evaluate $\hat{J}(k,U(k))$ in (4.2), using the neural network model to compute the system outputs.
Step 4: Compute and filter the desired reference signal using (4.1).
Step 5: While $\tau \le U_{iter}$, do
Step 6: Compute $G[U^{(\tau)}(k)]$ in (4.53) and $H[U^{(\tau)}(k)]$ in (4.54).
Step 7: Compute the Cholesky factorization of $V^{(\tau)}(k)$ in (4.71) using the algorithm of Table 4.1.
Step 8: Determine the search direction $\zeta^{(\tau)}(k)$ using (4.73).
Step 9: Evaluate $\hat{J}(k, U^{(\tau)}(k) + \zeta^{(\tau)}(k))$ and compute $\varpi^{(\tau)}$ using (4.76).
Step 10: Update $\lambda^{(\tau)}$ according to the conditions on $\varpi^{(\tau)}$:
 (a) If $\varpi^{(\tau)} > 0.75$, then $\lambda^{(\tau)} \leftarrow 0.5\,\lambda^{(\tau)}$ and Go To Step 11.
 (b) If $\varpi^{(\tau)} < 0.25$, then $\lambda^{(\tau)} \leftarrow 2\,\lambda^{(\tau)}$ and Go To Step 11.
Step 11: Evaluate (4.74) using the criterion (4.2) subject to the constraints in (4.4).
 If $\|U^{(\tau)}(k) - U^{(\tau-1)}(k)\| \le \delta^{(\tau)}$ in (4.75) and $\varpi^{(\tau)} > 0$ in (4.76) and $\tau \le U_{iter}$ and $\lambda^{(\tau)} \le 10^3$,
  Accept $U^{(\tau)}(k)$ in (4.74), update $U(k) \leftarrow U^{(\tau)}(k) + \zeta^{(\tau)}(k)$, and Go To Step 12;
 Else set $\tau = \tau + 1$, $U^{(\tau)}(k) \leftarrow U^{(\tau-1)}(k) + \zeta^{(\tau-1)}(k)$, $\lambda^{(\tau)} \leftarrow \lambda^{(\tau-1)}$, and Go To Step 5.
Step 12: Accept the sequence of the optimal control signal $U(k) = U^{(\tau)}(k)$ in (4.51) subject to the input and output constraints in (4.4).
* This algorithm is implemented at each sampling instant to determine the control signal.

Here, $\lambda^{(\tau)}$ is adjusted according to the accuracy of the ratio $\varpi^{(\tau)}$ between the actual reduction ($ared$) using (4.2) and the theoretically predicted decrease ($pdec$) of (4.2), using the value found in (4.74) subject to (4.75). The ratio of this accuracy can be defined as:

$\varpi^{(\tau)} = \dfrac{ared}{pdec} = \dfrac{\hat{J}\big(U^{(\tau)}(k)\big) - \hat{J}\big(U^{(\tau)}(k) + \zeta^{(\tau)}(k)\big)}{\lambda^{(\tau)}\big(\zeta^{(\tau)}(k)\big)^T\zeta^{(\tau)}(k) - \big(\zeta^{(\tau)}(k)\big)^T G[U^{(\tau)}(k)]} \qquad (4.76)$

Now, if the ratio is close to one, this is an indication that the Hessian matrix is symmetric and the search direction is a descent direction, and $\lambda^{(\tau)}$ should be reduced by some factor, thereby indirectly increasing the trust region. On the other hand, if $\varpi^{(\tau)}$ is much less than one or negative, then $\lambda^{(\tau)}$ should be increased by some factor, thereby indirectly reducing the trust region.
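The following MATLAB fragment (illustrative, with placeholder values for the criterion evaluations and the quantities of (4.52)-(4.53)) shows this accuracy-ratio update in the form used in Step 10 of Table 4.2:

Jold = 1.00;  Jnew = 0.80;                   % J before/after the trial step (placeholders)
lambda = 0.1;  zeta = [0.2; -0.1];  Grad = [-1; 0.5];
ared = Jold - Jnew;                          % actual reduction of (4.2)
pdec = lambda*(zeta'*zeta) - zeta'*Grad;     % predicted decrease, denominator of (4.76)
w = ared/pdec;                               % accuracy ratio varpi
if w > 0.75
    lambda = 0.5*lambda;                     % good agreement: enlarge trust region
elseif w < 0.25
    lambda = 2*lambda;                       % poor agreement: shrink trust region
end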


Thus, the NAMPC algorithm executed at each sampling instant $k$ can be summarized in the stepwise procedure given in Table 4.2. How the parameter $\lambda^{(\tau)}$ should be reduced or increased depends on the criterion. Suppose that the search direction found within the loop of Step 5 does not lead to a reduction in the criterion; then $\varpi^{(\tau)} < 0$ and consequently the inequality in Step 10(b) is satisfied. Thus, the parameter $\lambda^{(\tau)}$ should be increased until there is a predicted decrease in the criterion. On the other hand, if the search direction reduces the predicted value of the criterion sufficiently, then $\varpi^{(\tau)} > 0$ and consequently the inequality in Step 10(a) is satisfied, and the parameter $\lambda^{(\tau)}$ is decreased. The main drawback here is that the search direction must be recomputed each time $\lambda^{(\tau)}$ is reduced or increased before the criterion can be evaluated, and hence a significant amount of computation is required by this process.

4.5 Tuning the Neural Network-Based Adaptive Model Predictive Controllers

After the design and possible implementation of the controllers developed in the two preceding sub-sections, the next issue concerns the practical use of these controllers, which involves the selection of the design parameters $N_1$, $N_2$, $N_u$, $\kappa$ and $\rho$ through a tuning process. In addition to these parameters, the two parameters ($A_m(z^{-1})$ and $B_m(z^{-1})$) of the first-order low-pass digital filter are also considered here as tuning parameters, since they are used to calculate and filter the desired reference signals and thus influence the stability and robustness of the two controllers. The two horizons $N_1$ and $N_2$ have a substantial impact on the time needed to compute the control inputs. Unlike the AGPC, the NAMPC computation time is also influenced by the choice of the additional design parameters $\delta$ and $U_{iter}$, as well as the initial value of $\lambda$. The initial values of the control inputs $U^{(0)}$ and predicted outputs $Y^{(0)}$ also influence the convergence rate of the two control algorithms.

Several research reports on MPC explain different routines and techniques for tuning MPC controllers depending on the control objectives (disturbance and/or noise rejection, stability, robustness) ([Clarke et al., 1987a], [Clarke et al., 1987b], [Nørgaard et al., 2000], [Rossiter, 2004], and [Soeterboek, 1992]). In the following, the conceptual guidelines for tuning the two controllers developed in the two preceding sub-sections are highlighted and discussed.

(i) Minimum prediction horizon, $N_1$:

This parameter is always selected to match the model time delay ($d$). It must not be chosen smaller, because the first $d-1$ predictions depend upon past control inputs only and thus cannot be influenced. On the other hand, choosing it bigger can lead to quite unpredictable results.


(ii) Maximum prediction horizon, $N_2$:

To ensure stabilization of systems with an unstable inverse, it should be at least as many time steps as there are past control inputs fed into the network model. Usually it is selected a bit longer, close to the rise time of the system response (if the system is stable). However, it is often not possible to choose it that long, because the optimization problem would become too computationally demanding compared with the selected sampling period.

(iii) Control horizon, $N_u$:

In the linear case, it is easy to select this parameter to be equal to or to exceed the number of unstable or poorly damped poles. However, in the nonlinear case the computational burden increases tremendously as $N_u$ is increased, which argues against a long control horizon. Unless the sampling period is very long, it is always good practice to use the smallest viable value.

(iv) The penalty factors on the control input $\rho$ and predicted output $\kappa$:

For reasons of numerical robustness, these should be selected with $\kappa > 0$ to prevent the Hessian from becoming singular [Soeterboek, 1992]. However, $\rho$ is primarily used for controlling the magnitude and smoothness of the control signal, and in practice it can be selected from simulation studies.

Note that the tuning rules described above are valid for both the NAMPC and the AGPC cases, but the computational demands associated with selecting long prediction and control horizons may be significantly less in the AGPC case. However, for systems where the nonlinearities are not necessarily smooth, selecting a long prediction horizon is unreasonable: a remote future prediction will be completely unreliable, and using it to calculate the present control action may be meaningless.


CHAPTER FIVE

DEVELOPMENT OF REAL-TIME IMPLEMENTATION PLATFORMS FOR

THE NEURAL NETWORK-BASED NONLINEAR MODEL IDENTIFICATION

AND ADAPTIVE MODEL PREDICTIVE CONTROL ALGORITHMS

5.1 Introduction

In order to demonstrate the feasibility of the proposed identification and control strategies in an industrial environment, an industrial network which utilizes the service-oriented architecture (SOA) technology based on the device profile for web services (DPWS) and a real-time embedded system development based on a field programmable gate array (FPGA) are considered.

Systems that utilize networks for communication between industrial systems and controllers are called network control systems (NCS). In this way, reduction of wiring and ease of maintenance are achieved. However, an appropriate SOA technology for NCS must provide a bounded transmission delay and interoperability between the different components of the NCS. Several efforts have been made to embed a SOA technology into industrial control loops ([Chow and Tipsuwan, 2001]; [IEC, 1999]; [Lee et al., 2006]), but none of them can provide both of the desired characteristics. Next, it is described how the proposed SOA based on DPWS is implemented in this work and how this industrial network fulfills the aforementioned real-time characteristics for industrial network control systems.

While the above SOA based on DPWS may be feasible for systems with long sampling intervals, real-time embedded processor systems that can implement part of, or an entire, algorithm to control systems with relatively short sampling times are also introduced. The real-time embedded processor systems are based on the field programmable gate array (FPGA) with two choices of embedded processors. The first is the IBM PowerPC™440 embedded hard processor core and the second is the Xilinx MicroBlaze™ embedded soft processor core. Both processors are realized on the Xilinx Virtex-5 FX70T ML507 FPGA development board.

An overview of the embedded processor systems and their design considerations is presented. The overview explains why embedded processor systems are necessary, gives some advantages and disadvantages of FPGA embedded processor systems, and presents the industry-standard benchmark for evaluating the performance of a typical embedded processor in an FPGA. Several strategies are proposed to achieve enhanced performance from an FPGA embedded processor system, such as compiler optimization, choices of memory types and peripherals, logic optimization and reduction, hardware co-processing, etc.

Furthermore, detailed techniques for the PowerPC™440 and MicroBlaze™ embedded processor system design and testing are presented. The implementation and evaluation of the Dhrystone FPGA performance benchmark on the MicroBlaze™ embedded processor system is demonstrated and a comparison is made with the Xilinx benchmark results. Lastly, the embedded processor systems' synthesis results and device utilizations are compared.

5.2 The Description of the Proposed Network Control System (NCS)

The general structure of the NCS used in this work is shown in Fig. 5.1. Every transmission medium in this system constitutes the proposed SOA based on DPWS, which consists of two levels: the device and the cell level, based on the architecture presented in [Lee et al., 2006]. The transmission medium between any two of these levels of the automation system is considered to have the switched Ethernet architecture. In Fig. 5.1, q is the number of sensors defining the outputs of a process and p is the number of actuators denoting the control inputs to the process. The control system (identification and control algorithms) is located at the cell level, while the q sensors and p actuators are at the device level. The plant and enterprise levels comprise either the enterprise resource planning system or the operations management and supervisory level of an industrial process. Next, it is explained how this industrial network offers a bounded transmission delay.

5.2.1 Bounded Transmission Delay

The studies presented in [Chow and Tipsuwan, 2001] and [IEC, 1999] provide industrial solutions for satisfying real-time requirements. On the contrary, the work presented in [Lee et al., 2006] offers a simple and accepted solution for connecting devices within an industrial network. The switched Ethernet is chosen as the transmission medium. This medium eliminates frame collisions, uses inexpensive and widely accepted technology, and provides a predictable transmission bound as long as overflow events do not occur in the switches [Decotignie, 2005]. As documented in [Lee et al., 2006], the switched Ethernet architecture can be used throughout the architecture of an automation system. In this way, interoperability is provided at the network interface level, i.e. devices use the same medium access control (MAC) and physical layer (PHY) interfaces for connecting with each other.

The proposed SOA based on DPWS uses the same architecture, and the worst-case transmission delay of a data frame transmitted from the device level to the control system is observed when all the q sensors and p actuators transmit data simultaneously. This delay is defined as follows:

$D_{w\_c(1)} = D_{s\_p} + D_{trans1} + D_{t\_p} \qquad (5.1)$

where $D_{s\_p}$ is the processing transmission delay at the sensors and actuators, $D_{trans1}$ is the transmission delay (i.e. delay in queue plus delay in the network) and $D_{t\_p}$ is the frame reception delay at the control system. When a


TCP connection is established, its delay must also be taken into account. This connection is established by the exchange of a CONNECTION-REQUEST message and a CONNECTION-ACCEPTED segment, as documented in [Tanenbaum, 1996]. Again, the worst-case TCP connection establishment delay is observed when all the devices located at the device level request such a connection simultaneously. Therefore the worst-case CONNECTION-REQUEST delay from the device level to the control system is:

$D_{w\_c(1)\_tcp\_request} = D_{s\_p\_tcp} + D_{trans\_tcp1} + D_{t\_p\_tcp} \qquad (5.2)$

where $D_{s\_p\_tcp}$ is the TCP processing transmission delay at the sensors and actuators, while $D_{t\_p\_tcp}$ is the TCP processing reception delay at the control system. These delays correspond to the flow of data from the transport layer to the PHY layer and vice versa. $D_{trans\_tcp1}$ is the transmission delay of the TCP request segment transmitted from the device level to the control system. The worst-case CONNECTION-ACCEPTED delay ($D_{w\_c(1)\_tcp\_accept}$) is the delay experienced by the last CONNECTION-ACCEPTED segment sent by the control system to the device level

[Fig. 5.1: The proposed network control system (NCS) architecture. The industrial plant, with its q sensors and p actuators and subject to disturbances d, exposes each sensor and actuator through a DPWS server; these are connected via an Ethernet switch to the control computer, which runs the identification and adaptive control algorithms behind a DPWS client interface.]


and is the same as $D_{w\_c(1)\_tcp\_request}$. Now, the worst-case transmission delay that a TCP data segment experiences when it is transmitted from the device level to the control system is defined by using (5.1) and (5.2) as:

$D_{w\_c\_data(1)} = D_{w\_c(1)} + D_{w\_c(1)\_tcp\_request} + D_{w\_c(1)\_tcp\_accept} = D_{s\_p} + D_{t\_p} + 2\times(D_{s\_p\_tcp} + D_{t\_p\_tcp}) + D_{trans1} + 2\times D_{trans\_tcp1} \qquad (5.3)$

Let $D_{pr1} = D_{s\_p} + D_{t\_p} + 2\times(D_{s\_p\_tcp} + D_{t\_p\_tcp})$ be the overall processing delay that the TCP data segment experiences when it is sent from the device level to the control system, and $D_{tr1} = D_{trans1} + 2\times D_{trans\_tcp1}$ the overall transmission delay that a TCP data segment experiences along the same path. Then (5.3) becomes:

$D_{w\_c\_data(1)} = D_{pr1} + D_{tr1} \qquad (5.4)$

The worst-case transmission delay of a frame transmitted by the control system to the device level is the delay that the last frame in the control system queue experiences and can be calculated as:

$D_{w\_c(2)} = D_c + D_{cs\_p} + D_{trans2} + D_{ct\_p}$

where $D_c$ is the computational time that the identification and control algorithms need for computing the control input signals to the process, $D_{cs\_p}$ is the processing transmission delay at the control system and $D_{ct\_p}$ is the frame reception delay at the device level. $D_{trans2}$ is the transmission delay of a frame sent from the control system to the device level. When a TCP connection is established, the TCP connection establishment delay must be taken into account too. Following the same approach as presented previously, the worst-case transmission delay that a TCP data segment experiences when sent from the control system to the device level is defined as:

[Fig. 5.2: Structure of a SOAP message. The schematic view shows a SOAP Envelope enclosing a SOAP Header with header blocks and a SOAP Body with the message body; the XML representation is:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header> .... </env:Header>
  <env:Body> .... </env:Body>
</env:Envelope>]


$D_{w\_c\_data(2)} = D_{w\_c(2)} + D_{w\_c(2)\_tcp\_request} + D_{w\_c(2)\_tcp\_accept} = D_c + D_{cs\_p} + D_{ct\_p} + 2\times(D_{s\_p\_tcp} + D_{t\_p\_tcp}) + D_{trans2} + 2\times D_{trans\_tcp2} \qquad (5.5)$

where $D_{trans\_tcp2}$ is the transmission delay of the TCP request segment transmitted from the control system to the device level. Let $D_{pr2} = D_c + D_{cs\_p} + D_{ct\_p} + 2\times(D_{s\_p\_tcp} + D_{t\_p\_tcp})$ be the overall processing delay a TCP data segment experiences when transmitted by the control system to the device level, and $D_{tr2} = D_{trans2} + 2\times D_{trans\_tcp2}$ the overall transmission delay a TCP data segment experiences along the same path. Then (5.5) becomes:

$D_{w\_c\_data(2)} = D_{pr2} + D_{tr2} \qquad (5.6)$

The combination of (5.4) and (5.6) determines the worst-case overall control loop delay of the proposed SOA based on DPWS:

$D_{w\_c\_data} = D_{w\_c\_data(1)} + D_{w\_c\_data(2)} = D_{pr1} + D_{pr2} + D_{tr1} + D_{tr2} \qquad (5.7)$

From the above it is understood that the proposed SOA based on DPWS can be used in an industrial application if data is transmitted over a TCP connection and the real-time requirement is below $D_{w\_c\_data}$, as this is the bounded delay that it can offer.
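As a simple numerical illustration of this bound, the following MATLAB fragment evaluates (5.4), (5.6) and (5.7) from assumed component delays; all values are hypothetical placeholders in milliseconds, not measurements from the proposed NCS:

Ds_p = 0.4;  Dt_p = 0.3;  Ds_p_tcp = 0.2;  Dt_p_tcp = 0.2;   % processing delays
Dc = 5.0;  Dcs_p = 0.4;  Dct_p = 0.3;        % computation and control-side delays
Dtrans1 = 1.0;  Dtrans_tcp1 = 0.8;           % device -> control system
Dtrans2 = 1.0;  Dtrans_tcp2 = 0.8;           % control system -> device
Dpr1 = Ds_p + Dt_p + 2*(Ds_p_tcp + Dt_p_tcp);          % processing part of (5.4)
Dtr1 = Dtrans1 + 2*Dtrans_tcp1;                        % transmission part of (5.4)
Dpr2 = Dc + Dcs_p + Dct_p + 2*(Ds_p_tcp + Dt_p_tcp);   % processing part of (5.6)
Dtr2 = Dtrans2 + 2*Dtrans_tcp2;                        % transmission part of (5.6)
Dwc  = Dpr1 + Dpr2 + Dtr1 + Dtr2;                      % overall bound, Eq. (5.7)
% The NCS can be considered real-time only if Dwc is less than the sampling period.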

5.2.2 Interoperability at the Application Level

Up to this point, interoperability is provided at the network interface level by the proposed SOA based on DPWS. In order to enable every component in the proposed NCS to interact with any other node regardless of the language or implementation platform, the interoperability feature must be provided at the application level too. Therefore DPWS must be adopted by all the components of the proposed SOA, as it provides the aforementioned interoperability, as explained in Section 2.7.2.3.

All the components in the proposed SOA based on DPWS conform to the DPWS specification implemented on top of the switched Ethernet architecture. The sensors and actuators have DPWS server interfaces and so are DPWS servers, while the control system has a DPWS client interface and is therefore a DPWS client, as shown in Fig. 5.1. As claimed in [Jammes and Smit, 2005a], the device-level SOA interaction patterns can be categorized according to six levels of functionality: addressing, discovery, description, control, eventing and presentation. After the discovery phase, where the DPWS client has discovered the sensors and actuators, it subscribes to their events by publishing the required sampling period using the eventing-level interaction. Every DPWS server assumes that whenever this time expires there is a change in its state, and so it informs the DPWS client of the new values using the WS-Eventing protocol. Moreover, the DPWS client informs the actuators of the new control signals as soon as the control algorithms have finished their execution, by using the control-level interaction. The network can be considered real-time only if the worst-case overall control loop delay is bounded and less than the sampling period. Lastly, Fig. 5.2 illustrates the structure that all the exchanged


messages have in the proposed NCS. Since in this network all the components conform to DPWS implemented on top of the Ethernet specification, all the exchanged messages have the SOAP structure. The root element of a SOAP message is the Envelope. It encloses one or two child elements: an optional Header and a Body. The Header element carries information that does not directly belong to the payload of the message, while the Body element contains the actual payload of the message. Finally, a namespace is used in the XML representation to ensure unambiguous data formats.

5.3 The Development of Real-Time Embedded Processor System Platform

This sub-section considers the development of embedded processors on a field programmable gate array (FPGA), notably the Xilinx MicroBlaze™ (MB) and IBM PowerPC™440 embedded processor systems, using the Xilinx embedded development kit (EDK) design tools, which incorporate the Xilinx platform studio (XPS) and the Xilinx software development kit (SDK). Next, the performances of the two embedded processor systems are investigated and a choice is made as to which is suitable for this work. The embedded processor systems are designed using the XPS via the ISE™, while the peripherals and memories of the processors are tested on the Virtex-5 FXT ML507 FPGA board via the Xilinx SDK.

5.3.1 Overview of Embedded Processor Systems and Design Considerations

5.3.1.1 Why Embed a Processor Inside an FPGA?

Embedding a processor inside an FPGA has many advantages. Specific peripherals can be chosen to improve performance based on the application, with unique user-defined peripherals being easily attached. Likewise, large banks of external memory can be connected to the FPGA and accessed by the embedded processor system using included memory controllers. A variety of memory controllers enhance the FPGA embedded processor system's interface capabilities. FPGA embedded processors use general-purpose FPGA logic to construct internal memory, processor buses, internal peripherals, and external peripheral controllers, including external memory controllers. As more buses, memory controllers, peripherals and peripheral controllers are added to the embedded processor system, the system becomes increasingly more powerful and useful.

However, it is worth noting that adding large banks of external memory may increase the latency of access to this memory and may have a negative impact on performance. In addition, adding many peripherals and memories, as well as their respective controllers, may reduce performance and increase the embedded system cost, since they consume FPGA resources.

FPGA manufacturers often publish embedded processor performance benchmarks. The manufacturers obviously know what must be done in order to get the best out of their FPGAs for each specific benchmark, and they take full advantage of every possible enhancement strategy when benchmarking. A clue to these strategies is that the FPGA embedded processor system constructed to run the benchmark has very few peripherals and runs exclusively from internal memory. However, no easy formula or chart exists that shows how to compare the performance and cost of different memory strategies and peripheral sets. The usual performance benchmark is the Dhrystone benchmark implementation, used to evaluate the Dhrystone million instructions per second (DMIPS) performance, measured in terms of the maximum FPGA operating frequency (fmax) in MHz. It is then left to the users of such FPGAs to achieve the frequency and DMIPS figures set out by the manufacturers.

5.3.1.2 Some Advantages and Disadvantages of FPGA Embedded Processor Systems

Embedded systems are normally defined as software implemented in hardware in order to realize specified real-time functionalities. The commonly used processing hardware includes microcontrollers, microprocessors, FPGAs, digital signal processors (DSPs), and application-specific integrated circuits (ASICs), each of which has its own properties. Although FPGA hardware technologies have attracted ever-increasing interest and have significantly disrupted embedded system design technologies, it is worth considering some advantages and disadvantages that may be derived or incurred by the use of FPGA embedded technologies.

Here, some advantages of an FPGA embedded processor system when compared to an off-the-shelf processor are summarized in the following:

1) Hardware Acceleration: The most compelling reason for an FPGA embedded processor is the ability to make trade-offs between hardware and software to maximize efficiency and performance. Suppose an algorithm is identified as a bottleneck; a custom co-processor can then be designed in the FPGA specifically for that algorithm. This co-processor can be attached to the FPGA embedded processor as a co-processing engine through special, low-latency channels, and custom instructions can be defined to invoke the co-processor.

2) Peripheral Customization: An FPGA embedded processor-based system offers complete flexibility in the selection of any combination of peripherals and controllers. In fact, new unique peripherals can be designed and connected directly to the processor's bus, with the assumption that there are no standard requirements for the peripherals.

3) Component and Cost Reduction: With the versatility of the FPGA embedded processor, a previous system that required multiple components can be replaced with a single FPGA, such as in the case when an auxiliary input/output (I/O) chip or a co-processor is required next to an off-the-shelf processor. Reducing the component count in the design yields a reduced board size and simplified inventory management, both of which can save design time-to-market and cost.

4) Component Obsolescence Mitigation: Obsolescence mitigation is a difficult issue when a design requirement must ensure a product lifespan much longer than the typical lifespan of a standard electronics product. In this case, FPGA embedded soft processors can be an excellent solution, since the HDL source code for the soft processor can be purchased and owned, thereby guaranteeing the lifespan of the product.

Additionally, some disadvantages and challenges of an FPGA embedded processor system when compared to an

off-the-shelf processor are discussed in the following.

First, it is worth noting that the FPGA embedded processor is not without disadvantages. When compared to an off-the-shelf processor, the hardware platform for the FPGA embedded processor must be designed as described above, which is a challenging hardware-software co-design task. Because of the integration of the hardware and software platforms, the design tools are more complex, especially when a co-processing custom peripheral is involved. The increased tool complexity and design methodologies require that critical decisions be made and adequate attention be invested.

Next, since FPGA embedded processor software design is still relatively new compared to software design for standard processors, the software design tools are likewise relatively immature, although workable despite several challenges.

Finally, in terms of design cost, if the desired task can be achieved with a standard off-the-shelf processor that is less expensive than the FPGA, the off-the-shelf processor is the more economical choice; conversely, when a large FPGA is already part of the design, implementing the embedded processor in otherwise unused gates makes its incremental cost inconsequential.

5.3.1.3 Xilinx’s Embedded Hard PowerPC™440 and MicroBlaze Soft Processors

A processor built from dedicated silicon is referred to as a “hard” processor, such as the IBM PowerPC™440 embedded processor core inside the Xilinx Virtex-5 FXT family of FPGAs. On the other hand, a “soft” processor is built using the FPGA’s general-purpose logic, such as the Xilinx MicroBlaze™ embedded processor core, available as an intellectual property (IP) core for implementation in several Xilinx series of FPGAs. The soft processor is typically described in a hardware description language (HDL) or netlist. Unlike the hard processor, the soft processor must be synthesized and fitted into the FPGA fabric. In both hard and soft processor systems, the local memory, internal peripherals, peripheral and memory controllers, and processor buses must be built from the FPGA’s general-purpose logic.


5.3.1.4 Standard Industry Benchmark for FPGA Embedded Processors and Xilinx’s

FPGA Embedded Processors Benchmark Performances

The industry standard benchmark for FPGA embedded processors is Dhrystone MIPs (DMIPs). Xilinx quotes DMIPs for almost all of its available embedded processors, including the MicroBlaze™ and PowerPC™440 embedded processors. The maximum operating frequency and DMIPs achievable from the Virtex-5 FXT family of FPGAs, as quoted by Xilinx for the MicroBlaze™, are 210 MHz and 240 DMIPs respectively. The corresponding figures for the PowerPC™440 are 550 MHz and 1,100 DMIPs for a single-processor system; according to Xilinx, with dual embedded PowerPC™ processors this performance doubles to an aggregate of 1,100 MHz and 2,200 DMIPs. The DMIPs reported by Xilinx rely on several factors that maximize the benchmark results. Such factors include: 1) optimal compiler optimization level; 2) fastest available device family; 3) fastest speed grade in that device family; 4) executing from the fastest and lowest-latency memory, which is typically an on-chip memory; 5) optimization of the embedded processor’s parameterizable features; and so on.

In fact, the FPGA manufacturer, which in this case is Xilinx, knows what must be done to get the most out of its FPGAs, and it takes full advantage of every possible enhancement technique when benchmarking. Thus, it is also necessary to employ the best enhancement techniques in the embedded processor design proposed in this work as much as possible, although the task is complicated.

5.3.1.5 Design Considerations for the Proposed FPGA Embedded Processor System

The Xilinx base system builder (BSB) wizard provides an efficient way to create an FPGA embedded processor system; however, the choices of memory types, memory controllers, peripherals, peripheral controllers, size and type of instruction and data cache memories, optimization levels, processor clock frequency and size of local memory must still be made by the designer. The discussions here are specific to the peripherals that may be considered for the design of the proposed FPGA embedded processor systems in order to achieve the following design objectives: high performance and optimized speed in terms of operating frequency, at reduced cost in terms of FPGA fabric resource consumption.

To be more specific, the proposed FPGA embedded processor system will incorporate a co-processing system attached to the processor local bus (PLB), for which a memory and a memory controller are required. Because instructions and data will be read in and written out, the sizes of the instruction and data cache memories and the peripherals, together with their respective controllers, must be configured. The initialization of the processor programs also needs memory and memory controllers. The universal asynchronous receiver and transmitter (UART) and joint test action group (JTAG) ports are required, and the UART must also be configured properly for communication. During synthesis, simulation and compilation of the embedded processor system, an appropriate optimization scheme must be selected to achieve the above design objectives. While the processor timer is internal, the clock and reset are external. Among the other memories, peripherals and their respective controllers, the most important consideration is whether interrupt and debug logic controllers will be required. These issues and other critical considerations for the embedded processor system design are considered in the following.

5.3.1.5.1 Compiler Optimization and Parameters

Compiler optimizations are available in the Xilinx platform studio (XPS) based on the GNU compiler collection (GCC). These compilers have several levels of optimization, including Levels 0, 1, 2, and 3, as well as a size reduction optimization. The strategies for these different optimization levels are given below:


Level 0: This level does not apply any optimization to the design compilation.

Level 1: This is the first and lowest level of optimization, designated Low (-O1); it performs jump and pop optimizations.

Level 2: This is the second level of optimization, designated Medium (-O2). This level activates nearly all optimizations that do not involve a speed-space trade-off, so the executables do not increase in size. The compiler does not perform loop unrolling, function in-lining or strict aliasing optimizations. This is the standard level that can be used for program deployment.

Level 3: This is the highest level of optimization, designated High (-O3). This level adds more expensive options, including those that increase code size. In some cases, this optimization level actually produces code that is less efficient than that of Level 2 and, as such, it should be used with caution.

Size Optimized (-Os): This option produces the smallest possible code size.

Note, in general, that when any of the optimization levels is used together with the debug option, the debug information obtained may not correlate with the generated source code.
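To make the speed-space trade-off between these levels concrete, consider the minimal C sketch below. The function is hypothetical, and the comments describe the typical behaviour of GCC at -O2 versus -O3; the exact outcome for the XPS compilers is not guaranteed.

    /* Hypothetical example of the -O2 versus -O3 trade-off: at -O2
     * this loop is normally left rolled (no code-size growth); at
     * -O3 the compiler may unroll it, which enlarges the executable
     * and can even run slower than -O2 under instruction-cache
     * pressure. */
    int sum_samples(const int *buf, int n)
    {
        int i, acc = 0;
        for (i = 0; i < n; i++)
            acc += buf[i];
        return acc;
    }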

5.3.1.5.2 Memory Types

The FPGA embedded processor provides access to fast, local memory as well as an interface to slower, external memory. The way the memory is used has a significant effect on performance, and the memory usage can be manipulated using the linker script.
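As a sketch of how the linker script can steer memory usage, the GCC section attribute below assigns a data table and a function to named output sections. The section names .bram_data and .bram_text are assumptions; they would have to match SECTIONS entries in the project’s linker script, with unlabelled code and data falling through to whatever region the script assigns by default.

    /* Hypothetical placement directives: .bram_data and .bram_text
     * are assumed section names that the linker script must map to
     * local BRAM. */
    int coeff_table[256] __attribute__((section(".bram_data")));

    void __attribute__((section(".bram_text"))) fir_step(void)
    {
        /* time-critical code intended to execute from local memory */
    }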

Local Memory Only: The local memory provides the fastest option for accessing memory. Xilinx FPGA local memory is made up of large FPGA memory blocks called BlockRAM (BRAM). The embedded processor accesses BRAM in a single bus cycle. Since the processor and the bus run at the same frequency in MicroBlaze™, instructions stored in BRAM are executed at the full MicroBlaze™ processor frequency. In the MicroBlaze™ processor system, BRAM is essentially equivalent in performance to a Level 1 (L1) cache. On the other hand, the PowerPC™ can run at frequencies greater than the bus and has a true built-in L1 cache. Therefore, BRAM in a PowerPC™ processor system is equivalent in performance to a Level 2 (L2) cache. Thus, if the program for a particular embedded processor system design fits entirely within the local memory, the design is likely to achieve optimal memory performance, although it is most likely that the embedded programs will exceed the local memory capacity.

External Memory Only: Xilinx FPGAs provide several memory controllers that interface with a variety of external memory devices. These memory controllers are connected to the processor’s peripheral bus. Three types of volatile memory are supported by Xilinx FPGAs: static RAM (SRAM), single-data-rate SDRAM, and double-data-rate (DDR) SDRAM. The SRAM controller is the smallest and simplest inside the FPGA, while SRAM is the most expensive of the three memory types. The DDR SDRAM controller is the largest and most expensive inside the FPGA, but it requires fewer FPGA input-output (I/O) ports and the memory is the least expensive per megabyte.

In addition to the memory access time, the peripheral bus also incurs some latency. In MicroBlaze™, for example, the memory controllers are attached to the on-chip peripheral bus (OPB). The OPB SDRAM controller requires about eight to ten cycles of latency for a read operation and four to six cycles of latency for a write operation, depending on the clock frequency. Thus, it is obvious that the worst possible program performance would be achieved by having the entire program reside in external memory. Since optimizing execution speed is a typical goal in embedded processor system design, an entire program should rarely be targeted solely at external memory.

Instruction and Data Cache Memory: The PowerPC™ in Xilinx FPGAs has instruction and data caches built directly into the silicon of the hard processor. Enabling this cache is almost always a performance advantage for the PowerPC™ [Fletcher, 2005]. On the other hand, the MicroBlaze™ cache architecture is not implemented in dedicated silicon; rather, the instruction and data cache controllers are selectable parameters in the MicroBlaze™ configuration. When these controllers are included, the cache memory is built from BRAM. Therefore, enabling the cache is likely to consume more BRAM than local memory for the same storage size, because the cache architecture requires address line tag storage. Additionally, enabling the cache may also consume general-purpose FPGA logic to build the cache controllers. The consequences are that the achievable system frequency may be reduced when the cache is enabled, as more logic is added and the complexity of the design may increase during the FPGA place and route operation. Despite these consequences, enabling the MicroBlaze™ cache, especially the instruction cache, may improve performance even when the system runs at a lower frequency. Finally, enabling the cache memory is always worth an experiment to evaluate the different trade-offs.
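Such an experiment is straightforward on the software side, because the Xilinx standalone board support package exposes simple cache-control macros for the MicroBlaze™. The sketch below is a minimal illustration under that assumption; the timing of the workload is left to the application.

    #include "mb_interface.h"

    /* Minimal cache experiment sketch (assumes the Xilinx standalone
     * BSP for MicroBlaze): run the same workload with the caches
     * disabled and then enabled, timing each pass externally. */
    void run_cache_experiment(void (*workload)(void))
    {
        microblaze_disable_icache();
        microblaze_disable_dcache();
        workload();                 /* timed pass without caches */

        microblaze_enable_icache();
        microblaze_enable_dcache();
        workload();                 /* timed pass with caches    */
    }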


Combination of Internal, External and Cache Memory: As discussed earlier, the memory architecture that provides the best performance is one that uses only local memory. However, this architecture may not always be practical, since many useful and efficient embedded programs exceed the available local memory capacity. On the other hand, running exclusively from external memory may incur more than an eight-times performance disadvantage due to the peripheral bus latency.

Caching the external memory is an excellent choice for embedded PowerPC™ processor systems. For embedded MicroBlaze™ processor systems, perhaps the optimal memory configuration is to wisely partition the program code, maximizing the system frequency and local memory size. Critical data, instructions and the stack can also be placed in local memory. The data cache may be left unused so as to allow for a larger local memory bank. If the local memory is not large enough, the instruction cache can be enabled for the address range in the external memory used for instructions. By not consuming BRAM for a data cache, the local memory can be enlarged to contain more space. An instruction cache for the instructions assigned to external memory could be very effective. Alternatively, experimentation or profiling could show which code fragments are most heavily accessed, and assigning these fragments to local memory could provide a greater performance improvement than caching.

5.3.1.5.3 Optimization Specific to an FPGA Embedded Processor

Since one of the objectives of the proposed embedded processor system design using the Xilinx Virtex-5 FX70T FPGA is to improve the performance of the hardware, additional techniques must be exploited to achieve this objective. Given the fact that the FPGA embedded processor resides next to additional FPGA hardware resources, one such technique is to consider a custom co-processor designed specifically to target the implementation of a core algorithm in the design.

Logic Optimization and Reduction: The key point here is that only the peripherals and buses that are necessary and required should be connected. If the intended design will not store and run any instructions using external memory, then connecting the instruction side of the peripheral bus is not necessary. Connecting both the instruction and data sides of the processor to a single bus creates a multi-master system, which requires an arbiter. Optimal bus performance is achieved when a single master resides on the bus.

Furthermore, debug logic requires resources in the FPGA and may be the hardware bottleneck. When the

design is completely debugged, the debug logic can be removed from the final system, which will potentially

increase the system’s performance. For example, in an embedded MicroBlaze processor system with the cache

enabled, the debug logic will typically be the critical path that will slow down the entire design [Fletcher, 2005].


Area and Timing Constraints: The Xilinx FPGA place and route tools, as well as the Xilinx PlanAhead tool, perform much better when the design objectives are well specified. In these Xilinx tools, the desired clock frequency, pin locations, and logic element locations can be specified. By providing these details, the design tools can make efficient, optimized and smarter trade-offs during hardware design implementation. Therefore, a careful study of the datasheets for each peripheral, together with the design guidelines, goes a long way in this regard and is a necessity.

Hardware Acceleration: Dedicated hardware can outperform software dramatically, at the expense of FPGA resources. Therefore, the FPGA’s ability to accelerate processor performance with dedicated hardware should be considered. The embedded MicroBlaze™ processor, for instance, can be customized to use a hardware divider and a hardware barrel shifter rather than performing these functions in software. Although enabling these processor capabilities consumes FPGA resources, the performance improvements can be extraordinary.

Co-Processing Hardware: Custom hardware logic can be designed to offload an FPGA embedded processor. For example, a software bottleneck identified in an algorithm can be converted into custom hardware. Then, custom software instructions can be defined to operate the hardware co-processor.

Both the MicroBlaze™ and the PowerPC™ include very low-latency interfaces into the processor, which are ideal for connecting custom co-processing hardware. For example, the auxiliary processing unit available in the Virtex-5 FPGA provides a direct connection from the PowerPC™440 to co-processing hardware. In MicroBlaze™, the low-latency interface is called the Fast Simplex Link (FSL) bus, which consists of dedicated channels so that no arbitration or bus mastering is required. This allows an extremely fast interface to the processor, although Xilinx has announced that the FSL will be discontinued in the future [Xilinx, 2010].

Any operation that is algorithmic, mathematical, or parallel is a good candidate for a hardware co-processor, which is the subject of the proposed embedded processor system design in this work. FPGA logic is traded for performance, but the advantages can be enormous and performance can be improved significantly.
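To make the software side of such a co-processor concrete, the sketch below uses the blocking FSL macros provided by the Xilinx standalone BSP; the accelerator assumed to be attached to FSL channel 0 and its one-word-in, one-word-out protocol are illustrative assumptions only.

    #include "fsl.h"

    /* Hypothetical FSL co-processor call: the blocking put macro
     * streams the operand out on channel 0, and the blocking get
     * macro waits for the result word from the same channel. */
    int coproc_transform(int operand)
    {
        int result;
        putfsl(operand, 0);   /* send operand to the co-processor */
        getfsl(result, 0);    /* blocking read of the result word */
        return result;
    }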

5.3.2 The PowerPC™ 440 Embedded Processor System Development Using Xilinx

Integrated Software Environment (ISE) and Xilinx Platform Studio (XPS)

The embedded processor designs considered here follow closely the design considerations outlined and discussed in Section 5.3.1.5. The embedded processor systems design using the IBM PowerPC™440 and the Xilinx MicroBlaze™ cores is instantiated from the Xilinx ISE, which then initializes the XPS where the actual processor system designs are done. The Xilinx ISE is started and the project name is assigned in the “New Project Wizard”. The name assigned here for the PowerPC™ processor system is “emb_ppc440_processor”. The FPGA device family Virtex-5 XC5VFX70T is selected; the speed grade for this device, based on our available Virtex-5 FX70T ML507 FPGA board, is -1 and is specified accordingly, as is the device package FF1136. The Xilinx synthesis tool (XST) is selected as the tool for synthesizing the design, and ModelSim-SE is selected as the simulation tool. The language for the embedded processor system development is VHDL (VHSIC hardware description language). In addition to these selections, the Embedded Processor is also added as a “New Source” in this project wizard. The “emb_ppc440_processor” project summary is shown in Fig. 5.3(a).

(a) New project summary (b) Base System Builder: “Welcome”

(c) Base System Builder: “System” (d) Base System Builder: “Processor”

Fig. 5.3: The Xilinx ISE “New Project Summary” and the BSB Welcome, System, and Processor design stages for the embedded PowerPC™440 processor system.

When the “New Project Wizard” is completed, the ISE initializes and automatically starts up the Xilinx

platform studio (XPS) since it was added as a “New Source”. The XPS in turn initializes and brings up the Base

System Builder (BSB), which is an automated tool that can be used to create an embedded processor system. The processor design based on the BSB is an eight-stage procedure, namely: Welcome, Board, System, Processor, Peripheral, Cache, Application, and Summary.

The “Welcome” stage allows new processor(s) to be designed or an existing pre-designed processor system to be loaded, as shown in Fig. 5.3(b). The “Board” stage allows the FPGA device family and package to be specified, if different from those specified in the “New Project Wizard”. This is sometimes useful if a custom FPGA board different from the pre-configured Xilinx FPGA development boards is used. It is also useful if the processor design was not initialized and started using the Xilinx ISE. The advantages of initializing and starting an embedded processor system design from the ISE are many, as discussed in Appendix A. The “System” stage shown in Fig. 5.3(c) allows a single- or dual-processor system to be specified and designed. The Virtex-5 XC5VFX70T device family currently supports single-processor system designs; thus, a single-processor system is the target in this work. Then, in the “Processor” stage, the choice between a PowerPC™ and a MicroBlaze™ processor is available. In this sub-section, a PowerPC™440 is selected as the intended processor, as shown in Fig. 5.3(d), whereas in the next sub-section the MicroBlaze™ processor will be selected.

The “Peripheral” stage allows different memory types and peripherals to be added to or removed from the proposed embedded processor system. Once a memory or peripheral is selected, the associated controller is automatically added. Furthermore, if the “Interrupt” check box is selected, the interrupt controller is also included, which must be configured in the XPS after the BSB has created the embedded processor system. As discussed under memory types in Section 5.3.1.5.2, as well as under hardware and optimization specific to an FPGA embedded processor in Section 5.3.1.5.3, the choices of memory and hardware peripherals, including their respective controllers, have significant effects on the embedded system’s performance. Here, peripherals that are not needed are removed. The actual size of the embedded program is yet to be known, and this makes the memory configuration difficult to choose. In this regard, the embedded processor local memory is selected first. Next, the external DDR2 SDRAM and the on-board SRAM are added. In this design, the serial port is needed to print all results to the host development computer. Thus, the only peripheral added here is the UART (RS232_Uart_1), and it is configured as follows: Baud Rate = 115200, Data Bits = 8, Parity = None, and the Interrupt is not used (that is, it is left unchecked). The BSB dialog for the “Peripheral” stage and the selected memory types and peripherals is shown in Fig. 5.4(a).

The “Cache” stage allows the instruction and data cache memory types and controllers to be enabled. As mentioned earlier, the PowerPC™440 embedded in the Virtex-5 series of FPGAs provides 32-KB caches which are built directly into the silicon of the hard PowerPC™440 core. Normally, these caches are enabled in software and can be configured to cache multiple memory regions. Here, both the instruction and data cache memory types are enabled, although this can also be configured in the software design part of the embedded processor system implementation using the Xilinx SDK. The “Application” stage lists the readily available applications to be implemented by the embedded processor system; the applications are usually written in the C programming language and can include user applications. The default Xilinx applications available under “Application” are the “Memory” and “Peripheral” test programs shown in Fig. 5.4(b) under the File Location category: “TestApp_Memory_ppc440_0” and “TestApp_Peripheral_ppc440_0”. Note that new software programs can be created and added to this “Application” stage both from the XPS, after the BSB has finished creating the processor, and from the Xilinx SDK during the software design portion of the embedded processor system.

The “Summary” is the last stage of the BSB-guided steps for creating an embedded processor system. This stage lists all the available peripherals associated with the created embedded processor, together with their instance names and base and high addresses, as shown under System Summary in Fig. 5.4(b). The “Summary” stage also lists the major software associated with the processor system, as shown under Overall in the File Location category in Fig. 5.4(b). The components of the previous “Application” stage are also listed in the “Summary” stage dialog window.

(a) Base System Builder: “Peripheral” (b) Base System Builder: “Summary”

Fig. 5.4: The BSB: the Peripheral and Summary design stages for the embedded PowerPC™440 processor system.


Next, the just created PowerPC™440 embedded processor system must be compiled so that all the memory types, peripherals, memory and peripheral driver software and the entire embedded processor system can be updated. The Xilinx ISE and the XPS are used interchangeably to perform these compilations. The compilation procedures are summarized as follows:

1) Starting with the XPS, the board support packages (BSPs) and libraries are generated by selecting “Software → Generate Libraries and BSPs” on the XPS graphical user interface (GUI) shown in Fig. 5.5.

Fig. 5.5: The XPS graphical user interface (GUI) for the creation and initial compilation of the embedded processor system.


2) Next, the Netlist is generated by selecting “Hardware → Generate Netlist”. This stage of the design also generates all the “wrappers”, device drivers, and all the necessary design and technology files that would be required by the ISE for complete synthesis and implementation of the embedded processor system.

3) After the Netlist generation, attention is turned to the Xilinx ISE™. A section of the Xilinx ISE™ graphical user interface (GUI) for the PowerPC™440 embedded processor system design is shown in Fig. 5.6. During the Netlist generation, the user constraints file (UCF) was generated. The UCF file has the project name with a .ucf extension, that is, “emb_ppc440_processor.ucf”, and is always located in the “data” directory in the processor hierarchy. This file defines the constraints on the created processor system together with the input-output (I/O) map of the complete design to the Virtex-5 FX70T FPGA device and the package selected in Fig. 5.3(a). This file is introduced into the processor system by selecting “Project → Add Source” from the ISE GUI of Fig. 5.6, navigating to the “data” directory, and adding “emb_ppc440_processor.ucf”.

4) Next, the programming file (bitstream) for the complete embedded PowerPC™440 processor system is generated by double-clicking the highlighted “Generate Programming File” entry shown in Fig. 5.6. This is the implementation phase of the design, which is discussed in Appendix A–7. The various stages of this implementation are described in the flow of Fig. A.10. As can be seen in Fig. 5.6, the ISE has seven major phases, namely:

Step 1) User Constraints,

Step 2) Synthesize – XST (Xilinx Synthesis Tool),

Step 3) Implemented Design,

Step 4) Generate Programming File,

Step 5) Configure Target Device,

Step 6) Update Bitstream with processor Data, and

Step 7) Analyze Design Using Chipscope.

Double-clicking “Generate Programming File” implements Steps 2), 3) and 4) to generate this file. Note that the XPS generated the UCF, which takes care of Step 1); otherwise, using the Xilinx PlanAhead tool, the UCF would have been created here in Step 1). Because the design is not yet ready for the target Virtex-5 FX70T FPGA, Steps 5), 6), and 7) are not implemented here. The generation of the bitstream completed without errors but with some warnings.

5) Note that the embedded processor design is coordinated by both the Xilinx ISE™ and the XPS. It is observed that, immediately after the generation of the programming file (bitstream), the Xilinx ISE™ indicates on its GUI that the project design is out of date, while the XPS indicates that the project file has changed on disk. Therefore, Steps 1) to 4) are repeated to update the system, after which both notifications disappear.


Fig. 5.6: A section of the Xilinx ISE™ graphical user interface from where the PowerPC™440 embedded processor

system design is instantiated.

In addition to the programming file, an important file called the block memory map (BMM) file, with extension .bmm, is also generated. For the current PowerPC™440 project, this file is edkBmmFile_bd.bmm. The BMM file is a text file that contains syntactic descriptions of how individual block RAMs constitute a contiguous logical data space. The Xilinx Data2MEM tool [XD2M, 2009] uses BMM files to direct the translation of data into the proper initialization form. Note that since a BMM file is a text file, it is directly editable. This file, together with the bitstream and all the generated device drivers, will be required to program the Virtex-5 during the software design portion of the embedded processor system. The BMM file is located in the top-level directory of the processor system together with the bitstream (with extension .bit). The detailed and complete Xilinx ISE™ device utilization and XPS synthesis summary reports for the PowerPC™440 embedded processor system design are given in Appendix A–11.1 and Appendix A–11.2 respectively, for convenience.

Fig. 5.7: The block diagram of the PowerPC™440 embedded processor system with associated memory types, peripherals, clock generator, buses, hardware and software specifications and key/symbols.

6) Since the embedded processor project is now fully updated by both the Xilinx ISE™ and the XPS, attention is again turned to the XPS shown in Fig. 5.5 to perform the following:

1) The block diagram of the complete system is generated by selecting, from the XPS GUI of Fig. 5.5, Project → Generate Block Diagram Image; the resulting diagram is shown in Fig. 5.7.


2) The complete design report is generated by selecting, from the XPS GUI of Fig. 5.5, Project → Generate and View Design Report. This report gives detailed information on the embedded processor system but is not shown in this work since it is more than 200 pages long. It is useful as a reference for accessing the different peripherals, memory types, and memory and peripheral drivers, especially when modifications, addressing and integration of custom hardware are necessary.

3) The designed embedded processor hardware is generated and exported to the Xilinx software development kit (Xilinx SDK) by selecting, from the XPS GUI in Fig. 5.5, Project → Export Hardware Design to SDK. Although the Export dialog box offers two options for exporting the designed hardware, Export Only and Export and Launch SDK, the “Export Only” option is selected, since the designed hardware will be used in the next two sub-sections for memory and peripheral testing as well as for the Dhrystone benchmark performance comparison of the designed PowerPC™440 processor system with the Xilinx MicroBlaze™ embedded processor. This export process automatically creates an SDK directory in the current design hierarchy and places the hardware structure of the designed PowerPC™440 processor system (emb_ppc440_processor.xml) as an XML document in the created SDK directory.

5.3.3 Embedded MicroBlaze Processor System Development Using the Xilinx

Integrated Software Environment (ISE) and the Xilinx Platform Studio (XPS)

The procedures for creating the embedded MicroBlaze™ processor system are essentially the same as those for the embedded PowerPC™440 system using the Base System Builder (BSB). However, some differences exist in the architectural design of the embedded MicroBlaze™ processor when compared to the embedded PowerPC™440 processor. Here, the name assigned to the embedded MicroBlaze™ processor system project is “emb_mb_processor”. At the “Processor” stage of the Base System Builder (BSB), “MicroBlaze” is selected as the option for “Processor Type”, as in the case of Fig. 5.3(d).

As discussed in Section 5.3.1.5.2, the choices and configurations of the different memory types and peripherals influence the performance of embedded processors, especially for the MicroBlaze™ processor, where the FPGA fabric is used to implement the logic circuits and drivers. Thus, at the “Peripheral” selection stage, data-side and instruction-side local memory types and controllers are selected; these two are in-built within the PowerPC™440 core. Similar to the PowerPC™ processor system, the DDR2 SDRAM, the SRAM and the UART are included in the MicroBlaze™ processor system. These peripherals, together with their address ranges, are shown in the design summary of Fig. 5.8(a). Unlike the PowerPC™440, where the instruction and data caches are in-built and fixed at 32 KB and three memory options (SRAM, DDR2 SDRAM and BRAM) are available for enabling the cache memory type, only the first of these memory options is available for enabling the MicroBlaze™ processor memory cache. While the instruction and data memory cache size in the PowerPC™440 core is fixed at 32 KB, that in the MicroBlaze™ processor core can be specified. Noting that the amount of FPGA fabric required to implement the memory and the memory address decoders varies with the specified memory size, the instruction and data caches for the MicroBlaze™ processor system are enabled with each allocated 32 KB, increased from the default 8 KB, as shown in Fig. 5.8(b). In the MicroBlaze™ processor system, small cache sizes are implemented with FPGA look-up tables (LUTs) while large cache sizes are implemented using block RAMs (BRAMs). As mentioned in the previous section, these caches are optional and can also be configured during the software development for the embedded processor system, as shown and discussed in Section 5.3.5. The design summary of the MicroBlaze™ embedded processor system created using the base system builder (BSB) is shown in Fig. 5.8(a), which also lists the major software associated with the processor system under Overall in the File Location category. As with the “Application” stage in the PowerPC™440 processor system, the components associated with the “Application” stage are also listed under the “System Summary” for the created MicroBlaze™ embedded processor system.

(a) Base System Builder: “Summary” (b) Base System Builder: “Cache”

Fig. 5.8: The BSB: the Summary and Cache design stages for the embedded MicroBlaze™ processor system.


Fig. 5.9: The block diagram of the MicroBlaze™ embedded processor system with associated memory types, peripherals,

clock generator, buses, hardware and software specifications and key/symbols.

The software associated with the just created MicroBlaze™ embedded processor system is then compiled so that all the memory types, peripherals, memory and peripheral driver software, as well as the entire embedded processor system, are updated. The compilation procedures are similar to those described for the PowerPC™440 embedded processor system, where the Xilinx ISE and the XPS are used interchangeably to perform these compilations.


As in the previous sub-section, the wrappers and hardware drivers, libraries and board support packages (BSPs), as well as the Netlist, are generated using the XPS via its GUI, while the synthesis, the programming file (bitstream), the block memory map (BMM) file, all other implementation files and the device utilization summary are generated using the Xilinx ISE™ software via its GUI. Next, the XPS is used to create the SDK directory in the top-level hierarchy of the MicroBlaze™ processor project directory, and the hardware description text file that encapsulates the MicroBlaze™ embedded processor system is exported to this SDK directory. Finally, the block diagram image and the XPS synthesis summary are generated using the XPS via its GUI. The MicroBlaze™ embedded processor system created is shown in Fig. 5.9. The detailed and complete Xilinx ISE™ device utilization and XPS synthesis summary reports for the MicroBlaze™ embedded processor system design are given in Appendix A–11.3 and Appendix A–11.4 respectively, for convenience.

5.3.4 Software Development and Performance Verification of the PowerPC™440

and MicroBlaze™ Embedded Processor Systems Using the Xilinx Software

Development Kit (Xilinx SDK)

In this sub-section, the embedded PowerPC™440 and MicroBlaze™ processor systems are tested. The tests include memory tests, peripheral tests and the evaluation of Dhrystone benchmark performance. Unfortunately, only the Dhrystone benchmark evaluation program for the MicroBlaze™ is included. As discussed in the previous sub-sections, the hardware description files (emb_ppc440_processor.xml and emb_mb_processor.xml) have been placed in their respective SDK directories. These tests are performed using the Xilinx software development kit (SDK). The procedures for creating the software platforms and programming the FPGA are summarized as follows.

Beginning with the embedded MicroBlaze™ processor system, the Xilinx SDK software is launched and

the hardware description is imported independently into the Xilinx SDK workspace via the SDK GUI. This

process automatically builds and initializes all the embedded processor drivers.

First, a new “Software Platform” is created on the embedded MicroBlaze™ processor system using the Xilinx SDK GUI. A new “Managed Make C Application Project” is then created under the “Software Platform”, and the “Memory Tests” application is selected, which uses the “TestApp_Memory.c” shown in Fig. 5.8(a). The Xilinx SDK automatically builds and compiles the software application project and reports any error(s).

Next, the Virtex-5 ML507 FPGA board is connected, turned on and programmed by selecting Tools → Program FPGA from the SDK GUI. This process requires the MicroBlaze™ processor programming file (bitstream) generated in the previous sub-section, named “emb_mb_processor.bit”, and the block memory map (edkBmmFile.bmm).


The results from the FPGA can be observed on the HyperTerminal window of the host computer using the RS232 serial ports of both the FPGA and the host personal computer (PC) via a null-modem RS232 serial cable. Here, the host PC is an Intel® Core™2 Quad CPU computer running at 2.66 GHz. The universal asynchronous receiver transmitter (UART) serial port (commonly called the serial port) uses a protocol that provides a useful and convenient way of testing processor-based, high-level code. The C print command is used to display intermediate values from the FPGA. For consistency in the data transmission rate, the RS232 port of the host PC is configured to match that of the FPGA in sub-section 5.3.2 as follows: Baud rate = 115200, Data = 8 bits, Parity = none, Stop = 1 bit, Flow control = none.
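As an illustration of this testing style, the minimal sketch below, which assumes the lightweight xil_printf routine of the Xilinx standalone BSP, prints a hypothetical intermediate value over the configured UART so that it appears on the HyperTerminal window.

    #include "xil_printf.h"

    /* Minimal UART print sketch (assumes the Xilinx standalone BSP):
     * the output appears on the host's HyperTerminal via RS232. */
    int main(void)
    {
        int intermediate = 42;   /* hypothetical value under test */
        xil_printf("-- Entering main() --\r\n");
        xil_printf("intermediate = %d\r\n", intermediate);
        return 0;
    }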

Fig. 5.10: Xilinx software development kit graphical user interface for software development and programming the

Virtex-5 ML507 FPGA using the “Debug on Hardware” option.


The Memory Test application is executed on the FPGA as “Debug on Hardware” from the SDK GUI, as shown in Fig. 5.10. Running the Memory Test application on the Virtex-5 ML507 FPGA produces the result shown on the HyperTerminal of Fig. 5.11(a). Note that for the PowerPC™440 system, the processor hardware description file “emb_ppc440_processor.xml”, the programming file (bitstream) generated in the previous sub-section, named “emb_ppc440_processor.bit”, and the block memory map (edkBmmFile.bmm) are correspondingly required to program the FPGA.
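The idea behind such a memory test is a simple write-then-read-back pattern scheme, which the sketch below illustrates; the base address, word count and XOR pattern are placeholders, and this is not the exact Xilinx TestApp_Memory code.

    /* Write/read-back memory test sketch; the base address, word
     * count and XOR pattern are illustrative assumptions only. */
    static int test_memory_range(volatile unsigned int *base,
                                 unsigned int words)
    {
        unsigned int i;
        for (i = 0; i < words; i++)          /* write known patterns */
            base[i] = 0xA5A5A5A5u ^ i;
        for (i = 0; i < words; i++)          /* verify them back     */
            if (base[i] != (0xA5A5A5A5u ^ i))
                return -1;                   /* mismatch: failure    */
        return 0;                            /* all words verified   */
    }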

In order to test the peripherals, another new “Managed Make C Application Project” is created using the same procedures as for the Memory Test, and the “Peripheral Tests” application, which uses the “TestApp_Peripheral.c” shown in Fig. 5.8(a), is selected. The same procedures as in the memory test case are followed to build, compile and test the embedded MicroBlaze™ processor peripherals. Running the Peripheral Test application on the Virtex-5 ML507 FPGA produces the result shown on the HyperTerminal of Fig. 5.11(b). The memory and peripheral tests performed for the MicroBlaze™ embedded processor system are repeated for the PowerPC™440 embedded processor system, and results similar to Fig. 5.11(a) and (b) were obtained.

These test results indicate that the memories and peripherals of the embedded processor systems are fully functional and well configured, which implies that the embedded processor systems can be deployed for the development of embedded system applications.

(a) Memory test (b) Peripheral test

Fig. 5.11: The MicroBlaze™ processor: (a) memory and (b) peripheral test results on the HyperTerminal window.


5.3.5 MicroBlaze™ Dhrystone Benchmark Performance Evaluation

The Dhrystone is a benchmark test program used to evaluate the performance of an embedded processor system; its result is compared to that quoted by the manufacturer to measure how well the memory types, peripherals and optimization techniques have been employed in creating the embedded processor system for enhanced performance. As mentioned in Section 5.3.1.1, the performance in the Dhrystone benchmark evaluation is usually measured in terms of the maximum FPGA operating frequency (fmax) and the Dhrystone million instructions per second (DMIPs). Unfortunately, only the Dhrystone benchmark program for evaluating the embedded MicroBlaze™ processor system is available here. However, since both systems use essentially the same memory types, peripherals and respective controllers, the benchmarking results for the MicroBlaze™ processor system can be used to judge the PowerPC™440 processor system, noting that the PowerPC™ is known for higher speed performance, running at a maximum frequency of 550 MHz and 1,100 DMIPs compared to the MicroBlaze™’s 210 MHz and 240 DMIPs, as discussed in sub-section 5.3.1.4 ([XEPB Virtex-5, 2010], [XMBPRG, 2010]).

To enhance the performance of the Dhrystone benchmarking of the designed MicroBlaze™ embedded processor system, the Dhrystone is configured from the Xilinx platform studio (XPS) to load directly into the on-board BlockRAMs (BRAMs) for speed performance at the maximum operating frequency and DMIPs execution. In this work, the Dhrystone program is first implemented in the SDK, similarly to the Memory and Peripheral test programs, to ensure that it is free of errors. A copy of the just tested MicroBlaze™ processor system is made. Next, the XPS is opened via the Xilinx ISE GUI in the same way in which it was created. A new directory called Dhrystone_TestApp_microblaze_0 is created within the XPS emb_mb_processor hierarchy. A new software application, also called “Dhrystone_TestApp_microblaze_0”, is then created in the XPS, as shown in Fig. 5.12. The Dhrystone benchmark program is then imported into the new “Dhrystone_TestApp_microblaze_0” software application. As discussed in sub-section 5.3.1.5.1, the medium optimization Level 2 (Medium (-O2)) is selected as the compiler optimization option, as shown in the lower right-hand corner of Fig. 5.12. The new project is then compiled by right-clicking the new Dhrystone_TestApp_microblaze_0 application and selecting “Build Project”. This action creates the executable and linkable format (ELF) file for the project.

Since the copied project has changed, the Xilinx ISE™ project has also changed, and it shows as out of date. Thus, the complete MicroBlaze™ embedded processor project is again fully recompiled using both the XPS and the Xilinx ISE™ software, following the step-by-step procedures summarized in sub-section 5.3.2. New board support packages (BSPs), Netlist, programming file (bitstream), block memory map (BMM) and hardware description file (emb_mb_processor.xml) are generated and exported to the software development kit (SDK).

The Xilinx SDK is again opened. A new software platform called “Dhrystone_Test” is created, and a new “Managed Make C Application Project” is also created; this time, the just created and compiled “Dhrystone_Test” software application is selected. Next, the Virtex-5 ML507 is programmed and the Dhrystone application is executed. The maximum operating frequency obtained is 188.2 MHz against the 210 MHz specified by Xilinx, and 204.7 DMIPs are obtained against the 240 DMIPs specified by Xilinx [XMBPRG, 2010]. Dividing the DMIPs obtained (204.7) by the maximum operating frequency specified by Xilinx for the Virtex-5 ML507 FPGA (210 MHz) gives 0.9748, which implies that the designed MicroBlaze™ embedded processor system is highly optimized for embedded applications. Note that the embedded program is initialized and executed from the BRAM owing to its small size; the result may be different when embedded programs are larger than the on-board BRAMs. It may be argued that a similarly high and optimized performance would be obtained for the PowerPC™440 embedded processor system, though the Dhrystone benchmark program is not available for its benchmark performance evaluation.
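For reference, the figure of merit just quoted amounts to the simple ratio computed below; the constants are the values reported above, and the program is merely a worked check of the arithmetic.

    #include <stdio.h>

    /* Worked check of the reported figure of merit: achieved DMIPs
     * divided by the manufacturer-quoted maximum frequency. */
    int main(void)
    {
        const double dmips_achieved = 204.7;  /* measured on ML507  */
        const double fmax_quoted    = 210.0;  /* MHz, Xilinx figure */
        printf("DMIPs per MHz = %.4f\n", dmips_achieved / fmax_quoted);
        return 0;   /* prints 0.9748 */
    }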

Fig. 5.12: The XPS for creating, compiling and initializing the Dhrystone benchmark program to load from on-board

BRAM for benchmark performance evaluation of MicroBlaze™ embedded processor on Virtex-5 ML507 FPGA.


5.3.6 Comparison of the Device Utilization for the PowerPC™440 and MicroBlaze™

Embedded Processor Systems

In this sub-section, the Xilinx platform studio (XPS) synthesis and Xilinx ISE™ device utilization reports generated by the XPS and the Xilinx ISE™ are summarized and used to deduce and compare the FPGA hardware resource consumption for creating the PowerPC™440 and MicroBlaze™ embedded processor systems. The XPS synthesis report summary is shown in Table 5.1, whereas the Xilinx ISE™ device utilization summary is shown in Table 5.2. The details of these two reports are given in Appendix A–11; their summaries are presented here for quick comparison and a few remarks.

From the XPS synthesis results of Table 5.1, it is obvious that the MicroBlaze™ consumes more FPGA hardware resources than the embedded PowerPC™440 processor system. For example, the PowerPC™440 used only 2 flip flops to implement the ppc440_0_wrapper, whereas the MicroBlaze™ used 1,375 to implement the microblaze_0_wrapper, which increases hardware cost. Also, the DDR2 SDRAM (ddr2_sdram_wrapper) implementation for the PowerPC™440 processor system consumes 2,355 flip flops against the 3,458 flip flops required by the MicroBlaze™ processor system, which invariably increases hardware cost. Although the debug module is implemented in the silicon of the PowerPC™440 hard processor core, a significant 119 flip flops are required to realize the debug logic (mdm_0_wrapper) in the MicroBlaze™ processor system. On the other hand, the PowerPC™440 utilized 255 and 138 flip flops to implement the xps_bram_if_cntlr_1_wrapper and the plb_v46_0_wrapper respectively, as against the 150 flip flops required by the MicroBlaze™ processor system to implement the mb_plb_wrapper. On average, all other hardware consumption by both embedded processor systems is comparable, as can be observed in Table 5.1.

Table 5.1: The Xilinx platform studio (XPS) embedded PowerPC™440 and MicroBlaze™ processor systems synthesis summary.

                                      PowerPC™440 Embedded        MicroBlaze™ Embedded
                                      Processor System            Processor System
                                      FFs     LUTs    BRAMs       FFs     LUTs    BRAMs
  proc_sys_reset_0_wrapper            67      51                  67      51
  jtagppc_cntrl_inst_wrapper          2
  mdm_0_wrapper                                                   119     117
  clock_generator_0_wrapper           4       3                   4       3
  ddr2_sdram_wrapper                  2,355   1,768   2           3,458   2,077   5
  sram_wrapper                        544     316                 540     295
  rs232_uart_1_wrapper                141     127                 144     130
  lmb_bram_wrapper                                                                8
  ilmb_cntlr_wrapper                                              2       6
  dlmb_cntlr_wrapper                                              2       6
  dlmb_wrapper                                                    1       1
  ilmb_wrapper                                                    1       1
  xps_bram_if_cntlr_1_bram_wrapper                    16
  xps_bram_if_cntlr_1_wrapper         255     201
  plb_v46_0_wrapper                   138     220
  mb_plb_wrapper                                                  150     410
  ppc440_0_wrapper                    2       3
  microblaze_0_wrapper                                            1,375   1,220

(FFs = flip flops used; LUTs = look-up tables used; BRAMs = BlockRAMs used.)

The Xilinx ISE™ device utilization report summary of Table 5.2 shows that the main processing engine of the MicroBlaze™ processor system may have been built from three high-performance DSP48E multipliers, with a significant 6,740 look-up table (LUT) flip-flop pairs. Also, the proportion of slices occupied by the MicroBlaze™ processor system exceeds that occupied by the PowerPC™ processor system by 9%. Furthermore, the proportions of slice registers and LUTs used in the embedded MicroBlaze™ processor system design are in excess of 6% and 3%, respectively, compared to those used in the PowerPC™440 processor system design. It can also be observed that the embedded PowerPC™440 processor design required an additional 22 flip flops for routing (route-thrus) and an additional 2% of the device memory to build its memory blocks.

Table 5.2: The Xilinx ISE™ device utilization summary for the embedded PowerPC™440 and MicroBlaze™ processor systems.

                                        PowerPC™440 Embedded         MicroBlaze™ Embedded
                                        Processor System             Processor System
                                        Used    Available  Util.     Used    Available  Util.
  Slice Logic Utilization
  Number of Slice Registers             3,040   44,800     5%        5,051   44,800     11%
  Number of Slice LUTs                  2,538   44,800     5%        3,871   44,800     8%
  Number of Route-Thrus                 22
  Number of Occupied Slices             1,737   11,200     15%       2,748   11,200     24%
  Number of LUT Flip-Flop Pairs Used    4,134   57,202     7%        6,740   57,202     11%
  Number of Bonded IOBs                 184     640        28%       184     640        28%
  Number of LOCed IOBs                  184     184        100%      184     184        100%
  IOB Flip Flops                        330                          330
  Number of Block RAM/FIFO              20      148        13%       17      148        18%
  Total Memory Used (KB)                720     5,328      13%       612     5,328      11%
  Number of BUFG/BUFGCTRLs              7       32         21%       7       32         21%
  Number of IDELAYCTRLs                 3       22         13%       3       22         13%
  Number of BUFIOs                      8       80         10%       8       80         10%
  Number of DCM_ADVs                    1       12         8%        1       12         8%
  Number of PLL_ADVs                    1       6          16%       1       6          16%
  Number of PPC440s                     1       1          100%
  Number of BSCANs                                                   1       4          25%
  Number of DSP48Es                                                  3       128        2%
  Average Fanout of Non-Clock Nets      3.07                         3.33


Chapter 6

Case Studies and Simulation Results

6.1 Introduction

In this chapter, the two proposed neural network (NN) model identification algorithms, the adaptive recursive least squares (ARLS) and the modified Levenberg-Marquardt algorithm (MLMA), and the two adaptive predictive control algorithms, the adaptive generalized predictive control (AGPC) and the nonlinear adaptive model predictive control (NAMPC), are applied to three selected industrial applications as the case studies considered in this study. The three case studies are: 1) the fluidized bed furnace reactor (FBFR) of the steam deactivation unit (SDU) of a fluid catalytic cracking (FCC) pilot plant, 2) the aerobic reactor unit of an activated sludge wastewater treatment plant (AS-WWTP), and 3) the auto-pilot control system of a nonlinear F-16 fighter aircraft.

Section 6.2 is concerned with the NN model identification and adaptive MPC temperature control of the FBFR process. A brief description of the FBFR process is presented in sub-section 6.2.1, while its mathematical model is given in Appendix B. The control objectives are also presented in sub-section 6.2.1. The neural network training and validation using the ARLS and MLMA algorithms, as well as their closed-loop identification and PID control for the FBFR process, together with their results, are presented in sub-section 6.2.2. Validation and dynamic performance comparisons of the proposed MLMA algorithm with the backpropagation with momentum (BPM) and incremental backpropagation (INCBP) algorithms, together with their results, are presented in sub-section 6.2.3. Then, in sub-section 6.2.4, the validation and performance evaluation of the proposed AGPC and NAMPC algorithms for the model-based adaptive control of the temperature of the FBFR process using the ARLS algorithm, as well as their control performances, are presented. Sub-section 6.2.5 concludes this section with the implementation and performance evaluation of a PID controller and the proposed NAMPC algorithm over the proposed service-oriented architecture (SOA) cluster network, with their closed-loop simulation results.

Section 6.3 deals with the NN model identification and adaptive MPC control of the soluble oxygen (the so-called dissolved oxygen concentration, DO) in the aerobic reactor unit of an activated sludge wastewater treatment plant (AS-WWTP) process. An overview of the AS-WWTP process and a statement of the identification and control problem are first presented in sub-section 6.3.1, while the detailed AS-WWTP process description and complete mathematical model are provided in Appendix C. The training and validation results of the NN that models the AS-WWTP process using the ARLS and MLMA algorithms, as well as their online closed-loop identification and control performances with the proposed AGPC controller, are presented in sub-section 6.3.2. The validation and dynamic performance comparisons of the proposed ARLS algorithm with the backpropagation with momentum (BPM) and incremental backpropagation (INCBP) algorithms, together with their results for the AS-WWTP process, are presented in sub-section 6.3.3. The validation and performance evaluation of the proposed AGPC and NAMPC algorithms for the model-based adaptive control of the dissolved oxygen concentration (DO) in the aerobic reactor unit of the AS-WWTP process using the MLMA algorithm, as well as their control performances, are presented in sub-section 6.3.4.

Section 6.4 presents the neural network-based model identification and adaptive MPC control of the auto-pilot control system of a nonlinear F-16 fighter aircraft. A brief description of the F-16 aircraft and the formulation of the control problem are first presented in sub-section 6.4.1, while the detailed description of the F-16 aircraft and its detailed mathematical model are given in Appendix D. The training and validation results of the NN that models the auto-pilot control system of the nonlinear F-16 fighter aircraft using the ARLS and MLMA algorithms are presented in sub-section 6.4.3, whereas the online closed-loop model identification and control performances of the ARLS and MLMA algorithms with the proposed NAMPC controller are presented in sub-section 6.4.4. The validation and performance evaluation of the proposed AGPC and NAMPC algorithms for the model-based adaptive control of the auto-pilot control system of the nonlinear F-16 aircraft using the ARLS algorithm, as well as their control performances, are presented in sub-section 6.4.5.

Section 6.5 concludes this chapter. This section is concerned with the real-time implementation of the neural network-based AGPC algorithm for the nonlinear F-16 auto-pilot control system on a Xilinx Virtex-5 FX70T ML507 FPGA board. The model-based design for the FPGA implementation of the AGPC algorithm is presented first in sub-section 6.5.1. Then, the hardware synthesis of the AGPC algorithm using the Xilinx AccelDSP tool is presented in sub-section 6.5.2, together with some discussions and remarks. The model-based implementation of the synthesized AGPC algorithm using the Xilinx System Generator for DSP is given in sub-section 6.5.3. Then, in sub-section 6.5.4, the hardware-in-the-loop co-simulation of the synthesized AGPC System Generator model with the Virtex-5 FX70T ML507 FPGA board is presented, together with the simulation results. The synthesis and generation of the AGPC algorithm as a co-processor from the synthesized AGPC System Generator model, together with the techniques for integrating the generated AGPC co-processor with the embedded PowerPC™440 processor system developed and tested in Section 5.3.2 of Chapter 5, are presented in sub-section 6.5.5. Sub-section 6.5.6 is the last sub-section, and it details the real-time implementation of the embedded PowerPC™440 processor system and the synthesized AGPC System Generator model as a co-processor on the Xilinx Virtex-5 FX70T ML507 FPGA board. The complete closed-loop control of the nonlinear F-16 aircraft auto-pilot control system, using the validated Simulink model, the neural network model based on the MLMA algorithm and the combined embedded PowerPC™440 processor–AGPC co-processor system, is also presented. Details of the hardware utilization, the application programming interface (API) for the AGPC co-processor, as well as the software for implementing the complete embedded PowerPC™440 processor–AGPC co-processor system, are provided in Appendix E.


6.2 The Model Identification and Control Problem of the Fluidized Bed Furnace

Reactor (FBFR) Process

6.2.1 The Fluidized Bed Furnace Reactor (FBFR) Process Description and

Mathematical Model

The cyclic propylene steam deactivation procedure is used to prepare fluid catalytic cracking (FCC) catalysts for evaluation before they are purchased for use in normal production FCC plants. The FCC process is a vital part of every modern refinery. Through this process the heavy residue of the atmospheric and vacuum distillations is catalytically converted from heavy to lighter hydrocarbon products, thus increasing the gasoline and diesel yield of the refinery. A main task in FCC catalyst evaluation is the benchmarking procedure, which requires that the catalyst be evaluated at constant conversion levels through accurate temperature control. In order to evaluate

such catalyst in bench scale units and pilot plants, it is necessary to have pre-processed samples from vendor

supplied catalysts that accurately simulate the state of a “used” catalyst which is actually present at any given time

inside the commercial FCC unit. This catalyst state is called “equilibrium catalyst” and all major catalyst vendors

carry out research into finding processes that produce such a variant of their catalysts. This process is

called “catalyst preparation through metal deactivation”. During this process, the catalyst is impregnated with

metals such as vanadium or nickel from a source such as metal naphthenates before steaming.

The steamers are designed to hydrothermally deactivate cracking catalyst and to simultaneously

deactivate metals deposited on the catalyst. Deactivation involves the exposure of the catalyst to streams

containing steam, propylene, sulphur(IV) oxide (SO2) and air, and nitrogen (N2) alternately for a specified

number of cycles at high temperatures. The predefined process procedures must be accurately followed during the

initial heat up and the deactivation stage. Even relatively small overshoots in temperature might give final product

properties that would not be acceptable. At the end of the deactivation procedure that lasts more than 22 hours in

total, excluding the wet impregnation stage, the catalyst must have exact macroscopic properties that match the

catalyst drawn from the industrial process. The most important properties include: unit cell size (UCS) of the

contained zeolite, total surface area (TSA), and target metals level. Some properties such as the unit cell size (UCS)

are irreversibly affected by treatment temperature overshoots of only 2%. Thus, an efficient control of the

temperatures inside a furnace heated reactor used in the deactivation of the catalyst during the catalyst processing

experiments is of paramount importance. This reactor will be referred to as the fluidized bed furnace reactor

(FBFR).


6.2.1.1 The Fluidized Bed Furnace Reactor (FBFR) Process

The pilot-plant-scale steam deactivation unit (SDU) is fitted with automated controls for temperature and gas supply switching, as illustrated in Fig. 6.1. The operation is coordinated by a state-of-the-art industrial control system and software. Three gas lines, each consisting of a filter, pressure regulator, pressure indicator and check valve, are fed to a single mass flow controller. An on–off solenoid valve manages the flow of each line. An accurate deionized water pump (DWP) supplies the water required for steam generation through the upper part of the line entering the FBFR reactor. The complete description of the deactivation procedure is given in [Voutetakis et al., 2006]. Accurate control of the temperature and energy requirements of the fluidized bed furnace reactor (FBFR) of the cyclic propylene steam deactivation unit, which is part of an FCC pilot plant, is of paramount importance.

The fluidized bed furnace reactor (FBFR) model considers heat transfer in the radial direction for a structure

consisting of successive cylindrical layers as shown in Fig. 6.2. Electric heaters are embedded in ceramic material

in the heater section to generate the necessary heat for the process. The electric heaters are regulated by

manipulating a high resistance potentiometer (HRP). Heat is then transported in the radial direction towards the

centre of the reactor and the insulator section. A dynamic distributed heat transfer model is used for the interior of

the reactor, the air in the gap between the inner reactor wall and the heater and the insulator. The dynamic

behaviour of the temperature is hence expressed as a set of partial differential equations. A lumped model (i.e.,

radial temperature gradients assumed negligible) is used for the inner reactor wall, the heater and the outer reactor

wall sections. The lumped models then result in a set of ordinary differential equations. The dynamics of the

system hence consist of a series of fast and slow modes depending on the heat capacities and thermal

conductivities of the different layers (i.e. material properties) in the reactor. The complete mathematical model for

the energy balance of the FBFR is expressed as a set of nonlinear partial differential equations with respect to Fig.

6.2 and is given in Appendix B.

The solution of the FBFR mathematical model given by Equations (B.1) – (B.6) in Appendix B–1 was

implemented in MATLAB. The MATLAB program that translates the FBFR mathematical model described by

Equations (B.1)–(B.6) is given in Appendix B–2 together with the FBFR parameters. The MATLAB script that is

used to implement the FBFR model of Appendix B–2 is given in Appendix B–3. The parameters of the FBFR

were obtained from Chemical Process Engineering Research Institute (CPERI), Thermi – Thessaloniki, Greece

where the FCC pilot plant, the SDU and the FBFR are located.
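As an illustration only, and not the Appendix B implementation itself, a method-of-lines solution of such a mixed distributed/lumped energy balance can be organized in MATLAB as sketched below; fbfr_rhs is a hypothetical right-hand-side function that would discretize the radial heat equations of the three distributed sections and append the three lumped ODEs of (B.1)–(B.6).

% A minimal method-of-lines sketch (not the Appendix B code): the radial PDEs
% of the distributed sections are discretized on nr grid points each and
% stacked with the three lumped states; fbfr_rhs is a hypothetical function
% implementing the resulting right-hand side from Equations (B.1)-(B.6).
nr    = 20;                            % radial grid points per distributed section
x0    = 300 * ones(3*nr + 3, 1);       % initial temperatures in K (assumed uniform)
tspan = 0:60:22*3600;                  % one-minute sampling over a 22-hour cycle
[t, X] = ode15s(@(t, x) fbfr_rhs(t, x, nr), tspan, x0);  % stiff solver suits the
                                       % mix of fast and slow thermal modes

A stiff solver such as ode15s is a natural choice here precisely because of the fast and slow modes noted above.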

The temperature distribution across the FBFR, based on the simulation of (B.1)–(B.6) using the MATLAB programs of Appendix B–2 and B–3, is given in Fig. 6.3. As can be seen in Fig. 6.3, the temperatures across the FBFR increase sharply at the initial phase of the deactivation process and gradually settle to steady-state values. For the deactivation process to be successful, the temperatures must be maintained at the prescribed levels throughout the entire deactivation process, which requires keeping the temperatures within acceptable values. As can be seen in Fig. 6.3(a), the maximum temperatures of the interior reactor wall (Tirw), of the gap between the reactor wall and the heater (Tbrwh), of the outer reactor metal wall (Tormw), and of the heater (Th) do not change significantly. On the other hand, the minimum and maximum temperatures of the reactor interior (Tri) and of the insulator (Tins) change significantly, as the temperature variations in Fig. 6.3(b) show. In fact, it can be observed clearly in Fig. 6.3(b) that Tins increases from an initial temperature close to Tormw to values close to Tbrwh, since the insulator (ins) and the layer between the reactor wall and the heater (brwh) are on opposite sides of the heater.


Fig. 6.1: Simplified diagram of the steam deactivation unit (SDU) of the FCC pilot plant with the FBFR.

[Fig. 6.2 depicts six concentric layers, from the centre outwards: the reactor's interior (ri), inner reactor wall (irw), intermediate air gap between reactor wall and heater (brwh), electric heater (h), insulation (ins), and outer reactor metal wall (ormw), with radii R_ri, R_irw, R_brwh, R_h, R_ins and R_ormw respectively.]

Fig. 6.2: Schematic of the vertical cross-section of the cylindrical fluidized bed furnace reactor (FBFR).


6.2.1.2 The Control Problem of the Fluidized Bed Furnace Reactor (FBFR) Process

The most essential action in the cyclic propylene steam deactivation procedure is the tight control of the

temperature inside the furnace reactor, which is the temperature of the reactor interior Tri. This temperature must

be raised to a final desired level of 860 °K by following a predefined pattern. Thereafter, a sequence of cyclic

operations is performed. The automatic control of this process faces several challenges [9]:

i) Temperature during heat up must follow the predefined ramping.

ii) Due to the large time delays involved, which also change over the operational range, the system must be tuned in a suitable way.

iii) Temperature overshoot at the final step of the ramp should not exceed 2% as the catalyst can be irreversibly

affected.

The controller should observe the cycling process without generating new disturbances. During the cyclic stage (reduction phase), energy is absorbed by the system and released in the oxidation (burning) stage. Heating is provided through the independent electric resistance heaters, as shown in the diagram of Fig. 6.1. Excessive heating in the initial phase of the experiment would both increase the energy requirements and result in overshooting of the desired temperature level at which the deactivation cycle takes place [Voutetakis et al., 2006].

Fig. 6.3: Temperature distribution and variation across the six sections of the FBFR system: (a) temperature distribution across the FBFR and (b) temperature variations showing the maximum temperatures for Tirw, Tbrwh, Th and Tormw with the minimum and maximum temperatures for Tri and Tins.

The two objectives here are: 1) to develop a neural network that will accurately model the FBFR process, and

2) to use the resulting neural network model to design a controller that will accurately control the temperatures of

the heater (Th) and the reactor interior (Tri) with quick response and with overshoots not exceeding 2% of the

nominal steady state value.

6.2.1.3 FBFR Experiment and Training Data Acquisition

In the previous work presented in [Voutetakis et al., 2006], well-tuned PID and MPC controllers were developed to operate the FBFR process at 3.76 and 3.66 kW respectively, out of the total 5.04 kW (kilowatts) of heat energy available to the process, with a deionized water flow rate of 500 ml/s (millilitres per second). A sampling time (T) of one minute was considered for the 22-hour operating cycles. This means that 1320 data samples can be obtained from the process in every operating cycle. In order to develop a neural network that accurately models the FBFR process, the heat and the deionized water flow rate supplied to the FBFR were varied from their minimum values up to 30% above their normal operating values. This variation allows coverage of the entire operating range of the FBFR, during both initial heat-up and deactivation, and accounts for possible uncertainties in the plant model outside the operating region. The minimum and maximum values for Q are 0 kW and 6.552 kW respectively, varied in steps of 936 watts (W). The lower and upper values of DWP are 0 ml/s and 650 ml/s respectively, varied in steps of 92.8571 ml/s.

Using the validated first-principles mathematical model of the FBFR process and the MATLAB programs for solving the nonlinear partial differential equations of this model given in Appendix B, the FBFR was simulated in open loop. For the 8 different combinations of the step changes in Q and DWP defined above, the six corresponding temperatures were obtained, which resulted in the desired 1320 input-output data pairs. All 1320 data pairs obtained were used for network training, while 300 input-output test data pairs obtained directly from the FBFR were used for the validation of the trained network.
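As a hedged illustration of how such excitation data can be organized (the function name fbfr_sim, its signature, and the segmentation of the cycle into eight equal blocks are assumptions for the sketch, not the Appendix B code):

% Sketch of the open-loop excitation used to collect the 1320 training pairs.
% fbfr_sim(Q, DWP, N, x0) is a hypothetical wrapper around the Appendix B
% model returning N one-minute samples of the six temperatures and the final
% state; pairing the i-th Q level with the i-th DWP level gives the 8
% combinations mentioned above (an assumed, illustrative pairing).
Qlev   = 0:936:6552;                  % heater power in W (0 .. 6.552 kW)
DWPlev = 0:92.8571:650;               % deionized water flow in ml/s
Nseg   = 1320/8;                      % samples per combination (165)
U = []; Y = []; x0 = [];              % empty x0: let fbfr_sim pick its default
for i = 1:8
    [Yi, x0] = fbfr_sim(Qlev(i), DWPlev(i), Nseg, x0);
    U = [U; repmat([Qlev(i) DWPlev(i)], Nseg, 1)];   % inputs [Q DWP]
    Y = [Y; Yi];                      % outputs [Tri Tirw Tbrwh Th Tins Tormw]
end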

6.2.1.4 Statement of the FBFR Neural Network Model Identification and Control

Problem

As explained above, the most essential control action is to ensure that the temperature of the reactor interior (Tri) does not exceed 860°K. In this study, the control inputs for the FBFR are the electrical energy (Q) supplied to the heater via the high resistance potentiometers (HRP) and the deionized water pump (DWP) flow rate, that is, $U(k) = [Q(k)\;\; DWP(k)]^T$. The controlled outputs of the FBFR system are the temperatures of the six sections of the FBFR, namely: the reactor's interior (Tri), interior reactor wall (Tirw), air gap between the reactor and the heater (Tbrwh), heater (Th), insulator (Tins), and outer reactor metal wall (Tormw), given by $Y(k) = [y_{Tri}(k);\; y_{Tirw}(k);\; y_{Tbrwh}(k);\; y_{Th}(k);\; y_{Tins}(k);\; y_{Tormw}(k)]^T$.

The input vector to the neural network consists of regressors which are concatenated into the regression vectors $\varphi_{NNARX}(k)$ and $\varphi_{NNARMAX}(k, \theta(k))$ respectively for the NNARX and NNARMAX model predictors that were discussed in Chapter 3, and which are defined here as follows:

$\varphi_{n_a}(k) = \left[ Tri(k - n_a) \;\; Tirw(k - n_a) \;\; Tbrwh(k - n_a) \;\; Th(k - n_a) \;\; Tins(k - n_a) \;\; Tormw(k - n_a) \right]^T$   (6.1)

$\varphi_{n_b}(k) = \left[ Q(k - n_b) \;\; DWP(k - n_b) \right]^T$   (6.2)

$\varphi_{n_c}(k, \theta(k)) = \left[ \varepsilon_{Tri}(k - n_c, \theta(k)), \; \varepsilon_{Tirw}(k - n_c, \theta(k)), \; \varepsilon_{Tbrwh}(k - n_c, \theta(k)), \; \varepsilon_{Th}(k - n_c, \theta(k)), \; \varepsilon_{Tins}(k - n_c, \theta(k)), \; \varepsilon_{Tormw}(k - n_c, \theta(k)) \right]^T$   (6.3)

Thus, the concatenation of (6.1) and (6.2) results in the regression vector for the NNARX model predictor as

$\varphi_{NNARX}(k) = \left[ \varphi_{n_a}(k) \;\; \varphi_{n_b}(k) \right]$   (6.4)

while the concatenation of (6.1) to (6.3) results in the regression vector for the NNARMAX model predictor as

$\varphi_{NNARMAX}(k, \theta(k)) = \left[ \varphi_{n_a}(k) \;\; \varphi_{n_b}(k) \;\; \varphi_{n_c}(k, \theta(k)) \right]$   (6.5)

The outputs of the NN are the predicted values of the temperatures of the six sections of the FBFR given by:

$\hat{Y}(k) = \left[ \hat{y}_{Tri}(k) \;\; \hat{y}_{Tirw}(k) \;\; \hat{y}_{Tbrwh}(k) \;\; \hat{y}_{Th}(k) \;\; \hat{y}_{Tins}(k) \;\; \hat{y}_{Tormw}(k) \right]^T$   (6.6)

The neural network model identification schemes for the FBFR process based on the NNARX and NNARMAX

models are shown in Fig. 6.4 (a) and (b) respectively.


Fig. 6.4: Neural network-based FBFR model identification schemes (a) NNARX model and (b) NNARMAX model.
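As a concrete illustration of how the regressors of (6.4) can be assembled from logged data (variable names are illustrative; the convention of using all lags 1..n_a and 1..n_b is an assumption that reproduces the n_φ = 24 regressors quoted in the next sub-section):

% Building the NNARX regression vector of (6.4) at sample k from logged data.
% Y is N-by-6 ([Tri Tirw Tbrwh Th Tins Tormw]) and U is N-by-2 ([Q DWP]).
na = 3; nb = 3;                              % regressor orders used here
k  = 100;                                    % any sample with k > max(na, nb)
phi_a = reshape(Y(k-1:-1:k-na, :)', [], 1);  % past outputs: 6*na = 18 entries
phi_b = reshape(U(k-1:-1:k-nb, :)', [], 1);  % past inputs:  2*nb = 6 entries
phi_nnarx = [phi_a; phi_b];                  % 24 regressors in total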


6.2.2 Training the Neural Network that Models the FBFR Process

The input vector to the neural network is the NNARMAX model regression vector $\varphi_{NNARMAX}(k, \theta(k))$ defined by (6.5). The vector $\varphi_{n_c}(k, \theta(k))$ is not known in advance and is initialized to a small positive random matrix of dimension $n_c \times n_c$. The outputs $\hat{Y}(k)$ of the NN are the predicted values of the process given by (6.6). However, because from the control point of view the temperatures of interest are those of the heater (Th) and the reactor interior (Tri), only the simulation results for Th and Tri are presented throughout this study. The predicted outputs of interest are then given as elements of the vector $\hat{Y}(k) = [\hat{y}_{Th}(k); \; \hat{y}_{Tri}(k)]^T$.

For assessing the convergence performance, the network was trained for τ = 10, 50, 100 and 500 epochs (numbers of iterations) with the following selected parameters: p = 2, q = 6, n_a = 3, n_b = 3, n_c = 3, with n_φ = 24 and n_φ = 42 for the NNARX and NNARMAX model predictors respectively, n_h = 10, n_o = 6, α_h = 1e-5 and α_o = 1e-4. The details of these parameters are discussed in Section 3.3.2 of Chapter 3, where p and q are the numbers of inputs and outputs of the system, n_a, n_b and n_c are the orders of the regressors, n_φ is the total number of regressors (that is, the total number of inputs to the network), n_h and n_o are the numbers of hidden and output layer neurons, and α_h and α_o are the hidden and output layer weight decay terms. The four design parameters for the adaptive recursive least squares (ARLS) algorithm defined in (3.68) are selected to be α = 0.5, β = 5e-3, δ' = 1e-5 and π = 0.99, resulting in γ = 0.0101. The initial values for ē_min and ē_max in (3.67) are 0.0102 and 1.0106e+3 respectively and were evaluated using (3.67). Thus, the ratio ē_max/ē_min in (3.66) is 9.9018e+4, which implies that the parameters are well selected. Also, λ_τ = 0.001, s = 0.05 and δ = 0.01 were selected to initialize the modified Levenberg-Marquardt algorithm (MLMA).

The training data are first scaled using equation (3.89) and the network is trained for τ = 10, 50, 100 and 500 epochs using the adaptive recursive least squares (ARLS) and modified Levenberg-Marquardt (MLMA) algorithms proposed in Chapter 3. After network training, the trained network is rescaled according to (3.90), so that the resulting network can work with unscaled FBFR data. The convergence of the ARLS and MLMA algorithms for (a) 10, (b) 50, (c) 100 and (d) 500 epochs is shown in Fig. 6.5. One can observe that even at 10 epochs the two algorithms converge to acceptable performance indices of approximately 10^-3, while better convergence is achieved as the number of epochs increases, until the convergence limit of 10^-6 is reached after about 200 epochs.
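The exact scaling and rescaling are those of (3.89) and (3.90) in Chapter 3; as a hedged illustration, a common zero-mean, unit-variance version and its inverse would read:

% Illustrative zero-mean, unit-variance scaling of the training data and the
% corresponding rescaling of the network predictions. The dissertation's own
% transformation is defined by (3.89)/(3.90); this is only an assumed example.
mu_y  = mean(Y, 1);   sig_y = std(Y, 0, 1);          % per-output statistics
Ys    = (Y - repmat(mu_y, size(Y,1), 1)) ./ repmat(sig_y, size(Y,1), 1);
% ... train the network on the scaled data Ys ...
Ys_hat = Ys;                                         % stand-in for the scaled predictions
Yhat   = Ys_hat .* repmat(sig_y, size(Ys_hat,1), 1) + repmat(mu_y, size(Ys_hat,1), 1);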

By comparing Fig. 6.5(a), (b), (c) and (d), it can be seen that the ARLS has a faster convergence than the MLMA after approximately 6 epochs. Since real-time identification and control is the primary aim of this work, it is necessary to investigate the performance of networks trained with a relatively small number of iterations (epochs). Thus, the performance of the network trained with 10 and 100 epochs is evaluated here for the


FBFR process. The summary of the training results for the ARLS and MLMA algorithms is presented in Table 6.1.

The computation times for the networks trained with 10 and 100 epochs are shown in the first row of Table 6.1. As can be seen in Table 6.1, despite the fact that the ARLS algorithm converges faster, the MLMA is about 10.5 and 38.5 times faster respectively in terms of computational cost. The mean square error (MSE) and the minimum performance indices (discussed in subsection 3.3.7) for the networks trained with the ARLS and MLMA algorithms are given in the second and third lines of Table 6.1. Again, the MLMA algorithm has smaller mean square errors and minimum performance indices when compared to the ARLS algorithm. These small values indicate that the MLMA performs better than the ARLS for the same number of iterations (epochs), and suggest that the MLMA model approximates the FBFR system better, giving smaller errors than the ARLS model.

Table 6.1: Summary of training results for the ARLS and MLMA algorithms

                                                         10 Epochs                100 Epochs
                                                      ARLS        MLMA         ARLS        MLMA
Computation time for model identification (sec)    1.5047e+1   1.4375       1.4902e+2   3.875
Mean square error (MSE)                            3.1023e-2   2.0000e-3    2.1449e-2   1.5625e-3
Minimum performance index                          1.1778e-3   1.0248e-3    8.1432e-3   2.7162e-5
Mean error of one-step ahead prediction
  of training data                                 5.4888e-2   2.0415e-4    3.1811e-3   1.1165e-4
Mean error of one-step ahead prediction
  of test data                                     2.7639e-2   1.5363e-2    1.5578e-2   2.0709e-4
Mean value of 5-step ahead prediction error        1.7800e-2   2.5006e-3    1.7511e-3   1.6484e-4
Akaike's final prediction error (AFPE) estimate    1.1117e-2   1.0246e-2    8.0767e-3   2.6884e-4


Fig. 6.5: Network convergence using the ARLS and the MLMA algorithms (performance index vs. epoch) for (a) 10, (b) 50, (c) 100 and (d) 500 epochs.



Fig. 6.6: Comparison of Tri and Th data predictions with the training data when the network is trained using the ARLS and MLMA algorithms for 10 epochs.

Fig. 6.7: Comparison of Tri and Th data predictions with the training data when the network is trained using the ARLS and MLMA algorithms for 100 epochs.

6.2.2.1 Validating the Trained Network that Models the FBFR Process

According to the discussion in Chapter 3, a trained network can be used to model a process once it is validated, that is, once the network demonstrates its ability to predict correctly both the data that were used for its training and other data that were not used during training. The network trained by the proposed ARLS and MLMA algorithms has been validated with the scaled and unscaled training data as well as with the 300 test data obtained directly from measurements of the FBFR process, by the use of three different methods.


6.2.2.1.1 Validation by the One-Step Ahead Predictions Simulation

In the one-step ahead prediction method, the errors obtained from the one-step ahead output predictions of the trained network are assessed. Fig. 6.6(a) and (b) show the Tri one-step ahead predictions of the scaled training data (blue -) against the trained network output predictions (red --*) using the neural network models trained by the ARLS and MLMA algorithms for 10 epochs, while the one-step ahead predictions of the Th training data are shown in Fig. 6.6(c) and (d) for 10 epochs as well. Similar results for 100 epochs using both the ARLS and MLMA algorithms for Tri and Th are shown in Fig. 6.7(a)–(d).

The one-step ahead prediction errors are given in the fourth line of Table 6.1 for 10 and 100 epochs respectively. It can be seen in Fig. 6.6(a)–(d) and Fig. 6.7(a)–(d) that the network predictions of the training data generally match closely the original training data for Tri and Th. However, in the case of the ARLS algorithm, the error is somewhat larger than that of the MLMA algorithm. This observation is also evident in the fourth line of Table 6.1. These small one-step ahead prediction errors indicate that both trained networks capture and approximate the nonlinear dynamics of the FBFR accurately. This is further justified by the small mean values of the MSE obtained using the ARLS and MLMA algorithms for Tri and Th respectively.

Furthermore, the suitability of the proposed ARLS and MLMA algorithms for neural network model identification in the FBFR industrial environment is investigated by validating the trained network with the 300 unscaled test data obtained directly from the real FBFR process. Graphs of the trained network predictions (red --*) of the test data with the actual test data (blue -) for 10 epochs for each of the ARLS and MLMA algorithms are shown in Fig. 6.8(a) and (b) for Tri and in Fig. 6.8(c) and (d) for Th respectively. Similar results for 100 epochs are shown in Fig. 6.9(a) and (b) for Tri and Fig. 6.9(c) and (d) for Th respectively. The almost identical prediction of these data proves the effectiveness of the proposed approach. The prediction accuracies of the unscaled test data by the networks trained using the ARLS and MLMA algorithms are evaluated by the computed mean prediction errors shown in the fifth line of Table 6.1. One can observe that the test data prediction errors using the MLMA are somewhat smaller than those obtained using the ARLS algorithm. These predictions of the unscaled test data given in Figs. 6.8 and 6.9, as well as the prediction errors in Table 6.1, verify the network's ability to model accurately the dynamics of the FBFR.
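As an illustration of how such figures of merit can be computed (the exact error definitions are those of Chapter 3; one plausible form is assumed here):

% Illustrative computation of the validation metrics, assuming Ytest holds
% the 300 measured test outputs and Yhat the one-step ahead predictions
% (both 300-by-q matrices); variable names and error forms are assumptions.
E   = Ytest - Yhat;            % one-step ahead prediction errors
mse = mean(E(:).^2);           % mean square error over all outputs
mpe = mean(abs(E(:)));         % mean (absolute) prediction error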

Fig. 6.8: Comparison of Tri and Th test data predictions by the network trained using the ARLS and MLMA algorithms for 10 epochs.

Fig. 6.9: Comparison of Tri and Th test data predictions by the network trained using the ARLS and MLMA algorithms for 100 epochs.

6.2.2.1.2 K–Step Ahead Prediction Simulations for the FBFR Process

The results of the K-step ahead output predictions (red --*) using the K-step ahead prediction validation method discussed in Chapter 3, for 5-step ahead output predictions (K = 5), are compared with the unscaled training data (blue -) in Fig. 6.10(a)–(d) and Fig. 6.11(a)–(d) for 10 and 100 epochs respectively. The (a) and (c) pairs in both figures correspond to the results of the network trained by the ARLS algorithm, while the (b) and (d) pairs correspond to the MLMA algorithm. The value K = 5 is chosen since it is a typical value used in most model predictive control (MPC) applications. The comparison of the 5-step ahead output prediction performance of the networks trained using the ARLS and the MLMA algorithms indicates the superiority of the MLMA over the ARLS algorithm.

The computation of the mean value of the K-step ahead prediction error (MVPE) using (3.93) gives 1.7800e-2 and 2.5006e-3 for the networks trained using the ARLS and MLMA algorithms for 10 epochs, and 1.7511e-3 and 1.6484e-4 for 100 epochs respectively, as shown in the sixth line of Table 6.1. The small mean values of the 5-step ahead prediction error (MVPE) indicate that the trained network approximates the dynamics of the FBFR process to a high degree of accuracy, with the network trained using the MLMA giving much smaller prediction errors.
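A hedged sketch of the K-step ahead simulation itself: for each time instant the trained predictor is rolled forward K steps on its own output predictions before being compared with the data. Here nn_predict is a hypothetical one-step predictor wrapping the trained network, and Y, U are the logged output and input data as before.

% K-step ahead prediction (K = 5): roll the NNARX predictor forward K steps
% on its own outputs, then compare with the recorded data as in (3.93).
% nn_predict(phi) is a hypothetical wrapper returning the 1-by-6 prediction.
K = 5; na = 3; nb = 3; N = size(Y, 1);
Yhat_K = Y;                                     % preallocate with the data
for k = max(na, nb) + 1 : N - K
    Ybuf = Y(1:k, :);                           % measured history up to k
    for j = 1:K                                 % feed predictions back in
        kk = k + j;
        phi_a = reshape(Ybuf(kk-1:-1:kk-na, :)', [], 1);
        phi_b = reshape(U(kk-1:-1:kk-nb, :)', [], 1);
        Ybuf(kk, :) = nn_predict([phi_a; phi_b]);
    end
    Yhat_K(k+K, :) = Ybuf(k+K, :);              % K-step ahead prediction
end
idx  = max(na, nb) + K + 1 : N;                 % samples with valid predictions
D    = Y(idx, :) - Yhat_K(idx, :);
mvpe = mean(abs(D(:)));                         % mean value of the K-step error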



Fig. 6.10: 5-step ahead Tri and Th output predictions (red --*) compared with the original unscaled training data (blue -) for the network trained using the ARLS and MLMA algorithms for 10 epochs.

Fig. 6.11: 5-step ahead Tri and Th output predictions (red --*) compared with the original unscaled training data (blue -) for the network trained using the ARLS and MLMA algorithms for 100 epochs.

6.2.2.1.3 Akaike’s Final Prediction Error (AFPE) Estimates for the FBFR Process

The implementation of the AFPE algorithm discussed in Chapter 3 and defined by (3.94) for the regularized criterion, for the networks trained with the ARLS and MLMA algorithms with multiple weight decay, gives the respective AFPE estimates reported in the last line of Table 6.1 for 10 and 100 epochs. These small values of the AFPE estimate indicate that the trained networks capture the underlying dynamics of the FBFR system and that the network is not over-trained [Sjöberg and Ljung, 1995]. This in turn implies that optimal network parameters, including the weight decay parameters, have been selected. Again, the AFPE estimates computed for the networks trained using the MLMA algorithm are slightly smaller when compared to those obtained using the ARLS algorithm.
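For orientation, the classical unregularized form of Akaike's final prediction error, of which the regularized criterion (3.94) used here is a weight-decay variant, weighs the achieved training criterion $V_N(\hat{\theta})$ by a complexity correction; with $N$ training samples and $d$ adjustable weights it reads:

$\mathrm{FPE} = \dfrac{N + d}{N - d}\, V_N(\hat{\theta})$

An AFPE value close to the training criterion itself thus indicates that the model complexity is commensurate with the amount of available data.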

6.2.2.2 Online Closed-Loop Identification with PID Control

Besides the training of the NN model with static data taken from plant tests, it is of interest to validate the prediction accuracy of a trained network under the same dynamic conditions in which the plant operates. In the simplest case, the FBFR process is implemented in closed loop with a discrete-time fixed-parameter PID controller, as in the closed-loop configuration shown in Fig. 6.12, with disturbances d(k).

The disturbance d(k) is introduced by changing the product of the density (ρ_h) and heat capacity (Cp_h) of the electric heater in the mathematical model of the FBFR given in Appendix B. The product of these two parameters was estimated in [Voutetakis et al., 2006] as 1.31e+5, but the actual product of these terms as used in the FBFR mathematical model of Appendix B is 7.638e+5, which is about 5.83 times the estimated value [Voutetakis et al., 2006]. Let the product of ρ_h and Cp_h be defined as Cp_cross = ρ_h · Cp_h. The FBFR process model was perturbed by varying Cp_cross = 7.638e+5 by ±30% across the 120 simulation samples considered here for the closed-loop identification and PID control. The lower and upper values of ±30% of Cp_cross are 5.3466e+5 and 9.9294e+5 respectively, in steps of 3.8511e+3.
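As a hedged illustration, the disturbance sequence described above can be generated as a linear ramp:

% Sketch of the Cp_cross disturbance used in the closed-loop runs: a ramp of
% +/-30% around the nominal rho_h*Cp_h product over the 120 samples.
Cp_nom   = 7.638e5;                                  % nominal product
Cp_cross = linspace(0.7*Cp_nom, 1.3*Cp_nom, 120);    % 5.3466e5 .. 9.9294e5
dCp      = Cp_cross(2) - Cp_cross(1);                % step of about 3.8511e3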

In the simplest case, the FBFR process affected by the above disturbances is controlled by a discrete-time fixed-parameter proportional-integral-derivative (PID) controller in the closed-loop configuration illustrated in Fig. 6.12. This operation is imitated by placing the network trained by each one of the two algorithms in a control loop, as happens in real plants.

Table 6.2: Input and output constraints on the PID control of the FBFR process

                                       10 Epochs (ARLS, MLMA)    100 Epochs (ARLS, MLMA)
FBFR Data                                 Tri         Th             Tri         Th
Initial control input, U                 -100        -100           -100        -100
Initial control output, Y                   0           0              0           0
Minimum control input, Umin              -200        -200           -200        -200
Maximum control input, Umax               200         200            200         200
Minimum predicted output, Ymin              0           0              0           0
Maximum predicted output, Ymax            860        1040            860        1040
Desired reference signal, R(k)            860        1040            860        1040

Fig. 6.12: The PID control scheme.

The mathematical relationship implemented for the PID controller that computes the FBFR control inputs $U(k) = [DWP(k)\;\; HRP(k)]$ is given by the following equation:

$U(k) = K_P E(k) + K_I \sum_{\iota=1}^{N} \frac{\left[ E(\iota) + E(\iota - 1) \right] T}{2} + K_D \, \frac{E(k) - E(k-1)}{T}$   (6.7)

where $K_P$, $K_I$ and $K_D$ are the proportional, integral and derivative gains respectively, $T$ is the sampling time, $E(k) = R(k) - \hat{Y}(k)$ is the error between the desired reference $R(k)$ and the predicted output $\hat{Y}(k)$, and $N$ is the number of samples. The first, second and third terms in (6.7) correspond to the present, past and future of the control sequence. The minimum and maximum constraints imposed on the PID controller to penalize changes in the FBFR control inputs $U(k)$ and outputs $Y(k)$ are given as:

$U_{min} \leq U(k) \leq U_{max}, \qquad Y_{min} \leq Y(k) \leq Y_{max}$   (6.8)

A major problem with PID controllers is the "wind-up" of the integrator, resulting in saturation of the integral term for control signals of large magnitude. However, a rich literature exists on anti-windup techniques which address this problem ([Hippe, 2006]; [Visioli, 2006]). According to this approach, the integrator is switched off when the actuator output exceeds a predefined limit, subject to the constraints imposed on the control input and the predicted outputs defined in (6.8).
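A minimal sketch of one controller update implementing (6.7)–(6.8) with the clamping anti-windup just described is given below; the gains and limits follow the values quoted in the next paragraph and in Table 6.2, while the function name and interface are illustrative.

% One update of the discrete PID law (6.7) with constraint handling (6.8) and
% integrator clamping anti-windup. The caller stores e as Eprev for the next
% call. Gains follow the text: Kp = 30, Ki = 50, Kd = 100; T is in seconds.
function [u, Eint] = pid_step(r, yhat, Eprev, Eint, T)
    Kp = 30; Ki = 50; Kd = 100;
    Umin = -200; Umax = 200;                     % input constraints, Table 6.2
    e  = r - yhat;                               % E(k) = R(k) - Yhat(k)
    Ei = Eint + (e + Eprev)*T/2;                 % trapezoidal integral term
    u  = Kp*e + Ki*Ei + Kd*(e - Eprev)/T;        % unconstrained PID output
    if u > Umax || u < Umin                      % anti-windup: freeze integrator
        Ei = Eint;                               % discard this step's increment
        u  = min(max(Kp*e + Ki*Ei + Kd*(e - Eprev)/T, Umin), Umax);
    end
    Eint = Ei;                                   % updated integrator state
end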

First, the discrete-time PID controller is placed in closed loop with the first-principles validated model of the FBFR given in Appendix B and simulated in MATLAB for 120 samples, without the above disturbances. The PID parameters in (6.7) were selected to be K_P = 30, K_I = 50 and K_D = 100 for both Tri and Th. The constraints imposed on the PID controller for the FBFR process, defined in (6.8), are summarized in Table 6.2 in accordance with the FBFR process control objectives, together with the initial control inputs and outputs. A similar simulation is carried out with the first-principles validated FBFR process model in the presence of the disturbances discussed above, with the same PID control parameters and the same process constraints given in Table 6.2.

The results for the Th and Tri output predictions using the first-principles validated model of the FBFR process without disturbances are shown in Fig. 6.13(a) and (b), while the control (or manipulated) inputs, that is, the HRP and DWP, are shown in Fig. 6.13(c) and (d) respectively. It can be seen in Fig. 6.13(a) and (b) that a reasonable tracking of the desired Th and Tri outputs of 1040°K and 860°K is achieved without overshoot, in line with the prescribed control objectives.

On the other hand, the corresponding simulation results for the FBFR process in the presence of disturbances clearly violate the desired control objectives, as is evident in Fig. 6.14(a) and (b), with large control signal values, as shown in Fig. 6.14(c) and (d). As can be seen in Fig. 6.14, the PID control exhibits some oscillation in tracking the desired Tri outputs in Fig. 6.14(b), whereas the controller could not track the Th reference signal in Fig. 6.14(a). This behaviour is due to the disturbances d(k) introduced into the FBFR process model. This disturbance, combined with the strong nonlinearity associated with the heater and the reactor interior, especially at the start of the experiment, makes the process tend to operate outside the nominal operating region for which the fixed-parameter PID controller was designed, given an accurate model.

Next, a similar simulation is carried out, but with the identified NN model in place of the first-principles model; the NN model is trained and updated at each time step over the 120 samples using the ARLS and MLMA algorithms for different values of d(k) = Cp_cross. To clarify the notion of the disturbance, assume that at time k the control inputs are used to update the FBFR process based on the predicted outputs. After updating the process, the disturbance is added, which changes the process model and may thereby change the process outputs. The objective of the neural network is then to identify a reasonably good model of the process based on the learned dynamics, while the control objective is to track the desired reference as closely as possible at time k+1. A natural question is what happens to the FBFR process during this time lag; a reasonable answer is to update the process more than once within a given sampling interval to avoid any accident.


Fig. 6.13: PID control performance with the first-principles validated model of the FBFR process without disturbances on the model: (a) Th and (b) Tri output predictions, and (c) HRP and (d) DWP control signals.

Fig. 6.14: PID control performance with the first-principles validated model of the FBFR process under disturbances: (a) Th and (b) Tri output predictions, and (c) HRP and (d) DWP control signals.


However, in continuation of the closed-loop PID controller simulation with the FBFR process under disturbances, the same PID parameter values of K_P = 30, K_I = 50 and K_D = 100 were again used for both Tri and Th. All simulations were executed over the same 120 sampling instants by superimposing the disturbance d(k) = Cp_cross on the FBFR system as discussed above. The simulations were done with the network trained by the ARLS and MLMA algorithms under the same disturbance conditions d(k) = Cp_cross for 10 and 100 epochs.

The results for the Th and Tri output predictions are shown in Fig. 6.15(a) and (b), while the manipulated inputs, that is, the HRP for regulating Q and the DWP for regulating the deionized water flow rate, are shown in Fig. 6.15(c) and (d) for models trained with the ARLS and MLMA algorithms for 10 epochs. Similar simulation results for 100 epochs are shown in Fig. 6.16(a)–(d).


Fig. 6.15: Closed-loop PID control performance of the

FBFR process using NN model trained with ARLS

(dotted red lines) and MLMA (black dashed dotted

lines) for 10 epochs: (a) Th and (b) Tri output

predictions, (c) HRP and (d) DWP control signals.

Fig. 6.16: Closed-loop PID control performance of the

FBFR process using NN model trained with ARLS

(dotted red lines) and MLMA (black dashed dotted

lines) for 100 epochs: (a) Th and (b) Tri output

predictions, (c) HRP and (d) DWP control signals.


As can be seen in all these results, a fairly good follow-up of the first-principles model response is achieved by both the ARLS and MLMA models for 10 epochs. It can be observed that the model based on the ARLS exhibits oscillatory behaviour in the Th predictions in Fig. 6.15(a) and (b) as well as in Fig. 6.16(a) and (b). This behaviour is not unusual because of the strong nonlinearity associated with the heater and the reactor interior, especially at the start of the experiment [Voutetakis et al., 2006]. This effect can also be observed in Fig. 6.13(a) and (b) as well as in Fig. 6.14(a) and (b), based on the true first-principles validated FBFR process model. Similar results are obtained with networks trained over 100 epochs; for space-saving reasons these results are not presented.

Comparing the FBFR control performance of Fig. 6.13 and Fig. 6.14 with that of Fig. 6.15 and Fig. 6.16, it is evident that the models obtained with the two proposed identification algorithms give good control performances even with the fixed-parameter PID controller under disturbances. With the ARLS and MLMA training algorithms proposed in this work, changes in the process dynamics appear to be captured adequately. Furthermore, the study has shown that the control performance based on the NN model trained using the MLMA outperforms that based on the ARLS method, as is evident in Fig. 6.15(a) and Fig. 6.16(a).

6.2.3 Validation and Dynamic Performance Comparison of the Proposed MLMA Algorithm with the Backpropagation with Momentum (BPM) and Incremental Backpropagation (INCBP) Algorithms

The backpropagation with momentum (BPM) algorithm is a widely used neural network training algorithm. A version of this algorithm, presented in Section 2.4.4.1 of Chapter 2, is implemented in this work in order to compare its performance with the identification algorithms proposed in this work. The modified Levenberg-Marquardt algorithm (MLMA) is considered here for this comparison due to its superior performance over the adaptive recursive least squares (ARLS) algorithm, which is a true online identification algorithm. The third algorithm included in this comparison is the incremental or online backpropagation (INCBP) algorithm originally proposed by [Hertz et al., 1991]. The INCBP algorithm is easily derived by setting the covariance matrix $P(k) = \mu I$ on the left-hand side of (3.54) in Section 3.3.5 of Chapter 3 under the formulation of the ARLS algorithm; that is:

$\mu I = \left[ \frac{1}{k} \sum_{\iota=1}^{k} R[\iota, \theta_{\tau}(k)] \right]^{-1}$   (6.9)

where $\mu$ is the step size and $I$ is an identity matrix of appropriate dimension. Next, Equation (3.17) in Section 3.3.3 of Chapter 3 is used to update the algorithm, that is:

$\hat{\theta}_{\tau}(k) = \theta_{\tau}(k) - \mu \left. \frac{dJ[\theta(k)]}{d\theta(k)} \right|_{\theta(k) = \theta_{\tau}(k)}$   (6.10)


All that is required is to specify a suitable step size $\mu$ and carry out the recursive computation of the gradient given by (6.10).
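In MATLAB terms, one INCBP update per (6.10) is simply a gradient step; nn_gradient below is a hypothetical function returning the gradient of the criterion for the current sample.

% One incremental backpropagation (INCBP) update as in (6.10): a plain
% gradient step of size mu. nn_gradient(theta, phi, y) is a hypothetical
% function returning the gradient dJ/dtheta at the current sample.
mu    = 1e-4;                          % step size used in Section 6.2.3.1
g     = nn_gradient(theta, phi, y);    % gradient of the criterion J
theta = theta - mu * g;                % updated weight vector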

6.2.3.1 Network Training Using BPM, INCBP and the Proposed MLMA Algorithms

Due to the simplicity and generality of the backpropagation with momentum (BPM) algorithm discussed in Section 2.4.4.1 of Chapter 2 and the incremental backpropagation (INCBP) algorithm discussed above in Section 6.2.3, the NNARX model regression vector $\varphi_{NNARX}(k)$ defined by (6.4) is used as the input vector to the neural network. The outputs of the NN are the predicted values $\hat{Y}(k)$ given by (6.6). The simulation results for Th and Tri are presented, and the predicted outputs of interest are then given as $\hat{Y}(k) = [\hat{y}_{Th}(k) \;\; \hat{y}_{Tri}(k)]^T$.

The two design parameters for the BPM algorithm defined in (2.59) are the learning rate γ = 1e-4 (i.e. the step size) and the momentum term α = 1e-3. The design parameter for the INCBP is the step size μ = 1e-4. Finally, the design parameters for initializing the MLMA algorithm were selected as λ_τ = 0.001, s = 0.05 and δ = 0.01. It was shown in Section 6.2.2 that the performance of the ARLS and MLMA algorithms converges to acceptable values after approximately 100 epochs. In order to assess the convergence performance of the network trained by the three algorithms, the network was also trained for τ = 100 epochs with the following selected parameters: p = 2, q = 6, n_a = 3, n_b = 3, n_φ = 24, n_h = 10, n_o = 6, α_h = 1e-5 and α_o = 1e-4.

The training data are first scaled using (3.89) and the network is trained for τ = 100 epochs using the backpropagation with momentum (BPM), incremental backpropagation (INCBP) and modified Levenberg-Marquardt (MLMA) algorithms. After network training, the trained network is rescaled according to (3.90), so that the resulting network can work with unscaled FBFR data. The convergence of the BPM, INCBP and MLMA algorithms over 100 epochs, evaluated in terms of the performance index, is shown in Fig. 6.17. One can observe that only the MLMA algorithm meets the training goal of 10^-6, while the BPM and INCBP remain around 10^-2 within the prescribed 100 epochs, with slow convergence.

By comparing the convergence of the BPM, INCBP and MLMA algorithms in Fig. 6.17, it can be seen that the BPM and INCBP algorithms initially converge faster, to almost the same values, than the MLMA algorithm. It can also be seen in this figure that the performance index obtained by the network trained using the MLMA algorithm is much smaller than that obtained when the network is trained using the BPM and INCBP algorithms. The summary of the network training results using the BPM, INCBP and MLMA algorithms is presented in Table 6.3 for quick comparison of the performances of the network trained by the three methods.

Table 6.3: Summary of training results for the BPM, INCBP and MLMA algorithms

                                                      BPM         INCBP       MLMA
Computation time for model identification (sec)    4.6250      9.7188      2.7969
Mean square error (MSE)                            4.8768      7.2515      9.2836e-3
Minimum performance index                          2.9012e-2   4.0337e-2   3.8297e-6
Mean error of one-step ahead prediction
  of training data                                 7.5093e-3   2.5112e-3   5.7459e-5
Mean error of one-step ahead prediction
  of test data                                     3.0647e+2   4.4755e+2   3.6595e-4
Mean value of 5-step ahead prediction error        3.0548      2.4241      1.6901e-3
Akaike's final prediction error (AFPE) estimate    2.4126e+1   3.8799e+1   2.4945e-4

Fig. 6.17: Network convergence using the BPM, INCBP and the MLMA algorithms (performance index vs. epoch).

The computation times for training the network for 100 epochs using each of the three algorithms are shown in the first row of Table 6.3. As can be seen, despite the fact that the BPM and INCBP algorithms converge quickly at first, the MLMA is about 1.6536 and 3.4748 times faster than these two algorithms respectively in terms of computation time, while retaining a smaller performance index value. The mean square errors (MSE) and minimum performance indices for the networks trained with the BPM, INCBP and MLMA algorithms are given in the second and third lines of Table 6.3. Again, the MLMA algorithm has a smaller mean square error (MSE) and minimum performance index when compared to the BPM and INCBP algorithms, which shows that the MLMA performs much better than the BPM and INCBP algorithms for the same number of iterations (100 epochs). These small errors suggest that the network trained with the MLMA algorithm approximates the FBFR system model with better accuracy than the networks trained with the BPM and INCBP algorithms.

6.2.3.2 Validation of the Network Trained by BPM, INCBP and MLMA Algorithms

In order to verify the prediction capability of the networks trained using the BPM, INCBP and MLMA algorithms, the same validation techniques used in Section 6.2.2.1 are again used here to evaluate the performance of the trained networks, namely: one-step ahead prediction, 5-step ahead prediction and the Akaike's final prediction error (AFPE) estimate, which were discussed in Chapter 3. The networks trained by the BPM, INCBP and the proposed MLMA algorithms for the FBFR process have been validated with the scaled and unscaled training data as well as with the 300 test data obtained directly from the FBFR process.

6.2.3.2.1 One-Step Ahead Predictions Simulation for the FBFR Process

As in Section 6.2.2.1.1, in the one-step ahead prediction method the training data obtained from the open-loop simulation of the differential equation model, which were scaled and used for training the network, are compared with the one-step ahead output predictions of the trained network, and an assessment of their corresponding errors is made.

The comparison of the one-step ahead predictions of the scaled training data (target output, blue -) against the trained network output predictions (red -.-) by the networks trained for 100 epochs using the BPM, INCBP and MLMA algorithms is shown in Fig. 6.18(a) and (b) for Tri and Th respectively.

The one-step ahead prediction errors for predicting the scaled training data by the networks trained using the BPM, INCBP and MLMA algorithms are given in the fourth line of Table 6.3. As can be seen in Fig. 6.18(a) and (b) for Tri and Th respectively, the predictions by the network trained using the MLMA algorithm closely match the original training data, whereas much larger prediction mismatches are obtained with the networks trained using the BPM and INCBP algorithms. The smaller one-step ahead prediction error obtained by the network trained with the MLMA, compared to those of the BPM and INCBP algorithms, is also evident in the fourth line of Table 6.3. This error is an indication that the network trained using the MLMA algorithm captures and approximates the nonlinear dynamics of the FBFR accurately. This is further justified by the small MSE obtained using the MLMA algorithm, given in the second line of Table 6.3.

Furthermore, the suitability of the BPM, INCBP and the proposed MLMA algorithms for neural network model identification in the FBFR industrial environment is investigated by validating the trained networks with the 300 unscaled test data obtained directly from the real FBFR process. The comparison of the trained network predictions (red --*) of the test data with the actual test data (blue -) for 100 epochs is shown in Fig. 6.19(a) and (b) for Tri and Th respectively for the BPM, INCBP and MLMA algorithms. It is evident that the unscaled test data predictions by the network trained using the MLMA algorithm match the true test data to a high accuracy when compared to those obtained by the networks trained using the BPM and INCBP, although the BPM shows a slightly improved performance over the INCBP. The superior performance of the proposed MLMA algorithm over the BPM and INCBP algorithms proves the effectiveness of the proposed MLMA approach.



Fig. 6.18: Comparison of (a) Tri and (b) Th training data predictions by the network trained using backpropagation with

momentum (BPM), incremental backpropagation (INCBP), and the proposed MLMA algorithms.

The one-step ahead prediction accuracies of the unscaled test data by the networks trained using the BPM, INCBP and MLMA algorithms are evaluated by the computed mean prediction errors shown in the fifth line of Table 6.3. It can be seen that the one-step ahead test data prediction errors by the network trained using the MLMA algorithm are much smaller than those obtained from the networks trained using the BPM and INCBP algorithms.


These one-step ahead unscaled test data prediction results given in Fig. 6.19, as well as the one-step ahead prediction errors in Table 6.3, confirm that the network trained using the MLMA algorithm mimics the dynamics of the FBFR and that the resulting network can be used to model the actual FBFR process in an industrial environment.


Fig. 6.19: Comparison of (a) Tri and (b) Th test data predictions by the network trained using backpropagation with

momentum (BPM), incremental backpropagation (INCBP), and the proposed MLMA algorithms.


6.2.3.2.2 K–Step Ahead Prediction Simulations for the FBFR Process

The results of the K-step ahead output predictions (red --*), obtained with the K-step ahead prediction validation method discussed in Chapter 3 for 5-step ahead output predictions (K = 5), are compared with the unscaled training data (target output) in Fig. 6.20 (a) and (b) for Tri and Th respectively, for the networks trained using the BPM, INCBP and MLMA algorithms. Again, the value K = 5 is chosen since it is a typical prediction horizon in most model predictive control (MPC) applications.

Fig. 6.20: Comparison of the 5-step ahead output predictions (red --*) of the NN for (a) Tri (°K) and (b) Th (°K), when it is trained by the BPM, INCBP, and the proposed MLMA algorithms, with the original unscaled training data (blue -), plotted against the number of training data samples.


The comparison of the 5-step ahead output prediction performance of the networks trained using the BPM, INCBP and MLMA algorithms shows the superior performance of the MLMA algorithm over the BPM and INCBP algorithms for distant or multi-step ahead predictions.

The computation of the mean value of the K-step ahead prediction error (MVPE) using (3.93) gives 3.0548, 2.4241 and 1.6901e-3 for the networks trained using the BPM, INCBP and MLMA algorithms respectively, as shown in the sixth line of Table 6.3. The relatively small MVPE obtained by the network trained with the MLMA algorithm is an indication that the trained network approximates the dynamics of the FBFR process to a high degree of accuracy.
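For concreteness, the validation step above can be sketched in MATLAB as follows. This is a minimal illustration assuming (3.93) is the mean of the absolute K-step ahead prediction errors over the data set; the function k_step_predict is a hypothetical stand-in for the trained network iterated K steps ahead, not the code used in this work.

K = 5;                                          % prediction horizon used above
[N, q] = size(y);                               % y: N-by-q matrix of target outputs
e = zeros(N - K, q);
for k = 1:(N - K)
    yhat    = k_step_predict(net, u, y, k, K);  % hypothetical K-step predictor
    e(k, :) = y(k + K, :) - yhat;               % K-step ahead prediction error
end
MVPE = mean(abs(e(:)))                          % mean value of the prediction error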

6.2.3.2.3 Akaike’s Final Prediction Error (AFPE) Estimates for the FBFR Process

The implementation of the AFPE algorithm discussed in Chapter 3 and defined by (3.94) for the regularized criterion, for the networks trained with the BPM, INCBP and MLMA algorithms with multiple weight decay, gives the respective AFPE estimates of the three algorithms listed in the last line of Table 6.3. These small values of the AFPE estimate indicate that the trained networks capture the underlying dynamics of the FBFR system and that the networks are not over-trained [Sjöberg and Ljung, 1995]. This in turn implies that optimal network parameters, including the weight decay parameters, have been selected. Again, the AFPE estimates obtained with the network trained using the MLMA algorithm are slightly smaller than those obtained using the BPM and INCBP algorithms.
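As a rough illustration of how such an estimate is formed, the classical unregularized Akaike final prediction error can be computed as below; the regularized expression (3.94) used in this work additionally involves the weight decay parameters and is not reproduced here, so this sketch is only an approximation of the reported quantity.

N    = length(e);                  % number of training samples
d    = numel(theta);               % total number of adjustable network weights
V_N  = sum(e.^2) / (2*N);          % unregularized prediction error criterion
AFPE = V_N * (N + d) / (N - d)     % classical (unregularized) Akaike FPE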

6.2.3.3 Performance Comparison of the BPM, INCBP and the MLMA Algorithms

The simulation results for the neural network training using the BPM, INCBP and MLMA algorithms, as well as the network validation results for the FBFR process, are shown in Fig. 6.17, Fig. 6.18, Fig. 6.19 and Fig. 6.20 respectively, whereas the numerical results for these three algorithms are presented in Table 6.3.

The numerical results summarized in Table 6.3 show that the network trained using the MLMA algorithm presents the best performance, as the smallest values in all six properties are achieved. In addition, the smallest value of the AFPE is obtained using the MLMA algorithm, which indicates that the MLMA algorithm captures the essential dynamics of the process and that the identified NN model approximates the true system to a high degree of accuracy. The small mean value of the 5-step ahead prediction error (MVPE) is an indication that the MLMA algorithm can be used in adaptive predictive control applications due to its accurate 5-step ahead predictions when compared to the BPM and INCBP algorithms. Furthermore, since a relatively small performance index is obtained within a relatively short training of 100 epochs, this algorithm is more appropriate for applications with real-time requirements.


6.2.4 Validation and Performance Evaluation of the Proposed AGPC and NAMPC

Algorithms for the Model-Based Adaptive MPC of the FBFR Process

In Chapter 4, an adaptive generalized predictive control (AGPC) algorithm and a nonlinear adaptive model predictive control (NAMPC) algorithm were proposed. In this sub-section, the performances of both control algorithms are evaluated. The model structure considered here is the NNARMAX model, which was used in Section 6.2.2 to investigate the efficiencies of the ARLS and the MLMA algorithms in neural network training.

The main control objective here is to ensure that, by manipulating the high resistance potentiometer (HRP) and the deionized water pump (DWP) flow rate, the temperatures of the electric heater (Th) and the reactor interior (Tri) do not exceed 1040 °K and 860 °K respectively, without any overshoot, throughout the catalyst processing and deactivation process. As stated in [Voutetakis et al., 2006], a relatively small overshoot above 2% might give unacceptable final product properties.

Before implementing the AGPC and the NAMPC control strategies explained in Chapter 4, the neural network model obtained using the adaptive recursive least squares (ARLS) algorithm in Section 6.2.2 and validated in Section 6.2.2.1 is used to tune the two controllers subject to the constraints in Table 6.4. The model trained using the ARLS algorithm is used because of its simplicity and because it is a truly online identification algorithm. As shown in Table 6.4, minimum and maximum input and output constraints are imposed for the efficient and tight control of the FBFR, such that the controlled outputs will follow the desired reference signal without any overshoot, provided that these constraints are not violated. Once an initially tuned controller becomes available, the adaptive implementation of the AGPC and NAMPC control algorithms proceeds by using the ARLS algorithm online for training the NN each time new data become available from the closed-loop plant operation.

The AGPC and NAMPC algorithms are shown pictorially in Fig. 4.1 and Fig. 4.4 respectively in Chapter 4. Here, the systems in Fig. 4.1 and Fig. 4.4 both correspond to the FBFR process. The neural network models in both Fig. 4.1 and Fig. 4.4 are based on the NNARMAX model identification scheme illustrated in Fig. 3.4(b) in Chapter 3.

First, the optimal closed-loop tuning parameters for the AGPC and the NAMPC controllers, based on the use of the identified neural network model trained with the ARLS algorithm, are given in Table 6.5. The initial control input (ICI) and initial predicted output (IPO) are used to initialize the iterative solutions of the two controllers. The remaining tuning parameters are discussed in Chapter 4, where N_d, N_u and N_p are the minimum, control and prediction horizons respectively; κ and ρ are weighting factors for penalizing changes on the control inputs U(k) and predicted outputs Ŷ(k); λ is the Levenberg-Marquardt parameter, which is also the adaptation parameter for the NAMPC controller; A_m and B_m are the first-order digital reference filter design parameters; δ is the radius of the trust region within which the optimal control signal can be found, a design parameter used in the NAMPC algorithm; and U_iter is the maximum number of iterations allowed for determining the optimal control signal at each sampling instant.


Table 6.4: Constraints for the FBFR Process

                                             AGPC            NAMPC
  Process Constraints                      Th     Tri      Th     Tri
  Minimum control input (Umin)              0      0        0      0
  Maximum control input (Umax)            200    200      200    200
  Minimum predicted output (Ymin)           0      0        0      0
  Maximum predicted output (Ymax)        1200   1000     1040    860
  Desired reference signal (R')          1040    860     1040    860

Table 6.5: The AGPC and the NAMPC tuning parameters for the FBFR process

                                             AGPC                 NAMPC
  Tuning Parameters                       Th       Tri         Th       Tri
  Initial control input (ICI, U)         -80       -80        -10       -10
  Initial predicted output (IPO, Y)        0         0          0         0
  Nd                                       1         1          1         1
  Nu                                       4         3          3         2
  Np                                       7         7          7         5
  κ                                      1.5         1        1.5         1
  ρ                                     0.08       0.8          1      0.08
  λ                                        -         -        0.1       0.7
  Am                                [1 -0.7]  [1 -0.7]   [1 -0.7]  [1 -0.7]
  Bm                                 [0 0.3]   [0 0.3]    [0 0.3]   [0 0.3]
  δ                                        -         -       1e-6      1e-4
  Uiter                                    -         -         10        10

These controller tuning parameters are obtained a priori using the model of the system, which in this case is the trained neural network model. Using the off-line neural network model, the controller is tuned to obtain the optimal tuning parameters shown in Table 6.5, together with an initial value for the adaptation parameter λ. The main control parameter that is adjusted adaptively is λ, which is initially selected around the current operating point of the model used by the controller.

The input vector to the neural network is the NNARMAX model regression vector φ_NNARMAX(k, θ(k)) defined by (6.4). The setup of the neural network identification scheme is similar to that discussed in Sections 6.2.1.4 and 6.2.2. The input φ_nc(k, θ(k)) is initialized to a small positive random matrix of dimension n_c by n_c. The outputs of the neural network are the predicted values of Ŷ(k) given by (6.5). However, the simulation results of interest are the temperatures of the heater (Th) and the reactor interior (Tri), given as Ŷ(k) = [ŷ_Th(k)  ŷ_Tri(k)]^T, and the control inputs are U(k) = [u_Q(k)  u_DWP(k)]^T, where u_Q(k) = u_HRP(k) depends on the control input of the high resistance potentiometer (HRP).

Next, the AGPC and the NAMPC control strategies are programmed to run in closed loop with the first-principles model of the FBFR process given in Appendix B and implemented in the MATLAB environment. Again, the disturbance d(k) discussed in Section 6.2.2.2 is introduced by changing the product of the density (ρ_h) and heat capacity (Cp_h) of the electric heater. Except that here, the changes in the FBFR process model were made by taking ±30% of Cp_cross = ρ_h · Cp_h over a period of 350 sampling instants, which corresponds to the number of


simulation samples considered here for the closed-loop identification and adaptive control of the process. The lower and upper values of ±30% of Cp_cross are 5.34660×10^5 and 9.92940×10^5 respectively, in steps of 1.3131×10^3 each.
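A minimal sketch of this disturbance ramp is given below; the endpoint values and the number of sampling instants follow the text, while the variable names are illustrative.

Cp_cross = linspace(5.34660e5, 9.92940e5, 350);   % +/-30% sweep of rho_h*Cp_h
step     = Cp_cross(2) - Cp_cross(1)              % approx. 1.3131e3 per sample
d        = Cp_cross;                              % disturbance d(k), k = 1,...,350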

At time k, the initial control input U(k) and the disturbance d(k) in Fig. 4.1 and Fig. 4.4 are applied to the system to obtain an a priori system output, which is used together with the known system output to identify the neural network model of the system by the neural network-based ARLS training algorithm considered here. On the basis of the identified neural network model, the AGPC and the NAMPC control signals are computed and applied to the system such that the predicted output follows the desired reference signal. At time k+1, the new control input and the disturbance are applied to the system, and the entire identification and control strategy is repeated to compute another control input which will make the predicted output track the desired reference signal. Note that both the control input and the disturbance influence the system's output. The neural network identification scheme then attempts to identify this new model using the teacher forcing method with the known system output from experimental data. The function of the adaptive control strategies is to compute a control input which makes the predicted system output follow the desired reference signal.
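The identification-and-control cycle just described can be summarized by the following sketch; every function name is a hypothetical stand-in for the plant model of Appendix B, the ARLS update of Chapter 3 and the AGPC/NAMPC computation of Chapter 4.

for k = 1:350
    y   = fbfr_model(u, d(k));              % plant response to U(k) and d(k)
    net = arls_update(net, u, y);           % online NNARMAX model identification
    u   = mpc_control(net, Rref(:, k));     % AGPC or NAMPC control signal
end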

All the training data are first scaled using (3.89), the network is trained for τ = 100 epochs using the adaptive recursive least squares (ARLS) algorithm, and the trained network is then rescaled according to (3.90). In this way, the resulting weights of the network can be applied to the unscaled control inputs calculated by the controller. As in Section 6.2.2, the following network parameters were selected here: p = 2, q = 6, n_a = 3, n_b = 3, n_c = 3, n_φ = 42, n_h = 10, n_o = 6, α_h = 1e-5 and α_o = 1e-4. The four design parameters for the adaptive recursive least squares (ARLS) algorithm defined in (3.68) are selected to be: α = 0.5, β = 5e-3, δ' = 1e-5 and π = 0.99, resulting in γ = 0.0101. The initial values of ēmin and ēmax in (3.67) are 0.0102 and 1.0106e+3 respectively and were evaluated using (3.67). Thus, the ratio ēmax/ēmin in (3.66) is 9.9018e+4, which implies that the parameters are well selected.
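The scale/train/rescale cycle can be sketched as follows, assuming (3.89) is the usual zero-mean, unit-variance scaling and (3.90) its inverse applied to the network weights; the trainer and the rescaling helper are hypothetical stand-ins, not the code used in this work.

mu  = mean(Z);  sg = std(Z);                % Z: N-by-(p+q) input-output data
Zs  = (Z - repmat(mu, size(Z,1), 1)) ./ repmat(sg, size(Z,1), 1);
net = arls_train(net, Zs, 100);             % hypothetical ARLS trainer, tau = 100
net = rescale_weights(net, mu, sg);         % hypothetical helper per (3.90)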

6.2.4.1 Comparison of Simulation Results for the Control Performance of AGPC

and NAMPC for the FBFR Process Identification and Control

The Th and Tri predictions from the closed-loop AGPC and NAMPC control are shown in Fig. 6.21 (a)–(b), while the control inputs, that is, the HRP and DWP signals, are shown in Fig. 6.21 (c)–(d). In this simulation, we allow the constraints on the maximum predicted outputs to be 1200 °K and 1000 °K for Th and Tri respectively, as can be seen in Table 6.4, so that any overshoot can easily be observed.

The identification and control simulation results shown in Fig. 6.21 indicate the observed predicted outputs based on the computation of the control inputs in response to the desired setpoint changes, in the presence of the disturbances applied to the system as discussed in the previous sub-section. The idea here is to observe and compare the tracking of the desired reference signal by the AGPC and the NAMPC control strategies.


Fig. 6.21: FBFR temperature predictions by AGPC (blue --) and NAMPC (red -) for (a) Th (°K) and (b) Tri (°K), with the manipulated signals (c) HRP (KΩ) and (d) DWP (m/sec) used to track the desired reference signal (green .-), plotted against the number of samples.

As can be seen in Fig. 6.21, the NAMPC presents better performance than the AGPC. The AGPC exhibits a small overshoot with occasional oscillations before tracking the desired reference signals, whereas in certain instances it cannot track the desired reference at all, as shown in Fig. 6.21 (a) and (b). The unusual behaviour, namely the inability of the AGPC to track the heater temperature (Th) at the start of the first rise in temperature, can be attributed to the strong nonlinearity inherent in the dynamics of the heater and the initial heat-up. However, the AGPC is able to maintain the desired temperature of the reactor interior, which is the more important control objective [Voutetakis et al., 2006]. As can also be observed in Fig. 6.21(a), the difficulty with the AGPC controller appears during changes in the state of the heater temperature, either from lower to higher or from higher to lower temperatures.


Fig. 6.22: Computation time (sec) for the parallel implementation of the identification and control strategies at each time sample: (a) AGPC for the FBFR process and (b) NAMPC for the FBFR process, plotted against the number of samples.

6.2.4.2 Computation Time for the Neural Network Identification and Control of the

FBFR Process

The neural network identification scheme and the AGPC and NAMPC control strategies above were implemented on an Intel® Core™2 Quad CPU Q8400 running at 2.66 GHz by using the “parfor” command available in the MATLAB Parallel Computing Toolbox. The “parfor” command executes the computations specified in its body in parallel, using the four Intel® processor cores available on the computer system. The parallel session is initialized using the MATLAB “matlabpool open x” command, where x specifies the number of processors to be used in the computation; after the computation, the parallel session is closed using the command “matlabpool close”.
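A minimal sketch of such a parallel session, following the commands named above, is given below; the worker function is a hypothetical stand-in, and the actual decomposition of the identification and control computations used in this work is not reproduced here.

matlabpool open 4                  % one MATLAB worker per processor core
t = zeros(350, 1);
parfor k = 1:350
    tic;
    run_ident_and_control(k);      % hypothetical per-sample worker
    t(k) = toc;                    % per-sample computation time (cf. Fig. 6.22)
end
matlabpool close                   % release the parallel session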

The plot of the computation time at each time sample is shown in Fig. 6.22. The average computation times for the identification and control of the FBFR using the AGPC and the NAMPC algorithms, calculated from Fig. 6.22(a) and (b), were found to be 3.1207 and 8.0809 seconds respectively.

As can be seen in Fig. 6.21, the closed-loop neural network-based model identification and control using the NAMPC strategy outperforms the AGPC, but at the expense of extra computation time, as shown in Fig. 6.22. The computation time required by the NAMPC is approximately 2.6 times that taken by the AGPC.

6.2.5 Implementation of the PID and NAMPC Algorithms Over the Service-Oriented

Architecture Cluster Network and Their Performance Evaluation

In order to verify the extent to which the simulated results are in compliance with reality, and especially to make sure that the execution time of the algorithms can be kept within the sampling time of the FBFR process, the


implementation of the algorithms on the hardware-in-the-loop simulator discussed in Chapter 5 was attempted. The objective here is to investigate whether the proposed identification and control strategies can meet the real-time constraints of the FBFR process in an industrial environment with respect to the sampling time of the FBFR process, in the presence of process constraints and external disturbances d(k).

As was shown in the previous sub-section 6.2.4, the NAMPC outperforms the AGPC algorithm in control performance. The implementation here is therefore restricted to the NAMPC algorithm and the classical PID algorithm used in the current control system of the real plant. As far as the neural network model identification scheme is concerned, the implementation is again restricted to the better of the two neural network-based model identification algorithms studied, that is, the modified Levenberg-Marquardt algorithm (MLMA) with the related NNARMAX model identification scheme.

In order to evaluate the online performance of the proposed identification and control strategies for the FBFR process, the proposed NAMPC is considered due to its superior control performance over the AGPC controller, as shown in the previous sub-section 6.2.4. Here, the neural network-based NAMPC control strategy of Fig. 4.4 and the PID control scheme of Fig. 6.12 are used together with the neural network model identification scheme of Fig. 3.5(b), in closed loop with the FBFR process, in order to evaluate the implementation of the identification and control strategy on the hardware-in-the-loop simulator.

The reference or setpoint signal used for evaluating the performance of the NAMPC and the PID controllers for the FBFR process is based on first-order temperature set-point variations similar to the one used in the original FBFR problem formulation in [Voutetakis et al., 2006], and is given here as:

R'(k) = T_{start} + T_{step} [1 - \exp(-80k/500)]    (6.11)

where T_start and T_step denote the temperature level before the initiation of the experiment and the final set-point value of the temperature. Note that the coefficients in the exponential term influence the first-order change of the set point. T_start for the heater (Th) and the reactor interior (Tri) is 0 °K in both cases, whereas T_step for Th and Tri is 1040 °K and 860 °K respectively. One advantage of this first-order change is that it avoids abrupt changes in the control inputs arising from a large step change in the reference signal.
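A minimal sketch of (6.11) for both controlled temperatures is given below; the number of sampling instants is illustrative.

k  = (0:349)';                 % sampling instants
f  = 1 - exp(-80*k/500);       % first-order rise profile of (6.11)
Rh = 0 + 1040*f;               % Th set point: T_start = 0, T_step = 1040 oK
Rt = 0 + 860*f;                % Tri set point: T_start = 0, T_step = 860 oK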

The closed-loop identification and control scheme considered here for the PID controller is the one shown in Fig. 6.12, while that for the NAMPC control scheme is the one shown in Fig. 4.4. The FBFR process shown in Fig. 6.12 is replaced by the neural network model based on the NNARMAX model identification scheme illustrated in Fig. 3.5(b) in Chapter 3. Similarly, the neural network model used in Fig. 4.4 is based on the same NNARMAX model identification scheme.

The input vector to the neural network is the NNARMAX model regression vector φ_NNARMAX(k, θ(k)) defined by (6.5). The setup of the neural network identification scheme here is the same as that discussed in Section 6.2.2. The input φ_nc(k, θ(k)) is initialized to a small positive random matrix with values in the range (0, 0.05) and of


Table 6.6: FBFR Process Constraints

                                           PID Controller      NAMPC
  Process Constraint Parameters             Th     Tri       Th     Tri
  Initial control input, U(k)             -100    -100     -100    -100
  Initial control output, Ŷ(k)               0       0        0       0
  Minimum control input, ΔUmin               0       0        0       0
  Maximum control input, ΔUmax             200     200      200     200
  Minimum predicted output, Ymin             0       0        0       0
  Maximum predicted output, Ymax          1200    1000     1200    1000
  Maximum reference signal, R'(k)         1040     860     1040     860

Table 6.7: Tuning Parameters

                               PID Controller              NAMPC
  Tuning Parameters            Th         Tri          Th         Tri
  Nd                            -           -           1           1
  Nu                            -           -           2           2
  Np                            -           -           7           5
  κ                             -           -         1.5           1
  ρ                             -           -        0.08        0.08
  λ                             -           -         0.1         0.7
  δ(τ)                          -           -        1e-6        1e-4
  Am                   [1.0 -0.7]  [1.0 -0.7]  [1.0 -0.7]  [1.0 -0.7]
  Bm                    [0.0 0.3]   [0.0 0.3]   [0.0 0.3]   [0.0 0.3]
  Uiter                         -           -          10          10
  KP                          500         250           -           -
  KI                         1000         500           -           -
  KD                           30          50           -           -

dimension n_c by n_c. The outputs of the neural network are the predicted values of Ŷ(k) given by (6.6). However, the simulation results of interest are again the temperatures of the heater (Th) and the reactor interior (Tri). Thus, the predicted outputs are Ŷ(k) = [ŷ_Th(k)  ŷ_Tri(k)]^T and the control inputs are U(k) = [u_Q(k)  u_DWP(k)]^T, where u_Q(k) = u_HRP(k) is the control input of the high resistance potentiometer (HRP).

Before implementing the NAMPC and the PID control strategies, the neural network model of the FBFR process is identified and trained using the MLMA algorithm, as discussed in Section 6.2.2, for 100 epochs under the constraints given in Table 6.6 in order to obtain the initial network parameters. The optimal values of the initial tuning parameters for the PID and NAMPC controllers are given in Table 6.7.

Next, the NAMPC and PID control strategies are programmed to run in closed loop with the first-principles model of the FBFR process given in Appendix B and implemented in the MATLAB environment. Here, the disturbance d(k) is introduced by changing the product of the density (ρ_h) and heat capacity (Cp_h) of the electric heater, as discussed in the second paragraph of Section 6.2.2.2 and near the end of Section 6.2.4. The only difference in the disturbances here is that the changes in the FBFR


process model are made by taking ±30% of Cp_cross = ρ_h · Cp_h = 7.63800×10^5 across 120 samples, which corresponds to the number of simulation samples considered here for the closed-loop identification and adaptive control of the FBFR process. The lower and upper values of ±30% of Cp_cross are 5.34660×10^5 and 9.92940×10^5 respectively, in steps of 3.8511×10^3.

At each of the 120 neural network model identification and adaptive control sequences, the training data are first scaled using (3.89), the network is trained for τ = 100 epochs using the modified Levenberg-Marquardt algorithm (MLMA), and the trained network is then rescaled according to (3.90), so that the resulting weights can work with the unscaled control inputs calculated by the controllers for the FBFR process. As in Section 6.2.2, the following network parameters were selected: p = 2, q = 6, n_a = 3, n_b = 3, n_c = 3, n_φ = 42, n_h = 10, n_o = 6, α_h = 1e-5 and α_o = 1e-4. Also, the three design parameters required to initialize the MLMA algorithm were selected as λ_τ = 0.001, s = 0.05 and δ = 0.01.

6.2.5.1 Results of the Closed-Loop Simulation

The closed-loop PID and NAMPC control performance for the Th and Tri output predictions is shown in Fig. 6.23(a) and (b), while the control inputs, HRP and DWP, are shown in Fig. 6.23(c) and (d). The prediction errors due to the PID and NAMPC controllers are shown in Fig. 6.23(e) and (f). In this simulation, we allowed the constraints on the maximum predicted outputs to be 1200 °K and 1000 °K for Th and Tri respectively (see Table 6.6) in order to observe any overshoot. As can be seen in Fig. 6.23(a) and (b), the NAMPC shows better control performance than the PID controller. The PID controller exhibits overshoot with oscillations, hardly tracks the desired reference signals in Fig. 6.23(a) and (b), and produces significant output prediction errors, as is evident in Fig. 6.23(e) and (f).

Thus, at each sampling instant k, a new input-output data pair produced by the FBFR process at the device level, due to the changes in d(k) = Cp_cross = ρ_h · Cp_h, is fed to the computer executing the neural network model identification and control scheme, as shown in Fig. 5.1. This new input-output data pair is progressively added to Z^N in a first-in first-out fashion, where the oldest data pair in Z^N is progressively discarded to maintain a fixed size of Z^N. The first n_a and n_b data pairs in Z^N are used to construct the terms defined in the NNARMAX model regression vector φ_NNARMAX(k, θ(k)), exploiting the fact that the a posteriori estimate of φ_nc(k, θ(k)) is known at future times. In this way, the newly added input-output data pair is included in the regression vector, and the regressors consist of the current input and output states of the FBFR process at each time sample. After the identification and control action at the current sampling instant k, the currently calculated control input U(k) is delivered again over the SOA cluster network to the plant at the device level. At the next


Fig. 6.23: FBFR temperature predictions by the PID controller (blue --) and NAMPC (red -) for (a) Th and (b) Tri, with the reference signal shown in pink (-.-); the control signals (c) HRP (KΩ) and (d) DWP (m/sec); and the output prediction errors (e) and (f) for Th and Tri respectively, for k = 350 samples.

sampling instant, the entire operation is repeated. The data exchange between the plant (that is, the FBFR process) and the identification and control scheme is performed at the plant and enterprise levels over the proposed SOA cluster network.
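The first-in first-out update of the data window Z^N described above can be sketched as follows; the matrix layout and the regressor-building helper are illustrative rather than the code used in this work.

Z   = [Z(2:end, :); [U_k, Y_k]];          % drop the oldest pair, append the newest
phi = nnarmax_regressor(Z, na, nb, nc);   % hypothetical regressor builder per (6.5)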

In order to demonstrate the benefits of utilizing this service-oriented architecture (SOA) cluster network, the same data are delivered over a simple DPWS-based traditional Ethernet network. Both networks consist of six


Table 6.8: Data volume of the DPWS messages (in bytes) from the FBFR process

  Message Type                       DPWS
  Probe                               663
  Probe match                        1199
  Hello                              1045
  Device transfer Get                 751
  Service transfer Get                751
  Device transfer Get response       2008
  Service transfer Get response      1772
  Subscribe                          1424
  Subscription response               981
  Control                             575
  Notification                        726

sensors corresponding to the six state output derivatives, two actuators for the current FBFR process inputs, and one component associated with the proposed identification and control scheme. In the traditional Ethernet network the aforementioned components are interconnected with each other through an Ethernet bus, while in the proposed SOA cluster network they are interconnected through two Ethernet switches, complying in this way with the architecture depicted in Fig. 5.1. The performance of both networks is studied during eventing and control level interactions, for the reasons explained in Section 5.2.1. During these interactions HTTP is used, and so TCP connections are established. Therefore equation (5.7) can be used for calculating the worst-case overall control loop delay in the proposed SOA cluster network. The same equation can be used for determining the worst-case overall control loop delay in the traditional Ethernet network, as long as D_pr1, D_pr2 are the overall processing delays that the TCP data segments experience in the traditional Ethernet network.

In Table 6.8 the data volumes of the DPWS messages produced by the FBFR control are listed. The size of the TCP data segment transmitted from the device level to the control system is 972 octets (a 726-octet notification message plus 180 octets of HTTP headers, 20 octets of TCP headers, 20 octets of IP headers and 26 octets of MAC/DLL/PHY headers), while the size of the TCP data segment transmitted from the control system to the device level is 821 octets (a 575-octet control message plus 180 octets of HTTP headers, 20 octets of TCP headers, 20 octets of IP headers and 26 octets of MAC/DLL/PHY headers). Also, a constant bit rate (CBR) application was developed that produces the DPWS eventing and control messages simultaneously, at a rate of one per minute, for every device residing at the device level. This application was developed over TCP/IP. Lastly, 100 traffic generators were plugged into each of the networks. These generators produce additional packets of 1246 octets (a 1000-octet application message plus 180 octets of HTTP headers, 20 octets of TCP headers, 20 octets of IP headers and 26 octets of MAC/DLL/PHY headers), irrelevant to the control application, every 120 milliseconds using the CBR application. Next, a comparison is made between the two networks in order to verify the efficacy of the proposed SOA cluster network.
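The octet counts quoted above can be checked with the following arithmetic, which uses only the header sizes stated in the text.

headers      = 180 + 20 + 20 + 26      % HTTP + TCP + IP + MAC/DLL/PHY octets
notification = 726 + headers           % device -> control system: 972 octets
control      = 575 + headers           % control system -> device: 821 octets
background   = 1000 + headers          % traffic generator packets: 1246 octets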


Fig. 6.24: The D_tr1 + D_tr2 delay (in seconds) between the FBFR process and the control system, for the DPWS over a traditional Ethernet network and the DPWS over the switched Ethernet (SOA cluster) network, plotted against the number of samples.

6.2.5.2 Worst Case Overall Control Loop Delay Introduced by a DPWS-Based

Traditional Ethernet Network

In the simple DPWS-based traditional Ethernet network all the components comply with the DPWS specification. The simulation results obtained are shown in Fig. 6.24. As shown, the network never reached a stable state, and so no predictions could be made about the transmission delay (D_tr1 + D_tr2). As can also be seen in Fig. 6.24, the transmission delay sometimes exceeds the sampling period of the FBFR process, its maximum value being 120.2790 seconds. D_s_p = D_t_p = D_cs_p = D_ct_p, which is the DPWS protocol stack response time, is defined to be approximately 10 ms [Jammes et al., 2005]. Also, D_s_p_tcp = D_t_p_tcp is observed to be approximately 300 μs, while the average D_c is approximately 2.7 seconds, and so D_pr1 + D_pr2 = 2.7424 seconds. Therefore, the worst-case overall control loop delay is calculated using (5.7) to be equal to 123.0214 seconds, and the DPWS-based traditional Ethernet network cannot fulfill the real-time characteristics of the FBFR process, as is evident in the online closed-loop simulation result of Fig. 6.25.
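A worked check of this figure is given below, assuming (5.7) adds the worst-case transmission delay D_tr1 + D_tr2 to the overall processing delay D_pr1 + D_pr2, as the text implies.

Dpr     = 2.7424;             % D_pr1 + D_pr2 (sec), as computed above
Dtr_max = 120.2790;           % worst observed D_tr1 + D_tr2 (sec), Fig. 6.24
D_wc    = Dtr_max + Dpr       % = 123.0214 sec, above the 120 sec sampling period
% the same sum for the SOA cluster network of Section 6.2.5.3,
% where D_tr1 + D_tr2 = 0.1023 sec, gives 2.8447 sec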

The somewhat poor performance of the NAMPC in tracking the desired reference is due to the transmission delays introduced by the network. As depicted in Fig. 6.25, the NAMPC sometimes tracks and sometimes fails to track the desired reference, depending on whether the transmission delay introduced by the network is below or above the sampling time of the FBFR process (see Fig. 6.24).


Fig. 6.25: Online identification and control of the FBFR process over the DPWS implemented over a traditional Ethernet network: (a) Th and (b) Tri predictions with their respective control signals (c) HRP (KΩ) and (d) DWP (m/sec).

Fig. 6.26: Online identification and control of the FBFR process over the proposed SOA cluster network: (a) Th and (b) Tri predictions with their respective control signals (c) HRP (KΩ) and (d) DWP (m/sec).

6.2.5.3 Worst Case Overall Control Loop Delay Introduced by the Proposed

Service-Oriented Architecture (SOA) Cluster Network Based on the DPWS

The proposed SOA cluster network is based on the DPWS implemented on top of a switched Ethernet architecture. D_tr1 + D_tr2 was observed to be 0.1023 seconds at each sampling time (except the first one), as shown in Fig. 6.24. Moreover, the overall processing delay (D_pr1 + D_pr2) is equal to the one calculated for the DPWS-based Ethernet network. So the worst-case overall control loop delay is calculated using (5.7) to be 2.8447 seconds, which is below the sampling period of the FBFR process. Therefore the proposed SOA cluster network fulfills the real-time requirement of the FBFR process, as shown in the online step response simulation result of Fig. 6.26, where the NAMPC accurately tracks the desired reference signal at each sampling


Fig. 6.27: Online computation time for the NCS (sec) at each time sample for the FBFR model identification and control, with an average computation time of 12.8465 seconds.

instant. The closed-loop computation time for the FBFR process model identification and control over the service-oriented architecture (SOA) cluster network, using an Intel® Core™2 CPU running at 1.86 GHz, is shown in Fig. 6.27. The average computation time is 12.8465 seconds, which is 9.3411 times less than the sampling time of the FBFR process, with a bounded round-trip closed-loop delay of approximately 0.1023 seconds at each sampling instant.

The performance of the online FBFR process model identification and control using the MLMA and NAMPC algorithms over the service-oriented architecture (SOA) cluster network shows that the proposed algorithms can be used in an industrial networked control environment, since the real-time constraint of 120 seconds for the FBFR process is satisfied. With the average computation time of 12.8465 seconds, it is possible to update the identification and control algorithm over the SOA cluster network about 9 times within the normal FBFR process sampling period of 120 seconds. The small value of the bounded round-trip closed-loop delay of 0.1023 seconds suggests that the proposed SOA based on the DPWS can be deployed for the development of networked control systems for processes with sampling times similar to that of the FBFR process (120 seconds).


6.3 Activated Sludge Wastewater Treatment Plant (AS-WWTP)

6.3.1 An Overview of the AS-WWTP Process

Activated sludge wastewater treatment plants (WWTPs) are large complex nonlinear multivariable

systems, subject to large disturbances, where different physical and biological phenomena take place. Many

control strategies have been proposed for wastewater treatment plants but their evaluation and comparison are

difficult. This is partly due to the variability of the influent, the complexity of the physical and biochemical

phenomena, and the large range of time constants (from a few minutes to several days) inherent in the activated

sludge process. An additional complication in the evaluation is the lack of standard evaluation criteria.

With the tight effluent requirements defined by the European Union, and to increase the acceptability of results from wastewater treatment analysis, the generally accepted COST Actions 624 and 682 benchmark simulation model no. 1 (BSM1) [Henze et al., 1996] is considered. The BSM1 model uses eight basic processes to describe the biological behaviour of the AS-WWTP processes. The combination of these eight basic processes results in thirteen different observed conversion rates, as described in Appendix C. These components are classified into soluble components (S) and particulate components (X). The nomenclature and parameter definitions used for describing the AS-WWTP in this work are given in Table C6.1. Moreover, four fundamental processes are considered: the growth and decay of biomass (heterotrophic and autotrophic), the ammonification of organic nitrogen, and the hydrolysis of particulate organics. The complete BSM1 used to describe the AS-WWTP considered here is given in Appendix C.

6.3.1.1 Statement of the Activated Sludge Wastewater Treatment Plant (AS-WWTP)

Problem

The activated sludge wastewater treatment plant considered here is strictly based on the benchmark

simulation model no. 1 (BSM1) proposed by the European Working Groups of COST Action 624 and 682 in

conjunction with the International Water Association (IWA) Task Group on Benchmarking of Control Strategies

for wastewater treatment plants (WWTPs) ([Coop, 2000], [COST, 2000], [COST, 2008]). This implementation of

the benchmark simulation model no. 1 (BSM1) follows the methodology specified in [COST, 2008], especially from the viewpoint of control performance. The complete description of the conventional activated sludge wastewater treatment plant (AS-WWTP) based on the BSM1 is given in Appendix C, together with the mathematical model of the BSM1 and the MATLAB/Simulink programs that implement it.

Since the introduction of the benchmark simulation model no. 1 (BSM1) by Henze and co-workers [Henze et al., 1996], and later by the COST Action Groups [Coop, 2000], [COST, 2000] and [COST, 2008],


extensive research and surveys based on simulation with the BSM1 have been reported, for example ([Ekman et al., 2006], [Holenda et al., 2008], [Gernaey et al., 2004], [Lee et al., 1999], [Lee et al., 2006], [Shen et al., 2008], [Stare et al., 2006 and 2007], [Yong et al., 2006]). The dissolved oxygen (DO) concentration and the nitrogen concentration are the most frequently investigated controlled variables, whereas the oxygen transfer coefficients, the internal recycled nitrate flow rate (Qa), the recycled activated sludge (RAS) flow rate, the waste activated sludge (WAS) flow rate and the external carbon dosing rate [Yong et al., 2006] are the most frequently investigated manipulated variables in these wastewater treatment plants. Nevertheless, dissolved oxygen (DO) control is the most widespread in real life, since the DO level in the aerobic reactors has a significant influence on the behaviour and activity of the heterotrophic and autotrophic microorganisms living in the activated sludge.

The dissolved oxygen (DO) concentration in the aerobic part of the activated sludge process should be sufficiently high to supply enough oxygen to the microorganisms in the sludge, so that organic matter is degraded and ammonium is converted to nitrate. On the other hand, an excessively high DO concentration, which requires a high air flow rate, leads to high energy consumption and may also deteriorate the sludge quality. In addition, a high DO concentration in the internally recirculated wastewater makes the denitrification less effective.

However, in an activated sludge wastewater treatment plant designed for tertiary treatment, besides the conventional dissolved oxygen concentration control, an additional objective is to remove biological nutrients, suspended solids and organic matter ([Lee et al., 1999 and 2006], [Spellman, 2003]). During the last few decades, the importance of nutrient removal has increased as a result of the necessity to avoid eutrophication of water bodies receiving untreated waste water and the effluent of wastewater treatment plants (WWTPs). The term eutrophication refers to the slow, natural nutrient enrichment of streams, lakes and water reservoirs. For this reason, many new WWTPs are now designed for tertiary treatment [Spellman, 2003]. Apart from the important repercussions on the effluent quality, tertiary treatment also has a beneficial influence on the performance of the wastewater treatment process itself. This is particularly noticeable in the case of nitrogen removal [Spellman, 2003]. Moreover, the development of nitrification in an activated sludge process is practically unavoidable when the sewage reaches temperatures of 22°C to 24°C, which will be the case for at least part of the year in tropical and sub-tropical regions. The formed nitrate can be used by microorganisms in the activated sludge as a substitute for dissolved oxygen. In an anoxic environment, characterized by the presence of nitrate and the absence of dissolved oxygen, the nitrate ion can be reduced by organic matter to nitrogen gas: this process is called denitrification.

Some successful conventional dissolved oxygen (DO) control schemes have been reported ([Ekman et al.,

2006], [Holenda et al., 2008]). Advances towards nitrogen removal have also been reported ([Stare et al., 2006

and 2007], [Yong et al., 2006]) as well as complete biological nutrient removal (BNR) [Lee et al., 1999 and

2006]. In this section, the proposed adaptive recursive least squares (ARLS) algorithm and the modified

Levenberg-Marquardt algorithm (MLMA) are used to train a neural network in order to identify an approximate

neural network model of the third aerobic tank (Unit 5) of the activated sludge wastewater treatment plant (AS-


WWTP) process shown in Fig. 6.28, based on the benchmark simulation model no. 1 (BSM1). The performance of the trained network is also investigated. This is followed by the adaptive control of the dissolved oxygen (DO) concentration using the neural network model identification and adaptive control algorithms proposed in Chapters 3 and 4 respectively.

In order to illustrate the proposed control scheme for the AS-WWTP, a reduced and simplified form of Fig. C.1, shown in Fig. 6.28, presents the control scheme for the dissolved oxygen (DO) concentration with the oxygen transfer coefficient k_La5 as the manipulated variable. Here, the control loop involves controlling the dissolved oxygen (DO) level in the last aerobic tank (Unit 5) to a setpoint of 2.0 g m^-3 by manipulation of the oxygen transfer coefficient, as illustrated in Fig. 6.28. The DO sensor used in this loop is assumed to be ideal, with no delay or noise. The oxygen transfer coefficient in the third aerobic tank (Unit 5), denoted by k_La5, is constrained to a maximum of 10 h^-1. As shown in Fig. 6.28, the second control objective would have involved controlling the nitrate and nitrite nitrogen S_NO level in the anoxic tank (Unit 2) to a setpoint of 1.0 g m^-3 by manipulating the internal recycled nitrate flow rate Qa. In this loop, the nitrate and nitrite nitrogen sensor is assumed to have a time delay of 10 minutes, with white, normally distributed, zero-mean noise with a standard deviation of 0.1 g m^-3 ([Coop, 2000], [COST, 2000], [COST, 2008]). The internal recycled nitrate flow rate Qa is constrained to a maximum of 92230 m^3 d^-1.

However, the nitrate and nitrite nitrogen S_NO level in the anoxic tank (Unit 2) is not controlled directly in this study; rather, it is controlled indirectly by treating the AS-WWTP process as an alternating

[Diagram: influent tank and pump feeding the anaerobic tank (Unit 1), anoxic tank (Unit 2) and aerobic tanks (Units 3-5) in series, followed by the settler with effluent, WAS and RAS tanks; internal and external nitrate recycles; mechanical aerators with the manipulated k_La5; S_O set point 2.0 g m^-3 and S_NO set point 1.0 g N m^-3 with sensor time delay and noise.]

Fig. 6.28: The AS-WWTP with the dissolved oxygen concentration and nitrate control loops.


activated sludge process [Holenda et al., 2008]. According to this technique, nitrate and nitrite nitrogen (S_NO) removal is realized by simply switching the aeration system on and off to create continuously alternating aerobic and anoxic conditions respectively. During the switched-on periods, ammonium is converted into nitrate, which is subsequently used to remove organic carbon in the switched-off periods. An important feature of this alternating approach is its flexible control ability, which makes it suitable for the optimization of operating costs. Since the process consists of alternating aerated and non-aerated periods, the aeration strategy can reduce the global energy consumption, and subsequently the operating costs, by 60–80% [Holenda et al., 2008]. Therefore, oxygen control is of great importance and is the subject of this study.

6.3.1.2 Statement of the Activated Sludge Wastewater Treatment Plant (AS-WWTP)

Neural Network Model Identification and Control Problem

The activated sludge wastewater treatment plant model defined by the benchmark simulation model no. 1 (BSM1) is described by eight coupled nonlinear differential equations given in Appendix C–3. The BSM1 model consists of thirteen states, defined in Table C.1 in Appendix C, as follows: S_I, S_S, X_I, X_S, X_BH, X_BA, X_P, S_O, S_NO, S_NH, S_ND, X_ND and S_ALK, out of which four states are measurable, namely: S_S (readily biodegradable substrate), X_BH (active heterotrophic biomass), S_O (oxygen) and S_NO (nitrate and nitrite nitrogen). An additional important parameter, TSS, is used to assess the amount of suspended solids in all the reactors, including Unit 5.

As discussed above, the main objective here is the efficient neural network model identification of the activated sludge wastewater treatment plant (AS-WWTP), and the adaptive control of the dissolved oxygen (DO) concentration in the aerobic tank (Unit 5) by manipulation of the oxygen mass transfer coefficient (k_La5). Thus, the main control input to Unit 5 is the oxygen mass transfer coefficient u_{k_La5}(k); that is:

U(k) = [\, u_{k_La5}(k) \,]^T    (6.12)

The controlled output of the AS-WWTP is the soluble oxygen (S_O), which defines the dissolved oxygen (DO) concentration, defined here as:

Y(k) = [\, y_{S_O}(k) \,]^T    (6.13)

Although the multivariable system has been reduced to a single-input single-output control problem, the neural network model identification is formulated as a multiple-input multiple-output (MIMO) problem, since all fourteen (14) states must be predicted at each sampling instant in order to obtain a reasonable approximate model that describes the system's dynamics at that instant. The neural network identification scheme used here is shown in Fig. 6.29 and is based on the NNARMAX model predictor discussed in Chapters 2 and 3. The input vector to the neural network (NN) consists of the regression vectors, which are concatenated into φ_NNARMAX(k, θ(k)) for the NNARMAX model predictor discussed in Chapter 3 and defined here as:


\varphi_{n_a}(k) = [\, S_I(k-n_a)\; S_S(k-n_a)\; X_I(k-n_a)\; X_S(k-n_a)\; X_{BH}(k-n_a)\; X_{BA}(k-n_a)\; X_P(k-n_a)\; S_O(k-n_a)\; S_{NO}(k-n_a)\; S_{NH}(k-n_a)\; S_{ND}(k-n_a)\; X_{ND}(k-n_a)\; S_{ALK}(k-n_a)\; TSS(k-n_a) \,]^T    (6.14)

\varphi_{n_b}(k) = [\, k_L a_5(k-n_b) \,]^T    (6.15)

\varphi_{n_c}(k,\theta(k)) = [\, \varepsilon_{S_I}(k-n_c,\theta(k))\; \varepsilon_{S_S}(k-n_c,\theta(k))\; \varepsilon_{X_I}(k-n_c,\theta(k))\; \varepsilon_{X_S}(k-n_c,\theta(k))\; \varepsilon_{X_{BH}}(k-n_c,\theta(k))\; \varepsilon_{X_{BA}}(k-n_c,\theta(k))\; \varepsilon_{X_P}(k-n_c,\theta(k))\; \varepsilon_{S_O}(k-n_c,\theta(k))\; \varepsilon_{S_{NO}}(k-n_c,\theta(k))\; \varepsilon_{S_{NH}}(k-n_c,\theta(k))\; \varepsilon_{S_{ND}}(k-n_c,\theta(k))\; \varepsilon_{X_{ND}}(k-n_c,\theta(k))\; \varepsilon_{S_{ALK}}(k-n_c,\theta(k))\; \varepsilon_{TSS}(k-n_c,\theta(k)) \,]^T    (6.16)

\varphi_{NNARMAX}(k,\theta(k)) = [\, \varphi_{n_a}(k)\; \varphi_{n_b}(k)\; \varphi_{n_c}(k,\theta(k)) \,]    (6.17)

The outputs of the neural network for the AS-WWTP process are the predicted values of the thirteen states together with the amount of total suspended solids (TSS), thus resulting in fourteen states to be predicted at each sampling instant, given by:

\hat{Y}(k) = [\, \hat{y}_{S_I}(k)\; \hat{y}_{S_S}(k)\; \hat{y}_{X_I}(k)\; \hat{y}_{X_S}(k)\; \hat{y}_{X_{BH}}(k)\; \hat{y}_{X_{BA}}(k)\; \hat{y}_{X_P}(k)\; \hat{y}_{S_O}(k)\; \hat{y}_{S_{NO}}(k)\; \hat{y}_{S_{NH}}(k)\; \hat{y}_{S_{ND}}(k)\; \hat{y}_{X_{ND}}(k)\; \hat{y}_{S_{ALK}}(k)\; \hat{y}_{TSS}(k) \,]^T    (6.18)

However, only the simulation results for the desired output defined by (6.13), the soluble oxygen (S_O) which defines the dissolved oxygen (DO) concentration, are presented, for convenience.

Since disturbances play an important role in the evaluation of controller performances, three influent disturbance data sets are defined for different weather conditions, namely: dry weather data, rain weather data and storm weather data. The data for these three influent disturbances are provided by the European COST Actions for

[Block diagram: the n_a- and n_b-stage tapped delay lines (TDL) feed the AS-WWTP process input U(k) and output Y(k), together with the n_c-stage delayed prediction errors ε(k, θ(k)), into the neural network model; the prediction Ŷ(k) is compared with Y(k), under disturbance d(k), to drive the training algorithm.]

Fig. 6.29: The neural network model identification scheme for the AS-WWTP based on the NNARMAX model.


evaluating controller performances ([Coop, 2000], [COST, 2000], [COST, 2008]). In this study, the dry weather influent data are used in order to measure how well the trained neural network mimics the dynamics of the AS-WWTP process so as to meet the control requirement specified above. The dry weather data contain two weeks of influent data at a 15-minute sampling interval. Although the disturbances d(k) affecting the AS-WWTP are incorporated into the dry weather data provided by the COST Action Group, additional sinusoidal disturbances with non-smooth nonlinearities are introduced in the last sub-section of this section to further investigate the closed-loop controllers' performances based on an updated neural network model at each sampling instant.

6.3.1.3 Experiment with the BSM1 for AS-WWTP Process Neural Network

Training Data Acquisition

For the efficient control of the activated sludge wastewater treatment plant (AS-WWTP) using a neural network, a neural network (NN) model of the AS-WWTP process is needed, which requires that the NN be trained with dynamic data obtained from the AS-WWTP process. In order to obtain dynamic data for the NN training, the validated and generally accepted COST Actions 624 benchmark simulation model no. 1 (BSM1) is implemented and simulated using MATLAB and Simulink. The complete benchmark simulation model no. 1 (BSM1) for the AS-WWTP process is detailed in Appendix C.

A two-step simulation procedure defined in the simulation benchmark ([Coop, 2000], [COST, 2000], [COST, 2008]) is used in this study. The first step is a steady-state simulation using the constant influent flow data (CONSTINFLUENT) for 150 days, as implemented in Fig. C.2. Note that each “Clock” of the AS-WWTP Simulink model in Fig. C.2 corresponds to one day. In the second step, starting from the steady-state solution obtained with the CONSTINFLUENT data, the dry weather influent data (DRYINFLUENT) are used as inputs: the AS-WWTP process is simulated for 14 days using the same Simulink model of Fig. C.2, but with the CONSTINFLUENT influent data replaced by the DRYINFLUENT influent data. This second simulation generates 1345 dynamic data samples, of which 80% (1076) are used for NN training and 20% (269) are reserved for the validation of the trained NN.
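A minimal sketch of this two-step procedure is given below, assuming the Simulink model of Fig. C.2 is saved as 'bsm1' and reads its influent from a workspace variable; all variable and file names are illustrative.

INFLUENT = CONSTINFLUENT;                      % step 1: constant influent data
[t1, x1] = sim('bsm1', [0 150]);               % 150-day steady-state run
opts     = simset('InitialState', x1(end,:));  % continue from the steady state
INFLUENT = DRYINFLUENT;                        % step 2: dry weather influent data
[t2, x2, y2] = sim('bsm1', [0 14], opts);      % 14-day dynamic run (1345 samples)
n      = size(y2, 1);
Ztrain = y2(1:round(0.8*n), :);                % 80% (1076 samples) for NN training
Zval   = y2(round(0.8*n)+1:end, :);            % 20% (269 samples) for validation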

6.3.2 Training the Neural Network that Models the AS-WWTP Aerobic Reactor

The input vector to the neural network (NN) is the NNARMAX model regression vector φ_NNARMAX(k, θ(k)) defined by (6.17). The input φ_nc(k, θ(k)), that is, the initial error estimates ε(k, θ(k)) given by (6.16), is not known in advance and is initialized to a small positive random matrix of dimension n_c by n_c. The outputs of the NN are the predicted values of Ŷ(k) given by (6.18). However, the results for the soluble oxygen (S_O)


are of interest, since S_O defines the dissolved oxygen concentration in the third aerobic tank (Unit 5). Thus, only the simulation results for the soluble oxygen (S_O) are presented in this section, for convenience.

For assessing the convergence performance, the network was trained for τ = 10, 50, 100 and 500 epochs (numbers of iterations) with the following selected parameters: p = 1, q = 14, n_a = 2, n_b = 2, n_c = 2, n_φ = 58 (NNARMAX), n_h = 5, n_o = 14, α_h = 1e-6 and α_o = 1e-5. The details of these parameters are discussed in Section 3.3.2 of Chapter 3, where p and q are the numbers of inputs and outputs of the system; n_a, n_b and n_c are the orders of the regressors; n_φ is the total number of regressors (that is, the total number of inputs to the network); n_h and n_o are the numbers of hidden and output layer neurons; and α_h and α_o are the hidden and output layer weight decay terms. The four design parameters for the adaptive recursive least squares (ARLS) algorithm defined in (3.68) are selected to be: α = 0.5, β = 5e-3, δ' = 1e-5 and π = 0.99, resulting in γ = 0.0101. The initial values of ēmin and ēmax in (3.67) are 0.0102 and 1.0106e+3 respectively and were evaluated using (3.67). Thus, the ratio ēmax/ēmin in (3.66) is 9.9018e+4, which implies that the parameters are well selected. Also, λ_τ = 1e-4, s = 0.05 and δ = 1e-3 were selected to initialize the modified Levenberg-Marquardt algorithm (MLMA).

The 1076 dry weather training data samples are first scaled using equation (3.89), and the network is trained for τ = 10, 50, 100 and 500 epochs using the adaptive recursive least squares (ARLS) and the modified Levenberg-Marquardt (MLMA) algorithms proposed in Chapter 3. After network training, the trained network is rescaled according to (3.90), so that the resulting network can work with unscaled AS-WWTP data. The convergence of the ARLS and MLMA algorithms for (a) 10, (b) 50, (c) 100 and (d) 500 epochs is shown in Fig. 6.30. One can observe that the ARLS algorithm terminated after about 60 epochs while the MLMA algorithm terminated after about 100 epochs.

By comparing Fig. 6.30 (a)–(d), it can be seen that the ARLS algorithm has a faster convergence than the MLMA algorithm. Since real-time identification and control is the primary aim of this work, it is necessary to investigate the performance of networks trained with a relatively small number of iterations (epochs). A summary of the training results for the ARLS and the MLMA algorithms is presented in Table 6.9. Next, the performance of the network trained with 10 and 100 epochs is evaluated for the AS-WWTP aerobic process.

The computation times for the network trained with 10 and 100 epochs are shown in the first row of Table 6.9. As can be seen, despite the fact that the ARLS algorithm converges in fewer epochs, the MLMA is about 8 and 17.7 times faster in terms of computational cost for 10 and 100 epochs respectively. The mean square error (MSE) and the minimum performance index (discussed in subsection 3.3.7) for the network trained with the ARLS and the MLMA algorithms are given in the second and third lines of Table 6.9. The MLMA algorithm also has smaller mean square errors and minimum performance indices than the ARLS algorithm. These small values indicate that the MLMA performs better than the ARLS for the same number of iterations (epochs) and suggest that the MLMA model approximates the AS-WWTP process more closely, giving smaller errors than the ARLS model.


6.3.2.1 Validating the Trained Network that Models the AS-WWTP Process

According to the discussion on network validation in Chapter 3, a trained network can be used to model a

process once it is validated and accepted, that is, the network demonstrates its ability to predict correctly both the

data that were used for its training and other data that were not used during training. The network trained by the

proposed ARLS and MLMA algorithms has been validated with three different methods by the use of scaled and

unscaled training data as well as with the 269 dry-weather data reserved for the validation of the trained network

for the AS-WWTP process.

6.3.2.1.1 Validation by the One-Step Ahead Predictions Simulation

In the one-step ahead prediction method, the errors obtained from one-step ahead output predictions of the trained network are assessed.

Table 6.9: Summary of the training results by ARLS and MLMA algorithms for the AS-WWTP process

                                                              10 Epochs               100 Epochs
                                                           ARLS        MLMA        ARLS        MLMA
Computation time for model identification (sec)           8.7969      1.0938     42.2188      2.3750
Mean square error (MSE)                                  8.8100e-2   2.5000e-4   6.5000e-3   5.0000e-5
Minimum performance index                                4.1035e-4   7.3300e-5   3.0000e-5   1.0000e-8
Mean error of one-step ahead prediction of training data 1.2000e-3   1.3000e-4   1.4000e-3   1.0000e-5
Mean error of one-step prediction of validation data     7.0000e-3   1.9000e-3   2.0000e-3   1.3950e-4
Mean value of 5-step ahead prediction error              5.8269e-2   5.6296e-4   5.6918e-3   5.5812e-5
Akaike's final prediction error (AFPE) estimate          3.1180e-1   1.0264e-4   3.0162e-2   4.2430e-5

Fig. 6.30: Network convergence for the AS-WWTP using the ARLS and the MLMA algorithms. [Panels (a)–(d): performance index (log scale) vs. epochs (number of iterations) for (a) 10, (b) 50, (c) 100 and (d) 500 epochs, with ARLS and MLMA curves.]


Fig. 6.31 (a) and (b) show the soluble oxygen (SO) one-step ahead predictions of the scaled training data (blue -) against the trained network output predictions (red --*) by the neural network models trained with the ARLS and the MLMA algorithms respectively for 10 epochs. Similar results for 100 epochs using both the ARLS and the MLMA algorithms for the SO predictions are shown in Fig. 6.32 (a) and (b) respectively.

The one-step ahead prediction errors are given in the fourth line of Table 6.9 for 10 and 100 epochs respectively. It can be seen in both Fig. 6.31 and Fig. 6.32 that the network predictions generally match closely the original training data used for the SO.

Fig. 6.31: Comparison of soluble oxygen (SO) data predictions with the training data by the network trained using ARLS and MLMA algorithms for 10 epochs. [Panels (a) ARLS and (b) MLMA: SO (mg COD/l) vs. sample number; blue - scaled training data, red --* one-step ahead output predictions.]

Fig. 6.32: Comparison of soluble oxygen (SO) data predictions with the training data by the network trained using ARLS and MLMA algorithms for 100 epochs. [Panels (a) ARLS and (b) MLMA: SO (mg COD/l) vs. sample number; blue - scaled training data, red --* one-step ahead output predictions.]


Although the scaled training data prediction errors of both algorithms are small, the MLMA algorithm has a much smaller error than the ARLS algorithm, as shown in the fourth line of Table 6.9. These small one-step ahead prediction errors indicate that both trained networks capture and approximate the nonlinear dynamics of the aerobic reactor of the AS-WWTP process accurately. This is further supported by the small mean values of the MSE obtained using the ARLS and MLMA algorithms for the SO.

Furthermore, the suitability of the proposed ARLS and MLMA algorithms for neural network model identification in the real AS-WWTP industrial environment is investigated by validating the trained network with the 269 unscaled dynamic data obtained from the dry-weather data provided by the COST Action Group. Graphs of the trained network predictions (red --*) of the validation data against the true data (blue -) for 10 epochs for each of the ARLS and the MLMA algorithms are shown in Fig. 6.33 (a) and (b) for the SO. Similar results for 100 epochs are shown in Fig. 6.34 (a) and (b). The almost identical prediction of these data demonstrates the effectiveness of the proposed approach. The prediction accuracies of the unscaled test data by the networks trained using the ARLS and the MLMA algorithms, evaluated by the computed mean prediction errors, are shown in the fifth line of Table 6.9. Although the validation data prediction errors obtained by both algorithms are small, those of the model trained by the MLMA algorithm are smaller than those obtained using the ARLS algorithm. These predictions of the unscaled validation data given in Figs. 6.33 and 6.34, as well as the prediction errors in Table 6.9, verify the neural network's ability to model accurately the dynamics of the third aerobic reactor of the AS-WWTP based on dry-weather influent data.

Fig. 6.33: Comparison of soluble oxygen (SO) validation data predictions by the network trained using ARLS and MLMA algorithms for 10 epochs. [Panels (a) ARLS and (b) MLMA: SO (mg COD/l) vs. sample number; blue - unscaled target output, red --* predicted outputs.]

Fig. 6.34: Comparison of soluble oxygen (SO) validation data predictions by the network trained using ARLS and MLMA algorithms for 100 epochs. [Same panel layout as Fig. 6.33.]


6.3.2.1.2 K–Step Ahead Prediction Simulations for the AS-WWTP Process

The results of the K-step ahead output predictions (red --*) using the K-step ahead prediction validation method discussed in Chapter 3, for 5-step ahead output predictions (K = 5), compared with the unscaled training data (blue -), are shown in Fig. 6.35 and Fig. 6.36 for 10 and 100 epochs respectively. The (a) panels in both figures correspond to the results of the network trained using the ARLS algorithm while the (b) panels correspond to the network trained using the MLMA algorithm. The value K = 5 is chosen since it is a typical value used in most model predictive control (MPC) applications.

Fig. 6.35: 5-step ahead soluble oxygen (SO) output predictions (red --*) compared with the original unscaled training data (blue -) using the network trained with the (a) ARLS and (b) MLMA algorithms for 10 epochs. [Each panel: SO (mg COD/l) vs. sample number.]

Fig. 6.36: 5-step ahead soluble oxygen (SO) output predictions (red --*) compared with the original unscaled training data (blue -) using the network trained with the (a) ARLS and (b) MLMA algorithms for 100 epochs. [Each panel: SO (mg COD/l) vs. sample number.]


The comparison of the 5-step ahead output prediction performance of the networks trained using the ARLS and the MLMA algorithms indicates the superiority of the MLMA over the ARLS algorithm.

The computation of the mean value of the K-step ahead prediction error (MVPE) using (3.93) gives 5.8269e-2 and 5.6296e-4 for the networks trained using the ARLS and MLMA algorithms respectively for 10 epochs, and 5.6918e-3 and 5.5812e-5 respectively for 100 epochs, as shown in the sixth line of Table 6.9. The small mean values of the 5-step ahead prediction error (MVPE) indicate that the trained networks of both algorithms approximate the dynamics of the aerobic reactor of the AS-WWTP process to a high degree of accuracy, with the network based on the MLMA algorithm giving the smaller prediction errors.
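A sketch of how such a K-step ahead error could be computed is given below, under stated assumptions: (3.93) is taken here to be the mean absolute K-step prediction error, and build_regressor and nn_predict are hypothetical stand-ins for the regressor construction and the trained one-step predictor, not the dissertation's routines.

    K = 5;                                % typical MPC prediction horizon
    N = size(Y, 2);
    Yk = zeros(size(Y));                  % K-step ahead predictions
    for k = K+1:N
        yprev = Y(:, k-K);                % start from a measured output
        for j = 1:K                       % feed predictions back K times
            phi   = build_regressor(U, Y, yprev, k-K+j);  % hypothetical
            yprev = nn_predict(phi);                      % hypothetical
        end
        Yk(:, k) = yprev;
    end
    E = Y(:, K+1:N) - Yk(:, K+1:N);
    MVPE = mean(abs(E(:)));               % assumed form of (3.93)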

6.3.2.1.3 Akaike’s Final Prediction Error (AFPE) Estimates for the AS-WWTP Process

The implementation of the AFPE algorithm discussed in Chapter 3 and defined by (3.94) for the regularized criterion, for the network trained with the ARLS and the MLMA algorithms with multiple weight decay, gives the respective AFPE estimates reported in the last line of Table 6.9 for 10 and 100 epochs. These small values of the AFPE estimate indicate that the trained networks capture the underlying dynamics of the aerobic reactor of the AS-WWTP and that the network is not over-trained [Sjöberg and Ljung, 1995]. This in turn implies that optimal network parameters, including the weight decay parameters, have been selected. Again, the AFPE estimates computed for the networks trained using the MLMA algorithm are much smaller than those obtained using the ARLS algorithm.
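For orientation only: the regularized AFPE of (3.94) is not reproduced in this section, so the one-line sketch below uses the classical (unregularized) Akaike FPE as an approximation, with J the minimum training criterion value, d the number of network weights and N the number of training samples; this is an assumption, not the dissertation's formula.

    AFPE_classical = J * (1 + d/N) / (1 - d/N);   % classical FPE, stand-in for (3.94)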

6.3.2.2 Online Closed-Loop Identification and Control with AGPC Controller

Besides the training of the NN model with data collected offline during the open-loop experiments from the AS-WWTP simulation, it is of interest to validate the prediction accuracy of a trained network under the dynamic conditions in which the plant operates in closed loop, with adaptive control of the soluble oxygen (SO) concentration by manipulating the oxygen transfer coefficient (KLa5) of the third aerobic reactor of the AS-WWTP process. In this case, the AS-WWTP process is implemented in closed loop with one of the proposed adaptive predictive control algorithms, the adaptive generalized predictive control (AGPC) developed in Chapter 4. The closed-loop configuration of the AGPC with the neural network model is shown in Fig. 6.37.

This adaptive control scheme, as explained in Chapter 4, involves the computation of the control actions by the AGPC algorithm. Specifically, every time new sensor samples are received, a new data set is formed consisting of previous samples and the current new ones.


Table 6.10: The AGPC process control and tuning parameters for the AS-WWTP process

AS-WWTP process constraints:  Umin = 0,  Umax = 10,  Ymin = 0,  Ymax = 2,  R' = 2,  ICI (U) = -5,  IPO (Y) = 0
AGPC tuning parameters:       Nd = 1,  Nu = 2,  Np = 5,  κ = 1.5,  ρ = 0.08,  Am = [1 -0.7],  Bm = [0 0.3]
ICI (U) = initial control input, IPO (Y) = initial predicted outputs. All other parameters are defined in the text.

Fig. 6.37: The closed-loop AGPC scheme used for the soluble oxygen (SO) in order to evaluate the online model identification based on the ARLS and MLMA algorithms. [Block diagram: the neural network NNARMAX model of the AS-WWTP process supplies linear model parameters, extracted at each sampling instant, to an η-step ahead output predictor; the optimizer, subject to the constraints, computes the control input U(k) from the prediction error E(k) and the reference R(k), obtained by passing R'(k) through a first-order low-pass filter; Y(k) is the process output and d(k) the disturbance.]

Then the neural network that models the AS-WWTP dynamics is trained first by the ARLS and then by the MLMA method to obtain their respective models. Using the newly trained network as the model of the aerobic reactor of the AS-WWTP process, an AGPC controller is designed to track the desired reference trajectory of the SO (dissolved oxygen concentration) by manipulating KLa5. To achieve this desired trajectory, constraints are imposed on the controlled and manipulated variables; these constraints are given in the left column of Table 6.10. The NN that is used at the control start-up is trained outside the control loop with data received from the open-loop experiment. For the performance evaluation of the complete closed loop, the AGPC control scheme shown in Fig. 4.1, in conjunction with the neural network NNARMAX model identification scheme shown in Fig. 3.5(b), is linked with the AS-WWTP process Simulink model explained in Appendix C and built from first principles. The AS-WWTP Simulink model is used in place of the "system" in Fig. 3.5(b), as shown in Fig. 6.29, and in Fig. 4.1, as shown in Fig. 6.37, for the AS-WWTP neural network model identification scheme and the adaptive control using the AGPC control strategy respectively.

For the closed-loop start-up, a network trained either by the ARLS or by the MLMA algorithm for 10 and 100 epochs was used, and the AGPC algorithm was designed using this initially trained network and the constraints of Table 6.10. The obtained AGPC initial design parameter values are given in the right column of Table 6.10 according to the AGPC formulation in Chapter 4. They were found to be the same for both the ARLS and the MLMA algorithms with 10 and 100 epochs. The closed-loop simulation was performed over a period of 300 sampling instants by superimposing the new control input on the AS-WWTP Simulink model.
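The receding-horizon loop just described can be summarized by the conceptual MATLAB sketch below; sim_aswwtp_step, update_window, identify_nn, agpc_control, theta0, data0 and Rref are placeholders, not the dissertation's routines.

    uk = 0; theta = theta0; data = data0;           % start-up NN and data window
    for k = 1:300                                   % 300 sampling instants
        yk    = sim_aswwtp_step(uk);                % apply u(k) to the Simulink plant
        data  = update_window(data, uk, yk);        % previous samples + newest ones
        theta = identify_nn(data, theta);           % re-train the NN model online
        uk    = agpc_control(theta, data, Rref(k)); % constrained AGPC control move
    end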


Fig. 6.38: Closed-loop AGPC control performance of the AS-WWTP process using the NN model trained with ARLS and MLMA for (a) 10 and (c) 100 epochs, with their respective control signals in (b) and (d). [Panels (a), (c): SO predictions (mg COD/l) vs. number of samples against the SO reference; panels (b), (d): KLa5 control signal ((day)^-1) vs. number of samples.]

The closed-loop simulation results for the SO predictions (the dissolved oxygen concentration, i.e. the controlled variable) are shown in Fig. 6.38(a), while the oxygen transfer coefficient (KLa5) control signal (the manipulated input) is shown in Fig. 6.38(b) for the neural network model trained using the ARLS and MLMA algorithms with 10 epochs. Similar simulation results obtained for 100 epochs are shown in Fig. 6.38 (c) and (d).

One can observe that the SO responses for the alternating sludge process shown in Fig. 6.38(a), obtained with the models of both the ARLS and the MLMA algorithms trained for 10 epochs, track the desired reference as it is increased from zero towards the maximum value of 2 mg COD/l in almost the same way, but with poor tracking ability during the switching from zero. Two large mismatches in tracking the reference by the ARLS model can also be observed in Fig. 6.38(a). The control signals for the oxygen transfer coefficient (KLa5) from the models of both algorithms are approximately the same, with smooth control efforts.


However, the SO predictions for the alternating sludge process shown in Fig. 6.38(c) for 100 epochs indicate that the model obtained with the MLMA algorithm tracks the desired SO reference more accurately than the model obtained with the ARLS during the switching from zero to the maximum allowable reference value of 2 mg COD/l. The control signals shown in Fig. 6.38(d) are essentially the same for the oxygen transfer coefficient (KLa5), with minimum control efforts.

Although the simulation results from the aerobic reactor model identification for 10 and 100 epochs showed almost identical performances, it is obvious in the closed-loop identification and control that the models obtained with 100 epochs are more accurate than those obtained with 10 epochs, as can be observed by comparing the SO control performance results of Fig. 6.38(a)-(b) with those of Fig. 6.38(c)-(d). By comparing the control signals in Fig. 6.38(b) and Fig. 6.38(d), which indicate the control efforts (aeration energy) obtained with the models of both algorithms trained for 10 and 100 epochs respectively, one can conclude that the SO control consumes little energy when compared to the maximum available control energy of 10 day^-1.

It is apparent from the above discussion that the proposed model-based AGPC control algorithm based on a model trained with the MLMA algorithm outperforms the one based on the ARLS algorithm. As shown in Fig. 6.38, the identification and control performance based on the models obtained using both the ARLS and the MLMA with 100 epochs demonstrates the efficiency and reliability of the proposed AGPC algorithm. It can also be observed that the control performances based on the networks trained using both the ARLS and the MLMA algorithms are almost indistinguishable, which further justifies the effectiveness of the AGPC controller.

6.3.3 Validation and Dynamic Performance Comparison of the BPM, INCBP and

the Proposed ARLS Algorithms for the Model Identification of the Aerobic

Reactor of the AS-WWTP Process

The validation and dynamic performance comparison of the proposed ARLS algorithm against the backpropagation with momentum (BPM) and the incremental backpropagation (INCBP) algorithms for the model identification of the third aerobic reactor of the AS-WWTP process is investigated in this section. The version of the backpropagation with momentum (BPM) algorithm used here is presented in Section 2.4.4.1 of Chapter 2. The ARLS algorithm considered here is a true online identification algorithm. The incremental backpropagation (INCBP) algorithm, the third algorithm, given by (6.9) and (6.10) in Section 6.2.3, is used again for this comparison.

6.3.3.1 Network Training Using BPM, INCBP and the Proposed ARLS Algorithms

Due to the simplicity and generality of the backpropagation with momentum (BPM) algorithm discussed

in Section 2.4.4.1 in Chapter 2 and the incremental backpropagation (INCBP) algorithm discussed in Section


6.2.3, the NNARMAX model regression vector φ_NNARMAX(k, θ(k)) defined by (6.17) is used as the input vector to the neural network. The outputs of the NN are the predicted values of Ŷ(k) given by (6.18). The simulation results for the soluble oxygen (SO), which defines the dissolved oxygen concentration, are again presented here.

The two design parameters for the BPM algorithm defined in (2.59) were selected as the learning rate (i.e. the step size) γ = 1e-5 and the momentum term α = 1e-4. The design parameter for the INCBP is the step size μ = 1e-4. As in Section 6.3.2, the four design parameters for the ARLS algorithm defined in (3.68) are selected to be: α = 0.5, β = 5e-3, δ' = 1e-5 and π = 0.99, resulting in γ = 0.0101. The initial values for ēmin and ēmax in (3.67) are equal to 0.0102 and 1.0106e+3 respectively and were evaluated using (3.67). Thus, the ratio ēmax/ēmin in (3.66) is 9.9018e+4, which implies that the parameters are well selected.
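Since (2.59) is not reproduced here, the sketch below shows the standard gradient-descent-with-momentum update that BPM denotes, with grad_J a placeholder for the backpropagated gradient of the training criterion with respect to the weight vector w.

    gamma = 1e-5;  alpha = 1e-4;        % learning rate and momentum term
    dw = zeros(size(w));                % previous weight increment
    for epoch = 1:100
        g  = grad_J(w);                 % hypothetical gradient routine
        dw = -gamma * g + alpha * dw;   % steepest-descent step plus momentum
        w  = w + dw;                    % weight update
    end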

It was shown in Section 6.3.2 that the performance of the ARLS and the MLMA algorithms converges to acceptable values after approximately 100 epochs. In order to assess the convergence performance of the network trained by the three algorithms, the network was also trained for τ = 100 epochs with the following selected parameters: p = 1, q = 14, n_a = 2, n_b = 2, n_c = 2, n_φ = 58, n_h = 5, n_o = 14, α_h = 1e-6 and α_o = 1e-5.

The training data are scaled using (3.89) and the network is trained for τ = 100 epochs using the backpropagation with momentum (BPM), the incremental backpropagation (INCBP) and the ARLS algorithms. After network training, the trained network is rescaled according to (3.90), so that the resulting network can work with unscaled AS-WWTP data. The convergence of the BPM, INCBP and ARLS algorithms for 100 epochs, evaluated in terms of the performance index, is shown in Fig. 6.39. It can be observed that only the ARLS algorithm meets the training goal of 10^-6, while the BPM and the INCBP converge slowly within the prescribed 100 epochs.

By comparing the convergence of the BPM, INCBP and ARLS algorithms in Fig. 6.39, it can be seen that the BPM and INCBP algorithms converge faster, to almost the same values, than the ARLS algorithm. It can also be seen in this figure that the performance index obtained by the network trained using the ARLS algorithm has a much smaller value than those obtained when the network is trained using the BPM and the INCBP algorithms. A summary of the network training results using the BPM, INCBP and the ARLS algorithms is presented in Table 6.11 for a quick comparison of the performances of the network when it is trained by the three methods.

The computation times for training the network for 100 epochs using each of the three algorithms are shown in the first row of Table 6.11. As can be seen, although the BPM and the INCBP algorithms converge faster than the ARLS algorithm, and the ARLS is about 1.6536 and 3.4748 times slower than the BPM and INCBP algorithms respectively in terms of computational time, the ARLS retains the smallest performance index value of 2.6453e-5, as shown in Table 6.11. The mean square errors (MSE) and the minimum performance indices for the network trained with the BPM, INCBP and the ARLS algorithms are given in the second and third lines of Table 6.11.


Table 6.11: Summary of training results for the BPM, INCBP and the ARLS algorithms

                                                            BPM         INCBP       ARLS
Computation time for model identification (sec)            3.3750      8.5938      4.3031e+1
Mean square error (MSE)                                   4.2778e-3   2.2141e-2   1.8856e-4
Minimum performance index                                 8.6255e-3   1.2748e-2   2.6453e-5
Mean error of one-step ahead prediction of training data  3.6889e-3   3.7542e-2   3.2788e-4
Mean error of one-step prediction of validation data      1.4209      1.3817e+1   4.0258e-2
Mean value of 10-step ahead prediction error              1.0140e-1   9.7235e+1   1.0361e-3
Akaike's final prediction error (AFPE) estimate           2.1780e-2   3.2333e-1   6.7325e-5

Fig. 6.39: Network convergence using the BPM, INCBP and the ARLS algorithms (performance index, log scale, vs. number of iterations (epoch)).

Again, the ARLS algorithm has a smaller mean square error (MSE) and minimum performance index when compared to the BPM and the INCBP algorithms. The small MSE and minimum performance index obtained by the network trained with the ARLS algorithm show that the ARLS performs much better than the BPM and the INCBP algorithms for the same number of iterations (100 epochs). These small errors suggest that the network trained with the ARLS algorithm approximates the AS-WWTP system model with better accuracy than the networks trained with the BPM and INCBP algorithms.

6.3.3.2 Validation of the Network Trained by BPM, INCBP and ARLS Algorithms

In order to verify the prediction capability of the networks trained using the BPM, INCBP and ARLS algorithms, the same validation techniques used in Section 6.3.2.1 are again used here to evaluate the performance of the trained network, namely: one-step ahead prediction, 10-step ahead prediction and the Akaike's final prediction error (AFPE) estimate, which were discussed in Chapter 3. The networks trained by the BPM, INCBP and the proposed ARLS algorithms for the third aerobic reactor of the AS-WWTP process have been validated by the use of the 1076 scaled and unscaled training data as well as with the 269 validation data.


Fig. 6.40: Comparison of the one-step ahead prediction of the soluble oxygen (SO) training data by the network trained using (a) backpropagation with momentum (BPM), (b) incremental backpropagation (INCBP), and (c) the proposed ARLS algorithm. [Each panel: SO (mg COD/l) vs. number of training data samples; blue - scaled target output, red -.- one-step ahead output predictions.]

6.3.3.2.1 One-Step Ahead Predictions Simulation for the AS-WWTP Process

As in Section 6.2.2.1.1, in the one-step ahead prediction method, the scaled training data obtained from the open-loop simulation of the differential equations model and used for training the network are compared with the one-step ahead output predictions of the trained network, and an assessment of the corresponding errors is made.

The comparison of the one-step ahead predictions of the scaled training data (target output, blue -) against the trained network output predictions (red -.-) by the networks trained for 100 epochs using the BPM, INCBP and the ARLS algorithms is shown in Fig. 6.40 (a)–(c) respectively for the soluble oxygen (SO).


Fig. 6.41: Comparison of the one-step ahead prediction of the soluble oxygen (SO) validation data by the network trained using (a) backpropagation with momentum (BPM), (b) incremental backpropagation (INCBP), and (c) the proposed ARLS algorithm. [Each panel: SO (mg COD/l) vs. number of validation data samples; blue - unscaled validation data, red -- predicted outputs.]

The one-step ahead prediction errors for predicting the scaled training data by the networks trained using the BPM, INCBP and the ARLS algorithms are given in the fourth line of Table 6.11. It can be seen in Fig. 6.40 that the predictions of the scaled training data by the network trained using the ARLS algorithm closely match the original training data used for the SO, whereas much larger prediction mismatches are obtained with the networks trained using the BPM and INCBP algorithms. The smaller one-step ahead prediction error obtained using the network trained by the ARLS, when compared to those by the BPM and INCBP algorithms, is also evident in the fourth line of Table 6.11. This error is an indication that the network trained using the ARLS algorithm captures and approximates the nonlinear dynamics of the aerobic reactor of the AS-WWTP process accurately. This is further supported by the small mean value of the MSE obtained using the ARLS algorithm, given in the second line of Table 6.11, when compared to those obtained by the BPM and INCBP algorithms.

Furthermore, the suitability of the BPM, INCBP and the proposed ARLS algorithms for neural network model identification in the AS-WWTP industrial environment is investigated by validating the trained network with the 269 unscaled validation data for the AS-WWTP obtained from the dynamic simulation of the AS-WWTP process with the dry-weather data. The comparison of the trained network predictions (red --*) of the test data with the dry-weather validation data (blue -) for 100 epochs is shown in Fig. 6.41 (a)-(c) for the BPM, INCBP and the ARLS algorithms.


Fig. 6.42: Comparison of the ten-step ahead prediction of the soluble oxygen (SO) unscaled training data by the network trained with (a) backpropagation with momentum (BPM), (b) incremental backpropagation (INCBP), and (c) the proposed ARLS algorithm. [Each panel: SO (mg COD/l) vs. number of training data samples; blue - unscaled training data, red -- 10-step ahead output predictions.]

It is evident that the unscaled test data predictions by the network trained using the ARLS algorithm match the true test data with high accuracy when compared to those obtained by the networks trained using the BPM and INCBP; the BPM, however, shows slightly better performance than the INCBP. The superior performance of the proposed ARLS algorithm over the BPM and the INCBP algorithms proves the effectiveness of the proposed ARLS algorithm.

The one-step ahead prediction accuracies of the unscaled test data by the networks trained using the BPM, INCBP and the ARLS algorithms are evaluated by the computed mean prediction errors shown in the fifth line of Table 6.11. It can be seen that the one-step ahead test data prediction errors by the network trained using the ARLS algorithm are much smaller than those obtained from the networks trained using the BPM and INCBP algorithms. The one-step ahead unscaled validation data prediction results given in Fig. 6.41, as well as the one-step ahead prediction errors in Table 6.11, confirm that the network trained using the ARLS algorithm mimics the dynamics of the AS-WWTP and that the resulting network can be used to model the actual AS-WWTP process.


6.3.3.2.2 K–Step Ahead Prediction Simulations for the AS-WWTP Process

The results of the K-step ahead output predictions (red --*) using the K-step ahead prediction validation method discussed in Chapter 3, for 10-step ahead output predictions (K = 10), compared with the unscaled training data (target output), are shown in Fig. 6.42 (a)–(c) for the networks trained using the BPM, INCBP and ARLS algorithms. The comparison of the 10-step ahead output prediction performance of the networks trained using the BPM, INCBP and the ARLS algorithms shows the superior performance of the ARLS algorithm over the BPM and INCBP algorithms for use in distant or multi-step ahead predictions.

The computation of the mean value of the 10-step ahead prediction error (MVPE) using (3.93) gives 1.0140e-1, 9.7235e+1 and 1.0361e-3 for the networks trained using the BPM, INCBP and ARLS algorithms respectively, as shown in the sixth line of Table 6.11. The relatively small MVPE obtained by the network trained with the ARLS algorithm is an indication that the trained network approximates the dynamics of the AS-WWTP process to a high degree of accuracy.

6.3.3.2.3 Akaike’s Final Prediction Error (AFPE) Estimates for the AS-WWTP Neural Network Model

The implementation of the AFPE algorithm discussed in Chapter 3 and defined by (3.94) for the regularized criterion, for the networks trained with the BPM, INCBP and the ARLS algorithms with multiple weight decay, gives the respective AFPE estimates of the three algorithms reported in the last line of Table 6.11. These small values of the AFPE estimate indicate that the trained networks capture the underlying dynamics of the third aerobic reactor of the AS-WWTP system and that the network is not over-trained [Sjöberg and Ljung, 1995]. This in turn implies that optimal network parameters, including the weight decay parameters, have been selected. Again, the AFPE estimates obtained with the network trained using the ARLS algorithm are smaller than those obtained using the BPM and INCBP algorithms.

6.3.3.3 Performance Comparison of the BPM, INCBP and the ARLS Algorithms

The simulation results for the neural network training using the BPM, INCBP and ARLS algorithms, as well as the network validation results for the AS-WWTP process, are shown in Fig. 6.39, Fig. 6.40, Fig. 6.41 and Fig. 6.42 respectively, whereas the numerical results for these three algorithms are presented in Table 6.11.

The numerical results summarized in Table 6.11 show that the network trained using the ARLS algorithm presents the best performance, as the smallest values in all six accuracy-related properties are achieved (at the cost of a longer computation time). In addition, the smallest value of the AFPE, obtained using the ARLS algorithm, indicates that the ARLS algorithm captures the essential dynamics of the process and that the identified NN model approximates the true system to a high degree of accuracy.


Fig. 6.43: The closed-loop NAMPC scheme used for the soluble oxygen (SO) in order to evaluate the online model identification based on the ARLS and MLMA algorithms. [Block diagram: the neural network NNARMAX model of the AS-WWTP process feeds a nonlinear optimizer, subject to constraints, which computes the control input U(k) from the prediction error E(k) and the reference R(k), obtained by passing R'(k) through a first-order low-pass filter; Y(k) is the process output and d(k) the disturbance.]

The small mean value of the 10-step ahead prediction error (MVPE) is an indication that the ARLS algorithm can be used in adaptive predictive control applications, owing to its accurate 10-step ahead predictions compared with the BPM and INCBP algorithms. Furthermore, the fact that a relatively small performance index is obtained within a relatively short time of 100 epochs makes this algorithm more appropriate for applications with real-time requirements.

6.3.4 Validation and Performance Evaluation of the Proposed AGPC and NAMPC

Algorithms for Model-Based Adaptive Control of the AS-WWTP Process

The soluble oxygen (dissolved oxygen, DO, concentration) in the aerobic reactor of the activated sludge wastewater treatment plant (AS-WWTP) should be sufficiently high to supply enough oxygen to the microorganisms in the sludge, so that organic matter is degraded and ammonium is converted to nitrate. On the other hand, an excessively high DO, which requires a high airflow rate, leads to high energy consumption and may also deteriorate the sludge quality. A high DO in the internally recirculated water also makes the denitrification less efficient. Hence, for both economic and process reasons, efficient control of the DO is paramount. The main control objective here is the efficient control of the DO concentration, as the soluble oxygen SO of the third aerobic reactor, while minimizing energy consumption, using the two proposed neural network-based adaptive MPC algorithms, AGPC and NAMPC, under two different alternating activated sludge conditions of the AS-WWTP.

Initially, the neural network model of the AS-WWTP process is identified and validated as explained in subsection 6.3.2. The AGPC and the NAMPC controllers were then simulated subject to the constraints given in Table 6.12 and tuned using the NN model of the AS-WWTP process. The obtained optimal tuning parameters are given in Table 6.13. Next, the validated Simulink model of the AS-WWTP process is placed in closed loop with the NN identification scheme based on the NNARMAX model of Fig. 6.29, the AGPC controller of Fig. 6.37 and the NAMPC controller of Fig. 6.43.


Table 6.12: Constraints on the soluble oxygen (SO) concentration control in the aerobic reactor of the AS-WWTP process

Process Constraints                  AGPC    NAMPC
Minimum control input (Umin)         0       0
Maximum control input (Umax)         10      10
Minimum predicted output (Ymin)      0       0
Maximum predicted output (Ymax)      2       2
Desired reference signal (R')        2       2

Table 6.13: The AGPC and the NAMPC tuning parameters for the SO control in the aerobic reactor of the AS-WWTP process

Tuning Parameters                    AGPC       NAMPC
Initial control input (ICI, U)       0          -5
Initial predicted output (IPO, Y)    0          0
Nd                                   1          1
Nu                                   2          2
Np                                   5          5
κ                                    1.5        1.5
ρ                                    0.08       0.08
λ                                    -          0.01
Am                                   [1 -0.7]   [1 -0.7]
Bm                                   [0 0.3]    [0 0.3]
δ                                    -          1e-6
Uiter                                -          2

At each sampling instant, new input-output data are obtained from the AS-WWTP process based on the new control signal, a new neural network model is identified, and the two controllers are designed and implemented online over 300 sampling instants. The neural network model identification algorithm used here is the MLMA, due to its superior performance over the ARLS algorithm.

The above neural network identification scheme and the AGPC and NAMPC control strategies were implemented on an Intel® Core™2 Quad CPU @ 2.66 GHz with 3 GB of RAM using the MATLAB "parfor" command available in the MATLAB Parallel Computing Toolbox. The "parfor" command executes the iterations of the specified loop in parallel, using the four Intel® processor cores available on the computer system. This MATLAB facility allows the utilization of all four cores for the implementation of the identification and control algorithms at each time step. The "parfor" pool is initialized using the MATLAB "matlabpool open x" command, where x specifies the number of processors to be used in the computation. After the computation, the parallel session is closed using the command "matlabpool close".
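The usage pattern is sketched below; worker_task is a placeholder for the per-core identification/control computation, not the dissertation's routine.

    matlabpool open 4                  % open a pool on the four cores
    out = cell(1, 4);
    parfor i = 1:4
        out{i} = worker_task(i);       % hypothetical per-core computation
    end
    matlabpool close                   % release the parallel session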

6.3.4.1 Comparison of Simulation Results for the Control Performance of AGPC

and NAMPC for the AS-WWTP Process Identification and Control

The closed-loop AGPC and NAMPC predictions of the soluble oxygen control for an alternating sludge in the AS-WWTP process are shown in Fig. 6.44(a), while the aeration control signal for the manipulated input, the oxygen transfer coefficient (KLa5), is shown in Fig. 6.44(b). The computation times for the identification and control at each sampling instant for the AGPC and the NAMPC are shown in Fig. 6.44(c) and (d) respectively. The average computation time for identification and control using the AGPC is 0.6594 seconds, while that for the NAMPC is 1.7316 seconds.


Fig. 6.44: (a) The soluble oxygen predictions by AGPC and NAMPC against the SO reference and (b) the control signal for the manipulated variable, oxygen transfer coefficient (KLa5, (day)^-1), for the alternating AS-WWTP process. Computation time for the parallel implementation of the identification and control strategies at each sampling instant: (c) AGPC, with an average computation time of 0.6594 seconds, and (d) NAMPC, with an average computation time of 1.7316 seconds.

One can observe from the control simulation results of Fig. 6.44(a) and (b) that both the AGPC and the NAMPC provide suitable tracking of the desired soluble oxygen (SO) set-point reference signals without oscillations or overshoot. However, the energy consumed in controlling the soluble oxygen (dissolved oxygen concentration) by the AGPC is about one-third of that required by the NAMPC algorithm. Furthermore, the NAMPC requires about 2.6260 times the computational time used by the AGPC to achieve the same accurate soluble oxygen control.


Fig. 6.45: (a) The soluble oxygen predictions by AGPC and NAMPC against the SO reference and (b) the control signal for the manipulated variable, oxygen transfer coefficient (KLa5, (day)^-1), for the AS-WWTP process with sinusoidal disturbances.

In the second simulation, the soluble oxygen reference increases and decreases sinusoidally. This behaviour is not unusual when dealing with disturbance rejection and noise arising in the AS-WWTP due to excessive dissolved oxygen concentration [Azwar et al., 2006]. Here, the desired reference is superimposed with a non-negative sinusoidal disturbance of unit amplitude, and the control objective is to track this desired reference, keeping the SO (that is, the dissolved oxygen, DO) concentration at the prescribed level of 2.0 mg COD/l without any overshoot and without generating additional noise in the form of oscillations. The SO predictions and the KLa5 control signals obtained by the AGPC and the NAMPC algorithms are shown in Fig. 6.45(a) and (b) respectively.
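A minimal sketch of such a disturbed set-point follows, assuming a 50-sample period; the exact disturbance period is not stated in the text, so this value is an illustrative assumption.

    N = 350;  k = 0:N-1;
    Rmax = 2.0;                          % prescribed SO level (mg COD/l)
    R = Rmax * abs(sin(2*pi*k/50));      % non-negative sinusoidal reference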

It can be observed in Fig. 6.45 that the NAMPC shows better control performance than the AGPC, whose control signal variations are very large. In fact, the AGPC exhibits some oscillatory overshoots and hardly tracks the fast-changing sinusoidal reference signal for the SO. These results demonstrate the efficiency of the proposed NAMPC, based on its adaptive optimization strategy, for tracking and controlling difficult nonlinear, non-smooth control applications with relatively fast-changing dynamics while the control signal is kept at a minimum to reduce operational costs.


6.4 Neural Network-Based Model Identification and Adaptive Predictive Auto-Pilot

Control of a Nonlinear F-16 Fighter Aircraft

To demonstrate the application of the neural network-based identification and control algorithms to a highly nonlinear system with relatively fast dynamics and a short sampling time, the auto-pilot control system of a nonlinear F-16 fighter aircraft is considered.

The F-16 aircraft dynamics are, in general, nonlinear, time-varying and uncertain. Traditionally, aircraft flight control systems are designed by using the mathematical model of the aircraft linearized at various flight conditions. The aircraft motion variables are sensed and fed into the aircraft control system, which adjusts the control surface actuators via some feedback gains. The adjustment process is called gain scheduling. Since controller designs are performed off-line using a limited number of linear and time-invariant models, extensive gain scheduling computation is required. While this approach may handle mild nonlinearities, it is not suitable for the highly nonlinear problems associated with the aircraft. The gain scheduling approach may produce a control law that is applicable around the current design operating points but not globally. Thus, as aircraft become more complex, traditional design methods have not yielded acceptable performance. To overcome these problems, nonlinear control techniques such as the feedback linearization discussed in Section 2.11.1.5 of Chapter 2 have been studied as alternatives to gain scheduling ([Khalil, 1996]; [Morari and Zafiriou, 1989]; [Nørgaard et al., 2000]). The use of these techniques is difficult because they depend heavily on accurate knowledge of the aircraft dynamics. Thus, a totally different approach to the nonlinear F-16 aircraft flight control is presented in this work, based on the use of neural network-based nonlinear modeling and adaptive control techniques.

6.4.1 Formulation of the Nonlinear F-16 Aircraft Control Problem

The F-16 aircraft can be controlled by manipulating the deflections of the aileron, elevator and rudder surfaces as well as the thrust, as illustrated in Fig. 6.46, while the definitions of the angle of attack α and the angle of sideslip β with respect to the orientation of the navigation frame are shown in Fig. 6.47. A positive aileron, elevator or rudder deflection gives a decrease in roll rate p, pitch rate q or yaw rate r respectively, while a positive thrust t causes an increase in acceleration along the longitudinal body axis. The control of the nonlinear F-16 aircraft is discussed here with respect to the right control surfaces shown in Fig. 6.46 and Fig. 6.47.

The main objective here for the nonlinear F-16 aircraft is neural network model identification and adaptive model-based control of the auto-pilot control system, based on the orientations of the three right control surfaces, that is, the aileron, elevator and rudder deflection control by manipulating the roll, pitch and yaw actuator rates respectively, as well as the throttle command for controlling the thrust according to the desired flight route. The desired routes of the proposed flight are illustrated in Fig. 6.48 and are described below:


Fig. 6.46: The F-16 aircraft surfaces for the control of the thrust, roll rate (p), pitch rate (q) and yaw rate (r). (x_b, y_b, z_b) are the body axes, (u, v, w) are the velocities along the body axes, L is the rolling moment, M is the pitching moment, N is the yawing moment, (x_n, y_n, z_n) is the navigation frame, O_cm is the center of mass, and (φ, ϑ, ψ) are the Euler angles for the aileron, elevator and rudder deflections respectively. [The diagram shows the right aileron, right leading edge flap, right elevator and right rudder on the roll (longitudinal), pitch (lateral) and yaw (vertical) axes.]

Fig. 6.47: Definition of the angle of attack α (α > 0) and sideslip β (β > 0); x_n, y_n and z_n are the North, East and South orientations of the navigation frame, (x_b, y_b, z_b) are the body axes, (x_w, y_w, z_w) the wind axes, V_T the velocity and O_cm the center of mass.

1) It is assumed that the aircraft is ready for take-off with the three control surfaces at 0° and the thrust at full power.

2) The aircraft begins its straight motion by first ascending. This requires that the elevator deflects completely downwards while the aileron and the rudder remain at 0°.


Fig. 6.48: The desired reference trajectories for the roll rate (p), pitch rate (q) and yaw rate (r) as well as the thrust (t) for the nonlinear F-16 aircraft. [Panels (a)–(d): roll rate, pitch rate, yaw rate and thrust vs. number of samples, with the intermediate flight routes 2 to 7 marked on each trajectory.]

3) While still ascending, the aircraft banks completely to the left. This requires the aileron to deflect upwards and the rudder to deflect to the right.

4) Next, the aircraft remains in steady flight and the control surfaces are maintained at approximately 0°, with the thrust at approximately its average value to keep the aircraft in motion.

5) Next, the aircraft descends while banking to the right, requiring the elevator to deflect upwards approaching its maximum value, whereas the aileron deflects downwards and the rudder deflects to the left.

6) Next, the aircraft makes a slight rolling action to the left and the aileron deflects upwards. At this point the elevator deflection is slightly downward, approaching the zero axis (i.e. the neutral position), while the rudder deflection is slightly increased, also approaching the zero axis. The zero axis (i.e. the neutral position) describes the orientation when the aircraft is at rest on the Earth's surface, where all deflections are 0° with respect to the center of mass. This behaviour of the elevator and the rudder indicates that the aircraft descends slightly while turning slightly to the left.

7) Finally, the aircraft descends and all the deflection surfaces approach 0° as the aircraft lands and comes to rest. The thrust is also reduced to its minimum value of zero as the aircraft comes to rest.

It is important to note that while the aircraft can roll completely about the longitudinal axis using the ailerons, the turning of the aircraft about the vertical axis is accomplished not only by the rudder but also in conjunction with the ailerons [Anderson and Eberhardt, 2001 and 2010]. The desired reference trajectories for the six intermediate routes discussed in steps 2) to 7) above are illustrated in Fig. 6.48 together with the desired thrust variations.

In state-space form ([Ducard, 2009]; [Nelson, 1989]; [Stevens and Lewis, 2003]), the motion of the nonlinear F-16 aircraft can be represented by the following nonlinear vector differential equation:

    Ẋ = g(U, X, C)                                                    (6.19)

where g is a nonlinear function and U, X and C are the input vector, the state vector and the aerodynamic coefficients respectively. The thirteen states that describe the rigid-body motion of the nonlinear F-16 aircraft over a flat Earth are given by:

    X = [p_N  p_E  h  φ  ϑ  ψ  V_T  α  β  p  q  r  δ_LEF]^T           (6.20)

where p_N, p_E, h, φ, ϑ, ψ, V_T, α, β, p, q, r and δ_LEF are the north position, east position, altitude, roll angle, pitch angle, yaw angle, velocity, angle of attack, angle of sideslip, roll rate, pitch rate, yaw rate and the deflection of the leading edge flap respectively. A complete description of the nonlinear F-16 aircraft dynamics is given in Appendix D. These dynamics are modeled and implemented using Simulink and MATLAB C programs; the complete model and implementation programs are given in Appendix D.
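As a minimal sketch of how (6.19) can be advanced by one discrete time step, a forward-Euler rule is assumed below; f16_dynamics is a placeholder for the Simulink/C implementation of g(U, X, C) in Appendix D, and dt is an assumed integration step, not a value stated in the text.

    dt   = 0.01;                        % integration step (assumed)
    Xdot = f16_dynamics(U, X, C);       % state derivatives from (6.19)
    X    = X + dt * Xdot;               % X(k+1) = X(k) + dt*Xdot(k)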

The aerodynamic coefficients are functions of some of the states, namely: the damping coefficients are functions of the angle of attack α; the body-axis aerodynamic force coefficients are functions of α, β, δ_a, δ_e and δ_r; the moment coefficients are functions of α, β and δ_e; and the coefficients of the rolling moment due to the aileron and rudder deflections, as well as the coefficients of the yawing moment due to the aileron and rudder deflections, are functions of α and β [Stevens and Lewis, 2003]. The input vector to the nonlinear F-16 aircraft model is

    U(k) = [δ_a  δ_e  δ_r  δ_t]^T                                     (6.21)

where δ_a, δ_e, δ_r and δ_t are the aileron deflection, elevator deflection, rudder deflection and throttle setting respectively. The eighteen outputs of the nonlinear F-16 model are: the twelve derivatives of the state variables p_N, p_E, h, φ, ϑ, ψ, V_T, α, β, p, q, r; the three normalized acceleration coordinates A_nx, A_ny, A_nz; the Mach number M; the dynamic pressure (qbar); and the static pressure (p_s). The definitions of the static and dynamic pressures


as well as their differences and measurement techniques are discussed in Appendix D–4. The δ_LEF is a function of α, qbar and p_s, as detailed in Appendix D–3, which allows the F-16 aircraft to fly at a higher angle of attack [Nguyen et al., 1979]. The definition of the angle of attack α with respect to the orientation of the navigation frame is shown in Fig. 6.47, while the details are given in Appendices D–1 and D–3. The inclusion or not of the deflection of the leading edge flap δ_LEF in the F-16 aircraft simulation results in two types of models for the F-16 aircraft, namely the low-fidelity and high-fidelity models [Russell, 2003]. The low-fidelity model excludes the effects of the δ_LEF [Stevens and Lewis, 2003], whereas the high-fidelity model considers the full effects of the δ_LEF [Nguyen et al., 1979]. The differences between these two types of models are based on the data used to model the aircraft and are discussed in Appendix D–3.

6.4.1.1 Simulation of the Nonlinear F-16 Aircraft for Training Data Acquisition

The F-16 turbofan engine model consists of a first-order dynamic model, a throttle command shaping function and tables of nonlinear thrust as functions of the operating power level, altitude and Mach number at different altitudes h, as shown in Table VI of Nguyen et al. [Nguyen et al., 1979]. There is one table for each of the power levels, called idle, military and maximum ([Nguyen et al., 1979]; [Stevens and Lewis, 2003]). In the thrust lookup tables, the Mach number varies from 0 to 1 in steps of 0.2, as shown in Tables VI (a) and (b) of Nguyen et al. [Nguyen et al., 1979], and the altitude varies from 0 to 50,000 ft in steps of 10,000 ft, as shown in Table VI (b) of Nguyen et al. [Nguyen et al., 1979].

The aerodynamic data for the high-fidelity model used in this work are based on wind-tunnel tests on a scaled model of the nonlinear F-16 aircraft provided in [Nguyen et al., 1979] and [Russell, 2003]. The values of the dimensionless aerodynamic coefficients are presented in multi-dimensional lookup tables associated with linear interpolation algorithms in Table III of [Nguyen et al., 1979]. The aerodynamic data are referenced to the nominal position of the center of gravity, XCG = 0.35. The angle of attack ranges from -10° to 45° in steps of 5°, the sideslip angle ranges from -30° to 30° in steps of 5°, and the speed upper limit is 0.6 Mach. The limits of the actuators used for control are:

• the aileron, elevator, rudder and leading edge flap (δ_LEF) deflections: ±21.5°, ±25°, ±30° and 0 to 25° respectively;

• the thrust of the turbofan engine: from 1,000 to 19,000 lbs;

• the throttle setting rate: ±10,000 lbs; and

• the roll, pitch, yaw and δ_LEF actuator rates: ±80°, ±60°, ±120° and ±25° respectively.

In order to model the nonlinear F-16 aircraft so that it can fly at a higher angle of attack, the NASA data for the high-fidelity nonlinear F-16 aircraft model [Nguyen et al., 1979] are used in this simulation study.


Table 6.14: Nonlinear F-16 aircraft model simulation parameters for data acquisition

     Parameter                                   Minimum Value   Maximum Value   Steps
 1   North and East position, P_N and P_E (ft)           0             180       1.8081
 2   Altitude, h (ft)                                5,000          50,000     454.5455
 3   Aileron deflection, φ (deg)                       -80             +80       1.2121
 4   Elevator deflection, ϑ (deg)                      -60             +60       1.6162
 5   Rudder deflection, ψ (deg)                       -120            +120       2.4242
 6   Disturbances on deflections, d (deg)                1              12       0.1111
 7   Thrust, t (lb)                                 10,000          19,000      90.9091
 8   Velocity, VT (ft/s)                               300             900       6.0606
 9   Roll rate, p (rad/s)                            -21.5           +21.5       0.4343
10   Pitch rate, q (rad/s)                             -25             +25       0.5051
11   Yaw rate, r (rad/s)                               -30             +30       0.6061
12   Angle of attack, α (rad/s)                        -10             +45       1.0101
13   Angle of side slip, β (deg)                       -10             +45       1.0101
14   Throttle settings (lb/s)                        1,000          10,000      90.9091

The National Aeronautics and Space Administration (NASA) data include a model of the F-16 afterburning turbofan engine, in which the thrust response is modeled with a first-order lag implemented by the Simulink model shown in Fig. D.9 of Appendix D–3. The lag time constant is a function of the actual engine power level and the throttle setting (or command).

Thus, given the initial control inputs as the aileron deflection δ_a, elevator deflection δ_e, rudder deflection δ_r and throttle setting δ_t, together with their respective disturbances d_a, d_e, d_r and d_t, as well as arbitrary values for the states defined in (6.20) excluding the last term δ_LEF, the trimming functions given by Appendices D–6 and D–7 are used to initially trim the F-16 aircraft to a steady wings-level, roll, pull-up or turning flight condition. After the aircraft has taken one discrete time step, the twelve input states are found by integrating the twelve output state derivatives given by (6.19), together with the three normalized accelerations A_nx, A_ny, A_nz; the Mach number M; the dynamic pressure q̄; the static pressure p_s; and the deflection of the leading edge flap δ_LEF. For example, the initial parameters used in one rolling flight simulation test for trimming the aircraft are defined in the MATLAB script F-16_Simulations.m, given in Appendix D–8, while the MATLAB script that implements the complete nonlinear F-16 aircraft simulation, F16_aircraft_model.m, is given in Appendix D–9. Thus, for different values of the control inputs and the type of flight condition, the F-16 aircraft is simulated for a specified time from Tstart to TFinal (see Appendix D–9). The number of times the trimming algorithm is called is set to 4 throughout this work; this number of trimming calls gave the same steady-state results in most cases.
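To make the trim-and-simulate sequence above concrete, the following minimal MATLAB sketch illustrates one open-loop data-acquisition run; the wrapper functions trim_F16 and F16_aircraft_model and their signatures are hypothetical stand-ins for the scripts of Appendices D–6 to D–9, and the numeric values are for illustration only:

    % Hedged sketch of one open-loop data-acquisition run (hypothetical wrappers).
    Ts = 0.5;                          % F-16 sampling time (sec)
    Tspan = 0:Ts:50;                   % simulation window from Tstart to TFinal
    U = [0; -2; 0; 2000];              % [aileron; elevator; rudder; throttle] inputs
    d = [1; 1; 1; 0];                  % disturbances superimposed on the inputs
    x = zeros(12,1);                   % arbitrary initial values for the 12 states
    for i = 1:4                        % the trimming algorithm is called 4 times
        [x, U] = trim_F16(x, U, 'roll');   % wings-level | roll | pull-up | turn
    end
    Z = [];                            % accumulated input-output training data
    for k = 1:numel(Tspan)
        [xdot, Y] = F16_aircraft_model(x, U + d);  % 12 derivatives, 18 outputs
        x = x + Ts*xdot;               % one discrete integration step
        Z = [Z; (U + d).' Y.'];        % log one input-output sample
    end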

Using the above information, the nonlinear F-16 aircraft model parameters were varied between their minimum and maximum values over 100 different values with their respective steps, as shown in Table 6.14, for the purpose of obtaining sufficient data for neural network training and validation. Next, the Simulink model of the aircraft described in Appendix D–3 was simulated in open loop for four different flight conditions. These


conditions are: steady wings-level flight, rolling flight, pull-up/down flight and turning flight. At each flight simulation, 1,100 data samples were obtained for each of the 25 steady wings-level, roll, pull-up/down and turning flight conditions, to obtain 4,400 data samples in total. The last 100 data from each flight simulation, making up 400 data samples (10% of 4,400), were reserved for network validation, while the remaining 4,000 (90% of 4,400) were used for network training.

6.4.2 Neural Network Identification of the Nonlinear F-16 Aircraft Model

The neural network model predictor, which is based on a nonlinear autoregressive moving average with exogenous inputs (NNARMAX) structure discussed in Chapter 3, is considered for modeling the nonlinear F-16 aircraft. The neural network identification scheme used here is the one defined by Fig. 3.4(b). The input vector to the neural network consists of the regression (state) vectors φ_na(k), φ_nb(k) and φ_nc(k, θ(k)), which are concatenated into φ_NNARMAX(k, θ(k)). All these vectors are defined by the following relationships:

\[
\varphi_{n_a}(k) = \big[\, P_N(k-n_a) \;\; P_E(k-n_a) \;\; h(k-n_a) \;\; \phi(k-n_a) \;\; \vartheta(k-n_a) \;\; \psi(k-n_a) \;\; V_T(k-n_a) \;\; \alpha(k-n_a) \;\; \beta(k-n_a) \;\; p(k-n_a) \;\; q(k-n_a) \;\; r(k-n_a) \;\; A_{nx}(k-n_a) \;\; A_{ny}(k-n_a) \;\; A_{nz}(k-n_a) \;\; M(k-n_a) \;\; \bar{q}(k-n_a) \;\; p_s(k-n_a) \,\big]^T \tag{6.22}
\]

\[
\varphi_{n_b}(k) = \big[\, u_{\delta_a}(k-n_b) \;\; u_{\delta_e}(k-n_b) \;\; u_{\delta_r}(k-n_b) \;\; u_{\delta_t}(k-n_b) \,\big]^T \tag{6.23}
\]

\[
\varphi_{n_c}(k,\theta(k)) = \big[\, \varepsilon_{P_N}(k-n_c,\theta(k)) \;\; \varepsilon_{P_E}(k-n_c,\theta(k)) \;\; \varepsilon_{h}(k-n_c,\theta(k)) \;\; \varepsilon_{\phi}(k-n_c,\theta(k)) \;\; \varepsilon_{\vartheta}(k-n_c,\theta(k)) \;\; \varepsilon_{\psi}(k-n_c,\theta(k)) \;\; \varepsilon_{V_T}(k-n_c,\theta(k)) \;\; \varepsilon_{\alpha}(k-n_c,\theta(k)) \;\; \varepsilon_{\beta}(k-n_c,\theta(k)) \;\; \varepsilon_{p}(k-n_c,\theta(k)) \;\; \varepsilon_{q}(k-n_c,\theta(k)) \;\; \varepsilon_{r}(k-n_c,\theta(k)) \;\; \varepsilon_{A_{nx}}(k-n_c,\theta(k)) \;\; \varepsilon_{A_{ny}}(k-n_c,\theta(k)) \;\; \varepsilon_{A_{nz}}(k-n_c,\theta(k)) \;\; \varepsilon_{M}(k-n_c,\theta(k)) \;\; \varepsilon_{\bar{q}}(k-n_c,\theta(k)) \;\; \varepsilon_{p_s}(k-n_c,\theta(k)) \,\big]^T \tag{6.24}
\]

\[
\varphi_{NNARMAX}(k,\theta(k)) = \big[\, \varphi_{n_a}(k) \;;\; \varphi_{n_b}(k) \;;\; \varphi_{n_c}(k,\theta(k)) \,\big] \tag{6.25}
\]

where u_δa(k), u_δe(k), u_δr(k), u_δt(k) and u_δLEF(k) are the aileron deflection, elevator deflection, rudder deflection, throttle setting and the deflection of the leading edge flap respectively; V_T(k), α(k) and β(k) are the velocity, angle of attack and angle of sideslip respectively; φ(k), ϑ(k) and ψ(k) are the Euler angles; and p(k), q(k) and r(k) are the angular rates.
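As an illustration of how the regressor (6.25) could be assembled at a sampling instant k, the following MATLAB sketch stacks the delayed outputs, inputs and residuals; the history matrices Y, U and E are hypothetical names, and the sketch assumes that lags 1 to n_a (n_b, n_c) of each signal are used, which is consistent with the regressor dimension n_φ = 2·(18 + 4 + 18) = 80 quoted in Section 6.4.2.1:

    % Hedged sketch of the NNARMAX regressor (6.22)-(6.25); Y (N x 18), U (N x 4)
    % and E (N x 18) are hypothetical matrices of outputs, inputs and residuals.
    na = 2; nb = 2; nc = 2;
    phi_a = reshape(Y(k-na:k-1, :).', [], 1);   % lagged outputs, eq. (6.22)
    phi_b = reshape(U(k-nb:k-1, :).', [], 1);   % lagged control inputs, eq. (6.23)
    phi_c = reshape(E(k-nc:k-1, :).', [], 1);   % lagged prediction errors, eq. (6.24)
    phi_NNARMAX = [phi_a; phi_b; phi_c];        % concatenated regressor, eq. (6.25)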

Although the actual outputs of the neural network are the predicted values of the twelve states and the six auxiliary outputs given by:

\[
\hat{Y}(k) = \big[\, \hat{P}_N(k) \;\; \hat{P}_E(k) \;\; \hat{h}(k) \;\; \hat{\phi}(k) \;\; \hat{\vartheta}(k) \;\; \hat{\psi}(k) \;\; \hat{V}_T(k) \;\; \hat{\alpha}(k) \;\; \hat{\beta}(k) \;\; \hat{p}(k) \;\; \hat{q}(k) \;\; \hat{r}(k) \;\; \hat{A}_{nx}(k) \;\; \hat{A}_{ny}(k) \;\; \hat{A}_{nz}(k) \;\; \hat{M}(k) \;\; \hat{\bar{q}}(k) \;\; \hat{p}_s(k) \,\big]^T \tag{6.26}
\]


which are also used for the error predictions of (6.24), for simplicity the results of the simulations that follow are only for the predicted values of the three angular rates and the throttle setting given by:

\[
\hat{Y}(k) = \big[\, \hat{p}(k) \;\; \hat{q}(k) \;\; \hat{r}(k) \;\; \hat{t}(k) \,\big]^T \tag{6.27}
\]

where p̂(k), q̂(k), r̂(k) and t̂(k) are the predicted values of the roll rate, pitch rate, yaw rate and throttle setting respectively. Although the fifth output of the controller, that is, the deflection of the leading edge flap δ_LEF, contributes to the performance of the F-16 aircraft, it is left out of the simulation results since it is not directly available for manipulation by the pilot but depends on α, q̄ and p_s (Fig. D–4 in Appendix D).

The neural network model identification problem here is to train a neural network in order to determine the optimal parameters of the network which will provide the same values of the controlled variables as those obtained from the aircraft when both the neural network and the aircraft are subjected to the same input stimuli. Then the trained network will be employed as the model of the aircraft on which the computations of the control actions will be based at each sampling instant; this will also be the model that is updated each time a new set of input-output data becomes available from the actual operation of the aircraft. The disturbances considered here are variations in the parameters of the validated nonlinear F-16 aircraft model built by first principles.

6.4.2.1 Performance Comparison of the Neural Network Model Identification Based

on the ARLS and the MLMA Algorithms

The input vector to the neural network is the regression vector φ_NNARMAX(k, θ(k)) defined by (6.25). The regressors of the moving average input vector φ_nc(k, θ(k)) are not usually known in advance, and the vector is initialized to a small positive random matrix of dimension n_c by n_c. The outputs of the neural network are the predicted values Ŷ(k) given by (6.26). However, as discussed earlier, the simulation results for the output predictions presented here are for the roll rate p̂(k), pitch rate q̂(k), yaw rate r̂(k) and the thrust t̂(k), as defined in (6.27).

For assessing the convergence performance of the network, the network was trained for τ = 20, 50, 100 and 500 epochs with the following selected parameters: p = 4, q = 18, n_a = 2, n_b = 2, n_c = 2, n_φ = 80, n_h = 10, n_o = 18, α_h = 1e-7 and α_o = 1e-6. The four design parameters for the adaptive recursive least squares (ARLS) algorithm defined in (3.68) are selected to be α = 0.5, β = 5e-3, δ′ = 1e-5 and π = 0.99, resulting in γ = 0.0101. The initial values for ē_min and ē_max in (3.67) are equal to 0.0102 and 1.0106e+3 respectively and were evaluated using (3.67). Thus, the ratio ē_max/ē_min in (3.66) is 9.9018e+4, which implies that the parameters are well selected. Also, λ_τ = 1e-3, s = 5e-2 and δ = 1e-3 were selected to initialize the modified Levenberg-Marquardt algorithm (MLMA).


The 4,000 training data are first scaled according to (3.89) and the network is trained for τ = 20, 50, 100 and 500 epochs using the adaptive recursive least squares (ARLS) and the modified Levenberg-Marquardt (MLMA) algorithms proposed in Chapter 3. After network training, the trained network is rescaled using (3.90), so that the network trained for the nonlinear F-16 aircraft can work with unscaled nonlinear F-16 aircraft data. The convergence of the ARLS and MLMA algorithms for (a) 20, (b) 50, (c) 100 and (d) 500 epochs is shown in Fig. 6.49 and is evaluated in terms of the performance index discussed in Section 3.3.7 of Chapter 3. One can observe that at approximately 55 epochs the ARLS and the MLMA algorithms converge to acceptable performance indexes of approximately 10^-5 and 10^-7 respectively, whereas a slightly better convergence is achieved for 100 epochs, with minimum performance indexes of approximately 10^-6 for the ARLS algorithm and 10^-10 for the MLMA algorithm.
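Since (3.89) and (3.90) are not restated here, the following MATLAB sketch only illustrates the idea, assuming (3.89) is the usual zero-mean, unit-variance normalization and that the network is a two-layer structure with weights (W1, b1, W2, b2); the rescaling step folds the training statistics back into the weights so that the trained network accepts and produces unscaled F-16 aircraft data:

    % Hedged sketch of the scaling (3.89) and rescaling (3.90) steps.
    muX = mean(X); sdX = std(X);       % column-wise input statistics (X: 4,000 rows)
    muY = mean(Y); sdY = std(Y);       % column-wise target statistics
    Xs = bsxfun(@rdivide, bsxfun(@minus, X, muX), sdX);   % scaled inputs
    Ys = bsxfun(@rdivide, bsxfun(@minus, Y, muY), sdY);   % scaled targets
    % ... train (W1, b1, W2, b2) on (Xs, Ys) with ARLS or MLMA for tau epochs ...
    W1u = W1 * diag(1./sdX);           % absorb the input scaling into the weights
    b1u = b1 - W1u * muX(:);           % and into the first-layer biases
    W2u = diag(sdY) * W2;              % undo the target scaling at the output layer
    b2u = sdY(:).*b2 + muY(:);         % so the network works with unscaled data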

Table 6.15: Summary of training results using the ARLS and MLMA algorithms for the nonlinear F-16 aircraft

                                                              20 Epochs                  100 Epochs
                                                          ARLS         MLMA          ARLS         MLMA
Computation time for model identification (sec)        5.8250e+1    2.1875        2.9859e+2    6.5313
Mean value of the mean square error (MSE)              2.6503e-1    2.5000e-3     1.1194e-1    4.8828e-4
Minimum performance index                              3.3145e-5    1.3999e-5     2.0910e-6    3.4422e-10
Mean error of one-step ahead prediction of training data   2.1981e-4    9.0531e-5     8.4186e-5    5.9850e-7
Mean error of one-step prediction of validation data   1.0078e-1    5.9109e-3     4.7908e-2    1.9437e-4
Mean value of 5-step ahead prediction error            1.6506e-2    1.6537e-3     1.6524e-2    1.6528e-3
Akaike's final prediction error (AFPE) estimate        5.7325e-2    3.6945e-4     2.5060e-2    1.9671e-9

[Figure: four panels showing the performance index evaluated at each iteration versus the number of samples for the ARLS and MLMA algorithms over (a) 20, (b) 50, (c) 100 and (d) 500 epochs.]
Fig. 6.49: Convergence of the NN used to model the F-16 aircraft when it is trained with the ARLS and the MLMA algorithms (sum of squares error vs. epoch).


The comparison of Fig. 6.49(c) and (d) shows that the performance index for the MLMA algorithm does not decrease any further after 100 epochs, while the ARLS algorithm terminates after 85 epochs.

The comparison of Fig. 6.49(a), (b), (c) and (d) shows that the ARLS has a faster convergence than the MLMA after approximately 50 epochs. Since real-time identification and control is the primary aim of this work, it is necessary to investigate the performance of networks trained with a relatively small number of iterations (epochs). Thus, the performance of the network trained with 20 and 100 epochs is investigated here for the nonlinear F-16 aircraft. The summary of the training results for the ARLS and the MLMA algorithms is presented in Table 6.15.

The computation times for the network trained with 20 and 100 epochs are shown in the first row of Table 6.15. As can be seen in this table, despite the fact that the ARLS algorithm converges faster, the MLMA computational cost is less than the respective figures of the ARLS algorithm by factors of 26.63 and 45.72 for the complete 20 and 100 epoch cases. The mean square error (MSE) and the minimum performance indexes for the network trained with the ARLS and the MLMA algorithms are given in the second and third lines of Table 6.15 respectively. Again, it is obvious that the MLMA algorithm has smaller mean square errors and minimum performance indexes when compared to the ARLS algorithm. The relatively small values of the MSE and the minimum performance indexes indicate that the MLMA outperforms the ARLS despite the early termination of the ARLS algorithm. Generally, both algorithms converge well due to the relatively small performance index, with the ARLS converging faster than the MLMA at the expense of a higher computational cost. These small errors suggest that the network trained with the MLMA algorithm approximates the F-16 aircraft model better. The MLMA algorithm could be advantageous over the ARLS algorithm for the model identification of a system with a relatively short sampling time when used for real-time control, such as the nonlinear F-16 aircraft, which has a sampling time of 0.5 seconds. Next, the validity of the trained network is investigated.

6.4.3 Validation of the Trained Network for Modeling the Nonlinear F-16 Aircraft

As discussed in Chapter 3, the process of checking whether a trained network predicts correctly both the data used for training and unknown data not used during training is called network validation. In the following subsections, the validation by different methods of the network that models the F-16 aircraft and is trained by the proposed ARLS and MLMA algorithms is explained. This validation has been made by the use of scaled and unscaled data, as well as with the 400 validation data obtained from the experiments and the open-loop simulations discussed above.


[Figure: four panels comparing the scaled training data (blue -) with the one-step ahead output predictions (red --*) of the networks trained by ARLS and MLMA, plotted against the number of training data samples.]
Fig. 6.50: Comparison of the output predictions of the scaled training data using the network trained by ARLS and MLMA: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction for 20 epochs.

6.4.3.1 Trained Network Validation by the One-Step Ahead Predictions Simulation

The one-step ahead prediction method assesses the errors between the training data obtained from the open-loop simulation of the differential equations model and the one-step ahead predictions of the trained network.
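A sketch of how this assessment could be computed is given below; build_regressor and nn_predict are hypothetical wrappers around (6.22)–(6.25) and the trained network, and theta is the trained parameter vector:

    % Hedged sketch of the one-step ahead validation over the training record.
    [N, ny] = size(Ytrain);            % Ytrain: measured outputs, Utrain: inputs
    E = zeros(N, ny); Yhat = zeros(N, ny);
    for k = 3:N                        % start after the maximum lag (na = 2)
        phi = build_regressor(Ytrain, Utrain, E, k);   % eqs (6.22)-(6.25)
        Yhat(k,:) = nn_predict(theta, phi).';          % one-step ahead prediction
        E(k,:) = Ytrain(k,:) - Yhat(k,:);              % residual feeding (6.24)
    end
    mean_err = mean(abs(E(3:N,:)));    % mean one-step ahead prediction error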

The comparison of the scaled training data (blue -) against the one-step ahead output predictions (red --*) of the networks trained using the ARLS and the MLMA algorithms is shown in Fig. 6.50(a)–(d) for


the roll rate p̂, pitch rate q̂, yaw rate r̂ and the thrust t̂ respectively for 20 epochs. Similar one-step validation results for 100 epochs are shown in Fig. 6.51(a)–(d).

The one-step ahead prediction errors for the scaled training data for 20 and 100 epochs, when the network is trained using the ARLS and the MLMA algorithms, are given in the fourth line of Table 6.15. It can be seen in Fig. 6.50(a)–(c) and Fig. 6.51(a)–(c) that the network predictions of the roll rate p̂, pitch rate q̂, yaw rate r̂ and the thrust t̂ closely match the original data used for the network training. A slight prediction mismatch is observed in Fig. 6.50(d) and Fig. 6.51(d) when the ARLS algorithm is used.

[Figure: four panels comparing the scaled training data (blue -) with the one-step ahead output predictions (red --*) of the networks trained by ARLS and MLMA, plotted against the number of training data samples.]
Fig. 6.51: Comparison of the output predictions of the scaled training data using the network trained by ARLS and MLMA: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction for 100 epochs.


Also, the figures in the fourth line of Table 6.15 show that the one-step ahead prediction errors obtained when the network is trained by the MLMA algorithm are smaller than those obtained by the ARLS algorithm. These small one-step ahead prediction errors indicate that both trained networks approximate the nonlinear dynamics of the F-16 aircraft to an appreciable degree of accuracy. This is further justified by the small mean values of the MSE and the performance indexes obtained using the ARLS and MLMA algorithms respectively.

Furthermore, the network trained by the proposed ARLS and MLMA algorithms was validated with 400 unscaled data obtained directly from experiments on the operation of the nonlinear F-16 aircraft.

[Figure: four panels comparing the unscaled target outputs (blue -) with the predicted outputs (red --*) of the networks trained by ARLS and MLMA, plotted against the number of validation data samples.]
Fig. 6.52: Comparison of the output predictions of the unscaled validation data using the network trained by ARLS and MLMA: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction for 20 epochs.


The comparison of the trained network predictions (red --*) with these validation data (blue -) for 20 epochs is shown in Fig. 6.52(a)–(d) for the roll rate p̂, pitch rate q̂, yaw rate r̂ and the thrust t̂ respectively. Similar predictions of the validation data for 100 epochs are shown in Fig. 6.53(a)–(d). The prediction accuracies for the unscaled validation data were assessed by the computed mean prediction errors shown in the fifth line of Table 6.15. It can be seen that the prediction errors when the MLMA algorithm is used are slightly smaller than those obtained when the ARLS algorithm is used. These prediction results for the unscaled validation data given by Figs. 6.52 and 6.53, as well as the prediction errors in Table 6.15, justify that the network mimics satisfactorily the dynamics of the nonlinear F-16 aircraft.

[Figure: four panels comparing the unscaled target outputs (blue -) with the predicted outputs (red --*) of the networks trained by ARLS and MLMA, plotted against the number of validation data samples.]
Fig. 6.53: Comparison of the output predictions of the unscaled validation data using the network trained by ARLS and MLMA: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction for 100 epochs.


6.4.3.2 K–Step Ahead Prediction Simulations

The results of the K-step ahead output predictions (red --*) using the K-step ahead prediction validation method discussed in Chapter 3, for 5-step ahead output predictions (K = 5), compared with the unscaled training data (blue -), are shown in Fig. 6.54(a)–(d) for the roll rate p̂, pitch rate q̂, yaw rate r̂ and the thrust t̂, based on the networks trained with 20 epochs using the ARLS and MLMA algorithms.

[Figure: four panels comparing the unscaled training data (blue -) with the 5-step ahead predictions (red --*) of the networks trained by ARLS and MLMA, plotted against the number of training data samples.]
Fig. 6.54: 5-step ahead output predictions (red --*) compared with the original unscaled training data (blue -) using the network trained by the ARLS and MLMA algorithms for 20 epochs: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction.


Similar K-step ahead output predictions based on the networks trained with 100 epochs using the ARLS and MLMA algorithms are shown in Fig. 6.55(a)–(d). The value K = 5 is chosen since it is a typical value used in most model predictive control (MPC) applications. The comparison of the 5-step ahead output prediction performance of the networks trained using the ARLS and the MLMA algorithms indicates a slightly improved performance of the MLMA over the ARLS. The 5-step ahead output predictions of the unscaled thrust t̂ training data by the networks trained with 20 and 100 epochs using the ARLS algorithm do not show any significant improvement, as can be observed in Fig. 6.54(d) and Fig. 6.55(d), whereas the network trained using the MLMA algorithm is able to perform the distant prediction of the data.
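Continuing the one-step sketch of Section 6.4.3.1, the K-step ahead recursion itself can be sketched as below; the predictions are fed back into the output history over the horizon while the inputs remain the recorded ones, the wrappers build_regressor and nn_predict are the same hypothetical stand-ins used earlier, and the error metric is assumed to be the mean absolute K-step error of (3.93):

    % Hedged sketch of the K-step ahead prediction (K = 5) and the MVPE.
    K = 5; Yhat5 = nan(N, ny);
    for k = 3:N-K+1                    % start after the maximum lag (na = 2)
        Ysim = Ytrain;                 % restart from measured data at each k
        for j = 0:K-1
            phi = build_regressor(Ysim, Utrain, E, k+j);  % as in (6.25)
            Ysim(k+j,:) = nn_predict(theta, phi).';       % feed prediction back
        end
        Yhat5(k+K-1,:) = Ysim(k+K-1,:);                   % 5-step ahead value
    end
    idx = K+2:N;                       % samples with a valid 5-step prediction
    MVPE = mean(abs(Ytrain(idx,:) - Yhat5(idx,:)));       % cf. (3.93)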

[Figure: four panels comparing the unscaled training data (blue -) with the 5-step ahead predictions (red --*) of the networks trained by ARLS and MLMA, plotted against the number of training data samples.]
Fig. 6.55: 5-step ahead output predictions (red --*) compared with the original unscaled training data (blue -) using the network trained by the ARLS and MLMA algorithms for 100 epochs: (a) roll rate prediction, (b) pitch rate prediction, (c) yaw rate prediction and (d) thrust prediction.


The computation of the mean value of the 5-step ahead prediction error (MVPE) using (3.93) gives 1.6506e-2 and 1.6537e-3 for the networks trained using the ARLS and MLMA algorithms for 20 epochs, and 1.6524e-2 and 1.6528e-3 for 100 epochs respectively, as shown in the sixth line of Table 6.15. The small mean values of the 5-step ahead prediction error are indications that the trained network approximates the dynamics of the nonlinear F-16 aircraft to an appreciable degree of accuracy, with the network trained by the MLMA giving smaller prediction errors.

By comparing the one-step output predictions of the scaled thrust t̂ training data in Fig. 6.50(d) and Fig. 6.51(d) with the 5-step ahead output predictions of the unscaled thrust t̂ data in Fig. 6.54(d) and Fig. 6.55(d), the network trained using the ARLS algorithm does not predict accurately the thrust t̂ data. This is also justified by the respective one-step and 5-step ahead prediction errors on the fourth and sixth lines of Table 6.15. This is due to the magnitude of the thrust t̂ data despite the scaling, as in Fig. 6.52. On the other hand, the MLMA algorithm provides additional scaling based on the scaling parameter s in the formulation of the MLMA algorithm in Chapter 3. This pre-scaling property introduced in the MLMA is evident in the accurate prediction of the thrust t̂ data with relatively small prediction errors when compared to those obtained using the ARLS. However, the small output prediction errors obtained by the networks of both the ARLS and MLMA algorithms are acceptable, but further verification of these algorithms for online identification and control will be investigated to further justify their performances.

6.4.3.3 The Akaike’s Final Prediction Error (AFPE) Estimates

The implementation of the AFPE algorithm discussed in Chapter 3 and defined by (3.94) resulted in the estimates given in the last line of Table 6.15 for 20 and 100 epochs. This implementation concerns the network trained with the ARLS and the MLMA algorithms with multiple weight decay for the regularized criterion. These small values of the AFPE estimates indicate that the trained network captures the underlying dynamics of the nonlinear F-16 aircraft and is not over-trained [Sjöberg and Ljung, 1995]. This in turn implies that optimal network parameters, including the weight decay parameters, have been selected. Again, the AFPE estimates computed for the networks trained using the MLMA algorithm are much smaller than those obtained using the ARLS algorithm.


[Figure: (a) block diagram of the neural network model identification scheme with n_a-, n_b- and n_c-step tapped delay lines (TDL) feeding the NNARMAX model of the nonlinear F-16 aircraft, and (b) block diagram of the adaptive control loop comprising the nonlinear F-16 aircraft, the neural network NNARMAX model, the nonlinear optimizer with constraints and a first-order low-pass reference filter.]
Fig. 6.56: The nonlinear F-16 model: (a) neural network model identification and (b) neural network-based adaptive control scheme using the NAMPC control strategy.

6.4.4. Closed-Loop Model-Based Adaptive Control of the Nonlinear F-16 Aircraft

Besides the training of the neural network model with static data taken during the open-loop simulation experiments, it is of interest to observe the application of the presented network training algorithms in closed loop with adaptive control of certain aircraft flight control variables. This adaptive control scheme, as explained in Chapter 4, involves the computation of the control actions by the NAMPC algorithm. Specifically, every time new sensor samples are received, a new set of data is formed consisting of previous samples and the new ones. Then the neural network that models the aircraft dynamics is trained by one of the ARLS or MLMA methods. Using this newly trained network as the aircraft model, an NAMPC controller is designed to track any desired reference trajectory. The performance of the proposed NAMPC scheme was evaluated for the reference trajectories shown in Fig. 6.48 and explained in Section 6.4.1. To achieve these trajectories, constraints are imposed on the controlled and manipulated (or flight control) variables. These constraints are given in Table 6.16. The NN that is used at the control start-up is trained outside the control loop with data received from the open-loop experiment. For the performance evaluation of the complete closed loop, the NAMPC control scheme shown in Fig. 4.4, in conjunction with the neural network NNARMAX model identification scheme shown in Fig. 3.5(b), is linked with the F-16 aircraft Simulink model explained in Appendix D and built from first principles. The nonlinear F-16 aircraft Simulink model is used in place of the "system" in both Fig. 3.5(b) and Fig. 4.4, as shown in Fig. 6.56(a) and (b) respectively, for the nonlinear F-16 neural network model identification scheme and the adaptive control using the NAMPC control strategy.
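In outline, the closed-loop sequence just described can be sketched in MATLAB as follows; f16_step, mlma_update and nampc_design are hypothetical wrappers around the Simulink model, one MLMA training pass and the NAMPC optimizer respectively:

    % Hedged sketch of the adaptive identification-and-control loop of Fig. 6.56(b).
    theta = theta0;                    % NN trained outside the loop at start-up
    U = U0; Z = Z0;                    % initial control inputs and data set
    for k = 1:160                      % 160 sampling instants, Ts = 0.5 sec
        Y = f16_step(U, d(:,k));       % new sensor samples from the aircraft
        Z = [Z; U.' Y.'];              % form the new data set from old and new samples
        theta = mlma_update(theta, Z); % re-train the NNARMAX model of the aircraft
        U = nampc_design(theta, Rref(:,k), constraints);  % next control inputs
    end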

For the closed-loop start-up, a network trained by the ARLS or the MLMA algorithm for 20 and 100 epochs was used, and the NAMPC algorithm was designed by using this initially trained network and the constraints of Table 6.16. The obtained NAMPC initial design parameter values are given in Table 6.17, according to the NAMPC formulation in Chapter 4.


Table 6.16: Input and output constraints on the nonlinear F-16 aircraft

NAMPC process constraints            Aileron   Elevator   Rudder    Thrust
Minimum control input (U_min)         -21.5      -25       -30     -10,000
Maximum control input (U_max)          21.5       25        30     +10,000
Minimum predicted output (Y_min)       -80       -60      -120       1,000
Maximum predicted output (Y_max)        80        60       120      19,000
Desired reference signal (R')          +80       +60      +120      19,000

Table 6.17: Tuning parameters for the NAMPC controller

NAMPC tuning parameters              Aileron    Elevator   Rudder     Thrust
Initial control input (ICI, U)         11          11        11         500
Initial predicted output (IPO, Y)       0           0         0        1000
Nd                                      1           1         1           1
Nu                                      2           2         2           5
Np                                      5           5         5          20
κ                                      0.9          1        0.8          1
ρ                                     0.003       0.08      0.08       0.008
λ                                      0.4         0.5       0.1         0.6
Am                                  [1 -0.7]    [1 -0.7]  [1 -0.7]   [1 -0.7]
Bm                                  [0 0.3]     [0 0.3]   [0 0.3]    [0 0.3]
δ                                     1e-5        1e-5      1e-4        1e-6
Uiter                                   5           5         5          10

They were found to be the same for both the ARLS and the MLMA algorithms with 20 and 100 epochs. The closed-loop simulation was performed over a period of 160 sampling instants by superimposing the inputs and disturbances given in Table 6.14 on the F-16 aircraft Simulink model.

The closed-loop simulation results for the desired outputs (controlled variables), namely the roll rate p̂, pitch rate q̂, yaw rate r̂ and the thrust, are shown in Fig. 6.57(a), while the flight control variables (manipulated inputs), namely the aileron deflection δ_a, elevator deflection δ_e, rudder deflection δ_r and throttle control δ_t, are shown in Fig. 6.57(b) for the neural network model trained using the ARLS and MLMA algorithms with 20 epochs. Similar simulation results for 100 epochs are shown in Fig. 6.58(a) and (b).

One can observe that the yaw rate response of Fig. 6.57(a) obtained with the model trained with the MLMA for 20 epochs tracks the desired reference much better than that obtained with the ARLS algorithm. By comparing the control signals in Fig. 6.57(b) and Fig. 6.58(b), which indicate the control efforts (energy) obtained based on the models trained for 20 and 100 epochs respectively, one can conclude that the flight control variables (manipulated variables) for the aileron deflection δ_a, elevator deflection δ_e and rudder deflection δ_r consume more energy when training is performed with the ARLS.


It is apparent from the above discussion that the proposed model-based NAMPC control algorithm based on a model trained with the MLMA algorithm outperforms that based on the ARLS algorithm. Also, by referring to the output predictions and the computed control inputs shown in Fig. 6.58, based on the models trained for 100 epochs using both the ARLS and the MLMA algorithms, the non-minimum phase behaviour in the aileron deflection δ_a, elevator deflection δ_e and rudder deflection δ_r output predictions is evident in Fig. 6.58(a), resulting from the use of the model trained using the ARLS algorithm. When compared to Fig. 6.57(b), the computed control inputs in Fig. 6.58(b) require relatively little energy consumption.

[Figure: (a) reference, ARLS and MLMA responses of the roll rate, pitch rate, yaw rate and throttle setting predictions, and (b) the corresponding aileron deflection δ_a, elevator deflection δ_e, rudder deflection δ_r and throttle setting δ_t, plotted against the number of samples.]
Fig. 6.57: Responses of controlled variables and time variations of the manipulated variables when the NN is trained with the ARLS and MLMA algorithms for 20 epochs: (a) roll rate, pitch rate, yaw rate and thrust and (b) aileron deflection, elevator deflection, rudder deflection and the throttle setting.


As in Fig. 6.57, the identification and control performance based on the models obtained using both the ARLS and the MLMA with 100 epochs shows the efficiency and reliability of the proposed NAMPC algorithm. It can also be observed that the control performances based on the networks trained using the ARLS and the MLMA algorithms are indistinguishably good, which also justifies the effectiveness of the NAMPC controller.

[Figure: (a) reference, ARLS and MLMA responses of the roll rate, pitch rate, yaw rate and throttle setting predictions, and (b) the corresponding aileron deflection δ_a, elevator deflection δ_e, rudder deflection δ_r and throttle setting δ_t, plotted against the number of samples.]
Fig. 6.58: Responses of controlled variables and time variations of the manipulated variables when the NN is trained with the ARLS and MLMA algorithms for 100 epochs: (a) roll rate, pitch rate, yaw rate and the thrust and (b) aileron deflection, elevator deflection, rudder deflection and the throttle setting.


Table 6.18: Constraints for the nonlinear F-16 aircraft

                                                 AGPC                                   NAMPC
Process constraints               Aileron  Elevator  Rudder   Thrust     Aileron  Elevator  Rudder   Thrust
Minimum control input (U_min)      -21.5     -25      -30    -10,000      -21.5     -25      -30    -10,000
Maximum control input (U_max)       21.5      25       30     10,000       21.5      25       30     10,000
Minimum predicted output (Y_min)    -80      -60     -120      1,000       -80      -60     -120      1,000
Maximum predicted output (Y_max)     80       60      120     19,000        80       60      120     19,000
Desired reference signal (R')       +80      +60     +120     19,000       +80      +60     +120     19,000

Table 6.19: Tuning parameters for the AGPC and NAMPC controllers

                                                 AGPC                                   NAMPC
Tuning parameters                 Aileron  Elevator   Rudder   Thrust     Aileron  Elevator   Rudder   Thrust
Initial control input (ICI, U)      11        11        11       500        11        11        11       500
Initial predicted output (IPO, Y)    0         0         0      1000         0         0         0      1000
Nd                                   1         1         1         1         1         1         1         1
Nu                                   2         2         2         3         2         2         2         5
Np                                   4         4         4         7         5         5         5        20
κ                                    2         1        1.3       4.8       0.9        1        0.8        1
ρ                                    1        0.8       0.8       1.2      0.003     0.08      0.08     0.008
λ                                    -         -         -         -        0.4       0.5       0.1       0.6
Am                               [1 -0.7] [1.05 -0.7] [1 -0.7] [1 -0.7]  [1 -0.7]  [1 -0.7]  [1 -0.7]  [1 -0.7]
Bm                               [0 0.3]   [0 0.3]   [0 0.3]  [0 0.3]    [0 0.3]   [0 0.3]   [0 0.3]   [0 0.3]
δ                                    -         -         -         -       1e-5      1e-5      1e-4      1e-6
Uiter                                -         -         -         -         5         5         5        10

6.4.5 Evaluation and Performance Comparison of the AGPC and NAMPC

Algorithms for the Control of the Nonlinear F-16 Aircraft Auto-Pilot System

Initially, the neural network model of the F-16 aircraft is identified and validated as explained in sub-section 6.4.1. The controllers were then simulated subject to the constraints given in Table 6.18 and tuned using the NN model of the aircraft. The obtained optimal tuning parameters are given in Table 6.19. Next, the nonlinear F-16 aircraft model is placed in closed loop with the NN identification scheme based on the ARLS algorithm and the AGPC and NAMPC controllers. At each sampling instant, a new input-output data pair is obtained from the aircraft, a neural network model is identified, and the two controllers are designed and implemented online.

6.4.5.1 Comparison of Simulation Results of AGPC and NAMPC Controllers

The closed-loop AGPC and the NAMPC responses for the roll rate, pitch rate, yaw rate and the throttle

setting are shown in Fig. 6.59(a) – (d) while the time variations of the manipulated variables, that is, the aileron

deflection, elevator deflection, rudder deflection and the thrust are shown in Fig. 6.59(e) – (h).

For the roll, pitch and yaw rates, the AGPC and the NAMPC perform very well, as can be seen in Fig. 6.59(a)–(c). However, in Fig. 6.59(d), where the magnitude of the throttle setting is very large, one can observe that the NAMPC shows better control performance than the AGPC. Actually, the AGPC exhibits some oscillatory overshoots and hardly tracks the desired reference signal for the throttle.


[Figure: eight panels showing the reference, AGPC and NAMPC closed-loop responses and the corresponding manipulated variables, plotted against the number of samples.]
Fig. 6.59: The AGPC (blue --) and NAMPC (red -) responses of (a) roll rate p, (b) pitch rate q, (c) yaw rate r and (d) the throttle setting t, with their respective manipulated variables (e) aileron deflections, (f) elevator deflections, (g) rudder deflections and (h) the thrust command, to track the reference signal (green .-).

6.4.5.2 Computation Time for the Neural Network Model Identification and

Adaptive Control of the Nonlinear F-16 Aircraft Auto-Pilot System

The neural network identification scheme and the AGPC and NAMPC control strategies above were implemented on an Intel® Core™2 CPU running at 1.86 GHz using the MATLAB "parfor" command available in the MATLAB Distributed and Parallel Toolbox. The "parfor" command executes the iterations of the specified loop in parallel; this MATLAB facility allows the utilization of the four Intel® processors available on the computer for the implementation of the


[Figure: two panels showing the AGPC and NAMPC computation times for the F-16 aircraft (sec) at each time sample, plotted against the number of samples.]
Fig. 6.60: Computation time for the parallel implementation of the identification and control strategies for the nonlinear F-16 auto-pilot control system at each time sample: (a) AGPC and (b) NAMPC.

identification and control algorithm at each time step. The "parfor" session is initialized using the MATLAB "matlabpool open x" command, where x specifies the number of processors to be used in the computation; after the computation, the parallel session is closed using the command "matlabpool close".
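The usage pattern is, in outline, the following (MATLAB syntax of that release; the loop body is an illustrative stand-in for one identification-and-control task):

    matlabpool open 4                  % start a parallel session on 4 processors
    parfor i = 1:4                     % iterations are distributed across workers
        results{i} = one_task(i);      % hypothetical per-processor workload
    end
    matlabpool close                   % close the parallel session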

The plot of the computation time at each time sample is shown in Fig. 6.60. The average computation time for the identification and control loop using the AGPC, from Fig. 6.60(a), is calculated to be 6.1048 seconds, while that for the NAMPC, from Fig. 6.60(b), is calculated to be 11.0367 seconds.

As can be seen in Fig. 6.60, the identification and control using the NAMPC requires significantly more computation time than the AGPC; this time is approximately twice that taken by the AGPC for the nonlinear F-16 aircraft auto-pilot control. This is the cost that has to be paid for higher control accuracy.


6.5 Real-Time Implementation of the Neural Network-Based Adaptive Generalized

Predictive Control (AGPC) Algorithm for Nonlinear F-16 Fighter Aircraft

Auto-Pilot Control System on a Xilinx Virtex-5 FX70T ML507 FPGA Board

In Section 6.4.2, the adaptive recursive least squares (ARLS) algorithm and the modified Levenberg-Marquardt algorithm (MLMA) were applied for the nonlinear neural network identification of a NNARMAX model of the nonlinear F-16 fighter aircraft (referred to as the F-16 aircraft for simplicity). The validation results of Section 6.4.3 showed that, although both algorithms performed excellently, the MLMA algorithm outperformed the ARLS algorithm, especially in terms of the lower computation time, which is very critical in the F-16 aircraft model identification and control. The suitability of the ARLS and MLMA algorithms for closed-loop F-16 aircraft control was investigated in Section 6.4.4, and the results also showed that the MLMA algorithm outperformed the ARLS algorithm. Then, in Section 6.4.5, the performance comparison of the proposed neural network-based adaptive generalized predictive control (AGPC) and the nonlinear adaptive model predictive control (NAMPC) algorithms for the control of the F-16 aircraft auto-pilot system was investigated, where the NAMPC showed improved control performance when compared to that obtained using the AGPC, but at the expense of approximately twice the computation time required by the AGPC.

In Section 6.4.5.2, the average computation times for the neural network-based model identification and adaptive control of the nonlinear F-16 aircraft were found to be 6.1048 and 11.0367 seconds for the AGPC and the NAMPC algorithms respectively. These computation times are much higher than the 0.5 second time constant (sampling time) of the F-16 aircraft [Russell, 2003]. It is possible that the parallelization of these algorithms could result in a significant reduction of the computation time, but this aspect is not considered within the framework of the current study. Likewise, the hardware implementation of the two proposed neural network-based nonlinear model identification algorithms is not considered within the framework of the current study.

However, since the AGPC showed acceptable control performance of the F-16 aircraft auto-pilot control system that, although not as efficient as that obtained using the NAMPC, requires a reduced computation time when compared to the NAMPC, the effort in this work is directed towards the real-time implementation of the neural network-based AGPC algorithm to achieve a further reduced computation time that would be less than the 0.5 second time constant of the nonlinear F-16 aircraft. The second reason why the AGPC algorithm is considered here for FPGA implementation is its simplicity, which may facilitate its hardware realization and analysis by exploiting hardware trade-offs and speed for improved performance.

In the present study, the FPGA realization and implementation of the AGPC algorithm is investigated using the Virtex-5 FX70T FPGA development board of Fig. 2.16, which was introduced and discussed in Section 2.8.4 of Chapter 2. The verification and performance evaluation of the proposed FPGA implementation of the AGPC algorithm on the Virtex-5 FX70T ML507 FPGA development board is performed in closed loop with the F-16 aircraft.


[Figure: the host computer running the MATLAB & Simulink neural network model identification scheme, connected to the Virtex-5 FX70T ML507 FPGA board with the embedded PowerPC™440 processor and AGPC co-processor through a JTAG cable and an RS232 UART serial cable (null modem); the exchanged signals are U(k), Y(k), d(k) and θ̂(k).]
Fig. 6.61: The proposed scheme for the FPGA implementation, verification and performance evaluation of a neural network-based adaptive generalized predictive control (AGPC) algorithm on a Xilinx Virtex-5 FX70T ML507 FPGA board.

Since neither of the two proposed neural network identification schemes is implemented on the FPGA, the MLMA algorithm is implemented on the host development computer, which also simulates the validated Simulink model of the nonlinear F-16 aircraft in closed loop with the Xilinx Virtex-5 FX70T ML507 FPGA board at each sampling time. The proposed strategy for the FPGA implementation, verification and performance evaluation of the AGPC algorithm is shown in Fig. 6.61. In this figure, the host development computer is shown on the right; the Virtex-5 FX70T ML507 FPGA development board is shown in the middle; and the display monitor is shown on the left. The neural network model identification scheme based on the MLMA algorithm and the nonlinear F-16 aircraft are configured and programmed to run in MATLAB and Simulink respectively. As in Section 6.4, U(k) is the control input vector to the F-16 aircraft, Y(k) is the output response of the F-16 aircraft, d(k) is the disturbance affecting the F-16 aircraft, and θ̂(k) is the identified neural network model of the F-16 aircraft.

In Fig. 6.61, the embedded programs that represent the AGPC algorithm, the embedded PowerPC™440 processor, and the other memory and hardware device drivers are downloaded through the JTAG cable, whereas the communication between the host computer and the Virtex-5 FX70T ML507 FPGA board is accomplished through the RS232 UART serial cable (null modem). The term "null modem" indicates that the transmit and receive lines of the RS232 serial cable on the host computer and the Virtex-5 ML507 FPGA board are crosslinked. Unlike the standard RS232 serial cable, where it is assumed that one end is data terminal equipment (DTE) and the other end is data circuit-terminating equipment (DCE), the null modem connection is not covered by a standard based on a specific wiring layout. Finally, the connection between the host computer and the display monitor is a VGA cable, which delivers the information contents of the host computer for display. Of the three connections shown in Fig. 6.61, note that only the RS232 serial cable is bi-directional for data transmit and receive operations.


The proposed technique shown in Fig. 6.61 can be summarized as follows:

1) It is assumed that the neural network (NN) has been previously trained based on experimental data obtained from the simulation of the F-16 aircraft model, that the optimal network parameters have been selected as in Section 6.4.2, and that a validated NN model θ̂(k) has been obtained,

2) It is also assumed that the obtained NN has been used to simulate the AGPC subject to the constraints given in Table 6.18, and that the optimal AGPC control parameters have been selected as given in Table 6.19.

3) The simulations between the host computer and the Virtex-5 FPGA board are controlled by a file named "Flag_a", which is resident on the host computer with initial content "a". The simulations are initialized on the host computer. At the end of the NN model identification process, the identification algorithm writes the letter "b" to "Flag_a" to indicate the end of the identification process. This "b" initializes the embedded AGPC algorithm on the FPGA, which continuously scans "Flag_a" in search of "b" in order to compute the new control inputs. At the end of the control input computation, the embedded AGPC scheme writes an "a" back to "Flag_a" to mark the end of the control input computation and initiate a new identification and control sequence (a sketch of this handshake is given after this list). These read/write and receive/transmit operations are performed via the RS232 serial cables and serial ports on the host computer and the Virtex-5 FX70T ML507 FPGA board. As in Section 6.4.4, all simulations in this section are set for k = 1 to 160 samples.

4) Now, referring to Fig. 6.61, at the current sampling instant k = 1, the validated Simulink model of the F-16 aircraft is simulated with the current input commands in the presence of the disturbances d(k) to obtain the output response Y(k). The current inputs and the output response are added to the training data set Z^N and the network is trained to obtain a NN model θ̂(k) using the MLMA algorithm.

5) The obtained NN model θ̂(k) is then employed for the AGPC controller design to compute the next control inputs that will keep the output response Y(k) close to the desired reference signal R'(k). These computations are performed at the current sampling instant k = 1 with a time constant T = 0.5 and must be completed within this time to update the control inputs and keep the F-16 aircraft on its normal route.

6) At time k = k + 1, the NN model identification and the AGPC computations are repeated on the basis of the "a" and "b" flags respectively in the text file "Flag_a".
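A host-side MATLAB sketch of the "Flag_a" handshake of step 3 is given below; the file name and flag letters come from the text, while the file-access details are illustrative:

    % Hedged sketch of the host-side handshake with the embedded AGPC scheme.
    fid = fopen('Flag_a','w'); fprintf(fid,'a'); fclose(fid);   % initial content "a"
    for k = 1:160
        % ... identify the NN model theta(k) with the MLMA algorithm ...
        fid = fopen('Flag_a','w'); fprintf(fid,'b'); fclose(fid);  % hand over to FPGA
        flag = 'b';
        while flag ~= 'a'              % wait until the embedded AGPC writes "a" back
            pause(0.01);               % avoid busy-waiting on the file
            fid = fopen('Flag_a','r');
            flag = fscanf(fid,'%c',1); fclose(fid);
        end
        % ... read the new control inputs U(k) from the RS232 serial port ...
    end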

6.5.1 Model-Based Approach for the FPGA Implementation of the AGPC Algorithm

The model-based approach is proposed here as the technique to be used for the efficient realization and implementation of the AGPC algorithm on the FPGA board. The term "model-based" reflects the fact that:

1) A hardware model of the AGPC algorithm is first realized by synthesizing the AGPC algorithm, expressed and implemented as MATLAB programs, using the Xilinx AccelDSP synthesis tool,


[Figure: design-flow diagram linking The MathWorks MATLAB & Simulink, the Xilinx AccelDSP synthesis tool with the AccelWare and Core Generator IP libraries, the Xilinx System Generator for DSP with the DSP blockset, the Xilinx Embedded Development Kit (EDK) with the Xilinx Platform Studio (XPS), IP cores and the Software Development Kit (SDK), and the Xilinx ISE Foundation 11.5 generating the RTL top-level module for the Virtex-5 FX70T ML507 FPGA development board on the host computer.]
Fig. 6.62: The block diagram for the proposed model-based design flow for the FPGA implementation of the AGPC algorithm on the Virtex-5 FX70T ML507 FPGA development board.

2) Simulink, which is well known as an excellent model-based development environment, is then used to model the resulting AGPC algorithmic hardware model as a complete system using additional intellectual property (IP) cores from the Xilinx System Generator for DSP block library, and

3) A hardware model, called the hardware co-simulation (HW-CoSim) block model, that encapsulates the complete System Generator model of the AGPC algorithm can be generated to perform prior FPGA-in-the-loop simulations using the actual F-16 aircraft model available in the MATLAB/Simulink environment. In this way, how the complete AGPC algorithmic hardware model will perform when programmed into the FPGA can be evaluated directly from the Simulink environment via the Xilinx System Generator for DSP, and all design modifications can be made at this point if the simulation results of the designed and realized hardware do not meet the desired design requirements.

The simplified block diagram for the proposed model-based approach for the FPGA implementation of the AGPC algorithm is shown in Fig. 6.62. Note that this block diagram differs from the more general one shown in Fig. A.1 of Appendix A–1, in that it is specific to the FPGA implementation of the AGPC algorithm proposed in this Section 6.5. Fig. 6.62 consists of five distinct blocks, which as used in this work are:

1) MATLAB/Simulink from The MathWorks is used for modeling and verifying the algorithmic and model-based designs,
2) The Xilinx AccelDSP, which is used for modeling and synthesizing MATLAB-based algorithms to generate a synthesizable hardware model of the algorithm,
3) The Xilinx System Generator for DSP, which runs as a blockset within MATLAB/Simulink, first uses the hardware model generated by the AccelDSP design flow to create a complete synthesizable model of the AGPC algorithm, then creates a hardware co-simulation (HW Co-Sim) block for FPGA-in-the-loop performance verification of the System Generator model, and finally exports the System Generator model as a pcore for integration with a pre-designed PowerPC™440 embedded processor system,

4) The Xilinx embedded development kit (EDK) is used to design an embedded processor system. As discussed in Chapter 5, the embedded processor proposed for use in this work is the PowerPC™440 hard processor core. Unlike in Fig. A.1 of Appendix A–1, there is no link between MATLAB/Simulink or System Generator for DSP and the Xilinx EDK; rather, the AGPC pcore generated by the System Generator for DSP is copied manually to the pre-designed embedded PowerPC™440 processor directory and integrated with the processor system in the XPS environment within the EDK, as discussed later in sub-section 6.5.5, and

5) The Xilinx integrated software environment (ISE) Foundation uses the register-transfer-level (RTL) top-level module of the design to generate an optimized bitstream for programming the Virtex-5 FX70T ML507 FPGA development board via the JTAG cable between the host computer and the FPGA board.
Note that the circled symbol (A) in Fig. 6.62 illustrates the System Generator path for FPGA-in-the-loop simulation using the hardware co-simulation block without using the Xilinx ISE as a gateway. However, this path still calls and uses the Xilinx ISE Foundation during the compilation and optimization routines and the generation of the hardware co-simulation block, as indicated by the circled symbol (B) in tan. Although the host computer is not counted as one of the five blocks, it is the main development platform upon which all implementations and simulations are performed.

In the remaining parts of this section, the synthesis, modeling, performance verification and evaluation of

the FPGA implementation of the AGPC algorithm on the Virtex-5 FPGA board are given.

6.5.2 Hardware Synthesis of the AGPC Algorithm Using the Xilinx AccelDSP

The adaptive generalized predictive control (AGPC) algorithm is implemented in this work as MATLAB

programs. To realize the hardware implementation of the AGPC algorithm on the Xilinx Virtex-5 FPGA, the

AGPC algorithm is first modeled and synthesized using the AccelDSP modeling and synthesis tool ([XAccelSG,

2009]; [XAccelUG, 2009]). The Xilinx AccelDSP synthesis tool allows for the modeling and synthesis of high-

level MATLAB algorithms for realization on Xilinx FPGAs as discussed in Appendix A–2.
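As an indication of what “synthesizable MATLAB” means in this flow, the sketch below illustrates the streaming style the AccelDSP guidelines expect: one top-level function invoked once per sample, state held in persistent variables, and fixed-size data. The function and the simple filter it computes are illustrative assumptions, not part of the actual AGPC code:

    function y = accel_style_example(u)
    % Each invocation consumes one input sample and produces one output sample.
    persistent u_old;                   % state must live in persistent variables
    if isempty(u_old)
        u_old = 0;
    end
    y = 0.5*u + 0.5*u_old;              % fixed arithmetic on fixed-size data
    u_old = u;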

The detailed formulation of the AGPC algorithm is given in Section 4.3 of Chapter 4 and has been applied to the F-16 aircraft control in Section 6.4.5. Here, the F-16 aircraft model identification and control objectives are as specified in Section 6.4.1, and the desired reference trajectories are as given in Fig. 6.48 of Section 6.4.1. The NNARMAX neural network identification scheme based on the MLMA algorithm used here is the same as that shown in Fig. 6.56 of Section 6.4.4.

Fig. 6.63: The block diagram of the AGPC modeling and synthesis using MATLAB and the Xilinx AccelDSP synthesis tool.

The AGPC is implemented as MATLAB programs with a

script file named “F16_Script.m” and five design function files, namely: “agpc_acceldsp_model.m”, the top-level design function file that calls the AGPC algorithm; “agpc_algorithm.m”, which implements the main AGPC algorithm; “ref_filter.m”, which computes and filters the reference signal; “shift.m”, which is used to update the predicted output and control input sequences during the AGPC computations to obtain the optimal control signals; and “sigmoid_tanh.m”, which is used to extract the input-to-hidden layer weights of the F-16 aircraft NN model.
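Since the source listings are not reproduced here, the following MATLAB sketches indicate, under stated assumptions, the kind of bookkeeping “shift.m” and “ref_filter.m” perform; only the file roles come from the text, the bodies themselves are illustrative:

    function x = shift(x, xnew)
    % Receding-horizon update: discard the oldest entry of a predicted output
    % or control input sequence and append the newest, keeping the length fixed.
    x = [x(2:end, :); xnew];

    function rf = ref_filter(rf_old, r, alpha)
    % First-order filtering of the desired reference trajectory (0 < alpha < 1).
    rf = alpha.*rf_old + (1 - alpha).*r;

For example, after each optimization the control sequence would be advanced with U = shift(U, U(end, :)).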

The objective here is to create a synthesizable hardware model representative of the AGPC algorithm, and so the “Generate System Generator” option in Fig. A.4 (Appendix A–2) is used here, as shown in the block diagram of Fig. 6.63. The procedures required to generate the hardware model of the AGPC algorithm are summarized under the three major blocks shown in Fig. 6.63.

The procedures in Fig. 6.63 begin with the top-left block. The NN model identification scheme and the AGPC control scheme are programmed to run in closed loop with the F-16 aircraft model in MATLAB. The script file “F16_Script.m” invokes the F-16 model identification algorithm, which identifies the NN model of the F-16 aircraft and subsequently invokes the top-level design function file “agpc_acceldsp_model”, which implements the AGPC algorithm for k = 160 samples, subject to the constraints given in Table 6.18. As discussed earlier, at the end of each model identification step “b” is written to “Flag_a” and the model is also placed in the AGPC directory. Based on “b”, the AGPC algorithm uses the new model to compute the new control signals, which are used to control the F-16 aircraft, and writes an “a” to “Flag_a” at the end of the control computations. The identification process is then repeated for the next time sample, and consequently the control computation, until k = 160. This identification and control loop is the verification phase of the AGPC algorithm, which is the function of the two blocks on the left side of Fig. 6.63, indicated by the path with the circled (C) in tan color.

Fig. 6.64: The AccelDSP design flow used to generate the System Generator block model that encapsulates the AGPC algorithm.


The path with the circled (E) in tan color is the synthesis phase, where the pre-verified AGPC algorithmic MATLAB files are synthesized with the Xilinx AccelDSP synthesis tool. The Xilinx AccelDSP tool also includes a set of intellectual property (IP) libraries (Xilinx AccelWare and Xilinx Core Generator functions) which can be included in MATLAB algorithmic programs. As shown on the right side of Fig. 6.63, the Xilinx AccelDSP synthesis tool consists of eight stages to generate the hardware model that encapsulates all five design function files of the AGPC algorithm. These stages are summarized as follows:

Stage 1) Project: The Xilinx AccelDSP is launched to open the AccelDSP GUI as shown in Fig. 6.64. The project name is specified here as “agpc_acceldsp”, and the project directory where all design files, generated files and generated reports will be stored is also specified. The design flow is set to “System Generator”; the RTL language is set to “VHDL”; the fixed point language is set to “C++”; the technology is set to “Virtex-5”; the device is “XC5VFX70T”; the speed grade is specified as “–1”; and the system frequency is set to “100 MHz”. These selected parameters are shown in Fig. 6.64. In the FPGA implementation that follows, all “sampling times”, “BLOCK PERIOD” and “BLOCK LATENCY” values are specified with respect to this system clock frequency, which is fixed for all embedded processors designed for the Virtex-5 FX70T ML507 FPGA board (see Fig. 5.3(d) for example).

Stage 2) Verify Floating Point: This stage prompts for the script file, which in this case is “F16_Script”. The AccelDSP synthesis tool uses this script file to implement and verify the floating point MATLAB AGPC algorithm. The F-16 aircraft AGPC control simulation results are shown in Fig. 6.65. As can be seen in Fig. 6.65, the control performance of the AGPC for the F-16 aircraft is acceptable, owing to the good tracking of the desired reference signals.

Stage 3) Analyze: This stage prompts for the AGPC algorithm top-level function file “agpc_acceldsp_model” and then performs extensive analysis on this file and its sub-function files to ensure that they are fully synthesizable and that they conform to the minimum AccelDSP style guidelines described in the MATLAB for Synthesis Style Guide [XAccelSG, 2009]. Extensive simulations were performed to ensure that the AGPC algorithm conforms to the synthesizable AccelDSP style formats.

Stage 4) Generate Fixed Point: The AccelDSP synthesis tool generates a fixed point equivalent of the floating point MATLAB AGPC algorithm and all the design function files. An intensive quantization of the design was performed at this stage so that the resulting fixed point AGPC algorithm produces results identical, or approximately equal, to those of the floating point AGPC algorithm. Since “Overflows” have more severe effects on the design than “Underflows”, significant effort was made to eliminate “Overflows” in order to achieve fairly accurate results. For complete details and discussions on quantization, Overflows, Underflows and their effects on the “Generate Fixed Point” results, the reader is referred to [XAccelSG, 2009], [XAccelUG, 2009], [XAccelWare, 2007] and [XSysGen, 2010].


Stage 5) Verify Fixed Point: The generated fixed point design is verified and comparisons are made with the floating point equivalent. If significant discrepancies exist in the results, the quantization process is performed again and the “Generate Fixed Point” and “Verify Fixed Point” stages are both repeated.

Fig. 6.65: Floating-point simulation results of the F-16 aircraft control using the MATLAB AGPC algorithm, with a total computation time of 104.8015 seconds. The panels compare the predicted outputs with their references and show the corresponding control inputs over 160 samples: (a) roll rate (p) predictions and aileron deflections (δa), (b) pitch rate (q) predictions and elevator deflections (δe), (c) yaw rate (r) predictions and rudder deflections (δr, deg).

Fig. 6.66: AccelDSP fixed-point simulation results of the F-16 aircraft control using the C++ AGPC algorithm, with a total computation time of 100.17 seconds; the panel layout is the same as in Fig. 6.65.

Stages 3), 4), and 5) required extensive simulations to achieve accurate results. As is evident in

Fig. 6.64, the “Overflows” were completely eliminated while three “Underflows” could not be eliminated. The “Verify Fixed Point” simulation results for the F-16 aircraft AGPC control are shown in Fig. 6.66. By comparing the floating point simulation results of Fig. 6.65 with the fixed point results, it can be observed that the AGPC control performance of the fixed point algorithm closely follows that of the floating point AGPC algorithm, with good tracking of the desired reference signals. During the Verify Fixed Point stage, the AccelDSP synthesis tool captures the data stream on the design inputs and outputs, which will be used for “bit-true” comparison in later stages of the design.

Stage 6) Generate RTL: This stage generates the register-transfer-level (RTL) model in VHDL from the in-memory design database. During the “Generate RTL” process, all hardware logic based on the design specification and the configurations specified during the modification of the design in Stage 3) is automatically generated and inferred. The input and output (I/O) ports for the design are also mapped and implemented. This stage also generates a VHDL “Testbench.vhd” file, stored in the Testbench directory shown in Fig. 6.64, that will be used to verify the RTL model. The generated Testbench contains the VHDL equivalents of the verification constructs for the MATLAB input and output data streams captured during the Verify Fixed Point stage. Several of the hardware resources used in this hardware synthesis and generation are discussed in the next sub-section 6.5.3 and in Appendix E–2.

Stage 7) Verify RTL: The AccelDSP synthesis tool automatically invokes the Xilinx ISE simulator (ISim), which first compiles the VHDL files of the RTL model ([XISim, 2009]; [XISET, 2010]). Next, it verifies the generated VHDL file of the RTL model by applying the generated verification constructs in the Testbench, and monitors and compares the output results. The Verify RTL stage reports “FAILED” or “PASSED” depending on the output comparison result. A “PASSED” was reported during the Verify RTL simulation, which implied that the AGPC algorithm had been correctly mapped to its RTL model.

Stage 8) Generate System Generator: This stage generates the synthesized hardware model of the AGPC algorithm shown in Fig. 6.67. As shown in Fig. 6.67, the inputs to the AGPC hardware model are the NN model parameters as well as the number of samples. The inputs are passed in parallel so that, on a single global clock pulse, all the inputs are used in a frame-based fashion to produce the outputs. The hardware resources used to create the hardware model of the AGPC algorithm are given in Appendices E–1, E–2, and E–4 for convenience.

The generated hardware model of the AGPC algorithm will be referred to as “agpc_acceldsp_model”, which was the name of the top-level function used in creating the hardware model. The AccelDSP Synthesis Tool description of the generated hardware model “agpc_acceldsp_model” is summarized in Fig. 6.68. The block period is 91 clock pulses of the 100-MHz reference clock frequency, which is the sampling period of the generated hardware model, while the latency is 92, obtained by adding one clock pulse to the block period in order to register the

inputs. Thus, the generated hardware model implements the AGPC algorithm at a frequency of 1.0989 MHz, corresponding to 0.91 µs (microseconds) at each time sample.

Fig. 6.67: The System Generator block model of the AGPC algorithm generated by the Xilinx AccelDSP synthesis tool, with the NN weight inputs NN_Wa_1_1 to NN_Wa_1_25 and NN_Wb_1_1 to NN_Wb_1_6, the OUT_SEQ input, and the REF_OUT, PRED_OUT and CONT_OUT outputs. Output sequences 1, 2 and 3 correspond to the aileron–roll, elevator–pitch and rudder–yaw signals respectively.

Fig. 6.68: The AccelDSP Synthesis Tool description of the generated hardware model of the AGPC algorithm “agpc_acceldsp_model”. The block period corresponds to 91 clock pulses of the FPGA 100-MHz reference clock, which is 0.91 µs.

The input-to-hidden layer and hidden-to-output layer weights are designated NN_Wa_1_X and NN_Wb_1_Y respectively, where X is 1 to 25 and Y is 1 to 6. The NN_Wa_1_X are each of type “Fix” with a 12-bit word and the binary point at bit 10 (Fix_12_10), whereas the NN_Wb_1_Y are of type “Fix” with a 20-bit word and the binary point at bit 12 (Fix_20_12). The OUT_SEQ is of type “UFix” with an 8-bit word and the binary point at 0. The outputs REF_OUT, PRED_OUT and CONT_OUT, which correspond to the desired reference signals, predicted outputs and computed control inputs respectively, were all set to type “Fix” with 20-bit words and the binary point at bit 12. The data types, word lengths and binary points were selected by a trial-and-error method (the so-called quantization process); for each trial, Stage 4) Generate Fixed Point and Stage 5) Verify Fixed Point were repeated until the Overflows were eliminated and the simulation results of Fig. 6.66 were obtained.


6.5.2.1 Discussions on the Generated Hardware Model of the AGPC Algorithm

The “Fix” and “UFix” types are signed and unsigned fixed point modes respectively. “Fix” is used to provide range for both positive and negative values of the NN weights as well as the outputs, whose signs are not known in advance. “UFix” is used for the sample count, since the number of samples is always a positive number. An Overflow occurs when the magnitude of a number assigned to a variable exceeds the number of bits allocated to the integer part of the fixed-point word; an Underflow occurs when a very small fractional number is rounded to zero. Underflows are usually more common and less serious than Overflows ([XAccelDSP, 2009]; [XSysGen, 2010]). For example, the observed minimum values of the NN input-to-hidden layer weights were of the order of 10^-5, which made the quantization of their fixed point equivalents more challenging.
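This underflow effect can be reproduced in a few lines of MATLAB (assuming the Fixed-Point Toolbox); the word length matches the Fix_12_10 type used for the input-to-hidden weights, while the weight values themselves are dummies:

    Wa = 2.3e-5 * randn(25, 1);     % dummy input-to-hidden weights of order 1e-5
    Wa_fix = fi(Wa, 1, 12, 10);     % signed "Fix" type: 12-bit word, binary point 10
    qerr = double(Wa_fix) - Wa;     % quantization error per weight
    fprintf('max |quantization error| = %g\n', max(abs(qerr)));
    % With 10 fractional bits the resolution is 2^-10 (about 9.8e-4), so weights
    % of order 1e-5 round to zero: exactly the Underflow discussed above.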

The amount of error between the floating and the fixed point designs is called the quantization error. Because the AccelDSP Auto-Quantizer respects the MATLAB 53-bit limit for simulating bit-true fixed point arithmetic, the maximum “Fix” and “UFix” word lengths were limited to this value. An additional challenge was to ensure that no addition, division or multiplication of two variables in the AGPC algorithm produced a value wider than 53 bits. As the resulting hardware cost increases with the number of bits, significant effort was made to keep the word lengths as small as possible while ensuring the accuracy of the resulting fixed-point algorithm without Overflows.

In addition, as shown in Table 2.1, the maximum number of input-output (I/O) ports available for use on the Virtex-5 FX70T ML507 FPGA board is 640. Here, the generated hardware model uses 613 of these 640 ports. The 608 data I/O ports can be calculated from Fig. 6.68 as 25 × 12-bit weight inputs + 6 × 20-bit weight inputs + one 8-bit OUT_SEQ input + 9 × 20-bit outputs = 300 + 120 + 8 + 180 = 608. In addition to these 608 I/O ports, five “UFix” ports with word length 1 and binary point 0 were created. Four of these are input ports while the fifth is an output port; they are listed consecutively as follows:

1). Clock: The generated hardware model for the AGPC algorithm has one global clock input. Data transfer on each data port is synchronized to the clock. The clock frequency corresponds to the 100 MHz specified in Fig. 6.64.

2). ClockEnable: The ClockEnable enables the clock.

3). Reset: The global reset must be held active high for at least one clock cycle and returns all registers to a known state. The “Generate System Generator” option of the AccelDSP synthesis flow used in this work (see Fig. 6.63 and Fig. 6.64) processes all data at a fixed rate and has a constant throughput. Constant throughput means that all required tasks must be completed within the specified block sample period; this was verified by the “PASSED” issued at the Verify RTL stage of the hardware synthesis.
When the “Generate System Generator” option is specified, it is expected that the generated hardware model will be part of a larger System Generator design, as it is in this work. Therefore, the generated hardware model must process data placed at its input ports and deliver the results to the output ports within the block sample period. This input-output communication is controlled by one of two interface protocols: the full handshake protocol or the push-mode handshake protocol. Whereas the full handshake protocol, supported by the ISE option (that is, the Verify Gate Level option in Fig. A.4), can be used when the design does not have constant throughput, the Generate System Generator option implements the push-mode handshake protocol. Unlike the full handshake protocol, the push-mode handshake protocol does not rely on an input request from the generated hardware before data is sent, nor on an output acknowledgement that data has been received by the output device; this is because the push-mode protocol is limited to designs with a constant throughput ([XAccelUG, 2007]; [XSysGen, 2010]).
Note that if the AccelDSP is unable to implement the push-mode protocol, the full handshake is implemented instead and the AccelDSP design flow switches to the Verify Gate Level option. Experience has shown that the resulting gate-level design produces errors when used in the Xilinx ISE Foundation to generate the programming file; in that case, the MATLAB programs must be modified and the whole AccelDSP flow repeated.

4). ac_InputAvail: The ac_InputAvail signal indicates input availability. A question here may be: from where and how is the input available, and what is the output device? Let the input and output devices be the Xilinx “From Register” and “To Register” blocks respectively, which are discussed in Appendix A–8. The data from the input register is presented at its output, and the hardware model captures and processes the data and then writes the results to the output register. Latency may then be defined as the time between when the data is presented and when it is received; hence, a latency of 1 clock cycle is added in order to compensate for this time lag.

5). ac_OutputAvail: Similarly, ac_OutputAvail indicates output availability. The processed data from the hardware model is written to the output register, which also has a latency of 1 clock cycle.

6.5.2.2 Remarks on the Generated Hardware Model of the AGPC Algorithm

By comparing the nonlinear F-16 control simulation results of Fig. 6.65 and Fig. 6.66, it can be observed that the fixed point C++ program of the AGPC algorithm closely follows the floating point MATLAB program, owing to the high-accuracy quantization of the floating point algorithm. It can also be observed that the C++ program took 100.17 seconds, as shown by the Verify Fixed Point report in Fig. 6.64, against the 104.8015 seconds used by the floating point algorithm, which indicates that the fixed point version is 4.6315 seconds faster than its floating point counterpart.

Dividing these computation times by the total number of samples, 160, the average time required at each sampling instant to compute the control inputs is 0.6550 seconds for the floating point and 0.6261 seconds for the fixed point AGPC algorithm. Note that these times exclude the neural network model identification. The description of the generated hardware model of the AGPC algorithm in Fig. 6.68 shows the block period to be 91, based on the specified frequency of 100 MHz for the Virtex-5 XC5VFX70T FPGA.

The block period of 91 implies that the “agpc_acceldsp_model” block model produces an output for a given input sample after 91 clock cycles; the block period is the time from when data is presented at the block model's input to when it produces the results. Thus, given the neural network model, the “agpc_acceldsp_model” block model implements the AGPC algorithm at a frequency of 1.0989 MHz, that is, 0.91 µs (microseconds) per time sample. Since the generated hardware model implements the AGPC algorithm in 91 clock pulses (0.91 µs) per sample and the expected number of samples is 160, the total number of simulation samples implemented by the generated hardware model is 160 × 91 = 14560.
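These timing figures follow directly from the clock arithmetic, as the short MATLAB check below shows; all quantities are taken from Fig. 6.64 and Fig. 6.68:

    fclk    = 100e6;            % system clock specified in the AccelDSP project (Hz)
    period  = 91 / fclk;        % block period: 91 clock pulses = 0.91 microseconds
    fblock  = 1 / period;       % effective block rate, approximately 1.0989 MHz
    latency = 92 / fclk;        % block latency: period plus 1 cycle to register inputs
    nhw     = 160 * 91;         % 14560 hardware samples for the 160 control steps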

In the next sub-section, the generated hardware model of the AGPC algorithm is combined with the “To Register” and “From Register” shared memory register blocks, as well as other blocks from the Simulink and System Generator block libraries, to build and test the complete AGPC algorithm model with memory interfaces.

6.5.3 Model-Based Implementation of the Synthesized AGPC Algorithm Using

Xilinx System Generator for DSP

The generated hardware model that encapsulates the AGPC algorithm, shown in Fig. 6.67, is employed in this subsection to build the complete System Generator model of the AGPC algorithm. The input and output interfaces to the agpc_acceldsp_model are the shared memory “To Register” and “From Register” blocks, which are taken from the System Generator block library in the Simulink library browser. The complete System Generator model of the AGPC algorithm is shown in Fig. 6.69. These shared memories are used so that an addressable memory-mapped interface can be created through which the agpc_acceldsp_model can be written to and read from.

The agpc_acceldsp_model receives the neural network weights via a bank of 31 shared memory “To Register” blocks. As shown in Fig. 6.69, the input-to-hidden layer weights are grouped as a sub-system into the block Wa_Input_Regs while the hidden-to-output weights are grouped into the block Wb_Input_Regs. The initialization and implementation of the complete System Generator model for the AGPC algorithm of Fig. 6.69 is controlled by “Flag_a” via the input “c”, as discussed previously. If “c = a”, the model identification process is implemented; on completion, the model identification algorithm writes a “b” to the file “Flag_a”, which is used to initialize and implement the AGPC algorithm of Fig. 6.69. On completion, the AGPC algorithm writes an “a”, which is used to repeat the model identification process over the full run of k = 14560 samples. The number of samples is specified via a “Counter Limited” block as 14560 samples. Then the Xilinx Gateway In block, discussed in Appendix A–8 and shown in Fig. 6.69 as “IN_OUT_SEQ”, is used to convert the Simulink integer, double and fixed point data types from the counter into the System Generator fixed point data type.

The outputs of the “agpc_acceldsp_model” are connected to nine shared memory “From Register” blocks, namely: AIL_REF, ELEV_REF and RUDD_REF for the reference signals; AIL_PRED, ELEV_PRED and RUDD_PRED for the predicted outputs; and AIL_ROLL_CONT, ELEV_PITCH_CONT and RUDD_YAW_CONT for the control signals, where AIL, ELEV and RUDD represent aileron, elevator and rudder respectively. Again, these shared memories are used so that an addressable memory-mapped interface can be created through which these registers can be accessed for a write operation by the “agpc_acceldsp_model” and a read operation by a

peripheral.

Fig. 6.69: The complete System Generator model for the generated hardware model “agpc_acceldsp_model” of the AGPC algorithm, showing the Wa_Input_Regs and Wb_Input_Regs input register banks, the nine output “From Register” blocks, the Counter Limited block, the Gateway In and Gateway Out blocks, and the System Generator token.

For the purpose of evaluating the performance of the complete AGPC algorithm of Fig. 6.69 in

Simulink, these output registers are connected to the Xilinx “Gateway Out” blocks shown in Fig. 6.69. As discussed in Appendix A, these Gateway Out blocks convert the System Generator fixed point data types to Simulink integer, double and fixed point data types for plotting on the Simulink scope blocks.

As a general rule, every System Generator design must include at least the System Generator Token at the top level of the design hierarchy. The System Generator Token block is introduced and discussed in Appendix A–3. The System Generator Token is included and shown at the bottom of Fig. 6.69. Note that the block is not connected to any other block; rather, it acts as an interface to the Xilinx design and simulation tools.

Fig. 6.70: The nonlinear F-16 aircraft control simulation results using the System Generator model of the AGPC algorithm of Fig. 6.69: (a) roll rate (p) predictions and aileron deflections (δa), (b) pitch rate (q) predictions and elevator deflections (δe), and (c) yaw rate (r) predictions and rudder deflections (δr, deg), each over the 14560 hardware samples.

Fig. 6.71: (a) The System Generator token (left) and the six System Generator compilation options with the Hardware Co-Simulation options for the Virtex-5 ML507 (right), and (b) the generated Hardware Co-Simulation block “F16_nagpc_ipcorehwcosim” with its JTAG Co-sim interface.

Next, the complete System Generator model of Fig. 6.69 is implemented in closed loop with the nonlinear F-16 aircraft Simulink model and the neural network model identification scheme using the Modified Levenberg-Marquardt algorithm (MLMA). As discussed earlier, due to the block period of 91 clock pulses with respect to the 100-MHz reference clock frequency, the number of Simulink simulation samples is specified as 14560. The closed-loop F-16 aircraft control simulation results are shown in Fig. 6.70. It can be observed that the System Generator model of the AGPC algorithm closely tracks the desired reference signals. However, by comparing Fig. 6.70 with Fig. 6.65 and Fig. 6.66, a small mismatch appears in the fourth control sample sequence in the two plots of Fig. 6.70 (a), while another can be observed in the top plots of Fig. 6.70 (b) and (c). Nevertheless, the control performance of the complete System Generator model for the AGPC algorithm using “agpc_acceldsp_model” gives acceptable control results. The computation time for the F-16 aircraft control for 160 control samples, at the “agpc_acceldsp_model” block period of 91 cycles of the 100-MHz reference clock over the 14560 hardware samples, using the complete System Generator model of the AGPC algorithm is 1.8815 seconds. This implies that each control action is executed in 0.12922 ms (milliseconds), an improvement of 5.0689 × 10^3 times over the computation time obtained with the MATLAB floating-point AGPC algorithm.

Although the achieved computation time of 0.12922 ms by the System Generator model of the AGPC algorithm is approximately 3.8694 × 10^3 times below the 0.5-second time constant of the F-16 aircraft [Russel, 2003], this figure is based on the AGPC control simulations only; additional time will be introduced by the model identification scheme as well as by actuator time constants not covered here. Hence, further computation time reductions are desirable where possible. One approach is to integrate the System Generator model of the AGPC algorithm with an embedded processor system as co-processing hardware. In this way, the synthesized AGPC algorithm can be executed at or close to the embedded processor system's operating frequency.

Before integrating the System Generator model of the AGPC algorithm with an embedded processor

system, it is necessary to test the closed-loop performance of this model in a hardware-in-the-loop co-simulation

with the Virtex-5 ML507 FPGA board.

6.5.4 Hardware-in-the-Loop Co-Simulation of the System Generator Model of the

Synthesized AGPC Algorithm on Xilinx Virtex-5 FX70T ML507 FPGA Board

System Generator for DSP provides hardware-in-the-loop co-simulation (HW Co-Sim), which makes it possible to incorporate the System Generator model of the AGPC algorithm running in Simulink directly into the Virtex-5 FX70T ML507 FPGA board. This allows the compiled portion of the System Generator model to be tested on the actual Virtex-5 FX70T ML507 FPGA board. Through the HW Co-Sim, the performance of the System Generator model of the AGPC algorithm when downloaded to the Virtex-5 FX70T ML507 FPGA board can be verified in advance, and modifications can be made at any phase of the design to correct errors.

In order to perform this HW Co-Sim of the System Generator model of the AGPC algorithm on the Virtex-5 FX70T ML507 FPGA board, the HW Co-Sim block must first be generated using the System Generator Token, as discussed in Appendix A–3. As Fig. A.5 in Appendix A–3 shows, board support for HW Co-Sim of the Virtex-5 FX70T ML507 is not available.

Fig. 6.72: The System Generator model of the AGPC algorithm for the nonlinear F-16 aircraft auto-pilot control with the generated Hardware Co-Simulation block.

Here, the four MATLAB program files that configure the ML506 FPGA board for HW Co-Sim are copied to a new sub-directory and modified for the Virtex-5 FX70T ML507 FPGA

board. The four files are ML506_JTAG.ucf, ML506_JTAG_PostGeneration.m, ML506_JTAG_Target.m and xltarget.m. The modifications are made by changing all the “6”s to “7”s (i.e., ML506 to ML507) as well as entering the specifications of the Virtex-5 XC5VFX70T shown in Fig. 6.64 for the ML507 board. These modifications now allow the Virtex-5 FX70T ML507 FPGA board to be used for HW Co-Sim via the System Generator Token, as shown in Fig. 6.71(a) when compared to Fig. A.5 in Appendix A–3.
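The file adaptation just described can be scripted. The MATLAB sketch below is a hedged illustration only: the target directory name and the blanket “506” to “507” substitution are assumptions, and the device-specific entries (XC5VFX70T, speed grade) must still be checked by hand against Fig. 6.64:

    src = {'ML506_JTAG.ucf', 'ML506_JTAG_PostGeneration.m', ...
           'ML506_JTAG_Target.m', 'xltarget.m'};
    mkdir('ML507_JTAG');                        % illustrative target sub-directory
    for i = 1:numel(src)
        txt = fileread(src{i});                 % read the original ML506 board file
        dst = fullfile('ML507_JTAG', strrep(src{i}, '506', '507'));
        fid = fopen(dst, 'w');
        fwrite(fid, strrep(txt, '506', '507')); % change ML506 designations to ML507
        fclose(fid);
    end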

Now, using the System Generator model of the AGPC algorithm of Fig. 6.69 and the System Generator Token of Fig. 6.71(a), the HW Co-Sim block for the System Generator model is generated, which is shown in Fig. 6.71(b). As can be seen in Fig. 6.71(a), both Ethernet and JTAG options for the HW Co-Sim are available; the JTAG interface option is used in this work. The reason is that the Ethernet HW Co-Sim interface utilizes an evaluation version of the Ethernet MAC IP core, which becomes dysfunctional after continuous, prolonged operation on the FPGA board of approximately 7 hours.

To set up the JTAG HW Co-Sim simulation model, the “Counter Limited” and “Scope” blocks from the Simulink libraries, similar to those in Fig. 6.69, are added to the generated JTAG HW Co-Sim block of Fig. 6.71(b) to obtain the complete HW Co-Sim simulation model shown in Fig. 6.72. Next, the Virtex-5 FX70T ML507 FPGA development board is connected in a manner similar to the scheme shown in Fig. 6.61, and the closed-loop hardware-in-the-loop co-simulation with the Virtex-5 FX70T ML507 FPGA board is performed.

Fig. 6.73: Hardware-in-the-loop co-simulation results produced by the generated Hardware Co-Simulation block model evaluated on the Xilinx Virtex-5 ML507 FPGA board over the JTAG cable. In the top plots, the output predictions (yellow) are compared to the reference signals (red); the bottom plots are the control signals. (a), (b) and (c) are the simulation results for the aileron–roll, elevator–pitch and rudder–yaw prediction and control respectively.

The control simulation results are shown in Fig. 6.73. Comparing the control results of Fig. 6.73 to those of Fig. 6.65, Fig. 6.66 and Fig. 6.70, it can be observed that the JTAG HW Co-Sim gives similarly good and acceptable control performance. This is an indication that the designed System Generator model of the AGPC algorithm is a good representative of the original floating point MATLAB algorithm. It also indicates that good performance will be achieved from the System Generator model of the AGPC algorithm when it is programmed into the Virtex-5 FX70T ML507 FPGA.

The computation time for the F-16 aircraft control over the 14560 samples with the “agpc_acceldsp_model”, using the JTAG HW Co-Sim block generated from the System Generator model of the AGPC algorithm, is 9.2561 seconds. This implies that the AGPC control action is executed in 0.63572 ms at each time sample, an improvement of 1.0303 × 10^3 times over the original floating point MATLAB algorithm; the achieved computation time is also approximately 7.8651 × 10^2 times below the 0.5-second time constant of the F-16 aircraft [Russel, 2003], based on the control simulations only. Comparing the 0.63572 ms obtained with the HW Co-Sim block model with the 0.12922 ms obtained with the System Generator model, the HW Co-Sim computation time is longer by a factor of about 4.9197; thus, the pure System Generator simulation of the synthesized AGPC algorithmic model is about 5 times faster than the hardware-in-the-loop implementation. Such an increase in computation time is not unusual [XSysGen, 2010].
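The per-sample times and speedup factors quoted in this section can be recomputed from the reported totals; the following MATLAB lines cross-check those figures:

    total   = [104.8015 100.17 1.8815 9.2561];  % MATLAB, C++, SysGen, HW Co-Sim (s)
    nsamp   = [160 160 14560 14560];            % number of samples in each run
    persamp = total ./ nsamp;                   % per-sample control computation time
    speedup = persamp(1) ./ persamp;            % speedup relative to floating point
    % persamp is approx. [0.6550 0.6261 1.2922e-4 6.3572e-4] seconds, and
    % speedup is approx. [1.0 1.05 5.07e3 1.03e3], matching the factors above.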

The possible reasons for the increased computation time are: 1) the HW Co-Sim block in effect produces the clock signal for the Virtex-5 FPGA board only when Simulink initializes it, 2) there is overhead associated with the simulation of the rest of the Simulink model, and 3) the communication overhead due to bus latency between Simulink and the Virtex-5 FPGA can significantly limit the achievable performance, including increased computation time. In addition to the gate-level wrapper generated for “agpc_acceldsp_model” by the AccelDSP synthesis tool (see Fig. 6.64), System Generator for DSP also generates the memory-map interfaces for the “agpc_acceldsp_model” and provides the necessary hardware interfaces and software drivers for the System Generator model. Although a brief discussion of HW Co-Sim is given in Appendix A–3, more complete details on the HW Co-Sim block generation process can be found in ([XAccelUG, 2009]; [XSysGen, 2010]).

The next sub-section deals with the generation of the AGPC Co-Processor IP core (pcore) from the System Generator model of the synthesized AGPC algorithm and the integration of this pcore, as co-processing hardware, with an embedded PowerPC™440 processor system.

6.5.5 Generation and Integration of an AGPC Co-Processor Intellectual Property

(IP) Core with an Embedded PowerPC™440 Processor System

This sub-section builds on the embedded PowerPC™440 processor system developed and tested in Section 5.3.2 of Chapter 5. Based on the discussions in Section 5.3, the IBM PowerPC™440 hard processor core is the embedded processor used here. The System Generator model of the AGPC algorithm is used to generate an IP core, which shall be referred to as the “AGPC Co-Processor” in this work since it is a highly optimized algorithm-processing hardware. The AGPC Co-Processor is then exported and integrated with a copy of the pre-

designed embedded PowerPC™440 processor system of Section 5.3.2 as co-processing hardware.

Fig. 6.74: The System Generator model for the AGPC algorithm with the EDK Processor block used to generate the AGPC Co-Processor IP core; the model is renamed “f16_nagpc_ipcore” to distinguish it from Fig. 6.69.

The complete

embedded system design is accomplished using the Xilinx EDK, discussed in Appendix A–4. Overviews of the concepts of importing and exporting the System Generator model of a design are given in Appendix A–5 and Appendix A–6 respectively. A more detailed treatment of all the topics covered in this sub-section can be found in ([XEDKPro, 2010]; [Xilinx, 2010]; [XEPB Virtex-5, 2010]; [XPSFRM, 2010]; [XSysGen, 2010]).

The System Generator Token provides a simple abstraction for adding custom logic into a processor system, as discussed in Appendix A–4 (see Fig. A.9), and it is also the main tool used for generating custom logic such as the AGPC Co-processor via the EDK pcore generation option in Fig. A.8. The pcore generation process is facilitated if a memory-mapped interface is available on the custom logic; the shared memory “To Register” and “From Register” blocks in Fig. 6.69 provide this interface.


In order to generate the AGPC Co-processor IP core, the EDK Processor block shown in Fig. A.8 of Appendix A–4, and discussed in Appendix A–8, is added to the System Generator model of Fig. 6.69. The resulting System Generator model for the AGPC algorithm, with the EDK Processor block, is shown in Fig. 6.74. This modified AGPC algorithmic hardware model is renamed “f16_nagpc_ipcore” to distinguish it from Fig. 6.69. The EDK Processor is then configured for the EDK pcore generation option as in Fig. A.8. Next, the memories are added using the “Add” tab available on the EDK Processor, and the EDK Processor is updated using the “Apply” and “OK” buttons, as in Fig. A.8. The AGPC Co-processor IP core generated using the EDK Processor block via the System Generator Token in the Simulink environment is shown in Fig. 6.75.

In Fig. 6.75, the large middle block is called, by default, the “plb_memmap”; it maps all the hardware logic used to create the AGPC algorithm “agpc_acceldsp_model” into specific memory locations. The top left block, also named by default, is the “plb_decode”, which is used to decode the memory locations for read and/or write operations and consists of ten registers. The first nine blocks on the bottom left are the “From Register” blocks, which correspond to the nine registers for writing the reference signals, predicted outputs and control signals respectively from the “agpc_acceldsp_model” to a data terminal equipment (DTE) or to a file. The tenth “From Register” block is for the HW_OUT_SEQ used for specifying the number of samples. All thirty-one “To Register” blocks are listed on the right and are used to read input data into the “agpc_acceldsp_model” from specified locations. Note that all unused outputs are terminated with the Simulink “Terminator” block from the Simulink Sinks library, and the sub-systems “Wa_Input_Regs” and “Wb_Input_Regs” have been removed so that only the actual “Shared Memory” register blocks are shown in the generated AGPC Co-processor system.

Exporting the AGPC Co-processor IP core “f16_nagpc_ipcore” to the embedded PowerPC™440 processor system designed in Chapter 5 is a straightforward process. A copy of the embedded PowerPC™440 processor system designed in Chapter 5 is made. Then, the contents of the “pcore” directory into which the “f16_nagpc_ipcore” has been generated within the System Generator/Simulink project directory are copied to the “pcore” directory of the embedded PowerPC™440 project. The next task is to connect and configure the f16_nagpc_ipcore in the embedded processor environment. This task requires the use of the Xilinx ISE™ and the XPS, similar to the embedded processor design discussed in Chapter 5. The procedures for connecting and configuring the AGPC Co-processor IP core “f16_nagpc_ipcore” are summarized below using the XPS GUI shown in Fig. 6.76:

1) First the Xilinx ISE™ is opened, followed by the XPS software. The current project name is specified here as “emb_ppc440_agpc”. On the XPS GUI, the user repository is rescanned to include the just-copied “f16_nagpc_ipcore”; this is achieved by selecting “Project > Rescan User Repositories” from the XPS GUI. This brings up the “f16_nagpc_ipcore” as “f16_nagpc_ipcore_plbw” under USER in the IP Catalogue section of the XPS GUI. The “f16_nagpc_ipcore_plbw” will simply be referred to as the “ipcore”.
2) Next, right-clicking the ipcore and selecting “Add IP” automatically adds the ipcore to the embedded PowerPC™440 processor system.

Chapter 6 Case Study 4: FPGA Implementation of NN–Based AGPC Algorithm

281

In

sg_plb_addrpref

wrDBus

bankAddr

linearAddr

RNWReg

addrAck

sm_RUDD_REF

sm_RUDD_PRED

sm_RUDD_YAW_CONT

sm_ELEV_REF

sm_ELEV_PRED

sm_ELEV_PITCH_CONT

sm_AIL_REF

sm_AIL_PRED

sm_AIL_ROLL_CONT

sm_HW_OUT_SEQ

read_bank_out

sm_HW_NN_Wb_6_din

sm_HW_NN_Wb_6_en

sm_HW_NN_Wb_5_din

sm_HW_NN_Wb_5_en

sm_HW_NN_Wb_4_din

sm_HW_NN_Wb_4_en

sm_HW_NN_Wb_3_din

sm_HW_NN_Wb_3_en

sm_HW_NN_Wb_2_din

sm_HW_NN_Wb_2_en

sm_HW_NN_Wb_1_din

sm_HW_NN_Wb_1_en

sm_HW_NN_Wa_9_din

sm_HW_NN_Wa_9_en

sm_HW_NN_Wa_8_din

sm_HW_NN_Wa_8_en

sm_HW_NN_Wa_7_din

sm_HW_NN_Wa_7_en

sm_HW_NN_Wa_6_din

sm_HW_NN_Wa_6_en

sm_HW_NN_Wa_5_din

sm_HW_NN_Wa_5_en

[Figure: System Generator block diagram of the generated AGPC Co-processor IP core, comprising the PLB memory-map and decode logic (plb_memmap, plb_decode) with the PLB slave signals (inputs PLB_ABus, PLB_PAValid, PLB_RNW, PLB_wrDBus and SPLB_Rst; outputs Sl_addrAck, Sl_wrDAck, Sl_wrComp, Sl_wait, Sl_rdDBus, Sl_rdDAck and Sl_rdComp), the "To Register" shared memories holding the neural network weights HW_NN_Wa_1 to HW_NN_Wa_25 and HW_NN_Wb_1 to HW_NN_Wb_6 (each with din/en/dout ports), and the "From Register" shared memories holding the controller signals HW_OUT_SEQ, AIL_ROLL_CONT, AIL_PRED, AIL_REF, ELEV_PITCH_CONT, ELEV_PRED, ELEV_REF, RUDD_YAW_CONT, RUDD_PRED and RUDD_REF, all with double-precision gateways.]
Fig. 6.75: The generated AGPC Co-processor IP core that will be integrated with a PowerPC™440 processor system.


3) Next, the newly generated ipcore is connected to and configured with the PowerPC™440 processor system through the

processor local bus (PLB) using the “System Assembly View” of the XPS GUI as follows.

i). On the XPS GUI, the Bus Interfaces tab is selected and the ipcore is located. The ipcore is connected to

the PLB by selecting “plb_v46_0”.

ii). Next, the Ports tab on the System Assembly View of the XPS GUI is selected and the ipcore is located.

All the eleven (11) ports of the newly added ipcore are made external by selecting "Make External" for

each of the ports.

iii). Finally, to complete the configuring process, the ipcore must be assigned a valid address in the

embedded system. This is achieved by selecting the “Addresses” tab on the System Assembly View of

the XPS GUI and clicking "Generate Addresses". This action automatically creates a memory map for the added ipcore and integrates the ipcore into the embedded processor system. Although no port address conflict occurred in this implementation, any conflict that does arise can be resolved by manually re-assigning the memory addresses of the conflicting ports.

4) To verify that no error(s) occurred for the connected ipcore, the complete embedded PowerPC™440–AGPC Co-Processor (f16_nagpc_ipcore_plbw) is compiled using the XPS and the Xilinx ISE™, following the 9-step procedure outlined in Section 5.3.2 of Chapter 5.

According to these procedures, the board support packages and the libraries are generated by selecting "Software → Generate Libraries and BSPs" on the XPS GUI, and the netlist by selecting "Hardware → Generate Netlist". Next, the complete XPS project is saved and the Xilinx ISE™ is used to

generate the bitstream by selecting and double clicking on “Generate Programming File” on the Xilinx

ISE™ GUI similar to that on Fig. 5.6. All signals were completely routed, all timing constraints were met

and no error(s) were recorded, as shown in the XPS and Xilinx ISE™ synthesis results in Appendix E–1 and E–2 respectively. Appendix E–3 gives the summary of the main contents of the embedded

PowerPC™440–AGPC Co-Processor system. It can be seen in Appendix E–3 that the AGPC Co-

Processor has been added as a peripheral. Its main function is to implement the synthesized and

embedded AGPC algorithmic function.

5). Next, the complete embedded system must be compiled so that it can run on the Virtex-5 FPGA board.

This involves writing software that will initialize the hardware and peripheral drivers. During the AGPC

ipcore generation, the System Generator for DSP generated a complete application programmer interface

(API) that can be used to develop the software required to implement the f16_nagpc_ipcore_plbw

together with the embedded processor system using the Xilinx software development kit (Xilinx SDK).

The generated API is shown in Appendix E–5. The API simplifies the software development process for the ipcore since all the associated shared memories and specifications for writing the software are given; an illustrative sketch of such shared-memory accesses is given at the end of this step.

The complete software for implementing the embedded PowerPC™440 processor–AGPC Co-

Processor system is given in Appendix E–6. The software reads the neural network model from the project directory.

Fig. 6.76: The XPS graphical user interface for connecting and configuring the embedded PowerPC™440 processor and the AGPC Co-Processor system.

The "data read operations" into the Xilinx SDK environment are achieved using file

pointers. The files for writing the embedded AGPC Co-Processor control computations to the nonlinear

F-16 aircraft are also created using file pointers. Next, the f16_nagpc_ipcore_plbw ipcore is initialized

followed by all the shared memories. The developed software is first compiled using the Xilinx SDK to ensure that it is error free.

Fig. 6.77: The Xilinx ISE™ software for the synthesis, implementation and generation of the programming file for the embedded PowerPC™440 processor–AGPC Co-Processor system.

As discussed earlier, the software for implementing the embedded

PowerPC™440 processor–AGPC Co-Processor system is controlled by the “Flag_a” as described. A “b”

written to “Flag_a” by the neural network model identification scheme based on the modified Levenberg-

Marquardt algorithm (MLMA) from the MATLAB environment initiates the AGPC controller

implementation. Although closed-loop simulation is implemented and discussed in the next sub-section, the content of "Flag_a" is first set to "b" in order to verify that the developed software is error free.

At each time sample, the developed software writes an “a” to “Flag_a” which is manually changed to “b”

during the software testing process with the Virtex-5 FPGA board. The software is named

f16_nagpc_ipcore as the parent project.
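As an illustration of the kind of shared-memory access the generated API specifies, the following sketch writes one neural network weight register and reads back one prediction register through generic memory-mapped pointers over the PLB. This is a hypothetical sketch only: the base address, offsets and function names are invented placeholders, not the values or names of the generated API in Appendix E–5.

    #include <stdint.h>

    /* Hypothetical base address and register offsets for the AGPC ipcore;
       the actual values come from the memory map created by the XPS
       "Generate Addresses" step and the API of Appendix E-5. */
    #define AGPC_BASEADDR       0xC8000000u   /* hypothetical PLB slave base */
    #define HW_NN_WA_1_OFFSET   0x00u         /* shared memory HW_NN_Wa_1    */
    #define ELEV_PRED_OFFSET    0x40u         /* shared memory ELEV_PRED     */

    /* Write a 32-bit word to one of the ipcore's memory-mapped registers. */
    static inline void agpc_write_reg(uint32_t offset, uint32_t value)
    {
        *(volatile uint32_t *)(AGPC_BASEADDR + offset) = value;
    }

    /* Read a 32-bit word back from the ipcore over the PLB. */
    static inline uint32_t agpc_read_reg(uint32_t offset)
    {
        return *(volatile uint32_t *)(AGPC_BASEADDR + offset);
    }

    /* Example: load one hidden-layer weight (HW_NN_Wa_1) and read back the
       elevator prediction (ELEV_PRED) computed by the co-processor. */
    void agpc_io_example(uint32_t wa_1_fixed_point)
    {
        agpc_write_reg(HW_NN_WA_1_OFFSET, wa_1_fixed_point);
        uint32_t elev_pred = agpc_read_reg(ELEV_PRED_OFFSET);
        (void)elev_pred;   /* would feed the closed-loop simulation */
    }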

Next, a new software application project named exactly f16_nagpc_ipcore is created using the XPS and the verified software is imported into the XPS as shown in Fig. 6.76. The software is compiled into the embedded processor system using the XPS, and it is marked to initialize using the on-board BRAMs for faster execution. The complete embedded PowerPC™440 processor–AGPC Co-Processor system is simulated and compiled using the XPS and the Xilinx ISE™ software to ensure that there are no errors with the complete embedded system, and that all signals are completely routed and all timing constraints are satisfied. The results of the compilation, Generate Libraries and BSPs, Netlist, synthesis and Generate Programming File steps are shown in Fig. 6.76 and Fig. 6.77 respectively for the XPS and Xilinx ISE™ software. As can be seen in both figures, the complete embedded system compiled successfully without errors while meeting all timing constraints, with all signals routed (see Appendix E–1 and E–2).

The complete embedded PowerPC™440 processor, AGPC Co-Processor and the associated memories and

peripherals are shown in Fig. 6.78. It can be seen in the figure that the attached AGPC Co-Processor system is

connected as a slave to the PowerPC™440 processor local bus (PLB). This allows the attached co-processor to

run at the speed of the embedded processor, with enhanced speed performance.

The hardware resources used for the synthesis, modeling and generation of the AGPC Co-Processor

“f16_nagpc_ipcore_plbw” starting with the synthesis of the floating point MATLAB AGPC algorithm using the

AccelDSP Synthesis tool in Section 6.5.2 to the ipcore (f16_nagpc_ipcore) generation and integration with the

embedded PowerPC™440 processor system are given in Appendix E–4. For convenience, these resources are

listed in Table 6.20. These resources constitute the actual hardware resources used for mapping the floating-point MATLAB AGPC algorithm using the AccelDSP Synthesis tool plus the additional resources used in building the System Generator model of the complete AGPC Co-Processor system.

Table 6.20: The total resources used by the AccelDSP Synthesis and System Generator for DSP modeling

tools for synthesizing, modeling and generating the AGPC Co-Processor system.

Resource Type Used Available Percent (in %)

Slice Registers 4,737 44,800 10

Slice LUTs (Look-Up Tables) 6,693 44,800 14

LUT Flip Flop (FF) Pairs Used 8,006 NA NA

Fully Used LUT-FF Pairs 3,424 8,006 42

Unique Control Sets 74 NA NA

Input-Output Buses (IOBs) 486 NA NA

Bonded IOBs 0 640 0

Block RAM/FIFO 6 148 4

BUFG/BUFGCTRLs 1 32 3

DSP48Es 14 128 10


Fig. 6.78: The block diagram of the embedded PowerPC™440–AGPC Co-Processor system with its associated memory types, peripherals, clock generator, buses, hardware and software specifications and key/symbols.

The 14 Virtex-5 DSP48E slices indicated in Table 6.20 are the main hardware resources that determine the computational efficiency of the generated AGPC Co-Processor system. The architecture and functional operation of the Virtex-5 DSP48E slice are discussed in Appendix A–8 and in [XAccelUG, 2009], [XSysGen, 2010] and [XEDKPro, 2010].


In order to evaluate the hardware resources used to implement the AGPC Co-Processor in the embedded

processor system, the hardware resources used by the Xilinx Platform Studio (XPS) for the synthesis of the

embedded PowerPC™440 processor system given in Appendix A–11.1 (also summarized in Table 5.1) and that

used for the complete synthesis of the embedded PowerPC™440 processor–AGPC Co-Processor system given in Appendix E–1 are compared and summarized in Table 6.21. Similarly, the hardware resources used by the Xilinx ISE™ software for the implementation of the embedded PowerPC™440 processor system given in Appendix A–11.2 (also summarized in Table 5.1) and that used for the complete synthesis of the embedded PowerPC™440 processor–AGPC Co-Processor system given in Appendix E–2 are also compared and summarized in Table 6.22.

Table 6.21: Comparison of the hardware resources used by the Xilinx Platform Studio (XPS) for the AGPC Co-Processor system synthesis. Each row lists, for the embedded PowerPC™440–AGPC Co-Processor system, the flip flops used, the look-up tables (LUTs) used and the BlockRAMs (BRAMs) used, followed by the changes attributable to the AGPC Co-Processor system (– decrease by and + increase by).

f16_nagpc_ipcore_plbw_0_wrapper 4737 6693 +4737 +6693 +6
proc_sys_reset_0_wrapper 67 51 0 0
jtagppc_cntrl_inst_wrapper 2 0
clock_generator_0_wrapper 4 3 0 0
ddr2_sdram_wrapper 2355 1765 2 0 -3 0
sram_wrapper 544 316 0 0
rs232_uart_1_wrapper 140 119 -1 -8
xps_bram_if_cntlr_1_bram_wrapper 2 -14
xps_bram_if_cntlr_1_wrapper 255 230 0 +2
plb_v46_0_wrapper 139 214 +1 -6
ppc440_0_wrapper 2 3 0 0

Table 6.22: Comparison of the hardware resources used by the Xilinx ISE™ for the implementation of the AGPC Co-Processor system. Each row lists, for the embedded PowerPC™440–AGPC Co-Processor system, the resources used, the resources available and the device utilization, followed by the change attributable to the AGPC Co-Processor system (– decrease by and + increase by).

Slice Logic Utilization:
Number of Slice Registers 7,246 44,800 16% +9%
Number of Slice LUTs 9,115 44,800 20% +5%
Number of Route-Thrus 43 +21
Number of LUT Flip-Flop Pairs Used 11,185 +7,051
Number of Occupied Slices 4,238 11,200 38% +23%
Number of Bonded IOBs 373 640 58% +30%
Number of LOCed IOBs 184 373 49% -51%
IOB Flip Flops 330 0
Number of Block RAM/FIFO 12 148 8% -5%
Total Memory Used (KB) 432 5,323 8% -5%
Number of BUFG/BUFGCTRLs 7 32 21% 0%
Number of IDELAYCTRLs 3 22 13% 0%
Number of BUFIOs 8 80 10% 0%
Number of DCM_ADVs 1 12 8% 0%
Number of PLL_ADVs 1 6 16% 0%
Number of PPC440s 1 1 100% 0%
Number of DSP48Es 14 128 10% +10%
Average Fanout of Non-Clock Nets 4.05 +1.05

The results of Table 6.21 indicate that an additional 6 Block RAMs, used by the "f16_nagpc_ipcore" wrapper, have been added to the embedded system. Note that this wrapper was generated by the AccelDSP Synthesis tool during the synthesis of the MATLAB AGPC algorithm to produce a synthesized hardware model of the AGPC algorithm, and was also used by the System Generator for DSP to generate the AGPC Co-Processor IP core which was imported into the embedded processor system. The main results of Table 6.22 are the additional increase in hardware resources of about 30% for the bonded input-output blocks (Bonded IOBs) used by the AGPC Co-Processor for read/write operations, thereby reducing the number of locked (LOCed) IOBs by 51%. Furthermore, an additional 14 DSP48Es, constituting an increase of about 10%, add to the hardware cost of implementing the Virtex-5 DSP48E slices, which are the main components that determine the computational efficiency of the synthesized AGPC algorithm; indeed, the name "AGPC Co-Processor" derives from the use of the Virtex-5 DSP48E slices, which are by themselves dedicated processors.

6.5.6 Real-Time Implementation of the Embedded PowerPC™440 Processor and

AGPC Co-Processor System on Xilinx Virtex-5 FX70T ML507 FPGA Board

The real-time implementation of the embedded PowerPC™440 processor–AGPC Co-Processor system on the Virtex-5 FX70T ML507 FPGA development board, and the closed-loop FPGA-in-the-loop simulation of the validated nonlinear F-16 aircraft Simulink model together with the neural network model identified at each time sample using the MLMA algorithm, are discussed in this sub-section. The simulation results are also presented and discussed. The implementation procedures are outlined as follows.

1). The already developed and compiled embedded PowerPC™440 processor–AGPC Co-Processor system,

discussed in the last sub-section, is opened via the XPS and the Xilinx ISE™ software GUIs as shown in Fig.

6.76 and Fig. 6.77 respectively.

2). Using the XPS software GUI of Fig. 6.76, the hardware description file for the embedded PowerPC™440

processor–AGPC Co-Processor system “emb_ppc440_agpc.xml” is exported to the Xilinx software

development kit (Xilinx SDK) by selecting Project → Export Hardware Design to SDK and clicking Export

Only. This automatically creates the SDK directory in the current project (emb_ppc440_agpc) hierarchy, and

stores the generated hardware description file. A new directory for the software development is then manually

created within the SDK directory and given the name of the ipcore, "f16_nagpc_ipcore".

3). The Xilinx SDK software is opened. This automatically requests that the hardware description file "emb_ppc440_agpc.xml" of the embedded system be specified. Specifying and including the hardware description file in the Xilinx SDK project automatically launches the Xilinx SDK GUI shown in Fig. 6.79.


Fig. 6.79: Xilinx SDK GUI for software development and programming of the Virtex-5 FX70T ML507 FPGA board for the embedded PowerPC™440 processor–AGPC Co-Processor system.

4). Similar to the software development and performance verification discussed in Section 5.3.4 of Chapter 5, a

new software platform is then created using the Xilinx SDK GUI. The new platform is called

“Emb_PPC440_AGPC_Platform” as shown in Fig. 6.79. The Xilinx SDK automatically builds the new

platform and includes all the components of the embedded PowerPC™440 processor–AGPC Co-Processor

system, as shown under the Emb_PPC440_AGPC_Platform hierarchy in the Xilinx SDK GUI of Fig. 6.79.

5). Using the Xilinx SDK GUI, a new "Managed Make C Application Project" is then created and given the same name

(f16_nagpc_ipcore) as the software developed and compiled using the XPS shown in Fig. 6.76. Next, the

f16_nagpc_ipcore project is selected from the “Sample Applications” menu. This automatically builds the


Fig. 6.80: Memory allocation and generation of the linker script for the embedded PowerPC™440 processor–AGPC Co-

processor system.

f16_nagpc_ipcore software application project and checks for errors. An error was reported because the

embedded software exceeded the available 32KB local processor memory BRAM (xps_bram_if_cntlr_1) limit. This error is corrected by using the "Linker Script Generator" tool to re-assign all code sections as well as the "Heap and Stack" to the SRAM memory as shown in Fig. 6.80; a small sketch of this placement idea is given after this procedure. All data sections are allowed to remain in the local processor memory (xps_bram_if_cntlr_1). The complete C application program is given in Appendix E–6.

6). Next, the Virtex-5 FX70T ML507 FPGA board is programmed with the generated programming file

(emb_ppc440_agpc.bit) and the block memory map (edkBmmFile_bd.bmm) file. During this process, the

bitstream is downloaded to the FPGA, and the FPGA is programmed successfully as shown in Fig. 6.79.
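The memory placement referred to in step 5 can also be pictured directly in C. The sketch below is illustrative only: it uses GCC section attributes rather than the generated linker script (a different mechanism from the one used in this work), and the section name and buffer are hypothetical; in the actual design the placement is done entirely by the Linker Script Generator.

    #include <stdint.h>

    /* Hypothetical illustration: a large buffer is pinned to a section that
       the linker script maps to the external SRAM, while small control state
       stays in the default data section held in the 32KB local BRAM
       (xps_bram_if_cntlr_1). The ".sram_data" section name must match a
       SECTIONS entry in the linker script for this to link. */
    static double nn_weights[25 * 6] __attribute__((section(".sram_data")));

    static volatile char cont_flag = 'a';   /* small state, stays in BRAM */

    double sum_weights(void)
    {
        double s = (cont_flag == 'b') ? 1.0 : 0.0;
        for (int i = 0; i < 25 * 6; i++)
            s += nn_weights[i];             /* touches the SRAM-resident data */
        return s;
    }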

6.5.6.1 Closed-Loop Control of the Nonlinear F-16 Aircraft Using the Simulink and

Neural Network Models with the Embedded PowerPC™440 Processor–AGPC

Co-Processor System on Xilinx Virtex-5 FX70T ML507 FPGA Board

To perform the closed-loop simulation of the auto-pilot control system of the nonlinear F-16 fighter

aircraft, the FPGA implementation, verification and performance evaluation scheme shown in Fig. 6.61 is used

with the Virtex-5 FX70T ML507 FPGA development board programmed as just described above.


[Figures: roll rate (p), pitch rate (q) and yaw rate (r) predictions (Roll/Pitch/Yaw Ref vs. FPGA-AGPC) with the corresponding (a) aileron (δa), (b) elevator (δe) and (c) rudder (δr, deg) deflections, plotted against the number of samples.]

Fig. 6.81: F-16 aircraft auto-pilot closed-loop control simulations on the embedded PowerPC™440 processor–AGPC Co-processor system running on Virtex-5 FPGA for 14,560 samples.

Fig. 6.82: F-16 aircraft auto-pilot closed-loop control simulations on the embedded PowerPC™440 processor–AGPC Co-processor system running on Virtex-5 FPGA for 58,240 samples.

The validated Simulink model of the F-16 aircraft and the MLMA neural network (NN) identification algorithm, kept in two different directories, are both placed in the SDK directory. The different path names for reading from and writing to the neural network model parameters, the identification/control flag (Flag_a), as well as the AGPC reference signals, predicted outputs and control signals, are given in the C program of Appendix E–6.


The validated Simulink F-16 aircraft model and the MLMA NN identification algorithm are set up in

MATLAB. The control flag "Flag_a" is set to "a" to initialize and implement the NN model identification algorithm.

From the Xilinx SDK GUI, by right-clicking on "f16_nagpc_ipcore Emb_PPC440_AGPC_Platform" and

selecting "Run As → 1 Run on Hardware", the Virtex-5 FX70T ML507 FPGA board is set for the AGPC control

computations based on the content of the Flag_a as read by the C program. The C program in Appendix E–6 for

implementing the AGPC control computations on the Virtex-5 FPGA constantly scans and reads the control flag

(Flag_a.text) in the “while (cont_flag != ‘b’)” loop in the C program. If the content is “b”, the AGPC algorithm is

implemented; otherwise the loop keeps scanning and reading Flag_a. The results of the AGPC computations and

the computation times (START_TIME and END_TIME) are written as text files to the appropriate working

directory as specified in the C program of Appendix E–6. All text file results were plotted using MATLAB.
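The structure of this polling loop can be sketched as follows. This is a simplified, hypothetical reconstruction of the logic just described, not the Appendix E–6 listing itself; the flag file name follows the text, but the helper functions agpc_step(), read_flag() and write_flag() are illustrative only.

    #include <stdio.h>

    /* Hypothetical placeholder for one AGPC control computation; the real
       computation is performed by the AGPC Co-Processor via the ipcore. */
    static void agpc_step(void) { }

    /* Read the current character stored in the control flag file. */
    static char read_flag(const char *path)
    {
        FILE *fp = fopen(path, "r");
        if (fp == NULL)
            return 'a';                   /* treat a missing flag as "wait" */
        int c = fgetc(fp);
        fclose(fp);
        return (char)c;
    }

    /* Overwrite the control flag file with a single character. */
    static void write_flag(const char *path, char value)
    {
        FILE *fp = fopen(path, "w");
        if (fp != NULL) {
            fputc(value, fp);
            fclose(fp);
        }
    }

    int main(void)
    {
        char cont_flag = 'a';
        for (;;) {
            while ((cont_flag = read_flag("Flag_a.text")) != 'b')
                ;                         /* keep scanning and reading Flag_a */
            /* latch START_TIME, run the embedded AGPC computation, latch
               END_TIME, then write the predictions, control signals and the
               computation time out as text files for MATLAB to plot */
            agpc_step();
            write_flag("Flag_a.text", 'a');  /* hand the loop back to MATLAB */
        }
        return 0;
    }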

The first closed-loop identification and control simulation is set to 14,560 samples; that is, the product of the block period (91) and the number (160) of round-trip flight desired reference trajectories shown in Fig. 6.48 (91 x 160 = 14,560). The simulations are performed in steps of 91, this being the block period of the AccelDSP-synthesized AGPC algorithmic hardware model. Similarly, the second closed-loop identification and control simulation is set to 58,240 samples, four times the number of samples in the first simulation. The reason is to observe the variations in the FPGA computation times.

The results for the first closed-loop control simulations are shown in Fig. 6.81 while the results for the

second closed-loop control simulations are shown in Fig. 6.82. The computation time used at each time sample by

the embedded PowerPC™440 processor–AGPC Co-processor system implementation for computing and writing

the respective control results for the first and second closed-loop identification and control simulations are shown

in Fig. 6.83 (a) and (b) respectively. The average computation times from Fig. 6.83 (a) and (b) are approximately

0.1650156µs and 0.1650168µs respectively. The computation times for simulating the validated Simulink model

of the nonlinear F-16 aircraft and for the F-16 aircraft neural network identification are not included, since only

the computation time for the embedded AGPC algorithm running on the FPGA is of interest in the current study.

By comparing Fig. 6.81 and Fig. 6.82, it can be seen that the embedded AGPC algorithm running on the

Virtex-5 FPGA, in closed loop with the validated F-16 aircraft model based on the NN model identified at each time

sample, gives good control performance with significant reduction in computation times, as shown in Fig. 6.83(a)

and (b) at the expense of hardware resources summarized in Table 6.20, Table 6.21 and Table 6.22.

The summary of the computation times based on the control simulations performed at the various stages

of the AGPC Co-Processor system as well as that for the complete embedded PowerPC™440 processor–AGPC

Co-Processor system is given in Table 6.23. Although the computation time for the neural network model

identification is not considered, it can be argued that the significant reduction in computation time from 655.1ms

obtained from the floating point MATLAB AGPC algorithm of Fig. 6.65 and the 0.63572ms obtained from the

hardware co-simulation (HW Co-Sim) of Fig. 6.72 to the approximately 0.16502µs in Fig. 6.81 and Fig. 6.82

demonstrates the computational efficiency of the embedded AGPC algorithm implemented on the embedded


PowerPC™440 processor system as a co-processing hardware system, and that it can be adapted for the nonlinear F-16 auto-pilot control system.

[Figures: computation time (in seconds, in the range 1.6475–1.6525 x 10^-7) plotted against the number of samples.]

Fig. 6.83: Computation time by the embedded PowerPC™440 processor–AGPC Co-processor system at each time sample: (a) 14,560 samples for the first control simulation and (b) 58,240 samples for the second control simulation.

The FPGA implementation gives approximately 3.9698 x 10^6 and 3.8524 x 10^3 times improvement when compared, respectively, to the floating-point MATLAB AGPC algorithm, with a computation time of 655.1ms, running on an Intel® Core™2 CPU personal computer, and to the AGPC hardware co-simulation block model, with a computation time of 0.63572ms, running on the Virtex-5 FX70T ML507 FPGA board.

Again, by comparing the approximately 0.16502µs of Fig. 6.83 to the identification and control computation time for the F-16 aircraft of approximately 6.1048 seconds of Fig. 6.60(a) in Section 6.4.5, obtained using the AGPC algorithm implemented on a dual-core Intel® Core™2 CPU running at 1.86GHz with the MATLAB "parfor" construct for parallel implementation, it may also be argued that the FPGA implementation is a suitable implementation platform due to the significant reduction in computation time.

Table 6.23: Summary of the computation times at the various stages of the AGPC Co-Processor system development as well as for the complete embedded PowerPC™440 processor–AGPC Co-Processor system.

Development stage | Number of samples | Total computation time (seconds) | Computation time at each time sample
Floating-point MATLAB AGPC algorithm | 160 | 104.8105 | 655.1ms
Fixed-point MATLAB AGPC algorithm | 160 | 100.17 | 626.1ms
System Generator model of the synthesized AGPC algorithm | 14,560 | 1.8818 | 0.12922ms
Hardware Co-Simulation block model of the synthesized AGPC algorithm | 14,560 | 9.2561 | 0.63572ms
Embedded PowerPC™440 processor–AGPC Co-Processor system | 14,560 | – | 0.1650156µs
Embedded PowerPC™440 processor–AGPC Co-Processor system | 58,240 | – | 0.1650168µs

A major drawback with the FPGA implementation technique demonstrated in this study is that, as the size of the neural network model parameters increases, the number of input-output ports increases and may exceed the available 640 input-output (I/O) ports on the Virtex-5. Although the proposed identification algorithms have been formulated and demonstrated to train the networks using small network parameters for all the examples considered in this study, this may not generally be the case for other multivariable systems with several inputs and outputs. Thus, investigating the FPGA implementation of the neural network-based model identification algorithms and the nonlinear adaptive model predictive control (NAMPC) algorithm would be a possible solution to this drawback. In that case, only streaming measurement data would be required for implementing the identification and control algorithm running on the FPGA, thus reducing the number of I/O ports required, as the sketch below illustrates. Unlike the AGPC algorithm, the NN-based model identification algorithms as well as the NAMPC algorithm are based on iterative optimizations. This will definitely increase the "BLOCK PERIOD" and "LATENCY" and, consequently, the FPGA hardware resources. Experience acquired in this study has shown that such a design will not have the constant throughput required to generate the System Generator block model. A solution to the aforementioned problem would be the parallelization of these iterative algorithms before their syntheses.
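A minimal sketch of the streaming idea follows, assuming a single hypothetical memory-mapped FIFO through which measurement samples arrive one word at a time; the addresses, names and status bit are illustrative only, not part of the design developed in this work.

    #include <stdint.h>

    /* Hypothetical FIFO registers: one data word plus an "empty" status bit
       replace the wide bank of parallel weight/measurement ports. */
    #define MEAS_FIFO_DATA    (*(volatile uint32_t *)0xC9000000u)
    #define MEAS_FIFO_STATUS  (*(volatile uint32_t *)0xC9000004u)
    #define FIFO_EMPTY_BIT    0x1u

    /* Read one block of streamed measurements through the single FIFO port. */
    void read_measurements(uint32_t *buf, int n)
    {
        for (int i = 0; i < n; i++) {
            while (MEAS_FIFO_STATUS & FIFO_EMPTY_BIT)
                ;                        /* wait for the next streamed word */
            buf[i] = MEAS_FIFO_DATA;     /* one I/O interface serves all samples */
        }
    }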


CHAPTER 7

CONCLUSIONS

7.1 Evaluation and Discussion of Results

In this thesis, new comprehensive neural network (NN) based algorithms were developed for the model

identification and model predictive control of processes with the purpose of being combined to form an adaptive

model predictive control strategy with reduced computational time. Also, new network control system and FPGA

architectures were proposed for further reducing the overhead of implementing the strategies on conventional

network control systems. The identification algorithms are based on a class of dynamic feedforward neural

networks (DFNN) trained with two different methods, namely: the adaptive recursive least squares (ARLS)

method and the modified Levenberg-Marquardt algorithm (MLMA). For the model predictive control two

algorithms have been developed, namely: the adaptive generalized predictive control (AGPC) algorithm and the

nonlinear adaptive model predictive control (NAMPC) algorithm.

The adaptive model predictive control strategy involves first the training of the neural network (NN) by

any of the considered training algorithms, the selection of the optimal network parameters and the validation of

the trained network by using any of three different methods, namely: one-step prediction of the training and

validation (test) data, 5-step and 10-step ahead prediction of the training data, and the Akaike’s final prediction

error (AFPE) estimate. Next, the validated NN is used as the process model with which the parameters of the

adaptive model predictive control algorithms are tuned. This strategy is performed every time new samples of the

measured controlled variables are obtained. New control actions are computed by the tuned model predictive

control algorithm.

Hardware-in-the-loop simulations of the implementation of the possible strategies on conventional and SOA

network control systems and FPGA architectures have been performed for three highly nonlinear systems, namely:

1) the temperature control of a fluidized bed furnace reactor (FBFR) of the steam deactivation unit (SDU) of a

fluid catalytic cracking (FCC) pilot plant, 2) the soluble oxygen (the so called dissolved oxygen concentration)

control of the third aerobic reactor of an activated sludge wastewater treatment plant (AS-WWTP) process, and

3) the auto-pilot control system of a nonlinear F-16 fighter aircraft.

The obtained results show that the combination of the ARLS or the MLMA algorithm with the NAMPC algorithm, when implemented on multi-core and SOA architectures, results in the best control performance for the first two cases. For systems with shorter sampling times, like the nonlinear F-16 auto-pilot control system, the same combination of algorithms, when implemented on an FPGA unit, presents improved performance and is realized within this short sampling time. All the discussions below are made with reference to

the work presented in Chapter 6 and are explained in detail as presented in the following.


7.1.1 The Temperature Control of a Fluidized Bed Furnace Reactor (FBFR)

The results that are discussed here concern the model identification and adaptive control of the FBFR

temperatures presented in Section 6.2. The sampling time of the FBFR should not be higher than 120 seconds.

The total computation time of the neural network model identification of the FBFR process using the ARLS and

MLMA algorithms were 149.02 and 3.875 seconds respectively for 100 iterations (epochs) and for 1320 samples.

This implies that the FBFR process model can be identified at each time sample in 0.1129 and 0.0029 seconds

using the ARLS and MLMA algorithms respectively. The network training results in Section 6.2.2 showed that

although the ARLS has faster convergence property, the MLMA has smaller prediction errors than the ARLS.

Also, it was observed in Fig. 6.15 and Fig. 6.16 that the results of the FBFR reactor and heater temperatures

predictions in tracking the respective desired temperatures obtained using the model trained with the MLMA

algorithm outperformed those trained using the ARLS algorithm.

Furthermore, to compare the efficiency of the MLMA algorithm, it was compared with two well-known

existing neural network training algorithms, namely: the backpropagation with momentum (BPM) and the

incremental backpropagation (INCBP). Simulation results over 1320 samples showed that the MLMA algorithm

had the best prediction performance and the least computation time of 0.0016 seconds at each time sample

followed by the BPM with 0.0035 seconds, and lastly the INCBP algorithm with 0.0074 seconds.

The overall worst case turnaround time of the closed loop control by the proposed AGPC and NAMPC

algorithms when the model is identified by the ARLS algorithm were found to be 3.1207 and 8.0809 seconds

respectively, when these two algorithms are implemented on a typical four-core PC. Turnaround time is defined here

as the time elapsed from the moment new samples of the controlled variables are read until the moment the

corrections of the values of the manipulated variables are computed and become available. The results of the

FBFR reactor and heater temperatures predictions in tracking the respective desired temperatures showed that the

NAMPC outperforms the AGPC at the expense of 2.5895 times the computation time used by the AGPC

algorithm as evident in Fig. 6.22. The monitored performance and turnaround times of the considered algorithms

imply that the current personal computer technologies are adequate to solve computationally intensive tasks for

process control applications requiring sampling times much shorter than the upper limit of the FBFR process.

Finally, because of the superior control performance of the NAMPC algorithm when it is combined with the MLMA algorithm for the neural network model identification, its suitability for the FBFR process model identification and control over a service-oriented architecture (SOA) cluster network based on the device profile for web services (DPWS) was investigated. The worst case overall control loop turnaround time was 12.8465

seconds. This result shows that the implementation of the proposed MLMA model identification and NAMPC control algorithms over the SOA cluster network based on the DPWS meets by far the limit imposed by the upper bound on the FBFR sampling time, and therefore they can be deployed for the FBFR process control in an industrial

environment.


7.1.2 The Soluble Oxygen Control in the Aerobic Reactor of the Process of an

Activated Sludge Wastewater Treatment Plant (AS-WWTP)

The results that are presented and discussed here follow from the model identification and adaptive

dissolved oxygen concentration control in the aerobic reactor of the AS-WWTP process of Section 6.3. The

sampling time required for the closed loop control of the validated simulation in Simulink of the AS-WWTP

process is 15 minutes which corresponds to the time over which dry-weather data was collected. The total

computation time to obtain the neural network model of the aerobic reactor for the AS-WWTP process using the

ARLS and MLMA algorithms were 42.2188 and 2.3750 seconds respectively over 1,076 training data samples for

100 iterations (epochs). This implies that the NN model of the aerobic reactor of the AS-WWTP process can be

identified in 0.0392 and 0.0022 seconds at each time sample by the ARLS and MLMA algorithms respectively.

The network training and validation have shown that the ARLS has faster convergence while the MLMA has

overall smaller prediction errors. It was also observed that the MLMA algorithm terminated after 100 epochs

when the maximum number of epochs was specified to be 500 while the ARLS terminated earlier just after about

65 epochs. The on-line closed-loop model identification and adaptive AGPC control of the dissolved oxygen

concentration in the aerobic reactor for the AS-WWTP process using the ARLS and MLMA algorithms for 100

epochs in Section 6.3.2.2 showed that the dissolved oxygen concentration predictions in tracking the desired

reference signals using the model trained with the MLMA algorithm outperformed those based on the model

obtained with the ARLS algorithm, as is evident from the performances shown in Fig. 6.38.

A comparison of the validation results of a network trained for 100 epochs by the ARLS model identification algorithm with the backpropagation with momentum (BPM) and the incremental backpropagation (INCBP) algorithms showed that, except for the high computation time of 0.0399 seconds for implementing the ARLS algorithm at each time sample instant, the ARLS algorithm gave the least prediction errors when compared to the BPM and the INCBP algorithms. The high computation time obtained with the ARLS algorithm is still about 2.2556 x 10^4 times less than the 15-minute sampling time of the AS-WWTP process, and therefore the BPM and INCBP are of no use for the aerobic reactor NN model identification of the AS-WWTP, since the computation time and prediction performance of the ARLS algorithm are acceptable.

With respect to the control performance, the NAMPC outperforms the AGPC at the expense of a

computation time 2.6260 times higher than that of the AGPC algorithm. The observed computation times for both

AGPC and NAMPC algorithms show that the current multi-core PC technologies can be used for implementing

the proposed neural network-based model identification and adaptive model predictive control strategies for the

case of the AS-WWTP with improved control benefits from their applications.


7.1.3 The Nonlinear F-16 Fighter Aircraft Auto-Pilot Control

The results discussed here refer to the work presented in Section 6.4. The stability analysis of any closed-loop control operation requires that the sampling time of the controlled variables of the F-16 aircraft auto-pilot be no higher than 0.5 seconds. The computation time for the NN model identification of the F-16 aircraft auto-pilot

by the ARLS and MLMA algorithms were found to be 298.59 and 6.5313 seconds respectively for 100 iterations (epochs)

over 4,000 training data samples. This implies that the neural network model of the nonlinear F-16 aircraft can be

identified in 0.0746 and 0.0016 seconds at each time sample by the ARLS and MLMA algorithms respectively.

The network training and validation results showed that the ARLS converges faster while lower prediction errors

were achieved with the MLMA algorithm. It was also observed that good prediction accuracy was achieved by

having the ARLS algorithm terminate after almost 85 epochs when the maximum number of epochs was specified

to be 500. The computation times for implementing both algorithms were found to be less than the 0.5 seconds

sampling time limit of the nonlinear F-16 aircraft.

The online closed-loop identification of the nonlinear F-16 aircraft model using the ARLS and the MLMA algorithms, together with adaptive control of the auto-pilot control system using the proposed nonlinear adaptive model predictive control (NAMPC) controller, showed that the roll rate, pitch rate, yaw rate and throttle setting predictions compared with the desired reference trajectories, and their respective control input signals, were essentially the same for the models obtained by both the ARLS and MLMA algorithms.

The computation times of the proposed AGPC and NAMPC algorithms when model identification is

performed with the ARLS algorithm were found to be 6.1048 and 11.0367 seconds respectively at each time

sample over 350 simulation samples. The adaptive AGPC and NAMPC control results of Fig. 6.59 showed that, although the nonlinear F-16 aircraft desired reference trajectory predictions and their respective control input computations obtained by both controllers for the roll, pitch and yaw rate predictions are essentially the same, the NAMPC outperformed the AGPC in the throttle and thrust control at the expense of 1.8079 times the computation time used by the AGPC algorithm. However, these computation times for the identification and adaptive control of the nonlinear F-16 aircraft auto-pilot violate the sampling time of 0.5 seconds and indicate that typical multi-core computer technologies are inadequate to solve the less computationally intensive task of the AGPC or the more computationally intensive task of the NAMPC algorithm within the auto-pilot sampling time requirements. For this reason a more efficient real-time computing platform

was proposed that could meet the 0.5 seconds time constraints for control of the nonlinear F-16 aircraft.

7.1.4 Real-Time Embedded PowerPC™440 Processor–AGPC Co-Processor System

Implementation on Xilinx Virtex-5 FX70T ML507 FPGA Board

In order to meet the real-time constraint of 0.5 seconds of the nonlinear F-16 aircraft control, a real-time

embedded platform which consists of an embedded PowerPC™440 processor system combined with the AGPC


algorithm as a co-processor implemented on a field programmable gate array (FPGA) was proposed, developed

and implemented.

The choice of the AGPC algorithm was made because of its lower computational burden

compared to that of the NAMPC and because it presents almost identical performance with that of NAMPC when

it is applied to the F-16 control. The real-time embedded processor platform was developed and implemented

using the Xilinx Virtex-5 FX70T ML507 FPGA board detailed in Section 5.3 of Chapter 5.

Also, the choice of the hard core embedded PowerPC™440 processor for use with Xilinx families of

FPGA including the Virtex-5 FX70T ML507 FPGA board used in this work against the Xilinx’s soft-core

embedded MicroBlaze™ processor was made because the conducted Dhrystone benchmarks on both processors

indicated a higher DMIPS number for the PowerPC™440 (1100 against 240 for the MicroBlaze).

With the above in mind, the embedded PowerPC™440 processor system was first designed by using the basic intellectual property (IP) cores required for the AGPC realization on the FPGA. The IP cores are basically hardware peripheral drivers. The embedded IP cores used for the embedded processor system design included: the PowerPC™440 BRAM local memory driver, the Virtex-5 XC5VFX70T FPGA on-board SRAM driver, the external DDR2 SDRAM driver and an RS232 UART driver.

Then, in Section 6.5 of Chapter 6, the AGPC algorithm is synthesized to obtain a register-transfer-level (RTL)

hardware model representative of the AGPC algorithm, using Xilinx's AccelDSP Synthesis tool. The

resulting hardware model of the AGPC algorithm was then used to build a complete System Generator model of

the AGPC algorithm with input and output memory-mapped interfaces from which the AGPC co-processor

intellectual property core (pcore) was generated using Xilinx's System Generator for DSP modeling tool.

In order to verify how the System Generator model of the synthesized AGPC algorithm would perform

when deployed to the Virtex-5 FPGA, an FPGA-in-the-loop simulation was performed with a generated hardware

co-simulation (HW Co-Sim) block running on the Virtex-5 FPGA board while the nonlinear F-16 aircraft model

and the identification algorithm were simulated in the MATLAB Simulink environment. As Fig. 6.73 shows,

good predictions and tracking of the roll, pitch and yaw rate reference signals were achieved with a computation

time of 0.63572 milliseconds (ms) at each time sample. The FPGA hardware co-simulation computation time is

1.0305 x 10^3 times faster than the floating-point realization of the AGPC algorithm in MATLAB implemented on

a multicore computer.

Finally, the AGPC co-processor is imported and integrated with an embedded PowerPC™440 processor

system to form the real-time embedded processor platform that is used together with the nonlinear F-16 aircraft

model and the neural network identification scheme to evaluate the real-time control performance of the nonlinear

F-16 aircraft auto-pilot control system. The computation time for the embedded AGPC algorithm for each time

sample was approximately 0.16502µs. Therefore, the FPGA implementation resulted in approximately 3.9698 x 10^6 and 3.8524 x 10^3 times improvements when compared to the 655.1ms and 0.63572ms computation times


obtained respectively by using the floating-point MATLAB AGPC algorithm running on a multicore computer

and the AGPC hardware co-simulation block model running on the Xilinx Virtex-5 FX70T ML507 FPGA board.

Although the computation time for the neural network model identification is not considered, it was mentioned in Section 7.1.3 that the computation times for the nonlinear F-16 aircraft model identification at each time sample were 0.0746 and 0.0016 seconds using the ARLS and MLMA algorithms respectively, based on their respective MATLAB floating-point implementations on an Intel® Core™2 CPU @ 1.86GHz computer. The obtained

computation time of 0.16502µs demonstrates the computational efficiency of the real-time embedded

PowerPC™440 processor–AGPC Co-Processor hardware system implementation of the AGPC algorithm on the

Xilinx Virtex-5 FX70T ML507 FPGA board and that it can be adapted for the nonlinear F-16 aircraft auto-pilot

control system.

Adding the achieved 0.16502µs to the approximately 0.0746 or 0.0016 seconds required for the nonlinear F-16 NN model identification using the ARLS or MLMA algorithm still gives approximately 0.0746 and 0.0016 seconds respectively at each time sample, which is about 6.7024 and 312.5 times less than the nonlinear F-16 aircraft sampling time of 0.5 seconds. Thus, it is evident that the combined model identification and AGPC control computation time of approximately 6.1048 seconds has been significantly reduced to the time required only for model identification, since the AGPC computation time has become negligible (0.16502µs) due to the computational power of the FPGA and the efficient hardware/software co-design methodologies.

7.2 Open Issues

A vast body of literature exists on several techniques, ranging from first-principles models, neural networks, fuzzy logic approaches, neuro-fuzzy approaches, adaptive control, adaptive PID control techniques and evolutionary approaches, for the modeling and control of the fluidized bed furnace reactor, steam deactivation unit and fluid catalytic cracking as well as activated sludge wastewater treatment plants. However, literature is scarce, if not unavailable, concerning neural network model identification and MPC control of the nonlinear F-16 aircraft. This makes it difficult to make a good comparison of the results obtained in this work with the proposed techniques. Moreover, all of the literature on FPGA implementation of MPC algorithms and their variations reviewed in this work is based on static first-order models of the F-16 aircraft combined with linear MPC controllers, thus making it difficult to compare the results obtained in this work with existing techniques and results.

Service-oriented architecture (SOA) technology based on device profile for web services (DPWS) has

become the state-of-the-art solution for implementing autonomous and interoperable systems and networks

[Jammes and Smit, 2005b]. Moreover, several device level technologies have been proposed, most notably Jini

[Jini, 2010], universal plug-n-play (UPnP) [UPnP, 2010] and the DPWS [Jammes and Smit, 2005a]. In particular,


the DPWS has adopted the well-established web services technology [DPWS, 2006]. However, researchers are

yet to fully exploit the application of these novel SOA technologies at the device level for the realization of

efficient networked control systems based on the MPC algorithms. As a result, literature concerning MPC implementation over SOA networks based on DPWS is scarce, and performance comparisons with the SOA cluster network approach based on DPWS used in this work cannot be made.

The Dhrystone millions of instructions per second (DMIPS) metric was not used to evaluate the performance of the designed embedded PowerPC™ processor system in order to compare its operating frequency against the FPGA benchmarks available in the literature. Of all the literature reviewed in this work concerning FPGA implementation of the different variations of the MPC algorithm, none includes an embedded processor, let alone a Dhrystone benchmark performance evaluation. This makes standard literature for FPGA implementation performance comparisons scarce.

A major drawback with the FPGA implementation technique demonstrated in this study is that, as the size of the neural network model parameters increases, the number of input-output ports increases and may exceed the available 640 input-output (I/O) ports on the Virtex-5. Although the proposed identification algorithms have been formulated and demonstrated to train the networks using small network parameters for all the examples considered in this study, this may not generally be the case for other multivariable systems with several inputs and outputs. Thus, investigating the FPGA implementation of the neural network-based model identification algorithms and the nonlinear adaptive model predictive control (NAMPC) algorithm would be a possible solution to this drawback. In that case, only streaming measurement data would be required for implementing the identification and control algorithm running on the FPGA, thus reducing the number of I/O ports required. Unlike the AGPC algorithm, the NN-based model identification algorithms as well as the NAMPC algorithm are based on iterative optimizations. This will definitely increase the "BLOCK PERIOD" and "LATENCY" and, consequently, the FPGA hardware resources. Experience acquired in this study has shown that such a design will not have the constant throughput required to generate the System Generator block model. A solution to the aforementioned problem could be the parallelization of these iterative algorithms before their syntheses.

7.3 Recommendations

1). The stability properties of the identification and control algorithms have not been investigated in this work and

can be the subject of future work.

2). Rather than using the large training data set at each model identification sequence, a Markovian modeling approach could be investigated, which depends only on a few past states of the system to predict a suitable model for use in the controller design. In this way, the concept of reinforcement learning could be applied to recursively train the neural network, and then a model-based adaptive critic controller can be proposed.


3). The major difficulty in implementing the nonlinear adaptive model predictive control (NAMPC) algorithm proposed in this work is related to an observed extremely high "Block Period", of the order of 10^4. Due to the high "Block Period", the hardware synthesis of the NAMPC algorithm resulted in a design with a non-constant throughput. Also as a result of the high "Block Period", the generation of the Hardware Co-Simulation block failed because only 4GB of DDR RAM was available, compared with the 8GB of RAM required to synthesize and generate the NAMPC Hardware Co-Simulation block. The high "Block Period" is a result of the extensive iterative optimization and complex matrix manipulations performed by the NAMPC algorithm. Rather than increasing the DDR RAM memory size, an efficient approach could be to: 1) parallelize the NAMPC algorithm and 2) pipeline the NAMPC algorithm efficiently. This would reduce the number of "Function Calls" and consequently the resulting synthesized NAMPC hardware "Block Period".

4). The parallelization of the NAMPC and AGPC algorithms could reduce the computation time of the control

algorithms, and this can enhance the deployment of the proposed control algorithms for the control of dynamic

systems with relatively short sampling times.

5). The hardware realization and FPGA implementation of the adaptive recursive least squares (ARLS) or the

modified Levenberg-Marquardt algorithm (MLMA) nonlinear neural network model identification

algorithms have not been carried out in this work. It is not a rule of thumb that only the MPC algorithms be

implemented on FPGAs. All related literature on FPGA implementation considered and reviewed in this work

uses static first principles process models to evaluate MPC implementation on FPGA. The parallelization and

hardware realization of these efficient identification algorithms for implementation on an FPGA could also be

considered as future work.

6). Suppose that the System Generator models of the two identification algorithms are implemented as multiple-input single-output (MISO) systems and the representative pcores are generated. Then several instances of the generated pcore can be instantiated in the embedded processor system, one for each system output. Thus, a multiple-input multiple-output (MIMO) system can be realized. This realization of the MIMO system is based

on the assumption that there are no strong interactions between the inputs and the outputs of the system under

consideration. Otherwise, a complete MIMO hardware realization of the identification algorithm should be

considered. In a similar manner, the proposed control algorithms can be mapped as single-input single-output

(SISO) systems during hardware synthesis while the MIMO realization can be accomplished in the embedded

processor environment.

Based on the above proposition, if the MISO and SISO hardware synthesis of the identification and

control algorithms could result in a MIMO system identification and control realization and implementation in

the processor environment, then a multiprocessor system can be proposed.


REFERENCES

A

[Akpan, 2009] Akpan, V. A. (Nov., 2009). “FPGA Embedded Systems Design Technologies: with an Overview

of Xilinx Systems Design Tools”. Department of Electrical and Computer Engineering, Aristotle University of

Thessaloniki, Greece, pp. 1 – 31. [Online] Available: http://users.auth.gr/~iosamar/technicalreports.htm

[Akpan and Hassapis, 2009] Akpan, V. A. and Hassapis, G. (2009). “Adaptive predictive control using recurrent

neural network identification”. In the Proceedings of the 17th Mediterranean Conference on Control and

Automation, Thessaloniki, Greece, 24 – 26, June 2009, pp. 61 – 66.

[Akpan and Hassapis, 2010] Akpan, V. A. and Hassapis, G. D. (2010). “Adaptive Recurrent Neural Network

Training Algorithm for Nonlinear Model Identification using Supervised Learning”. In the Proceedings of the

2010 American Control Conference (ACC2010), Baltimore, Maryland, USA, 30 June – 02 July, 2010, pp. 4937 –

4942.

[Akpan and Hassapis, 2011a] Akpan, V. A. and Hassapis, G. D. (2011). "Nonlinear model identification and

adaptive model predictive control using neural networks”. ISA Transactions, vol. 50, no. 2, pp. 177 – 194.

[Akpan and Hassapis, 2011] Akpan, V. A. and Hassapis, G. D. (2011). “Training dynamic feedforward neural

networks for online nonlinear model identification and control applications”. International Reviews of Automatic

Control: Theory & Applications, vol. 4, no. 3, pp. 335 – 350.

[Akpan et al., 2010] Akpan, V. A., Samaras, I. K., and Hassapis, G. D. (2010). “Implementation of Neural

Network-Based Nonlinear Adaptive Model Predictive Control over a Service-Oriented Computer Network”. In

the Proceedings of the 2010 American Control Conference (ACC2010), Baltimore, Maryland, USA, 30 June – 02

July 2010, pp. 5495 – 5500.

[Akpan et al., 2011] Akpan, V. A., Samaras, I. K., and Hassapis, G. D. (2011). “A service-oriented computer

network for industrial control applications”. European Journal of Control, (Submitted).

[Al-Duwaish and Karim, 1997] Al-Duwaish, H. and Karim, M. N. (1997). “A new method for the identification

of Hammerstein model”. Automatica, vol. 33, no. 10, pp. 1871-1875.


[Albertos and Ortega, 1989] Albertos, P. and Ortega, R. (1989). “On generalized predictive control: Two

alternative formulations”. Automatica, vol. 25, no. 5, pp. 753 – 755.

[Amit et al., 1986] Amit, D. J., Gutfreund, H. and Sompolinsky, H. (1985). “Spin-Glass Models of Neural Networks”. Physical Review A, vol. 32, no. 2, pp. 1007 – 1018.

[Anderson and Eberhardt, 2001] Anderson, D. F. and Eberhardt, S. (2001). “Understanding Flight”. New York,

U.S.A.: McGraw-Hill.

[Anderson and Eberhardt, 2010] Anderson, D. F. and Eberhardt, S. (2010). “Understanding Flight”. 2nd Ed., New

York, U.S.A.: McGraw-Hill.

[Anderson and Rosenfeld, 1988] Anderson, J. A., & Rosenfeld, E. (1988). “Neurocomputing: Foundations of

Research”. Cambridge, MA: The MIT Press.

[Aggelogiannaki and Sarimveis, 2006] Aggelogiannaki, E. and Sarimveis, H. (2006). “Affine radial basis function

neural network for robust control of hyperbolic distributed parameter systems”. World Academy of Science,

Engineering and Technology, vol. 22, pp. 162 – 168.

[Antoniou and Lu, 2007] Antoniou, A. and Lu, W. (2007). “Practical Optimization: Algorithms and Engineering

Applications”. New York, USA: Springer.

[Åström and Witternmark, 1995] Åström, K. J. and Wittenmark, B. (1995). “Adaptive Control”. 2nd Ed., Reading, MA: Addison-Wesley.

[Azwar et al., 2006] Azwar, Hussain, M. A. and Ramachandran, K. B. (2006). “The study of neural network-

based controller for controlling dissolved oxygen concentration in a sequencing batch reactor”. Bioproc. Biosys.

Eng., vol. 28, pp. 251 – 265.


B

[Bai, 1998] Bai, E. (1998). “An optimal two-stage identification algorithm for Hammerstein-Wiener nonlinear

systems”. Automatica, vol. 34(3), pp. 333-338.

[Bai, 2002] Bai, E. (2002). “A blind approach to the Hammerstein-Wiener model identification”. Automatica, vol. 38, pp. 967 – 979.

[Bemporad and Morari, 1999] Bemporad, A. and Morari, M. (1999). “Robust model predictive control: A survey”. Springer Lecture Notes in Control and Information Sciences: Robustness in Identification and Control, vol. 245, pp. 207 – 226.

[Bleris et al., 2006] Bleris, L. G., Vouzis, P. D., Arnold, M. G. and Kothare, M. V. (2006). “A co-processor FPGA

platform for the implementation of real-time model predictive control”. In Proc. American Contr. Conf.,

Minneapolis, Minnesota, U.S.A., 14 – 16 June, 2006.

[Bengio et al., 1994] Bengio, Y., Simard, P. and Frasconi, P. (1994). “Learning long-term dependencies with gradient descent is difficult”. IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157 – 166.

[Bibes et al., 2005] Bibes, G., Ouvrard, R., Coirault, P., Rambault, L. and Trigeassou, J. C. (2005). “An optimized

regularization method using Volterra model application to atrazine oxidation process”. In Proceedings of the 2005

IEEE Conference on Control Applications, August 28-31, 2005, Toronto, Canada.

[Bohlin, 2006] Bohlin, T. (2006). “Practical Grey-Box Process Identification”. London: Springer-Verlag.

[Bouchard, 2001] Bouchard, M. (2001). “New recursive-least squares algorithms for nonlinear active control of sound and vibration using neural networks”. IEEE Trans. Neural Netw., vol. 12, no. 1, pp. 135 – 147.

[Boyd and Vandenberghe, 2007] Boyd, S. and Vandenberghe, L. (2006). “Convex Optimization”. Cambridge, UK: Cambridge University Press.

[Bruce et al., 1986] Bruce, A. D., Canning, A., Forrest, B., Gardner, E. and Wallace, D. J. (1986). “Learning and

Memory Properties in Fully Connected Networks”. AIP Conference Proceedings 151, Neural Networks for

Computing, pp. 65 - 70.


C

[Camacho and Bordons, 2007] Camacho, E. F. and Bordons, C. (2007). “Model Predictive Control”. 2nd ed.,

London: Springer-Verlag.

[Cakici and Bayramolu, 1995] Cakici, A. and Bayramoğlu, M. (1995). “An approach to controlling sludge age in the activated sludge process”. Water Research, vol. 94, pp. 1093 – 1097.

[Cardenas and Romero-Troncoso, 2008] Cardenas, E. O. and Romero-Troncoso, R. J. (2008). “MLP neural network and on-line backpropagation learning implementation in a low-cost FPGA”. In Proc. of the 18th ACM Great Lakes Symposium on VLSI (GLSVLSI’08), Orlando, Florida, USA, May 4 – 6, 2008, pp. 333 – 338.

[Chen, 2009] Chen, H. F. (2009). “Recursive system identification”. Acta Mathematica Scientia, vol. 29B, no. 3,

pp. 650 – 672.

[Chen and Narendra, 2001] Chen, L. and Narendra, K. S. (2001). “Nonlinear adaptive control using neural

networks and multiple models”. Automatica, vol. 37, pp. 1245 – 1255.

[Chikh et al., 2010] Chikh, L., Poignet, P., Pierrot, F. and Michelin, M. (2010). “A predictive robust cascade

position-torque control strategy for pneumatic artificial muscles”. In 2010 American Control Conference, USA,

Jun. 30 – Jul. 02, 2010, pp. 6022 – 6028.

[Chiong, 2010] Chiong, R. (2010). “Intelligent Systems for Automated Learning and Adaptation: Emerging

Trends and Applications”. Hershey PA, USA: Information Science Reference, ch. 4.

[Chotkowski et al., 2005] Chotkowski, W., Brdys, M. A. and Konarczak, K. (Oct. 10, 2005). “Dissolved oxygen

control for activated sludge processes”. Intl. J. of Sys. Sci., vol. 36, No. 12, pp. 727 – 736.

[Chow and Tipsuwan, 2001] Chow, M. Y. and Tipsuwan, Y. (2001), “Network-based control systems: A

tutorial”. In Proc. 27th Annu. Conf. IEEE Ind. Electron. Soc., 2001, pp. 1593 – 1602.

[Clarke and Mohtadi, 1989] Clarke, D. W. and Mohtadi, C. (1989), “Properties of generalized predictive control”.

Automatica, vol. 25, no. 6, pp. 859 – 875.


[Clarke et al, 1987a] Clarke, D. W., Mohtadi, C. and Tuffs, P. S. (1987), “Generalized predictive control – Part I.

The basic algorithm,” Automatica, vol. 23, no. 2, pp. 137 – 148.

[Clarke et al, 1987b] Clarke, D. W., Mohtadi, C. and Tuffs, P. S. (1987), “Generalized predictive control – Part II.

Extensions and Interpretations,” Automatica, vol. 23, no. 2, pp. 149 – 160.

[Coetzee et al., 2010] Coetzee, L. C., Craig, I. K. and Kerrigan, E. C. (2010). “Robust nonlinear model predictive

control of a run-of-mine ore milling circuit”. IEEE Trans. On Control System Tech., vol. 18, no. 1, pp. 222 – 229.

[Colin et al., 2007] Colin, G., Chamaillard, Y., Bloch, G. and Corde, G. (2007). “Neural control of fast nonlinear systems – Application to a turbocharged SI engine with VCT”. IEEE Trans. Neural Networks, vol. 18, no. 4, pp. 1101 – 1114.

[Coop, 2000] Coop, J. B. (Sept., 2000). “The COST Simulation Benchmark: Description and Simulation Manual” (a product of COST Action 624 & COST Action 628). [Online] Available: http://www.ensic.inpl-nancy.fr/COSTWWTP/.

[COST, 2000] Working Groups of COST Actions 632 and 624. (Sept., 2000). IWA Task Group on Benchmarking

of Control Strategies for WWTPs. http://www.ensic.inplnancy.fr/benchmarkWWTP/Bsm1/Benchmark1.htm

[COST, 2008] Working Groups of COST Actions 632 and 624. (Apr., 2008). IWA Task Group on Benchmarking

of Control Strategies for WWTPs: http://www.ensic.inplnancy.fr/benchmarkWWTP/Bsm1/Benchmark1.htm

[Cote et al., 1995] Cote, M., Grandjean, B. P. A., Lessard, P. and Thibault, J. (1995). “Dynamic modelling of the activated sludge process: improving prediction using neural networks”. Wat. Res., vol. 29, No. 4, pp. 995 – 1004.

[Crocker and Collin, 2007] Crocker, D. and Collin, P. (2007). “Dictionary of Aviation”, 2nd Edition. Spain: A&C

Black Publishers Ltd.

[Cutler and Ramaker, 1980] Cutler, C. R. and Ramaker, B. L. (1980), “Dynamic matrix control – A computer

control algorithm”. In Proc. Joint Automatic Control Conference, San Francisco, CA, August 13 – 15, 1980.

[Cucinotta et al., 2009] Cucinotta, T., Mancina, A., Anastasi, G., Lipari, G., Mangeruca, L., Checcozzo, R. and

Rusinà, F. (2009). “A real-time service-oriented architecture for industrial automation”. IEEE Transactions

Industrial Informatics, vol. 5, no. 3, pp. 267 – 277.


D

[Daniel and Ruano, 1999] Daniel, H. A. and Ruano, A. E. B. (1999). “Performance comparison of parallel

architectures for real-time control”. Microproc. Microsys., vol. 23, pp. 325 – 336.

[Decotignie, 2005] Decotignie, J. D. (2005). “Ethernet-based real-time and industrial communications”. In Proc. IEEE, vol. 93, no. 6, pp. 1102 – 1117.

[DDCMC, 1999] Digital Data Communications for Measurement and Control – Fieldbus for Use in Industrial

Control Systems – Part 4: Data Link Protocol Specification, IEC 61158-4, 1999.

[Dennis and Schnabel, 1996] Dennis, J. E. and Schnabel, R.B. (1996). “Numerical Methods for Unconstrained

Optimization and Nonlinear Equations”. Englewood Cliffs, NJ: SIAM, Prentice-Hall.

[Dias et al., 2005] Dias, F. M., Antunes, A., Vieira, J. and Mota, A. M. (2005). “On-line training of neural

networks: A sliding window approach for the Levenberg-Marquardt algorithm”. IWINAC 2005, LNCS 3562,

Berlin Heidelberg: Springer-Verlag, pp. 577 – 585.

[Dones et al., 2010] Dones, I., Manenti, F., Preisig, H. A. and Buzzi-Ferraris, G. (2010). “Nonlinear model

predictive control: A self-adaptive approach”. Ind. Eng. Chem. Res., vol. 49, no. 10, pp. 4782 – 4791.

[DPWS, 2006] Microsoft devices profile for web services specifications. [Online]. Available:

http://msdn2.microsoft.com/en-us/library/ms951214.aspx February 2006.

[Dubey, 2009] Dubey, R. (2009). “Introduction to Embedded System Design Using Field Programmable Gate

Arrays”. London: Springer-Verlag.

[Dunbar and Desa, 2005] Dunbar, W. B. and Desa, S. (2005). “Distributed model predictive control for dynamical supply chain management”. In Proc. Int. Workshop on Assessment and Future Directions of NMPC, Freudenstadt-Lauterbad, Germany, 26 – 30 Aug., 2005.

[Ducard, 2009] Ducard, G. J. J. (2009). “Fault-Tolerant Flight Control and Guidance Systems: Practical Methods

for Small Unmanned Aerial Vehicles”. London: Springer-Verlag Ltd.


E

[Ekman et al., 1999] Ekman, M., Björlenius, B. and Andersson, M. (2006). “Control of the aeration volume in an

activated sludge process using supervisory control strategies”. Wat. Res., vol. 40, pp. 1668 – 1676.

[Elman, 1990] Elman, J. (1990), “Finding Structure in Time”. Cognitive Science, vol. 14, pp. 179 - 211.

[Enqvist and Ljung, 2005] Enqvist, M. and Ljung, L. (2005). “Linear approximations of nonlinear FIR systems

for separable input processes”. Automatica, vol. 41, pp. 459-473.

[Erl, 2005] Erl, T. (2005), “Service-Oriented Architecture: Concepts, Technology, and Design”. Upper Saddle

River, New Jersey: Prentice-Hall.

[Etkin and Reid, 1996] Etkin, B. and Reid, L. D. (1996). “Dynamics of Flight: Stability and Control”, 3rd ed.

Toronto, Canada: John Wiley & Sons, Inc.

F

[Felt, 2006] Felt, A. J. (2006). “Stochastic linear model predictive control using nested decomposition”.

Department of Mathematics and Computing, University of Wisconsin – Stevens Point, U.S.A., 2006.

[Fletcher, 1987] Fletcher, R. (1987). “Practical Methods of Optimization”. 2nd ed., Chichester: Wiley & Sons.

[Fletcher, 2005] Fletcher, B. H. (2005). “FPGA embedded processors: Revealing true system performance”.

Embedded Training Program, Embedded System Conference, San Francisco, USA, 6 – 10 Mar., 2005, ETP–367,

pp. 1 – 18.

[Floriani et al., 2000] Floriani, E., Dudok de Wit, T. and Le Gal, P. (2000). “Nonlinear interactions in a rotating

disk flow: From a Volterra model to the Ginzburg-Landau equation”. CHAOS, vol. 10(4), pp. 834 – 847.

[Froisy, 1994] Froisy, J. B. (1994). “Model predictive control: Past, present and future”. ISA Trans. 33, pp. 235 –

243.


G

[Galluzzo et al., 2001] Galluzzo, M., Ducato, R., Bartolozzi, V. and Picciotto, A. (2001). “Expert control of DO in

the aerobic reactor of an activated sludge process”. Comp. & Chem. Eng, vol. 25, pp. 619 – 625.

[Ganssle and Barr, 2003] Ganssle, J. and Barr, M. (2003). “Embedded Systems Dictionary”. CA 95020, USA:

CMP Books.

[García and Morshedi, 1986] García, C. E. and Morshedi, A. M. (1986). “Quadratic programming solution of dynamic matrix control (QDMC)”. Chem. Eng. Commun., vol. 46, pp. 73 – 87.

[García et al., 1989] García, C. E., Prett, D. M. and Morari, M. (1989), “Model Predictive Control: Theory and

Practice – A Survey”. Automatica, vol. 25, no. 3, 1989, pp. 335 – 348.

[Garcia et al., 2004] Garcia, J., Bleris, L., Arnold, M. G. and Kothare, M. V. (2004). “LNS architectures for

embedded model predictive control processors”. In Proc. CASES’04, Washington, DC, U.S.A., 22 – 25 Sep.,

2004, pp. 79 – 84.

[Gernaey et al., 2004] Gernaey, K. V., van Loosdrecht, M. C. M., Henze, M., Lind, M. and Jørgensen, B. (2004).

“Activated sludge wastewater treatment plant modelling and simulation: state of the art”. Environmental

Modelling and Software, vol. 19, pp. 763 – 783.

[Gómez and Baeyens, 1998] Gómez, J. C. and Baeyens, E. (1998). “Analysis of Dynamic System Identification using Rational Orthonormal Bases”. Ph.D. Dissertation, Department of Electrical and Computer Engineering, The University of Newcastle, Australia.

[Gomm et al., 1997] Gomm, J. B., Evans, J. T. and Williams, D. (1997). “Development and performance of a

neural-network predictive controller”. Control Eng. Pract., vol. 5, no. 1, pp. 49 – 59.

[Goodhart et al., 2000] Goodhart, S., Nishizawa, J. Yano, K. and Yada, H. (2000), “Advanced control in

cogeneration utility management”. Computing & Control Engineering Journal, vol. 11, no. 6, pp. 273 – 282.

[Goodwin and Sin, 1984] Goodwin, G. C. and Sin, K. S. (1984). “Adaptive Filtering, Prediction and Control”.

New Jersey, USA: Prentice-Hall.


[Grimble and Ordys, 2001] Grimble, M. J. and Ordys, A. W. (2001), “Predictive control for industrial

applications”. Annual Reviews in Control, vol. 25, pp. 13-24, 2001.

[Grossberg, 1982] Grossberg, S. (1982). “Studies of the Mind and Brain”. Dordrecht, Holland: Reidel Press.

[Guarneri et al., 2008] Guarneri, P., Rocca, G. and Gobbi, M. (2008). “A neural network-based model for the

dynamic simulation of the tire/suspension system while traversing road irregularities”. IEEE Trans. Neural Netw.,

vol.19, no. 9, pp. 1549 – 1563.

[Guccione, 2000] Guccione, S. (2000). “List of FPGA-Based Computing Machines”. [Online] Available: http://www.io.com/~guccione/HW-list.html.

[Gupta et al., 2003] Gupta, M. M., Jin, L. and Homma, N. (2003). “Static and Dynamic Neural Networks: From

Fundamental to Advanced Theory”. Hoboken, New Jersey: John Wiley & Sons.

H

[Hafidi et al., 2008] Hafidi, G., Tabbani, E., Dumur, D. and Vande, A. (2008). “Robust analysis of nonlinear predictive control of fed-batch E. coli cultures”. In 16th Med. Conf. on Control and Automation, France, Jun. 25 – 27, 2008, pp. 564 – 569.

[Hagan and Menhaj, 1994] Hagan, M. T. and Menhaj, M. B. (1994). “Training feedforward network with the

Marquardt algorithm”. IEEE Trans. Neural Netw., vol. 5, no. 6, pp. 989 – 993.

[Hagan et al., 1996] Hagan, M. T., Demuth, H. B. and Beale, M. (1996). “Neural Network Design”. U.S.A.: PWS

Publishing Company.

[Halldrosson and Unbehauen, 2001] Halldrosson, U. and Unbehauen, F. H. (2001). “Multirate approach to nonlinear predictive control”. In Proc. 13th Int. Conf. on Process Control, Štrbské Pleso, Slovakia, 11 – 14 June, 2001.

[Hamilton et al., 2006] Hamilton, R., Braun, B., Dare, R., Koopman, B. and Svoronos, S. (Aug., 2006). “Control

issues and challenges in wastewater treatment plants: Conserving energy and improving performance”. IEEE

Control Systems Magazine, pp. 63 – 69.


[Haykin, 1999] Haykin, S. (1999). “Neural Networks: A Comprehensive Foundation”. 2nd ed. Upper Saddle

River, NJ: Prentice-Hall.

[He and Ling, 2005] He, M. and Ling, K. V. (2005). “Model predictive control on a chip”. In Proc. Int. Conf.

Contr. Automation, Budapest, Hungary, 27 – 29 June, 2005, pp. 528 – 532.

[He et al., 2006] He, M., Chen, C. and Zhang, X. (2006). “FPGA implementation of a recursive rank one updating matrix inversion algorithm for constrained MPC”. In Proc. of the 6th World Congress on Intelligent Control and Automation, Dalian, China, 21 – 23 June, 2006, pp. 733 – 737.

[Hebb, 1949] Hebb, D. O. (1949) “Organization of Behavior”. New York: John Wiley.

[Hegrenæs et al., 2005] Hegrenæs, Ø., Gravdahl, J. T. and Tøndel, P. (2005). “Spacecraft attitude control using explicit model predictive control”. Automatica, vol. 41, pp. 2107 – 2114.

[Henze et al., 1996] Henze, M., Harremoës, P., Jansen, J. and Arvin, E. (1996). “Wastewater Treatment: Biological and Chemical Processes”. 2nd ed., Berlin: Springer Verlag.

[Hertz et al., 1991] Hertz, J., Krogh, A. and Palmer, R. G. (1991). “Introduction to the Theory of Neural Computation”. Redwood City, California: Addison-Wesley.

[Hines, 1997] Hines, J. W. (1997). “Fuzzy and Neural Approaches in Engineering: MATLAB Supplement”. New

York: John Wiley & Sons.

[Hippe, 2006] Hippe, P. (2006). “Windup in Control”. Germany: Springer-Verlag Ltd.

[Hoerner, 1965] Hoerner, S. F. (1965). “Fluid – Dynamic Drag: Theoretical, Experimental and Statistical

Information”. Bakersfield, USA: Hoerner Fluid Dynamics.

[Holenda et al., 2008] Holenda, B., Domokos, E., Rédey, Á. and Fazakas, J. (2008). “Dissolved oxygen control of

the activated sludge wastewater treatment process using model predictive control”. Comp. & Chem. Eng, vol. 32,

pp. 1270 – 1278.

[Hopfield, 1982] Hopfield, J. J. (1982). “Neural networks and physical systems with emergent collective computational abilities”. In Proceedings of the National Academy of Sciences, vol. 79, pp. 2554 – 2558.


[Hopfield, 1984] Hopfield, J. J. (1984). “Neurons with graded response have collective computational properties like those of two-state neurons”. In Proceedings of the National Academy of Sciences, vol. 81, pp. 3088 – 3092.

[Hopfield et al., 1983] Hopfield, J. J., Feinstein, D. I. and Palmer, R. G. (1983). “Unlearning has a stabilizing effect in collective memories”. Nature, vol. 304, pp. 158 – 159.

[Hugo, 2000] Hugo, A. (2000). “Limitations of model predictive controllers”. Hydrocarbon Processing, vol. 79,

no. 83, pp. 83 – 88.

I

[IBM PEPC440, 2010] PowerPC 440x6 Embedded Processor Core. User’s Manual, v7.1, September 29, 2010, pp.

1–601. https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/2D417029AE3F3089872570F8006D4E99

[IBM PPC405C, 2006] IBM PowerPC 405 Embedded Core. IBM Systems and Technology Group, New York.

pp. 1 – 2. https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699300651D97

[IBM PPC440C, 2006] IBM PowerPC 440 Embedded Core. IBM Systems and Technology Group, New York.

pp. 1 – 4. https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/F72367F770327F8A87256E63006CB7EC

[IBM TPPC440C, 1999] IBM PowerPC 440 Core: A high-performance, superscalar processor core for embedded

application. IBM Microelectronics Division, Research Triangle Park, NC. September 19, 1999, pp. 1 – 18.

https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/F72367F770327F8A87256E63006CB7EC

[IEC, 1999] IEC 61158–4 (1999). “Digital Data Communications for Measurement and Control—Fieldbus for

Use in Industrial Control Systems” Part 4: Data Link Protocol Specification, IEC 61158 – 4.

[Ilin et al., 2008] Ilin, R., Kozma, R. and Werbos, P. J. (2008). “Beyond feedforward models trained by

backpropagation: A practical training tool for a more efficient universal approximator”. IEEE Trans. Neural

Networks, vol. 19, no. 6, pp. 929 – 937.

[Isermann et al., 1992] Isermann, R., Lachmann, K. and Matko, D. (1992). “Adaptive Control Systems”. Systems

and Control Engineering. New York, NY: Prentice-Hall.


J

[Jack, 2003] Jack, H. (2003). “Automating Manufacturing Systems with PLCs”. Version 4.2, April 3, 2003, http://claymore.engineer.gvsu.edu/~jackh/books.html.

[Jammes et al., 2005] Jammes, F., Mensch, A. and Smit, H. (2005). “Service-oriented device communications using the device profile for web services”. In Proc. 3rd Int’l. Workshop on Middleware for Pervasive and Ad-Hoc Comput., MPAC05, Poznan, Poland, Nov. 2005, pp. 1 – 8.

[Jammes and Smit, 2005a] Jammes, F. and Smit, H. (2005a), “Service-oriented architectures for devices - the

SIRENA view”. In Proc. 3rd IEEE Int. Conf. Ind. Informat.- INDIN ‘05, Aug. 10-12, 2005, pp. 140 – 147.

[Jammes and Smit, 2005b] Jammes, F. and Smit, H. (2005b), “Service-oriented paradigms in industrial

automation”. IEEE Trans. Ind. Informat., vol. 1, no. 1, pp. 62 – 70.

[Jian et al., 2010] Jian, L., Dewei, L. and Yugeng, X. (2010). “Implementation of dynamic control on FPGA”. In

Proc. of the 29th Chinese Control Conf., Beijing, China, 29 – 31 July, 2010, pp. 5970 – 5974.

[Jiang and Kamel, 2007] Jiang, J. and Kamel, S. (2007). “Pitch control of an aircraft with aggregate reinforcement

learning algorithms”. In Proc. Int’l Joint Conf. on Neural Networks, USA, Aug. 12 – 17, 2007, pp. 41 – 46.

[Jin and Su, 2008] Jin, Y. and Su, C. (2008). “Adaptive model predictive control using diagonal recurrent neural

network”. Fourth Int’l. Conf. on Natural Computation, Jinan, Oct. 18-20, 2008, pp. 276 – 280.

[Jini, 2010] The community resource for Jini technology. [Online]. Available: http://www.jini.org/.

[Johansen et al., 2007] Johansen, T. A., Jackson, W., Schreiber, R. and Tøndel, P. (2007). “Hardware synthesis of explicit model predictive controllers”. IEEE Trans. Control Systems Technology, vol. 15, no. 1, pp. 191 – 197.

[Joos and Fichter, 2011] Joos, A. and Fichter, W. (2011). “Parallel implementation of constrained nonlinear

model predictive control for an FPGA-based onboard flight computer”. Advances in Aerospace Guidance,

Navigation and Control, II, pp. 273 – 286.

[Jordan, 1986a] Jordan, M. (1986). “Attractor Dynamics and Parallelism in a Connectionist Sequential Machine”.

In Proceedings of the Eighth Annual Conference on Cognitive Science Society, Hillsdale, NJ, pp. 531-546.


[Jordan, 1986b] Jordan, M. I. (1986b). “Serial Order: A Parallel Distributed Processing Approach”. Technical Report No. 8604, Institute for Cognitive Science, University of California, San Diego, CA.

K

[Kalafatis et al., 1997] Kalafatis, A. D., Wang, L. and Cluett, W. R. (1997). “Identification of Wiener-type

nonlinear systems in a noisy environment”. Int. Journal of Control, vol. 66, pp. 923-941.

[Kalman, 1960a] Kalman, R. E. (1960). “Contributions to the theory of optimal control”. Bull. Soc. Math. Mex. 5,

pp. 102 – 119.

[Kalman, 1960b] Kalman, R. E. (1960). “A new approach to linear filtering and prediction problems”. Trans.

ASME, J. Basic Engineering, pp. 35 – 45.

[Kalra and Georgaki, 1994] Kalra, L. and Georgakis, C. (1994). “Effects of process nonlinearity on the performance of linear model predictive controllers for the environmentally safe operation of a fluid catalytic cracking unit”. Ind. Eng. Chem. Res., vol. 33, pp. 3063 – 3069.

[Kelley, 1995] Kelley, C. T. (1995). “Iterative Methods for Linear and Nonlinear Equations”. Philadelphia:

SIAM.

[Kelley, 1999] Kelley, C. T. (1999). “Iterative Methods for Optimization”. Philadelphia: SIAM.

[Kendig, 1984] Kendig, W. C. (Aug. 21, 1984). “Aircraft thrust control scheme for terrain following system”. United States Patent, Patent Number 4,467,429, 10 pages.

[Khalil, 1996] Khalil, H. K. (1996). “Nonlinear Systems”. Upper Saddle River, NJ: Prentice-Hall.

[Kilts, 2007] Kilts, S. (2007). “Advanced FPGA Design: Architecture, Implementation, and Optimization”. New

Jersey, USA: John Wiley & Sons.

[Kohonen, 1977] Kohonen, T. (1977). “Associative Memory: A System-Theoretical Approach”. Berlin: Springer-

Verlag.


[Kohonen, 1982] Kohonen, T. (1982). “Self-organized formation of topologically correct feature maps”.

Biological Cybernetics, vol. 43, pp. 59 - 69.

[Kohonen, 1984] Kohonen, T. (1984). “Self - Organization and Associative Memory”. Berlin: Springer-Verlag.

[Kohonen, 1995] Kohonen, T. (1995). “Self-Organizing Maps”. London: Springer.

[Kosko, 1988] Kosko, B. (1988). “Bidirectional Associative Memories”. IEEE Trans. Systems, Man and

Cybernetics, vol. 18(1), pp. 49-60.

L

[Lee et al., 1999] Lee, T. T., Wang, F. Y., Islam, A. and Newell, R. B. (1999). “Generic distributed parameter

model control of a biological nutrient removal (BNR) activated sludge process”. J. of Pro. Cont., vol. 9, pp. 505 –

525.

[Lee et al., 2006] Lee, K. C., Lee, S. and Lee, M. H. (2006), “Worst case communication delay of real-time

industrial switched Ethernet with multiple levels”. IEEE Trans. Ind. Electron., vol. 53, no. 5, pp. 1669 – 1676.

[Lee et al., 2006(a)] Lee, T. T., Wang, F. Y., Islam, A. and Newell, R. B. (2006). “Advances in distributed

parameter approach to the dynamics and control of activated sludge processes for wastewater treatment”. Wat.

Res., vol. 40, pp. 853 – 869.

[Lee and Markus, 1967] Lee, E. B. and Markus, L. (1967). “Foundations of Optimal Control Theory”. New York:

John Wiley and Sons.

[Levin and Narendra, 1993] Levin, A. U. and Narendra, K. S. (1993). “Control of nonlinear dynamical systems

using neural networks: controllability and stabilization”. IEEE Trans. Neural Networks, vol. 4, no. 2, pp. 192 –

206.

[Lewis and Syrmos, 2003] Lewis, F. L. and Syrmos, V. L. (2003), “Optimal Control”. 2nd ed., New York: John

Wiley & Sons.


[Li and Kashiwagi, 2005] Li, Y. and Kashiwagi, H. (2005). “High-order Volterra model predictive control and its

application to nonlinear polymerisation process”. International Journal of Automation and Computing, vol. 2, pp.

208 – 214.

[Ling et al., 2004] Ling, K. V., Bingfang, W. U., Minghua, H. E. and Yu, Z. (2004). “A model predictive controller for multirate cascaded systems”. In Proc. American Control Conf., Boston, USA, 30 Jun. – 2 Jul., 2004, pp. 1575 – 1579.

[Ling et al., 2005] Ling, K. V., Maciejowski, J. M. and Wu, B. F. (2005). “Multiplexed model predictive control”.

In Proc. 16th IFAC World Congress, Prague, July, 2005.

[Ling et al., 2006] Ling, K. V., Yue, S. P. and Maciejowski, J. M. (2006). “A FPGA implementation of model

predictive control”. In Proc. American Contr. Conf., Minneapolis, Minnesota, U.S.A., 14 – 16 June, 2006, pp.

1930 – 1935.

[Ling et al., 2008] Ling, K. V., Wu, B. F. and Maciejowski, J. M. (2008). “Embedded model predictive control

(MPC) using a FPGA”. In Proc. 17th World Congress, The International Fed. of Aut. Control, Seoul, Korea, July

6 – 11, 2008, pp. 15250 – 15255.

[Lin-Shi et al., 2007] Lin-Shi, X., Morel, F., Lior, A. M., Allard, B. and Rétif, J. M. (2007). “Implementation of

hybrid control for motor drives”. IEEE Transactions on Ind. Elect., vol. 54, no. 4, pp. 1946 – 2126.

[Liu, 2002] Liu, D. (2002). “Neural network-based adaptive critic designs for self-learning control”. In Proc. 9th

International Conf. Neural Information Processing, Singapore, 18 – 22 Nov., 2002, vol. 3, pp. 1252 – 1256.

[Ljung, 1999] Ljung, L. (1999). “System Identification: Theory for the User”, 2nd ed., Upper Saddle River, NJ:

Prentice-Hall.

[Lu and Tsai, 2008] Lu, C. H. and Tsai, C. C. (2008). “Adaptive predictive control with recurrent neural network

for industrial process: An application to temperature control of a variable-frequency oil-cooling machine”. IEEE

Trans. Industrial Electronics, vol. 55, no. 3, pp. 1366 – 1375.

[Lukasse and Keesman, 1999] Lukasse, L. J. S. and Keesman, K. J. (1999). “Optimised operation and design of

alternating activated sludge processes for N-removal”. Wat. Res., vol. 33, No. 11, pp. 2651 – 2659.


M

[Maciejowski, 2002] Maciejowski, J. M. (2002). “Predictive Control with Constraints”. England: Pearson

Education Limited.

[Maeda and Wakamura, 2005] Maeda, Y. and Wakamura, M. (2005). “Simultaneous perturbation learning rule for recurrent neural networks and its FPGA implementation”. IEEE Trans. Neural Netw., vol. 16, no. 6, pp. 1664 – 1672.

[Magni and Scattolini, 2006] Magni, L. and Scattolini, R. (2006). “Stabilizing decentralized model predictive control of nonlinear systems”. Automatica, vol. 42, pp. 1231 – 1236.

[Malinowski and Yu, 2011] Malinowski, A. and Yu, H. (2011). “Comparison of embedded system design for

industrial applications”. IEEE Transactions on Industrial Informatics, vol. 7, no. 2, pp. 244 – 254.

[Manzie et al., 2000] Manzie, C., Palaniswami, M. and Watson, H. (2000). “Model predictive control of a fuel

injection system with a radial basis function network observer”. In Proc. IEEE-INNS-ENNS International Joint

Conference on Neural Networks, Como, Italy, July 24 – 27, vol. 4, pp. 359 – 364.

[Marquardt, 1963] Marquardt, D. W. (1963). “An algorithm for least-squares estimation of nonlinear parameters”.

J. Soc. Ind. Appl. Math., vol. 11, no. 2, pp. 431-441.

[Masters, 1993a] Masters, T. (1993a), “Practical Neural Network Recipes in C++”. San Diego, CA: Academic

Press.

[Masters, 1993b] Masters, T. (1993b). “Advanced Algorithms for Neural Networks”. New York: John Wiley &

Sons.

[MathWorks, 2009] The MathWorks Inc., MATLAB & Simulink R2009b, Natick, USA. www.mathworks.com.

[MathWorks, 2010] The MathWorks Inc., MATLAB & Simulink R2010a, Natick, USA. www.mathworks.com.

[Martin, 2002] Martin, G. (2002). “The future of high-level modelling and system level design: Some possible

methodology scenarios”. [Online] Available: http://www.eda.org/edps/edp02/PAPERS/edp02-s1_1.pdf


[Maxfield, 2004] Maxfield, C. (2004). “The Design Warrior’s Guide to FPGAs: Devices, Tools and Flows”.

Oxford, UK: Newnes – Elsevier.

[McBride and Narendra, 1965] McBride, L. E., Jr. and Narendra, K. S. (1965). “Optimization of time-varying

systems”. IEEE Transactions on Automatic Control, vol. 10, no. 3, pp. 289 – 294.

[Meloni et al., 2010] Meloni, P., Secchi, S. and Raffo, L. (2010). “An FPGA-based framework for technology-

aware prototyping of multicore embedded architectures”. IEEE Embedded Syst. Letters, vol. 2, no. 1, pp. 5 – 9.

[Mensch and Rouges, 2009] Mensch, A. and Rouges, S. (2009). “DPWS Core version 2.1 User Guide”.


[Minsky and Pappert, 1969] Minsky, M. and Papert, S. (1969). “Perceptrons: An Introduction to Computational Geometry”. Cambridge, MA: MIT Press.

[Mirikitani and Nikolaev, 2007] Mirikitani, D. and Nikolaev, N. (2007). “Recursive Bayesian Levenberg-

Marquardt training of recurrent neural networks”. In. Proc. of Int’l Joint Conf. on Neural Network, Orlando,

Florida, USA, 12 – 17 August, 2007, pp. 282 – 287.

[Mjalli, 2006] Mjalli, F. S. (2006), “Adaptive and predictive control of liquid-liquid extractors using neural-based

instantaneous linearization technique”. Chem. Eng. Technol., vol. 29, no. 5, pp. 539 – 549.

[Mohagheghi et al., 2006] Mohagheghi, S., Venayagamoorthy, G. K. and Harley, R. G. (2006). “Adaptive critic

design based on neuro-fuzzy controller for a static compensator in a multimachine power system”. IEEE Trans.

Power Systems, vol. 21, no. 4, pp. 1744 – 1754.

[Monmasson et al., 2011] Monmasson, E., Idkhajine, L., Cirstea, M. N., Bahri, I., Tisan, A. and Naouar, M. W.

(2011). “FPGAs in industrial control applications”. IEEE Transactions on Industrial Informatics, vol. 7, no. 2, pp.

224 – 242.

[Morari, 1994] Morari, M. (1994), “Model predictive control: Multivariable control technique of choice in the

1990s”. In: Advances in Model-Based Predictive Control, New York: Oxford University Press, pp. 22 – 37.


[Morari and Zafiriou, 1989] Morari, M. and Zafiriou, E. (1989). “Robust Process Control”. Englewood Cliffs, NJ:

Prentice-Hall.

[Moretti, 2003] Moretti, G. (2002). “System-level design merits a closer look: The complexity of today’s designs requires system-level design”. EDN Asia, February 1, 2002, pp. 22 – 28. [Online] Available: http://www.ednasia.com/article-1129-systemleveldesignmeritsacloserlook-Asia.html.

[Moré, 1983] Moré, J. J. (1983). “Recent developments in algorithms and software for trust region methods”. In Mathematical Programming: The State of the Art, A. Bachem, M. Grötschel and B. Korte, Eds. Berlin: Springer-Verlag, pp. 258 – 287.

[Muske and Rawlings, 1993] Muske, K. R. and Rawlings, J. B. (1993). “Model predictive control with linear

models”. AIChE Journal, vol. 39, pp. 262-287.

N

[Naouar et al., 2008] Naouar, M. W., Naassani, A. A., Monmasson, E. and Belkhodja, I. S. (2008). “FPGA-based

predictive current controller for synchronous machine speed drive”. IEEE Transactions on Power Electronics,

vol. 23, no. 4, pp. 2115 – 2126.

[Narendra and Annaswamy, 1989] Narendra, K. S. and Annaswamy, A. M. (1989). “Stable Adaptive Systems”.

Englewood Cliff, NJ: Prentice-Hall.

[Narendra and Driollet, 2001] Narendra, K. S. and Driollet, O. A. (2001). “Stochastic adaptive control using

multiple estimation models”. In Proc. of the 2001 American Control Conference (ACC2001), Arlington, VA, 25 –

27 June, 2001, pp. 1539 – 1544.

[Narendra and Parthasarathy, 1990] Narendra, K. S. and Parthasarathy, K. (1990). “Identification and control of

dynamical systems using neural networks”. IEEE Trans. Neural Networks, vol. 1, no. 1, pp. 4 – 27.

[Narendra and Parthasarathy, 1992] Narendra, K. S. and Parthasarathy, K. (1992). “Neural networks and dynamical systems”. International J. Approximate Reasoning, vol. 6, pp. 109 – 131.

[Nelson, 1989] Nelson, R. C. (1989). “Flight Stability and Automatic Control”. New York: McGraw-Hill, Inc.


[Nguyen et al., 1979] Nguyen, L. T., Ogburn, M. E., Gilbert, W. P., Kibler, K. S., Brown, P. W. and Deal, P. L.

(1979). “Simulator Study of Stall/Post-Stall Characteristics of a Fighter Airplane With Relaxed Longitudinal

Static Stability”. NASA Technical Paper 1538, Dec., 1979, 233 pages.

[Nikolaou, 2001] Nikolaou, M. (2001). “Model predictive controllers: A critical synthesis of theory and industrial

needs”. Advances in Chemical Engineering, vol. 26, pp. 131 – 204.

[Ning et al., 2000] Ning, Z., Patry, G. G. and Spanjers, H. (2000) “Identification and quantification of nitrogen

nutrient deficiency in the activated sludge process using respirometry”. Wat. Res., vol. 34, No. 13, pp. 3345 –

3354.

[Nørgaard et al., 2000] Nørgaard, M., Ravn, O., Poulsen, N. K. and Hansen, L. K. (2000). “Neural Networks for Modelling and Control of Dynamic Systems: A Practitioner’s Handbook”. London: Springer-Verlag.

[Normey-Rico and Camacho, 2007] Normey-Rico, J. E. and Camacho, E. F. (2007). “Control of Dead-Time Processes”. London: Springer-Verlag.

O

[Ogunfunmi, 2007] Ogunfunmi, T. (2007). “Adaptive Nonlinear System Identification: The Volterra and Wiener

Model Approaches”. New York: Springer.

[Oldfield and Dorf, 1995] Oldfield, J. V. and Dorf, R. C. (1995). “Field Programmable Gate Arrays: Reconfigurable Logic for Rapid Prototyping and Implementation of Digital Systems”. New York: John Wiley & Sons, Inc.

[Omidvar and Elliott, 1997] Omidvar, O. M. and Elliott, D. L. (Feb., 1997) “Neural systems for control”.

Academic Press, San Diego. [Online] Available: http://www.isr.umd.edu/~delliot/NeuralSystemsForControl.pdf.

P

[Pan and Wang, 2008] Pan, Y. and Wang, J. (2008). “Two neural network approaches to model predictive

control”. In Proc. 2008 American Control Conference, Washington, USA, Jun. 11-13, 2008, pp. 1685 – 1690.


[Park and Sandberg, 1991] Park, J. and Sandberg, I. (1991). “Universal Approximation Using Radial-Basis-Function Networks”. Neural Computation, vol. 3, pp. 246 – 257.

[Pearlmutter, 1990] Pearlmutter, B. A. (1990). “Dynamic Recurrent Neural Networks”. Technical Report

(Supersedes CMU-CS-88-191), School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.

[Pearlmutter, 1995] Pearlmutter, B. A. (1995). “Gradient calculations for recurrent neural networks: A survey”.

IEEE Trans. Neural Networks, vol. 6, no. 5, pp. 1212 – 1228.

[Pérez et al., 2009] Pérez, M., Vásquez, M., Rodríguez, J. and Pontt, J. (2009). “FPGA-based predictive current control of a three-phase active front end rectifier”. IEEE Int’l Conf. on Ind. Tech., Gippsland, 10 – 13 Feb., 2009, pp. 1 – 6.

[Petridis and Kehagias, 1998] Petridis, V. and Kehagias, A., (1998). “Predictive Modular Neural Networks:

Application to Time Series”. Massachusetts, USA: Kluwer Academic Publishers.

[Phansalkar and Sastry, 1994] Phansalkar, V. V. and Sastry, P. S. (1994). “Analysis of the Back-Propagation

Algorithm with Momentum”. IEEE Transactions on Neural Networks, vol. 5, no. 3, pp. 505 – 506.

[Piotrowski et al., 2008] Piotrowski, P., Brdys, M. A., Konarczak, K., Duzinkiewicz, K. and Chotkowski, W.

(2008). “Hierarchical dissolved oxygen control for activated sludge processes”. Cont. Eng. Pract., vol. 16, pp.

114 – 131.

[Potočnik and Grabec, 2002] Potočnik, P. and Grabec, I. (2002). “Nonlinear model predictive control of a cutting

process”. Neurocomputing, vol. 43, pp. 107 – 126.

[Prokhorov, 2007] Prokhorov, D. V. (2007) “Training recurrent neurocontrollers for real-time applications”. IEEE

Trans. Neural Networks, vol. 18, no. 4, pp. 1003 – 1015.

[Propoi, 1963] Propoi, A. I. (1963). “Use of LP methods for synthesizing sampled-data automatic systems”.

Automation and Remote Control, vol. 24, pp. 912 – 920.


Q

[Qin and Badgwell, 2003] Qin, S. J. and Badgwell, T. A. (2003). “A Survey of model predictive control

technology”. Control Engineering Practice, vol. 11, pp. 733 – 764.

R

[Richalet et al., 1978] Richalet, J., Rault, A., Testud, J. L. and Papon, J. (1978). “Model predictive heuristic control: Applications to industrial processes”. Automatica, vol. 14, pp. 413 – 428.

[Rodrigo et al., 1999] Rodrigo, M. A., Seco, A., Ferrer, J., Penya-Roja, J. M. and Valverde, J. L. (1999).

“Nonlinear control of an activated sludge aeration process: use of fuzzy techniques for tuning PID controllers”.

ISA Transactions, vol. 38, pp. 231 – 241.

[Ronco and Gawthrop, 1997] Ronco, E. and Gawthrop, P. J. (1997). “Neural networks for modelling and

control”. Dept. of Mechanical Engineering, University of Glasgow, UK, Technical Report: csc97008, Nov. 10,

1997.

[Rosenblatt, 1959] Rosenblatt, F. (1959). “Principles of Neurodynamics”. New York: Spartan Books.

[Roskam, 1998] Roskam, J. (1998). “Airplane flight dynamics and automatic flight controls”. Kansas, USA: DAR

Corporation.

[Rossiter, 2004] Rossiter, J. A. (2004). “Model-Based Predictive Control”. Florida 33431, USA: CRC Press LLC.

[Rumelhart et al., 1986] Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). “Learning representations by

back-propagating errors”. Nature, vol. 323, pp. 533 – 536.

[Russel, 2003] Russel, R. S. (2003). “Non-linear F-16 Simulation using Simulink and Matlab, Ver. 1.0”.

Technical Report, University of Minnesota.


S

[Salahshoor et al., 2010] Salahshoor, K., Safari, E. and Samadi, M. F. (2010). “Adaptive model predictive control

of a hybrid motorboat using self-organizing GAP-RBF neural network and GA algorithm”. 2nd IEEE Int’l Conf.

on Adv. Computer Control, Shenyang, China, Mar. 27-29, 2010, pp. 588 – 592.

[Salgado et al., 1988] Salgado, M., Goodwin, G. and Middleton, R. (1988). “Modified least squares algorithm

incorporating exponential forgetting and resetting”. Int. J. Control, vol. 47, no. 2, pp. 477 – 491.

[Samaras et al., 2009] Samaras, I. K., Gialelis, J. V., Hassapis, G. D. and Akpan, V. A. (2009). “Utilizing

semantic web services in factory automation towards integrating resource constrained devices into enterprise

information systems”. In the Proceedings of the 14th IEEE International Conference on Emerging Technologies

and Factory Automation (ETFA’2009), Palma de Mallorca, Spain, 22 – 26 Sept., 2009, pp. 1 – 8.

[Samek and Macku, 2008] Samek, D. and Macku, L. (2008) “Semi-batch reactor predictive control using artificial

neural network”. In Proc. 16th Mediterranean Conference on Control and Automation, Ajaccio, France, June 25-

27, 2008, pp. 1532 – 1537.

[Sarangapani, 2006] Sarangapani, J. (2006). “Neural Network Control of Discrete-Time Systems”. Boca Raton:

CRC Press, 2006.

[Savran et al., 2006] Savran, A., Tasaltin, R. and Becerikli, Y. (2006). “Intelligent adaptive nonlinear flight

control for a high performance aircraft with neural networks”. ISA Transactions, vol. 45, no. 2, pp. 225 – 247.

[Scales, 1985] Scales, L. E. (1985). “Introduction to Non-Linear Optimization”. New York: Springer–Verlag.

[Schlimmer, 2009] Schlimmer, J. (2004). “A technical introduction to the Devices Profile for Web Services”. Microsoft Corporation. [Online] Available: http://msdn.microsoft.com/en-us/library/ms996400.aspx.

[Seborg et al., 2004] Seborg, D. E., Edgar, T. F. and Mellichamp, D. A. (2004). “Process Dynamics and Control”.

2nd ed., U.S.A.: John Wiley & Sons.

[Seng et al., 1998] Seng, T. L., Khalid, M., Yusof, R. and Omatu, S. (1998). “Adaptive neuro-fuzzy control system by RBF and GRNN neural networks”. J. of Intelligent and Robotic Systems, vol. 23, pp. 267 – 289.


[Seng et al., 2002] Seng, T. L., Khalid, M., Yusof, R. and Omatu, S. (2002). “Adaptive GRNN for the modelling of dynamic plants”. In Proc. 2002 IEEE International Symposium on Intelligent Control, Vancouver, Canada, Oct. 27 – 30, 2002, pp. 217 – 222.

[Shen et al., 2008] Shen, W., Chen, X. and Jean, J. C. (2008). “Application of model predictive control to the

BSM1 benchmark of wastewater treatment process”. Computers and Chem. Eng., vol. 32, pp. 2849 – 2856.

[Sheng et al., 2002] Sheng, J., Chen, T. and Shah, S. L. (2002). “Generalized predictive control for non-uniformly sampled systems”. J. of Proc. Cont., vol. 12, pp. 875 – 885.

[Shoukry et al., 2010a] Shoukry, Y., El-Kharashi, M. W. and Hammad, S. (2010). “MPC-on-chip: An embedded

GPC coprocessor for automotive active suspension systems”. IEEE Embedded Systems Letters, vol. 2, no. 2, pp.

31 – 34, Jun. 2010.

[Shoukry et al., 2010b] Shoukry, Y., El-Kharashi, M. W. and Hammad, S. (2010). “Networked embedded

generalized predictive control for an active suspension system”. 2010 American Control Conference, Baltimore,

MD, USA, June 30 – July 2, 2010, pp. 4570 – 4575.

[Si et al., 2004] Si, J., Barto, A., Powell, W. and Wunsch, D. (2004) “Handbook of Learning and Approximate

Dynamic Programming”. Piscataway, NJ: IEEE Press.

[Sjöberg and Ljung, 1995] Sjöberg, J. and Ljung, L. (1995). “Overtraining, regularization, and searching for minimum in neural networks”. Int’l J. of Contr., vol. 62, pp. 1391 – 1408.

[Sjöberg et al., 1995] Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Pierre-Yves, G., Hjalmarsson,

H. and Juditsky, A. (1995). “Nonlinear black-box modeling in system identification: A unified overview”.

Automatica, vol. 31 (12), pp. 1691-1724.

[Soeterboek, 1992] Soeterboek, R. (1992). “Predictive Control: A Unified Approach”. New York, USA: Prentice–

Hall.

[Song, 2010] Song, Q. (2010). “On the weight convergence of Elman networks”. IEEE Trans. Neural Networks,

vol. 21, no. 3, pp. 463 – 480.


[Song and Liu, 2006] Song, Q. and Liu, F. (2006). “The direct approach to unified GPC based on

ARMAX/CARIMA/CARMA model and application for pneumatic actuator control”. In Proc. First Int’l Conf. on

Innovative Computing, Information and Control, China, 30 Aug. – 01 Sep., 2006, vol. 1, pp. 336 – 339.

[Song et al., 2006] Song, Q., Liu, F. and Findlay, R. D. (2006). “Generalized predictive control for a pneumatic system based on an optimized ARMAX model with an artificial neural network”. IEEE CIMCA-IAWTIC’06, Sydney, Australia, pp. 223 – 228.

[Specht, 1991] Specht, D. (1991), “A General Regression Neural Network”. IEEE Transactions on Neural

Networks, vol. 2(5), pp. 568-576.

[Spellman, 2003] Spellman, F. R. (2003). “Handbook of Water and Wastewater Treatment Plant Operations”.

Boca Raton, Florida: CRC Press LLC, 2003.

[Spooner et al., 2002] Spooner, J. T., Maggiore, M., Ordóñez, R. and Passino, K. M. (2002), “Stable Adaptive

Control and Estimation for Nonlinear Systems: Neural and Fuzzy Approximator Techniques”. New York: John

Wiley & Sons.

[Stare et al., 2006] Stare, A., Vrečko, D. and Hvala, N. (2006). “Modeling, identification, and validation of

models for predictive ammonia control in a wastewater treatment plant – A case study”. ISA Transactions, vol.

45, no. 2, pp. 159 – 174.

[Stare et al., 2007] Stare, A., Vrečko, D., Hvala, N. and Strmčnik, S. (2007). “Comparison of control strategies for

nitrogen removal in an activated sludge process in terms of operating cost: A simulation study”. Wat. Res., vol.

41, pp. 2004 – 2014.

[Steffens and Lant, 1999] Steffens, M. A. and Lant, P. A. (1999). “Multivariable control of nutrient-removing

activated sludge systems”. Wat. Res, vol. 33, No. 12, pp. 2864 – 2878.

[Stevens and Lewis, 2003] Stevens, B. L. and Lewis, F. L. (2003). “Aircraft Control and Simulation”. 2nd ed.,

New York: John Wiley & Sons, Inc.

[Su and Wu, 2009] Su, C. and Wu, Y. (2009). “Adaptive neural network predictive control based on PSO algorithm”. Chinese Control and Decision Conference, Guilin, China, June 17 – 19, 2009, pp. 5829 – 5833.


[Suárez et al., 2010] Suárez, G. I., Ortiz, O. A., Aballay, P. M. and Aros, N. H. (2010). “Adaptive neural model predictive control for the grape juice concentration process”. 2010 IEEE Int’l Conf. on Industrial Tech., Viña del Mar, Mar. 14 – 17, 2010, pp. 57 – 63.

T

[Tanenbaum, 1996] Tanenbaum, A. (1996). “Computer Networks”. 3rd Edition, Upper Saddle River, New Jersey: Prentice-Hall.

[Tao, 2003] Tao, G. (2003). “Adaptive Control Design and Analysis”. Hoboken, New Jersey: John Wiley & Sons.

[The community resource for Jini technology, 2007] The community resource for Jini technology. (2007).

http://www.jini.org/.

[The DPWS, 2009] The DPWS Core Project (2009). https://forge.soa4d.org/projects/dpwscore/.

[The Network Simulator, 2009] The Network Simulator – ns-2 [Online]. Available: http://www.isi.edu/nsnam/ns/.

[Tøndel et al., 2003] Tøndel, P., Johansen, T. A. and Bemporad, A. (2003). “An algorithm for multi-parametric

quadratic programming and explicit MPC solutions,” Automatica, vol. 39, pp. 489 – 497.

[Tsai et al., 2002] Tsai, P. F., Chu, J. Z., Jang, S. S. and Shieh, S. S. (2002). “Developing a robust model predictive control architecture through regional knowledge analysis of artificial neural networks”. J. of Process Control, vol. 13, pp. 423 – 435.

[Tsoukalas and Uhrig, 1997] Tsoukalas, L. H. and Uhrig, R. E. (1997). “Fuzzy and Neural Approaches in

Engineering”. New York: John Wiley & Sons.

U

[UPnP, 2010] The UPnP forum. [Online]. Available: http://www.upnp.org/.


V

[van Engelen and Gallivang, 2002] van Engelen, R. and Gallivan, K. (2002). “The gSOAP toolkit for web services and peer-to-peer computing networks”. In 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 128 – 128.

[Vanderstteen et al., 1997] Vandersteen, G., Rolain, Y. and Schoukens, J. (1997). “Non-linear estimation of the frequency-response functions of the linear blocks of a Wiener-Hammerstein model”. Automatica, vol. 33(7), pp. 1351 – 1355.

[Venkateswarlu and Rao, 2005] Venkateswarlu, C. and Rao, K. V. (2005). “Dynamic recurrent radial basis function network model predictive control of unstable nonlinear processes”. Chemical Engineering Science, vol. 60, pp. 6718 – 6732.

[Vieira et al., 2005] Vieira, W. G., Santos, V. M. L., Carvalho, F. R., Pereira, J. A. F. R. and Fileti, A. M. F.

(2005). “Identification and predictive control of a FCC unit using a MIMO neural network”. Chem. Eng. & Proc.,

vol. 44, pp. 855 – 868.

[Virtex-4, 2010] Virtex-4 (2010). “Virtex-4 Family Overview: Product Specification”. DS112 (v3.1), August 30,

2010, pp. 1 – 9. http://www.xilinx.com/support/documentation/data_sheets/ds112.pdf

[Virtex-4Q, 2010] Virtex-4Q (2010). “Defense-Grade Virtex-4Q Family Overview: Product Specification”. 2010,

pp. 1. http://www.xilinx.com/publications/prod_mktg/Virtex4QLX_product_table.pdf

[Virtex-4QV, 2010] Virtex-4QV (2010). “Space-Grade Virtex-4QV Family Overview: Product Specification”.

DS653 (v2.0), April 12, 2010, pp. 1 – 8. http://www.xilinx.com/support/documentation/data_sheets/ds653.pdf

http://www.xilinx.com/products/silicon-devices/fpga/virtex-4q/index.htm

[Virtex-5, 2010] Virtex-5 (2010). “Virtex-5 Family Overview: Product Specification”. DS100 (v5.0), February 6,

2010, pp. 1 – 13. http://www.xilinx.com/support/documentation/data_sheets/ds100.pdf

[Virtex–5, 2010] Virtex–5 FXT FPGAs Documentations (2010). http://www.xilinx.com/products/virtex5/fxt.htm.

[Virtex-5Q, 2010] Virtex-5Q (2010). “Defense-Grade Virtex-5Q Family Overview: Product Specification”.

DS174 (v2.0), March 22, 2010, pp. 1 – 13. http://www.xilinx.com/support/documentation/data_sheets/ds174.pdf


[Virtex-5QV, 2010] Virtex-5QV (2010). “Space-Grade Virtex-5QV Family Overview: Product Specification”.

DS192 (v1.1), August 30, 2010, pp. 1 – 15.

http://www.xilinx.com/support/documentation/data_sheets/ds192_V5QV_Device_Overview.pdf

[Virtex-6, 2011] Virtex-6 (2011). “Virtex-6 Family Overview: Product Specification”. DS150 (v2.3), March 24,

2011, pp. 1 – 11. http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf

http://www.xilinx.com/products/silicon-devices/fpga/virtex-6/cxt.htm

[Virtex-7, 2011] Virtex-7 (2011). “Virtex-7 Family Overview: Product Specification”. DS180 (v1.6), March 28,

2011, pp. 1 – 14. http://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf

http://www.xilinx.com/products/silicon-devices/fpga/virtex-7/index.htm

[Visioli, 2006] Visioli, A. (2006). “Practical PID Control”. London: Springer-Verlag Ltd.

[Vörös, 1997] Vörös, J. (1997). “Parameter identification of discontinuous Hammerstein systems”. Automatica, vol. 33(6), pp. 1141 – 1146.

[Voutetakis et al., 2006] Voutetakis, S. S., Seferlis, P., Papadopoulou, S. and Kyriakos, Y. (2006). “Model-based control of temperature and energy requirements in a fluidized furnace reactor”. Energy, vol. 31, pp. 2418 – 2427.

[Vouzis et al., 2006] Vouzis, P., Bleris, L. G., Arnold, M. and Kothare, M. V. (2006). “A custom-made

algorithmic-specific processor for model predictive control”. In Proc. Int. Sym. Ind. Elect., Montreal, Canada, 9 –

13 June, 2006.

W

[Wang, 2009] Wang, L. (2009). “Model Predictive Control System Design and Implementation Using

MATLAB”. London: Springer-Verlag.

[Wang and Thomas, 2006] Wang, J. and Thomas, G. (2006). “A model based predictive control scheme for

nonlinear process”. In Proc. 2006 American Control Conference, Minneapolis, Minnesota, USA., June 14-16,

2006, pp. 4842 – 4847.


[Wang and Thomas, 2009] Wang, J. and Thomas, G. (2009). “Model predictive control for nonlinear affine systems based on the simplified dual neural network”. In Proc. 2009 IEEE International Symposium on Intelligent Control, Saint Petersburg, Russia, July 8 – 10, 2009, pp. 683 – 688.

[Wang et al., 2004] Wang, X. G., Tang, Z., Tamura, H., Ishii, M. and Sun, W. D. (2004). “An improved

backpropagation algorithm to avoid the local minima problem”. Neurocomputing, vol. 56, pp. 455 – 460.

[Werbos, 1990] Werbos, P. J. (1990). “Backpropagation through time: What it does and how to do it”. In Proc.

IEEE, vol. 78, no. 10, pp. 1550 – 1560.

[Widrow and Hoff, 1960] Widrow, B. and Hoff, M. E. (1960). “Adaptive switching circuits”. In Proceedings of

the Western Electronic Show Convention, vol. 4, pp. 96 – 104.

[Williams and Zipser, 1989] Williams, R. J. and Zipser, D. (1989). “A learning algorithm for continually running

fully recurrent neural networks”. Neural Comp., vol. 1, pp. 270 – 280.

[Wu, 2008] Wu, J. (2008). “Multilayer Potts perceptrons with Levenberg-Marquardt learning”. IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2032 – 2043.

X

[XAccelSG, 2009] MATLAB for Synthesis: Style Guide, v11.4, December 2, 2009, pp. 1 – 232.

[XAccelUG, 2009] Xilinx AccelDSP Synthesis Tool: User Guide, v11.4, December 2, 2009, pp. 1 – 222.

[XAccelWare, 2007] AccelWare DSP IP Toolkits: User Guide, Release 9.2.00, August, 2007, pp. 1 – 290.

[XD2M, 2009] Data2MEM: User Guide (2009), UG658, Version 1.0, April 27, 2009, pp. 1 – 44.

[XEDKProf, 2010], EDK Profiling User Guide: A Guide to Profiling in EDK, UG448, April 19, 2010, pp. 1 – 32.

[XEPB Virtex-5, 2010] Embedded Processor Block in Virtex-5 FPGAs. Reference Guide, v1.7, October 6, 2010,

pp. 1 – 347. http://www.xilinx.com/support/documentation/user_guides/ug200.pdf


[Xilinx, 2010] Xilinx Inc. (2010). www.xilinx.com.

[XISET, 2010] Xilinx ISE In-Depth Tutorial, v12.1, April 19, 2010, pp. 1 – 152.

[XISim, 2009] ISE Simulator (ISim): In-Depth Tutorial, v1.0, April 27, 2009, pp. 1 – 62.

[XMBPRG, 2010] MicroBlaze Processor Reference Guide: Embedded Development Kit (EDK), v11.0, April 19,

2010, pp. 1 – 210. http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/mb_ref_guide.pdf

[XPPC405C- Virtex-4, 2010] PowerPC405 Processor Block Reference Guide: Embedded Development Kit. V2.2,

January 11, 2010, pp. 1 – 252. http://www.xilinx.com/support/documentation/user_guides/ug018.pdf

[XPSFRM, 2010] Platform Specification Format Reference Manual: Embedded Development Kit (EDK), v12.1,

April 19, 2010, pp. 1 – 140. http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/psf_rm.pdf

[XSysGen, 2010] Xilinx System Generator for DSP: User Guide, v12.1, April 19, 2010, pp. 1 – 414.

Y

[Yang and Paindavoine, 2003] Yang, F. and Paindavoine, M. (2003). “Implementation of an RBF neural network

on embedded systems: real-time face tracking and identity verification”. IEEE Trans. Neural Networks, vol. 14,

no. 5, pp. 1162 – 1175.

[Yu and Yu, 2007] Yu, D. W. and Yu, D. L. (2007). “Multi-rate model predictive control of a chemical reactor based on three neural models”. Biochemical Engineering Journal, vol. 37, pp. 86 – 97.

[Yu et al., 1993] Yu, X., Loh, N. K. and Miller, W. C. (1993). “A new acceleration technique for the back propagation algorithm”. In Proc. Int’l Conf. on Neu. Netw., San Francisco, 28 March – 1 April, pp. 1157 – 1161.

[Yu et al., 2006] Yu, D. L., Yu, D. W. and Gomm, J. B. (2006). “Neural model adaptation and predictive control

of a chemical process rig”. IEEE Trans. Cont. Sys. Tech., vol. 14, no. 5, pp. 828 – 840.

[Yong et al., 2006] Yong, M., Yongzhen, P. and Jeppsson, U. (2006). “Dynamic evaluation of integrated control

strategies for enhanced nitrogen removal in activated sludge processes”. Cont. Eng. Pract., vol. 14, pp. 1269 –

1278.


[Yüzgeç et al., 2008] Yüzgeç, U., Becerikli, Y. and Türker, M. (2008). “Dynamic neural-network-based model predictive control of an industrial baker’s yeast drying process”. IEEE Trans. Neural Networks, vol. 19, no. 7, pp. 1231 – 1242.

Z

[Zamarreño and Vega, 1999] Zamarreño, J. M. and Vega, P. (1999). “Neural predictive control: Application to a

highly non-linear system”. Eng. Appl. of Art. Intell., vol. 12, pp. 149 – 158.

[Zhang and Quan, 2009] Zhang, L. and Quan, S. (2009) “Model predictive control of nonlinear hybrid system

based on neural network optimization”. In Proc. 7th Asian Control Conference, Hong Kong, China, Aug. 27 – 29,

2009, pp. 1097 – 1102.

[Zhao, 1999] Zhao, H., Hao, O. J. and McAvoy, T. J. (1999). “Approaches to modeling nutrients dynamics:

ASM2, simplified model and neural nets”. Wat. Sci. Tech., vol. 39, no. 1, pp. 227 – 234.

[Zheng and Morari, 1995] Zheng, A. and Morari, M. (1995). “Stability of model predictive control with mixed constraints”. IEEE Trans. Auto. Cont., vol. 40, pp. 1818 – 1823.

[Zhu, 2001] Zhu, Y. (2001). “Multivariable System Identification for Process Control”. Oxford, UK: Elsevier

Science.

[Zhu, 2002] Zhu, Y. (2002). “Estimation of an N-L-N Hammerstein-Wiener model”. Automatica, vol. 38, pp.

1607 – 1614.


APPENDIX A: Overview of Xilinx FPGA and Embedded System Design Tools

Appendix A–1: Overview of the Xilinx Model-Based Design Flow of an Embedded

System

The integration of Simulink and MATLAB from The MathWorks, Inc. [MathWorks, 2010] and the Xilinx

FPGA design suite of tools [Xilinx, 2010] now allows the development of model-based design of a system on a

FPGA. Recently, a substantial amount of research work has been carried out on the implementation of model predictive control (MPC) algorithms on an FPGA. Such implementations involve the solution of a computationally intensive online optimization

problem at a very short time interval. Some recent results can be found in ([Bleris et al., 2006], [Garcia et al.,

2004], [Ling et al., 2008], [Meloni et al., 2010], [Shoukry et al., 2010a], [Shoukry et al., 2010b], [Vouzis et al.,

2006]). Additionally, since nonlinearity is characteristic of many industrial systems, FPGA implementations of neural network algorithms, which appear to be an efficient method for modeling the dynamics of nonlinear systems, have also been reported in [Cardenas and Troncoso, 2008] and [Maeda and Wakamura, 2009].


Fig. A.1: Embedded system design flow: IP – Intellectual Property, AD – algorithm developer, SE – system engineer, HSE – hardware/software engineer, NDHE – non-DSP hardware engineer, EDK – Embedded Development Kit, XPS – Xilinx Platform Studio, XSDK – Xilinx Software Development Kit, RTM – RTL Top-Level Module, ISE – Integrated Software Environment.


A flow diagram showing a model-based design methodology is shown in Fig. A.1. A related but reduced architecture from an ASIC point of view has been reported in [Meloni et al., 2010]. As shown in Fig. A.1, four design approaches can be identified for implementing an FPGA-based design: those of an algorithm developer (AD), a system engineer (SE), a hardware/software engineer (HSE) and a non-DSP hardware engineer (NDHE). In this work, the first three FPGA-based design and implementation approaches are presented from the AD, SE and HSE viewpoints using model-based design

methodologies. The term “model-based design” here refers to design problems formulated as algorithms and

developed using MATLAB and Simulink from The MathWorks [MathWorks, 2010].


Fig. A.2: System modeling, development, simulation and validation.

Fig. A.3: AccelDSP design routine at the Electronic System Level (ESL).


Appendix A–2: Algorithm Development Using the Xilinx AccelDSP Synthesis Tool

From an algorithm development and ESL design point of view, the problem is specified, formulated and

developed as a synthesizable MATLAB algorithmic model using MATLAB and the Xilinx AccelWare functions. Once the MATLAB-Simulink algorithm is validated as shown in Fig. A.2 and its performance satisfies the design specifications, the floating-point MATLAB algorithm is then implemented using the Xilinx AccelDSP synthesis tool. As illustrated in Fig. A.3, the Xilinx AccelDSP tool is an advanced ESL design tool which converts the synthesizable floating-point MATLAB model to fixed-point MATLAB and C++ functions, and automatically

generates a verified synthesizable RTL model directly from the fixed-point MATLAB model.

At this point three implementation options are available as illustrated in Fig. A.4, namely: the ISE Verify

Gate Level implementation for programming the FPGA via the Xilinx Integrated Software Environment (ISE)

Foundation, the creation of an IP core for export and integration into a Xilinx System Generator model, or

performing Hardware Co-Simulation (HW Co-Sim). In the present study, only the second option, i.e. the Generate System Generator option, is considered at this level of the design since the generated System Generator

model is to be integrated with other peripherals to form a complete model-based system.


Fig. A.4: From system specification and algorithm/model development to Xilinx AccelDSP design synthesis.


The AccelDSP IP Core Generators provide a direct path to hardware implementation for complex

MATLAB built-in and toolbox functions, which when used with the AccelDSP synthesis tool produce

synthesizable and pre-verified IP cores that enable algorithmic synthesis either directly onto a Xilinx FPGA or

into a larger embedded system design using the Xilinx System Generator for DSP as discussed below.

Appendix A–3: Model-Based System Design and Development Using the Xilinx

System Generator for DSP

From the point of view of model-based system design and development, the Xilinx System Generator for

digital signal processing (DSP) enables the use of Simulink-MATLAB modeling and simulation environment for

FPGA design by providing a smooth path from initial design capture via the System Generator token (shown on

the left-hand side of Fig. A.5) to Xilinx FPGA implementation and analysis.

Fig. A.5: System Generator token (left) and the six System Generator compilation options (right), with the available Hardware Co-Simulation options excluding the Virtex-5 ML507 FPGA board.

Double clicking on the System Generator token opens up the dialog shown on the right-hand side of Fig.

A.5. As shown in this figure, the System Generator provides six compilation options for exploring FPGAs,


namely: 1) HDL Netlist generation, 2) NGC Netlist generation, 3) Bitstream generation, 4) EDK Export Tool for

exporting an EDK processor IP core or for importing a pre-designed soft IP core of the MicroBlaze (MB)

processor, 5) Hardware Co-Simulation, and 6) Timing and Power Analysis. Only two options are demonstrated in

this report, namely: 1) HDL Netlist generation and 2) EDK Export Tool – for importing and exporting an EDK

processor IP core. The third option of Bitstream generation is implemented using the Xilinx ISE™ Foundation throughout the demonstration in this report. System Generator also seamlessly integrates with ModelSim for

Hardware Co-Simulation (HW Co-Sim) and the ISE for FPGA hardware-in-the-loop (HIL) simulation as

illustrated in Fig. A.6.

In this work, the HW Co-Sim block is not generated using the AccelDSP modeling tool but rather using

the System Generator for DSP from within the MATLAB/Simulink environment and the HIL co-simulation is

also performed in this same environment.

The Xilinx DSP blockset, provided with System Generator, contains over 90 DSP building blocks that are

available from the Simulink modeling environment [Xilinx, 2010]. These blocks include filters, correlators,

trigonometric functions, modulation/demodulation, error correction, arithmetic, memories, co-processing

operations, etc.; and they produce optimized and synthesized logic for programming the Xilinx FPGA. System

Generator provides three ways for importing processor(s) into a model, namely: as a black box block, as a

PicoBlaze Microcontroller block, and as an EDK Processor block mentioned earlier.

Fig. A.6: HDL Co-Simulation with ModelSim and FPGA Hardware-in-the-Loop (HIL) Simulation with ISE using

System Generator in MATLAB/Simulink modeling environment.


Appendix A–4: Xilinx Embedded Development Kit (EDK) Design Tools

The embedded development kit (EDK) is a suite of tools and collections of IP that are used to design a

complete embedded processor system for implementation in a Xilinx FPGA. The EDK is made up of the Xilinx

platform studio (XPS) that is used for designing the hardware portion of the embedded processor system and the

Xilinx software development kit (Xilinx SDK) which is an integrated development environment that is used for

the creation and verification of C/C++ embedded software applications. The EDK also includes hardware IP for

Xilinx embedded processors, drivers and libraries for the embedded software development, GNU compiler and

debugger for C/C++ software development for MicroBlaze (MB) and PowerPC (PPC) processors and utilities to

support all phases of the embedded processor development.

An embedded system design is a complex task since it consists of hardware and software portions, as shown in the flow chart of the basic EDK design process of an embedded system in Fig. A.7. The task

becomes more complicated when integrating an imported algorithmic IP core or user-defined logic into the

embedded processor system or integrating an exported embedded processor system into an existing system and/or

IP core outside the XPS or the EDK. The communications between the processor and the IP core or user-defined

logic often occur over a shared bus. The information conveyed frequently consists of different types of data such

as data for processing, data denoting the status of IP core or data affecting the mode of operation. Thus, the Xilinx

EDK automates the embedded processor design process using the Base System Builder (BSB) as well as the

import/export process to and from the System Generator or exporting to the Xilinx SDK.

As shown in Fig. A.1 and Fig. A.7, the EDK can also communicate interactively with the Xilinx ISE. The

ISE™ can be viewed as an interface between the completed embedded processor system design and the target FPGA device.

Fig. A.7: The basic embedded system design flow using the EDK via the ISE™.

It is good practice to start the embedded processor system design using the ISE because it manages

the complete XPS project (see the embedded system design flow using the EDK via the ISE in Fig. A.7). Starting

the embedded processor system design from the ISE automatically includes the EDK user constraint file (UCF)

that defines the FPGA input-output pin configurations and connections as well as several FPGA programming,

simulation and verification files. It is necessary that the interface between the processor and the IP core or user-

defined logic be specified via shared memories. Shared memories provide storage locations that can be referenced by name and allow a memory map and the associated software drivers to be generated by the EDK tools during the embedded processor design phase.
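For illustration, the following minimal C sketch shows the kind of memory-mapped access such a shared memory makes possible from the processor side. The base address and offsets are hypothetical placeholders; in practice they are assigned by the EDK tools, which also generate named driver functions for each shared memory.

    #include <stdint.h>

    /* Hypothetical base address of the generated memory map; the real value
       is assigned by the EDK tools when the peripheral is added. */
    #define SHARED_MEM_BASE 0xC0000000u

    /* Write a word to a shared register at a given byte offset. */
    static void shared_write(uint32_t offset, uint32_t value)
    {
        volatile uint32_t *reg = (volatile uint32_t *)(SHARED_MEM_BASE + offset);
        *reg = value;
    }

    /* Read a word back from a shared register at a given byte offset. */
    static uint32_t shared_read(uint32_t offset)
    {
        volatile uint32_t *reg = (volatile uint32_t *)(SHARED_MEM_BASE + offset);
        return *reg;
    }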

Fig. A.8: EDK import and export options within the System Generator.


Fig. A.9: Basic structure, memory map interface and communication between an embedded processor system and an

IP core, user-defined or built with custom logic.

Appendix A Overview of Xilinx FPGA and Embedded System Design Tools

340

The EDK processor development within the System Generator supports two modes of operation shown in

Fig. A.8, namely: the HDL netlisting mode for importing a pre-designed EDK processor from the XPS into a System Generator model as a netlist, and the EDK Pcore generation mode for exporting a System Generator model as a Pcore to the XPS. These two modes of operation are illustrated in Fig. A.9. Note that, as at the time of this report, the

Xilinx EDK and System Generator support only the import of a MB processor system. The exported Pcore can be

integrated into an existing embedded MB processor, PowerPC processor or multi-processor systems.

This work is concerned with the EDK Export Tool for generating, exporting and integrating a custom

logic or intellectual property (IP) core into a pre-designed processor system in the EDK environment. The IP core or custom logic is generated within the MATLAB/Simulink environment and imported into the pre-designed

MicroBlaze processor system within the EDK environment. In this work, the System Generator Model of the

adaptive generalized predictive control (AGPC) algorithm is exported as an IP core and integrated with a pre-

designed MicroBlaze processor system in the EDK environment.

Appendix A–5: Importing and Integrating an Embedded Development Kit (EDK)

Processor with a System Generator Model

When the HDL netlisting mode in Fig. A.8 is selected and used with the EDK Export Tool selected via

the System Generator token of Fig. A.5, the processor system will be imported into System Generator as a black

box. The assumption here is that the MB processor has been designed prior to its importation into the System Generator

model. The creation of a complete processor system is an automated process using the Xilinx Base System

Builder (BSB) Wizard.

The imported EDK processor is also augmented with bus interfaces necessary to connect the System

Generator memory map to the processor. During netlisting within the MATLAB/Simulink environment using

System Generator for DSP, the MicroBlaze™ processor and the generated memory-mapped hardware are both netlisted into hardware. The structure of the EDK processor system imported into the System Generator model is shown in Fig. A.9.

It should be noted that once the EDK processor is imported into System Generator, modifications are made to the original processor system in the EDK project so that it can work inside System Generator, regardless of the location of the EDK project directory. Thus, it may be necessary to retain a copy of the original

EDK project before the import process. On the other hand, the EDK processor exported to the Xilinx SDK shown

in Fig. A.7 does not alter the EDK project but rather the software application runs on the embedded processor

system. However, in both the System Generator and Xilinx SDK cases, the processor system must be re-imported

if any changes are made or will be made after the initial import process.


Appendix A–6: Exporting and Integrating a System Generator Model with an

Embedded Development Kit (EDK) Processor

When the EDK pcore generation mode is selected in Fig. A.8 and used with the EDK Export Tool

selected via the System Generator token of Fig. A.5, System Generator is able to create a pcore from the given

System Generator model. The structure of the model generated as pcore to the EDK is shown by the portion

enclosed by the upper bracket of Fig. A.9.

In this mode, the assumption is that the MB processor added to the model is just a place-holder. Its actual

implementation is elaborated and filled in by the EDK when the peripheral is finally added into an EDK project.

As such, the pcore that is created consists of custom logic, the generated memory map and virtual connections to

the custom logic, and the bus adaptor. The pcore also contains a collection of files describing the peripheral’s

hardware, software drivers, bus connectivity and documentation. Thus, the EDK allows peripherals to be attached

to processors created within the EDK based on the above peripheral information.

Appendix A–7: Xilinx ISETM

Foundation

Design implementation is the process of translating, mapping, placing and routing, and generating a bitstream

file for the design. With the Xilinx Fmax Technology, the Xilinx ISE™ Foundation provides the solution for

optimal design performance in the least amount of time. The ISE™ learning curve is greatly reduced because the

design implementation tools are conveniently available in ISE™ for easy access and project management, and this

can significantly reduce the project completion time. As shown in Fig. A.1, ISE™ is the bridge between the

complete design and the FPGA device. It provides a complete RTL design environment for Xilinx FPGAs that

includes: design capture using schematics, RTL language templates and RTL editors, simulations, synthesis, place

and route, bitstream generation, and programming of the FPGA, as well as support for real-time in-circuit debugging of the programmed FPGA using Xilinx ChipScope™ Pro. The typical ISE™ design implementation is

summarized in the flowchart of Fig. A.10.

The Xilinx’s Xplorer Script (XST) tool allows for the observation of maximum design performance for

any clock in a design by running the various combinations of implementation tool options. The Xilinx Timing-

Driven Place and Route is enhanced for programmable logic prototyping and implementation. In ISE™, when

specifying timing requirements for critical paths, performance is dramatically improved through tools such as

Timing Analyzer, Constraints Editor with Time Specs, FPGA Editor, and the Floorplanner™. The PlanAhead™ Design Analysis Tool is also used in the design implementation and configuration as an optional hierarchical floorplanner and design analysis tool that decreases design time and increases performance by simplifying logic synthesis through the physical design.


A Translate and Map facility is also provided by the ISE™, which performs all the steps necessary to read a

netlist file in EDIF format and creates an output file describing the logical design (a logical design is in terms of

logic elements such as AND gates, OR gates, decoders, flip-flops, and RAMs). The ISE™ Timing-Driven Map

technology helps to lower device cost. With an exclusive Timing-Driven Map option, better design utilization of the FPGA device can be achieved, particularly if the device is already more than 90% utilized. Timing-Driven Map is

a next-generation enhancement to ISE™ physical synthesis, and combines placement with logic slice packing to

improve placement quality for “unrelated logic.” Device Configuration is also supported when configuring the

programmable logic device. As the last step in design methodology, a bitstream is generated from the physical

place and route information and is transferred through cables to the target device as shown in Fig. A.1 and Fig.

A.10. The ISE™ Project Navigator guides the designer through a simple Push-Button Design Flow to implement designs automatically. For more complex designs, the designer has complete control over every aspect of the

design flow process. High Speed Design is fully supported because almost every high-performance logic system

being developed today contains a high-speed memory interface to logic. With DDR and QDR clock timing and

accuracy demands, the most robust design is achieved with no room for error in the interfaces. The ISE™ also includes high-speed input/output (I/O) synthesis optimization of paths to/from 10.3125 gigabit per second (Gb/s) I/O ports. The ISE libraries include 1-, 2- and 4-byte versions of Xilinx high-speed protocols which can readily be included in HDL code.


Fig. A.10: Typical ISE™ design implementation flowchart.

Appendix A Overview of Xilinx FPGA and Embedded System Design Tools

343

APPENDIX A – 8: Major Simulink and System Generator for DSP Hardware Block

Description used in Modeling and Synthesis of the Adaptive MPC

Simulink Counter Limited Block

The “Counter Limited” block, from the Simulink Library, is a counter that wraps back to zero

after it has output the specified upper limit. The count is always initialized to zero. The output is

normally an unsigned integer of 8, 16 or 32 bits. The smallest number of bits needed to represent the

upper limit is used.
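The behaviour of this block can be mimicked by a few lines of C; the following minimal sketch assumes a hypothetical upper limit of 9 and is given only to make the wrapping semantics concrete.

    #include <stdint.h>

    #define UPPER_LIMIT 9u   /* hypothetical upper limit */

    /* One step of the "Counter Limited" behaviour: count 0,1,...,UPPER_LIMIT,
       then wrap back to zero. */
    static uint32_t counter_limited_step(uint32_t count)
    {
        return (count >= UPPER_LIMIT) ? 0u : count + 1u;
    }

For an upper limit of 9, the smallest representation needs 4 bits, since 2^3 = 8 < 10 ≤ 16 = 2^4.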

Xilinx Gateway In Block

The Xilinx “Gateway In” block is the input into the Xilinx portion of the Simulink design. This

block converts Simulink integer, double and fixed-point data types into the System Generator fixed

point data type. The block defines a top-level input port in the HDL design generated by the Xilinx

System Generator.

When converting a double data type to a System Generator fixed-point type, the “Gateway In” block uses the

selected overflow and quantization options. For overflow, the options are to saturate to the largest

positive/smallest negative value, to wrap (i.e. to discard bits to the left of the most significant representable bit),

or to flag an overflow as a Simulink error during simulation. For quantization, the options are to round to the

nearest representable value (or to the value furthest from zero if there are two equidistant nearest representable

values), or to truncate (i.e. to discard bits to the right of the least significant bit).

It is important to note that overflow and quantization do not take place in the hardware; rather, they take place in the block itself before entering the hardware phase.
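The saturate and round-to-nearest options described above can be made concrete with a short C sketch. The word length and binary point below are arbitrary example choices, not values prescribed by the block.

    #include <math.h>
    #include <stdint.h>

    #define WIDTH 16   /* example word length           */
    #define FRAC   8   /* example binary point position */

    /* Convert a double to a signed WIDTH-bit fixed-point code with FRAC
       fractional bits, rounding to nearest (ties away from zero, as round()
       does) and saturating on overflow. */
    static int32_t to_fixed_saturate(double x)
    {
        const int32_t max = (1 << (WIDTH - 1)) - 1;   /* largest positive code  */
        const int32_t min = -(1 << (WIDTH - 1));      /* smallest negative code */
        double r = round(x * (double)(1 << FRAC));
        if (r > (double)max) return max;
        if (r < (double)min) return min;
        return (int32_t)r;
    }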

The Xilinx “Gateway In” block has several functions. It converts data from Simulink integer, double and

fixed point data type to the System Generator fixed point data type during simulation in Simulink. It defines the

top-level input in the HDL design generated by System Generator. It also defines a testbench stimuli when the

“Create Testbench” box is checked in the System Generator block (see Fig. A.5 for the “Create Testbench” box).

In this case, during HDL code generation, the inputs to the block that occur during Simulink simulation are logged as logic vectors in a data file. During HDL simulation, an entity that is inserted in the top-level testbench checks

this vector and the corresponding vectors produced by the “Gateway Out” block against the expected results.

Finally, the “Gateway In” also names the corresponding port in the top-level HDL entity.



Xilinx Gateway Out Block

The Xilinx “Gateway Out” block is the output from the Xilin portion of the Simulink design.

This block converts the System Generator fixed point data type into Simulink integer, double and

fixed point data type. This block defines the input/output ports for the top-level of the HDL design

generated by System Generator as well as a top-level output port. It also defines the testbench result vectors when

the System Generator “Create Testbench” box is checked. In this case, during HDL code generation, the outputs

from the block that occur during Simulink simulation are logged as logic vectors in a data file. For each top-level

port, an HDL component is inserted in the top-level testbench that checks this vector against expected results

during HDL simulation. Finally, the “Gateway Out” block names the corresponding output port on the top-level

HDL entity.

Xilinx From Register Block

The Xilinx “From Register” block implements the trailing half of a D flip-flop based register.

The “From Register” is a shared memory that reads data to a shared memory register. The

physical register can be shared among two designs or two portions of the same design. The block

reads data from a register that is written to by the corresponding “To Register” block. The “dout”

port presents the output of the register. The bit width specified on the mask must match the width

of the corresponding “To Register” block. The “From Register” has a delay of one sample period.

There must be exactly one “To Register” and exactly one “From Register” block for a particular register

name. In addition, the name must be distinct from all other shared memory names in the design. An initial value

specifies the initial value of the register. The register can be Locally Owned and initialized or Owned and

initialized elsewhere. A block that is locally owned is responsible for creating an instance of the register. A block

that is owned elsewhere attaches itself to a register instance that has already been created. As a result, if two

shared register blocks are used in two different models during simulation, the model containing the locally owned

block has to be started first. The “To Register” block is implemented as a synthesizable VHDL module. It does

not use a Xilinx LogiCORE.

Xilinx To Register Block

The Xilinx “To Register” block implements the leading half of D flip-flop based register,

having latency of one sample period. The “To Register” writes data to a shared memory register.

The register can be shared among multiple designs or sections of a design. The block has two input

ports. The initial output value is specified by the user in the block parameter dialog box. When the

enable port “en” is asserted, data presented at the input port “din” appears at the output port “dout” after one

sample period. When “en” is not asserted, the last value written to the register is presented at the output port.
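The joint behaviour of the “To Register”/“From Register” pair can be summarized by the following minimal C model of one sample period; it is a behavioural sketch only, not the generated VHDL.

    #include <stdint.h>

    typedef struct {
        uint32_t stored;   /* value currently held by the shared register */
    } shared_reg_t;

    /* One sample period: the output is the value held from the previous
       sample; when en is asserted, din is captured and will appear at dout
       one sample period later. */
    static uint32_t shared_reg_step(shared_reg_t *r, uint32_t din, int en)
    {
        uint32_t dout = r->stored;
        if (en) {
            r->stored = din;
        }
        return dout;
    }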




There must be exactly one “To Register” block for a particular register. In addition, the shared memory

name must be distinct from all other shared memory names in the design. An initial value specifies the initial

value of the register. The register can be Locally Owned and initialized or Owned and initialized elsewhere. A

block that is locally owned is responsible for creating an instance of the register. A block that is owned elsewhere

attaches itself to a register instance that has already been created. As a result, if two shared register blocks are

used in two different models during simulation, the model containing the locally owned block has to be started

first. The “To Register” block is implemented as a synthesizable VHDL module. It does not use a Xilinx

LogiCORE.

Xilinx Constant Block (EN_Regs)

The Xilinx “Constant” block generates a constant that can be a fixed-point value, a Boolean

value or a DSP48 instruction when set in the DSP48 Instruction Mode. This block is similar to the

Simulink constant except that it can be used here to directly drive the input ports or assert signal

ports on Xilinx blocks. In this work, the “Constant” block is only used to assert the “To Register”

for synchronization at each sample instance of the clock. The aspect of generating a DSP48 instruction is not used

in this work and is not treated here. The interested reader is referred to Xilinx System Generator User’s Guide

[Xilinx, 2010]. The “Constant” block does not use a Xilinx LogiCORE.

Xilinx Dual Port RAM Block

The Xilinx “Dual Port RAM” block implements a random access memory (RAM). Dual

ports enable simultaneous access to the memory space at different sample rates using

multiple data widths. The block has two independent sets of ports (A and B) for simultaneous

reading and writing operations. Independent address, data and write enable ports allow shared

access to a single memory space. Each port set has one output port and three input ports for

address (“addra” and “addrb”), input data (“dina” and “dinb”)and write enable (“wea” and

“web”). The result of simultaneous access to both ports is as described below:

1). If both ports read simultaneously from the same memory cell, the read operation is successful.

2). If both ports try to write simultaneously to the same memory cell, both outputs are marked as

indeterminate and the operation is unsuccessful.

3). If one port writes and the other reads from the same memory cell, the write operation succeeds and the

read operation results in an indeterminate state except in the case when the write port is in the Read

before Write mode.

The “Dual Port RAM” block also supports various Form Factors (FF). Form factor is defined as:

1

EN_Regs

Appendix A Overview of Xilinx FPGA and Embedded System Design Tools

346

B

A

WFF

W=

where B

W is the data width of port B and A

W is the data width of Port A.

( ) , 0A A BMod D W W = i

for a given FF where A

D is the depth specified for Port A.

The depth of Port B is inferred from the specified form factor as follows:

AB

DD

FF=

The data input ports on Port A and B can have different arithmetic type and binary point position for a form

factor of 1. For form factors greater than 1, the data input ports on Port A and Port B should have an unsigned

arithmetic type with binary point at 0. The output ports labeled A and B have the same types as the corresponding

input data ports.
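A small C check of the form-factor relations reconstructed above may help fix ideas; the port widths and depth below are hypothetical examples.

    #include <stdio.h>

    /* Validate FF = WB / WA, Mod(DA * WA, WB) = 0, and compute DB = DA / FF
       for a candidate dual-port RAM configuration. Returns 0 on success. */
    static int check_form_factor(unsigned WA, unsigned WB, unsigned DA)
    {
        if (WA == 0u || WB % WA != 0u) return -1;  /* FF must be an integer */
        unsigned FF = WB / WA;
        if ((DA * WA) % WB != 0u) return -1;       /* depth/width mismatch  */
        printf("FF = %u, DB = %u\n", FF, DA / FF);
        return 0;
    }

For example, with WA = 8, WB = 32 and DA = 64, this gives FF = 4 and DB = 16.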

The location in the memory block can be accessed for reading or writing by providing the valid address on

each individual address port. A valid address is an unsigned integer from 0 to d − 1, where d denotes the depth

of the RAM (number of words in the RAM) for the particular port. An attempt to read past the end of the memory

is caught as an error in simulation. The initial RAM contents can be specified through a block parameter. Each

write enable port must be a Boolean value. When the “we” port is 1, the value on the data input is written to the

location indicated by the address line.

The output during a write operation depends on the write mode. When “we” is 0, the output port has the value

at the location specified by the address line. During a write operation (i.e. “we” asserted), the data presented on

the input port is stored in memory at the location selected by the port’s address input. During a write cycle, the

user can configure the behaviour of the data output ports A/B to one of the following:

1). Read After Write

2). Read Before Write

3). No Read On Write

The “Dual Port RAM” always uses a Xilinx LogiCORE as “Dual Port Block Memory” or “Distributed

Memory”. For “Dual Port Block Memory”, the address width must be equal to 2log d ; where d denotes the

memory depth. The maximum width of data words in the block memory depends on the depth specified and the

maximum depth depends on the target device family.

On the other hand, when the “Distributed Memory” parameter is selected, LogiCORE Distributed Memory is

used. The depth must be between 16 and 65,536 for Virtex-II, Virtex-II Pro, Spartan-3 and Virtex-4. For all other

Xilinx FPGA families, the depth must be between 16 and 4096 inclusive. However, the word width must be between 1 and 1024 inclusive.



Xilinx Single Port RAM Block

The Xilinx “Single Port RAM” block implements a random memory access (RAM) with

one data input and one data output port. The block has one output port and three input ports one

for address (“addr”), input data (“data”) and write enable (“we”). Values in a “Single Port

RAM” are stored by word and all words have the same arithmetic type, width and binary

position.

A “Single Port RAM” can be implemented using either block memory or distributed

memory resources in the FPGA fabric. Each data word is associated with exactly one address that must be an

unsigned integer in the range 0 to d − 1, where d is the RAM depth (i.e. number of words in the RAM). An

attempt to read past the end of the memory is caught as an error in the simulation, although, if a block memory implementation is chosen, it may be possible to read beyond the specified address range in the hardware (with

unpredictable results). The initial RAM contents can be specified through the block parameters. The behaviour of

the output port depends on the write mode selected. When the “we” is 0, the output port has the value at the

location specified by the address line.

During a write operation (“we” asserted), the data presented to the data input is stored in memory at the

location selected by the address input. The user can configure the behaviour of the data output port upon a write

operation to one of the following modes:

1). Read After Write

2). Read Before Write

3). No Read On Write

The distributed memory LogiCORE supports only the Read Before Write mode. The Xilinx “Single Port RAM”

block also allows the distributed memory write mode option to be set to Read After Write when the specified latency is

greater than 0. The Read After Write mode for the distributed memory is achieved by using extra hardware

resources such as a multiplexer (MUX) at the output of the distributed memory to latch data during a write

operation.
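A compact behavioural C model of one clock cycle of such a single-port RAM, covering the three write modes listed above, is sketched below (depth and widths are example choices).

    #include <stdint.h>

    #define DEPTH 16u   /* example depth; address width = log2(16) = 4 bits */

    typedef enum { READ_AFTER_WRITE, READ_BEFORE_WRITE, NO_READ_ON_WRITE } wr_mode_t;

    typedef struct {
        uint32_t mem[DEPTH];
        uint32_t dout;   /* registered output port */
    } sp_ram_t;

    /* One clock cycle: pure read when we = 0; otherwise write with the
       selected output behaviour. */
    static void sp_ram_step(sp_ram_t *ram, uint32_t addr, uint32_t data,
                            int we, wr_mode_t mode)
    {
        if (!we) {
            ram->dout = ram->mem[addr];           /* normal read cycle */
            return;
        }
        switch (mode) {
        case READ_BEFORE_WRITE: ram->dout = ram->mem[addr]; break; /* old value */
        case READ_AFTER_WRITE:  ram->dout = data;           break; /* new value */
        case NO_READ_ON_WRITE:  /* dout keeps its previous value */ break;
        }
        ram->mem[addr] = data;
    }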

The “Single Port RAM” always uses a Xilinx LogiCORE as “Single Port Block Memory” or “Distributed

Memory”. For “Single Port Block Memory”, the address width must be equal to 2log d ; where d denotes the

memory depth. The maximum width of data words in the block memory depends on the depth specified and the

maximum depth depends on the target device family.

When the “Distributed Memory” parameter is selected, LogiCORE Distributed Memory is used. The depth

must be between 16 and 65,536 for Virtex-II, Virtex-II Pro, Spartan-3 and Virtex-4. For all other Xilinx FPGA

families, the depth must be between 16 and 4096 inclusive. However, the word width must be between 1 and 1024 inclusive.



Xilinx System Generator Block

Xilinx “System Generator” block is the main interface between MATLAB/Simulink and

Xilinx design and development tools. Xilinx System Generator automatically compiles designs

into low-level representations using the “System Generator” block.

Before a System Generator design can be simulated or translated into hardware, the design must

include the “System Generator” block. A design must contain at least one “System Generator”

block, but can contain several “System Generator” blocks on different levels (one per level). A “System

Generator” block that is underneath another in the hierarchy is a slave; one that is not slave is a master. The scope

of a “System Generator” consist of the level of hierarchy into which it is embedded and all subsystems below that

level and all simulation parameters must be specified in the master “System Generator” block.

Once the “System Generator” block is added, it is possible to specify how code generation and simulation can

be handled. Such code generation and simulation options are based on the compilation type, as illustrated in Fig. A.5.

Pressing the Generate button in Fig. A.5, which is assumed to be located at the top of the design, instructs

System Generator to compile the design into equivalent low-level results. The compilation type (under

Compilation) specifies the type of result produced. The possible compilations types are:

1). Netlists: Two types of netlist compilation are available, namely: HDL Netlist and NGC Netlist.

The HDL (hardware description language) Netlist result is a collection of HDL and EDIF (Electronic Design Interchange Format) files, and a few auxiliary files that simplify the downstream processing. The

collection is ready to be processed by a synthesis tool such as the Xilinx Synthesis Tool (XST), and then

fed to the Xilinx physical design tools to produce a configuration bitstream for a Xilinx FPGA.

NGC is similar to the HDL Netlist but the resulting files are NGC files instead of HDL files. The

NGC file is a netlist that contains both logical design data and constraints. This file replaces both EDIF

and NCF (Netlist Constraint File) files.

2). Bitstream, which produces an FPGA configuration bitstream that is ready to run in a hardware FPGA

platform.

3). EDK Export Tool, which generates a programming file that can be exported to and integrated with a pre-

designed embedded processor system as a custom logic (Pcore) in the Xilinx Embedded Development Kit

(EDK) or that can be used to import and integrate a pre-designed embedded processor with the System

Generator model within the Simulink environment for various hardware simulations on the FPGA device.

4). Hardware Co-Simulation: When this compilation type is selected with a variety of hardware co-simulation

the System Generator produces an FPGA configuration bitstream that is ready to run in a hardware FPGA

platform. The particular platform depends on the variety chosen. System Generator also produces a

hardware co-simulation block to which the bitstream is associated and the generated block will be able to

participate in Simulink simulations. The generated hardware is functionally equivalent to the portion of the

design from which it was derived, but is implemented by the bitstream. In a simulation, the block delivers the


same results as those produced by the portion except that the results are calculated in working hardware

running on the FPGA device.

5). The Timing and Power Analysis option gives a detailed and comprehensive report on the timing and power

consumption of the design.

Xilinx EDK Processor Block

The Xilinx “EDK Processor” block allows user logic developed in System Generator to be

attached to embedded processor systems created using the Xilinx Embedded Development Kit

(EDK) and the Xilinx Platform Studio (XPS) tool suite. As shown in Fig. A.8, the Xilinx “EDK

Processor” block supports two design flows: EDK Pcore generation and HDL netlisting. In EDK

Pcore generation flow, System Generator models are exported as a Pcore, which can later be imported into an EDK project and attached to embedded processors. In the HDL netlisting flow, the embedded processor systems created using the EDK tools are imported into System Generator models.

The “EDK” processor block automatically generates a Shared-Memory based memory map interface for the

embedded processor and the user logic using System Generator to communicate with each other. Device drivers

written in C programming language are also automatically generated by “EDK Processor” block in order for the

embedded processors to access the attached shared memories.

Fig. A.9 shows the memory map interface generated by the “EDK Processor” block. The user logic developed

in System Generator is connected to a set of shared memories. These shared memories can be added to the “EDK

Processor” block through the block dialog box of Fig. A.8. The “EDK Processor” block automatically generates

the other half of the shared memories and a memory map interface that connects the shared memories to the

embedded processor system through a pair of Processor Local Bus (PLB) or Fast Simplex Link (FSL) but the

former is used in this work since the latter is deprecated. C program device drivers are also automatically

generated so that the embedded processor systems can get access to these shared-memories, by their names or

their locations in the memory map interface.

The memory map interface is generated by the “EDK Processor” block in either the EDK Pcore generation or

HDL netlisting flow. In the EDK Pcore generation flow, only the hardware to the right of the Bus Adaptor is netlisted into the exported Pcore. In the HDL netlisting flow, all the hardware shown in Fig. A.9, including the MicroBlaze processor, the memory map interface, the shared memories and the user-defined or custom logic, is netlisted together, just like any other System Generator design.

It should be noted that only one “EDK Processor” block per design is supported as at the time of this work.

For the HDL netlisting flow, the software based simulation only supports a subset of the MicroBlaze hardware

peripherals and NOT the PowerPC hardware peripherals. Only one MicroBlaze processor per design is supported.

The use of multiple MicroBlaze embedded processors per design and the embedded PowerPC processor are not

supported in the HDL netlisting flow.

Appendix A Overview of Xilinx FPGA and Embedded System Design Tools

350

However, in the XPS environment, multi-processor system design is supported. Currently, dual MicroBlaze embedded processors or the combination of a MicroBlaze embedded processor and an embedded PowerPC440 processor is supported. Thus, a Pcore generated using the “EDK Processor” block in the EDK Pcore generation flow in System Generator can be imported and attached to these embedded multi-processor systems.

Xilinx DSP48E Block

The Xilinx “DSP48E” block is an efficient building block for DSP applications that uses

Xilinx Virtex-5 device such as in this work. The internal architecture of the Xilinx “DSP48E”

multiplier is shown in Fig. A.11. The DSP48E combines an 18-bit by 25-bit signed multiplier

with a 48-bit adder and a programmable multiplexer to select the adder’s input. Operations can

be selected dynamically. Optional input and multiplier pipeline registers can be selected as

well as registers for the arithmetic logic unit (ALU) mode (ALUMODE), CARRYIN and

OPMODE ports. The Xilinx “DSP48E” block can also target devices that do not contain the DSP48E hardware

primitive if the “Use Synthesizable Model” option is selected on the “Implementation” tab that comes up in the

dialog box when the DSP48E block is double-clicked as shown below in Fig. A.12.

Pipelining the Xilinx “DSP48E” block can be accomplished by using the “Pipelining” tab that comes up in

the dialog box when the DSP48E block is double-clicked as shown below in Fig. A.13. Parameters specific to the

Pipelining tab are:

1). Length of a/acin pipeline: specifies the length of the pipeline on input register A. A pipeline of length 0

removes the register on the input.

2). Length of b/bcin pipeline: specifies the length of the pipeline for the b input, whether it is read from b or bcin.

3). Length of acout pipeline: specifies the length of the pipeline between the a/acin input and the acout output

port. A pipeline of length 0 removes the register from the acout pipeline. This length must be less than or equal to the length of the a/acin pipeline.

4). Length of bcout pipeline: specifies the length of the pipeline between the b/bcin input and the bcout output

port. A pipeline of length 0 removes the register from the bcout pipeline. This length must be less than or equal to the length of the b/bcin pipeline.

5). Pipeline c: indicates whether the input from the c port should be registered.

6). Pipeline p: indicates whether the outputs p and pcout should be registered.

7). Pipeline multiplier: indicates whether the internal multiplier should register its output.

8). Pipeline opmode: indicates whether the opmode port should be registered.

9). Pipeline alumode: indicates whether the alumode port should be registered.

10). Pipeline carry in: indicates whether the carry in port should be registered.

11). Pipeline carry in select: indicates whether the carry in select port should be registered. (A minimal C sketch of the pipeline-length constraints in items 3) and 4) is given below.)
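The structure and field names below are hypothetical, introduced only to state those constraints compactly.

    /* Hypothetical container for the DSP48E pipeline lengths listed above. */
    typedef struct {
        int a_pipe;      /* length of a/acin pipeline */
        int b_pipe;      /* length of b/bcin pipeline */
        int acout_pipe;  /* length of acout pipeline  */
        int bcout_pipe;  /* length of bcout pipeline  */
    } dsp48e_pipes_t;

    /* The acout (bcout) pipeline must not be longer than the a/acin (b/bcin)
       pipeline. */
    static int dsp48e_pipes_valid(const dsp48e_pipes_t *p)
    {
        return p->acout_pipe <= p->a_pipe && p->bcout_pipe <= p->b_pipe;
    }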



Fig. A.11: The internal architecture of the DSP48E multiplier for embedding into Virtex-5 FPGAs.

Fig. A.12: Including the DSP48E into an FPGA with no DSP48 hardware primitive, using the “Use Synthesizable Model” option highlighted with broken red lines.

Fig. A.13: The Pipeline parameters tab for pipelining the Xilinx DSP48E embedded multiplier.


APPENDIX A–9: PowerPC™ 440 Embedded Processor

The Xilinx Virtex®-5 FXT FPGAs introduce an embedded processor block for PowerPC™ 440

(PPC440) processor designs [XEPB Virtex-5, 2010]. This block contains the PowerPC™ 440x5 32-bit embedded

processor developed by IBM ([IBM PEPC440, 2010]; [IBM TPPC440C, 1999]). The PowerPC 440x5 processor

implements the IBM Book E: Enhanced PowerPC™ Architecture. The PowerPC™ 440’s high-speed, superscalar

design and Book E Enhanced PowerPC™ architecture put it at the leading edge for high performance system-on-

a-chip (SOC) designs. The PowerPC™ 440 core combines the performance and features of standalone

microprocessors with the flexibility, low power, and modularity of embedded CPU cores.

A typical system on a chip design with the PPC440 Core uses the IBM CoreConnect™ bus structure for

system level communication [IBM PEPC440, 2010]. High bandwidth peripherals and the PPC440 core

communicate with one another over the processor local bus (PLB). Less demanding peripherals share the on-chip

peripheral bus (OPB) and communicate to the PLB through the OPB Bridge. The PLB and OPB provide common

interfaces for peripherals and enable quick turnaround, custom solutions for high volume applications. The typical

architectural example of the PPC440 Core-based system on a chip, illustrating the two-level bus structure and

modular core-based design is shown in Fig. A.14.

The PowerPC™ 440 embedded processor contains a dual-issue, superscalar 32-bit reduced instruction set

computer (RISC) central processing unit (CPU) with a pipelined processing unit, along with other functional elements required to implement embedded system-on-a-chip solutions. These other functions include memory management, cache control, timers, and debug facilities. In addition to three separate 128-bit Processor Local Bus (PLB) interfaces, the embedded processor provides interfaces for custom coprocessors and floating-point functions, along with separate 32 KB instruction and 32 KB data caches [XEPB Virtex-5, 2010].

Fig. A.14: The PowerPC™ 440 Core system on a chip with two-level bus structure and additional peripherals.

APPENDIX A–9.1: The PowerPC™ 440 Core Block Diagram

The PPC440 Core, as a member of the PowerPC™ 400 Family, is supported by the IBM PowerPC™

Embedded Tools™ program, in which over 80 third party vendors have combined with IBM to provide a

complete tools solution including Xilinx [IBM TPPC440C, 1999]. Development tools for the PPC440 include

C/C++ compilers, debuggers, bus functional models, hardware/software co-simulation environments, and real-

time operating systems. As part of the tools program, IBM maintains a complete set of development tools by

offering the High C/C++ Compiler, RISCWatch™ debugger with RISCTrace™ trace interface, VHDL and

Verilog simulation models and a PPC440 Core Superstructure development kit [IBM PEPC440, 2010]. The

PPC440 CPU operates on instructions in a dual issue, seven-stage pipeline, capable of dispatching two

instructions per clock to multiple execution units and to optional Auxiliary Processor Units (APUs). The PPC440

core block diagram is shown in Fig. A.15.

The PowerPC™ 440 embedded processor implements the full, 32-bit fixed-point subset of the IBM Book

E: Enhanced PowerPC™ architecture. The PowerPC™ 440 embedded processor fully complies with this

Fig. A.15: The PowerPC™ 440 embedded processor core block diagram.

Appendix A Overview of Xilinx FPGA and Embedded System Design Tools

354

architectural specification. The 64-bit operations of the architecture are not supported, and the embedded

processor does not implement the floating-point operations, although a floating-point unit (FPU) can be attached

(using the APUs interface). Within the embedded processor, the 64-bit operations and the floating-point

operations are trapped, and the floating-point operations can be emulated using software.

The PowerPC™ 440 embedded processor implemented in Xilinx Virtex-5 devices and discussed in

Xilinx’s documentations differs from the Book E architecture specification in the use of bit numbering for

architected registers ([IBM PEPC440, 2010]; [XEPB Virtex-5, 2010]). Specifically, Book E defines the full, 64-

bit instruction set architecture, where all registers have bit numbers from 0 to 63, with bit 63 being the least

significant. The PowerPC 440 embedded processor described here is a 32-bit subset

implementation of the architecture. Accordingly, all architected registers are 32 bits in length, with the bits

numbered from 0 to 31, where bit 31 is the least significant. Therefore, references to register bit numbers from 0

to 31 in this document correspond to bits 32 to 63 of the same register in the Book E architecture specification

([IBM PEPC440, 2010]; [XEPB Virtex-5, 2010]).
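The bit-numbering correspondence can be expressed in a few lines of C; this is an illustrative sketch of the convention, not code from any of the cited documents.

    #include <stdint.h>

    /* PowerPC 440 register bit n (0..31, bit 31 least significant)
       corresponds to Book E bit n + 32 of the same register. */
    static unsigned booke_bit_from_ppc440_bit(unsigned n)
    {
        return n + 32u;
    }

    /* Since bit 0 is the most significant bit, extracting "bit n" of a
       32-bit register means shifting right by (31 - n). */
    static uint32_t get_ppc440_bit(uint32_t reg, unsigned n)
    {
        return (reg >> (31u - n)) & 1u;
    }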

APPENDIX A–9.2: The PowerPC™ 440 Embedded Processor Organization

The PowerPC 440 embedded processor includes a seven-stage pipelined PowerPC™ 440 processor,

which consists of a three-stage, dual-issue instruction fetch and decode unit with an attached branch unit, together

with three independent, four-stage pipelines for complex integer, simple integer, and load/store operations, respectively. The PowerPC™ 440 embedded processor also includes a memory management unit (MMU), separate instruction and data cache units, JTAG, debug, and trace logic, and timer facilities. The logical organization of the PowerPC™ 440 embedded processor is shown in Fig. A.16.

Fig. A.16: The logical organization of the PowerPC™ 440 embedded processor.

The seven-stage pipelines

included in the PowerPC™ 440 embedded processor core central processing unit (CPU) are illustrated as shown

in Fig. A.17. The basic seven-stage pipelines of the PowerPC™ 440 can be outlined as follows ([IBM PEPC440,

2010]; [IBM PPC440C, 2006]; [IBM TPPC440C, 1999]; [XEPB Virtex-5, 2010]):

1). IFTH – Fetch instructions from instruction cache

2). PDCD – Pre-decode; partial instruction decode

3). DISS – Decode/Issue; final decode and issue to units

4). RACC – Register Access; read from multi-ported General Purpose Register (GPR) file

5). EXE1/AGEN – Execute stage 1; complete simple arithmetic, generate load/store address

6). EXE2/CRD – Execute stage 2; multiplex in results from units in preparation for writing into GPR file, Data

Cache access

7). WB – Writeback; write results into GPR file from integer operation or load operation


Fig. A.17: The seven-stage pipelines included in the PowerPC™ 440 embedded processor core CPU.


APPENDIX A–9.3: PowerPC™ 440 Embedded Processor Block Components, Buses

and Controllers

The main components of the embedded processor block in Virtex-5 FXT FPGAs are the processor, the

crossbar and its interfaces, the Auxiliary Processing Unit (APU) controller, and the control (clock and reset)

module [XEPB Virtex-5, 2010]. The embedded processor block and its components are shown in Fig. A.18.

The PowerPC™ embedded processor has been described above. The processor has three PLB interfaces:

one for instruction reads, one for data reads, and one for data writes. Typically, all three interfaces access a single

large external memory. Peripheral access in PowerPC 440 systems is memory mapped, and the data PLB

interfaces typically connect to various peripherals directly or via bridges. Some of these peripherals might have

Direct Memory Access (DMA) capability to improve data bandwidth and performance. Other peripherals might

rely on a separate DMA engine to provide this improved data bandwidth between the peripheral and memory.

Peripherals can be implemented in soft logic, using the lookup tables (LUTs) and other primitive logic elements

provided by the FPGA, or the peripherals can be implemented in silicon.

Fig. A.18: Power PC™ 440 Embedded Processor Block in Virtex-5 FPGAs.


Peripherals are hardened or implemented in silicon if they are likely to be used by a large number of

customers, or if hardening is necessary for performance reasons. Some peripherals are implemented in Virtex-5

FXT silicon, such as integrated endpoints for PCI Express designs and tri-mode Ethernet MACs. These peripherals have a LocalLink interface for high-bandwidth data transfers.

Superscalar Instruction Unit

The instruction unit of the PowerPC 440 embedded processor fetches, decodes, and issues two

instructions per cycle to any combination of the three execution pipelines and/or the APU interface. The

instruction unit includes a branch unit, which provides dynamic branch prediction using a branch history table

(BHT), as well as a branch target address cache (BTAC). These mechanisms greatly improve the branch

prediction accuracy and reduce the latency of taken branches, such that the target of a branch can usually be

executed immediately after the branch itself, with no penalty.

Execution Pipelines

The PowerPC 440 embedded processor contains three execution pipelines: complex integer, simple

integer, and load/store. Each pipeline consists of four stages and can access the nine-ported (six read, three write)

GPR file. There are two identical copies of the GPR file to improve performance and avoid contention for it. One

copy is dedicated to the complex integer pipeline, while the other is shared by the simple integer and the

load/store pipelines.

The complex integer pipeline handles all arithmetic, logical, branch, and system management instructions

(such as interrupt and TLB management, move to/from system registers, and so on). The pipeline also handles

multiply-and-divide operations, and 24 DSP instructions that perform a variety of multiply-accumulate

operations. The complex integer pipeline multiply unit can perform 32-bit x 32-bit multiply operations with single

cycle throughput and three-cycle latency; 16-bit x 32-bit multiply operations have only two-cycle latency. Divide

operations take 33 cycles.
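With single-cycle throughput and three-cycle latency, n back-to-back 32-bit x 32-bit multiplies complete in roughly 3 + (n − 1) cycles when the pipeline stays full; a rough C estimate of this figure, given only for orientation, is:

    /* Rough cycle estimate for n pipelined 32-bit x 32-bit multiplies on the
       complex integer pipeline (3-cycle latency, 1 result per cycle). */
    static unsigned estimate_mul_cycles(unsigned n)
    {
        return (n == 0u) ? 0u : 3u + (n - 1u);
    }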

The simple integer pipeline can handle most arithmetic and logical operations, which do not update the

Condition Register (CR). The load/store pipeline handles all load, store, and cache management instructions. All

misaligned operations are handled in hardware with no penalty on any operation contained within an aligned 16-

byte region. The load/store pipeline supports all operations to both big-endian and little-endian data regions.

Instruction and Data Cache Controllers

The PowerPC 440 embedded processor provides separate instruction and data cache controllers and 32

KB arrays, which allow concurrent access and minimize pipeline stalls. Both cache controllers have 32-byte lines,


and both are 64-way set-associative. Both caches support parity checking on the tags and data in the memory

arrays to protect against soft errors. If a parity error is detected, the CPU causes a machine check exception.
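As a quick sanity check on this cache geometry, the number of sets follows directly from the stated capacity, line size, and associativity (a worked illustration, not a figure taken from the vendor documentation):

$$\text{lines}=\frac{32\,768\ \text{bytes}}{32\ \text{bytes/line}}=1024,\qquad \text{sets}=\frac{1024\ \text{lines}}{64\ \text{ways}}=16.$$

Hence each cache is organized as 16 sets of 64 lines, so with 5 byte-offset bits only 4 address bits are needed to select a set.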

The PowerPC instruction set provides a rich set of cache management instructions for software-enforced

coherency. The PowerPC 440 implementation also provides special debug instructions that can directly read the

tag and data arrays. The instruction cache controller connects to the instruction-side PLB interface of the

processor. The data cache controller connects to the data read and data write PLB interfaces.

Instruction Cache Controller (ICC)

The instruction cache controller (ICC) delivers two instructions per cycle to the instruction unit of the

PowerPC 440 embedded processor. The ICC also handles the execution of the PowerPC instruction cache

management instructions for coherency. The ICC includes a speculative pre-fetch mechanism. These speculative

pre-fetches can be abandoned if the instruction execution branches away from the original instruction stream.

Note that the speculative pre-fetching should not be used with this version of the PowerPC 440 processor because

of known errors documented by IBM.

The ICC supports cache line locking at 16-line granularity. In addition, the notion of a “transient” portion

of the cache is supported, in which the cache can be configured such that only a limited portion is used for

instruction cache lines from memory pages designated by a storage attribute from the MMU as being transient in

nature. Such memory pages would contain code that is unlikely to be reused once the processor moves on to the

next series of instruction lines. Thus performance may be improved by preventing each series of instruction lines

from overwriting all of the “regular” code in the instruction cache.

Data Cache Controller (DCC)

The data cache controller (DCC) handles all load and store data accesses, as well as the PowerPC data

cache management instructions. All misaligned accesses are handled in hardware. Those accesses contained

within a half-line (16 bytes) are handled as a single request. Load and store accesses that cross a 16-byte boundary

are broken into two separate accesses by the hardware.

The DCC interfaces to the APU port to provide direct load/store access to the data cache for APU load

and store operations. Such APU load and store instructions can access up to 16 bytes (one quad word) in a single

cycle. The data cache can be operated in a store-in (copy-back) or write-through manner, according to the write-

through storage attribute specified for the memory page by the MMU. The DCC also supports both store-with-

allocate and store-without-allocate operations, such that store operations that miss in the data cache can either

“allocate” the line in the cache by reading it in and storing the new data into the cache, or alternatively bypass the

cache on a miss and simply store the data to memory. This characteristic can also be specified on a page-by-page

basis by a storage attribute in the MMU.


The DCC also supports cache line locking and “transient” data in the same manner as the ICC. The DCC

provides extensive load, store, and flush queues, such that up to three outstanding line fills and up to four

outstanding load misses can be pending, and the DCC can continue servicing subsequent load and store hits in an

out-of-order fashion. Store gathering can also be performed on caching inhibited, write-through, and without-

allocate store operations for up to 16 contiguous bytes. Finally, each cache line has four separate dirty bits (one

per double word), so that the amount of data flushed on cache line replacement can be minimized.

Memory Management Unit (MMU)

The PowerPC 440 MMU generates a 36-bit real address as part of the translation process from the 32-bit

effective address, which is calculated by the processor as an instruction fetch or load/store address. However, only

a 32-bit (4 GB) address space is accessible in Xilinx EDK systems. The high-order 4 bits of the 36-bit real address

must be all zeros.

The MMU provides address translation, access protection, and storage attribute control for embedded

applications. The MMU supports demand paged virtual memory and other management schemes that require

precise control of logical to physical address mapping and flexible memory protection. Working with appropriate

system-level software, the MMU provides the following functions:

1). Translation of the 32-bit effective address space into the 36-bit real address space.

2). Page level read, write, and execute access control.

3). Storage attributes for cache policy, byte order (endianness), and speculative memory access.

4). Software control of page replacement strategy.

The translation lookaside buffer (TLB) is the primary hardware resource involved in the control of

translation, protection, and storage attributes. It consists of 64 entries, each specifying the various attributes of a

given page of the address space. The TLB is fully associative; the entry for a given page can be placed anywhere

in the TLB. The TLB tag and data memory arrays are parity protected against soft errors. If a parity error is

detected, the CPU causes a machine check exception.

Software manages the establishment and replacement of TLB entries, which gives system software

significant flexibility in implementing a custom page replacement strategy. For example, to reduce TLB thrashing

or translation delays, software can reserve several TLB entries for globally accessible static mappings. The

instruction set provides several instructions for managing TLB entries. These instructions are privileged and the

processor must be in supervisor state for them to be executed.

The first step in the address translation process is to expand the effective address into a virtual address.

The 32-bit effective address is appended to an 8-bit process identity (PID) as well as a 1-bit “address space” (AS)

identifier. The PID value is provided by the PID register. The AS identifier is provided by the Machine State

Register (MSR), which contains separate bits for the instruction fetch address space (MSR[IS]) and the data


access address space (MSR[DS]). Together, the 32-bit effective address, the 8-bit PID, and the 1-bit AS form a

41-bit virtual address. This 41-bit virtual address is then translated into the 36-bit real address using the TLB.
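The bit widths combine as follows (a schematic of the concatenation described above, not an exact register layout):

$$\underbrace{\mathrm{AS}}_{1\ \text{bit}}\ \Vert\ \underbrace{\mathrm{PID}}_{8\ \text{bits}}\ \Vert\ \underbrace{\mathrm{EA}}_{32\ \text{bits}}\;\Rightarrow\; 1+8+32=41\ \text{bits of virtual address}\ \xrightarrow{\ \mathrm{TLB}\ }\ 36\ \text{bits of real address}.$$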

The MMU divides the address space (effective, virtual, or real) into pages. Eight page sizes (1 KB, 4 KB,

16 KB, 64 KB, 256 KB, 1 MB, 16 MB, 256 MB) are simultaneously supported, such that at any given time the

TLB can contain entries for any combination of page sizes. For an address translation to occur, a valid entry for

the page containing the virtual address must be in the TLB. An attempt to access an address for which no TLB

entry exists causes an Instruction (for fetches) or Data (for load/store accesses) TLB Error exception.

To improve performance, both the instruction cache and the data cache maintain separate shadow TLBs.

The instruction shadow TLB (ITLB) contains four entries, while the data shadow TLB (DTLB) contains eight.

These shadow arrays minimize TLB contention between instruction fetch and data load/store operations. The

instruction fetch and data access mechanisms only access the main 64-entry unified TLB when a miss occurs in

the respective shadow TLB. The penalty for a miss in either of the shadow TLBs is three cycles. Hardware

manages the replacement and invalidation of both the ITLB and DTLB. No system software action is required.

Each TLB entry provides separate user state and supervisor state read, write, and execute permission

controls for the memory page associated with the entry. If software attempts to access a page for which it does not

have the necessary permission, an Instruction (for fetches) or Data (for load/store accesses) Storage exception

occurs.

Each TLB entry also provides a collection of storage attributes for the associated page. These attributes

control cache policy (such as cachability and write-through as opposed to copy-back behavior), byte order (big

endian as opposed to little endian), and enabling of speculative access for the page. In addition, a set of four user-

definable storage attributes is provided. These attributes can be used to control various system-level behaviors,

such as instruction compression using IBM CodePack technology. They can also be configured to control whether

data cache lines are allocated upon a store miss, and whether accesses to a given page should use the normal or

transient portions of the instruction or data cache.

Timers

The PowerPC 440 embedded processor contains a time base and three timers: a decrementer (DEC), a

fixed interval timer (FIT), and a Watchdog Timer. The time base is a 64-bit counter that gets incremented at a

frequency either equal to the processor clock rate or as controlled by a separate asynchronous timer clock input to

the embedded processor. No interrupt is generated as a result of the time base wrapping back to zero.

The DEC is a 32-bit register that is decremented at the same rate at which the time base is incremented.

The user loads the DEC register with a value to create the desired interval. When the register is decremented to

zero, a number of actions occur: the DEC stops decrementing, a status bit is set in the Timer Status register (TSR),

and a decrementer exception is reported to the interrupt mechanism of the PowerPC 440 embedded processor.

Optionally, the DEC can be programmed to automatically reload the value contained in the Decrementer Auto-


Reload register (DECAR), after which the DEC resumes decrementing. The Timer Control register (TCR)

contains the interrupt enable for the decrementer interrupt.
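As a worked example of programming the interval (the 400 MHz time-base rate assumed here is hypothetical and depends on the system clocking):

$$\mathrm{DEC}=10\ \text{ms}\times 400\ \text{MHz}=4\,000\,000\ \text{counts},$$

so loading DEC (and DECAR, if auto-reload is desired) with 4,000,000 would produce a decrementer exception every 10 ms.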

The FIT generates periodic interrupts based on the transition of a selected bit from the time base. Users

can select one of four intervals for the FIT period by setting a control field in the TCR to select the appropriate bit

from the time base. When the selected time base bit transitions from 0 to 1, a status bit is set in the TSR, and a

Fixed Interval Timer exception is reported to the interrupt mechanism of the PowerPC 440 embedded processor.

The FIT interrupt enable is contained in the TCR.

Similar to the FIT, the watchdog timer also generates a periodic interrupt based on the transition of a

selected bit from the time base. Users can select one of four intervals for the watchdog period, again by setting a

control field in the TCR to select the appropriate bit from the time base. Upon the first transition from 0 to 1 of

the selected time base bit, a status bit is set in the TSR, and a watchdog timer exception is reported to the interrupt

mechanism of the PowerPC 440 embedded processor. The watchdog timer can also be configured to initiate a

hardware reset if a second transition of the selected time base bit occurs prior to the first watchdog exception

being serviced. This capability provides an extra measure of recoverability from potential system lock-ups.
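The period of such a time-base-bit timer follows from the selected bit position: bit $k$ of the time base (counting from the least significant bit) toggles every $2^{k}$ increments, so its 0-to-1 transitions occur every $2^{k+1}$ increments. For instance, under the same hypothetical 400 MHz time base as above, a TCR selection corresponding to bit 21 would give

$$T=\frac{2^{22}}{400\times 10^{6}\ \text{Hz}}\approx 10.5\ \text{ms}$$

between interrupts; which four bit positions are actually selectable is fixed by the TCR encoding.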

Debug Facilities

The PowerPC 440 debug facilities include debug modes for the various types of debugging used during

hardware and software development. Also included are debug events that allow developers to control the debug

process. Debug modes and debug events are controlled using debug registers in the embedded processor. The

debug registers are accessed either through software running on the processor or through the JTAG port. The next

subsection provides a brief overview of the debug modes and development tool support.

Debug Modes

The PowerPC 440 embedded processor supports four debug modes: internal, external, real-time trace, and

debug wait. Each mode supports a different type of debug tool used in embedded systems development. Internal

debug mode supports software-based ROM monitors, and external debug mode supports a hardware emulator

type of debug. Real-time trace mode uses the debug facilities to indicate events within a trace of processor

execution in real time. Debug wait mode enables the processor to continue to service real-time critical interrupts

while instruction execution is otherwise stopped for hardware debug. The debug modes are controlled by Debug

Control Register 0 (DBCR0) and the setting of bits in the Machine State Register (MSR).

Internal debug mode supports accessing architected processor resources, setting hardware and software

breakpoints, and monitoring processor status. In internal debug mode, debug events can generate debug

exceptions, which can interrupt normal program flow so that monitor software can collect processor status and

alter processor resources.


Internal debug mode relies on exception-handling software running on the processor along with an

external communications path to debug software problems. This mode is used while the processor continues

executing instructions and enables debugging of problems in application or operating system code. Access to

debugger software executing in the processor while in internal debug mode can be established through a

communications port in the system, such as a serial port or Ethernet connection.

External debug mode supports stopping, starting, and single-stepping the processor, accessing architected

processor resources, setting hardware and software breakpoints, and monitoring processor status. In external

debug mode, debug events can architecturally “freeze” the processor. While the processor is frozen, normal

instruction execution stops, and the architected processor resources can be accessed and altered using a debug tool

attached through the JTAG port. This mode is useful for debugging hardware and low-level control software

problems.

APPENDIX A–9.4: Processor Interfaces

The PowerPC™ 440 embedded processor core has many interfaces as shown above in the logical organization of

the processor in Fig. A.16. A more compact architecture of the PowerPC™ interfaces on the Virtex-5 FPGA is

shown in Fig. A.19. Some major interfaces to the PowerPC™ 440 embedded processor include:

1). Processor Local Bus (PLB)

2). Device control register (DCR) interface

Fig. A.19: The architectural implementation of the embedded PowerPC™ processor and connection to the

associated peripherals in the Virtex-5 ML507 FX70T FPGA as well as the Virtex-5 FPGA family

members.


3). Auxiliary processor unit (APU) port

4). JTAG, debug, and trace ports

5). Interrupt interface

6). Clock and power management interface

These interfaces are described briefly under the following sub-headings. Because the discussion cannot be exhaustive here, additional and more detailed information on these and other interfaces offered by the PowerPC™ 440 embedded processor core can be found in [IBM PEPC440, 2010], [IBM PPC440C, 2006], [IBM TPPC440C, 1999], and [XEPB Virtex-5, 2010].

Processor Local Bus (PLB)

There are three independent 128-bit PLB interfaces to the PowerPC 440 embedded processor. One PLB

interface supports instruction cache reads, while the other two support data cache reads and writes. All three PLB

interfaces are connected as masters to the crossbar in the embedded processor block in Virtex-5 FPGAs.

The data cache PLB interfaces make requests for 32-byte lines, as well as for 1 to 15 bytes

within a 16-byte (quadword) aligned region. A 16-byte line request is used for quadword APU load

operations to caching inhibited pages, and for quadword APU store operations to caching inhibited, write-through,

or without allocate pages. The instruction cache controller makes 32-byte line read requests.

Each of the PLB interfaces fully supports the address pipelining capabilities of the PLB, and in fact can

go beyond the pipeline depth and minimum latency that the PLB supports. Specifically, each interface supports up

to three pipelined request/acknowledge sequences prior to performing the data transfers associated with the first

request. For the data cache, if each request must be broken into three separate transactions (for example, for a

misaligned doubleword request to a 32-bit PLB slave), then the interface actually supports up to nine outstanding

request/acknowledge sequences prior to the first data transfer. Furthermore, each PLB interface tolerates a zero-

cycle latency between the request and the address and data acknowledge (that is, the request, address

acknowledge, and data acknowledge may all occur in the same cycle).

The PLB interfaces described above are not directly visible to the Virtex-5 FXT FPGA user. The Virtex-5

FXT FPGA user sees only the external interfaces on the embedded processor block, which includes the PowerPC

440 and the crossbar interfaces.

Device Control Register (DCR) Interface

The DCR interface provides a mechanism for the PowerPC 440 embedded processor to set up and check

status of other hardware facilities in the embedded processor block in the Virtex-5 FPGA and elsewhere in the

system. DCRs are accessed through the PowerPC mfdcr and mtdcr instructions.


The interface is interlocked with control signals such that it can be connected to peripheral units that can

be clocked at different frequencies from the embedded processor. The DCR interface also allows the PowerPC

440 embedded processor to communicate with peripheral devices without using the PLB interface, avoiding the

impact to the primary system bus bandwidth, and without additional segmentation of the usable address map.

Auxiliary Processor Unit (APU) Port

This interface provides the PowerPC 440 embedded processor with the flexibility for attaching a tightly

coupled, coprocessor-type macro incorporating instructions that go beyond those provided within the embedded

processor itself. The APU port provides sufficient functionality for attachment of various coprocessor functions,

such as a fully compliant PowerPC floating-point unit, or other custom function implementing algorithms

appropriate for specific system applications. The APU interface supports dual issue pipeline designs, and can be

used with macros that contain their own register files, or with simpler macros that use the CPU GPR file for

source and/or target operands. APU load and store instructions can directly access the PowerPC 440 data cache

with operands of up to a quadword (16 bytes) in length.

The APU interface provides the capability for a coprocessor to execute, concurrently with the PowerPC 440 embedded processor, instructions that are not part of the PowerPC instruction set. Accordingly, areas have

been reserved within the architected instruction space to allow for these customer-specific or application-specific

APU instruction set extensions.

JTAG Port

The JTAG port is enhanced to support the attachment of a debug tool. Through the JTAG test access port,

and using the debug facilities designed into the PowerPC 440 embedded processor, a debug tool can single-step

the processor and interrogate internal processor state to facilitate hardware and software debugging. The

enhancements, which comply with the IEEE 1149.1 specification for vendor-specific extensions, are therefore

compatible with standard JTAG hardware for Boundary-Scan system testing.

Crossbar and its Interfaces

The crossbar and its interfaces allow the processor with its three PLB interfaces, soft peripherals with

PLB interfaces, and peripherals with LocalLink interfaces to share access to a high-performance memory

controller. As shown in Fig. A.18, the crossbar has:

1). Five PLB slave interfaces: three for the PLB interfaces from the processor and two for soft peripherals with PLB interfaces, allowing these peripherals to access the high-speed memory controller interface.

2). Four full-duplex LocalLink channels with built-in DMA control and access to the memory controller interface.


3). One high-speed memory controller interface that hardens several parts of a typical memory controller but

leaves the physical interface to the memory to be implemented as soft logic for reasons of flexibility.

4). One PLB master interface to allow the processor to connect to other peripherals in the FPGA logic.

Auxiliary Processor Unit Controller

The embedded processor block in Virtex-5 FPGAs includes a hardened Auxiliary Processor Unit (APU)

controller driven by the APU interface on the processor. The APU interface on the processor allows users to build

an auxiliary processor to execute instructions that are not part of the PowerPC 440 instruction set. However, this

interface requires the auxiliary processor to be clocked at the CPU speed and also be in complete lock-step with

the processor pipeline. The processor can run much faster than a soft core implemented on the FPGA logic, so an

auxiliary processor implemented in soft logic would force the processor to run at a lower speed, reducing the

performance gain. The APU controller directs and synchronizes the CPU pipeline, allowing the soft auxiliary

processor and the CPU to run at different clock rates. Additionally, the APU controller can decode the instructions

on behalf of the soft auxiliary processor unit, resulting in faster overall instruction execution for the instructions

using the auxiliary processor.

Direct Memory Access Controller

The processor block includes a hardened Direct Memory Access (DMA) controller that allows peripherals

to directly transfer data to and from a memory controller connected to the processor block via the memory

controller interface or the PLB interface. The DMA controller can be monitored and controlled through its Device

Control Registers (DCRs). The DMA controller has LocalLink data interfaces to peripherals.

APPENDIX A–10: MicroBlaze™ Embedded Processor

This sub-section gives a brief overview of the basic features and architecture of version 7.20 of the Xilinx MicroBlaze™ embedded processor, the version currently supported for MicroBlaze™ embedded processor development within the Embedded Development Kit (EDK) 11.4 for the Xilinx Virtex-5 FX70T FPGA used in this work.

Like the IBM PowerPC™, the MicroBlaze™ soft core processor is a 32-bit reduced instruction set computer (RISC). The processor includes the Big-Endian bit-reversed format, 32-bit general purpose registers, virtual-memory management, cache software support, and Fast Simplex Link (FSL) interfaces. The MicroBlaze

core is organized as a Harvard architecture with separate bus interface units for data and instruction accesses. The

following three memory interfaces are supported: Local Memory Bus (LMB), the IBM Processor Local Bus

(PLB), and Xilinx® CacheLink (XCL). The LMB provides single-cycle access to on-chip dual-port block RAM.


The PLB interfaces provide a connection to both on-chip and off-chip peripherals and memory. The CacheLink

interface is intended for use with specialized external memory controllers. MicroBlaze also supports up to 16 Fast

Simplex Link (FSL) ports, each with one master and one slave FSL interface. The architecture of the Xilinx

MicroBlaze™ processor core, the core interfaces, buses, memory and peripherals are shown in Fig. A.20

[XMBPRG, 2010].

The acronyms of the core interfaces shown in Fig. A.20 are defined as follows [XMBPRG, 2010]:

DPLB: Data interface, Processor Local Bus,

DLMB: Data interface, Local Memory Bus (BRAM only),

IPLB: Instruction interface, Processor Local Bus,

ILMB: Instruction interface, Local Memory Bus (BRAM only),

MFSL 0..15: FSL master interfaces,

DWFSL 0..15: FSL master direct connection interfaces,

SFSL 0..15: FSL slave interfaces,

DRFSL 0..15: FSL slave direct connection interfaces,

DXCL: Data side Xilinx CacheLink interface (FSL master/slave pair),

IXCL: Instruction side Xilinx CacheLink interface (FSL master/slave pair),

Core: Miscellaneous signals for: clock, reset, debug, and trace.

Fig. A.20: The architecture of the Xilinx MicroBlaze™ processor core, the core interfaces, buses, memory and peripherals.


The Xilinx MicroBlaze™ soft core processor is highly configurable and allows the selection of a specific or fixed set of features required by the design for embedded processor system development. The fixed features of the processor include: 1) thirty-two 32-bit general purpose registers, 2) a 32-bit instruction word with three operands and two addressing modes, 3) a 32-bit address bus, and 4) a single-issue pipeline. In addition to these fixed features, the MicroBlaze™ processor is parameterized to allow selective enabling of additional functionality.

The MicroBlaze™ processor can be configured with the following bus interfaces: 1) A 32-bit version of

the PLB V4.6 interface, 2) LMB provides simple synchronous protocol for efficient block RAM transfers, 3) FSL

provides a fast non-arbitrated streaming communication mechanism, 4) XCL provides a fast slave-side arbitrated

streaming interface between caches and external memory controllers, 5) Debug interface for use with the

Microprocessor Debug Module (MDM) core, and 6) Trace interface for performance analysis.

The processor local bus (PLB) interfaces are implemented as byte-enable capable 32-bit masters. The

MicroBlaze™ on-chip peripheral bus (OPB) interfaces are implemented as byte-enable capable masters. The local

memory bus (LMB) is a synchronous bus used primarily to access on-chip block RAM. It uses a minimum number of control signals and a simple protocol to ensure that local block RAM is accessed in a single clock cycle. All LMB signals are active high.

As a note on the embedded MicroBlaze™ processor system clock and reset signals, the following should be taken into consideration for improved performance. Although the overall embedded system reset, designated here as "Reset", and the MicroBlaze™ reset, designated here as "MB_Reset", are functionally equivalent, the Reset signal is primarily intended for use with the on-chip peripheral bus (OPB) interface, whereas MB_Reset is intended for the processor local bus (PLB) interfaces. Furthermore, the MicroBlaze™ processor is a synchronous design clocked with the overall system clock, designated here as the "Clk" signal, except for the hardware debug logic, which is clocked with the debug clock signal designated here as "Debug_Clk". If the hardware debug logic is not used, there is no minimum frequency limit for Clk. However, if the hardware debug logic is used, signals are transferred between the two clock regions; in this case, Clk must have a higher frequency than the debug clock Debug_Clk [XMBPRG, 2010].


APPENDIX A–11: XPS Synthesis and ISE Device Utilization Summaries for the

PowerPC™440 and MicroBlaze™ Embedded Processors Design

APPENDIX A–11.1: XPS Synthesis Summary for PowerPC™440 Processor Design


APPENDIX A–11.2: ISE Device Utilization Summary for PowerPC™440 Processor



APPENDIX A–11.3: XPS Synthesis Summary for MicroBlaze™ Processor Design


APPENDIX A–11.4: ISE Device Utilization Summary for MicroBlaze™ Processor



APPENDIX B: The Mathematical Model of the Fluidized Bed Furnace Reactor

Appendix B–1: The Fluidized Bed Furnace Reactor (FBFR)

The complete mathematical model for the energy balance of the FBFR can be expressed by the following

nonlinear partial differential equations with respect to Fig. 6.2 from [Voutetakis et al., 2006] as:

Energy balance in the reactor interior (Tri):

$$\rho_{ri}\,Cp_{ri}\,\frac{\partial T_{ri}}{\partial t}-\frac{1}{r}\frac{\partial}{\partial r}\!\left(r\,k_{ri}\,\frac{\partial T_{ri}}{\partial r}\right)=0,\qquad \left.\frac{\partial T_{ri}}{\partial r}\right|_{r=0}=0,\qquad \left.-k_{ri}\,\frac{\partial T_{ri}}{\partial r}\right|_{r=R_{ri}}=h_{dwf}\bigl(T_{ri}(R_{ri})-T_{irw}\bigr) \qquad (B.1)$$

Energy balance in the interior reactor wall (Tirw):

$$\rho_{irw}\,Cp_{irw}\bigl(R_{irw}^{2}-R_{ri}^{2}\bigr)\frac{\partial T_{irw}}{\partial t}=2R_{ri}\,h_{ri}\bigl(T_{ri}(R_{ri})-T_{irw}\bigr)-2R_{irw}\,h_{irw}\bigl(T_{irw}-T_{brwh}(R_{irw})\bigr) \qquad (B.2)$$

Energy balance in the gap between reactor wall and electric heater (Tbrwh):

$$\rho_{brwh}\,Cp_{brwh}\,\frac{\partial T_{brwh}}{\partial t}-\frac{1}{r}\frac{\partial}{\partial r}\!\left(r\,k_{brwh}\,\frac{\partial T_{brwh}}{\partial r}\right)=0,$$
$$\left.-k_{brwh}\,\frac{\partial T_{brwh}}{\partial r}\right|_{r=R_{irw}}=h_{irw}\bigl(T_{irw}-T_{brwh}(R_{irw})\bigr),\qquad \left.-k_{brwh}\,\frac{\partial T_{brwh}}{\partial r}\right|_{r=R_{brwh}}=h_{brwh}\bigl(T_{brwh}(R_{brwh})-T_{h}\bigr) \qquad (B.3)$$

Energy balance in the heater (Th):

$$\rho_{h}\,Cp_{h}\bigl(R_{h}^{2}-R_{brwh}^{2}\bigr)\frac{\partial T_{h}}{\partial t}=2R_{brwh}\,h_{brwh}\bigl(T_{brwh}(R_{brwh})-T_{h}\bigr)-2R_{h}\,h_{h}\bigl(T_{h}-T_{ins}(R_{h})\bigr)+\frac{Q}{A_{h,mean}} \qquad (B.4)$$

Energy balance in the insulator (Tins):

$$\rho_{ins}\,Cp_{ins}\,\frac{\partial T_{ins}}{\partial t}-\frac{1}{r}\frac{\partial}{\partial r}\!\left(r\,k_{ins}\,\frac{\partial T_{ins}}{\partial r}\right)=0,$$
$$\left.-k_{ins}\,\frac{\partial T_{ins}}{\partial r}\right|_{r=R_{h}}=h_{h}\bigl(T_{h}-T_{ins}(R_{h})\bigr),\qquad \left.-k_{ins}\,\frac{\partial T_{ins}}{\partial r}\right|_{r=R_{ins}}=h_{ins}\bigl(T_{ins}(R_{ins})-T_{ormw}\bigr) \qquad (B.5)$$

Energy balance in the outer reactor metal wall (Tormw):

$$\rho_{ormw}\,Cp_{ormw}\bigl(R_{ormw}^{2}-R_{ins}^{2}\bigr)\frac{\partial T_{ormw}}{\partial t}=2R_{ins}\,h_{ins}\bigl(T_{ins}(R_{ins})-T_{ormw}\bigr)-2R_{ormw}\,h_{ormw}\bigl(T_{ormw}-T_{out}\bigr) \qquad (B.6)$$


where ρ denotes density in kg·m⁻³, Cp is the heat capacity in J·(kg·K)⁻¹, k is the heat conductivity in J·(K·m·s)⁻¹, h is the heat transfer coefficient in J·(K·m²·s)⁻¹, T is temperature in K, R is radius in m, Q is the heating power in kW, and A is the heating area in m². The subscripts refer to the different cylindrical layers that constitute the reactor sections of the fluidized bed furnace reactor: ri denotes the interior (inside) of the reactor, irw the interior reactor wall, brwh the gap between the reactor wall and heater, h the heater, ins the insulator, and ormw the outer reactor metal wall.
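The MATLAB program in Appendix B–2 below integrates these balances after discretizing the radial conduction term of each distributed layer by central differences. As a minimal sketch of the interior-node update that the program applies repeatedly (the names radial_conduction, alpha, dr, T and dTdt are illustrative and not part of the program itself):

function dTdt = radial_conduction(T, dr, rho, Cp, k)
% Central-difference form of the conduction term (1/r)*d/dr( r*k*dT/dr )
% at the interior nodes of one cylindrical layer; node i sits at radius i*dr.
alpha = k/(rho*Cp);               % thermal diffusivity (hypothetical values)
n = numel(T);
dTdt = zeros(n,1);
for i = 2:n-1
    dTdt(i) = alpha*((1/(i*dr))*(T(i+1)-T(i))/dr + (T(i+1)-2*T(i)+T(i-1))/dr^2);
end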


Appendix B–2: MATLAB Program for the Fluidized Bed Furnace Reactor (FBFR)

Model

function HEAT = fluidised_reactor(t,T)
global Q dwf Cpcross

% Parameters specification from the physical system
Cmon = 900;
kmon = 0.00019*((T(17)+T(26))/2);
pmon = 761.4;
hr = 0.0019*T(17);
pf = 8055;    Cpf = 480;    hf = 15000;
pw = 9055;    Cpw = 480;    hw = 15;
Rtout = 0.108;   Rtin = 0.088;   Rmin = 0.13;   Rr = 0.111;
Rw = 0.268;      Rmout = 0.266;  L = 0.45;
Tout = 293;
A = 3.14*(Rmin+Rr)*L;
pi = 3.142;
% dwf = 500;
% pr = 840;
% Cpr = 950;
% Cpcross = pr*Cpr;

%**********************************************************************
% 1. -----Air inside reactor [N = 1 to 10]------
x1 = Rtin/10;
hair1 = ((0.000024*(T(2)+T(10))/2 + 0.00618)*1.73);
Ca = 0.0000006*((T(2)+T(10))/2) + 0.00082;
pair = 0.00096*(T(2)+T(10))/2 + 1.449;
U1 = 100;
Ua = 1;
Aa = pi*(x1^2 + (Rtin^2 - (9*x1)^2))/2;
HEAT(1,1) = T(2)-T(1);
HEAT(2,1) = (hair1/pair*Ca)*(((1/x1)*((T(3)-T(2))/x1))+...
    ((T(3)-2*T(2)+dwf*T(1))/(x1^2)))+Ua*Aa*(Tout-T(2));
for i = 3:1:10
    HEAT(i,1) = (hair1/pair*Ca)*(((1/(i*x1))*((T(i+1)-T(i))/(i*x1)))+...
        ((T(i+1)-2*T(i)+T(i-1))/(x1^2)))+Ua*Aa*(Tout-T(i));
end

%**********************************************************************
% 2. ------Metal of reactor (N = 1)--------
Ab = pi*(Rtout^2-Rtin^2);


Ub = 1;
HEAT(11,1) = 1/((pf*Cpf*L)*(Rtout^2-Rtin^2))*(2*Rtin*hair1*(T(10)-T(11))+...
    2*Rtout*hf*(T(12)-T(11)))+Ub*Ab*(Tout-T(11));

%**********************************************************************
% 3. --------Air between reactor and R (N = 1 to 3)--------
hair2 = ((0.000024*(T(12)+T(15))/2+0.00618)*1.73);
x2 = (Rr-Rtout)/4;
Ca2 = 0.0000006*((T(12)+T(15))/2)+0.00082;
pair2 = 0.00096*(T(12)+T(15))/2+1.449;
Ac = pi*((((Rtout+x2)^2-Rtout^2)+(Rr^2-(Rtout+3*x2)^2))/2);
Uc = 0.1;
HEAT(12,1) = (hair2/pair2*Ca2)*(((1/(Rtout+x2))*((T(13)-T(12))/x2))+...
    ((T(13)-2*T(12)+T(11))/(x2^2)))+Ua*Ac*(Tout-T(12));
HEAT(13,1) = (hair2/pair2*Ca2)*(((1/(Rtout+x2))*((T(14)-T(13))/x2))+...
    ((T(14)-2*T(13)+T(12))/(x2^2)))+Ua*Ac*(Tout-T(13));
HEAT(14,1) = (hair2/pair2*Ca2)*(((1/(Rtout+x2))*((T(15)-T(14))/x2))+...
    ((T(15)-2*T(14)+T(13))/(x2^2)))+Ua*Ac*(Tout-T(14));
HEAT(15,1) = (hair2/pair2*Ca2)*(((1/(Rtout+x2))*((T(16)-T(15))/x2))+...
    ((T(16)-2*T(15)+T(14))/(x2^2)))+Ua*Ac*(Tout-T(15));

%**********************************************************************
% 4. -------------Resistance (N = 1)---------------
Ad = pi*(Rmin^2-Rr^2);
Ud = 0.08;
HEAT(16,1) = 1/((Cpcross*L)*(Rmin^2-Rr^2))*((-Q/A)+2*Rr*hair1*(T(17)-T(16))+...
    2*Rmin*hr*(T(15)-T(16)))+Ud*Ad*(Tout-T(16));

%**********************************************************************
% 5. ------Insulator (N = 1 to 10)------
x3 = (Rmout-Rmin)/11;
HEAT(17,1) = (kmon/pmon*Cmon)*(((1/(Rmin+x3))*((T(18)-T(17))/x3))+...
    ((T(18)-2*T(17)+T(16))/(x3^2)));
HEAT(18,1) = (kmon/pmon*Cmon)*(((1/(Rmin+x3))*((T(19)-T(18))/x3))+...
    ((T(19)-2*T(18)+T(17))/(x3^2)));
for i = 19:1:26
    for j = 2:1:10
        HEAT(i,1) = (kmon/pmon*Cmon)*(((1/(Rmin+(j*x3)))*...
            ((T(i+1)-T(i))/x3))+((T(i+1)-2*T(i)+T(i-1))/(x3^2)));
    end
end

%**********************************************************************
% 6. -------Reactor's wall (N = 1)------------------
Ae = pi*(Rw^2-Rmout^2);
Ue = 20;
HEAT(27,1) = 1/((pw*Cpw*L)*(Rw^2-Rmout^2))*(2*Rmout*hw*(T(26)-T(27))+...
    2*Rw*Ue*(Tout-T(27)));
U4 = 1;
HEAT(28,1) = 1/((pw*Cpw*L)*(Rw^2-Rmout^2))*(2*Rmout*hw*(T(27)-T(28))+...
    2*Rw*Ue*(Tout-T(28)));


Appendix B–3: MATLAB Script for Simulation of the FBFR Model

global Q dwf
T = zeros(1,28);
initial = zeros(1,28);

for i = 1:1:28
    m = 293;
    T(11,i) = m;
end

for j = 0:433:210005
    tspan = j:1:j+10;
    if j > 143300
        Q = 0;
        dwf = 0;
    elseif j > 33180
        Q = 0;
        dwf = 0;
    elseif j < 33180
        Q = -880;
        dwf = -70;
    end
    for i = 1:1:28
        initial(i) = T(11,i);
    end

    Ma = eye(28,28);
    Ma(10,10) = 0;
    Ma(12,12) = 0;
    Ma(15,15) = 0;
    Ma(17,17) = 0;
    Ma(26,26) = 0;
    Ma(28,28) = 0;

    % 'Mass' is the odeset property for a (constant) mass matrix
    options = odeset('Mass',Ma);
    [t, T] = ode15s('fluidised_reactor', tspan, initial, options);

    figure(1)
    subplot(2,1,1)
    plot([j j+433], [T(1,2) T(11,2)], 'c',...
        [j j+433], [T(1,3) T(11,3)], 'c',...
        [j j+433], [T(1,4) T(11,4)], 'c',...
        [j j+433], [T(1,5) T(11,5)], 'c',...
        [j j+433], [T(1,6) T(11,6)], 'c',...
        [j j+433], [T(1,7) T(11,7)], 'c',...
        [j j+433], [T(1,8) T(11,8)], 'c',...
        [j j+433], [T(1,9) T(11,9)], 'c',...
        [j j+433], [T(1,10) T(11,10)], 'c',...
        [j j+433], [T(1,11) T(11,11)], 'k',...
        [j j+433], [T(1,12) T(11,12)], 'm',...
        [j j+433], [T(1,13) T(11,13)], 'm',...
        [j j+433], [T(1,14) T(11,14)], 'm',...
        [j j+433], [T(1,15) T(11,15)], 'm',...
        [j j+433], [T(1,16) T(11,16)], 'r');
    hold on


    plot([j j+433], [T(1,17) T(11,17)], 'b',...
        [j j+433], [T(1,18) T(11,18)], 'b',...
        [j j+433], [T(1,19) T(11,19)], 'b',...
        [j j+433], [T(1,20) T(11,20)], 'b',...
        [j j+433], [T(1,21) T(11,21)], 'b',...
        [j j+433], [T(1,22) T(11,22)], 'b',...
        [j j+433], [T(1,23) T(11,23)], 'b',...
        [j j+433], [T(1,24) T(11,24)], 'b',...
        [j j+433], [T(1,25) T(11,25)], 'b',...
        [j j+433], [T(1,26) T(11,26)], 'b',...
        [j j+433], [T(1,27) T(11,27)], 'g',...
        [j j+433], [T(1,28) T(11,28)], 'g');
    xlabel('(a)')
    ylabel('Temperature distributions (^oK)')
    axis([0 2.15e5 250 1150]);

    subplot(2,1,2)
    plot([j j+433], [T(1,1) T(11,1)], 'c.',...
        [j j+433], [T(1,10) T(11,10)], 'c',...
        [j j+433], [T(1,11) T(11,11)], 'k',...
        [j j+433], [T(1,15) T(11,15)], 'm',...
        [j j+433], [T(1,16) T(11,16)], 'r');
    hold on
    plot([j j+433], [T(1,26) T(11,26)], 'b.',...
        [j j+433], [T(1,17) T(11,17)], 'b',...
        [j j+433], [T(1,27) T(11,27)], 'g');
    xlabel({'(b)'; 'time (seconds, s)'})
    ylabel('Temperature variations (^oK)')
    axis([0 2.15e5 250 1150]);
    legend('Tir (Min)','Tri (Max)','Tirw','Tbrwh','Th','Tins (Min)',...
        'Tins (Max)','Tormw','Location','NorthEast')
end


APPENDIX C: The Activated Sludge Wastewater Treatment Plant (AS-WWTP)

Process Description and Model

Appendix C–1: Introduction

Wastewater treatment plants are large nonlinear systems subject to large perturbations in influent flow rate and pollutant load, together with uncertainties concerning the composition of the incoming wastewater. Wastewater normally contains thousands of different organic compounds, and a measurement of each individual organic species would be impossible; instead, different collective analyses are used, each of which covers a greater or smaller part of the organics. Activated sludge wastewater treatment processes are difficult to control because of their complexity; their nonlinear behaviour; the large uncertainty in the uncontrolled inputs and in the model parameters and structures; and the multiple time scales of the dynamics and the multivariable input-output structure. The activated sludge process aims to achieve, at minimum cost, a sufficiently low concentration of biodegradable matter in the effluent together with minimal sludge production, and this is achieved through efficient control of the process. The first control opportunity in an AS-WWTP is the regulation of the influent flow rate, which implies that control issues in wastewater treatment facilities pertain primarily to aeration control for energy usage and for satisfying process demands.

Since the dissolved oxygen (DO) concentration is considered the most important control parameter in the activated sludge process (ASP), the control of the DO level in the AS-WWTP reactors plays an important role in the operation of the plant. DO concentration control of the ASP has been recognized as a rewarding and meaningful control objective, from both the economical and the biological point of view ([Spellman, 2003], [Steffens and Lant, 1999]). Some successful conventional and advanced DO control schemes, such as PID control by manipulating the air flow rate, the waste activated sludge (WAS) or the return activated sludge (RAS) in full-scale ASPs, as well as the application of an adaptive robust generic model controller (ARGMC) based on fuzzy supervisory control, a direct adaptive model reference controller (DAMRC), and model predictive control (MPC), have been reported in [Cakici and Bayramolu, 1995], [Chotkowski et al., 2005], [Galluzzo et al., 2001], [Holenda et al., 2008], [Piotrowski et al., 2008] and [Shen et al., 2008].

Most WWTPs tend to exhibit large variations in nitrogen nutrient concentrations, and generally require the addition or removal of nitrogen and/or phosphorus in order to promote a healthy activated sludge environment. To remove nitrogen from wastewater in an ASP, two biological processes may be used: nitrification and denitrification. Nitrification is a microbiological process that converts ammonium into nitrite and eventually nitrite into nitrate, provided that nitrifying bacteria (a limited group of autotrophic micro-organisms) are present. Denitrification, on the other hand, is a key process for removing nitrate nitrogen: it refers to a process in which micro-organisms convert nitrate into atmospheric nitrogen, and the process is anaerobic in the sense that nitrate serves as the oxidizing agent. When nitrate is the oxidizing agent and oxygen is liberated, the process becomes


anoxic, and it should be noted that this process is inhibited by oxygen. Nitrogen removal is realized by switching the aeration system on and off to create continuously alternating aerobic and anoxic conditions, respectively. During switched-on periods, ammonium is converted into nitrate, which is subsequently used to remove organic carbon during switched-off periods.

In order to achieve satisfactory nitrification in the aerobic environment, several issues must be considered, such as an adequately large aeration volume together with a sufficiently high concentration of dissolved oxygen (DO) to cover the oxygen demand of the microorganisms and to avoid the formation of nitrous oxide (N2O). Generally, the aeration causes high energy costs and constitutes the major expenditure of an ASP plant. Also, a very high DO level may unfavorably influence the denitrification rate in the anoxic compartments. According to [Spellman, 2003] and [Steffens and Lant, 1999], for process control reasons as well as on economical grounds, it is therefore important to control the DO level and to limit the aeration as much as possible. The addition of an excessive, uncontrolled amount of nitrogen is obviously not desirable, since such a practice is not only expensive, because of the waste of excess nutrient and the cost resulting from the extra oxygen required for nitrification, but also environmentally detrimental to the receiving waters. On the other hand, an insufficient amount of nitrogen will result in incomplete oxidation of the waste and deteriorating sludge settling properties ([Ning et al., 2000], [Spellman, 2003]). The majority of plants accomplish only organic matter removal, but increasing regulatory pressure to remove nutrients such as nitrogen and phosphorus requires more complicated control processes, which presents optimization and several control issues [Hamilton et al., 2006]. Conventional control schemes for nitrogen nutrient removal using a supervisory PI controller [Ekman et al., 1999], a simple feedforward controller [Lukasse and Keesman, 1999] and absolute PID controllers [Yong et al., 2006], as well as advanced control strategies such as generic distributed parameter model-based control ([Lee et al., 1999 and 2006(a)]) and MPC combined with PI and feedforward controllers ([Stare et al., 2007], [Steffens and Lant, 1999]), have been reported.

The reference model of biochemical reactions in the bioreactors is the activated sludge model no.1

(ASM1) ([Henze et al., 1996], [Coop, 2000], [COST, 2000], [COST, 2008]). The success of this model has

prompted widespread interest in biochemical modeling of wastewater in both academia and industry. The overall

WWTP model consists of two main parts: the hydraulic model, which represents reactor behaviour, flow rates and

recirculation; and the activated sludge model, which portrays

microbial growth, death and nutrient consumption. These models are necessarily approximations to the vast

number of biological processes occurring in each bioreactor. Selection of the proper model allows adequate

description of those processes most relevant to a particular WWTP. The development of accurate models is a

prerequisite for applying model predictive control techniques for the whole process control and dynamic

optimization.

It is well known that the application of advanced control strategies to AS-WWTPs is in its infancy ([Coop, 2000], [COST, 2000], [COST, 2008]). The inclusion of a neural network for nonlinear model identification


and of a neural network-based model predictive controller for the absolute control of the AS-WWTP has been proposed in this work. Several papers on the application of neural network techniques to activated sludge process (ASP) model identification have been published ([Cote et al., 1995], [Zhao, 1999]). The application of neural network controllers for controlling the dissolved oxygen concentration in a sequencing batch reactor [Azwar et al., 2006] and in the activated sludge aeration process [Rodrigo et al., 1999] has also been reported: the former authors used internal model control (IMC), inverse model control and a combination of inverse model control in parallel with a PI controller, while the latter employed a PID controller tuned using fuzzy techniques and internal model control (IMC). The disadvantage of these strategies lies in the two IMC-type controllers, because such controllers have been reported to perform poorly in the presence of disturbances, do not work for systems with an unstable inverse, which unfortunately often occurs when using a high sampling frequency, and lack tuning options, among other shortcomings [Nørgaard et al., 2000].

Excess nitrogen and phosphorus in surface waters, and nitrogen in groundwater, cause eutrophication (excess algae growth) in surface waters and health-related problems in humans and livestock as a result of a high intake of nitrogen in its nitrate form. Also, the effluent quality of industrial wastewater treatment plants is now subject to tighter regulation as a result of these nutrients, nitrogen and phosphorus, in both public and receiving waters [Spellman, 2003].

[Fig. C.1 appears here: a schematic of the AS-WWTP process showing the influent tank and influent pump, the deoxic mixing tank, the anaerobic tank (Unit 1), the anoxic tank (Unit 2), the three aerobic tanks (Units 3 to 5) with aeration control points AF1 to AF3 and mechanical aerators, the ten-layer settler (feed at layer m = 6), the effluent, WAS and RAS tanks with their pumps, and the internal and external nitrate recycle streams with flows and concentrations (Qf, Zf), (Qe, Ze), (Qu, Zu), (Qr, Zr), (Qw, Zw), (Qa1, Za1) and (Qa2, Za2).]

Fig. C.1: The schematic of the AS-WWTP process.


Appendix C–2: AS-WWTP Process Description

The typical schematic of an AS-WWTP is shown in Fig. C.1. It consists of anaerobic, anoxic and aerobic

zones and a secondary settler in a back-to-back scheme with multiple recycle streams. To ensure that plug flow

conditions prevail in the bioreactors, the basins are usually partitioned such that back-mixing is minimized. The

biological processes within the different zones of the reactors are briefly presented below.

In the anaerobic zone, fermentable organics from the influent wastewater are mixed with the returned activated sludge (RAS) and converted to volatile fatty acids (VFA) by heterotrophic organisms. The latter are consumed by phosphorus-accumulating organisms (PAO) and stored internally as poly-β hydroxyl alkanoates (PHA). Concurrently, poly-phosphate, and hence energy for volatile fatty acid (VFA) accumulation, is internally released. Denitrification in this zone results in a net reduction of alkalinity, and hence there is a decrease in pH due to acid production. If the amount of volatile fatty acids (VFA) is insufficient, additional acids from an external source may be added to maintain a maximum poly-β hydroxyl alkanoate (PHA) uptake by the biological phosphate organisms. It is also common to install an activated primary sedimentation tank to allow the production of volatile fatty acids (VFA) by fermentation of readily biodegradable substrate in the incoming sewage.

Table C.1: The AS-WWTP Nomenclature and Parameter Definitions

SI*: Soluble inert organic matter | COD: Chemical oxygen demand
SS*: Readily biodegradable substrate | BOD: Biochemical oxygen demand
XI*: Particulate inert organic matter | IQ: Influent (inf) quality
XS*: Slowly biodegradable substrate | EQ: Effluent (e) quality
XBH*: Active heterotrophic biomass | QIN: Influent flow rate
XBA*: Active autotrophic biomass | F_M_R: Food-to-microorganisms ratio
XP*: Particulate products arising from biomass decay | Ntotal: Total nitrogen
SO*: Oxygen | AF1, AF2, AF3: Aeration control points for the aerated reactors
SNO*: Nitrate and nitrite nitrogen | Qa1: Internal recycled nitrate (IRN) flow rate
SNH*: Ammonia and ammonium nitrogen | Qa2: External recycled nitrate (ERN) flow rate
SND*: Soluble biodegradable organic nitrogen | Qf: Feed flow rate
XND*: Particulate biodegradable organic nitrogen | Qw: Waste activated sludge (WAS) flow rate
SALK*: Alkalinity | Qr: Recycled activated sludge (RAS) flow rate
TSS: Total suspended solids | Qe: Effluent flow rate
MA1, MA2: Mechanical aerators of the anaerobic and anoxic reactors | Qu: Sludge underflow rate
MLVSS: Mixed liquor volatile suspended solids | KLa: Mass transfer coefficient of the aerated reactors
IRN: Internal recycled nitrate | ERN: External recycled nitrate
Zf: Feed concentration | Ze: Effluent concentration
Zu: Settler underflow concentration | Zw: Waste activated sludge (WAS) concentration
Zr: Recycled activated sludge (RAS) concentration | PE: Pumping energy
AE: Aeration energy | DO: Dissolved oxygen
Za1: Internal recycled nitrate (IRN) concentration | Za2: External recycled nitrate (ERN) concentration

Note: (i) The numerical values 1, 2, 3, 4 and 5 appended to a parameter refer to that parameter in the anaerobic, anoxic and the three aerated reactors respectively.
(ii) The subscripts inf and E (and sometimes e) refer to influent and effluent respectively.
(iii) Other parameters are introduced and defined as they are needed.
(iv) Notations with an asterisk (*) are the state variables.


In the anoxic zone, nitrate (SNO), which is recycled from the aerobic zone, is converted to dinitrogen by facultative heterotrophic organisms. Denitrification in this zone results in the release of alkalinity, and hence there is an increase in pH. There is also evidence of a pronounced removal of phosphorus in this zone.

By the time the partially-treated wastewater arrives at the aerobic zone, virtually all of its readily biodegradable organics (referred to as biodegradable COD) have been consumed by heterotrophic organisms in the anaerobic and anoxic zones. Thus, in the aerobic zone two major processes occur. In

the presence of dissolved oxygen (DO), the released phosphate is taken up by phosphorus-accumulating

organisms (PAO) growing on the stored poly-β hydroxyl alkanoates (PHA). The phosphorus is stored internally

as poly phosphate. This results in a net reduction in phosphate in the wastewater. The second process occurring in

this zone is nitrification of ammonia to nitrate in the wastewater by the autotrophic organisms. In order to

minimize the amount of dissolved oxygen (DO) going into the anoxic zone, the last compartment is typically

aerated. Part of the sludge, which contains phosphorus to be removed, is wasted while the remainder is returned to

the anaerobic zone after thickening in the settler and additional denitrification in the recycled activated sludge

(RAS) tank.

Appendix C–3: AS-WWTP Process Model

As mentioned above, the BSM1 model involves eight (8) different biochemical processes with rates ρj, incorporating thirteen (13) different components ([Henze et al., 1996], [Coop, 2000], [COST, 2000], [COST, 2008]). These components are classified into soluble components (S) and particulate components (X). The nomenclature and parameter definitions used for describing the AS-WWTP in this work are given in Table C.1. Moreover, four fundamental processes are considered: the growth and decay of biomass (heterotrophic and autotrophic), the ammonification of organic nitrogen, and the hydrolysis of particulate organics. The typical schematic of the AS-WWTP is shown in Fig. C.1.

The eight basic processes that are used to describe the biological behaviour of the AS-WWTP process are:

j = 1: Aerobic growth of heterotrophs:
$$\rho_1=\mu_H\left(\frac{S_S}{K_S+S_S}\right)\left(\frac{S_O}{K_{O,H}+S_O}\right)X_{B,H} \qquad (C.1)$$

j = 2: Anoxic growth of heterotrophs:
$$\rho_2=\mu_H\left(\frac{S_S}{K_S+S_S}\right)\left(\frac{K_{O,H}}{K_{O,H}+S_O}\right)\left(\frac{S_{NO}}{K_{NO}+S_{NO}}\right)\eta_g\,X_{B,H} \qquad (C.2)$$

j = 3: Aerobic growth of autotrophs:


$$\rho_3=\mu_A\left(\frac{S_{NH}}{K_{NH}+S_{NH}}\right)\left(\frac{S_O}{K_{O,A}+S_O}\right)X_{B,A} \qquad (C.3)$$

j = 4: Decay of heterotrophs:
$$\rho_4=b_H\,X_{B,H} \qquad (C.4)$$

j = 5: Decay of autotrophs:
$$\rho_5=b_A\,X_{B,A} \qquad (C.5)$$

j = 6: Ammonification of soluble organic nitrogen:
$$\rho_6=k_a\,S_{ND}\,X_{B,H} \qquad (C.6)$$

j = 7: Hydrolysis of entrapped organics:
$$\rho_7=k_h\,\frac{X_S/X_{B,H}}{K_X+X_S/X_{B,H}}\left[\left(\frac{S_O}{K_{O,H}+S_O}\right)+\eta_h\left(\frac{K_{O,H}}{K_{O,H}+S_O}\right)\left(\frac{S_{NO}}{K_{NO}+S_{NO}}\right)\right]X_{B,H} \qquad (C.7)$$

Table C.2: Stoichiometric parameters with their units and values

Parameter | Unit | Value
$Y_A$ | g cell COD formed.(g N oxidized)-1 | 0.24
$Y_H$ | g cell COD formed.(g COD oxidized)-1 | 0.67
$f_P$ | Dimensionless | 0.08
$i_{XB}$ | g N.(g COD)-1 in biomass | 0.08
$i_{XP}$ | g N.(g COD)-1 in particulate products | 0.06

Table C.3: Kinetic parameters with their units and values

Parameter | Unit | Value
$\mu_H$ | (day)-1 | 4.0
$K_S$ | g COD.m-3 | 10.0
$K_{O,H}$ | g (-COD).m-3 | 0.2
$K_{NO}$ | g NO3-N.m-3 | 0.5
$b_H$ | (day)-1 | 0.3
$\eta_g$ | Dimensionless | 0.8
$\eta_h$ | Dimensionless | 0.8
$k_h$ | g slowly biodegradable COD.(g cell COD.day)-1 | 3.0
$K_X$ | g slowly biodegradable COD.(g cell COD)-1 | 0.1
$\mu_A$ | (day)-1 | 0.5
$K_{NH}$ | g NH3-N.m-3 | 1.0
$b_A$ | (day)-1 | 0.05
$K_{O,A}$ | g (-COD).m-3 | 0.4
$k_a$ | m3.(g COD.day)-1 | 0.05


j = 8: Hydrolysis of entrapped organic nitrogen:
$$\rho_8=\rho_7\left(\frac{X_{ND}}{X_S}\right)=k_h\,\frac{X_S/X_{B,H}}{K_X+X_S/X_{B,H}}\left[\left(\frac{S_O}{K_{O,H}+S_O}\right)+\eta_h\left(\frac{K_{O,H}}{K_{O,H}+S_O}\right)\left(\frac{S_{NO}}{K_{NO}+S_{NO}}\right)\right]X_{B,H}\left(\frac{X_{ND}}{X_S}\right) \qquad (C.8)$$

The observed thirteen conversion rates $r_i$ result from combinations of the basic processes (C.1) to (C.8) as follows:

$S_I\,(i=1)$: $r_1=0$ (C.9)

$S_S\,(i=2)$: $r_2=-\dfrac{1}{Y_H}\rho_1-\dfrac{1}{Y_H}\rho_2+\rho_7$ (C.10)

$X_I\,(i=3)$: $r_3=0$ (C.11)

$X_S\,(i=4)$: $r_4=(1-f_P)\rho_4+(1-f_P)\rho_5-\rho_7$ (C.12)

$X_{B,H}\,(i=5)$: $r_5=\rho_1+\rho_2-\rho_4$ (C.13)

$X_{B,A}\,(i=6)$: $r_6=\rho_3-\rho_5$ (C.14)

$X_P\,(i=7)$: $r_7=f_P\rho_4+f_P\rho_5$ (C.15)

$S_O\,(i=8)$: $r_8=-\dfrac{1-Y_H}{Y_H}\rho_1-\dfrac{4.57-Y_A}{Y_A}\rho_3$ (C.16)

$S_{NO}\,(i=9)$: $r_9=-\dfrac{1-Y_H}{2.86\,Y_H}\rho_2+\dfrac{1}{Y_A}\rho_3$ (C.17)

$S_{NH}\,(i=10)$: $r_{10}=-i_{XB}\rho_1-i_{XB}\rho_2-\left(i_{XB}+\dfrac{1}{Y_A}\right)\rho_3+\rho_6$ (C.18)

$S_{ND}\,(i=11)$: $r_{11}=-\rho_6+\rho_8$ (C.19)

$X_{ND}\,(i=12)$: $r_{12}=(i_{XB}-f_P\,i_{XP})\rho_4+(i_{XB}-f_P\,i_{XP})\rho_5-\rho_8$ (C.20)

$S_{ALK}\,(i=13)$: $r_{13}=-\dfrac{i_{XB}}{14}\rho_1+\left(\dfrac{1-Y_H}{14\times 2.86\,Y_H}-\dfrac{i_{XB}}{14}\right)\rho_2-\left(\dfrac{i_{XB}}{14}+\dfrac{1}{7\,Y_A}\right)\rho_3+\dfrac{1}{14}\rho_6$ (C.21)

The biological parameter values used in the BSM1 correspond approximately to a temperature of 15°C. The stoichiometric parameters are listed in Table C.2 and the kinetic parameters are listed in Table C.3.
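As an illustration of how the eight process rates (C.1)–(C.8) can be evaluated numerically, the following minimal MATLAB sketch codes them directly using the parameter values of Tables C.2 and C.3; the function and variable names are chosen here for readability and are not part of the BSM1 distribution:

function rho = asm1_process_rates(SS, SO, SNO, SNH, SND, XS, XBH, XBA, XND)
% Kinetic parameters (Table C.3)
muH = 4.0;   KS = 10.0;   KOH = 0.2;   KNO = 0.5;   bH = 0.3;
etag = 0.8;  etah = 0.8;  kh = 3.0;    KX = 0.1;
muA = 0.5;   KNH = 1.0;   bA = 0.05;   KOA = 0.4;   ka = 0.05;
rho = zeros(8,1);
rho(1) = muH*(SS/(KS+SS))*(SO/(KOH+SO))*XBH;                        % (C.1)
rho(2) = muH*(SS/(KS+SS))*(KOH/(KOH+SO))*(SNO/(KNO+SNO))*etag*XBH;  % (C.2)
rho(3) = muA*(SNH/(KNH+SNH))*(SO/(KOA+SO))*XBA;                     % (C.3)
rho(4) = bH*XBH;                                                    % (C.4)
rho(5) = bA*XBA;                                                    % (C.5)
rho(6) = ka*SND*XBH;                                                % (C.6)
rho(7) = kh*((XS/XBH)/(KX + XS/XBH))*((SO/(KOH+SO)) + ...
         etah*(KOH/(KOH+SO))*(SNO/(KNO+SNO)))*XBH;                  % (C.7)
rho(8) = rho(7)*(XND/XS);                                           % (C.8)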

Appendix C–4: General Characteristics of the Biological Reactors

As shown in Fig. C.1, the biological reactor for the default case consists of five compartments, of which the first two (Unit 1 and Unit 2) are non-aerated whereas the last three (Unit 3, Unit 4 and Unit 5) are aerated.


Unit 3 and Unit 4 of the aerated compartments have a fixed oxygen transfer coefficient of $K_L a = 10\ \mathrm{h^{-1}} = 240\ \mathrm{day^{-1}}$. In Unit 5, the dissolved oxygen (DO) concentration is controlled at a level of $2\ \mathrm{g\,(-COD).m^{-3}}$ by manipulation of the $K_L a$. Each of the five compartments has a flow rate $Q_k$, a concentration $Z_k$ and a reaction rate $r_k$, where $k = 1, 2, \ldots, 5$ is the number of the compartment. The volume of each non-aerated compartment is $1{,}000\ \mathrm{m^3}$, while the volume of each aerated compartment is $1{,}333\ \mathrm{m^3}$.

The general equation for the reactor mass balances is given as:

For $k=1$ (Unit 1):
$$\frac{dZ_1}{dt}=\frac{1}{V_1}\bigl(Q_a Z_a + Q_r Z_r + Q_0 Z_0 - Q_1 Z_1 + r_1 V_1\bigr) \qquad (C.22)$$
$$Q_1 = Q_a + Q_r + Q_0 \qquad (C.23)$$

For $k=2$ to $5$ (Unit 2, Unit 3, Unit 4 and Unit 5):
$$\frac{dZ_k}{dt}=\frac{1}{V_k}\bigl(Q_{k-1} Z_{k-1} - Q_k Z_k + r_k V_k\bigr) \qquad (C.24)$$
$$Q_k = Q_{k-1} \qquad (C.25)$$

Special case for oxygen $(S_{O,k})$:
$$\frac{dS_{O,k}}{dt}=\frac{1}{V_k}\Bigl(Q_{k-1} S_{O,k-1} + r_k V_k + (K_L a)_k V_k\bigl(S_O^{*}-S_{O,k}\bigr) - Q_k S_{O,k}\Bigr) \qquad (C.26)$$

where the saturation concentration for oxygen is $S_O^{*}=8.0\ \mathrm{g.m^{-3}}$.

Also,
$$Z_a = Z_5 \qquad (C.27)$$
$$Z_f = Z_5 \qquad (C.28)$$
$$Z_w = Z_r \qquad (C.29)$$
$$Q_f = Q_5 - Q_a = Q_e + Q_r + Q_w = Q_e + Q_u \qquad (C.30)$$
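A minimal MATLAB sketch of the per-component mass balances (C.22)–(C.26); all argument names below are hypothetical and introduced only for this illustration:

function dZdt = reactor_balances(Z, r, V, Qa, Qr, Q0, Za, Zr, Z0)
% Mass balances (C.22)-(C.25) for one component across the five units
Q = zeros(5,1);  dZdt = zeros(5,1);
Q(1)    = Qa + Qr + Q0;                                           % (C.23)
dZdt(1) = (Qa*Za + Qr*Zr + Q0*Z0 - Q(1)*Z(1) + r(1)*V(1))/V(1);   % (C.22)
for k = 2:5
    Q(k)    = Q(k-1);                                             % (C.25)
    dZdt(k) = (Q(k-1)*Z(k-1) - Q(k)*Z(k) + r(k)*V(k))/V(k);       % (C.24)
end
% For dissolved oxygen, (C.26) adds an aeration term inside the loop, e.g.:
%   dSOdt(k) = ... + KLa(k)*(SOsat - SO(k));   % with SOsat = 8.0 g.m-3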

Appendix C–5: General Characteristics of the Secondary Settler

The secondary settler is modeled as a ten-layer non-reactive unit, that is, no biological reactions take place in the secondary settler. The 6th layer, counting from the bottom, is the feed layer, as can be seen in Fig. C.1. The settler has an area $A$ of $1{,}500\ \mathrm{m^2}$. The height $z_m$ of each layer $m$ is $0.4\ \mathrm{m}$, for a total height of $4\ \mathrm{m}$. Therefore, the settler has a total volume equal to $6{,}000\ \mathrm{m^3}$.


The solid flux due to gravity is

J_s = v_s(X) X   (C.31)

where X is the total sludge concentration and v_s(X) is a double-exponential settling velocity function defined as:

v_s(X) = max( 0, min( v_0′, v_0 (e^{−r_h (X − X_min)} − e^{−r_p (X − X_min)}) ) )   (C.32)

where X_min = f_ns X_f. The parameter values for the double-exponential settling velocity function (C.32) are given in Table C.4.
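The clipping in (C.32) is easy to miss; the short C sketch below (hypothetical helper name) makes it explicit and mirrors the vs[i] loop in settler1dv4.c of Appendix C-14.

#include <math.h>

/* Sketch: double-exponential settling velocity (C.32), clipped to [0, v0_max].
   Parameters follow Table C.4; X_f is the feed-layer TSS concentration. */
double settling_velocity(double X, double X_f,
                         double v0_max, double v0,
                         double r_h, double r_p, double f_ns)
{
    double X_min = f_ns * X_f;            /* non-settleable fraction of the feed */
    double vs = v0 * (exp(-r_h * (X - X_min)) - exp(-r_p * (X - X_min)));
    if (vs > v0_max) vs = v0_max;         /* min(v0', ...) */
    if (vs < 0.0)    vs = 0.0;            /* max(0, ...)   */
    return vs;
}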

Thus, the mass balances for the sludge are expressed as follows.

For the feed layer (m = 6):

dX_m/dt = [ Q_f X_f / A + J_{clar,m+1} − (v_up + v_dn) X_m − min(J_{s,m}, J_{s,m−1}) ] / z_m   (C.33)

For the intermediate layers below the feed layer (m = 2 to m = 5):

dX_m/dt = [ v_dn (X_{m+1} − X_m) + min(J_{s,m}, J_{s,m+1}) − min(J_{s,m}, J_{s,m−1}) ] / z_m   (C.34)

For the bottom layer (m = 1):

dX_1/dt = [ v_dn (X_2 − X_1) + min(J_{s,1}, J_{s,2}) ] / z_1   (C.35)

Table C.4: The double-exponential settling velocity function parameters with their definition, units and values

Parameter | Definition                          | Units         | Value
v_0′      | Maximum settling velocity           | m·(day)⁻¹     | 250.0
v_0       | Maximum Vesilind settling velocity  | m·(day)⁻¹     | 474
r_h       | Hindered zone settling parameter    | m³·(g SS)⁻¹   | 0.000576
r_p       | Flocculant zone settling parameter  | m³·(g SS)⁻¹   | 0.00286
f_ns      | Non-settleable fraction             | dimensionless | 0.00228


For the intermediate clarification layers above the feed layer (m = 7 to m = 9):

dX_m/dt = [ v_up (X_{m−1} − X_m) + J_{clar,m+1} − J_{clar,m} ] / z_m   (C.36)

with

J_{clar,j} = min(v_{s,j} X_j, v_{s,j−1} X_{j−1})   if X_{j−1} > X_t
J_{clar,j} = v_{s,j} X_j                           if X_{j−1} ≤ X_t

For the top layer (m = 10):

dX_10/dt = [ v_up (X_9 − X_10) − J_{clar,10} ] / z_10   (C.37)

with

J_{clar,10} = min(v_{s,10} X_10, v_{s,9} X_9)   if X_9 > X_t
J_{clar,10} = v_{s,10} X_10                     if X_9 ≤ X_t

The threshold concentration is X_t = 3,000 g·m⁻³.
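The layer balances (C.33)-(C.37) can be evaluated as in the sketch below (plain C, hypothetical helper; the gravity fluxes Js and clarification fluxes Jclar are assumed precomputed from (C.31), (C.32) and the threshold rules above). The production implementation is the settler1dv4.c S-function of Appendix C-14, which numbers the layers from the top instead.

#include <math.h>

/* Sketch of (C.33)-(C.37): m = 1 is the bottom layer, m = 10 the top,
   m_f = 6 the feed layer; z is the layer height, A the settler area. */
void settler_solids_derivatives(const double X[11], double dX[11],
                                const double Js[11], const double Jclar[11],
                                double v_up, double v_dn,
                                double Q_f, double X_f, double A, double z)
{
    int m;
    const int m_f = 6;
    dX[1] = (v_dn * (X[2] - X[1]) + fmin(Js[1], Js[2])) / z;            /* (C.35) */
    for (m = 2; m < m_f; m++)                                           /* (C.34) */
        dX[m] = (v_dn * (X[m+1] - X[m])
                 + fmin(Js[m], Js[m+1]) - fmin(Js[m], Js[m-1])) / z;
    dX[m_f] = (Q_f * X_f / A + Jclar[m_f+1]                             /* (C.33) */
               - (v_up + v_dn) * X[m_f] - fmin(Js[m_f], Js[m_f-1])) / z;
    for (m = m_f + 1; m < 10; m++)                                      /* (C.36) */
        dX[m] = (v_up * (X[m-1] - X[m]) + Jclar[m+1] - Jclar[m]) / z;
    dX[10] = (v_up * (X[9] - X[10]) - Jclar[10]) / z;                   /* (C.37) */
}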

For the soluble components (including dissolved oxygen), each layer represents a completely mixed volume and the concentrations of the soluble components are given accordingly as follows.

For the feed layer (m = 6):

dZ_m/dt = [ Q_f Z_f / A − (v_dn + v_up) Z_m ] / z_m   (C.38)

For the intermediate layers below the feed layer (m = 1 to m = 5):

dZ_m/dt = v_dn (Z_{m+1} − Z_m) / z_m   (C.39)

For the intermediate layers above the feed layer (m = 7 to m = 10):

dZ_m/dt = v_up (Z_{m−1} − Z_m) / z_m   (C.40)

where

v_dn = Q_u / A = (Q_r + Q_w) / A,   v_up = Q_e / A   (C.41)

The sludge concentration from the concentrations in Unit 5 of Fig. C.1 can be computed from:

( )

( )

,5 ,5 ,5 , ,5 , ,5

,5 ,5 ,5 , ,5 , ,5

1

0.75

f S P I B H B A

COD SS

S P I B H B A

X X X X X Xfr

X X X X X

= + + + +

= + + + +

(C.42)


since fr_{COD−SS} = 4/3, that is, 1/fr_{COD−SS} = 0.75. The same calculation is applied for X_u in the settler underflow and X_e at the plant exit.

To calculate the distribution of particulate concentrations in the recycle and waste flows, their ratios with respect to the total solid concentration are assumed to remain constant across the settler:

X_{S,5} / X_f = X_{S,u} / X_u,   hence   X_{S,u} = (X_{S,5} / X_f) X_u   (C.43)

Equation (C.43) also holds for X_{P,u}, X_{I,u}, X_{B,H,u}, X_{B,A,u} and X_{ND,u}. The assumption made here means that the dynamics of the fractions of particulate concentrations at the inlet of the settler are directly propagated to the settler underflow, without taking into account the normal retention time in the settler ([Henze et al., 1996], [Coop, 2000], [COST, 2000], [COST, 2008]).

Sludge Age

(A) Sludge Age Based on the Total Amount of Biomass

In the steady-state case, the sludge age calculation is based on the total amount of biomass present in the system, that is, in the reactor and in the settler:

Sludge_Age = (TX_a + TX_s) / (φ_e + φ_w)   (C.44)

where TX_a is the total amount of biomass present in the reactor, expressed as:

TX_a = Σ_{i=1}^{n} (X_{B,H,i} + X_{B,A,i}) V_i,   n = 5   (C.45)

TX_s is the total amount of biomass present in the settler, expressed as:

TX_s = Σ_{j=1}^{m} (X_{B,H,j} + X_{B,A,j}) z_j A,   m = 10   (C.46)

φ_e is the loss rate of biomass in the effluent, expressed as:

φ_e = (X_{B,H,m} + X_{B,A,m}) Q_e,   m = 10   (C.47)

and φ_w is the loss rate of biomass in the waste flow, expressed as:

φ_w = (X_{B,H,u} + X_{B,A,u}) Q_w   (C.48)
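The sketch below (plain C, hypothetical helper and argument names) shows how (C.44)-(C.48) combine; index 1..5 runs over the reactors and 1..10 over the settler layers (bottom to top), with index 0 unused.

/* Sketch: steady-state sludge age (C.44)-(C.48). XBH_r/XBA_r: biomass in the
   5 reactors; XBH_s/XBA_s: biomass in the 10 settler layers; V: reactor
   volumes; z: layer height; A: settler area; subscript u: settler underflow. */
double sludge_age(const double XBH_r[6],  const double XBA_r[6],
                  const double XBH_s[11], const double XBA_s[11],
                  const double V[6], double z, double A,
                  double XBH_u, double XBA_u, double Q_e, double Q_w)
{
    double TX_a = 0.0, TX_s = 0.0;
    int i, j;
    for (i = 1; i <= 5; i++)  TX_a += (XBH_r[i] + XBA_r[i]) * V[i];    /* (C.45) */
    for (j = 1; j <= 10; j++) TX_s += (XBH_s[j] + XBA_s[j]) * z * A;   /* (C.46) */
    {
        double phi_e = (XBH_s[10] + XBA_s[10]) * Q_e;                  /* (C.47) */
        double phi_w = (XBH_u + XBA_u) * Q_w;                          /* (C.48) */
        return (TX_a + TX_s) / (phi_e + phi_w);                        /* (C.44) */
    }
}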


(B) Sludge Age Based on the Total Amount of Solids

In an actual AS-WWTP, the sludge age is measured on the basis of the total amount of solids present in the system. Thus,

Sludge_Age = (TX_fa + TX_fs) / (φ_fe + φ_fw)   (C.49)

where TX_fa is the total amount of solids present in the reactor and can be expressed as:

TX_fa = Σ_{i=1}^{n} X_{f,i} V_i,   n = 5   (C.50)

where

X_{f,i} = (1/fr_{COD−SS}) (X_{S,i} + X_{P,i} + X_{I,i} + X_{B,H,i} + X_{B,A,i})   (C.51)

TX_fs is the total amount of solids present in the settler and can be expressed as:

TX_fs = Σ_{j=1}^{m} X_{f,j} z_j A,   m = 10   (C.52)

where

X_{f,j} = (1/fr_{COD−SS}) (X_{S,j} + X_{P,j} + X_{I,j} + X_{B,H,j} + X_{B,A,j})   (C.53)

φ_fe is the loss rate of solids in the effluent and can be expressed as:

φ_fe = X_{f,m} Q_e,   m = 10   (C.54)

where

X_{f,m} = (1/fr_{COD−SS}) (X_{S,m} + X_{P,m} + X_{I,m} + X_{B,H,m} + X_{B,A,m})   (C.55)

and φ_fw is the loss rate of solids in the waste flow and can be expressed as:

φ_fw = X_{f,u} Q_w   (C.56)

where

X_{f,u} = (1/fr_{COD−SS}) (X_{S,u} + X_{P,u} + X_{I,u} + X_{B,H,u} + X_{B,A,u})   (C.57)


Appendix C–6: The Simulink Model of the BSM1 for Evaluating the Performance of AS-WWTPs

The Simulink model of the BSM1 for AS-WWTPs is shown in Fig. C.2. The model of Fig. C.2 is used for steady-state open-loop simulation of the BSM1 model with constant influent data for 150 days. Comparing Fig. C.1 and Fig. C.2, the combiner corresponds to the mixing tank; it is implemented as a MATLAB C program, combiner.c, given in Appendix C–11, and the Flow_comb1 and Flow_comb2 blocks are both implemented by combiner.c with different inputs. In the Hyd_delay block of Fig. C.2, the load from Flow_comb2 is first calculated and a first-order reaction is applied to the load and flow concentration; the subsequent load, flow and concentration are then recalculated based on the delayed flow and load concentration. The Hyd_delay behaviour is modelled and implemented as a MATLAB C program, hyddelayv2.c, given in Appendix C–12. The bioreactors are indicated by Bioreactor_1 to Bioreactor_5; the Simulink model for each of the five bioreactors is shown in Fig. C.3, where K_L a corresponds to the mechanical aerators for manipulating the five bioreactors. The bioreactors are implemented as a single MATLAB C program, asm1.c, given in Appendix C–13. The Deoxic unit in Fig. C.1 corresponds to the flow splitter (Flow_splitter) in Fig. C.2; the Simulink model for the flow splitter is shown in Fig. C.4. Finally, the Simulink model for the secondary settler is shown in Fig. C.5. The settler is implemented as a MATLAB C program, settler1dv4.c, given in Appendix C–14.

[Fig. C.2 shows the Simulink block diagram: the CONSTINFLUENT plant input feeds the Flow_comb1 and Flow_comb2 combiner blocks, the Hyd_delay block (hyddelayv2), the five bioreactor blocks Bioreactor_1 to Bioreactor_5 (with aeration constants KLa1 to KLa5), the Flow_splitter and the Settler_1D block (with Qr = 25000 and Qw = 270, and the internal recirculation Qintr); workspace sinks record the influent, feed, reactor, recycle and settler signals over the 150-day simulation.]

Fig. C.2: Open-loop steady-state benchmark simulation model no. 1 (BSM1) with constant influent.


Appendix C–7: The AS-WWTP Operational Considerations for the Biological

Reactors

The activated sludge process generally requires more sampling and testing to maintain adequate process

control than any of the other unit processes in the wastewater treatment system. During periods of operational

problems, both the parameters tested and the frequency of testing may increase substantially. Process control

testing may include the following: settleability testing to determine the settled sludge volume; suspended solids testing to determine the influent suspended solids and the mixed liquor volatile suspended solids (MLVSS); return activated sludge (RAS) solids and waste activated sludge (WAS) concentrations; determination of the volatile content of the mixed liquor suspended solids; dissolved oxygen (DO) and pH of the aeration tank; biochemical oxygen

demand (BOD) and chemical oxygen demand (COD) of the aeration tank influent and process effluent; and

microscopic evaluation of the activated sludge to determine the predominant organism. To maintain the working

organisms in the activated sludge process, a suitable environment must be maintained by being aware of the many

[Figs. C.3–C.5 show the Simulink sub-models: Fig. C.3 the bioreactor (the asm1 S-function with K_L a appended to the input vector via a Mux), Fig. C.4 the flow splitter (Mux and Selector blocks with a Sum splitting the flow into flow1 and flow2), and Fig. C.5 the secondary settler (the settler1dv4 S-function with Qr and Qw appended to the feed via a Mux).]

Fig. C.3: Simulink model of the bioreactor.

Fig. C.4: Simulink model of the flow splitter.

Fig. C.5: Simulink model of the secondary settler.


factors influencing the process and by monitoring them repeatedly. Control is defined here as maintaining the

proper solids (also called the floc mass) concentration in the aerator for the incoming wastewater (food) flow by

adjusting the return and waste sludge pumping rates and regulating the oxygen supply to maintain a satisfactory

level of dissolved oxygen (DO) in the process.

The activated sludge process must receive sufficient aeration to keep the activated sludge in suspension

and to satisfy the organism oxygen requirements. Insufficient mixing results in dead spots, septic conditions, and

loss of activated sludge. The activated sludge process also requires sufficient alkalinity to ensure that the pH of the mixed liquor remains within the acceptable range. If organic nitrogen and ammonia are being converted to nitrate (nitrification), sufficient alkalinity must be available to support this process as well.

Finally, the microorganisms of the activated sludge process require nutrients (nitrogen, phosphorus, iron and other trace metals) to function, and as the temperature decreases the activity of the organisms also decreases. Systems that have been upset at cold temperatures require a longer recovery time, while warm temperatures tend to favor denitrification and filamentous growth. It should also be noted that the activity level of bacteria within the activated sludge process increases with rising temperature [Spellman, 2003].

Appendix C–8: Criteria for Evaluating and Assessing the Performances of the AS-

WWTP Control

As discussed in Appendix C–7, the activated sludge process requires more sampling and testing than any other unit process in the wastewater treatment system, and control amounts to maintaining the proper solids (floc mass) concentration in the aerator for the incoming wastewater flow by adjusting the return and waste sludge pumping rates and regulating the oxygen supply to maintain a satisfactory level of DO in the process. The criteria used to evaluate and assess the performance of the AS-WWTP control are summarized below.

(A) Influent Quality (IQ)

As a check on the influent load, an influent quality index (IQ) can be calculated by applying the effluent quality equations of part (B) below to the influent file, with the BOD coefficient changed from 0.25 to 0.65. It is defined as:


I.Q. = (1/(T · 1000)) ∫_{t = 7 days}^{t = 14 days} [ B_SS SS_0(t) + B_COD COD_0(t) + B_NKj S_{NKj,0}(t) + B_NO S_{NO,0}(t) + B_BOD5 BOD_0(t) ] Q_in(t) dt

where the composition variables are calculated as follows:

S_{NKj,0} = S_{NH,0} + S_{ND,0} + X_{ND,0} + i_XB (X_{B,H,0} + X_{B,A,0}) + i_XP (X_{P,0} + X_{I,0})

SS_0 = 0.75 (X_{S,0} + X_{I,0} + X_{B,H,0} + X_{B,A,0} + X_{P,0})

BOD_0 = 0.65 (S_{S,0} + X_{S,0} + (1 − f_P)(X_{B,H,0} + X_{B,A,0}))

COD_0 = S_{S,0} + S_{I,0} + X_{S,0} + X_{I,0} + X_{B,H,0} + X_{B,A,0} + X_{P,0}

(B) Effluent Quality (EQ)

The effluent quality (E.Q.) (kg pollution units/d) is averaged over the period of observation T (days) (i.e. the second week, or the last 7 days, of each weather file), based on a weighting of the effluent loads of compounds that have a major influence on the quality of the receiving water and that are usually included in regional legislation. It is defined as ([Coop, 2000], [COST, 2000], [COST, 2008]):

E.Q. = (1/(T · 1000)) ∫_{t = 7 days}^{t = 14 days} [ B_TSS TSS_e(t) + B_COD COD_e(t) + B_NKj S_{NKj,e}(t) + B_NO S_{NO,e}(t) + B_BOD5 BOD_e(t) ] Q_e(t) dt

where the composition variables are calculated as follows:

S_{NKj,e} = S_{NH,e} + S_{ND,e} + X_{ND,e} + i_XB (X_{B,H,e} + X_{B,A,e}) + i_XP (X_{P,e} + X_{I,e})

TSS_e = 0.75 (X_{S,e} + X_{I,e} + X_{B,H,e} + X_{B,A,e} + X_{P,e})

BOD_e = 0.25 (S_{S,e} + X_{S,e} + (1 − f_P)(X_{B,H,e} + X_{B,A,e}))

COD_e = S_{S,e} + S_{I,e} + X_{S,e} + X_{I,e} + X_{B,H,e} + X_{B,A,e} + X_{P,e}

N_{total,e} = S_{NKj,e} + S_{NO,e}

where the B_i are weighting factors for the different types of pollution, converting them into pollution units (g·m⁻³); they were chosen to reflect the calculated fractions as follows: B_TSS = 2, B_COD = 1, B_NKj = 20, B_NO = 20 and B_BOD5 = 2.
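A discrete-time approximation of the E.Q. integral is straightforward; the sketch below (plain C, hypothetical helper, rectangle rule) assumes the composite effluent variables have already been assembled from the ASM1 states as shown above, with N samples taken every dt days over the evaluation window of T days.

/* Sketch: E.Q. by rectangle-rule integration; weights follow the text. */
double effluent_quality(const double TSS[], const double COD[],
                        const double SNKj[], const double SNO[],
                        const double BOD[], const double Qe[],
                        int N, double dt, double T)
{
    const double B_TSS = 2.0, B_COD = 1.0, B_NKj = 20.0, B_NO = 20.0, B_BOD = 2.0;
    double EQ = 0.0;
    int n;
    for (n = 0; n < N; n++)
        EQ += (B_TSS * TSS[n] + B_COD * COD[n] + B_NKj * SNKj[n]
               + B_NO * SNO[n] + B_BOD * BOD[n]) * Qe[n] * dt;   /* g/d x d */
    return EQ / (T * 1000.0);                                    /* kg pollution units/d */
}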

The major operating cost in biological nutrient removal processes, as well as in nitrogen-removing activated sludge processes (ASPs), is blower energy consumption; if the DO set-point is reduced by a control strategy, significant energy savings can be achieved. Operational issues are considered through three items: sludge production, pumping energy and aeration energy, with the integrations performed over the final 7 days of the weather simulations (i.e. from day 22 to day 28 of the weather file simulations, T = 7 days).


(C) The Sludge Production to be Disposed (kg/d):

The sludge production P_sludge is calculated from the total solids flow from wastage and the solids accumulated in the system over the period of time considered (T = 7 days for each weather file). The amount of solids in the system at time t is given by:

M(TSS)_system(t) = M(TSS)_reactor(t) + M(TSS)_settler(t)

where M(TSS)_reactor(t) is the amount of solids in the reactor, given by:

M(TSS)_reactor(t) = Σ_{i=1}^{5} 0.75 (X_{S,i} + X_{I,i} + X_{B,H,i} + X_{B,A,i} + X_{P,i}) V_i

M(TSS)_settler(t) is the amount of solids in the settler, given by:

M(TSS)_settler(t) = Σ_{j=1}^{10} 0.75 (X_{S,j} + X_{I,j} + X_{B,H,j} + X_{B,A,j} + X_{P,j}) z_j A

ΔM(TSS)_system is the change in the system sludge mass from the end of day 7 to the end of day 14, given by:

ΔM(TSS)_system = M(TSS)_system(14 days) − M(TSS)_system(7 days)

and M(TSS)_waste is the amount of waste sludge, given by:

M(TSS)_waste = ∫_{t = 7 days}^{t = 14 days} 0.75 (X_{S,w} + X_{I,w} + X_{B,H,w} + X_{B,A,w} + X_{P,w}) Q_w(t) dt

so that the total sludge to be disposed becomes:

P_sludge = (1/T) (ΔM(TSS)_system + M(TSS)_waste)

(D) The Total Sludge Production (kg/d):

The total sludge production takes into account the sludge to be disposed and the sludge lost over the weir, and is calculated as follows:

P_total_sludge = P_sludge + M(TSS)_e

where

M(TSS)_e = (0.75/T) ∫_{t = 7 days}^{t = 14 days} (X_{S,e} + X_{I,e} + X_{B,H,e} + X_{B,A,e} + X_{P,e}) Q_e(t) dt

(E) The Pumping Energy (PE):

The pumping energy in kWh/d is calculated as follows:

PE = (0.04/T) ∫_{t = 7 days}^{t = 14 days} [ Q_a(t) + Q_r(t) + Q_w(t) ] dt

where Q_a(t) is the internal recycle flow rate at time t (m³ d⁻¹), Q_r(t) is the return sludge recycle flow rate at time t (m³ d⁻¹), and Q_w(t) is the waste sludge flow rate at time t (m³ d⁻¹).

(F) The Aeration Energy (AE):

The aeration energy (AE) in kWh/d takes into account the plant peculiarities (type of diffuser, bubble size, depth of submersion, etc.) and is calculated from the k_L a in the three aerated tanks according to the following relation, valid for Degrémont DP230 porous disks at an immersion depth of 4 m:

AE = (24/T) ∫_{t = 7 days}^{t = 14 days} Σ_{i=3}^{5} [ 0.4032 (k_L a_i(t))² + 7.8408 k_L a_i(t) ] dt

where k_L a_i is expressed in h⁻¹ and i is the compartment number.
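Both energy terms reduce to simple time integrals; the sketch below (plain C, hypothetical helpers, rectangle-rule integration with N samples every dt days, T = 7 days) evaluates them with the coefficients of the relations above.

/* Sketch: pumping energy (kWh/d); Qa, Qr, Qw in m3/d. */
double pumping_energy(const double Qa[], const double Qr[], const double Qw[],
                      int N, double dt, double T)
{
    double PE = 0.0;
    int n;
    for (n = 0; n < N; n++)
        PE += (Qa[n] + Qr[n] + Qw[n]) * dt;
    return 0.04 * PE / T;
}

/* Sketch: aeration energy (kWh/d); KLa[n][i] in 1/h for the aerated tanks
   i = 3..5 (stored at columns 0..2). */
double aeration_energy(const double KLa[][3], int N, double dt, double T)
{
    double AE = 0.0;
    int n, i;
    for (n = 0; n < N; n++)
        for (i = 0; i < 3; i++)
            AE += (0.4032 * KLa[n][i] * KLa[n][i] + 7.8408 * KLa[n][i]) * dt;
    return 24.0 * AE / T;
}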

The increase in capacity which could be obtained using the proposed control strategy should also be evaluated. This factor relates to the investment costs that would be incurred if the plant were simply extended to deal with an increased load. It is expressed by the relative increase a in the influent flow rate that can be applied while maintaining the reference effluent quality index EQ_ref for the three weather conditions (T = 7 days for each). EQ_ref is calculated from the above equation in open loop, and the scaled influent is Q*_{0,i}(t) = a · Q_{0,i}(t), with i = 1 for dry weather, i = 2 for storm weather and i = 3 for rain weather. Operation variables such as Q_w, Q_r and the K_L a in compartments 3 and 4 remain unchanged.

Appendix C–9: Constraints Imposed on the AS-WWTP Based on the Benchmark

(A) Constraints on the control handles:

As stated in [15], for reasons of simplicity all available control handles are considered to be ideal with regard to their behaviour. In the closed-loop test case only two control handles were used: the internal recirculation flow rate (Q_a) and the oxygen transfer rate in the fifth reactor, i.e. Unit 5 (K_L a_5). The following control handles are considered to exist for the implementation of new control strategies with the benchmark simulation model no. 1 (BSM1):

(i) Internal flow recirculation rate (Q_a);
(ii) Return sludge flow rate (Q_r);
(iii) Wastage flow rate (Q_w);
(iv) Anoxic/aerobic volume: all five biological reactors are equipped with both aerators and mechanical mixing devices, i.e. the volumes for anoxic and aerobic behaviour can be modified in a discrete fashion;
(v) Aeration intensity individually for each reactor (K_L a_1, K_L a_2, K_L a_3, K_L a_4, K_L a_5);
(vi) External carbon (ext_car) source flow rate (Q_ext_car_1, Q_ext_car_2, Q_ext_car_3, Q_ext_car_4, Q_ext_car_5), where the carbon source is considered to consist of readily biodegradable substrate, i.e. S_S;
(vii) Influent distribution by use of step feed (fractions of the influent flow to each of the five biological reactors: f_Qinflow1, f_Qinflow2, f_Qinflow3, f_Qinflow4, f_Qinflow5);
(viii) Distribution of the internal flow recirculation (fractions of the internal recirculation flow to each of the five biological reactors: f_Qa1, f_Qa2, f_Qa3, f_Qa4, f_Qa5);
(ix) Distribution of the return sludge flow (fractions of the return sludge flow to each of the five biological reactors: f_Qr1, f_Qr2, f_Qr3, f_Qr4, f_Qr5).

Table C.5: Numerical values of the constraints on the available control handles and their limitations

Control handle            | Min. value | Max. value | Comments
Q_a (m³ d⁻¹)              | 0          | 92,230     | Max = 500% of Q_0,stab
Q_r (m³ d⁻¹)              | 0          | 36,892     | Max = 200% of Q_0,stab
Q_w (m³ d⁻¹)              | 0          | 1,844.6    | Max = 10% of Q_0,stab
K_L a_1 (d⁻¹)             | 0          | 360        | Reactor 1 (Unit 1)
K_L a_2 (d⁻¹)             | 0          | 360        | Reactor 2 (Unit 2)
K_L a_3 (d⁻¹)             | 0          | 360        | Reactor 3 (Unit 3)
K_L a_4 (d⁻¹)             | 0          | 360        | Reactor 4 (Unit 4)
K_L a_5 (d⁻¹)             | 0          | 360        | Reactor 5 (Unit 5)
Q_ext_car_1 (m³ d⁻¹)      | 0          | 5          | Reactor 1 (Unit 1); carbon source concentration 400,000 mg COD·l⁻¹, available as S_S (e.g. a 25% ethanol solution)
Q_ext_car_2 (m³ d⁻¹)      | 0          | 5          | Reactor 2 (Unit 2), otherwise same as above
Q_ext_car_3 (m³ d⁻¹)      | 0          | 5          | Reactor 3 (Unit 3), otherwise same as above
Q_ext_car_4 (m³ d⁻¹)      | 0          | 5          | Reactor 4 (Unit 4), otherwise same as above
Q_ext_car_5 (m³ d⁻¹)      | 0          | 5          | Reactor 5 (Unit 5), otherwise same as above
f_Qin1, …, f_Qin5         | 0          | 1          | Part of the influent flow rate distributed to each biological reactor; the sum of all five must always equal one
f_Qa1, …, f_Qa5           | 0          | 1          | Part of the internal recirculation flow rate distributed to each biological reactor; the sum of all five must always equal one
f_Qr1, …, f_Qr5           | 0          | 1          | Part of the sludge return flow rate distributed to each biological reactor; the sum of all five must always equal one


The above selection gives about 30 individual control handles with which to manipulate the defined COST 624 benchmark plant and dramatically increases its flexibility. Although this number of available control handles may not be realistic for a real plant, it is defined for the benchmark plant in order to allow basically any type of general control strategy, which is the main purpose of the COST benchmark [15, 16]. The numerical values of the constraints defining the limitations of the different control handles are given in Table C.5.

(B) Constraints on the Effluent Quality:

The flow-averaged values of the effluent concentrations over the three test periods (dry, rain and storm weather; 7 days each) should be constrained for the five effluent components within the following limits: total nitrogen N_{total,e} = S_{NO,e} + S_{NKj,e} < 18 g·m⁻³, total COD_e < 100 g·m⁻³, ammonia S_{NH,e} < 4 g·m⁻³, suspended solids TSS_e < 30 g·m⁻³ and BOD5_e < 10 g·m⁻³.
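A simple check of these limits against the flow-averaged effluent values can be sketched as follows (plain C, hypothetical helper; the averages are assumed to be computed beforehand from the weather-file simulations).

/* Sketch: returns 1 if all five benchmark effluent limits are met, 0 otherwise. */
int effluent_limits_ok(double Ntotal, double COD, double SNH,
                       double TSS, double BOD5)
{
    return (Ntotal < 18.0) && (COD < 100.0) && (SNH < 4.0)
        && (TSS < 30.0) && (BOD5 < 10.0);
}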

(C) Operating Costs:

The major operating cost in biological nutrient removal, as well as in nitrogen-removing ASPs, is blower energy consumption. If the DO set-point is reduced by a control strategy, significant energy savings can be achieved.

Appendix C–10: Controller Performance Evaluation and Assessment Criteria Based on the Benchmark

This is the first level of performance assessment and concerns the local control loops; performance is assessed by the integral of the absolute error (IAE) and the integral of the squared error (ISE) criteria, by the maximal deviation from the set-point, and by the error variance. Basically, this serves as a proof that the proposed control strategy has been applied properly.

(A) Controlled variable performance

(i) Integral of absolute error (IAE):

IAE_i = ∫_0^{t_l} |e_i| dt

where e_i is the error, e_i = Z_i^{setpoint} − Z_i^{measured}, t_l = 7 days, and the subscript i distinguishes the different controlled variables in the same system.

(ii) Integral of square error (ISE):

ISE_i = ∫_0^{t_l} e_i² dt

(iii) Maximum deviation from set-point:

Dev_i^{max}(error) = max |e_i|

(iv) Variance in the controlled variable error:

Var(e_i) = mean(e_i²) − (mean(e_i))²,   with   mean(e_i) = (1/T) ∫_0^{t_l} e_i dt   and   mean(e_i²) = (1/T) ∫_0^{t_l} e_i² dt

(B) Manipulated variable performance

(i) Maximum deviation:

Dev_i^{max} = u_{i,max} − u_{i,min}

where u_i is the value of the manipulated variable (MV); the minimum and maximum are determined during the 7 days of interest defined above, and the subscript i distinguishes the different manipulated variables in the same system.

(ii) Maximum deviation in the change of the manipulated variable:

Dev_i^{max}(Δu_i) = max(Δu_i),   where   Δu_i = u_i(t + dt) − u_i(t)

(iii) Variance in the change of the manipulated variable:

Var(Δu_i) = mean(Δu_i²) − (mean(Δu_i))²,   with   mean(Δu_i) = (1/T) ∫_0^{t_l} Δu_i dt   and   mean(Δu_i²) = (1/T) ∫_0^{t_l} Δu_i² dt
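The controlled-variable criteria can be accumulated in one pass over the logged error signal, as in the sketch below (plain C, hypothetical helper; N samples every dt days, T = t_l = 7 days). The manipulated-variable criteria (ii) and (iii) follow the same pattern with the increments Δu_i in place of e_i.

#include <math.h>

/* Sketch: discrete approximations of IAE, ISE, maximum deviation and error
   variance for one control loop of Appendix C-10. */
typedef struct { double IAE, ISE, maxdev, var_e; } loop_stats;

loop_stats assess_loop(const double e[], int N, double dt, double T)
{
    loop_stats s = {0.0, 0.0, 0.0, 0.0};
    double mean = 0.0, mean_sq = 0.0;
    int n;
    for (n = 0; n < N; n++) {
        s.IAE += fabs(e[n]) * dt;                          /* (i)   */
        s.ISE += e[n] * e[n] * dt;                         /* (ii)  */
        if (fabs(e[n]) > s.maxdev) s.maxdev = fabs(e[n]);  /* (iii) */
        mean    += e[n] * dt / T;
        mean_sq += e[n] * e[n] * dt / T;
    }
    s.var_e = mean_sq - mean * mean;                       /* (iv)  */
    return s;
}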


Appendix C–11: MATLAB C Program for the Mixing Tank, combiner.c

/*
 * combiner.c calculates the concentrations when adding two flow
 * streams together.
 */

#define S_FUNCTION_NAME combiner

#include "simstruc.h"

/* mdlInitializeSizes - initialize the sizes array */
static void mdlInitializeSizes(SimStruct *S)
{
    ssSetNumContStates(    S, 0);   /* number of continuous states            */
    ssSetNumDiscStates(    S, 0);   /* number of discrete states              */
    ssSetNumInputs(        S, 30);  /* number of inputs                       */
    ssSetNumOutputs(       S, 15);  /* number of outputs                      */
    ssSetDirectFeedThrough(S, 1);   /* direct feedthrough flag                */
    ssSetNumSampleTimes(   S, 1);   /* number of sample times                 */
    ssSetNumSFcnParams(    S, 0);   /* number of input arguments              */
    ssSetNumRWork(         S, 0);   /* number of real work vector elements    */
    ssSetNumIWork(         S, 0);   /* number of integer work vector elements */
    ssSetNumPWork(         S, 0);   /* number of pointer work vector elements */
}

/* mdlInitializeSampleTimes - initialize the sample times array */
static void mdlInitializeSampleTimes(SimStruct *S)
{
    ssSetSampleTime(S, 0, CONTINUOUS_SAMPLE_TIME);
    ssSetOffsetTime(S, 0, 0.0);
}

/* mdlInitializeConditions - initialize the states */
static void mdlInitializeConditions(double *x0, SimStruct *S)
{
}

/* mdlOutputs - flow-weighted mixing of the two streams:
   concentrations u[0..13] with flow u[14], and u[15..28] with flow u[29] */
static void mdlOutputs(double *y, double *x, double *u, SimStruct *S, int tid)
{
    int i;

    for (i = 0; i < 14; i++)
        y[i] = (u[i]*u[14] + u[i+15]*u[29])/(u[14] + u[29]);
    y[14] = u[14] + u[29];
}

/* mdlUpdate - perform action at major integration time step */
static void mdlUpdate(double *x, double *u, SimStruct *S, int tid)
{
}

/* mdlDerivatives - compute the derivatives */
static void mdlDerivatives(double *dx, double *x, double *u, SimStruct *S, int tid)
{
}

/* mdlTerminate - called when the simulation is terminated. */
static void mdlTerminate(SimStruct *S)
{
}

#ifdef MATLAB_MEX_FILE   /* Is this file being compiled as a MEX-file? */
#include "simulink.c"    /* MEX-file interface mechanism */
#else
#include "cg_sfun.h"     /* Code generation registration function */
#endif


Appendix C–12: MATLAB C Program for the First-order Reaction Flow, hyddelayv2.c

/*
 * hyddelayv2 is a C-file S-function for a first-order reaction of flow and
 * concentrations. In this version the loads are first calculated and the
 * first-order reaction is applied to the load and the flow; the
 * concentrations are then recalculated from the delayed flow and load.
 * Better according to Jens Alex.
 */

#define S_FUNCTION_NAME hyddelayv2

#include "simstruc.h"

#define XINIT ssGetArg(S,0)
#define PAR   ssGetArg(S,1)
#define T     ssGetArg(S,2)

/* mdlInitializeSizes - initialize the sizes array */
static void mdlInitializeSizes(SimStruct *S)
{
    ssSetNumContStates(    S, 14);  /* number of continuous states            */
    ssSetNumDiscStates(    S, 0);   /* number of discrete states              */
    ssSetNumInputs(        S, 15);  /* number of inputs                       */
    ssSetNumOutputs(       S, 15);  /* number of outputs                      */
    ssSetDirectFeedThrough(S, 0);   /* direct feedthrough flag                */
    ssSetNumSampleTimes(   S, 1);   /* number of sample times                 */
    ssSetNumSFcnParams(    S, 3);   /* number of input arguments              */
    ssSetNumRWork(         S, 0);   /* number of real work vector elements    */
    ssSetNumIWork(         S, 0);   /* number of integer work vector elements */
    ssSetNumPWork(         S, 0);   /* number of pointer work vector elements */
}

/* mdlInitializeSampleTimes - initialize the sample times array */
static void mdlInitializeSampleTimes(SimStruct *S)
{
    ssSetSampleTime(S, 0, CONTINUOUS_SAMPLE_TIME);
    ssSetOffsetTime(S, 0, 0.0);
}

/* mdlInitializeConditions - initialize the states */
static void mdlInitializeConditions(double *x0, SimStruct *S)
{
    int i;
    for (i = 0; i < 14; i++)
        x0[i] = mxGetPr(XINIT)[i];
}

/* mdlOutputs - recover concentrations from the delayed loads x[0..12] and
   the delayed flow x[13]; y[13] is the TSS and y[14] the flow */
static void mdlOutputs(double *y, double *x, double *u, SimStruct *S, int tid)
{
    double X_I2TSS, X_S2TSS, X_BH2TSS, X_BA2TSS, X_P2TSS;
    int i;

    X_I2TSS  = mxGetPr(PAR)[19];
    X_S2TSS  = mxGetPr(PAR)[20];
    X_BH2TSS = mxGetPr(PAR)[21];
    X_BA2TSS = mxGetPr(PAR)[22];
    X_P2TSS  = mxGetPr(PAR)[23];

    for (i = 0; i < 13; i++)
        y[i] = x[i]/x[13];

    y[13] = (X_I2TSS*x[2] + X_S2TSS*x[3] + X_BH2TSS*x[4]
             + X_BA2TSS*x[5] + X_P2TSS*x[6])/x[13];
    y[14] = x[13];
}

/* mdlUpdate - perform action at major integration time step */
static void mdlUpdate(double *x, double *u, SimStruct *S, int tid)
{
}

/* mdlDerivatives - first-order lag on the loads u[i]*u[14] and the flow u[14] */
static void mdlDerivatives(double *dx, double *x, double *u, SimStruct *S, int tid)
{
    int i;
    double timeconst;

    timeconst = mxGetPr(T)[0];
    if (timeconst > 0.000001) {
        for (i = 0; i < 13; i++)
            dx[i] = (u[i]*u[14] - x[i])/timeconst;
        dx[13] = (u[14] - x[13])/timeconst;
    }
    else {   /* no delay: track the inputs exactly */
        for (i = 0; i < 13; i++) {
            dx[i] = 0;
            x[i]  = u[i]*u[14];
        }
        dx[13] = 0;
        x[13]  = u[14];
    }
}

/* mdlTerminate - called when the simulation is terminated. */
static void mdlTerminate(SimStruct *S)
{
}

#ifdef MATLAB_MEX_FILE   /* Is this file being compiled as a MEX-file? */
#include "simulink.c"    /* MEX-file interface mechanism */
#else
#include "cg_sfun.h"     /* Code generation registration function */
#endif


Appendix C–13: MATLAB C Program for the BSM No.1 for the Bioreactors,

asm1.c

/*
 * asm1 is a C-file S-function for the IAWQ Activated Sludge Model No. 1.
 */

#define S_FUNCTION_NAME asm1

#include "simstruc.h"
#include <math.h>

#define XINIT ssGetArg(S,0)
#define PAR   ssGetArg(S,1)
#define V     ssGetArg(S,2)
#define SOSAT ssGetArg(S,3)

/* mdlInitializeSizes - initialize the sizes array */
static void mdlInitializeSizes(SimStruct *S)
{
    ssSetNumContStates(    S, 13);  /* 13 continuous states                   */
    ssSetNumDiscStates(    S, 0);   /* number of discrete states              */
    ssSetNumInputs(        S, 16);  /* number of inputs                       */
    ssSetNumOutputs(       S, 15);  /* number of outputs                      */
    ssSetDirectFeedThrough(S, 1);   /* direct feedthrough flag                */
    ssSetNumSampleTimes(   S, 1);   /* number of sample times                 */
    ssSetNumSFcnParams(    S, 4);   /* number of input arguments              */
    ssSetNumRWork(         S, 0);   /* number of real work vector elements    */
    ssSetNumIWork(         S, 0);   /* number of integer work vector elements */
    ssSetNumPWork(         S, 0);   /* number of pointer work vector elements */
}

/* mdlInitializeSampleTimes - initialize the sample times array */
static void mdlInitializeSampleTimes(SimStruct *S)
{
    ssSetSampleTime(S, 0, CONTINUOUS_SAMPLE_TIME);
    ssSetOffsetTime(S, 0, 0.0);
}

/* mdlInitializeConditions - initialize the states */
static void mdlInitializeConditions(double *x0, SimStruct *S)
{
    int i;
    for (i = 0; i < 13; i++)
        x0[i] = mxGetPr(XINIT)[i];
}

/* mdlOutputs - pass the 13 ASM1 states through and append TSS and flow */
static void mdlOutputs(double *y, double *x, double *u, SimStruct *S, int tid)
{
    double X_I2TSS, X_S2TSS, X_BH2TSS, X_BA2TSS, X_P2TSS;
    int i;

    X_I2TSS  = mxGetPr(PAR)[19];
    X_S2TSS  = mxGetPr(PAR)[20];
    X_BH2TSS = mxGetPr(PAR)[21];
    X_BA2TSS = mxGetPr(PAR)[22];
    X_P2TSS  = mxGetPr(PAR)[23];

    for (i = 0; i < 13; i++)
        y[i] = x[i];

    y[13] = X_I2TSS*x[2] + X_S2TSS*x[3] + X_BH2TSS*x[4]
            + X_BA2TSS*x[5] + X_P2TSS*x[6];
    y[14] = u[14];   /* x[13] */
}

/* mdlUpdate - perform action at major integration time step */
static void mdlUpdate(double *x, double *u, SimStruct *S, int tid)
{
}

/* mdlDerivatives - ASM1 process rates proc1..proc8, conversion rates
   reac1..reac13 (cf. (C.9)-(C.21)) and the mass balances (C.24)/(C.26) */
static void mdlDerivatives(double *dx, double *x, double *u, SimStruct *S, int tid)
{
    double mu_H, K_S, K_OH, K_NO, b_H, mu_A, K_NH, K_OA, b_A, ny_g, k_a, k_h, K_X, ny_h;
    double Y_H, Y_A, f_P, i_XB, i_XP;
    double proc1, proc2, proc3, proc4, proc5, proc6, proc7, proc8;
    double reac1, reac2, reac3, reac4, reac5, reac6, reac7, reac8, reac9,
           reac10, reac11, reac12, reac13;
    double vol, SO_sat;
    double xtemp[13];
    int i;

    mu_H = mxGetPr(PAR)[0];
    K_S  = mxGetPr(PAR)[1];
    K_OH = mxGetPr(PAR)[2];
    K_NO = mxGetPr(PAR)[3];
    b_H  = mxGetPr(PAR)[4];
    mu_A = mxGetPr(PAR)[5];
    K_NH = mxGetPr(PAR)[6];
    K_OA = mxGetPr(PAR)[7];
    b_A  = mxGetPr(PAR)[8];
    ny_g = mxGetPr(PAR)[9];
    k_a  = mxGetPr(PAR)[10];
    k_h  = mxGetPr(PAR)[11];
    K_X  = mxGetPr(PAR)[12];
    ny_h = mxGetPr(PAR)[13];
    Y_H  = mxGetPr(PAR)[14];
    Y_A  = mxGetPr(PAR)[15];
    f_P  = mxGetPr(PAR)[16];
    i_XB = mxGetPr(PAR)[17];
    i_XP = mxGetPr(PAR)[18];
    vol    = mxGetPr(V)[0];
    SO_sat = mxGetPr(SOSAT)[0];

    for (i = 0; i < 13; i++) {
        if (x[i] < 0)
            xtemp[i] = 0;
        else
            xtemp[i] = x[i];
    }

    if (u[15] < 0)
        x[7] = fabs(u[15]);

    proc1 = mu_H*(xtemp[1]/(K_S+xtemp[1]))*(xtemp[7]/(K_OH+xtemp[7]))*xtemp[4];
    proc2 = mu_H*(xtemp[1]/(K_S+xtemp[1]))*(K_OH/(K_OH+xtemp[7]))
            *(xtemp[8]/(K_NO+xtemp[8]))*ny_g*xtemp[4];
    proc3 = mu_A*(xtemp[9]/(K_NH+xtemp[9]))*(xtemp[7]/(K_OA+xtemp[7]))*xtemp[5];
    /* in GPS-X a variant proc3x is used instead of proc3 in the oxygen equation:
       proc3x = mu_A*(xtemp[9]/(K_NH+xtemp[9]))*(xtemp[7]/(K_OH+xtemp[7]))*xtemp[5]; */
    proc4 = b_H*xtemp[4];
    proc5 = b_A*xtemp[5];
    proc6 = k_a*xtemp[10]*xtemp[4];
    proc7 = k_h*((xtemp[3]/xtemp[4])/(K_X+(xtemp[3]/xtemp[4])))
            *((xtemp[7]/(K_OH+xtemp[7]))
              + ny_h*(K_OH/(K_OH+xtemp[7]))*(xtemp[8]/(K_NO+xtemp[8])))*xtemp[4];
    proc8 = proc7*xtemp[11]/xtemp[3];

    reac1  = 0;
    reac2  = (-proc1-proc2)/Y_H + proc7;
    reac3  = 0;
    reac4  = (1-f_P)*(proc4+proc5) - proc7;
    reac5  = proc1 + proc2 - proc4;
    reac6  = proc3 - proc5;
    reac7  = f_P*(proc4+proc5);
    reac8  = -((1-Y_H)/Y_H)*proc1 - ((4.57-Y_A)/Y_A)*proc3;
    reac9  = -((1-Y_H)/(2.86*Y_H))*proc2 + proc3/Y_A;
    reac10 = -i_XB*(proc1+proc2) - (i_XB+(1/Y_A))*proc3 + proc6;
    reac11 = -proc6 + proc8;
    reac12 = (i_XB-f_P*i_XP)*(proc4+proc5) - proc8;
    reac13 = -i_XB/14*proc1 + ((1-Y_H)/(14*2.86*Y_H)-(i_XB/14))*proc2
             - ((i_XB/14)+1/(7*Y_A))*proc3 + proc6/14;

    dx[0] = 1/vol*(u[14]*(u[0]-x[0])) + reac1;
    dx[1] = 1/vol*(u[14]*(u[1]-x[1])) + reac2;
    dx[2] = 1/vol*(u[14]*(u[2]-x[2])) + reac3;
    dx[3] = 1/vol*(u[14]*(u[3]-x[3])) + reac4;
    dx[4] = 1/vol*(u[14]*(u[4]-x[4])) + reac5;
    dx[5] = 1/vol*(u[14]*(u[5]-x[5])) + reac6;
    dx[6] = 1/vol*(u[14]*(u[6]-x[6])) + reac7;
    if (u[15] < 0)
        dx[7] = 0;
    else
        dx[7] = 1/vol*(u[14]*(u[7]-x[7])) + reac8 + u[15]*(SO_sat-x[7]);
    dx[8]  = 1/vol*(u[14]*(u[8]-x[8]))   + reac9;
    dx[9]  = 1/vol*(u[14]*(u[9]-x[9]))   + reac10;
    dx[10] = 1/vol*(u[14]*(u[10]-x[10])) + reac11;
    dx[11] = 1/vol*(u[14]*(u[11]-x[11])) + reac12;
    dx[12] = 1/vol*(u[14]*(u[12]-x[12])) + reac13;
    /* dx[13] = (u[14]-x[13])/T;  low-pass filter for flow, avoids algebraic loops */
}

/* mdlTerminate - called when the simulation is terminated. */
static void mdlTerminate(SimStruct *S)
{
}

#ifdef MATLAB_MEX_FILE   /* Is this file being compiled as a MEX-file? */
#include "simulink.c"    /* MEX-file interface mechanism */
#else
#include "cg_sfun.h"     /* Code generation registration function */
#endif


Appendix C–14: MATLAB C Program for the Secondary Settler, settler1dv4.c

/*
 * settler1dv4 is a C-file S-function defining a 10-layer settler model.
 * It can simulate 0, 1 or 10 layers for the solubles by using MODELTYPE.
 */

#define S_FUNCTION_NAME settler1dv4

#include "simstruc.h"
#include <math.h>

#define XINIT     ssGetArg(S,0)
#define PAR       ssGetArg(S,1)
#define DIM       ssGetArg(S,2)
#define LAYER     ssGetArg(S,3)
#define MODELTYPE ssGetArg(S,4)

/* mdlInitializeSizes - initialize the sizes array */
static void mdlInitializeSizes(SimStruct *S)
{
    ssSetNumContStates(    S, 80);  /* 10 TSS layers + 7 solubles x 10 layers */
    ssSetNumDiscStates(    S, 0);   /* number of discrete states              */
    ssSetNumInputs(        S, 17);  /* number of inputs                       */
    ssSetNumOutputs(       S, 113); /* number of outputs                      */
    ssSetDirectFeedThrough(S, 1);   /* direct feedthrough flag                */
    ssSetNumSampleTimes(   S, 1);   /* number of sample times                 */
    ssSetNumSFcnParams(    S, 5);   /* number of input arguments              */
    ssSetNumRWork(         S, 0);   /* number of real work vector elements    */
    ssSetNumIWork(         S, 0);   /* number of integer work vector elements */
    ssSetNumPWork(         S, 0);   /* number of pointer work vector elements */
}

/* mdlInitializeSampleTimes - initialize the sample times array */
static void mdlInitializeSampleTimes(SimStruct *S)
{
    ssSetSampleTime(S, 0, CONTINUOUS_SAMPLE_TIME);
    ssSetOffsetTime(S, 0, 0.0);
}

/* mdlInitializeConditions - initialize the states */
static void mdlInitializeConditions(double *x0, SimStruct *S)
{
    int i;
    for (i = 0; i < 80; i++)
        x0[i] = mxGetPr(XINIT)[i];
}

/* mdlOutputs - underflow (y[0..15]), effluent (y[16..30]) and internal
   states (y[31..112]); in this file layer 0 is the top layer and layer 9
   the bottom layer */
static void mdlOutputs(double *y, double *x, double *u, SimStruct *S, int tid)
{
    double gamma, gamma_eff, modeltype;
    int i;

    gamma     = x[9]/u[13];     /* bottom-layer TSS / feed TSS */
    gamma_eff = x[0]/u[13];     /* top-layer TSS / feed TSS    */
    modeltype = mxGetPr(MODELTYPE)[0];

    /* particulate fractions, TSS and flows are common to all model types */
    for (i = 2; i < 7; i++)     /* underflow particulates */
        y[i] = u[i]*gamma;
    y[11] = u[11]*gamma;
    y[13] = x[9];
    y[14] = u[15];              /* Q_r */
    y[15] = u[16];              /* Q_w */
    for (i = 18; i < 23; i++)   /* effluent particulates */
        y[i] = u[i-16]*gamma_eff;
    y[27] = u[11]*gamma_eff;
    y[29] = x[0];
    y[30] = u[14]-u[15]-u[16];  /* Q_e */
    for (i = 0; i < 10; i++)    /* internal TSS states */
        y[31+i] = x[i];
    y[41] = gamma;
    y[42] = gamma_eff;

    if (modeltype < 0.5) {      /* 10 layers for the solubles */
        y[0] = x[19];  y[1] = x[29];  y[7] = x[39];   /* use oxygen in return sludge flow */
        y[8] = x[49];  y[9] = x[59];  y[10] = x[69];  y[12] = x[79];
        y[16] = x[10]; y[17] = x[20]; y[23] = x[30];  /* use oxygen in effluent flow */
        y[24] = x[40]; y[25] = x[50]; y[26] = x[60];  y[28] = x[70];
        for (i = 10; i < 80; i++)
            y[i+33] = x[i];
    }
    else if ((modeltype > 0.5) && (modeltype < 1.5)) {  /* 1 mixed layer */
        y[0] = x[10];  y[1] = x[20];  y[7] = x[30];
        y[8] = x[40];  y[9] = x[50];  y[10] = x[60];  y[12] = x[70];
        y[16] = x[10]; y[17] = x[20]; y[23] = x[30];
        y[24] = x[40]; y[25] = x[50]; y[26] = x[60];  y[28] = x[70];
        for (i = 10; i < 80; i++)
            y[i+33] = x[10*(i/10)];   /* replicate the representative state */
    }
    else {                      /* 0 layers: solubles pass straight through */
        y[0] = u[0];   y[1] = u[1];   y[7] = u[7];
        y[8] = u[8];   y[9] = u[9];   y[10] = u[10];  y[12] = u[12];
        y[16] = u[0];  y[17] = u[1];  y[23] = u[7];
        y[24] = u[8];  y[25] = u[9];  y[26] = u[10];  y[28] = u[12];
        for (i = 10; i < 20; i++) y[i+33] = u[0];
        for (i = 20; i < 30; i++) y[i+33] = u[1];
        for (i = 30; i < 40; i++) y[i+33] = u[7];
        for (i = 40; i < 50; i++) y[i+33] = u[8];
        for (i = 50; i < 60; i++) y[i+33] = u[9];
        for (i = 60; i < 70; i++) y[i+33] = u[10];
        for (i = 70; i < 80; i++) y[i+33] = u[12];
    }
}

/* mdlUpdate - perform action at major integration time step */
static void mdlUpdate(double *x, double *u, SimStruct *S, int tid)
{
}

/* mdlDerivatives - compute the derivatives */
static void mdlDerivatives(double *dx, double *x, double *u, SimStruct *S, int tid)
{
    double v0_max, v0, r_h, r_p, f_ns, X_t, area, h, feedlayer, volume, modeltype;
    double Q_f, Q_e, Q_u, v_up, v_dn, v_in, eps;
    static const int uidx[7] = {0, 1, 7, 8, 9, 10, 12};  /* inputs feeding S_I..S_ALK */
    int i, c, base;
    double vs[10], Js[11], Jstemp[10], Jflow[11];

    v0_max = mxGetPr(PAR)[0];
    v0     = mxGetPr(PAR)[1];
    r_h    = mxGetPr(PAR)[2];
    r_p    = mxGetPr(PAR)[3];
    f_ns   = mxGetPr(PAR)[4];
    X_t    = mxGetPr(PAR)[5];
    area      = mxGetPr(DIM)[0];
    h         = mxGetPr(DIM)[1]/mxGetPr(LAYER)[1];
    feedlayer = mxGetPr(LAYER)[0];
    modeltype = mxGetPr(MODELTYPE)[0];
    volume    = area*mxGetPr(DIM)[1];

    eps  = 0.01;
    v_in = u[14]/area;
    Q_f  = u[14];
    Q_u  = u[15] + u[16];
    Q_e  = u[14] - Q_u;
    v_up = Q_e/area;
    v_dn = Q_u/area;

    /* double-exponential settling velocity (C.32), clipped to [0, v0_max] */
    for (i = 0; i < 10; i++) {
        vs[i] = v0*(exp(-r_h*(x[i]-f_ns*u[13])) - exp(-r_p*(x[i]-f_ns*u[13])));
        if (vs[i] > v0_max)
            vs[i] = v0_max;
        else if (vs[i] < 0)
            vs[i] = 0;
    }

    for (i = 0; i < 10; i++)
        Jstemp[i] = vs[i]*x[i];

    for (i = 0; i < 11; i++) {
        if (i < (feedlayer-eps))
            Jflow[i] = v_up*x[i];
        else
            Jflow[i] = v_dn*x[i-1];
    }

    Js[0]  = 0;
    Js[10] = 0;
    for (i = 0; i < 9; i++) {
        if ((i < (feedlayer-1-eps)) && (x[i+1] <= X_t))
            Js[i+1] = Jstemp[i];
        else if (Jstemp[i] < Jstemp[i+1])
            Js[i+1] = Jstemp[i];
        else
            Js[i+1] = Jstemp[i+1];
    }

    /* sludge balances, cf. (C.33)-(C.37) */
    for (i = 0; i < 10; i++) {
        if (i < (feedlayer-1-eps))
            dx[i] = (-Jflow[i]+Jflow[i+1]+Js[i]-Js[i+1])/h;
        else if (i > (feedlayer-eps))
            dx[i] = (Jflow[i]-Jflow[i+1]+Js[i]-Js[i+1])/h;
        else
            dx[i] = (v_in*u[13]-Jflow[i]-Jflow[i+1]+Js[i]-Js[i+1])/h;
    }

    /* soluble components S_I, S_S, S_O, S_NO, S_NH, S_ND, S_ALK stored at
       state offsets 10, 20, ..., 70 and fed by inputs uidx[c] */
    for (c = 0; c < 7; c++) {
        base = 10*(c+1);
        if (modeltype < 0.5) {                              /* 10 layers (COST) */
            for (i = base; i < base+10; i++) {
                if (i < (feedlayer-1+base-eps))
                    dx[i] = (-v_up*x[i]+v_up*x[i+1])/h;
                else if (i > (feedlayer+base-eps))
                    dx[i] = (v_dn*x[i-1]-v_dn*x[i])/h;
                else
                    dx[i] = (v_in*u[uidx[c]]-v_up*x[i]-v_dn*x[i])/h;
            }
        }
        else if ((modeltype > 0.5) && (modeltype < 1.5)) {  /* 1 mixed layer */
            dx[base] = (Q_f*(u[uidx[c]]-x[base]))/volume;
            for (i = base+1; i < base+10; i++)
                dx[i] = 0;
        }
        else {                                              /* 0 layers */
            for (i = base; i < base+10; i++)
                dx[i] = 0;
        }
    }
}

/* mdlTerminate - called when the simulation is terminated. */
static void mdlTerminate(SimStruct *S)
{
}

#ifdef MATLAB_MEX_FILE   /* Is this file being compiled as a MEX-file? */
#include "simulink.c"    /* MEX-file interface mechanism */
#else
#include "cg_sfun.h"     /* Code generation registration function */
#endif


Appendix C–15: Initialization of the Benchmark Simulation Model no. 1 (BSM1)

Qin0 = 18446; Qintr = 3*Qin0;

S_I1 = 30; S_S1 = 3.2439; X_I1 = 1149.1683; X_S1 = 98.6029; X_BH1 = 2552.1095; X_BA1 = 151.6721; X_P1 = 446.9249; S_O1 = 0.0076964; S_NO1 = 3.5133; S_NH1 = 11.8312; S_ND1 = 1.3621; X_ND1 = 6.1775; S_ALK1 = 5.3399; Q1 = Qin0*2+Qintr;

S_I2 = 30; S_S2 = 1.6707; X_I2 = 1149.1683; X_S2 = 91.7032; X_BH2 = 2552.3711; X_BA2 = 151.5303; X_P2 = 448.0838; S_O2 = 6.0271e-05; S_NO2 = 1; S_NH2 = 12.5482; S_ND2 = 0.78899; X_ND2 = 5.9537; S_ALK2 = 5.5706; Q2 = Qin0*2+Qintr;

S_I3 = 30; S_S3 = 1.2195; X_I3 = 1149.1683; X_S3 = 69.6594; X_BH3 = 2560.2025; X_BA3 = 152.6873; X_P3 = 449.6336; S_O3 = 1.635; S_NO3 = 6.2289; S_NH3 = 7.3197; S_ND3 = 0.8307; X_ND3 = 4.7131; S_ALK3 = 4.8236; Q3 = Qin0*2+Qintr;

S_I4 = 30; S_S4 = 0.97326; X_I4 = 1149.1683; X_S4 = 54.4484; X_BH4 = 2563.3104; X_BA4 = 153.7108; X_P4 = 451.1852;


S_O4 = 2.4745; S_NO4 = 11.0693; S_NH4 = 2.7825; S_ND4 = 0.75276; X_ND4 = 3.8403; S_ALK4 = 4.1538; Q4 = Qin0*2+Qintr;

S_I5 = 30; S_S5 = 0.80801; X_I5 = 1149.1683; X_S5 = 44.4828; X_BH5 = 2562.8514; X_BA5 = 154.163; X_P5 = 452.7367; S_O5 = 2; S_NO5 = 13.5243; S_NH5 = 0.67193; S_ND5 = 0.6645; X_ND5 = 3.2605; S_ALK5 = 3.8277; Q5 = Qin0*2+Qintr;

XINIT1 = [ S_I1 S_S1 X_I1 X_S1 X_BH1 X_BA1 X_P1 S_O1 S_NO1 S_NH1 S_ND1 X_ND1 S_ALK1 Q1 ];
XINIT2 = [ S_I2 S_S2 X_I2 X_S2 X_BH2 X_BA2 X_P2 S_O2 S_NO2 S_NH2 S_ND2 X_ND2 S_ALK2 Q2 ];
XINIT3 = [ S_I3 S_S3 X_I3 X_S3 X_BH3 X_BA3 X_P3 S_O3 S_NO3 S_NH3 S_ND3 X_ND3 S_ALK3 Q3 ];
XINIT4 = [ S_I4 S_S4 X_I4 X_S4 X_BH4 X_BA4 X_P4 S_O4 S_NO4 S_NH4 S_ND4 X_ND4 S_ALK4 Q4 ];
XINIT5 = [ S_I5 S_S5 X_I5 X_S5 X_BH5 X_BA5 X_P5 S_O5 S_NO5 S_NH5 S_ND5 X_ND5 S_ALK5 Q5 ];

XINIT1 = XINIT1.*(rand(1, 14)/2); XINIT2 = XINIT2.*(rand(1, 14)/2); XINIT3 = XINIT3.*(rand(1, 14)/2); XINIT4 = XINIT4.*(rand(1, 14)/2); XINIT5 = XINIT5.*(rand(1, 14)/2);

% temperature = 15 degrees Celsius
% mu_H = 4.0;  K_S = 10.0;  K_OH = 0.20;  K_NO = 0.50;  b_H = 0.3;
% mu_A = 0.5;  K_NH = 1.0;  K_OA = 0.4;   b_A = 0.05;   ny_g = 0.8;
% k_a = 0.05;  k_h = 3.0;   K_X = 0.1;    ny_h = 0.8;
% else temperature = 10 degrees Celsius
mu_H = 3.0;
K_S = 20.0;
K_OH = 0.20;
K_NO = 0.50;
b_H = 0.20;
mu_A = 0.3;
K_NH = 1.0;
K_OA = 0.4;
b_A = 0.05;
ny_g = 0.8;
k_a = 0.04;
k_h = 1.0;
K_X = 0.01;
ny_h = 0.4;
% else temperature = 20 degrees Celsius
% mu_H = 6.0;  K_S = 20.0;  K_OH = 0.20;  K_NO = 0.50;  b_H = 0.62;
% mu_A = 0.8;  K_NH = 1.0;  K_OA = 0.4;   b_A = 0.05;   ny_g = 0.8;
% k_a = 0.08;  k_h = 3.0;   K_X = 0.03;   ny_h = 0.4;

Y_H = 0.67; Y_A = 0.24; f_P = 0.08; i_XB = 0.08; i_XP = 0.06;

X_I2TSS = 0.75; X_S2TSS = 0.75; X_BH2TSS = 0.75; X_BA2TSS = 0.75; X_P2TSS = 0.75;

PAR1 = [ mu_H K_S K_OH K_NO b_H mu_A K_NH K_OA b_A ny_g k_a k_h K_X ny_h ...
         Y_H Y_A f_P i_XB i_XP X_I2TSS X_S2TSS X_BH2TSS X_BA2TSS X_P2TSS ];
PAR2 = PAR1;
PAR3 = PAR1;
PAR4 = PAR1;
PAR5 = PAR1;

VOL1 = 1000; VOL2 = VOL1; VOL3 = 1333; VOL4 = VOL3; VOL5 = VOL3;

SOSAT1 = 8; SOSAT2 = SOSAT1; SOSAT3 = SOSAT1; SOSAT4 = SOSAT1; SOSAT5 = SOSAT1;

KLa1 = 0; KLa2 = 0; KLa3 = 10; KLa4 = 10; KLa5 = 10;

delaySNO2 = 1/144; T = 0.0001; QintrT = T*5;


Appendix C–16: Initialization of the Benchmark Simulation Model no. 1 (BSM1)

Secondary Clarifier (Settler)

TSS_1 = 12.5016; TSS_2 = 18.1183; TSS_3 = 29.548; TSS_4 = 69.0015; TSS_5 = 356.2825; TSS_6 = 356.2825; TSS_7 = 356.2825; TSS_8 = 356.2825; TSS_9 = 356.2825; TSS_10 = 6399.2981;

SI_1 = 30; SI_2 = 30; SI_3 = 30; SI_4 = 30; SI_5 = 30; SI_6 = 30; SI_7 = 30; SI_8 = 30; SI_9 = 30; SI_10 = 30;

SS_1 = 0.80801; SS_2 = 0.80801; SS_3 = 0.80801; SS_4 = 0.80801; SS_5 = 0.80801; SS_6 = 0.80801; SS_7 = 0.80801; SS_8 = 0.80801; SS_9 = 0.80801; SS_10 = 0.80801;

SO_1 = 2; SO_2 = 2; SO_3 = 2; SO_4 = 2; SO_5 = 2; SO_6 = 2; SO_7 = 2; SO_8 = 2; SO_9 = 2; SO_10 = 2;

SNO_1 = 13.5243; SNO_2 = 13.5243; SNO_3 = 13.5243; SNO_4 = 13.5243; SNO_5 = 13.5243; SNO_6 = 13.5243; SNO_7 = 13.5243; SNO_8 = 13.5243; SNO_9 = 13.5243; SNO_10 = 13.5243;


SNH_1 = 0.67193; SNH_2 = 0.67193; SNH_3 = 0.67193; SNH_4 = 0.67193; SNH_5 = 0.67193; SNH_6 = 0.67193; SNH_7 = 0.67193; SNH_8 = 0.67193; SNH_9 = 0.67193; SNH_10 = 0.67193;

SND_1 = 0.6645; SND_2 = 0.6645; SND_3 = 0.6645; SND_4 = 0.6645; SND_5 = 0.6645; SND_6 = 0.6645; SND_7 = 0.6645; SND_8 = 0.6645; SND_9 = 0.6645; SND_10 = 0.6645;

SALK_1 = 3.8277; SALK_2 = 3.8277; SALK_3 = 3.8277; SALK_4 = 3.8277; SALK_5 = 3.8277; SALK_6 = 3.8277; SALK_7 = 3.8277; SALK_8 = 3.8277; SALK_9 = 3.8277; SALK_10 = 3.8277;

SETTLERINIT = [ TSS_1 TSS_2 TSS_3 TSS_4 TSS_5 TSS_6 TSS_7 TSS_8 TSS_9 TSS_10 ...
                SI_1 SI_2 SI_3 SI_4 SI_5 SI_6 SI_7 SI_8 SI_9 SI_10 ...
                SS_1 SS_2 SS_3 SS_4 SS_5 SS_6 SS_7 SS_8 SS_9 SS_10 ...
                SO_1 SO_2 SO_3 SO_4 SO_5 SO_6 SO_7 SO_8 SO_9 SO_10 ...
                SNO_1 SNO_2 SNO_3 SNO_4 SNO_5 SNO_6 SNO_7 SNO_8 SNO_9 SNO_10 ...
                SNH_1 SNH_2 SNH_3 SNH_4 SNH_5 SNH_6 SNH_7 SNH_8 SNH_9 SNH_10 ...
                SND_1 SND_2 SND_3 SND_4 SND_5 SND_6 SND_7 SND_8 SND_9 SND_10 ...
                SALK_1 SALK_2 SALK_3 SALK_4 SALK_5 SALK_6 SALK_7 SALK_8 SALK_9 SALK_10 ];

v0_max = 250; v0 = 474; r_h = 0.000576; r_p = 0.00286; f_ns = 0.00228; X_t = 3000;

SETTLERPAR = [ v0_max v0 r_h r_p f_ns X_t ];

area = 1500; height = 4;

DIM = [ area height ]; feedlayer = 5;


nooflayers = 10;

LAYER = [ feedlayer nooflayers ];
% to use model with 10 layers for solubles use type 0 (COST Benchmark)
% to use model with 1 layer for solubles use type 1 (GPS-X implementation)
% to use model with 0 layers for solubles use type 2 (WEST implementation)

MODELTYPE = [ 0 ];


Appendix C–17: Initialization of the Dissolved Oxygen and Nitrate PI-Controller

% continuous PI O2-controller
KSO5 = 500;              % Amplification
TiSO5 = 0.001;           % I-part time constant, integral time constant
TtSO5 = 0.0002;          % Antiwindup time constant, tracking time constant
SO5intstate = 50;        % initial value of I-part
SO5awstate = 0;          % initial value of antiwindup I-part
SO5ref = 2.5;            % setpoint for controller
useantiwindupSO5 = 1;    % 0=no antiwindup, 1=use antiwindup for oxygen control

% continuous PI Qintr-controller
KQintr = 15000;          % Amplification
TiQintr = 0.05;          % I-part time constant, integral time constant
TtQintr = 0.03;          % Antiwindup time constant, tracking time constant
Qintrintstate = 0;       % initial value of I-part
Qintrawstate = 0;        % initial value of antiwindup I-part
SNO2delayinit = 1;       % initial value of delayed measurement value
SNO2ref = 1;             % setpoint for controller
useantiwindupQintr = 1;  % 0=no antiwindup, 1=use antiwindup for Qintr control
Kfeedforward = 0;        % 1.2 Amp. for feedforward of Qin to Qintr (0=off)
                         % K=Kfeedforward*(SNOref/(SNOref+1))*(Qffref*Qin0-55338)
Qffref = 3;

usenoiseSNO2 = 1; % 0=no noise, 1=use noise for nitrate sensor

noiseseedSNO2 = 1; noisevarianceSNO2 = 0.01; noisemeanSNO2 = 0;


Appendix C–18: Food-to-Microorganism Ratio

% -------- Compute the influent Quality Index (Decision parameter) --------

% Beta factor specification
BTSS = 2;
BCOD = 1;
BNK_j = 20;
BNO = 20;
BBOD = 2;

% Compute the decision functions
SNO_inf = SNO_inf_tr;
TKN_inf = SNH_inf_tr + SND_inf_tr + XND_inf_tr + i_XB*(XBH_inf_tr + XBA_inf_tr) ...
          + i_XP*(XP_inf_tr + XI_inf_tr);
TSS_inf = 0.75*(XS_inf_tr + XI_inf_tr + XBH_inf_tr + XBA_inf_tr + XP_inf_tr);
BOD_inf = 0.65*(SS_inf_tr + XS_inf_tr + 0.94*(XBH_inf_tr + XBA_inf_tr));
COD_inf = SS_inf_tr + SI_inf_tr + XS_inf_tr + XI_inf_tr + XBH_inf_tr ...
          + XBA_inf_tr + XP_inf_tr;
Q_inf = Qin_tr;

% Compute total nitrogen
%Ntotal_inf = TKN_inf + SNO_inf;

% Compute influent quality
inf_qua = (BNK_j*TKN_inf + BTSS*TSS_inf + BCOD*COD_inf + BBOD*BOD_inf ...
           + BNO*SNO_inf).*Q_inf/1000;

% Compute the total sludge
tot_slu = TSS_r1_tr*VOL1 + TSS_r2_tr*VOL2 + TSS_r3_tr*VOL3 + TSS_r4_tr*VOL4 ...
          + TSS_r5_tr*VOL5;

% Compute the food-to-microorganisms ratio
F_to_M = Qin_tr.*BOD_inf./tot_slu;

% Compute the amount of microorganisms (MLVSS)
MLVSS = BOD_inf./F_to_M;

%====================================================================

startindex = max(find(t <= starttime));
stopindex = min(find(t >= stoptime));

time = t(startindex:stopindex);
feedx = feed(startindex:stopindex,:);
recx = rec(startindex:stopindex,:);
settlerx = settler(startindex:stopindex,:);
inx = in(startindex:stopindex,:);

[n,m] = size(reac1);

BODinfluent = 0.65.*(inx(n,2)+inx(n,4)+0.92.*(inx(n,5)+inx(n,6)));
sludge = reac1(n,14)*VOL1 + reac2(n,14)*VOL2 + reac3(n,14)*VOL3 ...
         + reac4(n,14)*VOL4 + reac5(n,14)*VOL5;
F_to_M = inx(n,15).*BODinfluent/sludge


Appendix C–19: Computation of the Sludge Age

[n,m] = size(reac1);
sludge = reac1(n,14)*VOL1 + reac2(n,14)*VOL2 + reac3(n,14)*VOL3 ...
         + reac4(n,14)*VOL4 + reac5(n,14)*VOL5;
waste = settler(n,14)*settler(n,16) + settler(n,30)*settler(n,31);
sludge_age = sludge/waste


Appendix C–20: Influent Quality (IQ)

% Beta factor specification
BTSS = 2; BCOD = 1; BNK_j = 20; BNO = 20; BBOD = 2; f_p = 1;

% Compute the decision functions
SNO_0 = SNO_inf_tr;
TKN_j0 = SNH_inf_tr + SND_inf_tr + XND_inf_tr + i_XB*(XBH_inf_tr + XBA_inf_tr) + i_XP*(XP_inf_tr + XI_inf_tr);
TSS_0 = 0.75*(XS_inf_tr + XI_inf_tr + XBH_inf_tr + XBA_inf_tr + XP_inf_tr);
BOD_50 = 0.65*(SS_inf_tr + XS_inf_tr + (1-f_p)*(XBH_inf_tr + XBA_inf_tr));
COD_0 = SS_inf_tr + SI_inf_tr + XS_inf_tr + XI_inf_tr + XBH_inf_tr + XBA_inf_tr + XP_inf_tr;

% Compute total nitrogen
Ntotal_0 = TKN_j0 + SNO_0;
Q_inf = Qin_tr;

% Compute influent quality
inf_quality = (BNK_j*TKN_j0 + BTSS*TSS_0 + BCOD*COD_0 + BBOD*BOD_50 + BNO*SNO_0).*Q_inf/1000;


Appendix C–21: Effluent Quality (EQ)

% --------Compute the effluent Quality Index (Decision parameter)---------

% Beta factor specification
BTSS = 2; BCOD = 1; BNK_j = 20; BNO = 20; BBOD = 2; f_p = 1;

% Compute the decision functions
SNO_e = SNO_se_tr;
TKN_je = SNH_se_tr + SND_e_tr + XND_se_tr + i_XB*(XBH_se_tr + XBA_se_tr) + i_XP*(XP_se_tr + XI_se_tr);
TSS_e = 0.75*(XS_se_tr + XI_se_tr + XBH_se_tr + XBA_se_tr + XP_se_tr);
BOD_5e = 0.65*(SS_se_tr + XS_se_tr + (1-f_p)*(XBH_se_tr + XBA_se_tr));
COD_e = SS_se_tr + SI_se_tr + XS_se_tr + XI_se_tr + XBH_se_tr + XBA_se_tr + XP_se_tr;

% Compute total nitrogen
Ntotal_e = TKN_je + SNO_e;
Q_eff = Qe_tr;

% Compute effluent quality
E_Q = (BNK_j*TKN_je + BTSS*TSS_e + BCOD*COD_e + BBOD*BOD_5e + BNO*SNO_e).*Q_eff/1000;



Fig. D.1: The four right positive control deflections of the nonlinear F-16 aircraft control surfaces, with the direction of positive thrust, roll rate (p), pitch rate (q), yaw rate (r), body axes (x_b, y_b, z_b), velocities (u, v, w) along the body axes, rolling moment (L), pitching moment (M), yawing moment (N), navigation frame (x_n, y_n, z_n), the center of mass O_cm, and the Euler angles (φ, ϑ, ψ) for aileron, elevator and rudder deflections respectively.

APPENDIX D: The Nonlinear F-16 Aircraft Description and Model

Appendix D–1: The Nonlinear F-16 Aircraft Description and Anatomy

The basic F-16 aircraft is shown in Fig. D.1 with the main control components, the body axes and the Euler angles. The main control components considered here are the ailerons, elevators, rudders and the engine thrust. Only the right aileron, elevator and rudder are shown in Fig. D.1. The ailerons control the rotation about the longitudinal axis, the elevators are used to adjust or control the pitch (up/down attitude) about the lateral axis, and the rudders are used to make small directional changes and turns (left/right) about the vertical axis. By controlling the pitch of the aircraft, the elevators also control the angle of attack (α).

The body axes of the F-16 aircraft are conventional. The positive x-axis points forward along the aircraft's longitudinal axis, the positive y-axis is directed along the right wing, and the positive z-axis is normal to the x and y axes and points vertically downward when the aircraft is at rest or in level flight. The origin O_cm is located at the aircraft's center of mass. This defines a right-handed orthogonal body coordinate frame (x_b, y_b, z_b) attached to the aircraft as shown in Fig. D.1. Moments about the x-, y- and z-axis are labeled L, M



Fig. D.2: The navigation frame and the Euler angles.


Fig. D.3: The Euler angles and frame transformation.


Fig. D.4: Definition of the angle of attack and sideslip, α > 0 and β > 0.

and N respectively. The moment axes obey the right-hand rule about each axis. The body rates (p, q, r) and the Euler angles (φ, ϑ, ψ) are also measured using the right-hand rule about each axis.

The orientation of the navigation frame is North, East and Down (x_n, y_n, z_n). The three Euler angle rotations (φ, ϑ, ψ) continuously relate the orientation of the aircraft's body frame to the navigation frame as shown in Figs. D.2 and D.3. As shown in Figs. D.2 and D.3, the navigation coordinate frame is first transformed into the intermediate frame 1 via a rotation about the z_n axis by the angle ψ, which defines the aircraft's heading. This is


followed by a rotation about the new y₁ axis by the angle ϑ, which defines the aircraft's elevation. Finally, the aircraft bank angle φ defines the rotation about the new x₂ axis.
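Written out, this ψ–ϑ–φ (3-2-1) sequence gives the transformation from the navigation frame to the body frame. The elementary rotation matrices below are a standard reconstruction consistent with Figs. D.2 and D.3, not an expression quoted from the original text:

R_{nb} = R_x(\phi)\,R_y(\vartheta)\,R_z(\psi) =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{bmatrix}
\begin{bmatrix} \cos\vartheta & 0 & -\sin\vartheta \\ 0 & 1 & 0 \\ \sin\vartheta & 0 & \cos\vartheta \end{bmatrix}
\begin{bmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}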

The air flow acting on the airframe is responsible for the aerodynamic forces. The air flow is described by the airspeed vector V_T. Its norm is the total velocity V_T = ||V_T||, and its direction relative to the airframe is defined by two angles, namely the angle of attack α and the sideslip angle β, as illustrated in Fig. D.4. As shown in the figure, the angle of attack α is the angle between the projection of the airspeed vector V_T onto the (x_b, z_b) plane and the x_b axis. The sideslip angle β is the angle between the projection of the airspeed vector V_T onto the (x_b, z_b) plane and the airspeed vector itself. The wind axes coordinate system is such that the x_w axis points along the airspeed vector V_T.

Appendix D–2: The Nonlinear F-16 Aircraft Equations of Motion

The six-degree-of-freedom (6-DOF) equations of motion for the F-16 aircraft over a flat non-rotating Earth can be summarized [Stevens and Lewis, 2003] as:

Force Equations:

\dot{u} = rv - qw - g\sin\vartheta + \frac{\bar{q}S}{m}C_x   (D.1)

\dot{v} = pw - ru + g\sin\phi\cos\vartheta + \frac{\bar{q}S}{m}C_y   (D.2)

\dot{w} = qu - pv + g\cos\phi\cos\vartheta + \frac{\bar{q}S}{m}C_z   (D.3)

Moment Equations:

\dot{p} = (J_1 r + J_2 p)q + J_3 L + J_4 N   (D.4)

\dot{q} = J_5 pr - J_6(p^2 - r^2) + J_7 M   (D.5)

\dot{r} = (J_8 p - J_2 r)q + J_4 L + J_9 N   (D.6)

Kinematic Equations:

\dot{\phi} = p + q\tan\vartheta\sin\phi + r\tan\vartheta\cos\phi   (D.7)

\dot{\vartheta} = q\cos\phi - r\sin\phi   (D.8)

\dot{\psi} = q\sin\phi\sec\vartheta + r\cos\phi\sec\vartheta   (D.9)

Navigation Equations:

\dot{p}_N = u\cos\vartheta\cos\psi + v(\sin\phi\sin\vartheta\cos\psi - \cos\phi\sin\psi) + w(\sin\phi\sin\psi + \cos\phi\sin\vartheta\cos\psi)   (D.10)

\dot{p}_E = u\cos\vartheta\sin\psi + v(\cos\phi\cos\psi + \sin\phi\sin\vartheta\sin\psi) + w(\cos\phi\sin\vartheta\sin\psi - \sin\phi\cos\psi)   (D.11)

\dot{h} = u\sin\vartheta - v\sin\phi\cos\vartheta - w\cos\phi\cos\vartheta   (D.12)

where (u, v, w) are the velocities in the (x_b, y_b, z_b) body axes, φ is the roll angle, ϑ is the pitch angle, ψ is the yaw angle, p is the roll rate, q is the pitch rate, r is the yaw rate, p_N is the north position, p_E is the east position, and h is the altitude. The terms regarding moments of inertia are defined as:

J_1 = \frac{(I_y - I_z)I_z - I_{xz}^2}{\Gamma}   (D.13)

J_2 = \frac{(I_x - I_y + I_z)I_{xz}}{\Gamma}   (D.14)

J_3 = \frac{I_z}{\Gamma}   (D.15)

J_4 = \frac{I_{xz}}{\Gamma}   (D.16)

J_5 = \frac{I_z - I_x}{I_y}   (D.17)

J_6 = \frac{I_{xz}}{I_y}   (D.18)

J_7 = \frac{1}{I_y}   (D.19)

J_8 = \frac{I_x(I_x - I_y) + I_{xz}^2}{\Gamma}   (D.20)

J_9 = \frac{I_x}{\Gamma}   (D.21)

with \Gamma = I_x I_z - I_{xz}^2   (D.22)

where (I_x, I_y, I_z) are the moments of inertia of the body axis system (x_b, y_b, z_b), and I_xz is the (x_b–z_b) body-axis product of inertia. Since the F-16 aircraft is symmetric with respect to the xz plane, I_xy and I_yz are both zero. This is also true for aircraft in general ([Ducard, 2009], [Etkin and Reid, 1996], [Stevens and Lewis, 2003]).

The aerodynamic forces (x_a, y_a, z_a) and the moments (L_a, M_a, N_a) acting on the aircraft are given in terms of the dimensionless aerodynamic force and moment coefficients as:

x_a = \bar{q}SC_x, \quad y_a = \bar{q}SC_y, \quad z_a = \bar{q}SC_z   (D.23)

and

L_a = \bar{q}SbC_l, \quad M_a = \bar{q}S\bar{c}C_m, \quad N_a = \bar{q}SbC_n   (D.24)

where q̄ is the dynamic pressure (qbar), S is the reference area, c̄ is the average geometric chord, b is the reference span, C_x is the x body-axis aerodynamic force coefficient, C_y is the aerodynamic side-force coefficient, C_z is the z body-axis aerodynamic force coefficient, and C_l, C_m and C_n are the aerodynamic rolling, pitching and yawing moment coefficients respectively. For convenience, the velocities (u, v, w) are expressed in terms of the total aircraft velocity V_T, the angle of attack α and the sideslip angle β as derived in the following:

\dot{V}_T = \frac{u\dot{u} + v\dot{v} + w\dot{w}}{V_T}   (D.25)

\dot{\beta} = \frac{\dot{v}V_T - v\dot{V}_T}{V_T^2\cos\beta}   (D.26)

\dot{\alpha} = \frac{u\dot{w} - w\dot{u}}{u^2 + w^2}   (D.27)

u = V_T\cos\alpha\cos\beta, \quad v = V_T\sin\beta, \quad w = V_T\sin\alpha\cos\beta   (D.28)

A comprehensive formulation and treatment of the F-16 aircraft equations of motion can be found in [Nguyen et al., 1979] and [Stevens and Lewis, 2003].
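As a small numerical sanity check of (D.28) — a sketch with sample values assumed purely for illustration, mirroring how nlplant.c in Appendix D–5 recovers the directional velocities:

% Sketch: body-axis velocities from (VT, alpha, beta) per (D.28),
% with sample values chosen for illustration only.
VT = 850; alpha = 8.49*pi/180; beta = 0;   % ft/s, rad, rad (assumed)
u = VT*cos(alpha)*cos(beta);
v = VT*sin(beta);
w = VT*sin(alpha)*cos(beta);
VT_check    = sqrt(u^2 + v^2 + w^2);       % reproduces VT
alpha_check = atan2(w, u);                 % inverse relation for alpha
beta_check  = asin(v/VT_check);            % inverse relation for beta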

The aerodynamic coefficients appearing in the aerodynamic force and moment equations above are functions of various variables. The damping coefficients C_xq(α), C_yr(α), C_yp(α), C_zq(α), C_lr(α), C_lp(α), C_mq(α), C_nr(α) and C_np(α) are expressed as functions of the angle of attack α. The body-axis aerodynamic force coefficients C_x(α, δ_e), C_y(β, δ_a, δ_r) and C_z(α, δ_e) are functions of α, β, δ_a, δ_e and δ_r. The moment coefficients C_m(α, δ_e), C_l(α, β) and C_n(α, β) are functions of α, β and δ_e. The coefficients of the rolling moment due to the ailerons C_l,δa(α, β) and due to the rudder C_l,δr(α, β), as well as the coefficients of the yawing moment due to the ailerons C_n,δa(α, β) and due to the rudder C_n,δr(α, β), are expressed as functions of α and β. The engine thrust is expressed as a function of the power level, altitude and the Mach number, together with the acceleration components and the dynamic and static pressures of the engine ([Kendig, 1984]; [Nguyen et al., 1979]; [Stevens and Lewis, 2003]).



Fig. D.5: The schematic of the Simulink® model of the nonlinear F-16 aircraft of Fig. D.1.


Fig. D.6: The Simulink model of the F-16 aircraft cockpit of Fig. D.5.

Appendix D–3: The Nonlinear F-16 Aircraft Modeled Using Simulink

The nonlinear F-16 aircraft is modeled using Simulink from The MathWorks [MathWorks, 2009] and is shown in Fig. D.5, while the functional Simulink model of Fig. D.5 is shown in Fig. D.6. The Simulink model for the F-16 cockpit, that is, the pilot/control input, is shown in Fig. D.6, while the Simulink model for the leading edge flap (LEF) is shown in Fig. D.7. The Simulink model for creating the dynamic (qbar) and static (ps) pressures in the "Creating qbar/ps" block of Fig. D.7 is shown in Fig. D.8.


All four of the F-16 aircraft's actuators for the ailerons, elevator, rudder and leading edge flap, as well as the F-16 thrust model, are modeled as first-order lags with a gain (K) and limits on the deflection and rates. The Simulink model for the actuator with the command and rate saturators is shown in Fig. D.9. The aileron has a gain of 20.2020 and rate limits of ±80 degrees per second (deg/s). The elevator has a gain of 20.2020 and rate limits of ±60 deg/s. The rudder has a gain of 20.2020 and rate limits of ±120 deg/s. The thrust model has a unity gain and a rate limit of ±10,000 pounds per second (lbs/s). The leading edge flap has a gain of 7.5329 and rate limits of ±25 deg/s. The aileron, elevator and rudder all use the same disturbance model shown in Fig. D.10.
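A minimal discrete-time sketch of such a first-order-lag actuator with command and rate saturation — the step size and loop are assumptions; the Simulink blocks of Fig. D.9 realize the same structure graphically:

% Sketch: first-order actuator with command and rate saturation, using the
% elevator figures quoted above (gain 20.2020, rate +/-60 deg/s, deflection +/-25 deg).
K = 20.2020; rate_lim = 60; defl_lim = 25; dt = 0.001;    % dt assumed
x = 0;          % actuator state (deg)
u_cmd = 10;     % commanded deflection (deg), sample value
for k = 1:5000
    u_sat = min(max(u_cmd, -defl_lim), defl_lim);         % command saturation
    rate  = min(max(K*(u_sat - x), -rate_lim), rate_lim); % rate saturation
    x = x + dt*rate;                                      % integrate the lag
end
% x now holds the actuator deflection after 5 s of the step command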

The Simulink model for the complete nonlinear F-16 aircraft dynamics is shown in Fig. D.11 as a nonlinear plant model (F16 nlsim nlplant) with the states, the controls, the deflection of the leading edge flap (LEF), and the fidelity flag as inputs. The nonlinear F-16 aircraft is implemented as a nonlinear plant in the MATLAB C program nlplant.c, given in Appendix D–5, using the Simulink MATLAB Function block.

The F-16 nonlinear plant model has thirteen state inputs, namely: north position (PN), east position (PE), altitude (h), roll angle (φ), pitch angle (ϑ), yaw angle (ψ), total velocity (V_T), angle of attack (α), angle of sideslip (β), roll rate (p), pitch rate (q), yaw rate (r), and the deflection of the leading edge flap (δ_LEF). The outputs of the nonlinear plant model are the twelve state derivatives of the thirteen state inputs excluding δ_LEF, the Mach number (M), the dynamic pressure (qbar), the static pressure (ps) and the three normalized accelerations (n_x, n_y, n_z) in the (x, y, z) directions respectively. The Simulink model for the F-16 nonlinear plant state outputs is shown in Fig. D.12. The initial states of the nonlinear plant model constitute the initial conditions of the F-16 aircraft and can be specified arbitrarily. After the nonlinear plant has taken one discrete time step, the twelve input states are found by integrating the twelve output state derivatives. However, the initial states of the F-16 nonlinear plant are obtained here through the trimming procedures outlined in the trimming algorithms implemented as MATLAB programs given in Appendix D–6 and Appendix D–7.

The nonlinear F-16 aircraft model allows for control over aileron, elevator, rudder and thrust. The thrust is measured in pounds (lb) and acts positively along the positive x-axis. Positive thrust causes an increase in acceleration along the body x-axis. The limit on the thrust setting is from 1,000 to 19,000 lbs. The aileron, elevator and rudder deflections are controlled by their respective actuators discussed earlier. A positive aileron deflection gives a decrease in the roll rate (p), and this requires that the right aileron deflects downward and the left aileron deflects upward. A positive elevator deflection results in a decrease in pitch rate (q) and requires that the elevator deflects downward. A positive deflection of the rudder decreases the yaw rate (r) and can be described as a deflection to the right. In general, positive deflection gives a decrease in the body rates. The limits for the control inputs on the aileron, elevator and rudder are ±21.5, ±25 and ±30 degrees respectively. The positive orientations for each control surface are shown in Fig. D.1.
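Schematically — and only as a sketch, since the dissertation performs this integration with the Simulink 1/s block of Fig. D.5 — a forward-Euler propagation of the twelve states through the compiled nlplant MEX function would look like (the step size dt is an assumption):

% Sketch: forward-Euler propagation of the 12 aircraft states through
% the nlplant MEX function (integration step dt assumed).
dt = 0.001;
x  = trim_state(:);                  % 12 trimmed states from trim_F16.m
u  = [trim_thrust; trim_control];    % thrust, elevator, aileron, rudder
for k = 1:1000
    xu   = [x; u; dLEF; 1];          % 18x1 input: states, controls, LEF, hifi flag
    xdot = nlplant(xu);              % 18x1 output: 12 derivatives + extras
    x    = x + dt*xdot(1:12);        % integrate the 12 state derivatives
end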



Fig. D.7: The Simulink model of the leading edge flap for the F-16 aircraft.


Fig. D.8: The Simulink model for creating the dynamic (qbar) and static (ps) pressures for the F-16 aircraft.


Fig. D.9: The Simulink actuator model for the aileron, elevator, rudder, thrust and the leading edge flap for the F-16 aircraft.

Fig. D.10: The aileron, elevator, rudder and thrust disturbances model. The step times "Step1", "Step2" and "Step3" for the aileron, elevator, rudder and thrust are set to 1, 3 and 5 respectively.

The deflection of the leading edge flap (δ_LEF) is not controlled directly, nor can it be changed by the pilot; it depends on the angle of attack (α) as well as on the dynamic pressure (qbar) and static pressure (ps) at which the F-16 aircraft is flying. The dynamic and static pressures are both measured in pounds per



Fig. D.11: The Simulink model of the F-16 nonlinear dynamics together with its inputs, defined by the MATLAB Function "nlplant.c" given in Appendix D–5.


Fig. D.12: The F-16 aircraft state outputs sampled at 0.5 second using the Simulink zero-order-hold (ZOH) block.

square feet (lb/ft²). The deflection of the leading edge flap (δ_LEF) is governed by the following transfer function:

\delta_{LEF} = 1.38\,\frac{2s + 7.25}{s + 7.25}\,\alpha - 9.05\,\frac{\bar{q}}{p_s} + 1.45   (D.29)
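At steady state (s → 0) the lead-lag term in (D.29) reduces to unity, so the expression collapses to the one used in trimfun.m of Appendix D–7. A short sketch with a sample flight condition (the altitude and velocity values are those of the simulation script in Appendix D–8):

% Sketch: steady-state LEF deflection per (D.29) with s -> 0.
alpha_deg = 8.49;                    % angle of attack in degrees (sample value)
altitude = 50000; velocity = 850;    % sample flight condition (ft, ft/s)
rho0 = 2.377e-3;
tfac = 1 - 0.703e-5*altitude;
temp = 519*tfac; if altitude >= 35000, temp = 390; end
rho  = rho0*tfac^4.14;
qbar = 0.5*rho*velocity^2;           % dynamic pressure (lb/ft^2)
ps   = 1715*rho*temp;                % static pressure (lb/ft^2)
dLEF = 1.38*alpha_deg - 9.05*qbar/ps + 1.45;   % deg, then clipped to [0, 25]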

In terms of the leading edge flap, two nonlinear F-16 aircraft models can be identified, namely a low fidelity model and a high fidelity model [Russell, 2003]. These two model types are distinguished by the data tables used to compute the force and moment coefficients in the "Lookup Table" found in the MATLAB C program nlplant.c


given in Appendix D–5. The aerodynamic data used to find the force and moment coefficients are tabulated as functions of the angle of attack (α), the angle of sideslip (β) and the elevator deflection (δ_e). The force and moment coefficients are then found by interpolating the table entries for a given angle of attack (α), angle of sideslip (β) and elevator deflection (δ_e).

The low fidelity F-16 model does not include the effects of the leading edge flap, and there is a complete decoupling between the longitudinal and lateral directions. The angle of attack (α) ranges from -10 to 45 degrees, while the angle of sideslip (β) ranges over ±30 degrees. The aerodynamic data for the low fidelity F-16 model are given in the Appendix of [Stevens and Lewis, 1992]. On the other hand, the high fidelity F-16 model includes the effect of the leading edge flap, and there is complete coupling between the longitudinal and lateral directions, which approximates a real F-16 aircraft. The angle of attack (α) ranges from -20 to 90 degrees and the angle of sideslip (β) ranges over ±30 degrees. The aerodynamic data for the high fidelity F-16 model are given in Table III of the NASA report [Nguyen et al., 1979]. Thus, the high fidelity F-16 model allows the F-16 aircraft to fly at higher angles of attack (α).

Although both the low and high fidelity F-16 models are included in the nlplant.c of Appendix D–5, this work focuses on the simulation of the high fidelity F-16 aircraft model due to its complete coupling between the longitudinal and lateral directions. The limit on the deflection of the leading edge flap (δ_LEF) is from 0 to 25 degrees.

One important parameter in describing high-speed flight of the F-16 aircraft is the Mach number (M), defined simply as the ratio of the airspeed to the speed of sound:

M = \frac{V_T}{a} = \frac{V_T}{\sqrt{\gamma R T_m}}   (D.30)

\gamma = \frac{C_p}{C_v},   (D.31)

T_m = T_0 + \lambda_m h,   (D.32)

and R = \frac{R_a}{M_0}   (D.33)

where V_T is the airspeed (total velocity), a is the speed of sound, γ ≈ 1.4 is the ratio of the specific heat of a gas at constant pressure (C_p) to that at constant volume (C_v), T_m is the air temperature, which decreases linearly with increasing altitude h in the troposphere, T_0 ≈ 288.15 K is the air temperature at sea level, λ_m is the temperature gradient in the troposphere, R is the specific gas constant, R_a ≈ 8314.32 J K⁻¹ kmol⁻¹ is the molar gas constant, and M_0 ≈ 28.9644 kg kmol⁻¹ is the molecular weight of air at sea level. Thus, if the F-16 aircraft is traveling at a Mach number of 2, then it is going twice the speed of sound. Also, if the F-16 aircraft is flying at a Mach number less than 1, then the pressure disturbances travel faster than the aircraft and influence the air ahead of the aircraft.
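For comparison, the atmos() helper in nlplant.c (Appendix D–5) evaluates the same relation in US customary units; a standalone MATLAB sketch of that computation, with sample altitude and airspeed assumed:

% Sketch: Mach number from altitude and true airspeed, mirroring the
% atmos() helper of nlplant.c (US customary units).
alt = 50000; vt = 850;             % ft, ft/s (sample values)
tfac = 1 - 0.703e-5*alt;           % temperature factor
temp = 519.0*tfac;                 % temperature in deg Rankine
if alt >= 35000, temp = 390; end   % roughly isothermal above 35,000 ft
mach = vt/sqrt(1.4*1716.3*temp);   % a = sqrt(gamma*R*T)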

Appendix D–4: Static, Dynamic and Total Pressures

According to the dictionary of aviation [Crocker and Collin, 2007], the dynamic pressure is the pressure created by the forward movement of the aircraft, while the static pressure is the pressure of a fluid acting on and moving with the aircraft. The total pressure is the sum of the static and dynamic pressures. It is also necessary to note that airflow is the movement of air over the aircraft as it travels through the atmosphere; the airspeed is the speed of the aircraft relative to the air around it; the airstream is the flow of air caused by the movement of the aircraft through the air; and the static port is a small hole in the side of the aircraft which senses static pressure [Crocker and Collin, 2007]. In order to distinguish between static, dynamic and total pressures, as well as how they are measured, consider air passing through a tube that narrows as in Fig. D.13. In this discussion, it is assumed that the air is moving at speeds below about Mach 0.3 (i.e. three-tenths the speed of sound), where the air can be considered incompressible [Anderson and Eberhardt, 2001 and 2010].

Thus, as the area of the tube in Fig. D.13 narrows, the velocity must increase. If no other force acts on the

fluid, the pressure at point A must be greater than the pressure at point B. This is the Bernoulli relationship that

explains the lift of flight ([Etkin and Reid, 1996], [Hoerner, 1965], [Roskam, 1998]).

Unlike in physics, from the aeronautical point of view three pressures are associated with the flow at point A: the total pressure (pT), the static pressure (ps) and the dynamic pressure (qbar). The static pressure remains constant while the total pressure increases due to an increase in the dynamic pressure caused by the airstream.

The first of the three pressures associated with the airstream to consider is the total pressure (pT). This is measured by bringing the flowing air to a stop. In Fig. D.13, this is done by placing a tube facing into the airflow; the air stops in the tube, and the total pressure is measured as pT. For the situation in the figure, pT is the same at both points A and B. In the language of pilots, this is known as the pitot pressure, and the figure illustrates a pitot tube. The second pressure to consider is the static pressure (ps), which is measured perpendicular to the airflow through a hole in the wall. In Fig. D.13, the static pressure is higher at point A than at point B. The third pressure is the dynamic pressure (qbar), which is the pressure owing to the motion of the air and is parallel to the flow of air. The dynamic pressure is proportional to the kinetic energy in the air; thus, the faster the airstream, the higher the dynamic pressure.

In summary, the total pressure is the sum of the static and dynamic pressures, i.e. pT = ps + qbar. Fig. D.13 shows how the static and total pressures can be determined. Next, it is necessary to know how the dynamic



Fig. D.13: Static (ps) and total (pT) pressures together with the airflow velocities va, vb and vc.


Fig. D.14: The measurement of the static (ps), dynamic (qbar) and total (pT) pressures using the pitot tube, the static port and a differential pressure gauge.

pressure can be measured. In order to measure the dynamic pressure, the setup of Fig. D.14 is considered. Between the pitot tube that measures the total pressure and the static port that measures the static pressure, a differential pressure gauge is placed. This gauge measures the difference in pressure between the two ports, which is the difference between the total and static pressures; the gauge is calibrated in speed. Thus, the difference between the total and the static pressures is the dynamic pressure.
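As a worked numeric check (values assumed for illustration): at sea level in US customary units the air density is ρ₀ ≈ 2.377×10⁻³ slug/ft³, so an airspeed of 850 ft/s gives qbar = 0.5 × 2.377×10⁻³ × 850² ≈ 859 lb/ft²; the airspeed indicator effectively inverts this relation, V = √(2·qbar/ρ).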

If no energy is added to the air by some mechanical means such as a propeller, the total pressure remains the same, and an increase in dynamic pressure causes a decrease in static pressure. Thus, when the pressure of the air decreases because the aircraft is going faster, the pressure referred to is the static pressure. However, energy can be added to the air by a propeller-driven fan, as shown in the lower right-hand corner of Fig. D.13. What happens to the air at point C is that the fan is accelerating the air, so work is done on the air and the dynamic pressure has increased. Since the air is not confined, the static pressure is the same as that of the surrounding environment and has not changed. Thus, the total pressure has increased due to the increase in dynamic pressure but not due to the static pressure. Hence, the air pressure of moving air refers to the static pressure. Also, it does not follow that because air is flowing faster, the static pressure has decreased [Anderson and Eberhardt, 2001 and 2010].


Appendix D–5: The MATLAB C Program for the Nonlinear F-16 Aircraft Model, nlplant.c

#include "math.h"

/* Merging the nlplant.c (lofi) and nlplant_hifi.c to use same equations of motion, navigation equations and use own look-up tables decided by a flag. */

void atmos(double,double,double*);        /* Used by both */
void accels(double*,double*,double*);     /* Used by both */

#include "lofi_F16_AeroData.c" /* LOFI Look-up header file*/ #include "hifi_F16_AeroData.c" /* HIFI Look-up header file*/

void nlplant(double*,double*);

/*########################################*/
/*### Added for mex function in matlab ###*/
/*########################################*/

int fix(double);
int sign(double);

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
#define XU prhs[0]
#define XDOTY plhs[0]

    int i;
    double *xup, *xdotp;

    if (mxGetM(XU)==18 && mxGetN(XU)==1) {
        /* Calling Program */
        xup = mxGetPr(XU);
        XDOTY = mxCreateDoubleMatrix(18, 1, mxREAL);
        xdotp = mxGetPr(XDOTY);

        nlplant(xup,xdotp);

        /* debug
        for (i=0;i<=14;i++) printf("xdotp(%d) = %e\n",i+1,xdotp[i]);
        end debug */
    }  /* End if */
    else {
        mexErrMsgTxt("Input and/or output is wrong size.");
    }  /* End else */
}  /* end mexFunction */
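/* Usage note (not part of the original listing): once compiled with
   "mex nlplant.c", the gateway above is invoked from MATLAB as
   xdot = nlplant(xu), where xu is the 18x1 vector of states, controls,
   LEF deflection and fidelity flag -- this is exactly how trimfun.m in
   Appendix D-7 calls it via OUT = feval('nlplant',xu). */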


/*########################################*/ /*########################################*/

void nlplant(double *xu, double *xdot)
{
    int fi_flag;

    /* #include f16_constants */
    double g = 32.17;          /* gravity, ft/s^2 */
    double m = 636.94;         /* mass, slugs */
    double B = 30.0;           /* span, ft */
    double S = 300.0;          /* planform area, ft^2 */
    double cbar = 11.32;       /* mean aero chord, ft */
    double xcgr = 0.35;        /* reference center of gravity as a fraction of cbar */
    double xcg = 0.30;         /* center of gravity as a fraction of cbar */

    double Heng = 0.0;         /* turbine momentum along roll axis */
    double pi = acos(-1);
    double r2d;                /* radians to degrees */

    /* NASA data, translated via eq. 2.4-6 on pg 80 of Stevens and Lewis */
    double Jy = 55814.0;       /* slug-ft^2 */
    double Jxz = 982.0;        /* slug-ft^2 */
    double Jz = 63100.0;       /* slug-ft^2 */
    double Jx = 9496.0;        /* slug-ft^2 */

    double *temp;

    double npos, epos, alt, phi, theta, psi, vt, alpha, beta, P, Q, R;
    double sa, ca, sb, cb, tb, st, ct, tt, sphi, cphi, spsi, cpsi;
    double T, el, ail, rud, dail, drud, lef, dlef;
    double qbar, mach, ps;
    double U, V, W, Udot, Vdot, Wdot;
    double L_tot, M_tot, N_tot, denom;

    double Cx_tot, Cx, delta_Cx_lef, dXdQ, Cxq, delta_Cxq_lef;
    double Cz_tot, Cz, delta_Cz_lef, dZdQ, Czq, delta_Czq_lef;
    double Cm_tot, Cm, eta_el, delta_Cm_lef, dMdQ, Cmq, delta_Cmq_lef, delta_Cm, delta_Cm_ds;
    double Cy_tot, Cy, delta_Cy_lef, dYdail, delta_Cy_r30, dYdR, dYdP;
    double delta_Cy_a20, delta_Cy_a20_lef, Cyr, delta_Cyr_lef, Cyp, delta_Cyp_lef;
    double Cn_tot, Cn, delta_Cn_lef, dNdail, delta_Cn_r30, dNdR, dNdP, delta_Cnbeta;
    double delta_Cn_a20, delta_Cn_a20_lef, Cnr, delta_Cnr_lef, Cnp, delta_Cnp_lef;
    double Cl_tot, Cl, delta_Cl_lef, dLdail, delta_Cl_r30, dLdR, dLdP, delta_Clbeta;
    double delta_Cl_a20, delta_Cl_a20_lef, Clr, delta_Clr_lef, Clp, delta_Clp_lef;

    temp = (double *)malloc(9*sizeof(double));   /* size of 9x1 array */

    r2d = 180.0/pi;   /* radians to degrees */

    /* %%%%%%%%%%%%%%%%%%% States %%%%%%%%%%%%%%%%%%% */
    npos = xu[0];   /* north position */


    epos = xu[1];    /* east position */
    alt  = xu[2];    /* altitude */
    phi  = xu[3];    /* orientation angles in rad */
    theta = xu[4];
    psi  = xu[5];

    vt    = xu[6];        /* total velocity */
    alpha = xu[7]*r2d;    /* angle of attack in degrees */
    beta  = xu[8]*r2d;    /* sideslip angle in degrees */
    P = xu[9];            /* roll rate  --- rolling moment is Lbar */
    Q = xu[10];           /* pitch rate --- pitching moment is M */
    R = xu[11];           /* yaw rate   --- yawing moment is N */

    sa = sin(xu[7]);      /* sin(alpha) */
    ca = cos(xu[7]);      /* cos(alpha) */
    sb = sin(xu[8]);      /* sin(beta) */
    cb = cos(xu[8]);      /* cos(beta) */
    tb = tan(xu[8]);      /* tan(beta) */

    st = sin(theta); ct = cos(theta); tt = tan(theta);
    sphi = sin(phi); cphi = cos(phi);
    spsi = sin(psi); cpsi = cos(psi);

    if (vt <= 0.01) vt = 0.01;

    /* %%%%%%%%%%%%%%%%%%% Control inputs %%%%%%%%%%%%%%%%%%% */
    T   = xu[12];    /* thrust */
    el  = xu[13];    /* elevator setting in degrees */
    ail = xu[14];    /* aileron setting in degrees */
    rud = xu[15];    /* rudder setting in degrees */
    lef = xu[16];    /* leading edge flap setting in degrees */

    fi_flag = xu[17]/1;    /* fidelity flag */

    /* dail = ail/20.0; aileron normalized against max angle */
    /* The aileron was normalized using 20.0 but the NASA report and S&L both
       have 21.5 deg as maximum deflection. As a result... */
    dail = ail/21.5;
    drud = rud/30.0;          /* rudder normalized against max angle */
    dlef = (1 - lef/25.0);    /* leading edge flap normalized against max angle */

    /* %%%%%%%%%%%%%%%%%% Atmospheric effects:
       sets dynamic pressure and Mach number %%%%%%%%%%%%%%%%%% */
    atmos(alt,vt,temp);
    mach = temp[0];
    qbar = temp[1];
    ps   = temp[2];


    /* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
       Dynamics
       %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */

    /* %%%%%%%%%%%%%%%%%% Navigation Equations %%%%%%%%%%%%%%%%%% */
    U = vt*ca*cb;    /* directional velocities */
    V = vt*sb;
    W = vt*sa*cb;

    /* nposdot */
    xdot[0] = U*(ct*cpsi) + V*(sphi*cpsi*st - cphi*spsi) + W*(cphi*st*cpsi + sphi*spsi);

    /* eposdot */
    xdot[1] = U*(ct*spsi) + V*(sphi*spsi*st + cphi*cpsi) + W*(cphi*st*spsi - sphi*cpsi);

    /* altdot */
    xdot[2] = U*st - V*(sphi*ct) - W*(cphi*ct);

    /* %%%%%%%%%%%%%%%%%%% Kinematic equations %%%%%%%%%%%%%%%%%%% */
    /* phidot */
    xdot[3] = P + tt*(Q*sphi + R*cphi);

    /* thetadot */
    xdot[4] = Q*cphi - R*sphi;

    /* psidot */
    xdot[5] = (Q*sphi + R*cphi)/ct;

    /* %%%%%%%%%%%%%%%%%% Lookup Table %%%%%%%%%%%%%%%%%% */
    if (fi_flag == 1) {    /* HIFI table */
        hifi_C(alpha,beta,el,temp);
        Cx = temp[0];
        Cz = temp[1];
        Cm = temp[2];
        Cy = temp[3];
        Cn = temp[4];
        Cl = temp[5];

        hifi_damping(alpha,temp);
        Cxq = temp[0];
        Cyr = temp[1];


        Cyp = temp[2];
        Czq = temp[3];
        Clr = temp[4];
        Clp = temp[5];
        Cmq = temp[6];
        Cnr = temp[7];
        Cnp = temp[8];

        hifi_C_lef(alpha,beta,temp);
        delta_Cx_lef = temp[0];
        delta_Cz_lef = temp[1];
        delta_Cm_lef = temp[2];
        delta_Cy_lef = temp[3];
        delta_Cn_lef = temp[4];
        delta_Cl_lef = temp[5];

        hifi_damping_lef(alpha,temp);
        delta_Cxq_lef = temp[0];
        delta_Cyr_lef = temp[1];
        delta_Cyp_lef = temp[2];
        delta_Czq_lef = temp[3];
        delta_Clr_lef = temp[4];
        delta_Clp_lef = temp[5];
        delta_Cmq_lef = temp[6];
        delta_Cnr_lef = temp[7];
        delta_Cnp_lef = temp[8];

        hifi_rudder(alpha,beta,temp);
        delta_Cy_r30 = temp[0];
        delta_Cn_r30 = temp[1];
        delta_Cl_r30 = temp[2];

        hifi_ailerons(alpha,beta,temp);
        delta_Cy_a20     = temp[0];
        delta_Cy_a20_lef = temp[1];
        delta_Cn_a20     = temp[2];
        delta_Cn_a20_lef = temp[3];
        delta_Cl_a20     = temp[4];
        delta_Cl_a20_lef = temp[5];

        hifi_other_coeffs(alpha,el,temp);
        delta_Cnbeta = temp[0];
        delta_Clbeta = temp[1];
        delta_Cm     = temp[2];
        eta_el       = temp[3];
        delta_Cm_ds  = 0;   /* ignore deep-stall effect */
    }
    else if (fi_flag == 0) {
        /* ##############################################
           ########## LOFI table look-up ################
           ############################################## */

        /* The lofi model does not include the leading edge flap.
           All terms multiplied by dlef have been set to zero, but just to


           be sure we will set it to zero. */
        dlef = 0.0;

        damping(alpha,temp);
        Cxq = temp[0]; Cyr = temp[1]; Cyp = temp[2];
        Czq = temp[3]; Clr = temp[4]; Clp = temp[5];
        Cmq = temp[6]; Cnr = temp[7]; Cnp = temp[8];

        dmomdcon(alpha,beta,temp);
        delta_Cl_a20 = temp[0];   /* formerly dLda in nlplant.c */
        delta_Cl_r30 = temp[1];   /* formerly dLdr in nlplant.c */
        delta_Cn_a20 = temp[2];   /* formerly dNda in nlplant.c */
        delta_Cn_r30 = temp[3];   /* formerly dNdr in nlplant.c */

        clcn(alpha,beta,temp);
        Cl = temp[0];
        Cn = temp[1];

        cxcm(alpha,el,temp);
        Cx = temp[0];
        Cm = temp[1];

        Cy = -.02*beta + .021*dail + .086*drud;

        cz(alpha,beta,el,temp);
        Cz = temp[0];

        /* ##################################################
           ## Set all higher order terms of hifi that are  ##
           ## not applicable to lofi equal to zero.        ##
           ################################################## */
        delta_Cx_lef  = 0.0;
        delta_Cz_lef  = 0.0;
        delta_Cm_lef  = 0.0;
        delta_Cy_lef  = 0.0;
        delta_Cn_lef  = 0.0;
        delta_Cl_lef  = 0.0;
        delta_Cxq_lef = 0.0;
        delta_Cyr_lef = 0.0;
        delta_Cyp_lef = 0.0;
        delta_Czq_lef = 0.0;
        delta_Clr_lef = 0.0;
        delta_Clp_lef = 0.0;
        delta_Cmq_lef = 0.0;
        delta_Cnr_lef = 0.0;
        delta_Cnp_lef = 0.0;
        delta_Cy_r30  = 0.0;
        delta_Cy_a20  = 0.0;


        delta_Cy_a20_lef = 0.0;
        delta_Cn_a20_lef = 0.0;
        delta_Cl_a20_lef = 0.0;
        delta_Cnbeta = 0.0;
        delta_Clbeta = 0.0;
        delta_Cm = 0.0;
        eta_el = 1.0;        /* needs to be one; see equation for Cm_tot */
        delta_Cm_ds = 0.0;
    }

    /* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
       compute Cx_tot, Cz_tot, Cm_tot, Cy_tot, Cn_tot, and Cl_tot
       (as on NASA report p37-40)
       %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */

    /* XXXXXXXX Cx_tot XXXXXXXX */
    dXdQ = (cbar/(2*vt))*(Cxq + delta_Cxq_lef*dlef);
    Cx_tot = Cx + delta_Cx_lef*dlef + dXdQ*Q;

    /* ZZZZZZZZ Cz_tot ZZZZZZZZ */
    dZdQ = (cbar/(2*vt))*(Czq + delta_Czq_lef*dlef);   /* delta_Czq_lef; the printed listing had delta_Cz_lef, apparently a typo */
    Cz_tot = Cz + delta_Cz_lef*dlef + dZdQ*Q;

    /* MMMMMMMM Cm_tot MMMMMMMM */
    dMdQ = (cbar/(2*vt))*(Cmq + delta_Cmq_lef*dlef);
    Cm_tot = Cm*eta_el + Cz_tot*(xcgr-xcg) + delta_Cm_lef*dlef + dMdQ*Q + delta_Cm + delta_Cm_ds;

    /* YYYYYYYY Cy_tot YYYYYYYY */
    dYdail = delta_Cy_a20 + delta_Cy_a20_lef*dlef;
    dYdR = (B/(2*vt))*(Cyr + delta_Cyr_lef*dlef);
    dYdP = (B/(2*vt))*(Cyp + delta_Cyp_lef*dlef);
    Cy_tot = Cy + delta_Cy_lef*dlef + dYdail*dail + delta_Cy_r30*drud + dYdR*R + dYdP*P;

    /* NNNNNNNN Cn_tot NNNNNNNN */
    dNdail = delta_Cn_a20 + delta_Cn_a20_lef*dlef;
    dNdR = (B/(2*vt))*(Cnr + delta_Cnr_lef*dlef);
    dNdP = (B/(2*vt))*(Cnp + delta_Cnp_lef*dlef);
    Cn_tot = Cn + delta_Cn_lef*dlef - Cy_tot*(xcgr-xcg)*(cbar/B) + dNdail*dail + delta_Cn_r30*drud + dNdR*R + dNdP*P + delta_Cnbeta*beta;


    /* LLLLLLLL Cl_tot LLLLLLLL */
    dLdail = delta_Cl_a20 + delta_Cl_a20_lef*dlef;
    dLdR = (B/(2*vt))*(Clr + delta_Clr_lef*dlef);
    dLdP = (B/(2*vt))*(Clp + delta_Clp_lef*dlef);
    Cl_tot = Cl + delta_Cl_lef*dlef + dLdail*dail + delta_Cl_r30*drud + dLdR*R + dLdP*P + delta_Clbeta*beta;

    /* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
       compute Udot, Vdot, Wdot (as on NASA report p36)
       %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */
    Udot = R*V - Q*W - g*st + qbar*S*Cx_tot/m + T/m;
    Vdot = P*W - R*U + g*ct*sphi + qbar*S*Cy_tot/m;
    Wdot = Q*U - P*V + g*ct*cphi + qbar*S*Cz_tot/m;

    /* vt_dot equation (from S&L, p82) */
    xdot[6] = (U*Udot + V*Vdot + W*Wdot)/vt;

    /* alpha_dot equation */
    xdot[7] = (U*Wdot - W*Udot)/(U*U + W*W);

    /* beta_dot equation */
    xdot[8] = (Vdot*vt - V*xdot[6])/(vt*vt*cb);

    /* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
       compute Pdot, Qdot, and Rdot (as in Stevens and Lewis p32)
       %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */
    L_tot = Cl_tot*qbar*S*B;      /* get moments from coefficients */
    M_tot = Cm_tot*qbar*S*cbar;
    N_tot = Cn_tot*qbar*S*B;

    denom = Jx*Jz - Jxz*Jxz;

    /* Pdot */
    xdot[9] = (Jz*L_tot + Jxz*N_tot - (Jz*(Jz-Jy)+Jxz*Jxz)*Q*R + Jxz*(Jx-Jy+Jz)*P*Q + Jxz*Q*Heng)/denom;

    /* Qdot */
    xdot[10] = (M_tot + (Jz-Jx)*P*R - Jxz*(P*P-R*R) - R*Heng)/Jy;


    /* Rdot */
    xdot[11] = (Jx*N_tot + Jxz*L_tot + (Jx*(Jx-Jy)+Jxz*Jxz)*P*Q - Jxz*(Jx-Jy+Jz)*Q*R + Jx*Q*Heng)/denom;

    /*########################################*/
    /*### Create accelerations anx_cg, any_cg */
    /*### and anz_cg as outputs ##############*/
    /*########################################*/
    accels(xu,xdot,temp);

    xdot[12] = temp[0];   /* anx_cg */
    xdot[13] = temp[1];   /* any_cg */
    xdot[14] = temp[2];   /* anz_cg */
    xdot[15] = mach;
    xdot[16] = qbar;
    xdot[17] = ps;

    free(temp);
}  /* END of nlplant() */

/*########################################*/ /*### Called Sub-Functions ##############*/ /*########################################*/

/*########################################*/
/* Function for mach and qbar             */
/*########################################*/
void atmos(double alt, double vt, double *coeff)
{
    double rho0 = 2.377e-3;
    double tfac, temp, rho, mach, qbar, ps;

    tfac = 1 - .703e-5*(alt);
    temp = 519.0*tfac;
    if (alt >= 35000.0) temp = 390;

    rho  = rho0*pow(tfac,4.14);
    mach = (vt)/sqrt(1.4*1716.3*temp);
    qbar = .5*rho*pow(vt,2);
    ps   = 1715.0*rho*temp;

    if (ps == 0) ps = 1715;

    coeff[0] = mach;
    coeff[1] = qbar;


    coeff[2] = ps;
}

/*########################################*/
/*### Port from matlab fix() function ####*/
/*########################################*/
int fix(double in)
{
    int out;

    if (in >= 0.0)  out = (int)floor(in);
    else            out = (int)ceil(in);
    return out;
}

/* port from matlab sign() function */
int sign(double in)
{
    int out;

    if (in > 0.0)        out = 1;
    else if (in < 0.0)   out = -1;
    else                 out = 0;
    return out;
}
/*########################################*/

/*########################################*/
/*### Calculate accelerations from states */
/*### and state derivatives. #############*/
/*########################################*/
void accels(double *state, double *xdot, double *y)
{
#define grav 32.174

    double sina, cosa, sinb, cosb;
    double vel_u, vel_v, vel_w;
    double u_dot, v_dot, w_dot;
    double nx_cg, ny_cg, nz_cg;

    sina = sin(state[7]);
    cosa = cos(state[7]);
    sinb = sin(state[8]);
    cosb = cos(state[8]);
    vel_u = state[6]*cosb*cosa;


    vel_v = state[6]*sinb;
    vel_w = state[6]*cosb*sina;
    u_dot = cosb*cosa*xdot[6] - state[6]*sinb*cosa*xdot[8] - state[6]*cosb*sina*xdot[7];
    v_dot = sinb*xdot[6] + state[6]*cosb*xdot[8];
    w_dot = cosb*sina*xdot[6] - state[6]*sinb*sina*xdot[8] + state[6]*cosb*cosa*xdot[7];
    nx_cg =  1.0/grav*(u_dot + state[10]*vel_w - state[11]*vel_v) + sin(state[4]);
    ny_cg =  1.0/grav*(v_dot + state[11]*vel_u - state[9]*vel_w)  - cos(state[4])*sin(state[3]);
    nz_cg = -1.0/grav*(w_dot + state[9]*vel_v  - state[10]*vel_u) + cos(state[4])*cos(state[3]);

    y[0] = nx_cg;
    y[1] = ny_cg;
    y[2] = nz_cg;
}

/*########################################*/


Appendix D–6: The MATLAB Program for the F-16 Model Trimming Routine, trim_F16.m

function [trim_state, trim_thrust, trim_control, dLEF, xu] = ...
    trim_F16(thrust, elevator, alpha, ail, rud, vel, alt)
%================================================
% F16 nonlinear model trimming routine
% for longitudinal motion, steady level flight.
% This trim function can now trim at three
% additional flight conditions:
%   - Steady Turning Flight - given turn rate
%   - Steady Pull-up Flight - given pull-up rate
%   - Steady Roll - given roll rate
%================================================

global altitude velocity fi_flag_Simulink
global phi psi p q r phi_weight theta_weight psi_weight

altitude = alt;
velocity = vel;
alpha = alpha*pi/180;   % convert to radians

% OUTPUTS: trimmed values for states and controls
% INPUTS: guess values for thrust, elevator, alpha (assuming steady level flight)

% Initial guess for free parameters
UX0 = [thrust; elevator; alpha; ail; rud];

% Initialize some variables
phi = 0; psi = 0; p = 0; q = 0; r = 0;
phi_weight = 10; theta_weight = 10; psi_weight = 10;

disp('At what flight condition would you like to trim the F-16?');
disp('1. Steady Wings-Level Flight.');
disp('2. Steady Turning Flight.');
disp('3. Steady Pull-Up Flight.');
disp('4. Steady Roll Flight.');
FC_flag = input('Your Selection: ');

switch FC_flag
    case 1
        % do nothing
    case 2
        r = input('Enter the turning rate (deg/s): ');
        psi_weight = 0;
    case 3
        q = input('Enter the pull-up rate (deg/s): ');
        theta_weight = 0;
    case 4
        p = input('Enter the Roll rate (deg/s): ');
        phi_weight = 0;
    otherwise
        disp('Invalid Selection')
end


% Initializing optimization options and running optimization:
OPTIONS = optimset('TolFun',1e-10,'TolX',1e-10,'MaxFunEvals',5e+04,'MaxIter',1e+04);

iter = 1;
while iter == 1

    [UX,FVAL,EXITFLAG,OUTPUT] = fminsearch('trimfun',UX0,OPTIONS);

    [cost, Xdot, xu] = trimfun(UX);

    disp('Trim Values and Cost:');
    disp(['cost   = ' num2str(cost)])
    disp(['thrust = ' num2str(xu(13)) ' lb'])
    disp(['elev   = ' num2str(xu(14)) ' deg'])
    disp(['ail    = ' num2str(xu(15)) ' deg'])
    disp(['rud    = ' num2str(xu(16)) ' deg'])
    disp(['alpha  = ' num2str(xu(8)*180/pi) ' deg'])
    disp(['dLEF   = ' num2str(xu(17)) ' deg'])
    disp(['Vel.   = ' num2str(velocity) ' ft/s'])
    flag = input('Continue trim routine iterations? (y/n): ','s');
    if flag == 'n'
        iter = 0;
    end
    UX0 = UX;
end

% For simulink:
trim_state = xu(1:12);
trim_thrust = UX(1);
trim_ele = UX(2);
trim_ail = UX(4);
trim_rud = UX(5);
trim_control = [UX(2); UX(4); UX(5)];
dLEF = xu(17);


Appendix D–7: The MATLAB Program for Computing the Initial States of the Nonlinear F-16 Model Used in the Trimming Routine, trimfun.m

function [cost, Xdot, xu] = trimfun(UX0)

%=========================================================================
% F16 nonlinear model trim cost function for longitudinal motion,
% steady level flight (cost = sum of weighted squared state derivatives)
% for the High Fidelity (HIFI) F-16 Model
%=========================================================================

global phi psi p q r phi_weight theta_weight psi_weight
global altitude velocity fi_flag_Simulink

% Implementing limits:
% Thrust limits
if UX0(1) > 19000
    UX0(1) = 19000;
elseif UX0(1) < 1000
    UX0(1) = 1000;
end

% Elevator limits
if UX0(2) > 25
    UX0(2) = 25;
elseif UX0(2) < -25
    UX0(2) = -25;
end

% Angle of attack limits
if (fi_flag_Simulink == 0)
    if UX0(3) > 45*pi/180
        UX0(3) = 45*pi/180;
    elseif UX0(3) < -10*pi/180
        UX0(3) = -10*pi/180;
    end
elseif (fi_flag_Simulink == 1)
    if UX0(3) > 90*pi/180
        UX0(3) = 90*pi/180;
    elseif UX0(3) < -20*pi/180
        UX0(3) = -20*pi/180;
    end
end

% Aileron limits
if UX0(4) > 21.5
    UX0(4) = 21.5;
elseif UX0(4) < -21.5
    UX0(4) = -21.5;
end

% Rudder limits
if UX0(5) > 30
    UX0(5) = 30;
elseif UX0(5) < -30
    UX0(5) = -30;


end

if (fi_flag_Simulink == 1)
    % Calculating qbar, ps and the steady-state leading edge flap deflection
    % (see pg. 43 of the NASA report):
    rho0 = 2.377e-3;
    tfac = 1 - 0.703e-5*altitude;
    temp = 519*tfac;
    if (altitude >= 35000)
        temp = 390;
    end
    rho = rho0*tfac^4.14;
    qbar = 0.5*rho*velocity^2;
    ps = 1715*rho*temp;
    dLEF = 1.38*UX0(3)*180/pi - 9.05*qbar/ps + 1.45;
elseif (fi_flag_Simulink == 0)
    dLEF = 0.0;
end

% Verify that the calculated leading edge flap limits have not been violated.
if (dLEF > 25)
    dLEF = 25;
elseif (dLEF < 0)
    dLEF = 0;
end

xu = [ 0 ...                 % npos (ft)
       0 ...                 % epos (ft)
       altitude ...          % altitude (ft)
       phi*(pi/180) ...      % phi (rad)
       UX0(3) ...            % theta (rad)
       psi*(pi/180) ...      % psi (rad)
       velocity ...          % velocity (ft/s)
       UX0(3) ...            % alpha (rad)
       0 ...                 % beta (rad)
       p*(pi/180) ...        % p (rad/s)
       q*(pi/180) ...        % q (rad/s)
       r*(pi/180) ...        % r (rad/s)
       UX0(1) ...            % thrust (lbs)
       UX0(2) ...            % ele (deg)
       UX0(4) ...            % ail (deg)
       UX0(5) ...            % rud (deg)
       dLEF ...              % dLEF (deg)
       fi_flag_Simulink ...  % fidelity flag
     ]';

OUT = feval('nlplant',xu);
Xdot = OUT(1:12,1);

% Create weight function
weight = [ 0 ...             % npos_dot
           0 ...             % epos_dot
           5 ...             % alt_dot
           phi_weight ...    % phi_dot
           theta_weight ...  % theta_dot
           psi_weight ...    % psi_dot
           2 ...             % V_dot
           10 ...            % alpha_dot
           10 ...            % beta_dot
           10 ...            % P_dot


           10 ...            % Q_dot
           10 ...            % R_dot
         ];

cost = weight*(Xdot.*Xdot);


Appendix D–8: MATLAB Script for the Simulation of the Nonlinear F-16 Aircraft, F-16_Simulations.m

% The main Nonlinear F-16 Aircraft simulation script file

global altitude velocity
global flight_condition num_trim turning_rate pitch_rate roll_rate

%=========================================================================
% Flight_condition must be:
%   1 for steady wings-level flight
%   2 for turning flights (left or right) using the rudder for yaw rate control
%   3 for pulling-up flights (up or down) using the elevator for pitch rate control
%   4 for rolling flights (360 degrees) using the aileron for roll rate control
flight_condition = 3;
turning_rate = 10;   % r in deg
pitch_rate = 10;     % q in deg
roll_rate = 10;      % p in deg
Elev = 5;   % Elevator disturbance deflection caused by pitching up or down
            % vertically along the body axis (in deg)
Aile = 5;   % Aileron disturbance deflection caused by rolling up or down
            % longitudinally (in deg)
Rudd = 5;   % Rudder disturbance deflection caused by yawing left or right
            % laterally (in deg)

%---------------
delta_T = 0.5;   % 0.001 Sampling time of the F-16 aircraft
TStart = 0;      % Start of simulation
TFinal = 30;     % End of simulation

%---------------
% Trim aircraft to desired altitude and velocity
altitude = 50000;   % F-16 altitude for the simulation (ft)
velocity = 850;     % F-16 velocity for the simulation (ft/s)
num_trim = 4;

%---------------
% Initial conditions for the trim routine.
% The following values seem to trim to most flight conditions,
% but if the F-16 does not trim, change these values.
thrust = 5000;      % thrust, lbs
elevator = -0.09;   % elevator, degrees
alpha = 8.49;       % AOA, degrees
rudder = -0.01;     % rudder angle, degrees
aileron = 0.01;     % aileron, degrees

%=========================================================================

% Simulate the nonlinear F-16 aircraft model
F16_aircraft_model;

%=========================================================================
d_LF1 = d_LF;
surfaces1 = surfaces;
y_sim1 = y_sim;
save nf16_orig_1 d_LF1 surfaces1 y_sim1


% ========================================================================
% --- Firstly,  model_lef;
% --- Secondly, extr_comb_1;
% --- Thirdly,  load or_f16s   % Original experimental results
load f16_zoh    % From ZOH of Simulink
f16_comp_2;     % Compile the network training data
%=========================================================================
% Load the data describing the new state
new_F16_states;
close all


Appendix D–9: MATLAB Script for Implementing the Nonlinear F-16 Aircraft Simulation, F16_aircraft_model.m

%=====================================================================
% MATLAB Script File used to implement the non-linear F-16 Simulation.
% The results will also be saved to a file and plotted.
%=====================================================================

global altitude fi_type velocity fi_flag_Simulink;
global surface1 surface2 surface3;
global ElevatorDis AileronDis RudderDis;
global flight_condition

%---------------
surface1 = 'ele_';
surface2 = 'ail_';
surface3 = 'rud_';
fi_type = 'hifi';
fi_flag_Simulink = 1;

if flight_condition==1
    DisEle_1 = 0; DisEle_2 = 0; DisEle_3 = 0;
    DisAil_1 = 0; DisAil_2 = 0; DisAil_3 = 0;
    DisRud_1 = 0; DisRud_2 = 0; DisRud_3 = 0;
    ElevatorDis = 0; AileronDis = 0; RudderDis = 0;
    surfacedef = 'none';
elseif flight_condition==2 || flight_condition==3 || flight_condition==4
    ElevatorDis = Elev;
    DisEle_1 = ElevatorDis; DisEle_2 = -2*ElevatorDis; DisEle_3 = ElevatorDis;
    surfacedef = 'elevator';
    %elseif flight_condition==3
    AileronDis = Aile;
    DisAil_1 = AileronDis; DisAil_2 = -2*AileronDis; DisAil_3 = AileronDis;
    surfacedef = 'aileron';
    %elseif flight_condition==4
    RudderDis = Rudd;
    DisRud_1 = RudderDis; DisRud_2 = -2*RudderDis; DisRud_3 = RudderDis;
    surfacedef = 'rudder';
else
    disp('Error: Invalid Selection')
end

%-----The main simulation: trimming and optimization loop--------
[trim_state, trim_thrust, trim_control, dLEF, UX] = ...
    trim_F16(thrust, elevator, alpha, aileron, rudder, velocity, altitude);
open('F16Block')
sim('F16Block',[TStart TFinal]);
trim_file = sprintf('%s%.3f%s%.3f%s%.3f_%smodel_alt%0.f_vel%.0f.txt', surface1, ...
    ElevatorDis, surface2, AileronDis, surface3, RudderDis, fi_type, altitude, velocity);
fid_trim = fopen(trim_file, 'w');
heading1 = sprintf('%% \n\t\t %s DATA Trim-Doublet on %s: Alt %.0f, Alpha %.0f\n\n', ...
    fi_type, surfacedef, altitude, alpha);
heading2 = sprintf('\ntime,npos,epos,alt,phi,theta,psi,vel,alpha,beta,p,q,r,nx,ny,nz,mach,qbar,ps,\n\n');
fprintf(fid_trim,heading1);
fprintf(fid_trim,heading2);

fid_trim = fopen(trim_file, 'a');
for row = 1 : 1 : length(y_sim(:,1))
    fprintf(fid_trim,'%8.5f,',T(row,:));
    for column = 1 : 1 : length(y_sim(1,:))
        fprintf(fid_trim,'%8.5f,',y_sim(row,column));
    end
    for column = 1:1:length(surfaces(1,:))
        fprintf(fid_trim,'%8.5f,',surfaces(row,column));
    end
    fprintf(fid_trim,'\n');
end
fclose(fid_trim);

% Plot simulation results
warning off all
graphF16;


APPENDIX E: Embedded PowerPC™440 Processor–AGPC Co-Processor System XPS Synthesis and Xilinx ISE™ Device Utilization Summaries

APPENDIX E–1: XPS Synthesis Summary for the Embedded PowerPC™440 Processor–AGPC Co-Processor System


APPENDIX E–2: Xilinx ISE™ Device Utilization Summary for the Embedded PowerPC™440 Processor–AGPC Co-Processor System


APPENDIX E–3: Summary and Table of Contents of the Embedded PowerPC™440 Processor–AGPC Co-Processor System


APPENDIX E–4: The AGPC Co-Processor (f16_nagpc_ipcore_plbw_0) System Device Utilization


APPENDIX E–5: The EDK Processor API for the AGPC Co-Processor IP Core Drivers and Software Development Guide


APPENDIX E–6: Software for Initializing the Embedded System Driver and Implementing the Embedded PowerPC™440 Processor and the AGPC Co-Processor System on the Virtex-5 FX70T ML507 FPGA Board

/* The program for the initialization and implementation of a neural network-based
   adaptive generalized predictive control (AGPC) algorithm embedded in a MicroBlaze
   processor system for the auto-pilot control system of a nonlinear F-16 aircraft. */

#include "xparameters.h"

#include "xutil.h"

#include "stdio.h"

#include "f16_nagpc_ipcore_plbw.h"

#include "xcope.h"

#include "time.h"

int main(void)
{

/*==================================Initialize pointers===========================*/

FILE *Flag_a;

//

FILE *HW_NN_Wa;

FILE *HW_NN_Wb;

//

FILE *AIL_REF_P;

FILE *AIL_PRED_P;

FILE *AIL_ROLL_CONT_P;

FILE *ELEV_REF_P;

FILE *ELEV_PRED_P;

FILE *ELEV_PITCH_CONT_P;

FILE *RUDD_REF_P;

FILE *RUDD_PRED_P;

FILE *RUDD_YAW_CONT_P;

//

FILE *START_TIME_P;

FILE *END_TIME_P;

FILE *CLOCKS_PER_SEC_P;

// =========Define the integer to hold the number of iterations===========

char cont_flag, stop_flag;

int BLOCK_PERIOD, HW_OUT_SEQ, NUMBER_OF_SAMPLES, SIM_NUMBER;

BLOCK_PERIOD = 91;

HW_OUT_SEQ = 160;

NUMBER_OF_SAMPLES = BLOCK_PERIOD * HW_OUT_SEQ;

SIM_NUMBER = 0;

clock_t START_TIME, END_TIME;

//=========================================================================

// Define data to be loaded and written to the inputs of the registers

float HW_NN_Wa_1, HW_NN_Wa_2, HW_NN_Wa_3, HW_NN_Wa_4, HW_NN_Wa_5,

HW_NN_Wa_6, HW_NN_Wa_7, HW_NN_Wa_8, HW_NN_Wa_9, HW_NN_Wa_10,

HW_NN_Wa_11, HW_NN_Wa_12, HW_NN_Wa_13, HW_NN_Wa_14, HW_NN_Wa_15,

HW_NN_Wa_16, HW_NN_Wa_17, HW_NN_Wa_18, HW_NN_Wa_19, HW_NN_Wa_20,

HW_NN_Wa_21, HW_NN_Wa_22, HW_NN_Wa_23, HW_NN_Wa_24, HW_NN_Wa_25;

float HW_NN_Wb_1, HW_NN_Wb_2, HW_NN_Wb_3, HW_NN_Wb_4, HW_NN_Wb_5, HW_NN_Wb_6;

// Define data to be read and written to the output peripherals.

uint32_t AIL_REF, AIL_PRED, AIL_ROLL_CONT,

ELEV_REF, ELEV_PRED, ELEV_PITCH_CONT,

RUDD_REF, RUDD_PRED, RUDD_YAW_CONT;


// Define the values to be read from the peripherals

uint32_t AIL_REF_store, AIL_PRED_store, AIL_ROLL_CONT_store,

ELEV_REF_store, ELEV_PRED_store, ELEV_PITCH_CONT_store,

RUDD_REF_store, RUDD_PRED_store, RUDD_YAW_CONT_store;

// Define pointers for reading back float values from the peripheral registers

float *AIL_REF_P_addr, *AIL_PRED_P_addr, *AIL_ROLL_CONT_P_addr,

*ELEV_REF_P_addr, *ELEV_PRED_P_addr, *ELEV_PITCH_CONT_P_addr,

*RUDD_REF_P_addr, *RUDD_PRED_P_addr, *RUDD_YAW_CONT_P_addr;

//== Begin reading and computation of the AGPC algorithm via the embedded PowerPC™440 processor system ==

HW_NN_Wa = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\F16_NN_Wa.text","r");
HW_NN_Wb = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\F16_NN_Wb.text","r");

//==================================================================

AIL_REF_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\AIL_REF.text","w");
AIL_PRED_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\AIL_PRED.text","w");
AIL_ROLL_CONT_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\AIL_ROLL_CONT.text","w");
ELEV_REF_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\ELEV_REF.text","w");
ELEV_PRED_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\ELEV_PRED.text","w");
ELEV_PITCH_CONT_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\ELEV_PITCH_CONT.text","w");
RUDD_REF_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\RUDD_REF.text","w");
RUDD_PRED_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\RUDD_PRED.text","w");
RUDD_YAW_CONT_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\RUDD_YAW_CONT.text","w");

//==================================================================

START_TIME_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\START_TIME.text","w");
END_TIME_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\END_TIME.text","w");
CLOCKS_PER_SEC_P = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\fpga_agpc_outputs\\CLOCK_PER_SECOND.text","w");

//==================================================================

//===========Load the F-16 neural network model to the register locations ===========

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_1); //printf("%10.10f\n", HW_NN_Wa_1);


fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_2); //printf("%10.10f\n", HW_NN_Wa_2);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_3); //printf("%10.10f\n", HW_NN_Wa_3);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_4); //printf("%10.10f\n", HW_NN_Wa_4);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_5); //printf("%10.10f\n", HW_NN_Wa_5);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_6); //printf("%10.10f\n", HW_NN_Wa_6);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_7); //printf("%10.10f\n", HW_NN_Wa_7);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_8); //printf("%10.10f\n", HW_NN_Wa_8);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_9); //printf("%10.10f\n", HW_NN_Wa_9);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_10); //printf("%10.10f\n", HW_NN_Wa_10);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_11); //printf("%10.10f\n", HW_NN_Wa_11);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_12); //printf("%10.10f\n", HW_NN_Wa_12);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_13); //printf("%10.10f\n", HW_NN_Wa_13);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_14); //printf("%10.10f\n", HW_NN_Wa_14);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_15); //printf("%10.10f\n", HW_NN_Wa_15);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_16); //printf("%10.10f\n", HW_NN_Wa_16);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_17); //printf("%10.10f\n", HW_NN_Wa_17);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_18); //printf("%10.10f\n", HW_NN_Wa_18);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_19); //printf("%10.10f\n", HW_NN_Wa_19);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_20); //printf("%10.10f\n", HW_NN_Wa_20);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_21); //printf("%10.10f\n", HW_NN_Wa_21);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_22); //printf("%10.10f\n", HW_NN_Wa_22);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_23); //printf("%10.10f\n", HW_NN_Wa_23);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_24); //printf("%10.10f\n", HW_NN_Wa_24);

fscanf(HW_NN_Wa,"%f",&HW_NN_Wa_25); //printf("%10.10f\n", HW_NN_Wa_25);

//==================================================

fscanf(HW_NN_Wb,"%f",&HW_NN_Wb_1); //printf("%10.10f\n", HW_NN_Wb_1);

fscanf(HW_NN_Wb,"%f",&HW_NN_Wb_2); //printf("%10.10f\n", HW_NN_Wb_2);

fscanf(HW_NN_Wb,"%f",&HW_NN_Wb_3); //printf("%10.10f\n", HW_NN_Wb_3);

fscanf(HW_NN_Wb,"%f",&HW_NN_Wb_4); //printf("%10.10f\n", HW_NN_Wb_4);

fscanf(HW_NN_Wb,"%f",&HW_NN_Wb_5); //printf("%10.10f\n", HW_NN_Wb_5);

fscanf(HW_NN_Wb,"%f",&HW_NN_Wb_6); //printf("%10.10f\n", HW_NN_Wb_6);

/*=================================================================================*/

printf("==================================================\n");

printf("========== Start of computation ==========\n");

/*=================================================================================*/

/* Initialize the software drivers for the AGPC Pcore devices with ID 0 and allocate appropriate memory space */

xc_iface_t *iface;

F16_NAGPC_IPCORE_PLBW_Config F16_NAGPC_IPCORE_PLBW_ConfigTable[160];
xc_create(&iface, &F16_NAGPC_IPCORE_PLBW_ConfigTable[0]);   /* index 0: [160] would point one past the end of the table */

while (SIM_NUMBER < NUMBER_OF_SAMPLES)
{
Flag_a = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\Flag_a.text","r");

fscanf(Flag_a,"%c",&cont_flag);

printf("Flag_a = %c\n",cont_flag);

fclose(Flag_a);
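/* Handshake with the host: an external (host-side) program is expected to
   write the character 'b' to Flag_a.text when a new control cycle may start;
   this program polls the file until 'b' is seen and writes 'a' back at the
   end of the cycle (see stop_flag below). */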

while (cont_flag != 'b')
{
Flag_a = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\Flag_a.text","r");
fscanf(Flag_a,"%c",&cont_flag);
printf("Flag_a = %c\n",cont_flag);
fclose(Flag_a);
}

if (cont_flag == 'b')
{


START_TIME = clock();

/*=================================================================================*/

// Software drivers, shared memories and AGPC Pcore device declarations:

// First, for the input registers

xc_to_reg_t *toreg_HW_NN_Wa_1, *toreg_HW_NN_Wa_2, *toreg_HW_NN_Wa_3,

*toreg_HW_NN_Wa_4, *toreg_HW_NN_Wa_5, *toreg_HW_NN_Wa_6,

*toreg_HW_NN_Wa_7, *toreg_HW_NN_Wa_8, *toreg_HW_NN_Wa_9,

*toreg_HW_NN_Wa_10, *toreg_HW_NN_Wa_11, *toreg_HW_NN_Wa_12,

*toreg_HW_NN_Wa_13, *toreg_HW_NN_Wa_14, *toreg_HW_NN_Wa_15,

*toreg_HW_NN_Wa_16, *toreg_HW_NN_Wa_17, *toreg_HW_NN_Wa_18,

*toreg_HW_NN_Wa_19, *toreg_HW_NN_Wa_20, *toreg_HW_NN_Wa_21,

*toreg_HW_NN_Wa_22, *toreg_HW_NN_Wa_23, *toreg_HW_NN_Wa_24,

*toreg_HW_NN_Wa_25;

xc_to_reg_t *toreg_HW_NN_Wb_1, *toreg_HW_NN_Wb_2, *toreg_HW_NN_Wb_3,

*toreg_HW_NN_Wb_4, *toreg_HW_NN_Wb_5, *toreg_HW_NN_Wb_6;

xc_to_reg_t *fromreg_HW_OUT_SEQ;

// Second, for the output registers

xc_from_reg_t *fromreg_AIL_REF, *fromreg_AIL_PRED, *fromreg_AIL_ROLL_CONT;

xc_from_reg_t *fromreg_ELEV_REF, *fromreg_ELEV_PRED, *fromreg_ELEV_PITCH_CONT;

xc_from_reg_t *fromreg_RUDD_REF, *fromreg_RUDD_PRED, *fromreg_RUDD_YAW_CONT;

/*==================================================================================*/

// Next, the memory locations for storing the settings of the registers are obtained:

// First, for the inputs

xc_get_shmem(iface, "toreg_HW_NN_Wa_1", (void **) &toreg_HW_NN_Wa_1);

xc_get_shmem(iface, "toreg_HW_NN_Wa_2", (void **) &toreg_HW_NN_Wa_2);

xc_get_shmem(iface, "toreg_HW_NN_Wa_3", (void **) &toreg_HW_NN_Wa_3);

xc_get_shmem(iface, "toreg_HW_NN_Wa_4", (void **) &toreg_HW_NN_Wa_4);

xc_get_shmem(iface, "toreg_HW_NN_Wa_5", (void **) &toreg_HW_NN_Wa_5);

xc_get_shmem(iface, "toreg_HW_NN_Wa_6", (void **) &toreg_HW_NN_Wa_6);

xc_get_shmem(iface, "toreg_HW_NN_Wa_7", (void **) &toreg_HW_NN_Wa_7);

xc_get_shmem(iface, "toreg_HW_NN_Wa_8", (void **) &toreg_HW_NN_Wa_8);

xc_get_shmem(iface, "toreg_HW_NN_Wa_9", (void **) &toreg_HW_NN_Wa_9);

xc_get_shmem(iface, "toreg_HW_NN_Wa_10", (void **) &toreg_HW_NN_Wa_10);

xc_get_shmem(iface, "toreg_HW_NN_Wa_11", (void **) &toreg_HW_NN_Wa_11);

xc_get_shmem(iface, "toreg_HW_NN_Wa_12", (void **) &toreg_HW_NN_Wa_12);

xc_get_shmem(iface, "toreg_HW_NN_Wa_13", (void **) &toreg_HW_NN_Wa_13);

xc_get_shmem(iface, "toreg_HW_NN_Wa_14", (void **) &toreg_HW_NN_Wa_14);

xc_get_shmem(iface, "toreg_HW_NN_Wa_15", (void **) &toreg_HW_NN_Wa_15);

xc_get_shmem(iface, "toreg_HW_NN_Wa_16", (void **) &toreg_HW_NN_Wa_16);

xc_get_shmem(iface, "toreg_HW_NN_Wa_17", (void **) &toreg_HW_NN_Wa_17);

xc_get_shmem(iface, "toreg_HW_NN_Wa_18", (void **) &toreg_HW_NN_Wa_18);

xc_get_shmem(iface, "toreg_HW_NN_Wa_19", (void **) &toreg_HW_NN_Wa_19);

xc_get_shmem(iface, "toreg_HW_NN_Wa_20", (void **) &toreg_HW_NN_Wa_20);

xc_get_shmem(iface, "toreg_HW_NN_Wa_21", (void **) &toreg_HW_NN_Wa_21);

xc_get_shmem(iface, "toreg_HW_NN_Wa_22", (void **) &toreg_HW_NN_Wa_22);

xc_get_shmem(iface, "toreg_HW_NN_Wa_23", (void **) &toreg_HW_NN_Wa_23);

xc_get_shmem(iface, "toreg_HW_NN_Wa_24", (void **) &toreg_HW_NN_Wa_24);

xc_get_shmem(iface, "toreg_HW_NN_Wa_25", (void **) &toreg_HW_NN_Wa_25);

xc_get_shmem(iface, "toreg_HW_NN_Wb_1", (void **) &toreg_HW_NN_Wb_1);

xc_get_shmem(iface, "toreg_HW_NN_Wb_2", (void **) &toreg_HW_NN_Wb_2);

xc_get_shmem(iface, "toreg_HW_NN_Wb_3", (void **) &toreg_HW_NN_Wb_3);

xc_get_shmem(iface, "toreg_HW_NN_Wb_4", (void **) &toreg_HW_NN_Wb_4);

xc_get_shmem(iface, "toreg_HW_NN_Wb_5", (void **) &toreg_HW_NN_Wb_5);

xc_get_shmem(iface, "toreg_HW_NN_Wb_6", (void **) &toreg_HW_NN_Wb_6);

// For the outputs

xc_get_shmem(iface, "fromreg_AIL_REF", (void **) &fromreg_AIL_REF);

xc_get_shmem(iface, "fromreg_AIL_PRED", (void **) &fromreg_AIL_PRED);

xc_get_shmem(iface, "fromreg_AIL_ROLL_CONT", (void **) &fromreg_AIL_ROLL_CONT);

xc_get_shmem(iface, "fromreg_ELEV_REF", (void **) &fromreg_ELEV_REF);

xc_get_shmem(iface, "fromreg_ELEV_PRED", (void **) &fromreg_ELEV_PRED);


xc_get_shmem(iface, "fromreg_ELEV_PITCH_CONT", (void **) &fromreg_ELEV_PITCH_CONT);

xc_get_shmem(iface, "fromreg_RUDD_REF", (void **) &fromreg_RUDD_REF);

xc_get_shmem(iface, "fromreg_RUDD_PRED", (void **) &fromreg_RUDD_PRED);

xc_get_shmem(iface, "fromreg_RUDD_YAW_CONT", (void **) &fromreg_RUDD_YAW_CONT);

xc_get_shmem(iface, "fromreg_HW_OUT_SEQ", (void **) &fromreg_HW_OUT_SEQ);

// Write the data of the identified neural network model to the peripheral registers

xc_write(iface, toreg_HW_NN_Wa_1 -> din, HW_NN_Wa_1);

xc_write(iface, toreg_HW_NN_Wa_2 -> din, HW_NN_Wa_2);

xc_write(iface, toreg_HW_NN_Wa_3 -> din, HW_NN_Wa_3);

xc_write(iface, toreg_HW_NN_Wa_4 -> din, HW_NN_Wa_4);

xc_write(iface, toreg_HW_NN_Wa_5 -> din, HW_NN_Wa_5);

xc_write(iface, toreg_HW_NN_Wa_6 -> din, HW_NN_Wa_6);

xc_write(iface, toreg_HW_NN_Wa_7 -> din, HW_NN_Wa_7);

xc_write(iface, toreg_HW_NN_Wa_8 -> din, HW_NN_Wa_8);

xc_write(iface, toreg_HW_NN_Wa_9 -> din, HW_NN_Wa_9);

xc_write(iface, toreg_HW_NN_Wa_10 -> din, HW_NN_Wa_10);

xc_write(iface, toreg_HW_NN_Wa_11 -> din, HW_NN_Wa_11);

xc_write(iface, toreg_HW_NN_Wa_12 -> din, HW_NN_Wa_12);

xc_write(iface, toreg_HW_NN_Wa_13 -> din, HW_NN_Wa_13);

xc_write(iface, toreg_HW_NN_Wa_14 -> din, HW_NN_Wa_14);

xc_write(iface, toreg_HW_NN_Wa_15 -> din, HW_NN_Wa_15);

xc_write(iface, toreg_HW_NN_Wa_16 -> din, HW_NN_Wa_16);

xc_write(iface, toreg_HW_NN_Wa_17 -> din, HW_NN_Wa_17);

xc_write(iface, toreg_HW_NN_Wa_18 -> din, HW_NN_Wa_18);

xc_write(iface, toreg_HW_NN_Wa_19 -> din, HW_NN_Wa_19);

xc_write(iface, toreg_HW_NN_Wa_20 -> din, HW_NN_Wa_20);

xc_write(iface, toreg_HW_NN_Wa_21 -> din, HW_NN_Wa_21);

xc_write(iface, toreg_HW_NN_Wa_22 -> din, HW_NN_Wa_22);

xc_write(iface, toreg_HW_NN_Wa_23 -> din, HW_NN_Wa_23);

xc_write(iface, toreg_HW_NN_Wa_24 -> din, HW_NN_Wa_24);

xc_write(iface, toreg_HW_NN_Wa_25 -> din, HW_NN_Wa_25);

xc_write(iface, toreg_HW_NN_Wb_1 -> din, HW_NN_Wb_1);

xc_write(iface, toreg_HW_NN_Wb_2 -> din, HW_NN_Wb_2);

xc_write(iface, toreg_HW_NN_Wb_3 -> din, HW_NN_Wb_3);

xc_write(iface, toreg_HW_NN_Wb_4 -> din, HW_NN_Wb_4);

xc_write(iface, toreg_HW_NN_Wb_5 -> din, HW_NN_Wb_5);

xc_write(iface, toreg_HW_NN_Wb_6 -> din, HW_NN_Wb_6);

xc_write(iface, fromreg_HW_OUT_SEQ -> din, HW_OUT_SEQ);
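/* Note: despite the "fromreg" prefix in its generated name, this register is
   written here: it passes the output-sequence length (HW_OUT_SEQ = 160) to
   the co-processor. */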

// Read the reference signals, predicted outputs and control signals from their peripherals:
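/* Each xc_read returns the raw 32-bit register contents; the dout address is
   also cast to (float *) so that the single-precision value produced by the
   co-processor can be logged directly below. */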

// The roll rate predictions and the aileron control signal.

xc_read(iface, fromreg_AIL_REF -> dout, &AIL_REF_store);

AIL_REF_P_addr = (float*)fromreg_AIL_REF->dout;

xc_read(iface, fromreg_AIL_PRED -> dout, &AIL_PRED_store);

AIL_PRED_P_addr = (float*)fromreg_AIL_PRED->dout;

xc_read(iface, fromreg_AIL_ROLL_CONT -> dout, &AIL_ROLL_CONT_store);

AIL_ROLL_CONT_P_addr = (float*)fromreg_AIL_ROLL_CONT->dout;

// The pitch rate angle predictions and the elevator control signal.

xc_read(iface, fromreg_ELEV_REF -> dout, &ELEV_REF_store);

ELEV_REF_P_addr = (float*)fromreg_ELEV_REF->dout;

xc_read(iface, fromreg_ELEV_PRED -> dout, &ELEV_PRED_store);

ELEV_PRED_P_addr = (float*)fromreg_ELEV_PRED->dout;

xc_read(iface, fromreg_ELEV_PITCH_CONT -> dout, &ELEV_PITCH_CONT_store);

ELEV_PITCH_CONT_P_addr = (float*)fromreg_ELEV_PITCH_CONT->dout;

// The yaw rate predictions and the rudder control signal.

xc_read(iface, fromreg_RUDD_REF -> dout, &RUDD_REF_store);


RUDD_REF_P_addr = (float*)fromreg_RUDD_REF->dout;

xc_read(iface, fromreg_RUDD_PRED -> dout, &RUDD_PRED_store);

RUDD_PRED_P_addr = (float*)fromreg_RUDD_PRED->dout;

xc_read(iface, fromreg_RUDD_YAW_CONT -> dout, &RUDD_YAW_CONT_store);

RUDD_YAW_CONT_P_addr = (float*)fromreg_RUDD_YAW_CONT->dout;

// Print the reference signals, output predictions and the control signals at each time sample to the RS232 serial port:
//printf("%d, AIL_REF = %3.4f, AIL_PRED = %3.4f, AIL_ROLL_CONT = %3.4f \n\r", HW_OUT_SEQ, AIL_REF, AIL_PRED, AIL_ROLL_CONT);
//printf("%d, ELEV_REF = %3.4f, ELEV_PRED = %3.4f, ELEV_PITCH_CONT = %3.4f \n\r", HW_OUT_SEQ, ELEV_REF, ELEV_PRED, ELEV_PITCH_CONT);
//printf("%d, RUDD_REF = %3.4f, RUDD_PRED = %3.4f, RUDD_YAW_CONT = %3.4f \n\r", HW_OUT_SEQ, RUDD_REF, RUDD_PRED, RUDD_YAW_CONT);

fprintf(AIL_REF_P, "%3.4f\n", *AIL_REF_P_addr);
fprintf(AIL_PRED_P, "%3.4f\n", *AIL_PRED_P_addr);
fprintf(AIL_ROLL_CONT_P, "%3.4f\n", *AIL_ROLL_CONT_P_addr);
fprintf(ELEV_REF_P, "%3.4f\n", *ELEV_REF_P_addr);
fprintf(ELEV_PRED_P, "%3.4f\n", *ELEV_PRED_P_addr);
fprintf(ELEV_PITCH_CONT_P, "%3.4f\n", *ELEV_PITCH_CONT_P_addr);
fprintf(RUDD_REF_P, "%3.4f\n", *RUDD_REF_P_addr);
fprintf(RUDD_PRED_P, "%3.4f\n", *RUDD_PRED_P_addr);
fprintf(RUDD_YAW_CONT_P, "%3.4f\n", *RUDD_YAW_CONT_P_addr);

//===========================================
// (All model, data and timing files opened above are closed once, after the
//  control loop has terminated (see the end of main), so that every control
//  cycle can append its results.)

//=============================================================================

// ------ End clock --------------------
END_TIME = clock();
fprintf(START_TIME_P, "%2.16e\n", (double)START_TIME);
fprintf(END_TIME_P, "%2.16e\n", (double)END_TIME);
fprintf(CLOCKS_PER_SEC_P, "%d\n", (int)CLOCKS_PER_SEC);
//=============================================================================

printf(" Start Time = %2.16e\n\n",START_TIME);

printf(" End Time = %2.16e\n\n",END_TIME);

printf(" Clocks per second = %2.16e\n\n",CLOCKS_PER_SEC);

printf(" Number of simulation = %d\n\n",SIM_NUMBER);


//=================================================================================

} /* end if (cont_flag == 'b') */

stop_flag = 'a';
Flag_a = fopen("C:\\Xilinx\\11.1\\phd_work\\emb_ppc440_agpc\\emb_ppc440_agpc\\f16_nagpc_ipcore\\Flag_a.text","w+");
fprintf(Flag_a,"%c",stop_flag);
fclose(Flag_a);

//===== Increment the control loop according to the number of samples ====
SIM_NUMBER = SIM_NUMBER + BLOCK_PERIOD;
} /* end while (SIM_NUMBER < NUMBER_OF_SAMPLES) */

// Close the model, data and timing files once the control loop has completed.
fclose(HW_NN_Wa);
fclose(HW_NN_Wb);
fclose(AIL_REF_P);
fclose(AIL_PRED_P);
fclose(AIL_ROLL_CONT_P);
fclose(ELEV_REF_P);
fclose(ELEV_PRED_P);
fclose(ELEV_PITCH_CONT_P);
fclose(RUDD_REF_P);
fclose(RUDD_PRED_P);
fclose(RUDD_YAW_CONT_P);
fclose(START_TIME_P);
fclose(END_TIME_P);
fclose(CLOCKS_PER_SEC_P);

//=================================================================================

printf(" ========== End of computation ==========\n\n");

printf(" BLOCK_PERIOD = %d\n\n",BLOCK_PERIOD);

printf(" Number of samples = %d\n\n",SIM_NUMBER/BLOCK_PERIOD);

return 0;
} /* end main */