Research justification report for the predoctoral grants for the training of research staff (FI)

The justification report consists of the following two parts:
1. Basic data and summaries
2. Report of the work (scientific report)
All fields are mandatory.

1. Basic data and summaries

Project title (must summarize the scientific topic of your document): Intelligent PCA Contribution Analysis for Quality Estimation in Batch Processes. Application in a Sequencing Batch Reactor for Wastewater Treatment

Researcher (grant beneficiary): First name: Alberto. Surnames: Wong Ramírez. Email: [email protected]

Project director: First name: Joan. Surnames: Colomer Llinàs. Email: [email protected]

University / centre of affiliation: Universitat de Girona, Departament d'Enginyeria Elèctrica, Electrònica i Automàtica, Enginyeria de Control i Sistemes Intel·ligents - Grup de Recerca

File number: 2010FI_B200198

Keywords (five concepts that define the content of the report): Batch Processes, Contribution Plots, Data Mining, Fault Diagnosis, Principal Component Analysis

Date of submission of the justification: 28/07/2011



Summary in the language of the project (maximum 300 words): In this work, a new method is proposed to estimate in real time the quality of the final product in batch processes. This method reduces the time needed to obtain the quality results from laboratory analysis. A principal component analysis (PCA) model built with historical data under normal operating conditions is used to determine whether a finished batch is normal or not. A fault signature is computed for the abnormal batches and passed through a classification model for its estimation. The study proposes a method, based on fault signatures, for using the information in the contribution plots, in which the indicators represent the behavior of the variables throughout the process in its different stages. A dataset composed of the fault signatures of historical abnormal batches is built to search for patterns and to train classification models that estimate the results of future batches. The proposed methodology has been applied to a sequencing batch reactor (SBR). Several classification algorithms are tested to demonstrate the possibilities of the proposed methodology.


Summary in English (maximum 300 words): In this work, a new method to estimate in real time the quality of the final product in batch processes is proposed. This method reduces the time required to obtain the quality results from laboratory analysis. A principal component analysis (PCA) model built with historical data under normal operating conditions is used to determine whether a released batch is normal or not. For abnormal batches, a fault signature is calculated and passed through a classification model for the estimation. The study proposes a method that uses the information in the contribution plots as a fault signature, in which indicators represent the behavior of the process variables in the different stages. A fault signature dataset composed of historical abnormal batches is built to search for patterns and to train classification models that estimate the results of future batches. The proposed methodology has been applied to a sequencing batch reactor (SBR). Several classification algorithms are tested to demonstrate the possibilities of the proposed methodology.


2. Report of the work (scientific report, no word limit). Other files of any type may be attached, each no larger than 10 MB.

The structure of this work consists of eight chapters, a glossary and the references.

Chapter 1 presents the background of the study, the methods and techniques that are applied, the situation in which the study is carried out and the objective the study aims to achieve.

Chapter 2 presents the different types of wastewater treatment plants, the stages used to treat wastewater, the differences between two types of wastewater treatment plant, and the advantages and disadvantages of one type with respect to the other.

Chapter 3 covers the history and theory of multivariate statistical process control (MSPC), starting from its origins in statistical process control and how it is applied, followed by the statistical charts used in MSPC to detect faulty products. Next, it presents principal component analysis, a popular MSPC technique for industrial processes, together with its statistical charts for fault detection and the contribution plots used to diagnose faulty products. Finally, it describes unfold-PCA, a technique consistent with PCA but applied to batch processes.

Chapter 4 describes the wastewater treatment pilot plant and the historical data, including the laboratory analysis of the quality variables of the treated water. It then presents the construction of a PCA model for batch processes from the historical data of the plant, the detection of faulty processes with the statistical chart, and the use of contribution plots for diagnosis.

Chapter 5 presents the new methodology proposed in this study: the contribution limit chart, designed to perform the diagnosis task better than the contribution plots. It also describes other methods that were proposed to improve the diagnosis of the stages of a batch process with the contribution limit chart and that were discarded because of their poor results. Finally, it presents the second part of the proposed method, in which a fault signature is developed to represent a faulty batch and is used to diagnose newly released batches.

Chapter 6 applies the proposed methodology to the historical data of the wastewater treatment plant. The PCA model and the statistical chart are built to detect the faulty batches. The estimated diagnosis of the global quality removal of the treated wastewater is presented using the contribution limit charts, the fault signature with the binary indicator and the rule set obtained by a rule induction algorithm.

Chapter 7 uses the proposed methodology to estimate the diagnosis of each quality variable. In this chapter, the two fault signature indicators are applied to the historical data, and rule induction and classification algorithms are used to obtain the rule sets and knowledge models that perform the estimated diagnosis of each quality variable.

Chapter 8 presents the conclusions of the study: the results obtained with the unfold-PCA technique, the analysis of the proposed methodology for estimating the different quality variables of the process, and the advantages of the system. Finally, it outlines future work that can be developed with the new methodology.


Intelligent PCA Contribution Analysis for Quality Estimation in Batch Processes. Application in a Sequencing Batch Reactor for Wastewater Treatment

Alberto Wong Ramírez

Department of Electrical, Electronic and Automatic Engineering

Control Engineering and Intelligent Systems Group

eXiT

2011 July


Abstract

In this work, a new method is proposed to estimate in real time the quality of the final product in batch processes. This method reduces the time needed to obtain the quality results from laboratory analysis. A principal component analysis (PCA) model built with historical data under normal operating conditions is used to determine whether a finished batch is normal or not. A fault signature is computed for the abnormal batches and passed through a classification model for its estimation. The study proposes a method, based on fault signatures, for using the information in the contribution plots, in which the indicators represent the behavior of the variables throughout the process in its different stages. A dataset composed of the fault signatures of historical abnormal batches is built to search for patterns and to train classification models that estimate the results of future batches. The proposed methodology has been applied to a sequencing batch reactor (SBR). Several classification algorithms are tested to demonstrate the possibilities of the proposed methodology.


Abstract

In this work, a new method to estimate in real time the quality of the final product in batch processes is proposed. This method reduces the time required to obtain the quality results from laboratory analysis. A principal component analysis (PCA) model built with historical data under normal operating conditions is used to determine whether a released batch is normal or not. For abnormal batches, a fault signature is calculated and passed through a classification model for the estimation. The study proposes a method that uses the information in the contribution plots as a fault signature, in which indicators represent the behavior of the process variables in the different stages. A fault signature dataset composed of historical abnormal batches is built to search for patterns and to train classification models that estimate the results of future batches. The proposed methodology has been applied to a sequencing batch reactor (SBR). Several classification algorithms are tested to demonstrate the possibilities of the proposed methodology.


Acknowledgements

The author wishes to thank the Spanish Government (CTQ2008-06865-C02-02) and acknowledges the support of the CUR, the DIUE, the Generalitat of Catalonia and the European Social Fund. Thanks are also due to the Control Engineering and Intelligent Systems Group (eXiT) and its personnel for all their support, and to the Laboratory of Chemical and Environmental Engineering (LEQUIA) and its personnel.


Contents

List of Figures
List of Tables
1 Introduction
1.1 Current Situation
1.2 Objective
1.3 Outline
1.4 Publications and Related
2 Wastewater Treatment Plants
2.1 Continuous Treatment Plant
2.2 Sequencing Batch Reactor
2.3 Conclusions
3 Multivariate Statistical Process Control
3.1 Statistical Process Control
3.2 Outliers
3.3 Multivariate Statistical Process Control
3.3.1 Statistical Charts
3.4 Principal Component Analysis
3.4.1 Principal Components to Retain
3.4.1.1 Percent Variance Explained
3.4.1.2 Kaiser-Guttman Criterion
3.4.1.3 Cattell's Scree Test
3.4.2 Statistical Charts
3.4.2.1 Squared Prediction Error or Q Statistic Chart
3.4.2.2 Hotelling's T² Statistic Chart
3.4.3 Schematic Interpretation of PCA
3.4.4 Contribution Plots
3.5 Unfold-PCA for Batch Processes
3.5.1 Applications
3.6 Conclusions
4 Pilot Plant Description and Statistical Modelling
4.1 Pilot Plant
4.2 Analysis of Historical Data
4.3 PCA Model
4.4 Statistical Chart
4.5 Contribution Plots
4.6 Conclusions
5 New Methodology for Intelligent Contribution Analysis
5.1 Contribution Limit Chart
5.2 Improving the Contribution Limit Chart
5.2.1 Modified Cumulative Sum
5.2.2 Sum of Standard Deviation and Stage Mean
5.2.3 Sum of Standard Deviation with Statistic Range
5.3 Fault Signature
5.3.1 Binary Indicator for Fault Signature
5.3.2 Numeric Indicator for Fault Signature
5.4 Diagnosis with the Fault Signature
5.5 Conclusions
6 Intelligent Contribution Analysis for Fault Diagnosis
6.1 Historical Data
6.2 PCA Model
6.3 Contribution Limit Chart and Binary Fault Signature
6.4 Diagnosis with the Binary Fault Signature
6.5 Conclusions
7 Intelligent Contribution Analysis for Estimation of Quality Variables
7.1 Historical Data
7.2 PCA Model
7.3 Binary Indicator for Fault Signature
7.3.1 Contribution Limit Chart and Binary Fault Signature
7.3.2 Diagnosis with the Binary Fault Signature
7.4 Numeric Indicator for Fault Signature
7.4.1 Contribution Limit Chart and Numeric Fault Signature
7.4.2 Diagnosis with the Numeric Fault Signature
7.5 Conclusions
8 Conclusions and Future Studies
Glossary
References


List of Figures

2.1 Continuous Wastewater Treatment Plant
2.2 Sequencing Batch Reactor Treatment Plant
3.1 Schematic Control Chart
3.2 Outlier
3.3 Bivariate vs. Univariate
3.4 Percent Variance Explained
3.5 Eigenvalue vs. Principal Component
3.6 A Simplified Representation of PCA
3.7 PCA Model Schematic
3.8 3D Matrix Data
3.9 Batch-Wise Unfold
4.1 Pilot Plant
4.2 Pre-process 93 High GQR Batches
4.3 Pre-process 84 High GQR Batches
4.4 PCA Model 84 Batches
4.5 Q Statistic Chart for Medium GQR
4.6 Q Statistic Chart for Low GQR
4.7 Q Statistic Chart for Medium and Low GQR Batches
4.8 Q Contribution Plot
6.1 Pre-process 70 NOC Batches
6.2 PCA Model 70 Batches
6.3 Q Statistic Chart for AOC Batches
6.4 Fault Signature for AOC Batch
6.5 Application Window
7.1 Fault Signature for AOC Batch
7.2 Fault Signature for AOC Batch
7.3 Application Window


List of Tables

4.1 Chemical analysis of BNR and GQR
4.2 Standard levels for nutrient removal
6.1 Chemical analysis of BNR and GQR
6.2 Diagnosis results obtained with the rule set of the CN2 algorithm
6.3 Diagnosis results obtained with the rule set of the PART algorithm
7.1 CN2 diagnosis table for ammonium
7.2 CN2 diagnosis table for nitrates
7.3 CN2 diagnosis table for phosphate
7.4 IB1 diagnosis table for ammonium
7.5 CN2 diagnosis table for nitrates
7.6 CN2 diagnosis table for phosphate
7.7 IB1 diagnosis table for ammonium
7.8 CN2 diagnosis table for nitrates
7.9 CN2 diagnosis table for phosphate


Chapter 1

Introduction

In industrial manufacturing, batch processing is an alternative to continuous processing. In batch processing, the input materials are fed into a reaction tank in a prescribed sequence and, after the mixing reaction, a product is released. In some batch processes, product quality is determined by measuring qualitative variables, which can be done by performing a chemical laboratory test on the released product. Obtaining the result of this test can sometimes take a long time. During that time the mixture must remain untouched, and valuable materials may be lost if the result reveals a low-quality product. Developing systems that can diagnose the quality of the product released by a batch process, and so achieve the highest efficiency, is therefore a major concern for production management (1).

Sequencing batch reactor (SBR) processes have demonstrated their efficiency and flexibility in treating wastewater from domestic and industrial sources that contains high concentrations of nutrients (nitrogen and phosphorus) and toxic compounds (2, 3, 4, 5). The SBR process is highly nonlinear and time-varying. Changes in the influent concentration can affect the process and alter the effluent quality. An estimate of effluent quality that is available sooner than the conventional off-line analysis would make it possible to reconfigure and correct the process.

Principal component analysis (PCA) is a multivariate statistical process control (MSPC) tool for identifying patterns in high-dimensional data. It expresses the data in a way that highlights their similarities and differences (6, 7). The primary objectives of PCA are data summarization, classification of variables, outlier detection, early warning of potential malfunctions and fingerprinting for fault identification (8). PCA has been used in a wide range of continuous processes, where it has proved its ability to detect process faults. Nomikos and MacGregor developed unfold-PCA (U-PCA) for batch processes, whose data are three-dimensional (batches × variables × time) (9). If a process is detected as faulty, a PCA contribution plot is built. This plot is a graphical representation of how much each variable contributes to the detected deviation.
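The short NumPy sketch below illustrates the ideas in this paragraph. It is not the author's implementation, and it rests on several assumptions. The data are random placeholders rather than plant measurements. The three-way array (batches × variables × time) is unfolded batch-wise into one row per batch, in the spirit of U-PCA. The columns are autoscaled, and the loadings come from a singular value decomposition. The Q (squared prediction error) statistic is the sum of squared residuals, and each element's squared residual is taken as its contribution. The empirical 95th-percentile limit on the normal-operation Q values is a simplification chosen only for this example. The helper names (`unfold_batchwise`, `fit_pca`, `q_contributions`) are hypothetical.

```python
import numpy as np


def unfold_batchwise(X3):
    """Batch-wise unfolding of an (I batches, J variables, K samples) array into
    an I x (J*K) matrix; column j*K + k holds variable j at time sample k."""
    I, J, K = X3.shape
    return X3.reshape(I, J * K)


def fit_pca(X, n_components):
    """Autoscale the unfolded NOC data and return (mean, std, loadings P)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - mean) / std
    # The right singular vectors of the autoscaled matrix are the PCA loadings.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T


def q_contributions(x, mean, std, P):
    """Squared residual of each unfolded element; their sum is the Q (SPE) statistic."""
    z = (x - mean) / std
    residual = z - P @ (P.T @ z)  # part of z not explained by the retained PCs
    return residual ** 2


rng = np.random.default_rng(0)
I, J, K = 200, 3, 20
noc_batches = rng.normal(size=(I, J, K))  # placeholder NOC data, not plant data
X = unfold_batchwise(noc_batches)
mean, std, P = fit_pca(X, n_components=3)

# Simple empirical 95th-percentile limit on the NOC Q values (an assumption;
# analytical control limits are the usual choice).
q_noc = np.array([q_contributions(row, mean, std, P).sum() for row in X])
q_limit = np.percentile(q_noc, 95)

# A new batch with a step fault injected in variable 1, samples 5-10.
new_batch = rng.normal(size=(J, K))
new_batch[1, 5:11] += 5.0
c = q_contributions(new_batch.reshape(-1), mean, std, P)
q_new = c.sum()
per_variable = c.reshape(J, K).sum(axis=1)  # contributions summed over time
print(f"Q = {q_new:.1f}, limit = {q_limit:.1f}, faulty = {q_new > q_limit}")
print("contribution per variable:", np.round(per_variable, 1))
```

The last step sums the contributions over time, which gives a per-variable view similar to a contribution plot. Keeping the contributions separate shows instead when in the batch the deviation occurred.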

1.1 Current Situation

An SBR capable of biologically removing organic matter, nitrogen and phosphorus is used for wastewater treatment. Biological nutrient removal (BNR) is measured by laboratory analysis, because sensors for the quality variables are very expensive. The laboratory results for a finished batch are only available several hours later. The process must remain on hold while the quality variables of the treated wastewater (organic matter, nitrogen and phosphorus) are analyzed, which puts compliance with the environmental discharge requirements at risk.

1.2 Objective

The main objective of the project is to develop a system that can predict, in real time, the quality variables of the released product from the quantitative variables measured during a batch process. The system offers three advantages. It reduces the investment in expensive sensors for measuring the qualitative variables of a batch process. It shortens product quality analysis compared with laboratory analysis, which can take several hours. It also provides an estimated diagnosis of the quality variables in real time, immediately after the product is released.
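The abstract outlines how such a system works. The contributions of an abnormal batch are condensed into a fault signature, and a classifier trained on historical abnormal batches maps that signature to a quality estimate. The sketch below illustrates this idea under explicit assumptions. It is not the method developed in later chapters: the stage boundaries, the 95th-percentile rule for the per-variable, per-stage limits, the class labels and the data are all hypothetical. The signature is binary, with a 1 wherever a variable's summed contribution within a stage exceeds its limit. Classification uses a 1-nearest-neighbour rule, the idea behind the IB1 algorithm listed among the classifiers in this work. For 0/1 vectors, the squared Euclidean distance equals the Hamming distance used here.

```python
import numpy as np

# Hypothetical stage boundaries (start, end) in sample indices for a 12-sample batch.
STAGES = [(0, 4), (4, 8), (8, 12)]


def stage_contributions(c):
    """Sum a (J variables, K samples) contribution array within each stage -> (J, S)."""
    return np.stack([c[:, start:end].sum(axis=1) for start, end in STAGES], axis=1)


def stage_limits(noc_contributions, pct=95):
    """Assumed limit rule: percentile of the NOC stage contributions, per variable and stage."""
    per_batch = np.array([stage_contributions(c) for c in noc_contributions])  # (I, J, S)
    return np.percentile(per_batch, pct, axis=0)  # (J, S)


def binary_signature(c, limits):
    """Flattened 0/1 vector: 1 where a variable's stage contribution exceeds its limit."""
    return (stage_contributions(c) > limits).astype(int).ravel()


def ib1_predict(signature, train_signatures, train_labels):
    """1-nearest-neighbour prediction using the Hamming distance between signatures."""
    distances = np.abs(train_signatures - signature).sum(axis=1)
    return train_labels[int(np.argmin(distances))]


rng = np.random.default_rng(1)
J, K = 3, 12
noc = rng.uniform(size=(100, J, K))  # placeholder NOC contributions
limits = stage_limits(noc)


def faulty_contributions(variable, stage):
    """Placeholder abnormal batch: near-median background plus one large deviation."""
    c = rng.uniform(0.4, 0.6, size=(J, K))
    start, end = STAGES[stage]
    c[variable, start:end] += 20.0
    return c


train_signatures = np.array([binary_signature(faulty_contributions(0, 1), limits),
                             binary_signature(faulty_contributions(2, 2), limits)])
train_labels = np.array(["low GQR", "medium GQR"])  # hypothetical labels
new_signature = binary_signature(faulty_contributions(0, 1), limits)
print(new_signature, ib1_predict(new_signature, train_signatures, train_labels))
```

Any classifier that accepts fixed-length feature vectors could replace `ib1_predict`. Rule learners such as CN2 or PART, which this work also lists, would produce readable rule sets from the same signatures.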

1.3 Outline

The structure of this work consists of eight chapters, a glossary and the references.

Chapter 1 presents the background of the study, the methods and techniques that are applied, the situation in which the study is carried out and the objective the study aims to achieve.

Chapter 2 presents the different types of wastewater treatment plants, the stages used to treat wastewater, the differences between two types of wastewater treatment plant, and the advantages and disadvantages of one type with respect to the other.

Chapter 3 covers the history and theory of multivariate statistical process control (MSPC), starting from its origins in statistical process control and how it is applied, followed by the statistical charts used in MSPC to detect faulty products. Next, it presents principal component analysis, a popular MSPC technique for industrial processes, together with its statistical charts for fault detection and the contribution plots used to diagnose faulty products. Finally, it describes unfold-PCA, a technique consistent with PCA but applied to batch processes.

Chapter 4 describes the wastewater treatment pilot plant and the historical data, including the laboratory analysis of the quality variables of the treated water. It then presents the construction of a PCA model for batch processes from the historical data of the plant, the detection of faulty processes with the statistical chart, and the use of contribution plots for diagnosis.

Chapter 5 presents the new methodology proposed in this study: the contribution limit chart, designed to perform the diagnosis task better than the contribution plots. It also describes other methods that were proposed to improve the diagnosis of the stages of a batch process with the contribution limit chart and that were discarded because of their poor results. Finally, it presents the second part of the proposed method, in which a fault signature is developed to represent a faulty batch and is used to diagnose newly released batches.

Chapter 6 applies the proposed methodology to the historical data of the wastewater treatment plant. The PCA model and the statistical chart are built to detect the faulty batches. The estimated diagnosis of the global quality removal of the treated wastewater is presented using the contribution limit charts, the fault signature with the binary indicator and the rule set obtained by a rule induction algorithm.

Chapter 7 uses the proposed methodology to estimate the diagnosis of each quality variable. In this chapter, the two fault signature indicators are applied to the historical data, and rule induction and classification algorithms are used to obtain the rule sets and knowledge models that perform the estimated diagnosis of each quality variable.

Chapter 8 presents the conclusions of the study: the results obtained with the unfold-PCA technique, the analysis of the proposed methodology for estimating the different quality variables of the process, and the advantages of the system. Finally, it outlines future work that can be developed with the new methodology.

1.4 Publications and Related

Alberto Wong Ramírez. Multivariate Statistical Process Control (MSPC) Applied to a Sequencing Batch Reactor for Wastewater Treatment. Master of Science Thesis, Universitat de Girona (UdG), 2007.

A. Wong, J. Colomer, M. Coma and J. Colprim. PCA Intelligent Contribution Analysis for Fault Diagnosis in a Sequencing Batch Reactor. In Proceedings of the iEMSs Fifth Biennial Conference, Vol. 3, pages 2230-2237, 2010.

A. Wong and J. Colomer. Soft-Sensor Utilizando Contribuciones ACP para un Reactor Secuencial por Lotes para la Depuración de Aguas Residuales. In Memorias de la Conferencia Iberoamericana de Complejidad, Informática y Cibernética (CICIC 2011), pages 34-39, 2011.

A. Wong Ramírez and J. Colomer Llinàs. Fault Diagnosis of Batch Processes Release Using PCA Contribution Plots as Fault Signatures. In Proceedings of the 13th International Conference on Enterprise Information Systems, pages 223-228, 2011.

A. Wong Ramírez, J. Colomer Llinàs, M. Coma, S. Puig and J. Colprim. Intelligent PCA Contribution Analysis for Quality Estimation. Submitted to Industrial & Engineering Chemistry Research, 2011.

Chapter 2

Wastewater Treatment Plants

Wastewater treatment is the process of removing pollutants from wastewater, both runoff and domestic. The process combines physical, chemical and biological techniques to remove physical, chemical and biological contaminants (10). The principal objective is to produce a treated effluent suitable for discharge or reuse in the environment. The effluent must meet the applicable standards, here Commission Directive 98/15/EC amending Council Directive 91/271/EEC concerning urban waste water treatment (11).

Wastewater is generated by residences, institutions, and commercial and industrial buildings. It can be treated in a small treatment plant or collected and transported through a pipe network to treatment plant facilities.

2.1 Continuous Treatment Plant

Wastewater treatment plants are commonly composed of a series of stages, and different techniques can be applied in the process to achieve the best possible quality of treated water for disposal. The major stages of wastewater treatment are (12):

• Preliminary treatment: removes materials that could damage plant equipment or would occupy treatment capacity without being treated.

• Primary treatment: removes settleable and floatable solids.

• Secondary treatment: removes biochemical oxygen demand (BOD) and dissolved and colloidal suspended organic matter by biological action. Organic matter is converted into stable solids, carbon dioxide and more organisms.

• Advanced waste treatment: uses physical, chemical and biological processes to remove additional BOD, solids and nutrients.

• Disinfection: removes microorganisms to eliminate or reduce the possibility of disease when the flow is discharged.

• Sludge treatment: stabilizes the solids removed from wastewater during treatment, inactivates pathogenic organisms and reduces the volume of the sludge by removing water.

Figure 2.1: Continuous Wastewater Treatment Plant - Schematic of a continuous wastewater treatment process. Source (10).

2.2 Sequencing Batch Reactor

The sequencing batch reactor differs from the conventional continuous process mainly in how treatment is organized. In the SBR, treatment is performed in a single reaction tank following a structured sequence of stages. In a continuous plant, treatment is performed in several reaction vessels. Batch processes are commonly used to produce high-quality end products such as food, biochemicals, pharmaceuticals, beverages and many other products of chemical processes. Batch processes treat material in a prescribed manner for a finite duration, and successful operation means reproducing a given product from batch to batch (13). SBR technology has proved successful in treating urban and industrial wastewater (14, 15).


The SBR process is commonly divided into five discrete periods: fill, reaction, settle, draw and idle (16). In the fill period the influent is introduced into the tank. It is followed by the reaction period, in which the influent is treated. After the reaction period, a settle period separates the solids. Then comes the draw period, in which the effluent (treated wastewater) is withdrawn. The idle period is used for wasting sludge.

Figure 2.2: Sequencing Batch Reactor Treatment Plant - Schematic of a sequencing

batch reactor wastewater treatment process. Source (16).

Some advantages of the SBR are:

• Equalization, primary clarification, biological treatment and secondary clarification can

be achieved in a single reactor tank.

• Operating flexibility and control.

• Minimal space needed to locate the system.

• Cost savings from eliminating other equipment.

Some disadvantages of the SBR are:

• A higher level of sophistication is required, especially for larger systems.

• Higher level of maintenance associated with more sophisticated controls, automated switches

and automated valves.


• Potential of discharging floating or settled sludge during the draw period.

• Potential requirement for equalization after the SBR process is finished.

2.3 Conclusions

This chapter explained the two major processes used to treat wastewater. A continuous treatment plant needs a large space to house the different vessels through which the wastewater passes as it is treated. A sequencing batch reactor plant needs only one reaction tank to carry out practically all the treatment stages. The SBR plant therefore needs far less space than the continuous plant, but it requires a more sophisticated level of control.


Chapter 3

Multivariate Statistical Process Control

Multivariate statistical process control (MSPC) is a technique for studying the large number of variables in a complex process. Applied in industrial plants, MSPC can help to monitor production, detect faulty processes and reduce costs by decreasing the rate of defective products.

3.1 Statistical Process Control

Statistical process control (SPC) is a statistical technique that uses sets of process measurements to detect variation. Its aim is to bring the process into a state of control and, through this, to improve the quality of the output variable.

SPC is attributed to Dr. Walter Shewhart, who developed the concept during the 1920s while working at Bell Telephone Laboratories on techniques to improve product quality and reduce costs. With SPC he provided a tool to discern whether a process was in control (17).

The Shewhart control chart shows whether a sample plotted on the chart lies within the control limits. The chart is based on measurements of a quality characteristic of the process products, collected over time, against which new samples of the process are compared. It consists of three lines:

• a center line, around which the process samples should lie;
• an upper control limit;
• a lower control limit.

If a sample falls outside the control limits, the process is considered out of control (18, 19).

The control limits depend on three parameters:

• the estimated average process level;
• the process spread, expressed as a range or standard deviation;
• a constant based on the probability of a Type I error α.

The most widely used control limits are the 3σ limits. For a normally distributed characteristic these correspond to α ≈ 0.0027. Let $\bar{x}$ denote the estimated average of the quality characteristic measurement and $\sigma$ its standard deviation. The control limits of the Shewhart control chart are then:

$$\mathrm{UCL} = \bar{x} + 3\sigma \tag{3.1}$$

$$\mathrm{LCL} = \bar{x} - 3\sigma \tag{3.2}$$

Warning limits are added to increase sensitivity and to detect early shifts in the quality characteristic measurement of the process. Here $\bar{x}$ is again the average quality characteristic measurement:

$$\mathrm{UWL} = \bar{x} + 2\sigma \tag{3.3}$$

$$\mathrm{LWL} = \bar{x} - 2\sigma \tag{3.4}$$
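The Python code below is a minimal sketch of equations 3.1 to 3.4; it is not taken from the cited sources. It computes the control and warning limits and labels new samples. It makes two assumptions:

• the center line is the mean of a hypothetical in-control reference set;
• σ is estimated by the sample standard deviation of that set (range-based estimates of σ are also common).

The function names are illustrative.

```python
import random
import statistics


def shewhart_limits(center: float, sigma: float) -> dict:
    """Control (3-sigma, eqs. 3.1-3.2) and warning (2-sigma, eqs. 3.3-3.4) limits."""
    return {
        "UCL": center + 3 * sigma,
        "LCL": center - 3 * sigma,
        "UWL": center + 2 * sigma,
        "LWL": center - 2 * sigma,
    }


def classify(samples, limits):
    """Label each sample 'out' (beyond a control limit), 'warning'
    (beyond a warning limit only) or 'in'."""
    labels = []
    for x in samples:
        if x > limits["UCL"] or x < limits["LCL"]:
            labels.append("out")
        elif x > limits["UWL"] or x < limits["LWL"]:
            labels.append("warning")
        else:
            labels.append("in")
    return labels


# Hypothetical in-control reference measurements of one quality characteristic
random.seed(0)
reference = [random.gauss(10.0, 0.5) for _ in range(200)]
limits = shewhart_limits(statistics.mean(reference), statistics.stdev(reference))
print(classify([10.1, 11.2, 12.5], limits))
```

A process with many variables would need one such chart per variable. That is the limitation discussed next.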

Figure 3.1: Schematic Control Chart - Source (19)

The Shewhart control chart was used successfully for years in the process industries, where control is applied to a small number of variables, mostly the product quality variable. As new technology increased the complexity of processes, however, monitoring became more challenging. The Shewhart control chart is slow to detect process changes, and it is impractical for controlling a process with hundreds of variables. Since each Shewhart control chart analyzes only one variable, a process with a hundred variables would require a hundred Shewhart control charts.

3.2 Outliers

An outlier is a sample that is very different from the rest of the data set to which it belongs. Its measurements differ substantially with respect to a certain variable or a combination of variables (20).

Detecting outliers in a data set is important when building a statistical model. Outliers can distort the parameters calculated to build the model. The resulting thresholds can then lead to inaccurate predictions for new samples projected onto the model and, in turn, to wrong corrective actions.

Figure 3.2: Outlier - Sample 35 is an outlier in the data set.

Figure 3.2 plots the samples against their standard deviation. Sample 35 has a high standard deviation with respect to the whole data set. This means either that the sample does not belong to the data set or that its measurements contain disturbances.


3.3 Multivariate Statistical Process Control

SPC is a tool that provides early warnings of fault conditions in the process. The quality variables of the product are characterized by their mean and variation. If a new sample plotted on the statistical chart falls outside the thresholds, or suggests a shift of the mean, corrective actions need to be applied (21).

A typical industrial process today can contain hundreds or thousands of sensors. With SPC methods each sensor needs its own monitoring chart, which is impractical when the process is large. Another problem with univariate SPC is that it treats the process variables as if each were independent of the others. In a complex process, however, the variables are correlated with one another in almost all cases. This limitation led to the development of a new technique that handles such correlated variables.

MSPC studies the large number of variables found in a complex process and seeks a model that represents the process measurements. The model is used both to detect faulty processes and to calibrate the process for the best-quality result. The term multivariate refers to the large number of variables that need to be analyzed in the process.

3.3.1 Statistical Charts

Control charting is the most common SPC technique used in industry (22). Diagnosis with control charts helps to reduce the number of low-quality products (23).

In 1947 Hotelling established multivariate process control when he applied the technique to a bombsight problem (24). Hotelling's $T^2$ control statistic takes into account the correlations between the process variables. It builds on a concept Hotelling proposed in 1931: a generalized distance between a new observation and its sample mean.

The Hotelling $T^2$ statistic examines a new sample to determine whether it is out of control compared with the sample mean. For multiple sets of variables, the Hotelling $T^2$ statistic is plotted on a chart against time or observation number and compared with a limit. A normal operation chart can be developed from a historical data set. New samples can then be projected onto it to check whether they are out of control.


Figure 3.3: Bivariate vs Univariate - Source (25)

3.4 Principal Component Analysis

Principal component analysis (PCA) defines a series of new variables, each a linear combination of the original variables. These new variables explain the maximal variability while reducing the dimension of the problem (6).

PCA searches for patterns in the data and shows how the different variables relate to each other. In some cases only a few principal components are needed to explain the variability of the original data with minimal loss of information (6, 7). The objectives of PCA include data compression, classification of variables, detection of outliers, early warning of process malfunctions and fault detection (8).

PCA is based on the eigenvector decomposition of the covariance (or correlation) matrix of the process variables. For a given data matrix $\mathbf{X}$ with $m$ rows (samples) and $n$ columns (variables), the covariance matrix of $\mathbf{X}$ is defined as (26):

$$\operatorname{cov}(\mathbf{X}) = \frac{\mathbf{X}^T\mathbf{X}}{m-1} \tag{3.5}$$

This assumes that the columns of $\mathbf{X}$ have been mean-centered. If the columns of $\mathbf{X}$ have been autoscaled (mean-centered and scaled to unit variance), equation 3.5 gives the correlation matrix of $\mathbf{X}$. PCA decomposes the data matrix $\mathbf{X}$ as the sum of the outer products of vectors $\mathbf{t}_i$ and $\mathbf{p}_i$ plus a residual matrix $\mathbf{E}$:

$$\mathbf{X} = \mathbf{t}_1\mathbf{p}_1^T + \mathbf{t}_2\mathbf{p}_2^T + \cdots + \mathbf{t}_k\mathbf{p}_k^T + \mathbf{E} \tag{3.6}$$

Here $k$ must be less than or equal to the smaller dimension of $\mathbf{X}$, i.e. $k \le \min\{m-1, n\}$. The $\mathbf{t}_i$ vectors are known as scores and contain information on how the samples relate to each other. The $\mathbf{p}_i$ vectors are eigenvectors of the covariance matrix, i.e. for each $\mathbf{p}_i$:

$$\operatorname{cov}(\mathbf{X})\,\mathbf{p}_i = \lambda_i\mathbf{p}_i \tag{3.7}$$

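where $\lambda_i$ is the eigenvalue associated with the eigenvector $\mathbf{p}_i$. In PCA the $\mathbf{p}_i$ are known as loadings and contain information on how the variables relate to each other. The $\mathbf{t}_i$ form an orthogonal set ($\mathbf{t}_i^T\mathbf{t}_j = 0$ for $i \ne j$), while the $\mathbf{p}_i$ are orthonormal ($\mathbf{p}_i^T\mathbf{p}_j = 0$ for $i \ne j$, $\mathbf{p}_i^T\mathbf{p}_j = 1$ for $i = j$). Note that for $\mathbf{X}$ and any $\mathbf{t}_i, \mathbf{p}_i$ pair

$$\mathbf{X}\mathbf{p}_i = \mathbf{t}_i \tag{3.8}$$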

This is because the score vector $\mathbf{t}_i$ is the linear combination of the original $\mathbf{X}$ data defined by $\mathbf{p}_i$. The $\mathbf{t}_i, \mathbf{p}_i$ pairs are arranged in descending order of the associated $\lambda_i$, which measure the amount of variance described by each pair. In this context, variance can be thought of as information. Because the pairs are in descending order of $\lambda_i$, the first pair captures the largest amount of information of any pair in the decomposition. In fact, it can be shown that the $\mathbf{t}_1, \mathbf{p}_1$ pair captures the greatest amount of variation in the data that can be captured with a linear factor. Each subsequent pair captures the greatest possible variance remaining at that step.
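A small NumPy sketch can make equations 3.5 to 3.8 concrete. It is an illustration on synthetic data, not the software used in this work. The hypothetical helper `pca_eig` eigendecomposes the covariance matrix of a mean-centered matrix and sorts the $\mathbf{t}_i, \mathbf{p}_i$ pairs by descending $\lambda_i$. The accompanying checks confirm the stated properties: orthonormal loadings, orthogonal scores and $\mathbf{X}\mathbf{p}_i = \mathbf{t}_i$.

```python
import numpy as np


def pca_eig(X: np.ndarray, k: int):
    """PCA of a mean-centered (or autoscaled) matrix X (m samples x n variables)
    by eigendecomposition of its covariance matrix, following eqs. 3.5-3.8."""
    m = X.shape[0]
    cov = X.T @ X / (m - 1)                  # eq. 3.5
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: cov is symmetric
    order = np.argsort(eigvals)[::-1]        # sort pairs by descending lambda_i
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :k]                       # loadings p_i stored as columns
    T = X @ P                                # scores, eq. 3.8: t_i = X p_i
    E = X - T @ P.T                          # residual matrix of eq. 3.6
    return T, P, eigvals, E


# Hypothetical data: 50 samples of 4 variables, two of them strongly correlated
rng = np.random.default_rng(1)
raw = rng.normal(size=(50, 4))
raw[:, 1] += 2.0 * raw[:, 0]
X = raw - raw.mean(axis=0)                   # mean-center the columns
T, P, lam, E = pca_eig(X, k=2)
print("eigenvalues:", np.round(lam, 3))
```

In practice an SVD of $\mathbf{X}$ is often used instead, since it avoids forming $\mathbf{X}^T\mathbf{X}$. The eigendecomposition is used here only because it mirrors the text.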

3.4.1 Principal Components to Retain

Deciding how many principal components to retain is one of the key issues in building a PCA model.

• If too few principal components are retained, the threshold is narrower and the model raises more false alarms.
• If too many principal components are retained, the threshold of the model is wider and early detection of process misbehavior is slow (27).

Several methods have been proposed to determine the number of principal components to retain.


3.4.1.1 Percent Variance Explained

In this method, enough principal components are retained to explain a chosen percentage of the total process variance (figure 3.4). The percentage is computed from the eigenvalues of the covariance matrix, each of which measures the share of process variance captured by its principal component (27). Because the choice of percentage is arbitrary, other methods are needed to confirm whether the proposed percentage yields the correct number of principal components.

Figure 3.4: Percent Variance Explained - 20 principal components with their respective cumulative variance percentage.
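Once the eigenvalues have been computed (for instance with the decomposition of equation 3.7), the percent-variance rule reduces to a cumulative sum. The helper below is a hypothetical illustration. Its 70% default target is the value used later for the pilot-plant model, not a general recommendation.

```python
import numpy as np


def n_components_for_variance(eigvals, target=0.70):
    """Smallest number of principal components whose cumulative share of the
    total variance (sum of all eigenvalues) reaches `target`."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(cumulative, target)) + 1   # first index with cumulative >= target
    return min(k, lam.size), cumulative


k, cum = n_components_for_variance([4.0, 3.0, 2.0, 1.0], target=0.75)
print(k)   # 3
```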

3.4.1.2 Kaiser-Guttman Criterion

This is perhaps the most widely used method for deciding how many principal components to retain. Principal components whose eigenvalues of the covariance matrix (computed from autoscaled data) are greater than one are retained. The rationale is that a component with an eigenvalue lower than one explains less variance than a single original standardized variable (28, 29). Many authors point out that the cutoff at one is an arbitrary decision for eigenvalues close to one. The method is also known to retain too many principal components. In figure 3.5 this method suggests retaining two principal components, because the first two eigenvalues are above one.
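Below is a minimal sketch of the Kaiser-Guttman rule. It assumes autoscaled data, so the covariance matrix is the correlation matrix. The synthetic example also shows why one is the natural cutoff: the eigenvalues of a correlation matrix sum to the number of variables, so an eigenvalue of one corresponds to the variance of a single standardized variable.

```python
import numpy as np


def kaiser_guttman(eigvals) -> int:
    """Number of principal components whose eigenvalue exceeds one."""
    return int(np.sum(np.asarray(eigvals, dtype=float) > 1.0))


# Hypothetical data: variables 1 and 2 are correlated with variable 0
rng = np.random.default_rng(2)
raw = rng.normal(size=(100, 5))
raw[:, 1] += raw[:, 0]
raw[:, 2] += raw[:, 0]
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)   # autoscale
lam = np.linalg.eigvalsh(Z.T @ Z / (Z.shape[0] - 1))    # correlation-matrix eigenvalues
print(np.round(np.sort(lam)[::-1], 2), "->", kaiser_guttman(lam), "retained")
```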

3.4.1.3 Cattell’s Scree Test

The method proposed by Cattell examines the plot of eigenvalues versus principal component number and looks for a scree shape at the bottom of the graph. The graph has two sections. In the first, the eigenvalues fall quickly; in the second, they decrease along something closer to a straight line. The break between the two sections suggests that the first section captures the linear relationships between the process variables and the second the noise and uncertainties of the process (27). However, this method does not give a clear definition of the break point between the principal components that carry information and the trivial ones. The scree test also tends to overestimate the number of components (29). In figure 3.5 this method suggests retaining three or four principal components, depending on where the break point is placed.

Figure 3.5: Eigenvalue vs. Principal Component - Graph to determine the number of principal components to retain.

3.4.2 Statistical Charts

One of the key requirements for developing an MSPC statistical chart is to have samples from periods in which the process operated within the product quality specifications. The success of these monitoring charts rests on the fact that many process variables are highly correlated. Linear combinations of the correlated variables can therefore be formed, and the dimensionality of the problem is reduced because the new variables explain the process (8, 30).

PCA statistical charts can detect whether a process is out of its control region, that is, whether it is faulty. The $T^2$ statistic measures the variation of a new sample within the PCA model. The $Q$ statistic measures how far the sample lies from the projection space of the PCA model.


3.4.2.1 Squared Prediction Error or Q Statistic Chart

The squared prediction error (SPE), or Q statistic, chart measures the distance between a new sample projected onto the PCA model and the projection space of the model. If the sample differs from the cases used to build the PCA model, it moves away from this plane (31).

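$Q$ is simply the sum of squares of each row (sample) of $\mathbf{E}$. For example, for the $i$th sample in $\mathbf{X}$, $\mathbf{x}_i$:

$$Q_i = \mathbf{e}_i\mathbf{e}_i^T = \mathbf{x}_i\left(\mathbf{I} - \mathbf{P}_k\mathbf{P}_k^T\right)\mathbf{x}_i^T \tag{3.9}$$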

where ei is the ith row of E, Pk is the matrix of the first k loadings vectors retained in the

PCA model (where each vector is a column of Pk) and I is the identity matrix of appropriate

size (n by n).

Confidence limits can be calculated for $Q$, provided that all of the eigenvalues of the covariance matrix of $\mathbf{X}$, the $\lambda_i$, have been obtained:

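$$Q_\alpha = \Theta_1\left[\frac{c_\alpha\sqrt{2\Theta_2 h_0^2}}{\Theta_1} + 1 + \frac{\Theta_2 h_0(h_0 - 1)}{\Theta_1^2}\right]^{1/h_0} \tag{3.10}$$

where

$$\Theta_i = \sum_{j=k+1}^{n}\lambda_j^i \quad \text{for } i = 1, 2, 3 \tag{3.11}$$

and

$$h_0 = 1 - \frac{2\Theta_1\Theta_3}{3\Theta_2^2} \tag{3.12}$$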

In equation 3.10, $c_\alpha$ is the standard normal deviate corresponding to the upper $(1-\alpha)$ percentile. In equation 3.11, $k$ is the number of principal components retained in the model and $n$ is the total number of principal components. Thus, $n$ is less than or equal to the smaller of the number of variables or samples in $\mathbf{X}$.
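The NumPy sketch below computes $Q_i$ with equation 3.9 and the confidence limit of equations 3.10 to 3.12. It is an illustration under stated assumptions:

• `q_statistic` and `q_limit` are hypothetical helpers;
• the data are synthetic;
• $c_\alpha$ is taken from Python's standard-library `statistics.NormalDist().inv_cdf`.

```python
import numpy as np
from statistics import NormalDist


def q_statistic(X: np.ndarray, P_k: np.ndarray) -> np.ndarray:
    """Q_i = x_i (I - P_k P_k^T) x_i^T for every row x_i of X (eq. 3.9)."""
    R = np.eye(P_k.shape[0]) - P_k @ P_k.T
    return np.einsum("ij,jk,ik->i", X, R, X)


def q_limit(eigvals, k: int, alpha: float = 0.05) -> float:
    """Q confidence limit of eqs. 3.10-3.12 from ALL eigenvalues of cov(X)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    discarded = np.clip(lam[k:], 0.0, None)            # lambda_{k+1..n}; clip round-off
    theta1, theta2, theta3 = (np.sum(discarded ** i) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)        # standard normal deviate
    bracket = (c_alpha * np.sqrt(2.0 * theta2 * h0 ** 2) / theta1
               + 1.0
               + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return float(theta1 * bracket ** (1.0 / h0))


# Hypothetical mean-centered data and a 2-component model
rng = np.random.default_rng(3)
raw = rng.normal(size=(80, 6))
raw[:, 1] += raw[:, 0]
X = raw - raw.mean(axis=0)
lam, V = np.linalg.eigh(X.T @ X / (X.shape[0] - 1))
order = np.argsort(lam)[::-1]
lam, P_k = lam[order], V[:, order[:2]]
Q = q_statistic(X, P_k)
limit = q_limit(lam, k=2, alpha=0.05)
print(f"Q limit: {limit:.2f}; samples above it: {int(np.sum(Q > limit))}")
```

On the calibration data, the Q values sum to $(m-1)\Theta_1$, so $\Theta_1$ is the average Q of the calibration samples.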

3.4.2.2 Hotelling’s T 2 Statistic Chart

Hotelling's $T^2$ statistic, the sum of normalized squared scores, measures the distance between the new sample and the mean position of the process. It is a measure of the variation of each sample within the PCA model.


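$$T_i^2 = \mathbf{t}_i\boldsymbol{\lambda}^{-1}\mathbf{t}_i^T = \mathbf{x}_i\mathbf{P}_k\boldsymbol{\lambda}^{-1}\mathbf{P}_k^T\mathbf{x}_i^T \tag{3.13}$$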

where ti in this instance refers to the ith row of Tk, the matrix of k scores vectors from the

PCA model. The matrix λ−1 is a diagonal matrix containing the inverse eigenvalues associated

with the k eigenvectors (principal components) retained in the model.

Statistical confidence limits for $T^2$ can be calculated by means of the F-distribution as follows:

$$T^2_{k,m,\alpha} = \frac{k(m-1)}{m-k}F_{k,m-k,\alpha} \tag{3.14}$$

here m is the number of samples used to develop the PCA model and k is the number of

principal component vectors retained in the model.
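A sketch of equations 3.13 and 3.14 follows. To keep dependencies light, the hypothetical helper `t2_limit` takes the F quantile $F_{k,m-k,\alpha}$ as an argument. With SciPy installed, this quantile could be obtained as `scipy.stats.f.ppf(1 - alpha, k, m - k)`. On the calibration data the average $T^2$ equals $k(m-1)/m$, which gives a quick consistency check.

```python
import numpy as np


def t2_statistic(X: np.ndarray, P_k: np.ndarray, eigvals_k) -> np.ndarray:
    """T^2_i = t_i diag(lambda)^{-1} t_i^T with t_i = x_i P_k (eq. 3.13)."""
    T = X @ P_k                                        # scores of each row
    return np.sum(T ** 2 / np.asarray(eigvals_k, dtype=float), axis=1)


def t2_limit(k: int, m: int, f_crit: float) -> float:
    """Eq. 3.14, where f_crit is the F_{k, m-k, alpha} quantile supplied by the caller."""
    return k * (m - 1) / (m - k) * f_crit


# Hypothetical mean-centered calibration data and a 2-component model
rng = np.random.default_rng(4)
raw = rng.normal(size=(60, 5))
raw[:, 2] += raw[:, 0]
X = raw - raw.mean(axis=0)
m, k = X.shape[0], 2
lam, V = np.linalg.eigh(X.T @ X / (m - 1))
order = np.argsort(lam)[::-1][:k]
T2 = t2_statistic(X, V[:, order], lam[order])
print("mean T2:", round(float(T2.mean()), 3))
```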

3.4.3 Schematic Interpretation of PCA

Figure 3.6 presents a PCA interpretation of process variables, showing the deviations of five variables from their nominal trajectories. The variables x1, x3 and x4 show approximately the same pattern (the x4 variable has some peaks, probably outliers). The variables x2 and x5 also share approximately the same pattern. Therefore, x1, x3 and x4 are highly correlated and can be summarized by a new variable, the principal component t1. Likewise, the highly correlated x2 and x5 can be summarized by a second principal component, t2. In this example, the first principal component corresponds to the largest group of correlated variables and the second principal component to the next largest group.

Figure 3.6: A Simplified Representation of PCA - Source (30)

A PCA model is presented in figure 3.7. The first and second principal components are the blue lines. The samples that are unusual in terms of $T^2$ and $Q$ are the red circles, and the samples within control are the green circles.


Figure 3.7: PCA Model Schematic - Source (26)

3.4.4 Contribution Plots

The statistical charts do not indicate which process variables caused a process to be faulty. Contribution plots provide this information by showing how the individual variables contribute to the monitored statistics. In a faulty process, the contribution plot is used to identify the process variables that caused the low quality of the released product, namely those with the largest contribution magnitude (31, 32, 33). The most common indices used for fault diagnosis with contribution plots are $T^2$ and $Q$.

If the contribution of a particular score variable to the $T^2$ statistic is abnormally large, the individual contribution of the $j$th process variable to the $i$th score variable, $c(t_i)_j$, can be determined as follows:

$$c(t_i)_j = \frac{p_{ij}x_j\,\mathbf{p}_i^T\mathbf{x}}{\lambda_i} = \frac{p_{ij}x_j t_i}{\lambda_i} \tag{3.15}$$

where:

• $t_i$ and $\lambda_i$ are the value and the variance of the $i$th score variable, respectively;
• $\mathbf{p}_i$ is the $i$th column vector of $\mathbf{P}$;
• $p_{ij}$ is the $j$th element of $\mathbf{p}_i$ (the element in the $j$th row and $i$th column of $\mathbf{P}$);
• $\mathbf{x}$ is the current data vector and $x_j$ is the value of the $j$th process variable.

The contribution of the $j$th variable to the $Q$ statistic can be obtained as follows:

$$c(Q)_j = \boldsymbol{\Phi}_j^T\mathbf{x} \tag{3.16}$$

where $\boldsymbol{\Phi}_j^T$ is the $j$th row of the matrix $\mathbf{I}_{N+M} - \mathbf{P}\mathbf{P}^T$, and $\mathbf{I}_{N+M}$ is an identity matrix whose size, N + M, equals the dimension of $\mathbf{x}$.
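To illustrate equations 3.15 and 3.16, the sketch below computes both kinds of contributions for one new sample. The sample comes from a synthetic five-variable process, with a hypothetical fault injected on variable 3. The sketch assumes mean-centered data and takes $\mathbf{P}$ in equation 3.16 to be the matrix of retained loadings $\mathbf{P}_k$; the helper names are illustrative. Summing the $T^2$ contributions recovers $T^2$, and summing the squared $Q$ contributions recovers $Q$.

```python
import numpy as np


def t2_contributions(x, P_k, eigvals_k):
    """Eq. 3.15 for every retained score: entry [i, j] is
    c(t_i)_j = p_ij * x_j * t_i / lambda_i, where p_ij is the j-th element
    of loading vector p_i (column i of P_k)."""
    t = x @ P_k                                   # t_i = p_i^T x
    lam = np.asarray(eigvals_k, dtype=float)
    return (P_k.T * x) * (t / lam)[:, None]


def q_contributions(x, P_k):
    """Eq. 3.16: c(Q)_j = Phi_j^T x with Phi = I - P_k P_k^T, i.e. the residual
    of each variable; squaring and summing the entries gives Q."""
    Phi = np.eye(P_k.shape[0]) - P_k @ P_k.T
    return Phi @ x


rng = np.random.default_rng(5)


def sample(n):
    """Hypothetical process: variables 0-2 correlated, variables 3-4 correlated."""
    z = rng.normal(size=(n, 5))
    z[:, 1] += z[:, 0]
    z[:, 2] += z[:, 0]
    z[:, 4] += 2.0 * z[:, 3]
    return z


train = sample(100)
mean = train.mean(axis=0)
X = train - mean
lam, V = np.linalg.eigh(X.T @ X / (X.shape[0] - 1))
order = np.argsort(lam)[::-1][:2]
P_k, lam_k = V[:, order], lam[order]

x_new = sample(1)[0] - mean
x_new[3] += 6.0                                   # hypothetical fault on variable 3
cq = q_contributions(x_new, P_k)
ct = t2_contributions(x_new, P_k, lam_k)
print("largest Q contribution: variable", int(np.argmax(cq ** 2)))
```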

3.5 Unfold-PCA for Batch Processes

Principal component analysis is an MSPC technique that identifies patterns in process data through the correlation between variables. PCA reduces the large number of process variables by creating new variables that are linear combinations of the correlated variables (8). PCA is applied to continuous processes, where the measured data are arranged in a 2D matrix. The rows of this matrix represent time and the columns represent the different variables. Batch processes finish in a finite time, and the data measured from them are arranged in a 3D matrix (figure 3.8).

Figure 3.8: 3D matrix data. - Measured batch process data arranged as a 3D matrix.

Unfold principal component analysis converts the 3D matrix of a batch process into a 2D matrix that can be treated with PCA. The technique was developed by Nomikos and MacGregor as multiway principal component analysis (MPCA) (9) and later also became known as U-PCA. Batch-wise unfolding turns the 3D matrix (I × J × K) into a 2D matrix (I × JK), where:

• i = 1, 2, ..., I are the processed batches;
• j = 1, 2, ..., J are the process variables;
• k = 1, 2, ..., K are the time instants over the duration of the process.

The columns of the resulting matrix are mean-centered and scaled to unit variance (figure 3.9).
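Batch-wise unfolding amounts to a reshape followed by column-wise autoscaling. The sketch assumes the 3D data are stored as a NumPy array of shape (I, J, K). The helper names and the random data are hypothetical.

```python
import numpy as np


def unfold_batchwise(X3: np.ndarray) -> np.ndarray:
    """Batch-wise unfolding: (I batches, J variables, K times) -> (I, J*K).
    With NumPy's row-major reshape, column j*K + k holds variable j at time k."""
    I, J, K = X3.shape
    return X3.reshape(I, J * K)


def autoscale(Xu: np.ndarray):
    """Mean-center each column and scale it to unit variance."""
    mean = Xu.mean(axis=0)
    std = Xu.std(axis=0, ddof=1)
    std = np.where(std > 0, std, 1.0)       # leave constant columns unscaled
    return (Xu - mean) / std, mean, std


rng = np.random.default_rng(6)
X3 = rng.normal(size=(10, 3, 5))            # hypothetical: 10 batches, 3 variables, 5 times
Xu = unfold_batchwise(X3)
Z, mean, std = autoscale(Xu)
print(Xu.shape)                             # (10, 15)
```

The column order is a layout choice. `X3.transpose(0, 2, 1).reshape(I, K * J)` would instead place all J variables of each time instant side by side. Column-wise autoscaling and PCA are unaffected by a permutation of columns, so both layouts give the same scores and eigenvalues.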

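In U-PCA the array $\mathbf{X}$ is decomposed as the sum of the products of score vectors ($\mathbf{t}$) and loading matrices ($\mathbf{P}$), plus a residual array $\mathbf{E}$ that is minimized in a least-squares sense:

$$\mathbf{X} = \sum_{r=1}^{R}\mathbf{t}_r \otimes \mathbf{P}_r + \mathbf{E} \tag{3.17}$$

where $R$ is the number of principal components retained in the model.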


Figure 3.9: Batch wise unfold. - Unfolding a 3D matrix into a 2D matrix.

U-PCA is statistically and algorithmically consistent with PCA. Therefore, the following apply to U-PCA with the same theory and methodology:

• the methods for choosing the principal components to retain (section 3.4.1);
• the statistical charts (section 3.4.2);
• the contribution plots (section 3.4.4).

3.5.1 Applications

In the chemical industry, batch and semi-batch processes are in great demand because of the high-quality products they deliver. They are used in reactors, crystallization, distillation, injection molding, polymer manufacturing and other chemical industries. A characteristic of batch processes is that materials are processed in a prescribed sequence for a finite duration. The goal is to reproduce the prescribed recipe from batch to batch (13).

MSPC based on U-PCA has used its statistical charts successfully to analyze industrial data. Applications include monitoring real-time processes, monitoring completed batches and on-line monitoring (34).

Several companies use this methodology as a real-time release monitoring system for their products. When a batch is finished, the data obtained from the process are passed through the PCA model to check whether the batch is within the control limits. If the product is beyond the control limits, it is sent to the laboratory for a diagnosis of the problem. This procedure saves companies both money and time:

• money, because the laboratory analysis of a product takes a few hours, and if a faulty batch went unnoticed during that time, the batches run in the meantime could suffer from the same problem;
• time, because the process does not need to wait for the laboratory analysis to know whether the product is of high quality (30).

Batch process monitoring first requires a set of historical data from batches run under normal operating conditions (NOC). A preliminary PCA model is built with the NOC batches. The analysis of this preliminary model may reveal batches that present disturbances or do not belong to the NOC set. Those batches are considered outliers and removed from the set. After the outliers are removed, a new PCA model is built, and new batches are projected onto it to test the consistency of the model.

For testing, a historical batch in abnormal operating condition (AOC) and a NOC batch are projected onto the NOC PCA model. After projection, the $Q$ and $T^2$ statistics of both batches are calculated and compared with the corresponding statistical limits of the model. If the NOC PCA model is correct, the NOC test batch will fall below the statistical limits and the AOC test batch above them. To diagnose the AOC test batch, contribution plots can be calculated to find the cause of the abnormal behavior (35, 36).
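The testing step above can be sketched end to end on synthetic data. Every name and number below is hypothetical:

• the NOC batches share a common profile, with a random gain per batch plus noise;
• the model keeps two components;
• only the Q statistic, with the limit of equations 3.10 to 3.12, is checked;
• the outlier-removal pass is omitted for brevity.

The AOC batch receives an injected drift on variable 1. Summing its squared residuals per variable block then acts as a simple contribution plot.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
J, K = 3, 15                                    # hypothetical: 3 variables, 15 time instants
time = np.linspace(0.0, 1.0, K)
profile = np.stack([np.sin(np.pi * time), time, np.exp(-3.0 * time)])   # (J, K)


def make_batches(n):
    """Synthetic NOC batches: a common profile, a random gain per batch, and noise."""
    gain = 1.0 + 0.1 * rng.normal(size=(n, 1, 1))
    return gain * profile + 0.02 * rng.normal(size=(n, J, K))


def fit_model(Xu, k):
    """Autoscale the unfolded NOC data and keep k principal components."""
    mean, std = Xu.mean(axis=0), Xu.std(axis=0, ddof=1)
    Z = (Xu - mean) / std
    lam, V = np.linalg.eigh(Z.T @ Z / (Z.shape[0] - 1))
    order = np.argsort(lam)[::-1]
    return {"mean": mean, "std": std, "lam": lam[order], "P": V[:, order[:k]], "k": k}


def residuals(model, Xu):
    """Residual matrix E of unfolded batches, scaled with the NOC mean and std."""
    Z = (Xu - model["mean"]) / model["std"]
    return Z - Z @ model["P"] @ model["P"].T


def q_limit(lam, k, alpha=0.05):
    """Eqs. 3.10-3.12."""
    d = np.clip(lam[k:], 0.0, None)
    th1, th2, th3 = (np.sum(d ** i) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    c = NormalDist().inv_cdf(1.0 - alpha)
    return th1 * (c * np.sqrt(2.0 * th2 * h0 ** 2) / th1 + 1.0
                  + th2 * h0 * (h0 - 1.0) / th1 ** 2) ** (1.0 / h0)


noc = make_batches(60).reshape(60, J * K)       # batch-wise unfolded NOC set
model = fit_model(noc, k=2)
limit = q_limit(model["lam"], model["k"])

noc_test = make_batches(1)
aoc_test = make_batches(1)
aoc_test[0, 1, K // 2:] += 0.5                  # hypothetical fault: drift in variable 1

for name, batch in [("NOC test", noc_test), ("AOC test", aoc_test)]:
    q = float(np.sum(residuals(model, batch.reshape(1, J * K)) ** 2))
    print(f"{name}: Q = {q:.1f} (limit {limit:.1f}) -> {'faulty' if q > limit else 'normal'}")

# Diagnose the AOC batch: sum its squared residuals over each variable's block of columns
E_aoc = residuals(model, aoc_test.reshape(1, J * K))
block_contrib = (E_aoc ** 2).reshape(J, K).sum(axis=1)
print("Q contribution per variable:", np.round(block_contrib, 1))
```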

SBR processes are widely used to treat wastewater with high concentrations of nutrients (nitrogen, phosphorus) and toxic compounds from domestic and industrial sources. Variations in the concentration of the influent wastewater can lead to a low-quality effluent, because those changes directly affect the biological reactions of the process. Early fault detection is therefore needed to correct the biological process, since such processes may take a few days to recover from an abnormal state (37).

Studies related to MSPC, PCA and SBR wastewater treatment plants can be found in (38, 39, 40, 41).

3.6 Conclusions

This chapter presented the evolution of the techniques for monitoring industrial processes. The Shewhart control chart was the first SPC technique used to control the behavior of a process and reduce the number of low-quality products, and it was widely implemented in industry. Almost three decades later, Hotelling proposed multivariate process control, whose control charts take into account the correlation between the process variables. Principal component analysis, a method developed around 1900, is one of the techniques used for monitoring, diagnosis and control in today's industries, where a large number of variables need to be controlled. Several PCA tools have given good results in industry:

• the methods for retaining principal components when building a PCA model;
• the recognition of outliers;
• the detection of faulty products with the statistical charts;
• diagnosis with contribution plots.

Unfold-PCA, a technique mathematically and algorithmically consistent with PCA, has been applied to batch processes in the chemical industry. There it has given good results in detecting and diagnosing faulty batches and in reducing the loss of raw materials and low-quality products.

Diagnosis with contribution plots is an area of PCA that has not been widely studied. In a small process, a process expert can probably diagnose a faulty batch by inspecting by eye the contributions of the variables throughout the process. Batch processes, however, are highly nonlinear. It is therefore difficult to find an expert who can read the contributions and relate the contributions of the measured variables to the quality variables. New methods need to be developed to improve the diagnosis.


Chapter 4

Pilot Plant Description and Statistical Modelling

4.1 Pilot Plant

The pilot plant is an SBR for wastewater treatment capable of removing organic matter (C), ammonium (NH₄⁺), nitrite or nitrate (NO₂⁻ or NO₃⁻) and phosphate (PO₄³⁻) (figure 4.1).

In continuous systems, reaction and settling occur in different reactors. In the SBR, all the processes are conducted in a single reactor following a sequence of stages: fill, reaction, settling and draw. The stages of the batch configuration depend on the wastewater characteristics and the legal requirements (11).

Figure 4.1: Pilot Plant - Sequencing batch reactor for wastewater treatment.

The pilot plant is located in the LEQUIA laboratory at the University of Girona (Catalonia, Spain). The maximum capacity of the SBR is 30 liters. The influent wastewater is synthetic: a blend of a carbon source, an ammonium solution, a phosphate buffer, alkalinity control and a microelements solution. The influent is stored in a 150-liter storage tank kept at 4 °C to minimize microbial activity. The reactor is located in a thermo-regulated room at 20 °C.

To monitor essential variables, the SBR is equipped with pH (EPH-M10), dissolved oxygen (DO) (WTW OXI 340), oxidation reduction potential (ORP) (ORP M10) and temperature (Temp) (PT 100) Endress-Hauser probes.

The SBR cycle consists of four sections: biological reaction, wastage, settling and drawing. This study focuses on the biological reaction, which consists of six stages: first fill (F1), anaerobic condition (ANA), first aerobic condition (AE1), second fill (F2), anoxic condition (ANO) and second aerobic condition (AE2).

4.2 Analysis of Historical Data

The historical data from the SBR process consist of 266 batch cases. Each case is associated with its biological nutrient removal (BNR) and global quality removal (GQR) for the processed wastewater, as provided by the chemical laboratory (table 4.1). The quality specifications follow the European Community Council Directive (11) (table 4.2). Additional information from off-line analysis is provided in (42).

| Batches | C | NH₄⁺ | NO₂⁻ or NO₃⁻ | PO₄³⁻ | Global Quality Removal |
|---|---|---|---|---|---|
| 93 | X | X | X | X | High |
| 58 | X | × | X | X | Medium |
| 24 | X | X | X | × | Medium |
| 91 | X | X | • | × | Low |

Table 4.1: Chemical analysis of BNR (columns C to PO₄³⁻) and GQR.

X = high quality removal. • = medium quality removal. × = low quality removal.

The stages of the biological reaction last 10 minutes for F1, 150 minutes for ANA, 100 minutes for AE1, 11 minutes for F2, 75 minutes for ANO and 78 minutes for AE2, i.e. 424 minutes in total. The process data are sampled once per minute. Since four sensors measure the process, the 3D matrix has:

• 266 batches along the I axis;
• 4 variables along the J axis;
• 424 time instants along the K axis.

See the 3D matrix data in figure 3.8.
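The dimensions quoted above follow from simple arithmetic, sketched below. The dictionary of stage durations restates the text. The stage-to-column mapping assumes sampling starts at minute 0 of F1, which is a hypothetical convention.

```python
from itertools import accumulate

# Stage durations of the biological reaction in minutes (from the text)
stage_minutes = {"F1": 10, "ANA": 150, "AE1": 100, "F2": 11, "ANO": 75, "AE2": 78}

K = sum(stage_minutes.values())        # one sample per minute -> number of time instants
I, J = 266, 4                          # batches and sensors (pH, DO, ORP, Temp)

# Hypothetical [start, end) time-index range of each stage
ends = list(accumulate(stage_minutes.values()))
starts = [0] + ends[:-1]
stage_columns = {name: (s, e) for name, s, e in zip(stage_minutes, starts, ends)}

print(K)                        # 424
print((I, J * K))               # (266, 1696), the batch-wise unfolded matrix
print(stage_columns["AE2"])     # (346, 424)
```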


| Biological Nutrient Removal | C (mgCOD/L) | NH₄⁺ (mgN/L) | NO₂⁻ or NO₃⁻ (mgN/L) | PO₄³⁻ (mgP/L) |
|---|---|---|---|---|
| High | < 84 | < 6.7 | < 3.3 | < 0.9 |
| Medium | 84-125 | 6.7-10 | 3.3-5 | 0.9-2 |
| Low | > 125 | > 10 | > 5 | > 2 |

Table 4.2: Standard Levels for Nutrient Removal.
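Table 4.2 can be read as a simple lookup. The sketch below classifies one effluent concentration at a time, and its dictionary keys are hypothetical labels. The text does not specify how the per-nutrient classes combine into the global quality removal of Table 4.1, so the sketch does not attempt that step.

```python
# Hypothetical keys; thresholds (high_below, low_above) copied from Table 4.2
LIMITS = {
    "C": (84.0, 125.0),    # mgCOD/L
    "NH4": (6.7, 10.0),    # mgN/L
    "NOx": (3.3, 5.0),     # mgN/L, NO2- or NO3-
    "PO4": (0.9, 2.0),     # mgP/L
}


def removal_level(nutrient: str, concentration: float) -> str:
    """Map an effluent concentration to the High/Medium/Low removal class of
    Table 4.2. Boundary values count as Medium, since the table uses strict
    inequalities for High and Low."""
    high_below, low_above = LIMITS[nutrient]
    if concentration < high_below:
        return "High"
    if concentration > low_above:
        return "Low"
    return "Medium"


print(removal_level("C", 90.0), removal_level("PO4", 0.5))   # Medium High
```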

4.3 PCA Model

The 93 high-GQR cases were analyzed to build a PCA model that can detect the medium- and low-GQR cases. The target cumulative explained variance was 70% or higher, and the raw data were unfolded batch-wise (figure 3.9). The unfolded data were pre-processed with the block/group scaling method (32), and 9 cases were considered outliers (20) (figure 4.2). The resulting PCA model is composed of 84 high-GQR batches (figure 4.3). It retains three principal components, which explain 72.81% of the cumulative variance (figure 4.4).
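This section does not give the exact formula of the block/group scaling method in (32). The sketch below assumes one plausible form:

• each column is mean-centered;
• all columns belonging to one variable share a single scale factor, the root mean square of their standard deviations.

Unlike autoscaling, this keeps the relative variation between time instants within a variable, while giving every variable block the same total variance. The column layout and names are hypothetical.

```python
import numpy as np


def block_scale(Xu: np.ndarray, J: int, K: int):
    """Hypothetical block/group scaling for a batch-wise unfolded matrix whose
    column j*K + k holds variable j at time k. Columns are mean-centered; all
    columns of variable j are then divided by one factor, the root mean square
    of that block's column standard deviations, so every block ends up with
    the same total variance (K)."""
    mean = Xu.mean(axis=0)
    Xc = Xu - mean
    scale = np.empty(J * K)
    for j in range(J):
        cols = slice(j * K, (j + 1) * K)
        block_var = Xc[:, cols].var(axis=0, ddof=1).mean()
        scale[cols] = np.sqrt(block_var) if block_var > 0 else 1.0
    return Xc / scale, mean, scale


# Hypothetical data: 4 variables measured on very different scales, 6 time instants each
rng = np.random.default_rng(7)
J, K = 4, 6
Xu = rng.normal(size=(20, J * K)) * np.repeat([1.0, 5.0, 0.1, 50.0], K)
Z, mean, scale = block_scale(Xu, J, K)
print(np.round(Z.var(axis=0, ddof=1).reshape(J, K).sum(axis=1), 6))
```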

Figure 4.2: Pre-process 93 High GQR Batches - Block/group scaling of the 93 high GQR batches. The black circles show some outliers.

Figure 4.3: Pre-process 84 High GQR Batches - Block/group scaling of the 84 high GQR batches without the 9 outliers.

Figure 4.4: PCA Model 84 Batches - Model with 84 GQR batches and three principal components.

4.4 Statistical Chart

The statistical charts are used to detect faulty batches. To determine whether a released batch is faulty, it is first projected onto the Q statistic chart. If the batch is detected as faulty there, it does not need to be projected onto the T² statistic chart. If it lies within the confidence limits of the Q statistic, it is then projected onto the T² statistic chart: if it also lies within the confidence limits of the T² statistic, the batch meets the requirements; otherwise, it is faulty.
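The decision logic of this paragraph can be sketched as follows, assuming NumPy, a model given by an orthonormal loading matrix `P` and score variances `lam` (as returned by a standard PCA fit), and confidence limits `q_lim` and `t2_lim` that have already been computed; the text does not say how they are obtained, so here they are plain inputs. Q is taken as the squared prediction error and T² as Hotelling's statistic on the retained scores. The function names and the toy three-column model are hypothetical.

```python
import numpy as np

def q_and_t2(x, P, lam):
    """Q (squared prediction error) and Hotelling T2 of one scaled, unfolded batch."""
    t = x @ P                      # scores on the retained components
    e = x - t @ P.T                # residual outside the PCA subspace
    return float(e @ e), float(np.sum(t**2 / lam))

def check_batch(x, P, lam, q_lim, t2_lim):
    """Q chart first; the T2 chart is consulted only if the batch passes Q."""
    q, t2 = q_and_t2(x, P, lam)
    if q > q_lim:
        return "faulty (Q)"
    if t2 > t2_lim:
        return "faulty (T2)"
    return "within requirements"

P = np.eye(3)[:, :2]               # toy model: 3 columns, 2 components
lam = np.array([1.0, 1.0])
print(check_batch(np.array([0.5, 0.5, 0.1]), P, lam, q_lim=0.05, t2_lim=5.0))  # within requirements
print(check_batch(np.array([0.0, 0.0, 1.0]), P, lam, q_lim=0.05, t2_lim=5.0))  # faulty (Q)
print(check_batch(np.array([3.0, 3.0, 0.0]), P, lam, q_lim=0.05, t2_lim=5.0))  # faulty (T2)
```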

The first task is to check whether the Q statistic chart of the 84-batch high-GQR PCA model can detect the medium- and low-GQR batches. If the Q statistic detects both groups, the next task is to look for any hint indicating to which group a faulty batch belongs.

In figures 4.5 and 4.6, the gray circles are the 84 high-GQR batches used to build the PCA model, and the inverted triangles are the 82 medium-GQR batches (figure 4.5) and the 91 low-GQR batches (figure 4.6). In both figures, all the medium- and low-GQR batches lie above the confidence limit of the Q statistic, so all of them are detected as faulty.

Figure 4.5: Q Statistic Chart for Medium GQR - 82 medium GQR batches projected in

the Q statistic chart of the 84 high GQR PCA model.

Figure 4.6: Q Statistic Chart for Low GQR - 91 low GQR batches projected in the Q statistic chart of the 84 high GQR PCA model.

The medium- and low-GQR batches were then projected together onto the Q statistic chart to look for any pattern or hint that could identify the GQR group of a batch (figure 4.7).

Figure 4.7: Q Statistic Chart for Medium and Low GQR Batches - 82 medium GQR and 91 low GQR batches projected in the Q statistic chart of the 84 high GQR PCA model.

In figure 4.7 all the batches lie above the confidence limit of the Q statistic chart, as in figures 4.5 and 4.6, but the batches of both groups are spread over virtually every position along the Q residual axis. Further analysis is therefore needed to determine whether a faulty batch belongs to the medium- or the low-GQR group.

4.5 Contribution Plots

To diagnose a batch, its contribution plot is calculated. The contribution plot shows which variable or variables caused the fault: variables whose contributions are much larger in magnitude than the others are the most likely causes of the failure.
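A minimal sketch of Q contributions for a batch-wise unfolded batch, assuming NumPy and the common definition in which the contribution of variable j at time k is the signed residual e_jk, so that the squared contributions add up to Q; the report's exact definition is not restated here. The random loadings and batch are stand-ins, and `q_contributions` is a hypothetical name.

```python
import numpy as np

def q_contributions(x, P, J, K):
    """Signed Q contributions (residuals) of one unfolded batch, as a J x K array."""
    e = x - (x @ P) @ P.T
    return e.reshape(J, K)         # row j = variable j over its K time instants

rng = np.random.default_rng(1)
J, K, A = 4, 424, 3
P, _ = np.linalg.qr(rng.normal(size=(J * K, A)))   # orthonormal stand-in loadings
x = rng.normal(size=J * K)                         # stand-in scaled batch

C = q_contributions(x, P, J, K)
per_variable = (C**2).sum(axis=1)     # squared contributions per variable
q = float(np.sum((x - x @ P @ P.T) ** 2))
print(np.isclose(per_variable.sum(), q))           # True
print(int(np.argmax(per_variable)))                # variable a naive reading would blame
```

The last line is exactly the "highest magnitude" reading discussed next, which the text argues can be misleading on its own.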

Figure 4.8: Q Contribution Plot - Q contribution plot of a low GQR batch

In figure 4.8 the contributions with the highest magnitude are those of pH in the ANO stage and of DO in the AE1 and AE2 stages; these variables are therefore investigated, as they are probably the ones that contribute most to the fault. However, if the diagnosis relies only on the variables with the highest magnitude, wrong corrective actions may be taken, driving the process from a faulty state to an even worse one.

The issue lies in how the variables are expected to contribute to the process. In the Q contribution plot of figure 4.8, the variables with the highest magnitude may simply be contributing in the way they normally do when the process meets the requirements.


4.6 Conclusions

This chapter described the pilot plant, a sequencing batch reactor for wastewater treatment. The batch process removes critical nutrients from the wastewater: organic matter, ammonium, nitrates and phosphate. The PCA model was built from 93 historical batches with normal behavior, 9 of which were treated as outliers in the training phase. With the Q statistic chart, the resulting 84-batch PCA model detected all 173 abnormal historical batches. The contribution plot of a faulty batch suggests that the pH variable in the ANO stage and the DO variable in the AE1 and AE2 stages are probably the portions of the process that make it faulty. By visual inspection this assumption is probably right, but a method or tool is needed to confirm it.


Chapter 5

New Methodology for Intelligent Contribution Analysis

When a process is faulty, it is important to understand its behavior and which factors were responsible for the low-quality product. When many factors are involved in a faulty process, classifying the type of failure can be difficult. Fault diagnosis of batch processes has been widely studied to prevent failures in the released product, in some cases introducing process misbehavior deliberately to obtain simulation and prediction results (43).

In recent years, techniques for fault detection and diagnosis in batch processes have been widely used as real-time tools to prevent further releases of low-quality products. Previous studies have proposed analysis techniques to monitor an SBR for wastewater treatment (39, 40, 44), but these works focus mainly on fault detection. In addition, systems capable of estimating the quality variables of the process have been developed using artificial neural networks (43, 45), in some cases combined with PCA (46, 47).

The experiments in the studies on estimating the quality variables of the released product used highly controlled influent mixtures and, in a few cases, sensors for the quality variables. In this study, purchasing expensive sensors to measure the quality variables was ruled out because of the limited budget. Consequently, the techniques used in those studies could not be reproduced here as presented, because there was not enough well-controlled data. A key reason for not using controlled data was to resemble the behavior of real wastewater, where the influent always varies with factors such as weather conditions, industrial wastewater, urban wastewater and other issues that affect the influent.

Since small project budgets restrict the types of experiments that can be carried out, new ideas are needed to obtain the required knowledge of the process. The method proposed in this study creates a fault signature (FS) to predict the diagnosis of the quality variables for faulty batches. The FS represents the behavior of each variable in each stage, using the information gathered on how the variables contribute to the process. To obtain this stage-wise behavior, a contribution limit chart is developed.

Since the objective of this study is to predict the diagnosis of faulty released batches, a dataset of historical faulty batches is needed to associate their behavior with that of the batches to be diagnosed. Therefore, a fault signature dataset (FSD) must be built, containing the FS of each faulty batch together with the chemical analysis of its quality variables. Different classification and rule induction algorithms are then applied to the FSD to extract knowledge from it.

5.1 Contribution Limit Chart

The contribution limit chart is developed to compare contribution plots against thresholds for the contribution of the variables. Its objective is to better detect the variables that cause the process to be faulty.

For every time instant, the mean and standard deviation of the contributions of the PCA model batches are calculated. The upper contribution limit (UCL) and lower contribution limit (LCL) of the contribution limit chart are then the mean plus and minus three times the standard deviation (equations 5.1 and 5.2):

$$\mathrm{UCL}(y) = m_y + 3\,\mathrm{std}_y \tag{5.1}$$

$$\mathrm{LCL}(y) = m_y - 3\,\mathrm{std}_y \tag{5.2}$$

where y denotes the T² or Q contributions for which the limit is built, m_y is their mean and std_y is their standard deviation.
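Equations 5.1 and 5.2 applied at every point of a J x K contribution array might look like the sketch below. It assumes NumPy, contributions of the model batches stacked as an I x J x K array, and the sample standard deviation (`ddof=1`), which the text does not specify. The function names are hypothetical and the random arrays are placeholders.

```python
import numpy as np

def contribution_limits(C_model, n_std=3.0):
    """C_model: contributions of the PCA-model batches, shape (I, J, K).

    Returns (LCL, UCL), each J x K: mean -/+ n_std standard deviations at
    every variable and time instant."""
    m = C_model.mean(axis=0)
    s = C_model.std(axis=0, ddof=1)
    return m - n_std * s, m + n_std * s

def outside_limits(C_batch, lcl, ucl):
    """Boolean J x K mask of the instants where a batch leaves the band."""
    return (C_batch > ucl) | (C_batch < lcl)

rng = np.random.default_rng(2)
C_model = rng.normal(size=(70, 4, 424))   # stand-in contributions of 70 model batches
lcl, ucl = contribution_limits(C_model)
C_new = rng.normal(size=(4, 424))
C_new[1, 300:320] += 10.0                 # inject a large shift in variable 1
mask = outside_limits(C_new, lcl, ucl)
print(mask.shape, bool(mask[1, 300:320].all()))   # (4, 424) True
```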


5.2 Improving the Contribution Limit Chart

The following methods were proposed to perform a more reliable diagnosis. When the faulty batches were analyzed with the contribution limit chart, there were situations in which a single contribution in some stage had a very high magnitude, probably caused by faulty sensor measurements, electrical disturbances or any other incident affecting the process measurements. Despite the effort invested in the following methods, the results were not promising enough to pursue their implementation further in this study.

5.2.1 Modified Cumulative Sum

Cumulative sum (CUSUM) charts are used in SPC to detect small shifts in the mean of a continuous process (48). The deviation of each sample from the mean, measured in standard deviations, is plotted in the CUSUM chart to show how the samples drift away from the mean, and these deviations are accumulated over the previous samples together with the new one. A set of rules raises warnings when the accumulated deviation rises above or falls below a threshold around the mean, indicating that the process is likely to shift or to release faulty products.
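For reference, the classical two-sided tabular CUSUM on standardized deviations can be sketched as below. The reference value `k` and decision interval `h` (in units of the standard deviation) are set to 0.5 and 5, commonly quoted textbook choices rather than values from this study, and the noise-free shifted series is only a toy input.

```python
import numpy as np

def tabular_cusum(x, mu0, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on z = (x - mu0) / sigma.

    Returns the upper and lower cumulative sums and the indices where
    either exceeds the decision interval h."""
    z = (np.asarray(x, dtype=float) - mu0) / sigma
    c_plus = np.zeros_like(z)
    c_minus = np.zeros_like(z)
    hi = lo = 0.0
    for i, zi in enumerate(z):
        hi = max(0.0, zi - k + hi)     # accumulates upward drift beyond k
        lo = max(0.0, -zi - k + lo)    # accumulates downward drift beyond k
        c_plus[i], c_minus[i] = hi, lo
    alarms = np.flatnonzero((c_plus > h) | (c_minus > h))
    return c_plus, c_minus, alarms

# A +1 sigma mean shift from sample 10 onwards, without noise for clarity.
x = np.r_[np.zeros(10), np.ones(20)]
_, _, alarms = tabular_cusum(x, mu0=0.0, sigma=1.0)
print(alarms[0])   # 20
```

Each shifted sample adds z - k = 0.5 to the upper sum, so it first exceeds h = 5 at the eleventh shifted sample, index 20.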

The proposal was to combine the CUSUM with the limits of the contribution limit chart. In this modified CUSUM, the contribution limit acts as the mean value of the process, and the quantity summed is the number of standard deviations by which a contribution exceeds the contribution limit. The summation is performed for each stage of each variable, and the resulting values are used to diagnose the batches.
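A sketch of one possible reading of this modified CUSUM, assuming NumPy: at each instant, the distance beyond the violated contribution limit is expressed in standard deviations (zero inside the band), and these values are summed over each stage of one variable. The exact definition used in the study may differ; `stage_excess_sums` and the stage order are assumptions.

```python
import numpy as np

# Assumed stage order and durations (samples) of the SBR cycle.
STAGES = {"F1": 10, "ANA": 150, "AE1": 100, "F2": 11, "ANO": 75, "AE2": 78}

def stage_excess_sums(c, lcl, ucl, std, stages=STAGES):
    """c, lcl, ucl, std: 1-D arrays over the K instants of ONE variable.

    Excess beyond the violated limit, in standard deviations (0 inside the
    band), summed within each stage."""
    excess = np.where(c > ucl, (c - ucl) / std,
                      np.where(c < lcl, (lcl - c) / std, 0.0))
    sums, start = {}, 0
    for name, length in stages.items():
        sums[name] = float(excess[start:start + length].sum())
        start += length
    return sums

K = sum(STAGES.values())
c = np.zeros(K)
c[20:25] = 5.0      # five instants 2 std above the UCL, inside ANA
c[400] = -4.5       # one instant 1.5 std below the LCL, inside AE2
sums = stage_excess_sums(c, np.full(K, -3.0), np.full(K, 3.0), np.ones(K))
print(sums["ANA"], sums["AE2"], sums["F1"])   # 10.0 1.5 0.0
```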

However, the information gathered did not help to perform the diagnosis. The method could work if the process were corrected on-line; in this study, the process is corrected after the diagnosis of each released batch, which is carried out in real time as the batches are released.

5.2.2 Sum of Standard Deviation and Stage Mean

The proposal was to sum the number of standard deviations by which the contributions in a stage lie outside the contribution limits and to compare this value with the mean length of the corresponding stage. If the sum of the contributions outside the threshold is greater than the mean stage length, the stage is considered faulty. The various tests carried out to verify the quality of the proposal did not give encouraging results.
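The stage test of this section reduces to a comparison per stage, sketched below under the assumption that the summed excesses have already been computed (for example, as in the modified CUSUM of section 5.2.1) and that the nominal stage durations stand in for the mean stage lengths. `faulty_stages` is a hypothetical name, and the dictionary values are illustrative, not data from the study.

```python
def faulty_stages(excess_sums, mean_stage_length):
    """Flag a stage as faulty when its summed excess (in standard deviations)
    exceeds the mean length of that stage."""
    return {s: excess_sums[s] > mean_stage_length[s] for s in excess_sums}

# Assumption: nominal durations used as mean stage lengths.
mean_len = {"F1": 10, "ANA": 150, "AE1": 100, "F2": 11, "ANO": 75, "AE2": 78}
# Illustrative summed excesses for one variable.
sums = {"F1": 12.5, "ANA": 40.0, "AE1": 0.0, "F2": 3.0, "ANO": 80.0, "AE2": 1.5}
print(faulty_stages(sums, mean_len))
# {'F1': True, 'ANA': False, 'AE1': False, 'F2': False, 'ANO': True, 'AE2': False}
```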


5.2.3 Sum of Standard Deviation with Statistic Range

In this method, the summation of the contributions is the same as in section 5.2.2; the difference is that a statistic range is calculated for each stage, using the highest and lowest values of the contribution limit in that stage. If the sum of standard deviations falls outside this range, the stage is considered faulty.

5.3 Fault Signature

The objective of the FS is to build, from the analysis of the contribution limit chart, a vector that represents the behavior of each variable in each stage of a faulty batch. This vector provides information on how the variables contributed to the process and, at the same time, reduces the dimensionality of the information to be analyzed.

In batch processes, L stages (l = 1, 2, ..., L) must be completed to obtain the final product. Therefore, the individual stage durations β_l must add up to the total duration K of the process, as in equation (5.3):

$$\sum_{l=1}^{L} \beta_l = K \tag{5.3}$$

To reduce the dimensionality, one indicator is obtained for each variable in each stage, and the vector containing all these indicators is the FS. If the process has L stages and J variables are analyzed, the FS vector has length JL. The FS thus represents the faulty process with a vector of JL fields, where JL ≪ JK.
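Instantiated with the SBR of chapter 4 (J = 4 variables and the six stage durations given there), equation (5.3) and the resulting dimensionality reduction work out as follows:

$$\sum_{l=1}^{6} \beta_l = 10 + 150 + 100 + 11 + 75 + 78 = 424 = K, \qquad JL = 4 \cdot 6 = 24 \ll JK = 4 \cdot 424 = 1696$$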

5.3.1 Binary Indicator for Fault Signature

In this variant, the FS indicators, which represent the behavior of the stages as obtained from the contribution limit chart, are binary values.

When the contribution plot of a faulty batch is projected onto the contribution limit chart, every instance at which the contribution lies above the UCL or below the LCL is counted as an event. If the total number of events in a stage is less than or equal to a given percentage of the length of that stage, the indicator of the variable for that stage is normal (0); otherwise, it is abnormal (1).
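A minimal sketch of the binary indicator, assuming NumPy, contributions and limits given as J x K arrays, fields ordered variable by variable and stage by stage, and the 5% percentage used later in section 6.3 as the default. `binary_fault_signature` is a hypothetical name and the toy contributions are illustrative.

```python
import numpy as np

# Assumed stage order and durations (samples) of the SBR cycle.
STAGES = {"F1": 10, "ANA": 150, "AE1": 100, "F2": 11, "ANO": 75, "AE2": 78}

def binary_fault_signature(C, lcl, ucl, pct=0.05, stages=STAGES):
    """C, lcl, ucl: J x K arrays. Returns a length J*L vector of 0/1 indicators."""
    events = (C > ucl) | (C < lcl)           # instants outside the band
    fs = []
    for j in range(C.shape[0]):
        start = 0
        for length in stages.values():
            n_events = int(events[j, start:start + length].sum())
            fs.append(0 if n_events <= pct * length else 1)
            start += length
    return np.array(fs, dtype=int)

K = sum(STAGES.values())
lcl, ucl = np.full((4, K), -3.0), np.full((4, K), 3.0)
C = np.zeros((4, K))
C[0, 160:170] = 4.0        # 10 events in AE1 of variable 0 (10 > 5% of 100)
C[2, 10:15] = -4.0         # 5 events in ANA of variable 2 (5 <= 5% of 150)
fs = binary_fault_signature(C, lcl, ucl)
print(len(fs), np.flatnonzero(fs))     # 24 [2]
```

The five excursions of variable 2 in ANA stay below 5% of the 150-instant stage and are reported as normal, which illustrates the information loss discussed next.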


One issue with this proposal is that it implies that every stage is equally important to the process, which can cause a loss of information and, consequently, a misdiagnosis of the batch. In addition, the choice of percentage is rather arbitrary, since there is no method to choose the correct percentage for the limit.

5.3.2 Numeric Indicator for Fault Signature

In this variant, the FS indicator of each stage is the number of instances outside the thresholds of the contribution limits.

When the contribution plot of a faulty batch is projected onto the contribution limit chart, every instance at which the contribution lies above the UCL or below the LCL is counted as an event. At the end of each stage, the number of instances outside the thresholds is the indicator representing the behavior of that stage.

The advantage of this proposal is that the maximum value of the indicator differs between stages, since it is bounded by the stage length, so it does not imply that all stages contribute equally to the process. A better FS can therefore be obtained to diagnose a faulty batch.
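The numeric indicator only changes the last step of the binary one: the event count itself is stored. The sketch below assumes NumPy, the same variable-by-stage field order and J x K inputs; `numeric_fault_signature` is a hypothetical name, and `max_fs` makes the stage-dependent upper bound of each field explicit.

```python
import numpy as np

# Assumed stage order and durations (samples) of the SBR cycle.
STAGES = {"F1": 10, "ANA": 150, "AE1": 100, "F2": 11, "ANO": 75, "AE2": 78}

def numeric_fault_signature(C, lcl, ucl, stages=STAGES):
    """C, lcl, ucl: J x K arrays. Field j*L + l counts the instants of variable j
    that fall outside the contribution limits during stage l."""
    events = (C > ucl) | (C < lcl)
    fs = []
    for j in range(C.shape[0]):
        start = 0
        for length in stages.values():
            fs.append(int(events[j, start:start + length].sum()))
            start += length
    return np.array(fs, dtype=int)

K = sum(STAGES.values())
lcl, ucl = np.full((4, K), -3.0), np.full((4, K), 3.0)
C = np.zeros((4, K))
C[0, 160:170] = 4.0          # 10 events in AE1 of variable 0
C[2, 10:15] = -4.0           # 5 events in ANA of variable 2
fs = numeric_fault_signature(C, lcl, ucl)
max_fs = np.tile(list(STAGES.values()), 4)   # upper bound of each field = stage length
print(fs[2], fs[13], bool(np.all(fs <= max_fs)))   # 10 5 True
```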

5.4 Diagnosis with the Fault Signature

The FS provides information on how the variables of the faulty batch contribute to the different stages of the process. The objective is to build an FSD of historical faulty batches, each associated with its quality variable analysis, and to use it to diagnose future batch releases. The integration of statistical methods with expert systems has been proposed to deal with the difficulties of diagnosing faulty processes (49, 50).

Since the FSD can contain many cases and each FS has many fields, knowledge can be extracted from the FSD with rule induction and classification algorithms, machine learning tools that find patterns in databases and classify new events. Given an input dataset (the FSD), the algorithm searches for the descriptions that best map the instances to the classes of an output dataset (the quality variable). Applied to the FSD, the algorithm delivers a set of rules or a knowledge model that helps predict the diagnosis of the quality variables in future batch releases.

The rule induction algorithms (RIA) used in this study are the CN2 algorithm (51), an induction algorithm that combines ideas from the ID3 and AQ algorithms to generate IF-THEN rules, and the PART algorithm (52), which creates rules from decision trees using the separate-and-conquer rule-learning technique. Applying an RIA to the FSD yields an ordered rule set, which is then used to diagnose new faulty batches.

The classification algorithms used are the IB1 algorithm (53), an instance-based learner that assigns a new case the class of its nearest neighbor, and the KStar algorithm (54), an instance-based learner that uses an entropy-based distance measure. Applying these algorithms to the FSD yields a classification model, which is then used to diagnose new faulty batches.
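The nearest-neighbor rule behind IB1 can be sketched in a few lines, assuming NumPy, numeric fault signatures as feature vectors, Euclidean distance, and min-max normalisation with the training ranges (an assumption, so that fields bounded by long stages do not dominate the distance). KStar's entropy-based distance is not sketched here. `ib1_predict` and the four-field toy signatures are hypothetical.

```python
import numpy as np

def ib1_predict(train_X, train_y, X):
    """Assign each row of X the label of its nearest training case."""
    lo, hi = train_X.min(axis=0), train_X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)     # avoid dividing by zero on constant fields
    A = (train_X - lo) / span
    B = (X - lo) / span
    d = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)   # (n_query, n_train)
    return train_y[np.argmin(d, axis=1)]

# Toy numeric fault signatures with only four fields, for readability.
train_X = np.array([[10, 0, 0, 2], [0, 12, 1, 0], [0, 0, 9, 8]], dtype=float)
train_y = np.array(["AOC1", "AOC2", "AOC3"])
X = np.array([[9, 1, 0, 1], [0, 1, 10, 7]], dtype=float)
print(ib1_predict(train_X, train_y, X).tolist())   # ['AOC1', 'AOC3']
```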

5.5 Conclusions

A new methodology is proposed to improve the diagnosis of faulty batches. A contribution limit chart was developed to observe the contribution behavior of a batch. The contribution limits are created from the contributions of the batches that make up the PCA model, and the contributions of a doubtful batch are projected onto the chart to check whether they are within the limits. The contribution limit chart shows in which stages of the process the contributions were outside the thresholds, but diagnosis by visual inspection is still difficult. Classification techniques could make the task easier; the problem lies in the number of instances the classification algorithm would need to analyze. Therefore, a fault signature summarizing the behavior of the contributions was proposed. The fault signature reduces the number of instances the classification algorithms must analyze, making it possible to estimate the diagnosis of future batches.


Chapter 6

Intelligent Contribution Analysis for Fault Diagnosis

In chapter 4 the pilot plant for wastewater treatment and historical data from the process were introduced, and a PCA model to discriminate between normal and abnormal batches was built. The Q statistic chart detected the faulty medium- and low-GQR batches, and the contribution plot of a faulty batch was calculated. Neither the statistical chart nor the contribution plot gave hints as to which GQR group a faulty batch belongs. Therefore, chapter 5 proposed a new methodology to predict the quality variables of a batch process.

This chapter presents the diagnosis of the GQR of the batches in table 4.1 using the methodology of chapter 5 with the binary indicator for the fault signature described in section 5.3.1. The fault signatures are diagnosed with the rule sets of the RIA described in section 5.4.

6.1 Historical Data

The historical data from the SBR process consist of 266 batches, each associated with the BNR and GQR of the processed wastewater as determined by the chemical laboratory. The GQR diagnosis of the historical data in table 4.1 was relabeled as NOC for the 93 high-GQR batches and AOC for the 82 medium- and 91 low-GQR batches. The AOC batches are divided as follows: 58 medium-GQR batches as AOC1, 24 medium-GQR batches as AOC2 and 91 low-GQR batches as AOC3.


Batches | C | NH₄⁺ | NO₂⁻ or NO₃⁻ | PO₄³⁻ | Global Quality Removal
93 | X | X | X | X | NOC
58 | X | × | X | X | AOC1
24 | X | X | X | × | AOC2
91 | X | X | • | × | AOC3

Table 6.1: Chemical analysis of BNR and GQR.

Columns C to PO₄³⁻: biological nutrient removal. X = high quality removal. • = medium quality removal. × = low quality removal.

6.2 PCA Model

In this procedure, 23 batches were treated as outliers in order to obtain a better model for detecting AOC batches and to build better contribution limits (figure 6.1). The PCA model is composed of 70 NOC batches; its three retained principal components explain 75,60% of the cumulative variance (figure 6.2), and the Q residuals account for the remaining 24,40% (figure 6.3). For comparison, the first model, built in section 4.3, explained 72,81% of the cumulative variance, with 27,19% left in the Q residuals (section 4.4).

Figure 6.1: Pre-process 70 NOC Batches - Block/group scaling of the 70 NOC batches without the 23 outlier batches.

Figure 6.2: PCA Model 70 Batches - Model with 70 NOC batches and three principal components.

Figure 6.3: Q Statistic Chart for AOC Batches - 173 AOC batches projected in the Q statistic chart of the 70 NOC PCA model.

In figure 6.3 the circles are the 70 NOC batches and the inverted triangles are the 173 AOC batches. All the AOC batches are detected as faulty. Comparing the Q statistic of the 70-batch NOC PCA model (figure 6.3) with that of the 84-batch high-GQR PCA model (figure 4.4), all the model batches lie below the Q statistic threshold in figure 6.3, whereas in figure 4.4 a few batches lie above it; those batches were treated as outliers when building the 70-batch NOC model.

6.3 Contribution Limit Chart and Binary Fault Signature

The contribution plot of the faulty batch is projected onto the contribution limit chart, and each time step is compared against the thresholds. This procedure uses the binary indicator of section 5.3.1: a counter records the number of instances in each stage at which a contribution lies outside the thresholds of the contribution limit. If the counter exceeds 5% of the stage length, the indicator of that stage for that variable is abnormal (1); otherwise, it is normal (0). Figure 6.4 shows the FS of a faulty batch.

Figure 6.4: Fault Signature for AOC Batch - AOC Batch projected in the Q statistic chart

of the 70 NOC PCA model.

In figure 6.4 the FS consists of 24 fields, arranged in blocks of 6 fields (one per stage of the process), one block per variable. The fields with the most contribution instances outside the thresholds, and therefore the ones producing the faulty batch, are the AE1 and AE2 stages of the pH variable and the ANA, ANO and AE2 stages of the ORP variable.

6.4 Diagnosis with the Binary Fault Signature

In this process the FS can take 576 different sequences; therefore, an FSD containing the AOC batches with their respective GQR is built. Rule induction algorithms are used to find patterns in the FSD and to deliver a set of rules to diagnose a released batch. Two RIA are used to build these rules: the CN2 and PART algorithms.

To test the proposed method with the RIA, a training set and a validation set are created from the FSD of all AOC batches. The AOC sets are divided randomly as follows:

• the training set is composed of 29 AOC1 batches, 12 AOC2 batches and 45 AOC3 batches;

• the validation set is composed of 29 AOC1 batches, 12 AOC2 batches and 46 AOC3 batches.
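A random split of each AOC group into halves that reproduces these counts can be sketched as follows, assuming NumPy and that the training half receives the smaller half of an odd-sized group. `split_in_half` is a hypothetical name; the seed is arbitrary, so which batches fall into each half will differ from the study, while the counts match.

```python
import numpy as np

def split_in_half(labels, seed=0):
    """Split every class at random; the training half gets floor(n/2) cases."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, valid = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = len(idx) // 2
        train.extend(idx[:n_train].tolist())
        valid.extend(idx[n_train:].tolist())
    return np.array(train), np.array(valid)

labels = np.array(["AOC1"] * 58 + ["AOC2"] * 24 + ["AOC3"] * 91)
train_idx, valid_idx = split_in_half(labels)
for cls in ("AOC1", "AOC2", "AOC3"):
    print(cls, int((labels[train_idx] == cls).sum()), int((labels[valid_idx] == cls).sum()))
# AOC1 29 29
# AOC2 12 12
# AOC3 45 46
```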

The CN2 algorithm was applied to the training set and produced a set of 15 IF-THEN rules to diagnose the batches of the validation set. To evaluate a batch, the system checks the 24 indicators of its FS: IF the indicators match the combination of a rule in the set, THEN the batch receives the diagnosis that the algorithm induced from the training set. Below are three example rules, one for each GQR, discovered by the CN2 algorithm:

Rule Ex.1:

IF pH AE1 = 1 AND Temp AE1 = 0 AND Temp ANO = 0

THEN Diagnosis = AOC1

Rule Ex.2:

IF pH F1 = 0 AND O2 F1 = 1 AND Temp ANA = 1

THEN Diagnosis = AOC2

Rule Ex.3:

IF O2 F2 = 0 AND Temp ANA = 1 AND Temp AE1 = 1

THEN Diagnosis = AOC3
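Applying an ordered rule set to a binary FS can be sketched as below, using only the three example rules quoted above (the induced set has 15). The sketch assumes that the four variables are pH, ORP, O2 and Temp in that block order, that the first matching rule decides, and that a batch matching no rule is left unclassified; `to_named` and `diagnose` are hypothetical names.

```python
# Assumed field order of the 24-field FS: four variable blocks of six stages.
VARIABLES = ["pH", "ORP", "O2", "Temp"]
STAGE_NAMES = ["F1", "ANA", "AE1", "F2", "ANO", "AE2"]

# Only the three example CN2 rules quoted above.
RULES = [
    ({"pH AE1": 1, "Temp AE1": 0, "Temp ANO": 0}, "AOC1"),
    ({"pH F1": 0, "O2 F1": 1, "Temp ANA": 1}, "AOC2"),
    ({"O2 F2": 0, "Temp ANA": 1, "Temp AE1": 1}, "AOC3"),
]

def to_named(fs):
    """Turn a 24-element 0/1 vector into {"<variable> <stage>": indicator}."""
    names = [f"{v} {s}" for v in VARIABLES for s in STAGE_NAMES]
    return dict(zip(names, (int(b) for b in fs)))

def diagnose(fs_named, rules=RULES):
    """Ordered rule list: the first rule whose conditions all hold decides."""
    for conditions, diagnosis in rules:
        if all(fs_named[field] == value for field, value in conditions.items()):
            return diagnosis
    return "unclassified"   # no rule matches the signature

base = to_named([0] * 24)
print(diagnose({**base, "pH AE1": 1}))                   # AOC1
print(diagnose({**base, "Temp ANA": 1, "Temp AE1": 1}))  # AOC3
print(diagnose(base))                                     # unclassified
```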


CN2 | Cases | Correctly classified (cases) | Correctly classified (%) | Unclassified (cases) | Unclassified (%)
AOC1 | 29 | 29 | 100 | 0 | 0
AOC2 | 12 | 9 | 75,00 | 3 | 25,00
AOC3 | 46 | 36 | 78,26 | 10 | 21,74
Total | 87 | 74 | 85,06 | 13 | 14,94

Table 6.2: Diagnosis results obtained with the rules set of the CN2 algorithm

Table 6.2 shows the diagnosis results obtained with the CN2 algorithm for the batches of the validation set. The first and second columns give the GQR and the number of cases; the third and fourth columns give the cases correctly classified and the corresponding percentage; the last two columns give the cases that were not classified and their percentage. The total correct classification rate for the validation set is 85,06%.
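The percentages in this kind of table follow directly from the case counts. The sketch below recomputes them for table 6.2, assuming, as in that table, that every case not correctly classified is unclassified (there is no wrong-classification column); `classification_rates` is a hypothetical name.

```python
def classification_rates(table):
    """table: {group: (cases, correct)} -> {group: (% correct, % unclassified)},
    plus a 'Total' row; non-correct cases are counted as unclassified."""
    rows = dict(table)
    rows["Total"] = (sum(c for c, _ in table.values()), sum(k for _, k in table.values()))
    return {g: (round(100 * k / c, 2), round(100 * (c - k) / c, 2)) for g, (c, k) in rows.items()}

cn2 = {"AOC1": (29, 29), "AOC2": (12, 9), "AOC3": (46, 36)}
print(classification_rates(cn2))
# {'AOC1': (100.0, 0.0), 'AOC2': (75.0, 25.0), 'AOC3': (78.26, 21.74), 'Total': (85.06, 14.94)}
```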

Below are three example rules, one for each GQR, discovered by the PART algorithm:

Rule Ex.1:

IF pH AE1 = 1 AND Temp F2 = 0

THEN Diagnosis = AOC1

Rule Ex.2:

IF O2 F2 = 1 AND Temp ANO = 1 AND Temp AE2 = 0

THEN Diagnosis = AOC2

Rule Ex.3:

IF pH ANA = 0 AND O2 ANO = 0 AND Temp AE1 = 1

THEN Diagnosis = AOC3

PART | Cases | Correctly classified (cases) | Correctly classified (%) | Unclassified (cases) | Unclassified (%)
AOC1 | 29 | 28 | 96,55 | 1 | 3,45
AOC2 | 12 | 10 | 83,33 | 2 | 16,67
AOC3 | 46 | 36 | 78,26 | 10 | 21,74
Total | 87 | 74 | 85,06 | 13 | 14,94

Table 6.3: Diagnosis results obtained with the rules set of the PART algorithm


Table 6.3 shows the diagnosis results obtained with the PART algorithm for the batches of the validation set, with the same column layout as table 6.2. The total correct classification rate for the validation set is again 85,06%.

Both rule sets, 15 rules from CN2 and 16 from PART, achieve a classification rate of 85,06%, i.e., 74 of the 87 batches are correctly classified. However, the rates differ for two GQR groups: AOC1 is classified at 100% with CN2 (table 6.2) and at 96,55% with PART (table 6.3), while AOC2 is classified at 75% with CN2 and at 83,33% with PART. The RIA to apply should therefore be chosen according to which group is more critical.

Figure 6.5: Application Window - An AOC1 batch with the proposed methods for diagnosis.

Figure 6.5 shows the window of the application used to diagnose the faulty batches. At the top of the window, an AOC1 batch is projected onto the contribution limit chart of the 70 NOC PCA model. At the bottom of the window, the left corner shows the FS, with a red circle for an abnormal stage and a blank circle for a normal stage; the middle shows the BNR, with a green circle for high removal, a yellow circle for medium removal and a red circle for low removal; and the right corner shows the GQR diagnosis of the batch.

6.5 Conclusions

A PCA model composed of 70 NOC batches was created after 23 of the NOC batches were treated as outliers. The Q statistic detected all the AOC batches. An AOC batch was projected onto the contribution limit chart, and a fault signature with binary indicators was obtained. Following the methodology to estimate the diagnosis of the global quality removal, each AOC set was randomly divided in half, one half used as the training set and the other as the validation set. The rule induction algorithms were applied to the training set to obtain rule sets. The validation batches were then passed through the PCA model for detection, their fault signatures were created, and these fault signatures were evaluated with the rule sets of both algorithms. The estimated diagnosis of the global quality removal for the validation set gave good results: with the rule sets of both algorithms, the total correct classification rate was above 85%.


Chapter 7

Intelligent Contribution Analysis for Estimation of Quality Variables

The good results obtained in chapter 6 for the diagnosis of faulty batches encouraged further work on the diagnosis task and the development of new methods to obtain better diagnostic results. In this chapter, the challenge is to estimate each of the quality variables of the process. The first diagnosis is performed with the binary indicator for the FS and the rule set of the CN2 algorithm. The second and third diagnoses are performed with the numeric indicator for the FS (section 5.3.2), the method proposed after the good results with the binary indicator, using the classification algorithms IB1 and KStar (section 5.4).

7.1 Historical Data

The historical data from the SBR process consist of 266 batches, each associated with the BNR and GQR of the processed wastewater as determined by the chemical laboratory, and divided into 93 NOC batches and 173 AOC batches (table 6.1 in section 6.1).

The FSD of the AOC batches is linked to the BNR of the four quality variables of the process (table 6.1). According to the effluent quality, the BNR of the 173 AOC batches is:

• organic matter (C): all the batches have high quality removal,


• ammonium (NH₄⁺): 115 batches with high quality removal and 58 batches with low quality removal,

• nitrates (NO₂⁻ or NO₃⁻): 82 batches with high quality removal and 91 batches with medium quality removal,

• phosphate (PO₄³⁻): 58 batches with high quality removal and 115 batches with low quality removal.
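These per-variable counts follow from the AOC group sizes and table 6.1: each AOC batch inherits, for every quality variable, the removal quality of its group. The sketch below makes that labelling explicit; the dictionary keys such as `"NO2/NO3"` are shorthand labels chosen here, not names from the study.

```python
from collections import Counter

# Removal quality of each AOC group, read from table 6.1.
BNR = {
    "AOC1": {"C": "high", "NH4": "low",  "NO2/NO3": "high",   "PO4": "high"},
    "AOC2": {"C": "high", "NH4": "high", "NO2/NO3": "high",   "PO4": "low"},
    "AOC3": {"C": "high", "NH4": "high", "NO2/NO3": "medium", "PO4": "low"},
}
GROUP_SIZES = {"AOC1": 58, "AOC2": 24, "AOC3": 91}

def class_counts(variable):
    """Class distribution of one quality variable over the 173 AOC batches."""
    counts = Counter()
    for group, n in GROUP_SIZES.items():
        counts[BNR[group][variable]] += n
    return dict(counts)

for var in ("NH4", "NO2/NO3", "PO4"):
    print(var, class_counts(var))
# NH4 {'low': 58, 'high': 115}
# NO2/NO3 {'high': 82, 'medium': 91}
# PO4 {'high': 58, 'low': 115}
```

Organic matter yields a single class (`{'high': 173}`), which is why it carries no diagnostic information here.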

7.2 PCA Model

The PCA model for this procedure is the one presented in section 6.2, which retains three principal components explaining 75,60% of the cumulative variance (figure 6.2), with the remaining 24,40% in the Q residuals (figure 6.3); all the AOC batches were detected as faulty.

7.3 Binary Indicator for Fault Signature

7.3.1 Contribution Limit Chart and Binary Fault Signature

The contribution plot of the faulty batch is projected onto the contribution limit chart, and each time step is compared against the thresholds. This procedure uses the binary indicator of section 5.3.1: a counter records the number of instances in each stage at which a contribution lies outside the thresholds of the contribution limit. If the counter exceeds 5% of the stage length, the indicator of that stage for that variable is abnormal (1); otherwise, it is normal (0). Figure 7.1 shows the FS of a faulty batch.

In figure 7.1 the FS consists of 24 fields, arranged in blocks of 6 fields (one per stage of the process), one block per variable. The fields with the most contribution instances outside the thresholds, and therefore the ones producing the faulty batch, are the AE1 and AE2 stages of the pH variable and the ANA, ANO and AE2 stages of the ORP variable.

7.3.2 Diagnosis with the Binary Fault Signature

Since organic matter has high quality removal in all batches, it is not taken into account.

A training set composed of 87 batches chosen at random from the 173 AOC batches is created, and the remaining 86 batches form the validation set. The CN2 algorithm is applied to the training set to obtain the rule set used to diagnose the validation set. To diagnose the three quality variables of a faulty batch, a separate rule set is built for each quality variable.

Figure 7.1: Fault Signature for AOC Batch - AOC Batch projected in the Q statistic chart of the 70 NOC PCA model.
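The evaluation protocol (a random 87/86 split and one model per quality variable) can be sketched as follows. `split_batches`, `train_per_quality` and the majority-class `majority_learner` are hypothetical stand-ins, and the labels are dummies; in the report the learner is CN2 here and IB1 or KStar in section 7.4.

```python
import random
from collections import Counter

def split_batches(batch_ids, n_train=87, seed=None):
    """Randomly split batch identifiers into training and validation lists."""
    shuffled = list(batch_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def train_per_quality(fs, labels, train_ids, learner):
    """Fit one model per quality variable on the same training fault signatures.

    fs: dict batch_id -> fault signature
    labels: dict quality variable -> (dict batch_id -> BNR class)
    learner: callable (X, y) -> model, where model(fault_signature) -> class
    """
    X = [fs[b] for b in train_ids]
    return {q: learner(X, [lab[b] for b in train_ids]) for q, lab in labels.items()}

def majority_learner(X, y):
    """Trivial placeholder learner: always predicts the most frequent class."""
    label = Counter(y).most_common(1)[0][0]
    return lambda fault_signature: label

# Example with 173 placeholder batches and dummy labels.
ids = range(173)
train, valid = split_batches(ids, n_train=87, seed=1)
fs = {b: [0] * 24 for b in ids}
labels = {
    "ammonium": {b: "High" for b in ids},
    "nitrates": {b: "Medium" for b in ids},
    "phosphate": {b: "Low" for b in ids},
}
models = train_per_quality(fs, labels, train, majority_learner)
print(len(train), len(valid), {q: m(fs[valid[0]]) for q, m in models.items()})
```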

The diagnoses of the validation set obtained with the rule sets from the CN2 algorithm are shown in tables 7.1, 7.2 and 7.3. Each table has five groups of columns. The first (BNR) indicates the BNR quality; the second (Cases) gives the number of validation cases with that BNR quality; the third (Correct Classification) gives the number of those cases that were correctly classified and the corresponding percentage; the fourth (Wrong Classification) gives the number of misclassified cases broken down by the BNR quality they were assigned to (High, Medium or Low), followed by the percentage of wrong classifications; and the fifth (Unclassified) gives the number of unclassified cases and the corresponding percentage. A case is unclassified when the sequence of indicators in its FS does not match any rule.
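Applying an ordered rule set with an explicit "unclassified" outcome can be sketched as below. The rule representation (a mapping from FS field index to the required indicator value) and the two example rules are hypothetical; they are not rules induced by CN2 in this study. Rule lists induced by CN2 often end with a default rule; it is omitted here so that an unmatched case is left unclassified, as in the tables.

```python
def classify(fs, rules):
    """Return the class of the first rule whose conditions all hold, else None.

    rules: ordered list of (conditions, label); conditions maps an FS field
    index to the indicator value that field must have for the rule to fire.
    """
    for conditions, label in rules:
        if all(fs[i] == value for i, value in conditions.items()):
            return label
    return None  # no rule fired: the case is counted as unclassified

# Hypothetical rule set for one quality variable (binary FS with 24 fields).
rules = [({4: 1, 11: 1}, "Low"), ({4: 0}, "High")]
fs_a = [0] * 24
fs_a[4] = fs_a[11] = 1    # fires the first rule
fs_b = [0] * 24           # fires the second rule
fs_c = [0] * 24
fs_c[4] = 1               # field 4 is 1 but field 11 is 0: no rule fires
print(classify(fs_a, rules), classify(fs_b, rules), classify(fs_c, rules))
# Low High None
```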

With the CN2 rule induction algorithm, the correct classification rate was 95,35% for ammonium, 87,21% for nitrates and 95,35% for phosphate (tables 7.1, 7.2 and 7.3).

BNR     Cases  Correct Classification Wrong Classification      Unclassified
               Classified  %          High  Medium  Low  %      Cases  %
High    51     50          98,04      -     -       1    1,96   -      -
Medium  -      -           -          -     -       -    -      -      -
Low     35     32          91,43      3     -       -    8,57   -      -
Total   86     82          95,35      3     -       1    4,65   -      -

Table 7.1: CN2 diagnosis table for ammonium.

BNR     Cases  Correct Classification Wrong Classification      Unclassified
               Classified  %          High  Medium  Low  %      Cases  %
High    45     41          91,11      -     3       -    6,67   1      2,22
Medium  41     34          82,93      2     -       -    4,88   5      12,20
Low     -      -           -          -     -       -    -      -      -
Total   86     75          87,21      2     3       -    5,81   6      6,98

Table 7.2: CN2 diagnosis table for nitrates.

BNR     Cases  Correct Classification Wrong Classification      Unclassified
               Classified  %          High  Medium  Low  %      Cases  %
High    35     32          91,43      -     -       3    8,57   -      -
Medium  -      -           -          -     -       -    -      -      -
Low     51     50          98,04      1     -       -    1,96   -      -
Total   86     82          95,35      1     -       3    4,65   -      -

Table 7.3: CN2 diagnosis table for phosphate.
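The percentages in the diagnosis tables follow directly from the counts: each rate is the number of cases in a column group divided by the number of cases of that BNR class. The sketch below (hypothetical function name and row format) recomputes the rates of table 7.2 from its counts; Python prints them with decimal points instead of commas.

```python
def diagnosis_rates(rows):
    """Percentages of correct, wrong and unclassified cases for each BNR class.

    rows maps a BNR class to (cases, correct, wrong, unclassified) counts;
    a 'Total' row is added by summing the counts of all classes.
    """
    totals = tuple(sum(r[k] for r in rows.values()) for k in range(4))
    rates = {}
    for bnr, (cases, correct, wrong, uncl) in {**rows, "Total": totals}.items():
        assert correct + wrong + uncl == cases, f"counts do not add up for {bnr}"
        rates[bnr] = tuple(round(100 * n / cases, 2) for n in (correct, wrong, uncl))
    return rates

# Counts from table 7.2 (CN2 diagnosis for nitrates).
nitrates = {"High": (45, 41, 3, 1), "Medium": (41, 34, 2, 5)}
for bnr, (ok, bad, none) in diagnosis_rates(nitrates).items():
    print(f"{bnr:<7} correct {ok:6.2f}%  wrong {bad:5.2f}%  unclassified {none:5.2f}%")
```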

The total classification rates for the diagnosis of the individual quality variables are all above 87%, whereas the total classification rate for the diagnosis of the global quality in section 6.4 was 85,06%. The classification rates are therefore slightly better, even though each quality variable is now estimated individually rather than a single global quality. Moreover, except for the medium BNR cases of the nitrates, with a classification rate of 82,93%, all the BNR classes of the different quality variables have classification rates above 91%.

7.4 Numeric Indicator for Fault Signature

7.4.1 Contribution Limit Chart and Numeric Fault Signature

The contribution plot of the faulty batch is projected in the contribution limit chart, and each time step is compared against the thresholds. This procedure uses the numeric indicator of section 5.3.2: for each variable, a counter stores the number of time steps in each stage at which the contribution falls outside the thresholds of the contribution limit chart, and at the end of the stage the indicator for that stage and variable takes the value of the counter. The FS for the faulty batch can be observed in figure 7.2.

Figure 7.2: Fault Signature for AOC Batch - AOC Batch projected in the Q statistic chart

of the 70 NOC PCA model.

In figure 7.2 the FS is again composed of 24 fields, six per variable (one for each stage of the process). As with the binary indicator, the AE1 and AE2 stages of the pH variable and the ANA, ANO and AE2 stages of the ORP variable have the most contribution instances outside the threshold, and are therefore responsible for the batch being detected as faulty. In addition, the numeric FS shows how many instances fell outside the thresholds: 9 for the AE1 stage and 18 for the AE2 stage of the pH variable, and 84 for the ANA stage, 69 for the ANO stage and 67 for the AE2 stage of the ORP variable.
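A minimal sketch of the numeric fault signature for a whole batch, assuming contributions and limits are arrays of shape (variables, time steps) and that stages are given as index ranges. The layout, the equal-length stages and the position of the out-of-limit steps in the example are hypothetical; the counts 9 and 18 simply mirror the pH values quoted above.

```python
import numpy as np

def numeric_fault_signature(contrib, lcl, ucl, stage_bounds):
    """Numeric FS: number of time steps outside [lcl, ucl] per variable and stage.

    contrib, lcl, ucl: arrays of shape (n_vars, T).
    stage_bounds: list of (start, end) index pairs, end exclusive.
    Returns n_vars * n_stages counts ordered variable by variable.
    """
    outside = (contrib > ucl) | (contrib < lcl)
    return np.array([outside[v, s:e].sum()
                     for v in range(contrib.shape[0])
                     for s, e in stage_bounds])

n_vars, T = 4, 600
bounds = [(k * 100, (k + 1) * 100) for k in range(6)]   # hypothetical equal stages
contrib = np.zeros((n_vars, T))
lcl, ucl = -np.ones((n_vars, T)), np.ones((n_vars, T))
contrib[0, 400:409] = 2.0     # 9 steps above the UCL in the fifth stage of variable 0
contrib[0, 500:518] = -2.0    # 18 steps below the LCL in the sixth stage of variable 0
fs = numeric_fault_signature(contrib, lcl, ucl, bounds)
print(fs.reshape(n_vars, -1))
```

Unlike the binary indicator, no 5% cut-off is applied, so the classifier receives the magnitude of each abnormal stage.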

7.4.2 Diagnosis with the Numeric Fault Signature

Since organic matter has high quality removal in all batches, it is not taken into account.

A training set composed of 87 batches chosen at random from the 173 AOC batches is created, and the remaining 86 batches form the validation set. Two classification algorithms, IB1 and KStar (explained in section 5.4), are applied to the training set to obtain the knowledge models used to diagnose the validation set. To estimate the diagnosis of the three quality variables of a faulty batch, a separate knowledge model is built for each quality variable.
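IB1 is a one-nearest-neighbor classifier. The sketch below assumes Euclidean distance on FS fields rescaled to [0, 1] with the training minima and maxima (the normalization used by common IB1 implementations); the class and the toy fault signatures are hypothetical, and KStar's entropic distance is not reproduced here.

```python
import numpy as np

class IB1:
    """One-nearest-neighbor classifier with min-max normalized Euclidean distance."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.lo = X.min(axis=0)
        span = X.max(axis=0) - self.lo
        self.span = np.where(span > 0, span, 1.0)   # constant fields add no distance
        self.X = (X - self.lo) / self.span
        self.y = list(y)
        return self

    def predict_one(self, fs):
        """Class of the stored training FS closest to fs (first one on ties)."""
        z = (np.asarray(fs, dtype=float) - self.lo) / self.span
        d = np.sum((self.X - z) ** 2, axis=1)        # squared distance keeps the order
        return self.y[int(np.argmin(d))]

# Toy numeric fault signatures with two fields (hypothetical values).
X_train = [[0, 0], [10, 0], [0, 84], [9, 80]]
y_train = ["High", "High", "Low", "Low"]
model = IB1().fit(X_train, y_train)
print(model.predict_one([1, 70]), model.predict_one([9, 2]))   # Low High
```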

The diagnoses of the validation set obtained with the knowledge models from the IB1 algorithm are shown in tables 7.4, 7.5 and 7.6.

Each table has four groups of columns. The first (BNR) indicates the BNR quality; the second (Cases) gives the number of validation cases with that BNR quality; the third (Correct Classification) gives the number of those cases that were correctly classified and the corresponding percentage; and the fourth (Wrong Classification) gives the number of misclassified cases broken down by the BNR quality they were assigned to, followed by the percentage of wrong classifications. Unlike the rule sets, these classifiers assign a class to every case, so there is no Unclassified group.

BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low  %
High    51     50          98,04      -     -       1    1,96
Medium  -      -           -          -     -       -    -
Low     35     35          100        -     -       -    -
Total   86     85          98,84      -     -       1    1,16

Table 7.4: IB1 diagnosis table for ammonium.

BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low  %
High    45     45          100        -     -       -    -
Medium  41     39          95,12      2     -       -    4,88
Low     -      -           -          -     -       -    -
Total   86     84          97,67      2     -       -    2,33

Table 7.5: IB1 diagnosis table for nitrates.

BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low  %
High    35     35          100        -     -       -    -
Medium  -      -           -          -     -       -    -
Low     51     50          98,04      1     -       -    1,96
Total   86     85          98,84      1     -       -    1,16

Table 7.6: IB1 diagnosis table for phosphate.

The total classification rate for the diagnosis of ammonium is 98,84% (table 7.4), for nitrates 97,67% (table 7.5) and for phosphate 98,84% (table 7.6). In comparison, the diagnoses of the quality variables estimated with the numeric indicator for the FS all have a total classification rate of at least 97,67%, while those estimated with the binary indicator in section 7.3 reached at most 95,35%. Therefore, the numeric indicator proposed in section 5.3.2 provided a better estimation of the quality variables.

The diagnoses of the validation set obtained with the knowledge models from the KStar algorithm are presented in tables 7.7, 7.8 and 7.9.

BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low  %
High    51     50          98,04      -     -       1    1,96
Medium  -      -           -          -     -       -    -
Low     35     35          100        -     -       -    -
Total   86     85          98,84      -     -       1    1,16

Table 7.7: KStar diagnosis table for ammonium.

BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low  %
High    45     45          100        -     -       -    -
Medium  41     39          95,12      2     -       -    4,88
Low     -      -           -          -     -       -    -
Total   86     84          97,67      2     -       -    2,33

Table 7.8: KStar diagnosis table for nitrates.

BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low  %
High    35     35          100        -     -       -    -
Medium  -      -           -          -     -       -    -
Low     51     50          98,04      1     -       -    1,96
Total   86     85          98,84      1     -       -    1,16

Table 7.9: KStar diagnosis table for phosphate.

Tables 7.7, 7.8 and 7.9, which contain the estimated diagnoses of the three nutrients, show that the total classification rates and the classification of the different BNR cases obtained with the KStar knowledge models are exactly the same as those obtained with IB1. A probable reason is that both algorithms are instance-based: they classify a new case by finding the stored training cases closest to it.

Figure 7.3: Application Window - An AOC batch with the proposed methods for diagnosis.

Figure 7.3 shows the window of the application used to diagnose the faulty batches. At the top of the window, an AOC batch is projected in the contribution limit chart of the 70 NOC PCA model. At the bottom of the window, the left corner shows the FS with the numeric indicator for each stage of the different variables; the middle shows the estimated BNR, where a green circle indicates high removal, a yellow circle medium removal and a red circle low removal; and the right corner shows the standard levels for nutrient removal [11].

7.5 Conclusions

The new methodology was used to estimate the diagnosis of the quality variables of the process: ammonium, nitrates and phosphate. The PCA model was the same one as in chapter 6. In this chapter, the 173 AOC batches were randomly divided into two sets, the training set and the validation set. The binary indicator was used for the fault signature, and the CN2 rule induction algorithm was applied to the training set to obtain the rule sets that estimate the diagnosis of the quality variables. Since there are three quality variables, the algorithm was applied to the training set three times, pairing the input data (the fault signatures) with the removal quality of ammonium, of nitrates and of phosphate, respectively. After the rule sets for ammonium, nitrates and phosphate were obtained, the validation set was passed through the system. The estimated diagnoses of the quality variables for the validation set reached correct diagnosis rates of at least 87,21%.

To achieve better results with the methodology, the numeric indicator for the fault signature was proposed, and the same training and validation sets were used to test it. On this occasion, two classification algorithms, IB1 and KStar, were used to create the knowledge models. Each algorithm was applied to the training set three times, pairing the input data (the fault signatures) with the removal quality of ammonium, of nitrates and of phosphate, respectively. The knowledge models were then applied to the validation set, and the estimated diagnoses of the quality variables reached correct diagnosis rates of at least 97,67% for both algorithms.

Chapter 8

Conclusions and Future Studies

In this study, a new methodology was proposed to estimate the quality of a released batch from the measurements of the process variables. Contribution limit charts provide information on how the variables contribute in the different stages of a faulty process. A fault signature was created as a tool to be used with classification algorithms: it contains the information on the contribution behavior of a faulty process while reducing the dimensionality of the instances that need to be analyzed. Two approaches to represent the behavior of the contributions in the fault signature were developed. The classification algorithms search for patterns in a fault signature dataset composed of abnormal batches and provide a trained knowledge model to estimate the faulty behavior of future batches.

To test the methodology, a PCA model based on 70 NOC historical batches was built. The Q statistic chart of the PCA model detected the 173 AOC historical batches. Since the Q statistic alone cannot classify or diagnose the batches, the contribution plots of the faulty batches were calculated to pursue the task of diagnosis. Diagnosing contribution plots by eye is very difficult without an expert in the process, and an expert able to diagnose a highly non-linear process is hard to find.

Contribution limit charts were proposed to overcome this deficiency in diagnosis with contribution plots. When the contribution of a faulty batch is projected in the contribution limit chart, it can be observed in which stages the variables contribute abnormally. With the information gathered on the contribution behavior, a classification task can be performed. However, if the batch process is long and has many measured variables, the number of contribution instances becomes large, which can make the classification task difficult.
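The idea of the contribution limit chart can be sketched as follows, assuming the contributions of each NOC batch are stored as an array of shape (variables, time steps) and that the limits are the NOC mean plus or minus c standard deviations at every time step. The limit rule, c = 3 and the synthetic data are assumptions for illustration; the report's own LCL and UCL definitions may differ.

```python
import numpy as np

def contribution_limits(noc_contrib, c=3.0):
    """LCL and UCL per variable and time step from NOC contributions.

    noc_contrib: array (n_batches, n_vars, T) with the contributions of NOC batches.
    Returns (lcl, ucl), each of shape (n_vars, T).
    """
    mu = noc_contrib.mean(axis=0)
    sd = noc_contrib.std(axis=0, ddof=1)
    return mu - c * sd, mu + c * sd

def outside_limits(contrib, lcl, ucl):
    """Boolean mask (n_vars, T): True where a batch's contribution leaves the limits."""
    return (contrib > ucl) | (contrib < lcl)

rng = np.random.default_rng(0)
noc = rng.normal(size=(70, 4, 600))       # synthetic NOC contributions
lcl, ucl = contribution_limits(noc)
batch = np.zeros((4, 600))
batch[1, 100:184] = 10.0                  # synthetic abnormal contribution in variable 1
mask = outside_limits(batch, lcl, ucl)
print(mask.sum(axis=1))                   # out-of-limit time steps per variable
```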

To overcome the dimensionality of the contribution instances, a fault signature was proposed that contains the information on the contribution behavior while reducing the dimensionality for the classification task. The dimensionality of the contributions is reduced by using either a binary indicator or a numeric indicator in the fault signature. With the reduced dimension of the fault signature, classification tasks are much easier. An AOC fault signature dataset is built so that a classification algorithm can search for patterns that map the faulty behavior to the laboratory analysis of the product quality.

Once the classification models are obtained, the system can estimate the quality diagnosis of a released batch. The measurements of the released batch are projected in the PCA model and the Q statistic is observed to determine whether the batch is faulty. If the batch is faulty, its contribution plot is calculated and projected in the contribution limit chart, and the fault signature is obtained from the contribution behavior of the faulty process. The fault signature then passes through the classification models, which give an estimated diagnosis of the product quality based on its information.
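The estimation pipeline just described can be sketched end to end. This is a hypothetical, simplified sketch rather than the report's implementation: it assumes batch-wise unfolded data ordered variable by variable, uses the signed PCA residuals as contributions, mean plus or minus 3 standard deviations of the NOC residuals as contribution limits, the 99th percentile of the NOC Q values as the Q threshold, the numeric indicator for the FS and a plain (unnormalized) one-nearest-neighbor classifier. All data are synthetic.

```python
import numpy as np

class QualityEstimator:
    """Hypothetical sketch: PCA/Q detection -> contributions -> numeric FS -> 1-NN."""

    def __init__(self, n_vars, stage_bounds, n_components=3, c=3.0):
        self.n_vars, self.bounds = n_vars, stage_bounds
        self.k, self.c = n_components, c

    def _residuals(self, X):
        # Signed residuals of the autoscaled data, used here as Q contributions.
        Z = (np.atleast_2d(X) - self.mean) / self.std
        return Z - Z @ self.P @ self.P.T

    def fit_noc(self, X_noc):
        """PCA model, Q threshold and contribution limits from the NOC batches."""
        self.mean = X_noc.mean(axis=0)
        self.std = X_noc.std(axis=0, ddof=1)
        self.std[self.std == 0] = 1.0
        Z = (X_noc - self.mean) / self.std
        self.P = np.linalg.svd(Z, full_matrices=False)[2][: self.k].T
        E = self._residuals(X_noc)
        self.q_lim = np.percentile(np.sum(E ** 2, axis=1), 99)
        mu, sd = E.mean(axis=0), E.std(axis=0, ddof=1)
        self.lcl, self.ucl = mu - self.c * sd, mu + self.c * sd
        return self

    def q(self, x):
        return float(np.sum(self._residuals(x) ** 2))

    def fault_signature(self, x):
        """Numeric FS: out-of-limit time steps per variable and stage."""
        r = self._residuals(x)[0]
        out = ((r > self.ucl) | (r < self.lcl)).reshape(self.n_vars, -1)
        return np.array([out[v, s:t].sum()
                         for v in range(self.n_vars) for s, t in self.bounds])

    def fit_classifier(self, X_aoc, labels):
        """Store the fault signatures of labelled AOC batches."""
        self.fs_train = np.array([self.fault_signature(x) for x in X_aoc], dtype=float)
        self.labels = list(labels)
        return self

    def estimate(self, x):
        """'NOC' if Q is within its limit, else the label of the nearest stored FS."""
        if self.q(x) <= self.q_lim:
            return "NOC"
        d = np.sum((self.fs_train - self.fault_signature(x)) ** 2, axis=1)
        return self.labels[int(np.argmin(d))]

# Synthetic data: 4 variables, 60 time steps, 6 equal stages (all hypothetical).
rng = np.random.default_rng(42)
n_vars, T = 4, 60
bounds = [(k * 10, (k + 1) * 10) for k in range(6)]
est = QualityEstimator(n_vars, bounds).fit_noc(rng.normal(size=(70, n_vars * T)))

def faulty(var, stage):
    """A synthetic batch with a large deviation in one variable during one stage."""
    x = rng.normal(size=n_vars * T)
    s, t = bounds[stage]
    x[var * T + s: var * T + t] += 8.0
    return x

X_aoc = [faulty(1, 2) for _ in range(10)] + [faulty(0, 5) for _ in range(10)]
est.fit_classifier(X_aoc, ["Low"] * 10 + ["High"] * 10)
print(est.estimate(faulty(1, 2)), est.estimate(faulty(0, 5)), est.estimate(est.mean))
```

In practice one such classifier would be trained per quality variable, as in chapter 7.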

The proposed methodology achieved high classification rates. In the diagnosis of the global quality variable with the binary indicator for the fault signature, the total classification rate was 85,06% for both rule induction algorithms. The diagnosis of the individual quality variables with the binary indicator had a minimum classification rate of 87,21% and a maximum of 95,35%, and with the numeric indicator a minimum of 97,67% and a maximum of 98,84%. These results suggest that the numeric indicator is the better approach to fill the fault signature with the information of the contribution behavior, although for a more controlled and less non-linear process the binary indicator might perform better.

For future studies, applying the methodology to different batch processes would help to improve the proposed method, and improving the approaches for the fault signature would help to achieve better classification rates. The proposed methodology could be applied to a wide range of batch processes whose cycles have different stages and where sensors to measure the quality variables are not available; such studies would also improve the robustness of the system.

Glossary

NO₂⁻ Nitrite

NO₃⁻ Nitrate

PO₄³⁻ Phosphate

NH₄⁺ Ammonium

AE1 First aerobic condition for the SBR biological reaction

AE2 Second aerobic condition for the SBR biological reaction

ANA Anaerobic condition for the SBR biological reaction

ANO Anoxic condition for the SBR biological reaction

AOC Abnormal operation condition

BNR Biological nutrient removal

BOD Biochemical oxygen demand

C Organic matter

CUSUM Cumulative Sum

DO Dissolved oxygen

F1 First fill for the SBR biological reaction

F2 Second fill for the SBR biological reaction

FS Fault signature

FSD Fault signature dataset

GQR Global quality removal for the wastewater treated

LCL Lower contribution limit for the contribution limit chart

MPCA Multiway principal component analysis

MSPC Multivariate statistical process control

NOC Normal operation condition

ORP Oxidation reduction potential

PCA Principal component analysis

pH Measure of the acidity or alkalinity of an aqueous solution

RIA Rule induction algorithm

SBR Sequencing batch reactor

SPC Statistical process control

SPE Squared prediction error

Temp Temperature

U-PCA Unfold - Principal component analysis

UCL Upper contribution limit for the contribution limit chart

References

[1] P. Nomikos and J. F. MacGregor. Multivariate SPC Charts for Monitoring Batch Processes. Technometrics, 37(1):41–59, February 1995.

[2] S. Puig, M. T. Vives, L. Corominas, M. D. Balaguer, and J. Colprim. Wastewater Nitrogen Removal in SBRs, Applying a Step-Feed Strategy: from Lab-Scale to Pilot-Plant Operation. Water Sci. Technol., 50(10):89–96, 2004.

[3] S. Puig, M. Coma, M. C. M. van Loosdrecht, J. Colprim, and M. D. Balaguer. Biological Nutrient Removal in a Sequencing Batch Reactor Using Ethanol as Carbon Source. J. Chem. Technol. Biotechnol., 82(10):898–904, 2007.

[4] R. Ganigué, H. López, M. D. Balaguer, and J. Colprim. Partial Ammonium Oxidation to Nitrite of High Ammonium Content Urban Landfill Leachates. Water Res., 41(15):3317–3326, 2007.

[5] H. López, S. Puig, R. Ganigué, M. Ruscalleda, M. D. Balaguer, and J. Colprim. Start-Up and Enrichment of a Granular Anammox SBR to Treat High Nitrogen Load Wastewaters. J. Chem. Technol. Biotechnol., 83(3):233–241, 2008.

[6] S. Wold, K. Esbensen, and P. Geladi. Principal Component Analysis. Chemom. Intell. Lab. Syst., 2(1):37–52, 1987.

[7] J. E. Jackson. A User's Guide to Principal Components. John Wiley & Sons Canada, Limited, March 1991.

[8] E. B. Martin, A. J. Morris, and J. Zhang. Process Performance Monitoring Using Multivariate Statistical Process Control. IEE P-Contr. Theor. Ap., 143(2):132–144, March 1996.

[9] P. Nomikos and J. F. MacGregor. Monitoring Batch Processes Using Multiway Principal Component Analysis. AIChE J., 40(8):1361–1375, August 1994.

[10] Joanne Drinan. Water and Wastewater Treatment: A Guide for the Nonengineering Professional. Technomic Publishing Co. Inc., illustrated edition, 2001.

[11] European Community. Commission Directive 98/15/EC Amending Council Directive 91/271/EEC Concerning Urban Waste Water Treatment. Official J. European Communities, L 67:29–30, March 7 1998.

[12] Frank R. Spellman. Handbook of Water and Wastewater Treatment Plant Operations. Lewis Publishers, 2003.

[13] K. A. Kosanovich, M. J. Piovoso, K. S. Dahl, J. F. MacGregor, and P. Nomikos. Multi-Way PCA Applied to an Industrial Batch Process. In Proc. Am. Control Conf., volume 2, pages 1294–1298, 1994.

[14] S. Mace and J. Mata-Alvarez. Utilization of SBR Technology for Wastewater Treatment: An Overview. Ind. Eng. Chem. Res., 41(23):5539–5553, 2002.

[15] J. Keller, K. Subramaniam, J. Gösswein, and P. F. Greenfield. Nutrient Removal from Industrial Wastewater Using Single Tank Sequencing Batch Reactors. Water Sci. Technol., 35(6):137–144, 1997.

[16] Wisamm S. Al-Rekabi, He Qiang, and Wei Wu Qiang. Review on Sequencing Batch Reactors. Pakistan Journal of Nutrition, 6(1):11–19, 2007.

[17] Douglas C. Montgomery. Introduction to Statistical Quality Control. Wiley, 3rd edition, 1996.

[18] Ali Cinar and Cenk Undey. Statistical Process and Controller Performance Monitoring. A Tutorial on Current Methods and Future Directions. Proceedings of the American Control Conference, 4:2625–2639, June 1999.

[19] John S. Oakland. Statistical Process Control. Butterworth-Heinemann, fifth edition, 2003.

[20] C. C. Aggarwal and P. S. Yu. Outlier Detection for High Dimensional Data. In SIGMOD Conference, 2001.

[21] J. F. MacGregor. Using On-Line Process Data to Improve Quality. Is there a Role for Statisticians? Are They Up for the Challenge? Int. Stat. Rev., 16(2):6–13, 1996.

[22] S. Bersimis, J. Panaretos, and S. Psarakis. Multivariate Statistical Process Control Charts and the Problem of Interpretation: A Short Overview and Some Applications in Industry. In Proceedings of the 7th Hellenic European Conference on Computer Mathematics and its Applications, 2005.

[23] Julia Doroshenko and Vale. Multivariate Control Charts for the Analysis of Process. In Modern Problems of Radio Engineering, Telecommunications and Computer Science, 2002, pages 136–137, 2002.

[24] H. Hotelling. Techniques of Statistical Analysis, chapter Multivariate Quality Control, pages 111–184. McGraw-Hill, 1947.

[25] Kuang-Han Chen, Duane S. Boning, and Roy E. Welsch. Multivariate Statistical Process Control and Signature Analysis Using Eigenfactor Detection Methods. In Proceedings of the 33rd Symposium on the Interface: Computing Science and Statistics, number 33, pages 271–291, 2002.

[26] Barry M. Wise, Neal B. Gallagher, Stephanie Watts Butler, Daniel D. White Jr., and Gabriel G. Barna. Development and Benchmarking of Multivariate Statistical Process Control Tool for a Semiconductor ETCH Process: Impact of Measurement Selection and Data Treatment on Sensitivity. In IFAC SafeProcess'97, pages 35–42, 1997.

[27] David M. Himes, Robert H. Storer, and Christos Georgakis. Determination of the Number of Principal Components for Disturbance Detection and Isolation. In Proceedings of the American Control Conference, 1994.

[28] Gilles Raîche, Martin Riopel, and Jean-Guy Blais. Non Graphical Solutions for the Cattell's Scree Test. In International Meeting of the Psychometric Society, 2006.

[29] Ruben Daniel Ledesma and Pedro Valero-Mora. Determining the Number of Factors to Retain in EFA: an easy-to-use computer program for carrying out Parallel Analysis. Practical Assessment, Research & Evaluation, 12(12), 2007.

[30] T. Kourti. Application of Latent Variable Methods to Process Control and Multivariate Statistical Process Control in Industry. Int. J. Adapt. Control, 19:213–246, 2005.

[31] T. Kourti and J. F. MacGregor. Multivariate SPC Methods for Process and Product Monitoring. J. Qual. Technol., 28(4):409–428, October 1996.

[32] J. A. Westerhuis, S. P. Gurden, and A. K. Smilde. Generalized Contribution Plots in Multivariate Statistical Process Monitoring. Chemom. Intell. Lab. Syst., 51:95–114, 2000.

[33] S. J. Qin. Statistical Process Monitoring: Basics and Beyond. J. Chemom., 17:480–502, 2003.

[34] J. Flores-Cerrillo and J. F. MacGregor. Multivariate Monitoring of Batch Processes Using Batch-to-Batch Information. AIChE J., 50(6):1219–1228, June 2004.

[35] Svante Wold, Nouna Kettaneh, Hakan Friden, and Andrea Holmberg. Modelling and Diagnostics of Batch Processes and Analogous Kinetic Experiments. Chemometrics and Intelligent Laboratory Systems, 44(1-2):331–340, 1998.

[36] H. J. Ramaker, E. N. M. van Sprang, S. P. Gurden, J. A. Westerhuis, and A. K. Smilde. Improved Monitoring of Batch Processes by Incorporating External Information. Journal of Process Control, 12(4):569–576, 2002.

[37] C. K. Yoo, K. Villez, I. Lee, C. Rosén, and P. A. Vanrolleghem. Multi-Model Statistical Process Monitoring and Diagnosis of a Sequencing Batch Reactor. Biotechnol. Bioeng., 96(4):687–701, March 2007.

[38] C. Rosén and G. Olsson. Disturbance Detection in Wastewater Treatment Plants. Water Science and Technology, 37(12):197–205, 1998.

[39] M. Ruiz, J. Colomer, J. Colprim, and J. Meléndez. Multivariate Statistical Process Control for Situation Assessment of a Sequencing Batch Reactor. In Control 2004, University of Bath, UK, page 11, September 2004.

[40] M. Ruiz, J. Colomer, and J. Meléndez. Combination of Statistical Process Control (SPC) Methods and Classification Strategies for Situation Assessment of Batch Process. Inteligencia Artificial, Revista Iberoamericana de IA, 10(29):99–107, 2006.

[41] Kunwar P. Singh, Amrita Malik, Dinesh Mohan, Sarita Sinha, and Vinod K. Singh. Chemometric Data Analysis of Pollutants in Wastewater - A Case Study. Analytica Chimica Acta, 532(1):15–25, 2005.

[42] S. Puig, M. Coma, H. Monclús, M. C. M. van Loosdrecht, J. Colprim, and M. D. Balaguer. Selection Between Alcohols and Volatile Fatty Acids as External Carbon Sources for EBPR. Water Research, 42(3):557–566, 2008.

[43] Young-Hak Lee, Don-Yong Lee, and Chonghun Han. RMBatch: Intelligent real-time monitoring and diagnosis system for batch processes. Computers & Chemical Engineering, 23(Supplement 1):S699–S702, 1999. European Symposium on Computer Aided Process Engineering, Proceedings of the European Symposium.

[44] K. Villez, M. Ruiz, G. Sin, J. Colomer, C. Rosén, and P. A. Vanrolleghem. Combining Multiway Principal Component Analysis (MPCA) and Clustering for Efficient Data Mining of Historical Data Sets of SBR Processes. Water Sci. Technol., 57(10):1659–1666, 2008.

[45] Yejin Kim, Hyeon Bae, Kyungmin Poo, Jongrack Kim, Taesup Moon, Sungshin Kim, and Changwon Kim. Soft Sensor Using PNN Model and Rule Base for Wastewater Treatment Plant. In Jun Wang, Zhang Yi, Jacek Zurada, Bao-Liang Lu, and Hujun Yin, editors, Advances in Neural Networks - ISNN 2006, volume 3973 of Lecture Notes in Computer Science, pages 1261–1269. Springer Berlin / Heidelberg, 2006.

[46] Sung Hun Hong, Min Woo Lee, Dae Sung Lee, and Jong Moon Park. Monitoring of sequencing batch reactor for nitrogen and phosphorus removal using neural networks. Biochemical Engineering Journal, 35(3):365–370, 2007.

[47] Liping Fan and Yang Xu. A PCA-Combined Neural Network Software Sensor for SBR Processes. In Derong Liu, Shumin Fei, Zengguang Hou, Huaguang Zhang, and Changyin Sun, editors, Advances in Neural Networks - ISNN 2007, volume 4492 of Lecture Notes in Computer Science, pages 1042–1047. Springer Berlin / Heidelberg, 2007.

[48] E. S. Page. Continuous Inspection Schemes. Biometrika, 41(1/2):100–115, June 1954.

[49] David Leung and Jose Romagnoli. An integration mechanism for multivariate knowledge-based fault diagnosis. Journal of Process Control, 12(1):15–26, 2002.

[50] Fu Xiao, Shengwei Wang, Xinhua Xu, and Gaoming Ge. An isolation enhanced PCA method with expert-based multivariate decoupling for sensor FDD in air-conditioning systems. Applied Thermal Engineering, 29(4):712–722, 2009.

[51] P. Clark and T. Niblett. Induction in Noisy Domains. In I. Bratko and N. Lavrač, editors, Progress in Machine Learning (Proceedings of the 2nd European Working Session on Learning), pages 11–30, Wilmslow, UK, 1987. Sigma Press.

[52] Eibe Frank and Ian H. Witten. Generating Accurate Rule Sets Without Global Optimization. In Proceedings of the 15th International Conference on Machine Learning, pages 144–151, 1998.

[53] David W. Aha, Dennis Kibler, and Marc K. Albert. Instance-Based Learning Algorithms. Machine Learning, 6(1):37–66, January 1991.

[54] John G. Cleary and Leonard E. Trigg. K*: An Instance-based Learner Using an Entropic Distance Measure. In 12th International Conference on Machine Learning, pages 108–114, 1995.
