The Conjunction of Process and Spectral Data for Enhanced Fault Detection
Elaine Martin
Centre for Process Analytics and Control Technology
University of Newcastle, England
www.ncl.ac.uk/cpact/
Centre for Process Analytics and Control Technology (CPACT), University of Newcastle, UK
Motivation
It is conjectured that some factors relating specifically to a process cannot be identified from the spectroscopic measurements but could be described by the process data, or vice versa.
Consequently, one way to enhance prediction accuracy, process performance and fault detection is to integrate process and spectral data.
The aim of the subsequent studies was to investigate the combined power of spectral and process data.
Overview
Process Modelling
Fermentation Process
• Spectral Data
• Spectral and Process Data
Process Monitoring and Fault Detection
Polymer-resin Manufacturing
• Process Data
• Process and Spectral Data
Challenges in the Monitoring of Fermentation Processes
Fermentation is a process in which micro-organisms convert chemical species to products of higher value.
On-line information on the progression of the process is not easily obtained.
Near-infrared (NIR) and mid-infrared (MIR) spectroscopy have been applied to the monitoring of fermentation processes.
The successful implementation of these spectroscopic approaches necessitates the application of appropriate multivariate data analysis techniques, such as partial least squares (PLS).
Experimental Data Set
The industrial pilot-plant scale Streptomyces fermentation process involves two stages:
• Seed stage
• Final stage
The seed stage generates the biomass. The starting ingredients include carbohydrate, soya protein, vegetable oil and trace elements in water.
The biomass is transferred to the final stage for the production of the desired product.
The final stage is a fed-batch process lasting approximately 140 hours.
NIR measurements were collected for the final stage of the process.
Spectra Data Acquisition
The NIR spectral data were recorded using a Zeiss Corona 45 NIR spectrometer.
Description of the Data Set
Final stage data from 7 standard batches and 7 Design of Experiment batches form the basis of the subsequent analysis.
Data collected included on-line process data, off-line data, biochemical and NIR measurements.
Methodological Summary
Pre-processing of the spectral data set: first derivatives, splining
Segmented wavelength region selection
Global modelling: linear PLS, neural network PLS, quadratic PLS
Local modelling: linear PLS, neural network PLS, quadratic PLS
Bagging of the models: linear partial least squares, averaging
Data Pre-processing
The NIR data (Zeiss Corona NIR) were recorded every 15 minutes and the first derivatives were taken.
Since only ten values of titre were recorded, a spline was fitted to the data.
The splined titre values were aligned to the 550 spectra recorded for each batch.
The range utilised for both the spectral and quality data was 43.75 to 125 log hours.
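The pre-processing steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the first derivative is taken as a simple forward difference along the wavelength axis and the titre is interpolated with a natural cubic spline, since the slides do not say which derivative filter or spline type was used. All function names are hypothetical. The helper `hours_to_sample` only encodes the 15-minute spectral sampling interval, under which 43.75 and 125 log hours correspond to sample points 175 and 500.

```python
from bisect import bisect_right


def first_derivative(spectrum):
    """Forward-difference first derivative along the wavelength axis.

    The result has one value fewer than the input spectrum.
    """
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing and contain at least two points; the
    second derivative is zero at both ends ("natural" boundary conditions).
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots
    if n > 1:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):  # forward elimination
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):  # back substitution
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def evaluate(x):
        # Interval containing x; points outside the knots use the end pieces.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t0, t1 = x - xs[i], xs[i + 1] - x
        return (m[i] * t1 ** 3 / (6 * h[i]) + m[i + 1] * t0 ** 3 / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * t1
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * t0)

    return evaluate


def hours_to_sample(log_hours, minutes_per_spectrum=15):
    """Spectral sample index for a time in log hours (one spectrum per 15 min)."""
    return round(log_hours * 60 / minutes_per_spectrum)
```

With the ten titre measurements passed in as `xs` (log hours) and `ys` (titre), evaluating the returned function at `k / 4` for `k` from 175 to 500 would place one titre value on each spectral sample in the modelling range.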
Data Pre-processing
[Figure: real and splined titre values for batch 88, plotted against time points (0 to 550) and against log hours (LH).]
NIR Data and First Derivatives
[Figure: NIR data and first derivatives for stacked batches 2088 to 2100; log(1/R) against wavelength (1000 to 1600 nm).]
Spectral Window Selection Algorithm
[Flowchart of the SWS algorithm]
• Select training and validation batches.
• Mean centre the spectral data and take derivatives.
• Generate random window centres and widths.
• Build the model ‘input’ matrix, eliminating common data; generate a PLS model and calculate the RMS errors.
• Generate random changes to the centres and widths.
• Apply the random changes to the current centres and widths.
• Build a new input matrix, generate the model and calculate the RMS errors.
• Has the RMS error on the training data decreased? (Y/N)
• Has the number of iterations been exceeded, and are there more models to build? (Y/N)
• Present the final bagged model.
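One reading of the "build model input matrix eliminating common data" step is that the current windows are merged, so a wavelength covered by two overlapping windows enters the model only once. The sketch below assumes that interpretation, with each window given as a hypothetical `(centre, width)` pair in wavelength-index units; `window_columns`, `mean_centre` and `build_input_matrix` are illustrative names, not the authors' code. The PLS fit and RMS calculation that follow in the flowchart are left out.

```python
def window_columns(windows, n_wavelengths):
    """Sorted, de-duplicated wavelength indices covered by (centre, width) windows.

    A window spans round(centre - width/2) .. round(centre + width/2) inclusive
    (Python's round() sends exact halves to the even integer), clipped to
    0 .. n_wavelengths - 1. Overlapping windows share columns, so each
    wavelength appears only once ("eliminating common data").
    """
    cols = set()
    for centre, width in windows:
        lo = max(0, round(centre - width / 2))
        hi = min(n_wavelengths - 1, round(centre + width / 2))
        cols.update(range(lo, hi + 1))
    return sorted(cols)


def mean_centre(matrix):
    """Subtract each column's mean (taken over the rows, i.e. the samples)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - mu for x, mu in zip(row, means)] for row in matrix], means


def build_input_matrix(spectra, windows):
    """Keep only the selected wavelength columns of each spectrum (rows = samples)."""
    cols = window_columns(windows, len(spectra[0]))
    return [[row[j] for j in cols] for row in spectra], cols
```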
Spectral Window Selection Algorithm
[Figure: a spectrum plotted against wavelength, with a selected window marked by its centre and width.]
• Generate a random increment in the centre and width.
• Update the centre and width.
• Has the prediction error decreased? If yes, the step was in the right direction: take another step with the same centre and width increment.
• If the step went too far and the prediction error increased, go back to where we were.
• Generate a new increment in the centre and width and continue the search.
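The random step search described above can be sketched as a simple hill climb. The caller supplies `rms_error(windows)`, which in the setting of the slides would build the input matrix, fit a PLS model and return the training RMS error; the step size, clipping rules, iteration count and all function names are assumptions made for illustration. An improving increment is reused, as on the slide; a worsening one is discarded and a new increment is drawn.

```python
import random


def sws_search(rms_error, n_windows, n_wavelengths, n_iterations=500,
               max_step=5.0, seed=0):
    """Random-step search over window centres and widths (hypothetical sketch).

    rms_error(windows) -> float is supplied by the caller; lower is better.
    Returns the final list of (centre, width) windows and its error.
    """
    rng = random.Random(seed)

    def clip(windows):
        # Keep centres inside the spectrum and widths between 1 and the full range.
        return [(min(max(c, 0.0), n_wavelengths - 1.0),
                 min(max(w, 1.0), float(n_wavelengths)))
                for c, w in windows]

    def new_increment():
        return [(rng.uniform(-max_step, max_step), rng.uniform(-max_step, max_step))
                for _ in range(n_windows)]

    # Random initial centres and widths.
    windows = clip([(rng.uniform(0, n_wavelengths - 1), rng.uniform(1, n_wavelengths / 4))
                    for _ in range(n_windows)])
    best = rms_error(windows)
    step = new_increment()
    for _ in range(n_iterations):
        trial = clip([(c + dc, w + dw) for (c, w), (dc, dw) in zip(windows, step)])
        err = rms_error(trial)
        if err < best:
            # A step in the right direction: keep it and reuse the same increment.
            windows, best = trial, err
        else:
            # Step too far: stay where we were and draw a new increment.
            step = new_increment()
    return windows, best
```

Running the search with different values of `seed` is one way to obtain several non-unique window models, such as the 30 models that are bagged in the next step.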
Benefits of the SWS Algorithm
Rather than being restricted to the extremes of a single wavelength or the full set, SWS can select multiple sub-sets of the full set.
Finds the ‘best’ possible models for the product concentration and the biochemical components.
Finds the ‘best’ wavelength range from which these models can be built.
Bagging
SWS does not provide a unique model.
To obtain a more robust model, bagging is implemented.
The ‘resample and combine’ method, or ‘bagging’, helps improve the robustness of models by combining the predictions from different models.
Bagging of Models
30 models were generated by changing the initial random seed of the wavelength selection algorithm.
Bagging was applied to the 30 models:
The average value was calculated from the outputs of the 30 models.
A PLS model was fitted between the model outputs (X) and the real values (Y) to give a weighted average.
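The two combination rules above, averaging and a PLS fit from the individual model outputs to the real values, can be sketched as follows. This assumes a PLS1 model with a single latent variable, which the slides do not specify, and the function names are hypothetical. With one latent variable the fitted model reduces to an intercept plus one coefficient per model, which plays the role of the "weighted average".

```python
import math


def rms(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def average_bagging(predictions):
    """predictions[m][i] is model m's output for sample i; return the mean over models."""
    n_models = len(predictions)
    return [sum(col) / n_models for col in zip(*predictions)]


def fit_pls_bagging(predictions, y):
    """One-latent-variable PLS1 from model outputs (X) to real values (Y).

    Returns a function mapping new model outputs to weighted-average predictions.
    """
    X = [list(row) for row in zip(*predictions)]  # rows = samples, columns = models
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[v - mu for v, mu in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w proportional to X^T y, normalised to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w and inner regression coefficient q = t^T y / t^T t.
    t = [sum(r * wj for r, wj in zip(row, w)) for row in Xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [q * wj for wj in w]  # effective weight on each model's output

    def predict(new_predictions):
        return [y_mean + sum(c * (v - mu) for c, v, mu in zip(coef, row, x_mean))
                for row in zip(*new_predictions)]

    return predict
```

`rms` gives the comparison used in the results, for example `rms(y_valid, average_bagging(preds_valid))` against the RMS error of each individual model.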
Global and Local Modelling
[Figure: the full data set of batch 133, to which global modelling is applied, and the first and second halves of the data set of batch 133, to which local modelling is applied.]
Local Modelling
[Figure: total sugar, free glucose and soluble phosphate for batch 2088 against log hours (LH).]
Two critical points, at 70 and 100 hours, were identified from plots of the biochemical data.
Local Modelling
[Figure: cTitre against log hours (LH) for batches MS2088, MS2090, MS2092, MS2094, MS2096, MS2098 and MS2100, divided into the first, second and third time intervals.]
Local Modelling Approach
Three time regions for both the spectra and the quality variable values (titre) were selected.
Samples up to 70 log hours, i.e. sample points 175 to 280.
From 70 to 100 log hours, i.e. sample points 280 to 400.
From 100 log hours to the end of the chosen window (125 log hours), i.e. sample points 400 to 500.
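Assigning each spectral sample to one of the three local models only needs the two critical points. The sketch below assumes that a sample lying exactly on a breakpoint (70 or 100 log hours) belongs to the later region, because the quoted sample ranges share their end points; the function names and the local models passed to `predict_local` are hypothetical.

```python
from bisect import bisect_right

BREAKPOINTS_LH = (70.0, 100.0)  # critical points identified from the biochemical data


def region_of(log_hours, breakpoints=BREAKPOINTS_LH):
    """Region 1, 2 or 3; a time exactly on a breakpoint goes to the later region."""
    return bisect_right(breakpoints, log_hours) + 1


def sample_to_log_hours(index, minutes_per_spectrum=15):
    """Spectra are recorded every 15 minutes, so sample 280 is 70 log hours."""
    return index * minutes_per_spectrum / 60


def split_by_region(sample_indices, breakpoints=BREAKPOINTS_LH):
    """Group sample indices into the local-model regions."""
    groups = {r: [] for r in range(1, len(breakpoints) + 2)}
    for idx in sample_indices:
        groups[region_of(sample_to_log_hours(idx), breakpoints)].append(idx)
    return groups


def predict_local(models, sample_index, x):
    """Dispatch x to the local model for the region containing sample_index.

    models maps region number -> callable (the local models are assumed).
    """
    return models[region_of(sample_to_log_hours(sample_index))](x)
```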
[Figure: Region 1. Titre values for batches 2088 to 2100 over sample points 175 to 280, and the stacked spectra, log(1/R) against wavelength, for the same samples.]
Local Modelling Approach
[Figure: Region 2 (sample points 280 to 400) and Region 3 (sample points 400 to 500). Titre values for batches 2088 to 2100 and the stacked spectra, log(1/R) against wavelength, for each range.]
Results: Time Interval 1
[Figure: titre values against samples for the training data and the validation data.]
Results: Time Interval 1
[Figure: RMS errors of the 30 individual models for the experimental (training) data and for the validation data, with the RMS error after PLS bagging.]
The RMS error of the training set for models 1, 7 and 29 is large.
The RMS error of the validation data set for models 1, 7 and 29 is small.
The RMS error for PLS bagging is smaller than the error of each individual model.
Linear PLS – Region 1 (Wavelength Selection)
[Figure: real titre values and predictions from PLS bagging and from averaging, against samples, for the training data set and the validation data set.]
Results: Time Interval 1
[Figure: frequency of appearances of each wavelength in the selected windows, alongside the stacked spectra for batches 2088 to 2100, samples 175 to 280.]
The wavelengths between 30 and 40 are selected most frequently.
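A frequency-of-appearances plot like the one above can be produced by counting, for every wavelength index, how many of the bagged models include it. This sketch assumes that each model's selection is stored as a list of inclusive `(lo, hi)` index ranges; the data structure and function names are illustrative.

```python
from collections import Counter


def appearance_frequency(selected_windows_per_model, n_wavelengths):
    """Count, for every wavelength index, how many models selected it.

    selected_windows_per_model: one list of inclusive (lo, hi) ranges per model.
    """
    counts = Counter()
    for windows in selected_windows_per_model:
        covered = set()
        for lo, hi in windows:
            covered.update(range(max(lo, 0), min(hi, n_wavelengths - 1) + 1))
        counts.update(covered)  # each model counts a wavelength at most once
    return [counts[j] for j in range(n_wavelengths)]


def most_frequent(freq, top=5):
    """Indices of the `top` most frequently selected wavelengths (ties: lower index first)."""
    return sorted(range(len(freq)), key=lambda j: (-freq[j], j))[:top]
```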
Neural Network PLS – Region 2 (Wavelength Selection)
[Figure: real titre values and predictions from PLS bagging and from averaging, against samples, for the training data set and the validation data set.]
Polynomial PLS – Region 3 (Wavelength Selection)
[Figure: real titre values and predictions from PLS bagging and from averaging, against samples, for the training data set and the validation data set.]
Local Modelling: Training Data Set
[Figure: real and predicted titre values for the first training batch. Global modelling predictions using all the wavelengths, and local modelling predictions (taking the average) for time intervals 1, 2 and 3.]
Local Modelling: Validation Data Set
[Figure: real and predicted titre values for the first validation batch. Global model using all the wavelengths, and local model (taking the average) across the 1st, 2nd and 3rd time intervals.]
Genetic Algorithm Results
[Figure: frequency of appearances of each wavelength for SWS (left) and for genetic algorithms (right).]
Genetic algorithms make it possible to select individual wavelengths, but the resulting models may not predict future samples well.
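For comparison, a generic genetic algorithm over binary wavelength masks is sketched below: each bit switches one wavelength on or off, which is why a GA can select individual wavelengths rather than windows. The population size, tournament selection, single-point crossover, mutation rate and elitism are all assumptions, not the configuration used in the study, and `error(mask)` is a caller-supplied stand-in for the training RMS of a PLS model built on the selected wavelengths.

```python
import random


def ga_select(error, n_wavelengths, pop_size=30, generations=40,
              p_mutation=0.02, seed=0):
    """Minimal genetic algorithm over binary wavelength masks (illustrative sketch).

    error(mask) -> float, lower is better; mask[j] == 1 selects wavelength j.
    Requires n_wavelengths >= 2 for the crossover point.
    """
    rng = random.Random(seed)

    def repair(mask):
        # A model needs at least one wavelength.
        if not any(mask):
            mask[rng.randrange(n_wavelengths)] = 1
        return mask

    def tournament(scored):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    population = [repair([rng.randint(0, 1) for _ in range(n_wavelengths)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((error(m), m) for m in population), key=lambda s: s[0])
        next_pop = [scored[0][1][:]]  # elitism: keep the best mask unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_wavelengths)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < p_mutation else g for g in child]
            next_pop.append(repair(child))
        population = next_pop
    best = min(population, key=error)
    return best, error(best)
```

Because every bit is free, nothing ties neighbouring wavelengths together, which is consistent with the slide's caution that individually selected wavelengths may not predict future samples well.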
GA Results – Region 2
[Figure: real and predicted titre values for the validation data set, taking the average: SWS averaging (left) and GA averaging (right).]
RMS of validation: SWS 0.048, GAs 0.069.
Genetic Algorithm Results
RMS errors by time interval (PLS = PLS bagging, Avg = average bagging)

TRAINING
Method                Time Interval 1   Time Interval 2   Time Interval 3
                      PLS      Avg      PLS      Avg      PLS      Avg
SWS with Linear PLS   0.018    0.034    0.025    0.034    0.039    0.060
GAs with Linear PLS   0.018    0.018    0.023    0.024    0.037    0.038

VALIDATION
Method                Time Interval 1   Time Interval 2   Time Interval 3
                      PLS      Avg      PLS      Avg      PLS      Avg
SWS with Linear PLS   0.045    0.049    0.059    0.048    0.095    0.058
GAs with Linear PLS   0.045    0.043    0.067    0.069    0.177    0.139
Summary of Results
GAs produced slightly better predictions for the training data set, which reflects overfitting.
For the validation data, SWS combined with bagging for local modelling gave better results than the GA combined with bagging.
Local modelling gives better results than global modelling.
SWS with bagging gives better results than the purported ‘one-shot wonder’ models.
Design of Experiment Data
Integration of Process and Spectral Data
Conjunction of Process and Spectral Data
In the later stages of the fermentation, the calibration models showed larger errors, with offsets present.
During this time, significant changes in the fermentation broth concentrations occur.
The offset can potentially be modelled by utilising other process information such as off-gas measurements.
Data Set and Aim
The aim is to infer product concentration and the biochemical components from the spectral data.
The analysis uses the off-line, biochemical and NIR data for the design of experiment batches.
Changing conditions in the experimental design:
• Temperature (°C)
• pH
• Sugar feed (g h⁻¹)
• Oil feed (%)
Conjunction of Process and Spectral Data
[Diagram, stage 1: the spectral data enter a MODEL; its predictions are subtracted from the measured biochemical concentration (Σ) to give the calibration spectral residuals.]
[Diagram, stage 2: the process data enter a MODEL; its predictions are subtracted from the calibration spectral residuals (Σ) to give the innovations.]
First Step: Calculation of the calibration spectral residuals.
Second Step: Modelling of the calibration spectral residuals from the process data and the generation of the innovations.
Final Step: Prediction of the product concentration, by summing (Σ) the biochemical concentration predictions by the spectra and the residual predictions by the process data to give the final product concentrations.
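The three steps can be sketched as follows. The slides do not say which model maps the process data to the spectral residuals, so ordinary least squares with an intercept is used here as a stand-in (a PLS model would be a natural alternative); the function names are hypothetical, and the process variables could be, for example, CER, CO2 Total, pH, OUR and temperature.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_residual_model(process_data, residuals):
    """Least squares: residual ~ b0 + sum_k b_k * process_variable_k."""
    Z = [[1.0] + list(row) for row in process_data]  # prepend intercept column
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * r for z, r in zip(Z, residuals)) for i in range(p)]
    beta = solve(ZtZ, Zty)  # normal equations
    return lambda row: beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def combine(spectral_pred, measured, process_data):
    """First and second steps, applied to the calibration set."""
    # First step: calibration spectral residuals (measured minus spectral prediction).
    residuals = [m - s for m, s in zip(measured, spectral_pred)]
    # Second step: model the residuals from the process data; what remains
    # unexplained are the innovations.
    residual_model = fit_residual_model(process_data, residuals)
    innovations = [r - residual_model(row) for r, row in zip(residuals, process_data)]
    return residual_model, innovations


def final_prediction(residual_model, spectral_pred, process_data):
    """Final step: spectral prediction plus the residual predicted from process data."""
    return [s + residual_model(row) for s, row in zip(spectral_pred, process_data)]
```

At prediction time, `final_prediction` adds the process-data estimate of the residual to the spectral prediction, which is how an offset that the spectra alone miss could be corrected.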
Conjunction of Process and Spectral Data
Five variables were considered to be the most important for the prediction of product concentration:
• CER (carbon dioxide evolution rate)
• CO2 Total
• pH
• OUR (oxygen uptake rate)
• Temperature
[Figure: time series plots of pH, CER, CO2 Total and OUR.]
Conjunction of Process and Spectral Data
[Figure: predictions and residuals. Predicted values and residuals for the training data set and for the validation data set.]
Conjunction of Process and Spectral Data
Final predictions of the product
[Figure: real values, predicted values and final predicted values for the validation data set, and the new residuals for the validation data set after adding the process data.]
• The offset is reduced.
• The residuals exhibit less structure and reflect noise.
Conclusions
A Spectral Window Selection (SWS) algorithm has been proposed to select windows of wavelengths.
Multiple models are ‘bagged’ to produce a more robust model.
SWS produces better results than using the complete wavelength region.
Process data was combined with spectral data to eliminate offsets.
The wavelength selection-bagging approach in combination with the process data is now under investigation.
The results to date are promising.