Data-driven modelling: CASE STUDIES in RIVER and FLOOD MANAGEMENT and CONTROL
Dimitri P. Solomatine
UNESCO-IHE Institute for Water Education
Hydroinformatics Chair
Data-driven modelling and forecasting: our experience
replicating behavior of hydrodynamic/hydrological river model with the objective of using the ANN in model-based optimal control of a reservoir (Solomatine & Torres, 1996)
building ANN-based intelligent controller for real-time control of water levels in a polder (Lobbrecht & Solomatine, 1999)
modeling rainfall-runoff process with ANNs (Dibike, Solomatine & Abbott, 1999)
ANN in surge water level prediction for ship guidance
reconstructing stage-discharge relationship using ANN (Bhattacharya & Solomatine, 2000)
D.P. Solomatine. Data-driven modelling. Applications. 2
using M5 model trees to predict river discharge
using SVMs in prediction of water flows for flood management (Dibike, Velickov, Solomatine & Abbott, 2001)
using ANN and other ML methods in optimization of urban systems (Alfonso et al., 2012; Xu et al., 2015)
Process (physically-based) modelling of flow: river modelling context
Available data:
– rainfalls Rt, lateral inflows QL
– catchment and river physical properties (soil, geometry, roughnesses)
– initial and boundary conditions for flows Q0(x,t)
Inputs: QL(x,t), Qup(t), Q0(x,t), system properties
Output: flow Q(x,t)
Model:
$$\frac{\partial Q}{\partial x} + b\,\frac{\partial h}{\partial t} = 0$$
$$\frac{\partial Q}{\partial t} + \frac{\partial}{\partial x}\left(\frac{Q^2}{A}\right) + gA\,\frac{\partial h}{\partial x} + \frac{gA\,Q|Q|}{K^2} = 0$$
Q(x,t) = F(QL(x,t), Qup(t), Q0(x,t), system properties)
Questions:
– are the physical properties of the catchment known?
– is F good enough?
Using data-driven methods in rainfall-runoff modelling
Available data:
– rainfalls Rt
– runoffs (flows) Qt
Inputs: lagged rainfalls Rt, Rt-1, Rt-2, …
Output to predict: Qt+T
Model: Qt+T = F(Rt, Rt-1, …, Qt, Qt-1, …, Qt^up, Qt-1^up, …) (routing)
Questions:
– how to find the appropriate lags? (lags embody the physical properties of the catchment)
– how to build F?
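The data-driven formulation above amounts to turning the two time series into a table of lagged inputs and a target shifted T steps ahead. A minimal sketch of that step follows; the function name `make_lagged_dataset`, its parameters and the synthetic series are illustrative assumptions, not part of the original study.

```python
import numpy as np

def make_lagged_dataset(rain, flow, rain_lags, flow_lags, horizon):
    """Build X rows [R_t .. R_{t-rain_lags}, Q_t .. Q_{t-flow_lags}] and y = Q_{t+horizon}."""
    rain = np.asarray(rain, dtype=float)
    flow = np.asarray(flow, dtype=float)
    start = max(rain_lags, flow_lags)   # first t for which all lags exist
    stop = len(flow) - horizon          # last t (exclusive) with a known target
    rows, targets = [], []
    for t in range(start, stop):
        lagged_rain = [rain[t - k] for k in range(rain_lags + 1)]
        lagged_flow = [flow[t - k] for k in range(flow_lags + 1)]
        rows.append(lagged_rain + lagged_flow)
        targets.append(flow[t + horizon])
    return np.array(rows), np.array(targets)

# Synthetic series, only to show the shapes involved
rain = np.arange(10.0)
flow = 100.0 + np.arange(10.0)
X, y = make_lagged_dataset(rain, flow, rain_lags=2, flow_lags=1, horizon=3)
```

Any regression method (ANN, model tree, SVM) can then be trained on `X` and `y`; the choice of lags remains the modelling question raised on the slide.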
Case study SIEVE: flood management problem
D.P. Solomatine, K.N. Dulal. Model trees as an alternative to neural networks in rainfall-runoff modeling. Hydrological Sciences Journal, 48 (3), 2003, 399-411.
Italy, Tuscany, Arno river basin, Sieve subcatchment
Schematisation
mountainous catchment in Southern Europe
area of 822 sq. km
SIEVE: data
three months of hourly discharge (Q), precipitation (R) and evapotranspiration (E) data available (Dec. 1959 to Feb. 1960, 2160 data points)
for this experiment, data was represented as hourly data on effective rainfall RE and runoff Q
1854 examples used for training, 300 for verification
the problem: predict flow Q 1, 3 and 6 hours ahead
SIEVE: visualization of data
[Figure: FLOW1, effective rainfall [mm] and discharge [m3/s] versus time [hrs]]
Analysis of relationships, determining lags, choice of variables
Covariance: $\sigma_{ij} = E\big[(x_i - E(x_i))(x_j - E(x_j))\big]$
Correlation coefficient: $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$
Average mutual information
[Figure: Rainfall event and the resulting flow; Q [m3/s] and R [mm] versus t [hrs]]
[Figure: Correlation between Qt+1 and REt-i; correlation coeff. versus lag in REt-i]
[Figure: AMI between Q(t+1) and R(t-tau); AMI versus time lag tau]
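The lag analysis described on this slide can be sketched as follows: compute the correlation coefficient between the future flow and the rainfall at increasing lags, and estimate the average mutual information (AMI) from a two-dimensional histogram. The function names, the histogram-based AMI estimator, the bin count and the synthetic data are assumptions made for illustration; the slides do not state how AMI was computed in the study.

```python
import numpy as np

def lagged_correlation(x, y, lead, max_lag):
    """Correlation coefficient between y[t+lead] and x[t-i], for i = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.arange(max_lag, len(y) - lead)   # same time steps for every lag
    return np.array([np.corrcoef(x[t - i], y[t + lead])[0, 1]
                     for i in range(max_lag + 1)])

def average_mutual_information(a, b, bins=10):
    """Histogram estimate of AMI (in bits) between two equally long series."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

# Synthetic check: the "flow" is the "rainfall" delayed by three steps
rng = np.random.default_rng(0)
rain = rng.random(500)
flow = np.roll(rain, 3)                     # flow[t] = rain[t-3]
corr = lagged_correlation(rain, flow, lead=1, max_lag=6)
best_lag = int(np.argmax(corr))             # flow[t+1] = rain[t-2]
ami_same = average_mutual_information(rain[:-3], flow[3:])
ami_random = average_mutual_information(rain, rng.random(500))
```

Lags with high correlation or high AMI are candidates for model inputs; AMI also picks up nonlinear dependence that the correlation coefficient misses.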
SIEVE: variables and posing the problem
after correlation analysis, a number of models were tried, e.g. for Qt+1:
1. Qt+1 = f (REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1)
2. Qt+1 = f [MA(REt, REt-1, REt-2), MA(REt-2, REt-3, REt-4), REt-3, REt-5]
3. Qt+1 = f [MA(REt, REt-1, REt-2, REt-3, REt-4, REt-5), MA(REt-3, REt-6), REt-3, REt-5]
4. Qt+1 = f [MA(REt-4, REt-5), Qt-1]
5. Qt+1 = f (REt-4, REt-5)
6. Qt+1 = f [MA(REt-4, REt-5), Qt-2]
MA = moving average
models accepted:
Q (t+1) = f (REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1, Qt-2)
Q (t+3) = f (REt, REt-1, REt-2, REt-3, Qt, Qt-1)
Q (t+6) = f (REt, Qt)
numerical prediction is done by M5 model trees and ANNs
SIEVE: numerical prediction
ANN model:
– Multi-Layered Perceptron (MLP), three layers
– Optimal number of hidden nodes: Qt+1: 6, Qt+3: 5, Qt+6: 3
– Training algorithm: back-propagation with adaptive learning rate
MT model:
– Same sets of inputs, training and verification data points as for the ANN
SIEVE: M5 model tree details are comprehensible for a decision maker - predicting Q(t+3)
Sample output of model tree. Generated tree (reduced size):
Qt <= 51.2 :
| Qt <= 28.7 : LM1 (903/5.66%)
| Qt > 28.7 : LM2 (379/13.1%)
Qt > 51.2 : LM3 (572/66.7%)
Linear models:
LM1: Qt+3 = -0.0118 + 0.317REt + 0.124REt-1 + 0.0844REt-2 - 0.109REt-3 + 1.09Qt - 0.0826Qt-1
LM2: Qt+3 = -0.262 + 11.9REt + 0.182REt-1 + 8.9REt-2 - 0.198REt-3 + 3.66Qt - 2.67Qt-1
LM3: Qt+3 = 15.5 + 25.7REt + 7.59REt-1 - 0.0923REt-3 + 1.44Qt - 0.732Qt-1
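Because the tree is just two thresholds on Qt and three linear equations, it can be written out directly. The sketch below transcribes the splits and coefficients quoted on this slide; the function and argument names are illustrative, and the code is a reading aid rather than the authors' implementation.

```python
def predict_q_t_plus_3(re_t, re_t1, re_t2, re_t3, q_t, q_t1):
    """Q(t+3) from the reduced-size M5 model tree; re_tK is RE at t-K, q_t1 is Q at t-1."""
    if q_t <= 51.2:
        if q_t <= 28.7:
            # LM1: low flows
            return (-0.0118 + 0.317 * re_t + 0.124 * re_t1 + 0.0844 * re_t2
                    - 0.109 * re_t3 + 1.09 * q_t - 0.0826 * q_t1)
        # LM2: intermediate flows
        return (-0.262 + 11.9 * re_t + 0.182 * re_t1 + 8.9 * re_t2
                - 0.198 * re_t3 + 3.66 * q_t - 2.67 * q_t1)
    # LM3: flows above 51.2 m3/s (no RE(t-2) term in this leaf)
    return (15.5 + 25.7 * re_t + 7.59 * re_t1 - 0.0923 * re_t3
            + 1.44 * q_t - 0.732 * q_t1)

# Dry conditions with a steady flow of 10 m3/s fall into leaf LM1
example = predict_q_t_plus_3(0.0, 0.0, 0.0, 0.0, 10.0, 10.0)
```

This is what makes the model readable for a decision maker: each prediction can be traced to one leaf and one equation.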
SIEVE: Predicting Q(t+1)
Q (t+1) = f (REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1, Qt-2)
ANN verification: RMSE=5.175, NRMSE=0.106, COE=0.9886
MT verification: RMSE=3.6123, NRMSE=0.074, COE=0.9944
[Figure: Prediction of Qt+1, verification performance; observed and modelled (ANN, MT) Q [m3/s] versus t [hrs]]
SIEVE: Predicting Q(t+3)
Q (t+3) = f (REt, REt-1, REt-2, REt-3, Qt, Qt-1)
ANN verification: RMSE=11.353, NRMSE=0.234, COE=0.9452
MT verification: RMSE=12.548, NRMSE=0.258, COE=0.9331
[Figure: Prediction of Qt+3, verification performance; observed and modelled (ANN, MT) Q [m3/s] versus t [hrs]]
SIEVE: Predicting Q(t+6)
Q (t+6) = f (REt, Qt)
ANN verification: RMSE=19.402, NRMSE=0.399, COE=0.8401
MT verification: RMSE=21.547, NRMSE=0.443, COE=0.8028
[Figure: Prediction of Qt+6, verification performance; observed and modelled (ANN, MT) Q [m3/s] versus t [hrs]]
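The slides report RMSE, NRMSE and COE without defining them. The sketch below assumes the common definitions: NRMSE is RMSE divided by the standard deviation of the observations, and COE is the Nash-Sutcliffe coefficient of efficiency. Under these assumptions COE = 1 - NRMSE^2, which agrees closely with the reported pairs (for example 1 - 0.106^2 ≈ 0.989 against COE = 0.9886), but the exact formulas used in the study are not stated. The sample values are made up.

```python
import numpy as np

def rmse(observed, modelled):
    """Root mean squared error."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    return float(np.sqrt(np.mean((modelled - observed) ** 2)))

def nrmse(observed, modelled):
    """RMSE normalised by the (population) standard deviation of the observations."""
    return rmse(observed, modelled) / float(np.std(observed))

def coe(observed, modelled):
    """Coefficient of efficiency (Nash-Sutcliffe): 1 - SSE / variance sum of observations."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    sse = np.sum((observed - modelled) ** 2)
    return float(1.0 - sse / np.sum((observed - observed.mean()) ** 2))

# Made-up values, only to exercise the functions
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.1, 1.9, 3.2, 3.8, 5.0]
scores = (rmse(obs, sim), nrmse(obs, sim), coe(obs, sim))
```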
Comparison of various methods used in Sieve case study
Method                                Qt+1 training  Qt+1 test  Qt+3 training  Qt+3 test  Qt+6 training  Qt+6 test
ANN                                   5.826          5.175      13.549         11.353     28.970         19.402
M5 model tree                         4.550          3.612      14.384         12.548     27.069         21.547
M5 + AdaBoost.RT                      3.488          4.132      9.0415         14.042     22.493         24.078
M5 + AdaBoost.RT + tuned on test set  3.789          3.754      12.749         11.551     27.958         19.468
SVM 3.212 10.867 20.201
Locally weighted regression 9.097 10.666 12.556 12.703
k-nearest neighb. 1.506 11.114 17.57 13.703
Composite: Model tree + 9-NN 4.674 3.350 13.296 14.593
Posing the classification problem for flood management
Classification problem: classification output is a class L, M or H:
– Low flow: up to 50 m3/s
– Medium flow: up to 350 m3/s
– High flow: up to 750 m3/s
[Figure: past records plotted against Rainfall (t-2), Rainfall (t-3) and Flow Q(t), grouped into the classes to be identified (Low, Medium and High flows Q(t+1)); a new record is added: to which class does it belong?]
SIEVE: the resulting decision tree (12 leaves) that classifies Q(t+3) into Low, Medium, High (error=6%)
Qt <= 51.45
| REt-1 <= 0.6686: L (815.0/10.0)
| REt-1 > 0.6686
| | Qt <= 25.59: L (24.0)
| | Qt > 25.59: M (24.0/7.0)
Qt > 51.45
| REt-1 <= 2.3955
| | Qt <= 59.04
| | | REt <= 0.0255
| | | | Qt <= 52.67: L (5.0)
| | | | Qt > 52.67: M (7.0/1.0)
| | | REt > 0.0255: M (63.0/4.0)
| | Qt > 59.04
| | | Qt-1 <= 348.55: M (271.0)
| | | Qt-1 > 348.55
| | | | Qt <= 630.2: M (7.0)
| | | | Qt > 630.2: H (3.0)
| REt-1 > 2.3955
| | Qt <= 247.68
| | | REt <= 3.3031: M (3.0)
| | | REt > 3.3031: H (7.0/3.0)
| | Qt > 247.68: H (9.0)
Example of a rule read from the tree:
IF Flow(t) <= 51.45 and Rainfall(t-1) > 0.67 and Flow(t) < 25.6 THEN: Low flow
SIEVE: the resulting pruned tree (4 leaves only) that does the same (error=6.7%)
Qt <= 50.85: Low (1282.0/17.0)
Qt > 50.85
| Qt <= 214.28: Medium (511.0/5.0)
| Qt > 214.28
| | Qt <= 247.68: Medium (15.0/7.0)
| | Qt > 247.68: High (46.0)
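The pruned tree uses only the current flow Qt, so it reduces to a few nested comparisons. The sketch below transcribes its thresholds; the function name is illustrative.

```python
def classify_flow_t_plus_3(q_t):
    """Class of Q(t+3) according to the pruned 4-leaf decision tree on this slide."""
    if q_t <= 50.85:
        return "Low"
    if q_t <= 214.28:
        return "Medium"
    if q_t <= 247.68:
        return "Medium"   # the less certain Medium leaf (15.0/7.0)
    return "High"

classes = [classify_flow_t_plus_3(q) for q in (30.0, 100.0, 230.0, 300.0)]
```

The two Medium leaves give the same answer; they are kept separate here only to mirror the tree, where the second one has a much higher share of misclassified training records.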
SIEVE: predicting class of flow Qt+3
[Figure: Classification of Qt+3 using the pruned decision tree; observed and predicted class (Low, Med., High) and flow [m3/s] versus t [hrs], 10/02/70, 04:00 to 22/02/70, 16:00]
SIEVE: performance of 3 classification methods: decision tree, Bayesian and K-nearest neighbor
Incorrectly classified instances, %
Method                      Train.  Test
Decision Tree (unpruned)    1.02    5
Decision Tree (pruned)      2.10    6.67
Naïve Bayes                 6.09    10.67
K-Nearest Neighbor (k = 3)  1.83    7.33
Case study HUAIHE: rainfall-runoff modelling in flood management of Huai River by M5 model trees and ANNs
D.P. Solomatine, Y. Xue. M5 model trees compared to neural networks: application to flood forecasting in the upper reach of the Huai River in China. ASCE Journal of Hydrologic Engineering, 9(6), 2004, pp. 491-501.
Huai River (Huaihe) basin
[Map: Huai River basin in China, showing the project area, rivers, lakes and reservoirs, provincial boundaries, provincial capitals and main cities]
Huaihe: Rainfall-runoff modelling, schematisation of the upper reach
[Figure: schematisation of the upper reach down to Xixian, showing the discharges QX, QC, QZ, QD and QN and the area-averaged rainfall]
Huaihe: data
Based on the physical properties of the catchment
Inputs:
– Known past rainfalls (Pat, … , Pat-n1, PaMovkt, … , PaMovkt-n2)
– Known past runoffs (flows) (QXt,…, QXt-3, QCt,..., QCt-4, QZt,…, QZt-5)
Output:
– discharge (flow) one day ahead, QXt+1
Which inputs to use? To be determined by the input-output correlation analysis and data preprocessing
Huai River: correlation analysis and average mutual information used as the basis for selecting input variables
[Figure: correlation r of the related input variables (QC, QZ, QX, Pa, PaMov4) with QXt+1 for full-year data, versus time lag (days)]
Huaihe: data preparation and selection of variables
Data preparation:
– 20 years of training data (daily rainfall and runoff)
– filtering data (only flood events are considered)
– selection of input variables (analysis of correlation, average mutual information)
INPUT VARIABLES:
No.  Var        Description                          Correl.coeff  Time lags
1    Pat        Area average rainfall at time t      0.35          1
2    Pat-1      Area average rainfall at time t-1    0.65          2
3    Pamov2t    2-days moving average of Pa at t     0.72          1
4    Pamov2t-1  2-days moving average of Pa at t-1   0.48          2
5    QCt        Upper stream discharge at t          0.82          1
6    QCt-1      Upper stream discharge at t-1        0.56          2
7    QXt        Discharge at predicted station at t  0.77          1
Huaihe: Model trees
nodes are IF-THEN-ELSE conditions on attribute values
leaves are linear regression models
[Figure: example model tree with Yes/No splits on Discharge (t) < 154, rainfallMov(t) <= 4.5, rainfallMov(t) <= 18.5 and rainfallMov(t-1) <= 13.5, and leaves of the form Discharge (t+1) = Regression model 1 ... 5; next to it, the input space (X1, X2) partitioned into regions served by Models 1 to 6, with output Y]
Huaihe: full year data, the resulting model tree (1)
QXt <= 154 :
| PaMov2t <= 4.5 : LM1 (1499/4.86%)
| PaMov2t > 4.5 :
| | PaMov2t <= 18.5 : LM2 (315/15.9%)
| | PaMov2t > 18.5 : LM3 (91/86.9%)
QXt > 154 :
| PaMov2t-1 <= 13.5 :
| | PaMov2t <= 4.5 : LM4 (377/15.9%)
| | PaMov2t > 4.5 : LM5 (109/89.7%)
| PaMov2t-1 > 13.5 :
| | PaMov2t <= 26.5 : LM6 (135/73.1%)
| | PaMov2t > 26.5 : LM7 (49/270%)
Huaihe: full year data, the linear models on leaves
LM1: QXt+1 = 2.28 + 0.714PaMov2t-1 - 0.21PaMov2t + 1.02Pat-1 + 0.193Pat
- 0.0085QCt-1 + 0.336QCt + 0.771QXt
LM2: QXt+1 = -24.4 - 0.0481PaMov2t-1 - 4.96PaMov2t + 3.91Pat-1 + 4.51Pat
- 0.363QCt-1 + 0.712QCt + 1.05QXt
LM3: QXt+1 = -183 + 10.3PaMov2t-1 + 8.37PaMov2t - 5.32Pat-1 + 1.49Pat
- 0.0193QCt-1 + 0.106QCt + 2.16QXt
LM4: QXt+1 = 47.3 + 1.06PaMov2t-1 - 2.05PaMov2t + 1.91Pat-1 + 4.01Pat
- 0.3QCt-1 + 1.11QCt + 0.383QXt
LM5: QXt+1 = -151 - 0.277PaMov2t-1 - 37.8PaMov2t + 31.1Pat-1 + 30.3Pat
- 0.672QCt-1 + 0.746QCt + 0.842QXt
LM6: QXt+1 = 138 - 5.95PaMov2t-1 - 39.5PaMov2t + 29.6Pat-1 + 35.4Pat
- 0.303QCt-1 + 0.836QCt + 0.461QXt
LM7: QXt+1 = -131 - 27.2PaMov2t-1 + 51.9PaMov2t + 0.125Pat-1 - 5.29Pat
- 0.0941QCt-1 + 0.557QCt + 0.754QXt
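The tree and its seven linear models can be combined into a single predictor. The sketch below stores the coefficients quoted above in a table and selects the leaf with the split conditions of the full-year tree; the names `LEAF_MODELS`, `select_leaf` and `predict_qx_next` are illustrative, and the code is a transcription of the slides rather than the authors' software.

```python
# Coefficient order: intercept, PaMov2(t-1), PaMov2(t), Pa(t-1), Pa(t), QC(t-1), QC(t), QX(t)
LEAF_MODELS = {
    "LM1": (2.28, 0.714, -0.21, 1.02, 0.193, -0.0085, 0.336, 0.771),
    "LM2": (-24.4, -0.0481, -4.96, 3.91, 4.51, -0.363, 0.712, 1.05),
    "LM3": (-183.0, 10.3, 8.37, -5.32, 1.49, -0.0193, 0.106, 2.16),
    "LM4": (47.3, 1.06, -2.05, 1.91, 4.01, -0.3, 1.11, 0.383),
    "LM5": (-151.0, -0.277, -37.8, 31.1, 30.3, -0.672, 0.746, 0.842),
    "LM6": (138.0, -5.95, -39.5, 29.6, 35.4, -0.303, 0.836, 0.461),
    "LM7": (-131.0, -27.2, 51.9, 0.125, -5.29, -0.0941, 0.557, 0.754),
}

def select_leaf(pamov2_t1, pamov2_t, qx_t):
    """Follow the splits of the full-year model tree down to a leaf name."""
    if qx_t <= 154:
        if pamov2_t <= 4.5:
            return "LM1"
        return "LM2" if pamov2_t <= 18.5 else "LM3"
    if pamov2_t1 <= 13.5:
        return "LM4" if pamov2_t <= 4.5 else "LM5"
    return "LM6" if pamov2_t <= 26.5 else "LM7"

def predict_qx_next(pamov2_t1, pamov2_t, pa_t1, pa_t, qc_t1, qc_t, qx_t):
    """QX(t+1): pick the leaf, then evaluate its linear model."""
    coeffs = LEAF_MODELS[select_leaf(pamov2_t1, pamov2_t, qx_t)]
    inputs = (1.0, pamov2_t1, pamov2_t, pa_t1, pa_t, qc_t1, qc_t, qx_t)
    return sum(c * x for c, x in zip(coeffs, inputs))

dry_low = predict_qx_next(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)     # leaf LM1
dry_high = predict_qx_next(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 200.0)  # leaf LM4
```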
Huaihe: full year data: resulting hydrograph (training, fragment)
[Figure: predicted (pre) and observed (obv) discharge for a fragment of the training period; caption: the max training error of model tree using flood season data]
Huaihe: full year data, resulting hydrograph (verification)
[Figure: predicted (pre) and observed (obv) discharge versus time, 95/5/1 to 96/12/21; caption: the max testing error of model tree using flood season data]
Huaihe: Flood season data, relative performance of M5 model trees and ANNs
separate models for flood season (FS, May-October) were built
M5 tree had 7 inputs, 7 leaves
ANN (MLP) was trained on the same data
Performance    ANN testing  ANN training  M5 testing  M5 training
RMSE           96           100           87          98
MAE            24.9         33.0          24.2        31.4
Maximum error  1446         1498          1651        1766
R              0.95         0.97          0.97        0.97
M5 and ANN using flood season data (testing, fragment)
[Figure: observed (OBS) and predicted (FS-M5, FS-ANN) discharge (m3/s) versus time, 96-6-1 to 96-8-30]
Performance of M5 and ANN in “full-year” and “flood season (FS)” experiments
Model                   Years  Equ. (M5) / hid. nodes (ANN)  Samples  RMSE   Mean abs. err.  Max abs. error  Correlation coeff.
full-year M5 train      76-79  35                            5109     69.6   18.7            1695            0.97
full-year M5 test       90-96  35                            2565     84.5   18.9            2208            0.95
full-year naïve, test   90-96  n/a                           2565     183.0  37.1            3009            0.76
full-year linear, test  90-96  n/a                           2565     160.0  39.9            3008            0.79
FS-M5 train.            76-89  7                             2625     98     31.4            1766            0.97
FS-M5 test              90-96  7                             1525     87     24.2            1651            0.97
FS-ANN train.           76-89  8                             2625     100    33.0            1498            0.97
FS-ANN cross-valid.     90-93  8                             653      79     25.1            1130            0.98
FS-ANN test             94-96  8                             872      96     24.9            1446            0.95
Using mixture of experts (committee machines): each expert (model) is for particular hydrological condition
[Figure: the INPUT is tested against a sequence of conditions (Yes/No); each condition routes it to a specialised module, which is an M5 or ANN model]
Condition 1: QXt-1 > 1000 → Module 1
Condition 2: QXt-1 <= 1000, QXt > 200 → Module 2
Condition 3: Pat-1 > 50, PaMov2t-2 < 5, PaMov2t-4 < 5 → Module 3
Condition x → Module x
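The routing idea can be sketched as an ordered list of (condition, expert) pairs in which the first condition that holds decides which expert answers. The two thresholds follow Conditions 1 and 2 as far as they can be read from the slide; the expert functions are made-up stand-ins for trained M5 or ANN modules, and all names are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Expert = Callable[[Record], float]
Condition = Callable[[Record], bool]

def make_committee(modules: List[Tuple[Condition, Expert]], default: Expert) -> Expert:
    """Return a predictor that delegates to the first module whose condition holds."""
    def predict(record: Record) -> float:
        for condition, expert in modules:
            if condition(record):
                return expert(record)
        return default(record)
    return predict

# Stand-in experts; in practice each would be a model trained on its own regime
def high_flow_expert(r: Record) -> float:
    return 1.1 * r["qx_t"]

def medium_flow_expert(r: Record) -> float:
    return 1.0 * r["qx_t"]

def base_flow_expert(r: Record) -> float:
    return 0.9 * r["qx_t"]

committee = make_committee(
    [
        (lambda r: r["qx_t1"] > 1000, high_flow_expert),   # Condition 1
        (lambda r: r["qx_t"] > 200, medium_flow_expert),   # Condition 2 (QX(t-1) <= 1000 is implied here)
    ],
    default=base_flow_expert,
)

flood = committee({"qx_t": 1500.0, "qx_t1": 1200.0})
normal = committee({"qx_t": 300.0, "qx_t1": 250.0})
```

Splitting the training data by the same conditions lets each expert specialise, which is the motivation given for the modular models in this case study.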
Performance of the M5 mixture model in training (fragment shown is the 1982 flood)
[Figure: M5 modular model, training (fragment); observed discharge (OBS), prediction by the combination of modules 1 and 2, and prediction by module 1 alone; discharge (m3/s) versus time, 82-7-1 to 82-8-30]
Performance of the M5 mixture model in testing (fragment shown is the 1996 flood)
[Figure: M5 modular model, testing (fragment, 1996 flood); observed discharge (OBS), prediction by the combination of modules 1 and 2, and prediction by module 1 alone; discharge (m3/s) versus time, 96-6-1 to 96-10-29]
Huaihe: comparison of the mixture of M5 models with the “flood season” M5 model
Performance                Mixture of M5 models       Flood season model (M5)
                           train 76-89  test 90-96    train 76-89  test 90-96
Correlation coefficient    0.99         0.98          0.96         0.95
Mean absolute error        83.0         126.8         97.3         114.1
Max absolute error         548.0        509.1         1173.5       1183.6
Root mean squared error    127.0        176.1         176.6        222.2
Total number of instances  96           32            96           32
Model trees (MT) and ANN: comparison
training of MT is much faster than ANN, and it always converges
the results can be easily understood by decision makers
by applying pruning (that is, making trees smaller by combining subtrees in one node) it is possible to generate a range of MTs, from an inaccurate but simple linear regression (one leaf only) to a much more accurate but complex combination of local models (many branches and leaves)
MT combines a number of “local” models that could be more accurate for some extreme cases
in some cases ANN is more accurate than MT
Case study Swarupgunj: modelling stage-discharge relationship (rating curve)
B. Bhattacharya, D.P. Solomatine. Application of artificial neural network in reconstructing stage-discharge relationship. Proc. 4th Int. Conference on Hydroinformatics, Cedar Rapids, July 2000
Rating curve
Relationship between stage & discharge is expressed through a rating curve
Simplified relationship is: $Q = \alpha (h - h_0)^{\beta}$
Problem: relationship is complex, depends upon past values etc.
[Figure: a typical rating curve; stage versus discharge (scaled)]
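A conventional rating curve is usually a power law of the form Q = a (h - h0)^b, which becomes a straight line after taking logarithms. The sketch below fits a and b by least squares in log space, assuming the datum h0 is known; the function name, the fixed h0 and the synthetic data are illustrative assumptions and do not reproduce the Swarupgunj curve.

```python
import numpy as np

def fit_rating_curve(stage, discharge, h0):
    """Fit Q = a * (h - h0)**b by linear least squares on log Q versus log(h - h0)."""
    x = np.log(np.asarray(stage, dtype=float) - h0)
    y = np.log(np.asarray(discharge, dtype=float))
    b, log_a = np.polyfit(x, y, 1)   # slope first, then intercept
    return float(np.exp(log_a)), float(b)

# Synthetic stage-discharge pairs generated from a known curve
stage = np.array([4.5, 5.0, 6.0, 7.0, 8.0, 9.0])
discharge = 30.0 * (stage - 4.0) ** 1.6
a, b = fit_rating_curve(stage, discharge, h0=4.0)
```

Such a curve maps each stage to a single discharge, so it cannot represent the dependence on past values noted on the slide; that limitation is what motivates the ANN and model tree alternatives.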
Stage-discharge data
Data from one discharge measuring station in India (Swarupgunj on the river Bhagirathi)
The river course was more or less stable during the data collection period
Ten years of data are available
Two thirds of the data are used for training and the rest for verification
Input parameter selection
The following input parameters were considered:
– Stage (t+1)
– Stage (t)
– Stage (t-1)
– Discharge (t)
Output parameter:
– Discharge (t+1)
ANN and Model Tree models were developed with these inputs
A conventional rating curve was prepared using stage (t) and discharge (t) data
Generated M5 model trees
unpruned tree with 94 leaves, verification RMSE=76.0
pruned tree with 4 leaves (verification RMSE 69.1):
Qt_1 <= 1500 :
| Qt_1 <= 1130 : Qt = -243 - 187ht_1 + 299ht + 0.667Qt_1
| Qt_1 > 1130 : Qt = -214 - 387ht_1 + 448ht + 0.885Qt_1
Qt_1 > 1500 :
| ht <= 7.85 : Qt = -455 - 491ht_1 + 628ht + 0.727Qt_1
| ht > 7.85 : Qt = -1720 - 605ht_1 + 924ht + 0.66Qt_1
pruned tree with 2 leaves (verification RMSE=69.1):
Qt_1 <= 1500 : Qt = -204 - 301ht_1 + 383ht + 0.788Qt_1
Qt_1 > 1500 : Qt = -728 - 550ht_1 + 721ht + 0.745Qt_1
Training & verification statistics (1)
The model tree was trained with different pruning factors and finally a pruning factor of 2 was chosen:
                  Training              Verification
                  RMSE   NRMSE  COE     RMSE   NRMSE  COE
Prune factor = 0  79.3   0.132  0.991   76     0.111  0.994
Prune factor = 1  89.8   0.15   0.989   69.1   0.101  0.995
Prune factor = 2  92     0.153  0.988   69.7   0.101  0.995
Training & verification statistics for all the models
                  Training              Verification
                  RMSE   NRMSE  COE     RMSE   NRMSE  COE
Model tree        92.0   0.153  0.988   69.7   0.101  0.995
ANN               90.5   0.151  0.988   70.5   0.103  0.995
Rating curve      143.3  0.239  0.974   111.2  0.162  0.989
Training & verification statistics (2)
Percentage of verification data with prediction error
              > 5%   > 10%  > 15%  > 20%
Model tree    20.3   1.6    0.2    0.2
ANN           21.4   3.1    0.6    0.3
Rating curve  42.4   11.8   5.3    1.9
Verification: ANN model
[Figure: known discharge and computed discharge (ANN), scaled, for the validation events]
Verification: Model Tree
[Figure: known discharge and computed discharge (MT), scaled, for the validation events]
Swarupgunj: Conclusions
The data-driven models perform far better than the conventional rating curve (verification RMSE 69.7 for MT and 70.5 for ANN, against 111.2 for the rating curve)
The predictive accuracy of the MT and ANN models is almost the same
In cases when a large amount of data is available (such as for stage-discharge relationships), accurate data-driven models can be developed
Using ANN in replicating hydrodynamic / hydrologic model of river basin
D.P. Solomatine, A. Avila Torres. Neural Network Approximation of a Hydrodynamic Model in Optimizing Reservoir Operation. Proc. 2nd International
Conference on Hydroinformatics. Zurich, Switzerland, August 1996
Case study APURE: enhancement of Apure river basin in Venezuela
Multi-criterial optimization: energy production, navigability
Find the reservoirs’ operation policies providing for the high values of the two conflicting criteria:
– (1) follow the energy production target as closely as possible;
– (2) maximize the period of time when the river is navigable.
MCDM problem is reduced to a single-criterion problem:
– navigability criterion 2 is considered a “soft” constraint
– problem is solved a number of times as a single-criterion problem with respect to the energy criterion 1.
Replication of Mike11/NAM model by ANN: Apure river basin case study
MIKE-11/NAM modelling system (DHI) was used to model hydrodynamics of the river and hydrology of the 21 adjacent sub-catchments
Modelling was performed for years 1981-83
Resulting data files represented the relation between the reservoirs’ water releases and water levels downstream
[Figure: river basin scheme showing the sub-catchments’ runoff, the reservoirs / dams (La Honda, La Cuevas, La Vueltosa) and the water levels needed for navigability]
Replication of Mike11/NAM model by ANN: reservoir optimisation in Apure river basin
MIKE-11/NAM modelling system (DHI) was used to model hydrodynamics of the river and hydrology of the 21 adjacent sub-catchments
Modelling was performed for years 1981-83
Resulting data files represented the relation between the reservoirs’ water releases and water levels downstream
[Flowchart: Start → river and catchment modelling → Optimisation: generate release schedules → run Mike11/NAM model to produce water levels (alternatively, run the neural network replicating the Mike11/NAM model) → check navigability constraints → optimal solution reached? → Stop]
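The optimisation loop in the flowchart can be sketched with a fast surrogate standing in for the simulation model. Everything concrete below is an assumption made for illustration: the linear `surrogate_water_level` function replaces the trained ANN, releases are used as a proxy for energy production, navigability is a quadratic penalty on water levels below a minimum (the “soft” constraint), and a simple random search plays the role of the optimiser. None of these choices is taken from the original study.

```python
import numpy as np

def surrogate_water_level(releases):
    """Stand-in for the trained ANN: downstream level from release (made-up linear relation)."""
    return 1.0 + 0.01 * releases

def penalised_cost(releases, energy_target, min_level, penalty_weight):
    """Energy criterion plus a soft penalty for levels below the navigability minimum."""
    energy_error = np.sum((releases - energy_target) ** 2)
    shortfall = np.clip(min_level - surrogate_water_level(releases), 0.0, None)
    return float(energy_error + penalty_weight * np.sum(shortfall ** 2))

def random_search(energy_target, min_level, penalty_weight, n_iter=2000, seed=0):
    """Generate release schedules, evaluate them with the surrogate, keep the best one."""
    rng = np.random.default_rng(seed)
    best = np.array(energy_target, dtype=float)
    best_cost = penalised_cost(best, energy_target, min_level, penalty_weight)
    for _ in range(n_iter):
        candidate = np.clip(best + rng.normal(0.0, 5.0, size=best.shape), 0.0, None)
        cost = penalised_cost(candidate, energy_target, min_level, penalty_weight)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost

target = np.full(4, 100.0)   # four periods with the same hypothetical target
start_cost = penalised_cost(target, target, min_level=2.5, penalty_weight=1e4)
best, best_cost = random_search(target, min_level=2.5, penalty_weight=1e4)
```

Repeating the search with different values of `penalty_weight` corresponds to solving the problem a number of times as a single-criterion problem; the surrogate matters because each of the thousands of evaluations would otherwise require a full simulation run.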
Performance of the trained ANN
[Figure: performance of the trained ANN on the training and verification data]
Replication of Mike11/NAM model by ANN
Outputs: 3 (water levels in three points)
Hidden nodes: 5
Inputs: 25
– runoffs from 21 subcatchments (boundary conditions for hydrodynamic model);
– 3 reservoir releases (boundary conditions for hydrodynamic model);
– water level at the previous time moment (week).
Instead of one ANN modelling 3 outputs, three ANNs were constructed
Case studies SALLAND and OVERWAARD: Using ANNs and fuzzy systems in polder control
STOWA/Delft Cluster/IHE research project
Participants (IHE-Delft): A.H. Lobbrecht, D.P. Solomatine, Yonas Dibike, Ling Wang, B. Bhattacharya, B. Bazartseren
Case study 1: Groot Salland
[Map: study area with the sub-areas Rietberg: 6.646 ha, Stuw7A: 13.697 ha, Stuw3A: 10.130 ha]
Case study 1: Groot Salland
building rainfall-runoff model
reconstruction of missing data
methods used:
– ANN
– M5 model trees
– committee machines (modular models)
Groot Salland: Filling missing data (ANN testing), Q_Stuw7A = f( (P-E)t … (P-E)t-5 ), using committee machines (modular local models)
[Figure: Filling of missing data at Stuw7A (ANN testing); actual flow, LANN and GANN runoff (m3/s) versus time (day), 01/01/99 to 11/02/99]
LANN=local ANN (committee machines), GANN=global ANN (trained on whole data set)
Groot Salland: Using ANN in filling-in data: committee machines (local models, LANN) vs global models (GANN) (verification, 1998-2000)
Q = f ( (P-E)t ... (P-E)t-5 )
[Figure: Stuw 7A (ANN verification); measured, LANN and GANN discharge (m3/s) versus time, 01/01/99 to 11/02/99]
Groundwater: interpolation between 14-day level measurements (testing), GWL_a(t) = f(GWL_b(t-5), … GWL_b(t+5))
[Figure: ANN for interpolating groundwater level measurement (testing); observed GWL at gg92a and ANN GWL output, GWL (cm) versus example no.]
Runoff forecasting Stuw7A (Global ANN testing), Qt+1 = f( Pt, Qt, Qt-1)
[Figure: Runoff forecasting at Stuw7A (GANN testing); observed discharge and ANN output (m3/s) versus example no., with a scatter plot of calculated versus observed discharge]
Case study Groot Salland: conclusions
ANNs and model trees were able to predict flows and to perform the data infilling
M5 model trees perform as well as ANNs and are more transparent
“Global” ANN considers the runoff process in its entirety and may not be the optimal approach:
– during the training process rare information is considered as noise and is filtered out
– these rare events are, however, exactly the situations to which we would like to apply the ANN models most
For small catchments (Rietberg), the runoff times are very short (hours): it is necessary to adapt the monitoring frequencies to that interval
it is important to make records of all changes in water control/management: these changes change the modelled system.
Case study Hoogheemraadschap van de Alblasserwaard en de Vijfheerenlanden (Overwaard)
Artificial Neural Networks and Fuzzy Logic Systems for Model Based Control:
– build the simulation model
– investigate the possibilities of using new technologies like ANN and Fuzzy logic for model based control of the water system
Overwaard: On-line performance of ANN-based controller
[Figure: simulated water level (m) in the upper basin and in the lower basin versus time, 10/28/98 to 11/12/98; external (ANN) control compared with dynamic control]
Overwaard: On-line performance of fuzzy-based controller
[Figure: simulated water level (m) in the upper basin and in the lower basin versus time, 10/28/98 to 11/12/98; external (FAS) control compared with dynamic control]
Overwaard: On-line performance of controller (discharge)
[Figure: cumulative discharge (mil. m3) versus time, 10/28/98 to 11/11/98, for the ANN controller (external (ANN) control vs dynamic control) and for the FAS controller (external (FAS) control vs dynamic control); cumulative error = -75,000 m3 = -0.55%]
Case study Overwaard: conclusions (1)
AQUARIUS model of Overwaard was found to simulate the water system very well, calibration results were acceptable
ANN and FAS replicate the dynamic control pumping strategy reasonably well
Replacing the slow computational component by the fast-running (10 times faster) intelligent controllers could simplify the use of AQUARIUS in real-time control tasks
Case study Overwaard: conclusions (2)
Intelligent controllers are found to be able to reproduce the centralised behaviour (in terms of water levels and corresponding discharges) of optimal control action by using easily measurable local information
External control with ANN and FAS required less than one third of the simulation time of the central optimal control
Replacing the slow computational component by the fast-running intelligent controllers could simplify the use of AQUARIUS in real-time control tasks
Applications of data-driven methods in flood management and control problems: conclusions
data-driven methods make it possible to build accurate predictive models
data-driven methods are good approximators of physically-based models
– they are faster
– can be incorporated into optimization and real-time control loops
using classification methods (decision trees) leads to simpler models and requires less accurate data
in numerical prediction, M5 model trees make it possible to build transparent models which can be understood by decision makers much better than ANNs
using a mixture of models makes it possible to improve the performance
if the water system changes, data-driven models have to be retrained