
Data-driven modelling: CASE STUDIES in RIVER and FLOOD MANAGEMENT and CONTROL

Dimitri P. Solomatine

UNESCO-IHE Institute for Water Education

Hydroinformatics Chair

Data-driven modelling and forecasting: our experience

– replicating behaviour of hydrodynamic/hydrological river model with the objective of using the ANN in model-based optimal control of a reservoir (Solomatine & Torres, 1996)
– building ANN-based intelligent controller for real-time control of water levels in a polder (Lobbrecht & Solomatine, 1999)
– modelling rainfall-runoff process with ANNs (Dibike, Solomatine & Abbott, 1999)
– ANN in surge water level prediction for ship guidance
– reconstructing stage-discharge relationship using ANN (Bhattacharya & Solomatine, 2000)
– using M5 model trees to predict river discharge
– using SVMs in prediction of water flows for flood management (Dibike, Velickov, Solomatine & Abbott, 2001)
– using ANN and other ML methods in optimization of urban systems (Alfonso et al., 2012; Xu et al., 2015)


Process (physically-based) modelling of flow: river modelling context

Available data:
– rainfalls Rt, lateral inflows QL
– catchment and river physical properties (soil, geometry, roughnesses)
– initial and boundary conditions for flows Q0(x,t)

Inputs: QL(x,t), Qup(t), Q0(x,t), system properties

Output: flow Q(x,t)

Model:

$$\frac{\partial Q}{\partial x} + b\,\frac{\partial h}{\partial t} = 0$$

$$\frac{\partial Q}{\partial t} + \frac{\partial}{\partial x}\left(\frac{Q^2}{A}\right) + gA\,\frac{\partial (H + h)}{\partial x} + \frac{gA\,Q\,|Q|}{K^2} = 0$$

Q(x,t) = F (QL(x,t), Qup(t), Q0(x,t), system properties)

Questions:
– are the physical properties of the catchment known?
– is F good enough?

Using data-driven methods in rainfall-runoff modelling

Available data:
– rainfalls Rt
– runoffs (flows) Qt

Inputs: lagged rainfalls Rt, Rt-1, Rt-2, …

Output to predict: Qt+T

Model: Qt+T = F (Rt, Rt-1, …, Qt, Qt-1, …, Qupt, Qupt-1, …), where the upstream flows Qup account for routing

Questions:
– how to find the appropriate lags? (lags embody the physical properties of the catchment)
– how to build F?
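The model form Qt+T = F (Rt, Rt-1, …, Qt, Qt-1, …) implies that the time series must first be rearranged into rows of lagged inputs with one target per row. The following is a minimal sketch of that step, not the authors' code; the function name `build_lagged_dataset` and the toy series are illustrative assumptions.

```python
def build_lagged_dataset(rain, flow, rain_lags, flow_lags, horizon):
    """Return (X, y): rows of lagged inputs and the target Q(t + horizon).

    Each row is [R(t), R(t-1), ..., Q(t), Q(t-1), ...].
    """
    start = max(rain_lags, flow_lags, 1) - 1  # first t with all lags available
    X, y = [], []
    for t in range(start, len(flow) - horizon):
        row = [rain[t - i] for i in range(rain_lags)]   # R(t), R(t-1), ...
        row += [flow[t - i] for i in range(flow_lags)]  # Q(t), Q(t-1), ...
        X.append(row)
        y.append(flow[t + horizon])
    return X, y


# Toy hourly series (made-up numbers, for illustration only).
rain = [0.0, 1.0, 2.0, 0.5, 0.0, 0.0]
flow = [10.0, 11.0, 14.0, 18.0, 16.0, 13.0]
X, y = build_lagged_dataset(rain, flow, rain_lags=2, flow_lags=2, horizon=1)
print(X[0], y[0])  # [1.0, 0.0, 11.0, 10.0] 14.0
```

Any regression method (ANN, model tree, SVM) can then be trained on `X` and `y`; the choice of `rain_lags` and `flow_lags` is exactly the lag-selection question raised on the slide.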


Case study SIEVE: flood management problem

D.P. Solomatine, K.N. Dulal. Model trees as an alternative to neural networks in rainfall-runoff modeling. Hydrological Sciences Journal, 48 (3), 2003, 399-411.

Italy, Tuscany, Arno river basin, Sieve subcatchment



Schematisation

mountainous catchment in Southern Europe

area of 822 sq. km


SIEVE: data

three months of hourly discharge (Q), precipitation (R) and evapotranspiration (E) data available (Dec. 1959 to Feb. 1960, 2160 data points)

for this experiment, data was represented as hourly data on effective rainfall RE and runoff Q

1854 examples used for training, 300 for verification

the problem: predict flow Q 1, 3 and 6 hours ahead

SIEVE: visualization of data

[Figure: FLOW1, effective rainfall [mm] and discharge [m3/s] versus time [hrs]]

Analysis of relationships, determining lags, choice of variables

Covariance:

$$\sigma_{ij} = E\big[(x_i - E(x_i))\,(x_j - E(x_j))\big]$$

Correlation coefficient:

$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}$$

Average mutual information (AMI)

[Figure: rainfall event and the resulting flow, Q [m3/s] and R [mm] versus t [hrs]]

[Figure: correlation between Qt+1 and REt-i, correlation coefficient versus lag i in REt-i]

[Figure: AMI between Q(t+1) and R(t-tau), AMI versus time lag tau]
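The lag analysis on this slide amounts to computing the correlation coefficient between the future flow and the rainfall shifted by each candidate lag, then keeping the lags with the strongest relation. A minimal sketch under that reading is given below; the function names and the synthetic series (flow equal to rainfall delayed by three steps) are assumptions made for illustration, and average mutual information is not covered.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation coefficient: cov(x, y) / (std(x) * std(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def lagged_correlation(rain, flow, max_lag, horizon=1):
    """Correlation between Q(t + horizon) and R(t - lag), lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        ts = range(lag, len(flow) - horizon)
        r = [rain[t - lag] for t in ts]
        q = [flow[t + horizon] for t in ts]
        result[lag] = correlation(r, q)
    return result


# Synthetic check: flow is rainfall delayed by 3 steps, so Q(t+1) = R(t-2).
rain = [0, 1, 0, 3, 0, 0, 2, 5, 1, 0, 0, 4, 0, 0]
flow = [0, 0, 0] + rain[:-3]
corr_by_lag = lagged_correlation(rain, flow, max_lag=4)
best_lag = max(corr_by_lag, key=corr_by_lag.get)
print(best_lag)  # 2
```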


SIEVE: variables and posing the problem

after correlation analysis, a number of models were tried, e.g. for Qt+1:
1. Qt+1 = f (REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1)
2. Qt+1 = f [MA(REt, REt-1, REt-2), MA(REt-2, REt-3, REt-4), REt-3, REt-5]
3. Qt+1 = f [MA(REt, REt-1, REt-2, REt-3, REt-4, REt-5), MA(REt-3, REt-6), REt-3, REt-5]
4. Qt+1 = f [MA(REt-4, REt-5), Qt-1]
5. Qt+1 = f (REt-4, REt-5)
6. Qt+1 = f [MA(REt-4, REt-5), Qt-2]
MA = moving average

models accepted:
Q(t+1) = f (REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1, Qt-2)
Q(t+3) = f (REt, REt-1, REt-2, REt-3, Qt, Qt-1)
Q(t+6) = f (REt, Qt)

numerical prediction is done by M5 model trees and ANNs

SIEVE: numerical prediction

ANN model:
– Multi-Layered Perceptron (MLP), three layers
– Optimal number of hidden nodes: Qt+1: 6; Qt+3: 5; Qt+6: 3
– Training algorithm: back-propagation with adaptive learning rate

MT model:
– Same sets of inputs, training and verification data points as for the ANN

SIEVE: M5 model tree details are comprehensible for a decision maker - predicting Q(t+3)

Sample output of the model tree. Generated tree (reduced size):

Qt <= 51.2 :
| Qt <= 28.7 : LM1 (903/5.66%)
| Qt > 28.7 : LM2 (379/13.1%)
Qt > 51.2 : LM3 (572/66.7%)

Linear models:
LM1: Qt+3 = -0.0118 + 0.317REt + 0.124REt-1 + 0.0844REt-2 - 0.109REt-3 + 1.09Qt - 0.0826Qt-1
LM2: Qt+3 = -0.262 + 11.9REt + 0.182REt-1 + 8.9REt-2 - 0.198REt-3 + 3.66Qt - 2.67Qt-1
LM3: Qt+3 = 15.5 + 25.7REt + 7.59REt-1 - 0.0923REt-3 + 1.44Qt - 0.732Qt-1
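Because the tree consists of two threshold tests on Qt and three linear models, it can be written out directly as a small function. The sketch below transcribes the splits and coefficients listed for Q(t+3); the function and argument names are illustrative, and any smoothing between leaves that an M5 implementation may apply is ignored here.

```python
def predict_q_t3(re_t, re_t1, re_t2, re_t3, q_t, q_t1):
    """Evaluate the reduced M5 model tree for Q(t+3) as listed on the slide.

    re_t..re_t3 are effective rainfalls RE(t)..RE(t-3); q_t, q_t1 are Q(t), Q(t-1).
    """
    if q_t <= 28.7:    # leaf LM1
        return (-0.0118 + 0.317 * re_t + 0.124 * re_t1 + 0.0844 * re_t2
                - 0.109 * re_t3 + 1.09 * q_t - 0.0826 * q_t1)
    if q_t <= 51.2:    # leaf LM2
        return (-0.262 + 11.9 * re_t + 0.182 * re_t1 + 8.9 * re_t2
                - 0.198 * re_t3 + 3.66 * q_t - 2.67 * q_t1)
    # leaf LM3 (Qt > 51.2); RE(t-2) does not appear in this model
    return (15.5 + 25.7 * re_t + 7.59 * re_t1
            - 0.0923 * re_t3 + 1.44 * q_t - 0.732 * q_t1)


# Dry conditions with a steady low flow of 10 m3/s fall into LM1.
low_flow_forecast = predict_q_t3(0.0, 0.0, 0.0, 0.0, 10.0, 10.0)
print(round(low_flow_forecast, 4))  # 10.0622
```

Writing the tree this way makes the point of the slide concrete: a decision maker can read which regime applies and which inputs drive the forecast in that regime.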

SIEVE: Predicting Q(t+1)

Q(t+1) = f (REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1, Qt-2)

Prediction of Qt+1: verification performance
ANN verification: RMSE=5.175, NRMSE=0.106, COE=0.9886
MT verification: RMSE=3.6123, NRMSE=0.074, COE=0.9944

[Figure: observed and modelled (ANN, MT) discharge Q [m3/s] versus t [hrs]]

SIEVE: Predicting Q(t+3)

Q(t+3) = f (REt, REt-1, REt-2, REt-3, Qt, Qt-1)

Prediction of Qt+3: verification performance
ANN verification: RMSE=11.353, NRMSE=0.234, COE=0.9452
MT verification: RMSE=12.548, NRMSE=0.258, COE=0.9331

[Figure: observed and modelled (ANN, MT) discharge Q [m3/s] versus t [hrs]]

SIEVE: Predicting Q(t+6)

Q(t+6) = f (REt, Qt)

Prediction of Qt+6: verification performance
ANN verification: RMSE=19.402, NRMSE=0.399, COE=0.8401
MT verification: RMSE=21.547, NRMSE=0.443, COE=0.8028

[Figure: observed and modelled (ANN, MT) discharge Q [m3/s] versus t [hrs]]
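The slides report RMSE, NRMSE and COE without defining them. The sketch below assumes the common definitions: NRMSE as RMSE divided by the standard deviation of the observations, and COE as the Nash-Sutcliffe coefficient of efficiency. Under these assumptions COE = 1 - NRMSE^2, which agrees with the Sieve figures above (for example 1 - 0.106^2 ≈ 0.9888 against the reported 0.9886). The toy data are made up.

```python
from math import sqrt


def rmse(obs, sim):
    """Root mean squared error."""
    return sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nrmse(obs, sim):
    """RMSE normalised by the (population) standard deviation of obs (assumed)."""
    mean = sum(obs) / len(obs)
    std = sqrt(sum((o - mean) ** 2 for o in obs) / len(obs))
    return rmse(obs, sim) / std


def coe(obs, sim):
    """Coefficient of efficiency (Nash-Sutcliffe form, assumed)."""
    mean = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mean) ** 2 for o in obs)


observed = [1.0, 2.0, 3.0, 4.0]
modelled = [1.5, 2.0, 2.5, 4.0]
print(round(rmse(observed, modelled), 4))  # 0.3536
print(round(coe(observed, modelled), 4))   # 0.9
```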


Comparison of various methods used in Sieve case study

Method                                 Qt+1      Qt+1    Qt+3      Qt+3    Qt+6      Qt+6
                                       training  test    training  test    training  test
ANN                                    5.826     5.175   13.549    11.353  28.970    19.402
M5 model tree                          4.550     3.612   14.384    12.548  27.069    21.547
M5 + AdaBoost.RT                       3.488     4.132   9.0415    14.042  22.493    24.078
M5 + AdaBoost.RT + tuned on test set   3.789     3.754   12.749    11.551  27.958    19.468

The remaining rows report fewer values, and their column positions cannot be recovered from the source; the values are listed in source order:
SVM: 3.212, 10.867, 20.201
Locally weighted regression: 9.097, 10.666, 12.556, 12.703
k-nearest neighb.: 1.506, 11.114, 17.57, 13.703
Composite: Model tree + 9-NN: 4.674, 3.350, 13.296, 14.593

Posing the classification problem for flood management

Classification problem: the classification output is a class L, M or H:
– Low flow: up to 50 m3/s
– Medium flow: up to 350 m3/s
– High flow: up to 750 m3/s

[Figure: past records plotted against Rainfall (t-2), Rainfall (t-3) and Flow Q(t), grouped into the classes to be identified (Low, Medium and High flows Q(t+1)), with a new record shown: to which class does it belong?]

SIEVE: the resulting decision tree (12 leaves) that classifies Q(t+3) into Low, Medium, High (error=6%)

Qt <= 51.45
| REt-1 <= 0.6686: L (815.0/10.0)
| REt-1 > 0.6686
| | Qt <= 25.59: L (24.0)
| | Qt > 25.59: M (24.0/7.0)
Qt > 51.45
| REt-1 <= 2.3955
| | Qt <= 59.04
| | | REt <= 0.0255
| | | | Qt <= 52.67: L (5.0)
| | | | Qt > 52.67: M (7.0/1.0)
| | | REt > 0.0255: M (63.0/4.0)
| | Qt > 59.04
| | | Qt-1 <= 348.55: M (271.0)
| | | Qt-1 > 348.55
| | | | Qt <= 630.2: M (7.0)
| | | | Qt > 630.2: H (3.0)
| REt-1 > 2.3955
| | Qt <= 247.68
| | | REt <= 3.3031: M (3.0)
| | | REt > 3.3031: H (7.0/3.0)
| | Qt > 247.68: H (9.0)

Example of a rule read from the tree: IF Flow(t) <= 51.45 and Rainfall(t-1) > 0.67 and Flow(t) < 25.6 THEN: Low flow

SIEVE: the resulting pruned tree (4 leaves only) that does the same (error=6.7%)

Qt <= 50.85: Low (1282.0/17.0)
Qt > 50.85
| Qt <= 214.28: Medium (511.0/5.0)
| Qt > 214.28
| | Qt <= 247.68: Medium (15.0/7.0)
| | Qt > 247.68: High (46.0)
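The pruned tree uses only the current flow Qt, so it reduces to a chain of threshold tests. A minimal sketch transcribing the thresholds listed above follows; the function name is an illustrative choice.

```python
def classify_flow_t3(q_t):
    """Class of Q(t+3) from the 4-leaf pruned tree, using only the current flow Q(t)."""
    if q_t <= 50.85:
        return "Low"
    if q_t <= 214.28:
        return "Medium"
    if q_t <= 247.68:   # second Medium leaf of the pruned tree
        return "Medium"
    return "High"


for q in (30.0, 100.0, 230.0, 300.0):
    print(q, classify_flow_t3(q))
```

The two Medium leaves could be merged into a single test on 247.68 without changing the predicted class; they are kept separate here to mirror the tree as printed.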



SIEVE: predicting class of flow Qt+3

[Figure: classification of Qt+3 using the pruned decision tree; observed and predicted class of flow (Low, Med., High) and flow [m3/s] versus t [hrs], 10/02/70, 04:00 to 22/02/70, 16:00]

SIEVE: performance of 3 classification methods: decision tree, Bayesian and K-nearest neighbor

Incorrectly classified instances, %:

Method                        Train.   Test.
Decision tree (unpruned)      1.02     5
Decision tree (pruned)        2.10     6.67
Naïve Bayes                   6.09     10.67
K-nearest neighbor (k = 3)    1.83     7.33

Case study HUAIHE: rainfall-runoff modelling in flood management of Huai River by M5 model trees and ANNs

D.P. Solomatine, Y. Xue. M5 model trees compared to neural networks: application to flood forecasting in the upper reach of the Huai River in China. ASCE Journal of Hydrologic Engineering, 9(6), 2004, pp. 491-501.

Huai River (Huaihe) basin

[Figure: map of the Huai River basin in China showing the project area, rivers, lakes and reservoirs, provincial boundaries and main cities]

Huaihe: Rainfall-runoff modelling, schematisation of the upper reach

[Figure: schematisation of the upper reach with discharges QC, QZ, QD, QN and QX (labels Xixian, CTG), elements numbered 1 to 5 and area-averaged rainfall]

Huaihe: data

Based on the physical properties of the catchment

Inputs:
– Known past rainfalls (Pat, …, Pat-n1, PaMovkt, …, PaMovkt-n2)
– Known past runoffs (flows) (QXt, …, QXt-3, QCt, …, QCt-4, QZt, …, QZt-5)

Output:
– discharge (flow) one day ahead, QXt+1

Which inputs to use? To be determined by the input-output correlation analysis and data preprocessing

Huai River: correlation analysis and average mutual information used as the basis for selecting input variables

[Figure: correlation r of the related input variables (QC, QZ, QX, Pa, PaMov4) with QXt+1 for full-year data, versus time lag (days)]

Huaihe: data preparation and selection of variables

Data preparation:
– 20 years of training data (daily rainfall and runoff)
– filtering data (only flood events are considered)
– selection of input variables (analysis of correlation, average mutual information)

INPUT VARIABLES:

No.  Var        Description                           Correl. coeff.  Time lags
1    Pat        Area average rainfall at time t       0.35            1
2    Pat-1      Area average rainfall at time t-1     0.65            2
3    PaMov2t    2-day moving average of Pa at t       0.72            1
4    PaMov2t-1  2-day moving average of Pa at t-1     0.48            2
5    QCt        Upper stream discharge at t           0.82            1
6    QCt-1      Upper stream discharge at t-1         0.56            2
7    QXt        Discharge at predicted station at t   0.77            1
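The moving-average inputs in the table can be derived from the daily rainfall series before training. The sketch below assumes that the 2-day moving average at t is the mean of Pa at t and t-1 (a trailing window); the slides do not state the window alignment, and the function name and rainfall values are illustrative.

```python
def moving_average(series, window):
    """Trailing moving average: entry t is the mean of series[t-window+1 .. t].

    Entries without a full window are returned as None.
    """
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[t - window + 1:t + 1]) / window)
    return out


pa = [0.0, 4.0, 10.0, 2.0]      # made-up daily area-average rainfall
pamov2 = moving_average(pa, 2)  # assumed definition of PaMov2
print(pamov2)  # [None, 2.0, 7.0, 6.0]
```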


Huaihe: Model trees

nodes are IF-THEN-ELSE conditions on attribute values

leaves are linear regression models

[Figure: input space (X1, X2) with output Y split into regions served by Models 1 to 6, next to the corresponding tree: splits on Discharge (t) < 154, rainfallMov(t) <= 4.5, rainfallMov(t) <= 18.5 and rainfallMov(t-1) <= 13.5, with Yes/No branches leading to leaves of the form Discharge (t+1) = Regression model 1, Regression model 2, ...]

Huaihe: full year data, the resulting model tree (1)

QXt <= 154 :
| PaMov2t <= 4.5 : LM1 (1499/4.86%)
| PaMov2t > 4.5 :
| | PaMov2t <= 18.5 : LM2 (315/15.9%)
| | PaMov2t > 18.5 : LM3 (91/86.9%)
QXt > 154 :
| PaMov2t-1 <= 13.5 :
| | PaMov2t <= 4.5 : LM4 (377/15.9%)
| | PaMov2t > 4.5 : LM5 (109/89.7%)
| PaMov2t-1 > 13.5 :
| | PaMov2t <= 26.5 : LM6 (135/73.1%)
| | PaMov2t > 26.5 : LM7 (49/270%)

Huaihe: full year data, the linear models on leaves

LM1: QXt+1 = 2.28 + 0.714PaMov2t-1 - 0.21PaMov2t + 1.02Pat-1 + 0.193Pat - 0.0085QCt-1 + 0.336QCt + 0.771QXt

LM2: QXt+1 = -24.4 - 0.0481PaMov2t-1 - 4.96PaMov2t + 3.91Pat-1 + 4.51Pat - 0.363QCt-1 + 0.712QCt + 1.05QXt

LM3: QXt+1 = -183 + 10.3PaMov2t-1 + 8.37PaMov2t - 5.32Pat-1 + 1.49Pat - 0.0193QCt-1 + 0.106QCt + 2.16QXt

LM4: QXt+1 = 47.3 + 1.06PaMov2t-1 - 2.05PaMov2t + 1.91Pat-1 + 4.01Pat - 0.3QCt-1 + 1.11QCt + 0.383QXt

LM5: QXt+1 = -151 - 0.277PaMov2t-1 - 37.8PaMov2t + 31.1Pat-1 + 30.3Pat - 0.672QCt-1 + 0.746QCt + 0.842QXt

LM6: QXt+1 = 138 - 5.95PaMov2t-1 - 39.5PaMov2t + 29.6Pat-1 + 35.4Pat - 0.303QCt-1 + 0.836QCt + 0.461QXt

LM7: QXt+1 = -131 - 27.2PaMov2t-1 + 51.9PaMov2t + 0.125Pat-1 - 5.29Pat - 0.0941QCt-1 + 0.557QCt + 0.754QXt

Huaihe: full year data: resulting hydrograph (training, fragment)

[Figure: predicted (pre) and observed (obv) discharge, training fragment; chart title: "The max training error of model tree using flood season data"]

Huaihe: full year data, resulting hydrograph (verification)

[Figure: predicted (pre) and observed (obv) discharge versus time, 95/5/1 to 96/12/21; chart title: "The max testing error of model tree using flood season data"]

Huaihe: Flood season data, relative performance of M5 model trees and ANNs

separate models for flood season (FS, May-October) were built

M5 tree had 7 inputs, 7 leaves

ANN (MLP) was trained on the same data

Performance      ANN testing   ANN training   M5 testing   M5 training
RMSE             96            100            87           98
MAE              24.9          33.0           24.2         31.4
Maximum error    1446          1498           1651         1766
R                0.95          0.97           0.97         0.97

M5 and ANN using flood season data (testing, fragment)

[Figure: M5 and ANN, flood season data (testing, fragment): discharge (m3/s) versus time, 96-6-1 to 96-8-30; series OBS, FS-M5 and FS-ANN]

Performance of M5 and ANN in "full-year" and "flood season (FS)" experiments

Experiment               Years   Equations (M5) /     Samples   RMSE    Mean abs.   Max abs.   Correlation
                                 hidden nodes (ANN)                     err.        error      coeff.
full-year M5 train       76-79   35                   5109      69.6    18.7        1695       0.97
full-year M5 test        90-96   35                   2565      84.5    18.9        2208       0.95
full-year naïve, test    90-96   n/a                  2565      183.0   37.1        3009       0.76
full-year linear, test   90-96   n/a                  2565      160.0   39.9        3008       0.79
FS-M5 train              76-89   7                    2625      98      31.4        1766       0.97
FS-M5 test               90-96   7                    1525      87      24.2        1651       0.97
FS-ANN train             76-89   8                    2625      100     33.0        1498       0.97
FS-ANN cross-valid.      90-93   8                    653       79      25.1        1130       0.98
FS-ANN test              94-96   8                    872       96      24.9        1446       0.95

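Using mixture of experts (committee machines): each expert (model) is for a particular hydrological condition

[Diagram: the INPUT is tested against a sequence of conditions (Y/N branches); each condition that holds routes the input to its own module, built as an M5 model tree or an ANN]
– Condition 1: QXt-1 > 1000 -> Module 1
– Condition 2: QXt-1 <= 1000, QXt > 200 -> Module 2
– Condition 3: Pat-1 > 50, PaMov2t-2 < 5, PaMov2t-4 < 5 -> Module 3
– Condition x -> Module x (?)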
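The routing idea in the diagram can be sketched as a gate that picks a module from the current hydrological condition and then delegates to that module's model. Everything in the sketch below is illustrative: the order in which conditions are tested, the "<=" in Condition 2 (the operator is missing in the source), the fallback model, and the stand-in expert functions are assumptions, not the authors' trained models.

```python
def select_module(qx_t, qx_t1, pa_t1, pamov2_t2, pamov2_t4):
    """Return the module number for the current conditions, or None if none applies."""
    if qx_t1 > 1000:                        # Condition 1
        return 1
    if qx_t > 200:                          # Condition 2 (QXt-1 <= 1000 holds here)
        return 2
    if pa_t1 > 50 and pamov2_t2 < 5 and pamov2_t4 < 5:   # Condition 3
        return 3
    return None


def mixture_predict(inputs, experts, fallback):
    """Route the input record to the expert selected by the gate."""
    module = select_module(**inputs)
    model = experts.get(module, fallback)
    return model(inputs)


# Hypothetical stand-ins for the trained M5/ANN modules.
experts = {
    1: lambda x: 1.1 * x["qx_t"],
    2: lambda x: 1.0 * x["qx_t"],
    3: lambda x: x["qx_t"] + 2.0 * x["pa_t1"],
}
fallback = lambda x: x["qx_t"]   # persistence forecast, used when no condition applies

sample = dict(qx_t=300.0, qx_t1=800.0, pa_t1=10.0, pamov2_t2=0.0, pamov2_t4=0.0)
prediction = mixture_predict(sample, experts, fallback)
print(prediction)  # 300.0 (routed to module 2)
```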

Performance of the M5 mixture model in training (fragment shown is the 1982 flood)

[Figure: M5 modular model, training (fragment): discharge (m3/s) versus time, 82-7-1 to 82-8-30; series OBS, PRE combination of modules 1 & 2, PRE module 1]

Performance of the M5 mixture model in testing (fragment shown is the 1996 flood)

[Figure: M5 modular model, testing (fragment, 1996 flood): discharge (m3/s) versus time, 96-6-1 to 96-10-29; series OBS, PRE combination of modules 1 & 2, PRE module 1]

Huaihe: comparison of the mixture of M5 models with the "flood season" M5 model

Performance                 Mixture of M5 models          Flood season model (M5)
                            train 76-89    test 90-96     train 76-89    test 90-96
Correlation coefficient     0.99           0.98           0.96           0.95
Mean absolute error         83.0           126.8          97.3           114.1
Max absolute error          548.0          509.1          1173.5         1183.6
Root mean squared error     127.0          176.1          176.6          222.2
Total number of instances   96             32             96             32

Model trees (MT) and ANN: comparison

training of an MT is much faster than that of an ANN, and it always converges;

the results can be easily understood by decision makers;

by applying pruning (that is, making trees smaller by combining subtrees into one node) it is possible to generate a range of MTs, from an inaccurate but simple linear regression (one leaf only) to a much more accurate but complex combination of local models (many branches and leaves);

an MT combines a number of "local" models that could be more accurate for some extreme cases;

in some cases an ANN outperforms an MT in accuracy

Case study Swarupgunj: modelling stage-discharge relationship (rating curve)

B. Bhattacharya, D.P. Solomatine. Application of artificial neural network in reconstructing stage-discharge relationship. Proc. 4th Int. Conference on Hydroinformatics, Cedar Rapids, July 2000.

Rating curve

Relationship between stage & discharge is expressed through a rating curve

Simplified relationship is:

$$Q = \alpha\,(h - h_0)^{\beta}$$

Problem: the relationship is complex, depends upon past values etc.

[Figure: a typical rating curve, stage versus discharge (scaled)]

Stage-discharge data

Data from one discharge measuring station in India (Swarupgunj on the river Bhagirathi)

The river course was more or less stable during the data collection period

Ten years of data are available

Two thirds of the data are used for training and the rest for verification

Input parameter selection

The following input parameters were considered:
– Stage(t+1)
– Stage(t)
– Stage(t-1)
– Discharge(t)

Output parameter:
– Discharge(t+1)

ANN and Model Tree models were developed with these inputs

A conventional rating curve was prepared using stage (t) and discharge (t) data

Generated M5 model trees

unpruned tree with 94 leaves, verification RMSE=76.0

pruned tree with 4 leaves (verification RMSE=69.1)

Qt_1 <= 1500 :
| Qt_1 <= 1130 : Qt = -243 - 187ht_1 + 299ht + 0.667Qt_1
| Qt_1 > 1130 : Qt = -214 - 387ht_1 + 448ht + 0.885Qt_1
Qt_1 > 1500 :
| ht <= 7.85 : Qt = -455 - 491ht_1 + 628ht + 0.727Qt_1
| ht > 7.85 : Qt = -1720 - 605ht_1 + 924ht + 0.66Qt_1

pruned tree with 2 leaves (verification RMSE=69.1)

Qt_1 <= 1500 : Qt = -204 - 301ht_1 + 383ht + 0.788Qt_1
Qt_1 > 1500 : Qt = -728 - 550ht_1 + 721ht + 0.745Qt_1
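The 2-leaf tree is small enough to be used like a rating formula with memory: one split on the previous discharge, then a linear model in the current stage, previous stage and previous discharge. The sketch below transcribes it; the function and argument names are illustrative, and the input values in the example are made up.

```python
def rating_tree_discharge(h_t, h_prev, q_prev):
    """Discharge Q(t) from the 2-leaf pruned M5 tree listed on the slide.

    h_t is the stage ht, h_prev is ht_1 and q_prev is Qt_1.
    """
    if q_prev <= 1500:
        return -204 - 301 * h_prev + 383 * h_t + 0.788 * q_prev
    return -728 - 550 * h_prev + 721 * h_t + 0.745 * q_prev


low_branch = rating_tree_discharge(h_t=5.0, h_prev=5.0, q_prev=1000.0)
high_branch = rating_tree_discharge(h_t=8.0, h_prev=8.0, q_prev=2000.0)
print(round(low_branch, 1), round(high_branch, 1))  # 994.0 2130.0
```

Unlike a conventional rating curve, the prediction depends on the previous stage and discharge, which is how the tree captures the dependence on past values noted earlier.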


Training & verification statistics (1)

The model tree was trained with different pruning factors and, finally, a pruning factor of 2 was chosen:

                   Training                  Verification
                   RMSE    NRMSE   COE       RMSE    NRMSE   COE
Prune factor = 0   79.3    0.132   0.991     76      0.111   0.994
Prune factor = 1   89.8    0.15    0.989     69.1    0.101   0.995
Prune factor = 2   92      0.153   0.988     69.7    0.101   0.995

Training & verification statistics for all the models

                   Training                  Verification
                   RMSE    NRMSE   COE       RMSE    NRMSE   COE
Model tree         92.0    0.153   0.988     69.7    0.101   0.995
ANN                90.5    0.151   0.988     70.5    0.103   0.995
Rating curve       143.3   0.239   0.974     111.2   0.162   0.989

Training & verification statistics (2)

Percentage of verification data with prediction error:

               > 5%    > 10%   > 15%   > 20%
Model tree     20.3    1.6     0.2     0.2
ANN            21.4    3.1     0.6     0.3
Rating curve   42.4    11.8    5.3     1.9
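A table of this kind can be produced by counting how often the relative error exceeds each threshold. The sketch below assumes that "prediction error" means the absolute error as a percentage of the observed discharge; the slides do not define it, and the data are made up.

```python
def percent_exceeding(obs, sim, thresholds=(5, 10, 15, 20)):
    """Share of records (%) whose relative error exceeds each threshold (%)."""
    rel_err = [abs(s - o) / abs(o) * 100 for o, s in zip(obs, sim)]
    return {th: 100 * sum(e > th for e in rel_err) / len(rel_err)
            for th in thresholds}


observed = [100.0, 100.0, 100.0, 100.0]
computed = [102.0, 107.0, 112.0, 130.0]   # relative errors: 2%, 7%, 12%, 30%
exceedance = percent_exceeding(observed, computed)
print(exceedance)  # {5: 75.0, 10: 50.0, 15: 25.0, 20: 25.0}
```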



Verification: ANN model

[Figure: known discharge and computed discharge (ANN), discharge (scaled) versus validation events]

Verification: Model Tree

[Figure: known discharge and computed discharge (MT), discharge (scaled) versus validation events]

Swarupgunj: Conclusions

The data-driven models exhibit a far better performance than the conventional rating curve

The predictive accuracy of the MT and ANN models is almost the same

In cases when a large amount of data is available (such as for stage-discharge relationships), accurate data-driven models can be developed

Using ANN in replicating hydrodynamic / hydrologic model of river basin

D.P. Solomatine, A. Avila Torres. Neural Network Approximation of a Hydrodynamic Model in Optimizing Reservoir Operation. Proc. 2nd International Conference on Hydroinformatics, Zurich, Switzerland, August 1996.

Case study APURE: enhancement of Apure river basin in Venezuela


Multi-criterial optimization: energy production, navigability

Find the reservoirs' operation policies providing for the high values of the two conflicting criteria:
– (1) follow the energy production target as closely as possible;
– (2) maximize the period of time when the river is navigable.

MCDM problem is reduced to a single-criterion problem:
– navigability criterion 2 is considered a "soft" constraint
– problem is solved a number of times as a single-criterion problem with respect to the energy criterion 1.

Replication of Mike11/NAM model by ANN: Apure river basin case study

MIKE-11/NAM modelling system (DHI) was used to model hydrodynamics of the river and hydrology of the 21 adjacent sub-catchments

Modelling was performed for years 1981-83

Resulting data files represented the relation between the reservoirs' water releases and water levels downstream

[Figure: scheme of the basin showing the sub-catchments' runoff, the reservoirs / dams (La Honda, La Cuevas, La Vueltosa) and the water levels needed for navigability]

Replication of Mike11/NAM model by ANN: reservoir optimisation in Apure river basin

Optimisation loop:
– Start
– generate release schedules
– river and catchment modelling: run Mike11/NAM model to produce water levels, or run the neural network replicating the Mike11/NAM model
– check navigability constraints
– optimal solution reached? If not, generate new release schedules; if yes, Stop
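The loop in the flowchart can be sketched with the slow simulation replaced by a fast surrogate. In the sketch below the surrogate is a made-up linear function standing in for the trained ANN, the optimiser is a plain random search, and the navigability requirement is treated as a hard feasibility check for brevity (the study treats it as a soft constraint). All names, bounds and numbers are assumptions for illustration.

```python
import random


def surrogate_water_level(releases):
    """Hypothetical stand-in for the ANN replicating Mike11/NAM: releases -> level (m)."""
    return 0.5 + 0.004 * sum(releases)


def energy_deviation(releases, target):
    """Criterion 1: squared deviation of each release from the energy target."""
    return sum((r - target) ** 2 for r in releases)


def optimise_releases(target=100.0, min_level=2.0, n_trials=2000, seed=0):
    """Random search over release schedules for three reservoirs."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_trials):
        releases = [rng.uniform(50.0, 200.0) for _ in range(3)]  # generate schedule
        level = surrogate_water_level(releases)                  # run the surrogate
        if level < min_level:                                    # navigability check
            continue
        cost = energy_deviation(releases, target)
        if cost < best_cost:                                     # keep the best so far
            best, best_cost = releases, cost
    return best, best_cost


best_releases, best_cost = optimise_releases()
print([round(r, 1) for r in best_releases], round(surrogate_water_level(best_releases), 2))
```

Because each surrogate call is cheap, thousands of candidate schedules can be screened in the time a single hydrodynamic run would take, which is the motivation for the replication given on the slide.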


Performance of the trained ANN

[Figure: performance of the trained ANN on the training and verification data]

Replication of Mike11/NAM model by ANN

Outputs: 3 (water levels in three points)

Hidden nodes: 5

Inputs: 25
– runoffs from 21 subcatchments (boundary conditions for hydrodynamic model);
– 3 reservoir releases (boundary conditions for hydrodynamic model);
– water level at the previous time moment (week).

Instead of one ANN modelling all 3 outputs, three ANNs were constructed

Case studies SALLAND and OVERWAARD: Using ANNs and fuzzy systems in polder control

STOWA/Delft Cluster/IHE research project

Participants (IHE-Delft): A.H. Lobbrecht, D.P. Solomatine, Yonas Dibike, Ling Wang, B. Bhattacharya, B. Bazartseren

Case study 1: Groot Salland

[Figure: map of the study area with the catchments Rietberg and Stuw 7A]

Rietberg: 6.646 ha; Stuw7A: 13.697 ha; Stuw3A: 10.130 ha

Case study 1: Groot Salland

building rainfall-runoff model

reconstruction of missing data

methods used:
– ANN
– M5 model trees
– committee machines (modular models)


Groot Salland: Filling missing data (ANN testing) using committee machines (modular local models)

Q_Stuw7A = f( (P-E)t … (P-E)t-5 )

[Figure: filling of missing data at Stuw7A (ANN testing): runoff (m3/s) versus time (day), 01/01/99 to 11/02/99; series ActualFlow, LANN and GANN]

LANN = local ANN (committee machines), GANN = global ANN (trained on whole data set)

Groot Salland: Using ANN in filling-in data: committee machines (local models, LANN) vs global models (GANN)

Q = f ( (P-E)t ... (P-E)t-5 )

[Figure: Stuw 7A (ANN verification, 1998-2000): discharge (m3/s) versus time, 01/01/99 to 11/02/99; series Measured, LANN and GANN]

Groundwater: interpolation between 14-day level measurements (testing)

GWL_a(t) = f(GWL_b(t-5), …, GWL_b(t+5))

[Figure: ANN for interpolating groundwater level measurements (testing): GWL (cm) versus example no.; series Observed GWL at gg92a and ANN GWL output]

Runoff forecasting Stuw7A (Global ANN testing)

Qt+1 = f (Pt, Qt, Qt-1)

[Figure: runoff forecasting at Stuw7A (GANN testing): discharge (m3/s) versus example no., series Observed Discharge and ANN Output, with an inset scatter plot of calculated against observed discharge]

Case study Groot Salland: conclusions

ANNs and model trees were able to predict flows and to perform the data infilling

M5 model trees perform as well as ANNs and are more transparent

"Global" ANN considers the runoff process in its entirety and may not be the optimal approach:
– during the training process rare information is considered as noise and is filtered out
– these rare events are, however, exactly the situations to which we would like to apply the ANN models most

For small catchments (Rietberg), the runoff times are very short (hours): it is necessary to adapt the monitoring frequencies to that interval

It is important to make records of all changes in water control/management: these changes alter the modelled system.

Case study Hoogheemraadschap van de Alblasserwaard en de Vijfheeren-landen (Overwaard)

Artificial Neural Networks and Fuzzy Logic Systems for Model Based Control:
– build the simulation model
– investigate the possibilities of using new technologies like ANN and fuzzy logic for model-based control of the water system

[Figure: map of the Overwaard area along the river Lek]


Overwaard: On-line performance of ANN-based controller

[Figure: simulated water level (m) in the upper basin and in the lower basin, 10/28/98 to 11/12/98; series External (ANN) control and Dynamic control]

Overwaard: On-line performance of fuzzy-based controller

[Figure: simulated water level (m) in the upper basin and in the lower basin, 10/28/98 to 11/12/98; series External (FAS) control and Dynamic control]

Overwaard: On-line performance of controller (discharge)

[Figure: cumulative discharge (mil. m3) versus time, 10/28/98 to 11/11/98, for the ANN controller (External (ANN) control vs Dynamic control) and the FAS controller (External (FAS) control vs Dynamic control)]

Cumulative error = -75,000 m3 = -0.55% (annotated on both the ANN and the FAS panels)

Case study Overwaard: conclusions (1)

The AQUARIUS model of Overwaard was found to simulate the water system very well; calibration results were acceptable

ANN and FAS replicate the dynamic control pumping strategy reasonably well.

Replacing the slow computational component by the fast-running (10 times faster) intelligent controllers could simplify the use of AQUARIUS in real-time control tasks


Case study Overwaard: conclusions (2)

Intelligent controllers are found to be able to reproduce the centralised behaviour (in terms of water levels and corresponding discharges) of the optimal control action by using easily measurable local information

External control with ANN and FAS required less than one third of the simulation time of the central optimal control

Applications of data-driven methods in flood management and control problems: conclusions

data-driven methods make it possible to build accurate predictive models

data-driven methods are good approximators of physically-based models:
– they are faster
– they can be incorporated into optimization and real-time control loops

using classification methods (decision trees) leads to simpler models and requires less accurate data

in numerical prediction, M5 model trees make it possible to build transparent models which can be understood by decision makers much better than ANNs

using a mixture of models makes it possible to improve the performance

if the water system changes, data-driven models have to be re-trained
