
Data-driven modelling: CASE STUDIES in RIVER and FLOOD MANAGEMENT and CONTROL

Dimitri P. Solomatine

UNESCO-IHE Institute for Water Education

Hydroinformatics Chair

Data-driven modelling and forecasting: our experience

– replicating the behavior of a hydrodynamic/hydrological river model with the objective of using the ANN in model-based optimal control of a reservoir (Solomatine & Torres, 1996);
– building an ANN-based intelligent controller for real-time control of water levels in a polder (Lobbrecht & Solomatine, 1999);
– modelling the rainfall-runoff process with ANNs (Dibike, Solomatine & Abbott, 1999);
– ANN in surge water level prediction for ship guidance;
– reconstructing the stage-discharge relationship using ANN (Bhattacharya & Solomatine, 2000);
– using M5 model trees to predict river discharge;
– using SVMs in prediction of water flows for flood management (Dibike, Velickov, Solomatine & Abbott, 2001);
– using ANN and other ML methods in optimization of urban systems (Alfonso et al., 2012; Xu et al., 2015).

D.P. Solomatine. Data-driven modelling. Applications. 2


Process (physically based) modelling of flow: river modelling context

Available data:
– rainfalls Rt
– lateral inflows QL
– upstream flows Qup(t)
– catchment and river physical properties (soil, geometry, roughnesses)
– initial and boundary conditions for flows Q0(x,t)

Inputs: QL(x,t), Qup(t), Q0(x,t), system properties
Output: flow Q(x,t)
Model: Q(x,t) = F(QL(x,t), Qup(t), Q0(x,t), system properties)

Questions:
– are the physical properties of the catchment known?
– is F good enough?

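For example, the momentum equation of the Saint-Venant system:

$$\frac{\partial Q}{\partial t} + \frac{\partial}{\partial x}\left(\frac{Q^2}{A}\right) + gA\,\frac{\partial h}{\partial x} + \frac{gA\,Q|Q|}{K^2} = 0$$

where Q is the discharge, A the flow cross-section area, h the water level, g the gravitational acceleration and K the conveyance.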

Using data-driven methods in rainfall-runoff modelling

Available data:
– rainfalls Rt
– upstream flows Qup(t)
– runoffs (flows) Qt

Inputs: lagged rainfalls Rt, Rt-1, Rt-2, …
Output to predict: Qt+T
Model: Qt+T = F(Rt, Rt-1, …, Qt, Qt-1, …, Qup(t), Qup(t-1), …) (routing)

Questions:
– how to find the appropriate lags? (lags embody the physical properties of the catchment)
– how to build F?
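Whatever method is chosen for F, the time series first have to be turned into input/output pairs with the chosen lags. A minimal Python sketch, assuming equally spaced series of equal length; the function name `make_lagged_dataset` and its parameters are illustrative, not taken from the study. Upstream flows can be appended to each row in the same way as the flow lags.

```python
def make_lagged_dataset(rain, flow, rain_lags, flow_lags, horizon):
    """Build (inputs, target) pairs for Q[t+horizon] = F(R[t-i], ..., Q[t-j], ...).

    rain, flow : equal-length sequences sampled at the same time step
    rain_lags  : lags i used for rainfall, e.g. [0, 1, 2] -> R[t], R[t-1], R[t-2]
    flow_lags  : lags j used for flow, e.g. [0, 1] -> Q[t], Q[t-1]
    horizon    : forecast lead time T, in time steps
    """
    if len(rain) != len(flow):
        raise ValueError("rain and flow must have the same length")
    max_lag = max(list(rain_lags) + list(flow_lags) + [0])
    inputs, targets = [], []
    # t must leave room for the largest lag behind it and the horizon ahead of it
    for t in range(max_lag, len(flow) - horizon):
        row = [rain[t - i] for i in rain_lags] + [flow[t - j] for j in flow_lags]
        inputs.append(row)
        targets.append(flow[t + horizon])
    return inputs, targets


# Toy example: predict Q one step ahead from R[t], R[t-1] and Q[t]
rain = [0.0, 1.0, 2.0, 0.5, 0.0, 0.0]
flow = [10.0, 11.0, 15.0, 20.0, 18.0, 14.0]
X, y = make_lagged_dataset(rain, flow, rain_lags=[0, 1], flow_lags=[0], horizon=1)
print(X[0], y[0])  # [1.0, 0.0, 11.0] 15.0
```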


Case study SIEVE: flood management problem

D.P. Solomatine, K.N. Dulal. Model trees as an alternative to neural networks in rainfall-runoff modeling. Hydrological Sciences Journal, 48(3), 2003, 399-411.

Italy, Tuscany, Arno river basin, Sieve subcatchment


Schematisation
– mountainous catchment in Southern Europe
– area of 822 km²

SIEVE: data
– three months of hourly discharge (Q), precipitation (R) and evapotranspiration (E) data available (Dec. 1959 to Feb. 1960, 2160 data points)
– for this experiment, the data were represented as hourly effective rainfall RE and runoff (discharge) Q
– 1854 examples used for training, 300 for verification
– the problem: predict flow Q 1, 3 and 6 hours ahead


SIEVE: visualization of data

[Figure: FLOW1, effective rainfall [mm] and discharge [m3/s] vs. time [hrs], 0 to 2500 hrs]

Analysis of relationships, determining lags, choice of variables

– Covariance:
$$\sigma_{ij} = E\big[(x_i - E(x_i))(x_j - E(x_j))\big]$$
– Correlation coefficient:
$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}$$
– Average mutual information (AMI)

[Figure: rainfall event and the resulting flow, Q [m3/s] and R [mm] vs. t [hrs], 90 to 150 hrs]
[Figure: correlation between Qt+1 and REt-i, correlation coefficient vs. lag in REt-i, 0 to 12]
[Figure: AMI between Q(t+1) and R(t-tau) vs. time lag tau, 0 to 10]
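The lag analysis can be sketched as follows: for each lag, compute the correlation coefficient and a histogram-based estimate of the average mutual information between Q(t+1) and R(t-lag), and look for the lags where they peak. This is a minimal sketch using only the Python standard library; the equal-width binning, the bin count and the function names are assumptions, not the settings used in the study.

```python
import math
from collections import Counter


def pearson(x, y):
    """Correlation coefficient: cov(x, y) / sqrt(var(x) * var(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def _bin(values, bins):
    """Map each value to an equal-width bin index 0..bins-1 (assumed binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]


def average_mutual_information(x, y, bins=8):
    """Histogram estimate of AMI (in bits) between two equal-length series."""
    bx, by = _bin(x, bins), _bin(y, bins)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum of p(i,j) * log2(p(i,j) / (p(i) p(j))), written with raw counts
    return sum((c / n) * math.log2(c * n / (px[i] * py[j])) for (i, j), c in pxy.items())


def lag_profile(rain, flow, max_lag, horizon=1, bins=8):
    """(lag, correlation, AMI) between Q(t+horizon) and R(t-lag) for lag = 0..max_lag."""
    ts = range(max_lag, len(flow) - horizon)
    result = []
    for lag in range(max_lag + 1):
        r = [rain[t - lag] for t in ts]
        q = [flow[t + horizon] for t in ts]
        result.append((lag, pearson(r, q), average_mutual_information(r, q, bins)))
    return result


# Synthetic catchment (illustration only): flow responds to rainfall after 3 steps
rain = [float((7 * k) % 5) for k in range(60)]
flow = [0.0] * 3 + [2.0 * rain[t - 3] + 1.0 for t in range(3, 60)]
profile = lag_profile(rain, flow, max_lag=4)
print(max(profile, key=lambda row: row[1])[0])  # 2, i.e. Q(t+1) depends on R(t-2)
```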


SIEVE: variables and posing the problem

After correlation analysis, a number of models were tried, e.g. for Qt+1:
1. Qt+1 = f(REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1)
2. Qt+1 = f[MA(REt, REt-1, REt-2), MA(REt-2, REt-3, REt-4), REt-3, REt-5]
3. Qt+1 = f[MA(REt, REt-1, REt-2, REt-3, REt-4, REt-5), MA(REt-3, REt-6), REt-3, REt-5]
4. Qt+1 = f[MA(REt-4, REt-5), Qt-1]
5. Qt+1 = f(REt-4, REt-5)
6. Qt+1 = f[MA(REt-4, REt-5), Qt-2]
MA = moving average

Models accepted:
– Q(t+1) = f(REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1, Qt-2)
– Q(t+3) = f(REt, REt-1, REt-2, REt-3, Qt, Qt-1)
– Q(t+6) = f(REt, Qt)
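Several of the candidate input sets use moving averages of lagged effective rainfall, e.g. MA(REt-4, REt-5) in models 4 and 6. A small Python helper, assuming MA is a plain arithmetic mean of the listed lagged values (the slides do not state the weighting); the function names are illustrative.

```python
def lagged_ma(series, t, lags):
    """Arithmetic mean of series[t - lag] over the given lags (assumed unweighted MA)."""
    values = [series[t - lag] for lag in lags]
    return sum(values) / len(values)


def model4_inputs(re, q, t):
    """Inputs of candidate model 4: Q[t+1] = f[MA(RE[t-4], RE[t-5]), Q[t-1]]."""
    return [lagged_ma(re, t, [4, 5]), q[t - 1]]


re = [0.0, 2.0, 4.0, 1.0, 0.0, 0.0, 0.5]
q = [10.0, 10.5, 12.0, 16.0, 19.0, 17.0, 15.0]
print(model4_inputs(re, q, t=6))  # [3.0, 17.0]
```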

SIEVE: numerical prediction

Numerical prediction is done by M5 model trees (MT) and ANNs.

ANN model: multi-layer perceptron (MLP), three layers
– optimal number of hidden nodes: Qt+1: 6; Qt+3: 5; Qt+6: 3
– training algorithm: back-propagation with adaptive learning rate

MT model: the same sets of inputs, training and verification data points as those of the ANN


SIEVE: M5 model tree details are comprehensible for a decision maker (predicting Q(t+3))

Sample output of the model tree (generated tree, reduced size):

Qt <= 51.2 :
| Qt <= 28.7 : LM1 (903/5.66%)
| Qt > 28.7 : LM2 (379/13.1%)
Qt > 51.2 : LM3 (572/66.7%)

Linear models:
LM1: Qt+3 = -0.0118 + 0.317REt + 0.124REt-1 + 0.0844REt-2 - 0.109REt-3 + 1.09Qt - 0.0826Qt-1
LM2: Qt+3 = -0.262 + 11.9REt + 0.182REt-1 + 8.9REt-2 - 0.198REt-3 + 3.66Qt - 2.67Qt-1
LM3: Qt+3 = 15.5 + 25.7REt + 7.59REt-1 - 0.0923REt-3 + 1.44Qt - 0.732Qt-1
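Because the tree is small, it can be transcribed directly into code, which shows why it is easy to inspect. A Python sketch of the reduced tree and its three linear models; the coefficients are copied from the slide, while the function name and argument order are illustrative.

```python
def sieve_q_t3(re_t, re_t1, re_t2, re_t3, q_t, q_t1):
    """Reduced M5 model tree for Q(t+3) in the Sieve case (coefficients from the slide).

    re_t..re_t3 : effective rainfall RE at t, t-1, t-2, t-3 [mm]
    q_t, q_t1   : discharge Q at t and t-1 [m3/s]
    Returns (leaf name, predicted Q(t+3)).
    """
    if q_t <= 51.2:
        if q_t <= 28.7:
            return "LM1", (-0.0118 + 0.317 * re_t + 0.124 * re_t1 + 0.0844 * re_t2
                           - 0.109 * re_t3 + 1.09 * q_t - 0.0826 * q_t1)
        return "LM2", (-0.262 + 11.9 * re_t + 0.182 * re_t1 + 8.9 * re_t2
                       - 0.198 * re_t3 + 3.66 * q_t - 2.67 * q_t1)
    # LM3 has no RE[t-2] term in the printed model
    return "LM3", (15.5 + 25.7 * re_t + 7.59 * re_t1
                   - 0.0923 * re_t3 + 1.44 * q_t - 0.732 * q_t1)


leaf, q3 = sieve_q_t3(0.0, 0.0, 0.0, 0.0, q_t=20.0, q_t1=20.0)
print(leaf, round(q3, 3))  # LM1 20.136
```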

SIEVE: predicting Q(t+1)
Q(t+1) = f(REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1, Qt-2)

Verification performance:
– ANN: RMSE = 5.175, NRMSE = 0.106, COE = 0.9886
– MT: RMSE = 3.6123, NRMSE = 0.074, COE = 0.9944

[Figure: prediction of Qt+1, verification performance: observed, modelled (ANN) and modelled (MT) Q [m3/s] vs. t [hrs], 0 to 180 hrs]
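The slides report RMSE, NRMSE and COE without defining them. A common set of definitions, assumed here, is NRMSE = RMSE divided by the standard deviation of the observations, and COE = the Nash-Sutcliffe coefficient of efficiency; with these definitions COE = 1 - NRMSE², which is consistent, up to rounding, with the Sieve values reported above (e.g. 1 - 0.106² ≈ 0.989 for the ANN). A Python sketch:

```python
import math


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nrmse(obs, sim):
    """RMSE normalised by the (population) standard deviation of the observations (assumed)."""
    mean = sum(obs) / len(obs)
    std = math.sqrt(sum((o - mean) ** 2 for o in obs) / len(obs))
    return rmse(obs, sim) / std


def coe(obs, sim):
    """Nash-Sutcliffe coefficient of efficiency: 1 - SSE / sum of squared deviations."""
    mean = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean) ** 2 for o in obs)
    return 1.0 - sse / sst


obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 37.0]
print(round(rmse(obs, sim), 4), round(coe(obs, sim), 4))  # 2.5495 0.948
```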


SIEVE: predicting Q(t+3)
Q(t+3) = f(REt, REt-1, REt-2, REt-3, Qt, Qt-1)

Verification performance:
– ANN: RMSE = 11.353, NRMSE = 0.234, COE = 0.9452
– MT: RMSE = 12.548, NRMSE = 0.258, COE = 0.9331

[Figure: prediction of Qt+3, verification performance: observed, modelled (ANN) and modelled (MT) Q [m3/s] vs. t [hrs], 0 to 180 hrs]

SIEVE: predicting Q(t+6)
Q(t+6) = f(REt, Qt)

Verification performance:
– ANN: RMSE = 19.402, NRMSE = 0.399, COE = 0.8401
– MT: RMSE = 21.547, NRMSE = 0.443, COE = 0.8028

[Figure: prediction of Qt+6, verification performance: observed, modelled (ANN) and modelled (MT) Q [m3/s] vs. t [hrs], 0 to 180 hrs]


Comparison of various methods used in the Sieve case study (RMSE, m3/s)

Columns: Qt+1 training, Qt+1 test, Qt+3 training, Qt+3 test, Qt+6 training, Qt+6 test
– ANN: 5.826, 5.175, 13.549, 11.353, 28.970, 19.402
– M5 model tree: 4.550, 3.612, 14.384, 12.548, 27.069, 21.547
– M5 + AdaBoost.RT: 3.488, 4.132, 9.0415, 14.042, 22.493, 24.078
– M5 + AdaBoost.RT, tuned on test set: 3.789, 3.754, 12.749, 11.551, 27.958, 19.468
– SVM: 3.212, 10.867, 20.201
– Locally weighted regression: 9.097, 10.666, 12.556, 12.703
– k-nearest neighbor: 1.506, 11.114, 17.57, 13.703
– Composite (model tree + 9-NN): 4.674, 3.350, 13.296, 14.593

Posing the classification problem for flood management

Classification output is a class L, M or H:
– Low: flow up to 50 m3/s
– Medium: flow up to 350 m3/s
– High: flow up to 750 m3/s

[Figure: past records in the space of Rainfall (t-2), Rainfall (t-3) and Flow Q(t), grouped into the classes to be identified (Low, Medium and High flows Q(t+1)). A new record: to which class does it belong?]
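Before a classifier can be trained, the observed flows must be labelled with these classes. A Python sketch using the class limits above; treating a flow of exactly 50 or 350 m3/s as belonging to the lower class is an assumption, as is the name `flow_class`.

```python
def flow_class(q):
    """Label a discharge q [m3/s] as 'L', 'M' or 'H' (boundary values go to the lower class)."""
    if q <= 50.0:
        return "L"
    if q <= 350.0:
        return "M"
    return "H"  # the slide gives 750 m3/s as the upper limit of the High class


observed_q = [12.0, 50.0, 51.0, 349.9, 420.0]
print([flow_class(q) for q in observed_q])  # ['L', 'L', 'M', 'M', 'H']
```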


SIEVE: the resulting decision tree (12 leaves) that classifies Q(t+3) into Low, Medium, High (error = 6%)

Qt <= 51.45
| REt-1 <= 0.6686: L (815.0/10.0)
| REt-1 > 0.6686
| | Qt <= 25.59: L (24.0)
| | Qt > 25.59: M (24.0/7.0)
Qt > 51.45
| REt-1 <= 2.3955
| | Qt <= 59.04
| | | REt <= 0.0255
| | | | Qt <= 52.67: L (5.0)
| | | | Qt > 52.67: M (7.0/1.0)
| | | REt > 0.0255: M (63.0/4.0)
| | Qt > 59.04
| | | Qt-1 <= 348.55: M (271.0)
| | | Qt-1 > 348.55
| | | | Qt <= 630.2: M (7.0)
| | | | Qt > 630.2: H (3.0)
| REt-1 > 2.3955
| | Qt <= 247.68
| | | REt <= 3.3031: M (3.0)
| | | REt > 3.3031: H (7.0/3.0)
| | Qt > 247.68: H (9.0)

IF Flow(t) <= 51.45 and Rainfall(t-1) > 0.67 and Flow(t) < 25.6 THEN Low flow

SIEVE: the resulting pruned tree (4 leaves only) that does the same (error = 6.7%)

Qt <= 50.85: Low (1282.0/17.0)
Qt > 50.85
| Qt <= 214.28: Medium (511.0/5.0)
| Qt > 214.28
| | Qt <= 247.68: Medium (15.0/7.0)
| | Qt > 247.68: High (46.0)
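The pruned tree depends on the current flow Qt only, so it reduces to three thresholds. A Python sketch that transcribes it and computes the percentage of incorrectly classified instances, the error measure used on these slides; the example records are made up for illustration.

```python
def pruned_tree_class(q_t):
    """Pruned Sieve decision tree: class of Q(t+3) from the current flow Qt [m3/s]."""
    if q_t <= 50.85:
        return "Low"
    if q_t <= 214.28:
        return "Medium"
    # the last split (Qt <= 247.68) separates Medium from High
    return "Medium" if q_t <= 247.68 else "High"


def error_percent(predicted, observed):
    """Percentage of incorrectly classified instances."""
    wrong = sum(p != o for p, o in zip(predicted, observed))
    return 100.0 * wrong / len(observed)


# hypothetical (Qt, observed class of Q(t+3)) records
records = [(30.0, "Low"), (120.0, "Medium"), (230.0, "High"), (400.0, "High")]
pred = [pruned_tree_class(q) for q, _ in records]
print(pred, error_percent(pred, [c for _, c in records]))
# ['Low', 'Medium', 'Medium', 'High'] 25.0
```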


SIEVE: predicting class of flow Qt+3

[Figure: classification of Qt+3 using the pruned decision tree: observed and predicted class (Low, Medium, High) and flow [m3/s] vs. t [hrs], 10/02/70 04:00 to 22/02/70 16:00]

SIEVE: performance of 3 classification methods: decision tree, Bayesian and k-nearest neighbor

Incorrectly classified instances, %:

| Method | Training | Test |
|---|---|---|
| Decision tree (unpruned) | 1.02 | 5 |
| Decision tree (pruned) | 2.10 | 6.67 |
| Naïve Bayes | 6.09 | 10.67 |
| k-nearest neighbor (k = 3) | 1.83 | 7.33 |


Case study HUAIHE: rainfall-runoff modelling in flood management of the Huai River by M5 model trees and ANNs

D.P. Solomatine, Y. Xue. M5 model trees compared to neural networks: application to flood forecasting in the upper reach of the Huai River in China. ASCE Journal of Hydrologic Engineering, 9(6), 2004, pp. 491-501.

Huai River (Huaihe) basin

[Figure: map of the Huai River basin in China (provinces He'nan, Shandong, Jiangsu, Anhui and Hubei; Yellow River, Yangtze River, Hongze Lake, Yellow Sea) showing the project area, rivers, lakes and reservoirs, provincial boundaries, provincial capitals and main cities; scale bar 0 to 200 km]


Huaihe: rainfall-runoff modelling, schematisation of the upper reach

[Figure: schematisation of the upper reach: sub-catchments 1 to 5 with area-averaged rainfall, and discharges QX (Xixian), QC, QZ, QD and QN]

Huaihe: data

Based on the physical properties of the catchment:

Inputs:
– known past rainfalls (Pat, …, Pat-n1, PaMovkt, …, PaMovkt-n2)
– known past runoffs (flows) (QXt, …, QXt-3, QCt, …, QCt-4, QZt, …, QZt-5)

Output:
– discharge (flow) one day ahead, QXt+1

Which inputs to use? To be determined by the input-output correlation analysis and data preprocessing.


Huai River: correlation analysis and average mutual information used as the basis for selecting input variables

[Figure: correlation r of the related input variables (QC, QZ, QX, Pa, PaMov4) with QXt+1 vs. time lag (days, 0 to 5), full-year data]

Huaihe: data preparation and selection of variables

Data preparation:
– 20 years of training data (daily rainfall and runoff)
– filtering data (only flood events are considered)
– selection of input variables (analysis of correlation, average mutual information)

Input variables:

| No. | Var | Description | Correl. coeff. | Time lags |
|---|---|---|---|---|
| 1 | Pat | Area-average rainfall at time t | 0.35 | 1 |
| 2 | Pat-1 | Area-average rainfall at time t-1 | 0.65 | 2 |
| 3 | Pamov2t | 2-day moving average of Pa at t | 0.72 | 1 |
| 4 | Pamov2t-1 | 2-day moving average of Pa at t-1 | 0.48 | 2 |
| 5 | QCt | Upstream discharge at t | 0.82 | 1 |
| 6 | QCt-1 | Upstream discharge at t-1 | 0.56 | 2 |
| 7 | QXt | Discharge at the predicted station at t | 0.77 | 1 |


Huaihe: model trees

– nodes are IF-THEN-ELSE conditions on attribute values
– leaves are linear regression models

[Figure: example model tree with splits Discharge(t) < 154, rainfallMov(t) <= 4.5, RainfallMov <= 18.5 and rainfallMov(t-1) <= 13.5 (Yes/No branches), each leaf being a regression model for Discharge(t+1) (models 1 to 6), and the corresponding partition of the input space (X1, X2) into local models 1 to 6]

Huaihe: full-year data, the resulting model tree (1)

QXt <= 154 :
| PaMov2t <= 4.5 : LM1 (1499/4.86%)
| PaMov2t > 4.5 :
| | PaMov2t <= 18.5 : LM2 (315/15.9%)
| | PaMov2t > 18.5 : LM3 (91/86.9%)
QXt > 154 :
| PaMov2t-1 <= 13.5 :
| | PaMov2t <= 4.5 : LM4 (377/15.9%)
| | PaMov2t > 4.5 : LM5 (109/89.7%)
| PaMov2t-1 > 13.5 :
| | PaMov2t <= 26.5 : LM6 (135/73.1%)
| | PaMov2t > 26.5 : LM7 (49/270%)


Huaihe: full-year data, the linear models on leaves

LM1: QXt+1 = 2.28 + 0.714PaMov2t-1 - 0.21PaMov2t + 1.02Pat-1 + 0.193Pat - 0.0085QCt-1 + 0.336QCt + 0.771QXt
LM2: QXt+1 = -24.4 - 0.0481PaMov2t-1 - 4.96PaMov2t + 3.91Pat-1 + 4.51Pat - 0.363QCt-1 + 0.712QCt + 1.05QXt
LM3: QXt+1 = -183 + 10.3PaMov2t-1 + 8.37PaMov2t - 5.32Pat-1 + 1.49Pat - 0.0193QCt-1 + 0.106QCt + 2.16QXt
LM4: QXt+1 = 47.3 + 1.06PaMov2t-1 - 2.05PaMov2t + 1.91Pat-1 + 4.01Pat - 0.3QCt-1 + 1.11QCt + 0.383QXt
LM5: QXt+1 = -151 - 0.277PaMov2t-1 - 37.8PaMov2t + 31.1Pat-1 + 30.3Pat - 0.672QCt-1 + 0.746QCt + 0.842QXt
LM6: QXt+1 = 138 - 5.95PaMov2t-1 - 39.5PaMov2t + 29.6Pat-1 + 35.4Pat - 0.303QCt-1 + 0.836QCt + 0.461QXt
LM7: QXt+1 = -131 - 27.2PaMov2t-1 + 51.9PaMov2t + 0.125Pat-1 - 5.29Pat - 0.0941QCt-1 + 0.557QCt + 0.754QXt
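Together, the tree and its leaf models form a piecewise-linear forecast of QXt+1. A Python sketch that routes an input record to a leaf and evaluates that leaf's linear model; the split values and coefficients are copied from the full-year tree and leaf models, while the dictionary layout and the names `huaihe_leaf` and `huaihe_forecast` are illustrative.

```python
# Coefficient order: const, PaMov2[t-1], PaMov2[t], Pa[t-1], Pa[t], QC[t-1], QC[t], QX[t]
LEAF_MODELS = {
    "LM1": (2.28, 0.714, -0.21, 1.02, 0.193, -0.0085, 0.336, 0.771),
    "LM2": (-24.4, -0.0481, -4.96, 3.91, 4.51, -0.363, 0.712, 1.05),
    "LM3": (-183, 10.3, 8.37, -5.32, 1.49, -0.0193, 0.106, 2.16),
    "LM4": (47.3, 1.06, -2.05, 1.91, 4.01, -0.3, 1.11, 0.383),
    "LM5": (-151, -0.277, -37.8, 31.1, 30.3, -0.672, 0.746, 0.842),
    "LM6": (138, -5.95, -39.5, 29.6, 35.4, -0.303, 0.836, 0.461),
    "LM7": (-131, -27.2, 51.9, 0.125, -5.29, -0.0941, 0.557, 0.754),
}


def huaihe_leaf(qx_t, pamov2_t, pamov2_t1):
    """Route a record through the full-year Huaihe model tree."""
    if qx_t <= 154:
        if pamov2_t <= 4.5:
            return "LM1"
        return "LM2" if pamov2_t <= 18.5 else "LM3"
    if pamov2_t1 <= 13.5:
        return "LM4" if pamov2_t <= 4.5 else "LM5"
    return "LM6" if pamov2_t <= 26.5 else "LM7"


def huaihe_forecast(pamov2_t1, pamov2_t, pa_t1, pa_t, qc_t1, qc_t, qx_t):
    """Forecast QX[t+1]: select the leaf, then evaluate its linear model."""
    leaf = huaihe_leaf(qx_t, pamov2_t, pamov2_t1)
    const, *coefs = LEAF_MODELS[leaf]
    inputs = (pamov2_t1, pamov2_t, pa_t1, pa_t, qc_t1, qc_t, qx_t)
    return leaf, const + sum(c * x for c, x in zip(coefs, inputs))


leaf, qx_next = huaihe_forecast(0.0, 0.0, 0.0, 0.0, 50.0, 50.0, 100.0)
print(leaf, round(qx_next, 3))  # LM1 95.755
```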

Huaihe: full-year data, resulting hydrograph (training, fragment)

[Figure: predicted (pre) and observed (obv) values vs. time step (31898 to 32018), 0 to 5000; annotation: the max training error of the model tree using flood season data]


Huaihe: full-year data, resulting hydrograph (verification)

[Figure: predicted (pre) and observed (obv) values vs. time, 95/5/1 to 96/12/21, 0 to 4000; annotation: the max testing error of the model tree using flood season data]

Huaihe: flood season data, relative performance of M5 model trees and ANNs

– separate models for the flood season (FS, May-October) were built
– the M5 tree had 7 inputs and 7 leaves
– the ANN (MLP) was trained on the same data

| Performance | ANN testing | ANN training | M5 testing | M5 training |
|---|---|---|---|---|
| RMSE | 96 | 100 | 87 | 98 |
| MAE | 24.9 | 33.0 | 24.2 | 31.4 |
| Maximum error | 1446 | 1498 | 1651 | 1766 |
| R | 0.95 | 0.97 | 0.97 | 0.97 |


M5 and ANN using flood season data (testing, fragment)

[Figure: observed (OBS), FS-M5 and FS-ANN discharge (m3/s) vs. time, 96-6-1 to 96-8-30]

Performance of M5 and ANN in "full-year" and "flood season (FS)" experiments

| Experiment | Years | Equations (M5) / hidden nodes (ANN) | Samples | RMSE | Mean abs. error | Max abs. error | Correlation coeff. |
|---|---|---|---|---|---|---|---|
| Full-year M5, training | 76-79 | 35 | 5109 | 69.6 | 18.7 | 1695 | 0.97 |
| Full-year M5, test | 90-96 | 35 | 2565 | 84.5 | 18.9 | 2208 | 0.95 |
| Full-year naïve, test | 90-96 | n/a | 2565 | 183.0 | 37.1 | 3009 | 0.76 |
| Full-year linear, test | 90-96 | n/a | 2565 | 160.0 | 39.9 | 3008 | 0.79 |
| FS-M5, training | 76-89 | 7 | 2625 | 98 | 31.4 | 1766 | 0.97 |
| FS-M5, test | 90-96 | 7 | 1525 | 87 | 24.2 | 1651 | 0.97 |
| FS-ANN, training | 76-89 | 8 | 2625 | 100 | 33.0 | 1498 | 0.97 |
| FS-ANN, cross-validation | 90-93 | 8 | 653 | 79 | 25.1 | 1130 | 0.98 |
| FS-ANN, test | 94-96 | 8 | 872 | 96 | 24.9 | 1446 | 0.95 |


Using a mixture of experts (committee machines): each expert (model) is for a particular hydrological condition

The input is passed to the module whose condition it satisfies; each module is an M5 model tree or an ANN:
– Condition 1: Qx(t-1) > 1000 → Module 1
– Condition 2: Qx(t-1) ≤ 1000 and Qx(t) > 200 → Module 2
– Condition 3: Pa(t-1) > 50, PaMov2(t-2) < 5 and PaMov2(t-4) < 5 → Module 3 (M5)
– Condition x → Module x (?)
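The modular scheme can be sketched as a gating function that checks the hydrological conditions in order and hands the record to the first matching expert. In this Python sketch the experts are placeholder callables standing in for trained M5 trees or ANNs (they simply return their module number); the record field names, the evaluation order, the fallback expert, and reading condition 2 as Qx(t-1) ≤ 1000 and Qx(t) > 200 are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Expert = Callable[[Record], float]


def make_gate(rules: List[Tuple[Callable[[Record], bool], Expert]], fallback: Expert):
    """Return a predictor that uses the first expert whose condition holds."""
    def predict(rec: Record) -> float:
        for condition, expert in rules:
            if condition(rec):
                return expert(rec)
        return fallback(rec)
    return predict


# Hypothetical field names; placeholder experts just return their module number
rules = [
    (lambda r: r["qx_t1"] > 1000, lambda r: 1.0),                      # module 1
    (lambda r: r["qx_t1"] <= 1000 and r["qx_t"] > 200, lambda r: 2.0),  # module 2
    (lambda r: r["pa_t1"] > 50 and r["pamov2_t2"] < 5 and r["pamov2_t4"] < 5,
     lambda r: 3.0),                                                    # module 3
]
model = make_gate(rules, fallback=lambda r: 0.0)

rec = {"qx_t1": 500.0, "qx_t": 300.0, "pa_t1": 0.0, "pamov2_t2": 0.0, "pamov2_t4": 0.0}
print(model(rec))  # 2.0
```

In a real setup each expert would be trained only on the records satisfying its own condition, which is what lets it specialise on one hydrological regime.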

Performance of the M5 mixture model in training (fragment shown is the 1982 flood)

[Figure: M5 modular model, training (fragment): observed discharge (OBS) and predictions (PRE) by the combination of modules 1 & 2 and by module 1 alone, discharge (m3/s) vs. time, 82-7-1 to 82-8-30]


Performance of the M5 mixture model in testing (fragment shown is the 1996 flood)

[Figure: M5 modular model, testing (fragment, 1996 flood): observed discharge (OBS) and predictions (PRE) by the combination of modules 1 & 2 and by module 1 alone, discharge (m3/s) vs. time, 96-6-1 to 96-10-29]

Huaihe: comparison of the mixture of M5 models with the "flood season" M5 model

| Performance | Mixture of M5 models, train 76-89 | Mixture of M5 models, test 90-96 | Flood season model (M5), train 76-89 | Flood season model (M5), test 90-96 |
|---|---|---|---|---|
| Correlation coefficient | 0.99 | 0.98 | 0.96 | 0.95 |
| Mean absolute error | 83.0 | 126.8 | 97.3 | 114.1 |
| Max absolute error | 548.0 | 509.1 | 1173.5 | 1183.6 |
| Root mean squared error | 127.0 | 176.1 | 176.6 | 222.2 |
| Total number of instances | 96 | 32 | 96 | 32 |


Model trees (MT) and ANN: comparison

– training of an MT is much faster than that of an ANN, and it always converges
– the results can be easily understood by decision makers
– by applying pruning (that is, making trees smaller by combining subtrees in one node) it is possible to generate a range of MTs, from an inaccurate but simple linear regression (one leaf only) to a much more accurate but complex combination of local models (many branches and leaves)
– an MT combines a number of "local" models that could be more accurate for some extreme cases
– in some cases an ANN outperforms an MT in accuracy

Case study Swarupgunj: modelling stage-discharge relationship (rating curve)

B. Bhattacharya, D.P. Solomatine. Application of artificial neural network in reconstructing stage-discharge relationship. Proc. 4th Int. Conference on Hydroinformatics, Cedar Rapids, July 2000.


Rating curve

The relationship between stage and discharge is expressed through a rating curve.

A simplified relationship is:

$$Q = \alpha\,(h - h_0)^{\beta}$$

[Figure: a typical rating curve, stage vs. discharge (scaled)]

Problem: the relationship is complex and depends upon past values etc.

Stage-discharge data

– data from one discharge measuring station in India (Swarupgunj on the river Bhagirathi)
– the river course was more or less stable during the data collection period
– ten years of data are available
– two-thirds of the data are used for training and the rest for verification


Input parameter selection

The following input parameters were considered:
– Stage (t+1)
– Stage (t)
– Stage (t-1)
– Discharge (t)

Output parameter:
– Discharge (t+1)

ANN and model tree models were developed with these inputs. A conventional rating curve was prepared using stage (t) and discharge (t) data.
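A conventional rating curve of the form Q = α(h - h0)^β can be fitted by linear least squares after taking logarithms, ln Q = ln α + β ln(h - h0). A Python sketch, assuming h0 (the stage of zero flow) is known and all stages exceed it; the slides do not say how the curve used in the study was fitted, and the function names are illustrative.

```python
import math


def fit_rating_curve(stage, discharge, h0):
    """Fit Q = alpha * (h - h0) ** beta by least squares on ln Q = ln alpha + beta ln(h - h0)."""
    xs = [math.log(h - h0) for h in stage]
    ys = [math.log(q) for q in discharge]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    alpha = math.exp(my - beta * mx)
    return alpha, beta


def rating_curve(h, alpha, beta, h0):
    """Discharge given by the fitted rating curve for stage h."""
    return alpha * (h - h0) ** beta


# synthetic check: data generated with alpha = 30, beta = 1.6, h0 = 2.0
h0 = 2.0
stages = [3.0, 4.0, 5.5, 7.0, 9.0]
flows = [30.0 * (h - h0) ** 1.6 for h in stages]
alpha, beta = fit_rating_curve(stages, flows, h0)
print(round(alpha, 6), round(beta, 6))  # 30.0 1.6
```

Such a curve uses only the current stage, which is exactly the limitation the data-driven models address by also using past stages and discharges.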

Generated M5 model trees

– unpruned tree with 94 leaves (verification RMSE = 76.0)
– pruned tree with 4 leaves (verification RMSE = 69.1):

Qt_1 <= 1500 :
| Qt_1 <= 1130 : Qt = -243 - 187ht_1 + 299ht + 0.667Qt_1
| Qt_1 > 1130 : Qt = -214 - 387ht_1 + 448ht + 0.885Qt_1
Qt_1 > 1500 :
| ht <= 7.85 : Qt = -455 - 491ht_1 + 628ht + 0.727Qt_1
| ht > 7.85 : Qt = -1720 - 605ht_1 + 924ht + 0.66Qt_1

– pruned tree with 2 leaves (verification RMSE = 69.1):

Qt_1 <= 1500 : Qt = -204 - 301ht_1 + 383ht + 0.788Qt_1
Qt_1 > 1500 : Qt = -728 - 550ht_1 + 721ht + 0.745Qt_1
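The two-leaf tree is compact enough to serve directly as a data-driven replacement for the rating curve. A direct Python transcription, with coefficients copied from the slide; here h_t and h_t1 are the stages at t and t-1 and q_t1 the discharge at t-1, and the function name is illustrative.

```python
def swarupgunj_q(h_t, h_t1, q_t1):
    """Two-leaf M5 model tree for discharge Q[t] from stage h[t], h[t-1] and Q[t-1]."""
    if q_t1 <= 1500:
        return -204 - 301 * h_t1 + 383 * h_t + 0.788 * q_t1
    return -728 - 550 * h_t1 + 721 * h_t + 0.745 * q_t1


print(swarupgunj_q(h_t=5.0, h_t1=5.0, q_t1=1000.0))  # about 994.0
```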


Training & verification statistics (1)

The model tree was trained with different pruning factors; finally, a pruning factor of 2 was chosen:

| Model | Training RMSE | Training NRMSE | Training COE | Verification RMSE | Verification NRMSE | Verification COE |
|---|---|---|---|---|---|---|
| Prune factor = 0 | 79.3 | 0.132 | 0.991 | 76 | 0.111 | 0.994 |
| Prune factor = 1 | 89.8 | 0.15 | 0.989 | 69.1 | 0.101 | 0.995 |
| Prune factor = 2 | 92 | 0.153 | 0.988 | 69.7 | 0.101 | 0.995 |

Training & verification statistics for all the models:

| Model | Training RMSE | Training NRMSE | Training COE | Verification RMSE | Verification NRMSE | Verification COE |
|---|---|---|---|---|---|---|
| Model tree | 92.0 | 0.153 | 0.988 | 69.7 | 0.101 | 0.995 |
| ANN | 90.5 | 0.151 | 0.988 | 70.5 | 0.103 | 0.995 |
| Rating curve | 143.3 | 0.239 | 0.974 | 111.2 | 0.162 | 0.989 |

Training & verification statistics (2)

Percentage of verification data with prediction error:

| Model | > 5% | > 10% | > 15% | > 20% |
|---|---|---|---|---|
| Model tree | 20.3 | 1.6 | 0.2 | 0.2 |
| ANN | 21.4 | 3.1 | 0.6 | 0.3 |
| Rating curve | 42.4 | 11.8 | 5.3 | 1.9 |
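Percentages of this kind can be computed as the share of verification points whose relative error |Qsim - Qobs| / Qobs exceeds each threshold. A Python sketch; measuring the error relative to the observed discharge is an assumption, since the slide does not define "prediction error", and the function name is illustrative.

```python
def exceedance_percentages(obs, sim, thresholds=(0.05, 0.10, 0.15, 0.20)):
    """Percentage of points whose relative error |sim - obs| / obs exceeds each threshold."""
    rel = [abs(s - o) / o for o, s in zip(obs, sim)]
    return {t: 100.0 * sum(e > t for e in rel) / len(rel) for t in thresholds}


obs = [100.0, 200.0, 400.0, 800.0]
sim = [104.0, 225.0, 330.0, 800.0]   # relative errors 0.04, 0.125, 0.175, 0.0
print(exceedance_percentages(obs, sim))
```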


Verification: ANN model

[Figure: known discharge and discharge computed by the ANN (scaled) for the validation events, 0 to 650]

Verification: Model Tree

[Figure: known discharge and discharge computed by the MT (scaled) for the validation events, 0 to 650]


Swarupgunj: conclusions

– the data-driven models exhibit far better performance than the conventional rating curve
– the predictive accuracy of the MT and ANN models is almost the same
– when a large amount of data is available (as for stage-discharge relationships), accurate data-driven models can be developed

Using ANN in replicating a hydrodynamic / hydrologic model of a river basin

D.P. Solomatine, A. Avila Torres. Neural Network Approximation of a Hydrodynamic Model in Optimizing Reservoir Operation. Proc. 2nd International Conference on Hydroinformatics, Zurich, Switzerland, August 1996.


Case study APURE: enhancement of the Apure river basin in Venezuela

Multi-criteria optimization: energy production, navigability
– find the reservoirs' operation policies providing high values of the two conflicting criteria:
  (1) follow the energy production target as closely as possible;
  (2) maximize the period of time when the river is navigable.
– the MCDM problem is reduced to a single-criterion problem: navigability criterion (2) is considered a "soft" constraint, and the problem is solved a number of times as a single-criterion problem with respect to the energy criterion (1).


Replication of Mike11/NAM model by ANN:Apure river basin case study

The MIKE-11/NAM modelling system (DHI) was used to model the hydrodynamics of the river and the hydrology of the 21 adjacent sub-catchments.

Modelling was performed for the years 1981-83. The resulting data files represented the relation between the reservoirs’ water releases and the water levels downstream.

[Figure: map of the Apure basin showing the sub-catchments’ runoff, the points where water levels are needed for navigability, and the reservoirs/dams La Honda, La Cuevas and La Vueltosa]

Replication of Mike11/NAM model by ANN: reservoir optimisation in Apure river basin

[Figure: optimisation loop]
Start → river and catchment modelling → Optimisation: generate release schedules → run the Mike11/NAM model, or the neural network replicating it, to produce water levels → check navigability constraints → optimal solution reached? (if not, generate new release schedules; if yes, Stop)
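The loop in the flowchart can be sketched as follows. This is a minimal illustration, not the optimiser used in the study: random search stands in for the actual optimisation algorithm, a fixed evaluation budget stands in for the stopping test, and `toy_levels` and `toy_energy` are hypothetical placeholders. The point is the interface: `level_model` is where the slow Mike11/NAM run or the fast ANN replicating it would be plugged in.

```python
import random


def optimise_releases(level_model, energy_score, n_weeks, release_bounds, h_min,
                      n_iter=2000, seed=0):
    """Random-search sketch of the release-optimisation loop.

    level_model  -- callable: release schedule -> downstream water levels; this is
                    where the slow Mike11/NAM run or the fast ANN replica plugs in.
    energy_score -- callable: release schedule -> energy criterion (higher is better).
    """
    rng = random.Random(seed)
    lo, hi = release_bounds
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Generate a candidate release schedule.
        schedule = [rng.uniform(lo, hi) for _ in range(n_weeks)]
        # Run the (surrogate) model to produce water levels.
        levels = level_model(schedule)
        # Check the navigability constraint; reject infeasible schedules.
        if min(levels) < h_min:
            continue
        score = energy_score(schedule)
        if score > best_score:
            best, best_score = schedule, score
    # A fixed evaluation budget stands in for the "optimal solution reached?" test.
    return best, best_score


# Hypothetical toy stand-ins for the model and the energy criterion.
def toy_levels(schedule):
    return [0.8 * r + 0.5 for r in schedule]


def toy_energy(schedule):
    return -sum((r - 2.0) ** 2 for r in schedule)


best, score = optimise_releases(toy_levels, toy_energy, n_weeks=3,
                                release_bounds=(0.0, 4.0), h_min=2.1)
```

Because the model is called once per candidate schedule, replacing a slow simulator with a fast ANN replica reduces the cost of every iteration of the loop.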


Performance of the trained ANN

[Figure: performance of the trained ANN, panels for training and verification]

Replication of Mike11/NAM model by ANN

Inputs: 25
– runoffs from 21 sub-catchments (boundary conditions for the hydrodynamic model);
– 3 reservoir releases (boundary conditions for the hydrodynamic model);
– water level at the previous time step (week).
Hidden nodes: 5
Outputs: 3 (water levels at three points)

Instead of one network with 3 outputs, three separate ANNs were constructed, one for each output.
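A sketch of the network layout described above: 21 + 3 + 1 = 25 inputs, 5 hidden nodes and a single output per network, with three networks in total. The tanh activation, the weight initialisation and the pre-scaled synthetic inputs are assumptions; the weights are untrained, so the outputs only demonstrate the data flow.

```python
import numpy as np

N_SUBCATCHMENTS = 21
N_RESERVOIRS = 3
N_HIDDEN = 5
N_INPUTS = N_SUBCATCHMENTS + N_RESERVOIRS + 1  # + previous-week water level = 25


def init_mlp(rng, n_in=N_INPUTS, n_hidden=N_HIDDEN):
    # Small random weights for a 25-5-1 network (training is not shown).
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(1, n_hidden)),
        "b2": np.zeros(1),
    }


def build_input(runoffs, releases, prev_level):
    # Boundary conditions of the hydrodynamic model plus last week's level.
    x = np.concatenate([runoffs, releases, [prev_level]])
    if x.shape != (N_INPUTS,):
        raise ValueError(f"expected {N_INPUTS} inputs, got {x.shape}")
    return x


def forward(params, x):
    # tanh hidden layer (an assumed activation) and a linear output node.
    h = np.tanh(params["W1"] @ x + params["b1"])
    return float((params["W2"] @ h + params["b2"])[0])


rng = np.random.default_rng(0)
# Three single-output networks, one per water-level point.
nets = [init_mlp(rng) for _ in range(3)]
runoffs = rng.uniform(0.0, 1.0, N_SUBCATCHMENTS)  # synthetic, pre-scaled inputs
releases = rng.uniform(0.0, 1.0, N_RESERVOIRS)
prev_levels = [0.5, 0.4, 0.3]
next_levels = [forward(net, build_input(runoffs, releases, h))
               for net, h in zip(nets, prev_levels)]
```

Each network receives the same boundary conditions but its own previous water level, which is one way to read “three ANNs instead of three outputs”.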


Case studies SALLAND and OVERWAARD: Using ANNs and fuzzy systems in polder control

STOWA/Delft Cluster/IHE research project


Participants (IHE-Delft): A.H. Lobbrecht, D.P. Solomatine, Yonas Dibike, Ling Wang, B. Bhattacharya, B. Bazartseren

Case study 1: Groot Salland


Catchment areas: Rietberg 6,646 ha; Stuw 7A 13,697 ha; Stuw 3A 10,130 ha

[Figure: map of Groot Salland showing Rietberg and Stuw 7A]


Case study 1: Groot Salland

– building a rainfall-runoff model;
– reconstruction of missing data.

Methods used:
– ANN;
– M5 model trees;
– committee machines (modular models).


Groot Salland: filling in missing data with committee machines (modular local models), ANN testing

$Q_{\mathrm{Stuw7A},t} = f\big((P-E)_t, \dots, (P-E)_{t-5}\big)$, where P − E is precipitation minus evaporation

[Figure: filling of missing data at Stuw7A (ANN testing); runoff (m3/s) vs time (day), 01/01/99 to 11/02/99; series: ActualFlow, LANN, GANN]

LANN = local ANN (committee machines); GANN = global ANN (trained on the whole data set)
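The difference between a global model and local models can be sketched on synthetic data. This is a minimal illustration under stated assumptions: a hypothetical “wet/dry” splitting rule on the summed P − E window decides which local model is used, least-squares linear models stand in for the ANNs, and the runoff data are invented.

```python
import numpy as np

N_LAGS = 6  # inputs (P-E)_t, (P-E)_{t-1}, ..., (P-E)_{t-5}


def lagged_inputs(pe):
    """One row [(P-E)_t, ..., (P-E)_{t-5}] per time step t = 5 .. len(pe)-1."""
    pe = np.asarray(pe, dtype=float)
    return np.column_stack([pe[N_LAGS - 1 - k: len(pe) - k] for k in range(N_LAGS)])


def fit_linear(X, y):
    # Least-squares linear model with intercept (a stand-in for an ANN).
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def is_wet(X, threshold=0.0):
    # Hypothetical splitting rule: "wet" regime if the summed P-E over the
    # window exceeds a threshold. Each regime gets its own local model.
    return X.sum(axis=1) > threshold


def fit_local(X, y):
    wet = is_wet(X)
    return {True: fit_linear(X[wet], y[wet]), False: fit_linear(X[~wet], y[~wet])}


def predict_local(models, X):
    wet = is_wet(X)
    out = np.empty(len(X))
    for flag, mask in ((True, wet), (False, ~wet)):
        if mask.any():
            out[mask] = predict_linear(models[flag], X[mask])
    return out


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic example (not study data): runoff responds much more strongly to
# P-E in wet conditions, which a single global linear model cannot capture.
rng = np.random.default_rng(1)
X = lagged_inputs(rng.normal(0.0, 1.0, 400))
s = X.sum(axis=1)
y = np.where(s > 0.0, 1.0 + 0.5 * s, 0.1 + 0.05 * s)

global_coef = fit_linear(X, y)
local_models = fit_local(X, y)
rmse_global = rmse(predict_linear(global_coef, X), y)
rmse_local = rmse(predict_local(local_models, X), y)
```

The global model has to average over both regimes, so it fits neither well; this mirrors the LANN vs GANN comparison, although the real study used ANNs and its own partitioning.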


Groot Salland: using ANN in filling in data: committee machines (local models, LANN) vs global models (GANN) (verification, 1998-2000)

$Q_t = f\big((P-E)_t, \dots, (P-E)_{t-5}\big)$

[Figure: Stuw 7A (ANN verification); discharge (m3/s) vs date, 01/01/99 to 11/02/99; series: Measured, LANN, GANN]

Groundwater: interpolation between 14-day level measurements (testing)

$GWL_a(t) = f\big(GWL_b(t-5), \dots, GWL_b(t+5)\big)$

[Figure: ANN for interpolating groundwater level measurements (testing); GWL (cm) vs example no. 1-19; series: observed GWL at gg92a, ANN GWL output]
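Because the inputs include values after t (up to t+5), this is interpolation between readings rather than forecasting. A sketch of how such training pairs could be assembled, assuming (hypothetically) that well b is read daily and well a only every 14 days; a least-squares linear model stands in for the ANN, and the data are synthetic.

```python
import numpy as np

HALF = 5  # inputs GWL_b(t-5), ..., GWL_b(t+5)


def centred_windows(gwl_b, times):
    """Rows [GWL_b(t-5), ..., GWL_b(t+5)] for each t in `times`."""
    gwl_b = np.asarray(gwl_b, dtype=float)
    return np.array([gwl_b[t - HALF: t + HALF + 1] for t in times])


def fit_linear(X, y):
    # Least-squares linear model with intercept (a stand-in for the ANN).
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


# Synthetic daily series (not study data): well b is read daily, well a only
# every 14 days; well a is built from well b's levels inside the window.
rng = np.random.default_rng(2)
n_days = 400
gwl_b = 150.0 + np.cumsum(rng.normal(0.0, 2.0, n_days))  # cm
gwl_a = np.full(n_days, np.nan)
all_days = np.arange(HALF, n_days - HALF)
gwl_a[all_days] = 20.0 + 0.3 * gwl_b[all_days - 2] + 0.6 * gwl_b[all_days + 1]

measured_days = all_days[::14]  # fortnightly readings of well a
coef = fit_linear(centred_windows(gwl_b, measured_days), gwl_a[measured_days])

# Interpolate well a on every day between the fortnightly readings.
gwl_a_filled = predict_linear(coef, centred_windows(gwl_b, all_days))
```

The model is trained only on the days when both wells were read, then applied to every day in between.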


Runoff forecasting at Stuw7A (global ANN testing): $Q_{t+1} = f(P_t, Q_t, Q_{t-1})$

[Figure: runoff forecasting at Stuw7A (GANN testing); scatter plot of calculated vs observed discharge; time series of discharge (m3/s) vs example no. 1-351; series: observed discharge, ANN output]
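The forecasting model uses only the current rainfall and the two most recent discharges. A sketch of how the training pairs for Q_{t+1} = f(P_t, Q_t, Q_{t-1}) can be built from aligned series; the synthetic catchment coefficients are hypothetical, and a linear least-squares fit stands in for the global ANN purely to keep the example short.

```python
import numpy as np


def forecast_dataset(p, q):
    """Inputs [P_t, Q_t, Q_{t-1}] and targets Q_{t+1}, for t = 1 .. len(q) - 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    X = np.column_stack([p[1:-1], q[1:-1], q[:-2]])
    y = q[2:]
    return X, y


# Synthetic catchment (hypothetical coefficients, not Stuw7A data):
# Q_{t+1} = 0.5 P_t + 0.6 Q_t + 0.2 Q_{t-1}
rng = np.random.default_rng(4)
n = 300
p = rng.exponential(1.0, n)
q = np.zeros(n)
q[0] = q[1] = 1.0
for t in range(1, n - 1):
    q[t + 1] = 0.5 * p[t] + 0.6 * q[t] + 0.2 * q[t - 1]

X, y = forecast_dataset(p, q)
# A linear least-squares fit stands in for the global ANN.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
q_next_pred = X @ coef
```

The slicing keeps every row aligned in time: row t holds P_t, Q_t and Q_{t-1}, and its target is Q_{t+1}.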

Case study Groot Salland: conclusions

– ANNs and model trees were able to predict flows and to fill in missing data.
– M5 model trees perform as well as ANNs and are more transparent.
– A “global” ANN considers the runoff process in its entirety and may not be the optimal approach:
  – during training, rare information is treated as noise and filtered out;
  – these rare events, however, are exactly the situations to which we would most like to apply the ANN models.
– For small catchments (Rietberg), runoff times are very short (hours), so monitoring frequencies must be adapted to that interval.
– It is important to record all changes in water control/management, because these changes alter the modelled system.
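A stripped-down model tree in the spirit of M5: one split chosen by the standard deviation reduction (SDR) criterion, with a linear regression model in each leaf. Real M5 grows deeper trees and then prunes and smooths them; those steps are omitted here, and the data are synthetic. Printing the fitted tree shows why such models are considered transparent.

```python
import numpy as np


def fit_linear(X, y):
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def best_split(X, y, min_leaf=5):
    """Split maximising SDR = sd(T) - sum_i |T_i| / |T| * sd(T_i)."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            weighted = (n_left * y[left].std() + (len(y) - n_left) * y[~left].std()) / len(y)
            sdr = y.std() - weighted
            if sdr > best[2]:
                best = (j, thr, sdr)
    return best


def fit_model_tree(X, y):
    # One split, then a linear model in each leaf (no pruning or smoothing).
    j, thr, _ = best_split(X, y)
    left = X[:, j] <= thr
    return j, thr, fit_linear(X[left], y[left]), fit_linear(X[~left], y[~left])


def predict_model_tree(tree, X):
    j, thr, coef_left, coef_right = tree
    left = X[:, j] <= thr
    out = np.empty(len(X))
    out[left] = predict_linear(coef_left, X[left])
    out[~left] = predict_linear(coef_right, X[~left])
    return out


def describe(tree, names):
    # Human-readable rules: this is what makes model trees transparent.
    j, thr, coef_left, coef_right = tree

    def equation(c):
        terms = " + ".join(f"{w:.2f}*{n}" for w, n in zip(c[:-1], names))
        return f"{terms} + {c[-1]:.2f}"

    return (f"IF {names[j]} <= {thr:.2f} THEN y = {equation(coef_left)} "
            f"ELSE y = {equation(coef_right)}")


# Synthetic data with two regimes (not study data).
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 10.0, size=(200, 1))
x = X[:, 0]
y = np.where(x <= 5.0, 0.2 * x, 20.0 + 0.2 * x)
tree = fit_model_tree(X, y)
print(describe(tree, ["P"]))
```

Unlike ANN weights, the printed rule can be read and checked by a decision maker.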


Case study Hoogheemraadschap van de Alblasserwaard en de Vijfheerenlanden (Overwaard)

Artificial Neural Networks and Fuzzy Logic Systems for Model-Based Control:
– build the simulation model;
– investigate the possibilities of using new technologies like ANN and fuzzy logic for model-based control of the water system.

[Figure: map of the Overwaard water system along the river Lek]

Overwaard: On-line performance of ANN-based controller

[Figure: simulated water level (m) in the upper basin and in the lower basin, 10/28/98 to 11/12/98; series: External (ANN) control, Dynamic control]


Overwaard: On-line performance of fuzzy-based controller

[Figure: simulated water level (m) in the upper basin and in the lower basin, 10/28/98 to 11/12/98; series: External (FAS) control, Dynamic control]

Overwaard: On-line performance of controller (discharge)

ANN controller: [Figure: cumulative discharge (million m3) vs time, 10/28/98 to 11/11/98; series: External (ANN) control, Dynamic control]
Cumulative error = -75,000 m3 = -0.55%

FAS controller: [Figure: cumulative discharge (million m3) vs time, 10/28/98 to 11/11/98; series: External (FAS) control, Dynamic control]
Cumulative error = -75,000 m3 = -0.55%
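The cumulative error compares the total volume pumped under external control with that under dynamic control. A sketch of that bookkeeping, assuming uniformly sampled discharges and a “controlled minus reference” sign convention (neither is stated in the source); the discharge series below are hypothetical.

```python
import numpy as np


def cumulative_volume(discharge_m3s, dt_s):
    # Running total of pumped volume (m3) for a series sampled every dt_s seconds.
    return np.cumsum(np.asarray(discharge_m3s, dtype=float) * dt_s)


def cumulative_error(controlled, reference, dt_s):
    """Final volume difference (m3) and its share of the reference volume (%).

    Assumed sign convention: controlled minus reference, so a negative value
    means the external (ANN or FAS) controller pumped less than dynamic control.
    """
    v_ctrl = cumulative_volume(controlled, dt_s)[-1]
    v_ref = cumulative_volume(reference, dt_s)[-1]
    diff = float(v_ctrl - v_ref)
    return diff, 100.0 * diff / float(v_ref)


dt = 3600.0                     # hourly samples (assumed)
reference = np.full(100, 10.0)  # hypothetical dynamic-control discharge, m3/s
controlled = np.full(100, 9.9)  # hypothetical external-control discharge, m3/s
diff_m3, diff_pct = cumulative_error(controlled, reference, dt)
```

Here the external controller pumps 1% less over 100 hours, i.e. 36,000 m3 out of 3.6 million m3.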


Case study Overwaard: conclusions (1)

The AQUARIUS model of Overwaard was found to simulate the water system very well; calibration results were acceptable.

ANN and FAS replicate the dynamic control pumping strategy reasonably well.

Replacing the slow computational component with the fast-running (10 times faster) intelligent controllers could simplify the use of AQUARIUS in real-time control tasks.


Case study Overwaard: conclusions (2)

Intelligent controllers were found to be able to reproduce the centralised behaviour of optimal control (in terms of water levels and corresponding discharges) using easily measurable local information.

External control with ANN and FAS required less than one third of the simulation time of the central optimal control


Replacing the slow computational component with fast-running intelligent controllers could simplify the use of AQUARIUS in real-time control tasks.
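To illustrate what a controller driven by local information can look like, here is a zero-order Takagi-Sugeno fuzzy rule base that maps the local water-level deviation to a pump setting. The membership functions, the rule outputs and the choice of a single input are hypothetical: the source does not describe the FAS structure, and in the study the controllers were built to replicate the dynamic control strategy.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def shoulder_left(x, a, b):
    # 1 below a, falling linearly to 0 at b.
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def shoulder_right(x, a, b):
    # 0 below a, rising linearly to 1 at b.
    return 1.0 - shoulder_left(x, a, b)


# Hypothetical rule base: deviation of the local water level from its target (m)
# -> fraction of pump capacity. Limits and outputs are illustrative only.
RULES = [
    (lambda e: shoulder_left(e, -0.10, 0.0), 0.0),   # level LOW -> pumps off
    (lambda e: tri(e, -0.10, 0.0, 0.10), 0.5),       # level ON TARGET -> half capacity
    (lambda e: shoulder_right(e, 0.0, 0.10), 1.0),   # level HIGH -> full capacity
]


def pump_fraction(level, target):
    """Zero-order Takagi-Sugeno inference: weighted average of rule outputs."""
    e = level - target
    weights = [mu(e) for mu, _ in RULES]
    return sum(w * out for w, (_, out) in zip(weights, RULES)) / sum(weights)
```

For example, `pump_fraction(-0.55, -0.60)` blends the “on target” and “high” rules equally and returns about 0.75. The controller needs only the local level, which is what makes such rules cheap compared with running the central optimisation.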


Applications of data-driven methods in flood management and control problems: conclusions

– Data-driven methods make it possible to build accurate predictive models.
– Data-driven methods are good approximators of physically based models:
  – they are faster;
  – they can be incorporated into optimization and real-time control loops.
– Using classification methods (decision trees) leads to simpler models and requires less accurate data.
– In numerical prediction, M5 model trees make it possible to build transparent models that decision makers can understand much better than ANNs.
– Using a mixture of models can improve performance.
– If the water system changes, data-driven models have to be retrained.