
RULES FOR RESPONSIBLE MODEL BUILDING

William James

University Professor Emeritus

President, CHI

Guelph, Canada bill@computationalhydraulics.com

"All models are wrong, though some may be said to be useful." (G.E.Box).

It's not enough to know simply when or how a model may be said to be useful - it's more important to know how reliable it is.

A model is a concept. Concepts are used in thinking, scientific deduction, engineering design and forensics. They are improved by experience. We do not necessarily require the model that most closely approaches perfection; rather, we seek the model that provides an acceptably accurate explanation. Simple models are often said to be “better” than complex models.

Optimal model complexity depends on the questions to be resolved and the resources available.

Your model should meet your own ethical standards; it should:

•accept the limits of the discipline of engineering;
•improve and restore the natural balances and bio-diversity;
•correct the human behaviour that caused the problem to the ecosystem;
•imitate the structure of the natural, native or indigenous system;
•be good for all parts of the natural system;
•not enrich one individual or group to the distress or impoverishment of another;
•be in harmony with good character, cultural value, and moral law.

•the living world is the matrix for all design,
•design should follow the laws of life,
•biological equity must determine design,
•design must reflect bioregionality,
•projects should use renewable energy systems,
•design should integrate living systems,
•projects should heal the planet, and
•design should follow a sacred ecology.

Fundamental tenet: variance can be systematically reduced by including (explaining) more and more relevant processes, at higher temporal and spatial resolution.

The implicit problem in critical thinking is to find the most probable flaws in an argument, to discern the best lines of thought and to improve the argument. The solution may be stated: if we test the argument, perhaps over a long time, which parts of the argument are less likely to be valid, and how may the experience be better explained otherwise?

The implicit problem in scientific method is to find the optimum or sufficient description of the dominant processes. The solution may be stated: if we test the current explanation of dominant processes over a long period of time, e.g. 75 years for an engineering environmental problem, is the description optimal in the sense that it is the most parsimonious description that meets the required, or imposed, uncertainty?

The implicit problem in engineering design is to find the optimum cost-effective array of best practices. A solution may be stated: if the 75-year rainfall time series that occurred at the International Airport had in fact occurred at Foxran Estates, then plan 126 would have been the most cost-effective of the 329 plans examined, had they, of course, all existed over this time.

The implicit problem in engineering forensics is to find the most credible explanation for an acute problem, and to suggest a cost-effective solution which is generally to replace the acute problem by a chronic problem. The price to be paid is vigilance.

Concerns include: What array of models should be used? What is the model applicability in the context of the study objectives? What accuracy is achievable? What is the uncertainty of the model? What investment of model effort is most cost-efficient? Is cost-efficiency appropriate for optimizing an uncertain model?

Rule: A model is used to help select the best among competing proposals. It is fundamentally irresponsible and unethical for modelers not to interpret the inherent uncertainty.

Steps in model construction

1. Review and re-state the problem.
2. Construct the as-is model input data set.
3. Select model performance evaluation criteria.
4. Select an objective function.
5. Calibrate and evaluate the model.
6. Satisfied? If no, go back to 1; if yes, go to 7.
7. Model several theoretical or to-be situations.
8. Select the likely best alternative.
9. Report the best solution and its uncertainty.

Rule: Computed and observed time series are more ethically represented as smudges than single-valued lines.
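One way to draw such a smudge is to run the model with several reasonable parameter sets (for example the lower, mean, and higher most-likely estimates) and shade the per-time-step envelope of the results. A minimal Python sketch, assuming the runs are already available as equal-length lists; the function name `smudge` and the sample hydrographs are hypothetical:

```python
from typing import List, Sequence, Tuple


def smudge(runs: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return the (lowest, highest) value at each time step across an ensemble of runs."""
    if not runs or len({len(r) for r in runs}) != 1:
        raise ValueError("need one or more runs of equal length")
    # zip(*runs) groups the values of all runs that belong to the same time step
    return [(min(step), max(step)) for step in zip(*runs)]


# Hypothetical hydrographs computed with lower, mean, and higher parameter estimates
low = [0.0, 2.0, 5.0, 3.0]
mean = [0.0, 3.0, 6.0, 4.0]
high = [0.0, 4.0, 8.0, 5.0]
band = smudge([low, mean, high])
print(band)  # [(0.0, 0.0), (2.0, 4.0), (5.0, 8.0), (3.0, 5.0)]
```

The band can then be shaded (for example with matplotlib's `fill_between`) rather than plotting one single-valued line.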

Rule: Objectives must be simplified and related to the computed output and objective functions. The model must include code that adequately describes all significant processes.

Model complexity C may be expressed as:

$$C = \sum_{1}^{N_m} \sum_{1}^{N_s} \sum_{1}^{N_{pr}} p_a$$

where:

Nm = the number of modules active in the model,

Ns = number of sub-spaces modeled in each module,

Npr = number of processes modeled in each sub-space,

pa = the input parameters required for each process.
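Complexity, read as the total of the input parameters pa over all Npr processes in all Ns sub-spaces of all Nm modules, can be tallied directly from a nested description of the model. A small Python sketch; the nested-list representation and the example model are assumptions for illustration only:

```python
from typing import List

# Hypothetical representation: modules -> sub-spaces -> processes,
# where each innermost number is pa, the count of input parameters that process needs.
Model = List[List[List[int]]]


def complexity(model: Model) -> int:
    """Sum pa over every process in every sub-space of every module."""
    return sum(p_a for module in model for subspace in module for p_a in subspace)


# Hypothetical model: module 1 has two sub-spaces with two processes each,
# module 2 has one sub-space with a single four-parameter process.
example = [[[3, 2], [3, 2]], [[4]]]
print(complexity(example))  # 14

# With uniform counts the sum reduces to Nm * Ns * Npr * pa.
uniform = [[[5] * 4 for _ in range(3)] for _ in range(2)]  # Nm=2, Ns=3, Npr=4, pa=5
print(complexity(uniform))  # 120
```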

Cost is taken to be a combination of:

1. engineering fees to design alternative solutions;
2. construction costs of the selected alternative;
3. intangible costs;
4. costs due to uncertainty of the selected option.

[Figure: evaluation function plotted against complexity C (log scale, 10^0 to 10^9), combining a design-costs term and a model-error term and marking the optimum complexity. Note: Bill’s suggested relations and numbers.]
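The trade-off between a design-costs term that grows with complexity and a model-error term that shrinks with it can be explored numerically by evaluating their sum over candidate complexities and taking the minimum. The Python sketch below uses made-up functional forms and coefficients (`design_cost`, `model_error`, `a`, `e0` are all hypothetical); it only illustrates how an optimum emerges, not the relations behind the figure:

```python
import math
from typing import Sequence


def design_cost(c: float, a: float = 1.0) -> float:
    # Hypothetical: engineering effort grows with the logarithm of complexity.
    return a * math.log10(c)


def model_error(c: float, e0: float = 16.0) -> float:
    # Hypothetical: model error falls as complexity rises, with diminishing returns.
    return e0 / (1.0 + math.log10(c))


def optimum_complexity(candidates: Sequence[float]) -> float:
    # Evaluation function = design-costs term + model-error term; keep its minimizer.
    return min(candidates, key=lambda c: design_cost(c) + model_error(c))


candidates = [10 ** k for k in range(10)]  # C = 1 ... 1e9, the span of the figure axis
print(optimum_complexity(candidates))  # 1000
```

With real cost and error relations the same search applies; only the two term functions change.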

Rule: In determining the best level of complexity, test simple models first, proceeding to more complex ones, until the required accuracy of the computed response function is achieved. Use the fewest processes, the fewest discretized spaces, and the largest time step that deliver the required uncertainty.

Sensitivity analysis consists of:

1. varying model coefficients one at a time, with the amount varied being representative of the uncertainty in the parameter being analyzed,

2. dividing the resulting dimensionless change in computed response by the dimensionless parameter variation, and then

3. ranking the resulting sensitivity gradients.
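The three steps above can be sketched in Python as a one-at-a-time loop. The model passed in is any function from a parameter dictionary to a scalar response such as peak flow; the toy model, its parameter names, and the 5% perturbations are hypothetical stand-ins for a real application:

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def sensitivity_gradients(model: Callable[[Params], float], params: Params,
                          deltas: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """One-at-a-time dimensionless sensitivity gradients, ranked by magnitude."""
    base = model(params)
    grads: Dict[str, float] = {}
    for name, frac in deltas.items():
        perturbed = dict(params)                        # step 1: vary one parameter only
        perturbed[name] = params[name] * (1.0 + frac)   # frac reflects that parameter's uncertainty
        rel_response = (model(perturbed) - base) / base  # dimensionless change in response
        grads[name] = rel_response / frac                # step 2: divide by dimensionless variation
    ranking = sorted(grads, key=lambda k: abs(grads[k]), reverse=True)  # step 3: rank
    return grads, ranking


def toy_peak(p: Params) -> float:
    # Hypothetical response: linear in area, quadratic in coef; slope is deliberately unused.
    return p["area"] * p["coef"] ** 2


params = {"area": 10.0, "coef": 0.5, "slope": 0.01}
deltas = {"area": 0.05, "coef": 0.05, "slope": 0.05}
grads, ranking = sensitivity_gradients(toy_peak, params, deltas)
print(ranking)  # ['coef', 'area', 'slope']
```

Because `coef` enters non-linearly, its gradient (about 2.05 here) depends on the size of the perturbation, which is why the variation should match the parameter's actual uncertainty.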

[Figure: non-linear sensitivity gradients for peak flow; medium-duration, medium-intensity event (0.3 in/hr for 1 hr), Location 100. Horizontal axis: percent change in parameter (-7.5 to 7.5). Parameters: WW1, WAREA, WW3, WSLOPE, WW5, WW6, WW7, WW8, WW9, WW10, WW11. Wkbk:59]

Rule: Do not test a generalized program per se for sensitivity, parameter optimization, or error, because individual applications are likely to be radically different. Values of parameters in the input datafile determine which processes will be dominant or dormant. Relative parameter values change both the model sensitivity and the model uncertainty. Each model application must be separately tested over the relevant range of model

Categorize input parameters into four groups, those that:

1. can be measured with almost total certainty;

2. can be readily measured in the field or laboratory;

3. cannot be easily measured in the field or laboratory;

4. cannot be measured with any certainty at all.

[Figure: model process calibration, covering parameter estimation, event calibration, sensitivity analysis and the continuous model. 1. Calibration: start parameters, calibration IFs, user input, datafile, programs, post-processor, "OK?" check, parameter optimization, sensitivity analysis, error analysis. 2. Inferences: long-term continuous model, fuzzy RFs.]


PCSWMM 2005 Utilities: 24 Response Functions (PCSWMM terminology)

• Nodes: depth, head, volume, lateral inflow, total inflow, flooding
• Links: flow, depth, velocity, capacity
• System: temperature, rainfall, snow depth, losses, runoff, dry weather inflow, ground water inflow, RDII inflow, direct inflow, total inflow, flooding, outflow, storage

Typical cycle in a response or input function (the functions may be observed, synthetic or computed; RFcrit and IFcrit are arbitrary):

[Figure: RF(t), IF(t) plotted against time, with the threshold RFcrit, IFcrit and the event times t1,1, t1,2, t1,3, t1,4, t2,1, t2,2 marked.]

OF1: (t2,1 - t1,1) duration of wet event

OF2: (t2,2 - t1,3) duration of dry event

OF3: RF(t1,3) peak flow, flux, or concentration

OF4: RF(t1,1) minimum flow, flux or concentration

OF5: *INT (t1,4-t1,1) total wet event flow or flux

OF6: (t1,4 - t1,2) duration of exceedance

OF7: (t2,2 - t1,4) duration of deficit

OF8: n[RF>RFcrit] number of exceedances

OF9: n[RF<RFcrit] number of deficits

OF10: *INT (t2,2-t1,4) volume of deficit

OF11: *INT (t1,4-t1,2) volume of excess

OF12: OF5/OF1 wet event mean concentration

OF13: *INT (t2,1-t1,4) total dry event flow or flux

OF14: OF13/OF2 dry event mean concentration

R:117

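The integral (*INT) terms above take the forms:

$$OF_5 = \int_{t_{1,1}}^{t_{1,4}} RF(t)\,dt \qquad OF_{13} = \int_{t_{1,4}}^{t_{2,1}} RF(t)\,dt$$

$$OF_{11} = \int_{t_{1,2}}^{t_{1,4}} \left(RF(t) - RF_{crit}\right) dt$$

$$OF_{10} = \int_{t_{1,4}}^{t_{2,2}} \left(RF_{crit} - RF(t)\right) dt$$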
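Several of these objective functions can be computed from a response function sampled at a fixed time step. The Python sketch below is a simplification under stated assumptions: it treats the whole series at once rather than splitting it into individual wet and dry events, approximates integrals with the rectangle rule, and uses hypothetical dictionary keys for the results:

```python
from typing import Dict, Sequence


def objective_functions(rf: Sequence[float], dt: float, rf_crit: float) -> Dict[str, float]:
    """Whole-series versions of selected OFs for a response function sampled every dt."""
    above = [v > rf_crit for v in rf]
    return {
        "OF3_peak": max(rf),
        "OF4_minimum": min(rf),
        "OF5_total": sum(rf) * dt,                       # integral of RF(t) dt
        "OF6_exceedance_duration": sum(above) * dt,      # time spent above RFcrit
        # OF8: count up-crossings of RFcrit (a series that starts above counts once)
        "OF8_exceedances": sum(1 for i, a in enumerate(above)
                               if a and (i == 0 or not above[i - 1])),
        "OF10_deficit_volume": sum(max(rf_crit - v, 0.0) for v in rf) * dt,
        "OF11_excess_volume": sum(max(v - rf_crit, 0.0) for v in rf) * dt,
    }


# Hypothetical hourly flows with RFcrit = 1.5
ofs = objective_functions([0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 2.0, 0.0], dt=1.0, rf_crit=1.5)
print(ofs["OF8_exceedances"], ofs["OF11_excess_volume"])  # 2 2.5
```

A per-event version would first cut the series at the event boundaries (t1,1, t2,1, and so on) and apply the same sums within each event.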

Dominant process                          Objective function
Overland flow over impervious areas       OF3
Infiltration into the upper soil mantle   OF4
Pollutant washoff                         OF5
Erosion                                   OF1
Overland flow over pervious areas         OF3
Pollutant build-up                        OF5
Recovery of storages                      OF2
Recovery of loss (infiltration) rates     OF4
Recession of storages                     OF7
Evaporation                               *IF8
Snowmelt                                  *IF11
Snow accumulation                         *IF7

R:119

Rule: Select the best objective function thoughtfully, by relating it back to the original design questions. Use the minimum acceptable number of objective functions.

1. observation error, related to field instrumentation, comprising two components, one random and one systematic;
2. sampling error, associated with the timing and location of the field equipment;
3. numerical error, identified with the numerical mathematics used in the code;
4. structural error, related to disaggregation (the number and resolution of the processes active);
5. structural error, related to discretization (the spatial resolution);
6. structural error, related to poor formulation of one or more of the component process relations and code; and
7. propagated error, related to erroneous parameters.

[Figure: framework for uncertainty analysis, running from prior knowledge through the calibration process (identify the as-is model) to the design process (inference to the to-be and as-was scenarios).
Internal description: 1. aggregation error; 2. numerical error; 3. structural error; 4. discretization error; 5. input environment datafile error; 6. model structure and state-parameter error; 7. parameter optimization error.
External description: 1. uncertainty due to natural variability, or unobserved input disturbances; 2. measurement and sampling errors of observed input and output; 3. start-up error; 4. input TS datafile error; 5. model error; 6. uncertainty of to-be parameters; 7. user output-interpretation error; 8. parameter propagation error; 9. error analysis.]

Rule: Sixteen sources of error are listed in the framework for uncertainty analysis presented here. When interpreting the computed output from your model, all sixteen sources should be explicitly interpreted.

Model users must be able to:

1. isolate the important empirical parameters that require refining (calibration),

2. associate these parameters with their correct processes (may be more than one),

3. isolate the conditions under which the processes are active (again may be more than one), and then

4. select state-variable events (SV sub-spaces) for sensitivity (which may be hypothetical events), and

5. select state-variable events from the observed record for calibration analyses. R:136

[Figure: computed (OFi)c versus observed (OFi)o objective functions; A represents “small” events, B “medium” events, C “big” events, and D fuzzy overlaps.]

Rainfall:

Class                              Code   Duration   Intensity
Short-duration-high-intensity      SDHI   20 min     3.0 in/h
Medium-duration-high-intensity     MDHI   60 min     1.0 in/h
Long-duration-high-intensity       LDHI   600 min    0.2 in/h
Short-duration-medium-intensity    SDMI   20 min     0.4 in/h
Medium-duration-medium-intensity   MDMI   60 min     0.3 in/h
Long-duration-medium-intensity     LDMI   600 min    0.1 in/h
Short-duration-low-intensity       SDLI   20 min     0.1 in/h
Medium-duration-low-intensity      MDLI   60 min     0.1 in/h
Long-duration-low-intensity        LDLI   600 min    0.1 in/h

Evapo-transpiration:

Class                              Code   Duration   Rate
Short-duration-high-intensity      SDHI   1 d        0.5 in/d
Long-duration-high-intensity       LDHI   10 d       0.3 in/d
Short-duration-low-intensity       SDLI   1 d        0.05 in/d
Long-duration-low-intensity        LDLI   10 d       0.05 in/d

R:139
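Because each rainfall class is defined only by a duration and an intensity, it can be turned into a synthetic (possibly hypothetical) event for sensitivity testing; its depth is simply intensity times duration, e.g. SDHI gives 3 in/h × 20 min = 1.0 in. A Python sketch, assuming a rectangular (constant-intensity) hyetograph and a 5-minute time step, both of which are illustrative choices rather than part of the classification:

```python
from typing import Dict, List, Tuple

# Class code -> (duration in minutes, intensity in in/h), as tabulated for rainfall
RAIN_CLASSES: Dict[str, Tuple[int, float]] = {
    "SDHI": (20, 3.0), "MDHI": (60, 1.0), "LDHI": (600, 0.2),
    "SDMI": (20, 0.4), "MDMI": (60, 0.3), "LDMI": (600, 0.1),
    "SDLI": (20, 0.1), "MDLI": (60, 0.1), "LDLI": (600, 0.1),
}


def rectangular_hyetograph(code: str, dt_min: int = 5) -> List[float]:
    """Constant-intensity hyetograph (in/h at each time step) for one design event class."""
    duration, intensity = RAIN_CLASSES[code]
    if duration % dt_min:
        raise ValueError("time step must divide the event duration")
    return [intensity] * (duration // dt_min)


def depth_in(hyetograph: List[float], dt_min: int = 5) -> float:
    """Rainfall depth in inches: sum of intensity times the time step in hours."""
    return sum(i * dt_min / 60.0 for i in hyetograph)


print(depth_in(rectangular_hyetograph("SDHI")))  # 1.0
```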

Rain:
Light rate of rain: overland flow over impervious areas
Medium rate of rain: infiltration into upper soil mantle; pollutant washoff
Heavy rate of rain: erosion; pollutant washoff; pervious area flow
Long duration rain: overland flow over pervious areas

No rain:
Long duration drought: pollutant build-up; groundwater depletion
Short duration drought: storage recessions

Temperature:
High temperatures: evapo-transpiration; snowmelt
Low temperatures: snow accumulation & ripening
High wind: snowmelt

R:140

Rule: Associate parameters with processes, and processes with causative events, and causative events with limited state-variable sub-spaces.
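The chain from causative event to dominant process to objective function can be kept as two lookup tables, so that only the relevant functions are examined for each event. A Python sketch built from the event and process tables above; the wording has been normalized so the two tables share keys ("pervious area flow" as overland flow over pervious areas, "storage recessions" as recession of storages, evapo-transpiration as evaporation), which is an assumption, and the function name is hypothetical:

```python
from typing import Dict, List

# Causative event -> dominant processes (event table, wording normalized)
EVENT_PROCESSES: Dict[str, List[str]] = {
    "light rain": ["overland flow over impervious areas"],
    "medium rain": ["infiltration into the upper soil mantle", "pollutant washoff"],
    "heavy rain": ["erosion", "pollutant washoff", "overland flow over pervious areas"],
    "long duration rain": ["overland flow over pervious areas"],
    "long duration drought": ["pollutant build-up", "groundwater depletion"],
    "short duration drought": ["recession of storages"],
    "high temperatures": ["evaporation", "snowmelt"],
    "low temperatures": ["snow accumulation"],
    "high wind": ["snowmelt"],
}

# Dominant process -> objective function (OF) or input function (IF), from the process table
PROCESS_FUNCTION: Dict[str, str] = {
    "overland flow over impervious areas": "OF3",
    "infiltration into the upper soil mantle": "OF4",
    "pollutant washoff": "OF5",
    "erosion": "OF1",
    "overland flow over pervious areas": "OF3",
    "pollutant build-up": "OF5",
    "recovery of storages": "OF2",
    "recovery of loss (infiltration) rates": "OF4",
    "recession of storages": "OF7",
    "evaporation": "IF8",
    "snowmelt": "IF11",
    "snow accumulation": "IF7",
}


def functions_for_event(event: str) -> List[str]:
    """Functions worth analyzing for one causative event; untabulated processes are skipped."""
    return sorted({PROCESS_FUNCTION[p] for p in EVENT_PROCESSES[event] if p in PROCESS_FUNCTION})


print(functions_for_event("heavy rain"))  # ['OF1', 'OF3', 'OF5']
```

Groundwater depletion has no tabulated function, so a long drought maps only to OF5; a parameter-to-process table would extend the same chain one step further.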

A total error statistic (EFt) may be used to quantify overall goodness of fit:

$$EF_t = (1.0 - w)\,\frac{\sum_{i=1}^{n} \left|OOF_i - COF_i\right|}{n} + w\,\left|OPF - CPF\right|$$

where:

EFt = total error statistic (m3/s);

w = weighting factor;

n = number of measured hourly flows;

OOF = measured flow (m3/s);

COF = computed flow (m3/s);

OPF = measured peak flow (m3/s); and

CPF = computed peak flow (m3/s).
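Reading the statistic as (1.0 - w) times the mean absolute difference between measured and computed hourly flows, plus w times the absolute peak-flow error, it can be computed as below. A Python sketch; taking OPF and CPF as the maxima of the two hourly series is an assumption, as are the sample flows:

```python
from typing import Sequence


def total_error(observed: Sequence[float], computed: Sequence[float], w: float) -> float:
    """EFt = (1 - w) * mean |OOF - COF| + w * |OPF - CPF|, all flows in m3/s."""
    if len(observed) != len(computed) or not observed:
        raise ValueError("need equal-length, non-empty flow series")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie between 0 and 1")
    n = len(observed)
    mean_abs = sum(abs(o - c) for o, c in zip(observed, computed)) / n
    peak_err = abs(max(observed) - max(computed))  # OPF, CPF taken as series maxima (assumption)
    return (1.0 - w) * mean_abs + w * peak_err


# Hypothetical hourly flows (m3/s), equal weight on volume fit and peak fit
print(round(total_error([1.0, 3.0, 2.0], [1.5, 2.5, 2.0], w=0.5), 4))  # 0.4167
```

Setting w = 0 scores only the hourly fit, and w = 1 scores only the peak.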

Rule: Use first-order error analysis to report the estimated propagated error in your recommended design solution.
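First-order error analysis propagates parameter uncertainty through the linearized model: the variance of the response is approximated by the sum, over parameters, of (∂R/∂p × σp)². A Python sketch that estimates the derivatives by central differences; it assumes independent parameter errors, and the toy model, parameter values, and standard deviations are hypothetical:

```python
import math
from typing import Callable, Dict

Params = Dict[str, float]


def first_order_error(model: Callable[[Params], float], params: Params,
                      sigmas: Dict[str, float], rel_step: float = 1e-6) -> float:
    """First-order estimate of the standard deviation of model(params),
    assuming independent parameter errors and a locally linear model."""
    variance = 0.0
    for name, sigma in sigmas.items():
        h = rel_step * max(abs(params[name]), 1.0)
        up, down = dict(params), dict(params)
        up[name] += h
        down[name] -= h
        dr_dp = (model(up) - model(down)) / (2.0 * h)  # central-difference derivative
        variance += (dr_dp * sigma) ** 2
    return math.sqrt(variance)


def toy_peak(p: Params) -> float:
    # Hypothetical response: linear in area, quadratic in coef.
    return p["area"] * p["coef"] ** 2


sd = first_order_error(toy_peak, {"area": 10.0, "coef": 0.5}, {"area": 1.0, "coef": 0.05})
print(round(sd, 3))  # 0.559
```

The result would be reported alongside the recommended design response, e.g. as a plus-or-minus band around the computed value.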

[Figure: rate of rain and evapo-transpiration rate plotted against duration (zero, short, medium, long); some regions are marked "not used".]

General form: If X period is Y, analyze Z parameters, where X, Y, and Z have the following meanings:

    X      Y        Z
1.  rain            erosion
2.  rain   medium   pervious area flow
3.  rain   medium   pollutant washoff
4.  rain            impervious area flow
5.  rain            rain-out
6.  ET     exists   recovery of storages
7.  ET     exists   recovery of loss rates
8.         exists   groundwater depletion
9.  ET     medium   pollutant build-up

Rule: Analyse only sensitive parameters, and then only against relevant events.

Framework for continuous modeling:

At your desk:

1. Make a list of simplified design questions, and postulate the relationship between your list and your proposed objective functions.

2. Select the best objective functions and response functions for your study problem. Minimize the computed output and computer execution times. Allocate storage space for computed time series management.

3. Obtain or generate a credible, very-long-term time series to drive your model for design inference.

4. Obtain a short but sufficient record of good, observed events to calibrate your model.

Using the PCSWMM4 shell:

5. List all parameters that need to be optimized, and their associated processes.

6. Associate all processes with the limited state-variable sub-spaces where they dominate.

7. Search the good observed record for a sufficient number of appropriate events.

8. Estimate three values for each of all input parameters: 1. the mean most likely value, 2. a higher most likely value, and 3. a lower most likely value. Choose the sensitivity test range, but keep it small.

9. Carry out the sensitivity tests, and rank all parameters, in terms of their dimensionless sensitivity gradients.

10. Optimize the parameters to give the smallest error.

11. Run the calibrated model for the long-term time series for each array of BMPs.

12. Infer which is the best array. Rerun the model for this array estimating the error in the computed response functions.

13. Study all the input and output information again; make certain that it is logical, and gain knowledge about the performance of the drainage system. Interpret the impact of the errors.

At your client's office:

14. Report your recommendations, and, provided you follow the logic, become rich and famous.

The following 8 rules form a personal catechism for honest, very-long-term, continuous surface water quality modeling:

Rule 1: Do not calibrate all parameters simultaneously against a long-term continuous observed record, notwithstanding any early advice to the contrary in the literature.

Rule 2: Transpose or synthesize a long-term, hydro-meteorologic input time-series from the same hydrologic region, and use this for inferring comparative performance of various arrays of BMPs. Many records of 50 years duration or longer are available.

Rule 3: Carefully choose the best objective functions that represent the design questions and the model variability. Get the advisory committee to justify the selections in writing.

Rule 4: In order to control the amount of computing, associate the input parameters with processes, and processes with causative events, and causative events with limited state-variable sub-spaces. For this activity, sensitivity analysis code in PCSWMM4 is helpful. Do not analyze parameters outside these spaces.

Rule 5: Use three estimates of the most likely parameter values. It is more meaningful to compare the computed response from several reasonable models, rather than responses computed using extreme values.

Rule 6: Assume that the water quality model (WQM) is approximately linear for the purposes of optimizing parameters and estimating the propagated error. Then analyze for sensitivity near the mean expected values of all input parameters.

Rule 7: Calibrate only sensitive parameters, and then only against relevant events for which you have good, short-term observed data. These data must include good rate-of-rain records with adequate coverage and spatial resolution.

Rule 8: Use first-order linear error analysis, and report the estimated propagated error in your recommended design solution.

The end

See you online at:

•www.computationalhydraulics.com
•www.eos.uoguelph.ca/webfiles/james
•bill@computationalhydraulics.com
•wjames@uoguelph.ca
