An Ontology-Based Approach to Building BNs for the Weather Forecasting Domain
Tali Boneh
Ann Nicholson, Kevin Korb
(Monash University)
John Bally
(Bureau of Meteorology)
Monash Bayesian Reasoning Workshop
April 2006
The weather forecasting domain
• The Australian Bureau of Meteorology (Bureau) is the
national meteorological authority of Australia.
• Its role is to observe and understand Australian weather and
climate and provide weather services.
• A service is defined by its clients.
• The output of a service is products, namely weather reports
in a variety of formats (text and graphics) using several
delivery media such as newspapers, radio and the internet.
Traditional weather forecasting process
Forecasters:
• examine a large amount of data from different sources
and in different formats;
• analyse and integrate these data to generate the
weather products/reports;
• use several tools (Decision Support Systems), as well
as their own judgment, to enable integration of
information and make diagnosis-prediction decisions.
• create the weather products by typing text into
formatted forms using a specialised text editor.
Characteristics of Decision Support Systems
Aim of DSS: to digitally and graphically display data for
forecasters
• The graphical representation enables forecasters to
manually interact with the data, adjust it, change,
integrate and create new data when necessary
• The manual interaction enables forecasters to digitally
and graphically represent their thoughts
• The digital representation enables automated text
(product) generation
Requirements and limitations - I
• The information contains uncertainty
– incomplete knowledge
– missing data
– uncertainty in observation
– uncertainty in guidance
• Data quality is low, data are imprecise.
• It is not always clear how to combine the information
– how to weigh the different bits of information
– how to incorporate historical data
– how to include forecasters’ knowledge
Requirements and limitations - II
• Existing DSSs focus only on data storage, graphical user interface, and automated product generation
• More advanced meteorological decisions are left to the subjective judgement, experience, knowledge and character of the forecaster:
– how to derive weather elements based on others
– how to manipulate forecast data
– how to integrate data from different sources
The final representation is still subjective.
Requirements and limitations - III
• The domain is highly complex and involves different
dimensions
– e.g. elements, locations, time issues (time of the year,
time of the day, and lead times)
• The domain evolves and changes rapidly with better
understanding of the atmosphere, better technology
and better Numerical Weather Prediction models
→ Rapid development of decision support systems is
required
Requirements and limitations - IV
• In some cases it is desirable to implement more
than one technology using the same data
→ Approaches to DSSs should support multiple
technologies (e.g. BNs, ANNs, rule-based)
New DSS approach
• Integration of information that can capture complex meteorological concepts in ways that match the forecaster's knowledge.
• DSSs that
– can derive forecast weather elements based on local or synoptic-scale information
– modify forecast data using complex meteorological concepts while ensuring weather element consistency
– avoid comprehensive modelling and implementation
– deal with separate small decision steps.
Current new tools: 'state of the art'
• The Australian Thunderstorm Interactive Forecast System (TIFS)
– the DSS is incorporated in the software as code
– knowledge is not explicit and some may be lost
• The “Graphical Forecast Editor” (GFE) of the National Oceanic and Atmospheric Administration's National Weather Service
– includes a framework called Smart Tools (based on Python)
– lets forecasters write their own tools
– can be documented at any level the forecaster finds appropriate
– code can be verified and become available to all forecasters
– code is kept in a central repository with its documentation
– procedures can be created and become operational quickly.
Disadvantages of current tools
• Most of these DSSs are rule-based
→ the tools do not appropriately deal with uncertainty
• Knowledge is not explicit: it is captured directly in a coding
language
– a representation from which the domain knowledge is not
easily recognisable and may be lost as a result of the
modelling decisions taken and the representation itself
→ the knowledge cannot be easily shared and reused
Possible Resolution: dealing with uncertainty
• Bayesian Network technology deals with
uncertainty, missing data and poor data quality.
• Probability theory is one of the scientific ways of
dealing with reasoning under uncertainty.
• Applying formal statistics can yield better results,
compared with subjective judgment.
• The final output of the process is objective and
is based on a solid mathematical basis.
Problems with Knowledge Engineering BNs
Capturing the knowledge directly into a BN may result in:
• a representation from which the domain knowledge is not easily recognisable
– example: raw information may need to be processed before it can be provided to the network. The details will be buried as code in the implementation
• loss of information as a result of modelling decisions.
– example: omitting a variable from the network for efficiency reasons. The variable could be useful if different technologies are to be implemented
Ontology-based approach
[Diagram: Data, Knowledge Base, Bayesian Networks and other technologies, linked by semi-automated construction]
Ontology-based approach
• To overcome potential disadvantages:
– knowledge should be represented in a form that enables re-use and sharing across software and people
→ need a knowledge-level model that is independent of particular computer languages
• The concept of constructing small steps of DSSs requires that the domain expert should be able to develop their own networks.
→ need to support the forecaster in constructing a BN
A consensual conceptualisation of a domain, built so that knowledge can be shared and re-used, is called an ontology.
Ontology
• In Philosophy: a systematic explanation of being
• In knowledge engineering: a formal, explicit specification
of a shared conceptualisation
• Ontologies aim to capture consensual knowledge in a
generic way, for the purpose of re-use and sharing
across machines and people.
Ontology design
• Declarative Knowledge
– knowledge about what objects, states
and relations are in the domain
– concepts: wind, temperature, fog
• Procedural Knowledge
– knowledge about how to find relevant
facts and make inferences
– how to predict: wind, temperature, fog
[Diagram: Ontology, comprising Declarative Knowledge and Procedural Knowledge]
Forecasting Ontology – declarative knowledge
• Weather services and products
– Service: aviation, disaster, marine, public
– Product: airport briefing, synoptic situation, recent events, media statement
• Weather data sources: NWPs, radar, satellite, tracker, guidance
• Weather phenomenon/information
– weather elements: wind, temperature, fog, thunderstorm, inversion
– tools additional information: tracker length
– other environment information: time issues
• Database schema
Forecasting Ontology – procedural knowledge
• procedure
– rule based
– Bayesian network
– decision theory
– neural network
• procedure working data
• output relation
• algorithm
– value description
– description-description
– general algorithm
Knowledge elicitation
• Bayesian Network
– input variables
– output variables
– type of connection between input and output (predictor/environment/guidance/network refinements)
– working data for learning the probabilities
– multiple working data describing the inputs and
outputs at runtime
Semi-automated Construction of BN
• Extraction from ontology
– input and output variables → BN nodes
– Direction of arcs:
Predictors – from output to input (sensors)
Environment – from input to output (background factor)
Guidance – from output to input (sensor)
• Refinement of structure
– more arcs can go from the environment to the predictors
– Intermediate variables to reduce size of CPTs
– CPTs (from data, from experts, combination)
• Updating ontology
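The arc-direction rule in the extraction step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function name `build_arcs`, the role strings, and the assignment of the fog case-study variables to roles (weather elements treated as predictors) are assumptions made for the example.

```python
# Hypothetical role labels, following the three connection types on the slide.
ARC_FROM_OUTPUT = {"predictor", "guidance"}  # treated as sensors of the output
ARC_TO_OUTPUT = {"environment"}              # treated as background factors


def build_arcs(output, inputs):
    """Return directed arcs (parent, child) for one output variable.

    `inputs` maps each input variable name to its connection type.
    """
    arcs = []
    for name, role in inputs.items():
        if role in ARC_FROM_OUTPUT:
            arcs.append((output, name))   # arc from output to input
        elif role in ARC_TO_OUTPUT:
            arcs.append((name, output))   # arc from input to output
        else:
            raise ValueError(f"unknown connection type: {role!r}")
    return arcs


# Assumed role assignment for the fog case study (illustrative only).
fog_inputs = {
    "Moisture": "predictor",
    "Gradient": "predictor",
    "Month": "environment",
    "RainNoRain": "environment",
    "SternParkyn": "guidance",
    "ReganoLatest": "guidance",
}

arcs = build_arcs("Fog", fog_inputs)
for parent, child in arcs:
    print(f"{parent} -> {child}")
```

The refinement step (extra arcs, intermediate variables, CPTs) would then start from this initial arc list.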
Case study – forecasting fog
• Different types of variables
– guidance:
Stern-Parkyn, Regano
– meteorological variables
weather elements:
Moisture, Pressure Gradient and Lapse Rate
environment variables (background factors):
Rainfall, Month
Possible BN structure(s) can be constructed from this knowledge.
Fog – Ontology fragment
[Diagram: ontology fragment for fog. Labels shown:
– output: Fog Y/N Prob
– Pressure Gradient 3pm at Bendigo, Wonthaggi, East Sale and Hamilton (e.g. pYWON-pYBDG); Combined Pressure Gradient 3pm
– Predicted Rainfall Amount Y/N 9am-9am
– Moisture 6/9pm
– Stern/Parkyn
– Regano
– Month
– connection types on the links: Predictor, Environment, Guidance
– working data: Actual 6/9pm Data, Actual MSLP Data]
Incremental prototyping development model
• Construction in steps
– guidance only
– meteorology only
– combined network
Bayesian Network: fog – guidance only
[Network diagram; node beliefs in %]
• SternParkyn: 0 to 1 = 46.9; 1 to 2 = 15.1; 2 to 5 = 17.5; 5 to 10 = 9.78; 10 to 15 = 4.12; 15 to 30 = 4.38; 30 to 100 = 2.27 (4.79 ± 11)
• ReganoLatest: Vfav = 13.4; fav = 16.0; unfav = 70.7
• Fog: fog = 3.23; nofog = 96.8
Bayesian Network: fog – meteorology only
[Network diagram; node beliefs in %]
• Month: January = 8.76; February = 7.99; March = 8.76; April = 8.48; May = 8.76; June = 8.48; July = 8.76; August = 8.36; September = 7.78; October = 8.04; November = 7.78; December = 8.04
• LengthOfNight: Nov to Jan = 24.6; Feb and Oct = 16.0; March and Sept = 16.5; Apr and Aug = 16.8; May to July = 26.0
• RainNoRain: 0 to 4.5 = 91.6; >= 4.5 = 8.41
• LapseRate9pmCont: < 2.05 = 26.5; 2.05 to 2.75 = 18.1; 2.75 to 3.25 = 17.6; >= 3.25 = 37.8 (2.74 ± 0.75)
• Gradient: Vfav = 33.0; fav = 19.3; unfav = 47.6
• Moisture: Vfav = 23.5; fav = 19.7; unfav = 56.8
• Fog: fog = 3.71; nofog = 96.3
Bayesian Network: fog – combined
[Network diagram with node groups Environment, Weather Guidance and Meteorology; node beliefs in %]
• Month: January = 8.76; February = 7.99; March = 8.76; April = 8.48; May = 8.76; June = 8.48; July = 8.76; August = 8.36; September = 7.78; October = 8.04; November = 7.78; December = 8.04
• LengthOfNight: Nov to Jan = 24.6; Feb and Oct = 16.0; March and Sept = 16.5; Apr and Aug = 16.8; May to July = 26.0
• RainNoRain: 0 to 4.5 = 91.6; >= 4.5 = 8.41
• LapseRate9pmCont: < 2.05 = 26.5; 2.05 to 2.75 = 18.1; 2.75 to 3.25 = 17.6; >= 3.25 = 37.8 (2.74 ± 0.75)
• Gradient: Vfav = 33.0; fav = 19.3; unfav = 47.6
• Moisture: Vfav = 23.5; fav = 19.7; unfav = 56.8
• ReganoLatest: Vfav = 13.6; fav = 16.0; unfav = 70.4
• SternParkyn: 0 to 1 = 51.0; 1 to 2 = 14.3; 2 to 5 = 16.7; 5 to 10 = 8.39; 10 to 15 = 3.60; 15 to 30 = 3.89; 30 to 100 = 2.12 (4.39 ± 11)
• Fog: fog = 3.71; nofog = 96.3
ROC curve evaluation
• Receiver Operating Characteristic (ROC) curves
• P(true positive) vs. P(false positive)
• Area under curve (AUC) is global measure
– perfect test: AUC = 1
• Can be used to find optimal cutoff values
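As a rough illustration of how a ROC curve and its AUC are obtained from probabilistic forecasts, the sketch below sweeps the cutoff over every forecast probability and integrates with the trapezoidal rule. The function names and the five sample forecasts are invented for the example, and the code assumes the data contain at least one fog and one no-fog case.

```python
def roc_points(probs, labels):
    """Return (false positive rate, true positive rate) points.

    The cutoff is swept from the highest forecast probability to the lowest;
    "fog" is forecast whenever the probability is >= the cutoff.
    """
    pos = sum(labels)            # number of fog cases (label 1)
    neg = len(labels) - pos      # number of no-fog cases (label 0)
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented forecasts: probability of fog and what actually happened (1 = fog).
probs = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 1, 0, 1, 0]

points = roc_points(probs, labels)
print(points)
print(auc(points))
```

Each point on the curve corresponds to one candidate cutoff, which is why the curve can also be used to pick a cutoff value.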
Bureau Evaluation Measures
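• POD (True Positive Rate)
POD = True Positive / (True Positive + False Negative), where (True Positive + False Negative) = #fog events
• False Positive Rate
False Positive Rate = False Positive / (False Positive + True Negative), where (False Positive + True Negative) = #no-fog events
• False Positive Ratio (FAR)
FAR = False Positive / (False Positive + True Positive), where (False Positive + True Positive) = #times fog was forecast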
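The three measures follow directly from the four cells of a forecast/outcome contingency table. The sketch below is illustrative only: the function name and the counts are made up and are not taken from the Bureau's data.

```python
def verification_scores(tp, fp, fn, tn):
    """Compute POD, false positive rate and FAR from contingency counts."""
    return {
        "POD": tp / (tp + fn),   # fraction of fog events that were forecast
        "FPR": fp / (fp + tn),   # fraction of no-fog events forecast as fog
        "FAR": fp / (fp + tp),   # fraction of fog forecasts that were wrong
    }


# Invented counts for a rare event: 20 fog events in 1000 cases.
scores = verification_scores(tp=13, fp=39, fn=7, tn=941)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With a rare event such as fog, the FAR can be high even when the false positive rate is low, because the two measures use different denominators.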
Evaluation
• Stratified 10-fold cross-validation used
• Dataset randomly divided into 90% (training) and 10% (validation) fractions
– separately for fog and no-fog cases
• Process repeated for 3 networks
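One way to produce such stratified splits is sketched below, using only the Python standard library. The function names, the fixed random seed and the synthetic data set (40 fog cases out of 1000) are assumptions for the example; the slides do not say how the split was implemented.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each case index to one of k folds, separately for each class."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled cases of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def train_validation_splits(labels, k=10, seed=0):
    """Yield (training indices, validation indices), one pair per fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        validation = folds[held_out]
        training = [i for f, fold in enumerate(folds) if f != held_out
                    for i in fold]
        yield training, validation


# Synthetic labels: 1 = fog, 0 = no fog.
labels = [1] * 40 + [0] * 960
splits = list(train_validation_splits(labels))
for training, validation in splits:
    n_fog = sum(labels[i] for i in validation)
    print(len(training), len(validation), n_fog)
```

Splitting fog and no-fog cases separately keeps the proportion of the rare fog class the same in every training and validation fraction.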
Results: ROC evaluation of the three networks
[Figure: ROC curves for the fog network; x-axis FARate (%), y-axis POD (%), both 0 to 100]
• Guidance Only: AUC 0.833
• Met Only: AUC 0.916
• Combined: AUC 0.928
ROC evaluation of the Melbourne Network
[Figure: ROC curves for the 3pm forecasts; x-axis FARate (%), y-axis POD (%), both 0 to 100]
• 3pm TAF: 0.857
• 3pm Combined: 0.917
ROC evaluation of the Melbourne Network
[Figure: ROC curves for the 9pm forecasts; x-axis FARate (%), y-axis POD (%), both 0 to 100]
• 9pm TAF: 0.866
• 9pm Combined: 0.928
POD & FAR – operations versus network
Forecast                 Operational POD (%)   Operational FAR (%)   Network POD (%)   Network FAR (%)
3pm TAF                  56                    73                    65                77
3pm TAF and Code Grey    87                    90                    95                90
9pm TAF                  67                    73                    71                76
9pm TAF and Code Grey    87                    90                    95                89
• 1% cutoff was used for Code Grey
• 20% cutoff was used for TAF
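The two cutoffs can be read as a simple mapping from the network's fog probability to a forecast category. The sketch below assumes that a probability at or above a cutoff triggers the category and that the TAF category takes precedence over Code Grey; neither detail is stated on the slide, and the names are hypothetical.

```python
CODE_GREY_CUTOFF = 0.01  # 1% cutoff for Code Grey
TAF_CUTOFF = 0.20        # 20% cutoff for TAF


def forecast_category(p_fog):
    """Map a fog probability to a forecast category (assumed >= comparison)."""
    if p_fog >= TAF_CUTOFF:
        return "TAF"
    if p_fog >= CODE_GREY_CUTOFF:
        return "Code Grey"
    return "No fog"


for p in (0.005, 0.05, 0.25):
    print(p, forecast_category(p))
```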
Ontology preferences
Actual    Decision                                        Utility
Fog       Say No Fog                                      -20
Fog       Say Code Grey - less than 5% chance of fog      10
Fog       Say Code Grey - 5% chance of fog                16
Fog       Say Code Grey - 10% chance of fog               18
Fog       Say Code Grey - 20% chance of fog               19
Fog       Say TAF – Prob Fog                              20
No Fog    Say No Fog                                      2
No Fog    Say Code Grey - less than 5% chance of fog      -1
No Fog    Say Code Grey - 5% chance of fog                -2
No Fog    Say Code Grey - 10% chance of fog               -3
No Fog    Say Code Grey - 20% chance of fog               -4
No Fog    Say TAF – Prob Fog                              -5
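Given these preferences, a decision can be chosen by maximising expected utility for a given probability of fog. The sketch below uses the utilities from the table; the shortened decision labels and function names are invented, and the calculation uses a single fog probability supplied by hand, so it is not expected to reproduce the values computed inside a full decision network.

```python
# decision: (utility if fog occurs, utility if no fog occurs), from the table.
UTILITY = {
    "Say No Fog": (-20, 2),
    "Code Grey <5%": (10, -1),
    "Code Grey 5%": (16, -2),
    "Code Grey 10%": (18, -3),
    "Code Grey 20%": (19, -4),
    "TAF Prob Fog": (20, -5),
}


def expected_utilities(p_fog):
    """Expected utility of each decision for a given probability of fog."""
    return {
        decision: p_fog * u_fog + (1.0 - p_fog) * u_nofog
        for decision, (u_fog, u_nofog) in UTILITY.items()
    }


def best_decision(p_fog):
    """Return the decision with the highest expected utility."""
    eu = expected_utilities(p_fog)
    return max(eu, key=eu.get)


for p in (0.01, 0.3, 0.9):
    print(p, best_decision(p))
```

Because missing a fog event is penalised far more heavily than a false alarm, the preferred decision moves away from "Say No Fog" even at fairly small fog probabilities.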
Ontology (POD – FAR)
Forecast outcome    Model POD    Model FAR
No fog              100          97
<5% Code Grey       95-89        90
5% Code Grey        88-84        85-83
10% Code Grey       81-73        82-80
20% Code Grey       76-70        79-78
Fog on TAF          71-65        77-76
Fog decision network
[Network diagram; node beliefs in %]
• Month: January = 8.76; February = 7.99; March = 8.76; April = 8.48; May = 8.76; June = 8.48; July = 8.76; August = 8.36; September = 7.78; October = 8.04; November = 7.78; December = 8.04
• LengthOfNight: Nov to Jan = 24.6; Feb and Oct = 16.0; March and Sept = 16.5; Apr and Aug = 16.8; May to July = 26.0
• RainNoRain: 0 to 4.5 = 91.6; >= 4.5 = 8.41
• Fog: fog = 3.71; nofog = 96.3
• LapseRate9pmCont: < 2.05 = 26.5; 2.05 to 2.75 = 18.1; 2.75 to 3.25 = 17.6; >= 3.25 = 37.8 (2.74 ± 0.75)
• Moisture: Vfav = 23.5; fav = 19.7; unfav = 56.8
• Gradient: Vfav = 33.0; fav = 19.3; unfav = 47.6
• SternParkyn: 0 to 1 = 51.0; 1 to 2 = 14.3; 2 to 5 = 16.7; 5 to 10 = 8.39; 10 to 15 = 3.60; 15 to 30 = 3.89; 30 to 100 = 2.12 (4.39 ± 11)
• U (utility node)
• Decision (value shown for each option): saynofog = 0.67775; CodeGreyLessThen5 = 1.31930; CodeGrey5 = 1.40469; CodeGrey10 = 1.35843; CodeGrey20 = 1.28933; sayfog = 1.20796
Conclusions
• Small fragments of Bayesian Networks are beneficial in the forecasting domain
• The incremental development model supports the acceptance of the Bayesian Networks
• The ontology was found to be useful
– for the explicit representation of all elicited knowledge including background information (variables, discretisation, arcs and probabilities)
– for sharing information between domain experts and the knowledge engineer
– as a guide for further elicitation
– in supporting the domain experts in the construction of a Bayesian Network
Future Work
• Further development of the ontology
• More research on how to determine preferences
• Other forecasting case studies
– Thunderstorms
• Testing
• Implementation issues