
Kansas State University

Department of Computing and Information Sciences

Real-Time Bayesian Network Inference for Decision Support in Personnel Management:

Report on Research Activities

William H. Hsu, Computing and Information Sciences

Haipeng Guo, Computing and Information Sciences

Shing I Chang, Industrial and Manufacturing Systems Engineering

Kansas State University
http://groups.yahoo.com/group/onr-mpp

This presentation is available at:

http://www.kddresearch.org/KSU/CIS/ONR-2002-Jun-04.ppt


Overview

• Knowledge Discovery in Databases (KDD)
– Towards scalable data mining

– Applications of KDD: learning and reasoning

• Building Causal Models for Decision Support

• Time Series and Model Integration
– Prognostic (prediction and monitoring) applications

– Crisis monitoring and simulation

– Anomaly, intrusion, fraud detection

– Web log analysis

– Applying high-performance neural, genetic, Bayesian computation

• Information Retrieval: Document Categorization, Text Mining
– Business intelligence applications (e.g., patents)

– “Web mining”: dynamic indexing and document analysis

• High-Performance KDD Program at K-State


High-Performance Database Mining and KDD:
Current Research Programs at K-State

• Laboratory for Knowledge Discovery in Databases (KDD)

– Research emphases: machine learning, reasoning under uncertainty

– Applications
• Decision support

• Digital libraries and information retrieval

• Remote sensing, robot vision and control

• Human-Computer Interaction (HCI) - e.g., simulation-based training

• Computational science and engineering (CSE)

• Curriculum and Research Development

– Real-time automated reasoning (inference)

– Machine learning

– Probabilistic models for multi-objective optimization

– Intelligent displays: visualization of diagrammatic models

– Knowledge-based expert systems, data modeling for KDD


Stages of Data Mining and Knowledge Discovery in Databases


Visual Programming: Java-Based Software Development Platform

D2K © 2002 National Center for Supercomputing Applications (NCSA)

Used with permission.


[Figure: the "Sprinkler" BBN. X1 Season: Spring, Summer, Fall, Winter; X2 Sprinkler: On, Off; X3 Rain: None, Drizzle, Steady, Downpour; X4 Ground: Wet, Dry; X5 Ground: Slippery, Not-Slippery]

P(Summer, Off, Drizzle, Wet, Not-Slippery) = P(S) · P(O | S) · P(D | S) · P(W | O, D) · P(N | W)

• Conditional Independence
– X is conditionally independent (CI) of Y given Z (sometimes written X ⊥ Y | Z) iff

P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z

– Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e., T ⊥ R | L

• Bayesian Network
– Directed graph model of conditional dependence assertions (or CI assumptions)

– Vertices (nodes): denote events (each a random variable)

– Edges (arcs, links): denote conditional dependencies

• General Product (Chain) Rule for BBNs

• Example (“Sprinkler” BBN)

P(X1, X2, …, Xn) = ∏i=1..n P(Xi | parents(Xi))

Bayesian Belief Networks (BBNs): Definition
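To make the chain rule concrete, here is a minimal Java sketch (illustrative only; the Node record and CPT key format are assumptions, not BNJ's API) that evaluates P(X1, …, Xn) = ∏ P(Xi | parents(Xi)) for one complete assignment, using a two-node fragment of the sprinkler network with made-up CPT entries:

```java
import java.util.*;

/** Minimal sketch (illustrative, not BNJ's API): evaluate the chain rule
 *  P(x1, ..., xn) = prod_i P(xi | parents(xi)) for one full assignment. */
public class ChainRuleDemo {
    /** Node = indices of parent variables + CPT keyed by "parentVals->ownVal". */
    record Node(int[] parents, Map<String, Double> cpt) {}

    static double joint(List<Node> net, String[] x) {
        double p = 1.0;
        for (int i = 0; i < net.size(); i++) {
            StringBuilder key = new StringBuilder();
            for (int j : net.get(i).parents()) key.append(x[j]).append(',');
            key.append("->").append(x[i]);
            p *= net.get(i).cpt().getOrDefault(key.toString(), 0.0); // P(Xi | parents(Xi))
        }
        return p;
    }

    public static void main(String[] args) {
        // Two-node fragment of the sprinkler BBN: Season -> Sprinkler (CPT values made up).
        Node season = new Node(new int[]{}, Map.of("->Summer", 0.25));
        Node sprinkler = new Node(new int[]{0}, Map.of("Summer,->On", 0.6, "Summer,->Off", 0.4));
        // P(Summer, Off) = P(Summer) * P(Off | Summer) = 0.25 * 0.4 = 0.1
        System.out.println(joint(List.of(season, sprinkler), new String[]{"Summer", "Off"}));
    }
}
```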


Bayesian Networks and Recommender Systems

• Current Research

– Efficient BBN inference (parallel, multi-threaded Lauritzen-Spiegelhalter in D2K)

– Hybrid quantitative and qualitative inference (“simulation”)

– Continuous variables and hybrid (discrete/continuous) BBNs

– Induction of hidden variables

– Local structure: localized constraints and assumptions, e.g., Noisy-OR BBNs

– Online learning

• Incrementality (aka lifelong, situated, in vivo learning)

• Ability to change network structure during inferential process

– Polytree structure learning (tree decomposition): alternatives to Chow-Liu

– Complexity of learning, inference in restricted classes of BBNs

• Future Work

– Decision networks aka influence diagrams (BBN + utility)

– Anytime / real-time BBN inference for time-constrained decision support

– Some temporal models: Dynamic Bayesian Networks (DBNs)


Data Mining: Development Cycle

[Bar chart: relative effort (%) per stage, 0-60 scale: Objective Determination, Data Preparation, Machine Learning, Analysis & Assimilation]

• Model Identification
– Queries: classification, assignment

– Specification of data model

– Grouping of attributes by type

• Prediction Objective Identification
– Assignment specification

– Identification of metrics

• Reduction
– Refinement of data model

– Selection of relevant data (quantitative, qualitative)

• Synthesis: New Attributes

• Integration: Multiple Data Sources (e.g., Enlisted Master File, Surveys)

[Diagram: Environment (Data Model), Learning Element, Knowledge Base, Decision Support System]


Learning Bayesian Networks: Gradient Ascent

• Algorithm Train-BN (D)

– Let wijk denote one entry in the CPT for variable Yi in the network

• wijk = P(Yi = yij | parents(Yi) = <the list uik of values>)

• e.g., if Yi ≡ Campfire, then (for example) uik ≡ <Storm = T, BusTourGroup = F>

– WHILE termination condition not met DO // perform gradient ascent

• Update all CPT entries wijk using training data D

• Renormalize wijk to assure invariants:

• Applying Train-BN

– Learns CPT values

– Useful in case of known structure

– Key problems: learning structure from data, approximate inference

[Figure: example BBN over Storm, BusTourGroup, Lightning, Campfire, Thunder, ForestFire]

Update rule: wijk ← wijk + η Σx ∈ D Ph(yij, uik | x) / wijk

Invariants: 0 ≤ wijk ≤ 1 and Σj wijk = 1
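A minimal sketch of one update pass, assuming the inferred posterior Ph(yij, uik | x) for each training case has already been computed by an inference engine and is supplied as post[d][j][k]; the array layout and learning rate eta are illustrative, not BNJ code:

```java
/** One gradient-ascent step on a single variable's CPT, following the update
 *  rule above. w[j][k] = P(Yi = yij | parents = uik); post[d][j][k] stands in
 *  for the inferred Ph(yij, uik | x_d). Illustrative sketch, not BNJ code. */
public class CptGradientStep {
    static void update(double[][] w, double[][][] post, double eta) {
        for (int j = 0; j < w.length; j++)
            for (int k = 0; k < w[j].length; k++) {
                double grad = 0.0;
                for (double[][] pd : post) grad += pd[j][k] / w[j][k]; // sum over x in D
                w[j][k] += eta * grad;
            }
        // Renormalize to restore the invariants 0 <= w[j][k] <= 1, sum_j w[j][k] = 1.
        for (int k = 0; k < w[0].length; k++) {
            double z = 0.0;
            for (double[] row : w) z += Math.max(row[k], 0.0);
            for (double[] row : w) row[k] = Math.max(row[k], 0.0) / z;
        }
    }
}
```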


• General-Case BBN Structure Learning: Use Inference to Compute Scores

• Recall: Bayesian Inference aka Bayesian Reasoning

– Assumption: hypotheses h ∈ H are mutually exclusive and exhaustive

– Optimal strategy: combine predictions of hypotheses in proportion to likelihood

• Compute conditional probability of hypothesis h given observed data D

• i.e., compute expectation over unknown h for unseen cases

• Let h ≡ structure, parameters Θ ≡ CPTs

Scores for Learning Structure: The Role of Inference

P(xm+1 | D) = P(xm+1 | x1, x2, …, xm) = Σh ∈ H P(xm+1 | D, h) P(h | D)

P(h | D) ∝ P(D | h) P(h)   [Posterior Score ∝ Marginal Likelihood × Prior over Structures]

P(D | h) = ∫ P(D | Θ, h) P(Θ | h) dΘ   [Marginal Likelihood = ∫ Likelihood × Prior over Parameters]


Learning Structure: K2 Algorithm and ALARM

• Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)

FOR i ← 1 to n DO // arbitrary ordering of variables {x1, x2, …, xn}

WHILE (Parents[xi].Size < Max-Parents) DO // find best candidate parent

Best ← argmax j < i P(D | Parents[xi] ∪ {xj}) // max Dirichlet score

IF ((Parents[xi] + Best).Score > Parents[xi].Score) THEN Parents[xi] += Best

RETURN ({Parents[xi] | i ∈ {1, 2, …, n}})
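The greedy loop above, sketched in Java; the Bayesian Dirichlet score is abstracted as a supplied function rather than implemented, and all names are illustrative:

```java
import java.util.*;
import java.util.function.BiFunction;

/** Sketch of K2's greedy parent search (per the pseudocode above). The
 *  Dirichlet score P(D | Parents[xi]) is passed in, not implemented here. */
public class K2Sketch {
    static List<Set<Integer>> k2(int n, int maxParents,
                                 BiFunction<Integer, Set<Integer>, Double> score) {
        List<Set<Integer>> parents = new ArrayList<>();
        for (int i = 0; i < n; i++) {                // fixed variable ordering x0..x(n-1)
            Set<Integer> pi = new HashSet<>();
            double old = score.apply(i, pi);
            boolean improved = true;
            while (improved && pi.size() < maxParents) {
                improved = false;
                int best = -1;
                double bestScore = old;
                for (int j = 0; j < i; j++) {        // candidate parents: predecessors only
                    if (pi.contains(j)) continue;
                    pi.add(j);
                    double s = score.apply(i, pi);   // score with xj added
                    pi.remove(j);
                    if (s > bestScore) { bestScore = s; best = j; }
                }
                if (best >= 0) { pi.add(best); old = bestScore; improved = true; }
            }
            parents.add(pi);
        }
        return parents;
    }
}
```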

• A Logical Alarm Reduction Mechanism [Beinlich et al., 1989]

– BBN model for patient monitoring in surgical anesthesia

– Vertices (37): findings (e.g., esophageal intubation), intermediates, observables

– K2: found BBN different in only 1 edge from gold standard (elicited from expert)

[Figure: ALARM network graph, 37 numbered vertices]


Major Software Releases, FY 2002

• Bayesian Network Tools in Java (BNJ)
– v1.0a released Wed 08 May 2002 to www.Sourceforge.net

– Key features

• Standardized data format (XML)

• Existing algorithms: inference, structure learning, data generation

– Experimental results

• Improved structure learning using K2, inference-based validation

• Adaptive importance sampling (AIS) inference competitive with best published algorithms

• Machine Learning in Java (MLJ)
– v1.0a released Fri 10 May 2002 to www.Sourceforge.net

– Key features: three (3) inductive learning algorithms from MLC++, two (2) inductive learning wrappers (1 from MLC++, 1 from the GA literature)

– Experimental results

• Genetic wrappers for feature subset selection: Jenesis, MLJ-CHC

• Overfitting control in supervised inductive learning for classification


• About BNJ
– v1.0a, 08 May 2002: 26000+ lines of Java code, GNU General Public License (GPL)
– http://www.kddresearch.org/Groups/Probabilistic-Reasoning/BNJ
– Key features [Perry, Stilson, Guo, Hsu, 2002]

• XML BN Interchange Format (XBN) converter – to serve 7 client formats (MSBN, Hugin, SPI, IDEAL, Ergo, TETRAD, Bayesware)

• Full exact inference: Lauritzen-Spiegelhalter (Hugin) algorithm
• Five (5) importance sampling algorithms (see the sketch after this list): forward simulation (likelihood weighting) [Shachter and Peot, 1990], probabilistic logic sampling [Henrion, 1986], backward sampling [Fung and del Favero, 1995], self-importance sampling [Shachter and Peot, 1990], adaptive importance sampling [Cheng and Druzdzel, 2000]

• Data generator
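A sketch of the first of the five samplers, likelihood weighting (forward simulation); the Net interface, evidence encoding, and query signature are assumptions for illustration, not BNJ's actual classes:

```java
import java.util.*;

/** Likelihood weighting (forward simulation) sketch. dist(i, x) must return
 *  the distribution P(Xi | parents(Xi)) under partial assignment x; nodes are
 *  assumed to be indexed in topological order. Illustrative, not BNJ code. */
public class LikelihoodWeighting {
    interface Net { double[] dist(int i, int[] x); }

    /** Estimate P(Xq = qVal | evidence); evidence[i] = -1 means Xi unobserved. */
    static double query(Net net, int n, int[] evidence, int q, int qVal, int samples) {
        Random rng = new Random(0);
        double num = 0, den = 0;
        for (int s = 0; s < samples; s++) {
            int[] x = evidence.clone();
            double w = 1.0;
            for (int i = 0; i < n; i++) {
                double[] p = net.dist(i, x);
                if (evidence[i] >= 0) {
                    w *= p[evidence[i]];                 // weight by likelihood of the evidence
                } else {
                    double u = rng.nextDouble(), c = 0.0;
                    x[i] = p.length - 1;                 // fallback against rounding error
                    for (int v = 0; v < p.length; v++) {
                        c += p[v];
                        if (u < c) { x[i] = v; break; }  // sample Xi from P(Xi | parents)
                    }
                }
            }
            den += w;
            if (x[q] == qVal) num += w;
        }
        return num / den;                                // assumes den > 0
    }
}
```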

• Published Research with Applications to Personnel Science
– Recent work

• GA for improved structure learning: results in [HGPS02a; HGPS02b]
• Real-time inference framework – multifractal analysis [GH02b]

– Current work: prediction – migration trends (EMF); Sparse Candidate
– Planned continuation: (dynamic) decision networks; continuous BNs

Bayesian Network Tools in Java (BNJ)


Change of Representation and Inductive Bias Control

[Diagram: [A] Genetic Algorithm proposes a candidate representation α; [B] Representation Evaluator for Learning Problems takes training data D (Dtrain for inductive learning, Dval for inference) and an inference specification eI, returning representation fitness f(α); the search yields an optimized representation α̂]

GA for BN Structure Learning [Hsu, Guo, Perry, Stilson, GECCO-2002]
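A sketch of the wrapper loop in the diagram above; for brevity, a simple mutate-and-keep-if-better search stands in for the actual GA (Jenesis/CHC), and the fitness function is assumed to encapsulate "learn on Dtrain, measure inferential loss on Dval":

```java
import java.util.*;
import java.util.function.Function;

/** Wrapper-loop sketch: [A] proposes candidate representations (attribute
 *  inclusion masks alpha); [B], wrapped in `fitness`, returns f(alpha).
 *  A (1+1)-style hill climber stands in for the real GA. Illustrative only. */
public class GaWrapperSketch {
    static boolean[] search(int nAttrs, int gens, Function<boolean[], Double> fitness) {
        Random rng = new Random(0);
        boolean[] best = new boolean[nAttrs];
        Arrays.fill(best, true);                    // start from the full representation
        double bestF = fitness.apply(best);
        for (int g = 0; g < gens; g++) {
            boolean[] cand = best.clone();
            cand[rng.nextInt(nAttrs)] ^= true;      // flip one attribute's inclusion bit
            double f = fitness.apply(cand);
            if (f > bestF) { best = cand; bestF = f; }
        }
        return best;                                // optimized representation (alpha-hat)
    }
}
```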


[Diagram: a candidate input specification α feeds [i] inductive learning (parameter estimation from training data Dtrain), producing hypothesis h; [ii] validation measures inferential loss on Dval under evidence specification eI; [B] the Representation Evaluator for Input Specifications returns specification fitness f(α), the inferential loss]

Model-Based Validation [Hsu, Guo, Perry, Stilson, GECCO-2002]


BNJ: Integrated Tool for Bayesian Network Learning and Inference

XML Bayesian Network Learned from Data using K2 in BNJ


• About MLJ
– v1.0a, 10 May 2002: 24000+ lines of Java code, GNU General Public License (GPL)
– http://www.kddresearch.org/Groups/Machine-Learning/MLJ
– Key features [Hsu, Schmidt, Louis, 2002]

• Conformant to MLC++ input-output specification
• Three (3) inductive learning algorithms: ID3, C4.5, discrete Naïve Bayes
• Two (2) wrapper inducers: feature subset selection [Kohavi and John, 1997], CHC [Eshelman, 1990; Guerra-Salcedo and Whitley, 1999]

• Published Research with Applications to Personnel Science
– Recent work

• Multi-agent learning [GH01, GH02a]
• Genetic feature selection wrappers [HSL02, HWRC02, HS02]

– Current work: WEKA compatibility, parallel online continuous arcing
– Planned continuations

• New inducers: instance-based (k-nearest-neighbor), sequential rule covering, feedforward artificial neural network (multi-layer perceptron)

• New wrappers: theory-guided constructive induction, boosting (Arc-x4, AdaBoost.M1, POCA)

• Integration of reinforcement learning (RL) inducers

Machine Learning in Java (MLJ)


Infrastructure for High-Performance Computation in Data Mining

Rapid KDD Development Environment: Operational Overview


National Center for Supercomputing Applications (NCSA) D2K


Visual Programming Interface (Java): Parallel Genetic Algorithms


Time Series Modeling and Prediction: Integration with Information Visualization

New Time Series Visualization System (Java3D)


Demographics-Based Clustering for Prediction (Continuing Research)

Cluster Formation and Segmentation Algorithm (Sketch)

[Diagram: dimensionality-reducing projection (x′), clusters of similar records, Delaunay triangulation, Voronoi (nearest neighbor) diagram (y)]


Data Clustering in Interactive Real-Time Decision Support

15 × 15 Self-Organizing Map (U-Matrix Output)

Cluster Map (Personnel Database)
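For reference, a compact sketch of the SOM training step behind such a map; the grid size, decay schedules, and seeding here are arbitrary illustrative choices, not the configuration actually used:

```java
import java.util.Random;

/** Self-organizing map training sketch: repeatedly pick a record, find the
 *  best-matching unit (BMU) on the grid, and pull the BMU and its neighbors
 *  toward the record. w[r][c] is the prototype vector of grid cell (r, c). */
public class SomSketch {
    static void train(double[][][] w, double[][] data, int iters) {
        Random rng = new Random(0);
        int n = w.length;                                     // n x n grid, e.g., 15 x 15
        for (int t = 0; t < iters; t++) {
            double[] x = data[rng.nextInt(data.length)];
            int br = 0, bc = 0;
            double best = Double.MAX_VALUE;
            for (int r = 0; r < n; r++)                       // 1. find the BMU
                for (int c = 0; c < n; c++) {
                    double d = 0.0;
                    for (int k = 0; k < x.length; k++)
                        d += (x[k] - w[r][c][k]) * (x[k] - w[r][c][k]);
                    if (d < best) { best = d; br = r; bc = c; }
                }
            double alpha = 0.5 * (1.0 - (double) t / iters);  // decaying learning rate
            double sigma = 1.0 + (n / 2.0) * (1.0 - (double) t / iters); // shrinking radius
            for (int r = 0; r < n; r++)                       // 2. update the neighborhood
                for (int c = 0; c < n; c++) {
                    double g = Math.exp(-((r - br) * (r - br) + (c - bc) * (c - bc))
                                        / (2.0 * sigma * sigma));
                    for (int k = 0; k < x.length; k++)
                        w[r][c][k] += alpha * g * (x[k] - w[r][c][k]);
                }
        }
    }
}
```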


• Laboratory for Knowledge Discovery in Databases (KDD)
– Applications: interdisciplinary research programs at K-State, FY 2002

• Decision support, optimization (Hsu, CIS; Chang, IMSE)

• (NSF EPSCoR) Bioinformatics – gene expression modeling (Hsu, CIS; Welch, Agronomy; Roe, Biology; Das, EECE)

• Digital libraries, info retrieval (Hsu, CIS; Zollman, Physics; Math, Art)

• Human-Computer Interaction (HCI) - e.g., simulation-based training

• Curriculum Development
– Real-time intelligent systems (Chang, Hsu, Neilsen, Singh)

– Machine learning and artificial intelligence; info visualization (Hsu)

– Other: bioinformatics, digital libraries, robotics, DBMS

• Research Partnerships
– NCSA: National Computational Science Alliance, National Center for Supercomputing Applications

– Defense (ONR, ARL, DARPA), Industry (Raytheon)

• Publications, More Info: http://www.kddresearch.org

Summary: State of High-Performance KDD at KSU-CIS