A Process Catalog for Workflow Generation
Michael Wolverton, David Martin, Ian Harrison, Jerome Thomere
SRI International
ISWC 2008 A Process Catalog for Workflow Generation David Martin
Outline
• Program overview
• Project overview
• Qualitative (capabilities) layer *
  – Modeling & query handling
• Quantitative (“quality of service”) layers
  – Modeling & query handling
• Implementation
* Primary focus in this talk
Tangram Program Objectives
• Support the intelligence analyst in using data analysis tools effectively
• Automatic instantiation of data analysis workflows
  – Maximize performance within acceptable resource constraints
  – Reusable workflow templates
  – Flexible workflow requests
• Automatic selection of data analysis components and datasets
• Quick & easy characterization of component descriptions
  – By non-experts
  – Supporting precise capabilities queries
  – Incorporating empirical measures of speed and effectiveness
Example Workflow Template
[Figure: workflow template diagram. Recoverable labels: Entity/Transaction Data; GroupSeedSet; EntityEquivalence (Alias Resolution); SuspicionScoring; GroupDetection (appears twice); GroupHypothesis Merging; LikelihoodRatio Detection; InexactGraph Matching; LogicalInference; EventEquivalence; Recognized Events/Alerts.]
• Backwards sweep
• Forwards sweep
Simple Example: “Backward Sweep”
[Figure: backward sweep over a workflow. Recoverable labels, de-duplicated:]
• Components and patterns: LAW; VulnExp Pattern; Threat Resource Acquire Pattern; CADRE; 'memberOf; NetKit; 'suspiciousEntity; UWisc Suspicion Scoring
• Catalog contents: Process Descriptions; Accuracy Models
• Exchange at each step (shown twice): Qualitative Query, answered with Process + Preconditions
• Conditions shown:
    (containsNodeType ?DS 'SuspiciousEvent)
    (containsNodeType ?DS 'memberOf)
    (containsNodeType ?DS 'suspiciousEntity)
    (containsLinkType ?DS 'suspiciousEntity)
    (containsNodeType ?DS 'Group)
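The backward sweep can be pictured as a loop that walks the workflow template from its last stage to its first: each stage issues one qualitative query, and the preconditions that come back become the requirement for the stage upstream. The following is a minimal Python sketch under stated assumptions. The `CATALOG` entries, the process names, and the `find_process` helper are hypothetical stand-ins for illustration, not ProCat's API or its actual catalog contents.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

Clause = Tuple[str, str]  # (predicate, type name), e.g. ("containsNodeType", "Group")


class ProcessEntry(NamedTuple):
    name: str
    stage: str                 # workflow-template stage this process can fill
    pre: FrozenSet[Clause]     # what its input dataset must contain
    post: FrozenSet[Clause]    # what its output dataset will contain


# Hypothetical catalog contents, for illustration only.
CATALOG = [
    ProcessEntry("scorer-1", "SuspicionScoring",
                 pre=frozenset({("containsNodeType", "Person")}),
                 post=frozenset({("containsNodeType", "suspiciousEntity")})),
    ProcessEntry("grouper-1", "GroupDetection",
                 pre=frozenset({("containsNodeType", "suspiciousEntity")}),
                 post=frozenset({("containsNodeType", "Group")})),
    ProcessEntry("matcher-1", "PatternMatching",
                 pre=frozenset({("containsNodeType", "Group")}),
                 post=frozenset({("containsNodeType", "SuspiciousEvent")})),
]


def find_process(stage: str, required: FrozenSet[Clause]) -> Optional[ProcessEntry]:
    """Stand-in for a qualitative query: first process for the stage
    whose postconditions cover everything in `required`."""
    for entry in CATALOG:
        if entry.stage == stage and required <= entry.post:
            return entry
    return None


def backward_sweep(stages: List[str], goal: FrozenSet[Clause]):
    """Walk the stages last-to-first; each answer's preconditions
    become the requirement passed to the stage before it."""
    required = goal
    chosen: List[str] = []
    for stage in reversed(stages):
        entry = find_process(stage, required)
        if entry is None:
            raise LookupError(f"no process for {stage} satisfies {sorted(required)}")
        chosen.append(entry.name)
        required = entry.pre
    chosen.reverse()
    return chosen, required


plan, input_requirements = backward_sweep(
    ["SuspicionScoring", "GroupDetection", "PatternMatching"],
    frozenset({("containsNodeType", "SuspiciousEvent")}),
)
```

Here `plan` lists one chosen process per stage in execution order, and `input_requirements` is what the initial dataset must satisfy for the whole chain to apply.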
Simple Example: “Forward Sweep”
[Figure: forward sweep over the same workflow. Recoverable labels, de-duplicated:]
• Components and patterns: LAW; VulnExp Pattern; Threat Resource Acquire Pattern; CADRE; 'memberOf; NetKit; 'suspiciousEntity; UWisc Suspicion Scoring
• Catalog contents: Process Descriptions; Accuracy Models
• Exchange at each step (shown twice): Query: Process + Problem + Data Model, answered with a Data Model
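In the forward sweep, each query carries a process, a problem, and the current data model, and the answer is the predicted data model after that process runs, which then feeds the next query. A minimal Python sketch of that chaining follows. It assumes a data model can be reduced to a dictionary of numeric statistics and that each process has a predictor function; `DataModel`, `scale_nodes`, and the numbers are hypothetical illustrations, not ProCat's representation.

```python
from typing import Callable, Dict, List

DataModel = Dict[str, float]  # hypothetical: summary statistics of a dataset
Predictor = Callable[[str, DataModel], DataModel]  # (problem, input model) -> output model


def scale_nodes(factor: float) -> Predictor:
    """Build a toy data-modification model that scales the node count by `factor`."""
    def predict(problem: str, model: DataModel) -> DataModel:
        out = dict(model)  # copy so earlier models in the trace stay unchanged
        out["nodes"] = model["nodes"] * factor
        return out
    return predict


def forward_sweep(predictors: List[Predictor], problem: str,
                  initial: DataModel) -> List[DataModel]:
    """Feed each predicted output model into the next stage and
    return the initial model plus one predicted model per stage."""
    models = [initial]
    for predict in predictors:
        models.append(predict(problem, models[-1]))
    return models


# Hypothetical two-stage workflow: stage 1 keeps half the nodes, stage 2 a quarter of those.
trace = forward_sweep(
    [scale_nodes(0.5), scale_nodes(0.25)],
    "example-problem",
    {"nodes": 1000.0, "links": 4000.0},
)
```

Keeping every intermediate model in `trace` is a design choice for the sketch: accuracy and performance predictions for a stage would need the data model as it stands at that stage's input.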
Project Overview: Objectives and Approach
• Challenge: Characterize individual components in a way that allows a workflow management component to reason about them effectively
• Approach: Characterize processes & answer queries in terms of:
  – Process capabilities
    • What kinds of problems they are capable of answering
  – How they modify the available data
    • What data looks like before running the process and what it looks like after
      – Content
      – Accuracy
  – Performance
    • System requirements (memory, OS, etc.)
    • Time, memory use, etc.
Approach: Layered Process Description
Layer: Capabilities
  Contents: Qualitative “functional” descriptions, hard resource constraints, invocation details
  Formalism: Static characteristics in OWL; pre- & postconditions in rules
  Source of knowledge: Hand-coded by component developers
Layer: Data Modification
  Contents: Statistical “before/after” descriptions of data
  Formalism: Problem × Data Model => Data Model
  Source of knowledge: Experimental analysis, theoretical analysis
Layer: Accuracy
  Contents: Statistical description of expected accuracy of algorithm results
  Formalism: Problem × Data Model × Accuracy Model => Accuracy Model
  Source of knowledge: Experimental analysis, theoretical analysis
Layer: Performance
  Contents: Statistical prediction of performance of algorithm
  Formalism: Problem × Data Model × Resource Model => Performance Model
  Source of knowledge: Experimental analysis, theoretical analysis
Relationship to Service Discovery Problem
• Easier in some ways (simplifying assumptions)
  – Components operate on data only
    • No side-effects “in the world”
  – Simple patterns of I/O shared by most components
  – Smallish domain model (ontology)
• Harder in some ways
  – Need to return preconditions related to specific needs
    • “Least sufficient conditions”
  – Need hi-fidelity (quantitative) “Quality of Service” models
  – Compute QoS for specific datasets at query-time
ProCat Architecture
[Figure: ProCat architecture diagram. Recoverable labels:]
• Query Handler
• CL Reasoner
• Capabilities Layer KB: Process entries PM1, PM2, GD1, GD2, ...; Ontologies (Data, TEO)
• Quantitative Layer Prediction: Linear Predictor; Non-Linear Predictor; ...
• Quantitative Models Repository: PM Search Model; entries for PM1, PM2, GD1, GD2, ...; Coeff. GD1; Coeff. GD2; Data GD2; Data PM1; Pattern 1
• Technology labels: OWL; SPARQL; RDFS++ Reasoning; RDF/XML Syntax with Extension; SPARQL
Capabilities Layer
[Figure: the ProCat architecture diagram repeated, with the Capabilities Layer KB entry for PM1 expanded. Recoverable labels in the expanded entry, in extraction order:]
• I/O Behavior: Pattern 1; Pattern 2; I/O Rules
• Process; Requirements; Class
• Proc Inst 1
  – Invoc. Command
  – Resource Requirements
  – Site
• Proc Inst 2
  – Invoc. Command
  – Resource Requirements
  – Site
• ...
Example: Capabilities Query
<pcat:FindInputDataRequirements>
  <pcat:component>
    <rdf:Description rdf:about="http://...#?component2">
      <rdf:type rdf:resource="http://.../Process.owl#PatternMatchingProcess"/>
      <pdl:hasOutput rdf:resource="http://...#?dataVariable5"/>
      <pdl:hasInput rdf:resource="http://...#?dataVariable4"/>
      <pdl:hasInput rdf:resource="http://...#?dataVariable3"/>
    </rdf:Description>
  </pcat:component>
  <pcat:constraints>
    <rdf:Description rdf:about="http://...#?dataVariable5">
      <pdl:hasRole rdf:resource="http://.../Process.owl#HypothesisOutputRole"/>
      <rdf:type rdf:resource="http://...#Hypothesis"/>
      <pdl:containsNodeType rdf:resource="http://...#MoneyLaunderingEvent"/>
    </rdf:Description>
    …..
  </pcat:constraints>
</pcat:FindInputDataRequirements>
Process Description Ontology
• Process
  – Class hierarchy
  – Parameters
    • Types
    • Roles
    • Default values
    • Multiple inheritance
  – Pre- and post-conditions
• Process Usage Template
• Process installation
  – Resource requirements
    • Memory, disk space, libraries, etc.
  – Invocation conventions
    • Environment variables, paths
Capabilities Layer Challenges
• Pre- and post-conditions
  – Hypothetical in nature
  – Inherently “reified”
  – Applicable to execution instances of a process
    pre: (input containsNodeType Person)
• Propagation of values (in “backwards sweep”)
    pre: (input containsNodeType ?T)
    post: (output containsNodeType ?T)
• Universally quantified conditional rules
    (output containsNodeType ?T) :- (input1 containsNodeType ?T), (input2 containsNodeType ?T).
• Queries may contain pre- and post-condition elements (including arbitrary pre-condition elements)
    pre: (input1 rdf:type PersonDataset)
         (input2 rdf:type EventDataset)
         (input2 temporalRange <...>)
    post: (output containsLinkType ParticipatedIn)
• Least sufficient precondition is desired
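Read backwards, a universally quantified conditional rule says which input conditions would establish a requested output condition. Matching the rule head against the request binds `?T`, and the binding propagates into the rule body. Here is a minimal Python sketch of that single step, using the rule shown on the slide. The `match`, `substitute`, and `required_inputs` helpers are illustrative assumptions; the talk describes Prolog-based evaluation, which this one-way matcher only approximates.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); "?name" marks a variable


def match(pattern: Triple, fact: Triple) -> Optional[Dict[str, str]]:
    """One-way match of a pattern against a ground triple.
    Returns the variable bindings, or None if they do not match."""
    bindings: Dict[str, str] = {}
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:
                return None  # the same variable would need two different values
        elif p != f:
            return None
    return bindings


def substitute(clause: Triple, bindings: Dict[str, str]) -> Triple:
    """Replace bound variables in a clause with their values."""
    s, p, o = clause
    return (bindings.get(s, s), bindings.get(p, p), bindings.get(o, o))


# The conditional effect rule from the slide, written as head :- body.
HEAD: Triple = ("output", "containsNodeType", "?T")
BODY: List[Triple] = [
    ("input1", "containsNodeType", "?T"),
    ("input2", "containsNodeType", "?T"),
]


def required_inputs(goal: Triple) -> Optional[List[Triple]]:
    """Which input conditions would establish `goal` on the output?"""
    bindings = match(HEAD, goal)
    if bindings is None:
        return None
    return [substitute(clause, bindings) for clause in BODY]


needs = required_inputs(("output", "containsNodeType", "Person"))
```

For the request above, `needs` holds the rule body with `?T` replaced by `Person` for both inputs; a request the rule head cannot match yields `None`.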
Solution
• Process Usage Template (PUT)
  – “Snapshot” of an arbitrary successful occurrence of a process
  – Each process can have multiple PUTs
• 2 declarative units
  – Pre / post condition (existentially quantified)
  – Conditional effect rules (universally quantified)
• Two-stage query processing
  – SPARQL queries identify candidate processes based on “static” properties
  – Prolog-based evaluation of pre/post-condition query clauses
• Asymmetric treatment of pre vs. post
  – Query postcondition clauses must be derivable from PUT postcondition (or conditional effect)
  – Query precondition clauses must be consistent with PUT precondition
• Result precondition is accumulation of
  – Precondition (with propagated variable bindings)
  – Bodies of CE rules used to establish postcondition clauses (with propagated variable bindings)
  – Precondition clauses given in query
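The query-processing steps above can be sketched end to end in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not ProCat's implementation: a class-name filter stands in for the SPARQL stage, a one-way matcher stands in for Prolog, "consistent" is reduced to a single-valued-predicate check (`SINGLE_VALUED` is an assumption), and the sketch does not attempt to prove that the accumulated precondition is the least sufficient one. The `PUT` records and process names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]  # terms starting with "?" are variables


class Rule(NamedTuple):
    head: Triple
    body: List[Triple]


class PUT(NamedTuple):  # Process Usage Template, heavily simplified
    process: str
    process_class: str
    pre: List[Triple]
    post: List[Triple]
    rules: List[Rule]   # conditional effect (CE) rules


SINGLE_VALUED = {"rdf:type"}  # assumption: one value per subject for these predicates


def bind(pattern: Triple, goal: Triple) -> Optional[Dict[str, str]]:
    bindings: Dict[str, str] = {}
    for p, g in zip(pattern, goal):
        if p.startswith("?"):
            if bindings.setdefault(p, g) != g:
                return None
        elif p != g:
            return None
    return bindings


def subst(clause: Triple, b: Dict[str, str]) -> Triple:
    return (b.get(clause[0], clause[0]), b.get(clause[1], clause[1]), b.get(clause[2], clause[2]))


def consistent(query_pre: List[Triple], put_pre: List[Triple]) -> bool:
    """Query preconditions need only be compatible with the PUT's, not derivable."""
    return not any(p in SINGLE_VALUED and (s, p) == (s2, p2) and o != o2
                   for (s, p, o) in query_pre for (s2, p2, o2) in put_pre)


def establish(put: PUT, goal: Triple):
    """Derive one query postcondition from the PUT postcondition or a CE rule."""
    for pattern in put.post:
        b = bind(pattern, goal)
        if b is not None:
            return b, []
    for rule in put.rules:
        b = bind(rule.head, goal)
        if b is not None:
            return {}, [subst(c, b) for c in rule.body]
    return None


def answer(puts: List[PUT], wanted_class: str,
           query_pre: List[Triple], query_post: List[Triple]) -> Dict[str, List[Triple]]:
    results: Dict[str, List[Triple]] = {}
    for put in puts:
        if put.process_class != wanted_class:          # stage 1: static properties
            continue
        if not consistent(query_pre, put.pre):         # stage 2: pre must be consistent
            continue
        steps = [establish(put, goal) for goal in query_post]
        if any(step is None for step in steps):        # stage 2: post must be derivable
            continue
        bindings: Dict[str, str] = {}
        rule_bodies: List[Triple] = []
        for b, bodies in steps:
            bindings.update(b)
            rule_bodies += bodies
        accumulated = [subst(c, bindings) for c in put.pre] + rule_bodies + list(query_pre)
        results[put.process] = list(dict.fromkeys(accumulated))  # drop repeats, keep order
    return results


PUTS = [
    PUT("matcher-1", "PatternMatchingProcess",
        pre=[("input", "rdf:type", "EventDataset")],
        post=[("output", "rdf:type", "Hypothesis")],
        rules=[Rule(("output", "containsNodeType", "?T"),
                    [("input", "containsNodeType", "?T")])]),
    PUT("matcher-2", "PatternMatchingProcess",
        pre=[("input", "rdf:type", "PersonDataset")],
        post=[("output", "rdf:type", "Hypothesis")], rules=[]),
    PUT("grouper-1", "GroupDetectionProcess",
        pre=[], post=[("output", "containsNodeType", "Group")], rules=[]),
]

result = answer(
    PUTS, "PatternMatchingProcess",
    query_pre=[("input", "rdf:type", "EventDataset")],
    query_post=[("output", "rdf:type", "Hypothesis"),
                ("output", "containsNodeType", "MoneyLaunderingEvent")],
)
```

In this toy catalog only `matcher-1` survives: `grouper-1` fails the static filter and `matcher-2` has a precondition that conflicts with the query's. The returned precondition for `matcher-1` combines its own precondition, the instantiated body of the CE rule that supplied the node-type clause, and the query's precondition.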
ProCat Quantitative Layers Architecture
[Figure: quantitative layers architecture diagram. Recoverable labels:]
• Query Handler; 4.2 and 5.2 Queries; Quantitative Data, Accuracy, and Performance Predictions
• Quantitative Layer Prediction: Linear Predictor; Nonlinear Predictor; ...
• Quantitative Models Repository: PM Search Model; entries for PM1, PM2, GD1, GD2, ...; Coeff. GD1; Coeff. GD2; Data GD2; Data PM1; Pattern 1
• Other labels: SR; DC Metrics Ontology; TEE; Experimental Results; Prediction Engine; Component Execution; Data; Models + Data Characterizations
Quantitative Layers
• Requirements
  – Precise
  – Efficient
  – Composable
• Quantitative models represented declaratively
  – Tabular format (not in OWL)
• Query result generation done procedurally
  – Using Lisp functions
• Coefficients for the linear model can be learned through a regression method
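The last point, learning linear-model coefficients by regression, can be illustrated with ordinary least squares on a single predictor. This is a minimal Python sketch only: the talk does not say which regression method or which predictor variables ProCat uses (and its own query functions are in Lisp), and the measurements below are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x.
    Returns (intercept, slope)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical experimental results: dataset size (nodes) vs. measured runtime (seconds).
sizes = [100.0, 200.0, 300.0, 400.0]
runtimes = [12.0, 22.0, 32.0, 42.0]

intercept, slope = fit_line(sizes, runtimes)

# The fitted coefficients are what a tabular model entry would store;
# a prediction for an unseen dataset is then a cheap evaluation.
predicted_runtime = intercept + slope * 500.0
```

Storing only the coefficients keeps the model declarative and makes query-time prediction inexpensive, which fits the "precise, efficient, composable" requirements listed above.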
Process-specific Prediction Models
[Figure: two bar charts comparing Predicted and Actual values for datasets H_Y4_4018, H_Y4_4019, H_Y4_4020, H_Y4_4021, and H_Y4_4028. Panel “Data Modification”: y-axis “Results”, scale 0 to 900. Panel “Performance”: y-axis “States Expanded”, scale 0 to 4500.]
• Recurrence relation Pattern Matcher model compared to LAW actual results
• Mean error:
  – Data Modification: 20%
  – Performance: 19%
• Runtime differs from LAW by over 2 orders of magnitude
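One common way to summarize a predicted-versus-actual comparison like this is the mean relative error across datasets. The slide does not state how its "mean error" was computed, so the Python sketch below is an assumption about the metric, and the numbers in it are hypothetical rather than read from the charts.

```python
from typing import List


def mean_relative_error(predicted: List[float], actual: List[float]) -> float:
    """Average of |predicted - actual| / |actual| over all datasets."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need equally long, non-empty lists")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined when an actual value is zero")
    return sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical per-dataset values, for illustration only.
predicted_results = [110.0, 180.0, 400.0]
actual_results = [100.0, 200.0, 500.0]

error = mean_relative_error(predicted_results, actual_results)
```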
Implementation
[Figure: implementation architecture diagram. Recoverable labels:]
• ProCat Server
• AllegroGraph: Triple Store; SPARQL; RDFS++; Prolog; Access API; SOAP
• ProCat infrastructure: Concurrent queries; Logging; Web service API
• Tangram Workflow Services API; ProCat API
• Domain ontologies; Component descriptions
• WINGS; TEE; ProCat GUI
Future directions
• Validity checking of ontology updates
• Validity checking of new / updated process characterizations
• Allow for disjunction in pre- and post-conditions
• Process characterization editor
• Automation of quantitative model acquisition
• Assistance for updating process descriptions against ontology changes
• Better online browsing and catalog management
Summary
• Design & implementation of a Process Catalog for Workflow Generation
  – Qualitative (capabilities) layer
  – Quantitative (“quality of service”) layers
• Novel elements
  – Quantitative layers (“Quality of Service”)
    • Numeric models for data modification, accuracy, performance
• Novel approach to reasoning about pre- and post-conditions
  – Propagation of values (in “backwards sweep”)
  – Universally quantified rules
  – Queries may contain pre- and post-condition elements
  – Computation of least sufficient precondition