Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Approximate Semantic Matching of Heterogeneous Events
Souleiman Hasan, Sean O’Riain, Edward Curry
Digital Enterprise Research Institute (DERI), National University of Ireland, Galway (NUIG)
In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012), Berlin, Germany, 2012.
Outline
Introduction: Smart Environments, Motivational Scenario, Related Work
Proposal: Approximate Semantic Matching
Experiments: Wikipedia, Freebase
Conclusions
Q & A
Smart Environments
Smart Homes, Grids, Cities…
Internet-of-Things, Sensor Web…
Non-technical users
High heterogeneity
Trend for dynamic data-driven decision making
By 2020, 50 billion devices connected to mobile networks (OECD, 2012)
Event/Situation of Interest: New free parking space near me
Event/Situation of Interest: Soccer match played in Berlin
Motivational Scenario - Enterprise
[Figure: Event Cloud diagram. Labels: Strategic Level, Operational Level, CIO, CSO, Helpdesk, Maintenance Personnel, Equipment Monitoring, Building, Data Center.]
Various terms used: energy consumption, energy usage…; room, space, zone…
Situation of Interest: Company CO2 emissions performance
Dynamic environments: new events from equipment joining and leaving
Energy usage by global IT department
PUE of the Data Center in Dublin
kWhs used by server 172.16.0.8
Requirements
Handling of semantically heterogeneous events
Handling of dynamic environments, with event types changing as sources join and leave
Low cost of rules management
Usability
Precision
[Figure: exact event processing architecture. Labels: User, UI, Event Processing Developer, Translation, EPL Interface and Parser, Single Event Matcher, Pattern Matcher, ERP, BMS.]
Situation of Interest: When a floor is empty and its energy usage for an hour is above threshold w.r.t. budget, then it is an excessive usage
EVENT PROCESSING RULE
INSERT INTO ExcessiveEnergyUsageByFloor
SELECT a.floor as floor
FROM PATTERN [(a=FloorEmptySensor -> every b=DeviceEnergyUsageSensor (a.floor=b.floor))].WIN:TIME(1 hour)
GROUP BY a.floor
WHERE (b.usage) > GetAcceptableThreshold(a.budgetValue)
[Figure labels: Templates Repository, Rules Repository, Execution Repository, CEP Engine.]
Example events:
PC NO XDG26359, Floor: 1st, usage: 3 kWh
VM: vmdgsit01.deri.ie, Floor: 1st, usage: 15 kWh
Non-technical users with natural language needs
Separated from the engine
Rules tied to vocabulary: high cost in case of heterogeneity or change
Exact Event Processing Paradigm
Requirement | How the paradigm addresses it
Semantic Heterogeneity | Does not scale out to highly heterogeneous environments
Dynamic Environment | Does not scale out to highly dynamic environments
Rule Management | High cost under high heterogeneity and dynamicity
Usability | Low
Precision | 100% (typically)
Decoupling in Event Systems
[Figure: an event producer and an event consumer decoupled along three dimensions: space, time, and synchronization.]
Space: producers and consumers don’t know each other
Time: participants don’t need to be actively involved in the interaction at the same time
Synchronization: event producers and consumers don’t get blocked when sending or receiving events
Decoupling in Event Systems
Principle: “Removal of explicit dependencies between participants” (Eugster et al., 2003)
Outcome: Scalability
[Figure: an event producer and an event consumer decoupled in space, time, and synchronization.]
Semantic Coupling
Current event-based systems keep an explicit semantic dependency between participants
Limited scalability in highly heterogeneous and dynamic environments
[Figure: an event producer and an event consumer are decoupled in space, time, and synchronization, but remain coupled semantically (event types, properties, values).]
Current Approaches
Ontology-based (Petrovic et al., 2003), (Zhang & Ye, 2008)…
  Does not “remove explicit dependency”
  Hard to achieve ontology agreement a priori at a large scale of heterogeneity and dynamism
  Medium usability, typically 100% precision
Fuzzy sets (Liu & Jacobsen, 2002)
  Address only numerical event values vs. string values in subscriptions
  Medium usability, high precision
Proposed Approach
Approximate semantic matching of events
Steps:
  Types & properties possible mappings
  Values possible mappings
  Pick best overall mapping
  Post-matching event processing
[Figure: an event (type(s), properties, values) is matched against a subscription (type(s), properties, values).]
Background
Semantic similarity: $f: \mathrm{Terms} \times \mathrm{Terms} \to [0, 1]$, where term1 and term2 are Terms
  $f(term1, term2) = 0$: absolute semantic mismatch
  $f(term1, term2) = 1$: exact match
  E.g. Football Match and Soccer Match are similar
Relatedness: a general case of similarity
  E.g. Football Match and Referee are related but not similar
Thesaurus-based: e.g. WordNet-based
Distributional semantics-based: e.g. Wikipedia ESA
  The more Wikipedia articles two terms both occur in, the more related they are
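A minimal sketch of a relatedness function with the signature above, in the spirit of distributional measures such as ESA. It assumes each term is already represented by a sparse vector of weights over concepts; the concept names and weights in `CONCEPTS` are invented for illustration, and this is not the ESA implementation used in the experiments (real ESA derives its weights from the Wikipedia corpus).

```python
import math

# Hypothetical concept vectors: term -> {concept (article title): weight}.
# All weights are made up for illustration only.
CONCEPTS = {
    "football match": {"Association football": 0.9, "FIFA World Cup": 0.7, "Referee": 0.4},
    "soccer match": {"Association football": 0.8, "FIFA World Cup": 0.6},
    "referee": {"Referee": 0.9, "Association football": 0.3},
    "election": {"Voting": 0.9, "Parliament": 0.5},
}


def relatedness(term1: str, term2: str) -> float:
    """Cosine of the two concept vectors; lies in [0, 1] for non-negative weights."""
    v1 = CONCEPTS.get(term1.lower(), {})
    v2 = CONCEPTS.get(term2.lower(), {})
    dot = sum(w * v2.get(concept, 0.0) for concept, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # unknown term: treat as a mismatch
    return min(1.0, dot / (norm1 * norm2))  # guard against rounding above 1
```

With these toy vectors, "Football Match" scores higher against "Soccer Match" than against "Referee", and scores 0 against "Election" because the two share no concept.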
Proposed Approach Instantiation
Event:
  type: Football Match
  name: 2010 FIFA World Cup Final
  team: Netherlands National Football Team
  team: Spain National Football Team
  location: Johannesburg
  location: FNB stadium
  referee: Howard Webb
Subscription:
  event type "Soccer Match"
  event team "Spain"
  event place "South Africa"
Proposed Approach Instantiation
Step: Types & properties possible mappings
Event properties: type, name, referee, team, location
Subscription properties: type, place, team
[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision) comparing Lin, Jiang & Conrath, Leacock & Chodorow, Lesk, Path, Resnik, Gloss Vector, Wu & Palmer, Wikipedia ESA, and an exact matcher.]
Proposed Approach Instantiation
Step: Types & properties possible mappings
Determine the top m correspondence candidates by $\mathrm{Rank}_{\mathrm{Sim}_{JC}}(p_s, p_e)$, the rank of event property $p_e$ for subscription property $p_s$ under Jiang & Conrath similarity
Measure properties relatedness: $f_P = \min(1,\ m - \mathrm{Rank}_{\mathrm{Sim}_{JC}}(p_s, p_e) + 1) \cdot \mathrm{WikipediaESA}(p_s, p_e)$
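The two steps above can be sketched as follows. This is an illustrative reading, not the authors' code: `sim` and `esa` stand in for the Jiang & Conrath and Wikipedia ESA measures, the toy scores are hypothetical, and clamping the rank gate to 0 for candidates outside the top m is an assumption of the sketch.

```python
from typing import Callable, Dict, List

Score = Callable[[str, str], float]


def property_relatedness(ps: str, event_props: List[str], m: int,
                         sim: Score, esa: Score) -> Dict[str, float]:
    """Return f_P(ps, pe) for each event property pe."""
    # Rank event properties by similarity to ps; rank 1 is the most similar.
    ranked = sorted(event_props, key=lambda pe: sim(ps, pe), reverse=True)
    scores: Dict[str, float] = {}
    for rank, pe in enumerate(ranked, start=1):
        # min(1, m - rank + 1) equals 1 inside the top m; the max(0, ...)
        # clamp for lower ranks is an assumption of this sketch.
        gate = max(0, min(1, m - rank + 1))
        scores[pe] = gate * esa(ps, pe)
    return scores


# Hypothetical scores of each event property against the subscription property "place".
TOY_SIM = {"location": 0.9, "name": 0.3, "team": 0.2, "referee": 0.1, "type": 0.05}
TOY_ESA = {"location": 0.8, "name": 0.4, "team": 0.3, "referee": 0.2, "type": 0.1}

scores = property_relatedness(
    "place", list(TOY_SIM), m=2,
    sim=lambda ps, pe: TOY_SIM[pe],
    esa=lambda ps, pe: TOY_ESA[pe],
)
```

With m = 2, only "location" and "name" keep a non-zero relatedness; the similarity measure prunes the candidates and the relatedness measure scores the survivors.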
Proposed Approach Instantiation
Step: Types & properties possible mappings
Candidate mappings of the subscription properties (type, place, team) to event properties:
  Top 1: (type, location, team), 90%
  Top 2: (type, name, referee), 40%
Proposed Approach Instantiation
Step: Values possible mappings
Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
Subscription values: Soccer Match, South Africa, Spain
Measure values relatedness: $f_V = \mathrm{WikipediaESA}(v_s, v_e)$
Proposed Approach Instantiation
Step: Values possible mappings
Example value relatedness scores:
  Spain National Football Team vs. Spain: 95%
  Netherlands National Football Team vs. Spain: 30%
Proposed Approach Instantiation
Step: Pick best overall mapping
Calculate statements relatedness: $f_{STMT} = f_P(p_s, p_e) \cdot f_V(v_s, v_e)$
Proposed Approach Instantiation
Step: Pick best overall mapping
Determine the correspondent event statement $\mathrm{Corr}_e$ of each subscription statement as the event statement with maximum $f_{STMT}$
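A minimal sketch of this selection step, assuming statements are (property, value) pairs and that property and value relatedness are available as functions. The two team value scores (95% and 30%) are taken from the value-mapping example; every other number, and the names `best_correspondent`, `F_P`, and `F_V`, are hypothetical.

```python
from typing import Callable, List, Tuple

Statement = Tuple[str, str]  # (property, value)
Score = Callable[[str, str], float]


def best_correspondent(sub_stmt: Statement, event_stmts: List[Statement],
                       f_p: Score, f_v: Score) -> Tuple[Statement, float]:
    """Return the event statement maximizing f_STMT = f_P * f_V, with its score."""
    ps, vs = sub_stmt
    scored = [((pe, ve), f_p(ps, pe) * f_v(vs, ve)) for pe, ve in event_stmts]
    return max(scored, key=lambda item: item[1])


# Hypothetical property scores for the subscription property "team".
F_P = {("team", "team"): 1.0, ("team", "referee"): 0.3, ("team", "location"): 0.1}
# Value scores: the first two come from the example, the rest are invented.
F_V = {
    ("Spain", "Spain National Football Team"): 0.95,
    ("Spain", "Netherlands National Football Team"): 0.30,
    ("Spain", "Howard Webb"): 0.05,
    ("Spain", "Johannesburg"): 0.10,
}
event = [
    ("team", "Netherlands National Football Team"),
    ("team", "Spain National Football Team"),
    ("referee", "Howard Webb"),
    ("location", "Johannesburg"),
]
best, score = best_correspondent(
    ("team", "Spain"), event,
    f_p=lambda ps, pe: F_P[(ps, pe)],
    f_v=lambda vs, ve: F_V[(vs, ve)],
)
```

Here the subscription statement (team, Spain) is matched to the event statement (team, Spain National Football Team), since its product of property and value relatedness is the largest.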
Proposed Approach Instantiation
Step: Post-matching event processing
Rank within a window
Complex Event Processing
…
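One possible reading of "rank within a window" is to keep the matched events of the last few seconds and report the k best by matching score. The sketch below assumes exactly that; the class name `WindowRanker`, the time-based window, and the top-k policy are illustrative choices, not details given in the presentation. Timestamps are assumed to arrive in non-decreasing order.

```python
import heapq
from collections import deque
from typing import Deque, List, Tuple

Item = Tuple[float, float, str]  # (timestamp, matching score, event id)


class WindowRanker:
    """Keeps matched events from a sliding time window and ranks them by score."""

    def __init__(self, window_seconds: float, k: int) -> None:
        self.window = window_seconds
        self.k = k
        self.items: Deque[Item] = deque()  # in arrival order

    def add(self, timestamp: float, score: float, event_id: str) -> List[Item]:
        """Insert a matched event and return the current top k, best first."""
        self.items.append((timestamp, score, event_id))
        # Evict events that have fallen out of the window.
        while self.items and self.items[0][0] <= timestamp - self.window:
            self.items.popleft()
        return heapq.nlargest(self.k, self.items, key=lambda item: item[1])


ranker = WindowRanker(window_seconds=10, k=2)
ranker.add(0, 0.9, "e1")
ranker.add(5, 0.4, "e2")
top_at_8 = ranker.add(8, 0.7, "e3")
top_at_12 = ranker.add(12, 0.5, "e4")  # e1 has left the window by now
```

A consumer could then forward only the top-ranked events to later processing stages, which is one way to contain the uncertainty introduced by approximate matching.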
Experiments Overview
Methodology
  Prepare an event set that reflects the required semantic heterogeneity (Wikipedia events)
  Prepare a gold standard set of subscriptions that stress multiple aspects of semantic coupling
  Validate the suitability of semantic approximation from a precision perspective
  Use a different event set and the same subscriptions to validate the low maintenance cost (Freebase events)
Evaluation Criteria
  Average interpolated precision-recall curve on 11 recall points
  Maximal F1 score over the average curve
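The evaluation criteria can be sketched with the standard information-retrieval definitions: interpolated precision at recall level r is the highest precision observed at any recall of at least r, sampled at r = 0.0, 0.1, ..., 1.0. The function names and the toy ranking below are illustrative assumptions; the presentation does not show the evaluation code.

```python
from typing import List

RECALL_LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0


def interpolated_curve(ranked_relevance: List[bool], total_relevant: int) -> List[float]:
    """11-point interpolated precision for one ranked result list (best first)."""
    points = []  # (recall, precision) after each returned result
    hits = 0
    for position, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
        points.append((hits / total_relevant, hits / position))
    curve = []
    for level in RECALL_LEVELS:
        reachable = [p for r, p in points if r >= level]
        curve.append(max(reachable) if reachable else 0.0)
    return curve


def average_curves(curves: List[List[float]]) -> List[float]:
    """Average the per-subscription curves point by point."""
    return [sum(values) / len(values) for values in zip(*curves)]


def max_f1(curve: List[float]) -> float:
    """Largest F1 = 2PR / (P + R) over the 11 points of a curve."""
    scores = [2 * p * r / (p + r) for r, p in zip(RECALL_LEVELS, curve) if p + r > 0]
    return max(scores, default=0.0)


# Toy ranking for one subscription with 2 relevant events.
curve = interpolated_curve([True, False, True, False], total_relevant=2)
```

For the toy ranking, precision is 1.0 up to recall 0.5 and 2/3 beyond it, so the maximal F1 is reached at recall 1.0.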
Experiment 1- Wikipedia Events
Event Set Statistics
Source | Structured Wikipedia infoboxes, DBpedia 31 August 2011
Collection | Triples directly associated with instances of the dbpedia-owl:Event class
Data model | RDF
Total # of events | 20,156
Total # of distinct event types | 4,950
Total # of distinct event properties | 1,459
Total # of distinct event values | 500,717
Total # of triples | 1,502,599
Average # of distinct types per event | 7.42
Average # of distinct properties per event | 30.52
Average # of distinct values per event | 54.16
Average # of triples per event | 64.67
Experiment 1- Wikipedia Events
Example event types: Football Match, Race, Music Festival, Space Mission, Election, 10th-Century BC Conflicts, Academic Conference, Aviation Accident, …
Experiment 1- Subscription Set
Manually created gold standard set of subscriptions
ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | NO | NO | NO
2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 2 | 2 | NO | YES | NO
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 219 | 5 | NO | YES | Syntactic
4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 29 | 6 | YES | YES | Semantic + Syntactic
5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 2 | 2 | YES | YES | Semantic + Syntactic
6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 505 | 603 | NO | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 20 | 123,774 | NO | YES | Background Knowledge
Experiment 1- Subscription Set
Example: subscription 3, "Events taking place in Wembley stadium" (219 relevant events, 5 needed exact rules; event type approximation: NO; event properties approximation: YES; literals and resources approximation: Syntactic)
Subscription:
  event type "Event"
  event place "Wembley Stadium"
SPARQL pattern 1:
  ?event rdf:type dbpedia-owl:Event.
  ?event dbpprop:stadium dbpedia:Wembley_Stadium.
SPARQL pattern 2:
  ?event rdf:type dbpedia-owl:Event.
  ?event dbpedia-owl:location dbpedia:Wembley_Stadium.
…
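To illustrate why the number of needed exact rules grows, the sketch below enumerates one exact pattern per combination of vocabulary alternatives. It assumes that each part of a subscription has a known list of exact alternatives; only the two property terms shown above are used, the remaining patterns of subscription 3 are left out because the slide elides them, and the function name `exact_patterns` is hypothetical.

```python
from itertools import product
from typing import List


def exact_patterns(type_terms: List[str], property_terms: List[str],
                   value_terms: List[str]) -> List[str]:
    """One exact SPARQL-style pattern per combination of alternatives."""
    return [
        f"?event rdf:type {t}. ?event {p} {v}."
        for t, p, v in product(type_terms, property_terms, value_terms)
    ]


patterns = exact_patterns(
    type_terms=["dbpedia-owl:Event"],
    property_terms=["dbpprop:stadium", "dbpedia-owl:location"],
    value_terms=["dbpedia:Wembley_Stadium"],
)
```

The count is the product of the numbers of alternatives, which is why subscriptions that need background knowledge about many values (such as every stadium in the UK) require hundreds or thousands of exact rules.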
Experiment 1- Results
[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision) of the Jiang & Conrath and Wikipedia ESA matchers for two subscriptions: "Events taking place in Wembley stadium" and "Football matches played in the UK".]
[Figure: frequency of semantic similarity or relatedness scores (log scale) for Jiang & Conrath and Wikipedia ESA.]
Need for a hybrid matcher that combines both
Experiment 1- Results
[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision) for the Jiang & Conrath, Wikipedia ESA, and Hybrid matchers.]
Matcher | Jiang & Conrath | Wikipedia ESA | Hybrid
Maximal F1 Score | 70.06% | 44.26% | 75.45%
Recall | 80% | 80% | 90%
Precision | 62.31% | 30.59% | 64.94%
Hybrid matcher outperforms a single similarity or relatedness measure matcher.
Experiment 2- Freebase Event Set
Event Set Statistics
Source | Freebase events dump 1 December 2011, triples current
Collection | Triples directly associated with instances of the "fbase:time.event" class
Data model | RDF
Total # of events | 84,529
Total # of distinct event types | 858
Total # of distinct event properties | 1,242
Total # of distinct event values | 1,199,627
Total # of triples | 1,859,338
Average # of distinct types per event | 3.33
Average # of distinct properties per event | 10.67
Average # of distinct values per event | 21.66
Average # of triples per event | 21.99
Experiment 2- Subscription Set
Same as in Experiment 1.
ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | YES | YES | NO
2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 8 | 2 | YES | YES | NO
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 29 | 5 | NO | YES | NO
4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 0 | - | - | - | -
5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 0 | - | - | - | -
6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 34 | 1,398 | YES | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 2 | 219,600 | YES | YES | Background Knowledge
Experiment 2- Results
Matcher | Jiang & Conrath | Wikipedia ESA | Hybrid
Maximal F1 Score | 44.60% | 70.73% | 76.33%
Recall | 60% | 80% | 80%
Precision | 35.49% | 63.39% | 72.98%
Hybrid matcher gives similar results in Freebase as in DBpedia
[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision) for the Jiang & Conrath, Wikipedia ESA, and Hybrid matchers.]
Conclusions
Criterion | Exact Matcher | Approximate Semantic Matcher
Number of Required Subscriptions | 345,000 | 7
Maximal F1-Score | 100% | 75.89%
Approximate semantic matcher addresses the maintenance cost of subscriptions/rules in heterogeneous and dynamic environments
Approximate semantic matcher is suitable when less than 100% precision is acceptable
A hybrid matcher outperforms a single similarity or relatedness measure matcher.
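As a worked check, the comparison figures appear consistent with the two experiments under two assumptions that are inferred here, not stated in the presentation: the exact-matcher count is the sum of the "# of needed exact rules" columns of both experiments, and the approximate matcher's F1 is the mean of the two hybrid scores. The reported F1 values also follow from the reported precision and recall.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# "# of needed exact rules" per subscription, copied from the two tables.
exp1_rules = [1, 2, 5, 6, 2, 603, 123_774]
exp2_rules = [1, 2, 5, 1_398, 219_600]  # subscriptions 4 and 5 have no relevant Freebase events

total_exact_rules = sum(exp1_rules) + sum(exp2_rules)  # 345,399, roughly 345,000
mean_hybrid_f1 = (75.45 + 76.33) / 2                   # 75.89

# F1 recomputed from the reported precision and recall (Experiment 1, Jiang & Conrath).
jc_f1_percent = 100 * f1(0.6231, 0.80)
```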
Future Work
Need to enhance subscription set for more representativeness.
Approximate semantic matcher generates “uncertain” results, whose impact on further event processing functions such as CEP needs to be studied