
Approximate Semantic Matching of Heterogeneous Events


DESCRIPTION

Event-based systems are loosely coupled in space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, event-based systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain event-based systems. To address semantic coupling within event-based systems, we propose vocabulary-free subscriptions together with approximate semantic matching of events. This paper examines the requirement of event semantic decoupling and discusses approximate semantic event matching and its consequences for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauri-based and distributional semantics-based similarity and relatedness measures. The matcher is evaluated over a structured representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches events with a maximal combined precision-recall F1 score of 75.89% on average in all experiments with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single similarity measure approach.

Hasan S, O'Riain S, Curry E. Approximate Semantic Matching of Heterogeneous Events. In: 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012).
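The F1 figure quoted in the description is a maximum taken over a precision-recall curve. The following is a minimal sketch of that kind of computation, assuming the 11-point interpolated curve and per-subscription averaging named in the slide text. The function names, the boolean ranked-hit input format, and the requirement that `total_relevant` is positive are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Sequence

RECALL_LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0


def interpolated_curve(ranked_hits: Sequence[bool], total_relevant: int) -> List[float]:
    """Interpolated precision at the 11 recall levels for one subscription.

    ranked_hits[k] says whether the k-th ranked event is a true match.
    Interpolated precision at level r is the best precision observed at
    any recall >= r (0.0 if that recall is never reached).
    Assumes total_relevant > 0.
    """
    points = []  # (recall, precision) after each ranked result
    hits = 0
    for k, is_hit in enumerate(ranked_hits, start=1):
        hits += is_hit
        points.append((hits / total_relevant, hits / k))
    return [
        max((p for r, p in points if r >= level - 1e-12), default=0.0)
        for level in RECALL_LEVELS
    ]


def average_curve(curves: Sequence[Sequence[float]]) -> List[float]:
    """Average several 11-point curves level by level."""
    return [sum(column) / len(curves) for column in zip(*curves)]


def max_f1(curve: Sequence[float]) -> float:
    """Largest F1 = 2PR / (P + R) over the points of a curve."""
    scores = [
        2 * p * r / (p + r)
        for r, p in zip(RECALL_LEVELS, curve)
        if p + r > 0
    ]
    return max(scores, default=0.0)
```

In this sketch, one curve would be computed per subscription, the curves averaged with `average_curve`, and `max_f1` applied to the averaged curve.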

Text of Approximate Semantic Matching of Heterogeneous Events

1. Approximate Semantic Matching of Heterogeneous Events
Souleiman Hasan, Sean O'Riain, Edward Curry
Digital Enterprise Research Institute (DERI), National University of Ireland, Galway (NUIG), www.deri.ie
In proceedings of DEBS 2012, Berlin, Germany
http://www.StefanDecker.org/ Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

2. Further Reading
Hasan S, O'Riain S, Curry E. Approximate Semantic Matching of Heterogeneous Events. In: 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012)
www.edwardcurry.org

3. Outline
  • Introduction
      • Smart Environments
      • Motivational Scenario
      • Related Work
  • Proposal
      • Approximate Semantic Matching
  • Experiments
      • Wikipedia
      • Freebase
  • Conclusions
  • Q&A

4. Smart Environments
  • Smart Homes, Grids, Cities
  • Internet-of-Things, Sensor Web
      • By 2020, 50 billion devices connected to mobile networks (OECD, 2012)
  • Non-technical users
  • High heterogeneity
  • Trend for dynamic data-driven decision making
Event/Situation of Interest: Soccer match played in Berlin
Event/Situation of Interest: New free parking space near me

5. Motivational Scenario - Enterprise
  • Roles: CIO, CSO, Helpdesk, Maintenance Personnel
  • Situations of interest:
      • Company CO2 emissions performance
      • Energy usage by global IT department
      • PUE of the Data Center in Dublin
      • kWhs used by server 172.16.0.8
  • Various terms used: energy consumption, energy usage; room, space, zone
  • Dynamic environments: new events from equipment joining and leaving
  • Event sources: Building, Data Center

6. Requirements
  • Handling of semantically heterogeneous events
  • Handling of dynamic environments in which sources, and their event types, join and leave
  • Low cost of rules management
  • Usability
  • Precision

7. 
Event Processing
Situation of Interest: When a floor is empty and its energy usage for an hour is above threshold w.r.t. budget, then it is an excessive usage.
  • Non-technical users with natural language needs
  • Translation from user to developer
  • UI separated from the engine
  • Rules tied to EPL vocabulary
  • High cost in case of heterogeneity or change
CEP engine components: Rule Interface and Parser, Rules Repository, Execution, Pattern Matcher, Single Event Matcher, Templates Repository
Rule:
    INSERT INTO ExcessiveEnergyUsageByFloor
    SELECT a.floor as floor
    FROM PATTERN [(a=FloorEmptySensor -> every b=DeviceEnergyUsageSensor (a.floor=b.floor))] .WIN:TIME(1 hour)
    GROUP BY a.floor
    WHERE (b.usage) > GetAcceptableThreshold(a.budgetValue)
Example events:
  • ERP: PC NO XDG26359, Floor: 1st, usage: 3 kWh
  • BMS: VM: vmdgsit01.deri.ie, Floor: 1st, usage: 15 kWh

8. Exact Event Processing Paradigm
Requirement: addressing by the paradigm
  • Semantic Heterogeneity: does not scale out to highly heterogeneous environments
  • Dynamic Environment: does not scale out to highly dynamic environments
  • Rule Management: high cost on large heterogeneity and dynamicity
  • Usability: low
  • Precision: 100% (typically)

9. Decoupling in Event Systems
  • Space: producers and consumers don't know each other
  • Time: participants don't need to be actively involved in the interaction at the same time
  • Synchronization: event producers and consumers don't get blocked to send/receive events
Diagram: Event Producer and Event Consumer, decoupled in Space, Time and Synchronization

10. Decoupling in Event Systems
  • Principle: removal of explicit dependencies between participants (Eugster et al., 2003)
  • Outcome: scalability
Diagram: Event Producer and Event Consumer, decoupled in Space, Time and Synchronization

11. 
Semantic Coupling
  • Current event-based systems keep explicit semantic dependency between participants
  • Limited scalability in highly heterogeneous and dynamic environments
Diagram: Event Producer and Event Consumer, decoupled in Space, Time and Synchronization but coupled in Semantics (event types, properties, values)

12. Current Approaches
  • Ontology-based (Petrovic et al., 2003), (Zhang & Ye, 2008)
      • Does not remove explicit dependency
      • Hard to achieve ontology agreement a priori at large scale of heterogeneity and dynamism
      • Medium usability, 100% precision typically
  • Fuzzy sets (Liu & Jacobsen, 2002)
      • Address only event numerical values vs. string values subscriptions
      • Medium usability, high precision

13. Proposed Approach
  • Approximate semantic matching of events
Steps: (1) types & properties possible mappings; (2) values possible mappings; (3) pick best overall mapping; (4) post-matching event processing
Diagram: Event (Type(s), Properties, Values) mapped to Subscription (Type(s), Properties, Values)

14. Background
  • Semantic Similarity
      • f: Terms × Terms → [0,1]
      • term1, term2 are Terms
      • f(term1, term2) = 0: absolute semantic mismatch
      • f(term1, term2) = 1: exact match
      • E.g. Football Match and Soccer Match are similar
  • Relatedness: a general case of similarity
      • E.g. Football Match and Referee are related but not similar
  • Thesaurus-based: e.g. WordNet-based
  • Distributional semantics-based: e.g. Wikipedia ESA
      • The more Wikipedia articles two terms occur in, the more related they are

15. 
Proposed Approach Instantiation
Event:
  • type: Football Match
  • name: 2010 FIFA World Cup Final
  • referee: Howard Webb
  • team: Spain National Football Team
  • team: Netherlands National Football Team
  • location: Johannesburg FNB stadium
Subscription:
  • type: Soccer Match
  • team: Spain
  • place: South Africa

16. Proposed Approach Instantiation
Event properties: type, name, referee, team, location
Subscription properties: type, place, team
Figure: precision (y-axis, 0 to 1) against recall (x-axis, 0 to 1) for the measures Lin, Jiang&Conrath, Leacock&Chodorow, Lesk, Path, Resnik and Gloss Vector

17. Proposed Approach Instantiation
  • Determine top m correspondence candidates: RankSim_Jiang&Conrath(ps, pe)
  • Measure properties relatedness: fP = Min(1, m - RankSim_Jiang&Conrath(ps, pe) + 1) * WikipediaESA(ps, pe)

18. Proposed Approach Instantiation
Candidate property mappings (event property / subscription property):
  • Top 1 (90%): type / type, location / place, team / team
  • Top 2 (40%): type / type, name / place, referee / team

19. 
Proposed Approach Instantiation
Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg FNB stadium, Netherlands National Football Team
Subscription values: Soccer Match, South Africa, Spain
  • Measure values relatedness: fV = WikipediaESA(vs, ve)

20. Proposed Approach Instantiation
Candidate value mappings (event value / subscription value):
  • Spain National Football Team / Spain: 95%
  • Netherlands National Football Team / Spain: 30%

21. Proposed Approach Instantiation
  • Calculate statements relatedness: fSTMT = fP(ps, pe) * fV(vs, ve)

22. Proposed Approach Instantiation
  • Determine correspondent event statement Corr_e by Max fSTMT

23. 
Proposed Approach Instantiation
Post-matching event processing:
  • Rank within a window
  • Complex Event Processing

24. Experiments Overview
  • Methodology
      • Prepare an event set that reflects the required semantic heterogeneity (Wikipedia events)
      • Prepare a gold standard set of subscriptions that stress multiple aspects of semantic coupling
      • Validate suitability of semantic approximation from a precision perspective
      • Use a different event set and the same subscriptions to validate low maintenance cost (Freebase events)
  • Evaluation Criteria
      • Average interpolated Precision-Recall Curve on 11 recall points
      • Maximal F1 Score over the average curve

25. Experiment 1- Wikipedia Event
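
The scoring steps given on slides 17 to 22 (fP, fV, fSTMT, then the maximum) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the two lookup tables stand in for the Jiang&Conrath ranking measure and Wikipedia ESA, their scores are made up for the football example, and clamping the rank gate at 0 is an added assumption because the slide shows only Min.

```python
from typing import Callable, Dict, List, Tuple

Measure = Callable[[str, str], float]
Statement = Tuple[str, str]  # (property, value)


def table_measure(scores: Dict[Tuple[str, str], float]) -> Measure:
    """Toy stand-in for a real measure: table lookup, 1.0 for identical terms."""
    def measure(a: str, b: str) -> float:
        return 1.0 if a == b else scores.get((a, b), 0.0)
    return measure


def property_score(ps: str, pe: str, event_props: List[str],
                   sim: Measure, rel: Measure, m: int) -> float:
    """fP = Min(1, m - rank + 1) * rel(ps, pe), with rank taken from sim."""
    ranked = sorted(event_props, key=lambda p: sim(ps, p), reverse=True)
    rank = ranked.index(pe) + 1            # 1-based rank of pe for ps
    gate = max(0, min(1, m - rank + 1))    # max(0, ...) is an added assumption
    return gate * rel(ps, pe)


def best_correspondence(sub_stmt: Statement, event: List[Statement],
                        sim: Measure, rel: Measure,
                        m: int = 2) -> Tuple[Statement, float]:
    """Return the event statement with maximal fSTMT = fP * fV, and its score."""
    ps, vs = sub_stmt
    event_props = list(dict.fromkeys(p for p, _ in event))  # unique, in order
    scored = [
        ((pe, ve), property_score(ps, pe, event_props, sim, rel, m) * rel(vs, ve))
        for pe, ve in event
    ]
    return max(scored, key=lambda item: item[1])


# Hypothetical scores; only the 90%, 40%, 95% and 30% figures echo the slides.
SIM = table_measure({("place", "location"): 0.9, ("place", "name"): 0.4})
REL = table_measure({
    ("place", "location"): 0.9,
    ("place", "name"): 0.4,
    ("Soccer Match", "Football Match"): 0.9,
    ("Spain", "Spain National Football Team"): 0.95,
    ("Spain", "Netherlands National Football Team"): 0.30,
    ("South Africa", "Johannesburg FNB stadium"): 0.8,
})

EVENT: List[Statement] = [
    ("type", "Football Match"),
    ("name", "2010 FIFA World Cup Final"),
    ("referee", "Howard Webb"),
    ("team", "Spain National Football Team"),
    ("team", "Netherlands National Football Team"),
    ("location", "Johannesburg FNB stadium"),
]
SUBSCRIPTION: List[Statement] = [
    ("type", "Soccer Match"),
    ("team", "Spain"),
    ("place", "South Africa"),
]

for stmt in SUBSCRIPTION:
    print(stmt, "->", best_correspondence(stmt, EVENT, SIM, REL))
```

The rank gate keeps only the top m property candidates from the first measure, and the second measure supplies the actual score, which is one way to read the hybrid combination of a thesaurus-based and a distributional measure described in the slides.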
