ICT, STREP
FERARI ICT-FP7-619491
Flexible Event pRocessing for big dAta aRchItectures
Collaborative Project
D 4.3
Automatic Generation of Annotated Event Processing Network from the
Goal-Driven Model
01.02.2016 – 31.01.2017 (final period)
Contractual Date of Delivery: 31.01.2017
Actual Date of Delivery: 31.01.2017
Author(s): Fabiana Fournier (IBM) and Inna Skarbovsky (IBM)
Institution: IBM
Workpackage: WP4
Security: PU
Nature: R
Total number of pages: 23
Project funded by the European Community under the Information and Communication Technologies Programme Contract ICT-FP7-619491
Project coordinator name: Michael Mock
Project coordinator organisation name: Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
URL: http://www.iais.fraunhofer.de
Revision: 1
Abstract:
The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures)
project is to pave the way for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling business users to express complex analytics tasks through a high-level declarative language that supports distributed
complex event processing as an integral part of the system architecture. This report finalizes the work carried out for the development of a complex event processing
model and methodology suitable for specification, implementation, and maintenance of event-driven applications in the Big Data architecture.
In FERARI, we introduced a model driven approach based on a set of diagrams and tables that can be automatically translated into an event processing network and
eventually into a running application. In this report we detail the construction of the platform independent model and its transformation into a platform specific model, and
exemplify both with the mobile phone fraud use case of the project. In addition, we extend our model and methodology to support the optimization and learning
frameworks developed in the course of the project.
Revision history
Administration Status
Project acronym: FERARI ID: ICT-FP7-619491
Document identifier: D 4.3 Automatic Generation of annotated event processing network from the goal-driven method (03.02.2016 – 31.01.2017)
Leading Partner: IBM
Report version: 1 Report preparation date: 31.01.2017 Classification: PU
Nature: REPORT
Author(s) and contributors: Fabiana Fournier (IBM) and Inna Skarbovsky (IBM)
Status: - Plan
- Draft
- Working
- Final
x Submitted
Copyright
This report is © FERARI Consortium 2017. Its duplication is restricted to the personal use
within the consortium and the European Commission. www.ferari-project.eu
Document History
Version | Date | Author | Change Description
0.1 | 15/12/2016 | Fabiana Fournier (IBM) | First draft
0.2 | 1/1/2017 | Fabiana Fournier (IBM) | Second draft including section 9
0.3 | 15/1/2017 | Fabiana Fournier (IBM) | First complete version
0.4 | 17/1/2017 | Fabiana Fournier (IBM) | Inclusion of abstract
0.5 | 25/1/2017 | Fabiana Fournier (IBM) | Updates per internal review
1.0 | 30/1/2017 | Fabiana Fournier (IBM) | Final fixes and cleanup
Table of Contents
1 Introduction
1.1 Purpose and scope of the document
1.2 Relationship with other documents
1.3 Illustrative Example - The Mobile phone fraud use case
2 Recap on the event model (TEM)
3 Transform the CIM to the platform independent model (PIM)
4 Generate the code and create the platform specific model (PSM)
5 TEM extensions for the sake of the optimizer and learning framework
5.1 Extensions principles
5.2 TEM extensions for the FERARI optimizer and learning framework
5.2.1 EPN Optimization table
5.2.2 EPA Optimization table
5.2.3 GM operator table
5.2.4 GML operator table
5.2.5 Generated JSON file
5.3 Extensions to the TEM methodology
5.4 Summary of TEM extensions for optimization and learning frameworks
6 Summary
7 References
List of Tables
Table 1: Expensive calls EDT
Table 2: calls_cost_sum<Expensive calls> computation table
Table 3: call_start_dates<Expensive calls> computation table
Table 4: Expensive calls policy table
Table 5: EPN Optimization table
Table 6: EPA Optimization table
Table 7: Introducing the GM operator
Table 8: GM operator table
Table 9: Introducing the GML operator
Table 10: GML operator table
List of Figures
Figure 1: EPA generated for the Expensive calls situation
Figure 2: Generated EPN in the mobile fraud use case
Figure 3: JSON snippet for Expensive calls EPA
Figure 4: Generation of code with PROTON
Figure 5: JSON snippet showing optimization parameters
Acronyms
CEP Complex Event Processing
CIM Computation Independent Model
EDT Event Derivation Table
EPA Event Processing Agent
EPN Event Processing Network
FERARI Flexible Event pRocessing for big dAta aRchItectures
JSON JavaScript Object Notation
PIM Platform Independent Model
PROTON IBM PROactive Technology Online
PSM Platform Specific Model
TDM The Decision Model
TEM The Event Model
WP Work Package
FERARI Deliverable D4.3
Automatic Generation of Annotated Event
Processing Network from the Goal-Driven
Method
Fabiana Fournier (IBM) and Inna Skarbovsky (IBM)
1 Introduction
1.1 Purpose and scope of the document
This report presents the final deliverable of work package 4 (WP4) in the scope of the FERARI (Flexible
Event pRocessing for big dAta aRchItectures) project. D4.3 deals with the “automatic generation of
annotated event processing network from the goal-driven model: The deliverable describes the annotated
event processing network that is created from the models and methodologies defined in deliverable 4.2”. In
other words, D4.3 relies on and complements D4.2.
D4.2 presents The Event Model (TEM), a new way to model, develop, validate, maintain, and implement
event-driven applications. TEM is based on a set of well-defined principles and building blocks and does not
require substantial programming skills, which makes it suitable for business users and for the project goals. A
methodology is also described as part of that report. D4.2 covers the Computation Independent Model
(CIM) layer in our model driven approach for event processing applications. D4.3 covers the remaining two
layers, i.e., the Platform Independent Model (PIM) and the Platform Specific Model (PSM).
In addition, and as specific extension for FERARI, in the scope of D4.3 we extend our model and
methodology to support the optimization and learning frameworks developed so far in the project.
We exemplify our model using the mobile fraud use case in the project.
This report is structured as follows: Section 2 recaps the main takeaways from deliverable 4.2. Section 3
describes the transformation of the TEM model into the platform independent model represented by an
event processing network, whereas Section 4 details the code generation. Section 5 describes the extensions
to our model and methodology that stem from the optimization and learning frameworks developed in
FERARI. We conclude the report with a summary.
1.2 Relationship with other documents
Deliverable 4.3 is a direct continuation of deliverable 4.2, “Goal driven model and methodology for
specification of event processing applications”. Because one of the major extensions to the event model
targets the optimization and learning frameworks developed in FERARI, D4.3
is also related to D5.2 “Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity”,
D5.3 “Implementation of Algorithms for Robust Distributed Stream Monitoring and Supporting Data
Integrity”, and D2.3 “Final Prototype” (the reports for these deliverables can be found in footnote 1).
1.3 Illustrative Example - The Mobile phone fraud use case
We illustrate our work during the third year of the project using the mobile fraud use case previously analyzed
and implemented in the scope of D4.1 (first year of the project) and exemplified in TEM in the scope of
D4.2 (second year of the project). The goal is to identify users who use a network service without the
intention to pay for that use. For the sake of completeness, we describe below the situations we are looking
for in this scenario:
• A long call (lasting more than 40 minutes) to a premium service (long distance call) is made during
night hours, i.e., from 7 PM to 7 AM the next day (Long call at night).
• Same as the previous one, but this time we are looking for at least three of these long distance calls at
night per calling number in a day (Frequent long calls at night).
• Multiple long distance calls per calling number (more than 9) that last more than 60 minutes in a day
(Frequent long calls).
• Same as the previous one, but each call occurrence lasts at least 60 minutes (Frequent each long
call).
• High usage (>100) of a line for long distance calls every six hours (Expensive
calls).
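As an illustration of the first two situations, the following minimal Python sketch expresses them as plain predicates. It is not part of the FERARI implementation: the function names and the tuple layout of a call are hypothetical, and the sketch assumes that "during night hours" is judged by the start time of the call.

```python
from collections import Counter
from datetime import datetime


def is_long_call_at_night(start: datetime, duration_minutes: float, is_premium: bool) -> bool:
    """Long call at night: more than 40 minutes, to a premium service,
    started between 7 PM and 7 AM (assumption: the start time decides)."""
    at_night = start.hour >= 19 or start.hour < 7
    return is_premium and duration_minutes > 40 and at_night


def frequent_long_calls_at_night(calls):
    """Return the calling numbers with at least three long calls at night.

    `calls` holds the calls of one day as tuples
    (calling_number, start, duration_minutes, is_premium).
    """
    counts = Counter(
        number
        for number, start, duration, premium in calls
        if is_long_call_at_night(start, duration, premium)
    )
    return {number for number, n in counts.items() if n >= 3}


night = datetime(2016, 5, 1, 23, 0)
noon = datetime(2016, 5, 1, 12, 0)
day_calls = [("A", night, 45, True)] * 3 + [("B", night, 45, True), ("B", noon, 90, True)]
suspects = frequent_long_calls_at_night(day_calls)
```

In this example only calling number A reaches three qualifying calls; B has one qualifying call and one daytime call.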
We also note that our complex event processing (CEP) terminology follows the semantics presented
in Etzion and Niblett’s book [1]. For the main constructs and terms please refer to D4.1 (see footnote 2).
2 Recap on the event model (TEM)
D4.2 (see footnote 3) describes our work during the second year of the project. The deliverable introduces TEM as
a way to model, develop, validate, maintain, and implement event-driven applications. The main idea
behind our model driven approach is to express the derivation logic through a collection of related
normalized tables. These tables can be validated and transformed into code. This idea has already
been proven successful in the domain of business rules by The Decision Model (TDM) [2], and therefore we
believe that business users will be receptive towards TEM.
In addition, D4.2 introduces a model driven approach for event driven applications based on TEM that
includes both functional and non-functional requirements, as briefly described below for the sake of
completeness.
1. Construct the computation independent model (CIM)
2. Transform the CIM to the platform independent model (PIM)
3. Generate the code and create the platform specific model (PSM).
4. Operate the application and support modifications.
D4.2 details the first step of the methodology, i.e., the construction of the CIM (equivalent to the TEM
tables), and the fourth step, which is a direct outcome of working with a model driven approach. A TEM
model is composed of:
1. A diagram for each situation.
2. A set of logic tables: An event derivation table (EDT), a computation table, and a policy table for
each node in the diagram.
3. A set of glossary tables: a concepts lexicon table, a fact types table, actors table, and an IT elements
table.
In the following sections, we detail the second and third steps in our methodology to complete the entire
process. The input to step 2 is a valid CIM model ready for the transformation to the PIM and PSM.
3 Transform the CIM to the platform independent model (PIM)
The platform independent model (PIM) is a generic representation of an event processing application. The
CIM might omit some details that can be implicitly inferred or specified by IT people at a later phase.
Examples of omitted details are: assignment of fact types associated with derived events whose values are
copied, and the physical realization of data elements and the way they are fetched (part of original event,
enrichment of events, or query of data stores). The implementation details are beyond the scope of this report.
We adopted the approach of transforming the CIM to a PIM rather than transforming it directly to a PSM,
since the aim of TEM is to be generic and fit multiple implementations. For the PIM, we use the
model described in [1], which is based on the notions of event processing network (EPN) and event
processing agents (EPAs). It is a comprehensive model that can be mapped to many specific event processing
languages. Algorithm 1 below depicts how to generate an EPN out of a TEM model.
Algorithm 1: EPN generation
Procedure CreateEPN
  foreach Event Derivation Table (EDT)
    for i = 1 to EDT.numberOfRows do
      for j = 1 to EDT.numberOfPatternConditions do
        EPAij := CreateEPA (EDT, i, j)
        RowEPAi := AND (RowEPAi, EPAij) // AND relationship among EPAs in the same row
      endfor
      EDT_EPN := OR (EDT_EPN, RowEPAi) // OR relationship among EPAs in different rows
    endfor
    EPN := Append (EPN, EDT_EPN) // union of sub-EPNs of EDTs into an EPN
  endforeach
  return EPN

Procedure CreateEPA (EDT, i, j)
  EPA.inputEvents := get EDT.columnHeadings.event_types
  EPA.derivedEvent := EDT.name
  EPA.temporalContext := EDT.row[i].whenConditions
  EPA.segmentationContext := EDT.row[i].partitionByCondition
  EPA.filterOnEventExpression := EDT.row[i].filterOnEventConditions
  EPA.pattern := EDT.row[i].PatternCondition[j]
  EPA.filterOnPatternExpression := EDT.row[i].FilterOnPatternConditions
  EPA.policyValues := CalculatePolicyValues (EDT.name, i)
  foreach EPA.derivedEvent.attribute
    derivedExpressionValue := CalculateComputationValues (EPA.derivedEvent.attribute, i)
    if derivedExpressionValue != null then
      EPA.derivedEvent.attribute.expressionValue := derivedExpressionValue // values for computed attributes
    else
      EPA.derivedEvent.attribute.expressionValue := inputEvent.attribute // implicit inference by copying input event attribute values
    endif
  endforeach
  return EPA

Procedure CalculateComputationValues (attributeName, i)
  computationExpression := Find (ComputationTable.attributeName.rowInEDT, i) // attribute computed expression or value associated with row i of the EDT in the computation table
  return computationExpression

Procedure CalculatePolicyValues (derivedEvent, i)
  policyDerivedEvent := Find (PolicyTable, derivedEvent, i) // policy values associated with row i of the corresponding EDT in the policy table
  if policyDerivedEvent.values != null then
    policies.evaluationPolicy := policyDerivedEvent.evaluation
    policies.cardinalityPolicy := policyDerivedEvent.cardinality
    policies.repeatedPolicy := policyDerivedEvent.repeated
    policies.consumptionPolicy := policyDerivedEvent.consumption
  else
    policies := defaultValues
  endif
  return policies
Informally, the construction of the EPN proceeds as follows:
• Each row in an EDT generates one EPA for each operator in the row. If a row yields more than one EPA,
an additional AND EPA is added to join them.
• An OR EPA is added among the EPAs generated from the different rows of the EDT.
• Each row in each computation table derives a computed value in the event payload.
• Each row in each policy table derives the policy assignments of the corresponding EPA.
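The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the TEM compiler: the class names, the dictionary-based EPA representation, the lookup keys, and DEFAULT_POLICIES are all hypothetical, rows are indexed from 0 rather than 1, and every EDT is assumed to have at least one row with at least one pattern condition.

```python
from dataclasses import dataclass

# Hypothetical default policy values, used when the policy table has no entry.
DEFAULT_POLICIES = {"evaluation": "deferred"}


@dataclass
class EDTRow:
    when: str
    partition_by: str
    filter_on_event: str
    patterns: list          # one entry per pattern condition in the row
    filter_on_pattern: str


@dataclass
class EDT:
    name: str               # also the name of the derived event
    input_events: list
    derived_attributes: list
    rows: list


def create_epa(edt, i, j, computations, policies):
    """Build one EPA (as a dict) for pattern condition j of row i."""
    row = edt.rows[i]
    expressions = {}
    for attr in edt.derived_attributes:
        computed = computations.get((edt.name, attr, i))
        # Use the computed expression if one exists, else copy the input attribute.
        expressions[attr] = computed if computed is not None else f"input.{attr}"
    return {
        "derivedEvent": edt.name,
        "inputEvents": list(edt.input_events),
        "temporalContext": row.when,
        "segmentationContext": row.partition_by,
        "filterOnEventExpression": row.filter_on_event,
        "pattern": row.patterns[j],
        "filterOnPatternExpression": row.filter_on_pattern,
        "policies": policies.get((edt.name, i), DEFAULT_POLICIES),
        "expressions": expressions,
    }


def create_epn(edts, computations, policies):
    """Return one sub-EPN per EDT: AND within a row, OR across rows."""
    epn = []
    for edt in edts:
        row_nodes = []
        for i, row in enumerate(edt.rows):
            epas = [create_epa(edt, i, j, computations, policies)
                    for j in range(len(row.patterns))]
            row_nodes.append(epas[0] if len(epas) == 1 else {"op": "AND", "children": epas})
        epn.append(row_nodes[0] if len(row_nodes) == 1 else {"op": "OR", "children": row_nodes})
    return epn


expensive = EDT(
    name="ExpensiveCalls",
    input_events=["CDR"],
    derived_attributes=["calls_cost_sum", "calling_number"],
    rows=[EDTRow("every 6 hours", "calling_number",
                 "premium service AND call_direction = 1",
                 ["SUM(total_call_charge_amount)"], "SUM > 100")],
)
computations = {("ExpensiveCalls", "calls_cost_sum", 0): "countSum.SUM"}
policies = {("ExpensiveCalls", 0): {"evaluation": "immediate"}}
epn = create_epn([expensive], computations, policies)
```

Because the Expensive calls EDT has a single row with a single pattern, no AND or OR node is needed and the sub-EPN is the EPA itself.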
For example, the algorithm generates the EPA depicted in Figure 1 for the Expensive calls
situation from the EDT shown in Table 1 along with its corresponding computation tables (Table 2 and
Table 3) and policy table (Table 4). For a complete explanation of these tables refer to D4.2 (see footnote 3).
Figure 1: EPA generated for the Expensive calls situation
[Figure 1 content: Event Processing Agent with input event CDR, within context; filtering: other_party_tel_number is member of premium services AND call_direction = 1; pattern: SUM with countSum(total_call_charge_amount) > 100; deriving ExpensiveCalls with callsCostSum: countSum.SUM and call_start_dates: {call_start_date<CDR>}]

Table 1: Expensive calls EDT
Expensive calls Logic
Columns: Row #; When (Expression, Start, End); Partition by; Pattern; Filter on event; Filter on pattern
Row 1: When: every 6 hours, first CDR; Partition by: same Calling number; other_party_tel_number<CDR>: member of premium services; call_direction<CDR>: = 1; SUM(total_call_charge_amount<CDR>): > 100

Table 2: calls_cost_sum<Expensive calls> computation table
calls_cost_sum<Expensive calls> Computation
Row # | Expression | Row in Event derivation Table
1 | countSum.SUM | 1

Table 3: call_start_dates<Expensive calls> computation table
call_start_dates<Expensive calls> Computation
Row # | Expression | Row in Event derivation Table
1 | {call_start_date<CDR>} | 1

Table 4: Expensive calls policy table
Expensive calls Policy
Row # | Evaluation | Cardinality | Repeated | Consumption | Row in Event derivation Table
1 | immediate | | | | 1

In our illustrative example of the mobile phone fraud use case, the algorithm generates the following EPN
(Figure 2).
Figure 2: Generated EPN in the mobile fraud use case
4 Generate the code and create the platform specific model (PSM)
This phase is a mapping between the PIM and PSM. Assuming that all missing details are obtained at the
PIM level, this is a mere functional transformation.
One way to represent an EPN is through a JSON (JavaScript Object Notation) file. This file can then be
provided to PROTON (see footnote 4) as the configuration of the run-time engine. At execution, PROTON’s run-time engine
accesses the JSON file, loads and parses all the definitions, creates a thread for each input and output adapter,
starts listening for events arriving from the input adapters (representing producers), and forwards events
to the output adapters (representing consumers).
For the EPA example of Figure 1, the corresponding JSON is shown in Figure 3.
[Figure 2 content: CDRs pass through FILTER, COUNT, SUM, and AND agents to derive the situations LongCallAtNight, FrequentLongCallsAtNight, FrequentLongCalls, FrequentEachLongCall, and ExpensiveCalls]
Figure 3: JSON snippet for Expensive calls EPA
{
  "name": "ExpensiveCallsEPA",
  "createdDate": "Mon Mar 09 2015",
  "epaType": "Aggregate",
  "context": "CompositeExpensiveCalls",
  "inputEvents": [
    {
      "name": "CDR",
      "filterExpression": "(IndexOf(CDR.other_party_tel_number,'960') == 1 || IndexOf(CDR.other_party_tel_number,'960') == 2) && EqualsIgnoreCase(CDR.call_direction,'O')",
      "consumptionPolicy": "Reuse",
      "instanceSelectionPolicy": "First"
    }
  ],
  "computedVariables": [
    {
      "name": "SUM",
      "aggregationType": "Sum",
      "CDR": "CDR.total_call_charge_amount"
    }
  ],
  "assertion": "SUM > 100",
  "evaluationPolicy": "Immediate",
  "cardinalityPolicy": "Single",
  "internalSegmentation": [],
  "derivedEvents": [
    {
      "name": "ExpensiveCalls",
      "reportParticipants": false,
      "expressions": {
        "Duration": "0",
        "calling_number": "context.CallingNumberSegmentation",
        "CallsCostSum": "SUM",
        "call_start_dates": "ArrayConvert(CDR.call_start_date)"
      }
    }
  ]
}
Figure 4 sketches how the JSON representation of the EPN is imported into PROTON to be applied during
run-time.
Figure 4: Generation of code with PROTON
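To make the logic encoded in the Expensive calls definition of Figure 3 more concrete, the following Python sketch computes the same aggregation over the CDRs of a single six-hour window. It is an illustration only and does not reflect how PROTON evaluates the definition: the function name, the dictionary layout of a CDR, and the premium-number predicate are assumptions, and the result is computed once at the end of the window rather than immediately when the threshold is crossed.

```python
from collections import defaultdict


def expensive_calls(cdrs, is_premium, threshold=100.0):
    """Derive ExpensiveCalls events from the CDRs of one six-hour window."""
    totals = defaultdict(float)
    start_dates = defaultdict(list)
    for cdr in cdrs:
        # Filter on event: outgoing call to a premium service number.
        if cdr["call_direction"] == "O" and is_premium(cdr["other_party_tel_number"]):
            number = cdr["calling_number"]  # segmentation by calling number
            totals[number] += cdr["total_call_charge_amount"]
            start_dates[number].append(cdr["call_start_date"])
    # Assertion on the aggregate: SUM > threshold.
    return [
        {
            "name": "ExpensiveCalls",
            "calling_number": number,
            "CallsCostSum": total,
            "call_start_dates": start_dates[number],
        }
        for number, total in totals.items()
        if total > threshold
    ]


def make_cdr(calling_number, other_party, amount, start):
    return {
        "calling_number": calling_number,
        "other_party_tel_number": other_party,
        "call_direction": "O",
        "total_call_charge_amount": amount,
        "call_start_date": start,
    }


window = [
    make_cdr("A", "9601234", 60.0, "2015-03-09T20:00"),
    make_cdr("A", "9605678", 50.0, "2015-03-09T21:30"),
    make_cdr("B", "9601234", 80.0, "2015-03-09T22:00"),
    make_cdr("B", "5551234", 90.0, "2015-03-09T23:00"),
]
# Hypothetical premium check: numbers starting with the prefix 960.
events = expensive_calls(window, lambda number: number.startswith("960"))
```

Here only calling number A exceeds the threshold, with a total of 110 over two premium calls; the second call of B is filtered out because it does not target a premium number.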
5 TEM extensions for the sake of the optimizer and learning framework
5.1 Extensions principles
TEM is primarily intended for business users as a means to define the business logic of an event processing
application. The requested extensions, on the other hand, are intended for IT people/developers as part of
the FERARI platform capabilities. Our goal was to extend the model we had to include all input required for
both the optimizer and the learning framework in such a way that we don’t “break” any of our TEM
principles. Therefore the extensions should satisfy two requirements:
1. Have the same “look and feel” as before in order to get one complete and coherent model.
2. At the same time, be an additional, optional, and independent part of both the model and
methodology. This part comes after the business logic of the application is specified by a business
user and it is intended for optimization purposes.
Therefore, in order to meet the above requirements we had to keep the following characteristics of TEM:
• Model driven approach for event driven applications.
• Tabular representation of the logical artifacts.
• Normalized tables.
• Same structure of tables, that is: the first row in the table designates the table name according to a
pre-defined syntax; the second row designates the table columns; the third row onwards
designate the different conditions that hold for a specific instance of the table. Any TEM table
consists of a collection of conditions in disjunctive normal form, namely all conditions
in a single row have a conjunction relationship among them (AND), while the relationship among
multiple rows is a disjunction (OR).
• One single logical artifact (EDT) for each derived event.
• TEM extensions are targeted at technical people, and as such, they should be decoupled from and
independent of the TEM core tables and methodology.
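The disjunctive normal form of a TEM table can be illustrated with a few lines of Python. This is a minimal sketch, assuming that each condition is modelled as a predicate over a dictionary of facts; the function name and the example conditions are hypothetical.

```python
def table_holds(rows, facts):
    """Evaluate a table in disjunctive normal form.

    `rows` is a list of rows; each row is a list of predicates over `facts`.
    A row holds if all of its conditions hold (AND); the table holds if
    at least one row holds (OR).
    """
    return any(all(condition(facts) for condition in row) for row in rows)


rows = [
    # Row 1: long call AND at night
    [lambda f: f["duration"] > 40, lambda f: f["at_night"]],
    # Row 2: very high charge
    [lambda f: f["charge"] > 100],
]

matches_row_1 = table_holds(rows, {"duration": 50, "at_night": True, "charge": 10})
matches_row_2 = table_holds(rows, {"duration": 5, "at_night": False, "charge": 150})
matches_none = table_holds(rows, {"duration": 50, "at_night": False, "charge": 10})
```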
[Figure 4 labels: TEM compiler, JSON, PROTON, CEP engine (run time), CDRs, Fraudulent calling numbers]
In the following sections we describe our extensions that satisfy the above characteristics.
5.2 TEM extensions for the FERARI optimizer and learning framework
To generate an EPN that the optimizer can interpret and act on, TEM needs to
provide the following information (refer to D5.2 and D5.3 for information on the optimization approach in
FERARI):
1. Query Rewriting/Reordering Override: If we are confident about the queries as written, we should
be able to switch off rewriting. A Boolean attribute rewrite is needed at the EPN
level to mark that no query transformation is desired. Additionally, a Boolean attribute reorder
is needed to mark that the user does not wish the query plan to be broken into multiple steps.
2. Optimization parameters: The optimizer must also know which parameter to
optimize its plans for. This is needed to select the final plan from the list of
optimal plans. Four attributes are required: maxLatency = float
(default = infinity), maxCost = float (default = infinity), latencyWeight, and costWeight. If only one weight is
specified, the other is set to 0. If both weights are missing, we optimize for cost.
3. Local Operator Placements: The optimizer needs to know whether an operator (EPA) should be placed at
every local site. To this end, the EPN definition file should include a Boolean attribute named
localPlacement.
4. Confidence on uncertainty: A threshold value per operator, based on which uncertain events are
qualified. This value is an input to the optimizer so that it can be taken into account during
plan generation.
5. Geometric Method operator: The GM operator must be created and described in the definitions file.
The necessary fields are listed below:
   • “name”: string
   • “functionName”: string
   • “functionLocation”: string
   • “monitoringObject”: event or event attribute
   • “derivedEvents”: list of derived events with their attributes
   • “sites”: list of sites to monitor
   • “defLSVValue”: float
   • “thresholdType”: dynamic or default
   • “threshold”: integer
   • “equalityInAboveThresholdRegion”: boolean
   • “resolveSteps”: integer or “optimize”
   • “weightVectors”: list of weights per site, on absence average
   • “context”: context as in PROTON, composite or temporal
6. Geometric Method Learning operator: Even though distributed online learning can be regarded
as a special case of the geometric method, for convenience we defined specific fields suitable for
distributed learning.
   • “functionName”: string
   • “monitoringObject”: event or event attribute
   • “derivedEvents”: list of derived events with their attributes
   • “initialModel”: string
   • “synchProtocol”: string, location of class implementing ISyncOp interface
   • “synchProtocolParams”: map of parameter names to parameter values
   • “resolutionProtocol”: string, location of class implementing IResolutionProtocol interface
   • “resolutionProtocolParams”: map of parameter names to parameter values
   • “serviceType”: string, currently one of “classification”, “regression”, “outlier detection”, “KDE”
   • “updateRule”: string, location of class implementing IUpRule interface
   • “updateRuleParams”: map of parameter names to parameter values
   • “modelType”: string, location of class implementing IModel interface
   • “modelParams”: map of parameter names to parameter values
   • “lossFunction”: string, location of class implementing ILossFunction interface
   • “lossFunctionParams”: map of parameter names to parameter values
   • “aggregationMethod”: string, location of class implementing IAggregationMethod interface
Note that requirements 1 and 2 are at the level of the EPN, while requirements 3 and 4 are at the level of the
EPA (operator). Requirements 5 and 6 specifically address the new geometric method and geometric method
learning operators, respectively.
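The default rules stated in requirement 2 can be sketched in Python as follows. This is an illustration, not the optimizer's code: the function name is hypothetical, and the sketch assumes that "optimize for cost" can be modelled by giving the full weight to cost when both weights are missing.

```python
import math


def resolve_optimization(params: dict) -> dict:
    """Fill in the defaults of the EPN-level optimization parameters."""
    latency_weight = params.get("latencyWeight")
    cost_weight = params.get("costWeight")
    if latency_weight is None and cost_weight is None:
        # Assumption: optimizing for cost is modelled as full weight on cost.
        latency_weight, cost_weight = 0.0, 1.0
    elif latency_weight is None:
        latency_weight = 0.0  # only costWeight was given
    elif cost_weight is None:
        cost_weight = 0.0     # only latencyWeight was given
    return {
        "maxLatency": float(params.get("maxLatency", math.inf)),
        "maxCost": float(params.get("maxCost", math.inf)),
        "latencyWeight": float(latency_weight),
        "costWeight": float(cost_weight),
    }


no_input = resolve_optimization({})
latency_only = resolve_optimization({"latencyWeight": 0.8, "maxLatency": 10.25})
```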
Based on our illustrative mobile fraud use case, we introduce the new tables and populate them with some
values, later on shown in a snippet of the generated JSON (see Section 5.2.5).
5.2.1 EPN Optimization table
A single table comprises the fields needed to satisfy requirements 1 and 2 above (Table 5). As in all
TEM tables, the first row indicates the table name, in our case EPN Optimization. The table consists of two parts
separated by a red vertical line. The left-hand part satisfies requirement 1, while the right-hand part satisfies
requirement 2. As in any TEM table, all conditions in a single row maintain a conjunction relationship.
Table 5: EPN Optimization table
EPN Optimization
Row # | Rewrite | Reorder | maxLatency | maxCost | latencyWeight | costWeight
1 | T | T | 10.25 | 2.5 | 0.8 | 0.2
5.2.2 EPA Optimization table
For each EPA in an EDT, we introduce an EPA Optimization table as shown in Table 6. The table name (first
row of the table) is composed of the name of the derived event followed by the suffix “EPA”. The
corresponding EDT is referenced by the name of the event (the name of the EDT) and the specific row in the EDT
(the corresponding EPA).
Table 6: EPA Optimization table
Long call at night EPA
Local placement [T/F] | Confidence [0..1] | Row in Event derivation Table
T | 0.7 | 1
5.2.3 GM operator table
To address requirement 5, we include a new type of operator named GM in the corresponding EDT (Table 7),
which references a corresponding GM operator table (Table 8).
Table 7: Introducing the GM operator
EDT name
Row # | When Expression | When Start | When End | Partition by | Filter on event | Pattern | Filter on pattern
1 | | | | | | GM |
The name of the GM operator table is the GM name followed by the suffix “GM”. Again, as in all TEM tables, the
second row holds the headings. In this case the columns correspond to the fields listed in requirement 5.
Table 8: GM operator table
DistributedCounter GM
Row # | functionName | functionLocation | monitoringObject | sites | defLSVValue | thresholdType | threshold | equalityInAboveThresholdRegion | resolveSteps | weightVectors | Row in Event derivation Table
1 | ferari | * | CDR | | 0.0f | dynamic | 100 | T | optimize | | 1
(*) eu.ferari.examples.distributedcount.function.IdentityFunction
5.2.4 GML operator table
Similarly to the extension for the GM operator, we introduce a new operator type named GML and its
corresponding table (Table 10), which is referenced by the keyword GML in the corresponding EDT
(Table 9). As with its counterpart, the GM operator, the name of this table is the operator name followed by the suffix “GML”. The
column headings of the GML operator table match the fields in requirement 6 above.
Table 9: Introducing the GML operator
EDT name
Row # | When Expression | When Start | When End | Partition by | Filter on event | Pattern | Filter on pattern
1 | | | | | | GML |
Table 10: GML operator table
* eu.ferari.learning.ComEffLearner
** location of update rule in framework
*** location of loss function in framework
5.2.5 Generated JSON file
Figure 5 shows a snippet of the JSON file generated from the populated tables above. The EPN optimization table is shown in blue. The EPA optimization table translation is shown in
orange. The GM parameters are color coded in green, whereas the GML parameters are in purple.
Figure 5: JSON snippet showing optimization parameters
{"epn": {
  "name": "MobileFraud",
  "rewrite": "true",
  "reorder": "true",
  "optimize": {
    "maxLatency": "10.5",
    "maxCost": "2.5",
    "latencyWeight": "0.8",
    "costWeight": "0.2"
  },
  "GM": [
    {
      "name": "distributedCounter",
      "functionName": "eu.ferari.examples.distributedcount.function.IdentityFunction",
      "monitoringObject": [{"CDR"}, {"another"}], //list of input events
      "derivedEvents": [
        {
          "name": "globalThresholdViolation",
          "otherAttribute": "otherValue"
        }
      ],
      "defLSVValue": "0.0f",
      "thresholdType": "dynamic", //optional field default static or dynamic
      "threshold": "100",
      "equalityInAboveThresholdRegion": "true",
      "context": "gmCompositeContext"
    }, …
  ],
  "GML": [
    {
      "name": "learnerName",
      "functionName": "eu.ferari.learning.ComEffLearner",
      "monitoringObject": [{"CallPOPDWH"}, {"another"}], //list of input events
      "derivedEvents": [{"name": "selectivityUpdate", "selectivity": "[float]"}],
      "context": "time window or segmentation",
      "initialModel": "random",
      "synchProtocol": "dynamic",
      "synchProtocolParams": [{"batchSize": "8"}, {"threshold": "0.6"}],
      "resolutionProtocol": "hedgeActive",
      "resolutionProtocolParams": "none",
      "serviceType": "KDE",
      "updateRule": "location of update rule in framework",
      "updateRuleParams": [{"lambda": "0.1"}, {"eta": "1.0"}],
      "modelType": "kernel",
      "modelParams": [{"kernelType": "gaussian"}, {"sigma": "0.1"},
      …
5.3 Extensions to the TEM methodology
As mentioned above, we aim to decouple the FERARI-specific extensions from the generic model targeted
at non-technical users. We therefore add one optional step to the methodology, as shown below.
As detailed in D4.2, the construction of the computation independent model includes the following six steps.
1. Identify the goals in terms of situations that need to be derived from the application and identify a
consumer for each situation (the “WHAT” phase).
2. For each such situation, construct a diagram that drills down to what is needed to be known or
detected in order to derive this situation (the high level “HOW” phase).
3. For each node in the diagram, construct a corresponding EDT and optionally computation and
policy tables that specify the logic for the node. This step is done bottom-up starting from the leaves
of the diagram and finishing with the situations to be detected.
4. For each event or fact type that is referred to in the logic artifacts, locate its origin or create a
requirement to fetch or instrument it. If it is not feasible, refine the requirements.
5. Complete the glossary.
6. Validate the model against TEM Principles.
We add step number 7 which states:
7. Optional: For the optimization and learning frameworks, complete the model by adding the EPN and
EPA optimization tables and, for each row in an EDT that corresponds to a GM or GML operator,
the corresponding GM or GML operator table.
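The completeness condition in step 7 can be sketched as a simple check. The Python function below is a hypothetical illustration, not part of the TEM tooling: the name `check_step7`, the dictionary layout of an EDT row (`edt`, `row`, `pattern`), and the convention of keying operator tables by (EDT name, row number) are assumptions made for this example only.

```python
def check_step7(edt_rows, epn_optimization, epa_optimization, gm_tables, gml_tables):
    """Return a list of problems found when step 7 is applied to a model.

    edt_rows: list of dicts with keys "edt", "row" and "pattern" (assumed layout).
    epn_optimization, epa_optimization: the two optimization tables (dicts).
    gm_tables, gml_tables: operator tables keyed by (EDT name, row number).
    """
    problems = []
    if not epn_optimization:
        problems.append("missing EPN optimization table")
    if not epa_optimization:
        problems.append("missing EPA optimization table")

    operator_tables = {"GM": gm_tables, "GML": gml_tables}
    for row in edt_rows:
        tables = operator_tables.get(row["pattern"])
        if tables is None:
            continue  # not a GM or GML row, so no operator table is needed
        if (row["edt"], row["row"]) not in tables:
            problems.append(
                f"{row['edt']} row {row['row']}: "
                f"missing {row['pattern']} operator table"
            )
    return problems


# Hypothetical model: one GML row and one row with another pattern.
rows = [
    {"edt": "ExampleEDT", "row": 1, "pattern": "GML"},
    {"edt": "ExampleEDT", "row": 2, "pattern": "Sequence"},
]
epn_table = {"rewrite": "true", "reorder": "true"}
epa_table = {"placeholder": "contents of the EPA optimization table"}

print(check_step7(rows, epn_table, epa_table, gm_tables={}, gml_tables={}))
```

Because step 7 is optional, such a check would only apply to models that opt into the optimization and learning frameworks.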
5.4 Summary of TEM extensions for optimization and learning frameworks
In summary, we extended the generic TEM to cope with the requirements imposed by the optimization and
learning frameworks in the FERARI project. To this end, we added to the model four new tables:
The EPN optimization table
The EPA optimization table
The GM and GML operator tables
We showed how these tables can be translated to the JSON definitions file. Furthermore, we extended our
methodology with an additional step dedicated to IT users interested in leveraging the optimization and
learning features offered by FERARI.
6 Summary
The Event Model follows the Model Driven Engineering approach [3][4] and can be classified as a CIM
(Computation Independent Model), providing independence from the physical data representation and omitting
details which are obvious to the designer. This model can be directly translated to an execution model (PSM
– Platform Specific Model in the Model Driven Architecture terminology) through an intermediate generic
representation (PIM – Platform Independent Model). In deliverable D4.2 we focused on the CIM layer and
presented the complete event model along with an accompanying methodology for the approach. The first
part of this report complements the model driven approach and presents the two remaining layers of PIM
and PSM. We show the translation of TEM (i.e., CIM) into a corresponding EPN (PIM) that can then be
represented as a JSON file and fed into our CEP engine PROTON at run-time (PSM). The result of this
work is a complete model driven approach for event driven applications intended for business users.
The approach is illustrated using the mobile fraud use case in the project.
In the second part of the report, we extend the model and the methodology to meet specific requirements
that stem from the optimization and learning techniques developed in FERARI. The extensions follow TEM
principles on the one hand, and can be decoupled from the core generic model on the other.
7 References
[1]. Etzion O. and Niblett P. 2010. Event Processing in Action. Manning Publications Company.
[2]. Von Halle B. and Goldberg L. 2010. The Decision Model. CRC Press.
[3]. Bodenstein C., Lohse F., and Zimmermann A. 2010. Executable specifications for model-based
development of automotive software. SMC 2010, 727-732.
[4]. Brambilla M., Cabot J., and Wimmer M. 2012. Model Driven Software Engineering in Practice. Morgan &
Claypool.