

ICT, STREP

FERARI ICT-FP7-619491

Flexible Event pRocessing for big dAta aRchItectures

Collaborative Project

D 4.3

Automatic Generation of Annotated Event Processing Network from the Goal-Driven Model

01.02.2016 – 31.01.2017 (final period)

Contractual Date of Delivery: 31.01.2017

Actual Date of Delivery: 31.01.2017

Author(s): Fabiana Fournier (IBM) and Inna Skarbovsky (IBM)

Institution: IBM

Workpackage: WP4

Security: PU

Nature: R

Total number of pages: 23


Project funded by the European Community under the Information and Communication Technologies Programme Contract ICT-FP7-619491

Project coordinator name: Michael Mock

Project coordinator organisation name: Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)

Revision: 1

Schloss Birlinghoven, 53754 Sankt Augustin, Germany

URL: http://www.iais.fraunhofer.de

Abstract:

The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling business users to express complex analytics tasks through a high-level declarative language that supports distributed complex event processing as an integral part of the system architecture. This report finalizes the work carried out for the development of a complex event processing model and methodology suitable for specification, implementation, and maintenance of event-driven applications in the Big Data architecture.

In FERARI, we introduced a model driven approach based on a set of diagrams and tables that can be automatically translated into an event processing network and eventually into a running application. In this report we detail the construction of the platform independent model and its transformation into a platform specific model, and exemplify both with the mobile phone use case of the project. In addition, we extend our model and methodology to support the optimization and learning frameworks developed in the course of the project.


Revision history

Administration Status

Project acronym: FERARI ID: ICT-FP7-619491

Document identifier: D 4.3 Automatic Generation of annotated event processing network from the goal-driven method (03.02.2016 – 31.01.2017)

Leading Partner: IBM

Report version: 1 Report preparation date: 31.01.2017 Classification: PU

Nature: REPORT

Author(s) and contributors: Fabiana Fournier (IBM) and Inna Skarbovsky (IBM)

Status: - Plan

- Draft

- Working

- Final

x Submitted

Copyright

This report is © FERARI Consortium 2017. Its duplication is restricted to personal use within the consortium and the European Commission. www.ferari-project.eu


Document History

Version  Date        Author                  Change Description
0.1      15/12/2016  Fabiana Fournier (IBM)  First draft
0.2      1/1/2017    Fabiana Fournier (IBM)  Second draft including section 9
0.3      15/1/2017   Fabiana Fournier (IBM)  First complete version
0.4      17/1/2017   Fabiana Fournier (IBM)  Inclusion of abstract
0.5      25/1/2017   Fabiana Fournier (IBM)  Updates per internal review
1.0      30/1/2017   Fabiana Fournier (IBM)  Final fixes and cleanup


Table of Contents

1 Introduction

1.1 Purpose and scope of the document

1.2 Relationship with other documents

1.3 Illustrative Example - The Mobile phone fraud use case

2 Recap on the event model (TEM)

3 Transform the CIM to the platform independent model (PIM)

4 Generate the code and create the platform specific model (PSM)

5 TEM extensions for the sake of the optimizer and learning framework

5.1 Extensions principles

5.2 TEM extensions for the FERARI optimizer and learning framework

5.2.1 EPN Optimization table

5.2.2 EPA Optimization table

5.2.3 GM operator table

5.2.4 GML operator table

5.2.5 Generated JSON file

5.3 Extensions to the TEM methodology

5.4 Summary of TEM extensions for optimization and learning frameworks

6 Summary

7 References


List of Tables

Table 1: Expensive calls EDT

Table 2: calls_cost_sum<Expensive calls> computation table

Table 3: call_start_dates<Expensive calls> computation table

Table 4: Expensive calls policy table

Table 5: EPN Optimization table

Table 6: EPA Optimization table

Table 7: Introducing the GM operator

Table 8: GM operator table

Table 9: Introducing the GML operator

Table 10: GML operator table

List of Figures

Figure 1: EPA generated for the Expensive calls situation

Figure 2: Generated EPN in the mobile fraud use case

Figure 3: JSON snippet for Expensive calls EPA

Figure 4: Generation of code with PROTON

Figure 5: JSON snippet showing optimization parameters


Acronyms

CEP Complex Event Processing

CIM Computation Independent Model

EDT Event Derivation Table

EPA Event Processing Agent

EPN Event Processing Network

FERARI Flexible Event pRocessing for big dAta aRchItectures

JSON JavaScript Object Notation

PIM Platform Independent Model

PROTON IBM PROactive Technology Online

PSM Platform Specific Model

TDM The Decision Model

TEM The Event Model

WP Work Package


FERARI Deliverable D4.3

Automatic Generation of Annotated Event Processing Network from the Goal-Driven Method

Fabiana Fournier (IBM) and Inna Skarbovsky (IBM)


1 Introduction

1.1 Purpose and scope of the document

This report presents the final deliverable of work package 4 (WP4) in the scope of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project. D4.3 deals with the “automatic generation of annotated event processing network from the goal-driven model: The deliverable describes the annotated event processing network that is created from the models and methodologies defined in deliverable 4.2”. In other words, D4.3 relies on and complements D4.2.

D4.2 presents The Event Model (TEM), a new way to model, develop, validate, maintain, and implement event-driven applications. TEM is based on a set of well-defined principles and building blocks, and does not require substantial programming skills, thus making it suitable for business users and the project goals. A methodology is also described as part of that report. D4.2 covers the Computation Independent Model (CIM) layer in our model driven approach for event processing applications. D4.3 covers the remaining two layers, i.e., the Platform Independent Model (PIM) and the Platform Specific Model (PSM).

In addition, and as a specific extension for FERARI, in the scope of D4.3 we extend our model and methodology to support the optimization and learning frameworks developed so far in the project.

We exemplify our model using the mobile fraud use case in the project.

This report is structured as follows: Section 2 recaps the main takeaways from deliverable 4.2. Section 3 describes the transformation of the TEM model into the platform independent model represented by an event processing network, whereas Section 4 details the code generation. Section 5 describes the extensions to our model and methodology that stem from the optimization and learning frameworks developed in FERARI. We conclude the report with a summary.

1.2 Relationship with other documents

Deliverable 4.3 is a direct continuation of deliverable 4.2 “Goal driven model and methodology for specification of event processing applications”, and is therefore directly connected to D4.2. As one of the major extensions to the event model is towards the optimization and learning frameworks made in FERARI, D4.3 is also related to D5.2 “Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity”, D5.3 “Implementation of Algorithms for Robust Distributed Stream Monitoring and Supporting data Integrity”, and D2.3 “Final Prototype” (for the reports of these deliverables, see footnote 1).

1.3 Illustrative Example - The Mobile phone fraud use case

We illustrate our work during the third year of the project using the mobile fraud use case previously analyzed and implemented in the scope of D4.1 (first year of the project) and exemplified in TEM in the scope of D4.2 (second year of the project). The goal is to identify users who use a network service without the intention to pay for that use. For the sake of completeness, we describe below the situations we are looking for in this scenario:

• A long call (lasts more than 40 minutes) to a premium service (long distance call) is made during night hours, i.e., from 7 PM to 7 AM the next day (Long call at night).

• Same as the previous one, but this time we are looking for at least three of these long distance calls at night per calling number in a day (Frequent long calls at night).

• Multiple long distance calls per calling number (more than 9) that last more than 60 minutes in a day (Frequent long calls).

• Same as the previous one, but each call occurrence lasts at least 60 minutes (Frequent each long call).

• High usage (>100) of a line for long distance calls every six hours (Expensive calls).

We also note that in our complex event processing (CEP) terminology we follow the semantics presented in Etzion and Niblett’s book [1]. For the main constructs and terms please refer to D4.1 (footnote 2).

2 Recap on the event model (TEM)

D4.2 (see footnote 3) describes our work during the second year of the project. The deliverable introduces TEM as a way to model, develop, validate, maintain, and implement event-driven applications. The main idea behind our model driven approach is to express the derivation logic through a collection of related normalized tables. These tables can be validated and then used for code generation. This idea has already been successfully proven in the domain of business rules by The Decision Model (TDM) [2], and therefore we believe that business users will be receptive towards TEM.

In addition, D4.2 introduces a model driven approach for event driven applications based on TEM that includes both functional and non-functional requirements. Its four steps are briefly listed below for the sake of completeness.

1. Construct the computation independent model (CIM)

2. Transform the CIM to the platform independent model (PIM)

3. Generate the code and create the platform specific model (PSM).

4. Operate the application and support modifications.

D4.2 details the first step of the methodology, i.e., the construction of the CIM (equivalent to the TEM tables), and the fourth step, which is a direct outcome of working with a model driven approach. A TEM model is composed of:

1. A diagram for each situation.

2. A set of logic tables: an event derivation table (EDT), a computation table, and a policy table for each node in the diagram.

3. A set of glossary tables: a concepts lexicon table, a fact types table, an actors table, and an IT elements table.

In the following sections, we detail the second and third steps in our methodology to complete the entire process. The input to step 2 is a valid CIM ready for the transformation to the PIM and PSM.

3 Transform the CIM to the platform independent model (PIM)

The platform independent model (PIM) is a generic representation of an event processing application. The CIM might omit some details that can be implicitly inferred or specified by IT people at a later phase. Examples of omitted details are the assignment of fact types associated with derived events whose values are copied, and the physical realization of data elements and the way they are fetched (part of the original event, enrichment of events, or query of data stores). The implementation details are beyond the scope of this report.

We adopted the approach of transforming the CIM to a PIM rather than transforming it directly to a PSM, since the aim of TEM is to be generic and fit multiple implementations. For the PIM, we use the model described in [1], which is based on the notions of event processing network (EPN) and event processing agents (EPAs). It is a comprehensive model that can be mapped to many specific event processing languages. Algorithm 1 below depicts how to generate an EPN out of a TEM model.

Algorithm 1: EPN generation

Procedure CreateEPN
    foreach Event Derivation Table (EDT)
        for i = 1 to EDT.numberOfRows do
            for j = 1 to EDT.numberOfPatternConditions do
                EPAij := CreateEPA (EDT, i, j)
                RowEPAi := AND (RowEPAi, EPAij)    // AND relationship among EPAs in the same row
            endfor
            EDT_EPN := OR (EDT_EPN, RowEPAi)       // OR relationship among EPAs in different rows
        endfor
        EPN := Append (EPN, EDT_EPN)               // union of subEPNs of EDTs into an EPN
    endforeach
    return EPN

Procedure CreateEPA (EDT, i, j)
    EPA.inputEvents := get EDT.columnHeadings.event_types
    EPA.derivedEvent := EDT.name
    EPA.temporalContext := EDT.row[i].whenConditions
    EPA.segmentationContext := EDT.row[i].partitionByCondition
    EPA.filterOnEventExpression := EDT.row[i].filterOnEventConditions
    EPA.pattern := EDT.row[i].PatternCondition[j]
    EPA.filterOnPatternExpression := EDT.row[i].FilterOnPatternConditions
    EPA.policyValues := CalculatePolicyValues (EDT.name, i)
    foreach EPA.derivedEvent.attribute
        derivedExpressionValue := CalculateComputationValues (EPA.derivedEvent.attribute, i)
        if derivedExpressionValue != null then
            EPA.derivedEvent.attribute.expressionValue := derivedExpressionValue    // values for computed attributes
        else
            EPA.derivedEvent.attribute.expressionValue := inputEvent.attribute      // implicitly inferred by copying input event attribute values
        endif
    endforeach
    return EPA

Procedure CalculateComputationValues (attributeName, i)
    computationExpression := Find (ComputationTable.attributeName.rowInEDT, i)    // attribute computed expression or value associated to row i in EDT in the computation table
    return computationExpression

Procedure CalculatePolicyValues (derivedEvent, i)
    policyDerivedEvent := Find (PolicyTable, derivedEvent, i)    // policy values associated to row i in corresponding EDT in policy table
    if policyDerivedEvent.values != null then
        policies.evaluationPolicy := policyDerivedEvent.evaluation
        policies.cardinalityPolicy := policyDerivedEvent.cardinality
        policies.repeatedPolicy := policyDerivedEvent.repeated
        policies.consumptionPolicy := policyDerivedEvent.consumption
    else
        policies := defaultValues
    endif
    return policies

Informally, the construction of the EPN proceeds as follows:

• Each row in an EDT generates an EPA for each operator in the row. If there is more than one EPA, an additional AND EPA is added among all the EPAs of the row.

• An OR EPA is added between the EPAs of the different rows of the EDT.

• Each row in each computation table derives a computed value in the event payload.

• Each row in each policy table derives the policy assignments of the corresponding EPA.
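As a rough illustration only, the construction steps above could be sketched in Python as follows. The dictionary layout of the EDT, the `EPA` record, the `create_epn` helper, and the default policy values are all assumptions made for this example; they are not the data structures used by TEM or PROTON. The sketch assumes every EDT has at least one row and every row at least one pattern.

```python
from dataclasses import dataclass, field

# Hypothetical defaults, used when the policy table has no entry for an EDT row.
DEFAULT_POLICIES = {"evaluation": "immediate", "cardinality": "single"}


@dataclass
class EPA:
    derived_event: str
    pattern: str
    inputs: list
    policies: dict
    computed: dict = field(default_factory=dict)


def create_epn(edts, computations, policies):
    """Build one sub-EPN per EDT: AND within a row, OR across rows."""
    epn = {}
    for edt in edts:
        row_nodes = []
        for i, row in enumerate(edt["rows"], start=1):
            epas = [
                EPA(
                    derived_event=edt["name"],
                    pattern=pattern,
                    inputs=edt["input_events"],
                    policies=policies.get((edt["name"], i), DEFAULT_POLICIES),
                    computed=computations.get((edt["name"], i), {}),
                )
                for pattern in row["patterns"]
            ]
            # A single EPA stands alone; several EPAs in one row are joined by AND.
            row_nodes.append(epas[0] if len(epas) == 1 else ("AND", epas))
        # Several rows of the same EDT are alternatives, joined by OR.
        epn[edt["name"]] = row_nodes[0] if len(row_nodes) == 1 else ("OR", row_nodes)
    return epn


# Illustrative input loosely modelled on the Expensive calls EDT.
expensive_calls = {
    "name": "ExpensiveCalls",
    "input_events": ["CDR"],
    "rows": [{"patterns": ["SUM(total_call_charge_amount) > 100"]}],
}
epn = create_epn(
    [expensive_calls],
    computations={("ExpensiveCalls", 1): {"calls_cost_sum": "countSum.SUM"}},
    policies={("ExpensiveCalls", 1): {"evaluation": "immediate"}},
)
print(epn["ExpensiveCalls"].pattern)
```

Because the Expensive calls EDT has a single row with a single pattern, the sketch yields one EPA and no AND or OR node, which matches the single agent described for this situation.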

For example, the algorithm generates the EPA depicted in Figure 1 for the Expensive calls situation from the EDT shown in Table 1 along with its corresponding computation tables (Table 2 and Table 3) and policy table (Table 4). For a complete explanation of these tables refer to D4.2 (footnote 3).

Figure 1: EPA generated for the Expensive calls situation

[Figure 1 diagram labels: an Event Processing Agent with input event CDR, within context; filtering: other_party_tel_number is member of premium services AND call_direction = 1; pattern SUM with countSum(total_call_charge_amount) > 100; deriving ExpensiveCalls with callsCostSum: countSum.SUM and call_start_dates: {call_start_date<CDR>}.]

Table 1: Expensive calls EDT

Expensive calls Logic
Column groups: Row #; When (Expression, Start, End); Partition by; Pattern; Filter on event; Filter on pattern
Row #: 1
When (Expression, Start, End): every 6 hours; first CDR
Calling number: same
other_party_tel_number<CDR>: member of premium services
call_direction<CDR>: = 1
SUM(total_call_charge_amount<CDR>): > 100

Table 2: calls_cost_sum<Expensive calls> computation table

calls_cost_sum<Expensive calls> Computation
Row 1 (row 1 in the Event derivation Table): countSum.SUM

Table 3: call_start_dates<Expensive calls> computation table

call_start_dates<Expensive calls> Computation
Row 1 (row 1 in the Event derivation Table): {call_start_date<CDR>}

Table 4: Expensive calls policy table

Expensive calls Policy
Columns: Row #; Evaluation; Cardinality; Repeated; Consumption; Row in Event derivation Table
Row 1 (row 1 in the Event derivation Table): Evaluation = immediate; the other policy columns are empty

In our illustrative example of the mobile phone fraud use case, the algorithm generates the following EPN (Figure 2).

Figure 2: Generated EPN in the mobile fraud use case

[Figure 2 diagram labels: input events CDRs; agents FILTER, COUNT (three), SUM (two), and AND; derived situations LongCallAtNight, FrequentLongCallsAtNight, FrequentLongCalls, FrequentEachLongCall, and ExpensiveCalls.]

4 Generate the code and create the platform specific model (PSM)

This phase is a mapping between the PIM and PSM. Assuming that all missing details are obtained at the PIM level, this is a mere functional transformation.

One way to represent an EPN is through a JSON (JavaScript Object Notation) file. This file can then be provided to PROTON (footnote 4) as configuration for the run-time engine. At execution, PROTON’s run-time engine accesses the JSON file, loads and parses all the definitions, creates a thread for each input and output adapter, starts listening for events arriving from the input adapters (representing producers), and forwards events to the output adapters (representing consumers).

In our EPA example of Figure 1, the corresponding JSON describing it is shown in Figure 3.

{
  "name": "ExpensiveCallsEPA",
  "createdDate": "Mon Mar 09 2015",
  "epaType": "Aggregate",
  "context": "CompositeExpensiveCalls",
  "inputEvents": [
    {
      "name": "CDR",
      "filterExpression": "(IndexOf(CDR.other_party_tel_number,'960') == 1 || IndexOf(CDR.other_party_tel_number,'960') == 2) && EqualsIgnoreCase(CDR.call_direction,'O')",
      "consumptionPolicy": "Reuse",
      "instanceSelectionPolicy": "First"
    }
  ],
  "computedVariables": [
    {
      "name": "SUM",
      "aggregationType": "Sum",
      "CDR": "CDR.total_call_charge_amount"
    }
  ],
  "assertion": "SUM > 100",
  "evaluationPolicy": "Immediate",
  "cardinalityPolicy": "Single",
  "internalSegmentation": [],
  "derivedEvents": [
    {
      "name": "ExpensiveCalls",
      "reportParticipants": false,
      "expressions": {
        "Duration": "0",
        "calling_number": "context.CallingNumberSegmentation",
        "CallsCostSum": "SUM",
        "call_start_dates": "ArrayConvert(CDR.call_start_date)"
      }
    }
  ]
}

Figure 3: JSON snippet for Expensive calls EPA

Figure 4 sketches how the JSON representation of the EPN is imported into PROTON to be applied during run-time.

Figure 4: Generation of code with PROTON
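To make the semantics of the Expensive calls EPA in Figure 3 more tangible, the following Python sketch mimics its filter, aggregation, and assertion outside of PROTON. It is not PROTON code: the function names and the sample CDRs are invented for illustration, Python's `str.find` merely stands in for `IndexOf` (whose indexing convention is not stated in this report), the six-hour temporal context is omitted, and reading "Immediate" with "Single" as "derive once per calling number as soon as the sum exceeds the threshold" is an assumption.

```python
from collections import defaultdict


def matches_filter(cdr):
    """Mirror of the filterExpression; str.find stands in for IndexOf (an assumption)."""
    position = cdr["other_party_tel_number"].find("960")
    return position in (1, 2) and cdr["call_direction"].upper() == "O"


def expensive_calls(cdrs, threshold=100):
    """Sum charges per calling number and derive one event once the sum exceeds the threshold."""
    sums = defaultdict(float)
    dates = defaultdict(list)
    emitted = set()
    derived = []
    for cdr in cdrs:
        if not matches_filter(cdr):
            continue
        caller = cdr["calling_number"]
        sums[caller] += cdr["total_call_charge_amount"]
        dates[caller].append(cdr["call_start_date"])
        # Assumed reading of Immediate/Single: emit at most once per partition.
        if sums[caller] > threshold and caller not in emitted:
            emitted.add(caller)
            derived.append({
                "name": "ExpensiveCalls",
                "calling_number": caller,
                "CallsCostSum": sums[caller],
                "call_start_dates": list(dates[caller]),
            })
    return derived


# Made-up sample CDRs, for illustration only.
cdrs = [
    {"calling_number": "A", "other_party_tel_number": "0960123", "call_direction": "O",
     "total_call_charge_amount": 70.0, "call_start_date": "2016-05-01T20:00"},
    {"calling_number": "A", "other_party_tel_number": "0960456", "call_direction": "o",
     "total_call_charge_amount": 45.0, "call_start_date": "2016-05-01T21:30"},
    {"calling_number": "B", "other_party_tel_number": "5551234", "call_direction": "O",
     "total_call_charge_amount": 500.0, "call_start_date": "2016-05-01T22:00"},
]
events = expensive_calls(cdrs)
print(events)
```

With this sample input, only calling number A produces an ExpensiveCalls event: its two filtered calls sum to 115, while the call from B is discarded by the filter despite its high charge.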

5 TEM extensions for the sake of the optimizer and learning framework

5.1 Extensions principles

TEM is primarily intended for business users as a means to define the business logic of an event processing application. The requested extensions, on the other hand, are intended for IT people/developers as part of the FERARI platform capabilities. Our goal was to extend the existing model to include all input required for both the optimizer and the learning framework in such a way that we don’t “break” any of our TEM principles. Therefore the extensions should satisfy two requirements:

1. Have the same “look and feel” as before in order to get one complete and coherent model.

2. At the same time, be an additional, optional, and independent part of both the model and methodology. This part comes after the business logic of the application is specified by a business user and it is intended for optimization purposes.

Therefore, in order to meet the above requirements we had to keep the following characteristics of TEM:

• Model driven approach for event driven applications.

• Tabular representation of the logical artifacts.

• Normalized tables.

• Same structure of tables, that is: the first row in the table designates the table name according to a pre-defined syntax; the second row designates the table columns; and the third row onwards designate the different conditions that hold for a specific instance of the table. Any TEM table consists of a collection of conditions in disjunctive normal form, namely all conditions in a single row have a conjunction relationship among them (AND), while the relationship among multiple rows is a disjunction (OR).

• One single logical artifact (EDT) for each derived event.

• TEM extensions are targeted to technical people, and as such, they should be decoupled and independent of the TEM core tables and methodology.
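The disjunctive normal form shared by all TEM tables can be illustrated with a short Python sketch. The representation of a table as a list of rows, each row being a list of predicates, and the sample conditions are assumptions made purely for this example.

```python
def table_holds(rows, facts):
    """Evaluate a table in disjunctive normal form: OR over rows, AND within a row."""
    return any(all(condition(facts) for condition in row) for row in rows)


# Hypothetical two-row table: each row is a list of predicates over a fact dictionary.
rows = [
    [lambda f: f["duration_min"] > 40, lambda f: f["is_night"]],
    [lambda f: f["charge"] > 100],
]

print(table_holds(rows, {"duration_min": 50, "is_night": True, "charge": 5}))   # True
print(table_holds(rows, {"duration_min": 50, "is_night": False, "charge": 5}))  # False
```

The first call succeeds because every condition of the first row holds; the second fails because neither row is fully satisfied.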

[Figure 4 diagram labels: CDRs; PROTON with TEM compiler, JSON, and CEP engine (run time); Fraudulent calling numbers.]

In the following sections we describe our extensions that satisfy the above characteristics.

5.2 TEM extensions for the FERARI optimizer and learning framework

For the optimizer to be able to understand and act on the generated EPN, TEM needs to provide the following information (refer to D5.2 and D5.3 for information on the optimization approach in FERARI):

1. Query Rewriting/Reordering Override: If we are confident about the queries as written, we should have the option to disable any rewriting. The Boolean attribute rewrite is needed at the EPN level, marking that no query transformation is desirable. Additionally, another Boolean attribute, reorder, is needed, marking that the user doesn’t wish to break the query plan into multiple steps.

2. Optimization parameters: The optimizer should also know which parameter to optimize its plans for. This is needed for the selection of the final plan from the list of optimal plans. Four attributes are required: maxLatency = float (default=infinity), maxCost = float (default=infinity), latencyWeight, and costWeight. If only one weight is specified, the other is set to 0. If both weights are missing, we optimize for cost.

3. Local Operator Placements: The optimizer needs to know if an operator (EPA) should be placed in every local site. In that regard, the EPN definition file should include a Boolean attribute named localPlacement.

4. Confidence on uncertainty: A threshold value per operator, based on which uncertain events are qualified. This value is input to the optimizer so that it can be taken into account in the plan generation.

5. Geometric Method operator: The GM operator needs to be created and described in the definitions file. The necessary fields are listed below:

   • “name”: string

   • “functionName”: string

   • “functionLocation”: string

   • “monitoringObject”: event or event attribute

   • “derivedEvents”: list of derived events with their attributes

   • “sites”: list of sites to monitor

   • “defLSVValue”: float

   • “thresholdType”: dynamic or default

   • “threshold”: integer

   • “equalityInAboveThresholdRegion”: boolean

   • “resolveSteps”: integer or “optimize”

   • “weightVectors”: list of weights per site; if absent, the average is used

   • “context”: context as in PROTON, composite or temporal

6. Geometric Method Learning operator: Even though distributed online learning can be regarded as a special case of the geometric method, for convenience we defined specific fields suitable for distributed learning:

   • “functionName”: string

   • “monitoringObject”: event or event attribute

   • “derivedEvents”: list of derived events with their attributes

   • “initialModel”: string

   • “synchProtocol”: string, location of class implementing ISyncOp interface

   • “synchProtocolParams”: map of parameter names to parameter values

   • “resolutionProtocol”: string, location of class implementing IResolutionProtocol interface

   • “resolutionProtocolParams”: map of parameter names to parameter values

   • “serviceType”: string, currently one of “classification”, “regression”, “outlier detection”, “KDE”

   • “updateRule”: string, location of class implementing IUpRule interface

   • “updateRuleParams”: map of parameter names to parameter values

   • “modelType”: string, location of class implementing IModel interface

   • “modelParams”: map of parameter names to parameter values

   • “lossFunction”: string, location of class implementing ILossFunction interface

   • “lossFunctionParams”: map of parameter names to parameter values

   • “aggregationMethod”: string, location of class implementing IAggregationMethod interface

Note that requirements 1 and 2 are at the level of the EPN, while requirements 3 and 4 are at the level of the EPA (operator). Requirements 5 and 6 specifically address the new geometric method and geometric method learning operators, respectively.
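As an illustration of requirement 5, the following Python sketch checks a GM operator definition against the field list given above. It is not part of the FERARI code base: the function name validate_gm_operator, the treatment of “weightVectors” as optional, and the sample values for “functionLocation” and “sites” are assumptions made for this example only.

```python
from typing import Any

# Field names follow requirement 5. "weightVectors" is treated as optional
# here (assumption: when absent, the average is used).
GM_REQUIRED_FIELDS = (
    "name", "functionName", "functionLocation", "monitoringObject",
    "derivedEvents", "sites", "defLSVValue", "thresholdType", "threshold",
    "equalityInAboveThresholdRegion", "resolveSteps", "context",
)


def validate_gm_operator(definition: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a GM operator definition."""
    problems = [f"missing field: {name}" for name in GM_REQUIRED_FIELDS
                if name not in definition]
    if "thresholdType" in definition and \
            definition["thresholdType"] not in ("dynamic", "default"):
        problems.append("thresholdType must be 'dynamic' or 'default'")
    if "resolveSteps" in definition:
        steps = definition["resolveSteps"]
        # bool is a subclass of int in Python, so exclude it explicitly
        is_int = isinstance(steps, int) and not isinstance(steps, bool)
        if not (is_int or steps == "optimize"):
            problems.append("resolveSteps must be an integer or 'optimize'")
    if "equalityInAboveThresholdRegion" in definition and \
            not isinstance(definition["equalityInAboveThresholdRegion"], bool):
        problems.append("equalityInAboveThresholdRegion must be a boolean")
    return problems


example = {
    "name": "distributedCounter",
    "functionName": "eu.ferari.examples.distributedcount.function.IdentityFunction",
    "functionLocation": "hypothetical/location",  # hypothetical value
    "monitoringObject": "CDR",
    "derivedEvents": [{"name": "globalThresholdViolation"}],
    "sites": ["siteA", "siteB"],                   # hypothetical site names
    "defLSVValue": 0.0,
    "thresholdType": "dynamic",
    "threshold": 100,
    "equalityInAboveThresholdRegion": True,
    "resolveSteps": "optimize",
    "context": "gmCompositeContext",
}

print(validate_gm_operator(example))
```

A check of this kind could be run on each GM operator table before the JSON definitions file is generated, so that incomplete rows are reported to the modeler rather than to the run-time engine.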

Based on our illustrative mobile fraud use case, we introduce the new tables and populate them with sample values, which are later shown in a snippet of the generated JSON (see Section 5.2.5).

5.2.1 EPN Optimization table

A single table is composed of the fields needed to satisfy requirements 1 and 2 above (Table 5). As in all TEM tables, the first row indicates the table name, in our case EPN Optimization. The table consists of two parts separated by a red vertical line. The right hand part satisfies requirement 1, while the left hand part satisfies requirement 2. As in any TEM table, all conditions in a single row are in a conjunction relationship.

Table 5: EPN Optimization table

EPN Optimization
Row # | Rewrite | Reorder | maxLatency | maxCost | latencyWeight | costWeight
1     | T       | T       | 10.25      | 2.5     | 0.8           | 0.2

5.2.2 EPA Optimization table

For each EPA in an EDT, we introduce an EPA Optimization table as shown in Table 6. The table name (first row of the table) is composed of the name of the derived event followed by the suffix “EPA”. The corresponding EDT is referenced by the name of the event (the name of the EDT), and the corresponding EPA by the specific row in the EDT.


Table 6: EPA Optimization table

Long call at night EPA
Local placement [T/F] | Confidence [0..1] | Row in Event derivation Table
T                     | 0.7               | 1

5.2.3 GM operator table

To address requirement 5, we include a new type of operator named GM in the corresponding EDT (Table 7), which refers to a corresponding GM operator table (Table 8).

Table 7: Introducing the GM operator

EDT name
Row # | When Expression | When Start | When End | Partition by | Filter on event | Pattern | Filter on pattern
1     |                 |            |          |              |                 | GM      |

The name of the GM operator table is the GM name followed by the suffix “GM”. Again, as in all TEM tables, the second row contains the column headings. In this case, the columns correspond to the fields listed in requirement 5.

Table 8: GM operator table

DistributedCounter GM
Row # | functionName | functionLocation | monitoringObject | sites | defLSVValue | thresholdType | threshold | equalityInAboveThresholdRegion | resolveSteps | weightVectors | Row in Event derivation Table
1     | ferari       | (*)              | CDR              |       | 0.0f        | dynamic       | 100       | T                              | optimize     |               | 1

(*) eu.ferari.examples.distributedcount.function.IdentityFunction

5.2.4 GML operator table

Similarly to the extension for the GM operator, we introduce a new operator type named GML and its corresponding table (Table 10), which is referenced by the keyword GML in the corresponding EDT (Table 9). As with its counterpart for the GM operator, the name of this table is the operator name followed by the suffix “GML”. The column headings of the GML operator table match the fields in requirement 6 above.
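Requirement 6 describes fields such as “synchProtocolParams” and “updateRuleParams” as maps of parameter names to parameter values, while the generated JSON in Figure 5 renders them as a list of single-entry objects with string values, and as the string "none" when there are no parameters. The following Python sketch shows one way to perform that conversion. The helper name encode_params is hypothetical, and the rule for empty maps is inferred from the single "resolutionProtocolParams" example in Figure 5.

```python
import json


def encode_params(params):
    """Render a parameter map in the style used in Figure 5.

    A non-empty map becomes a list of single-entry objects whose values are
    strings; an empty or missing map becomes the string "none" (assumption
    based on the "resolutionProtocolParams" entry in Figure 5).
    """
    if not params:
        return "none"
    return [{name: str(value)} for name, value in params.items()]


# Fragment of a GML operator definition, using values that appear in Figure 5.
gml_fragment = {
    "synchProtocol": "dynamic",
    "synchProtocolParams": encode_params({"batchSize": 8, "threshold": 0.6}),
    "resolutionProtocol": "hedgeActive",
    "resolutionProtocolParams": encode_params({}),
}

print(json.dumps(gml_fragment, indent=2))
```

Keeping the conversion in one helper means that every "...Params" column of the GML operator table is serialized the same way.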


Table 9: Introducing the GML operator

EDT name
Row # | When Expression | When Start | When End | Partition by | Filter on event | Pattern | Filter on pattern
1     |                 |            |          |              |                 | GML     |

Table 10: GML operator table

* eu.ferari.learning.ComEffLearner

** location of update rule in framework

*** location of loss function in framework

5.2.5 Generated JSON file

The translation of the populated tables above is shown in the snippet of the resulting JSON file in Figure 5. The EPN optimization table is shown in blue, and the EPA optimization table in orange. The GM parameters are color coded in green, and the GML parameters in purple.


Figure 5: JSON snippet showing optimization parameters

{"epn": {
  "name": "MobileFraud",
  "rewrite": "true",
  "reorder": "true",
  "optimize": {
    "maxLatency": "10.5",
    "maxCost": "2.5",
    "latencyWeight": "0.8",
    "costWeight": "0.2"
  }],
  "GM": [ {
    "name": "distributedCounter",
    "functionName": "eu.ferari.examples.distributedcount.function.IdentityFunction",
    "monitoringObject": [{"CDR"}, {"another"}], //list of input events
    "derivedEvents": [ {
      "name": "globalThresholdViolation",
      "otherAttribute": "otherValue" } ],
    "defLSVValue": "0.0f",
    "thresholdType": "dynamic", //optional field default static or dynamic
    "threshold": "100",
    "equalityInAboveThresholdRegion": "true",
    "context": "gmCompositeContext" }, … ],
  "GML": [ {
    "name": "learnerName",
    "functionName": "eu.ferari.learning.ComEffLearner",
    "monitoringObject": [{"CallPOPDWH"}, {"another"}], //list of input events
    "derivedEvents": [{
      "name": "selectivityUpdate",
      "selectivity": "[float]"}],
    "context": "time window or segmentation",
    "initialModel": "random",
    "synchProtocol": "dynamic",
    "synchProtocolParams": [{"batchSize": "8"}, {"threshold": "0.6"}],
    "resolutionProtocol": "hedgeActive",
    "resolutionProtocolParams": "none",
    "serviceType": "KDE",
    "updateRule": "location of update rule in framework",
    "updateRuleParams": [{"lambda": "0.1"}, {"eta": "1.0"}],
    "modelType": "kernel",
    "modelParams": [{"kernelType": "gaussian"}, {"sigma": "0.1"},

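The translation of the EPN Optimization table into the "epn" part of the JSON file can be sketched in Python as follows. This is an illustration under stated assumptions rather than the FERARI generator: the function name epn_optimization_to_json is hypothetical, the "optimize" entry is assumed to be a plain JSON object, and T/F cells are assumed to map to the strings "true" and "false". The row values are those of Table 5.

```python
import json

OPTIMIZE_FIELDS = ("maxLatency", "maxCost", "latencyWeight", "costWeight")


def epn_optimization_to_json(epn_name, row):
    """Build the "epn" JSON fragment from one row of the EPN Optimization table.

    All values are emitted as strings, as in Figure 5.
    """
    def flag(cell):
        # Table cells use T/F; the JSON snippet uses "true"/"false".
        return "true" if cell == "T" else "false"

    return {
        "epn": {
            "name": epn_name,
            "rewrite": flag(row["Rewrite"]),
            "reorder": flag(row["Reorder"]),
            "optimize": {field: str(row[field]) for field in OPTIMIZE_FIELDS},
        }
    }


table5_row = {
    "Rewrite": "T", "Reorder": "T",
    "maxLatency": 10.25, "maxCost": 2.5,
    "latencyWeight": 0.8, "costWeight": 0.2,
}

print(json.dumps(epn_optimization_to_json("MobileFraud", table5_row), indent=2))
```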

5.3 Extensions to the TEM methodology

As mentioned above, we aim to decouple the FERARI-specific extensions from the generic model targeted at non-technical users. We therefore add one optional step to the methodology, as shown below.

As detailed in D4.2, the construction of the computation independent model includes the following six steps.

1. Identify the goals in terms of situations that need to be derived from the application and identify a

consumer for each situation (the “WHAT” phase).

2. For each such situation, construct a diagram that drills down to what is needed to be known or

detected in order to derive this situation (the high level “HOW” phase).

3. For each node in the diagram, construct a corresponding EDT and optionally computation and

policy tables that specify the logic for the node. This step is done bottom-up starting from the leaves

of the diagram and finishing with the situations to be detected.

4. For each event or fact type that is referred to in the logic artifacts, locate its origin or create a requirement to fetch or instrument it. If this is not feasible, refine the requirements.

5. Complete the glossary.

6. Validate the model against TEM Principles.

We add step number 7, which states:

7. Optional: For optimization and learning frameworks, complete the model by adding the EPN and EPA optimization tables and, for each row in an EDT that corresponds to a GM or GML operator, its GM or GML operator table, respectively.
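Step 7 can be supported by a simple completeness check that derives the expected table names from the naming conventions of Section 5.2 (derived event name plus the suffix “EPA”, operator name plus the suffix “GM” or “GML”). The Python sketch below is a minimal illustration: the dictionary layout of an EDT, the key names pattern and operator_name, and the pairing of the DistributedCounter operator with the Long call at night EDT are assumptions made for the example.

```python
def expected_extension_tables(edts):
    """Return the names of the tables that step 7 asks the modeler to add."""
    expected = {"EPN Optimization"}
    for edt in edts:
        # One EPA Optimization table per EDT: derived event name + " EPA".
        expected.add(f"{edt['name']} EPA")
        for row in edt["rows"]:
            # Rows that use a GM or GML operator need their operator table.
            if row.get("pattern") in ("GM", "GML"):
                expected.add(f"{row['operator_name']} {row['pattern']}")
    return expected


def missing_tables(edts, defined_tables):
    """List the expected extension tables that have not been defined yet."""
    return sorted(expected_extension_tables(edts) - set(defined_tables))


# Illustrative model with one EDT whose first row uses a GM operator.
edts = [
    {
        "name": "Long call at night",
        "rows": [{"pattern": "GM", "operator_name": "DistributedCounter"}],
    }
]
defined = ["EPN Optimization", "Long call at night EPA"]

print(missing_tables(edts, defined))
```

Because the step is optional, such a check would only be applied to models whose designers choose to use the optimization and learning frameworks.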

5.4 Summary of TEM extensions for optimization and learning frameworks

In summary, we extended the generic TEM to cope with the requirements imposed by the optimization and learning frameworks in the FERARI project. To this end, we added four new tables to the model:

• The EPN optimization table

• The EPA optimization table

• The GM and GML operator tables

We showed how these tables can be translated to the JSON definitions file. Furthermore, we extended our methodology with an additional step dedicated to IT users interested in leveraging the optimization and learning features offered by FERARI.


6 Summary

The Event Model follows the Model Driven Engineering approach [3][4] and can be classified as a CIM (Computation Independent Model), providing independence from the physical data representation and omitting details that are obvious to the designer. This model can be directly translated to an execution model (PSM – Platform Specific Model in the Model Driven Architecture terminology) through an intermediate generic representation (PIM – Platform Independent Model). In deliverable D4.2 we focused on the CIM layer and presented the complete event model along with an accompanying methodology. The first part of this report complements the model driven approach and presents the two remaining layers, PIM and PSM. We show the translation of TEM (i.e., the CIM) into a corresponding EPN (PIM) that can then be represented as a JSON file and fed into our CEP engine PROTON for run-time execution (PSM). The result of this work is a complete model driven approach for event driven applications intended for business users. The approach is illustrated using the mobile fraud use case of the project.

In the second part of the report, we extend the model and the methodology to meet specific requirements that stem from the optimization and learning techniques developed in FERARI. The extensions follow TEM principles on the one hand, and can be decoupled from the core generic model on the other.

7 References

[1]. Etzion O. and Niblett P. 2010. Event Processing in Action. Manning Publications Company.

[2]. Von Halle B. and Goldberg L. 2010. The Decision Model. CRC Press.

[3]. Bodenstein C., Lohse F., and Zimmermann A. 2010. Executable specifications for model-based

development of automotive software. SMC 2010, 727-732.

[4]. Brambilla M., Cabot J., and Wimmer M. 2012. Model Driven Software Engineering in Practice. Morgan &

Claypool.