
This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 779899. It is the property of the SecureIoT consortium and shall not be distributed or reproduced without the formal approval of the SecureIoT Management Committee.

Project Acronym: SecureIoT

Grant Agreement number: 779899 (H2020-IoT03-2017 - RIA)

Project Full Title: Predictive Security for IoT Platforms and Networks of Smart Objects

DELIVERABLE D3.1 – Security Information Storage and Analytics Infrastructure

Deliverable Number: D3.1

Deliverable Name: Security Information Storage and Analytics Infrastructure

Dissemination level: Public

Type of Document: Report

Contractual date of delivery: 30/09/2018

Deliverable Leader: AIT

Status & version: Final-1.2

WP / Task responsible: WP3/Task T3.1 (AIT)

Keywords: IoT Security, SecureIoT infrastructure, Data collection, Data streaming, Big data analytics

Abstract (few lines): The document presents the infrastructure that will be used in SecureIoT for the planned trials. Different technologies for its various components are presented and arguments are given for the selected ones.

Deliverable Leader: Athens Information Technology (John Soldatos, Sofoklis Efremidis)

Contributors: John Soldatos (AIT), Sofoklis Efremidis (AIT), Daniel Calvo Alonso (ATOS)

Reviewers: Sofianna Menesidou (UBI), Mariza Konidi (INTRA)

Approved by: George Koutalieris (INTRA)


Project Title: SecureIoT Contract No. 779899 Project Coordinator: INTRASOFT International S.A.


D3.1 – Security Information Storage and Analytics Infrastructure

Version: v1.2 - Final, Date 29/09/2018

Executive Summary

This document presents the infrastructure that will be employed in SecureIoT. The infrastructure is aligned with the overall architecture of the project and comprises a set of interconnected components for collecting security data from the target IoT system, streaming the data for storage and processing, applying predictive IoT security analytics techniques for the timely detection of security issues in the target IoT system, and visualizing the collected data. The document presents the requirements for the parts of the infrastructure and the arguments for the selection of its components.


Document History

Version | Date | Contributor(s) | Description
0.10 | 04/6/2018 | John Soldatos (AIT) | Initial structure of the document for discussion during the Bilbao meeting
0.11 | 08/6/2018 | John Soldatos (AIT) | Revised/updated structure
0.12 | 11/6/2018 | John Soldatos (AIT) | Updates based on feedback from ATOS
0.13 | 13/7/2018 | Sofoklis Efremidis (AIT) | Updates on Chapters 2, 3, 4
0.15 | 13/7/2018 | Sofoklis Efremidis (AIT) | Information fine-tuning in Chapters 2 & 3
0.16 | 27/7/2018 | Sofoklis Efremidis (AIT) | Updates on Chapters 1 and 2
0.17 | 6/9/2018 | Sofoklis Efremidis, John Soldatos (AIT) | Alignment with D2.4, Chapter 2
0.18 | 20/9/2018 | Daniel Calvo (ATOS) | IoT application data modelling, subsection 3.1.4
0.20 | 21/9/2018 | Sofoklis Efremidis (AIT) | Updates to Chapter 4
0.21 | 22/9/2018 | Sofoklis Efremidis, John Soldatos (AIT) | Content harmonization
0.90 | 24/9/2018 | Sofoklis Efremidis (AIT), John Soldatos (AIT) | Document edits
0.91 | 25/9/2018 | Sofoklis Efremidis (AIT) | Updates to Chapter 4
1.00 | 26/9/2018 | Sofoklis Efremidis (AIT) | Document final edits
1.00 | 28/9/2018 | Sofianna Menesidou (UBI) | Reviewer’s comments received
1.00 | 28/9/2018 | Mariza Konidi (INTRA) | Reviewer’s comments received
1.10 | 28/9/2018 | Sofoklis Efremidis (AIT) | Edits incorporating reviewers’ comments
1.20 | 29/9/2018 | Sofoklis Efremidis (AIT) | Final document polishing for submission


Table of Contents

Executive Summary
Definitions, Acronyms and Abbreviations
1 Introduction
  1.1 Scope and Purpose
  1.2 Background and Vision
  1.3 Methodology
  1.4 Document Structure
2 SecureIoT Data Storage and Analytics Requirements
  2.1 The four Vs of SecureIoT BigData
    2.1.1 Data Characteristics
    2.1.2 Data Types
  2.2 Data streaming requirements
  2.3 Data analytics requirements
    2.3.1 Simple Analytics – Rule-Based
    2.3.2 Machine Learning
    2.3.3 Deep Learning
  2.4 Alignment to WP2 and the SecureIoT Architecture
3 Information Streaming and Storage Infrastructure
  3.1 SecureIoT Information Modelling
    3.1.1 IoT Assets Modelling
    3.1.2 Attack Modelling
    3.1.3 IoT Security Data Modelling
    3.1.4 IoT Application Data Modelling
    3.1.5 IoT Security Templates & Rulesets Modelling
  3.2 Data Collection Infrastructure
  3.3 Data Streaming Infrastructure
    3.3.1 Request Reply
    3.3.2 Publish Subscribe
    3.3.3 SecureIoT Streaming Infrastructure
  3.4 Data Storage Infrastructure
4 SecureIoT Analytics Infrastructure
  4.1 Data Analytics in SecureIoT
    4.1.1 Analytics Layers
    4.1.2 Data analytics requirements
  4.2 Data Analytics Framework in SecureIoT
    4.2.1 Apache Hadoop
    4.2.2 Apache Spark
    4.2.3 SecureIoT Data Analytics framework
5 Prototype Implementation and Demonstration
  5.1 Data collection
  5.2 Data storage
6 Conclusions
References

Table of Figures

Figure 1: Overview of the SECaaS paradigm.
Figure 2: Overview of SecureIoT architecture.
Figure 3: Architecture of the data collection and actuation layer.
Figure 4: Architecture of the security intelligence layer.
Figure 5: Overview of the SecureIoT infrastructure.
Figure 6: Asset model.
Figure 7: Attack model.
Figure 8: SecureIoT probes collecting and harmonizing application-data information from multiple IoT platforms.
Figure 9: UML class diagram for NGSI.
Figure 10: Logical view of OpenMTC connector for FIWARE Orion Context Broker.
Figure 11: NGSI-LD ontology applied in an example.
Figure 12: SSN/SOSA conceptual modules, classes and properties for observation perspective.
Figure 13: Overview of the request-reply architecture.
Figure 14: Overview of the request-reply architecture with multiple clients.
Figure 15: Overview of the publish-subscribe architecture.
Figure 16: Overview of the publish-subscribe architecture with multiple data producers and consumers.
Figure 17: Overview of broker architecture.
Figure 18: Kafka partitions and read-write operations.
Figure 19: Overview of RabbitMQ architecture.
Figure 20: Overview of the infrastructure setup.



Definitions, Acronyms and Abbreviations

Acronym Title

AMQP Advanced Message Queuing Protocol

API Application Programming Interface

BRMS Business Rule Management System

CRUD Create Read Update Delete

CVSS Common Vulnerability Scoring System

DAG Directed Acyclic Graph

ECU Electronic Control Unit

ETL Extract Transform Load

ESB Enterprise Service Bus

ETSI European Telecommunications Standards Institute

ETSI CIM ETSI Context Information Management

ETSI NGSI-LD ETSI Next Generation Service Interfaces-Linked Data

GDPR General Data Protection Regulation

HDFS Hadoop Distributed File System

IIoT Industrial Internet-of-Things

IoT Internet-of-Things

OEM Original Equipment Manufacturer

RDD Resilient Distributed Dataset

RDF Resource Description Framework

SECaaS Security as a Service

SOSA Sensor, Observation, Sample and Actuator

SSN Semantic Sensor Network

W3C World Wide Web Consortium


1 Introduction

1.1 Scope and Purpose

SecureIoT [1] plans to architect, implement, and demonstrate a standards-based, open, end-to-end security framework for securing cross-platform, dynamic, and decentralized IoT systems. The security framework will be aligned with international standards and initiatives and will support the development, integration, and deployment of cross-platform IoT services that may involve multiple dynamic, autonomous, and intelligent smart objects and devices. Security support will be provided as a set of add-on services (as opposed to built-in ones), following the Security as a Service (SECaaS) paradigm, namely services for risk assessment and mitigation, for the seamless development of secure IoT services, and for auditing and compliance. The SecureIoT framework targets application developers, platform providers, solution integrators, and IoT OEMs.

The SECaaS services will be heavily based on predictive IoT security functionalities: security-related data generated at different nodes of the target IoT services will be communicated to, and continuously analyzed by, an analytics engine that implements targeted machine learning algorithms and provides continuous and timely monitoring and alerting.

Security data are communicated through a scalable, flexible, and efficient communication infrastructure for further storage and analytics processing. The analytics engine makes use of security knowledge bases that contain templates and rules (relating to historical security aspects of IoT services); these are applied to the monitored data and raise alerts when security breaches of the target IoT service are suspected.
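The rule-matching step described above can be illustrated with a minimal Python sketch. It assumes that monitored data arrive as flat dictionaries and that each knowledge-base rule is simply a named condition; the `Rule` type, the `evaluate` function, the field names, and the thresholds are all hypothetical and are not taken from SecureIoT. Actual templates and rulesets would be considerably richer than a single boolean test.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]  # one monitored data item, e.g. counters reported by a probe


@dataclass
class Rule:
    """A knowledge-base entry: a name plus a condition over a record."""
    name: str
    matches: Callable[[Record], bool]


def evaluate(rules: List[Rule], record: Record) -> List[str]:
    """Return the names of all rules that fire (i.e. raise an alert) for the record."""
    return [rule.name for rule in rules if rule.matches(record)]


# Hypothetical rules and thresholds, chosen only for illustration.
RULES = [
    Rule("failed-logins", lambda r: r.get("failed_logins", 0) > 5),
    Rule("traffic-spike", lambda r: r.get("packets_per_s", 0) > 10_000),
]

alerts = evaluate(RULES, {"failed_logins": 7, "packets_per_s": 120})
print(alerts)  # ['failed-logins']
```

Keeping rules as data rather than hard-coded branches is what allows a knowledge base to be extended without changing the engine that applies it.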

SecureIoT will conduct three trials demonstrating the IoT security services that will be developed. This document presents the infrastructure that will be used in the course of the project for setting up the project trials, in particular its data collection, data transfer, data storage, and analytics parts.

1.2 Background and Vision

The vision of SecureIoT is to secure the next generation of dynamic, decentralized, multi-platform IoT systems, which may include intelligent and (semi-)autonomous objects or things. The project will realize this vision by providing a set of SECaaS services that will support the operation of target IoT systems or deployments based on predictive analytics. The envisaged services comprise:

(a) Risk assessment and mitigation, by applying well-established approaches like the NIST Common Vulnerability Scoring System (CVSS) for identifying risks and providing solutions for their mitigation.


(b) Compliance auditing and recommendations, by providing tools that support security and privacy controls at various levels of the IoT deployment, including controls that pertain to the enforcement of regulations and directives such as the GDPR, the NIS Directive, and the ePrivacy Directive.

(c) Developers’ support, through a set of programming-language-level annotations and their mappings to runtime functionalities for monitoring and policy enforcement.

The runtime support functionalities of SecureIoT comprise the collection and filtering of security-related data, as well as the use of predictive analytics techniques for proactively and promptly identifying security issues, such as attacks and incidents, in the target IoT deployment.

1.3 Methodology

The specification of the SecureIoT infrastructure follows the overall project architecture and the requirements that have to be met for securing the target IoT services. The following steps were followed to define the infrastructure that will be put in place.

First, the nature and the properties of the collected security data are identified. Security-related data in SecureIoT are collected from different levels of the IoT service, namely the devices, edge, cloud, and application. The SecureIoT security framework supports cross-platform deployments of IoT services, so the collected security data may originate from any of the platforms on which the target IoT service is deployed. In practice, security data are generated by probes that are associated with select nodes of the target IoT service.

Second, once the properties of the security data are identified, their schema is defined. The schema is generic enough to accommodate the diverse types of data that may be generated by a number of sources and may take a number of forms.

Third, the infrastructure for the collection of the security data is specified. Data collection has to be efficient and flexible. It is accomplished through probes deployed at select nodes of the IoT system, which collect the data of interest from those nodes and communicate them to the analytics modules.

Fourth, the transfer and storage of the collected security data are specified. The transfer of large amounts of data has to be efficient and flexible and must impose the least possible overhead on the operation of the target IoT service. Data storage must also be efficient and flexible and must allow the querying and processing of data characterized by large volumes and a wide diversity of types.

To specify the infrastructure that will be put in place in the context of SecureIoT, some key requirements related to its performance, availability, and reliability are stated. Key candidate platforms are then presented, and a justification is given for the one that will be used as part of the SecureIoT infrastructure.


1.4 Document Structure

The document is structured as follows. Chapter 2 presents the data storage and analytics requirements for the platform. The chapter first lists the characteristics of the security data that are collected from the target IoT system, i.e., their high volume, speed, diversity, and quality. It then gives a generic model that intends to encompass all the different types of collected data. Based on the identified properties of the security data, the chapter lists the requirements that the infrastructure has to meet regarding their collection, streaming, storage, and processing. Finally, the architecture of the infrastructure that will be used is presented and its alignment with the overall SecureIoT architecture is shown.

Chapter 3 presents the information streaming and storage infrastructure, as well as the information collection components of the infrastructure. Different technologies are presented, and arguments are given for the selection of those that meet the project’s requirements and will be used for its trials.

Chapter 4 presents the analytics infrastructure. As in Chapter 3, the analytics technologies that are candidates for the project’s infrastructure are presented, along with arguments for the one that will be used in the project’s trials.

Chapter 5 presents a prototype implementation of the infrastructure that has been put in place. The overall setup of the infrastructure is presented along with its different components and their configurations. Sample runs with artificially generated data and corresponding screenshots are also presented in that chapter. The infrastructure will be used for running the project trials and will be updated according to their needs.

Finally, Chapter 6 concludes the document.


2 SecureIoT Data Storage and Analytics Requirements

2.1 The four Vs of SecureIoT BigData

SecureIoT aspires to provide a set of security services for target IoT systems based on security analytics. These services will operate on the security data collected by probes deployed along the target IoT system at different layers of the supporting IoT platform (i.e., the network, device, edge, cloud, and application layers), as well as across the different IoT platforms that may support the target IoT system and the applications running over it.

The security-oriented data collected by the SecureIoT services have characteristics similar to those of Big Data [2] encountered in other contexts, in particular:

Volume: SecureIoT services make use of large volumes of security data. These may be either historical security data, used to train and fine-tune the analytics algorithms, or security data collected from the different architectural layers of the target IoT system during its operation. The latter are used to flag abnormal behavior of the target IoT system so that actions can be taken to safeguard its security.

Variety: Security data that are collected by the SecureIoT services come from a number of feeds that are deployed along with the target IoT system. Different types of IoT devices generate different types of security data. Moreover, different IoT architectural layers (network, field, edge, cloud, application) generate different types of security data. Finally, different IoT deployments also generate different types of security data.

Velocity: Security data are generated at different rates by the deployed data probes. Both streamed and non-streamed data are used by the analytics algorithms. Moreover, streamed data are collected and transferred at varying rates, depending on the required accuracy of the monitoring and reaction processes.

Veracity: The quality and trustworthiness of the collected security data are a prerequisite for the quality of the results produced by the analytics algorithms. Security data collected from IoT deployments in the context of SecureIoT will be filtered before they are processed, so as to guarantee the quality level required by the security services to be provided.
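The filtering step mentioned under Veracity could, for example, take the form of the following Python sketch. The document does not define the filtering criteria; the three used here (required fields present, timestamps not implausibly far in the future, exact duplicates removed) are illustrative assumptions, and `filter_records`, `REQUIRED_FIELDS`, and `MAX_CLOCK_SKEW` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List

REQUIRED_FIELDS = ("sourceID", "timestamp")  # hypothetical minimum field set
MAX_CLOCK_SKEW = timedelta(seconds=30)       # hypothetical tolerance for probe clocks


def filter_records(records: Iterable[Dict[str, Any]],
                   now: datetime) -> List[Dict[str, Any]]:
    """Keep records that are complete, not from the future, and not repeated."""
    seen = set()
    kept = []
    for rec in records:
        # Drop records that lack any required field.
        if any(rec.get(name) is None for name in REQUIRED_FIELDS):
            continue
        # Drop records stamped implausibly far ahead of the collector's clock.
        if rec["timestamp"] > now + MAX_CLOCK_SKEW:
            continue
        # Drop exact repeats of (source, timestamp), e.g. caused by probe retries.
        key = (rec["sourceID"], rec["timestamp"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


now = datetime(2018, 9, 29, 12, 0, tzinfo=timezone.utc)
batch = [
    {"sourceID": "probe-1", "timestamp": now},
    {"sourceID": "probe-1", "timestamp": now},                       # repeat
    {"sourceID": None, "timestamp": now},                            # incomplete
    {"sourceID": "probe-2", "timestamp": now + timedelta(hours=1)},  # future
]
clean = filter_records(batch, now)
```

Only the first record survives here. In a real deployment the thresholds would presumably be tuned per trial and per probe type.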

2.1.1 Data Characteristics

SecureIoT services depend on data collected from a number of probes that are deployed along a number of architectural layers of the target IoT system. In particular, security data are collected from the network, the field devices, the edge devices, the cloud servers, and the IoT application itself, and are fed to the analytics module that realizes the runtime services of SecureIoT. The analytics module makes use of data analytics algorithms, which are trained and tuned based on large amounts of historic security data.

Characteristics of the collected security data that are used for implementing the envisaged security services are similar to those of big data in other contexts and include:

Large volumes: SecureIoT services make use of large volumes of both streamed and non-streamed data, the former coming from the target IoT deployment and the latter being used for training the analytics algorithms. Security data are collected from a large number of elements of the target IoT system, including devices, edge servers, the cloud, and the IoT application itself. The precision and quality of the results produced by the analytics algorithms depend on the volume of processed data, with larger volumes allowing for better-trained algorithms and more precise alarms. Consequently, the infrastructure that supports the IoT security services must be able to handle large volumes of collected data.

Large variety: The security analytics modules are fed with data coming from a variety of probes deployed along the target IoT system. Different probes generate different types of data with different formats, resulting in a large variety of generated data that must be transferred and fed to the security analytics modules. Moreover, for the SecureIoT services to be widely applicable, they must scale and be adaptable to new types of IoT devices, which implies that they should handle new types of generated data seamlessly.

Large speeds: SecureIoT services depend on data generated in real time in order to provide alerts in real-time incident identification scenarios. Streaming data are generated continuously and at an adjustable rate. These data are streamed to the analytics modules through well-established platforms like Apache Kafka.

Veracity: SecureIoT data are generated by probes along the IoT deployment. The generated data are typically an exact representation of the current security state of the various IoT components. The SecureIoT services make use of technologies that guarantee the integrity of the generated data and, as a consequence, their quality when they reach the analytics modules.

As a consequence, the infrastructure that will support the security services of SecureIoT should be able to handle the collection, transfer, storage, and processing of the security data collected from the target IoT system efficiently, flexibly, and reliably.
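Before records can be streamed through a platform such as Apache Kafka, as mentioned under Large speeds, they have to be serialized into bytes. The following Python sketch assumes JSON encoding and uses the record's sourceID as the message key; both are assumptions made for illustration, and `to_message` and `from_message` are hypothetical helpers. Keying by source is a common choice because Kafka's default partitioner sends messages with the same key to the same partition, which preserves per-probe ordering.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def to_message(record: Dict[str, Any]) -> Tuple[bytes, bytes]:
    """Serialize a security record into a (key, value) pair of bytes."""
    payload = dict(record)  # shallow copy so the caller's record is left untouched
    # JSON has no datetime type, so timestamps travel as ISO 8601 strings.
    if isinstance(payload.get("timestamp"), datetime):
        payload["timestamp"] = payload["timestamp"].isoformat()
    key = str(payload["sourceID"]).encode("utf-8")
    value = json.dumps(payload, sort_keys=True).encode("utf-8")
    return key, value


def from_message(value: bytes) -> Dict[str, Any]:
    """Decode a message value produced by to_message, restoring the timestamp."""
    payload = json.loads(value.decode("utf-8"))
    if "timestamp" in payload:
        payload["timestamp"] = datetime.fromisoformat(payload["timestamp"])
    return payload


rec = {"sourceID": "edge-gw-3", "typeOfData": "perf",
       "timestamp": datetime(2018, 9, 29, 12, 0, tzinfo=timezone.utc),
       "data": {"cpu": 0.42}}
key, value = to_message(rec)
```

With a Kafka client library, the returned key and value would be handed to the producer's send call; that call is left out so the sketch runs without a broker.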

2.1.2 Data Types

SecureIoT services are based on security data that are collected from a number of different components of IoT deployments. In particular, field devices, edge devices, and cloud-based servers generate different types of security data that are subsequently fed into the security analytics modules that implement the SecureIoT services. Moreover, application-specific security data are generated by the IoT application components.

One approach is to define a separate specification for each of the many types of data that components of an IoT deployment may generate. Such an approach does not scale well: every new IoT component introduces new data specifications, and the interfaces through which these data are communicated need to be extended and adapted to the new data types. Alternatively, a generic, normalized data representation may be defined along with mappings to and from the different native representations. The advantage of this approach is that data-type specifics are confined to the points where data are produced or consumed. Following the latter approach, a generic data model (or type) for security-related data may be defined as follows:

type securityData {
    String sourceID;
    enum {perf, status, usage, alert, …} typeOfData;
    DateTime timestamp;
    HashMap<String, Object> properties;
    HashMap<String, Object> data;
    String comments;
    String reserved;
}
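A minimal Java rendering of the abstract type above might look as follows. It is a sketch under stated assumptions: the class and enum names are ours, `DateTime` is mapped to `java.time.Instant`, and only the four data types listed in the model are declared (the further types elided in the model are not filled in). The helper `property` is hypothetical and simply reads a source property with a fallback.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch of the generic securityData type; names are illustrative assumptions.
class SecurityData {
    // Only the values listed in the model; further types are elided there.
    enum TypeOfData { PERF, STATUS, USAGE, ALERT }

    final String sourceID;
    final TypeOfData typeOfData;
    final Instant timestamp;
    // Attributes of the source entity and infrastructure hints (e.g. "topic").
    final Map<String, Object> properties = new HashMap<>();
    // The security payload itself, as free-form key/value pairs.
    final Map<String, Object> data = new HashMap<>();
    String comments = "";
    String reserved = "";

    SecurityData(String sourceID, TypeOfData typeOfData, Instant timestamp) {
        this.sourceID = Objects.requireNonNull(sourceID);
        this.typeOfData = Objects.requireNonNull(typeOfData);
        this.timestamp = Objects.requireNonNull(timestamp);
    }

    // Returns a source property, or the fallback when it is absent.
    Object property(String key, Object fallback) {
        return properties.getOrDefault(key, fallback);
    }
}
```

For example, a record from a hypothetical probe `probe-42` could carry `properties.put("topic", "alerts")` so that the streaming layer knows where to publish it, while the `data` map holds the actual measurements.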

This generic data model is intended to encompass a wide variety of security-related data types in the context of SecureIoT. Specializations of the model may be defined for specific purposes or project trials. Most fields of this abstract data model are self-explanatory. The properties field captures properties or attributes of the data source entity, for example, whether a device is mobile and what its current location is. Infrastructure-specific properties may also be included in this field. For example, as explained in later chapters, Apache Kafka will be used in the project as the data streaming component of the infrastructure. Kafka uses the concept of a topic to partition data into thematic queues; in this case the pair <"topic", "topic name"> will appear in the properties field. This approach makes the data model independent of the infrastructure: if, for example, Apache Kafka is replaced at a later stage by another streaming engine, the data model will remain valid.

2.2 Data streaming requirements

Collected security data need to be streamed to the persistence and analytics modules of the SecureIoT platform. These data carry information that is directly related to the security of the target IoT system, for example, unauthorized attempts at remote access, unauthorized attempts at software updates, deviating behavior of system components and so on. Based on these data and the applicable rules, the analytics modules will raise alarms for the early detection of security-related issues in the target IoT system. As early detection of such issues is a core and highly important functionality of the SecureIoT services, the streaming of these data, and consequently the components of the SecureIoT infrastructure that implement it, have to meet some stringent requirements.

High volume: SecureIoT data streaming has to be able to cope with high volumes of data. Generated security data come from different levels of the target IoT system: the IoT application itself as well as the IoT platforms on which the application is deployed.

Fast transfer: security data have to reach their destination as fast as possible to allow early detection of security issues in the target IoT system. Combined with the high-volume requirement, this implies that the SecureIoT data streaming component has to support very high throughput.

Reliable communication: security data have to be delivered reliably to their destination, which implies that no such data may be lost in transit. Therefore, the data streaming component has to guarantee at-least-once semantics, meaning that each datum will be delivered at least once, even in the presence of failures.

High availability: the services provided by the streaming component have to be highly available, to guarantee that downtime is minimal and security data are always delivered to their destination.
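At-least-once delivery is usually obtained by resending a record until an acknowledgement arrives, which can produce duplicates when an acknowledgement (rather than the record) is lost; consumers then deduplicate, for example by record identifier. The sketch below simulates this with a fixed, reproducible pattern of lost acknowledgements. It uses no Kafka API and all names are hypothetical; in Kafka itself, producer settings such as `acks=all` and `enable.idempotence=true` address the same concern.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simulation of at-least-once delivery with consumer-side deduplication.
// No Kafka API is used; all names are hypothetical.
class AtLeastOnceDemo {
    // Consumer that may receive duplicates but processes each record id once.
    static class DedupConsumer {
        final Set<String> seenIds = new HashSet<>();
        final List<String> processed = new ArrayList<>();
        int deliveries = 0;

        void receive(String id, String payload) {
            deliveries++;
            if (seenIds.add(id)) { // add() returns false for an id already seen
                processed.add(payload);
            }
        }
    }

    // Producer loop: resend each record until it is acknowledged.
    // ackLost[k] == true simulates losing the acknowledgement of attempt k,
    // which makes the producer resend a record the consumer already has.
    static DedupConsumer sendAll(List<String> ids, List<String> payloads, boolean[] ackLost) {
        DedupConsumer consumer = new DedupConsumer();
        int attempt = 0;
        for (int i = 0; i < ids.size(); i++) {
            boolean acknowledged = false;
            while (!acknowledged) {
                consumer.receive(ids.get(i), payloads.get(i));
                boolean lost = attempt < ackLost.length && ackLost[attempt];
                attempt++;
                acknowledged = !lost;
            }
        }
        return consumer;
    }
}
```

With three records and the second record's acknowledgement lost twice, the consumer sees five deliveries but processes each record exactly once, which is the behaviour the reliability requirement asks for.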

2.3 Data analytics requirements

The security analytics of the SecureIoT services make use of complementary technologies to flag security breaches of the target IoT system. Both rule-based and machine learning analytics techniques are used for this purpose, as described in the following subsections.

2.3.1 Simple Analytics – Rule-Based

Rule-based decision making is the simplest form of analytics. It relies on static rules that specify what action is to be taken when some conditions are met on a stream of inputs. The rules have the form

rule: condition → action

where condition is a predicate on some input values. Conditions may also be time dependent, so that values that appeared in the past can be expressed through appropriate predicates. In addition, predicates on sequences of values may be specified, for example, a set of constantly increasing values for a certain duration.
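The `condition → action` form, including a sequence predicate such as "constantly increasing values for a certain duration", can be sketched with plain Java functional interfaces. This is a naive evaluator that checks every rule on every input, not the Rete algorithm that production rule engines commonly use; the rule names, the history length, the threshold, and the choice of "record the rule name" as the action are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Naive rule evaluator over a sliding history of numeric readings (illustrative only).
class SimpleRuleEngine {
    // A rule pairs a condition over the recent history with a named action.
    static class Rule {
        final String name;
        final Predicate<List<Double>> condition;
        Rule(String name, Predicate<List<Double>> condition) {
            this.name = name;
            this.condition = condition;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final Deque<Double> history = new ArrayDeque<>();
    private final int historySize;
    final List<String> fired = new ArrayList<>(); // the "action": record which rules fired

    SimpleRuleEngine(int historySize) { this.historySize = historySize; }

    void add(Rule rule) { rules.add(rule); }

    // Feeds one input value; every rule whose condition holds fires.
    void onInput(double value) {
        history.addLast(value);
        if (history.size() > historySize) history.removeFirst();
        List<Double> window = new ArrayList<>(history);
        for (Rule r : rules) {
            if (r.condition.test(window)) fired.add(r.name);
        }
    }

    // Sequence predicate: the last n values are strictly increasing.
    static Predicate<List<Double>> increasingFor(int n) {
        return w -> {
            if (w.size() < n) return false;
            for (int i = w.size() - n + 1; i < w.size(); i++) {
                if (w.get(i) <= w.get(i - 1)) return false;
            }
            return true;
        };
    }

    // Simple predicate on the latest value only.
    static Predicate<List<Double>> latestAbove(double threshold) {
        return w -> !w.isEmpty() && w.get(w.size() - 1) > threshold;
    }
}
```

Feeding the CPU readings 50, 60, 70, 95, 40 to an engine with a `HighCpu` rule (latest value above 90) and a `RisingCpu` rule (three increasing values) fires `RisingCpu` at 70, then both rules at 95, and nothing at 40.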

A rule engine monitors a stream of inputs coming from a variety of sources, such as sensors, devices and probes, and determines when conditions of the specified ruleset are met. When a condition is met, the corresponding rule fires and its action is performed. The most widely used algorithm in rule engines is the Rete algorithm. Systems that implement such functionality are called Business Rule Management Systems (BRMS). Network intrusion detection systems are typically rule and signature based and operate on the traffic flowing at the network boundary. Such detection is part of the functionality provided by the SecureIoT SECaaS services.

It follows that, in the context of SecureIoT, the analytics engine should provide rule-driven and signature-based functionality. For a given ruleset, the engine should be able to monitor the data stream coming from the target IoT system, including network data, apply the rules, and fire those whose precondition is satisfied or whose signatures are matched.
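Signature matching, in its simplest form, means searching observed traffic for known byte patterns. The sketch below checks a payload against a small set of hypothetical signatures with a naive substring search; real network intrusion detection systems use much richer rule languages and multi-pattern algorithms such as Aho-Corasick.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal signature matcher over raw payload bytes (illustrative only).
class SignatureMatcher {
    private final Map<String, byte[]> signatures = new LinkedHashMap<>();

    // Registers a named signature given as a byte pattern.
    void addSignature(String name, byte[] pattern) { signatures.put(name, pattern); }

    // Returns the names of all signatures found anywhere in the payload.
    List<String> match(byte[] payload) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : signatures.entrySet()) {
            if (indexOf(payload, e.getValue()) >= 0) hits.add(e.getKey());
        }
        return hits;
    }

    // Naive O(n*m) search for pattern inside data; returns -1 if absent.
    static int indexOf(byte[] data, byte[] pattern) {
        if (pattern.length == 0) return 0;
        outer:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    static byte[] ascii(String s) { return s.getBytes(StandardCharsets.US_ASCII); }
}
```

A signature such as a default-credential string would then fire on any packet payload that contains it, independently of where in the payload it appears.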

2.3.2 Machine Learning

Machine learning is an approach to automated learning that is based on statistical techniques applied to data. It provides a sophisticated approach to decision making, since data models, patterns, and rules are automatically extracted from already seen data and subsequently applied to new feeds of generated data. Two broad categories of machine learning algorithms exist: (a) supervised learning, in which the given input and output data have already been labelled and the objective is to discover the rules that map inputs to outputs, and (b) unsupervised learning, in which the data are not labelled and the objective is to discover rules and patterns in the given data.

Careful observation of patterns that appear in the dataset under processing may result in the extraction of rules, formulated according to a set of metrics such as the maximum desired length of a rule and the confidence and support it has in the given dataset. Several rule extraction algorithms have been described in the literature. Rule extraction is a computationally intensive process, and for large datasets efficient parallelization of the algorithms becomes mandatory.
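Support and confidence have standard definitions for a rule X → Y over N records: support(X ∪ Y) is the fraction of records containing all items of both X and Y, and confidence(X → Y) = support(X ∪ Y) / support(X). The sketch below computes both over sets of security events; the event names are hypothetical, and this is only the scoring step, not a complete rule-mining algorithm such as Apriori.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Scoring of candidate association rules X -> Y over event records.
class RuleMetrics {
    // Fraction of records that contain every item of the given set.
    static double support(List<Set<String>> records, Set<String> items) {
        if (records.isEmpty()) return 0.0;
        long count = records.stream().filter(r -> r.containsAll(items)).count();
        return (double) count / records.size();
    }

    // confidence(X -> Y) = support(X ∪ Y) / support(X); 0 when X never occurs.
    static double confidence(List<Set<String>> records, Set<String> x, Set<String> y) {
        double supportX = support(records, x);
        if (supportX == 0.0) return 0.0;
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return support(records, union) / supportX;
    }
}
```

For four records in which `failedLogin` occurs three times and together with `portScan` twice, the rule {failedLogin} → {portScan} has support 0.5 and confidence 2/3; a miner would keep it only if both values exceed the chosen thresholds.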

In the context of SecureIoT, the analytics engine should be able to support machine learning algorithms for extracting rules from large datasets and later applying the detected rules to security data coming from the target IoT system. Such algorithms (e.g., [3]) typically require large amounts of computing power; they are parallelizable and are implemented on clusters.
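Support counting is the expensive part of rule extraction and is naturally data-parallel: each part of the dataset can be scanned independently and the partial counts summed. On a single machine, Java parallel streams give a minimal sketch of this map-and-reduce pattern; on a cluster, a distributed processing framework would play the same role. The candidate item sets and event names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Parallel support counting for candidate item sets (single-machine sketch).
class ParallelSupportCount {
    static Map<Set<String>, Long> count(List<Set<String>> records, List<Set<String>> candidates) {
        // Map step (in parallel): emit each candidate contained in a record.
        // Reduce step: group equal candidates and count their occurrences.
        ConcurrentMap<Set<String>, Long> partial = records.parallelStream()
                .flatMap(r -> candidates.stream().filter(r::containsAll))
                .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
        // Report zero for candidates that never occur.
        Map<Set<String>, Long> result = new HashMap<>();
        for (Set<String> c : candidates) {
            result.put(c, partial.getOrDefault(c, 0L));
        }
        return result;
    }
}
```

The result is independent of how the records are split across threads, which is what allows the same counting step to be distributed over a cluster.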

2.3.3 Deep Learning

Deep learning is a special case of machine learning that uses a stack of layers of nonlinear processing elements for feature extraction and transformation of the input data. The layers are arranged so that each layer receives as input the output of the previous layer. The learning process itself can be either supervised or unsupervised. Deep learning approaches include deep neural networks, deep belief networks and recurrent neural networks. Deep learning algorithms for detecting security events are still at the research stage; [4] presents an approach that uses deep learning techniques to learn unknown intrusions into networks.
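The layered arrangement, in which each layer consumes the previous layer's output through a nonlinear function, can be illustrated with a tiny forward pass. The weights below are arbitrary illustrative numbers rather than a trained intrusion detector, ReLU is assumed as the nonlinearity, and training (for example by backpropagation) is omitted.

```java
// Forward pass through stacked dense layers with ReLU (illustrative weights only).
class TinyFeedForward {
    // One dense layer: out[j] = relu(b[j] + sum_i w[j][i] * in[i]).
    static double[] layer(double[] in, double[][] w, double[] b) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double sum = b[j];
            for (int i = 0; i < in.length; i++) sum += w[j][i] * in[i];
            out[j] = Math.max(0.0, sum); // ReLU nonlinearity
        }
        return out;
    }

    // Stacks the layers so that each one receives the previous layer's output.
    static double[] forward(double[] input, double[][][] weights, double[][] biases) {
        double[] activation = input;
        for (int k = 0; k < weights.length; k++) {
            activation = layer(activation, weights[k], biases[k]);
        }
        return activation;
    }
}
```

With input (1, 2), a first layer with rows (1, 0), (0, 1), (1, -1) produces (1, 2, 0) because ReLU clips the negative value, and a second layer summing these with bias -0.5 outputs 2.5.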

2.4 Alignment to WP2 and the SecureIoT Architecture

This section briefly presents the SecureIoT architecture to which the infrastructure supporting the SECaaS services is aligned. The SECaaS services are offered to owners or operators of IoT systems and platforms, in line with the paradigm shown in Figure 1. The SecureIoT platform collects security data from the target IoT system, which may be deployed on a number of IoT platforms. The SecureIoT platform in turn provides a number of services, such as Risk Assessment, Compliance and Auditing, and Developer Support, to the IoT system operators, deployers, and developers. Moreover, the platform provides runtime support during the operation of the target IoT system by monitoring the collected security data, generating alerts when security issues are detected in the target IoT system, and generating visualizations of them.

Figure 1: Overview of the SECaaS paradigm.

The SecureIoT architecture comprises a set of layers that communicate through well-defined interfaces. The layers, as detailed in [5] and shown in Figure 2, are as follows:

IoT systems layer, which comprises the target IoT system.

Data collection and actuation layer, which is responsible for interacting with the components of the target IoT system to collect data and configure its elements.

Security intelligence layer, which is responsible for analysing the collected data by employing data analytics and machine learning techniques and for detecting security-related issues.

Security services (SECaaS), namely Risk Assessment, Compliance Auditing, and Developer Support.

Security use cases, which apply the security services to three representative scenarios.

Figure 2 shows an overview of the SecureIoT architecture.


Figure 2: Overview of SecureIoT architecture.

The architecture of the data collection and actuation layer is shown in Figure 3. Details of the components of the layer are presented in [5].


Figure 3: Architecture of the data collection and actuation layer.

Figure 4 shows the architecture of the security intelligence layer. Details of the components of the layer are presented in [5].


Figure 4: Architecture of the security intelligence layer.

The realization of the SecureIoT architecture will be supported by an infrastructure that will be put in place and used in the context of the project's trials. The main components of the infrastructure concern:

Collection of security data from selected nodes of the target IoT deployment, the IoT platforms that support the IoT application, and the IoT application itself.

Streaming of security data from their source to the analytics engine.

Processing and storage of the collected security data using data analytics techniques for identifying and flagging any security issues in the target IoT system and application, as well as visualizing them.
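The three concerns listed above (collection, streaming, processing) can be pictured as a producer/consumer pipeline. In the minimal in-process sketch below, a `BlockingQueue` merely stands in for the streaming layer (Apache Kafka in SecureIoT), a probe thread stands in for data collection, and a trivial alert counter stands in for the analytics; the record format, the poison-pill shutdown and all names are illustrative assumptions.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// In-process stand-in for collection -> streaming -> processing (illustrative only).
class MiniPipeline {
    private static final String END = "__END__"; // poison pill that stops the consumer

    // Returns the number of records the analytics step classified as alerts.
    static int run(List<String> probeRecords) {
        BlockingQueue<String> stream = new LinkedBlockingQueue<>();
        AtomicInteger alerts = new AtomicInteger();

        Thread probe = new Thread(() -> {        // data collection
            for (String record : probeRecords) stream.add(record);
            stream.add(END);
        });
        Thread analytics = new Thread(() -> {    // data processing
            try {
                for (String r = stream.take(); !END.equals(r); r = stream.take()) {
                    if (r.startsWith("alert:")) alerts.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        probe.start();
        analytics.start();
        try {
            probe.join();
            analytics.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return alerts.get();
    }
}
```

Decoupling the producer from the consumer through the queue is the same design choice that a streaming platform makes at scale: probes and analytics can run at different rates without blocking each other.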

The overall architecture of the infrastructure is shown in Figure 5. The infrastructure is compatible with the SecureIoT architecture shown in Figure 2, and Figure 5 shows its data collection, data streaming and data processing components. In addition, Figure 5 shows the technologies that will be used in the context of the project. These technologies are presented in the following sections, where their selection is also justified.


Figure 5: Overview of the SecureIoT infrastructure.

(Figure 5 labels: Data Collection (IoT probes), Data Streaming, Data Storage, Data Analytics; technologies: Elastic Beats, Apache Kafka, Apache Spark, Elastic Search.)


3 Information Streaming and Storage Infrastructure

3.1 SecureIoT Information Modelling

3.1.1 IoT Assets Modelling

An asset is defined as a physical or logical object owned by or under the custodial duties of an organization, having either a perceived or actual value to the organization [6]. Assets can be either material or immaterial, and include:

Physical objects

Software

Documents

Intellectual property (licenses, patents)

Humans

Services

SecureIoT assets and their relationships are modeled as a graph in which their properties are also represented. Each asset is modeled as a node of the graph and each relationship as an edge. Nodes can have a number of labels, while edges can have a single label. Labels are used to narrow searches and navigation through the graph. Moreover, both graph nodes and edges can have multiple properties, which are represented as [key, value] pairs. Graph databases such as neo4j may be used for implementing the asset models of a target IoT deployment and navigating through them.

Figure 6 shows a graph model of the assets of a hypothetical company. It models a plant that contains a bolting robot, which in turn comprises two sensors: a proximity sensor and a torque sensor. The model also shows the relations between the various assets.


Figure 6: Asset model.
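A minimal in-memory property graph following the rules above (several labels per node, one label per edge, key/value properties on both) is sketched below. The example data reuse names from Figure 6 (company BigC, plant ManA, robot RobA, sensors Sens1 and Sens2), but the edge endpoints and directions are our reading of the figure, and the classes are hypothetical rather than the neo4j API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory property graph; not the neo4j API.
class AssetGraph {
    static class Node {
        final String name;
        final Set<String> labels;              // several labels per node
        final Map<String, Object> properties;  // [key, value] pairs
        Node(String name, Set<String> labels, Map<String, Object> properties) {
            this.name = name; this.labels = labels; this.properties = properties;
        }
    }

    static class Edge {
        final Node from, to;
        final String label;                    // exactly one label per edge
        final Map<String, Object> properties;
        Edge(Node from, String label, Node to, Map<String, Object> properties) {
            this.from = from; this.label = label; this.to = to; this.properties = properties;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Node addNode(String name, Set<String> labels, Map<String, Object> props) {
        Node n = new Node(name, labels, props);
        nodes.add(n);
        return n;
    }

    void addEdge(Node from, String label, Node to, Map<String, Object> props) {
        edges.add(new Edge(from, label, to, props));
    }

    Node find(String name) {
        return nodes.stream().filter(n -> n.name.equals(name)).findFirst().orElseThrow();
    }

    // Names of nodes reached via edges labelled edgeLabel whose target carries targetLabel.
    List<String> follow(Node from, String edgeLabel, String targetLabel) {
        return edges.stream()
                .filter(e -> e.from == from && e.label.equals(edgeLabel))
                .map(e -> e.to)
                .filter(n -> n.labels.contains(targetLabel))
                .map(n -> n.name)
                .collect(Collectors.toList());
    }

    // Rough reading of Figure 6; edge endpoints and directions are assumptions.
    static AssetGraph figure6() {
        AssetGraph g = new AssetGraph();
        Node company = g.addNode("BigC", Set.of("Company"), Map.of("Indus", "Automotive"));
        Node plant = g.addNode("ManA", Set.of("Plant"), Map.of("Addr", "Sunville", "Size", "1000sqm"));
        Node robot = g.addNode("RobA", Set.of("Robot"), Map.of("Manufr", "Kuka", "Action", "Bolting"));
        Node sens1 = g.addNode("Sens1", Set.of("Sensor"), Map.of("Mnfr", "Mouser", "Action", "Proximity"));
        Node sens2 = g.addNode("Sens2", Set.of("Sensor"), Map.of("Mnfr", "Kistler", "Action", "Torque"));
        g.addEdge(company, "owns", plant, Map.of("Since", "1/1/2018"));
        g.addEdge(plant, "contains", robot, Map.of("Since", "1/1/2010"));
        g.addEdge(robot, "SetMinProximity", sens1, Map.of("Setting", "2mm", "Reading", "1mm"));
        g.addEdge(robot, "SetMaxTorque", sens2, Map.of("Setting", "20Nm", "Reading", "18Nm"));
        return g;
    }
}
```

A query such as "which robots does plant ManA contain?" becomes `follow(find("ManA"), "contains", "Robot")`; the edge label narrows the navigation and the node label filters the result, which is the role labels play in graph databases.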

3.1.2 Attack Modelling

[6] lists some of the most widely known attacks on IoT-based systems. They include:

Wired and wireless scanning and mapping attacks

Protocol attacks

Eavesdropping attacks (loss of confidentiality)

Cryptographic algorithm and key management attacks

Spoofing and masquerading (authentication attacks)

Operating system and application integrity attacks

Denial of service and jamming

Physical security attacks (for example, tampering, interface exposures)

Access control attacks (privilege escalation)

As noted by the authors, most of these attacks are customised to a particular IoT system vulnerability. A list like the one above cannot be considered final, as new attack types are expected to appear and be put into use in the future.

Beyond identifying attacks, standard quantifications of their impact have been specified. For instance, the Common Vulnerability Scoring System (CVSS) [7] provides an open framework for communicating the characteristics and impacts of IT vulnerabilities by defining metrics, thus allowing an accurate estimate of their impact.
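To make the idea of metric-based impact estimates concrete, the sketch below computes a CVSS base score. It assumes the CVSS v3.1 base-score equations and metric weights (the CVSS version cited as [7] may differ in details such as rounding), covers only the base metric group, and takes each metric value as its single-letter code.

```java
// CVSS v3.1 base score (base metric group only), assuming the v3.1 equations.
class CvssBase {
    static double attackVector(char v) {
        switch (v) {
            case 'N': return 0.85;
            case 'A': return 0.62;
            case 'L': return 0.55;
            case 'P': return 0.20;
            default: throw new IllegalArgumentException("AV:" + v);
        }
    }

    static double attackComplexity(char v) {
        if (v == 'L') return 0.77;
        if (v == 'H') return 0.44;
        throw new IllegalArgumentException("AC:" + v);
    }

    // The Privileges Required weight depends on whether the scope changes.
    static double privilegesRequired(char v, boolean scopeChanged) {
        switch (v) {
            case 'N': return 0.85;
            case 'L': return scopeChanged ? 0.68 : 0.62;
            case 'H': return scopeChanged ? 0.50 : 0.27;
            default: throw new IllegalArgumentException("PR:" + v);
        }
    }

    static double userInteraction(char v) {
        if (v == 'N') return 0.85;
        if (v == 'R') return 0.62;
        throw new IllegalArgumentException("UI:" + v);
    }

    // Confidentiality, Integrity and Availability impact weights.
    static double cia(char v) {
        switch (v) {
            case 'H': return 0.56;
            case 'L': return 0.22;
            case 'N': return 0.0;
            default: throw new IllegalArgumentException("CIA:" + v);
        }
    }

    // v3.1 "Roundup": smallest one-decimal value >= x, robust to floating-point noise.
    static double roundUp(double x) {
        long scaled = Math.round(x * 100000);
        if (scaled % 10000 == 0) return scaled / 100000.0;
        return (Math.floor(scaled / 10000.0) + 1) / 10.0;
    }

    static double baseScore(char av, char ac, char pr, char ui, boolean scopeChanged,
                            char c, char i, char a) {
        double iss = 1 - (1 - cia(c)) * (1 - cia(i)) * (1 - cia(a));
        double impact = scopeChanged
                ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15)
                : 6.42 * iss;
        double exploitability = 8.22 * attackVector(av) * attackComplexity(ac)
                * privilegesRequired(pr, scopeChanged) * userInteraction(ui);
        if (impact <= 0) return 0.0;
        double raw = scopeChanged ? 1.08 * (impact + exploitability) : impact + exploitability;
        return roundUp(Math.min(raw, 10.0));
    }
}
```

For example, a network-reachable, low-complexity vulnerability needing no privileges or user interaction, with high confidentiality, integrity and availability impact and unchanged scope (AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H), scores 9.8.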


Attacks can be modelled using attack trees, which describe how an asset may be attacked. Each node of the tree models an attack and has children that model the sub-attacks that must be performed for the attack to succeed. For example, an attack on an industrial robot may be modelled with a tree as shown in Figure 7.

Figure 7: Attack model.
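An attack tree can be evaluated bottom-up: an AND node succeeds only if all its children succeed, and an OR node if any child does. The goal labels below are taken from Figure 7, but the exact parent/child structure and the use of AND gates throughout (following the text's "sub-attacks that must be performed") are our assumptions; OR gates are included because attack trees commonly also model alternatives.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Attack tree with AND/OR gates, evaluated against achievable leaf attacks.
class AttackTree {
    enum Gate { AND, OR, LEAF }

    final String goal;
    final Gate gate;
    final List<AttackTree> children;

    AttackTree(String goal, Gate gate, AttackTree... children) {
        this.goal = goal;
        this.gate = gate;
        this.children = Arrays.asList(children);
    }

    static AttackTree leaf(String goal) { return new AttackTree(goal, Gate.LEAF); }

    // True if the goal is reachable given the leaf attacks the attacker can perform.
    boolean feasible(Set<String> achievableLeaves) {
        switch (gate) {
            case LEAF: return achievableLeaves.contains(goal);
            case AND:  return children.stream().allMatch(c -> c.feasible(achievableLeaves));
            default:   return children.stream().anyMatch(c -> c.feasible(achievableLeaves));
        }
    }

    // Hypothetical reading of Figure 7.
    static AttackTree robotAttack() {
        return new AttackTree("Attack a robot", Gate.AND,
                leaf("Penetrate firewall"),
                leaf("Identify robot"),
                leaf("Send attack code"),
                new AttackTree("Launch attack", Gate.AND,
                        leaf("Install code"),
                        leaf("Execute attack code")));
    }
}
```

In an all-AND tree, blocking any single leaf (for instance, making firewall penetration infeasible) is enough to make the root goal infeasible, which indicates where a countermeasure is most effective.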

3.1.3 IoT Security Data Modelling

Security data are generated by the probes deployed along the target IoT system and are fed to the analytics modules of the SecureIoT services. As stated above, a generic and scalable approach to modeling security-relevant data is to abstract away from the specifics of the various devices, sensors, edge nodes, etc. of the target IoT system and to use a generic data type for representing security-related data, as shown below:

type securityData {
    String sourceID;
    enum {perf, status, usage, alert, …} typeOfData;
    DateTime timestamp;
    HashMap<String, Object> properties;
    HashMap<String, Object> data;
    String comments;
    String reserved;
}

This type of semi-structured data falls into the NoSQL category, for which technologies and tools exist for efficient manipulation and search. The Elasticsearch engine, for example, can efficiently index and search large volumes of data of types such as the above. Moreover, the engine provides a RESTful HTTP interface for manipulating them. A document of this type may look as follows:

{
  "text": "…",
  "version": "…",
  "ts": "2018-06-25T17:00:00.000Z",
  "properties": [ { "topic": "…" }, { "location": "…" }, … ],
  "data": [ { "key1": "…" }, { "key2": "…" }, …, { "keyn": "…" } ],
  "comment": "…",
  "reserved": "…"
}
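In recent Elasticsearch versions, a JSON document is indexed over the REST interface with an HTTP `PUT` to `/<index>/_doc/<id>`. The sketch below only builds such a request with the JDK's `java.net.http` client (Java 11 or later) and does not send it; the host `localhost:9200`, the index name `secureiot-security-data`, the document id and the field values are hypothetical, and the body is a simplified instance of the structure above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (without sending) an Elasticsearch index request for one security document.
class EsIndexRequest {
    static HttpRequest indexRequest(String baseUrl, String index, String id, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // A simplified, hypothetical instance of the document structure shown above.
    static String sampleDocument() {
        return "{\"text\":\"failed login\",\"version\":\"1\","
                + "\"ts\":\"2018-06-25T17:00:00.000Z\","
                + "\"properties\":[{\"topic\":\"alerts\"}],"
                + "\"data\":[{\"attempts\":\"5\"}],"
                + "\"comment\":\"\",\"reserved\":\"\"}";
    }
}
```

Against a running cluster, the request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.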

3.1.4 IoT Application Data Modelling

Interoperability at the application data level is one of the most challenging aspects currently limiting the widespread uptake of IoT technologies: although the number of use cases with clear, successful and consolidated business cases is growing exponentially, it is still difficult to find relevant examples of deployments where the information gathered by the same groups of sensors or devices is used to create advanced end-user services across multiple verticals or domains. The IoT interoperability problem also adds complexity to the potential co-creation of complex applications and services relying on devices or sensors that belong or are connected to multiple IoT platforms managed by different organizations, i.e., different business domains. Thus, resolving the interoperability problem has become a very active research field, with diverse approaches such as semantics-based federation, as proposed by F. Carrez et al. in [8].

Within SecureIoT, the use of application data may be critical for detecting anomalies and implementing predictive security services. As explained in SecureIoT D2.1 [9], a great number of threats in the main application domains of IoT technologies do not involve attack patterns that affect network traffic, communication protocols or software vulnerabilities. For instance, in the connected vehicle scenario, it should be possible to detect a compromised onboard Electronic Control Unit (ECU) by checking for inconsistencies between correlated fields (e.g., speed versus acceleration versus gear) or even by comparing the data received from multiple cars driving simultaneously along the same route. Thus, SecureIoT must be able to collect, store and analyse application data generated at the different tiers of the IoT stack, from field-level devices and smart objects to platform components.
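The connected-vehicle example relies on two kinds of consistency checks: the acceleration derived from successive speed samples should agree with the reported acceleration, and a car's speed should not deviate strongly from that of other cars on the same route segment. A minimal sketch of both checks is shown below; the units (m/s and m/s²), tolerances and method names are hypothetical, and a real detector would use richer statistics.

```java
import java.util.Arrays;

// Plausibility checks on vehicle application data (illustrative thresholds).
class VehiclePlausibility {
    // Flags a sample when the acceleration implied by two speed readings (m/s)
    // deviates from the reported acceleration (m/s^2) by more than the tolerance.
    static boolean inconsistentAcceleration(double v0, double v1, double dtSeconds,
                                            double reportedAccel, double tolerance) {
        if (dtSeconds <= 0) throw new IllegalArgumentException("dt must be positive");
        double derived = (v1 - v0) / dtSeconds;
        return Math.abs(derived - reportedAccel) > tolerance;
    }

    // Flags a vehicle whose speed differs from the median speed of peers on the
    // same route segment by more than maxDeviation (m/s).
    static boolean outlierOnRoute(double ownSpeed, double[] peerSpeeds, double maxDeviation) {
        if (peerSpeeds.length == 0) return false;
        double[] sorted = peerSpeeds.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double median = n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        return Math.abs(ownSpeed - median) > maxDeviation;
    }
}
```

A speed increase from 20 to 22 m/s in one second is consistent with a reported acceleration of 2 m/s² but not with a reported deceleration of 3 m/s², which might hint at a manipulated ECU reading.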

This subsection presents an analysis of some initiatives and solutions that address the interoperability burden. It is important to highlight the impact of this choice not only from a technological perspective, due to its consequences for the high-level components of the SecureIoT architecture, but also from a business point of view, since the capacity to work with as many solutions as possible, or to easily extend compatibility to new systems, will be essential for a successful exploitation strategy.


In the present deliverable, the analysis is limited to the presentation and review of the alternatives that could be exploited by SecureIoT. A final decision will be taken and described in the next version (D3.2), considering the progress and inputs of all the documents to be released in milestones MS2 (M9) to MS9 (M21).

In general, from a logical point of view, data probes deployed at the different tiers of an IoT stack will collect application-level data from components of all the layers of the stack. The SecureIoT Global Storage component must contain harmonized information, so the translation from each native data model will be done by the probes. This approach is shown in Figure 8.

Figure 8: SecureIoT probes collecting and harmonizing application-data information from multiple IoT platforms
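Since translation into the harmonized model happens at the probes, each probe needs a platform-specific mapper into the common representation. The sketch below assumes two invented native formats (a flat `key=value` string and a nested map) and a simplified harmonized record that reuses the field names `sourceID`, `typeOfData` and `data` of the generic model in Section 2.1.2; the interface, class names and native field names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical probe-side translation into a harmonized record.
interface ProbeMapper<T> {
    Map<String, Object> toHarmonized(T nativeRecord);
}

// Platform A: flat "key=value;key=value" string with "dev" and "type" keys (assumed format).
class FlatStringMapper implements ProbeMapper<String> {
    public Map<String, Object> toHarmonized(String raw) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : raw.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) fields.put(kv[0].trim(), kv[1].trim());
        }
        Map<String, Object> out = new HashMap<>();
        out.put("sourceID", fields.remove("dev"));
        out.put("typeOfData", fields.remove("type"));
        out.put("data", new HashMap<String, Object>(fields)); // remaining fields are payload
        return out;
    }
}

// Platform B: nested map {"device": {"id": ...}, "kind": ..., "measurements": {...}} (assumed format).
class NestedMapMapper implements ProbeMapper<Map<String, Object>> {
    @SuppressWarnings("unchecked")
    public Map<String, Object> toHarmonized(Map<String, Object> raw) {
        Map<String, Object> device = (Map<String, Object>) raw.get("device");
        Map<String, Object> out = new HashMap<>();
        out.put("sourceID", device.get("id"));
        out.put("typeOfData", raw.get("kind"));
        out.put("data", new HashMap<>((Map<String, Object>) raw.get("measurements")));
        return out;
    }
}
```

Downstream components then see the same three fields regardless of the originating platform, so adding a new platform only requires a new mapper at the probe.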

3.1.4.1 FIWARE NGSI and data models

As also explained in SecureIoT D2.4 [5], the central component of the FIWARE IoT platform is the Context Broker, which is a mandatory part of any deployment. The main role of the Context Broker is the large-scale management of context information through an implementation of the Next Generation Service Interface (NGSI). Detailed documentation for the Context Broker and the NGSI API can be found at [10] and [11].

FIWARE NGSI enables the virtual or digital representation of entities (e.g., a vehicle, a room in a house or a device), which include multiple context attributes (e.g., speed, temperature, humidity, etc.) and metadata. The attribute values may come from IoT devices and smart objects but also from other sources such as web services, chatbots, IoT platforms or even humans. A comprehensive diagram that illustrates the NGSI classes is shown in Figure 9 (extracted from [11]). JSON syntax is proposed to represent structured data based on the entity-attribute data model.


Figure 9: UML class diagram for NGSI.
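In NGSIv2, an entity is a JSON object with an `id`, a `type` and one object per attribute holding its `value`, `type` and optional `metadata`; new entities are created with `POST /v2/entities` on the Context Broker. The sketch below builds such an entity and the corresponding request with the JDK HTTP client without sending it; the broker address (Orion's customary port 1026), the entity id, the attribute and the metadata item are illustrative assumptions, and the id is assumed to need no JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds an NGSIv2 entity and its creation request (not sent).
class NgsiV2Entity {
    // Vehicle entity with a numeric speed attribute carrying a timestamp metadata item.
    static String vehicleJson(String id, double speed, String timestamp) {
        return "{"
                + "\"id\":\"" + id + "\","
                + "\"type\":\"Vehicle\","
                + "\"speed\":{\"value\":" + speed + ",\"type\":\"Number\","
                + "\"metadata\":{\"timestamp\":{\"value\":\"" + timestamp + "\",\"type\":\"DateTime\"}}}"
                + "}";
    }

    // Entity-creation request for a Context Broker reachable at brokerUrl.
    static HttpRequest createRequest(String brokerUrl, String entityJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(brokerUrl + "/v2/entities"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(entityJson))
                .build();
    }
}
```

The resulting JSON follows the entity-attribute model described above: the entity carries context attributes, and each attribute can in turn carry metadata.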

In addition, FIWARE complements NGSI with a set of harmonized schemas, vocabularies or ontologies that aim to guarantee portability and interoperability. FIWARE data models are created leveraging the experience acquired during real experimentation and large-scale projects in fields such as smart cities, transportation or environmental monitoring. They are also strongly based on existing standardization activities such as Schema.org (https://schema.org/) or SAREF (http://ontology.tno.nl/saref/). The complete list of FIWARE data models is available at [12].

The development and deployment of adaptation or translation components is the main mechanism for achieving interoperability in multi-platform interactions involving FIWARE-based systems. This approach is currently implemented by the IDAS (also known as IoT Agents) FIWARE Generic Enabler to interconnect devices that use specific IoT communication protocols (e.g., LoRaWAN, MQTT, CoAP) and data models (e.g., CayenneLPP, IETF CBOR, LWM2M). Another representative example is the incubated Generic Enabler that provides integration with the OpenMTC IoT middleware (based on oneM2M). A specific connector, shown in the middle of Figure 10, has been developed to translate data to/from NGSI and to enable bi-directional data flows between the OpenMTC backend and the FIWARE Context Broker [13].

Figure 10: Logical view of OpenMTC connector for FIWARE Orion Context Broker.


3.1.4.2 ETSI Context Information Management (CIM)

The European Telecommunications Standards Institute (ETSI) has an Industry Specification Group dedicated to cross-cutting Context Information Management (ISG CIM). Its first specification was released in April 2018 [14]. The group includes leading industrial organizations such as Telefonica, Orange, NEC and British Telecommunications, with the European Commission as a counsellor. As stated on page 11 of [14], ETSI CIM “leverages on the former OMA NGSI 9 and 10 interfaces and FIWARE NGSIv2 to incorporate the latest advances from Linked Data”.

ETSI CIM aims to standardize the following aspects:

NGSI-LD: an information model to structure context information.

Possible architectures for using the NGSI-LD API.

NGSI-LD data representation based on JSON-LD.

NGSI-LD query language to retrieve entities and apply filters.

The specification of the API operations.

The specification of the API HTTP binding.

In comparison with NGSI, the adoption of a Linked Data approach formalizes the representation of relationships between entities and adds useful context information about the specific ontology applied to each of them. It must be noted that the use of JSON-LD syntax allows a smooth transition from classical NGSI entities to the new format.

An example of an NGSI-LD ontology and its instantiation to model a vehicle is included in Figure 11 (p. 19 of [14]).


Figure 11: NGSI-LD ontology applied in an example.

At the meta-model level, NGSI-LD introduces the Resource Description Framework (RDF) concepts of Properties and Relationships. At the cross-domain ontology level, additional common properties are defined (Geolocation, Temporal Property and unitCode), as well as possible values for these properties: TimeInterval (used by Temporal Property) and Geometry (used by Geolocation). Finally, for each domain it is possible to derive new entities (e.g., parking, street, gate or car), relationships (e.g., adjacentTo, hasOpening) and properties (e.g., hasState, reliability).

On the one hand, the main benefit of ETSI CIM / NGSI-LD with respect to the previous NGSI approach is the possibility of performing advanced queries that exploit the relationships between entities, e.g., to get all the entities and attributes of all the vehicles in a parking facility.

On the other hand, it must be also taken into account that achieving good performance and

scaling RDF databases is a complex issue that may affect to the overall system. Moreover, from

a practical position, there is not yet an available implementation of a Context Broker that

supports NGSI-LD.
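The kind of relationship-based query mentioned above (all vehicles of a parking) can be illustrated with a minimal in-memory sketch that follows isParked Relationships from vehicles to a parking entity. This is not how a Context Broker evaluates NGSI-LD queries; the attribute names are hypothetical.

```python
def vehicles_in_parking(entities, parking_id):
    """Return all Vehicle entities whose isParked Relationship targets parking_id."""
    result = []
    for entity in entities:
        if entity.get("type") != "Vehicle":
            continue
        rel = entity.get("isParked")
        # Only follow attributes that are genuine Relationships
        if rel and rel.get("type") == "Relationship" and rel.get("object") == parking_id:
            result.append(entity)
    return result

parking = "urn:ngsi-ld:OffStreetParking:Downtown1"
entities = [
    {"id": "urn:ngsi-ld:Vehicle:A1", "type": "Vehicle",
     "isParked": {"type": "Relationship", "object": parking}},
    {"id": "urn:ngsi-ld:Vehicle:A2", "type": "Vehicle",
     "isParked": {"type": "Relationship", "object": "urn:ngsi-ld:OffStreetParking:Other"}},
    {"id": parking, "type": "OffStreetParking",
     "totalSpotNumber": {"type": "Property", "value": 200}},
]

print([v["id"] for v in vehicles_in_parking(entities, parking)])
# ['urn:ngsi-ld:Vehicle:A1']
```

In a graph store the same traversal becomes a join over relationship edges, which is where the performance concerns about RDF databases come into play.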

3.1.4.3 Semantic Sensor Network (SSN) Ontology

The SSN initiative of the World Wide Web Consortium (W3C) pursues the same goal as ETSI CIM / NGSI-LD: to apply the Linked Data paradigm to the information collected by IoT devices and to propose a common ontology. It proposes two specifications: the Semantic Sensor Network (SSN) ontology and the Sensor, Observation, Sample and Actuator (SOSA) ontology. The latter is a lightweight core of SSN, similar in spirit to Schema.org vocabularies, designed to simplify adoption.

Figure 12 (extracted from [15]) shows how SOSA and SSN specify multiple conceptual modules (black boxes), classes and properties, considering only the observation perspective. SOSA and SSN elements are depicted in green and blue respectively. Similar diagrams are also available for the actuation and sampling perspectives.

Figure 12: SSN/SOSA conceptual modules, classes and properties for observation perspective.

Therefore, SSN/SOSA can be considered an alternative to ETSI CIM / NGSI-LD, although the former focuses more on aspects of IoT devices and sensors (e.g., sensing mechanisms, observations). This could be seen as a benefit in terms of the level of detail that SecureIoT data probes could capture, but it also adds non-essential information that could further slow down queries. In fact, this problem has led to the formulation of lightweight versions of SSN/SOSA [15].
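For comparison, an observation described with SOSA terms can be written as a small JSON-LD document. The sketch assumes the SOSA namespace http://www.w3.org/ns/sosa/ and uses the terms sosa:Observation, sosa:madeBySensor, sosa:observedProperty, sosa:hasFeatureOfInterest, sosa:hasSimpleResult and sosa:resultTime; all sensor, property and device identifiers are hypothetical.

```python
import json

def sosa_observation(obs_id, sensor, prop, feature, value, time_iso):
    """Describe one sensor observation with SOSA vocabulary terms in JSON-LD."""
    return {
        "@context": {
            "sosa": "http://www.w3.org/ns/sosa/",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
        },
        "@id": obs_id,
        "@type": "sosa:Observation",
        "sosa:madeBySensor": {"@id": sensor},
        "sosa:observedProperty": {"@id": prop},
        "sosa:hasFeatureOfInterest": {"@id": feature},
        "sosa:hasSimpleResult": value,
        "sosa:resultTime": {"@value": time_iso, "@type": "xsd:dateTime"},
    }

# Hypothetical identifiers for a probe counting SYN packets on a gateway
obs = sosa_observation(
    "urn:example:obs:1", "urn:example:probe:synCounter",
    "urn:example:property:synPerMinute", "urn:example:device:gateway1",
    42, "2018-09-29T10:00:00Z",
)
print(json.dumps(obs, indent=2))
```

Even this minimal observation expands to six statements about a single reading, which illustrates the level of detail (and verbosity) discussed above.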

3.1.5 IoT Security Templates & Rulesets Modelling

The SecureIoT Analytics module is the core component of the overall SecureIoT architecture, as it detects security-related issues that may arise in the target IoT deployment. The two main inputs to the Analytics module are the streaming data generated by the probes deployed on selected target IoT nodes, and contextualization data, kept in permanent storage, that comprise security templates and rulesets.

Rules can either be specified manually or extracted from collections of security-related data by means of machine learning techniques. Rules have the generic form

antecedent → consequent

The rule antecedent is the conjunction of predicates over the values of some security attributes. It has the form

$$P_1 \wedge \dots \wedge P_n, \qquad n \ge 1$$

where each $P_i = P_i(a_{i1}, a_{i2}, \dots, a_{in_i})$ is a predicate over $n_i$ attributes. The attributes that appear in a predicate are names of security data generated by the SecureIoT probes; in practical terms, they are keys of the content field of the securityData model presented above. A predicate is evaluated using the values that correspond to the names appearing in it.

When the rule antecedent evaluates to true (i.e., all of its predicates are satisfied), the rule fires and its consequent is executed. The consequent may involve an action such as raising an alert, setting a flag or creating a log entry. The Analytics module is responsible for applying the specified or discovered rules to the input security data and for executing the corresponding consequent whenever a rule fires. In general, more than one rule may fire upon receiving input data; in such cases the consequents of all firing rules are executed.
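The rule semantics described above can be sketched in Python as follows. Each predicate is a function over the content dictionary of a securityData record, a rule fires when all of its predicates hold, and the consequents of all firing rules are executed. The Rule class, the attribute names (protocol, syn_count, failed_logins) and the thresholds are illustrative assumptions, not part of the SecureIoT specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Content = Dict[str, object]

@dataclass
class Rule:
    name: str
    predicates: List[Callable[[Content], bool]]   # P_1 ... P_n, n >= 1
    consequent: Callable[[Content], str]          # action executed when the rule fires

    def fires(self, content: Content) -> bool:
        # The antecedent is the conjunction of all predicates
        return all(p(content) for p in self.predicates)

def apply_rules(rules: List[Rule], content: Content) -> List[str]:
    """Execute the consequents of every rule whose antecedent holds."""
    return [r.consequent(content) for r in rules if r.fires(content)]

rules = [
    Rule("syn_flood",
         [lambda c: c.get("protocol") == "TCP", lambda c: c.get("syn_count", 0) > 100],
         lambda c: f"ALERT: possible SYN flood ({c['syn_count']} SYN/min)"),
    Rule("brute_force",
         [lambda c: c.get("failed_logins", 0) >= 5],
         lambda c: "LOG: repeated failed logins"),
]

record = {"protocol": "TCP", "syn_count": 250, "failed_logins": 7}
print(apply_rules(rules, record))
# ['ALERT: possible SYN flood (250 SYN/min)', 'LOG: repeated failed logins']
```

Because every rule is checked independently, several consequents can be produced from a single input record, as the text requires.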

To perform its task, the Analytics module makes use of security templates, which contain historical data, rules that may be extracted from them, and conditions or exceptions for applying the rules. For example, in a specific IoT deployment it may be normal to receive ten SYN messages within one minute. In this hypothetical scenario, rules that raise an alert under these conditions should be ignored. In this sense, a security template specifies the context in which rules are applied, i.e., the context in which they are enabled or disabled.
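The role of a security template as a context that enables or disables rules can be sketched as a set of per-deployment overrides. The template structure (a hypothetical disabled_rules list and per-rule threshold overrides) and the default thresholds are assumptions made for illustration; the example mirrors the case in which ten SYN messages per minute are normal for a given deployment.

```python
from typing import Dict, List, Set

# Hypothetical default rules: name -> SYN messages per minute that trigger an alert
DEFAULT_THRESHOLDS: Dict[str, int] = {"syn_burst": 5, "syn_flood": 100}

def active_alerts(syn_per_minute: int, template: Dict) -> List[str]:
    """Return the names of rules that fire for this reading under a security template."""
    disabled: Set[str] = set(template.get("disabled_rules", []))
    overrides: Dict[str, int] = template.get("thresholds", {})
    alerts = []
    for rule, default in DEFAULT_THRESHOLDS.items():
        if rule in disabled:
            continue                      # the template switches this rule off
        threshold = overrides.get(rule, default)
        if syn_per_minute >= threshold:
            alerts.append(rule)
    return alerts

# Without context, 10 SYN messages per minute trigger the burst rule ...
print(active_alerts(10, {}))                                  # ['syn_burst']
# ... but in a deployment where this rate is normal, the template disables it.
print(active_alerts(10, {"disabled_rules": ["syn_burst"]}))   # []
```

Raising the threshold through the template, instead of disabling the rule outright, keeps the rule active for genuinely abnormal rates.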

3.2 Data Collection Infrastructure

The SecureIoT SECaaS services are provided as add-ons to existing IoT systems, including legacy systems or systems that have to meet strict efficiency or throughput requirements. Following the SecureIoT architecture presented in D2.4 [5], security-related data are collected from selected nodes of target IoT systems through probes or agents deployed alongside them. As the collection of security data has to impose the least possible overhead on the nodes of the target IoT system, the data collection agents have to be lightweight and their interactions with the target IoT system have to be kept to a minimum.

A consequence of this requirement is that data should be pushed to the SecureIoT analytics engine rather than pulled by it. Pulling data imposes extra overhead on the probe, which then acts as a server: it has to listen for incoming connection requests, establish a new connection whenever a request arrives, and send data to the requesting client on demand. When data are pushed, on the other hand, they are forwarded as they are generated, which makes the data producer more efficient and lightweight.
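The push model can be illustrated with a minimal probe that forwards each event as soon as it is generated, instead of keeping it until a client asks for it. In this sketch the sink callable is a hypothetical stand-in for whatever transport is actually used (for instance, a streaming platform client), and the event fields are invented.

```python
import json
import time
from typing import Callable, Dict, Iterable

def run_probe(events: Iterable[Dict], sink: Callable[[bytes], None]) -> int:
    """Push every generated event to the sink immediately; return how many were sent.

    The probe opens no listening socket and keeps no backlog for clients,
    which is what keeps it lightweight compared with a pull-based design.
    """
    sent = 0
    for event in events:
        event.setdefault("timestamp", time.time())   # stamp events at generation time
        sink(json.dumps(event).encode("utf-8"))      # forward as soon as produced
        sent += 1
    return sent

# Usage with an in-memory sink standing in for the real transport
received = []
n = run_probe(({"type": "remote_login_attempt", "src": f"10.0.0.{i}"} for i in range(3)),
              received.append)
print(n, json.loads(received[0])["src"])   # 3 10.0.0.0
```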

There are several lightweight technologies for collecting and pushing data. Two of them, Elastic Beats and Sematext Logagent, are presented below.

Elastic Beats [17] is a free, open source platform of lightweight data collection and forwarding probes, or agents, which can be installed on different nodes of a distributed system, including the devices of an IoT system. Each Beat is a separately installable agent. The Beats API specifies how data may be collected and shipped to a data sink. A number of predefined Beats are available:

Auditbeat: for auditing data, mainly on Linux-based systems, by communicating directly with the Linux audit framework.

Filebeat: for forwarding and centralizing files and logs.

Heartbeat: for detecting the availability of servers. It issues requests to one or more URLs, waits for the replies and then reports on the availability of the sites along with their response times.

Metricbeat: for reporting a set of system metrics, including CPU and memory utilization, system load, and so on.

Packetbeat: for reporting network traffic.

Winlogbeat: for reporting Windows event logs.

In addition, custom Beats may be implemented by using the libbeat library.

Sematext Logagent [18] is a lightweight open source log shipper similar to Elastic Beats, and more precisely to Elastic Filebeat. It supports log parsing, log routing, log enrichment and disk buffering of data, as well as two-way SSL authentication.

Logagent supports a number of inputs, for example files, streams, sockets and databases, as well as filtering of input data. It can output to Apache Kafka and Elasticsearch, while its output filters support aggregation and enrichment of parsed data.

In the context of SecureIoT, Elastic Beats will be used as the data collection platform, the main reason being that custom Beats need to be developed for the specialized nodes of the target IoT system, IoT application and IoT platform in order to collect the pertinent security data.

3.3 Data Streaming Infrastructure

This section presents the streaming infrastructure that will be used in SecureIoT. The security services offered by SecureIoT depend on a highly distributed and decoupled infrastructure. The target IoT system comprises several levels, from low-level devices and smart objects to the supporting IoT platforms and the IoT applications. Communication and data exchange between the various components is a core functionality that has to be implemented efficiently and flexibly while remaining transparent to the target IoT system.

A key observation is that the messages collected from the nodes of the target IoT deployment are actually events. For example, an attempt at a remote connection to a node will typically generate an event. Similarly, a flood of SYN messages will result in the generation of an event. As a consequence, the integration technology used as part of the SecureIoT infrastructure has to be event oriented. This requirement sets it apart from other integration technologies such as Enterprise Service Buses (ESB) and Extract-Transform-Load (ETL) tools.

The section first presents the two main approaches for communicating real-time, application-level data from a source to a destination in a distributed setting, namely request-reply and publish-subscribe. In the context of SecureIoT, security data are communicated from the probes deployed with the target IoT system to the analytics module for further processing. It then focuses on the Apache Kafka platform, outlining some of its capabilities and giving the main arguments for its selection as the platform for communicating security data in SecureIoT.

3.3.1 Request Reply

In the request-reply model of communication between two entities A and B, when A needs some data produced by B, it sends a request to B and B replies with the data, as shown in Figure 13 below. In practical terms, when client A needs some data from B, it opens a connection to B and sends a request. B, on the other hand, waits for client requests; when it receives one, it prepares a response and sends it back to A.

Figure 13: Overview of the request-reply architecture.

For example, if B is a server that provides the current temperature in a specific area, A sends a request to B every time it needs to know that temperature. Temperatures may change between A's requests, but A retains the choice of issuing a request whenever it needs a piece of data from B, effectively receiving data at its own pace. A does not know when new data are available at B. Therefore, if A is interested in getting every update (e.g., at every temperature change), it may need to issue frequent requests, possibly placing an overhead on the communications network and in many cases getting back nothing new. On the other hand, if A reduces the frequency of its requests to avoid overloading B, it runs the risk of missing some data updates.
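The trade-off between polling frequency and missed updates can be quantified with a small simulation. The sketch assumes that B's temperature changes at known time steps and that A polls every period steps; the change schedule and the polling periods are invented for illustration.

```python
from typing import Dict, Tuple

def poll(changes: Dict[int, float], period: int, horizon: int) -> Tuple[int, int, int]:
    """Simulate client A polling server B every `period` time steps.

    `changes` maps the time steps at which B's value changes to the new value.
    Returns (requests_sent, updates_seen, updates_missed); an update counts as
    missed when a newer value overwrites it before A's next request.
    """
    requests = seen = missed = 0
    pending = 0                                  # changes since A's last request
    for t in range(horizon):
        if t in changes:
            pending += 1
        if t % period == 0:                      # A sends a request to B
            requests += 1
            if pending:
                seen += 1                        # the reply carries only the latest value
                missed += pending - 1            # older intermediate values are lost
                pending = 0
    return requests, seen, missed

changes = {1: 20.5, 2: 21.0, 3: 21.4, 15: 22.0}  # invented temperature changes
print(poll(changes, period=1, horizon=21))       # (21, 4, 0): many requests, nothing lost
print(poll(changes, period=10, horizon=21))      # (3, 2, 2): few requests, two updates lost
```

With frequent polling, 17 of the 21 requests return nothing new; with infrequent polling, the network load drops but two of the four changes are never observed.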

In a more complex scenario, where multiple (distributed) clients like A request data from a number of servers like B, each server B has to keep track of and satisfy each request, which makes B's logic more complicated. The corresponding architecture is shown in Figure 14.

Figure 14: Overview of the request-reply architecture with multiple clients.

3.3.2 Publish Subscribe

The publish-subscribe model of communication between two entities A and B introduces a third entity between them, the broker. Entities interested in new data coming from B register with the broker to receive such data. Whenever B publishes a new datum, the broker is notified, and it duplicates and forwards the new datum to all entities that have subscribed to data coming from B, as shown in Figure 15.

Figure 15: Overview of the publish-subscribe architecture.

Continuing the example of the temperature server B, when a client A wants to receive new temperatures from B, it first registers its interest in B's data with broker R. When B produces a new temperature, it sends the value to the broker, which forwards it to all entities like A that have registered their interest in B's values with R. If multiple such entities exist, B's value is replicated. In this scenario, B simply announces a new value whenever one has been produced. Broker R already maintains a list of interested clients like A, so it only copies and transmits B's value to each of them. In the publish-subscribe model, A only receives data from B when B has produced something to be announced.
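The register/announce interaction can be sketched as a minimal in-process broker. The Broker class and its subscribe/publish methods are hypothetical names used only for illustration; real brokers add networking, persistence and delivery guarantees.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, float], None]

class Broker:
    """Broker R: keeps the list of interested clients per source and fans data out."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, source: str, callback: Callback) -> None:
        # "register": a client such as A declares its interest in a source such as B
        self._subscribers[source].append(callback)

    def publish(self, source: str, value: float) -> int:
        # "announce": B hands one value to R, which copies it to every subscriber
        for callback in self._subscribers[source]:
            callback(source, value)
        return len(self._subscribers[source])

broker = Broker()
inbox_a1, inbox_a2 = [], []
broker.subscribe("B", lambda src, v: inbox_a1.append(v))
broker.subscribe("B", lambda src, v: inbox_a2.append(v))

delivered = broker.publish("B", 21.5)   # B announces once, R delivers twice
print(delivered, inbox_a1, inbox_a2)   # 2 [21.5] [21.5]
```

B calls publish exactly once per value regardless of how many clients are subscribed; the replication work is done by the broker.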

In a more complicated scenario in which multiple clients like A request data from B, B's logic remains simple. All interested entities register their interest with the broker and receive data as they are generated by B; there is no need to make repeated requests to B to receive new data. B, on the other hand, generates data at its own pace without having to bear the overhead of replying to individual client requests. The corresponding architecture is shown in Figure 16. The publish-subscribe model of communicating data therefore scales considerably better than the request-reply one, since the number of consumers does not affect the work done by the data producers.

Figure 16: Overview of the publish-subscribe architecture with multiple data producers and consumers.

The broker implements the core functionality for communicating data from producers to consumers. Typically, it contains a routing component and a number of output queues, as shown in Figure 17. Data producers send their data to the routing component, which in turn places the data into one or more queues, possibly replicating them. Consumers, on the other hand, read data from the queues they have registered with. Depending on the platform, data may persist in the queues or be transient.

Figure 17: Overview of Broker architecture.
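The broker of Figure 17 can be sketched as a routing element plus one output queue per consumer. In this hypothetical design each consumer's queue is bound to one or more topics, the router copies each message into every bound queue, and consumers drain their own queue at their own pace; the sketch makes queue contents transient, which is one of the two options mentioned above.

```python
from collections import deque
from typing import Deque, Dict, List, Set

class RoutingBroker:
    """Broker with a routing element and one output queue per consumer."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[str]] = {}
        self.bindings: Dict[str, Set[str]] = {}      # consumer -> topics it registered for

    def register(self, consumer: str, topics: List[str]) -> None:
        self.queues.setdefault(consumer, deque())
        self.bindings.setdefault(consumer, set()).update(topics)

    def route(self, topic: str, message: str) -> int:
        """Routing element: place a copy of the message into every bound queue."""
        copies = 0
        for consumer, topics in self.bindings.items():
            if topic in topics:
                self.queues[consumer].append(message)
                copies += 1
        return copies

    def read(self, consumer: str) -> List[str]:
        """A consumer drains its own queue; data are transient once read."""
        q = self.queues[consumer]
        items = list(q)
        q.clear()
        return items

broker = RoutingBroker()
broker.register("analytics", ["net", "auth"])
broker.register("storage", ["net"])
broker.route("net", "syn_flood_detected")
broker.route("auth", "failed_login")
print(broker.read("analytics"))   # ['syn_flood_detected', 'failed_login']
print(broker.read("storage"))     # ['syn_flood_detected']
```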

SecureIoT security data producers are the probes deployed with the target IoT system, which produce security-related data according to the defined configurations and policies. As the probes have to impose the least possible overhead on the target IoT system and remain lightweight, a publish-subscribe platform will be used for relaying the data they generate to the analytics modules; Apache Kafka is a candidate platform.

3.3.3 SecureIoT Streaming Infrastructure

The large amount of security data produced by the probes deployed on selected nodes of the target IoT system has to be communicated flexibly and efficiently for subsequent storage and analytics processing. Therefore, efficiency is a key requirement for the streaming infrastructure that will be put in place in SecureIoT. Moreover, the streaming infrastructure has to implement the publish-subscribe model of data communication, whose advantages were discussed in the previous subsections.

Security data produced by the nodes of the target IoT system are actually events, as opposed to data exchanged between parts of a distributed system for its functioning. They are generated when something of interest happens at the nodes of the target IoT system, for example, an attempt at a remote connection, a component's firmware update, the detection of a flood of SYN messages, and so on. Event-driven streaming infrastructures are clearly distinguished from other integration solutions, including ESBs and ETL tools.

A number of platforms implement the publish-subscribe paradigm, for example Apache Kafka, ZeroMQ, ActiveMQ, JBoss Messaging, RabbitMQ and HornetQ. The rest of this section presents two of the most widely used streaming platforms, namely Apache Kafka and RabbitMQ.

Apache Kafka [19] is a streaming platform for handling large volumes of event-based data flows. It is easily scalable and flexible enough to accommodate multiple data sources and destinations, effectively decoupling data producers from data consumers, and it can easily be integrated with a number of other technologies. Kafka has been designed and tuned for efficient, high-throughput, low-latency, real-time and scalable streaming of large amounts of event-based data. The platform implements the publish-subscribe model of data streaming for communicating data from multiple data producers to multiple data consumers. Kafka defines the concept of a topic, which is a subject of interest. Producers produce data for one or more topics, and consumers register their interest in one or more topics to receive data. Within a topic, data are partitioned, and the messages in each partition are ordered and timestamped. Partitions are replicated and distributed over the nodes of the Kafka cluster.
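How keyed messages end up in ordered partitions can be illustrated with an in-memory model of a single topic. The sketch uses CRC32 of the key modulo the number of partitions; this is an assumption for illustration only (Kafka's default partitioner hashes keys differently, with murmur2), but it shows the property that matters: messages with the same key land in the same partition and keep their relative order, each receiving the next offset in that partition. Replication is not modelled.

```python
import time
import zlib
from typing import List, Tuple

class Topic:
    """In-memory model of a Kafka-like topic with a fixed number of partitions."""

    def __init__(self, name: str, partitions: int) -> None:
        self.name = name
        # Each record is (offset, timestamp, key, value)
        self.partitions: List[List[Tuple[int, float, str, str]]] = [[] for _ in range(partitions)]

    def partition_for(self, key: str) -> int:
        # Illustrative hash; NOT Kafka's actual partitioner
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        p = self.partition_for(key)
        offset = len(self.partitions[p])              # next offset in this partition
        self.partitions[p].append((offset, time.time(), key, value))
        return p, offset

topic = Topic("security-events", partitions=3)
for i in range(4):
    print(topic.append("gateway-1", f"event-{i}"))   # same key -> same partition, offsets 0..3
```

Ordering is therefore guaranteed per partition rather than per topic, which is why keying records by their source (e.g., a probe identifier) is a common choice.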

A key characteristic of Kafka is the persistence of streaming data. When data are published to Kafka, they are written to the filesystem and retained for a configurable amount of time. The advantage of this approach is that clients are able to reload parts of a data log, and newly arriving clients can catch up by loading the whole history of the logged data. Moreover, clients can read data independently of one another, each at its own speed. Data items are indexed by their offset within a partition (starting at 0), as shown in Figure 18, and reside in partitions that may be replicated across the nodes of a cluster.

Figure 18: Kafka Partitions and read-write operations.
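The benefit of a persistent log in which each consumer tracks its own read position can be sketched with a single partition. The PartitionLog and Consumer classes and their fetch/rewind methods are hypothetical and only mimic the idea of offsets; they are not the Kafka consumer API, and retention time is not modelled.

```python
from typing import List

class PartitionLog:
    """Append-only log: records are retained instead of being removed on read."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1                  # offset of the new record

class Consumer:
    """Each consumer owns its position, so consumers never interfere."""

    def __init__(self, log: PartitionLog, start: int = 0) -> None:
        self.log, self.position = log, start

    def fetch(self, max_records: int) -> List[str]:
        batch = self.log.records[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def rewind(self, offset: int) -> None:
        self.position = offset                        # reload part of the history

log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}")

fast, slow = Consumer(log), Consumer(log)
print(fast.fetch(5))          # ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
print(slow.fetch(2))          # ['event-0', 'event-1']  (reads at its own pace)
fast.rewind(3)
print(fast.fetch(5))          # ['event-3', 'event-4']
late = Consumer(log)          # a newly arriving client catches up from offset 0
print(len(late.fetch(100)))   # 5
```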

Apache Kafka originated as an internal project at LinkedIn [20], implemented in Scala, and is now an open source stream processing platform under the Apache Software Foundation [21]. A benchmark of the Kafka platform appears in [22].

Kafka provides its services through a number of APIs:

Producer API: allows applications to produce streams of data.

Consumer API: allows applications to consume streams of data.

Connector API: allows the definition of connectors that read data from, or write data to, other applications.

Streams API: allows stateful processing of stream data, including operations such as filtering, mapping, aggregation and joins.

Kafka depends on Apache ZooKeeper [23], a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services.

RabbitMQ [24] is a message broker similar in purpose to Apache Kafka; it has been developed in Erlang and implements a variety of messaging protocols, including the Advanced Message Queuing Protocol (AMQP). AMQP originated at JPMorgan Chase and is well tuned for performance, scalability and reliability, primarily for applications in the financial sector but also for applications of broader scope. A high-level architecture of RabbitMQ is shown in Figure 19.

Figure 19: Overview of RabbitMQ architecture.

An exchange is a data router. Exchanges are bound to queues according to the platform configuration. RabbitMQ supports different types of exchanges. Direct exchanges deliver data to the output queues whose binding key equals the routing key of the data. Topic exchanges apply pattern-matching rules to the routing key of the incoming data to decide which output queues should receive it. Fanout exchanges copy and send data to all output queues they are bound to. Finally, headers exchanges decide the output queues based on the data headers. Unlike Kafka, RabbitMQ does not keep a persistent log of streaming data. Instead, it makes use of smart queues, which monitor data consumption by consumers and retain data only until they have been consumed.
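The four exchange types can be compared with a small routing function. The sketch follows the AMQP 0-9-1 conventions as generally documented for RabbitMQ: direct bindings match the routing key exactly, topic binding keys are dot-separated words in which "*" matches exactly one word and "#" matches zero or more words, fanout ignores keys, and headers bindings are matched here with all-of semantics (x-match=all) only. It is a simplified model, not RabbitMQ's implementation; queue names and keys are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

Binding = Union[str, Dict[str, str]]

def topic_matches(pattern: str, key: str) -> bool:
    """AMQP-style topic match: '*' = exactly one word, '#' = zero or more words."""
    p, k = pattern.split("."), key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb any number of the remaining words, including none
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        return (p[i] == "*" or p[i] == k[j]) and match(i + 1, j + 1)

    return match(0, 0)

def route(exchange_type: str, bindings: List[Tuple[str, Binding]],
          routing_key: str = "", headers: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the queues that receive a message under a simplified exchange model."""
    headers = headers or {}
    selected = []
    for queue, binding in bindings:
        if exchange_type == "fanout":
            ok = True                                   # keys are ignored
        elif exchange_type == "direct":
            ok = binding == routing_key                 # exact key match
        elif exchange_type == "topic":
            ok = topic_matches(binding, routing_key)    # wildcard pattern match
        elif exchange_type == "headers":
            ok = all(headers.get(h) == v for h, v in binding.items())  # x-match=all
        else:
            raise ValueError(f"unknown exchange type: {exchange_type}")
        if ok:
            selected.append(queue)
    return selected

bindings = [("alerts", "iot.*.alert"), ("audit", "iot.#"), ("gw1", "iot.gw1.metric")]
print(route("direct", bindings, "iot.gw1.metric"))   # ['gw1']
print(route("topic", bindings, "iot.gw1.alert"))     # ['alerts', 'audit']
print(route("fanout", bindings))                     # ['alerts', 'audit', 'gw1']
print(route("headers", [("sec", {"severity": "high"})],
            headers={"severity": "high", "src": "gw1"}))  # ['sec']
```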

There have been several comparisons and benchmarks of Kafka and RabbitMQ. [25] provides a thorough comparison and concludes with a guide for selecting one or the other platform based on a number of criteria. In the context of SecureIoT, where security data are to be communicated both fast and reliably, the most relevant of the criteria listed in that paper are the following:

Very large system throughput

Very large throughput per topic

At-least-once delivery semantics (in case of failures the platform guarantees that no data are lost)

High availability

Long-term data storage may be desirable but is not a critical requirement for SecureIoT. The table in that paper shows that, under these requirements, Kafka with replication should be the platform of choice. Similar conclusions are reached in [26], which states that Kafka should be preferred over RabbitMQ when high data flows are expected and when high availability and guarantees against data loss are required.

Therefore, Apache Kafka will be used as the streaming platform for transferring security related

data from the probes to the analytics engine.
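As an illustration of how a probe might publish to Kafka with the delivery-related settings discussed above, the following sketch uses the third-party kafka-python client. The broker address, the topic name and the record fields are hypothetical. Setting acks="all" makes the partition leader wait for all in-sync replicas before acknowledging, and retries lets the client resend after transient failures; together they approximate at-least-once delivery, with duplicates still possible. The client is imported lazily so that the serialization helper can be exercised without a running cluster.

```python
import json
from typing import Dict, Iterable

def serialize(event: Dict) -> bytes:
    """Encode a security event as compact UTF-8 JSON for the message value."""
    return json.dumps(event, separators=(",", ":"), sort_keys=True).encode("utf-8")

def publish_events(events: Iterable[Dict], topic: str = "secureiot.security-data",
                   servers: str = "localhost:9092") -> None:
    # Imported here so the module loads even where kafka-python is not installed
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=servers,
        value_serializer=serialize,
        key_serializer=lambda k: k.encode("utf-8"),
        acks="all",        # wait for all in-sync replicas to store the record
        retries=5,         # resend on transient errors (may create duplicates)
    )
    try:
        for event in events:
            # Keying by probe id keeps each probe's events ordered within one partition
            producer.send(topic, key=event["probe_id"], value=event)
        producer.flush()
    finally:
        producer.close()

print(serialize({"probe_id": "gw-1", "syn_count": 250}))
# b'{"probe_id":"gw-1","syn_count":250}'
```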

3.4 Data Storage Infrastructure

Data collected from the target IoT system are permanently stored for subsequent analysis and further training of the security analytics algorithms. Several alternatives for the persistent storage of these data are examined below.

The simplest and most primitive form of permanent data storage is provided by plain files. Files can easily store data sequentially by appending new data as they arrive. The disadvantage of plain files is that they provide no inherent support for searching or otherwise processing data, other than scanning them sequentially. Nevertheless, some primitive structuring of relatively small amounts of data may be achieved through the files themselves. For example, different files may be defined for data from different time periods or of different types, and this may be reflected in the file names.
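A minimal sketch of this primitive structuring appends security records as JSON lines to files whose names encode the data type and the day. The naming pattern (data type followed by the UTC date) and the record fields are hypothetical choices made for illustration.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Callable, Dict, List

def append_record(base_dir: str, data_type: str, record: Dict, ts: float) -> str:
    """Append one record to <base_dir>/<data_type>-<YYYY-MM-DD>.jsonl and return the path."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    path = os.path.join(base_dir, f"{data_type}-{day}.jsonl")
    with open(path, "a", encoding="utf-8") as f:       # append-only, sequential writes
        f.write(json.dumps(record) + "\n")
    return path

def scan(path: str, predicate: Callable[[Dict], bool]) -> List[Dict]:
    """Searching requires a sequential scan of the whole file."""
    with open(path, encoding="utf-8") as f:
        return [r for r in map(json.loads, f) if predicate(r)]

base = tempfile.mkdtemp()
p = append_record(base, "netflow", {"src": "10.0.0.5", "syn": 3}, ts=1538179200)
append_record(base, "netflow", {"src": "10.0.0.9", "syn": 120}, ts=1538179260)
print(os.path.basename(p))                 # netflow-2018-09-29.jsonl
print(scan(p, lambda r: r["syn"] > 100))   # [{'src': '10.0.0.9', 'syn': 120}]
```

The file name narrows a search to one day and one data type, but any condition on the content itself still requires reading every line.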

Typically, databases are employed for storing large amounts of data. Depending on the nature of the data, either SQL or NoSQL databases may be used. SQL databases are used for the storage and retrieval of structured data that can be modelled in tabular form. Tables store either the data themselves or relations between data. Table columns represent data attributes, while each table row contains a data record. It follows that data fitting this model must be very well structured, with each data record having a fixed number of attributes and each attribute being of a specified type. Moreover, normalization, the methodological approach to structuring SQL databases so as to remove redundancy and improve integrity, allows no multivalued attributes.

NoSQL databases take a different approach by allowing the storage and retrieval of unstructured data. They have become popular in big data applications, which need to process large amounts of data of different types coming from different sources. For such data with no particular structure, NoSQL databases are well suited, as they provide efficient storage, indexing and retrieval.

The following paragraphs briefly present two widely used NoSQL database systems, MongoDB and Elasticsearch.

MongoDB [27] is a widely used NoSQL database. It stores data as JSON-like documents, which means that documents may have arbitrary fields nested at arbitrary depth, different documents may have different structures or fields, and the document structure may change over time. MongoDB supports indexing of documents to facilitate efficient retrieval. Queries are expressed by referencing document fields; for example, a query may require fields to have specific values, values within a certain range, or values matching a regular expression. MongoDB is also a distributed database: it divides the data into shards, i.e., horizontal partitions of the data, which are maintained on different servers, and for each shard one or more replicas may be maintained. MongoDB could therefore be used for holding security data in the context of SecureIoT, since different IoT probes may generate security data with different structures, all of which can be stored as MongoDB documents.
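The three kinds of field queries mentioned above map naturally onto MongoDB filter documents. The sketch writes such filters as Python dictionaries using the operators $gte, $lt and $regex and, to stay self-contained, evaluates them with a tiny in-memory matcher that supports only equality, those three operators, and dotted paths into nested fields. With a real deployment, the same dictionaries would be passed to a driver call such as PyMongo's collection.find(filter). Field names and documents are hypothetical.

```python
import re
from typing import Any, Dict

def get_path(doc: Dict, path: str) -> Any:
    """Follow a dotted path such as 'content.syn_count' into nested documents."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc

def matches(doc: Dict, flt: Dict) -> bool:
    """Evaluate a MongoDB-style filter (subset: equality, $gte, $lt, $regex)."""
    for path, cond in flt.items():
        value = get_path(doc, path)
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$gte":
                    ok = value is not None and value >= arg
                elif op == "$lt":
                    ok = value is not None and value < arg
                elif op == "$regex":
                    ok = isinstance(value, str) and re.search(arg, value) is not None
                else:
                    raise ValueError(f"operator not supported in this sketch: {op}")
                if not ok:
                    return False
        elif value != cond:
            return False
    return True

# Documents with different structures, as produced by different probes
docs = [
    {"probe": "gw-1", "content": {"syn_count": 250, "iface": "eth0"}},
    {"probe": "gw-2", "content": {"syn_count": 12}},
    {"probe": "cam-7", "content": {"fw_version": "2.1.0"}},
]
flt = {"content.syn_count": {"$gte": 100, "$lt": 1000}, "probe": {"$regex": "^gw-"}}
print([d["probe"] for d in docs if matches(d, flt)])   # ['gw-1']
```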

Elasticsearch [28] is a distributed search and analytics engine for JSON documents, whereas MongoDB is primarily a document store database. It provides a REST API for real-time data collection and search. It supports both structured and unstructured data, including numbers, text and geolocations, and achieves good efficiency by indexing them appropriately.

Elasticsearch is based on the popular Lucene information retrieval library. It supports the modern architectural style of multitenancy, i.e., a single Elasticsearch deployment serving multiple tenants as opposed to one deployment per tenant. It is part of the Elastic Stack integrated suite, which includes the data collection engine Logstash and the visualization and analytics tool Kibana. As in MongoDB, data are partitioned into shards, which are replicated among servers.
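For comparison, a search in Elasticsearch is expressed in its JSON query DSL and sent to the _search endpoint of an index over the REST API. The sketch only builds the request body and prepares (but does not send) the HTTP request using the standard library; the index name, the field names and the choice of a bool query with range and prefix filters are assumptions for illustration.

```python
import json
import urllib.request
from typing import Dict

def build_query(min_syn: int, since: str) -> Dict:
    """Query DSL body: recent documents from gateway probes with many SYN packets."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"content.syn_count": {"gte": min_syn}}},
                    {"range": {"@timestamp": {"gte": since}}},
                    {"prefix": {"probe": "gw-"}},
                ]
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": 50,
    }

def search_request(base_url: str, index: str, body: Dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the index's _search endpoint."""
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = search_request("http://localhost:9200", "security-data", build_query(100, "now-15m"))
print(req.get_method(), req.full_url)   # POST http://localhost:9200/security-data/_search
# urllib.request.urlopen(req) would execute the search against a running cluster
```

Using filter clauses rather than scored queries suits security data, where the question is usually whether a document matches, not how relevant it is.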

Elasticsearch provides support for making applications GDPR compliant through a number of features [29]:

Access Controls: role-based access control, down to the field level, may be implemented to ensure that only authorized persons can access GDPR personal data in the Elasticsearch cluster.

Monitoring Access and Breaches: Elasticsearch audit and access logs may be combined with machine learning and alerting jobs for access monitoring and breach detection.

Pseudonymization: the Logstash fingerprint filter may be used to replace personal data with hashed values.

Encryption: TLS/SSL may be enabled to protect data in transit from snooping and tampering.
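The idea behind pseudonymization with hashed values can be sketched with a keyed hash: personal fields are replaced by an HMAC-SHA256 digest, so equal inputs still map to equal pseudonyms (records remain linkable for analytics) while the original value cannot be recovered without the key. The field list and the key handling are assumptions; this is not the Logstash implementation, although its fingerprint filter can likewise be configured with a key.

```python
import hashlib
import hmac
from typing import Dict, Iterable

def pseudonymize(record: Dict, fields: Iterable[str], key: bytes) -> Dict:
    """Return a copy of record with the given personal fields replaced by HMAC-SHA256 digests."""
    out = dict(record)
    for field in fields:
        if field in out and out[field] is not None:
            digest = hmac.new(key, str(out[field]).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()
    return out

KEY = b"replace-with-a-secret-from-a-key-store"   # hypothetical; never hard-code in practice
r1 = pseudonymize({"user": "alice", "src_ip": "10.0.0.5", "syn_count": 250}, ["user", "src_ip"], KEY)
r2 = pseudonymize({"user": "alice", "src_ip": "10.0.0.7", "syn_count": 3}, ["user", "src_ip"], KEY)
print(r1["user"] == r2["user"], r1["src_ip"] == r2["src_ip"])   # True False
print(r1["syn_count"])                                          # 250
```

A keyed hash is preferred over a plain hash here because short values such as IP addresses could otherwise be recovered by hashing every candidate value.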

Elasticsearch will be used as part of the SecureIoT infrastructure for storing security data that

are collected from SecureIoT probes.

4 SecureIoT Analytics Infrastructure

This chapter presents the analytics infrastructure that will be put in place to support the SecureIoT services. The analytics module is a core component of the SecureIoT architecture and is responsible for generating alerts when security issues in the target IoT system are detected. It makes use of custom analytics algorithms that monitor security-related data collected by probes from the target IoT system, as well as templates and rulesets maintained in a knowledge base.

4.1 Data Analytics in SecureIoT

4.1.1 Analytics Layers

SecureIoT makes use of predictive security analytics for identifying security issues in the target IoT system and the corresponding IoT application. The analytics components of the SecureIoT architecture cooperatively provide the core real-time functionalities of the SecureIoT platform. The SecureIoT analytics components are organized in the following layers:

Edge analytics components. These are lightweight components deployed at the edge nodes of the target IoT system. They implement simple analytics functions, such as data aggregations, statistical calculations or other application-specific calculations, and stream the results to the core analytics components. The results are streamed in the same way as the security data from the other probes of the SecureIoT platform. The data model presented in previous sections is generic enough to accommodate the data generated by the edge analytics components.

Core analytics components. These implement the predictive security analytics functions of the SecureIoT services. As shown in the SecureIoT architecture in Figure 4, they make use of large data sets and security templates to provide their services. The core analytics components run on an analytics platform.
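An edge analytics component of the kind described above could, for example, aggregate raw readings over fixed (tumbling) time windows and stream only the summaries. The sketch computes the count, mean and maximum of a metric per window; the window length, the metric and the output field names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

def window_summaries(samples: Iterable[Tuple[float, float]], window_s: float) -> List[Dict]:
    """Group (timestamp, value) samples into tumbling windows and summarize each one."""
    windows: Dict[int, List[float]] = {}
    for ts, value in samples:
        windows.setdefault(int(ts // window_s), []).append(value)
    return [
        {"window_start": idx * window_s, "count": len(vals),
         "mean": sum(vals) / len(vals), "max": max(vals)}
        for idx, vals in sorted(windows.items())
    ]

# SYN packets per second observed at an edge node (invented data)
samples = [(0, 2), (1, 4), (2, 3), (60, 150), (61, 170)]
for summary in window_summaries(samples, window_s=60):
    print(summary)
# {'window_start': 0, 'count': 3, 'mean': 3.0, 'max': 4}
# {'window_start': 60, 'count': 2, 'mean': 160.0, 'max': 170}
```

Streaming two summaries instead of five raw samples illustrates how edge aggregation reduces the volume sent to the core analytics components.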

4.1.2 Data analytics requirements

The major requirement for the data analytics framework to be used in SecureIoT is efficiency. Security-related incidents and issues that may occur in the target IoT system and application have to be detected as quickly as possible, given also that the analytics engine has to process large amounts of security-related data, either stored or streamed.

4.2 Data Analytics Framework in SecureIoT

There are a number of widely used big data platforms, several of them free and open source. This section gives a short overview of two of the most popular, Apache Hadoop and Apache Spark, and argues for the selection of the latter in the context of SecureIoT.


Project Title: SecureIoT Contract No. 779899 Project Coordinator: INTRASOFT International S.A.

Page | 40

D3.1 – Security Information Storage and Analytics Infrastructure

Version: v1.2 - Final, Date 29/09/2018

4.2.1 Apache Hadoop

Apache Hadoop [30] is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It started as a Yahoo project in 2006 and later became an Apache open source project. Hadoop comprises three main modules: the Hadoop Distributed File System (HDFS), the resource management and job scheduling module YARN, and the MapReduce programming model for processing large amounts of data. Hadoop is mainly oriented towards batch processing of data, as its jobs read their input from and write their output to HDFS.

At the core of Hadoop is MapReduce, a programming paradigm for processing big data sets that allows for parallelism and distribution. As the name implies, a MapReduce job comprises a Map step that performs filtering tasks, followed by a Reduce step that performs aggregation tasks on the outputs of the Map step.
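The two steps can be illustrated with a minimal, single-process Python sketch of the MapReduce model. It assumes hypothetical security log lines of the form `<node> <event>`: the map step filters failed-login events and emits (node, 1) pairs, a shuffle groups the pairs by key (in Hadoop this is done by the framework), and the reduce step sums them. It shows the programming model only, not Hadoop's Java API or its distribution of work across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_step(line: str) -> Iterator[tuple[str, int]]:
    # Filtering: keep only failed-login events and emit (node, 1).
    node, event = line.split(maxsplit=1)
    if event == "login_failed":
        yield node, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key (performed by the framework in Hadoop).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregation: total number of failed logins per node.
    return key, sum(values)

def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_step(line))
    return dict(reduce_step(k, vs) for k, vs in shuffle(intermediate).items())

logs = ["node-a login_failed", "node-b login_ok",
        "node-a login_failed", "node-b login_failed"]
print(map_reduce(logs))  # {'node-a': 2, 'node-b': 1}
```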

Hadoop is a highly fault-tolerant platform as it replicates data across many machines. Each file

is split into blocks, which are replicated across several machines, so that if a single machine

fails, the file can be rebuilt from other block replicas that reside in other machines.
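The block-and-replica scheme can be sketched as follows. The 128 MB block size and the replication factor of 3 are the usual HDFS defaults (`dfs.blocksize`, `dfs.replication`); the round-robin placement is a deliberate simplification, since HDFS itself uses a rack-aware placement policy, and the function names are hypothetical.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (dfs.blocksize)
REPLICATION = 3                 # default HDFS replication factor (dfs.replication)

def place_blocks(file_size: int, nodes: list[str]) -> list[list[str]]:
    """Split a file into fixed-size blocks and assign each block to REPLICATION distinct nodes.

    Round-robin placement is a simplification; HDFS uses a rack-aware policy.
    """
    if len(nodes) < REPLICATION:
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size / BLOCK_SIZE)
    return [[nodes[(b + r) % len(nodes)] for r in range(REPLICATION)]
            for b in range(n_blocks)]

def recoverable(placement: list[list[str]], failed: set[str]) -> bool:
    # The file can be rebuilt if every block still has at least one replica on a live node.
    return all(any(node not in failed for node in replicas) for replicas in placement)

nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_blocks(300 * 1024 * 1024, nodes)  # a 300 MB file -> 3 blocks
print(recoverable(placement, failed={"dn2"}))        # True
```

With three replicas per block, the file survives the loss of any two machines; only the simultaneous loss of all three holders of some block makes it unrecoverable.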

For data processing and machine learning, Hadoop can be combined with Apache Mahout [31]. Mahout is a distributed linear algebra framework that allows the implementation of distributed, scalable machine learning algorithms, mainly for collaborative filtering, clustering and classification; its original implementations of these algorithms run on top of MapReduce.

4.2.2 Apache Spark

Apache Spark [32] was initiated at the University of California, Berkeley and is now an Apache project. It is a highly efficient analytics engine for large-scale processing of either streaming or stored data. Spark runs on the JVM, either as a standalone platform or on a cluster using its own cluster manager; alternatively, it can run on top of Hadoop YARN. Spark applications can be written in Java, Python, Scala, R, and SQL.

Spark keeps the data being processed in memory whenever possible, which accounts for the high performance of the platform. The same holds for any generated data. In-memory data are transferred to permanent storage only when the program explicitly writes them out. According to [32], Spark can run workloads up to 100 times faster than Hadoop MapReduce. Further reports state that Spark won the 2014 Gray Sort Benchmark [33] (Daytona 100TB category), sorting 100TB of data in 23 minutes on a cluster of 206 nodes, whereas the previous world record of 72 minutes had been set by a Hadoop MapReduce cluster of 2100 nodes.
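The benchmark figures quoted above can be put side by side with some simple arithmetic, using only the numbers in the text:

```latex
\frac{t_{\text{Hadoop}}}{t_{\text{Spark}}} = \frac{72\,\text{min}}{23\,\text{min}} \approx 3.1,
\qquad
\frac{100\,\text{TB} \,/\, (23\,\text{min} \times 206\ \text{nodes})}
     {100\,\text{TB} \,/\, (72\,\text{min} \times 2100\ \text{nodes})}
= \frac{72 \times 2100}{23 \times 206} = \frac{151200}{4738} \approx 31.9
```

In other words, Spark sorted roughly 21.1 GB per node-minute against roughly 0.66 GB per node-minute for the earlier Hadoop record (taking 1 TB = 1000 GB). The two clusters did not necessarily use comparable hardware, so the per-node ratio should be read as indicative only.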

Spark is built around the Resilient Distributed Dataset (RDD), an immutable, partitioned collection of data that is amenable to parallel processing across the nodes of a cluster. Along with RDDs, Spark creates a Directed Acyclic Graph (DAG) that models the relationships between data operations, i.e. how each RDD is derived from its inputs (its lineage). RDDs and the corresponding graph give Spark its fault tolerance capabilities: if a partition of an RDD is lost, for example because the machine that hosts it fails, Spark recomputes it from its input data by replaying the recorded operations, rather than relying on replicated copies.
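Lineage-based recovery can be illustrated with a toy Python model, assuming a dataset split into partitions held by different (simulated) workers. Each derived dataset records only its parent and the transformation applied, so a lost partition is recomputed from the source data on demand. The `ToyRDD` class is hypothetical and is not Spark's API, although `map`, `filter` and `collect` mirror the names of RDD operations.

```python
from typing import Callable, Optional

class ToyRDD:
    """Partitioned dataset that records how it was derived (its lineage).

    Hypothetical illustration only; not Spark's API.
    """

    def __init__(self, partitions: Optional[list[list]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None):
        self.parent, self.fn = parent, fn
        if partitions is not None:
            # Source dataset: partitions are assumed to be durable (e.g. read from HDFS).
            self._partitions = dict(enumerate(partitions))
            self.num_partitions = len(partitions)
        else:
            # Derived dataset: partitions are computed on demand from the parent.
            self._partitions = {}
            self.num_partitions = parent.num_partitions

    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def partition(self, i: int) -> list:
        if i not in self._partitions:
            # Never computed or lost: replay the recorded transformation on the parent.
            self._partitions[i] = self.fn(self.parent.partition(i))
        return self._partitions[i]

    def lose_partition(self, i: int) -> None:
        # Simulate the failure of the worker holding partition i of a derived dataset.
        if self.parent is not None:
            self._partitions.pop(i, None)

    def collect(self) -> list:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

source = ToyRDD([[1, 2, 3], [4, 5, 6]])
scaled = source.map(lambda x: x * 10)
alerts = scaled.filter(lambda x: x > 20)
print(alerts.collect())   # [30, 40, 50, 60]
scaled.lose_partition(1)  # the workers holding partition 1 fail...
alerts.lose_partition(1)
print(alerts.collect())   # ...and it is recomputed from the source: [30, 40, 50, 60]
```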


Spark overcomes the limitations of Hadoop that stem from the MapReduce model of data processing, whose main drawback is that it imposes a linear data flow: each job reads its input from disk, applies a map and a reduce step, and writes its output back to disk. This may be a limitation for certain types of application. Spark, on the other hand, supports iterative analytics algorithms, which repeatedly pass over the same in-memory data and allow for more sophisticated processing. Spark includes libraries for SQL (Spark SQL), streaming data (Spark Streaming), machine learning (MLlib), and graph computation (GraphX). It easily interfaces with a number of other platforms, including MongoDB, Elasticsearch, and HDFS.
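The difference matters for iterative algorithms, as the following Python sketch suggests. It uses a one-dimensional k-means with two clusters purely as an example of an iterative algorithm (it is not MLlib), and a hypothetical `load_dataset` that stands for reading the input from distributed storage and counts how often it is called. A chain of MapReduce-style jobs re-reads its input on every iteration, whereas an in-memory engine loads it once and reuses it.

```python
reads = {"count": 0}

def load_dataset() -> list[float]:
    # Stand-in for reading the input from disk/HDFS; counts how often it is called.
    reads["count"] += 1
    return [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]

def kmeans_step(data: list[float], centers: list[float]) -> list[float]:
    # One k-means iteration: assign points to the nearest center, then recompute each center as a mean.
    groups: list[list[float]] = [[] for _ in centers]
    for x in data:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        groups[nearest].append(x)
    return [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]

def run_chained_jobs(iterations: int) -> list[float]:
    # MapReduce style: each iteration is a separate job that re-reads its input.
    centers = [0.0, 5.0]
    for _ in range(iterations):
        centers = kmeans_step(load_dataset(), centers)
    return centers

def run_in_memory(iterations: int) -> list[float]:
    # Spark style: load once, keep the data in memory (cf. caching an RDD), then iterate.
    data = load_dataset()
    centers = [0.0, 5.0]
    for _ in range(iterations):
        centers = kmeans_step(data, centers)
    return centers
```

Running both functions with, say, five iterations yields the same cluster centers, but `load_dataset` is called five times in the first case and only once in the second; with real data sets on disk, those repeated reads dominate the running time.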

MLlib is Spark's scalable machine learning library, designed for applications that work on in-memory data.

4.2.3 SecureIoT Data Analytics framework

SecureIoT will use Apache Spark as the platform for its core analytics components. The main argument in favor of Spark is its speed and efficiency: since security-related issues in the target IoT system should be detected as quickly as possible, the speed of the analytics engine is a high-priority requirement.


5 Prototype Implementation and Demonstration

This chapter presents a prototype implementation and demonstration of the storage and analytics infrastructure that will be used for the SecureIoT trials, i.e., the components for data collection, transfer, storage, and analytics processing, along with their configuration and setup. The use and operation of the infrastructure are also presented: sample data are generated, and their collection and subsequent transfer to the storage and analytics modules are demonstrated.

Figure 20 shows an overview of the infrastructure setup. The data collection component runs Beats in containers on the monitored nodes. Beats collect data and ship them to Logstash. Logstash transforms the data into the SecureIoT internal format and sends them to both Elasticsearch and Kafka, where they can be queried by static or dynamic analysis tools.

Figure 20: Overview of the infrastructure setup.
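The Logstash stage of Figure 20 can be pictured as a small transform-and-fan-out function. The Python sketch below does not use Logstash itself; it assumes a simplified Metricbeat-like event in which only `@timestamp`, `host.name` and `system.cpu.total.pct` are used (field names that should be checked against the Beats version in use), a hypothetical mapping from host names and field paths to the SecureIoT platform, node and metric IDs that appear as examples in this chapter, and two in-memory lists standing in for the Elasticsearch and Kafka outputs.

```python
from typing import Callable, Optional

# Hypothetical lookup tables from Beats host names / field paths to SecureIoT IDs
# (the IDs are the example values used in this chapter).
NODES = {"server-01": ("6b2f9297-6c5e-4859-b82d-da5dbbaabd3f",    # platform ID
                       "82a26f51-0695-4cbd-9736-81291f354fc0")}   # node ID
METRICS = {("system", "cpu", "total", "pct"): "d6c25553-9bca-4334-b08b-eedd62155599"}

def get_path(event: dict, path: tuple) -> Optional[object]:
    # Walk a nested dict; return None if any key along the path is missing.
    current = event
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def to_internal(event: dict) -> list[dict]:
    """Transform one Beats event into SecureIoT internal-format records."""
    platform, node = NODES[event["host"]["name"]]
    records = []
    for path, metric in METRICS.items():
        value = get_path(event, path)
        if value is not None:
            records.append({"platform": platform, "node": node, "metric": metric,
                            "time": event["@timestamp"], "value": value})
    return records

def ship(event: dict, outputs: list[Callable[[dict], None]]) -> None:
    # Fan out: every record goes to every output (stand-ins for Elasticsearch and Kafka).
    for record in to_internal(event):
        for output in outputs:
            output(record)

elasticsearch_docs: list[dict] = []
kafka_messages: list[dict] = []
event = {"@timestamp": "2018-08-25T08:00:00.000Z", "host": {"name": "server-01"},
         "system": {"cpu": {"total": {"pct": 0.05}}}}
ship(event, [elasticsearch_docs.append, kafka_messages.append])
print(elasticsearch_docs == kafka_messages, elasticsearch_docs[0]["value"])  # True 0.05
```

Sending the same record to both outputs is what allows the same data to be analysed at rest and in transit.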

The following sections give details of the interfaces among components of the infrastructure.

5.1 Data collection

Data collection has been implemented as a Spring Boot application. The component uses MongoDB to maintain its internal state, which comprises (1) the metrics it can collect, (2) the


platforms and the nodes it collects metrics from, and (3) the collectors it has spawned for that

purpose.

Data collection exposes the following REST API that allows users to monitor different metrics on

different nodes that belong to different platforms.

Title: Create a collector
Description: Creates a collector for the given metric on the given node.
URL: /collectors
Method: POST
Request headers:
  Content-Type: application/json
Request body: { "metric": "...", "node": "..." }
  metric: The metric to collect.
  node: The node to collect the metric from.
Status codes:
  201 (Created): The collector was created.
  400 (Bad Request): The request was invalid (e.g. the metric was missing).
  500 (Internal Server Error): The collector failed to be created.
Response headers:
  Content-Type: application/json
  Location: …
Response body: { "id": "...", "metric": "...", "node": "...", "status": "..." }
  id: The ID of the collector.
  metric: The metric that the collector collects.
  node: The node where the collector collects the metric from.
  status: The status of the collector.


Example
Request: { "metric": "d6c25553-9bca-4334-b08b-eedd62155599", "node": "82a26f51-0695-4cbd-9736-81291f354fc0" }
Response: { "id": "bb142fea-a242-4f7c-a6e1-e87a70099755", "metric": "d6c25553-9bca-4334-b08b-eedd62155599", "node": "82a26f51-0695-4cbd-9736-81291f354fc0", "status": "stopped" }

Notes

The collector is only created; it is not started.

The status of the new collector is stopped.
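A client-side sketch of the create call, using only the Python standard library, might look as follows. The base URL is a placeholder assumption, since the deployment address of the data collection service is not given here; the path, method, header and body follow the table above, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: address of the data collection service

def create_collector_request(metric_id: str, node_id: str) -> urllib.request.Request:
    # POST /collectors with a JSON body {"metric": ..., "node": ...}
    body = json.dumps({"metric": metric_id, "node": node_id}).encode("utf-8")
    return urllib.request.Request(f"{BASE_URL}/collectors", data=body, method="POST",
                                  headers={"Content-Type": "application/json"})

def create_collector(metric_id: str, node_id: str) -> dict:
    # Sends the request; urlopen raises HTTPError for 400 and 500 responses.
    with urllib.request.urlopen(create_collector_request(metric_id, node_id)) as resp:
        if resp.status != 201:
            raise RuntimeError(f"unexpected status {resp.status}")
        return json.load(resp)
```

Following the notes above, the collector returned by `create_collector` is expected to have the status `stopped` until it is explicitly started.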

Title: Start a collector
Description: Starts the collector with the given ID.
URL: /collectors/:id/start
Method: POST
Request parameters:
  id: The ID of the collector.
Status codes:
  204 (No Content): The collector was started.
  404 (Not Found): The collector with the given ID was not found.
  409 (Conflict): The collector was in an invalid state.
  500 (Internal Server Error): The collector failed to be started.
Notes:
  The status of the collector must be stopped before it can be started.
  Once it has been started, the status of the collector is changed to running.


Title: Stop a collector
Description: Stops the collector with the given ID.
URL: /collectors/:id/stop
Method: POST
Request parameters:
  id: The ID of the collector.
Status codes:
  204 (No Content): The collector was stopped.
  404 (Not Found): The collector with the given ID was not found.
  409 (Conflict): The collector was in an invalid state.
  500 (Internal Server Error): The collector failed to be stopped.
Notes:
  The status of the collector must be running before it can be stopped.
  Once it has been stopped, the status of the collector is changed to stopped.

Title: Delete a collector
Description: Deletes the collector with the given ID.
URL: /collectors/:id
Method: DELETE
Request parameters:
  id: The ID of the collector.
Status codes:
  204 (No Content): The collector was deleted.
  404 (Not Found): The collector with the given ID was not found.
  409 (Conflict): The collector was in an invalid state.
  500 (Internal Server Error): The collector failed to be deleted.
Notes:
  The collector must be stopped before it can be deleted.
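Taken together, the notes on the create, start, stop and delete endpoints describe a small state machine. The following in-memory Python sketch models those rules, assuming (as the tables suggest) that an unknown ID yields 404 and a request made in the wrong state yields 409. It only illustrates the documented behaviour; it is not the service's implementation, does not model the 400 and 500 cases, and the class name is hypothetical.

```python
import uuid
from http import HTTPStatus

class CollectorRegistry:
    """In-memory model of the collector lifecycle: stopped <-> running, delete only when stopped."""

    def __init__(self) -> None:
        self.collectors: dict[str, dict] = {}

    def create(self, metric: str, node: str) -> tuple[HTTPStatus, dict]:
        cid = str(uuid.uuid4())
        # New collectors are created but not started.
        self.collectors[cid] = {"id": cid, "metric": metric, "node": node, "status": "stopped"}
        return HTTPStatus.CREATED, self.collectors[cid]

    def _transition(self, cid: str, required: str, new: str) -> HTTPStatus:
        collector = self.collectors.get(cid)
        if collector is None:
            return HTTPStatus.NOT_FOUND
        if collector["status"] != required:
            return HTTPStatus.CONFLICT
        collector["status"] = new
        return HTTPStatus.NO_CONTENT

    def start(self, cid: str) -> HTTPStatus:
        return self._transition(cid, required="stopped", new="running")

    def stop(self, cid: str) -> HTTPStatus:
        return self._transition(cid, required="running", new="stopped")

    def delete(self, cid: str) -> HTTPStatus:
        collector = self.collectors.get(cid)
        if collector is None:
            return HTTPStatus.NOT_FOUND
        if collector["status"] != "stopped":
            return HTTPStatus.CONFLICT
        del self.collectors[cid]
        return HTTPStatus.NO_CONTENT

registry = CollectorRegistry()
_, collector = registry.create("metric-1", "node-1")
registry.start(collector["id"])   # 204: stopped -> running
registry.delete(collector["id"])  # 409: a running collector cannot be deleted
```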

Title: Search for collectors
Description: Searches for collectors that match the given criteria.
URL: /collectors/search
Method: POST


Request headers:
  Content-Type: application/json
Request body: { "metric": "...", "node": "...", "platform": "...", "status": "..." }
  metric: The metric that the collector collects.
  node: The node where the collector collects the metric from.
  platform: The platform where the collector collects the metric from.
  status: The status of the collector.
Status codes:
  200 (OK): Collectors were retrieved.
  500 (Internal Server Error): Collectors failed to be retrieved.
Response headers:
  Content-Type: application/json
  Location: …
Response body: { "collectors": [ { "id": "...", "metric": "...", "node": "...", "status": "..." }, ... ] }
  collectors: The collectors that match the given criteria.
  id: The ID of the collector.
  metric: The metric that the collector collects.
  node: The node where the collector collects the metric from.
  status: The status of the collector.

Example


Request: { "metric": "d6c25553-9bca-4334-b08b-eedd62155599" }
Response: { "collectors": [ { "id": "bb142fea-a242-4f7c-a6e1-e87a70099755", "metric": "d6c25553-9bca-4334-b08b-eedd62155599", "node": "82a26f51-0695-4cbd-9736-81291f354fc0", "status": "stopped" } ] }
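A matching Python sketch for the search endpoint builds the request body from whichever of the four documented criteria are supplied and extracts the collector IDs from a response. The base URL is again a placeholder assumption, and treating omitted criteria as "match anything" is an assumption based on the single-criterion example above; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: address of the data collection service
ALLOWED_CRITERIA = {"metric", "node", "platform", "status"}

def search_collectors_request(**criteria: str) -> urllib.request.Request:
    # Only the documented criteria are accepted; omitted ones are left out of the body.
    unknown = set(criteria) - ALLOWED_CRITERIA
    if unknown:
        raise ValueError(f"unsupported criteria: {sorted(unknown)}")
    body = json.dumps(criteria).encode("utf-8")
    return urllib.request.Request(f"{BASE_URL}/collectors/search", data=body, method="POST",
                                  headers={"Content-Type": "application/json"})

def collector_ids(response_body: str) -> list[str]:
    # Extract the IDs from a {"collectors": [ {...}, ... ]} response.
    return [c["id"] for c in json.loads(response_body)["collectors"]]

request = search_collectors_request(metric="d6c25553-9bca-4334-b08b-eedd62155599",
                                    status="stopped")
```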

Apart from the above endpoints, data collection also provides endpoints that allow users to create, update, delete, and search for platforms, nodes, and metrics.

Each collector is currently implemented as a Beat with the appropriate configuration. For example, a collector of system-level CPU usage on a server is an instance of Metricbeat deployed on that server and configured to collect CPU usage. We are already experimenting with running Beats in containers.
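Since each collector corresponds to a Beat configuration, creating a collector amounts to generating a configuration such as the one below. This is a hedged Python sketch: the keys `metricbeat.modules`, `module`, `metricsets`, `period` and `output.logstash` with `hosts` are standard Metricbeat settings, but the helper name, the 10-second default period and the `logstash:5044` address (5044 being the port conventionally used by the Logstash Beats input) are assumptions. The resulting dict mirrors the structure of `metricbeat.yml` and would be serialized with a YAML library before deployment.

```python
def metricbeat_config(metricset: str, period_s: int = 10,
                      logstash_hosts: tuple[str, ...] = ("logstash:5044",)) -> dict:
    """Build a Metricbeat configuration (as a dict) for one system-level metricset.

    Hypothetical helper: the dict mirrors the structure of metricbeat.yml.
    """
    return {
        "metricbeat.modules": [{
            "module": "system",
            "metricsets": [metricset],  # e.g. "cpu", "memory", "network"
            "period": f"{period_s}s",
        }],
        # Every Beat ships to Logstash, which forwards the data to Elasticsearch and Kafka.
        "output.logstash": {"hosts": list(logstash_hosts)},
    }

cpu_collector = metricbeat_config("cpu")
```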

All Beats are configured to ship their data to Logstash, which in turn sends them to both Elasticsearch and Kafka using the corresponding output plugins. That way, analysis can be done both on data at rest (Elasticsearch) and on data in transit (Kafka).

More information about the data collection component can be found at

https://gitlab.atosresearch.eu/secure-iot/data-collection.

5.2 Data storage

Data storage has also been implemented as a Spring Boot application. The component serves as an abstraction layer over Elasticsearch.

Data storage exposes the following REST API that allows users to query stored data.

Title: Query data
Description: Queries stored data.
URL: /collectors
Method: POST
Request headers:
  Content-Type: application/json


Request body: { "query": ... }
  query: The query to execute.
Status codes:
  200 (OK): Data were retrieved.
  500 (Internal Server Error): Data failed to be retrieved.
Response headers:
  Content-Type: application/json
  Location: …
Response body: { "data": [ { "platform": "...", "node": "...", "metric": "...", "time": "...", "value": ... }, ... ] }
  data: The data that match the given criteria.
  platform: The platform where the data were collected from.
  node: The node where the data were collected from.
  metric: The metric that the data are about.
  time: The date and time when the data were collected.
  value: The collected value.

Example
Request: { "query": { "match": { "node": "82a26f51-0695-4cbd-9736-81291f354fc0" } } }

Response: { "data": [ { "platform": "6b2f9297-6c5e-4859-b82d-da5dbbaabd3f", "node": "82a26f51-0695-4cbd-9736-81291f354fc0", "metric": "d6c25553-9bca-4334-b08b-eedd62155599", "time": "2018-08-25T08:00:00+0000", "value": 5.00 } ] }

The endpoint currently accepts queries in the Elasticsearch Query DSL. We may reconsider this approach in future versions.
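Because the endpoint accepts the Elasticsearch Query DSL, clients can express more selective queries than the single `match` in the example. The Python sketch below builds a request body that restricts results to one node, optionally one metric and a time window, using the standard `bool`, `filter`, `match` and `range` clauses. It assumes the stored documents use the field names shown in the response above and that `time` is mapped as a date field; the helper names are hypothetical.

```python
import json
from typing import Optional

def data_query(node: str, metric: Optional[str] = None,
               since: Optional[str] = None, until: Optional[str] = None) -> dict:
    """Build a {"query": ...} request body for the data storage query endpoint."""
    filters: list[dict] = [{"match": {"node": node}}]
    if metric is not None:
        filters.append({"match": {"metric": metric}})
    if since is not None or until is not None:
        bounds = {}
        if since is not None:
            bounds["gte"] = since  # inclusive lower bound
        if until is not None:
            bounds["lt"] = until   # exclusive upper bound
        filters.append({"range": {"time": bounds}})
    # Filter context: documents either match or not, so no relevance scoring is needed.
    return {"query": {"bool": {"filter": filters}}}

def extract_values(response_body: str) -> list[float]:
    # Pull the "value" fields out of a {"data": [...]} response.
    return [item["value"] for item in json.loads(response_body)["data"]]

body = data_query("82a26f51-0695-4cbd-9736-81291f354fc0", since="2018-08-25T00:00:00Z")
```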

More information about the data storage component can be found at

https://gitlab.atosresearch.eu/secure-iot/data-storage.


6 Conclusions

This document presents the Security Information Storage and Analytics Infrastructure that will be put in place for running the SecureIoT trials. The infrastructure is aligned with the SecureIoT architecture as defined in D2.4 [5] and comprises a number of open source components. Requirements for the parts of the infrastructure are stated and alternative technologies are presented for each of them. Based on the requirements, the document argues for the selection of the most appropriate technology to be used in the context of the project. The last chapter presents a prototype setup of the infrastructure based on the selected components and gives some examples of its functioning. The infrastructure will be configured for running the planned trials of the project and will be refined as needed. The final, refined version of the infrastructure will be presented in a follow-up document at the end of the project.


References

[1] https://secureiot.eu/

[2] https://en.wikipedia.org/wiki/Big_data

[3] Ioannis T. Christou, Emmanouil Amolochitis, Zheng-Hua Tan. “A Parallel/Distributed

Algorithmic Framework for Mining All Quantitative Association Rules”, April 2018,

https://arxiv.org/abs/1804.06764

[4] Quamar Niyaz, Weiqing Sun, Ahmad Y Javaid, and Mansoor Alam. “A Deep Learning

Approach for Network Intrusion Detection System”, IEEE Transactions on Emerging

Topics in Computational Intelligence, 2018.

[5] SecureIoT “D2.4 – Architecture and Technical Specifications”. J. Soldatos et al., 2018.

[6] Brian Russell, Drew Van Duren. “Practical Internet of Things Security”, Packt Publishing, 2016.

[7] https://nvd.nist.gov/vuln-metrics/cvss

[8] F. Carrez, T. Elsaleh, D. Gómez, L. Sánchez, J. Lanza and P. Grace. “A Reference

Architecture for federating IoT infrastructures supporting semantic interoperability”,

2017 European Conference on Networks and Communications (EuCNC), Oulu, 2017,

pp. 1-6. doi: 10.1109/EuCNC.2017.7980765

[9] SecureIoT “D2.1 – Reference Scenarios and Use Cases”. K. Kalaboukas et al., 2018.

[10] FIWARE Orion Context Broker documentation in Read The Docs. https://fiware-orion.readthedocs.io/en/master/index.html

[11] FIWARE NGSI API specification. http://telefonicaid.github.io/fiware-orion/api/v2/stable/

[12] FIWARE data models. https://github.com/Fiware/dataModels

[13] T. Günter. “OpenMTC – An open source implementation of the oneM2M standard”, FIWARE Global Summit, 2018, https://es.slideshare.net/FI-WARE/fiware-global-summit-openmtc-a-open-source-implementation-of-the-onem2m-standard

[14] ETSI GS CIM 004. “Context Information Management (CIM); Application Programming Interface (API)”, https://www.etsi.org/deliver/etsi_gs/CIM/001_099/004/01.01.01_60/gs_CIM004v010101p.pdf

[15] Semantic Sensor Network Ontology. https://www.w3.org/TR/vocab-ssn/

[16] M. Bermudez-Edo, T. Elsaleh, P. Barnaghi and K. Taylor. “IoT-Lite: A Lightweight

Semantic Model for the Internet of Things”, 2016 Intl. IEEE Conferences on Ubiquitous

Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and

Communications, Cloud and Big Data Computing, Internet of People, and Smart World

Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, 2016, pp. 90-97.

doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0035

[17] https://www.elastic.co/products/beats

[18] https://sematext.com/logagent/


[19] http://kafka.apache.org/documentation.html

[20] https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin

[21] http://apache.org/

[22] https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

[23] https://zookeeper.apache.org/

[24] http://www.rabbitmq.com

[25] Philippe Dobbelaere and Kyumars Sheykh Esmaili. “Kafka versus RabbitMQ: A comparative study of two industry reference publish/subscribe implementations: Industry Paper”, DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, 2017, pp. 227-238.

[26] Nicolas Nannoni. “Message-Oriented Middleware for Scalable Data Analytics

Architectures”, Master’s Thesis, KTH, Sweden, 2015.

[27] https://www.mongodb.com/

[28] https://www.elastic.co/

[29] https://www.elastic.co/gdpr

[30] https://hadoop.apache.org/

[31] https://mahout.apache.org/

[32] http://spark.apache.org/

[33] http://sortbenchmark.org/