Big Data Integrator Platform: Platform Architecture and Features
Dr. Hajira Jabeen, Technical Team Leader, BDE, University of Bonn
BDE Presentation, EBDVF 2017




BigDataEurope

Making Big Data Accessible

How can we make it easy?

Platform Goals

◎ Easy to:
  o Install
  o Develop
  o Deploy
  o Integrate

Platform Architecture

◎ Support Layer: Init Daemon, GUIs, Monitor
◎ App Layer: Traffic Forecast, Satellite Image Analysis, ...
◎ Platform Layer: Spark, Flink, Semantic Layer (Ontario, SANSA, Semagrow), Kafka, Real-time Stream Monitoring, ...
◎ Resource Management Layer (Swarm)
◎ Hardware Layer: Premises, Cloud (AWS, GCE, MS Azure, …)
◎ Data Layer: Hadoop, NOSQL Store, Cassandra, Elasticsearch, ..., RDF Store

BDE Supported Frameworks

◎ Search/indexing: Apache Solr
◎ Data acquisition: Apache Flume
◎ Message passing: Apache Kafka
◎ Data storage: Hue, Apache Cassandra, ScyllaDB, Apache Hive, Postgis
◎ Data processing: Apache Spark, Apache Flink
◎ Semantic Components: Strabon, Sextant, GeoTriples, Silk, SEMAGROW, LIMES, 4Store, OpenLink Virtuoso

Platform features

◎ BDE Development Environment
  o Stack builder
  o Workflow builder
  o Instructions to add custom components
◎ Administrative Interface
  o SwarmUI
  o Logger Interface
◎ UI Integrator
  o Workflow monitor
  o Integrated web interface

BDE Integrator UI Workflow

◎ Stack builder: Select components => (Push Create-Flow)
◎ WorkFlow builder: Arrange components => (Push Monitor)
◎ SwarmUI: See the scaling and scale up/down
◎ BDE Logger: Navigate the component UI and deploy jobs
◎ Git-clone
◎ New Stack
◎ Integrator UI
◎ WorkMonitor: Deployment status of components => (Push OK)
◎ BDE-IDE

Stack Builder

Stack Editor

Component

Services/dockers

BDE Workflow Builder

Component 1

Component 2

Component 3

BDE Workflow Monitor

Component 1: Finished

Component 2: Finished

Component 3: In progress
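The monitor view above can be read as a small state machine over an ordered list of components. The following is a minimal sketch under that reading, not the BDE implementation: the class name, the status strings, and the assumption that components run strictly one at a time are all illustrative.

```python
from typing import Dict, List, Optional

PENDING, IN_PROGRESS, FINISHED = "Pending", "In progress", "Finished"


class Workflow:
    """Ordered components that run one at a time (an assumption of this sketch)."""

    def __init__(self, components: List[str]) -> None:
        # dicts keep insertion order, so this also records the execution order
        self.status: Dict[str, str] = {name: PENDING for name in components}

    def start_next(self) -> Optional[str]:
        """Mark the first pending component as in progress and return its name."""
        if IN_PROGRESS in self.status.values():
            raise RuntimeError("a component is still in progress")
        for name, state in self.status.items():
            if state == PENDING:
                self.status[name] = IN_PROGRESS
                return name
        return None  # nothing left to start

    def finish_current(self) -> Optional[str]:
        """Mark the running component as finished and return its name."""
        for name, state in self.status.items():
            if state == IN_PROGRESS:
                self.status[name] = FINISHED
                return name
        return None  # nothing was running

    def report(self) -> List[str]:
        """One line per component, in the style of the monitor view."""
        return [f"{name}: {state}" for name, state in self.status.items()]


if __name__ == "__main__":
    wf = Workflow(["Component 1", "Component 2", "Component 3"])
    wf.start_next(); wf.finish_current()
    wf.start_next(); wf.finish_current()
    wf.start_next()
    print("\n".join(wf.report()))
```

Keeping the status in a single ordered mapping means the monitor output is just a projection of the workflow state, so the two cannot drift apart.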

Swarm UI: Pipeline

Increase number of instances
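The slides do not describe how SwarmUI changes the number of instances internally. On a plain Docker Swarm cluster the equivalent manual step is the `docker service scale` command, and the sketch below only builds that command line. The service name `spark-worker` is a hypothetical example, not a name taken from the platform.

```python
import shlex
from typing import Dict, List


def scale_command(replicas: Dict[str, int]) -> List[str]:
    """Build the argv for `docker service scale SERVICE=REPLICAS ...`."""
    if not replicas:
        raise ValueError("at least one service is required")
    for service, count in replicas.items():
        if count < 0:
            raise ValueError(f"replica count for {service!r} must be >= 0")
    return ["docker", "service", "scale"] + [
        f"{service}={count}" for service, count in replicas.items()
    ]


if __name__ == "__main__":
    # Hypothetical service name; print the command instead of running it.
    print(shlex.join(scale_command({"spark-worker": 3})))
```

Returning the argument list instead of executing it keeps the sketch free of side effects; it could be passed to `subprocess.run` on a Swarm manager node.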

Monitor

Integrator UI

Component 1 Component 2

Demo @ booth-10, 1st Floor

Maintenance and Uptake

◎ Open-Source, Community Driven
  o Commitment from core BDE consortium team
  o Independent BD components maintenance
  o Platform Maintenance driven by BDI users
◎ Adopters
  o Feuga, Eurostat, ILVO, I2cat, Vicomtech, IoF, ...
◎ Follow Up Projects
  o HOBBIT, Special, BigDataOcean, Qrowd, BETTER, …

BDE vs Hadoop distributions

| Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE |
|---|---|---|---|---|---|
| File System | HDFS | HDFS | NFS | HDFS | HDFS |
| Installation | Native | Native | Native | Native | Lightweight virtualization |
| Flexible Modular Architecture | No | No | No | No | Yes |
| High Availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self healing, mult. failure rec. | Single failure recovery (YARN) | Failure recovery |
| Cost | Commercial | Commercial | Commercial | Free | Free |
| Scaling | Freemium | Freemium | Freemium | Free | Free |
| Addition of custom components | Not easy | No | No | No | Yes |
| Integration testing | Yes | Yes | Yes | Yes | -- |
| Operating systems | Linux | Linux | Linux | Linux | Windows/Mac/Linux |
| Management tool | Ambari | Cloudera manager | MapR Control system | - | Docker swarm + Custom UI |

SANSA: Scalable Semantic Analytics Stack

SANSA: Vision

SANSA Layers

RDF to Tensors

Machine Learning Layer


Interactive SANSA in Browser

Semantic Data Lake

◎ Data Lake
  o Repository of data collected in its original formats
  o Structured, semi-structured, unstructured
  o Schema-less
◎ Semantic Data Lake
  o Add a Semantic Layer on top of source datasets
    ❖ The data is semantically lifted using ontologies
    ❖ Provide a uniform view over nonuniform data

Semantic Data Lake

Metadata: property -> data source (type)

Decomposing User Query

SPARQL query:
  ?item gho:Country ?country .
  ?item gho:Disease ?disease .
  ...

Translated query (SQL):
  SELECT country, disease, ...
  FROM Observations

Finding Relevant Data Sources + Queries Translation

Data sources: Database, XML File, MongoDB
Query languages: SQL, XPath, JSONPath

Execution Plan
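The decomposition step can be illustrated with a short sketch: the metadata maps each property to a data source and its type, the triple patterns of the SPARQL query are grouped by source, and each group is translated into that source's query language. This is not the Ontario or SANSA implementation. The `ex:Population` entry, the file name `population.xml`, and the rule that a column name is the lowercased local name of the property are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]   # (subject, property, object)
Source = Tuple[str, str]        # (data source, source type)

# Metadata: property -> data source (type). The last entry is hypothetical.
METADATA: Dict[str, Source] = {
    "gho:Country": ("Observations", "SQL"),
    "gho:Disease": ("Observations", "SQL"),
    "ex:Population": ("population.xml", "XPath"),
}


def decompose(patterns: List[Triple],
              metadata: Dict[str, Source]) -> Dict[Source, List[Triple]]:
    """Group triple patterns by the data source that holds their property."""
    groups: Dict[Source, List[Triple]] = defaultdict(list)
    for subject, prop, obj in patterns:
        if prop not in metadata:
            raise KeyError(f"no data source registered for {prop}")
        groups[metadata[prop]].append((subject, prop, obj))
    return dict(groups)


def to_sql(table: str, patterns: List[Triple]) -> str:
    """Translate one group into SQL (assumed rule: column = lowercased local name)."""
    columns = [prop.split(":", 1)[1].lower() for _, prop, _ in patterns]
    return f"SELECT {', '.join(columns)} FROM {table}"


if __name__ == "__main__":
    query = [
        ("?item", "gho:Country", "?country"),
        ("?item", "gho:Disease", "?disease"),
    ]
    for (source, kind), group in decompose(query, METADATA).items():
        if kind == "SQL":
            print(to_sql(source, group))
```

Only the SQL translation is sketched; XPath and JSONPath translators would follow the same pattern, one function per source type, and the execution plan would then join their results.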

Thank you!

BDI on Github: https://github.com/big-data-europe

Technical [email protected]

Project Website: www.big-data-europe.eu

SANSA Stack: https://github.com/SANSA-Stack [email protected]

Paris, EBDVF – 22nd November 2017

The mobility use case in Thessaloniki

◎ Multisource datasets (speed, traffic flow, travel time) are used in Thessaloniki to provide short-term traffic status prediction based on the recognition of mobility and traffic patterns.

◎ Machine learning techniques are integrated, using travel times, traffic counts and speeds, as well as the correlations of traffic speed, to train a Neural Network model for efficient and robust traffic speed prediction.
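Short-term prediction of this kind is usually framed as supervised learning over sliding windows of past measurements. The sketch below shows only that framing: it turns a speed series into (window, next value) training pairs and adds a naive window-mean predictor as a baseline. The Neural Network model itself is not described in the slides and is not reproduced here; the function names and the sample values are illustrative.

```python
from typing import List, Sequence, Tuple


def make_windows(speeds: Sequence[float],
                 lags: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `lags` consecutive speeds with the speed that follows it."""
    if lags < 1:
        raise ValueError("lags must be at least 1")
    return [
        (list(speeds[i:i + lags]), speeds[i + lags])
        for i in range(len(speeds) - lags)
    ]


def mean_baseline(window: Sequence[float]) -> float:
    """Naive predictor: the next speed equals the mean of the window."""
    return sum(window) / len(window)


if __name__ == "__main__":
    # Illustrative speeds for one road link, one value per time step.
    series = [50.0, 48.0, 45.0, 40.0, 42.0]
    for window, target in make_windows(series, lags=3):
        print(window, "->", target, "baseline:", round(mean_baseline(window), 2))
```

A trained model would replace `mean_baseline`, and the baseline then gives a reference error that the model has to beat.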

The datasets

◎ Floating Car Data
  o 500 – 2,500 speed measurements per minute
  o Location, speed, orientation, status
  o Hundreds of Gb (historical dataset)
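A Floating Car Data record with the four listed attributes, and a per-minute aggregation of the speed measurements, could be sketched as follows. The field names, the units (km/h, degrees, epoch seconds) and the status values are assumptions; the slides name the attributes but not their encoding.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FcdRecord:
    timestamp: int          # seconds since epoch (assumed)
    lat: float              # location
    lon: float
    speed_kmh: float        # speed, unit assumed
    orientation_deg: float  # heading in degrees (assumed)
    status: str             # vehicle status, free text here


def mean_speed_per_minute(records: Iterable[FcdRecord]) -> Dict[int, float]:
    """Average the reported speeds in each one-minute bucket."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for record in records:
        buckets[record.timestamp // 60].append(record.speed_kmh)
    return {minute: sum(v) / len(v) for minute, v in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [
        FcdRecord(0, 40.63, 22.94, 40.0, 90.0, "free"),
        FcdRecord(30, 40.63, 22.95, 50.0, 92.0, "free"),
        FcdRecord(60, 40.64, 22.95, 30.0, 95.0, "occupied"),
    ]
    print(mean_speed_per_minute(sample))
```

At 500 to 2,500 measurements per minute the same grouping would typically be done in a stream or batch engine such as the ones the platform bundles; the in-memory version only shows the logic.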

Mobility services in Thessaloniki


◎ TrafficThess (http://www.trafficthess.imet.gr)
  o Visual representation of the current as well as past speeds in Thessaloniki, Greece
  o Email notifications
  o Historical raw data export per link in open format
◎ TrafficPaths (http://www.trafficpaths.imet.gr)
  o Descriptive information of the current travel times wherever available (Thessaloniki, Patra, Irakleon, Serres, Kavala)
  o Mobile friendly web page
◎ TrafficThess Reports (http://www.trafficthessreports.imet.gr)
  o Visual and descriptive representation of the current traffic conditions (speeds & travel times) on the main roads of Thessaloniki, Greece
  o Highly customizable email notifications
  o Normalized historical data export per road in open format
  o Traffic calendar (powered by Google)
◎ BDE (http://trafficstatusprediction.imet.gr/#)

TrafficThess (http://www.trafficthess.imet.gr)
Reliable traffic conditions monitoring on a 24/7/365 basis

Traffic conditions in the city of Thessaloniki, Greece during snowfall on 10 & 11 Jan 2017:

https://youtu.be/2z12tUkuwaM (credits to anmpout for helping out with the video)

Keep calm! It’s just another congestion on the ring road…

TrafficPaths (http://www.trafficpaths.imet.gr)
Calculation of travel times on a 24/7/365 basis

TrafficThess Reports (http://www.trafficthessreports.imet.gr)

A personalized single point of access

• Datatank (back office + REST APIs)

• CKAN (front end)

http://opendata.imet.gr/dataset


DR. JOSEP MARIA SALANOVA [email protected] +30 2310 498 433