BASEL | BERN | BRUGG | BUCHAREST | DÜSSELDORF | FRANKFURT A.M. | FREIBURG I. BR. | GENEVA HAMBURG | COPENHAGEN | LAUSANNE | MANNHEIM | MUNICH | STUTTGART | VIENNA | ZURICH http://guidoschmutz.wordpress.com @gschmutz Streaming Visualization DOAG Konferenz 2019 Guido Schmutz

Streaming Visualization...Source: adapted from Tibco Edge. Apache Kafka –A Streaming Platform Kafka Cluster Consumer 1 Consume 2r Broker 1 Broker 2 Broker 3 Zookeeper Ensemble ZK



Agenda

1. Motivation / Introduction

2. Stream Data Integration & Stream Analytics Ecosystem

3. Three Blueprints for Streaming Visualization

End-to-End Demo available here: https://github.com/gschmutz/various-demos/tree/master/streaming-visualization


Guido

Working at Trivadis for more than 22 years

Consultant, Trainer, Platform Architect for Java, Oracle, SOA and Big Data / Fast Data

Oracle Groundbreaker Ambassador & Oracle ACE Director

@gschmutz guidoschmutz.wordpress.com



Motivation / Introduction


Timely decisions require new data immediately


Keep the data in motion …

[Diagram: Data at Rest vs. Data in Motion. Both sides show the steps Store, Analyze, Visualize and (Re)Act on a stream of bits (111010101010110).]

Reference Architecture for Data Analytics Solutions

[Diagram. Labels shown: Bulk Source (DB Extract, File, DB); Event Source (Location, IoT Data, Mobile Apps, Social, Telemetry); Event Stream; Data Flow (File Import / SQL Import, Change Data Capture); Event Hub; Edge Node with Rules, Event Hub and Storage; Big Data (Hadoop Cluster) with Storage (Raw, Refined), Parallel Processing, Results and SQL Export; Stream Analytics with Stream Processor (State, API); Microservices and Enterprise Apps (Logic, State, API); Search Service; SQL; BI Tools; Enterprise Data Warehouse; Search / Explore]

Two Types of Stream Processing (by Gartner)

Stream Data Integration

• focuses on the ingestion and processing of data sources targeting real-time extract-transform-load (ETL) and data integration use cases

• filter and enrich the data

Stream Analytics

• targets analytics use cases

• calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events)

• Complex events may signify threats or opportunities that require a response from the business

Gartner: Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
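
The difference between the two types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event fields (ts, lang, user_id), the user lookup table and the 60-second window are made up for the example, and a finite list stands in for an unbounded stream.

```python
from collections import Counter


def integrate(events, users):
    """Stream data integration: filter each event, then enrich it."""
    for event in events:
        if event["lang"] not in ("en", "de"):  # filter
            continue
        # enrich with a user name from a (hypothetical) lookup table
        yield {**event, "user_name": users.get(event["user_id"], "unknown")}


def count_per_window(events, window_seconds):
    """Stream analytics: aggregate events into tumbling time windows."""
    counts = Counter()
    for event in events:
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[window_start] += 1
    return dict(counts)


events = [
    {"ts": 5, "lang": "en", "user_id": 1},
    {"ts": 42, "lang": "fr", "user_id": 2},
    {"ts": 61, "lang": "de", "user_id": 2},
    {"ts": 65, "lang": "en", "user_id": 3},
]
users = {1: "alice", 2: "bob"}

enriched = list(integrate(events, users))
windows = count_per_window(enriched, 60)
```

The first function works event by event and needs no state; the second has to keep state (the counters) across events, which is what makes analytics on a stream harder than integration.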


Stream Data Integration & Stream Analytics Ecosystem


Stream Data Integration & Stream Analytics Ecosystem

[Diagram: product landscape split into Open Source and Closed Source, grouped by Stream Analytics, Event Hub, Stream Data Integration and Edge]

Source: adapted from Tibco

Apache Kafka – A Streaming Platform

[Diagram: Kafka Cluster with Broker 1, Broker 2 and Broker 3; Zookeeper Ensemble with ZK 1, ZK 2 and ZK 3; Producer 1, Producer 2 and Producer 3; Consumer 1, Consumer 2 and Consumer 3; Schema Registry (Service 1); Management: Control Center, Kafka Manager, KAdmin, kafkacat]

Data Retention:

• Never

• Time (TTL) or Size-based

• Log-Compacted based
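
A toy model of two of the retention options above, time-based retention and log compaction, might look as follows. This is not how Kafka implements them (Kafka works on log segments and is configured per topic, for example with cleanup.policy and retention.ms); the record layout with ts, key and value is an assumption for the example.

```python
def retain_by_time(log, now, ttl):
    """Time-based retention: keep only records younger than ttl."""
    return [record for record in log if now - record["ts"] < ttl]


def compact(log):
    """Log compaction: keep only the latest record per key, in log order."""
    latest_offset = {}
    for offset, record in enumerate(log):
        latest_offset[record["key"]] = offset
    keep = set(latest_offset.values())
    return [record for offset, record in enumerate(log) if offset in keep]


log = [
    {"ts": 10, "key": "truck-1", "value": "A"},
    {"ts": 20, "key": "truck-2", "value": "B"},
    {"ts": 30, "key": "truck-1", "value": "C"},
]

recent = retain_by_time(log, now=35, ttl=10)
compacted = compact(log)
```

With time-based retention old records disappear regardless of key; with compaction the log always holds at least the last known value for every key, which suits topics that represent current state.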


Apache Kafka – A Streaming Platform

[Diagram. Labels shown: Source Connector, Sink Connector, topic trucking_driver, KSQL Engine, Kafka Streams, Kafka Broker]

Demo using Kafka Stack for Stream Data Integration

[Diagram. Labels shown: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics, Stream Analytics, Data Flow ??, Consumer with Streaming Visualization]

Filter: #doag2019, …

User: @gschmutz

Demo: Kafka Connect to retrieve Tweets

curl -X "POST" "$DOCKER_HOST_IP:8083/connectors" \
     -H "Content-Type: application/json" \
     --data '{
  "name": "twitter-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
    "twitter.oauth.consumerKey": "xxxxx",
    "twitter.oauth.consumerSecret": "xxxxx",
    "twitter.oauth.accessToken": "xxxx",
    "twitter.oauth.accessTokenSecret": "xxxxx",
    "process.deletes": "false",
    "filter.keywords": "#doag2019",
    "filter.userIds": "15148494",
    "kafka.status.topic": "tweet-raw-v1",
    "tasks.max": "1"
  }
}'

Demo: KSQL for Streaming ETL

CREATE STREAM tweet_raw_s
  WITH (KAFKA_TOPIC='tweet-raw-v1', VALUE_FORMAT='AVRO');

CREATE STREAM tweet_s
  WITH (KAFKA_TOPIC='tweet-v1', VALUE_FORMAT='AVRO', PARTITIONS=8)
  AS SELECT id, createdAt, text, user->screenName
  FROM tweet_raw_s;

SELECT id, lang, removestopwords(split(LCASE(text), ' ')) AS word
FROM tweet_raw_s
WHERE lang = 'en' OR lang = 'de';

SELECT id, LCASE(hashtagentities[0]->text)
FROM tweet_raw_s
WHERE hashtagentities[0] IS NOT NULL;
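
What the last two queries compute per tweet can be mimicked roughly in plain Python. This is a sketch, not the KSQL implementation: the stop word list is hypothetical, removestopwords is assumed to simply drop listed words, and the tweet is reduced to the two fields used here.

```python
STOPWORDS = {"the", "a", "is", "der", "die", "das"}  # hypothetical list


def words(text):
    """Lower-case, split on blanks and drop stop words (and empty tokens)."""
    return [w for w in text.lower().split(" ") if w and w not in STOPWORDS]


def first_hashtag(tweet):
    """Lower-cased text of the first hashtag, or None if there is none."""
    tags = tweet.get("hashtagentities") or []
    return tags[0]["text"].lower() if tags else None


tweet = {"text": "The DOAG is great", "hashtagentities": [{"text": "DOAG2019"}]}

tweet_words = words(tweet["text"])
tweet_tag = first_hashtag(tweet)
no_tag = first_hashtag({"text": "no hashtags here"})
```

Where the KSQL query filters out tweets without a hashtag, the Python version returns None and leaves the filtering to the caller.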


Demo using Kafka Stack for Stream Data Integration

[Diagram. Labels shown: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics, Stream Analytics, Data Flow ??, Consumer with Streaming Visualization]

Filter: #voxxeddaysbanff, #java, #kafka, …

User: @VoxxedDaysBanff, @gschmutz

Visualization: many many options!

But do they all support Streaming Data?

Three Blueprints for Streaming Visualization

BP1: Fast datastore with regular polling from consumer

[Diagram. Data In Motion: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics. Data at Rest: Data Store with Storage and API. Consumer: Streaming Visualization]
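
The polling side of this blueprint can be sketched as follows. The in-memory FastStore class and its version counter are hypothetical stand-ins for a real data store and its API; the point is only that the consumer asks repeatedly and sees new data on its next poll at the earliest.

```python
class FastStore:
    """In-memory stand-in for the fast data store (hypothetical)."""

    def __init__(self):
        self._values = {}
        self.version = 0

    def upsert(self, key, value):
        # Called by the stream processing side whenever a result changes.
        self._values[key] = value
        self.version += 1

    def snapshot(self):
        return self.version, dict(self._values)


def poll(store, last_version):
    """One polling round: return data only if the store changed since last time."""
    version, values = store.snapshot()
    if version == last_version:
        return last_version, None
    return version, values


store = FastStore()
store.upsert("tweets_per_hour", 23)

seen, data = poll(store, 0)
seen_again, unchanged = poll(store, seen)
```

In a dashboard, poll would run on a timer; the polling interval is then an upper bound on how stale the visualization can be.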


BP1-1: Elasticsearch / Kibana

[Diagram: same architecture and labels as on the BP1 slide]

Alternatives: SOLR & Banana

BP1-2: InfluxDB / Grafana or Chronograf

[Diagram: same architecture and labels as on the BP1 slide]

Alternatives: Prometheus & Grafana, Druid & Superset

BP1-3: NoSQL & Custom Web

[Diagram: same architecture and labels as on the BP1 slide]

BP1-3: Demo Redis NoSQL & Custom Web

https://opensky-network.org/


BP1-4: Kafka Streams Interactive Query & Custom App

[Diagram: same architecture and labels as on the BP1 slide]

Alternatives: Flink, …

BP2: Direct Streaming to the Consumer

[Diagram. Data In Motion: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics, Channel/Protocol, API. Consumer: Streaming Visualization]

BP2-1: Kafka Connect to Slack / WhatsApp

[Diagram: same architecture and labels as on the BP2 slide]

Alternatives: Twitter, SMS, …

BP2-1: Demo Kafka Connect to Slack

curl -X "POST" "$DOCKER_HOST_IP:8083/connectors" \
     -H "Content-Type: application/json" \
     --data '{
  "name": "slack-sink",
  "config": {
    "connector.class": "net..SlackSinkConnector",
    "tasks.max": "1",
    "topics": "slack-notify",
    "slack.token": "XXXX",
    "slack.channel": "general",
    "message.template": "tweet by ${USER_SCREENNAME} with ${TEXT}"
  }
}'

BP2-2: Kafka to Tipboard (Dashboard Solution)

[Diagram: same architecture and labels as on the BP2 slide]

Alternatives: Dashing, Geckoboard, …

BP2-2: Demo Kafka to Tipboard (Dashboard Solution)

http://allegro.tech/tipboard/

BP2-2: Demo Kafka to Tipboard (Dashboard Solution)

import json
import requests

# Assumed to be defined elsewhere: c (a configured Kafka consumer),
# TILE_NAME, TILE_KEY and API_URL_PUSH.

def prepare_for_just_value(value):
    # Tile payload, e.g.
    # {"title": "Number of Tweets:", "description": "(1 hour)", "just-value": "23"}
    return {'title': '# Tweets:', 'description': 'per hour', 'just-value': value}

c.subscribe(['DASH_TWEET_COUNT_BY_HOUR_T'])

while True:
    msg = c.poll(1.0)
    if msg is None or msg.error():
        # Nothing arrived within the timeout, or the consumer reported an
        # error (assuming a confluent_kafka-style consumer).
        continue
    data = json.loads(msg.value().decode('utf-8'))
    data_selected = data.get('NOF_TWEETS')
    data_prepared = prepare_for_just_value(data_selected)
    data_jsoned = json.dumps(data_prepared)
    data_to_push = {'tile': TILE_NAME, 'key': TILE_KEY, 'data': data_jsoned}
    resp = requests.post(API_URL_PUSH, data=data_to_push)

BP2-3: Web Sockets / SSE & Custom Modern Web App

[Diagram: same architecture and labels as on the BP2 slide]

Server-Sent Events (SSE)
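
For the SSE variant, the server keeps an HTTP response open with content type text/event-stream and writes one text frame per event. A minimal sketch of the frame format is given here; the event name tweet-count and the payload are assumptions for the example, and the web server and Kafka consumer around it are left out.

```python
def format_sse(data, event=None, event_id=None):
    """Build one Server-Sent Events frame as text.

    Each frame consists of 'field: value' lines and ends with a blank line.
    Multi-line data is sent as several 'data:' lines.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for part in str(data).split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


frame = format_sse('{"NOF_TWEETS": 23}', event="tweet-count", event_id=1)
two_lines = format_sse("a\nb")
```

In the browser, an EventSource object would receive these frames and hand the data part to the visualization code.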


BP3: Streaming SQL Result to Consumer

[Diagram. Data In Motion: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics, API. Consumer: Streaming Visualization]

BP3-1: KSQL and Arcadia Data

[Diagram: same architecture and labels as on the BP3 slide]

BP3-1: Demo KSQL and Arcadia Data

https://www.arcadiadata.com/


BP3-2: KSQL with REST API to Custom Web App

[Diagram: same architecture and labels as on the BP3 slide]

BP3-2: Demo KSQL with REST API

curl -X POST -H 'Content-Type: application/vnd.ksql.v1+json' -i http://analyticsplatform:8088/query --data '{
  "ksql": "SELECT text FROM tweet_raw_s;",
  "streamsProperties": { "ksql.streams.auto.offset.reset": "latest" }
}'

{"row":{"columns":["The latest The Naji Filali Daily! https://t.co/9E6GonrySE Thanks to @Xavier_Porter1 @ClouMedia #ai #bigdata"]},"errorMessage":null,"finalMessage":null}

{"row":{"columns":["RT @Futurist_Invest: This robot can copy your face! Creepy \n\n#SaturdayThoughts#SaturdayMorning #creepy #bots #bot #AI #bigdata #robotics #…"]},"errorMessage":null,"finalMessage":null}

{"row":{"columns":["She’s back telling us all about why datathons are exciting now :) Catch her while you can! @ARUKscientist @S_Bauermeister #bigdata #ARUKConf https://t.co/Br484db5ut"]},"errorMessage":null,"finalMessage":null}

{"row":{"columns":["Blockchain Competitive Innovation Advantage"]},"errorMessage":null,"finalMessage":null}
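
A custom web app backend has to read this response line by line as it arrives. The sketch below parses lines in the format shown above; the sample list stands in for the HTTP response stream, and the last sample line (a message without a row) is an assumption about what the server may send, not something shown on the slide.

```python
import json


def iter_columns(lines):
    """Yield the column values of each row from a line-delimited JSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # the response may contain blank keep-alive lines
        message = json.loads(line)
        if message.get("errorMessage"):
            raise RuntimeError(message["errorMessage"])
        row = message.get("row")
        if row is not None:
            yield row["columns"]


sample = [
    '{"row":{"columns":["Blockchain Competitive Innovation Advantage"]},'
    '"errorMessage":null,"finalMessage":null}',
    "",
    '{"row":null,"errorMessage":null,"finalMessage":"done"}',
]

texts = [columns[0] for columns in iter_columns(sample)]
```

Because iter_columns is a generator, each row can be pushed on to the browser as soon as it is parsed instead of waiting for the query to end.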


BP3-3: Spark Streaming & Oracle Stream Analytics

[Diagram: same architecture and labels as on the BP3 slide]

BP3-3: Demo Spark Streaming & Oracle Stream Analytics

https://www.oracle.com/middleware/technologies/complex-event-processing.html


Summary

BP1: Fast Store & Polling

• “Classic” pattern

• Not end-to-end “data in motion”: the data is at rest before visualization

• The slight delay might not be acceptable for a monitoring dashboard

• Can use the full power of the data store(s) => NoSQL

• In-memory reduces overhead

BP2: Stream to Consumer

• Minimal latency

• More difficult on the “client side”

• Good if the stream directly holds what should be displayed

• More difficult if data in the stream needs to be analyzed before visualization

• No historical info available

BP3: Streaming SQL

• Minimal latency

• Power of the SQL query engine available for visualization

• Possibility for “self-service” style visualization

• Some analytics are more difficult on streaming data

• No historical info available
