How Spark is Enabling the New Wave of Converged Applications Tugdual Grall MapR Technologies

Spark Summit EU talk by Tug Grall


Page 1: Spark Summit EU talk by Tug Grall

How Spark is Enabling the New Wave of Converged Applications

Tugdual Grall MapR Technologies

Page 2: Spark Summit EU talk by Tug Grall

Decreasing Job Latencies

[Figure: latency axis running Hours, Mins, Secs, Milli Secs; labels: on-disk, in-memory, Tipping Point]

Page 3: Spark Summit EU talk by Tug Grall

Analytics & ETL: Batch or Continuous?

[Figure: two charts. One plots value of data against time since data is generated; the other plots value of data against volume of data used for analytics.]

It’s not either/or: you have to do both

Page 4: Spark Summit EU talk by Tug Grall

Why Stream Processing?

6:01 P.M.: 32°
6:02 P.M.: 32°
6:03 P.M.: 33°
6:04 P.M.: 36°
6:05 P.M.: 37°
6:06 P.M.: 36°
6:07 P.M.: 36°
6:08 P.M.: 35°
6:09 P.M.: 35°
6:10 P.M.: 35°
6:11 P.M.: 35°
6:12 P.M.: 35°
6:13 P.M.: 35°

Highlighted reading: 37°

It was hot at 6:05 yesterday!

Batch processing may be too late for some events

Page 5: Spark Summit EU talk by Tug Grall

Why Stream Processing?

It’s becoming important to process events as they arrive

[Diagram labels: event "6:05 P.M.: 37°"; Stream; Topic: Temperature; action: Turn on the air conditioning!]
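The contrast between the two "Why Stream Processing?" slides can be sketched in plain Python. This is a minimal illustration, not Spark code: the readings are the ones shown on the slide, while the threshold value and the function names `batch_max` and `stream_alerts` are assumptions made for the example.

```python
# Readings from the slide: (time, temperature in degrees).
READINGS = [
    ("6:01 P.M.", 32), ("6:02 P.M.", 32), ("6:03 P.M.", 33),
    ("6:04 P.M.", 36), ("6:05 P.M.", 37), ("6:06 P.M.", 36),
    ("6:07 P.M.", 36), ("6:08 P.M.", 35), ("6:09 P.M.", 35),
    ("6:10 P.M.", 35), ("6:11 P.M.", 35), ("6:12 P.M.", 35),
    ("6:13 P.M.", 35),
]

THRESHOLD = 37  # hypothetical alert threshold; the talk does not state one


def batch_max(readings):
    """Batch view: scan the complete data set after the fact."""
    return max(readings, key=lambda reading: reading[1])


def stream_alerts(readings, threshold=THRESHOLD):
    """Streaming view: react to each event at the moment it arrives."""
    for time, temp in readings:
        if temp >= threshold:
            yield f"{time}: {temp}° - turn on the air conditioning!"


if __name__ == "__main__":
    # Batch: we learn about the peak only once all data has been collected.
    print("Batch result:", batch_max(READINGS))
    # Stream: the alert is produced while iterating, one event at a time.
    for alert in stream_alerts(iter(READINGS)):
        print("Stream alert:", alert)
```

In the batch case the answer is only available after the whole data set has been scanned; in the streaming case the alert is emitted as soon as the 6:05 P.M. event is seen.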

Page 6: Spark Summit EU talk by Tug Grall

Advanced Analytics

Descriptive
● What Happened
● Why did it happen
● Discovery in nature
● Batch Analytics

Predictive
● What will happen
● Combines historical data with rules and algorithms
● ML (Batch + Real Time)

Streaming
● Analyse data as it happens
● Triggers and Alarms
● Anomaly detection
● Continuous ETL and Analytics

Prescriptive
● What + When + Why
● Suggestions to take advantage of future opportunity or mitigate risks
● Agility is key to success

There is a need to converge these analytics
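To make the "Triggers and Alarms" and "Anomaly detection" items concrete, here is a minimal plain-Python sketch. The talk does not describe a specific detection method; the rolling-mean rule, the `window` and `tolerance` parameters, and the function name `detect_anomalies` are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean


def detect_anomalies(values, window=3, tolerance=2.0):
    """Yield (index, value) for each value that deviates from the mean of
    the previous `window` values by more than `tolerance`.

    Hypothetical rule for illustration only; values are processed one at a
    time, so the function also works on an unbounded iterator.
    """
    recent = deque(maxlen=window)  # holds only the last `window` values
    for index, value in enumerate(values):
        if len(recent) == window and abs(value - mean(recent)) > tolerance:
            yield index, value
        recent.append(value)


if __name__ == "__main__":
    # First nine temperature readings from the earlier slide.
    temperatures = [32, 32, 33, 36, 37, 36, 36, 35, 35]
    for index, value in detect_anomalies(temperatures):
        print(f"anomaly at position {index}: {value}")
```

Because the detector keeps only a fixed-size window in memory, it can raise an alarm as data arrives rather than waiting for a batch job.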

Page 7: Spark Summit EU talk by Tug Grall

Converged Computing

               Offline       Real Time
Programmatic   Spark & ML    Spark Streaming
SQL            Spark SQL     Spark Structured Streaming
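One way to read the offline/real-time grid is that the same processing logic should be usable on data at rest and on data in motion. The following plain-Python sketch illustrates that idea only; it does not use Spark, and the names `update`, `offline_counts`, and `real_time_counts` as well as the page-view events are hypothetical.

```python
from collections import Counter


def update(counts, page):
    """Shared logic: fold one page-view event into the running counts."""
    counts[page] += 1
    return counts


def offline_counts(events):
    """Offline: process a complete, stored data set and return the result."""
    counts = Counter()
    for page in events:
        update(counts, page)
    return counts


def real_time_counts(events):
    """Real time: emit an updated snapshot after every incoming event."""
    counts = Counter()
    for page in events:
        yield dict(update(counts, page))


if __name__ == "__main__":
    events = ["home", "cart", "home"]  # hypothetical page-view events
    print("offline:", dict(offline_counts(events)))
    for snapshot in real_time_counts(iter(events)):
        print("real time:", snapshot)
```

Both paths call the same `update` function, so the final real-time snapshot matches the offline result for the same events.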

Page 8: Spark Summit EU talk by Tug Grall

The Many “Convergences” In Progress

CONVERGENCE

On Prem & Cloud

Analytics & Operations

Data at Rest & Data in Motion

Storage & Compute

Files, Tables, Stream data

Page 9: Spark Summit EU talk by Tug Grall

Spark on Non-Converged Platform

[Diagram: Real-Time Producers publish to Kafka topics. Three separate clusters (Cluster 1, Cluster 2, Cluster 3) host the Kafka Cluster, Hadoop/S3 Storage, and a NoSQL Database, each with its own Management, Monitoring, and Security. Outputs: Advanced Analytics and Real-time dashboards.]

• Redundant 3x Management, Monitoring and Security
• Redundant 3x Data Storage

Page 10: Spark Summit EU talk by Tug Grall

Converged Computing & Converged Data Management

Page 11: Spark Summit EU talk by Tug Grall

MapR Converged Data Platform

[Diagram labels:]
● Open Source Engines & Tools; Commercial Engines & Applications; Custom Apps; Cloud and Managed Services; Search and Others
● Data Processing
● APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
● Web-Scale Storage: MapR-FS
● Database: MapR-DB
● Event Streaming: MapR Streams
● Enterprise-Grade Platform Services: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace
● Unified Management and Monitoring

Page 12: Spark Summit EU talk by Tug Grall

SAMPLE CUSTOMER USE CASE

Page 13: Spark Summit EU talk by Tug Grall

Customer 360 & Behavior prediction

[Diagram: data sources feed stream topics for Real Time/Offline ClickStream Analysis on a Datalake/DataHub.]

Data sources:
● Website Click-Stream
● Internal Data Sources
● External Data Sources
● Support Tickets
● DBMS
● Email
● CRM

Models:
● Prediction Modelling
● Attribution Modelling
● Cohort Analysis
● Customer Lifetime Value
● Attrition Modelling
● Response Modelling
● Churn Modelling

Platform: Datalake/DataHub (Offline and Real Time) with HA, DR, NFS, Snapshots, Data Protection

Benefits:
● Eliminate latency due to data movement between clusters
● Eliminate redundant storage with MapR Streams and lower the TCO

Outcomes:
● 360 Degree Customer View
● Customer Behavior Prediction
● Better Conversion Rate and Lower Attrition $$$

Page 14: Spark Summit EU talk by Tug Grall

STREAMING FIRST ARCHITECTURE

Page 15: Spark Summit EU talk by Tug Grall

What Do We Exactly Need to Do?

[Diagram labels: Data Sources; Collect Data; Process Data; Store Data; Serve Data; Stream; Topic; NFS/POSIX]

Page 16: Spark Summit EU talk by Tug Grall

Trinity of Real Time

[Diagram labels: Real-Time Producers; Topics; Global Messaging System; Transformational Tier; Operational NoSQL/Document Database; Real Time Analytics]

Page 17: Spark Summit EU talk by Tug Grall

Continuous Streaming ETL & Computed Analytics

[Diagram labels: DB; Application; Topics; Search Application; Multi-Tier Data Archival; Advanced ML Analytics]

● 60 events/sec
● 10 MB/event
● Table-based topics

Aggregates (Pre-Computed and On-Demand):
● Level 1 Aggregates
● Level 2 Aggregates
● Level 3 Aggregates
● Delta Aggregates

Pre-compute analytics with Spark Streaming on Data-in-motion
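The figures on this slide imply a sustained ingest rate, and the aggregate levels suggest incremental roll-ups maintained while data is in motion. The sketch below works through both in plain Python, not Spark Streaming. The talk does not define the aggregate levels, so the minute/hour/day granularities, the `(timestamp_in_seconds, value)` event shape, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

EVENTS_PER_SEC = 60  # from the slide
MB_PER_EVENT = 10    # from the slide


def throughput_mb_per_sec(events_per_sec=EVENTS_PER_SEC, mb_per_event=MB_PER_EVENT):
    """Sustained ingest rate implied by the slide's figures."""
    return events_per_sec * mb_per_event


def roll_up(events):
    """Maintain three levels of pre-computed sums in a single pass.

    `events` is an iterable of (timestamp_in_seconds, value) pairs.
    Hypothetical granularities: level 1 per minute, level 2 per hour,
    level 3 per day. Each event updates all levels as it arrives.
    """
    level1 = defaultdict(int)
    level2 = defaultdict(int)
    level3 = defaultdict(int)
    for timestamp, value in events:
        level1[timestamp // 60] += value
        level2[timestamp // 3600] += value
        level3[timestamp // 86400] += value
    return dict(level1), dict(level2), dict(level3)


if __name__ == "__main__":
    print("MB per second:", throughput_mb_per_sec())
    sample = [(0, 1), (59, 2), (60, 3), (3600, 4)]  # hypothetical events
    per_minute, per_hour, per_day = roll_up(sample)
    print(per_minute, per_hour, per_day)
```

At 60 events/sec and 10 MB/event the pipeline has to absorb 600 MB every second, which is why the aggregates are updated incrementally instead of being recomputed from the raw data on demand.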

Page 18: Spark Summit EU talk by Tug Grall

Q&A

1. Read explanation of and download code
   – https://www.mapr.com/blog/fast-scalable-streaming-applications-mapr-streams-spark-streaming-and-mapr-db
   – https://www.mapr.com/blog/spark-streaming-hbase
2. Get Started: MapR Converged Data Platform
   https://www.mapr.com/get-started-with-mapr
3. Get Answers: MapR Converge Community
   https://community.mapr.com/community/answers
4. Get Trained: MapR On-Demand Training
   https://learn.mapr.com

Engage with us!

Page 19: Spark Summit EU talk by Tug Grall

THANK YOU.