Big Data Use Cases in Europe: Experiences from the field

Marton Balassi | Solutions Architect | Flink PMC | @MartonBalassi | [email protected]

© Cloudera, Inc. All rights reserved.

Source: BI Consulting, biconsulting.hu/letoltes/2017budapestdata/balassi_marton...

Introduction

• As a Solutions Architect I have worked with 20+ customers in Europe during the last year

• Focused on architecture, but also involved in implementation

• My favorite topics are stream processing and data science

• Let me share some of the uplifting and the challenging lessons learned from colleagues of mine and my own experience

• Solutions from Telco, Finance, Retail, Gaming, Data Science

• Disclaimer: My view is my own, subjective and inherently partial.

Let us do our first Hadoop PoC

What is the most common first Hadoop use case?

Data warehouse offloading

• Reproduce an RDBMS-based report

• Easily comparable results

• Ingestion (Sqoop, Flume, Gobblin)

• Storage (HDFS, Kudu, HBase)

• Interactive Query (Impala, Spark SQL, Hive LLAP, Presto)

• User interface (Hue, Zeppelin)
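
Because the point of this PoC is that results are easy to compare, the check itself can be automated. The following is a minimal sketch, assuming both reports have already been fetched into plain dictionaries keyed by the report's grouping column; the function name, the sample figures, and the tolerance are illustrative and not taken from any customer project.

```python
import math


def compare_reports(baseline, candidate, rel_tol=1e-9):
    """Return a list of differences between two reports.

    Each report is a dict mapping a grouping key to a numeric measure.
    An empty list means the offloaded report reproduces the baseline.
    """
    diffs = []
    for key in sorted(set(baseline) | set(candidate)):
        if key not in candidate:
            diffs.append(f"{key}: missing from candidate")
        elif key not in baseline:
            diffs.append(f"{key}: unexpected in candidate")
        elif not math.isclose(baseline[key], candidate[key], rel_tol=rel_tol):
            diffs.append(f"{key}: {baseline[key]} != {candidate[key]}")
    return diffs


# Hypothetical revenue-per-country report from the RDBMS and from the cluster.
rdbms_report = {"DE": 1200.50, "HU": 310.00, "UK": 875.25}
hadoop_report = {"DE": 1200.50, "HU": 310.00, "UK": 875.20}

for line in compare_reports(rdbms_report, hadoop_report):
    print(line)
```

A relative tolerance is used instead of strict equality because different engines may round floating-point aggregates differently.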

Let us see some more interesting use cases

Syslog ingest @ Vodafone UK

• SIEM/Cybersecurity depends on the input data quality and quantity

• Facilitates fault monitoring, threat intelligence, incident response, and litigation

• Data is collected at the national level from TCP and UDP syslog

Tristan Stevens, https://blog.cloudera.com/blog/2016/03/building-benchmarking-and-tuning-syslog-ingest-architecture/
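
To illustrate what input data quality means for syslog, here is a minimal sketch of validating the priority header of an incoming line. It assumes the classic BSD syslog (RFC 3164) layout, where the message starts with `<PRI>` and PRI = facility * 8 + severity; the slides do not say which syslog format was used, and the function name is hypothetical.

```python
import re

# Severity names in numeric order 0..7, as defined for BSD syslog.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

_PRI = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_priority(line):
    """Split a syslog line into facility, severity and the remaining message.

    Returns None for lines without a valid <PRI> header so that malformed
    input can be counted or routed aside instead of polluting the data set.
    """
    match = _PRI.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid value
        return None
    facility, severity = divmod(pri, 8)
    return {
        "facility": facility,
        "severity": SEVERITIES[severity],
        "message": match.group(2),
    }


print(parse_priority("<34>Oct 11 22:14:15 mymachine su: 'su root' failed"))
```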

Syslog ingest @ Vodafone UK

• Ingestion with Flume, Kafka

• Interactive queries with Impala

• Free-text search with Solr

• Machine Learning with Spark MLlib

Tristan Stevens, https://blog.cloudera.com/blog/2016/03/building-benchmarking-and-tuning-syslog-ingest-architecture/

Augmenting the log analytics pipeline

Michael Sun and Jeff Shmain, https://blog.cloudera.com/blog/2017/03/how-to-log-analytics-with-solr-spark-opentsdb-and-grafana/

• Error tracking (Solr/Hue)

• Custom monitoring (OpenTSDB/Grafana)

Search is not solely for text

• Search works on the distance between features

• The canonical example is searching words in documents

• Searching dresses by color or shape is also possible (given we can describe a shape)

• Implementation relies on Solr

Base implementation by Mathias Lux, https://github.com/dermotte/liresolr. Use case by Nihed Mbarek.
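
The idea of searching by feature distance can be sketched in a few lines. This is a brute-force illustration only, assuming each item is described by an RGB color vector and ranked by Euclidean distance; it does not show how Solr or the liresolr plugin index or score images, and the catalog entries are made up.

```python
import math


def nearest(query, catalog, k=2):
    """Return the names of the k catalog items closest to the query vector."""
    ranked = sorted(catalog, key=lambda name: math.dist(query, catalog[name]))
    return ranked[:k]


# Hypothetical catalog: item name -> dominant color as an (R, G, B) feature vector.
dresses = {
    "red dress": (200, 30, 40),
    "navy dress": (20, 30, 90),
    "coral dress": (240, 110, 90),
}

# Search for items close to a reddish query color.
print(nearest((220, 50, 50), dresses))
```

The same ranking works for any feature that can be expressed as a numeric vector, which is why a describable shape is enough to make shapes searchable.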

Near real-time transactional analytics system @ Santander

• Bank card transactions data

• “Spendlytics” app

• Stored in HBase to serve the frontend

• Ingested through Flume/Kafka

• Enriched from local RocksDB instances

James Kinley, Ian Buss, and Rob Siwicki, http://blog.cloudera.com/blog/2015/08/inside-santanders-near-real-time-data-ingest-architecture/
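
The enrichment step can be pictured as a lookup against a key-value store that lives next to the processing code, so no network call is needed per transaction. Below is a minimal sketch in which a plain dictionary stands in for the local RocksDB instance; the field names and the merchant reference data are hypothetical and not taken from the Santander system.

```python
def enrich(transaction, merchants):
    """Return a copy of the transaction with merchant details attached.

    `merchants` is any local mapping from merchant id to reference data;
    here a dict plays the role of the embedded key-value store.
    """
    merchant = merchants.get(transaction["merchant_id"], {})
    enriched = dict(transaction)  # leave the incoming record untouched
    enriched["merchant_name"] = merchant.get("name", "unknown")
    enriched["category"] = merchant.get("category", "uncategorised")
    return enriched


# Hypothetical reference data loaded into the local store.
local_store = {"m-001": {"name": "Corner Cafe", "category": "eating out"}}

print(enrich({"card": "c-42", "merchant_id": "m-001", "amount": 4.50}, local_store))
```

Falling back to default values keeps the pipeline flowing when reference data for a merchant has not arrived yet.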

Scalable Real-Time Analytics Platform @ King.com

• Low latency Gaming analytics

• Analysts write Groovy scripts

• Deployed in Apache Flink

• 30 billion events/day

• RocksDB state in TB scale

• State is queryable from the outside

Gyula Fora, Mattias Andersson, https://data-artisans.com/blog/rbea-scalable-real-time-analytics-at-king
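
For a sense of scale, 30 billion events per day averages out to roughly 347,000 events per second. The notion of keyed state that can be queried from the outside can be sketched as follows. This is a single-process illustration, assuming events are dictionaries with hypothetical `player` and `type` fields; it uses neither Flink nor RocksDB and says nothing about how the platform at King is implemented.

```python
from collections import defaultdict


class KeyedCounter:
    """Keeps one counter per (player, event type) key, updated per event."""

    def __init__(self):
        self._state = defaultdict(int)

    def process(self, event):
        """Update the state for the key that this event belongs to."""
        self._state[(event["player"], event["type"])] += 1

    def query(self, player, event_type):
        """Read the current value for a key without modifying the state."""
        return self._state.get((player, event_type), 0)


counter = KeyedCounter()
for event in [
    {"player": "p1", "type": "level_start"},
    {"player": "p1", "type": "level_start"},
    {"player": "p2", "type": "purchase"},
]:
    counter.process(event)

print(counter.query("p1", "level_start"))
```

Partitioning by key is what lets such state grow to terabyte scale in a distributed engine: each worker only holds and updates the keys routed to it.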

A new breed of Data Science libraries

• Hail is a Genomics library

• Python API, running on Spark

• Genome sequencing is now feasible; today we are facing thousands of sequences

• Easy access to distributed computing is key

Tom White, Jonathan Keebler, https://blog.cloudera.com/blog/2017/05/hail-scalable-genomics-analysis-with-spark/

Data Science environments

• Notebook environments (Jupyter, Zeppelin)

• Great for storytelling

• Pain points:

  • Collaboration

  • Multi-tenancy

  • Security

• New solutions are emerging…

Tristan Zajonc, https://blog.cloudera.com/blog/2017/05/getting-started-with-cloudera-data-science-workbench/

We have some gotchas too…

Be mindful of…

• Educating your team

• Security

• Authentication

• Authorization

• Encryption

• Auditing, lineage

• Workflow management

Thank you

@MartonBalassi | [email protected]