Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven Alexandre Vasseur, Pivotal @PivotalFrance

Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015

Page 1

Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven

Alexandre Vasseur, Pivotal @PivotalFrance

Page 2

© Copyright 2015 Pivotal. All rights reserved.

If you have one thing to do

Store Massive Data Sets

Achieve Continuous Innovation at Scale

Becoming Data Driven with Apps

Page 3

Data Driven Apps

AGILE DEV & DATA SCIENCE: MODERN, COLLABORATIVE

APP & DEV PLATFORM: MODERN, CLOUD-ORIENTED & OPEN

DATA FABRIC: MODERN, CLOUD-ORIENTED & OPEN

Page 4

The Big Data Problem

Fragmentation, Constraints, Complexity

Page 5

Pivotal + Hortonworks Alliance

•  Started July 2014 around Ambari collaboration

•  Announcing Pivotal Big Data Suite on Hortonworks Data Platform

•  Advanced support from world’s leading Hortonworks support services

•  Joint engineering efforts and enhanced Pivotal HD

Page 6

ODP - Standardize Hadoop Ecosystem

•  Deliver ODP Core to build a versioned, packaged, tested set of Hadoop components.

•  Focus on developing a platform, rather than projects

•  Initial scope on Apache Hadoop: HDFS / MR / YARN / Ambari

Remove vendor lock-in

Ecosystem Effect

Shorter Innovation Cycles

http://opendataplatform.org

Page 7

Open Sourced but not just Hadoop

•  Open sourcing all Pivotal Big Data Suite components

–  Pivotal GemFire - premium in-memory NoSQL database

–  Pivotal HAWQ - world’s leading SQL compliant enterprise SQL on Hadoop

–  Pivotal Greenplum Database - advanced enterprise MPP analytic database with Hadoop interconnect

–  SpringXD - Unified, distributed, and extensible system for data driven application development

Page 8

HAWQ SQL on Hadoop

PROVEN AT SCALE

PRODUCTIVE

NATIVE on HADOOP / ODP

OPEN & EXTENSIBLE

Page 9

HAWQ SQL on Hadoop

10+ years R&D in Massively Parallel SQL

SQL engine for petabyte-scale analytics in the world’s largest industries

Mature cost based query optimizer

Full SQL semantics

Rich ecosystem of ELT/dataviz/BI & partners

PL/*, built-in analytics, R native framing

All Hadoop formats (gz, Parquet, HAWQ etc)

Data node short circuit reads (colocated, not M/R based)

Predicate pushdown to Hive, HBase

HAWQ PXF: Query federation to NoSQL, DB, etc
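
The slide mentions predicate pushdown without showing what it buys. The sketch below is only an illustration of the general idea, not HAWQ or PXF code: `ToySource`, `query_without_pushdown` and `query_with_pushdown` are hypothetical names invented for this example. It counts how many rows an external store ships to the query engine when the filter is applied at the source, compared with when it is applied after the fetch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class ToySource:
    """Hypothetical stand-in for an external store (not a real HAWQ/PXF API)."""
    rows: List[Row]
    rows_shipped: int = 0  # number of rows handed over to the query engine

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        for row in self.rows:
            # With a pushed-down predicate, non-matching rows never leave the store.
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row


def query_without_pushdown(source: ToySource, predicate: Predicate) -> List[Row]:
    # The engine pulls every row and filters on its own side.
    return [row for row in source.scan() if predicate(row)]


def query_with_pushdown(source: ToySource, predicate: Predicate) -> List[Row]:
    # The engine hands the filter to the store, which applies it while scanning.
    return list(source.scan(predicate))


# Made-up sample data: 1000 rows, one in ten tagged "FR".
rows = [{"id": i, "country": "FR" if i % 10 == 0 else "US"} for i in range(1000)]


def is_french(row: Row) -> bool:
    return row["country"] == "FR"


plain = ToySource(rows)
pushed = ToySource(rows)
result_plain = query_without_pushdown(plain, is_french)
result_pushed = query_with_pushdown(pushed, is_french)

print(len(result_plain), plain.rows_shipped)
print(len(result_pushed), pushed.rows_shipped)
```

Both queries return the same rows; the difference is in how many rows cross the boundary between the store and the engine.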

Page 10

SpringXD

Data from anywhere, to anywhere

Real time & batch

Ingest + analytics + jobs orchestration

Developer friendly

Built in connectors

With / without Spark

DSL

Your choice of Hadoop

Your choice of messaging

Standalone, YARN & outside Hadoop
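
Spring XD's DSL describes a stream as modules chained with a pipe symbol, as in `time | log`. The sketch below mimics only that pipe-and-filter shape in plain Python, to show how a textual definition can be turned into a running pipeline. The registries, the module names (`numbers`, `evens`, `double`, `collect`) and `run_stream` are hypothetical and are not part of Spring XD.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical module registries; real Spring XD modules are Java components.
SOURCES: Dict[str, Callable[[], Iterator[int]]] = {
    "numbers": lambda: iter(range(1, 6)),  # emits 1, 2, 3, 4, 5
}
PROCESSORS: Dict[str, Callable[[Iterable[int]], Iterator[int]]] = {
    "double": lambda items: (x * 2 for x in items),
    "evens": lambda items: (x for x in items if x % 2 == 0),
}


def run_stream(definition: str) -> List[int]:
    """Run a 'source | processor ... | sink' definition and return what the sink collected."""
    names = [part.strip() for part in definition.split("|")]
    if len(names) < 2:
        raise ValueError("a stream needs at least a source and a sink")
    source_name, *processor_names, sink_name = names
    if sink_name != "collect":  # the only sink in this toy example
        raise ValueError(f"unknown sink: {sink_name}")
    items: Iterable[int] = SOURCES[source_name]()
    for name in processor_names:
        # Each processor lazily wraps the output of the previous stage.
        items = PROCESSORS[name](items)
    return list(items)


output = run_stream("numbers | evens | double | collect")
print(output)
```

Because every stage is a generator, items flow through the chain one at a time, which loosely mirrors how a streaming pipeline passes messages from module to module.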

Page 11

Simplify Data Driven Applications

•  PaaS with NoSQL & Big Data choices built-in

•  Emergence of vertical services: Mobile, IoT, …

Data centric runtimes built in: Java/PHP/Node.js/Ruby, Python, R/Shiny, Scala, SpringXD

Large choice of data services: DB, clustered MySQL etc; Memcache, Redis etc; GemFire, Cassandra etc; Hadoop, Greenplum etc

Can run virtualized inside PaaS

Can run multi-tenant-ified alongside PaaS

Page 12

DEMO

[Architecture diagram: Pivotal Big Data Suite]

Components shown: PHD (or any ODP Core-based Hadoop Distribution), HDFS, HBase, Hive, Spark, HAWQ (SQL on Hadoop), GreenplumDB (Analytics DW), GemFire (JSON/Object in memory data grid), Redis (Key Value Store), RabbitMQ, SpringXD (Stream Processing/scoring), Cloud Foundry Data Services

Other labels: PXF (Filtered Pushdown), Direct Store, Federated, GPHDFS, Write behind Persistence, Analytic Apps, Online Apps

Page 13

The New Data Imperatives

Converged Data & Cloud

Open Data-Driven Apps

Page 14

A NEW PLATFORM FOR A NEW ERA

Meet us at the booth! Come do a “HAWQ in 2 min” lab

Win a pair of Beats Solo2 headphones!