Exploring New Big Data Architectures

Timo Elliott, July 2016

Page 1: Exploring New Big Data Architectures

Timo Elliott, July 2016

Exploring New Big Data Architectures

Page 2: Exploring New Big Data Architectures

© 2016 SAP SE or an SAP affiliate company. All rights reserved. 2Internal

What Is Big Data?

Nah, just kidding…

Have you noticed that “false information” spelled backwards is “false information”?

Did you know that THIS MORNING there is more data in the world than EVER BEFORE?!

Page 3: Exploring New Big Data Architectures

Big Data Architectures = Digital Business

By 2018, 40% of enterprise architecture teams will be distinguished as leaders by their primary focus on applying disruptive technologies to drive business innovation.

By 2018, 40% of enterprise architecture teams will be responsible for advancing the organization's digital business strategy.

By 2018, the new economics of connections will drive organizations to increase investments in connected physical assets and systems by 30%.

By 2018, 20% of enterprise architects will use business ecosystem modeling to identify and predict business moments.

By 2017, 20% of EA will be responsible for identifying new business designs that leverage business algorithms.

Source: Gartner, Predicts 2016: Five Key Trends Driving Enterprise Architecture Into the Future

Page 4: Exploring New Big Data Architectures

“Modern BI”

DATA

Self-service data preparation

Structured/Unstructured

Internal/External

Batch/Streaming

Integration, blending

Cleansing, augmentation

Agile modeling

BI DB

Columnar

In-memory

Self-service data analysis

Data discovery

Visual exploration

Dashboards/storytelling

Agile Iteration

Optional

Data warehouse

Semantic layers

OLAP Cubes

Page 5: Exploring New Big Data Architectures

Data-Driven Approach

Push:
• From IT
• Data-Driven
• Data to Insight
• Technology-Centric

A.S.P.I.R.E.

Page 6: Exploring New Big Data Architectures

Value-Driven Approach

Pull:
• From LOB
• Outcome-Driven
• Insight to Data
• Use-Case-Centric

Page 7: Exploring New Big Data Architectures

Combination Approach

Push:
• From IT
• Data-Driven
• Data to Insight
• Technology-Centric

Pull:
• From LOB
• Outcome-Driven
• Insight to Data
• Use-Case-Centric

Page 8: Exploring New Big Data Architectures

Invest in Big Data Architectures

INFORMATION ANALYTICS

Page 9: Exploring New Big Data Architectures

Invest in Self-Service Data Discovery Tools

“Through 2020 spending on self-service visual discovery and data preparation market will grow 2.5x faster than traditional IT-controlled tools for similar functionality”

– IDC, 2015

Page 10: Exploring New Big Data Architectures

Invest in Self-Service Data Preparation

SAP Agile Data Preparation

I.e., “Data Blending” — combine, merge, cleanse data

Page 11: Exploring New Big Data Architectures

SAP BusinessObjects Cloud

Page 12: Exploring New Big Data Architectures

You Need Both of These…

Page 13: Exploring New Big Data Architectures

Architecture as Platform For the Future: Innovate & Renovate

Source: Hortonworks

Page 14: Exploring New Big Data Architectures

A Common Question

“We like SAP ERP (and HANA), we like Hadoop, and your BI tools are a standard. But we don’t understand how it’s all going to fit together. Help!”

Page 15: Exploring New Big Data Architectures

What is Hadoop?

Page 16: Exploring New Big Data Architectures

It’s Time to Hug The Elephant!

Page 17: Exploring New Big Data Architectures

“Classic” Hadoop Use Cases

Semi-structured data loading / processing
• First web data, now IoT / documents / images, etc.

Offload traditional relational DW
• Typically no reduction in existing DW, but new data increasingly tiered

Queryable alternative to tape backups
• E.g. when upgrading to a different ERP system, keep a copy of all old data

Page 18: Exploring New Big Data Architectures

Page 19: Exploring New Big Data Architectures

Coca-Cola East Japan Architecture

Page 20: Exploring New Big Data Architectures

Other Interesting Hadoop Use Cases

Fast scale up / down
• Game apps company: big fan of Teradata, and found it cheaper to run than Hadoop, but when individual games became a hit, they needed to be able to scale up (and down) fast

Avoid “brittle” ETL, push schema creation to the business
• Large investment bank had dozens of different CRM setups and thousands of ETL jobs that kept breaking. It kept the traditional DW but added a data lake: “it’s all in there – have fun!”

Excel on steroids / exploration
• Big, one-off decisions
• We don’t know what we don’t know

Page 21: Exploring New Big Data Architectures

Sandboxing / Data Extensions

Page 22: Exploring New Big Data Architectures

Not Just a Data Store – A Platform

Far more than a batch-driven data store
• Many still have an out-of-date view – YARN / Spark etc.
• “Data at Rest and Data in Motion”
• But still not for “transactions” any time soon

Still maturing, still a lot of work, but has proved enterprise value
• In particular, overcame biggest security & auditing concerns: Kerberos integration, encryption, tokenization, Apache Ranger…
• Low capital costs to try things out (but don’t underestimate time / training / expertise needed)

Considered the heart of “digital transformation” in some large organizations…
• ...At least by the team implementing Hadoop! (but there’s typically a large “traditional IT” modernization effort going on at the same time)

Page 23: Exploring New Big Data Architectures

Centrica (British Gas)

Page 24: Exploring New Big Data Architectures

Zurich Insurance

Page 25: Exploring New Big Data Architectures

Renault Big Data Architecture

“We just intercept data in motion”
• Two-speed taken to extreme

[Architecture diagram]
• Consumers: Quality, Sales and Marketing, Supply Chain, Engineering
• Producers: Open Data, Internet of Things
• Ingestion types: Batch (RDBMS, Files); Messages, Logs; Streaming, Data Flow
• Ingestion tools: NFS Gateway, Sqoop, Spark SQL; Flume, Logstash; Kafka producers, Kafka broker (topics)
• Processing and storage: Spark Streaming, Elasticsearch, HBase, Hive, HDFS, Spark SQL, Spark RDD
• Foundation: YARN + HDFS

Page 26: Exploring New Big Data Architectures

Apache Atlas – Open Data Governance

Data Classification
• Import or define taxonomy business-oriented annotations for data
• Define, annotate, and automate capture of relationships between data sets and underlying elements including source, target, and derivation processes
• Export metadata to third-party systems

Centralized Auditing
• Capture security access information for every application, process, and interaction with data
• Capture the operational information for execution, steps, and activities

Search & Lineage (Browse)
• Pre-defined navigation paths to explore the data classification and audit information
• Text-based search features locate relevant data and audit events across the Data Lake quickly and accurately
• Browse visualization of data set lineage, allowing users to drill down into operational, security, and provenance-related information

Security & Policy Engine
• Rationalize compliance policy at runtime based on data classification schemes
• Advanced definition of policies for preventing data derivation based on classification (i.e. re-identification)

[Diagram: Apache Atlas]
• Knowledge Store: Models, Type-System, Policy Rules, Taxonomies
• Audit Store
• Tag Based Policies, Data Lifecycle Management, Real Time Tag Based Access Control
• REST API services: Search, Lineage, Exchange
• Example domains and standards: Healthcare (HIPAA, HL7), Financial (SOX, Dodd-Frank), Energy (PPDM), Retail (PCI, PII), Other (CWM)
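The tag-based access control idea on this slide can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the Atlas or Ranger API: the `Column` class, the `POLICY` table, the role names, and `can_read` are all hypothetical names made up for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    """A data element carrying classification tags (hypothetical model)."""
    name: str
    tags: frozenset = frozenset()

# Hypothetical policy table: classification tag -> roles allowed to read it.
POLICY = {
    "PII": {"compliance"},
    "PCI": {"compliance", "finance"},
}

def can_read(role, column, policy=POLICY):
    """Allow access only if the role is permitted for every restricted tag."""
    restricted = [tag for tag in column.tags if tag in policy]
    return all(role in policy[tag] for tag in restricted)

email = Column("customer_email", frozenset({"PII"}))
amount = Column("order_amount")  # untagged, so unrestricted

print(can_read("analyst", email))      # analyst is not cleared for PII
print(can_read("compliance", email))   # compliance is cleared for PII
print(can_read("analyst", amount))     # no tags, no restriction
```

The point of the design is that the decision depends on the classification attached to the data rather than on the table or file it lives in, so a new data set inherits the policy as soon as it is tagged.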

Page 27: Exploring New Big Data Architectures

Atlas & Ranger

Source: Hortonworks

Page 28: Exploring New Big Data Architectures

Apache Nifi – Data in Motion

Page 29: Exploring New Big Data Architectures

Apache Flink – Data Stream Analytics

[Diagram: Apache Flink]
• Inputs: real-time data streams and event logs (Kafka, RabbitMQ, ...); historic data (HDFS, JDBC, ...)
• Streaming workloads: low latency, windowing, aggregations, ...
• Other workloads: ETL, graphs, machine learning, relational, …
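To make "windowing" and "aggregations" concrete, here is a minimal sketch of a tumbling-window count in plain Python. It does not use the Flink API, and it works on a finite list rather than an unbounded stream; the function name `tumbling_window_counts` and the sample events are assumptions chosen for illustration.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_in_seconds, key) pairs.
    Returns a dict mapping (window_start, key) to a count.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "login"), (3, "click"), (9, "click"), (10, "click"), (14, "login")]
result = tumbling_window_counts(events, window_seconds=10)
for (window_start, key), count in sorted(result.items()):
    print(window_start, key, count)
```

A stream processor does the same bookkeeping incrementally, emitting each window's result once the window closes instead of waiting for the input to end.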

Page 30: Exploring New Big Data Architectures

Zeppelin – Analytic “Notebooks”

Page 31: Exploring New Big Data Architectures

Result of All This: Data Complexity For The Foreseeable Future

Big Data Architectures got complicated
• Data Warehouse
• Hybrid Transaction/Analytical Processing
• Hadoop, MongoDB, Spark, etc.
• Personal Data / BI
• Data Feeds

Questions to answer:
• Where does data arrive?
• When does it need to move?
• Where does modeling happen?
• What can users do themselves?
• What governance is required?

What we would like: a consistent, seamless solution

Page 32: Exploring New Big Data Architectures

An Example “Target Big Data Architecture”

• Ingestion: Extracting data from source systems and making it available for up-stream consumption
• Sources: Existing and new data sets from external and internal sources
• Big Data Platform (Data Lake): Core technology set enabling very high volume computation and storage for raw data and ready-to-use processed data
• Relational: Traditional RDBMS
• Performance Clusters: Fit-for-purpose clusters targeting real-time and near-real-time use cases, providing faster storage and access to data
• Real-time Streaming: Real-time ingestion of data, enabling event processing and visualization
• Data Services & Interface Layers: ETL and APIs that allow data to be extracted from the data platforms and be further analyzed, visualized or exported
• Exploratory Analytics: New and existing applications to support data discovery and advanced analytics
• Application Consumption: Dashboards, reporting and web services to expose the underlying data to external users
• Data Management and Governance: Centralized user management for proper authentication and authorization, metadata management

[Diagram labels: ETL, In-Memory/Appliance, EDW, Traditional sources, Customer, Mobile value chain, Fixed value chain, Network probes, Machine logs, Interaction logs, Social media, Others, Event stream processing, APIs, Connectors, ODBC, Informatica, BusinessObjects, Customer-facing services, SAS, SAS Visual Analytics, SAS EG/EM, New analytical tools (existing, new), Ready to use (Hadoop), Raw data (Hadoop), Black box, Semantic layer, Splunk]

Page 33: Exploring New Big Data Architectures

The SAP focus: End-to-end value chain

[Diagram: SAP HANA Platform, with layers SOURCE, INGEST, STORAGE, COMPUTE, CONSUME]
• Source: Logs, Text, OLTP, Social, Machine, Geo, ERP, Sensor (store & forward)
• Ingest: Transformations & Cleansing (Smart Data Integration, Smart Data Quality); Stream Processing (Smart Data Streaming); Smart Data Access (Virtual Tables, User Defined Functions)
• Storage: In-Memory (data model & data); Dynamic Tiering (aged data on disk); Column Storage (high performance analytics); Series Data Storage (store time-series data); Hadoop / NoSQL (MapReduce, YARN, HDFS)
• Compute: Calculation engine (fast computing); processing engines for spatial, analytics, text, graph, predictive; Application Development Environment
• Consume: Mobile applications and BI; Reporting & Dashboards; High Performance Applications; Data Exploration & Visualization; Adhoc & OLAP Analytics; Predictive Analysis; Business Planning & Forecasting; Lumira / BI

Page 34: Exploring New Big Data Architectures

The Journey So Far: HANA & Hadoop Integration

HANA & Hadoop Integration
• SQL on Hadoop via Smart Data Access (virtual tables): Hive (SPS06)
• Remote caching with Hive (SPS07)
• Connectivity to Apache Spark using ODBC
• Execution of MR jobs via HANA (Virtual Functions) and direct access to HDFS (SPS09)
• Spark SQL adapter via SDA (SPS10)
• Join relocation to Hadoop through Spark RDD
• Unified admin through Ambari integration for Hortonworks

Key Benefits
• Deep integration for storage & processing
• Optimized data access between HANA & Hadoop
• Data tiering to Hadoop for cold storage

Page 35: Exploring New Big Data Architectures

Data tiering with SAP HANA

Page 36: Exploring New Big Data Architectures

Data Lifecycle Manager (DLM) for Hadoop as a tier

Define a data aging strategy with DLM

Leverage SAP HANA Dynamic Tiering (warm store), Hadoop, or SAP Sybase IQ in SAP HANA native use cases. DLM offers a tool-based approach to model aging rules on tables, displacing ‘aged’ data to HANA extended tables in order to optimize the memory footprint of data in SAP HANA.

[Diagram: within SAP HANA, the Data Lifecycle Manager handles data movement from the HOT-STORE (column table) to the WARM-STORE (extended table)]
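An aging rule of the kind described here boils down to "move rows older than a threshold from the hot store to the warm store". The sketch below shows that logic in plain Python with lists standing in for tables. It is not DLM's interface: `apply_aging_rule`, the `posted_on` field, and the 365-day threshold are hypothetical choices for illustration.

```python
from datetime import date, timedelta

def apply_aging_rule(hot_rows, warm_rows, today, max_age_days):
    """Split hot rows by age and append the aged ones to the warm store.

    Returns (rows that stay hot, warm rows including newly aged rows).
    """
    cutoff = today - timedelta(days=max_age_days)
    still_hot = [row for row in hot_rows if row["posted_on"] >= cutoff]
    aged = [row for row in hot_rows if row["posted_on"] < cutoff]
    return still_hot, warm_rows + aged

hot = [
    {"id": 1, "posted_on": date(2014, 3, 1)},
    {"id": 2, "posted_on": date(2016, 6, 15)},
]
hot, warm = apply_aging_rule(hot, [], today=date(2016, 7, 1), max_age_days=365)
print([row["id"] for row in hot])
print([row["id"] for row in warm])
```

In a real landscape the rule would run on a schedule and the move would be transactional, so that a row is never visible in both tiers or in neither.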

Page 37: Exploring New Big Data Architectures

SAP HANA Vora: What’s Inside and What Does It Do?

SAP HANA Vora is an in-memory query engine which leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.

What it does:
• Democratize Data Access
• Make Precision Decisions
• Simplify Big Data Ownership

What’s inside:
• Drill Downs on HDFS
• Mashup API Enhancements
• Compiled Queries
• HANA-Spark Adapter
• Unified Landscape
• Open Programming
• Any Hadoop Clusters

Page 38: Exploring New Big Data Architectures

SAP Big Data Platform: “Hadoop Inside” Vision

HANA native Big Data
• Dynamic Tiering
• Smart Data Streaming
• NoSQL | Graph | Geo | TimeSeries

HANA & Hadoop
• SDA
• Hive | Spark
• MapReduce | HDFS
• Admin & Monitoring
• User Mgmt / Security

Hadoop Extension
• Vora Engine
• Integrated with HANA and Hadoop

HANA Data Management Platform
• SAP HANA In-Memory: instant results (0.0 sec)
• Hadoop with Vora: infinite storage (∞), raw data
• Information Management | Text | Search | Graph | Geospatial | Predictive
• Smart Data Streaming
• Administration | Monitoring | Operations | User Management | Security

Page 39: Exploring New Big Data Architectures

Key Features -- Vora SQL Engine

Components:
• Written from scratch
• Multi platform
• Compressed columns
• Parallel query processing
• In-memory storage
• Fast column scans
• Cache-efficient algorithms
• Code generation
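"Compressed columns" and "fast column scans" usually go together in columnar engines. The sketch below shows the general idea with dictionary encoding in plain Python. It is a generic textbook technique and makes no claim about how Vora implements it; the function names and sample data are assumptions.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]

def scan_equals(dictionary, codes, needle):
    """Return the row positions where the column equals needle.

    The needle is looked up once; the scan itself compares integers only.
    """
    if needle not in dictionary:
        return []
    target = dictionary.index(needle)
    return [row for row, code in enumerate(codes) if code == target]

countries = ["DE", "FR", "DE", "US", "FR", "DE"]
dictionary, codes = dictionary_encode(countries)
print(dictionary)
print(codes)
print(scan_equals(dictionary, codes, "DE"))
```

Because repeated values collapse to small integers, the column takes less memory and the scan touches a compact, cache-friendly array.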

Page 40: Exploring New Big Data Architectures

SAP HANA Vora Modeler

Page 41: Exploring New Big Data Architectures

SQL/OLAP on Big Data

• Hierarchical data storage of contextual data supports structured analysis

• Fast drill-down interaction aids in root-cause analysis

• Familiar OLAP tool enables experienced business analysts to derive useful insights from contextual data

• Support for HDFS, Parquet and ORC formats

• LLVM/Clang – JIT compilation of query plans and execution

Hadoop/NoSQL DATA

Page 42: Exploring New Big Data Architectures

SQL-on-Hadoop using Vora

A different context allows access to SAP HANA data from Spark SQL

Creates an in-memory data object, similar to a Spark dataframe

Load data from HDFS; the temp table will be distributed across the Hadoop cluster

Page 43: Exploring New Big Data Architectures

SAP Predictive Analytics 3.0

Native Spark Modeling

Standalone or included in SAP HANA

Predictive Factory

Integration with cloud & other apps

Page 44: Exploring New Big Data Architectures

DW Directions

[Diagram: SAP HANA DW roadmap in three stages. In each stage, Analytics, BI Suite, and Predictive sit on top of SAP HANA DW, with Hadoop and SAP HANA Vora alongside.]
• Planning and Definition (2015): market presence in Data Warehousing with a clear roadmap. SAP HANA DW optional components: DW Foundation, PowerDesigner, HANA EIM, Business Warehouse, SAP HANA Platform
• Execution and Delivery (2016-2018): strong and simplified offering with tight integration. Components: DWH Foundation, PowerDesigner, HANA EIM, Business Warehouse, SAP HANA Platform
• Vision: convergence into one technology stack addressing BW and SQL-based DW needs. Components: DW Modeling, DW ETL & DM, SAP HANA Platform

This is the current state of planning and may be changed by SAP at any time.

Page 45: Exploring New Big Data Architectures

SAP HANA DW – Future-proof data management platform

?

Page 46: Exploring New Big Data Architectures

Page 47: Exploring New Big Data Architectures

The Big Big Picture

Embrace Hadoop as if it were SAP technology

HANA Hadoop

What SAP does best: business process (live!)

Vora

“infrastructure”

Page 48: Exploring New Big Data Architectures

Looking Forward to the Future: “Data Refineries”

Nobody believes that a single big data warehouse is THE solution any more
• But they’re not going away any time soon
• “Data warehouses are dead! Long live data warehousing!”

Instead:

Enterprise Information Catalog – transparency
• Search for data: origin, owner, trust level, sensitivity, formats, how to order…

Data Factories – workflows, not just data
• The collective know-how on getting, refining, displaying data

More info from Mike Ferguson, here: http://www.slideshare.net/HadoopSummit/organising-the-data-lake-information-management-in-a-big-data-world
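As a rough sketch of what "search for data" in an enterprise information catalog could look like, the Python below models catalog entries with the attributes listed on this slide and filters them. Everything here is hypothetical: the `CatalogEntry` fields, the 1 to 5 trust scale, the sample entries, and the `search` function are assumptions for illustration, not any product's API.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    name: str
    origin: str
    owner: str
    trust_level: int   # hypothetical scale: 1 (low) to 5 (high)
    sensitivity: str   # e.g. "public", "internal", "restricted"
    formats: tuple

def search(catalog, text="", min_trust=1, sensitivity=None):
    """Return entries whose name contains text and that meet the filters."""
    text = text.lower()
    return [
        entry for entry in catalog
        if text in entry.name.lower()
        and entry.trust_level >= min_trust
        and (sensitivity is None or entry.sensitivity == sensitivity)
    ]

catalog = [
    CatalogEntry("Sales orders", "ERP", "finance", 5, "internal", ("table",)),
    CatalogEntry("Sales web clicks", "web logs", "marketing", 2, "internal", ("json",)),
    CatalogEntry("Customer master", "CRM", "sales ops", 4, "restricted", ("table", "csv")),
]

for entry in search(catalog, text="sales", min_trust=3):
    print(entry.name, entry.owner, entry.origin)
```

The useful part is less the filter itself than the metadata: once origin, owner, trust level, and sensitivity are recorded for every data set, users can find and judge data without asking around.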

Page 49: Exploring New Big Data Architectures

SAP Platform for Digital Transformation

[Diagram]
• Applications: Suite (S/4HANA), Analytics (Digital Boardroom, C4A, BOBJ), Business Network, CEC
• HANA Cloud Platform: Extensions, Applications, IoT, (Micro-)Services, IoT Platform, Identity Management
• Platform: HANA Enterprise Computing Platform (any DB); Vora Distributed Computing Platform (Hadoop)

Page 50: Exploring New Big Data Architectures

Thank you
