
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World San Jose 2016

Page 1

Building a modern data architecture

March 31, 2016

Ben Sharma | CEO and Founder

[email protected]

Page 2

Delivering on the business of big data

•  Award-winning provider of enterprise data lake management solutions:

§  Integrated data lake management platform

§  Self-service data preparation

•  Data Lake Design and Implementation Services

•  Data Science Professional Services

Funded by top-tier technology investors:

Zaloni Proprietary

Page 3

Data lakes will be central to the modern data architecture

Agility, Insight, Scalability

Page 4

The promise of a data lake: All data is welcome…

•  Store all types of data: structured and unstructured

•  Store raw data in its original form for an extended period of time

•  Use various tools to correlate, enrich, and query the data for insights

•  Provide democratized access via a single, unified view across the enterprise

Page 5

Data architecture modernization

[Diagram: traditional vs. new data architecture]

Traditional: Sources, ETL, EDW; consumers: Data Discovery, Analytics, BI

New: Various Sources, Streaming, Unstructured Data; Data Lake (Derived (Transformed), Discovery Sandbox), EDW; consumers: Data Science, Data Discovery, Analytics, BI

Page 6

Data lake challenges and complications

•  Ingestion

•  Lack of Visibility

•  Privacy and Compliance

•  Quality Issues

•  Reliance on IT

•  Reusability

•  Rate of Change

•  Skills Gap

•  Complexity

Building, managing, and delivering:

•  Building: Enable the data lake (Ingest, Organize, Catalog)

•  Managing: Govern the data in the lake (Cleanse, Secure, Operationalize)

•  Delivering: Engage the business (Discover, Enrich, Provision)

Page 7

Data lake reference architecture

[Diagram: data lake reference architecture]

Source systems and feeds: OLTP or ODS; Enterprise Data Warehouse; Logs (or other unstructured data); Cloud Services; File Data; DB Data; ETL Extracts; Streaming

Data lake zones: Transient Loading Zone; Raw Data; Refined Data; Trusted Data; Discovery Sandbox

Zone annotations: original unaltered data attributes; tokenized data; reference data, master data; data wrangling, data discovery, exploratory analytics

Processing steps: integrate to common format, data validation, data cleansing, aggregations

Cross-cutting services: Metadata, Data Quality, Data Catalog, Security

Consumption Zone: APIs; Business Analysts, Researchers, Data Scientists
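
The zones above can be read as a promotion path. The sketch below is a minimal illustration, assuming an in-memory dict stands in for lake storage; the zone keys, the `promote` function, and the one-record-per-`id` rule for the trusted zone are hypothetical choices, not Zaloni's implementation.

```python
from typing import Callable

Record = dict[str, str]

# Hypothetical zone names mirroring the reference architecture.
ZONES = ("transient", "raw", "refined", "trusted")


def promote(lake: dict[str, list[Record]],
            validate: Callable[[Record], bool],
            cleanse: Callable[[Record], Record]) -> None:
    """Move records through the zones in order (illustrative sketch only)."""
    # Transient -> raw: land records unaltered, then empty the loading zone.
    lake["raw"].extend(lake["transient"])
    lake["transient"].clear()
    # Raw -> refined: keep records that pass validation; `cleanse` is
    # expected to return a new dict so the raw copy stays unaltered.
    lake["refined"] = [cleanse(r) for r in lake["raw"] if validate(r)]
    # Refined -> trusted: one record per "id" (a stand-in for master data rules).
    trusted: dict[str, Record] = {}
    for r in lake["refined"]:
        trusted.setdefault(r["id"], r)
    lake["trusted"] = list(trusted.values())
```

In a real lake each zone would typically be a separate storage location with its own access controls; the dict only keeps the example self-contained.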

Page 8

Data lake management platform

Integrated Data Lake Management:

•  Unified Data Management

•  Managed Ingestion

•  Data Reliability

•  Data Visibility

•  Data Security and Privacy

Page 9

First things first: managed ingestion

•  Ability to ingest vast amounts of data

•  Ability to handle a wide variety of formats (streaming, files, custom)

•  Ability to handle a wide variety of sources

•  Capture operational metadata implicitly as new data arrives

•  Build in repeatability through automation that picks up incoming data and applies predefined processing

[Diagram: various sources, streaming, and unstructured data entering the lake through managed ingestion]
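
Capturing operational metadata "implicitly as new data arrives" can be sketched as an ingestion step that records size, record count, checksum, and timestamp for every file it picks up. This assumes local directories stand in for the landing area and the raw zone; `ingest_file`, `ingest_landing_dir`, and the `IngestRecord` fields are hypothetical names.

```python
import hashlib
import shutil
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class IngestRecord:
    """Operational metadata captured for one ingested file (hypothetical schema)."""
    source: str
    target: str
    size_bytes: int
    record_count: int
    sha256: str
    ingested_at: str


def ingest_file(source: Path, raw_zone: Path) -> IngestRecord:
    """Copy a newline-delimited file into the raw zone and record its metadata."""
    raw_zone.mkdir(parents=True, exist_ok=True)
    target = raw_zone / source.name
    shutil.copy2(source, target)  # the raw copy is left unaltered
    data = target.read_bytes()
    return IngestRecord(
        source=str(source),
        target=str(target),
        size_bytes=len(data),
        record_count=len(data.splitlines()),  # counts lines, including any header
        sha256=hashlib.sha256(data).hexdigest(),
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )


def ingest_landing_dir(landing: Path, raw_zone: Path,
                       pattern: str = "*.csv") -> list[IngestRecord]:
    """Repeatable step: pick up every matching file in the landing directory."""
    return [ingest_file(p, raw_zone) for p in sorted(landing.glob(pattern))]
```

Running `ingest_landing_dir` on a schedule gives the repeatability the slide describes, and the returned records can be written to a metadata store.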

Page 10

Capture metadata to improve data visibility and reliability

•  Reduced time to insight for analytics

•  File- and record-level watermarking provides data lineage

Types of metadata:

•  Technical: captures the form and structure of each data set. Example: type of data (text, JSON, Avro) and structure of the data (fields and their types)

•  Operational: captures the lineage, quality, profile, and provenance of the data. Example: source and target locations of the data, size, number of records, lineage

•  Business: captures what it all means to the user. Example: business names, descriptions, tags, quality and masking rules
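
Record-level watermarking can be illustrated by stamping each record with lineage columns at ingest time, so that a refined record can be traced back to its source file and position. A minimal sketch; the `_lineage_*` column names and the use of a batch id as the file-level watermark are assumptions for illustration.

```python
import uuid
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


def watermark(records: Iterable[dict], source_file: str,
              batch_id: Optional[str] = None) -> Iterator[dict]:
    """Tag each record with hypothetical lineage fields.

    The batch id acts as the file-level watermark; the offset identifies
    the record's position within that file.
    """
    batch_id = batch_id or uuid.uuid4().hex
    ingested_at = datetime.now(timezone.utc).isoformat()
    for offset, record in enumerate(records):
        yield {
            **record,
            "_lineage_batch_id": batch_id,
            "_lineage_source_file": source_file,
            "_lineage_offset": offset,
            "_lineage_ingested_at": ingested_at,
        }
```

Because the lineage columns travel with each record, downstream transformations only need to carry them through for lineage to survive into the refined zone.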

Page 11

Data ready: data preparation required for actionable data

•  Interactive data preparation to address errors, corrupted formats, and duplicates

•  Data enrichment to go from raw to refined

•  Self-service, so users can prepare data without an IT request or SQL knowledge

[Diagram: raw data is explored and transformed into refined data for BI reports, enterprise data integrations, data science, data discovery, and analytics; data preparation relies on reusable transformations and automation to orchestrate and automate workflows]

Diagram derived from Gartner report on Self-Service Data Preparation
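
"Reusable transformations" can be modeled as small functions composed into a recipe that is applied the same way on every run, covering the errors, corrupted formats, and duplicates listed above. The sketch assumes records are dicts; `strip_whitespace`, `parse_amount`, `run_recipe`, and the `amount`/`id` fields are hypothetical.

```python
from typing import Callable, Iterable, Optional

Record = dict
# A transformation returns a cleaned record, or None to drop it.
Transform = Callable[[Record], Optional[Record]]


def strip_whitespace(r: Record) -> Optional[Record]:
    """Trim leading and trailing whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}


def parse_amount(r: Record) -> Optional[Record]:
    """Drop records whose 'amount' is missing or not a number (corrupted format)."""
    try:
        return {**r, "amount": float(r["amount"])}
    except (KeyError, TypeError, ValueError):
        return None


def run_recipe(records: Iterable[Record], steps: list[Transform],
               key: str = "id") -> list[Record]:
    """Apply the steps in order, then keep the first record for each key."""
    out, seen = [], set()
    for r in records:
        for step in steps:
            r = step(r)
            if r is None:
                break
        if r is None or r.get(key) in seen:
            continue
        seen.add(r.get(key))
        out.append(r)
    return out
```

Keeping each step as a separate function is what makes it reusable: the same `parse_amount` can appear in several recipes without being rewritten.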

Page 12

Protect sensitive data

•  Data lakes enable multiple groups to share access to centrally stored data

•  Differing permissions require enhanced data security:

§  Mask or tokenize data before it is published in the lake for consumption

§  Policy-based security

•  Metadata management enables audit and traceability

•  End result: more open and democratized access to data in the lake for those with permission
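
One common way to tokenize before publishing is keyed hashing (HMAC), which yields a stable, non-reversible token so that tokenized columns can still be joined; masking keeps only part of the value. This is a sketch of that technique, not a vault-based tokenization service; the `POLICY` mapping, field names, and key handling are assumptions (a real deployment would keep the key in a secrets manager).

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Deterministic token: the same input and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if visible <= 0:
        return "*" * len(value)
    return "*" * max(len(value) - visible, 0) + value[-visible:]


# Hypothetical policy: how each sensitive field is protected before publishing.
POLICY = {"ssn": "tokenize", "phone": "mask"}


def protect(record: dict, key: bytes, policy: dict = POLICY) -> dict:
    """Return a copy of the record with the policy applied."""
    out = dict(record)
    for field, action in policy.items():
        if out.get(field) is None:
            continue
        if action == "tokenize":
            out[field] = tokenize(str(out[field]), key)
        elif action == "mask":
            out[field] = mask(str(out[field]))
    return out
```

Driving protection from a policy table rather than hard-coding it per data set is one way to get the "policy-based security" the slide mentions.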

Page 13

Discover, Enrich, Provision

Self-Service Data Preparation for Analytics: Catalog, Wrangling, Collaboration

•  See what data is available across your enterprise

•  Blend data in the lake without a costly IT project

•  Perform interactive, data-driven transformations

•  Collaborate and share data assets and transformations with peers

EXPLORE, PREPARE, OPERATIONALIZE
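
A catalog that lets users "see what data is available" can be sketched as entries carrying technical, operational, and business metadata, plus a simple search over them. All field names, paths, and the substring-matching rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set described by three kinds of metadata (hypothetical fields)."""
    name: str
    fmt: str                  # technical: e.g. "avro", "json", "text"
    schema: dict[str, str]    # technical: field name -> type
    location: str             # operational
    record_count: int         # operational
    business_name: str        # business
    tags: set[str] = field(default_factory=set)  # business


def search(catalog: list[CatalogEntry], term: str) -> list[CatalogEntry]:
    """Case-insensitive substring match on names, tags, and field names."""
    t = term.lower()
    return [
        e for e in catalog
        if t in e.name.lower()
        or t in e.business_name.lower()
        or any(t in tag.lower() for tag in e.tags)
        or any(t in f.lower() for f in e.schema)
    ]
```

Searching business names and tags alongside technical field names is what lets an analyst find a data set without knowing its physical name.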

Page 14

Catalog with KPIs

Page 15

“Ground to Cloud” hybrid architectures

•  Seeing a rapid increase of big data in the cloud

•  Leverage cloud platforms as complementary to on-premises systems

•  Support sensitive data on premises and external data in the cloud (e.g., client data, machine-generated data)

Key data challenges for hybrid environments:

•  Visibility: need an enterprise-wide data catalog (logical data lake)

•  Governance: need consistent data governance requirements across hybrid platforms

Page 16

Manage the complete data pipeline

•  INGEST: Manage data ingestion so you know what is in your Hadoop data lake

•  ORGANIZE: Define and capture metadata for ease of searching and browsing

•  ENRICH: Orchestrate and manage the data preparation process

•  ENGAGE: Data visibility and self-service data preparation
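
The four stages can be chained by a small workflow executor that runs tasks in dependency order. This sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the task functions and the assumed linear dependencies between INGEST, ORGANIZE, ENRICH, and ENGAGE are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(tasks: dict[str, Callable[[], None]],
                 depends_on: dict[str, set[str]]) -> list[str]:
    """Run each task after its prerequisites; return the order used.

    `depends_on` maps a task name to the names it must wait for.
    """
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        tasks[name]()
    return order


# Hypothetical stage wiring: each stage waits for the previous one.
STAGES = {"organize": {"ingest"}, "enrich": {"organize"}, "engage": {"enrich"}}
```

If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful early check for a misconfigured workflow.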

Page 17

Network Data Lake architecture

[Diagram: network data lake architecture]

Data sources:

•  Network data: CDR, DPI, IPFIX, SNMP, RADIUS

•  Enterprise data: CRM, Billing, Inventory

Data lake management: Manage Ingestion; Manage Metadata; Manage, Monitor, Schedule; Operations and Metadata Store; Data Quality & Rules Engine; Transformation Engine; Workflow Executor

Consumers: BI Tools; Custom Apps; Enterprise Data Warehouse

Custom applications: Subscriber Usage; Network Usage; Exploration & Ad-hoc Analytics
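
The data quality and rules engine shown in the architecture can be sketched as named predicates evaluated against each record, with per-rule failure counts kept as operational metadata. The CDR field names (`msisdn`, `duration_s`) and the rules themselves are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules for call detail records (CDRs).
CDR_RULES = [
    Rule("msisdn_present", lambda r: bool(r.get("msisdn"))),
    Rule("duration_non_negative",
         lambda r: isinstance(r.get("duration_s"), (int, float)) and r["duration_s"] >= 0),
]


def apply_rules(records: list[dict],
                rules: list[Rule]) -> tuple[list[dict], list[dict], dict[str, int]]:
    """Split records into passed/failed and count failures per rule."""
    passed, failed = [], []
    failures = {rule.name: 0 for rule in rules}
    for r in records:
        broken = [rule.name for rule in rules if not rule.check(r)]
        for name in broken:
            failures[name] += 1
        (failed if broken else passed).append(r)
    return passed, failed, failures
```

Failed records can be routed to a quarantine area instead of being dropped, so they remain available for review.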

Page 18

Managed data lake for healthcare payers

[Diagram: managed data lake for healthcare payers]

Data sources (via edge node): Relational, Streaming, Files

Ingestion: Batch Ingestion, Streaming Ingestion, Change Data Capture

Data sets: Claims, EMR, Lab/Pathology, Pharmacy, Member, Social, Enterprise Data

Data lake management: Configure Ingestion; Administer Metadata; Manage, Monitor, Schedule; Operations and Metadata Store; Data Quality & Rules Engine; Transformation Engine; Workflow Executor

Consumers: Analytical Applications, Enterprise Data Warehouse

Applications:

•  HEDIS Reporting

•  Bundled Payments

•  Medical Benefits Management

•  Scorecards

•  Enterprise Reports
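
Change data capture feeds deliver inserts, updates, and deletes rather than full extracts. A minimal sketch of applying such events to a keyed snapshot, assuming a hypothetical event shape `{"op", "key", "row"}` and in-order delivery; real CDC tools each define their own formats.

```python
from typing import Iterable


def apply_cdc(snapshot: dict[str, dict], changes: Iterable[dict]) -> dict[str, dict]:
    """Apply ordered change events to a copy of a keyed snapshot.

    Hypothetical event format: {"op": "I" | "U" | "D", "key": ..., "row": {...}}.
    """
    current = dict(snapshot)  # leave the caller's snapshot untouched
    for ev in changes:
        op, key = ev["op"], ev["key"]
        if op in ("I", "U"):
            # Upsert: merge changed columns into a new row dict.
            current[key] = {**current.get(key, {}), **ev["row"]}
        elif op == "D":
            current.pop(key, None)
        else:
            raise ValueError(f"unknown op {op!r}")
    return current
```

For example, a member's plan change arrives as a single `U` event instead of a full reload of the member table.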

Page 19

Data Lake for BCBS 239 Compliance (RDARR: risk data aggregation and risk reporting)

[Diagram: source systems (RDBMS, mainframes, flat files, binary files) feed data acquisition automation, which covers data ingestion, data quality and validation, layout standardization, and operational metadata generation, producing data at rest; metadata is registered/updated and extracted/read through metadata repositories and a metadata management solution]

•  Automated data acquisition framework providing timely data

•  Capture metadata in all phases: ingestion and transformation

•  Integration with enterprise metadata management

•  Integrated data quality analysis
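
Layout standardization for mainframe and flat-file sources often means slicing fixed-width records into named fields using a copybook-style layout. This sketch assumes a hypothetical three-field layout with text-only fields; it decodes EBCDIC with Python's built-in `cp037` codec but does not handle packed-decimal (COMP-3) fields.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    name: str
    start: int  # 0-based character offset
    width: int


# Hypothetical layout for a fixed-width exposure record.
LAYOUT = [
    FieldSpec("account_id", 0, 8),
    FieldSpec("currency", 8, 3),
    FieldSpec("exposure", 11, 12),
]


def standardize(line: str, layout: list[FieldSpec]) -> dict[str, str]:
    """Slice a fixed-width line into named, trimmed fields."""
    return {f.name: line[f.start:f.start + f.width].strip() for f in layout}


def standardize_bytes(raw: bytes, layout: list[FieldSpec],
                      encoding: str = "cp037") -> dict[str, str]:
    """Decode an EBCDIC (by default) record, then apply the layout."""
    return standardize(raw.decode(encoding), layout)
```

Mapping every source layout to the same named fields is what allows downstream data quality checks to run uniformly across RDBMS, mainframe, and file sources.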

Page 20

Getting Started

1. Questions: business drivers and business questions, such as "Where is fraud occurring?" and "How can we optimize inventory?"

2. Inputs: Data (subject areas, source systems), Use Cases, Platform (capabilities and process: Ingest, Organize, Enrich, Explore)

3. Outcomes: Roadmap, Prototype, Analytics Strategy

Page 21

Stop by booth #1335 and ask for a copy of our new book and a free t-shirt!

DON’T GO IN THE DATA LAKE WITHOUT US