92
The Journey to Becoming a Data- Driven Enterprise Pivotal Big Data Roadshow 2015

Pivotal Big Data Roadshow

  • Upload
    pivotal

  • View
    82.116

  • Download
    4

Embed Size (px)

Citation preview

Page 1: Pivotal Big Data Roadshow

The Journey to Becoming a Data-Driven Enterprise

Pivotal Big Data Roadshow 2015

Page 2: Pivotal Big Data Roadshow

2 © Copyright 2015 Pivotal. All rights reserved.

Where we’re going today…

3 Great Keynotes

•  Journey to a Data-driven Enterprise

•  Data Science Use Cases

•  Streaming Data and Internet of Things

Internet of Things Demo and Architecture Overview

Intensive hands-on training sessions

Page 3: Pivotal Big Data Roadshow

3 © Copyright 2015 Pivotal. All rights reserved.

Today’s Agenda

8:00 AM - 9:00 AM - Check-In & Breakfast

9:00 AM - 10:15 AM - Journey to a Data-Driven Enterprise…

10:15 AM - 10:30 AM - Coffee Break

10:30 AM - 12:00 PM - Internet of Things Demo and Architecture Overview

12:00 PM - 1:00 PM - Lunch & Birds of a Feather Discussion

1:00 PM - 3:00 PM - Hands-On Technical Workshop

Page 4: Pivotal Big Data Roadshow

© Copyright 2015 Pivotal. All rights reserved.

MASHING BIG DATA WITH BIG MACHINES IS ‘BEAUTIFUL, DESIRABLE, INVESTABLE’ - IT COULD TRANSFORM GE'S BUSINESS -

AND THE ECONOMY.

JEFF IMMELT, CEO, GE

”JEFF IMMELT, CEO, GE

Page 5: Pivotal Big Data Roadshow

© Copyright 2015 Pivotal. All rights reserved.

ANALYZING INTERNET OF THINGS USING BIG DATA SUITE

Internet of Things matter for... •  Industrial Manufacturers •  Transportation •  Healthcare, Life Sciences •  Financial Services •  Retail •  Telecom and Media

Page 6: Pivotal Big Data Roadshow

© Copyright 2015 Pivotal. All rights reserved.

THE POWER OF 1

R X

Increasing Freight Utilization Rail

Predictive Maintenance Healthcare

Predictive Diagnostics Power

Driving Outcomes That Matter

One Percent Improvement Equals $27B

Industry Value by Reducing System

Inefficiency

$63B Industry Value by Reducing Process

Inefficiency

$66B Industry Value with

Efficiency Improvements In Gas-fired Power

Plant Fleets Source: General Electric

Page 7: Pivotal Big Data Roadshow

© Copyright 2015 Pivotal. All rights reserved.

THE INTERNET OF THINGS JOURNEY

STORE •  Structured

•  Unstructured

•  High Volume

•  High Velocity

ANALYZE •  Predictive Analytics

•  Machine Learning

•  Advance Data Science

•  Realtime Analytics

DEVELOP •  Advanced Analytic Pipelines

•  Realtime Analytical Applications

•  Global Scale Data-Driven Applications

•  Enterprise, Consumer, IoT, and Mobile

INNOVATE •  Agile Dev Expertise

•  DevOps

•  Hybrid Cloud

•  Continuous Delivery

•  Closed Loop Applications

AGILE DEVELOPMENT

BIG DATA PREDICTIVE ANALYTICS

ENTERPRISE PAAS

Page 8: Pivotal Big Data Roadshow

8 © Copyright 2015 Pivotal. All rights reserved.

0% of CIOs think their IT infrastructure is fully prepared for big data (3)

30% of companies have deployed advanced analytics, 11% big data analysis (4)

44% of new applications failed to meet performance expectations (5)

2X 90% of companies allocate at least 2X more cloud capacity than needed to ensure performance (6)

But…

80% of CEOs thinking data mining and analysis are strategically important (1)

4% of companies use analytics effectively (2)

(1) 2015 PWC CEO Survey; (2)2013 Baine and Company - The Value of Big Data; (3) 2014 IT Infrastructure Conversation - IBM; (4) Ernest and Young - 2014 Enterprise IT Trends and Investments; (5) 2014 Riverbed Tecnologies - The Transformers; (6) 2014 ElasticHosts CIO Study

LARGE ENTERPRISE BIG DATA TROUBLE

Page 9: Pivotal Big Data Roadshow

9 © Copyright 2015 Pivotal. All rights reserved.

BIG DATA CHASM

70% of data

generated by customers

80% of data stored

3% prepared for

analysis

0.5% being

analyzed

<0.5% being

operationalized

9

THE DATA DIVIDE

Page 10: Pivotal Big Data Roadshow

10 © Copyright 2015 Pivotal. All rights reserved.

Software Is Eating The World

Data Is Fueling Software

SOFTWARE IS EATING THE WORLD

Page 11: Pivotal Big Data Roadshow

11 © Copyright 2015 Pivotal. All rights reserved.

WE CHOSE PIVOTAL BECAUSE WE BELIEVE IT PROVIDES A 360-DEGREE VIEW OF THE PROCESS. FROM A DATA SCIENCE AND DATA TECHNOLOGY PERSPECTIVE, IT MEANS DELIVERING BEST-IN-

CLASS DATA TECHNOLOGIES AND ENABLING THEM ON THEIR PLATFORM.

Page 12: Pivotal Big Data Roadshow

12 © Copyright 2015 Pivotal. All rights reserved.

ACROSS INDUSTRIES

Page 13: Pivotal Big Data Roadshow

13 © Copyright 2015 Pivotal. All rights reserved.

THE NEW DATA IMPERATIVES

Converged Data & Cloud

Open Data-Driven Apps

Page 14: Pivotal Big Data Roadshow

14 © Copyright 2015 Pivotal. All rights reserved.

THE BIG DATA PROBLEM

Fragmentation Constraints Complexity

Page 15: Pivotal Big Data Roadshow

15 © Copyright 2015 Pivotal. All rights reserved.

•  Remove Lock-in

•  Leverage Ecosystem

•  Co-innovate

GUIDING PRINCIPLES IN THE NEW ERA

OPEN AGILE CLOUD-READY

•  Shorten innovation cycles

•  Reduce TCO

•  Improve TTM

•  Solve business problems

•  Avoid lock-in

•  Appropriate security

Page 16: Pivotal Big Data Roadshow

16 © Copyright 2015 Pivotal. All rights reserved.

JOURNEY TO A DATA-DRIVEN ENTERPRISE

Deploy analytic apps and automate at scale

Perform advanced analytics Discover insights

Modernize data infrastructure

Page 17: Pivotal Big Data Roadshow

17 © Copyright 2015 Pivotal. All rights reserved.

Deploy analytic apps and automate at scale

Perform advanced analytics Discover insights

Modernize data infrastructure

DATA-DRIVEN COMPANIES: USE MODERN DATA INFRASTRUCTURE

Page 18: Pivotal Big Data Roadshow

18 © Copyright 2015 Pivotal. All rights reserved.

MODERNIZE DATA INFRASTRUCTURE

Elastic, Scale-out storage and processing

Flexible data types and pipelining

ETL on demand: low operational cost Expanded use cases

Higher quality analytics Lowered storage/processing cost

Less fragmented ecosystem Reduced vendor lock-in

REQUIREMENTS BENEFITS

Cloud friendly and open-source based

Page 19: Pivotal Big Data Roadshow

19 © Copyright 2015 Pivotal. All rights reserved.

Modernize data infrastructure

Deploy analytic apps and automate at scale

Perform advanced analytics Discover insights

DATA-DRIVEN COMPANIES: STRATEGICALLY USE ADVANCED ANALYTICS

Page 20: Pivotal Big Data Roadshow

20 © Copyright 2015 Pivotal. All rights reserved.

ADVANCED ANALYTICS

Leverage existing skills and tools Rapid time to insights

Internet of Things use cases Rapid time to insights

Solve business problems Predictive insights: proactive execution

REQUIREMENTS BENEFITS

Machine learning and advanced analytics

010101010101010100101010101010101100101010

SQL- compliant batch and interactive queries

Massive stream processing

0101010101010101001010

1010101010110010101010

10101010

Page 21: Pivotal Big Data Roadshow

21 © Copyright 2015 Pivotal. All rights reserved.

Modernize data infrastructure

Perform advanced analytics Discover insights

DATA-DRIVEN COMPANIES: INNOVATE AT SCALE

Deploy analytic apps and automate at scale

Page 22: Pivotal Big Data Roadshow

22 © Copyright 2015 Pivotal. All rights reserved.

ANALYTIC APPS AND AUTOMATION AT SCALE

Reduced time to action Low ‘analytics ó app-dev’ integration cost

Reduced time to insights Flexible ingestion: low operating cost

High performance: low operating cost Transactional safety: business critical ops

REQUIREMENTS BENEFITS

Low-latency, distributed in-memory transactions

Resilient, scale-out messaging and object storage

Agile analytic app-dev with enterprise PaaS PaaS

Page 23: Pivotal Big Data Roadshow

23 © Copyright 2015 Pivotal. All rights reserved.

JOURNEY TO A DATA-DRIVEN ENTERPRISE

Deploy analytic apps and automate at scale

Perform advanced analytics Discover insights

Modernize data infrastructure

Pivotal Data Science helps you move from BI to Data Science

Pivotal Labs helps you move to an agile development of apps at scale

Pivotal Data Engineering helps you move from data administration to data engineering

Page 24: Pivotal Big Data Roadshow

24 © Copyright 2015 Pivotal. All rights reserved.

•  Remove Lock-in

•  Leverage Ecosystem

•  Co-innovate

GUIDING PRINCIPLES IN THE NEW ERA

OPEN AGILE CLOUD-READY

•  Shorten innovation cycles

•  Reduce TCO

•  Improve TTM

•  Solve business problems

•  Avoid lock-in

•  Appropriate security

Page 25: Pivotal Big Data Roadshow

25 © Copyright 2015 Pivotal. All rights reserved.

World’s First Open Sourced,

Enterprise-Class Data Portfolio

+ Open Data Platform

PIVOTAL BIG DATA SUITE

OPEN AGILE CLOUD-READY

Modern Data Infrastructure

+ Advanced Analytics

+ Apps at Scale

Multiple Cloud Deployment Models

+ Big Data Suite on Pivotal

Cloud Foundry

Page 26: Pivotal Big Data Roadshow

26 © Copyright 2015 Pivotal. All rights reserved.

PIVOTAL BIG DATA SUITE

Page 27: Pivotal Big Data Roadshow

27 © Copyright 2015 Pivotal. All rights reserved.

Open sourcing all Pivotal Big Data Suite components including:

WORLD’S FIRST OPEN SOURCED BIG DATA PORTFOLIO BUILDING ON SUCCESS OF CLOUD FOUNDRY FOUNDATION

BUILT FOR ENTERPRISES

Pivotal GemFire Pivotal HAWQ Pivotal Greenplum Database

Page 28: Pivotal Big Data Roadshow

28 © Copyright 2015 Pivotal. All rights reserved.

BUILT FOR ENTERPRISES

Value added features: enterprise grade performance + robustness without lock-in •  Advanced Query Optimization in analytics

•  WAN replication and continuous query in transactional processing

Flexible Deployment models: align to business objectives and needs •  Balance cost objectives with policy and compliance requirements

•  Leverage Pivotal’s pre-integration + certification on supported configurations

Enterprise grade support: one throat to choke for the suite •  Focus on business problems – not on lifecycle management

•  Expert support on Big Data Suite means reduced business risk

Page 29: Pivotal Big Data Roadshow

29 © Copyright 2015 Pivotal. All rights reserved.

•  Common core for Hadoop ecosystem

•  Rapidly accelerated certifications, ecosystem development and enterprise-grade quality

OpenDataPlatform.org

OPEN

Page 30: Pivotal Big Data Roadshow

30 © Copyright 2015 Pivotal. All rights reserved.

AGILE

Deploy analytic apps and automate at scale

Perform advanced analytics Discover insights

Modernize data infrastructure

Spring XD Spark Pivotal HD & Open Data Platform

Pivotal Greenplum Database

Pivotal HAWQ Rabbit MQ

Redis

Pivotal GemFire Pivotal BDS on PCF

Pivotal Cloud Foundry

Page 31: Pivotal Big Data Roadshow

31 © Copyright 2015 Pivotal. All rights reserved.

CLOUD-READY

COMMODITY HARDWARE APPLIANCE HYBRID CLOUD CLOUD

IaaS IaaS

PAAS

Page 32: Pivotal Big Data Roadshow

32 © Copyright 2015 Pivotal. All rights reserved.

THE INTERNET OF THINGS JOURNEY WITH PIVOTAL BIG DATA SUITE

STORE •  Structured

•  Unstructured

•  High Volume

•  High Velocity

ANALYZE •  Predictive Analytics

•  Machine Learning

•  Advance Data Science

•  Realtime Analytics

DEVELOP •  Advanced Analytic Pipelines

•  Realtime Analytical Applications

•  Global Scale Data-Driven Applications

•  Enterprise, Consumer, IoT, and Mobile

INNOVATE •  Agile Dev Expertise

•  DevOps

•  Hybrid Cloud

•  Continuous Delivery

•  Closed Loop Applications

AGILE DEVELOPMENT

BIG DATA PREDICTIVE ANALYTICS

ENTERPRISE PAAS

Spring XD

Spark

Pivotal HD & Open Data Platform

Spring XD

Pivotal Greenplum Database

Pivotal HAWQ

Spring XD

Pivotal GemFire

Redis

Rabbit MQ

Spring IO

Groovy

Pivotal BDS on PCF

Pivotal Cloud Foundry

Pivotal Labs Data Science Data Engineering

Page 33: Pivotal Big Data Roadshow

33 © Copyright 2015 Pivotal. All rights reserved.

WHY PIVOTAL FOR BIG DATA ? Complete platform

SQL on Hadoop leadership

Deployment options

Open source

Flexible licensing

Advanced data services

Pivotal Data Engineering Pivotal Labs Pivotal Data Science

Page 34: Pivotal Big Data Roadshow

34 © Copyright 2015 Pivotal. All rights reserved.

Complete platform

SQL on Hadoop leadership

Deployment options

Open source

Flexible licensing

Advanced data services

WHY PIVOTAL FOR BIG DATA ? Pivotal Data Engineering Pivotal Labs Pivotal Data Science

Page 35: Pivotal Big Data Roadshow

35 © Copyright 2015 Pivotal. All rights reserved.

FOR FURTHER INFO, CHECKOUT…

•  Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data

•  Pivotal Blog @ http://blog.pivotal.io

•  Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal

•  Pivotal Academy @ https://pivotal.biglms.com Or reach out to your local Pivotal Account Executive…

Page 36: Pivotal Big Data Roadshow

36 © Copyright 2015 Pivotal. All rights reserved. 36 © Copyright 2013 Pivotal. All rights reserved.

Pivotal Data Science Overview and Use Cases Pivotal Big Data Roadshow

Page 37: Pivotal Big Data Roadshow

37 © Copyright 2015 Pivotal. All rights reserved.

DATA SCIENCE?

App Development

Analytics

Business Intelligence Reporting

Visualization

Dashboards

Insights Big Data

Machine Learning Statistics

Mathematics Time Series

Algorithms

Databases

Software

Modeling

Queries

Real-Time

Sensors

Predictive Models

ETL

Research

Hadoop

Distributed Computing

MapReduce

SQL

In-Memory

OLAP

Text Mining

Unstructured Data

Open Source

Decision Science

Ad Hoc Queries

Hacking

In-Database Analytics

Internet of Things

Data Cleansing

Sentiment

Page 38: Pivotal Big Data Roadshow

38 © Copyright 2015 Pivotal. All rights reserved.

•  ETL

•  Unstructured

•  Data Cleansing

•  Sensors

Data Related

•  Algorithms

•  Mathematics

•  Time Series

•  Statistics

•  Predictive Modeling

•  Machine Learning

•  Text Mining

•  Sentiment

•  Map Reduce

Fields of Study & Techniques

•  Dashboards

•  Insights

•  Visualization

•  Ad Hoc Queries

•  Reporting

Business Intelligence

•  Software

•  In-Database Analysis

•  Distributed Computing

•  Hadoop

•  Open Source

Implementation

•  Big Data

•  Decision Science

•  Internet of Things

•  Real-Time

•  Hacking

•  In-Memory

Industry Buzzwords

Page 39: Pivotal Big Data Roadshow

39 © Copyright 2015 Pivotal. All rights reserved.

What is Data Science?

The use of statistical and machine learning techniques on big multi-structured data in a distributed computing environment to identify correlations and causal relationships, classify and predict events, identify patterns and anomalies, and infer probabilities, interest, and sentiment.

DRIVE AUTOMATED, LOW-LATENCY ACTIONS IN RESPONSE TO EVENTS OF INTEREST

Page 40: Pivotal Big Data Roadshow

40 © Copyright 2015 Pivotal. All rights reserved.

Gene Sequencing

Smart Grids

COST TO SEQUENCE ONE GENOME HAS FALLEN FROM $100M IN 2001 TO $10K IN 2011 TO $1K IN 2014

READING SMART METERS EVERY 15 MINUTES IS 3000X MORE DATA INTENSIVE

Stock Market

Social Media

FACEBOOK UPLOADS 250 MILLION

PHOTOS EACH DAY

Billions of Data Points

Oil Exploration

Video Surveillance

OIL RIGS GENERATE

25000 DATA POINTS PER SECOND

Medical Imaging

Mobile Sensors

Page 41: Pivotal Big Data Roadshow

41 © Copyright 2015 Pivotal. All rights reserved.

What is Big Data Analytics?

Descriptive Analytics

WHAT HAPPENED? Diagnostic Analytics

WHY DID IT HAPPEN? Predictive Analytics

WHAT WILL HAPPEN?

Prescriptive Analytics

HOW CAN WE MAKE IT HAPPEN?

Complexity

Value of Analytics

($)

Page 42: Pivotal Big Data Roadshow

42 © Copyright 2015 Pivotal. All rights reserved.

Pivotal Big Data Suite

P L A T F O R M

Data Science Toolkit KEY TOOLS KEY LANGUAGES

SQL

Page 43: Pivotal Big Data Roadshow

43 © Copyright 2015 Pivotal. All rights reserved.

A single address for everything analytics Analytics with Pivotal

Time-to-Insights

FORECASTING CLUSTERING

REGRESSION

CLASSIFICATION

OPTIMIZATION

Page 44: Pivotal Big Data Roadshow

44 © Copyright 2015 Pivotal. All rights reserved.

Smart Systems = Sensors + Digital Brain + Actuators

Problem Formulation

Modeling Step

Data Step Application Step

Data Science for Building Models

Sensors & Actuators

Data Lake

Page 45: Pivotal Big Data Roadshow

45 © Copyright 2015 Pivotal. All rights reserved. 45 © Copyright 2013 Pivotal. All rights reserved.

Data Science Use Cases

Page 46: Pivotal Big Data Roadshow

46 © Copyright 2015 Pivotal. All rights reserved. 46 © Copyright 2013 Pivotal. All rights reserved.

Smart Meter Analytics

Page 47: Pivotal Big Data Roadshow

47 © Copyright 2015 Pivotal. All rights reserved.

The Digital Brain: Making a Smart Grid Smarter!

Action: Where (and when) to send trucks, preventive maintenance

The Digital Brain: Uses Fourier transform extracts patterns and flags outliers/anomalies

Input: Data from smart meters

Page 48: Pivotal Big Data Roadshow

48 © Copyright 2015 Pivotal. All rights reserved.

Smart Meter Analytics – Significant Use Cases

•  Load profiling • Theft prevention • Demand prediction •  Load forecasting • Root cause of power failures • Black-out warning • Anomaly detection • Network topology error

detection

Page 49: Pivotal Big Data Roadshow

49 © Copyright 2015 Pivotal. All rights reserved.

SOLUTION

•  Analyze smart meter power data using unsupervised clustering techniques and detect anomalies based on distance metric in clusters

•  Reduce time required to monitor and improve grid efficiencies

•  Leveraged the MPP architecture of Pivotal GPDB and MADlib in-database machine learning library for fast computation at scale

Electricity Network Load Profiling and Outlier Detection CUSTOMER

A major smart grid infrastructure provider

BUSINESS PROBLEM

Profile power consumption patterns based on smart meter data and flag anomalous usage

CHALLENGES

•  Large volume of smart meter data (several months of data from 100s of thousands of meters) could not be analyzed effectively by legacy system

•  Timely business insights on large scale smart grid infrastructure demand fast processing of data

Page 50: Pivotal Big Data Roadshow

50 © Copyright 2015 Pivotal. All rights reserved.

Electricity Network Load Profiling and Outlier Detection

Dashboards for navigating clusters and outliers

Page 51: Pivotal Big Data Roadshow

51 © Copyright 2015 Pivotal. All rights reserved.

Network Topology Error Detection CUSTOMER

A major utility

BUSINESS PROBLEM

Use load and voltage meter readings to determine errors in transformer network topologies

CHALLENGES

•  Time consuming process to detect network topology errors on entire network in legacy system

•  Timely detection of network topology errors requires big data infrastructure and analytical capabilities

SOLUTION

•  For each transformer network in parallel, solve an LP to determine scale of topology error, which can be used to flag and rank anomalous network topologies

•  Reduce time for topology error detection from several days/weeks to few minutes!

Page 52: Pivotal Big Data Roadshow

52 © Copyright 2015 Pivotal. All rights reserved. 52 © Copyright 2013 Pivotal. All rights reserved.

Security and Fraud

Page 53: Pivotal Big Data Roadshow

53 © Copyright 2015 Pivotal. All rights reserved.

Attacker elevates access to important user, service and admin accounts, and specific systems

Data is acquired from target servers and staged for exfiltration

Data is exfiltrated via encrypted files over ftp to external, compromised machine at a hosting provider

A handful of users are targeted by two phishing attacks: one user opens Zero day payload (CVE-02011-0609)

The user machine is accessed remotely by Poison Ivy tool

Advanced Persistent Threat (APT) APT Kill Chain

1 432

Phishing & Zero Day Attack Back Door Lateral Movement Data Gathering Exfiltrate

5

Page 54: Pivotal Big Data Roadshow

54 © Copyright 2015 Pivotal. All rights reserved.

Anomalous User-to-Resource Access Detection BUSINESS PROBLEM

Detect anomalous user behaviors in the global enterprise computer network

SUMMARY Given local-to-local communication data, identify anomalous users within an enterprise. �  Reduce malware-dwell time, typical 243 days �  Signature-based approaches cannot detect

such behavior

CHALLENGES

10 Billion events in 6 months; 15K+ network devices; No existing SIEM solutions can model user behavioral resource access baseline and enable anomaly detection in an adaptive and scalable architecture.

SOLUTION

An innovative Graph Mining based algorithmic framework with advanced Machine Learning. Network topology and temporal behaviors are both modeled. (Patent pending). Implemented in MPP and PL/R, enabling parallel model training and behavior risk scoring. Successfully identified DLP violating anomalous users.

Page 55: Pivotal Big Data Roadshow

55 © Copyright 2015 Pivotal. All rights reserved. 55 © Copyright 2013 Pivotal. All rights reserved.

Financial Services

Page 56: Pivotal Big Data Roadshow

56 © Copyright 2015 Pivotal. All rights reserved.

Identifying and Pricing Cross-Sell Opportunities CUSTOMER

A global financial services provider

BUSINESS PROBLEM

Identify cross-sell opportunities between two business arms of a financial institution.

CHALLENGES

Integration of large-scale data originating from multiple data warehouses. Developing predictive models to identify novel cross-sell opportunities within the financial institution. Evaluate the identified cross-sell opportunities by their revenue potential.

SOLUTIONS

�  Fast integration of data in Pivotal Greenplum Database.

�  Predictive models and evaluation of profitability: –  Association rule. –  Logistic regression for each product

offered. –  Estimation of revenue opportunity.

�  On-demand reporting and visualization via custom dashboards connected to in-database models.

Page 57: Pivotal Big Data Roadshow

57 © Copyright 2015 Pivotal. All rights reserved.

Credit Risk Assessment and Stress Testing CUSTOMER

A global financial services provider

BUSINESS PROBLEM

Speed up the process of compliance reporting and stress testing for Basel III.

CHALLENGES

Running the calculation procedures on the customer’s legacy database were time-consuming, therefore had to be done in overnight batch mode.

SOLUTION

�  Implement risk asset calculation and stress testing on Pivotal Greenplum Database.

�  Three years of data was processed in well under 2 minutes, significantly faster than the customer’s current procedures.

�  Connect an “in- database” visualization tool to Pivotal Greenplum Database via ODBC for on-demand reporting and visualization.

Page 58: Pivotal Big Data Roadshow

58 © Copyright 2015 Pivotal. All rights reserved.

Financial Compliance BUSINESS PROBLEM •  Ensure compliance with Dodd-Frank and Basel

Committee regulations •  Identify underlying risk and fraud while reducing the

compliance department’s overburdened

Emails Chats Trades

Transactions Policy Securities

Phone Calls Watch Lists …

Financial compliance Data Lake

Data integration

Data clean up Modeling Classification

and ranking

Analyst user interfaces Feedback

Analytics

Analyst feedback Data integration: e.g., append trade information with email and chat communications

Data cleanup: e.g., identify newsletters and spam emails

Modeling: •  Predictive modeling to flag

messages and trades •  Graph and cohort analysis

Analyst feedback Reviewed fraud instances included in periodic model refreshes

SOLUTION �  A data lake platform coupled with cutting edge data

science techniques �  Flexible user interface to promote an adaptive,

continuously learning compliance framework

Page 59: Pivotal Big Data Roadshow

59 © Copyright 2015 Pivotal. All rights reserved.

Pivotal Topic & Sentiment Analysis Engine

External Tables

PXF

HDFS

Source: http Sink: hdfs

Parallel Parsing of JSON

(PL/Python)

HAWQ

Nightly Cron Jobs

Topic Analysis through MADlib pLDA

Unsupervised Sentiment Analysis

(PL/Python)

D3.js Spring XD

Twitter Decahose (~55 million tweets/day)

Page 60: Pivotal Big Data Roadshow

60 © Copyright 2015 Pivotal. All rights reserved.

http://blog.pivotal.io/data-science-pivotal

Check out the Pivotal Data Science Blog!

Page 61: Pivotal Big Data Roadshow

61 © Copyright 2015 Pivotal. All rights reserved.

FOR FURTHER INFO…

•  Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data

•  Pivotal Blog @ http://blog.pivotal.io

•  Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal

•  Pivotal Academy @ https://pivotal.biglms.com

•  Or reach out to your local Pivotal Account Executive…

Page 62: Pivotal Big Data Roadshow

62 © Copyright 2015 Pivotal. All rights reserved.

Data Streaming and IoT Using Pivotal Big Data Suite

Page 63: Pivotal Big Data Roadshow

63 © Copyright 2015 Pivotal. All rights reserved.

Converging Trends

Innovation New Data New Processes New Insights

The Journey to the Data-Driven Enterprise

Data Science and Machine

Learning Big Data

Internet of Things

Page 64: Pivotal Big Data Roadshow

64 © Copyright 2015 Pivotal. All rights reserved.

HDFS

Data Lake

Ingest Store Analytics

Hard to change Labor intensive

Inefficient

Coding based No real-time information Based on expensive ETL

Migrating from a Reactive, Static and Constrained Model…

Page 65: Pivotal Big Data Roadshow

65 © Copyright 2015 Pivotal. All rights reserved.

HDFS Data Lake Expert System /

Machine Learning

In-Memory Real-Time Data

Continuous Learning Continuous Improvement

Continuous Adapting

Data Stream Pipeline

Multiple Data Sources Real-Time Processing Store Everything

To Pro-Active, Self-Improving, Machine Learning Systems

Page 66: Pivotal Big Data Roadshow

66 © Copyright 2015 Pivotal. All rights reserved. New York Times Research: http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html

“50-80% OF THE TIME ON DATA

SCIENCE PROJECTS IS SPENT ON DATA WRANGLING

Page 67: Pivotal Big Data Roadshow

67 © Copyright 2015 Pivotal. All rights reserved.

Data Feeds Stream Processing

Expert Systems Machine Learning

Historical Data

Business Value Smart Decisions

Still…

HDFS

Data Lake

Page 68: Pivotal Big Data Roadshow

68 © Copyright 2015 Pivotal. All rights reserved.

Ingest Transform Sink SpringXD

GemFire

Data Stream Needs an Agile, Scalable and Fast Solution

HAWQ GPDB

Data Lake

Page 69: Pivotal Big Data Roadshow

69 © Copyright 2015 Pivotal. All rights reserved.

Ingest Transform Sink SpringXD

Distributed Computing

In-Memory Real-Time Data

Spring XD Orchestrates and Automates all the Steps on Data Stream Pipelining

Expert System / Machine Learning

Extensible Open-Source Fault-Tolerant Horizontally Scalable HAWQ GPDB

Data Lake

Page 70: Pivotal Big Data Roadshow

70 © Copyright 2015 Pivotal. All rights reserved.

INGEST / SINK PROCESS ANALYZE

•  No coding required

•  Dozens of built-in connectors

•  Seamless integration with Kafka, Sqoop

•  Create new connectors easily using Spring

•  Call Spark, Reactor or RxJava

•  Built-in configurable filtering, splitting and transformation

•  Out-of-box configurable jobs for batch processing

•  Import and invoke PMML jobs easily

•  Call Python, R, Madlib and other tools

•  Built-in configurable counters and gauges

Spring XD State of the Art Data Pipeline Automation

Page 71: Pivotal Big Data Roadshow

71 © Copyright 2015 Pivotal. All rights reserved.

Ingest Transform Sink SpringXD

Distributed Computing

GemFire Provides Scalable, Low-Latency Data Access, Storage and Event Processing

Expert System / Machine Learning

GemFire

Extensible Open-Source Fault-Tolerant Horizontally Scalable HAWQ GPDB

Data Lake

Page 72: Pivotal Big Data Roadshow

72 © Copyright 2015 Pivotal. All rights reserved.

GemFire

•  In-Memory Enterprise Data Grid •  Horizontally Scalable, Consistent, Highly

Available •  Event handling •  Continuous Queries •  Enterprise Data Geo Distribution

In-memory Real Time Data

Page 73: Pivotal Big Data Roadshow

73 © Copyright 2015 Pivotal. All rights reserved.

Ingest Transform Sink SpringXD

Distributed Computing

Pivotal Provides SQL Based Advanced Analytics

Expert System / Machine Learning

GemFire

Extensible Open-Source Fault-Tolerant Horizontally Scalable

Data Lake

HAWQ GPDB

Page 74: Pivotal Big Data Roadshow

74 © Copyright 2015 Pivotal. All rights reserved.

HAWQ

•  Massively Parallel Processing RDBMS on HADOOP

•  ANSI SQL on Hadoop •  Extremely high performance for

analytics (not like Hive) •  Stores all data directly on

HDFS •  Open-Source

Advanced SQL analytics in Hadoop

Combining SQL with Hadoop is key for analytics

SQL remains #1 choice for Data Science

Page 75: Pivotal Big Data Roadshow

75 © Copyright 2015 Pivotal. All rights reserved.

Ingest Transform Sink SpringXD

Developers and Data Scientists Can Focus on the Business Value of Data

GemFire

Extensible Open-Source Fault-Tolerant Horizontally Scalable

Data Lake

HAWQ GPDB

Page 76: Pivotal Big Data Roadshow

76 © Copyright 2015 Pivotal. All rights reserved.

Data Streaming Reference Architecture Data Feeds Transactional Apps Analytic Apps

Data Stream Pipeline

Distributed Computing Real-Time Data Expert Systems &

Machine Learning Advanced Analytics

HDFS Data Lake

Page 77: Pivotal Big Data Roadshow

77 © Copyright 2015 Pivotal. All rights reserved.

Data Streaming Reference Architecture Data Feeds Transactional Apps Analytic Apps

Data Stream Pipeline

HDFS Data Lake

GemFire HAWQ GPDB

SpringXD

Page 78: Pivotal Big Data Roadshow

78 © Copyright 2015 Pivotal. All rights reserved.

“SO WE ARE MOVING TO A WORLD WHERE THE

MACHINES WE WORK WITH ARE NOT JUST INTELLIGENT; THEY ARE BRILLIANT.THEY ARE

SELF-AWARE, THEY ARE PREDICTIVE, REACTIVE AND SOCIAL. IT'S A WORLD WHERE INFORMATION ITSELF BECOMES INTELLIGENT AND COMES TO US

AUTOMATICALLY WHEN WE NEED IT WITHOUT HAVING TO LOOK FOR IT.

”MARCO ANNUNZIATA, GE

Page 79: Pivotal Big Data Roadshow

79 © Copyright 2015 Pivotal. All rights reserved.

The IoT Market Verticals Diversified Industrial Manufacturers

Agriculture, Security, Retail

Auto Manufacturers

Media (via Mobile devices)

Urban Infrastructure, Cities

Consumer, Connected Home etc.

Healthcare, Life Sciences

Page 80: Pivotal Big Data Roadshow

80 © Copyright 2015 Pivotal. All rights reserved.

“THE MAGIC HAPPENS WHEN YOU MARRY THE

TRADITIONAL ENGINEERING APPROACH WITH THE DATA SCIENCE ENABLED BY THE DATA LAKE. IT OPENS UP A WHOLE NEW WORLD OF POSSIBLE

‘WHAT IF’ QUESTIONS.

”DAVE BARTLETT, GE AVIATION

Page 81: Pivotal Big Data Roadshow

81 © Copyright 2015 Pivotal. All rights reserved.

GE Aviation – Big Data & IoT •  Goal

•  Improve jet engine efficiency and increase service profitability •  Unable to store & analyze massive amounts of data for analytics

•  Solution •  LARGE DATA SETS ingested via batch •  Store 100s TB of engine data in Hadoop (PHD) •  Open doors for industrial engineers to poke at data (HAWQ) •  FAST MACHINE LEARNING based algorithms

•  2000x faster, 10x cheaper •  Customer portals for visibility

Page 82: Pivotal Big Data Roadshow

82 © Copyright 2015 Pivotal. All rights reserved.

“THE REAL OPPORTUNITY FOR

CHANGE...SURPASSING THE MAGNITUDE OF THE CONSUMER INTERNET...IS THE INDUSTRIAL

INTERNET, AN OPEN, GLOBAL NETWORK THAT CONNECTS PEOPLE, DATA AND MACHINES.

”JEFF IMMELT, CEO, GE

Page 83: Pivotal Big Data Roadshow

83 © Copyright 2015 Pivotal. All rights reserved.

GE Energy – Fast Data & IoT •  Goal

•  Failing gas turbines causing issues with power generation •  Unable to store & process fire-hose of data

•  Solution •  HIGH VELOCITY data ingestion from Gas Turbines •  Store 10 TB of turbine data in memory (GemFire) •  VERY LOW LATENCY and HIGH SPEED data access •  “Predictive” Maintenance

Page 84: Pivotal Big Data Roadshow

84 © Copyright 2015 Pivotal. All rights reserved.

“… YOU USE THOSE DEVICES TO INSPECT CARS AND KEEP TRACK OF INSPECTION RECORDS. THAT CAN THEN FLOW INTO ASSET MANAGEMENT SOFTWARE TO PROVIDE PREDICTIVE ANALYSIS OF WHEN THE

ASSET NEEDS TO BE MAINTAINED ... WHEN YOU BOIL THAT DOWN TO BUSINESS, IT’S ABOUT

COMPETITIVE ADVANTAGE.

”BRAD HOWELL, LODESTAR LOGISTICS

Page 85: Pivotal Big Data Roadshow

85 © Copyright 2015 Pivotal. All rights reserved.

GE Transportation – Big & Fast Data •  Goal

•  Help rail companies manage locomotives better •  Fast data from tracks & Big data from sensors in locomotives

•  Solutions •  MACHINE LEARNING MODEL built from combined data set (MADLib) •  REAL TIME SCORING of rail sensor data (Spring XD) •  REAL-TIME ALERTING of critical events via email & a REAL TIME DASHBOARD (Spring)

Page 86: Pivotal Big Data Roadshow

86 © Copyright 2015 Pivotal. All rights reserved.

“WE EXPECT THE PRECISION AGRICULTURE SPACE

TO CONTINUE TO GROW QUICKLY AS DATA BECOMES CHEAPER TO STORE AND EASIER TO MOVE FROM PLATFORM TO PLATFORM. WE ARE

JUST BEGINNING TO EXPLORE ALL THE VALUE WE CAN CREATE FOR FARMERS WITH THESE TOOLS.

”BRETT BEGEMANN, MONSANTO

Page 87: Pivotal Big Data Roadshow

87 © Copyright 2015 Pivotal. All rights reserved.

Monsanto – Agriculture & IoT •  Goal

•  Help farmers maximize crop yields

•  Solution •  Use of Big Data to collect & store data from farm equipment •  Combine with climate and other information •  Build custom apps using an agile app dev platform (PCF)

Page 88: Pivotal Big Data Roadshow

88 © Copyright 2015 Pivotal. All rights reserved.

IoT - Need for a Platform

A Better Customer Experience

Pivotal Cloud Foundry

Pivotal Big Data Suite

Using Innovative Data-Driven Apps On a Integrated Platform

Page 89: Pivotal Big Data Roadshow

89 © Copyright 2015 Pivotal. All rights reserved.

The Connected Car Architecture INGESTION

JSON / HTTP

STREAM PROCESSING

Spring XD Transform Enrich

DATA LAKE

Pivotal HD Sink

ADVANCED ANALYTICS

HAWQ

REAL-TIME DATA INSIGHTS

GemFire

MOBILE SERVICES

MICROSERVICES

Pivotal CF Dashboard Analytics App Simulator

IoT APPS

Rabbit MQ

PUSH

Page 90: Pivotal Big Data Roadshow

90 © Copyright 2015 Pivotal. All rights reserved.

Horizontally Scalable Fault Tolerant Extensible Open-Source

STREAM PROCESSING

Spring XD

Rabbit MQ

DATA LAKE

Pivotal HD

ADVANCED ANALYTICS

HAWQ

ENRICHER PREDICTIVE ANALYTICS

+ Timestamp & GUID

+ MPG, rangE & route

MOBILE APP

JSON

REAL-TIME DATA INSIGHTS

GemFire

CAR SENSOR

Sink

Tap

DASHBOARD

Page 91: Pivotal Big Data Roadshow

91 © Copyright 2015 Pivotal. All rights reserved.

FOR FURTHER INFO, CHECKOUT…

•  Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data

•  Pivotal Blog @ http://blog.pivotal.io

•  Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal

•  Pivotal Academy @ https://pivotal.biglms.com

•  Or reach out to your local Pivotal Account Executive…

Page 92: Pivotal Big Data Roadshow

BUILT FOR THE SPEED OF BUSINESS