
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark, Solr, and More


Page 1

© 2016 MapR Technologies

Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark, Solr and More

September 7th, 2016

Page 2

Today’s Presenters

Steve Wooledge, VP, Product & Digital (@swooledge)

Kandarp Desai, Director of Engineering (@kandarpdesai)

Page 3

Page 4

Big Data Meets the As-It-Happens World

Connected devices worldwide:
• 1990s, Fixed Internet: 1 billion
• 2000s, Mobile Internet: 6 billion
• 2020, Internet of People and Things: 50 billion

By 2020, 21% of all “high value” data will come from IoT (IDC).

Page 5

Legacy: separate operations data and analytics data
• Complex and slow
• Multiple versions of the truth
• Difficult governance
• High TCO

Modern Apps: HTAP (Hybrid Transaction/Analytical Processing), Gartner 2015
• Real-time data to action
• Single data copy
• Easy governance
• Low TCO, scales horizontally

Page 6

Use Cases by Industry

Page 7

A Once-in-30-Year Shift in Data Architecture

Critical infrastructure for next-gen business processes

Diagram, top to bottom: Next-Gen Applications and Legacy Applications; Open Source Analytic Innovations and Legacy; Next-Gen Data Platform (Disruptive Data Platform); On Premise, Private Cloud, and Public Cloud; Heterogeneous Hardware

Page 8

Leading Research Sees Data Platform Convergence

“…we expect data-platform contraction to be driven by convergence of the various approaches to data processing and analytics.

A number of 451 Research enterprise clients are already in the process of assembling what we might call ‘converged data platforms,’ combining operational and analytic databases with data grid/cache technologies, Hadoop and stream-processing technologies.”

- Matt Aslett, 451 Research, “Toward a Converged Data Platform,” Dec 2015

Page 9

Next-Gen Application Requirements

Apps: Customer Experience, Data Architecture Optimization, Security Investigation & Event Management, Operational Intelligence, Managed Services & Custom Apps

Processing: Batch, Interactive, Streaming, Transactions, Storage

Data: Data Platform/Storage

Page 10

Silos – Point Solutions

Apps: Customer Experience, Data Architecture Optimization, Security Investigation & Event Management, Operational Intelligence, Managed Services & Custom Apps

Processing: Batch, Interactive, Streaming, Transactions, Storage

Data (separate silos): HDFS, NoSQL, Event Streaming, RDBMS, SAN / NAS

Page 11

MapR Solution: Common Data Services for All Applications

Apps: Customer Experience, Data Architecture Optimization, Security Investigation & Event Management, Operational Intelligence, Managed Services & Custom Apps

Processing: Batch, Interactive, Streaming, Transactions, Storage

Data: HDFS, NoSQL, Event Streaming, RDBMS, SAN / NAS, unified on the MapR Converged Data Platform:
• Combined analytic and operational data
• Single copy of data, not silos
• Unified platform rather than separate point solutions

Page 12

MapR Converged Data Platform

• Open Source Engines & Tools; Commercial Engines & Applications; Custom Apps; Search and Others; Cloud and Managed Services
• Data Processing
• Enterprise-Grade Platform Services: Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability
• Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
• APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
• Unified Management and Monitoring

Page 13

MapR: the Production Choice for Big Data Applications

• Best Product: Converged Data Platform; Apache Open Source + Innovation
• High Growth: > 100% growth; 700+ customers
• < 1% churn
• 18% of customers with > 50 apps**
• 382% average 3-year ROI*

* IDC, “The Business Value of MapR,” 2016. ** TechValidate Research, 2015.

Page 14

KANDARP DESAI, DIRECTOR OF ENGINEERING

TWITTER @kandarpdesai

How to Build a Successful Converged Data Platform with Hadoop, Spark, Solr and More

Page 15

©2016 Xactly Corporation. All rights reserved. Proprietary & Confidential.

This presentation and the accompanying oral presentation contain “forward-looking” statements that are based on our management’s current expectations and projections about future events and trends that we believe may affect our business, financial condition, operating results and growth prospects. Forward-looking statements include all statements other than statements of historical fact contained in this presentation, including information relating to future events or our future financial or operating performance, such as our future product release dates. Forward-looking statements are subject to substantial risks, uncertainties and other factors. These factors, together with those that may be described in greater detail in a registration statement (including a prospectus) that we may subsequently file with the Securities and Exchange Commission (“SEC”) for the transaction to which this presentation relates, may cause our actual results, events, or circumstances to differ materially from those described in our forward-looking statements. You should not rely upon forward-looking statements as predictions of future events. Our forward-looking statements relate only to events as of the date on which the statements are made. We undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of this presentation or to reflect new information or the occurrence of unanticipated events, except as required by law. In addition to financial measures prepared in accordance with generally accepted accounting principles in the United States (“U.S. GAAP”), this presentation includes certain non-GAAP financial measures. We believe that these non-GAAP financial measures are useful as a supplement in evaluating our ongoing operational performance and enhancing an overall understanding of our past financial performance. 
The non-GAAP financial measures included in this presentation should not be considered in isolation from, or as a substitute for, financial information prepared in accordance with U.S. GAAP. A reconciliation between each non-GAAP financial measure and its nearest GAAP equivalent is included at the end of this presentation. We are an “emerging growth company” as defined under the Securities Act of 1933, as amended (the “Act”). This presentation and the accompanying oral presentation are intended to qualify as communications permitted pursuant to Section 5(d) of the Act. We may file a registration statement (including a prospectus) under the Act with the SEC for the transaction to which this communication relates. In the event we conduct an offering, before you invest you should read the prospectus in the registration statement and other documents we file with the SEC for more complete information about us and the offering. When available, you may get these documents for free by visiting EDGAR on the SEC website at http://www.sec.gov.

SAFE HARBOR

Page 16

COMPENSATION IS A GLOBAL PROBLEM

Page 17

IT IS LARGELY A MANUAL PROCESS THAT

Requires too much personal attention and takes too long

Generates too many errors

Specific Pain Points:

Time-consuming, manual process for a mid-level analyst: takes 16-24 hours/month

3% errors on a $750,000 commission budget = $22,500

Reps lack visibility into performance and earnings

Executive reporting and accruals are manual

Increased plan complexity and a growing sales team

Page 18

BEST OF BREED APPROACH

HCM, ERP, CPQ, CRM

Page 19

Multi-level horizontal scalability

Diagram: a single SaaS system made up of multiple pods (Pod-1, Pod-2, …, Pod-N). Each pod scales horizontally at every level: an application tier (App1 … Appn), a calculation tier (Calc1 … Calcn), a cache tier, and RDBMS storage on DB nodes.

Page 20

Building Data Pipeline: RDBMS → Hadoop

Diagram: RDBMS → Hadoop via async event(s) and CRUD to MapR-FS or MapR-DB

Hadoop distribution comparison (scores on a scale of 1 = lowest to 10 = highest; the actual table contains many more criteria):

Criterion                     MapR   Cloudera   Hortonworks   Apache Hadoop
Core functionalities
  Storage                     7      5          4             X
  Cluster Management          7      6          5             X
  Data Access                 8      6          2             X
Strategy & Market Presence
  Support & PS                8      6          X             X
  Roadmap                     9      5          X             X
  Adoption                    X      X          X             X
Pricing & Company
  Customer Count              X      X          X             X
  Employee Base               X      X          X             X

Page 21

Building Data Pipeline: RDBMS → Hadoop (MapR-FS)

Diagram: RDBMS instances in POD 1, POD 2, …, POD N, with CRUD operations applied to MapR-FS (MFS) or MapR-DB

• Billions of Calculation Results Per Month

• Thousands of Events Per Minute

• Homegrown Workflow and Event Management

• Combines RDBMS ACID and MapR-FS Snapshots to Achieve Immutable Copy of Data

• Evolution from Workflow Management to Pub/Sub System Such As Kafka

Page 22

Page 23

Lessons Learned

• Group Transaction Events in Logical Buckets
• Efficient Data Movement
• Easier to Detect Missing Data (If Any)
• Re-Broadcasting Is an Append-Only Event

• Platform First, Product Second Approach
• Don't Take Data Validation Lightly
• Investing More Up-Front = Better Future
• Choosing the Perfect Hadoop Distribution Is Not Trivial
• Do Not Underestimate the Power of Snapshots
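The bucketing lessons can be illustrated with a small sketch. It is not Xactly's implementation: the `TxnEvent` fields (`tenant_id`, `bucket`, `seq`), the assumption that the source reports how many events each bucket should contain, and all function names are hypothetical. Grouping by (tenant, bucket) lets a whole bucket move as one unit, a per-bucket sequence number makes gaps easy to spot, and a re-broadcast only appends to the log.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnEvent:
    tenant_id: str  # hypothetical: customer the event belongs to
    bucket: str     # hypothetical logical bucket, e.g. a calculation period
    seq: int        # hypothetical per-bucket sequence number assigned at the source
    payload: str


def group_into_buckets(events):
    """Group events by (tenant, bucket) so each bucket can be moved as one unit."""
    buckets = defaultdict(list)
    for ev in events:
        buckets[(ev.tenant_id, ev.bucket)].append(ev)
    return dict(buckets)


def missing_sequences(bucket_events, expected_count):
    """Return the sequence numbers in 0..expected_count-1 that never arrived."""
    seen = {ev.seq for ev in bucket_events}
    return sorted(set(range(expected_count)) - seen)


def rebroadcast(log, bucket_events):
    """Re-send a bucket by appending its events to the log; nothing is updated in place."""
    log.extend(bucket_events)
    return log
```

For example, a bucket that should hold sequence numbers 0 to 2 but only received 0 and 2 reports `[1]` as missing. Because the log is append-only, a consumer that sees a re-broadcast bucket still has to de-duplicate, for instance on (tenant_id, bucket, seq); that step is left out here.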

Page 24

Building Data Pipeline: Data Platform Processing

Diagram: RDBMS instances in POD 1, POD 2, …, POD N → CRUD to MFS or MapR-DB → Batch MapReduce / ETL on the Converged Data Platform

• Process 11 Years of Empirical Data
• Billions of Rows; 10s of TBs of Data
• Custom MapReduce Framework Running 1000s of Jobs
• Types of Operations: Multiple Types of Joins, Aggregation at Lowest Level, etc.
• Prepares Data for Product and Data Science Team
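To make "multiple types of joins, aggregation at lowest level" concrete, here is a plain-Python imitation of a reduce-side join followed by an aggregation, the classic MapReduce pattern. The record layout (`rep_id`, `region`, `amount`), the inner-join semantics, and the function names are assumptions for illustration; the slide does not describe how Xactly's custom framework is built.

```python
from collections import defaultdict
from itertools import chain


def map_phase(records, tag):
    """Mapper: emit (join_key, (source_tag, record)) for every input record."""
    for rec in records:
        yield rec["rep_id"], (tag, rec)


def shuffle(pairs):
    """Framework shuffle: group all mapper output values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: inner-join reps with their credits, then sum amounts per (region, rep)."""
    totals = defaultdict(float)
    for rep_id, values in grouped.items():
        reps = [rec for tag, rec in values if tag == "rep"]
        credits = [rec for tag, rec in values if tag == "credit"]
        for rep in reps:  # inner join: keys missing on either side produce no output
            for credit in credits:
                totals[(rep["region"], rep_id)] += credit["amount"]
    return dict(totals)


def run_job(reps, credits):
    """Wire the phases together the way a single MapReduce job would."""
    pairs = chain(map_phase(reps, "rep"), map_phase(credits, "credit"))
    return reduce_phase(shuffle(pairs))
```

Other join types fall out of the same reducer: a left outer join, for instance, would also emit reps whose `credits` list is empty.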

Page 25

Building Data Pipeline: Batch SPARK Processing

Diagram: RDBMS instances in POD 1, POD 2, …, POD N → CRUD to MFS or MapR-DB → Batch MapReduce / ETL and Batch SPARK Processing on the Converged Data Platform

• Batch Spark Jobs Running On-Demand and Weekly
• Types of Operations: Multiple Types of Joins, Aggregation for Application
• 100s of GBs of Data; Billions of Records
• Was MapReduce (~17 Hours) → Now Spark Jobs (~6 Hours)
• Was Hive (10 Hours) → Now Drill (~2 Hours)
• Offline Data Science Models
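The before/after timings on this slide translate into rough speedups; the inputs are approximate, so the ratios are too:

```latex
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}}
\qquad
\text{MapReduce} \to \text{Spark}: \frac{17\,\text{h}}{6\,\text{h}} \approx 2.8\times
\qquad
\text{Hive} \to \text{Drill}: \frac{10\,\text{h}}{2\,\text{h}} = 5\times
```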

Page 26

Building Data Pipeline: Real-Time SPARK Processing

Diagram: RDBMS instances in POD 1, POD 2, …, POD N → CRUD to MFS or MapR-DB → Batch MapReduce / ETL and Batch SPARK Processing on the Converged Data Platform; aggregates stored on MapR-DB serve data to the benchmarking application

• Spark Data Processing for Real-Time Web Benchmarking App
• Real-Time Data Science Model Processing, Such as GLM
• Types of Operations: Average, Percentile, Distributions, Others
• Long-Running Spark Context
• Varieties of RDD Caching Techniques to Speed Up Calculation
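The slide pairs a long-running Spark context with RDD caching so that averages, percentiles, and distributions can be answered quickly. The sketch below shows the same idea without Spark: a hypothetical `CachedBenchmark` pays the sorting cost once and then serves many cheap queries from memory. The linear-interpolation percentile is one common definition (NumPy's default); the slide does not state which definition Xactly uses.

```python
import bisect
import statistics


class CachedBenchmark:
    """Sort a metric column once, then answer many queries from memory.

    Loosely analogous to a cached RDD in a long-running context: the expensive
    step (load + sort) happens once; each request only does cheap lookups.
    """

    def __init__(self, values):
        if not values:
            raise ValueError("need at least one value")
        self._sorted = sorted(values)                 # one-time cost, kept in memory
        self._mean = statistics.fmean(self._sorted)   # precomputed average

    def average(self):
        return self._mean

    def percentile(self, p):
        """Linear-interpolation percentile for p in [0, 100]."""
        if not 0 <= p <= 100:
            raise ValueError("p must be in [0, 100]")
        pos = (len(self._sorted) - 1) * p / 100
        lo = int(pos)
        hi = min(lo + 1, len(self._sorted) - 1)
        frac = pos - lo
        return self._sorted[lo] + (self._sorted[hi] - self._sorted[lo]) * frac

    def rank_of(self, value):
        """Percentile rank: percentage of benchmark values strictly below `value`."""
        return 100 * bisect.bisect_left(self._sorted, value) / len(self._sorted)

    def histogram(self, edges):
        """Counts per half-open bucket [edges[i], edges[i+1]) for a distribution view."""
        return [
            bisect.bisect_left(self._sorted, hi) - bisect.bisect_left(self._sorted, lo)
            for lo, hi in zip(edges, edges[1:])
        ]
```

Each query after construction is O(1) or O(log n), which is the property that matters when a web request has a few seconds to respond.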

Page 27

Xactly Insights: Web Application

• Real-Time Calculation of Custom GLM: Under 2-3 Seconds Response Time
• Calculates Percentiles in Real Time: Under 3 Seconds Response Time
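The slide does not say which family or link Xactly's custom GLM uses. Purely as an assumption, this sketch fits a common case, a binomial GLM with a logit link (logistic regression) with one predictor plus an intercept, using Newton-Raphson, which for this canonical link is the same as iteratively reweighted least squares (IRLS). `fit_logistic_glm` and `predict` are hypothetical names.

```python
import math


def predict(b0, b1, x):
    """Inverse logit link: P(y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic_glm(xs, ys, max_iter=25, tol=1e-10):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson (equivalently IRLS)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and Fisher information entries (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = predict(b0, b1, x)
            w = p * (1.0 - p)  # binomial variance, i.e. the IRLS weight
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise ValueError("information matrix is singular (does x vary?)")
        # Newton step: solve [[h00, h01], [h01, h11]] @ [d0, d1] = [g0, g1].
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1
```

On perfectly separable data the maximum-likelihood estimate does not exist and the coefficients keep growing, so a production version would add regularization or a separation check.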

Page 28

Page 29

Page 30

Building Data Pipeline: Spark Transaction Processing

Diagram: RDBMS instances in POD 1, POD 2, …, POD N → Sqoop transfers data to MapR-DB → Spark processing on the Converged Data Platform generates results → results flow back to the RDBMS in each pod

• Short-Lived Spark Context per Tenant
• 100 GBs of Data Processing per Business
• Replacing Stored Procedures
• 2.5x Faster Processing Speed with Spark
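A rough sketch of the "short-lived context per tenant" pattern: rows are split by tenant and each tenant's job runs from fresh state, with nothing shared across tenants. The row fields (`tenant_id`, `rep_id`, `amount`, `rate`) and the commission-style calculation in `process_tenant` are hypothetical stand-ins for whatever logic previously lived in the stored procedures; in the pipeline on the slide, each such call would presumably run in its own short-lived Spark context.

```python
from collections import defaultdict


def process_tenant(rows):
    """Hypothetical per-tenant job standing in for stored-procedure logic:
    total commission (amount * rate) per rep for one tenant."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["rep_id"]] += row["amount"] * row["rate"]
    return dict(totals)


def run_per_tenant(rows):
    """Split rows by tenant and run an isolated job for each one.

    Each call to process_tenant starts from fresh state, so no intermediate
    results leak from one tenant to another.
    """
    by_tenant = defaultdict(list)
    for row in rows:
        by_tenant[row["tenant_id"]].append(row)
    return {tenant: process_tenant(tenant_rows) for tenant, tenant_rows in by_tenant.items()}
```

Isolating tenants this way also means one tenant's failure or memory spike ends only that tenant's job.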

Page 31

Building Data Pipeline: Real-Time Search

Diagram: RDBMS instances in POD 1, POD 2, …, POD N → CRUD to MFS or MapR-DB → MapReduce prepares data for Solr injection → SOLR engine, on the Converged Data Platform

• ~Real-Time Search on Thousands of Standard and Custom Fields
• Types of Operations: Multiple Types of Joins and Mappings
• 100s of Small MapReduce Jobs / 10 Minutes
• ~100 GBs Data Size / 5 Minutes
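MapReduce prepares data for Solr injection, and the thousands of custom fields are usually the tricky part. One common approach, assumed here rather than taken from the slide, is to map each custom field onto a Solr dynamic field by a type suffix (`*_s`, `*_i`, `*_d`, `*_b` follow the conventions of Solr's stock example schemas; check them against your own schema). The `custom_` prefix, the record layout, and the function names are hypothetical.

```python
import json

# Hypothetical mapping from Python types to Solr dynamic-field suffixes.
SUFFIX_BY_TYPE = {str: "_s", int: "_i", float: "_d", bool: "_b"}


def to_solr_doc(record, custom_fields):
    """Flatten one record plus its tenant-defined custom fields into a Solr document."""
    if "id" not in record:
        raise ValueError("record needs an 'id' field")
    doc = dict(record)  # standard fields, assumed to be declared in the schema
    for name, value in custom_fields.items():
        # type() returns the exact type, so True maps to "_b" rather than "_i".
        suffix = SUFFIX_BY_TYPE.get(type(value))
        if suffix is None:
            raise TypeError(f"unsupported type for custom field {name!r}: {type(value).__name__}")
        doc[f"custom_{name}{suffix}"] = value
    return doc


def to_update_payload(docs):
    """Serialize documents as a JSON array, a shape Solr's JSON update handler accepts."""
    return json.dumps(docs)
```

With dynamic fields, adding a new custom field needs no schema change, which matters when tenants define their own fields.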

Page 32

Lessons Learned

• Design & Build for the Future, Not Only the Present
• Big Data Frameworks: Easy to Use, Extremely Hard to Master
• MapR-DB vs. MapR-FS
• Never Settle for Default Configuration; Customization Can Make Life Much Better
• Do Not Use SPARK Without Proper Understanding
• Simple Debugging Can Consume an Entire Sprint or More
• Memory Management in SPARK May Surprise You
• Many More
• SPARK SQL Is Good, Though Developers Must Prefer the Power of the SPARK Scala API
• Retire Hive & Adopt Drill

Page 33

Xactly’s vision is to change the world of incentive compensation. For more information, visit www.xactlycorp.com

• WE ARE HIRING!
• [email protected] or https://www.xactlycorp.com/company/careers/

Page 34

Q & A

Engage with us!

1. Get Case Studies: Big Data All-Stars https://www.mapr.com/when-streaming-becomes-strategic

2. Get Started: MapR Converged Data Platform https://www.mapr.com/get-started-with-mapr

3. Get Answers: MapR Converge Community https://www.mapr.com/big-data-all-stars