© 2016 MapR Technologies
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark, Solr and More
September 7th, 2016
Today’s Presenters
Steve Wooledge, VP, Product & Digital, @swooledge
Kandarp Desai, Director of Engineering, @kandarpdesai
Big Data Meets the As-it-Happens World
Connected Devices Worldwide
1990s: Fixed Internet – 1 Billion
2000s: Mobile Internet – 6 Billion
2020: Internet of People and Things – 50 Billion
By 2020, 21% of all “high value” data will come from IoT
- IDC
Legacy
• Complex and slow
• Multiple versions of the truth
• Difficult governance
• High TCO
Modern Apps
• Real-time data to action
• Single data copy
• Easy governance
• Low TCO – scales horizontally
HTAP (Hybrid Transaction/Analytical Processing) – Gartner 2015
[Diagram labels: Analytics, Operations, Data]
Use Cases by Industry
A Once-in-30-Year Shift in Data Architecture
Critical infrastructure for next-gen business processes
Next-Gen Applications Legacy Applications
Open Source Analytic Innovations Legacy
Disruptive Data Platform
On Premise Private Cloud Public Cloud
Heterogeneous Hardware
Next-Gen Data Platform
Leading Research Sees Data Platform Convergence
“…we expect data-platform contraction to be driven by convergence of the various approaches to data processing and analytics.
A number of 451 Research enterprise clients are already in the process of assembling what we might call ‘converged data platforms,’ combining operational and analytic databases with data grid/cache technologies, Hadoop and stream-processing technologies.”
- Matt Aslett, 451 Research, Toward a Converged Data Platform, Dec 2015
Next-Gen Application Requirements
Apps: Customer Experience, Data Architecture Optimization, Security Investigation & Event Management, Operational Intelligence, Managed Services & Custom Apps
Processing: Batch, Interactive, Streaming, Transactions, Storage
Data: Data Platform/Storage
Silos – Point Solutions
Apps: Customer Experience, Data Architecture Optimization, Security Investigation & Event Management, Operational Intelligence, Managed Services & Custom Apps
Processing: Batch, Interactive, Streaming, Transactions, Storage
Data: HDFS, NoSQL, Event Streaming, RDBMS, SAN / NAS
MapR Solution: Common Data Services for All Applications
Apps: Customer Experience, Data Architecture Optimization, Security Investigation & Event Management, Operational Intelligence, Managed Services & Custom Apps
Processing: Batch, Interactive, Streaming, Transactions, Storage
Data: HDFS, NoSQL, Event Streaming, RDBMS, SAN / NAS
MapR Converged Data Platform
• Combined analytic and operational data
• Single copy of data, not silos
• Unified platform rather than separate point solutions
MapR Converged Data Platform
Data Processing: Open Source Engines & Tools, Commercial Engines & Applications, Custom Apps, Cloud and Managed Services, Search and Others
Enterprise-Grade Platform Services: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
APIs: HDFS API; POSIX, NFS; HBase API; JSON API; Kafka API
Platform Capabilities: Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability
Unified Management and Monitoring
MapR: the Production Choice for Big Data Applications
Best Product
Apache Open Source + Innovation
Converged Data Platform
Big Data
High Growth
700+ Customers
> 100% Growth
< 1% Churn
18% Customers with >50 apps**
382% Avg. 3-yr ROI*
* IDC – “The Business Value of MapR”, 2016.
** TechValidate Research, 2015
KANDARP DESAI, DIRECTOR OF ENGINEERING
TWITTER @kandarpdesai
How to Build a Successful Converged Data Platform with Hadoop, Spark, Solr and More
©2016 Xactly Corporation. All rights reserved. Proprietary & Confidential.
This presentation and the accompanying oral presentation contain “forward-looking” statements that are based on our management’s current expectations and projections about future events and trends that we believe may affect our business, financial condition, operating results and growth prospects. Forward-looking statements include all statements other than statements of historical fact contained in this presentation, including information relating to future events or our future financial or operating performance, such as our future product release dates. Forward-looking statements are subject to substantial risks, uncertainties and other factors. These factors, together with those that may be described in greater detail in a registration statement (including a prospectus) that we may subsequently file with the Securities and Exchange Commission (“SEC”) for the transaction to which this presentation relates, may cause our actual results, events, or circumstances to differ materially from those described in our forward-looking statements. You should not rely upon forward-looking statements as predictions of future events. Our forward-looking statements relate only to events as of the date on which the statements are made. We undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of this presentation or to reflect new information or the occurrence of unanticipated events, except as required by law. In addition to financial measures prepared in accordance with generally accepted accounting principles in the United States (“U.S. GAAP”), this presentation includes certain non-GAAP financial measures. We believe that these non-GAAP financial measures are useful as a supplement in evaluating our ongoing operational performance and enhancing an overall understanding of our past financial performance. 
The non-GAAP financial measures included in this presentation should not be considered in isolation from, or as a substitute for, financial information prepared in accordance with U.S. GAAP. A reconciliation between each non-GAAP financial measure and its nearest GAAP equivalent is included at the end of this presentation. We are an “emerging growth company” as defined under the Securities Act of 1933, as amended (the “Act”). This presentation and the accompanying oral presentation are intended to qualify as communications permitted pursuant to Section 5(d) of the Act. We may file a registration statement (including a prospectus) under the Act with the SEC for the transaction to which this communication relates. In the event we conduct an offering, before you invest you should read the prospectus in the registration statement and other documents we file with the SEC for more complete information about us and the offering. When available, you may get these documents for free by visiting EDGAR on the SEC website at http://www.sec.gov.
SAFE HARBOR
COMPENSATION IS A GLOBAL PROBLEM
IT IS LARGELY A MANUAL PROCESS THAT
Requires too much personal attention and takes too long
Generates too many errors
Specific Pain Points:
Time consuming, manual process for mid-level analyst…takes 16-24 hours/month
3% Errors on $750,000 commission budget = $22,500
Reps lack visibility into performance, earnings
Executive reporting is manual, as are accruals
Increased plan complexity and growing sales team
BEST OF BREED APPROACH
HCM ERP CPQ CRM
Multi-level horizontal scalability
[Diagram: a single SaaS system split into multiple pods (Pod-1, Pod-2, ... Pod-N). Each pod has an Application tier (App1, App2, App3 ... Appn), a Calculation tier (Calc1, Calc2, Calc3 ... Calcn), a Cache tier, and RDBMS Storage on DB Nodes.]
Building Data Pipeline : RDBMS → Hadoop
[Diagram labels: RDBMS; Async Event(s); CRUD to MapR-FS or MapR-DB]

                              MapR  Cloudera  Hortonworks  Apache Hadoop
Core functionalities
  Storage                     7     5         4            X
  Cluster Management          7     6         5            X
  Data Access                 8     6         2            X
Strategy & Market Presence
  Support & PS                8     6         X            X
  Roadmap                     9     5         X            X
  Adoption                    X     X         X            X
Pricing & Company
  Customer Count              X     X         X            X
  Employee Base               X     X         X            X

NOTES:
• Scores in table on scale - 1: lowest; 10: highest
• Actual table contains many more criteria
Building Data Pipeline : RDBMS → Hadoop (MapR-FS)
[Diagram: RDBMS instances in POD 1, POD 2, ... POD N; CRUD to MFS or MapR-DB]
• Billions of Calculation Results Per Month
• Thousands of Events Per Minute
• Homegrown Workflow and Event Management
• Combines RDBMS ACID and MapR-FS Snapshots to Achieve Immutable Copy of Data
• Evolution from Workflow Management to Pub/Sub System Such As Kafka
Lessons Learned
• Group Transaction Events In Logical Buckets
• Efficient Data Movement
• Easier To Detect Missing Data (If Any)
• Re-Broadcasting Is Append Only Event
• Platform First, Product Second Approach
• Don’t Take Data Validation Lightly
• Investing More Up-Front = Better Future
• Choosing Perfect Hadoop Distribution Is Not TRIVIAL
• Do Not Underestimate Power Of Snapshots
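The bucketing lessons above can be illustrated with a small sketch. It is not Xactly's implementation: the `Event` fields, the per-pod sequence number, and `BUCKET_SIZE` are all assumptions made for illustration. It shows why grouping events into logical buckets makes gaps easy to find, and why a re-broadcast can be a plain append.

```python
from collections import defaultdict
from dataclasses import dataclass

BUCKET_SIZE = 100  # hypothetical: number of sequence numbers per logical bucket


@dataclass(frozen=True)
class Event:
    pod: str      # hypothetical: name of the source pod
    seq: int      # hypothetical: per-pod sequence number assigned at the source
    payload: str


def bucket_events(events):
    """Group events into logical buckets keyed by (pod, bucket index)."""
    buckets = defaultdict(list)
    for event in events:
        buckets[(event.pod, event.seq // BUCKET_SIZE)].append(event)
    return dict(buckets)


def missing_sequences(bucket_key, bucket, last_seq):
    """Return the sequence numbers expected in one bucket but never received."""
    _pod, index = bucket_key
    start = index * BUCKET_SIZE
    end = min(start + BUCKET_SIZE, last_seq + 1)
    seen = {event.seq for event in bucket}
    return [seq for seq in range(start, end) if seq not in seen]


def rebroadcast(event_log, replayed):
    """Append replayed events; entries already in the log are never rewritten."""
    event_log.extend(replayed)
    return event_log


# Usage: 250 events from one pod, with two of them lost in transit.
received = [Event("pod-1", s, f"row-{s}") for s in range(250) if s not in (42, 137)]
buckets = bucket_events(received)
gaps = {key: missing_sequences(key, bucket, last_seq=249)
        for key, bucket in buckets.items()}
print(gaps)
```

Checking one bucket at a time keeps the comparison small, and only the affected bucket needs to be replayed.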
Building Data Pipeline : Data Platform Processing
[Diagram: RDBMS instances in POD 1, POD 2, ... POD N; CRUD to MFS or MapR-DB; Batch Map-Reduce / ETL; Converged Data Platform]
• Process 11 Years of Empirical Data
• Billions of Rows; 10s of TBs of Data
• Custom MapReduce Framework Running 1000s of Jobs
• Types of Operations: Multiple Types of Joins, Aggregation at Lowest Level, etc.
• Prepares Data for Product and Data Science Team
Building Data Pipeline : Batch SPARK Processing
[Diagram: RDBMS instances in POD 1, POD 2, ... POD N; CRUD to MFS or MapR-DB; Batch Map-Reduce / ETL; Batch SPARK Processing; Converged Data Platform]
• Batch Spark Jobs Running On-Demand and Weekly
• Types of Operations: Multiple Types of Joins, Aggregation for Application
• 100s of GBs of Data; Billions of Records
• Was Map-Reduce (~17 Hours) -> Now Spark Jobs (~6 Hours)
• Was Hive (10 Hours) -> Now Drill (~2 Hours)
• Off-line Data Science Models
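The before/after run times on this slide translate into speedup factors with simple arithmetic. The sketch below uses only the hour figures quoted on the slide, which are approximate; the `speedup` helper is a hypothetical name introduced for illustration.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the new job is than the old one."""
    return before_hours / after_hours


# Approximate run times (hours) as quoted on the slide.
migrations = {
    "Map-Reduce -> Spark": (17, 6),
    "Hive -> Drill": (10, 2),
}

for name, (before, after) in migrations.items():
    print(f"{name}: {speedup(before, after):.1f}x faster, {before - after} hours saved")
```

On these figures the Spark migration is roughly 2.8x faster and the Drill migration 5x faster.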
Building Data Pipeline : Real-Time SPARK Processing
[Diagram: RDBMS instances in POD 1, POD 2, ... POD N; CRUD to MFS or MapR-DB; Batch Map-Reduce / ETL; Batch SPARK Processing; Agg. Stored on MapR-DB Serving Data to Benchmarking Application; Converged Data Platform]
• Spark Data Processing for Real-Time Web Benchmarking App
• Real-Time Data Science Models Processing Such as GLM
• Types of Operations: Average, Percentile, Distributions, others
• Long-Running Spark Context
• Varieties of RDD Caching Techniques to Speed Calculation
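The idea of a long-running context serving averages and percentiles from cached data can be sketched in plain Python. This is a stand-in for the concept only, not Spark code and not Xactly's implementation: `BenchmarkCache`, its `loader` callable, and the segment names are hypothetical. The percentile uses linear interpolation between closest ranks, which is one common definition among several.

```python
import math
from statistics import mean


def percentile(sorted_values, p):
    """Percentile p (0-100) by linear interpolation between closest ranks."""
    if not sorted_values:
        raise ValueError("no data")
    k = (len(sorted_values) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return sorted_values[lo]
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


class BenchmarkCache:
    """Keeps sorted values per segment in memory so repeat requests skip the load."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable: segment name -> list of numbers
        self._sorted = {}
        self.loads = 0         # counts how often the slow load actually ran

    def _values(self, segment):
        if segment not in self._sorted:
            self._sorted[segment] = sorted(self._loader(segment))
            self.loads += 1
        return self._sorted[segment]

    def summary(self, segment, p):
        values = self._values(segment)
        return {"average": mean(values), "percentile": percentile(values, p)}


# Usage: the loader runs once; later requests are served from memory.
cache = BenchmarkCache(lambda segment: [50, 10, 40, 20, 30])
print(cache.summary("software", 50))
print(cache.summary("software", 90))
print("loads:", cache.loads)
```

Paying the load and sort cost once per segment is what makes per-request response times of a few seconds plausible; in Spark the analogous step would be persisting an RDD inside the long-running context.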
Xactly Insights : Web Application
Real-Time Calculation Of Custom GLM
Under 2-3 Seconds Response Time
Calculates Percentiles Real-Time
Under 3 Seconds Response Time
Building Data Pipeline : Spark Transaction Processing
[Diagram: RDBMS instances in POD 1, POD 2, ... POD N; SQOOP transfers data to MapR-DB; Spark Processing Generates Results; RDBMS instances in POD 1, POD 2, ... POD N; Converged Data Platform]
• Short-Lived Spark Context Per Tenant
• 100 GBs of Data Processing Per Business
• Replacing Stored Procedures
• 2.5x Faster Processing Speed With Spark
Building Data Pipeline : Real-Time Search
[Diagram: RDBMS instances in POD 1, POD 2, ... POD N; CRUD to MFS or MapR-DB; MapReduce Prepares Data for Solr Injection; SOLR Engine; Converged Data Platform]
• ~ Real Time Search On Thousands of Standard and Custom Fields
• Types of Operations: Multiple Types of Joins and Mappings
• 100s of Small Map-Reduce Jobs / 10 minutes
• ~100 GBs Data Size / 5 minutes
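The "joins and mappings" step that prepares data for the search engine can be pictured as flattening relational rows into one document per record. The sketch below is an assumption-laden illustration in plain Python, not the MapReduce jobs described on the slide: the record shapes, field names, and `build_search_docs` are hypothetical, and no Solr API is called.

```python
def build_search_docs(people, custom_values, field_labels):
    """Left-join people with their custom field values, mapping field ids to labels."""
    by_person = {}
    for row in custom_values:
        # Mapping step: translate an internal field id into a searchable label.
        label = field_labels.get(row["field_id"], row["field_id"])
        by_person.setdefault(row["person_id"], {})[label] = row["value"]

    docs = []
    for person in people:
        # Join step: standard fields first, then any custom fields for this person.
        doc = {"id": person["id"], "name": person["name"]}
        doc.update(by_person.get(person["id"], {}))
        docs.append(doc)
    return docs


# Usage with hypothetical rows.
people = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Raj"}]
custom_values = [{"person_id": 1, "field_id": "cf_7", "value": "West"}]
field_labels = {"cf_7": "region"}
docs = build_search_docs(people, custom_values, field_labels)
print(docs)
```

Each resulting dictionary is a flat document with both standard and custom fields, which is the shape a search index typically expects.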
Lessons Learned
• Design & Build For the Future, Not Only the Present
• Big Data Frameworks: Easy to Use > Extremely Hard to Master
• MapR-DB vs MapR-FS
• Never Settle for Default Configuration; Customization Can Make Life Much Better
• Do Not Use SPARK Without Proper Understanding
  • Simple Debugging Can Consume an Entire Sprint Or More
  • Memory Management In SPARK May Surprise You
  • Many More
• SPARK SQL Is Good, Though Developers Should Prefer the Power Of the SPARK Scala API
• Retire Hive & Adopt Drill
Xactly’s vision is to change the world of incentive compensation. For more information, visit www.xactlycorp.com
• WE ARE HIRING!
• [email protected] or https://www.xactlycorp.com/company/careers/
Q & A
Engage with us!
1. Get Case Studies: Big Data All-Stars
   https://www.mapr.com/when-streaming-becomes-strategic
2. Get Started: MapR Converged Data Platform
   https://www.mapr.com/get-started-with-mapr
3. Get Answers: MapR Converge Community
   https://www.mapr.com/big-data-all-stars