Emerging Data Architectures for Next Generation Workloads
MILIND BHANDARKAR
FOUNDER & CEO, AMPOOL INC.
About Me
• http://www.linkedin.com/in/milindb
• Founding member of Apache Hadoop team at Yahoo [2005-2010]
• HDFS, MapReduce, Pig, YARN, HAWQ, Geode…
• Chief Architect at Greenplum Labs (2011-2013)
• Chief Scientist at Pivotal Software (2013-2015)
• Founder, CEO Ampool (2015-)
Agenda
• History of Data Platforms
• Rise of RDBMS
• Hadoop & NoSQL
• Technology Trends
• Ampool : Active Data Store for Next Generation Workloads
Before 1975: CODASYL
• Conference/Committee on Data Systems Languages
• Development of Data Processing Languages (resulted in COBOL)
• 1967 – Data Base Task Group (DBTG)
• Network Database Model
1975 - Today: System R & RDBMS
• 1970 – Relational Model of Data by E. F. Codd
• Projections, Selections, Joins, Difference, Union
• 1974-75 – IBM System R – An Experimental Database System based on Relational Algebra
• Structured Query Language - SQL
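The relational operators listed above can be sketched in a few lines of Python. This is only an illustration, assuming relations are modeled as lists of dicts; the sample data and function names are made up for the example and do not come from System R or any SQL engine.

```python
# Relations are modeled as lists of dicts (one dict per row); data is illustrative.
employees = [
    {"id": 1, "name": "Ada", "dept": 10},
    {"id": 2, "name": "Grace", "dept": 20},
    {"id": 3, "name": "Edgar", "dept": 10},
]
departments = [
    {"dept": 10, "dept_name": "Research"},
    {"dept": 20, "dept_name": "Systems"},
]

def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """Projection: keep only the named columns, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on every shared column name."""
    result = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[c] == r[c] for c in shared):
                result.append({**l, **r})
    return result

def _as_set(relation):
    # Rows become hashable tuples so that set operations can be applied.
    return {tuple(sorted(row.items())) for row in relation}

def union(a, b):
    """Union: rows that appear in either relation (same schema assumed)."""
    return [dict(t) for t in sorted(_as_set(a) | _as_set(b))]

def difference(a, b):
    """Difference: rows of a that do not appear in b (same schema assumed)."""
    return [dict(t) for t in sorted(_as_set(a) - _as_set(b))]

research = select(employees, lambda row: row["dept"] == 10)
names = project(natural_join(research, departments), ["name", "dept_name"])
print(names)
```

The last three lines compose selection, join, and projection, which is roughly what a SQL query with WHERE, JOIN, and a column list expresses declaratively.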
Early 2000s: OLTP – OLAP Split
• OLTP/ODS
• Transactional, Low Latency, Highly Concurrent, Scale Up
• Highly Structured, Normalized Data Model
• Customer facing, ~50-50 Reads/Updates
• Oracle, MS SQL Server, IBM DB2, MySQL, Postgres
• OLAP/BI
• Not transactional/long-running transactions, High throughput, Scale Out
• Structured/Semi-structured, Denormalized data model
• Business internal, ~90-10 Reads/Updates
• Teradata, Netezza, Vertica, Greenplum, Aster Data
• Mostly MPP
CAP Theorem
• Brewer’s Conjecture 1999
• Proved 2002 by Gilbert & Lynch
• It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
• Consistency
• Availability
• Partition tolerance
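A toy model can make the trade-off concrete. The sketch below is an assumption-laden illustration, not a real replication protocol: the class names and the "CP"/"AP" mode strings are hypothetical, and the store has just two replicas joined by one link.

```python
class Replica:
    """One copy of a single-value register."""
    def __init__(self):
        self.value = None

class TwoNodeStore:
    """Toy two-replica register; mode is "CP" or "AP" (hypothetical labels)."""
    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when the link between replicas is down

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is lost.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the replicas may now disagree.
            replica.value = value
            return
        # No partition: replicate synchronously so both copies agree.
        replica.value = value
        other.value = value

ap = TwoNodeStore("AP")
ap.write(ap.a, 1)
ap.partitioned = True
ap.write(ap.a, 2)
print(ap.a.value, ap.b.value)
```

During the partition, the "AP" store keeps answering but its replicas diverge, while a "CP" store would raise instead of accepting the second write.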
2005-Today: NoSQL
• Amazon Dynamo, Google BigTable
• Key-Value Stores (e.g. Voldemort, Riak)
• Wide-Column, Column-Family Stores (e.g. HBase, Cassandra)
• Document Stores (e.g. MongoDB, Couchbase)
• Graph Stores (e.g. Neo4j)
• Time-Series Stores (Metric Stores, e.g. InfluxDB, OpenTSDB)
• Non-Relational? Non-Transactional? No SQL or “Not Only SQL” or “Not Yet SQL” ?
• Deconstruction of the Database
Data Lakes
• First Gather All Data
• Transactional Data from Business Applications
• User Behavior Data from Interactive Applications
• Social Media Data, Location Data
• Public Datasets, Third-Party Data
• Then, Magic Happens … ?
73% Planning, Implementing, or Expanding the use of Real-Time Data Platforms
Source: Forrester Global Business Technographics Data And Analytics Online Survey, 2015
Enterprises are hyper-personalizing Apps using advanced predictive Analytics driven by high-velocity Big Data
2/3/17 Ampool® Confidential
[Figure: DATA, USERS, Analytics, Apps, Multi-Device, Testing]
Enterprises Aspire To Build & Continuously Tune Real-time Intelligent, Data-driven Applications
Challenges
• Slow, complex data pipelines
• Real-time App pressure
• Operational complexity
• Data Silos & Disconnected Processing
Problems With Current Lambda Architecture
Multiple Data Stores
• One each for streaming, batch, queries, and applications cache
Need to create multiple copies of data
• Format conversions
• Data Serialization & Deserialization at each stage
• Data governance is distributed & complex
Latency due to data propagation
• Inhibits real-time insights
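A minimal sketch of the pattern behind these problems, assuming a simple per-user event count as the computation. All names (batch_store, speed_store, ingest, run_batch, query) are hypothetical, and in-memory Python structures stand in for what would be separate systems in a real deployment.

```python
from collections import Counter

batch_store = []        # master dataset: every event, recomputed periodically
speed_store = []        # recent events not yet covered by a batch run
batch_view = Counter()  # precomputed by the batch layer
speed_view = Counter()  # updated incrementally by the streaming layer

def ingest(event):
    # The same event is copied into two stores.
    batch_store.append(event)
    speed_store.append(event)
    speed_view[event["user"]] += 1

def run_batch():
    # Recompute the batch view from all data, then drop what it now covers.
    batch_view.clear()
    batch_view.update(e["user"] for e in batch_store)
    speed_store.clear()
    speed_view.clear()

def query(user):
    # The query layer must merge both views to get a complete answer.
    return batch_view[user] + speed_view[user]

ingest({"user": "u1"})
ingest({"user": "u1"})
run_batch()
ingest({"user": "u1"})
print(query("u1"))
```

Even in this toy form, each event exists in two places and every query touches two views; with real stores, each hop also adds format conversion and serialization.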
[Figure: Data Sources; Streaming Layer (Computations); Batch Layer (All Data, Computations); Query Layer (Query Engine); Application(s)]
Problem: Complex, Slow Data Processing
Solution: Memory-centric Active Data Store
Eliminate
• File-based data exchange
• File format Conversions
• Data Copies
• Serialization overheads
• Lack of Multi-tenancy
Ampool: Unified Active Data Store Closes The Loop Driving Value At The Speed Of Business
• Ingest & Store hot data & update in-situ
• Serve data concurrently to multiple stages & tenants
• Automatically tier data to warm & cold (archive) stores, with usage & time
• Link insights back to Applications driving decisions in a closed loop
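Tiering by usage and time could look roughly like the sketch below. This is not Ampool's implementation or API: the TieredStore class, its idle-time thresholds, and the promote-on-read policy are all assumptions made for illustration.

```python
import time

class TieredStore:
    """Toy hot/warm/cold store; thresholds and policy are illustrative assumptions."""
    def __init__(self, warm_after_s, cold_after_s):
        self.warm_after_s = warm_after_s
        self.cold_after_s = cold_after_s
        self.tiers = {"hot": {}, "warm": {}, "cold": {}}
        self.last_access = {}

    def put(self, key, value, now=None):
        now = time.time() if now is None else now
        for tier in self.tiers.values():
            tier.pop(key, None)
        self.tiers["hot"][key] = value  # new and updated data starts hot
        self.last_access[key] = now

    def get(self, key, now=None):
        now = time.time() if now is None else now
        for tier in self.tiers.values():
            if key in tier:
                value = tier.pop(key)
                self.tiers["hot"][key] = value  # a read promotes data back to hot
                self.last_access[key] = now
                return value
        raise KeyError(key)

    def tier_of(self, key):
        return next(name for name, tier in self.tiers.items() if key in tier)

    def rebalance(self, now=None):
        # Move each key to the tier that matches how long it has been idle.
        now = time.time() if now is None else now
        for key, seen in self.last_access.items():
            idle = now - seen
            target = ("cold" if idle >= self.cold_after_s
                      else "warm" if idle >= self.warm_after_s else "hot")
            current = self.tier_of(key)
            if current != target:
                self.tiers[target][key] = self.tiers[current].pop(key)

store = TieredStore(warm_after_s=10, cold_after_s=100)
store.put("order:1", {"total": 42}, now=0)
store.rebalance(now=20)
print(store.tier_of("order:1"))
```

The explicit `now` argument exists only to make the example deterministic; a real store would rely on its own clock and would move data between memory, SSD, and archive storage rather than between dicts.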
Technology Trends…
• Huge (and rapidly growing) gap between memory and I/O bandwidths
• Rapidly increasing Network Bandwidth (~1000x in ~15 years)
• Plummeting costs of Solid State Storage (Comparable to HDD by 2019)
• NVDIMMs supported by major OSs, 10x density, 1/5th $/GB compared to DRAM
• Emergence of Storage Class Memory
• 3D XPoint, PCM, MRAM, Memristor etc.
…leading to Mainstream In-Memory Computing
• Scale-Out On-Demand Compute Infrastructure
• Public & Private Clouds
• Fine-Grained Virtualization & Microservices
• Containerization & Orchestration
Why Now? Storage Technologies Price/Performance
Storage Type | Cost/GiB ($) Min | Cost/GiB ($) Max | Latency (ns) Min | Latency (ns) Max | IOPS (4KiB Random I/O Per Second) | Bandwidth (MB/s) | IOPS Per $/GiB Cost of Storage, Min | IOPS Per $/GiB Cost of Storage, Max | GB/s Bandwidth Per $/GiB Cost of Storage, Min | GB/s Bandwidth Per $/GiB Cost of Storage, Max
DRAM (DDR4) | 6 | 10 | 30 | 50 | 15,000,000 | 60,000 | 1,500,000 | 2,500,000 | 6.0 | 10.0
SCM (3D XPoint, Projected) | 3 | 5 | 100 | 500 | 10,000,000 | 10,000 | 2,000,000 | 3,333,333 | 2.0 | 3.3
NAND PCIe3 SSD (MLC) | 2 | 6 | 50,000 | 1,000,000 | 100,000 | 3,000 | 16,667 | 50,000 | 0.5 | 1.5
HDD (7200 RPM) | 0.03 | 0.2 | 5,000,000 | 10,000,000 | 100 | 100 | 500 | 3,333 | 0.5 | 3.3
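The last four columns of the table follow from the first ones: IOPS and bandwidth divided by the cost per GiB, with the minimum taken at the maximum cost and the maximum at the minimum cost. A short sketch of that arithmetic, using the table's own figures; the function and key names are made up for the example.

```python
# (cost_min $/GiB, cost_max $/GiB, IOPS, bandwidth MB/s), copied from the table.
storage = {
    "DRAM (DDR4)": (6, 10, 15_000_000, 60_000),
    "SCM (3D XPoint, Projected)": (3, 5, 10_000_000, 10_000),
    "NAND PCIe3 SSD (MLC)": (2, 6, 100_000, 3_000),
    "HDD (7200 RPM)": (0.03, 0.2, 100, 100),
}

def per_dollar(cost_min, cost_max, iops, bandwidth_mb_s):
    """Return (min, max) IOPS and GB/s per dollar of cost per GiB."""
    gb_s = bandwidth_mb_s / 1000  # MB/s to GB/s
    return {
        "iops_per_dollar": (iops / cost_max, iops / cost_min),
        "gb_s_per_dollar": (gb_s / cost_max, gb_s / cost_min),
    }

for name, row in storage.items():
    derived = per_dollar(*row)
    iops_lo, iops_hi = derived["iops_per_dollar"]
    bw_lo, bw_hi = derived["gb_s_per_dollar"]
    print(f"{name}: {iops_lo:,.0f} to {iops_hi:,.0f} IOPS/$, "
          f"{bw_lo:.1f} to {bw_hi:.1f} GB/s/$")
```

Read this way, DRAM and storage class memory deliver orders of magnitude more IOPS per dollar than SSD or HDD, which is the slide's "why now" argument.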
Ampool: Distributed Memory-centric Active Data Store Powered By Apache Geode
[Figure: Ampool architecture on Apache Geode. Components: Locator, Servers. Labels: REST, Java API, In-Memory Distributed Sys, Low-latency Comms., Key-Value Store, Function Pushdown, High Throughput, Table Store, Pluggable Store Manager, Smart Data Tiering, Mature Event Model, Tunable Consistency, Metadata/Catalog, Security AuthZ]
~2x Faster than HBase on Inserts
~4x Faster than HBase on Scans
~6x Faster than Alluxio (Tachyon) on Scans
2-3x Faster than HBase on Lookups
Multi-Modal Analytics
• Real-time performance
• Operational Analytics
• Batch/ Machine Learning
• Business Intelligence
• Predictive Maintenance
Target Verticals & Use-Cases
Financial Svcs.
• Fraud Detection
• Credit/ Market risks
• Event-based marketing
Telecom
• Network/ quality opt.
• Mobile user analysis
• Event-based marketing
Retail
• Targeted digital offers
• Markdown optimization
• Event-based targeting
Media
• Content/ ad delivery
• Event/ behavior-based targeting
Anomaly Detection
• Event/ activity monitoring
• Real-time automated decisions
IoT Analytics
• Device management
• Comms. optimization
360 Customer Analytics
• Social media sentiment analysis
• Event-based ad targeting
/company/ampool-inc- | /AmpoolIO | @AmpoolIO | www.ampool.io