DESCRIPTION
Hadoop is an open source framework designed to rapidly ingest, store, and analyze large data sets. It is well suited to batch processing where immediate interactive analytics are not required. Today, however, Hadoop does not support operational and transactional workloads, which consist of a constant flow of transactions requiring low-latency response times for read/write access.
HP TRAFODION
RODRIGO MERINO, SENIOR PRESALES SOLUTION ARCHITECT, HEWLETT-PACKARD
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Trafodion: How to use Hadoop for operational and transactional purposes
Enterprise-Class Operational SQL-on-Hadoop DBMS
Agenda
Current database landscape … and a prediction
How special are transactions
HP Trafodion. Trafodion Innovation
Use cases
Trafodion: an open-source project
Current database landscape
Source: https://451research.com
The current situation
Each database type has its strengths and its perfect fit
… but each also has weaknesses
You can’t use one of them for all types of workloads!
Source: http://www.datasciencecentral.com/profiles/blogs/hadoop-vs-nosql-vs-sql-vs-newsql-by-example
Hadoop workload profiles
Response time axis: Sub-second to Hours
Operational
• Real-time analytics
• Transactional SQL = OLTP + interactions
Non-interactive
• Data preparation
• Incremental batch processing
• Dashboards, scorecards
Interactive
• Parameterized reports
• Drilldown visualization
• Exploration
Batch
• Operational batch processing
• Enterprise reports
• Data mining
Current Market Focus: Data Warehousing and Analytics
Operational Optimizations
Data Integrity
Workload Management
Transaction Support
Real-time Performance
Exposes Hadoop limitations
We could have a situation like this. Sound familiar?
But if we use the right tool for each job…
MapReduce
MPP DBMS
NoSQL
DBMS
In Memory Analytics
Large Data Movement / Replication of Data
Varying Platform Requirements
Departmental segmentation
HDFS Centric
Traditional
Big Data is hard to move… because it’s BIG !!!
Source: www.pinterest.com
And there is a fair chance that something will fail
Source: www.shutterstock.com
… and a prediction
Hadoop: One platform to rule them all
Source: www.wallconvert.com
The Future of Hadoop: What Happened & What's Possible?
Operational SQL-on-Hadoop
“Transactions were something that were long thought to be out of scope for this style of platform. There are a lot of important cases for transactions. You are selling a ticket to something then you need to move money from one place to another. You need to assign a seat to someone. And you need to make sure that the money is in one place or the other. Not in both, not nowhere. And you need to at the same time assign that seat or not assign that seat. This is an important class of workload that is currently well served but not by the Hadoop platform. A year ago Google published a paper describing their internal system they have built on their platform, that is very similar to Hadoop, which does this, demonstrating that it’s possible to bring online transaction processing to this style of platform. And in the past when we have seen it’s possible, within a few years it happens. So I think the prediction we can make here is that it is inevitable that we will see just about every kind of workload be moved to this platform – even Online Transaction Processing.”
– Doug Cutting, Cloudera, October 30 2013
How special are transactions
Characteristics of operational DBMS applications
Generalized characteristics and requirements:
• Low latency response times
• ACID (data consistency guaranteed) transactions
• Large number of users
• High concurrency
• High availability
• Scalable data volumes
• Multi-structured data
• Rapidly evolving data requirements (i.e. flexible schemas)
Expose Hadoop limitations
Operational Query Optimization
Data Integrity
Workload Management
Transaction Support
Real-time Performance
Source: michaeljswart.com
ACID properties for transactions
Atomicity: Either all operations of the transaction are properly reflected in the database or none are.
Consistency: Execution of a transaction in isolation preserves the consistency of the database.
Isolation: Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions.
Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
The typical bank transfer example
Transfer £50 from account A to account B
Transaction:
Read(A)
A = A - 50
Write(A)
Read(B)
B = B + 50
Write(B)
Atomicity: Shouldn’t take money from A without giving it to B
Consistency: Money isn’t lost or gained
Isolation: Other queries shouldn’t see A or B change until completion
Durability: The money does not go back to A
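The transfer steps can be sketched as a single database transaction. The example below is only an illustration using Python's built-in sqlite3 module, not Trafodion; the table name `accounts`, the starting balances, and the helper names `make_bank`, `transfer`, and `balances` are assumptions made for this sketch.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two accounts (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 20)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                # The debit already ran; raising here forces it to be undone.
                raise LookupError(f"no such account: {dst}")
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "A", "B", 50), balances(bank))
    print(transfer(bank, "A", "Z", 10), balances(bank))  # rolled back, A unchanged
```

The second call fails after the debit has been applied, and the rollback restores account A, which is the atomicity property described on the slide.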
And a funnier example
In 2009, Samoa switched from driving on the right side of the road to the left.
“As the 6 a.m. deadline approached, Police Minister Toleafoa Faafisi went on national radio to tell drivers everywhere to stop their vehicles. Minutes later, Prime Minister Tuilaepa Sailele Malielegaoi broadcast the formal instructions for drivers to switch sides.”
Imagine we could do it in a SQL database. If this “transaction” were not atomic, there would be trouble!
Source: michaeljswart.com
Trafodion
Trafodion - Introduction
Open source project to develop transactional SQL-on-HBase
Rides the unstoppable Hadoop wave!
• Transforms how companies store, process, and share big data
• Affordable performance, elastic scalability, availability
Open source project - downloadable for free
• Eliminates vendor lock-in and licensing fees
• Leverages community development resources and speed
Schema flexibility and multi-structured data
• Capturing and storing all data for all business functions
Full-function ANSI SQL
• Reuses existing SQL skills and improves developer productivity
Distributed ACID transaction protection
• Guarantees data consistency across multiple rows, tables, SQL statements
Targeted for operational workloads!
• Optimized for real-time transaction processing applications, i.e. OLTP + New Style Transactions (Interactions + Observations)
Leverages 20+ years of HP investments
Transactional SQL + HBase
Trafodion - Features
Complete: Full-function SQL
• Reuse existing SQL skills and improve developer productivity
Protected: Distributed ACID transactions
• Data consistency across multiple rows, tables, SQL statements
Efficient: Low-latency R/W transactions
• Optimized for real-time transaction processing applications
Interoperable: Standard ODBC/JDBC access
• Works with existing tools and applications
Data federation: Trafodion/HBase/Hive tables
• Enables multiple data model deployment
Scalable: Elastic scale for high concurrency
• Provides elastic scalability as number of users / data grows
Highly Available: For enterprise applications
• Leverages HBase / Hadoop replication
Open: Hadoop and Linux distribution neutral
• Easy to add to existing infrastructure with no vendor lock-in
Eco-system: Leverages large Hadoop eco-system
• Can use any tool or database accessing Hadoop
Joint HP Labs & HP-IT project for transactional SQL database capabilities on Hadoop
Transactional SQL + Hadoop
HBase vs. Trafodion comparison
Feature | HBase | Trafodion + HBase
Data abstraction | Key and value pair | Relational schema
Physical layout | Column family store where row data is stored together by cells | Same, except there is a single column family with space-saving column encoding
Column values | Uninterpreted array of bytes | Explicitly defined and enforced data types
ACID guarantee | Single row atomicity | Multiple SQL statements, tables, and rows defined as part of a transaction
Language API | Get/put/delete | SQL (Trafodion invokes native HBase API)
Row key index | Single (string) row key | Composite (multi-column) row key
Secondary indexes | Not supported | Arbitrary secondary key columns
Trafodion and Hadoop – Benefits!
Leverages and extends Hadoop for transactional SQL workloads
Complete: Full-function ANSI SQL
• Reuse existing SQL skills and improve developer productivity
Protected: Distributed ACID transactions
• Guarantees data consistency across multiple rows, tables, SQL statements
Efficient: Optimized for low-latency read and write transactions
• Supports real-time transaction processing applications
Flexible: Schema flexibility and multi-structured data
• Seamlessly integrates structured, unstructured, and semi-structured data
Interoperable: Standard ODBC/JDBC access
• Works with existing tools and applications
Open: Hadoop and Linux distribution neutral
• Easy to add to your existing infrastructure and no vendor lock-in
Open source project sponsorship and investment from HP
Scale without complexity
Reuse SQL skills
Complements Hadoop
Reduce Costs
Real-time Performance
Innovations in Trafodion
Trafodion innovation built upon Hadoop stack
Leverages Hadoop and HBase for core modules
• Maintains API compatibility
• Inherited scalability and availability
Differentiation
• ANSI SQL via ODBC/JDBC
• Relational schema abstraction
• Distributed transaction protection
• Mature SQL technology
• Automatic parallelism
Stack diagram components:
• Client Application using ODBC/JDBC on Windows/Linux
• Client Services for ODBC and JDBC
• SQL Compiler / Optimizer / Executor
• Distributed Transaction Manager
• Hive
• HBase
• HDFS
• ZooKeeper
Legend: Standard Hadoop + Trafodion
Trafodion – Software architecture (3 layers)
Client layer
• User and ISV Operational Applications
• JDBC and ODBC drivers
SQL layer
• Database Connectivity
• Master
• CMP: Compiler and Optimizer
• ESP*: SQL Parallelism
• DTM: Distributed Transaction Management
• WMS: Workload Management
• Future
Storage Engine layer
• HBase: Relational Schema, Trafodion Tables
• HBase (Data Store Integration): Native HBase Tables, KVS, Columnar via HBase API + coprocessors
• Hive (Data Store Integration): Direct HDFS access to Hive tables using HCatalog
• HDFS
*ESP = Executor Server Process
Process overview and SQL execution flow
• Operational Application Clients.
• Client connections via ODBC, JDBC, .Net (future).
• DCS: Database connection service; lightweight coordination service & process control using Apache ZooKeeper.
• Master: SQL execution service with an instance of the executor serving as the master for parallel SQL execution plans.
• CMP (Compiler/Optimizer): component to generate the optimal execution plan.
• ESP: Executor server processes used for parallel execution based on plan (optional). Multiple layers of ESP may be used.
• DTM: provides distributed transaction management across the cluster.
• TRX: HBase-Trx provides transactional resource management for HBase.
• HBase: data services responsible for accessing and maintaining database objects.
• HDFS
Optimized execution plans based on statistics
Optimizer features
• Top-down, multi-pass optimizations; branch and bound plan pruning considers more potential plans
• Utilizes “equal-height” histogram statistics
• SQL pushdown considerations, e.g. predicate evaluation
• Eliminates sorts when feasible, syntactically and semantically
• In-memory vs. overflow considerations
• Optimal degree of parallelism (DOP) considerations, including non-parallel plans
Benefits
• Facilitates enhanced parallelism and SQL object handling efficiencies
• Optimizations for operational transactions and reporting workloads
Diagram components: SQL Statement (input), SQL Normalizer, SQL Analyzer, Table Statistics, Cardinality Estimator, Cost Estimator, Plan Generator, Optimized Plan (output)
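To make the “equal-height” histogram idea concrete, here is a minimal textbook-style sketch of how such statistics can feed a cardinality estimate. It is not Trafodion's optimizer code; the function names `equal_height_histogram` and `estimate_rows_le`, and the linear interpolation inside a bucket, are assumptions chosen for illustration.

```python
from bisect import bisect_right

def equal_height_histogram(values, buckets):
    """Return (minimum, upper_bounds) where each bucket holds about the same number of rows."""
    data = sorted(values)
    n = len(data)
    if not 1 <= buckets <= n:
        raise ValueError("buckets must be between 1 and the number of values")
    # Upper bound of bucket b is the value of the last row that falls into it.
    bounds = [data[(b * n) // buckets - 1] for b in range(1, buckets + 1)]
    return data[0], bounds

def estimate_rows_le(hist, n_rows, x):
    """Estimate how many rows satisfy `column <= x` from the histogram alone."""
    low, bounds = hist
    if x < low:
        return 0.0
    if x >= bounds[-1]:
        return float(n_rows)
    per_bucket = n_rows / len(bounds)
    full = bisect_right(bounds, x)  # buckets whose upper bound is <= x
    lo = low if full == 0 else bounds[full - 1]
    hi = bounds[full]               # hi > lo is guaranteed by the checks above
    fraction = (x - lo) / (hi - lo) # assume values are spread evenly in the bucket
    return per_bucket * (full + fraction)

if __name__ == "__main__":
    column = list(range(1, 101))
    hist = equal_height_histogram(column, 4)
    print(hist, estimate_rows_le(hist, len(column), 50))
```

Because every bucket holds roughly the same number of rows, skewed columns get narrow buckets where values are dense, which tends to keep estimates for range predicates reasonable without scanning the table.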
Data flow SQL execution with optimized DOP
Data-flow, scheduler-driven parallelism throughout
– SQL divided into operators: nested, merge, hash joins; unions; partial & full aggregations; sorts; input/output operations (scan, update, delete, insert)
– Operators executed by Master or ESP
– Varying degrees of parallelism
Types of parallelism: operator parallelism, partitioned parallelism, pipeline parallelism
Diagram labels: Scan, Scan, Join, Group By, Master; values 40, 30, 20
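Partitioned parallelism with partial and full aggregation can be illustrated with a small sketch. The code below is an assumption-laden stand-in, not Trafodion's executor: Python threads play the role of ESPs, the final merge plays the role of the Master, and the names `partial_aggregate`, `full_aggregate`, and `parallel_group_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_aggregate(partition):
    """Partial aggregation: SUM(amount) GROUP BY key within one partition."""
    sums = {}
    for key, amount in partition:
        sums[key] = sums.get(key, 0) + amount
    return sums

def full_aggregate(partials):
    """Full aggregation: merge the per-partition partial sums into final groups."""
    total = {}
    for partial in partials:
        for key, amount in partial.items():
            total[key] = total.get(key, 0) + amount
    return total

def parallel_group_sum(rows, dop):
    """Split rows into `dop` partitions, aggregate each in parallel, then merge."""
    partitions = [rows[i::dop] for i in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    return full_aggregate(partials)

if __name__ == "__main__":
    sales = [("tv", 10), ("book", 5), ("tv", 7), ("book", 1), ("tv", 3)]
    print(parallel_group_sum(sales, dop=2))
```

The result does not depend on the degree of parallelism, which is what lets an optimizer pick the DOP purely on cost grounds, including a non-parallel plan (`dop=1`).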
Trafodion Distributed transaction protection
Multiple row inserts, updates, and deletes to a table
Multiple table and SQL insert, update, and delete statements
Distributed multiple HBase region insert, update, and delete transaction (2-phase commit)
Read-only transaction (eliminates commit overhead)
Diagram labels: Trafodion; cases 1 to 4; Table A, Table B, Table C; Region A, Region B, Region C, Region D
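The 2-phase commit mentioned for transactions that span multiple HBase regions can be sketched in a few lines. This is a simplified illustration of the general protocol only, not Trafodion's DTM or HBase-Trx: the `Region` class, its `fail_on_prepare` flag, and `two_phase_commit` are hypothetical, and logging, recovery, and timeouts are left out.

```python
class Region:
    """Hypothetical participant holding part of a table (stand-in for an HBase region)."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.data = {}        # committed, visible key/value pairs
        self.pending = None   # writes staged by an in-flight transaction
        self.fail_on_prepare = fail_on_prepare

    def prepare(self, writes):
        """Phase 1: stage the writes and vote; nothing becomes visible yet."""
        if self.fail_on_prepare:
            return False
        self.pending = dict(writes)
        return True

    def commit(self):
        """Phase 2, commit path: make the staged writes visible."""
        if self.pending is not None:
            self.data.update(self.pending)
        self.pending = None

    def abort(self):
        """Phase 2, abort path: discard the staged writes."""
        self.pending = None

def two_phase_commit(writes_by_region):
    """Apply all writes or none. `writes_by_region` is a list of (Region, dict) pairs."""
    participants = [region for region, _ in writes_by_region]
    votes = [region.prepare(writes) for region, writes in writes_by_region]
    if all(votes):
        for region in participants:
            region.commit()
        return True
    for region in participants:
        region.abort()
    return False

if __name__ == "__main__":
    a, b = Region("A"), Region("B")
    print(two_phase_commit([(a, {"k1": 1}), (b, {"k2": 2})]), a.data, b.data)
```

A single "no" vote in the prepare phase aborts every participant, so no region ends up holding part of a transaction.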
Integrating external (non-Trafodion) Hadoop tables
Benefits
• Able to run queries against external tables without needing to copy them into a Trafodion table structure
• Optimized access to external HBase and Hive tables without complex MapReduce programming
• Data can be joined across disparate data sources (e.g. Trafodion, Hive, HBase)
• Able to leverage HBase’s inherent schema flexibility capabilities
HBase tables (created outside of Trafodion by HBase)
• Schema-less format, i.e. no information in Trafodion metadata
• Accessible through Trafodion SQL in two modes
– Cell-per-row access, i.e. each row returned represents a single HBase cell
– Row-wise access, i.e. all column values of the row will be returned as a single, big varchar
Hive tables (created outside of Trafodion by Hive)
• Hive metadata, HDFS file storage, delimited data, read/append only
• Support for both SELECT and INSERT statements
• Automatic data type mapping
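The difference between the two HBase access modes can be pictured with a toy model. The sketch below treats an HBase table as a Python dict of rows, each mapping `family:qualifier` to a value; the function names and the `col=value|col=value` packing are assumptions for illustration, since the slides say only that row-wise access returns "a single, big varchar".

```python
def cell_per_row(table):
    """Cell-per-row mode: yield one result row per HBase cell."""
    for row_key in sorted(table):
        for column in sorted(table[row_key]):
            yield (row_key, column, table[row_key][column])

def row_wise(table, sep="|"):
    """Row-wise mode: yield one result row per HBase row, values packed into one string."""
    for row_key in sorted(table):
        cells = table[row_key]
        # The packing format is hypothetical; only the "one string per row" idea matters.
        packed = sep.join(f"{column}={cells[column]}" for column in sorted(cells))
        yield (row_key, packed)

if __name__ == "__main__":
    # Rows may carry different columns, since HBase tables are schema-less.
    hbase_table = {
        "r1": {"cf:a": "x", "cf:b": "y"},
        "r2": {"cf:b": "2"},
    }
    print(list(cell_per_row(hbase_table)))
    print(list(row_wise(hbase_table)))
```

Cell-per-row suits filtering on individual columns, while row-wise keeps each HBase row together at the cost of having to unpack the string.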
Trafodion use cases
Good fit for Trafodion
• Finance: Online financial management
• Telecom: Billing systems, Provisioning systems
• Manufacturing: RFID tracking
• Energy: Smart Metering
• Healthcare: Authorization and claims processing
• Government: 911 Emergency System
• Transportation: Reservation systems
• Consumer & Retail: Online shopping
Multi-Structured Data
ACID Protection, Data Integrity
Low Latency, High Concurrency
Generates Revenue
Touches the Customer
Helps Run the Business
Operational SQL on Hadoop – Use cases
• Integration of structured, semi-structured, and unstructured support
• Integration of operational, historical, & external (Big) data along common master data for better insights
Structured: Item id, Description, Cost, Price, …
Semi-structured:
• TV: Type, Display Size, Resolution, Brand, Model, 3D, …
• Book: ISBN, Author, Publish Date, Format, Dept, …
• …
Unstructured: Image …, Review …
SELECT all TVs WHERE Price > 2000 and Type = ‘Plasma’ and Display Size > ‘50’ and customer sentiment is very positive
Open distributed HDFS structures
HBase & Hive
Free at last!
Capture data directly into open file structures
Accessible for reporting & analytics with no latency
Trafodion: An open-source project
Modern open source environment
Following best practices of OpenStack project
Source code in GitHub
Build/test in OpenStack Gerrit, Zuul, Jenkins
Defect tracking in Launchpad
Documentation in MediaWiki
Building an Open Source Community
Simple installation
Meritocracy
Recruiting project contributors
Share your expertise: Developing, fixing defects, testing, writing, translating and more
Want to try?
Discover our capabilities: Download and install in your Hadoop environment and take a test-drive
www.trafodion.org
See for yourself… Come discover and develop on Trafodion
www.trafodion.org
Thank You
17th ~ 18th NOV 2014, MADRID (SPAIN)