NAVIGATING THE DATABASE UNIVERSE
Stay Tuned for Today’s Session
Navigating the Database Universe
Dr. Michael Stonebraker and Scott Jarr
A Few Housekeeping Items
• Remember to mute your line
• Type your questions for the presenters in the chat box in the lower right side
• We will answer as many questions as we have time for at the end of the presentation
• If you experience audio difficulties, you can dial in using the following:
– Telephone: +1 (626) 544-0059
– Access Code: 228-638-049
– Webinar ID: 912-533-214
About Our Presenters
Mike Stonebraker
Co-founder & CTO, VoltDB
A pioneer of database research and technology for more than a quarter of a century, and the main architect of the Ingres relational DBMS and the object-relational DBMS PostgreSQL
Scott Jarr
Co-founder & Chief Strategy Officer, VoltDB
More than 20 years of experience building, launching and growing technology companies from inception to market leadership in the search, mobile, security, storage and virtualization markets
Agenda
• The (proper) design of DBMSs
– Presented by Dr. Michael Stonebraker
• The database universe
• Where the future value comes from
We Believe…
• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are the answer
• Products must provide tangible customer value... Fast
THE (PROPER) DESIGN OF THE DBMS
Dr. Michael Stonebraker
Lessons from 40 Years of Database Design
1. Get the user interaction right
– Bet on a small number of easy-to-understand constructs
– Plus standards
2. Get the implementation right
– Bet on a small number of easy-to-understand constructs
3. One size does not fit all
– At least not if you want fast, big or complex
“Those who don’t learn from history are destined to repeat it.” - Winston Churchill
#1: Get the User Interaction Right
Historical Lesson: RDBMS vs. CODASYL vs. OODB
Winner: RDBMS
• Simple data model (tables)
• Simple access language (SQL)
• ACID (transactions)
• Standards (SQL)
Loser: CODASYL
• Complicated data model (records; participate in “sets”; set has one owner and, perhaps, many members, etc.)
• Messy access language (sea of “cursors”; some -- but not all -- move on every command, navigation programming)
Loser: OODBs
• Complex data model (hierarchical records, pointers, sets, arrays, etc.)
• Complex access language (navigation, through this sea)
• No standards
Interaction Take Away: Simple is Good
• ACID was easy for people to understand
• SQL provided a standard, high-level language and made people productive (transportable skills)
#2: Get the Implementation Right
Historical Winners
• Leverage a few simple ideas: Early relational implementations
– System R storage system dropped links
– Views (protection, schema modification, performance)
– Cost-based optimizer
• Leverage a few simple ideas: Postgres
– User-defined data types and functions (adopted by most everybody)
– Rules/triggers
– No-overwrite storage
• Leverage a few simple ideas: Vertica
– Store data by column
– Compressed up the ging gong
– Parallel load without compromising ACID
#3: One Size Does NOT Fit All
• OSFA is an old technology with hundreds of bags hanging off it
• It breaks 100% of the time when under load
• Load = size or speed or complexity
• Load is increasing at a startling rate
• Purpose-built will exceed by 10x to 100x
• History has not been completely written yet…but let’s look at VoltDB as an example
“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system…A factor of 50 is nothing to sneeze at.” - My Top 10 Assertions About Data Warehouses, 2010
Example: VoltDB
• Get the interface right
– SQL
– ACID
• Implementation: Leverage a few simple ideas
– Main memory
– Stored procedures
– Deterministic scheduling
• Specialization
– OLTP focus allowed for above implementation choices
Proving the Theory
• Challenge: OLTP performance
– TPC-C CPU cycles
– On the Shore DBMS prototype
– Elephants should be similar
[Chart: share of CPU cycles]
Recovery 24%, Latching 24%, Buffer Pool 24%, Locking 24%, Useful Work 4%
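A rough way to read the breakdown above: if the shares are independent and additive (an assumption, not something the slide states), then removing an overhead category entirely bounds the achievable speedup in an Amdahl-style fashion. The sketch below computes that bound; `max_speedup` and the dictionary keys are illustrative names, not part of any benchmark tooling.

```python
# Cycle shares as given on the slide (percent of CPU cycles on TPC-C).
SHARES = {
    "recovery": 24,
    "latching": 24,
    "buffer_pool": 24,
    "locking": 24,
    "useful_work": 4,
}


def max_speedup(shares, removed):
    """Upper bound on speedup if the named components cost zero cycles.

    Assumes the shares are additive and that removing one component
    does not change the cost of the others.
    """
    total = sum(shares.values())
    remaining = total - sum(shares[name] for name in removed)
    return total / remaining


# Removing only the buffer pool leaves 76% of the cycles.
buffer_pool_only = max_speedup(SHARES, ["buffer_pool"])

# Removing all four overhead categories leaves only the useful work.
all_overhead = max_speedup(
    SHARES, ["recovery", "latching", "buffer_pool", "locking"]
)
print(all_overhead)  # 25.0
```

Under these assumptions, no single fix helps much on its own; the large gain only appears when all four overheads are addressed together.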
Implementation Construct #1: Main Memory
• Main memory format for data
– Disk format gets you buffer pool overhead
• What happens if data doesn’t fit?
– Return to disk-buffer pool architecture (slow)
– Anti-caching
Anti-caching
• Main memory format for data
• When memory fills up, bundle together elderly tuples and write them out
• Run a transaction in “sleuth mode”: find the required records and move them to main memory (and pin them)
• Run the transaction normally
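The anti-caching steps above can be sketched as a toy in-memory store. This is a minimal illustration under stated assumptions, not VoltDB's implementation: `AntiCacheStore`, `max_in_memory`, and `block_size` are hypothetical names, the "disk" is a plain dictionary, and the transaction declares its keys up front instead of discovering them through a real sleuth-mode dry run.

```python
from collections import OrderedDict


class AntiCacheStore:
    """Toy main-memory table that evicts cold tuples to a stand-in disk."""

    def __init__(self, max_in_memory, block_size):
        self.max_in_memory = max_in_memory
        self.block_size = block_size
        self.memory = OrderedDict()  # key -> row, coldest first
        self.disk = {}               # key -> row (stand-in for disk blocks)
        self.pinned = set()

    def put(self, key, row):
        self.disk.pop(key, None)
        self.memory[key] = row
        self.memory.move_to_end(key)
        self._evict_if_full()

    def _evict_if_full(self):
        # When memory fills up, bundle the oldest unpinned tuples
        # into a block and write them out together.
        while len(self.memory) > self.max_in_memory:
            victims = [k for k in self.memory if k not in self.pinned]
            victims = victims[: self.block_size]
            if not victims:
                break
            for k in victims:
                self.disk[k] = self.memory.pop(k)

    def run(self, keys, txn):
        try:
            # "Sleuth mode": find the required records, bring evicted
            # ones back into main memory, and pin them.
            for k in keys:
                if k in self.disk:
                    self.memory[k] = self.disk.pop(k)
                elif k not in self.memory:
                    raise KeyError(k)
                self.memory.move_to_end(k)
                self.pinned.add(k)
            # Then run the transaction normally against memory only.
            return txn({k: self.memory[k] for k in keys})
        finally:
            self.pinned.difference_update(keys)
            self._evict_if_full()


store = AntiCacheStore(max_in_memory=3, block_size=2)
for key, value in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    store.put(key, value)

# "a" was evicted when "d" arrived; the run fetches it back first.
total = store.run(["a", "d"], lambda rows: rows["a"] + rows["d"])
```

The point of the design is that the transaction body never touches the disk: all fetching happens before it starts, so the main-memory execution path stays free of buffer pool logic.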
Implementation Construct #2: Stored Procedures
• Round trip to the DBMS is expensive
– Do it once per transaction
– Not once per command
– Or even once per cursor move
• Ad-hoc queries supported
– Turn them into dynamic stored procedures
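To make the round-trip argument concrete, here is a small sketch that counts client/server round trips for the same transfer done command by command and as a single stored procedure call. `ToyServer`, `execute`, `call`, and the `transfer` procedure are hypothetical stand-ins written for this illustration; they are not VoltDB's client API.

```python
class ToyServer:
    """Stand-in DBMS that counts client/server round trips."""

    def __init__(self):
        self.balances = {"alice": 100, "bob": 50}
        self.round_trips = 0
        self.procedures = {"transfer": self._transfer}

    def execute(self, op, *args):
        """One command per round trip."""
        self.round_trips += 1
        if op == "read":
            return self.balances[args[0]]
        if op == "write":
            self.balances[args[0]] = args[1]
            return None
        raise ValueError(f"unknown op: {op}")

    def call(self, name, *args):
        """A whole transaction in one round trip."""
        self.round_trips += 1
        return self.procedures[name](*args)

    def _transfer(self, src, dst, amount):
        # Runs entirely on the server side.
        self.balances[src] -= amount
        self.balances[dst] += amount


def transfer_per_command(server, src, dst, amount):
    """Client-driven transfer: every read and write crosses the wire."""
    src_balance = server.execute("read", src)
    dst_balance = server.execute("read", dst)
    server.execute("write", src, src_balance - amount)
    server.execute("write", dst, dst_balance + amount)


chatty = ToyServer()
transfer_per_command(chatty, "alice", "bob", 10)

batched = ToyServer()
batched.call("transfer", "alice", "bob", 10)

print(chatty.round_trips, batched.round_trips)  # 4 1
```

Both servers end in the same state; only the number of round trips differs, which is the cost the slide argues should be paid once per transaction.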
Implementation Construct #3: Deterministic and Non-deterministic Scheduling
• Non-deterministic (can’t tell order until commit time)
– MVCC
– Dynamic locking
• Deterministic
– Time stamp order
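A minimal sketch of the deterministic option, assuming each transaction carries a unique (timestamp, id) pair and is run serially in that order. The function and transaction names are illustrative only. Because the execution order is fixed before anything runs, two replicas that receive the same transactions in different arrival orders end in the same state.

```python
def run_in_timestamp_order(initial_state, transactions):
    """Run (timestamp, txn_id, fn) triples serially in timestamp order.

    Ties on timestamp are broken by txn_id; the pair is assumed unique.
    """
    state = dict(initial_state)
    ordered = sorted(transactions, key=lambda t: (t[0], t[1]))
    for _timestamp, _txn_id, fn in ordered:
        fn(state)
    return state


def deposit(amount):
    def fn(state):
        state["balance"] += amount
    return fn


def double(state):
    state["balance"] *= 2


txns = [(2, "t2", double), (1, "t1", deposit(10)), (3, "t3", deposit(5))]

# Two replicas see the same transactions in different arrival orders.
replica_a = run_in_timestamp_order({"balance": 0}, txns)
replica_b = run_in_timestamp_order({"balance": 0}, list(reversed(txns)))

print(replica_a == replica_b)  # True
```

With non-deterministic schemes such as MVCC or dynamic locking, the effective order depends on which transaction commits first, so this property does not come for free.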
Result of Design Principles: VoltDB Example
• Good interface decisions – made developers more productive
– SQL & ACID
• Leveraging a few simple implementation ideas – made VoltDB wicked fast
– Main memory
– Stored procedures
– Deterministic scheduling
Proving the Theory
• Answer: OLTP performance
– 3 million transactions per second
– 7x Cassandra
– 15 million SQL statements per second
– 100,000+ transactions per commodity server
“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.” - The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007
THE DATABASE UNIVERSE
Scott Jarr
Technology Meets the Market
Believe
– “Big Data” is a rare, transformative market
– Velocity is becoming the cornerstone
– Specialized databases (working together) are the answer
– Products must provide tangible customer value… Fast
Observations
– Noisy, crowded and new – kinda like Christmas shopping at the mall
– Everyone wants to understand where the pieces fit
– Analysts build maps on technology NOT use cases
"What we need is…"
Data Value Chain
(stages ordered along the Age of Data axis, youngest data first)
• Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve trans.
• Real-time Analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count
• Record Lookup (second(s)): retrieve click stream, show orders
• Historical Analytics (minutes): backtest algo, BI, daily reports
• Exploratory Analytics (hours): algo discovery, log analysis, fraud pattern match
Data Value Chain
[Figure: the same five stages and examples along the Age of Data axis, with curves labeled Value of Individual Data Item and Aggregate Data Value plotted against Data Value]
[Figure: Traditional RDBMS region; Application Complexity axis running from Simple / Slow / Small to Fast / Complex / Large; curves labeled Value of Individual Data Item and Aggregate Data Value; Data Value axis]
The Database Universe
[Figure: the five stages (Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics) along an axis running from Transactional to Analytic. Regions shown: Traditional RDBMS, NewSQL, NoSQL, Data Warehouse, Hadoop, etc. Other labels: Velocity; Application Complexity (Simple / Slow / Small to Fast / Complex / Large); Value of Individual Data Item; Aggregate Data Value; Data Value]
Closed-loop Big Data
[Diagram: stages: Interactive & Real-time Analytics, Historical Reports & Analytics, Exploratory Analytics; inputs: logins, sensors, impressions, orders, authorizations, clicks, trades]
Closed-loop Big Data
• Make the most informed decision every time there is an interaction
• Real-time decisions are informed by operational analytics and past knowledge
[Diagram: the same stages and inputs, plus a Knowledge label]
The Velocity Use Case
What’s it look like?
– High throughput, relentless data feeds
– Fast decisions on high-value data
– Real-time, operational analytics present immediate visibility
What’s the big deal?
– Batch converts to real time = efficiency
– Decisions made at time of event = better decisions
– Ability to micro segment/target/personalize/etc. = conversion, satisfaction
– More data is coming at you; use it to improve your business
QUESTIONS AND ANSWERS
Next Up
THANK YOU
www.voltdb.com