VoltDB - Stonebraker Live! - Santa Clara, CA - 1.29.13


Stonebraker Live!

Navigating the Database Universe

VoltDB presents

SCOTT JARR

Co-founder and Chief Strategy Officer

• The (proper) design of DBMSs

– Presented by Dr. Michael Stonebraker, Co-founder

• The database universe

– Presented by Scott Jarr, Co-founder and Chief Strategy Officer

• Introducing VoltDB 3.0

– Presented by Mark Hydar, VP of Market Technology and Strategy

Agenda

• “Big Data” is a rare, transformative market

• Velocity is becoming the cornerstone

• Specialized databases (working together) are the answer

• Products must provide tangible customer value... Fast

We Believe…

THE (PROPER) DESIGN OF THE DBMS

Dr. Michael Stonebraker

Lessons from 40 Years of Database Design

1. Get the user interaction right

– Bet on a small number of easy-to-understand constructs

– Plus standards

2. Get the implementation right

– Bet on a small number of easy-to-understand constructs

3. One size does not fit all

– At least not if you want fast, big or complex

“Those who don’t learn from history are destined to repeat it.”

– Winston Churchill

#1: Get the User Interaction Right

Winner: RDBMS

• Simple data model (tables)

• Simple access language (SQL)

• ACID (transactions)

• Standards (SQL)

Loser: CODASYL

• Complicated data model

(records participate in “sets”; a set has one owner and, perhaps, many members, etc.)

• Messy access language (a sea of “cursors”, some but not all of which move on every command; navigation programming)

Loser: OODBs

• Complex data model (hierarchical records, pointers, sets, arrays, etc.)

• Complex access language (navigation through this sea)

• No standards

Historical Lesson: RDBMS vs. CODASYL vs. OODB

Interaction Take Away – Simple is Good

• ACID was easy for people to understand

• SQL provided a standard, high-level language and made people productive (transportable skills)

#2: Get the Implementation Right

• Leverage a few simple ideas: Early relational implementations

– System R storage system dropped links

– Views (protection, schema modification, performance)

– Cost-based optimizer

• Leverage a few simple ideas: Postgres

– User-defined data types and functions (adopted by most everybody)

– Rules/triggers

– No-overwrite storage

• Leverage a few simple ideas: Vertica

– Store data by column

– Compressed up the ging gong

– Parallel load without compromising ACID

Historical Winners
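The Vertica ideas (store data by column, compress heavily) can be illustrated with a toy example. This is a minimal sketch, not Vertica's storage format: it pivots rows into per-column lists and run-length encodes each column, which is one common column-store compression technique. The table, its values and the helper names are hypothetical.

```python
from itertools import groupby


def to_columns(rows, names):
    """Pivot a list of row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical sales table, sorted by region so equal values sit next to each other.
names = ("region", "product", "amount")
rows = [
    ("east", "widget", 10),
    ("east", "widget", 12),
    ("east", "gadget", 7),
    ("west", "gadget", 7),
    ("west", "gadget", 9),
]

columns = to_columns(rows, names)
encoded = {name: rle_encode(values) for name, values in columns.items()}
print(encoded["region"])  # [('east', 3), ('west', 2)]

# A query that touches one column reads only that column's data.
total = sum(columns["amount"])
```

Sorted, low-cardinality columns compress into a handful of runs, and a scan over one attribute never has to read the others; both effects grow with table size.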

#3: One Size Does NOT Fit All

• OSFA is an old technology with hundreds of bags hanging off it

• It breaks 100% of the time when under load

• Load = size or speed or complexity

• Load is increasing at a startling rate

• Purpose-built systems will outperform it by 10x to 100x

• History has not been completely written yet… but let’s look at VoltDB as an example

“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system… A factor of 50 is nothing to sneeze at.”

– My Top 10 Assertions About Data Warehouses, 2010

Example: VoltDB

• Get the interface right

– SQL

– ACID

• Implementation: Leverage a few simple ideas

– Main memory

– Stored procedures

– Deterministic scheduling

• Specialization

– OLTP focus allowed for the implementation choices above

Proving the Theory

• Challenge: OLTP performance

– TPC-C CPU cycles

– On the Shore DBMS prototype

– Elephants should be similar

(Chart: Recovery 24%, Latching 24%, Buffer Pool 24%, Locking 24%, Useful Work 4%)
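Taking the breakdown above at face value, a back-of-the-envelope bound (an Amdahl's-law style calculation, not a figure from the talk) shows why the design attacks all four overheads together rather than one at a time:

```latex
f_{\text{useful}} = 1 - 4 \times 0.24 = 0.04,
\qquad
S_{\max} = \frac{1}{f_{\text{useful}}} = \frac{1}{0.04} = 25,
\qquad
S_{\text{latching only}} = \frac{1}{1 - 0.24} \approx 1.32
```

Removing only latching (for example by going single-threaded) barely moves the needle; the up-to-25x headroom appears only when recovery, latching, buffer pool and locking costs are all eliminated, and even that is an idealized upper bound.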

Single Threaded

• Gets rid of the latching problem

• What about Multicore?

– Divide the memory on an N-core node so it looks like N single-core nodes

– Which are single threaded…
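A minimal sketch of the "N single-core nodes" idea, assuming hash partitioning on a key: each partition owns a disjoint slice of the data and one worker thread that drains its own queue, so the partition's data needs no latches. The class and function names are hypothetical, and Python threads stand in for cores (the GIL means this shows the structure, not a real parallel speedup).

```python
import queue
import threading
import zlib


class Partition:
    """One single-threaded site: private data plus a private work queue."""

    def __init__(self, pid):
        self.pid = pid
        self.data = {}                   # touched only by this partition's thread
        self.inbox = queue.Queue()       # transactions routed to this partition
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            txn = self.inbox.get()
            if txn is None:              # shutdown signal
                break
            txn(self.data)               # runs to completion; no latch on self.data


def partition_for(key, n_partitions):
    """Deterministic key -> partition mapping (crc32, since str hash() is salted)."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def make_increment(key, amount):
    """Hypothetical single-partition transaction: add amount to a counter."""
    def txn(data):
        data[key] = data.get(key, 0) + amount
    return txn


N_CORES = 4
partitions = [Partition(i) for i in range(N_CORES)]

for user in ["alice", "bob", "carol", "alice", "bob", "alice"]:
    partitions[partition_for(user, N_CORES)].inbox.put(make_increment(user, 1))

for p in partitions:
    p.inbox.put(None)                    # let the queue drain, then stop
    p.thread.join()

merged = {}
for p in partitions:
    merged.update(p.data)
print(merged)
```

Each key always lands on the same partition, so work on that key stays serial without locks; transactions that span several partitions need extra coordination that this sketch leaves out.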

Implementation Construct #1: Main Memory

• Main memory format for data

– Disk format gets you buffer pool overhead

• What happens if data doesn’t fit?

– Return to disk-buffer pool architecture (slow)

– Anti-caching

• Main memory format for data

• When memory fills up, bundle together elderly tuples and write them out

• Run a transaction in “sleuth mode”; find the required records, move them to main memory (and pin them)

• Run the transaction normally
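The anti-caching steps can be sketched as a toy model under simplifying assumptions: memory capacity is counted in tuples, least-recently-used order stands in for "elderly", a dict stands in for the on-disk block store, and the keys a transaction needs are passed in up front instead of being discovered by a sleuth-mode pre-run. The class and method names are hypothetical; this is not VoltDB's implementation.

```python
from collections import OrderedDict


class AntiCache:
    """Toy anti-cache: memory is the primary store; cold tuples are evicted in blocks."""

    def __init__(self, capacity, block_size):
        self.capacity = capacity        # max tuples kept in memory between transactions
        self.block_size = block_size    # tuples bundled into one evicted block
        self.memory = OrderedDict()     # key -> value, least recently used first
        self.disk = {}                  # block_id -> {key: value}; stands in for disk
        self.evicted = {}               # key -> block_id of the block holding it
        self.pinned = set()
        self.next_block = 0

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._shrink()

    def _shrink(self):
        while len(self.memory) > self.capacity and self._evict_block():
            pass

    def _evict_block(self):
        """Bundle the oldest unpinned tuples into one block; False if nothing to evict."""
        victims = [k for k in self.memory if k not in self.pinned][: self.block_size]
        if not victims:
            return False
        block = {k: self.memory.pop(k) for k in victims}
        block_id, self.next_block = self.next_block, self.next_block + 1
        self.disk[block_id] = block
        for k in block:
            self.evicted[k] = block_id
        return True

    def _fetch(self, key):
        """Read back the whole block that holds an evicted key."""
        block = self.disk.pop(self.evicted[key])
        for k, v in block.items():
            del self.evicted[k]
            self.memory[k] = v

    def run_transaction(self, keys, body):
        # Sleuth-mode stand-in: bring required on-disk records back and pin them.
        for key in keys:
            if key in self.evicted:
                self._fetch(key)
            self.pinned.add(key)
        try:
            # Run the transaction normally; everything it touches is now in memory.
            return body({k: self.memory[k] for k in keys})
        finally:
            self.pinned.difference_update(keys)
            for key in keys:
                self.memory.move_to_end(key)   # recently used, so no longer elderly
            self._shrink()                      # memory may have grown during fetches


cache = AntiCache(capacity=4, block_size=2)
for i in range(6):
    cache.put(f"k{i}", i)                       # k0 and k1 are evicted as one block

result = cache.run_transaction(["k0", "k5"], lambda rows: rows["k0"] + rows["k5"])
print(result)  # 5
```

Because tuples leave and return in whole blocks, one disk read can satisfy several cold records at once, and the transaction itself never waits on I/O once it starts.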

Implementation Construct #2: Stored Procedures

• Round trip to the DBMS is expensive

– Do it once per transaction

– Not once per command

– Or even once per cursor move

• Ad-hoc queries supported

– Turn them into dynamic stored procedures
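To make the round-trip argument concrete, the sketch below counts client-server round trips for the same transfer issued command by command and as a single stored-procedure call. The server class, the "Transfer" procedure and the 0.5 ms round-trip time are hypothetical; only the counting is the point.

```python
class CountingServer:
    """Toy server that counts how many network round trips it receives."""

    def __init__(self):
        self.round_trips = 0
        self.balances = {"A": 100, "B": 50}
        self.procedures = {}

    def execute(self, op, *args):
        """One round trip per individual command."""
        self.round_trips += 1
        return op(self.balances, *args)

    def register(self, name, proc):
        self.procedures[name] = proc

    def call_procedure(self, name, *args):
        """One round trip per stored-procedure invocation."""
        self.round_trips += 1
        return self.procedures[name](self.balances, *args)


def read(balances, acct):
    return balances[acct]


def add(balances, acct, delta):
    balances[acct] += delta


def transfer(balances, src, dst, amount):
    """Every statement of the transaction runs inside the server."""
    if balances[src] < amount:
        return False
    balances[src] -= amount
    balances[dst] += amount
    return True


# Command at a time: every statement crosses the network.
chatty = CountingServer()
if chatty.execute(read, "A") >= 30:
    chatty.execute(add, "A", -30)
    chatty.execute(add, "B", 30)

# Stored procedure: the whole transaction is one round trip.
proc = CountingServer()
proc.register("Transfer", transfer)
proc.call_procedure("Transfer", "A", "B", 30)

RTT_MS = 0.5  # hypothetical network round-trip time
print(chatty.round_trips, chatty.round_trips * RTT_MS)  # 3 1.5
print(proc.round_trips, proc.round_trips * RTT_MS)      # 1 0.5
```

The gap grows with every statement or cursor move a transaction adds, and keeping the whole transaction server-side also means it can run to completion without waiting on the client.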

Implementation Construct #3: Deterministic Scheduling

• Transactions are ordered and run to completion

– No locking

• Active-active replication (HA)

– Run transaction at all replicas – in the same pre-determined order

• What about a cluster-wide power failure?

– Asynchronous checkpointing

– With a command log

– Wildly faster than data logging
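The three ideas above fit together in a small sketch under simplifying assumptions: each replica is a single-threaded state machine that applies transactions one at a time in a global order fixed before execution; the command log records only the procedure name and arguments, not the data they change; and recovery loads the last checkpoint and replays the logged commands after it. All names are hypothetical, and the checkpoint here is a synchronous copy rather than an asynchronous one.

```python
import copy


def deposit(state, acct, amount):
    state[acct] = state.get(acct, 0) + amount


def withdraw(state, acct, amount):
    # Deterministic business rule: reject overdrafts.
    if state.get(acct, 0) >= amount:
        state[acct] -= amount


PROCEDURES = {"deposit": deposit, "withdraw": withdraw}


class Replica:
    """Applies transactions serially; each one runs to completion, so no locking."""

    def __init__(self):
        self.state = {}
        self.applied = 0                    # log entries applied so far

    def apply(self, entry):
        name, args = entry
        PROCEDURES[name](self.state, *args)
        self.applied += 1


class Sequencer:
    """Fixes one global order and keeps a command log of (procedure, args)."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.command_log = []
        self.checkpoint = ({}, 0)           # (state snapshot, log position)

    def submit(self, name, *args):
        entry = (name, args)
        self.command_log.append(entry)      # log the command, not the changed data
        for r in self.replicas:             # every replica runs it in the same order
            r.apply(entry)

    def take_checkpoint(self, replica):
        self.checkpoint = (copy.deepcopy(replica.state), replica.applied)

    def recover(self):
        """After a cluster-wide failure: load the checkpoint, replay the log tail."""
        snapshot, position = self.checkpoint
        fresh = Replica()
        fresh.state = copy.deepcopy(snapshot)
        fresh.applied = position
        for entry in self.command_log[position:]:
            fresh.apply(entry)
        return fresh


a, b = Replica(), Replica()
seq = Sequencer([a, b])
seq.submit("deposit", "alice", 100)
seq.submit("withdraw", "alice", 30)
seq.take_checkpoint(a)
seq.submit("deposit", "bob", 20)
seq.submit("withdraw", "alice", 500)        # rejected identically on every replica

assert a.state == b.state                   # active-active replicas agree
restored = seq.recover()                    # simulate losing every replica
print(restored.state)                       # {'alice': 70, 'bob': 20}
```

This only works if procedures are deterministic: reading the wall clock or a random number inside a procedure would let replicas, or a replay, diverge.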

Result of Design Principles: VoltDB Example

• Good interface decisions – made developers more productive

– SQL & ACID

• Leveraging a few simple implementation ideas – made VoltDB wicked fast

– Main memory

– Stored procedures

– Deterministic scheduling

Proving the Theory

• Answer: OLTP performance

– 3 million transactions per second

– 7x Cassandra

– 15 million SQL statements per second

– 100,000+ transactions per commodity server

“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”

– The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007

THE DATABASE UNIVERSE

Scott Jarr

Technology Meets the Market

Believe

– “Big Data” is a rare, transformative market

– Velocity is becoming the cornerstone

– Specialized databases (working together) are the answer

– Products must provide tangible customer value… Fast

Observations

– Noisy, crowded and new – kinda like Christmas shopping at the mall

– Everyone wants to understand where the pieces fit

– Analysts build maps based on technology, NOT use cases

What we need is…

Data Value Chain

Interactive (milliseconds), Real-time Analytics (hundredths of seconds), Record Lookup (second(s)), Historical Analytics (minutes), Exploratory Analytics (hours)

• Place trade

• Serve ad

• Enrich stream

• Examine packet

• Approve trans.

• Calculate risk

• Leaderboard

• Aggregate

• Count

• Retrieve click stream

• Show orders

• Backtest algo

• BI

• Daily reports

• Algo discovery

• Log analysis

• Fraud pattern match

(Figure: Data Value plotted against Age of Data, showing the Value of Individual Data Item and the Aggregate Data Value)

The Database Universe

(Figure: the data value chain (Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics), split into Transactional and Analytic regions, drawn over the Data Value curves for the Value of Individual Data Item and the Aggregate Data Value; Application Complexity axis, with extremes labeled Simple, Slow, Small and Fast, Complex, Large)

Systems placed on the map: Traditional RDBMS, Data Warehouse, Hadoop, etc., NoSQL, NewSQL

Velocity

Closed-loop Big Data

(Diagram: logins, sensors, impressions, orders, authorizations, clicks and trades feed Interactive & Real-time Analytics, Historical Reports & Analytics, and Exploratory Analytics, with Knowledge flowing back into the loop)

• Make the most informed decision every time there is an interaction

• Real-time decisions are informed by operational analytics and past knowledge

The Velocity Use Case

What does it look like?

– High throughput, relentless data feeds

– Fast decisions on high-value data

– Real-time, operational analytics provide immediate visibility

What’s the big deal?

– Batch visibility converts to real time = immediate business impact

– Decisions made at time of event = higher impact decisions with immediate returns

– Ability to ingest and manage massive amounts of data = business differentiation and disruption

HELLO 3.0!

Mark Hydar

Introducing VoltDB 3.0


• Available now!

– Both commercial and open source offerings

– www.voltdb.com/downloads

• Key improvements

– Even faster

– Easier to build high-velocity applications

– Expanded reach across developers and applications

– Extensible to integrate with existing data infrastructure

Latency and Throughput, 50-50 Read/Write Workload

(Figure: VoltDB 3.0 vs. v2.8.4.1, Key/Value 50/50 read/write workload, 3 Node, K=1 Cluster; Latency (ms) plotted against TPS)

Read/Write Workload Latency/Throughput

(Figure: VoltDB 3.0, Key/Value workloads at 10% read/90% write, 50% read/50% write and 90% read/10% write, 3 Node, K=1 Cluster; Avg. Latency (ms) plotted against TPS)

Faster: Ad Hoc SQL Performance

• Conversational SQL

• Thousands to 10,000+ ad hoc SQL transactions/second

• Single or multiple (batch) SQL statement transactions

Easier Development: New SQL Support

• SQL LIKE and NOT LIKE

• UNION

• Column Functions

• Counting function (leaderboard ranking queries)

• Ability to define index using column functions
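Leaderboard ranking reduces to counting: a player's rank is one plus the number of scores strictly greater than theirs, and over an ordered index that count is a binary search rather than a scan. The sketch uses a sorted Python list and the standard bisect module as a stand-in for an ordered index; it illustrates the idea only and is not VoltDB's counting function or its SQL syntax. Player names and scores are made up.

```python
import bisect


class Leaderboard:
    """Keeps scores in ascending order so rank queries are a binary search."""

    def __init__(self):
        self.sorted_scores = []   # stand-in for an ordered (tree) index
        self.scores = {}          # player -> current score

    def set_score(self, player, score):
        if player in self.scores:
            # Remove the player's old score from the ordered structure first.
            i = bisect.bisect_left(self.sorted_scores, self.scores[player])
            self.sorted_scores.pop(i)
        self.scores[player] = score
        bisect.insort(self.sorted_scores, score)

    def rank(self, player):
        """1 + number of scores strictly greater than this player's (ties share a rank)."""
        score = self.scores[player]
        greater = len(self.sorted_scores) - bisect.bisect_right(self.sorted_scores, score)
        return 1 + greater


board = Leaderboard()
for player, score in [("ann", 300), ("ben", 450), ("cy", 300), ("dee", 120)]:
    board.set_score(player, score)
board.set_score("dee", 500)

print(board.rank("dee"), board.rank("ben"), board.rank("ann"), board.rank("cy"))  # 1 2 3 3
```

A Python list makes inserts O(n); a balanced tree index that tracks subtree sizes answers the same count in O(log n) for both updates and rank queries.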

Easier Development: JSON Support

• JSON values stored in a varchar column

• Field() column function

• Indexing on JSON elements

CREATE INDEX session_site_moderator
ON user_session_table (field(json_data, 'site'),
    field(json_data, 'moderator'), username);

• New JSON sample in kit
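To show what indexing on JSON elements buys, the sketch emulates the index above in Python: it extracts the 'site' and 'moderator' fields from a JSON string column and keeps entries sorted by (site, moderator, username), so finding one site's moderators is a range search instead of parsing every row. The field() helper is a hypothetical stand-in that handles only top-level keys, not VoltDB's implementation, and the sample rows are invented.

```python
import bisect
import json


def field(json_text, key):
    """Hypothetical stand-in for FIELD(): top-level value as a string, or None."""
    value = json.loads(json_text).get(key)
    return None if value is None else str(value)


rows = [
    {"username": "zoe", "json_data": '{"site": "VoltDB", "moderator": "true"}'},
    {"username": "amy", "json_data": '{"site": "VoltDB", "moderator": "false"}'},
    {"username": "bob", "json_data": '{"site": "VoltDB", "moderator": "true"}'},
    {"username": "cal", "json_data": '{"site": "Other", "moderator": "true"}'},
]

# Build the composite index once: key = (site, moderator, username).
index = sorted(
    (field(r["json_data"], "site"), field(r["json_data"], "moderator"), r["username"])
    for r in rows
)


def moderators_of(site):
    """Range scan over (site, 'true', *) without re-parsing any JSON."""
    lo = bisect.bisect_left(index, (site, "true", ""))
    # ("true\x00", "") sorts just after every (site, "true", username) key.
    hi = bisect.bisect_left(index, (site, "true\x00", ""))
    return [username for _, _, username in index[lo:hi]]


print(moderators_of("VoltDB"))  # ['bob', 'zoe']
```

The JSON is parsed once when the index is built (or a row is updated), which is the cost the index trades for fast lookups on fields inside a varchar column.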


Easier Development: Online Operations


• Ability to re-join a failed node to the cluster with no impact on existing operations

• Online schema update

• No service window

Easier Development: Streamlined Development

• Elimination of project.xml

• VoltDB-specific configuration now defined in DDL

• Defaulting of deployment.xml

• New Volt Compiler CLI:

voltdb compile


Expanded Reach: Cloud-Friendly

• Reduce impact of variable node performance and latency

• Elimination of strict NTP configuration

• Scales to a large number of nodes

Integration: High-Performance Export

• Parallelized export

• New connectors: JDBC, Netezza, Vertica

Integration: Client Library Updates

• New PHP Client

• Node.js client v1.0

• Go Client

• Coming soon: updated Erlang client


Other Notable New Features

• Explain command

• CSV loader utility

• CSV snapshots

• New Administration CLI: voltadmin

– voltadmin save

– voltadmin restore

– voltadmin pause

– voltadmin resume

– voltadmin shutdown


More Samples Available for Download

http://voltdb.com/community/volt-labs.php

Volt University

• Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly

• Curriculum and supporting material range from beginner to advanced

• Three types of instruction:

– Volt University Online

– Volt University Classroom

– Volt Vanguard Certification


Summary: VoltDB v3.0 Features

• Even faster

• Easier to build high-velocity applications

• Expanded reach across developers and applications

• Extensible to integrate with existing data infrastructure

• Volt Labs

• Volt University

VoltDB v3.0

DOWNLOAD 3.0

at www.voltdb.com

Imagine the Possibilities

More Information?

E-mail info@voltdb.com

Visit our forums: http://community.voltdb.com/forum

Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index

Follow @VoltDB on Twitter

QUESTIONS?

THANK YOU