
VoltDB - Stonebraker Live! - Santa Clara, CA - 1.29.13


Stonebraker Live!

Navigating the Database Universe

VoltDB presents


SCOTT JARR

Co-founder and Chief Strategy Officer

Agenda

• The (proper) design of DBMSs
– Presented by Dr. Michael Stonebraker, Co-founder
• The database universe
– Presented by Scott Jarr, Co-founder and Chief Strategy Officer
• Introducing VoltDB 3.0
– Presented by Mark Hydar, VP of Market Technology and Strategy

We Believe…

• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are the answer
• Products must provide tangible customer value... Fast

THE (PROPER) DESIGN OF THE DBMS

Dr. Michael Stonebraker


Lessons from 40 Years of Database Design

1. Get the user interaction right

– Bet on a small number of easy-to-understand constructs

– Plus standards

2. Get the implementation right

– Bet on a small number of easy-to-understand constructs

3. One size does not fit all

– At least not if you want fast, big or complex

“Those who don’t learn from history are destined to repeat it.”
-Winston Churchill


#1: Get the User Interaction Right

Winner: RDBMS

• Simple data model (tables)

• Simple access language (SQL)

• ACID (transactions)

• Standards (SQL)

Loser: CODASYL

• Complicated data model (records; participate in “sets”; set has one owner and, perhaps, many members, etc.)

• Messy access language (sea of “cursors”; some, but not all, move on every command; navigation programming)

Loser: OODBs

• Complex data model (hierarchical records, pointers, sets, arrays, etc.)

• Complex access language (navigation, through this sea)

• No standards

Historical Lesson: RDBMS vs. CODASYL vs. OODB


Interaction Take Away − Simple is Good

• ACID was easy for people to understand

• SQL provided a standard, high-level language and made people productive (transportable skills)


#2: Get the Implementation Right

• Leverage a few simple ideas: Early relational implementations

– System R storage system dropped links

– Views (protection, schema modification, performance)

– Cost-based optimizer

• Leverage a few simple ideas: Postgres

– User-defined data types and functions (adopted by most everybody)

– Rules/triggers

– No-overwrite storage

• Leverage a few simple ideas: Vertica

– Store data by column

– Compressed up the ging gong

– Parallel load without compromising ACID

Historical Winners


#3: One Size Does NOT Fit All

• OSFA (“one size fits all”) is an old technology with hundreds of bags hanging off it
• It breaks 100% of the time when under load
• Load = size or speed or complexity
• Load is increasing at a startling rate
• Purpose-built systems will outperform it by 10x to 100x
• History has not been completely written yet…but let’s look at VoltDB as an example

“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system…A factor of 50 is nothing to sneeze at.”
-My Top 10 Assertions About Data Warehouses, 2010


Example: VoltDB

• Get the interface right
– SQL
– ACID
• Implementation: Leverage a few simple ideas
– Main memory
– Stored procedures
– Deterministic scheduling
• Specialization
– OLTP focus allowed for above implementation choices


Proving the Theory

• Challenge: OLTP performance

– TPC-C CPU cycles

– On the Shore DBMS prototype

– Elephants should be similar

[Chart: share of CPU cycles. Recovery 24%, Latching 24%, Buffer Pool 24%, Locking 24%, Useful Work 4%]
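The breakdown above puts a ceiling on what removing overhead can buy. A back-of-the-envelope sketch follows; the percentages are the slide's, while the Amdahl-style calculation and the function name `speedup_after_removing` are illustrative assumptions and not part of the talk.

```python
# CPU-cycle shares from the slide (percent of total cycles).
BREAKDOWN = {
    "recovery": 24,
    "latching": 24,
    "buffer_pool": 24,
    "locking": 24,
    "useful_work": 4,
}


def speedup_after_removing(removed):
    """Ideal speedup if the named components cost zero cycles (hypothetical helper)."""
    remaining = sum(pct for name, pct in BREAKDOWN.items() if name not in removed)
    return sum(BREAKDOWN.values()) / remaining


# Removing a single overhead source helps little: 100 / 76.
one_removed = round(speedup_after_removing({"latching"}), 2)
# Removing all four leaves only useful work: 100 / 4.
all_removed = speedup_after_removing({"recovery", "latching", "buffer_pool", "locking"})

print(one_removed)  # 1.32
print(all_removed)  # 25.0
```

Under these numbers, no single fix is enough on its own; the large gain only appears when all four overhead sources are removed together.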


Single Threaded

• Gets rid of the latching problem

• What about Multicore?

– Divide the memory on an N-core node so it looks like N single-core nodes

– Which are single threaded…
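The partitioning idea above can be sketched in a few lines. This is a toy model under stated assumptions, not VoltDB's implementation: the class names `Partition` and `Node`, the hash-based routing, and the use of Python threads and queues are all chosen for illustration.

```python
import queue
import threading


class Partition:
    """One single-threaded 'virtual node': private data, private work queue."""

    def __init__(self):
        self.data = {}  # touched only by this partition's thread while running
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Transactions run one at a time, to completion, so the data needs no latches.
        while True:
            txn, done = self.inbox.get()
            if txn is None:
                break
            done.put(txn(self.data))

    def submit(self, txn):
        done = queue.Queue(maxsize=1)
        self.inbox.put((txn, done))
        return done

    def stop(self):
        self.inbox.put((None, None))
        self.thread.join()


class Node:
    """An N-core node presented as N single-core partitions."""

    def __init__(self, cores):
        self.partitions = [Partition() for _ in range(cores)]

    def partition_for(self, key):
        # Assumed routing rule: hash the key to pick the owning partition.
        return self.partitions[hash(key) % len(self.partitions)]

    def increment(self, key):
        def txn(data):
            data[key] = data.get(key, 0) + 1
            return data[key]

        return self.partition_for(key).submit(txn)


node = Node(cores=4)
pending = [node.increment("counter-%d" % (i % 8)) for i in range(800)]
results = [p.get() for p in pending]  # wait for every transaction to finish
total = sum(sum(p.data.values()) for p in node.partitions)
for p in node.partitions:
    p.stop()

print(total)  # 800
```

Each key is owned by exactly one partition, so increments to the same key are serialized by that partition's queue rather than by a lock on the data.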


Implementation Construct #1: Main Memory

• Main memory format for data

– Disk format gets you buffer pool overhead

• What happens if data doesn’t fit?
– Return to disk-buffer pool architecture (slow)
– Anti-caching
   • Main memory format for data
   • When memory fills up, bundle together elderly tuples and write them out
   • Run a transaction in “sleuth mode”; find the required records and move them to main memory (and pin)
   • Run the transaction normally
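The anti-caching steps above can be sketched as a small in-memory store. This is a simplified model under stated assumptions: the class name `AntiCache`, the least-recently-used eviction order, and the dictionary standing in for disk are illustrative, and the two passes are folded into one call rather than restarting the transaction.

```python
from collections import OrderedDict


class AntiCache:
    """Main-memory store that evicts its coldest tuples to a slower block store."""

    def __init__(self, capacity, block_size):
        self.capacity = capacity
        self.block_size = block_size
        self.memory = OrderedDict()  # key -> value, least recently used first
        self.evicted = {}            # stand-in for disk: key -> value
        self.pinned = set()

    def _evict_if_full(self):
        # When memory fills up, bundle the oldest unpinned tuples and write them out.
        while len(self.memory) > self.capacity:
            cold = [k for k in self.memory if k not in self.pinned][: self.block_size]
            if not cold:
                break  # everything is pinned; nothing can be evicted
            for key in cold:
                self.evicted[key] = self.memory.pop(key)

    def put(self, key, value):
        self.evicted.pop(key, None)
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict_if_full()

    def run(self, keys, txn):
        # "Sleuth" pass: find which required records have been evicted.
        missing = [k for k in keys if k in self.evicted]
        # Pin what the transaction needs, then bring the missing records back.
        self.pinned.update(keys)
        for key in missing:
            self.memory[key] = self.evicted.pop(key)
        try:
            # Normal pass: every required record is now in main memory.
            for key in keys:
                if key in self.memory:
                    self.memory.move_to_end(key)
            return txn(self.memory)
        finally:
            self.pinned.difference_update(keys)
            self._evict_if_full()


store = AntiCache(capacity=4, block_size=2)
for i in range(6):
    store.put(i, i * 10)

result = store.run([0, 5], lambda rows: rows[0] + rows[5])
print(result)  # 50
```

Record 0 had been evicted when the store filled up, so the call fetches it back, pins it with record 5, runs the transaction, and then lets colder records be written out in its place.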


Implementation Construct #2: Stored Procedures

• Round trip to the DBMS is expensive

– Do it once per transaction

– Not once per command

– Or even once per cursor move

• Ad-hoc queries supported

– Turn them into dynamic stored procedures
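The round-trip argument above can be made concrete by counting calls. The sketch below uses a hypothetical `FakeServer` class that only counts round trips; it is not a VoltDB client, and the `transfer` procedure is invented for illustration.

```python
class FakeServer:
    """Counts client/server round trips; a stand-in, not a real database client."""

    def __init__(self):
        self.round_trips = 0
        self.balances = {"alice": 100, "bob": 0}
        self.procedures = {"transfer": self._transfer}

    def execute(self, op, *args):
        # One call models one network round trip for one command.
        self.round_trips += 1
        if op == "read":
            return self.balances[args[0]]
        if op == "write":
            self.balances[args[0]] = args[1]
            return None
        raise ValueError(op)

    def call_procedure(self, name, *args):
        # One round trip for the whole transaction; the logic runs server-side.
        self.round_trips += 1
        return self.procedures[name](*args)

    def _transfer(self, src, dst, amount):
        self.balances[src] -= amount
        self.balances[dst] += amount


def transfer_per_command(server, src, dst, amount):
    """Same transfer, issued as four separate commands from the client."""
    src_balance = server.execute("read", src)
    dst_balance = server.execute("read", dst)
    server.execute("write", src, src_balance - amount)
    server.execute("write", dst, dst_balance + amount)


chatty = FakeServer()
transfer_per_command(chatty, "alice", "bob", 30)

batched = FakeServer()
batched.call_procedure("transfer", "alice", "bob", 30)

print(chatty.round_trips, batched.round_trips)  # 4 1
```

Both servers end in the same state; only the number of client/server exchanges differs.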


Implementation Construct #3: Deterministic Scheduling

• Transactions are ordered and run to completion

– No locking

• Active-active replication (HA)

– Run transaction at all replicas – in the same pre-determined order

• What about a cluster-wide power failure?

– Asynchronous checkpointing

– With a command log

– Wildly faster than data logging
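The points above rest on one property: if every replica applies the same deterministic procedures in the same order, every replica reaches the same state, so logging the commands is enough. A minimal sketch follows, assuming hypothetical `deposit` and `transfer` procedures and a `Replica` class invented for illustration; the checkpoint here is taken synchronously for simplicity.

```python
import copy


# Deterministic procedures: same inputs in the same order give the same state.
def deposit(state, account, amount):
    state[account] = state.get(account, 0) + amount


def transfer(state, src, dst, amount):
    if state.get(src, 0) >= amount:
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount


PROCEDURES = {"deposit": deposit, "transfer": transfer}


class Replica:
    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, command):
        name, args = command
        PROCEDURES[name](self.state, *args)
        self.applied += 1

    def checkpoint(self):
        # Snapshot of the state plus the log position it corresponds to.
        return copy.deepcopy(self.state), self.applied

    @classmethod
    def recover(cls, snapshot, log):
        replica = cls()
        replica.state, replica.applied = copy.deepcopy(snapshot[0]), snapshot[1]
        # Replay only the commands logged after the checkpoint.
        for command in log[replica.applied:]:
            replica.apply(command)
        return replica


# The command log records invocations (name and arguments), not changed rows.
log = [
    ("deposit", ("alice", 100)),
    ("transfer", ("alice", "bob", 30)),
    ("transfer", ("bob", "carol", 50)),  # no effect: insufficient funds
    ("deposit", ("carol", 5)),
]

primary, standby = Replica(), Replica()
snapshot = None
for i, command in enumerate(log):
    primary.apply(command)
    standby.apply(command)
    if i == 1:
        snapshot = primary.checkpoint()

recovered = Replica.recover(snapshot, log)
print(primary.state == standby.state == recovered.state)  # True
```

Each log entry is a procedure name and its arguments, which is typically far smaller than the set of rows the procedure changes.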


Result of Design Principles: VoltDB Example

• Good interface decisions – made developers more productive

– SQL & ACID

• Leveraging a few simple implementation ideas – made VoltDB wicked fast

– Main memory

– Stored procedures

– Deterministic scheduling


Proving the Theory

• Answer: OLTP performance

– 3 million transactions per second
– 7x Cassandra
– 15 million SQL statements per second
– 100,000+ transactions per commodity server

“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”
-The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007


THE DATABASE UNIVERSE

Scott Jarr


Technology Meets the Market

Believe

– “Big Data” is a rare, transformative market

– Velocity is becoming the cornerstone

– Specialized databases (working together) are the answer

– Products must provide tangible customer value… Fast

Observations

– Noisy, crowded and new – kinda like Christmas shopping at the mall

– Everyone wants to understand where the pieces fit

– Analysts build maps on technology NOT use cases

What we need is…

Data Value Chain

Interactive
• Place trade
• Serve ad
• Enrich stream
• Examine packet
• Approve trans.

Real-time Analytics
• Calculate risk
• Leaderboard
• Aggregate
• Count

Record Lookup
• Retrieve click stream
• Show orders

Historical Analytics
• Backtest algo
• BI
• Daily reports

Exploratory Analytics
• Algo discovery
• Log analysis
• Fraud pattern match

Age of Data: Milliseconds, Hundredths of seconds, Second(s), Minutes, Hours

Data Value Chain

[Chart: the same use-case columns, with two curves labeled “Value of Individual Data Item” and “Aggregate Data Value”. Vertical axis: Data Value. Horizontal axis: Age of Data.]

The Database Universe

[Chart: Traditional RDBMS. Vertical axis: Application Complexity (Simple, Slow, Small to Fast, Complex, Large). Horizontal axis: Data Value (Value of Individual Data Item to Aggregate Data Value; Transactional to Analytic). Use cases along the horizontal axis: Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics.]

The Database Universe

[Chart: same axes as the previous slide, now showing Traditional RDBMS, Data Warehouse, Hadoop, etc., NoSQL, and NewSQL, with the label “Velocity”.]


Closed-loop Big Data

Interactive & Real-time Analytics

Historical Reports & Analytics

Exploratory Analytics

logins, sensors, impressions, orders, authorizations, clicks, trades

Closed-loop Big Data

• Make the most informed decision every time there is an interaction
• Real-time decisions are informed by operational analytics and past knowledge

[Diagram labels: Knowledge; Interactive & Real-time Analytics; Historical Reports & Analytics; Exploratory Analytics; logins, sensors, impressions, orders, authorizations, clicks, trades]


The Velocity Use Case

What’s it look like?

– High throughput, relentless data feeds

– Fast decisions on high-value data

– Real-time, operational analytics present immediate visibility

What’s the big deal?

– Batch visibility converts to real time = immediate business impact

– Decisions made at time of event = higher impact decisions with immediate returns

– Ability to ingest and manage massive amounts of data = business differentiation and disruption


HELLO 3.0!

Mark Hydar

Introducing VoltDB 3.0

• Available now!

– Both commercial and open source offerings

– www.voltdb.com/downloads

• Key improvements

– Even faster

– Easier to build high-velocity applications

– Expanded reach across developers and applications

– Extensible to integrate with existing data infrastructure

Latency and Throughput, 50-50 Read/Write Workload

[Chart: latency (ms) vs. TPS for VoltDB 3.0 and v2.8.4.1; key/value 50/50 read/write workload on a 3-node, K=1 cluster]

Read/Write Workload Latency/Throughput

[Chart: average latency (ms) vs. TPS for VoltDB 3.0 under 10% read/90% write, 50% read/50% write, and 90% read/10% write key/value workloads on a 3-node, K=1 cluster]


Faster: Ad Hoc SQL Performance

• Conversational SQL

• Thousands to 10,000+ ad hoc SQL transactions/second

• Single or multiple (batch) SQL statement transactions


Easier Development: New SQL Support

• SQL LIKE and NOT LIKE

• UNION

• Column Functions

• Counting function (leaderboard ranking queries)

• Ability to define index using column functions


Easier Development: JSON Support

• JSON values stored in a varchar column
• Field() column function
• Indexing on JSON elements

    CREATE INDEX session_site_moderator
        ON user_session_table (field(json_data, 'site'),
                               field(json_data, 'moderator'), username);

• New JSON sample in kit
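To show what indexing on JSON elements buys, here is a toy model in Python rather than SQL. Everything in it is an assumption made for illustration: the `field` helper only mimics the idea of pulling one element out of a JSON string, and `SessionTable` is a hypothetical in-memory table, not VoltDB's storage or index structure.

```python
import json
from collections import defaultdict


def field(json_text, name):
    """Return one top-level element of a JSON document stored as text."""
    return json.loads(json_text).get(name)


class SessionTable:
    def __init__(self):
        self.rows = []
        # Index keyed on values computed from the JSON column: (site, moderator).
        self.by_site_moderator = defaultdict(list)

    def insert(self, username, json_data):
        self.rows.append((username, json_data))
        key = (field(json_data, "site"), field(json_data, "moderator"))
        self.by_site_moderator[key].append(username)

    def moderators_of(self, site):
        # Answered from the index without re-parsing any stored JSON.
        return sorted(self.by_site_moderator.get((site, True), []))


table = SessionTable()
table.insert("ann", '{"site": "forum", "moderator": true}')
table.insert("bob", '{"site": "forum", "moderator": false}')
table.insert("cy", '{"site": "forum", "moderator": true}')

print(table.moderators_of("forum"))  # ['ann', 'cy']
```

The JSON is parsed once at insert time to compute the index key, so lookups by site and moderator flag do not touch the stored documents.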

Easier Development: Online Operations

• Ability to re-join a failed node to cluster with no impact to existing operations

• Online schema update

• No service window


Easier Development: Streamlined Development

• Elimination of project.xml

• VoltDB-specific configuration now defined in DDL

• Defaulting of deployment.xml

• New Volt Compiler CLI: voltdb compile


Expanded Reach: Cloud-Friendly

• Reduce impact of variable node performance and latency

• Elimination of strict NTP configuration

• Scales to a large number of nodes


Integration: High-Performance Export

• Parallelized export

• New connectors: JDBC, Netezza, Vertica


Integration: Client Library Updates

• New PHP Client

• Node.js client v1.0

• Go Client

• Coming soon: updated Erlang client


http://golang.org


Other Notable New Features

• Explain command

• CSV loader utility

• CSV snapshots

• New Administration CLI: voltadmin

– voltadmin save

– voltadmin restore

– voltadmin pause

– voltadmin resume

– voltadmin shutdown


More Samples Available for Download

http://voltdb.com/community/volt-labs.php


Volt University

• Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly

• Curriculum and supporting material range from beginner to advanced

• Three types of instruction:

– Volt University Online

– Volt University Classroom

– Volt Vanguard Certification



Summary: VoltDB v3.0 Features

• Even faster

• Easier to build high-velocity applications

• Expanded reach across developers and applications

• Extensible to integrate with existing data infrastructure

• Volt Labs

• Volt University


DOWNLOAD 3.0 at www.voltdb.com

Imagine the Possibilities


More Information?

E-mail [email protected]

Visit our forums: http://community.voltdb.com/forum

Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index

Follow @VoltDB on Twitter


QUESTIONS?


THANK YOU