VoltDB - Stonebraker Live! - Santa Clara, CA - 1.29.13

Stonebraker Live!

Navigating the Database Universe

VoltDB presents

SCOTT JARR

Co-founder and Chief Strategy Officer

Agenda

• The (proper) design of DBMSs

– Presented by Dr. Michael Stonebraker, Co-founder

• The database universe

– Presented by Scott Jarr, Co-founder and Chief Strategy Officer

• Introducing VoltDB 3.0

– Presented by Mark Hydar, VP of Market Technology and Strategy

We Believe…

• “Big Data” is a rare, transformative market

• Velocity is becoming the cornerstone

• Specialized databases (working together) are the answer

• Products must provide tangible customer value... Fast

THE (PROPER) DESIGN

OF THE DBMS

Dr. Michael Stonebraker

Lessons from 40 Years of Database Design

1. Get the user interaction right

– Bet on a small number of easy-to-understand constructs

– Plus standards

2. Get the implementation right

– Bet on a small number of easy-to-understand constructs

3. One size does not fit all

– At least not if you want fast, big or complex

“Those who don’t learn from history are destined to repeat it.”

- Winston Churchill

#1: Get the User Interaction Right

Winner: RDBMS

• Simple data model (tables)

• Simple access language (SQL)

• ACID (transactions)

• Standards (SQL)

Loser: CODASYL

• Complicated data model (records; participate in “sets”; set has one owner and, perhaps, many members, etc.)

• Messy access language (sea of “cursors”; some -- but not all -- move on every command, navigation programming)

Loser: OODBs

• Complex data model (hierarchical records, pointers, sets, arrays, etc.)

• Complex access language (navigation, through this sea)

• No standards

Historical Lesson: RDBMS vs. CODASYL vs. OODB

Interaction Take Away − Simple is Good

• ACID was easy for people to understand

• SQL provided a standard, high-level language and made people productive (transportable skills)

#2: Get the Implementation Right

• Leverage a few simple ideas: Early relational implementations

– System R storage system dropped links

– Views (protection, schema modification, performance)

– Cost-based optimizer

• Leverage a few simple ideas: Postgres

– User-defined data types and functions (adopted by most everybody)

– Rules/triggers

– No-overwrite storage

• Leverage a few simple ideas: Vertica

– Store data by column

– Compressed up the ging gong

– Parallel load without compromising ACID

#3: One Size Does NOT Fit All

• OSFA is an old technology with hundreds of bags hanging off it

• It breaks 100% of the time when under load

• Load = size or speed or complexity

• Load is increasing at a startling rate

• Purpose-built will exceed by 10x to 100x

• History has not been completely written yet… but let’s look at VoltDB as an example

“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system…A factor of 50 is nothing to sneeze at.”

- My Top 10 Assertions About Data Warehouses, 2010

Example: VoltDB

• Get the interface right

– SQL

– ACID

• Implementation: Leverage a few simple ideas

– Main memory

– Stored procedures

– Deterministic scheduling

• Specialization

– OLTP focus allowed for above implementation choices

Proving the Theory

• Challenge: OLTP performance

– TPC-C CPU cycles

– On the Shore DBMS prototype

– Elephants should be similar

– Recovery: 24%

– Latching: 24%

– Buffer pool: 24%

– Locking: 24%

– Useful work: 4%
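
The breakdown above implies a rough ceiling on what removing overhead can buy. The following is a back-of-the-envelope sketch, not a result from the talk: it assumes the five percentages on the slide are exhaustive and that the cost of useful work stays the same once other components are eliminated. The function names are illustrative.

```python
# CPU-cycle shares as listed on the slide (percent of total cycles).
breakdown = {
    "recovery": 24,
    "latching": 24,
    "buffer_pool": 24,
    "locking": 24,
    "useful_work": 4,
}


def max_speedup(shares, keep="useful_work"):
    """Upper bound on speedup if every component except `keep` cost nothing."""
    total = sum(shares.values())
    return total / shares[keep]


def speedup_after_removing(shares, removed):
    """Speedup when only the named components are eliminated."""
    total = sum(shares.values())
    remaining = total - sum(shares[name] for name in removed)
    return total / remaining


print(max_speedup(breakdown))  # 25.0
print(speedup_after_removing(breakdown, ["latching"]))
```

Under these assumptions, removing any single component helps only modestly; the large gain comes from removing all four together, which is the argument the following slides make.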

Single Threaded

• Gets rid of the latching problem

• What about Multicore?

– Divide the memory on an N-core node so it looks like N single-core nodes

– Which are single threaded…
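
The multicore idea can be sketched as follows. This is a toy model under stated assumptions, not VoltDB's implementation: the `Partition` and `Node` classes and the hash-modulo routing rule are hypothetical, and the partitions are drained sequentially here, whereas a real system would give each partition its own thread on its own core.

```python
from collections import deque


class Partition:
    """One single-threaded slice of the data: its own rows and its own work queue."""

    def __init__(self):
        self.rows = {}
        self.queue = deque()

    def run_pending(self):
        # Work runs one item at a time, so no latches are needed on self.rows.
        results = []
        while self.queue:
            op, key, value = self.queue.popleft()
            if op == "put":
                self.rows[key] = value
                results.append(None)
            else:  # "get"
                results.append(self.rows.get(key))
        return results


class Node:
    """An N-core node modeled as N independent single-threaded partitions."""

    def __init__(self, cores):
        self.partitions = [Partition() for _ in range(cores)]

    def route(self, key):
        # Hypothetical routing rule: hash of the key modulo the partition count.
        return self.partitions[hash(key) % len(self.partitions)]

    def submit(self, op, key, value=None):
        self.route(key).queue.append((op, key, value))


node = Node(cores=4)
node.submit("put", 5, "trade-5")
node.submit("get", 5)
for partition in node.partitions:
    partition.run_pending()
```

Because every key is owned by exactly one partition, two operations on the same key are always serialized by that partition's queue.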

Implementation Construct #1: Main Memory

• Main memory format for data

– Disk format gets you buffer pool overhead

• What happens if data doesn’t fit?

– Return to disk-buffer pool architecture (slow)

– Anti-caching

• Main memory format for data

• When memory fills up, then bundle together elderly tuples and write them out

• Run a transaction in “sleuth mode”; find the required records and move to main memory (and pin)

• Run Xact normally
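
The anti-caching steps above can be sketched in a few lines. This is a simplified illustration, not the actual design: the `AntiCache` class is hypothetical, a plain dict stands in for disk, "elderly" is approximated by insertion order, and every requested key is assumed to exist either in memory or on disk.

```python
class AntiCache:
    """Toy main-memory table that spills its oldest tuples to a 'disk' dict."""

    def __init__(self, capacity, block_size):
        self.capacity = capacity      # max tuples held in memory
        self.block_size = block_size  # cold tuples bundled per eviction
        self.memory = {}              # key -> value; insertion order approximates age
        self.disk = {}                # stand-in for evicted blocks on disk
        self.pinned = set()           # keys that must stay in memory

    def _evict_if_full(self):
        while len(self.memory) > self.capacity:
            # Oldest unpinned tuples first (dicts preserve insertion order).
            victims = [k for k in self.memory if k not in self.pinned]
            victims = victims[: self.block_size]
            if not victims:
                break  # everything left is pinned
            for k in victims:
                self.disk[k] = self.memory.pop(k)

    def put(self, key, value):
        self.memory.pop(key, None)
        self.disk.pop(key, None)
        self.memory[key] = value
        self._evict_if_full()

    def run_transaction(self, keys, body):
        # "Sleuth" pass: find the required records and bring them into memory.
        for k in keys:
            if k not in self.memory:
                self.memory[k] = self.disk.pop(k)
        self.pinned.update(keys)
        try:
            # Normal pass: every needed record is now resident and pinned.
            return body(self.memory)
        finally:
            self.pinned.difference_update(keys)
            self._evict_if_full()


cache = AntiCache(capacity=3, block_size=2)
for key in range(1, 5):
    cache.put(key, key * 10)
result = cache.run_transaction([1], lambda rows: rows[1] + 1)
```

The point of the two-pass structure is that the transaction body itself never waits on disk: all fetching happens before it starts.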

Implementation Construct #2: Stored Procedures

• Round trip to the DBMS is expensive

– Do it once per transaction

– Not once per command

– Or even once per cursor move

• Ad-hoc queries supported

– Turn them into dynamic stored procedures
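
To make the round-trip argument concrete, here is a small sketch that counts client/server round trips for the same transfer done command by command and as a single stored procedure. `FakeServer` and both transfer functions are hypothetical stand-ins, not any real client API.

```python
class FakeServer:
    """Stand-in for a DBMS that counts network round trips."""

    def __init__(self):
        self.balances = {"alice": 100, "bob": 50}
        self.round_trips = 0
        self.procedures = {}

    def execute(self, command, *args):
        # One call from the client = one round trip.
        self.round_trips += 1
        if command == "read":
            return self.balances[args[0]]
        if command == "write":
            self.balances[args[0]] = args[1]
            return None
        raise ValueError(f"unknown command: {command}")

    def register(self, name, func):
        self.procedures[name] = func

    def call(self, name, *args):
        # The whole procedure runs server-side: one round trip per transaction.
        self.round_trips += 1
        return self.procedures[name](self.balances, *args)


def transfer_by_commands(server, src, dst, amount):
    """Client-driven transfer: one round trip per command."""
    src_balance = server.execute("read", src)
    dst_balance = server.execute("read", dst)
    server.execute("write", src, src_balance - amount)
    server.execute("write", dst, dst_balance + amount)


def transfer_procedure(balances, src, dst, amount):
    """The same logic packaged as a stored procedure."""
    balances[src] -= amount
    balances[dst] += amount


per_command = FakeServer()
transfer_by_commands(per_command, "alice", "bob", 30)

per_transaction = FakeServer()
per_transaction.register("transfer", transfer_procedure)
per_transaction.call("transfer", "alice", "bob", 30)

print(per_command.round_trips, per_transaction.round_trips)  # 4 1
```

Both servers end in the same state; only the number of network crossings differs.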

Implementation Construct #3: Deterministic Scheduling

• Transactions are ordered and run to completion

– No locking

• Active-active replication (HA)

– Run transaction at all replicas – in the same pre-determined order

• What about a cluster-wide power failure?

– Asynchronous checkpointing

– With a command log

– Wildly faster than data logging
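
A minimal sketch of why a pre-determined order is enough for both replication and recovery, assuming every command is deterministic. The command names, the `Replica` class and the overdraft rule are illustrative only; the log records the commands themselves (name plus parameters) rather than the changed data.

```python
def apply_command(state, command):
    """Deterministically apply one logged command to a replica's state."""
    name, key, amount = command
    if name == "deposit":
        state[key] = state.get(key, 0) + amount
    elif name == "withdraw":
        # Deterministic rule: every replica refuses an overdraft the same way.
        if state.get(key, 0) >= amount:
            state[key] -= amount
    else:
        raise ValueError(f"unknown command: {name}")


class Replica:
    def __init__(self, snapshot=None):
        self.state = dict(snapshot or {})

    def run(self, commands):
        # Run each transaction to completion, in log order; no locks needed.
        for command in commands:
            apply_command(self.state, command)


command_log = [
    ("deposit", "a", 100),
    ("withdraw", "a", 30),
    ("withdraw", "a", 100),  # refused on every replica
    ("deposit", "b", 5),
]

# Active-active replication: both replicas run the same commands in the same order.
r1, r2 = Replica(), Replica()
r1.run(command_log)
r2.run(command_log)

# Recovery: start from a checkpoint taken after two commands, then replay the rest.
checkpoint = Replica()
checkpoint.run(command_log[:2])
recovered = Replica(snapshot=checkpoint.state)
recovered.run(command_log[2:])

print(r1.state == r2.state == recovered.state)  # True
```

The sketch only holds if commands have no hidden nondeterminism (clocks, random numbers); how a real system enforces that is outside what the slide states.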

Result of Design Principles: VoltDB Example

• Good interface decisions – made developers more productive

– SQL & ACID

• Leveraging a few simple implementation ideas – made VoltDB wicked fast

– Main memory

– Stored procedures

– Deterministic scheduling

Proving the Theory

• Answer: OLTP performance

– 3 million transactions per second

– 7x Cassandra

– 15 million SQL statements per second

– 100,000+ transactions per commodity server

“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”

- The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007

THE DATABASE UNIVERSE

Scott Jarr

Technology Meets the Market

Believe

– “Big Data” is a rare, transformative market

– Velocity is becoming the cornerstone

– Specialized databases (working together) are the answer

– Products must provide tangible customer value… Fast

Observations

– Noisy, crowded and new – kinda like Christmas shopping at the mall

– Everyone wants to understand where the pieces fit

– Analysts build maps on technology NOT use cases

What we need is…

Data Value Chain

• Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve trans.

• Real-time Analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count

• Record Lookup (second(s)): retrieve click stream, show orders

• Historical Analytics (minutes): backtest algo, BI, daily reports

• Exploratory Analytics (hours): algo discovery, log analysis, fraud pattern match

Horizontal axis: Age of Data

Data Value Chain

[Figure: the same Data Value Chain, annotated with “Value of Individual Data Item” and “Aggregate Data Value” against Age of Data]

The Database Universe

[Figure: database categories laid over the data value chain (Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics), running from Transactional to Analytic. Axis labels: Simple / Slow / Small and Fast / Complex / Large; Value of Individual Data Item and Aggregate Data Value. Regions labeled: Traditional RDBMS, NewSQL, NoSQL, Warehouse, Hadoop, etc. Callout: Velocity]

Closed-loop Big Data

• Make the most informed decision every time there is an interaction

• Real-time decisions are informed by operational analytics and past knowledge

[Figure: closed loop linking Interactive & Real-time Analytics, Historical Reports & Analytics, and Exploratory Analytics, with a “Knowledge” label; input streams: logins, sensors, impressions, orders, authorizations, clicks, trades]

The Velocity Use Case

What’s it look like?

– High throughput, relentless data feeds

– Fast decisions on high-value data

– Real-time, operational analytics present immediate visibility

What’s the big deal?

– Batch visibility converts to real time = immediate business impact

– Decisions made at time of event = higher impact decisions with immediate returns

– Ability to ingest and manage massive amounts of data = business differentiation and disruption

HELLO 3.0!

Mark Hydar

Introducing VoltDB 3.0

• Available now!

– Both commercial and open source offerings

– www.voltdb.com/downloads

• Key improvements

– Even faster

– Easier to build high-velocity applications

– Expanded reach across developers and applications

– Extensible to integrate with existing data infrastructure

Latency and Throughput, 50-50 Read/Write Workload

[Chart: VoltDB 3.0 vs. v2.8.4.1; key/value 50/50 read/write workload; 3 Node, K=1 Cluster]

Read/Write Workload Latency/Throughput

[Chart: VoltDB 3.0; key/value workloads at 10% read/90% write, 50% read/50% write, and 90% read/10% write; 3 Node, K=1 Cluster]

Faster: Ad Hoc SQL Performance

• Conversational SQL

• Thousands to 10,000+ ad hoc SQL transactions/second

• Single or multiple (batch) SQL statement transactions

Easier Development: New SQL Support

• SQL LIKE and NOT LIKE

• UNION

• Column Functions

• Counting function (leaderboard ranking queries)

• Ability to define index using column functions

Easier Development: JSON Support

• JSON values stored in a varchar column

• Field() column function

• Indexing on JSON elements

CREATE INDEX session_site_moderator
ON user_session_table (field(json_data, 'site'),
field(json_data, 'moderator'), username);

• New JSON sample in kit
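
To illustrate what an index over JSON elements buys, the sketch below emulates the idea in Python. It is not VoltDB code: the `field` helper is only a rough stand-in for the Field() column function (it reads one top-level element and returns None on malformed input, which is an assumption), and the sample rows are invented for the example.

```python
import json


def field(json_text, name):
    """Rough stand-in for a field() column function: pull one top-level element."""
    try:
        return json.loads(json_text).get(name)
    except (ValueError, TypeError, AttributeError):
        return None  # assumption: malformed or non-object JSON yields no value


# Invented sample rows shaped like user_session_table in the slide.
rows = [
    {"username": "ann", "json_data": '{"site": "forum", "moderator": "true"}'},
    {"username": "bob", "json_data": '{"site": "forum", "moderator": "false"}'},
    {"username": "cat", "json_data": '{"site": "blog", "moderator": "true"}'},
]


def build_index(table):
    """Map (site, moderator, username) to row positions, like the composite index."""
    index = {}
    for position, row in enumerate(table):
        key = (
            field(row["json_data"], "site"),
            field(row["json_data"], "moderator"),
            row["username"],
        )
        index.setdefault(key, []).append(position)
    return index


index = build_index(rows)
print(index[("forum", "true", "ann")])  # [0]
```

The JSON is parsed once when the index is built, so a lookup by site and moderator flag no longer has to parse every row's varchar value.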

Easier Development: Online Operations

• Ability to re-join a failed node to cluster with no impact to existing operations

• Online schema update

• No service window

Easier Development: Streamlined Development

• Elimination of project.xml

• VoltDB-specific configuration now defined in DDL

• Defaulting of deployment.xml

• New Volt Compiler CLI:

voltdb compile


Expanded Reach: Cloud-Friendly

• Reduce impact of variable node performance and latency

• Elimination of strict NTP configuration

• Scales to a large number of nodes

Integration: High-Performance Export

• Parallelized export

• New connectors: JDBC, Netezza, Vertica

Integration: Client Library Updates

• New PHP Client

• Node.js client v1.0

• Go Client

• Coming soon: updated Erlang client


http://golang.org

Other Notable New Features

• Explain command

• CSV loader utility

• CSV snapshots

• New Administration CLI: voltadmin

– voltadmin save

– voltadmin restore

– voltadmin pause

– voltadmin resume

– voltadmin shutdown


More Samples Available for Download

http://voltdb.com/community/volt-labs.php

Volt University

• Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly

• Curriculum and supporting material range from beginner to advanced

• Three types of instruction:

– Volt University Online

– Volt University Classroom

– Volt Vanguard Certification


Summary: VoltDB v3.0 Features

• Even faster

• Easier to build high-velocity applications

• Expanded reach across developers and applications

• Extensible to integrate with existing data infrastructure

• Volt Labs

• Volt University

VoltDB v3.0

DOWNLOAD 3.0

at www.voltdb.com

Imagine the Possibilities

More Information?

E-mail info@voltdb.com

Visit our forums: http://community.voltdb.com/forum

Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index

Follow @VoltDB on Twitter


QUESTIONS?

THANK YOU
