VoltDB (New SQL) SUNNIE CHUNG CIS 612

Lecture Notes: VoltDB (cis.csuohio.edu/~sschung/cis612/Lecture_Notes_VoltDB_1.pdf)


VoltDB

• VoltDB is an ACID-compliant relational database management system that uses main memory as its storage to maximize performance.

• VoltDB also uses a shared-nothing architecture in which each node is independent and self-sufficient. This architecture is what gives an RDBMS such as VoltDB its scalability.


Architecture

• VoltDB is a NewSQL relational database system. NewSQL is a class of modern database management systems that seek to provide the scalable performance of NoSQL while still maintaining the ACID guarantees of a traditional database system.

• Automatic partitioning across a shared-nothing server cluster

• Main-memory data architecture

• Elimination of multi-threading and locking overhead

• Automatic replication and command logging

• Stored procedure interface for transactions
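
The idea behind automatic partitioning can be pictured with a small sketch. This is not VoltDB's actual partitioning code: the class name PartitionSketch, its method names, and the use of String.hashCode() are assumptions made purely for illustration. Each row is assigned to a partition by hashing the value of its partitioning column, so rows that share a key always land on the same partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of hash partitioning; not part of VoltDB.
final class PartitionSketch {
    // Maps a partitioning-column value to a partition index in [0, partitionCount).
    static int partitionFor(String key, int partitionCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Distributes rows across partitions, using column 0 as the partitioning column.
    static List<List<String[]>> distribute(List<String[]> rows, int partitionCount) {
        List<List<String[]>> partitions = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String[] row : rows) {
            partitions.get(partitionFor(row[0], partitionCount)).add(row);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"MA", "Billerica"},
            new String[] {"NY", "Buffalo"},
            new String[] {"OH", "Bay View"});
        List<List<String[]>> partitions = distribute(rows, 2);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i).size() + " row(s)");
        }
    }
}
```

Because each partition holds its own slice of the data, adding partitions (or nodes) spreads the rows further without any partition needing to see another's data.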


Features

• VoltDB uses in-memory storage to maximize throughput, avoiding costly disk access.

• Further performance gains are achieved by serializing all data access, avoiding many of the time-consuming functions of traditional databases such as locking, latching, and maintaining transaction logs.

• Scalability, reliability, and high availability are achieved through clustering and replication across multiple servers and server farms.

• Scaling is transparent to applications and can be done in two dimensions: up (by increasing the capacity of existing database nodes) and out (by increasing the number of nodes in the cluster).


ACID in VoltDB

• VoltDB is a fully ACID-compliant transactional database, relieving application developers from having to write code to perform transactions and manage rollbacks within their own applications.

• It guarantees that the data remains consistent at all times. ACID is ensured as follows:

  • Data is organized into in-memory partitions.

  • Clients connect to the database and send transactions.

  • Incoming transactions are routed to the data and executed serially.

  • Each stored procedure is defined as a transaction: it succeeds or rolls back as a whole to ensure database consistency.
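
The "executed serially, succeeds or rolls back as a whole" behavior can be sketched in a few lines. This is a toy model under stated assumptions, not VoltDB's implementation: SerialPartition is a hypothetical class, a Consumer stands in for a stored procedure, and rollback is simulated by discarding a working copy of the data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of one partition that runs procedures one at a time.
final class SerialPartition {
    private Map<String, Integer> data = new HashMap<>();

    // Runs one procedure to completion before the next may start.
    // "synchronized" stands in for the partition's single execution thread.
    synchronized boolean execute(Consumer<Map<String, Integer>> procedure) {
        Map<String, Integer> working = new HashMap<>(data); // private working copy
        try {
            procedure.accept(working);
        } catch (RuntimeException abort) {
            return false; // roll back: the committed data was never touched
        }
        data = working;   // commit: all of the procedure's changes become visible together
        return true;
    }

    synchronized Integer get(String key) {
        return data.get(key);
    }

    public static void main(String[] args) {
        SerialPartition partition = new SerialPartition();
        partition.execute(rows -> rows.put("Loving County", 82));
        boolean committed = partition.execute(rows -> {
            rows.put("Loving County", -1);            // partial change...
            throw new IllegalStateException("abort"); // ...then the procedure fails
        });
        System.out.println(committed + " " + partition.get("Loving County"));
    }
}
```

Since only one procedure runs at a time in the partition, no locks or latches on individual rows are needed in this model.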


ACID in VoltDB

• By processing transactions serially (single-threaded) within each partition, VoltDB also ensures transactional consistency without the overhead of locking, latching, and transaction logs. Multiple requests are handled at the same time by spreading them across partitions.

• A multi-partition transaction is slower than a single-partition transaction, but integrity is maintained in both cases and overall throughput remains high.


How VoltDB Works

• Tested with VoltDB Enterprise version 4.2 for Mac.

• VoltDB differs from traditional database products in that there is no generic database. Instead, each VoltDB database is optimized for a specific application by compiling the schema, stored procedures, and partitioning information into a VoltDB application catalog.

• The catalog is then loaded on one or more host machines to create a distributed database.


VoltDB example

Example: a schema is saved in a text file, towns.sql:

    CREATE TABLE towns (
        town VARCHAR(128),
        county VARCHAR(64),
        state VARCHAR(2)
    );


Compiling the Application Catalog:

    $ voltdb compile towns.sql
    ------------------------------------------
    Successfully created catalog.jar
    Includes schema: towns.sql
    [MP][WRITE] TOWNS.insert
    INSERT INTO TOWNS VALUES (?, ?, ?);
    ------------------------------------------
    Catalog contains 1 built-in CRUD procedures.
    Simple insert, update, delete and select procedures are created
    automatically for convenience.
    ------------------------------------------
    Full catalog report can be found at file:///Users/nqt289/Desktop/voltdb/catalog-report.html
    Or can be viewed at "http://localhost:8080" when the server is running.


VoltDB Command

• To give the catalog a name (the default is catalog.jar):

    $ voltdb compile -o towns.jar towns.sql

Starting the Database:

    $ voltdb create towns.jar
    Initializing VoltDB...
    Build: 4.2 voltdb-4.2-0-gc9751d3-local Enterprise Edition
    Connecting to VoltDB cluster as the leader...
    Host id of this node is: 0
    Starting VoltDB with trial license. License expires on May 17, 2014.
    Initializing the database and command logs. This may take a moment...
    WARN: This is not a highly available cluster. K-Safety is set to 0.
    Server completed initialization.

• Check the report, schema, procedures, etc. at http://localhost:8080/


Command Line interface

• VoltDB provides a SQL shell interpreter that lets users execute VoltDB SQL statements and stored procedures interactively, or non-interactively via scripts.

• This command line interface is accessed through sqlcmd:

    $ sqlcmd
    SQL Command :: localhost:21212
    1>


Command Line interface

• Three key options at the sqlcmd prompt:

  • SQL queries: run ad hoc SQL queries

  • Procedure calls: execute stored procedures

  • Exit: end the interactive session


VoltDB Query/Syntax

• VoltDB supports a subset of ANSI-standard SQL 99, including CREATE INDEX, CREATE TABLE, and CREATE VIEW for schema definition and SELECT, INSERT, UPDATE, and DELETE for data manipulation.

• Insert statement:

    1> insert into towns values ('Billerica','Middlesex','MA');
    (1 row(s) affected)
    2> insert into towns values ('Buffalo','Erie','NY');
    (1 row(s) affected)
    3> insert into towns values ('Bay View','Erie','OH');
    (1 row(s) affected)


VoltDB Query/Syntax

• Select statement:

    4> select count(*) as total from towns;
    TOTAL
    ------
    3
    (1 row(s) affected)

    5> select town, state from towns ORDER BY town;
    TOWN       STATE
    ---------- ------
    Bay View   OH
    Billerica  MA
    Buffalo    NY
    (3 row(s) affected)

• Exit:

    6> exit


VoltDB Input

• CSV and TXT files are the standard input files for loading data into a VoltDB database.

• VoltDB provides a simplified CSV loader through the shell script csvloader.

• Command:

    csvloader tableName < dataFile.csv
    csvloader tableName -f dataFile.csv


VoltDB Input Example:

• Create a database with two tables, towns and people, from a schema saved in towns.sql:

    $ voltdb compile -o towns.jar towns.sql
    $ voltdb create towns.jar

• Prepare the input files:

    $ cut -d'|' -f2,4-7,16 POP_PLACES_20140401.txt | grep -v '|$' | grep -v '||' > towns.txt
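
The preparation pipeline keeps fields 2, 4 to 7, and 16 of each pipe-delimited line and then drops any line whose result ends in "|" or contains "||", that is, a line with an empty last field or an empty field in the middle. The same logic is sketched below; TownsFilter and project are hypothetical names, and the sketch assumes every input line has at least 16 fields (shorter lines are simply dropped here, which is not exactly what cut does).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-statement of: cut -d'|' -f2,4-7,16 | grep -v '|$' | grep -v '||'
final class TownsFilter {
    // 1-based field numbers kept by "cut -f2,4-7,16".
    private static final int[] WANTED = {2, 4, 5, 6, 7, 16};

    // Returns the projected line, or null if the line should be dropped.
    static String project(String line) {
        String[] fields = line.split("\\|", -1); // -1 keeps trailing empty fields
        if (fields.length < 16) {
            return null; // assumption: incomplete lines are discarded
        }
        List<String> kept = new ArrayList<>();
        for (int n : WANTED) {
            kept.add(fields[n - 1]);
        }
        String out = String.join("|", kept);
        // Same tests as the two "grep -v" stages.
        if (out.endsWith("|") || out.contains("||")) {
            return null;
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder field values; real input comes from POP_PLACES_20140401.txt.
        String sample = "f1|f2|f3|f4|f5|f6|f7|f8|f9|f10|f11|f12|f13|f14|f15|f16";
        System.out.println(project(sample));
    }
}
```

Filtering out rows with empty fields before loading avoids rows that the loader would otherwise have to reject or store with missing values.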


VoltDB Input Example:

• Loading the data:

    $ csvloader --separator "|" --skip 1 --file towns.txt towns
    Read 194465 rows from file and successfully inserted 194465 rows (final)
    Elapsed time: 4.989 seconds
    Invalid row file: /Users/nqt289/Desktop/voltdb/csvloader_TOWNS_insert_invalidrows.csv
    Log file: /Users/nqt289/Desktop/voltdb/csvloader_TOWNS_insert_log.log
    Report file: /Users/nqt289/Desktop/voltdb/csvloader_TOWNS_insert_report.log


Querying the Database

    1> SELECT town,state,elevation from towns order by elevation desc limit 5;
    TOWN                      STATE  ELEVATION
    ------------------------- ------ ----------
    Corona (historical)       CO     3573
    Quartzville (historical)  CO     3527
    Logtown (historical)      CO     3524
    Tomboy (historical)       CO     3508
    Rexford (historical)      CO     3484
    (5 row(s) affected)


Querying the Database

    2> select town, count(town) as duplicates from towns
    3> group by town order by duplicates desc limit 5;
    TOWN         DUPLICATES
    ------------ -----------
    Midway       214
    Fairview     211
    Oak Grove    167
    Five Points  150
    Riverside    130
    (5 row(s) affected)


Querying the Database

• Load another file, people.txt:

    $ csvloader --file people.txt --skip 1 people
    Read 3143 rows from file and successfully inserted 1802 rows (final)
    Elapsed time: 0.467 seconds


Querying the Database

• Check the "people" table:

    1> select * from people order by population desc limit 5;
    STATE_NUM  COUNTY_NUM  STATE       COUNTY              POPULATION
    ---------- ----------- ----------- ------------------- -----------
    6          37          California  Los Angeles County  9818605
    17         31          Illinois    Cook County         5194675
    4          13          Arizona     Maricopa County     3817117
    6          73          California  San Diego County    3095313
    6          59          California  Orange County       3010232
    (5 row(s) affected)


Querying the Database

• Join the two tables:

    2> select top 5 min(t.elevation) as height,
    3> t.state,t.county, max(p.population)
    4> from towns as t, people as p
    5> where t.state_num=p.state_num and t.county_num=p.county_num
    6> group by t.state, t.county order by height desc;
    HEIGHT  STATE  COUNTY    C4
    ------- ------ --------- ------
    2754    CO     Lake      7310
    2640    CO     Hinsdale  843
    2609    CO     Mineral   712
    2523    CO     San Juan  699
    2452    CO     Summit    27994
    (5 row(s) affected)


Save and Recover

• Because VoltDB uses memory as its operational storage, it provides a tool to save database snapshots to disk.

• A snapshot is a complete disk-based representation of a VoltDB database, including everything needed to reproduce the database after a shutdown.
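
The save-then-restore idea can be illustrated with a toy example. It is only a sketch under stated assumptions: SnapshotSketch, its one-line-per-row text format, and the temporary file are all hypothetical and bear no relation to VoltDB's real snapshot files. An in-memory table is written to disk and read back to rebuild the same contents.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of snapshotting an in-memory table to disk.
final class SnapshotSketch {
    // Writes every row of the in-memory table to a file, one "key|value" line per row.
    static void save(Map<String, Integer> table, Path file) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> row : table.entrySet()) {
            lines.add(row.getKey() + "|" + row.getValue());
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds the in-memory table from a file written by save().
    static Map<String, Integer> restore(Path file) {
        Map<String, Integer> table = new TreeMap<>();
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                int sep = line.lastIndexOf('|');
                table.put(line.substring(0, sep), Integer.parseInt(line.substring(sep + 1)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    // Saves to a temporary file, restores from it, and cleans the file up.
    static Map<String, Integer> roundTrip(Map<String, Integer> table) {
        try {
            Path file = Files.createTempFile("snapshot", ".txt");
            try {
                save(table, file);
                return restore(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> people = new TreeMap<>();
        people.put("Loving County", 82);
        people.put("Kalawao County", 90);
        System.out.println(roundTrip(people).equals(people));
    }
}
```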


Save and Recover

• Save:

    $ voltadmin save /Users/nqt289/Desktop/voltdb/voltdbroot/snapshots/ "townsandpeople"
    -- Snapshot Save Results --
    HOST_ID HOSTNAME                 TABLE  RESULT  ERR_MSG
    ------- ------------------------ ------ ------- -------
    0       Thuats-MacBook-Pro.local PEOPLE SUCCESS
    0       Thuats-MacBook-Pro.local STATES SUCCESS
    0       Thuats-MacBook-Pro.local TOWNS  SUCCESS
    0       Thuats-MacBook-Pro.local        SUCCESS


Save and Recover

• Recover:

    $ voltdb recover
    Initializing VoltDB...
    Build: 4.2 voltdb-4.2-0-gc9751d3-local Enterprise Edition
    Connecting to VoltDB cluster as the leader...
    Host id of this node is: 0
    Starting VoltDB with trial license. License expires on May 17, 2014.
    Initializing the database and command logs. This may take a moment...
    WARN: This is not a highly available cluster. K-Safety is set to 0.
    Restoring from path: voltdbroot/snapshots with nonce: townsandpeople
    Finished restore of voltdbroot/snapshots with nonce: townsandpeople in 0.87 seconds
    Server completed initialization.


Save and Recover

• Adding or dropping tables, or changing stored procedures, can be done while the database is running. A new catalog is created, and the data can then be recovered.

• When updating the schema, the deployment.xml file is required to specify configuration settings such as the number of servers and the number of partitions.

    $ voltadmin update towns.jar voltdbroot/deployment.xml


Stored Procedure

• When a stored procedure is added to the schema, a snapshot of the running database needs to be saved and a new catalog compiled in order to apply the update.

    $ voltadmin save /Users/nqt289/Desktop/voltdb/voltdbroot/snapshots/ "states"
    $ voltadmin restore


Stored Procedure

    CREATE PROCEDURE leastpopulated AS
        SELECT TOP 1 county, abbreviation, population
        FROM people, states WHERE people.state_num=?
        AND people.state_num=states.state_num
        ORDER BY population ASC;


Test procedure:

    1> exec leastpopulated 6;
    COUNTY         ABBREVIATION  POPULATION
    -------------- ------------- -----------
    Alpine County  CA            1175
    (1 row(s) affected)

    2> exec leastpopulated 48;
    COUNTY         ABBREVIATION  POPULATION
    -------------- ------------- -----------
    Loving County  TX            82
    (1 row(s) affected)


Stored Procedure

• Stored procedures can also be written in Java, which is a more convenient way to work with complex procedures.

• Example: UpdatePeople.java, a stored procedure that uses SELECT to check for each record before either inserting a new one (INSERT) or updating the existing one (UPDATE).

See the VoltDB example code.
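
UpdatePeople.java itself is not reproduced in these notes. The check-then-insert-or-update logic it is described as performing can be sketched without the VoltDB API as follows; UpsertSketch, its in-memory map, and the key values used in main are hypothetical stand-ins for the people table and its SQL statements.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the SELECT-then-INSERT-or-UPDATE logic; not VoltDB code.
final class UpsertSketch {
    // Population keyed by "stateNum:countyNum", standing in for the people table.
    private final Map<String, Long> population = new HashMap<>();

    // Looks the record up first, then either updates it or inserts a new one.
    // Returns which action was taken.
    String upsert(int stateNum, int countyNum, long newPopulation) {
        String key = stateNum + ":" + countyNum;
        boolean exists = population.containsKey(key); // the SELECT step
        population.put(key, newPopulation);           // the UPDATE or INSERT step
        return exists ? "UPDATE" : "INSERT";
    }

    // Returns the stored population, or null if there is no such record.
    Long lookup(int stateNum, int countyNum) {
        return population.get(stateNum + ":" + countyNum);
    }

    public static void main(String[] args) {
        UpsertSketch people = new UpsertSketch();
        // Illustrative key values only.
        System.out.println(people.upsert(1, 2, 82)); // no record yet
        System.out.println(people.upsert(1, 2, 94)); // record now exists
        System.out.println(people.lookup(1, 2));
    }
}
```

In a real stored procedure, the lookup and the write run inside one transaction, so no other procedure can slip in between the check and the update.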


Stored Procedure

• Compile the Java file against the VoltDB jar files, save a snapshot, recompile the catalog, and update the running database:

    $ javac -cp "/Users/nqt289/voltdb-ent-4.2/voltdb/*" UpdatePeople.java
    $ voltadmin save /Users/nqt289/Desktop/voltdb/voltdbroot/snapshots/ "tutorial5"
    $ voltdb compile --classpath=./ -o towns.jar towns.sql
    $ voltadmin update towns.jar deployment.xml
    INFO: The catalog update succeeded.


Stored Procedure

• Check the two counties with the smallest population:

    1> SELECT TOP 2 county, abbreviation, population
    2> FROM people,states WHERE people.state_num=states.state_num
    3> ORDER BY population ASC;
    COUNTY          ABBREVIATION  POPULATION
    --------------- ------------- -----------
    Loving County   TX            82
    Kalawao County  HI            90
    (2 row(s) affected)


Stored Procedure

• Check the two counties with the smallest population again:

    1> SELECT TOP 2 county, abbreviation, population
    2> FROM people,states WHERE people.state_num=states.state_num
    3> ORDER BY population ASC;
    COUNTY          ABBREVIATION  POPULATION
    --------------- ------------- -----------
    Kalawao County  HI            90
    Loving County   TX            94
    (2 row(s) affected)


Client Applications

• VoltDB provides client libraries in several programming languages (Java, Python, C++, etc.) that all follow the same process:

• Create a client connection to the database:

    org.voltdb.client.Client client = null;
    ClientConfig config = null;
    try {
        config = new ClientConfig("advent","xyzzy");
        client = ClientFactory.createClient(config);
        client.createConnection("myserver.xyz.net", 21211);
    } catch (java.io.IOException e) {
        e.printStackTrace();
        System.exit(-1);
    }


Client Applications

• Interact with the database by calling a procedure and processing the results:

    client.callProcedure(new MyCallback(),
                         "NewCustomer",
                         firstname,
                         lastname,
                         custID);

• Close the connection:

    try {
        client.drain();
        client.close();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
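
The call-with-callback, then drain, then close sequence can be pictured with the standard library alone. The sketch below is not the VoltDB client: AsyncClientSketch, its methods, and the result string are hypothetical, and a single worker thread stands in for the database. It shows why draining matters: the call returns immediately, and the callback only runs once the work completes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical model of an asynchronous client; not the org.voltdb.client API.
final class AsyncClientSketch {
    // One background thread stands in for the remote database.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Queues the call and returns immediately; the callback fires when the result is ready.
    void callProcedure(Consumer<String> callback, String procedure, Object... params) {
        worker.execute(() ->
            callback.accept(procedure + " called with " + params.length + " parameter(s)"));
    }

    // Waits until every queued call has completed, then releases the worker thread.
    void drainAndClose() {
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncClientSketch client = new AsyncClientSketch();
        client.callProcedure(System.out::println, "NewCustomer", "Ada", "Lovelace", 1);
        client.drainAndClose(); // without this, the program could move on before the callback runs
    }
}
```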
