Active Cloud DB: A RESTful Software-as-a-Service for Language Agnostic Access to Distributed Datastores

Chris Bunch, Jonathan Kupferman, Chandra Krintz

Wednesday, October 27, 2010

CloudComp 2010

Active Cloud DB at CloudComp '10




Who’s Using NoSQL?


and many others!


Do It Yourself!

• Pick a datastore

• Learn how the interfaces SHOULD work

• Learn how the interfaces REALLY work

• Migrate to a non-relational data model

• Each of these is non-trivial!


Trouble in Paradise


(at least they’re honest about it)


The Problem

• No way to compare databases with real applications

• No standard for what a real test is

• Too many variables in the equation

• Topology, query language, data model, APIs, consistency settings (to name a few)


You Need A Better Way

• Need a platform to:

• Easily evaluate datastores

• Quickly evaluate datastores

• Evaluate datastores on similar metrics


Our Contribution

• Active Cloud DB: A Google App Engine app that exposes the DB via REST

• Exposes a string key/value datastore

• Speeds up repeated operations via caching

• Works on Google or AppScale

• Free access to BigTable


Realistically Speaking

• One test takes about 2 hours

• In one day at work you could generate a graph comparing:

• HBase

• Cassandra

• Google BigTable

• Amazon SimpleDB


RESTful Interface

• GET /resources/key ➜ get

• POST /resources/key (with value) ➜ put

• DELETE /resources/key ➜ delete

• GET /resources ➜ query (get all)
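
The mapping above can be sketched as a small dispatcher. This is an illustrative Ruby sketch, not Active Cloud DB's actual handler code: the `ResourceRouter` class, its `route` method, and the in-memory hash standing in for the datastore are all hypothetical names introduced here.

```ruby
# Hypothetical dispatcher: maps an HTTP method and path onto the four
# operations listed above. A plain Hash stands in for the real datastore.
class ResourceRouter
  def initialize
    @store = {}
  end

  # Returns the operation's result, or :not_found for any other route.
  def route(http_method, path, value = nil)
    prefix, key = path.sub(%r{\A/}, "").split("/", 2)
    return :not_found unless prefix == "resources"

    case [http_method, key.nil?]
    when ["GET", false]    then @store[key]           # get
    when ["POST", false]   then @store[key] = value   # put
    when ["DELETE", false] then @store.delete(key)    # delete
    when ["GET", true]     then @store.dup            # query (get all)
    else :not_found
    end
  end
end

router = ResourceRouter.new
router.route("POST", "/resources/isbn-1", "Dune")
router.route("GET", "/resources/isbn-1")    # returns "Dune"
router.route("DELETE", "/resources/isbn-1")
```

Because every operation is just a method plus a path, any language with an HTTP client can drive the datastore without a datastore-specific library.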


Caching Support

• Leverages Memcache API / memcached

• Provides a Least-Recently-Used Cache

• Write-through caching strategy - all puts / deletes are written to the cache

• Generational caching strategy - queries use a generation number
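
One way to picture the two strategies together is the sketch below. It is a simplified illustration under stated assumptions, not the Active Cloud DB source: `CachedStore`, the `item:` and `query:` cache key prefixes, and the `db_reads` counter are hypothetical, two plain hashes stand in for the datastore and memcached, and LRU eviction is left to the real cache rather than modeled.

```ruby
# Sketch of write-through plus generational caching over two plain hashes.
class CachedStore
  attr_reader :db_reads

  def initialize
    @db = {}          # stand-in for the datastore
    @cache = {}       # stand-in for memcached
    @generation = 0   # bumped on every write
    @db_reads = 0     # counts how often a read falls through to the datastore
  end

  # Write-through: update the datastore and the cache together, then bump
  # the generation so older cached query results are never looked up again.
  def put(key, value)
    @db[key] = value
    @cache["item:#{key}"] = value
    @generation += 1
    value
  end

  def delete(key)
    @cache.delete("item:#{key}")
    @generation += 1
    @db.delete(key)
  end

  def get(key)
    @cache.fetch("item:#{key}") do
      @db_reads += 1
      value = @db[key]
      @cache["item:#{key}"] = value unless value.nil?
      value
    end
  end

  # Generational caching: the query result is cached under a key that
  # embeds the current generation number.
  def query
    @cache.fetch("query:#{@generation}") do
      @db_reads += 1
      @cache["query:#{@generation}"] = @db.dup
    end
  end
end
```

In this sketch stale query results are never explicitly invalidated. They simply stop being requested once the generation moves on, and a real LRU cache would eventually evict them.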


Bookstore App

• Four prototypes available that use Active Cloud DB:

• Ruby on Rails

• Ruby (through Sinatra)

• Python (via Django)

• Python (through web.py)


The Actual Code

• With BigTable:

• val = `curl -X GET http://your-app.appspot.com/resources/#{key}`

• Or in AppScale:

• val = `curl -X GET http://128.111.55.223:8080/resources/#{key}`
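
The same calls can be made without shelling out to curl. The sketch below uses Ruby's standard `net/http` library. The helper names `build_request` and `send_request` are hypothetical, and the form field name `value` used for puts is an assumption, since the slides only say the POST carries a value. Keys are percent-encoded so that spaces or slashes in a key do not change the path.

```ruby
require "net/http"
require "uri"
require "erb"

# Build (but do not send) the request for a get, put, or delete on one key.
# `base_url` is wherever the app runs: App Engine or an AppScale node.
def build_request(base_url, operation, key, value = nil)
  uri = URI.join(base_url, "/resources/#{ERB::Util.url_encode(key)}")
  case operation
  when :get    then Net::HTTP::Get.new(uri)
  when :delete then Net::HTTP::Delete.new(uri)
  when :put
    request = Net::HTTP::Post.new(uri)
    request.set_form_data("value" => value) # field name is an assumption
    request
  else
    raise ArgumentError, "unknown operation: #{operation}"
  end
end

# Send a previously built request and return the response body.
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request).body
  end
end

# Only the base URL differs between the two deployments:
on_google   = build_request("http://your-app.appspot.com", :get, "my key")
on_appscale = build_request("http://128.111.55.223:8080", :get, "my key")
```

Calling `send_request(on_appscale)` would perform the actual GET. It is left out here so the sketch runs without a deployed instance.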


AppScale

• Originally presented at CloudComp 2009

• An open-source implementation of the Google App Engine APIs

• Automatically configures and deploys cloud infrastructures to run your application

• includes database deployment


• Supported Datastores as of AppScale 1.4:

• HBase, Hypertable

• MySQL

• Cassandra, Voldemort, Scalaris

• MongoDB

• MemcacheDB

• Amazon SimpleDB


Not Good Enough

• AppScale / GAE solve the problem for Python and Java

• But only with certain APIs

• And with certain restrictions

• Need something general purpose

• All languages, no restrictions


But how do we test it?

• Cassandra 0.5.0 / MemcacheDB 1.2.1β

• Place 1000 items in the database and time:

• Get, put, query, delete operations

• Nine accessor threads

• Standard deployment model
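
A batch test of this shape could be driven by a harness like the one below. It is a sketch under assumptions, not the authors' test script: `FakeStore` is a hypothetical thread-safe in-memory stand-in for REST calls to Active Cloud DB, `timed` and `run_batch` are illustrative names, and timing the query as a single call is a guess at how that operation was measured.

```ruby
require "benchmark"

# Hypothetical thread-safe stand-in for a datastore client.
class FakeStore
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  def put(key, value)
    @lock.synchronize { @data[key] = value }
  end

  def get(key)
    @lock.synchronize { @data[key] }
  end

  def delete(key)
    @lock.synchronize { @data.delete(key) }
  end

  def query
    @lock.synchronize { @data.dup }
  end

  def size
    @lock.synchronize { @data.size }
  end
end

# Split `keys` across `threads` workers and return the wall-clock seconds
# taken for all workers to finish applying the block to their share.
def timed(keys, threads, &operation)
  slice_size = [(keys.size.to_f / threads).ceil, 1].max
  Benchmark.realtime do
    keys.each_slice(slice_size)
        .map { |slice| Thread.new { slice.each(&operation) } }
        .each(&:join)
  end
end

# Time put, get, query, and delete for `items` keys using `threads` accessors.
def run_batch(store, items: 1000, threads: 9)
  keys = (1..items).map { |i| "key#{i}" }
  {
    put:    timed(keys, threads) { |k| store.put(k, "value-#{k}") },
    get:    timed(keys, threads) { |k| store.get(k) },
    query:  timed([nil], 1) { store.query },
    delete: timed(keys, threads) { |k| store.delete(k) }
  }
end

run_batch(FakeStore.new).each do |op, seconds|
  puts format("%-6s %.4f s", op, seconds)
end
```

Swapping `FakeStore` for a client that issues the REST calls would leave the harness itself unchanged, so the same test could be pointed at each datastore in turn.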


A different type of test

• Workload model

• 10000 random operations selected

• 50/30/20 get/put/query ratio

• Constrained to 16 nodes

• Performed on initially empty database
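
The operation mix above can be generated with a weighted random draw. This is a minimal sketch: `generate_workload` is a hypothetical helper, the seed exists only to make the sketch reproducible, and drawing each operation independently (so the mix is approximately rather than exactly 50/30/20) is an assumption about how the operations were selected.

```ruby
# Draw `count` operations with a 50/30/20 get/put/query mix.
def generate_workload(count = 10_000, seed: 42)
  rng = Random.new(seed)
  Array.new(count) do
    roll = rng.rand(100) # uniform integer in 0..99
    if roll < 50
      :get
    elsif roll < 80
      :put
    else
      :query
    end
  end
end

workload = generate_workload
puts workload.tally # counts per operation, roughly 5000/3000/2000
```

Since the database starts empty, gets issued early in such a run will mostly miss, which is worth keeping in mind when reading per-operation timings.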


Future Work

• Performance impact of:

• Cache size

• Millions of items in DB

• Overhead of Active Cloud DB

• Transaction support


Related Work

• BigTable as a Web Service

• Not open source, HBase-like API

• Yahoo! Cloud Serving Benchmark [SOCC10]

• Doesn’t run applications

• No automation - you set up the DB, you set up the schemas, etc.


Active Cloud DB is Open for Business

• Open source - free to use

• Customize your own batch test or workload test

• Access it via any programming language

• Bookstore applications included


Thanks!

• Download Active Cloud DB and AppScale:

• http://appscale.cs.ucsb.edu

• To my advisor, Chandra Krintz

• To the AppScale team, especially co-lead Navraj Chohan
