Active Cloud DB at CloudComp '10

Active Cloud DB: A RESTful Software-as-a-Service for Language Agnostic Access to Distributed Datastores

Chris Bunch, Jonathan Kupferman, Chandra Krintz

Wednesday, October 27, 2010

CloudComp 2010

1

Who’s Using NoSQL?

2

and many others!

Do It Yourself!

• Pick a datastore

• Learn how the interfaces SHOULD work

• Learn how the interfaces REALLY work

• Migrate to a non-relational data model

• Each of these is non-trivial!

3

Trouble in Paradise

4

(at least they’re honest about it)

The Problem

• No way to compare databases with real applications

• No standard for what counts as a real test

• Too many variables in the equation

• Topology, query language, data model, APIs, consistency settings (to name a few)

5

You Need A Better Way

• Need a platform to:

• Easily evaluate datastores

• Quickly evaluate datastores

• Evaluate datastores on comparable metrics

6

Our Contribution

• Active Cloud DB: a Google App Engine app that exposes the datastore via REST

• Exposes a string key/value datastore

• Speeds up repeated operations via caching

• Runs on Google App Engine or AppScale

• Free access to BigTable

7

8

Realistically Speaking

• One test takes ~2 hours

• In one workday, you could generate a graph comparing:

• HBase

• Cassandra

• Google BigTable

• Amazon SimpleDB

9

RESTful Interface

• GET /resources/key ➜ get

• POST /resources/key (with value) ➜ put

• DELETE /resources/key ➜ delete

• GET /resources ➜ query (get all)
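To make the verb-to-operation mapping above concrete, here is a tiny in-memory model of the four routes. It is a hypothetical illustration, not Active Cloud DB's own code: the `ResourceRouter` class, the Hash standing in for the datastore, the status codes, and the assumption that POST carries the value in a form field named `value` are all invented for this sketch.

```ruby
require "uri"

# Hypothetical in-memory model of the four REST routes; a Ruby Hash stands in
# for the string key/value datastore. Not Active Cloud DB's own code.
class ResourceRouter
  PREFIX = "/resources"

  def initialize
    @store = {}
  end

  # Returns [status, body]. For POST, form_body is assumed to be URL-encoded
  # form data carrying the value in a field named "value" (an assumption).
  # Keys are taken verbatim from the path (no percent-decoding, for brevity).
  def dispatch(verb, path, form_body = nil)
    return [404, "not found"] unless path == PREFIX || path.start_with?("#{PREFIX}/")

    key = path.delete_prefix(PREFIX).delete_prefix("/")

    case [verb, key.empty?]
    when ["GET", true]     # GET /resources -> query (get all)
      [200, @store.dup]
    when ["GET", false]    # GET /resources/key -> get
      @store.key?(key) ? [200, @store[key]] : [404, nil]
    when ["POST", false]   # POST /resources/key (with value) -> put
      value = URI.decode_www_form(form_body.to_s).to_h["value"]
      return [400, "missing value"] if value.nil?

      @store[key] = value
      [200, value]
    when ["DELETE", false] # DELETE /resources/key -> delete
      return [404, nil] unless @store.key?(key)

      @store.delete(key)
      [200, key]
    else
      [405, "method not allowed"]
    end
  end
end
```

Because every route is just a verb plus a path, any language with an HTTP client can drive the same four operations.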

10

Caching Support

• Leverages Memcache API / memcached

• Provides a least-recently-used (LRU) cache

• Write-through caching strategy: every put and delete is also applied to the cache

• Generational caching strategy: cached query results are tagged with a generation number
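A minimal sketch of how write-through and generational caching can fit together, assuming a Hash-backed LRU in place of memcached and a second Hash in place of the datastore. The class and method names are hypothetical, and bumping the generation on every put/delete is an assumed policy: the slide only says that queries use a generation number.

```ruby
# Hypothetical LRU cache standing in for memcached. Ruby Hashes keep
# insertion order, so the first entry is always the least recently used.
class LRUCache
  def initialize(capacity)
    @capacity = capacity
    @entries = {}
  end

  def get(key)
    return nil unless @entries.key?(key)

    @entries[key] = @entries.delete(key) # re-insert as most recently used
  end

  def set(key, value)
    @entries.delete(key)
    @entries[key] = value
    @entries.shift if @entries.size > @capacity # evict the LRU entry
  end

  def delete(key)
    @entries.delete(key)
  end
end

# Hypothetical datastore wrapper; a plain Hash stands in for the datastore.
class CachedStore
  attr_reader :db_reads # datastore reads, counted for illustration

  def initialize(cache_capacity: 100)
    @db = {}
    @cache = LRUCache.new(cache_capacity)
    @generation = 0
    @db_reads = 0
  end

  def get(key)
    cached = @cache.get("item:#{key}")
    return cached unless cached.nil?

    @db_reads += 1
    value = @db[key]
    @cache.set("item:#{key}", value) unless value.nil?
    value
  end

  # Write-through: every put/delete updates both the datastore and the cache.
  # Bumping the generation (assumed policy) retires all cached query results.
  def put(key, value)
    @db[key] = value
    @cache.set("item:#{key}", value)
    @generation += 1
  end

  def delete(key)
    @db.delete(key)
    @cache.delete("item:#{key}")
    @generation += 1
  end

  # Generational caching: results are stored under a key that embeds the
  # current generation, so older entries are never read again and simply
  # age out of the LRU.
  def query
    cache_key = "query:#{@generation}"
    cached = @cache.get(cache_key)
    return cached unless cached.nil?

    @db_reads += 1
    result = @db.dup.freeze
    @cache.set(cache_key, result)
    result
  end
end
```

The appeal of the generation number is that invalidating every cached query costs a single increment rather than a scan of the cache.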

11

Bookstore App

• Four prototypes that use Active Cloud DB are available:

• Ruby on Rails

• Ruby (through Sinatra)

• Python (via Django)

• Python (through web.py)

12

13

The Actual Code

• With BigTable:

• val = `curl -X GET http://your-app.appspot.com/resources/#{key}`

• Or in AppScale:

• val = `curl -X GET http://128.111.55.223:8080/resources/#{key}`
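The one-liners above interpolate `key` straight into a shell command and a URL. A sketch of the same GET with Ruby's standard `Net::HTTP` avoids the shell and percent-encodes the key; the `resource_uri` and `fetch_value` helpers are hypothetical names, the base URLs are the ones shown above, and treating any non-200 response as "no value" is an assumption.

```ruby
require "erb"
require "net/http"
require "uri"

# Hypothetical helper: builds the resource URI for a key, percent-encoding
# it so spaces or slashes cannot break the path.
def resource_uri(base_url, key)
  URI("#{base_url.chomp('/')}/resources/#{ERB::Util.url_encode(key)}")
end

# Hypothetical helper: GET /resources/key. Returns the body for a 200
# response and nil for anything else (exact status codes are assumed).
def fetch_value(base_url, key)
  response = Net::HTTP.get_response(resource_uri(base_url, key))
  response.is_a?(Net::HTTPOK) ? response.body : nil
end

# Usage (performs network I/O, so not run here):
#   val = fetch_value("http://your-app.appspot.com", key)  # Google App Engine
#   val = fetch_value("http://128.111.55.223:8080", key)   # AppScale
```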

14

• Originally presented at CloudComp 2009

• An open-source implementation of the Google App Engine APIs

• Automatically configures and deploys cloud infrastructures to run your application

• Includes database deployment

15

• Supported Datastores as of AppScale 1.4:

• HBase, Hypertable

• MySQL

• Cassandra, Voldemort, Scalaris

• MongoDB

• MemcacheDB

• Amazon SimpleDB

16

17

Not Good Enough

• AppScale / GAE solve the problem for Python and Java

• But only with certain APIs

• And with certain restrictions

• Need something general purpose

• All languages, no restrictions

18

But how do we test it?

• Cassandra 0.5.0 / MemcacheDB 1.2.1β

• Place 1000 items in the database and time:

• Get, put, query, delete operations

• Nine accessor threads

• Standard deployment model
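A rough sketch of the batch test's shape, assuming a thread-safe in-memory store in place of a real datastore. The `MemoryStore` class, the `timed_phase` and `run_batch` helpers, and the round-robin split of keys across threads are hypothetical. Only the item count (1000), the four operations, and the nine accessor threads come from the slide; timing the initial placement as the put phase and issuing one query per thread are assumptions.

```ruby
# Hypothetical thread-safe stand-in for the datastore under test.
class MemoryStore
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  def put(key, value)
    @lock.synchronize { @data[key] = value }
  end

  def get(key)
    @lock.synchronize { @data[key] }
  end

  def delete(key)
    @lock.synchronize { @data.delete(key) }
  end

  def query
    @lock.synchronize { @data.dup }
  end
end

# Runs `work` once per slice, each on its own thread, and returns the
# wall-clock seconds for the whole phase.
def timed_phase(slices, &work)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  slices.map { |slice| Thread.new { work.call(slice) } }.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Places `items` keys using `threads` accessor threads, then times get,
# query (one per thread), and delete over the same keys.
def run_batch(store, items: 1000, threads: 9)
  slices = Array.new(threads) { [] }
  items.times { |i| slices[i % threads] << "key#{i}" } # round-robin split

  {
    put: timed_phase(slices) { |s| s.each { |k| store.put(k, "value-#{k}") } },
    get: timed_phase(slices) { |s| s.each { |k| store.get(k) } },
    query: timed_phase(slices) { |_s| store.query },
    delete: timed_phase(slices) { |s| s.each { |k| store.delete(k) } }
  }
end
```

To time a real deployment, one would pass an object whose `get`/`put`/`delete`/`query` methods issue the HTTP requests from the RESTful interface slide.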

19

20

21

22

A different type of test

• Workload model

• 10,000 randomly selected operations

• 50/30/20 get/put/query ratio

• Constrained to 16 nodes

• Performed on an initially empty database
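The workload mix can be sketched as a seeded random generator, assuming each operation is drawn independently with the stated 50/30/20 probabilities and keys come from a fixed key space. `generate_workload`, `replay`, and the `key_space` parameter are illustrative names, not part of Active Cloud DB.

```ruby
# Cumulative thresholds for the 50/30/20 get/put/query mix.
MIX = [[:get, 0.5], [:put, 0.8], [:query, 1.0]].freeze

# Returns `count` operations as [op, key] pairs (key is nil for queries).
# A fixed seed makes the same workload reproducible across datastores.
def generate_workload(count: 10_000, key_space: 1_000, seed: 42)
  rng = Random.new(seed)
  Array.new(count) do
    r = rng.rand # uniform in [0, 1)
    op = MIX.find { |_, threshold| r < threshold }.first
    key = op == :query ? nil : "key#{rng.rand(key_space)}"
    [op, key]
  end
end

# Replays the workload against any object with get/put/query. Starting from
# an empty store, early gets simply miss until puts populate the keys.
def replay(store, ops)
  ops.each do |op, key|
    case op
    when :get then store.get(key)
    when :put then store.put(key, "v")
    when :query then store.query
    end
  end
end
```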

23

24

25

26

Future Work

• Performance impact of:

• Cache size

• Millions of items in DB

• Overhead of Active Cloud DB

• Transaction support

27

Related Work

• BigTable as a Web Service

• Not open source, HBase-like API

• Yahoo Cloud Serving Benchmark [SOCC10]

• Doesn’t run applications

• No automation: you set up the DB, the schemas, etc.

28

Active Cloud DB is Open for Business

• Open source - free to use

• Customize your own batch test or workload test

• Access it via any programming language

• Bookstore applications included

29

Thanks!

• Download Active Cloud DB and AppScale:

• http://appscale.cs.ucsb.edu

• To my advisor, Chandra Krintz

• To the AppScale team, especially co-lead Navraj Chohan

30
