These are the slides from my talk about the AppScale project at the SBonRails meetup. It covers AppScale as well as Google App Engine and the research projects that have come out of it, including Neptune, a Ruby DSL focused on computation-heavy workloads.
The AppScale Project
Presented by Chris Bunch (on behalf of the AppScale team)
March 7, 2011 @ sbonrails meetup
Thursday, March 10, 2011
Overview
• Google App Engine
• AppScale - now with 50% Ruby!
• Research Directions
• Neptune - A Ruby DSL for the cloud
Google App Engine
• A web framework introduced in 2008
• Python and Java supported
• Offers a Platform-as-a-Service: Use Google’s APIs to achieve scale
• Upload your app to Google
Quotas
Data Model
• Not relational - semi-structured schema
• Compare to models in Rails
• Exposes a get / put / delete / query interface
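To make the get / put / delete / query interface concrete, here is a minimal in-memory sketch. `TinyDatastore` and its method signatures are hypothetical stand-ins written for this example; they are not the App Engine Datastore API. It only illustrates that entities carry whatever properties they like (no fixed schema) and that a query filters on property values.

```ruby
# Hypothetical in-memory stand-in for a semi-structured datastore.
# This is NOT the App Engine API; names are assumptions for illustration.
class TinyDatastore
  def initialize
    @entities = {} # key => hash of properties
  end

  # Store (or overwrite) an entity under the given key.
  def put(key, properties)
    @entities[key] = properties.dup
    key
  end

  # Fetch an entity's properties, or nil if the key is unknown.
  def get(key)
    @entities[key]
  end

  # Remove an entity; returns true if something was removed.
  def delete(key)
    !@entities.delete(key).nil?
  end

  # Return the keys of entities whose properties match every filter.
  def query(filters)
    matches = @entities.select do |_key, props|
      filters.all? { |name, value| props[name] == value }
    end
    matches.keys
  end
end

store = TinyDatastore.new
# Entities of the same kind need not share the same set of properties.
store.put("post:1", { kind: "post", author: "chris" })
store.put("post:2", { kind: "post", author: "chris", tags: ["ruby"] })
store.query({ author: "chris" })
```

Unlike a Rails model backed by a relational table, nothing here declares columns up front, which is the sense in which the schema is semi-structured.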
Storing Data
• Datastore API - Persistent storage
• Memcache API - Transient storage
• User can set expiration times
• Blobstore API - Store large files
• need to enable billing to use it
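The difference between transient storage with expiration times and persistent storage can be sketched as a small cache whose entries disappear after a time-to-live. `ExpiringCache`, its `set` / `get` signatures, and the injectable clock are assumptions made for this example, not the Memcache API itself.

```ruby
# Hypothetical sketch of a transient cache with per-key expiration.
# Not the Memcache API; class and method names are illustrative only.
class ExpiringCache
  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @items = {} # key => [value, expires_at or nil]
  end

  # Store a value; with no ttl_seconds the entry never expires.
  def set(key, value, ttl_seconds = nil)
    expires_at = ttl_seconds ? @clock.call + ttl_seconds : nil
    @items[key] = [value, expires_at]
    true
  end

  # Return the value, or nil if the key is missing or has expired.
  def get(key)
    entry = @items[key]
    return nil unless entry

    value, expires_at = entry
    if expires_at && @clock.call >= expires_at
      @items.delete(key) # lazily evict expired entries on read
      return nil
    end
    value
  end
end

cache = ExpiringCache.new
cache.set("greeting", "hello", 60) # expires 60 seconds from now
cache.get("greeting")
```

Callers have to treat a `nil` result as normal and fall back to the persistent store, since a transient cache may drop data at any time.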
Be Social!
• Mail API - Send and receive e-mail
• XMPP API - Send and receive IMs
• Channel API - Creating persistent connections via XMPP
• Use for chat rooms, games, etc.
Background Tasks
• Cron API - Access a URL periodically
• Descriptive language: “every 5 minutes”, “every 1st Sun of Jan, Mar, Dec”, etc.
• Uses a separate cron.yaml file
• Taskqueue API - Within your app, fire off tasks to be done later
Dealing with Users
• Users API: Uses Google Accounts
• Don’t write that ‘forgot password’ page ever again!
• Authorization: via app.yaml:
• anyone, must login, or admin only
When Services Fail
• Originally: failures throw exceptions
• Just catch them all!
• Capabilities API: Check if a service is available
• Datastore, Memcache, and so on
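The two failure-handling styles above can be contrasted in a short sketch. `FlakyDatastore`, `ServiceUnavailable`, and the `available` flag are hypothetical names invented for this example; the real Capabilities API has its own interface, which is not reproduced here.

```ruby
# Hypothetical service that can be switched off, to contrast two styles
# of failure handling. None of these names come from App Engine.
class ServiceUnavailable < StandardError; end

class FlakyDatastore
  attr_accessor :available

  def initialize
    @available = true
    @data = {}
  end

  def put(key, value)
    raise ServiceUnavailable, "datastore is down" unless @available

    @data[key] = value
  end
end

# Style 1: just try the call and catch the exception.
def save_with_rescue(store, key, value)
  store.put(key, value)
  :saved
rescue ServiceUnavailable
  :deferred
end

# Style 2: ask whether the service is available before calling it.
# The rescue stays, because the service can still fail between the
# check and the call.
def save_with_check(store, key, value)
  return :deferred unless store.available

  store.put(key, value)
  :saved
rescue ServiceUnavailable
  :deferred
end

datastore = FlakyDatastore.new
datastore.available = false
save_with_check(datastore, "post:1", "hello")
```

Checking first lets an app degrade gracefully (for example, by showing a read-only page) instead of discovering the outage halfway through a request.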
Deploying Your App
• Develop locally on SDK
• Stub implementations of most APIs
• Then deploy to Google
How to Scale
• Limitations on the programming model:
• No filesystem interaction
• 30 second limit per web request
• Language libraries must be on whitelist
• Sandboxed execution
Enter AppScale
• App Engine is easy to use
• but we really want to tinker with the internals!
• Need an open platform to experiment on
• test API implementations
• add new APIs
Enter AppScale
• Lots of NoSQL DBs out there
• Hard to compare DBs
• Configuration and deployment can be complex
• Need one-button deployment
Storing Data
• Datastore API - AppServers use a database-agnostic layer that sends requests to PBServer
• Named for data format: Protocol Buffers
• Memcache API - memcached
• Blobstore API - Custom server
Be Social!
• Mail API - sendmail (disabled by default)
• XMPP API - ejabberd
• Channel API - strophejs
Background Tasks
• Cron API - Uses Vixie Cron
• Taskqueue - Separate thread fetches web page
• Both make a single attempt
• Will replace with distributed, fault-tolerant versions
Dealing with Users
• Users API: Defers users to AppLoadBalancer
• Password reset via command-line tools
• Authorization: no major changes here
Deploying Your App
• Develop locally on SDK
• Stub implementations of most APIs
• Then deploy to AppScale!
• Use your own cluster or via Amazon
• Command-line tools mirror Amazon’s
Deploying Your App
• run-instances: Start AppScale
• describe-instances: View cloud metadata
• upload-app: Deploy an App Engine app
• remove-app: Un-deploy an App Engine app
• terminate-instances: Stop AppScale
Deployment Models
• Cloud deployment: Amazon EC2 or Eucalyptus (the open source implementation of the EC2 APIs)
• Just specify how many machines you need
• Non-cloud deployment via Xen or KVM
AppController
• The brains of the outfit
• Runs on every node
• Handles configuration and deployment of all services (including other AppControllers)
• Written in Ruby
Load balancer
• Routes users to their app via nginx
• haproxy makes sure app servers are live
• Can’t assume the user has DNS:
• Thus we wrote the AppLoadBalancer
• Rails app that routes users to apps
• Performs authentication as well
AppLoadBalancer
App Server
• We modified the App Engine SDK
• Easier for Python (source included)
• Harder for Java (had to decompile)
• Removed non-scalable API implementations
• Goal: Use open source whenever possible
A Common Feature Request
Database Options
• Open source / open APIs / proprietary
• Master / slave vs. peer-to-peer
• Differences in query languages
• Data model (key/val, semi-structured)
• In-memory or persistent
• Data consistency model
• Interfaces - REST / Thrift / libraries
In AppScale:
• BigTable clones:
• Master / slave relationship
• Master stores metadata
• Slaves store data
• Fault-tolerant to slave failure
• Partially tolerant to master failure
In AppScale:
• Variably consistent DBs
• Voldemort and Cassandra
• Both are peer-to-peer: no SPOF
• Voldemort: Specify consistency per table
• Cassandra: Specify consistency per request
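One way to reason about "variable" consistency is the quorum overlap rule used by Dynamo-style stores: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below assumes that replica model; the function names are made up for this example and are not part of Voldemort's or Cassandra's client APIs.

```ruby
# Sketch of the quorum overlap rule for Dynamo-style replication.
# Function names are illustrative assumptions, not a real client API.

# Smallest number of replicas that forms a majority.
def majority(replicas)
  replicas / 2 + 1
end

# True when every read set must intersect every acknowledged write set,
# so a read contacts at least one replica holding the latest write.
def reads_see_latest_write?(replicas:, read_acks:, write_acks:)
  read_acks + write_acks > replicas
end

n = 3
reads_see_latest_write?(replicas: n, read_acks: majority(n), write_acks: majority(n)) # => true
reads_see_latest_write?(replicas: n, read_acks: 1, write_acks: 1)                     # => false
```

Choosing the level per table or per request is then a trade-off: fewer acknowledgements mean lower latency, at the cost of possibly reading stale data.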
In AppScale:
• Relational:
• Not NoSQL but used like NoSQL
• Document-oriented:
• Targets append-heavy workloads
In AppScale:
• Key-value datastores:
• MemcacheDB: like memcached but persistent and replicated
• Scalaris: in-memory, no persistence
• SimpleDB: semi-structured but used as key-value (will update this in the future)
Research Ideas
• Placement support
• Monitoring
• Shared memory
• Cost modeling
• Hybrid cloud
• Active Cloud DB
• Disaster Recovery
• Neptune
Placement Support
Monitr
Shared memory
• Since AppServer + DB are co-located, reduce message overhead
• no serialization
• Leverage CoLoRs to do so across languages
• AppServer (AS) is in Python or Java; DB server (DBS) is in Python
• Can be orders-of-magnitude faster
Cost modeling
• Can we reproduce Google’s cost model?
• We can reproduce memory, network bandwidth in / out, size and types of data
• Can’t reproduce CPU - it’s based on Google’s load, which we can’t capture
• varies based on placement and time of day
Hybrid Cloud
Database Agnostic Transactions
• Want to support disparate DBs with ACID
• Leverage ZooKeeper for versioning
• And PBServer as the DB agnostic layer
• Needs strong consistency from DB itself
• And row-level atomicity on updates
Active Cloud DB
• Need a common interface to DBs
• But not just for Java / Python
• Named after Rails’ ActiveRecord
• Exposes REST interface for DB
• Included in AppScale 1.3
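To illustrate what "exposes a REST interface for the DB" can mean, the sketch below maps HTTP verbs onto key-value operations. The `/resources/<key>` path, the status codes chosen, and the `RestFacadeSketch` class are all assumptions for this example; the actual Active Cloud DB routes are not described in these slides.

```ruby
# Hypothetical mapping from HTTP verbs to key-value operations.
# The URL layout and class name are assumptions, not Active Cloud DB's
# real interface.
class RestFacadeSketch
  def initialize
    @data = {}
  end

  # Returns [status_code, body] for a verb, path, and optional body.
  def handle(verb, path, body = nil)
    key = path.sub(%r{\A/resources/}, "")
    case verb
    when "GET"
      @data.key?(key) ? [200, @data[key]] : [404, nil]
    when "PUT"
      @data[key] = body
      [200, body]
    when "DELETE"
      if @data.key?(key)
        @data.delete(key)
        [200, nil]
      else
        [404, nil]
      end
    else
      [405, nil] # verb not supported by this sketch
    end
  end
end

api = RestFacadeSketch.new
api.handle("PUT", "/resources/greeting", "hello")
api.handle("GET", "/resources/greeting")
```

Because the interface is plain HTTP, any language with an HTTP client can use the datastore, which is the point of not limiting access to Java and Python.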
Disaster Recovery
• People are using App Engine as a production level environment
• Need a way to automatically back up data
• Can leverage this data for data analytics
• Need to also seamlessly switch to AppScale version if App Engine version goes down
Neptune
• Need a simple way to run compute-intensive jobs
• We have the code from the ‘net
• We have the resources - the cloud
• But the average user does not have the know-how
• Our solution: create a domain-specific language for configuring cloud apps
• Based on Ruby
Syntax
• It’s as easy as:
neptune :type => "mpi",
        :code => "MpiNQueens",
        :nodes_to_use => 8,
        :output => "/mpi/output-1.txt"
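The syntax works because a Neptune job is ordinary Ruby: a method call without parentheses whose arguments are collected into a single hash. The sketch below shows that mechanism with a stand-in method, `neptune_sketch`, written only for this example. It is not the neptune gem's implementation, and the check for a `:type` key is an assumption.

```ruby
# Stand-in for illustration only; NOT the neptune gem's implementation.
# It shows that the DSL line is a method call taking one hash argument.
def neptune_sketch(params)
  # Assumed validation: a job must at least say what kind of job it is.
  raise ArgumentError, "missing :type" unless params.key?(:type)

  params
end

# Same shape as the slide: no parentheses, no braces around the hash.
job = neptune_sketch :type => "mpi",
                     :code => "MpiNQueens",
                     :nodes_to_use => 8,
                     :output => "/mpi/output-1.txt"

job[:nodes_to_use] # => 8
```

Since the job description is just a hash, it can be built up, stored in variables, or generated in a loop like any other Ruby value.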
Neptune Supports:
• Message Passing Interface (MPI)
• MapReduce
• Unified Parallel C (UPC)
• X10
• Erlang
Extensibility
• Experts can add support for other computational jobs
• Biochemists can run simulations via DFSP and dwSSA
• Embarrassingly parallel Monte Carlo simulations
Compiling Code
• You may not have the binaries, so compile from source!
• Auto-generates makefiles for beginners
neptune :type => "compile",
        :code => "/home/appscale/mpi_nqueens"
Installing Neptune
• Just use good old ‘gem’:
• gem install neptune
• Current version is 0.0.4, fully compatible with AppScale 1.5
• More info at our web page:
• http://neptune-lang.org
Wrapping It Up
• Thanks to the AppScale team, especially:
• Co-lead Navraj Chohan and advisor Professor Chandra Krintz
• Check us out on the web:
• http://appscale.cs.ucsb.edu
• http://code.google.com/p/appscale