26
An Evaluation of Distributed Datastores Using The AppScale Cloud Platform Presented By- Himanshu Ranjan Vaishnav TE-42065 (Comp-I) SEMINAR GUIDE - Prof. Mrs S. S. Sonawani 1 05/15/22

An evaluation of distributed datastores using AppScale Cloud Platform

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: An evaluation of distributed datastores using AppScale Cloud Platform

An Evaluation of Distributed Datastores UsingThe AppScale Cloud Platform

Presented By- Himanshu Ranjan Vaishnav

TE-42065 (Comp-I)

SEMINAR GUIDE - Prof. Mrs S. S. Sonawani

1

04/09/23

Page 2: An evaluation of distributed datastores using AppScale Cloud Platform

What is AppScale?

AppScale is an open-source implementation of the Google App Engine

cloud platform.

AppScale is an extension of the non-scalable software development kit

that Google makes available for testing and debugging applications.

App-Scale currently supports HBase, Hypertable, Cassandra, Voldemort,

MongoDB, MemcacheDB, Scalaris, and MySQL Cluster datastores.

2

04/09/23

Page 3: An evaluation of distributed datastores using AppScale Cloud Platform

What AppScale Does?

AppScale is a robust, open source implementation of the Google App

Engine APIs that executes over private virtualized cluster resources and

cloud infrastructures including Amazon Web Services and Eucalyptus.

Users can execute their existing Google App Engine applications over

AppScale without modification.

AppScale automates deployment and simplifies configuration of datastores

that implement the API and facilitates their comparison and evaluation on

end-to-end performance using real programs (Google App Engine

applications).

3

04/09/23

Page 4: An evaluation of distributed datastores using AppScale Cloud Platform

AppScale Features

• More Choices of data Stores

• Neptune Language

• MapReduce

• Fault Tolerance

And More

• App Engine Portability

4

04/09/23

Page 5: An evaluation of distributed datastores using AppScale Cloud Platform

Google App Engine

A software development platform

Platform-as-a-service (PaaS)

GAE Datastore

Big Table

A master/slave relationship

5

04/09/23

Page 6: An evaluation of distributed datastores using AppScale Cloud Platform

Continue….

GAE Datastore API provides the following primitives:

For eg.

• Put (k, v): Add key k and value v to table; creating a table if needed

• Get (k): Return value associated with key k

• Delete (k): Remove key k and its value

• Query (q): Perform query q using the Google Query Language (GQL) on a single table, returning a list of values

• Count (t): For a given query, returns the size of the list of values returned

6

04/09/23

Page 7: An evaluation of distributed datastores using AppScale Cloud Platform

Google App Engine APIs

Blobstore API

Channel API

Datastore API

Images API

Memcache API

Namespace API

Task Queue API

Users API

URL Fetch API

XMPP API

MapReduce Streaming API

EC2 API

7

04/09/23

Page 8: An evaluation of distributed datastores using AppScale Cloud Platform

AppScale deployment

AS – App Server

ALB – App Load Balancer

DBS – Data Base Slave Peer

DBM – Data Base Master Peer

8

04/09/23

Page 9: An evaluation of distributed datastores using AppScale Cloud Platform

Multi-tiered approach within AppScale

9

04/09/23

Page 10: An evaluation of distributed datastores using AppScale Cloud Platform

Database Services

Protocol Buffer Server (PBServer)

User/App Server (UAServer)

Blobstore service

Monitoring Services

Neptune

10

04/09/23

Page 11: An evaluation of distributed datastores using AppScale Cloud Platform

APPSCALE DISTRIBUTED DATABASE SUPPORT

Cassandra

HBase

Hypertable

MemcacheDB

MongoDB

Voldemort

MySQL

11

04/09/23

Page 12: An evaluation of distributed datastores using AppScale Cloud Platform

1. Cassandra

Facebook engineers designed, implemented, and released

A hybrid approach

Consistent

Written in the Java and exposes its API through the Thrift software

framework

Supports range queries

12

04/09/23

Page 13: An evaluation of distributed datastores using AppScale Cloud Platform

2. HBase

Developed and released by PowerSet

An official Hadoop subproject

Employs a master-slave distributed architecture

Provides flexible column support

Written primarily in Java, with a small portion of the code base in C

HBase is deployed over the Hadoop Distributed File System (HDFS)

13

04/09/23

Page 14: An evaluation of distributed datastores using AppScale Cloud Platform

3. Hypertable

Hypertable was developed by Zvents

Provide an open source version of Google’s BigTable

Written in C++

RangeServer

14

04/09/23

Page 15: An evaluation of distributed datastores using AppScale Cloud Platform

4. MemcacheDB

Developed by Open source developer Steve Chu

Employs a master-slave approach

Runs with a single master node and multiple replica nodes

Written in C and uses Berkeley DB

15

04/09/23

Page 16: An evaluation of distributed datastores using AppScale Cloud Platform

5. MongoDB

Developed and released by 10gen

Provide both the speed and scalability

Written in C++

Queries are performed using hashtable

16

04/09/23

Page 17: An evaluation of distributed datastores using AppScale Cloud Platform

6. Voldemort

Developed by and currently in use internally at LinkedIn

Eventual consistency

More Developer friendly

Written in Java and exposes its API via Thrift

17

04/09/23

Page 18: An evaluation of distributed datastores using AppScale Cloud Platform

7. MySQL

A well-known relational database

Employ MySQL Cluster

Provides concurrent access to the system

Written in C and C++

18

04/09/23

Page 19: An evaluation of distributed datastores using AppScale Cloud Platform

EVALUATION

Load tables in all databases with 1000 items

Test specifics:

– On Each database put, get, delete, no-op performed

– Considered- light load: one thread, medium load: three concurrent thread,

heavy thread: nine concurrent thread

– Repeat each experiment 5 times

Executes this application in an AppScale cloud

Each node executes with 2 virtual processors, 10GB of disk(max), 4GB of

memory

19

04/09/23

Page 20: An evaluation of distributed datastores using AppScale Cloud Platform

Experimental Results20

04/09/23

Page 21: An evaluation of distributed datastores using AppScale Cloud Platform

Limitations

Persistence

Blobstore Max File Size

Datastore

Task Queue

Mail

Follow a ”deploy on all nodes”

Limited distribution supported

Lake of retrieving the entire

table to run a query

Not released the source code of

the Java App Engine server

21

04/09/23

Page 22: An evaluation of distributed datastores using AppScale Cloud Platform

Future Work

Expand out of the web services domain

– Investigating opportunities in streaming

– Integrated MapReduce support for highperformance computing (HPC)

– Co-locate AppEngines and use shared memory

Additional databases:

– MongoDB, Scalaris, CouchDB

22

04/09/23

Page 23: An evaluation of distributed datastores using AppScale Cloud Platform

Continue…

Extending AppScale with new services for

- large-scale data analytics

- data

- computation intensive tasks

Cloud-agnostic

Integration of mobile device

23

04/09/23

Page 24: An evaluation of distributed datastores using AppScale Cloud Platform

CONCLUSION

Presents an open source implementation of the Google App Engine

(GAE) Datastore API with in a cloud platform called AppScale

The implementation unifies access to wide range of open source

distributed database technologies and automates their configuration

and deployment. However, each database differs in the degree to which

it implements the APIs.

24

04/09/23

Page 25: An evaluation of distributed datastores using AppScale Cloud Platform

DEMO

25

04/09/23

Page 26: An evaluation of distributed datastores using AppScale Cloud Platform

Thank YouAny Questions ??

26

04/09/23