
©2011 Hewlett-Packard Company and Vertica Confidential

Cloud Storage Challenges

Dr. Dinkar Sitaram

dinkar.sitaram@hp.com


Overview

– Types of cloud storage

– Building cloud-scale storage

– Challenges: theoretical considerations

– Dealing with the challenges

Based on Moving to the Cloud, by Dinkar Sitaram and Geetha Manjunath, to be published by Elsevier


Types of cloud storage


File-based cloud storage

– Allow files to be stored in the cloud

– Examples: Amazon S3, Windows Azure, …

– Built on top of HTTP

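– Amazon S3 overview

• Create buckets, then store objects in them

• GET http://dinkar.s3.amazonaws.com/project/file.c

• No directories: a name such as project/file.c is simply the object's key

• Authenticated requests need an AWS Access Key ID and an AWS Secret Access Key

– Region: each bucket is created in a chosen geographical region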
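The flat S3 namespace can be pictured with a small in-memory model. This is a sketch, not the AWS SDK: the class and method names (`FlatBucket`, `listByPrefix`) are hypothetical, and the URL assumes S3's virtual-hosted style, where the bucket name becomes part of the host name. The point is that `project/file.c` is one opaque key, and a "directory listing" is only a prefix scan over sorted keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class FlatBucketDemo {
    // Hypothetical in-memory stand-in for one S3 bucket: a flat, sorted map from key to bytes.
    static final class FlatBucket {
        private final String name;
        private final TreeMap<String, byte[]> objects = new TreeMap<>();

        FlatBucket(String name) { this.name = name; }

        // Store an object; "project/file.c" is a single key, no "project" directory is created.
        void put(String key, byte[] data) { objects.put(key, data.clone()); }

        byte[] get(String key) { return objects.get(key); }

        // Emulate a directory listing: keys sharing a prefix are contiguous in sorted order.
        List<String> listByPrefix(String prefix) {
            List<String> keys = new ArrayList<>();
            for (String k : objects.tailMap(prefix, true).keySet()) {
                if (!k.startsWith(prefix)) break;
                keys.add(k);
            }
            return keys;
        }

        // Virtual-hosted-style URL: http://<bucket>.s3.amazonaws.com/<key>
        String url(String key) { return "http://" + name + ".s3.amazonaws.com/" + key; }
    }

    public static void main(String[] args) {
        FlatBucket bucket = new FlatBucket("dinkar");
        bucket.put("project/file.c", "int main(void) { return 0; }".getBytes());
        bucket.put("project/lib/util.c", new byte[0]);
        bucket.put("readme.txt", new byte[0]);
        System.out.println(bucket.url("project/file.c"));
        System.out.println(bucket.listByPrefix("project/"));
    }
}
```

Running `main` prints the object URL followed by `[project/file.c, project/lib/util.c]`; `readme.txt` is not listed because it does not share the prefix.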


Database-oriented cloud storage

– Offers a database service

– Examples: Amazon RDS (MySQL), Windows Azure SQL

– RDS examples

• Databases can be administered (e.g., created, replicated) using the Amazon RDS APIs

− Db.createDBInstanceAsync(parms) creates a database

• Applications are built using the standard JDBC APIs

− ResultSet rs = stmt.executeQuery("SELECT * FROM Employee");
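Because RDS (MySQL) exposes an ordinary MySQL endpoint, application code is plain JDBC. The sketch below wraps the query above in try-with-resources so the statement and result set are always closed. The class name, the `name` column and the JDBC URL shape in the comment are assumptions for illustration, and a MySQL JDBC driver must be on the classpath to actually connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class RdsJdbcDemo {
    // Runs the query and collects one column; try-with-resources closes the
    // Statement and ResultSet even if an exception is thrown.
    static List<String> employeeNames(Connection conn) throws SQLException {
        List<String> names = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM Employee")) {
            while (rs.next()) {
                names.add(rs.getString("name")); // "name" is a hypothetical column
            }
        }
        return names;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: RdsJdbcDemo <jdbc-url> <user> <password>");
            return;
        }
        // Assumed URL shape: jdbc:mysql://<rds-endpoint>:3306/<database>
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println(employeeNames(conn));
        }
    }
}
```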


Key-value stores

– The database consists of <key, value> pairs

• No schema, unlike relational databases

• Data typically need not be normalized

• More flexible than an RDBMS, and scales well because it imposes fewer restrictions

• The application must do more work (e.g., checking for valid values) to provide the guarantees a traditional RDBMS enforces

– Examples: Amazon SimpleDB, Google BigTable, Hadoop HBase

– Programming example (SimpleDB)

• Google SimpleJDBC

• String insert = "INSERT INTO employees (name, title) VALUES ('Dinkar', 'Architect')";

• int val = st.executeUpdate(insert);
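The trade-off described in the key-value bullets (no schema, but validity checks move into the application) can be sketched with a map of attribute maps. This is a hypothetical in-memory model, not the SimpleDB API; the `VALID_TITLES` rule is an invented example of a constraint that an RDBMS would enforce for you with a CHECK constraint or foreign key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class KeyValueStoreDemo {
    // key -> free-form attribute map; no table schema is declared anywhere.
    private final Map<String, Map<String, String>> items = new HashMap<>();

    // Hypothetical application rule: the store itself would accept any string here.
    private static final Set<String> VALID_TITLES = Set.of("Architect", "Engineer", "Manager");

    public void put(String key, Map<String, String> attributes) {
        String title = attributes.get("title");
        if (title != null && !VALID_TITLES.contains(title)) {
            throw new IllegalArgumentException("invalid title: " + title);
        }
        items.put(key, new HashMap<>(attributes)); // defensive copy
    }

    public Map<String, String> get(String key) {
        return items.get(key);
    }

    public static void main(String[] args) {
        KeyValueStoreDemo store = new KeyValueStoreDemo();
        // Two items with different attribute sets: allowed because there is no schema.
        store.put("emp:1", Map.of("name", "Dinkar", "title", "Architect"));
        store.put("emp:2", Map.of("name", "Geetha", "dept", "Research"));
        System.out.println(store.get("emp:1").get("title")); // Architect
        try {
            store.put("emp:3", Map.of("name", "X", "title", "Wizard"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // invalid title: Wizard
        }
    }
}
```

Note that `emp:2` carries a different set of attributes from `emp:1`; a fixed relational schema would need extra nullable columns to hold both.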


XML databases

– Store XML documents

– Examples: MongoDB

• Stores JSON documents: { "Name": "Dinkar", "Attributes": { "Sex": "M", "Title": "Architect" } }

• Documents can have pointers to other documents

• Index on any attribute (including embedded ones): db.orders.ensureIndex()

• Searching: db.orders.find()

– XML DBs sit midway between key-value stores and RDBMSs

• Indices are created explicitly

• Support more complex structures

• Some XML DBs, e.g., CouchDB, offer transactions
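Indexing an embedded attribute, as with `ensureIndex()`, amounts to extracting the value at a path inside each nested document and keeping a map from that value to document ids. The sketch below models documents as nested Java maps; the class, the dotted-path syntax and the method names are hypothetical and only mirror the idea, not MongoDB's driver API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DocumentIndexDemo {
    private final List<Map<String, Object>> docs = new ArrayList<>();               // id = list position
    private final Map<String, Map<Object, List<Integer>>> indexes = new HashMap<>(); // path -> value -> ids

    // Follow a dotted path such as "Attributes.Title" through nested maps; null if absent.
    @SuppressWarnings("unchecked")
    static Object lookup(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) return null;
            current = ((Map<String, Object>) current).get(part);
        }
        return current;
    }

    private static void addToIndex(Map<Object, List<Integer>> index, Object value, int id) {
        if (value != null) index.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
    }

    void insert(Map<String, Object> doc) {
        int id = docs.size();
        docs.add(doc);
        // Keep every existing index up to date.
        for (Map.Entry<String, Map<Object, List<Integer>>> e : indexes.entrySet()) {
            addToIndex(e.getValue(), lookup(doc, e.getKey()), id);
        }
    }

    // Build an index on a (possibly embedded) attribute; it must be requested explicitly.
    void createIndex(String path) {
        Map<Object, List<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) addToIndex(index, lookup(docs.get(id), path), id);
        indexes.put(path, index);
    }

    // Equality search: use an index when one exists, otherwise scan every document.
    List<Map<String, Object>> find(String path, Object value) {
        List<Map<String, Object>> result = new ArrayList<>();
        Map<Object, List<Integer>> index = indexes.get(path);
        if (index != null) {
            for (int id : index.getOrDefault(value, List.of())) result.add(docs.get(id));
        } else {
            for (Map<String, Object> d : docs) if (value.equals(lookup(d, path))) result.add(d);
        }
        return result;
    }

    public static void main(String[] args) {
        DocumentIndexDemo db = new DocumentIndexDemo();
        db.insert(Map.of("Name", "Dinkar", "Attributes", Map.of("Sex", "M", "Title", "Architect")));
        db.insert(Map.of("Name", "Geetha", "Attributes", Map.of("Title", "Researcher")));
        db.createIndex("Attributes.Title");
        for (Map<String, Object> d : db.find("Attributes.Title", "Architect")) System.out.println(d.get("Name"));
    }
}
```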


Building cloud-scale storage


Cloud storage requirements

– Scaling to cloud-scale: partitioning

– Availability: replication


Partitioning strategies

– Similar to methods for partitioning databases

– Round-robin

• Loses associativity: records are not placed by key, so a lookup by key must search every partition

– Hash partitioning

– Range-based

– Directory-based

• Memcached

• Can provide, e.g., geographical partitioning

– Reference: D. DeWitt and J. Gray, "Parallel Database Systems: The Future of High Performance Database Systems," Communications of the ACM, Volume 35, Issue 6, June 1992
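The four partitioning strategies differ only in how a record is mapped to a partition. The sketch below shows each mapping in isolation; the method names, the example range bounds and the region directory are assumptions for illustration, and a real system would also need rebalancing and replication.

```java
import java.util.Map;

public class PartitioningDemo {
    // Round-robin: the i-th record goes to partition i mod n, regardless of its key.
    // Load is perfectly balanced, but a lookup by key must ask every partition.
    static final class RoundRobin {
        private final int n;
        private int next = 0;
        RoundRobin(int n) { this.n = n; }
        int assign() {
            int p = next;
            next = (next + 1) % n;
            return p;
        }
    }

    // Hash: partition = hash(key) mod n; floorMod keeps the result non-negative.
    static int hashPartition(String key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    // Range: upperBounds[i] is the exclusive upper bound of partition i (sorted ascending);
    // keys at or above the last bound go to the final partition.
    static int rangePartition(String key, String[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key.compareTo(upperBounds[i]) < 0) return i;
        }
        return upperBounds.length;
    }

    // Directory: an explicit lookup table, e.g. region code -> partition (geographical placement).
    static int directoryPartition(String region, Map<String, Integer> directory, int defaultPartition) {
        return directory.getOrDefault(region, defaultPartition);
    }

    public static void main(String[] args) {
        RoundRobin rr = new RoundRobin(3);
        String[] bounds = {"H", "P"}; // partitions: [.., H), [H, P), [P, ..)
        Map<String, Integer> directory = Map.of("IN", 0, "US", 1);
        for (String name : new String[] {"Dinkar", "Manjunath", "Sitaram"}) {
            System.out.printf("%-10s rr=%d hash=%d range=%d%n",
                    name, rr.assign(), hashPartition(name, 3), rangePartition(name, bounds));
        }
        System.out.println("US -> " + directoryPartition("US", directory, 2));
    }
}
```

Hash and range partitioning both keep associativity, since a given key always maps to the same partition; range partitioning additionally keeps neighbouring keys together, which helps range queries but can concentrate load on one partition.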


Amazon availability

– Multiple availability zones per region

• Zones are isolated from each other's failures

– Data is replicated across 3 availability zones by default
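One way to picture zone-level replication is a placement function that puts each object's copies in distinct zones, so that losing a whole zone never removes every copy. This is a hypothetical policy (a hash picks the first zone, then the next zones in list order), not Amazon's actual placement algorithm, and the zone names are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ZonePlacementDemo {
    // Hypothetical placement: hash the key to pick a starting zone, then take the
    // next (copies - 1) zones in list order, so every copy lands in a different zone.
    static List<String> placeReplicas(String key, List<String> zones, int copies) {
        if (copies < 1 || copies > zones.size()) {
            throw new IllegalArgumentException("need 1.." + zones.size() + " copies");
        }
        int start = Math.floorMod(key.hashCode(), zones.size());
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            chosen.add(zones.get((start + i) % zones.size()));
        }
        return chosen;
    }

    // An object stays readable as long as at least one copy is in a healthy zone.
    static boolean readable(List<String> placement, Set<String> failedZones) {
        for (String zone : placement) {
            if (!failedZones.contains(zone)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> zones = List.of("zone-a", "zone-b", "zone-c", "zone-d"); // placeholder names
        List<String> placement = placeReplicas("project/file.c", zones, 3);
        System.out.println("copies in " + placement);
        System.out.println("readable with " + placement.get(0) + " down: "
                + readable(placement, Set.of(placement.get(0))));
    }
}
```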


Challenges: Theoretical considerations


CAP theorem

– Fundamental limitation of distributed systems

– No distributed system can satisfy all three of the properties below

• Conjectured in [Brewer00]; proved in [LynGil02] by considering a two-node cluster

• Consistency: all operations appear to execute one at a time, as if on a single non-distributed object

• Availability: every request received by a non-failing node returns a result

• Partition-tolerance: the guarantees must hold even when an arbitrary number of messages between service nodes are lost

– References

1. [Brewer00] Eric A. Brewer, "Towards Robust Distributed Systems," ACM Symposium on Principles of Distributed Computing (PODC), July 16-19, 2000, Portland, Oregon

2. [LynGil02] Seth Gilbert and Nancy Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services," ACM SIGACT News, Volume 33, Issue 2 (2002), pp. 51-59


2-node example

1. Servers are replicated for availability

2. If the network partitions, either

• allow the servers to operate independently (inconsistent), OR

• bring the servers down (no availability)
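The two choices in the 2-node example can be made concrete with a toy two-server register in which clients write through server A and read through server B. It is a simplified illustration, not a real replication protocol: the `Mode` switch stands for the design decision, and resynchronization on healing simply copies A's value to B.

```java
public class TwoNodeCap {
    enum Mode { AVAILABLE, CONSISTENT } // which property to keep when the link fails

    private int valueA, valueB;          // the replicated value on server A and server B
    private final Mode mode;
    private boolean partitioned;         // true while messages between A and B are lost

    TwoNodeCap(Mode mode) { this.mode = mode; }

    void setPartitioned(boolean p) {
        partitioned = p;
        if (!p) valueB = valueA;         // on healing, copy A's state to B (A treated as authoritative)
    }

    // Clients write through server A.
    void write(int v) {
        if (partitioned && mode == Mode.CONSISTENT)
            throw new IllegalStateException("unavailable: replica B unreachable");
        valueA = v;
        if (!partitioned) valueB = v;    // synchronous replication while the link is up
    }

    // Clients read through server B.
    int read() {
        if (partitioned && mode == Mode.CONSISTENT)
            throw new IllegalStateException("unavailable: cannot confirm value with A");
        return valueB;
    }

    public static void main(String[] args) {
        TwoNodeCap ap = new TwoNodeCap(Mode.AVAILABLE);
        ap.write(1);
        ap.setPartitioned(true);
        ap.write(2);                                                   // accepted by A only
        System.out.println("AP read during partition: " + ap.read()); // stale value from B

        TwoNodeCap cp = new TwoNodeCap(Mode.CONSISTENT);
        cp.write(1);
        cp.setPartitioned(true);
        try {
            cp.write(2);
        } catch (IllegalStateException e) {
            System.out.println("CP write during partition: " + e.getMessage());
        }
    }
}
```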


Practical example: Netflix

– Netflix: video on demand over the Internet

– Runs on the Amazon cloud

– Consider the following scenario

• A user at a TV updates the list of favorites

• The load balancer sends the update to server 1

• The set-top box requests the favorites list

• The load balancer sends the request to server 2

• Is the returned result consistent?

Depends!

– Reference: Adrian Cockcroft, "Comparing NoSQL Availability Models," http://perfcap.blogspot.com/2010/10/comparing-nosql-availability-models.html
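Whether the set-top box sees the update depends on how many replicas each operation touches. A common rule in replicated stores (Cassandra's QUORUM consistency level is one example) is that with N replicas, a write acknowledged by W of them and a read that consults R of them must share at least one replica whenever R + W > N. The sketch below assumes the worst case, where the read contacts the replicas farthest from those that took the write; the class and method names are hypothetical.

```java
public class QuorumDemo {
    // Each replica keeps a value and the version of the write that produced it.
    static final class Replica {
        long version;
        String value = "";
    }

    private final Replica[] replicas;

    QuorumDemo(int n) {
        replicas = new Replica[n];
        for (int i = 0; i < n; i++) replicas[i] = new Replica();
    }

    // Any read set of size R and write set of size W must intersect iff R + W > N.
    static boolean quorumsOverlap(int n, int r, int w) { return r + w > n; }

    // The write reaches only the first w replicas (the rest are slow or unreachable).
    void write(long version, String value, int w) {
        for (int i = 0; i < w; i++) {
            replicas[i].version = version;
            replicas[i].value = value;
        }
    }

    // Worst case for the reader: consult the LAST r replicas and return the newest version seen.
    String read(int r) {
        if (r < 1 || r > replicas.length) throw new IllegalArgumentException("bad r: " + r);
        Replica newest = replicas[replicas.length - r];
        for (int i = replicas.length - r + 1; i < replicas.length; i++) {
            if (replicas[i].version > newest.version) newest = replicas[i];
        }
        return newest.value;
    }

    public static void main(String[] args) {
        QuorumDemo store = new QuorumDemo(3);   // N = 3
        store.write(1, "old favorites", 3);
        store.write(2, "new favorites", 2);     // W = 2
        System.out.println(store.read(2));      // R = 2, R + W > N
        System.out.println(store.read(1));      // R = 1, R + W = N
    }
}
```

With R = 2 the read overlaps the write quorum and returns "new favorites"; with R = 1 it can land only on the replica that missed the write and returns "old favorites", which is exactly the "it depends" in the Netflix scenario.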


Dealing with the inconsistency predicted by the CAP theorem


Relaxed consistency

– Consistency can be relaxed

• Weak consistency: the system does not guarantee that reads return consistent results

• Eventual consistency: if no further updates are made, the system will eventually become consistent; if updates are infrequent, a client can wait for some time to get the consistent value

• Read-your-writes consistency: a client performing a read after a write will always see its own updates

• Session consistency: read-your-writes consistency guaranteed within a session

– Amazon S3 (as of 2011)

• US Standard Region: eventual consistency

• US West, EU, Asia Pacific Regions: read-your-writes consistency for new object creation; eventual consistency for overwrites and deletes

– Reference: Werner Vogels, "Eventually Consistent," Communications of the ACM, January 2009
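Read-your-writes (and its session-scoped form) can be layered on top of an eventually consistent replica by having the client remember the version of its last write. The sketch below models a primary that accepts writes and a secondary that is updated asynchronously; the version-tracking `Session` and all other names are assumptions for illustration, not how S3 implements its guarantees.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SessionConsistencyDemo {
    // One copy of the data: the value plus the version of the last update applied.
    static final class Replica {
        long version;
        String value = "";
    }

    static final class Update {
        final long version;
        final String value;
        Update(long version, String value) { this.version = version; this.value = value; }
    }

    final Replica primary = new Replica();
    final Replica secondary = new Replica();
    private final Deque<Update> pending = new ArrayDeque<>(); // not yet shipped to the secondary
    private long nextVersion = 1;

    // Writes are applied at the primary; replication to the secondary is deferred.
    long write(String value) {
        long v = nextVersion++;
        primary.version = v;
        primary.value = value;
        pending.addLast(new Update(v, value));
        return v;
    }

    // Background replication: apply queued updates to the secondary in order.
    void propagate() {
        while (!pending.isEmpty()) {
            Update u = pending.removeFirst();
            secondary.version = u.version;
            secondary.value = u.value;
        }
    }

    // Eventually consistent read: whatever the secondary holds right now (possibly stale).
    String readEventual() { return secondary.value; }

    // A client session that remembers the newest version it has written.
    final class Session {
        private long lastWritten = 0;

        void write(String value) { lastWritten = SessionConsistencyDemo.this.write(value); }

        // Read-your-writes: use the secondary only if it already reflects this session's writes.
        String read() { return secondary.version >= lastWritten ? secondary.value : primary.value; }
    }

    public static void main(String[] args) {
        SessionConsistencyDemo store = new SessionConsistencyDemo();
        Session session = store.new Session();
        session.write("favorites v1");
        System.out.println("eventual read: '" + store.readEventual() + "'");
        System.out.println("session read:  '" + session.read() + "'");
        store.propagate();
        System.out.println("after propagation: '" + store.readEventual() + "'");
    }
}
```

Another session that has not written anything gets no special treatment and may still see the stale secondary until `propagate()` runs, which is the difference between session consistency and strong consistency.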


Example: Handling inconsistency

– BASE: an alternative to ACID [Brewer00]

• Basically Available

• Soft-state

• Eventually consistent

– Example: online shopping portal

• User table: transactions by user

• Transaction table: transactions used for billing

• How do we update both tables after a purchase?

– Traditional database method

• Begin transaction

• Update User table

• Update Transaction table

• End transaction

– A common cloud method

• Queue the update to the User table

• Queue the update to the Transaction table

– The tables could be inconsistent for a while

– They will become consistent eventually, once both queued updates have been applied

[Figure: Application, User table, Transaction table]

– Reference: D. Pritchett, "BASE: An Acid Alternative," ACM Queue, June 2008
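The queued method can be sketched as two queued messages per purchase, applied later by a consumer. Between the two applications the User and Transaction tables disagree; once the queue drains they agree again. The duplicate check keyed on transaction id is an added assumption (it makes re-delivered messages harmless), the "totals match" consistency test is a simplification, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BaseQueueDemo {
    enum Table { USER, TRANSACTION }

    // One queued message: which table to update, for which purchase.
    static final class Update {
        final String txnId;
        final Table table;
        final String user;
        final int amount;
        Update(String txnId, Table table, String user, int amount) {
            this.txnId = txnId; this.table = table; this.user = user; this.amount = amount;
        }
    }

    final Map<String, Integer> userTotals = new HashMap<>(); // User table: total spend per user
    final Map<String, Integer> billing = new HashMap<>();    // Transaction table: txnId -> amount
    private final Deque<Update> queue = new ArrayDeque<>();
    private final Set<String> applied = new HashSet<>();     // "txnId/table" pairs already applied

    // The purchase only enqueues two updates; no transaction spans both tables.
    void purchase(String txnId, String user, int amount) {
        queue.addLast(new Update(txnId, Table.USER, user, amount));
        queue.addLast(new Update(txnId, Table.TRANSACTION, user, amount));
    }

    // Consumer step: apply one queued update; returns false when the queue is empty.
    // Duplicates (same txnId and table) are skipped, so re-delivery is harmless.
    boolean applyOne() {
        Update u = queue.pollFirst();
        if (u == null) return false;
        if (applied.add(u.txnId + "/" + u.table)) {
            if (u.table == Table.USER) userTotals.merge(u.user, u.amount, Integer::sum);
            else billing.put(u.txnId, u.amount);
        }
        return true;
    }

    // Simplified check: the tables agree when total spend equals total billed.
    boolean consistent() {
        int spent = userTotals.values().stream().mapToInt(Integer::intValue).sum();
        int billed = billing.values().stream().mapToInt(Integer::intValue).sum();
        return spent == billed;
    }

    public static void main(String[] args) {
        BaseQueueDemo shop = new BaseQueueDemo();
        shop.purchase("t1", "dinkar", 100);
        shop.applyOne();
        System.out.println("after one update: consistent=" + shop.consistent());
        while (shop.applyOne()) { }
        System.out.println("after draining:   consistent=" + shop.consistent());
    }
}
```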


Conclusions


– Many alternatives for building cloud storage exist

– Building cloud storage requires a careful trade-off between consistency and availability