
Cloud Computing

Lectures 11, 12 and 13

Cloud Storage

2014-2015


Up until now…

• Introduction

• Definition of Cloud Computing

• Grid Computing

• Content Distribution Networks

• Cycle-Sharing

• Distributed Scheduling

• Map Reduce


Outline

• Components of Cloud Platforms

• Storage Types

• Storage Products

• Cloud File Systems

• Cloud Object Storage


Components of Cloud Computing Platforms

[Diagram: Data Storage, Execution Model, Programming Model, Monitoring]

• How to program an application?

• How is the platform viewed?

• Which abstraction is accessible: VM? API? Framework?

• Which operations can I perform?

• How are my data stored and accessed?

• Monitoring: How can I evaluate the state of executions/nodes/data...?


Major Cloud Platforms

• Apache Hadoop

• Amazon Web Services

• Google App Engine

• Microsoft Azure

• OpenStack


Storage Types

• A range of search, streaming and indexing variants.

• File System:

• Hierarchical organization, files, permission, streaming data,...

• Object Storage:

• Direct Program <-> Storage interaction

• Object ID indexing

• Tables (no-SQL DB):

• records and tables

• Search

• No relational model

• Relational Databases:

• Full relational model

• Conventional services

• We will see that the categories are becoming blurred...


Storage Products (i)

• File System

• Hadoop File System / Google File System

• Object/Byte Storage

• Amazon S3

• MS Azure Blobs

• Table

• Hadoop HBase / Google Bigtable (App Engine Datastore)

• Amazon SimpleDB

• MS Azure Tables

• Hadoop Hive

• Yahoo PNUTS

• Relational Databases

• Amazon RDS

• SQL Azure


Cloud File System: HDFS/GFS

• Distributed File System

• HDFS is an open-source reimplementation of the Google File System (GFS).

• Runs on clusters of commodity machines.

• HDFS is tuned for:

• Very large files.

• Streaming access.

• Commodity hardware.

• Key to scalability: data transfers do not go through the central server (the namenode).


Blocks

• Blocks simplify space management (allocation and replication) and let a file grow almost indefinitely.

• Evolution of block sizes:

• Disk blocks: 512 bytes

• File system blocks: 2, 4 or 8 kB

• HDFS blocks: 64 MB

• Large blocks reduce the relative cost of seeks: each 64 MB block is read contiguously.

• A file smaller than one block does not occupy a full block of underlying storage.
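The block arithmetic can be made concrete with a small sketch. It assumes the 64 MB block size quoted on the slide; the class and method names are hypothetical and are not part of the Hadoop API.

```java
// Illustrative sketch only: HdfsBlockMath is a hypothetical helper, not a Hadoop class.
final class HdfsBlockMath {
    static final long MB = 1024L * 1024L;
    static final long BLOCK_SIZE = 64 * MB; // block size quoted on the slide

    /** Number of blocks needed to hold a file of the given size (ceiling division). */
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    /** Bytes stored in the last block; it is not padded to a full block. */
    static long lastBlockBytes(long fileSize) {
        if (fileSize == 0) {
            return 0;
        }
        long remainder = fileSize % BLOCK_SIZE;
        return remainder == 0 ? BLOCK_SIZE : remainder;
    }

    public static void main(String[] args) {
        long fileSize = 200 * MB;
        System.out.println("blocks: " + blockCount(fileSize));                  // 4
        System.out.println("last block MB: " + lastBlockBytes(fileSize) / MB);  // 8
    }
}
```

A 200 MB file therefore needs three full blocks plus a fourth block holding only 8 MB, and a 1 MB file uses one block that stores 1 MB rather than 64 MB.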


Namenode

• Manages the file system name space: folder hierarchy, name uniqueness,…

• Maintains the folder tree and the metadata in 2 files: namespace image and edit log.

• HDFS cannot operate without the namenode.

• Files can be written, read, renamed and deleted.

• It is not possible to:

• Write in the middle of a file.

• Write concurrently to the same file.

• Fault tolerance mechanism: the metadata files are replicated atomically to another machine.


Datanode

• Manages a set of blocks.

• Processes read and write requests from clients or from the namenode.

• Periodically notifies the namenode of the blocks it holds.

• If the number of replicas of a block drops below the configured replication factor, a new replica is created.


Permissions

• Permissions in HDFS are similar to UNIX:

• user, group and other

• read, write and execute

• As the user is very often remote, any username supplied by a remote node is trusted. Therefore, protection is weak.

• Permissions are geared more towards managing a group of users in the cluster.


Consistency Model

• Formalization of the visibility of read and write operations.

• After an operation call finishes, who sees what and when?

• HDFS model: There are no guarantees that the last block has been written unless sync() is called.


Error Checking

• Block correctness is checked using a checksum (CRC32).

• At file creation:

• The client calculates a checksum for each 512-byte chunk of data.

• The datanode stores the checksums.

• At file access:

• The client reads the data and the checksums from the datanode.

• If the check fails, it tries other replicas.

• Periodically, the datanode verifies the checksums of its blocks.
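The checksum scheme can be sketched with the standard java.util.zip.CRC32 class. This is a minimal illustration of one CRC32 per 512-byte chunk, not HDFS's actual implementation; the class and method names are hypothetical.

```java
import java.util.zip.CRC32;

// Illustrative sketch only: ChunkChecksums is a hypothetical helper, not HDFS code.
final class ChunkChecksums {
    static final int CHUNK_SIZE = 512; // chunk size quoted on the slide

    /** Computes one CRC32 value per 512-byte chunk (the last chunk may be shorter). */
    static long[] checksums(byte[] data) {
        int chunks = (data.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int offset = i * CHUNK_SIZE;
            int length = Math.min(CHUNK_SIZE, data.length - offset);
            CRC32 crc = new CRC32();
            crc.update(data, offset, length);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk whose checksum does not match, or -1 if all match. */
    static int firstCorruptChunk(byte[] data, long[] expected) {
        long[] actual = checksums(data);
        int common = Math.min(actual.length, expected.length);
        for (int i = 0; i < common; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        // A different number of chunks also counts as a mismatch.
        return actual.length == expected.length ? -1 : common;
    }
}
```

A reader that gets a non-negative index from firstCorruptChunk would, as the slide says, fall back to another replica of the block.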


Reading

• The client contacts the namenode to get the list of datanodes holding the file’s blocks (the namenode keeps this list in memory).

• It receives an FSDataInputStream that transparently chooses the best datanode, opens and closes connections to the datanodes, requests further block locations from the namenode, retries operations if necessary and remembers failed datanodes.


Choosing Nodes: Distance

• Clients read from the closest source of data.

• Assumes a tree-structured network organization.

• The distance between two nodes is the number of hops between them in the tree.

• distance(/d1/r1/n1, /d1/r1/n1) = 0 (processes on the same node)

• distance(/d1/r1/n1, /d1/r1/n2) = 2 (different nodes on the same rack)

• distance(/d1/r1/n1, /d1/r2/n3) = 4 (nodes on different racks in the same datacentre)

• distance(/d1/r1/n1, /d2/r3/n4) = 6 (nodes in different datacentres)
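The distance rule can be sketched as counting the hops from each node up to their closest common ancestor in the tree. The class below is a hypothetical illustration that works on the /datacentre/rack/node paths used in the examples; it is not Hadoop's own topology code.

```java
// Illustrative sketch only: NetworkDistance is a hypothetical helper, not a Hadoop class.
final class NetworkDistance {
    /** Number of tree hops between two locations written as /datacentre/rack/node. */
    static int distance(String a, String b) {
        String[] pathA = a.split("/");
        String[] pathB = b.split("/");
        // Length of the shared prefix, i.e. the depth of the closest common ancestor.
        int common = 0;
        while (common < pathA.length && common < pathB.length
                && pathA[common].equals(pathB[common])) {
            common++;
        }
        // Hops up from a to the common ancestor plus hops down to b.
        return (pathA.length - common) + (pathB.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // 0
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // 2
        System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // 4
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // 6
    }
}
```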

[Figure: Distance Between Nodes]

Writing (+ creating)

• The client asks the namenode to create the file; the namenode checks permissions and name uniqueness. If the checks succeed, the client receives an FSDataOutputStream.

• The namenode provides a set of datanodes for replication.

• Block write requests are kept in a data queue.

• Unacknowledged block write requests are kept in an ack queue.


Writing

• In case a datanode fails, the client changes the block id so that the corrupted replica is deleted later.

• By default, if one of the replicas is successfully written, the writing is considered done. The other replicas are written asynchronously.


Command Line Tool

• hadoop fs -<command>, where <command> is one of:

• ls

• mkdir

• rm

• rmr

• put

• copyToLocal

• copyFromLocal


Cloud Object Store: Amazon Simple Storage Service (S3)


S3

• Amazon’s persistent object storage system.

• Implementation based on the Dynamo system (SOSP, 2007).

• Accessible using HTTP: 3 different protocols, e.g. SOAP.


Dynamo: Intuition

• CAP Theorem: Consistency, Availability and Partition tolerance. Pick two!

• At Amazon: availability = customers’ trust.

• Cannot be sacrificed.

• In large data centres faults are going to be frequent:

• The possibility of a partition has to be included.

• Most data services tolerate small inconsistencies:

• Relaxed consistency ==> eventual consistency.


Consistency Models

• Strong Consistency: Once a write operation has finished for the requester, any subsequent read will return the value that was written.

• Weak Consistency: The system does not guarantee that subsequent accesses return the written value. Some condition must be verified for the written value to be returned (a time interval, an access to a synchronization variable,…). The period between the end of the write and the moment the value becomes visible is called the inconsistency window.

• Eventual Consistency: The system guarantees that, if there are no more writes, the updates will eventually become visible to all clients. Example: DNS, where a name update is propagated between zones until all clients see the new value.


Variants of Eventual Consistency

• Causal Consistency: If two writes are causally related (A happens before B), no process may observe B before A. There are no guarantees regarding write operations that are not causally related.

• Read-your-writes Consistency: After a process A writes a value, all of A’s subsequent reads must reflect that write (a particular case of causal consistency).

• Session Consistency: A practical implementation of the previous model. All operations are done in the context of a session. During the session, the system guarantees “read-your-writes”. In the case of certain faults, the session is ended and the “read-your-writes” guarantee starts afresh in a new session.

• Monotonic Reads Consistency: If a process has seen a given value, subsequent reads will never return an earlier value.

• Monotonic Writes Consistency: The system guarantees that writes by the same process are applied in order. Systems that do not guarantee this are very rare…


Dynamo Assumptions

• Interaction Model:

• Whole-object reads and writes, addressed by unique IDs.

• Binary objects of up to 5 GB.

• No operations on multiple objects.

• ACID properties (Atomicity, Consistency, Isolation, Durability):

• Atomicity/Isolation: an object is always written as a whole.

• Durability: replicated writes.

• Only consistency is not strong.

• Efficiency:

• Optimize for the 99.9th percentile.


Design Decisions

• Incremental Scalability:

• Adding nodes has to be simple.

• Load balancing and support for heterogeneity:

• The system must distribute the requests.

• And support nodes with different characteristics.

• Solution: organize the nodes in a Chord-like DHT.


Design Decisions

• Symmetry:

• All nodes are equally responsible peers.

• Decentralization:

• Avoid single points of failure.


Dynamo: Design Decisions

Problem | Technique | Advantage

Partitioning | Consistent hashing | Incremental scalability

Write availability | Vector clocks with conflict resolution at read time | Version size does not depend on the update rate

Temporary faults | Relaxed quorum and hinted handoff | High availability and durability

Permanent faults | Anti-entropy with Merkle trees | Synchronizes replicas asynchronously

Membership and fault detection | Gossip-based membership protocol | Maintains symmetry and avoids a centralized directory


Dynamo: API

• Two operations:

• put(key, context, object)

• key: object ID.

• context: vector clocks and object’s history.

• object: data to be written.

• get(key)


Partitioning and Replication

• Uses consistent hashing.

• Similar to Chord:

• Each node has an id in the key space.

• Nodes are arranged in a ring.

• Data are stored in the node with the lowest id that is larger than the object’s key.

• Replication:

• Each object is stored on N nodes: the node associated with its key and the N-1 nodes that follow it on the ring.
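The placement rule can be sketched with a sorted map standing in for the ring. This is a minimal illustration under stated assumptions, not Dynamo's implementation: the HashRing class, its method names and the use of MD5 for ring positions are choices made here, and a node whose id equals the key position is treated as the key's owner.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch only: HashRing is a hypothetical class, not Dynamo code.
final class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Places a node at the position derived from its name. */
    void addNode(String nodeName) {
        addNodeAt(position(nodeName), nodeName);
    }

    /** Places a node at an explicit ring position (handy for small examples). */
    void addNodeAt(long id, String nodeName) {
        ring.put(id, nodeName);
    }

    /** The first n distinct nodes met when walking the ring clockwise from the key position. */
    List<String> replicasFor(long keyPosition, int n) {
        List<String> walk = new ArrayList<>(ring.tailMap(keyPosition, true).values());
        walk.addAll(ring.headMap(keyPosition, false).values()); // wrap around the ring
        List<String> replicas = new ArrayList<>();
        for (String node : walk) {
            if (replicas.size() == n) {
                break;
            }
            if (!replicas.contains(node)) {
                replicas.add(node);
            }
        }
        return replicas;
    }

    /** Maps a string to a ring position using the first 8 bytes of its MD5 digest (an assumption). */
    static long position(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With nodes A, B, C and D at positions 10, 20, 30 and 40, a key at position 25 with N = 3 is stored on C, D and A: the walk starts at C and wraps around the end of the ring.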

[Figure: The Chord Ring with Replication]

Virtual Nodes

• Problem: few nodes or heterogeneous nodes lead to bad load balancing.

• Dynamo solution:

• Use virtual nodes.

• Each physical node has several “virtual node” tickets.

• More powerful machines can have more tickets.

• More powerful machines can have more tickets.

• “Virtual node” tickets are distributed randomly.
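A possible sketch of the ticket idea follows. Each physical node is inserted into the ring once per ticket, at a pseudo-random position obtained by hashing the node name together with the ticket number. The VirtualNodeRing class, the "name#i" token naming and the MD5-based positions are assumptions made for illustration, not details taken from Dynamo.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: VirtualNodeRing is a hypothetical class, not Dynamo code.
final class VirtualNodeRing {
    private final TreeMap<Long, String> tokens = new TreeMap<>();

    /** Gives a physical node the requested number of "virtual node" tickets. */
    void addPhysicalNode(String name, int tickets) {
        for (int i = 0; i < tickets; i++) {
            tokens.put(position(name + "#" + i), name);
        }
    }

    int tokenCount() {
        return tokens.size();
    }

    /** Physical node owning the key; assumes at least one node has been added. */
    String ownerOf(String key) {
        Map.Entry<Long, String> entry = tokens.ceilingEntry(position(key));
        return (entry != null ? entry : tokens.firstEntry()).getValue();
    }

    /** Counts how many of the keys "key-0" .. "key-(n-1)" land on each physical node. */
    Map<String, Integer> loadFor(int keyCount) {
        Map<String, Integer> load = new HashMap<>();
        for (int i = 0; i < keyCount; i++) {
            load.merge(ownerOf("key-" + i), 1, Integer::sum);
        }
        return load;
    }

    /** Ring position from the first 8 bytes of an MD5 digest (an assumption of this sketch). */
    static long position(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling loadFor on a ring where one machine holds twice as many tickets as another gives a rough view of how the key share follows the ticket count; the exact split depends on the hash positions.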


Data Versions

• Nodes for writing and reading are selected based on load.

• So, we have eventual consistency:

• There may be different versions written on different replicas.

• Conflict resolution is made when reading and not when writing.

• Syntactic Reconciliation:

• Some changes can be merged automatically, for formats with clearly identifiable parts and operations (e.g. a mail file).

• Semantic Reconciliation:

• The user must decide.

• Divergence is uncommon. For all read operations:

• 99.94% - 1 version;

• 0.00057% - 2 versions;

• 0.00047% - 3 versions;

• 0.00009% - 4 versions.

• Timeout:

• After a number of generations without writing, versions are discarded.


Vector Clocks (i)

• Represent time in a distributed system without clock synchronization.

• Replace physical time with causality.

• A vector clock is a list of (node, counter) pairs.

• If every position of the vector clock of an event A is smaller than or equal to the corresponding position for another event B, and at least one is strictly smaller, then A happened before B: there is a causal chain of events from A to B.
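The comparison rule can be sketched as follows. The VectorClock class below is a hypothetical, minimal implementation in which a missing (node, counter) pair counts as zero; it is meant to illustrate the ordering test, not to reproduce Dynamo's code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: VectorClock is a hypothetical class, not Dynamo code.
final class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Long> counters = new HashMap<>();

    /** Returns a new clock in which the given node's counter is one higher. */
    VectorClock increment(String node) {
        VectorClock next = new VectorClock();
        next.counters.putAll(counters);
        next.counters.merge(node, 1L, Long::sum);
        return next;
    }

    /** Compares two clocks position by position; missing positions count as 0. */
    static Order compare(VectorClock a, VectorClock b) {
        Set<String> nodes = new HashSet<>(a.counters.keySet());
        nodes.addAll(b.counters.keySet());
        boolean aSmallerSomewhere = false;
        boolean bSmallerSomewhere = false;
        for (String node : nodes) {
            long ca = a.counters.getOrDefault(node, 0L);
            long cb = b.counters.getOrDefault(node, 0L);
            if (ca < cb) {
                aSmallerSomewhere = true;
            }
            if (cb < ca) {
                bSmallerSomewhere = true;
            }
        }
        if (aSmallerSomewhere && bSmallerSomewhere) {
            return Order.CONCURRENT; // neither happened before the other: divergent versions
        }
        if (aSmallerSomewhere) {
            return Order.BEFORE;
        }
        if (bSmallerSomewhere) {
            return Order.AFTER;
        }
        return Order.EQUAL;
    }
}
```

Two versions written through the same server compare as BEFORE/AFTER, while two versions that each extend a common ancestor through different servers compare as CONCURRENT and need reconciliation.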

[Figure: Vector Clocks (ii); axis label: Real time]

Object Versions

• If we assign a vector clock timestamp to all object versions we can detect divergent replicas.

• Example:

• X, Y and Z are servers with replicas of object D.

• D5 is a semantic reconciliation performed by the user.


Executing get() and put()

• For good performance, two possibilities:

• Route requests through a load balancer that chooses the node based on the load:

• Creates a bottleneck.

• Use a client side library to choose the node where to send the request (which will be the coordinator):

• Requires recompiling the client. Probably irrelevant in AWS.

• Then the coordinator executes the quorum reads or writes.


Read/Write Operations

• Dynamo supports reading and writing using a quorum model, so an operation does not have to wait for all replicas.

• Let R and W be the number of replicas that must synchronously take part in a read and in a write, respectively, out of N replicas.

• If R + W > N we have a quorum-based system: the set of replicas used for a write always overlaps with the set of replicas used for a read:

• Every read therefore reaches at least one replica that holds the latest written version.

• Latency is determined by the slowest node in the R (or W) set. Therefore, to improve performance, one lowers R or W.
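The overlap condition can be checked by brute force for small N. The sketch below enumerates every possible read set of size R and write set of size W over N replicas and confirms that they always share a replica exactly when R + W > N. The class and method names are hypothetical.

```java
// Illustrative sketch only: QuorumOverlap is a hypothetical helper.
final class QuorumOverlap {
    /** The condition from the slide. */
    static boolean guaranteedByFormula(int n, int r, int w) {
        return r + w > n;
    }

    /** Brute force: does every read set of size r intersect every write set of size w? */
    static boolean everyReadMeetsEveryWrite(int n, int r, int w) {
        // Each bit of the int marks whether one of the n replicas belongs to the set.
        for (int readSet = 0; readSet < (1 << n); readSet++) {
            if (Integer.bitCount(readSet) != r) {
                continue;
            }
            for (int writeSet = 0; writeSet < (1 << n); writeSet++) {
                if (Integer.bitCount(writeSet) != w) {
                    continue;
                }
                if ((readSet & writeSet) == 0) {
                    return false; // a read could miss the latest write
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(everyReadMeetsEveryWrite(3, 2, 2)); // true
        System.out.println(everyReadMeetsEveryWrite(3, 1, 1)); // false
    }
}
```

With N = 3, the setting R = W = 2 keeps the overlap guarantee, while R = W = 1 gives lower latency at the price of reads that may miss the latest write.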


Sloppy Quorum

• To ensure availability, Dynamo uses a “sloppy quorum”.

• Each data item is stored on N nodes taken from a list spanning multiple machines and data centers (the preference list).

• Operations are performed not on the N nodes that normally hold the replicas but on the first N healthy nodes of the preference list.


Tolerating Temporary Faults: Hinted Handoff

• Assume N = 3. If A is unavailable or fails when we write, the replica meant for A is sent to D.

• D marks the replica as temporary and returns the data to A as soon as A recovers.

• Replicas are chosen from a preference list of nodes.

• Preference lists always span multiple datacenters for fault tolerance.
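The sloppy quorum and the hint can be sketched together. The helper below walks a preference list, skips nodes that are down, and records for each stand-in node which unavailable node it is holding data for. The class name, the method and the representation of a hint as a map entry are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: SloppyQuorum is a hypothetical helper, not Dynamo code.
final class SloppyQuorum {
    /**
     * Picks the first n healthy nodes of the preference list.
     * The result maps each chosen node to the node whose replica it stores:
     * itself for a regular replica, or the unavailable node it stands in for (the hint).
     */
    static Map<String, String> chooseReplicas(List<String> preferenceList,
                                              Set<String> down, int n) {
        Map<String, String> targets = new LinkedHashMap<>();
        Deque<String> missing = new ArrayDeque<>(); // unavailable nodes among the first n
        for (int i = 0; i < preferenceList.size() && targets.size() < n; i++) {
            String node = preferenceList.get(i);
            boolean regular = i < n; // the first n entries normally hold the replicas
            if (down.contains(node)) {
                if (regular) {
                    missing.add(node);
                }
                continue;
            }
            if (regular) {
                targets.put(node, node);
            } else {
                targets.put(node, missing.poll()); // temporary replica with a hint
            }
        }
        return targets;
    }
}
```

For the preference list A, B, C, D with N = 3 and A down, the write goes to B, C and D, and D's entry carries the hint A so that D can hand the data back once A recovers.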


Membership and Fault Detection

• Ring Membership:

• At startup, use an external entry point to avoid partitioned rings.

• Gossip asynchronously to update the DHT: exchange membership lists with a random node every 2 seconds.

• Fault Detection:

• Faults are detected by neighbours using periodic messages with a timeout on the reply.


Permanent Faults

• When a hinted replica (one that holds write operations belonging to another replica) is considered failed:

• Data is synchronized with the new replica using Merkle trees.


Merkle Trees

• Accelerate synchronization between nodes by comparing trees of hashes.

• Each inner node holds a hash of its children.

• It makes it very easy to identify what needs to be exchanged.

• The update can be asynchronous:

• An out-of-date tree is not serious.


Merkle Trees: Dynamo

• Each node has a set of keys.

• All objects are leaves of the Merkle tree.

• Replicas exchange the root of the Merkle tree periodically.

• If the roots differ, they recursively exchange the hashes of lower nodes.
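The recursive comparison can be sketched with a small array-based Merkle tree. The code below is a simplified illustration: it assumes both replicas hold the same power-of-two number of leaves, uses SHA-256 as the hash, and compares both trees locally, whereas real replicas would exchange the hashes over the network. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: MerkleSketch is a hypothetical helper, not Dynamo code.
final class MerkleSketch {
    /** Builds a tree in array form: index 1 is the root, node i has children 2i and 2i+1. */
    static byte[][] build(String[] values) { // values.length must be a power of two
        int n = values.length;
        byte[][] tree = new byte[2 * n][];
        for (int i = 0; i < n; i++) {
            tree[n + i] = sha256(values[i].getBytes(StandardCharsets.UTF_8));
        }
        for (int i = n - 1; i >= 1; i--) {
            byte[] joined = Arrays.copyOf(tree[2 * i], tree[2 * i].length + tree[2 * i + 1].length);
            System.arraycopy(tree[2 * i + 1], 0, joined, tree[2 * i].length, tree[2 * i + 1].length);
            tree[i] = sha256(joined); // inner node: hash of its two children
        }
        return tree;
    }

    /** Indexes of the leaves that differ between two trees of the same shape. */
    static List<Integer> diff(byte[][] a, byte[][] b) {
        List<Integer> out = new ArrayList<>();
        diff(a, b, 1, out);
        return out;
    }

    private static void diff(byte[][] a, byte[][] b, int node, List<Integer> out) {
        if (Arrays.equals(a[node], b[node])) {
            return; // identical subtree: nothing below needs to be exchanged
        }
        int leaves = a.length / 2;
        if (node >= leaves) {
            out.add(node - leaves); // reached a differing leaf
            return;
        }
        diff(a, b, 2 * node, out);
        diff(a, b, 2 * node + 1, out);
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

When only one object differs, the comparison follows a single path from the root to that leaf and skips every identical subtree, which is what keeps the exchange small.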


Back to S3

• Additional issues when compared to Dynamo:

• Access to S3 is controlled by an ACL based on the client’s AWS identity and checked with their secret key.

• Occasionally, some S3 calls fail and must be repeated. Programs accessing S3 should take this into account.

• Dynamo replication is performed between data centers.

• This large scale replication has some lag.
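One common way for a program to allow for occasional failed calls is to retry with an increasing delay. The sketch below is a generic wrapper, not part of any AWS library; the class name, the method name and the doubling delay are choices made here for illustration, and a real client would retry only errors that are known to be transient.

```java
import java.util.concurrent.Callable;

// Illustrative sketch only: RetrySketch is a hypothetical helper, not an AWS API.
final class RetrySketch {
    /**
     * Runs the call, retrying up to maxAttempts times and doubling the delay
     * after each failure. Throws IllegalStateException if every attempt fails.
     */
    static <T> T withRetries(Callable<T> call, int maxAttempts, long firstDelayMillis) {
        long delay = firstDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // assumed transient; real code should inspect the error
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
                delay *= 2;
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

A GET or PUT against S3 would be passed in as the Callable, so the calling code sees either a result or a single final failure.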


Service Level Agreements

• Hosting contracts and cloud platforms, like S3, include SLAs.

• Very often SLAs are described in terms of the average, median and/or variance of response times:

• Extreme cases are always problematic.

• Amazon optimizes for 99.9% of the requests:

• Example: a response time within 300 ms for 99.9% of the requests, up to a peak rate of 500 requests per second.
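A short worked example shows why an SLA on the 99.9th percentile is stricter than one on the average. The sketch uses the nearest-rank definition of a percentile and made-up latency samples; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only: LatencyPercentile is a hypothetical helper with made-up data.
final class LatencyPercentile {
    /** Nearest-rank percentile; perMille = 999 means the 99.9th percentile. */
    static long percentile(long[] samples, int perMille) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Ceiling of perMille * n / 1000, done in integer arithmetic.
        int rank = (int) (((long) perMille * sorted.length + 999) / 1000);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double mean(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    public static void main(String[] args) {
        // 995 fast requests (50 ms) and 5 slow ones (2000 ms).
        long[] latencies = new long[1000];
        Arrays.fill(latencies, 0, 995, 50L);
        Arrays.fill(latencies, 995, 1000, 2000L);
        System.out.println("mean   = " + mean(latencies));            // 59.75
        System.out.println("median = " + percentile(latencies, 500)); // 50
        System.out.println("p99.9  = " + percentile(latencies, 999)); // 2000
    }
}
```

The mean and the median both look comfortable against a 300 ms target, yet the 99.9th percentile is 2000 ms, so this workload would break the SLA from the example.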


Buckets and Objects

• S3 data are stored as Dynamo objects.

• Operations on objects are:

• PUT, GET, DELETE, HEAD (get metadata)

• Objects can be grouped in buckets.

• Buckets are used for delimiting namespaces:

• http://mybucket.s3.amazonaws.com/myobj

• http://s3.amazonaws.com/mybucket/myobj


S3: REST GET

Sample Request

GET /my-image.jpg HTTP/1.1

Host: bucket.s3.amazonaws.com

Date: Wed, 28 Oct 2009 22:32:00 GMT

Authorization: AWS 02236Q3V0WHVSRW0EXG2:0RQf4/cRonhpaBX5sCYVf1bNRuU=

Sample Response

HTTP/1.1 200 OK

x-amz-id-2: eftixk72aD6Ap51TnqcoF8eFidJG9Z/2mkiDFu8yU9AS1ed4OpIszj7UDNEHGran

x-amz-request-id: 318BC8BC148832E5

Date: Wed, 28 Oct 2009 22:32:00 GMT

Last-Modified: Wed, 12 Oct 2009 17:50:00 GMT

ETag: "fba9dede5f27731c9771645a39863328"

Content-Length: 434234

Content-Type: text/plain

Connection: close

Server: AmazonS3

[434234 bytes of object data]

See http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html


S3: REST PUT

Sample Request

PUT /my-image.jpg HTTP/1.1

Host: myBucket.s3.amazonaws.com

Date: Wed, 12 Oct 2009 17:50:00 GMT

Authorization: AWS 15B4D3461F177624206A:xQE0diMbLRepdf3YB+FIEXAMPLE=

Content-Type: text/plain

Content-Length: 11434

Expect: 100-continue

[11434 bytes of object data]

Sample Response

HTTP/1.1 100 Continue

HTTP/1.1 200 OK

x-amz-id-2: LriYPLdmOdAiIfgSm/F1YsViT1LW94/xUQxMsF7xiEb1a0wiIOIxl+zbwZ163pt7

x-amz-request-id: 0A49CE4060975EAC

x-amz-version-id: 43jfkodU8493jnFJD9fjj3HHNVfdsQUIFDNsidf038jfdsjGFDSIRp

Date: Wed, 12 Oct 2009 17:50:00 GMT

ETag: "fbacf535f27731c9771645a39863328"

Content-Length: 0

Connection: close

Server: AmazonS3


S3: REST in Java

// Imports needed: java.io.InputStream, java.net.HttpURLConnection, java.net.URL,
// java.text.SimpleDateFormat, java.util.Date, java.util.Locale, java.util.TimeZone.
// sign(), keyId and getS3ErrorCode() are not shown on the slide and are assumed
// to be defined elsewhere in the same class.
public void createBucket() throws Exception {
    // S3 timestamp pattern.
    String fmt = "EEE, dd MMM yyyy HH:mm:ss ";
    SimpleDateFormat df = new SimpleDateFormat(fmt, Locale.US);
    df.setTimeZone(TimeZone.getTimeZone("GMT"));

    // Data needed for the signature.
    String method = "PUT";
    String contentMD5 = "";
    String contentType = "";
    String date = df.format(new Date()) + "GMT";
    String bucket = "/onjava";

    // Generate the signature over the string to sign.
    StringBuffer buf = new StringBuffer();
    buf.append(method).append("\n");
    buf.append(contentMD5).append("\n");
    buf.append(contentType).append("\n");
    buf.append(date).append("\n");
    buf.append(bucket);
    String signature = sign(buf.toString());

    // Connection to s3.amazonaws.com.
    HttpURLConnection httpConn = null;
    URL url = new URL("http", "s3.amazonaws.com", 80, bucket);
    httpConn = (HttpURLConnection) url.openConnection();
    httpConn.setDoInput(true);
    httpConn.setDoOutput(true);
    httpConn.setUseCaches(false);
    httpConn.setDefaultUseCaches(false);
    httpConn.setAllowUserInteraction(true);
    httpConn.setRequestMethod(method);
    httpConn.setRequestProperty("Date", date);
    httpConn.setRequestProperty("Content-Length", "0");
    String AWSAuth = "AWS " + keyId + ":" + signature;
    httpConn.setRequestProperty("Authorization", AWSAuth);

    // Send the HTTP PUT request.
    int statusCode = httpConn.getResponseCode();
    if ((statusCode / 100) != 2) {
        // Deal with the S3 error stream.
        InputStream in = httpConn.getErrorStream();
        String errorStr = getS3ErrorCode(in);
    }
}


S3: REST in JetS3t

String awsAccessKey = "YOUR_AWS_ACCESS_KEY";
String awsSecretKey = "YOUR_AWS_SECRET_KEY";
AWSCredentials awsCredentials = new AWSCredentials(awsAccessKey, awsSecretKey);
S3Service s3Service = new RestS3Service(awsCredentials);
S3Bucket euBucket = s3Service.createBucket("eu-bucket", S3Bucket.LOCATION_EUROPE);


Windows Azure


Azure Storage (i)

• Volatile storage:

• Instance disk

• Memory cache

• Persistent Storage:

• Windows Azure Storage:

• Blobs (objects)

• Tables

• Queues

• SQL Azure:

• Relational DB


Azure Storage (ii)

• Service is accessible via Web Services or libraries on top of these (C#, VB, Java).

• Blobs, Tables and Queues are stored in partitions.

• Partitions are the replication and load balancing unit. Blobs and queues are not sharded. Tables may be.

• All partitions have 3 replicas.

• Partitions are represented in a DFS as one or more extents (contiguous files) of up to 1GB.


Blobs

• A blob is a <name, object> pair.

• Allows storage of objects from a few bytes up to 50GB.

• Blobs are stored in containers.

• There is no hierarchy in blob storage but it can be simulated because names may contain “/”s.

• URL scheme: http://<StorageAccount>.blob.core.windows.net/<Container>/<BlobName>


Operations on Blobs

• Put: creating

• Get: reading

• Set: updating

• Delete: eliminating

• Lease: 1 minute locking.


Next Time...

• Storage in Cloud Platforms