Making Cloud Storage Provenance-Aware
Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer
Harvard School of Engineering and Applied Sciences

2/23/2009 Making a cloud Provenance-Aware - TaPP'09

The Cloud

- Next-generation computing environment
- Cheap: pay as you go
  - Provision resources (storage, CPU) on an as-needed basis
- Provides the illusion of infinite resources
  - Companies with large batch-oriented tasks can get results quickly
- Cloud providers
  - Amazon Web Services (AWS)
  - Google AppEngine
  - Microsoft Azure

Provenance for the Cloud

- As apps move to the cloud, so will the data
  - Amazon hosts scientific data for free
- However, most cloud services are not designed to store provenance
- Why provenance?
  - Debug application results
  - Validate data sets
  - Improve search results
  - Regulatory compliance

Provenance Properties

We identified the following properties:
- Read Correctness
- Causal Ancestry Ordering
- Queryable

Read Correctness

- Data must be what is described by provenance
- Provenance accurately describes the data object
- Mechanisms
  - Atomicity: At storage time, both provenance and data should be stored or neither should be stored
  - Consistency: At retrieval time, data returned should be consistent with provenance

Causal Ancestry Ordering

- The provenance and data of an ancestor object must be recorded in the provenance system
- No dangling references
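A minimal Python sketch of what a "no dangling references" check could look like. The `records` mapping and the function name are illustrative assumptions, not part of PASS or of any AWS API.

```python
def dangling_references(records):
    """Return ancestors that are referenced but have no record of their own.

    records maps each recorded object id to the list of ids it depends on.
    """
    recorded = set(records)
    return {
        ancestor
        for ancestors in records.values()
        for ancestor in ancestors
        if ancestor not in recorded
    }


# B depends on P, and P depends on A; every ancestor is recorded.
complete = {"A": [], "P": ["A"], "B": ["P"]}
# B was recorded before its ancestor P, so the reference to P dangles.
incomplete = {"B": ["P"]}
```

An empty result means the ordering property holds for the recorded set; any returned id names an ancestor that should have been stored first.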

Efficient Query

- Provenance must be accessible to users who want to verify properties of their data or simply be aware of its lineage
- If provenance is not readily accessible, the provenance is of questionable value

Goal

- How do we design protocols around current cloud services such that these properties are satisfied?
- Setting
  - Provenance-Aware Storage System (PASS) tracks and collects provenance
  - Primarily considered AWS
    - Used 3 services: S3, SimpleDB, SQS

Outline

- Introduction
- PASS Background
- Protocol 1: Standalone S3
- Protocol 2: S3 + SimpleDB
- Protocol 3: S3 + SimpleDB + SQS
- Analysis
- Conclusion and Status

Provenance-Aware Storage System

- Observes system calls that applications make and captures relationships between objects
  - P: read A
    - Generates record: P depends on A
    - Cache the record
  - P: write B
    - Generates record: B depends on P
    - Store both ‘B depends on P’ and ‘P depends on A’
- Mirrors data locally and caches provenance until we need to send it to AWS
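The read/write example can be sketched in Python as follows. This is only an illustration of the record flow described on the slide; the class and method names are hypothetical and do not reflect how PASS is actually implemented (PASS observes system calls).

```python
class ProvenanceTracker:
    """Toy model: cache records on read, store them on write."""

    def __init__(self):
        self.cache = []   # records generated but not yet stored
        self.stored = []  # records that have been written out

    def on_read(self, process, obj):
        # "P: read A" generates "P depends on A" and caches it.
        self.cache.append((process, "depends on", obj))

    def on_write(self, process, obj):
        # "P: write B" generates "B depends on P", then stores it
        # together with every cached ancestor record.
        self.cache.append((obj, "depends on", process))
        self.stored.extend(self.cache)
        self.cache.clear()


tracker = ProvenanceTracker()
tracker.on_read("P", "A")
records_after_read = list(tracker.stored)
tracker.on_write("P", "B")
```

Storing the cached ancestor record together with the new one is what keeps the written object's ancestry complete.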

Simple Storage Service (S3)

- Object store: sizes from 1 byte to 5GB
- Objects identified by URI
- SOAP or REST interface
- Operations: PUT, GET, HEAD, COPY, DELETE
  - PUT: stores an object and its metadata (2KB limit)
  - HEAD: retrieves the metadata of an object
- Cost: data storage + bandwidth + num ops
- Eventual consistency

Architecture 1: Standalone S3

[Diagram: Application, PASS, and S3, with User and System labels; PASS sends Prov+Data to S3.]

Protocol 1: Standalone S3

[Sequence diagram, participants: PASS, S3. Messages shown: PUT:(Prov >1KB), then PUT:Data, each acknowledged with OK.]

Properties

[Table: columns Arch | Read Correctness (Atomicity, Consistency) | Causal Ordering | Efficient Query; row: S3.]

SimpleDB

- Service providing database functionality
- Data model: items described by attribute-value pairs
  - 256 attrs maximum, name/value < 1KB
- Operations: PutAttributes, Query, QueryWithAttributes, and SELECT
  - Query returns items
  - QueryWithAttributes returns both items and attributes
- Cost: bandwidth + storage + num ops + machine hrs

Architecture 2: S3 + SimpleDB

[Diagram: Application, PASS, S3, and SimpleDB, with User and System labels; arrows labeled Data and Prov.]

Protocol 2: S3 + SimpleDB

[Sequence diagram, participants: PASS, S3, SimpleDB. Messages shown: PUT:(rec > 1KB), PutAttrs+, then PUT:Data, each acknowledged with OK.]

Properties

[Table: columns Arch | Read Correctness (Atomicity, Consistency) | Causal Ordering | Efficient Query; rows: S3, SimpleDB.]

Simple Queue Service (SQS)

- Distributed messaging system
- Queues are identified by URL
- Operations: SendMessage, ReceiveMessage, DeleteMessage
- VisibilityTimeout: a message will not be available for x seconds after a ReceiveMessage
- Limits: 8KB message size, max 10 msgs can be received

Architecture 3: S3 + SimpleDB + SQS

[Diagram: Application, PASS, S3, SimpleDB, and Queue1, with User and System labels; arrows labeled Data and Prov.]

Protocol 3: S3 + SimpleDB + SQS

[Sequence diagram, participants: PASS, SQS, SimpleDB, Commitd, S3. Messages shown: PUT: Temp copy, SndMsg+, RecvMsg+, PutAttrs+, COPY, then DelMsg+ and DEL:CPY, each acknowledged with OK.]

Idempotency

- SimpleDB, S3, and SQS are idempotent
- If a commit daemon crashes, comes back up and processes a transaction again, there will not be errors
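A small Python sketch of why replaying a commit is harmless when every step overwrites with the same values. The two dicts stand in for SimpleDB and S3, and all names here are hypothetical; no real AWS calls are made.

```python
def commit(simpledb, s3, txn):
    """Replay-safe commit: each step overwrites with the same values."""
    simpledb[txn["item"]] = dict(txn["attrs"])   # stands in for PutAttributes
    s3[txn["final_key"]] = s3[txn["temp_key"]]   # stands in for S3 COPY


simpledb = {}
s3 = {"tmp/out.txt": b"result"}
txn = {
    "item": "out.txt_1",
    "attrs": {"input": "A"},
    "temp_key": "tmp/out.txt",
    "final_key": "out.txt",
}

commit(simpledb, s3, txn)
after_first = (dict(simpledb), dict(s3))
commit(simpledb, s3, txn)  # simulated re-run after a daemon crash
after_second = (dict(simpledb), dict(s3))
```

The state after the second run equals the state after the first, which is the property the slide relies on.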

Properties

[Table: columns Arch | Read Correctness (Atomicity, Consistency) | Causal Ordering | Efficient Query; rows: S3, SimpleDB, SQS.]

Analysis

- Extracted provenance by running three workloads
  - Linux compile
  - Blast
  - Provenance challenge
- Compute cost to store and query provenance
  - Number of ops
  - Bandwidth

Storage Cost

        Raw      P1               P2                 P3
Data    1.27GB   121.8MB (9.3%)   167.8MB (13.6%)    421.4MB (32.2%)
Ops     31,180   24,952 (80.0%)   168,514 (540.5%)   231,287 (741.7%)

P1 = S3
P2 = S3 + SimpleDB
P3 = S3 + SimpleDB + SQS

Query Cost

1. Dump the provenance of a given object
   - Ran it on all objects for statistical significance
2. Find all the files that were outputs of blast.
3. Find all the descendants of files derived from blast.

Query results

Query   S3 Data   S3 Ops   SimpleDB Data   SimpleDB Ops
1       121.8MB   56,132   51.24MB         71,825
2       121.8MB   56,132   2.8KB           6
3       121.8MB   56,132   13.8KB          31

Conclusions

- Identified the properties that need to be satisfied for storing provenance in the cloud
- Presented various protocols for storing provenance and data on the cloud
- The cost of storing provenance is reasonable

Status

- System almost ready
- Plan to submit it to the Symposium on Operating Systems Principles (SOSP)
- Really hard to drive up the cost
  - Jan bill = $1.95
  - Feb bill = $9.38

Extra

Protocol 1: Standalone S3

On file close:
- Convert the provenance into attribute-value pairs as required by S3
- If (sizeof(record) > 1KB)
  - Store the record in a separate S3 object
  - Replace attribute-value pair with pointer to this object
- Upload the file using PUT
  - Arguments: object, attributes
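The steps above can be sketched in Python against an in-memory stand-in for S3. `FakeS3`, `protocol1_close`, the overflow key naming, and the `s3:` pointer prefix are all assumptions made for illustration; the slide does not specify how pointers are encoded.

```python
LIMIT = 1024  # 1KB threshold from the slide


class FakeS3:
    """In-memory stand-in for S3: key -> (body, metadata)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body, metadata=None):
        self.objects[key] = (body, dict(metadata or {}))


def protocol1_close(s3, key, data, provenance):
    """On file close: store provenance as object metadata, then PUT the file."""
    attrs = {}
    for name, value in provenance.items():
        if len(value.encode()) > LIMIT:
            # Record too large: store it as its own object and keep a pointer.
            overflow_key = f"{key}.prov.{name}"
            s3.put(overflow_key, value.encode())
            attrs[name] = "s3:" + overflow_key
        else:
            attrs[name] = value
    # One PUT carries both the data and its provenance attributes.
    s3.put(key, data, attrs)
```

Because data and attributes travel in a single PUT, they are stored together; only oversized records need a separate object.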

Protocol 2: S3 + SimpleDB

On file close:
1. Convert the provenance into attribute-value pairs as required by SimpleDB
   - Additional record: md5sum of (file contents + version)
2. If (sizeof(record) > 1KB)
   - Store the record in a separate S3 object
   - Replace attribute-value pair with pointer to this object
3. Issue PutAttributes: store the provenance
   - One item (= one PutAttributes) per version of the object
4. Upload the file to S3 using PUT
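A hedged Python sketch of these four steps, using plain dicts in place of S3 and SimpleDB. The function names, the item naming scheme, and the way contents and version are combined before hashing (simple concatenation) are assumptions. The `is_consistent` helper is likewise an assumed use of the md5 record for a read-side check, not something the slide spells out.

```python
import hashlib

LIMIT = 1024  # 1KB per-value threshold from the slide


def content_digest(data, version):
    """md5 over file contents plus version (concatenation is an assumption)."""
    return hashlib.md5(data + str(version).encode()).hexdigest()


def protocol2_close(s3, simpledb, key, version, data, provenance):
    attrs = {"md5": content_digest(data, version)}    # step 1: extra record
    for name, value in provenance.items():
        if len(value.encode()) > LIMIT:                # step 2: overflow to S3
            overflow_key = f"{key}.{version}.prov.{name}"
            s3[overflow_key] = value.encode()
            attrs[name] = "s3:" + overflow_key
        else:
            attrs[name] = value
    simpledb[f"{key}_{version}"] = attrs               # step 3: one item per version
    s3[key] = data                                     # step 4: upload the file


def is_consistent(s3, simpledb, key, version):
    """Assumed read-side check: does the stored data match its provenance item?"""
    item = simpledb.get(f"{key}_{version}")
    if item is None or key not in s3:
        return False
    return item["md5"] == content_digest(s3[key], version)
```

With the provenance and the data in two services, a digest of this kind is one way to detect that the data fetched from S3 is not the version the provenance item describes.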

Protocol 3: S3 + SimpleDB + SQS (I)

Log phase: Log data on a queue
1. Store a copy of the file in a temporary location on S3
2. Allocate a transaction ID (uuid)
3. Split provenance into chunks of 8KB and enqueue them on an SQS queue
   - Tag each message with the transaction ID
   - One additional record that has a pointer to the temp S3 object
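The log phase can be sketched in Python with a dict for S3 and a list for the SQS queue. The message layout (`txn`, `seq`, `body`, `temp_key`, `chunks`) and the `tmp/` key prefix are hypothetical, and the sketch ignores the space the tag itself takes up inside an 8KB message.

```python
import uuid

CHUNK = 8 * 1024  # 8KB SQS message limit from the slides


def log_phase(s3, queue, key, data, provenance_bytes):
    """Log a file and its provenance; returns the transaction id."""
    temp_key = f"tmp/{key}"
    s3[temp_key] = data                          # 1. temporary copy on S3
    txn_id = str(uuid.uuid4())                   # 2. transaction id
    chunks = [
        provenance_bytes[i:i + CHUNK]
        for i in range(0, len(provenance_bytes), CHUNK)
    ]
    for seq, body in enumerate(chunks):          # 3. enqueue tagged chunks
        queue.append({"txn": txn_id, "seq": seq, "body": body})
    # One additional record pointing at the temporary S3 object.
    queue.append({"txn": txn_id, "temp_key": temp_key, "chunks": len(chunks)})
    return txn_id
```

Recording the chunk count in the pointer message is one way to let a consumer tell when it has seen every message of a transaction.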

Protocol 3: S3 + SimpleDB + SQS (II)

Commit phase: move data from SQS to S3 and SimpleDB
1. ReceiveMessage: get messages from the queue and assemble the packets
2. Store the provenance in SimpleDB using PutAttributes call
   - Take care of overflows
3. Execute an S3 COPY and copy the object from its temporary location to permanent
4. Delete messages from SQS
5. Delete temporary file copy
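A Python sketch of the commit phase over the same kind of in-memory stand-ins (dicts for S3 and SimpleDB, a list for the queue). The message fields and function name are hypothetical, and the sketch stores the reassembled provenance as a single attribute, so the overflow handling from step 2 is deliberately left out.

```python
def commit_phase(s3, simpledb, queue, txn_id, final_key, item_name):
    """Move one logged transaction from the queue into SimpleDB and S3."""
    # 1. Receive the transaction's messages and reassemble the chunks in order.
    msgs = [m for m in queue if m["txn"] == txn_id]
    pointer = next(m for m in msgs if "temp_key" in m)
    parts = sorted((m for m in msgs if "body" in m), key=lambda m: m["seq"])
    provenance = b"".join(m["body"] for m in parts)
    # 2. Store the provenance (stand-in for PutAttributes).
    simpledb[item_name] = {"provenance": provenance.decode()}
    # 3. Copy the object from its temporary location to the permanent one.
    s3[final_key] = s3[pointer["temp_key"]]
    # 4. Delete the transaction's messages from the queue.
    queue[:] = [m for m in queue if m["txn"] != txn_id]
    # 5. Delete the temporary copy.
    s3.pop(pointer["temp_key"], None)
```

Cleanup comes last, so a crash before step 4 leaves both the messages and the temporary copy in place for a retry.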