Provenance for the Cloud (USENIX Conference on File and Storage Technologies (FAST '10)). Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer, Harvard School of Engineering and Applied Sciences.
1
PROVENANCE FOR THE CLOUD
USENIX Conference on File and Storage Technologies (FAST '10)
Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer
Harvard School of Engineering and Applied Sciences
2
Outline
  Introduction
  Background
  Provenance System Property
  Architecture & Protocol
  Evaluation
  Conclusion & Comment
3
Introduction
Problem to solve
  Implement a provenance-aware storage system on top of current cloud stores (using Amazon's services)
4
Background(1/3)
Provenance
  Data has two critical components
    What it is (contents)
    Where it came from (ancestry)
  Provenance is the description of how an object was derived
  It is the metadata that describes the history of an object
Why use provenance?
  Use case: Sloan Digital Sky Survey (SDSS)
  Debugging experimental results
  Detecting and avoiding faulty data propagation
  Improving text search results
  Security
5
6
Background(2/3)
Provenance can be defined abstractly as a directed acyclic graph (DAG)
  Nodes
    Objects: files, processes, tuples, data sets, etc.
    Have attributes
      Command line arguments
      Name and version number
  Edges
    Indicate a dependency between the objects
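
The DAG described on this slide can be sketched as a small in-memory structure. This is an illustration only, not the PASS data model: the class names, the attribute keys (`argv`, `version`), and the example file names are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One provenance node: a file, process, tuple, data set, etc."""
    name: str
    attributes: dict = field(default_factory=dict)


class ProvenanceGraph:
    """Hypothetical in-memory provenance DAG."""

    def __init__(self):
        self.nodes = {}       # name -> Node
        self.depends_on = {}  # name -> set of names it directly depends on

    def add_node(self, name, **attributes):
        self.nodes[name] = Node(name, attributes)
        self.depends_on.setdefault(name, set())

    def add_dependency(self, child, parent):
        """Record that `child` depends on `parent`; refuse edges that form a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"edge {child} -> {parent} would create a cycle")
        self.depends_on[child].add(parent)

    def ancestors(self, name):
        """All nodes reachable by following dependency edges from `name`."""
        seen, stack = set(), list(self.depends_on[name])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.depends_on[current])
        return seen


graph = ProvenanceGraph()
graph.add_node("input.txt")
graph.add_node("sort", argv="sort -u input.txt", version="1.0")
graph.add_node("output.txt")
graph.add_dependency("sort", "input.txt")
graph.add_dependency("output.txt", "sort")
```

Here `graph.ancestors("output.txt")` walks the edges back to both the process and its input, which is the ancestry information the slide refers to.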
7
[Figure: example provenance graph. Nodes: Justification Report, Data Collection Request (I1), Blood Test Request (I2), Patient Brain Death Notification (I3), Donor Data Request (I4), Donor Data (I5), Blood Test Request (I6), Blood Test Result (I7), Decision Request (I8), Donation Decision (I9). Edge labels: "is justified by", "is response to", "is caused by", "is based on".]
8
Background(3/3)
Eventual Consistency
  A weaker form of data consistency
  If no updates are sent for a sufficiently long period of time, all replicas in the system can be expected to become consistent
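
A toy simulation can make the definition concrete. The sketch below is not how any Amazon service replicates data: the ring-shaped gossip, the last-writer-wins rule, and all names are assumptions chosen to keep the example small.

```python
class Replica:
    """Hypothetical replica holding one versioned value."""

    def __init__(self):
        self.version = 0
        self.value = None

    def write(self, version, value):
        # Last-writer-wins: keep only the newest version seen so far.
        if version > self.version:
            self.version, self.value = version, value


def gossip_round(replicas):
    """Each replica pushes its state to its right-hand neighbour in a ring."""
    snapshot = [(r.version, r.value) for r in replicas]
    for i, (version, value) in enumerate(snapshot):
        replicas[(i + 1) % len(replicas)].write(version, value)


def is_consistent(replicas):
    return len({(r.version, r.value) for r in replicas}) == 1


replicas = [Replica() for _ in range(4)]
replicas[0].write(1, "v1")           # the update reaches one replica first
stale_read = replicas[2].value       # a read elsewhere can still see old data
for _ in range(len(replicas) - 1):   # no further updates; let gossip run
    gossip_round(replicas)
```

Right after the write, a read from another replica returns stale data; once gossip has run with no new updates, `is_consistent(replicas)` holds.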
9
Provenance System Property(1/2)
Provenance Data Coupling
  An object and its provenance must match
  The provenance must accurately and completely describe the data
Multi-object Causal Ordering
  Concerns the causal relationship among objects
  A system must ensure that an object's ancestors and their provenance are persistent before making the object itself persistent
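
Multi-object causal ordering amounts to persisting objects in a topological order of the provenance graph. A minimal sketch, assuming a hypothetical `persist` callback that makes one object and its provenance durable, and hypothetical file names:

```python
def persist_in_causal_order(depends_on, persist):
    """Persist every object only after all of its ancestors.

    depends_on maps an object name to the names it directly depends on.
    persist is a callback that makes one object and its provenance durable.
    """
    done = set()

    def visit(name, in_progress):
        if name in done:
            return
        if name in in_progress:
            raise ValueError("provenance graph must be acyclic")
        in_progress.add(name)
        for parent in sorted(depends_on.get(name, ())):
            visit(parent, in_progress)
        in_progress.discard(name)
        persist(name)
        done.add(name)

    for name in sorted(depends_on):
        visit(name, set())


order = []
persist_in_causal_order(
    {"report.pdf": {"plot.png", "data.csv"}, "plot.png": {"data.csv"}, "data.csv": set()},
    order.append,
)
```

In this run `data.csv` is persisted before `plot.png`, and both before `report.pdf`, so no object becomes durable ahead of an ancestor.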
10
[Figure: the example provenance graph from slide 7, repeated.]
11
Provenance System Property(2/2)
Data Independent Persistence
  The system must retain an object's provenance even if the object is removed
Efficient Query
  Provenance must be accessible to users who want to access or verify provenance properties of their data
12
Architecture(1)
13
Architecture(2) – S3
Simple Storage Service (S3)
  Amazon's storage service
  An object store where objects can range in size from 1 byte to 5 GB
  With each object, clients can store up to 2 KB of metadata
  Accessed through a SOAP or REST API
    PUT, GET, HEAD, COPY, DELETE
14
Architecture(3) - SimpleDB
SimpleDB
  An Amazon service that provides indexing and querying of data
  The data model consists of items described by <attribute, value> pairs
  Each item can have up to 256 <attribute, value> pairs
  Each attribute name and value can be as large as 1 KB
15
Architecture(4) - SQS
Simple Queue Service (SQS)
  A distributed messaging system that lets users exchange messages between the distributed components of their systems
  Messages are limited to 8 KB
  In this paper, SQS is used as a write-ahead log (WAL)
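
One consequence of the 8 KB limit is that a batch of log records may need to be spread across several messages. The sketch below shows one way to do that with greedy packing; it is an assumption for illustration, not the packing scheme used in the paper, and `pack_records` is a hypothetical helper.

```python
MESSAGE_LIMIT = 8 * 1024  # bytes; the SQS message limit quoted on this slide


def pack_records(records, limit=MESSAGE_LIMIT):
    """Greedily pack newline-joined records into messages of at most `limit` bytes."""
    messages, current, size = [], [], 0
    for record in records:
        encoded = len(record.encode("utf-8"))
        if encoded > limit:
            raise ValueError("record too large for one message; store it elsewhere")
        extra = encoded + (1 if current else 0)  # +1 for the joining newline
        if size + extra > limit:
            messages.append("\n".join(current))
            current, size = [], 0
            extra = encoded
        current.append(record)
        size += extra
    if current:
        messages.append("\n".join(current))
    return messages


records = ["r" * 3000 for _ in range(5)]
messages = pack_records(records)
```

Records larger than one message are rejected here; the deck's own answer for large objects is to store them as temporary S3 objects and log a pointer instead.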
16
Architecture(5) -- PASS
Provenance-Aware Storage System (PASS)
  A storage system that automatically collects, stores, manages, and provides search for provenance
  Monitors system calls
  Generates provenance and sends both provenance and data to PA-S3fs
17
Architecture(6) – PA-S3fs
Provenance Aware S3 File System (PA-S3fs)
  Caches data and provenance on the client to reduce traffic to S3
  Sends data and provenance to the cloud
18
Protocol(1)
19
Protocol(2)
Protocol 1 (P1): Standalone Cloud Store
  Map each file to an S3 object and store its provenance as a separate S3 object
  Provenance object
    Named with a UUID
    Contains the name of the primary object
  Primary object metadata
    Version number and UUID
20
Protocol(3)
P1 does not support data coupling
  But it can detect decoupling
Queries are inefficient
  All provenance must be retrieved
[Figure: P1 message diagram between Client and S3. Message labels: PUT:Provenance, OK, PUT:Data, OK.]
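
The way P1 can detect decoupling without preventing it can be sketched with an in-memory stand-in for the object store. `FakeS3`, `p1_write`, and `p1_is_coupled` are hypothetical names, not the real S3 API, and recording the version inside the provenance object is an assumption made so the two objects can be compared.

```python
import uuid


class FakeS3:
    """In-memory stand-in for an object store; not the real S3 API."""

    def __init__(self):
        self.objects = {}  # key -> (body, metadata)

    def put(self, key, body, metadata=None):
        self.objects[key] = (body, dict(metadata or {}))

    def get(self, key):
        return self.objects[key]


def p1_write(store, name, data, provenance, version, prov_uuid=None):
    prov_uuid = prov_uuid or str(uuid.uuid4())
    # 1. PUT the provenance object, which names its primary object.
    store.put(prov_uuid, {"primary": name, "version": version, "records": provenance})
    # 2. PUT the data; its metadata points back at the provenance object.
    store.put(name, data, {"version": version, "uuid": prov_uuid})
    return prov_uuid


def p1_is_coupled(store, name):
    """True when the data's metadata and its provenance object agree."""
    _, meta = store.get(name)
    prov, _ = store.get(meta["uuid"])
    return prov["primary"] == name and prov["version"] == meta["version"]


store = FakeS3()
pid = p1_write(store, "results.csv", b"a,b\n", ["created by sort"], version=1)
coupled_before = p1_is_coupled(store, "results.csv")
# Simulate a client crash after step 1 of the next write: only provenance lands.
store.put(pid, {"primary": "results.csv", "version": 2, "records": ["edited"]})
coupled_after = p1_is_coupled(store, "results.csv")
```

Because the two PUTs are separate requests, a crash between them leaves provenance and data at different versions; the check notices the mismatch but cannot repair it.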
21
Protocol(4)
22
Protocol(5)
Protocol 2 (P2): Cloud store with a cloud database
  Store provenance as one SimpleDB item
  If a value is larger than the 1 KB SimpleDB limit
    Store it as an S3 object
    Save a pointer to that object in the attribute-value pair
23
Protocol(6)
Provides efficient provenance queries
Does not support data coupling
[Figure: P2 message diagram with participants Client, S3, and SimpleDB. Message labels: PUT: Prov > 1KB, BatchPutAttributes: Prov, PUT:Data, each answered with OK.]
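
The spill-to-S3 rule in P2 can be sketched with plain dictionaries standing in for S3 and SimpleDB. The key scheme, the `s3-pointer:` prefix, and the `name_version` item naming are hypothetical choices for this example, not details taken from the paper.

```python
VALUE_LIMIT = 1024  # bytes; the per-value SimpleDB limit quoted in the slides


def p2_write(s3, simpledb, name, version, data, provenance):
    """Store provenance as one item, spilling oversized values to the object store."""
    item = {}
    for attribute, value in provenance.items():
        if len(value.encode("utf-8")) > VALUE_LIMIT:
            key = f"prov/{name}/{version}/{attribute}"  # hypothetical key scheme
            s3[key] = value
            item[attribute] = f"s3-pointer:{key}"
        else:
            item[attribute] = value
    simpledb[f"{name}_{version}"] = item  # stands in for BatchPutAttributes
    s3[name] = data                       # stands in for PUT:Data
    return item


def p2_read_provenance(s3, simpledb, name, version):
    """Fetch the item and follow any pointers back into the object store."""
    item = simpledb[f"{name}_{version}"]
    prefix = "s3-pointer:"
    return {a: s3[v[len(prefix):]] if v.startswith(prefix) else v
            for a, v in item.items()}


s3, simpledb = {}, {}
prov = {"type": "file", "env": "X" * 3000}
item = p2_write(s3, simpledb, "out.dat", 1, b"payload", prov)
```

Small attributes stay queryable in the database, while the 3000-byte value is replaced by a pointer that `p2_read_provenance` resolves on read.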
24
Protocol(7)
Protocol 3 (P3): Cloud store with a cloud database and messaging service
  Use SQS as a write-ahead log (WAL)
    8 KB message limit
    Store large objects as temporary S3 objects and record a pointer to them in the WAL
  Commit daemon
    Reads the log records
    Assembles all the records belonging to a transaction
    Ignores a transaction's records if the client crashed before completing it
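
The commit daemon's job of assembling transactions and skipping incomplete ones can be sketched over a plain list standing in for the queue. The record fields (`txn`, `seq`, `total`, `payload`) and the `apply` callback are hypothetical; the paper's actual log format is not reproduced here.

```python
from collections import defaultdict


def commit_pending(log, apply):
    """Commit every transaction whose records are all present in the log.

    Each record is a dict with hypothetical fields: txn (transaction id),
    seq (position within the transaction), total (records in the transaction)
    and payload. Records of incomplete transactions are left untouched.
    """
    by_txn = defaultdict(list)
    for record in log:
        by_txn[record["txn"]].append(record)

    committed = []
    for txn, records in by_txn.items():
        seqs = {r["seq"] for r in records}
        if seqs != set(range(records[0]["total"])):
            continue  # client crashed mid-transaction: ignore these records
        for record in sorted(records, key=lambda r: r["seq"]):
            apply(record["payload"])
        committed.append(txn)

    # Remove committed records from the log, as the daemon deletes its messages.
    log[:] = [r for r in log if r["txn"] not in committed]
    return committed


log = [
    {"txn": "t1", "seq": 1, "total": 2, "payload": "prov: out.dat <- blast"},
    {"txn": "t2", "seq": 0, "total": 3, "payload": "temp: tmp/xyz"},
    {"txn": "t1", "seq": 0, "total": 2, "payload": "temp: tmp/abc"},
]
applied = []
committed = commit_pending(log, applied.append)
```

Transaction `t1` arrives out of order but complete, so it is applied in sequence; `t2` is missing records and is skipped, which is how the log provides coupling across a client crash.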
25
[Figure: P3 message diagram with participants Client, S3, SQS, SimpleDB, and Commitd (the commit daemon). Message labels: PUT: Temp data copy, SendMessage: Prov, RecvMessage, PUT:Prov>1KB, BatchPutAttributes, Copy:Data, Delete:temp, Delete:Msg, each answered with OK.]
26
Protocol(9)
27
Evaluation(1)
Workload
  CVSROOT nightly backup
    I/O intensive
    240 operations
  Blast
    Mix of compute and I/O operations
    Provenance tree has a depth of 5
    10773 operations
  Challenge
    Mix of compute and I/O operations
    Provenance tree has a depth of 11
    6179 operations
28
Evaluation(2)
[Figure: results with panels labeled "EC2 instance" and "Local machine".]
29
Evaluation(3)
Query performance
  Q1: Retrieve all the provenance ever recorded
  Q2: Retrieve the provenance of all versions of one object
  Q3: Find all files that were directly output by Blast
  Q4: Find all the descendants of files derived from Blast
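
Queries in the style of Q3 and Q4 can be sketched over a tiny in-memory graph. The node names and the dictionary layout are made up for the example, and the functions are not the SimpleDB queries used in the evaluation.

```python
# Hypothetical graph: node name -> kind, and node name -> direct dependencies.
nodes = {
    "blast": "process", "hits.out": "file", "filter": "process",
    "top.out": "file", "report.txt": "file", "notes.txt": "file",
}
depends_on = {
    "hits.out": {"blast"}, "filter": {"hits.out"},
    "top.out": {"filter"}, "report.txt": {"top.out"}, "notes.txt": set(),
}


def direct_outputs(process):
    """Q3-style: files with a direct dependency on the given process."""
    return {n for n, parents in depends_on.items()
            if nodes[n] == "file" and process in parents}


def descendants(roots):
    """Q4-style: everything that transitively depends on any root."""
    found, frontier = set(), set(roots)
    while frontier:
        frontier = {n for n, parents in depends_on.items()
                    if parents & frontier and n not in found}
        found |= frontier
    return found


q3 = direct_outputs("blast")
q4 = descendants(q3)
```

The Q3-style lookup is a single pass over the edges, while the Q4-style lookup repeats that pass once per level of the graph until no new descendants appear.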
30
Evaluation(4)
31
Conclusion
  Definition of the properties that provenance systems must exhibit
  Design and implementation of three protocols for storing provenance and data on the cloud
  All three protocols have reasonable overhead in time and minimal financial overhead
32
Comment
Economy
  Provenance cannot increase profit directly
  Customer loyalty
Security
  Provenance can ensure the correctness of files
  But it may contain sensitive information
33
THE END