Ubiquitous Data Access Doppalapudi Raghu Chaitanya Jaliparthi Gangadhar



Outline

Ubiquitous Data
History: NFS, AFS
CODA File System
Cedar
LBFS
Operation Shipping for Mobile File Systems
Data Staging on Untrusted Surrogates
Portable Soul Pads
Portable & Distributed Storage
GFS
Conclusion


Ubiquitous Data

“In ten years, billions of people will be using the Web, but a trillion "gizmos" will also be connected to the Web.” Asilomar Rep. on DB Research, Dec. 1998

“Fundamentally, the ability to access all information from anywhere and have ONE unified and synchronized information repository is critical to making appliances useful.”

Ubiquitous data access will put existing data management techniques to the test, in all aspects – searching, location, reliability, consistency, …

Ubiquitous Data Access: State of the Art

Everyone uses a database system and/or search engine every day, although they may not realize it (the true test of “ubiquity”).

The Internet and WWW have become a ubiquitous means of global data dissemination and exchange.

Databases play a crucial but largely invisible role here.

XML and related standards are enabling increasingly sophisticated interoperation.

Wireless access provides anytime-anywhere access and enables location-centric applications.

Characteristics of Ubiquitous Data Systems

functionality
scalability
serializability
optimality
interoperability
personalization
globalization
synchronization
flow regulation
integration

History

NFS (Sun Microsystems, 1985)

NFS allows a computer attached to a network to access the file systems stored on the disks of another computer on the network.

AFS (Andrew File System)

AFS was developed at CMU.
AFS has many benefits in the areas of security and scalability.
AFS uses Kerberos for authentication.
Read and write operations on an open file are directed only to the locally cached copy.
When a modified file is closed, the changed portions are copied back to the file server.
Cache consistency is maintained by a mechanism called callback.
AFS influenced many of today’s distributed file systems, such as CODA.
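The caching and callback behaviour described above can be illustrated with a small, single-process sketch. It is not AFS code: `FileServer`, `AfsClient` and their methods are hypothetical names, files are plain byte strings, Kerberos and the network are omitted, and for simplicity the whole file (not only the changed portions) is copied back on close.

```python
class FileServer:
    """Holds master copies and remembers which clients hold a callback promise."""

    def __init__(self):
        self.files: dict[str, bytes] = {}
        self.callbacks: dict[str, set] = {}  # path -> clients caching that file

    def fetch(self, client, path):
        # Hand out the file and promise to notify this client if it changes.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, client, path, data):
        self.files[path] = data
        # Break callbacks: every other caching client must drop its copy.
        for other in self.callbacks.get(path, set()) - {client}:
            other.break_callback(path)
        self.callbacks[path] = {client}


class AfsClient:
    def __init__(self, server):
        self.server = server
        self.cache: dict[str, bytearray] = {}
        self.dirty: set[str] = set()

    def open(self, path):
        # Contact the server only if no valid cached copy exists.
        if path not in self.cache:
            self.cache[path] = bytearray(self.server.fetch(self, path))

    def read(self, path):
        return bytes(self.cache[path])  # served from the local copy

    def write(self, path, data):
        self.cache[path][:] = data      # modifies the local copy only
        self.dirty.add(path)

    def close(self, path):
        # Modified files are copied back to the server on close.
        if path in self.dirty:
            self.server.store(self, path, bytes(self.cache[path]))
            self.dirty.discard(path)

    def break_callback(self, path):
        self.cache.pop(path, None)      # next open() refetches
```

For example, if two clients open the same file and one writes and closes it, the server breaks the other client's callback, so its next `open` fetches the new contents.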


CODA

CODA File System

CODA is a network file system that achieves high availability using two techniques: server replication and disconnected operation.

Disconnected operation is the mode of operation that enables a client to continue accessing critical data during temporary failures of network connectivity.

Server replication involves maintaining read-write replicas at more than one server. The set of replication sites for a volume is its volume storage group (VSG).

The main idea behind disconnected operation is to use caching of data to improve availability.

Design

On each client, a user-level process called Venus manages a file cache on the local disk. It is Venus that bears the brunt of disconnected operation.


Venus States

Venus operates in three states

Hoarding

Emulation

Reintegration
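The three Venus states can be viewed as a small state machine. The sketch below is an illustration under our own assumptions: the event names are hypothetical, and the transition from reintegration back to emulation (when the link drops again mid-reintegration) is an assumption rather than something stated on these slides.

```python
from enum import Enum, auto


class VenusState(Enum):
    HOARDING = auto()
    EMULATION = auto()
    REINTEGRATION = auto()


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (VenusState.HOARDING, "disconnect"): VenusState.EMULATION,
    (VenusState.EMULATION, "reconnect"): VenusState.REINTEGRATION,
    (VenusState.REINTEGRATION, "reintegrated"): VenusState.HOARDING,
    # Assumption: losing the link during reintegration falls back to emulation.
    (VenusState.REINTEGRATION, "disconnect"): VenusState.EMULATION,
}


def next_state(state: VenusState, event: str) -> VenusState:
    """Return the next Venus state; unlisted events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

A full disconnect, reconnect and reintegrate cycle returns Venus to the hoarding state.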

Hoarding

Venus is in this state when there is good connectivity between client and server.

In this state, Venus hoards useful data in anticipation of disconnection.

It must predict which files will be used later and prefetch them for disconnected operation.

Hoard walking: Venus keeps the client cache in equilibrium, caching high-priority files for high availability, and periodically restores equilibrium by performing a hoard walk.
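A hoard walk can be sketched as re-ranking cached files by priority and keeping the best ones that fit. This is a simplified illustration, not Venus's actual algorithm: the blend of hoard-profile priority with recency, the `weight` of 0.75 and the scaling of the recency term are all assumptions, and the sketch only decides which files should be cached (fetching missing ones is left out).

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    path: str
    size: int             # bytes
    hoard_priority: int   # from the user's hoard profile (0 if not listed)
    last_ref: int         # logical time of the most recent reference


def priority(entry: CacheEntry, now: int, weight: float = 0.75) -> float:
    """Blend explicit hoard priority with recency of use (weighting is assumed)."""
    recency = 100.0 / (1 + now - entry.last_ref)
    return weight * entry.hoard_priority + (1 - weight) * recency


def hoard_walk(entries: list[CacheEntry], capacity: int, now: int) -> list[str]:
    """Restore equilibrium: keep the highest-priority files that fit in the cache."""
    kept, used = [], 0
    for entry in sorted(entries, key=lambda e: priority(e, now), reverse=True):
        if used + entry.size <= capacity:
            kept.append(entry.path)
            used += entry.size
    return kept
```

With this weighting, a file named in the hoard profile outranks a recently used scratch file, which is the intent of hoarding for disconnection.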

Emulation

When the client is disconnected from the server (or only very weakly connected), Venus acts as a pseudo-server and assumes full responsibility for access.

When a client asks for a file, Venus provides it if it is stored in the cache.

If the requested file is not in the cache, Venus reports an error to the application rather than servicing it as a normal cache miss.

Logging: during emulation, Venus records enough information to replay update activity when it reintegrates.
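Emulation and logging can be sketched as a cache that refuses uncached reads and records every mutation in an ordered log. The names `EmulatingVenus`, `FileUnavailableError` and the `(operation, path, data)` record format are hypothetical; the rule that a later store or remove of a file makes earlier stores of it redundant is an assumed log optimization that keeps the log short.

```python
class FileUnavailableError(Exception):
    """Raised when a disconnected client asks for a file that was not hoarded."""


class EmulatingVenus:
    def __init__(self, hoarded: dict[str, bytes]):
        self.cache = dict(hoarded)   # files cached before disconnection
        self.replay_log = []         # (operation, path, data) in order

    def read(self, path):
        if path not in self.cache:
            # No server to fall back on, so the miss surfaces as an error.
            raise FileUnavailableError(path)
        return self.cache[path]

    def _drop_stores(self, path):
        # Log optimization: earlier stores of this file are now redundant.
        self.replay_log = [rec for rec in self.replay_log
                           if not (rec[0] == "store" and rec[1] == path)]

    def store(self, path, data):
        self.cache[path] = data
        self._drop_stores(path)
        self.replay_log.append(("store", path, data))

    def remove(self, path):
        self.cache.pop(path, None)
        self._drop_stores(path)
        self.replay_log.append(("remove", path, None))
```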

Reintegration

Venus enters this state when network connectivity between client and server is restored.

Reintegration is a transitory state through which Venus passes as it changes roles from pseudo-server to cache manager.

Venus propagates the changes made during emulation and updates its cache to reflect the current server state.

Conflict handling
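Reintegration with conflict detection can be sketched as replaying the log while checking that each file on the server is still at the version the client cached. Per-file integer version counters are a simplification we assume here; Coda's real certification of updates is more involved. The function and parameter names are hypothetical, and log records use an `(operation, path, data)` format.

```python
def reintegrate(replay_log, server_files, server_versions, cached_versions):
    """Replay a disconnected client's log against the server.

    cached_versions maps each path to the server version the client had
    cached before disconnecting. If the server's version has since changed,
    another client updated the file, so the operation is reported as a
    conflict instead of being applied.
    """
    seen = dict(cached_versions)
    conflicts = []
    for op, path, data in replay_log:
        if server_versions.get(path, 0) != seen.get(path, 0):
            conflicts.append(path)           # left for conflict handling
            continue
        if op == "store":
            server_files[path] = data
        elif op == "remove":
            server_files.pop(path, None)
        server_versions[path] = server_versions.get(path, 0) + 1
        seen[path] = server_versions[path]   # our own update is not a conflict
    return conflicts
```

Non-conflicting updates are applied; conflicting ones are returned so that a separate conflict-handling step (manual or automatic) can deal with them.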

Drawbacks

Updates are not visible to other clients
Cache misses may impede progress
Exhaustion of cache space is a concern
Update conflicts become more likely
Updates are at risk due to theft, loss or damage

Google Gears


Cedar

Cedar

Mobile database access over low-bandwidth networks
Relational databases are at the core of business processes
Cedar is useful for mobile commerce, traveling salespeople and disaster recovery
A stale client replica can be used to reduce data transmission volume
Basics of databases


Cedar Architecture

Content Addressable Storage (CAS)

Stores information so that it can be retrieved based on its content.
The system records a content address: an identifier uniquely and permanently linked to the information content itself.
A request to retrieve information from a CAS system must provide the content identifier, from which the system can determine the physical location of the data and retrieve it.
Any change to a data element necessarily changes its content address.
A CAS device does not permit editing information once it has been stored.
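The CAS properties listed above can be shown with a minimal in-memory store that uses a cryptographic hash as the content address. Choosing SHA-256 is our assumption, and `ContentAddressableStore` is a hypothetical name. There is deliberately no update method: changing the data means storing new content under a new address.

```python
import hashlib


class ContentAddressableStore:
    def __init__(self):
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its content address (SHA-256 hex digest)."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(address, data)  # identical content stored once
        return address

    def get(self, address: str) -> bytes:
        """Retrieve data by content address; raises KeyError if unknown."""
        return self._blobs[address]
```

Storing `b"hello"` twice yields the same address, while `b"hello!"` gets a different one, matching the rule that any change to a data element changes its address.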


Cedar Protocol

Transparency of Cedar

Application transparency
Database transparency
Adaptive interposition
Commonality detection
Exploiting structure in data
Generating compact CAS descriptions
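Commonality detection with compact CAS descriptions can be illustrated roughly as follows. We assume result rows are tuples, that the server describes its query result only by row hashes, and that the client has run the same query on its stale replica; rows whose hashes match are reused and only the rest are fetched. This is a simplified sketch of the idea, not Cedar's actual protocol: `row_hash`, `describe_result`, `reconstruct` and the hashing of `repr(row)` are hypothetical choices.

```python
import hashlib


def row_hash(row) -> str:
    # Hash a canonical text form of the row (the encoding is an assumption).
    return hashlib.sha1(repr(row).encode()).hexdigest()


def describe_result(server_rows):
    """Server side: a compact CAS description is the list of row hashes."""
    return [row_hash(r) for r in server_rows]


def reconstruct(description, local_rows, fetch_missing):
    """Client side: rebuild the result, fetching only rows absent locally."""
    local = {row_hash(r): r for r in local_rows}
    missing = [h for h in description if h not in local]
    fetched = fetch_missing(missing) if missing else {}
    return [local[h] if h in local else fetched[h] for h in description]
```

Because the server's description is authoritative, stale local rows simply fail to match and are replaced by fresh ones, so the replica saves bandwidth without weakening consistency.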

Creating and Refreshing Client Replicas

Hoard granularity
Database hoard profiles
Tools for handling
Refreshing stale client replicas


Results of Cedar

Drawbacks of Cedar

LBFS: Low-Bandwidth Network File System

LBFS: Low-Bandwidth Network File System

A network file system designed to use the network efficiently over low-bandwidth links.

LBFS exploits similarities between files, or between versions of the same file, to save bandwidth.

It avoids sending data over the network when the same data can already be found in the server's file system or the client's cache.

It is applied together with conventional compression and caching to improve performance.

Design

The LBFS server divides the files it stores into chunks and indexes the chunks by hash value.

The client similarly indexes a large persistent file cache.

When transferring a file, LBFS identifies the chunks the recipient already has and avoids sending them over the network.
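The chunking idea can be sketched as content-defined chunking. A fingerprint is computed over a sliding 48-byte window, and a boundary is declared where its low 13 bits equal a fixed value, giving chunks of about 8 KB on average, bounded by a 2 KB minimum and a 64 KB maximum; chunks are indexed by SHA-1. These parameters follow the LBFS design, but this sketch substitutes a simple polynomial rolling hash for LBFS's Rabin fingerprints, and `MAGIC` and the function names are our own hypothetical choices.

```python
import hashlib

WINDOW = 48                  # bytes covered by the rolling fingerprint
MASK = (1 << 13) - 1         # low 13 bits -> roughly one boundary per 8 KB
MAGIC = 0x78                 # boundary value for the low bits (arbitrary)
MIN_CHUNK = 2 * 1024
MAX_CHUNK = 64 * 1024
BASE, MOD = 257, 1 << 32     # polynomial rolling hash (stand-in for Rabin)
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def chunk_ends(data: bytes) -> list[int]:
    """Return the end offset of each content-defined chunk."""
    ends, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        length = i + 1 - start
        natural = i + 1 >= WINDOW and (h & MASK) == MAGIC
        if (natural and length >= MIN_CHUNK) or length >= MAX_CHUNK:
            ends.append(i + 1)
            start = i + 1
    if start < len(data):
        ends.append(len(data))
    return ends


def split(data: bytes) -> list[bytes]:
    chunks, prev = [], 0
    for end in chunk_ends(data):
        chunks.append(data[prev:end])
        prev = end
    return chunks


def chunks_to_send(data: bytes, receiver_index: set[str]) -> list[bytes]:
    """Only chunks whose SHA-1 the receiver has not indexed cross the network."""
    return [c for c in split(data)
            if hashlib.sha1(c).hexdigest() not in receiver_index]
```

Because boundaries depend on content rather than on file offsets, inserting bytes in the middle of a file usually changes only the chunk containing the edit; later boundaries realign, so the receiver already holds most chunks.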


Reading a file in LBFS


Observations

Drawbacks

Identical files appear different when encrypted differently, so LBFS cannot exploit their similarity.
Synchronization problems arise with different chunk sizes.
Useful only when there is at least some commonality between files.


Operation Shipping

Operation Shipping for Mobile File Systems

How can an updated large file be propagated from a weakly connected client to its server?

Operation shipping (operation-based update propagation) can be used to solve this problem.

In contrast, value shipping transmits the updated file data itself.

Operation shipping

The user operation is sent to a surrogate client that is strongly connected to the server.

The surrogate replays the user operation, regenerates the files, checks whether they are identical to the original files and, if so, sends the files to the server on behalf of the client.

Forward error correction is used to repair minor re-execution discrepancies.
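The replay-and-verify step can be sketched as follows. We model the user operation as a deterministic Python function over input files (an assumption; the real system replays user commands), have the client send a fingerprint of its own output along with the operation, and omit the forward error correction step entirely. `fingerprint` and `surrogate_replay` are hypothetical names.

```python
import hashlib


def fingerprint(files: dict[str, bytes]) -> str:
    """Digest of a set of output files, sent by the client with the operation."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode() + b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


def surrogate_replay(operation, inputs, expected):
    """Re-execute a shipped operation on the surrogate and verify its output.

    Returns the regenerated files if their fingerprint matches the client's,
    meaning they can be sent to the server on the client's behalf; returns
    None otherwise, meaning the client must fall back to value shipping.
    """
    regenerated = operation(dict(inputs))
    return regenerated if fingerprint(regenerated) == expected else None
```

An operation whose output embeds a per-run value (such as a timestamp) fails verification, which is exactly the side-effect case where the client must fall back to value shipping.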


Operation shipping

Observations:

Network traffic reduced by factors of 12 to 400
Speedups ranging from 1.4 to nearly 50 times
Correctness of the re-executed files is ensured
May not be feasible when the surrogate does not support the user operation
Some side effects make the re-executed file differ from the original; in such cases we have to fall back to value shipping.


Data Staging on Untrusted Surrogates

Data Staging on Untrusted Surrogates

How can untrusted computers be used to facilitate secure mobile data access?

Data staging can improve the performance of distributed file systems.

Data staging opportunistically prefetches files and caches them on nearby surrogates.

Surrogates are untrusted and unmanaged: we use end-to-end encryption and secure hashes to provide privacy and authenticity of data.

Results show a 54% reduction in average latency.
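The authenticity half of this scheme can be sketched as a client that accepts data from a surrogate only if its hash matches a digest obtained from the trusted file server. This is an illustration under stated assumptions: the digests are assumed to arrive over a secure channel, SHA-256 is our choice, `fetch_staged` is a hypothetical name, and the encryption that provides privacy is omitted (in a real system the surrogate would hold only ciphertext).

```python
import hashlib


def fetch_staged(name, trusted_digests, surrogate_cache, fetch_from_server):
    """Fetch a file via an untrusted surrogate, verifying it end to end.

    trusted_digests maps file names to SHA-256 digests obtained from the
    file server; anything the surrogate returns must match them.
    """
    data = surrogate_cache.get(name)
    if data is not None and hashlib.sha256(data).hexdigest() == trusted_digests[name]:
        return data, "surrogate"      # fast path: nearby, verified copy
    # Missing or tampered with: fall back to the (slower) trusted server.
    return fetch_from_server(name), "server"
```

A corrupted copy on a malicious surrogate is therefore detected and costs only latency, not correctness.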


System model

Observations

Pros/Cons

PROS
Reduces the latency between server and client
Increases pervasiveness by supporting small devices with little memory and limited power

CONS
Surrogates are currently located manually
Malicious surrogates pose risks such as eavesdropping, denial of service and data corruption


Portable Soul pads

Architecture

ISR (Internet Suspend/Resume)

The user's computation state is stored as a checkpointed virtual machine image.

Remote Desktop

Soul pad

Knoppix as the auto-configuring host OS
VMware Workstation as the VMM
Windows or Linux as the guest OS

Observations

Soul pad provides AES-128 block encryption.
When the USB drive is removed, all memory related to Soul pad operations is erased.
Backups are created on network file systems whenever the host has an internet connection.
Resume and suspend latencies
Application response times
Instruction set architecture diversity

Practical Implementation

MojoPac
Install MojoPac on a USB pen drive
Install software on MojoPac
Use that software on whichever system you want
Copyright violations need to be addressed


Integrating Portable and Distributed Storage

Architecture

Portable and distributed storage each have their own pros and cons.
Integrating portable and distributed storage increases performance and availability.
Lookaside caching
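Lookaside caching can be sketched as a distributed file system client that treats the server's metadata as the authority on a file's current contents (expressed as a content hash) while using a portable device only as a fast source of bytes whose hash matches. The class and parameter names are hypothetical and SHA-1 is an arbitrary choice for this illustration.

```python
import hashlib


def content_hash(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class LookasideClient:
    """Distributed FS client that consults a portable device before the network."""

    def __init__(self, get_metadata, fetch_remote, portable_files):
        self.get_metadata = get_metadata  # path -> current content hash (authoritative)
        self.fetch_remote = fetch_remote  # path -> bytes over the slow network
        # Index the portable device (e.g. a USB drive) by content hash.
        self.portable = {content_hash(d): d for d in portable_files}
        self.remote_fetches = 0

    def read(self, path):
        want = self.get_metadata(path)
        if want in self.portable:
            return self.portable[want]    # same hash, so same contents
        self.remote_fetches += 1
        return self.fetch_remote(path)
```

A stale copy on the portable device is never used, because its hash no longer matches the server's metadata; consistency stays with the distributed file system while the device improves performance.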

GFS: Google File System

GFS

A scalable distributed file system for large distributed data-intensive applications.

Fault tolerant while running on inexpensive hardware.

Google’s storage platform for the generation and processing of data.

Hundreds of terabytes of storage across thousands of disks on over a thousand machines, concurrently accessed by hundreds of clients.


GFS Architecture

Working of GFS

Single master, multiple chunkservers, multiple clients
Fixed-size chunks (giant blocks): how big? 64 MB
64-bit IDs for each chunk
Clients read/write chunks directly from chunkservers
Chunks are the unit of replication
Master maintains all metadata:
  namespace and access control
  map from filenames to chunk IDs
  current locations for each chunk
Metadata is cached at clients
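The read path implied by this list can be sketched as: the client turns a byte offset into a chunk index, asks the master once for that chunk's handle and replica locations, caches the answer, and then reads the data directly from a chunkserver. The classes below are hypothetical in-memory stand-ins (no RPC, replication or leases), and reads that span a chunk boundary are not handled.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB fixed-size chunks


class Master:
    def __init__(self, chunk_table):
        # (filename, chunk index) -> (64-bit chunk handle, replica locations)
        self.chunk_table = chunk_table
        self.lookups = 0

    def lookup(self, filename, chunk_index):
        self.lookups += 1
        return self.chunk_table[(filename, chunk_index)]


class Client:
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers   # location -> {handle: chunk bytes}
        self.meta_cache = {}

    def read(self, filename, offset, length):
        # Translate the byte offset into a chunk index and an offset within it.
        index, within = divmod(offset, CHUNK_SIZE)
        key = (filename, index)
        if key not in self.meta_cache:     # metadata is cached at clients
            self.meta_cache[key] = self.master.lookup(filename, index)
        handle, locations = self.meta_cache[key]
        # Data flows directly from a chunkserver, not through the master.
        data = self.chunkservers[locations[0]][handle]
        return data[within:within + length]
```

Keeping the master out of the data path, and caching its answers at clients, is what lets a single master serve many clients.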

Other Google technologies

Bigtable: A Distributed Storage System for Structured Data

Used by Google products such as Google Earth and Google Finance. Bigtable has successfully provided a flexible, high-performance solution for these products.


