A Low-Bandwidth Network File System

A. Muthitacharoen, MIT

B. Chen, MIT

D. Mazieres, NYU

Key Ideas

A network file system for slow or wide-area networks

Exploits similarities between files or versions of the same file

Avoids sending data that can be found in the server’s file system or the client’s cache

Also uses conventional compression and caching

Requires 90% less bandwidth than traditional network file systems

Working on slow networks

Make local copies: must worry about update conflicts

Use remote login: only for text-based applications

Use an LBFS instead: better than remote login; must deal with issues like auto-saves blocking the editor for the duration of the transfer

LBFS

Exploits cross-file similarities, especially with previous versions of the same file

Auto-save files, …

LBFS file server divides the files it stores into chunks and indexes the chunks by hash value

LBFS client similarly indexes a large persistent file cache

LBFS never transfers chunks that the recipient already has
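The three points above can be illustrated with a minimal Python sketch. It is not the LBFS wire protocol: the real client and server exchange hashes over RPC, whereas this sketch collapses that exchange into a local dictionary lookup. The names `chunk_key`, `plan_transfer`, and `apply_transfer` are hypothetical.

```python
import hashlib

def chunk_key(chunk: bytes) -> bytes:
    """Name a chunk by its SHA-1 digest."""
    return hashlib.sha1(chunk).digest()

def plan_transfer(chunks, recipient_index):
    """Decide, chunk by chunk, whether to send data or only a reference.

    chunks: list of bytes objects making up the file to transfer.
    recipient_index: dict mapping chunk key -> chunk bytes held by the recipient.
    """
    plan = []
    for chunk in chunks:
        key = chunk_key(chunk)
        if key in recipient_index:
            plan.append(("ref", key))      # recipient already has this chunk
        else:
            plan.append(("data", chunk))   # must go over the network
    return plan

def apply_transfer(plan, recipient_index):
    """Rebuild the file on the recipient and index newly received chunks."""
    parts = []
    for kind, payload in plan:
        if kind == "ref":
            parts.append(recipient_index[payload])
        else:
            recipient_index[chunk_key(payload)] = payload
            parts.append(payload)
    return b"".join(parts)

# Usage: a new version of a file that shares two of three chunks with the old one.
old_version = [b"header", b"body v1", b"footer"]
new_version = [b"header", b"body v2", b"footer"]
index = {chunk_key(c): c for c in old_version}

plan = plan_transfer(new_version, index)
bytes_sent = sum(len(p) for kind, p in plan if kind == "data")
rebuilt = apply_transfer(plan, index)
```

Only the changed chunk travels as data; the other two are replaced by short references that the recipient resolves from its own index.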

Previous Work (I)

AFS: callbacks require the server to notify clients when a cached file has been modified

Leases achieve the same goal but have an expiration time

Coda supports slow networks and even disconnected operation; it defers some updates to save bandwidth

OceanStore applies Bayou’s conflict resolution mechanisms to a file system

Previous Work (II)

Operation-based updates (Lee et al.): a proxy client close to the server duplicates the client’s computations in the hope of duplicating its output files

Spring and Wetherall propose to use two large cooperating caches storing identical copies of the last n megabytes of network traffic

Rsync uses directory tree mirroring at client and server.

LBFS

LBFS provides close-to-open consistency, similar to AFS session consistency

LBFS assumes clients will have a cache large enough to contain a user’s entire working set of files

When possible, LBFS reconstitutes files using chunks of existing data in the file system and client cache instead of transmitting those chunks over the network

Indexing Issues

Major challenge is keeping the index a reasonable size while dealing with shifting offsets

Indexing conventional file blocks would not work: a single inserted byte shifts every later block

Indexing and hashing overlapping file blocks at all offsets would require too much space
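A small Python experiment makes the shifting-offset problem concrete. It is only an illustration: the 8-byte block size and the function name `fixed_block_hashes` are hypothetical, chosen to keep the example short.

```python
import hashlib

def fixed_block_hashes(data: bytes, block_size: int = 8):
    """Hash each aligned, fixed-size block of data."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

original = b"The quick brown fox jumps over the lazy dog, twice over."
edited = b"X" + original  # a single byte inserted at the front

before = set(fixed_block_hashes(original))
after = set(fixed_block_hashes(edited))
shared = before & after  # blocks an index of the old version could still match
```

Although the two inputs differ by one byte, every block boundary in `edited` has moved, so `shared` is empty and an index built on aligned blocks finds nothing to reuse.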

LBFS Solution

Considers only non-overlapping chunks of files

Sets chunk boundaries based on file contents to avoid sensitivity to shifting file offsets

Examines every overlapping 48-byte region of the file to select boundary regions, or breakpoints, using Rabin fingerprints

Expected chunk size is 8 KB plus the size of the 48-byte breakpoint window
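The 8 KB figure can be derived by treating each 48-byte window as an independent trial. The slide does not state the selection rule; the derivation below assumes, as described in the LBFS paper, that a window is a breakpoint when the low-order 13 bits of its fingerprint equal a fixed value, and that fingerprints behave like uniformly random values.

```latex
p = \Pr[\text{window is a breakpoint}] = 2^{-13},
\qquad
\mathbb{E}[\text{distance between breakpoints}] = \frac{1}{p} = 2^{13} = 8192 \text{ bytes} = 8\ \text{KB}
```

Adding the 48-byte window that ends each chunk gives the expected chunk size quoted above.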

Handling Insertions

More Indexing Issues

Pathological cases

Very small chunks: sending hashes of chunks would consume as much bandwidth as just sending the file

Very large chunks: cannot be sent in a single RPC

LBFS imposes minimum and maximum chunk sizes
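Content-defined chunking with size limits can be sketched in Python as follows. This is a simplified illustration, not the LBFS implementation: it swaps in a polynomial rolling hash (Rabin-Karp style) for true Rabin fingerprints, and the 13-bit mask, the 2 KB minimum, and the 64 KB maximum are values recalled from the LBFS paper rather than from these slides, so treat them as assumptions. All function and constant names are hypothetical.

```python
import random

WINDOW = 48             # bytes examined at each position
BASE = 257              # rolling-hash base (arbitrary choice for this sketch)
MOD = (1 << 61) - 1     # large prime modulus (arbitrary choice for this sketch)

def chunk_boundaries(data, bits=13, min_size=2048, max_size=65536, magic=0):
    """Return the end offset of each chunk in data."""
    mask = (1 << bits) - 1
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    ends, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        size = i + 1 - start
        at_breakpoint = i + 1 >= WINDOW and (h & mask) == magic
        # Breakpoints that would create a too-small chunk are ignored;
        # a boundary is forced once the chunk reaches max_size.
        if (at_breakpoint and size >= min_size) or size >= max_size:
            ends.append(i + 1)
            start = i + 1
    if start < len(data):
        ends.append(len(data))  # final partial chunk
    return ends

def split(data, **kwargs):
    """Split data into chunks at the boundaries chosen above."""
    ends = chunk_boundaries(data, **kwargs)
    starts = [0] + ends[:-1]
    return [data[s:e] for s, e in zip(starts, ends)]

# Usage: insert a few bytes in the middle of a file and count reusable chunks.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(200_000))
edited = original[:100_000] + b"inserted text" + original[100_000:]

chunks_before = split(original)
chunks_after = split(edited)
reused = len(set(chunks_before) & set(chunks_after))
```

Because the hash depends only on the 48 bytes inside the window, chunks that lie entirely before the insertion are unchanged, and boundaries after it typically realign once the window has moved past the edit.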

The Chunk Database

Indexes each chunk by the first 64 bits of its SHA-1 hash

To avoid synchronization problems, LBFS always recomputes the SHA-1 hash of any data chunk before using it

Simplifies crash recovery

Recomputed SHA-1 values are also used to detect hash collisions in the database
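The recompute-before-use rule can be sketched in Python with a toy in-memory index. The class `ChunkDB`, its methods, and the dictionary standing in for the file system are all hypothetical; the real database lives on disk and maps keys to file locations.

```python
import hashlib

class ChunkDB:
    """Toy index: first 64 bits of SHA-1 -> (path, offset, length)."""

    def __init__(self, storage):
        self.storage = storage  # dict path -> bytes, standing in for files on disk
        self.index = {}

    @staticmethod
    def key(sha1_digest: bytes) -> bytes:
        return sha1_digest[:8]  # first 64 bits of the 160-bit digest

    def add(self, path, offset, length):
        data = self.storage[path][offset:offset + length]
        self.index[self.key(hashlib.sha1(data).digest())] = (path, offset, length)

    def lookup(self, sha1_digest: bytes):
        """Return the chunk whose full SHA-1 matches, or None."""
        location = self.index.get(self.key(sha1_digest))
        if location is None:
            return None
        path, offset, length = location
        data = self.storage.get(path, b"")[offset:offset + length]
        # Recompute the hash: the file may have changed since it was indexed
        # (stale entry), or another chunk may share the 64-bit prefix (collision).
        if hashlib.sha1(data).digest() != sha1_digest:
            return None
        return data

# Usage: a lookup succeeds, then fails once the underlying file changes.
storage = {"a.txt": b"hello world, hello chunk"}
db = ChunkDB(storage)
db.add("a.txt", 0, 11)
wanted = hashlib.sha1(b"hello world").digest()

found = db.lookup(wanted)
storage["a.txt"] = b"HELLO world, hello chunk"  # file modified behind the index
found_after_change = db.lookup(wanted)
```

Since every hit is verified against the data actually on disk, the index is allowed to be out of date: a stale or colliding entry just looks like a miss.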

Protocol

Based on NFS version 3

Adds extensions to exploit inter-file commonality (GETHASH)

Adds leases

Compresses all traffic using conventional gzip

File Consistency (I)

Whenever a client makes any RPC on an LBFS file, it gets back a read lease on the file.

If a user opens a file whose lease has expired, the client asks the server for the attributes of the file

This request grants the client a new lease on the file

The client can then check whether it has the current version of the file in its cache

If the file times have changed, the client must obtain the new contents of the file from the server
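The open-time check described above can be sketched in Python. This is a simplified model under stated assumptions: the `Server` and `Client` classes, the 60-second lease, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all hypothetical, and the sketch does not model anything the server does for leases that are still valid.

```python
class Server:
    LEASE_SECONDS = 60  # hypothetical lease duration

    def __init__(self):
        self.files = {}  # path -> (mtime, contents)

    def write(self, path, contents, now):
        self.files[path] = (now, contents)

    def getattr(self, path, now):
        """Return the file's mtime and a fresh lease expiry."""
        mtime, _ = self.files[path]
        return mtime, now + self.LEASE_SECONDS

    def read(self, path, now):
        """Return contents, mtime, and a fresh lease expiry."""
        mtime, contents = self.files[path]
        return contents, mtime, now + self.LEASE_SECONDS

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}   # path -> {"contents", "mtime", "lease_expiry"}
        self.fetches = 0  # number of full-content transfers

    def open(self, path, now):
        entry = self.cache.get(path)
        if entry and now < entry["lease_expiry"]:
            return entry["contents"]  # lease still valid: no RPC needed
        if entry:
            # Lease expired: ask for attributes, which also renews the lease.
            mtime, expiry = self.server.getattr(path, now)
            if mtime == entry["mtime"]:
                entry["lease_expiry"] = expiry
                return entry["contents"]  # cached copy is current
        # No cached copy, or the file changed: fetch the contents.
        contents, mtime, expiry = self.server.read(path, now)
        self.fetches += 1
        self.cache[path] = {"contents": contents, "mtime": mtime,
                            "lease_expiry": expiry}
        return contents

# Usage: four opens, only two of which transfer file contents.
server = Server()
server.write("f", b"v1", now=0)
client = Client(server)

r1 = client.open("f", now=1)     # first open: fetch
r2 = client.open("f", now=10)    # lease valid: served from cache
r3 = client.open("f", now=100)   # lease expired, mtime unchanged: cache reused
fetches_before_update = client.fetches
server.write("f", b"v2", now=150)
r4 = client.open("f", now=200)   # lease expired, mtime changed: fetch again
```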

File Consistency (II)

No need for write leases

LBFS provides close-to-open consistency

Server never demands back a dirty file

If multiple clients are writing the same file, the last one to close the file will overwrite changes from the others

File updates are atomic

Limits damage caused by concurrent updates

Security Issues

LBFS uses SFS security infrastructure

Servers have public keys

Messages are encrypted

Specific security issue: a user could check whether the file system contains a particular chunk of data by observing subtle timing differences in the server’s answer to a CONDWRITE request

Implementation (I)

Implementation (II)

Uses NFS

Two NFS-related issues

When server commits a temporary file to a target file, it must copy the contents of the temporary file onto the target file to preserve the target file i-node

Hard to preserve previous contents of a truncated file

Message order is guaranteed by TCP

Evaluation (I)

Commonality of data in /usr/local

Evaluation (II)

Normalized bandwidth consumption (2 of 3 benchmarks)

Key

First four bars of each workload show upstream bandwidth, the second four downstream bandwidth.

CIFS is Windows’ native network file system

“Leases+Gzip” uses LBFS file caching, leases, and data compression but not its chunking scheme

“LBFS, new DB” is LBFS starting with a new database

Evaluation (III)

Normalized application times

Key

Execution times were normalized

Measurements made over a cable modem link with 384 Kb/s uplink and 1.5 Mb/s downlink

LAN data were obtained on a 100 Mb/s full-duplex LAN.

Conclusion

Under normal circumstances, LBFS consumes 90% less bandwidth than traditional file systems.

Makes transparent remote file access a viable and less frustrating alternative to running interactive programs on remote machines.
