Opportunistic Use of Content Addressable Storage for Distributed File Systems
Niraj Tolia†*, Michael Kozuch†, M. Satyanarayanan†*, Brad Karp†, Thomas Bressoud†‡, and Adrian Perrig*
*Carnegie Mellon University, †Intel Research Pittsburgh and ‡Denison University


Opportunistic Use of Content Addressable Storage for Distributed File Systems

Niraj Tolia†*, Michael Kozuch†, M. Satyanarayanan†*, Brad Karp†, Thomas Bressoud†‡, and Adrian Perrig*

*Carnegie Mellon University, †Intel Research Pittsburgh and ‡Denison University

Introduction

Using a distributed file system over a wide area network is slow!
However, the number of Content Addressable Storage (CAS) providers appears to be growing.
Therefore, can we make opportunistic use of these CAS providers to benefit client-server file systems like NFS, AFS, and Coda?

Content Addressable Storage

Content Addressable Storage is data that is identified by its contents instead of a name

[Figure: a file (Foo.txt) is passed through a cryptographic hash to produce its content addressable name (0xd58e23b71b1b...)]

An example of a CAS provider is a Distributed Hash Table (DHT)
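
As a minimal sketch of the idea, assuming SHA-1 as the cryptographic hash (the hash type named in the recipe example later in the talk), a content addressable name can be computed and used as a lookup key as shown here. The `cas_name` helper and the in-memory `store` dictionary are illustrative names, not part of CASPER.

```python
import hashlib


def cas_name(data: bytes) -> str:
    """Return the content addressable name of data: the hex SHA-1 digest of its contents."""
    return hashlib.sha1(data).hexdigest()


# A toy CAS provider: a dictionary keyed by content addressable name.
store = {}

contents = b"hello, world\n"
name = cas_name(contents)
store[name] = contents

# Lookup needs only the name; the original file name plays no role.
retrieved = store[cas_name(b"hello, world\n")]
print(name, retrieved == contents)
```

Because the name is derived from the bytes alone, two files with identical contents share one name, and any change to the contents yields a different name.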

Motivation

Use CAS as a performance enhancement when the file server is remote
Convert data transfers from WAN to LAN

[Figure: CAS providers, file server, and client]

Talk Outline

Introduction
The CASPER File System

Building Blocks
• Recipes
• Jukeboxes
• Recipe Servers

Architecture
Benchmarks and Performance
Fuzzy Matching
Conclusions

The CASPER File System

It can make use of any available CAS provider to improve read performance
However, it does not depend on the CAS providers

• In the absence of a useful CAS provider, you are no worse off than you originally were

Writes are sent directly to the File Server and not to the CAS provider

Recipes

[Figure: each piece of the file data is passed through a cryptographic hash to give its content addressable name (0x330c7eb274a4..., 0x1deb72e98470..., 0xf13758906c8d..., 0xe13b918d6a50..., 0xf9d09794b6d7...); the list of names forms the recipe]

Building Blocks: Recipes

Description of objects in a content addressable way
First class entity in the file system

• Can be cached

Uses XML for data representation

• Compression used over the network

Can be maintained lazily as they contain version information


Recipe Example (XML)

<recipe type="file">
  <metadata>
    <version>00 00 01 04 01</version>
    …
  </metadata>
  <recipe_choice>
    <hash_list hash_type="SHA-1" block_type="variable" number="5">
      <hash size="4189">330c7eb274a4...</hash>
      …
    </hash_list>
  </recipe_choice>
</recipe>
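
To make the layout concrete, here is a hedged sketch of how a recipe with this shape might be generated. It is not the CASPER recipe server: it splits data into fixed-size blocks (the slide's example uses `block_type="variable"`), omits the `<metadata>` section because its fields are elided on the slide, and the `build_recipe` function and `block_size` parameter are assumptions made for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_recipe(data: bytes, block_size: int = 4096) -> ET.Element:
    """Describe data as a list of SHA-1 block hashes, loosely following the slide's XML layout."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    recipe = ET.Element("recipe", {"type": "file"})
    choice = ET.SubElement(recipe, "recipe_choice")
    hash_list = ET.SubElement(choice, "hash_list", {
        "hash_type": "SHA-1",
        "block_type": "fixed",  # assumption: the slide's example uses "variable"
        "number": str(len(blocks)),
    })
    for block in blocks:
        # Each entry records the block's size and its content addressable name.
        entry = ET.SubElement(hash_list, "hash", {"size": str(len(block))})
        entry.text = hashlib.sha1(block).hexdigest()
    return recipe


recipe = build_recipe(b"a" * 10000)
print(ET.tostring(recipe, encoding="unicode"))
```

With a 4096-byte block size, the 10000-byte input yields three `<hash>` entries, the last one covering the 1808-byte remainder.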

CASPER Architecture

[Figure: the client (DFS client) reaches the server (DFS file server and recipe server) over a WAN connection and a CAS provider (jukebox) over a LAN connection]

Building Blocks: Jukeboxes

Jukeboxes are abstractions of a Content Addressable Storage provider

• Provide access to data based on their hash value

Provide no guarantee to consumers about persistence or reliability
Support a Query() and Fetch() interface

• MultiQuery() and MultiFetch() also available

Examples include your desktop, a departmental jukebox, P2P systems, etc.
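
The interface can be sketched as a small in-memory class. Only the operation names Query(), Fetch(), MultiQuery(), and MultiFetch() come from the slide; the Python method names, argument types, return types, and the `add` helper are assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Optional


class InMemoryJukebox:
    """Toy jukebox: maps hex SHA-1 names to blocks and makes no persistence promise."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def add(self, block: bytes) -> str:
        """Store a block under its content addressable name (hypothetical helper)."""
        name = hashlib.sha1(block).hexdigest()
        self._blocks[name] = block
        return name

    def query(self, name: str) -> bool:
        """Query(): report whether a block with this name is currently available."""
        return name in self._blocks

    def fetch(self, name: str) -> Optional[bytes]:
        """Fetch(): return the block, or None if the jukebox does not have it."""
        return self._blocks.get(name)

    def multi_query(self, names: List[str]) -> List[bool]:
        """MultiQuery(): batched form of query()."""
        return [self.query(n) for n in names]

    def multi_fetch(self, names: List[str]) -> List[Optional[bytes]]:
        """MultiFetch(): batched form of fetch()."""
        return [self.fetch(n) for n in names]


jukebox = InMemoryJukebox()
known = jukebox.add(b"block one")
missing = hashlib.sha1(b"never stored").hexdigest()
print(jukebox.multi_query([known, missing]))
```

Returning `None` from a fetch reflects the slide's point that consumers get no guarantee of persistence, so a caller has to be ready to obtain the block elsewhere.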

Building Blocks: Recipe Server

This module generates recipe representations of files present in the underlying file system
Can be placed either on the Distributed File System server or on any other machine well connected to it
Helps in maintaining consistency by informing the client of changes in files that it has reconstructed


CASPER details…

The file system is based on Coda

• Whole file caching, open-close consistency

Proxy-based layering approach used
Coda takes care of consistency, conflict detection, resolution, etc.

• The file server is the final authoritative source

CASPER allows us to service cache misses faster than might usually be possible

CASPER Architecture

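[Figure: on the client, the DFS client consists of the Coda client and a DFS proxy; on the server, the DFS file server consists of the Coda file server and the recipe server. The client reaches the server over a WAN connection and a CAS provider (jukebox) over a LAN connection]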

CASPER Implementation

[Figure: message flow among the Coda client and DFS proxy (client), the recipe server and Coda file server (server, over the WAN connection), and the CAS provider (jukebox, over the LAN connection)]

1. File Request
2. Recipe Request
3. Recipe Response
4. CAS Request
5. CAS Response
6. Missed Block Request
7. Missed Block Response
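
The request flow on this slide can be sketched as a single reconstruction routine. This is an illustration under stated assumptions rather than the CASPER proxy: the recipe is reduced to an ordered list of SHA-1 block names, the jukebox is a plain dictionary, missed blocks are requested by index through a hypothetical `fetch_from_server` callback, and each jukebox block is verified against its hash before use.

```python
import hashlib
from typing import Callable, Dict, List


def sha1_hex(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()


def reconstruct(recipe: List[str],
                jukebox: Dict[str, bytes],
                fetch_from_server: Callable[[int], bytes]) -> bytes:
    """Rebuild a file from its recipe (an ordered list of block hashes).

    Blocks come from the jukebox when it has a copy whose hash verifies;
    every other block is a "missed block" requested from the file server.
    """
    blocks = []
    for index, name in enumerate(recipe):
        candidate = jukebox.get(name)                 # steps 4-5: CAS request/response
        if candidate is None or sha1_hex(candidate) != name:
            candidate = fetch_from_server(index)      # steps 6-7: missed block
        blocks.append(candidate)
    return b"".join(blocks)


# Toy setup: the server holds the whole file; the jukebox holds two of three blocks.
server_blocks = [b"alpha", b"beta", b"gamma"]
recipe = [sha1_hex(b) for b in server_blocks]          # steps 2-3: recipe from the server
jukebox = {recipe[0]: b"alpha", recipe[2]: b"gamma"}
wan_requests = []


def fetch_from_server(index: int) -> bytes:
    wan_requests.append(index)                         # record traffic that crosses the WAN
    return server_blocks[index]


data = reconstruct(recipe, jukebox, fetch_from_server)
print(data, wan_requests)
```

In this toy run only the middle block crosses the WAN; the other two are served by the jukebox. If the jukebox has nothing useful, every block falls back to the server, which matches the claim that the client is no worse off than before apart from the recipe overhead.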

Talk Outline

Introduction
The CASPER File System

Building Blocks
• Recipes
• Jukeboxes
• Recipe Servers

Architecture
Benchmarks and Performance
Fuzzy Matching
Conclusions

Experimental Setup

[Figure: client, NIST Net router, file server + recipe server, and jukebox]

WAN bandwidth limitations between the server and client were controlled using NIST Net

• 10 Mb/s, 1 Mb/s + 10 ms, and 100 Kb/s + 100 ms

The hit-ratio on the jukebox was set to 100%, 66%, 33%, and 0%
Clients began with a cold cache (no files or recipes)

Benchmarks

Binary Install Benchmark
Virtual Machine Migration
Modified Andrew Benchmark

Benchmark Description

Binary Install (RPM based)

Installed RPMs for the Mozilla 1.1 browser

• 6 RPMs, total size of 13.5 MB

Virtual Machine migration

Time taken to resume a migrated Virtual Machine and execute an MS Office-based benchmark
Trace accesses ~1000 files @ 256 KB each

• No think time modeled

Benchmark Description (II)

Modified Andrew Benchmark

Phases include: Create Directory, Copy, Scan Directory, Read All, and Make

• However, only the Copy phase will exhibit an improvement

Uses Apache 1.3.27

• 11.36 MB source tree, 977 files
• 53% of files are less than 4 KB and 71% of them are less than 8 KB in size

Mozilla (RPM) Install

Time normalized against vanilla Coda
Gain most pronounced at lower bandwidth
Very low overhead seen for these experiments (between 1% and 5%)

[Figure: bar chart of normalized runtime (axis from 0 to 1.4) at 100 Kb/s, 1 Mb/s, and 10 Mb/s, with one bar per jukebox hit-ratio: 100%, 66%, 33%, and 0%]

Baseline: 1238 sec at 100 Kb/s, 150 sec at 1 Mb/s, 44 sec at 10 Mb/s

Virtual Machine Migration

Time normalized against vanilla Coda
Large amounts of data show benefit even at higher bandwidths
High overhead seen at 10 Mb/s is an artifact of data buffering

[Figure: bar chart of normalized runtime (axis from 0 to 1.4) at 100 Kb/s, 1 Mb/s, and 10 Mb/s, with one bar per jukebox hit-ratio: 100%, 66%, 33%, and 0%]

Baseline: 21523 sec at 100 Kb/s, 2046 sec at 1 Mb/s, 203 sec at 10 Mb/s

Andrew Benchmark

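Only the Copy phase shows benefits
More than half of the files are fetched over the WAN without talking to the jukebox because of their small size

[Figure: bar chart of normalized runtime (axis from 0.0 to 1.4) at 100 Kb/s, 1 Mb/s, and 10 Mb/s, with one bar per jukebox hit-ratio: 100%, 66%, 33%, and 0%; three bars exceed the axis and are labeled 1.5, 1.6, and 1.8]

Baseline: 1150 sec at 100 Kb/s, 103 sec at 1 Mb/s, 13 sec at 10 Mb/s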

Commonality

The question of where and how much commonality can be found is still open
However, there are a number of applications that will benefit from this approach

Some of the applications that would benefit include:

• Virtual Machine Migration
• Binary Installs and Upgrades
• Software Development

Mozilla binary commonality

[Figure: commonality (0% to 100%) across Mozilla binary releases mozilla-16 through mozilla-25, with one series per release]

Linux kernel commonality – 2.2

[Figure: commonality (0% to 100%) across Linux kernel versions 2.2.0 through 2.2.24, with one series per version]

Related Work

Delta Encoding

• rsync, HTTP, etc.

Distributed File Systems

• NFS, AFS, Coda, etc.

P2P Content Addressable Networks

• Chord, Pastry, Freenet, CAN, etc.

Hash based storage and file systems

• Venti, LBFS, Ivy, EMC’s Centera, Farsite, etc.

Conclusions

Introduction of the concept of recipes

Proven benefits of opportunistic use of content providers by traditional distributed file systems on WANs

Introduced “Fuzzy Matching”

Backup Slides

Where did the time go?

For the Andrew benchmark

Reconstruction of a large number of small files takes 4 round trips
There is also the overhead of compression, verification, etc.

Some parts of the system (CAS requests) can be optimized by performing work in parallel
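
One way the parallel CAS requests mentioned above might look is sketched here with a thread pool. This is an assumption about how such an optimization could be written, not a description of CASPER: `fetch_all`, its `max_workers` parameter, and the toy dictionary standing in for a jukebox are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def fetch_all(names: List[str],
              fetch: Callable[[str], Optional[bytes]],
              max_workers: int = 8) -> List[Optional[bytes]]:
    """Issue one fetch per block name concurrently, preserving the recipe's order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(fetch, names))


# A dictionary's get() stands in for a jukebox Fetch() that may miss.
toy_jukebox = {"a1": b"first", "b2": b"second"}
results = fetch_all(["a1", "missing", "b2"], toy_jukebox.get)
print(results)
```

Against a real jukebox each fetch would be a network call, so overlapping them hides per-request latency; entries that come back as `None` would still have to be requested from the file server.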

Number of round trips

[Figure: round trips among the Coda client and DFS proxy (client), the recipe server and Coda file server (server, over the WAN connection), and the CAS provider (jukebox, over the LAN connection)]

1. Recipe Request / Recipe Response
2 & 3. CAS Request / CAS Response
4. Missed Block Request / Missed Block Response

Absolute Andrew

Andrew Benchmark: Copy Performance (sec)

Jukebox Hit-Ratio    Network Bandwidth
                     100 Kb/s        1 Mb/s         10 Mb/s
100%                 261.3 (0.5)     40.3 (0.9)     17.3 (1.2)
66%                  520.7 (0.5)     64.0 (0.8)     20.0 (1.6)
33%                  762.7 (0.5)     85.0 (0.8)     21.3 (0.5)
0%                   1069.3 (1.7)    108.7 (0.5)    23.7 (0.5)
Baseline             1150.7 (0.5)    103.3 (0.5)    13.3 (0.5)
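
The normalized runtimes plotted for the Andrew benchmark can be recomputed from these absolute Copy-phase times by dividing each CASPER time by the baseline at the same bandwidth. The short sketch below does that arithmetic; the dictionary layout and the `normalized` helper are illustrative choices, and the numbers are transcribed from this slide.

```python
# Copy-phase times in seconds, transcribed from the "Absolute Andrew" slide.
baseline = {"100 Kb/s": 1150.7, "1 Mb/s": 103.3, "10 Mb/s": 13.3}
casper = {
    "100%": {"100 Kb/s": 261.3, "1 Mb/s": 40.3, "10 Mb/s": 17.3},
    "66%": {"100 Kb/s": 520.7, "1 Mb/s": 64.0, "10 Mb/s": 20.0},
    "33%": {"100 Kb/s": 762.7, "1 Mb/s": 85.0, "10 Mb/s": 21.3},
    "0%": {"100 Kb/s": 1069.3, "1 Mb/s": 108.7, "10 Mb/s": 23.7},
}


def normalized(hit_ratio: str, bandwidth: str) -> float:
    """CASPER time divided by the baseline time at the same bandwidth."""
    return casper[hit_ratio][bandwidth] / baseline[bandwidth]


for hit_ratio in casper:
    row = ", ".join(f"{bw}: {normalized(hit_ratio, bw):.2f}" for bw in baseline)
    print(f"{hit_ratio:>4}  {row}")
```

At 10 Mb/s the ratios for the 66%, 33%, and 0% hit-ratios round to 1.5, 1.6, and 1.8, which agrees with the three off-scale labels on the Andrew benchmark chart, while at 100 Kb/s a 100% hit-ratio brings the Copy phase down to roughly 0.23 of the baseline.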

NFS Implementation?

The benchmark results would not change significantly (with the possible exception of the Virtual Machine migration benchmark).
It is definitely possible to adopt a similar approach

In fact, an NFS proxy (without CASPER) exists.

However, the semantics of such a system are still unclear…

Fuzzy Matching

Question: Can we convert an incorrect block into what we need?
If there is a block that is “near” to what is needed, treat it as a transmission error
Fix it by applying an error-correcting code
Fuzzy Matching needs three components

• Exact hash
• Fuzzy hash
• ECC information

Fuzzy Hashes

[Figure: file data yields a set of shingleprints (4 7 8 9 2 0 3), which are sorted (0 2 3 4 7 8 9) to produce the fuzzy hash]

Fuzzy Hashing and ECCs

A fuzzy hash could simply be a shingle

• Hash a number of features with a sliding window
• Take the first m hashes after sorting to be representative of the data
• These m shingles are used to find “near” blocks

After finding a similar block, an ECC that tolerates a number of changes could be applied to recover the original block
Definite tradeoff between recipe size and Fuzzy Matching but this approach is promising
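
The shingle construction described above can be sketched as follows. The window size, the value of m, the use of truncated SHA-1 as the per-window hash, and the `resemblance` measure are all assumptions chosen for illustration; the slides do not specify them, and the ECC recovery step is not shown.

```python
import hashlib
from typing import List


def fuzzy_hash(data: bytes, window: int = 8, m: int = 4) -> List[int]:
    """Return the m smallest distinct hashes over all sliding windows of data."""
    prints = set()
    for i in range(max(len(data) - window + 1, 1)):
        digest = hashlib.sha1(data[i:i + window]).digest()
        prints.add(int.from_bytes(digest[:8], "big"))   # truncated to 64 bits for brevity
    return sorted(prints)[:m]


def resemblance(a: List[int], b: List[int]) -> float:
    """Fraction of shingleprints shared by two fuzzy hashes (hypothetical measure)."""
    return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)


original = bytes(range(256)) * 4
near = original[:-1] + b"\x00"          # same block with its final byte changed
print(resemblance(fuzzy_hash(original), fuzzy_hash(near)))
```

A small edit changes only the windows that overlap it, so two near blocks tend to share most of their smallest shingleprints, whereas an exact hash of the whole block would differ completely. A larger m makes matching more discriminating but adds to the recipe size, which is the tradeoff the slide mentions.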

Linux kernel commonality - 2.4

[Figure: commonality (0% to 100%) across Linux kernel versions 2.4.0 through 2.4.20, with one series per version]

Linux kernel – 2.2 (B/W)