
Peer-to-peer archival data trading

Brian Cooper

Joint work with Hector Garcia-Molina (and others)

Stanford University

2 Data trading

Problem: Fragile Data

Data: easy to create, hard to preserve
- Broken tapes
- Human deletions
- Going out of business

Replication-based preservation

Motivation

- Several systems use replication to preserve digital collections
  - SAV, others
- Archiving is part of the digital library
  - Individual organizations cooperate
  - Not a lot of money to spend

Goal
- Reliable replication of digital collections, given that:
  - Resources are limited
  - Sites are autonomous
  - Not all sites are equal
- Traditional methods
  - Central control
  - Random
  - Replicate popular data
- Metric: reliability
  - Not necessarily “efficiency”

Our solution

- Data trading: “I’ll store a copy of your collection if you’ll store a copy of mine”
- Sites make local decisions
  - Who to trade with
  - How many copies to make
  - How much space to provide
  - Etc.

Trading network
- A series of binary (pairwise), peer-to-peer trading links

[Figure: a trading network of sites A through H connected by pairwise trading links]

Architecture

[Figure: a local SAV Archive and a remote SAV Archive connected over the Internet; labels show users, a filesystem, the InfoMonitor, a service layer, and a reliability layer holding archived data]

This architecture was developed with Arturo Crespo.

Overview

- Trading model
- Trading algorithm
- Optimizing (and simulating) trading
- Some results
- Some stuff we are still working on

Trading model
- Archive site: an autonomous archiving provider
- Digital collection: a set of related digital materials
- Archival storage: stores locally and remotely owned digital collections
- Archiving client: deposit and retrieve materials
- Data reliability: probability that data is not lost

Deeds

- A right to use space at another site
- A bookkeeping mechanism for trades
- Can be used, saved, split, or transferred

Trading algorithm
- Sites trade deeds
- Sites exercise deeds to replicate collections

[Example deed: “Deed for space. For use by: Library of Congress, or for transfer. 623 gigabytes. Stanford University”]
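To make the deed bookkeeping concrete, here is a minimal Python sketch of a deed and the operations the slide names. The `Deed` class, its fields and its method names are hypothetical, not taken from SAV; a deed that is simply kept counts as “saved”, and Stanford University is read as the issuer of the example deed.

```python
from dataclasses import dataclass


@dataclass
class Deed:
    """Hypothetical model of a right to use `gigabytes` of space at `issuer`."""

    issuer: str       # site that provides the space
    holder: str       # site currently entitled to use the space
    gigabytes: int    # amount of space granted
    used: bool = False

    def use(self) -> None:
        """Exercise the deed, e.g. to store a replica at the issuing site."""
        if self.used:
            raise ValueError("deed already used")
        self.used = True

    def split(self, gigabytes: int) -> "Deed":
        """Carve off a smaller deed; this deed keeps the remainder."""
        if self.used or not 0 < gigabytes < self.gigabytes:
            raise ValueError("cannot split off that amount")
        self.gigabytes -= gigabytes
        return Deed(self.issuer, self.holder, gigabytes)

    def transfer(self, new_holder: str) -> None:
        """Hand an unused deed to another site."""
        if self.used:
            raise ValueError("cannot transfer a used deed")
        self.holder = new_holder


# The example deed: 623 GB at Stanford University, held by the Library of Congress.
deed = Deed(issuer="Stanford University", holder="Library of Congress", gigabytes=623)
part = deed.split(100)   # deed now covers 523 GB; part covers 100 GB
part.transfer("Site C")  # "Site C" is a hypothetical third site
part.use()               # Site C exercises its 100 GB at Stanford
```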

Deed trading

[Figure: sites A, B and C trade deeds; afterwards Collections 1, 2 and 3 each appear at two sites]
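The trading algorithm from the Deeds slide (sites trade deeds, then exercise them to replicate collections) might be sketched as the greedy loop below. It is a hypothetical simplification, not the published protocol: trades are reciprocal and equal-sized, partners are tried in a fixed order, and `Site`, `trade_and_replicate` and `goal` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    free_gb: float
    collections: dict[str, float]  # collection name -> size in GB
    # collection name -> partner sites that store a copy for this site
    replicas: dict[str, list[str]] = field(default_factory=dict)


def trade_and_replicate(site: Site, partners: list[Site], goal: int) -> None:
    """Greedily trade until each collection has `goal` copies (local copy included).

    Each trade is reciprocal and equal-sized: site and partner each reserve `size` GB
    for the other (the partner's deed is only held here, not exercised). The site
    exercises its new deed immediately by placing a copy at the partner.
    """
    for name, size in site.collections.items():
        holders = site.replicas.setdefault(name, [])
        for partner in partners:
            if 1 + len(holders) >= goal:
                break  # goal reached for this collection
            if partner.name in holders:
                continue  # never put two copies at the same site
            if partner.free_gb < size or site.free_gb < size:
                continue  # one side cannot afford the trade
            partner.free_gb -= size  # space the partner gives this site
            site.free_gb -= size     # space this site gives the partner in return
            holders.append(partner.name)


a = Site("A", free_gb=300, collections={"Collection 1": 100})
b = Site("B", free_gb=150, collections={})
c = Site("C", free_gb=150, collections={})
trade_and_replicate(a, [b, c], goal=3)
print(a.replicas)  # {'Collection 1': ['B', 'C']}
```

Each partner ends with 50 GB free and A with 100 GB, since every copy A places costs it the same amount of its own space.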

The challenge

[Figure: sites A, B and C holding copies of Collections 1, 2 and 3]

Alternative solutions

Are there other ways besides trading?

Other solutions: central control

[Figure: sites A, B and C holding copies of Collections 1, 2 and 3]

Other solutions: client-based

[Figure: sites A, B and C holding copies of Collections 1, 2 and 3]

Other solutions: random

[Figure: sites A, B and C holding copies of Collections 1, 2 and 3]
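For contrast with trading, the random alternative can be sketched as a placement where each copy goes to a randomly chosen site that has room, with no trades and no coordination. This is a generic random-placement baseline written for illustration; the deck does not describe how its random policy works, and `random_placement` is a hypothetical name.

```python
import random


def random_placement(collections, sites, copies, rng):
    """Give each collection up to `copies` replicas on distinct, randomly chosen sites.

    collections: name -> size in GB.
    sites: name -> free space in GB (reduced in place as copies are placed).
    Only sites with room for the collection are eligible; if too few qualify,
    the collection simply gets fewer copies.
    """
    placement = {}
    for name, size in collections.items():
        eligible = [s for s, free in sites.items() if free >= size]
        chosen = rng.sample(eligible, min(copies, len(eligible)))
        for s in chosen:
            sites[s] -= size
        placement[name] = chosen
    return placement


sites = {"A": 200, "B": 200, "C": 200}
collections = {"Collection 1": 100, "Collection 2": 100, "Collection 3": 100}
placement = random_placement(collections, sites, copies=2, rng=random.Random(0))
print(placement)
```

Because each choice ignores later demand, a collection placed late can end up with fewer copies than requested even though the 600 GB of space exactly matches the 600 GB needed.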

Why is trading good?

- High reliability
  - A framework for replication
- Site autonomy
  - Make local decisions
  - No submission to an external authority
- Fairness
  - Contribute more = more reliability
  - Must contribute resources

[Figure: trading network of sites A through H]

Decisions facing an archive
- Who to trade with
- How much to trade
- When to ask for a trade
- Providing space
- Advertising space
- Picking a number of copies
- Coping with varying site reliabilities
- What to do with acquired resources
- How to deliver other services

Many, many degrees of freedom!

Our approach
- Define a basic trading protocol
  - Deed trading
  - Assume all sites follow the same rules
  - Basic system for trading
- Extend: not all sites are equal
  - Some are more reliable or trusted
- Extend: sites have freedom to negotiate
  - Bid trading
- Extend: some sites are malicious
  - Ensure documents survive despite evildoers
- For each model, what policies are best?

How do we evaluate policies?

- Trading simulator
  - Generate a scenario
  - Simulate trading with different policies
  - Evaluate reliability for each policy
  - Compare the policies
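The deck does not say how the simulator computes reliability. One straightforward way to evaluate a policy, assuming sites fail independently, is a Monte Carlo estimate of the probability that no collection is lost (the “global reliability” plotted in the results); comparing policies then means running it on each policy's placement. `estimate_reliability` and its parameters are hypothetical.

```python
import random


def estimate_reliability(placement, site_reliability, trials=100_000, seed=0):
    """Monte Carlo estimate of the probability that no collection is lost.

    placement: collection -> list of sites holding a copy.
    site_reliability: site -> probability that the site does NOT fail.
    """
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # Each site independently stays up with probability equal to its reliability.
        up = {s for s, r in site_reliability.items() if rng.random() < r}
        # A collection is lost only if every site holding a copy is down.
        if all(any(s in up for s in holders) for holders in placement.values()):
            survived += 1
    return survived / trials


placement = {"Collection 1": ["A", "B"], "Collection 2": ["B", "C"]}
reliability = {"A": 0.8, "B": 0.8, "C": 0.8}
print(estimate_reliability(placement, reliability))  # close to the exact value 0.928
```

Here the exact answer is 0.8 + 0.2 × 0.8 × 0.8 = 0.928: either B survives, or B fails and both A and C survive.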

Simulation parameters

- Number of sites: 2 to 15
- Site reliability: 0.5 to 0.8
- Collections per site: 4 to 25
- Data per collection: 50 GB to 1000 GB
- Space per site: 2x data to 7x data
- Replication goal: 2 to 15 copies
- Scenarios per simulation: 200
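A scenario generator for these ranges could look like the sketch below. It assumes every parameter is drawn uniformly from its range and reads “2x data to 7x data” as a multiple of each site's own data; the deck gives only the ranges, and `Scenario` and `generate_scenario` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    site_reliability: list[float]   # one entry per site
    collections: list[list[float]]  # per site: sizes (GB) of the collections it owns
    space_gb: list[float]           # per site: total storage space in GB
    replication_goal: int           # may exceed the number of sites; then it cannot be met


def generate_scenario(rng: random.Random) -> Scenario:
    n_sites = rng.randint(2, 15)
    reliability = [rng.uniform(0.5, 0.8) for _ in range(n_sites)]
    collections = [
        [rng.uniform(50, 1000) for _ in range(rng.randint(4, 25))]
        for _ in range(n_sites)
    ]
    # Each site's space is 2x to 7x the data it owns (assumed interpretation).
    space = [rng.uniform(2, 7) * sum(sizes) for sizes in collections]
    goal = rng.randint(2, 15)
    return Scenario(reliability, collections, space, goal)


rng = random.Random(42)
scenarios = [generate_scenario(rng) for _ in range(200)]  # 200 scenarios per simulation
```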

Reliability

- Site reliability: will a site fail?
  - Example: 0.9 = 10% chance of failure
- Data reliability: how safe is the data, despite site failures?
  - Example: 320-year MTTF (mean time to failure)
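Under the common assumption that sites fail independently, data reliability follows directly from site reliability: a collection is lost only if every site holding a copy fails. The sketch below computes that, plus a mean time to failure under the further assumption that each period (for example, a year) is an independent trial and failed copies are restored before the next period begins. Both function names are hypothetical, and the deck does not say how its 320-year figure was derived.

```python
import math


def data_reliability(holder_reliabilities):
    """P(collection survives) = 1 - prod(1 - r_i), assuming independent site failures."""
    return 1 - math.prod(1 - r for r in holder_reliabilities)


def mttf_periods(holder_reliabilities):
    """Mean number of periods until the collection is lost.

    With per-period loss probability p = prod(1 - r_i) and independent periods,
    the time to first loss is geometric with mean 1 / p periods.
    """
    return 1 / math.prod(1 - r for r in holder_reliabilities)


print(data_reliability([0.9, 0.9, 0.9]))  # ≈ 0.999
print(mttf_periods([0.9, 0.9, 0.9]))      # ≈ 1000 periods
```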

Basic trading approach

- How does trading work, assuming all sites follow “the rules”?
- Example: advertising policy

[Figure: sites A and B: “Let’s trade. How much space do you have?”]

Advertising policy

[Figure: three exchanges between sites A and B]
- 120 GB free: “I have 120 GB”
- Space-fractional policy: “I have 60 GB”
- Data-proportional policy: “I have 40 GB” (with 40 GB of data)
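The three answers can be reproduced with the small functions below. This is one reading of the figure, not the deck's definition: the space-fractional policy advertises a fixed fraction of free space, and the data-proportional policy advertises an amount proportional to the site's own data, capped at its free space. The `fraction` and `factor` values and the function names are hypothetical, chosen only to match the 60 GB and 40 GB in the figure.

```python
def advertise_all(free_gb: float) -> float:
    """Advertise every free gigabyte."""
    return free_gb


def advertise_space_fractional(free_gb: float, fraction: float = 0.5) -> float:
    """Advertise a fixed fraction of free space."""
    return fraction * free_gb


def advertise_data_proportional(free_gb: float, data_gb: float, factor: float = 1.0) -> float:
    """Advertise space in proportion to the site's own data, never more than is free."""
    return min(factor * data_gb, free_gb)


print(advertise_all(120), advertise_space_fractional(120), advertise_data_proportional(120, 40))
# 120 60.0 40.0
```

Capping the data-proportional answer at the free space keeps a site from promising more than it has.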

Result

[Figure: global reliability (probability of no data loss) versus global FG (storage space as a multiple of data size, 2 to 7) for the space-fractional and data-proportional policies]

Extend: some sites are better than others
- May prefer certain sites
  - More reliable
  - Better reputation
  - Part of the same system
- Example: who to trade with?

[Figure: site A weighing several candidate partners, each marked “?”]

Who to trade with?

[Figure: average local data MTTF (log scale, 1 to 10000) versus local site reliability (0.5 to 0.9) for the Clustering, MostReliable and ClosestReliability policies]
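The chart's policy names suggest simple partner-selection rules. The sketch below implements two plausible readings: MostReliable ranks candidates by reliability, and ClosestReliability ranks them by how close their reliability is to the local site's. Both interpretations, the function names and the candidate sites B to E are assumptions; Clustering is not sketched because the deck does not describe it.

```python
def most_reliable(candidates: dict[str, float], k: int) -> list[str]:
    """Pick the k candidate sites with the highest reliability."""
    return sorted(candidates, key=lambda s: candidates[s], reverse=True)[:k]


def closest_reliability(candidates: dict[str, float], own: float, k: int) -> list[str]:
    """Pick the k candidates whose reliability is closest to this site's own."""
    return sorted(candidates, key=lambda s: abs(candidates[s] - own))[:k]


candidates = {"B": 0.9, "C": 0.6, "D": 0.75, "E": 0.5}  # hypothetical sites
print(most_reliable(candidates, 2))             # ['B', 'D']
print(closest_reliability(candidates, 0.7, 2))  # ['D', 'C']
```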

Extend: freedom to negotiate
- Bid for trades

[Figure: site A asks, “How much do I pay for 100 GB of your space?”; bids of “80 GB”, “95 GB” and “120 GB” come back]

Bid trading

Questions:
- When do I call auctions?
- How much do I bid?
- Can I take advantage of the system by being clever?
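One reading of the bid figure is that site A asks others what they want in return for 100 GB of their space, each answer being a price in gigabytes of A's space, and A accepts the cheapest offer. The sketch below encodes that reading only; it says nothing about when to call auctions or how much to bid, and the bidder names and `cheapest_bid` are hypothetical.

```python
def cheapest_bid(bids: dict[str, float]) -> tuple[str, float]:
    """Return (bidder, price) for the lowest-priced bid.

    bids maps each bidding site to the space (GB of the caller's storage)
    it asks in return for 100 GB of its own space.
    """
    bidder = min(bids, key=bids.get)
    return bidder, bids[bidder]


bids = {"B": 80, "C": 95, "D": 120}  # hypothetical bidders; amounts from the slide
print(cheapest_bid(bids))  # ('B', 80)
```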

Extend: some sites are malicious
- Secure services
  - Publish: make copies to survive failures
  - Search: find documents
  - Retrieve: get a copy of a document
- Challenges
  - Attacker may delete a copy
  - Attacker may provide fake search results
  - Attacker may provide an altered document
  - Attacker may disrupt message routing
  - …

Joint work with Mayank Bawa and Neil Daswani

Current and future work

- Access
  - Support searching over collections
  - Distribute indexes via trading
- Prototype implementation
  - Basic SAV architecture implemented
  - Trading protocol/policies must be added
- Develop security techniques further

Current and future work: other topics of interest
- Designing peer-to-peer primitives
  - Building other p2p services
- Other ways of acquiring data
  - How to archive active systems
- Semantic archiving
  - Managing “format obsolescence”
  - Finding data once it is archived

Other parts of the SAV project
- SAV data model
  - Write-once objects
  - Signature-based naming
- How to get objects into SAV
  - InfoMonitor: filesystem
  - Other inputs (Web, DBMS, etc.)
- Modeling archival repositories (Arturo Crespo)
  - Choose the best components and design
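Signature-based naming fits write-once objects: if an object's name is derived from its content, the object can never be silently overwritten, and a reader can detect an altered copy, one of the attacks listed for malicious sites. The sketch assumes the “signature” is a SHA-256 hash of the content; SAV's actual scheme may differ, and `WriteOnceStore` and `verify` are hypothetical names.

```python
import hashlib


class WriteOnceStore:
    """Objects are named by a hash of their content and never overwritten."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        name = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(name, data)  # write-once: an existing entry is kept
        return name

    def get(self, name: str) -> bytes:
        return self._objects[name]


def verify(name: str, data: bytes) -> bool:
    """Check that retrieved bytes really are the object with this name."""
    return hashlib.sha256(data).hexdigest() == name


store = WriteOnceStore()
name = store.put(b"Collection 1, item 7")
print(verify(name, store.get(name)))  # True
print(verify(name, b"altered copy"))  # False
```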

Related work
- Peer-to-peer replication
  - SAV, Intermemory, LOCKSS, OceanStore…
- Fault-tolerant systems
  - RAID, mirrored disks, replicated databases
- Caching systems (Andrew, Coda)
- Deep storage (Tivoli)
- Barter/auction-based systems
  - ContractNet
- Distributed resource allocation
  - File Allocation Problem

Conclusion
- Important, exciting area
  - Preservation is critical
  - Difficult to accomplish
- Many decisions are ad hoc today
  - An effective framework is needed
  - Scientific evaluation of decisions
- Trading networks replicate data
  - Model for trading networks
  - Trading algorithm
  - Simulation results

[Figure: trading network of sites A through H]

For more information

cooperb@stanford.edu http://www-diglib.stanford.edu/
