31
Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload Krishna Gummadi, Richard Dunn, Stefan Saroiu Steve Gribble, Hank Levy, John Zahorjan Most of these are taken from the original presentation by Gummadi

Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload

  • Upload
    ceana

  • View
    43

  • Download
    0

Embed Size (px)

DESCRIPTION

Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload. Krishna Gummadi, Richard Dunn, Stefan Saroiu Steve Gribble, Hank Levy, John Zahorjan Most of these are taken from the original presentation by Gummadi. The Internet has changed (again!). - PowerPoint PPT Presentation

Citation preview

Page 1: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload

Krishna Gummadi, Richard Dunn, Stefan SaroiuSteve Gribble, Hank Levy, John Zahorjan

Most of these are taken from the original presentation by Gummadi

Page 2: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

The Internet has changed (again!)

• Explosive growth of P2P file-sharing systems

– now the dominant source of Internet traffic

– its workload consists of large multimedia (audio, video) files

• P2P file-sharing is very different than the Web

– in terms of both workload and infrastructure

– we understand the dynamics of the Web, but the dynamics

of P2P are largely unknown

Page 3: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Why measure?

Measure

Build model

Predict andValidate model

Gives clue to accurate model building and prediction

Page 4: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

The current paper

• Multimedia workloads– what files are being exchanged– goal: to identify the forces driving the workload and understand

the potential impacts of future changes in them

• P2P delivery infrastructure– how the files are being exchanged– goal: to understand the behavior of Kazaa peers, and derive

implications for P2P as a delivery infrastructure

Studis the Kazaa peer-to-peer file-sharing system, to understand two separate phenomena

Page 5: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Kazaa: Quick Overview

• Peers are individually owned computers– most connected by modems or broadband– no centralized components

• Two-level structure: some peers are “super-nodes”– super-nodes index content from peers underneath– files transferred in segments from multiple peers

simultaneously

• The protocol is proprietary

Page 6: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Methodology

• Capture a 6-month long trace of Kazaa traffic at UW– trace gathered from May 28th – December 17th, 2002

• passively observe all objects flowing into UW campus

• classify based on port numbers and HTTP headers

• anonymize sensitive data before writing to disk

• Limitations:– only studied one population (UW)– could see data transfers, but not encrypted control traffic– cannot see internal Kazaa traffic

Page 7: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Trace Characteristics

Page 8: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Outline

• Introduction

• Some observations about Kazaa

• A model for studying multimedia workloads

• Locality-aware P2P request distribution

• Conclusions

Page 9: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Kazaa is really 2 workloads

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

<= 10 10-100 >100

object size (MB)

% of requests/bytes

requests

bytes transferred

• If you care about:– making users happy: make sure audio arrives quickly– making IT dept. happy: cache or rate limit video

Page 10: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Kazaa users are very patient

• audio file takes 1 hr to fetch over broadband, video takes 1 day– but in either case, Kazaa users were willing to wait for weeks!

– Kazaa is a batch system, while the Web is interactive

0

0.2

0.4

0.6

0.8

1

-3

download latency

% of requests (CDF)

5 mins 1 hour 1 week1 day 150 days

Video

Audio

Page 11: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Kazaa objects are immutable

• The Web is driven by object change

(many visit cnn.com every hour. Why?)– users revisit popular sites, as their content changes– rate of change limits Web cache effectiveness [Wolman 99]

• In contrast, Kazaa objects never change– as a result, users rarely re-download the same object

• 94% of the time, a user fetches an object at-most-once

• 99% of the time, a user fetches an object at-most-twice

– implications:• # requests to popular objects bounded by user population size

Page 12: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Kazaa popularity has high turnover

• Popularity is short lived: rankings constantly change– only 5% of the top-100 audio objects stayed in the top-100

over our entire trace [video: 44%]

• Newly popular objects tend to be recently born– of audio objects that “broke into” the top-100, 79% were born

a month before becoming popular [video: 84%]

Page 13: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Zipf distribution

Zipf’s Law states that the popularity of an object

of rank k is 1/ k of the popularity of the

top-ranked object

rank1 2 3

popu

larit

y On a log-log graph, it will be linear

Page 14: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Kazaa does not obey Zipf’s law

1

10

100

1,000

10,000

100,000

1,000,000

10,000,000

1 10 100 1000 10000 100000 1000000 1E+07 1E+08

object rank

# of requests

Video

WWW objects

• Kazaa: the most popular objects are 100x less popular than Zipf predicts

Page 15: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Factors driving P2P file-sharing workloads

• Our traces suggest two factors drive P2P workloads:

1. Fetch-at-most-once behavior– resulting in a “flattened head” in popularity curve

2. The “dynamics” of objects and users over time– new objects are born, old objects lose popularity, and

new users join the system

• Let’s build a model to gain insight into these factors

Page 16: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

It’s not just Kazaa

1

10

100

1000

1 10 100 1000movie index

rental frequency

0.001

0.01

0.1

1

10

100

1000

10000

100000

1000000

1 10 100 1000movie index

box office sales ($millions)

• Video rental and movie box office sales data show similar properties– multimedia in general seems

to be non-Zipf

video store rentals

box office sales

Page 17: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Outline

• Introduction

• Some observations about Kazaa

• A model for studying multimedia workloads

• Locality-aware P2P request distribution

• Conclusions

Page 18: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Model basics

1. Objects are chosen from an underlying Zipf curve

2. But we enforce “fetch-at-most-once” behavior– when a user picks an object, it is removed from her

distribution

3. Fold in user, object dynamics– new objects inserted with initial popularity drawn from Zipf

• new popular objects displace the old popular objects

– new users begin with a fresh Zipf curve

Page 19: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Model parameters

C # of clients 1,000

O # of objects 40,000

λRclient req. rate 2 objs/day

α Zipf param driving obj. popularity

1.0

P(x) prob. client req. object of pop rank x

Zipf (1.0) +

fetch-at-most-once

A(x) prob. of new object inserted at pop rank x

Zipf (1.0)

M cache size (frac. of obj) varies

λO object arrival rate varies

λcclient arrival rate varies

Page 20: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Fetch-at-most-once flattens Zipf’s head

1

10

100

1000

10000

100000

1 10 100 1000 10000 100000

object rank

# of requests

fetch-at-most-once + Zipf

Zipf

Page 21: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

File sharing effectiveness

An organization is experiencing to much

demand for external bandwidth for P2P

applications. How will the demand change

if a proxy cache is used? Let us examine

the hit ratio of the proxy cache.

Page 22: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Caching implications

0

0.2

0.4

0.6

0.8

1

0 100 200 300 400 500 600 700 800 900 1000

days

hit rate

Zipf

fetch-at-most-once + Zipf

cache size = 1000 objectsrequest rate per user = 2/day

• In the absence of new objects and users– fetch-many: cache hit rate is stable– fetch-at-most-once: hit rate degrades over time

Fetch repeatedlylike Web objects

Popular objects areConsumed early. After this,

It is pretty much random

Page 23: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

0

0.2

0.4

0.6

0.8

1

1 2 3 4 5 6 7 8 9

Object arrival rate / avg. per-user request rate

hit rate

cache size = 8000 objects

New objects help (not hurt)

• New objects do cause cold misses– but they replenish the supply of popular objects that are the

source of file sharing hits

• A slow, constant arrival rate stabilizes performance– rate needed is proportional to avg. per-user request rate

Page 24: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

New users cannot help

• They have potential…– new users have a “fresh” Zipf curve to draw from– therefore will have a high initial hit rate

• But the new users grow old too– ultimately, they increase the size of the “elderly” population– to offset, must add users at exponentially increasing rate

• not sustainable in the long run

Page 25: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Validating the model

1

10

100

1000

10000

1 10 100 1000 10000 100000

object rank

# requests

measured

our model

static Zipf

• We parameterized our model using measured trace values– its output closely matches the trace itself

Page 26: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Outline

• Introduction

• Some observations about Kazaa

• A model for studying multimedia workloads

• Locality-aware P2P request distribution

• Conclusions

Page 27: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Kazaa has significant untapped locality

• We simulated a proxy cache for UW P2P environment– 86% of Kazaa bytes already exist within UW when they are

downloaded externally by a UW peer

88.5%

64.6%

14.0% 11.5%

35.4%

86.0%

0%

20%

40%

60%

80%

100%

all objects Video objects Audio objects

% bytes transferred

miss

hit

Page 28: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Locality Aware Request Routing

• Idea: download content from local peers, if available– local peers as a distributed cache instead of a proxy cache

• Can be implemented in several ways– scheme 1: use a redirector instead of a cache

• redirector sits at organizational border, indexes content, reflects download requests to peers that can serve them

– scheme 2: decentralized request distribution• use location information in P2P protocols (e.g., a DHT)

• We simulated locality-awareness using our trace data– note that both schemes are identical w.r.t the simulation

Page 29: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Locality-aware routing performance

0%

20%

40%

60%

80%

100%

all objects video objects audio objects

% bytes transferred

cold

unavailable

hit67.9%

11.5%

20.8%

37.1%

35.4%

27.5%

63.2%

14.0%

22.8%

• “P2P-ness” introduces a new kind of miss: “unavailable” miss– even with pessimistic peer availability, locality-awareness saves

significant bandwidth– goal of P2P system: minimize the new miss types

• achieve upper bound imposed by workload (cold misses only)

Page 30: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

How can we eliminate unavailable misses?

• Popularity drives a kind of “natural replication”– descriptive, but also predictive

• popular objects take care of themselves, unpopular can’t help

• focus on “middle” popularity objects when designing systems

0

0.2

0.4

0.6

0.8

1

1 10 100 1000 10000 100000 1000000

object # (sorted by popularity)

unavailability miss bytes (CDF)

Video objects

Audio objects

Page 31: Measurement, Modeling and Analysis  of a Peer-to-Peer File-Sharing Workload

Conclusions

• P2P file-sharing driven by different forces than the Web • Multimedia workloads:

– driven by 2 factors: fetch-at-most-once, object/user dynamics– constructed a model that explains non-zipf behavior and

validated it

• P2P infrastructure:– current file-sharing architectures miss opportunity– locality-aware architectures can save significant bandwidth– a challenge for P2P: eliminating unavailable misses