Client Assignment in Content Dissemination Networks for Dynamic Data Shetal Shah Krithi Ramamritham Indian Institute of Technology Bombay Chinya Ravishankar University of California


Page 1

Client Assignment in Content Dissemination Networks for Dynamic Data

Shetal Shah, Krithi Ramamritham

Indian Institute of Technology Bombay

Chinya Ravishankar

University of California Riverside

Page 2

Dynamic Data

• Traffic data: packets through switches, vehicles on highways
• Stock prices, sports scores

• Rapid and unpredictable changes
• Time critical, value critical
• Used in online monitoring and decision making

More and more of the data gathered from the web/internet is dynamic.

Page 3

Coherency of Dynamic Data

• Strong coherency: the client and source are always in sync (U(t) = S(t))
• Strong coherency is expensive!

Relax strong coherency: Δ-coherency
• Time domain: Δt-coherency
• Value domain: Δv-coherency
  The difference in the data values at the client and the source is bounded by Δv at all times:

  $\forall t,\ |U(t) - S(t)| \le \Delta v$

  E.g.: temperature changes greater than 1 degree

[Figure: Source S(t) → Repository R(t) → Client U(t)]

Page 4

Broad Focus of Work

To create a scalable content dissemination network (CDN) for streaming/dynamic data.

Metric: fidelity, the % of time the coherency requirement is met
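The fidelity metric can be illustrated with a small sketch. It assumes the source and client values are sampled at the same equally spaced instants, so that "% of time" becomes "% of samples"; the function name `fidelity` and the sample traces are hypothetical and not taken from the slides.

```python
def fidelity(source, client, v):
    """Percentage of sample instants at which |U(t) - S(t)| <= v.

    source, client: equally long lists of values sampled at the same instants
    (an assumption of this sketch). v: the value-domain coherency bound.
    """
    if len(source) != len(client) or not source:
        raise ValueError("traces must be non-empty and of equal length")
    met = sum(1 for s, u in zip(source, client) if abs(u - s) <= v)
    return 100.0 * met / len(source)


# Hypothetical traces: the client misses the bound at the third instant only.
source_trace = [10.0, 10.5, 11.5, 12.0]
client_trace = [10.0, 10.0, 10.0, 12.0]
result = fidelity(source_trace, client_trace, v=1.0)
```

Loss in fidelity, as reported in the later performance slides, would then be 100 minus this value.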

Page 5

Basic Framework: Sources, Repositories, Clients

• Clients request different data items, specifying a coherence requirement for each item

• Repositories derive their requirements from the client requirements

• Source pushes the changes of interest to repositories

• Repositories cooperate with each other and the source to serve clients

Page 6

Example Dissemination Network

Data set: p, q, r. Max clients: 2

[Figure: Source feeding repositories R1, R2, R3, R4; coherence labels shown in the figure: "p: 0.2, q: 0.2, r: 0.2" and "p: 0.4, r: 0.3, q: 0.3"]

Page 7

Challenges – I

• Given the data and coherency needs of repositories, how should repositories cooperate to satisfy these needs?

• How should repositories refresh the data such that coherency requirements of dependents are satisfied?

• How to make the repository network resilient to failures? [VLDB02, VLDB03, IEEE TKDE]

Page 8

Challenges – II: Service to Clients

• Given the data and coherency needs of clients, what data at what coherency should reside in each repository?

• Given the data and the coherency available at repositories, how to assign clients to the repositories?

[Figure: "Service to Clients" split into "Assign data to repositories" and "Assign clients to repositories"]

Page 9

Assigning clients to repositories

Assign <client, data item, coherence> to a repository such that:
• Client request is satisfied
• Overheads are low
  • Communication delay
  • Computational delay

[Figure: dissemination network (Source, R1, R2, R3, R4) with coherence labels "p:0.2, q:0.2, r:0.2" and "p:0.4, r:0.3, q:0.3"; client C1 requests q:0.3?]

Page 10

Overview

• Client assignment problem is NP-Hard
• Solve using preferences
  • Clients and repositories order each other by preferences
  • Use Stable Marriages
• Assign costs and do many-to-one client-repository pairing

Page 11

Cost based Client Assignment

• Assign cost to each potential <client request, repository> pair
• Minimum Cost Assignment = {1,3,7}

[Figure: edges between <client, data item, coherence> requests and repositories, labelled with costs 1, 3, 5, 6, 7, 8, 9]

Page 12

Client Assignment

• An assignment may contribute to delay for other assignments at the same node

• Assignment = {1,3,8}

[Figure: "Minimum Weight Matching"; edges between <client, data item, coherence> requests and repositories, labelled with costs 1, 3, 5, 6, 7, 8, 9]

Page 13

Many-to-one Matching: Min Cost Network Flows

• Directed graph, G = {V, E}
• Start vertex
• End vertex or sink
• Edge
  • Capacity: maximum flow the edge can have
  • Cost: per unit flow
• Intermediate vertex
  • Inflow = outflow

[Figure: example flow network from Start to End with numeric edge labels]

Page 14

Maximum Flow

• Value of the flow: flow leaving the source
• Maximum flow: value of flow is maximum
• Cost of flow = $\sum_{\text{edges}} (\text{flow} \times \text{cost per unit flow})$
• Min cost flow: maximum flow of minimum cost

[Figure: example flow network from Start to End with numeric edge labels]
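The min cost flow problem described on this slide can be written as a standard linear program. The symbols below are notation introduced for this sketch and do not appear in the slides: $f_e$ is the flow on edge $e$, $u_e$ its capacity, $c_e$ its cost per unit flow, and $F^{*}$ the maximum flow value. It also assumes that no edge enters the Start vertex.

```latex
\begin{aligned}
\min_{f}\quad & \sum_{e \in E} c_e\, f_e \\
\text{s.t.}\quad & 0 \le f_e \le u_e && \forall e \in E \\
& \sum_{e \in \mathrm{in}(x)} f_e = \sum_{e \in \mathrm{out}(x)} f_e
  && \forall x \in V \setminus \{\text{Start}, \text{End}\} \\
& \sum_{e \in \mathrm{out}(\text{Start})} f_e = F^{*}
\end{aligned}
```

The second constraint is the "inflow = outflow" condition on intermediate vertices, and the last one restricts attention to flows of maximum value.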

Page 15

Client Assignment Using Network Flows

• X, Y, Z: number of clients the repository is willing to serve

• Capacity of <source, client request> edge = 1

• Sum of capacities on <repository, sink> edges ≥ number of client requests

[Figure: flow network Start → client requests → repositories → End; unit capacities on the Start edges, capacities X, Y, Z on the repository edges into End]

Page 16

Network Flows: Costs and Capacities

• <client request, repository> edge
  • Capacity: 1
  • Cost: function of communication delays and coherence requirement

• Cost of all other edges: 0

[Figure: flow network from Start through client requests and repositories; unit capacities on request edges, capacities X, Y, Z on repository edges]

Page 17

Max Flows

• Flow out of start node = number of client requests
• Each unit of flow makes one assignment
• Cost of unit flow = cost of assignment
• Maximum flow of minimum cost => required solution

But this could overload the repositories!

[Figure: flow network from Start through client requests and repositories; unit capacities on request edges, capacities X, Y, Z on repository edges]
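The network construction from the last three slides can be sketched in a few lines of Python. This is only an illustration: the talk uses the RelaxIV solver, whereas the sketch below swaps in a plain successive-shortest-path solver, and the class `MinCostFlow`, the function `assign` and the cost matrix are hypothetical names and data.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths with a queue-based Bellman-Ford search."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]      # edge indices leaving each node
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets an even index; its residual twin is index ^ 1.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.graph[a].append(len(self.to))
            self.to.append(b); self.cap.append(c); self.cost.append(w)

    def solve(self, s, t):
        flow = total = 0
        while True:
            dist = [float("inf")] * self.n
            prev = [-1] * self.n                 # edge used to reach each node
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft(); queued[u] = False
                for e in self.graph[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v], prev[v] = dist[u] + self.cost[e], e
                        if not queued[v]:
                            queue.append(v); queued[v] = True
            if prev[t] == -1:                    # sink unreachable: flow is maximum
                return flow, total
            push, v = float("inf"), t
            while v != s:                        # bottleneck along the path
                push = min(push, self.cap[prev[v]]); v = self.to[prev[v] ^ 1]
            v = t
            while v != s:                        # augment along the path
                e = prev[v]; self.cap[e] -= push; self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push; total += push * dist[t]


def assign(costs, capacities):
    """costs[i][j]: cost of serving request i at repository j; capacities[j]: X, Y, Z."""
    n_req, n_rep = len(costs), len(capacities)
    start, end = 0, n_req + n_rep + 1
    g = MinCostFlow(n_req + n_rep + 2)
    for i in range(n_req):
        g.add_edge(start, 1 + i, 1, 0)                       # <start, request>: capacity 1
        for j in range(n_rep):
            g.add_edge(1 + i, 1 + n_req + j, 1, costs[i][j])  # <request, repository>
    for j in range(n_rep):
        g.add_edge(1 + n_req + j, end, capacities[j], 0)      # <repository, sink>
    flow, total = g.solve(start, end)
    # A saturated forward <request, repository> edge marks an assignment.
    chosen = {i: g.to[e] - 1 - n_req
              for i in range(n_req) for e in g.graph[1 + i]
              if e % 2 == 0 and g.cap[e] == 0}
    return flow, total, chosen


# Three requests, two repositories willing to serve 2 and 1 requests.
result = assign([[1, 4], [2, 6], [3, 9]], [2, 1])
```

In this example every request is cheapest at repository 0, but its capacity of 2 forces one request to repository 1; the flow picks the request for which that move costs the least.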

Page 18

Considering Load: Iterative Min Cost Flows

• Load depends on the coherence requirement of the assignments
• Assignments depend on this load!
• Limit the number of requests assigned to a repository using <repository, sink> capacity
  • But this number does not translate into load
  • It translates to load if coherences are close to each other

Page 19

Iterative Min Cost Flows

Split the requests into ranges. For each range:

• Calculate the approximate load at each repository due to the previous assignments

• Calculate the approximate load of the assignments to be made in this range

• Determine the capacity of each repository

• Find min-cost max flow

Page 20

For Each Range

• Number of updates for coherence $c_i$ is $c_i^{-2}$
• Approximate load at a repository: $A_i$. Average load: $A$.
• For $n$ client requests, expected load = $n \cdot c_i^{-2}$
• Number of repositories: $k$
• Let $t_i$ be the number of assignments in the current range to repository $R_i$
• Total load at $R_i$ will be $A_i + t_i \cdot c_i^{-2}$
• Average load at R after assignment = $A + \frac{n}{k}\, c_i^{-2}$
• Capacity for $R_i$ = $\frac{n}{k} + c_i^{2}\,(A - A_i)$
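The capacity for each repository follows from asking that every repository end up at the same load: setting A_i + t_i * c^-2 equal to the post-assignment average A + (n/k) * c^-2 and solving for t_i gives t_i = n/k + c^2 * (A - A_i). The sketch below evaluates that expression; the function name `range_capacities`, the use of a single representative coherence `c` per range, and the sample loads are assumptions made for illustration.

```python
def range_capacities(loads, n, c):
    """Per-repository capacity t_i = n/k + c^2 * (A - A_i) for one range.

    loads: approximate current load A_i of each repository
    n:     number of client requests in the current range
    c:     representative coherence of the range (an assumption of this sketch)
    """
    k = len(loads)
    avg = sum(loads) / k                      # average load A before this range
    return [n / k + c * c * (avg - a) for a in loads]


# Two repositories with loads 8 and 16, ten requests at coherence 0.5.
loads = [8.0, 16.0]
caps = range_capacities(loads, n=10, c=0.5)
# Load after assignment, using "updates per request = c^-2".
final = [a + t / (0.5 * 0.5) for a, t in zip(loads, caps)]
```

The lightly loaded repository receives the larger share, and both end at the same load. A practical version would also need to round the values to integers and clamp negative results to zero before using them as <repository, sink> capacities; the slides do not say how this is done.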

Page 21

Best Effort Service

[Figure: dissemination network (Source, R1, R2, R3, R4) with coherence labels "p:0.2, q:0.2, r:0.2" and "p:0.4, r:0.3, q:0.3"; client C1 requests q:0.1]

Client will be served q at coherence 0.2

Page 22

Augmentation

[Figure: dissemination network (Source, R1, R2, R3, R4) with coherence labels "p:0.2, q:0.1, r:0.2" and "p:0.4, r:0.3, q:0.3"; client C1 requests q:0.1]

Coherence of A for q is changed to 0.1.
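The difference between best effort service and augmentation can be expressed in two small functions. This is a sketch under the assumption that a smaller coherence value is more stringent and that a repository can always serve a client at a looser coherence than the one it holds; the function names are hypothetical.

```python
def best_effort(requested, available):
    """Coherence actually delivered when the repository is left unchanged."""
    # The client cannot get anything tighter than what the repository holds.
    return max(requested, available)


def augment(requested, available):
    """Tighten the repository's coherence if the request needs it.

    Returns (new repository coherence, coherence delivered to the client).
    """
    new_available = min(requested, available)
    return new_available, max(requested, new_available)


# Client C1 asks for q at 0.1 from a repository holding q at 0.2.
served = best_effort(0.1, 0.2)
augmented = augment(0.1, 0.2)
```

With best effort the client is served at 0.2; with augmentation the repository's coherence for q becomes 0.1 and the request is met exactly, at the price of more updates flowing to that repository.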

Page 23

Experimental Methodology

• Network: 1 source, 10-20 repositories, 10,000-80,000 client requests
• Real stock traces: 100-1000
• Time duration of observations: 10,000 s
• Ranges for min cost flow: {0.01-0.03, 0.03-0.07, 0.07-0.2, 0.2-1.0}
• Network flow solver: RelaxIV from www.di.unipi.it/di/groups/optimize/ORGroup.html

Page 24

For comparison… Prior online Global Heuristic

• Selector node for each data item
• Selector keeps information on
  • coherence requirements at repositories
  • delays between the nodes in the network
  • number of clients assigned to each repository
• Client is assigned to the repository where the sum of the delays is minimized.
• Two flavours: GHIS, GHES.

Agarwal et al. Construction of a Temporal Coherency Preserving Dynamic Data Dissemination Network. RTSS'04

Page 25

Performance of the algorithms

• GHIS does better than MCF and GHES initially, but degrades rapidly: unsatisfied requests, source overloading!

• Augmentation performs very well

• GHES and MCF are comparable for a small number of repositories

Workload: 50% of client requests between 0.01 and 0.09, the remaining from 0.1 to 0.99

Page 26

MCF vs GHES (best effort)

• MCF does better as the number of repositories increases

• In fact, for some simple inputs, MCF did better than GHES by a factor of 9!

Topology: 1 source, 10 repositories, 50 data items

[Figure legend: GHIS, GHES, MCF]

Page 27

Augmentation helps, but…

• As the load increases, augmentation increases the loss in fidelity

• As load increases, serving clients at less stringent coherence requirements might actually reduce the loss in fidelity!

Page 28

Need to adapt to load – Fair vs. biased approaches

[Figure: two panels, "Fair Approach" and "Biased Approach"; legend: MCF_aug, MCF]

It is better to be biased than to be fair!

Page 29

Adaptive Algorithm

• For each data item, the source maintains a list of unique coherences and the number of clients for each coherence

• If the queuing delay at any source/repository crosses a threshold th1:
  • For each data item, the source reduces the coherence of service for some clients

• If the queuing delay at any source/repository goes below a threshold th2:
  • Resume service at the desired coherency to some of the clients
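The two thresholds act as a hysteresis band, which can be sketched as below. The slides do not say which clients, or how many, are moved at each step, so the fixed `batch` size, the function name `adapt` and the sample delays are assumptions of this sketch; it also assumes th2 < th1.

```python
def adapt(delay, degraded, total, th1, th2, batch=1):
    """Return the new number of clients served at a relaxed coherence.

    delay:    observed queuing delay at the most loaded source/repository
    degraded: clients currently served below their desired coherence
    total:    clients that could be degraded
    batch:    clients moved per step (hypothetical; not given in the slides)
    """
    if delay > th1:                       # overloaded: relax service for some clients
        return min(total, degraded + batch)
    if delay < th2:                       # load has eased: restore some clients
        return max(0, degraded - batch)
    return degraded                       # inside the band: leave things as they are


# Hypothetical sequence of queuing delays with th1 = 10 and th2 = 4.
degraded, history = 0, []
for delay in [5, 12, 15, 9, 3, 2]:
    degraded = adapt(delay, degraded, total=3, th1=10, th2=4)
    history.append(degraded)
```

Keeping a gap between th1 and th2 avoids flipping clients back and forth when the delay hovers around a single threshold.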

Page 30

Performance of the adaptive algorithm

Augmented adaptation performs the best!

Page 31

Conclusions and Current Work

Conclusions
• We prove that the client assignment problem is NP-Hard
• Develop two new heuristics for the client assignment problem
• Develop an adaptive algorithm for client assignment

Current Work
• Investigation of the algorithms in real network settings – Planet Lab.

Page 32

Thank You!