Traffic (1993-2000)

• Heavy tails (HT) in net traffic???
• Careful measurements
• Appropriate statistics
• Connecting traffic to application behavior
• “optimal” web layout

HT files

HT traffic

Traffic

Verbal

Data/stat

Mod/sim

Analysis

Synthesis

Is streamed out on the net.

Creating fractal Gaussian internet traffic (Willinger,…)

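[Figure: heavy-tailed files streamed out over time. Log-log plot of P(> size) against file size, with tail $p \propto s^{-\alpha}$, $\alpha > 1.0$, and Hurst parameter $H = (3 - \alpha)/2$.]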
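The link between the file-size tail index and the Hurst parameter of the aggregate traffic can be sketched numerically. This is a minimal illustration, assuming the relation $H = (3 - \alpha)/2$ and the range $1 < \alpha < 2$ in which it is usually stated; the function name `hurst_from_tail_index` is hypothetical.

```python
def hurst_from_tail_index(alpha: float) -> float:
    """Hurst parameter implied by a file-size tail index alpha, using H = (3 - alpha) / 2.

    Assumption: the relation is applied only for 1 < alpha < 2
    (finite mean, infinite variance file sizes).
    """
    if not 1.0 < alpha < 2.0:
        raise ValueError("expected 1 < alpha < 2")
    return (3.0 - alpha) / 2.0


# Heavier tails (alpha closer to 1) give H closer to 1, i.e. stronger long-range dependence.
for alpha in (1.1, 1.5, 1.9):
    print(f"alpha = {alpha:.1f} -> H = {hurst_from_tail_index(alpha):.2f}")
```

As alpha approaches 2 the value of H approaches 1/2, the value associated with traffic that has no long-range dependence.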

Traffic (1993)

• Traffic is “bursty”?

Traffic

Verbal

Traffic (1993-2000)

• Bursty???
• Careful measurements
• Appropriate statistics

Traffic

Verbal

Data/stat

Why?

Heavy tailed files

time

Long space

Becomes long time

Why?

Traffic

Verbal

Data/stat

Mod/sim

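[Figure: heavy-tailed files streamed out over time. Log-log plot of P(> size) against file size, with tail $p \propto s^{-\alpha}$, $\alpha > 1.0$, and Hurst parameter $H = (3 - \alpha)/2$.]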

Traffic

Verbal

Data/stat

Mod/sim

Analysis

[Figure: heavy-tailed files streamed out over time. Log-log plot of P(> size) against file size, with tail $p \propto s^{-\alpha}$, $\alpha > 1.0$.]

What?

[Figure: cumulative frequency against size of events, both log (base 10), decimated data. Series: forest fires, 1000 km² (Malamud); WWW files, Mbytes (Crovella); data compression (Huffman).]

Cumulative: $\log P(X > x)$ plotted against $\log(x)$.

$P(X > x) \approx c x^{-\alpha}$: probability that a file is bigger than $x$.

$c x^{1-\alpha}$: probability that a packet is in a file bigger than $x$.
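The two tails can be compared with a short calculation. This is a sketch under an assumption the slide does not state: file sizes follow an exact Pareto law with minimum size `x_min` and tail index `alpha > 1`, so the constants are fixed by normalization. The function names are hypothetical.

```python
def file_tail(x: float, alpha: float, x_min: float = 1.0) -> float:
    """Fraction of files larger than x for Pareto sizes: (x / x_min) ** (-alpha)."""
    return 1.0 if x <= x_min else (x / x_min) ** (-alpha)


def packet_tail(x: float, alpha: float, x_min: float = 1.0) -> float:
    """Fraction of bytes (hence packets) carried by files larger than x.

    Weighting each file by its size shifts the exponent from -alpha to 1 - alpha.
    Requires alpha > 1 so that the mean file size is finite.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for a finite mean file size")
    return 1.0 if x <= x_min else (x / x_min) ** (1.0 - alpha)


alpha = 1.2
for x in (10.0, 100.0, 1000.0):
    print(f"x = {x:6.0f}: files above x = {file_tail(x, alpha):.2e}, "
          f"packets in files above x = {packet_tail(x, alpha):.2f}")
```

For a tail index near 1, a tiny fraction of the files still carries a large fraction of the packets.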

[Figure: cumulative frequency against size of events, both log (base 10), for fires, web files, and codewords, with reference slopes -1/2 and -1.]

[Figure: cumulative frequency against size of events for forest fires (1000 km²), WWW files (Mbytes), and data compression, with reference slopes -1/2 and -1 and an exponential for comparison.]

[Figure: cumulative frequency against size of events for forest fires (1000 km²), WWW files (Mbytes), and data compression, with an exponential for comparison.]

All events are close in size.

[Figure: cumulative frequency against size of events for forest fires (1000 km²), WWW files (Mbytes), and data compression, with reference slopes -1/2 and -1.]

Most events are small

But the large events are huge
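The contrast between the exponential and power-law cases can be made concrete by comparing a typical event with a rare one. This is a minimal sketch with assumed parameters (unit-mean exponential, Pareto with tail index 1 and minimum size 1); neither value comes from the data on the slides.

```python
import math


def exponential_quantile(q: float, mean: float = 1.0) -> float:
    """Size below which a fraction q of exponentially distributed events fall."""
    return -mean * math.log(1.0 - q)


def pareto_quantile(q: float, alpha: float = 1.0, x_min: float = 1.0) -> float:
    """Size below which a fraction q of Pareto (power-law) events fall."""
    return x_min * (1.0 - q) ** (-1.0 / alpha)


for name, quantile in (("exponential", exponential_quantile),
                       ("power law", pareto_quantile)):
    median = quantile(0.5)
    extreme = quantile(0.9999)  # the 1-in-10,000 event
    print(f"{name:12s} median = {median:.3g}, rare event = {extreme:.5g}, "
          f"ratio = {extreme / median:.4g}")
```

In the exponential case the rare event is only about an order of magnitude larger than the median; in the power-law case it is thousands of times larger.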

[Figure: data with model/theory overlaid for forest fires (FF), WWW files, and data compression (DC).]

[Figure: cumulative frequency against size of events, both log (base 10), decimated data, for WWW files in Mbytes (Crovella).]

Most files are small (mice).

Most packets are in large files (elephants).

Network

Sources

Mice

Elephants

Router queues

Delay sensitive

Bandwidth sensitive

Unfortunate interaction of files with congestion control

[Figure: heavy-tailed files streamed out over time. Log-log plot of P(> size) against file size, with tail $p \propto s^{-\alpha}$, $\alpha > 1.0$.]

Why?

[Figure: cumulative frequency against size of events for WWW files (Mbytes) and data compression, with an exponential for comparison.]

All events are close in size.

Source coding for data compression

Based on frequencies of source word occurrences, select code words to minimize message length.
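One standard way to select code words from occurrence frequencies is Huffman coding, which an earlier slide names as the data compression example. The sketch below builds Huffman code lengths for a toy message; the function name `huffman_code_lengths` and the sample text are illustrative assumptions, not taken from the slides.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique integer
    # tie_breaker ensures the dictionaries themselves are never compared.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}  # one level deeper
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "abracadabra"
freqs = Counter(text)
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[s] * lengths[s] for s in freqs)
print(lengths, "average bits per symbol:", total_bits / len(text))
```

Frequent source words receive short code words and rare ones receive long code words, which is what minimizes the average message length.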

[Figure: data compression (DC), data.]

$$\text{Avg. length} = \sum_i p_i l_i = -\sum_i p_i \log(p_i)$$

How well does the model predict the data?

length $l_i = -\log(p_i)$, $\quad p_i = \exp(-c\, l_i)$
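The exponential relation between code word probability and length follows from a short constrained minimization. The derivation below is a sketch under two assumptions not spelled out on the slide: lengths are treated as real numbers, and the code words satisfy a Kraft-type constraint $\sum_i e^{-c l_i} = 1$, where $c$ is set by the code alphabet.

```latex
\begin{aligned}
&\min_{l_i} \sum_i p_i l_i \quad \text{subject to} \quad \sum_i e^{-c l_i} = 1 \\
&\mathcal{L} = \sum_i p_i l_i + \lambda \Big( \sum_i e^{-c l_i} - 1 \Big) \\
&\frac{\partial \mathcal{L}}{\partial l_i} = p_i - \lambda c\, e^{-c l_i} = 0
  \;\Rightarrow\; e^{-c l_i} = \frac{p_i}{\lambda c} \\
&\sum_i e^{-c l_i} = 1 \;\Rightarrow\; \lambda c = 1
  \;\Rightarrow\; p_i = e^{-c l_i}, \quad l_i = -\tfrac{1}{c} \log(p_i) \\
&\sum_i p_i l_i = -\tfrac{1}{c} \sum_i p_i \log(p_i)
\end{aligned}
```

With $c = 1$ (lengths measured in natural units) this reduces to $l_i = -\log(p_i)$ and an average length equal to the entropy.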

[Figure: data compression (DC), data with model overlaid.]

How well does the model predict the data?

Not surprising, because the file was compressed using Shannon theory.

Small discrepancy due to integer lengths.
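The effect of integer lengths can be seen on a small example. This sketch rounds the ideal lengths $-\log_2(p_i)$ up to whole bits (a Shannon-style code, used here for illustration only) and compares the resulting average length with the entropy; the probabilities are made up for the example.

```python
import math


def entropy_bits(probs):
    """Entropy in bits: the average length if non-integer lengths were allowed."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def shannon_lengths(probs):
    """Integer code lengths l_i = ceil(-log2 p_i), which satisfy Kraft's inequality."""
    return [math.ceil(-math.log2(p)) for p in probs]


probs = [0.4, 0.3, 0.2, 0.1]  # illustrative source word probabilities
lengths = shannon_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))

print("integer lengths:", lengths)
print(f"entropy = {entropy_bits(probs):.3f} bits, average length = {avg_len:.3f} bits")
```

The average length of this rounded code always lies within one bit of the entropy, and the gap vanishes when every probability is a power of 1/2.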

length $l_i = -\log(p_i)$, $\quad p_i = \exp(-c\, l_i)$

$$\text{Avg. length} = \sum_i p_i l_i = -\sum_i p_i \log(p_i)$$

Generalized “coding” problems

Data compression
• Minimize avg file transfer
• No feedback
• Discrete (0-d) topology

Web
• Minimize avg file transfer
• Feedback
• 1-d topology

A toy website model (= 1-d grid HOT design)

Document split into N files to minimize download time.

Traffic

Verbal

Data/stat

Mod/sim

Analysis

Synthesis

Probability of user access

Wasteful

Hard to navigate.


Just right
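The toy model can be sketched as a small optimization. The code below assumes a specific cost that the slides do not spell out: the document is a 1-d row of units, each with a probability of user access, and a user who wants a unit downloads the whole file containing it, so the expected download size is the sum over files of (access probability of the file) times (length of the file). The function name `best_split` and the sample probabilities are hypothetical.

```python
def best_split(access_prob, n_files):
    """Split a 1-d document into n_files contiguous files minimizing sum_i p_i * l_i.

    access_prob[j] is the probability that a user wants unit j.
    Returns (expected download size, list of (start, end) unit ranges, end exclusive).
    """
    m = len(access_prob)
    if not 1 <= n_files <= m:
        raise ValueError("need 1 <= n_files <= number of units")
    prefix = [0.0]
    for p in access_prob:
        prefix.append(prefix[-1] + p)

    def cost(a, b):
        # File covering units a..b-1: access probability times length.
        return (prefix[b] - prefix[a]) * (b - a)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting the first j units into k files.
    best = [[inf] * (m + 1) for _ in range(n_files + 1)]
    cut = [[0] * (m + 1) for _ in range(n_files + 1)]
    best[0][0] = 0.0
    for k in range(1, n_files + 1):
        for j in range(k, m + 1):
            for a in range(k - 1, j):
                c = best[k - 1][a] + cost(a, j)
                if c < best[k][j]:
                    best[k][j], cut[k][j] = c, a
    # Walk the stored cut points backwards to recover the file boundaries.
    bounds, j = [], m
    for k in range(n_files, 0, -1):
        a = cut[k][j]
        bounds.append((a, j))
        j = a
    return best[n_files][m], bounds[::-1]


expected_size, files = best_split([0.7, 0.1, 0.1, 0.1], n_files=2)
print(expected_size, files)
```

Under this cost, frequently accessed content ends up in small files and rarely accessed content in large ones, which is the qualitative pattern the "just right" layout illustrates.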

More complete website models

(Zhu, Yu)

• Detailed models
  – user behavior
  – content and hyperlinks
• Necessary for real web layout optimization
• Statistics consistent with simpler models
• Improved protocol design (TCP)
• Commercial implications still unclear

Traffic (1993-2000)

• Heavy tails (HT) in net traffic???
• Careful measurements
• Appropriate statistics
• Connecting traffic to application behavior
• “optimal” web layout

HT files

HT traffic

Traffic

Verbal

Data/stat

Mod/sim

Analysis

Synthesis

[Figure: data for WWW files and data compression (DC).]

[Figure: data with model/theory overlaid for WWW files and data compression (DC).]

[Figure: data with model/theory overlaid for WWW files.]

Are individual websites distributed like this?

Roughly, yes.

[Figure: data with model/theory overlaid for WWW files and data compression (DC).]

How has the data changed since 1995?

Traffic (1993-2000)

Traffic Topology Layering C&D

Verbal

Data/stat

Mod/sim

Analysis

Synthesis

Theory and the Internet

Traffic Topology C&D Layering

Verbal

Data/stat

Mod/sim

Analysis

Synthesis

Network

Sources

Mice

Elephants

Router queues

Network

Sources

Mice

Elephants

Router queues

Delay sensitive

Bandwidth sensitive

Unfortunate interaction of files with congestion control

Network

Sources

Mice

Elephants

Router queues

Delay sensitive

Bandwidth sensitive

Better Control

Fortunate interaction of files with improved congestion control

High variability in context

More high variability
• Heterogeneity
• Human behavior
• Actuating

Today:
• Simplify/broaden
• Look back/sideways

Extend
• Optimization
• Layer/distribute
• Dynamics/control

Develop
• Delays
• Actuation