Automatically Inferring Patterns of Resource Consumption in Network Traffic

Cristian Estan, Stefan Savage, George Varghese

University of California, San Diego




April 13, 2023 Traffic Clusters - 2003 2

Who is using my link?


Looking at the traffic

Too much data for a human

Do something smarter!


Looking at traffic aggregates

Aggregating on individual packet header fields gives useful results, but:

» Traffic reports are not always at the right granularity (e.g., individual IP address, subnet, etc.)

» They cannot show aggregates defined over multiple fields (e.g., which network uses which application)

The traffic analysis tool should automatically find aggregates over the right fields at the right granularity

Rank  Destination IP          Traffic
1     jeff.dorm.bigU.edu      11.9%
2     tracy.dorm.bigU.edu     3.12%
3     risc.cs.bigU.edu        2.83%

Most traffic goes to the dorms …

Rank  Destination network     Traffic
1     library.bigU.edu        27.5%
2     cs.bigU.edu             18.1%
3     dorm.bigU.edu           17.8%

What apps are used?

Rank  Source port             Traffic
1     Web                     42.1%
2     Kazaa                   6.7%
3     SSH                     6.3%

[Figure: the header fields a report can aggregate on (source IP, source network, source port, destination IP, destination network, destination port, protocol), with the questions "Where does the traffic come from?" and "Which network uses web and which one Kazaa?"]
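A single-field report like the ones above amounts to summing traffic per value of one header field at one fixed granularity. The minimal Python sketch below shows that and why the granularity matters; the flow records and the helper name `top_aggregates` are hypothetical, and the /16 grouping merely stands in for "destination network".

```python
import ipaddress
from collections import Counter

def top_aggregates(flows, key, n=3):
    """Single-field report: traffic per value of key(flow), largest first,
    returned as (value, percentage of all traffic) pairs."""
    volume = Counter()
    for flow in flows:
        volume[key(flow)] += flow["bytes"]
    total = sum(volume.values())
    return [(value, 100.0 * b / total) for value, b in volume.most_common(n)]

# Hypothetical flow records (illustrative values, not from the talk).
flows = [
    {"dst_ip": "10.1.0.5", "src_port": 80, "bytes": 500},
    {"dst_ip": "10.1.0.6", "src_port": 80, "bytes": 400},
    {"dst_ip": "10.2.0.9", "src_port": 22, "bytes": 600},
]

def to_net(flow):
    # Assumed granularity: a "network" is the /16 containing the address.
    return str(ipaddress.ip_network(flow["dst_ip"] + "/16", strict=False))

by_host = top_aggregates(flows, key=lambda f: f["dst_ip"])
by_net = top_aggregates(flows, key=to_net)
print(by_host[0])  # ('10.2.0.9', 40.0)
print(by_net[0])   # ('10.1.0.0/16', 60.0)
```

The biggest single host sits in a different network than the biggest network, the same effect as the dorm and library tables. Neither report answers "which network uses web", because that needs a key spanning two fields at once.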


Ideal traffic report

Traffic aggregate: share of traffic (interpretation)

Web traffic: 42.1% (Web is the dominant application)

Web traffic to library.bigU.edu: 26.7% (The library is a heavy user of web)

Web traffic from www.schwarzenegger.com: 13.4% (That's a big flash crowd!)

ICMP traffic from sloppynet.badU.edu to jeff.dorm.bigU.edu: 11.9% (This is a Denial of Service attack!!)

This paper is about giving the network administrator insightful traffic reports


Contributions of this paper

Approach

Definitions

Algorithms

System

Experience


Approach

Characterize the traffic mix by describing all important traffic aggregates

Multidimensional aggregates (e.g., a flash crowd described by protocol, port number and IP address)

Aggregates at the right level of granularity (e.g., computer, subnet, ISP)

Traffic analysis is automated: it finds insightful data without human guidance


Definition: traffic clusters

Traffic clusters are the multidimensional traffic aggregates identified by our reports

A cluster is defined by a range for each field

The ranges come from natural hierarchies (e.g., the IP prefix hierarchy), so clusters are meaningful aggregates

Example

» Traffic aggregate: incoming web traffic for the CS Dept.

» Traffic cluster: (SrcIP=*, DestIP in 132.239.64.0/21, Proto=TCP, SrcPort=80, DestPort in [1024,65535])
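To make the definition concrete, here is a minimal Python sketch of a cluster as one range per field, assuming five header fields, `None` standing for `*`, IP prefixes taken from Python's standard `ipaddress` module and ports as inclusive (low, high) pairs. The `Cluster` class is hypothetical, not AutoFocus's internal representation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

PortRange = Tuple[int, int]  # inclusive [low, high]

@dataclass(frozen=True)
class Cluster:
    """A traffic cluster: one range per header field; None means '*' (any)."""
    src_ip: Optional[ipaddress.IPv4Network] = None
    dst_ip: Optional[ipaddress.IPv4Network] = None
    proto: Optional[str] = None
    src_port: Optional[PortRange] = None
    dst_port: Optional[PortRange] = None

    def contains(self, src_ip, dst_ip, proto, src_port, dst_port):
        """True if a packet with these header values falls inside every range."""
        def ip_ok(prefix, addr):
            return prefix is None or ipaddress.ip_address(addr) in prefix

        def port_ok(rng, port):
            return rng is None or rng[0] <= port <= rng[1]

        return (ip_ok(self.src_ip, src_ip) and ip_ok(self.dst_ip, dst_ip)
                and (self.proto is None or self.proto == proto)
                and port_ok(self.src_port, src_port)
                and port_ok(self.dst_port, dst_port))

# The slide's example: incoming web traffic for the CS Dept.
cs_web = Cluster(dst_ip=ipaddress.ip_network("132.239.64.0/21"),
                 proto="TCP", src_port=(80, 80), dst_port=(1024, 65535))
print(cs_web.contains("198.51.100.7", "132.239.66.10", "TCP", 80, 40000))  # True
print(cs_web.contains("198.51.100.7", "132.239.72.10", "TCP", 80, 40000))  # False
```

The /21 covers 132.239.64.0 through 132.239.71.255, so the second packet falls outside the cluster even though every other field matches.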


Definition: traffic report

Traffic reports give the volume of chosen traffic clusters

To keep the report size manageable, describe only clusters above a threshold (e.g., H = total traffic/20)

To avoid redundant data, compress by omitting clusters whose traffic can be inferred (up to error H) from non-overlapping more specific clusters in the report

To highlight non-obvious aggregates, prioritize them using an unexpectedness label

Example

» 50% of all traffic is web

» Prefix B receives 20% of all traffic

» The web traffic received by prefix B is 15% instead of 50% × 20% = 10%, so the unexpectedness label is 15%/10% = 150%
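The unexpectedness arithmetic in the example can be written as a one-line ratio. This sketch covers only the two-field case shown on the slide, where the expected share is the product of the two single-field shares (an independence assumption); the function name is hypothetical.

```python
def unexpectedness(joint_share, share_a, share_b):
    """Actual share of a two-field cluster divided by the share expected if
    the two fields were independent (share_a * share_b)."""
    expected = share_a * share_b
    return joint_share / expected

# Slide example: web = 50% of traffic, prefix B = 20%, web to prefix B = 15%.
label = unexpectedness(0.15, 0.50, 0.20)
print(f"{label:.0%}")  # 150%
```

A label above 100% means the cluster is larger than its parents alone would predict, which is what makes it worth highlighting.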


Contributions of this paper: Algorithms


Algorithms and theory

Algorithms and theoretical bounds are in the paper

Unidimensional reports are easy to compute

Multidimensional reports get exponentially harder as we add more fields

Next few slides:

» Example of unidimensional compression

» Example of the structure of the multidimensional cluster space


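Unidimensional report example

Source IP hierarchy with the traffic of each node (threshold = 100):

/32  10.0.0.2: 15, 10.0.0.3: 35, 10.0.0.4: 30, 10.0.0.5: 40, 10.0.0.8: 160, 10.0.0.9: 110, 10.0.0.10: 35, 10.0.0.14: 75
/31  10.0.0.2/31: 50, 10.0.0.4/31: 70, 10.0.0.8/31: 270, 10.0.0.10/31: 35, 10.0.0.14/31: 75
/30  10.0.0.0/30: 50, 10.0.0.4/30: 70, 10.0.0.8/30: 305, 10.0.0.12/30: 75
/29  10.0.0.0/29: 120, 10.0.0.8/29: 380
/28  10.0.0.0/28: 500

Clusters at or above the threshold: 10.0.0.0/28 (500), 10.0.0.0/29 (120), 10.0.0.8/29 (380), 10.0.0.8/30 (305), 10.0.0.8/31 (270), 10.0.0.8 (160), 10.0.0.9 (110)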


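Unidimensional report example (continued)

Compression: a cluster is omitted when its traffic, minus the traffic of the more specific clusters already in the report, is below the threshold

10.0.0.8/31: 270 - (160 + 110) = 0 < 100, omitted

10.0.0.8/30: 305 - 270 = 35 < 100, omitted

10.0.0.8/29: 380 - 270 = 110 ≥ 100, kept

10.0.0.0/29: 120 ≥ 100, kept

10.0.0.0/28: 500 - (120 + 380) = 0 < 100, omitted

Resulting report:

Source IP     Traffic
10.0.0.0/29   120
10.0.0.8/29   380
10.0.0.8      160
10.0.0.9      110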
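The compression steps above can be reproduced with a bottom-up walk of the prefix tree. The Python sketch below assumes per-source-IP traffic counts and a tree rooted at the example's /28; the helper names are hypothetical, and this recursion is our illustration of the rule on the slides, not the paper's code. Each call returns how much of a node's traffic the report already accounts for, so a parent subtracts only its reported descendants.

```python
import ipaddress
from collections import Counter

def prefix_traffic(per_ip, root):
    """Traffic of every prefix between `root` and /32 that contains a source IP."""
    totals = Counter()
    for ip, volume in per_ip.items():
        addr = ipaddress.ip_address(ip)
        for plen in range(root.prefixlen, addr.max_prefixlen + 1):
            totals[ipaddress.ip_network(f"{addr}/{plen}", strict=False)] += volume
    return totals

def compress(totals, node, threshold, report):
    """Add `node` to `report` if its traffic minus the traffic of reported
    descendants is >= threshold; return the traffic of `node` the report explains."""
    if totals[node] < threshold:
        return 0  # nothing in this subtree can reach the threshold
    explained = 0
    if node.prefixlen < node.max_prefixlen:
        explained = sum(compress(totals, child, threshold, report)
                        for child in node.subnets(prefixlen_diff=1))
    if totals[node] - explained >= threshold:
        report[node] = totals[node]
        return totals[node]
    return explained

traffic = {"10.0.0.2": 15, "10.0.0.3": 35, "10.0.0.4": 30, "10.0.0.5": 40,
           "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75}
root = ipaddress.ip_network("10.0.0.0/28")
totals = prefix_traffic(traffic, root)
report = {}
compress(totals, root, threshold=100, report=report)
for net, volume in sorted(report.items()):
    print(net, volume)
```

Running it prints 10.0.0.0/29 120, 10.0.0.8/29 380, 10.0.0.8/32 160 and 10.0.0.9/32 110, the same four entries as the report above. Pruning subtrees whose total is below the threshold keeps the walk proportional to the number of heavy prefixes rather than the size of the address space.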


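Multidimensional structure example

[Figure: two hierarchies. Source net: All traffic → US, EU; US → CA, NY; EU → GB, DE. Application: All traffic → Web, Mail.]

Nodes (clusters) have multiple parents: (US, Web) lies below both (US, all applications) and (all networks, Web)

Nodes (clusters) overlap: (US, all applications) and (all networks, Web) share the traffic of (US, Web)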
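Both properties fall out of treating a cluster as a tuple with one value per hierarchy. The sketch below encodes the slide's two toy hierarchies as child-to-parent maps, with `*` standing for "All traffic"; the names are hypothetical. A parent generalizes exactly one field by one level, and because ranges from a tree hierarchy are either nested or disjoint, two clusters overlap exactly when their values are comparable in every field.

```python
# Toy hierarchies from the slide; "*" is the root ("All traffic").
SOURCE_NET = {"US": "*", "EU": "*", "CA": "US", "NY": "US", "GB": "EU", "DE": "EU"}
APPLICATION = {"Web": "*", "Mail": "*"}
HIERARCHIES = (SOURCE_NET, APPLICATION)

def parents(cluster):
    """Generalize exactly one non-root field by one level."""
    result = []
    for i, value in enumerate(cluster):
        if value != "*":
            result.append(cluster[:i] + (HIERARCHIES[i][value],) + cluster[i + 1:])
    return result

def ancestors(value, parent_of):
    """The value itself followed by every ancestor up to the root '*'."""
    chain = [value]
    while value != "*":
        value = parent_of[value]
        chain.append(value)
    return chain

def overlap(a, b):
    """Clusters overlap iff, in every field, one value is an ancestor of the other."""
    return all(x in ancestors(y, h) or y in ancestors(x, h)
               for x, y, h in zip(a, b, HIERARCHIES))

print(parents(("US", "Web")))              # [('*', 'Web'), ('US', '*')]
print(overlap(("US", "*"), ("*", "Web")))   # True
print(overlap(("CA", "*"), ("NY", "Web")))  # False
```

With k fields a cluster can have up to k parents, which is why the multidimensional space is a lattice rather than a tree.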


Contributions of this paper: System


System: AutoFocus

[Figure: AutoFocus architecture with a traffic parser, cluster miner, grapher and web-based GUI; inputs are a packet header trace, categories and names]


Contributions of this paper: Experience


Structure of regular traffic mix (SD-NAP)

Annotations on the traffic graph:

» Backups from CAIDA to tape server

» Semi-regular time pattern

» FTP from SLAC Stanford

» Scripps web traffic

» Web & Squid servers

» Large ssh traffic

» Steady ICMP probing from CAIDA


Analysis of unusual events

» UCSD to UCLA route change

» Sapphire/SQL Slammer worm

[Figure label: Site 2]


Conclusions

Multidimensional traffic clusters using natural hierarchies describe traffic aggregates

Traffic reports using thresholding automatically identify conspicuous resource consumption at the right granularity

Compression produces compact traffic reports and unexpectedness labels highlight non-obvious aggregates

Our prototype system, AutoFocus, provides insights into the structure of regular traffic and unexpected events


Thank you!

Alpha version of AutoFocus downloadable from

http://ial.ucsd.edu/AutoFocus/

Any questions?

Acknowledgements: NIST, NSF, Vern Paxson, David Moore, Liliana Estan, Jennifer Rexford, Alex Snoeren, Geoff Voelker


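Bounds and running times

Report type: report size; running time; memory usage

Uncompressed unidimensional report: ≤ 1 + (d-1)T/H; O(n + m(d-1)); O(m(d-1))

Unidimensional report: ≤ T/H; linear; linear

Unidimensional delta report: ≤ T1/H + T2/H; linear

Uncompressed multidimensional report: ≤ (T/H) ∏ di; ≈ result × n; O(m + result)

Multidimensional report: ≤ (T/H) ∏ di / max(di)

Multidimensional delta report: ≈ e^result

(T: total traffic; T1, T2: total traffic of the two intervals compared; H: threshold; d, di: depth of the hierarchy, of field i in the multidimensional case)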
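The two unidimensional size bounds follow from a short counting argument, sketched below under the definitions on the earlier slides, taking d as the number of levels in the hierarchy with the root included. For a compressed report R, write r(c) for the traffic of a reported cluster c that is not already covered by the more specific reported clusters below it. The kept clusters have r(c) ≥ H, and these residual portions are disjoint pieces of the total traffic T.

```latex
% Compressed unidimensional report R (threshold H, total traffic T)
r(c) \ge H \quad \forall c \in R, \qquad \sum_{c \in R} r(c) \le T
\;\Longrightarrow\; |R|\,H \le T
\;\Longrightarrow\; |R| \le \frac{T}{H}

% Uncompressed report: clusters on one level are disjoint, so each of the
% d-1 levels below the root holds at most T/H clusters with traffic >= H
|R_{\mathrm{unc}}| \le 1 + (d-1)\,\frac{T}{H}
```

With H = T/20, for example, a compressed unidimensional report has at most 20 entries no matter how many addresses appear in the trace.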


Open questions

Are there tighter bounds for the size of the reports?

Are there algorithms that produce smaller results?

Are there algorithms that compute traffic reports more efficiently? In streaming fashion?


Delta reports

Why repeat the same traffic report if the traffic doesn't change from one day to the next?

Delta reports describe the clusters that increased or decreased by more than the threshold from one interval to the next

On related traffic mixes, delta reports are much smaller than traffic reports

Multidimensional compression is very hard for delta reports

We only have an exponential algorithm for the cluster delta
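An uncompressed unidimensional delta report is easy to sketch: compute per-prefix totals for both intervals and keep the prefixes whose traffic changed by more than the threshold. This Python sketch is illustrative only. It assumes per-IP traffic counts, considers prefixes from /24 to /32, and uses hypothetical data and helper names. It performs no compression.

```python
import ipaddress
from collections import Counter

def prefix_totals(per_ip, shortest=24):
    """Traffic of every prefix from /shortest to /32 that contains a given IP."""
    totals = Counter()
    for ip, volume in per_ip.items():
        for plen in range(shortest, 33):
            totals[ipaddress.ip_network(f"{ip}/{plen}", strict=False)] += volume
    return totals

def delta_report(before, after, threshold, shortest=24):
    """Uncompressed delta report: prefixes whose traffic rose or fell by more
    than `threshold` between two intervals (positive = increase)."""
    t1 = prefix_totals(before, shortest)
    t2 = prefix_totals(after, shortest)
    changes = {net: t2[net] - t1[net] for net in t1.keys() | t2.keys()}
    return {net: d for net, d in changes.items() if abs(d) > threshold}

# Hypothetical per-IP traffic for two consecutive days.
day1 = {"10.0.0.1": 100, "10.0.0.2": 50, "10.0.1.7": 150}
day2 = {"10.0.0.1": 100, "10.0.0.2": 170, "10.0.1.7": 5}
report = delta_report(day1, day2, threshold=100)
print(report[ipaddress.ip_network("10.0.0.0/24")])  # 120
print(report[ipaddress.ip_network("10.0.1.0/24")])  # -145
print(len(report))  # 18
```

Each of the two changes shows up at all nine prefix lengths on its path, which is exactly the redundancy that compression is meant to remove and that is hard to remove in the multidimensional case.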


Greedy compression algorithm


Multidimensional report example

[Figure panels: Thresholding, Compression]
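The thresholding step of a multidimensional report can be illustrated by brute force. For each flow, enumerate every combination of ancestors across the field hierarchies (∏ di clusters per flow), sum the traffic, and keep the clusters at or above H. This Python sketch is not the paper's algorithm and does no compression. It assumes two fields (source IP and protocol/port), source prefixes limited to /0, /8, /16, /24 and /32, and hypothetical flows and names.

```python
import ipaddress
from collections import Counter
from itertools import product

PREFIX_LENGTHS = (0, 8, 16, 24, 32)  # assumed granularities for the IP field

def ip_ancestors(ip):
    """The source IP at every assumed prefix length, from /0 down to /32."""
    return [str(ipaddress.ip_network(f"{ip}/{n}", strict=False)) for n in PREFIX_LENGTHS]

def port_ancestors(proto, port):
    """Assumed application hierarchy: any -> protocol -> protocol/port."""
    return ["*", proto, f"{proto}/{port}"]

def threshold_report(flows, threshold):
    """Uncompressed multidimensional report: every cluster with traffic >= threshold."""
    totals = Counter()
    for src_ip, proto, port, volume in flows:
        for cluster in product(ip_ancestors(src_ip), port_ancestors(proto, port)):
            totals[cluster] += volume
    return {c: v for c, v in totals.items() if v >= threshold}

# Hypothetical flows: (source IP, protocol, port, bytes); total traffic = 100.
flows = [("10.1.2.3", "TCP", 80, 60), ("10.1.2.4", "TCP", 80, 30),
         ("10.9.0.1", "UDP", 53, 10)]
report = threshold_report(flows, threshold=20)
print(report[("10.1.2.0/24", "TCP/80")])  # 90
print(len(report))  # 18
```

Three flows already yield 18 clusters above a threshold of 20% of the traffic, most of them restating the same 90 bytes of web traffic from 10.1.2.0/24. That is the redundancy the compression panel addresses.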


System details

Part      Language            LoC    Status
Backend   C++                 5400   stable
GUI       HTML, JavaScript    1000   functional
Glue      Perl                350    evolving