125
Traffic classification and applications to traffic monitoring Marco Mellia Electronic and Telecommunication Department Politecnico di Torino Email:[email protected] 1

Traffic classification and applications to traffic monitoring

  • Upload
    amos

  • View
    53

  • Download
    2

Embed Size (px)

DESCRIPTION

Traffic classification and applications to traffic monitoring. Marco Mellia Electronic and Telecommunication Department Politecnico di Torino Email:[email protected]. Traffic Classification & Measurement. Why ? Identify normal and anomalous behavior - PowerPoint PPT Presentation

Citation preview

Page 1: Traffic classification and applications to traffic monitoring

Traffic classification and applications to traffic monitoring

Marco MelliaElectronic and Telecommunication DepartmentPolitecnico di TorinoEmail:[email protected]

1

Page 2: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Traffic Classification & Measurement Why?

Identify normal and anomalous behavior Characterize the network and its users Quality of service Filtering …

How? By means of passive measurement

2

Page 3: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012 3

Scenario

Traffic classifier Deep packet inspection Statistical methods

Persistent and scalable monitoring platform Round Robin Database (RRD) Histograms

Internal ClientsEdge

Router

External Servers

http

://ts

tat.t

lc.po

lito.

it

Page 4: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Tstat at a Glance

Page 5: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Worm and Viruses?

Did someone open a Christmas card? Happy new year to Windows!!

Page 6: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Anomalies (Good!)Spammer Disappear McColo SpamNet shut off on Tuesday, November 11th, 2008

Page 7: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

New Applications – P2PTVFiorentina 4 - Udinese 2

Inter 1 - Juventus 0

Page 8: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Megaupload blocked 19/01/12

Page 9: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

How to monitor traffic?

All previous examples rely on the availability of a CLASSIFIER A tool that can discriminate classes of traffic

Classification: the problem of assigning a class to an observation The set of classes is pre-defined The output may be correct or not

9

Page 10: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Some terminology

Question: Is this a cat, a rabbit, or a dog?

10

Page 11: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Some terminology

Question: Is this a cat, a rabbit, or a dog?

11

Page 12: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

How to compute performance? Confusion matrix

On rows we have the actual class On columns we have the predicted class

Allows to see if some confusion arises

12

Predicted classCat Dog Rabbit

Actualclass

Cat 5 3 0Dog 2 3 1

Rabbit 0 2 11

Page 13: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

How to compute performance? Confusion matrix

True positive It was classified as a cat, and it was a cat

13

Predicted classCat Dog Rabbit

Actualclass

Cat 5 3 0Dog 2 3 1

Rabbit 0 2 11

Page 14: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

How to compute performance? Confusion matrix

False negative It was classified NOT as a cat, but it was a cat

14

Predicted classCat Dog Rabbit

Actualclass

Cat 5 3 0Dog 2 3 1

Rabbit 0 2 11

Page 15: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

How to compute performance? Confusion matrix

True negative It was classified NOT as a cat, and it was NOT a cat

15

Predicted classCat Dog Rabbit

Actualclass

Cat 5 3 0Dog 2 3 1

Rabbit 0 2 11

Page 16: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

How to compute performance? Confusion matrix

False positive It was classified as a cat, but it was NOT a cat

16

Predicted classCat Dog Rabbit

Actualclass

Cat 5 3 0Dog 2 3 1

Rabbit 0 2 11

Page 17: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Other metrics

Accuracy: is the ratio of the sum of all True Positives to the sum of all tests, for all classes.

It is biased toward the most predominant class in a data set. Consider for example a test to identify patients

that suffer from a disease that affects 10 patient over 100 tests. The classifier that always returns ``sane'' will have accuracy of 90%.

17

Page 18: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Predicted classCat Dog Rabbit

Actualclass

Cat 5 3 0Dog 2 3 1

Rabbit 0 2 11

Other metrics

Recall of a class: is the ratio of the True Positives and the sum of True Positives and False Negatives. Recall(cat)=5/(5+3+0)

It is a measure of the ability of a classifier to select instances of the given class from a data set 18

Page 19: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Other metrics

Precision of a class: is the ratio of True Positives and the sum of True Positives and False Positive Precision(cat) = 5/(5+2+0)

It is a metric that measure how precise is the classifier in labeling only samples of a given class19

Predicted classCat Dog Rabbit

Actualclass

Cat 5 3 0Dog 2 3 1

Rabbit 0 2 11

Page 20: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Traffic classification

InternetService Provider

Look at the packets…

Tell me what protocol and/or application

generated them

Page 21: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

InternetService Provider

Port:

Port: 4662/4672

Port:

Port:

??

? ?

Payload: “bittorrent”

Payload: E4/E5

Payload:

Payload:

?

RTP protocol

Skype Bittorrent

Gtalk eMule

Typical approach: Deep Packet Inspection (DPI)

Page 22: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

The problem of traffic classification Deep Packet Inspection

Based on looking for some pre-defined payload patterns, deep in the packet

Simple at L2-L4 “if ethertype == 0x0800, then there is an IP packet” Usually done with a set of if-then-else or even switch-

case Ambiguous at L7

TCP port 80 does not mean automatically “protocol HTTP”

Page 23: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

DPI: Rule-set complexity Practical rule-sets:

Snort, as of November 2007 8536 rules, 5549 Perl Compatible Regular Expressions

OpenDPI as of February 2012 118 protocols

Tstat as of February 2012 Approx 200 classes/services

23

Deep packet inspection

Regular expression matching at line rate

Finite Automata based techniques

=

Page 24: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Some notes...

Protocol identification… … or application verification?

Skype can use the standard HTTP protocol to exchange data

Is that traffic “Skype” or “HTTP”? Today everything is going over HTTP

Is it Facebook? Twitter? YouTube video? Or HTTP?

Page 25: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

The question

Which granularity are you interested into ??

25

Page 26: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Several approaches to traffic classification

Content-based

Packet-based(e.g., Spatscheck)

Port-based(stateless)

Statistical methods

Host social behaviour(e.g., Faloutsos)

Traffic statistics(e.g., Salgarelli, Baiocchi, Moore, Mellia)

Auto-learning methods (e.g. Bayes)

Preclassified bins

Message-based

Protocol behaviour(e.g., BinPac, SML)

Traffic classification

Pre-computed or auto-learning signatures

Payload-based(stateful)

Page 27: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Some references

Page 28: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Some references

Some critical overview/tutorial T.T.T.Nguyen, G.Armitage. A survey of techniques for internet traffic

classification using machine learning, Communications Surveys & Tutorials, IEEE, V.10, N.4, pp.56 - 76

H.Kim, KC Claffy, M.Fomenkov, D.Barman, M.Faloutsos, K.Lee. Internet traffic classification demystified: myths, caveats, and the best practices. In Proceedings of the 2008 ACM CoNEXT Conference (CoNEXT '08). ACM, New York, NY, USA, 2008.

For this class: D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli Revealing skype traffic:

when randomness plays with you ACM SIGCOMM Kyoto, JP, ISBN: 978-1-59593-71, 27 August 2007.

A. Finamore, M. Mellia, M. Meo, D. Rossi KISS: Stochastic Packet Inspection 1st TMA Workshop, Aachen, 11 May 2009.

A. Finamore, M. Mellia, M. Meo, D. Rossi, KISS: Stochastic Packet Inspection Classifier for UDP Traffic, IEEE/ACM Transactions on Networking "5", Vol. 18, pp. 1505-1015, ISSN: 1063-6692, October 2010.

G. La Mantia, D. Rossi, A. Finamore, M. Mellia, M. Meo, Stochastic Packet Inspection for TCP Traffic, IEEE ICC, Cape Town, South Africa, 23 May 2010.

Page 29: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

InternetService Provider

Port:

Port: 4662/4672

Port:

Port:

??

? ?

Payload: “bittorrent”

Payload: E4/E5

Payload:

Payload:

?

RTP protocol

Skype Bittorrent

Gtalk eMule

Typical approach: Deep Packet Inspection (DPI)

It fails more and more:P2P

EncryptionProprietary solution

Many different flavours

Page 30: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

The Failure of DPI

11.05.2008 12:29 eMule 0.49a released

1.08.2008 20:25 eMule 0.49b released

Page 31: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Possible Solution: Behavioral ClassifierPhase 1

Feature

Phase 3

Verify

1. Statistical characterization of traffic2. Look for the behaviour of unknown traffic and

assign the class that better fits it3. Check for possible classification mistakes

Phase 2

DecisionTraffic(Known)

(Training) (Operation)

Page 32: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Behavioural classifiers

Which statistics? Packet size

Average, std, max, min Len of first X pkts

IPG Average, std, max, min IPG of first X+1 pkts

Total size, duration, #data packets From client, from server, from both RTT, #concurrent connection, rtx,

dups, … TCP options, flags, signaling, …

Feature selection?

Which decision process? Ad Hoc Bayesian Neural Networks Decision trees SVM …

Which training set? Supervised techniques

32

Page 33: Traffic classification and applications to traffic monitoring

The case of Skype

Consider a simple example

33

Page 34: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Our Goal

Identify Skype traffic Motivations

Operators need to know what is running in their network New business models, provisioning, TE, etc.

Understand user behaviour Traffic characterization, security … It’s fun

Page 35: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Skype Overview

Skype offers voice, video, chat and data transfer services over IP

Closed design, proprietary solutions P2P technology Proprietary protocols Encrypted communications

Easy to use, difficult to reveal It is the perfect example of DPI failure

No serverNo well-known port…

No standardNo RFC

State-of-the-ArtEncryption/ObfuscationMechanisms

Page 36: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Our Goal

Identify Skype traffic Voice stream first: both E2E and SkypeOut/In

streams Possible video/chat/file transfers/signaling

Constraints Passive observation of traffic Protocol ignorance

Page 37: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Three Classifiers

Naïve Bayes Classifier

Payload Based Classifier

Chi Square Classifier

Skype?

TrafficFlow

Skype?

Skype?

AND

Skype?

Page 38: Traffic classification and applications to traffic monitoring

Phase 1 – try to understand it

Page 39: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Skype as VoIP Application

Skype selects the voice codec from a list Low bit rate: 10-32 kbps Regular Inter-Packet-Gap (30 ms frames)

Redundancy may be added to mitigate packet loss

Framing may be modified from the original codec one

Multiplexes different source into the same message (voice, video, chat,…)

Page 40: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Skype Source Model

SkypeMessageTCP/UDPIP

Page 41: Traffic classification and applications to traffic monitoring

Skype Header Formats(What we guess about it)

Can we design a DPI classifier?

Page 42: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Possible Skype Messages

Signaling and data messages Use TCP, with ciphered payload

Login, lookup, signaling… Data flow

Use UDP whenever possible: payload is encrypted… but some header MUST be exposed

Question: Some header MUST be exposed… Why?!??

Impossibleto exploit.Everything is ciphered

Page 43: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Possible Skype Messages

Signaling and data messages Use TCP, with ciphered payload

Login, lookup, signaling… Data flow

Use UDP whenever possible: payload is encrypted… but some header MUST be exposed

Source

AES

Receiver

AES

Impossibleto exploit.Everything is ciphered

Unreliable

Page 44: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Skype Source Model

SkypeMessageTCP/UDPIP

Page 45: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

SoM Format for E2E Messages

Start of Message (SoM) of End2End messages carried by UDP has:

ID: 16 bits long random identifier FUNC: 5 bits long function (multiplexing?),

obfuscated in a Byte

0 8 16 24---------------------| ID | FUNC |---------------------

Page 46: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012 46

Function Values

0x01 = ??Query message

0x02 = ??Query

0x0d = Data

0x07 = NAK

VoiceVideoChatFile

Page 47: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

PBC SoM can be used to identify

Skype flows carried by UDP 5bits long signature

Question: Which is the chance that you have a

false positive?

Classic signature based classifier

Page 48: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

PBC SoM can be used to identify

Skype flows carried by UDP 5bits long signature

Question: Which is the chance that you have a

false positive? We look for 1 string out of 32 possible strings

1/32 of false detection possible Can we improve it by checking multiple packets

Yet we can have a very high false positive rate

Classic signature based classifier

Page 49: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

PBC

Any other smart way of improving accuracy? Hint: this is UDP

49

Page 50: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

PBC SoM can be used to identify

Skype flows carried by UDP 5bits long signature

Classic signature based classifier

Page 51: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

PBC SoM can be used to identify

Skype flows carried by UDP 5bits long signature

IMPROVE: Identify Skype socket address at clients The UDP port is FIXED and not random (as in TCP) Then, look for Skype flows with the same UDP port

It works with UDP only at edge node only

Complex Cannot discriminate VOICE/VIDEO/CHAT/DATA

Classic signature based classifier

Page 52: Traffic classification and applications to traffic monitoring

Skype Encrypts Traffic

Can we leverage this?

Page 53: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Skype Source Model

SkypeMessageTCP/UDPIP

Page 54: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Randomness Classifier

Skype encrypts traffic payload looks like random

Some headers are constant (FUNC) Apply randomness test to the payload bits

Chi-Square test:statistic test for random sequences

i

i

EEx 2

Page 55: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012 55

CSC

Split the payload into groups

Apply the test on the values assumed at each group Each message is an

observation Some groups will contain

Random bits Mixed bits Deterministic bits

0 8 16 24---------------------| ID | FUNC |---------------------

Page 56: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

CSC

1

10

100

1000

10000

100000

1e+006

100 1000 10000 100000 1e+006n [pkt]

Deterministic groupRandom group

Mixed group

Set a threshold

Page 57: Traffic classification and applications to traffic monitoring

Skype is a VoIP Application

Which are the features that make it different from a bulk download?

Page 58: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Skype Source Model

SkypeMessageTCP/UDPIP

Page 59: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Which features?

Question: Which features would you select to differentiate a VoIP stream from a data download?

59

Page 60: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Sample Trace

0 20 40 60 80

100[K

bps]

Average ThroughputBandwidth limit

0 20 40 60 80

100

[ms]

Framing

0 50

100 150 200 250 300

0 30 60 90 120 150 180 210 240 270 300

[Byt

es]

Time [s]

Skype Message Size

RegularIPG

Small/regularpackets

Page 61: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Naive Baysean Classifier

Simple classifier: based on the a-priori prob, evaluate the a-posteriori prob

How similar is this flow to a Skype voice flow?

What makes “VoIP” traffic different from other traffic? Packet size, i.e., small packets (packet NBC) Inter-Packet-Gap, i.e., small IPG (IPG NBC)

Page 62: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Design of the behavioral classifier We consider windows of N packets In each window, we check the IPG and the payloadsize

distribution We compare against the expected distribution and

compute a belief One belief for each “mode”

After K windows, take a decision Consider the most likely mode in each windows Average over all K windows to get the average max belief Compare it against a threshold

62

Page 63: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Skype Naive Bayes Classifier

W(k) W(k+1) W(k+2) W(k+3) W(k+4)

X

Y

max

max

AVG

AVG

min

Bs(k,j)

E[Bs(,j)]

Bt(k) E[Bt]

B

Packet NBCPacket NBCPacket NBC

IPG NBCIPG NBCIPG NBC``

Page 64: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

NBC over Time

-25

-20

-15

-10

-5

0

0 30 60 90 120 150 180 210 240

Bel

ief

Time

E[PKT] = 90BE[PKT] = 210BE[PKT] = 252B

0

50

100

150

200

[Byt

es]

Set a threshold

Page 65: Traffic classification and applications to traffic monitoring

Results

Page 66: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Three Classifiers

Naïve Bayes Classifier

Payload Based Classifier

Chi Square Classifier

Skype?

TrafficFlow

Skype?

Skype?

AND

Skype?

UDPbenchmarkdataset

Page 67: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Scenario

Testbed traces: 100% accuracy Campus LAN@polito

Simple scenario, no P2P, no VoIP Italian ISP

Stiff scenario: lot of P2P, tons of VoIP Results consider

True positive (OK): Skype, and identified False positive (FP): Not Skype, but identified False negative (FN): Skype, but discarded

Page 68: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

N OK FP FP% FN FN%PBC 65 (50) --- --- --- ---

NBC 27437 50 27387 73.73% 15 23.08%

CSC 191 57 134 0.36% 8 12.31%

NBC+CSC 51 49 2 0.01% 16 24.62%

TOT >100 37212

Performance Evaluation: UDP

NBC + CSC

Chi Square Classifier

Naïve Based Classifier

Payload Based Classifier

Page 69: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

CSC Threshold Impact

1

10

100

1000

10000

100000

1e+006

100 1000 10000 100000 1e+006n [pkt]

Deterministic groupRandom group

Mixed group

0 1 2 3 4 5 6

0 100 200 300

FP [%

]

E2E

0 10 20 30 40 50 60 70 80 90

100

0 100 200 300

FN [%

]

CSC Threshold

Page 70: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

NBC Threshold Impact

-25

-20

-15

-10

-5

0

0 30 60 90 120 150 180 210 240

Bel

ief

Time

E[PKT] = 90BE[PKT] = 210BE[PKT] = 252B

0

50

100

150

200

[Byt

es]

0

1

2

3

4

5

-30 -25 -20 -15 -10 -5 0

FP [%

]

E2E

0

20

40

60

80

100

-30 -25 -20 -15 -10 -5 0

FN [%

]

Minimum Belief

E2E

Page 71: Traffic classification and applications to traffic monitoring

Kiss: chi square stocastich classifier orStocastic Packet InspectionGeneralize it

71

Page 72: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Phase 1

Feature

Phase 3

Verify

Phase 2

DecisionTraffic(Known)

Our Approach

Statistical characterization of bits in a flow

Do NOT look at the SEMANTIC and TIMING… but rather look at the protocol FORMAT

Testc2

Page 73: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

0 4 8 16 19 24 32

Source Port Destination Port

Sequence Number

Acknowledgment Number

Checksum

Options

Window

Urgent Pointer

HLEN Resv Control flag

Padding

Question: Which protocol is this?

CONSTANT CONSTANT

CONSTANTCONSTANT

CONSTANT

COUNTER

COUNTER

RANDOM

RANDOM

RANDOM

Page 74: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012 74

CSC

Split the payload into groups

Apply the test on the values assumed at each group Each message is an

observation Some groups will contain

Random bits Mixed bits Deterministic bits

0 8 16 24---------------------| ID | FUNC |---------------------

Page 75: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Chunking and c2

UDP headerFirst N payload

bytes

C chunks Each of

b bitsc2

1c2

C[ ], … ,Vector of Statistics

The provides an implicit measure of entropy or randomness

c2

Observeddistribution

Expecteddistribution(uniform)

Page 76: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Consider a chunk of 2 bits:

0 1 2 3 0 1 2 3 0 1 2 3

RandomValues

DeterministicValue Counter

Oi

and different beaviour

Ei

Page 77: Traffic classification and applications to traffic monitoring

Question: Why comparing against a UNIFORM pdf??

77

Page 78: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

4 bit long chunks: evolution

random

x x x x

c2

Page 79: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

random

Deterministic )12(2 bNc

0 0 0 1

4 bit long chunks: evolutionc2

Page 80: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

random

deterministic

mixed

x 0 0 0

x 0 x 0

0 x x x

4 bit long chunks: evolutionc2

Question: is it dependent on WHICH bits are fixedAnd WHICH are random???

Page 81: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

KISS

1

10

100

1000

10000

100000

1e+006

100 1000 10000 100000 1e+006n [pkt]

Deterministic groupRandom group

Mixed group

Page 82: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Question: What is this?!?!

2 byte long counter

MSG L2 L1 LSGMostSignificantGroup

LessSignificantGroup

Page 83: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Protocol format as seen from thec2

Page 84: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Statistical characterization of bits in a flow

Decision process Test

Minimum distance / maximum likelihood

c2

Phase 1

Feature

Phase 3

Verify

Phase 2

DecisionTraffic(Known)

Our Approach

Page 85: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

C-dimension space

c21

c2C[ ], … ,

Iperspace

ClassificationRegions

EuclideanDistance

Support VectorMachine

c2i

c2j

?Class

Class

My Point

Page 86: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Example considering the c2

Page 87: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

c2i

c 2j Centroid

Center of mass

Euclidean Distance Classifier

Page 88: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

c2i

c 2j

True NegativeAre “Far”

True PositivesAre “Nearby”

CentroidCenter of mass

Euclidean Distance Classifier

Page 89: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

c2i

c2j

False Positives

CentroidCenter of mass

Iper-sphere

Euclidean Distance Classifier

Page 90: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

c2i

c 2j Centroid

Center of mass

Iper-sphere False negatives

Radius

Euclidean Distance Classifier

Page 91: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

c2i

c 2j Centroid

Center of mass

Iper-sphere max { True Pos. } min { False Neg. }

Confidence

The distance is a measure of the condifence of the decision

Euclidean Distance Classifier

Page 92: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Radius

True

Pos

itive

– F

alse

pos

itive

How to define the sphere radius?

Page 93: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Space ofsamples(dim. C)

Kernel function

Space of feature

(dim. ∞)

Kernel functions Move point so that borders

are simple

Support Vector Machine

Page 94: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Support vectors

Support vectors

Kernel functions Move point so that borders

are simple

Borders are planes Simple surface! Nice math Support Vectors

LibSVM

Support Vector Machine

Page 95: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Decision Distance from the border Confidence is aprobability

p ( class )

Kernel functions

Borders are planes Simple surface! Nice math Support Vectors

LibSVM

Support Vector Machine

Page 96: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Performance evaluationHow accurate is all this?

Our ApproachPhase 1

Feature

Phase 3

Verify

Phase 2

DecisionTraffic(Known)

Statistical characterization of bits in a flow

Decision process Test

Minimum distance / maximum likelihood

c2

Page 97: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Per flow and per endpoint

What are we going to classify? It can be applied to both single flows And to endpoints

Question: Do we assume to monitor ALL packets? Do we assume to monitor since the first

packet?

97

Page 98: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Per flow and per endpoint

What are we going to classify? It can be applied to both single flows And to endpoints

Question: Do we assume to monitor ALL packets? Do we assume to monitor since the FIRST

packet? NO!

It is robust to sampling It can start from any point

98

Page 99: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Per flow and per endpoint

99

Page 100: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Real traffic tracesInternet

Fastweb

Known + Other Training Known Traffic False Negatives Unknown traffic False Positives

Trace

RTPeMuleDNS

Oracle(DPI +Manual )

other

Other UnknownTraffic

1 day long trace

20 GByte diUDP traffic

Page 101: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Definition of false positive/negative

TrafficOracle (DPI)

eMuleRTP

DNS

Other

Classifing “known”

true positives

false negatives

KISS

true negatives

false positives

Classifing “other”KISS

Page 102: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Case A Case BRtp 0.08 0.23Edk 13.03 7.97Dns 6.57 19.19

Case A Case B0.00 0.050.98 0.540.12 2.14

Case A Case Bother 13.6 17.01

Euclidean Distance SVM

Case A Case B0.00 0.18

Results

Known traffic(False Neg.)

[%]

Other(False Pos.)

[%]

Page 103: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Tuning trainset size

%

True positives

False positives

Samples per class

Small training setFor “known”: 70-80 MbyteFor “other”: 300 Mbyte

Page 104: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

c2

packets

%

True positives

False positives

Tuning Num. of Packets for

(confidence 5%)

Protocols with volumesat least 70-80 pkts per flow

Page 105: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Real traffic trace

RTP errors are oracle mistakes(do not identify RTP v1)

DNS errors are due to impure training set

(for the oracle all port 53 is DNS traffic)

EDK errors are (maybe) Xbox Live(proper training for “other”)

FN are always below 3%!!!

Page 106: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

P2P-TV applications

P2P-TV applications are becoming popularThey heavly rely on UDP at the transport protocolThey are based on proprietary protocolsThey are evolving over time very quicklyHow to identify them?... After 6 hours, KISS give you results

Page 107: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Putting all together

Now with 9 classes 3 different networks

107

Page 108: Traffic classification and applications to traffic monitoring

Another example of behavioral classifier

Abacus

108

Page 109: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Abacus: Rationale Applications are like people in a party

room Some prefer brief exchanges with many other

people

Some likes long talks with few other people

“Attitudes” are different across P2P applications...

Some prefer to download small pieces of data from many peers

Some prefer to download all data from almost the same peers

... enough to classify them Observe a host for a given time

Count the number of peers contacted and the number of packets exchanged which represent the attitude

Page 110: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Abacus signature definition Consider a host X which in a fixed time-

window ΔT = 5s is contacted by N=5 peers Yi

for each peer Yi count the number of packets sent to X in ΔT

Consider a set of bins of exponential width

Divide the peers in bins according to the number of exchanged packets

Normalize the bins, i.e. divide for the total number of peers N

The final signature is an empirical probability distribution function

In the example N=5, bins = (1, 0, 2, 2) Abacus signature (0.2, 0, 0.4, 0.4)

X

Y1 Y2

1 2 3-4 5-8 ...

Y3 Y5Y4

9-16

Page 111: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Signature comparisonPP

Live

TVA

nts

Joos

t

SopC

ast

Page 112: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Performance evaluationHow accurate is all this?

Our ApproachPhase 1

Feature

Phase 3

Verify

Phase 2

DecisionTraffic(Known)

Statistical characterization of bits in a flowABACUS signatures

Decision process

Supervised machine learning based on SVM

Page 113: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Hyper-space is partitioned every point is given a label even “unknown” apps

Need a way to recognize them Define a center for each class Define a threshold R Calculate the distance d between

the point and the center of the assigned class

If d > R mark the new point as unknown

Bhattacharyya distance BD Distance between p.d.f.

Rejection criterion

B=qp,BD 1 n

=x

xqxp=B1

Training points

R

R

Center of the class

New points

Labeled as“unknown”

Labeled as“green”

Page 114: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Experimental results

PPLive TVAnts SopCast Joost UnkPPLive 81.66 0.58 9.55 2.32 5.9TVAnts 0.41 98.84 0.15 0.57 0

SopCast 3.76 0.11 89.62 0.32 6.2Joost 2.84 0.55 0.28 89.5 6.9Rea

l val

ue

Abacus signature confusion matrixClassification outcome

Page 115: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Experimental resultsFor “unknown traffic” the selection of the rejection threshold R is fundamental!

For R~0

low TPR

low FPR

For R~1

high TPR

high FPR

For R=0.5

high TPR

low FPR

Page 116: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

The Failure of DPI

Page 117: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Question you should ask yourself Which feature to use?

Are those “portable”? How often should the classifier be retrained? Can it take a decision after some packets?

Open issues Which is a good training set? How to get a valid benchmarking set?

Never trust other classifiers! How to make it work at 100Gb/s? How much traffic can it actually classify (coverage)?

Page 118: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

How Much Traffic can be Classified?

118

Page 119: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Interesting research topics

And for TCP? And for HTTP?

How to get Facebook or Twitter Over HTTPS? When served by the same CDN?

And for the application? Is it a video seen on Facebook?

Can I get rid of the training set? Use unsupervised classifiers?

119

Page 120: Traffic classification and applications to traffic monitoring

120

Page 121: Traffic classification and applications to traffic monitoring

Ops, wrong key

121

Page 122: Traffic classification and applications to traffic monitoring

And for TCP?

122

Page 123: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Chunking and c2

First N payload bytes

C chunks Each of

b bitsc2

1c2

C[ ], … ,Vector of Statistics

The provides an implicit measure of entropy or randomness

c2

Observeddistribution

Expecteddistribution(uniform)

UDPTCP

Page 124: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Results

124

Page 125: Traffic classification and applications to traffic monitoring

3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012

Results

125