Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications Robert Schweller 1 , Zhichun Li 1 , Yan Chen 1 , Yan Gao 1 , Ashish Gupta 1 , Yin Zhang 2 , Peter Dinda 1 , Ming-Yang Kao 1 , Gokhan Memik 1 1 Lab for Internet and Security Technology (LIST), Northwestern Univ. 2 University of Texas at Austin


Page 1: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications

Robert Schweller1, Zhichun Li1, Yan Chen1, Yan Gao1, Ashish Gupta1, Yin Zhang2, Peter Dinda1, Ming-Yang Kao1, Gokhan Memik1

1 Lab for Internet and Security Technology (LIST), Northwestern Univ. 2 University of Texas at Austin

Page 2: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

The Spread of Sapphire/Slammer Worms

Page 3: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Motivation (online change detection)

• Online network anomaly/intrusion detection over high-speed links
  – Small memory usage
  – Small number of memory accesses per packet
  – Scalable to large key space size
• Primitives for online anomaly detection
  – Heavy hitters (lots of prior work)
  – Heavy changes: enabler for aggregate queries over multiple data streams
    • Asymmetric routing demands spatial aggregation
    • Time Series Analysis (TSA) needs temporal aggregation

Page 4: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Outline

• Background on k-ary sketch
• Reversible sketch problem
• Modular hashing
• IP mangling
• Reverse hashing
• Evaluation
• Conclusion

Page 5: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]

First to detect flow-level heavy changes in massive data streams at network traffic speeds

[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets indexed 0, 1, …, K-1]

Page 6: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

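k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]

[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets indexed 0, 1, …, K-1; key k maps to one bucket per table: h1(k), …, hj(k), …, hH(k)]

APIs:

• Update (k, u): Tj[hj(k)] += u (for all j)
• Estimate v(S, k): sum of updates for key k

  $v(S, k) = \operatorname{median}_j \left\{ \dfrac{T_j[h_j(k)] - \mathit{sum}/K}{1 - 1/K} \right\}$, where sum is the total of all updates recorded in the sketch

• Linear combination: S = COMBINE(α, S1, β, S2), i.e. α·S1 + β·S2 = S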
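The three APIs on this slide can be illustrated with a minimal sketch. It is only an approximation of the idea: the per-table hash functions below are simple random linear hashes chosen for brevity (an assumption, not the hash family of the original k-ary sketch), and the class and function names are hypothetical.

```python
import random
from statistics import median

PRIME = (1 << 61) - 1  # modulus for the illustrative linear hashes (assumption)


class KarySketch:
    """H hash tables with K counters each."""

    def __init__(self, H=5, K=4096, seed=0):
        self.H, self.K = H, K
        rng = random.Random(seed)
        # One (a, b) pair per table; sketches built with the same seed share
        # hash functions, which COMBINE requires.
        self.params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                       for _ in range(H)]
        self.T = [[0.0] * K for _ in range(H)]

    def _h(self, j, key):
        a, b = self.params[j]
        return ((a * key + b) % PRIME) % self.K

    def update(self, key, u):
        # UPDATE: T_j[h_j(k)] += u for all j
        for j in range(self.H):
            self.T[j][self._h(j, key)] += u

    def estimate(self, key):
        # ESTIMATE: median over tables of (T_j[h_j(k)] - sum/K) / (1 - 1/K)
        total = sum(self.T[0])  # every table sums to the total of all updates
        K = self.K
        return median((self.T[j][self._h(j, key)] - total / K) / (1 - 1 / K)
                      for j in range(self.H))


def combine(alpha, s1, beta, s2):
    """COMBINE: counter-wise linear combination alpha*S1 + beta*S2."""
    out = KarySketch(s1.H, s1.K)
    out.params = s1.params
    out.T = [[alpha * x + beta * y for x, y in zip(r1, r2)]
             for r1, r2 in zip(s1.T, s2.T)]
    return out
```

For change detection, one would record one sketch per time interval and inspect `combine(1, later, -1, earlier)`; keys with a large estimate in that difference sketch are the heavy changes.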

Page 7: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Reverse Sketch Problem

• Main problem
  – Cannot efficiently report keys with heavy change: INFERENCE(S, t)
  – Important function for anomaly detection!
• Our contribution
  – Determine the set of keys that have "large" estimates in a sketch

Page 8: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Reversible sketch framework

[Figure: two stages.
Streaming data recording: key → IP mangling → modular hashing → reversible k-ary sketch (value stored).
Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys.]

Page 9: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Outline

• Background on k-ary sketch
• Reversible sketch problem
• Modular hashing
• IP mangling
• Reverse hashing
• Evaluation
• Conclusion

Page 10: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Taking Intersections

• Intersect A1, A2, A3, A4, A5

H = 5, K = 2^12, #keys = 2^32 (IP addresses)

E[false positives] << 1
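A rough check of this bound, assuming (as an idealization not stated on the slide) that the H hash functions are independent and uniform and that each table has a single heavy bucket: a given non-heavy key then falls into the heavy bucket of one table with probability 1/K, and into all H of them with probability (1/K)^H.

```latex
E[\text{false positives}] \approx \#\text{keys} \cdot \left(\frac{1}{K}\right)^{H}
  = 2^{32} \cdot \left(2^{-12}\right)^{5}
  = 2^{32 - 60}
  = 2^{-28} \ll 1
```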

Page 11: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

The problem with simple intersection

• Each set Ai can be very large!

H = 5, K = 2^12, #keys = 2^32 (IP addresses)

|A1| = 2^32 / 2^12 = 2^20

Page 12: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

The problem with simple intersection

• Each set Ai can be very large !

• Solution:

Modular hashing

Page 13: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Modular hashing reduces the set size

[Figure: an ordinary hash function h() maps the whole 32-bit key (four 8-bit words, e.g. 10010100 10101011 10010101 10100011) to a 12-bit bucket index (010 110 001 101)]

Page 14: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Modular hashing reduces the set size

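[Figure: the 32-bit key is split into four 8-bit words (10010100 10101011 10010101 10100011); hash functions h1(), h2(), h3(), h4() map each word to 3 bits (010, 110, 001, 101), and the four results are concatenated into the 12-bit bucket index 010 110 001 101]

Greatly reduces the size of the reverse-mapped sets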
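The word-by-word hashing in the figure can be sketched as follows. The lookup tables are a hypothetical choice made for illustration (random balanced tables, so that every 3-bit value has exactly 32 preimages); the slides do not specify the per-word hash functions.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3  # 32-bit key -> 4 words -> 4 x 3 = 12 bits


def make_word_hashes(seed=0):
    """One lookup table per word: 256 byte values -> 3-bit hash value."""
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        # Balanced table (assumption): each 3-bit value is hit by 32 bytes.
        table = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
        rng.shuffle(table)
        tables.append(table)
    return tables


def modular_hash(key, tables):
    """Hash each 8-bit word separately and concatenate the 3-bit results."""
    out = 0
    for w in range(WORDS):
        shift = WORD_BITS * (WORDS - 1 - w)
        word = (key >> shift) & 0xFF
        out = (out << OUT_BITS) | tables[w][word]
    return out


def reverse_word(w, chunk, tables):
    """All byte values of word w whose hash equals the given 3-bit chunk."""
    return [b for b in range(1 << WORD_BITS) if tables[w][b] == chunk]
```

Because each word is hashed on its own, reversing a bucket index only requires looking at 2^8 values per word instead of enumerating the 2^32 key space.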

Page 15: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Modular hashing reduces the set size

[Figure: hash tables 1 to 5 with heavy buckets b1, b2, b3, b4, b5]

A1: 2^5 × 2^5 × 2^5 × 2^5

Intersection: only 32 elements per word set

Page 16: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Modular hashing reduces the set size

[Figure: hash tables 1 to 5 with heavy buckets b1, b2, b3, b4, b5]

A1: 2^5 × 2^5 × 2^5 × 2^5
A2: 2^5 × 2^5 × 2^5 × 2^5

Intersection:

Page 17: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Problem: Too many collisions

129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ...

all hash to 7 . 4 . 0 . *

32 bits → 12 bits

Page 18: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Problem: Too many collisions

129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ...

all hash to 7 . 4 . 0 . *

32 bits → 12 bits

Solution: IP mangling with GF (Galois Extension Field)

IP mangling: a bijective mapping function for breaking the key space continuity
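To illustrate what a bijective mangling function does, here is a minimal sketch. Note that it does not implement the Galois field construction named on the slide; it swaps in a simpler stand-in, multiplication by an odd constant modulo 2^32, which is also a bijection on 32-bit keys. The constant and the function names are hypothetical.

```python
MASK32 = (1 << 32) - 1
A = 0x9E3779B1                 # hypothetical odd multiplier; odd => invertible mod 2**32
A_INV = pow(A, -1, 1 << 32)    # modular inverse (Python 3.8+)


def ip_to_int(dotted):
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def mangle(ip):
    """Bijective map on 32-bit keys, applied before modular hashing."""
    return (A * ip) & MASK32


def unmangle(mangled):
    """Inverse map, applied to keys recovered by reverse hashing."""
    return (A_INV * mangled) & MASK32
```

Addresses that share the prefix 129.105.56 no longer share their leading bytes after mangling, so their first three words stop colliding in every table, while `unmangle` recovers the original key exactly.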

Page 19: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Outline

• Background on k-ary sketch
• Reversible sketch problem
• Modular hashing
• IP mangling
• Reverse hashing
• Evaluation
• Conclusion

Page 20: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Handling Multiple Intersections…

[Figure: hash tables 1 to 5, now with more than one heavy bucket per table]

2^H different intersections

Much more difficult. Solution: reverse hashing algorithms
• Step 1: Reverse hashing for each module
• Step 2: Infer the whole key through bucket index matching among candidates from each module

Page 21: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Reverse Hashing for Each Module

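H = 5, r = 1, K = 2^12, where r is the tolerance level

Take the first word as an example.

[Figure: hash tables 1 to 5 with their heavy buckets]

• $A_{ij}^{1}$: all possible values of the first word for the j-th heavy bucket in hash table i, where e.g. $A_{12} = A_{12}^{1} \times A_{12}^{2} \times A_{12}^{3} \times A_{12}^{4}$
• $G_i^1$: candidate set of the first word in hash table i
• $I_1 = \bigcap_i^{\,r} G_i^1$: all possible values of the first word in the sketch, i.e. the values that appear in at least H − r of the sets $G_i^1$

$G_1^1 = A_{11}^1 \cup A_{12}^1 = \{2,3\} \cup \{2,5\} = \{2,3,5\}$
$G_2^1 = A_{21}^1 \cup A_{22}^1 = \{2,6\} \cup \{9,10\} = \{2,6,9,10\}$
$G_3^1 = A_{31}^1 \cup A_{32}^1 = \{0,3\} \cup \{2,3\} = \{0,2,3\}$
$G_4^1 = A_{41}^1 \cup A_{42}^1 = \{3,10\} \cup \{2,8\} = \{2,3,8,10\}$
$G_5^1 = A_{51}^1 \cup A_{52}^1 = \{3,7\} \cup \{6,9\} = \{3,6,7,9\}$

$G_1^1 \cap G_2^1 \cap G_3^1 \cap G_4^1 = \{2\}$; with tolerance r = 1, $I_1 = \{2,3\}$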
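The tolerant intersection in this example can be reproduced with a few lines. This is a sketch of the counting rule only (a value is kept if it appears in at least H − r of the per-table candidate sets); the function name is hypothetical, and the sets are the first-word values read off this slide.

```python
from collections import Counter


def tolerant_candidates(G, r):
    """Values that occur in at least len(G) - r of the sets in G."""
    counts = Counter(v for g in G for v in g)
    return {v for v, c in counts.items() if c >= len(G) - r}


# First-word sets for the two heavy buckets of each of the H = 5 tables.
A = [({2, 3}, {2, 5}),
     ({2, 6}, {9, 10}),
     ({0, 3}, {2, 3}),
     ({3, 10}, {2, 8}),
     ({3, 7}, {6, 9})]

# G_i: union over the heavy buckets of table i.
G = [a1 | a2 for a1, a2 in A]

I1 = tolerant_candidates(G, r=1)
```

With r = 0 no value survives, because 2 is missing from table 5 and 3 is missing from table 2; allowing one miss keeps both.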

Page 22: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Bucket Index Matrix of Candidates

H = 5, r = 1, K = 2^12

For each x in I1, we can get B1(x), a vector of the heavy bucket sets which x hashes to.

[Figure: the five hash tables with heavy buckets b11, b12, b21, b22, b31, b32, b41, b42, b51, b52; example keys 192.168.0.1 and 192.123.47.62. Keys of the form 192.*.*.* hash to the red heavy buckets.]

$B_1(192) = \langle \{b_{11}, b_{12}\},\ \{b_{21}\},\ \{b_{32}\},\ \{b_{41}, b_{42}\},\ \{b_{51}, b_{52}\} \rangle$

Page 23: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Prefix Extension Algorithm

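[Figure: first-word candidates I1 = {150, 47, 236} with their bucket index vectors B1, and second-word candidates I2 = {72, 104} with their bucket index vectors B2. Each prefix is extended by intersecting the two vectors table by table; an empty intersection is written *.]

Example for <150.72>:
{1,3,6} ∩ {3,7,8} = {3}
{5} ∩ {3} = *
{1,4,9} ∩ {4,9} = {4,9}
{1} ∩ {1,5} = {1}
{2,5} ∩ {1,2} = {2}

• <150.72>: ⟨3, *, {4,9}, 1, 2⟩, one * entry, within r = 1
• <47.72>: ⟨*, *, *, 5, *⟩, more * entries than r = 1. Ignore!
• <236.104>: ⟨3, 1, 2, 2, 2⟩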

Path discovery algorithm
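One extension step of the prefix extension algorithm can be sketched as below: the bucket index vector of a prefix is intersected, table by table, with the vector of a candidate next word, and the extended prefix is dropped once more than r tables have an empty intersection. The function name is hypothetical, and which of the two operand vectors belongs to 150 and which to 72 is an assumption made when reading the slide.

```python
def extend(prefix_vec, cand_vec, r):
    """Intersect two bucket index vectors table by table.

    Returns the merged vector, or None when more than r tables end up with
    an empty set (the '*' entries on the slide).
    """
    merged = [a & b for a, b in zip(prefix_vec, cand_vec)]
    empty = sum(1 for s in merged if not s)
    return None if empty > r else merged


# Vectors from the <150.72> example (assignment to 150 / 72 is assumed).
B1_150 = [{1, 3, 6}, {5}, {1, 4, 9}, {1}, {2, 5}]
B2_72 = [{3, 7, 8}, {3}, {4, 9}, {1, 5}, {1, 2}]

vec_150_72 = extend(B1_150, B2_72, r=1)
```

Since an empty entry stays empty under further intersections, calling `extend` again with the vectors of the third and fourth words keeps counting misses across the whole key.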

Page 24: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

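Prefix Extension Algorithm

[Figure: the surviving prefixes <150.72> ⟨3, *, {4,9}, 1, 2⟩ and <236.104> ⟨3, 1, 2, 2, 2⟩ are extended with the third-word candidates I3 = {182, 32, 49} (vectors B3) and then with the fourth-word candidate I4 = {75} (vector B4).]

After the third word:
• <150.72.182>: ⟨3, *, 4, 1, 2⟩
• <150.72.32>: ⟨3, *, 9, 1, 2⟩
• <236.104.49>: ⟨3, 1, 2, 2, 2⟩

After the fourth word:
• <150.72.182.75>: ⟨3, *, 4, 1, 2⟩
• <236.104.49.75>: ⟨3, 1, *, 2, 2⟩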

Page 25: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Recap:

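[Figure: the reversible sketch framework.
Streaming data recording: key → IP mangling → modular hashing → reversible k-ary sketch (value stored).
Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys.]

Complexity annotations on the figure: on the order of $n^{1/\log\log n}$ and of $\dfrac{\log n}{\log\log n}$, where n is the size of the key space.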

Page 26: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Outline

• Background on k-ary sketch
• Reversible sketch problem
• Modular hashing
• IP mangling
• Reverse hashing
• Evaluation
• Conclusion

Page 27: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Evaluation

• Dataset
  – A large US ISP (330M NetFlow records)
  – NU (19M NetFlow records)
• Efficient data recording (for the worst-case traffic, all 40-byte packets)
  – Software: 526 Mbps on a P4 3.2 GHz PC
  – Hardware: 16 Gbps on a single FPGA board
  – Only a few hundred KB to a couple of MB of memory used
  – Only 15 memory accesses per packet for 48-bit reversible sketches and 16 per packet for 64-bit reversible sketches
• Efficient heavy change detection and key inference
  – 0.34 seconds for 100 changes, 13.33 seconds for 1000 changes
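For a sense of scale, the worst-case recording rates can be converted into packet rates. This is plain arithmetic on the figures above, assuming that Mbps and Gbps mean 10^6 and 10^9 bits per second and that every packet is exactly 40 bytes (320 bits).

```latex
\text{Software: } \frac{526 \times 10^{6}\ \text{bit/s}}{40 \times 8\ \text{bit/packet}} \approx 1.64 \times 10^{6}\ \text{packets/s}
\qquad
\text{Hardware: } \frac{16 \times 10^{9}\ \text{bit/s}}{40 \times 8\ \text{bit/packet}} = 5 \times 10^{7}\ \text{packets/s}
```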

Page 28: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Key Inference Accuracy

• True positives and false positives of 16-bit reversible sketches for 32-bit IP addresses

[Figure, left panel: True Positive Percentage (y-axis, 88 to 100) versus Number of heavy changes (x-axis ticks 50, 450, 850, 1200, 1600, 2000, with a second tick row 2.40, 0.25, 0.12, 0.08, 0.06, 0.04). Series: H=6, r=1; H=6, r=2; H=5, r=1; Deltoids.]

[Figure, right panel: False Positive Percentage (y-axis, 0.2 to 1) versus Number of heavy changes (same x-axis ticks). Series: H=6, r=1; H=6, r=2; H=5, r=1; Deltoids.]

[Deltoids]: S. Muthukrishnan and Graham Cormode, What's New: Finding Significant Differences in Network Data Streams. INFOCOM 2004

Page 29: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

More Results

• Stress test with a larger dataset: still accurate
• Scalable to larger key space size: similar results for 64-bit IP pairs
• Built an anomaly/intrusion detection system to detect, e.g., SYN flooding and port scans [ICDCS 2006]

Page 30: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Conclusions

Proposed the first reversible sketches, which
• Record high-speed network streams online
• Detect the heavy changes and infer the keys online
• Use little memory and a small number of memory accesses per packet
• Scale to large key space sizes

Page 31: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Backup Slides

Page 32: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Related work

• Compared with [Deltoids]
  – Better accuracy
  – Scales better to large key spaces
  – Fewer memory accesses
• [PCF, IMC 2004]: not reversible
• [Q. Zhao et al., IMC 2005], [S. Venkataraman, NDSS 2005]: unique fan-out (fan-in) estimation

Page 33: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Modular Hashing

Optimal Hashing

Page 34: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Reversible sketch problem

However, the k-ary sketch is not reversible: it lacks an inference API, INFERENCE(S, t)
• Important function for anomaly detection!
• Decouples the recording stage of sketches from the detection stage to enable efficient combination and inference
• Given a threshold t, reports the keys whose corresponding sums of updates are larger than the threshold

Our contribution: an efficient algorithm for inference

Page 35: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications


Page 36: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Problem: Too many collisions

129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ...

all hash to 7 . 4 . 0 . *

32 bits → 12 bits

Solution: IP mangling

Page 37: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

IP-mangling

• Use GF (Galois Extension Field) function for attack resilience

Page 38: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Modular Hashing

Modular Hashing with IP Mangling Optimal Hashing

Page 39: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

Reverse Hashing for Each Module

H = 5, r = 1, K = 2^12

Take the first word as an example.

[Figure: hash tables 1 to 5 with heavy buckets b11, b12, b21, b22, b31, b32, b41, b42, b51, b52]

$A_{11} = A_{11}^{1} \times A_{11}^{2} \times A_{11}^{3} \times A_{11}^{4}$
$A_{12} = A_{12}^{1} \times A_{12}^{2} \times A_{12}^{3} \times A_{12}^{4}$

• $A_{ij}^{1}$: all possible values of the first word for the j-th heavy bucket in hash table i
• $G_i^1 = A_{i1}^1 \cup A_{i2}^1$: all possible values of the first word in hash table i, for i = 1, …, 5
• $I_1 = \bigcap_i^{\,r} G_i^1 = \{\, v \in \bigcup_i G_i^1 \mid v \text{ is mapped to heavy buckets in at least } (H - r) \text{ hash tables} \,\}$: all possible values of the first word in the sketch

Page 40: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

False positive reduction by original sketch verifying

[Figure: the candidate key <150.72.182.75> is verified against the original k-ary sketch. Estimate(<150.72.182.75>) = 180, which exceeds the threshold of 150, so the final result is (<150.72.182.75>, 180).]

Page 41: Reverse Hashing for High-speed  Network Monitoring: Algorithms, Evaluation, and Applications

K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]

• First to detect flow-level heavy changes in massive data streams at network traffic speeds
• APIs
  – UPDATE(S, k, u): Tj[hj(k)] += u (for all j)
  – ESTIMATE(S, k): sum of updates for key k
  – Linear combination: S = COMBINE(α, S1, β, S2), i.e. α·S1 + β·S2 = S