Network Anomography
Yin Zhang, Zihui Ge, Albert Greenberg, Matthew Roughan
Internet Measurement Conference 2005, Berkeley, CA, USA
Presented by Huizhong Sun. Some slides borrowed from Yin Zhang.


Network Anomaly Detection

• Is the network experiencing unusual conditions?
  – Call these conditions anomalies
  – Anomalies can often indicate network problems
    • DDoS attacks, network worms, flash crowds, misconfigurations, vendor implementation bugs, …
  – Need rapid detection and diagnosis
    • Want to fix the problem quickly
• Questions of interest
  – Detection
    • Is there an unusual event?
  – Identification
    • What’s the best explanation?
  – Quantification
    • How serious is the problem?

Network Anomography

• What we want
  – Volume anomalies [Lakhina04]: significant changes in an Origin-Destination (O-D) flow, i.e., a traffic matrix element
  – Detect volume anomalies
  – Identify which O-D pair

[Figure: example network with nodes A, B and C]

Network Anomography

• Challenge
  – It is difficult to measure the traffic matrix directly
  – The anomaly detection problem is somewhat more complex and difficult
    • First, anomaly detection is performed on a series of measurements over a period of time, rather than on a single snapshot.
    • Second, in addition to changes in the traffic, the solution must build in the ability to deal with changes in routing.
• What we have
  – Link traffic measurements: Simple Network Management Protocol (SNMP) data on individual link loads is available almost ubiquitously.
• Network Anomography
  – Infer volume anomalies from link traffic measurements

An Illustration

Courtesy: Anukool Lakhina [Lakhina04]

Anomography = Anomalies + Tomography

Mathematical Formulation

Problem: Infer changes in TM elements ($x_t$) given link measurements ($b_t$)

[Figure: three routers (1, 2, 3) joined by link 1, link 2 and link 3, carrying route 1, route 2 and route 3. Only measure at links, e.g. $b_{1,t} = x_{2,t} + x_{3,t}$]

\[
\begin{bmatrix} b_{1,t} \\ b_{2,t} \\ b_{3,t} \end{bmatrix}
=
\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \end{bmatrix}
\]
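The three-link example can be written out as a small sketch of the forward model b = A x. The flow volumes below are made-up illustrative numbers, and `link_loads` is a hypothetical helper name, not something from the slides.

```python
# Routing matrix of the three-link example: A[i][j] = 1 if route j+1 uses link i+1.
A = [
    [0, 1, 1],  # link 1 carries routes 2 and 3
    [1, 0, 1],  # link 2 carries routes 1 and 3
    [1, 1, 0],  # link 3 carries routes 1 and 2
]

def link_loads(A, x):
    """Compute b = A x: each link load is the sum of the flows routed over that link."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Hypothetical OD-flow volumes at one time step (illustrative numbers only).
x_t = [10.0, 20.0, 30.0]
b_t = link_loads(A, x_t)
print(b_t)  # [50.0, 40.0, 30.0]
```

Only `b_t` is observable in practice; the anomography problem is to reason about changes in `x_t` from it.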

Mathematical Formulation

$b_t = A_t x_t$ (t = 1, …, T)

Typically massively under-constrained!

[Figure: three-router example with link 1, link 2, link 3 and route 1, route 2, route 3. Only measure at links, e.g. $b_{1,t} = x_{2,t} + x_{3,t}$]

Static Network Anomography

Time-invariant $A_t$ (= A), $B = [b_1 \ldots b_T]$, $X = [x_1 \ldots x_T]$

B = AX

[Figure: three-router example with link 1, link 2, link 3 and route 1, route 2, route 3. Only measure at links, e.g. $b_{1,t} = x_{2,t} + x_{3,t}$]

Anomography Strategies

• Early Inverse
  1. Inversion
     – Infer OD flows X by solving $b_t = A x_t$
  2. Anomaly extraction
     – Extract volume anomalies $\tilde{X}$ from the inferred X
  Drawback: errors in step 1 may contaminate step 2
• Late Inverse
  1. Anomaly extraction
     – Extract link traffic anomalies $\tilde{B}$ from B
  2. Inversion
     – Infer volume anomalies $\tilde{X}$ by solving $\tilde{b}_t = A \tilde{x}_t$
  Idea: defer “lossy” inference to the last step

Extracting Link Anomalies $\tilde{B}$

• Temporal Anomography:
  – Fourier / wavelet analysis
    • Link anomalies = the high frequency components
  – ARIMA modeling
    • Diff
    • EWMA (Exponentially Weighted Moving Average) is ARIMA(0, 1, 1)
    • Holt-Winters is ARIMA(0, 2, 2)
  – Temporal PCA
    • PCA = Principal Component Analysis
    • Project columns onto principal link column vectors
• Spatial Anomography:
  – Spatial PCA [Lakhina04]
    • Project rows onto principal link row vectors

Extracting Link Anomalies $\tilde{B}$

• Fourier analysis
  – Fourier analysis decomposes a complex periodic waveform into a set of sinusoids with different amplitudes, frequencies and phases.
  – The sum of these sinusoids can exactly match the original waveform.
  – The idea of using Fourier analysis to extract anomalous link traffic is to filter out the low frequency components.
  – In general, low frequency components capture the daily and weekly traffic patterns, while high frequency components represent the sudden changes in traffic behavior.

Extracting Link Anomalies $\tilde{B}$

• Fourier analysis
  – For a discrete-time signal $x_0, x_1, \ldots, x_{N-1}$, the Discrete Fourier Transform (DFT) is defined by

    \[ f_n = \frac{1}{N} \sum_{k=0}^{N-1} x_k \, e^{-jk2\pi n/N}, \qquad 0 \le n \le N-1 \]

  – where $f_n$ is a complex number that captures the amplitude and phase of the signal at the n-th frequency
  – Lower n corresponds to a lower frequency component, with $f_0$ being the DC component
  – $f_n$ with n close to N/2 corresponds to high frequencies

Extracting Link Anomalies $\tilde{B}$

• Fourier analysis
  – The Inverse Discrete Fourier Transform (IDFT) is used to reconstruct the signal in the time domain by

    \[ x_n = \sum_{k=0}^{N-1} f_k \, e^{jk2\pi n/N}, \qquad 0 \le n \le N-1 \]

  – An efficient way to implement the DFT and IDFT is the Fast Fourier Transform (FFT).
  – The computational complexity of the FFT is O(N log(N)).

Extracting Link Anomalies $\tilde{B}$

• FFT based anomography
  – 1. Transform link traffic B into the frequency domain: F = FFT(B), i.e. apply the FFT on each row of B. (A row corresponds to the time series of traffic data on one link.)
  – 2. Remove low frequency components: set $F_i = 0$ for $i \in [1, c] \cup [N-c, N]$, where c is a cut-off frequency.
    • (For example, using 10-minute aggregated link traffic data of one week duration, and c = 10N/60, corresponding to a frequency of one cycle per hour.)
  – 3. Transform back into the time domain: $\tilde{B}$ = IFFT(F). The result is the high frequency components in the traffic data, which we will use as anomalous link traffic.
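The three FFT steps can be sketched with NumPy as below. This is an illustration under stated assumptions, not the authors' implementation: the function name `fft_link_anomalies`, the synthetic data and the 0-based symmetric mask (DC, harmonics 1..c and their mirror bins) are choices made here, and the slide's index ranges may follow a different convention.

```python
import numpy as np

def fft_link_anomalies(B, c):
    """High-pass filter each row of B (one row = the time series of one link)."""
    B = np.asarray(B, dtype=float)
    N = B.shape[1]
    F = np.fft.fft(B, axis=1)             # step 1: per-link FFT
    n = np.arange(N)
    low = np.minimum(n, N - n) <= c       # assumed mask: DC, harmonics 1..c, and mirrors
    F[:, low] = 0.0                       # step 2: remove low-frequency components
    return np.fft.ifft(F, axis=1).real    # step 3: back to the time domain

# Synthetic single-link example (made-up data): one slow cycle plus a spike at t = 70.
N = 144
t = np.arange(N)
link = 100.0 + 20.0 * np.sin(2 * np.pi * t / N)
link[70] += 50.0
B = link[np.newaxis, :]

B_tilde = fft_link_anomalies(B, c=3)
print(int(np.argmax(np.abs(B_tilde[0]))))  # 70
```

The slow cycle and the mean are removed by the mask, so the largest remaining value sits at the injected spike.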

Extracting Link Anomalies $\tilde{B}$

• Wavelet analysis
  – 1. Use wavelets to decompose B into different frequency levels: W = WAVEDEC(B), by applying a multi-level 1-D wavelet decomposition on each row of B.
  – 2. Remove low- and mid-frequency components in W by setting all coefficients at frequency levels higher than $w_c$ to 0, giving W’. Here $w_c$ is a cut-off frequency level.
  – 3. Reconstruct the signal: $\tilde{B}$ = WAVEREC(W’). The result is the high-frequency components in the traffic data.

Extracting Link Anomalies $\tilde{B}$

• ARIMA modeling: the Box-Jenkins methodology, or AutoRegressive Integrated Moving Average (ARIMA)

• A class of linear time-series forecasting techniques that capture the linear dependency of the future values on the past.

• It has been extensively used for anomaly detection in univariate time series.

• To get back to anomaly detection, we simply identify the forecast errors as anomalous link traffic.

• Traffic behavior that cannot be well captured by the model is considered anomalous.

Extracting Link Anomalies $\tilde{B}$

• An ARIMA(p, d, q) model includes three parameters:
  – The autoregressive parameter (p)
  – The number of differencing passes (d)
  – The moving average parameter (q)
• Some models used for detecting anomalies in time series are ARIMA models
  – For example, the Exponentially Weighted Moving Average (EWMA) is ARIMA(0, 1, 1); Holt-Winters is ARIMA(0, 2, 2).

Extracting Link Anomalies $\tilde{B}$

• An ARIMA(p, d, q) model includes three parameters:
  – the autoregressive parameter (p)
  – the number of differencing passes (d)
  – the moving average parameter (q)

\[ z_k - \sum_{i=1}^{p} \phi_i \, z_{k-i} = e_k - \sum_{j=1}^{q} \theta_j \, e_{k-j} \]

where $z_k$ is obtained by differencing the original time series d times (when d ≥ 1) or by subtracting the mean from the original time series (when d = 0), $e_k$ is the forecast error at time k, and $\phi_i$ (i = 1, ..., p) and $\theta_j$ (j = 1, ..., q) are the autoregression and moving average coefficients, respectively.


Diagnosing Network-Wide Traffic Anomalies

Anukool Lakhina, Mark Crovella, Christophe Diot

“Diagnosing Network-Wide Traffic Anomalies”, SIGCOMM’04

Extracting Link Anomalies $\tilde{B}$

• Spatial Anomography: Spatial PCA [Lakhina04]
  – 1. Identify the first principal axis, along which the link traffic data have the greatest variance
  – 2. Identify the second principal axis, along which the link traffic data have the second greatest variance, and so on

Extracting Link Anomalies $\tilde{B}$

• Spatial Anomography: Spatial PCA [Lakhina04]
  – 3. Divide the link traffic space into the normal subspace and the anomalous subspace
    • by examining the projection of the time series of link traffic data on each principal axis in order.
    • As soon as a projection is found that contains a 3σ deviation from the mean, that principal axis and all subsequent axes are assigned to the anomalous subspace.
    • All previous principal axes are assigned to the normal subspace.

Data Collected

[Figures: Abilene; Sprint-Europe]

Low Intrinsic Dimensionality of Link Traffic

Studied via Principal Component Analysis

Key result: Normal traffic is well approximated as occupying a low dimensional subspace

Reasons:
1. Links share OD flows
2. Set of OD flows also low dimensional

The Subspace Method

• An approach to separate normal from anomalous traffic
• Normal subspace, $S$: space spanned by the first k principal components
• Anomalous subspace, $\tilde{S}$: space spanned by the remaining principal components
• Then, decompose traffic on all links by projecting onto $S$ and $\tilde{S}$ to obtain:

  \[ y = \hat{y} + \tilde{y} \]

  – $y$: traffic vector of all links at a particular point in time
  – $\hat{y}$: normal traffic vector
  – $\tilde{y}$: residual traffic vector
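The projection onto a normal and an anomalous subspace can be sketched with NumPy as follows. This is a minimal illustration, assuming the principal components come from an SVD of the mean-centered data and that k is given. The function name `subspace_split` and the synthetic traffic are hypothetical, and the 3σ rule for choosing k is not implemented here.

```python
import numpy as np

def subspace_split(Y, k):
    """Split each row of Y (time steps x links) into normal and residual parts.

    The normal subspace is spanned by the first k principal components of the
    mean-centered data; the residual is what is left after projecting onto it.
    """
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)                     # center each link's time series
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    P = Vt[:k].T                                # links x k basis of the normal subspace
    normal = Yc @ P @ P.T                       # projection onto the normal subspace
    residual = Yc - normal                      # projection onto the anomalous subspace
    spe = np.sum(residual ** 2, axis=1)         # squared norm of the residual per time step
    return normal, residual, spe

# Synthetic data (made up): four links driven by one shared cycle, plus a spike
# on one link at time step 40.
T = 100
t = np.arange(T)
s = 10.0 * np.sin(2 * np.pi * t / 50)
Y = np.outer(s, [1.0, 2.0, 3.0, 4.0])
Y[40, 2] += 5.0

normal, residual, spe = subspace_split(Y, k=1)
print(int(np.argmax(spe)))  # 40
```

The shared cycle is absorbed by the single normal component, so the residual energy peaks at the injected spike.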

A Geometric Illustration

[Figure: scatter plot with axes "Traffic on Link 1" and "Traffic on Link 2"]

In general, anomalous traffic results in a large value of $\tilde{y}$

Detection

[Figure: scatter plot with axes "Traffic on Link 1" and "Traffic on Link 2"]

• Capture the size of the vector $\tilde{y}$ using the squared prediction error (SPE):

  \[ \mathrm{SPE} \equiv \|\tilde{y}\|^2 \]

• Traffic is flagged as anomalous when the SPE exceeds a threshold $\delta_\alpha^2$

Result due to [Jackson and Mudholkar, 1979]

Detection Illustration

[Figure: two time-series panels: value of $\|y\|^2$ over time (all traffic) and value of $\|\tilde{y}\|^2$ over time (SPE)]

SPE values at anomaly time points clearly stand out

Extracting Link Anomalies $\tilde{B}$

Temporal PCA

• PCA = Principal Component Analysis
• Similar to spatial PCA
• Project columns onto principal link column vectors

Solving $\tilde{b}_t = A \tilde{x}_t$

• Temporal Anomography: $\tilde{B} = A\tilde{X}$
• Now that we know $\tilde{B}$, how do we solve for the anomalous O-D traffic $\tilde{X}$?
  • (1) Pseudoinverse solution
  • (2) Sparsity maximization

Solving $\tilde{b}_t = A \tilde{x}_t$

• Pseudoinverse: $\tilde{x}_t = \mathrm{pinv}(A)\, \tilde{b}_t$
  – Shortest minimal L2-norm solution
    • Find the $\tilde{x}_t$ with the smallest $\|\tilde{x}_t\|_2$ among those for which $\|\tilde{b}_t - A \tilde{x}_t\|_2$ is minimal
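As a small sketch of the pseudoinverse solution, the example below uses `numpy.linalg.pinv` on a hypothetical under-constrained routing matrix (2 links, 3 OD flows). The matrix and the link anomaly values are made up for illustration.

```python
import numpy as np

# Hypothetical routing matrix: 2 links, 3 OD flows, so A x = b is under-constrained.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b_tilde = np.array([0.0, 6.0])           # anomalous link traffic at one time step

# Minimum L2-norm solution among all x with A x = b.
x_tilde = np.linalg.pinv(A) @ b_tilde
print(np.round(x_tilde, 6))              # approximately [-2, 2, 4]
```

Note that the result spreads the anomaly over all three flows, even though a single flow of size 6 on the third route would explain the same link data. This is the motivation for the sparsity-based alternative.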

Solving $\tilde{b}_t = A \tilde{x}_t$

• Maximize sparsity
  – In practice, we expect only a few anomalies at any one time, so $\tilde{x}_t$ typically has only a small number of large values.
  – Hence it is natural to proceed by maximizing the sparsity of $\tilde{x}_t$, i.e., solving the following $\ell_0$ norm minimization problem:

    \[ \text{minimize } \|\tilde{x}_t\|_0 \quad \text{subject to } \tilde{b}_t = A \tilde{x}_t \]
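To make the sparsity objective concrete, here is a brute-force sketch that tries supports of increasing size until the link data is matched exactly. It is an assumption-laden illustration only: exhaustive search is exponential in the number of flows and is not the method used in the paper, and `sparsest_solution` is a hypothetical helper name. The routing matrix and data are made up.

```python
import itertools
import numpy as np

def sparsest_solution(A, b, tol=1e-9):
    """Exhaustive l0 search: return an exact solution of A x = b with the fewest non-zeros."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    if np.linalg.norm(b) <= tol:
        return np.zeros(n)                      # nothing to explain
    for size in range(1, n + 1):                # try the smallest supports first
        for support in itertools.combinations(range(n), size):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - b) <= tol:
                x = np.zeros(n)
                x[cols] = coef
                return x
    return None                                 # no exact solution exists

# Hypothetical example: 2 links, 3 OD flows.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b_tilde = np.array([0.0, 6.0])
x_sparse = sparsest_solution(A, b_tilde)
print(x_sparse)
```

For this input the search attributes the whole anomaly to the third flow, which is the one-flow explanation of the link data.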

Performance Evaluation

• Fix one anomaly extraction method
• Compare “real” and “inferred” anomalies
  – “real” anomalies: directly from OD flow data
  – “inferred” anomalies: from link data
• Order them by size
  – Compare the size
• How many of the top N do we find
  – Gives detection rate: |top-N “real” ∩ top-N “inferred”| / N
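The detection rate can be computed as in the sketch below. The helper names and the anomaly sizes are hypothetical, and ranking by absolute size is an assumption made here.

```python
def top_n(sizes, n):
    """Return the keys of the n largest anomalies, ranked by absolute size."""
    return set(sorted(sizes, key=lambda k: abs(sizes[k]), reverse=True)[:n])

def detection_rate(real, inferred, n):
    """Fraction of the top-n real anomalies that also appear in the top-n inferred ones."""
    return len(top_n(real, n) & top_n(inferred, n)) / n

# Hypothetical anomaly sizes keyed by (OD flow, time step).
real = {("A-B", 3): 90.0, ("A-C", 7): 70.0, ("B-C", 2): 40.0, ("A-B", 9): 5.0}
inferred = {("A-B", 3): 80.0, ("B-C", 2): 60.0, ("A-B", 9): 50.0, ("A-C", 7): 10.0}

rate = detection_rate(real, inferred, 3)
print(rate)  # two of the top three real anomalies are found
```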


Performance Evaluation: Anomography

• Hard to compare performance
  – Lack ground-truth: what is an anomaly?
• So compare events from different methods
  – Compute top M “benchmark” anomalies
    • Apply an anomaly extraction method directly on OD flow data
  – Compute top N “inferred” anomalies
    • Apply another anomography method on link data
  – Report min(M, N) − |top-M benchmark ∩ top-N inferred|
    • M < N: “false negatives” (# big “benchmark” anomalies not considered big by anomography)
    • M > N: “false positives” (# big “inferred” anomalies not considered big by benchmark method)
  – Choose M, N similar to the number of anomalies a provider is willing to investigate, e.g. 30-50 per week
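The reported quantity can be sketched as a short function over two ranked lists of anomaly identifiers. The name `mismatches` and the event labels are hypothetical.

```python
def mismatches(benchmark_ranked, inferred_ranked, m, n):
    """min(M, N) minus the overlap between the top-M benchmark and top-N inferred events."""
    overlap = set(benchmark_ranked[:m]) & set(inferred_ranked[:n])
    return min(m, n) - len(overlap)

# Hypothetical events, each list sorted from largest to smallest anomaly.
benchmark = ["a", "b", "c", "d", "e"]
inferred = ["a", "x", "y", "b", "c"]

false_negatives = mismatches(benchmark, inferred, m=2, n=3)  # M < N
false_positives = mismatches(benchmark, inferred, m=3, n=2)  # M > N
print(false_negatives, false_positives)  # 1 1
```

With M < N the count says how many of the biggest benchmark events are missing from a longer inferred list. With M > N it says how many of the biggest inferred events are absent from a longer benchmark list.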

Anomography: “False Negatives”

Top 50 Inferred: “False Negatives” with Top 30 Benchmark

              Diff  EWMA  H-W  ARIMA  Fourier  Wavelet  T-PCA  S-PCA
Diff             0     0    1      1        5        5     17     12
EWMA             0     0    1      1        5        5     17     12
Holt-Winters     1     1    0      0        6        4     18     12
ARIMA            1     1    0      0        6        4     18     12
Fourier          3     4    8      8        1        7     19     18
Wavelet          0     1    2      2        5        0     13     11
T-PCA           14    14   14     14       19       15      3     15
S-PCA           10    10   13     13       15       11      1     13

1. Diff/EWMA/H-W/ARIMA/Fourier/Wavelet all largely consistent
2. PCA methods not consistent (even with each other)
   - PCA cannot detect anomalies in the “normal” subspace
   - PCA insensitive to reordering of [b1…bT] ⇒ cannot utilize all temporal info
3. Spatial methods (e.g. spatial PCA) are not self-consistent

Anomography: “False Positives”

Top 30 Inferred: “False Positives” with Top 50 Benchmark

              Diff  EWMA  H-W  ARIMA  Fourier  Wavelet  T-PCA  S-PCA
Diff             3     3    6      6        6        4     14     14
EWMA             3     3    6      6        7        5     13     15
Holt-Winters     4     4    1      1        8        3     13     10
ARIMA            4     4    1      1        8        3     13     10
Fourier          6     6    7      6        2        6     19     18
Wavelet          6     6    6      6        8        1     13     12
T-PCA           17    17   17     17       20       13      0     14
S-PCA           18    18   18     18       20       14      1     14

1. Diff/EWMA/H-W/ARIMA/Fourier/Wavelet all largely consistent
2. PCA methods not consistent (even with each other)
   - PCA cannot detect anomalies in the “normal” subspace
   - PCA insensitive to reordering of [b1…bT] ⇒ cannot utilize all temporal info
3. Spatial methods (e.g. spatial PCA) are not self-consistent

Conclusions

• Anomography = Anomalies + Tomography
  – Find anomalies in {$x_t$} given $b_t = A_t x_t$ (t = 1, …, T)
• Contributions
  1. A general framework for anomography methods
     – Decouple anomaly extraction and inference components
  2. A number of novel algorithms
     – Taking advantage of the range of choices for anomaly extraction and inference components
     – Choosing between spatial vs. temporal approaches
  3. Extensive evaluation on real traffic data
     – 6-month Abilene and 1-month Tier-1 ISP
• The method of choice: ARIMA + Sparsity-L1

Thank you! Questions?

Extracting Link Anomalies $\tilde{B}$

• Temporal Anomography: $\tilde{B} = BT$
  – Fourier / wavelet analysis
    • Link anomalies = the high frequency components
  – ARIMA modeling
    • Diff: $f_t = b_{t-1}$, $\tilde{b}_t = b_t - f_t$
    • EWMA: $f_t = (1-\alpha) f_{t-1} + \alpha b_{t-1}$, $\tilde{b}_t = b_t - f_t$
  – Temporal PCA
    • PCA = Principal Component Analysis
    • Project columns onto principal link column vectors
• Spatial Anomography: $\tilde{B} = TB$
  – Spatial PCA [Lakhina04]
    • Project rows onto principal link row vectors
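The Diff and EWMA forecasts listed above can be sketched in a few lines of plain Python. The function names, the sample series, and the choice to seed the EWMA forecast with the first sample are assumptions made for this illustration.

```python
def diff_residuals(series):
    """Diff model: forecast f_t = b_{t-1}; residual is b_t - f_t, defined from t = 1."""
    return [b - prev for prev, b in zip(series, series[1:])]

def ewma_residuals(series, alpha):
    """EWMA model: f_t = (1 - alpha) * f_{t-1} + alpha * b_{t-1}; residual is b_t - f_t."""
    forecast = series[0]              # assumption: seed the forecast with the first sample
    residuals = []
    for prev, b in zip(series, series[1:]):
        forecast = (1 - alpha) * forecast + alpha * prev
        residuals.append(b - forecast)
    return residuals

# Made-up link load with one spike at index 3.
link = [10.0, 10.0, 10.0, 30.0, 10.0, 10.0]
print(diff_residuals(link))          # [0.0, 0.0, 20.0, -20.0, 0.0]
print(ewma_residuals(link, 0.5))     # [0.0, 0.0, 20.0, -10.0, -5.0]
```

Both models flag the spike with a residual of 20. The EWMA forecast then decays back gradually, while Diff produces an equal and opposite residual on the next step.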