SDM4Service 5/4/2013 1
Context-Aware Time Series Anomaly Detection for Complex Systems
Manish Gupta, UIUC
Abhishek B. Sharma, NEC Labs America
Haifeng Chen, NEC Labs America
Guofei (Geoff) Jiang, NEC Labs America
Focus on Complex Systems
• Lots of components interacting to accomplish challenging tasks
• Examples: data centers, power plants, manufacturing plants
Switch to proactive maintenance
1. Continuous monitoring from multiple vantage points
2. Replace calendar-based or reactive maintenance with early detection, localization, and remediation
• Enabling technologies
  – Low-cost, ubiquitous sensing and communication
• Challenge
  – How to combine heterogeneous data?
  – Unstructured or semi-structured log data
  – Multivariate time series data
Importance of Collating Information
• Only time series data
  – No context or global semantic view
  – Many false positives
  – Multiple alarms related to a single event
• Only system logs
  – High-level application/workflow view
  – Incomplete coverage
  – “Cost”
    • Lack of root-cause visibility
    • Absence of observed system behavior
[Figure: normalized CPU and memory utilization over time (s), with task execution intervals marked]
Our vision
• Logs capture the context of a system’s operations
• Time series monitoring data record the state of different components
• Hypothesis: jointly mining log and time series data for anomaly detection is more accurate and robust
  – Context-aware time series anomaly detection
Outline
1. Introduction and Motivation
2. Framework for combining logs and time series data
3. Proposed solution
4. Instantiation details for Hadoop
5. Evaluation
6. Conclusion
Framework for combining logs and time series data
What is an instance?
• An instance spans the interval between two consecutive context-changing events on a component
  – Assumption: we can identify context-changing events
• Instance I = (C, M); C: context features, M: metrics/time series
• Example interval [t1, t2] — t1: task execution starts; t2: task execution finishes
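The definition above can be sketched as a simple data structure. This is a minimal illustration, not the paper's implementation; all field and variable names are hypothetical:

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Instance:
    """One instance I = (C, M): the interval between two consecutive
    context-changing events on a component."""
    context: dict        # C: context features parsed from log events
    metrics: np.ndarray  # M: (timesteps x metrics) time series slice
    t_start: float       # context-changing event, e.g. task execution starts (t1)
    t_end: float         # next context-changing event, e.g. task finishes (t2)

# Hypothetical instance: a task that ran for 30 s, monitored on 2 metrics
inst = Instance(
    context={"num_maps": 16, "num_reduces": 0},
    metrics=np.zeros((30, 2)),  # e.g. CPU and memory utilization per second
    t_start=0.0,
    t_end=30.0,
)
```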
Problem statement and solution approach
• Given: instances I1, I2, …, IL
• Find: top-K anomalous instances
• Two-stage solution
  1. Find patterns
     – Context patterns
     – Metric patterns
  2. Find anomalies
• Two notions of similarity
  – Peer similarity: similarity in context variables across instances
  – Temporal similarity: similarity in time series data for similar contexts
Proposed Solution
• Extraction of context patterns
  – Normalize the data
  – Use K-means clustering (context clusters C1, C2, C3)
• Extraction of metric patterns
  – Within each context cluster, cluster the metric time series (metric patterns M1, …, M4)
• Anomaly detection
  – score(I) = 1 − sim(M, M2), the dissimilarity between an instance’s metrics M and the nearest metric pattern (here M2)
  – An instance whose metrics match a metric pattern of its context cluster is not an anomaly; one that deviates is flagged
• Anomaly post-processing
  – Remove an instance if its nearest context cluster is far away

An instance’s metrics form a matrix (metrics over time), and similarity is computed between two such matrices P(c×k) and Q(c×k):

  CPU   Memory   Disk Read   Disk Write   eth0 TX   eth0 RX
  10    20.9     622.4       14.0         4.2       5.6
  20    24.5     977.7       82.7         4.0       1.8
  10    24.6     836.4       90.0         1.6       1.6
  30    20.6     198.6       78.6         7.1       0.3
  40    29.3     850.9       99.1         5.1       10.0
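The two extraction steps and the scoring rule above can be sketched as follows. This is an illustrative sketch, not the paper's code: the data are random, a plain Lloyd's K-means stands in for any K-means library, and cosine similarity over flattened metric vectors stands in for the paper's sim(·,·), which is not specified here:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical context features for 60 instances (e.g. #Maps, #Reduces, counters)
C = rng.normal(size=(60, 4))

# Step 1: normalize each context feature to zero mean, unit variance
C_norm = (C - C.mean(axis=0)) / C.std(axis=0)

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means: assign points to nearest center, recompute means."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # Keep a center unchanged if its cluster happens to be empty
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

labels, centers = kmeans(C_norm, k=3)  # 3 context patterns C1, C2, C3

# Step 2: score(I) = 1 - sim(M, M_c), comparing an instance's (flattened)
# metrics M against the metric pattern M_c of its context cluster
def anomaly_score(M, M_c):
    sim = M @ M_c / (np.linalg.norm(M) * np.linalg.norm(M_c))
    return 1.0 - float(sim)
```

An instance whose metrics equal the pattern scores 0; the larger the score, the more anomalous the instance.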
Instantiating the framework for MapReduce (Hadoop)
• MapReduce programming model
  – Example: count the frequency of all words appearing in a document
• Distributed block storage (e.g., HDFS)
• Two phases of computation: Map and Reduce

  Input blocks    Map → intermediate output    Reduce → final output
  A B C           A:1 B:1 C:1                  A:2  B:3  C:2  D:2
  B C D           B:1 C:1 D:1                  E:1  F:1  G:1
  E F G           E:1 F:1 G:1
  A B D           A:1 B:1 D:1
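The word-count flow above can be sketched in plain Python as a single-process stand-in for the distributed runtime (function names are illustrative):

```python
from collections import Counter
from itertools import chain

# Input blocks, as in the slide: each list is one stored block
blocks = [["A", "B", "C"], ["B", "C", "D"], ["E", "F", "G"], ["A", "B", "D"]]

def map_phase(block):
    # Map: emit an intermediate (word, 1) pair for every word in the block
    return [(word, 1) for word in block]

def reduce_phase(pairs):
    # Reduce: sum the counts for each word across all intermediate outputs
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

intermediate = chain.from_iterable(map_phase(b) for b in blocks)
result = reduce_phase(intermediate)
# result == {"A": 2, "B": 3, "C": 2, "D": 2, "E": 1, "F": 1, "G": 1}
```

In Hadoop the map calls run in parallel on the nodes holding each block, and the intermediate pairs are shuffled to the reducers by key.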
Hadoop: open-source implementation of the MapReduce runtime
Map and Reduce phases exhibit peer and temporal similarity
Discussion
• Selecting the number of principal components
  – Capture >95% variance for both time series
• Selecting the number of context/metric clusters
  – Knee point of the within-cluster sum of squares vs. #clusters curve
• Richer context for MapReduce
  – Job configuration parameters
  – Events extracted from logs using regex pattern matches
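The ">95% variance" rule for picking the number of principal components can be sketched with a plain SVD; this is an illustrative sketch on random data, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical multivariate time series: 200 timesteps x 10 metrics
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))

def n_components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative
    explained variance exceeds the threshold."""
    Xc = X - X.mean(axis=0)                      # center the data
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values
    ratios = (s ** 2) / np.sum(s ** 2)           # explained-variance ratios
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)

k = n_components_for_variance(X, threshold=0.95)
```

The knee-point rule for the number of clusters is analogous: compute the within-cluster sum of squares for a range of cluster counts and pick the point where the curve flattens.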
Evaluation
1. Synthetic datasets
   – Context part comes from real Hadoop runs
   – Metrics part is synthetically generated
   – Hadoop cluster: master + 5 slaves
   – Workload: standard Hadoop examples like sorting, counting word frequencies, etc.
   – 3 context clusters
2. Real Hadoop runs with injected faults
   – CPU hog and disk hog
Synthetic data: Context Clusters for Hadoop Examples

[Figure: normalized measurement per Hadoop counter for Clusters 1–3. Counters: #Maps, #Reduces, Combine Output Records, Committed Heap Bytes, CPU Milliseconds, File Bytes Written, HDFS Bytes Read, Map Input Bytes, Map Input Records, Map Output Bytes, Map Output Materialized Bytes, Physical Memory Bytes, Records Written, Reduce Input Groups, Reduce Input Records, Reduce Output Records, Reduce Shuffle Bytes, Spilled Records, Split Raw Bytes, Virtual Memory Bytes]
• Cluster 1: large number of Map tasks, hence high values for Map counters
• Cluster 2: instances with a few Map and a few Reduce tasks
• Cluster 3: instances with a large number of Reduce tasks and high values for Reduce counters
Injecting Anomalies in the Synthetic Dataset
• Fix the anomaly factor
• Randomly select instances into a set R
• For each instance in R, add either a swap-anomaly or a new-anomaly
• Swap anomaly: swap the metrics part with another randomly chosen instance
• New anomaly: replace the metrics time series part with a new random matrix
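The injection procedure can be sketched as below. This is an illustrative sketch: the metric matrices are random placeholders, and the anomaly factor value is made up for the example (the slide elides it):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical metric matrices for 50 instances (each: 30 timesteps x 4 metrics)
metrics = [rng.normal(size=(30, 4)) for _ in range(50)]
anomaly_factor = 0.1  # fraction of instances to corrupt (illustrative value)

# Randomly select instances into the set R
n_anom = int(anomaly_factor * len(metrics))
chosen = rng.choice(len(metrics), size=n_anom, replace=False)

injected = {}
for i in chosen:
    if rng.random() < 0.5:
        # Swap anomaly: take the metrics part of another random instance
        j = rng.choice([k for k in range(len(metrics)) if k != i])
        metrics[i] = metrics[j].copy()
        injected[i] = "swap"
    else:
        # New anomaly: replace the metrics part with a new random matrix
        metrics[i] = rng.uniform(size=metrics[i].shape)
        injected[i] = "new"
```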
Synthetic Dataset Results
• 20 experiments per setting
• Avg. standard deviations: 3.34% for CA, 7.06% for SI, and 4.58% for NC

[Figure: accuracy comparison; legend annotations: SI (1%), NC (28%)]
Results on real Hadoop runs with injected faults
[Figure: anomaly score vs. instance number for disk hog and CPU hog faults]
• Original number of anomalies
  – Disk hog: 7
  – CPU hog: 4
• Detected anomalies
  – Disk hog: 4 in top 5, all 7 in top 10
  – CPU hog: 3 in top 5, all 4 in top 10
[Figure: CPU utilization (%) over time (s) for metric clusters 0–2, with the anomaly highlighted]
Conclusion and Future work
• Proactive maintenance is more effective when we combine information from heterogeneous sources
  – System logs and time series measurements
• We proposed a clustering-based approach for finding context patterns from log data and metric patterns from time series
  – Use these patterns for anomaly detection
• Future directions
  – How to define context and instances in other settings?
  – Define anomalies based on transitions in context and expected changes in metrics
Appendix
Running Time

[Figure: execution time (s) for metric pattern discovery vs. number of instances N (500, 1000, 2000, 5000), for #Metrics = 5, 10, and 20]
• Algorithm is linear in the number of instances
• Time spent in anomaly detection: ~188 ms
Real Datasets
• Workload: multiple runs of RandomWriter and Sort
• RandomWriter (16 Maps) writes 1 GB of data in 64 MB chunks; Sort (16 Maps and 16 Reduces) sorts the data
• Anomalies are injected on 1 machine
  – CPU hog: infinite loop
  – Disk hog: sequential writes to a file on disk
• Total instances: 134 (disk hog) & 121 (CPU hog)
Context Clusters for RandomWriter+Sort Dataset

[Figure: normalized measurement per Hadoop counter for Clusters 1–3. Counters: #Maps, #Reduces, Committed Heap Bytes, CPU Milliseconds, File Bytes Read, File Bytes Written, HDFS Bytes Written, Map Output Materialized Bytes, Map Output Records, Physical Memory Bytes, Records Written, Reduce Input Records, Reduce Output Records, Reduce Shuffle Bytes, Spilled Records, Split Raw Bytes, Virtual Memory Bytes]
• Cluster 1 consists of a mix of Maps and Reduces and has a distinctly high number of HDFS bytes written
• Cluster 2 is Map-heavy and shows a large number of Map Output Records
• Cluster 3 is Reduce-heavy and hence shows large activity in Reduce counters
Metric Patterns