Our presentation at DEBS'10, held in Cambridge, UK, in July 2010. It describes a solution for monitoring data centers through CEP and machine learning.
DEBS 2010 – 4th ACM International Conference on Distributed Event-Based Systems, Cambridge, United Kingdom
HOLMES: An event-driven solution to monitor data centers through continuous queries and machine learning
Pedro Henriques dos Santos Teixeira
Ricardo Gomes Clemente
Ronald Andreu Kaiser
Denis Almeida Vieira Jr
Topics
• Motivation
• Use Case
• The Solution
  • Overview
  • System architecture
  • CEP
  • Machine learning
  • CEP & Machine learning integration
  • Visualization and User Interface
• Conclusion
Motivation
• Non-stop growing, dynamic environment
• Understand our environment
• Too many dependencies
• Can't afford downtime
• Monitoring can be tricky
• Precede the inevitable and try to avoid chaos
• 1.2K servers
• 14K+ monitored items
• Correlation
Use Case
• Big Brother Brazil
• New world record
• 151 million votes in 2 days
• Peaks of 13,500 votes per minute (~225 votes/s)
• DDoS attack detected
Overview
The System Architecture
HOLMES
System architecture – modules and their purposes
• CEP module: known problems
• Machine learning module: unknown problems
• Visualization module: situational awareness
• Storage: event history/log
CEP
• Reacting to incidents in real time is a requirement for data center monitoring
• Expression of abstract rules related to the business is desirable
• Correlation of events through user-defined queries
CEP - Esper
• Open-source CEP implementation
• Supports an Event Processing Language (EPL)
• High throughput, a requirement in our context
• Easy to embed in our application
CEP – simple example
SELECT avg(response_time) FROM HTTP.win:time(5 min)
[Figure: events stream E5, E4, E3, E2, E1 inside a 5 min window; each event Ei carries a response time, shown as 4, 3, 2, 3, and 5 t.u.]
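To make the window semantics of the query above concrete, here is a minimal Python sketch of a sliding time-window average. It is not Esper code and not part of HOLMES: the class name, the arrival timestamps, and the choice to evict expired events only when a new event arrives are all assumptions made for illustration. Only the five response times come from the slide.

```python
from collections import deque


class TimeWindowAverage:
    """Average of a numeric field over a sliding time window (illustrative)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Insert an event, evict expired ones, and return the current average."""
        self.events.append((timestamp, value))
        self.total += value
        # Drop events that are at least one full window older than the new one.
        while self.events[0][0] <= timestamp - self.window_seconds:
            _, expired_value = self.events.popleft()
            self.total -= expired_value
        return self.total / len(self.events)


window = TimeWindowAverage(window_seconds=5 * 60)
# Hypothetical arrival times (in seconds) paired with the slide's response times.
for ts, response_time in [(0, 5), (60, 3), (120, 2), (180, 3), (240, 4)]:
    average = window.add(ts, response_time)
```

With all five events inside the window, the average is (5 + 3 + 2 + 3 + 4) / 5 = 3.4 t.u.; once a later event arrives, the oldest readings fall out of the window and stop contributing.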
If the number of sessions increases by 10% within a 3-minute window, the average CPU usage of the web farm does not increase by 5%, and the number of slow queries in the database is higher than 10, then we have reached a database contention situation. Raise an alarm!
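The rule above combines three conditions over windowed aggregates. The following Python sketch shows one possible reading of it, assuming each increase is measured between two consecutive 3-minute windows. The `WindowSnapshot` fields and both function names are hypothetical; in HOLMES the rule would be expressed as a continuous query rather than as Python.

```python
from collections import namedtuple

# Aggregates for one 3-minute window (field names are hypothetical).
WindowSnapshot = namedtuple(
    "WindowSnapshot", ["sessions", "avg_web_cpu", "slow_queries"]
)


def pct_increase(previous, current):
    """Relative growth from previous to current, in percent."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) * 100.0 / previous


def is_database_contention(previous, current):
    """True when all three conditions of the rule hold together."""
    sessions_up = pct_increase(previous.sessions, current.sessions) >= 10.0
    cpu_flat = pct_increase(previous.avg_web_cpu, current.avg_web_cpu) < 5.0
    many_slow_queries = current.slow_queries > 10
    return sessions_up and cpu_flat and many_slow_queries


# Example: sessions grow 15%, web CPU grows 2.5%, 14 slow queries.
before = WindowSnapshot(sessions=1000, avg_web_cpu=40.0, slow_queries=2)
after = WindowSnapshot(sessions=1150, avg_web_cpu=41.0, slow_queries=14)
alarm = is_database_contention(before, after)
```

The point of the rule is the correlation: more sessions without a matching rise in web-tier CPU suggests the requests are waiting elsewhere, and the slow-query count points at the database.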
Machine learning
“Any signal which is totally predictable carries no information” - Shannon
Machine learning characteristics
• FRAHST learns to detect anomalous behaviors
• Unsupervised streaming algorithm
• Complexity linear in the number of data streams
FRAHST, state-of-the-art
For further information, see reference [12] in our paper.
Anomaly detection
CEP & Machine Learning Integration
• Users choose the data streams to be correlated
• CEP module aggregates events
• Notifications are raised whenever a change in rank is detected
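The three steps above can be sketched in Python as follows. This is only an illustration of the data flow, not the HOLMES implementation: FRAHST itself is not implemented here, and `estimate_rank` is a hypothetical callable standing in for whatever rank estimate the learning module produces. The function names, the event tuple layout, and the fixed-interval bucketing are likewise assumptions.

```python
from collections import defaultdict


def aggregate(events, streams, interval):
    """Sum event values per user-selected stream into fixed-length time buckets.

    events: iterable of (timestamp, stream_name, value) tuples.
    Returns one vector per non-empty bucket, in time order, one entry per stream.
    """
    index = {name: i for i, name in enumerate(streams)}
    buckets = defaultdict(lambda: [0.0] * len(streams))
    for timestamp, name, value in events:
        if name in index:  # ignore streams the user did not select
            buckets[int(timestamp // interval)][index[name]] += value
    return [buckets[b] for b in sorted(buckets)]


def rank_change_notifications(vectors, estimate_rank):
    """Yield (position, old_rank, new_rank) whenever the estimated rank changes."""
    previous = None
    for position, vector in enumerate(vectors):
        rank = estimate_rank(vector)
        if previous is not None and rank != previous:
            yield position, previous, rank
        previous = rank


# Hypothetical events; "disk" is not among the selected streams and is dropped.
events = [
    (0, "sessions", 10), (5, "cpu", 30), (12, "sessions", 20),
    (13, "disk", 1), (25, "sessions", 200),
]
vectors = aggregate(events, streams=["sessions", "cpu"], interval=10)
```

Passing the detector in as a callable keeps the aggregation step independent of the learning algorithm, which mirrors the module split described in the architecture slides.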
Visualization and User Interface
• Users can create Perspectives
• Real-time dashboard customization
• Event history visualization
Dashboards
Conclusion
• Successful implementation and acceptance in a real use case
• New challenges
  • Improve situational awareness & prediction
  • Make query creation more intuitive
This presentation:
http://www.slideshare.net/intelie/debs2010
Our Nagios Plugin source code:
http://github.com/intelie/neb2activemq
Intelligent Monitoring with Esper:
http://esper.codehaus.org/tutorials/tutorial/presentations.html
Denis Vieira Jr. - davieira@gmail.com
Ronald Kaiser - ronald@intelie.com.br