Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Our presentation at DEBS'10, held in Cambridge, UK, in July 2010. It describes a solution for monitoring data centers through complex event processing (CEP) and machine learning.

DEBS 2010 – 4th ACM International Conference on Distributed Event-Based Systems, Cambridge, United Kingdom

HOLMES: An event-driven solution to monitor data centers through continuous queries and machine learning

Pedro Henriques dos Santos Teixeira
Ricardo Gomes Clemente
Ronald Andreu Kaiser
Denis Almeida Vieira Jr

Topics

• Motivation
• Use Case
• The Solution
  • Overview
  • System architecture
  • CEP
  • Machine learning
  • CEP & Machine learning integration
  • Visualization and User Interface
• Conclusion

Motivation

• Dynamic, continuously growing environment
• Need to understand our environment
• Too many dependencies
• Can't afford downtime

• Monitoring can be tricky
• Anticipate the inevitable and try to avoid chaos
• 1.2K servers
• 14K+ monitored items
• Correlation

Use Case

• Big Brother Brazil
• New world record
• 151 million votes in 2 days
• Peaks of 13,500 votes per minute (~220 votes/s)
• DDoS attack detected

Overview

The System Architecture

HOLMES

System architecture – modules and their purposes

• CEP module: known problems
• Machine learning module: unknown problems
• Visualization module: situational awareness
• Storage: event history/log

CEP

• Reaction to incidents in real-time is a requirement for data center monitoring

• Expression of abstract rules related to the business is desirable

• Correlation of events through user-defined queries

CEP - Esper

• Open-source CEP implementation

• Supports an Event Processing Language (EPL)

• High throughput, a requirement in our context

• Easy to embed in our application

CEP – simple example

SELECT avg(response_time) FROM HTTP.win:time(5 min)

[Figure: events stream E1 ... E5 inside a 5 min window; each event Ei carries a response time (values shown: 4, 3, 2, 3 and 5 t.u.).]
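The query above keeps a running average over the last 5 minutes of events. As a rough illustration of that idea outside Esper, here is a minimal Python sketch. It is not how Esper implements time windows; the class name `TimeWindowAverage`, the use of timestamps in seconds, and the exact boundary rule for expiring events are all assumptions made for the example. Timestamps are assumed to arrive in non-decreasing order and the window length is assumed to be positive.

```python
from collections import deque


class TimeWindowAverage:
    """Average of the values seen during the last `window_seconds` seconds.

    Hypothetical helper for illustration only; not part of Esper.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Register an event and return the average over the current window."""
        self.events.append((timestamp, value))
        self.total += value
        # Expire events that fell out of the window ending at `timestamp`.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


# Usage: five response times, one per minute, in a 5 min (300 s) window.
window = TimeWindowAverage(window_seconds=300)
for ts, response_time in [(0, 5), (60, 3), (120, 2), (180, 3)]:
    window.add(ts, response_time)
average_before = window.add(240, 4)   # all five events are in the window
average_after = window.add(330, 1)    # the event at t=0 has expired
```

Keeping a running total means each event is added once and removed once, so the cost per event stays constant regardless of the window length.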

If the number of sessions increases by 10% in a 3-minute window, the average CPU usage of the web farm does not increase by 5%, and the number of slow queries in the database is higher than 10, then we have reached a database contention situation. Raise an alarm!
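The rule above combines three conditions over the same window. A minimal Python sketch of that logic follows; it is not the EPL query used in HOLMES. The names `WindowStats`, `relative_increase` and `is_database_contention` are hypothetical, and the sketch assumes that both percentages are relative increases measured against the previous 3-minute window, which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates over one 3-minute window (hypothetical field names)."""
    sessions: int
    avg_cpu: float      # average CPU usage of the web farm, in percent
    slow_queries: int   # slow queries reported by the database


def relative_increase(previous, current):
    """Fractional growth from `previous` to `current` (0.10 means +10%)."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def is_database_contention(prev, curr):
    """Apply the three conditions of the rule to two consecutive windows."""
    sessions_up = relative_increase(prev.sessions, curr.sessions) >= 0.10
    cpu_flat = relative_increase(prev.avg_cpu, curr.avg_cpu) < 0.05
    many_slow_queries = curr.slow_queries > 10
    return sessions_up and cpu_flat and many_slow_queries


# Usage: sessions grow 15%, CPU grows 2.5%, 14 slow queries -> alarm.
previous_window = WindowStats(sessions=1000, avg_cpu=40.0, slow_queries=2)
current_window = WindowStats(sessions=1150, avg_cpu=41.0, slow_queries=14)
alarm = is_database_contention(previous_window, current_window)
```

The point of the rule is the combination: more sessions without more CPU load on the web farm suggests the requests are waiting elsewhere, and the slow query count points at the database.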

Machine learning

“any signal, which is totally predictable, carries no information” - Shannon

Machine learning characteristics

• FRAHST learns to detect anomalous behaviors

• Unsupervised streaming algorithm

• Complexity linear in the number of data streams

FRAHST, state-of-the-art

For further information, see reference [12] in our paper.

Anomaly detection

CEP & Machine Learning Integration

• Users choose the data streams to be correlated

• CEP module aggregates events

• Notifications are raised when a change in rank is detected
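The last step of the integration can be sketched in a few lines of Python. This does not implement FRAHST: the detector is treated as a black box that is assumed to emit one rank estimate per aggregated time step, and the function name `rank_change_notifications` is hypothetical.

```python
def rank_change_notifications(rank_estimates):
    """Yield (step, old_rank, new_rank) whenever the estimated rank changes.

    `rank_estimates` is any iterable of rank values, one per time step,
    assumed to come from the anomaly detection module.
    """
    previous = None
    for step, rank in enumerate(rank_estimates):
        if previous is not None and rank != previous:
            yield step, previous, rank
        previous = rank


# Usage: the rank goes from 2 to 3 at step 2 and back to 2 at step 4.
notifications = list(rank_change_notifications([2, 2, 3, 3, 2]))
```

Because the function is a generator, it can consume an unbounded stream of estimates and emit a notification as soon as each change is seen.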

Visualization and User Interface

• Users can create Perspectives

• Real-time dashboard customization

• Event history visualization

Dashboards

Conclusion

• Successful implementation and acceptance in a real use case

• New challenges
  • Improving situational awareness & prediction
  • Making the creation of queries more intuitive

This presentation:

http://www.slideshare.net/intelie/debs2010

Our Nagios Plugin source code:

http://github.com/intelie/neb2activemq

Intelligent Monitoring with Esper:

http://esper.codehaus.org/tutorials/tutorial/presentations.html

Denis Vieira Jr. - davieira@gmail.com
Ronald Kaiser - ronald@intelie.com.br
