DEBS 2010 – 4th ACM International Conference on Distributed Event-Based Systems, Cambridge, United Kingdom
HOLMES: An event-driven solution to monitor data centers through continuous queries and machine learning
Pedro Henriques dos Santos Teixeira, Ricardo Gomes Clemente, Ronald Andreu Kaiser, Denis Almeida Vieira Jr

Event Driven Solution to monitor Datacenters through continuous queries and machine learning

DESCRIPTION

Our presentation at DEBS'10, held in Cambridge, UK, in July 2010. It describes a solution for monitoring data centers through CEP and machine learning.

Topics

• Motivation
• Use Case
• The Solution
  • Overview
  • System architecture
  • CEP
  • Machine learning
  • CEP & Machine learning integration
  • Visualization and User Interface
• Conclusion

Motivation

• A dynamic environment that never stops growing
• We need to understand our environment
• Too many dependencies
• We can't afford downtime

• Monitoring can be tricky
• Anticipate the inevitable and try to avoid chaos
• 1.2K servers
• 14K+ monitored items
• Correlation

Use Case

• Big Brother Brazil
• New world record
• 151 million votes in 2 days
• Peaks of 13,500 votes per minute (~220 votes/s)
• DDoS attack detected

Overview

The System Architecture

HOLMES

System architecture – modules and their purposes

• CEP module: known problems
• Machine learning module: unknown problems
• Visualization module: situational awareness
• Storage: events history/log

CEP

• Reaction to incidents in real-time is a requirement for data center monitoring

• Expression of abstract rules related to the business is desirable

• Correlation of events through user-defined queries

CEP - Esper

• Open source CEP implementation

• Supports an Event Processing Language (EPL)

• High throughput, a requirement in our context

• Easy to embed in our application

CEP – simple example

SELECT avg(response_time) FROM HTTP.win:time(5 min)

[Figure: an events stream E1 ... E5 entering a 5 min time window; each event Ei carries a response time. Values shown: 4, 3, 2, 3 and 5 t.u.]
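As a rough illustration of what the query above computes, here is a minimal Python sketch of a sliding time-window average. It is not Esper code: the class name `TimeWindowAverage`, the timestamps, and the choice to re-evaluate only when a new event arrives are assumptions made for this example (a real CEP engine also updates results as events expire).

```python
from collections import deque


class TimeWindowAverage:
    """Average of a value over the events seen in the last `window_seconds`.

    Hypothetical helper for illustration only; it is not part of Esper.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Insert an event, expire old ones, and return the current average."""
        self.events.append((timestamp, value))
        self.total += value
        # Drop events that have fallen out of the window.
        while self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


# Five events with made-up timestamps (seconds) and response times (t.u.).
window = TimeWindowAverage(window_seconds=5 * 60)
stream = [(0, 4), (60, 3), (120, 2), (180, 3), (400, 5)]
averages = [window.add(t, response_time) for t, response_time in stream]
```

Keeping a running total means each event costs a constant amount of work on arrival plus the cost of expiring old events, which is one reason windowed aggregates can sustain a high event rate.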

If the number of sessions increases by 10% in a 3-minute window, the average CPU usage of the web farm does not increase by 5%, and the number of slow queries in the database is higher than 10, then we have reached a database contention situation. Raise an alarm!
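The rule above can be sketched as a plain predicate over two consecutive 3-minute windows. This is a hedged Python illustration, not the EPL statement used in HOLMES: the `WindowSummary` fields are hypothetical names, and the sketch assumes that both percentages are relative increases over the previous window.

```python
from dataclasses import dataclass


@dataclass
class WindowSummary:
    """Aggregates over one 3-minute window (hypothetical field names)."""
    sessions: int
    avg_web_cpu: float  # average CPU usage of the web farm, in percent
    slow_queries: int   # slow queries seen in the database


def relative_increase(previous, current):
    """Fractional growth from `previous` to `current` (0.10 means +10%)."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def is_database_contention(previous, current):
    """True when sessions grow, web CPU stays flat, and slow queries pile up."""
    sessions_up = relative_increase(previous.sessions, current.sessions) >= 0.10
    cpu_flat = relative_increase(previous.avg_web_cpu, current.avg_web_cpu) < 0.05
    many_slow_queries = current.slow_queries > 10
    return sessions_up and cpu_flat and many_slow_queries


# Made-up numbers: sessions +15%, CPU +2.5%, 14 slow queries.
before = WindowSummary(sessions=1000, avg_web_cpu=40.0, slow_queries=2)
after = WindowSummary(sessions=1150, avg_web_cpu=41.0, slow_queries=14)
alarm = is_database_contention(before, after)
```

The intuition is that more sessions without more web-tier CPU suggests requests are waiting on something else, and the slow-query count points at the database.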

Machine learning

“Any signal, which is totally predictable, carries no information” - Shannon

Machine learning characteristics

• FRAHST learns to detect anomalous behaviors

• Unsupervised streaming algorithm

• Complexity linear in the number of data streams

FRAHST, state-of-the-art

For further information, see reference [12] in our paper.

Anomaly detection

CEP & Machine Learning Integration

• Users choose the data streams to be correlated

• CEP module aggregates events

• Notifications are raised when a change in rank is detected
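To give a feel for what "a change in rank" can mean, the sketch below is a deliberately simplified Python stand-in, not FRAHST. It assumes only two aggregated streams, recomputes a 2x2 covariance matrix over a small sliding window instead of tracking a subspace incrementally, and uses an arbitrary 95% energy threshold. All function names and sample values are hypothetical.

```python
import math


def effective_rank(pairs, energy=0.95):
    """Principal directions needed to keep `energy` of the variance (0, 1 or 2)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Entries of the symmetric 2x2 covariance matrix [[a, b], [b, c]].
    a = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    c = sum((y - mean_y) ** 2 for _, y in pairs) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    total = a + c
    if total == 0:
        return 0  # constant streams carry no variance at all
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    largest = total / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return 1 if largest / total >= energy else 2


def rank_changes(samples, window=4):
    """Yield (index, old_rank, new_rank) whenever the windowed rank changes."""
    previous = None
    for end in range(window, len(samples) + 1):
        rank = effective_rank(samples[end - window:end])
        if previous is not None and rank != previous:
            yield end - 1, previous, rank
        previous = rank


# Made-up aggregates: the two streams move together, then decouple.
samples = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 1), (6, 9), (7, 2), (8, 8)]
notifications = list(rank_changes(samples))
```

While the streams are correlated, a single direction explains the data (rank 1). Once they decouple, a second direction is needed, and that transition is the kind of event the integration turns into a notification.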

Visualization and User Interface

• Users can create Perspectives

• Real-time dashboard personalization

• Events history visualization

Dashboards

Conclusion

• Successful implementation and acceptance in a real use case

• New challenges
  • Improving situational awareness & prediction
  • Making the creation of queries more intuitive

This presentation:

http://www.slideshare.net/intelie/debs2010

Our Nagios Plugin source code:

http://github.com/intelie/neb2activemq

Intelligent Monitoring with Esper:

http://esper.codehaus.org/tutorials/tutorial/presentations.html

Denis Vieira Jr. - [email protected]
Ronald Kaiser - [email protected]