16
MONITORING WITH MONALISA Costin Grigoras <[email protected]>

monitoring with MonALISA

  • Upload
    brinly

  • View
    69

  • Download
    0

Embed Size (px)

DESCRIPTION

monitoring with MonALISA. Costin Grigoras . Useful links. What is MonALISA ?. MonALISA communication architecture. Monitoring modules. ApMon. Data storage model. MonALISA clients. AliEn monitoring architecture. The Repository. Automatic actions. - PowerPoint PPT Presentation

Citation preview

Page 1: monitoring  with MonALISA

MONITORING WITH MONALISA

Costin Grigoras <[email protected]>

Page 2: monitoring  with MonALISA

MO

NITO

RING W

ITH MO

NALISA

What is MonALISA ?

MonALISA communication architecture

Monitoring modules

ApMon

Data storage model

MonALISA clients

AliEn monitoring architecture

The Repository

Automatic actions

Putting it all together

Useful links

3

4

6

7

8

9

10

11

13

14

15

Page 3: monitoring  with MonALISA

MO

NALISA

What is MonALISA ?

Caltech project started in 2002http://monalisa.caltech.edu/

Java-based set of distributed, self-describing services

Offers the infrastructure to collect any type of information

Can process it in near real time

The services can cooperate in performing the monitoring tasks

Can act as a platform for running distributed user agents

Page 4: monitoring  with MonALISA

MO

NALISA CO

MM

UN

ICATION

ARCHITECTURE

MonALISA software components and the connections between them

Data consumers

Multiplexing layerHelps firewalled endpoints connect

Registration and discoveryJINI-Lookup Services Secure & Public

MonALISA services

Proxies

ClientsHL services

Agents

Network of

Data gathering services

Fully Distributed System with no Single Point of Failure

Page 5: monitoring  with MonALISA

MO

NALISA CO

MM

UN

ICATION

ARCHITECTURE

Subscriber/notification paradigm

Configuration Control (SSL)

Predicates & Agents

Data (via ML Proxy)

Applications

ML Client

LookupService

Registration DiscoveryAGENTS

FILTERS / TRIGGERSMonitoring Modules

Dynamic Loading

Data Store

Push or Pull, depending on device

ML Service

Page 6: monitoring  with MonALISA

MO

NITO

RING M

ODU

LESMonALISA service includes many modules; easily extendable

The service package includes:-Local host monitoring (CPU, memory, network traffic , processes and sockets in each state, LM sensors, APC UPSs), log files tailing-SNMP generic & specific modules -Condor, PBS, LSF and SGE (accounting & host monitoring), Ganglia-Ping, tracepath, traceroute, pathload and other network-related measurements-Ciena, Optical switches-Calling external applications/scripts that return as output the values-XDR-formatted UDP messages (such as ApMon).

New modules can be added by implementing a simple Java interface.

Filters can also be defined to aggregate data in new ways.

The Service can also react to the monitoring data it receives, more about the actions it can take later.

MonALISA can run code as distributed agents-Used by VRVS/Evo to maintain the tree of connections between reflectors-Eof optical paths between two network endpoints for particular data transfers

Page 7: monitoring  with MonALISA

APMO

NEmbeddable APlication MONitoring library

ApMon is a collection of libraries in various languages (C, C++, Java, Perl, Python), all offering a simple API to sending monitoring information

Based on the XDR open format of data packing (efficient packing of the values)

Implemented over UDP to minimize the impact on the monitored application

Allows applications to send particular values and also provides local host monitoring

The Perl version is used by AliEn to send monitoring information from each service and JobAgent to the site local MonALISA instance

The C/C++ implementations are used in ROOT (TMonalisaWriter), Proof and Xrootd. CMS also uses ApMon to send monitoring information from all jobs to a single point

Can also be used stand-alone, in a wrapper application that loops forever, sending the default host monitoring parameters + any other interesting values (like some services’ status)

It can be configured by API, local configuration file or URL of one

Page 8: monitoring  with MonALISA

MonALISA keeps a memory buffer for a minimal monitoring history

In addition, data can be kept in configurable database structures-The service keeps one week of raw data and one month of averaged values-The client creates three averaged structures

Parallel database backends can be used to increase performance and reliability

DATA STORAGE M

ODEL

Shared codebase between components

Response

Short term,high resolution

Medium term,lower resolution

Long term,low resolution

Memory buffer Volatile storage

Persistent storage (DB)

time

Request at highest resolution

Page 9: monitoring  with MonALISA

MO

NALISA CLIEN

TSTwo important clients

GUI client-Interactive exploring of all the parameters-Can plot history or real-time values-Customizable history query interval-Subscribes to those particular series and updates the plots in real time

Storage client (aka Repository)-Subscribes to a set of parameters and stores them in database structures suitable for long-term archival-Is usually complemented by a web interface presenting these values-Can also be embedded in another controlling application

WebServices clients-Limited functionality: they lack the subscription mechanism

Page 10: monitoring  with MonALISA

ALIEN M

ON

ITORIN

G ARCHITECTURE

Monitoring follows the general AliEn layout: one service per site collects and aggregates site-local monitoring information

10

Long HistoryDB

LCG Tools

MonALISA AliEn Site

ApMon

AliEn Job Agent

ApMon

AliEn Job Agent

ApMon

AliEn Job Agent

MonALISA @CERN

MonALISA LCG Site

ApMon

AliEn CE

ApMon

AliEn SE

ApMon

ClusterMonitor

ApMon

AliEn TQ

ApMon

AliEn Job Agent

ApMon

AliEn Job Agent

ApMon

AliEn Job Agent

ApMon

AliEn CE

ApMon

AliEn SE

ApMon

ClusterMonitor

ApMon

AliEn IS

ApMon

AliEn Optimizers

ApMon

AliEn Brokers

ApMon

MySQLServers

ApMon

CastorGridScripts

ApMon

APIServices

MonALISARepository

Aggregated Data

rss

vsz

cputime

run

time

jobslots

free

spac

e

nr. o

ffil

es

openfiles

Queued

JobAgents

cpuksi2k

jobstatus

disk

used

processes loadnet

In/o

ut

jobsstatussockets

migratedmbytesactive

sessions

MyProxy

status

Alerts

Actions

http://pcalimonitor.cern.ch/

Page 11: monitoring  with MonALISA

THE REPOSITO

RYMonALISA repository for ALICE

Page 12: monitoring  with MonALISA

THE REPOSITO

RYMonitoring the ALICE Grid: some numbers

85 sites defined9000 worker nodes14000 parallel jobs25 central machines

More than 1.1M parameters

Storing only aggregated data where possible we have reached >35000 active parameters in the database

We store new values at ~100Hz

15000 dynamic pages / day

1 client/web server + 3 database instances

170GB of history

Page 13: monitoring  with MonALISA

Actions

AUTO

MATIC ACTIO

NS

What to do with this monitoring data?

Decisions taken upon:-Absence/presence of some parameter(s)-Values above/below predefined thresholds-Arbitrary correlations between values-User-defined code

Action types:-Notifications-RSS/Atom feeds-Annotation of charts with the events-Calling external programs-Logging-Running custom code, for example in ALICE

- restart site services- maintain DNS-based

load balancing

ML Service

ML Service

Actions

• Traffic• Jobs• Hosts• Apps

• Temperature• Humidity• A/C Power• …

Sensors Localdecisions

Global decisions

Global ML

Services

SSL

SSL

Page 14: monitoring  with MonALISA

PUTTIN

G IT ALL TOGETHER

How to monitor your system

Step 0: basic monitoring support

Install a MonALISA service instance or reuse the one installed by AliEn

Step 1: instrument the applications with monitoring calls

ApMon covers both the host monitoring and sending your own parameters. Use a pointer to the actual configuration

Step 2: collect the data

Install a Repository and collect only the parameters that you need

Step 3: present the data

Learning from the past is important. Add annotations of changes

Step 4: react to events

Start by sending emails when things don’t work. Continue by automatizing the manual procedures

Step 5: go to a workshop in Philippines.

Page 15: monitoring  with MonALISA

USEFU

L LINKS

Exploring ML world can start from here

Main documentation pagehttp://monalisa.caltech.edu/

Experiment repositoriesALICE: http://pcalimonitor.cern.ch/PANDA: http://mlr2.gla.ac.uk:7001/

Cluster monitoring repositoriesGSI: http://lxgrid2.gsi.de:8080/Muenster: http://gridikp.uni-muenster.de:8080/ CAF: http://pcalimonitor.cern.ch/stats?page=CAF/machines

Page 16: monitoring  with MonALISA

2Q||!2Q

?

Thanks a lot for your attention!

Any questions?