Grid Monitoring By Zoran Obradovic CSE-510 October 2007


Grid Monitoring

By Zoran Obradovic

CSE-510 October 2007


Grid Monitoring

Reasons for monitoring: authorization, scheduling, sense of control

Monitoring systems: Globus (Monitoring and Discovery System, MDS), Ganglia, Nagios, Inca, MonALISA

Standards: GIPS compliance verification


Reasons

Monitoring the state of grid resources, services, and job activity is an important part of managing a grid environment.

Administrators need a sense of control over the resources provided in such a distributed computing environment.

Grid administrators need to know the current state of the grid in order to provide operations and support.

Monitoring is also an important tool for grid users.

The goal is a system that lets administrators look at the grid and administer it as if it were a single workstation.


Resource allocation

Monitoring can provide grid administrators, as well as users, with significant information about what resources are available in the grid and what state they are in.

Job monitors gather vital information about job submissions on specific resources by harvesting data from local cluster job managers.


Monitoring allows various resources to be dynamically instantiated and adjusted by constantly running background processes.

Security: monitoring keeps track of who is using the grid, their permissions, and data integrity, and it minimizes the possibility of malicious activity, threats, and accidents.


Monitoring Systems


MonALISA

Monitoring Agents using a Large Integrated Services Architecture

Built by Caltech and its partners with the support of the U.S. CMS software and computing program.

The design is built on a Dynamic Distributed Service Architecture.

It is able to provide complete monitoring, control, and global optimization services for complex systems.


It is a group of independent, multi-threaded, self-describing, agent-based subsystems that are registered as dynamic services and are able to communicate and work together in performing a range of information gathering and processing tasks.


If a monitoring task fails or hangs due to I/O errors, the other tasks are not delayed or disrupted, since they execute in other, independent threads.

A pool of threads is created once, and each thread is reused as soon as the task assigned to it completes.
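
The thread-pool behaviour described above can be illustrated with a minimal Python sketch. It is not MonALISA code: the task names, the pool size, and the timeout are assumptions chosen for illustration. It only shows that a stalled task does not block the others and that the same pool is reused across rounds.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def read_cpu_load():
    """Hypothetical monitoring task that returns immediately."""
    return {"metric": "cpu_load", "value": 0.42}


def read_stalled_disk():
    """Hypothetical monitoring task that stalls, as if blocked on I/O."""
    time.sleep(0.5)
    return {"metric": "disk_free", "value": None}


def run_monitoring_round(tasks, pool, timeout):
    """Submit every task to the shared pool and collect results per task."""
    futures = {name: pool.submit(fn) for name, fn in tasks.items()}
    results = {}
    for name, future in futures.items():
        try:
            results[name] = future.result(timeout=timeout)
        except FutureTimeout:
            # The slow task keeps its own thread; other tasks are unaffected.
            results[name] = "timed out"
    return results


# The pool is created once and reused for every monitoring round.
pool = ThreadPoolExecutor(max_workers=4)
tasks = {"cpu": read_cpu_load, "disk": read_stalled_disk}
first = run_monitoring_round(tasks, pool, timeout=0.1)
second = run_monitoring_round(tasks, pool, timeout=0.1)
pool.shutdown(wait=True)
```

In both rounds the CPU reading comes back while the stalled disk check is reported as timed out, which is the isolation property the slide describes.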


Each MonALISA service registers itself with a set of Lookup Services (LUSs) as part of one or more groups, and it publishes attributes that describe itself.

Information is replicated across the Lookup Services.

MonALISA LUSs restrict the services' registration based on an authorized X.509 certificate.
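
The registration flow above can be sketched with a toy in-memory registry. This is not the MonALISA or Jini API: the class, the method names, and the idea of checking only an already-verified certificate subject are assumptions made for illustration.

```python
class LookupService:
    """Toy stand-in for a Lookup Service (hypothetical, in-memory only)."""

    def __init__(self, authorized_subjects):
        # Assumption: X.509 validation happened elsewhere; only the
        # certificate subject is compared against an allow-list here.
        self.authorized_subjects = set(authorized_subjects)
        self.registry = {}  # service name -> {"groups": set, "attributes": dict}

    def register(self, name, cert_subject, groups, attributes):
        """Accept the registration only for an authorized certificate subject."""
        if cert_subject not in self.authorized_subjects:
            return False
        self.registry[name] = {"groups": set(groups), "attributes": dict(attributes)}
        return True

    def find(self, group):
        """Return the names of all services registered in the given group."""
        return sorted(n for n, e in self.registry.items() if group in e["groups"])


def register_everywhere(lookup_services, name, cert_subject, groups, attributes):
    """Register with every Lookup Service so each holds the same information."""
    return [lus.register(name, cert_subject, groups, attributes) for lus in lookup_services]


allowed = ["CN=farm01.example.org"]
lus_a, lus_b = LookupService(allowed), LookupService(allowed)

accepted = register_everywhere(
    [lus_a, lus_b], "farm01", "CN=farm01.example.org", ["cms"], {"site": "example"}
)
rejected = register_everywhere(
    [lus_a, lus_b], "rogue", "CN=unknown.example.net", ["cms"], {}
)
```

After these calls both registries list `farm01` under the `cms` group, and neither lists `rogue`.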


The combination of the service architecture and code mobility makes it possible to build an extensible hierarchy of services that is capable of managing very large systems.


Monitoring all aspects of complex systems:

System information for computer nodes and clusters.

Network information (traffic, flows, connectivity, topology) for WAN and LAN.

The performance of applications, jobs, and services.

End-user systems and end-to-end performance measurements.


Globus

The Monitoring and Discovery System (MDS) is a suite of web services for monitoring and discovering resources and services on Grids.

It allows users to discover what resources are considered part of a Virtual Organization.

It offers trigger and indexing services.


Trigger Service: gathers information and evaluates it against a set of conditions defined in a configuration file. When a condition is met, an action takes place, such as emailing a system administrator when the disk space on a server reaches a threshold.

Index Service: gathers information and publishes it as resource properties. Clients use the resource property query and subscription/notification interfaces to retrieve information from an Index.
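
The condition-then-action pattern behind the trigger service can be sketched in a few lines of Python. This is not the MDS configuration format or API: the `Trigger` class, the metric name `disk_used_pct`, and the 90% threshold are hypothetical, and the "email" is only recorded in a list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trigger:
    """A named condition plus the action to run when the condition holds."""
    name: str
    condition: Callable[[Dict[str, float]], bool]
    action: Callable[[str, Dict[str, float]], None]


def evaluate_triggers(triggers: List[Trigger], sample: Dict[str, float]) -> List[str]:
    """Check one gathered sample against every trigger; return the names that fired."""
    fired = []
    for trigger in triggers:
        if trigger.condition(sample):
            trigger.action(trigger.name, sample)
            fired.append(trigger.name)
    return fired


outbox: List[str] = []


def notify_admin(trigger_name: str, sample: Dict[str, float]) -> None:
    """Stand-in for emailing an administrator: the message is only recorded."""
    outbox.append(f"{trigger_name}: disk_used_pct={sample['disk_used_pct']}")


triggers = [
    Trigger("disk-nearly-full", lambda s: s["disk_used_pct"] >= 90.0, notify_admin)
]

quiet = evaluate_triggers(triggers, {"disk_used_pct": 71.5})
fired = evaluate_triggers(triggers, {"disk_used_pct": 93.0})
```

Only the second sample crosses the threshold, so exactly one notification ends up in `outbox`.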


Information Providers For Globus Monitoring Toolkit

Hawkeye Information Provider

Ganglia Information Provider

WS GRAM

Reliable File Transfer Service (RFT)


What do they provide?

-basic host data (name, ID)
-processor information
-memory size
-OS name and version
-file system data
-processor load data
-queue information
-number of CPUs available and free
-job count information
-some memory statistics
-status data of the server
-transfer status for a file or set of files
-number of active transfers


Ganglia

Scalable distributed monitoring system for high-performance computing systems

It uses XML for data representation, XDR (External Data Representation) for portable data transport, and RRDtool for data storage and visualization.

It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency.
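
To make the XDR transport mentioned above more concrete, here is a minimal Python sketch of XDR's basic encoding rules: big-endian 4-byte integers and floats, and length-prefixed strings padded to a multiple of four bytes. The message layout (host, metric name, value) is an assumption for illustration and is not Ganglia's actual wire format.

```python
import struct


def xdr_uint(value: int) -> bytes:
    """XDR unsigned integer: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte length, the bytes, then zero padding to a 4-byte boundary."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def xdr_float(value: float) -> bytes:
    """XDR float: IEEE 754 single precision, big-endian."""
    return struct.pack(">f", value)


def encode_metric(host: str, name: str, value: float) -> bytes:
    """Hypothetical message layout: host string, metric-name string, float value."""
    return xdr_string(host) + xdr_string(name) + xdr_float(value)


packet = encode_metric("node01", "load_one", 0.5)
```

Because every field is aligned to four bytes and uses a fixed byte order, the same packet can be decoded on any architecture, which is what makes the format portable.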


It has been used to link clusters across university campuses and around the world, and it can scale to handle clusters with 2000 nodes.

Current support comes from PlanetLab, an open platform for developing, deploying, and accessing planetary-scale services.


Nagios

“Nagios is a host and service monitor designed to inform you of network problems before your clients, end-users or managers do.”

It is designed to run on Linux, but it works fine under most *nix variants.

The monitoring daemon runs intermittent checks on the hosts and services an administrator specifies, using external "plugins" that return status information to Nagios.

If a problem arises in a cluster or a grid, the daemon can send notifications to administrative contacts in a variety of ways (email, instant message).
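
A plugin in this sense is just an external program that prints one status line and exits with a code Nagios understands (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The following Python sketch shows a disk-usage check written that way. The thresholds and function names are assumptions for illustration, not part of any shipped plugin.

```python
import shutil
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to a (status code, status line) pair."""
    if used_pct >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Measure disk usage for path; report UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return classify(100.0 * usage.used / usage.total, warn, crit)


def main():
    """Entry point when run as a plugin: print the status line, exit with the code."""
    code, message = check_disk()
    print(message)
    sys.exit(code)
```

Calling `main()` from a script makes the file usable as a check command. Keeping `classify` separate lets the threshold logic be tested without touching a real filesystem.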


Global Investment Performance Standards

“The principal goal of the Investment Performance Council is to have all countries adopt the GIPS standards as the standard for investment firms seeking to present historical investment performance”

GIPS compliance acts as a “passport” that allows firms to enter the arena of investment management competition on a global basis and to compete on an equal footing.

Today, 25 countries throughout North America, Europe, Africa, and the Asia Pacific Region have adopted the GIPS standards.


-Standard interface for presenting monitoring information about a resource

-GIP sensor suite used as reference implementation

-Information about grids is returned in LDIF format, a standard data interchange format for representing LDAP directory content as well as directory update requests

-GLUE Schema: abstract modeling for Grid resources and mapping to concrete schemas that can be used in Grid Information Services

-Monitoring and Discovery System (MDS) 2.4 GRIS
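
As a rough illustration of what LDIF output from an information provider looks like, the Python sketch below serializes one directory entry. The DN and the GLUE-style attribute names and values are illustrative assumptions rather than output from a real GIP sensor. The base64 rule is a simplified form of the LDIF "safe string" test, and line folding is omitted.

```python
import base64


def _needs_base64(value: str) -> bool:
    """Simplified LDIF rule: encode values that are not plain 'safe' strings."""
    if value == "":
        return False
    if value[0] in " :<" or value[-1] == " ":
        return True
    return any(ord(ch) > 127 or ch in "\r\n\x00" for ch in value)


def to_ldif(dn: str, attributes: dict) -> str:
    """Serialize one entry: a dn line followed by one 'name: value' line per value."""
    lines = [f"dn: {dn}"]
    for name, values in attributes.items():
        for value in values:
            if _needs_base64(value):
                encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
                lines.append(f"{name}:: {encoded}")  # '::' marks a base64 value
            else:
                lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


# Illustrative entry; the names and values are assumptions, not real sensor output.
entry = to_ldif(
    "GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs,mds-vo-name=local,o=grid",
    {
        "objectClass": ["GlueCE"],
        "GlueCEUniqueID": ["ce.example.org:2119/jobmanager-pbs"],
        "GlueCEStateFreeCPUs": ["12"],
    },
)
```

Multiple entries would be separated by a blank line, which is how an LDAP client or a compliance checker can split the stream back into individual records.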


Sources:

http://www.sura.org/cookbook

http://monalisa.cacr.caltech.edu/monalisa.htm

http://www.globus.org/toolkit/docs/4.0/info/key-index.html

http://ganglia.sourceforge.net/

http://www.nagios.org/about/

osg-docdb.opensciencegrid.org/0004/000499/001/OSGMiddleware.pp