[email protected] - GMA Athena (24mar03 - CHEP 2003 @ La Jolla, CA) GMA Instrumentation of the Athena Framework using NetLogger Dan Gunter, Wim Lavrijsen, David Quarrie, Brian Tierney, Craig Tull HCG/NERSC/LBNL CHEP 2003 La Jolla, CA - March 24, 2003


GMA Instrumentation of the Athena Framework using NetLogger

Dan Gunter, Wim Lavrijsen, David Quarrie, Brian Tierney, Craig Tull

HCG/NERSC/LBNL

CHEP 2003

La Jolla, CA - March 24, 2003


The Problem

• The ATLAS Athena Framework has a large number of components

• When something goes wrong in a Grid environment (e.g., the job crashes or runs slower than expected), it is very difficult to determine which component is at fault

• Constant, verbose logging generates too much information

• Solution: We are using NetLogger and pyGMA to instrument and monitor Athena


Athena/GAUDI Architecture

[Figure: Athena/GAUDI architecture diagram. Components shown: Application Manager; Algorithms; Converters; Event Data Service, Detec. Data Service, and Histogram Service, each drawn with a Persistency Service and Data Files; Transient Event Store, Transient Detector Store, and Transient Histogram Store; Message Service; JobOptions Service; Particle Prop. Service; Other Services.]


Grid Testbed Topologies (2002)

[Figure: three testbed topologies: EDG Testbed (star), US ATLAS (mesh), NorduGrid (mesh).]


Review: Grid Monitoring Architecture (GMA): Terminology and Architecture

• (Performance) Event:
— Typed collection of data with a specific structure

• Producer Interface:
— makes performance data (events) available

• Consumer Interface:
— receives performance data (events)

• Directory Service:
— supports information publication and discovery
— must be distributed and/or replicated

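[Figure: GMA components. Top: Producer and Consumer each exchange event publication information with the Directory Service, and event data passes between Producer and Consumer. Bottom: events pass from producers to a consumer through an intermediate component (analysis, filtering, etc.) that has both a Consumer Interface and a Producer Interface.]

GMA specification: http://www.ggf.org/Documents/GFD/GFD-I.7.pdf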
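The three GMA roles can be illustrated with a small in-process sketch. This is not pyGMA and not the GGF interface definitions: the class names `Producer` and `Directory`, their methods, and the event type string `cpu.load` are all assumptions made up for illustration, and a real directory service would be distributed and/or replicated rather than a single dictionary.

```python
from typing import Callable, Dict, List

# An event is a typed collection of data; "type" names its structure.
Event = Dict[str, object]


class Producer:
    """Makes events of one type available to subscribed consumers (hypothetical API)."""

    def __init__(self, event_type: str) -> None:
        self.event_type = event_type
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        # A consumer is represented by a callback that receives events.
        self._subscribers.append(callback)

    def publish(self, **fields: object) -> None:
        event: Event = {"type": self.event_type, **fields}
        for callback in self._subscribers:
            callback(event)


class Directory:
    """Holds publication information so consumers can discover producers."""

    def __init__(self) -> None:
        self._by_type: Dict[str, List[Producer]] = {}

    def register(self, producer: Producer) -> None:
        self._by_type.setdefault(producer.event_type, []).append(producer)

    def lookup(self, event_type: str) -> List[Producer]:
        return list(self._by_type.get(event_type, []))


# Usage: the producer registers, the consumer discovers it and subscribes.
directory = Directory()
cpu = Producer("cpu.load")
directory.register(cpu)

received: List[Event] = []
for producer in directory.lookup("cpu.load"):
    producer.subscribe(received.append)

cpu.publish(host="node01", value=0.75)
```

Note that the directory only carries discovery information; the event data itself goes directly from producer to consumer.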


Athena Distributed Instrumentation

• Part of SuperComputing 2002 ATLAS demo

• IGMASvc IMonitorSvc extension?
— Abstract application monitoring service.

• NetLogger (http://www-didc.lbl.gov/NetLogger/)
— End-to-End Monitoring & Analysis of Distributed Systems
— C, C++, Java, Python, Perl, Tcl APIs
— Web Service Activation

• Prophesy (http://prophesy.mcs.anl.gov/)
— An Infrastructure for Analyzing & Modeling the Performance of Parallel & Distributed Applications
— Normally a Parse & auto-instrument approach (C & FORTRAN).


DIDC Technologies Used

• LBNL's Data Intensive Distributed Computing Group

• NetLogger provides
— Easy to use instrumentation library
— Ability to correlate data from various sources based on time
— Easy way to collect data from multiple clients/servers reliably
— Visualization and analysis tools

• pyGMA provides
— Easy to use producer and consumer Python library for constructing GGF-defined GMA services

• Activation Service provides
— Ability to remotely trigger and collect monitoring data in running Grid applications


NetLogger Toolkit

• DIDC have developed the NetLogger Toolkit (short for Networked Application Logger), which includes:
— tools to make it easy for distributed applications to log interesting events at every critical point
  • NetLogger client library (C, C++, Java, Perl, Python)
— tools for host and network monitoring
— event visualization tools that allow one to correlate application events with host/network events
— NetLogger event archive and retrieval tools (new)

• NetLogger combines network, host, and application-level monitoring to provide a complete view of the entire system.

• Open Source (http://www-didc.lbl.gov/NetLogger/)
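To make "log interesting events at every critical point" concrete, here is a minimal sketch of an application writing timestamped name=value event lines. It does not use the NetLogger client library; the function `format_event`, the field names (`DATE`, `HOST`, `PROG`, `LVL`, `NL.EVNT`), the event name, and the `ALG` field are assumptions chosen for illustration, so consult the NetLogger documentation for the actual API and log format.

```python
import socket
from datetime import datetime, timezone
from typing import Dict, Optional


def format_event(
    name: str,
    prog: str,
    level: str = "Usage",
    when: Optional[datetime] = None,
    host: Optional[str] = None,
    fields: Optional[Dict[str, object]] = None,
) -> str:
    """Render one event as a single line of space-separated name=value pairs.

    The layout is an assumed, illustrative one, not a verified NetLogger format.
    """
    when = when or datetime.now(timezone.utc)
    host = host or socket.gethostname()
    parts = [
        # %f is zero-padded microseconds, so timestamps sort and correlate cleanly.
        "DATE=" + when.strftime("%Y%m%d%H%M%S.%f"),
        "HOST=" + host,
        "PROG=" + prog,
        "LVL=" + level,
        "NL.EVNT=" + name,
    ]
    for key, value in (fields or {}).items():
        parts.append(f"{key}={value}")
    return " ".join(parts)


# Usage: bracket a critical section with start/end events.
print(format_event("EXECUTE_START", "athena", fields={"ALG": "ToyAlg"}))
# ... the work being measured would run here ...
print(format_event("EXECUTE_END", "athena", fields={"ALG": "ToyAlg"}))
```

Because every line carries a host name and a precise timestamp, lines written by different processes and machines can later be merged and ordered by time.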


GMASvc Service

• Typical Athena Abstract Interface design.
— Dual Use Library
  • Linking Algorithms, etc & Loading DL
— Concrete implementation using NetLogger
— Properties to adjust:
  • NetLogger: On/Off/Level, Distinguished User Name, Activation Service
— Controlled by Environment Variables.
— Use in Algorithms, Converters, StoreGate Store/Retrieve, etc.

• GMAAuditor
— Typical Athena Auditor bracketing standard Algorithm methods (initialize, execute, finalize)
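The auditor idea, bracketing each standard method with a start and an end event, can be sketched as follows. Athena auditors are C++ components; this Python version only illustrates the pattern, and `EventLog`, `audited`, `ToyAlgorithm`, and `run` are hypothetical names that appear nowhere in Athena or GMASvc.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List


class EventLog:
    """Stand-in for a monitoring service; collects events in memory."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled  # models the On/Off property
        self.events: List[Dict[str, object]] = []

    def write(self, name: str, **fields: object) -> None:
        if self.enabled:
            self.events.append({"ts": time.time(), "event": name, **fields})


@contextmanager
def audited(log: EventLog, algorithm: str, method: str) -> Iterator[None]:
    """Emit a start event before the wrapped call and an end event after it."""
    log.write(f"{method}.start", algorithm=algorithm)
    try:
        yield
    finally:
        # The end event is written even if the method raises.
        log.write(f"{method}.end", algorithm=algorithm)


class ToyAlgorithm:
    """Hypothetical algorithm with the three standard methods."""

    name = "ToyAlg"

    def initialize(self) -> None:
        pass

    def execute(self) -> None:
        pass

    def finalize(self) -> None:
        pass


def run(algorithm: ToyAlgorithm, log: EventLog, n_events: int) -> None:
    """Drive the algorithm the way an event loop would, auditing every call."""
    with audited(log, algorithm.name, "initialize"):
        algorithm.initialize()
    for _ in range(n_events):
        with audited(log, algorithm.name, "execute"):
            algorithm.execute()
    with audited(log, algorithm.name, "finalize"):
        algorithm.finalize()
```

The algorithm itself contains no monitoring code; the bracketing lives entirely in the caller, which is what lets instrumentation be switched on or off without touching algorithm sources.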


Atlas Athena Monitoring Activation: SC02 Demo

[Figure: SC02 demo data flow. (1) An Event Consumer sends an activation request to the Activation Service. (2) NetLogger-instrumented Athena jobs running on the NERSC PDSF Linux Workstation Farm send monitoring data to the Activation Service. (3) The Activation Service sends events back to the Event Consumers.]

• Activation Service can act as an event filter, buffer, multiplexer, or demultiplexer
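The request, data, and return flow of the demo can be mimicked in a single process. The sketch below is an assumption-laden illustration, not the prototype activation service: `ActivationService`, `activate`, `receive`, `recent`, and `job_step` are invented names, and a real service would talk to jobs and consumers over the network.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Event = Dict[str, object]
Predicate = Callable[[Event], bool]
Callback = Callable[[Event], None]


class ActivationService:
    """Relays events from instrumented jobs to consumers that asked for them."""

    def __init__(self, buffer_size: int = 100) -> None:
        self.active = False  # jobs stay quiet until a consumer activates monitoring
        self._buffer: Deque[Event] = deque(maxlen=buffer_size)  # buffer role
        self._consumers: List[Tuple[Predicate, Callback]] = []

    def activate(self, callback: Callback, predicate: Predicate = lambda e: True) -> None:
        """A consumer's activation request, optionally with a filter."""
        self._consumers.append((predicate, callback))
        self.active = True

    def receive(self, event: Event) -> None:
        """Monitoring data arriving from a job."""
        if not self.active:
            return
        self._buffer.append(event)
        # Multiplexer role: one event may fan out to several consumers.
        for predicate, callback in self._consumers:
            if predicate(event):  # filter role
                callback(event)

    def recent(self) -> List[Event]:
        return list(self._buffer)


def job_step(service: ActivationService, job_id: str, step: str) -> None:
    """What an instrumented job does at a critical point."""
    if service.active:  # skip the cost of building events when nobody listens
        service.receive({"job": job_id, "event": step})
```

Checking `active` inside the job is what addresses the "constant, verbose logging generates too much information" problem: events are only produced once someone has asked for them.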


Activation Service Architecture


Activation Service GUI


NetLogger Analysis: Key Concepts

• NetLogger visualization tools are based on time correlated and object correlated events.
— precision timestamps (default = microsecond)

• If applications specify an “object ID” for related events, this allows the NetLogger visualization tools to generate an object “lifeline”

• In order to associate a group of events into a “lifeline”, you must assign an “Event ID” to each NetLogger event
— Sample Event ID: file name, block ID, frame ID, etc.
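Grouping events into lifelines amounts to bucketing by ID and ordering each bucket by timestamp. The sketch below shows that step on plain dictionaries; the function name `build_lifelines` and the keys `ts`, `id`, and `event` are assumptions for illustration and are not taken from the NetLogger tools.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Event = Dict[str, object]


def build_lifelines(events: Iterable[Event], id_key: str = "id") -> Dict[object, List[Event]]:
    """Group events that share an ID, then order each group by timestamp."""
    lifelines: Dict[object, List[Event]] = defaultdict(list)
    for event in events:
        lifelines[event[id_key]].append(event)
    for line in lifelines.values():
        line.sort(key=lambda e: e["ts"])
    return dict(lifelines)


# Usage: events from different sources arrive interleaved and out of order.
events = [
    {"ts": 2.0, "id": "block-1", "event": "write.end"},
    {"ts": 1.0, "id": "block-1", "event": "read.start"},
    {"ts": 1.5, "id": "block-2", "event": "read.start"},
]
lifelines = build_lifelines(events)
```

Each resulting list traces one object (here a data block) through the system in time order, which is what a visualization tool would draw as a single line.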


NLV Athena Example


Completed Tasks

• Instrumented several Athena components with NetLogger

• Developed prototype activation service

• Developed prototype interface to the activation service for Athena monitoring events

• Demonstrated at SC02


Current Work

• We are now working on expanding on the components used in the SC02 demo
— Develop a “proof of concept” general purpose Grid troubleshooting architecture in concert with GANGA, Athena, DOE Science Grid

• Tasks include
— Further integration of Atlas Software with Globus (Large ITR work related)
— Further NetLogger instrumentation of Globus, GANGA, and Athena
— Redesign of activation service for increased performance
— Integration with Karlo Berket’s scalable and secure peer-to-peer resource discovery service
  • will be used to locate producers


For More Information

• NetLogger: http://www-didc.lbl.gov/NetLogger/

• SC02 Demo: http://annwm.lbl.gov/henp/meet/sc02_nov02/

• Athena: http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/architecture/General/index.html

• Email: [email protected], [email protected]


Extra Slides if you want more details


Monitoring Components

[Figure: monitoring components. Labels, in extraction order: Distributed Error and Logging Event; Directory Service; Grid Service A; Grid Service B; Grid Service C; Error and Logging Event Service (two instances, annotated "several of these services on the Grid, e.g., 1 per site"); Grid User or Developer.]

Grid User or Developer:

1) find all Error and Logging Event Services with information about my job

2) request or subscribe to events related to my job
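The two user steps can be sketched against a toy model of per-site services. Everything here is hypothetical: `EventService`, `find_services`, and `collect_events` are invented for illustration, the directory is modeled as a plain list, and only the "request" form of step 2 is shown (a subscription would deliver events as they arrive instead of returning stored ones).

```python
from typing import Dict, List

Event = Dict[str, str]


class EventService:
    """Hypothetical per-site error and logging event service."""

    def __init__(self, site: str) -> None:
        self.site = site
        self._events: List[Event] = []

    def record(self, job_id: str, message: str) -> None:
        self._events.append({"site": self.site, "job": job_id, "message": message})

    def has_job(self, job_id: str) -> bool:
        return any(e["job"] == job_id for e in self._events)

    def events_for(self, job_id: str) -> List[Event]:
        return [e for e in self._events if e["job"] == job_id]


def find_services(directory: List[EventService], job_id: str) -> List[EventService]:
    """Step 1: find every service holding information about the job."""
    return [service for service in directory if service.has_job(job_id)]


def collect_events(directory: List[EventService], job_id: str) -> List[Event]:
    """Step 2: request the job's events from each service found in step 1."""
    collected: List[Event] = []
    for service in find_services(directory, job_id):
        collected.extend(service.events_for(job_id))
    return collected
```

Separating discovery from retrieval means the user never needs to know in advance which sites a job touched.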


Activation Service


Ganglia Cluster Monitoring