Information Systems describing resources Grid Middleware 4 David Groep, lecture series 2005-2006


Page 1: Information Systems describing resources

Information Systems

describing resources

Grid Middleware 4

David Groep, lecture series 2005-2006

Page 2: Information Systems describing resources

Grid Middleware IV 2

Outline

  • Taxonomy of information systems
      • hierarchies and republishers
      • Grid Monitoring Architecture
      • push and pull, subscriptions

  • Performance of an IS
      • collecting information
      • sensors

IS content: schemas and approaches

Page 3: Information Systems describing resources

Grid Information Systems

  • Concerns data shared between administrative domains
      • for use by multiple people or VOs
  • So it does not include things like
      • cluster temperature monitoring
      • debugging streams
      • accounting history

Page 4: Information Systems describing resources

Classification of information systems

Which types of monitoring system are suitable for the grid?
Paper: http://www.cs.man.ac.uk/~zanikols/fgcs05.pdf

The different types are:

  • Level 0: self-contained; not accessible by programs (only via e.g. the web)
  • Level 1: events are accessible remotely at the single-producer level
  • Level 2: includes republishers with fixed functionality
  • Level 3: supports hierarchies of republishers
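The four levels can be read as a small decision procedure. The sketch below is only an illustration of that reading; the `Capabilities` flags and the function name are hypothetical and do not come from the Zanikolas paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical feature flags for a monitoring system."""
    remote_program_access: bool    # events reachable by programs, not only via the web
    has_republishers: bool         # republishers with fixed functionality
    republisher_hierarchies: bool  # republishers can be stacked in hierarchies


def taxonomy_level(c: Capabilities) -> int:
    """Return the taxonomy level (0 to 3) implied by the capabilities."""
    if not c.remote_program_access:
        return 0
    if not c.has_republishers:
        return 1
    if not c.republisher_hierarchies:
        return 2
    return 3
```

Each level is checked in order, so a system only reaches level 3 if it also satisfies the conditions for levels 1 and 2.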

Page 5: Information Systems describing resources

System taxonomy: levels of systems

[Figure: components used in information systems, and taxonomy levels]

graphics and concept from S. Zanikolas et al., FGCS 21 (2005) 163-188

Page 6: Information Systems describing resources

Information system classes

  • Level 2 or 3 systems are suitable
  • Reference architecture: GMA (Grid Monitoring Architecture)
  • Requirements
      • (performance) information with relatively short lifetime
      • frequent updates
      • (should) carry quality-of-information status as well
  • But: when you get down to it, almost anything fits in this architecture
      • including directories with relatively static information
      • suitable mainly for resource state

Page 7: Information Systems describing resources

Grid Monitoring Architecture

Definition of terms and roles (GWD-GP-16-2)

Functions:

  • Registry (directory): Add, Update, Remove, Search
  • Producer: Maintain Registration, Accept Query, Accept (Un)subscribe, Locate Consumer, Notify, Initiate (Un)subscribe
  • Consumer: Locate Producer, Initiate Query, Initiate (Un)subscribe, Maintain Registration, Accept Notification, Accept (Un)subscribe, Locate Event Schema
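The three roles can be sketched in a few lines. This is a minimal in-process illustration under stated assumptions, not the GMA specification: all class and method names are hypothetical, there is no network layer, and only a subset of the listed functions is modelled.

```python
class Registry:
    """Directory of producers, keyed by event type (Add, Remove, Search)."""

    def __init__(self):
        self._producers = {}

    def add(self, event_type, producer):
        self._producers.setdefault(event_type, []).append(producer)

    def remove(self, event_type, producer):
        self._producers.get(event_type, []).remove(producer)

    def search(self, event_type):
        return list(self._producers.get(event_type, []))


class Producer:
    def __init__(self, event_type, registry):
        self.event_type = event_type
        self._latest = None
        self._subscribers = []
        registry.add(event_type, self)       # maintain registration

    def accept_query(self):
        return self._latest                  # answer a one-off query

    def accept_subscribe(self, callback):
        self._subscribers.append(callback)

    def accept_unsubscribe(self, callback):
        self._subscribers.remove(callback)

    def publish(self, value):
        self._latest = value
        for callback in list(self._subscribers):
            callback(value)                  # notify subscribed consumers


class Consumer:
    def __init__(self, registry):
        self._registry = registry
        self.received = []

    def locate_producers(self, event_type):
        return self._registry.search(event_type)

    def accept_notification(self, value):
        self.received.append(value)
```

A consumer first asks the registry for producers of an event type, then either queries a producer once or subscribes and receives every later value through `accept_notification`.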

Page 8: Information Systems describing resources

GMA: Intermediaries

  • Also referred to as ‘republishers’
  • Make it a level-3 system
  • Examples
      • Latest Producer: returns the ‘last’ value of an event
      • Archiver (history producer): storage of historical monitoring data, e.g. accounting records
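The difference between the two example republishers is what they retain. The sketch below is a hypothetical illustration (the names and the `(source, timestamp, value)` event shape are assumptions): one keeps only the newest value per source, the other keeps everything.

```python
class LatestProducer:
    """Republisher that remembers only the most recent value per event source."""

    def __init__(self):
        self._latest = {}   # source -> (timestamp, value)

    def consume(self, source, timestamp, value):
        current = self._latest.get(source)
        # Ignore events that arrive late and are older than what is stored.
        if current is None or timestamp >= current[0]:
            self._latest[source] = (timestamp, value)

    def query(self, source):
        entry = self._latest.get(source)
        return None if entry is None else entry[1]


class Archiver:
    """Republisher that keeps the full history, e.g. for accounting records."""

    def __init__(self):
        self._history = []  # list of (timestamp, source, value)

    def consume(self, source, timestamp, value):
        self._history.append((timestamp, source, value))

    def query(self, source, since=0):
        # Return (timestamp, value) pairs for one source in time order.
        return [(t, v) for t, s, v in sorted(self._history)
                if s == source and t >= since]
```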

Page 9: Information Systems describing resources


Directories

Information providers ‘publish’ information to a directory

Directories may be linked in networked hierarchies

Information is usually also in a DIT-like structure (Directory Information Tree)

Typical implementation: LDAP

Page 10: Information Systems describing resources

Approaches to sending information

Orthogonal to the topology is the information flow model:

  • Push model
      • information gets published regardless of its use
      • bet it’s there (in higher-level aggregators) when it’s needed
      • e.g. Condor Hawkeye, LCG BDII
  • Hybrid
      • information location gets published
      • consumers can subscribe to information and from then on continuously get it
      • e.g. R-GMA, (MDS4?)
  • Pull model
      • information is retrieved on-demand, and you cannot subscribe
      • e.g. MDS-2
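The practical difference between push and pull is where the cost lands and how stale an answer can be. The following sketch uses hypothetical classes (`Sensor`, `PullIndex`, `PushIndex`) to make that visible by counting how often the source is read; it does not model any of the named systems.

```python
class Sensor:
    """Hypothetical information source that counts how often it is read."""

    def __init__(self):
        self.value = 0
        self.reads = 0

    def read(self):
        self.reads += 1
        return self.value


class PullIndex:
    """Pull model: every query goes to the source on demand."""

    def __init__(self, sensor):
        self._sensor = sensor

    def query(self):
        return self._sensor.read()


class PushIndex:
    """Push model: the source publishes; queries are answered from the stored copy."""

    def __init__(self):
        self._stored = None

    def publish(self, value):
        self._stored = value

    def query(self):
        return self._stored
```

With pull, three queries cost three reads of the source but always return the current value. With push, the source is read once per publication regardless of how many queries follow, and the answer can be stale until the next publication.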

Page 11: Information Systems describing resources

Information Systems

Examples shown in this lecture:

  1. Monitoring and Discovery Service (MDS)
  2. Relational Grid Monitoring Architecture (R-GMA)
  3. Hawkeye
  4. Berkeley Database Information Index (BDII)

Page 12: Information Systems describing resources

1 – MDS2

  • Part of GT2.x
  • Typical use: resource selection by brokers
  • Architecture
      • decentralized
      • hierarchical
      • soft-state protocols with timeouts
      • supports caching in index servers
  • Security: GSI (optional)

Page 13: Information Systems describing resources

MDS2 Architecture

[Figure: MDS2 architecture. Labels recovered from the diagram:]

  • Resource A and Resource B each run a GRIS, fed by several information providers (IP)
  • GRIS register with GIIS
  • GIIS requests info from GRIS services
  • GIIS cache contains info from A and B
  • Client 1 searches the GRIS directly
  • Client 2 uses GIIS for searching collective information

graphic: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 14: Information Systems describing resources

MDS2 information flow

  • Soft-state registration of GRISes with GIISes
      • time out on the registration (TTL and nextUpdate)
  • Data retrieved on-demand from underlying GRIS
      • timeout on the answer
      • resources silently drop out if they fail
  • GRISes collect information using scripts
  • GIISes can be collated in arbitrary hierarchies
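Soft-state registration means an entry survives only as long as it keeps being renewed. A minimal sketch of that idea follows; the class is hypothetical, a single TTL stands in for the TTL/nextUpdate pair, and time is passed in explicitly so the behaviour is easy to follow.

```python
class SoftStateRegistry:
    """Registrations expire unless they are renewed within the TTL."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._expires_at = {}   # name -> expiry time

    def register(self, name, now):
        """(Re-)register; the entry stays valid until now + TTL."""
        self._expires_at[name] = now + self._ttl

    def live(self, now):
        """Drop expired registrations and return the names still registered."""
        self._expires_at = {n: t for n, t in self._expires_at.items() if t > now}
        return sorted(self._expires_at)
```

A GRIS that stops renewing simply disappears from `live()` once its TTL has passed, which is the "silently drop out" behaviour described above.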

Page 15: Information Systems describing resources

2 – R-GMA

  • ‘straight’ implementation of the GMA
  • uses a relational representation of the data
  • notification/subscription directly from the source
  • implementation in Java
  • developed in EU DataGrid and EGEE JRA1 UK cluster, Steve Fisher (RAL), et al.

Page 16: Information Systems describing resources

R-GMA Architecture

Page 17: Information Systems describing resources

MON Box

  • Every site has a MON box to proxy information
      • local cache of info in memory
      • through-channel to systems behind a firewall
      • producers/consumers connect actively to the MON box
  • Multiple producers can publish in the same table
      • joins can be done, but only via a secondary producer
  • Usually deployed with a single registry

Page 18: Information Systems describing resources

R-GMA plain SQL interface

bosui:davidg:1001$ rgma
Welcome to the R-GMA virtual database for Virtual Organisations.
================================================================
Your local R-GMA server is:
https://eg.nikhef.nl:8443/R-GMA
You are connected to the following R-GMA Registry services:
https://lcgic01.gridpp.rl.ac.uk:8443/R-GMA/RegistryServlet
You are connected to the following R-GMA Schema service:
https://lcgic01.gridpp.rl.ac.uk:8443/R-GMA/SchemaServlet
Type "help" for a list of commands.

rgma> show tables
+------------------------------------------+
| Table Name                               |
+------------------------------------------+
| ArchiverTestTable                        |
| ...                                      |
| GlueCE                                   |
| ...                                      |
+------------------------------------------+

Page 19: Information Systems describing resources

Queries

rgma> select UniqueID,Name,TotalCPUs from GlueCE WHERE UniqueID LIKE '%ulakbim%';
+--------------------------------------------------+---------+-----------+
| UniqueID                                         | Name    | TotalCPUs |
+--------------------------------------------------+---------+-----------+
| ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-seegrid | seegrid | 126       |
| ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-trgrida | trgrida | 126       |
| ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-lhcb    | lhcb    | 126       |
...
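The relational view can be tried out locally. The sketch below does not talk to R-GMA; it emulates the `GlueCE` table with Python's standard `sqlite3` module purely to show the query semantics. The three ulakbim rows are copied from the output above, the `example.org` row is a hypothetical non-matching entry, and the column types and `ORDER BY` clause are assumptions.

```python
import sqlite3

# In-memory stand-in for the R-GMA virtual database (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GlueCE (UniqueID TEXT PRIMARY KEY, Name TEXT, TotalCPUs INTEGER)"
)
rows = [
    ("ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-seegrid", "seegrid", 126),
    ("ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-trgrida", "trgrida", 126),
    ("ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-lhcb", "lhcb", 126),
    ("ce.example.org:2119/jobmanager-lcgpbs-demo", "demo", 8),  # hypothetical row
]
conn.executemany("INSERT INTO GlueCE VALUES (?, ?, ?)", rows)

# Same selection as on the slide, with a deterministic ordering added.
result = conn.execute(
    "SELECT UniqueID, Name, TotalCPUs FROM GlueCE "
    "WHERE UniqueID LIKE '%ulakbim%' ORDER BY Name"
).fetchall()

for unique_id, name, total_cpus in result:
    print(unique_id, name, total_cpus)
```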

Page 20: Information Systems describing resources

3 – Hawkeye

  • Condor information system
  • publishes class-ads for
      • matchmaking
      • fault detection
  • periodic updates to the agents by the modules
  • information kept in the agents

Page 21: Information Systems describing resources


Hawkeye architecture

Manager

Agent Agent Agent

Module Module Module Module Module Module

graphic: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 22: Information Systems describing resources

4 – BDII & GIP

  • BDII
      • conceptually similar to Hawkeye, but data is pulled rather than pushed
      • mentioned here because of its widespread deployment in EGEE/LCG, OSG, &c
  • Generic Information Providers (GIP)
      • scripting framework to produce LDIF
      • static values overridden by output from scripts
  • periodically, LDAP queries sent to subordinate directories
      • with time-out on the answer
      • previous answer is persistent for a defined amount of time
  • contrary to MDS2, BDII will never forget

Paper: http://indico.cern.ch/materialDisplay.py?contribId=126&sessionId=23&materialId=paper&confId=0
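The "previous answer is persistent" behaviour can be sketched as a last-known-good cache: a timed-out refresh leaves the old data in place instead of dropping the site. This is a hypothetical illustration, not BDII code; the class name, the `fetch` callable and the use of `TimeoutError` are assumptions.

```python
class LastKnownGoodCache:
    """Keep the previous answer when a refresh times out."""

    def __init__(self, max_age_seconds):
        self._max_age = max_age_seconds
        self._entries = {}   # site -> (data, time of last successful fetch)

    def refresh(self, site, fetch, now):
        """Call fetch(); on timeout keep the old entry. Returns True on success."""
        try:
            data = fetch()
        except TimeoutError:
            return False     # previous answer stays in place
        self._entries[site] = (data, now)
        return True

    def get(self, site, now):
        """Return cached data unless it is older than the configured maximum age."""
        entry = self._entries.get(site)
        if entry is None:
            return None
        data, fetched_at = entry
        return data if now - fetched_at <= self._max_age else None
```

Compared with a soft-state registry, a single missed update does not make the site vanish; it only disappears after `max_age_seconds` without any successful refresh.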

Page 23: Information Systems describing resources


BDII organisation

Page 24: Information Systems describing resources

BDII scaling

  • OpenLDAP update (write) is not optimized
      • with SleepyCat Berkeley DB, simultaneous reads and writes lead to timeouts
  • So, put in a forwarder service that redirects to a pool of OpenLDAP/DB backends that swap roles
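The role swap is essentially double buffering: readers are served from one backend while the other is rewritten, and then the two trade places. The sketch below assumes a pool of exactly two backends represented as dictionaries; the class and method names are hypothetical.

```python
class Forwarder:
    """Route queries to the active backend; write updates to the standby, then swap."""

    def __init__(self):
        self._backends = [{}, {}]   # stand-ins for two directory backends
        self._active = 0            # index of the backend that serves reads

    def query(self, key):
        return self._backends[self._active].get(key)

    def update(self, snapshot):
        standby = 1 - self._active
        # Rewrite the standby completely while reads continue on the active one.
        self._backends[standby] = dict(snapshot)
        # Swap roles: the freshly written backend now serves reads.
        self._active = standby
```

Reads never hit a backend that is being written, which is the property the slide is after.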

Page 25: Information Systems describing resources

WS style information systems

  • MDS4
      • based on WS-RF, WS-Notification mechanisms
      • provides a common aggregator framework for
          • index service (republisher)
          • trigger service (send events, mails, execute programs)
          • archive service
  • NAREGI Distributed Information Service
      • Aggregator collects information from various sources
      • puts these as CIM objects in a database
      • OGSA-DAI front-end to the database with CIM objects

PS: OGSA-DAI (Data Access & Integration) is a system for providing uniform grid access to database resources

Page 26: Information Systems describing resources


MDS4 Aggregator Framework

Page 27: Information Systems describing resources

NAREGI Distributed Information Service

graphic: Satoshi Matuoka, Tokyo Institute of Technology & NII, NAREGI

Page 28: Information Systems describing resources

Status

  • Both developed and available
  • Neither has been tested yet at the very large scale
      • i.e. O(1000) resources, thousands of simultaneous queries

Page 29: Information Systems describing resources

Hierarchies and Views

Page 30: Information Systems describing resources

Views on the information system

  • For resource information
      • information view on those resources to which the viewer potentially has access
      • a single global root is neither feasible nor needed
      • a per-VO or per-infrastructure view is sufficient
  • For ‘application level’ monitoring
      • fine-grained access control needed at the VO or user level
      • attributes in the schema may have different privacy levels
      • requires view management like in regular databases

Page 31: Information Systems describing resources

Typical hierarchical top levels today

  • per-infrastructure
      • e.g. EGEE/LCG, OSG, NAREGI
      • used by many VOs
      • needs support at the infrastructure level
  • per-VO view
      • prevalent in ‘grass-roots’ deployment
  • all systems can support both, although not all in the same way:
      • R-GMA works with per-site MON boxes that (today) use a central registry -> one per infrastructure

Page 32: Information Systems describing resources

Performance

an example of a grid performance study

Page 33: Information Systems describing resources

Performance analysis

  • Best paper so far: X. Zhang, J. Freschl, J. Schopf, A performance study of monitoring and information services for distributed systems, in: Proceedings of the 12th IEEE High Performance Distributed Computing (HPDC-12 2003), IEEE Computer Society Press, Seattle, WA, USA, 2003, pp. 270–282.
  • Perf results on R-GMA are outdated, but basics still do hold
  • MDS2 has since been replaced with MDS4 (in GT4)
  • The three systems selected are indicative of the different classes, and thus it’s a very valuable comparison!

Data in the next slides by Jennifer Schopf from the GridForum NL/ISOC NL Masterclass 2005

Page 34: Information Systems describing resources

Roles of components in the comparison

Role                  | MDS2                 | R-GMA                   | Hawkeye
----------------------+----------------------+-------------------------+--------
InfoCollector         | Information Provider | Producer                | Module
Info Server           | GRIS                 | Producer Servlet        | Agent
Aggregate Info Server | GIIS                 | Combo Producer-Consumer | Manager
Directory Server      | GIIS                 | Registry                | Manager

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 35: Information Systems describing resources

Performance analysis

  • Three ‘characteristic’ systems
      • MDS2 (pull system, with and without caching)
      • R-GMA (hybrid, straight GMA implementation w/Relational IF)
      • Hawkeye (push system, from Condor)
  • Tests done on a small test bed (~7 systems)
      • scaling has not been tested
      • but results are at least comparable

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 36: Information Systems describing resources

Performance analysis: other facts

  • Keep in mind that MDS2 & Hawkeye are programmed in C
  • R-GMA is in Java
  • This R-GMA version relied heavily on threads
      • i.e. implementation was a straight translation of the architecture
      • JVM and Linux kernel 2.4 don’t like too many, O(500), threads…

Page 37: Information Systems describing resources

Model for evaluation

  • paper attempts to compare similar properties in the three systems
  • deploy in a standard mode (as depicted)

[Figure: evaluation model. Components: Information Collector, Information Server, Aggregate Information Server, Directory Server, Client. Arrow types: "Registration & Data" and "Client Query".]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 38: Information Systems describing resources

Experiments in Zhang et al.

  1. How many users can query an information server at a time?
  2. How many users can query a directory server?
  3. How does an information server scale with the amount of data in it?
  4. How does an aggregator scale with the number of information servers registered to it?

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 39: Information Systems describing resources

Experiments

[Figure: the evaluation model (Information Collector, Information Server, Aggregate Information Server, Directory Server, Client; arrow types "Registration & Data" and "Client Query"), annotated with the experiment numbers 1 to 4.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 40: Information Systems describing resources

Comparing Information Systems

  • We also looked at the queries in depth, using NetLogger
  • 3 phases: Connect, Process, Response

[Figure: query timeline showing the Connect, Process and Response phases]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 41: Information Systems describing resources

Testbed

  • Lucky cluster at Argonne
      • 7 nodes, each has two 1133 MHz Intel PIII CPUs (with a 512 KB cache) and 512 MB main memory
  • Users simulated at the UC nodes
      • 20 P3 Linux nodes, mostly 1.1 GHz
      • R-GMA has an issue with the shared file system, so we also simulated users on Lucky nodes
  • All figures are 10 minute averages
  • Queries happening with a one second wait between each query (think synchronous send with a 1 second wait)

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 42: Information Systems describing resources

Metrics

  • Throughput
      • number of requests processed per second
  • Response time
      • average amount of time (in sec) to handle a request
  • Load
      • percentage of CPU cycles spent in user mode and system mode, recorded by Ganglia
      • high when running a small number of compute-intensive apps
  • Load1
      • average number of processes in the ready queue waiting to run, 1 minute average, from Ganglia
      • high when a large number of apps are blocking on I/O
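The first two metrics are simple ratios, sketched below with hypothetical helper names. The third function is an added back-of-envelope model, not something taken from the study: it assumes each simulated user sends a query, waits for the answer, pauses for the fixed wait described for the testbed, and repeats, which bounds the request rate the users can generate.

```python
def throughput(requests_completed, window_seconds):
    """Requests processed per second over a measurement window."""
    return requests_completed / window_seconds


def mean_response_time(durations_seconds):
    """Average time (in seconds) to handle a request."""
    return sum(durations_seconds) / len(durations_seconds)


def closed_loop_rate(users, wait_seconds, response_seconds):
    """Upper bound on queries/sec when every user cycles: query, answer, wait."""
    return users / (wait_seconds + response_seconds)
```

For example, 1200 requests completed in a 10 minute window give a throughput of 2 requests per second, and 100 users with a 1 second wait and a 0.25 second response time cannot generate more than 80 queries per second.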

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 43: Information Systems describing resources

Information Server Throughput vs. Number of Users

[Figure: throughput (axis range 0 to 200) against No. of Users (1, 10, 50, 100, 200, 300, 400, 500, 600). Series: MDS2.4 GRIS (cache); MDS2.4 GRIS (no cache); R-GMA 3.4.6 LatestProducerServlet; Hawkeye 1.0 Agent. Larger number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 44: Information Systems describing resources

Query Times

[Figure: two panels, 50 users and 400 users. Time (sec), axis from 0.001 to 1000, for the Connection Phase, Processing Phase and Response Transmission Phase. Series: MDS2 GRIS (caching); MDS2 GRIS (no caching); R-GMA ProducerServlet; Hawkeye Agent. Smaller number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 45: Information Systems describing resources

Experiment 1 Summary

  • Caching can significantly improve performance of the information server
      • particularly desirable if one wishes the server to scale well with an increasing number of users
  • When setting up an information server, care should be taken to make sure the server is on a well-connected machine
      • network behavior plays a larger role than expected
      • if this is not an option, thought should be given to duplicating the server if more than 200 users are expected to query it

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 46: Information Systems describing resources

Directory Server Throughput

[Figure: throughput (queries/sec, axis range 0 to 160) against No. of Users (1, 10, 50, 100, 200, 300, 400, 500, 600). Series: MDS2.4 GIIS (cache); R-GMA 3.4.6 Registry; Hawkeye 1.0 Manager. Larger number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 47: Information Systems describing resources

Directory Server CPU Load

[Figure: CPU load (%, axis range 0 to 70) against No. of Users (1, 10, 50, 100, 200, 300, 400, 500, 600). Series: MDS2.4 GIIS (cache); R-GMA 3.4.6 Registry; Hawkeye 1.0 Manager. Smaller number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 48: Information Systems describing resources

Query Times

[Figure: two panels, 50 users and 400 users. Time (sec), axis from 0.001 to 100, for the Connection Phase, Processing Phase and Response Transmission Phase. Series: MDS2 GIIS (caching); R-GMA Registry; Hawkeye Manager. Smaller number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 49: Information Systems describing resources

Experiment 2 Summary

  • Because of the network contention issues, the placement of a directory server on a highly connected machine will play a large role in the scalability as the number of users grows
  • Significant loads are seen even with only a few users, so it will be important that this service be run on a dedicated machine, or that it be duplicated as the number of users grows

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 50: Information Systems describing resources

Information Server Scalability with Information Collectors

[Figure: throughput (queries/sec, axis range 0 to 10) against Number of Information Collectors (10 to 90 in steps of 10). Series: MDS2.4 GRIS (cache); MDS2.4 GRIS (no cache); R-GMA 3.4.6 LatestProducerServlet; Hawkeye 1.0 Agent. Larger number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 51: Information Systems describing resources

Experiment 3 Load Measurements

[Figure: CPU load (%, axis range 0 to 90) against Number of Information Collectors (10 to 90 in steps of 10). Series: MDS2.4 GRIS (cache); MDS2.4 GRIS (no cache); R-GMA 3.4.6 LatestProducerServlet; Hawkeye 1.0 Agent. Smaller number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 52: Information Systems describing resources

Experiment 3 Query Times

[Figure: two panels, 30 Info Collectors and 80 Info Collectors. Time (sec), axis from 0.001 to 100, for the Connection Phase, Processing Phase and Response Transmission Phase. Series: MDS2 GRIS (caching); MDS2 GRIS (no caching); R-GMA ProducerServlet; Hawkeye Agent. Smaller number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 53: Information Systems describing resources

Sample Query

Note: log scale

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 54: Information Systems describing resources


Experiment 3 Summary

The more the data is cached, the less often it has to be fetched, thereby increasing throughput

Search time isn’t significant at these sizes

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 55: Information Systems describing resources

Aggregate Information Server Scalability

[Figure: throughput (queries/sec, axis range 0 to 10) against No. of Information Servers (1, 10, 50, 100, 200, 300, 400, 500, 600). Series: MDS2.4 GIIS (query all); MDS2.4 GIIS (query part); R-GMA 3.4.6 ProducerConsumer; Hawkeye 1.0 Manager. Larger number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 56: Information Systems describing resources

Load

[Figure: CPU load (%, axis range 0 to 60) against No. of Information Servers (1, 10, 50, 100, 200, 300, 400, 500, 600). Series: MDS2.4 GIIS (query all); MDS2.4 GIIS (query part); R-GMA 3.4.6 ProducerConsumer; Hawkeye 1.0 Manager.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 57: Information Systems describing resources

Query Response Times

[Figure: two panels, 50 Info Servers and 400 Info Servers. Time (sec), axis from 0.001 to 10, for the Connection Phase, Processing Phase and Response Transmission Phase. Series: MDS2 GIIS (all); MDS2 GIIS (portion); R-GMA ProducerConsumer; Hawkeye Manager. Smaller number is better.]

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 58: Information Systems describing resources

Experiment 4 Summary

  • None of the Aggregate Information Servers scaled well with the number of Information Servers registered to them
  • When building hierarchies of aggregation, they will need to be rather narrow and deep, with very few Information Servers registered to any one Aggregate Information Server
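How deep "narrow and deep" gets follows from the fan-out limit. The helper below is a hypothetical sketch that assumes a uniform tree in which every aggregator accepts at most `max_fan_out` registrations; it returns the number of aggregation levels needed to cover a given number of information servers.

```python
def hierarchy_depth(n_info_servers, max_fan_out):
    """Smallest number of aggregation levels d with max_fan_out ** d >= n_info_servers."""
    if n_info_servers < 1 or max_fan_out < 2:
        raise ValueError("need at least one server and a fan-out of two or more")
    depth, capacity = 1, max_fan_out
    # Integer arithmetic avoids floating-point surprises from math.log.
    while capacity < n_info_servers:
        capacity *= max_fan_out
        depth += 1
    return depth
```

With at most 10 registrations per aggregator, 600 information servers need three levels of aggregation, whereas a single aggregator would have to accept all 600 registrations itself.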

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 59: Information Systems describing resources

Overall Results

  • Performance can be a matter of deployment
      • effect of background load
      • effect of network bandwidth
  • Performance can be affected by underlying infrastructure
      • LDAP/Java strengths and weaknesses
  • Performance can be improved using standard techniques
      • caching; multi-threading; etc.

ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

Page 60: Information Systems describing resources

Observations on the performance study

  • Measures performance, not stability
      • test bed size is only 7 machines and 10 clients
      • local cluster, i.e. latency is well controlled
  • In a real-life deployment, complexity is the determining factor in success
      • simple systems are more likely to ‘survive’
      • systems with soft-state registration & timeouts (like MDS) are more prone to instabilities than systems based on a persistent ‘elephant-style’ memory (like BDII) (cf. typical signal processing issues)

Page 61: Information Systems describing resources

Assorted Issues

Page 62: Information Systems describing resources

Access Control

  • AuthN is simple
      • keeps out 99% of the rogue information
      • doesn’t do a bit for privacy preservation
      • not every cert owner is a grid user for a specific infrastructure
  • coarse-grained ACLs are better
      • grid-mapfile, ACLs on access to service
      • keeps known bad guys out
      • still no privacy
  • fine-grained ACLs
      • support within the DB engine is actually required, as it’s too hard to retro-fit otherwise

Page 63: Information Systems describing resources

Timeout issues

  • differences in timeouts in information providers lead to
      • ‘phase difference’ effects in the system
      • temporary amnesia of aggregate information indices
      • cumulative delays
  • Timeouts
      • registration period for GRIS with a GIIS
      • time to bind to GRIS (defines if a resource is up or down)
      • time to produce information entries
      • cache TTL in the GIIS
      • timeout before removing stale information from GIIS
  • essentially it’s feed-back signal theory ;-)
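Two of these effects can be put into numbers with a deliberately simplified model. Both helpers below are hypothetical sketches under stated assumptions: the first assumes every level answers from a cache that is just about to expire, the second assumes a resource re-registers exactly once per registration period.

```python
def worst_case_age(cache_ttls_seconds):
    """Upper bound on data age seen by a client: the cache TTLs along the path add up."""
    return sum(cache_ttls_seconds)


def forgets_between_registrations(registration_period, removal_timeout):
    """True if an entry is removed as stale before the next re-registration arrives."""
    return registration_period > removal_timeout
```

A chain of caches with TTLs of 30, 60 and 300 seconds can serve data up to 390 seconds old (cumulative delay), and a resource that re-registers every 600 seconds with an index that removes entries after 300 seconds disappears for part of every cycle (temporary amnesia).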

Page 64: Information Systems describing resources

Content of the Information System

Page 65: Information Systems describing resources

Approaches to resource information

  • Resource description
      • GLUE
      • CIM
      • and similar but slightly different schemas for ARC and GT2
  • Job description
      • Unicore’s AJO

Page 66: Information Systems describing resources

Information Schemas: GLUE

  • Describes resource availability information
  • Common for various middleware suites
  • Known limitations
      • not even all specified info is actually used
      • contains lots of info that is unused
      • cannot express information needed for brokering at the appropriate granularity level (this is fundamental for all such information schemas)
  • More specifics discussed with each component

See http://infnforge.cnaf.infn.it/glueinfomodel/

Page 67: Information Systems describing resources

Glue Abstractions

  • Core Entities
      • Site: name, contact info, latitude/longitude, sponsor
      • Service: type, version, endpoint, status, WSDL URL, Semantics URN, StartTime
  • Cluster
      • ComputingElement: Info, State, Policy, ACBRule, …
      • VOView: ACBRule, Running, Waiting, Free, ERT, WRT, …
      • SubCluster: HostOperatingSystem*, HostAppSWRTEnv, …
  • StorageElement
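To show how a broker might use the VOView attributes, the sketch below picks the computing element with the lowest estimated response time (ERT) among the views a VO may use. It is an illustration only: the `VOView` dataclass is a hypothetical subset of the schema, the `VO:<name>` rule format is assumed, and the sample data in the usage note is invented.

```python
from dataclasses import dataclass


@dataclass
class VOView:
    """Hypothetical subset of the VOView attributes listed above."""
    ce_id: str
    acb_rule: str   # access control base rule, assumed to look like "VO:<name>"
    running: int
    waiting: int
    free: int
    ert: int        # estimated response time, in seconds


def best_ce(views, vo):
    """Return the ce_id with the lowest ERT among views open to the VO, or None."""
    allowed = [v for v in views if v.acb_rule == f"VO:{vo}"]
    if not allowed:
        return None
    return min(allowed, key=lambda v: v.ert).ce_id
```

Given one view for `VO:atlas` with an ERT of 600 seconds and another with 120 seconds, `best_ce(views, "atlas")` returns the second; a VO without any matching view gets `None`.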

Page 68: Information Systems describing resources


GLUE Core Schema

Page 69: Information Systems describing resources


GLUE Cluster

Page 70: Information Systems describing resources


GLUE Storage

Page 71: Information Systems describing resources

GLUE Linking compute and storage

  • Useful if storage is accessible via POSIX, or via faster networks
  • position of such a binding is difficult
  • abused for pure-SE info, as this is the only place where the file path to the storage was specified…

Page 72: Information Systems describing resources

Alternative schemas with the same viewpoint

  • Original GT2 schema (obsolete)
  • NorduGrid ARC

Page 73: Information Systems describing resources

CIM Common Information Model

  • object oriented abstraction of information (DMTF)
      • uses abstractions, dependencies, inheritance
  • goes beyond a mere information model by defining methods for standard object behaviour
      • trying to solve every possible problem (and solve the perpetuum mobile issue in the process …)
  • information components of CIM can be used to represent resources

Page 74: Information Systems describing resources

Common Information Model (CIM)

  • Object-oriented schema developed by the DMTF
      • representation in different formats (such as XML)
      • See http://www.dmtf.org/standards/cim/
  • Extended for grid elements by the GCS-WG
      • BatchService, &c
  • The NAREGI grid is the main user of this system

Page 75: Information Systems describing resources

Example: CIM Job Submission Interface

[Figure: CIM UML class diagram "Batch Jobs, Submission, and Processing". The text recoverable from the diagram is listed below.]

Classes and properties:

  • Process: CreationClassName : string {key}; Handle : string {key}; Priority : uint32; ExecutionState : uint16; OtherExecutionDescription : string; CreationDate : datetime; TerminationDate : datetime; KernelModeTime : uint64; UserModeTime : uint64; WorkingSetSize : uint64
  • JobDestination: CreationClassName : string {key}; Name : string {override, key}
  • Job: JobStatus : string; TimeSubmitted : datetime; ScheduledStartTime : datetime; StartTime : datetime; ElapsedTime : datetime; UntilTime : datetime; Notify : string; Owner : string; Priority : uint32; PercentComplete {units}; DeleteOnCompletion : boolean; ErrorCode : uint16; ErrorDescription : string; method KillJob ([IN] DeleteOnKill : boolean) : uint32 {enum}
  • ScheduledJob: ScheduledStartTime : datetime; ReoccuringElapsedTime : datetime; UntilTime : datetime; Notify : string; Owner : string; Priority : uint32; DeleteOnCompletion : boolean
  • ConcreteJob: InstanceID : string {key}; Name : string {override, req'd}
  • BatchSAP: BatchProtocol : uint16[ ] {enum}; BatchProtocolInfo : string[ ]
  • BatchService: <insert properties for a batch service>
  • BatchJob: JobID : string {key}; SchedulingInformation : string; MaxCPUTime : uint32 {units}; CPUTimeUsed : uint32 {units}; BatchJobStatus : uint16 {enum}; TimeCompleted : datetime; JobOrigination : string
  • BatchQueue: QueueEnabled : boolean; QueueAccepting : boolean; NumberOnQueue : uint32; QueueStatus : uint16 {enum}; QueueStatusInfo : string; DefaultJobPriority : uint32; JobPriorityHigh : uint32; JobPriorityLow : uint32; MaxJobWallTime : uint32 {units}; MaxJobCPUTime : uint32 {units}; MaxTotalJobs : uint32; MaxRunningJobs : uint32; RunningJobs : uint32; WaitingJobs : uint32
  • RecurringBatchJob: <any addtl properties needed?>
  • BatchSchedulingData

Associations and other labels shown: OSProcess, HostedJobDestination, JobDestinationJobs, ServiceProcess, OwningJobElement, AffectedJobElement, ProcessOfJob, OwningBatchJobQueue, QueueForBatchService, QueueForwardsToBatchSAP, ProcessExecutable, OSServicingJob, OSServicingQueue

Classes referenced from other models:

  • Core Model: EnabledLogicalElement, System, LogicalElement, ManagedSystemElement, ManagedElement, SettingData, ScopedSettingData, Service, ServiceAccessPoint
  • System Model (Operating System): OperatingSystem
  • System Model (File System): Logical File, DataFile

Legend: Association; Association with WEAK reference (w); Inheritance; Aggregation; Aggregation with WEAK reference (w); Composition Aggregation; * is equivalent to 0 .. n

Page 76: Information Systems describing resources

The Unicore information model

  • Describes the resource requests (so opposite viewpoint compared to GLUE)
  • the resources themselves need not be described, since they will ‘bid’ on the job requests
  • we will deal with this one in the Brokering & CE lecture

Page 77: Information Systems describing resources

Summary

  • Information systems
      • used across multiple organisations and by multiple people or VOs
      • taxonomy classification: (republishing; data flow)
  • Any grid information system needs
      • programmatic access via producer/consumer APIs
      • compositional IS freedom (VO or infrastructure hierarchies)
  • focus has been on resource selection
      • used for brokering decisions, either by people or programs
      • needs a common information schema or translators
  • for application-level information systems
      • user-defined schema and a schema registry (like R-GMA)