The EU DataGrid Architecture
The European DataGrid Project Team
http://www.eu-datagrid.org
The EDG Architecture Tutorial - n° 2
Contents
Middleware architecture overview
EDG structure
Job scheduling
Fabric management
Data Management
Monitoring
Storage
Networking
Summary
EDG middleware architecture: the Globus hourglass
Current EDG architectural functional blocks: the Basic Services (authentication, authorization, Replica Catalog, secure file transfer, Info Providers) rely on Globus 2.0 (GSI, GRIS/GIIS, GRAM, MDS).
[Figure: the Globus hourglass. Layers from top to bottom: specific application layer (ALICE, ATLAS, CMS, LHCb, other apps); LHC VO common application layer (other apps); high-level GRID middleware; Basic Services; GLOBUS 2.0; OS & Net services. A bracket labels the GRID middleware.]
DataGrid Architecture
[Figure: DataGrid architecture layers.
Local Computing: Local Application, Local Database
Grid: Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping
Grid: Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
Grid: Underlying Grid Services: Computing Element Services, Authorization Authentication & Accounting, Replica Catalog, Storage Element Services, Database Services
Fabric: Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management
Also shown: Logging & Bookkeeping.]
EDG middleware architecture: EDG interfaces
[Figure: the DataGrid architecture layers surrounded by the external entities they interface with: Scientists, Application Developers, System Managers, Certificate Authorities, User Accounts, Computing Elements, Storage Elements, Batch Systems, Operating System, File Systems, and Mass Storage Systems (HPSS, Castor).]
EDG middleware architecture: The Workload Management System (WP1)
WP1 is responsible for the Workload Management System (WMS).
The WMS is currently composed of the following parts:
- User Interface (UI): the user's access point to the GRID (using JDL)
- Resource Broker (RB): the broker of GRID resources, performing matchmaking
- Job Submission System (JSS): Condor-G; interfaces to batch systems
- Information Index (II): an LDAP server used as a filter to select resources
- Logging and Bookkeeping services (LB): MySQL databases that store job information
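The Resource Broker's matchmaking can be pictured as two steps: keep only the resources that satisfy the job's requirements, then pick the best one by a rank. This is a minimal sketch, not the RB's implementation: the real broker evaluates JDL expressions against information from the Information Index, whereas here requirements and rank are plain Python callables, and the resource attributes (`os`, `free_cpus`, `max_cpu_time`) and host names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A resource description, standing in for what the Information Index would publish.
Resource = Dict[str, object]


def match(resources: List[Resource],
          requirements: Callable[[Resource], bool],
          rank: Callable[[Resource], float]) -> Optional[Resource]:
    """Keep the resources that satisfy the job's requirements, then return the best-ranked one."""
    candidates = [r for r in resources if requirements(r)]
    if not candidates:
        return None  # no resource can run the job
    return max(candidates, key=rank)


# Hypothetical Computing Elements; attribute names and values are made up.
computing_elements: List[Resource] = [
    {"id": "ce01.example.org", "os": "RH 6.2", "free_cpus": 4, "max_cpu_time": 3600},
    {"id": "ce02.example.org", "os": "RH 6.2", "free_cpus": 12, "max_cpu_time": 1800},
    {"id": "ce03.example.org", "os": "RH 7.3", "free_cpus": 30, "max_cpu_time": 7200},
]

# The job wants RH 6.2 and at least 1800 s of CPU time; prefer the CE with most free CPUs.
best = match(computing_elements,
             requirements=lambda r: r["os"] == "RH 6.2" and r["max_cpu_time"] >= 1800,
             rank=lambda r: r["free_cpus"])
print(best["id"] if best else "no match")
```

Here ce03 is excluded by the OS requirement, and ce02 beats ce01 on free CPUs. Separating the hard filter from the soft preference is what lets a broker report "no match" distinctly from "poor match".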
WP1: Workload Management
Components:
- Job Description Language
- Resource Broker
- Job Submission Service
- Information Index
- User Interface
- Logging & Bookkeeping Service
Implementation:
- UI: Python (LB client: C++)
- RB: C++
- JSS: C++, Python
- II: LDAP server
- LB: MySQL, C++
- Input/Output Sandboxes: GridFTP
WMS main interfaces:
- Globus Gatekeeper
- WP2 Replica Catalog APIs
- WP3 Information Systems
- WP7 network monitoring info providers
- End user (using JDL files on the UI)
File Management
[Figure: Site A with Storage Element A holding File A, File B, File X and File Y; Site B with Storage Element B holding File A, File B, File C and File D; the sites are linked by file transfer.]
- Replica Catalog: maps logical files to site files
- File Transfer
- Replica Manager: 'atomic' replication operation; single client interface; orchestrator
- Pre- and post-processing: prepare files for transfer; validate files after transfer
- Replica Selection: get the 'best' file
- Replication Automation: data source subscription
- Load balancing: replicate based on usage
- Metadata: LFN metadata; transaction information; access patterns
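Replica selection, picking the 'best' copy of a logical file, can be illustrated with a simple cost model. This is a hypothetical sketch, not OptorSim's or the Replica Manager's algorithm: it assumes a catalogue mapping logical file names to the storage elements holding a copy, plus made-up per-site bandwidth and latency estimates, and chooses the replica with the lowest estimated transfer time (latency plus size over bandwidth).

```python
from typing import Dict, List

# Hypothetical replica catalogue: logical file name -> storage elements holding a copy.
replica_catalog: Dict[str, List[str]] = {
    "lfn:fileA": ["se.siteA.example.org", "se.siteB.example.org"],
    "lfn:fileC": ["se.siteB.example.org"],
}

# Hypothetical network estimates as seen from the client's site.
bandwidth_mb_s: Dict[str, float] = {"se.siteA.example.org": 10.0, "se.siteB.example.org": 40.0}
latency_s: Dict[str, float] = {"se.siteA.example.org": 0.05, "se.siteB.example.org": 0.30}


def transfer_cost(se: str, size_mb: float) -> float:
    """Estimated seconds to fetch size_mb from se: latency plus size over bandwidth."""
    return latency_s[se] + size_mb / bandwidth_mb_s[se]


def select_replica(lfn: str, size_mb: float) -> str:
    """Return the storage element with the lowest estimated transfer cost for lfn."""
    replicas = replica_catalog.get(lfn)
    if not replicas:
        raise KeyError(f"no replica registered for {lfn}")
    return min(replicas, key=lambda se: transfer_cost(se, size_mb))


print(select_replica("lfn:fileA", 500.0))  # large file: bandwidth dominates
print(select_replica("lfn:fileA", 0.1))    # tiny file: latency dominates
```

With these made-up figures the large file comes from the high-bandwidth site and the tiny one from the low-latency site; an optimiser in production would feed in live network monitoring data instead of constants.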
Current State
- File Transfer: GridFTP (deployed). Close collaboration with Globus and NetLogger (Brian Tierney and John Bresnahan)
- Replication: GDMP (deployed). A wrapper around the Globus Replica Catalog that provides all functionality in one integrated package; uses Globus 2 and uses GridFTP for transferring files
- Replication: edg-replica-manager (deployed)
- Replication: Replica Location Service Giggle (in testing), a distributed Replica Catalog
- Replication: Replica Manager Reptor (in testing)
- Optimization: Replica Selection OptorSim (in simulation)
- Metadata Storage: SQL Database Service Spitfire (deployed). Servlets on HTTP(S) with XML (XSQL); GSI-enabled access plus extensions
- GSI interface to CASTOR (delivered)
WP2: Data Management
Deployed Components:
- GridFTP
- Replica Manager: edg-replica-manager
- Replica Catalog: globus-replica-catalog
- GDMP
- Spitfire
Implementation:
- RM: C++ classes (under development)
- RC: Globus Replica Catalog wrapper
- GDMP: C++
- Spitfire: Java, Web Services
WP2 main interfaces:
- GRID Storage Element
- WP1 Resource Broker APIs
- WP3 GRID Info services
- WP7 network monitoring info providers
- End user (using GDMP)
Example of Data Management by LHCb
Copy data file to storage element:
    globus-url-copy file:///${chemin}/L69999 gsiftp://lxshare0219.cern.ch/flatfiles/SE1/lhcb/L69999
Register stored data in the catalog:
    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf; /opt/edg/bin/gdmp_register_local_file -d /flatfiles/SE1/lhcb"
Publish catalog:
    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf; /opt/edg/bin/gdmp_publish_catalogue -n"
Copy output to MSS:
    rfcp L1600061 /castor/cern.ch/lhcb/mc/L1600061
The Replica Manager APIs
[Figure: a Replica Manager, Replica Optimiser, SE and CE at each of two sites, a Replica Catalogue, and a Client; arrows distinguish communication from physical file transfer.]
The Replica Manager APIs
RM.copy(PhysicalFileName source, PhysicalFileName destination, String protocol): Status
- allows for third-party transfer
- transfers between two Storage Elements, or between a Computing Element and a Storage Element
- space management policies are under development
The Replica Manager APIs
RM.addPhysicalFileName(LogicalFileName lfn, PhysicalFileName pfn)
RM.deletePhysicalFileName(LogicalFileName lfn, PhysicalFileName pfn)
- Replica Catalogue operations only; no file transfer
RM.copyAndAddPhysicalFile(PhysicalFileName source, PhysicalFileName destination, LogicalFileName lfn, String protocol): Status
- third-party transfer, but files can only be registered in the Replica Catalogue if the destination PFN contains a valid SE (i.e. an SE registered in the RC)!
RM.deletePhysicalFile(LogicalFileName lfn, PhysicalFileName pfn)
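The division of labour between these calls (transfer only, catalogue only, or both) can be modelled in a few lines. This is a toy in-memory sketch, not the EDG Replica Manager: the class name, the snake_case method names, the `bool` return in place of `Status`, and the convention that a PFN's host part names its SE are all assumptions made for illustration; the `protocol` argument is accepted but ignored.

```python
from typing import Dict, Set
from urllib.parse import urlparse


class ReplicaManagerSketch:
    """Toy in-memory model of the RM operations listed above; not the EDG API."""

    def __init__(self, registered_ses: Set[str]) -> None:
        self.registered_ses = set(registered_ses)  # SEs known to the Replica Catalogue
        self.catalogue: Dict[str, Set[str]] = {}   # LFN -> registered PFNs
        self.storage: Dict[str, bytes] = {}        # PFN -> contents, standing in for the SEs

    @staticmethod
    def storage_element(pfn: str) -> str:
        # Assumption: the host part of a PFN URL names its Storage Element.
        return urlparse(pfn).netloc

    def copy(self, source: str, destination: str, protocol: str = "gsiftp") -> bool:
        """Transfer only; the catalogue is untouched. `protocol` is ignored in this model."""
        if source not in self.storage:
            return False
        self.storage[destination] = self.storage[source]
        return True

    def add_physical_file_name(self, lfn: str, pfn: str) -> None:
        """Catalogue only; no file is transferred."""
        self.catalogue.setdefault(lfn, set()).add(pfn)

    def delete_physical_file_name(self, lfn: str, pfn: str) -> None:
        """Catalogue only; the file itself stays where it is."""
        self.catalogue.get(lfn, set()).discard(pfn)

    def copy_and_add_physical_file(self, source: str, destination: str,
                                   lfn: str, protocol: str = "gsiftp") -> bool:
        """Copy and register; refuse before transferring if the destination SE is unknown."""
        if self.storage_element(destination) not in self.registered_ses:
            return False
        if not self.copy(source, destination, protocol):
            return False
        self.add_physical_file_name(lfn, destination)
        return True

    def delete_physical_file(self, lfn: str, pfn: str) -> None:
        """Remove the stored file and its catalogue entry."""
        self.storage.pop(pfn, None)
        self.delete_physical_file_name(lfn, pfn)


rm = ReplicaManagerSketch({"se-a.example.org", "se-b.example.org"})
src = "gsiftp://se-a.example.org/data/run42.dat"
rm.storage[src] = b"event data"
rm.add_physical_file_name("lfn:run42", src)

print(rm.copy_and_add_physical_file(src, "gsiftp://se-b.example.org/data/run42.dat", "lfn:run42"))
print(rm.copy_and_add_physical_file(src, "gsiftp://unknown.example.org/run42.dat", "lfn:run42"))
print(sorted(rm.catalogue["lfn:run42"]))
```

Checking the destination SE before transferring keeps the combined operation from leaving an unregistered copy behind, which is one simple way to approximate the 'atomic' replication operation the Replica Manager aims for.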
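WP2 next generation Replication Services
[Figure: a Client uses the Replica Manager, which coordinates Replica Metadata, Replica Location, File Transfer, Optimization, Transaction, Consistency, Preprocessing, Postprocessing and Subscription services. Prototype names shown: Reptor (Replica Manager), Giggle (Replica Location), RepMeC, Optor and GDMP.]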
Replication Services Architecture
[Figure: at each Site, a Replica Manager exposing a Core API, an Optimisation API and a Processing API works with an Optimiser, Pre-/Post-processing, a Local Replica Catalog, a Storage Element and a Computing Element. The figure also shows Replica Location Indexes and a Replica Metadata Catalog, with the Resource Broker and User Interface as clients.]
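Splitting replica location into Local Replica Catalogs and Replica Location Indexes (the Giggle model) amounts to a two-step lookup: an index says which sites know a logical file name, and each site's catalog returns the physical file names. A minimal in-memory sketch follows; the class and method names are hypothetical, and the index is refreshed here by a full update from each LRC rather than by whatever soft-state or compressed updates a real deployment would use.

```python
from typing import Dict, Set


class LocalReplicaCatalog:
    """Per-site catalogue: logical file name -> physical file names at this site."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.mappings: Dict[str, Set[str]] = {}

    def add(self, lfn: str, pfn: str) -> None:
        self.mappings.setdefault(lfn, set()).add(pfn)

    def lookup(self, lfn: str) -> Set[str]:
        return set(self.mappings.get(lfn, set()))


class ReplicaLocationIndex:
    """Index: logical file name -> sites whose LRC holds at least one mapping for it."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = {}
        self.lrcs: Dict[str, LocalReplicaCatalog] = {}

    def update_from(self, lrc: LocalReplicaCatalog) -> None:
        """Replace everything previously known about this LRC with its current LFN list."""
        self.lrcs[lrc.site] = lrc
        for sites in self.index.values():
            sites.discard(lrc.site)
        for lfn in lrc.mappings:
            self.index.setdefault(lfn, set()).add(lrc.site)

    def locate(self, lfn: str) -> Set[str]:
        """Ask the index which sites know lfn, then collect PFNs from those sites' LRCs."""
        pfns: Set[str] = set()
        for site in self.index.get(lfn, set()):
            pfns |= self.lrcs[site].lookup(lfn)
        return pfns


lrc_a, lrc_b = LocalReplicaCatalog("siteA"), LocalReplicaCatalog("siteB")
lrc_a.add("lfn:fileA", "gsiftp://se.siteA.example.org/fileA")
lrc_b.add("lfn:fileA", "gsiftp://se.siteB.example.org/fileA")
lrc_b.add("lfn:fileC", "gsiftp://se.siteB.example.org/fileC")

rli = ReplicaLocationIndex()
for lrc in (lrc_a, lrc_b):
    rli.update_from(lrc)

print(sorted(rli.locate("lfn:fileA")))
print(sorted(rli.locate("lfn:fileC")))
```

Keeping only LFN-to-site pointers in the index lets it stay small and be replicated, while the authoritative PFN mappings remain with each site.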
Metadata Management and Security
Project Spitfire
- 'Simple' Grid persistency: Grid metadata and application metadata, through a unified Grid-enabled front end to relational databases
- Metadata replication and consistency
- Publishing information on the metadata service
Secure Grid Services
- Grid authentication, authorization and access control mechanisms enabled in Spitfire
- Modular design, reusable by other Grid services
Spitfire Architecture
- RDBMS back ends: Oracle, DB2, PostgreSQL, MySQL
- Atomic RDBMS is always consistent
- No local replication of data
- Role-based authorization
- XSQL Servlet as one access mode, for 'simple' web access
- Web/Grid Services paradigm: SOAP interfaces; JDBC interface to the RDBMS
- Pluggability and extensibility
[Figure: database-specific layers (OracleLayer, DB2Layer, PGLayer, MyLayer) beneath a Local Spitfire Layer; a Connecting Layer links it to a Global Spitfire Layer, with SOAP connections between the layers.]
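Role-based authorization in front of a relational database can be pictured as a gate that checks which kinds of statement a role may issue before forwarding them. This sketch is hypothetical and far simpler than Spitfire: Python's built-in `sqlite3` stands in for the Oracle/DB2/PostgreSQL/MySQL back ends and for JDBC, the role names and permission table are invented, and only the leading SQL keyword is inspected.

```python
import sqlite3
from typing import Dict, List, Set, Tuple

# Hypothetical roles and the SQL statement types each may issue.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "reader": {"SELECT"},
    "writer": {"SELECT", "INSERT", "UPDATE"},
    "admin": {"SELECT", "INSERT", "UPDATE", "DELETE", "CREATE"},
}


class AuthorizingFrontEnd:
    """Checks a caller's role before forwarding one SQL statement to the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def execute(self, role: str, sql: str, params: Tuple = ()) -> List[tuple]:
        words = sql.split(None, 1)  # leading keyword decides the statement type
        statement = words[0].upper() if words else ""
        if statement not in ROLE_PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not issue {statement or 'empty'} statements")
        cursor = self.conn.execute(sql, params)
        rows = cursor.fetchall()
        self.conn.commit()
        return rows


front_end = AuthorizingFrontEnd(sqlite3.connect(":memory:"))
front_end.execute("admin", "CREATE TABLE lfn_metadata (lfn TEXT PRIMARY KEY, size_bytes INTEGER)")
front_end.execute("writer", "INSERT INTO lfn_metadata VALUES (?, ?)", ("lfn:fileA", 1024))
print(front_end.execute("reader", "SELECT lfn, size_bytes FROM lfn_metadata"))
try:
    front_end.execute("reader", "DELETE FROM lfn_metadata")
except PermissionError as err:
    print(err)
```

Inspecting only the first keyword is deliberately naive; a real service would map authenticated Grid identities to roles and rely on the database's own grants rather than string checks.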
WP3: GRID monitoring and Info Providers
WP3's task is to provide information about:
- The Grid itself. This includes information about resources (Computing Elements, Storage Elements and the network), for which the Globus MDS is a common solution, and job status information (as implemented by WP1's Logging and Bookkeeping).
- Grid applications. This is information published by user jobs and used for performance monitoring.
Main WP3 components:
- MDS v2.1: the Globus Monitoring and Discovery Service, based on soft-state registration protocols and LDAP aggregate directory services
- FTree: an EDG-developed directory service based on OpenLDAP plus caching, which addresses shortcomings in MDS v1 and optimizes data access performance
- R-GMA: a Relational GMA (Grid Monitoring Architecture of Consumers, Producers and Directory Services, from the GGF) implementation that makes information from producers available to consumers as relations (tables). It also uses relations to handle the registration of producers. R-GMA is consistent with GMA principles.
- GRM / PROVE: application monitoring and visualization tools of the P-GRADE graphical parallel programming environment, modified for application monitoring in the DataGrid. The GRM instrumentation library is generalized for flexible trace event specification. The GRM components will be connected to R-GMA using its Producer and Consumer APIs.
R-GMA
- Uses the GMA from the GGF
- A relational implementation
- Applied to both information and monitoring
- Creates the impression that you have one RDBMS per VO
[Figure: Producer, Consumer and Registry, with arrows labelled "lookup" and "subscribe".]
Relational Approach
- Producers announce with SQL "CREATE TABLE" and publish with SQL "INSERT"
- Consumers collect with SQL "SELECT"
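The mapping of producer and consumer actions onto SQL can be tried against any relational database. In this sketch, which only illustrates the idea and does not use the R-GMA APIs, a single in-memory SQLite database plays the role of the one virtual RDBMS a VO appears to have; the table and column names follow the CPULoad example used in these slides.

```python
import sqlite3

# One in-memory database stands in for the single virtual RDBMS a VO appears to have.
db = sqlite3.connect(":memory:")

# A producer announces its table ...
db.execute("CREATE TABLE cpuLoad (country TEXT, site TEXT, facility TEXT, "
           "load REAL, timestamp TEXT)")

# ... and publishes tuples into it.
db.executemany("INSERT INTO cpuLoad VALUES (?, ?, ?, ?, ?)", [
    ("UK", "RAL", "CDF", 0.3, "19055711022002"),
    ("UK", "RAL", "ATLAS", 1.6, "19055611022002"),
])
db.commit()

# A consumer collects with an ordinary query.
rows = db.execute("SELECT facility, load FROM cpuLoad WHERE site = ? ORDER BY load DESC",
                  ("RAL",)).fetchall()
print(rows)
```

The appeal of the relational approach is that consumers need no monitoring-specific query language: anything expressible as a SELECT over the published tables is a valid request.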
R-GMA
API-servlet communication: HTTP(S) in, XML back
[Figure: Sensor Code uses the ProducerAPI, which talks to the Producer Servlet; Application Code uses the ConsumerAPI, which talks to the Consumer Servlet; the RegistryAPI and SchemaAPI talk to the Registry Servlet and Schema Servlet respectively.]
Schema & Contributions
CPULoad (Global Schema)
Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CH       CERN  ALICE     0.9   19055611022002
CH       CERN  CDF       0.6   19055511022002
CPULoad (Producer 3)
Country  Site  Facility  Load  Timestamp
CH       CERN  ATLAS     1.6   19055611022002
CH       CERN  CDF       0.6   19055511022002
CPULoad (Producer 1)
Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
CPULoad (Producer 2)
Country  Site  Facility  Load  Timestamp
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
Contributions are Views
CPULoad (Producer 1), defined as:
    SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'
Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
CPULoad (Producer 2), defined as:
    SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'
Country  Site  Facility  Load  Timestamp
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
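Treating each producer's contribution as a view is what lets a consumer's query be answered from the relevant producers only. The sketch below is a simplified mediator over the Producer 1 and Producer 2 data above, restricted to equality predicates and in-memory rows; it is not R-GMA's mediator, and the data structures and function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

COLUMNS = ("country", "site", "facility", "load", "timestamp")
Row = Tuple[str, str, str, float, str]

# Each producer declares its contribution as a view: fixed values for some columns
# (here: WHERE country = ... AND site = ...), plus the rows it currently publishes.
PRODUCERS: Dict[str, Tuple[Dict[str, str], List[Row]]] = {
    "Producer 1": ({"country": "UK", "site": "RAL"},
                   [("UK", "RAL", "CDF", 0.3, "19055711022002"),
                    ("UK", "RAL", "ATLAS", 1.6, "19055611022002")]),
    "Producer 2": ({"country": "UK", "site": "GLA"},
                   [("UK", "GLA", "CDF", 0.4, "19055811022002"),
                    ("UK", "GLA", "ALICE", 0.5, "19055611022002")]),
}


def relevant_producers(query: Dict[str, object]) -> List[str]:
    """A producer is relevant unless its view fixes a queried column to a different value."""
    return [name for name, (view, _) in PRODUCERS.items()
            if all(view.get(col, val) == val for col, val in query.items())]


def answer(query: Dict[str, object]) -> List[Row]:
    """Union the rows of all relevant producers that satisfy every equality in the query."""
    result: List[Row] = []
    for name in relevant_producers(query):
        for row in PRODUCERS[name][1]:
            if all(row[COLUMNS.index(col)] == val for col, val in query.items()):
                result.append(row)
    return result


print(relevant_producers({"site": "GLA"}))
print(answer({"facility": "CDF"}))
```

A query on `site = 'GLA'` never reaches Producer 1, because its view already fixes `site` to RAL; a query on `facility` cannot be ruled out by either view, so both producers are consulted.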
WP3: GRID Monitoring
Components:
- MDS / FTree
- R-GMA
- GRM / PROVE
Implementation:
- MDS: LDAP, Globus GRIS, GIIS
- FTree: OpenLDAP, caching
- R-GMA: Java, C++, MySQL, Tomcat
- GRM / PROVE: P-GRADE
WP3 main interfaces:
- WP1 Resource Broker (InfoIndex)
- WP2 RM optimizer
- all GRID services producing information (SE, CE, ...)
- WP7 network monitoring
EDG middleware architecture: WP4: Fabric Management
WP4 is responsible for delivering a computing fabric comprising all the tools necessary to manage a center providing grid services on clusters of thousands of nodes. In EDG, the computing fabric is called the Computing Element.
Components:
- User Job Control and Management (Grid and local jobs) on fabric batch and/or interactive CPU services:
  - Gridification: Grid interface to fabric resources
  - Resource Management: manages the underlying batch services
- Automated System Administration for Computing Fabric Elements. These subsystems are reserved for system administrators and operators performing system maintenance:
  - Configuration Management
  - Installation Management
  - Fabric Monitoring
WP4 Architecture logical overview
[Figure: WP4 subsystems (Fabric Gridification, Resource Management, Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance) serving Grid users and local users on Farm A (LSF), Farm B (PBS) and mass storage/disk pools, and connected to other WPs: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3) and Grid Data Storage (WP5).]
- Fabric Gridification: the interface between Grid-wide services and the local fabric; provides local authentication, authorization and mapping of grid credentials.
- Resource Management: provides transparent access (for both jobs and administration) to different cluster batch systems, with enhanced capabilities (extended scheduling policies, advanced reservation, local accounting).
- Installation & Node Management: provides the tools to install and manage all software running on the fabric nodes: an agent to install, upgrade, remove and configure software packages on the nodes, plus bootstrap services and software repositories.
- Configuration Management: provides central storage and management of all fabric configuration information; compiles HLD (high-level description) templates into LLD (low-level description) node profiles; a central DB and a set of protocols and APIs store and retrieve the information.
- Monitoring & Fault Tolerance: provides the tools for gathering monitoring information on fabric nodes; a central measurement repository stores all monitoring information, and fault tolerance correlation engines detect failures and trigger recovery actions.
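Compiling high-level templates into low-level node profiles can be pictured as folding a stack of increasingly specific templates into one concrete per-node description. This is a hypothetical sketch: the real configuration system has its own template language and profile format, whereas here templates are Python dictionaries, later templates override earlier ones key by key (nested dictionaries merge, lists are replaced), and all template contents are made up.

```python
import copy
from typing import Any, Dict, List

Template = Dict[str, Any]


def merge(base: Template, override: Template) -> Template:
    """Return a copy of base with override applied; nested dicts merge, other values replace."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)  # lists and scalars are replaced wholesale
    return result


def compile_profile(node: str, templates: List[Template]) -> Template:
    """Fold templates, most generic first, into one concrete profile for a node."""
    profile: Template = {"node": node}
    for template in templates:
        profile = merge(profile, template)
    return profile


# Hypothetical high-level templates, from site-wide to node-specific.
site_defaults: Template = {"ntp": {"server": "ntp.example.org"}, "packages": ["openssh"]}
batch_worker: Template = {"batch": {"system": "PBS", "queue": "long"},
                          "packages": ["openssh", "pbs-mom"]}
node_specific: Template = {"batch": {"queue": "short"}}

profile = compile_profile("wn042.example.org", [site_defaults, batch_worker, node_specific])
print(profile)
```

Because every merge works on copies, the shared templates stay untouched and can be reused for the next node; only the compiled profile is node-specific.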
User job management (Grid and local)
[Figure: the WP4 logical overview (subsystems, Grid and local users, Farm A (LSF), Farm B (PBS), mass storage/disk pools, and other WPs), stepping through the life of a Grid job:]
1. Submit job
2. Publish resource and accounting information
3. Optimized selection of site
4. Authorize, and map grid credentials to local credentials
5. Select an optimal batch queue and submit
6. Return job status and output
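The authorization, credential-mapping and queue-selection steps can be sketched together. The file format parsed below follows the Globus grid-mapfile convention of a quoted certificate subject followed by a local account name; everything else (the queue attributes, the fewest-waiting-jobs selection rule, the function names, subjects and accounts) is hypothetical and much simpler than a real gatekeeper and batch system.

```python
import shlex
from typing import Dict, List, Optional, Tuple


def parse_gridmap(text: str) -> Dict[str, str]:
    """Map certificate subjects to local accounts, one '"subject" account' entry per line."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honours the quotes around the subject
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def select_queue(queues: List[dict], walltime_min: int) -> Optional[str]:
    """Pick the queue with the fewest waiting jobs among those long enough for the job."""
    eligible = [q for q in queues if q["max_walltime_min"] >= walltime_min]
    if not eligible:
        return None
    return min(eligible, key=lambda q: q["waiting"])["name"]


def accept_grid_job(subject: str, walltime_min: int,
                    gridmap: Dict[str, str], queues: List[dict]) -> Tuple[str, str]:
    """Authorize, map to a local account, then choose a batch queue."""
    account = gridmap.get(subject)
    if account is None:
        raise PermissionError(f"no local mapping for {subject}")
    queue = select_queue(queues, walltime_min)
    if queue is None:
        raise RuntimeError("no batch queue accepts this job length")
    return account, queue


GRIDMAP = parse_gridmap('''
# hypothetical entries
"/O=Grid/O=Example/CN=Alice Example" lhcb001
"/O=Grid/O=Example/CN=Bob Example" atlas007
''')
QUEUES = [
    {"name": "short", "max_walltime_min": 60, "waiting": 2},
    {"name": "long", "max_walltime_min": 2880, "waiting": 15},
    {"name": "medium", "max_walltime_min": 480, "waiting": 5},
]
print(accept_grid_job("/O=Grid/O=Example/CN=Alice Example", 120, GRIDMAP, QUEUES))
```

A two-hour job rules out the short queue, and of the remaining two the medium queue has fewer waiting jobs; an unmapped subject is refused before any queue is considered.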
Automated management of large clusters
[Figure: WP4 subsystems (Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance, Resource Management) managing Farm A (LSF) and Farm B (PBS), linked by information and invocation flows.]
Recovery sequence:
1. Node malfunction detected
2. Remove node from queue
3. Wait for running jobs (?)
4. Update configuration templates
5. Trigger repair
6. Repair (e.g. restart, reboot, reconfigure, …)
7. Node OK detected
8. Put node back in queue
Result: the whole sequence is automated.
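The node-recovery sequence can be sketched as a single procedure. This is a hypothetical simulation, not WP4 code: the `Node` and `Farm` types are invented, repair success is passed in as a flag, and waiting for running jobs (left as an open question on the slides) is simulated by assuming the jobs simply finish.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    healthy: bool = True
    running_jobs: int = 0


@dataclass
class Farm:
    queue_members: Set[str] = field(default_factory=set)  # nodes accepting batch jobs
    log: List[str] = field(default_factory=list)


def recover(node: Node, farm: Farm, repair_works: bool = True) -> bool:
    """Simulate the recovery sequence for one node; return True if it rejoins the queue."""
    if node.healthy:
        return True
    farm.log.append(f"malfunction detected on {node.name}")
    farm.queue_members.discard(node.name)
    farm.log.append(f"removed {node.name} from queue")
    if node.running_jobs:
        # Whether to drain is an open question on the slides; here the jobs simply finish.
        farm.log.append(f"waiting for {node.running_jobs} running job(s)")
        node.running_jobs = 0
    farm.log.append(f"updated configuration templates for {node.name}")
    farm.log.append(f"triggered repair of {node.name}")
    node.healthy = repair_works  # stands in for restart, reboot or reconfigure
    if not node.healthy:
        farm.log.append(f"{node.name} still faulty; kept out of the queue")
        return False
    farm.log.append(f"{node.name} OK detected")
    farm.queue_members.add(node.name)
    farm.log.append(f"put {node.name} back in queue")
    return True


farm = Farm({"wn01.example.org", "wn02.example.org"})
broken = Node("wn02.example.org", healthy=False, running_jobs=1)
print(recover(broken, farm))
print("\n".join(farm.log))
```

Removing the node from the queue before repairing it is the key ordering: the batch system stops sending work to the node while the fabric tools operate on it.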
LCFG (Local ConFiGuration system)
A widely used fabric tool whose purpose is to handle automated installation and configuration in a very diverse and evolving environment.
Mechanism: abstract configuration parameters are stored in a central repository located on the LCFG server. Scripts on the host machine (the LCFG client) read these configuration parameters and either generate traditional configuration files or directly manipulate various services.
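The client-side half of this mechanism, turning abstract parameters into traditional configuration files, can be sketched as one renderer per component. This is a hypothetical illustration, not LCFG's component interface: the repository is a local dictionary rather than a profile fetched from the LCFG server, and the component names, parameter names and host names are invented; the generated lines use ordinary `ntp.conf` and `syslog.conf` syntax.

```python
from typing import Callable, Dict

Params = Dict[str, str]

# Hypothetical central repository: node -> component -> abstract parameters.
REPOSITORY: Dict[str, Dict[str, Params]] = {
    "wn042.example.org": {
        "ntp": {"servers": "ntp1.example.org ntp2.example.org"},
        "syslog": {"loghost": "log.example.org"},
    },
}


def render_ntp(params: Params) -> str:
    """Generate ntp.conf lines: one 'server' line per listed server."""
    return "".join(f"server {host}\n" for host in params["servers"].split())


def render_syslog(params: Params) -> str:
    """Generate a syslog.conf rule forwarding all messages to the central log host."""
    return f"*.* @{params['loghost']}\n"


# Client-side components: each knows how to turn its parameters into a config file.
RENDERERS: Dict[str, Callable[[Params], str]] = {"ntp": render_ntp, "syslog": render_syslog}


def configure(node: str) -> Dict[str, str]:
    """Return generated file contents per component (writing them to disk is left out)."""
    profile = REPOSITORY[node]
    return {component: RENDERERS[component](params)
            for component, params in profile.items() if component in RENDERERS}


for component, text in configure("wn042.example.org").items():
    print(f"--- {component}\n{text}", end="")
```

Keeping the parameters abstract and the file formats in per-component renderers is what lets one central repository describe very different machines.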
WP4: Fabric Management
Components:
- LCFG
- Fabric Monitoring
- PBS & LSF info providers
- Image installation
- Config. Cache Mgr
Implementation:
- LCFG: C++, XML, HTTP
WP4 main interfaces:
- WP1 Resource Broker (InfoIndex)
- WP2 Data management
- WP5 Storage Element
- WP3 GRID Info Services
WP5: Mass Storage Management
WP5 delivers the Grid interface to storage. Its service, the Storage Element (SE), interfaces to underlying Mass Storage Systems or simple storage services.
The SE architecture
[Figure: Storage Element layers. Top layer: Interface1, Interface2, Interface3. Core: Message Queue, Session Manager, System Log, House Keeping, MetaData. Bottom layer: MSS Interfaces to MSS1 and MSS2. Clients: RB, JSS, RM, GDMP, Info Services (WP3), user applications running on CEs, CLIs.]
SE Interactions
[Figure: Client, Replica Manager/Catalog, SE and Storage, connected by the numbered interactions below.]
1. The client asks a catalog to provide the location of a file.
2. The catalog responds with the name of an SE.
3. The client asks the SE for the file.
4. The SE asks the storage system to provide the file.
5. The storage system sends the file to the client through the SE, or
6. directly.
WP5: Mass Storage Management Achievements
- Definition of the architecture and design for the DataGrid Storage Element
- Collaboration with Globus on GridFTP/RFIO
- Collaboration with PPDG on the control API
- Staging from/to CASTOR at CERN successfully implemented and tested
- Successfully interfaced to GDMP
- Supported storage systems: UNIX disk systems, HPSS (High Performance Storage System), CASTOR (through RFIO), GridFTP servers, DMF, Enstore
WP5 (SE) main interfaces:
- WP1 Resource Broker & JSS
- WP2 RM, RC
- WP7 for GridFTP monitoring
- WP3 GRID Info Services
WP6: TestBed Integration and demonstrators
WP6 goals: the EDG testbed
- Integration of EDG software releases (currently 1.2) and their deployment across the EDG testbed: the integration team
- Working implementation of multiple VOs and a basic security infrastructure
- Definition of acceptable usage contracts and creation of the Certification Authorities group
- Set-up of the Authorization Working Group to manage authorization policies on the testbed
Components:
- Support for test-VO, mkgridmap tools
- Globus packaging & EDG config
- Build tools, CVS central s/w repository
- End-user documents