GridPP7 - 02 July 2003
Stefan Stonjek Slide 1
SAM middleware components
Stefan Stonjek
University of Oxford
7th GridPP Meeting, 2nd July 2003, Oxford
Outline
Introduction to SAM
Internals of a SAM station
Design of a SAM station
SAM-Grid Architecture
D0 reconstruction effort
Outlook and Summary
Introduction to SAM
SAM: Sequential data Access via Meta-data
SAM is a distributed data handling system
One SAM station per processing node/cluster/site
  D0: RAL, IC, Manchester, Lancaster
  CDF: RAL, Oxford, Glasgow, ScotGrid, UCL, Liverpool
SAM – centralized vs. decentralized
Each SAM station has a local file cache
Files are transferred from station to station (no central storage, peer to peer)
Central database keeps track of all files, metadata, users, etc. in the SAM system
Not yet fully peer to peer: peer-to-peer transfers with a central database
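The split between a central catalogue and per-station caches can be illustrated with a minimal Python sketch. The names `Catalogue`, `Station`, `store` and `fetch`, and the file name, are hypothetical and not part of the SAM API; the sketch only shows that locations live in one central place while file content moves directly between stations.

```python
class Catalogue:
    """Stand-in for the central database: records which stations cache a file."""

    def __init__(self):
        self.locations = {}  # file name -> set of station names

    def register(self, file_name, station_name):
        self.locations.setdefault(file_name, set()).add(station_name)

    def lookup(self, file_name):
        return sorted(self.locations.get(file_name, set()))


class Station:
    """Stand-in for a SAM station with a local file cache (hypothetical model)."""

    def __init__(self, name, catalogue, peers):
        self.name = name
        self.catalogue = catalogue
        self.peers = peers      # shared dict: station name -> Station
        self.cache = {}         # file name -> content
        peers[name] = self

    def store(self, file_name, content):
        # Cache the file locally and report the new location centrally.
        self.cache[file_name] = content
        self.catalogue.register(file_name, self.name)

    def fetch(self, file_name):
        if file_name in self.cache:
            return self.cache[file_name]
        # Ask the central catalogue where the file is, then copy it
        # directly from that peer (no central storage involved).
        for holder in self.catalogue.lookup(file_name):
            if holder != self.name:
                self.store(file_name, self.peers[holder].cache[file_name])
                return self.cache[file_name]
        raise FileNotFoundError(file_name)


catalogue, peers = Catalogue(), {}
ral = Station("ral", catalogue, peers)
oxford = Station("oxford", catalogue, peers)
ral.store("raw_0001.dat", b"events")
data = oxford.fetch("raw_0001.dat")
print(catalogue.lookup("raw_0001.dat"))
```

After the fetch, both stations are listed as locations of the file, which is the bookkeeping role the slide assigns to the central database.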
The SAM Station
[Diagram: a SAM station on a node. Components shown: station master (smaster), stager (stagerng), disks/cache, several projects each with a project master (pmaster) and consumers, eworkers, gridftp and bbftp links to a remote station, and other caches (dCache).]
Each station runs one station master process
  This communicates with the outside world
Local SAM processes talk to the station master
Station master talks with the central database
A SAM Analysis Project
For every new analysis job a new project is created
  Corresponds to a list of files
Project-Master process keeps track of the status of each file in this project
A project can have multiple consumers
  Every file goes to only one consumer
  Allows easy processing on farms
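The rule that every file goes to only one consumer can be sketched as a shared work queue. This is a minimal illustration under assumptions: `ProjectMaster`, `next_file`, `release` and the status strings are hypothetical names, not the real pmaster interface.

```python
from collections import deque


class ProjectMaster:
    """Hypothetical model of a project master tracking per-file status."""

    def __init__(self, files):
        self.pending = deque(files)
        self.status = {f: "pending" for f in files}
        self.owner = {}  # file name -> consumer that received it

    def next_file(self, consumer_id):
        # Hand out each file exactly once; None means the project is drained.
        if not self.pending:
            return None
        file_name = self.pending.popleft()
        self.status[file_name] = "delivered"
        self.owner[file_name] = consumer_id
        return file_name

    def release(self, file_name):
        # The consumer reports that it has finished with the file.
        self.status[file_name] = "consumed"


project = ProjectMaster(["f1", "f2", "f3", "f4", "f5"])
consumers = ["worker-a", "worker-b"]
handled = {c: [] for c in consumers}

active = list(consumers)
while active:
    for consumer in list(active):
        file_name = project.next_file(consumer)
        if file_name is None:
            active.remove(consumer)
            continue
        handled[consumer].append(file_name)
        project.release(file_name)

print(handled)
```

Because consumers pull from one queue, adding more consumers on a farm spreads the file list without any file being processed twice.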
SAM File transfers
Station initiates file transfers
Station keeps track of the needs of all projects
  and transfers files accordingly
Stager can use different transfer protocols
  Depends on local and remote configuration
Cache content of each station is kept in the central database
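How a station might turn the needs of all its projects into a transfer plan can be sketched as follows. The function name `plan_transfers` and the policy of fetching the most-requested files first are assumptions made for illustration; the slides do not state which ordering SAM actually uses.

```python
def plan_transfers(project_needs, cached):
    """Return the files to fetch, each listed once, most-requested first.

    project_needs: dict mapping project name -> list of required files
    cached: set of files already present in the station cache
    """
    demand = {}
    for files in project_needs.values():
        for file_name in files:
            if file_name not in cached:
                demand[file_name] = demand.get(file_name, 0) + 1
    # Sort by descending demand, then by name for a stable order.
    return sorted(demand, key=lambda f: (-demand[f], f))


project_needs = {
    "projA": ["f1", "f2", "f3"],
    "projB": ["f2", "f3", "f4"],
    "projC": ["f3"],
}
cached = {"f1"}
plan = plan_transfers(project_needs, cached)
print(plan)
```

The point of aggregating at station level is that a file wanted by several projects is transferred once and then served from the local cache.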
SAM Station to database communication
Station talks to a db-server (= CORBA to SQL translator)
Oracle database
Just one client for the database
  Reduces load on the database
Station to Station Transfer
File transfer is done station to station
Several possible transfer protocols
  Negotiated between stations
Each station has its own cache
Location information from central database
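Protocol negotiation between two stations can be pictured as picking the first mutually supported entry from a preference list. This is a sketch only: `negotiate_protocol` is a hypothetical helper, and the slides do not describe how the real negotiation is carried out. The protocol names gridftp and bbftp are the ones shown in the station diagram.

```python
def negotiate_protocol(local_prefs, remote_supported):
    """Pick the first protocol in local preference order that the remote supports.

    Returns None when the two stations have no protocol in common.
    """
    remote = set(remote_supported)
    for protocol in local_prefs:
        if protocol in remote:
            return protocol
    return None


local_prefs = ["gridftp", "bbftp"]          # local configuration, preferred first
chosen = negotiate_protocol(local_prefs, ["bbftp"])
no_match = negotiate_protocol(["gridftp"], ["bbftp"])
print(chosen, no_match)
```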
Grid Job and Information Management (JIM)
Counterpart for the data handling system (SAM)
Based on existing tools (Globus, Condor etc.)
Allows brokering based on information from the data-handling system
SAM-Grid Architecture
Job Handling
Condor for submission and brokering
Decision making is based on:
  Resource information (general and job specific)
  Job information
Decision making is interfaced with data handling middleware
  Not just static resource information
  Allows brokering to include data handling considerations
Decision making is entirely in the Condor framework
  Strong promotion of standards, interoperability
GRAM protocol to transfer job to execution site
Authentication via GSI (Grid Security Infrastructure)
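Brokering that includes data handling considerations can be sketched as a ranking function over candidate sites. This is plain Python, not Condor ClassAd syntax, and the ranking rule (fraction of the job's input files already cached at the site, then free CPUs) is an assumption chosen for illustration; the site names and fields are hypothetical.

```python
def rank_sites(job_files, sites):
    """Order eligible sites by cached fraction of the job's input, then free CPUs."""
    needed = set(job_files)

    def score(site):
        # Dynamic, data-aware part: how much of the input is already local.
        cached_fraction = len(needed & site["cached"]) / max(len(needed), 1)
        # Static resource part used as a tie-breaker.
        return (cached_fraction, site["free_cpus"])

    eligible = [s for s in sites if s["free_cpus"] > 0]
    return [s["name"] for s in sorted(eligible, key=score, reverse=True)]


sites = [
    {"name": "site-a", "free_cpus": 40, "cached": {"f1"}},
    {"name": "site-b", "free_cpus": 10, "cached": {"f1", "f2", "f3"}},
    {"name": "site-c", "free_cpus": 0, "cached": {"f1", "f2", "f3", "f4"}},
]
ranking = rank_sites(["f1", "f2", "f3", "f4"], sites)
print(ranking)
```

Here site-b wins despite having fewer free CPUs than site-a, because most of the input is already in its cache; site-c is excluded because it has no free CPUs.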
Job Management
[Diagram: job management. Components shown: user interfaces, submission clients, the job, a broker (match making service, information collector), and execution sites #1 to #n, each with a computing element, queuing system, grid sensors, storage elements and the data handling system.]
JIM Monitoring
Information Management
Resource description for brokering
Infrastructure for monitoring
Monitors sites, resources and jobs
Distributed knowledge
Web based information retrieval
[Diagram: a web browser connects to web servers 1 to N, each attached to a site information system (sites 1 to N) with several IP boxes.]
SAM-Grid Logistics
[Diagram: SAM-Grid logistics. Components shown: user interfaces; submission (grid client, global job queue); resource selector (match making, info gatherer, info collector); global DH services (SAM naming server, SAM log server, resource optimizer, SAM DB server, RC MetaData Catalog, bookkeeping service); and, per site: data handling (SAM station (+other servs), SAM stager(s), MSS, cache), local job handling (grid gateway, local job handler (CAF, D0MC, BS, ...), JIM advertise), cluster (worker nodes, AAA, Dist.FS), info manager (XML DB server, site conf., glob/loc JID map, ...), info providers and MDS; plus web server, grid monitoring and user tools.]
Outlook: D0 Reprocessing Challenge
D0 will reprocess all Run II data
1st Sep 2003 – 25th Nov 2003 (86 days), conference deadline
Lion's share at D0 remote computing facilities, including
  RAL, IC, Manchester, Lancaster, Karlsruhe, Wuppertal, Lyon, Michigan, NIKHEF etc.
SAM to move data, runjob for site job management
JIM for submission and monitoring
Outlook: D0 Reprocessing Challenge (2)
150 million events / 22.5 TByte input data
  Second level to second level: 25 TByte output data
SAM routinely handles this data volume
  Currently mainly on-site at Fermilab
First large scale, large volume “real” data challenge
First HEP experiment to reprocess data in distributed fashion
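The quoted volumes translate into modest sustained rates, which a short calculation makes concrete. The figures below are derived, not taken from the slides, and assume that 1 TByte = 10^12 bytes, that the full 86 days are used, and that the load is spread evenly over that period.

```python
events = 150e6            # events to reprocess
input_bytes = 22.5e12     # 22.5 TByte input, assuming 1 TByte = 10**12 bytes
output_bytes = 25e12      # 25 TByte output
days = 86                 # length of the reprocessing window

seconds = days * 24 * 3600
kb_per_event_in = input_bytes / events / 1e3
events_per_second = events / seconds
mb_per_second_total = (input_bytes + output_bytes) / seconds / 1e6

print(f"input size per event: {kb_per_event_in:.0f} kB")
print(f"average event rate:   {events_per_second:.1f} events/s")
print(f"average data rate:    {mb_per_second_total:.1f} MB/s (input + output)")
```

Under these assumptions the input averages 150 kB per event, and the whole effort needs roughly 20 events per second and about 6.4 MB/s of combined input and output, summed over all participating sites.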
Summary
SAM is a distributed data handling system
  It is used in production
JIM allows jobs to be brokered based on job-specific information and dynamic resources
GridPP plays a vital role in the development of SAM-Grid