What is a Grid?
“Dependable, consistent, pervasive access to resources”
Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals in the absence of central control, omniscience, trust relationships
Make it easy to use diverse, geographically distributed, locally managed and controlled computing facilities as if they formed a coherent local cluster
What does the Grid do for you?
You submit your work, and the Grid:
“Partitions” your work into convenient execution units based on the available resources, data distribution, … if there is scope for parallelism
Finds convenient places for it to be run
Organises efficient access to your data
  Caching, migration, replication
Deals with authentication and authorization to the different sites that you will be using
Interfaces to local site resource allocation mechanisms, policies
Runs your jobs
Monitors progress
Recovers from problems
Tells you when your work is complete
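The partitioning step listed above can be illustrated with a minimal sketch. It assumes the work consists of independent items (for example events) and that each site advertises a count of free CPUs; the function name `partition_work` and the site names are hypothetical, and a real Grid scheduler would also weigh data location and site policy.

```python
def partition_work(n_items, free_cpus):
    """Split n_items independent work items across sites in proportion to free CPUs.

    free_cpus maps a (hypothetical) site name to its number of free CPUs.
    Returns a dict mapping each site name to the number of items assigned.
    """
    total = sum(free_cpus.values())
    if total == 0:
        raise ValueError("no free CPUs available")
    # Proportional share for each site, rounded down.
    shares = {site: n_items * cpus // total for site, cpus in free_cpus.items()}
    # Hand the few leftover items to the largest sites, one item each.
    leftover = n_items - sum(shares.values())
    for site in sorted(free_cpus, key=free_cpus.get, reverse=True)[:leftover]:
        shares[site] += 1
    return shares


plan = partition_work(1000, {"site_a": 50, "site_b": 30, "site_c": 20})
print(plan)
```

Proportional splitting is only one possible policy; it is used here because it is easy to check by hand.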
Grid approach in many sciences and disciplines …
Mathematicians Solve NUG30
Looking for the solution to the NUG30 quadratic assignment problem
An informal collaboration of mathematicians and computer scientists
Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23
MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
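As a quick sanity check on the figures quoted above, the sketch below converts the delivered CPU time into an average number of busy processors over the 7 days and compares it with the quoted peak. It also checks that the 30 numbers listed above form a permutation of 1..30, assuming that list is the reported assignment. The variable names are illustrative only.

```python
cpu_seconds = 3.46e8           # total CPU time quoted above
wall_seconds = 7 * 24 * 3600   # 7 days of wall-clock time
peak_processors = 1009         # peak processor count quoted above

# Average number of processors that must have been busy to deliver that CPU time.
average_processors = cpu_seconds / wall_seconds
print(f"average processors busy: {average_processors:.0f}")
print(f"average / peak: {average_processors / peak_processors:.2f}")

# The list of numbers given above, assumed here to be the reported assignment.
solution = [14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25,
            22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23]
is_permutation = sorted(solution) == list(range(1, 31))
print("valid permutation of 1..30:", is_permutation)
```

The average comes out at roughly 570 processors, a little over half of the quoted peak.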
Network for Earthquake Engineering Simulation
NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
Grid approach to address the High Energy Physics (HEP) computing problem
HEP computing characteristics
Large numbers of independent events to process
Large data sets, mostly read-only
Modest floating point requirement
Batch processing for production & selection - interactive for analysis
Commodity components are just fine for HEP
Very large aggregate requirements – computation, data

The LHC challenge
Jump in orders of magnitude wrt. previous experiments
Geographical dispersion of people and of resources
Scale
  Petabytes per year of data
  Thousands of processors
  Thousands of disks
  Terabits/second of I/O bandwidth
  …
Complexity
Lifetime (20 years)
…
CMS: 1800 physicists, 150 institutes, 32 countries
World Wide Collaboration distributed computing & storage capacity
Solution?
Regional Computing Centres
Serve better the needs of the world-wide distributed community
  Data available nearby
  Reduce dependence on links to CERN
Exploit established computing expertise & infrastructure in national labs, universities
See http://www.cern.ch/monarc
[Figure: tiered LHC computing model]
Online System; Offline Processor Farm (~20 TIPS)
Tier 0: CERN Computer Centre
Tier 1: FermiLab (~4 TIPS), France Regional Centre, Italy Regional Centre, Germany Regional Centre
Tier 2: Caltech (~1 TIPS) and further Tier2 Centres (~1 TIPS each)
Institutes (~0.25 TIPS each), with a physics data cache
Tier 4: Physicist workstations
Link labels in the figure: ~PBytes/sec, ~100 MBytes/sec, ~622 Mbits/sec, ~622 Mbits/sec or Air Freight (deprecated), ~1 MBytes/sec
Figure notes:
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second.
Each triggered event is ~1 MByte in size.
Physicists work on analysis “channels”.
Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
1 TIPS is approximately 25,000 SpecInt95 equivalents.
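The rates quoted in the figure notes above can be tied together with a little arithmetic. The sketch below derives the bunch-crossing rate, the trigger reduction factor, the raw data stream, and the size of the ~20 TIPS farm in SpecInt95 units. The yearly volume relies on an assumption that is not in the source: about 1e7 seconds of data taking per year.

```python
bunch_crossing_interval_s = 25e-9   # one bunch crossing every 25 ns
triggers_per_s = 100                # triggered events kept per second
event_size_mbytes = 1.0             # ~1 MByte per triggered event
si95_per_tips = 25_000              # 1 TIPS ~ 25,000 SpecInt95 equivalents

# Crossing rate and the factor by which the trigger reduces it.
crossing_rate_hz = 1 / bunch_crossing_interval_s
rejection_factor = crossing_rate_hz / triggers_per_s

# Data stream leaving the trigger, in MBytes per second.
stream_mbytes_per_s = triggers_per_s * event_size_mbytes

# ASSUMPTION (not in the source): ~1e7 seconds of data taking per year.
live_seconds_per_year = 1e7
pbytes_per_year = stream_mbytes_per_s * live_seconds_per_year / 1e9

# Capacity of the ~20 TIPS offline farm in SpecInt95 equivalents.
farm_si95 = 20 * si95_per_tips

print(f"crossing rate: {crossing_rate_hz:.3g} Hz")
print(f"trigger keeps 1 crossing in {rejection_factor:.3g}")
print(f"stream: {stream_mbytes_per_s:.0f} MBytes/sec")
print(f"raw data per year (under the assumption above): {pbytes_per_year:.1f} PBytes")
print(f"20 TIPS farm: {farm_si95} SpecInt95")
```

The 100 MBytes/sec stream matches the ~100 MBytes/sec link label in the figure, and under the stated assumption the yearly raw volume is of the order of a petabyte.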
Grid as a possible approach
Various technical issues to address
Resource Discovery
Resource Management
  Distributed scheduling, optimal co-allocation of CPU, data and network resources, uniform interface to different local resource managers, …
Data Management
  Petabyte-scale information volumes, high-speed data movement and replication, replica synchronization, data caching, uniform interface to mass storage management systems, …
Automated system management techniques for large computing fabrics
Monitoring Services
Security
  Authentication, Authorization …
Scalability, Robustness, Resilience
Grid model to address such problems
State (HEP-centric view) circa 2.5 years ago
Globus project
  Globus toolkit: core services for Grid tools and applications (Authentication, Information service, Resource management, etc…)
Good basis to build on but:
  No higher level services
  Handling of lots of data not addressed
  No production quality implementations
Not possible to do real work with Grids yet …
DataGrid Project (EDG)
Project started Jan 2001, duration 3 years
Goals
  To build a significant prototype of the LHC computing model
  To collaborate with and complement other European and US projects
  To develop a sustainable computing model applicable to other sciences and industry: biology, earth observation etc.
Specific project objectives
  Middleware for fabric & Grid management: evaluation, test, and integration of existing M/W S/W and research and development of new S/W as appropriate
  Large scale testbed
  Production quality demonstrations
  Open source and technology transfer
See http://www.eu-datagrid.org
Main Partners
CERN
CNRS - France
ESA/ESRIN - Italy
INFN - Italy
NIKHEF – The Netherlands
PPARC - UK
Associated Partners
Research and Academic Institutes
• CESNET (Czech Republic)
• Commissariat à l'énergie atomique (CEA) – France
• Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
• Consiglio Nazionale delle Ricerche (Italy)
• Helsinki Institute of Physics – Finland
• Institut de Fisica d'Altes Energies (IFAE) - Spain
• Istituto Trentino di Cultura (IRST) – Italy
• Konrad-Zuse-Zentrum für Informationstechnik Berlin - Germany
• Royal Netherlands Meteorological Institute (KNMI)
• Ruprecht-Karls-Universität Heidelberg - Germany
• Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands
• Swedish Natural Science Research Council (NFR) - Sweden
Industry Partners
• Datamat (Italy)
• IBM (UK)
• Compagnie des Signaux (France)
The Middleware Working Group coordinates the development of the software modules, leveraging existing, long-tested open standard solutions. Five parallel development teams implement the software: job scheduling, data management, grid monitoring, fabric management and mass storage management.
The Infrastructure Working Group is focused on the integration of middleware software with systems and networks to provide testbeds to demonstrate the effectiveness of DataGrid in production quality operations over high performance networks.
The Applications Working Group exploits the project developments to process large amounts of data produced by experiments in the fields of High Energy Physics (HEP), Earth Observations (EO) and Biology.
The Management Working Group is in charge of coordinating the entire project on a day-to-day basis and of disseminating the results to industry and research institutes.
[Figure: project working groups: Applications, Middleware, Infrastructure, Management, Testbed]
DataGrid Architecture
[Figure: layered architecture, from top to bottom]
Local Computing: Local Application, Local Database
Grid
  Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping
  Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
  Underlying Grid Services: Computing Element Services, Authorization Authentication and Accounting, Replica Catalog, Storage Element Services, SQL Database Services, Service Index
Fabric
  Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management
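The architecture above lists a Replica Catalog and a Replica Manager. As a rough illustration of the idea behind them, the sketch below maps a logical file name to several physical copies and picks one near the job. Everything here is hypothetical: the in-memory dictionary, the function `locate_replica`, the host names, and the selection rule are stand-ins, not the EDG interfaces.

```python
from urllib.parse import urlparse

# Hypothetical in-memory stand-in for a replica catalog:
# logical file name -> physical replica locations (all values invented).
catalog = {
    "LF:example-0001": [
        "gridftp://se.site-a.example.org/data/example-0001",
        "gridftp://se.site-b.example.org/data/example-0001",
    ],
}


def locate_replica(logical_name, preferred_domain):
    """Return a physical replica, preferring one hosted in preferred_domain."""
    replicas = catalog.get(logical_name, [])
    if not replicas:
        raise KeyError(f"no replica registered for {logical_name}")
    for url in replicas:
        host = urlparse(url).hostname or ""
        if host.endswith(preferred_domain):
            return url
    # No copy in the preferred domain: fall back to the first registered replica.
    return replicas[0]


print(locate_replica("LF:example-0001", "site-b.example.org"))
```

A production replica manager would also consider network cost and storage load; the suffix match on the host name is only the simplest possible notion of "nearby".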
DataGrid achievements
Testbed 1: first release of EDG middleware
  First workload management system
    “Super scheduling” component using application data and computing elements requirements
  File Replication Tools (GDMP), Replica Catalog, SQL Grid Database Service, …
  Tools for farm installation and configuration
  …
Used for real productions
Towards testbed 2: new functionalities and increased reliability
Job submission scenario
dg-job-submit myjob.jdl

myjob.jdl
Executable = "$(CMS)/exe/sum.exe";
InputData = "LF:testbed0-00019";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
InputSandbox = {"/home/user/WP1testC","/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.Architecture == "INTEL" && other.OpSys == "LINUX Red Hat 6.2";
Rank = other.FreeCPUs;
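The Requirements and Rank expressions in the JDL above describe a matchmaking step: discard resources that fail the requirements, then prefer the one with the highest rank. The sketch below is a plain Python analogue of that idea, not the EDG resource broker or its expression evaluator. The attribute names follow the JDL, while the resource names and values are invented.

```python
# Hypothetical resource advertisements; attribute names follow the JDL above,
# the resource names and values are invented for illustration.
resources = [
    {"Name": "ce_a", "Architecture": "INTEL", "OpSys": "LINUX Red Hat 6.2", "FreeCPUs": 12},
    {"Name": "ce_b", "Architecture": "INTEL", "OpSys": "LINUX Red Hat 6.2", "FreeCPUs": 40},
    {"Name": "ce_c", "Architecture": "SPARC", "OpSys": "Solaris", "FreeCPUs": 64},
]


def requirements(other):
    """Python rendering of the JDL Requirements expression."""
    return other["Architecture"] == "INTEL" and other["OpSys"] == "LINUX Red Hat 6.2"


def rank(other):
    """Python rendering of the JDL Rank expression."""
    return other["FreeCPUs"]


def match(candidates):
    """Return the acceptable resource with the highest rank, or None."""
    acceptable = [r for r in candidates if requirements(r)]
    return max(acceptable, key=rank) if acceptable else None


best = match(resources)
print(best["Name"])
```

Here ce_c has the most free CPUs but fails the requirements, so the choice falls on ce_b.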
Other HEP Grid initiatives
PPDG (US)
GriPhyN (US)
DataTag & iVDGL
  Transatlantic testbeds (to address interoperability)
LCG (LHC Computing Grid Project)
The Grid World: current status
Dozens of major Grid projects in scientific & technical computing/research & education
Considerable consensus on key concepts and technologies
  Open source Globus Toolkit™ a de facto standard for major protocols & services
Industrial interest emerging rapidly
Opportunity: convergence of eScience and eBusiness requirements & technologies
Problems
Almost all projects have developed specialized services which have been layered on top of standard services (security, remote job execution, etc.)
Patchwork of protocols and non-interoperable “standards” and difficult to re-use “implementations”
Exploit Web Services
Web Services
Increasingly popular standards-based framework for accessing network applications
  W3C standardization; Microsoft, IBM, Sun, others
WSDL: Web Services Description Language
  Interface Definition Language for Web services
SOAP: Simple Object Access Protocol
  XML-based RPC protocol; common WSDL target
WS-Inspection
  Conventions for locating service descriptions
UDDI: Universal Description, Discovery, & Integration
  Directory for Web services
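To make the "XML-based RPC protocol" description of SOAP concrete, the sketch below builds a minimal SOAP 1.1 request envelope with the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace `http://example.org/jobs`, the operation `getJobStatus`, and the parameter `jobId` are hypothetical and stand in for whatever a service's WSDL would actually define.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/jobs"                      # hypothetical service namespace


def build_request(job_id):
    """Build a SOAP envelope that calls a hypothetical getJobStatus operation."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The remote procedure call is an element in the body, its arguments are children.
    call = ET.SubElement(body, f"{{{APP_NS}}}getJobStatus")
    ET.SubElement(call, f"{{{APP_NS}}}jobId").text = job_id
    return ET.tostring(envelope, encoding="unicode")


message = build_request("job-42")
print(message)
```

In practice such a message would be sent over HTTP to the endpoint named in the service description; that step is omitted here.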
Open Grid Services Architecture (OGSA)
Service orientation
  Computational resources, storage resources, networks, programs, databases, etc. all represented as services
  Allows standard interface definition mechanisms: multiple protocol bindings, multiple implementations, local/remote transparency
Grid service: web service with defined semantics for service interactions
  Management of transient instances (& state)
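The phrase "management of transient instances (& state)" can be pictured as a factory that creates stateful service instances with a limited lifetime and removes those whose lifetime has lapsed. The sketch below illustrates that pattern only. The class `ServiceFactory` and its methods are hypothetical and do not reproduce the OGSA interfaces, and time is passed in explicitly to keep the example deterministic.

```python
class ServiceFactory:
    """Hypothetical factory for transient, stateful service instances."""

    def __init__(self):
        self._instances = {}  # handle -> {"state": dict, "expires": float}
        self._next_id = 0

    def create(self, now, lifetime_s):
        """Create an instance that expires lifetime_s seconds after now."""
        self._next_id += 1
        handle = f"instance-{self._next_id}"
        self._instances[handle] = {"state": {}, "expires": now + lifetime_s}
        return handle

    def keep_alive(self, handle, now, lifetime_s):
        """Extend the lifetime of an instance that is still in use."""
        self._instances[handle]["expires"] = now + lifetime_s

    def sweep(self, now):
        """Destroy instances whose lifetime has lapsed; return their handles."""
        expired = [h for h, inst in self._instances.items() if inst["expires"] <= now]
        for handle in expired:
            del self._instances[handle]
        return expired

    def alive(self):
        return sorted(self._instances)


factory = ServiceFactory()
short_lived = factory.create(now=0, lifetime_s=10)
long_lived = factory.create(now=0, lifetime_s=100)
removed = factory.sweep(now=50)
print(removed, factory.alive())
```

Expiring instances unless a client renews them means that state left behind by a failed client is eventually cleaned up without central bookkeeping.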
Global Grid Forum
Mission
  To focus on the promotion and development of Grid technologies and applications via the development and documentation of "best practices," implementation guidelines, and standards with an emphasis on "rough consensus and running code"
An Open Process for Development of Standards
A Forum for Information Exchange
A Regular Gathering to Encourage Shared Effort
See http://www.globalgridforum.org