Grid Computing: like herding cats?
Stephen Jarvis
High Performance Systems Group
University of Warwick, UK

Sessions on Grid
What are we going to cover today?
- A brief history
- Why we are doing it
- Applications
- Users
- Challenges
- Middleware
What are you going to cover next week?
- Technical talk on the specifics of our work
- Including application to e-Business and e-Science

An Overused Analogy
Electrical Power Grid
- Computing power might somehow be like electrical power
  - plug in
  - switch on
  - have access to unlimited power
- We don't know who supplies the power, or where it comes from
  - just pick up the bill at the end of the month
- Is this the future of computing?
Sounds great - but how long?
Is the computing infrastructure available?
Computing power
- 1986: Cray X-MP ($8M)
- 2000: Nintendo-64 ($149)
- 2003: Earth Simulator (NEC), ASCI Q (LANL)
- 2005: Blue Gene/L (IBM), 360 Teraflops
- Look at www.top500.org for current supercomputers!

Storage & Network
Storage capabilities
- 1986: Local data stores (MB)
- 2002: Goddard Earth Observation System, 29 TB
Network capabilities
- 1986: NSFNET 56 Kb/s backbone
- 1990s: Upgraded to 45 Mb/s (gave us the Internet)
- 2000s: 40 Gb/s
Many Potential Resources
[Figure: GRID]
Some History: NASA's Information Power Grid
The vision (mid 90s)
- To promote a revolution in how NASA addresses large-scale science and engineering by providing a persistent HPC infrastructure
Computing and data management services
- on-demand
- locate and co-schedule multi-Center resources
- address large-scale and/or widely distributed problems
Ancillary services
- workflow management and coordination
- security, charging
[Figure: coupled aircraft sub-system simulations]
- Sub-system models: Engine Models, Airframe Models, Landing Gear Models, Stabilizer Models
- Capability labels:
  - Lift capabilities, drag capabilities, responsiveness
  - Thrust performance, reverse thrust performance, responsiveness, fuel consumption
  - Braking performance, steering capabilities, traction, dampening capabilities
  - Crew capabilities: accuracy, perception, stamina, reaction times, SOPs
Whole system simulations are produced by coupling all of the sub-system simulations

Virtual National Air Space (VNAS)
[Figure: National Air Space Simulation Environment]
- Centres: GRC, LaRC, ARC
- Models: Airframe Models, Landing Gear Models, Stabilizer Models
- Data: FAA Ops Data, Weather Data, Airline Schedule Data, Digital Flight Data, Radar Tracks, Terrain Data, Surface Data
- Workload: 22,000 commercial US flights a day; 50,000 engine runs; 22,000 airframe impact runs; 132,000 landing/take-off gear runs; 48,000 human crew runs; 66,000 stabilizer runs; 44,000 wing runs
- Simulation Drivers (being pulled together under the NASA AvSP Aviation ExtraNet (AEN))
What is a Computational Grid?
A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities.
- The capabilities need not be high end.
- The infrastructure needs to be relatively transparent.
Selected Grid Projects
US Based
- NASA Information Power Grid
- DARPA CoABS Grid
- DOE Science Grid
- NSF National Virtual Observatory
- NSF GriPhyN
- DOE Particle Physics Data Grid
- NSF DTF TeraGrid
- DOE ASCI DISCOM Grid
- DOE Earth Systems Grid etc.
EU Based
- DataGrid (CERN, ..)
- EuroGrid (Unicore)
- Damien (Metacomputing)
- DataTag (TransAtlantic Testbed, ..)
- Astrophysical Virtual Observatory
- GRIP (Globus/Unicore)
- GRIA (Industrial applications)
- GridLab (Cactus Toolkit, ..)
- CrossGrid (Infrastructure Components)
- EGSO (Solar Physics)

Other National Projects
- UK - e-Science Grid
- Netherlands - VLAM-G, DutchGrid
- Germany - UNICORE Grid, D-Grid
- France - Etoile Grid
- Italy - INFN Grid
- Eire - Grid-Ireland
- Scandinavia - NorduGrid
- Poland - PIONIER Grid
- Hungary - DemoGrid
- Japan - JpGrid, ITBL
- South Korea - N*Grid
- Australia - Nimrod-G, ..
- Thailand, Singapore, AsiaPacific Grid
The Big Spend: two examples
US TeraGrid
- $100 million US dollars (so far)
- 5 supercomputer centres
- New ultra-fast optical network, 40 Gb/s
- Grid software and parallel middleware
- Coordinated virtual organisations
- Scientific applications and users
UK e-Science Grid
- £250 million (so far)
- Regional e-Science centres
- New infrastructure
- Middleware development
- Big science projects
e-Science Grid
[Figure: map of UK e-Science centres: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Soton, London, Belfast, DL, RL, Hinxton, Lancaster, White Rose, Birmingham/Warwick, Bristol, UCL]
Who wants Grids and why?
NASA
- Aerospace simulations, air traffic control
- NWS, in-aircraft computing
- Virtual Airspace
- Free fly, accident prevention
IBM
- On-demand computing infrastructure
- Protect software
- Support business computing
Governments
- Simulation experiments
- Biodiversity, genomics, military, space science
Classes of Grid applications
Category | Examples | Characteristics
Distributed supercomputing | DIS, Stellar dynamics, Chemistry | Very large problems, lots of CPU, memory
High Throughput | Chip design, cryptography | Harnessing idle resources
On Demand | Medical, Weather prediction | Remote resources, time bounded
Data Intensive | Physics, Sky surveys | Synthesis of new information
Collaborative | Data exploration, virtual environments | Connection between many parties
Classes of Grid
Category | Examples | Characteristics
Data Grid | EU DataGrid | Lots of data sources from one site, processing off site
Compute Grid | Chip design, cryptography | Harnessing and connecting rare resources
Scavenging Grid | SETI | CPU cycle stealing, commodity resources
Enterprise Grid | Banking | Multi-site, but one organisation
Discovery Net Project
Nucleotide Annotation Workflows
- Download sequence from Reference Server
- Save to Distributed Annotation Server
- 1800 clicks
- 500 Web accesses
- 200 copy/paste operations
- 3 weeks' work in 1 workflow and a few seconds' execution

Grand Challenge: Integrating Different Levels of Simulation
- Levels: molecular, cellular, organism
- An e-science challenge: non-trivial
- NASA IPG as a possible paradigm
- Need to integrate rigorously if we are to deliver accurate and hence biomedically useful results
Noble (2002) Nature Rev. Mol. Cell. Biol. 3:460
Sansom et al. (2000) Trends Biochem. Sci. 25:368
Classes of Grid users
Class | Purpose | Makes Use Of | Concerns
End Users | Solve problems | Applications | Transparency, performance
Application Developers | Develop applications | Programming models, tools | Ease of use, performance
Tool Developers | Develop tools & prog. models | Grid services | Adaptivity, security
Grid Developers | Provide grid services | Existing grid services | Connectivity, security
System Administrators | Management of resources | Management tools | Balancing concerns
Grid architecture
- Composed of a hierarchy of sub-systems
- Scalability is vital
Key elements:
- End systems
  - Single compute nodes, storage systems, IO devices etc.
- Clusters
  - Homogeneous networks of workstations; parallel & distributed management
- Intranet
  - Heterogeneous collections of clusters; geographically distributed
- Internet
  - Interconnected intranets; no centralised control
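
The hierarchy above can be pictured as a nested data structure. What follows is only a minimal sketch, not part of any Grid toolkit: the class name `GridNode`, the helper `make_cluster`, and the site and cluster names are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GridNode:
    """One element of the hierarchy: end system, cluster, intranet or internet."""
    name: str
    level: str
    children: List["GridNode"] = field(default_factory=list)

    def end_systems(self) -> int:
        """Count the end systems reachable beneath (or at) this node."""
        if self.level == "end system":
            return 1
        return sum(child.end_systems() for child in self.children)


def make_cluster(name: str, size: int) -> GridNode:
    """Build a cluster of `size` identical end systems (hypothetical helper)."""
    nodes = [GridNode(f"{name}-node{i}", "end system") for i in range(size)]
    return GridNode(name, "cluster", nodes)


# Two intranets (each a collection of clusters) joined into one internet-level grid.
intranet_a = GridNode("site-a", "intranet", [make_cluster("a1", 4), make_cluster("a2", 8)])
intranet_b = GridNode("site-b", "intranet", [make_cluster("b1", 16)])
grid = GridNode("grid", "internet", [intranet_a, intranet_b])

print(grid.end_systems())  # total end systems across both sites
```

Because each level only knows about its direct children, adding another site means attaching one more intranet node; nothing else in the tree changes, which is one reading of why a hierarchy helps with scalability.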
End Systems
State of the art
- Privileged OS; complete control of resources and services
- Integrated nature allows high performance
- Plenty of high-level languages and tools
Future directions
- Lack features for integration into larger systems
- OS support for distributed computation
- Mobile code (sandboxing)
- Reduction in network overheads
Clusters
State of the art
- High-speed LAN, 100s or 1000s of nodes
- Single administrative domain
- Programming libraries like MPI
- Inter-process communication, co-scheduling
Future directions
- Performance improvements
- OS support

Intranets
State of the art
- Grids of many resources, but one admin. domain
- Management of heterogeneous resources
- Data sharing (e.g. databases, web services)
- Supporting software environments inc. CORBA
- Load sharing systems such as LSF and Condor
- Resource discovery
Future directions
- Increasing complexity (physical scale etc.)
- Performance
- Lack of global knowledge

Internets
State of the art
- Geographical distribution, no central control
- Data sharing is very successful
- Management is difficult
Future directions
- Sharing other computing services (e.g. computation)
- Identification of resources
- Transparency
- Internet services
Basic Grid services
- Authentication
  - Can the users use the system; what jobs can they run?
- Acquiring resources
  - What resources are available?
  - Resource allocation policy; scheduling
- Security
  - Is the data safe? Is the user process safe?
- Accounting
  - Is the service free, or should the user pay?
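
To make the division of labour between these services concrete, here is a toy sketch of a job passing through authentication, resource acquisition, scheduling and accounting in turn. Everything in it is an assumption made for illustration: the names `Resource`, `Job`, `ALLOWED` and `submit`, the cheapest-first policy, and the pricing model are hypothetical and do not describe Globus or any other middleware. Security (data and process safety) is deliberately not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    name: str
    free_cpus: int
    price_per_cpu_hour: float  # 0.0 models a free service


@dataclass
class Job:
    user: str
    cpus: int
    hours: float


# Hypothetical access list: user -> largest job (in CPUs) they may run.
ALLOWED: Dict[str, int] = {"alice": 64, "bob": 8}


def authenticate(job: Job) -> bool:
    """Can this user use the system, and may they run a job of this size?"""
    return job.user in ALLOWED and job.cpus <= ALLOWED[job.user]


def acquire(job: Job, resources: List[Resource]) -> List[Resource]:
    """Which resources currently have enough free CPUs for the job?"""
    return [r for r in resources if r.free_cpus >= job.cpus]


def schedule(job: Job, candidates: List[Resource]) -> Optional[Resource]:
    """Assumed allocation policy: pick the cheapest candidate and reserve CPUs."""
    if not candidates:
        return None
    chosen = min(candidates, key=lambda r: r.price_per_cpu_hour)
    chosen.free_cpus -= job.cpus
    return chosen


def account(job: Job, resource: Resource) -> float:
    """What should the user pay for this run?"""
    return job.cpus * job.hours * resource.price_per_cpu_hour


def submit(job: Job, resources: List[Resource]) -> Optional[float]:
    """Run a job through all four steps; return the charge, or None if refused."""
    if not authenticate(job):
        return None
    chosen = schedule(job, acquire(job, resources))
    if chosen is None:
        return None
    return account(job, chosen)
```

A caller would build a list of `Resource` objects and pass each `Job` to `submit`; a return value of `None` means the job was refused, either at authentication or because no resource had enough free CPUs.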
Research Challenges (#1)
- Grid computing is a relatively new area
- There are many challenges
Nature of applications
- New methods of scientific and business computing
Programming models and tools
- Rethinking programming, algorithms, abstraction etc.
- Use of software components/services
System architecture
- Minimal demands should be placed on contributing sites
- Scalability
- Evolution of future systems and services
Research Challenges (#2)
Problem solving methods
- Latency- and fault-tolerant strategies
- Highly concurrent and speculative execution
Resource management
- How are the resources shared?
- How do we achieve end-to-end performance?
- Need to specify QoS requirements
- Then need to translate this to resource level
- Contention?
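
One way to picture the step from a QoS requirement to a resource-level request is a deadline turned into a CPU count. The sketch below is a deliberately naive illustration: the function name `cpus_needed`, its parameters, and the assumption of perfect parallel speed-up with no contention are all hypothetical, and a real scheduler would have to account for exactly the contention and end-to-end effects the slide asks about.

```python
import math


def cpus_needed(work_gflop: float, deadline_s: float, gflops_per_cpu: float) -> int:
    """Translate a deadline (QoS level) into a CPU count (resource level).

    Assumes the work divides perfectly across CPUs, which real
    applications rarely achieve.
    """
    if deadline_s <= 0 or gflops_per_cpu <= 0:
        raise ValueError("deadline and per-CPU rate must be positive")
    return max(1, math.ceil(work_gflop / (gflops_per_cpu * deadline_s)))


# 3600 GFLOP of work, a 60 s deadline, CPUs sustaining 2 GFLOP/s each.
print(cpus_needed(3600.0, 60.0, 2.0))
```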
Research Challenges (#3)
Security
- How do we safely share data, resources, tasks?
- How is code transferred?
- How does licensing work?
Instrumentation and performance
- How do we maintain good performance?
- How can load-balancing be controlled?
- How do we measure grid performance?
Networking and infrastructure
- Significant impact on networking
- Need to combine high and low bandwidth

Development of middleware
- Many people see middleware as the vital ingredient
Globus toolkit
- Component services for security, resource location, resource management, information services
OGSA
- Open Grid Services Architecture
- Drawing on web services technology
GGF
- International organisation driving Grid development
- Contains partners such as Microsoft, IBM, NASA etc.

Middleware Conceptual Layers
- Workload Generation, Visualization
- Discovery, Mapping, Scheduling, Security, Accounting
- Computing, Storage, Instrumentation
Requirements include:
- Offers up useful resources
- Accessible and useable resources
- Stable and adequately supported
- Single user "laptop feel"
Middle