Grid Computing (Special Topics in Computer Engineering)
Veera Muangsin
23 January 2004
Outline
• High-Performance Computing
• Grid Computing
• Grid Applications
• Grid Architecture
• Grid Middleware
• Grid Services
High-Performance Computing
mega = 10^6 (million), giga = 10^9 (billion), tera = 10^12 (trillion), peta = 10^15 (quadrillion)
World’s Fastest Computers: The Top 5
#1 Japan’s Earth Simulator
Specifications
• Peak performance / processor: 8 Gflops
• Peak performance / node: 64 Gflops
• Shared memory / node: 16 GB
• Total number of processors: 5,120
• Total number of nodes: 640
• Total peak performance: 40 Tflops
• Total main memory: 10 TB
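The per-node and total figures are mutually consistent and imply 8 processors per node; a quick check (a minimal Python sketch):

    # Consistency check for the Earth Simulator figures above.
    procs_per_node = 64 // 8          # 64 Gflops/node over 8 Gflops/processor = 8
    nodes = 640
    print(nodes * procs_per_node)     # 5,120 processors
    print(nodes * 64 / 1000)          # 40.96 Tflops peak, quoted as "40 Tflops"
    print(nodes * 16 / 1000)          # 10.24 TB main memory, quoted as "10 TB"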
Processor Cabinets
Earth Simulator does climate modeling
• Grid points: 3840 × 1920 × 96
[Figure: parallel decomposition. The model grid (I = 3840, J = 1920, K = 96) is split into slabs across processor nodes PN01 … PN320, with an FFT and inverse FFT transforming data between grid space and spectral space.]
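The FFT / inverse-FFT pair in the figure is the core of a spectral method. A minimal NumPy sketch of the idea on a scaled-down grid (illustrative only; the Earth Simulator itself runs vectorized Fortran with explicit inter-node transposes):

    # Toy spectral transform on a scaled-down climate grid.
    import numpy as np

    ni, nj, nk = 384, 192, 96               # toy longitude x latitude x levels
    field = np.random.rand(ni, nj, nk)      # e.g. temperature in grid space

    spectral = np.fft.rfft(field, axis=0)   # grid space -> spectral space
    # ... spectral-space computation (derivatives become multiplications) ...
    back = np.fft.irfft(spectral, n=ni, axis=0)  # spectral space -> grid space

    assert np.allclose(field, back)

In the parallel version each node owns one slab of the domain, so a global transpose (all-to-all communication) must gather complete longitude lines onto a node before the FFT; that is why the decomposition differs between grid space and spectral space.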
IBM Blue Gene
• Being constructed by IBM
• To be completed in 2006
• Expected performance: 1 PetaFLOPS, to be no. 1 in the TOP500 list (in 2003 the aggregate performance of all TOP500 machines was 528 TFlops)
• Applications: molecular dynamics, protein folding, drug-protein interaction (docking)
Clusters
• The most common architecture in the TOP500
– 7 of the top 10
– 208 of the 500 systems
#2 LANL’s ASCI Q
• 13.88 TFlops
• 8,192-processor cluster of HP AlphaServer nodes (1.25 GHz)
• LANL (Los Alamos National Laboratory)
• Used to analyze and predict the performance, safety, and reliability of nuclear weapons
#3 Virginia Tech’s System X
• 10.28 TFlops
• 1,100-node cluster: Apple G5, dual PowerPC 970 2 GHz, 4 GB memory and 160 GB disk per node (176 TB total), Mac OS X (a FreeBSD-based UNIX)
• Cost: $5.2 million
System X’s Applications
• Nanoscale Electronics
• Quantum Chemistry
• Computational Chemistry/Biochemistry
• Computational Fluid Dynamics
• Computational Acoustics
• Computational Electromagnetics
• Wireless Systems Modeling
• Large-scale Network Emulation
#4 NCSA’s Tungsten
• 9.81 TFlops
• 1,450-node cluster, dual-processor Dell PowerEdge 1750, Intel Xeon 3.06 GHz
• NCSA (National Center for Supercomputing Applications)
#5 PNNL’s MPP2
• 8.63 TFlops
• 980-node cluster, HP Longs Peak, dual Intel Itanium-2 1.5 GHz
• PNNL (Pacific Northwest National Laboratory)
• Application: Molecular Science
The Real No. 1: SETI@home, 68.06 TFlops !!!
Last updated: Fri Jan 23 01:33:45 2004

                                  Total                   Last 24 Hours
Users                             4,848,584               1,457 (new users)
Results received                  1,213,258,391           1,507,691
Total CPU time                    1,783,547.603 years     1,324.293 years
Floating Point Operations         4.315893e+21            5.879995e+18 (68.06 TeraFLOPs/sec)
Average CPU time per work unit    12 hr 52 min 39.4 sec   7 hr 41 min 39.9 sec
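The 68.06 TFlops headline rate follows from the last-24-hours operation count (a minimal Python check):

    # 5.879995e18 floating point operations delivered over one day
    ops = 5.879995e18
    seconds = 24 * 60 * 60        # 86,400 s
    print(ops / seconds / 1e12)   # ~68.06 TeraFLOPs/sec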
Science at Home
Evaluate AIDS drugs at home
• 9,020 users (12 Jan 2004)
• AutoDock: predicts how drug candidates might bind to a receptor on an HIV protein
Scientific Applications
• Always push computer technology to its limit
• Grand Challenge applications
– Applications that cannot be completed with sufficient accuracy and timeliness to be of interest, due to limitations such as speed and memory in current computing systems
• Next challenge: large-scale collaborative problems
E-Science: a new way to do science
• Pre-electronic science
– Theorize and/or experiment, in small teams
• Post-electronic science
– Construct and mine very large databases
– Develop computer simulations & analyses
– Access specialized devices remotely
– Exchange information within distributed multidisciplinary teams
Data Intensive Science: 2000-2015
• Scientific discovery increasingly driven by IT
– Computationally intensive analyses
– Massive data collections
– Data distributed across networks of varying capability
– Geographically distributed collaboration
• Dominant factor: data growth
– 2000: ~0.5 Petabyte
– 2005: ~10 Petabytes
– 2010: ~100 Petabytes
– 2015: ~1000 Petabytes?
• Storage density doubles every 12 months
• Transforming entire disciplines in physical and biological sciences
Network
• Network vs. computer performance
– Computer speed doubles every 18 months
– Network speed doubles every 9 months
– Difference: an order of magnitude every 5 years (see the check below)
• 1986 to 2000
– Computers: ×500
– Networks: ×340,000
• 2001 to 2010 (projected)
– Computers: ×60
– Networks: ×4,000
E-Science Infrastructure
[Diagram: e-Science infrastructure links sensor nets, data archives, computers, software, colleagues, and instruments.]
Online Access to Scientific Instruments
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
[Diagram: data from the Advanced Photon Source passes through real-time collection and tomographic reconstruction to archival storage and wide-area dissemination, reaching desktop & VR clients with shared controls.]
Data Intensive Physical Sciences
• High energy & nuclear physics
– Including new experiments at CERN
• Astronomy: digital sky surveys
• Time-dependent 3-D systems (simulation, data)
– Earth observation, climate modeling
– Geophysics, earthquake modeling
– Fluids, aerodynamic design
– Pollutant dispersal scenarios
Data Intensive Biology and Medicine
• Medical data
– X-ray imaging
– Digitizing patient records
• X-ray crystallography
• Molecular genomics and related disciplines
– Human Genome, other genome databases
– Proteomics (protein structure, activities, …)
– Protein interactions, drug delivery
• 3-D brain scans
Grid Computing
What is Grid?
Google search (Jan 2004):
• “grid computing”: >600,000 hits
• “grid computing” AND hype: >20,000 hits
(hype = overblown, misleading promotion)
From Web to Grid
• 1989: Tim Berners-Lee invented the web
– so physicists around the world could share documents
• 1999: Grids add to the web
– computing power, data management, instruments
– e-Science
– Commerce is not far behind
The Grid Opportunity: e-Science and e-Business
• Physicists worldwide pool resources for peta-op analyses of petabytes of data
• Engineers collaborate to design buildings, cars
• An insurance company mines data from partner hospitals for fraud detection
• An enterprise configures internal & external resources to support e-Business workload
Grid
• “We will give you access to some of our computers and instruments if you give us access to some of yours.”
• “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
Grid
• Grid provides the infrastructure
– to dynamically manage (see the sketch below):
• Compute resources
• Data sources (static and live)
• Scientific instruments (wind tunnels, telescopes, microscopes, simulators, etc.)
– to build large-scale collaborative problem-solving environments that are:
• cost-effective
• secure
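What “dynamically manage” means can be made concrete with a toy matchmaking sketch in Python: jobs state their requirements, institutions advertise shared resources, and a broker pairs them. All names here are hypothetical; real grids do this through middleware such as the Globus Toolkit™ or Condor.

    # Toy resource matchmaking (hypothetical names, not a real middleware API).
    resources = [
        {"site": "lab-A-cluster",   "cpus": 64, "has_instrument": False},
        {"site": "lab-B-telescope", "cpus": 4,  "has_instrument": True},
    ]

    def match(job):
        """Return the first advertised resource that satisfies the job."""
        for r in resources:
            if r["cpus"] >= job["cpus"] and (not job["needs_instrument"]
                                             or r["has_instrument"]):
                return r["site"]
        return None                  # no site in the virtual organization fits

    print(match({"cpus": 32, "needs_instrument": False}))  # lab-A-cluster
    print(match({"cpus": 2,  "needs_instrument": True}))   # lab-B-telescope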
Grid Applications
Life Sciences
[Diagram: imaging instruments, large databases, and computational resources, connected by a network, support data acquisition, processing, analysis, and advanced visualization.]
Biomedical applications
• Data mining on genomic databases (exponential growth)
• Indexing of medical databases (TB/hospital/year)
• Collaborative framework for large-scale experiments
• Parallel processing for
– database analysis
– complex 3-D modelling
Digital Radiology on the Grid
• 28 petabytes/year for 2,000 hospitals
• Must satisfy privacy laws
University of Pennsylvania
Brain Imaging
• Biomedical Informatics Research Network (BIRN): a reference set of brains provides essential data for developing therapies for neurological disorders (multiple sclerosis, Alzheimer’s disease)
• Pre-BIRN:
– One lab, small patient base
– 4 TB collection
• With TeraGrid:
– Tens of collaborating labs
– Larger population sample
– 400 TB data collection: more brains, higher resolution
– Multiple-scale data integration, analysis
Earth Observations
ESA missions:
• about 100 Gbytes of data per day (ERS 1/2)
• 500 Gbytes per day for the next ENVISAT mission
Particle Physics
• Simulate and reconstruct complex physics phenomena millions of times (see the sketch below)
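Each simulated event is independent, which is what makes the workload so grid-friendly: it splits naturally into work units that different sites can process in parallel. A minimal Python sketch of the task-farm pattern (toy “physics”, hypothetical splitting):

    # Toy task farm: independent event simulations grouped into work units.
    import random

    def simulate_event(seed):
        random.seed(seed)
        return random.gauss(91.2, 2.5)   # toy observable, e.g. a mass in GeV

    def work_unit(first, count):
        """One grid job: simulate a contiguous block of events."""
        return [simulate_event(i) for i in range(first, first + count)]

    # A broker would scatter blocks across sites; here we run two locally.
    results = work_unit(0, 1000) + work_unit(1000, 1000)
    print(sum(results) / len(results))   # ~91.2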
Whole-system Simulations
NASA Information Power Grid: coupling all the sub-system simulations of an aircraft (a toy coupling loop follows the list)
• engine models: thrust performance, reverse-thrust performance, responsiveness, fuel consumption
• wing models: lift capabilities, drag capabilities, responsiveness
• stabilizer models: deflection capabilities, responsiveness
• airframe models
• landing gear models: braking performance, steering capabilities, traction, dampening capabilities
• human models (crew capabilities): accuracy, perception, stamina, reaction times, SOPs
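“Coupling” here means that each sub-system model advances a timestep and the pieces exchange state before the next step. A minimal Python sketch with hypothetical toy models (the real IPG couples full engineering codes running at different centers):

    # Toy coupling loop; the model functions below are made-up placeholders.
    state = {"thrust": 0.0, "lift": 0.0, "speed": 100.0}

    def engine_model(s):
        s["thrust"] = 50.0                    # constant toy thrust

    def wing_model(s):
        s["lift"] = 0.01 * s["speed"] ** 2    # toy lift grows with speed^2

    def airframe_model(s):
        s["speed"] += 0.1 * s["thrust"] - 0.5 # toy acceleration minus drag

    for step in range(10):                    # exchange state every timestep
        engine_model(state)
        wing_model(state)
        airframe_model(state)
    print(state)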
National Airspace Simulation Environment
NASA Information Power Grid: aircraft, flight paths, airport operations and the environment are combined to form a virtual national airspace (VNAS)
• Sub-system models by center: GRC (engine models), LaRC (airframe and landing gear models), ARC (wing, stabilizer and human models)
• Input data: FAA ops data, weather data, airline schedule data, digital flight data, radar tracks, terrain data, surface data
• Simulation drivers: 22,000 commercial US flights a day, yielding 50,000 engine runs, 22,000 airframe impact runs, 132,000 landing/take-off gear runs, 48,000 human crew runs, 66,000 stabilizer runs, and 44,000 wing runs
Global In-flight Engine Diagnostics
Distributed Aircraft Maintenance Environment (DAME): Universities of Leeds, Oxford, Sheffield & York
[Diagram: in-flight engine data reaches a ground station over a global network (e.g. SITA), flows to a data centre and the DS&S Engine Health Center, and alerts the airline and maintenance centre via internet, e-mail, and pager.]
Emergency Response Teams
• Bring sensors, data, simulations and experts together
– wildfire: predict the movement of the fire & direct fire-fighters
– also earthquakes, peacekeeping forces, battlefields, …
Los Alamos National Laboratory: wildfire
National Earthquake Simulation Grid
Grid Computing Today
[Map: grid deployments around the world, including DISCOM, SinRG, APGrid, IPG, …]
Selected Major Grid Projects
• Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): Create & deploy group collaboration systems using commodity technologies
• BlueGrid (IBM): Grid testbed linking IBM laboratories
• DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): Create operational Grid providing access to resources at three U.S. DOE weapons laboratories
• DOE Science Grid (sciencegrid.org; DOE Office of Science): Create operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
• Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): Delivery and analysis of large climate model datasets for the climate research community
• European Union (EU) DataGrid (eu-datagrid.org; European Union): Create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics
Selected Major Grid Projects
• EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): Create technology for remote access to supercomputing resources & simulation codes; in GRIP, integrate with the Globus Toolkit™
• Fusion Collaboratory (fusiongrid.org; DOE Office of Science): Create a national computational collaboratory for fusion research
• Globus Project™ (globus.org; DARPA, DOE, NSF, NASA, Microsoft): Research on Grid technologies; development and support of the Globus Toolkit™; application and deployment
• GridLab (gridlab.org; European Union): Grid technologies and applications
• Grid Research Integration Development & Support Center (grids-center.org; NSF): Integration, deployment, support of the NSF Middleware Infrastructure for research & education
• GridPP (gridpp.ac.uk; U.K. eScience): Create & apply an operational grid within the U.K. for particle physics research
Selected Major Grid Projects
• Grid Application Development Software (hipersoft.rice.edu/grads; NSF): Research into program development technologies for Grid applications
• Grid Physics Network (griphyn.org; NSF): Technology R&D for data analysis in physics experiments: ATLAS, CMS, LIGO, SDSS
• Information Power Grid (ipg.nasa.gov; NASA): Create and apply a production Grid for aerosciences and other NASA missions
• International Virtual Data Grid Laboratory (ivdgl.org; NSF): Create international Data Grid to enable large-scale experimentation on Grid technologies & applications
• Network for Earthquake Engineering Simulation Grid (neesgrid.org; NSF): Create and apply a production Grid for earthquake engineering
• Particle Physics Data Grid (ppdg.net; DOE Science): Create and apply production Grids for data analysis in high energy and nuclear physics experiments
Selected Major Grid Projects
• TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
• UK Grid Support Center (grid-support.ac.uk; U.K. eScience): Support center for Grid projects within the U.K.
• Unicore (BMBFT): Technologies for remote access to supercomputers

Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS
See also www.gridforum.org
TeraGrid
• 13.6 trillion calculations per second
• Over 600 trillion bytes of immediately accessible data
• 40 gigabit per second network speed
TeraGrid
European DataGrid
[Map of >40 testbed sites: CERN, RAL, Lund, Lisboa, Santander, Madrid, Valencia, Barcelona, Paris, Berlin, Lyon, Grenoble, Marseille, Brno, Prague, Torino, Milano, BO-CNAF, PD-LNL, Pisa, Roma, Catania, ESRIN, IPSL, Estec, KNMI.]
UK e-Science Grid
[Map of e-Science Centers: Edinburgh, Glasgow, Newcastle, Belfast, Manchester, DL (Daresbury), Cambridge, Oxford, RAL, Hinxton, Cardiff, London, Soton (Southampton).]
Asia-Pacific Grid (APGrid)
[Map of APAN members: Japan, Korea, Taiwan, Thailand, Singapore, Malaysia, Australia, USA, Canada.]
Grid goes to business
• IBM, HP, Oracle, Sun, …
• www.ibm.com/grid
• www.hp.com/techservers/grid
• www.oracle.com/technologies/grid
• www.sun.com/grid
For More Information
• Globus Project™
– www.globus.org
• Grid Forum
– www.gridforum.org
• Book (Morgan Kaufmann)
– www.mkp.com/grids