1
Hong Kong Grid and Grid Research and Deployment
香港网格及网格技術研究及實踐
Francis C.M. Lau (劉智滿博士) (with C.L. Wang and Roy Ho)
Department of Computer Science (and Information Systems)
The University of Hong Kong
香港大學計算機科學系
2
HKU Systems Research Group 香港大學系統研究組
“Small group, big research” (雷声小,雨点大: “little thunder, heavy rain”)
“Much shouting, little doing” (喊得多,做的少) – Jin Hai (金海)
3
Agenda
The Hong Kong Grid
Grid research at HKU
- InstantGrid/SLIM
- JESSICA2
- G-JavaMPI
- LOTS DSM
- G-Pass
Conclusion
4
Hong Kong Grid 香港网格
5
HK & Grid
HK as a regional hub (one of Asia's networking hubs)
Interconnecting major cities in Asia Pacific & US
Few restrictions (nobody interferes!)
6
Hong Kong Grid (HKGrid)
Goals:
- To construct a grid test bed in HK
  - Grid R&D
  - Institutions, government, industry
  - Partners in China and Asia-Pacific
- To act as gateway for grids nearby
- To demonstrate key R&D outcomes
Supported by grants from the HKSAR government, HKU, etc.
http://www.hkgrid.org/
7
HKGrid - Current Constituents
Institutions | Computing facilities
The HK Institute of HPC (高性能計算中心) | Service gateway
HKU – CS Department (香港大學計算機科學系) | Pentium 4 x 300 (#175 in TOP500, 11/2002)
HKU – Computer Centre (香港大學計算中心) | 2-way Xeon SMP x 128 (#240 in TOP500, 11/2003)
The HK Polytechnic University (理工大學) | Service gateway
HK University of Science and Technology (科技大學) | 4-way SMP cluster
HK Baptist University (浸會大學) | 2-way Xeon SMP x 64 (#300 in TOP500, 6/2003)
City University of HK (城市大學) | Service gateway
Aggregate theoretical peak computing power: 4 Tflop/s
8
HKGrid – Network Connections
- Links to China National Grid (CNGrid) and Asia-Pacific Grid (ApGrid) via CERNET and APAN
- Plan to connect to China Grid (if Prof. JH lets us)
- Internet2 connection to the Abilene backbone at Chicago, USA
- Plays the role of a gateway for the other bigger grids
9
Performance Monitoring with Ganglia
URL: http://gideon.csis.hku.hk/hkgrid/
10
HKGrid Launched in Cluster2003
Main Organizers of HKGrid; with Dr. Z.W. Xu of CNGrid
Cluster2003 International Conference
11
Progress
Oct-Dec 2003
- HKGrid Certificate Authority (CA) and middleware (GT and monitoring software) installed
- HKGrid officially launched in Cluster2003
- Weather forecasting demo with AIST (Japan) and Kasetsart University (Thailand)
- Climate simulation demo in the 5th PRAGMA
Jan-Apr 2004
- HK Supercomputing Directory released
- SLIM demonstration for HK Science and Technology Parks, HK Linux Industry Association, HK Government
Ongoing work
- Deploy our advanced grid platform in HKGrid
- Interoperability with CNGrid, ApGrid, and other country grids
12
Research Projects with HKGrid
- HKBU (浸大): Knowledge grid, autonomous grid service composition
- HKCU (中大): Agent-based wireless grid computing
- HKPU (理大): Peer-to-peer grid, meta-scheduling, fault tolerance
- HKUST (科大): Resource allocation and scheduling, topology optimization
- HKU (港大)
  - Computer Centre: HKU campus grid; scientific applications running across the ApGrid
  - CS: Robust Speech Recognition (Dr. Q. Huo)
  - CS: Simulation for the DNA Shuffling Experiment (Dr. T.W. Lam)
  - CS: Approximate String Matching on DNA Sequences (L.L. Cheng)
  - CS: Whole Genome Alignment via Mutation-Sensitive Sequence Similarity (Dr. T.W. Lam)
  - CS: HKU Grid Point (A 863 Project: China National Grid)
  - CS: Asia-Pacific Grid
  - ETI: Modeling of Air Quality in Hong Kong (E-Business Technology Institute with the Environmental Protection Department, HKSAR)
  - ME: Parallel Simulation of Turbulent Flow Model (Dr. C.H. Liu, Dept. of Mechanical Engineering)
13
Other Adoptions in Hong Kong
- Local financial institutions model the foreign exchange market and forecast exchange rates
- The Environmental Protection Department has attempted to investigate the inter-connections of the air pollution mosaic through numerical simulation (since 2001)
- Government plans to harness grid technologies to utilize idle PCs during off-hours
- Applications!!!!
14
Grid Facilities at HKU 港大网格設施
15
HKU Grid Point: Grid and Cluster Software
- Grid middleware: Globus Toolkit (GT) 2.0, 2.4, 3.0.1
- Job scheduling: OpenPBS 2.3.16, Maui 3.2.5
- Programming: HPF, Fortran 90; C, C++, Java with MPI; JESSICA2 (HKU)
- Communication library: MPICH-G2 1.2.3
[Diagram: remote job submission → gatekeeper (gideon.csis.hku.hk) → local job scheduler → clusters CS Gideon, CS Ostrich, CS Srgdell, and CC HPCPower, linked by IPC / network communication]
16
Computer Centre - HPCPOWER
IBM 2-way Xeon x 128; Ranked #240 in TOP500, 11/2003
17
CS Department – “Self-Made” Gideon 300 PC cluster
Pentium 4 x 300; ranked #175 in TOP500 (11/2002) and #340 (6/2003)
18
HKU in CNGrid (863 Project)
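China National Grid Participants
- Shanghai Supercomputer Center (上海超级计算中心)
- Institute of Computing Technology, Chinese Academy of Sciences (中科院计算所)
- The University of Hong Kong, CS (香港大学)
- Xi'an Jiaotong University (西安交通大学)
- University of Science and Technology of China (中国科技大学)
- National University of Defense Technology (国防科技大学)
- Institute of Applied Physics, Chinese Academy of Sciences (中科院应用物理所)
- Tsinghua University (清华大学)
Supporting software:
- VEGA (织女星) GOS: dynamic service deployment, single-sign-on, data replication, and performance monitoring
- Developed by Institute of Computing Technology, Chinese Academy of Sciences
- V.1.0 released 8
The grid system software developed by the Institute of Computing Technology, CAS, has connected the grid nodes of the Institute, Huazhong University of Science and Technology, and the University of Hong Kong, via VEGA_GOS …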
19
ApGrid Test Bed – Weather Forecasting
[Figure: test bed map; site labels read 160, 80, 32, 32, 40, 16]
20
Visits, demos, making noises … (2003)
21
Grid Research at HKU 香港大學网格研究項目
22
Grid Computing : A Refresher
Grid computing: access to remote resources via standard protocols for cross-domain collaboration
- Resource providers offer CPU power, memory, network, storage, data, services, …
- End users consume them
Computing as “utilities”, like electricity, water, etc.
Advantages:
- Cost-effectiveness
- Platform extensibility
- Convenience (Plug & Play)
23
Potential Applications
HPC: High-energy/particle physics, environmental science, bioinformatics, molecular modeling, drug design, neuroscience, weather forecasting, aerospace design, earthquake simulation, …
Grid Services: Video-conferencing, e-learning, supply chain mgmt., automobile manufacturing, OLTP front ends, CRM, financial analysis, … and many others after convergence of OGSA and WS (i.e., WSRF)
24
Our Position
Will all these apps be naturally supported by interconnecting the computing resources?
Spec. of “commodity” grid platforms ≠ ideal execution environments for apps
Grid middleware to bridge the gap
Existing middleware provides the mechanisms to access remote resources, but does not address many fundamental problems
We aim to derive solutions to these problems, and incorporate them to form an advanced grid platform (AGP)
25
Commodity Grids Difficult to Use
Heterogeneous & dynamic
- Load balancing
- How to distribute the work?
- Meta-scheduling vs. local autonomy
Poor programmability
- Inconsistent software configuration (OS, library, middleware, …)
- Collective computation?
Complicated security management
- O(nm) for n grid points and m users/apps
- User-to-host authentication?
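The O(nm) figure on this slide can be illustrated by counting account mappings: if each of n grid points keeps a separate entry for each of m users or apps, the total grows as n·m. The sketch below compares that with a hypothetical consolidated scheme in which each user registers once and each grid point records one trust entry (n + m). The consolidated model and both function names are assumptions made for illustration, not a description of any HKGrid tool.

```python
def per_host_entries(n_grid_points: int, m_users: int) -> int:
    """Every grid point stores one mapping per user: n * m entries."""
    return n_grid_points * m_users


def consolidated_entries(n_grid_points: int, m_users: int) -> int:
    """Assumed consolidated model: m registrations plus n trust entries."""
    return n_grid_points + m_users


if __name__ == "__main__":
    for n, m in [(5, 100), (10, 1000)]:
        print(f"n={n}, m={m}: per-host={per_host_entries(n, m)}, "
              f"consolidated={consolidated_entries(n, m)}")
```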
26
HKU’s Advanced Grid Platform: Goals (我們的目標)
- Load balancing
- On-demand grid construction
- “Grid-friendly” programming
- Flexible execution environments
- Consolidated security
27
Core Components/Projects
- G-JavaMPI: load balancing, work (re-)distribution mechanisms
- JESSICA2 and LOTS: SSI for programmability
  - JESSICA: LAN-based distributed JVMs
  - LOTS: WAN-based DSM for grid
- InstantGrid: on-demand grid construction
- SLIM: execution environments mgmt., dissemination mechanisms
- G-Pass: grid-wide VO-centric security
[Figure legend: production use, experimental, prototype stage]
28
InstantGrid on SLIM
On-demand construction of grid points with customized execution environments
29
SLIM – Single Linux System Mgmt.
A network service for managing and constructing execution environments (EE’s), and disseminating them to remote computing platforms
- Grid computing decouples computing platforms (resources) and computing logic (applications)
- I.e., a single platform can run completely different applications
- Problem: different applications demand different execution environments (OS, shared libraries, supporting apps, etc.)
- The troubles of managing EE’s on the resource provider’s side offset the benefits of resource sharing
30
SLIM – System design
How does it work?
- A node sends an EE specification across the network to find the Boot server
- Boot server delivers the requested Linux kernel
- Image server constructs an EE by collecting shared libraries, user data, etc.
- Linux kernel boots, and contacts the Image Server to “mount” the EE via a file synchronization protocol such as NFS
- Aggressive caching techniques are deployed to optimize performance
“On demand”
“Get what you need”
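The boot sequence on this slide can be sketched as a small in-memory simulation. This is not SLIM's code: `EESpec`, `BootServer`, `ImageServer`, and `boot_node` are hypothetical names, the NFS "mount" is replaced by returning a file list, and caching is modeled as a plain dictionary keyed by the specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EESpec:
    """Hypothetical EE specification: a kernel name plus required packages."""
    kernel: str
    packages: tuple


@dataclass
class BootServer:
    kernels: dict  # kernel name -> kernel image (bytes, for illustration)

    def deliver_kernel(self, spec: EESpec) -> bytes:
        return self.kernels[spec.kernel]


@dataclass
class ImageServer:
    repository: dict                           # package name -> file names
    cache: dict = field(default_factory=dict)  # EESpec -> assembled file list
    builds: int = 0                            # how many EEs were constructed

    def construct_ee(self, spec: EESpec) -> list:
        # Cache assembled EEs so repeated requests skip the construction step.
        if spec not in self.cache:
            files = []
            for pkg in spec.packages:
                files.extend(self.repository[pkg])
            self.cache[spec] = sorted(files)
            self.builds += 1
        return self.cache[spec]


def boot_node(spec: EESpec, boot_server: BootServer, image_server: ImageServer) -> dict:
    kernel = boot_server.deliver_kernel(spec)   # kernel delivery
    root_fs = image_server.construct_ee(spec)   # EE construction and "mount"
    return {"kernel": kernel, "root_fs": root_fs}


if __name__ == "__main__":
    boot = BootServer({"linux-2.4": b"kernel-image"})
    images = ImageServer({"glibc": ["libc.so.6"], "globus": ["globus-gatekeeper"]})
    spec = EESpec("linux-2.4", ("glibc", "globus"))
    print(boot_node(spec, boot, images))
    print(boot_node(spec, boot, images))        # second node hits the cache
    print("EEs constructed:", images.builds)
```

Keying the cache by the whole specification means that two nodes asking for the same environment share one construction pass, which is the effect the "aggressive caching" bullet is after.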
31
InstantGrid: On-Demand Grid Point Construction
- Integrates SLIM with GT, PBS, and Ganglia
- Modular design, supports any grid middleware
- Configure-before-disseminate, simplifies grid point management
- On-demand construction of grid points, virtually effortless on the clients’ and compute nodes’ part
- Customizable EE, consistent across the entire grid
32
Performance Evaluation
InstantGrid:
- Construct a 256-node grid point (Linux + GT + PBS + Ganglia) from scratch (PXE enabled) through Fast Ethernet: 5 minutes
- Generate host certificates for 256 machines: 9 minutes
- Total time: 14 minutes
SLIM:
- 272 PCs in < 5 minutes (Linux only)
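A quick check of the InstantGrid figures above, as a sketch: it adds the two phases and derives the average certificate time per machine. The per-machine average assumes the 9 minutes are spread evenly over the 256 hosts, which the slide does not state.

```python
NODES = 256
CONSTRUCT_MIN = 5   # Linux + GT + PBS + Ganglia over Fast Ethernet
CERT_MIN = 9        # host certificate generation

total_min = CONSTRUCT_MIN + CERT_MIN
cert_seconds_per_node = CERT_MIN * 60 / NODES   # assumes an even spread
cert_share = CERT_MIN / total_min

print(f"total: {total_min} min")
print(f"certificate time per machine: {cert_seconds_per_node:.2f} s")
print(f"certificates as share of total: {cert_share:.0%}")
```

On these numbers, certificate generation rather than system dissemination accounts for most of the 14 minutes.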
33
InstantGrid/SLIM: Current Progress
- Released to the general public in April 2004
- >150 downloads, from Mainland China, Hong Kong, Macau, Taiwan, USA, and Singapore
  - HKSAR Government bodies, academic institutions and high schools, software development firms, and private companies
- InstantGrid/SLIM has been managing:
  - the HKU-CSIS grid point (350 nodes) for various grid research projects
  - an additional 300+ lab machines for teaching purposes (different courses have different requirements)
34
InstantGrid/SLIM: Future Work
To overcome the challenges in deploying InstantGrid over broadband links
“Pervasive grid computing”
Standard for EE specification
Negotiation protocols among grid points to compromise on EE spec.
Platform extensibility
35
InstantGrid/SLIM – Key References
http://slim.csis.hku.hk/
R.S.C. Ho, C.M. Lee, D.H.F. Hung, C.L. Wang, and F.C.M. Lau, “Managing Execution Environments for Utility Computing,” Network Research Workshop, APAN 2004, July, 2004
Try it
(LinuxPilot 2004/04)
36
JESSICA2
A Java-Enabled Single-System Image Computing Architecture
37
A Form of “Mobile” Computing
- Applications should not be stationary: they should take advantage of the multiplicity of distributed resources and achieve efficiency (e.g., load balancing)
- If threads and processes can be migrated, then applications can, and multi-process/thread apps can execute in real/enhanced parallelism – Amoeba!
- This applies best to certain (many?) grid apps
- Support for dynamic process/thread migration should be built in from the ground up
38
JESSICA2
A Distributed Java Virtual Machine (DJVM) consisting of a group of extended JVMs running in a distributed environment
Supports true parallel execution of a multithreaded Java application
Java threads can freely move across node boundaries and execute in parallel
Grid as a single machine – Single System Image (SSI)
39
JESSICA2 Architecture
[Architecture diagram: a multithreaded Java program runs on top of six JESSICA2 JVMs (one master, five workers), connected by thread migration and a global object space. Labels: JIT compiler mode, portable Java frame]
40
JESSICA2 Main Features
Transparent Java thread migration
- Runtime capturing and restoring of thread execution context
- No source code modification; no bytecode instrumentation (preprocessing); no new API introduced
- Enables dynamic load balancing on clusters
Full-speed computation
- JITEE: cluster-aware bytecode execution engine
- Operates in Just-In-Time (JIT) compilation mode
- Zero cost if no migration
Transparent remote object access
- Global Object Space: a shared global heap spanning all cluster nodes
- Adaptive migrating-home protocol for memory consistency + various optimizing schemes
- I/O redirection
41
Ray Tracing on JESSICA2 (64 PCs)
Linux 2.4.18-3 kernel (Redhat 7.3)
64 nodes: 108 seconds
1 node: 4402 seconds (≈ 1.2 hours)
Speedup = 4402/108 ≈ 40.76
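The speedup on this slide, together with the parallel efficiency it implies on 64 nodes, can be recomputed as follows. Efficiency is not quoted on the slide; it is derived here as speedup divided by node count.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of single-node time to parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, nodes: int) -> float:
    """Speedup per node; 1.0 would be ideal linear scaling."""
    return speedup(t_serial, t_parallel) / nodes


s = speedup(4402, 108)
e = efficiency(4402, 108, 64)
print(f"speedup = {s:.2f}, efficiency = {e:.1%}")
# prints: speedup = 40.76, efficiency = 63.7%
```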
42
JESSICA – Key references
W.Z. Zhu, C.L. Wang, and F.C.M. Lau, “A Lightweight Solution for Transparent Java Thread Migration in Just-in-Time Compilers,” The 2003 International Conference on Parallel Processing (ICPP-2003), Taiwan, Oct. 6-10, 2003, pp. 465-472.
W.Z. Zhu, C.L. Wang, and F.C.M. Lau, “JESSICA2: A Distributed Java Virtual Machine with Transparent Thread Migration Support,” IEEE Fourth International Conference on Cluster Computing (CLUSTER 2002), Chicago, USA, September 23-26, 2002, pp. 381-388.
M.J.M. Ma, C.L. Wang, and F.C.M. Lau, “JESSICA: Java-Enabled Single-System-Image Computing Architecture,” Journal of Parallel and Distributed Computing, Vol. 60, No. 10, October 2000, pp. 1194-1222.
43
G-JavaMPI
A grid-enabled Java-MPI system with dynamic load-balancing via process migration
44
G-JavaMPI
Goal: load balancing for grid
- Grid’s heterogeneity & dynamicity
- Poor parallelization of programs
A grid-enabled implementation of Java binding of MPI
- Transparent Java process migration (through JVMDI)
- Balance both CPU and network loads
- Communication-aware process migration policies based on:
  - application’s communication pattern
  - available network bandwidth on grid overlays
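One way to read "communication-aware" is that a migration target is scored on both compute speed and the cost of talking to peers from that node. The sketch below is an assumed cost model, not G-JavaMPI's actual policy: `estimated_time` and `choose_target` are hypothetical names, and the linear "work / speed + bytes / bandwidth" estimate is chosen only for illustration.

```python
def estimated_time(candidate, work, cpu_speed, traffic, placement, bandwidth):
    """Estimated time per step if the process ran on `candidate`.

    work      -- compute units per step for this process
    cpu_speed -- node -> compute units per second
    traffic   -- peer process -> bytes exchanged per step
    placement -- peer process -> node it currently runs on
    bandwidth -- (node_a, node_b) -> bytes per second; same-node traffic is free
    """
    compute = work / cpu_speed[candidate]
    comm = 0.0
    for peer, nbytes in traffic.items():
        peer_node = placement[peer]
        if peer_node != candidate:
            comm += nbytes / bandwidth[(candidate, peer_node)]
    return compute + comm


def choose_target(candidates, **model):
    """Pick the candidate node with the lowest estimated step time."""
    return min(candidates, key=lambda node: estimated_time(node, **model))


if __name__ == "__main__":
    model = dict(
        work=8.0,
        cpu_speed={"A": 1.0, "B": 4.0, "C": 2.0},
        traffic={"p1": 100.0},
        placement={"p1": "C"},
        bandwidth={("A", "C"): 100.0, ("B", "C"): 10.0},
    )
    for node in "ABC":
        print(node, estimated_time(node, **model))
    print("target:", choose_target(["A", "B", "C"], **model))
```

In this made-up example node B has the fastest CPU but a slow link to the peer, so the co-located node C wins; a CPU-only policy would have picked B.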
45
G-JavaMPI – System design
[System design diagram: three grid points connected over a WAN, each with a Gatekeeper and an LS; steps (1)-(3) and (1*)-(3*) are marked in the figure]
- Migrating: restarting a new process through Globus remote job request with delegated user credentials and Java-MPI job credentials
- Java-MPI communication: some legacy messages are redirected during migration
- Migration module (M) resides in each JVM
46
G-JavaMPI – Ongoing and Future Work
- The migration mechanism has been implemented
- Requests for source code from universities in China, Germany, and Singapore
- Future work targets process migration policies
  - CPU and network heterogeneities cause long “blocking” periods in cooperative processes, thus limiting the system throughput
  - G-JavaMPI aims to detect and eliminate “blocking” through process migration (e.g., migrating a “bottleneck” process to a faster node)
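The "detect and eliminate blocking" idea can be sketched as follows, under stated assumptions: each process reports how long it spent blocked waiting on each peer, the most-awaited process is treated as the bottleneck, and it is moved only if a faster node exists. The measurement format, the threshold, and the names `find_bottleneck` and `plan_migration` are hypothetical; the slide lists such policies as future work.

```python
def find_bottleneck(wait_time):
    """wait_time maps (waiter, awaited) -> seconds the waiter spent blocked."""
    blamed = {}
    for (_waiter, awaited), seconds in wait_time.items():
        blamed[awaited] = blamed.get(awaited, 0.0) + seconds
    return max(blamed, key=blamed.get) if blamed else None


def plan_migration(wait_time, placement, cpu_speed, threshold=1.0):
    """Return (process, target_node), or None if no migration is worthwhile."""
    proc = find_bottleneck(wait_time)
    if proc is None:
        return None
    total_blocked = sum(
        seconds for (_, awaited), seconds in wait_time.items() if awaited == proc
    )
    fastest = max(cpu_speed, key=cpu_speed.get)
    # Skip migration when blocking is negligible or no faster node is available.
    if total_blocked < threshold or cpu_speed[fastest] <= cpu_speed[placement[proc]]:
        return None
    return proc, fastest


if __name__ == "__main__":
    waits = {("p0", "p2"): 3.0, ("p1", "p2"): 2.5, ("p2", "p0"): 0.2}
    placement = {"p0": "n0", "p1": "n1", "p2": "n2"}
    speeds = {"n0": 2.0, "n1": 2.0, "n2": 1.0, "n3": 3.0}
    print(plan_migration(waits, placement, speeds))
```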
47
LOTS
Large Object Space (DSM) on Grid
48
LOTS
[Diagram: five nodes, each a LOTS / OS / H/W stack, sharing one Large Global Object Space]
LOTS: Large Object Space on Grid
- A software distributed shared memory (DSM) system for the grid
- Provides a large distributed memory space > the process space
- Uses local hard disk to store recently unused objects
- Scope Consistency + Home Migration to reduce redundant data traffic
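The "local hard disk for recently unused objects" point can be illustrated with a single-node sketch: an object store that keeps a bounded number of objects in memory and spills the least recently used ones to disk. This is not LOTS itself; it models neither distribution, scope consistency, nor home migration, and `SpillingObjectStore` is a hypothetical name.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingObjectStore:
    """Keeps at most `capacity` objects in memory; spills the rest to disk."""

    def __init__(self, capacity, directory=None):
        self.capacity = capacity
        self.directory = directory or tempfile.mkdtemp(prefix="objstore-")
        self.in_memory = OrderedDict()   # object id -> object, oldest first
        self.on_disk = set()             # ids currently spilled to disk

    def _path(self, obj_id):
        # Object ids are assumed to be safe to use as file names.
        return os.path.join(self.directory, f"{obj_id}.pkl")

    def _evict_if_needed(self):
        while len(self.in_memory) > self.capacity:
            victim_id, victim = self.in_memory.popitem(last=False)
            with open(self._path(victim_id), "wb") as fh:
                pickle.dump(victim, fh)
            self.on_disk.add(victim_id)

    def put(self, obj_id, obj):
        self.in_memory[obj_id] = obj
        self.in_memory.move_to_end(obj_id)   # mark as most recently used
        self.on_disk.discard(obj_id)
        self._evict_if_needed()

    def get(self, obj_id):
        if obj_id in self.in_memory:
            self.in_memory.move_to_end(obj_id)
            return self.in_memory[obj_id]
        if obj_id not in self.on_disk:
            raise KeyError(obj_id)
        with open(self._path(obj_id), "rb") as fh:
            obj = pickle.load(fh)
        self.put(obj_id, obj)                # bring it back into memory
        return obj


if __name__ == "__main__":
    store = SpillingObjectStore(capacity=2)
    for name in ("a", "b", "c"):
        store.put(name, [name] * 3)
    print("in memory:", list(store.in_memory), "on disk:", sorted(store.on_disk))
    print(store.get("a"))
```

With this layout the usable object space is bounded by disk capacity rather than by the process's memory, which is the property the slide attributes to LOTS.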
49
G-Pass
Virtual organization-centric grid security
50
G-Pass
- Multi-agent and some dynamic grid systems (e.g., G-JavaMPI) demand flexible authentication schemes
  - “User-to-host” too limited
  - Identity of “VO”?
- In G-Pass, each process is given a G-Pass credential, which is valid in a pre-defined, grid-wide security context
  - Examples of context: names of VO, file access privileges, valid period, etc.
- G-Pass forms a foundation for secure process migration within and across grid points, which provides the needed support for our G-JavaMPI, JESSICA, and LOTS projects
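A per-process credential bound to a security context might look roughly like the sketch below. The slide does not describe G-Pass's credential format or cryptography, so everything here is an assumption: the context fields mirror the slide's examples (VO name, file access privileges, valid period), and a shared-key HMAC stands in for whatever signing mechanism the real system uses.

```python
import hashlib
import hmac
import json
import time

VO_KEY = b"demo-secret-shared-by-the-VO"  # hypothetical stand-in for real key material


def _sign(context: dict) -> str:
    payload = json.dumps(context, sort_keys=True).encode()
    return hmac.new(VO_KEY, payload, hashlib.sha256).hexdigest()


def issue_credential(process_id, vo, privileges, valid_for_s, now=None):
    """Bind a process to a VO, a set of privileges, and an expiry time."""
    now = time.time() if now is None else now
    context = {
        "process": process_id,
        "vo": vo,
        "privileges": sorted(privileges),
        "not_after": now + valid_for_s,
    }
    return {"context": context, "tag": _sign(context)}


def is_valid(credential, vo, privilege, now=None):
    """Check integrity, VO membership, privilege, and validity period."""
    now = time.time() if now is None else now
    context = credential["context"]
    return (
        hmac.compare_digest(_sign(context), credential["tag"])
        and context["vo"] == vo
        and privilege in context["privileges"]
        and now <= context["not_after"]
    )


if __name__ == "__main__":
    cred = issue_credential("proc-42", "hkgrid-vo", ["read:/data"], valid_for_s=60)
    print(is_valid(cred, "hkgrid-vo", "read:/data"))    # within the valid period
    print(is_valid(cred, "hkgrid-vo", "write:/data"))   # privilege not granted
```

Because the check depends only on the credential and the VO's key, not on the host the process happens to run on, a migrated process could present the same credential at its new grid point.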
51
Summary
Performance
- G-JavaMPI & JESSICA establish extensible grid platforms
- Process/thread migration enables performance optimization and load balancing
- LOTS supports shared memory programming environment on large object space (data grid applications)
Reliability
- G-JavaMPI migrates processes from failed machines
- InstantGrid/SLIM help construct platforms for failover
Convenience
- G-JavaMPI, JESSICA, and LOTS enhance programmability
- InstantGrid/SLIM simplify grid point management
Security
- G-Pass consolidates grid-wide security and enables application mobility
52
Conclusion
Grid computing is a relatively new paradigm that deserves further investigation
We identify and address the fundamental research issues in grid computing
Our advanced grid computing platform is geared for deployment in production grid systems
53
To Find Out More
• Hong Kong Grid: http://www.hkgrid.org/
• Grid Computing Research Portal: http://grid.csis.hku.hk/
• The HKU Systems Research Group: http://www.srg.csis.hku.hk/
• The HK Supercomputing Directory: http://www.hkhpc.org/~SuperDir/
54
Thanks! 谢谢!
HKU Systems Research Group (12/2003)香港大學系統研究組