CSC Site Update
HP Nordic TIG
April 2008
Janne Ignatius, Marko Myllynen, Dan Still
CSC at a Glance
• Founded in 1970 as a technical support unit for the Univac 1108
• Reorganized as a company, CSC - Scientific Computing Ltd., in 1993
• All shares transferred to the Ministry of Education of Finland in 1997
• Operates on a non-profit principle
• Facilities in Keilaniemi, Espoo, since March 2005
MISSION
CSC is the national IT center for science, developing and providing services for universities, research institutes, and industry.
VISION
CSC is well known and appreciated in Finland as well as abroad as a pioneer, collaboration partner, and center of competence in the field of IT for science.
CSC’s Services
• FUNET SERVICES
• COMPUTING SERVICES
• APPLICATION SERVICES
• DATA SERVICES FOR SCIENCE AND CULTURE
• INFORMATION MANAGEMENT SERVICES
Louhi - Cray XT4 Supercomputer
• 1st phase installed 04/2007
• 1012 compute nodes, each with a 2.6 GHz AMD Opteron dual-core processor
• High-bandwidth, low-latency interconnect (SeaStar2)
• 1-2 GB memory per core
• Peak performance 10.6 teraflops
• Final configuration (to be installed Q3/2008): core count open, 1-2 GB memory per core, peak performance 70+ teraflops
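For reference, the first-phase peak follows from the usual product of nodes, cores per node, clock rate, and flops per cycle; assuming 2 double-precision flops per cycle per Opteron core (our assumption, not stated on the slide):

    1012 \times 2 \times 2.6\,\mathrm{GHz} \times 2\,\tfrac{\mathrm{flops}}{\mathrm{cycle}} \approx 10.5\ \mathrm{Tflop/s}

close to the quoted 10.6 teraflops; the small gap suggests the official figure rounds its inputs slightly differently.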
Murska - HP CP4000 BL ProLiant Supercluster
• Installed 04/2007, expanded 11/2007
• 544 compute nodes, each with two 2.6 GHz AMD Opteron dual-core processors
• 2176 compute cores
• 4x DDR InfiniBand interconnect
• 5 TB total memory: 256 nodes * 4 GB, 128 * 8 GB, 128 * 16 GB, 32 * 32 GB
• 100 TB SFS/Lustre file system
• Peak performance 11.3 teraflops
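The listed totals are self-consistent (again assuming 2 flops per cycle per core):

    544 \times 2 \times 2 = 2176\ \text{cores}, \qquad 2176 \times 2.6\,\mathrm{GHz} \times 2\,\tfrac{\mathrm{flops}}{\mathrm{cycle}} \approx 11.3\ \mathrm{Tflop/s}

and the memory mix sums to 256*4 + 128*8 + 128*16 + 32*32 = 5120 GB = 5 TB.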
Murska - HP CP4000 BL ProLiant, cont.
• RHEL 4 based HP XC 3.1 cluster operating system
• SLURM/LSF, HP-MPI (a submission sketch follows this list)
• PGI, PathScale, GNU, TotalView, ACML, …
• HP Xtools, collectl, mpe2, …
• Blade hardware working surprisingly well
• Interconnect working nicely
• Disk system also working OK after initial issues
  • MSA20 disk array failure recovery suboptimal
  • SFS quota still limited to 4 TB
• System constantly in heavy use
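To make the software stack above concrete, here is a minimal MPI program as it would typically be built with HP-MPI and launched through LSF/SLURM on an HP XC system. The compile and submission lines in the comments are a sketch only; the core count, output file, and launcher flags are illustrative assumptions, not Murska documentation.

    /*
     * Minimal MPI check for an HP XC style stack.
     * Build (sketch):            mpicc -o hello hello.c
     * Submit via LSF (sketch):   bsub -n 8 -o out.%J mpirun -srun ./hello
     */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }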
Murska - HP CP4000 BL Availability
• Three unexpected breaks after the Nov 2007 upgrades:
  • 29.1.2008: SFS hang, fixed with a disk array reset
  • 30.1.2008: Ethernet switch died (in the cabinet where several power supplies had died a few days earlier)
  • 12.3.2008: SFS hang, fixed with a disk array reset
• System availability since Nov 2007: 95%-100%
• System usage since Nov 2007: 30%-100%
Sepeli - HP ProLiant DL145 Cluster
• Installed 2005
• 128 (earlier 256) compute nodes
• 512 cores and 2 TB memory
• 4x DDR InfiniBand / GigE interconnect
• 4 TB PVFS2 / NFS disk system
• Peak performance 3.1 teraflops
• Earlier part of the national M-grid, now being dedicated to LHC use (particle collision data analysis)
Sepeli - HP ProLiant DL145 Cluster, cont.
• RHEL 4 based Rocks 3.1 cluster operating system
• SGE
• Overall system lifespan price/performance quite satisfactory
• InfiniBand hardware very stable
• Tight Grid Engine integration with multiple MPI flavors is labor-intensive (a configuration sketch follows this list)
• DL145 iLO initially unreliable, improved over time
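For context on that integration effort: tight integration in Grid Engine means defining a separate parallel environment per MPI flavor, with control_slaves enabled so remote ranks are started under qrsh and accounted for by the scheduler. A minimal sketch follows; the PE name, slot count, and startup script paths are hypothetical examples, not Sepeli's actual configuration.

    # Hypothetical Grid Engine parallel environment (qconf -sp style output)
    pe_name            mpich_tight
    slots              512
    start_proc_args    /opt/sge/mpi/startmpi.sh $pe_hostfile
    stop_proc_args     /opt/sge/mpi/stopmpi.sh
    allocation_rule    $round_robin
    control_slaves     TRUE      # launch ranks via qrsh for tight integration
    job_is_first_task  FALSE

Repeating and validating this setup for each MPI flavor is where the labor goes.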
Material Sciences National Grid Infrastructure (M-grid)
• A joint project of CSC, 7 Finnish universities, and the Helsinki Institute of Physics, funded by the Academy of Finland under the National Research Infrastructure Program in the Grid area
• Aims to build a homogeneous PC-cluster environment with a theoretical peak of approx. 3 teraflops across 350 nodes
• Environment:
  • Hardware: provided by HP; dual AMD Opteron 1.8-2.2 GHz nodes with 2-8 GB memory, 1-2 TB shared storage, separate 2x GigE (communications and NFS), remote administration
  • OS: NPACI Rocks Cluster Distribution / 64-bit, based on Red Hat Enterprise Linux 3, 4
  • Grid middleware: NorduGrid ARC compiled with Globus 3.2.1 libraries, Sun Grid Engine as LRMS (a job-submission sketch follows this list)
  • Centrally managed configuration with Cfengine
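To make the middleware concrete: jobs enter an ARC-based grid like M-grid as xRSL descriptions submitted with the classic NorduGrid client tools. A minimal sketch, with a made-up cluster hostname and job contents for illustration:

    # Hypothetical xRSL job description (hello.xrsl)
    &(executable="/bin/echo")
     (arguments="hello from M-grid")
     (stdout="hello.out")
     (jobname="arc-test")

    # Submit, poll, and fetch results (sketch):
    #   ngsub -c cluster.example.org hello.xrsl
    #   ngstat -a
    #   ngget <jobid>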
CSC:
• Administration tasks
• Maintains the operating system, LRMS, Grid middleware, and certain libraries
• Separate small test cluster for testing new software releases
• Tools for system monitoring, integrity checking, etc.
Some international activities
• PRACE
• DEISA
• EGEE, EGI, NDGF, HPC-EUROPA, …
Thank You!
Questions?