Talk Outline
– Where
– Why
– What
– How
– Who
Where
– B513
» Main Computer Room, ~1,500m2 & 1.5kW/m2, built for mainframes in 1970, upgraded for LHC PC clusters 2003-2005.
» Second ~1,200m2 room created in the basement in 2003 as additional space for LHC clusters and to allow ongoing operations during the main room upgrade. Cooling limited to 500W/m2.
– Tape Robot building ~50m from B513
» Constructed in 2001 to avoid loss of all CERN data due to an incident in B513.
Why
– Support
» Laboratory computing infrastructure
Campus networks—general purpose and technical
Home directory, email & web servers (10k+ users)
Administrative computing servers
» Physics computing services
Interactive cluster
Batch computing
Data recording, storage and management
Grid computing infrastructure
Physics Computing Requirements
CPU: 25,000k SI2K in 2008, rising to 56,000k in 2010
– 2,500-3,000 boxes
– 500kW-600kW @ 200W/box
– (2.5MW @ 0.1W/SI2K)
Disk: 6,800TB online in 2008, 11,800TB in 2010
– 1,200-1,500 boxes
– 600kW-750kW
Tape: 15PB of data per year
– 30,000 500GB cartridges/year
– Five 6,000-slot robots/year
Sustained data recording at up to 2GB/s
– Over 250 tape drives and associated servers
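As a quick cross-check of the arithmetic behind these figures, a minimal sketch (all numbers are taken from the bullets above; nothing else is assumed):

```python
# Rough capacity arithmetic for the 2008 CPU and tape requirements above.
cpu_si2k = 25_000_000        # 25,000k SI2K required in 2008
watts_per_box = 200          # per-box power draw from the slide
boxes_low, boxes_high = 2500, 3000

# Power envelope for the CPU farm: 2,500-3,000 boxes at 200 W each.
print(f"CPU farm power: {boxes_low * watts_per_box / 1e3:.0f}-"
      f"{boxes_high * watts_per_box / 1e3:.0f} kW")        # -> 500-600 kW

# Implied efficiency of the farm, versus the 0.1 W/SI2K figure,
# at which 25M SI2K would need 2.5 MW.
print(f"~{boxes_high * watts_per_box / cpu_si2k:.3f} W/SI2K")
print(f"At 0.1 W/SI2K: {cpu_si2k * 0.1 / 1e6:.1f} MW")

# Tape: 15 PB/year on 500 GB cartridges, in 6,000-slot robots.
print(f"Cartridges/year: {15_000_000 / 500:,.0f}")          # -> 30,000
print(f"6,000-slot robots/year: {30_000 / 6_000:.0f}")      # -> 5
```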
What are the major issues?
– Commodity equipment from multiple vendors
– Large scale clusters
– Infrastructure issues
» Power and cooling
» Limited budget
Commodity equipment & many vendors
Given the requirements, there is significant pressure to limit cost per SI2K and cost per TB.
Open tender purchase process
– Requirements expressed in terms of box performance
– Reliability criteria are seen as subjective and so are difficult to incorporate in the process.
» Also, as internal components are similar, are branded boxes intrinsically more reliable?
Cost requirements and the tender process lead to “white box” equipment, not branded.
The tender purchase process also leads to frequent changes of bidder.
– Good: there is competition and we aren’t reliant on a single supplier.
– Bad: we must deal with many companies, most of whom are remote and subcontract maintenance services.
Large Scale Clusters
The large number of boxes leads to problems in terms of
– Maintaining software homogeneity across the clusters
– Maintaining services despite the inevitable failures
– Logistics
» Boxes arrive in batches of O(500)
» Are vendors respecting the contractual warranty times? (Have they returned the box we sent them last week…) See the sketch after this list.
» How to manage service upgrades, especially as not all boxes for a service will be up at the time of upgrade
– …
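As an illustration of the warranty-tracking problem, a minimal sketch (the serial numbers, vendor names, dates and the 10-day contractual turnaround are all invented for illustration):

```python
# Sketch of warranty-repair tracking for O(500)-box batches: flag vendor
# returns that exceed the contractual turnaround. All data is illustrative.
from datetime import date, timedelta

CONTRACT_TURNAROUND = timedelta(days=10)   # assumed contractual repair time

repairs = [  # (serial, vendor, date sent, date returned or None)
    ("CH-10023", "vendorA", date(2006, 3, 1), date(2006, 3, 8)),
    ("CH-10457", "vendorB", date(2006, 3, 2), None),    # still out
    ("CH-10102", "vendorA", date(2006, 2, 20), None),   # still out
]

today = date(2006, 3, 14)
for serial, vendor, sent, returned in repairs:
    out_for = (returned or today) - sent
    if out_for > CONTRACT_TURNAROUND:
        status = "returned late" if returned else "OVERDUE"
        print(f"{serial} ({vendor}): out {out_for.days} days - {status}")
```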
Infrastructure Issues
Cooling capacity limits the equipment we can install (see the sketch after this list)
– Maximum cooling of 1.5kW/m2
– 40x1U servers @ 200W/box = 8kW/m2
We cannot provide diesel backup for the full computer centre load.
– Swiss/French auto-transfer covers most failures.
– Dedicated zone for “critical equipment” with diesel backup and dual power supplies.
» Limited to 250kW for networks and laboratory computing infrastructure…
» …and physics services such as Grid and data management servers, but not all of the physics network, so careful planning is needed for switch/router allocations and power connections.
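A worked version of the cooling mismatch above (a sketch; the 1 m2 rack footprint is an assumed round number for illustration):

```python
# Heat density of a full rack of 1U servers versus the room's cooling limit.
servers_per_rack = 40
watts_per_server = 200
rack_footprint_m2 = 1.0          # assumed footprint per rack, for illustration

rack_kw = servers_per_rack * watts_per_server / 1e3   # 8 kW per rack
density = rack_kw / rack_footprint_m2                 # 8 kW/m2
cooling_limit = 1.5                                   # kW/m2, room maximum

print(f"Rack density: {density:.1f} kW/m2 vs limit {cooling_limit} kW/m2")
# Either spread the load over more floor area per rack...
print(f"Floor area needed per full rack: {rack_kw / cooling_limit:.1f} m2")
# ...or part-fill the racks.
print(f"Max 1U servers per rack within the limit: "
      f"{int(cooling_limit * rack_footprint_m2 * 1e3 / watts_per_server)}")
```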
ELFms
Extremely Large Farm management system
– manages nodes throughout their lifecycle:
» deliver the required configuration
» monitor performance and any deviation from the required state
» track nodes through hardware and software state changes
Three components:
– quattor for configuration, installation and node management
– Lemon for system and service monitoring
– Leaf for managing state changes—both hardware (HMS) and software (SMS)
[Diagram: ELFms scope, spanning node configuration management and node management.]
quattor
quattor takes care of the configuration, installation and management of nodes.
– A Configuration Database holds the ‘desired state’ of all fabric elements
» Node setup (CPU, HD, memory, software RPMs/PKGs, network, system services, location, audit info…)
» Cluster (name and type, batch system, load balancing info…)
» Defined in templates arranged in hierarchies—common properties set only once
– Autonomous management agents running on the node take care of
» Base installation
» Service (re-)configuration
» Software installation and management
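To illustrate the “templates arranged in hierarchies” idea, a minimal sketch in Python (real quattor templates are written in its own Pan language; the template names and fields here are invented):

```python
# Minimal sketch of hierarchical configuration templates: common properties
# are set once in a base template and refined by cluster/node templates.
# Names and fields are illustrative, not real CERN templates.

def merge(*templates):
    """Later templates override or extend earlier ones (deep merge)."""
    result = {}
    for tpl in templates:
        for key, value in tpl.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge(result[key], value)
            else:
                result[key] = value
    return result

common = {"os": "SLC3", "packages": {"ssh": "3.9", "monitoring-agent": "1.0"}}
batch_cluster = {"cluster": "lxbatch", "packages": {"batch-client": "2.1"}}
node = {"hostname": "lxb0001", "hardware": {"cpu": 2, "disk_gb": 80}}

# The node's 'desired state' profile, as the configuration database
# would compile it from the template hierarchy.
profile = merge(common, batch_cluster, node)
print(profile["packages"])   # common packages plus the cluster's additions
```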
quattor architecture
[Architecture diagram: a Configuration server hosts the CDB (SQL and XML backends, accessed via CLI, GUI, scripts and SOAP) and publishes XML configuration profiles over HTTP. An Install server, driven by the Install Manager, delivers the base OS to the system installer via HTTP/PXE. SW server(s) hold the SW Repository of RPMs, fetched over HTTP. On the Managed Nodes, the Node Configuration Manager (NCM) runs components (CompA, CompB, CompC) that configure services (ServiceA, ServiceB, ServiceC), while the SW Package Manager (SPMA) manages the RPMs/PKGs.]
Lemon
Lemon (LHC Era Monitoring) is a client-server tool suite for monitoring status and performance, comprising
– sensors to measure the values of various metrics
» Several sensors exist covering node performance, processes, hardware and software monitoring, database monitoring, security and alarms
» “External” sensors for metrics such as hardware errors and computer centre power consumption
– a monitoring agent running on each node, which manages the sensors and sends data to the central repository
– a central repository storing the full monitoring history
» two implementations, Oracle or flat-file based
– an RRD based display framework
» Pre-processes data into rrd files and creates cluster summaries, including “virtual” clusters such as the set of nodes being used by a given experiment
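A minimal sketch of the agent/sensor split described above (the class names, metric names and transport are invented; the real Lemon sensor API and wire protocol differ):

```python
# Sketch of a node-side monitoring agent driving pluggable sensors and
# shipping samples to a central repository. Names are illustrative only.
import os
import time

class LoadSensor:
    metric = "load.1min"
    def sample(self):
        return os.getloadavg()[0]          # 1-minute load average (Unix)

class DiskSensor:
    metric = "disk.root.free_bytes"
    def sample(self):
        stat = os.statvfs("/")             # free space on the root filesystem
        return stat.f_bavail * stat.f_frsize

class MonitoringAgent:
    def __init__(self, hostname, sensors, send):
        self.hostname, self.sensors, self.send = hostname, sensors, send

    def run_once(self):
        now = time.time()
        for sensor in self.sensors:
            # One (host, metric, timestamp, value) record per sensor.
            self.send((self.hostname, sensor.metric, now, sensor.sample()))

# 'print' stands in for the UDP/TCP transport to the central repository.
agent = MonitoringAgent("lxb0001", [LoadSensor(), DiskSensor()], print)
agent.run_once()
```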
Lemon architecture
[Architecture diagram: on each node, a Monitoring Agent manages the Sensors and forwards measurements over TCP/UDP to the central Monitoring Repository, which persists them through an SQL repository backend. Correlation Engines and the Lemon CLI access the repository via SOAP. An RRDTool/PHP display layer served by apache delivers the data over HTTP to web browsers on user workstations.]
Leaf
LEAF (LHC Era Automated Fabric) is a collection of workflows for high-level node hardware and software state management, built on top of quattor and Lemon.
– HMS (Hardware Management System)
» Tracks systems through all physical steps in the lifecycle, e.g. installation, moves, vendor calls, retirement
» Automatically issues install, retire etc. requests to technicians
» GUI to locate equipment physically
» The HMS implementation is CERN specific, but the concepts and design should be generic
– SMS (State Management System)
» Automated handling (and tracking) of high-level configuration steps, e.g.
Reconfigure and reboot all LXPLUS nodes for a new kernel and/or physical move
Drain and reconfigure nodes for diagnosis / repair operations
» Issues all necessary (re)configuration commands via quattor
» Extensible framework—plug-ins for site-specific operations are possible
– CCTracker (in development)
» Shows the physical location of equipment in the machine room
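A sketch of the SMS idea: high-level node states with guarded transitions, each transition issuing its (re)configuration via quattor (the state names and the quattor call are invented stand-ins, not the real SMS interface):

```python
# Sketch of a state management system: nodes move between high-level states,
# and each transition triggers the required (re)configuration.

ALLOWED = {
    "standby":    {"production"},
    "production": {"draining", "standby"},
    "draining":   {"standby"},            # drained nodes land in standby
}

def quattor_reconfigure(node, state):
    """Stand-in for issuing the real (re)configuration commands via quattor."""
    print(f"reconfiguring {node} for state '{state}'")

class SMS:
    def __init__(self):
        self.state = {}                   # node -> current state

    def set_state(self, node, target):
        current = self.state.get(node, "standby")
        if target not in ALLOWED[current]:
            raise ValueError(f"{node}: cannot go {current} -> {target}")
        quattor_reconfigure(node, target) # reconfigure before the flip
        self.state[node] = target

sms = SMS()
sms.set_state("lxb0001", "production")
sms.set_state("lxb0001", "draining")      # drain for diagnosis/repair
```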
Use Case: Move rack of machines
[Sequence diagram between Operations/Sysadmins, HMS, SMS, the CDB, the LAN DB and the Node:
1. Import; 2. Set to standby; 3. Update; 4. Refresh; 5. Take out of production; 6. Shutdown work order; 7. Request move; 8. Update; 9. Update; 10. Install work order; 11. Set to production; 12. Update; 13. Refresh; 14. Put into production.]
Who
– Contract Shift Operators: 1 person, 24x7
– Technician level System Administration Team
» 10 team members, plus 3 people for machine room operations, plus an engineer level manager
– Engineer level teams for Physics computing
» System & Hardware support: approx 10 FTE
» Service support: approx 10 FTE
» ELFms software: 3 FTE plus students and collaborators
~30 FTE-years total investment since 2001
Summary
Physics requirements, budget and the tendering process lead to large scale clusters of commodity hardware.
We have developed and deployed tools to install, configure and monitor nodes, and to automate hardware and software lifecycle steps.
Services must cope with individual node failures
– already the case for simple services such as batch
– new data management software is being introduced to reduce reliance on individual servers
– the focus is now on grid level services
We believe we are well prepared for LHC computing
– but we expect managing this large scale, complex environment to be an exciting adventure