Talk Outline
– Where
– Why
– What
– How
– Who
Where
– B513
» Main Computer Room, ~1,500m2 & 1.5kW/m2, built for mainframes in 1970, upgraded for LHC PC clusters 2003-2005.
» Second ~1,200m2 room created in the basement in 2003 as additional space for LHC clusters and to allow ongoing operations during the main room upgrade. Cooling limited to 500W/m2.
– Tape Robot building ~50m from B513
» Constructed in 2001 to avoid loss of all CERN data due to an incident in B513.
Why
– Support
» Laboratory computing infrastructure
Campus networks—general purpose and technical
Home directory, email & web servers (10k+ users)
Administrative computing servers
» Physics computing services
Interactive cluster
Batch computing
Data recording, storage and management
Grid computing infrastructure
Physics Computing Requirements
CPU: 25,000k SI2K in 2008, rising to 56,000k in 2010
– 2,500-3,000 boxes
– 500kW-600kW @ 200W/box
– (2.5MW @ 0.1W/SI2K)
Disk: 6,800TB online in 2008, 11,800TB in 2010
– 1,200-1,500 boxes
– 600kW-750kW
Tape: 15PB of data per year
– 30,000 500GB cartridges/year
– Five 6,000-slot robots/year
Sustained data recording at up to 2GB/s
– Over 250 tape drives and associated servers
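As a quick cross-check of the arithmetic behind these figures, a minimal sketch (all numbers are taken from the bullets above; nothing else is assumed):

```python
# Rough capacity arithmetic for the 2008 CPU and tape requirements above.
cpu_si2k = 25_000_000        # 25,000k SI2K required in 2008
watts_per_box = 200          # per-box power draw from the slide
boxes_low, boxes_high = 2500, 3000

# Power envelope for the CPU farm: 2,500-3,000 boxes at 200 W each.
print(f"CPU farm power: {boxes_low * watts_per_box / 1e3:.0f}-"
      f"{boxes_high * watts_per_box / 1e3:.0f} kW")        # -> 500-600 kW

# Implied efficiency of the farm, versus the 0.1 W/SI2K figure,
# at which 25M SI2K would need 2.5 MW.
print(f"~{boxes_high * watts_per_box / cpu_si2k:.3f} W/SI2K")
print(f"At 0.1 W/SI2K: {cpu_si2k * 0.1 / 1e6:.1f} MW")

# Tape: 15 PB/year on 500 GB cartridges, in 6,000-slot robots.
print(f"Cartridges/year: {15_000_000 / 500:,.0f}")          # -> 30,000
print(f"6,000-slot robots/year: {30_000 / 6_000:.0f}")      # -> 5
```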
What are the major issues?
– Commodity equipment from multiple vendors
– Large scale clusters
– Infrastructure issues
» Power and cooling
» Limited budget
Commodity equipment & many vendors
Given the requirements, there is significant pressure to limit cost per SI2K and cost per TB.
Open tender purchase process
– Requirements expressed in terms of box performance
– Reliability criteria are seen as subjective and so are difficult to incorporate in the process.
» Also, as internal components are similar, are branded boxes intrinsically more reliable?
Cost requirements and the tender process lead to “white box” equipment, not branded.
The tender purchase process also leads to frequent changes of bidder.
– Good: there is competition and we aren’t reliant on a single supplier.
– Bad: we must deal with many companies, most of whom are remote and subcontract maintenance services.
Large Scale Clusters
The large number of boxes leads to problems in terms of
– Maintaining software homogeneity across the clusters
– Maintaining services despite the inevitable failures
– Logistics
» Boxes arrive in batches of O(500)
» Are vendors respecting the contractual warranty times? (Have they returned the box we sent them last week…) See the sketch after this list.
» How to manage service upgrades, especially as not all boxes for a service will be up at the time of upgrade
– …
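As an illustration of the warranty-tracking problem, a minimal sketch (the serial numbers, vendor names, dates and the 10-day contractual turnaround are all invented for illustration):

```python
# Sketch of warranty-repair tracking for O(500)-box batches: flag vendor
# returns that exceed the contractual turnaround. All data is illustrative.
from datetime import date, timedelta

CONTRACT_TURNAROUND = timedelta(days=10)   # assumed contractual repair time

repairs = [  # (serial, vendor, date sent, date returned or None)
    ("CH-10023", "vendorA", date(2006, 3, 1), date(2006, 3, 8)),
    ("CH-10457", "vendorB", date(2006, 3, 2), None),    # still out
    ("CH-10102", "vendorA", date(2006, 2, 20), None),   # still out
]

today = date(2006, 3, 14)
for serial, vendor, sent, returned in repairs:
    out_for = (returned or today) - sent
    if out_for > CONTRACT_TURNAROUND:
        status = "returned late" if returned else "OVERDUE"
        print(f"{serial} ({vendor}): out {out_for.days} days - {status}")
```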
Infrastructure Issues
Cooling capacity limits the equipment we can install (see the sketch after this list)
– Maximum cooling of 1.5kW/m2
– 40x1U servers @ 200W/box = 8kW/m2
We cannot provide diesel backup for the full computer centre load.
– Swiss/French auto-transfer covers most failures.
– Dedicated zone for “critical equipment” with diesel backup and dual power supplies.
» Limited to 250kW for networks and laboratory computing infrastructure…
» …and physics services such as Grid and data management servers, but not all of the physics network, so careful planning is needed for switch/router allocations and power connections.
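A worked version of the cooling mismatch above (a sketch; the 1 m2 rack footprint is an assumed round number for illustration):

```python
# Heat density of a full rack of 1U servers versus the room's cooling limit.
servers_per_rack = 40
watts_per_server = 200
rack_footprint_m2 = 1.0          # assumed footprint per rack, for illustration

rack_kw = servers_per_rack * watts_per_server / 1e3   # 8 kW per rack
density = rack_kw / rack_footprint_m2                 # 8 kW/m2
cooling_limit = 1.5                                   # kW/m2, room maximum

print(f"Rack density: {density:.1f} kW/m2 vs limit {cooling_limit} kW/m2")
# Either spread the load over more floor area per rack...
print(f"Floor area needed per full rack: {rack_kw / cooling_limit:.1f} m2")
# ...or part-fill the racks.
print(f"Max 1U servers per rack within the limit: "
      f"{int(cooling_limit * rack_footprint_m2 * 1e3 / watts_per_server)}")
```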
ELFms
Extremely Large Farm management system
– manages nodes throughout their lifecycle:
» deliver the required configuration
» monitor performance and any deviation from the required state
» track nodes through hardware and software state changes
Three components:
– quattor for configuration, installation and node management
– Lemon for system and service monitoring
– Leaf for managing state changes—both hardware (HMS) and software (SMS)
[Diagram: ELFms scope, spanning node configuration management and node management.]
quattor
quattor takes care of the configuration, installation and management of nodes.
– A Configuration Database holds the ‘desired state’ of all fabric elements
» Node setup (CPU, HD, memory, software RPMs/PKGs, network, system services, location, audit info…)
» Cluster (name and type, batch system, load balancing info…)
» Defined in templates arranged in hierarchies—common properties set only once
– Autonomous management agents running on the node take care of
» Base installation
» Service (re-)configuration
» Software installation and management
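To illustrate the “templates arranged in hierarchies” idea, a minimal sketch in Python (real quattor templates are written in its own Pan language; the template names and fields here are invented):

```python
# Minimal sketch of hierarchical configuration templates: common properties
# are set once in a base template and refined by cluster/node templates.
# Names and fields are illustrative, not real CERN templates.

def merge(*templates):
    """Later templates override or extend earlier ones (deep merge)."""
    result = {}
    for tpl in templates:
        for key, value in tpl.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge(result[key], value)
            else:
                result[key] = value
    return result

common = {"os": "SLC3", "packages": {"ssh": "3.9", "monitoring-agent": "1.0"}}
batch_cluster = {"cluster": "lxbatch", "packages": {"batch-client": "2.1"}}
node = {"hostname": "lxb0001", "hardware": {"cpu": 2, "disk_gb": 80}}

# The node's 'desired state' profile, as the configuration database
# would compile it from the template hierarchy.
profile = merge(common, batch_cluster, node)
print(profile["packages"])   # common packages plus the cluster's additions
```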
quattor architecture
[Architecture diagram: a Configuration server hosts the CDB (SQL and XML backends, accessed via CLI, GUI, scripts and SOAP) and publishes XML configuration profiles over HTTP. An Install server, driven by the Install Manager, delivers the base OS to the system installer via HTTP/PXE. SW server(s) hold the SW Repository of RPMs, fetched over HTTP. On the Managed Nodes, the Node Configuration Manager (NCM) runs components (CompA, CompB, CompC) that configure services (ServiceA, ServiceB, ServiceC), while the SW Package Manager (SPMA) manages the RPMs/PKGs.]
Lemon
Lemon (LHC Era Monitoring) is a client-server tool suite for monitoring status and performance, comprising
– sensors to measure the values of various metrics
» Several sensors exist covering node performance, processes, hardware and software monitoring, database monitoring, security and alarms
» “External” sensors for metrics such as hardware errors and computer centre power consumption
– a monitoring agent running on each node, which manages the sensors and sends data to the central repository
– a central repository storing the full monitoring history
» two implementations, Oracle or flat-file based
– an RRD based display framework
» Pre-processes data into rrd files and creates cluster summaries, including “virtual” clusters such as the set of nodes being used by a given experiment
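A minimal sketch of the agent/sensor split described above (the class names, metric names and transport are invented; the real Lemon sensor API and wire protocol differ):

```python
# Sketch of a node-side monitoring agent driving pluggable sensors and
# shipping samples to a central repository. Names are illustrative only.
import os
import time

class LoadSensor:
    metric = "load.1min"
    def sample(self):
        return os.getloadavg()[0]          # 1-minute load average (Unix)

class DiskSensor:
    metric = "disk.root.free_bytes"
    def sample(self):
        stat = os.statvfs("/")             # free space on the root filesystem
        return stat.f_bavail * stat.f_frsize

class MonitoringAgent:
    def __init__(self, hostname, sensors, send):
        self.hostname, self.sensors, self.send = hostname, sensors, send

    def run_once(self):
        now = time.time()
        for sensor in self.sensors:
            # One (host, metric, timestamp, value) record per sensor.
            self.send((self.hostname, sensor.metric, now, sensor.sample()))

# 'print' stands in for the UDP/TCP transport to the central repository.
agent = MonitoringAgent("lxb0001", [LoadSensor(), DiskSensor()], print)
agent.run_once()
```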
Lemon architecture
[Architecture diagram: on each node, a Monitoring Agent manages the Sensors and forwards measurements over TCP/UDP to the central Monitoring Repository, which persists them through an SQL repository backend. Correlation Engines and the Lemon CLI access the repository via SOAP. An RRDTool/PHP display layer served by apache delivers the data over HTTP to web browsers on user workstations.]
Leaf
LEAF (LHC Era Automated Fabric) is a collection of workflows for high-level node hardware and software state management, built on top of quattor and Lemon.
– HMS (Hardware Management System)
» Tracks systems through all physical steps in the lifecycle, e.g. installation, moves, vendor calls, retirement
» Automatically issues install, retire etc. requests to technicians
» GUI to locate equipment physically
» The HMS implementation is CERN specific, but the concepts and design should be generic
– SMS (State Management System)
» Automated handling (and tracking) of high-level configuration steps, e.g.
Reconfigure and reboot all LXPLUS nodes for a new kernel and/or physical move
Drain and reconfigure nodes for diagnosis / repair operations
» Issues all necessary (re)configuration commands via quattor
» Extensible framework—plug-ins for site-specific operations are possible
– CCTracker (in development)
» Shows the physical location of equipment in the machine room
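A sketch of the SMS idea: high-level node states with guarded transitions, each transition issuing its (re)configuration via quattor (the state names and the quattor call are invented stand-ins, not the real SMS interface):

```python
# Sketch of a state management system: nodes move between high-level states,
# and each transition triggers the required (re)configuration.

ALLOWED = {
    "standby":    {"production"},
    "production": {"draining", "standby"},
    "draining":   {"standby"},            # drained nodes land in standby
}

def quattor_reconfigure(node, state):
    """Stand-in for issuing the real (re)configuration commands via quattor."""
    print(f"reconfiguring {node} for state '{state}'")

class SMS:
    def __init__(self):
        self.state = {}                   # node -> current state

    def set_state(self, node, target):
        current = self.state.get(node, "standby")
        if target not in ALLOWED[current]:
            raise ValueError(f"{node}: cannot go {current} -> {target}")
        quattor_reconfigure(node, target) # reconfigure before the flip
        self.state[node] = target

sms = SMS()
sms.set_state("lxb0001", "production")
sms.set_state("lxb0001", "draining")      # drain for diagnosis/repair
```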
Use Case: Move rack of machines
[Sequence diagram between Operations/Sysadmins, HMS, SMS, the CDB, the LAN DB and the Node:
1. Import; 2. Set to standby; 3. Update; 4. Refresh; 5. Take out of production; 6. Shutdown work order; 7. Request move; 8. Update; 9. Update; 10. Install work order; 11. Set to production; 12. Update; 13. Refresh; 14. Put into production.]
Who
– Contract Shift Operators: 1 person, 24x7
– Technician level System Administration Team
» 10 team members, plus 3 people for machine room operations, plus an engineer level manager
– Engineer level teams for Physics computing
» System & Hardware support: approx 10 FTE
» Service support: approx 10 FTE
» ELFms software: 3 FTE plus students and collaborators
~30 FTE-years total investment since 2001
Summary
Physics requirements, budget and the tendering process lead to large scale clusters of commodity hardware.
We have developed and deployed tools to install, configure and monitor nodes, and to automate hardware and software lifecycle steps.
Services must cope with individual node failures
– already the case for simple services such as batch
– new data management software is being introduced to reduce reliance on individual servers
– the focus is now on grid level services
We believe we are well prepared for LHC computing
– but we expect managing this large scale, complex environment to be an exciting adventure