Page 1:

Petabyte-scale computing challenges of the LHCb experiment

UK e-Science All Hands Meeting 2008, Edinburgh,

9th September 2008

Y. Y. Li on behalf of the LHCb collaboration

Page 2:

Outline

The questions…
The LHC – the experiments taking up the challenge
The LHCb experiment at the LHC
LHCb computing model
Data flow, processing requirements
Distributed computing in LHCb
Architecture and functionality
Performance

Page 3:

The questions…

The Standard Model of particle physics explains much of the interactions between the fundamental particles that form the Universe, with all experiments so far confirming its predictions.

BUT many questions still remain…

How does gravity fit into the model? Where does all the mass come from? Why do we have a Universe made up of matter? Does dark matter exist and how much?

Search for phenomena beyond our current understanding

Go back to the 1st billionth of a second after the BIG BANG…

Page 4:

The Large Hadron Collider

100 m below the surface on the Swiss/French border
14 TeV proton-proton collider, 7× higher than previous machines
1,232 superconducting magnets chilled to -271.3 °C
4 experiments/detectors

[Diagram: the 27 km LHC ring beneath the French/Swiss border, with the four detectors ALICE, ATLAS, CMS and LHCb.]

~25 years after its first proposal…

1st circulating beam tomorrow!

1st collisions in October 2008.

Page 5:

LHCb

[Detector diagram: the proton beams (p p) cross inside LHCb.]

VErtex LOcator – locates b-decay vertices; operates only ~5 mm from the beam

Ring Imaging CHerenkov detector – particle ID; the human eye takes ~100 photos/s, the RICH takes 40 million photos/s

The LHC beauty experiment: a special-purpose detector to search for:
New physics in very rare b-quark decays
Investigates particle-antiparticle asymmetry

~1 trillion bb pairs per year!

Page 6:

Data flow

Five main LHCb applications (C++: Gauss, Boole, Brunel, DaVinci; Python: Bender)

Gauss – event generation and detector simulation → Sim
Boole – digitization → RAW (the same format as the RAW data flow from the detector)
Brunel – reconstruction → DST
DaVinci / Bender – analysis → statistics

Gauss, Boole and Brunel run as production jobs, with detector calibrations as input; DaVinci and Bender run as analysis jobs.

Sim – simulation data format; DST – Data Storage Tape

Page 7:

CPU times

Online:
40 million collisions (events) per second
2,000 interesting events selected per second = 50 MB/s of data transferred and stored

Offline:
Full reconstruction – 150 MB processed per second of running
Full simulation reconstruction – 100 MB per event
DST – 500 KB per event
Full simulation to DST – 80 s per event (2.8 GHz Xeon processor): ~100 years for 1 CPU to simulate 1 s of real data!

962 physicists, 56 institutes on 4 continents
10^7 s of data taking per year + simulation = ~O(PB) of data per year
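As a quick back-of-envelope check of these figures, here is a sketch that uses only the numbers quoted above:

    # CPU cost of simulating one second of real data.
    collision_rate = 40e6          # collisions (events) per second
    sim_cost = 80.0                # CPU-seconds per fully simulated event (2.8 GHz Xeon)
    seconds_per_year = 3600 * 24 * 365
    print(collision_rate * sim_cost / seconds_per_year)   # ~101 -> "~100 years for 1 CPU"

    # RAW data volume from one year of data taking, before adding simulation.
    storage_rate = 50e6            # bytes per second transferred and stored
    data_taking = 1e7              # seconds of data taking per year
    print(storage_rate * data_taking / 1e15)              # ~0.5 PB per year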

Page 8:

LHCb computing structure

Tier 0 – CERN: raw data, ~3K CPUs
Tier 1 – large centres (RAL UK, PIC Spain, IN2P3 France, GridKA Germany, NIKHEF Netherlands, CNAF Italy): reconstruction and analysis, ~15K CPUs
Tier 2 – universities (~34): simulations, ~19K CPUs
Tier 3/4 – laptops, desktops, etc.: simulations

Detector RAW data transfer: 10 MB/s
Simulation data transfer: 1 MB/s

Needs distributed computing

Page 9:

LHCb Grid Middleware - DIRAC

LHCb’s grid middleware: Distributed Infrastructure with Remote Agent Control

Written in Python; multi-platform (Linux, Windows); built with common grid tools
GSI (Grid Security Infrastructure) authentication

Pulls together all resources, shared with other experiments, using an experiment-wide CPU fair share

Optimises CPU usage across:
Long, steady simulation jobs run by production managers
Chaotic analysis usage by individual users

Page 10:

DIRAC architecture

Service-oriented architecture with 4 parts:
User interface
Services
Agents
Resources

Uses a pull strategy for assigning CPUs: free, stable CPUs request jobs from the main server.
Useful for masking the instability of resources from users.
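A toy, in-memory illustration of this pull strategy (the queue and job names are made up for the sketch; the real DIRAC services are network services, not a local queue):

    # Free CPUs ask the central matcher for work instead of the server pushing jobs out.
    import queue

    job_queue = queue.Queue()                  # the main server's list of waiting jobs
    for name in ["sim_0001", "sim_0002", "user_analysis_0007"]:
        job_queue.put(name)

    def request_job():
        # A free, stable CPU calls in and asks for a job (pull, not push).
        try:
            return job_queue.get_nowait()
        except queue.Empty:
            return None                        # nothing waiting; the resource stays idle

    while (job := request_job()) is not None:
        print("running", job)                  # an unstable resource that never calls in
                                               # simply never receives work, so users never notice it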

Page 11:

[Architecture diagram: a combination of DIRAC and non-DIRAC services, spanning Linux-based and multi-platform components, with web monitoring.]

Page 12:

Security and data access

DISET, the DIRAC SEcuriTy module:
Uses OpenSSL and a modified pyOpenSSL
Allows proxy support for secure access
A DISET portal is used to facilitate secure access on the platforms where the authentication process is OS dependent
Platform binaries are shipped with DIRAC; the version is determined during installation

Various data access protocols are supported: SRM, GridFTP, .NetGridFTP on Windows, etc.

Data services operate on the main server. Each file is assigned a logical file name that maps to its physical file name(s).
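A toy illustration of that logical-to-physical mapping, with a plain dictionary standing in for the file catalogue (the LFN and replica URLs are invented for the example):

    # One logical file name (LFN) can map to several physical replicas at different sites.
    catalogue = {
        "/lhcb/user/y/yingying/example.dst": [
            "srm://storage.site-a.example/lhcb/user/y/yingying/example.dst",
            "gsiftp://storage.site-b.example/lhcb/user/y/yingying/example.dst",
        ],
    }

    def replicas(lfn):
        # Return every physical file name registered for a logical file name.
        return catalogue.get(lfn, [])

    print(replicas("/lhcb/user/y/yingying/example.dst"))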

Page 13:

Compute element resources

Other grids, e.g. WLCG (Worldwide LHC Computing Grid) – Linux machines
Local batch systems, e.g. Condor
Stand-alone desktops, laptops, etc.

Windows:
3 sites so far, ~100 CPUs
Windows Server, Windows Compute Cluster, Windows XP
~90% of the world's computers run Windows

Page 14:

Pilot agents

Used to access other grid resources, e.g. WLCG via gLite

User job triggers pilot agent submission by DIRAC as a ‘grid job’ to reserve CPU time

The pilot on the worker node (WN) checks the environment before retrieving the user job from the DIRAC WMS.

Advantages:
Easy control of CPU quotas for shared resources
Several pilot agents can be deployed for the same job if a failure occurs on the WN
If the full reserved CPU time is not used, another job can be retrieved from the DIRAC WMS
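A sketch of a pilot agent's life on a worker node; the helper names are hypothetical placeholders, not the actual DIRAC WMS calls:

    # Sketch only: function names are placeholders, not the DIRAC API.
    import time

    RESERVED_CPU_SECONDS = 3600          # CPU time reserved when the pilot was submitted as a 'grid job'

    def environment_ok():
        return True                      # e.g. check disk space, the software area and the grid proxy

    def fetch_job_from_wms():
        # Stand-in for asking the DIRAC WMS matcher for a waiting user job.
        return {"name": "user_analysis_0007", "estimated_seconds": 600}

    def run(job):
        time.sleep(0.01)                 # stand-in for actually executing the payload
        return job["estimated_seconds"]

    used = 0
    if environment_ok():
        # Keep pulling jobs while the reserved CPU slot still has time left in it.
        while used < RESERVED_CPU_SECONDS:
            job = fetch_job_from_wms()
            if job is None or used + job["estimated_seconds"] > RESERVED_CPU_SECONDS:
                break
            used += run(job)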

Page 15:

Agents on Windows

Windows resources – CPU scavenging

Non-LHC-dedicated CPUs: spare CPUs at universities, private home computers, etc.

Agent launch would be triggered by, e.g., a screen saver.

CPU resource contribution is determined by the owner during DIRAC installation.

[Diagram: Windows Compute Cluster setup. The head node runs the DIRAC agent, job wrapper, watch-dog, local SE and proxy server, and communicates via DISET with the DIRAC WMS services (Job Management, Sandbox, Job Matcher, Job Monitoring, LFC, Software Repository). External communication goes through the head node; internal communication stays within the cluster.]

Windows Compute Cluster:
Shared single DIRAC installation
The Job Wrapper submits retrieved jobs via Windows CC submission calls
Local job scheduling is determined by the Windows CC scheduling service
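A minimal sketch of how the job wrapper might hand a retrieved job to the Windows CC scheduler. Windows Compute Cluster Server does provide a 'job submit' command-line call, but the wrapper function and its arguments below are illustrative assumptions, not the actual DIRAC code:

    # Sketch only: hand a retrieved DIRAC payload to the Windows CC scheduling service.
    import subprocess

    def submit_to_windows_cc(batch_script):
        # The CC scheduler, not DIRAC, then decides when and where the job runs locally.
        return subprocess.call(["job", "submit", batch_script])

    # submit_to_windows_cc("run_retrieved_dirac_job.bat")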

Page 16:

Cross-platform submissions

Submissions are made with a valid grid proxy.

Three ways:
JDL (Job Description Language)
DIRAC API
Ganga job management system – built on DIRAC API commands; full porting to Windows is in progress

JDL:

SoftwarePackages = { "DaVinci.v19r12" };
InputSandbox = { "DaVinci.opts" };
InputData = { "LFN:/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst" };
JobName = "DaVinci_1";
Owner = "yingying";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = { "std.out", "std.err", "DaVinci_v19r12.log", "DVhbook.root" };
JobType = "user";

DIRAC API:

import DIRAC
from DIRAC.Client.Dirac import *

dirac = Dirac()
job = Job()

job.setApplication('DaVinci', 'v19r12')
job.setInputSandbox(['DaVinci.opts'])
job.setInputData(['LFN:/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst'])
job.setOutputSandbox(['DaVinci_v19r12.log', 'DVhbook.root'])

dirac.submit(job)
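The third route, Ganga, wraps the same DIRAC API calls. A rough sketch of what an equivalent submission might look like from the Ganga shell is shown below; the DaVinci application and Dirac backend plugin names and attribute spellings are assumptions, not something taken from these slides:

    # Sketch only: run inside the Ganga shell, where Job, DaVinci and Dirac
    # are provided by Ganga's plugins (names assumed).
    j = Job()
    j.application = DaVinci(version='v19r12')
    j.backend = Dirac()        # route the job through the DIRAC WMS
    j.submit()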

User pre-compiled binaries can also be shipped

Jobs are then bound to be processed on the same platform

Successfully used in full selection and background analysis studies (User – Windows, resources – Windows and Linux)

Page 17:

Performance

Successful processing of data challenges since 2004.

Latest data challenge:
Record of >10,000 simultaneous jobs (analysis and production)
700M events simulated in 475 days, ~1,700 years of CPU time
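This CPU-time figure is consistent with the 80 s/event simulation cost quoted earlier; a quick back-of-envelope check using only numbers from these slides:

    # 700M fully simulated events at ~80 CPU-seconds each, expressed in CPU-years.
    events = 700e6
    cpu_seconds = events * 80.0
    print(cpu_seconds / (3600 * 24 * 365))   # ~1775, matching the "~1,700 years of CPU time" above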

[Plot: running jobs across Windows and Linux sites; total running jobs: 9,715.]

Page 18:

Conclusions

The LHC will produce O(PB) of data per year per experiment

Data to be analysed by 1,000s of physicists on 4 continents

The LHCb distributed computing structure is in place, pulling together ~40K CPUs from across the world

The DIRAC system has been fine-tuned based on experience from the past 4 years of intensive testing

We now eagerly await the LHC switch-on and the true test!

1st beams tomorrow morning!!!