LHC Computing Grid. Visualization and Management of Information (IA369). Teacher: Léo Pini Magalhães. Danny Lachos, Ramon Fontes


  • Slide 1
  • LHC Computing Grid. Visualization and Management of Information (IA369). Teacher: Léo Pini Magalhães. Danny Lachos, Ramon Fontes
  • Slide 2
  • Agenda 1. Introduction 2. Motivation for Grid computing 3. Building a Grid 4. Using the Grid 5. Video: Processing LHC Data 6. Conclusions 7. References
  • Slide 3
  • Agenda 1. Introduction 2. Motivation for Grid computing 3. Building a Grid 4. Using the Grid 5. Video: Processing LHC Data 6. Conclusions 7. References
  • Slide 4
  • Introduction CERN, the European Organisation for Nuclear Research (the European Laboratory for Particle Physics): fundamental research in particle physics; designs, builds and operates large accelerators; financed by 20 European countries; 3,000 staff and 6,000 users (researchers) from all over the world. 2012: a boson with mass around 125 GeV/c², consistent with the long-sought Higgs boson. Brazil was approved by the CERN Council on 13 December 2013 to become the first Latin American associate member; as of July 2014, Brazil still needs to sign and ratify its accession agreement.
  • Slide 5
  • Introduction CERN - Location
  • Slide 6
  • Introduction LHC, the Large Hadron Collider, is the world's largest and most powerful particle accelerator. It was built by the European Organization for Nuclear Research (CERN) between 1998 and 2008, in collaboration with over 10,000 scientists and engineers from over 100 countries, as well as hundreds of universities and laboratories. It lies in a tunnel 27 kilometers in circumference, as deep as 175 meters beneath the Franco-Swiss border near Geneva, Switzerland. The first beam was circulated through the collider on the morning of 10 September 2008; CERN successfully fired the protons around the tunnel in stages, three kilometers at a time.
  • Slide 7
  • Introduction LHC Experiments: CMS, ATLAS, LHCb. 15 Petabytes/year
  • Slide 8
  • Introduction Petabyte science At design parameters the LHC produces over 600 million proton-proton collisions per second (~10⁹ collisions) in the ATLAS or CMS detectors. The amount of data collected for each event is around 1 MB (1 megabyte). A trigger is designed to reject the uninteresting events and keep the interesting ones (for example, the ATLAS trigger system is designed to collect about 200 events per second): 200 events/s x 1 MB = 200 MB/s (200 megabytes/second). Taking two shifts of ten hours per day and about 300 days per year: 200 MB/s x 2 x 10 x 3600 x 300 ≈ 4x10¹⁵ bytes/year ≈ 4 PB/year. For scale: 1 megabyte (1 MB) = a digital photo; 1 gigabyte (1 GB) = 1000 MB, 5 GB = a DVD movie; 1 terabyte (1 TB) = 1000 GB = world annual book production; 1 petabyte (1 PB) = 1000 TB = annual production of one LHC experiment; 1 exabyte (1 EB) = 1000 PB, 3 EB = world annual information production.
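    The back-of-the-envelope arithmetic above can be checked directly. Below is a minimal Python sketch using the numbers quoted on the slide (the 200 events/s trigger rate, ~1 MB per event, and the slide's own duty-cycle assumption of two ten-hour shifts over 300 days):

        # Reproduce the slide's estimate of the annual data volume of one LHC experiment.
        event_rate = 200          # events per second accepted by the trigger
        event_size = 1e6          # bytes per event (~1 MB)
        shifts_per_day = 2        # duty-cycle assumption taken from the slide
        hours_per_shift = 10
        days_per_year = 300

        seconds_per_year = shifts_per_day * hours_per_shift * 3600 * days_per_year
        bytes_per_year = event_rate * event_size * seconds_per_year

        print(f"{bytes_per_year:.1e} bytes/year")       # ~4.3e+15 bytes/year
        print(f"{bytes_per_year / 1e15:.1f} PB/year")   # ~4 PB/year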
  • Slide 9
  • Introduction
  • Slide 10
  • Agenda 1. Introduction 2. Motivation for Grid computing 3. Building a Grid 4. Using the Grid 5. Video: Processing LHC Data 6. Conclusions 7. References
  • Slide 11
  • Motivation for Grid computing In principle, the analysis of LHC data could have been carried out on one gigantic computer cluster sited at or near CERN. Problem: even with the Computer Centre upgrade, CERN can provide only a fraction of the necessary resources. Solution: CERN has over 250 partner institutes in Europe and over 200 in the rest of the world, most of which have significant computing resources. Build a Grid that unites these computing resources.
  • Slide 12
  • Motivation for Grid computing A computing Grid is a computing infrastructure that is dependable, consistent, pervasive and inexpensive. The Grid took its name from the electricity grid, where a number of different electricity producers are linked together by a series of pylons, substations, etc., to deliver power to the end user. The computing Grid links various distributed computer resources, data services, CPU farms, etc., providing standard protocols for submitting the programs to be executed and receiving the output. In 2005, CERN proposed the setting up of the LHC Computing Grid (LCG), which became the Worldwide LHC Computing Grid (WLCG) in 2006. By 2012 it had become the world's largest computing Grid, comprising over 170 computing facilities in 36 countries.
  • Slide 13
  • Agenda 1. Introduction 2. Motivation for Grid computing 3. Building a Grid 4. Using the Grid 5. Video: Processing LHC Data 6. Conclusions 7. References
  • Slide 14
  • Building a Grid 1) TOOLKITS To enable a computing Grid to link together distributed heterogeneous resources in a transparent manner, dedicated software is required, known as middleware. The Grid middleware that hides this complexity is assembled from various toolkits into a coherent system that has to scale to the size and architecture of the physical infrastructure. The middleware used by the WLCG is based on the Globus Toolkit, which provides the underlying layer of the software stack and includes components for security, information infrastructure, resource management, data management, communication, fault detection, and portability. A key element of Globus is the use of X.509 digital certificates for authorization and authentication.
  • Slide 15
  • Building a Grid X.509 digital certificates A standard is a set of construction rules that tells you how to represent a required set of information. X.509 is an ITU Telecommunications Standardization Sector (ITU-T) standard used in cryptography to implement a public key infrastructure (PKI), verifying that a public key belongs to the user, computer or service identity contained within the certificate. X.509 was initially issued on July 3, 1988. An X.509 certificate contains information about the identity to which the certificate is issued and the identity that issued it. Standard information in an X.509 certificate includes: version, serial number, algorithm information, issuer distinguished name, validity period of the certificate, subject distinguished name, subject public key information, and optional extensions.
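    To make the list of fields concrete, here is a minimal sketch that reads a PEM-encoded certificate and prints some of them; it assumes the third-party Python cryptography package is installed, and usercert.pem is a hypothetical file name, not a path defined anywhere in these slides:

        # Inspect the standard fields of an X.509 certificate (sketch).
        from cryptography import x509

        with open("usercert.pem", "rb") as f:           # hypothetical certificate file
            cert = x509.load_pem_x509_certificate(f.read())

        print("Version:      ", cert.version)
        print("Serial number:", cert.serial_number)
        print("Issuer:       ", cert.issuer.rfc4514_string())
        print("Subject:      ", cert.subject.rfc4514_string())
        print("Valid from:   ", cert.not_valid_before)
        print("Valid until:  ", cert.not_valid_after)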
  • Slide 16
  • Building a Grid 2) ARCHITECTURE The LHC Computing Grid has a hierarchical structure of tier centres based around a single Tier-0 at CERN. After initial processing at Tier-0, the data is distributed to a series of Tier-1 centres: large computer centres with sufficient storage capacity and with round-the-clock support for the Grid. The Tier-1 centres make data available to Tier-2 centres, each consisting of one or several collaborating computing facilities, which can store sufficient data and provide adequate computing power for specific analysis tasks. Individual scientists access these facilities through Tier-3 computing resources, which can consist of local clusters in a university department or even individual PCs.
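    As a rough illustration only, the tier hierarchy described above can be written down as a plain data structure; the site names below are placeholders, not a list of actual WLCG sites:

        # Illustrative tier hierarchy: data flows Tier-0 -> Tier-1 -> Tier-2, and
        # individual scientists reach the Grid through Tier-3 resources.
        tiers = {
            "Tier-0": {"sites": ["CERN"],
                       "role": "initial processing, master copy of raw data"},
            "Tier-1": {"sites": ["Regional centre A", "Regional centre B"],      # placeholders
                       "role": "large centres with storage and round-the-clock Grid support"},
            "Tier-2": {"sites": ["University facility X", "University facility Y"],
                       "role": "storage and computing power for specific analysis tasks"},
            "Tier-3": {"sites": ["departmental clusters", "individual PCs"],
                       "role": "access point for individual scientists"},
        }

        for tier, info in tiers.items():
            print(tier, "-", info["role"])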
  • Slide 17
  • Building a Grid Data Reduction In almost all circumstances, it is impractical to work with all of the data in a Big Data resource. A three-level trigger is used to select events that show signs of interesting physics processes: the first level is a hardware-based trigger, an extremely fast and wholly automatic process that looks for simple signs of interesting physics; the level-2 trigger is software-based and selects events based on a rudimentary analysis of the regions of interest identified at level 1; the level-3 trigger does a preliminary reconstruction of the entire event, and events selected by this trigger are stored for offline analysis. The principle is the same for all experiments: the purpose is to select only those events which contain interesting physics and to filter out the less interesting background, thereby reducing the final volume of data written to permanent storage.
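    The cascade can be pictured as three successively more expensive filters, each applied only to the events accepted by the previous one. The sketch below is purely illustrative: the selection functions, thresholds and event fields are invented for the example and do not correspond to any real trigger menu:

        # Illustrative three-level trigger cascade.
        def level1(event):        # fast, hardware-like test on coarse quantities
            return event["et_sum"] > 100.0

        def level2(event):        # rudimentary analysis of regions of interest from level 1
            return any(roi["pt"] > 20.0 for roi in event["regions_of_interest"])

        def reconstruct(event):   # placeholder for a preliminary full-event reconstruction
            return {"quality": event.get("quality", 0.0)}

        def level3(event):        # keep only well-reconstructed events for offline storage
            return reconstruct(event)["quality"] > 0.9

        def trigger(events):
            """Yield only the events that pass all three levels."""
            for ev in events:
                if level1(ev) and level2(ev) and level3(ev):
                    yield ev

        sample = [{"et_sum": 150.0, "regions_of_interest": [{"pt": 35.0}], "quality": 0.95},
                  {"et_sum": 40.0,  "regions_of_interest": [],             "quality": 0.20}]
        print(len(list(trigger(sample))))   # 1: only the first event survives all three levels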
  • Slide 18
  • Building a Grid 3) COMPUTING SERVICES Most Big Data resources do not provide a great deal of information about the content and operation of their systems; with few exceptions, the providers of Big Data expect their intended users to approach their resource with a skill set appropriate for their information domain. A Grid job consists of the program to be executed, the necessary input files and a script specified using a formal language, such as the Job Definition Language (JDL). Specifications of input and output files, and of the requirements of the job in terms of memory, software, execution time, etc., are based on the Condor ClassAd language. A workload management system (WMS) accepts each job and matches it to a suitable site for execution where the necessary resources are available. Small output files are transmitted back through the WMS, while larger data files may be written to a storage element (SE) and catalogued.
  • Slide 19
  • Building a Grid Data Integration and Software (1/2) Job Definition Language (JDL). JDL is the language used to describe a job. The user has to describe his jobs and their requirements, and to retrieve the output when the jobs are finished. A job description is a file (called a JDL file) consisting of lines with the format attribute = expression. Attention! JDL is sensitive to blank characters and tabs: no blank characters or tabs should follow the semicolon at the end of a line. Simple example:
        Type = "Job";
        JobType = "Normal";
        Executable = "myexe";
        StdInput = "myinput.txt";
        StdOutput = "message.txt";
        StdError = "error.txt";
        InputSandbox = {"myinput.txt", "/home/user/example/myexe"};
        OutputSandbox = {"message.txt", "error.txt"};
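    JDL files like the one above are plain text, so they are easy to generate programmatically. The following Python sketch writes the same example from a dictionary; the quoting rules are simplified (strings are double-quoted, lists become brace-enclosed) and the helper is not part of any official tool:

        # Write a simple JDL job description from a dict (sketch; formatting simplified).
        def to_jdl_value(value):
            if isinstance(value, list):
                return "{" + ", ".join(f'"{v}"' for v in value) + "}"
            return f'"{value}"'

        job = {
            "Type": "Job",
            "JobType": "Normal",
            "Executable": "myexe",
            "StdInput": "myinput.txt",
            "StdOutput": "message.txt",
            "StdError": "error.txt",
            "InputSandbox": ["myinput.txt", "/home/user/example/myexe"],
            "OutputSandbox": ["message.txt", "error.txt"],
        }

        with open("job.jdl", "w") as f:
            for attribute, value in job.items():
                f.write(f"{attribute} = {to_jdl_value(value)};\n")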
  • Slide 20
  • Building a Grid Data Integration and Software (2/2) Condor ClassAd language. The Condor system aims to maximize the utilization of workstations with as little interference as possible between the jobs it schedules and the activities of the people who own the workstations. Classified Advertisements (ClassAds) are a flexible mechanism for representing the characteristics and constraints of machines and jobs in the Condor system. A ClassAd is a set of uniquely named expressions; each named expression is called an attribute. Simple example:
        MyType = Machine
        TargetType = Job
        Machine = froth.cs.wisc.edu
        Arch = INTEL
        OpSys = LINUX
        Disk = 35882
        Memory = 128
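    The matchmaking idea behind ClassAds can be sketched with two plain dictionaries, one advertising a machine and one advertising a job, each carrying a requirements predicate evaluated against the other. This is a toy model, not the real ClassAd expression language, and the attribute values are illustrative:

        # Toy ClassAd-style matchmaking (simplified; not the real ClassAd language).
        machine_ad = {
            "MyType": "Machine", "Arch": "INTEL", "OpSys": "LINUX",
            "Disk": 35882, "Memory": 128,
            "Requirements": lambda other: other.get("ImageSize", 0) <= 35882,
        }

        job_ad = {
            "MyType": "Job", "ImageSize": 28000,
            "Requirements": lambda other: other.get("Arch") == "INTEL"
                                          and other.get("OpSys") == "LINUX"
                                          and other.get("Memory", 0) >= 64,
        }

        def match(ad_a, ad_b):
            """Two ads match when each one's Requirements is satisfied by the other."""
            return ad_a["Requirements"](ad_b) and ad_b["Requirements"](ad_a)

        print(match(machine_ad, job_ad))   # True for the values above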
  • Slide 21
  • Building a Grid 4) DATA MANAGEMENT When data originates from many different sources, arrives in many different forms, grows in size, changes its values, and extends into the past and the future, the game shifts from data computation to data management. The data management services provide reliable tools to store, replicate and retrieve files. The number of files stored in the Grid is typically of the order of 10⁹; these should be accessible from any Grid site without requiring the user to know their physical location. A file can have as many replicas as needed and can be stored in any type of storage system.
  • Slide 22
  • Building a Grid Files in the Grid can be referred to by different names:
        - The Grid Unique IDentifier (GUID)
        - The Logical File Name (LFN)
        - The Storage URL (SURL)
        - The Transport URL (TURL)
    The GUIDs and LFNs identify a file irrespective of its location; the SURLs and TURLs contain information about where a physical replica is located.
  • Slide 23
  • Building a Grid The Grid Unique IDentifier (GUID) A way of naming data objects so that they can be retrieved by their name, and a way of distinguishing each object from every other object in the system; an object identifier is an alphanumeric string associated with the object. The Grid Unique IDentifier (GUID) identifies a file uniquely and is assigned the first time the file is registered in the Grid. It is based on the UUID (Universally Unique IDentifier) standard to guarantee its uniqueness. A GUID is of the form guid:<unique_string>, for example: guid:93bd772a-b282-4332-a0c5-c79e99fc2e9c
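    Since GUIDs follow the UUID standard, generating one is a one-liner with Python's standard library; the guid: prefix is simply prepended, as in the example above:

        # Generate a GUID-style identifier from a random (version 4) UUID.
        import uuid

        guid = f"guid:{uuid.uuid4()}"
        print(guid)   # e.g. guid:93bd772a-b282-4332-a0c5-c79e99fc2e9c (the value will differ)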
  • Slide 24
  • Building a Grid The Logical File Name (LFN) The ability to de-identify data objects confers enormous advantages when issues of confidentiality, privacy, and intellectual property emerge. The Logical File Name (LFN) can be used to refer to a file in place of the GUID: there is no need to know the name of the storage element that hosts the file, which keeps the architecture of the storage system transparent against changes to the physical path. LFNs are organized in a hierarchical, directory-like structure and have the following format:
        lfn:/grid/<MyVO>/<MyDirs>/<MyFile>
        lfn:/grid/dteam/generated/2007-05-02/test_result.txt
  • Slide 25
  • Building a Grid The Storage URL (SURL) The Storage URL (also called the Physical File Name, PFN, or Site FN) identifies a replica in a Storage Element (SE). The concept of SURLs also allows hiding the physical path of a file within a storage system; in other words, there are two levels of logical path, the first at the Grid level (LFN) and the second on the storage system itself (SURL). The general form is sfn|srm://<SE_hostname>/<some_string>, for example:
        srm://lxdpm01.cern.ch/dpm/cern.ch/home/dteam/generated/2007-05-02/file3596e86f-c402-11d7-a6b0-f53ee5a37e1d
        sfn://tbed0101.cern.ch/flatfiles/SE00/dteam/generated/2004-02-26/file3596e86f-c402-11d7-a6b0-f53ee5a37e1d
  • Slide 26
  • Building a Grid The Transport URL (TURL) The Transport URL (TURL) is a valid URI with the necessary information to access a file in an SE: a temporary locator of a replica plus an access protocol. A TURL is of the form <protocol>://<some_string>, for example:
        rfio://lxdpm01.cern.ch/storage00/dteam/generated/2007-05-02/file27ad6ba1-46df-4052-abfd-1e75ef364
    where <protocol> must be a valid protocol supported by the SE to access the contents of the file (e.g. GSIFTP, RFIO). While SURLs are in principle invariable (they are entries in the file catalog), TURLs are obtained dynamically from the SURL through the Information System or the SRM interface. The TURL therefore can change with time and should be considered valid only for a relatively short period after it has been obtained.
  • Slide 27
  • Building a Grid [Diagram] Relationship between LFNs, GUIDs, SURLs, TURLs and Aliases.
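    The relationships in that diagram boil down to two mappings kept by the file catalogue, LFN/alias -> GUID and GUID -> replicas (SURLs), with TURLs derived on demand. Below is a minimal sketch with invented entries (the host names, paths and the string rewriting in resolve_turl are illustrative, not how the SRM interface actually works):

        # Toy file catalogue: LFNs resolve to a GUID, the GUID resolves to replica SURLs,
        # and a short-lived TURL is derived from a SURL for a chosen access protocol.
        lfn_to_guid = {
            "lfn:/grid/dteam/generated/2007-05-02/test_result.txt":
                "guid:93bd772a-b282-4332-a0c5-c79e99fc2e9c",
        }

        guid_to_surls = {
            "guid:93bd772a-b282-4332-a0c5-c79e99fc2e9c": [
                "srm://se01.example.org/dpm/example.org/home/dteam/test_result.txt",  # invented
                "srm://se02.example.org/dpm/example.org/home/dteam/test_result.txt",  # invented
            ],
        }

        def resolve_turl(surl, protocol="rfio"):
            """Pretend to ask the SRM interface for a transport URL (sketch)."""
            return surl.replace("srm://", f"{protocol}://", 1)

        guid = lfn_to_guid["lfn:/grid/dteam/generated/2007-05-02/test_result.txt"]
        for surl in guid_to_surls[guid]:
            print(resolve_turl(surl))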
  • Slide 28
  • Agenda 1. Introduction 2. Motivation for Grid computing 3. Building a Grid 4. Using the Grid 5. Video: Processing LHC Data 6. Conclusions 7. References
  • Slide 29
  • Using the Grid The experiment collaborations use the Grid for a wide range of tasks. The use cases can be categorized, at a high level, into two types: structured work refers to well-defined processes that are run, typically, by a small number of experts; chaotic workflows are typically bespoke analysis jobs, run by a large number of different users. Users wishing to use Grid resources require both authentication and authorization. There is both a push and a pull model for Grid job submission. Software tools are required that handle the job submission, monitoring, data movement and bookkeeping.
  • Slide 30
  • Using the Grid A schematic view of Grid job execution:
        1. The user submits his job to the WMS.
        2. A compute element is selected to run the job.
        3. Any input data is copied from a suitable SE.
        4. Output data is written back to an SE.
        5. Output files are transferred back to the WMS.
        6. The user retrieves the output from the WMS.
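    The six steps can be strung together as a simple driver; every function below is a stub standing in for the corresponding middleware service (none of these are real APIs), so the sketch only shows the order of operations:

        # Hypothetical walk-through of the six steps of Grid job execution.
        def submit_to_wms(jdl):             return "job-001"                    # 1. submit to the WMS
        def select_compute_element(job_id): return "ce.example.org"             # 2. a CE is selected
        def copy_from_se(lfn, ce):          return f"local copy of {lfn}"       # 3. stage input from an SE
        def execute(job_id, ce, inputs):    return {"hist.root": 10**9, "log.txt": 10**3}
        def write_to_se(name):              print(f"stored {name} on an SE")    # 4. large output to an SE
        def send_to_wms(job_id, names):     print(f"sent {names} via the WMS")  # 5. small output to the WMS
        def retrieve_output(job_id):        return ["log.txt"]                  # 6. user fetches the output

        def run_grid_job(jdl, input_lfns):
            job_id = submit_to_wms(jdl)
            ce = select_compute_element(job_id)
            inputs = [copy_from_se(lfn, ce) for lfn in input_lfns]
            outputs = execute(job_id, ce, inputs)
            for name, size in outputs.items():
                if size > 10**6:
                    write_to_se(name)
            send_to_wms(job_id, [n for n, s in outputs.items() if s <= 10**6])
            return retrieve_output(job_id)

        print(run_grid_job("job.jdl", ["lfn:/grid/dteam/myinput.txt"]))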
  • Slide 31
  • Agenda 1. Introduction 2. Motivation for Grid computing 3. Building a Grid 4. Using the Grid 5. Video: Processing LHC Data 6. Conclusions 7. References
  • Slide 32
  • http://www.youtube.com/watch?v=jDC3-QSiLB4 Processing LHC Data
  • Slide 33
  • Agenda 1. Introduction 2. Motivation for Grid computing 3. Building a Grid 4. Using the Grid 5. Video: Processing LHC Data 6. Conclusions 7. References
  • Slide 34
  • Conclusions Even as construction of the Large Hadron Collider was underway, no real technical solution to the computing challenge existed and no realistic financial provision had been established. However, over the last 12 years an infrastructure was developed based on the concept of Grid computing, enabled by the dramatically falling costs of hardware. The fact that the LHC Computing Grid was created by a loosely coordinated global effort, built on the strength of a common goal, is testament to the success of scientific collaboration across national boundaries and political ideologies, and to the cooperation of funding agencies. Physicists working on the LHC are able to access and analyse the huge amounts of data in a timely manner, and scientific results are generated much faster than previously, an almost inconceivable situation given the step change in the volume and complexity of the data.
  • Slide 35
  • Agenda 1. Introduction 2. Motivation for Grid computing 3. Building a Grid 4. Using the Grid 5. Video: Processing LHC Data 6. Conclusions 7. References
  • Slide 36
  • References
        How to deal with petabytes of data: the LHC Grid project. http://iopscience.iop.org/0034-4885/77/6/065902/pdf/0034-4885_77_6_065902.pdf
        Grid-Enabled Standards-based Data Management. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4367964
        Advancements in Big Data Processing in the ATLAS and CMS Experiments. http://arxiv.org/ftp/arxiv/papers/1303/1303.1950.pdf
        Processing LHC Data (video). http://www.youtube.com/watch?v=jDC3-QSiLB4
        Even Bigger Data: Preparing for the LHC/ATLAS Upgrade. http://repositorium.sdum.uminho.pt/bitstream/1822/21661/1/IBERGRID-2012.pdf
  • Slide 37
  • Thank you! Danny Lachos, Ramon Fontes. "It's been a global effort, a global success. It has only been possible because of the extraordinary achievements of the accelerators, experiments and the Grid computing." July 2012, Rolf Heuer, Director General of CERN, commenting on the discovery of a particle consistent with the Higgs boson.
  • Slide 38