LHC Computing Grid
Visualization and Management of Information (IA369)
Teacher: Léo Pini Magalhães
Danny Lachos, Ramon Fontes
Slide 2
Agenda 1. Introduction 2. Motivation for Grid computing 3.
Building a Grid 4. Using the Grid 5. Video Processing LHC Data 6.
Conclusions 7. References
Slide 3
Agenda 1. Introduction 2. Motivation for Grid computing 3.
Building a Grid 4. Using the Grid 5. Video Processing LHC Data 6.
Conclusions 7. References
Slide 4
Introduction CERN - The European Organization for Nuclear Research (the
European Laboratory for Particle Physics). Fundamental research in
particle physics: designs, builds and operates large accelerators.
Financed by 20 European countries, with about 3,000 staff and 6,000
users (researchers) from all over the world. 2012: a boson with mass
around 125 GeV/c², consistent with the long-sought Higgs boson. Brazil
was approved by the CERN Council on 13 December 2013 to become the
first Latin American associate member; as of July 2014, Brazil still
needed to sign and ratify its accession agreement.
Slide 5
Introduction CERN - Location
Slide 6
Introduction LHC - The Large Hadron Collider. The LHC is the world's
largest and most powerful particle accelerator, built by the European
Organization for Nuclear Research (CERN) from 1998 to 2008 in
collaboration with over 10,000 scientists and engineers from over
100 countries, as well as hundreds of universities and laboratories.
It lies in a tunnel 27 kilometers in circumference, as deep as 175
meters beneath the Franco-Swiss border near Geneva, Switzerland. The
first beam was circulated through the collider on the morning of 10
September 2008; CERN successfully fired the protons around the tunnel
in stages, three kilometers at a time.
Introduction Petabyte science. At design parameters the LHC produces
over 600 million proton-proton collisions per second (~10^9 collisions)
in the ATLAS or CMS detectors. The amount of data collected for each
event is around 1 MB (1 Megabyte). A trigger is designed to reject the
uninteresting events and keep the interesting ones (for example, the
ATLAS trigger system is designed to collect about 200 events per
second): 200 events/s x 1 MB = 200 MB/s (200 Megabytes per second).
Taking two shifts of ten hours per day, and about 300 days per year:
200 MB/s x 2 x 10 x 3600 x 300 ≈ 4 x 10^15 bytes/year ≈ 4 PB/year.
Orders of magnitude:
1 Megabyte (1 MB): a digital photo
1 Gigabyte (1 GB) = 1000 MB; 5 GB: a DVD movie
1 Terabyte (1 TB) = 1000 GB: world annual book production
1 Petabyte (1 PB) = 1000 TB: annual production of one LHC experiment
1 Exabyte (1 EB) = 1000 PB; 3 EB: world annual information production
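The back-of-the-envelope arithmetic above can be checked with a few lines of Python; a minimal sketch using only the numbers quoted on the slide:

# Reproduce the slide's estimate of one experiment's yearly raw-data volume.
event_size_bytes = 1_000_000          # ~1 MB per recorded event
events_per_second = 200               # events kept by the trigger
shifts_per_day = 2                    # two shifts...
hours_per_shift = 10                  # ...of ten hours each
days_per_year = 300

seconds_per_year = shifts_per_day * hours_per_shift * 3600 * days_per_year
bytes_per_year = event_size_bytes * events_per_second * seconds_per_year

print(f"{bytes_per_year:.2e} bytes/year")       # ~4.32e+15 bytes/year
print(f"{bytes_per_year / 1e15:.1f} PB/year")   # ~4 PB/year, as on the slide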
Slide 9
Introduction
Slide 10
Agenda 1. Introduction 2. Motivation for Grid computing 3.
Building a Grid 4. Using the Grid 5. Video Processing LHC Data 6.
Conclusions 7. References
Slide 11
Motivation for Grid computing. In principle, the analysis of LHC data
could have been carried out on one gigantic computer cluster sited at
or near CERN. Problem: even with a Computer Centre upgrade, CERN can
provide only a fraction of the necessary resources. Solution: CERN has
over 250 partner institutes in Europe and over 200 in the rest of the
world, most of which have significant computing resources. Build a Grid
that unites these computing resources.
Slide 12
Motivation for Grid computing. A computing Grid is a computing
infrastructure that is dependable, consistent, pervasive and
inexpensive. The Grid took its name from the electricity grid, where a
number of different electricity producers are linked together by a
series of pylons, substations, etc., to deliver power to the end user.
The computing Grid links various distributed computer resources, data
services, CPU farms, etc., providing standard protocols for submitting
the programs to be executed and receiving the output. In 2005, CERN
proposed the setting up of the LHC Computing Grid (LCG), which became
the Worldwide LHC Computing Grid (WLCG) in 2006. By 2012 it had become
the world's largest computing Grid, comprising over 170 computing
facilities in 36 countries.
Slide 13
Agenda 1. Introduction 2. Motivation for Grid computing 3.
Building a Grid 4. Using the Grid 5. Video Processing LHC Data 6.
Conclusions 7. References
Slide 14
Building a Grid 1) TOOLKITS. To enable a computing Grid to link
together distributed heterogeneous resources in a transparent manner,
dedicated software is required, known as middleware. The Grid
middleware that hides this complexity is assembled from various
toolkits into a coherent system that has to scale to the size and
architecture of the physical infrastructure. The middleware used by the
WLCG is based on the Globus Toolkit, which provides the underlying
layer of the software stack and includes components for security,
information infrastructure, resource management, data management,
communication, fault detection, and portability. A key element of
Globus is the use of X.509 digital certificates for authorization and
authentication.
Slide 15
Building a Grid X.509 digital certificates. A standard is a set of
construction rules that tells you how to represent a required set of
information. X.509 is an ITU Telecommunication Standardization Sector
(ITU-T) standard used in cryptography to implement a public key
infrastructure (PKI), verifying that a public key belongs to the user,
computer or service identity contained within the certificate. X.509
was initially issued on July 3, 1988. An X.509 certificate contains
information about the identity to which the certificate is issued and
the identity that issued it. Standard information in an X.509
certificate includes: version, serial number, algorithm information,
issuer distinguished name, validity period of the certificate, subject
distinguished name, subject public key information and extensions
(optional).
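As an illustration, these standard fields can be read from a certificate in Python; a minimal sketch assuming the third-party cryptography package and a local PEM file whose name (usercert.pem) is chosen only for this example:

# Read the standard X.509 fields listed above from a PEM certificate.
from cryptography import x509

with open("usercert.pem", "rb") as f:          # hypothetical file name
    cert = x509.load_pem_x509_certificate(f.read())

print("Version:       ", cert.version)
print("Serial number: ", cert.serial_number)
print("Signature alg.:", cert.signature_algorithm_oid)
print("Issuer DN:     ", cert.issuer.rfc4514_string())
print("Valid from:    ", cert.not_valid_before)
print("Valid until:   ", cert.not_valid_after)
print("Subject DN:    ", cert.subject.rfc4514_string())
print("Public key:    ", cert.public_key())    # subject public key object
for ext in cert.extensions:                    # optional extensions
    print("Extension:     ", ext.oid)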
Slide 16
Building a Grid 2) ARCHITECTURE. The LHC Computing Grid has a
hierarchical structure of tier centres based around a single Tier-0 at
CERN. After initial processing at the Tier-0, the data is distributed
to a series of Tier-1 centres: large computer centres with sufficient
storage capacity and with round-the-clock support for the Grid. The
Tier-1 centres make data available to Tier-2 centres, each consisting
of one or several collaborating computing facilities, which can store
sufficient data and provide adequate computing power for specific
analysis tasks. Individual scientists access these facilities through
Tier-3 computing resources, which can consist of local clusters in a
university department or even individual PCs.
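Purely as an illustration, the tier model above can be summarized as a small data structure; the role strings paraphrase the slide and are not an official WLCG definition:

# Sketch of the WLCG tier hierarchy described above (illustrative only).
tiers = {
    "Tier-0": "CERN: initial processing and archival of raw detector data",
    "Tier-1": "Large centres: mass storage, round-the-clock Grid support",
    "Tier-2": "Collaborating facilities: storage and CPU for analysis tasks",
    "Tier-3": "Local university clusters or individual PCs used by scientists",
}
# Data flows downwards: Tier-0 -> Tier-1 -> Tier-2, accessed via Tier-3.
for name, role in tiers.items():
    print(f"{name}: {role}")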
Slide 17
Building a Grid Data Reduction. In almost all circumstances, it is
impractical to work with all of the data in a Big Data resource. A
three-level trigger is used to select events that show signs of
interesting physics processes. The first level is a hardware-based
trigger, an extremely fast and wholly automatic process that looks for
simple signs of interesting physics. The level-2 trigger is software
based, and selects events based on a rudimentary analysis of regions of
interest identified at level 1. The level-3 trigger does a preliminary
reconstruction of the entire event; events selected by this trigger are
stored for offline analysis. The principle is the same for all
experiments, the purpose being to select only those events which
contain interesting physics and to filter out the less interesting
background, thereby reducing the final volume of data to be written to
permanent storage.
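As an illustration, the staged filtering idea can be sketched as a chain of increasingly expensive predicates; the selection criteria and event fields below are placeholders, not real ATLAS or CMS trigger logic:

# Toy sketch of a three-level trigger chain: each stage is more expensive
# than the previous one and discards most events.
def level1(event):          # hardware-like: fast, simple signature check
    return event.get("energy", 0) > 20

def level2(event):          # software: rudimentary regions-of-interest analysis
    return any(roi["quality"] > 0.5 for roi in event.get("rois", []))

def reconstruct(event):     # stand-in for a preliminary full reconstruction
    return {"chi2": event.get("chi2", 100)}

def level3(event):          # keep only well-reconstructed events
    return reconstruct(event)["chi2"] < 10

def trigger_chain(events):
    """Yield only events accepted by all three trigger levels."""
    for event in events:
        if level1(event) and level2(event) and level3(event):
            yield event     # kept for offline analysis / permanent storage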
Slide 18
Building a Grid 3) COMPUTING SERVICES. Most Big Data resources do not
provide a great deal of information about the content and operation of
their systems. With few exceptions, the providers of Big Data expect
their intended users to approach their resource with a skill set
appropriate for their information domain. A Grid job consists of the
program to be executed, the necessary input files and a script
specified using a formal language, such as the Job Definition Language
(JDL). Specification of input and output files and of the requirements
of the job in terms of memory, software, execution time, etc. is based
on the Condor ClassAd language. A workload management system (WMS)
accepts each job and matches it to a suitable site for execution where
the necessary resources are available. Small output files are
transmitted back through the WMS, while larger data files may be
written to a storage element (SE) and catalogued.
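A much-simplified sketch of the matchmaking step follows; the site and job descriptions are invented examples, not a real information-system schema or the actual gLite WMS algorithm:

# Simplified WMS-style matchmaking: pick a site that satisfies the job's
# requirements (free CPUs, memory, installed software).
sites = [
    {"name": "site-a", "free_cpus": 0,  "memory_mb": 4096, "software": {"root"}},
    {"name": "site-b", "free_cpus": 12, "memory_mb": 2048, "software": {"root", "geant4"}},
]

job = {"memory_mb": 1024, "software": {"root"}}

def match(job, site):
    return (site["free_cpus"] > 0
            and site["memory_mb"] >= job["memory_mb"]
            and job["software"] <= site["software"])   # required software present

candidates = [s["name"] for s in sites if match(job, s)]
print(candidates)   # ['site-b']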
Slide 19
Building a Grid Data Integration and Software (1/2) Job Definition
Language (JDL). JDL is the language used to describe a job. The user
has to describe his jobs and their requirements, and to retrieve the
output when the jobs are finished. A job description is a file (called
a JDL file) consisting of lines with the format: attribute =
expression; Attention! JDL is sensitive to blank characters and tabs.
No blank characters or tabs should follow the semicolon at the end of a
line. Simple example:
Type = "Job";
JobType = "Normal";
Executable = "myexe";
StdInput = "myinput.txt";
StdOutput = "message.txt";
StdError = "error.txt";
InputSandbox = {"myinput.txt", "/home/user/example/myexe"};
OutputSandbox = {"message.txt", "error.txt"};
Slide 20
Building a Grid Data Integration and Software (2/2) Condor ClassAd
language. The Condor system aims to maximize the utilization of
workstations with as little interference as possible between the jobs
it schedules and the activities of the people who own the workstations.
Classified Advertisements (ClassAds) are a flexible mechanism for
representing the characteristics and constraints of machines and jobs
in the Condor system. A ClassAd is a set of uniquely named expressions;
each named expression is called an attribute. Simple example:
MyType = "Machine"
TargetType = "Job"
Machine = "froth.cs.wisc.edu"
Arch = "INTEL"
OpSys = "LINUX"
Disk = 35882
Memory = 128
Slide 21
Building a Grid 4) DATA MANAGEMENT. When data originates from many
different sources, arrives in many different forms, grows in size,
changes its values, and extends into the past and the future, the game
shifts from data computation to data management. The data management
services provide reliable tools to store, replicate and retrieve files.
The number of files stored in the Grid is typically of the order of
10^9. These should be accessible from any Grid site, without requiring
the user to know their physical location. A file can have as many
replicas as needed and can be stored in any type of storage system.
Slide 22
Building a Grid Files in the Grid can be referred to by different names:
- The Grid Unique IDentifier (GUID)
- The Logical File Name (LFN)
- The Storage URL (SURL)
- The Transport URL (TURL)
The GUIDs and LFNs identify a file irrespective of its location; the
SURLs and TURLs contain information about where a physical replica is
located.
Slide 23
Building a Grid The Grid Unique IDentifier (GUID). A way of naming data
objects so that they can be retrieved by their name, and a way of
distinguishing each object from every other object in the system. An
object identifier is an alphanumeric string associated with the object.
The GUID identifies a file uniquely; it is assigned the first time the
file is registered in the Grid, and is based on the UUID (Universally
Unique IDentifier) standard to guarantee its uniqueness. A GUID is of
the form guid:<unique_string>, for example:
guid:93bd772a-b282-4332-a0c5-c79e99fc2e9c
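Since GUIDs follow the UUID standard, a value of the form shown above can be generated with Python's standard uuid module; the guid: prefix simply follows the slide's example:

# Sketch: build a GUID-style identifier from a freshly generated UUID.
import uuid

guid = f"guid:{uuid.uuid4()}"
print(guid)   # e.g. guid:93bd772a-b282-4332-a0c5-c79e99fc2e9c (value will differ)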
Slide 24
Building a Grid The Logical File Name (LFN). The ability to de-identify
data objects confers enormous advantages when issues of
confidentiality, privacy, and intellectual property emerge. The Logical
File Name (LFN) can be used to refer to a file in place of the GUID;
there is no need to know the name of the storage element that hosts the
file, which keeps the architecture of the storage system transparent to
changes in the physical path. The LFNs are organized in a hierarchical,
directory-like structure, with the following format:
lfn:/grid/_MyVO_/_MyDirs_/_MyFile_
Example: lfn:/grid/dteam/generated/2007-05-02/test_result.txt
Slide 25
Building a Grid The Storage URL (SURL). The Storage URL (also called
Physical File Name, PFN, or Site FN) identifies a replica in a Storage
Element (SE). The concept of SURLs also allows hiding the physical path
of a file within a storage system; in other words, there are two levels
of logical path, the first at the Grid level (LFN) and the second
within the storage system itself (SURL). The general form is
sfn|srm://_SE_hostname_/_some_string_, for example:
srm://lxdpm01.cern.ch/dpm/cern.ch/home/dteam/generated/2007-05-02/file3596e86f-c402-11d7-a6b0-f53ee5a37e1d
sfn://tbed0101.cern.ch/flatfiles/SE00/dteam/generated/2004-02-26/file3596e86f-c402-11d7-a6b0-f53ee5a37e1d
Slide 26
Building a Grid The Transport URL (TURL). The Transport URL (TURL) is a
valid URI with the necessary information to access a file in an SE: a
temporary locator of a replica plus an access protocol. A TURL is of
the form <protocol>://<some_string>, for example:
rfio://lxdpm01.cern.ch/storage00/dteam/generated/2007-05-02/file27ad6ba1-46df-4052-abfd-1e75ef364
where <protocol> must be a valid protocol (supported by the SE) to
access the contents of the file (e.g. GSIFTP, RFIO). While SURLs are in
principle invariable (they are entries in the file catalog), TURLs are
obtained dynamically from the SURL through the Information System or
the SRM interface. The TURL can therefore change with time and should
be considered valid only for a relatively small period of time after it
has been obtained.
Slide 27
Building a Grid [Diagram] Relationship between LFNs, GUIDs, SURLs,
TURLs and aliases: a file's GUID is linked to its LFN aliases and to
the SURLs of its replicas, each of which resolves to a TURL.
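A rough sketch of these relationships as a data model follows (illustrative only; the GUID and LFN values reuse the slide's examples, the SURL is shortened, and the resolve step stands in for a real SRM or Information System lookup):

# One GUID, any number of LFN aliases, one or more SURLs (replicas),
# each resolving on demand to a TURL.
from dataclasses import dataclass, field

@dataclass
class GridFile:
    guid: str                                   # immutable unique identifier
    lfns: list = field(default_factory=list)    # human-readable aliases
    surls: list = field(default_factory=list)   # one per physical replica

    def turl_for(self, surl, protocol="rfio"):
        # Placeholder: a real system would query the SRM interface here.
        return surl.replace("srm://", f"{protocol}://")

f = GridFile(
    guid="guid:93bd772a-b282-4332-a0c5-c79e99fc2e9c",
    lfns=["lfn:/grid/dteam/generated/2007-05-02/test_result.txt"],
    surls=["srm://lxdpm01.cern.ch/dpm/cern.ch/home/dteam/file3596e86f"],
)
print(f.turl_for(f.surls[0]))   # rfio://lxdpm01.cern.ch/dpm/cern.ch/home/dteam/file3596e86f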
Slide 28
Agenda 1. Introduction 2. Motivation for Grid computing 3.
Building a Grid 4. Using the Grid 5. Video Processing LHC Data 6.
Conclusions 7. References
Slide 29
Using the Grid The experiment collaborations use the Grid for a wide
range of tasks. The use-cases can be categorized, at a high level, into
two types:
- Structured work refers to well-defined processes that are run,
typically, by a small number of experts.
- Chaotic workflows are typically bespoke analysis jobs, run by a large
number of different users.
Users wishing to use Grid resources require both authentication and
authorization. There is both a push and a pull model for Grid job
submission. Software tools are required that handle the job submission,
monitoring, data movement and bookkeeping.
Slide 30
Using the Grid A schematic view of Grid job execution:
1. The user submits his job to the WMS.
2. A compute element is selected to run the job.
3. Any input data is copied from a suitable SE.
4. Output data is written back to an SE.
5. Output files are transferred back to the WMS.
6. The user retrieves the output from the WMS.
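The same six steps, sketched as pseudocode-style Python; every object and method name here is invented for illustration and does not correspond to a real WMS or SE client API:

# Sketch of the job-execution flow described above (hypothetical API).
def run_grid_job(wms, job):
    job_id = wms.submit(job)                    # 1. user submits the job to the WMS
    ce = wms.match_compute_element(job_id)      # 2. a compute element is selected
    inputs = ce.copy_inputs_from_storage(job)   # 3. input data copied from a suitable SE
    outputs = ce.execute(job, inputs)
    ce.write_large_outputs_to_storage(outputs)  # 4. large output data written back to an SE
    wms.store_small_outputs(job_id, outputs)    # 5. small output files returned via the WMS
    return wms.retrieve_output(job_id)          # 6. user retrieves the output from the WMS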
Slide 31
Agenda 1. Introduction 2. Motivation for Grid computing 3.
Building a Grid 4. Using the Grid 5. Video Processing LHC Data 6.
Conclusions 7. References
Slide 32
Video: Processing LHC Data
http://www.youtube.com/watch?v=jDC3-QSiLB4
Slide 33
Agenda 1. Introduction 2. Motivation for Grid computing 3.
Building a Grid 4. Using the Grid 5. Video Processing LHC Data 6.
Conclusions 7. References
Slide 34
Conclusions Even as construction of the Large Hadron Collider was
underway, no real technical solution to the computing challenge existed
and no realistic financial provision had been established. However,
over the last 12 years an infrastructure was developed based on the
concept of Grid computing, enabled by the dramatically falling costs of
hardware. The fact that the LHC Computing Grid was created by a loosely
coordinated global effort built on the strength of a common goal is a
testament to the success of scientific collaboration across national
boundaries and political ideologies, and to the cooperation of funding
agencies. Physicists working on the LHC are able to access and analyse
the huge amounts of data in a timely manner, and scientific results are
generated much faster than previously, an almost inconceivable
situation given the step-change in the volume and complexity of the
data.
Slide 35
Agenda 1. Introduction 2. Motivation for Grid computing 3.
Building a Grid 4. Using the Grid 5. Video Processing LHC Data 6.
Conclusions 7. References
Slide 36
References
How to deal with petabytes of data: the LHC Grid project. http://iopscience.iop.org/0034-4885/77/6/065902/pdf/0034-4885_77_6_065902.pdf
Grid-Enabled Standards-based Data Management. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4367964
Advancements in Big Data Processing in the ATLAS and CMS Experiments. http://arxiv.org/ftp/arxiv/papers/1303/1303.1950.pdf
Processing LHC Data (video). http://www.youtube.com/watch?v=jDC3-QSiLB4
Even Bigger Data: Preparing for the LHC/ATLAS Upgrade. http://repositorium.sdum.uminho.pt/bitstream/1822/21661/1/IBERGRID-2012.pdf
Slide 37
Thank you! Danny Lachos, Ramon Fontes. "It's been a global effort, a
global success. It has only been possible because of the extraordinary
achievements of the accelerators, experiments and the Grid computing."
July 2012, Rolf Heuer, Director General of CERN, commenting on the
discovery of a particle consistent with the Higgs boson.