Comparative Study of
Condition Databases
P.Bhatnagar, G.De Almeida Calheiros, D.Fratte, K.Karmas, A.Megerdoumian, D.Satria,
B.Tassew
04 July 2018
Introduction
Technology is evolving rapidly, and with it comes a flood of data and therefore of
databases, in which the data collected and produced by users, computers, and
experiments is saved. Like any other field, physics has many experiments taking place
worldwide, which produce large amounts of data for physicists to examine. To make sense of
such massive chunks of data, we require special databases and computing power to collect the data
without any loss, filter it, and format it so that it is easily accessible, readable, and available. In
this comparative study we focus on conditions databases in the field of particle
physics. Particle physics is the branch of physics that deals with the properties, relationships,
and interactions of subatomic particles [1], and it is the field in which CERN specialises.
Within particle physics, a new dimension has emerged in recent years: the study of hidden
particles. These are particles that scientists believe are produced when
two protons collide in a beam experiment, and searching for them has become a new challenge
over the last decade. As evidence that such theorized particles can indeed be found, one can look
back to 2012, when the Higgs boson was discovered at the LHC within the CMS experiment. This
research is ongoing, and it is predicted that it can make a huge difference to future and past theories.
In recent times, many other scientific institutes around the world have started taking hidden
particles seriously, and multiple facilities have been established for such experiments.
Currently, CERN is a forerunner when it comes to conditions databases with its Large Hadron
Collider situated in Geneva, Switzerland, but there are many others, such as BaBar
(Stanford University), Belle II (Japan), and FAIR, to mention a few. For this comparative
study, we will focus on the LHC, BaBar, Belle II, and FAIR.
BaBar
The BaBar experiment uses a distributed database architecture for its conditions data [2]. This
version of the database was deployed in 2002, replacing the original version
implemented in 1999, whose performance and scalability were limited, partly
because it was not designed for use in a distributed environment. Another reason
for the change was that the original version exposed internal implementation details through its
interface. The experiment uses the same technology in both the online and offline systems. The
underlying database was implemented in Objectivity/DB for compatibility reasons, although the
design itself is technology neutral. The central concept behind this design is the 'origin'. Each
installation of the database has its native origin. Origins are the owners of the data, i.e. data
associated with an origin can only be updated at database installations of the same origin.
When an installation contains data from other origins, that data is available in read-only
mode. The design handles versioning of conditions data through a 2-D geometric model
whose axes are the insertion time and the interval-of-validity timeline.
The main advantage of this approach is its support for a distributed architecture. Its
limitations are the complexity of the schema, which contains 50 persistent
classes for metadata and more than 400 classes for the payload, and the effort
required for database and content management. The figure below displays the implementation
followed by the BaBar experiment.
Figure 1. Distributed model of the CDB. Figure 2. Map of the distributed CDB.
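The 'origin' ownership rule can be illustrated with a small sketch (the class and method names are invented for illustration, not BaBar's actual API): each installation accepts writes only for its native origin, while replicated foreign-origin data is read-only.

```python
class ConditionsInstallation:
    """Toy model of a BaBar-style distributed conditions DB installation."""

    def __init__(self, native_origin):
        self.native_origin = native_origin
        self.data = {}  # (origin, key) -> payload

    def write(self, origin, key, payload):
        # Data may only be updated at installations of the same origin.
        if origin != self.native_origin:
            raise PermissionError(
                f"origin {origin!r} is read-only at installation "
                f"{self.native_origin!r}")
        self.data[(origin, key)] = payload

    def replicate_from(self, other):
        # Foreign-origin data arrives by replication and is read-only here.
        self.data.update(other.data)

    def read(self, origin, key):
        return self.data[(origin, key)]
```

A remote installation can thus read replicated data but any attempt to update it locally is rejected.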
CMS
In the CMS experiment there were also two versions of the conditions database, plus
exploratory research towards a third. The first version followed the Object-Relational Mapping
(ORM) approach. ROOT-based packages were used for C++ class introspection,
which meant that the inner workings of the classes needed to be known in order to serialize
them. This led to a complex system in which many layers of abstraction were needed on top of
the data access layer, and a large number of tables were generated, which then had to be stored
in different schemata per subsystem to avoid naming conflicts.
For the second version [3], a much simpler database design was used in order to simplify
reading and writing conditions. This design reduced the data to five relational tables
(GLOBAL_TAG_MAP, GLOBAL_TAG, TAG, IOV, and PAYLOAD), of which only one,
the payload table, holds the bulk data. To read a specific payload, a global tag and
tag are specified together with an interval of validity, and a specific payload is identified by a
timestamp. Indexing was used to make queries execute faster. Oracle was selected as the
underlying persistence technology for the conditions data because of its scalability, reliability and
resilience, and the 24/7 database administration support from the CERN IT department. SQLite was
used to store user data in private files as an interchange format, and a C++ database abstraction
layer called CORAL was added to achieve interoperability between SQLite and Oracle.
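The five-table layout can be illustrated with an in-memory SQLite sketch (the column names and the 'record' key used to resolve a tag are simplified assumptions, not the actual CMS schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Simplified five-table layout; only PAYLOAD holds bulk data.
cur.executescript("""
CREATE TABLE GLOBAL_TAG (name TEXT PRIMARY KEY);
CREATE TABLE TAG        (name TEXT PRIMARY KEY, object_type TEXT);
CREATE TABLE GLOBAL_TAG_MAP (global_tag TEXT, record TEXT, tag TEXT);
CREATE TABLE IOV     (tag TEXT, since INTEGER, payload_hash TEXT);
CREATE TABLE PAYLOAD (hash TEXT PRIMARY KEY, data BLOB);
CREATE INDEX iov_idx ON IOV (tag, since);  -- indexing speeds up IOV queries
""")
cur.execute("INSERT INTO GLOBAL_TAG VALUES ('GT_v1')")
cur.execute("INSERT INTO TAG VALUES ('BeamSpot_v2', 'BeamSpotObject')")
cur.execute("INSERT INTO GLOBAL_TAG_MAP VALUES "
            "('GT_v1', 'BeamSpotRcd', 'BeamSpot_v2')")
cur.executemany("INSERT INTO IOV VALUES (?, ?, ?)",
                [("BeamSpot_v2", 0, "h0"), ("BeamSpot_v2", 1000, "h1")])
cur.executemany("INSERT INTO PAYLOAD VALUES (?, ?)",
                [("h0", b"payload-0"), ("h1", b"payload-1")])

def fetch_payload(global_tag, record, timestamp):
    """Resolve global tag + record to a tag, then pick the IOV covering
    the given timestamp (the largest 'since' not exceeding it)."""
    cur.execute("""
        SELECT p.data FROM GLOBAL_TAG_MAP m
        JOIN IOV i ON i.tag = m.tag AND i.since <= ?
        JOIN PAYLOAD p ON p.hash = i.payload_hash
        WHERE m.global_tag = ? AND m.record = ?
        ORDER BY i.since DESC LIMIT 1""", (timestamp, global_tag, record))
    return cur.fetchone()[0]
```

For example, `fetch_payload("GT_v1", "BeamSpotRcd", 1500)` returns the payload whose IOV starts at 1000.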
Many other technologies were used in this infrastructure, including the Boost library for
serialization of C++ objects; a Clang-based build infrastructure to generate serialization code;
Frontier for load balancing and simultaneous connections to the database; a Conditions Uploader
for centralized upload of new conditions data, synchronization, security, and logging;
and a web-based conditions browser for inspecting and searching the available data. The ATLAS
experiment has a conditions-database infrastructure similar to that of CMS [4].
CMS researchers have also presented a study of alternative data storage back-ends for the CMS
conditions database, evaluating some of the most popular NoSQL options to support a key-value
representation of the CMS conditions. The study was motivated by important
quality attributes they intend to achieve, such as highly available and consistent access
to conditions data; performance enhancement is also taken into account.
The data that CMS handles has specific characteristics: it does not require table
joins; there are no complex or nested queries; conditions are never updated; and there are no
concurrent writes, among others. For those reasons they considered a NoSQL
approach in this study and ran a set of measurement tasks on the candidates
for further analysis.
The results are specific about which database technology performed better under
their parameters and data requirements. MongoDB and Cassandra appear to outperform the other
options considered in the paper. Both show excellent performance, simplicity, and
flexibility, though MongoDB has the best fetch-time performance.
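A key-value layout of this kind can be sketched with plain Python dictionaries standing in for a NoSQL store such as MongoDB or Cassandra; the (tag, since) key scheme shown is an illustrative assumption, not the layout from the paper. Because conditions are never updated, writes are insert-only and reads reduce to a floor lookup on the 'since' time.

```python
import bisect

store = {}          # (tag, since) -> payload, standing in for a NoSQL store
since_index = {}    # tag -> sorted list of 'since' values

def put(tag, since, payload):
    key = (tag, since)
    assert key not in store, "conditions are never updated"
    store[key] = payload
    bisect.insort(since_index.setdefault(tag, []), since)

def get(tag, timestamp):
    # Floor lookup: the latest 'since' not exceeding the timestamp.
    sinces = since_index[tag]
    i = bisect.bisect_right(sinces, timestamp) - 1
    return store[(tag, sinces[i])]
```

This write-once access pattern is exactly what makes key-value stores a plausible fit for conditions data.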
Belle II
The Belle II experiment makes extensive use of industry-standard applications and tools so that
the system can be supported by IT staff instead of dedicated scientists, reducing cost and
effort [5, 6]. The performance has proven more than sufficient for the current loads. Figure 3
depicts the current back-end implementation of Belle II; every component in Figure 3 is
implemented as a separate Docker container.
Figure 3. Current back-end implementation of conditions database in Belle II experiment.
The conditions database in this project uses PostgreSQL. To keep the database small, the
payloads consist only of references to files on a separate server. Three
NGINX HTTP servers act as file servers, complemented by a load balancer that distributes the
traffic among them. The back-end file system is based on Lustre, a parallel distributed file
system. Payara is the application server providing the server environment that connects to
the Lustre file system. The application server was later migrated from the full Payara Server
to Payara Micro to solve the problem of running out of memory when the request rate is
significantly higher than expected.
Integration with Hibernate and Hazelcast provides two levels of application caching
in addition to web-proxy caching. The first reason for the additional caching is to improve
performance in use cases where the web proxy has a low hit/miss ratio, and in turn to minimize
database access. Secondly, the application cache allows clustering and synchronization of
service installations: service WRITEs (i.e. PUT/POST/DELETE) are communicated as
cache-invalidation messages to the other cluster members. Hibernate, as cache level 1, provides
the conditions data storage and retrieval query service; Hazelcast, as cache level 2, provides
entity caching for Hibernate. A Squid cache proxy server stores REST resource responses,
reducing the load on the API and database.
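The write-invalidation scheme just described can be modelled in a few lines. This is a toy sketch, assuming a shared dict in place of PostgreSQL and per-node dicts in place of the Hibernate/Hazelcast caches; it is not the Belle II code.

```python
class CacheNode:
    """One service installation with a local application cache."""

    def __init__(self, database, cluster):
        self.db = database      # shared dict standing in for PostgreSQL
        self.cache = {}         # local entity cache
        self.cluster = cluster
        cluster.append(self)

    def read(self, key):
        if key not in self.cache:           # cache miss -> hit the database
            self.cache[key] = self.db[key]
        return self.cache[key]

    def write(self, key, value):
        self.db[key] = value
        self.cache[key] = value
        # WRITEs (PUT/POST/DELETE) are broadcast to the other cluster
        # members as cache-invalidation messages.
        for node in self.cluster:
            if node is not self:
                node.cache.pop(key, None)
```

After a write on one node, the other nodes' next read misses their stale cache and re-fetches the new value from the database.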
A planned improvement to the data cache is to place portions of it at three or more sites
with Hazelcast, providing faster response times and backup capability. Once the Squid and API
services are scalable, the need for scalability in PostgreSQL will be assessed, since
PostgreSQL is currently still a single-site instance. At present, there are no
restrictions on database access; as an improvement, authentication is planned for
writing, while reading would remain open.
LHCb
The LHCb project uses a different approach to saving conditions data: it maps the time
evolution of conditions data onto a filesystem hierarchy and uses Git as the internal database
to manage it. This approach is not much different from the workflow of any other regular
software project, and it reduces the complexity of persisting the data while
allowing a researcher to easily replace parts of the data before analysing it, without changing the
whole database. On the other hand, it adds complexity to how one works with the data: to
make the solution workable, complexity must be added in several places. For
instance, to replace data it is necessary to create a nested Git repository. Another interesting
point is that, to share the information with other researchers, a distribution
service named CVMFS (CernVM File System) is used in conjunction with Git.
To conclude, different technologies are in use for conditions data, although one
pattern seems common among them in terms of the data model. This model consists of a
Global Tag table that points to payload data for a given interval of validity and is related to a
system Tag, which in turn consists of many intervals of validity. The diagram below shows this
model [11].
Figure 4. The commonly followed approach as described by Laycock et al. [11]
The table below shows the similarities and differences amongst the conditions
databases mentioned above. They are compared on relational vs. non-relational design,
server and client technologies, design rationales, features, and limitations.
Experiment: BaBar, 1st version
Relational vs non-relational: Relational
Technologies (server): Objectivity/DB
Limitations: performance and scalability issues; the design did not support distributed computing

Experiment: BaBar, 2nd version (CDB)
Relational vs non-relational: Relational
Technologies (server): first implemented in Objectivity/DB for compatibility purposes
Design rationales: address the limitations of the first version
Features: distributed design
Limitations: complex schema with 50 persistent classes (metadata) and 400 unique classes (payload)

Experiment: CMS, 2nd version
Relational vs non-relational: Relational (ORM, object-oriented, C++)
Technologies (server): Oracle DB; Boost library (runtime serialization of objects to records); Clang Python library (generates code for class introspection automatically); CORAL plugins; POOL persistency framework
Technologies (client): CORAL (C++ abstraction layer); Frontier layer for load balancing; SQLite for private user files; SQLAlchemy binding layer for RDBMS access in Python (DB connection and transaction management)
Design rationales: portability across platforms; stability and reliability; multithreading support; scalability, reliability and 24 h support (Oracle)
Features: simple queries; time search for IOVs partially done on the client side (cache index); uniquely identifiable payloads (with hash); only 5 tables
Limitations: Boost requires a dedicated build infrastructure (Clang)
Note: a study for CMS suggested the use of NoSQL databases such as MongoDB, Cassandra, and Riak; MongoDB showed the highest performance

Experiment: ATLAS and LHCb [12]
Relational vs non-relational: Relational
Technologies (server): CORAL Server; Oracle, MySQL, SQLite, FroNTier
Technologies (client): COOL (a collaboration between ATLAS and LHCb)
Design rationales: performance optimization for data insertion and retrieval
Features: online and offline clusters; support for different relational technologies; ported to 64-bit platforms; server-side query optimization; distributed DB services
Limitations: authentication only via usernames and passwords; performance bottlenecks due to separate physical connections from different client jobs

Experiment: Belle II [13-15]
Relational vs non-relational: Relational
Technologies (server): PostgreSQL; Swagger for application-interface development; Payara as the Java EE application server
Design rationales: extensive use of industry-standard applications and tools so that the project can be largely supported by IT staff instead of scientists, reducing cost and effort; make maintenance as easy as possible
Features: a standardized REST API makes client coding independent of the actual database implementation details; a Squid cache proxy server reduces load on the API and database; the database and payload (file) server are separate systems; to keep the database small, payloads consist only of references to files on a separate server (these "payload files" are currently assumed to be ROOT objects by the client, although nothing in the database design limits the data type); the file server consists of three NGINX HTTP servers behind a load balancer that distributes traffic evenly across the servers; the back-end file system is based on Lustre, an open-source parallel file system for high-performance computing environments; each component of the database back-end is implemented in Docker; communication with the database uses standard HTTP (XML and JSON data); Hibernate, as cache level 1, provides the conditions data storage and retrieval query service; the Hazelcast v3.8.6 in-memory data grid supports scalable caching (level 2) as well as transparent distribution of the service across multiple sites
Limitations: Payara (Java) runs out of memory if the request rate is significantly higher than expected; the PostgreSQL database is still a single-site instance and not currently scalable (several options for a distributed database are being investigated, including OpenStack Trove and CockroachDB, with the replicated databases sited in tandem with the Hazelcast cache cluster sites); authentication is currently not implemented but is planned for services that modify the database (leveraging the X.509 authentication already present in the Belle II Grid computing interface is being investigated)
Planned improvements: placing portions of the cache at three or more sites with Hazelcast to provide faster response times as well as backup capability for site or network outages

Experiment: LHCb
Relational vs non-relational: Non-relational
Technologies (server): Git
Technologies (client): GitCondDB (Git)
Design rationales: easy to persist data; data saved as text files
Features: easy to create new versions of the data; able to deal with a huge amount of data
Limitations: the logic for accessing and managing the data is complex and implemented in the browser; other services are necessary in order to share the data; performance degrades once the database exceeds 10 GB

Experiment: ALICE
Relational vs non-relational: File system
Technologies (server): ROOT files on local and Grid storage
Technologies (client): AliEn client; AliCDBManager; AliRoot CDB access framework
Features: offline and online object versioning done during storage
The difference between such systems lies in the representation of the payload. Some
experiments, like ALICE and NA62, use a file-based approach, while others, like ATLAS, use
BLOB fields in relational databases. Yet another approach is seen in LHCb, which uses Git to
store its conditions data and handle versioning. The advantage of relational databases like
Oracle lies in the fact that the lookup of payload data via Global Tag, Tag, and IOVs is easily
handled through indexing. On the other hand, the file-based approach is very simple and
flexible in terms of the kind of data that can be saved; its challenge lies in having
a performant caching layer that does the filesystem mapping efficiently.
A new approach suggested by some studies is that NoSQL databases can
support this payload-based data well [10]. Another suggestion, made by Laycock et al. [11], is a
REST web service with a client interface for database insertion, backed by a relational data
model as the persistence layer.
LHCb Implementation:
The above diagram describes the Git workflow used in GitCondDB. The upper part shows the
auxiliary scripts that are run to convert the existing data (from the previous database
implementation) into the new format; the lower part presents the integration and workflow
with Git.
Concept: the idea is to map the time evolution of conditions data onto a filesystem hierarchy and use Git's internal database to manage versions and tags.
Layout: the conditions database uses a 3-dimensional structure, where the axes are:
- condition id (usually, the path to an XML file)
- version (tag, commit id, branch name, ...)
- Interval of Validity (IOV, the range of event times a value should be used for)
The first two axes, id and version, map directly to the two axes of a Git repository (path and version), while for the IOVs a special convention is used:
- If a condition id points to a file, there is only one value for the whole span of all event times (from 0 to MAX).
- If a condition id points to a directory, it must contain a file called IOVs, in which each line consists of the start of the IOV for a value and the relative path to the file containing that value, or payload (the end of the IOV is defined by the next line, or set to MAX for the last line of the file). If the path to the value points to a directory, the algorithm to extract the value is repeated on the contained IOVs file, and the deduced IOV is truncated to the boundaries deduced in the previous iteration.
To explain better how the IOV definitions work, let's assume we need to find the value for condition Conditions/MyDetector/Cond.xml at event time 1234.

If Conditions/MyDetector/Cond.xml is a file: the value is the content of the file and the IOV is [0, MAX).

If Conditions/MyDetector/Cond.xml is a directory (simple case):
1. We open Conditions/MyDetector/Cond.xml/IOVs, where we find something like
0 value1
100 value2
200 value1
1000 value3
2000 value4
2. The value is the content of Conditions/MyDetector/Cond.xml/value3 and the IOV is [1000, 2000).

If Conditions/MyDetector/Cond.xml is a directory (nested case):
1. We open Conditions/MyDetector/Cond.xml/IOVs, where we find something like
0 value1
100 subdir1
1200 subdir2
2. We open Conditions/MyDetector/Cond.xml/subdir2/IOVs to look for IOVs in [1200, MAX), where we find something like
1000 ../value3
2000 ../value4
3. The value is the content of Conditions/MyDetector/Cond.xml/value3 and the IOV is
[1200, 2000) (i.e. the IOV found in the subdirectory, bounded to the IOV of the subdirectory itself).
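The lookup procedure above can be sketched as a short recursive function. Here a nested dict stands in for the directory tree, with each IOVs file represented as a sorted list of (since, path) pairs; relative paths such as ../value3 are flattened into plain keys for simplicity. This is an illustration of the algorithm as described, not the actual GitCondDB code.

```python
MAX = float("inf")

def resolve(node, t, lo=0, hi=MAX):
    """Return (payload, (iov_start, iov_end)) for event time t.

    'node' is either a payload (a leaf "file") or a dict (a "directory")
    whose "IOVs" entry is a sorted list of (since, path) lines.
    """
    if not isinstance(node, dict):      # a plain file: one value for [lo, hi)
        return node, (lo, hi)
    iovs = node["IOVs"]
    for i, (since, path) in enumerate(iovs):
        # The end of each IOV is the 'since' of the next line, or MAX.
        end = iovs[i + 1][0] if i + 1 < len(iovs) else MAX
        if since <= t < end:
            payload, (s, e) = resolve(node[path], t, since, end)
            # Truncate the nested IOV to the boundaries of this level.
            return payload, (max(s, since), min(e, end))
    raise KeyError(f"no IOV covers event time {t}")
```

Running it on the simple-case listing above yields value3 with IOV [1000, 2000), and on the nested case value3 with IOV [1200, 2000), matching the walk-through.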
ALICE Implementation:
How is FairMQ used in ALICE-FAIR? Note that FairMQ is for the transport layer only.
Sample Code [16]

ID="101"
inputFile="~/install/FairRoot/example/…/data/testdigi_TGeant3.root"
parameterFile="~/install/FairRoot/example/…/data/testparams_TGeant3.root"
branch="FairTestDetectorDigi"
eventRate="0"
numIoThreads="1"
outputSocketType="push"
outputBufSize=$buffSize
outputMethod="bind"
outputAddress="tcp://*:5565"

/local/home/cwg13/install/FairRoot/build/bin/testDetectorSampler$dataFormat $ID $inputFile $parameterFile $branch $eventRate $numIoThreads $outputSocketType $outputBufSize $outputMethod $outputAddress
Implementation [17]
Fig. 1. Activation of a new storage location. Fig. 2. Write. Fig. 3. Retrieval.
Bibliography:
1. https://en.wikipedia.org/wiki/Particle_physics
2. A. Khan, I. A. Gaponenko and D. N. Brown, "Distributed Online Conditions Database of the BaBar Experiment," IEEE Transactions on Nuclear Science, vol. 55, no. 5, pp. 2579-2583, Oct. 2008.
3. S. Di Guida et al., "The CMS Condition Database System," in Proceedings of the CHEP conference, J. Phys.: Conf. Ser., 2015.
4. D. Barberis et al., "Designing a future Conditions Database based on LHC experience," J. Phys.: Conf. Ser. 664, 2015.
5. L. Wood et al., "Implementing the Belle II Conditions Database using Industry-Standard Tools," presented at the ACAT conference, Aug. 2017. url: https://indico.cern.ch/event/567550/contributions/2686391/attachments/1512060/2358335/ACAT_CondDB_release.pdf
6. L. Wood, T. Elsethagen, and K. Fox, "Belle II Conditions Database: Status and Plans," Pacific Northwest National Laboratory, Oct. 2017.
7. https://twiki.cern.ch/twiki/bin/view/LHCb/GitCondDB
8. https://twiki.cern.ch/twiki/bin/view/LHCb/CondDBManagement
9. https://cernvm.cern.ch/portal/filesystem
10. R. Sipos, "Evaluation of NoSQL prototypes for the CMS conditions database," 2015 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), San Diego, CA, 2015, pp. 1-7.
11. P. J. Laycock, D. Dykstra, A. Formica, G. Govi, A. Pfeiffer, S. Roe, R. Sipos, "A Conditions Data Management System for HEP Experiments," presented at the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, Nov. 2017.
12. A. Valassi, R. Basset, M. Clemencic, G. Pucciani, S. A. Schmidt and M. Wache, "COOL, LCG conditions database for the LHC experiments: Development and deployment status," 2008 IEEE Nuclear Science Symposium Conference Record, Dresden, Germany, 2008, pp. 3021-3028. url: https://ieeexplore.ieee.org/document/4774995/
13. L. Wood et al., "Implementing the Belle II Conditions Database using Industry-Standard Tools," presented at the ACAT conference, Aug. 2017. url: https://indico.cern.ch/event/567550/contributions/2686391/attachments/1512060/2358335/ACAT_CondDB_release.pdf
14. L. Wood, T. Elsethagen, and K. Fox, "Belle II Conditions Database: Status and Plans," Pacific Northwest National Laboratory, Oct. 2017. url: https://confluence.desy.de/download/attachments/77820194/CondDB_Status_v2.pdf?version=1&modificationDate=1511655220433&api=v2
15. L. Wood, "Conditions Database Code Review," Nov. 2017. url: https://confluence.desy.de/display/BI/Conditions+Database+Code+Review
16. https://indico.cern.ch/event/315816/contribution/0/attachments/606182/834226/ALFA_CWG13_250414.pptx
17. https://indico.cern.ch/event/420329/contributions/1883596/attachments/877515/1231388/DBAccessClasses_AliceOffWeek_031005.pdf
Straw Detector and
Online System
P. Bhatnagar, D. Satria
20 June 2018
I. Straw Detector
The data generated by the straw detector are divided into four types: mapping, status, alignment, and calibration, as shown in Table 1. The mapping data consist of the electronic channel ID and straw ID. Each channel of the straw detector has one integer of status data. The alignment data for every straw consist of 18 doubles; with 18,000 straws in total, the alignment data amount to approximately 324,000 doubles. The calibration data contain the v-shape and resolution data. The v-shape is parameterized in bins of R, with 100 bins and 20 parameters per bin; the resolution is assumed to have the same data size. In total, including calibration, the data amount to 328,000 doubles and 54,000 integers.

Table 1. Predicted generated data from the straw detector

Data Type    Parameter                  Size                   Total data
Mapping      Channel ID and straw ID    2 integers / straw     36,000 integers
Status       Channel status             1 integer / channel    18,000 integers
Alignment    Straw alignment            18 doubles / straw     324,000 doubles
Calibration  V-shape: 2,000 parameters; resolution: 2,000 parameters   328,000 doubles and 54,000 integers (overall total)
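The quoted totals can be reproduced from the per-item sizes. The split of the grand total into 328,000 doubles and 54,000 integers is inferred from the arithmetic (324,000 + 2 × 2,000 doubles; 36,000 + 18,000 integers):

```python
N_STRAWS = 18_000                 # total number of straws
N_BINS = 100                      # bins of R for the v-shape
PARS_PER_BIN = 20                 # parameters per bin

mapping_ints = 2 * N_STRAWS               # channel ID + straw ID per straw
status_ints = 1 * N_STRAWS                # one status integer per channel
alignment_dbls = 18 * N_STRAWS            # 18 doubles per straw
vshape_dbls = N_BINS * PARS_PER_BIN       # 2,000 v-shape parameters
resolution_dbls = N_BINS * PARS_PER_BIN   # assumed equal to the v-shape

total_doubles = alignment_dbls + vshape_dbls + resolution_dbls
total_integers = mapping_ints + status_ints
```

This yields 328,000 doubles and 54,000 integers, matching Table 1.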
The conditions data of the straw detector are stored in the conditions database, which contains a table associating electronic channel IDs with straw IDs for mapping, a table associating each straw ID with its status, parameters describing the best-known geometry for alignment, and the post-calibration performance of each straw. Track reconstruction in both the online and offline systems uses these conditions data: they are needed to reconstruct tracks as precisely as possible and to update the calibration and alignment coefficients in the online system. Typically, a new calibration, a repair, or any malfunction of the straw detector triggers a new version of the conditions data. Normally, the straw channel status is updated once a week, the mapping data of the detector 1-2 times per year, and the alignment and calibration parameters likewise rarely, a few times per year. Track reconstruction requires the conditions data from the straw detector, the magnetic field data of the spectrometer, and some data from the precise timing detector. Versioning of the conditions data sets with time validity windows is important for the tracking.
II. Online System

All conditions data from the subdetectors should be written to the conditions database without any filtering in the online system. Approximately 100 bytes of data about beam conditions will be written to the conditions database every 6 to 10 seconds. The online system requires configuration data for every cycle from the conditions database. During the extraction of the proton beam from the experiment, the online system fetches the proton intensity and position (between 10 and 100 bytes every 10 seconds) from the conditions database.
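For a sense of scale, the beam-conditions stream described above is tiny; a rough worst-case estimate from the quoted figures:

```python
record_bytes = 100   # upper bound per beam-conditions record (from the text)
interval_s = 6       # one record every 6-10 seconds; take the fastest case

records_per_day = 24 * 3600 // interval_s      # 14,400 records per day
bytes_per_day = records_per_day * record_bytes # 1,440,000 bytes, ~1.4 MB/day
```

Even in the worst case this stream adds well under 2 MB per day to the conditions database.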
When a trigger process in the online system produces results, the conditions data are versioned. If the result has only one part, there is a single version of the conditions data; if further updates to the result are possible, there can be more than one version. For example, in data reprocessing there can be many updates of the alignment and calibration constants, and hence many versions of the data.