Comparative Study of
Condition Databases
P.Bhatnagar, G.De Almeida Calheiros, D.Fratte, K.Karmas, A.Megerdoumian, D.Satria,
B.Tassew
04 July 2018
Introduction
Technology is evolving rapidly, and with it comes a flood of data and therefore of
databases, in which the data collected and produced by users, computers, and
experiments is saved. Like any other field, physics has many experiments taking place
worldwide, which produce large amounts of data for physicists to examine. To make sense of
such massive chunks of data, we require special databases and computing power to collect the data
without any loss, filter it, and format it so that it is easily accessible, readable, and available. In
this comparative study we focus on conditions databases in the field of particle
physics. Particle physics is the branch of physics that deals with the properties, relationships,
and interactions of subatomic particles [1], and it is the field in which CERN specialises.
Within particle physics, a new dimension has emerged in recent years: the study of hidden
particles. These are particles that scientists believe are produced when
two protons collide in a beam experiment, and searching for them has become a new challenge
over the last decade. As evidence that such theorized particles can indeed be found, one can look
back to 2012, when the Higgs boson was discovered at the LHC within the CMS experiment. This
research is ongoing, and it is predicted that it can make a huge difference to future and past theories.
In recent times, many other scientific institutes around the world have started taking hidden
particles seriously, and multiple facilities have been established for such experiments.
Currently, CERN is a forerunner when it comes to conditions databases with its Large Hadron
Collider situated in Geneva, Switzerland, but there are many others, such as BaBar
(Stanford University), Belle II (Japan), and FAIR, to mention a few. For this comparative
study, we will focus on the LHC, BaBar, Belle II, and FAIR.
BaBar
The BaBar experiment uses a distributed database architecture for its conditions data [2]. This
version of the database was deployed in 2002, replacing the original version
implemented in 1999, whose performance and scalability were limited, partly
because it was not designed for use in a distributed environment. Another reason
for the change was that the original version exposed internal implementation details through its
interface. The experiment uses the same technology in both the online and offline systems. The
underlying database was implemented in Objectivity/DB for compatibility reasons, although the
design itself is technology neutral. The central concept behind this design is the 'origin'. Each
installation of the database has its native origin. Origins are the owners of the data, i.e. data
associated with an origin can only be updated at database installations of the same origin.
When an installation contains data from other origins, that data is available in read-only
mode. The design handles versioning of conditions data through a 2-D geometric model
whose axes are the insertion time and the interval-of-validity timeline.
The main advantage of this approach is its support for a distributed architecture. Its
limitations are the complexity of the schema, which contains 50 persistent
classes for metadata and more than 400 classes for the payload, and the effort
required for database and content management. The figure below displays the implementation
followed by the BaBar experiment.
Figure 1. Distributed model of the CDB. Figure 2. Map of the distributed CDB.
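The 'origin' ownership rule can be illustrated with a small sketch (the class and method names are invented for illustration, not BaBar's actual API): each installation accepts writes only for its native origin, while replicated foreign-origin data is read-only.

```python
class ConditionsInstallation:
    """Toy model of a BaBar-style distributed conditions DB installation."""

    def __init__(self, native_origin):
        self.native_origin = native_origin
        self.data = {}  # (origin, key) -> payload

    def write(self, origin, key, payload):
        # Data may only be updated at installations of the same origin.
        if origin != self.native_origin:
            raise PermissionError(
                f"origin {origin!r} is read-only at installation "
                f"{self.native_origin!r}")
        self.data[(origin, key)] = payload

    def replicate_from(self, other):
        # Foreign-origin data arrives by replication and is read-only here.
        self.data.update(other.data)

    def read(self, origin, key):
        return self.data[(origin, key)]
```

A remote installation can thus read replicated data but any attempt to update it locally is rejected.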
CMS
In the CMS experiment there were also two versions of the conditions database, plus
exploratory research towards a third. The first version followed the Object-Relational Mapping
(ORM) approach. ROOT-based packages were used for C++ class introspection,
which meant that the inner workings of the classes needed to be known in order to serialize
them. This led to a complex system in which many layers of abstraction were needed on top of
the data access layer, and a large number of tables were generated, which then had to be stored
in different schemata per subsystem to avoid naming conflicts.
For the second version [3], a much simpler database design was used in order to simplify
reading and writing conditions. This design reduced the data to five relational tables
(GLOBAL_TAG_MAP, GLOBAL_TAG, TAG, IOV, and PAYLOAD), of which only one,
the payload table, holds the bulk data. To read a specific payload, a global tag and
tag are specified together with an interval of validity, and a specific payload is identified by a
timestamp. Indexing was used to make queries execute faster. Oracle was selected as the
underlying persistence technology for the conditions data because of its scalability, reliability and
resilience, and the 24/7 database administration support from the CERN IT department. SQLite was
used to store user data in private files as an interchange format, and a C++ database abstraction
layer called CORAL was added to achieve interoperability between SQLite and Oracle.
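The five-table layout can be illustrated with an in-memory SQLite sketch (the column names and the 'record' key used to resolve a tag are simplified assumptions, not the actual CMS schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Simplified five-table layout; only PAYLOAD holds bulk data.
cur.executescript("""
CREATE TABLE GLOBAL_TAG (name TEXT PRIMARY KEY);
CREATE TABLE TAG        (name TEXT PRIMARY KEY, object_type TEXT);
CREATE TABLE GLOBAL_TAG_MAP (global_tag TEXT, record TEXT, tag TEXT);
CREATE TABLE IOV     (tag TEXT, since INTEGER, payload_hash TEXT);
CREATE TABLE PAYLOAD (hash TEXT PRIMARY KEY, data BLOB);
CREATE INDEX iov_idx ON IOV (tag, since);  -- indexing speeds up IOV queries
""")
cur.execute("INSERT INTO GLOBAL_TAG VALUES ('GT_v1')")
cur.execute("INSERT INTO TAG VALUES ('BeamSpot_v2', 'BeamSpotObject')")
cur.execute("INSERT INTO GLOBAL_TAG_MAP VALUES "
            "('GT_v1', 'BeamSpotRcd', 'BeamSpot_v2')")
cur.executemany("INSERT INTO IOV VALUES (?, ?, ?)",
                [("BeamSpot_v2", 0, "h0"), ("BeamSpot_v2", 1000, "h1")])
cur.executemany("INSERT INTO PAYLOAD VALUES (?, ?)",
                [("h0", b"payload-0"), ("h1", b"payload-1")])

def fetch_payload(global_tag, record, timestamp):
    """Resolve global tag + record to a tag, then pick the IOV covering
    the given timestamp (the largest 'since' not exceeding it)."""
    cur.execute("""
        SELECT p.data FROM GLOBAL_TAG_MAP m
        JOIN IOV i ON i.tag = m.tag AND i.since <= ?
        JOIN PAYLOAD p ON p.hash = i.payload_hash
        WHERE m.global_tag = ? AND m.record = ?
        ORDER BY i.since DESC LIMIT 1""", (timestamp, global_tag, record))
    return cur.fetchone()[0]
```

For example, `fetch_payload("GT_v1", "BeamSpotRcd", 1500)` returns the payload whose IOV starts at 1000.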
Many other technologies were used in this infrastructure, including the Boost library for
serialization of C++ objects; a Clang-based build infrastructure to generate serialization code;
Frontier for load balancing and simultaneous connections to the database; a Conditions Uploader
for centralized upload of new conditions data, synchronization, security, and logging;
and a web-based conditions browser for inspecting and searching the available data. The ATLAS
experiment has a conditions-database infrastructure similar to that of CMS [4].
CMS researchers have also presented a study of alternative data storage back-ends for the CMS
conditions database, evaluating some of the most popular NoSQL options to support a key-value
representation of the CMS conditions. The study was motivated by important
quality attributes they intend to achieve, such as highly available and consistent access
to conditions data; performance enhancement is also taken into account.
The data that CMS handles has specific characteristics: it does not require table
joins; there are no complex or nested queries; conditions are never updated; and there are no
concurrent writes, among others. For those reasons they considered a NoSQL
approach in this study and ran a set of measurement tasks on the candidates
for further analysis.
The results are specific about which database technology performed better under
their parameters and data requirements. MongoDB and Cassandra appear to outperform the other
options considered in the paper. Both show excellent performance, simplicity, and
flexibility, though MongoDB has the best fetch-time performance.
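A key-value layout of this kind can be sketched with plain Python dictionaries standing in for a NoSQL store such as MongoDB or Cassandra; the (tag, since) key scheme shown is an illustrative assumption, not the layout from the paper. Because conditions are never updated, writes are insert-only and reads reduce to a floor lookup on the 'since' time.

```python
import bisect

store = {}          # (tag, since) -> payload, standing in for a NoSQL store
since_index = {}    # tag -> sorted list of 'since' values

def put(tag, since, payload):
    key = (tag, since)
    assert key not in store, "conditions are never updated"
    store[key] = payload
    bisect.insort(since_index.setdefault(tag, []), since)

def get(tag, timestamp):
    # Floor lookup: the latest 'since' not exceeding the timestamp.
    sinces = since_index[tag]
    i = bisect.bisect_right(sinces, timestamp) - 1
    return store[(tag, sinces[i])]
```

This write-once access pattern is exactly what makes key-value stores a plausible fit for conditions data.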
Belle II
The Belle II experiment makes extensive use of industry-standard applications and tools so that
the system can be supported by IT staff instead of dedicated scientists, reducing cost and
effort [5, 6]. The performance has proven more than sufficient for the current loads. Figure 3
depicts the current back-end implementation of Belle II; every component in Figure 3 is
implemented as a separate Docker container.
Figure 3. Current back-end implementation of conditions database in Belle II experiment.
The conditions database in this project uses PostgreSQL. To keep the database small, the
payloads consist only of references to files on a separate server. Three
NGINX HTTP servers act as file servers, complemented by a load balancer that distributes the
traffic among them. The back-end file system is based on Lustre, a parallel distributed file
system. Payara is the application server providing the server environment that connects to
the Lustre file system. The application server was later migrated from the full Payara Server
to Payara Micro to solve the problem of running out of memory when the request rate is
significantly higher than expected.
Integration with Hibernate and Hazelcast provides two levels of application caching
in addition to web-proxy caching. The first reason for the additional caching is to improve
performance in use cases where the web proxy has a low hit/miss ratio, and in turn to minimize
database access. Secondly, the application cache allows clustering and synchronization of
service installations: service WRITEs (i.e. PUT/POST/DELETE) are communicated as
cache-invalidation messages to the other cluster members. Hibernate, as cache level 1, provides
the conditions data storage and retrieval query service; Hazelcast, as cache level 2, provides
entity caching for Hibernate. A Squid cache proxy server stores REST resource responses,
reducing the load on the API and database.
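The write-invalidation scheme just described can be modelled in a few lines. This is a toy sketch, assuming a shared dict in place of PostgreSQL and per-node dicts in place of the Hibernate/Hazelcast caches; it is not the Belle II code.

```python
class CacheNode:
    """One service installation with a local application cache."""

    def __init__(self, database, cluster):
        self.db = database      # shared dict standing in for PostgreSQL
        self.cache = {}         # local entity cache
        self.cluster = cluster
        cluster.append(self)

    def read(self, key):
        if key not in self.cache:           # cache miss -> hit the database
            self.cache[key] = self.db[key]
        return self.cache[key]

    def write(self, key, value):
        self.db[key] = value
        self.cache[key] = value
        # WRITEs (PUT/POST/DELETE) are broadcast to the other cluster
        # members as cache-invalidation messages.
        for node in self.cluster:
            if node is not self:
                node.cache.pop(key, None)
```

After a write on one node, the other nodes' next read misses their stale cache and re-fetches the new value from the database.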
A planned improvement to the data cache is to place portions of it at three or more sites
with Hazelcast, providing faster response times and backup capability. Once the Squid and API
services are scalable, the need for scalability in PostgreSQL will be assessed, since
PostgreSQL is currently still a single-site instance. At present, there are no
restrictions on database access; as an improvement, authentication is planned for
writing, while reading would remain open.
LHCb
The LHCb project uses a different approach to saving conditions data: it maps the time
evolution of conditions data onto a filesystem hierarchy and uses Git as the internal database
to manage it. This approach is not much different from the workflow of any other regular
software project, and it reduces the complexity of persisting the data while
allowing a researcher to easily replace parts of the data before analysing it, without changing the
whole database. On the other hand, it adds complexity to how one works with the data: to
make the solution workable, complexity must be added in several places. For
instance, to replace data it is necessary to create a nested Git repository. Another interesting
point is that, to share the information with other researchers, a distribution
service named CVMFS (CernVM File System) is used in conjunction with Git.
To conclude, different technologies are in use for conditions data, although one
pattern seems common among them in terms of the data model. This model consists of a
Global Tag table that points to payload data for a given interval of validity and is related to a
system Tag, which in turn consists of many intervals of validity. The diagram below shows this
model [11].
Figure 4. The commonly followed approach as described by Laycock et al. [11]
The table below shows the similarities and differences amongst the conditions
databases mentioned above. They are compared on relational vs. non-relational design,
server and client technologies, design rationales, features, and limitations.
Experiment: BaBar, 1st version
Relational vs non-relational: Relational
Technologies (server): Objectivity/DB
Limitations: performance and scalability issues; the design did not support distributed computing

Experiment: BaBar, 2nd version (CDB)
Relational vs non-relational: Relational
Technologies (server): first implemented in Objectivity/DB for compatibility purposes
Design rationales: address the limitations of the first version
Features: distributed design
Limitations: complex schema with 50 persistent classes (metadata) and 400 unique classes (payload)

Experiment: CMS, 2nd version
Relational vs non-relational: Relational (ORM, object-oriented, C++)
Technologies (server): Oracle DB; Boost library (runtime serialization of objects to records); Clang Python library (generates code for class introspection automatically); CORAL plugins; POOL persistency framework
Technologies (client): CORAL (C++ abstraction layer); Frontier layer for load balancing; SQLite for private user files; SQLAlchemy binding layer for RDBMS access in Python (DB connection and transaction management)
Design rationales: portability across platforms; stability and reliability; multithreading support; scalability, reliability and 24 h support (Oracle)
Features: simple queries; time search for IOVs partially done on the client side (cache index); uniquely identifiable payloads (with hash); only 5 tables
Limitations: Boost requires a dedicated build infrastructure (Clang)
Note: a study for CMS suggested the use of NoSQL databases such as MongoDB, Cassandra, and Riak; MongoDB showed the highest performance

Experiment: ATLAS and LHCb [12]
Relational vs non-relational: Relational
Technologies (server): CORAL Server; Oracle, MySQL, SQLite, FroNTier
Technologies (client): COOL (a collaboration between ATLAS and LHCb)
Design rationales: performance optimization for data insertion and retrieval
Features: online and offline clusters; support for different relational technologies; ported to 64-bit platforms; server-side query optimization; distributed DB services
Limitations: authentication only via usernames and passwords; performance bottlenecks due to separate physical connections from different client jobs

Experiment: Belle II [13-15]
Relational vs non-relational: Relational
Technologies (server): PostgreSQL; Swagger for application-interface development; Payara as the Java EE application server
Design rationales: extensive use of industry-standard applications and tools so that the project can be largely supported by IT staff instead of scientists, reducing cost and effort; make maintenance as easy as possible
Features: a standardized REST API makes client coding independent of the actual database implementation details; a Squid cache proxy server reduces load on the API and database; the database and payload (file) server are separate systems; to keep the database small, payloads consist only of references to files on a separate server (these "payload files" are currently assumed to be ROOT objects by the client, although nothing in the database design limits the data type); the file server consists of three NGINX HTTP servers behind a load balancer that distributes traffic evenly across the servers; the back-end file system is based on Lustre, an open-source parallel file system for high-performance computing environments; each component of the database back-end is implemented in Docker; communication with the database uses standard HTTP (XML and JSON data); Hibernate, as cache level 1, provides the conditions data storage and retrieval query service; the Hazelcast v3.8.6 in-memory data grid supports scalable caching (level 2) as well as transparent distribution of the service across multiple sites
Limitations: Payara (Java) runs out of memory if the request rate is significantly higher than expected; the PostgreSQL database is still a single-site instance and not currently scalable (several options for a distributed database are being investigated, including OpenStack Trove and CockroachDB, with the replicated databases sited in tandem with the Hazelcast cache cluster sites); authentication is currently not implemented but is planned for services that modify the database (leveraging the X.509 authentication already present in the Belle II Grid computing interface is being investigated)
Planned improvements: placing portions of the cache at three or more sites with Hazelcast to provide faster response times as well as backup capability for site or network outages

Experiment: LHCb
Relational vs non-relational: Non-relational
Technologies (server): Git
Technologies (client): GitCondDB (Git)
Design rationales: easy to persist data; data saved as text files
Features: easy to create new versions of the data; able to deal with a huge amount of data
Limitations: the logic for accessing and managing the data is complex and implemented in the browser; other services are necessary in order to share the data; performance degrades once the database exceeds 10 GB

Experiment: ALICE
Relational vs non-relational: File system
Technologies (server): ROOT files on local and Grid storage
Technologies (client): AliEn client; AliCDBManager; AliRoot CDB access framework
Features: offline and online object versioning done during storage
The difference between such systems lies in the representation of the payload. Some
experiments, like ALICE and NA62, use a file-based approach, while others, like ATLAS, use
BLOB fields in relational databases. Yet another approach is seen in LHCb, which uses Git to
store its conditions data and handle versioning. The advantage of relational databases like
Oracle lies in the fact that the lookup of payload data via Global Tag, Tag, and IOVs is easily
handled through indexing. On the other hand, the file-based approach is very simple and
flexible in terms of the kind of data that can be saved; its challenge lies in having
a performant caching layer that does the filesystem mapping efficiently.
A new approach suggested by some studies is that NoSQL databases can
support this payload-based data well [10]. Another suggestion, made by Laycock et al. [11], is a
REST web service with a client interface for database insertion, backed by a relational data
model as the persistence layer.
LHCb Implementation:
The above diagram describes the Git workflow used in GitCondDB. The upper part shows the
auxiliary scripts that are run to convert the existing data (from the previous database
implementation) into the new format; the lower part presents the integration and workflow
with Git.
Concept: the idea is to map the time evolution of conditions data onto a filesystem hierarchy and use Git's internal database to manage versions and tags.
Layout: the conditions database uses a 3-dimensional structure, where the axes are:
- condition id (usually, the path to an XML file)
- version (tag, commit id, branch name, ...)
- Interval of Validity (IOV, the range of event times a value should be used for)
The first two axes, id and version, map directly to the two axes of a Git repository (path and version), while for the IOVs a special convention is used:
- If a condition id points to a file, there is only one value for the whole span of all event times (from 0 to MAX).
- If a condition id points to a directory, it must contain a file called IOVs, in which each line consists of the start of the IOV for a value and the relative path to the file containing that value, or payload (the end of the IOV is defined by the next line, or set to MAX for the last line of the file). If the path to the value points to a directory, the algorithm to extract the value is repeated on the contained IOVs file, and the deduced IOV is truncated to the boundaries deduced in the previous iteration.
To explain better how the IOV definitions work, let's assume we need to find the value for condition Conditions/MyDetector/Cond.xml at event time 1234.

If Conditions/MyDetector/Cond.xml is a file: the value is the content of the file and the IOV is [0, MAX).

If Conditions/MyDetector/Cond.xml is a directory (simple case):
1. We open Conditions/MyDetector/Cond.xml/IOVs, where we find something like
0 value1
100 value2
200 value1
1000 value3
2000 value4
2. The value is the content of Conditions/MyDetector/Cond.xml/value3 and the IOV is [1000, 2000).

If Conditions/MyDetector/Cond.xml is a directory (nested case):
1. We open Conditions/MyDetector/Cond.xml/IOVs, where we find something like
0 value1
100 subdir1
1200 subdir2
2. We open Conditions/MyDetector/Cond.xml/subdir2/IOVs to look for IOVs in [1200, MAX), where we find something like
1000 ../value3
2000 ../value4
3. The value is the content of Conditions/MyDetector/Cond.xml/value3 and the IOV is
[1200, 2000) (i.e. the IOV found in the subdirectory, bounded to the IOV of the subdirectory itself).
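The lookup procedure above can be sketched as a short recursive function. Here a nested dict stands in for the directory tree, with each IOVs file represented as a sorted list of (since, path) pairs; relative paths such as ../value3 are flattened into plain keys for simplicity. This is an illustration of the algorithm as described, not the actual GitCondDB code.

```python
MAX = float("inf")

def resolve(node, t, lo=0, hi=MAX):
    """Return (payload, (iov_start, iov_end)) for event time t.

    'node' is either a payload (a leaf "file") or a dict (a "directory")
    whose "IOVs" entry is a sorted list of (since, path) lines.
    """
    if not isinstance(node, dict):      # a plain file: one value for [lo, hi)
        return node, (lo, hi)
    iovs = node["IOVs"]
    for i, (since, path) in enumerate(iovs):
        # The end of each IOV is the 'since' of the next line, or MAX.
        end = iovs[i + 1][0] if i + 1 < len(iovs) else MAX
        if since <= t < end:
            payload, (s, e) = resolve(node[path], t, since, end)
            # Truncate the nested IOV to the boundaries of this level.
            return payload, (max(s, since), min(e, end))
    raise KeyError(f"no IOV covers event time {t}")
```

Running it on the simple-case listing above yields value3 with IOV [1000, 2000), and on the nested case value3 with IOV [1200, 2000), matching the walk-through.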
ALICE Implementation:
How is FairMQ used in ALICE-FAIR? Note that FairMQ is for the transport layer only.
Sample Code [16]

ID="101"
inputFile="~/install/FairRoot/example/…/data/testdigi_TGeant3.root"
parameterFile="~/install/FairRoot/example/…/data/testparams_TGeant3.root"
branch="FairTestDetectorDigi"
eventRate="0"
numIoThreads="1"
outputSocketType="push"
outputBufSize=$buffSize
outputMethod="bind"
outputAddress="tcp://*:5565"

/local/home/cwg13/install/FairRoot/build/bin/testDetectorSampler$dataFormat $ID $inputFile $parameterFile $branch $eventRate $numIoThreads $outputSocketType $outputBufSize $outputMethod $outputAddress
Implementation [17]
Fig. 1. Activation of a new storage location. Fig. 2. Write. Fig. 3. Retrieval.
Bibliography:
1. https://en.wikipedia.org/wiki/Particle_physics
2. A. Khan, I. A. Gaponenko and D. N. Brown, "Distributed Online Conditions Database of the BaBar Experiment," IEEE Transactions on Nuclear Science, vol. 55, no. 5, pp. 2579-2583, Oct. 2008.
3. S. Di Guida et al., "The CMS Condition Database System," in Proceedings of the CHEP conference, J. Phys.: Conf. Ser., 2015.
4. D. Barberis et al., "Designing a future Conditions Database based on LHC experience," J. Phys.: Conf. Ser. 664, 2015.
5. L. Wood et al., "Implementing the Belle II Conditions Database using Industry-Standard Tools," presented at the ACAT conference, Aug. 2017. url: https://indico.cern.ch/event/567550/contributions/2686391/attachments/1512060/2358335/ACAT_CondDB_release.pdf
6. L. Wood, T. Elsethagen, and K. Fox, "Belle II Conditions Database: Status and Plans," Pacific Northwest National Laboratory, Oct. 2017.
7. https://twiki.cern.ch/twiki/bin/view/LHCb/GitCondDB
8. https://twiki.cern.ch/twiki/bin/view/LHCb/CondDBManagement
9. https://cernvm.cern.ch/portal/filesystem
10. R. Sipos, "Evaluation of NoSQL prototypes for the CMS conditions database," 2015 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), San Diego, CA, 2015, pp. 1-7.
11. P. J. Laycock, D. Dykstra, A. Formica, G. Govi, A. Pfeiffer, S. Roe, R. Sipos, "A Conditions Data Management System for HEP Experiments," presented at the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, Nov. 2017.
12. A. Valassi, R. Basset, M. Clemencic, G. Pucciani, S. A. Schmidt and M. Wache, "COOL, LCG conditions database for the LHC experiments: Development and deployment status," 2008 IEEE Nuclear Science Symposium Conference Record, Dresden, Germany, 2008, pp. 3021-3028. url: https://ieeexplore.ieee.org/document/4774995/
13. L. Wood et al., "Implementing the Belle II Conditions Database using Industry-Standard Tools," presented at the ACAT conference, Aug. 2017. url: https://indico.cern.ch/event/567550/contributions/2686391/attachments/1512060/2358335/ACAT_CondDB_release.pdf
14. L. Wood, T. Elsethagen, and K. Fox, "Belle II Conditions Database: Status and Plans," Pacific Northwest National Laboratory, Oct. 2017. url: https://confluence.desy.de/download/attachments/77820194/CondDB_Status_v2.pdf?version=1&modificationDate=1511655220433&api=v2
15. L. Wood, "Conditions Database Code Review," Nov. 2017. url: https://confluence.desy.de/display/BI/Conditions+Database+Code+Review
16. https://indico.cern.ch/event/315816/contribution/0/attachments/606182/834226/ALFA_CWG13_250414.pptx
17. https://indico.cern.ch/event/420329/contributions/1883596/attachments/877515/1231388/DBAccessClasses_AliceOffWeek_031005.pdf
Straw Detector and
Online System
P. Bhatnagar, D. Satria
20 June 2018
I. Straw Detector
The data generated by the straw detector are divided into four types: mapping, status, alignment, and calibration, as shown in Table 1. The mapping data consist of the electronic channel ID and straw ID. Each channel of the straw detector has one integer of status data. The alignment data for every straw consist of 18 doubles; with 18,000 straws in total, the alignment data amount to approximately 324,000 doubles. The calibration data contain the v-shape and resolution data. The v-shape is parameterized in bins of R, with 100 bins and 20 parameters per bin; the resolution is assumed to have the same data size. In total, including calibration, the data amount to 328,000 doubles and 54,000 integers.

Table 1. Predicted generated data from the straw detector

Data Type    Parameter                  Size                   Total data
Mapping      Channel ID and straw ID    2 integers / straw     36,000 integers
Status       Channel status             1 integer / channel    18,000 integers
Alignment    Straw alignment            18 doubles / straw     324,000 doubles
Calibration  V-shape: 2,000 parameters; resolution: 2,000 parameters   328,000 doubles and 54,000 integers (overall total)
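The quoted totals can be reproduced from the per-item sizes. The split of the grand total into 328,000 doubles and 54,000 integers is inferred from the arithmetic (324,000 + 2 × 2,000 doubles; 36,000 + 18,000 integers):

```python
N_STRAWS = 18_000                 # total number of straws
N_BINS = 100                      # bins of R for the v-shape
PARS_PER_BIN = 20                 # parameters per bin

mapping_ints = 2 * N_STRAWS               # channel ID + straw ID per straw
status_ints = 1 * N_STRAWS                # one status integer per channel
alignment_dbls = 18 * N_STRAWS            # 18 doubles per straw
vshape_dbls = N_BINS * PARS_PER_BIN       # 2,000 v-shape parameters
resolution_dbls = N_BINS * PARS_PER_BIN   # assumed equal to the v-shape

total_doubles = alignment_dbls + vshape_dbls + resolution_dbls
total_integers = mapping_ints + status_ints
```

This yields 328,000 doubles and 54,000 integers, matching Table 1.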
The conditions data of the straw detector are stored in the conditions database, which contains a table associating electronic channel IDs with straw IDs for mapping, a table associating each straw ID with its status, parameters describing the best-known geometry for alignment, and the post-calibration performance of each straw. Track reconstruction in both the online and offline systems uses these conditions data: they are needed to reconstruct tracks as precisely as possible and to update the calibration and alignment coefficients in the online system. Typically, a new calibration, a repair, or any malfunction of the straw detector triggers a new version of the conditions data. Normally, the straw channel status is updated once a week, the mapping data of the detector 1-2 times per year, and the alignment and calibration parameters likewise rarely, a few times per year. Track reconstruction requires the conditions data from the straw detector, the magnetic field data of the spectrometer, and some data from the precise timing detector. Versioning of the conditions data sets with time validity windows is important for the tracking.
II. Online System

All conditions data from the subdetectors should be written to the conditions database without any filtering in the online system. Approximately 100 bytes of data about beam conditions will be written to the conditions database every 6 to 10 seconds. The online system requires configuration data for every cycle from the conditions database. During the extraction of the proton beam from the experiment, the online system fetches the proton intensity and position (between 10 and 100 bytes every 10 seconds) from the conditions database.
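For a sense of scale, the beam-conditions stream described above is tiny; a rough worst-case estimate from the quoted figures:

```python
record_bytes = 100   # upper bound per beam-conditions record (from the text)
interval_s = 6       # one record every 6-10 seconds; take the fastest case

records_per_day = 24 * 3600 // interval_s      # 14,400 records per day
bytes_per_day = records_per_day * record_bytes # 1,440,000 bytes, ~1.4 MB/day
```

Even in the worst case this stream adds well under 2 MB per day to the conditions database.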
When a trigger process in the online system produces results, the conditions data are versioned. If the result has only one part, there is a single version of the conditions data; if further updates to the result are possible, there can be more than one version. For example, in data reprocessing there can be many updates of the alignment and calibration constants, and hence many versions of the data.