

Simulated Luminosity Calculations for the Large Hadron Collider

Christopher Rogan

Fall 2004 Junior Paper

Advisor: Professor Daniel Marlow


1 Introduction

1.1 The Large Hadron Collider

The Large Hadron Collider (LHC) is a particle accelerator that will bring protons

and ions into head-on collisions at higher energies than have ever been achieved at existing colliders. It will be built along the Franco-Swiss border west of Geneva at the foot of the Jura Mountains. The LHC will be part of CERN, the European Organization for Nuclear Research, the world's largest particle physics center.

Presently, the most powerful tool for research at CERN is the Large Electron-Positron Collider (LEP). It has provided measurements unrivaled in quantity and quality, testing properties of the Standard Model to a fraction of 1%, a margin that is projected to soon reach one part in a thousand. By 1996 the LEP could achieve energies as high as 90 GeV per beam. The data produced by the LEP project are so accurate that they are sensitive to physical phenomena occurring at energies beyond the range that the collider itself can generate, providing an impetus for achieving greater energies with the LHC.

The Large Hadron Collider is designed to reuse the previously built 27 km LEP tunnel, along with its existing particle sources and pre-accelerators. Joint LHC/LEP operation is expected to be able to collide proton beams at energies around 7 TeV. Given these unsurpassed energies, the potential for research with the LHC is immense. [26]

1.2 The Compact Muon Solenoid

The detectors associated with the LHC include ATLAS, CMS, LHCb and ALICE.

For this research, the detector of note is the Compact Muon Solenoid (CMS). Like most detectors for particle physics, the CMS is built around a magnet system in order to facilitate the measurement of the momentum of charged particles. In the case of the CMS, the magnet is a large superconducting solenoid, the largest of its type ever constructed. [6] Calorimeters are located inside the detector to absorb electrons, photons and hadrons in order to measure their energies. There are two calorimeter layers: one measures the energies of electrons and photons, the other the energies of hadrons. These are the electromagnetic calorimeter (ECAL) and the hadronic calorimeter (HCAL) respectively. [5]

1.3 Physical Goals

The motivations for building the LHC are numerous, as there are many

unanswered questions in physics whose answers can only be discovered at the higher energies that the LHC will provide. For example, within the Standard Model, the idea of the Higgs mechanism suggests that the whole of space is filled with a ‘Higgs field’ and that particles acquire their masses through their interactions with this field. Associated with the Higgs field is at least one undiscovered particle, the Higgs boson, which is possibly detectable at the high energies that the LHC will be able to achieve. Also, theories of supersymmetry (SUSY) can be explored with the LHC as the presence of supersymmetric particles is expected. In addition to the questions that the LHC is


expected to answer, history has demonstrated that advances in physics are often unexpected, and the LHC will provide the opportunity for these serendipitous discoveries.

When two beams of particles collide head-on in the LHC, each beam has an interaction cross section, σ, which is the effective area of a beam for producing a scattering event. The event rate, R, in a collider is proportional to the cross section, and the factor of proportionality is called the luminosity, L:

R = L\sigma \qquad (1.3.1)

This concept is slightly reformulated for the purposes of this research. In the LHC, the beams of protons are organized in compact bunches, so that the collision of beams can be seen as a bunch crossing. Each bunch has an effective interaction cross section, and the beams can be calibrated so that bunch crossings occur at a set frequency, f. Given this frequency, the expected number of collision events in a bunch crossing, ⟨N_int⟩, can be expressed as

\langle N_{int} \rangle = \frac{L\sigma}{f} \qquad (1.3.2)

Therefore, the luminosity can be calculated if the observed number of interactions, the cross section and the frequency of bunch crossings are known. The problem is that it is difficult to calculate the value ⟨N_int⟩ given data from the CMS detectors. Bunch crossings occur at a point at approximately the radial center of the CMS within the LHC. The CMS appears as a cylindrical tunnel, and the calorimetric detectors are organized in a two-dimensional array on the inside of this tunnel. When an event occurs, new particles are produced that scatter in all directions, and the energies of incident particles at the detectors are recorded. The difficulty in calculating ⟨N_int⟩ arises because interactions produce a range of energy depositions: after each bunch crossing the only evidence of the interactions is the smear of transverse energy deposits over the array of CMS detectors, and one interaction with high transverse energies can be hard to distinguish from several less energetic interactions. The purpose of this research is to use a Monte Carlo simulation to generate data from such bunch crossings and events and to analyze the resulting energy deposits in the array of calorimeters in order to better calculate the number of events in each bunch crossing, quantifying the effects of working at different luminosities.
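As a quick check of the scales involved, Equation 1.3.2 with the nominal parameter values used later in Section 5.1.1 (f = 40 MHz, σ = 80 mb, L = 10^34 cm^-2 s^-1) gives

\langle N_{int} \rangle = \frac{L\sigma}{f} = \frac{(10^{34}\ \mathrm{cm^{-2}s^{-1}})(80\times 10^{-27}\ \mathrm{cm^{2}})}{4\times 10^{7}\ \mathrm{s^{-1}}} = 20

interactions per bunch crossing at the design luminosity, which sets the scale of the pile-up that the methods below must disentangle.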

2 Particle Accelerators

2.1 Types of Accelerators

In general, the basis for a particle accelerator is a vacuum pipe, which is permanently pumped out to keep the residual pressure as low as possible. Within this vacuum, electric fields are used to accelerate particles. Intense radio waves are generated by amplifiers and sent into Radio-Frequency (RF) cavities, where the waves resonate. As particles travel through an RF cavity they are accelerated by energy transferred from these radio waves. Linacs, or linear particle accelerators, are those in which the vacuum pipe is straight, so particles pass through each RF cavity only once, entering one side of the accelerator and exiting the other. To use RF cavities more effectively, some accelerators, called synchrotrons, are designed as a closed loop so that


particles pass through each cavity multiple times. To bend the beam of particles, dipole magnets are used, exerting a magnetic field on the beam. This magnetic field applies a force to the charged particle beam perpendicular to its motion, precisely the central force required to create a circular trajectory. These beams naturally diverge, so quadrupole magnets are used to focus them, providing constraints that keep them within the vacuum pipe. The effect of these magnets is analogous to that of a lens on a beam of light. [4]

Accelerator experiments can be separated into two additional types: fixed target and colliding. In the fixed-target case, a beam of particles is accelerated and then collided into the atoms of the target. New and scattered particles then generally travel in the direction of the incident beam, so detectors for this type of configuration are generally conical in shape. Most of the energy contained in the beam goes into the recoil of the target and center-of-mass motion, meaning that less is available for the creation of new particles. In colliding experiments, as the name suggests, two separate beams of particles are individually accelerated and then smashed into each other head-on, wasting no recoil energy and producing a more energetic collision. Radiated particles travel in all directions after this type of collision, so detectors are usually cylindrical or spherical, as is the case with the CMS. [19]

2.2 Details of the Large Hadron Collider

The LHC is a synchrotron-type accelerator, as demonstrated by its circular design in Figure 2.2.1. As with all accelerators, the particles that make up the beam are injected

Figure 2.2.1: The Large Hadron Collider


from a source outside of the accelerator. In the case of protons, a linac (LINAC 2 in Fig. 2.2.1) feeds the particles into the PS (Proton Synchrotron) Booster, PS, SPS and LHC successively, whereas for ions, particles are stored in an intermediate stage after Linac 3 in the LEIR, and the PS Booster is omitted. The PS Booster and PS are two synchrotron accelerators that achieve energies of 1.4 GeV and 26 GeV respectively. These two components make up the PS complex, whose primary purpose is to achieve sufficient brightness for eventual injection into the LHC. In addition to this, the PS complex minimizes space-charge effects, reduces emittance, and ensures that bunches are separated by precisely 25 ns intervals. The SPS is another, larger, synchrotron accelerator that increases the energy of the particle beams to 450 GeV before releasing them to the transfer lines TI2 and TI8, where they are injected into the LHC. [20] The LHC is actually two accelerators, projecting two beams in opposite directions with TI2 and TI8 as the beams’ respective sources.

The enormous power of the beams within the LHC requires the use of huge dipole magnets to generate the fields necessary to bend the beams in a circular trajectory. These powerful magnetic fields are achieved by using superconducting magnets, which require very low temperatures, meaning the whole LHC operates at about 300 degrees below room temperature and will be the largest superconducting installation in the world. For reasons of economy and compactness, the magnets for both beams are located in the same “2-in-1” housing. [18]

Due to the temperature requirements, the LHC requires a large cryogenic system to provide refrigeration for the different cryo-magnets, the accelerating RF cavities and their supplementary equipment. The LHC is divided into eight 3.3 km sections, each of which has its own cryogenic plant. The vacuum chamber is in the inner wall of the cryostat and operates at about 1.9 K. For economic reasons, the heat load from the synchrotron radiation and from image currents in the vacuum chamber is absorbed on a 'beam screen' operating between 5 and 20 K. [21]

2.3 Luminosity

As suggested earlier, the luminosity of an accelerator is a measure of the number

of collisions that occur per unit time for a given cross section. This is a quantity that researchers strive to maximize for several reasons. First, a higher luminosity means that interactions with smaller cross sections can be observed. Furthermore, greater luminosity implies an increased rate of interaction, which creates more opportunities for discovery and allows more detailed studies to be accomplished. This goal can be pursued through small changes at every step of the accelerator process, including increasing the number of bunches in the accelerator and achieving lower vacuum pressures. The LHC is designed to achieve a luminosity on the order of 10^34 cm^-2 s^-1.

3 The Compact Muon Solenoid Detector

3.1 Architecture of the CMS

The purpose of the CMS detector is to take measurements of muons, electrons and photons over the energy range that will exist in the LHC. The CMS is constructed around


a solenoid with superconducting coils that can generate a magnetic field of about 4 Tesla. The reason for a large magnetic field is this: in the presence of a magnetic field, charged particles experience a force according to

\vec{F} = \frac{q}{c}\,(\vec{v} \times \vec{B}) \qquad (3.1.1)

where F is the force on the particle, q is the particle's charge, v is the velocity of the particle and B is the magnetic field acting on it. The amount a particle is bent upon entering the CMS detector is proportional to the force on it, which is proportional to the magnitude of the magnetic field. At the same time, the amount the particle's trajectory bends is inversely proportional to the momentum of the particle. This means that high-resolution measurements of particles with large momenta require a large magnetic field.
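To make the momentum dependence concrete: equating this force with the centripetal force gives the radius of curvature of a charged track. In convenient units this is the standard accelerator-physics rule of thumb (an illustrative aside, not a formula from the text):

r\,[\mathrm{m}] \approx \frac{p\,[\mathrm{GeV}/c]}{0.3\,B\,[\mathrm{T}]}

so a 100 GeV/c particle in the 4 T CMS field follows an arc of radius roughly 83 m, and the momentum is inferred from the curvature of the reconstructed track.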

3.2 CMS Component Details

Within the CMS are five detectors whose ultimate purpose is to reconstruct events

after proton-proton collisions. The innermost subdetector is the tracker, a collection of 25,000 silicon strip sensors. Each sensor has an area of 150 × 150 µm and is between 300 µm and 500 µm thick. The purpose of the tracker is to reconstruct the trajectories of particles passing through the CMS, identifying the vertex, or spatial origin, of each particle. It does this by noting the location of the small currents generated when particles cross the array of silicon strips. [7]

Figure 3.2.1: CMS Architecture

The outermost layer of sub-detectors is comprised of muon detectors, placed outside of the solenoid coil. The purpose of this detector is to identify muons and, together with the tracker, measure their momentum accurately. The identification of muons is ensured, since within this detector is a layer of iron, which can only be traversed


by muons and neutrinos. These detectors consist of four muon stations arranged in concentric cylinders around the beam line. Identification is achieved by lining up the hits in at least 2 of the 4 muon stations. The muons are detected by arrays of drift tubes containing 50 µm stainless-steel wires surrounded by an Ar-CO2 gas mixture. Incident muons ionize the gas, and the freed electrons collect on the adjacent wires. A current is induced on nearby wires and the position of the muon can then be inferred. Since the proton-proton interaction rate is on the order of 1 GHz, and events can only be processed at ~10 MHz, the muon detector has a trigger system which selects only interesting events. [10]

Between the tracker and the muon detector, within the solenoid coils, are the calorimeter detectors ECAL and HCAL. The ECAL measures the energies of positrons, electrons and photons and is made up of over 80,000 scintillating lead-tungstate (PbWO4) crystals equipped with photodiodes and associated electronics (a scintillating material emits light when an ionizing particle enters it). When energetic photons interact with matter, they do so by pair production: the formation of a positron and an electron. Similarly, when electrons and positrons interact with matter, photons are radiated through the bremsstrahlung process. Upon entering the lead-tungstate crystals, energetic positrons, electrons and photons proceed to interact through the above two processes until they have energies comparable to the electrons in the crystal lattice. The collection of these particles is called the electromagnetic shower, and the result is that the lattice electrons are excited out of their ground state. Upon relaxation of these lattice electrons, visible photons are emitted and detected with an avalanche photodiode (APD). This photodetector creates a photocurrent proportional to the number of incident photons. Due to the low light yield, the current is amplified and then digitized. This signal travels off the detector via optical fibers to the trigger and data acquisition systems. [8]

The HCAL measures the energy of particles that interact via the strong interaction, the hadrons. In addition, the HCAL assists in the identification and measurement of quarks, gluons and neutrinos by noting missing transverse energy flows in events. In conjunction with the tracker, ECAL and muon detector, the HCAL also helps in identifying electrons, photons and muons. It is composed of the hadron barrel (HB) and hadron endcap (HE) calorimeters, alternating layers of 50 mm thick copper absorber plates and 4 mm thick plastic scintillating sheets. These sheets are sub-divided into 'Mega-tiles', which are covered in wavelength-shifting fibers. The tiles absorb particles and emit light, which is absorbed by the fibers. Subsequently, the fibers fluoresce and the light is collected at each subdivision. In addition, there are two hadronic forward (HF) calorimeters, one located at each end of the CMS detector. The HFs sit in areas of harsh radiation fields and are made of more resilient steel absorber plates, threaded with quartz fibers. Fiber optics transport the light to photomultipliers, whose signals are digitized and sent to the trigger and data acquisition systems. [9]

4 Neural Networks

4.1 Neural Network Formulation

The inspiration for artificial neural networks is the human brain itself. Neural networks mimic the structure and functionality of a biological nerve system in order to learn effectively in the same way. They are capable of making decisions based on


incomplete, noisy and disordered information, and can generalize rules from the data on which they are trained in order to apply the same rules to new stimuli. Neural networks are made up of processing elements (PE's) and weighted connections between the PE's. Figure 4.1.1 is an example of a typical neural network with three layers. A layer in a neural network consists of a collection of PE's, also called neurons. Each neuron within a layer collects the values propagated by its input connections and performs a pre-defined mathematical operation, producing a single output value, which is subsequently passed on via another weighted connection to other neurons. The example neural network in Figure 4.1.1 has three layers, Fx, Fy and Fz, which consist of neurons Fx = {x1, x2,…, xn}, Fy = {y1, y2,…, yp} and Fz = {z1, z2,…, zq}. The neurons in each layer are connected by weighted links, represented by the arrows in the figure. Each link carries both a label and a value. For example, in Figure 4.1.1 the connection from x1 to y2 is denoted w12. The values of these connection weights are generally determined by a neural network training procedure. In fact, the architecture of the network (the topology of neurons and links) is usually held fixed once decided upon, and the network learns through changes in these weights alone. The output of each neuron along a particular connection is multiplied by the weight associated with that connection before it is combined with other inputs at another neuron.

Figure 4.1.1 Topology of a feed-forward backpropagating neural network

The most common computation procedure performed by a neuron is to sum each of the input values (the outputs from neurons in previous layers multiplied by connection weights) and to use this sum as the input to an activation function, f, associated with the neuron:

y_i = f\!\left(\sum_{j=1}^{n} x_j\, w_{ji}\right) \qquad (4.1.1)

where y_i is the output of the neuron. The most common activation function is the hyperbolic tangent function

f(x) = \frac{e^{\alpha x} - e^{-\alpha x}}{e^{\alpha x} + e^{-\alpha x}} \qquad (4.1.2)

where α > 0.

A training cycle for a neural network proceeds as follows: an input vector of data is given to the network along with a vector of desired responses, one for each node in the output layer. The vector of inputs is propagated through the network, and the errors between the vector of desired responses and the vector of actual responses can be examined. This error is used to determine the weight changes of the connections in the network according to some prescribed learning rule. The best-known and most widely used learning algorithm is the backpropagation algorithm, introduced by Werbos. [27] A popular measure of the error, E, for a single input and output pattern is the sum of the squares of the differences between the desired output vector and the actual output vector at each output neuron:

E = \frac{1}{2}\sum_{i}(d_i - a_i)^2 \qquad (4.1.3)

where d_i and a_i are the desired and actual outputs of the ith neuron in the output layer. The learning routine of a backpropagation network assumes that each input vector is paired with a target vector; together they are called a training pair. For example, in the network shown in Figure 4.1.1 the input vectors A_k are propagated through the network and the output vectors B_k are then compared to the target vectors associated with each A_k by the error function E. Before training begins, all of the weights are assigned small random values in the range (-1, 1) to prevent the network from being dominated by any one input parameter. Learning is done by iteratively adjusting the weight matrices of the connections in the network, W and V, in a way that is intended to minimize E. Each input pattern is presented to the network; the error function is calculated and is then backpropagated through the network, adjusting these weights. This learning process continues until E becomes less than a predetermined minimum or some other stopping criterion is met. Once the network is trained, input vectors can be inserted into the network and outputs are 'produced' that coincide with the same pattern of input/output vectors that was used to train the network.

4.2 Neurocomputing and Optimization

Given the feed-forward backpropagation paradigm of neural networks, the problem of training the network reduces to minimizing the error function with respect to the connection weights of the network. The optimization schemes used to train networks in this research are the steepest descent and conjugate gradient methods, described in the following sections.

4.2.1 Steepest Descent Method

The standard backpropagation method is that of steepest descent, also known as the gradient descent method. This method treats each connection weight of the neural network to be trained as an independent dimension in a mathematical space of network weights, so each configuration of network weights corresponds to a point in this multi-dimensional weight space. The steepest descent method proceeds at each iteration by taking the gradient of the error function E in weight space and moving from the current weight configuration to a new one along the direction opposite to the gradient vector, the direction in weight space in which the error function decreases most rapidly.


Using the notation given by Sanchez-Sinencio [24] in the context of the example network in Figure 4.1.1, the steepest descent algorithm can be explained as follows. The output error over all of the output-layer neurons, Fz, is given by

E = \frac{1}{2}\sum_{j=1}^{q}(b_j - z_j)^2 \qquad (4.2.1.1)

where bj is the desired output and zj is the actual output. The output for a neuron in the layer Fz is given by

z_j = \sum_{i=1}^{p} y_i\, w_{ij} \qquad (4.2.1.2)

where wij is the weight between the ith neuron in the Fy layer and the jth neuron in the Fz layer. For each neuron in Fy, the output is given by

y_i = f\!\left(\sum_{h=1}^{n} x_h\, v_{hi}\right) = f(r_i) \qquad (4.2.1.3)

where v_{hi} is the weight between the hth neuron in the Fx layer and the ith neuron in the Fy layer, and r_i denotes the summed input to the ith neuron. The activation function, f, is the hyperbolic tangent function. As suggested earlier, the algorithm proceeds by taking the gradient of the cost function, E, with respect to the connection weights. Hence the weights between the Fy and Fz layers are adjusted using the gradient

\frac{\partial E}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}}\left[\frac{1}{2}\sum_{j=1}^{q}(b_j - z_j)^2\right] = -(b_j - z_j)\,y_i = -\delta_j\,y_i \qquad (4.2.1.4)

where δj is the error in the output of the jth neuron in the Fz layer. The weight adjustments between the Fx layer and the Fy layer are found using the chain rule:

\frac{\partial E}{\partial v_{hi}} = \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial r_i}\,\frac{\partial r_i}{\partial v_{hi}} = -\left[\sum_{l=1}^{p}(b_l - z_l)\,w_{il}\right] f'(r_i)\,x_h \qquad (4.2.1.5)

Once the gradient of the error is calculated, the weights are updated in the negative direction of the gradient at a certain rate:

w_{ij}^{new} = w_{ij}^{old} - \alpha\,\frac{\partial E}{\partial w_{ij}} \qquad (4.2.1.6)

v_{hi}^{new} = v_{hi}^{old} - \beta\,\frac{\partial E}{\partial v_{hi}} \qquad (4.2.1.7)

where α and β are positive constants called the learning rates. For the standard backpropagation algorithm the learning rates are fixed. A momentum term is introduced in order to speed the learning process and to prevent the algorithm from getting stuck in shallow local minima in weight space. With the momentum term, Equations 4.2.1.6 and 4.2.1.7 become

w_{ij}^{new} = w_{ij}^{old} - \alpha\,\frac{\partial E}{\partial w_{ij}} + \lambda\,\Delta w_{ij}^{old} \qquad (4.2.1.8)

v_{hi}^{new} = v_{hi}^{old} - \beta\,\frac{\partial E}{\partial v_{hi}} + \lambda\,\Delta v_{hi}^{old} \qquad (4.2.1.9)

where λ is the momentum rate, λ ∈ [0, 1], and Δw^{old} and Δv^{old} are the weight changes from the previous iteration.
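Written out in code, the full steepest-descent step is compact. The sketch below performs one training step with momentum for the three-layer network of Figure 4.1.1, implementing Equations 4.2.1.2 through 4.2.1.9 directly; the names, layer sizes and data layout are illustrative, not taken from the simulation used in this work.

```cpp
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;   // weight matrices indexed [source][target]

double f(double r)  { return std::tanh(r); }                          // activation, alpha = 1
double fp(double r) { double t = std::tanh(r); return 1.0 - t * t; }  // f'(r)

// One training step on the pair (x, b): forward pass (Eqs. 4.2.1.2-3),
// then weight updates with momentum (Eqs. 4.2.1.4-5 and 4.2.1.8-9).
// dW and dV hold the previous weight changes for the momentum terms.
void trainStep(const Vec& x, const Vec& b, Mat& V, Mat& W,
               Mat& dV, Mat& dW, double alpha, double beta, double lambda) {
    const size_t n = x.size(), p = W.size(), q = W[0].size();
    Vec r(p), y(p), z(q), delta(q), back(p);
    for (size_t i = 0; i < p; ++i) {                  // hidden layer, Eq. 4.2.1.3
        r[i] = 0.0;
        for (size_t h = 0; h < n; ++h) r[i] += x[h] * V[h][i];
        y[i] = f(r[i]);
    }
    for (size_t j = 0; j < q; ++j) {                  // output layer, Eq. 4.2.1.2
        z[j] = 0.0;
        for (size_t i = 0; i < p; ++i) z[j] += y[i] * W[i][j];
        delta[j] = b[j] - z[j];                       // output error delta_j
    }
    for (size_t i = 0; i < p; ++i) {                  // backpropagated error sum
        back[i] = 0.0;                                //   (bracket in Eq. 4.2.1.5)
        for (size_t j = 0; j < q; ++j) back[i] += delta[j] * W[i][j];
    }
    for (size_t i = 0; i < p; ++i)                    // -dE/dw = delta_j * y_i,
        for (size_t j = 0; j < q; ++j) {              //   update per Eq. 4.2.1.8
            dW[i][j] = alpha * delta[j] * y[i] + lambda * dW[i][j];
            W[i][j] += dW[i][j];
        }
    for (size_t h = 0; h < n; ++h)                    // -dE/dv per Eq. 4.2.1.5,
        for (size_t i = 0; i < p; ++i) {              //   update per Eq. 4.2.1.9
            dV[h][i] = beta * back[i] * fp(r[i]) * x[h] + lambda * dV[h][i];
            V[h][i] += dV[h][i];
        }
}
```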


4.2.2 Conjugate Gradient Method

In the context of neural networks, the conjugate gradient method can be thought of as an extension of the steepest descent algorithm to include variable learning rates. The algorithm proceeds as follows: an initial guess for the minimum in weight space, w_0, is assigned, and from it an initial search direction, s_0, is defined by

s_0 = -\left.\frac{\partial E}{\partial w}\right|_{w_0} = -g_0 \qquad (4.2.2.1)

Now a series of approximations w_k to the minimizer of E is generated as follows: for k = 0, 1, …, a scalar α_k is found so that the function

J(\alpha) = E(w_k + \alpha\,s_k) \qquad (4.2.2.2)

is minimized. Here α is analogous to the learning rate described earlier. This is a one-dimensional minimization process, which can be accomplished by the method described in Dennis. [12] The new estimate of the minimum w is then given by

w_{k+1} = w_k + \alpha_k\,s_k \qquad (4.2.2.3)

Several different methods have been suggested for choosing the new conjugate search direction s_{k+1}. [13] For this research, the method of Polak-Ribiere [22] was chosen, as it has been shown to be more numerically stable. [16] Using this approach, s_{k+1} is calculated by

s_{k+1} = -\nabla E(w_{k+1}) + \gamma_{k+1}\,s_k \qquad (4.2.2.4)

where

\gamma_{k+1} = \frac{g_{k+1}^{T}\,(g_{k+1} - g_k)}{g_k^{T}\,g_k}, \qquad g_k = \nabla E(w_k) \qquad (4.2.2.5)

4.2.3 Genetic Algorithms

Genetic algorithms are optimization schemes motivated by a natural selection scheme analogous to Darwin's for biological species. They are probabilistic rather than deterministic algorithms, and they search through 'populations' of solutions instead of individual solutions. A general description of genetic algorithms will not be given, as they vary greatly in implementation and structure; instead, the genetic algorithm scheme used in this research and the motivation behind it are described. As mentioned in Section 4.1, once the topology of a neural network is decided upon, the only changes made to it are adjustments of the weighted connections within the network. The earlier description of neural networks takes for granted how difficult it is to come upon a particular network architecture that is conducive to solving the problem at hand. One method used to find a suitable architecture for the neural network used in this research was a genetic optimization algorithm. The precise details of the algorithm's implementation are described in Section 5.4.2. An overview of the algorithm is as follows:


First, in order to use a genetic algorithm approach, solutions must be encoded as chromosomes, or strings of binary bits. In this research a ceiling is imposed on the network size, as described in Section 5.4.2, and network architectures smaller than this maximum are encoded as chromosomes by having the presence or absence of a neuron in a layer represented by a 1 or 0 respectively at the index of the chromosome corresponding to that particular neuron. The presence of a particular link between neurons is represented in the same way on the chromosome. It is important to note that this formulation allows the collection of links between layers to be sparse rather than exhaustive, a favorable property as smaller networks are preferred for computational reasons. The algorithm begins with an initial population of these chromosomes of fixed size being generated randomly. The fitness of each member of the population is then calculated, where fitness is some positive function of a particular solution representing the relative success of that solution's architecture. The population is then sorted by fitness values, and from this ordering the probability of a particular solution's survival, P_Si, is calculated. As suggested by Baker [1], P_Si is made a linear function of a solution's rank within the population. From here, a scheme called stochastic remainder selection without replacement is used, where the expected number of copies of each solution, E_i, is

E_i = N\,P_{S_i} \qquad (4.2.3.1)

where N is the size of the population. Each solution is copied Ii times, where Ii is the integer part of Ei. The fractional remainder is the probability that the solution will be copied again. At this stage, all of the copies of solutions from the previous population are paired off at random, creating N/2 pairs. Each of the pairs undergoes a two-point crossover with some set probability Pc as suggested by Cavicchio. [3] This process is shown below in Figure 4.2.3.1:

Figure 4.2.3.1 A two-point crossover of reduced surrogates

Here, the paired chromosomes are represented as their reduced surrogates, which are strings of only the bits where the two differ, as suggested by Booker. [2] The reduced surrogates are then represented as loops and two points in the loops are selected at random. All of the bits within these two points in the loops are then switched between the two chromosomes. This stage of crossovers is analogous to biological breeding, and allows new solutions to be introduced into the population.


Once crossovers are complete, each bit on every chromosome undergoes a mutation with some small set probability Pm, which means the value of the bit is flipped. The remaining population is now the new generation of solutions, and replaces the initial population. The entire process is repeated, creating new generations, until the stopping criterion is met. This research prescribes a maximum number of generations at which the algorithm stops, unless the stopping criterion of the elitist scheme, suggested by De Jong [11], is met. In this approach, if the same individual is the fittest member of the population for five consecutive generations the algorithm is stopped. Upon stopping, the fittest solution in the last generation is the approximate optimal solution for the whole process.
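As an illustration of the selection stage of the algorithm just described, the following sketch implements rank-based survival probabilities and stochastic remainder selection without replacement (Equation 4.2.3.1). The linear rank weighting shown is one simple choice consistent with Baker's suggestion rather than the exact coefficients used in this work, fitness evaluation is left abstract, and all names are illustrative.

```cpp
#include <algorithm>
#include <random>
#include <vector>

using Chromosome = std::vector<int>;  // 0/1 genes encoding neurons and links

// One selection step: returns the copies that survive into the breeding pool.
std::vector<Chromosome> selectSurvivors(const std::vector<Chromosome>& pop,
                                        const std::vector<double>& fitness,
                                        std::mt19937& rng) {
    const size_t N = pop.size();
    std::vector<size_t> order(N);                  // sort indices by fitness,
    for (size_t i = 0; i < N; ++i) order[i] = i;   // fittest solution first
    std::sort(order.begin(), order.end(),
              [&](size_t a, size_t b) { return fitness[a] > fitness[b]; });
    std::vector<Chromosome> next;
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (size_t rank = 0; rank < N; ++rank) {
        // Survival probability linear in rank (Baker); these P_Si sum to 1,
        // so the expected copies E_i = N * P_Si sum to N (Eq. 4.2.3.1).
        double P = 2.0 * double(N - rank) / (double(N) * double(N + 1));
        double E = double(N) * P;
        size_t copies = size_t(E);                 // integer part I_i
        if (u(rng) < E - double(copies)) ++copies; // fractional remainder
        for (size_t c = 0; c < copies && next.size() < N; ++c)
            next.push_back(pop[order[rank]]);
    }
    while (next.size() < N) next.push_back(pop[order[0]]);  // top up if short
    return next;
}
```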

5 Luminosity Measurement

5.1 Luminosity Measurement Strategies

As described earlier, the ultimate purpose of this research is to be able to calculate the luminosity of the LHC. Again, the luminosity L is given by

\langle N_{int} \rangle = \frac{L\sigma}{f} \qquad (5.1.1)

where ⟨N_int⟩ is the number of interactions in each bunch crossing, σ is the interaction cross section and f is the frequency of bunch crossings. The complication in this formulation arises because it is difficult to discern from the smear of energy deposits in the two-dimensional array of CMS detectors how many interactions actually occurred in a given bunch crossing. Often one interaction that produces a large smear of transverse energy is difficult to distinguish from several less energetic interactions.

In this research two different strategies were used to overcome this problem. One was conceived by Professor Daniel Marlow and can be denoted the 'zero counting method'. The other is to take data from individual bunch crossings and use a neural network approach to determine how many interactions occurred, and from this calculate the luminosity.

5.1.1 Zero Counting Method

As the label suggests, the strategy employed in this method is one of counting zeroes. We assume that the distribution of N_int over the bunch crossings is essentially a Poisson distribution centered around ⟨N_int⟩ = µ, such that the probability that a bunch crossing will yield N_int events is given by

p(N_{int};\,\mu) = \frac{\mu^{N_{int}}\,e^{-\mu}}{N_{int}!} \qquad (5.1.1.1)

Given this distribution, the number of bunch crossings that yield no events can be tallied and divided by the total number of bunch crossings. Calling this fraction m, we can then infer that

p(0;\,\mu) = e^{-\mu} = m \;\;\Rightarrow\;\; \mu = -\log(m) \qquad (5.1.1.2)

The problem with this scheme is that, in order for it not to produce erroneous results, the bunch crossings that yield zero events must make up more than approximately 1% of the total bunch crossings. With this in mind, the zero counting method is appropriate if and only if

\mu = -\log(m) \leq -\log\!\left(\frac{1}{100}\right) \approx 4.6 \qquad (5.1.1.3)

Considering the properties of the LHC and its proton-proton collisions, we can estimate that

f = 40\ \mathrm{MHz}, \quad \sigma = 80\ \mathrm{mb\ (millibarns)}, \quad L = 10^{34}\ \mathrm{cm^{-2}s^{-1}} \qquad (5.1.1.4)

This would imply that ⟨N_int⟩ = 20 > 4.6, and hence that the zero counting scheme will not work.
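To see the failure quantitatively: at ⟨N_int⟩ = 20 the expected fraction of empty bunch crossings is

p(0;\,20) = e^{-20} \approx 2\times 10^{-9}

far below the roughly 1% needed for a statistically stable estimate of µ from zero counting.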

In order to save this strategy, the problem is reformulated such that

\langle N_{int}^{A} \rangle = \frac{A\,L\,\sigma}{f} \qquad (5.1.1.5)

where A is a positive acceptance parameter less than 1. A can be defined such that, given the estimated values above, ⟨N_int^A⟩ can be made arbitrarily small, and hence less than 4.6. Analysis of CMS detector data with this method consists of choosing appropriate formulations for A and quantifying their success or failure.

5.1.2 Neural Network Method

Although it shares the same goal as the zero counting method, calculating the luminosity from CMS detector data, the neural network method is formulated in an entirely different way. Rather than tackling the entire set of bunch crossing data at once, this approach examines the data from each bunch crossing individually and, given the CMS data from that bunch crossing, tries to identify how many interactions took place. The data from each bunch crossing are organized into a vector of numbers corresponding to the amount of transverse energy detected in a partition of CMS bins, together with the number of interactions associated with that bunch crossing. A neural network is then trained using thousands of these data vectors, using the energy data of the bunch crossing as input and the number of interactions that actually occurred in the bunch crossing as the output goal. The intention is that, through training, the network will learn the seemingly unknown pattern of how particular CMS detector data correspond to the actual number of interactions. Although this approach will not produce an explicit formula for determining the number of interactions from data, it will produce a trained neural network that can hopefully take data it has never seen as input and 'produce' the number of interactions in a bunch crossing with a reasonable degree of accuracy.

5.2 Description of Simulation

The Monte Carlo simulation used to generate the data for this work uses as input CMS detector data for 19,970,964 events (almost 200,000 interactions) created using the programs PYTHIA and GEANT. PYTHIA is a generator of simulated high-energy physics events including, for example, collisions between elementary particles such as e+, e-, p and pbar in various combinations. It contains theory and models for many physics considerations, including hard and soft interactions, parton distributions, initial- and final-state parton showers, multiple interactions, and fragmentation and decay. [25]


The output of this program is information concerning particles' 4-vectors and particular characteristics. GEANT essentially describes the passage of elementary particles through matter. It takes the output from PYTHIA and tracks the particles through the CMS detectors, simulating a response and producing the energy depositions in each subdetector. [14]

The Monte Carlo simulation was written by Professor Daniel Marlow, with small changes made by Christopher Rogan. It is essentially in C++, but with additions made in order to run in the ROOT framework, an object-oriented platform designed originally in the context of the NA49 experiment at CERN, an experiment whose rate of data generation (about 10 Terabytes per run) is comparable to that expected from the LHC experiments. [23]

The simulation loops over a range of luminosity values centered at the anticipated luminosity of the LHC, 10^34 cm^-2 s^-1. For each luminosity the simulation assumes a frequency, f, of 40 MHz and a minimum bias cross-section of 80 mb. With these values, a simulated ⟨N_int⟩ is calculated for the current luminosity. The simulation then loops over all of the hits from the PYTHIA/GEANT data until the data are exhausted or 10,000 bunch crossings have occurred. Each entry in the list of hits from PYTHIA/GEANT has several parameters, including a module ID which describes which CMS subdetector the hit occurred in, the interaction the hit is associated with, and the energy deposit in the detector corresponding to the hit. The CMS subdetectors are organized by the parameters 'eta' and 'phi'. Phi corresponds to the value ϕ, the planar angle of the detector in the plane perpendicular to the length of the detector tube. Eta corresponds to the value η, which is related to the polar angle θ, measured from the center of the detector, by

\eta = -\log\!\left(\tan\frac{\theta}{2}\right) \qquad (5.2.1)

In the loop the following occurs: the simulated ⟨N_int⟩ for the luminosity currently being examined, denoted N_S, is used as input to a function which returns a random number, N_R, drawn from a Poisson distribution centered at N_S. This becomes the number of interactions in the current bunch crossing. The simulation zeroes all of the arrays associated with CMS detector energy depositions and then begins scrolling through hits, adding the energies from those hits to the appropriate parts of the detector array. The energies are scaled down by a sine factor associated with η so that only the transverse energy (the energy perpendicular to the length of the pipe) is registered, and energies are also randomized around a Poisson distribution centered at the 'scaled energy', a value determined by a gain parameter and the energy listed in the PYTHIA/GEANT data. Once the hits from a particular interaction are exhausted, N_R is decremented and a new interaction is started. This continues until N_R is zero, at which point the CMS detector arrays contain all the data from the bunch crossing and values of note for the different analysis methods can be recorded. The arrays are again zeroed and a new N_R is calculated for the next bunch crossing. This process continues, generating data for a large number of bunch crossings.

5.3 Zero Counting Method Results

For this analysis method, two different schemes for the formulation of the acceptance parameter, A, are used.

The first is probabilistic in nature, where A is simply a number chosen between 0 and 1. In the simulation when the energy depositions for each detector hit are about to be tallied, a random number R ∈ (0, 1) is generated, and the energy from the hit is added to


the detector array if and only if R < A. With this method, the formulation for A is explicit, and it can be implemented and adjusted easily in a real physical experiment. After each bunch crossing, the number of the 864 detector modules that have no energy deposited in them is tallied, and the number of zeroes in each bunch crossing is summed over the whole simulation. This final tally of zeroes can then be converted into the probability of zero interactions occurring, and used to calculate ⟨N_int^A⟩ as described in Section 5.1.1.

The second formulation for A is implicit, in that the actual value of A is never calculated. Rather, A appears in the simulation as an adjustable threshold value. In this scheme, after each bunch crossing, instead of zeroes being counted, any modules with an energy deposition less than the threshold value are tallied as zeroes, and a value for ⟨N_int^A⟩ is calculated as in the explicit formulation for A.

It is prudent to note at this point that the luminosity is not calculated directly from these values of ⟨N_int^A⟩ in this work, and hence the success of a formulation for A is not judged on how accurately it predicts the luminosity. Rather, what is important is linearity: the value of ⟨N_int^A⟩ calculated with a particular choice of A should scale linearly with luminosity. The interaction cross section associated with the CMS detectors and accelerator can be calculated and calibrated experimentally at low luminosities. From this the acceptance parameter, whatever it may be, can be calculated if it is unknown. So, with the interaction cross section and acceptance parameter dictating the constant of proportionality, the linearity of ⟨N_int^A⟩ with respect to luminosity is what ensures the success of the method.

5.3.1 Results for Probabilistic Formulation of A

With the explicit probabilistic method, the acceptance parameter is by construction the same for each of the detector modules. The method is used to calculate a value ⟨N_int^A⟩ for each of the luminosities the simulation loops over, ranging from 0.01 x 10^34 cm^-2 s^-1 to 10.0 x 10^34 cm^-2 s^-1. A linear fit is created for each set of resulting ⟨N_int^A⟩ values, passing through (0, 0) and the value of ⟨N_int^A⟩ corresponding to the luminosity 1 x 10^34 cm^-2 s^-1. A plot of these values corresponding to A = 0.1 is shown below in Figure 5.3.1.1.

Figure 5.3.1.1: A plot of the number of interactions per bunch crossing vs. luminosity (log⟨N_int^A⟩ vs. log luminosity, for A = 0.1).
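To make the procedure concrete, the sketch below runs the zero-counting estimate end to end on idealized Poisson-distributed bunch crossings: it tallies empty crossings, forms the fraction m, and recovers µ = -log(m), scanning luminosities the way the simulation does. The parameter values follow Equation 5.1.1.4; the acceptance A, the seed and the crossing count are illustrative choices, and the detector response is deliberately omitted.

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(42);
    const double sigma = 80e-27;   // 80 mb expressed in cm^2
    const double f     = 40e6;     // bunch-crossing frequency, Hz
    const double A     = 0.1;      // acceptance parameter (illustrative)
    const int nCrossings = 100000;
    const double lumis[] = {0.01e34, 0.1e34, 1.0e34, 10.0e34};  // cm^-2 s^-1
    for (double L : lumis) {
        double mu = A * L * sigma / f;             // Eq. 5.1.1.5
        std::poisson_distribution<int> nInt(mu);
        int zeros = 0;
        for (int i = 0; i < nCrossings; ++i)       // tally empty crossings
            if (nInt(rng) == 0) ++zeros;
        if (zeros == 0) {                          // too few zeroes: method fails
            std::printf("L = %.2e: no empty crossings, estimate impossible\n", L);
            continue;
        }
        double m = double(zeros) / nCrossings;     // fraction of empty crossings
        std::printf("L = %.2e: true <N_int^A> = %.4f, zero-count estimate = %.4f\n",
                    L, mu, -std::log(m));          // Eq. 5.1.1.2
    }
    return 0;
}
```

At the highest luminosity in this scan the empty-crossing fraction is of order e^-20, which is unobservably small, reproducing the breakdown that motivated the acceptance parameter in the first place.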

The 'linearity' of a resulting set of values of ⟨N_int^A⟩ is measured by


MSE_L = \sum_{n=1}^{10}\left(\log\langle N_{int}^{A}\rangle_n - \log\langle N_{fit}\rangle_n\right)^2 \qquad (5.3.1.1)

where MSE_L is a measurement of the error in linearity, ⟨N_int^A⟩_n is the number of interactions given a particular value of A, ⟨N_fit⟩_n is the number of interactions according to the linear fit, and the subscript n indexes the value of the luminosity. Using Equation 5.3.1.1 as a measurement of linearity, the success of different values of A can be quantified. Figure 5.3.1.2 shows a plot of MSE_L vs. A.

Figure 5.3.1.2: MSE_L vs. A.

As Figure 5.3.1.2 demonstrates, the smaller the value of A, the greater the 'linearity' of the results for ⟨N_int^A⟩. This can be attributed to smaller values of A leading to more zeroes, and hence a more statistically accurate calculation of ⟨N_int^A⟩ using Equation 5.1.1.2. The smallest value of A tested, 0.005, yielded values of ⟨N_int^A⟩ with fractional errors less than 2% relative to the linear fit for all 10 luminosities tested.

5.3.2 Results for Threshold Formulation of A

With this scheme for A, the actual value of A is never calculated and is somewhat irrelevant. As with the probabilistic scheme, what is important is that the choice of threshold yields values of ⟨N_int^A⟩ that scale linearly with luminosity. The threshold term appears in the simulation after each bunch crossing, when the CMS detector data have been accrued. For any particular detector module, denoted a 'physical tower', if the magnitude of transverse energy is less than the threshold value a zero is tallied. In addition to these physical towers, after each bunch crossing 'trigger towers' are formed, each of which is simply the sum of the transverse energy over an array of 3 x 2 physical towers. The threshold value for the trigger towers is 3 times that for the physical towers, so that the trigger towers and physical towers, after the zeroes for all of the bunch crossings have been counted, yield two different values of ⟨N_int^A⟩.

If there is one threshold value for all of the CMS detector modules, the success of the method requires that the acceptance parameter, whatever its magnitude, be the same for all of the physical towers. Due to azimuthal symmetry, the module parameter 'phi' described in Section 5.2 does not violate this condition. Unfortunately, it is not immediately obvious that the same holds for the module parameter 'eta'. To test this, the simulation is


run in a way that constructs 'eta towers': 12 towers that correspond to collections of physical towers with the same values of η (here modules in the 'forward' and 'backward' directions of the CMS detector are treated as having the same eta, whereas in the neural network approach they are not, yielding 12 distinct values of eta rather than 24). Each of these towers then yields an independent value of ⟨N_int^A⟩. The result of this examination is that each of the eta towers does yield a different value of ⟨N_int^A⟩, although there are no correlations between the relative values of ⟨N_int^A⟩ for different luminosities or even between different simulation runs. The conclusion is that the acceptance parameters are essentially the same for all detector modules, although this issue merits further research beyond the scope of this work.

The problem with this implicit formulation for A is that the method breaks down at higher thresholds. Figures 5.3.2.1 and 5.3.2.2 below show the resulting values of ⟨N_int^A⟩ at the thresholds 0.1 and 5.1 respectively.

Figure 5.3.2.1: ⟨N_int^A⟩ vs. luminosity at a threshold of 0.1 (log-log scale; simulated values with linear fit).

Figure 5.3.2.2: ⟨N_int^A⟩ vs. luminosity at a threshold of 5.1 (log-log scale; simulated values with linear fit).

As the above figures show, at higher thresholds the relationship between ⟨N_int^A⟩ and luminosity becomes highly nonlinear. The reason is that there are two ways to overcome the threshold: through a single energy deposition of sufficient energy, or through several smaller depositions. The rate of single hits scales linearly with luminosity, as desired. The problem is that multiple hits scale as a power of luminosity (two hits as luminosity squared, three hits as luminosity cubed, et cetera). As the threshold is raised, these nonlinear effects become more evident. At the same time, the threshold



must be high enough to implicitly create an acceptance parameter that yields enough zeroes to ensure the statistical accuracy of the zero counting approach. Using the measure of linearity MSE_L and the linear fitting approach described in Section 5.3.1, the success of different threshold values can be quantified by measuring the 'linearity' of the values of ⟨N_int^A⟩ that they yield.

Figure 5.3.2.3: MSE_L vs. threshold value in a range from 0.1 to 5.1, for physical towers and trigger towers.

Figure 5.3.2.4: MSE_L vs. threshold value in a range from 0.01 to 0.96, for physical towers and trigger towers.

The above figures demonstrate several important results. The values of MSE_L are significantly higher for the trigger towers than for the physical towers, except at very low thresholds, where they are actually lower for the trigger towers. The figures also demonstrate the very noticeable effects of nonlinearity at higher thresholds. Despite this trend, Figure 5.3.2.4 reveals that around a threshold value of 0.11 the value of MSE_L reaches a minimum for both the physical towers and the trigger towers. This is most likely an indication that, below this point, the effect of not having enough zeroes for statistical accuracy in the zero counting method outweighs the nonlinear effects of higher thresholds. At a threshold of 0.11, the resulting values of ⟨N_int^A⟩ from the trigger towers have a lower MSE_L value than the physical towers', and have fractional errors all less than 6% relative to the linear fit.

5.4 Neural Network Method Results

5.4.1 Formulation of Network Input


The input to the neural network is the CMS detector data for a bunch crossing, while the output target is the number of interactions occurring in the bunch crossing, N_R from earlier, denoted [N_R]. Due to the symmetry of the detectors in the parameter ϕ, and the lack thereof in η, the network input is organized into a 24 x 1 array of cells corresponding to the 24 different η detector modules summed over ϕ, denoted [η1, η2,…, η24]. Neural networks do not favor raw data, as the magnitude of particular entries can dominate the network's computations, so the data for each value of η and the output are normalized individually between -1 and 1 according to the formula

x_n = 2\,\frac{x - x_{min}}{x_{max} - x_{min}} - 1 \qquad (5.4.1.1)

where x = [η1, η2,…, η24] ⊕ [N_R], x_min and x_max are the minimum and maximum at each index of x respectively, and x_n is the normalized vector of data for the network.

After scaling the data, they are split into three subsets: training, validation and testing. The training and validation sets consist of 3000 and 1000 vectors respectively, and are used to train the network. Training consists of the network using the training set as input and applying the backpropagation algorithms described in Section 4.2. The validation set is used throughout training to ensure that the network does not become simply a lookup table for the training set, stopping training if the validation data begin to perform too poorly. The testing set consists of 3000 vectors and is used to assess the success of the neural network approach.

5.4.2 Determination of Network Architecture

Deciding upon a particular network architecture is extremely difficult, as the research in this area is significantly less expansive than that on other aspects of neural networks. Fortunately, there are some rough indications as to how the network topology should look. Based on the Universal Approximation Theorem, Hornik et al. proved that "a two hidden layer network is capable of approximating any useful function". Hidden layers are those between the input and output layers, meaning a two-hidden-layer network would look like that in Figure 4.1.1 with one additional layer. In addition, Hornik et al. concluded that the mapping ability of a feed-forward neural network is not inherent in the choice of activation function; rather, "it is the multilayer feed-forward structure that leads to the general function approximation capability". [17] Applying these results, a two-hidden-layer network is used, but Hornik provides no guidance on how many neurons to place in each of the hidden layers. To overcome this issue, two different approaches are used.

The first is a somewhat exhaustive approach. The problem is essentially one of searching neuron space, the 2-dimensional space with the numbers of neurons in the first and second hidden layers as the independent variables, for the point where the trained network yields the lowest error. The first step of the exhaustive search is to choose a large portion of neuron space, covering architectures having from 1 to 50 neurons in the first layer and 1 to 100 in the second layer. 1000 random points are picked in this space and the networks corresponding to those points are constructed and trained using the steepest descent algorithm with learning rate α = .05 and momentum λ = .9, yielding mean square errors based on the data in the final round of training. A polynomial interpolation scheme is


then used to create a mapping of the MSE (mean squared error) for the entire portion of neuron space. The result is shown below in Figure 5.4.2.1.

Figure 5.4.2.1: MSE as a function of the numbers of neurons in the first and second hidden layers: random search with interpolation.

The visualized MSE surface is highly non-smooth, even with the interpolation applied, but from this surface a smaller area of minimum MSE can be identified and a more stringent search applied. This surface yielded a minimum at [14, 42], and it can be seen from Figure 5.4.2.1 that the area surrounding this point appears to be minimal. Hence a truly exhaustive search is run over all the architectures with 5-20 neurons in the first hidden layer and 32-47 neurons in the second hidden layer. The result is shown below in Figure 5.4.2.2.

Figure 5.4.2.2: MSE as a function of the numbers of neurons in the first and second hidden layers: exhaustive search.


The lowest MSE is found at the point [15, 42] as shown by the arrow in Figure 5.4.2.2, with MSE = 0.037223. The architecture with 15 neurons in the first hidden layer and 42 in the second hidden layer will now be denoted the [15 42] architecture. In the second approach, a genetic algorithm is used to find the optimal network architecture as described in Section 4.2.3. This approach is advantageous over the previous one, as it allows not only the MSE to be considered but also the size of the architecture when evaluating desirability. The appearance of each link is optional, allowing for more sparse networks, and links between non-adjacent layers are also permitted if favorable. A maximum of 15 neurons in the first hidden layer and 42 in the second are permitted, as the purpose of this approach is to ‘prune’ the network to allow for faster computation. Based on the findings of Grefenstette [15], a population size of N = 80 is used with a probability of crossover Pc = .45 and a probability of mutation Pm = .01. The simulation is stopped after 30 generations and training is done using the steepest descent algorithm. The fitness function, F, takes into account not only the MSE, but also the numbers of neurons and links:

F = \frac{1}{10\,\mathrm{MSE}\,(24 + N_{\mathrm{Layers}}) + N_{\mathrm{Links}}}     (5.4.2.1)

where N_{Layers} is the number of hidden-layer neurons in the candidate network and N_{Links} is its number of links.
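A compact sketch of this fitness evaluation and the generational loop follows, assuming the reconstructed form of Eq. (5.4.2.1) above. The genome length and the toy decode-and-train step inside evaluate are illustrative placeholders, not the exact encoding of Section 4.2.3; the population size, crossover and mutation probabilities, and generation count are those quoted in the text.

    import random

    POP_SIZE, P_CROSS, P_MUT, N_GENERATIONS = 80, 0.45, 0.01, 30
    GENOME_LEN = 64   # illustrative length of the link/neuron bit string

    def fitness(mse, n_neurons, n_links):
        # Eq. (5.4.2.1): reward low MSE, few hidden neurons, few links.
        return 1.0 / (10.0 * mse * (24 + n_neurons) + n_links)

    def evaluate(genome):
        # Placeholder: decode which links and neurons the genome keeps,
        # train that network with steepest descent, and score it.
        # Toy values keep the sketch runnable.
        n_links = sum(genome)
        n_neurons = max(1, n_links // 4)
        mse = 0.04 + random.uniform(0.0, 0.01)
        return fitness(mse, n_neurons, n_links)

    def mutate(genome):
        return [bit ^ 1 if random.random() < P_MUT else bit
                for bit in genome]

    def select(population, scores):
        # Fitness-proportional ("roulette wheel") selection.
        r, acc = random.uniform(0.0, sum(scores)), 0.0
        for genome, score in zip(population, scores):
            acc += score
            if acc >= r:
                return genome
        return population[-1]

    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(POP_SIZE)]
    for _ in range(N_GENERATIONS):
        scores = [evaluate(g) for g in population]
        children = []
        while len(children) < POP_SIZE:
            a, b = select(population, scores), select(population, scores)
            if random.random() < P_CROSS:
                cut = random.randrange(1, GENOME_LEN)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [mutate(a), mutate(b)]
        population = children[:POP_SIZE]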

The resulting optimal network has 6 neurons in the first hidden layer and 11 in the second, significantly smaller than the network found through the exhaustive approach. It yielded an MSE of 0.042936 and is shown below in Figure 5.4.2.3.

Figure 5.4.2.3 Optimal neural network from genetic algorithm approach

5.4.3 Neural Network Method Results and Discussion

Once the network architectures are set, the networks are trained. Up until this point, only the steepest descent backpropagation algorithm has been used, since it requires less computation than the conjugate gradient algorithm and only relative network performance matters when deciding upon the architecture. The [15 42] feed-forward network and the optimal network from the genetic algorithm approach are each trained using both the steepest descent algorithm and the conjugate gradient algorithm.
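The two update rules can be contrasted in a few lines. The sketch below applies steepest descent with momentum and a Fletcher-Reeves conjugate gradient step [13] to a toy quadratic loss; in the actual training the gradients come from backpropagation, and the conjugate gradient step length would come from a line search rather than the fixed alpha used here.

    import numpy as np

    def steepest_descent_step(w, grad, velocity, alpha=0.05, lam=0.9):
        # Steepest descent with momentum: v <- lam * v - alpha * grad.
        velocity = lam * velocity - alpha * grad
        return w + velocity, velocity

    def fletcher_reeves_step(w, grad, prev_grad, prev_dir, alpha=0.05):
        # Fletcher-Reeves conjugate gradient: beta = |g|^2 / |g_prev|^2,
        # d <- -g + beta * d_prev. A line search would normally set alpha.
        beta = (grad @ grad) / (prev_grad @ prev_grad)
        direction = -grad + beta * prev_dir
        return w + alpha * direction, direction

    # Toy quadratic loss |w|^2 so the sketch runs; backpropagation
    # would supply these gradients for a real network.
    def grad_loss(w):
        return 2.0 * w

    w_sd = w_cg = np.array([1.0, -2.0, 0.5])
    velocity = np.zeros(3)
    prev_grad, prev_dir = grad_loss(w_cg), -grad_loss(w_cg)
    for _ in range(20):
        w_sd, velocity = steepest_descent_step(w_sd, grad_loss(w_sd), velocity)
        g = grad_loss(w_cg)
        w_cg, prev_dir = fletcher_reeves_step(w_cg, g, prev_grad, prev_dir)
        prev_grad = g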


For the testing and training of these networks, bunch crossing data are taken from 10 different luminosities ranging from 0.01 x 10^34 cm^-2s^-1 to 10.0 x 10^34 cm^-2s^-1. Each network is trained on data sets from each of the 10 luminosities, 7000 data vectors each, as described in Section 5.4.1. Figure 5.4.3.1 displays the results of training over 1000 epochs using the two backpropagation algorithms and the two network architectures at a luminosity of 10^34 cm^-2s^-1.

Figure 5.4.3.1 MSE during training for different architectures and backpropagation algorithms

This figure displays the MSE for the training data sets and is not necessarily representative of the results for testing. Interestingly, in this plot, as in the analogous ones for other luminosities, the curve for the genetic algorithm architecture trained with the steepest descent method is not visible: it lies on top of the curve for the conjugate gradient method with the same architecture. This is somewhat of an anomaly, as it appears that the initialization conditions of the network cause the two different algorithms to pick almost identical weight adjustments throughout training. Figure 5.4.3.2 shows an example training session for the [15 42] architecture with the steepest descent method at a luminosity of 0.5 x 10^34 cm^-2s^-1.

Figure 5.4.3.2 Training session for [15 42] architecture with steepest descent method
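A minimal sketch of the validation-based stopping rule from Section 5.4.1 is given below. The network object, with its train_one_epoch, evaluate and weights members, is a hypothetical interface, and the patience threshold is an illustrative choice, since the text specifies only that training stops once validation performance degrades too far.

    def train_with_early_stopping(network, train_set, val_set,
                                  max_epochs=1000, patience=20):
        # Track the best validation MSE seen so far; stop once it has
        # not improved for `patience` consecutive epochs and restore
        # the weights from the best validation epoch.
        best_val, best_weights, since_best = float("inf"), None, 0
        history = []
        for epoch in range(max_epochs):
            train_mse = network.train_one_epoch(train_set)  # hypothetical API
            val_mse = network.evaluate(val_set)             # hypothetical API
            history.append((epoch, train_mse, val_mse))
            if val_mse < best_val:
                best_val, best_weights, since_best = (
                    val_mse, network.weights.copy(), 0)
            else:
                since_best += 1
                if since_best >= patience:
                    break
        if best_weights is not None:
            network.weights = best_weights
        return history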


Figure 5.4.3.2 demonstrates the correlation between the MSE performance on the training data set and on the testing data set. After the networks are trained, the inputs from the testing data sets are used to generate simulated values of the number of interactions for each bunch crossing. These values are then averaged to calculate <Nint> for each data set. Since this value of <Nint> corresponds by definition to the minimum bias cross-section used to generate the data, this same cross-section can be used to calculate a luminosity value from the simulated values of <Nint>. The results of these calculations are shown below in Table 5.4.3.1.

Luminosity         [15 42] Architecture                          Genetic Algorithm Architecture
(10^34 cm^-2s^-1)  Steepest Descent       Conjugate Gradient     Steepest Descent       Conjugate Gradient
                   Predicted  Frac. Error Predicted  Frac. Error Predicted  Frac. Error Predicted  Frac. Error
0.01               0.0123     0.23        0.0093     0.07        0.0096     0.04        0.0097     0.03
0.02               0.0201     0.005       0.0198     0.01        0.0194     0.03        0.0195     0.025
0.05               0.0493     0.014       0.0503     0.006       0.0506     0.012       0.0504     0.008
0.1                0.102      0.02        0.0996     0.004       0.0984     0.016       0.0985     0.015
0.2                0.1979     0.0105      0.2017     0.0085      0.2004     0.002       0.2002     0.001
0.5                0.5045     0.009       0.5045     0.009       0.5044     0.0088      0.5043     0.0086
1                  1.0012     0.0012      0.9997     0.0003      0.9923     0.0077      0.9933     0.0067
2                  2.017      0.0085      2.0172     0.0086      1.9864     0.0068      1.9877     0.00615
5                  4.9831     0.00338     4.9797     0.00406     4.8891     0.02218     4.8891     0.02218
10                 9.9734     0.00266     9.9852     0.00148     9.9952     0.00048     9.9958     0.00042
MSE                           0.00538                0.00053                0.000358               0.0002464

Table 5.4.3.1 Calculated luminosities and errors using neural network approach

The fractional error of each calculation is computed, and from these the different architectures and backpropagation algorithms are compared using the mean squared error over all the luminosities. Table 5.4.3.1 demonstrates that, for both architectures, the conjugate gradient algorithm outperformed the steepest descent algorithm, as expected. Furthermore, the architecture from the genetic algorithm approach significantly outperformed the architecture from the exhaustive search approach. The trained network with the best performance, the genetic algorithm architecture trained with the conjugate gradient algorithm, has errors below 3% for all luminosities and an MSE of only 0.0002464.
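The conversion from simulated <Nint> to the luminosities of Table 5.4.3.1 can be sketched as follows, assuming the standard relation <Nint> = sigma * L / f between the mean number of interactions per crossing, the minimum bias cross-section and the bunch crossing frequency; the numerical values of sigma and f below are illustrative placeholders, not the exact figures used in this work.

    import numpy as np

    SIGMA_MB = 80e-27   # illustrative minimum bias cross-section [cm^2]
    F_BX = 40.0e6       # illustrative bunch crossing frequency [Hz]

    def luminosity_from_nint(mean_nint):
        # <N_int> = sigma * L / f  =>  L = <N_int> * f / sigma
        return mean_nint * F_BX / SIGMA_MB

    # predictions: network outputs (N_int per crossing) for one test set;
    # a Poisson sample stands in for them here.
    predictions = np.random.poisson(20.0, size=3000).astype(float)
    mean_nint = predictions.mean()
    lum = luminosity_from_nint(mean_nint)

    true_lum = 1.0e34
    frac_error = abs(lum - true_lum) / true_lum
    print(f"L = {lum:.3e} cm^-2 s^-1, fractional error = {frac_error:.4f}")

    # The MSE quoted in Table 5.4.3.1 is the mean of frac_error**2
    # over the ten test luminosities.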

6 Conclusions and Outlook

In this work the number of interactions per bunch crossing was calculated for the LHC using simulated CMS detector data. Two techniques were used to make this calculation, one utilizing neural networks to identify and 'learn' the patterns of CMS detector data, the other taking advantage of the statistical distribution of the number of interactions per bunch crossing.

The zero counting approach incorporated two different formulations of an acceptance parameter A that would permit the probability of a bunch crossing having zero interactions to be large enough to be a statistically viable indicator of the mean number of interactions. The approach using an explicit value of A was explored and shown to have merit, yielding values of <Nint,A> that scaled linearly with luminosity, an essential criterion for this approach to be viable.


Although it was concluded that lower values of A are more favorable for the purposes of this work, an optimal value of A was not found and remains an issue for possible further work. The threshold method had similar success, and an approximation to an optimal value of the threshold was found. In this work, a linear fit to the values of <Nint,A> was made using the point corresponding to a luminosity of 10^34 cm^-2s^-1, the expected luminosity of the LHC. It would perhaps have been better to make this fit to a point corresponding to a lower luminosity, as experimental calibrations of the cross-section will be executed at this lower luminosity. Further work with this method is justified given the reasonably successful results, and should incorporate the experimental calibration of cross-sections and implicit thresholds.

The neural network approach used a genetic optimization algorithm to find an optimal network architecture and then compared two different optimization methods for training the network, the steepest descent algorithm and the conjugate gradient algorithm. This approach also yielded successful results, predicting from CMS detector data not only the number of interactions per bunch crossing but also the luminosities used to generate the simulated data, within 3% error. The factor that detracts from the applicability of this approach is the dependence of the neural network training on simulated data, namely the data with 'answers included' needed to train the networks. A continuation of this work should try to make the approach more readily applicable to raw CMS detector data from actual experiments. Overall, this work succeeded in developing two techniques that can be used to calculate the luminosity of the LHC using the CMS detector, a significant task important for future research with the accelerator.

7 References

[1] Baker, J.E. "Adaptive Selection Methods for Genetic Algorithms". Proceedings of an International Conference on Genetic Algorithms and their Applications, ed. J.J. Grefenstette. Lawrence Erlbaum Associates: Hillsdale, NJ. 101-111.

[2] Booker, L.B. "Improving Search in Genetic Algorithms". Genetic Algorithms and Simulated Annealing, ed. L. Davis. Pitman: London, 1987. 61-73.

[3] Cavicchio, D.J. Adaptive Search Using Simulated Evolution. Ph.D. Thesis. University of Michigan: Ann Arbor, MI, 1970.

[4] CERN Homepage. CERN. 10 November 2004. <http://public.web.cern.ch/Public/Welcome.html>.

[5] CMS Detector: Calorimetry. CERN. 10 November 2004. <http://cmsdoc.cern.ch/cms/outreach/html/CMSdetectorInfo/Calorimetry/page1.html>.

[6] CMS Detector FAQ. CERN. 10 November 2004. <http://cmsdoc.cern.ch/cms/outreach/html/CMSfaq/CMSfaq.html#Anchor-Wher-64357>.

[7] CMS Detector Tracker. CERN. 10 November 2004. <http://cmsinfo.cern.ch/Welcome.html/CMSdetectorInfo/CMStracker.html>.

[8] CMS HCAL Detector. CERN. 10 November 2004. <http://cmsinfo.cern.ch/Welcome.html/CMSdetectorInfo/CMShcal.html>.

[9] CMS HCAL Documentation. CERN. 10 November 2004. <http://cmsdoc.cern.ch/cms/outreach/html/CMSdetectorInfo/CMShcal.html>.

[10] CMS Muon Detector. CERN. 10 November 2004. <http://cmsinfo.cern.ch/Welcome.html/CMSdetectorInfo/CMSmuon.html>.

[11] De Jong, K.A. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Ph.D. Thesis. University of Michigan: Ann Arbor, MI, 1975.

[12] Dennis, J.E. Jr. and Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall: Englewood Cliffs, NJ, 1983.

[13] Fletcher, R. and Reeves, C.M. "Function Minimization by Conjugate Gradients". Computer Journal 7 (1964): 149-154.

[14] GEANT Webpage. CERN. 20 December 2004. <http://wwwasd.web.cern.ch/wwwasd/geant/>.

[15] Grefenstette, J.J. "Incorporating Problem Specific Knowledge into Genetic Algorithms". Genetic Algorithms and Simulated Annealing, ed. L. Davis. Pitman: London, 1987.

[16] Haykin, S. Neural Networks: A Comprehensive Foundation. 2nd ed. Prentice-Hall: Englewood Cliffs, NJ, 1999.

[17] Hornik, K., Stinchcombe, M. and White, H. "Multilayer Feedforward Networks are Universal Approximators". Neural Networks 2 (1989): 359-366.

[18] How Does the LHC Work?. CERN. 10 November 2004. <http://public.web.cern.ch/Public/Content/Chapters/AboutCERN/CERNFuture/HowLHC/HowLHC-en.html>.

[19] How to Study Particles. CERN. 10 November 2004. <http://public.web.cern.ch/Public/Content/Chapters/AboutCERN/HowStudyPrtcles/Experiments/Experiments-en.html>.

[20] LHC: Injectors and Transfer Lines. CERN. 10 November 2004. <http://edms.cern.ch/cedar/plsql/navigation.tree?top=1613491629&open=1613491629&expand_open=Y>.

[21] LHC: Vacuum System. CERN. 10 November 2004. <http://edms.cern.ch/cedar/plsql/navigation.tree?cookie=3146384&p_top_id=1603527231&p_top_type=P&p_open_id=1603527231&p_open_type=P>.

[22] Polak, E. and Weiss, G.H. "Computational Methods in Optimization: A Unified Approach". Operations Research 20.2 (1972): 456.

[23] ROOT Homepage. CERN. 1 November 2004. <http://root.cern.ch/>.

[24] Sanchez-Sinencio, E. and Lau, C. Artificial Neural Networks: Paradigms, Applications and Hardware Implementations. IEEE Press: New York, 1992.

[25] Sjöstrand, Torbjörn. PYTHIA (and JETSET) Webpage. 20 December 2004. <http://www.thep.lu.se/~torbjorn/Pythia.html>.

[26] The Large Hadron Collider: General Information. CERN. 10 November 2004. <http://lhc.web.cern.ch/lhc/general/gen_info.htm>.

[27] Werbos, P.J. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Dissertation. Applied Mathematics, Harvard University: Boston, MA, 1974.