Jan Uythoven for the Reliability Sub Working Group (March 2004 – June 2005) & …

Availability Workshop, CERN, 28 November 2013

Dependability calculations prior to 2008 and operational experience during the first LHC run


Jan Uythoven, Prior to 2008 and experience from Run I

Reliability Sub Working Group

13th and last meeting on 17th June 2005
Many of the players are still around and here today

LHC Availability Workshop, 28/11/2013


The model used – LHC Project Report 812
Simplified model of the LHC machine protection system
Main players: PIC, QPS, BLM, BIS, LBDS
Fast losses and slow losses, redundancy between systems


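[Figure: simplified model of the LHC machine protection system; diagram labels 60 %, 15 %, 15 %, 10 %, P=0]

Redundancy between systems

Presentation R. Filippini, 18/3/2005

[Figure: redundancy between systems; diagram labels P=0, 15 %, C=1, 15 %]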


Resulting numbers from the study

Contributions from the different “PhD students”
The result depends on the operational scenario


MP unsafety is $2.3 \times 10^{-4}$/year = $5.8 \times 10^{-7}$/hour = SIL3

42 false dumps: about 10 % of all dumps in the model
Model assumptions: 200 days with 2 fills per day, 2 h between fills
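The 10 % figure follows directly from the model assumptions: 200 days × 2 fills per day gives 400 fills, and 42 false dumps out of 400 dumps is 10.5 %, the same value as the 10.5 % estimate quoted for 2010. A minimal Python check, assuming (our assumption, not stated on the slide) that every fill ends in exactly one beam dump:

```python
# Model assumptions from the slide: 200 operating days, 2 fills per day.
# Assumption added here: each fill ends in exactly one beam dump.
OPERATING_DAYS = 200
FILLS_PER_DAY = 2
FALSE_DUMPS = 42  # false dumps predicted by the RSWG model


def false_dump_fraction(false_dumps: int, days: int, fills_per_day: int) -> float:
    """Return false dumps as a fraction of all dumps (one dump per fill)."""
    total_dumps = days * fills_per_day
    return false_dumps / total_dumps


fraction = false_dump_fraction(FALSE_DUMPS, OPERATING_DAYS, FILLS_PER_DAY)
print(f"{fraction:.1%}")  # prints 10.5%
```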


7 years later: Compare with Run I results


2010

Estimated 10.5 %


More from Ben @ Evian December 2012

Very good global agreement between prediction and measurement

The QPS is the only system that systematically seems to behave worse than expected: major changes during LS1

Availability OK, model OK, safety OK
What is this trend in the BLM system?



Reaction from Christos

How are the statistics done?
Only above 450.1 GeV: 15 BLM faults in 2012
If all interlocks: 31 BLM faults in 2012
If all faults: 70 BLM faults in 2012
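The three BLM numbers come from the same fault log with progressively looser selections. A minimal Python sketch of that idea, assuming a hypothetical record format (`FaultRecord` with `beam_energy_gev` and `caused_interlock` fields, not the actual CERN logging schema) and a toy log that is NOT the 2012 data:

```python
from dataclasses import dataclass


@dataclass
class FaultRecord:
    # Hypothetical fields, for illustration only.
    system: str
    beam_energy_gev: float   # beam energy when the fault occurred (0 if no beam)
    caused_interlock: bool   # True if the fault triggered an interlock


def count_faults(records, system, *, interlocks_only=False, min_energy_gev=None):
    """Count faults of one system, optionally keeping only interlocks and/or
    faults occurring strictly above a beam-energy threshold."""
    count = 0
    for r in records:
        if r.system != system:
            continue
        if interlocks_only and not r.caused_interlock:
            continue
        if min_energy_gev is not None and r.beam_energy_gev <= min_energy_gev:
            continue
        count += 1
    return count


# Toy log, invented for illustration.
log = [
    FaultRecord("BLM", 4000.0, True),
    FaultRecord("BLM", 450.0, True),
    FaultRecord("BLM", 0.0, False),
    FaultRecord("QPS", 4000.0, True),
]

print(count_faults(log, "BLM", interlocks_only=True, min_energy_gev=450.1))  # 1
print(count_faults(log, "BLM", interlocks_only=True))  # 2
print(count_faults(log, "BLM"))  # 3
```

Each looser selection can only add records, so the counts grow from the "above 450.1 GeV" selection to "all faults", as in the 15 / 31 / 70 figures above.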

But in all the statistics the trend was spotted and the problem identified

Conclusions on the equipment side:
The trend over the years was spotted
The failure mode was identified: optical link
Action during LS1: replace / clean optical fiber connections


[Diagram: Statistics, Identify, Act]


Christos @ Annecy 2013



Machine Protection / Machine Availability


2010

Estimated 10.5 %
Not addressed at the time


Assumptions used by the RSWG model
The number of dumps due to the MPS fit the model prediction, as the statistics only took into account fills above 450.1 GeV
The model assumed 2 fills per day of 10 h, with dumps only in physics
We did have about 2 fills per day, but they were not 10 h long …

However, downtime at injection or without beam can be as important as downtime during physics
The model should be extended to consider luminosity production and not only the number of beam dumps
The model should also be extended to all equipment and not only machine protection elements: the MPS is the cause of downtime in only a limited number of cases

The model was really good for machine safety but not that useful for estimating machine availability



Where did we go after 2005?


Predictions of the Sub Working Group

For a sub-system of the LHC

LHC Run I statistics agree with the predictions!

Machine is safe without too much impact on availability

We need to have common metrics of faults

For availability, the model needs to be extended to all equipment

LHC Availability Working Group

To be applied to luminosity and not to the number of faults



The Availability Working Group (AWG) was created on 20/06/2012, after endorsement by the LMC, with the purpose of creating common definitions and metrics for LHC dependability, modelling LHC availability using data from system experts and operations, identifying strengths and weaknesses in LHC machine availability, and developing strategies for improving the availability of the LHC, LHC upgrades and future machines.


À la Ben:



8 meetings so far
Synergy between equipment groups to measure and improve the availability of their systems
Collaboration with the Maintenance Management Project, which also aims at measuring the same faults
Workshop on machine availability and dependability for the post-LS1 LHC: TODAY



Conclusions
The simplified model of the machine protection system (2005!) has proven to be amazingly accurate
Although the top model is very simple, it is based on very detailed and complex models of the individual systems: a lot of work! One PhD per subsystem

Comparison with Run I machine data gives us:
Confidence in the machine safety
A handle on the machine availability of some subsystems

Improve weak points and equipment that is underperforming
Find trends and act

Fault statistics can be analysed from system level down to component level
Compare with dependability models of the operational equipment
Not all equipment has a dependability model

Extension of the 2005 model – AWG
Need to be able to measure operational statistics properly
Extend to outside the Machine Protection System
Quantify the effect on luminosity


Documents
