Safety Controller Synthesis for Collaborative Robots · (IRSs) and humans [1], [2] can leverage their complementary skills, but is difﬁcult to achieve because of uncontrolled hazards

Safety Controller Synthesis for Collaborative RobotsMario Gleirscher∗†, Radu Calinescu∗†

∗Assuring Autonomy International Programme, University of York, York, UK†Department of Computer Science, University of York, York, UK

mario.gleirscher,[email protected]

Abstract—In human-robot collaboration (HRC), software-based automatic safety controllers (ASCs) are used in variousforms (e.g. shutdown mechanisms, emergency brakes, interlocks)to improve operational safety. Complex robotic tasks and in-creasingly close human-robot interaction pose new challengesto ASC developers and certification authorities. Key amongthese challenges is the need to assure the correctness of ASCsunder reasonably weak assumptions. To address this need, weintroduce and evaluate a tool-supported ASC synthesis methodfor HRC in manufacturing. Our ASC synthesis is: (i) informedby the manufacturing process, risk analysis, and regulations;(ii) formally verified against correctness criteria; and (iii) selectedfrom a design space of feasible controllers according to aset of optimality criteria. The synthesised ASC can detect theoccurrence of hazards, move the process into a safe state, and,in certain circumstances, return the process to an operationalstate from which it can resume its original task.

Index Terms—Controller synthesis, human-robot collabora-tion, software engineering, probabilistic model checking.

I. INTRODUCTION

An effective collaboration between industrial robot systems(IRSs) and humans [1], [2] can leverage their complementaryskills, but is difficult to achieve because of uncontrolledhazards and unexploited sensing, tracking, and safety mea-sures [3]. Such hazards have been studied since the 1970s,resulting in elaborate risk taxonomies based on workspaces,tasks, and human body regions [2], [4]–[10]. The majorityare impact hazards (e.g. unexpected movement, reach beyondarea, dangerous workpieces, hazardous manipulation), trap-ping hazards (e.g. operator in cage), and failing equipment.

Addressing these hazards involves the examination of eachmode of operation (e.g. normal, maintenance) for its hazardousbehaviour, and the use of automatic safety controllers (ASCs)to trigger mode-specific safety measures [2]. Malfunctiondiagnostics (e.g. fault detection, wear-out monitoring) canfurther inform the ASC. As shown in Tab. I, a variety of safetymeasures [3] can prevent or mitigate hazards and accidents byreducing the probability of their occurrence and the severityof their consequences. There are functional measures usingelectronic equipment (e.g. speed & separation monitoring) andintrinsic measures not using such equipment (e.g. fence, flex-ible surface). Functional measures focusing on the correctnessand reliability of a controller are called dependability mea-sures [5], [11]. Functional measures are said to be passive ifthey focus on severity reduction (e.g. force-feedback control),active otherwise (e.g. safety-rated monitored stop).

The standardisation of safety requirements for IRSs [4]culminated in ANSI/RIA R15.06, ISO 10218 [12], 13482, and

15066. According to ISO 10218, an IRS comprises a robotarm, a robot controller, an end-effector, and a workpiece (see,e.g. Fig. 2a below). In collaborative operation, the operatorand the IRS (called a cobot [13]) can occupy the collabora-tive workspace (i.e., a subset of the safeguarded workspace)simultaneously while the IRS is performing tasks [14]. Basedon that, ISO 15066 recommends four safety modes, describedand combined with work layouts in [8], [15]:• safety-rated monitored stop (powered but no simultaneous

activity of robot and operator in shared workspace),• hand-guided operation (zero-gravity control, guided by

an operator, no actuation without operator input),• speed & separation monitoring (speed continuously

adapted to distance of robot and operator), and• power & force limiting (reduced impact on the human

body, a robot’s power and applied forces are limited).In the following, we highlight recent challenges and explainhow our work addresses these.

Challenges: Since the 1980s, tele-programming and simu-lation have led to a reduction of hazard exposure. However,guarding arrangements interfere with manufacturing processesand mobile robots. Complex tasks require continuous and closehuman-robot interaction (e.g. mutual take-over of tasks), mu-tual clarification of intent, and trading off risk [15], [16]. Robotmovements need to be predictable and impacts on the humanbody need to be attenuated (e.g. speed & separation monitoringrequires stereo vision and laser scanners to distinguish safetyzones). Engineers need to consider a variety of complex failuremodes. This situation implies requirements and design spacesfor ASCs, so engineers want to answer questions such as:• Which ASC design minimises the probability of incidents

in presence of human and sensor errors?

Table I: IRS safety measures by stage of causal chain

Stage Type of Measure ExamplesHazardprevention

1. safeguard/barrier fence, interlock2. IT safety verified safety controller3. IT security security-verified (safety) controller

Hazardmitigation& accidentprevention

4. reliability fault-tolerant scene interpretation5. workspace intru-sion detection

speed & separation monitoring,safety-rated monitored stop

6. shift of control hand-guided operation

Accidentmitigation(alleviation)

7. power & forcelimitation

low weight parts, flexible surfaces;variable impedance, touch-sensitive,& force-feedback control

8. system halt emergency stop, dead-man’s switch

arX

iv:2

007.

0334

0v1

[cs

.RO

] 7

Jul

202

0

• Which design minimises nuisance to the human, max-imises productivity, etc. while maintaining safety?

• Does a selected controller correctly handle hazards whendetected and return the system to a useful safe state?

Contributions: We introduce a tool-supported method forthe synthesis of discrete-event ASCs that meet safety re-quirements and optimise process performance for human-robotcooperation (alternative use of shared workspace) and col-laboration (simultaneous use of shared workspace, with closeinteraction) [8], [17]. We model the manufacturing processand its safety analysis as a Markov decision process (MDP)and select a correct-by-construction ASC from a set of MDPpolicies. We extend our notion of risk structures [18] andour tool YAP [19]. This simplifies the modelling of activitiesand actors, critical events (CEs, e.g. hazards), mitigations (e.g.safety mode changes) and reward structures for risk optimi-sation; and automates the translation of risk structures intoMDPs. Our approach facilitates the verification of safety ofthe MDP and of probabilistic reach-avoid properties of aselected policy. A verified ASC detects hazards and controlstheir mitigation by (i) the execution of a safety function, (ii) atransition to a safer mode, or (iii) a transition to a safer activity.

Overview: Sec. II discusses related work, Sec. III introducesour case study as a running example, and Sec. IV provides thetheoretical background. We describe and evaluate the ASCsynthesis method in Sec. V and Sec. VI, respectively, and weconclude with a short summary in Sec. VII.

II. RELATED WORK

To the best of our knowledge, our method is the first end-to-end approach to synthesising ASCs for handling multiplerisks in HRC for manufacturing processes.

Askarpour et al. [20] discuss a discrete-event formalisationof a work cell in the linear-time temporal language TRIO.Actions are specified as pre/inv/post-triples (with a safetyinvariant) for contract-based reasoning with the SAT solverZot. In contrast, our approach builds on probabilistic guardedcommand language (pGCL), separating action modelling fromproperty specification. Beyond counterexamples for modelrepair, our approach yields an executable policy. While theiruse of a priority parameter helps to abstract from unnecessarystate variables, we propose guards to implement flexible in-dividual action orderings. Moreover, violations of inv lead topausing the cell whereas our approach can deal with multiplemitigation options offering a variety of safety responses.

For generic robot applications, Orlandini et al. [21] employthe action language PDDL for modelling and timed game au-tomata for controller synthesis. The model checker UPPAAL-TIGA is used for verifying (i.e., finding winning strategies for)reach-avoid properties of type A(safe U goal). While gamesolving could enhance our verification approach, our methodfocuses on guidance in risk modelling for safely optimisedHRC performance. Cesta et al. [22] present an approach tosynthesise controllers (i.e., plans) for HRC applications using atimeline-based PDDL planner. While they distinguish control-lable (i.e., duration known) from uncontrollable actions (i.e.,

welder

robotarm

effector

(a) Safeguarded area (company)

(b) Workbench (company)

(c) Replica (research lab)

Figure 1: Actual (a, b) and replicated (c) cobot setting

duration unknown), an important aspect of HRC modelling,their focus is on task planning and scheduling rather than onrisk modelling for verified synthesis of ASCs.

Heinzmann and Zelinsky [23] propose a power & forcelimiting mode always active during an HRC activity describedas a discrete-event controller. Long et al. [24] propose a speed& separation monitoring scheme with nominal (max. velocity),reduced (speed limiting), and passive (hand-guided operation)safety modes. While these authors do not aim at synthesis ortask modelling, their elaborate safety modes may serve as atarget platform to our multi-risk synthesis approach.

III. RUNNING EXAMPLE: MANUFACTURING COBOTS

Fig. 1 shows an IRS manufacturing cell at a UK company(with the pictures anonymised for confidentiality reasons) andreplicated in a testbed at the University of Sheffield (Fig. 1c).The corresponding process (call it P) consists of activi-ties (Fig. 2b) collaboratively repeated by an operator, astationary robotic arm, and a spot welder (Fig. 2a). Previoussafety analysis (i.e., hazard identification, risk assessment,requirements derivation) resulted in two sensors (i.e., a rangefinder in Fig. 1a and a light barrier in Fig. 1b, indicated in red)triggering an emergency stop if a person approaches the welderor enters the workbench while the robot or welder are active.Tab. II shows our partial safety analysis of the cell followingthe guidance in Sec. I. The right column specifies safety goalsagainst each accident and controller requirements (e.g. mode-switch requirements) handling each latent cause in the leftcolumn, and indicating how the hazard is to be removed.

IV. PRELIMINARIES

Our method uses MDPs as a formal model of P , and MDPpolicies as the design space for controller synthesis.

Definition 1. Markov decision process (MDP). Given alldistributions Dist(αP) over an action alphabet αP of aprocess P , an MDP is a tuple M = (S, s0, αP , δP , L) with a

2

Table II: Our partial safety analysis of the manufacturing cell referring to the measures recommended in ISO 15066

Id Critical Event (Risk Factor) Safety Requirement

Accident (to be prevented or alleviated) Safety GoalRC Robot arm harshly Collides with operator The robot shall avoid harsh active collisions with the operator.WS Welding Sparks cause operator injuries The welding process shall reduce sparks injuring the operator.RT Robot arm Touches the operator The robot shall avoid active contact with the operator.

Latent Cause (to be mitigated timely)† Controller RequirementHRW Human operator and Robot use Workbench at the

same time(m) The robot shall perform a safety-rated monitored stop and (r) resume normal operationafter the operator has left the shared workbench.

HW Human operator is entering the Workbench whilethe robot is away from the bench

(m) If the robot moves a workpiece to the bench then it shall switch to power & forcelimiting mode and (r) resume normal operation after the operator has left the workbench.

HS Human operator has entered the Safeguarded areawhile robot moving or welder active

(m) The welder shall be switched off, the robot to speed & separation monitoring. (r) Bothshall resume normal mode after the operator has left and acknowledged the notification.

HC Human operator is Close to the welding spot whilerobot working and welder active

(m) The welder shall be switched off, the robot to safety-rated monitored stop. (r) Both shallresume normal or idle mode with a reset procedure after the operator has left.

† m. . . mitigation requirement, r. . . resumption requirement

Spotwelder

Robot

Safeguardedworkspace

Workbench

Collaborativeworkspace

Operator

(a) HRC setting (conceptual, top view)

main

idle

base

moving

exchWrkp

welding

(b) Process activities

Figure 2: Conceptual setting (a) and activities in the manufacturingprocess (b) performed by the operator, the robot, and the welder (inblue), classified by the activity groups moving and base (in gray)

set S of states, an initial state s0 ∈ S, a probabilistic transitionfunction δP : S × αP → Dist(αP), and a map L : S → 2AP

labelling S with atomic propositions AP [25].

Given a map A : S → 2αP , |A(s)| > 1 signifies non-deterministic choice in s. Its resolution for S forms a policy.

Definition 2. Memoryless Policy. A memoryless policy is amap π : S → Dist(αP) s.t. π(s)(a) > 0 ⇒ a ∈ A(s). π isdeterministic if ∀s ∈ S ∃a ∈ A(s) : π(s)(a) = 1 ∧ ∀a′ ∈αP \ {a} : π(s)(a′) = 0.

The following discussion is restricted to deterministic mem-oryless policies. Let ΠM be the set of all such policies forM.Then, action rewards defined by a map rqaction : S × αP →R≥0 allow the assessment of ΠM based on a quantity q.

Verification ofM is based on probabilistic computation treelogic (PCTL) whose properties over AP are formed by

φ ::= > | a | ¬φ | φ ∧ φ | Eφ | Aϕ and ϕ ::= Xφ | φUφ

with a ∈ AP ; an optional bound b ∈ N+ for U∼b with∼ ∈ {<,≤,=,≥}; the quantification operators P∼b|=? ϕ toverify (or with =?, to quantify) probabilities, S∼b|=?[a] todetermine long-run probabilities, Rq∼b|[min|max]=?[Fφ | C[∼b]]to calculate reachability and accumulative action rewards,

and the abbreviations Fφ ≡ >Uφ, Gφ ≡ ¬F¬φ, andφWψ ≡ φUψ ∨ Gφ. For sake of brevity, consider thetreatment of PCTL in [25], [26].

The concise definition of δP , the behaviour of P , is facil-itated by PRISM’s [25] pGCL. Guarded commands are ofthe form [α] γ −→ υ where α is an event label and υ aprobabilistic update applicable to s ∈ S only if s |= γ, whereγ is an expression in the propositional fragment of PCTL.1

Generally, υ ::= π1 : υ1 + · · · + πn : υn with Σi∈1..nπi = 1and assignments υi to state variables of type B, N, or R.

For safety analysis, we view the cell in Sec. III as a processP , monitored and influenced by an ASC to mitigate hazardsand prevent accidents. An accident a ∈ S is an undesiredconsequence reachable from a set Ξ ⊂ S forming the causesof a. The fraction of a cause c ∈ Ξ not related to the operatoris called a hazard h [27], [28]. We call c latent2 if there aresufficient resources (e.g. time for removing h by transition tos 6∈ Ξ) to prevent the accident. h includes states in S \ Ξbeing critical because certain events (e.g. an operator action)cause a transition to Ξ, and possibly a, if h stays active, furtherconditions hold, and no safety measures are put in place timely.

Risk modelling can be facilitated by specifying risk factorsand combining them into risk structures [18]. A risk factor f isa labelled transition system (LTS) modelling the life cycle ofa critical event (i.e., hazard, cause, mishap). f has the phasesinactive (0f ), active (f ), and mitigated (f) and transitionsbetween these phases signifying endangerment events (e ) andmitigation (m ) and resumption (mr) actions. Let F be aset of factors, e.g. the ones in column Id in Tab. II. TheCartesian product of the phases of the factors in F yieldsthe risk space R(F ). To utilise factor LTSs for the translationof ASC designs into pGCL, we further develop the notion ofrisk factors in Sec. V-B as part of our contribution.

V. APPROACH: SAFETY CONTROLLER SYNTHESIS

Fig. 3 shows the steps and artefacts of the proposed methoddetailed below and illustrated with a running example.

1We use −→ to separate guard and update expressions and → both forlogical implication and the definition of mappings.

2As opposed to immediate causes reducing the possibilities of risk handling.

3

Spotwelder

Robot

Safeguardedworkspace

Workbench

Collaborativeworkspace

Operator

Guidelines &Expert Knowledge

e.g. ISO 15066

ProcessModelling1

sensing/actuation,location (space discretisation),

typical uncertainties & costs

SafetyAnalysis

2

Critical Events& Safety

Requirements

risk model,e.g. failuremodes,probabilities

safety modes, mitigations,corresp. sensing/actuation

Process asan MDP

main

idle

base

moving

exchWrkp

welding

MitigationDesign

3

MDP withASC Design

Space

capturesmitigationoptions

VerifiedControllerSynthesis

4

DiscreteEvent Safety

Controller

ASCTesting

Figure 3: Main steps and artefacts of the proposed method (futurework indicated in dashed lines)

// Locationsconst atTable = 0; const sharedTbl = 1; const inCell = 2; const

atWeldSpot = 3;// Status of workpiece supportconst empty = 0; const left = 1; const right = 2; const both = 3;// Status of range finderconst far = 0; const near = 1; const close = 2;

Figure 4: Fragment of the data type definition in PRISM

A. 1 Modelling the Manufacturing Process

Activities in P (Fig. 2b) are structured by sets of guardedcommands. We distinguish actions of controllable actors (e.g.robot arm, welder, operator) and the ASC, and events of asensor module and shared “manipulables” (e.g. workpiece sup-port). S is built from discrete variables (cf. Fig. 4) capturingthe world state (e.g. robot location; workbench status), sensoryinputs (e.g. range finder), control outputs (e.g. robot behaviour,notifications), user inputs (e.g. start button), and modes (e.g.current activity, safety mode).

Mode variables (e.g. ract, safmod) are used to specify afilter φa for enabling actions that form an activity (e.g. grabworkpiece, move arm to welder), or a filter φsm for enablingactions in a particular safety mode. Thus, the structure ofguarded commands for P follows the pattern

[α] ¬ω ∧ φsm ∧ φa ∧ γ −→ υ

with an action label α, a guard ω to prevent from leaving thefinal state, a check γ of individual conditions, and an updateexpression υ (cf. Sec. IV). Given a set Ssm of safety modes,modelling involves the restriction of guarded commands of allactors in P , by adding φsm and φa to their guards, to obtainmode- and activity-aware guarded commands.

Example 1. Fig. 5 specifies the two robot actions r moveToTableand r grabLeftWorkpiece of the activity exchWrkp.

[r moveToTable] !CYCLEEND& (safmod=normal|safmod=ssmon|safmod=pflim)& ract=exchWrkp& !rloc=sharedTbl & ((wps!=right&reffocc) |wps=left&!reffocc )−> (rloc’=sharedTbl);

[r grabLeftWorkpiece] !CYCLEEND& (safmod=normal|safmod=ssmon|safmod=pflim|safmod=hguid)& ract=exchWrkp& rloc=sharedTbl & ! reffocc & wps=left−> (reffocc’=true)&(wpfin’=false) ;

[r placeWorkpieceRight] !CYCLEEND & (safmod=normal|safmod=ssmon|safmod=pflim|safmod=hguid) & ract=exchWrkp & rloc=sharedTbl &reffocc & wpfin −> (reffocc’=false);

Figure 5: PRISM model fragment of the module robotArm

B. 2 Safety Analysis and Risk Modelling

Fig. 6 further develops the notion [18] of a risk factor ftowards guidance in the formalisation of hazards, causes, andmishaps and the events forming a causal chain (e.g. a mishapevent leads to a mishap state). Based on that, f supports thedesign of hazard mitigations to reduce accidents, and accidentalleviations to reduce consequences. Hence, each critical eventneeds to be translated into a risk factor. Example 2 instantiatesf with the hazard HC from Tab. II.

Example 2. For the hazard HC, Fig. 6 describes(a) how an endangerment e f activates HC (i.e., leads to a risk state

ρHC ∈ R(F ) where the predicate HC holds true),(b) how mitigations (e.g. issuing an operator notification) update

P to enter the phase HC (i.e., HC is false),(c) further mitigations (e.g. waiting for operator response),(d) resumptions (e.g. switching from speed & separation monitoring

to normal) that update P to return to phase 0HC where bothHC and HC are false,

(e) further endangerments (e.g. erroneous robot movement) re-activating HC from state HC,

(f) a mishap event moving P into a state with HC true (i.e., anf-accident occurs),

(g) alleviations to handle consequences of HC in phase HC.

Phase f ′, reachable by non-deterministic or probabilisticchoice, models an undetected endangerment (e.g. because ofa faulty range finder for HC) that can lead to f. For sake ofsimplicity, the endanger choices in f and f

′are not shown.

f, f′, and 0f form the f-safe region of P . Example 3 explains

how one models risk for the welding activity in YAP script.

Example 3. First, the Activity section of Fig. 7 specifies thatwelding includes the specification of the activity moving and thatthe activity exchWrkp is a successor of welding. This way, onespecifies an activity automaton for P as shown in Fig. 2b.

Next, the HazardModel section lists critical events relevant towelding, the two mishaps RC and RT and the latent cause HC (cf.Tab. II). One can hypothesise high-level relationships between criticalevents using constraints. E.g. RC requiresNOf (2|HRW,HS,HC|2)expresses the assumption that exactly two of the listed events have tohave occurred before RC can occur. Such relationships are typicallyidentified during preliminary hazard operability studies (HazOp),system FMEA, or system FTA.

Furthermore, HC is specified by (a) an informal description, (b) aguard describing its activation HC , (c) mis (i.e., an action, e.g.

4

f-sa

fe

fm

itiga

ted

f-unsafe

fin

activ

e

0fstart f

f ′

ff′

f

e f

endanger

e f

endanger

endanger

mf

miti

gate

resume fromsafety function

mfr

resume

f-nominaloperation

preparingfor mitigation

undetectedf-unsafeoperation

alleviation

further mitigation, waitingfor environment to react

f-mitigatedoperation

further preparationsfor resumption

mishap

mishap

Figure 6: Phases and actions of a risk factor f. ⇒. . . multiple optionalactions considered, —. . . minimum amount of information to beprovided for a risk factor, - -. . . optional modelling aspects.

Activity {include moving; successor exchWrkp;}HazardModel {

RC requiresNOf (2|HRW,HS,HC|2);RT requiresNOf (1|HRW,HS,HC);HC desc ”(H)uman (C)lose to active welder and robot working”

mitDeniesMit (HRW,HW,HS,RC,RT)guard ”hACT WELDING & hloc=atWeldSpot”detectedBy (SHARE.HCdet)mitigatedBy (SHUTDOWN.HCmit,SHUTDOWN.HCmit2)resumedBy (NOTIFY.HCres,NOTIFY.HCres2)mis=”h exitPlant”prob=0.2 // of mishap if HC not detected OR not mitigated timelysev=5; }

Figure 7: YAP risk model for the welding activity from Fig. 2b

of the operator, with the mishap HC as a bad outcome if HC isundetected or not mitigated timely), (d) prob (i.e., the probability ofHC under these conditions), and (e) sev, quantifying the severity ofthe best, average, or worst expected consequences from HC.

Probabilistic choice in M can be used to model severaluncertainties. Informed by FTA and FMEA, one can considersensor and actuator faults. In our example, the range finderas the detector of eHC fails by 5% when the operator entersthe cell. Informed by HazOp, human errors can be modelledsimilarly. In our example, with a 10% chance, the operatorenters the cell, knowing that robotArm and welder areactive. Moreover, one can model the probability of occurrenceof a mishap under the condition of an active hazard. In ourexample, with a 20% chance, HC may follow HC ′ (i.e.,HC remains undetected because of the aforementioned sensorfault) or HC (i.e., the ASC is not reacting timely).

C. 3 Designing Mitigation and Resumption Options

The capabilities of actors in P determine the controlla-bility of critical events. We found three techniques usefulin designing mitigations and resumptions: action filters (i.e.,safety modes, cf. Sec. I), activity changes (e.g. change fromwelding to off), and safety functions (e.g. notification).Recall that mitigations and resumptions are actions (i.e., tran-sition labels in a risk factor LTS). Accordingly, the example in

mode HCdet desc ”range finder”guard ”hACT WELDING & rngDet=close”embodiedBy rngDet;

mode HCmit desc ”protective emergency stop of robotic system”event stopupdate ”( notif ’=leaveArea)”target ( act=off , safmod=stopped)disruption =10nuisance=2effort =1;

mode HCres desc ”resumption from emergency stop”event resumeguard ”notif=leaveArea & !hST HOinSGA”update ”( notif ’=ok)”target ( act=exchWrkp, safmod=normal);

Figure 8: YAP action specifications for the risk factor HC

// Critical event predicates// HC:monitor ”(H)uman (C)lose to active welder and robot working”formula RCE HC = hACT WELDING & hloc=atWeldSpot;formula CE HC = hACT WELDING & rngDet=close;

Figure 9: Monitor predicates for HC generated for the PRISM model

Fig. 8 specifies details about the actions referred to in Fig. 7.Here, the following parameters drive the design space of anASC: (a) a detectedBy reference (i.e., associating the guardwith a sensor predicate), (b) a mitigatedBy reference to oneor more mitigation options, and (c) a resumedBy referenceto one or more resumption options. For this approach, weextended YAP’s input language to develop these actions intoguarded commands.

Example 4. As an example for (b), in Fig. 8, the action HCmit oftype SHUTDOWN (i) synchronises with the robotArm and welderon the event stop, (ii) update models a safety function, issuinga notification to the operator to leave the safeguarded area, and(iii) target switches the manufacturing cell to the activity off andto the safety mode stopped, all triggered by the range finder.

Indicated in Fig. 6, HCmit models one option for mHC.One can distinguish several such options by quantities such asdisruption of the manufacturing process, nuisance of the op-erator, and effort to be spent by the machines. In combinationwith processing time and value for each nominal action of P ,these quantities enable the evaluation and selection of optimalpolicies as we shall see below.

This part of the YAP model can be translated into pGCL.Endangerments are translated into commands of the form

[e f′ ] φa ∧ χ −→ f ′ and [e f ] φa ∧ ζ −→ (1− p) : f + p : f ′

with guards including a hazard condition χ and a correspond-ing monitoring (or sensor) predicate ζ. Constraints, such asrequiresNOf in Example 3, are then used to derive part of ζ.

Example 5. Fig. 9 indicates the transcription of guard and detect-edBy into a pair of predicates, RCE HC describing actual states,and CE HC signifying states monitored by the range finder, wherep can denote the sensor fault probability.

Mitigations are translated into commands of the form

[m ft ] φt ∧ f −→ υt′ and [m f ] φsm′,a′,sf ′ ∧ f −→ f

5

with t ∈ {sm, a, sf } in φt for checking permission in thecurrent safety mode, activity, and state of safety functions,and in υt′ for hazard removal by switching into a safer activitya′, a safer mode sm ′, and by applying the safety function sf .These updates are checked by φsm′,a′,sf ′ to be able to proceedto f. Resumptions are translated into commands of the form

[m fr,t] φt ∧ ρf −→ υt′ and [m f

r] φsm′,a′,sf ′ ∧ ρf −→ 0f

where φt guards the resumption based on the safety modeand function in place, ρf ⊆ R(F ) restricts permission to riskstates (Sec. IV and Fig. 6) with f mitigated; and υt′ invertsthe safety function (sf −1), relaxes to the safety mode sm ′,and returns to an, ideally more productive, activity a′ of P .

D. 4 Verified Controller Synthesis

The present approach follows a two-staged search throughthe ASC design space: The first stage is carried through byYAP when generating the guarded commands. The secondstage is performed by PRISM when synthesising MDP poli-cies. For search space reduction, YAP employs risk gradientsbetween safety modes and activities in the first stage. For thesecond stage, YAP generates reward structures for some of thequantities introduced in Sec. V-C.

1) Guarded Command Generation: The generation of υt′for mitigations and resumptions requires the choice of a safetymode and activity to switch to, depending on the current modeand activity. Given activities Sa and modes Ssm , two skew-diagonal risk gradient matrices Sa ∈ R|Sa|×|Sa| and Ssm ∈R|Ssm |×|Ssm |, e.g. manually crafted from safety analysis, canresolve this choice based on the following justification.

Assume a1, a2 ∈ Sa vary in physical movement, force, andspeed. If a1 means more or wider movement, higher forceapplication, or higher speed than a2, then a change from a1

to a2 will likely reduce risk. Hence, a positive gradient isassigned to Sa

a1a2 . Similarly, assume m1,m2 ∈ Ssm vary P’scapabilities by relaxing or restricting the range and shape ofpermitted actions. If m1 permits stronger capabilities than m2,then a change from m1 to m2 will likely reduce risk. Again,we assign a positive gradient to Ssm

m1m2. The diagonality of S

provides the dual for resumptions where a negative gradientof the same amount from m2 to m1 is assigned to Ssm

m2m1.

Let current safety mode c and mitigation m f with targetmode t. m f (m f

r) changes to t only if the gradient from c tot is ≥ 0 (≤ 0). If Ssm

ct ≥ 0, then a switch to t is includedin υsm′ , otherwise υsm′ leaves M in c. We implemented thisscheme for activities analogously.

Example 6. Fig. 10 shows the result of applying this scheme in thegeneration of an ASC for the risk factor HC based on Fig. 8.

Essentially, S approximates the change of risk in case of achange from one activity or safety mode to another. Using S,the majority of an ASC can be described in YAP script.

2) MDP Verification: This step requires establishingM |=φwf ∧φc with properties expressed in PCTL (Sec. IV). φwf isa well-formedness property including the verification of, e.g.hazard occurrence and freedom of pre-final deadlocks, and

// Endangerments (monitor)[si HCact] wact=welding & ract=exchWrkp & CE HC & !(HCp=act |

HCp=mis)−> (HCp’=act);

...// Escalation to mishap if not mitigated ( for analysis only)[ h exitPlant ] true −>

((! HCp=mis & (CE HC | RCE HC))?0.2:0):(HCp’=mis)+((!HCp=mis & (CE HC | RCE HC))?0.8:1):true;

// Mitigation with synchronous events[s HCstop] wact=welding & ract=exchWrkp & HCp=act −> true;...// Change of safety modes[si HCmitsafmod] safmod=normal & HCp=act −> (safmod’=stopped);...// Execution of safety functions[si HCmitfun] HCp=act & !((notif=leaveArea)) −> (notif’=leaveArea);// For entering the mitigated phase[si HCmit] HCp=act & (notif=leaveArea) & !CE HC −> (HCp’=mit);// Switching off safety functions[si HCresfun] HCp=mit & !CE HC & notif=leaveArea & !hST HOinSGA

−> (notif’=ok);// Resuming to a less restrictive safety mode[si HCressafmod] safmod=normal & HCp=mit & HCp=mit & (notif=ok)−> (safmod’=normal)&(HCp’=sfd);

...// Resuming agent’s activities (via synchronisation )[s HCresume] HCp=sfd & !CE HC & (notif=ok) −> (HCp’=inact);

Figure 10: PRISM model fragment generated for the risk factor HC

the falsification, e.g. that final states must not be initial states.φwf helps to simplify model debugging, decrease model size,guarantee progress, and reduce vacuity. φc specifies safety-carrying correctness including, e.g. ASC progress (and acrosscycles, liveness), particularly, that ΠM (i.e., the ASC designspace) contains complete mitigation paths from critical events.Tab. III lists examples of φwf and φc to be verified of M.

3) Policy Synthesis: The ASC design space (ΠM) is createdby commands (e.g. mitigations, resumptions) simultaneouslyenabled in s ∈ S, yielding multiple policies for s and somecommands enabled in multiple states, giving rise to a policyfor each ordering in which these commands can be chosen.

An optimal policy π?, including the ASC decisions, can beselected from ΠM based on multiple criteria (e.g. minimumrisk and nuisance, maximum productivity). For that, M usesaction rewards to quantify (i) productivity, up- and down-timeof P; (ii) factor-, mode-, and activity-based risk; risk reductionpotential; disruptiveness and nuisance; resource consumption;and eff ective time of the ASC.

Example 7. Fig. 11 shows a pGCL fragment generated by YAP.

4) Discrete-time Markov chain (DTMC) Verification: Dueto known restrictions in combining multi-objective queriesand constraints in PRISM, part of the verification appliesto the policy as a DTMC. This step requires establishingπ? |= φs where φs can include liveness, safety, and reliabilityproperties (e.g. “reach-avoid” of type A G Fψ ∧ A G¬φ; theprobability of failure on demand of the ASC; the probabilityof a mishap from any hazard is below a threshold). Tab. IIIlists examples of properties to be verified of π?.

Fig. 12 visualises π? as a graph with nodes for all states

6

// Risk of occurrence of HCrewards ”risk HC”[r moveToWelder] (RCE HC | CE HC) & !CYCLEEND : 5;[rw weldStep] (RCE HC | CE HC) & !CYCLEEND : 10;[h approachWeldSpot] (RCE HC | CE HC) & !CYCLEEND : 7;...[ h exitPlant ] notif =leaveArea & (RCE HC | CE HC) & !CYCLEEND :

2;endrewards...// Nuisance (e .g. to the human operator; per mitigation option )rewards ”nuisance”// HC−mitigation: HCmit[si HCmitsafmod] (HCp=act | HCp=mit) : 2.0;[s HCstop] (HCp=act | HCp=mit) : 2.0;[si HCmitfun] (HCp=act | HCp=mit) : 2.0;endrewards

Figure 11: PRISM rewards for risk from HC and nuisance of HCmit

31 --:1 32 --:1

33 --:1

38 --:1

159 --:1

160 --:1 161 --:1 162 --:1

0 hi_mayEnterCell:0.9

1

hi_mayEnterCell:0.1

6

h_enterCell:0.05

7

h_enterCell:0.95

8

h_approachWeldSpot:0.05

10

h_approachWeldSpot:0.95h_approachWeldSpot:0.95

9


3

h_withDrawArm:1

4

h_withDrawArm:1

5

2

h_withDrawArm:1

85

r_grabLeftWorkpiece:1

h_exitPlant:1

93

r_grabLeftWorkpiece:1h_exitPlant:1

11

12

h_start:1

hi_mayEnterCell:0.9

14

hi_mayEnterCell:0.1

19

h_placeWorkpieceLeft:1


15

hi_mayEnterCell:0.1

22

h_enterCell:0.05

24

h_enterCell:0.95

h_withDrawArm:1

26


30

h_approachWeldSpot:0.95h_approachWeldSpot:0.95

28


16

20

h_placeWorkpieceLeft:1

17

h_withDrawArm:1

18

h_withDrawArm:1

r_moveToTable:1

21

25


29


h_exitPlant:1 h_exitPlant:1

h_exitPlant:1r_moveToTable:1

23


27


h_exitPlant:1

h_exitPlant:1

44

43

h_grabRightWorkpiece:1

p_final:1

46

45


p_final:1

48

47


p_final:1

67


68


69


70

72


74


h_exitPlant:1h_exitPlant:1

71


73


h_exitPlant:1

77

56

h_exitPlant:1

54

si_HCresfun:1

80

66

h_exitPlant:1

64

si_HCresfun:1

82

81


p_final:1


84

hi_mayEnterCell:0.1

89

h_enterCell:0.05

90

h_enterCell:0.95

91

h_approachWeldSpot:0.05 h_approachWeldSpot:0.95 h_approachWeldSpot:0.95

92


86

h_withDrawArm:1

87

h_withDrawArm:1

88

h_withDrawArm:1

96

r_moveToWelder:1

h_exitPlant:1

104

r_moveToWelder:1h_exitPlant:1


95

hi_mayEnterCell:0.1

100

h_enterCell:0.05

101

h_enterCell:0.95

102

h_approachWeldSpot:0.05h_approachWeldSpot:0.95h_approachWeldSpot:0.95

103


97

h_withDrawArm:1

98

h_withDrawArm:1

99

h_withDrawArm:1

181

rw_weldStep:1

h_exitPlant:1

202

rw_weldStep:1h_exitPlant:1

117

105

h_withDrawArm:1

r_placeWorkpieceRight:1

118

106

h_withDrawArm:1


119

107

h_withDrawArm:1


120

122


124



121


123


h_exitPlant:1

127

111

h_exitPlant:1

110

si_HCresfun:1

130

116

h_exitPlant:1

115

si_HCresfun:1

144

132

h_withDrawArm:1

r_moveToTable:1

145

133

h_withDrawArm:1

r_moveToTable:1

146

134

h_withDrawArm:1

r_moveToTable:1

147

149


151



148


150


h_exitPlant:1

154

138

h_exitPlant:1

137

si_HCresfun:1

157

143

h_exitPlant:1

142

si_HCresfun:1

174

166

h_exitPlant:1

165

si_HCresfun:1

177

171

h_exitPlant:1

170

si_HCresfun:1

195

179

h_withDrawArm:1

rw_leaveWelder:1

196

180

h_withDrawArm:1

rw_leaveWelder:1

197

h_withDrawArm:1

rw_leaveWelder:1

198

200

h_approachWeldSpot:0.05 h_approachWeldSpot:0.95

h_exitPlant:0.8

186

h_exitPlant:0.2

203

si_HCact:1

199


201


h_exitPlant:0.8 h_exitPlant:0.2

p_final:1

57

p_final:1

187

p_final:1

193

p_final:1

194

p_final:1

131


158

r_moveToTable:1

178

r_moveToTable:1

34

35

si_HCmitfun:1

40

si_HCmitsafmod:1

42

si_HCmit:1

36

si_HCressafmod:1

37

si_HCresfun:1

39

si_HCmitfun:1

41

si_HCresfun:1

si_HCressafmod:1

49

51

si_HCmitfun:1

61

si_HCmitsafmod:1

50

52

si_HCmitfun:1

62

si_HCmitsafmod:1

65

si_HCmit:1

si_HCmit:1

53

si_HCressafmod:1

58

si_HCressafmod:1

s_HCresume:1

55

si_HCresfun:1

59

si_HCmitfun:1

60

si_HCmitfun:1

63

si_HCresfun:1

si_HCressafmod:1

si_HCressafmod:1

75

76

si_HCmitfun:1

79

si_HCmitsafmod:1

si_HCmit:1

78

si_HCmitfun:1

108

109

si_HCmitfun:1

114

si_HCmitsafmod:1

si_HCmit:1

112

si_HCressafmod:1

s_HCresume:1

113

si_HCmitfun:1

si_HCressafmod:1

125

126

si_HCmitfun:1

129

si_HCmitsafmod:1

si_HCmit:1

128

si_HCmitfun:1

135

136

si_HCmitfun:1

141

si_HCmitsafmod:1

si_HCmit:1

139

si_HCressafmod:1

s_HCresume:1

140

si_HCmitfun:1

si_HCressafmod:1

152

153

si_HCmitfun:1

156

si_HCmitsafmod:1

si_HCmit:1

155

si_HCmitfun:1

163

164

si_HCmitfun:1

169

si_HCmitsafmod:1

si_HCmit:1

167

si_HCressafmod:1

s_HCresume:1

168

si_HCmitfun:1

si_HCressafmod:1

172

173

si_HCmitfun:1

176

si_HCmitsafmod:1

si_HCmit:1

175

si_HCmitfun:1

182

183

si_HCmitfun:1

190

si_HCmitsafmod:1

s_HCstop:1

184

188

si_HCressafmod:1

s_HCresume:1

185

si_HCresfun:1

189

si_HCmitfun:1

191

si_HCressafmod:1

192

si_HCresfun:1

204

si_HCmitfun:1

206

si_HCmitsafmod:1

s_HCstop:1

205

si_HCmitfun:1

Figure 12: Bird’s-eye view of the policy synthesised for the queryRpot

max=?[C]∧Reffmax=?[C]. Nodes are the states reachable in M from

s0, including HC-safe states (green), HC-unsafe states (orange), andmishap states (red). Edges indicate robotArm and welder actions(red), actions of the operator (black), the ASC (green), and cycletermination (blue). The gray fragment is magnified in Fig. 13.

reachable from s0, and edges for all transitions generated byδP as derived from the guarded commands of P . The edgesform executions of M from s0 under π?.

Example 8. Fig. 12 provides a bird’s eye view of a synthesised policy.States coloured in green form the set HC-safe, i.e., states in whichHC is inactive or mitigated.

Fig. 13 shows a fragment of Fig. 12. Note the 5% chanceof a sensor failure from state 90 leading to state 92 (i.e., theoperator has approached the robotArm and welder) whereHC will remain undetected and not handled. Otherwise, in state 93,HC will be mitigated after the next work step of the robotArm andwelder leading to state 202. From there, the ASC mitigates to aprotective stop (state 176), and resumes to state 178 (i.e., wherethe operator has left the safeguarded area) from where the currentmanufacturing cycle can be finished (state 33).

Table III: Examples of checked properties and queried objectives

Property† Description

Well-formedness φwf of Mv: E F(f ∧ ¬final) Can the hazard f occur during a

cycle of P?f: E F(deadlock ∧ ¬final) Are all deadlock states final? Does

P deadlock early?f: A F f Is f inevitable?f: ¬∃s ∈ S : final ∧ init Are there initial states that are also

final states?v: E F final Can P finish a production cycle?

Querying for a (Pareto-)optimal ASC π?

Rpotmax=?[C] ∧ Reff

max=?[C] Assuming an adversarial environ-ment, select π that maximallyutilises the ASC.

Rprodmax=?[C] ∧ Rsev

≤s [C] ∧ Rrisk≤r [C] Select ASC that maximises produc-

tivity constrained by risk level rand expected severity s.

Rprodmax=?[C] ∧ Rsev

≤s [C] Select ASC that maximises produc-tivity constrained by exposure p tosevere injuries.

Cycle-bounded correctness φc of a policy π (or the policy space ΠM)

v: A F(ζ → A X f ) Does the ASC on all paths imme-diately detect the hazard χ?

v: A F(f → (A F f → (A F 0f))) Does the ASC lively handle hazardf in all situations?

v: E F(f ∧ F final) Does the ASC resume P so it canfinish its cycle after f has occurred?

v: P>p[G¬mishap] Is the probability of mishap free-dom greater than p?

Reliability φr of a selected ASC π?

v: S<p mishap Is the steady-state (long-run) prob-ability of any mishap f below p?

† deadlock . . . state with no commands enabled, final . . . end of man-ufacturing cycle, init . . . initial state of a manufacturing cycle, mishap. . . mishap state, p . . . probability bound, v . . . to be verified, f . . . to befalsified, prod . . . productivity, sev . . . severity, eff . . . ASC effectiveness,risk . . . risk level, pot . . . risk reduction potential

VI. EVALUATION

In this section, we discuss the adequacy and efficacy of theproposed method from several viewpoints.

A. Research Questions and Evaluation Methodology

Based on the questions raised in Sec. I, we investigate the(i) scalability and performance of the approach and (ii) theeffectiveness of the ASCs synthesised by it, asking:RQ1 How well can the approach deal with multiple hazards

and mitigation and resumption options? What are theresulting model sizes and analysis times?

RQ2 What is the likelihood of incident/accident-free opera-tion under the control of the synthesised ASCs?

RQ3 Which process overheads are to be expected of an ASCimplementation?

For RQ1, we consider as inputs and parameters a YAPrisk model and a PRISM MDP model of the cell (with YAPtemplate placeholders), a single initial state of these modelswhere all actors are in the activity off and no hazard is active.Accordingly, we prepare and analyse multiple increments of

7

93

90h_ approachWeldSpot:0.95

92

h_ approachWeldSpot:0.05

104

r_ moveToWelder:1

202

rw_ weldStep:1

177

203

si_ HCact:1

176

si_ HCmit:1

204

si_ HCmitfun:1

206

si_ HCmitsafmod:1

s_ HCstop:1

33 --:1

82

81

h_ grabRightWorkpiece:1

p_ final:1

171

h_ exitPlant:1

170

si_ HCresfun:1

131

r_ placeWorkpieceRight:1

178

r_ moveToTable:1

167

s_ HCresume:1

si_ HCressafmod:1

177

Figure 13: Fragment of the policy shown in Fig. 12. Split at state177 for layout efficiency.

the risk model, each adding one critical event, mitigationoptions, and constraints to the model.

For RQ2, let Ξ ⊂ S be the set of non-accident F -unsafestates, i.e., states labelled with at least one critical event,describing the abstract state where any critical event has atleast been sensed by the ASC (e.g. CE HC with its handlingnot yet started, i.e., 0HC). For MDPs, we evaluate accidentfreedom with

P¬A ≡ fs∈Ξ Psmin=?[¬mishap W safe] (1)

where f ∈ {min,mean,max}. For Ξ, Formula (1) requiresthe ASC to minimise the probability of mishaps until an F -safe state (i.e., S \ Ξ) is reached. In Tab. IV, bµe denotes thetriple comprising min, the arithmetic mean µ, and max. P¬Aaggregates these three probabilities over Ξ.

Next, we synthesise policies for each of the MDP incre-ments for the three optimisation queries

Rpotmax=?[C] ∧ Pmax=?[F final t ], (a)

Rprodmax=?[C] ∧ Pmax=?[F final t ], and (b)

Reffmax=?[C] ∧ Rnuis

max=?[C]. (c)

where final t = {s ∈ S | s ∈ final ∧ all tasks finished}. In thespirit of negative testing, Formula (a) aims at maximising theuse of the ASC (i.e., approximating worst-case behaviour ofthe operator and other actors) while maximising the probabilityof finishing two tasks, i.e., finishing a workpiece and carryingthrough cell maintenance. This query does not take into ac-count further opmitisation parameters defined for mitigationsand resumptions. As opposed to that, Formula (b) fosters themaximisation of productivity, any combination of decisionsallowing the finalisation of tasks is preferred, hence, transitionsleading to accidents or the use of the ASC are equallyneglected. While Formula (c) also forces the environment totrigger the ASC, these policies represent the best ASC usagein terms of nuisance and eff ort. Because of constraints in theuse of Rmin for MDPs, we maximise costs interpreting positivevalues as negative (e.g. the higher the nuisance the better).

Figure 14: Pareto curve with five policies for Formula (c) for model7

We investigate the Pareto curves of the policies synthesisedfrom the Formulas (a) to (c). For policies with less than1000 states, we inspect the corresponding policy graphs (e.g.whether there is a path from initial to final or whether pathsfrom unsafe states reachable from initial avoid deadlocks). Fi-nally, we evaluate accident freedom according to Formula (1),except that we use P=? for DTMCs instead of Pmin=?.3

B. Results

For the experiment, we used YAP 0.5.1 and PRISM 4.5, onGNU/Linux 5.4.19 (x86, 64bit), and an Intel R© Core i7-8665Uwith up to 8 CPUs of up to 4.8 MHz, and 16 GiB RAM.

Tab. IV shows the data collected from seven models createdfor RQ1 and RQ2. The result bµe = [1, 1, 1] for a policydenotes 100% conditional accident freedom. This desirableresult is most often achieved with Formula (c) due to thefact that simultaneity of decisions of the environment and theASC in the same state is avoided by focusing on rewards onlyspecified for ASC actions. Such rewards model the fact thatan ASC is usually much faster than an operator. Formulas (a)and (b) show poorer accident freedom because productivityrewards given to the environment compete with rewards givento the ASC to exploit its risk reduction potential.

For demonstration of YAP’s capabilities, the incident RTand the accident RC are included in the risk model withouthandler commands. However, these factors add further con-straints on R(F ) to be dealt with by the ASC. Hence, mrstays at 15 actions and c rises to 15 constraints. In model7 (last line of Tab. IV), the Ξ-fraction of S (12079 states) andR(F ) (122 risk states) differ by two orders of magnitude. Webelieve, such an abstraction underpins the potential usefulnessof the proposed risk model in such applications.

For RQ3, we can at the current stage of this projectonly provide a ballpark figure for the detection and handlingoverheads. Let t : αP → R be the processing time required foran action, e.g. for the calculation of the detection of HC in eHC.If implemented as part of a sequential cell controller, the ASCrequires a time slot of length Σf∈F t(e

f) in each control cycle.If monitored simultaneously in dedicated ASC hardware, theslowest detection rate for F is 1/maxf∈F t(e f). The overheadfor handling f can be estimated from Fig. 6 and may rangefrom t(mHC) to Σk∈{sm,a,sf }

(t(m f

k) + t(m fr,k)

).

3To keep manual workload under control, if PRISM lists several adver-saries, we apply the experiment procedure only to the first listed.

8

Table IV: Results of the experiment for RQ1 (scalability) and RQ2 (accident-free operation)

Risk Model† MDP† (a) max-ASC† (b) max-prod (c) opt-ASCF mr/c |R(F )| tY P¬A Ξ sta/tra P¬A Ξ tP P¬A Ξ tP P¬A Ξ tP

[ms] bµe bµe [s] bµe [s] bµe [s]

HC 5/0 3 40 [.9,.9,.9] 14 322/1031 [1,1,1] 3 .02 [1,1,1] 1 .02 [1,1,1] 6 .15+HS 9/2 5 52 [.92,.96,.98] 256 930/3483 [.07,.66,1] 11 .77 [0,.88,1] 8 .82 [.95,.98,1] 18 .9+WS 11/3 8 44 [.93,.97,1] 288 1088/3865 [0,.29,1] 17 2.1 [0,.8,1] 5 2 [1,1,1] 24 1.5+HRW 13/7 16 65 [.93,.97,1] 981 7675/33322 [1,1,1] 17 9.7 [1,1,1] 11 9.4 [1,1,1] 15 13.3+HW 15/8 36 76 [.93,.97,1] 2296 21281/98694 [1,1,1] 15 42.9 [0,.71,1] 7 41.4 [1,1,1] 15 46.6+RT 15/9 50 87 [.93,.97,1] 2864 21965/100133 [1,1,1] 13 48.2 [1,1,1] 9 46.4 [1,1,1] 15 53.8+RC 15/15 122 162 [.93,.99,1] 12079 21670/102263 [0,.94,1] 35 38 [0,.72,1] 22 36.6 [1,1,1] 36 51.1† F . . . critical event set; mr/c. . . number of mitigations+resumptions/constraints; |R(F )|. . . cardinality of the risk space; tY . . . YAP’s processing time;P¬A. . . probability of conditional accident freedom; Ξ. . . set of F -unsafe states; sta/tra. . . number of states/transitions of the MDP (sta equals the sizeof the policies); Formulas (a) to (c). . . optimisation queries; tP . . . PRISM’s processing time

C. Discussion

Relative Safety of a Policy: To simplify game-theoreticreasoning about M, we reduce non-deterministic choice forthe environment (i.e., operator, robot, welder). The moredeterministic such choice, the closer the gap between policyspace ΠM and ASC design space. Any decisions left to theenvironment will make a verified policy π safe relative to π’senvironmental decisions. These decisions form the assumptionof the ASC’s safety guarantee. Occupational health and safetyassumes trained operators not to act maliciously, suggesting“friendly environments” with realistic human errors. To in-crease priority of the ASC, we can express such an assumption,e.g. by minimising risk and maximising pot.

Sensing Assumptions: In our example, the ASC relies onthe detection of an operator (e.g. extremities, body) and arobot (e.g. arm, effector) entering a location, the cell state (e.g.grabber occupied, workbench support filled), and the work-piece location (e.g. in grabber, in support). ForM, we assumethe tracking system (i.e., range finder and light barrier inthe industrial setting, Kinect in the lab replica) to map thelocation of the operator and robot to the areas “at table”, “atworkbench”, “in cell”, and “at welding spot”. In Fig. 1b, therange finder signals “at welding spot” if the closest detectedobject is nearer than the close range, and “in cell” if the closestobject is nearer than the wide range. Tracking extensions, notdiscussed here, could include object silhouettes and minimumdistances, operator intent, or joint velocities and forces.

Sensor Faults: pGCL requires much care with the mod-elling of real-time behaviour, particularly, when actions fromseveral concurrent modules are enabled. To model real-timeASC behaviour, we synchronise operator actions with sensorevents and force the priority of ASC reactions in π? bymaximising the risk reduction potential (cf. pot in Tab. III).While synchronisation restricts global variable use increasingM’s state space, we found it to be the best solution.

Model Debugging and Tool Restrictions: To reduce the statespace, we strongly discretise location. To simplify debugging,we use probabilistic choice in synchronous updates only inone of the participating commands. To support synchronisationwith complex updates, we avoid global variables.

State rewards would allow a natural modelling of, e.g. risk

exposure. In PRISM 4.5, one needs to use action rewardsfor multi-objective queries of MDPs. Risk gradient matriceshelp to overcome a minor restriction in PRISM’s definition ofaction rewards.4 Alternatively, we could have introduced extrastates, however, at the cost of increasing M’s state space,undesirable for synthesis. Rewards require the elimination ofnon-zero end components (i.e., deadlocks or components withcycles that allow infinite paths and, hence, infinite rewardaccumulation). PRISM provides facilities to identify suchcomponents, however, their elimination is non-trivial and labo-rious in large models and can require intricate model revisions.

VII. CONCLUSION

We introduced a tool-supported method for the correct-by-construction synthesis of automatic safety controllers fromMarkov decision process models of human-robot collaborationsettings. These controllers implement regulatory safety goalsfor such settings. We describe steps for streamlining the mod-elling of MDPs. Our method draws support from two tools,YAP for structured risk modelling and MDP generation andPRISM for probabilistic model checking and MDP policy syn-thesis. We show that our approach can be used to incrementallybuild up multi-hazard models including alternative mitigationand resumption options. Hence, our approach improves thestate of the art of ASC synthesis for HRC settings, particularlywhen dealing with multiple risks, mitigation options, andsafety modes. The verification results obtained by using ourmethod can form evidence in an ASC assurance case [29].

Future Work: Our approach limits the inference of high ef-fectiveness of an ASC from high conditional accident freedomof the associated policy. Our setting can require the assessmentof how much the decisions of the ASC and the environmentcontribute to the accident freedom. We plan to explore game-theoretic settings to remove this limitation.

The evaluation of the verified controller in the manufactur-ing cell (e.g. overhead in resource usage, influence on nominaloperation) is out of scope of this paper. Such an evaluationrequires the translation of the controller into an executableform. Our next steps will be the conversion of the synthesised

4Currently, rewards cannot be associated with particular updates, i.e., withincoming transitions rather than only states.

9

DTMC into a program for the digital twin simulator and thereplica of the cell. Note that this translation has to be verifiedto match the executable form with the verified properties.Additionally, we plan to derive tests for this program fromthe facilities provided by the simulator.

For optimal synthesis, the proposed method uses parameterssuch as upper risk and severity bounds in constraints. Weplan to introduce parameters for the probabilities into theMDP, supported by tools such as EVOCHECKER [30], and touse parametric risk gradient matrices by extending YAP. Weintend to explore the use of EVOCHECKER to avoid the splitof the verification procedure into two stages (cf. Secs. V-D2and V-D4). We also like to explore online policy synthesis toallow more variety in environmental decisions (e.g. maliciousoperators). This corresponds to weakening the assumptionsunder which the ASC can guarantee safety.

Unable to collect data (cf. Sec. V-B) from an industrialapplication, we had to make best guesses of probabilities.However, the frequency of undesired intrusion of operatorsinto the safeguarded area and accident likelihood can betransferred into our example. This example can be extendedby randomised control decisions with fixed probabilities (e.g.workload), by adding uncertain action outcomes (e.g. weldingerrors), and by time-dependent randomised choice of mitiga-tion options. To use time in guarded commands, we want toexplore clock-based models as far as synthesis capabilitiesallow this, rather than only using reward structures.

Acknowledgements: This research was funded by the As-suring Autonomy International Programme grant CSI:Cobot.We are grateful for many insights into manufacturing robotcontrol from our project partners at the University of Sheffieldand our industrial collaborator. We also thank David Parker forhis advice in the use of PRISM’s policy synthesis facility.

REFERENCES

[1] P. Nicolaisen, “Occupational safety and industrial robots,” in RobotSafety, Bonney and Yong, Eds. IFS (Publications) Ltd., 1985, pp.33–48.

[2] R. H. Jones, “A study of safety and production problems and safetystrategies associated with industrial robot systems,” Ph.D. dissertation,Imperial College, 1986.

[8] L. Kaiser, A. Schlotzhauer, and M. Brandsttter, “Safety-related risksand opportunities of key design-aspects for industrial human-robotcollaboration,” in LNCS. Springer, 2018, pp. 95–104.

[3] A. D. Santis, B. Siciliano, A. D. Luca, and A. Bicchi, “An atlas ofphysical human–robot interaction,” Mechanism and Machine Theory,vol. 43, no. 3, pp. 253–270, mar 2008.

[4] N. Sugimoto, “Safety engineering on industrial robots and their draftstandards for safety requirements,” in Proceedings of the 7th Interna-tional Symposium on Industrial Robots, 1977, pp. 461–470.

[5] R. Alami, A. Albu-Schaeffer, A. Bicchi, R. Bischoff, R. Chatila, A. D.Luca, A. D. Santis, G. Giralt, J. Guiochet, G. Hirzinger, F. Ingrand,V. Lippiello, R. Mattone, D. Powell, S. Sen, B. Siciliano, G. Tonietti,and L. Villani, “Safe and dependable physical human-robot interaction inanthropic domains: State of the art and challenges,” in 2006 IEEE/RSJInternational Conference on Intelligent Robots and Systems. IEEE,2006.

[6] S. Haddadin, A. Albu-Schaffer, and G. Hirzinger, “Requirements forsafe robots: Measurements, analysis and new insights,” The InternationalJournal of Robotics Research, vol. 28, no. 11-12, pp. 1507–1527, aug2009.

[7] X. V. Wang, Z. Kemeny, J. Vancza, and L. Wang, “Human–robot collab-orative assembly in cyber-physical production: Classification frameworkand implementation,” CIRP Annals, vol. 66, no. 1, pp. 5–8, 2017.

[9] B. Matthias, S. Kock, H. Jerregard, M. Kallman, and I. Lundberg,“Safety of collaborative industrial robots: Certification possibilities fora collaborative assembly robot concept,” in 2011 IEEE InternationalSymposium on Assembly and Manufacturing (ISAM). IEEE, may 2011.

[10] J. A. Marvel, J. Falco, and I. Marstio, “Characterizing task-based human-robot collaboration safety in manufacturing,” IEEE Transactions onSystems, Man, and Cybernetics: Systems, vol. 45, no. 2, pp. 260–275,feb 2015.

[11] A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, “Basic conceptsand taxonomy of dependable and secure computing,” Dependable andSecure Computing, IEEE Transactions on, vol. 1, no. 1, pp. 11–33, 2004.

[12] ISO 10218, “Robots and robotic devices – safety requirements forindustrial robots,” Robotic Industries Association (RIA), Standard,2011. [Online]. Available: https://www.iso.org/standard/51330.html

[13] R. B. Gillespie, J. E. Colgate, and M. A. Peshkin, “A general frameworkfor cobot control,” IEEE Transactions on Robotics and Automation,vol. 17, no. 4, pp. 391–401, 2001.

[14] ISO/TS 15066, “Robots and robotic devices – collaborative robots,”Robotic Industries Association (RIA), Standard, 2016. [Online].Available: https://www.iso.org/standard/62996.html

[15] V. Villani, F. Pini, F. Leali, and C. Secchi, “Survey on human–robotcollaboration in industrial settings: Safety, intuitive interfaces and appli-cations,” Mechatronics, vol. 55, pp. 248–266, nov 2018.

[16] B. Hayes and B. Scassellati, “Challenges in shared-environment human-robot collaboration,” in Proceedings of the Collaborative ManipulationWorkshop at HRI, 2013.

[17] E. Helms, R. D. Schraft, and M. Hagele, “rob@work: Robot assistantin industrial environments,” in Proceedings. 11th IEEE InternationalWorkshop on Robot and Human Interactive Communication. IEEE,2002.

[18] M. Gleirscher, “Run-time risk mitigation in automated vehicles: Amodel for studying preparatory steps,” in 1st iFM Workshop on FormalVerification of Autonomous Vehicles (FVAV), ser. EPTCS, L. Bulwahn,M. Kamali, and S. Linker, Eds., 2017.

[19] ——, YAP – Yet Another Planner: User’s Manual, Technical Universityof Munich and University of York, 2020. [Online]. Available:http://gleirscher.de/dl/yap-manual.pdf

[20] M. Askarpour, D. Mandrioli, M. Rossi, and F. Vicentini, “SAFER-HRC: Safety analysis through formal vERification in human-robotcollaboration,” in LNCS. Springer, 2016, pp. 283–295.

[21] A. Orlandini, M. Suriano, A. Cesta, and A. Finzi, “Controller synthesisfor safety critical planning,” in 2013 IEEE 25th International Conferenceon Tools with Artificial Intelligence. IEEE, nov 2013.

[22] A. Cesta, A. Orlandini, G. Bernardi, and A. Umbrico, “Towards aplanning-based framework for symbiotic human-robot collaboration,” in2016 IEEE 21st International Conference on Emerging Technologiesand Factory Automation (ETFA), sep 2016.

[23] J. Heinzmann and A. Zelinsky, “Quantitative safety guarantees forphysical human-robot interaction,” The International Journal of RoboticsResearch, vol. 22, no. 7-8, pp. 479–504, 2003.

[24] P. Long, C. Chevallereau, D. Chablat, and A. Girin, “An industrialsecurity system for human-robot coexistence,” Industrial Robot: AnInternational Journal, vol. 45, no. 2, pp. 220–226, 2018.

[25] V. Forejt, M. Kwiatkowska, G. Norman, and D. Parker, “Automatedverification techniques for probabilistic systems,” in Formal Methods forEternal Networked Soft. Sys., ser. LNCS, M. Bernardo and V. Issarny,Eds., vol. 6659, 2011, pp. 53–113, tutorial.

[26] C. Baier and J.-P. Katoen, Principles of Model Checking. MIT Press,2008.

[27] N. G. Leveson, Safeware: System Safety and Computers. Addison-Wesley, 1995.

[28] ——, Engineering a Safer World: Systems Thinking Applied to Safety,ser. Engineering Systems. MIT Press, 2012.

[29] M. Gleirscher, S. Foster, and Y. Nemouchi, “Evolution of formalmodel-based assurance cases for autonomous robots,” in 17th Int. Conf.Software Engineering and Formal Methods, ser. LNCS, vol. 11724.Springer, 2019.

[30] S. Gerasimou, R. Calinescu, and G. Tamburrelli, “Synthesis of proba-bilistic models for quality-of-service software engineering,” Automated

Software Engineering, vol. 25, no. 4, pp. 785–831, 2018.

10

https://www.iso.org/standard/51330.html

https://www.iso.org/standard/62996.html

http://gleirscher.de/dl/yap-manual.pdf

Documents

Safety Controller Synthesis for Collaborative Robots · (IRSs) and humans [1], [2] can leverage their complementary skills, but is difﬁcult to achieve because of uncontrolled hazards