Shield Synthesis for AI

Bettina Könighofer, Roderick Bloem, Ufuk Topcu, Scott Niekum, Mohammed Alshiekh, Suda Bharadwaj, Rüdiger Ehlers, Nils Jansen, Sebastian Junges, Rayna Dimitrova, Thomas Henzinger, Guy Avni, Krishnendu Chatterjee

www.iaik.tugraz.at

Shield Synthesis for AI - TUMkretinsk/LiVE_2019_Shields.pdf · 2019-05-03 · Bettina Könighofer Shield Synthesis for AI 18 Safety Shields for Multi-Agent Systems Task: Enforce global



LiVe @ ETAPS, Prague, April 6, 2019

Reinforcement Learning

Reactive Synthesis

Your Controller

Synthesis

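Your Specification

$\mathbf{G}(\lnot\mathit{blocked} \rightarrow \mathbf{F}\,R) \land \mathbf{G}(\lnot\mathit{blocked} \rightarrow \mathbf{F}\,S) \land \mathbf{G}(\mathit{blocked} \rightarrow \mathbf{X}(C\ \mathbf{U}\ \lnot\mathit{blocked}))$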

Infinitely often, visit R and S. If S is blocked, go to C. Resume visiting R and S once S is unblocked.

via Game Solving


Reinforcement Learning

Reactive Synthesis

Your Controller
  • Large
  • Complicated
  • Highly optimized
  • Many sensors
  • …

Your Specification
  • Large
  • Hard to write
  • Greyscale


Reinforcement Learning

Reinforcement Learning

Reactive Synthesis

[Diagram: Environment and Learning Agent, connected by state, action, and reward]


Reinforcement Learning

Reactive Synthesis

Correctness Guarantees

Optimality

How?


Reinforcement Learning

Reactive Synthesis

Correctness Guarantees

Optimality

Your Controller
  • Large
  • Complicated
  • Highly optimized
  • Many sensors
  • …

Your Specification
  • Large
  • Hard to write
  • Greyscale

Shielding


Reinforcement Learning

Reactive Synthesis

Your Controller
  • Large
  • Complicated
  • Highly optimized
  • Many sensors
  • …

Critical Spec
  • Critical aspects only
  • Small & sweet

Correctness Guarantees

Optimality

Shielding


Preemptive Shielding

[Diagram: Environment, Learning Agent, Shield]

Reinforcement Learning

Reactive Synthesis

Minimal Interference
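
The slide names preemptive shielding without spelling out the mechanism. In the AAAI 2018 paper cited in this talk, a preemptive shield hands the agent the set of safe actions before the agent chooses. A minimal sketch of that idea follows, assuming a Q-value table per action. The names `preemptive_choice`, `toy_safe_actions`, and the one-dimensional toy world are hypothetical and only serve as illustration.

```python
from typing import Callable, Dict, Hashable, Set

State = Hashable
Action = str

ALL_ACTIONS: Set[Action] = {"left", "right", "stay"}


def toy_safe_actions(state: State) -> Set[Action]:
    """Hypothetical safety rule: cell 0 is unsafe, so "left" is forbidden in cell 1."""
    if state == 1:
        return ALL_ACTIONS - {"left"}
    return set(ALL_ACTIONS)


def preemptive_choice(
    state: State,
    q_values: Dict[Action, float],
    safe_actions: Callable[[State], Set[Action]],
) -> Action:
    """Pick the agent's best action among those the shield allows."""
    allowed = safe_actions(state)
    if not allowed:
        raise ValueError("shield returned no safe action for this state")
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(allowed), key=lambda a: q_values.get(a, float("-inf")))


q = {"left": 5.0, "right": 1.0, "stay": 0.5}
print(preemptive_choice(1, q, toy_safe_actions))  # "left" is filtered out here
print(preemptive_choice(3, q, toy_safe_actions))  # no restriction applies here
```

In state 3 the shield allows every action, so the agent's own preference is used unchanged, which is the "minimal interference" property named on the slide.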


Post-Posed Shielding

[Diagram: Environment, Learning Agent, Shield]

Policy update:
  • For safe_action: use reward
  • For action, if action ≠ safe_action, either:
    1. Assign a punishment to action
    2. Assign reward to action

Shield can be added in execution phase
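
The post-posed setup can be sketched as follows: the shield inspects the agent's chosen action and replaces it only when it is unsafe, and the policy update then follows one of the two options listed on the slide. This is an illustrative sketch, not the authors' implementation. The function names, the `fallback` callback, and the punishment value are assumptions.

```python
from typing import Callable, Hashable, List, Set, Tuple

State = Hashable
Action = str


def post_posed_step(
    state: State,
    action: Action,
    safe_actions: Callable[[State], Set[Action]],
    fallback: Callable[[State, Set[Action]], Action],
) -> Action:
    """Return the action that is executed: the agent's own if safe, else a safe one."""
    allowed = safe_actions(state)
    return action if action in allowed else fallback(state, allowed)


def policy_updates(
    action: Action,
    safe_action: Action,
    reward: float,
    punishment: float,
    punish_unsafe: bool,
) -> List[Tuple[Action, float]]:
    """List the (action, value) pairs the learner should be updated with.

    The executed safe_action always receives the observed reward. If the shield
    overrode the agent, the original action receives either a punishment
    (option 1) or the same reward (option 2).
    """
    updates = [(safe_action, reward)]
    if action != safe_action:
        updates.append((action, punishment if punish_unsafe else reward))
    return updates


def allowed_in_toy_world(state: State) -> Set[Action]:
    # Hypothetical rule: "left" is unsafe in every state of this toy example.
    return {"right", "stay"}


def first_allowed(state: State, allowed: Set[Action]) -> Action:
    # Hypothetical fallback: take the alphabetically first safe action.
    return sorted(allowed)[0]


executed = post_posed_step(1, "left", allowed_in_toy_world, first_allowed)
print(executed)
print(policy_updates("left", executed, reward=1.0, punishment=-10.0, punish_unsafe=True))
```

Because the shield only wraps the chosen action, the same `post_posed_step` can be placed around an already trained agent at execution time.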


A Shield for PAC-MAN

M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, U. Topcu: Safe Reinforcement Learning via Shielding. AAAI 2018


Outline

  • Safety Shields (AAAI-18)
  • Optimal Shields (submission)
  • Safety Shields for Multi-Agent Systems (ACC-19)
  • Probabilistic Safety Shields (arXiv)


Optimal Shields

Problems of learned controllers (Safety problems)

1. Difficult to add new features
2. Poor performance on un-trained behavior
3. No local fairness

Solution: Optimal Shield


Shields for Traffic Light Controllers

Learned Controller: “minimize total waiting time”

1. Difficult to add new features
   Priority to public transport, changes due to an accident
2. Poor performance on un-trained behavior
   Uniform traffic congestion meets rush-hour traffic
3. No local fairness
   Farm road never gets green


Optimal Shields Synthesis

[Diagram: Environment, Learned Controller, opt-Shield’]

Lightweight shields

Two cost functions
  • $c_{BEH}$: cost for behavior
  • $c_{INT}$: cost for interference

Mean-Payoff Game with 2 Objectives → Mean-Payoff Game

$\lambda \cdot c_{BEH} + (1 - \lambda) \cdot c_{INT}$

$\lambda$: tradeoff between objective of controller vs. shield
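
To make the role of λ concrete, the sketch below evaluates the weighted combination of the behavior cost and the interference cost on a finite trace. It only illustrates the objective: a mean-payoff value is defined as a long-run average over an infinite play, and the shield itself is obtained by solving the game, which this sketch does not do. The function name and the sample numbers are hypothetical.

```python
from typing import Sequence


def weighted_mean_cost(
    c_beh: Sequence[float],
    c_int: Sequence[float],
    lam: float,
) -> float:
    """Average of lam * c_BEH + (1 - lam) * c_INT over a finite trace."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(c_beh) != len(c_int) or len(c_beh) == 0:
        raise ValueError("cost sequences must be non-empty and of equal length")
    total = sum(lam * b + (1.0 - lam) * i for b, i in zip(c_beh, c_int))
    return total / len(c_beh)


# Hypothetical trace: behavior cost per step and whether the shield interfered.
behavior_cost = [4.0, 2.0, 6.0, 4.0]
interference_cost = [0.0, 1.0, 0.0, 1.0]

for lam in (0.0, 0.5, 1.0):
    print(lam, weighted_mean_cost(behavior_cost, interference_cost, lam))
```

With λ = 1 only the behavior cost counts, and with λ = 0 only the interference cost counts, so λ moves the shield between following the controller and optimizing the behavior itself.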


Dealing with rush-hour traffic

Controller
  • Deep Convolutional Q-Network
  • 16-dimensional input vector: number of approaching cars, waiting time
  • 4 layers (16, 604, 604, 4 nodes)
  • Q-learning: $\alpha = 0.001$, $\gamma = 0.95$
  • “Minimize waiting time of two junctions”

Shield
  • $c_{BEH}$: size of maximal queue
  • $c_{INT}$: 1 for interference, 0 otherwise

Abstract state: (1, 8, 1, 2)
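
For readers unfamiliar with the two Q-learning parameters, the following sketch shows where the learning rate α and the discount factor γ enter a single update. It uses a tabular Q-function as a simplified stand-in, whereas the controller on the slide is a deep Q-network. The action names, the reward, and the successor state are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, Sequence

ALPHA = 0.001  # learning rate, as given on the slide
GAMMA = 0.95   # discount factor, as given on the slide

QTable = DefaultDict[Hashable, Dict[str, float]]


def q_update(
    q: QTable,
    state: Hashable,
    action: str,
    reward: float,
    next_state: Hashable,
    actions: Sequence[str],
    alpha: float = ALPHA,
    gamma: float = GAMMA,
) -> float:
    """One tabular Q-learning step: Q += alpha * (r + gamma * max Q' - Q)."""
    best_next = max(q[next_state].get(a, 0.0) for a in actions)
    old = q[state].get(action, 0.0)
    new = old + alpha * (reward + gamma * best_next - old)
    q[state][action] = new
    return new


# Hypothetical example: four actions (the network has four output nodes).
actions = ["a0", "a1", "a2", "a3"]
q: QTable = defaultdict(dict)
q[(1, 7, 1, 2)]["a1"] = 2.0  # assumed value of the best action in the successor
print(q_update(q, (1, 8, 1, 2), "a0", reward=-8.0, next_state=(1, 7, 1, 2), actions=actions))
```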


Safety Shields for Multi-Agent Systems

Task: Enforce global safety property

1. Quantitative interference costs
   Counting cost function
   Different costs for interferences with different agents
2. Fair Shielding
   Do not always interfere with the same agent repeatedly
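
The two ideas above, per-agent interference costs and fairness, can be illustrated with a small greedy sketch: whenever the shield has to override one of several agents, it picks the agent whose accumulated interference cost would stay lowest. This is a heuristic stand-in and not the synthesis procedure of the cited ACC-19 paper. The class name, the agent names, and the cost values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


class FairInterferenceChooser:
    """Greedy choice of which agent to override, based on counted costs."""

    def __init__(self, cost_per_agent: Dict[str, float]) -> None:
        self.cost_per_agent = dict(cost_per_agent)
        self.counts: Counter = Counter()  # interferences so far, per agent

    def accumulated_cost(self, agent: str) -> float:
        return self.counts[agent] * self.cost_per_agent[agent]

    def choose(self, candidates: Iterable[str]) -> str:
        """Pick the candidate with the lowest cost after one more interference."""
        pool = sorted(candidates)
        if not pool:
            raise ValueError("no agent can be overridden to restore safety")
        chosen = min(
            pool,
            key=lambda a: (self.accumulated_cost(a) + self.cost_per_agent[a], a),
        )
        self.counts[chosen] += 1
        return chosen


chooser = FairInterferenceChooser({"robot_a": 1.0, "robot_b": 1.0})
print([chooser.choose(["robot_a", "robot_b"]) for _ in range(4)])
```

With equal costs the interferences alternate between the agents. With unequal costs the cheaper agent is overridden more often, but the expensive one is still chosen once the cheaper agent's accumulated cost exceeds it.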


Case Study: Warehouse

S. Bharadwaj, R. Bloem, R. Dimitrova, B. Könighofer, and U. Topcu: Synthesis of Minimum-Cost Shields for Multi-agent Systems. ACC-19


Shielding original Pacman?

State space is huge!

Not realizable!


Learning the Adversary Model

  • Each ghost has its individual behaviour
  • Observe it, model the behaviour
  • Data augmentation techniques
  • Is PAC-MAN north, south, east, or west?
  • Results in MDP of environment
  • Guaranteed safety w.r.t. probabilistic temporal logic spec


Scalability

MDP is huge!

  • Finite Horizon
    Safety for finite number of steps
    Infinite horizon may cause large errors anyway
  • Piecewise Construction
    Compute shield for each state independently
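
The two ideas above can be sketched on a toy MDP: for a single state, compute for every action the probability of reaching an unsafe state within a fixed number of steps, and let the shield allow only actions whose risk stays below a bound. The MDP, the state names, and the absolute threshold are assumptions made for illustration. The cited work may define the admissible actions differently.

```python
from functools import lru_cache
from typing import Dict, List, Set, Tuple

# Hypothetical toy MDP: state -> action -> list of (probability, next_state).
MDP: Dict[str, Dict[str, List[Tuple[float, str]]]] = {
    "start": {
        "left": [(0.9, "safe"), (0.1, "crash")],
        "right": [(0.5, "corridor"), (0.5, "crash")],
    },
    "corridor": {"go": [(1.0, "safe")]},
    "safe": {"stay": [(1.0, "safe")]},
    "crash": {"stay": [(1.0, "crash")]},
}
UNSAFE = frozenset({"crash"})


@lru_cache(maxsize=None)
def state_risk(state: str, horizon: int) -> float:
    """Minimal probability of reaching UNSAFE within `horizon` steps."""
    if state in UNSAFE:
        return 1.0
    if horizon == 0:
        return 0.0
    return min(action_risk(state, a, horizon) for a in MDP[state])


def action_risk(state: str, action: str, horizon: int) -> float:
    """Risk of taking `action` now (horizon >= 1), then acting as safely as possible."""
    return sum(p * state_risk(nxt, horizon - 1) for p, nxt in MDP[state][action])


def shield_for_state(state: str, horizon: int, threshold: float) -> Set[str]:
    """Allowed actions for one state, computed independently of other states."""
    risks = {a: action_risk(state, a, horizon) for a in MDP[state]}
    allowed = {a for a, r in risks.items() if r <= threshold}
    # If nothing meets the bound, fall back to the least risky action.
    return allowed or {min(risks, key=risks.get)}


print(action_risk("start", "left", 2), action_risk("start", "right", 2))
print(shield_for_state("start", horizon=2, threshold=0.2))
```

Because `shield_for_state` only looks at the states reachable within the horizon, it can be called lazily for the states the agent actually visits instead of building one shield for the whole MDP.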


Scalability (continued)

  • Independent Agents
    Crashing probabilities for different agents are stochastically independent
    Compute individually, compose shields
  • Abstractions
    Adversaries may be far away
    Neglect adversary positions that are not relevant
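
A short sketch of both ideas: under the independence assumption stated above, the probability of crashing with at least one adversary is 1 minus the product of the individual no-crash probabilities, and adversaries beyond some distance can be dropped before the computation. The Manhattan-distance cutoff, the radius, and all names and numbers are hypothetical illustrations of "not relevant".

```python
from math import prod
from typing import Dict, Tuple

Pos = Tuple[int, int]


def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def relevant_adversaries(agent: Pos, adversaries: Dict[str, Pos], radius: int) -> Dict[str, Pos]:
    """Abstraction: ignore adversaries farther away than `radius` (assumed cutoff)."""
    return {name: pos for name, pos in adversaries.items() if manhattan(agent, pos) <= radius}


def combined_crash_probability(per_adversary: Dict[str, float]) -> float:
    """Compose individually computed crash probabilities, assuming independence."""
    return 1.0 - prod(1.0 - p for p in per_adversary.values())


ghosts = {"g1": (1, 1), "g2": (9, 9)}
print(relevant_adversaries((0, 0), ghosts, radius=3))
print(combined_crash_probability({"g1": 0.1, "g2": 0.2}))
```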


Probabilistic Safety Shield for Pacman

N. Jansen, B. Könighofer, S. Junges, and R. Bloem: Shielded Decision-Making in MDPs, arXiv


Future Work

  • Safety Shields
  • Optimal Shields
  • Safety Shields for Multi-Agent Systems
  • Probabilistic Safety Shields

  • Performance in autonomous systems
  • Shields for CPS, dealing with wrong models
  • Partially observable MDPs
  • Distributed shield synthesis
