Advanced Laboratory on Embedded Systems S.r.l.
A Research and Innovation Company
Alberto Ferrari – Research Scientist
7/9/2012 Alberto Ferrari - ALES S.r.l.
Outline
Application landscape and complexity
Mapping applications to execution platforms
- Satisfying safety and real-time requirements
Response Time Analysis
- WCE vs. test-based
Performance prediction is really hard …
What are the alternatives?
- Predictable architectures
- Probabilistic real-time analysis
- Statistical Model Checking
Conclusions
UTC TODAY
Business areas: aerospace systems, power solutions, building systems
Business units: Otis, Pratt & Whitney, Carrier, Sikorsky, UTC Fire & Security, Hamilton Sundstrand, UTC Power
2009 Revenue - $53 billion
OTIS Elevators
1. EN: GeN2-Cx
2. ANSI: Gen2/GEM
3. JIS: GeN2-JIS
Remote monitoring and control
Inert Gas Fire Suppression
Embedded systems modeling for fast verification, function reuse & reduced risks
Modeling enables analysis, decision support and design of robust control for fire suppression systems
Outcome: better and safer fire suppression. Implementation with minimal change in hardware
Modeling capability enables analysis and design of better fire suppression systems
[Figure: space to be protected and components of the fire suppression system: control panel (control logic, communications), sensors, gas cylinder, gas discharge nozzle, vent]
[Figure: model outputs: temperature distribution, O2 concentration distribution, overpressure time trace. Inflow determined by mass flow, temperature and gas concentration; ODE for cylinder/nozzle density and temperature]
[Plot: pressure vs. time (sec), 0 to 70, without overpressure control and with overpressure control, compared against the safe pressure limit]
Energy, Comfort, Security Needs in Buildings are Evolving
UTC presence in buildings creates opportunities and research challenges
Carbon-neutral buildings by 2030
Buildings must be 4X-5X more energy efficient
Threats becoming more complex
98% false alarms
Customer-focused solutions
Enabled by integrated systems
AEROSPACE SYSTEM EVOLUTION
MORE ELECTRIC AIRCRAFT
Design and Verification Processes
[Figure: V diagram of the design and verification (V & V) process. System Design: system requirements, system partitioning, network requirements specification, network selection and configuration, network and sub-system specification. Sub-System Design: partitioning, component specification, component implementation. Verification and integration: component verification, sub-system verification, sub-system integration, network integration, system validation.]
Cyber Physical Systems – Abstracting Time
[Figure: electrical domain (timed) and control domain (untimed/DT) on top of an untimed execution platform]
Infinite resources, never failing…
Introducing Non Ideal Execution Platform…
[Figure: electrical domain (timed) and control domain (untimed/DT) on top of a timed execution platform made of RTOS nodes]
… adding time … adding failure modes
- Impact on control function
- HW/SW performance
- Identify bottlenecks
- Buffer size
- Bandwidth capacity
- End-to-end delay
- Reconfigurations
Performance constraints (latency, safety)
Real-time and Safety Analysis
Mapping functional network to the platform:
- Allocation: static vs dynamic
- Scheduling: static vs dynamic
Design and verification:
- Safety constraints (design)
  • Robustness to faults ($10^{-7}$ to $10^{-11}$)
    – Automotive: ISO26262
    – Avionics: ARP4761, DO178C
    – Building Automation: IEC61508, ISO22201, EN 81-1/A1
- Real-time constraints (verify)
  • Latency
  • Response time
  • Resource usage
Assuming that the correct behavior is implemented …
[Figure: functional description and platform description joined by the function/platform mapping (synthesis), with platform abstraction and verification of functional and non-functional requirements]
Addressing Safety: Industrial practice
Highly process oriented, governed by standards (ISO26262, ARP4761)
Mainly based on manual flows/analysis
Safety requirements processed from the initial phases of the design
Verification based on tests and simulation methods
Model-based methods recently accepted by standards (DO178C)
Fault Tolerant Scheduling design flow
[Figure: design flow with the steps Allocate, Schedule, Mapping, Optimize and Refine. The functional model (AbstractInput, Input, FineCTRL, CoarseCTRL, ArbiterBest, Output, AbstractOut, Iterator, sensors and actuators, Plant, FaultBehavior) is mapped onto ECU0, ECU1 and ECU2 connected by channels CH0 and CH1.]
Courtesy of Claudio Pinello
Analysis vs Measurements
Experiment/Simulation is not exhaustive:
- Partial sampling yields overly optimistic behavior
Analysis is complex, almost impractical
- Simplified analysis yields overly pessimistic results
Architectural elements with dynamic (non-linear) behavior
- State-based information
[Figure: distribution of times along a time axis, marking lower bound, best case, best measurements, worst measurements, worst case and upper bound, with best-case predictability and worst-case predictability annotated]
Response Time Analysis
Use an abstract model to compute the communication time and the execution time of software
- Best case, everything goes right: no cache miss, needed resources free, no conflict to access the media
- Worst case, everything goes wrong: e.g. empty cache, needed resources busy, media busy with other communications
Timing accident: event causing additional delay
Timing penalty: associated delay
Unfortunately, modern execution platforms are very difficult to abstract in a way that keeps the problem solvable while still achieving good prediction results
- Timing accidents are complex and dynamic
- Timing penalties have a big range
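The relation between timing accidents, timing penalties and the best/worst-case bounds can be sketched in a few lines. This is only an illustration under strong assumptions: penalties are taken as constant and additive, and the accident names, cycle counts and occurrence limits are hypothetical, not measurements of any real platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingAccident:
    name: str
    penalty_cycles: int    # timing penalty for one occurrence (assumed constant)
    max_occurrences: int   # assumed upper bound on occurrences along the path


def execution_time_bounds(base_cycles, accidents):
    """Return (best, worst) under a purely additive penalty model.

    Best case: no timing accident happens at all.
    Worst case: every accident happens as often as its bound allows.
    """
    best = base_cycles
    worst = base_cycles + sum(a.penalty_cycles * a.max_occurrences for a in accidents)
    return best, worst


# Hypothetical numbers, chosen only to show how the range opens up.
accidents = [
    TimingAccident("cache miss", penalty_cycles=40, max_occurrences=50),
    TimingAccident("bus contention", penalty_cycles=15, max_occurrences=20),
    TimingAccident("pipeline stall", penalty_cycles=3, max_occurrences=100),
]
best, worst = execution_time_bounds(1000, accidents)
print(f"best={best} cycles, worst={worst} cycles, ratio={worst / best:.1f}")
```

Real platforms break the additive assumption because accidents interact and depend on state, which is one reason the abstraction is hard to get both solvable and accurate.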
Predictability of modern architectures…
Experiment/Simulation is not exhaustive
- Risk of missing worse cases
Analysis based on simplified models yields overly pessimistic results (x10-x100)
- Design solution is not cost effective
[Figure: distribution of times along a time axis, marking lower bound, best case, best measurements, worst measurements, worst case and upper bound, with best-case and worst-case predictability annotated and a x10-x100 label]
Looking for alternative approaches
Predictable architectures
- Remove/reduce sources of indetermination, e.g.
  • Time-triggered architecture: still need WCE to fit executions into slots
- Enable formal analysis
- Not always cost efficient
- Long-term solution
Probabilistic analysis
- Accept sources of indetermination of current architectures
- “Gray-box” analysis (partial understanding of the architectural components)
- Statistical description of the overall system
- e.g. probabilistic real-time (hard deadlines missed with a probability of the order of $10^{-6}$ to $10^{-12}$)
Statistical Model Checking
- Prove with given confidence the satisfaction of a timing property
- Bound the number of simulation runs
Predictable Architectures
Communication:
- Time-triggered architecture
  • The time uncertainty is reduced to the computation in a time slot
Computation:
- Architecture with reduced timing accidents
  • ISA with timing extension
  • Time interleaved (lately used on peripheral cores)
  • Program-managed temporary memory
    – Caches, buffers, RAM and related controllers are fully under software control
  • Predictable DRAM controllers (Predator, AMC)
  • Example: PRET: Berkeley Predictable Architecture
Adding temporal semantics to the computational model
- PTIDES: Programming Temporally Integrated Distributed Embedded Systems
  • Uses actor-oriented design
  • Based on Discrete-Event (DE) model of computation
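In a time-triggered architecture the remaining question is whether each computation fits its slot, which still requires a worst-case execution bound per task. A minimal sketch of that check follows; the task names, slot lengths and bounds are hypothetical, and the schedule is reduced to a flat list of (task, slot length) pairs.

```python
def slot_overruns(schedule, wcet_bound):
    """Return {task: excess} for every task whose bound exceeds its slot.

    schedule:   list of (task, slot_length) pairs, one entry per slot
    wcet_bound: dict mapping task -> worst-case execution bound
    Both use the same time unit (microseconds in the example below).
    """
    overruns = {}
    for task, slot_length in schedule:
        excess = wcet_bound[task] - slot_length
        if excess > 0:
            overruns[task] = excess
    return overruns


# Hypothetical static schedule and bounds, in microseconds.
schedule = [("sense", 200), ("control", 500), ("actuate", 150)]
wcet_bound = {"sense": 180, "control": 620, "actuate": 150}

overruns = slot_overruns(schedule, wcet_bound)
print(overruns)  # {'control': 120}
```

A pessimistic bound makes this check fail even when the task would fit in practice, so the quality of the bound directly drives slot sizing and cost.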
Probabilistic Real-Time
Probabilistic Hard Real-Time Systems
- Deadlines must be met, but a deadline miss can be accepted if it is sufficiently unlikely (probabilities of the order of $10^{-6}$ to $10^{-12}$)
- It is just another failure event of the system
Two approaches:
- Gray-box and Statistical Analysis:
  • “…the effects are understood in principle, but can not be captured analytically without very pessimistic simplifications and a cycle true simulation of all possible program paths with all potential input data combinations is inhibited by the complexity of the problem.”
  • Probabilistic description of the system behavior (WCET)
- Randomized Architectures
  • The execution platform behaves randomly!
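Treating a deadline miss as one more failure event suggests comparing it with other failure rates. The sketch below converts a miss probability into the probability of at least one miss per hour. It rests on assumptions that are not in the slides: the miss probability is taken per activation, activations are independent, and the 10 ms period is hypothetical.

```python
import math


def prob_at_least_one_miss(p_miss, activations):
    """Probability of one or more misses in `activations` independent runs.

    Computes 1 - (1 - p_miss)**activations via log1p/expm1 so that very
    small probabilities are not lost to floating-point rounding.
    """
    if not 0.0 <= p_miss < 1.0:
        raise ValueError("p_miss must lie in [0, 1)")
    return -math.expm1(activations * math.log1p(-p_miss))


period_s = 0.01                                  # hypothetical task period: 10 ms
activations_per_hour = round(3600 / period_s)    # 360000 activations

for p_miss in (1e-6, 1e-9, 1e-12):
    per_hour = prob_at_least_one_miss(p_miss, activations_per_hour)
    print(f"p_miss={p_miss:.0e} -> P(at least one miss in an hour)={per_hour:.3e}")
```

Under these assumptions the per-hour figure can then be budgeted alongside the other failure events of the system.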
Probabilistic Real-Time Analysis
Approach of Bernat et al. 2002, “WCET Analysis of Probabilistic Hard Real-Time Systems”
Derive the Execution Time Profile (ETP) associated with smaller units of a program
- By measurement (on real processors) or by simulation
- By analytical methods
Statistical combination of ETPs (composition rules and calculation procedure)
- For independent ETPs: assuming independence between the execution of two units of the program
- For dependent ETPs: assuming that it is possible to calculate a Joint Execution Profile (JEP)
- For ETPs with unknown dependency: assuming a worst-case JEP
Problems/Doubts:
- How to accurately calculate ETP?
- How to accurately calculate JEP?
  • Dramatically more complex (the experiment set grows quadratically)
[YCS-2003-353.pdf “pWCET: a Tool for Probabilistic Worst-Case Execution Time Analysis of Real-Time Systems”]
[RTSS02_probabilistic_hard.pdf “WCET Analysis of Probabilistic Hard Real-Time Systems”]
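For the independent case, combining two ETPs amounts to a discrete convolution of their distributions. The sketch below illustrates only that case and is not the pWCET tool's implementation; the profile values are hypothetical, and an ETP is represented as a dict from execution time to probability.

```python
from collections import defaultdict


def convolve_etp(etp_a, etp_b):
    """Combine two independent ETPs for units executed in sequence.

    Each ETP maps an execution time to its probability. Independence lets
    the joint probability be the product of the two marginals.
    """
    combined = defaultdict(float)
    for time_a, prob_a in etp_a.items():
        for time_b, prob_b in etp_b.items():
            combined[time_a + time_b] += prob_a * prob_b
    return dict(combined)


def exceedance_probability(etp, budget):
    """Probability that the execution time is strictly above `budget`."""
    return sum(prob for time, prob in etp.items() if time > budget)


# Hypothetical profiles of two program units (time in cycles).
etp_a = {10: 0.9, 20: 0.1}
etp_b = {5: 0.8, 15: 0.2}

combined = convolve_etp(etp_a, etp_b)
print(combined)
print(exceedance_probability(combined, budget=30))
```

With these inputs the combined profile has support {15, 25, 35}. When the units are not independent this product rule no longer holds, which is where the JEP and its measurement cost come in.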
Randomized HW/SW Architectures
Move towards truly randomized architectures
- Opposite direction of predictable (deterministic) HW/SW architectures
- Break any dependency on code execution history (typical in caches, but not only there!)
- Probabilistic analysis suffers from dependencies (i.e. it is much simpler to rely on statistical independence)
Example with caches:
- Randomized replacement policy (select the replacement victim at random)
- Traditional replacement (LRU) depends on execution history
  • Probabilistic execution time analysis suffers from systemic cache misses (probabilistic analysis relies on statistical independence)
Problems/Doubts
- How to achieve true randomization?
- Is this still cost effective?
- Could a reduced-complexity architecture achieve better mean performance and B/WCET bounds?
[ecrts09.pdf “Using Randomized Caches in Probabilistic Real-Time Systems”]
[ACM-probabilistic_rt.pdf “PROARTIS: Probabilistically Analysable Real-Time Systems”]
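The history dependence of LRU can be seen on a toy trace. The sketch below assumes a hypothetical fully associative cache with 4 ways and a loop that cycles over 5 blocks; it is not a model of any real cache. On this trace LRU always evicts the block that is needed next, so it misses on every access, while random replacement breaks the pattern.

```python
import random
from collections import OrderedDict


def misses_lru(trace, ways):
    """Count misses for a fully associative cache with LRU replacement."""
    cache = OrderedDict()              # keys ordered from least to most recent
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)   # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= ways:
                cache.popitem(last=False)   # evict the least recently used
            cache[block] = True
    return misses


def misses_random(trace, ways, rng):
    """Count misses when the replacement victim is chosen at random."""
    cache = []
    misses = 0
    for block in trace:
        if block not in cache:
            misses += 1
            if len(cache) >= ways:
                cache[rng.randrange(ways)] = block   # overwrite a random way
            else:
                cache.append(block)
    return misses


trace = [i % 5 for i in range(5000)]   # loop over 5 blocks, one more than fits
lru = misses_lru(trace, ways=4)
rnd = misses_random(trace, ways=4, rng=random.Random(7))
print(f"LRU misses: {lru}, random-replacement misses: {rnd}")
```

The random policy is not better in general; the point is only that its misses do not follow a systematic pattern tied to execution history.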
Statistical model checking
Application scenario
Executable model of a system with random components
Property Φ verified or falsified by the model in a known and finite time
Statistical Model Checking (SMC)
Estimation $\hat{p}$ of the probability $p$ of satisfaction of property Φ
Statistical method -> estimation characterized by confidence $(1-\delta)$ and precision (error bound) $\varepsilon$:
$\Pr(|\hat{p} - p| < \varepsilon) > 1 - \delta$
- Chernoff-Hoeffding bound to determine the number of independent samples:
$m > \frac{4}{\varepsilon^2} \ln\frac{2}{\delta}$
Application
Performance of a CAN network
- Random components: clock drift, tolerance and aging of electrical components
Property Φ: transmission of a packet without errors
- $\hat{p}$ is the estimated probability of correct transmission ($1 - \hat{p} = PER$, the packet error rate)
- 1 transmitted packet == 1 statistical sample
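The SMC procedure can be sketched as two steps: derive the number of runs from the bound on this slide, then estimate the satisfaction probability by simulation. The `packet_ok` function below is a hypothetical stand-in, not a CAN model: it only samples a Gaussian clock drift and compares it with an assumed tolerance. The values of epsilon, delta and the tolerance are illustrative and not taken from the slides.

```python
import math
import random


def required_samples(epsilon, delta):
    """Smallest integer m with m > (4 / epsilon**2) * ln(2 / delta)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.floor(4.0 / epsilon**2 * math.log(2.0 / delta)) + 1


def packet_ok(rng, drift_sigma_ppm=150.0, tolerance_ppm=400.0):
    """Hypothetical property check: one simulated packet transmission.

    Stand-in for a real executable model. A packet counts as correct when
    the sampled clock drift stays within an assumed tolerance.
    """
    return abs(rng.gauss(0.0, drift_sigma_ppm)) <= tolerance_ppm


def estimate_probability(sample_property, runs, rng):
    """Fraction of independent runs in which the property holds."""
    successes = sum(1 for _ in range(runs) if sample_property(rng))
    return successes / runs


epsilon, delta = 0.05, 0.05            # illustrative precision and risk
runs = required_samples(epsilon, delta)
rng = random.Random(2012)              # fixed seed for repeatability
p_hat = estimate_probability(packet_ok, runs, rng)
print(f"runs={runs}, p_hat={p_hat:.4f}, PER={1.0 - p_hat:.4f}")
```

In this bound the number of runs grows with $1/\varepsilon^2$ and with $\ln(1/\delta)$, so halving $\varepsilon$ roughly quadruples the simulation effort.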
Statistical Model Checking Performance in critical scenarios
[Figure: simulated model with clusters of nodes]
Scenario: 32 nodes, distance between the clusters 4 m, distance between nodes [0.4, 2] m, clock drift of CAN controller ~N(0, 150 ppm)
SMC: 50000 samples → confidence interval: 0.01, error probability: 0.015
Theoretical limit: 1 Mbit/s → 40 m (analytic model)
[Plot: packet error probability estimation. Packet error rate [%], from 0.00% to 3.00%, for each node number from 1 to 31]
Not negligible packet error rate at the theoretical length!
Analysis of critical scenarios not captured by the analytic model
Reference: “Message priority inversion on a CAN bus”, Texas Instruments Incorporated
Statistical Model Checking Pros & Cons
Pros
- Scales reasonably well with the system size (simulation based)
Cons
- Scales poorly with the confidence level
  • 99% -> 50K runs!
Conclusions
Several challenges in the design and verification of cyber-physical systems
- Safety constraints are becoming stronger and their satisfaction requires substantial manual design and verification effort
  • There are a few approaches to design under these constraints, but they are still not mature enough for industrial deployment
- Real-time performance analysis for modern architectures is hard
  • Different approaches have been proposed, but none is really ready for prime time
- A combined approach for safety and real-time design/verification is still far from becoming real
Multi-processor distributed architectures are becoming a reality also in embedded systems
- Increase WCET bounds (apparently)
- Solutions to cost-optimize these architectures are not yet available on an industrial scale
Acknowledgments
Christian Nastasi
Alessandro Ulisse
Massimiliano D’Angelo