Ok-Kyoon Ha, Guy Martin Tchamgoue, Jeong-Bae Suh, and Yong-Kee Jun
Department of Informatics, Gyeongsang National University, Republic of Korea
Contents
• ARINC-653
• ARINC-653 Health Management
• Data Races
• On-the-fly Race Healing Framework
• Race Healing Mechanism
• Development
• Evaluation
• Conclusion
DASC 2010 3
ARINC-653
• The ARINC-653 standard defines an application executive (APEX)
– To provide OS or middleware services for integrated modular avionics (IMA)
• The main objective of ARINC-653 is to provide temporal and spatial partitioning
– To enable applications, each executing in its own partition, to run simultaneously and independently on the same architecture
• Temporal partitioning provides strict time slicing to guarantee that only one application accesses the resources at any given time
• Spatial partitioning provides strict memory management by guaranteeing that each partition exclusively accesses its own memory area
ARINC-653 Health Management (1/2)
• An important feature of ARINC-653 is its health monitor (HM)
– It is responsible for detecting hardware and software failures and for providing recovery mechanisms for them
– It aims to contain and isolate faults before they propagate across the whole system
• The HM manages recovery tables at three levels, each indexed by both the error identifier and the system state, for precise error handling
– System HM Table
– Module HM Table
– Partition HM Table
ARINC-653 Health Management (2/2)
• For process-level errors, the HM invokes a user-defined aperiodic error handler
– The error handler should be efficient and execute as fast as possible so as not to monopolize the system
[Figure: HM architecture. The RTOS with its Health Monitor and XML configuration; several Module OS instances, each hosting applications and its own Health Monitor]
Data Races (1/2)
• A data race occurs when two concurrent threads access a shared memory location without proper inter-thread coordination and at least one of the accesses is a write
– As a result, the programmer may observe unpredictable and mysterious results
• An example of a multithreaded program
Thread A:
  // dCount is shared
  Lock(L1);
  Read dCount;
  Add one;
  Write dCount;
  Unlock(L1);
Thread B:
  // dCount is shared
  Read dCount;
  Add one;
  Write dCount;
[Figure: Expected result. Thread A's Read and Write complete before Thread B's Read and Write]
Let’s consider the “dCount++” instruction, which executes as Read dCount, Add one, Write dCount
Data Races (2/2)
• Depending on the scheduler, the program may follow different interleavings and produce unexpected results
• Synchronization errors lead to asymmetric races
– Symmetric races are usually benign, but asymmetric races are generally harmful
[Figure: Satisfactory result: Thread A's Read and Write complete before Thread B's Read and Write. Unexpected results: two interleavings in which the Read and Write of one thread are interleaved with the accesses of the other thread]
Our race healing is motivated by these harmful races
On-the-fly Race Healing Framework (1/2)
• We reinforce the native health monitoring function of ARINC-653 with race detection and healing abilities
• Concept of race healing in ARINC-653
[Figure: concept of race healing. Race detection on Thread A and Thread B notifies the Health Monitor of the partition OS on ARINC 653; the Health Monitor invokes race healing, which heals the threads' Read/Write accesses by adding/removing a lock and by value checking]
On-the-fly Race Healing Framework (2/2)
[Figure: framework architecture. The instrumented program, the on-the-fly race detection engine, a log, the on-the-fly race healing engine, and the partition OS with its health monitor and native error handler, all on ARINC 653; arrows (1) to (5) correspond to the steps below]
(1) The instrumented program is monitored by the on-the-fly race detector
(2) Once a data race is detected, the HM is notified
(3) The concerned partition OS invokes the race healer as its error handler
(4) The race healer accesses the racing code and tries to heal the data race
(5) If the healer fails, a notification is sent back to the HM, which may launch an emergency recovery function
Race Detection Engine
• For on-the-fly race detection, our framework uses the protocol presented by Dinning and Schonberg (1991)
– This protocol guarantees to detect at least one race for each shared variable, if any exists
– The protocol defines the structure of an access history and the policy for maintaining it in the presence of locks
[Figure: access history example with a main thread TM and threads TA and TB performing reads R1 and R3 and writes W2 and W4 (legend: Read, Write, CS-Read, CS-Write); reported races: W2-R3, R1-W4, W2-W4]
Race Healing Engine
• To heal asymmetric races, our technique inserts a lock into the thread that is unsynchronized or incompletely synchronized, removing or changing the harmful interleaving
[Figure: Read/Write accesses of Thread A and Thread B at race detection, after healing, and when race detection is applied again]
Development
• Environment
– Single Board Computer (SBC) with two Intel Xeon dual-core CPUs and 4 GB of memory
– RT-Linux operating system
– GNU C compiler 4.3 with OpenMP support
– Simulated integrated modular avionics (SIMA) was installed to provide ARINC-653 services
• The race detector and the race healer are both implemented in C as dynamic libraries
– The healing function is registered in each monitored program as its error handler
– Upon race detection, the race detector notifies the SIMA HM using the RAISE_APPLICATION_ERROR system call
Evaluation
• The efficiency of our framework was evaluated by analyzing the overhead of the race healing functions
– The overhead comes from the actions of the label generator, the race detector, and the race healer
• The results show that our technique slows down the original program execution by about 2 times on average
– A set of synthetic programs containing only asymmetric races was developed using OpenMP directives
Conclusion
• Race Healing Framework
– This paper presents a framework that can be embedded in the ARINC-653 health monitor to detect and heal data races on-the-fly
– It helps ensure that flight software runs safely
• Experimentation and Results
– The framework was implemented on the simulated integrated modular avionics (SIMA), which provides ARINC-653 services
– The experimental results show that our framework slows down the original program execution by about 2 times on average
– The overhead introduced by our framework is manageable for a large class of soft real-time programs
• We will extend the healing functionality to handle more general race patterns