
Ok-Kyoon Ha, Guy Martin Tchamgoue, Jeong-Bae Suh, and Yong-Kee Jun Department of Informatics, Gyeongsang National University, Republic of Korea


Contents

• ARINC-653

• ARINC-653 Health Management

• Data Races

• On-the-fly Race Healing Framework

• Race Healing Mechanism

• Development

• Evaluation

• Conclusion


DASC 2010 3

ARINC-653

• The ARINC-653 standard defines an application executive (APEX)

– To provide OS or middleware services for integrated modular avionics (IMA)

• The main objective of ARINC-653 is to provide temporal and spatial partitioning

– To enable applications, each executing in its own partition, to run simultaneously and independently on the same architecture

• Temporal partitioning provides strict time slicing to guarantee that only one application accesses resources at any given time

• Spatial partitioning provides strict memory management by guaranteeing that each partition has exclusive access to its memory area
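
The time slicing described above can be pictured as a fixed, repeating schedule. The following is a minimal sketch, assuming a 100 ms major frame split into three windows; the frame length, the window layout, and the names `window_t`, `schedule`, and `active_partition` are illustrative assumptions and are not taken from the standard or from this work.

```c
#include <assert.h>
#include <stddef.h>

/* One time window inside the repeating major frame (hypothetical layout). */
typedef struct {
    unsigned start_ms;     /* offset from the start of the major frame */
    unsigned duration_ms;  /* length of the window */
    int partition_id;      /* partition that owns the window */
} window_t;

#define MAJOR_FRAME_MS 100u
#define IDLE_PARTITION (-1)

/* Illustrative schedule: three partitions share a 100 ms major frame.
 * The last 10 ms of the frame are left unallocated. */
static const window_t schedule[] = {
    {  0u, 40u, 1 },
    { 40u, 30u, 2 },
    { 70u, 20u, 3 },
};

/* Return the only partition allowed to run at time now_ms,
 * or IDLE_PARTITION if no window covers that instant. */
int active_partition(unsigned long now_ms)
{
    unsigned offset = (unsigned)(now_ms % MAJOR_FRAME_MS);
    size_t n = sizeof schedule / sizeof schedule[0];

    for (size_t i = 0; i < n; i++) {
        const window_t *w = &schedule[i];
        assert(w->start_ms + w->duration_ms <= MAJOR_FRAME_MS);
        if (offset >= w->start_ms && offset < w->start_ms + w->duration_ms)
            return w->partition_id;
    }
    return IDLE_PARTITION;
}
```

Because the windows do not overlap, at most one partition is returned for any instant, which is the property temporal partitioning is meant to guarantee.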


ARINC-653 Health Management (1/2)

• An important feature of ARINC-653 is its health monitor (HM)

– It is responsible for detecting hardware and software failures and for providing recovery mechanisms

– Its objective is to contain and isolate faults before they propagate across the whole system

• The HM manages recovery tables at three levels, indexed by both the error identifier and the system state, for precise error handling

– System HM Table

– Module HM Table

– Partition HM Table
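
A table lookup keyed by system state and error identifier might be sketched as follows. This is a simplified sketch under assumptions: the system table is taken to classify the error level, while the module and partition tables supply the recovery action. That split, and every state, error, and action name below, is hypothetical and chosen only for illustration.

```c
#include <assert.h>

typedef enum { STATE_MODULE_INIT, STATE_PROCESS_EXEC, STATE_COUNT } system_state_t;
typedef enum { ERR_MEMORY_VIOLATION, ERR_APPLICATION, ERR_COUNT } error_id_t;
typedef enum { LEVEL_MODULE, LEVEL_PARTITION, LEVEL_PROCESS } error_level_t;
typedef enum {
    ACT_RESET_MODULE, ACT_RESTART_PARTITION, ACT_CALL_ERROR_HANDLER
} recovery_action_t;

/* System HM table (assumed role): (state, error) -> level that handles it. */
static const error_level_t system_hm[STATE_COUNT][ERR_COUNT] = {
    [STATE_MODULE_INIT]  = { LEVEL_MODULE,    LEVEL_MODULE  },
    [STATE_PROCESS_EXEC] = { LEVEL_PARTITION, LEVEL_PROCESS },
};

/* Module HM table (assumed role): action for module-level errors. */
static const recovery_action_t module_hm[STATE_COUNT][ERR_COUNT] = {
    [STATE_MODULE_INIT]  = { ACT_RESET_MODULE, ACT_RESET_MODULE },
    [STATE_PROCESS_EXEC] = { ACT_RESET_MODULE, ACT_RESET_MODULE },
};

/* Partition HM table (assumed role): action for partition-level errors. */
static const recovery_action_t partition_hm[STATE_COUNT][ERR_COUNT] = {
    [STATE_MODULE_INIT]  = { ACT_RESTART_PARTITION, ACT_RESTART_PARTITION },
    [STATE_PROCESS_EXEC] = { ACT_RESTART_PARTITION, ACT_RESTART_PARTITION },
};

/* Resolve the recovery action for an error raised in a given state. */
recovery_action_t hm_lookup(system_state_t state, error_id_t error)
{
    assert(state < STATE_COUNT && error < ERR_COUNT);
    switch (system_hm[state][error]) {
    case LEVEL_MODULE:    return module_hm[state][error];
    case LEVEL_PARTITION: return partition_hm[state][error];
    default:              return ACT_CALL_ERROR_HANDLER; /* process level */
    }
}
```

Indexing by both keys lets the same error identifier lead to different recoveries depending on what the system was doing when the error occurred.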


ARINC-653 Health Management (2/2)

• For errors at the process level, the HM invokes a user-defined aperiodic error handler

– The error handler should be efficient and execute as fast as possible so that it does not monopolize the system

[Figure: Health monitoring architecture. Labels: RTOS, Configuration (XML), and Health Monitor at the system level; several modules, each with a Module OS, Applications, and its own Health Monitor.]


Data Races (1/2)

• Data races may occur when two concurrent threads access a shared memory location without proper inter-thread coordination, and at least one of the accesses is a write

– Unpredictable and mysterious results caused by data races may be reported to the programmer

• An example of a multithreaded program: consider the "dCount++" instruction

Thread A: // dCount is shared
    Lock(L1);
    Read dCount;
    Add one;
    Write dCount;
    Unlock(L1);

Thread B: // dCount is shared
    Read dCount;
    Add one;
    Write dCount;

[Figure: Expected result. Thread A and Thread B each perform Read then Write on dCount, one thread after the other.]


Data Races (2/2)

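• Under the influence of the scheduler, the program may run into different interleavings and produce unexpected results

• Synchronization errors lead to asymmetric races

– An asymmetric race occurs when one thread accesses a shared variable under a lock while another thread accesses the same variable without that lock

– Symmetric races are usually benign, but asymmetric races are generally harmful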

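[Figure: Satisfactory result. Thread A and Thread B each complete Read then Write without interleaving.]

[Figure: Unexpected results. Two further interleavings in which the Read and Write operations of Thread A and Thread B overlap, including one where both threads Read before either Writes.]

Our race healing is motivated by these harmful races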
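
The effect of the scheduler on dCount can be replayed deterministically. The sketch below is an illustration, not the instrumentation used in this work: it treats each "dCount++" as a Read step followed by a Write step and takes the interleaving as a string of thread names. Thread A's lock is omitted on purpose, because Thread B never acquires L1 and so the lock cannot keep B's accesses out. The function name `run_interleaving` and the schedule encoding are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Replay one interleaving of two threads that each execute dCount++.
 * Each 'A' or 'B' in schedule lets that thread perform its next step:
 * first Read (copy dCount into a private register), then Write
 * (store register + 1). Returns the final value of dCount. */
int run_interleaving(const char *schedule)
{
    int dCount = 0;
    int reg[2] = { 0, 0 };    /* private copy read by each thread */
    int step[2] = { 0, 0 };   /* 0 = Read next, 1 = Write next, 2 = done */

    for (size_t i = 0; schedule[i] != '\0'; i++) {
        assert(schedule[i] == 'A' || schedule[i] == 'B');
        int t = (schedule[i] == 'A') ? 0 : 1;

        if (step[t] == 0) {            /* Read */
            reg[t] = dCount;
            step[t] = 1;
        } else if (step[t] == 1) {     /* Add one; Write */
            dCount = reg[t] + 1;
            step[t] = 2;
        }
    }
    return dCount;
}
```

With this encoding, `run_interleaving("AABB")` gives the satisfactory result of 2, while `run_interleaving("ABAB")` and `run_interleaving("ABBA")` both lose an update and give 1.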


On-the-fly Race Healing Framework (1/2)

• We reinforce the native health monitoring function of ARINC-653 with race detection and healing abilities

• Concept of race healing in ARINC-653

[Figure: Concept of race healing. Race Detection observes the Read and Write accesses of Thread A and Thread B and notifies the Health Monitor of the ARINC 653 Partition OS. The Health Monitor invokes Race Healing, which heals the threads through Add/Remove Lock and Value Checking.]


On-the-fly Race Healing Framework (2/2)

[Figure: Framework architecture. Components: Instrumented Program, On-the-fly Race Detection Engine, On-the-fly Race Healing Engine, Log, and, inside ARINC 653, the Partition OS, the Health Monitor, and the Native Error Handler. Arrows are numbered (1) to (5), with (1) labeled Monitoring.]

(1) The instrumented program is monitored by the on-the-fly race detector

(2) Once a data race is detected, the HM is notified

(3) The race healer is invoked as the error handler by the OS of the partition concerned

(4) The race healer accesses the racing code and tries to heal the data race

(5) If the healer fails, a notification is sent back to the HM, which might launch an emergency recovery function
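
Steps (1) to (5) amount to a small control flow, which the following sketch walks through. It is a simplified model under assumptions: detection and healing are reduced to callbacks that report success or failure, and the HM is reduced to a notification counter. The names `framework_t`, `handle_access`, `stub_yes`, and `stub_no` are hypothetical and do not come from the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    OUTCOME_NO_RACE,
    OUTCOME_HEALED,
    OUTCOME_EMERGENCY_RECOVERY
} outcome_t;

typedef struct {
    bool (*detect_race)(void);  /* (1) monitoring: true if a race is found */
    bool (*heal_race)(void);    /* (3)-(4) healer: true if healing succeeds */
    int hm_notifications;       /* how many times the HM was notified */
} framework_t;

/* Trivial stand-ins for a real detector or healer. */
bool stub_yes(void) { return true; }
bool stub_no(void)  { return false; }

/* Process one monitored access and report what the framework did. */
outcome_t handle_access(framework_t *fw)
{
    assert(fw != NULL && fw->detect_race != NULL && fw->heal_race != NULL);

    if (!fw->detect_race())          /* (1) no race: nothing to do */
        return OUTCOME_NO_RACE;

    fw->hm_notifications++;          /* (2) the detector notifies the HM */

    if (fw->heal_race())             /* (3)-(4) healer runs as error handler */
        return OUTCOME_HEALED;

    fw->hm_notifications++;          /* (5) failure is reported back to the HM */
    return OUTCOME_EMERGENCY_RECOVERY;
}
```

Keeping the failure path explicit mirrors step (5): the healer is a first attempt, and the HM remains the authority that decides on emergency recovery.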


Race Detection Engine

• For on-the-fly race detection, our framework uses the protocol presented by Dinning and Schonberg (1991)

– This protocol guarantees the detection of at least one race for each shared variable, if any exists

– The protocol defines the structure and the maintenance policy of an access history that takes locking into account

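[Figure: Example execution with threads TM, TA, and TB. TA performs R1 and W2, and TB performs R3 and W4. Legend: Read, Write, CS-Read, CS-Write. The access history holds R1, W2, R3, and W4, and the reported races are W2-R3, R1-W4, and W2-W4.]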
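
An access history that takes locks into account can be approximated with a few lines of C. The sketch below is a deliberately simplified stand-in, not the Dinning and Schonberg protocol itself: it assumes all threads are concurrent, keeps every access instead of pruning the history, and represents the held locks as a bitmask. The names `access_t`, `history_t`, and `record_access` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACCESSES 16

typedef struct {
    int thread;       /* id of the accessing thread */
    bool is_write;    /* true for a write, false for a read */
    unsigned locks;   /* bitmask of locks held during the access */
} access_t;

/* Access history of one shared variable. */
typedef struct {
    access_t entries[MAX_ACCESSES];
    size_t count;
} history_t;

/* Two accesses race if they come from different threads, at least one
 * is a write, and they hold no lock in common. */
static bool conflicts(const access_t *a, const access_t *b)
{
    return a->thread != b->thread
        && (a->is_write || b->is_write)
        && (a->locks & b->locks) == 0u;
}

/* Record an access and return how many earlier accesses it races with. */
int record_access(history_t *h, int thread, bool is_write, unsigned locks)
{
    access_t cur = { thread, is_write, locks };
    int races = 0;

    assert(h->count < MAX_ACCESSES);
    for (size_t i = 0; i < h->count; i++)
        if (conflicts(&h->entries[i], &cur))
            races++;
    h->entries[h->count++] = cur;
    return races;
}
```

Feeding it a locked read and write from one thread followed by an unlocked read and write from another yields three races in total, matching the pairs W2-R3, R1-W4, and W2-W4 listed for the example above.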


Race Healing Engine

• To heal asymmetric races, our technique inserts a lock into the unsynchronized or incompletely synchronized thread to remove or change the harmful interleaving

[Figure: Race healing example in three panels labeled Race Detection, Healing, and Race Detection, each showing the Read and Write accesses of Thread A and Thread B.]
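
The effect of inserting the missing lock can be illustrated by replaying interleavings of the "dCount++" example with both threads holding the same lock around their Read and Write. This is a minimal model, not the healing engine: the lock is a single owner variable, a thread scheduled while the other holds the lock simply loses its turn, and unfinished threads are completed once the schedule runs out. The name `run_healed` and the schedule encoding are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Replay an interleaving of two threads executing dCount++ after healing,
 * i.e. with both threads taking lock L1 before Read and releasing it
 * after Write. schedule is a string over {'A','B'}. */
int run_healed(const char *schedule)
{
    int dCount = 0;
    int reg[2] = { 0, 0 };
    int step[2] = { 0, 0 };   /* 0 = Read next, 1 = Write next, 2 = done */
    int owner = -1;           /* thread holding L1, or -1 if free */

    for (size_t i = 0; schedule[i] != '\0'; i++) {
        assert(schedule[i] == 'A' || schedule[i] == 'B');
        int t = (schedule[i] == 'A') ? 0 : 1;

        if (step[t] == 2)
            continue;                     /* thread already finished */
        if (owner != -1 && owner != t)
            continue;                     /* blocked on L1: turn is lost */
        if (step[t] == 0) {               /* Lock(L1); Read */
            owner = t;
            reg[t] = dCount;
            step[t] = 1;
        } else {                          /* Write; Unlock(L1) */
            dCount = reg[t] + 1;
            step[t] = 2;
            owner = -1;
        }
    }

    /* Let unfinished threads complete: the lock holder first, then any
     * thread that has not started, each as one uninterrupted Read+Write. */
    if (owner != -1) {
        dCount = reg[owner] + 1;
        step[owner] = 2;
    }
    for (int t = 0; t < 2; t++)
        if (step[t] == 0) {
            dCount = dCount + 1;
            step[t] = 2;
        }
    return dCount;
}
```

Under this model every schedule ends with dCount equal to 2, because the inserted lock turns each Read and Write pair into one indivisible step.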


Development

• Environment

– Single Board Computer (SBC) with two dual-core Intel Xeon CPUs and 4 GB of memory

– RT-Linux operating system

– GNU C compiler 4.3 for OpenMP

– Simulated integrated modular avionics (SIMA) was installed to provide ARINC-653 services

• The race detector and the race healer are both implemented as dynamic libraries in C

– The healing function is registered in each monitored program as its error handler

– Upon race detection, the race detector notifies the SIMA HM using the RAISE_APPLICATION_ERROR system call


Evaluation

• The efficiency of our framework was evaluated by analyzing the overhead of the race healing functions

– The overhead comes from the actions of the label generator, the race detector, and the race healer

• The results show that our technique slows down the original program execution by a factor of about 2 on average

– A set of synthetic programs that contain only asymmetric races was developed using OpenMP directives


Conclusion

• Race Healing Framework

– This paper presents a framework that can be embedded in the ARINC-653 health monitor to detect and heal data races on-the-fly

– It helps ensure that the flight software runs safely

• Experimentation and Result

– The framework was implemented on the simulated integrated modular avionics (SIMA), which provides ARINC-653 services

– The experimental results show that our framework slows down the original program execution by a factor of about 2 on average

– The overhead introduced by our framework is manageable for a large class of soft real-time programs

• We will extend the healing functionality to handle more general race patterns
