

DAL: Programming Efficient and Fault-Tolerant Applications for Many-Core Systems

Iuliana Bacivarov 1, Ikbel Belaid 2, Andrea Biagioni 3, Ashraf El Antably 2, Nicolas Fournel 2, Ottorino Frezza 3, Jovana Jovic 4, Rainer Leupers 4, Francesca Lo Cicero 3, Alessandro Lonardo 3, Luis Murillo 4, Pier Stanislao Paolucci 3, Devendra Rai 1, Davide Rossetti 3, Frédéric Rousseau 2, Lars Schor 1, Christoph Schumacher 4, Francesco Simula 3, Lothar Thiele 1, Laura Tosoratto 3, Piero Vicini 3, Hoeseok Yang 1

References

• L. Schor, I. Bacivarov, H. Yang, L. Thiele: “Worst-Case Temperature Guarantees for Real-Time Applications on Multi-Core Systems”, Proc. Real-Time and Embedded Technology and Applications Symposium (RTAS), Apr. 2012.

• K. Huang, W. Haid, I. Bacivarov, M. Keller, L. Thiele: “Embedding Formal Performance Analysis into the Design Cycle of MPSoCs for Real-time Multimedia Applications”, ACM Transactions on Embedded Computing Systems, 2012.

• A. Chagoya-Garzon, N. Poste, F. Rousseau: “Semi-Automation of Configuration Files Generation for Heterogeneous Multi-Tile Systems”, Proc. Computer Software and Application Conference (COMPSAC), July 2011.

• I. Bacivarov, W. Haid, K. Huang, L. Thiele: “Methods and Tools for Mapping Process Networks onto Multi-Processor Systems-On-Chip”, Handbook of Signal Processing Systems, Springer, pages 1007-1040, Oct. 2010.

• C. Schumacher, R. Leupers, D. Petras and A. Hoffmann: “parSC: Synchronous Parallel SystemC Simulation on Multi-Core Host Architectures”, Proc. Int’l Conf. on Hardware/Software Codesign and System Synthesis (CODES/ISSS), Oct. 2010.

• R. Ammendola, A. Biagioni, O. Frezza, F. Lo Cicero, A. Lonardo, P.S. Paolucci, D. Rossetti, A. Salamon, G. Salina, F. Simula, L. Tosoratto, P. Vicini: “apeNET+: High Bandwidth 3D Torus Direct Network for PetaFLOPS Scale Commodity Clusters”, Proc. Int’l Conf. on Computing in High Energy and Nuclear Physics (CHEP), Oct. 2010.

• Stefan Bleuler, Marco Laumanns, Lothar Thiele, Eckart Zitzler: “PISA - A Platform and Programming Language Independent Interface for Search Algorithms”, Evolutionary Multi-Criterion Optimization (EMO), Apr. 2003.

Vision
• Effective Many-Tile System-Level Programming Environment
• Fault-Tolerance at System-Level

Application Specification
• Hierarchical model of computation
• FSM controls multiple concurrent Kahn process networks
• Events causing scenario transition
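A minimal sketch of the scenario idea, assuming each FSM state (scenario) is simply the set of process networks that run concurrently in it. The scenario contents, the application names, and the transition table are invented for illustration; the poster does not give them, and this is not the DAL specification format.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioFSM:
    """FSM whose states are scenarios (sets of concurrently running process networks)."""
    current: str
    # (scenario, event) -> next scenario; all entries used below are hypothetical
    transitions: dict = field(default_factory=dict)

    def fire(self, event: str) -> str:
        """Apply an event; events with no transition from the current scenario are ignored."""
        self.current = self.transitions.get((self.current, event), self.current)
        return self.current


# Hypothetical scenarios: each lists the process networks active in that scenario.
SCENARIOS = {
    "scenario1": {"app_a"},
    "scenario2": {"app_a", "app_b"},
    "scenario3": {"app_b"},
}

# Hypothetical transition table (the event names follow the e1..e9 labels of the poster figure).
TRANSITIONS = {
    ("scenario1", "e1"): "scenario2",
    ("scenario2", "e3"): "scenario3",
    ("scenario3", "e7"): "scenario1",
}


def apps_to_change(old: str, new: str):
    """Return (to_stop, to_start): process networks to stop and to start on a transition."""
    return SCENARIOS[old] - SCENARIOS[new], SCENARIOS[new] - SCENARIOS[old]


fsm = ScenarioFSM(current="scenario1", transitions=TRANSITIONS)
previous = fsm.current
to_stop, to_start = apps_to_change(previous, fsm.fire("e1"))
print(fsm.current, to_stop, to_start)
```

Only the difference between two scenarios has to be started or stopped on a transition; process networks present in both keep running.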

Contact
1 Computer Engineering and Networks Laboratory, ETH Zurich, Switzerland, email: [email protected]
2 System Level Synthesis Group, Laboratoire TIMA, France, email: [email protected]
3 INFN Roma, Italy, email: [email protected]
4 Institute for Comm. Technologies and Embedded Systems, RWTH-Aachen University, Germany, email: [email protected]

EURETILE: http://euretile.roma1.infn.it

DAL: http://www.tik.ee.ethz.ch/~euretile

Platform Specification
• Scalable many-tile platform
• Hierarchical communication infrastructure:
    • First level (cortical columns): instruction-level parallelism, intra-process parallelism
    • Second level (cortical areas): process-network parallelism
    • Third level (neo-cortex): concurrent applications

Run-Time Environment
• Hierarchically centralized system of controllers:
    • Many-core architecture divided into several clusters
    • Each cluster has a single (local) cluster controller
    • Cluster controllers under the control of a main controller
• Cluster controller:
    • Receives events from running applications
    • Produces commands to the distributed system that lead to pausing/stopping/starting of tasks and queues
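The controller hierarchy can be sketched roughly as follows. This is an illustration under stated assumptions, not the DAL run-time API: the class names, the policy dictionaries, and the rule that a cluster controller escalates events it has no local rule for are all hypothetical.

```python
from enum import Enum


class Command(Enum):
    START = "start"
    STOP = "stop"
    PAUSE = "pause"
    RESUME = "resume"


class ClusterController:
    """Local controller of one cluster: turns application events into commands."""

    def __init__(self, name, policy):
        self.name = name
        self.policy = policy  # hypothetical: event -> list of (Command, application)
        self.issued = []      # commands sent to the tiles of this cluster

    def handle(self, event):
        """Issue the commands for a locally known event; return False if it is unknown."""
        commands = self.policy.get(event)
        if commands is None:
            return False
        self.issued.extend(commands)
        return True


class MainController:
    """Top-level controller supervising all cluster controllers."""

    def __init__(self, clusters, global_policy):
        self.clusters = {c.name: c for c in clusters}
        # hypothetical: event -> {cluster name: list of (Command, application)}
        self.global_policy = global_policy

    def dispatch(self, cluster_name, event):
        """Try the local cluster controller first; otherwise apply the global policy (assumed rule)."""
        if self.clusters[cluster_name].handle(event):
            return
        for target, commands in self.global_policy.get(event, {}).items():
            self.clusters[target].issued.extend(commands)


c0 = ClusterController("c0", {"overheat": [(Command.PAUSE, "app_a")]})
c1 = ClusterController("c1", {})
main = MainController(
    [c0, c1],
    {"tile_fault": {"c0": [(Command.STOP, "app_b")], "c1": [(Command.START, "app_b")]}},
)
main.dispatch("c0", "overheat")    # handled inside cluster c0
main.dispatch("c0", "tile_fault")  # escalated: app_b is stopped on c0 and started on c1
print(c0.issued, c1.issued)
```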

Hardware-dependent Software (HdS)
• Applications and run-time environment independent of the target platform
• HdS: software stack that abstracts the hardware from the application code:
    • Operating system (OS)
    • Communication primitives (middleware)
    • Hardware abstraction layer (HAL)
• Multiprocessor targets:
    • Intel x86-based HPC platform
    • RISC (IRISC-based) embedded platform

Virtual EURETILE Platform (VEP)
• Virtual platform scalable to many simulated tiles/cores
• Non-intrusive controllability and visibility, supporting debugging and programming better than on real hardware
• Fault injection and many-tile concurrency debug frameworks
• Two simulation modes: fast host-compiled and accurate ISS-based

EURETILE HPC platform
• Based on QUonG (Quantum chromodynamics ON GPU):
    • PC mesh based on Intel multi-core CPUs accelerated with high-end GPU and interconnected via 3D torus network
    • Communicating with custom interconnect (APEnet+: DNPs on FPGA-based PCI Express card)
• Software-programmable accelerators in the form of ASIPs

[Figure: scenario FSM. States: Scenario 1, Scenario 2, Scenario 3. Transition labels: e1, e2; e3; e4; e5, e6; e7; e8, e9.]

Design Space Exploration
• Each “application scenario” is linked to one set of mappings
• Mapping optimization using PISA and EXPO
• MOEA (multi-objective evolutionary algorithm) module to compute the Pareto front of optimal mappings
• Performance analysis using MPA (modular performance analysis) framework to provide real-time behavior and temperature guarantees while optimizing average power consumption, data throughput, and latency
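To illustrate what a Pareto front of mappings means, here is a small sketch that filters out dominated candidates by brute force. It is not the PISA/EXPO evolutionary search; the mapping names and the (power, latency) figures are invented, and both objectives are assumed to be minimized (a throughput objective would be negated first).

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: objectives
        for name, objectives in candidates.items()
        if not any(dominates(other, objectives) for other in candidates.values())
    }


# Hypothetical candidate mappings: (average power in W, latency in ms), both minimized.
candidates = {
    "mapping1": (2.0, 10.0),
    "mapping2": (3.0, 6.0),
    "mapping3": (3.5, 7.0),  # worse than mapping2 in both objectives
    "mapping4": (5.0, 4.0),
}

print(sorted(pareto_front(candidates)))  # ['mapping1', 'mapping2', 'mapping4']
```

Every mapping left on the front is a different trade-off; none can be improved in one objective without getting worse in the other.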

[Figure: design space exploration. Labels: EXPO, mapping generation and variation, candidate mapping, modular performance analysis (MPA), Analysis, MOEA, evolutionary algorithm, Pareto optimal solutions, Mapping 1, Mapping 2, Mapping 3.]

[Figure: Controller Mechanism. Labels: Scen. 1, Scen. 2, Scen. 3 with transitions e1 to e9, events, dynamic selection of mapping, commands for state transactions, virtual mapping, physical mapping, fault-tolerance (dynamic remapping).]

[Figure: many-tile hardware. Labels: DNP, MEM, ASIP, DSP, RISC, “cortical columns”, “cortical areas”, “neo-cortex”, high-temperature fault.]

[Figure: vision overview. Labels: dynamic applications, dynamic mapping, many-tile hardware. Parallelism: instruction-level, task-level, application-level. Model of Computation (MoCs): sequential; parallel; dynamic, concurrent, scalable.]

Conservative Analysis
• Providing reliability guarantees on mappings at design time (e.g., maximum temperature)

Over-Provisioning
• For non-critical applications
• Empty (disengaged) tiles and clusters
• In case of a fault: restart application on an empty tile or cluster

Task Duplication
• For safety-critical applications
• Distinction between pure computation processes and I/O processes
• Duplication of computation processes
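The strategy of keeping empty tiles in reserve and restarting a faulted application on one of them can be sketched as below. The `TileAllocator` class, the tile names, and the first-free-spare policy are assumptions made for illustration, not part of DAL.

```python
class TileAllocator:
    """Tracks which tile runs which application and keeps unused tiles as spares."""

    def __init__(self, tiles, mapping):
        self.mapping = dict(mapping)  # application -> tile
        used = set(self.mapping.values())
        self.spares = [t for t in tiles if t not in used]  # empty (disengaged) tiles
        self.faulty = set()

    def on_fault(self, tile):
        """Mark a tile as faulty and restart its applications on spare tiles.

        Returns a list of (application, new_tile) pairs.
        Raises RuntimeError when no spare tile is left for an affected application.
        """
        self.faulty.add(tile)
        if tile in self.spares:
            self.spares.remove(tile)  # a faulty spare can no longer be used
        moved = []
        for app, current in list(self.mapping.items()):
            if current != tile:
                continue
            if not self.spares:
                raise RuntimeError(f"no spare tile left to restart {app}")
            new_tile = self.spares.pop(0)  # assumed policy: take the first free spare
            self.mapping[app] = new_tile
            moved.append((app, new_tile))
        return moved


alloc = TileAllocator(["t0", "t1", "t2", "t3"], {"app_a": "t0", "app_b": "t1"})
print(alloc.on_fault("t0"))  # [('app_a', 't2')]
```

A restart of this kind loses the application state, which is why it suits non-critical applications only.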

[Figure: controller hierarchy. Labels: system-level performance analysis, model calibration, fault, main controller, cluster controller.]

Supported commands:
• Stop application
• Start application
• Pause application
• Resume application

System Structure

[Figure: software stack. Labels: application, HdS API, HdS, DNA-OS, HAL API, HAL, APEnet+.]

DNA-OS
• Component-based operating system
• Provides high-level mechanisms such as:
    • Threads (based on POSIX.1-2001 API standard)
    • Semaphores
    • Dynamic memory (malloc and free)
    • Input/output management
• Low memory footprint
• Minimal impact on overall performance

[Figure: Virtual EURETILE Platform. Labels: Tile (COM, Mem., Periph., Bus, DNP), Target ISS IRISC or Host-Compiled HAL, AED, Link Probe, four tiles each with a DNP, Many-tile Debug Framework, Fault Injection Framework, VEP-EX, deadlock, lost packet…, Link down, CPU fault…]

[Figure: Scenario 1 mapping example. Labels: P1, P2, P3, C1, C2, RISC, DSP, BUS, a1, a3, a11, a12, a22, a23, b risc, b bus, b dsp.]