FP7-612069-HARPA Project HARPA Management of mixed criticality and reliability at run-time: the HARPA approach Thematic Session on Challenges in Mixed Criticality and Real-time and Reliability in Networked Complex Embedded Systems Barcelona, May 15, 2014 HiPEAC CSW Prof. William Fornaciari [email protected] home.deib.polimi.it/fornacia


FP7-612069-HARPA Project

HARPA

Management of mixed criticality and reliability at run-time:

the HARPA approach

Thematic Session on Challenges in Mixed Criticality and Real-time and Reliability in

Networked Complex Embedded Systems

Barcelona, May 15, 2014

HiPEAC CSW

Prof. William Fornaciari

[email protected]

home.deib.polimi.it/fornacia

Outline

• The project in a nutshell
  ─ Objectives
  ─ Outputs
  ─ Exploitation
• Organization of the activities and WP mapping
• Mixed criticality support and run-time management
• Some experimental results
• How to get information, contact us

Introduction

The Challenge: dependable performance
• Critical for embedded applications: timing correctness
• Paramount for HPC: load balancing and fast execution
• To be considered with other figures of merit (mixed criticality)

The Vision: a synergic approach
• Exploit synergies in the ES or the HPC domains
• Merging concepts, assessing key applications

The Goal: HARnessing Performance vAriability
• Cost-effectively confronting variations in next-generation ES/HPC systems
• Dependable performance, slack identification, timing

Introduction: consortium overview

No. | Name | Country | Business activity / Expertise | Main role in project
1 | POLIMI | IT | University | Coordinator of the project. Development of the HARPA OS engine. WP1 and WP7 leader.
2 | IMEC | BE | Research and Technology | Providing link to advanced process technology reliability modeling. WP4 leader.
3 | ICCS | GR | University | Development of the HARPA run-time engine. WP2 leader. Dissemination activities (WP6) leader.
4 | UCY | CY | University | R&D on run-time Monitors, Knobs and Network-on-Chip. WP3 leader.
5 | IT4I | CZ | Research and Technology | Application of HARPA environment for HPC simulations. WP6 leader.
6 | THALES | FR | Industrial | Providing an industrial high-end embedded application which will serve as a use case for the HARPA runtime evaluation. WP5 leader.
7 | HEN | IT | SME | Providing an industrial application for low-end embedded systems which will serve as a use case for the HARPA runtime evaluation.

Introduction

SO1 - Shaving margins
• Adopt Razor-like concepts into different aspects of a system that are typically over-provisioned for the worst case
• Worst-Case Execution Time (WCET) for time predictability is an example of such over-provisioning in the embedded systems domain
• Over-provisioning also characterizes current design practices in the on-chip interconnect of HPC-oriented multi-core CPUs

SO2 - A more predictable system with real-time guarantees
• The different monitors, knobs, and the HARPA engine will make it possible to study the correlation between the different elements of the system

SO3 - Implementation of effective platform monitors/knobs
• The implemented monitors and knobs should be lightweight and should have no or negligible impact on the chip
• Cross-layer approach, whereby monitors and knobs throughout the system stack facilitate a comprehensive control strategy

Introduction

O1 - Performance-dependable multi-core architectures for ES and HPC
• Augment existing multi-core designs to guarantee performance dependability
• Proactive and reactive techniques derived from ES and HPC

O2 - Monitors/knobs in hardware designs
• Monitors will allow the identification of the main sources of performance unpredictability
• Knobs will allow the control of applications execution, providing dependable performance

O3 - Monitors/knobs in software designs
• Track the resources that lead to at least 90% of the unpredictability
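One possible reading of the O3 target (tracking the resources that account for at least 90% of the unpredictability) is a coverage selection over per-resource contributions. The sketch below is only an illustration under that assumption: the function name, the resource names and the contribution figures are hypothetical and do not come from the HARPA project.

```python
def dominant_sources(contrib, coverage=0.90):
    """Return the fewest sources whose summed contribution reaches `coverage`.

    `contrib` maps a (hypothetical) resource name to its measured share of
    performance unpredictability, in arbitrary but consistent units.
    """
    total = sum(contrib.values())
    picked, acc = [], 0.0
    # Visit sources from the largest contribution to the smallest.
    for name, value in sorted(contrib.items(), key=lambda kv: kv[1], reverse=True):
        if acc >= coverage * total:
            break
        picked.append(name)
        acc += value
    return picked


# Illustrative figures only.
shares = {"cache": 50, "noc": 30, "dram": 15, "os": 5}
print(dominant_sources(shares))
```

With these made-up shares, monitors would be placed on the first three resources and the remaining one would be left untracked.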

Introduction

O4 - System sw designs that support high performance dependability
• Provide high commitment in the SLAs in conjunction with the run-time systems

O5 - Run-time designs that support high performance dependability
• Develop run-time engine designs to provide high performance dependability guarantees

O6 - Methodologies for conflicting metrics
• Develop optimization methodologies at hardware level, exploiting models to keep HARPA-OS as architecture-independent as possible
• These methodologies follow high-level directives provided by the HARPA-OS level to trade off different metrics

Introduction

O7 - Develop sw/hw interfaces to provide fluent communication flow
• Develop interfaces between the different computing stack layers that allow each layer to obtain information in a reduced timeframe

O8 - New application guidelines to improve performance dependability
• Develop guidelines that will help improve the performance dependability guarantees (target 25% improvement)

O9 - Validate the results with industrial case studies
• Evaluation of the techniques proposed in the project will be performed on industrial applications provided by the partners THALES, Henesis, and IT4I

Introduction

ITO1 - System Architectural Design Principles
• Define a set of hardware and software design guidelines allowing heterogeneous multicore systems to provide dependable performance guarantees
• Performance guarantees facilitated by the HARPA engine through the use of monitors and knobs orchestrated by appropriate control policies
• The monitors and knobs operate on pertinent non-functional objectives, such as power, energy, timing, wear-out, etc.
• The proposed solution should be low-cost and should be applicable to both embedded systems and high-performance general-purpose environments

Introduction

ITO2 - Dependable Performance Guarantees
• Provide the implementation of the HARPA engine. The HARPA engine is the main outcome of this project
• Develop sufficiently generic software that can easily adapt to different types of hardware depending on the available monitors and knobs in the system
• At the end of this project, this outcome will be directly exploitable, once appropriately adapted to the existing hardware

Introduction

ITO3 - Demonstrators
• Develop case studies with applications representing different scenarios from both the embedded systems world and the HPC world
• These applications will validate the efficacy and efficiency of the various techniques and mechanisms derived from (and cross-fertilized with) both computing paradigms
• The HARPA project will test the HARPA engine on platforms representative of embedded systems, and on a full-system evaluation environment simulating typical HPC setups
• The idea is to explore the capabilities of the HARPA engine with the monitors and knobs available in existing and future heterogeneous multi-core architectures

Concept Vehicles

• HPC: Floreon+ (IT4I)
  ─ Environmental risk modelling and simulation, risk management
• High-end ES: Spectrum sensing (THALES)
  ─ Explore the frequency spectrum to perform radio frequency allocation
• Low-end ES: Beesper (HENESIS)
  ─ Monitoring landslides based on WSN and cameras
• Cross-domain: video processing (POLIMI)
  ─ Example: people identification/searching
  ─ HPC: massively process multiple cameras/images
  ─ Embedded: power-constrained processing

Technology scaling: Challenges

• > 20 nm: as transistors aged…
• < 20 nm: as transistors will age…
• Time-dependent phenomena become prevalent -> dynamics of applications matters
• Not to mention variability related to mixed workloads and data dependency on top of that: …but it is not a fault, it is a feature!

[Figure: transistor ageing data; axis label: Accelerated test (few years)]

Sticky Note (William Fornaciari): part of the overall data, in particular those below 20 nm, have been removed since they are proprietary (IMEC).

Filling the gap

Domain | State-of-the-art | Novelties to be introduced through HARPA
PV Modelling | Averaging out models with a single signal probability value for the entire system lifetime | Use of atomistic models accounting for time- and workload-dependent variability [Grasser12b]
PV Modelling | Highly accurate but CPU-intensive TCAD reliability models [Rodopoulos11] | Reconciliation of CPU-intensity with accuracy in the context of reliability simulations
Knobs & Monitors | Specific metric targeted in isolation | Holistic approach that combines and integrates multiple non-functional requirements
Knobs & Monitors | Non-functional metrics not exposed to end user | User and system establish performance dependability agreement, which is upheld throughout system lifetime
PV Mitigation | Built-in self-test and job scheduling to enhance MTTF [Feng08] | Detailed knob and monitor placement to enable runtime performance dependability
PV Mitigation | Time-zero variability compensation at the testing phase [Pineda12a, Pineda12b] | Time- and workload-dependent variability mitigation at runtime throughout system lifetime

In a nutshell…

Layers, in the order shown on the slide:
• User Requirements: Quality & Quality Cost
• HARPA Operating System: ~1 s responsiveness
• HARPA Run Time Engine: ~1 ms responsiveness
• Monitors and Knobs: Cross-Layer Placement
• Hardware System: EC or HPC Platform

Examples of Monitors/Knobs
• Timing violation / DVFS
• Bit-flip / ECC
• Power Consumption / DVFS
• Performance / Scheduling
• QoS / Resource allocation

WP-level organization

Proactively and reactively control
• Guarantee performance of applications on heterogeneous architecture
• Establish Service-Level Agreements (SLA) with the system
• Periodically monitor the system state
• Select and steer the appropriate knobs to provide the performance guarantees, against time-dependent variations

Notion of SLA different for ES and HPC
• ES: SLA primarily focuses on satisfying constraints
• HPC: minimization of deviations from nominal specifications

HARPA engine: split between
• The Operating System (HARPA-OS)
• The Run-Time system (HARPA-RT)
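The monitor-then-steer cycle described on this slide can be pictured as a small reactive loop. The following is a minimal sketch and not the HARPA engine: it assumes a single DVFS knob, reads a "timing violation" as a missed deadline, and uses hypothetical frequency levels, a hypothetical power cap and a hypothetical `Sample` record.

```python
from dataclasses import dataclass

# Hypothetical DVFS operating points in MHz; not taken from any HARPA platform.
FREQS_MHZ = [400, 600, 800, 1000]


@dataclass
class Sample:
    timing_violations: int  # count reported by an assumed timing monitor
    power_w: float          # reading from an assumed power monitor


def next_freq_index(idx, sample, power_cap_w):
    """One reactive step: read the monitors, then steer the DVFS knob."""
    if sample.power_w > power_cap_w and idx > 0:
        return idx - 1  # over the power budget: step the frequency down
    if sample.timing_violations > 0 and idx < len(FREQS_MHZ) - 1:
        return idx + 1  # deadlines missed: step the frequency up
    return idx          # constraints satisfied: keep the current setting


def run(samples, power_cap_w=5.0, idx=1):
    """Apply the step to a sequence of periodic monitor samples."""
    trace = []
    for sample in samples:
        idx = next_freq_index(idx, sample, power_cap_w)
        trace.append(FREQS_MHZ[idx])
    return trace


print(run([Sample(2, 3.0), Sample(0, 3.5), Sample(0, 6.0)]))
```

In this sketch the power constraint takes precedence over timing, which mirrors the ES notion of an SLA as a set of constraints to satisfy; an HPC-style policy would instead minimize the deviation from a nominal operating point.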

Impact

HARPA OS
• Already available as open source: BBQ open source project (http://bosp.dei.polimi.it, see demo videos)
• Implementation already running on x86 and ARM platforms
• Run-time management of GPGPU in progress
• Customizations for possible customers (it was ready for inclusion in the STHORM SDK platform)

HARPA RT
• Demonstrator of control loop implemented in firmware exploiting low-level modeling information
• 1 ms reaction time
• Possible porting on commercial platforms

Monitors and Knobs
• New set of Hw and Sw knobs/monitors
• Support for Hw and Sw adaptation

Modeling and PV mitigation
• Filing of a patent before the end of the project
• Possibility to deal with 7nm time-dependent variability
• Methodology to validate platform reliability models

WP1 Objectives

Provide a methodology and framework to guarantee the QoS of application execution
• Run-time system resources allocation
  ─ to different applications running concurrently
• Meet QoS/SLA requirements while optimizing a mix of figures of merit
  ─ including reliability and power budgeting
• Interface the low-level run-time management monitoring/tuning PEs and other system resources

Achieve a wide applicability of the methodology
• Across a number of possible architectures, ranging from HPC to embedded many-cores

[Diagram labels: WP5 Applications; WP2 HARPA RTE; WP3 Monitors and Knobs]

What is RTRM?

Run-Time Resources Management (RTRM) is about finding the optimal trade-off between QoS requirements and resources availability

Target scenario
• HW standpoint: shared resources
  ─ targeting many-core devices, both multi-cores and GPGPUs
  ─ considering process variations and run-time issues
• SW standpoint: mixed workloads
  ─ subject to resources sharing and competition
  ─ considering relative criticality and time-varying requirements

Simple solutions are required
• support for frequently changing use-cases
• suitable for both critical and best-effort applications

Main Goals of RTRM

Multiple devices, subsystems
• Heterogeneous -> Homogeneous (Many-Cores, GPGPUs)
• => Scalability and Retargetability

Shared resources among different applications
• Computation, memory, energy, bandwidth…
• => System-wide resources management

Multiple applications and usage scenarios
• Run-time changing requirements
• => Time adaptability

The starting point

Methodology to support system-wide run-time resource management
• exploiting design-time information
• hierarchical and distributed control

BarbequeRTRM Framework
• multi-objective optimization strategy
• easily portable and modular design
• run-time tunable and scalable policies
• open source project

http://bosp.dei.polimi.it
www.harpa-project.eu/

BarbequeRTRM Development

Introduction of a new modular policy (YaMS)
• partition available resources (R) on applications (A)
  ─ considering A priorities and R “residual” availabilities
• multi-objective optimization
  ─ support a set of tunable goals
  ─ DONE: performances, overheads, congestion, fairness
  ─ WIP: stability, robustness, thermal and power
• increase overall system value
  ─ considering discrete and tunable improvements

LP theory, MMKP heuristic
• promote scheduling of some AWMs
  ─ which improve optimization goals
• demote scheduling of other AWMs
  ─ which degrade solution metrics
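To make the partitioning idea concrete, here is a deliberately simple greedy heuristic in the spirit of an MMKP formulation: each application offers several working modes (AWMs), at most one is chosen per application, and the choices must fit in the residual resource. It is a sketch under stated assumptions, not the YaMS policy: the single "CPU quota" resource, the value figures, the priority convention (lower number means higher priority) and the function name `schedule` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (AWM name, CPU quota required, value contributed to the optimization goals)
AWM = Tuple[str, int, float]


def schedule(apps: Dict[str, Tuple[int, List[AWM]]],
             capacity: int) -> Dict[str, Optional[str]]:
    """Pick at most one AWM per application within `capacity`.

    Applications are visited by priority; each one gets the highest-value
    AWM that still fits in the residual resource, or None if nothing fits.
    """
    residual = capacity
    choice: Dict[str, Optional[str]] = {}
    for app, (_prio, awms) in sorted(apps.items(), key=lambda kv: kv[1][0]):
        fitting = [awm for awm in awms if awm[1] <= residual]
        if not fitting:
            choice[app] = None          # application is not scheduled
            continue
        best = max(fitting, key=lambda awm: awm[2])
        choice[app] = best[0]
        residual -= best[1]             # update the "residual" availability
    return choice


apps = {
    "video": (0, [("hi", 300, 10.0), ("lo", 100, 4.0)]),
    "log":   (1, [("hi", 200, 3.0), ("lo", 100, 1.0)]),
}
print(schedule(apps, capacity=400))
```

A real multi-objective policy would weigh several goals (performance, congestion, fairness, and so on) when computing each AWM's value; here a single precomputed number stands in for that step.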

BarbequeRTRM Concepts

Track run-time variability
• application requirements
• resources availabilities

Overheads contingency
• design-time profiling
• run-time optimization

Support different granularities
• system-wide optimization
• application-specific tuning

Integrated work-flow
• single framework to support both design-time and run-time

BarbequeRTRM Concepts

System-Wide RTRM: coarse-grained control on platform available resources
• resource accounting
• partitioning and abstraction
• manage applications priorities
• power/thermal “coarse tuning”

Application-Specific RTM: fine-grained control on application-specific parameters
• task ordering
• application parameters monitoring

Platform-Specific RTRM: fine-grained control on platform available resources
• resource monitoring
• allocation enforcing
• low-level HW events handling, e.g., critical conditions, faults...
• power/thermal “fine tuning”

[Diagram labels: Applications; HARPA OS Engine; HARPA RT Engine; Access Control; Guide Assistance; Business Intelligence; Monitoring and Security; Requirements; Notify; Constraints; Configure; Optimization Policy; CPU Count, Bandwidth, ...; Goal Gap, ...; PVT constraints on Clocks, ...; Upper bound on F and P, ...; WP1; WP2; WP3; WP5; Embedded; HPC]

BarbequeRTRM Development

Design of a run-time CGroup tuning support
• Improved code execution efficiency
  ─ more than x1.3 execution time speedup
  ─ increased IPC: 1.070 => 1.325
  ─ reduced context switches, reduced OS overhead

Future developments
• Extension to embedded multi/many-core architectures
• GPGPUs
• bandwidth allocation

[Figure: Power [W]; diagram labels: WP5, WP[2,3]]

[1] S. Libutti et al., “Exploiting Performance Counters for Energy Efficient Co-Scheduling of Mixed Workloads on Multi-Core Platforms”. HiPEAC – PARMA-DITAM. 01/2014.
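The IPC and speedup figures on this slide can be related through the usual identity: execution time = instructions / (IPC × clock frequency). The short calculation below is only a consistency check, assuming that the clock frequency is unchanged and that both figures refer to the same run; neither assumption is stated on the slide.

```python
def ipc_gain(ipc_before: float, ipc_after: float) -> float:
    """Speedup attributable to IPC alone, at fixed frequency and instruction count."""
    return ipc_after / ipc_before


def instruction_ratio_needed(speedup: float, gain: float) -> float:
    """Factor by which the executed instruction count must shrink to reach `speedup`."""
    return speedup / gain


gain = ipc_gain(1.070, 1.325)                 # figures from the slide
ratio = instruction_ratio_needed(1.3, gain)   # x1.3 is the slide's lower bound
print(f"IPC gain: {gain:.3f}, instruction-count ratio needed: {ratio:.3f}")
```

Under these assumptions the IPC increase alone accounts for a speedup of roughly 1.24, so reaching x1.3 would also need a few percent fewer executed instructions, which would be consistent with the reduced context switches and OS overhead listed above.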

BarbequeRTRM Development

• Application explicitly selects the computing device for executing kernels
• What if more applications compete for resources?
• A System-wide Run-time Resource Manager is required

BarbequeRTRM Development

GPUs load and temperature balancing
• AMD Nbody sample (32768 particles input), from 1 to 4 running instances
• 2 GPUs (AMD Radeon 7750 HD, fGPU=400MHz, fMEM=300MHz)
• Exec time: -50%
• Thermal unbalancing: from 12-13 °C to 3-8 °C

[Figure: GPU 0 and GPU 1 load [%], GPU 0 and GPU 1 temperature [°C], and temperature delta ∆ [°C], each plotted against time [s]]

[1] G. Massari et al., “Extending a Run-time Resource Management framework to support OpenCL and Heterogeneous Systems”. HiPEAC – PARMA-DITAM. 01/2014.
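As an illustration of why a system-wide manager helps here, the sketch below places each new application instance on the currently least-loaded device. It is a generic least-loaded dispatch, not the policy used in the cited experiment; the function name `assign` and the unit instance costs are hypothetical.

```python
def assign(instance_costs, n_devices=2):
    """Place each instance on the least-loaded device.

    `instance_costs` holds an estimated load per instance (arbitrary units).
    Returns the chosen device index per instance and the final load per device.
    """
    load = [0.0] * n_devices
    placement = []
    for cost in instance_costs:
        device = min(range(n_devices), key=lambda d: load[d])  # ties go to the lowest index
        load[device] += cost
        placement.append(device)
    return placement, load


# Four identical instances on two GPUs, as in the 1-to-4 instance scenario.
print(assign([1.0, 1.0, 1.0, 1.0]))
```

If every application instead picked the same device on its own, all four instances would share one GPU while the other stayed idle, which is the kind of load and thermal imbalance the slide reports as being reduced.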

Impact

IT4I
• Incremental adoption: installation of HARPA OS (BBQ) already started, experiments also on GPGPU management
• QoS guaranteed for critical tasks, better power management
• HARPA environment ensures load-balancing and error resilience based on criticality of the situation

HENESIS
• Use of HARPA OS and HARPA technology for new generation of products before end of the project
• Improved reliability and extended lifetime
• Deployment of one or more pilot installations to test the device in a real-world scenario

THALES
• Experiments to see how to achieve a product lifetime of 20 years
  ─ With power budget reduced by one order of magnitude
  ─ Exploiting multi/many-core and run-time management
  ─ Analysis of the impact of intensive DVFS decisions on SoC reliability over time

POLIMI
• Exercise the entire HARPA flow
• Vehicle for public dissemination
• Reference design for training

Highly reusable for other applications

Thanks for your attention

• HARPA project website: http://www.harpa-project.eu
• HARPA OS – BOSP: bosp.dei.polimi.it
• In future: use of openAIRE
• Meet us during workshops we organize: HiPEAC, DATE, …
• Dissemination Manager: prof. Dimitrios Soudris, ICCS, [email protected]

Contact (project coordinator)
Prof. William Fornaciari
Politecnico di Milano - DEIB
[email protected]
home.deib.polimi.it/fornacia