Application-Platform Mapping in Multiprocessor Systems-on-Chip Leandro Soares Indrusiak [email protected] http://www-users.cs.york.ac.uk/lsi CREDES Kick-off Meeting Tallinn - June 2009


Application-Platform Mapping in Multiprocessor Systems-on-Chip

Leandro Soares Indrusiak

[email protected]

http://www-users.cs.york.ac.uk/lsi

CREDES Kick-off Meeting – Tallinn - June 2009


Application-Platform Mapping in MPSoC | L. S. Indrusiak

Outline

Motivation

Application-Platform Mapping

Multiprocessor System-on-Chip Platforms

Application Modelling

System Validation

Conclusions and Future Work


Motivation

Application-Platform mapping isn't really a problem when dealing with sequential software and uniprocessor hardware

[Figure: one application mapped onto a platform with a single PE]

Motivation

It has been studied for decades by researchers in parallel and distributed computing

[Figure: applications 1 to 5 and platforms 1 to 4, each platform with one PE]

Motivation

CMOS limits can be felt in many ways:

number of transistors

clock frequency

power dissipation

Expectations of performance increase must be met somehow:

multiprocessing seems to be the best shot

[Figure: transistors (x 10^3), clock frequency (MHz) and power (W) plotted from 1970 to 2010 on a logarithmic scale]

Motivation

Application-Platform mapping is a critical issue when it comes to multiprocessor platforms

[Figure: one application and a platform with four PEs, three of them marked IDLE, annotated "REALLY?"]

Motivation

Sources: Gordon ASPLOS'06, Kudlur PLDI'08

[Figure: number of cores per chip (1 to 512) by year (1975 to 2010), with processors grouped as unicore, homogeneous multicore and heterogeneous multicore. Processors shown include 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Athlon, Itanium, Itanium2, Power4, PA8800, Opteron, CoreDuo, Power6, Xbox 360, BCM 1480, Opteron 4P, Xeon, Niagara, Cell, RAW, RAZA XLR, Cavium, CISCO CSR1, Larrabee, PicoChip, AMBRIC, AMD Fusion, NVIDIA G80, Core, Core2Duo and Core2Quad. Annotation: "C/C++/Java ???"]

Motivation

Application-Platform mapping is a critical issue when it comes to multiprocessor platforms

must be able to explore concurrency at the application level

[Figure: one application mapped onto a platform with four PEs]

Motivation

Application-Platform mapping is a critical issue when it comes to multiprocessor platforms

can be done dynamically to improve performance or to increase dependability

[Figure: one application mapped onto a platform with four PEs]

Application-Platform Mapping

The simplest formulation of the mapping problem resembles a graph isomorphism problem

application is a graph $G = G(A, C)$, where $a_i \in A$ is an application task and $c_{i,j} \in C$ represents the communication from $a_i$ to $a_j$

platform is a graph $G' = G(B, D)$, where $b_i \in B$ is a processor and $d_{i,j} \in D$ represents the channel from $b_i$ to $b_j$

objective is to map tasks onto processors such that tasks that communicate with each other lie on adjacent processors

There is no known polynomial time solution for the graph isomorphism problem
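
The formulation above can be sketched as an exhaustive search. This is only an illustration, not the method used in the talk: the function name, the task and processor labels and the four-processor ring are all made up, channels are assumed to be bidirectional, and each task is assumed to get its own processor. The search tries every placement, so its cost grows factorially with the number of processors.

```python
from itertools import permutations

def find_adjacent_mapping(tasks, comms, procs, channels):
    """Return a task -> processor dict in which every communicating pair of
    tasks sits on processors joined by a channel, or None if none exists."""
    # Assumption: a channel (b_i, b_j) can be used in both directions.
    links = set(channels) | {(b, a) for a, b in channels}
    # Try every one-to-one placement of tasks onto processors.
    for placement in permutations(procs, len(tasks)):
        mapping = dict(zip(tasks, placement))
        if all((mapping[a], mapping[b]) in links for a, b in comms):
            return mapping
    return None

# Hypothetical example: a three-task pipeline on a ring of four processors.
tasks = ["a1", "a2", "a3"]
comms = [("a1", "a2"), ("a2", "a3")]
procs = ["b1", "b2", "b3", "b4"]
channels = [("b1", "b2"), ("b2", "b3"), ("b3", "b4"), ("b4", "b1")]
mapping = find_adjacent_mapping(tasks, comms, procs, channels)

# Three mutually communicating tasks cannot all be adjacent on a 4-ring.
triangle = find_adjacent_mapping(
    ["x", "y", "z"], [("x", "y"), ("y", "z"), ("z", "x")], procs, channels)
```
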

11

Application-Platform Mapping in MPSoC | L. S. Indrusiak

Application-Platform Mapping

More sophisticated problem formulations may include additional information to the application and platform graphs, so that different objectives can be met

platform:

processing power of $b_i$

power consumption and latency of $d_{i,j}$

application:

computational cost and deadline of $a_i$

volume, max latency and required bandwidth of $c_{i,j}$

objectives: reduce bandwidth requirements on the platform, reduce power consumption, balance thermal dissipation, etc.
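
A minimal sketch of how such annotations could feed an objective function. Everything here is an assumption made for illustration: processors are placed on a 2D grid and identified by (x, y) coordinates, the latency of a route is taken as proportional to its hop count, and the names `communication_cost` and `overloaded` are hypothetical.

```python
def hop_count(src, dst):
    """Manhattan distance between two (x, y) processor coordinates."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def communication_cost(mapping, volumes, hop_latency=1.0):
    """Sum over all c_ij of volume x latency of the route between the
    processors that host a_i and a_j (latency assumed linear in hops)."""
    return sum(vol * hop_latency * hop_count(mapping[a], mapping[b])
               for (a, b), vol in volumes.items())

def overloaded(mapping, task_cost, capacity):
    """Processors whose mapped computational cost exceeds their processing power."""
    load = {}
    for task, proc in mapping.items():
        load[proc] = load.get(proc, 0) + task_cost[task]
    return [p for p, total in load.items() if total > capacity[p]]

# Hypothetical application: a1 talks a lot to a2, a2 talks a little to a3.
volumes = {("a1", "a2"): 100, ("a2", "a3"): 10}
near = {"a1": (0, 0), "a2": (0, 1), "a3": (1, 1)}
far = {"a1": (0, 0), "a2": (1, 1), "a3": (0, 1)}
cost_near = communication_cost(near, volumes)
cost_far = communication_cost(far, volumes)

# Both tasks on one processor exceed its (made-up) capacity of 4.
hot = overloaded({"a1": (0, 0), "a2": (0, 0)}, {"a1": 3, "a2": 2}, {(0, 0): 4})
```

Other objectives from the list (power, thermal balance) would follow the same pattern with different edge or node weights.
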

12

Application-Platform Mapping in MPSoC | L. S. Indrusiak

Application-Platform Mapping

Platform models over-simplify the complex design space of on-chip multiprocessor platforms

All approaches disregard the temporal dimension

application and platform models must include further details on time and concurrency


Multiprocessor System-on-Chip Platforms

Many architectural alternatives

homogeneous vs. heterogeneous processing

shared memory vs. distributed memory

on-chip interconnect

• point-to-point, crossbar

• on-chip bus

• network-on-chip

source: Intel


Multiprocessor System-on-Chip Platforms

Cell Processor

• It has nine processors: one Power Processor Element (PPE) and eight Synergistic Processing Elements (SPE)

• PPE has separate I & D L1 caches (32 KB each)

• Each SPE can only access its 256 KB of local storage (LS) and uses its memory flow controller (MFC) to perform DMA operations to/from LS (non-blocking)

• Bandwidth (at 3.2 GHz)

• SPE <-> LS = 2x 25.6 GB/s

• MFC <-> EIB = 2x 25.6 GB/s

• MIC <-> EIB = 2x 25.6 GB/s

• L1 <-> L2 = 2x 51.2 GB/s
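
As a quick sanity check on the Cell bandwidth figures, the listed rates can be converted to bytes per clock cycle. This sketch assumes 1 GB = 10^9 bytes and reads "2x" as two links of the stated rate (for instance one per direction), which the slide does not spell out.

```python
CLOCK_HZ = 3.2e9  # clock frequency quoted on the slide

# Per-link rates in GB/s, as listed (the "2x" factor is left out here).
links_gb_per_s = {
    "SPE <-> LS": 25.6,
    "MFC <-> EIB": 25.6,
    "MIC <-> EIB": 25.6,
    "L1 <-> L2": 51.2,
}

# Assumption: 1 GB = 1e9 bytes.
bytes_per_cycle = {name: gb * 1e9 / CLOCK_HZ
                   for name, gb in links_gb_per_s.items()}

for name, value in bytes_per_cycle.items():
    print(f"{name}: about {value:.1f} bytes per cycle per link")
```

Under these assumptions the 25.6 GB/s links move about 8 bytes per cycle and the L1 <-> L2 link about 16.
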


Multiprocessor System-on-Chip Platforms

ARM Cortex-A9 MPCore

• Can have 1-4 Cortex-A9 cores with separate I & D L1 caches (32 KB each)

• Data caches of all cores are fully coherent

• Interface to external components can be made cache-coherent

• On-chip interconnect based on the AMBA standard


Multiprocessor System-on-Chip Platforms

As the number of cores increases, on-chip communication becomes a major issue

Source: ITRS Roadmap


Multiprocessor System-on-Chip Platforms

Current solution: reusability

standard processors

reusable IP cores

reusable on-chip interconnects

Networks-on-Chip (NoC)

multi-hop, packet-based interconnect

scalable, high bandwidth

low power (shorter wires)

regular, reusable

[Figure: layered view with application, middleware, OSs and PEs on top of a reusable interconnect, next to a NoC in which each PE is attached to a router (R)]

Multiprocessor System-on-Chip Platforms

Networks-on-Chip: design decisions

topology

packetisation

flow control

routing

switching

arbitration

buffering

[Figure: alternative NoC topologies built from PEs and routers (R)]
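
To make one of these design decisions concrete, here is a minimal sketch of dimension-ordered (XY) routing on a 2D mesh. The slides do not say which routing algorithm or topology was used; XY routing and the coordinate scheme below are picked only as a common, simple example.

```python
def xy_route(src, dst):
    """Routers visited from src to dst on a 2D mesh: the packet first
    travels along the x dimension, then along the y dimension."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (2, 1))
hops = len(route) - 1  # number of links traversed
```

Because every packet resolves x before y, the route between two routers is fixed, which makes the hop count easy to use in mapping cost functions.
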

19

Application-Platform Mapping in MPSoC | L. S. Indrusiak

Multiprocessor System-on-Chip Platforms

Networks-on-Chip for 3D architectures

all previous alternatives and more

may integrate dies produced with different technologies

asymmetry between horizontal and vertical channels

increased locality

thermal issues become critical

• load balancing


Multiprocessor System-on-Chip Platforms

Design decisions must be evaluated using abstract models of the platform

cycle-accurate

cycle-approximate

untimed/functional

structural/graph

Modeling trade-offs

accuracy

observability

speed


Multiprocessor System-on-Chip Platforms

Cycle-accurate and cycle-approximate models

graphical interface to experiment with topology, routing, buffering, ...

observability: buffer occupation, latency, throughput, ...

flit-level accuracy


Multiprocessor System-on-Chip Platforms

Coding techniques to reduce bit transition activity over NoC interconnects

experimentation with alternative coding techniques aiming to reduce the power consumption in NoC interconnects

reductions of up to 60% in bit transitions were achieved
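
The slide does not name the coding techniques that were evaluated. As an illustration of the general idea, the sketch below counts bit transitions on a link and applies bus-invert coding, a well-known scheme that sends the complement of a word (plus one extra flag wire) whenever that toggles fewer wires. The function names and the sample traffic are hypothetical, and the link is assumed to start at all zeros.

```python
def bit_flips(words, width):
    """Total number of wire toggles when `words` cross a `width`-bit link."""
    mask = (1 << width) - 1
    prev, total = 0, 0  # assumption: the link initially carries all zeros
    for w in words:
        total += bin((prev ^ w) & mask).count("1")
        prev = w
    return total

def bus_invert_encode(words, width):
    """Send the inverted word, with a flag bit above the data bits,
    whenever the plain word would toggle more than half of the wires."""
    mask = (1 << width) - 1
    prev_data, encoded = 0, []
    for w in words:
        if bin((prev_data ^ w) & mask).count("1") > width / 2:
            data, flag = ~w & mask, 1
        else:
            data, flag = w & mask, 0
        encoded.append((flag << width) | data)
        prev_data = data
    return encoded

def bus_invert_decode(encoded, width):
    """Undo the inversion for every word whose flag bit is set."""
    mask = (1 << width) - 1
    return [(e ^ mask) & mask if e >> width else e & mask for e in encoded]

# Hypothetical worst-case traffic: every flit toggles all eight wires.
flits = [0x00, 0xFF, 0x00, 0xFF]
plain = bit_flips(flits, 8)
coded = bit_flips(bus_invert_encode(flits, 8), 9)  # 8 data wires + flag wire
```

The saving depends heavily on the traffic; this alternating pattern is close to a best case for bus-invert coding and says nothing about the 60% figure reported on the slide.
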


Multiprocessor System-on-Chip Platforms

Simplified models

packet-level accuracy, 1-position buffers

• complete packet is abstracted by header and trailer

• header's way through the network is fully simulated, trailer latencies are calculated

• less than 5% error for average latency (compared with cycle-accurate HDL model)

fully analytical

• latencies of packet delivery are calculated upon packet creation and refined until packet delivery

• function of network occupation, route, buffer sizes, switching and arbitration techniques

real-time analysis

• static analysis of worst-case interference between network flows
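
One possible reading of the packet-level abstraction, sketched under strong assumptions: the header is followed hop by hop (including any cycles it waits for a busy output), and the trailer arrival is then computed rather than simulated, assuming the remaining flits follow the header at one flit per cycle. The function names, parameters and numbers are hypothetical; the actual model behind the slide may differ.

```python
def header_arrival(inject_cycle, hops, router_delay, link_delay, stalls=()):
    """Cycle at which the header reaches its destination. `stalls` holds
    the extra cycles the header waited at individual routers."""
    return inject_cycle + hops * (router_delay + link_delay) + sum(stalls)

def trailer_arrival(header_cycle, flits, cycles_per_flit=1):
    """Calculated (not simulated) arrival of the last flit, assuming the
    body streams behind the header without further blocking."""
    return header_cycle + (flits - 1) * cycles_per_flit

# Hypothetical packet: 8 flits over 3 hops, header blocked for 4 cycles once.
head = header_arrival(inject_cycle=0, hops=3, router_delay=2, link_delay=1,
                      stalls=(0, 4, 0))
tail = trailer_arrival(head, flits=8)
```
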


Application Modeling

Compilers can only go so far on extracting parallelism from sequential code

Developers must have the means to specify the inherent concurrency of each particular application

avoiding pitfalls such as deadlocks or undesired non-determinism

[Figure: one application mapped onto a many-PE platform in which most PEs are marked IDLE]

Application Modeling

Many concurrent programming models available

threads

concurrent processes with message-passing

actors

streams/dataflow

CSP

timed automata

<add your favorite here>


Application Modeling

The choice of which concurrent programming model should be adopted depends on

application domain

familiarity of the designer/developer

availability of stable flows and tools

model               tools
threads             libraries in Java and C++
message passing     OpenMP
actors              Simulink, Ptolemy
streams/dataflow    StreamIt, CUDA
timed automata      UPPAAL


Application Modeling

Application models are abstract representations of a program, which can be used both at design time and at runtime

can be strongly influenced by the programming model

[Figure: tasks A, B, C and D with periods of 1 ms, 2 ms, 4 ms and 1 ms, and the resulting job sequences D1 D2 D3 D4 and A1 A2 B1 A3 A4 B2 C1]
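
The job labels in this example (A1 to A4, B1, B2, C1, D1 to D4) are consistent with expanding periodic tasks over one hyperperiod. The sketch below does that expansion; it assumes the periods belong to A, B, C and D in the order listed, which is a reading of the slide rather than something it states, and the function names are hypothetical.

```python
from functools import reduce
from math import gcd

def hyperperiod(periods):
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

def expand_jobs(periods_ms):
    """Map each task to its (job name, release time in ms) pairs
    over one hyperperiod, e.g. A -> A1 at 0 ms, A2 at 1 ms, ..."""
    horizon = hyperperiod(periods_ms.values())
    return {task: [(f"{task}{k + 1}", k * period)
                   for k in range(horizon // period)]
            for task, period in periods_ms.items()}

# Assumed assignment of the periods shown on the slide.
jobs = expand_jobs({"A": 1, "B": 2, "C": 4, "D": 1})
```

An abstract model of this kind can be used at design time (to size the platform) and at runtime (to decide where each job executes).
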


Application Modeling

Executable models

fast execution

rich set of modeling constructs, different abstraction levels

clearly defined execution semantics, so that the dynamic behavior of the application can be properly validated

Formal properties

models of concurrent computation

concurrency constraint through the definition of ordering relations

polymorphic type systems


Concurrent Models of Computation

Actor orientation, as proposed by E. A. Lee, UC Berkeley

revisited fundamental concepts from Atkinson & Hewitt (MIT, 1977) and Gul Agha (MIT, 1986)

Actor orientation basics

execution semantics of a given system model is factored out from the individual components of that model

execution semantics is associated with a well-defined Model of Computation (MoC)

heterogeneous models can be created through the hierarchical composition of MoCs
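
A toy sketch of the separation described above, not the Ptolemy II API: actors only define how they react to a token, a director decides how and when actors fire, and a composite actor hides a sub-model with its own director. All class names and both "models of computation" are hypothetical and deliberately simplistic.

```python
class Actor:
    """A component that only knows how to react to one input token."""
    def __init__(self, name, react):
        self.name = name
        self.react = react

    def fire(self, token):
        return self.react(token)

class PipelineDirector:
    """Toy MoC: each token flows through all actors, in order."""
    def run(self, actors, tokens):
        results = []
        for token in tokens:
            for actor in actors:
                token = actor.fire(token)
            results.append(token)
        return results

class BroadcastDirector:
    """Toy MoC: every actor receives every original token independently."""
    def run(self, actors, tokens):
        return [[actor.fire(token) for actor in actors] for token in tokens]

class Composite(Actor):
    """An actor whose behaviour is a sub-model executed by its own director."""
    def __init__(self, name, director, actors):
        super().__init__(name, lambda token: director.run(actors, [token])[0])

double = Actor("double", lambda x: 2 * x)
inc = Actor("inc", lambda x: x + 1)

# Same actors, different execution semantics.
piped = PipelineDirector().run([double, inc], [1, 2, 3])
broadcast = BroadcastDirector().run([double, inc], [1, 2, 3])

# Hierarchical composition: a pipeline nested inside a broadcast model.
inner = Composite("inner", PipelineDirector(), [double, inc])
mixed = BroadcastDirector().run([inner, inc], [1])
```

The actors are unchanged across the three runs; only the director, and therefore the execution semantics, differs.
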


Actor-orientation

[Figure: hierarchical actor model, annotated with the execution semantics of the top-level model and the execution semantics of the composite model]

Application Modeling

Extending actor-orientation with types and explicit ordering

UML: suitable visual representation for the definition of polymorphic type systems (class diagrams) and ordering relations (sequence diagrams)

but

UML is not an executable specification language


UML sequence diagrams within actors

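Recall the formal definition of a Message Sequence Chart (MSC)

tuple $\langle P, E, C, l, m, < \rangle$ where

$P$ is a finite set of processes

$E$ is a finite set of events

$C$ is a finite set of names for messages

$l: E \to T = \{\, p!q(a),\ p?q(a),\ p(a) \mid p \neq q \in P,\ a \in C \,\}$

$m: S \to R$ (matching each send event in $S$ with its receive event in $R$)

$< \;\subseteq E \times E$ is an acyclic relation between events consisting of:

• a total order on $E_p$ for every $p \in P$, and

• $s < r$, whenever $m(s) = r$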


UML sequence diagrams within actors

Recall the formal definition of MSC

order of events (message occurrence) within a process (lifeline) is a total order

the reflexive-transitive closure of < (denoted as <*) is a partial order on the complete set E

enough for the definition of an untimed model of computation

different possibilities were explored and integrated as a library of directors on an extended version of Ptolemy II
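
A small sketch of how the ordering relation of an MSC could drive an untimed execution: events are ordered along each lifeline, every send precedes its matching receive, and any topological order of the resulting relation is a valid firing sequence. This is not the director library mentioned above; the function name and the two-process example are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

def msc_linearization(lifelines, messages):
    """Return one event sequence that respects the MSC relation `<`.

    lifelines: process -> list of its events, in lifeline (total) order
    messages:  send event -> matching receive event (the function m)
    """
    preds = {e: set() for events in lifelines.values() for e in events}
    for events in lifelines.values():
        for earlier, later in zip(events, events[1:]):
            preds[later].add(earlier)   # total order on each lifeline
    for send, receive in messages.items():
        preds[receive].add(send)        # s < r whenever m(s) = r
    # static_order() raises CycleError if the relation is not acyclic.
    return list(TopologicalSorter(preds).static_order())

# Hypothetical request/acknowledge exchange between processes p and q.
lifelines = {"p": ["p!q(req)", "p?q(ack)"],
             "q": ["q?p(req)", "q!p(ack)"]}
messages = {"p!q(req)": "q?p(req)", "q!p(ack)": "p?q(ack)"}
order = msc_linearization(lifelines, messages)

# An ill-formed chart: each process waits for the other before sending.
try:
    msc_linearization({"p": ["p?q(b)", "p!q(a)"], "q": ["q?p(a)", "q!p(b)"]},
                      {"p!q(a)": "q?p(a)", "q!p(b)": "p?q(b)"})
    well_formed = True
except CycleError:
    well_formed = False
```
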


Case study

Application modeling based on behavioral patterns and polymorphic type systems

constraints to concurrent execution described using UML

application functionality described using actors


System Validation

Joint validation of application and platform models

Application is back-annotated with communication costs and power consumption data from the execution of platform models

[Figure: MAPPER]

System Validation

Mapping heuristic becomes a design decision

experiments show that in large multiprocessor platforms, the average communication latency of a given application can vary by 2 orders of magnitude according to the employed mapping heuristic

[Figure: MAPPER]

Case study

Application and platform modeling on Ptolemy II

multiple MoCs

supports the evaluation of how different platforms execute a given application

supports the evaluation of different mapping heuristics

synthetic example: autonomous vehicle


Conclusions and Future Work

Extensions to the state of the art on system-level specification and validation

combination of formalism, simulation/execution and analytical solution

Extensions on application and platform models allow for richer mapping mechanisms

temporal constraint based on concurrent MoC and explicit ordering

spatial constraint based on functional types


Conclusions and Future Work

Further progress must be made in mapping to fully explore information in application and platform models

time and concurrency

functional constraints

Must explore different implementation styles, aiming to achieve more reliable platforms

static vs. dynamic mapping

centralised vs. distributed mapping