Application-Platform Mapping in Multiprocessor Systems-on-Chip
Leandro Soares Indrusiak
[email protected]
www-users.cs.york.ac.uk/lsi
CREDES Kick-off Meeting – Tallinn - June 2009
Application-Platform Mapping in MPSoC | L. S. Indrusiak
Outline
Motivation
Application-Platform Mapping
Multiprocessor System-on-Chip Platforms
Application Modelling
System Validation
Conclusions and Future Work
Motivation
Application-Platform mapping isn't really a problem when dealing with sequential software and uniprocessor hardware
[Figure: an application and a platform with a single processing element (PE)]
Motivation
It has been studied for decades by researchers in parallel and distributed computing
[Figure: five applications (application 1 to application 5) and four platforms (platform1 to platform4), each platform with one PE]
Motivation
CMOS limits can be felt in many ways
• number of transistors
• clock frequency
• power dissipation
Expectations of performance increase must be met somehow
• multiprocessing seems to be the best shot
[Figure: transistors (x 10^3), clock frequency (MHz) and power (W) on a logarithmic scale, 1970 to 2010]
Motivation
Application-Platform mapping is a critical issue when it comes to multiprocessor platforms
[Figure: an application and a platform with four PEs; three PEs are labelled IDLE, with the caption "REALLY?"]
Motivation
Sources: Gordon ASPLOS'06, Kudlur PLDI'08
[Figure: number of cores per chip (1 to 512, logarithmic scale) versus year (1975 to 2010) for unicore, homogeneous multicore and heterogeneous multicore processors, ranging from the 4004 to devices such as Cell, Niagara, NVIDIA G80, Larrabee, PicoChip and AMBRIC; annotated "C/C++/Java ???"]
Motivation
Application-Platform mapping is a critical issue when it comes to multiprocessor platforms
• must be able to exploit concurrency at the application level
[Figure: an application and a platform with four PEs]
Motivation
Application-Platform mapping is a critical issue when it comes to multiprocessor platforms
• can be done dynamically to improve performance or to increase dependability
[Figure: an application and a platform with four PEs]
Application-Platform Mapping
The simplest formulation of the mapping problem resembles a graph isomorphism problem
• application is a graph G = G(A, C), where a_i ∈ A is an application task and c_{i,j} ∈ C represents the communication from a_i to a_j
• platform is a graph G' = G(B, D), where b_i ∈ B is a processor and d_{i,j} ∈ D represents the channel from b_i to b_j
• objective is to map tasks onto processors such that tasks that communicate with each other lie on adjacent processors
There is no known polynomial time solution for the graph isomorphism problem
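The formulation above can be sketched as an exhaustive search. This is only an illustration under stated assumptions: the function name `find_mapping`, the edge-set encoding of C and D, and the one-task-per-processor restriction are hypothetical choices, not taken from the talk.

```python
from itertools import permutations

def find_mapping(tasks, comm, procs, channels):
    """Return a task -> processor mapping in which every communicating
    pair of tasks lands on processors joined by a channel, or None.

    tasks:    list of task names (the set A)
    comm:     set of (a_i, a_j) pairs (the set C)
    procs:    list of processor names (the set B)
    channels: set of (b_i, b_j) pairs (the set D)
    Assumption: at most one task per processor.
    """
    for assignment in permutations(procs, len(tasks)):
        mapping = dict(zip(tasks, assignment))
        if all((mapping[a], mapping[b]) in channels for a, b in comm):
            return mapping
    return None

# Platform: three processors in a line, b1 - b3 - b2 (bidirectional channels).
channels = {("b1", "b3"), ("b3", "b1"), ("b3", "b2"), ("b2", "b3")}
procs = ["b1", "b2", "b3"]

# A three-task pipeline fits on the line.
pipeline = find_mapping(["a1", "a2", "a3"],
                        {("a1", "a2"), ("a2", "a3")}, procs, channels)

# A fully connected three-task application does not.
triangle = find_mapping(["a1", "a2", "a3"],
                        {("a1", "a2"), ("a2", "a3"), ("a1", "a3")},
                        procs, channels)
```

The search tries every assignment, so its cost grows factorially with the number of processors; it is useful only to make the problem statement concrete.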
Application-Platform Mapping
More sophisticated problem formulations may add information to the application and platform graphs, so that different objectives can be met
• platform: processing power of b_i; power consumption and latency of d_{i,j}
• application: computational cost and deadline of a_i; volume, max latency and required bandwidth of c_{i,j}
• objectives: reduce bandwidth requirements on the platform, reduce power consumption, balance thermal dissipation, etc.
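A minimal sketch of how such annotations could be used to evaluate one candidate mapping. All names (`Task`, `Comm`, `Channel`, `evaluate`) are hypothetical, and several simplifications are assumed: one task per processor, communication only over direct channels, execution time equal to cost divided by processing power, and channel power consumption expressed as energy per unit of data.

```python
from dataclasses import dataclass

@dataclass
class Task:            # annotation of a_i
    cost: float        # computational cost
    deadline: float

@dataclass
class Comm:            # annotation of c_{i,j}
    volume: float
    max_latency: float

@dataclass
class Channel:         # annotation of d_{i,j}
    latency: float
    energy_per_unit: float  # assumed reading of "power consumption"

def evaluate(tasks, comms, speed, channels, mapping):
    """Return (feasible, communication_energy) for one mapping."""
    feasible = True
    energy = 0.0
    for name, task in tasks.items():
        # Assumed timing model: execution time = cost / processing power.
        if task.cost / speed[mapping[name]] > task.deadline:
            feasible = False
    for (src, dst), comm in comms.items():
        p, q = mapping[src], mapping[dst]
        if p == q:
            continue  # assumed: local communication is free
        channel = channels.get((p, q))
        if channel is None or channel.latency > comm.max_latency:
            feasible = False
            continue
        energy += comm.volume * channel.energy_per_unit
    return feasible, energy

tasks = {"a1": Task(cost=10, deadline=5), "a2": Task(cost=4, deadline=4)}
comms = {("a1", "a2"): Comm(volume=8, max_latency=2)}
speed = {"b1": 2.0, "b2": 1.0}
channels = {("b1", "b2"): Channel(latency=1, energy_per_unit=0.5)}

good = evaluate(tasks, comms, speed, channels, {"a1": "b1", "a2": "b2"})
bad = evaluate(tasks, comms, speed, channels, {"a1": "b2", "a2": "b1"})
```

A mapper would call a function of this kind for each candidate and keep the feasible mapping with the lowest objective value.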
Application-Platform Mapping
Platform models over-simplify the complex design space of on-chip multiprocessor platforms
All approaches disregard the temporal dimension
• application and platform models must include further details on time and concurrency
Multiprocessor System-on-Chip Platforms
Many architectural alternatives
• homogeneous vs. heterogeneous processing
• shared memory vs. distributed memory
• on-chip interconnect
  • point-to-point, crossbar
  • on-chip bus
  • network-on-chip
source: Intel
Multiprocessor System-on-Chip Platforms
Cell Processor
• It has nine processors: one Power Processor Element (PPE) and eight Synergistic Processing Elements (SPE)
• PPE has separate I & D L1 caches (32 KB each)
• Each SPE can only access its 256 KB of local storage (LS) and uses its memory flow controller (MFC) to perform DMA operations to/from LS (non-blocking)
• Bandwidth (3.2 GHz)
  • SPE <-> LS = 2x 25.6 GB/s
  • MFC <-> EIB = 2x 25.6 GB/s
  • MIC <-> EIB = 2x 25.6 GB/s
  • L1 <-> L2 = 2x 51.2 GB/s
Multiprocessor System-on-Chip Platforms
ARM Cortex-A9 MPCore
• Can have 1-4 Cortex-A9 cores with separate I & D L1 caches (32 KB each)
• Data caches of all cores are fully coherent
• Interface to external components can be made cache-coherent
• On-chip interconnect based on AMBA standard
Multiprocessor System-on-Chip Platforms
As the number of cores increases, on-chip communication becomes a major issue
Source: ITRS Roadmap
Multiprocessor System-on-Chip Platforms
Current solution: reusability
• standard processors
• reusable IP cores
• reusable on-chip interconnects
Networks-on-Chip (NoC)
• multi-hop, packet-based interconnect
• scalable, high bandwidth
• low power (shorter wires)
• regular, reusable
[Figure: layered view with application, middleware, OS, PEs and reusable interconnect; a NoC with nine PEs and nine routers (R)]
Multiprocessor System-on-Chip Platforms
Networks-on-Chip: design decisions
• topology
• packetisation
• flow control
• routing
• switching
• arbitration
• buffering
[Figure: example NoCs built from PEs and routers (R)]
Multiprocessor System-on-Chip Platforms
Networks-on-Chip for 3D architectures
• all previous alternatives and more
• may integrate dies produced with different technologies
• asymmetry between horizontal and vertical channels
• increased locality
• thermal issues become critical
  • load balancing
Multiprocessor System-on-Chip Platforms
Design decisions must be evaluated using abstract models of the platform
• cycle-accurate
• cycle-approximate
• untimed/functional
• structural/graph
Modeling trade-offs
• accuracy
• observability
• speed
Multiprocessor System-on-Chip Platforms
Cycle-accurate and cycle-approximate models
• graphical interface to experiment with topology, routing, buffering, ...
• observability: buffer occupation, latency, throughput, ...
• flit-level accuracy
Multiprocessor System-on-Chip Platforms
Coding techniques to reduce bit transition activity over NoC interconnects
• experimentation with alternative coding techniques aiming to reduce the power consumption in NoC interconnects
• reductions of up to 60% in bit transitions were achieved
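The slide does not name the coding techniques that were evaluated. As an illustration only, the sketch below counts bit transitions on a link and applies bus-invert coding, a well-known low-power encoding that may or may not be among the ones studied. The function names and the 8-bit link width are assumptions.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def raw_transitions(words, start=0):
    """Bit transitions when the words are sent unencoded."""
    total, prev = 0, start
    for word in words:
        total += hamming(prev, word)
        prev = word
    return total

def bus_invert(words, width):
    """Encode each word as (data, invert_bit): send the complement
    whenever more than half of the data lines would toggle."""
    mask = (1 << width) - 1
    prev, encoded = 0, []
    for word in words:
        if hamming(prev, word) > width // 2:
            data, inv = ~word & mask, 1
        else:
            data, inv = word, 0
        encoded.append((data, inv))
        prev = data
    return encoded

def encoded_transitions(encoded):
    """Bit transitions on the data lines plus the extra invert line."""
    total, prev_data, prev_inv = 0, 0, 0
    for data, inv in encoded:
        total += hamming(prev_data, data) + (prev_inv != inv)
        prev_data, prev_inv = data, inv
    return total

def decode(encoded, width):
    mask = (1 << width) - 1
    return [data ^ mask if inv else data for data, inv in encoded]

words = [0x00, 0xFF, 0x00, 0xFF]       # worst case for an 8-bit link
encoded = bus_invert(words, width=8)
```

In this deliberately extreme example the unencoded stream toggles 24 lines in total, while the encoded stream toggles only the invert line three times. Real traffic gives smaller savings, and the extra line has its own cost.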
Multiprocessor System-on-Chip Platforms
Simplified models
• packet-level accuracy, 1-position buffers
  • complete packet is abstracted by header and trailer
  • header's way through the network is fully simulated, trailer latencies are calculated
  • less than 5% error for average latency (compared with cycle-accurate HDL model)
• fully analytical
  • latencies of packet delivery are calculated upon packet creation and refined until packet delivery
  • function of network occupation, route, buffer sizes, switching and arbitration techniques
• real-time analysis
  • static analysis of worst-case interference between network flows
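The talk does not give the equations behind the real-time analysis. As a rough illustration of what a static worst-case interference bound can look like, the sketch below uses the classic fixed-point recurrence from response-time analysis, R = C + sum over interfering flows of ceil(R / T_j) * C_j. It assumes priority-preemptive arbitration, considers only direct interference from higher-priority flows that share links with the flow under analysis, and ignores release jitter; the function name and parameters are hypothetical.

```python
import math

def worst_case_latency(basic_latency, interferers, deadline):
    """Iterate R = C + sum(ceil(R / T_j) * C_j) to a fixed point.

    basic_latency: latency C of the flow with no contention
    interferers:   list of (C_j, T_j) for higher-priority flows that
                   share at least one link with the flow under analysis
    deadline:      give up (return None) once the bound exceeds it
    """
    latency = basic_latency
    while True:
        updated = basic_latency + sum(
            math.ceil(latency / period) * cost
            for cost, period in interferers
        )
        if updated == latency:
            return latency
        if updated > deadline:
            return None
        latency = updated

bounded = worst_case_latency(2, [(1, 4), (2, 10)], deadline=20)
overloaded = worst_case_latency(5, [(5, 5)], deadline=50)
```

A `None` result means that no bound below the deadline exists under these assumptions, so the flow cannot be guaranteed.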
Application Modeling
Compilers can only go so far in extracting parallelism from sequential code
Developers must have the means to specify the inherent concurrency of each particular application
• avoiding pitfalls such as deadlocks or undesired non-determinism
[Figure: an application and a platform in which many elements are labelled IDLE]
Application Modeling
Many concurrent programming models available
• threads
• concurrent processes with message-passing
• actors
• streams/dataflow
• CSP
• timed automata
• <add your favorite here>
Application Modeling
The choice of which concurrent programming model should be adopted depends on
• application domain
• familiarity of the designer/developer
• availability of stable flows and tools

model                 tools
threads               libraries in Java and C++
message passing       OpenMP
actors                Simulink, Ptolemy
streams / dataflow    StreamIt, CUDA
timed automata        UPPAAL
Application Modeling
Application models are abstract representations of a program, which can be used both at design time and at runtime
• can be strongly influenced by the programming model
[Figure: tasks A, B, C and D with periods of 1 ms, 2 ms, 4 ms and 1 ms; a timeline shows jobs D1 D2 D3 D4 on one row and A1 A2 B1 A3 A4 B2 C1 on another]
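The job counts in the figure follow from the task periods. The sketch below enumerates the jobs released within one hyperperiod (the least common multiple of the periods). It assumes the periods are assigned to A, B, C and D in the order in which they appear on the slide; the function names are hypothetical.

```python
from functools import reduce
from math import gcd

def hyperperiod(periods):
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

def jobs_per_hyperperiod(periods_ms):
    """Map each task to the names of the jobs it releases in one hyperperiod."""
    h = hyperperiod(periods_ms.values())
    return {
        task: [f"{task}{k + 1}" for k in range(h // period)]
        for task, period in periods_ms.items()
    }

# Periods in ms, read from the figure (assumed order A, B, C, D).
periods_ms = {"A": 1, "B": 2, "C": 4, "D": 1}
jobs = jobs_per_hyperperiod(periods_ms)
```

With these periods the hyperperiod is 4 ms, which gives four jobs of A and D, two of B and one of C, matching the job labels in the figure.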
Application Modeling
Executable models
• fast execution
• rich set of modeling constructs, different abstraction levels
• clearly defined execution semantics, so that the dynamic behavior of the application can be properly validated
Formal properties
• models of concurrent computation
• concurrency constraint through the definition of ordering relations
• polymorphic type systems
Concurrent Models of Computation
Actor orientation, as proposed by E. A. Lee, UC Berkeley
• revisited fundamental concepts from Atkinson & Hewitt (MIT, 1977) and Gul Agha (MIT, 1986)
Actor orientation basics
• execution semantics of a given system model is factored out from the individual components of that model
• execution semantics is associated with a well-defined Model of Computation (MoC)
• heterogeneous models can be created through the hierarchical composition of MoCs
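A toy sketch of the idea that execution semantics live outside the components. The same two actors are run by two different "directors": one follows a fixed schedule, the other fires any actor whose inputs are available. The classes and functions here are hypothetical and far simpler than the directors of Ptolemy II; they only illustrate the separation of concerns.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Actor:
    name: str                  # also names the actor's single output
    inputs: Tuple[str, ...]    # names of the tokens it consumes
    fn: Callable[..., int]     # the actor's functionality

def static_director(actors, schedule, sources):
    """Fire actors in a precomputed order."""
    tokens = dict(sources)
    by_name = {actor.name: actor for actor in actors}
    for name in schedule:
        actor = by_name[name]
        tokens[name] = actor.fn(*(tokens[i] for i in actor.inputs))
    return tokens

def data_driven_director(actors, sources):
    """Fire every actor as soon as all of its inputs are available."""
    tokens = dict(sources)
    pending = list(actors)
    while pending:
        ready = [a for a in pending if all(i in tokens for i in a.inputs)]
        if not ready:
            raise RuntimeError("deadlock: no actor can fire")
        for actor in ready:
            tokens[actor.name] = actor.fn(*(tokens[i] for i in actor.inputs))
            pending.remove(actor)
    return tokens

# The actors know nothing about when they are fired.
model = [
    Actor("sum", ("double", "x"), lambda d, x: d + x),
    Actor("double", ("x",), lambda x: 2 * x),
]

static_result = static_director(model, ["double", "sum"], {"x": 3})
dynamic_result = data_driven_director(model, {"x": 3})
```

Both directors produce the same tokens for this model, but they decide the firing order in different ways, and neither decision is encoded in the actors themselves.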
Actor-orientation
[Figure: model annotated with "execution semantics of the top level model" and "execution semantics of the composite model"]
Application Modeling
Extending actor-orientation with types and explicit ordering
• UML is a suitable visual representation for the definition of polymorphic type systems (class diagrams) and ordering relations (sequence diagrams)
but
• UML is not an executable specification language
UML sequence diagrams within actors
Recall the formal definition of MSC
tuple ⟨P, E, C, l, m, <⟩ where
• P is a finite set of processes
• E is a finite set of events
• C is a finite set of names for messages
• l: E → T = { p!q(a), p?q(a), p(a) | p ≠ q ∈ P, a ∈ C }
• m: S → R
• < ⊆ E × E is an acyclic relation between events consisting of:
  • a total order on E_p for every p ∈ P, and
  • s < r, whenever m(s) = r
UML sequence diagrams within actors
Recall the formal definition of MSC
• order of events (message occurrence) within a process (lifeline) is a total order
• the reflexive-transitive closure of < (denoted as <*) is a partial order on the complete set E
• enough for the definition of an untimed model of computation
• different possibilities were explored and integrated as a library of directors on an extended version of Ptolemy II
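The ordering relation of an MSC can be sketched as a directed graph over events: one edge between consecutive events on each lifeline, plus one edge from each send event to its matching receive event. If the graph is acyclic, any topological order is an execution consistent with <*. The function names and the event encoding below are assumptions made for this illustration; they do not describe the Ptolemy II directors mentioned above.

```python
from collections import deque

def build_order(lifelines, messages):
    """lifelines: process -> list of event ids in lifeline (total) order
    messages:  send event id -> matching receive event id (the mapping m)
    Returns (events, successors) describing the relation <."""
    events = [e for evs in lifelines.values() for e in evs]
    successors = {e: set() for e in events}
    for evs in lifelines.values():
        for earlier, later in zip(evs, evs[1:]):
            successors[earlier].add(later)
    for send, receive in messages.items():
        successors[send].add(receive)
    return events, successors

def linearize(events, successors):
    """Return one event order consistent with <*, or None if < is cyclic."""
    indegree = {e: 0 for e in events}
    for targets in successors.values():
        for target in targets:
            indegree[target] += 1
    queue = deque(e for e in events if indegree[e] == 0)
    order = []
    while queue:
        event = queue.popleft()
        order.append(event)
        for target in sorted(successors[event]):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    return order if len(order) == len(events) else None

# p sends a to q, then q sends b back to p.
valid = linearize(*build_order(
    {"p": ["s1", "r2"], "q": ["r1", "s2"]},
    {"s1": "r1", "s2": "r2"},
))

# p would have to receive b before sending a, which q needs before sending b.
invalid = linearize(*build_order(
    {"p": ["r2", "s1"], "q": ["r1", "s2"]},
    {"s1": "r1", "s2": "r2"},
))
```

The second chart is rejected because its relation contains a cycle, so it violates the acyclicity requirement of the definition.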
Case study
Application modeling based on behavioral patterns and polymorphic type systems
• constraints to concurrent execution described using UML
• application functionality described using actors
System Validation
Joint validation of application and platform models
Application is back-annotated with communication costs and power consumption data from the execution of platform models
[Figure: MAPPER]
System Validation
Mapping heuristic becomes a design decision
• experiments show that in large multiprocessor platforms, the average communication latency of a given application can vary by 2 orders of magnitude according to the employed mapping heuristic
[Figure: MAPPER]
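A small sketch of why the mapping heuristic matters. It places a 16-task pipeline on a 4x4 mesh using three placement rules and compares the volume-weighted average hop count, used here as a crude proxy for communication latency. The placement rules, the mesh size and the hop-count metric are all assumptions for illustration; they are not the heuristics or the experiments reported in the talk, and hop count alone will not show a gap of two orders of magnitude because it ignores contention.

```python
def hops(p, q):
    """Manhattan distance between two (row, column) mesh positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def average_hops(flows, placement):
    """Volume-weighted average hop count over all flows."""
    total_volume = sum(flows.values())
    weighted = sum(volume * hops(placement[src], placement[dst])
                   for (src, dst), volume in flows.items())
    return weighted / total_volume

def row_major(n_tasks, width):
    return {t: divmod(t, width) for t in range(n_tasks)}

def snake(n_tasks, width):
    """Row-major, but every other row is traversed right to left."""
    placement = {}
    for t in range(n_tasks):
        row, col = divmod(t, width)
        placement[t] = (row, col if row % 2 == 0 else width - 1 - col)
    return placement

def strided(n_tasks, width, stride=7):
    """Scatter tasks; stride must be coprime with n_tasks."""
    return {t: divmod((t * stride) % n_tasks, width) for t in range(n_tasks)}

# Pipeline of 16 tasks, each sending one unit of data to its successor.
flows = {(t, t + 1): 1 for t in range(15)}

results = {
    "snake": average_hops(flows, snake(16, 4)),
    "row_major": average_hops(flows, row_major(16, 4)),
    "strided": average_hops(flows, strided(16, 4)),
}
```

Even on this tiny example the three rules give clearly different averages for the same application and platform.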
Case study
Application and platform modeling on Ptolemy II
• multiple MoCs
• supports the evaluation of how different platforms execute a given application
• supports the evaluation of different mapping heuristics
• synthetic example: autonomous vehicle
Conclusions and Future Work
Extensions to the state-of-the-art on system-level specification and validation
• combination of formalism, simulation/execution and analytical solution
Extensions on application and platform models allow for richer mapping mechanisms
• temporal constraint based on concurrent MoC and explicit ordering
• spatial constraint based on functional types
Conclusions and Future Work
Further progress must be made in mapping to fully exploit information in application and platform models
• time and concurrency
• functional constraints
Must explore different implementation styles, aiming to achieve more reliable platforms
• static vs. dynamic mapping
• centralised vs. distributed mapping