Predictable Implementation of Real-Time Applications on Multiprocessor
Systems-on-Chip
Alexandru Andrei
Embedded Systems Laboratory, Linköping University, Sweden
2
GSM Phone: Search, Radio Link Control, Talking
MP3 player
Digital Camera: Take Photo, Restore Photo
...
High performance, Low power, Predictable
3
Design Flow
[Figure: design flow. Inputs: hardware platform (CPU0, CPU1 and ASIC0 connected by a bus) and software application(s). Steps shown: Extract Task Graph, Extract Task Parameters, Optimize, Formal, Simulation, Implement. dl marks the deadlines in the task graph.]
Example application code:
for (i=0;i<99;i++) x=x+a[i];
for (j=0;j<100;j++) y=y+b[j];
if (x<y) z=y;
Extracted task parameters:
•Worst case execution times
•Task power
4
Application Model
[Figure: application task graph; dl marks the deadlines.]
5
Hardware Architecture
[Figure: three CPUs, each with its own cache and private memory, connected through a bus to a shared memory, an interrupt device and a semaphore device.]
6
Execution Model
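[Figure: CPU1 and CPU2, each with a cache and a private memory (Private Mem1, Private Mem2), connected through the bus to the shared memory.]
Original TG: tasks 1 and 2, communicating through the buffer s in shared memory (x and y are the local copies)
Instructions of task 1: comp(x), copy(x,s)
Instructions of task 2: copy(s,y), use(y)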
7
Task Model
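[Figure: the original TG has an edge from task i to task j. In the extended TG the explicit communication is represented by additional tasks: wi (write) after task i and rj (read) before task j.]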
8
Motivational Example
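Tasks 1 and 2; w2 is an explicit communication task
WCET: task 1 = 60; task 2 = 25; w2 = 12
Tasks 1 and 2 have a deadline at time 63
[Figure: CPU1 with PMem1 and CPU2 with PMem2, connected through the bus to ShMem.]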
9
Motivational Example (2)
[Figure: Gantt chart for CPU1, CPU2 and the bus. Cache misses M1 to M5 (implicit communication) cause the bus transfers I1 to I5; w2 is the explicit communication. The schedule shown ends at time 57, before the deadline dl=63.]
10
Motivational Example (3)
Using a FCFS bus arbiter
[Figure: Gantt chart for CPU1, CPU2 and the bus with cache misses M1 to M5, bus transfers I1 to I5 and the explicit communication w2. The schedule shown ends at time 67.]
Deadline violation! (dl=63)
11
Motivational Example (4)
Using a bus schedule
[Figure: Gantt chart for CPU1, CPU2 and the bus with cache misses M1 to M4, bus transfers I1 to I5 and the explicit communication w2. The schedule shown ends at time 57, before the deadline dl=63.]
12
Motivational Example Message
In multiprocessor systems, the WCET depends on the bus load!
In multiprocessor systems, the WCET depends on the schedule!
In multiprocessor systems, the schedule depends on the WCET!
13
Implicit Communication
Benchmark   Bus Utilization   Impl. Communication
GSM 1)      12%               39%
MP3 2)      26%               42%
MP3 3)      49%               86%
Setup: ARM7 cores, ST bus protocol
1) Icache: 4096b, Dcache: 1024b
2) Icache: 4096b, Dcache: 1024b
3) Icache: 16b, Dcache: 256b
14
WCET Analysis
Difficult both for single and multiprocessor systems
Single processor tools: SymTA/P, AbsInt aiT
Handle instruction and data caches
Basic idea: enumerate all the possible paths of the program (CFG) and always consider the longest one
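The basic idea can be illustrated with a minimal C sketch. It is not taken from SymTA/P or aiT: it assumes the CFG is already acyclic (loops bounded and unrolled), that each block has a known non-negative worst-case cost, and that each node has at most two successors. The names cfg_node, MAX_SUCC and longest_path are hypothetical.

```c
#define MAX_SUCC 2   /* assumption: at most two successors per CFG node */

/* One node of an acyclic control-flow graph. */
struct cfg_node {
    long cost;            /* worst-case cost of the block, e.g. in cycles */
    int  succ[MAX_SUCC];  /* indices of the successor nodes, -1 if unused */
};

/*
 * Cost of the longest path from node n to any exit node.
 * memo[] must hold -1 for every node before the first call; it caches
 * results so that each node is evaluated only once instead of once per path.
 */
long longest_path(const struct cfg_node *g, int n, long *memo)
{
    if (memo[n] >= 0)
        return memo[n];

    long best = 0;  /* an exit node contributes nothing beyond its own cost */
    for (int k = 0; k < MAX_SUCC; k++) {
        int s = g[n].succ[k];
        if (s >= 0) {
            long c = longest_path(g, s, memo);
            if (c > best)
                best = c;
        }
    }
    memo[n] = g[n].cost + best;
    return memo[n];
}
```

For a diamond-shaped CFG with block costs 2, 10, 3 and 1, the call for the entry node returns 13, the cost of the path through the more expensive branch.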
15
WCET Analysis Flow
[Figure: WCET analysis flow with an instruction cache part and a data cache part. Inputs: source files and binary file. Stages shown: CFG construction, instruction address extraction, program segment simulation, instruction cache analysis, abstract syntax tree generation, data dependency analysis, data flow analysis, data address extraction, data cache analysis. Outputs: annotated CFG and WCET.]
16
WCET Analysis: Example
void foo() {
    int i, temp;
    for (i = 0; i < 100; i++) {
        temp = a[i];
        a[temp] = 0;
    }
}
17
WCET Analysis: CFG
1:void foo() {
2: int i, temp;
3: for (i=0;
4: i<N;
5: i++) {
6: temp=a[i];
7: a[temp]=0;
8: }
9:}
CFG nodes:
id: 2
id: 17, Lno: 3,4,9
id: 12, Lno: 3,4,6
id: 4
id: 13, Lno: 6,7,5,4,6
id: 16, Lno: 6,7,5,4,8
id: 11
18
WCET Analysis: CFG
CFG nodes:
id: 2
id: 17, Lno: 3,4
id: 12, Lno: 3,4
id: 4
id: 13, Lno: 6,7,5,4,6
id: 16, Lno: 6,7,5,4,8
id: 11
Control nodes: 2, 4, 11
Basic blocks: 12, 17, 13, 16
id: 4 carries the loop bound (for example N=100)
19
WCET Analysis with Instruction Cache
Generate the address traces for each program block
Always assume a miss at the beginning of each block
Use a cache simulator to get the cache hit rate/miss ratio for each block
We can do better
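As a reference point for the simple trace-based approach, here is a minimal C sketch of a cache simulator that counts the misses of one block's address trace. It makes several assumptions that are not from the slides: the cache is direct-mapped, its geometry (NUM_LINES, LINE_SIZE) is hypothetical, and every block starts with an empty cache, which is a pessimistic reading of "a miss at the beginning of each block".

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES 4u   /* hypothetical geometry: 4 cache lines ... */
#define LINE_SIZE 16u  /* ... of 16 bytes each                     */

/*
 * Count the misses of a direct-mapped cache over the address trace of one
 * program block. The cache is empty at the start of the trace.
 */
size_t count_misses(const uint32_t *trace, size_t n)
{
    uint32_t tag[NUM_LINES] = {0};
    int valid[NUM_LINES] = {0};
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t block = trace[i] / LINE_SIZE;  /* memory block number     */
        uint32_t idx = block % NUM_LINES;       /* cache line it maps to   */
        if (!valid[idx] || tag[idx] != block) {
            misses++;                           /* cold or conflict miss   */
            valid[idx] = 1;
            tag[idx] = block;                   /* load the block          */
        }
    }
    return misses;
}
```

For the trace 0, 4, 16, 0, 64, 0 this counts four misses: two cold misses, one miss when address 64 evicts the line holding address 0, and one more when address 0 is reloaded.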
20
WCET Analysis with ICache: Unrolled CFG
1:void foo() {
2: int i, temp;
3: for (i=0;
4: i<100;
5: i++) {
6: temp=a[i];
7: a[temp]=0;
8: }
9:}
Unrolled CFG nodes:
id: 2
id: 17, Lno: 3,4
id: 12, Lno: 3,4
id: 4
id: 13, Lno: 6,7,5,4,6
id: 16, Lno: 6,7,5,4,8
id: 11
id: 104
id: 13, Lno: 6,7,5,4,6
21
WCET Analysis with ICache: Unrolled CFG
Unrolled CFG nodes:
id: 2
id: 17, Lno: 3,4
id: 12, Lno: 3,4
id: 4
id: 13, Lno: 6,7,5,4,6
id: 16, Lno: 6,7,5,4,8
id: 11
id: 104
id: 13, Lno: 6,7,5,4,6
Cache miss annotations on the nodes ((i): instruction cache, (d): data cache):
miss lno 6 (d), lno 6, miss lno 7 (d), lno 7, 5, 4
miss lno 6 (d), miss lno 6 (i), lno 6, miss lno 7 (i), miss lno 7 (d), lno 7, miss lno 5 (i), lno 5, 4
miss lno 3 (i), miss lno 3 (d), lno 3, miss lno 4 (i), lno 4
22
WCET Analysis: Multiprocessor
Cache miss penalty is constant in the single processor case
Cache miss penalty is variable in the multiprocessor case
23
Predictable MPSoC Bus Access
Partition the bus period into bus slots (TDMA)
Assign bus slots to the processors
The bus arbiter grants the bus to a processor only during its allocated slots
Eliminates the bus interference
Not flexible: an idle bus slot cannot be used by another processor
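The effect of TDMA arbitration on a single bus request can be sketched in C as follows. This is a simplified illustration, not the arbiter used in the work: it assumes equal-size slots assigned round-robin (CPU k, counted from 0, owns the k-th slot of every period), a request time t >= 0, and a transfer that must fit entirely inside one slot (len <= slot). The name tdma_grant is hypothetical.

```c
/*
 * Time at which `cpu` may start a bus transfer of `len` time units that it
 * requests at time `t`, on a TDMA bus with `ncpu` equal slots of `slot`
 * time units per period.
 */
long tdma_grant(long t, int cpu, int ncpu, long slot, long len)
{
    long period = slot * ncpu;
    long round_start = (t / period) * period;
    long s = round_start + cpu * slot;  /* own slot in the current period */

    if (t <= s)
        return s;           /* own slot is still ahead: wait for it        */
    if (t + len <= s + slot)
        return t;           /* inside own slot with enough room left       */
    return s + period;      /* too late: wait for the slot in next period  */
}
```

With two CPUs and slots of 8 time units, a request by CPU 0 at time 5 for a transfer of length 4 is granted only at time 16, even if CPU 1 leaves its slot idle. The delay depends only on the request time and the slot table, not on what the other processor does, which is what makes the miss penalty analysable.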
24
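Analysis & Bus Access
Unrolled CFG nodes:
id: 2
id: 17, Lno: 3,4
id: 12, Lno: 3,4
id: 4
id: 13, Lno: 6,7,5,4,6
id: 16, Lno: 6,7,5,4,8
id: 11
id: 104
id: 13, Lno: 6,7,5,4,6
Cache miss annotations on the nodes ((i): instruction cache, (d): data cache):
miss lno 6 (d), lno 6, miss lno 7 (d), lno 7, 5, 4
miss lno 6 (d), miss lno 6 (i), lno 6, miss lno 7 (i), miss lno 7 (d), lno 7, miss lno 5 (i), lno 5, 4
miss lno 3 (i), miss lno 3 (d), lno 3, miss lno 4 (i), lno 4
Bus schedule: slot owners CPU1, CPU2, CPU1, CPU2, CPU2, CPU1, ...; slot boundaries at times 0, 8, 16, 24, 32, 42, 52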
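To show how a bus schedule enters the analysis, the following C sketch computes when a cache miss issued at a given time is served. It is an illustration under assumptions, not the authors' tool: the slot table is read off this slide (boundaries 0, 8, 16, 24, 32, 42, 52 with owners in the order listed, which may have been scrambled by the extraction), a transfer must fit entirely inside one slot, and the names bus_slot and next_grant are hypothetical.

```c
#include <stddef.h>

struct bus_slot {
    long start;  /* start time of the slot */
    int  owner;  /* CPU owning the slot    */
};

/*
 * Earliest time >= t at which `cpu` can start a transfer of `len` time
 * units inside one of its own slots. `end` is the end of the last slot.
 * Returns -1 if no slot in the table can hold the transfer.
 */
long next_grant(const struct bus_slot *tab, size_t n, long end,
                int cpu, long t, long len)
{
    for (size_t i = 0; i < n; i++) {
        long slot_end = (i + 1 < n) ? tab[i + 1].start : end;
        if (tab[i].owner != cpu)
            continue;                   /* slot belongs to another CPU */
        long begin = (t > tab[i].start) ? t : tab[i].start;
        if (begin + len <= slot_end)
            return begin;               /* transfer fits in this slot  */
    }
    return -1;
}
```

With the table above and a hypothetical transfer length of 4, a miss of CPU1 at time 2 is served immediately, while a miss at time 5 no longer fits in the slot [0, 8) and is served at time 16. The miss penalty added to a CFG node therefore depends on the time at which the node executes.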
25
Multiprocessor Analysis and Optimization
In multiprocessor systems, the WCET depends on the schedule!
In multiprocessor systems, the schedule depends on the WCET!
26
Overall Approach
[Figure: Gantt chart of tasks 1 to 5 on CPU1, CPU2 and CPU3, with the corresponding bus (BUS) schedule.]
CPU1: 1, 4
CPU2: 2
CPU3: 3, 5
27
Overall Approach
[Flowchart: a loop with two phases, bus schedule optimization and selection of a new task to schedule.]
Determine the set of all tasks that are active at time t
Select bus schedule B for the time interval starting at t
Determine the WCET of the tasks from this set
Find the earliest time at which a task from this set finishes
Schedule a new task at a time t >= that finish time
29
Bus Schedule: BSA1
[Figure: the bus schedule over a period is a sequence of slots, each with its own start time and owner: t0 CPU1, t1 CPU2, t2 CPU1, t3 CPU2, t4 ...]
slot_start   owner
t0           CPU1
t1           CPU2
t2           CPU1
t3           CPU2
...          ...
30
Bus Schedule: BSA2
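[Figure: the bus schedule over a period is divided into segments. Timeline: t0 CPU1, t1 CPU2, t2 CPU1, t3 CPU2 (Segment 1); t4 CPU2, t5 CPU1, t6 ... (Segment 2).]
Segment 1: seg_start t0, owners 1, 2, seg_size 12; slot size per owner: CPU1 1, CPU2 3
Segment 2: seg_start t4, owners 2, 1, seg_size 7; slot size per owner: CPU1 2, CPU2 5
...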
31
Bus Schedule: BSA3
[Figure: the bus schedule over a period is divided into segments. Timeline: t0 CPU1, t1 CPU2, t2 CPU1, t3 CPU2 (Segment 1); t4 CPU2, t5 CPU1, t6 ... (Segment 2).]
seg_start   owners   slot_size
t0          1, 2     3
t4          2, 1     6
...         ...      ...
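The BSA3 table can be represented compactly, as the following C sketch suggests. It is one possible encoding, not the authors' data structure: each segment stores its start, its owner order and a single slot size, and the owners are assumed to repeat round-robin until the next segment starts. The concrete start times in the usage example (t0 = 0, t4 = 12) and the names bsa3_segment, MAX_OWNERS and bsa3_owner are assumptions.

```c
#define MAX_OWNERS 4   /* assumption: at most four owners per segment */

struct bsa3_segment {
    long seg_start;           /* start time of the segment              */
    int  owners[MAX_OWNERS];  /* owner order, repeated round-robin      */
    int  n_owners;            /* number of valid entries in owners[]    */
    long slot_size;           /* common size of all slots in the segment */
};

/*
 * CPU that owns the bus at time t. Segments must be sorted by seg_start;
 * the last segment extends to the end of the period.
 * Returns -1 if t lies before the first segment.
 */
int bsa3_owner(const struct bsa3_segment *seg, int n_seg, long t)
{
    int cur = -1;
    for (int i = 0; i < n_seg; i++)
        if (seg[i].seg_start <= t)
            cur = i;                    /* last segment starting at or before t */
    if (cur < 0)
        return -1;

    long slot_index = (t - seg[cur].seg_start) / seg[cur].slot_size;
    return seg[cur].owners[slot_index % seg[cur].n_owners];
}
```

With the segments {0, {1, 2}, 2, 3} and {12, {2, 1}, 2, 6}, the owners at times 0, 3, 6, 9, 12 and 18 are CPU1, CPU2, CPU1, CPU2, CPU2 and CPU1, which matches the timeline in the figure. Compared with BSA1, the table needs one row per segment instead of one row per slot.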
32
Experimental Results
[Figure: normalized schedule length (y axis, 1 to 4) versus number of CPUs (x axis, 2 to 20) for BSA1, BSA2, BSA3 and BSA4.]
33
Experimental Results
[Figure: normalized schedule length (y axis, 1 to 3.5) versus number of CPUs (x axis, 2 to 10); curves labelled 1.0, 1.2, 1.8, 2.2, 2.6, 3.0, 4.0 and 5.0.]
34
Real-life Example
Smart phone: GSM voice codec (encoder + decoder) and MP3 player
64 tasks, between 100 and 2000 lines of C code per task
4 ARM7 processors, interconnected via a bus
35
Real-life Example
GSM + MP3, 64 tasks, 4 ARM7 processors
BSA_1   BSA_2   BSA_3   BSA_4
1.17    1.33    1.31    1.62
36
Conclusions
Realistic model for MPSoC
WCET analysis must be integrated in the system scheduling
Tool for system level scheduling and WCET
Tested on real applications
37
ARTIST
LiU
TU Braunschweig: original SymTA/P code
U. of Bologna: bus controller implementation