Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip

Alexandru Andrei

Embedded Systems Laboratory, Linköping University, Sweden

GSM Phone: Search, Radio Link Control, Talking

MP3 player

Digital Camera: Take Photo, Restore Photo

...

High performance, low power, predictable

Design Flow

[Figure: design flow. Inputs: hardware platform (CPU0, CPU1 and ASIC0 connected by a bus) and software application(s). Steps: extract task graph, extract task parameters, optimize, implement. Further labels in the figure: Formal, Simulation.]

Example application code, shown in the figure split into three fragments of the task graph (tasks are annotated with deadlines, dl):

for (i=0;i<99;i++) x=x+a[i];
for (j=0;j<100;j++) y=y+b[j];
if (x<y) z=y;

Extracted task parameters:
• Worst-case execution times
• Task power

Application Model

[Figure: task graph; two of the tasks are annotated with deadlines (dl).]

Hardware Architecture

[Figure: three CPUs, each with a cache and a private memory, connected by a bus to an interrupt device, a semaphore device and a shared memory.]

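Execution Model

[Figure: CPU1 and CPU2, each with a cache and a private memory (Private Mem1, Private Mem2), connected by a bus to a shared memory. The original task graph (TG) has an edge from task 1 to task 2; s is the shared-memory buffer between them.]

Instructions 1: copy(x,s), comp(x)

Instructions 2: copy(s,y), use(y)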
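The two instruction sequences can be mimicked in plain C. This is only a sketch under stated assumptions: the array `s` stands in for the shared memory, the local arrays `x` and `y` stand in for private memory, task 1 is assumed to compute `x` before copying it to `s`, and the bodies of `comp` and `use` (doubling and summing) are placeholders that do not come from the slides.

```c
#include <string.h>

#define N 4

static int s[N];   /* stands in for the shared-memory buffer s */

/* Task 1: comp(x) in private memory, then copy(x,s) over the bus. */
void task1(const int *input)
{
    int x[N];                        /* private to task 1 */
    for (int i = 0; i < N; i++)
        x[i] = 2 * input[i];         /* comp(x): placeholder computation */
    memcpy(s, x, sizeof x);          /* copy(x,s): explicit communication */
}

/* Task 2: copy(s,y) over the bus, then use(y) in private memory. */
int task2(void)
{
    int y[N];                        /* private to task 2 */
    int sum = 0;
    memcpy(y, s, sizeof y);          /* copy(s,y): explicit communication */
    for (int i = 0; i < N; i++)
        sum += y[i];                 /* use(y): placeholder use */
    return sum;
}
```

Only the two `memcpy` calls touch the shared buffer, so they are the only explicit communication; every other memory access stays in private memory and reaches the bus only through cache misses.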

Task Model

Original TG: task i -> task j

Extended TG, with explicit communication: task i -> wi -> rj -> task j
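The step from the original to the extended task graph can be sketched as a small graph transformation. The node numbering, the `kind` enum and the function name `extend_task_graph` are illustrative assumptions, as is the reading that every edge i -> j receives one write task after i and one read task before j.

```c
#include <stddef.h>

enum kind { COMPUTE, WRITE_SHARED, READ_SHARED };

struct edge { int from, to; };

/* Hypothetical layout: nodes 0..n_tasks-1 are the original tasks; original
 * edge e adds a write node n_tasks+2e and a read node n_tasks+2e+1.
 * out must hold 3*n_edges edges, kinds must hold n_tasks+2*n_edges entries.
 * Returns the number of edges in the extended graph. */
size_t extend_task_graph(const struct edge *orig, size_t n_edges, int n_tasks,
                         struct edge *out, enum kind *kinds)
{
    for (int t = 0; t < n_tasks; t++)
        kinds[t] = COMPUTE;

    for (size_t e = 0; e < n_edges; e++) {
        int w = n_tasks + 2 * (int)e;   /* write task of the producer */
        int r = w + 1;                  /* read task of the consumer */
        kinds[w] = WRITE_SHARED;
        kinds[r] = READ_SHARED;
        out[3 * e]     = (struct edge){ orig[e].from, w };
        out[3 * e + 1] = (struct edge){ w, r };
        out[3 * e + 2] = (struct edge){ r, orig[e].to };
    }
    return 3 * n_edges;
}
```

For a single edge 0 -> 1 between two tasks, the result is the chain 0 -> 2 -> 3 -> 1, where node 2 is the write task and node 3 the read task.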

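Motivational Example

[Figure: task 1 runs on CPU1 (private memory PMem1); task 2, followed by the explicit communication w2, runs on CPU2 (private memory PMem2). Both CPUs reach the shared memory ShMem over the bus.]

WCET: task 1 = 60; task 2 = 25; w2 = 12

Tasks 1 and 2 have a deadline at time 63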

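Motivational Example (2)

[Figure: Gantt chart of CPU1 (task 1), CPU2 (task 2, then w2) and the bus. Cache misses M1 to M5 cause implicit communication on the bus (transfers I1 to I5); w2 is explicit communication. The last time mark shown is 57, before the deadline dl = 63.]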

Motivational Example (3)

Using a FCFS bus arbiter

[Figure: Gantt chart of CPU1 (task 1), CPU2 (task 2, then w2) and the bus, with cache misses M1 to M5 and bus transfers I1 to I5 and w2. The last time mark shown is 67, after the deadline dl = 63.]

Deadline violation!

Motivational Example (4)

Using a bus schedule

[Figure: Gantt chart of CPU1 (task 1), CPU2 (task 2, then w2) and the bus, with cache misses M1 to M4 and bus transfers I1 to I5 and w2. The last time mark shown is 57, before the deadline dl = 63.]

Motivational Example Message

In multiprocessor systems, the WCET depends on the bus load!

In multiprocessor systems, the WCET depends on the schedule!

In multiprocessor systems, the schedule depends on the WCET!

Implicit Communication

Benchmark   Bus Utilization   Impl. Communication
GSM 1)      12%               39%
MP3 2)      26%               42%
MP3 3)      49%               86%

Setup: ARM7 cores, ST bus protocol
1) Icache: 4096b, Dcache: 1024b
2) Icache: 4096b, Dcache: 1024b
3) Icache: 16b, Dcache: 256b

WCET Analysis

Difficult both for single and multiprocessor systems

Single processor tools: SymTA/P, AbsInt aiT

Handle instruction and data caches

Basic idea: enumerate all the possible paths of the program (CFG) and always consider the longest one
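The basic idea can be sketched in a few lines of C. This is a minimal illustration, not how SymTA/P or aiT work internally: it assumes a loop-free CFG in which each loop has already been collapsed into one block whose cost is the body cost times the loop bound, and the `cfg` struct and `longest_path` name are hypothetical.

```c
#define MAX_NODES 16

struct cfg {
    long cost[MAX_NODES];      /* worst-case cycles of each block */
    int  succ[MAX_NODES][2];   /* up to two successors, -1 = none */
};

/* Cost of the longest path from block v to any exit, found by following
 * every path. Assumes the graph is acyclic (loops already bounded). */
long longest_path(const struct cfg *g, int v)
{
    long best = 0;
    for (int k = 0; k < 2; k++) {
        int s = g->succ[v][k];
        if (s >= 0) {
            long c = longest_path(g, s);
            if (c > best)
                best = c;
        }
    }
    return g->cost[v] + best;
}
```

For a diamond-shaped CFG with block costs 2, 5, 3 and 1 (entry, two branches, exit), the function follows both branches and returns 2 + 5 + 1 = 8. Plain enumeration is exponential in the number of branches, which is one reason real tools rely on more compact formulations.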

WCET Analysis Flow

[Figure: WCET analysis flow. Inputs: source files and binary file. Boxes: abstract syntax tree generation, data dependency analysis, data flow analysis, data address extraction, data cache analysis, CFG construction, instruction address extraction, program segment simulation, instruction cache analysis. Outputs: annotated CFG, then WCET.]

WCET Analysis: Example

void foo() {
  int i, temp;
  for (i=0; i<100; i++) {
    temp=a[i];
    a[temp]=0;
  }
}

WCET Analysis: CFG

1: void foo() {
2:   int i, temp;
3:   for (i=0;
4:        i<N;
5:        i++) {
6:     temp=a[i];
7:     a[temp]=0;
8:   }
9: }

CFG nodes (Lno = source lines covered by the node):
id: 2
id: 17, Lno: 3,4,9
id: 12, Lno: 3,4,6
id: 4
id: 13, Lno: 6,7,5,4,6
id: 16, Lno: 6,7,5,4,8
id: 11

WCET Analysis: CFG

CFG nodes (Lno = source lines covered by the node):
id: 2
id: 17, Lno: 3,4
id: 12, Lno: 3,4
id: 4, annotated with the loop bound (for example N=100)
id: 13, Lno: 6,7,5,4,6
id: 16, Lno: 6,7,5,4,8
id: 11

Control nodes: 2, 4, 11

Basic blocks: 12, 17, 13, 16

WCET Analysis with Instruction Cache

Generate the address traces for each program block

Always assume a miss at the beginning of each block

Use a cache simulator to get the cache hit/miss ratio for each block
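A per-block cache simulation of this kind can be sketched as follows. The cache geometry (`N_LINES`, `LINE_BYTES`), the direct-mapped organisation and the function name `count_misses` are assumptions for illustration; the slides do not specify the simulator. Starting every block with an empty cache is how the sketch models "always a miss at the beginning of each block".

```c
#include <stddef.h>
#include <stdint.h>

#define N_LINES    8    /* hypothetical: 8 cache lines */
#define LINE_BYTES 16   /* hypothetical: 16-byte lines */

/* Count misses for one block's address trace in a direct-mapped cache
 * that is empty when the block starts (pessimistic cold start). */
size_t count_misses(const uint32_t *trace, size_t len)
{
    uint32_t tag[N_LINES];
    int valid[N_LINES] = {0};
    size_t misses = 0;

    for (size_t i = 0; i < len; i++) {
        uint32_t line = trace[i] / LINE_BYTES;  /* memory line number */
        uint32_t idx  = line % N_LINES;         /* cache line it maps to */
        if (!valid[idx] || tag[idx] != line) {  /* cold or conflict miss */
            misses++;
            valid[idx] = 1;
            tag[idx] = line;
        }
    }
    return misses;
}
```

The miss count times the miss penalty, plus the hit cost of the remaining accesses, gives the block's cache-aware execution time. With these parameters, addresses 0x100 and 0x180 map to the same cache line and evict each other.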

We can do better

WCET Analysis with ICache: Unrolled CFG

1: void foo() {
2:   int i, temp;
3:   for (i=0;
4:        i<100;
5:        i++) {
6:     temp=a[i];
7:     a[temp]=0;
8:   }
9: }

Unrolled CFG nodes (Lno = source lines covered by the node):
id: 2
id: 17, Lno: 3,4
id: 12, Lno: 3,4
id: 4
id: 13, Lno: 6,7,5,4,6
id: 104
id: 13, Lno: 6,7,5,4,6 (second copy)
id: 16, Lno: 6,7,5,4,8
id: 11

WCET Analysis with ICache: Unrolled CFG

[Figure: the unrolled CFG (nodes 2, 17, 12, 4, 13, 104, a second copy of 13, 16 and 11) annotated with cache misses; (i) marks an instruction cache miss and (d) a data cache miss.]

Annotations shown next to the blocks:

miss lno 3 (i), miss lno 3 (d), lno 3, miss lno 4 (i), lno 4

miss lno 6 (d), miss lno 6 (i), lno 6, miss lno 7 (i), miss lno 7 (d), lno 7, miss lno 5 (i), lno 5, 4

miss lno 6 (d), lno 6, miss lno 7 (d), lno 7, 5, 4

WCET Analysis: Multiprocessor

Cache miss penalty is constant in the single-processor case

Cache miss penalty is variable in the multiprocessor case

Predictable MPSoC Bus Access

Partition the bus period into bus slots (TDMA)

Assign bus slots to the processors

The bus arbiter grants the bus to a processor only during its allocated slots

Eliminates the bus interference

Not flexible: an idle bus slot cannot be used by another processor
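With such a slot table, the time at which a cache-miss transfer is served becomes a pure function of the request time. The sketch below assumes a periodic table sorted by start time, non-negative times, and a transfer that must fit entirely inside one slot; the `slot` struct and the name `bus_grant_time` are hypothetical.

```c
struct slot {
    long start, end;   /* slot covers [start, end) within one period */
    int  owner;        /* CPU that owns the slot */
};

/* Earliest time >= t at which `cpu` can start a transfer of `len` time
 * units that fits entirely inside one of its own slots. Returns -1 if no
 * slot of `cpu` is long enough. */
long bus_grant_time(const struct slot *tbl, int n, long period,
                    long t, int cpu, long len)
{
    long base = (t / period) * period;   /* start of the current period */

    /* Two periods suffice: in the second one every slot lies after t. */
    for (int p = 0; p < 2; p++, base += period) {
        for (int i = 0; i < n; i++) {
            if (tbl[i].owner != cpu)
                continue;
            long s = base + tbl[i].start;
            long e = base + tbl[i].end;
            long begin = t > s ? t : s;
            if (begin + len <= e)
                return begin;
        }
    }
    return -1;
}
```

With a period of 16 split into [0,8) for CPU1 and [8,16) for CPU2, a 4-unit request from CPU1 at time 6 no longer fits in its current slot and is granted at time 16, regardless of whether CPU2 actually uses the bus in between. That is the inflexibility noted above, and also what makes the miss penalty predictable.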

Analysis & Bus Access

[Figure: the unrolled CFG of foo() with its cache-miss annotations, as on the slide "WCET Analysis with ICache: Unrolled CFG", shown against the bus schedule.]

Bus schedule: slot owners CPU1, CPU2, CPU1, CPU2, CPU2, CPU1, ...; slot boundaries at times 0, 8, 16, 24, 32, 42, 52

Multiprocessor Analysis and Optimization

In multiprocessor systems, the WCET depends on the schedule!

In multiprocessor systems, the schedule depends on the WCET!

Overall Approach

[Figure: task graph with five tasks and the resulting schedule on CPU1, CPU2, CPU3 and the bus.]

Task mapping:
CPU1: 1, 4
CPU2: 2
CPU3: 3, 5

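Overall Approach

[Flowchart; the symbols that name the task set and the time instant were lost in extraction.]

Take the set of all tasks that are active at time t

Select bus schedule B for the time interval starting at t

Determine the WCET of the tasks from this set

Find the earliest time at which a task from the set finishes

Schedule a new task at a time t >= this earliest finishing time

Loop labels in the flowchart: "Bus schedule optimization" and "New task to schedule"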


Bus Schedule: BSA1

Slot table (over a period):
slot_start   owner
t0           CPU1
t1           CPU2
t2           CPU1
t3           CPU2
...          ...

Timeline: t0 CPU1 | t1 CPU2 | t2 CPU1 | t3 CPU2 | t4 ...

Bus Schedule: BSA2

Segment table (over a period):
seg_start   owners   seg_size
t0          1, 2     12
t4          2, 1     7
...

Slot sizes per owner:
Segment 1: CPU1 = 1, CPU2 = 3
Segment 2: CPU1 = 2, CPU2 = 5

Timeline: t0 CPU1 | t1 CPU2 | t2 CPU1 | t3 CPU2 | t4 CPU2 | t5 CPU1 | t6 ...
(Segment 1 runs from t0 to t4; Segment 2 starts at t4.)

Bus Schedule: BSA3

Segment table (over a period):
seg_start   owners   slot_size
t0          1, 2     3
t4          2, 1     6
...         ...      ...

Timeline: t0 CPU1 | t1 CPU2 | t2 CPU1 | t3 CPU2 | t4 CPU2 | t5 CPU1 | t6 ...
(Segment 1 runs from t0 to t4; Segment 2 starts at t4.)
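A segment row of the BSA3 table can be expanded into concrete bus slots as sketched below. The sketch assumes that the owners take equally sized slots in the listed order, round-robin, until the segment ends, and that a final slot is truncated at the segment boundary. The struct layout, the name `expand_segment` and the numeric segment length used in the example are illustrative and not taken from the slides.

```c
#include <stddef.h>

#define MAX_OWNERS 4

struct segment {
    long start, end;           /* segment covers [start, end) */
    int  owners[MAX_OWNERS];   /* owner order inside the segment */
    int  n_owners;
    long slot_size;            /* one slot size for the whole segment */
};

struct slot {
    long start, end;
    int  owner;
};

/* Expand one segment into slots; returns the number of slots written. */
size_t expand_segment(const struct segment *seg, struct slot *out,
                      size_t max_out)
{
    size_t n = 0;
    long t = seg->start;

    if (seg->slot_size <= 0 || seg->n_owners <= 0)
        return 0;

    while (t < seg->end && n < max_out) {
        long e = t + seg->slot_size;
        if (e > seg->end)
            e = seg->end;      /* truncate the last slot */
        out[n].start = t;
        out[n].end = e;
        out[n].owner = seg->owners[n % (size_t)seg->n_owners];
        n++;
        t = e;
    }
    return n;
}
```

A segment of length 12 with owners 1, 2 and slot_size 3 expands to four slots owned by CPU1, CPU2, CPU1, CPU2, which matches the shape of Segment 1 in the timeline. The compact table describes many slots with three fields, so an optimizer has far fewer parameters to explore than with one row per slot as in BSA1.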

Experimental Results

[Figure: normalized schedule length (axis from 1 to 4) versus number of CPUs (2 to 20), one curve each for BSA1, BSA2, BSA3 and BSA4.]

Experimental Results

[Figure: normalized schedule length (axis from 1 to 3.5) versus number of CPUs (2 to 10). Series labels as extracted: 4.0, 3.0, 2.6, 1.2, 1.0, 1.8, 2.2, 5.0.]

Real-life Example

Smart phone: GSM voice codec (encoder + decoder) and MP3 player

64 tasks, between 100 and 2000 lines of C code per task

4 ARM7 processors, interconnected via a bus

Real-life Example

GSM + MP3, 64 tasks, 4 ARM7 processors

BSA_1   BSA_2   BSA_3   BSA_4
1.17    1.33    1.31    1.62

Conclusions

Realistic model for MPSoC

WCET analysis must be integrated in the system scheduling

Tool for system level scheduling and WCET

Tested on real applications

ARTIST

LiU

TU Braunschweig: original SymTA/P code

U. of Bologna: bus controller implementation
