William Sandqvist william@kth.se

What simplifications could a compiler, or you, make without sacrificing fast execution?

5-7 Code optimization

#define MAX 10
int a[MAX], b[MAX], c[MAX], x[MAX], y[MAX];
int i, j, r, s;
. . .
int f(int a, int b)
{
    int z;
    z = 2 * a - b;
    return z;
}

int g(int a, int b, int c)
{
    int z;
    z = a * c - c * b;
    return z;
}

Two functions f and g

What code optimization can the compiler do?

-O, -O0, -O1, -O2, -O3, -Os ?

With -O0 you have to do all optimizations yourself (in GCC, plain -O means the same as -O1).

Optimization flags

-O0    No optimization (default setting)
-O1    Optimize for size (plain -O means the same in GCC)
-O2    Optimize for speed and enable some optimization
-O3    Enable all optimizations as O2, and intensive loop optimizations
-Os    Optimize for size

Two for loops

. . .
for (i = 0; i <= MAX - 1; i++) {
    x[i] = f(a[i], b[i]);
}
s = 2 * r;
for (j = 0; j <= MAX - 1; j++) {
    y[j] = s * g(a[j], b[j], c[j]);
}

What can be done?

We want shorter execution time without increasing the code size!

Loop integration

The two loops have the same range (0 to MAX - 1) and no data dependency between them (x is written only in loop 1, y only in loop 2).

The loops can be integrated into one, which saves loop overhead (only i is needed as loop counter)!

s = 2 * r;
for (i = 0; i <= MAX - 1; i++) {
    x[i] = f(a[i], b[i]);
    y[i] = s * g(a[i], b[i], c[i]);
}
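The claim that integrating the loops does not change the result can be checked with a small sketch. The function names two_loops and one_loop are hypothetical, chosen only for this illustration; f and g are the functions from the exercise.

```c
#define MAX 10

int f(int a, int b) { return 2 * a - b; }
int g(int a, int b, int c) { return a * c - c * b; }

/* Original form: two separate loops with two loop counters. */
void two_loops(const int a[], const int b[], const int c[], int r,
               int x[], int y[])
{
    int i, j, s;
    for (i = 0; i <= MAX - 1; i++) {
        x[i] = f(a[i], b[i]);
    }
    s = 2 * r;
    for (j = 0; j <= MAX - 1; j++) {
        y[j] = s * g(a[j], b[j], c[j]);
    }
}

/* Integrated form: one loop, one counter, same results. */
void one_loop(const int a[], const int b[], const int c[], int r,
              int x[], int y[])
{
    int i, s;
    s = 2 * r;
    for (i = 0; i <= MAX - 1; i++) {
        x[i] = f(a[i], b[i]);
        y[i] = s * g(a[i], b[i], c[i]);
    }
}
```

Calling both functions with the same a, b, c and r should fill x and y with identical values, since neither loop reads what the other one writes.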

Precalculation at compile time

s = 2 * r;
for (i = 0; i <= 9; i++) {
    x[i] = f(a[i], b[i]);
    y[i] = s * g(a[i], b[i], c[i]);
}

The defined constant MAX is used as MAX - 1 in the loop. MAX - 1 can be precalculated as 10 - 1 = 9 at compile time!

Algebraic simplification

$z = a \cdot c - c \cdot b = c \cdot (a - b)$

Rewriting function g can save one multiplication operation:

int g(int a, int b, int c)
{
    int z;
    z = c * (a - b);
    return z;
}

Operations before: mul, mul, sub. Operations after: sub, mul.
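A minimal sketch that compares the two versions of g over a small range of inputs. The names g_original, g_simplified and g_versions_agree are hypothetical, and the check assumes values small enough that no int overflow occurs.

```c
/* Original: two multiplications and one subtraction. */
int g_original(int a, int b, int c)
{
    return a * c - c * b;
}

/* Simplified: one subtraction and one multiplication. */
int g_simplified(int a, int b, int c)
{
    return c * (a - b);
}

/* Returns 1 if both versions agree for all a, b, c in [-limit, limit]. */
int g_versions_agree(int limit)
{
    for (int a = -limit; a <= limit; a++)
        for (int b = -limit; b <= limit; b++)
            for (int c = -limit; c <= limit; c++)
                if (g_original(a, b, c) != g_simplified(a, b, c))
                    return 0;
    return 1;
}
```

For example, g_original(7, 3, 5) and g_simplified(7, 3, 5) both give 20.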

Inlining of functions

Both functions f and g are "short" and their code could be inserted directly in the loop.

int a[10], b[10], c[10], x[10], y[10];
int i, r, s;
s = 2 * r;
for (i = 0; i <= 9; i++) {
    x[i] = 2 * a[i] - b[i];
    y[i] = s * ((a[i] - b[i]) * c[i]);
}

Loop unrolling would give shorter execution time, but it would also increase the code size, so it can't be used in this case.
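As an alternative to inserting the code by hand, the functions can be declared static inline (C99) so that the compiler is allowed to do the insertion itself. This is a sketch under that assumption; whether the compiler actually inlines depends on the optimization level. The names compute and compute_by_hand are hypothetical.

```c
#define MAX 10

/* Short functions that an optimizing compiler may insert at the call site. */
static inline int f(int a, int b) { return 2 * a - b; }
static inline int g(int a, int b, int c) { return c * (a - b); }

/* Loop that still calls f and g; inlining is left to the compiler. */
void compute(const int a[], const int b[], const int c[], int r,
             int x[], int y[])
{
    int s = 2 * r;
    for (int i = 0; i < MAX; i++) {
        x[i] = f(a[i], b[i]);
        y[i] = s * g(a[i], b[i], c[i]);
    }
}

/* Same loop with the function bodies inserted by hand. */
void compute_by_hand(const int a[], const int b[], const int c[], int r,
                     int x[], int y[])
{
    int s = 2 * r;
    for (int i = 0; i < MAX; i++) {
        x[i] = 2 * a[i] - b[i];
        y[i] = s * ((a[i] - b[i]) * c[i]);
    }
}
```

Both functions should produce the same x and y, so the source can keep the readable function calls.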

5-2 Register lifetime

A processor has this instruction type: op R1, R2, R3, where all three registers must be different. Code to run:

u = c + d;   (1)
v = a - b;   (2)
w = a - u;   (3)
x = v + e;   (4)

How many registers are needed?

Register Life Time Graph

u = c + d;   (1)
v = a - b;   (2)
w = a - u;   (3)
x = v + e;   (4)

Four registers are needed!

Data Flow Graph

A Data Flow Graph can detect data dependencies.

(1) must be before (3)

(2) must be before (4)

(2) and (3) can swap execution order!

u = c + d;   (1)
v = a - b;   (2)
w = a - u;   (3)
x = v + e;   (4)

New Register Life Time Graph

New instruction order:

u = c + d;   (1)
w = a - u;   (2')
v = a - b;   (3')
x = v + e;   (4)

Now only 3 registers are needed, a saving of 25%.
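That the new instruction order computes the same values can be checked with a short sketch. The struct Results and the two function names are hypothetical; the statements are the four from the exercise.

```c
typedef struct { int u, v, w, x; } Results;

/* Original order: (1), (2), (3), (4). */
Results original_order(int a, int b, int c, int d, int e)
{
    Results r;
    r.u = c + d;     /* (1) */
    r.v = a - b;     /* (2) */
    r.w = a - r.u;   /* (3) needs u from (1) */
    r.x = r.v + e;   /* (4) needs v from (2) */
    return r;
}

/* New order: (1), (2'), (3'), (4). Statements (2) and (3) are swapped. */
Results new_order(int a, int b, int c, int d, int e)
{
    Results r;
    r.u = c + d;     /* (1)  */
    r.w = a - r.u;   /* (2') still after (1) */
    r.v = a - b;     /* (3') still before (4) */
    r.x = r.v + e;   /* (4)  */
    return r;
}
```

Both dependencies, (1) before (3) and (2) before (4), are respected in the new order, so all four results are unchanged.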

5-8 CDFG

y = 0;
if (mode == 1) {
    for (i = 0; i < 5; i++) {
        y += a[i] * b[i];
    }
}

a) Control and Data Flow Graph (CDFG)

b) Multiplication takes 3 cycles, all other instructions take 1 cycle. Best/worst execution time?

mode = 0: T_Best = 1 + 1 = 2

mode = 1: T_Worst = 1 + 1 + 1 + (5 + 1) + 5*4 + 5 = 34

Loop body (multiply, then add): T = 3 + 1 = 4

Multiply-Accumulate operation

c) MAC-unit!

R1 = R1 + R2 * R3 in one cycle!

y += a[i] * b[i]; /* one cycle */

Loop body with MAC: T = 1

T_Worst = 1 + 1 + 1 + (5 + 1) + 5*1 + 5 = 19

19/34 = 0.56. With a MAC unit, the execution time is 56% of that of the ordinary processor.
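The cycle counts can be reproduced with a small sketch. The function name cycles is hypothetical, and the mapping of each term in the sum to a statement (y = 0, test of mode, i = 0, loop tests, loop bodies, increments) is an assumed reading of the slide's formula.

```c
/* Cycle count of the code in 5-8.
   body_cycles is 4 without a MAC unit (mul 3 + add 1) and 1 with one. */
int cycles(int mode, int iterations, int body_cycles)
{
    int t = 1 + 1;                      /* y = 0 and the test of mode */
    if (mode == 1) {
        t += 1;                         /* i = 0 */
        t += iterations + 1;            /* loop test, one extra to exit */
        t += iterations * body_cycles;  /* y += a[i] * b[i] */
        t += iterations;                /* i++ */
    }
    return t;
}
```

With these assumptions, cycles(0, 5, 4) gives 2, cycles(1, 5, 4) gives 34 and cycles(1, 5, 1) gives 19.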

Processes on a CPU

Scheduling states of a process

Priority Driven Scheduling

• Each process has a fixed priority

• The ready process with the highest priority executes

• A process executes until completion or preemption by a higher-priority process

Examples of sampling frequencies and execution periods.

Device            Sampling frequency   Process period
GPS sensor        20 Hz                1/20 s = 50 ms
Speed sensor      1 kHz                1/1000 s = 1 ms
Joystick          500 Hz               1/500 s = 2 ms
Actuator servo    2000 Hz              1/2000 s = 0.5 ms

In an RTOS, tasks will often run periodically with different process periods.
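The process periods follow directly from period = 1 / frequency. A minimal sketch, with the hypothetical helper name period_ms:

```c
/* Process period in milliseconds for a sampling frequency in hertz. */
double period_ms(double frequency_hz)
{
    return 1000.0 / frequency_hz;
}
```

For the four devices above, period_ms(20.0), period_ms(1000.0), period_ms(500.0) and period_ms(2000.0) give 50, 1, 2 and 0.5 ms.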

Task Triplet

P(max execution time, period, deadline)

deadline <= period

RMS: deadline = period (simplification)

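6-2 Processor utilization and feasible scheduling

Task triplet: P(execution time, period, deadline), deadline = period

P1(3, 9, 9) P2(1, 2, 2) P3(1, 6, 6)

Timeline = least common multiple of the process periods:

$9 = 3 \cdot 3,\quad 2 = 2,\quad 6 = 2 \cdot 3 \quad\Rightarrow\quad \mathrm{lcm}(9, 2, 6) = 3 \cdot 3 \cdot 2 = 18$

CPU utilization (is $U \le 100\%$?):

$U = \sum_{i=1}^{n} \frac{\text{execution time}_i}{\text{period}_i} = \frac{3}{9} + \frac{1}{2} + \frac{1}{6} = \frac{6 + 9 + 3}{18} = 1$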
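The timeline length and the utilization can be computed with integer arithmetic only. This is a sketch; the type Task and the functions gcd, lcm, timeline and busy_time are hypothetical names, and deadline = period is assumed.

```c
typedef struct { int exec, period; } Task;   /* deadline = period */

/* Greatest common divisor (Euclid's algorithm). */
int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

int lcm(int a, int b)
{
    return a / gcd(a, b) * b;
}

/* Timeline = least common multiple of all process periods. */
int timeline(const Task t[], int n)
{
    int len = 1;
    for (int k = 0; k < n; k++)
        len = lcm(len, t[k].period);
    return len;
}

/* Busy time units within one timeline; U = busy_time / timeline. */
int busy_time(const Task t[], int n)
{
    int len = timeline(t, n);
    int busy = 0;
    for (int k = 0; k < n; k++)
        busy += t[k].exec * (len / t[k].period);
    return busy;
}
```

For P1(3, 9, 9), P2(1, 2, 2), P3(1, 6, 6), timeline gives 18 and busy_time gives 18, so U = 18/18 = 1.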

Rate Monotonic Scheduling

RMS guarantee: a feasible schedule exists if

$U < n \left( 2^{1/n} - 1 \right)$

n = 3 gives U < 0.78. In this case U = 1, so there is no guarantee!

RMS: the process with the shortest period is assigned the highest priority, and so on.

(Limit: $n \to \infty$ gives U < 69%)
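The bound n(2^(1/n) - 1) can be evaluated with a short sketch. The function names nth_root_of_two and rms_bound are hypothetical, and the root is computed with Newton's method here only to keep the example free of library dependencies.

```c
/* n-th root of 2 by Newton's method on x^n - 2 = 0.
   The start value 1 + 1/n lies at or above the root, so the
   iteration converges from above. */
double nth_root_of_two(int n)
{
    double x = 1.0 + 1.0 / n;
    for (int iter = 0; iter < 50; iter++) {
        double p = 1.0;                  /* p = x^(n-1) */
        for (int k = 0; k < n - 1; k++)
            p *= x;
        x -= (p * x - 2.0) / (n * p);    /* Newton step */
    }
    return x;
}

/* Utilization bound below which RMS guarantees a feasible schedule. */
double rms_bound(int n)
{
    return n * (nth_root_of_two(n) - 1.0);
}
```

rms_bound(3) is about 0.78, and for large n the value approaches ln 2, about 0.69.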

RMS figure

Priorities: P2 > P3 > P1 (2 < 6 < 9)

P1 misses the deadline! No feasible schedule with RMS!

Earliest Deadline First Scheduling

EDF guarantee: a feasible schedule exists if $U \le 1$.

In this case U = 1, so EDF will produce a feasible schedule.
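Both results for the task set in 6-2 can be checked with a small time-stepped simulation. This is a sketch under stated assumptions: preemption happens only at whole time units, deadline = period, and ties are broken in favour of the task listed first. The names Task, Policy, MAX_TASKS and first_deadline_miss are hypothetical.

```c
#define MAX_TASKS 8

typedef struct { int exec, period; } Task;   /* deadline = period */
enum Policy { RMS, EDF };

/* Simulates preemptive scheduling in steps of one time unit.
   Returns the time of the first missed deadline, or -1 if no deadline
   is missed within [0, horizon]. Assumes n <= MAX_TASKS. */
int first_deadline_miss(const Task t[], int n, int horizon, enum Policy policy)
{
    int remaining[MAX_TASKS] = {0};   /* work left in the current job */
    int deadline[MAX_TASKS]  = {0};   /* absolute deadline of that job */

    for (int now = 0; now < horizon; now++) {
        /* Release a new job of every task whose period starts now. */
        for (int k = 0; k < n; k++) {
            if (now % t[k].period == 0) {
                if (remaining[k] > 0)
                    return now;       /* previous job missed its deadline */
                remaining[k] = t[k].exec;
                deadline[k]  = now + t[k].period;
            }
        }
        /* Pick the ready job with the smallest key:
           the period for RMS, the absolute deadline for EDF. */
        int pick = -1, best = 0;
        for (int k = 0; k < n; k++) {
            if (remaining[k] == 0)
                continue;
            int key = (policy == EDF) ? deadline[k] : t[k].period;
            if (pick < 0 || key < best) {
                pick = k;
                best = key;
            }
        }
        if (pick >= 0)
            remaining[pick]--;        /* run the job for one time unit */
    }
    /* Jobs whose deadline coincides with the horizon must be finished. */
    for (int k = 0; k < n; k++)
        if (remaining[k] > 0 && deadline[k] <= horizon)
            return deadline[k];
    return -1;
}
```

With the tasks {3, 9}, {1, 2}, {1, 6} and a horizon of 18 (the timeline), the RMS run returns 9, the deadline that P1 misses, and the EDF run returns -1.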

6.3 Scheduling and semaphores

P(execution time, period, deadline)

P1(1, 3, 3) P2(1, 4, 4) P3(2, 6, 6)

$3 = 3,\quad 4 = 2 \cdot 2,\quad 6 = 2 \cdot 3 \quad\Rightarrow\quad \mathrm{lcm}(3, 4, 6) = 3 \cdot 2 \cdot 2 = 12$

P1: f1() [2] accessSem1() g1() [1] releaseSem1() y1() [1]

P2: f2() [1] g2() [1] y2() [1]

P3: f3() [1] accessSem1() g3() [2] releaseSem1() y3() [1]

Sem1 is a binary semaphore. accessSem1() and releaseSem1() take 0 time.

RMS: P1 > P2 > P3 (3 < 4 < 6)

$U = \frac{1}{3} + \frac{1}{4} + \frac{2}{6} = \frac{4 + 3 + 4}{12} = \frac{11}{12}$

RMS with no critical sections

RMS with critical sections