FAST: Frequency-Aware Static Timing Analysis

By Kiran Seth, Aravindh Anantaraman, Frank Mueller and Eric Rotenberg

Center for Embedded Systems Research, Departments of CS & ECE, North Carolina State University



Real-Time Systems

Tasks have a deadline: they must terminate on time

Classification
- Hard real-time: a missed deadline is a catastrophe
- Soft real-time: a missed deadline lowers QoS

Multi-tasking real-time systems require scheduling algorithms
- Scheduler ensures task arbitration online
- Schedulability test (a static test) ensures deadlines are met; it requires a known Worst-Case Execution Time (WCET)


Static Timing Analysis

To schedule tasks in real-time systems, need
- Worst-Case Execution Time (WCET) and
- Worst-Case Execution Cycles (WCEC)

Experimentally measured WCET gives unsafe bounds
- Due to input & hardware complexity

Use a static timing analysis toolset to obtain safe WCET bounds


Static Instruction Cache Analysis

Work explained in [Mueller RTS-J’00]

Interprocedural data-flow analysis

Predicts each cache reference as one of
- always-hit
- always-miss
- first-hit
- first-miss

Each instruction is categorized
- for each loop level
- and function (treated as a loop w/ 1 iteration)


Static Data Cache Simulation

For accurate static timing analysis
- need data cache analysis

Currently, data cache analysis tools are not accurate enough
- Too many restrictions, not general enough for real code
- Improvements by [Vera RTSS’03]

Solutions
- Treat all data accesses as hits… highly underestimated.
- Treat all data accesses as misses… highly overestimated.

Assume a cache big enough to fit the entire data set

Assume first-time accesses are misses (cold misses only), otherwise hits
- Accurate? Yes. But what if caches are smaller?
- No significant impact on this study
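The cold-miss assumption can be illustrated with a small sketch. The address trace and the 32-byte line size are hypothetical values chosen for illustration (neither appears in the slides); with a cache assumed large enough for the whole data set, only the first reference to each line misses.

```python
def count_cold_misses(addresses, line_size=32):
    """Count misses when the cache is assumed to hold the entire data set.

    Only the first access to each cache line misses (a cold miss);
    every later access to that line is treated as a hit.
    """
    seen_lines = set()
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line not in seen_lines:
            seen_lines.add(line)
            misses += 1
    return misses


# Hypothetical trace: two passes over a 16-element array of 4-byte words.
trace = [0x1000 + 4 * k for k in range(16)] * 2
cold_misses = count_cold_misses(trace)
hits = len(trace) - cold_misses
print(cold_misses, hits)
```

The miss count obtained this way plays the role of the data-side contribution to the number of worst-case memory accesses; a smaller real cache would add conflict and capacity misses that this sketch does not model.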


Static Timing Analyzer

Path & tree-based approach [Healy IEEE TC’99]

Find nodes in the CFG and derive WCEC for each node

A node is a function or loop

WCET is calculated bottom-up

Standard timing analysis assumptions apply
- No recursion
- All loop bounds must be known
- No function pointers
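A minimal sketch of the bottom-up calculation, under strong simplifying assumptions: the `Node` structure and all cycle counts are hypothetical, each child is assumed to run once per iteration of its parent, and path selection, pipeline overlap and first-iteration cache effects (which the actual analyzer of [Healy IEEE TC’99] handles) are ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A loop or function in the timing tree (hypothetical structure)."""
    name: str
    own_cycles: int   # worst-case cycles of the node's own code, per iteration
    bound: int = 1    # known loop bound; a function is a loop with 1 iteration
    children: List["Node"] = field(default_factory=list)


def wcec(node: Node) -> int:
    """Worst-case cycles of a node: children are resolved before the parent."""
    per_iteration = node.own_cycles + sum(wcec(child) for child in node.children)
    return node.bound * per_iteration


inner = Node("inner_loop", own_cycles=12, bound=100)
outer = Node("outer_loop", own_cycles=20, bound=10, children=[inner])
main = Node("main", own_cycles=50, children=[outer])
total = wcec(main)
print(total)
```

The listed assumptions are what make this traversal well defined: known loop bounds supply `bound`, and the absence of recursion and function pointers keeps the tree finite and statically known.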


Motivation of FAST

Dynamic Voltage Scaling (DVS) scheduling schemes
- Change frequency/voltage for the system to save power without missing deadlines
- Several DVS scheduling schemes available
- Good fit for real-time systems
- Most real-time systems
  - have low utilization
  - are low-power embedded systems

Potential for considerable energy savings with DVS


Problem

Current DVS schemes
- Ignore effects of frequency scaling on WCEC
- Assume WCEC is constant with frequency, so they overestimate WCET at lower frequencies

To demonstrate the problem
- WCET of C-Lab benchmarks obtained with the static timing analysis tool
- For frequencies 100 MHz to 1 GHz
- Compare observed WCEC & WCET vs. the assumption made by DVS schemes


Actual vs. Assumed WCEC for FFT

[Figure: number of cycles (0 to 3,000,000) vs. frequency (MHz); series: Actual WCEC, Assumed WCEC]

WCEC changes with frequency modulation
- WCEC increases with higher frequency
- Constant memory latency: 100 ns
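The reason WCEC grows with frequency is that a fixed 100 ns latency spans more clock cycles as the clock gets faster. A small sketch (the function name and the sampled frequencies are chosen here for illustration):

```python
def cycles_per_memory_access(latency_ns, freq_mhz):
    """Cycles spent waiting on one memory access.

    latency [ns] * frequency [MHz] / 1000 = number of cycles.
    """
    return latency_ns * freq_mhz / 1000


LATENCY_NS = 100  # constant memory latency from the slide
stall_cycles = {f: cycles_per_memory_access(LATENCY_NS, f) for f in (100, 500, 1000)}
print(stall_cycles)
```

At 100 MHz one access costs 10 cycles; at 1 GHz the same access costs 100 cycles, even though its wall-clock duration is unchanged.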


Actual vs. Assumed WCET for FFT

[Figure: time (ms, 0 to 30) vs. frequency (MHz); series: Actual WCET, Assumed WCET]

Difference in chosen frequency for DVS w/ WCET = 5 ms
- assumed: ~550 MHz
- actual: ~150 MHz
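The gap between the two chosen frequencies can be reproduced qualitatively with a sketch. The task parameters below are hypothetical, not the measured FFT values, and the "assumed" scheme is taken to freeze WCEC at its value for the maximum frequency, which is an interpretation of the slides rather than something they state.

```python
LATENCY_NS = 100      # constant memory latency (from the slides)
F_MAX_MHZ = 1000      # highest frequency in the studied range

# Hypothetical task parameters, NOT the measured FFT values.
I_CYCLES = 500_000    # frequency-invariant worst-case cycles
M_ACCESSES = 20_000   # worst-case memory accesses


def min_freq_assumed(budget_us):
    """Lowest frequency (MHz) if WCEC is frozen at its value for F_MAX_MHZ."""
    wcec_at_fmax = I_CYCLES + M_ACCESSES * LATENCY_NS * F_MAX_MHZ / 1000
    return wcec_at_fmax / budget_us          # cycles / microseconds = MHz


def min_freq_actual(budget_us):
    """Lowest frequency (MHz) when memory time is constant in wall-clock terms."""
    memory_time_us = M_ACCESSES * LATENCY_NS / 1000
    if memory_time_us >= budget_us:
        raise ValueError("memory time alone exceeds the budget")
    return I_CYCLES / (budget_us - memory_time_us)


assumed = min_freq_assumed(5000)   # 5 ms budget
actual = min_freq_actual(5000)
print(assumed, actual)
```

With these made-up numbers the frequency-aware bound permits a clock about three times slower than the constant-WCEC assumption does, the same kind of gap the slide reports for FFT.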


Parametric Frequency Model

Problem:

DVS
- Considers processor frequency scaling
- Ignores the effect of frequency scaling on memory accesses

With frequency scaling
- Cycles for processor operations remain constant
- Except for memory operations: this is the problem

DVS schemes overestimate the WCET at lower frequencies
- Cannot fully utilize available slack
- Power savings potential largely wasted


Parametric Frequency Model

Solution:

Calculate WCEC
- accounting for effects of memory accesses
- using the new parametric frequency model

Model:

WCEC(f) = i + mN = i + mLf

i: Invariant # of worst-case cycles (for non-memory operations)

m: # of worst-case memory accesses

N: # of cycles per memory access
- depends on memory latency L and frequency f: N = Lf
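Dividing the model by the frequency gives the execution time, which makes the effect on DVS explicit. The following is a short derivation from the definitions above; $f_{\max}$ and the name $\mathrm{WCET}_{\text{assumed}}$ are notation introduced here, under the assumption that a conventional scheme takes its constant cycle count at the maximum frequency.

```latex
\begin{align*}
\mathrm{WCET}(f) &= \frac{\mathrm{WCEC}(f)}{f} = \frac{i + mLf}{f} = \frac{i}{f} + mL \\
\mathrm{WCET}_{\text{assumed}}(f) &= \frac{\mathrm{WCEC}(f_{\max})}{f} = \frac{i}{f} + mL\,\frac{f_{\max}}{f} \\
\mathrm{WCET}_{\text{assumed}}(f) - \mathrm{WCET}(f) &= mL\left(\frac{f_{\max}}{f} - 1\right) \ge 0 \quad \text{for } f \le f_{\max}
\end{align*}
```

Only the $i/f$ term stretches when the clock slows down; the memory term $mL$ is fixed in wall-clock time, so the overestimate grows with both the number of memory accesses and the amount of slowdown.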


Using the Parametric Frequency Model

A: add R2, R1, R3
B: load R4, [M1]
C: add R2, R1, R4
D: add R2, R1, R5

Instruction sequence simulated through a simple pipeline to explain the parametric frequency model

Simple pipeline:
- 6 stages
- Data & instruction cache
- N = 10


Example 0: Cache Hits

Recall: B is load instruction

WCEC = 9 + 0N

- Each row represents a pipeline stage.
- Time (and cycle count) increases horizontally.


Example 1: Effect of I-cache miss

WCEC = 9 + 1N

Stall due to I-cache miss is shown

Model accurately captures memory latency, however long


Example 2: Effect of D-cache miss

Recall: B is load instruction

WCEC = 9 + 1N

Stall due to D-cache miss is shown

Again, model captures memory latency, however long

Notice: during stall cycles, no useful work is done


Example 3: Effect of I- & D-cache Miss

WCEC = 9 + 2N

I-cache miss first, then D-cache miss

Overlap between useful cycles & stall cycles

Also during high-latency execution operations
- E.g. floating-point, multiply, … overlap w/ D-cache miss

Leads to overestimation: rare in practice, and the WCET is still safe
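The four examples can be evaluated with the model WCEC = i + mN. In the sketch below, the formula for the invariant part (stages + instructions - 1) is an inference that happens to match the 9 cycles shown for 4 instructions in a 6-stage pipeline; it assumes one instruction enters the pipeline per cycle with no other stalls, which the slides do not state explicitly.

```python
def invariant_cycles(num_instructions, num_stages):
    """Cycles for a stall-free in-order pipeline: fill once, then one per instruction."""
    return num_stages + num_instructions - 1


def wcec(i, m, n):
    """Parametric model: invariant cycles plus m memory accesses of n cycles each."""
    return i + m * n


N = 10  # cycles per memory access in the slide examples
i = invariant_cycles(num_instructions=4, num_stages=6)

misses_per_example = {
    "Example 0: cache hits": 0,
    "Example 1: I-cache miss": 1,
    "Example 2: D-cache miss": 1,
    "Example 3: I- & D-cache miss": 2,
}
totals = {name: wcec(i, m, N) for name, m in misses_per_example.items()}
for name, cycles in totals.items():
    print(name, cycles)
```

In Example 3 the model charges the full 2N even if part of a stall overlaps with useful work, which is why the result can overestimate but never underestimate.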


Experimental Validation

Combine the frequency model with our static timing analyzer: the FAST tool

WCEC is given by FAST equations

Experiment to validate results from the FAST tool
- Run benchmarks through the FAST tool
- Obtain an equation representing the WCEC for each benchmark
- Run the same benchmarks through the traditional timing analysis tool
- Vary frequencies: 100 MHz to 1 GHz


Frequency-Aware Static Timing Analysis (FAST)

FAST tool “as accurate” as traditional static timing analysis

Slight overestimation in case of floating-point benchmarks

[Figure: ratio (FAST vs. static timing analysis), y-axis 0.998 to 1.005, for benchmarks fft, adpcm, lms, cnt, mm, srt; series: frequency = 100 MHz, 400 MHz, 700 MHz, 1000 MHz]


FAST in EDF Scheduling with DVS

DVS with EDF: $\sum_k C_k / P_k \le \alpha$, where $\alpha = f_c / f_m$

FAST with EDF: $\sum_k (i_k + \alpha\, m_k L f_m) / (P_k f_m) \le \alpha$
- Schedulability test: $\alpha \ge \dfrac{\sum_k i_k / P_k}{f_m \left(1 - L \sum_k m_k / P_k\right)}$

Implemented frequency model for 3 EDF-DVS algorithms
- Algorithms by [Pillai & Shin]
- Look-ahead improved:
  - @ completion, consider next deadline
  - up to 34% additional energy savings (5-11% on avg.) at low utilization
  - but 0.5-8% less savings at high utilization
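A minimal sketch of the FAST schedulability test for EDF, reading it as: the scaling factor alpha = f_c / f_m must be at least sum(i_k / P_k) divided by f_m * (1 - L * sum(m_k / P_k)). The `Task` type, the function names and the task values are hypothetical, and f_m is assumed to be the maximum frequency.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    i: float        # frequency-invariant worst-case cycles
    m: float        # worst-case number of memory accesses
    period: float   # period in seconds


def min_scaling_factor(tasks: List[Task], latency_s: float,
                       f_max_hz: float) -> Optional[float]:
    """Smallest alpha = f_c / f_m that passes the FAST EDF test, or None."""
    cpu_demand = sum(t.i / t.period for t in tasks)             # cycles per second
    mem_share = latency_s * sum(t.m / t.period for t in tasks)  # fraction of time in memory stalls
    if mem_share >= 1.0:
        return None  # memory stalls alone saturate the processor
    alpha = cpu_demand / (f_max_hz * (1.0 - mem_share))
    return alpha if alpha <= 1.0 else None


def classic_scaling_factor(tasks: List[Task], latency_s: float,
                           f_max_hz: float) -> float:
    """Sum of C_k / P_k, with C_k taken as the WCET at the maximum frequency."""
    return sum((t.i + t.m * latency_s * f_max_hz) / f_max_hz / t.period
               for t in tasks)


tasks = [Task(i=1_000_000, m=10_000, period=0.010),
         Task(i=2_000_000, m=20_000, period=0.040)]
alpha_fast = min_scaling_factor(tasks, latency_s=100e-9, f_max_hz=1e9)
alpha_classic = classic_scaling_factor(tasks, latency_s=100e-9, f_max_hz=1e9)
print(alpha_fast, alpha_classic)
```

Both tests are a single pass over the task set. For this made-up task set the frequency-aware test admits a lower scaling factor (about 0.18) than the constant-WCEC test (0.30); a real system would then round up to the next available discrete frequency.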


Improving DVS schemes

Use the parametric frequency model to improve DVS schemes
- provide accurate WCET

Improved energy savings

Architectural simulator: SimpleScalar + Wattch [Brooks ISCA’00]
- 6-stage simple in-order pipeline processor model
- I-cache and D-cache (8KB each)
- Run 4-8 tasks simultaneously (scheduler runs as its own task)
- More accurate than the E ~ V^2 f model? Results are newer than the paper


Static RT-DVS vs. FAST Static RT-DVS

[Figure: energy normalized to base EDF (0 to 1) for tasksets 1-3 at high and low utilization; series: Static, FAST Static]

Base case: EDF
- Tasks run at 1 GHz, idle at 100 MHz
- No sleep mode, small task periods
- Tasksets: 1 = integer, 2 = float, 3 = mix

Static scheme better than base EDF: 12-60% energy savings

FAST-Static even better: 40-78% savings
- at both high and lower utilization


Cycle-conserving RT-DVS vs. FAST cycle-conserving RT-DVS

[Figure: energy normalized to base EDF (0 to 0.5) for tasksets 1-3 at high and low utilization; series: Cycle, FAST Cycle]

- Dynamic scheduling: early completion is reclaimed as slack
- Cycle-conserving: 57-72% energy savings
- FAST: 71-80% savings


Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS

[Figure: energy normalized to base EDF (0 to 0.35) for tasksets 1-3 at high and low utilization; series: Look-Ahead, FAST Look-Ahead]

- Most aggressive DVS: early completion + max. deferral
- Look-ahead: slightly higher savings than cycle-conserving, at 68-80%
- FAST: slightly better in most cases, at 72-83%


Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS

E ~ V^2 f model
- Higher savings: up to 96%?
- Ratio look-ahead / FAST similar

Wattch detailed power model
- Probably more accurate

[Figure, panel 1: energy normalized to base EDF (0 to 0.35) for tasksets 1-3 at high and low utilization; series: Look-Ahead, FAST Look-Ahead]

[Figure, panel 2: energy normalized to base EDF (0 to 0.12) for tasksets 1-6; series: Look-ahead RT-DVS, FAST Look-ahead RT-DVS]


Conclusion

Energy savings in real-time systems can be significantly improved by considering the effects of frequency scaling on WCET

- FAST + Static RT-DVS
  - as good as Look-Ahead RT-DVS
  - less overhead

The parameterized frequency model can easily track effects of frequency scaling on WCET

FAST tool works best when
- Many cache misses
- D-cache analysis is highly inaccurate (usually true): FAST can make up for it
- High memory latency
- Insufficient dynamic slack reclaiming (during DVS scheduling)

Integrated into real-time hardware support [VISA ISCA’03]


BACKUP SLIDES


The V^2 f model

[Figure: power (0 to 3500) vs. frequency (MHz)]


Old DVS Scheduling Simulator

Event-based simulator of the scheduler.

Have to assume a miss rate for the tasks in dynamic schemes.

Uses the E ~ V^2 f energy model.

Gives a good idea about savings, but is it accurate?


Static RT-DVS vs. FAST Static RT-DVS

[Figure, panel 1: energy normalized to base EDF (0 to 0.6) for tasksets 1-6; series: Static RT-DVS, FAST Static RT-DVS]

[Figure, panel 2: energy normalized to base EDF (0 to 1) for tasksets 1-3 at high and low utilization; series: Static, FAST Static]


Cycle-conserving RT-DVS vs. FAST cycle-conserving RT-DVS

[Figure, panel 1: energy normalized to base EDF (0 to 0.14) for tasksets 1-6; series: Cycle-conserving RT-DVS, FAST Cycle-conserving RT-DVS]

[Figure, panel 2: energy normalized to base EDF (0 to 0.5) for tasksets 1-3 at high and low utilization; series: Cycle, FAST Cycle]


Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS

[Figure, panel 1: energy normalized to base EDF (0 to 0.12) for tasksets 1-6; series: Look-ahead RT-DVS, FAST Look-ahead RT-DVS]

[Figure, panel 2: energy normalized to base EDF (0 to 0.35) for tasksets 1-3 at high and low utilization; series: Look-Ahead, FAST Look-Ahead]


DVS schemes (Pillai & Shin)

Static RT-DVS – Uses static slack available in the schedule.

Cycle-conserving RT-DVS – Uses static slack + dynamic slack due to early completion.

Look-ahead RT-DVS – Uses static slack + dynamic slack due to early completion + latest possible scheduling (look-ahead).
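A rough sketch of how the first two schemes pick a frequency, assuming the usual EDF-based rule of choosing the lowest available frequency whose ratio to the maximum covers the total utilization. The frequency set and utilization values are hypothetical, and look-ahead scheduling is not modeled here.

```python
def select_frequency(utilizations, freqs_mhz):
    """Lowest available frequency f with sum(utilizations) <= f / f_max, or None."""
    f_max = max(freqs_mhz)
    total = sum(utilizations)
    for f in sorted(freqs_mhz):
        if total <= f / f_max:
            return f
    return None  # not schedulable even at the maximum frequency


FREQS_MHZ = [250, 500, 750, 1000]   # hypothetical discrete frequency levels

# Static RT-DVS: worst-case utilizations C_k / P_k (at f_max) are fixed offline.
worst_case = [0.20, 0.10, 0.15]
static_choice = select_frequency(worst_case, FREQS_MHZ)

# Cycle-conserving RT-DVS: when a task finishes early, its worst-case
# utilization is replaced by what it actually used until its next release.
after_early_completion = [0.05, 0.10, 0.05]   # tasks 0 and 2 finished early
dynamic_choice = select_frequency(after_early_completion, FREQS_MHZ)
print(static_choice, dynamic_choice)
```

With FAST, the utilization terms fed into this selection come from the frequency-aware WCET instead of a constant cycle count, which is where the additional savings originate.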


Complexity

Original EDF test: O(n)

Modified EDF test: still O(n)