Predictable Programming on a Precision Timed Architecture

Hiren D. Patel, UC Berkeley

[email protected]

Joint work with: Ben Lickly, Isaac Liu, Edward A. Lee - UC Berkeley

Sungjun Kim, Stephen A. Edwards - Columbia University

Patel, UC Berkeley, PRET 2

Edwards and Lee - Case for PRET

• 2007 – Edwards and Lee made a case for precision timed computers (PRET machines)
  – Predictability
  – Repeatability

S. A. Edwards and E. A. Lee. The case for the precision timed (PRET) machine. In Proceedings of the 44th Annual Conference on Design Automation (DAC '07), pages 264-265, San Diego, California, June 2007. ACM, New York, NY.

Edwards and Lee - Case for PRET

• Unpredictability
  – Difficulty in determining timing behavior through analysis

• Non-repeatability
  – Lack of guarantee that every execution yields the same timing behavior

• Brittleness
  – Small changes have big effects on timing behavior

Brittleness

• Brittleness is expensive

• Tight coupling of software and hardware

• Reliance on testing for validation

• Upgrading is difficult

• Current solution: stockpile the same hardware for the lifetime of the product

Source: www.skycontrol.net

But wait …

• Real-time scheduling
  – Worst-case execution time
    • Detailed model of hardware
    • Large engineering effort
    • Valid for particular hardware models
  – Interrupts, inter-process communication, locks …

• Bench testing
  – Brittle

Sebastian Altmeyer, Christian Hümbert, Björn Lisper, and Reinhard Wilhelm. Parametric Timing Analysis for Complex Architectures. In Proceedings of the 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA'08), pages 367-376, Kaohsiung, Taiwan, August 2008. IEEE Computer Society.

Precise Timing and High Performance

Traditional                    Alternative
Caches                         Scratchpads
Deep out-of-order pipelines    Thread-interleaved pipelines
Function-only ISAs             ISAs with timing instructions
Function-only languages        Languages and programming models with timing
Best-effort communication      Fixed-latency communication
Time-sharing                   Multiple independent processors

Outline

• Introduction
• Related Work
• PRET Machine
• Programming Example
• Future Work
• Conclusion

Related Work

• Java Optimized Processor
  – Schoeberl et al. [2003]

• Timing instructions
  – Ip and Edwards [2006]

• Reactive processors
  – Von Hanxleden et al. [2005]
  – Salcic et al. [2005]

• Virtual Simple Architecture
  – Mueller et al. [2003]

Semantics of Timing Instructions

• Deadline instructions
  – Denote the required execution time of a block

• When decoded
  – Stall the instruction if the timer value is not 0
  – Otherwise, set the timer value to the new value

Figure (Straight Line Block 0, Straight Line Block 1, Loop Block):

    deadi $t0, 10
    deadi $t0, 8
    deadi $t0, 0
L0:
    deadi $t0, 10
    b L0

Tracing A Program Fragment

    A: deadi $t0, 6
    B: sethi %hi(0x3f800000), %g1
    C: or %g1, 0x200, %g1
    D: st %g1, [ %fp + -12 ]
    E: deadi $t0, 8
    F: …

Figure: value of $t0 in successive cycles: 0, 6, 5, 4, 3, 2, 1, 0, 8

Precision Timed Architecture

• Thread-interleaved pipeline

• Scratchpad memories

• Time-triggered main memory access

• Round-robin thread scheduling

Memory Hierarchy

• Clocks
  – Main clock
  – Derived clocks

• Instruction and data scratchpad memories
  – 1-cycle access latency

• Main memory
  – 16 MB size
  – Latency of 50 ns
  – Frequency: 250 MHz

• ~13 cycles latency

Figure: Core, Main Mem., scratchpad memories (SPM), DMA
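The ~13-cycle figure is the 50 ns latency expressed in cycles of the 250 MHz clock, rounded up to a whole cycle (the rounding direction is our assumption, since an access cannot complete in a fraction of a cycle):

```latex
t_{\text{mem}} \, f_{\text{clk}}
  = (50 \times 10^{-9}\,\text{s}) \times (250 \times 10^{6}\,\text{s}^{-1})
  = 12.5
  \quad\Rightarrow\quad
  \lceil 12.5 \rceil = 13 \text{ cycles}
```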

Thread-interleaved Pipeline

• Thread stalls
  – Main memory access
  – Multi-cycle operations
  – Deadline instructions

• Replay mechanism
  – A stalled thread re-executes the same PC in its next round through the pipeline
  – Multi-cycle ALU operations replay their instructions

Figure: six-stage pipeline (Fetch, Decode, Reg. Access, Execute, Memory, WriteBack) with pipeline registers F/D, D/R, R/E, E/M, M/W, annotated with: Decrement Deadline Timers; Stall if Deadline Instruction; Increment PC; Check main memory access.

Time-Triggered Access through Memory Wheel

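• Decouples each thread's memory access timing from the other threads' access patterns

• Time-triggered access

• Best-case access time
  – If the access is issued in the 1st cycle of the thread's window

• Worst-case access time
  – If the access is issued in the 2nd cycle of the window

Figure: memory wheel windows thread0, thread1, thread2, thread3, thread4, thread5, thread0; five accesses marked "On time" and one marked "90 cycles until thread0 completes".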

Tool Flow

• GCC 3.4.4, SystemC 2.2, Python 2.4

Figure labels: Boot code; C programs with timing instructions; GCC to compile boot code and program code; Motorola SREC files.

Simple Mutual Exclusion Example

• Producer followed by Consumer and Observer
  – Consumer and Observer execute together

• Loop rate of two rotations of the memory wheel
  – 1st rotation: Producer writes
  – 2nd rotation: Consumer and Observer read

Figure labels: Write to shared data; Read from shared data; Write to output.

Video Game Example

Figure: Main-Control Thread, Graphic Thread and VGA-Driver Thread. Commands pass through an Even Queue and an Odd Queue; pixel data passes through an Even Buffer and an Odd Buffer. Synchronization labels:

• Update Screen (sync request)
• Swap (when sync requested and when odd queue empty)
• Sync (after queue swapped)
• Refresh (sync request)
• Swap (when sync requested and when vertical blank)
• Sync (after buffer swapped)

Timing Requirements

Signal             Timing requirement    Pixel cycles
V. Sync            64 µs                 1611
V. Back-porch      1.02 ms               25679
Draw 480 lines     15.25 ms
V. Front-porch     350 µs                8811
H. Sync            3.77 µs               96
H. Back-porch      1.89 µs               48
Draw 640 pixels    25.42 µs
H. Front-porch     0.64 µs               16

Timing Implementation

• Pixel clock using a derived clock
  – 25.175 MHz
  – ~39.72 ns cycle period

• Drawing 16 pixels
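The cycle period follows from the pixel-clock frequency. Reading "Drawing 16 pixels" as a batch of work per timed block (our interpretation), the time budget for one batch is 16 pixel periods:

```latex
T_{\text{pix}} = \frac{1}{25.175 \times 10^{6}\,\text{Hz}} \approx 39.72\,\text{ns},
\qquad
16\,T_{\text{pix}} \approx 635.6\,\text{ns}
```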

Future Work

• Architecture
  – DMA
  – DDR2 main memory model
  – Thread synchronization primitives
  – Shared data between threads

• Real-time benchmarks
  – With timing requirements

• Programming models
  – Memory allocation schemes
  – Synchronization

Conclusion

• What we want …
  – Time as a first-class citizen of embedded computing
  – Predictability
  – Repeatability

• Where we are …
  – PRET cycle-accurate simulator
  – Release …

Extras

More on Brittleness

• Small changes may have big effects on timing behavior

Theorem (Richard’s anomalies): If a task set with fixed priorities, execution times, and precedence constraints is optimally scheduled on a fixed number of processors, then increasing the number of processors, reducing execution times, or weakening precedence constraints can increase the schedule length.

R. L. Graham. Bounds on the performance of scheduling algorithms. In E. G. Coffman, Jr. (ed.), Computer and Job-Shop Scheduling Theory. John Wiley, New York, 1975.

Richard’s Anomalies

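Figure: precedence graph of tasks 1 to 9 and a 3-processor schedule of T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9 (task/execution time); time axis marks 0, 3 and 12.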

• 9 tasks, 3 processors, priority list, precedence order, execution times.

Richard’s Anomalies: Reducing Execution Times

• Every execution time reduced by 1 (eTime’ = eTime - 1)

Figure: the same precedence graph with T1/2, T2/1, T3/1, T4/1, T5/3, T6/3, T7/3, T8/3, T9/8; time axis marks 0, 3 and 12.

Richard’s Anomalies: More Processors

Figure: the same task set (T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9) scheduled on more processors; time axis marks 0, 3, 12 and 15.

• 4 processors

Richard’s Anomalies: Changing Priority List

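Figure: the same task set (T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9) scheduled on 3 processors with a different priority list; time axis marks 0, 3 and 12.

• Priority list L = (T1,T2,T4,T5,T6,T3,T9,T7,T8)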

Brittleness Again…

• In general, all task scheduling strategies are brittle