Precision-Timed (PRET) Machines - Electronic sander/tutorials/hipeac-2013/files/  · Reineke

  • View
    212

  • Download
    0

Embed Size (px)

Text of Precision-Timed (PRET) Machines - Electronic sander/tutorials/hipeac-2013/files/  · Reineke

  • Precision-Timed (PRET) Machines

    Stephen A. Edwards Columbia University Sungjun Kim Columbia University Edward A. Lee UC Berkeley Ben Lickly UC Berkeley Isaac Liu UC Berkeley Hiren D. Patel University of Waterloo Jan Reineke Saarland University UC Berkeley

    Designing Next-Generation Real-Time Streaming Systems Tutorial at HIPEAC 2013

  • Reineke et al., Saarland 2

    C

    ProgramBinary

    Hardware

    RealizationCompiler

    WCET

    Analysis

    Ptolemy

    Model

    Code

    Generator

    Timing Requirements

    Current Timing Verification Process

  • Reineke et al., Saarland 3

    WCET

    Analysis

    WCET

    Analysis

    WCET

    Analysis

    ArchitectureArchitecture

    ArchitectureHardware

    Realization

    WCET

    Analysis

    C

    ProgramBinaryCompiler

    Ptolemy

    Model

    Code

    Generator

    Timing Requirements

    Current Timing Verification Process

    New Architecture New Analysis & Recertification

    For modern architectures: Extremely time-consuming, costly and error-prone

    Boeing: 40 years supply of

  • Reineke et al., Saarland 4

    Lack of Suitable Abstractions for Real-Time Systems

    Higher-level Model of Computation

    C-level Programming Language

    Instruction Set Architecture (ISA)

    Hardware Realizations

    Code Generation

    Compilation

    Execution

    Abstracts from execution time

    Increasingly unpredictable

  • Reineke et al., Saarland 5

    Agenda of this PRET (and this presentation)

    Higher-level Model of Computation

    C-level Programming Language

    Instruction Set Architecture (ISA)

    Hardware Realizations

    Code Generation

    Compilation

    Execution

    Endow with temporal semantics and control over timing

    Development of the PTARM, a predictable hardware realization

  • Reineke et al., Saarland 6

    Agenda of this PRET (and this presentation)

    Higher-level Model of Computation

    C-level Programming Language

    Instruction Set Architecture (ISA)

    Hardware Realizations

    Code Generation

    Compilation

    Execution

    Endow with temporal semantics and control over timing

    Development of the PTARM, a predictable hardware realization

  • Reineke et al., Saarland 7

    Adding Control over Timing to the ISA Capability 1: delay until

    Some possible capabilities in an ISA: [C1] Execute a block of code taking at least a

    specified time

    time1 second

    Code blockdelay_

    until

    time1 second

    Code block

    Where could this be useful? - Finishing early is not always better:

    - Scheduling Anomalies (Grahams anomalies) - Communication protocols may expect periodic behavior -

  • Reineke et al., Saarland 8

    Adding Control over Timing to the ISA Capabilities 2+3: late and immediate miss detection

    [C2] Conditionally branch if the specified time was exceeded.

    [C3] If the specified time is exceeded during execution of the block, branch immediately to an exception handler.

    time1 second

    Code block

    branch_expired

    time1 second

    Code block

    exception_on_expire

  • Reineke et al., Saarland 9

    Applications of Variants 2+3 late and immediate miss detection

    [C3] immediate miss detection: l Runtime detection of missed deadlines to initiate

    error handling mechanisms l Anytime algorithms l However: unknown state after exception is taken

    [C2] late miss detection: l No problems with unknown state of system l Change parameters of algorithm to meet future

    deadlines

  • Reineke et al., Saarland 10

    PRET Assembly Instructions Supporting these Four Capabilities

    set_time %r, loads current time + into %r

    delay_until %r stall until current time >= %r

    branch_expired %r, branch to target if current time > %r

    exception_on_expire %r, arm processor to throw exception when current time > %r

    deactivate_exception disarm the processor for exception

  • Reineke et al., Saarland 11

    Controlled Timing in Assembly Code

    [C1] Delay un-l: set_time r1, 1s // Code block delay_until r1

    [C2] Late miss detec-on set_time r1, 1s // Code block branch_expired r1, delay_until r1

    set_time r1, 1s exception_on_expire r1, 1 // Code block deactivate_exception 1 delay_until r1

    [C3] Immediate miss detec-on

  • Reineke et al., Saarland 12

    MTFD Meet the F(inal) Deadline

    Capability [C1] ensures that a block of code takes at least a given time.

    [C4] MTFD: Execute a block of code taking at most the specified time.

    [C4] Exact execu-on: set_time r1, 1s // Code block MTFD r1 delay_until r1

    Being arbitrarily slow is always possible and easy. But what about being fast?

  • Reineke et al., Saarland 13

    Current Timing Verification Process

    New Architecture Recertification

    Extremely time-consuming and costly

    WCET

    Analysis

    WCET

    Analysis

    WCET

    Analysis

    ArchitectureArchitecture

    ArchitectureHardware

    Realization

    WCET

    Analysis

    C

    ProgramBinaryCompiler

    Ptolemy

    Model

    Code

    Generator

    Timing Requirements

  • Reineke et al., Saarland 14

    The Future Timing Verification Process

    Timing is property of ISA Compiler can check constraints once and for all Downside: little flexibility in development of hardware

    realizations

    ArchitectureArchitecture

    Architecture

    C Program

    w/ Deadline

    Instructions

    BinaryHardware

    RealizationCompiler

    Timing

    Analysis

    Ptolemy

    Model

    Code

    Generator

  • Reineke et al., Saarland 15

    The Future Timing Verification Process: Flexibility through Parameterization

    ISA leaves more freedom to implementations through a parameterized timing model

    Compiler generates constraints on parameters which are sufficient to meet the timing constraints

    Parametric timing analysis is ongoing work

    ArchitectureArchitecture

    Architecture

    C Program

    w/ Deadline

    Instructions

    BinaryHardware

    RealizationCompiler

    Const

    raints

    Parametric

    Timing

    Analysis

    Ptolemy

    Model

    Code

    Generator

  • Reineke et al., Saarland 16

    The Future Timing Verification Process: Flexibility through Parameterization

    Possible parameters: Latencies of different components, such as the

    pipeline, scratchpad memory, main memory, buses Sizes of buffers, such as scratchpad memories or

    caches.

    ArchitectureArchitecture

    Architecture

    C Program

    w/ Deadline

    Instructions

    BinaryHardware

    RealizationCompiler

    Const

    raints

    Parametric

    Timing

    Analysis

    Ptolemy

    Model

    Code

    Generator

  • Reineke et al., Saarland 17

    The Future Timing Verification Process: Flexibility through Parameterization

    Challenge: Parameterization should allow for: Efficient and accurate parametric timing analysis, and Admit a wide variety of cost-efficient hardware

    realizations.

    ArchitectureArchitecture

    Architecture

    C Program

    w/ Deadline

    Instructions

    BinaryHardware

    RealizationCompiler

    Const

    raints

    Parametric

    Timing

    Analysis

    Ptolemy

    Model

    Code

    Generator

  • Reineke et al., Saarland 18

    Agenda of this PRET (and this presentation)

    Higher-level Model of Computation

    C-level Programming Language

    Instruction Set Architecture (ISA)

    Hardware Realizations

    Code Generation

    Compilation

    Execution

    Endow with temporal semantics and control over timing

    Development of the PTARM, a predictable hardware realization

  • Reineke et al., Saarland 19

    Hardware Realizations: Challenges to deliver predictable timing

    Pipelining Memory hierarchy: Caches, DRAM On-chip communication I/O (DMA, interrupts) Resource sharing (e.g. in multicore architectures)

  • Reineke et al., Saarland 20

    First Problem: Pipelining Processor Design 101

    Hennessey and Patterson, Computer Architecture: A Quantitative Approach, 2007.from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2007.

  • Reineke et al., Saarland 21

    First Problem: Pipelining

    from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2007.

    Pipeline It!

    Hennessey and Patterson, Computer Architecture: A Quantitative Approach, 2007.

  • Reineke et al., Saarland 22

    Pipelining: Hazards

    from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2007.

    Great Except for Hazards

    Hennessey and Patterson, Computer Architecture: A Quantitative Approach, 2007.

  • Reineke et al., Saarland 23

    Forwarding helps, but not all the time ...But It Does Not Solve Everything...

    LD R1, 45(r2)

    DADD R5, R1, R7

    BE R5, R3, R0

    ST R5, 48(R2)

    Unpipelined F D E M W F D E M W F D E M W F D E M W

    F D E M W

    The Dream F D E M W

    F D E M W

    F D E M W

    F D E M W

    The Reality F D E M W Memory Hazard

    F D E M W Data Hazard

    F D E M W Branch Hazard

  • Reineke et al., Saarland 24

    Our Solution: Thread-interleaved