Pipelines for Future Architectures in Time Critical Embedded Systems
By: R. Wilhelm, D. Grund, J. Reineke, M. Schlickling, M. Pister, and C. Ferdinand

EEL 6935 - Embedded Systems
Dept. of Electrical and Computer Engineering
University of Florida

Liza Rodriguez
Aurelio Morales


Outline

• Pipelining Review
• Timing Analysis
  • Anomalies
  • Domino Effects
• Architecture Classifications
• Conclusions


Pipelining Review

• Pipelining is an implementation technique in which the execution of multiple instructions is overlapped

• Pipelining takes advantage of the parallelism that exists among the actions needed to execute an instruction

• Pipelining is like an assembly line: each stage operates in parallel with the other stages

• Instructions enter at one end, progress through the stages, and exit at the other end

• Pipelining is the key implementation technique used to make fast CPUs

Pipelined Example

[Figure: three instructions moving through a five-stage pipeline (Fetch, Decode, Execute, Memory, Write Back)]

  LD  r4, 0(r3)     5 cycles (5)
  ADD r1, r7, r3    1 cycle (4)
  ADD r2, r6, r3    1 cycle (4)

• Pipeline registers separate functional units to allow parallel operation

• Pipeline will stall if there is a hazard
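
A minimal sketch of the stage overlap in this example, assuming an ideal pipeline with no hazards or stalls. It reads the figure as "the first instruction takes 5 cycles and each later one completes 1 cycle after its predecessor", which is an interpretation of the slide; the function names are hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Write Back"]


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed by an ideal pipeline: fill it once, then one result per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def occupancy(program):
    """Yield (cycle, {stage: instruction}) for an ideal, hazard-free pipeline."""
    for cycle in range(1, pipelined_cycles(len(program)) + 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - 1 - depth  # instruction currently in this stage
            if 0 <= index < len(program):
                row[stage] = program[index]
        yield cycle, row


program = ["LD r4, 0(r3)", "ADD r1, r7, r3", "ADD r2, r6, r3"]
for cycle, row in occupancy(program):
    print(cycle, row)
print("total cycles:", pipelined_cycles(len(program)))
```

With three instructions and five stages the sketch reports 5 + 1 + 1 = 7 cycles. A hazard would insert stall cycles that this ideal model does not capture.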

Further Optimizations

• Superscalar – executes more than one instruction per clock cycle by simultaneously dispatching multiple instructions to redundant functional units

• Branch Prediction – predicts branches using a predefined static algorithm or dynamic branch history

• Out of order execution – instructions are dynamically scheduled to avoid hazards and dependencies that may stall the pipeline

[Figure: two parallel pipelines (Fetch, Decode, Execute, Memory, Write Back), and reservation stations feeding the functional units]

  Reservation station    Instruction       Status
  Execute                ADD r1, r2, r3    wait
  Execute                SUB r1, r2, r3    wait
  Execute                MUL r6, r7, r8    ready
  Memory                 LD r4, 0(r5)      wait
  Memory                 ST r2, 0(r1)      ready
  Memory                 LD r4, 0(r1)      wait


Real Time Embedded Systems

• Timing Analysis
  • The analysis of a set of tasks executing on given hardware to guarantee that timing constraints will be met
  • Timing analysis requires known upper and lower bounds on the execution times of tasks: Worst Case Execution Time (WCET) and Best Case Execution Time (BCET)
  • Analysis results are highly dependent on the architecture
  • An architecture without accompanying performance analysis technology should not be seriously considered for time critical embedded applications

• Desired Criteria
  • Soundness – the derived bounds are valid and reliable, free from random error
  • Obtainable Precision – architecture has predictability properties
  • Analysis effort to reach precision – depends on solution space to be explored

Timing Analysis

• Non-Pipelined Architecture – Simple
  • Add the execution times of individual instructions to obtain a bound on the execution time of a basic block

• Pipelined Architecture – Complex
  • Overlapped instructions: individual instructions cannot be considered in isolation
  • Instructions must be considered collectively to obtain timing bounds
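
The contrast can be written down as follows. The symbols are assumptions introduced here, not notation from the slides: $t_i$ is the execution-time bound of instruction $i$ in a basic block of $n$ instructions, and $k$ is the number of pipeline stages.

```latex
% Non-pipelined: the per-instruction bounds simply add up.
T_{\text{block}} \le \sum_{i=1}^{n} t_i

% Ideal k-stage pipeline (one instruction enters per cycle, no stalls):
T_{\text{block}} = k + (n - 1) \quad \text{cycles}
```

The second formula holds only in the stall-free case. Once hazards occur, the time an instruction adds depends on the pipeline state left by its predecessors, which is why the instructions have to be analyzed together.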

Timing Analysis

• Pipelined Architecture – Complex
  • To do WCET analysis, the most costly pipeline path should be selected
  • To compute a precise bound, the analysis needs to account for as many “timing accidents” as possible
  • Timing accidents: data hazards, branch mispredictions, occupied functional units, cache misses, etc.
  • Issues: timing anomalies and domino effects
  • Thus, timing analysis has to follow all possible successor states
  • The more performance enhancing features the pipeline has, the larger the search space

Timing Anomaly

• Formal definition – a situation where the local worst case does not contribute to the global worst case

• A better definition – an improvement to the architecture that has a negative effect on execution time

• Examples:
  • A cache miss may result in a shorter execution time
  • Shortening an instruction may lead to a longer execution time

Timing Anomaly Example: Cache Hit or Miss

  A  LD  r4, 0(r3)
  B  ADD r5, r4, r4
  C  ADD r1, r6, r6
  D  MUL r2, r1, r1
  E  MUL r3, r2, r2

• Latencies: miss penalty 8 cycles, LSU 2 cycles, ALU 1 cycle, multiplier 4 cycles

• Architecture is made up of functional units and reservation stations – similar to Tomasulo’s Algorithm

[Figure: two timelines over cycles 1 to 13, one for a cache hit and one for a cache miss, showing A on the LSU, B and C on the ALU, and D and E on MULT]
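
The anomaly can be reproduced with a small scheduler sketch. The instructions and latencies are taken from the slide; the dispatch model is an assumption, since the slide does not spell it out: one instruction is dispatched per cycle in program order, an instruction can start no earlier than the cycle after its dispatch, each functional unit runs one instruction at a time, and the oldest ready instruction wins. The names `Instr`, `schedule` and `program` are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    unit: str          # functional unit: "LSU", "ALU" or "MULT"
    latency: int       # cycles the unit stays occupied
    deps: tuple = ()   # names of instructions whose results are needed


def schedule(program):
    """Return {name: finish_cycle} under the assumed dispatch model."""
    finish = {}
    unit_free_at = {}                              # unit -> first free cycle
    waiting = list(enumerate(program, start=1))    # (dispatch_cycle, instr)
    cycle = 0
    while waiting:
        cycle += 1
        for entry in list(waiting):                # program order = oldest first
            dispatched, ins = entry
            in_station = dispatched < cycle
            unit_free = unit_free_at.get(ins.unit, 1) <= cycle
            # A producer must have finished in an earlier cycle.
            operands_ready = all(finish.get(d, cycle) < cycle for d in ins.deps)
            if in_station and unit_free and operands_ready:
                finish[ins.name] = cycle + ins.latency - 1
                unit_free_at[ins.unit] = cycle + ins.latency
                waiting.remove(entry)
    return finish


def program(load_latency):
    return [
        Instr("A", "LSU", load_latency),     # LD  r4, 0(r3)
        Instr("B", "ALU", 1, ("A",)),        # ADD r5, r4, r4
        Instr("C", "ALU", 1),                # ADD r1, r6, r6
        Instr("D", "MULT", 4, ("C",)),       # MUL r2, r1, r1
        Instr("E", "MULT", 4, ("D",)),       # MUL r3, r2, r2
    ]


hit_cycles = max(schedule(program(2)).values())        # LSU latency only
miss_cycles = max(schedule(program(2 + 8)).values())   # LSU latency + miss penalty
print("cache hit :", hit_cycles, "cycles")
print("cache miss:", miss_cycles, "cycles")
```

Under these assumptions the hit case finishes in cycle 13 and the miss case in cycle 12. On a hit, B becomes ready in the same cycle as C and, being older, takes the ALU first, which delays C and with it the whole MUL chain D, E. On a miss, C runs early and B simply executes at the end.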

Timing Anomaly Example: Reduced Instruction

  A  MUL r2, r1, r1
  B  ADD r3, r2, r2
  C  ADD r4, r5, r5
  D  LD  r6, 0(r4)
  E  ADD r7, r6, r6

• Latencies: miss penalty 8 cycles, LSU 4 cycles, ALU 2 cycles, multiplier varied (5 cycles vs. 2 cycles)

• Architecture is made up of functional units and reservation stations – similar to Tomasulo’s Algorithm

[Figure: two timelines over cycles 1 to 13, one with Multiplier = 5 cycles and one with Multiplier = 2 cycles, showing A on MULT, B, C and E on the ALU, and D on the LSU]
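
A self-contained sketch of this second anomaly, comparing a 5-cycle and a 2-cycle multiplier. The instructions and unit latencies come from the slide. The dispatch model is an assumption: one instruction is dispatched per cycle in program order, execution starts at the earliest in the cycle after dispatch, each unit runs one instruction at a time, the oldest ready instruction wins, and the load hits in the cache. All names are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    unit: str
    latency: int
    deps: tuple = ()


def total_cycles(program):
    """Cycle in which the last instruction finishes under the assumed model."""
    finish = {}
    unit_free_at = {}
    waiting = list(enumerate(program, start=1))    # (dispatch_cycle, instr)
    cycle = 0
    while waiting:
        cycle += 1
        for entry in list(waiting):                # oldest instruction first
            dispatched, ins = entry
            ready = (
                dispatched < cycle
                and unit_free_at.get(ins.unit, 1) <= cycle
                and all(finish.get(d, cycle) < cycle for d in ins.deps)
            )
            if ready:
                finish[ins.name] = cycle + ins.latency - 1
                unit_free_at[ins.unit] = cycle + ins.latency
                waiting.remove(entry)
    return max(finish.values())


def program(mult_latency):
    return [
        Instr("A", "MULT", mult_latency),    # MUL r2, r1, r1
        Instr("B", "ALU", 2, ("A",)),        # ADD r3, r2, r2
        Instr("C", "ALU", 2),                # ADD r4, r5, r5
        Instr("D", "LSU", 4, ("C",)),        # LD  r6, 0(r4), assumed cache hit
        Instr("E", "ALU", 2, ("D",)),        # ADD r7, r6, r6
    ]


slow_multiplier = total_cycles(program(5))
fast_multiplier = total_cycles(program(2))
print("multiplier = 5 cycles:", slow_multiplier)
print("multiplier = 2 cycles:", fast_multiplier)
```

In this model the 5-cycle multiplier yields 11 cycles and the 2-cycle multiplier 13. With the faster multiplier, B is ready early enough to claim the ALU ahead of C, so the critical chain C, D, E starts later.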

Domino Effects

• Formal definition – a system exhibits a domino effect if there are two hardware states s, t such that the difference in execution time of the same program started in s and in t may be arbitrarily high and cannot be bounded by a constant

• A better definition – a minor timing accident can cause an unbounded increase in execution time

• Examples:
  • Timing accident in a loop
  • PowerPC 755 pipeline – Schneider
  • Pseudo-least-recently-used (PLRU) replacement policy – Berg
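
One way to write the formal definition symbolically. The notation is an assumption introduced here, since the slide gives the definition only in words: $T_p(s)$ denotes the execution time of program $p$ when started in hardware state $s$.

```latex
% Domino effect: no constant bounds the difference between the two start states.
\exists\, s, t \;\; \forall c \in \mathbb{N} \;\; \exists\, p : \quad
\bigl| T_p(s) - T_p(t) \bigr| > c

% Absence of domino effects is the negation:
\forall\, s, t \;\; \exists c \in \mathbb{N} \;\; \forall\, p : \quad
\bigl| T_p(s) - T_p(t) \bigr| \le c
```

A loop is the typical witness: if each iteration preserves a fixed delay between the runs started in $s$ and $t$, choosing a program with more iterations pushes the difference past any given $c$.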

Domino Effects

  A  ADD r4, r3, r3
  B  ADD r1, r2, r2

• Instruction sequence on the ALU: A B A B A B A

• First A gets delayed one clock cycle due to a dependency with the previous instruction

• A: Dispatch EA+5, Execute Immdt; B: Dispatch DA+4, Execute DA+6

[Figure: two ALU timelines over cycles 1 to 20 for the sequence A B A B A B A]


Classification of Architectures

• Fully Timing Compositional Architectures
  • No timing anomalies or domino effects
  • Timing analysis can safely follow worst case paths only
  • Example: ARM7

• Compositional Architectures with Constant Bounded Effects
  • Exhibit timing anomalies but no domino effects
  • Timing analysis has to consider all paths but can be optimized to safely discard all local non-worst case paths by adding a constant number of cycles to the worst case path, trading precision for efficiency
  • Example: Infineon TriCore

• Non Compositional Architectures
  • Exhibit timing anomalies and domino effects
  • Timing analysis has to follow all possible paths, since a local effect can influence future execution arbitrarily
  • Example: PowerPC 755
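
The three analysis strategies can be illustrated on a toy state graph. This is a sketch only: the graph, its cycle counts and the constant `ANOMALY_PENALTY` are hypothetical numbers chosen so that the locally cheaper outcome leads to the globally longer run, and the function names are not taken from any analysis tool.

```python
# Each state maps an outcome name to (local_cost_in_cycles, successor_state).
# An empty dict marks the end of the program.
TOY = {
    "hit":  (2,  {"rest": (11, {})}),   # locally cheap, globally 13 cycles
    "miss": (10, {"rest": (2, {})}),    # locally expensive, globally 12 cycles
}

ANOMALY_PENALTY = 1   # hypothetical constant; would have to be proven per architecture


def follow_local_worst(state):
    """Fully compositional strategy: always take the locally worst outcome."""
    total = 0
    while state:
        cost, state = max(state.values(), key=lambda edge: edge[0])
        total += cost
    return total


def follow_all_paths(state):
    """Non compositional strategy: explore every path and take the maximum."""
    if not state:
        return 0
    return max(cost + follow_all_paths(nxt) for cost, nxt in state.values())


local_worst = follow_local_worst(TOY)
exhaustive = follow_all_paths(TOY)
constant_bounded = local_worst + ANOMALY_PENALTY

print("local worst case only :", local_worst)       # unsafe if anomalies exist
print("all paths             :", exhaustive)        # always safe, most expensive
print("local worst + constant:", constant_bounded)  # safe if the constant is valid
```

Here following only the local worst case gives 12 cycles, below the true worst case of 13, so it is a safe strategy only on an architecture without timing anomalies. Adding the assumed constant restores a safe bound without exploring the second path.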


Conclusions

• Architectural optimizations in embedded systems are necessary to improve performance and to meet critical time constraints
  • Pipelines: multiple issue, out of order execution, branch prediction, etc.

• However, an architectural optimization may not be worth implementing if effects such as timing anomalies and domino effects will have a negative impact on timing analysis
  • How good is an optimization if you can’t measure its effects?

• A trade off exists between the amount of execution time you can save through pipeline optimizations and the amount of precision you lose in timing analysis

Questions?