51
Computer Architecture Computer Architecture Pipelined Processor Ola Flygt Växjö University http://w3.msi.vxu.se/users/ofl/ [email protected] +46 470 70 86 49

Computer Architecture Pipelined Processor

  • Upload
    cera

  • View
    95

  • Download
    1

Embed Size (px)

DESCRIPTION

Computer Architecture Pipelined Processor. Ola Flygt Växjö University http://w3.msi.vxu.se/users/ofl/ [email protected] +46 470 70 86 49. Outline. 5.1 Basic concept 5.2 Design space of pipelines 5.3 Overview of pipelined instruction processing - PowerPoint PPT Presentation

Citation preview

Page 1: Computer Architecture Pipelined Processor

Computer Computer ArchitectureArchitecturePipelined Processor

Ola FlygtVäxjö University

http://w3.msi.vxu.se/users/ofl/[email protected]

+46 470 70 86 49

Page 2: Computer Architecture Pipelined Processor

Outline

5.1 Basic concept5.2 Design space of pipelines5.3 Overview of pipelined instruction

processing5.4 Pipelined execution of integer and

Boolean instructions5.5 Pipelined processing of loads and

stores

CH01

Page 3: Computer Architecture Pipelined Processor

5.1.1 Principle of pipelining

Page 4: Computer Architecture Pipelined Processor

Principle of pipeline.

Page 5: Computer Architecture Pipelined Processor

Processing of a sequence of instructions using a basic

pipeline

Page 6: Computer Architecture Pipelined Processor

Pipelined and unpipelined processing

Page 7: Computer Architecture Pipelined Processor

5.1.2 General structure of pipelines

Page 8: Computer Architecture Pipelined Processor

Structure and pipelined operation of the Fx unit of the

IBM Power1

Page 9: Computer Architecture Pipelined Processor

Pipeline Performance Measures

Cycle time: tc

is determined by the worst-case processing time of the longest stage

Repetition Rate: R the shortest possible time interval between

subsequent independent instructions in the pipeline

Performance potential of a pipeline: P P = 1/(R * tc)

PowerPC603 FP double Mul. e.g. R = 2, tc = 12 nsec

P = 1/(R * tc) = 1/(2*12nec) = 44.6 MFLOPS

Page 10: Computer Architecture Pipelined Processor

Performance: RAW-dependent

Latency: specifies the amount of time that the result of a particular

instruction takes to become available in the pipeline for a subsequent dependent instruction.

Define-use latency (1 to 100 cycles) mul r1, r2, r3 add r5, r1, r4

Load-use latency (1 to 3 cycles, sometimes much more) load r1, x add r5, r1, r2

Stalled: the immediately following RAW-dependent instruction has to be stalled in the pipeline for n-1 cycle

Page 11: Computer Architecture Pipelined Processor

Improve Performance

Multiple-operation instructionsHP PA 7100

FMPYADD RM1, RM2, RM3, RA1, RA2RM3RM1*RM2 RA2RA1+RA2

PowerPCFMA for performing (A*C) + B

Page 12: Computer Architecture Pipelined Processor

5.1.4 Application scenarios of pipelines

Page 13: Computer Architecture Pipelined Processor

5.2 Design space of pipelines

key aspect of the design space of pipeline

Page 14: Computer Architecture Pipelined Processor

5.2.2 Basic layout of a pipeline

Design space of the overall stage layout

Page 15: Computer Architecture Pipelined Processor

Increasing parellelism by raising the number of pipeline stages

Page 16: Computer Architecture Pipelined Processor

Eight-stage pipeline

Page 17: Computer Architecture Pipelined Processor

Problems arise for more stages

data and control dependencies occur more frequentlystalled and wait for datareload pipe in case of branch

subtask becomes less balances (in execution time)cycle time is determined by the worst-case

processing time of the longest stage In most case

5-10 stages

Page 18: Computer Architecture Pipelined Processor

Pipeline example: DEC 21064

Page 19: Computer Architecture Pipelined Processor

Layout of the stage sequence

Page 20: Computer Architecture Pipelined Processor

Bypasses (data forwarding in RAW)

Unless special arrangements are made, the results of the operation instruction is written into the register file, or into the memory, and then it is fetched from there as a source operand.

Page 21: Computer Architecture Pipelined Processor

Principle of bypassing in define-use and load-use conflicts

Page 22: Computer Architecture Pipelined Processor

Possibilities for the timing of pipeline operation

Page 23: Computer Architecture Pipelined Processor

5.3 Overview: pipelined instruction processing

Page 24: Computer Architecture Pipelined Processor

Declaration of Logical Pipeline: e.g. Powerpc 601

Page 25: Computer Architecture Pipelined Processor

Detailed Specification of each of the pipelines

Page 26: Computer Architecture Pipelined Processor

Implementation of instruction pipelines (v.s. logical)

Page 27: Computer Architecture Pipelined Processor

Layout of physical pipelines

Page 28: Computer Architecture Pipelined Processor

Multiplicity of pipelines

Page 29: Computer Architecture Pipelined Processor

Preserveing sequential consistency

Page 30: Computer Architecture Pipelined Processor

Preserving sequential consistency, implementation

e.g.

Page 31: Computer Architecture Pipelined Processor

Preserving sequential consistency, e.g.

Page 32: Computer Architecture Pipelined Processor

Case studies: Pentium

Logic layout of Pentium’s pipelines

Page 33: Computer Architecture Pipelined Processor

Case studies: PowerPC 604

Page 34: Computer Architecture Pipelined Processor

5.4 (Specific) Pipelines execution:Integer and Boolean instructions

(FX)

Page 35: Computer Architecture Pipelined Processor

RISC pipelines 4 or 5 stages

C

Page 36: Computer Architecture Pipelined Processor

Traditional FX pipeline of RISC processors

Page 37: Computer Architecture Pipelined Processor

Logical to Physical: e.g. PowerPC601 using a single

universal FX unit

Page 38: Computer Architecture Pipelined Processor

Layout of the 5 stages FX and L/S pipelines in the MIPS R4200

Page 39: Computer Architecture Pipelined Processor

CISC pipeline 6 or 5 stages

Page 40: Computer Architecture Pipelined Processor

Traditional CISC pipeline: The execution of register-memory

instruction

Page 41: Computer Architecture Pipelined Processor

CISC pipeline: Execution of register-register and load/store

instructions

Page 42: Computer Architecture Pipelined Processor

CISC pipeline 5 stage: recycling E/C stage

Page 43: Computer Architecture Pipelined Processor

Implementation of FX units: how many

Page 44: Computer Architecture Pipelined Processor

Trend in increasing the performance

Page 45: Computer Architecture Pipelined Processor

5.5 (Specific) Pipelines execution: loads and

stores

Page 46: Computer Architecture Pipelined Processor

5.5.3 Load-use delay: RICS pipelines

Page 47: Computer Architecture Pipelined Processor

Load-use delay: MIPS

Page 48: Computer Architecture Pipelined Processor

Load-use delay: CISC

Page 49: Computer Architecture Pipelined Processor

Handling Load-use delay

Basic approaches to cope with a load-use delay

Page 50: Computer Architecture Pipelined Processor

Remove Load-use delay

Page 51: Computer Architecture Pipelined Processor

Remove Load-use delay: bringing forward the calculation of virtual address: for slow

cache