Chapter 7
Digital Design and Computer Architecture: ARM® Edition © 2015
Sarah L. Harris and David Money Harris


Chapter 7 :: Topics

•  Introduction
•  Performance Analysis
•  Single-Cycle Processor
•  Multicycle Processor
•  Pipelined Processor
•  Advanced Microarchitecture


Introduction

•  Microarchitecture: how to implement an architecture in hardware
•  Processor:
   –  Datapath: functional blocks
   –  Control: control signals


Microarchitecture

•  Multiple implementations for a single architecture:
   –  Single-cycle: Each instruction executes in a single cycle
   –  Multicycle: Each instruction is broken up into series of shorter steps
   –  Pipelined: Each instruction broken up into series of steps & multiple instructions execute at once


Processor Performance

•  Program execution time

Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)

•  Definitions:
   –  CPI: cycles/instruction
   –  Clock period: seconds/cycle
   –  IPC: instructions/cycle = 1/CPI

•  Challenge is to satisfy constraints of:
   –  Cost
   –  Power
   –  Performance


ARM Processor

•  Consider subset of ARM instructions:
   –  Data-processing instructions:
      •  ADD, SUB, AND, ORR
      •  with register and immediate Src2, but no shifts
   –  Memory instructions:
      •  LDR, STR
      •  with positive immediate offset
   –  Branch instructions:
      •  B


Review: Instruction Formats

[Figure: instruction formats, including Branch]


Architectural State Elements

Determines everything about a processor:
–  Architectural state:
   •  16 registers (including PC)
   •  Status register
–  Memory


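ARM Architectural State Elements

[Figure: the ARM architectural state elements: the PC register (PC' in, PC out), Instruction Memory (A, RD), Register File (A1, A2, A3, WD3, WE3, R15, RD1, RD2), Data Memory (A, WD, WE, RD), and the Status register. Buses are labeled 32 or 4 bits wide.]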


Single-Cycle ARM Processor

•  Datapath
•  Control


Single-Cycle ARM Processor

•  Datapath: start with LDR instruction
•  Example: LDR R1, [R2, #5]
   LDR Rd, [Rn, imm12]


Single-Cycle Datapath: LDR Fetch

STEP 1: Fetch instruction

[Figure: PC addresses the Instruction Memory (port A); the word read out (RD) is Instr.]


Single-Cycle Datapath: LDR Register Read

STEP 2: Read source operands from RF

[Figure: Instr[19:16] (Rn) drives RA1, the A1 address input of the Register File.]

LDR Rd, [Rn, imm12]


Single-Cycle Datapath: LDR Immediate

STEP 3: Extend the immediate

[Figure: Instr[11:0] (imm12) feeds the Extend block, which outputs ExtImm; the Instr[19:16] and Instr[15:12] fields are also labeled.]

LDR Rd, [Rn, imm12]
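The first three LDR steps amount to slicing fields out of the fetched word and zero-extending the offset. A minimal Python sketch follows; the function name `decode_ldr_fields` and the example word `0xE5921005` (the standard ARM encoding of `LDR R1, [R2, #5]`) are illustrative assumptions, not part of the slides.

```python
def decode_ldr_fields(instr: int) -> dict:
    """Slice the fields the datapath uses out of a 32-bit LDR word."""
    rn = (instr >> 16) & 0xF   # Instr[19:16]: base register, drives RA1
    rd = (instr >> 12) & 0xF   # Instr[15:12]: destination register
    imm12 = instr & 0xFFF      # Instr[11:0]: unsigned 12-bit offset
    ext_imm = imm12            # zero extension: upper 20 bits of ExtImm are 0
    return {"rn": rn, "rd": rd, "imm12": imm12, "ext_imm": ext_imm}


# Assumed example word for LDR R1, [R2, #5]
fields = decode_ldr_fields(0xE5921005)
print(fields)
```

Because the offset is unsigned here, the Extend block only pads with zeros, so `ext_imm` equals `imm12` once both are treated as 32-bit values.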


Single-Cycle Datapath: LDR Address

STEP 4: Compute the memory address

[Figure: the ALU adds SrcA (the base register read via RA1) and SrcB (ExtImm) to produce ALUResult; ALUControl = 00.]

LDR Rd, [Rn, imm12]


LDR Rd, [Rn, imm12]

Single-Cycle Datapath: LDR Memory Read

STEP 5: Read data from memory and write it back to register file

[Figure: ALUResult addresses the Data Memory; ReadData returns to the Register File write port WD3, with Instr[15:12] (Rd) on A3. RegWrite = 1, ALUControl = 00.]


Single-Cycle Datapath: PC Increment

STEP 6: Determine address of next instruction

[Figure: an adder computes PCPlus4 = PC + 4, which feeds PC'. RegWrite = 1, ALUControl = 00.]


Single-Cycle Datapath: Access to PC

PC can be source/destination of instruction
•  Source: R15 must be available in Register File
   –  PC is read as the current PC plus 8
•  Destination: Be able to write result to PC

[Figure: a second adder forms PCPlus8 = PCPlus4 + 4 and drives the R15 input of the Register File; a multiplexer controlled by PCSrc selects PC'. Control labels: RegWrite, PCSrc, ALUControl.]


Single-Cycle Datapath: STR

Expand datapath to handle STR:
•  Write data in Rd to memory

[Figure: Instr[15:12] (Rd) drives RA2; RD2 is WriteData, connected to the Data Memory WD input. RegWrite = 0, PCSrc = 0, MemWrite = 1, ALUControl = 00.]

STR Rd, [Rn, imm12]


Single-Cycle Datapath: Data-processing

With immediate Src2:
•  Read from Rn and Imm8 (ImmSrc chooses the zero-extended Imm8 instead of Imm12)
•  Write ALUResult to register file
•  Write to Rd

[Figure: a multiplexer controlled by MemtoReg selects Result from ALUResult or ReadData; the ALU also outputs ALUFlags. PCSrc = 0, RegWrite = 1, ImmSrc = 0, ALUControl = varies, MemWrite = 0, MemtoReg = 0.]

ADD Rd, Rn, imm8


Single-Cycle Datapath: Data-processing

With register Src2:
•  Read from Rn and Rm (instead of Imm8)
•  Write ALUResult to register file
•  Write to Rd

[Figure: a multiplexer controlled by RegSrc selects RA2 from Instr[3:0] (Rm) or Instr[15:12]; a multiplexer controlled by ALUSrc selects SrcB from RD2 or ExtImm. PCSrc = 0, RegSrc = 0, RegWrite = 1, ImmSrc = X, ALUSrc = 0, ALUControl = varies, MemWrite = 0, MemtoReg = 0.]

ADD Rd, Rn, Rm


Single-Cycle Datapath: B

Calculate branch target address:
BTA = (ExtImm) + (PC + 8)
ExtImm = Imm24 << 2 and sign-extended

[Figure: Instr[23:0] feeds the Extend block; a second RegSrc multiplexer selects RA1 from Instr[19:16] or the constant 15 (R15), so the ALU adds PC + 8 and ExtImm. PCSrc = 1, RegSrc = x1, RegWrite = 0, ImmSrc = 10, ALUSrc = 1, ALUControl = 00, MemWrite = 0, MemtoReg = 0.]

B Label
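The branch target calculation can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper names `sign_extend` and `branch_target` are hypothetical, and results are wrapped to 32 bits to mimic the hardware adder.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def branch_target(pc: int, imm24: int) -> int:
    """BTA = ExtImm + (PC + 8), with ExtImm = sign-extended (Imm24 << 2)."""
    ext_imm = sign_extend((imm24 & 0xFFFFFF) << 2, 26)  # 24 bits shifted left by 2 gives 26 bits
    return (ext_imm + pc + 8) & 0xFFFFFFFF               # wrap to 32 bits


# Forward branch that skips two instructions past PC + 8
print(hex(branch_target(0x8000, 0x000002)))
# Imm24 = -2 encodes a branch back to the branch instruction itself
print(hex(branch_target(0x8000, 0xFFFFFE)))
```

Shifting left by 2 works because ARM instructions are word-aligned, so the two low address bits of any target are always zero and need not be stored in the instruction.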


Single-Cycle ARM Processor

[Figure: complete single-cycle processor. The Control Unit takes Cond (Instr[31:28]), Op (Instr[27:26]), Funct (Instr[25:20]), Rd (Instr[15:12]), and the ALU Flags, and produces PCSrc, MemtoReg, MemWrite, ALUControl, ALUSrc, ImmSrc, RegWrite, and RegSrc for the datapath.]


Example: ORR


Review: Processor Performance

Program Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
                       = # instructions × CPI × TC


Single-Cycle Performance

TC limited by critical path (LDR)


Single-Cycle Performance

•  Single-cycle critical path:
   Tc1 = tpcq_PC + tmem + tdec + max[tmux + tRFread, tsext + tmux] + tALU + tmem + tmux + tRFsetup

•  Typically, limiting paths are:
   –  memory, ALU, register file
   –  Tc1 = tpcq_PC + 2tmem + tdec + tRFread + tALU + 2tmux + tRFsetup


Single-Cycle Performance Example

Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     40
Register setup        tsetup      50
Multiplexer           tmux        25
ALU                   tALU        120
Decoder               tdec        70
Memory read           tmem        200
Register file read    tRFread     100
Register file setup   tRFsetup    60

Tc1 = ?


Single-Cycle Performance Example

Tc1 = tpcq_PC + 2tmem + tdec + tRFread + tALU + 2tmux + tRFsetup
    = [40 + 2(200) + 70 + 100 + 120 + 2(25) + 60] ps
    = 840 ps
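The arithmetic can be reproduced with a short Python sketch. The dictionary keys mirror the parameter names in the delay table; the names `DELAYS_PS` and `single_cycle_period_ps` are hypothetical, and only the simplified limiting-path formula is evaluated.

```python
# Delays in picoseconds, copied from the example table.
DELAYS_PS = {
    "tpcq_PC": 40, "tsetup": 50, "tmux": 25, "tALU": 120,
    "tdec": 70, "tmem": 200, "tRFread": 100, "tRFsetup": 60,
}


def single_cycle_period_ps(d: dict) -> int:
    """Tc1 = tpcq_PC + 2tmem + tdec + tRFread + tALU + 2tmux + tRFsetup."""
    return (d["tpcq_PC"] + 2 * d["tmem"] + d["tdec"] + d["tRFread"]
            + d["tALU"] + 2 * d["tmux"] + d["tRFsetup"])


tc1_ps = single_cycle_period_ps(DELAYS_PS)
print(tc1_ps)  # 840
```

The two memory reads (instruction fetch and data load) contribute 400 ps of the 840 ps total, which is why LDR sets the cycle time.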


Single-Cycle Performance Example

Program with 100 billion instructions:
Execution Time = # instructions × CPI × TC
               = (100 × 10^9)(1)(840 × 10^-12 s)
               = 84 seconds
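As a quick check of the execution-time formula, here is a minimal sketch; the function name `execution_time_s` is an assumption, and the cycle time is passed in picoseconds to keep the inputs exact.

```python
def execution_time_s(instructions: int, cpi: float, tc_ps: float) -> float:
    """Execution time = # instructions x CPI x Tc, with Tc given in picoseconds."""
    return instructions * cpi * tc_ps / 1e12  # 1e12 ps per second


single_cycle_s = execution_time_s(100 * 10**9, 1, 840)
print(single_cycle_s)
```

The same function applies to any other microarchitecture once its CPI and cycle time are known, which makes the three factors easy to compare side by side.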


Multicycle ARM Processor

•  Single-cycle:
   +  simple
   -  cycle time limited by longest instruction (LDR)
   -  separate memories for instruction and data
   -  3 adders/ALUs

•  Multicycle processor addresses these issues by breaking instruction into shorter steps
   o  shorter instructions take fewer steps
   o  can re-use hardware
   o  cycle time is faster


Multicycle ARM Processor

•  Single-cycle:
   +  simple
   -  cycle time limited by longest instruction (LDR)
   -  separate memories for instruction and data
   -  3 adders/ALUs

•  Multicycle:
   +  higher clock speed
   +  simpler instructions run faster
   +  reuse expensive hardware on multiple cycles
   -  sequencing overhead paid many times

Same design steps as single-cycle:
•  first datapath
•  then control


Multicycle State Elements

Replace Instruction and Data memories with a single unified memory (more realistic)


Multicycle Datapath: LDR Rd, [Rn, imm12]

STEP 1 (Instruction Fetch): Fetch instruction
STEP 2 (LDR Register Read): Read source operands from RF
STEP 3 (LDR Address): Compute the memory address
STEP 4 (LDR Memory Read): Read data from memory
STEP 5 (LDR Write Register): Write data back to register file
STEP 6 (Increment PC): Increment PC


Multicycle Datapath: Access to PC

PC can be read/written by instruction
•  Read: R15 (PC + 8) available in Register File


Multicycle Datapath: Read to PC (R15)

Example: ADD R1, R15, R2
•  R15 needs to be read as PC + 8 from Register File (RF) in 2nd step
•  So (also in 2nd step) PC + 8 is produced by ALU and routed to R15 input of RF
   –  SrcA = PC (which was already updated in step 1 to PC + 4)
   –  SrcB = 4
   –  ALUResult = PC + 8
•  ALUResult is fed to R15 input port of RF in 2nd step (which is then routed to RD1 output of RF)
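The two-step bookkeeping behind the PC + 8 value can be traced in a few lines. This is only an illustrative model of the values on the wires, not of the hardware timing; the function name `r15_seen_in_step2` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def r15_seen_in_step2(instr_address: int) -> int:
    """Return the value an instruction reads from R15 in the multicycle design."""
    pc = instr_address
    # Step 1 (Fetch): the PC register is updated to PC + 4.
    pc = (pc + 4) & MASK32
    # Step 2 (Decode): the ALU adds SrcA = PC (already PC + 4) and SrcB = 4.
    src_a, src_b = pc, 4
    alu_result = (src_a + src_b) & MASK32
    # ALUResult drives the R15 input of the register file.
    return alu_result


# ADD R1, R15, R2 located at address 0x1000 reads R15 as 0x1008
print(hex(r15_seen_in_step2(0x1000)))
```

Reusing the ALU during the second step is what lets the multicycle design avoid the dedicated PC adders of the single-cycle datapath.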


Multicycle Datapath: Access to PC

PC can be read/written by instruction
•  Read: R15 (PC + 8) available in Register File
•  Write: Be able to write result of instruction to PC


Multicycle Datapath: Write to PC (R15)

Example: SUB R15, R8, R3
•  Result of instruction needs to be written to the PC register
•  ALUResult already routed to the PC register, just assert PCWrite


Multicycle Datapath: STR

Write data in Rd to memory


Multicycle Datapath: Data-processing

With immediate addressing (i.e., an immediate Src2), no additional changes needed for datapath


Multicycle Datapath: Data-processing

With register addressing (register Src2): Read from Rn and Rm


Multicycle Datapath: B

Calculate branch target address:
BTA = (ExtImm) + (PC + 8)
ExtImm = Imm24 << 2 and sign-extended


Multicycle ARM Processor

Main Controller FSM: Fetch

Main Controller FSM: Decode

Main Controller FSM: Address

Main Controller FSM: Read Memory

Main Controller FSM: LDR

Main Controller FSM: STR

Main Controller FSM: Data-processing

Multicycle Controller FSM


•  Instruc>onstakedifferentnumberofcycles:–  3cycles:B –  4cycles:DP, STR –  5cycles: LDR

•  CPIisweightedaverage•  SPECINT2000benchmark:

–  25%loads–  10%stores–  13%branches–  52%R-type

Average CPI = (0.13)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12

Mul>cycleProcessorPerformance
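The weighted average above can be checked with a few lines of Python. This is only a sketch: the dictionary keys and the function name `average_cpi` are illustrative choices, and the slide's "R-type" class is treated as the 4-cycle data-processing (DP) class.

```python
# Instruction mix and cycle counts as listed on the slide.
# Key names are illustrative; "data_processing" stands for the slide's "R-type" / DP class.
MIX = {"load": 0.25, "store": 0.10, "branch": 0.13, "data_processing": 0.52}
CYCLES = {"load": 5, "store": 4, "branch": 3, "data_processing": 4}

def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction over all instruction classes."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "instruction mix must sum to 1"
    return sum(mix[name] * cycles[name] for name in mix)

print(round(average_cpi(MIX, CYCLES), 2))  # 4.12
```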

Multicycle Processor Performance

Multicycle critical path:
•  Assumptions:
    –  RF is faster than memory
    –  Writing memory is faster than reading memory

T_c2 = t_pcq + 2 t_mux + max(t_ALU + t_mux, t_mem) + t_setup

Multicycle Performance Example

Element                Parameter    Delay (ps)
Register clock-to-Q    t_pcq_PC     40
Register setup         t_setup      50
Multiplexer            t_mux        25
ALU                    t_ALU        120
Decoder                t_dec        70
Memory read            t_mem        200
Register file read     t_RFread     100
Register file setup    t_RFsetup    60

T_c2 = t_pcq + 2 t_mux + max[t_ALU + t_mux, t_mem] + t_setup
     = [40 + 2(25) + 200 + 50] ps
     = 340 ps
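As a quick check of the arithmetic, the critical-path formula can be evaluated directly from the delay table. The sketch below assumes the register clock-to-Q entry in the table is the t_pcq term of the formula; the dictionary and function names are hypothetical.

```python
# Delays in picoseconds, copied from the table (only the terms the formula uses).
DELAYS_PS = {"t_pcq": 40, "t_setup": 50, "t_mux": 25, "t_ALU": 120, "t_mem": 200}

def multicycle_cycle_time(d):
    """T_c2 = t_pcq + 2*t_mux + max(t_ALU + t_mux, t_mem) + t_setup."""
    return (d["t_pcq"] + 2 * d["t_mux"]
            + max(d["t_ALU"] + d["t_mux"], d["t_mem"])
            + d["t_setup"])

print(multicycle_cycle_time(DELAYS_PS))  # 340
```

With these numbers the memory read (200 ps) dominates the ALU path (145 ps), so memory speed sets the cycle time.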

Multicycle Performance Example

For a program with 100 billion instructions executing on a multicycle ARM processor:
    –  CPI = 4.12 cycles/instruction
    –  Clock cycle time: T_c2 = 340 ps

Execution Time = (# instructions) × CPI × T_c
               = (100 × 10^9)(4.12)(340 × 10^-12)
               = 140 seconds

This is slower than the single-cycle processor (84 sec.)
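The execution-time product is easy to reproduce. A minimal sketch, with a hypothetical helper name and the cycle time passed in picoseconds:

```python
def execution_time_s(instructions, cpi, cycle_time_ps):
    """Execution time = (# instructions) x CPI x T_c, with T_c given in picoseconds."""
    return instructions * cpi * cycle_time_ps * 1e-12

multicycle = execution_time_s(100e9, 4.12, 340)
print(f"{multicycle:.1f} s")  # 140.1 s, which the slide rounds to 140 seconds
```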

Review: Single-Cycle ARM Processor

[Figure: single-cycle ARM processor: datapath (PC, instruction memory, register file, extend unit, ALU, data memory) and control unit]

Review: Multicycle ARM Processor

[Figure: multicycle ARM processor datapath and control]

Pipelined ARM Processor

•  Temporal parallelism
•  Divide single-cycle processor into 5 stages:
    –  Fetch
    –  Decode
    –  Execute
    –  Memory
    –  Writeback
•  Add pipeline registers between stages

Single-Cycle vs. Pipelined

[Figure: timing diagrams with a time axis from 0 to 1500 ps. Each instruction passes through Fetch Instruction, Dec Read Reg, Execute ALU, Memory Read/Write, and Wr Reg. Single-cycle: instructions run one after another. Pipelined: the stages of successive instructions overlap.]

Pipelined Processor Abstraction

    LDR R2, [R0, #40]
    ADD R3, R9, R10
    SUB R4, R1, R5
    AND R5, R12, R13
    STR R6, [R1, #20]
    ORR R7, R11, #42

[Figure: pipeline diagram over cycles 1 to 10; each instruction above passes through IM, RF, ALU, DM, and RF, starting one cycle after the previous instruction]

Single-Cycle & Pipelined Datapath

[Figure: single-cycle datapath (top) and pipelined datapath (bottom). The pipelined version adds pipeline registers between the Fetch, Decode, Execute, Memory, and Writeback stages, and signal names gain a stage suffix (PCF, InstrD, SrcAE, ALUOutM, ResultW).]

Corrected Pipelined Datapath

•  WA3 must arrive at same time as Result
•  Register file written on falling edge of CLK

[Figure: pipelined datapath with the write address carried through the pipeline registers as WA3D, WA3E, WA3M, WA3W]

Optimized Pipelined Datapath

Remove adder by using PCPlus4F after PC has been updated to PC+4

[Figure: pipelined datapath before and after the optimization; in the optimized version PCPlus4F supplies PCPlus8D to the R15 input of the register file]

Pipelined Processor Control

•  Same control unit as single-cycle processor
•  Control delayed to proper pipeline stage

[Figure: pipelined processor with control unit; control signals are pipelined alongside the data (for example RegWriteD, RegWriteE, RegWriteM, RegWriteW), and the condition unit in the Execute stage uses CondE and FlagsE to produce CondExE]

Pipeline Hazards

•  Occur when an instruction depends on the result of an instruction that hasn't completed
•  Types:
    –  Data hazard: register value not yet written back to register file
    –  Control hazard: next instruction not decided yet (caused by branch)

Data Hazard

    ADD R1, R4, R5
    AND R8, R1, R3
    ORR R9, R6, R1
    SUB R10, R1, R7

[Figure: pipeline diagram over cycles 1 to 8 for the sequence above; AND, ORR, and SUB all read R1, which ADD writes]

Handling Data Hazards

•  Insert NOPs in code at compile time
•  Rearrange code at compile time
•  Forward data at run time
•  Stall the processor at run time

Compile-Time Hazard Elimination

•  Insert enough NOPs for result to be ready
•  Or move independent useful instructions forward

    ADD R1, R4, R5
    NOP
    NOP
    AND R8, R1, R3
    ORR R9, R6, R1
    SUB R10, R1, R7

[Figure: pipeline diagram over cycles 1 to 10 for the sequence above, with two NOPs between ADD and AND]

Data Forwarding

•  Check if register read in Execute stage matches register written in Memory or Writeback stage
•  If so, forward result

    ADD R1, R4, R5
    AND R8, R1, R3
    ORR R9, R6, R1
    SUB R10, R1, R7

[Figure: pipeline diagram over cycles 1 to 8 for the sequence above, showing the ADD result forwarded to the later instructions that read R1]

Data Forwarding

[Figure: pipelined processor with a hazard unit. The hazard unit takes Match, RegWriteM, and RegWriteW as inputs and drives ForwardAE and ForwardBE, the selects of two three-input multiplexers (00, 01, 10) in front of the ALU.]

Data Forwarding

•  Execute stage register matches Memory stage register?
    Match_1E_M = (RA1E == WA3M)
    Match_2E_M = (RA2E == WA3M)
•  Execute stage register matches Writeback stage register?
    Match_1E_W = (RA1E == WA3W)
    Match_2E_W = (RA2E == WA3W)
•  If it matches, forward result:
    if      (Match_1E_M • RegWriteM) ForwardAE = 10;
    else if (Match_1E_W • RegWriteW) ForwardAE = 01;
    else                             ForwardAE = 00;

ForwardBE same but with Match_2E
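The forwarding decision can be modeled as a small function to make the priority order explicit: the Memory stage holds the newer value, so it wins over Writeback. This is a behavioral sketch rather than HDL; the function name and the use of integers for register numbers are assumptions made for illustration.

```python
def forward_select(ra_e, wa3_m, wa3_w, reg_write_m, reg_write_w):
    """Return the 2-bit forwarding select for one Execute-stage source register.

    Pass RA1E to get ForwardAE, or RA2E to get ForwardBE.
    """
    if ra_e == wa3_m and reg_write_m:
        return 0b10  # forward the result from the Memory stage
    if ra_e == wa3_w and reg_write_w:
        return 0b01  # forward the result from the Writeback stage
    return 0b00      # no hazard: use the value read from the register file

# AND R8, R1, R3 in Execute while an instruction writing R1 is in Memory
# and an unrelated instruction writing R4 is in Writeback.
forward_ae = forward_select(ra_e=1, wa3_m=1, wa3_w=4, reg_write_m=True, reg_write_w=True)
forward_be = forward_select(ra_e=3, wa3_m=1, wa3_w=4, reg_write_m=True, reg_write_w=True)
print(format(forward_ae, "02b"), format(forward_be, "02b"))  # 10 00
```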

Stalling

    LDR R1, [R4, #40]
    AND R8, R1, R3
    ORR R9, R6, R1
    SUB R10, R1, R7

[Figure: pipeline diagram over cycles 1 to 8 for the sequence above, marked "Trouble!": AND needs R1 before LDR has read it from data memory]

[Figure: the same sequence over cycles 1 to 9 with a one-cycle stall: AND repeats its Decode stage and ORR repeats its Fetch stage]

Stalling Hardware

[Figure: pipelined processor with hazard unit extended for stalling. The hazard unit also receives MemtoRegE and drives StallF and StallD (enable inputs, EN, of the Fetch and Decode pipeline registers) and FlushD and FlushE (clear inputs, CLR).]

Stalling Logic

•  Is either source register in the Decode stage the same as the one being written in the Execute stage?
    Match_12D_E = (RA1D == WA3E) + (RA2D == WA3E)
•  Is an LDR in the Execute stage AND Match_12D_E?
    ldrStallD = Match_12D_E • MemtoRegE
    StallF = StallD = FlushE = ldrStallD
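A behavioral sketch of the load-use stall check follows. The function name and the boolean and integer encodings are illustrative assumptions; MemtoRegE is used, as on the slide, to indicate that an LDR is in the Execute stage.

```python
def ldr_stall(ra1_d, ra2_d, wa3_e, mem_to_reg_e):
    """True when a Decode-stage source matches the destination of an LDR in Execute."""
    match_12d_e = (ra1_d == wa3_e) or (ra2_d == wa3_e)
    return match_12d_e and mem_to_reg_e

# LDR R1, [R4, #40] in Execute, AND R8, R1, R3 in Decode: stall for one cycle.
stall = ldr_stall(ra1_d=1, ra2_d=3, wa3_e=1, mem_to_reg_e=True)
stall_f = stall_d = flush_e = stall
print(stall_f, stall_d, flush_e)  # True True True
```

When the Execute-stage instruction is not a load (MemtoRegE is 0), forwarding alone resolves the hazard and no stall is raised.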

Control Hazards

•  B:
    –  Branch not determined until the Writeback stage of pipeline
    –  Instructions after branch fetched before branch occurs
    –  These 4 instructions must be flushed if branch happens
•  Writes to PC (R15) similar

Control Hazards

    20    B   3C
    24    AND R8, R1, R3
    28    ORR R9, R6, R1
    2C    SUB R10, R1, R7
    30    SUB R11, R1, R8
    34    ...
          ...
    64    ADD R12, R3, R4

[Figure: pipeline diagram over cycles 1 to 10 for the code above; the four instructions after B are marked "Flush these instructions" and ADD at address 64 is fetched once the branch resolves]

Branch misprediction penalty
•  Number of instructions flushed when branch is taken (4)
•  May be reduced by determining BTA earlier

Early Branch Resolution

•  Determine BTA in Execute stage
    –  Branch misprediction penalty = 2 cycles
•  Hardware changes
    –  Add a branch multiplexer before PC register to select BTA from ALUResultE
    –  Add BranchTakenE select signal for this multiplexer (only asserted if branch condition satisfied)
    –  PCSrcW now only asserted for writes to PC

Pipelined Processor with Early BTA

[Figure: pipelined processor with hazard unit and an added branch multiplexer before the PC register; BranchTakenE from the condition unit selects the BTA from ALUResultE]

Control Hazards with Early BTA

    20    B   3C
    24    AND R8, R1, R3
    28    ORR R9, R6, R1
    2C    SUB R10, R1, R7
    30    SUB R11, R1, R8
    34    ...
          ...
    64    ADD R12, R3, R4

[Figure: pipeline diagram over cycles 1 to 10 for the code above; with the branch resolved in the Execute stage, only AND and ORR are marked "Flush these instructions" before ADD at address 64 is fetched]

Control Stalling Logic

•  PCWrPendingF = 1 if write to PC in Decode, Execute or Memory
    PCWrPendingF = PCSrcD + PCSrcE + PCSrcM
•  Stall Fetch if PCWrPendingF
    StallF = ldrStallD + PCWrPendingF
•  Flush Decode if PCWrPendingF OR PC is written in Writeback OR branch is taken
    FlushD = PCWrPendingF + PCSrcW + BranchTakenE
•  Flush Execute if branch is taken
    FlushE = ldrStallD + BranchTakenE
•  Stall Decode if ldrStallD (as before)
    StallD = ldrStallD
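The five equations can be collected into one function to see which signals fire in a given situation. This is a behavioral sketch, not the hazard unit's HDL; the function name and the dictionary return value are assumptions made for illustration.

```python
def hazard_control(ldr_stall_d, pc_src_d, pc_src_e, pc_src_m, pc_src_w, branch_taken_e):
    """Compute stall and flush signals from the slide's Boolean equations."""
    pc_wr_pending_f = pc_src_d or pc_src_e or pc_src_m
    return {
        "StallF": ldr_stall_d or pc_wr_pending_f,
        "StallD": ldr_stall_d,
        "FlushD": pc_wr_pending_f or pc_src_w or branch_taken_e,
        "FlushE": ldr_stall_d or branch_taken_e,
    }

# A taken branch resolved in Execute: flush Decode and Execute, stall nothing.
print(hazard_control(False, False, False, False, False, True))
# {'StallF': False, 'StallD': False, 'FlushD': True, 'FlushE': True}
```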

ARM Pipelined Processor with Hazard Unit

[Figure: complete pipelined ARM processor: datapath, control unit, condition unit, and hazard unit with outputs StallF, StallD, FlushD, FlushE, ForwardAE, and ForwardBE]

Pipelined Performance Example

•  SPECINT2000 benchmark:
    –  25% loads
    –  10% stores
    –  13% branches
    –  52% R-type
•  Suppose:
    –  40% of loads used by next instruction
    –  50% of branches mispredicted
•  What is the average CPI?
    –  Load CPI = 1 when not stalling, 2 when stalling
       So, CPI_lw = 1(0.6) + 2(0.4) = 1.4
    –  Branch CPI = 1 when not stalling, 3 when stalling
       So, CPI_beq = 1(0.5) + 3(0.5) = 2

Average CPI = (0.25)(1.4) + (0.1)(1) + (0.13)(2) + (0.52)(1) = 1.23
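The same calculation in Python, as a sketch with hypothetical names. Each stalling class gets an effective CPI from the fraction of its instructions that stall, and the classes are then combined by the instruction mix.

```python
def effective_cpi(cpi_no_stall, cpi_stall, fraction_stalling):
    """Average CPI of one instruction class given how often it stalls."""
    return cpi_no_stall * (1 - fraction_stalling) + cpi_stall * fraction_stalling

cpi_load = effective_cpi(1, 2, 0.40)    # 40% of loads are used by the next instruction
cpi_branch = effective_cpi(1, 3, 0.50)  # 50% of branches are mispredicted

# Stores and R-type instructions never stall in this model, so their CPI is 1.
average = 0.25 * cpi_load + 0.10 * 1 + 0.13 * cpi_branch + 0.52 * 1
print(round(cpi_load, 2), round(cpi_branch, 2), round(average, 2))  # 1.4 2.0 1.23
```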

Pipelined Performance

•  Pipelined processor critical path:

T_c3 = max [
    t_pcq + t_mem + t_setup,              Fetch
    2(t_RFread + t_setup),                Decode
    t_pcq + 2 t_mux + t_ALU + t_setup,    Execute
    t_pcq + t_mem + t_setup,              Memory
    2(t_pcq + t_mux + t_RFwrite)          Writeback
]

Pipelined Performance Example

Element                Parameter    Delay (ps)
Register clock-to-Q    t_pcq_PC     40
Register setup         t_setup      50
Multiplexer            t_mux        25
ALU                    t_ALU        120
Memory read            t_mem        200
Register file read     t_RFread     100
Register file setup    t_RFsetup    60
Register file write    t_RFwrite    70

Cycle time: T_c3 = 2(t_RFread + t_setup)
                 = 2[100 + 50] ps
                 = 300 ps
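To see why the Decode stage sets the cycle time, each stage's delay can be evaluated from the table. In this sketch the names are illustrative and the table's register clock-to-Q entry is assumed to be the t_pcq term.

```python
# Delays in picoseconds, copied from the table.
D = {"t_pcq": 40, "t_setup": 50, "t_mux": 25, "t_ALU": 120,
     "t_mem": 200, "t_RFread": 100, "t_RFwrite": 70}

def stage_delays(d):
    """Delay of each pipeline stage, following the five terms of the T_c3 formula."""
    return {
        "Fetch": d["t_pcq"] + d["t_mem"] + d["t_setup"],
        "Decode": 2 * (d["t_RFread"] + d["t_setup"]),
        "Execute": d["t_pcq"] + 2 * d["t_mux"] + d["t_ALU"] + d["t_setup"],
        "Memory": d["t_pcq"] + d["t_mem"] + d["t_setup"],
        "Writeback": 2 * (d["t_pcq"] + d["t_mux"] + d["t_RFwrite"]),
    }

delays = stage_delays(D)
slowest = max(delays, key=delays.get)
print(delays)                     # Fetch 290, Decode 300, Execute 260, Memory 290, Writeback 270
print(slowest, delays[slowest])   # Decode 300
```

The Decode and Writeback terms are doubled in the formula, so each must fit within half a cycle.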

Pipelined Performance Example

Program with 100 billion instructions

Execution Time = (# instructions) × CPI × T_c
               = (100 × 10^9)(1.23)(300 × 10^-12)
               = 36.9 seconds

Processor Performance Comparison

Processor       Execution Time (seconds)    Speedup (single-cycle as baseline)
Single-cycle    84                          1
Multicycle      140                         0.6
Pipelined       36.9                        2.28
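The speedup column is the baseline execution time divided by each design's execution time. A short sketch using the table's values (names are illustrative):

```python
# Execution times in seconds, taken from the comparison table.
EXEC_TIME_S = {"Single-cycle": 84.0, "Multicycle": 140.0, "Pipelined": 36.9}

def speedups(times, baseline="Single-cycle"):
    """Speedup of each design relative to the baseline design."""
    return {name: times[baseline] / t for name, t in times.items()}

for name, s in speedups(EXEC_TIME_S).items():
    print(f"{name}: {s:.2f}")
# Single-cycle: 1.00
# Multicycle: 0.60
# Pipelined: 2.28
```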

Advanced Microarchitecture

•  Deep Pipelining
•  Micro-operations
•  Branch Prediction
•  Superscalar Processors
•  Out of Order Processors
•  Register Renaming
•  SIMD
•  Multithreading
•  Multiprocessors

Deep Pipelining

•  10-20 stages typical
•  Number of stages limited by:
    –  Pipeline hazards
    –  Sequencing overhead
    –  Power
    –  Cost

Micro-operations

•  Decompose more complex instructions into a series of simple instructions called micro-operations (micro-ops or µ-ops)
•  At run-time, complex instructions are decoded into one or more micro-ops
•  Used heavily in CISC (complex instruction set computer) architectures (e.g., x86)
•  Used for some ARM instructions, for example:

    Complex Op             Micro-op Sequence
    LDR R1, [R2], #4       LDR R1, [R2]
                           ADD R2, R2, #4

Without µ-ops, would need 2nd write port on the register file

Micro-operations

•  Allow for dense code (fewer memory accesses)
•  Yet preserve simplicity of RISC hardware
•  ARM strikes balance by choosing instructions that:
    –  Give better code density than pure RISC instruction sets (such as MIPS)
    –  Enable more efficient decoding than CISC instruction sets (such as x86)

Branch Prediction

•  Guess whether branch will be taken
    –  Backward branches are usually taken (loops)
    –  Consider history to improve guess
•  Good prediction reduces fraction of branches requiring a flush

Branch Prediction

•  Ideal pipelined processor: CPI = 1
•  Branch misprediction increases CPI
•  Static branch prediction:
    –  Check direction of branch (forward or backward)
    –  If backward, predict taken
    –  Else, predict not taken
•  Dynamic branch prediction:
    –  Keep history of last several hundred (or thousand) branches in branch target buffer, record:
        •  Branch destination
        •  Whether branch was taken

Branch Prediction Example

        MOV R1, #0          ; R1 = sum
        MOV R0, #0          ; R0 = i
    FOR                     ; for (i=0; i<10; i=i+1)
        CMP R0, #10
        BGE DONE
        ADD R1, R1, R0      ; sum = sum + i
        ADD R0, R0, #1
        B   FOR
    DONE

1-Bit Branch Predictor

•  Remembers whether branch was taken the last time and does the same thing
•  Mispredicts first and last branch of loop
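A small simulation illustrates the claim for the `BGE DONE` branch of a 10-iteration loop, which is not taken ten times and then taken once. This is a sketch under stated assumptions: the function names are hypothetical, the predictor starts out predicting "not taken", and the loop is run several times in a row so the predictor's state carries over from one run to the next.

```python
def loop_branch_outcomes(iterations=10):
    """Outcomes of BGE DONE for one run of the loop: not taken each iteration, then taken."""
    return [False] * iterations + [True]

def run_1bit(outcomes, last_taken):
    """Predict 'same as last time'; return (mispredictions, final predictor state)."""
    misses = 0
    for taken in outcomes:
        if last_taken != taken:
            misses += 1
        last_taken = taken
    return misses, last_taken

state = False  # assumed initial state: the last outcome was "not taken"
for run in range(3):
    misses, state = run_1bit(loop_branch_outcomes(), state)
    print(f"run {run}: {misses} mispredictions")
# run 0: 1 mispredictions
# run 1: 2 mispredictions
# run 2: 2 mispredictions
```

After the first run, the predictor is left in the "taken" state by the loop exit, so every later run mispredicts both its first and its last branch.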

2-Bit Branch Predictor

Only mispredicts last branch of loop

[Figure: state transition diagram with four states: strongly taken (predict taken), weakly taken (predict taken), weakly not taken (predict not taken), strongly not taken (predict not taken); each branch outcome, taken or not taken, moves the predictor one state toward the matching end]
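The four-state predictor can be modeled as a saturating counter. The sketch below assumes an encoding of 0 (strongly not taken) through 3 (strongly taken) and hypothetical function names; it replays the loop-exit branch of a 10-iteration loop (not taken ten times, then taken once) over several runs.

```python
def loop_branch_outcomes(iterations=10):
    """Outcomes of the loop-exit branch for one run: not taken each iteration, then taken."""
    return [False] * iterations + [True]

def run_2bit(outcomes, counter):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken.

    Returns (mispredictions, final counter value).
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses, counter

counter = 0  # assumed initial state: strongly not taken
for run in range(3):
    misses, counter = run_2bit(loop_branch_outcomes(), counter)
    print(f"run {run}: {misses} mispredictions")
# run 0: 1 mispredictions
# run 1: 1 mispredictions
# run 2: 1 mispredictions
```

The single taken outcome at loop exit only moves the counter to "weakly not taken", so the first branch of the next run is still predicted correctly.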

Superscalar

•  Multiple copies of datapath execute multiple instructions at once
•  Dependencies make it tricky to issue multiple instructions at once

[Figure: two-way superscalar datapath: instruction memory, a register file with read ports A1/RD1, A2/RD2, A4/RD4, A5/RD5 and write ports A3/WD3, A6/WD6, two ALUs, and a data memory with two ports]

Superscalar Example

    LDR R8, [R0, #40]
    ADD R9, R1, R2
    SUB R10, R1, R3
    AND R11, R3, R4
    ORR R12, R1, R5
    STR R5, [R0, #80]

Ideal IPC: 2    Actual IPC: 2

[Figure: pipeline diagram over cycles 1 to 8; the instructions above issue two per cycle (LDR with ADD, SUB with AND, ORR with STR)]


Superscalar with Dependencies

        LDR R8, [R0, #40]
        ADD R9, R8, R1
        SUB R8, R2, R3
        AND R10, R4, R8
        ORR R11, R5, R6
        STR R7, [R11, #80]

Ideal IPC: 2    Actual IPC: 6/5 = 1.2

[Figure: pipeline diagram over cycles 1 to 9. LDR issues alone, ADD stalls waiting for R8 and then issues with SUB, AND issues with ORR, and STR issues last.]
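The two IPC figures for the in-order superscalar examples can be reproduced with a small issue model. This is a sketch under stated assumptions, not the hardware's actual control logic: a load result is usable two cycles after the load issues, any other result is forwarded to the next cycle, and only RAW dependencies are checked. The tuple encoding of instructions and the function name are hypothetical.

```python
LOAD_USE_LATENCY = 2  # assumed: a load result is usable two cycles after issue
ALU_LATENCY = 1       # assumed: other results are forwarded to the next cycle


def issue_in_order(program, width=2):
    """Return the issue cycle of each instruction for in-order issue.

    program -- list of (opcode, destination or None, [source registers])
    Only RAW dependencies are modeled. Issue stops at the first instruction
    in program order whose sources are not ready.
    """
    ready_at = {}      # register -> first cycle its newest value can be used
    issue_cycle = []
    cycle, nxt = 1, 0
    while nxt < len(program):
        issued_now = 0
        while nxt < len(program) and issued_now < width:
            opcode, dest, sources = program[nxt]
            if any(ready_at.get(r, 0) > cycle for r in sources):
                break  # in-order: younger instructions must wait as well
            if dest is not None:
                latency = LOAD_USE_LATENCY if opcode == "LDR" else ALU_LATENCY
                ready_at[dest] = cycle + latency
            issue_cycle.append(cycle)
            issued_now += 1
            nxt += 1
        cycle += 1
    return issue_cycle


no_deps = [
    ("LDR", "R8", ["R0"]), ("ADD", "R9", ["R1", "R2"]),
    ("SUB", "R10", ["R1", "R3"]), ("AND", "R11", ["R3", "R4"]),
    ("ORR", "R12", ["R1", "R5"]), ("STR", None, ["R5", "R0"]),
]
with_deps = [
    ("LDR", "R8", ["R0"]), ("ADD", "R9", ["R8", "R1"]),
    ("SUB", "R8", ["R2", "R3"]), ("AND", "R10", ["R4", "R8"]),
    ("ORR", "R11", ["R5", "R6"]), ("STR", None, ["R7", "R11"]),
]

for name, program in (("no dependencies", no_deps), ("dependencies", with_deps)):
    cycles = issue_in_order(program)
    print(name, cycles, "IPC =", len(program) / max(cycles))
```

Under these assumptions the first program issues in cycles 1, 1, 2, 2, 3, 3 (IPC 2) and the second in cycles 1, 3, 3, 4, 4, 5 (IPC 1.2).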


Out of Order Processor

•  Looks ahead across multiple instructions
•  Issues as many instructions as possible at once
•  Issues instructions out of order (as long as no dependencies)

•  Dependencies:
–  RAW (read after write): one instruction writes, later instruction reads a register
–  WAR (write after read): one instruction reads, later instruction writes a register
–  WAW (write after write): one instruction writes, later instruction writes a register
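The three dependency types can be checked mechanically from the destination and source registers of two instructions. The sketch below is a minimal illustration; the function name and the tuple encoding are assumptions, and the sample instructions are taken from the superscalar examples in this chapter.

```python
def classify_hazards(earlier, later):
    """Return the set of dependency types from `earlier` to `later`.

    Each instruction is (destination register or None, [source registers]).
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    hazards = set()
    if e_dest is not None and e_dest in l_srcs:
        hazards.add("RAW")  # earlier writes, later reads
    if l_dest is not None and l_dest in e_srcs:
        hazards.add("WAR")  # earlier reads, later writes
    if e_dest is not None and e_dest == l_dest:
        hazards.add("WAW")  # both write the same register
    return hazards


ldr = ("R8", ["R0"])         # LDR R8, [R0, #40]
add = ("R9", ["R8", "R1"])   # ADD R9, R8, R1
sub = ("R8", ["R2", "R3"])   # SUB R8, R2, R3

print(classify_hazards(ldr, add))  # LDR then ADD
print(classify_hazards(add, sub))  # ADD then SUB
print(classify_hazards(ldr, sub))  # LDR then SUB
```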


Out of Order Processor

•  Instruction level parallelism (ILP): number of instructions that can be issued simultaneously (average < 3)

•  Scoreboard: table that keeps track of:
–  Instructions waiting to issue
–  Available functional units
–  Dependencies


Out of Order Processor Example

        LDR R8, [R0, #40]
        ADD R9, R8, R1
        SUB R8, R2, R3
        AND R10, R4, R8
        ORR R11, R5, R6
        STR R7, [R11, #80]

Ideal IPC: 2    Actual IPC: 6/4 = 1.5

[Figure: pipeline diagram over cycles 1 to 8. LDR issues with ORR, then STR, then ADD with SUB, then AND. Annotations: two cycle latency between load and use of R8 (RAW, LDR to ADD); WAR on R8 (ADD to SUB); RAW on R8 (SUB to AND); RAW on R11 (ORR to STR).]
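The issue order in this example can be reproduced with a simple out-of-order issue model. This is a sketch under stated assumptions rather than a description of a real scoreboard: a load result is usable two cycles after issue, other results one cycle after issue, at most two instructions issue per cycle, and the oldest ready instructions are picked first. The function name and the tuple encoding are hypothetical.

```python
def issue_out_of_order(program, width=2):
    """Return the issue cycle of each instruction for out-of-order issue.

    program -- list of (opcode, destination or None, [source registers])
    Assumed timing: a load result is usable 2 cycles after issue, any other
    result 1 cycle after issue. The oldest ready instructions issue first.
    """
    n = len(program)
    issued = [None] * n  # issue cycle per instruction, None until issued

    def result_ready(i):
        return issued[i] + (2 if program[i][0] == "LDR" else 1)

    def can_issue(i, cycle):
        _, dest, sources = program[i]
        producers = {}
        for j in range(i):
            _, prev_dest, prev_sources = program[j]
            # RAW: remember the most recent earlier writer of each source.
            if prev_dest in sources:
                producers[prev_dest] = j
            if dest is not None:
                # WAR: every earlier reader of dest must already have issued.
                if dest in prev_sources and issued[j] is None:
                    return False
                # WAW: every earlier writer must have issued in an earlier cycle.
                if prev_dest == dest and (issued[j] is None or issued[j] >= cycle):
                    return False
        return all(issued[j] is not None and result_ready(j) <= cycle
                   for j in producers.values())

    cycle = 1
    while None in issued:
        slots = width
        for i in range(n):
            if slots and issued[i] is None and can_issue(i, cycle):
                issued[i] = cycle
                slots -= 1
        cycle += 1
    return issued


program = [
    ("LDR", "R8", ["R0"]), ("ADD", "R9", ["R8", "R1"]),
    ("SUB", "R8", ["R2", "R3"]), ("AND", "R10", ["R4", "R8"]),
    ("ORR", "R11", ["R5", "R6"]), ("STR", None, ["R7", "R11"]),
]
cycles = issue_out_of_order(program)
print(cycles, "IPC =", len(program) / max(cycles))
```

Under these assumptions the instructions issue in cycles 1, 3, 3, 4, 1, 2 (in program order), which takes four cycles for six instructions.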


Register Renaming

        LDR R8, [R0, #40]
        ADD R9, R8, R1
        SUB R8, R2, R3
        AND R10, R4, R8
        ORR R11, R5, R6
        STR R7, [R11, #80]

Ideal IPC: 2    Actual IPC: 6/3 = 2

With R8 renamed to T0 in SUB and AND:

        LDR R8, [R0, #40]
        ADD R9, R8, R1
        SUB T0, R2, R3
        AND R10, R4, T0
        ORR R11, R5, R6
        STR R7, [R11, #80]

[Figure: pipeline diagram over cycles 1 to 7. LDR issues with SUB, then AND with ORR, then ADD with STR. Annotations: 2-cycle RAW on R8 (LDR to ADD); RAW on T0 (SUB to AND); RAW on R11 (ORR to STR).]
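One way to picture the renaming step is as a pass over the instruction window that gives a fresh name to any destination register that was already read or written earlier. The sketch below is a simplified illustration, not the mechanism of any particular processor: the temporaries T0, T1, ... follow the T0 used on the slide, the function name and tuple encoding are hypothetical, and copying a renamed value back to the architectural register is not modeled.

```python
def rename_registers(program):
    """Rename destinations that reuse a register already referenced earlier.

    program -- list of (opcode, destination or None, [source registers])
    Returns a new program in which WAR and WAW dependencies are removed
    and only RAW dependencies remain.
    """
    current_name = {}  # architectural register -> name holding its latest value
    seen = set()       # architectural registers read or written so far
    renamed = []
    next_temp = 0
    for opcode, dest, sources in program:
        # Sources read whichever name currently holds the register's value.
        new_sources = [current_name.get(r, r) for r in sources]
        seen.update(sources)
        new_dest = dest
        if dest is not None:
            if dest in seen:
                new_dest = f"T{next_temp}"  # fresh temporary (hypothetical name)
                next_temp += 1
            current_name[dest] = new_dest
            seen.add(dest)
        renamed.append((opcode, new_dest, new_sources))
    return renamed


program = [
    ("LDR", "R8", ["R0"]), ("ADD", "R9", ["R8", "R1"]),
    ("SUB", "R8", ["R2", "R3"]), ("AND", "R10", ["R4", "R8"]),
    ("ORR", "R11", ["R5", "R6"]), ("STR", None, ["R7", "R11"]),
]
for instruction in rename_registers(program):
    print(instruction)
```

For the example program, only SUB's destination and AND's second source change (R8 becomes T0), so SUB no longer has to wait for ADD to read R8.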


SIMD

•  Single Instruction Multiple Data (SIMD)
–  Single instruction acts on multiple pieces of data at once
–  Common application: graphics
–  Performs short arithmetic operations (also called packed arithmetic)

•  For example, add eight 8-bit elements

[Figure: packed addition of 64-bit registers D0 and D1 into D2. Each register holds eight 8-bit elements (a0 to a7 in D0, b0 to b7 in D1) at bit positions 7:0, 15:8, 23:16, 31:24, 39:32, 47:40, 55:48, and 63:56; D2 holds a0 + b0 through a7 + b7.]
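The packed addition can be emulated on ordinary integers to show what "eight 8-bit elements" means. This is an illustrative sketch, assuming each lane wraps around modulo 256 on overflow (the slide does not say whether the addition wraps or saturates); the function name is hypothetical.

```python
def packed_add8(d0: int, d1: int) -> int:
    """Add eight 8-bit lanes of two 64-bit values, lane by lane.

    Each lane wraps modulo 256 (an assumption), and no carry crosses
    from one lane into the next.
    """
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (d0 >> shift) & 0xFF   # element a<lane>
        b = (d1 >> shift) & 0xFF   # element b<lane>
        result |= ((a + b) & 0xFF) << shift
    return result


print(hex(packed_add8(0x0102030405060708, 0x1010101010101010)))
print(hex(packed_add8(0x00000000000001FF, 0x0000000000000001)))
```

In the second call the lowest lane overflows from 0xFF to 0x00 without disturbing the neighboring lane, which is the difference between packed arithmetic and a single 64-bit addition.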


Advanced Architecture Techniques

•  Multithreading
–  Word processor: thread for typing, spell checking, printing

•  Multiprocessors
–  Multiple processors (cores) on a single chip


Threading: Definitions

•  Process: program running on a computer
–  Multiple processes can run at once: e.g., surfing Web, playing music, writing a paper

•  Thread: part of a program
–  Each process has multiple threads: e.g., a word processor may have threads for typing, spell checking, printing


Threads in Conventional Processor

•  One thread runs at once
•  When one thread stalls (for example, waiting for memory):
–  Architectural state of that thread stored
–  Architectural state of waiting thread loaded into processor and it runs
–  Called context switching
•  Appears to user like all threads running simultaneously
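Context switching can be pictured as swapping one saved copy of the architectural state for another. The toy model below is only a sketch: it assumes the architectural state is a program counter plus 16 registers, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """Assumed architectural state: a program counter and 16 registers."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 16)


class Core:
    """A core that holds the state of exactly one running thread."""

    def __init__(self):
        self.state = ArchState()
        self.current = None  # id of the running thread
        self.saved = {}      # thread id -> stored ArchState

    def context_switch(self, next_thread):
        # Store the architectural state of the thread that is stalling.
        if self.current is not None:
            self.saved[self.current] = self.state
        # Load the state of the waiting thread (fresh state on first run).
        self.state = self.saved.pop(next_thread, ArchState())
        self.current = next_thread


core = Core()
core.context_switch("typing")
core.state.pc, core.state.regs[0] = 8, 42
core.context_switch("spell checking")   # "typing" stalls; its state is stored
core.state.regs[0] = 7
core.context_switch("typing")           # resume where "typing" left off
print(core.state.pc, core.state.regs[0])
```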


Multithreading

•  Multiple copies of architectural state
•  Multiple threads active at once:
–  When one thread stalls, another runs immediately
–  If one thread can’t keep all execution units busy, another thread can use them
•  Does not increase instruction-level parallelism (ILP) of single thread, but increases throughput

Intel calls this “hyperthreading”


Multiprocessors

•  Multiple processors (cores) with a method of communication between them

•  Types:
–  Homogeneous: multiple cores with shared main memory
–  Heterogeneous: separate cores for different tasks (for example, DSP and CPU in cell phone)
–  Clusters: each core has own memory system


Other Resources

•  Patterson & Hennessy’s: Computer Architecture: A Quantitative Approach

•  Conferences:
–  www.cs.wisc.edu/~arch/www/
–  ISCA (International Symposium on Computer Architecture)
–  HPCA (International Symposium on High Performance Computer Architecture)