CSIS1120A 14. Processor Organization
i.cs.hku.hk/~kpchan/CSIS1120A/Notes/14.processor.pdf


Processor Organization

The processor carries out its work through two kinds of operations:
1 Moving data from one place to another.
2 Transforming data through the ALU.

The data are stored in registers inside the CPU. These operations are controlled by "control signals" generated by the control unit. (These correspond to the flags set in the instruction decoder function of the simulator in Assignment 2/4.) Data movement can be described as register transfers, e.g. for instruction fetch:

MAR ← PC

PC ← PC+4


Processors

Typical internal structure of CPU


Registers

Registers are divided into 2 types:
1 User-visible registers: general-purpose registers and other registers such as index registers, base pointers, etc.
2 Hidden registers: PC, IR, MAR, MBR, TEMP, PSW (processor status word, or flag register), etc.


Control Signals

Data movement is controlled by control signals:

e.g. MAR ← PC

This operation requires 2 control signals:
1 A control signal to tell the PC to put its content onto the CPU bus (in terms of digital logic circuits: to enable the output driver of the PC).
2 A control signal to tell the MAR to take the value from the CPU bus (in terms of digital logic circuits: to clock the latch so that it stores its input into the register).

Other operations may require more control signals, e.g. control signals to tell the ALU which operation to perform, control signals to read/write external memory, etc.
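The two control signals for MAR ← PC can be sketched in Python as below. This is a minimal model assuming a single shared CPU bus held in one variable; the class `SingleBusCPU` and the parameter names `out_enable` and `latch` are hypothetical and are not taken from the Assignment simulator.

```python
# Minimal single-bus datapath model (hypothetical names).
class SingleBusCPU:
    def __init__(self):
        self.regs = {"PC": 0, "MAR": 0, "MBR": 0, "IR": 0}
        self.bus = None  # value currently driven onto the CPU bus

    def step(self, out_enable, latch):
        """One bus transfer driven by two kinds of control signal.

        out_enable: the register whose output driver is enabled, i.e. the
                    single register allowed to drive the bus this cycle.
        latch:      the set of registers whose latches are clocked, i.e.
                    the registers that store the bus value.
        """
        self.bus = self.regs[out_enable]
        for name in latch:
            self.regs[name] = self.bus

cpu = SingleBusCPU()
cpu.regs["PC"] = 0x100
# MAR <- PC: enable PC's output driver and clock the MAR latch.
cpu.step(out_enable="PC", latch={"MAR"})
print(hex(cpu.regs["MAR"]))  # 0x100
```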


Hardwired Control vs Microprogrammed Control

There are two ways to generate the control signals.

Hardwired Control
The control signals are generated by logic gates, e.g. write down the truth table and implement it with logic gates. It is faster, but the design is more complicated (nowadays it is done with CAD tools).

Microprogrammed Control
The control signals are stored in memory as microcode. The entire truth table is stored in memory, and the inputs are used as the memory addresses. Even though the control store is inside the CPU, a memory access is still slower than logic gates. However, the design and implementation are simple, and the control is easy to modify and debug.

Modern processors tend to use hardwired control; by simplifying the instruction set, we can keep the logic simple.


Instruction Execution Cycle

There are 3 stages in an instruction execution cycle:
1 Instruction Fetch
2 Instruction Decode
3 Instruction Execution:
   - read operands from registers
   - ALU operations
   - memory reference
   - branch completion
   - write back to registers

These stages are similar to those in the simulator in the Assignment.


Example

ADD A, B, C

where A, B and C use absolute (direct) addressing, i.e. their addresses are stored in the words following the instruction word. This is a 4-word instruction.

Instruction Fetch

IF: MAR ← PC
    PC ← PC+4
    IR ← mem[MAR]


Example: ADD A, B, C

Instruction Execute

OF: MAR ← PC            # operand fetch
    PC ← PC+4
    MBR ← mem[MAR]      # addr of A in MBR
    MAR ← MBR
    MBR ← mem[MAR]      # value of A in MBR

    ALU.input1 ← MBR

    MAR ← PC
    PC ← PC+4
    MBR ← mem[MAR]      # addr of B in MBR
    MAR ← MBR
    MBR ← mem[MAR]      # value of B in MBR

    ALU.input2 ← MBR


Example: ADD A, B, C

Exec: ALU.output ← ALU.input1 + ALU.input2

WB:   MAR ← PC          # write back result
      PC ← PC+4
      MBR ← mem[MAR]    # addr of C in MBR
      MAR ← MBR
      MBR ← ALU.output  # value of C in MBR
      mem[MAR] ← MBR    # memory write
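The register transfers for ADD A, B, C can be traced with a short Python sketch. Assumptions (all hypothetical): memory is a dict from byte address to word, words are 4 bytes so the PC advances by 4, and the opcode word is just the string "ADD".

```python
# Register-transfer walk-through of ADD A, B, C (direct addressing).
def run_add(mem, pc):
    r = {"PC": pc, "MAR": 0, "MBR": 0, "IR": None}
    alu_in = [0, 0]

    # IF: fetch the instruction word
    r["MAR"] = r["PC"]
    r["PC"] += 4
    r["IR"] = mem[r["MAR"]]

    # OF: for A, then B, fetch the address word, then the operand value
    for i in range(2):
        r["MAR"] = r["PC"]
        r["PC"] += 4
        r["MBR"] = mem[r["MAR"]]   # address of operand in MBR
        r["MAR"] = r["MBR"]
        r["MBR"] = mem[r["MAR"]]   # value of operand in MBR
        alu_in[i] = r["MBR"]       # ALU.input1, then ALU.input2

    # Exec
    alu_out = alu_in[0] + alu_in[1]

    # WB: fetch the address of C, then write the result to memory
    r["MAR"] = r["PC"]
    r["PC"] += 4
    r["MBR"] = mem[r["MAR"]]       # address of C in MBR
    r["MAR"] = r["MBR"]
    r["MBR"] = alu_out             # value of C in MBR
    mem[r["MAR"]] = r["MBR"]       # memory write
    return r

# Instruction at 0x100; A, B, C stored at 0x200, 0x204, 0x208.
mem = {0x100: "ADD", 0x104: 0x200, 0x108: 0x204, 0x10C: 0x208,
       0x200: 5, 0x204: 7, 0x208: 0}
regs = run_add(mem, 0x100)
print(mem[0x208], hex(regs["PC"]))  # 12 0x110
```

Tracing it also shows that this single instruction touches memory seven times: the instruction word, three address words, two operand reads and one result write.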


Instruction Pipeline

Pipelining is used to increase the throughput of instruction execution. We assume that only the LD/ST instructions access memory; all other instructions use registers. The execution process is divided into different stages, for example:

1 Instruction Fetch (IF)
2 Instruction Decode (ID) [operand fetch from registers]
3 Instruction Execution (EX)
4 Memory Operation (MEM)
5 Write Back to the register file (WB)
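A sketch of how instructions overlap in this 5-stage pipeline, assuming an ideal pipeline with no hazards, so that each instruction enters IF one cycle after the previous one and advances one stage per cycle. The function name `pipeline_table` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_table(n):
    """Rows of stage names per clock cycle for n instructions (ideal case)."""
    cycles = len(STAGES) + n - 1
    rows = []
    for i in range(n):
        row = []
        for c in range(cycles):
            s = c - i  # stage index of instruction i in cycle c
            row.append(STAGES[s] if 0 <= s < len(STAGES) else "")
        rows.append(row)
    return rows

for i, row in enumerate(pipeline_table(3)):
    print(f"I{i + 1}: " + " ".join(f"{cell:<3}" for cell in row))
```

Each printed row is shifted one column to the right of the previous one, so in any given cycle up to five instructions are in different stages.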


Instruction Pipeline


Pipeline

Note that not all instructions require all stages, e.g. only LOAD and STORE require the memory operation. The resources (hardware) required by the different stages should not overlap; otherwise duplicate resources are needed, e.g.:
- a dedicated incrementer for the PC instead of using the ALU
- separate read ports for data and instructions (IF vs MEM)
- multiple internal buses instead of a single bus

For an ideal pipeline, although each instruction still needs to go through all 5 stages (i.e. 5 clock cycles), the throughput is 1 instruction per clock cycle.
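The throughput claim can be made concrete with a cycle count. Assuming k stages of one clock cycle each, no stalls, and (for comparison) an unpipelined CPU that needs k cycles per instruction, n instructions take:

```latex
T_{\text{pipelined}} = k + (n - 1) \ \text{cycles}, \qquad
T_{\text{unpipelined}} = k\,n \ \text{cycles}
```

For example, with k = 5 and n = 100 this gives 104 cycles instead of 500. Since (k + n - 1)/n approaches 1 as n grows, the throughput approaches 1 instruction per clock cycle.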


Pipeline Hazards

Pipeline hazards are situations that prevent the next instruction from entering the pipeline (for execution). There are three classes of pipeline hazards:

1 Structural hazard: arises from a resource conflict. E.g. if only one multiplier unit is available and both the current and the next instruction require a multiplication, the next instruction must wait for the previous multiplication to finish. Similarly, when the PC increment and an ALU operation both require the ALU, there is a resource conflict.

2 Data hazard: an instruction waits for the result of a previous instruction (data dependency).

3 Control hazard: arises from branches or other instructions that change the PC (e.g. CALL/JSR, jump to subroutine). We do not know where to continue until the branch instruction finishes.


Structural Hazard

For example, if only one memory read port is available, instruction fetch and memory read/write cannot be overlapped. (Note that not all instructions require memory read/write.) For LOAD/STORE, the IF of the instruction that overlaps with the MEM stage must stop and wait. Likewise, the PC increment cannot be overlapped with ALU operations if both require the ALU.
Solution: add more resources, e.g. a dedicated incrementer for the PC and separate data/instruction caches, so that the IF and MEM stages use different hardware.


Data Hazard

Example:
ADD R2, R3, R1
SUB R1, R5, R5

The source operand R1 of the SUB instruction depends on the result of the previous instruction (the destination is the last operand, so ADD R2, R3, R1 writes R1). See the figure on the next slide, where operand fetch is done at the same time as instruction decoding (operand pre-fetching).
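Detecting this dependence is mechanical. A minimal Python sketch for three-operand instructions, assuming the operand order used in the slides (OP src1, src2, dest) and a hypothetical `window` parameter for how many preceding instructions can still cause a hazard (the right value depends on the pipeline design):

```python
# Read-after-write (RAW) dependence check.
def parse(instr):
    """'ADD R2, R3, R1' -> ('ADD', ('R2', 'R3'), 'R1')"""
    op, rest = instr.split(maxsplit=1)
    src1, src2, dest = (x.strip() for x in rest.split(","))
    return op, (src1, src2), dest

def raw_hazards(program, window=2):
    """Return (producer, consumer, register) for each source register
    written by one of the `window` instructions just before the consumer."""
    parsed = [parse(p) for p in program]
    hazards = []
    for j, (_, srcs, _) in enumerate(parsed):
        for reg in dict.fromkeys(srcs):  # each source register once
            # Search backwards for the most recent writer of reg.
            for i in range(j - 1, max(-1, j - window - 1), -1):
                if parsed[i][2] == reg:
                    hazards.append((program[i], program[j], reg))
                    break
    return hazards

print(raw_hazards(["ADD R2, R3, R1", "SUB R1, R5, R5"]))
# [('ADD R2, R3, R1', 'SUB R1, R5, R5', 'R1')]
```

Applied to the original and rescheduled sequences on the "Solution to Data Hazard" slide, it reports the R1 dependence only for the original order, because rescheduling moves ADD R2, R3, R1 three instructions ahead of SUB.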


Solution to Data Hazard

By hardware techniques:
The result is already at the output of the ALU by the end of the EX cycle of the ADD instruction, so it can be forwarded directly to the ALU input for the EX cycle of the next instruction. This is called data forwarding.
There is still a problem with the LD instruction, where the data is available only after the MEM stage.

By software techniques:
Change the execution sequence by inserting unrelated instructions in between, e.g.

Original sequence      Rescheduled
ADD R11, R12, R13      ADD R2, R3, R1
ADD R14, R15, R16      ADD R11, R12, R13
ADD R2, R3, R1         ADD R14, R15, R16
SUB R1, R5, R5         SUB R1, R5, R5
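The effect of forwarding can be estimated by counting stall cycles. This sketch uses textbook assumptions that the slides do not spell out: the stages IF, ID, EX, MEM, WB take one cycle each, registers are read in ID, and, when `split_cycle_rf` is true, the register file is written in the first half of a cycle and read in the second half. The function `stalls` and its parameters are hypothetical; under different assumptions the counts change.

```python
# Stall cycles between a producer and a dependent consumer in the
# 5-stage pipeline IF ID EX MEM WB (one cycle per stage, no other hazards).
IF, ID, EX, MEM, WB = range(5)  # stage index = cycle offset from the instruction's IF

def stalls(producer_is_load, distance, forwarding, split_cycle_rf=True):
    """distance = 1 means the consumer immediately follows the producer,
    so each of the consumer's stages happens `distance` cycles later."""
    if forwarding:
        # The value exists at the end of EX (ALU result) or MEM (loaded
        # data) and is forwarded to the consumer's EX in the next cycle.
        ready = (MEM if producer_is_load else EX) + 1
        need = distance + EX
    else:
        # The consumer reads the register file in ID. With a split-cycle
        # register file, WB (first half) and ID (second half) may coincide.
        ready = WB if split_cycle_rf else WB + 1
        need = distance + ID
    return max(0, ready - need)

for load in (False, True):
    for fwd in (False, True):
        print(f"producer={'LD ' if load else 'ALU'} forwarding={fwd}: "
              f"{stalls(load, 1, fwd)} stall(s)")
```

Under these assumptions, an instruction that immediately uses an ALU result needs 2 stalls without forwarding and none with it, while one that immediately uses an LD result still needs 1 stall even with forwarding. At distance 3, as in the rescheduled sequence, no stall is needed even without forwarding.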


Data Hazard from LD instruction


Control Hazard

The IF stage of the next instruction cannot start until the branch is resolved. Without further measures, the CPU must wait (stall). One way to avoid waiting is branch prediction.

On a branch instruction, we continue execution along one of the two paths. This speculative execution can always be discarded as long as nothing has been changed (i.e. nothing has been written back to registers/memory). With a blind guess, we choose the correct path about 50% of the time. In some situations, e.g. branches corresponding to for-loops, the prediction is correct much more often, because in a for-loop the branch goes back most of the time.
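Why loop branches are easy to predict can be checked with a little arithmetic. A sketch assuming the loop's backward branch is taken on every iteration except the last; the function names are hypothetical.

```python
def loop_branch_outcomes(n):
    """Backward branch of an n-iteration loop: taken n - 1 times, then not taken."""
    return [True] * (n - 1) + [False]

def static_accuracy(outcomes, predict_taken=True):
    """Fraction of branches a fixed prediction gets right."""
    hits = sum(1 for taken in outcomes if taken == predict_taken)
    return hits / len(outcomes)

print(static_accuracy(loop_branch_outcomes(10)))  # 0.9
```

For 10 iterations, always predicting "taken" is right 9 times out of 10, and the accuracy grows with the iteration count.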


Control Hazard with Simple Branch Prediction


Dynamic Branch Prediction

The current prediction depends on the outcomes of previous predictions. Rationale: in a for-loop, we usually predict "branch taken". However, in the last iteration the branch is not taken, and the next time the loop runs, the branch is taken again. Therefore two consecutive wrong predictions are required to change the decision.
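A sketch of the scheme described above, implemented here as a 2-bit saturating counter (one common way to require two consecutive wrong predictions before the decision flips), compared with a 1-bit predictor that simply repeats the last outcome. The class names and the inner-loop pattern (4 iterations, entered 3 times) are illustrative assumptions.

```python
class OneBit:
    """Predicts whatever the branch did last time."""
    def __init__(self, taken=True):
        self.taken = taken
    def predict(self):
        return self.taken
    def update(self, outcome):
        self.taken = outcome  # flips after a single miss

class TwoBit:
    """Saturating counter 0..3; states 2 and 3 predict taken. From a
    'strong' state, two consecutive misses are needed to flip."""
    def __init__(self, state=3):
        self.state = state
    def predict(self):
        return self.state >= 2
    def update(self, outcome):
        if outcome:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def count_misses(predictor, outcomes):
    misses = 0
    for outcome in outcomes:
        if predictor.predict() != outcome:
            misses += 1
        predictor.update(outcome)
    return misses

# Inner-loop branch: taken 3 times then not taken, loop entered 3 times.
outcomes = [True, True, True, False] * 3
print(count_misses(OneBit(), outcomes), count_misses(TwoBit(), outcomes))  # 5 3
```

On this pattern the 1-bit predictor misses 5 of 12 branches (after the first run it misses both at the loop exit and on re-entry), while the 2-bit counter misses only the 3 exits.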


Delayed Branch

To avoid the control hazard, the instruction following the branch instruction is always executed: the branch takes effect only after the following instruction (the one in the delay slot) has been executed. This is employed in the MIPS CPU. The compiler can always put a NOP instruction after the branch, which is always correct. Then, by instruction rescheduling, it tries to find a useful instruction to put in the delay slot.
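A toy interpreter makes the delay-slot rule concrete. The mini-ISA (`addi`, `bnez`, `nop` as Python tuples) is hypothetical and only loosely MIPS-like; it assumes one delay slot and no branch inside a delay slot. The loop body decrements r1 and adds 10 to r2; the second version moves the add into the delay slot in place of the NOP.

```python
def run(program, regs):
    """Execute program (list of tuples); return instructions executed."""
    pc, executed, pending_target = 0, 0, None
    while pc < len(program):
        op, *args = program[pc]
        executed += 1
        # A branch taken by the previous instruction takes effect only
        # after this (delay-slot) instruction has executed.
        jump_to, pending_target = pending_target, None
        if op == "addi":            # addi rd, rs, imm: rd <- rs + imm
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "bnez":          # bnez rs, target: branch if rs != 0
            rs, target = args
            if regs[rs] != 0:
                pending_target = target
        elif op != "nop":
            raise ValueError(f"unknown instruction {op}")
        pc = jump_to if jump_to is not None else pc + 1
    return executed

# Loop: r1 <- r1 - 1, r2 <- r2 + 10, repeat while r1 != 0.
with_nop = [("addi", "r1", "r1", -1), ("addi", "r2", "r2", 10),
            ("bnez", "r1", 0), ("nop",)]          # delay slot wasted
scheduled = [("addi", "r1", "r1", -1), ("bnez", "r1", 0),
             ("addi", "r2", "r2", 10)]            # useful delay slot

for prog in (with_nop, scheduled):
    regs = {"r1": 3, "r2": 0}
    print(run(prog, regs), regs)
```

Both versions leave r2 = 30, but the scheduled one executes 9 instructions instead of 12 because no NOPs are executed.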


Instruction Scheduling in Delayed Branch


Processor Performance

Execution Time = Instruction Count × CPI × Clock Cycle Time
CPI = Clock cycles Per Instruction.
The clock rate is not the only factor when comparing two processors. You also have to compare how many instructions are needed to achieve a function, and how many clock cycles an instruction requires.
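A quick numerical check of the formula, with made-up figures for two hypothetical processors running the same program: A has twice the clock rate, but B needs fewer cycles per instruction at the cost of 20% more instructions.

```python
def exec_time(instr_count, cpi, clock_hz):
    """Execution time in seconds; clock cycle time = 1 / clock rate."""
    return instr_count * cpi / clock_hz

t_a = exec_time(instr_count=1_000_000, cpi=4.0, clock_hz=400e6)
t_b = exec_time(instr_count=1_200_000, cpi=1.5, clock_hz=200e6)
print(f"A: {t_a * 1e3:.1f} ms, B: {t_b * 1e3:.1f} ms")
```

Here B finishes first (9.0 ms vs 10.0 ms) despite having half the clock rate.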


Processor Performance

- Exploit more effective instruction pipelining.
- Use multiple instruction execution units.
- Large register file: reduces memory accesses, hence increases speed.
- Simplified instruction set: reduces or eliminates the need for microprograms, which are slower than a hardwired implementation.


Characteristics of Modern Processors

- All instructions are register-register type except LOAD and STORE (which access memory).
- Fixed-length, simple, fixed-format instructions that do not cross main memory word boundaries. For example, we can start fetching register operands early if they are always at the same place within the instruction, and discard the read if there are no operands.
- Relatively few operations and addressing modes: a simpler CPU implementation and a faster clock rate. The pipeline can also be designed more efficiently.
- Use of hardwired rather than microprogrammed control.
- Use of instruction pipelining and extensive software and hardware techniques to eliminate pipeline stalls (pipeline hazards).

These are concepts of Reduced Instruction Set Computers (RISC).


Performance Improvement by using Registers

Memory-memory       Load/store
ADD A, B, C         LD A, R1
                    LD B, R2
                    ADD R1, R2, R3
                    ST R3, C

Number of data-memory accesses = 3 (in both cases; instruction fetches are not counted)

Memory-memory       Load/store
ADD A, B, C         LD A, R1
ADD D, C, F         LD B, R2
ADD B, F, C         ADD R1, R2, R3
                    LD D, R4
                    ADD R4, R3, R5
                    ADD R2, R5, R3
                    ST R3, C
                    ST R5, F

Number of data-memory accesses is 9 vs 5.
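The counts above can be reproduced by treating every operand that is not a register as one data-memory access. A sketch assuming register names have the form R<number> and that instruction fetches are not counted, as on the slide; the function name `data_accesses` is hypothetical.

```python
import re

# A register operand looks like R<number>; anything else is a memory operand.
REGISTER = re.compile(r"R\d+$")

def data_accesses(program):
    """One data-memory access per non-register operand."""
    total = 0
    for instr in program:
        _, operands = instr.split(maxsplit=1)
        total += sum(1 for x in operands.split(",")
                     if not REGISTER.match(x.strip()))
    return total

mem_mem = ["ADD A, B, C", "ADD D, C, F", "ADD B, F, C"]
load_store = ["LD A, R1", "LD B, R2", "ADD R1, R2, R3", "LD D, R4",
              "ADD R4, R3, R5", "ADD R2, R5, R3", "ST R3, C", "ST R5, F"]
print(data_accesses(mem_mem), data_accesses(load_store))  # 9 5
```

The saving comes from keeping C (R3) and F (R5) in registers between uses instead of writing them to memory and reading them back.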


Performance Issue

Simple instruction set and hardwired logic ⇒ high clock rate (low clock cycle time)
Extensive pipelining ⇒ low CPI
Simple instructions ⇒ high instruction count
However, empirical studies show that the increase in instruction count is usually small, e.g. about 20%, even when using simple instructions. Hence there is an overall improvement in performance.


Reduced Instruction Set Computer (RISC): Background

Advances in computer technology:
- fully utilize the large number of registers
- rearrange instructions to reduce pipeline stalls

Standardized operating systems and environments:
- Unix/Linux operating system with source code
- X Window System as the standard GUI
- we can afford to drop compatibility with older instruction sets, because the OS, GUI and application software can easily be ported


Relative Code Size

Three studies were performed to compare the number of instructions executed by different sets of C programs on different processors: the RISC I processor (reduced instruction set computer) against complex instruction set processors (VAX-11, Motorola M68000 and Zilog Z8002).

Studies: 11 C programs, 12 C programs, 5 C programs (RISC I = 1.0 in each)

Processor      Reported values
RISC I         1.0    1.0    1.0
VAX-11/780     0.8    0.67
M68000         0.9    0.9
Z8002          1.2    1.12

The instruction count increases by only 20%–30%, while the CPI is reduced by a factor of 2–4 (e.g. the VAX-11/780 has a CPI of 8). The clock cycle time is also reduced.
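Putting these figures into the execution time formula, the ratio of RISC to CISC execution time is the product of three ratios. Taking, purely for illustration, the least favourable combination from the figures above (30% more instructions, CPI reduced only 2 times, equal clock cycle time):

```latex
\frac{T_{\text{RISC}}}{T_{\text{CISC}}}
  = \frac{IC_{\text{RISC}}}{IC_{\text{CISC}}}
    \times \frac{CPI_{\text{RISC}}}{CPI_{\text{CISC}}}
    \times \frac{t_{\text{RISC}}}{t_{\text{CISC}}}
  = 1.3 \times \frac{1}{2} \times 1 = 0.65
```

With 20% more instructions and a 4 times lower CPI the ratio is 1.2 × 1/4 = 0.3, and a shorter clock cycle reduces it further.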


Characteristics of RISC

- All instructions are register-register type except LOAD and STORE (which access memory).
- Fixed-length, simple, fixed-format instructions that do not cross main memory word boundaries.
- Relatively few operations and addressing modes.
- Use of hardwired rather than microprogrammed control.
- Use of instruction pipelining and extensive software and hardware techniques to eliminate pipeline disruption.
- Reliance on an optimizing compiler to enhance system performance.

RISC is a design philosophy aimed at enhancing CPU performance. Nowadays, we usually see some kind of hybrid design.
