The Processor: Datapath and Control

The Processor: Datapath and Control. Outline Goals in processor implementation Brief review of sequential logic design Pieces of the processor implementation




Outline

Goals in processor implementation

Brief review of sequential logic design

Pieces of the processor implementation puzzle

A simple implementation of a MIPS integer instruction subset: datapath and control logic design

A multi-cycle MIPS implementation: datapath and control logic design

Microcoded control

Exceptions

Some real microprocessor datapath and control


Goals in processor implementation

Balance the rate of supply of instructions and data and the rate at which the execution core can consume them and can update memory

[Figure: instruction supply and data supply feeding the execution core]


Goals in processor implementation

Recall from Chapter 2: CPU Time = INST x CPI x CT

INST largely a function of the ISA and compiler

Objective: minimize CPI x CT within design constraints (cost, power, etc.)

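Trading off CPI and CT is tricky: for example, splitting work into more, shorter cycles reduces CT but raises CPI

[Figure: three alternative designs, each combining a multiplier with additional logic]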


Brief review of sequential logic design

State elements are clocked devices: flip-flops, etc.

Combinational elements hold no state: ALU, multiplier, multiplexers, etc. (caches and other memories do hold state, but a memory read is often modeled as combinational)

In edge-triggered clocking, state elements are updated only on the (rising) edge of the clock pulse


Brief review of sequential logic design

The same state element can be read at the beginning of a clock cycle and updated at the end

Example: incrementing the PC

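[Figure: the PC register feeds an adder that adds 4, and the adder output returns to the PC. Timing diagram: during one cycle the PC register and the Add input hold 8 while the Add output is 12; at the next rising clock edge the PC register becomes 12]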
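A minimal Python sketch of the PC-increment example, assuming a 32-bit PC starting at 8 as in the timing diagram. The `Register` class and `add` function are hypothetical names for illustration. The PC is read during the cycle and its new value is latched only at the clock edge, so one register is read and written in the same cycle without a race.

```python
class Register:
    """Clocked state element: its output q changes only at a rising clock edge."""

    def __init__(self, value=0):
        self.q = value

    def rising_edge(self, d):
        self.q = d  # latch the input present at the edge


def add(a, b):
    """Combinational adder with 32-bit wraparound; holds no state."""
    return (a + b) & 0xFFFFFFFF


pc = Register(8)
trace = []
for _ in range(3):
    add_out = add(pc.q, 4)         # read the PC during the cycle
    trace.append((pc.q, add_out))  # (Add input, Add output) for this cycle
    pc.rising_edge(add_out)        # write the PC at the end of the cycle

print(trace)  # [(8, 12), (12, 16), (16, 20)]
```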


Our processor design progression

(1) Instruction fetch, execute, and operand reads from data memory all take place in a single clock cycle

(2) Instruction fetch, execute, and operand reads from data memory take place in successive clock cycles

(3) A pipelined design


Pieces of the processor puzzle

Instruction fetch

Execution

Data memory

[Figure: instruction supply and data supply feeding the execution core]


Instruction fetch datapath

Memory to hold instructions

Register to hold the instruction memory address

Logic to generate the next instruction address

[Figure: the PC addresses the instruction memory, and an adder computes PC + 4]


Execution datapath

Focus on only a subset of the MIPS instructions: add, sub, and, or; lw, sw; slt; beq, j

For all instructions except j, we read operands from the register file and perform an ALU operation

For all instructions except sw, beq, and j, we write a result into the register file


Execution datapath

Register file block diagram

Read register 1, 2: source operand register numbers
Read data 1, 2: source operands (32 bits each)
Write register: destination operand register number
Write data: data written into the register file
RegWrite: when asserted, enables the writing of Write Data
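The register file interface above can be modeled as a small Python class: two combinational read ports and one write port that updates only at the clock edge when RegWrite is asserted. The class and method names are hypothetical. One MIPS detail not on the slide is included: register $0 always reads as zero, so writes to it are ignored.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        """Combinational reads: return (Read data 1, Read data 2)."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock(self, reg_write, write_reg, write_data):
        """At the clock edge, write Write data only if RegWrite is asserted."""
        if reg_write and write_reg != 0:  # $0 is hardwired to zero
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock(reg_write=True, write_reg=18, write_data=5)
rf.clock(reg_write=False, write_reg=30, write_data=99)  # RegWrite low: no write
rf.clock(reg_write=True, write_reg=0, write_data=7)     # $0 stays 0
print(rf.read(18, 30))  # (5, 0)
```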


Execution datapath

Datapath for R-type (add, sub, and, or, slt)

R-type instruction format:

op (bits 31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
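To make the bit positions concrete, this sketch encodes and then decodes the R-type word for `add $4, $18, $30` (rd = 4, rs = 18, rt = 30, funct = 100000). The helper names `field`, `encode_rtype`, and `decode_rtype` are hypothetical.

```python
def field(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def encode_rtype(rs, rt, rd, shamt, funct):
    """Pack R-type fields; the op field is 0 for R-type."""
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_rtype(word):
    return {
        "op": field(word, 31, 26),
        "rs": field(word, 25, 21),
        "rt": field(word, 20, 16),
        "rd": field(word, 15, 11),
        "shamt": field(word, 10, 6),
        "funct": field(word, 5, 0),
    }


word = encode_rtype(rs=18, rt=30, rd=4, shamt=0, funct=0b100000)  # add $4, $18, $30
print(decode_rtype(word))
# {'op': 0, 'rs': 18, 'rt': 30, 'rd': 4, 'shamt': 0, 'funct': 32}
```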


Execution datapath

Datapath for beq instruction

I-type instruction format:

op (bits 31-26) | rs (25-21) | rt (20-16) | immediate (15-0)

The ALU's Zero output indicates whether rs = rt (branch taken / not taken)

The branch target address is the sign-extended immediate, shifted left two positions and added to PC+4
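A sketch of the beq next-PC logic described above, assuming 32-bit addresses; `sign_extend16` and `beq_next_pc` are hypothetical helper names. The immediate counts words, so shifting it left by two converts it to a byte offset relative to PC+4, and a negative immediate branches backward.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def beq_next_pc(pc, rs_val, rt_val, imm16):
    """Next PC for beq: branch target if rs == rt, else PC + 4."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0  # ALU subtract -> Zero output
    return target if zero else pc_plus_4


print(hex(beq_next_pc(0x00400000, 5, 5, 3)))       # taken: 0x400010
print(hex(beq_next_pc(0x00400000, 5, 6, 3)))       # not taken: 0x400004
print(hex(beq_next_pc(0x00400000, 7, 7, 0xFFFF)))  # taken, offset -1: 0x400000
```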


Data memory: used for lw, sw (I-type format)

Block diagram

Address: memory location to be read or written
Read data: data out of the memory on a load
Write data: data into the memory on a store
MemRead: indicates a read operation is to be performed
MemWrite: indicates a write operation is to be performed


Execution datapath + data memory

Datapath for lw, sw

Address is the sign-extended immediate added to the source operand read out of the register file

sw: data written to memory from the specified register
lw: data written to the register file from the specified memory address
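The address calculation shared by lw and sw can be sketched in Python. The register values, the base addresses, and the dict standing in for data memory are all hypothetical; they are chosen so that `lw $8, 112($2)` reads back the word that `sw $10, 0($3)` stored.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, imm16):
    """Address = base register + sign-extended 16-bit offset (32-bit wrap)."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


regs = [0] * 32
mem = {}                # hypothetical data memory: address -> word

regs[2] = 0x10000000    # hypothetical base address in $2
regs[3] = 0x10000070    # hypothetical base address in $3
regs[10] = 42

# sw $10, 0($3): Memory[$3 + 0] = $10
mem[effective_address(regs[3], 0)] = regs[10]

# lw $8, 112($2): $8 = Memory[$2 + 112]; 0x10000000 + 112 = 0x10000070
regs[8] = mem.get(effective_address(regs[2], 112), 0)

print(regs[8])  # 42
```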


Putting the pieces together: a single clock cycle for fetch, execute, and operand read from data memory

3 MUXes:
Register file operand or sign-extended immediate to the ALU
ALU or data memory output written to the register file
PC+4 or branch target address written to the PC register


Datapath for R-type instructions

Example: add $4, $18, $30


Datapath for I-type ALU instructions

Example: slti $7, $4, 100


Datapath for not taken beq instruction

Example: beq $28, $13, EXIT


Datapath for taken beq instruction

Example: beq $28, $13, EXIT


Datapath for load instruction

Example: lw $8, 112($2)


Datapath for store instruction

Example: sw $10, 0($3)


Control signals we need to generate


ALU operation control

ALU control input codes from Chapter 4

Two steps to generate the ALU control input:
(1) Use the opcode to distinguish R-type, lw and sw, and beq
(2) If R-type, use the funct field to determine the ALU control input

ALU control input   ALU operation      Used for
000                 and                and
001                 or                 or
010                 add                add, lw, sw
110                 subtract           sub, beq
111                 set on less than   slt
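A minimal Python model of an ALU that responds to the five control codes in the table and also produces the Zero output used by beq. The names `alu` and `to_signed` are hypothetical; slt compares the operands as 32-bit two's-complement values, and any code not in the table is rejected.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the 3-bit ALU control input."""
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = a + b
    elif control == 0b110:
        result = a - b
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError(f"unused ALU control input {control:03b}")
    result &= MASK
    return result, result == 0


print(alu(0b110, 5, 5))           # (0, True): beq would be taken
print(alu(0b111, 0xFFFFFFFF, 1))  # (1, False): -1 < 1
```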


ALU operation control

Opcode used to generate a 2-bit signal called ALUOp with the following encodings:
00: lw or sw, perform an ALU add
01: beq, perform an ALU subtract
10: R-type, ALU operation is determined by the funct field

Funct    Instruction   ALU control input
100000   add           010
100010   sub           110
100100   and           000
100101   or            001
101010   slt           111
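The two-level decode above (ALUOp from the main control, then funct for R-type) can be sketched as one Python function. The function and table names are hypothetical; the encodings come straight from the two tables on these slides.

```python
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and the 6-bit funct field to the ALU control input."""
    if alu_op == 0b00:  # lw, sw: add (funct is ignored)
        return 0b010
    if alu_op == 0b01:  # beq: subtract (funct is ignored)
        return 0b110
    if alu_op == 0b10:  # R-type: decode funct
        if funct not in FUNCT_TO_ALU_CONTROL:
            raise ValueError(f"undefined funct {funct:06b}")
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"unused ALUOp {alu_op:02b}")


print(format(alu_control(0b10, 0b101010), "03b"))  # 111
```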


Comparing instruction fields

Opcode, source registers, function code, and immediate fields are always in the same place

Destination register is:
bits 15-11 (rd) for R-type
bits 20-16 (rt) for lw
A MUX selects the right one

Format     opcode (31-26)   rs (25-21)   rt (20-16)   bits 15-0
R-type     0                rs           rt           rd (15-11) | shamt (10-6) | funct (5-0)
beq        4                rs           rt           immediate (offset)
lw (sw)    35 (43)          rs           rt           immediate (offset)


Datapath with instr fields and ALU control


Main control unit design


Main control unit design

Truth table

[Figure: main control truth table with columns for opcodes 4 (beq), 0 (R-type), 35 (lw), and 43 (sw)]
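The truth table figure did not survive, so here is a hedged sketch using the control values of the standard textbook single-cycle MIPS design (Patterson & Hennessy). Only RegWrite, MemRead, MemWrite, and ALUOp are named in this section. RegDst, ALUSrc, MemtoReg, and Branch are assumed names taken from that textbook design. `None` marks a don't-care, and jump control is not included.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")
X = None  # don't care

# opcode -> control values, in SIGNALS order
CONTROL_TABLE = {
    0:  (1, 0, 0, 1, 0, 0, 0, 0b10),  # R-type
    35: (0, 1, 1, 1, 1, 0, 0, 0b00),  # lw
    43: (X, 1, X, 0, 0, 1, 0, 0b00),  # sw
    4:  (X, 0, X, 0, 0, 0, 1, 0b01),  # beq
}


def main_control(opcode):
    """Return the control signals for an opcode as a dict."""
    if opcode not in CONTROL_TABLE:
        raise ValueError(f"undefined opcode {opcode}")
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


print(main_control(35))
```

Note that a store never writes the register file (RegWrite = 0), which is why its RegDst and MemtoReg values do not matter.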


Adding support for jump instructions

J-type format

Next PC formed by shifting left the 26-bit target two bits and combining it with the 4 high-order bits of PC+4

Now the next PC will be one of: PC+4, the beq target address, or the j target address

We need another MUX and control bit

op = 2 (bits 31-26) | target (25-0)
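A sketch of the jump-target rule, with a hypothetical `jump_target` helper. The second example shows why the upper four bits come from PC+4 rather than from the PC: a jump sitting in the last word of a 256 MB region lands in the next region.

```python
def jump_target(pc, instr):
    """Upper 4 bits of PC+4 concatenated with the 26-bit target shifted left 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target26 = instr & 0x03FFFFFF  # drop the 6-bit opcode
    return (pc_plus_4 & 0xF0000000) | (target26 << 2)


j_word = (2 << 26) | 0x100000  # j with target field 0x100000
print(hex(jump_target(0x00400000, j_word)))               # 0x400000
print(hex(jump_target(0x1FFFFFFC, (2 << 26) | 0x4)))      # 0x20000010
```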


Adding support for jump instructions


Evaluation of the simple implementation

All instructions take one clock cycle (CPI = 1)

Assume the following worst-case delays:
Instruction memory: 4 time units
Data memory: 4 time units (read), 2 time units (write)
ALU: 4 time units
Adders: 3 time units
Register file: 2 time units (read), 1 time unit (write)
MUXes, sign extension, gates, and shifters: 1 time unit

Large disparity in worst-case delays among instruction types:
R-type: 4+2+1+4+1+1 = 13 time units
beq: 4+2+1+4+1+1+1 = 14 time units
j: 4+1+1 = 6 time units
store: 4+2+4+2 = 12 time units
load: 4+2+4+4+1+1 = 16 time units
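The sums above can be recomputed from the component delays. The slide gives only the numbers, so the unit labels in `PATHS` are our reading of which component each term corresponds to (an assumption). The adders (3 units) do not appear in these sums. Because a single-cycle clock must fit the slowest path, the cycle time is the maximum, set by load.

```python
# Worst-case delays from the slide, in time units
DELAY = {
    "imem": 4, "dmem_read": 4, "dmem_write": 2, "alu": 4, "adder": 3,
    "reg_read": 2, "reg_write": 1, "mux": 1, "gate": 1, "shift": 1,
}

# Assumed mapping of each term in the slide's sums to a unit
PATHS = {
    "R-type": ["imem", "reg_read", "mux", "alu", "mux", "reg_write"],
    "beq":    ["imem", "reg_read", "mux", "alu", "gate", "mux", "mux"],
    "j":      ["imem", "shift", "mux"],
    "store":  ["imem", "reg_read", "alu", "dmem_write"],
    "load":   ["imem", "reg_read", "alu", "dmem_read", "mux", "reg_write"],
}

path_delay = {name: sum(DELAY[u] for u in units) for name, units in PATHS.items()}
cycle_time = max(path_delay.values())  # one clock must fit the slowest path

print(path_delay)  # {'R-type': 13, 'beq': 14, 'j': 6, 'store': 12, 'load': 16}
print(cycle_time)  # 16
```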


Evaluation of the simple implementation

Disparity would be worse in a real machine:
Even slower integer instructions (e.g., multiply/divide in MIPS)
Floating-point instructions

Simple instructions take as long as complex ones


A multicycle implementation

Instruction fetch, register file access, etc., occur in separate clock cycles

Different instruction types take different numbers of cycles to complete

The clock cycle time should be shorter


High level view of datapath

New registers store the results of each step (not programmer visible!)

Hardware can be shared:
One ALU for PC+4, branch target calculation, effective address (EA) calculation, and arithmetic operations
One memory for instructions and data


Detailed multi-cycle datapath


Multi-cycle control


First two cycles for all instructions

Instruction fetch (1st cycle)
Load the instruction into the IR register: IR = Memory[PC]
Increment the PC: PC = PC+4

Instruction decode and register fetch (2nd cycle)
Read register file locations rs and rt, results into the A and B registers: A=Reg[IR[25-21]], B=Reg[IR[20-16]]
Calculate the branch target address and load it into ALUOut: ALUOut = PC+(sign-extend (IR[15-0]) <<2)


Instruction fetch

IR=Mem[PC]


Instruction fetch

PC=PC+4


Instruction decode and register fetch

A=Reg[IR[25-21]], B=Reg[IR[20-16]]


Instruction decode and register fetch

ALUOut = PC+(sign-extend (IR[15-0]) <<2)


Additional cycles for R-type

Execution: ALUOut = A op B

Completion: Reg[IR[15-11]] = ALUOut


R-type execution cycle

ALUOut = A op B


R-type completion cycle

Reg[IR[15-11]] = ALUOut


Additional cycles for store

Address computation: ALUOut = A + sign-extend (IR[15-0])

Memory access: Memory[ALUOut] = B


Store address computation cycle

ALUOut = A + sign-extend (IR[15-0])


Store memory access cycle

Memory[ALUOut] = B


Additional cycles for load

Address computation: ALUOut = A + sign-extend (IR[15-0])

Memory access: MDR = Memory[ALUOut]

Read completion: Reg[IR[20-16]] = MDR


Load memory access cycle

MDR = Memory[ALUOut]


Load read completion cycle

Reg[IR[20-16]] = MDR


Additional cycle for beq

Branch completion: if (A == B) PC = ALUOut


Branch completion cycle for beq

if (A == B) PC = ALUOut


Additional cycle for j

Jump completion: PC = PC[31-28] || (IR[25-0]<<2)


Jump completion cycle for j

PC = PC[31-28] || (IR[25-0]<<2)


Control logic design

Implemented as a Finite State Machine

Inputs: 6 opcode bits
Outputs: 16 control signals
State: 4 bits for 10 states
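The FSM diagram is a figure that did not survive, so here is a minimal Python sketch of its next-state function. It assumes the ten-state numbering commonly used in the textbook version of this design (0 fetch, 1 decode, 2 memory address, 3 load memory access, 4 load completion, 5 store memory access, 6 R-type execution, 7 R-type completion, 8 branch completion, 9 jump completion). The state names are hypothetical labels. Walking the FSM from fetch back to fetch reproduces the per-class cycle counts used in the CPI evaluation.

```python
(FETCH, DECODE, MEM_ADDR, MEM_READ, MEM_WB,
 MEM_WRITE, R_EXEC, R_DONE, BRANCH, JUMP) = range(10)  # fits in 4 bits

R_TYPE, LW, SW, BEQ, J = 0, 35, 43, 4, 2  # opcodes


def next_state(state, opcode):
    if state == FETCH:
        return DECODE
    if state == DECODE:
        if opcode in (LW, SW):
            return MEM_ADDR
        # an undefined opcode raises KeyError here
        return {R_TYPE: R_EXEC, BEQ: BRANCH, J: JUMP}[opcode]
    if state == MEM_ADDR:
        return MEM_READ if opcode == LW else MEM_WRITE
    if state == MEM_READ:
        return MEM_WB
    if state == R_EXEC:
        return R_DONE
    return FETCH  # MEM_WB, MEM_WRITE, R_DONE, BRANCH, JUMP end the instruction


def cycles(opcode):
    """Count clock cycles from fetch until the FSM returns to fetch."""
    state, n = FETCH, 0
    while True:
        state = next_state(state, opcode)
        n += 1
        if state == FETCH:
            return n


print({name: cycles(op) for name, op in
       [("lw", LW), ("sw", SW), ("R-type", R_TYPE), ("beq", BEQ), ("j", J)]})
```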


High-level view of FSM


Instruction fetch cycle


Instruction decode/register fetch cycle


R-type execution cycle


R-type completion cycle


Memory address computation cycle


Store memory access cycle


Load memory access cycle


Load read completion cycle


beq branch completion cycle


j jump completion cycle


Complete FSM


Evaluation of the multi-cycle design

CPI calculated based on the instruction mix for gcc (Figure 4.54):
23% loads (5 cycles each)
13% stores (4 cycles each)
19% branches (3 cycles each)
2% jumps (3 cycles each)
43% ALU (4 cycles each)

CPI = 0.23*5+0.13*4+0.19*3+0.02*3+0.43*4=4.02

Cycle time is calculated from the longest delay path assuming the same timing delays as before
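The CPI figure is a frequency-weighted average of the per-class cycle counts. This short sketch recomputes it from the gcc mix on the slide; the `mix` dictionary is just a container for those numbers.

```python
mix = {  # class: (fraction of instructions, cycles per instruction)
    "load": (0.23, 5),
    "store": (0.13, 4),
    "branch": (0.19, 3),
    "jump": (0.02, 3),
    "ALU": (0.43, 4),
}

cpi = sum(frac * cyc for frac, cyc in mix.values())
print(round(cpi, 2))  # 4.02
```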


Worst case datapath: branch target

ALUOut = PC+(sign-extend (IR[15-0]) <<2)

Delay = 7 time units: sign extension (1) + shift (1) + MUX (1) + ALU (4), versus 16 for the single-cycle design


Evaluation of the multi-cycle design

Time per instruction (TPI) of simple and multi-cycle:
TPI(simple) = CPI(simple) x cycle time(simple) = 1 x 16 = 16
TPI(multi-cycle) = 4.02 x 7 = 28.1

Simple single-cycle implementation is faster

Multicycle with pipelining will be considerably faster than single-cycle implementation
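The comparison reduces to TPI = CPI x cycle time. The sketch below recomputes both values and also the break-even CPI: with a 7-unit cycle, the multi-cycle design would need a CPI below 16 / 7 (about 2.29) to match the single-cycle design. The `tpi` function name is hypothetical.

```python
def tpi(cpi, cycle_time):
    """Time per instruction = CPI x cycle time."""
    return cpi * cycle_time


tpi_simple = tpi(1.0, 16)  # single-cycle: CPI = 1, cycle = 16 units
tpi_multi = tpi(4.02, 7)   # multi-cycle: gcc CPI, cycle = 7 units

# CPI the multi-cycle design would need to break even at a 7-unit cycle
break_even_cpi = tpi_simple / 7

print(tpi_simple, round(tpi_multi, 1), round(break_even_cpi, 2))  # 16.0 28.1 2.29
```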


Exceptions

An exception is an event that causes a deviation from the normal execution of instructions

Types of exceptions:
Operating system call (e.g., read a file, print a file)
Input/output device request
Page fault (request for instruction/data not in memory – Ch 7)
Arithmetic error (overflow, underflow, etc.)
Undefined instruction
Misaligned memory access (e.g., word access to odd address)
Memory protection violation
Hardware error
Power failure

An exception is not usually due to an error!

We need to be able to restart the program at the point where the exception was detected


Handling exceptions

Detect the exception

Save enough information about the exception to handle it properly

Save enough information about the program to resume it after the exception is handled

Handle the exception

Either terminate the program or resume executing it depending on the exception type


Detecting exceptions

Performed by hardware

Overflow: determined from the opcode and the overflow output of the ALU

Undefined instruction: determined from
The opcode in the main control unit
The function code and ALUOp in the ALU control logic


Detecting exceptions

[Figure: datapath with overflow and undefined instruction detection signals]


Saving exception information

Performed by hardware

We need the type of exception and the PC of the instruction when the exception occurred

In MIPS, the Cause register holds the exception type
Need an encoding for each exception type
Need a signal from the control unit to load it into the Cause register

The Exception Program Counter (EPC) register holds the PC
Need to subtract 4 from the PC register to get the correct PC (since we loaded PC+4 into the PC register during the Instruction Fetch cycle)
Need a signal from the control unit to load it into EPC


Saving exception information


Saving program information

Needed in order to restart the program from the point where the exception occurred

Performed by hardware and software

EPC register holds the PC of the instruction that had the exception (where we will restart the program)

The software routine that handles the exception saves any registers that it will need to the stack and restores them when it is done


Handling the exception

Performed by hardware and software

Need to transfer control to a software routine to handle the exception (exception handler)

The exception handler runs in a privileged mode that allows it to use special instructions and access all of memory

Our programs run in user mode

The hardware enables the privileged mode, loads PC with the address of the exception handler, and transitions to the Fetch state
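A minimal sketch of the hardware's part of exception entry, combining the steps from the saving-exception-information slide and this one: save PC - 4 into EPC, record the type in Cause, enter privileged mode, and load the handler address before returning to fetch. The Cause encodings (0 = undefined instruction, 1 = overflow), the handler address 0x80000180, and the `CPUState`/`take_exception` names are assumptions for illustration; real values depend on the MIPS implementation.

```python
from dataclasses import dataclass

# Hypothetical values, for illustration only
CAUSE_UNDEFINED_INSTRUCTION = 0
CAUSE_OVERFLOW = 1
HANDLER_ADDRESS = 0x80000180


@dataclass
class CPUState:
    pc: int                   # holds PC+4 once the fetch cycle has run
    epc: int = 0
    cause: int = 0
    privileged: bool = False


def take_exception(cpu, cause):
    """Hardware actions on exception entry (sketch)."""
    cpu.epc = (cpu.pc - 4) & 0xFFFFFFFF  # undo the fetch-cycle PC+4
    cpu.cause = cause                     # record the exception type
    cpu.privileged = True                 # enter privileged mode
    cpu.pc = HANDLER_ADDRESS              # then go to the fetch state


cpu = CPUState(pc=0x00400024)  # faulting instruction was at 0x00400020
take_exception(cpu, CAUSE_OVERFLOW)
print(hex(cpu.epc), hex(cpu.pc))  # 0x400020 0x80000180
```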


Handling the exception: loading the PC with the exception handler address


Exception handler

Stores the values of the registers that it will need to the stack

Handles the particular exception:
Operating system call: calls the subroutine associated with the call
Underflow: sets the register to zero or uses denormalized numbers
I/O: handles the particular I/O request, e.g., keyboard input

Restores registers from the stack (if program is to be restarted)

Terminates the program, or resumes execution by loading the PC with EPC and transitioning to the Instruction Fetch state


FSM modifications


The Intel Pentium processor

Introduced in 1993

Uses a multi-cycle datapath with the following steps for integer instructions:
Prefetch (PF): read instruction from the instruction memory
Decode 1 (D1): first stage of instruction decode
Decode 2 (D2): second stage of instruction decode
Execute (E): perform the ALU operation
Write back (WB): write the result to the register file

Datapath usage varies by instruction type:
Simple instructions make one pass through the datapath using state machine control
Complex instructions make multiple passes, reusing the same hardware elements under microcode control


The Intel Pentium processor

The Pentium is a 2-way superscalar design: two instructions can execute simultaneously

Ideal CPI for a 2-way superscalar is 0.5

Conditions for superscalar execution:
Both must be simple instructions
The result of the first instruction cannot be needed by the second
The two instructions cannot both write the same register
The first instruction in program sequence cannot be a jump

[Figure: shared PF and D1 stages feeding two pipes, the U pipe (D2, E, WB) and the V pipe (D2, E, WB)]
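The four pairing conditions listed on the Pentium slide can be expressed as a single predicate. This sketch encodes only those four conditions. The real Pentium pairing rules have further details not modeled here, and the `Instr` record and register names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction summary used only for the pairing check."""
    simple: bool
    dest: Optional[str]            # register written, or None
    sources: Tuple[str, ...] = ()  # registers read
    is_jump: bool = False


def can_pair(first, second):
    """Apply the four pairing conditions listed on the slide."""
    if not (first.simple and second.simple):
        return False
    if first.dest is not None and first.dest in second.sources:
        return False  # second needs the first's result
    if first.dest is not None and first.dest == second.dest:
        return False  # both would write the same register
    if first.is_jump:
        return False  # the first instruction cannot be a jump
    return True


add_eax = Instr(simple=True, dest="eax", sources=("eax", "ebx"))
add_ecx = Instr(simple=True, dest="ecx", sources=("ecx", "edx"))
use_eax = Instr(simple=True, dest="ecx", sources=("ecx", "eax"))

print(can_pair(add_eax, add_ecx))  # True
print(can_pair(add_eax, use_eax))  # False: depends on eax
```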


The Intel Pentium Pro processor

Introduced in 1995 as the successor to the Pentium

The basis for the Pentium II and Pentium III

Implements a 14-cycle, 3-way superscalar integer datapath; very high frequency is the goal

Uses out-of-order execution: instructions may execute out of their original program order
Completely handled by hardware, transparently to software
Instructions execute as soon as their source operands become available
Complicates exception handling

Some instructions before the excepting one may not have executed, while some after it may have executed


The Intel Pentium Pro processor

Pentium Pro designers (and AMD designers before them) used innovative engineering to overcome the disadvantages of CISC ISAs
Many complex x86 instructions are internally translated by hardware into RISC-like micro-ops with state machine control

Achieves a very low CPI for simple integer operations, even on programs compiled for older implementations

The combination of high frequency and low CPI gave the Pentium Pro extremely competitive integer performance versus RISC microprocessors
As a result, RISC CPUs have failed to gain the desktop market share that had been expected


The Intel Pentium 4 processor

20-cycle superscalar integer pipeline

Extremely high frequency (>3GHz)

Major effort to lower power dissipation:
Clock gating: the clock to a unit is turned off when the unit is not in use
Trace cache: caches micro-ops of previously decoded complex instructions to avoid the power-consuming decode operation