Pentium Processor


Page 1: Pentium processor

Pentium Processor

Page 2: Pentium processor

Features of Pentium
• Introduced in 1993 with clock frequencies ranging from 60 to 66 MHz
• The primary changes in the Pentium processor were:
  – Superscalar Architecture
  – Dynamic Branch Prediction
  – Pipelined Floating-Point Unit
  – Separate 8K Code and Data Caches
  – Writeback MESI Protocol in the Data Cache
  – 64-Bit Data Bus
  – Bus Cycle Pipelining

Page 3: Pentium processor

Pentium Architecture

Page 4: Pentium processor

Pentium Architecture
• It has a 64-bit data bus and a 32-bit address bus
• There are two separate 8 KB caches: one for code and one for data
• Each cache has its own translation lookaside buffer (TLB), which translates linear addresses to physical addresses
• Code Cache:
  – 2-way set associative cache
  – 256 lines (bits) run between the code cache and the prefetch buffers, permitting prefetching of 32 bytes (256/8) of instructions at a time
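The cache figures above can be turned into a small worked calculation. This is only a sketch: it assumes the cache line is 32 bytes (the same 32-byte figure the slides give for a prefetch and for a burst transfer), and the helper name split_address is made up for illustration.

```python
# Geometry of one 8 KB, 2-way set-associative cache with 32-byte lines.
# LINE_BYTES is an assumption taken from the 32-byte (256/8) figure above.
CACHE_BYTES = 8 * 1024
WAYS = 2
LINE_BYTES = 256 // 8                          # 256-bit path -> 32 bytes

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)      # number of sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # bits selecting a byte in a line
INDEX_BITS = SETS.bit_length() - 1             # bits selecting a set
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS       # remaining address bits


def split_address(addr):
    """Split a 32-bit physical address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print(SETS, OFFSET_BITS, INDEX_BITS, TAG_BITS)
    print(split_address(0x12345678))
```

Under these assumptions each cache has 128 sets, so an address splits into a 20-bit tag, a 7-bit set index and a 5-bit byte offset.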

Page 5: Pentium processor

Pentium Architecture
• Prefetch Buffers:
  ▫ Four prefetch buffers within the processor work as two independent pairs. When instructions are prefetched from the cache, they are placed into one set of prefetch buffers. The other set is used when a branch is predicted to be taken.
  ▫ The prefetch buffers send a pair of instructions to the instruction decoder
• Instruction Decode Unit:
  ▫ Decoding occurs in two stages: Decode1 (D1) and Decode2 (D2)
  ▫ D1 checks whether the instructions can be paired
  ▫ D2 calculates the addresses of memory-resident operands

Page 6: Pentium processor

Pentium Architecture
• Control Unit:
  ▫ This unit interprets the instruction word and microcode entry point fed to it by the Instruction Decode Unit
  ▫ It handles exceptions, breakpoints and interrupts
  ▫ It controls the integer pipelines and floating-point sequences
• Microcode ROM:
  ▫ Stores microcode sequences
• Arithmetic/Logic Units (ALUs):
  ▫ There are two parallel integer instruction pipelines: the u-pipeline and the v-pipeline
  ▫ The u-pipeline has a barrel shifter
  ▫ The two ALUs perform the arithmetic and logical operations specified by the instructions in their respective pipelines

Page 7: Pentium processor

Pentium Registers
• Four 32-bit registers can be used as
  ∗ Four 32-bit registers (EAX, EBX, ECX, EDX)
  ∗ Four 16-bit registers (AX, BX, CX, DX)
  ∗ Eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL)
• Some registers have special uses
  ∗ ECX for the count in loop instructions
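The overlap between the 32-bit, 16-bit and 8-bit register names can be illustrated with plain bit masks. This is a minimal sketch that models EAX as a Python integer; the function names sub_registers and write_al are hypothetical.

```python
def sub_registers(eax):
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF          # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # high byte of AX
    al = ax & 0xFF             # low byte of AX
    return ax, ah, al


def write_al(eax, value):
    """Writing AL replaces bits 7..0 only; the other 24 bits of EAX are kept."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


if __name__ == "__main__":
    print([hex(v) for v in sub_registers(0x12345678)])
    print(hex(write_al(0x12345678, 0xFF)))
```

The same masks apply to EBX, ECX and EDX and their BX/BH/BL, CX/CH/CL and DX/DH/DL views.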

Page 8: Pentium processor

Pentium Registers (Eflags)

• Flags never change for any data transfer or program control operation.

• Some of the flags are also used to control features found in the microprocessor.

Page 9: Pentium processor

• Flag bits, with a brief description of function.
• C (carry) holds the carry after addition or the borrow after subtraction.
  ▫ also indicates error conditions
• P (parity) is the count of ones in a number expressed as even or odd. Logic 0 for odd parity; logic 1 for even parity.
  ▫ if a number contains three binary one bits, it has odd parity
  ▫ if a number contains no one bits, it has even parity

Page 10: Pentium processor


•A (auxiliary carry) holds the carry (half-carry) after addition or the borrow after subtraction between bit positions 3 and 4 of the result.
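The carry, parity and auxiliary-carry definitions can be checked with a short 8-bit addition model. This is an illustrative sketch, not processor code; add8_flags is a made-up name. On x86 the parity flag reflects only the low 8 bits of the result, which is why an 8-bit model is enough here.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, CF, PF, AF)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    cf = total >> 8                                     # carry out of bit 7
    pf = 1 if bin(result).count("1") % 2 == 0 else 0    # 1 = even parity
    af = ((a & 0x0F) + (b & 0x0F)) >> 4                 # carry from bit 3 into bit 4
    return result, cf, pf, af


if __name__ == "__main__":
    for a, b in [(0x0F, 0x01), (0xFF, 0x01), (0x03, 0x04)]:
        print(hex(a), hex(b), add8_flags(a, b))
```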

Page 11: Pentium processor

• Z (zero) shows that the result of an arithmetic or logic operation is zero.

• S (sign) flag holds the arithmetic sign of the result after an arithmetic or logic instruction executes.

• T (trap) flag enables trapping through an on-chip debugging feature.

• I (interrupt) controls operation of the INTR (interrupt request) input pin.

• D (direction) selects increment or decrement mode for the DI and/or SI registers during string instructions.

• O (overflow) is set when an overflow occurs while signed numbers are added or subtracted.
  ▫ an overflow indicates the result has exceeded the capacity of the machine
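The zero, sign and overflow flags can be sketched in the same way for a signed 8-bit addition. The helper add8_zso is hypothetical; it uses the common rule that overflow occurs when both operands have the same sign and the result has the opposite sign.

```python
def add8_zso(a, b):
    """Add two 8-bit values and return (result, ZF, SF, OF)."""
    a &= 0xFF
    b &= 0xFF
    result = (a + b) & 0xFF
    zf = int(result == 0)                     # result is zero
    sf = result >> 7                          # copy of the result's sign bit
    # Bit 7 of (a ^ result) & (b ^ result) is set only when both operands
    # differ in sign from the result, i.e. signed overflow.
    of = ((a ^ result) & (b ^ result)) >> 7
    return result, zf, sf, of


if __name__ == "__main__":
    print(add8_zso(0x7F, 0x01))   # +127 + 1 does not fit in a signed byte
    print(add8_zso(0xFF, 0x01))   # -1 + 1 = 0, no signed overflow
```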

Page 12: Pentium processor

• IOPL (I/O privilege level) is used in protected mode operation to select the privilege level for I/O devices.

• NT (nested task) flag indicates that the current task is nested within another task in protected mode operation.

• RF (resume) is used with debugging to control the resumption of execution after the next instruction.

• VM (virtual mode) flag bit selects virtual mode operation in a protected mode system.

Page 13: Pentium processor

• AC (alignment check) flag bit activates if a word or doubleword is addressed on a non-word or non-doubleword boundary.
• VIF (virtual interrupt) is a copy of the interrupt flag bit, available on the Pentium through Pentium 4.
• VIP (virtual interrupt pending) provides information about a virtual mode interrupt for the Pentium.
  ▫ used in multitasking environments to provide virtual interrupt flags
• ID (identification) flag indicates that the Pentium microprocessors support the CPUID instruction.
  ▫ the CPUID instruction provides the system with information about the Pentium microprocessor
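As a recap of the flag slides, the sketch below decodes an EFLAGS value into flag names. The slides do not give bit positions; the ones used here are the standard x86 EFLAGS layout, and decode_eflags is a hypothetical helper name.

```python
# Standard x86 EFLAGS bit positions (not listed in the slides themselves).
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value):
    """Return (set of flag names that are 1, IOPL field from bits 13:12)."""
    flags = {name for name, bit in FLAG_BITS.items() if (value >> bit) & 1}
    iopl = (value >> 12) & 0x3
    return flags, iopl


if __name__ == "__main__":
    print(decode_eflags(0x00000202))
    print(decode_eflags(0x00003246))
```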

Page 14: Pentium processor

Control Registers

Page 15: Pentium processor

• CD (cache disable) controls the internal cache. If CD=1, the cache will not fill with new data. If CD=0, misses will cause the cache to fill with new data.

• NW (not write-through) selects the mode of operation for the data cache. If NW=1, the data cache is inhibited from cache write-through.

• AM (alignment mask) enables alignment checking when set. Alignment checking occurs only in protected mode.

• WP (write protect) protects user-level pages against supervisor-level write operations. When WP=1, the supervisor cannot write to read-only user-level pages; when WP=0, it can.

• NE (numeric error) enables standard numeric coprocessor error detection.
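The control bits described above all live in control register CR0. The sketch below extracts them from a CR0 value. The bit positions are the standard x86 ones (CD=30, NW=29, AM=18, WP=16, NE=5), which the slides do not state, and decode_cr0 is a made-up name.

```python
# Standard CR0 bit positions for the bits described on this slide.
CR0_BITS = {"CD": 30, "NW": 29, "AM": 18, "WP": 16, "NE": 5}


def decode_cr0(value):
    """Return a dict mapping each listed CR0 bit name to 0 or 1."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}


if __name__ == "__main__":
    # 0x60000010 has CD and NW set, i.e. the internal cache is not filling.
    print(decode_cr0(0x60000010))
    # A value with AM, WP and NE set and caching enabled.
    print(decode_cr0(0x80050033))
```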

Page 16: Pentium processor

Pin Diagram

Page 17: Pentium processor

• CLOCK
  ▫ CLK - Clock (Input)
    Fundamental timing for the Pentium.
    The CPU uses this signal as the internal processor clock.
  ▫ BF - Bus Frequency (Input)
    Bus Frequency determines the bus-to-core frequency ratio.
    When BF is strapped to Vcc, the processor will operate at a 2 to 3 bus-to-core frequency ratio.
    When BF is strapped to Vss, the processor will operate at a 1 to 2 bus-to-core frequency ratio.
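The two BF strapping options translate into a core clock as follows. This is a small arithmetic sketch; the bus frequencies in the example are illustrative inputs, and core_mhz is a hypothetical helper.

```python
from fractions import Fraction

# Bus-to-core ratios selected by the BF strap, as listed above.
BUS_TO_CORE = {"Vcc": Fraction(2, 3), "Vss": Fraction(1, 2)}


def core_mhz(bus_mhz, bf_strap):
    """Core frequency implied by a bus frequency and the BF strap."""
    return Fraction(bus_mhz) / BUS_TO_CORE[bf_strap]


if __name__ == "__main__":
    for bus, strap in [(60, "Vcc"), (50, "Vcc"), (60, "Vss")]:
        print(bus, strap, float(core_mhz(bus, strap)))
```

For example, a 60 MHz bus with BF strapped to Vcc gives a 90 MHz core, and the same bus with BF strapped to Vss gives 120 MHz.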

Page 18: Pentium processor

• Initialization
  ▫ RESET - (Input)
    Forces the CPU to begin execution at a known state.
  ▫ INIT - Initialization (Input)
    The Pentium processor initialization input pin forces the Pentium processor to begin execution in a known state.
    The processor state after INIT is the same as the state after RESET, except that the internal caches, write buffers, and floating-point registers retain the values they had prior to INIT.

Page 19: Pentium processor

• Address Bus
  ▫ A31:A3 - Address bus lines (Output, except during cache snooping)
  ▫ The number of address lines determines the amount of memory supported by the processor.
  ▫ Determines where in the 4 GB memory space or 64K I/O space the processor is accessing.
  ▫ These are input lines when AHOLD and EADS# are active for inquire cycles (snooping).

Page 20: Pentium processor

• Address Bus
  ▫ BE7#:BE0# - Byte Enable lines (Outputs)
  ▫ Byte enables select each of the 8 bytes in the 64-bit data path.
    They help define the physical area of memory or I/O accessed.
    The Pentium uses byte enables to address locations within a QWORD.
    In effect, they are a decode of the address lines A2-A0, which the Pentium does not generate.
    Which lines go active depends on the address and on whether a byte, word, doubleword or quadword is required.
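The decode from the low address bits and the transfer size to the byte enables can be sketched as below. This is an illustration under simplifying assumptions: byte_enables is a made-up name, and a transfer that crosses a quadword boundary is rejected rather than split into the two bus cycles it would need.

```python
def byte_enables(addr, size):
    """Return (address as driven on A31:A3, BE7#..BE0# pin levels as a byte).

    In the returned pin byte, a 0 bit means that BEx# is asserted (active low).
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    start = addr & 0x7                      # the A2-A0 bits that are not driven
    if start + size > 8:
        raise ValueError("crosses a quadword boundary; needs two bus cycles")
    active = ((1 << size) - 1) << start     # 1 = byte lane takes part
    pins = ~active & 0xFF                   # invert: BEx# pins are active low
    return addr & 0xFFFFFFF8, pins


if __name__ == "__main__":
    print([hex(v) for v in byte_enables(0x1004, 4)])   # upper doubleword
    print([hex(v) for v in byte_enables(0x1003, 1)])   # single byte, lane 3
    print([hex(v) for v in byte_enables(0x2000, 8)])   # full quadword
```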

Page 21: Pentium processor

• Address Mask
  ▫ A20M# - Address 20 Mask (Input)
    Emulates the address wraparound at 1 MByte which occurs on the 8086.
    When A20M# is asserted, the Pentium processor masks physical address bit 20 (A20) before performing a lookup in the internal caches or driving a memory cycle on the bus.
    A20M# must be asserted only when the processor is in real mode.
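The effect of A20M# on an address can be shown with a one-line mask. This is a sketch only; physical_address is a hypothetical helper, and the example uses the highest address reachable in real mode, FFFF:FFFF (0x10FFEF).

```python
A20_BIT = 1 << 20


def physical_address(addr, a20m_asserted):
    """Clear address bit 20 when A20M# is asserted, else pass the address on."""
    addr &= 0xFFFFFFFF
    return addr & ~A20_BIT if a20m_asserted else addr


if __name__ == "__main__":
    top = (0xFFFF << 4) + 0xFFFF            # real-mode address FFFF:FFFF
    print(hex(top), hex(physical_address(top, True)))
```

With the mask applied, 0x10FFEF wraps around to 0x00FFEF, as it would on an 8086 with only 20 address lines.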

• Internal Parity
  ▫ IERR# - Internal Error (Output)
    Alerts the system to internal parity errors.

Page 22: Pentium processor

• Address Parity
  ▫ AP - Address Parity (I/O)
    Bi-directional address parity pin for the address lines.
    Address parity is driven by the Pentium processor with even parity information on all CPU-generated cycles, in the same clock in which the address is driven.
    Even parity must be driven back to the CPU on this pin during inquire cycles, in the same clock as EADS#.
    Not supported on all systems.
  ▫ APCHK# - Address Parity Check Signal (Output)
    The status of the address parity check is driven on the APCHK# output.
    Even parity checking.

Page 23: Pentium processor

• Data Bus
  ▫ D63:D0 - Data Lines (I/O)
    The bi-directional 64-bit data path to or from the CPU.
    The signal W/R# distinguishes direction.
    During reads, the CPU samples the data bus when BRDY# is asserted.
  ▫ DP7:DP0 - Data Parity (I/O)
    Bi-directional data parity pins for the data bus.
    Even parity check, with one parity pin for each byte of the data bus.
    Output on writes, input on reads.
    Not supported on all systems.
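Per-byte even parity can be sketched as follows. The function names are hypothetical; the code assumes DP0 covers D7:D0 and DP7 covers D63:D56, and it ignores byte enables (parity is computed for all eight lanes).

```python
def even_parity_bit(value):
    """Bit that makes the total count of ones (value plus this bit) even."""
    return bin(value).count("1") & 1


def data_parity(qword):
    """Return DP7..DP0 packed into one byte, one even-parity bit per lane."""
    dp = 0
    for lane in range(8):
        byte = (qword >> (8 * lane)) & 0xFF
        dp |= even_parity_bit(byte) << lane
    return dp


if __name__ == "__main__":
    print(bin(data_parity(0x0103)))               # lanes 0 and 1 in use
    print(bin(data_parity(0xFF00000000000007)))   # lanes 0 and 7 in use
```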

Page 24: Pentium processor

• Bus Control
  ▫ ADS# - Address Strobe (Output)
    Indicates that a new valid bus cycle is currently being driven by the Pentium processor.
    The following are some of the signals which are valid when ADS#=0:
      Addresses (A31:3)
      Byte Enables (BE7#:0#)
      Bus Cycle Definition (M/IO#, D/C#, W/R#, CACHE#)
    From power-on, the ADS# signal should be asserted periodically when bus cycles are running.

Page 25: Pentium processor

• Bus Control (Cont.)
  ▫ BRDY# - Burst Ready (Input)
    Transfer complete indication.
    The burst ready input indicates that the external system has presented data on the data pins in response to a read, or that the external system has accepted the Pentium processor data in response to a write request.
    This signal ends the current bus cycle and is used to extend bus cycles to allow slow devices extra time.
    If LOW (non-burst cycles), this signal ends the current bus cycle and the next bus cycle can begin.
    If HIGH, the Pentium is prevented from continuing processing and wait states are added.

Page 26: Pentium processor

• Bus Cycle Definition
  ▫ M/IO# - Memory or Input/Output (Output)
    M/IO# distinguishes between memory and I/O cycles.
    The memory/input-output signal is one of the primary bus cycle definition pins.
      1 = Memory Cycle
      0 = Input/Output Cycle
    It is driven valid in the same clock in which the ADS# signal is asserted.

Page 27: Pentium processor

• Bus Cycle Definition (Cont.)
  ▫ D/C# - Data or Code (Output)
    D/C# distinguishes between data cycles and code or special (control) cycles.
    The data/code output is one of the primary bus cycle definition pins.
      1 = Data
      0 = Code / Control
        » Control for Interrupt Acknowledge or Special Cycles
    It is driven valid in the same clock in which the ADS# signal is asserted.

Page 28: Pentium processor

• Bus Cycle Definition (Cont.)
  ▫ W/R# - Write or Read (Output)
    W/R# distinguishes between write and read cycles.
    Write/read is one of the primary bus cycle definition pins.
      1 = Write
      0 = Read
    It is driven valid in the same clock in which the ADS# signal is asserted.
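Taken together, M/IO#, D/C# and W/R# identify the type of bus cycle. The lookup below is a sketch of that decode; the table follows the usual Pentium bus cycle encoding, which goes slightly beyond the per-pin descriptions in the slides, and cycle_type is a made-up name.

```python
# Keys are (M/IO#, D/C#, W/R#) pin levels sampled when ADS# is asserted.
CYCLE_TYPES = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "special cycle",
    (0, 1, 0): "I/O read",
    (0, 1, 1): "I/O write",
    (1, 0, 0): "code read",
    (1, 0, 1): "reserved",
    (1, 1, 0): "memory read",
    (1, 1, 1): "memory write",
}


def cycle_type(m_io, d_c, w_r):
    """Name the bus cycle for the given pin levels (each 0 or 1)."""
    return CYCLE_TYPES[(m_io, d_c, w_r)]


if __name__ == "__main__":
    print(cycle_type(1, 0, 0))
    print(cycle_type(0, 1, 1))
```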

Page 29: Pentium processor

• Bus Cycle Definition (Cont.)
  ▫ CACHE# - Cacheability (Output)
    Processor indication of internal cacheability.
    The L1 cache must be enabled using the CD bit in CR0 for CACHE# to be asserted low.
    The CACHE# signal could also be described as the BURST instruction signal, because the CACHE# signal (qualified with KEN#) results in a burst mode transfer of 32 bytes of code or data.
    CACHE# and KEN# are used together to determine whether a read will be turned into a line fill (burst cycle).
    During write-back cycles, the CPU asserts the CACHE# signal (KEN# does not have to be asserted).
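The way CACHE# and KEN# combine can be written as a small decision function. This is a simplified sketch of the rules stated above for memory cycles only; transfer_kind is a hypothetical name, and pin levels are given as 0 (asserted, since both pins are active low) or 1.

```python
def transfer_kind(w_r, cache_n, ken_n=1):
    """Classify a memory cycle from W/R#, CACHE# and KEN# pin levels."""
    if w_r == 0:                                   # read cycle
        if cache_n == 0 and ken_n == 0:
            return "burst line fill (32 bytes)"    # both asserted
        return "single read"
    if cache_n == 0:                               # write with CACHE# asserted
        return "burst write-back (32 bytes)"       # KEN# is not consulted
    return "single write"


if __name__ == "__main__":
    print(transfer_kind(w_r=0, cache_n=0, ken_n=0))
    print(transfer_kind(w_r=0, cache_n=0, ken_n=1))
    print(transfer_kind(w_r=1, cache_n=0))
```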

Page 30: Pentium processor

• Bus Cycle Definition (Cont.)
  ▫ NA# - Next Address (Input)
    Indicates that external memory is prepared for a pipelined cycle.
    An active next address input indicates that the external memory system is ready to accept a new bus cycle although all data transfers for the current cycle have not yet completed.
    When NA# is asserted, the Pentium supplies the address for the start of the next transfer early, so that the memory system can latch the new address before the transfer is ready to start.
    A detailed discussion of address pipelining is beyond the scope of this course.

Page 31: Pentium processor

• Bus Cycle Definition (Cont.)
  ▫ LOCK# - Bus Lock (Output)
    The bus lock pin indicates that the current bus cycle is locked, typically for a read-modify-write operation.
    The CPU will not allow a bus hold when LOCK# is asserted.
    Locked cycles are generated when the programmer prefixes certain instructions with the LOCK prefix.
      e.g. LOCK INC [EDI] ; increment a memory location
    Locked cycles are also generated automatically for certain bus transfer operations:
      Interrupt acknowledge cycles
      The XCHG instruction when one operand is memory-based
    See the Pentium manual for more details.

Page 32: Pentium processor

• Cache Control
  ▫ KEN# - Cache Enable (Input)
    Indicates to the Pentium whether or not the system can support a cache line fill for the current cycle.
    CACHE# and KEN# are used together to determine whether a read will be turned into a line fill (burst cycle).
  ▫ WB/WT# - Write-back/Write-through (Input)
    This pin allows a cache line to be defined as write-back or write-through on a line-by-line basis.

Page 33: Pentium processor

• Bus Arbitration
  ▫ HOLD - Bus Hold (Input)
    Allows another bus master complete control of the CPU bus.
    In response to the bus hold request, the Pentium processor will float most of its output and input/output pins and assert HLDA after completing all outstanding bus cycles.
    The Pentium processor will maintain its bus in this state until HOLD is de-asserted.
  ▫ HLDA - Bus Hold Acknowledge (Output)
    External indication that the Pentium™ outputs are floated.

Page 34: Pentium processor

• Bus Arbitration (Cont.)
  ▫ BOFF# - Backoff (Input)
    Forces the Pentium to get off the bus in the next clock.
    After BOFF# is removed, the Pentium restarts the bus cycle.
  ▫ BREQ - Bus Request (Output)
    Indicates externally when a bus cycle is pending internally.
    Used to inform the arbitration logic that the Pentium needs control of the bus to perform a bus cycle.

Page 35: Pentium processor

• Interrupts
  ▫ INTR - Maskable Interrupt (Input)
    Indicates that an external interrupt has been generated.
    If the IF (Interrupt Enable Flag) bit in the EFLAGS register is set, the Pentium processor will generate two locked interrupt acknowledge bus cycles (to get the type number) and vector to an interrupt handler after the current instruction has completed.
  ▫ NMI - Non-Maskable Interrupt (Input)
    Indicates that an external non-maskable interrupt has been generated.
    The Pentium processor will vector to a Type 2 interrupt handler after the current instruction has completed.

Page 36: Pentium processor

• Probe Mode
  ▫ R/S# - Resume/Stop [Run/Scan] (Input)
    The run/stop input is an asynchronous, edge-sensitive interrupt used to stop the normal execution of the processor and place it into an idle state.
  ▫ PRDY - Probe Ready (Output)
    The probe ready output pin indicates that the processor has stopped normal execution in response to the R/S# pin going active. The CPU enters Probe Mode.

Page 37: Pentium processor

What is Superscalar?
• Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently
• Equally applicable to RISC & CISC
• In practice usually RISC

Page 38: Pentium processor

General Superscalar Organization

Page 39: Pentium processor

Superpipelined
• Many pipeline stages need less than half a clock cycle
• Doubling the internal clock speed gets two tasks done per external clock cycle
• Superscalar allows parallel fetch and execute

Page 40: Pentium processor

Superscalar v Superpipeline

Page 41: Pentium processor

Limitations
• Instruction level parallelism
• Compiler based optimisation
• Hardware techniques
• Limited by
  ▫ True data dependency
  ▫ Procedural dependency
  ▫ Resource conflicts
  ▫ Output dependency
  ▫ Antidependency

Page 42: Pentium processor

True Data Dependency
• ADD r1, r2 (r1 := r1+r2;)
• MOVE r3, r1 (r3 := r1;)
• Can fetch and decode the second instruction in parallel with the first
• Can NOT execute the second instruction until the first is finished

Page 43: Pentium processor

Procedural Dependency
• Cannot execute instructions after a branch in parallel with instructions before a branch
• Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed
• This prevents simultaneous fetches

Page 44: Pentium processor

Resource Conflict
• Two or more instructions requiring access to the same resource at the same time
  ▫ e.g. two arithmetic instructions
• Can duplicate resources
  ▫ e.g. have two arithmetic units

Page 45: Pentium processor

Output Dependency
• Write-write dependency
  ▫ R3:=R3 + R5; (I1)
  ▫ R4:=R3 + 1; (I2)
  ▫ R3:=R5 + 1; (I3)
  ▫ R7:=R3 + R4; (I4)
• In the above sequence, I2 cannot execute before I1 because of a true dependency on R3; the same holds for I4 and I3.
• I1 and I3 both write R3. If I3 completes before I1, R3 is left holding the wrong value when I4 fetches it. This write-write conflict is referred to as an output dependency.

Page 46: Pentium processor

Antidependency
• Read-write (write-after-read) dependency
  ▫ R3:=R3 + R5; (I1)
  ▫ R4:=R3 + 1; (I2)
  ▫ R3:=R5 + 1; (I3)
  ▫ R7:=R3 + R4; (I4)
  ▫ I3 cannot complete before I2 starts, as I2 needs a value in R3 and I3 changes R3
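The three register dependency types can be found mechanically for the I1 to I4 sequence. The sketch below is illustrative only: the tuple encoding of instructions and the name dependencies are assumptions, and a dependency is reported only when no instruction in between overwrites the register involved.

```python
# Each instruction: (name, destination register, set of source registers).
PROGRAM = [
    ("I1", "R3", {"R3", "R5"}),   # R3 := R3 + R5
    ("I2", "R4", {"R3"}),         # R4 := R3 + 1
    ("I3", "R3", {"R5"}),         # R3 := R5 + 1
    ("I4", "R7", {"R3", "R4"}),   # R7 := R3 + R4
]


def dependencies(program):
    """List (earlier, later, kind) for true, output and anti dependencies."""
    found = []
    for i, (early, e_dst, e_src) in enumerate(program):
        for j in range(i + 1, len(program)):
            late, l_dst, l_src = program[j]
            # Registers overwritten strictly between the two instructions.
            between = {dst for _, dst, _ in program[i + 1:j]}
            if e_dst in l_src and e_dst not in between:
                found.append((early, late, "true"))     # read after write
            if e_dst == l_dst and e_dst not in between:
                found.append((early, late, "output"))   # write after write
            if l_dst in e_src and l_dst not in between:
                found.append((early, late, "anti"))     # write after read
    return found


if __name__ == "__main__":
    for dep in dependencies(PROGRAM):
        print(dep)
```

Besides the I2/I3 antidependency named on the slide, the check also reports an antidependency between I1 and I3, because I1 reads R3 before I3 overwrites it.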

Page 47: Pentium processor

Design Issues
• Instruction level parallelism
  ▫ Instructions in a sequence are independent
  ▫ Execution can be overlapped
  ▫ Governed by data and procedural dependency
• Machine parallelism
  ▫ Ability to take advantage of instruction level parallelism
  ▫ Governed by the number of parallel pipelines

Page 48: Pentium processor

Instruction Issue Policy
• Order in which instructions are fetched
• Order in which instructions are executed
• Order in which instructions change registers and memory

Page 49: Pentium processor

In-Order Issue In-Order Completion
• Issue instructions in the order they occur
• Not very efficient
• May fetch >1 instruction
• Instructions must stall if necessary

Page 50: Pentium processor

In-Order Issue Out-of-Order Completion

• If an instruction is independent of the current instruction, it is allowed to execute before the current instruction completes

Page 51: Pentium processor

Out-of-Order Issue Out-of-Order Completion
• Decouple the decode pipeline from the execution pipeline
• Can continue to fetch and decode until this pipeline is full
• When a functional unit becomes available, an instruction can be executed
• Since instructions have been decoded, the processor can look ahead

Page 52: Pentium processor

Register Renaming
• Output dependencies and antidependencies occur because register contents may not reflect the correct ordering from the program
• May result in a pipeline stall
• Registers are allocated dynamically
  ▫ i.e. registers are not specifically named
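Renaming can be sketched on the same I1 to I4 sequence used for the output dependency and antidependency slides. This is a toy model, not the hardware mechanism: every write gets a fresh name by appending a letter (R3a is the initial value, R3b the first new value, and so on), and the name rename and the tuple encoding are assumptions.

```python
# Each instruction: (name, destination register, list of source registers).
PROGRAM = [
    ("I1", "R3", ["R3", "R5"]),   # R3 := R3 + R5
    ("I2", "R4", ["R3"]),         # R4 := R3 + 1
    ("I3", "R3", ["R5"]),         # R3 := R5 + 1
    ("I4", "R7", ["R3", "R4"]),   # R7 := R3 + R4
]

SUFFIXES = "abcdefghijklmnopqrstuvwxyz"


def rename(program):
    """Give every register write a fresh name; reads use the latest name."""
    writes = {}                                   # register -> writes so far

    def current(reg):
        return reg + SUFFIXES[writes.get(reg, 0)]

    renamed = []
    for name, dst, srcs in program:
        new_srcs = [current(s) for s in srcs]     # read names before the write
        writes[dst] = writes.get(dst, 0) + 1      # allocate a new name for dst
        renamed.append((name, current(dst), new_srcs))
    return renamed


if __name__ == "__main__":
    for instr in rename(PROGRAM):
        print(instr)
```

After renaming, I1 writes R3b and I3 writes R3c, so the output dependency between I1 and I3 and the antidependency between I2 and I3 disappear; only the true dependencies remain.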

Page 53: Pentium processor

Superscalar Execution

Page 54: Pentium processor

Superscalar Implementation
• Simultaneously fetch multiple instructions
• Logic to determine true dependencies involving register values
• Mechanisms to communicate these values
• Mechanisms to initiate multiple instructions in parallel
• Resources for parallel execution of multiple instructions
• Mechanisms for committing process state in correct order

Page 55: Pentium processor

Programmers model

Page 56: Pentium processor

Data Transfer Instructions
• Move data between memory and the general purpose and segment registers.
• Perform some operations such as conditional moves, stack access, and data conversion