The AMD K8 Processor Architecture

December 14th 2006

K7 vs K8

K7: 3 x86 decoding units, 3 integer units (ALU), 3 floating point units (FPU), 128KB L1 cache

K8:

3 decoders (16 bytes of instructions per clock cycle); x86 instructions are decoded into fixed-length micro-operations (µOPs)

Complex instructions are decoded into 2 or more µOPs

FastPath: certain µOPs are packed together

µOPs are then dispatched to the execution units

3 Address Generation Units (AGU) for loads and stores

3 integer units (ALU): most µOPs execute in one cycle; multiplication has a 3-cycle latency in 32 bits and a 5-cycle latency in 64 bits

3 floating point units (FPU) that handle x87, MMX, 3DNow!, SSE and SSE2 instructions

Load/Store stage: the L1 is dual-ported, meaning it can handle two 64-bit reads or writes each clock cycle
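The figures on this slide translate into simple peak numbers. The following is a minimal sketch that only restates the slide's arithmetic; the function names and the 2.0 GHz clock are illustrative assumptions, not values from the slides.

```python
# Figures taken from the slide; everything else is an illustrative assumption.
L1_PORTS = 2                          # the L1 is dual-ported
PORT_WIDTH_BITS = 64                  # each port handles a 64-bit read or write
MUL_LATENCY_CYCLES = {32: 3, 64: 5}   # integer multiply latency by operand width


def l1_peak_bytes_per_cycle(ports=L1_PORTS, width_bits=PORT_WIDTH_BITS):
    """Peak L1 data traffic per clock: ports x port width, in bytes."""
    return ports * width_bits // 8


def l1_peak_gb_per_s(clock_ghz):
    """Peak L1 bandwidth in GB/s (10^9 bytes/s) at an assumed clock frequency."""
    return l1_peak_bytes_per_cycle() * clock_ghz


def dependent_mul_chain_cycles(n_muls, operand_bits):
    """Cycles for n multiplications where each one needs the previous result.

    With a fully dependent chain nothing overlaps, so the latencies add up.
    """
    return n_muls * MUL_LATENCY_CYCLES[operand_bits]


if __name__ == "__main__":
    print(l1_peak_bytes_per_cycle())           # bytes per clock
    print(l1_peak_gb_per_s(2.0))               # 2.0 GHz is a hypothetical clock
    print(dependent_mul_chain_cycles(10, 64))  # ten chained 64-bit multiplies
```

These are upper bounds: independent multiplications can overlap in the pipeline, and real code rarely keeps both L1 ports busy every cycle.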

K8 Hammer Microarchitecture

K7 vs K8 Pipelines

K8 L1 and L2 Cache

The L1 cache

CPU               K8                Athlon XP         Pentium 4 Northwood   Pentium 4 Prescott
Size              code: 64KB        code: 64KB        TC: 12Kµops           TC: 12Kµops
                  data: 64KB        data: 64KB        data: 8KB             data: 16KB
Associativity     code: 2 way       code: 2 way       TC: 8 way             TC: 8 way
                  data: 2 way       data: 2 way       data: 4 way           data: 8 way
Cache line size   code: 64 bytes    code: 64 bytes    TC: n.a.              TC: n.a.
                  data: 64 bytes    data: 64 bytes    data: 64 bytes        data: 64 bytes
Write policy      Write Back        Write Back        Write Through         Write Through
Latency           3 cycles          3 cycles          2 cycles              4 cycles
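Size, associativity and line size together fix the number of sets in a cache and how an address splits into tag, index and offset. The sketch below applies the standard set-associative formula to the table's values; the function names are hypothetical, and it assumes power-of-two sizes and a plain bit-slice index, which the slides do not confirm for any of these processors.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, ways and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    KB = 1024
    # L1 data cache parameters from the table above.
    for name, size, ways in [("K8", 64 * KB, 2),
                             ("P4 Northwood", 8 * KB, 4),
                             ("P4 Prescott", 16 * KB, 8)]:
        print(name, cache_geometry(size, ways, 64))
```

With the table's numbers, the K8 L1 data cache has 512 sets of 2 lines each, while both Pentium 4 L1 data caches have only 32 sets.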

The L2 cache

CPU                     K8                  Athlon XP        Pentium 4 Northwood   Pentium 4 Prescott
Size                    512KB (Newcastle)   256 and 512KB    512KB                 1024KB
                        1024KB (Hammer)
Associativity           16 way              16 way           8 way                 8 way
Cache line size         64 bytes            64 bytes         64 bytes              64 bytes
Latency (given by       ?                   8 cycles         7 cycles              11 cycles
manufacturer)
Bus width               128 bits            64 bits          256 bits              256 bits
L1 relationship         exclusive           exclusive        inclusive             inclusive
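The bus width row determines how many bus transfers it takes to move one 64-byte cache line between L2 and L1. A minimal sketch of that arithmetic follows; the function name is hypothetical, and it assumes one full-width transfer per beat, since the slides give no timing for the bus itself.

```python
def transfers_per_line(line_bytes, bus_bits):
    """Number of bus-width transfers needed to move one cache line."""
    bus_bytes = bus_bits // 8
    return -(-line_bytes // bus_bytes)   # ceiling division


if __name__ == "__main__":
    # Bus widths from the L2 table above; every line is 64 bytes.
    for name, bus_bits in [("K8", 128), ("Athlon XP", 64),
                           ("Pentium 4 Northwood", 256),
                           ("Pentium 4 Prescott", 256)]:
        print(name, transfers_per_line(64, bus_bits))
```

Under this assumption a K8 line fill takes 4 transfers, half as many as on the Athlon XP and twice as many as on the Pentium 4.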

Exclusive vs Inclusive Cache

Exclusive L1-L2

A cache line (instructions/data) held in L1 is not duplicated in L2.

Positive: No constraint on the L2 size (it can be small). Total cache size is the sum of the sub-level sizes.

Negative: L2 performance impaired (latency). Need to use a victim buffer.

Inclusive L1-L2

Duplicates the content of the L1 cache in the L2 cache.

Positive: L2 performance improved.

Negative: Constraint on the L1/L2 size ratio (relatively large L2). Total cache size may be smaller.
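The capacity argument and the role of victims can be illustrated with a toy model. This is a sketch under strong assumptions, not how the K8 is implemented: both levels are fully associative with LRU replacement, capacities are counted in lines, and an evicted L1 line moves straight into L2 rather than through a separate victim buffer. All names are hypothetical.

```python
from collections import OrderedDict


def effective_capacity_kb(l1_kb, l2_kb, exclusive):
    """Distinct data the hierarchy can hold: L1 + L2 if exclusive, else L2."""
    return l1_kb + l2_kb if exclusive else l2_kb


class ExclusiveHierarchy:
    """Toy exclusive L1/L2: a line lives in L1 or in L2, never in both."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()   # oldest entry first (LRU order)
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def access(self, line):
        if line in self.l1:
            self.l1.move_to_end(line)      # refresh LRU position
            return "L1 hit"
        if line in self.l2:
            del self.l2[line]              # promoted line leaves L2
            result = "L2 hit"
        else:
            result = "miss"                # fetched from memory into L1 only
        self._fill_l1(line)
        return result

    def _fill_l1(self, line):
        if len(self.l1) >= self.l1_lines:
            victim, _ = self.l1.popitem(last=False)   # evict LRU line from L1
            if len(self.l2) >= self.l2_lines:
                self.l2.popitem(last=False)           # make room in L2
            self.l2[victim] = True                    # victim moves down to L2
        self.l1[line] = True


if __name__ == "__main__":
    h = ExclusiveHierarchy(l1_lines=2, l2_lines=4)
    print([h.access(i) for i in range(6)])   # six distinct lines, all misses
    print(h.access(0))                       # line 0 was pushed down to L2
    # K8 (Hammer): 64KB code + 64KB data L1 and 1024KB L2, exclusive.
    print(effective_capacity_kb(128, 1024, exclusive=True))
```

After the six initial accesses the model holds all six lines (2 in L1, 4 in L2), which is the "sum of the sub-level sizes" property. The extra work on every L1 eviction is the cost the slide lists under Negative.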

K8 Athlon 64

Athlon 64 Operating Modes

Opteron vs Xeon