The AMD K8 Processor Architecture December 14th 2006

K7 vs K8

K7: 3 x86 decoding units, 3 integer units (ALU), 3 floating point units (FPU), 128KB L1 cache

K8:

3 decoders (16 bytes of instructions per clock cycle)

x86 instructions are decoded into fixed-length micro-operations (µOPs); complex instructions are decoded into two or more µOPs

FastPath: certain µOPs are packed together

µOPs are then dispatched to the execution units

3 Address Generation Units (AGU) for loads and stores

3 integer units (ALU): most µOPs execute in one cycle; multiplication has a 3-cycle latency in 32 bits and a 5-cycle latency in 64 bits

3 floating point units (FPU) that handle x87, MMX, 3DNow!, SSE and SSE2 instructions

Load/Store stage: the L1 is dual-ported, meaning it can handle two 64-bit reads or writes each clock cycle
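As a rough illustration of what the dual-ported L1 implies, the peak L1 bandwidth can be estimated from the two 64-bit accesses per cycle. This is a minimal sketch: the function names are hypothetical, and the 2.0 GHz clock is an assumed example value, not a figure from these slides.

```python
def l1_peak_bytes_per_cycle(ports: int = 2, access_bits: int = 64) -> int:
    """Peak bytes moved per clock: one access of `access_bits` per port."""
    return ports * access_bits // 8


def l1_peak_gb_per_s(clock_ghz: float, ports: int = 2, access_bits: int = 64) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) at a given core clock.

    The clock frequency is an input chosen by the caller; the slides
    do not state one.
    """
    return l1_peak_bytes_per_cycle(ports, access_bits) * clock_ghz


if __name__ == "__main__":
    # Two 64-bit ports, as described for the K8 L1.
    print(l1_peak_bytes_per_cycle())
    # Assumed 2.0 GHz clock, for illustration only.
    print(l1_peak_gb_per_s(2.0))
```

This is an upper bound only: it ignores bank conflicts, misses, and any limit imposed by the load/store queue.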

K8 Hammer Microarchitecture

K7 vs K8 Pipelines

K8 L1 and L2 Cache

The L1 cache

| CPU | K8 | Athlon XP | Pentium 4 Northwood | Pentium 4 Prescott |
|---|---|---|---|---|
| Size | code: 64KB; data: 64KB | code: 64KB; data: 64KB | TC: 12Kµops; data: 8KB | TC: 12Kµops; data: 16KB |
| Associativity | code: 2 way; data: 2 way | code: 2 way; data: 2 way | TC: 8 way; data: 4 way | TC: 8 way; data: 8 way |
| Cache line size | code: 64 bytes; data: 64 bytes | code: 64 bytes; data: 64 bytes | TC: n.a.; data: 64 bytes | TC: n.a.; data: 64 bytes |
| Write policy | Write Back | Write Back | Write Through | Write Through |
| Latency | 3 cycles | 3 cycles | 2 cycles | 4 cycles |

(TC: trace cache)
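The size, associativity and line size in the L1 table determine how many sets each data cache has and how an address splits into offset and index bits. The sketch below works this out for the data caches listed above; the function name and dictionary are hypothetical, and it assumes power-of-two sizes and a conventional set-associative layout.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, ways and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = sets.bit_length() - 1         # log2(sets)
    return sets, offset_bits, index_bits


KB = 1024

# L1 data cache parameters taken from the table above.
L1_DATA = {
    "K8": (64 * KB, 2, 64),
    "Athlon XP": (64 * KB, 2, 64),
    "Pentium 4 Northwood": (8 * KB, 4, 64),
    "Pentium 4 Prescott": (16 * KB, 8, 64),
}

if __name__ == "__main__":
    for cpu, (size, ways, line) in L1_DATA.items():
        sets, offset_bits, index_bits = cache_geometry(size, ways, line)
        print(f"{cpu}: {sets} sets, {offset_bits} offset bits, {index_bits} index bits")
```

With these inputs the K8 data cache comes out at 512 sets, while both Pentium 4 data caches come out at 32 sets, since Prescott doubles size and associativity together.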

The L2 cache

| CPU | K8 | Athlon XP | Pentium 4 Northwood | Pentium 4 Prescott |
|---|---|---|---|---|
| Size | 512KB (Newcastle); 1024KB (Hammer) | 256 and 512KB | 512KB | 1024KB |
| Associativity | 16 way | 16 way | 8 way | 8 way |
| Cache line size | 64 bytes | 64 bytes | 64 bytes | 64 bytes |
| Latency (given by manufacturer) | ? | 8 cycles | 7 cycles | 11 cycles |
| Bus width | 128 bits | 64 bits | 256 bits | 256 bits |
| L1 relationship | exclusive | exclusive | inclusive | inclusive |
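The bus width row can be related to the 64-byte line size: a narrower L1-L2 bus needs more transfers to move one cache line. The following is a minimal sketch with a hypothetical function name; it assumes one bus-width transfer per cycle and ignores the access latency itself.

```python
def line_transfer_cycles(line_bytes: int, bus_bits: int) -> int:
    """Bus transfers needed to move one cache line between L2 and L1.

    Assumes one transfer of `bus_bits` per cycle (an idealisation);
    rounds up if the line is not a multiple of the bus width.
    """
    line_bits = line_bytes * 8
    return -(-line_bits // bus_bits)  # ceiling division


# Bus widths (bits) from the L2 table above; all line sizes are 64 bytes.
L2_BUS_BITS = {
    "K8": 128,
    "Athlon XP": 64,
    "Pentium 4 Northwood": 256,
    "Pentium 4 Prescott": 256,
}

if __name__ == "__main__":
    for cpu, bus_bits in L2_BUS_BITS.items():
        print(f"{cpu}: {line_transfer_cycles(64, bus_bits)} transfers per 64-byte line")
```

Under these assumptions the K8 needs half as many transfers per line as the Athlon XP, and twice as many as either Pentium 4.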

Exclusive vs Inclusive Cache

Exclusive L1-L2: a cache line (instructions or data) held in L1 is not duplicated in L2.

| Positive | Negative |
|---|---|
| No constraint on the L2 size (it can be small) | L2 performance impaired (latency) |
| Total cache size is the sum of the sub-level sizes | Need to use a victim buffer |

Inclusive L1-L2: the content of the L1 cache is duplicated in the L2 cache.

| Positive | Negative |
|---|---|
| L2 performance improved | Constraint on the L1/L2 size ratio (relatively large L2) |
| | Total cache size may be smaller |
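The capacity difference between the two policies can be made concrete with the sizes from the L1 and L2 tables. This is a simplified sketch: the function name is hypothetical, it counts only unique cache lines, and it leaves the Pentium 4 trace cache out because that is sized in µops rather than bytes.

```python
def effective_capacity_kb(l1_kb: int, l2_kb: int, relationship: str) -> int:
    """Unique cached capacity in KB under a simplified model.

    exclusive: no line lives in both levels, so capacities add up.
    inclusive: L1 contents are duplicated in L2, so L2 is the total.
    """
    if relationship == "exclusive":
        return l1_kb + l2_kb
    if relationship == "inclusive":
        return l2_kb
    raise ValueError(f"unknown relationship: {relationship!r}")


if __name__ == "__main__":
    # K8 (Hammer): 64KB code + 64KB data L1, 1024KB L2, exclusive.
    print(effective_capacity_kb(64 + 64, 1024, "exclusive"))
    # Pentium 4 Prescott: 16KB data L1, 1024KB L2, inclusive
    # (trace cache not counted, see the assumption above).
    print(effective_capacity_kb(16, 1024, "inclusive"))
```

The inclusive case also shows why the L1/L2 size ratio matters: the smaller the L2 relative to the L1, the larger the share of the L2 spent on duplicated lines.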

K8 Athlon 64

Athlon 64 Operating Modes

Opteron vs. Xeon
