An Integrated Quad-Core AMD Opteron™ MicroProcessor

S. Searles¹, J. Dorsey¹, M. Ciraula², S. Johnson¹, N. Bujanos¹, D. Wu¹, M. Braganza¹, S. Meyers¹, E. Fang³, R. Kumar³

¹AMD, Austin, TX; ²AMD, Fort Collins, CO; ³AMD, Sunnyvale, CA


Outline

• Technology Highlights
• Major Features
• Clock/Voltage Domains and PLLs
• Cache Design Choices
• DDR DRAM I/O
• HyperTransport™ (HT) I/O
• Thermal Monitoring
• Mixed Signal Design Study (DDR)


Technology

65nm, SOI CMOS, 0.8-1.4V

Dual strain liners + eSiGe

Floating and Body tied

11 Cu layers + low-k

1X, 1.3X, 2X, 4X thick Cu

FO4 inverter delay: 15ps

Layer            Pitch
M11              800nm
M10              800nm
M9               400nm
M8               400nm
M7               400nm
M6               400nm
M5               270nm
M4               270nm
M3               270nm
M2               200nm
M1               200nm
Contacted Poly   270nm


Cross Sections

[Figure: NMOS and PMOS cross sections]


Notable Attributes

• Monolithic, Quad x86 cores + Northbridge
• Symmetric cores, same DRAM/HT latency
• Enhanced 128b FPU
• Shared 32-way, 2MB L3
• Each core has an exclusive 16-way, 0.5MB L2
• Cores have 2-way, 64kB L1 I and D Caches
• 2 DDR channels, 4 DIMM/Channel, DDR2+3
• Enhanced HT links for 2 to 8-way MP


Die Overview

[Die photo labels: PLL, Thermal, 128-bit FPU, L1 Data Cache, L1 Instr Cache, Load/Store, Fetch/Decode/Branch, Execution, L2 Ctl, 512kB L2 Cache, Core 2, Core 3, Core 4, 2MB Shared L3 Cache, Northbridge, DDR PHY, HT PHY links 1 to 4, Fuses (×2), Slow I/O (×2)]

• 463 Million Transistors
• 283mm²
• 2.0GHz & up
• 0.8 to 1.4V


Core/L2 Overview

[Layout figure with two labeled regions: Core, L2]


Voltage/Clock Domains

– VDDCORE: 0.8V-1.4V
  • Core and L2: 2.0GHz and up
– VDDNB: 0.8V-1.4V
  • Northbridge and L3: 75% of core
– VLDT: 1.2V
  • HyperTransport links
– VDDIO: 1.8V (VTT: 0.9V)
  • DDR I/O
– VDDA: 2.5V
  • PLLs (10 across the die) + Thermal

Multiple supplies for power optimization and isolation


Clock Domains

Regional clock domains: lower power, tighter skew (12ps)

Per core, independent PLL running at 2+GHz

Independent Northbridge (NB) PLL running at ¾*core

L3 uses the NB clock but at arbitrary phase

DDR and HT interfaces have independent PLLs

Source Synchronous clocked Buses connect I/O to NB.

NB-to-core & core-to-NB communication: 16 entry FIFO.
1. Asynchronous: Max performance: ~2 entry pointer separation.
2. Synchronous: Provides deterministic operation for ATE.
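The pointer-separation idea can be pictured with a toy software model. This is only a sketch: the class and method names are hypothetical, and the real structure is a dynamic circuit spanning two clock domains, not a ring buffer in software.

```python
class PointerFifo:
    """Toy model of a 16-entry FIFO addressed by wrapping read/write pointers."""

    def __init__(self, depth=16):
        self.depth = depth
        self.slots = [None] * depth
        self.wptr = 0  # advanced by the sending domain
        self.rptr = 0  # advanced by the receiving domain

    def separation(self):
        # Entries written but not yet read, modulo the FIFO depth.
        return (self.wptr - self.rptr) % self.depth

    def write(self, value):
        if self.separation() == self.depth - 1:
            raise OverflowError("writer would overrun the reader")
        self.slots[self.wptr] = value
        self.wptr = (self.wptr + 1) % self.depth

    def read(self):
        if self.separation() == 0:
            raise IndexError("reader has caught up with the writer")
        value = self.slots[self.rptr]
        self.rptr = (self.rptr + 1) % self.depth
        return value


fifo = PointerFifo()
# Prime the FIFO so the reader trails the writer by 2 entries.
fifo.write("a")
fifo.write("b")
received = []
for item in ["c", "d", "e"]:
    fifo.write(item)               # sender pushes one entry
    received.append(fifo.read())   # receiver pops one entry
print(received, fifo.separation())
```

In this model a smaller separation means lower latency, while a larger one gives more slack against the two domains drifting relative to each other.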


Domain Crossing

• Domains have: different PLLs (wander) and voltages
• Wide operating range: 0.7 to 1.5V on either side
• Improved latency by embedding into a dynamic FIFO
• Precharge to VDD_B, evaluate with VDD_A


Power OK Distribution

• PowerOK asserting: Valid voltage to all domains is present
• PowerOK signal originates in the VDDIO domain of the I/Os
• Propagates around the chip in VDDNB
• PowerOK passes from NB to all domains
• Power sniffer cell is designed with hysteresis to prevent spurious switching on power up.


Noise Filtered Ref Clock

• All PLLs run off of one 200MHz Reference Clock
  – Distributed by a binary tree of specially filtered repeaters
  – Low-pass power-supply filter, 2MHz pole
  – Reduces feed-through jitter at PLL output: higher Fmax
  – 500ps skew target at 0.8V: deterministic test
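To get a feel for what a 2MHz pole buys, the sketch below evaluates the attenuation of supply noise at a few frequencies. It assumes a single-pole (first-order) filter, which the slide does not state; the function name is hypothetical.

```python
import math


def single_pole_gain_db(freq_hz, pole_hz=2e6):
    """Magnitude in dB of a first-order low-pass 1 / (1 + j*f/fp)."""
    return -10.0 * math.log10(1.0 + (freq_hz / pole_hz) ** 2)


for f in (2e6, 20e6, 200e6):
    print(f"{f / 1e6:6.0f} MHz: {single_pole_gain_db(f):6.1f} dB")
```

Under that assumption, supply noise at the 200MHz reference frequency would be attenuated by roughly 40 dB before reaching the repeaters.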


PLL Features

• Accepts 100 or 200 MHz Ref Clock in
• Feedback divider from 1 to 64 (all codes)
• Spine divider (1, 2, 3, 4, 6, 128, 512) post VCO
• Deterministic Spine divider changes (P-state)
• Duty cycle correction on output to spine
• Clock shrink/stretch for speedpath debug
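A rough sketch of how the divider settings combine, assuming a conventional integer-N arrangement in which the VCO runs at the reference frequency times the feedback divider and the spine divider sits after the VCO. The slide lists the divider ranges but not this formula, and the function name is hypothetical.

```python
SPINE_DIVIDERS = (1, 2, 3, 4, 6, 128, 512)  # values listed on the slide


def spine_frequency_mhz(ref_mhz, feedback_div, spine_div):
    """Spine clock in MHz for an assumed integer-N PLL: ref * N / D."""
    if ref_mhz not in (100, 200):
        raise ValueError("reference clock must be 100 or 200 MHz")
    if not 1 <= feedback_div <= 64:
        raise ValueError("feedback divider must be in 1..64")
    if spine_div not in SPINE_DIVIDERS:
        raise ValueError("unsupported spine divider")
    vco_mhz = ref_mhz * feedback_div  # assumed: no reference pre-divider
    return vco_mhz / spine_div


print(spine_frequency_mhz(200, 10, 1))  # full-speed setting
print(spine_frequency_mhz(200, 10, 2))  # same VCO, halved on the spine
```

Changing only the spine divider leaves the VCO locked, which is one plausible reason P-state changes can be made deterministic.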


PLL VCO

• Positive feedback delay control
• Center frequency is R dominated
• Constant swing output

[Schematic labels: Ip, In, Qp, Qn, FILTp, FILTn, INp, INn, OUTp, OUTn, I, Rload (×2)]


PLL Closed Loop Response

[Plot: closed-loop gain Hmag(f) in dB (axis -60 to 12) and phase Hphase(f) (axis -180 to 0) vs. frequency, 10kHz to 1GHz]


L1 Cache Bit Slice

[Schematic labels: Precharge Left, Precharge Right, Domino Pull Down, Data, DataX, BL, BLX, two blocks of "166T Cells", 1.06um2]

• Single Ended
• Sink and Source for Writes
• Local and Global Bit lines


L1 Timing: Read Modified Write

• Single Ended for stability
• Precharge is self timed off Write
• Fuse control
• Allows ECC on partial writes

[Timing diagram with Read, Precharge and Write phases; signals: Clock, Word Line, Precharge, BL, BLX, Data, DataX]


L2 & L3 Cache Bit Slice

[Schematic labels: PCH<1:0>, COLSEL<7:0>, PCHL<1:0>, WR, WRX, SuperBL, sixteen <3:0> bit-line groups, two RAM ARRAY blocks of 32 Rows x 4 Cols, 0.81um2]

• Like L1, Single Ended for stability
• 3 Level Bit Line


L3 Timing

[Timing diagram over clock cycles 0 to 5; signals: CLK, RdEn, RdIndex, HitWay, RdData, CaptureClk; annotated intervals of 1.5 cycles, 1.5 cycles and 1 cycle]

• Flexible tiling methodology: liquid L3
• Asynchronous timing for flexible layout


L2 & L3 Read Disturb Stability

[Plot: margin in sigma (axis 0 to 7) vs. voltage (0.7V to 1.3V) for L2/L3 cells]

• Product Reliability requires > 5 sigma margin
• Single-ended vs. small-swing: better margin


L2 & L3 Small Swing Study

• Study small swing (sense-amplifier-based)
• Increased ΔVT & ΔL variation + SOI hysteresis

[Figure panels: "No SOI Memory" and "With SOI Memory"]

Condition            Margin
no flood (no flip)   7.9σ
flood (no flip)      7.2σ
no flood (flip)      5.48σ
flood (flip)         4.53σ
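To see why the 5 sigma threshold matters, the sketch below converts a sigma margin into a per-cell failure probability and scales it to a 2MB array. It assumes a one-sided Gaussian tail, which is a common modeling choice but is not stated on the slides; the bit count ignores tags and ECC, and the function name is hypothetical.

```python
import math


def tail_probability(sigma):
    """One-sided Gaussian tail probability beyond `sigma` standard deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


BITS_IN_2MB = 2 * 1024 * 1024 * 8  # data bits only (assumption: no tags/ECC)

margins = [
    ("no flood (no flip)", 7.9),
    ("flood (no flip)", 7.2),
    ("no flood (flip)", 5.48),
    ("flood (flip)", 4.53),
]
for label, sigma in margins:
    p = tail_probability(sigma)
    print(f"{label:20s} {sigma:5.2f} sigma  P ~ {p:.2e}  "
          f"expected bad bits in 2MB ~ {p * BITS_IN_2MB:.2e}")
```

Under this model the 4.53σ case would yield tens of marginal bits per 2MB array, while the cases above 5σ stay well under one, which is consistent with the small-swing approach being rejected.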


DDR DRAM I/O

[Block diagram labels: PLL, DLL, DLLx2, DQ DLL, Pipeline Control, Rptr, Wptr, DQ[7:0], DQS/DM[3:0], MemVref, comparators (+/−)]

• Supports DDR2 & 3, 1.4 to 1.9V, 400 to 1600Mb/s
• 2 independent (R/W, W/R, R/R, W/W), 64b channels (+ECC)
• 4 DIMM/chan (U, R, SO), per-DIMM trained delay settings
• 44 DLL/chan with dynamic delay update (per burst)
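The per-pin rates translate into unit intervals and peak channel bandwidth as sketched below. The 64b channel width and the 400 to 1600Mb/s range come from the slide; the function names are hypothetical, and the bandwidth figure ignores ECC bits and protocol overhead.

```python
def unit_interval_ps(rate_mbps):
    """Bit time in picoseconds for a per-pin data rate in Mb/s."""
    return 1e6 / rate_mbps


def peak_bandwidth_gbytes(rate_mbps, width_bits=64):
    """Peak channel bandwidth in GB/s (data bits only, no overhead)."""
    return rate_mbps * width_bits / 8 / 1000.0


for rate in (400, 800, 1600):
    print(f"{rate:5d} Mb/s: UI = {unit_interval_ps(rate):7.1f} ps, "
          f"peak = {peak_bandwidth_gbytes(rate):5.1f} GB/s per channel")
```

At 1600Mb/s the bit time is 625ps, so the per-DIMM trained delays have to place the sampling point within a fraction of that window.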


DDR DRAM Read Eye

• DDR 800 (1250ps UI)
• Real Motherboard
• Under the Socket
• +/– 300mV margin

[Eye diagram: vertical axis -600 to 600, horizontal axis 0 to 100%]


DDR DRAM Write Eye

[Eye diagram] DDR 800, no Channel, ATE Trigger on gold clock

[Eye diagram] DDR 800, at the DIMM, real traffic, Trigger on 0V Strobe crossing


HyperTransport™ I/O

• Backward compatible (legacy) to HT1
  – 400, 800, 1200, 1600 and 2000 MT/s
• Supports all HT3 speeds
  – 2.4, 2.8, 3.2, 4.0, 4.4, 4.8 and 5.2 GT/s
• Supports various power saving modes
• Tolerates large HT3 common mode variation
• Linear TX equalization (de-emphasis)
• Non-linear RX equalization (DFE)


HT Block Diagram

• Clock forwarded
• Simple DLL CDR
• SSC tolerant


HT Driver with Linear Eq

[Schematic labels: Data (×2), De-emph enable (×2), Out, OutX]

• Primary Voltage mode Driver
• Eq done with current sources
• Lower power when > 3dB of boost
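For a sense of scale on the 3dB figure, the sketch below models de-emphasis as a generic 2-tap FIR with peak-normalized taps. That model is an assumption for illustration: the slide says only that equalization is done with current sources on a voltage-mode driver. All names are hypothetical.

```python
import math


def deemphasis_levels(alpha):
    """Output levels of y[n] = (1 - alpha)*x[n] - alpha*x[n-1], x in {-1, +1}."""
    transition = (1 - alpha) + alpha  # x[n] = +1 following x[n-1] = -1
    repeated = (1 - alpha) - alpha    # x[n] = +1 following x[n-1] = +1
    return transition, repeated


def boost_db(alpha):
    """Ratio of transition-bit to repeated-bit amplitude, in dB."""
    transition, repeated = deemphasis_levels(alpha)
    return 20 * math.log10(transition / repeated)


def alpha_for_boost(db):
    """Post-cursor tap weight that gives the requested boost."""
    ratio = 10 ** (-db / 20)  # repeated / transition amplitude
    return (1 - ratio) / 2


a = alpha_for_boost(3.0)
print(f"alpha = {a:.3f}, levels = {deemphasis_levels(a)}, boost = {boost_db(a):.2f} dB")
```

In this model 3dB of boost corresponds to repeated bits being driven at about 71% of the transition amplitude.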


HT RX with DFE


Thermal: Chip Wide Block Diagram

[Block diagram labels: CORE 0, CORE 1, CORE 2, CORE 3, NorthBridge, TCEN (repeated), TCON, Remote Sensor (repeated many times across the cores and NorthBridge)]


Thermal Sensor Locations

[Die photo labels: External Diode, Therm Center (×2), Remote Diode Sensor]


Thermal: Block Diagram

[Block diagram labels: inputs 0, 1, 7; Reference I & V; currents I and I*9; ΔVbe & Scaler (198μV = 1C); ΣΔ A/D; CMP/Latch; Control Logic; Temp[8:0]; Local Osc (20MHz); 2.5V Regulators; Regulated Supply for Digital Circuits; Regulated Supply for Analog Circuits; Level Shifters]
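The "198μV = 1C" label can be checked against the textbook diode relation ΔVbe = (kT/q)·ln(N), where N is the ratio of the two bias currents. The sketch below assumes an ideality factor of 1 and compares N = 9 with N = 10; reading the "I" and "I*9" labels as a 10:1 ratio (I versus I + 9·I) is an inference, not something the slide states.

```python
import math

K_OVER_Q_UV_PER_K = 86.1733  # Boltzmann constant / electron charge, in µV per kelvin


def delta_vbe_slope_uv_per_k(current_ratio):
    """d(ΔVbe)/dT in µV/K for two diode currents in the given ratio."""
    return K_OVER_Q_UV_PER_K * math.log(current_ratio)


for ratio in (9, 10):
    print(f"ratio {ratio:2d}: {delta_vbe_slope_uv_per_k(ratio):.1f} µV per degree")
```

A 10:1 ratio gives about 198.4µV per degree, which matches the figure on the slide; a 9:1 ratio would give about 189.3µV.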


Design Study (DS): DDR Phy

• Advice for Mixed-Signal IC designers
• Technology does not care about you
• When you overload functions, beware
• Always have software hooks (use all modes)
• Poly (or other) fuses are your friends
• Put Defaults in metal (one layer to change)
• Use servo loops to set bias conditions
• Use DACs for the servo loop reference


DS DDR: Register Space

• Use scalar bits if you have less than 50 bits

• Otherwise, use a simple bus and register

• Physical locality to where the bits are used

• Extensible: Easy to add bits and functions

• Add in write masks and broadcasts

• It will save your bacon


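DS DDR: Registers

[Schematic labels: Address, Decode, WrStrobe, RdStrobe, WrData, NibbleMask, Register (D/Q flop with ck), RdDataIn, RdDataOut; bus widths of 16 and 4 are annotated]

[Timing diagram: Address/Mask carries Write1, Write2, Read1, Read2, Write3, Null, Read3; WrData carries WrDat1, WrDat2, WrDat3; RdData carries RdDat1, RdDat2, RdDat3; RdStrobe and WrStrobe mark the accesses]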
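The write-mask and broadcast idea can be modeled in a few lines. This is a behavioral sketch only: the 16-bit data width and 4-bit nibble mask follow the diagram labels, but the mask polarity (bit i enables nibble i), the broadcast mechanism and all names are assumptions.

```python
class MaskedRegisterBus:
    """Toy 16-bit register file with per-nibble write masks and broadcast."""

    def __init__(self, addresses):
        self.regs = {addr: 0 for addr in addresses}

    @staticmethod
    def _expand(nibble_mask):
        # Turn a 4-bit nibble mask into a 16-bit bit mask.
        bits = 0
        for i in range(4):
            if nibble_mask & (1 << i):
                bits |= 0xF << (4 * i)
        return bits

    def write(self, addr, data, nibble_mask=0xF):
        mask = self._expand(nibble_mask)
        # Only the enabled nibbles change; the rest keep their old value.
        self.regs[addr] = (self.regs[addr] & ~mask & 0xFFFF) | (data & mask)

    def broadcast(self, data, nibble_mask=0xF):
        # Same masked write applied to every register on the bus.
        for addr in self.regs:
            self.write(addr, data, nibble_mask)

    def read(self, addr):
        return self.regs[addr]


bus = MaskedRegisterBus([0, 1, 2])
bus.write(1, 0xABCD)
bus.write(1, 0x0050, nibble_mask=0b0010)   # touch only nibble 1
bus.broadcast(0xF000, nibble_mask=0b1000)  # set the top nibble everywhere
print([hex(bus.read(a)) for a in (0, 1, 2)])
```

Masked writes avoid a read-modify-write sequence when only one field of a register changes, and a broadcast lets many identical blocks be configured with a single write.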


DS DDR: Servo Loop

• Use feedback to deal with PVT (servo)
• Use DACs to set the servo reference

[Schematic labels: Register, 4-bit IDAC, Iref, Rref, VTT, VDDIO, Rload (×2), Itail, amplifier (+/−). IDAC detail: device pairs sized 15·W, 8·W (C[3]), 4·W (C[2]), 2·W (C[1]), 1·W (C[0])]
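A numeric sketch of the two bullets: a binary-weighted 4-bit current DAC sets a reference, and a feedback loop adjusts the tail current until the output swing matches it even as the load resistance varies. The loop is modeled as a simple discrete integrator, and every component value and name here is hypothetical rather than taken from the design.

```python
def idac_current(code, unit_current):
    """Binary-weighted 4-bit DAC: legs of 1, 2, 4, 8 units gated by C[0..3]."""
    if not 0 <= code <= 15:
        raise ValueError("code must fit in 4 bits")
    weights = (1, 2, 4, 8)
    return unit_current * sum(w for bit, w in enumerate(weights) if code & (1 << bit))


def servo_settle(v_ref, rload, r_nominal=50.0, gain=0.5, steps=60):
    """Integrating loop: nudge Itail until Itail * Rload matches v_ref."""
    itail = 0.0
    for _ in range(steps):
        error = v_ref - itail * rload
        itail += gain * error / r_nominal
    return itail


UNIT_CURRENT = 50e-6  # hypothetical DAC unit current, amps
RREF = 1000.0         # hypothetical reference resistor, ohms

v_ref = idac_current(8, UNIT_CURRENT) * RREF  # code 8 sets a 0.4V target
for rload in (40.0, 50.0, 60.0):              # load varying over PVT
    itail = servo_settle(v_ref, rload)
    print(f"Rload = {rload:4.0f} ohm: Itail = {itail * 1e3:5.2f} mA, "
          f"swing = {itail * rload:.3f} V")
```

The swing lands on the DAC-defined target in all three cases, while the tail current absorbs the resistor variation; changing the register code retargets the loop without touching the analog circuit.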


Summary

• Monolithic, Quad x86 cores
• 65nm SOI CMOS, 11 Cu layers
• Integrated Northbridge
• Enhanced 128b FPU
• 2MB shared L3
• 2 independent DDR channels, DDR2 & 3
• Enhanced HT links for 2 to 8-way MP
• Comprehensive thermal monitoring