Next Generation Stacked Memory Systems

Alok Gupta

NVIDIA, Santa Clara, CA

OUTLINE

1. Motivation
2. Memory Bandwidth Trends
3. Stacked Memory System
4. Conclusion
5. Q&A

MOTIVATION

To keep up with increasing logic horsepower, memory bandwidth must scale every generation; otherwise performance becomes IO-limited

Absolute power is not just a mobile problem

For example, GPUs are maxed out on power budgets at 225-300W

IO bandwidth improvements must be achieved within the same power budget as the last generation, which implies memory system power needs to stay the same

Process scaling brings limited improvement in memory IO power – logic at least benefits from Moore’s law

New technology and ideas are needed to keep memory bandwidth growing within a similar power envelope
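
The power argument above can be sketched as simple arithmetic: IO power is roughly the number of bits moved per second times the energy spent per bit, so holding power constant while bandwidth grows forces energy per bit down in proportion. The sketch below uses a hypothetical starting point (256 GB/s at 20 pJ/bit); neither number comes from the slides, and the function names are illustrative only.

```python
def io_power_watts(bandwidth_gbytes_per_s, energy_pj_per_bit):
    """Memory IO power = bits transferred per second x energy per bit."""
    bits_per_s = bandwidth_gbytes_per_s * 1e9 * 8
    return bits_per_s * energy_pj_per_bit * 1e-12


def energy_budget_pj_per_bit(power_watts, bandwidth_gbytes_per_s):
    """Energy per bit that is allowed when the IO power budget is held fixed."""
    bits_per_s = bandwidth_gbytes_per_s * 1e9 * 8
    return power_watts / bits_per_s * 1e12


# Hypothetical baseline, chosen only for illustration (not from the slides).
BASE_BANDWIDTH_GBYTES_PER_S = 256.0
BASE_ENERGY_PJ_PER_BIT = 20.0

budget_w = io_power_watts(BASE_BANDWIDTH_GBYTES_PER_S, BASE_ENERGY_PJ_PER_BIT)
print(f"baseline IO power: {budget_w:.2f} W")

# Scale bandwidth generation over generation while keeping the same power.
for scale in (1, 2, 4):
    bandwidth = BASE_BANDWIDTH_GBYTES_PER_S * scale
    allowed = energy_budget_pj_per_bit(budget_w, bandwidth)
    print(f"{bandwidth:7.1f} GB/s -> {allowed:5.2f} pJ/bit allowed")
```

Under these assumptions, each doubling of bandwidth halves the energy-per-bit allowance, which is the gap that process scaling alone is said not to close.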

GPU MEMORY SYSTEM BANDWIDTH

MEMORY BANDWIDTH MAPPED TO DRAM TECHNOLOGY

Stacked Memory

PACKAGING TRENDS

Option | Complexity | Cost | Form factor (size)
Conventional | Well understood | Low | Reference
MCM (Multi-Chip Module) | Well-established process | Medium | Smaller PCB, larger package
Organic Interposer | No TSVs needed | TBD | Smaller PCB, larger package
Silicon Interposer | TSVs limited to silicon interposer | Higher | Smaller PCB, similar package
Die Stacking | TSVs needed across all chips in the stack | Higher | Smaller PCB, smaller package

[Diagram row: package cross-sections for the options, labeled 2D, 2.1D, 2.5D, and 3D]

HIGH BANDWIDTH MEMORY (HBM) DRAM

• A single package containing multiple memory die stacked together, using through-silicon vias (TSV). The memory within HBM is organized into channels wherein each channel is functionally and operationally independent

• HBM DRAM uses a wide-interface architecture to achieve high-speed, low-power operation and is best suited for 2.5D Silicon Interposer based system designs

Base Logic Layer in DRAM Process

HIGH BANDWIDTH MEMORY (HBM) DRAM

HBM DRAM array

2-8Gb DRAM die w/ ECC

4/8-high stack – 1GB to 8GB per stack

Up to 256GB/s DRAM internal bandwidth

Base Layer w/ HBM IO + DRAM Test and Repair logic – in DRAM process

DRAM Interface

Signaling – 1.2V LVCMOS

Data rate – 800MHz-1000MHz DDR, wide 1024-bit interface
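
The figures on this slide can be cross-checked with a short calculation: peak bandwidth is interface width times clock rate times two transfers per clock (DDR), and stack capacity is die density times stack height. This is a minimal sketch with illustrative function names; it computes theoretical peaks only and ignores ECC bits and protocol overhead.

```python
def peak_bandwidth_gbytes_per_s(width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak bandwidth of a parallel interface (DDR by default)."""
    bits_per_s = width_bits * clock_mhz * 1e6 * transfers_per_clock
    return bits_per_s / 8 / 1e9


def stack_capacity_gbytes(die_density_gbit, stack_height):
    """Capacity of a stack of identical DRAM dies, in gigabytes."""
    return die_density_gbit * stack_height / 8


# 1024-bit interface at the two ends of the quoted clock range.
for clock_mhz in (800, 1000):
    bw = peak_bandwidth_gbytes_per_s(1024, clock_mhz)
    print(f"1024 bit @ {clock_mhz} MHz DDR: {bw:.1f} GB/s")

# Smallest and largest stack configurations listed on the slide.
print(f"4-high, 2Gb die: {stack_capacity_gbytes(2, 4):.0f} GB")
print(f"8-high, 8Gb die: {stack_capacity_gbytes(8, 8):.0f} GB")
```

The 1000MHz case gives 256 GB/s per stack, and the two stack configurations give 1 GB and 8 GB, matching the range quoted above.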

MOBILE WIDE-IO2 MEMORY

Density: 8Gb

4/8 independent 64-bit channels, 256/512-bit interface

no cross channel restrictions

Interface Speed: 400-533MHz DDR

Bandwidth: 25.6-68.2GBps

1 through 4 high stacks

Mono stack is micro-bumped w/o TSV, Multi-high stack w/ TSV

Power efficient – leverages LP process, CMOS signaling

Designed for 3D stacking but can be made to work for 2.1/2.5D solutions
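
As a sanity check on the Wide-IO2 numbers, the quoted bandwidth range can be inverted to find the clock each end implies. The sketch below assumes DDR signaling (two transfers per clock) and the 64-bit channel width listed on the slide; the function name is illustrative.

```python
def required_clock_mhz(bandwidth_gbytes_per_s, channels,
                       channel_width_bits=64, transfers_per_clock=2):
    """Clock rate needed to reach a target peak bandwidth on a channelized interface."""
    width_bits = channels * channel_width_bits
    bits_per_s = bandwidth_gbytes_per_s * 1e9 * 8
    return bits_per_s / (width_bits * transfers_per_clock) / 1e6


# Low end: 4 channels (256-bit interface) delivering 25.6 GB/s.
print(f"{required_clock_mhz(25.6, channels=4):.1f} MHz")

# High end: 8 channels (512-bit interface) delivering 68.2 GB/s.
print(f"{required_clock_mhz(68.2, channels=8):.1f} MHz")
```

The low end works out to 400 MHz and the high end to roughly 533 MHz, so the 68.2 GB/s figure corresponds to about 1066 Mb/s per pin on the 512-bit interface.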

2.1D MEMORY SYSTEMS WITH WIDE MEMORY

PKG construction | PKG height (mm) | PKG technology maturity level | Thermal
2.1D Organic Interposer | 0.84 | Medium | Good
Fan-Out WLP | 0.53 | Medium | Good
Fan-Out WLP PoP | 0.90 | Low | Poor
Interposer PoP | 0.90 | Medium | Poor

PKG Reliability: Unknown

Cost: Unknown

2.1D MEMORY SYSTEM CHALLENGES

Number of IOs

Signal density and routing limits number of IOs

Interface Speed

Channel not as benign as other stacked solutions – performance/power trade-off

Bandwidth and Capacity scaling

Package Reliability

Solution Cost is Work-In-Progress

2.5D MEMORY SYSTEM WITH HBM DRAM

[Figure: cross-section view and top view of a GPU/CPU and HBM stacks on a passive silicon interposer over a package substrate]

2.5D HIGH DENSITY GPU-MEMORY INTERCONNECT

Silicon interposer enables fine pitch geometries

>50x finer geometry

Performance depends on signal integrity requirements

Loss (width)

Crosstalk (spacing)

[Figure: GPU-HBM signal routing on silicon interposer]

SILICON INTERPOSER LOSS & CROSSTALK

Insertion Loss

Resistance in channel

Slew Rate degradation due to channel loss

Very simple channel transfer function (almost RC) compared to off-chip signaling

Crosstalk dominated by adjacent aggressors

Line space and thickness

Eye is nice and open

[Figure: Silicon Interposer Channel Characteristics. Annotations: resistance creates DC channel loss; slew rate degradation due to loss; coupled crosstalk is dominated by adjacent signals; sharp roll-off w/o resonances]
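
To illustrate what an "almost RC" channel implies, the sketch below models the interposer trace as a single-pole RC low-pass, H(f) = 1 / (1 + j·2πfRC). This is a simplification chosen for illustration, not the model used in the talk: the R and C values are hypothetical, and DC loss and crosstalk are not modeled. It shows the two properties called out on the slide: a smooth roll-off with no resonances, and an edge rate limited by the RC time constant.

```python
import math


def rc_magnitude_db(freq_hz, r_ohm, c_farad):
    """Magnitude in dB of a first-order RC low-pass at the given frequency."""
    x = 2 * math.pi * freq_hz * r_ohm * c_farad
    return -10 * math.log10(1 + x * x)


def corner_frequency_hz(r_ohm, c_farad):
    """Frequency at which the response is about 3 dB down."""
    return 1 / (2 * math.pi * r_ohm * c_farad)


def rise_time_10_90_s(r_ohm, c_farad):
    """10%-90% rise time of the step response of a first-order RC: ln(9) * R * C."""
    return math.log(9) * r_ohm * c_farad


# Hypothetical lumped values for one interposer trace (not from the slides).
R_OHM = 50.0
C_FARAD = 1e-12

print(f"corner frequency: {corner_frequency_hz(R_OHM, C_FARAD) / 1e9:.2f} GHz")
print(f"10-90% rise time: {rise_time_10_90_s(R_OHM, C_FARAD) * 1e12:.1f} ps")
for f_ghz in (0.5, 1, 2, 4, 8):
    loss = rc_magnitude_db(f_ghz * 1e9, R_OHM, C_FARAD)
    print(f"{f_ghz:4.1f} GHz: {loss:6.2f} dB")
```

Because a single pole has no peaking, the magnitude only decreases with frequency, in contrast to off-chip channels where reflections and package parasitics introduce resonances.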

2.5D MEMORY SYSTEM CHALLENGES

Stacked memory solution still at an early stage – no very large volume products

Eco-system challenges

Multi-sourcing

Active collaboration required between foundry, memory vendor, and OSAT to deliver a successful product

Assembly, Test/Repair, and failure analysis

Solution cost trend is a big unknown

3D MEMORY SYSTEM WITH WIDE-IO2

WIDE-IO2 SOC CO-LAYOUT

Routing blockages

Power delivery

Keep-out regions

Thermal hot-spots

3D MEMORY SYSTEM CHALLENGES

Memory and SoC Co-layout

Bandwidth and Capacity scaling

Thermals

Power delivery

Cost

SUMMARY

Stacked memory is a promising solution for the ever-growing demand for bandwidth

Xilinx is shipping large FPGAs using silicon interposer – solves a unique problem

High Volume Manufacturing for mainstream GPU/CPU devices is still a work in progress

Business challenges

Multiple companies need to work together (foundry + memory vendor + OSAT)

Assembly, Failure Analysis, Test and Repair

DRAM Cost per bit

Stacking of heterogeneous devices not well understood – co-design, thermal and mechanical challenges

QUESTIONS?