Error Protected Data Bus Inversion Using Standard DRAM Components

Maurizio Skerlj
Qimonda AG, Server Memory Systems Engineering
D-81739 Munich, Germany

Paolo Ienne
Ecole Polytechnique Fédérale de Lausanne (EPFL), School of Computer and Communication Sciences
CH-1015 Lausanne, Switzerland


ISQED08 – M. Skerlj and P. Ienne 3

High-end Server Power Breakdown (fully populated)

[Figure: IBM p670 (2003), power in Watts (axis from 0 to 3500) for "Energy Small Configuration" and "Energy Large Configuration", split into I/O + fans, Memory + fans, and Processors + fans]

Processors and associated fans account for ~45% of energy consumption. Architectural changes help to lower the energy consumption.

Main memory and associated fans account for ~45% of energy consumption. Higher density and higher bandwidth increase the energy consumption.


Energy Cost of Data Center

Data center size: 100,000 ft²
Power density: 250 W/ft²
Building cost: 20 USD/Watt of peak power
Resulting building cost: USD 500 million

Yearly power consumption: 20 MW
Electricity cost: 0.80 USD/Watt per year
Resulting annual electricity bill: USD 16 million

The incentive for green data centers: over a 10-year lifetime, 1% less energy means USD 6.6 million in savings.
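The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The constant and function names are illustrative, and it assumes that the 1% saving applies to both the building cost (which scales with peak power) and the electricity bill, since that reading reproduces the USD 6.6 million figure.

```python
# Figures quoted on the slide; the names are illustrative.
AREA_FT2 = 100_000
POWER_DENSITY_W_PER_FT2 = 250
BUILDING_COST_USD_PER_W = 20
YEARLY_POWER_W = 20_000_000
ELECTRICITY_USD_PER_W_YEAR = 0.80
LIFETIME_YEARS = 10


def data_center_costs():
    """Return the cost figures (in USD) derived from the slide's inputs."""
    peak_power_w = AREA_FT2 * POWER_DENSITY_W_PER_FT2
    building = peak_power_w * BUILDING_COST_USD_PER_W
    yearly_bill = YEARLY_POWER_W * ELECTRICITY_USD_PER_W_YEAR
    lifetime_total = building + LIFETIME_YEARS * yearly_bill
    return {
        "peak_power_w": peak_power_w,
        "building": building,
        "yearly_bill": yearly_bill,
        "lifetime_total": lifetime_total,
        # Assumption: a 1% energy saving scales both cost components by 1%.
        "one_percent_saving": 0.01 * lifetime_total,
    }


if __name__ == "__main__":
    for name, value in data_center_costs().items():
        print(f"{name}: {value:,.0f}")
```

Under this assumption the building cost and ten years of electricity add up to USD 660 million, of which 1% is the quoted saving.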


Bus-invert Technique for Lower I/O Energy Consumption

[Figure: communication channel ending at the receiver (RX), with resistances R_TX and R_TT and supply Vdd]

$P_0 = \frac{V_{dd}^2}{R_{TT} + R_{TX}}, \quad P_1 = 0.$

Original data encoded on a 4-bit bus    Low-power encoded 4-bit data on a 5-bit bus
0000                                    11111
0001                                    11101
0010                                    11011
0011                                    11001
0100                                    10111
0101                                    10101
0110                                    10011
0111                                    01110
1000                                    01111
1001                                    01101
1010                                    01011
1011                                    10110
1100                                    00111
1101                                    11010
1110                                    11100
1111                                    11110

Bus-invert encoding reduces peak I/O power since the all-'0' pattern is avoided.

Bus-invert reduces average I/O power since no more than n/2 bits can be '0'.

[Fletcher, US Pat. 4667337, 1987] [Stan and Burleson, IEEE Trans. on VLSI, 1995]
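The 4-bit example on this slide can be reproduced with a short sketch. The rule used here is inferred from the code words listed on the slide rather than stated on it: the data is inverted when at least half of its bits are '0', and the inversion flag is appended as the last bit ('1' meaning inverted). The function names are illustrative.

```python
def bus_invert_encode(data, width=4):
    """Return the (width + 1)-bit word: data bits followed by the inversion flag."""
    mask = (1 << width) - 1
    zeros = width - bin(data & mask).count("1")
    if 2 * zeros >= width:               # at least half of the bits are '0'
        return ((~data & mask) << 1) | 1  # send inverted data, flag = '1'
    return (data & mask) << 1            # send data unchanged, flag = '0'


def bus_invert_decode(word, width=4):
    """Recover the original data from a bus-invert encoded word."""
    mask = (1 << width) - 1
    data = (word >> 1) & mask
    return (~data & mask) if word & 1 else data


def count_zeros(word, width):
    """Number of '0' bits in a word of the given width."""
    return width - bin(word).count("1")


if __name__ == "__main__":
    for value in range(16):
        encoded = bus_invert_encode(value)
        print(format(value, "04b"), format(encoded, "05b"),
              "zeros on the bus:", count_zeros(encoded, 5))
```

With this rule no 5-bit word carries more than two '0' bits, which matches the n/2 bound quoted above for n = 4.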


Bus-invert Usage in Commercial Memory Systems

The bus-invert technique is today successfully used in some applications in order to reduce the energy consumption.

Architecture: Client → ECC → BI → error-free channel → BI → Memory

Processor: CPU → ECC → BI → inter-chip channel → BI → On-chip Cache
[Mulla and Tu, US Pat. Appl. 289435, 2005]

Graphic Systems: GPU → BI → short point-to-point channel → BI → GDRAM
[Ihm et al., ISSCC 2007]


If it uses ECC, it’s reliable. Really?

System reliability is as weak as the weakest of its components.

[Figure: the original data goes through error protection encoding (ECC encoding for single error correction and double error detection), producing the ECC word, and through low power encoding (bus inversion), producing the encoded data for low power and the bus inversion bits b]

An error on the data bits in combination with an error on the bus inversion bits cannot be detected any more.

A single bit error hitting the bus inversion bit will result in multiple errors. This single error cannot be corrected.
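The error multiplication described above can be illustrated with a minimal sketch. It assumes a single inversion bit covering `WIDTH` data bits, appended as the last bit of the word. The width, the flag position, and the function names are hypothetical choices for illustration, not details taken from the slides.

```python
WIDTH = 8  # hypothetical number of data bits covered by one inversion bit


def bus_invert_encode(data):
    """Invert the data when at least half of its bits are '0'; append the flag."""
    mask = (1 << WIDTH) - 1
    zeros = WIDTH - bin(data & mask).count("1")
    if 2 * zeros >= WIDTH:
        return ((~data & mask) << 1) | 1
    return (data & mask) << 1


def bus_invert_decode(word):
    """Undo the inversion according to the received flag."""
    mask = (1 << WIDTH) - 1
    data = (word >> 1) & mask
    return (~data & mask) if word & 1 else data


def bit_errors_after_flag_flip(data):
    """Count wrong data bits after a single-bit error on the inversion flag."""
    received = bus_invert_encode(data) ^ 1  # flip only the flag bit
    return bin(bus_invert_decode(received) ^ data).count("1")


if __name__ == "__main__":
    print(bit_errors_after_flag_flip(0b00010010))
```

Whatever the data value, the decoder applies the wrong polarity to every covered bit, so one channel error becomes `WIDTH` data errors. That exceeds what a code computed on the original data alone for single error correction can repair.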


System Requirements for Correct Extension

[Figure: Client with ECC and BI, connected over a noisy channel (BER > 0) to a standard Memory subject to soft and hard failures]

• ECC also protects the bus-invert (BI) bits
• ECC protects against communication errors
• ECC protects against soft and hard failures
• No additional latency due to parallel encoding

An ideal main memory sub-system for servers should provide:
• low latency,
• high reliability,
• standard compliance.


Proposed Encoding Scheme

[Figure: the original data, extended with zeros (0,...,0), goes through error protection encoding; in parallel, low power encoding (bus inversion) produces the encoded data for low power; the stored ECC corrections for bus inversion are added (+) to form the ECC word; the output is labelled x, b, ECC]

• Original data extended in size. The ECC will also protect the additional zeros.
• The ECC word is added linearly with the ECC word of the bus-invert encoded word.
• The ECC word protects the whole message.
• Bus-inversion signalling bits (protected by ECC).
• Data to be decoded according to the bus-invert algorithm. Any errors will be corrected or detected in parallel by the ECC word.

All the bits in the message enjoy the ECC protection.
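The linearity argument can be sketched as follows. For any linear code, the check bits of (message XOR pattern) equal the check bits of the message XOR the check bits of the pattern. The check bits of the bus-inverted word can therefore be obtained from the check bits of the zero-extended original data plus a stored correction, without waiting for the inversion decision. The parity masks below are arbitrary placeholders chosen only to make the example run. They are not the code used by the authors, and the 4-bit width and all names are illustrative.

```python
N_DATA = 4
ALL_ONES = (1 << N_DATA) - 1

# Hypothetical linear code: each check bit is the parity of the message bits
# selected by one mask over the 5-bit message (4 data bits + inversion flag).
PARITY_MASKS = (0b11011, 0b10101, 0b01110, 0b00111)


def check_bits(message):
    """Compute the check bits of a 5-bit message (linear over GF(2))."""
    bits = 0
    for mask in PARITY_MASKS:
        bits = (bits << 1) | (bin(message & mask).count("1") & 1)
    return bits


def bus_invert(data):
    """Bus-invert encode: data bits followed by the inversion flag."""
    zeros = N_DATA - bin(data).count("1")
    if 2 * zeros >= N_DATA:
        return ((data ^ ALL_ONES) << 1) | 1
    return data << 1


# Pattern that inversion XORs onto the zero-extended data: all data bits
# flipped and the flag set. Its check bits are the stored correction.
INVERSION_PATTERN = (ALL_ONES << 1) | 1
STORED_CORRECTION = check_bits(INVERSION_PATTERN)


def check_bits_parallel(data):
    """Check bits of the encoded word, computed from the original data."""
    ecc = check_bits(data << 1)      # original data extended with a zero
    if bus_invert(data) & 1:         # inversion decision, made in parallel
        ecc ^= STORED_CORRECTION     # linear addition of the stored correction
    return ecc


if __name__ == "__main__":
    for d in range(1 << N_DATA):
        assert check_bits_parallel(d) == check_bits(bus_invert(d))
    print("parallel and direct check bits agree for all data values")
```

The design point illustrated here is that the ECC computation does not sit behind the bus-invert logic on the critical path: only a final XOR with a precomputed constant depends on the inversion decision.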


How to Squeeze More Bits in the Standard Frame?

[Figure: standard frame (burst length of 4) on a 72-bit bus; each column carries 64 bits of data and 8 SECDED check bits]

[Figure: modified frame with bus inversion (burst length of 4); each column carries 67 bits (64-bit data + 3 inversion bits, arranged as 22 bits with b0, 22 bits with b1, and 20 bits with b2) and 5 bits of SECDED check bits; the check bits are taken on 2 columns, giving an 8-bit ECC and a 2-bit parity]

Error detection and correction code efficiency increases with the block size.
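The bit budget of the modified frame, and the claim about code efficiency, can be checked with a short sketch. The check-bit count uses the textbook Hamming bound for single error correction plus one overall parity bit for double error detection. This is a general property of such codes and an assumption here, not necessarily the exact code used in the slides. The names are illustrative.

```python
BUS_WIDTH = 72
SEGMENTS = (22, 22, 20)  # data bits covered by inversion bits b0, b1, b2


def frame_budget():
    """Bit budget of one column of the modified frame."""
    data_bits = sum(SEGMENTS)
    inversion_bits = len(SEGMENTS)
    payload = data_bits + inversion_bits
    check_bits_per_column = BUS_WIDTH - payload
    return data_bits, inversion_bits, payload, check_bits_per_column


def secded_check_bits(data_bits):
    """Minimum check bits for SEC (Hamming bound) plus one parity bit for DED."""
    r = 1
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


if __name__ == "__main__":
    data_bits, inversion_bits, payload, per_column = frame_budget()
    print("payload per column:", payload, "check bits per column:", per_column)
    print("check bits over 2 columns:", 2 * per_column)
    for k in (64, 128, 256):
        r = secded_check_bits(k)
        print(f"{k} data bits -> {r} check bits ({100 * r / k:.1f}% overhead)")
```

The relative overhead falls as the protected block grows, which is why spreading the check bits over two columns frees room for the inversion bits within the standard 72-bit width.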


Stochastic Model

States of the model:
• S: exactly one bit is wrong due to a soft error;
• H: exactly one bit is wrong due to a hard failure;
• 0: Nb bits are correct;
• 0’: (Nb-1) bits are correct;
• W: one bit is wrong due to a communication error during a write operation (communication errors during reads can be recovered by retransmission).

Assumptions:
• All failure mechanisms are Poisson processes.
• H, S, and b are the rates of occurrence of hard, soft, and communication failures, respectively.
• T0 is the time period with which the word is purged of soft errors.
• Tb is the bit-unit interval.

[Noorlag et al., IEEE JSSC, 1980]
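As a small illustration of the Poisson assumption (not a reproduction of the model on the slide), the sketch below computes the probability that a word collects zero, exactly one, or more than one error of a given kind within an interval such as T0. It relies on the standard result that independent Poisson processes with a per-bit rate combine into a Poisson process with the summed rate. The function name and the numeric values in the usage lines are hypothetical placeholders.

```python
import math


def word_error_probabilities(n_bits, rate_per_bit, interval):
    """Return (P[no error], P[exactly one], P[two or more]) for one word.

    rate_per_bit and interval must use consistent time units.
    """
    mu = n_bits * rate_per_bit * interval  # expected number of errors in the word
    p_none = math.exp(-mu)
    p_one = mu * p_none
    p_multiple = max(0.0, 1.0 - p_none - p_one)
    return p_none, p_one, p_multiple


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    p_none, p_one, p_multiple = word_error_probabilities(
        n_bits=72, rate_per_bit=1e-6, interval=24.0)
    print(f"none: {p_none:.6f}  one: {p_one:.6f}  multiple: {p_multiple:.3e}")
```

A shorter purge period lowers the expected error count per word and therefore the chance that a second error arrives before the first one has been corrected.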


Achieved Power Savings

[Figure: portion of the system accounted for power calculations: Memory Controller (may be integrated with the CPU), Memory Controller I/O interface, Motherboard or Riser Card, and DIMM (Dual In-line Memory Module), 4 GByte using 4 ranks of 512 Mbit by-4 DRAMs]

Contributor                      w/o Bus-invert [Watt]   With Bus-invert [Watt]   Savings
Command, address, control bus    0.56                    0.56                     -
Data and strobe                  1.63                    0.96                     41%
Total                            10.80                   10.126                   6%
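The savings column of the table above follows directly from the two power columns. The sketch below recomputes it from the tabulated values. The dictionary and function names are illustrative.

```python
# Power figures from the table, in Watt: (without bus-invert, with bus-invert).
POWER_W = {
    "Command, address, control bus": (0.56, 0.56),
    "Data and strobe": (1.63, 0.96),
    "Total": (10.80, 10.126),
}


def savings_percent(without_bi, with_bi):
    """Relative power saving, in percent of the power without bus-invert."""
    return 100.0 * (without_bi - with_bi) / without_bi


if __name__ == "__main__":
    for contributor, (without_bi, with_bi) in POWER_W.items():
        print(f"{contributor}: {savings_percent(without_bi, with_bi):.1f}%")
```

The large saving on the data and strobe lines shrinks at the total level because the remaining contributors are unaffected by bus inversion.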

Simulation results are conservative:
• usually there is more than 1 CPU socket
• there are more memory channels in parallel

Power consumption is also relevant as
• it limits the spacing between modules
• thermal design is challenging (the air flow is heated by the CPU)

[Figure: board layout with CPU sockets, memory slots, and memory channels; label "Simulated system:"]


Thank You