Error Protected Data Bus Inversion Using Standard DRAM Components
Maurizio Skerlj
Qimonda AG
Server Memory Systems Engineering
D-81739 Munich, Germany
Paolo Ienne
Ecole Polytechnique Fédérale de Lausanne (EPFL)
School of Computer and Communication Sciences
CH-1015 Lausanne, Switzerland
ISQED08 – M. Skerlj and P. Ienne 2
Data Center Energy Demand
Higher performance at lower power consumption, but also higher power consumption per real-estate area (ESHRA)
Increased power supply and cooling capacity
Energy demand for IT doubled in 2000-2005 (EPA)
Does this look sustainable?
[Figure: barrel price in USD]
High-end Server Power Breakdown (fully populated)
IBM p670 (2003)
[Figure: bar chart of power in Watts (0 to 3500) for the small and the large configuration, split into I/O + fans, memory + fans, and processors + fans]
Processors and associated fans account for ~45% of energy consumption. Architectural changes help lower the energy consumption.
Main memory and associated fans account for ~45% of energy consumption. Higher density and higher bandwidth increase the energy consumption.
Energy Cost of Data Center
100,000 ft² data center size
Building cost: 20 USD/Watt of peak power
250 W/ft² energy density
USD 500 million in building cost
Electricity cost: 0.80 USD/Watt per year
20 MWatt yearly power consumption
Electricity annual bill of USD 16 million
The incentive for green data centers over a 10-year lifetime: 1% less energy means USD 6.6 million savings
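The figures on this slide can be rechecked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. It assumes that the 1% saving applies to both the building cost (which scales with peak power) and the electricity bill, since that reading reproduces the USD 6.6 million figure. The variable names are illustrative.

```python
# Inputs quoted on the slide.
AREA_FT2 = 100_000
POWER_DENSITY_W_PER_FT2 = 250
BUILDING_COST_USD_PER_W = 20
AVERAGE_POWER_W = 20_000_000
ELECTRICITY_USD_PER_W_YEAR = 0.80
LIFETIME_YEARS = 10

# Building cost scales with the peak power the site must supply and cool.
peak_power_w = AREA_FT2 * POWER_DENSITY_W_PER_FT2
building_cost = peak_power_w * BUILDING_COST_USD_PER_W

# Electricity bill per year and over the assumed lifetime.
annual_bill = AVERAGE_POWER_W * ELECTRICITY_USD_PER_W_YEAR
lifetime_bill = annual_bill * LIFETIME_YEARS

# Assumption: 1% less energy lowers both cost components by 1%.
saving = 0.01 * (building_cost + lifetime_bill)

print(f"building cost: USD {building_cost / 1e6:.0f} million")
print(f"annual bill:   USD {annual_bill / 1e6:.0f} million")
print(f"1% saving:     USD {saving / 1e6:.1f} million over {LIFETIME_YEARS} years")
```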
Outline
• The bus-invert technique
• Bus-invert in commercial memory systems
• Problems with extension to main memory
• Proposed architecture
• System reliability
• Achieved energy savings
• Conclusions
Bus-invert Technique for Lower I/O Energy Consumption
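[Figure: transmitter (TX, resistance R_TX) driving a communication channel to the receiver (RX), which is terminated to Vdd through R_TT]
$P_0 = \frac{V_{dd}^2}{R_{TT} + R_{TX}}, \quad P_1 = 0$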
Original data encoded on a 4-bit bus -> Low-power encoded 4-bit data on a 5-bit bus
0000 -> 11110
0001 -> 11100
0010 -> 11010
0011 -> 00111
0100 -> 10110
0101 -> 01011
0110 -> 01101
0111 -> 01111
1000 -> 01110
1001 -> 10011
1010 -> 10101
1011 -> 10111
1100 -> 11001
1101 -> 11011
1110 -> 11101
1111 -> 11111
Bus-invert encoding reduces peak I/O power since the all-’0’ pattern is avoided.
Bus-invert reduces average I/O power since no more than n/2 bits can be ’0’.
[Fletcher, US Pat. 4667337, 1987] [Stan and Burleson, IEEE Trans. on VLSI, 1995]
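A minimal sketch of a bus-invert encoder that reproduces the 5-bit codewords listed on this slide. It rests on a convention inferred from those codewords, not stated by the authors: the fifth (last) bit is 1 when the data is sent as is and 0 when it is sent inverted, and the data is inverted when more than n/2 of its bits are ’0’. The function names are illustrative.

```python
def bus_invert_encode(data, n=4):
    """Return the (n+1)-bit codeword: n payload bits followed by the flag bit."""
    mask = (1 << n) - 1
    zeros = n - bin(data & mask).count("1")
    if zeros > n // 2:
        # Too many power-hungry '0' bits: send the inverted data, flag = 0.
        return ((~data & mask) << 1) | 0
    # Otherwise send the data unchanged, flag = 1.
    return ((data & mask) << 1) | 1


def bus_invert_decode(word, n=4):
    """Recover the original n-bit data from an (n+1)-bit codeword."""
    mask = (1 << n) - 1
    payload, flag = word >> 1, word & 1
    return payload if flag else ~payload & mask


for value in range(16):
    print(f"{value:04b} -> {bus_invert_encode(value):05b}")
```

Under this convention no 5-bit codeword carries more than two ’0’ bits, and the all-’0’ word never appears on the bus.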
ISQED08 – M. Skerlj and P. Ienne 8
• The bus-invert technique• Bus-invert in commercial memory systems• Problems with extension to main memory• Proposed architecture• Achieved energy savings• System reliability• Conclusions
Outline
ISQED08 – M. Skerlj and P. Ienne 9
Bus-invert Usage in Commercial Memory Systems
The bus-invert technique is today successfully used in some applications in order to reduce the energy consumption.
Architecture: Client -> ECC -> BI -> error-free channel -> BI -> Memory
CPU and on-chip cache: Processor -> ECC -> BI -> inter-chip -> BI -> Memory [Mulla and Tu, US Pat. Appl. 289435, 2005]
Graphic systems (GPU and GDRAM): GPU -> BI -> short point-to-point -> BI -> Memory [Ihm et al., ISSCC 2007]
Extension of the Architecture Leads to Reliability Issues
[Figure: two ways of extending bus-invert to main memory, each linking the CPU to the DIMMs over a long multi-stub bus]
CPU (Cache, ECC, BI) -> long multi-stub bus (BER > 0) -> DIMMs (BI, SDRAM): non-standard
CPU (Cache, ECC, BI) -> long multi-stub bus (BER > 0) -> DIMMs (SDRAM): soft and hard failures
System reliability and usage of standard parts are paramount in servers. Architectures which do not fulfil those requirements are not viable.
If it uses ECC, it’s reliable. Really?
System reliability is as weak as the weakest of its components.
[Figure: encoding chain. Original data -> error protection encoding -> low power encoding (bus inversion). Resulting frame: encoded data for low power | b | ECC]
ECC encoding for single error correction and double error detection.
A single bit error hitting the bus inversion bit will result in multiple errors. This single error cannot be corrected.
An error on the data bits in combination with an error on the bus inversion bits cannot be detected any more.
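The first failure mode can be made concrete with a small sketch: if the bus inversion bit is flipped in transit, the receiver undoes (or applies) the inversion wrongly and every data bit comes out wrong, which is far beyond what a single-error-correcting code computed on the original data can repair. The 8-bit width, the flag polarity (1 = sent as is, 0 = inverted) and all names are assumptions made for illustration.

```python
N = 8                      # illustrative bus width
MASK = (1 << N) - 1


def bi_encode(data):
    """Bus-invert encode: return (payload, flag)."""
    zeros = N - bin(data).count("1")
    if zeros > N // 2:
        return data ^ MASK, 0      # inverted, flag = 0
    return data, 1                 # unchanged, flag = 1


def bi_decode(payload, flag):
    """Undo the inversion according to the received flag."""
    return payload if flag else payload ^ MASK


def bit_errors_after_flag_flip(data):
    """Number of wrong data bits when only the flag bit is corrupted."""
    payload, flag = bi_encode(data)
    decoded = bi_decode(payload, flag ^ 1)   # single error on the flag
    return bin(decoded ^ data).count("1")


print(bit_errors_after_flag_flip(0b10110111))
```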
System Requirements for Correct Extension
[Figure: Client -> ECC and BI -> noisy channel (BER > 0) -> Memory (OK, standard; soft and hard failures)]
ECC also protects the bus-invert (BI) bits
ECC protects against communication errors
ECC protects against soft and hard failures
No additional latency due to parallel encoding
An ideal main memory sub-system for servers should provide:
• low latency,
• high reliability,
• standard compliance.
Exploiting Code Linearity to Combine Error Protection and Bus-invert Coding
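Systematic codes are simple ...: the ECC calculation takes x and outputs x together with its check bits c.
... and linear: $\mathrm{ECC}(x_1 \oplus x_2) = c_1 \oplus c_2$
Inverting a codeword means $\bar{x} = x \oplus \mathbf{1}$, so $c' = \mathrm{ECC}(x) \oplus \mathrm{ECC}(\mathbf{1})$
A fixed value can be hard-coded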
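The linearity argument can be checked on a small code. The sketch below uses the check bits of a systematic Hamming(7,4) code as a stand-in (an assumption; the slides do not specify the code at this point) and shows that the check bits of the inverted data equal the check bits of the data XORed with a fixed, precomputed value.

```python
from itertools import product


def check_bits(data):
    """Check bits of a systematic Hamming(7,4) code for a 4-bit tuple."""
    d1, d2, d3, d4 = data
    return (d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4)


def xor(a, b):
    """Bitwise XOR of two equal-length bit tuples."""
    return tuple(x ^ y for x, y in zip(a, b))


ONES = (1, 1, 1, 1)
ECC_OF_ONES = check_bits(ONES)     # the fixed value that can be hard-coded


def check_bits_of_inverted(data):
    """Check bits of the inverted data, obtained without re-encoding it."""
    return xor(check_bits(data), ECC_OF_ONES)


for data in product((0, 1), repeat=4):
    inverted = xor(data, ONES)
    assert check_bits(inverted) == check_bits_of_inverted(data)
print("check bits of all-ones word:", ECC_OF_ONES)
```

Because the correction is a constant, the ECC of the original data and the inversion decision can be computed in parallel and combined with a single XOR stage.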
Proposed Encoding Scheme
[Figure: two parallel branches starting from the original data. Error protection branch: original data extended with 0,...,0 -> error protection encoding. Low power branch: original data -> low power encoding (bus inversion). The stored ECC corrections for bus inversion are added (+) to the ECC word. Resulting frame: encoded data for low power | b | ECC]
Original data extended in size. The ECC will also protect the additional zeros.
The ECC word is added linearly to the ECC word of the bus-invert encoded word.
Encoded data for low power: data to be decoded according to the bus-invert algorithm. Any errors will be corrected or detected in parallel by the ECC word.
b: bus-inversion signalling bits (protected by ECC).
ECC: the ECC word protects the whole message.
All the bits in the message enjoy the ECC protection.
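A toy sketch of the scheme described on this slide, under several assumptions that are not taken from the authors: two 4-bit segments instead of a full data word, one inversion flag per segment with 1 = sent as is and 0 = inverted, and a small Hamming-style linear map standing in for the real ECC. All names and constants are hypothetical. The ECC of the zero-extended original data and the inversion decision are computed independently, and a stored correction selected by the flags is XORed in at the end.

```python
SEG_BITS = 4                       # toy segment width (assumption)
N_SEG = 2                          # one inversion flag per segment
DATA_BITS = SEG_BITS * N_SEG
MSG_BITS = DATA_BITS + N_SEG       # data bits followed by the flag bits
SEG_MASK = (1 << SEG_BITS) - 1

# Toy linear ECC: message bit i contributes COLUMNS[i]; the check word is
# the XOR of the columns of all set bits.
COLUMNS = [3, 5, 6, 7, 9, 10, 11, 12, 13, 14]


def ecc(msg):
    check = 0
    for i in range(MSG_BITS):
        if (msg >> i) & 1:
            check ^= COLUMNS[i]
    return check


def xor_pattern(flags):
    """Bits that differ between (data, 0...0) and the encoded message."""
    pattern = flags << DATA_BITS
    for s in range(N_SEG):
        if not (flags >> s) & 1:               # flag 0: segment is inverted
            pattern |= SEG_MASK << (s * SEG_BITS)
    return pattern


# One stored ECC correction per flag combination, computed once.
STORED_CORRECTION = {f: ecc(xor_pattern(f)) for f in range(1 << N_SEG)}


def choose_flags(data):
    flags = 0
    for s in range(N_SEG):
        segment = (data >> (s * SEG_BITS)) & SEG_MASK
        zeros = SEG_BITS - bin(segment).count("1")
        if zeros <= SEG_BITS // 2:             # few zeros: send as is
            flags |= 1 << s
    return flags


def encode(data):
    flags = choose_flags(data)                 # low-power branch
    base_check = ecc(data)                     # ECC branch over (data, 0...0)
    message = data ^ xor_pattern(flags)
    return message, base_check ^ STORED_CORRECTION[flags]


message, check = encode(0b0001_1110)
print(f"{message:010b} {check:04b}")
```

By linearity the returned check word equals the ECC of the transmitted message, flag bits included, so the flags are protected without placing the ECC stage after the bus-invert stage.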
How to Squeeze More Bits in the Standard Frame?
Standard frame (burst length of 4), 72-bit bus: each beat carries 64-bit data and 8-bit SECDED check bits.
Modified frame with bus inversion (burst length of 4), 72-bit bus: each beat carries 67 bits (64-bit data + 3 inversion bits) and 5 bits of SECDED check bits (on 2 columns).
[Figure: modified frame. The 64 data bits of each beat are split into segments of 22, 22, and 20 bits with inversion bits b0, b1, and b2. The check bits of two columns form an 8-bit ECC plus a 2-bit parity.]
Error detection and correction code efficiency increases with the block size.
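The block-size remark can be illustrated with the standard bound for extended Hamming (SEC-DED) codes: r check bits correct a single error in k message bits when 2^r >= k + r + 1, and one more overall parity bit adds double error detection. This sketch only evaluates that bound. It is not the authors' code construction, which the slide describes as an 8-bit ECC plus a 2-bit parity.

```python
def secded_check_bits(k):
    """Minimum check bits of an extended Hamming (SEC-DED) code for k message bits."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r + 1          # extra overall parity bit for double error detection


for k in (64, 67, 134):
    print(f"{k} message bits need at least {secded_check_bits(k)} check bits")
```

A single 67-bit beat would need more check bits than the 5 that remain on a 72-bit bus, whereas two beats together (134 message bits) fit within the 10 check bits available on 2 columns.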
Stochastic Model
States of the model:
S - exactly one bit is wrong due to a soft error;
H - exactly one bit is wrong due to a hard failure;
0 - Nb bits are correct;
0' - (Nb-1) bits are correct;
W - one bit is wrong due to a communication error during a write operation (communication errors during reads can be recovered by retransmission).
Assumptions: All failure mechanisms are Poisson processes. H, S, and b are the rates of occurrence of hard, soft, and communication failures, respectively. T0 is the time period with which the word is purged of soft errors. Tb is the bit-unit interval.
[Noorlag et al., IEEE JSSC., 1980]
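Under the Poisson assumption, the probability that a given bit is hit at least once within an interval T at rate r is 1 - exp(-rT). The sketch below applies this to a word that is purged every T0 and computes the chance of ending up with exactly one wrong bit. It is not the authors' state model, only the basic building block, and the rate, scrub period and word size are hypothetical placeholders.

```python
import math


def p_bit_fail(rate, interval):
    """Probability of at least one Poisson event at `rate` within `interval`."""
    return -math.expm1(-rate * interval)


def p_exactly_one_of(n_bits, rate, interval):
    """Probability that exactly one of n independent bits fails in the interval."""
    p = p_bit_fail(rate, interval)
    return n_bits * p * (1.0 - p) ** (n_bits - 1)


# Hypothetical numbers, for illustration only.
SOFT_RATE_PER_BIT_PER_HOUR = 1e-12
SCRUB_PERIOD_HOURS = 24.0
WORD_BITS = 72

print(p_exactly_one_of(WORD_BITS, SOFT_RATE_PER_BIT_PER_HOUR, SCRUB_PERIOD_HOURS))
```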
Our Solution Saves Energy with no Reliability Issues or Cost Increase
Achieved Power Savings
Simulated system:
[Figure: CPU sockets, memory channels, and memory slots. The portion of the system accounted for in the power calculations comprises the memory controller I/O interface (the memory controller may be integrated with the CPU), the motherboard or riser card, and the DIMM (Dual In-line Memory Module), 4 GByte using 4 ranks of 512 Mbit by-4 DRAMs]

Contributor                      w/o Bus-invert [Watt]   With Bus-invert [Watt]   Savings
Command, address, control bus    0.56                    0.56                     -
Data and strobe                  1.63                    0.96                     41%
Total                            10.80                   10.126                   6%

Simulation results are conservative:
• usually there is more than 1 CPU socket
• there are more memory channels in parallel
Power consumption is also relevant as
• it limits the spacing between modules
• thermal design is challenging (the air flow is heated by the CPU)
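The savings column follows directly from the two power columns. A short check, using only the values quoted in the table (the helper name is illustrative):

```python
def saving_percent(before_w, after_w):
    """Relative power saving in percent."""
    return 100.0 * (before_w - after_w) / before_w


data_strobe = saving_percent(1.63, 0.96)
total = saving_percent(10.80, 10.126)

print(f"data and strobe: {data_strobe:.0f}%")
print(f"total:           {total:.0f}%")
```

The large saving on the data and strobe lines shrinks at the total level because the total also contains contributors that bus inversion does not affect.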
Conclusions
• $\frac{\text{Performance}}{\text{Area} \cdot \text{Watt}}$ has to improve;
• Technology scaling improves only $\frac{\text{Performance}}{\text{Watt}}$;
• Energy efficiency provides a high financial leverage;
Our proposed novel scheme achieves:
• higher energy efficiency
• without compromising the system reliability
• at no additional cost!
Thank You