Would Error Correction Provide a Benefit in Classical Computers?
27 Jan 2012 Photons, Electrons, Bands
Thomas Szkopek
Canada Research Chair in Nanoscale Electronics
Department of Electrical and Computer Engineering
Acknowledgements
Vwani Roychowdhury,
(collaborator)
UCLA
Funding:
Eli Yablonovitch,
(provocateur)
UC Berkeley
system reliability
ENIAC, 1946: 17,468 vacuum tubes, mean time between faults: ~2 days
IBM BlueGene/L, 2006 (Lawrence Livermore National Laboratory): 131,072 processors, mean time between faults: ~6 days
system reliability
“[with] current state‐of‐the‐art fault‐tolerance strategy, checkpoint/restart, for a 1 PFlop/s system… a computational job that could complete in 100 hours in a failure‐free environment will actually take 251 hours”
“While several [high-end computing] vendors are looking to address reliability at the hardware level, the costs are proving to be staggeringly high in both money and power.”
DeBardeleben et al., High‐End Computing Resilience: Analysis of Issues Facing the HEC Community and Path‐Forward for Research and Development, Los Alamos National Laboratory 2010, http://institute.lanl.gov/resilience/docs/
let’s look at the hardware level!
error correction: memory and communications
[Diagram: transmitter (write) with reliable encoding → channel (memory), ideally the identity, subject to errors → receiver (read) with reliable decoding & error correction]
• reliable encoding, decoding and error correcting hardware
• efficient, complex codes are used
error correction: computation
[Diagram: encoder (reliable encoding) → logic unit performing encoded logic, subject to errors → decoder (reliable decoding & error correction)]
• reliable encoding, decoding and error correcting hardware
• logic performed in code space (e.g., Reed-Muller codes)
D. Pradhan & S. Reddy, IEEE Trans. Comp. 21, 1331 (1972).
• however, it is likely that all hardware is equally (un)reliable
error correction: computation
error correction logic
errors
• errors occur in all hardware
• never decode bits or they will be corrupted; in other words:
all operations must be performed in protected code space!
protecting 1 bit: repetition
repetition code
“0” = 0 0 0 0 0
“1” = 1 1 1 1 1
error correction by majority vote
0 0 0 1 0 → 0 0 0 0 0
1 1 0 1 1 → 1 1 1 1 1
0 1 0 1 1 → 1 1 1 1 1
0 1 0 0 1 → 0 0 0 0 0
J. von Neumann, Lectures on Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, 1952.
single bit flip: $p$
logical bit flip: $P = 60p^3 + \ldots$
[Plot: error rate versus $p$, comparing $p$ with $P = 60p^3$]
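The majority-vote correction on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes an error-free voter; the function name `majority_decode` is hypothetical, and the received words are the four 5-bit examples shown on the slide.

```python
def majority_decode(word):
    """Correct a received repetition-code word by majority vote.

    Assumes an odd-length list of 0/1 values and an error-free voter.
    """
    ones = sum(word)
    bit = 1 if ones > len(word) / 2 else 0
    return [bit] * len(word)


# The four received words from the slide.
received_words = [
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
]

for word in received_words:
    print(word, "->", majority_decode(word))
```

Each word is restored to the codeword that agrees with at least three of its five bits, so up to two bit flips per word are tolerated.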
protecting 1 bit
[Diagram: network of majority (MAJ) gates]
MAJ = majority vote
If majority gates are error-free, then the majority voting process is error-free if <50% of input bits are in error.
protecting 1 bit
President Harry S. Truman
[Diagram: network of majority (MAJ) gates, each in error with probability $p$]
If majority gates are error-prone, then the majority voting process is error-prone.
fault tolerant architecture
[Diagram labels: majority gate M; error correction; concatenation]
copy the bits ×3
majority vote ×3
Triplicate repetition code and fault-tolerant majority
PO Boykin, VP Roychowdhury, Proc. Int. Conf. Dep. Sys. Net. 2005.
fault tolerant architecture
PO Boykin, VP Roychowdhury, Proc. Int. Conf. Dep. Sys. Net. 2005.
error per majority gate: $p$
error with $L$ concatenations: $P \sim p^{2^L}$
threshold error rate: $p_T \sim 1/108$
bits with $L$ concatenations: $N \sim 9^L$ (with ancillae)
error rate versus bits: $P \sim p^{N^{\log 2/\log 9}}$
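The link between the two scalings on this slide is that $2^L = N^{\log 2/\log 9}$ when $N = 9^L$. The sketch below checks that identity numerically. It assumes exactly $N = 9^L$ bits per logical bit, and the function name `concatenation_resources` is hypothetical.

```python
import math


def concatenation_resources(levels):
    """Return (bits, exponent 2^L, the same exponent computed from the bits).

    Assumes N = 9**L bits after L levels of concatenation.
    """
    n_bits = 9 ** levels
    exponent = 2 ** levels
    exponent_from_bits = n_bits ** (math.log(2) / math.log(9))
    return n_bits, exponent, exponent_from_bits


for levels in range(1, 5):
    n_bits, exponent, exponent_from_bits = concatenation_resources(levels)
    print(f"L={levels}: N={n_bits}, 2^L={exponent}, "
          f"N^(log2/log9)={exponent_from_bits:.6f}")
```

The exponent grows only as $N^{0.315\ldots}$, which is why the suppression of the logical error rate is sub-exponential in the number of bits.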
protecting more than 1 bit?
Can universal logic operations be performed in code space? (difficulty lies in the parity bits)
Unknown. Best result is with an evolving Reed-Muller (RM) code space.
Can fault tolerant error correction be performed?
Unknown. Promising quantum computing results.
Is the overhead prohibitive?
Unknown.
what about device physics?
complementary transistor inverter:
$N_{in}$ = input charge
$N_{out}$ = output charge
$N = CV/e$ = maximum charge
$G_n$ = n-channel conductance
$G_p$ = p-channel conductance
Assume sub-threshold conductance / thermionic emission through channels:
$G_p = G_0 \exp(-eV_{GS}/k_BT)$
$G_n = G_0 \exp(+eV_{GS}/k_BT)$
[Diagram: inverter with source, drain, and gate voltage $V_{GS}$]
complementary logic
CNT inverter: Ph. Avouris, et al., Physica B 323 (2002) 6–14
Si nanowire inverter: D. Wang, et al., Small 2 (2006) 1153-8
ZnO nanowire inverter: S. Roy, et al., Nanotech 21 (2010) 245306
complementary logic
$N_{out} = \frac{N}{2}\,\frac{G_p - G_n}{G_p + G_n} = -\frac{N}{2}\tanh\!\left(\frac{N_{in}}{k_BTC/e^2}\right)$
information theoretic perspective:
single charge ~ physical bit
total charge ~ logical bit
signal restoration ~ majority vote
metal-insulator transition in transistor channels:
$\delta q^2 = k_BTC$
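The signal restoration described here can be illustrated with the inverter transfer function $N_{out} = -(N/2)\tanh(N_{in}/(k_BTC/e^2))$. The sketch below is a rough illustration, not the author's calculation. It assumes charges are signed and measured about the midpoint, takes $k_BT = 26$ meV, uses $k_BTC/e^2 = N k_BT/(eV)$ (which follows from $N = CV/e$), and uses the hypothetical name `inverter_output`.

```python
import math

K_B_T_EV = 0.026  # assumed room-temperature thermal energy, in eV


def inverter_output(n_in, n_max, voltage):
    """Output charge (in electrons) of an ideal complementary inverter.

    n_in    : signed input charge in electrons, measured about the midpoint
    n_max   : maximum charge N = C V / e, in electrons
    voltage : operating voltage V, in volts
    """
    # k_B T C / e^2 expressed in electrons, using C = N e / V
    width = n_max * K_B_T_EV / voltage
    return -0.5 * n_max * math.tanh(n_in / width)


# Illustrative values: N = 240 electrons at V = 0.97 V.
for n_in in (1, 5, 10, 120):
    print(n_in, "->", round(inverter_output(n_in, 240, 0.97), 1))
```

With these numbers even a small input imbalance is amplified toward the full swing of about $\mp N/2$, which is the sense in which the gate acts like a majority vote over its electrons.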
complementary logic
[Plot: $p(N_{in})$ versus $N_{in}$]
Probability of logical error: $P \sim A(N)\,\exp(-N\,eV/8k_BT)$, with an algebraic prefactor $A(N) \propto N^{-1/2}$
Error scales as an ideal majority vote of N electrons with an error p per electron:
p
2
4
logical error: $P \approx \binom{N}{N/2}\,p^{N/2} \sim \left(\frac{2}{\pi N}\right)^{1/2}(4p)^{N/2}$
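The asymptotic form of the ideal majority-vote error follows from Stirling's approximation, $\binom{N}{N/2} \approx 2^N (2/\pi N)^{1/2}$. The sketch below compares the binomial expression with its asymptotic form. The function names and the sample values $N = 100$, $p = 0.01$ are illustrative assumptions.

```python
import math


def majority_error_binomial(n, p):
    """Leading term C(N, N/2) * p^(N/2) for even N."""
    return math.comb(n, n // 2) * p ** (n // 2)


def majority_error_asymptotic(n, p):
    """Stirling form (2 / (pi N))^(1/2) * (4p)^(N/2)."""
    return math.sqrt(2 / (math.pi * n)) * (4 * p) ** (n / 2)


n, p = 100, 0.01
exact = majority_error_binomial(n, p)
approx = majority_error_asymptotic(n, p)
print(f"binomial   = {exact:.4e}")
print(f"asymptotic = {approx:.4e}")
print(f"ratio      = {approx / exact:.4f}")
```

The two expressions agree to within a fraction of a percent at $N = 100$, and the agreement improves as $N$ grows.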
NM
NM
reliability and redundancy
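error rate per particle $p$ and logical error rate $P$ for $N$ particles:
1-bit architecture: $p$, $P \sim p_T\,(p/p_T)^{N^{\log 2/\log 9}}$
ideal majority vote: $p$, $P \sim \left(\frac{2}{\pi N}\right)^{1/2}(4p)^{N/2}$
transistor logic circuit: $p \sim \exp(-eV/k_BT)$, $P \sim \left(\frac{1}{\pi N \ln(1/4p)}\right)^{1/2}(4p)^{N/2}$
ballistic gates: $p \sim (\delta r/r)^2$, $P \sim \left(\frac{2}{\pi N}\right)^{1/2}(4p)^{N/2}$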
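To get a feel for the difference between exponential and sub-exponential suppression in $N$, the sketch below evaluates the ideal majority-vote scaling and the 1-bit architecture scaling side by side. It rests on assumptions: the threshold is taken as $p_T = 1/108$, the scaling laws are used as equalities with no further prefactors, and the function names and the sample value $p = 10^{-3}$ are illustrative only.

```python
import math

P_THRESHOLD = 1 / 108  # assumed threshold error rate p_T


def ideal_majority_vote(n, p):
    """(2 / (pi N))^(1/2) * (4p)^(N/2): exponential suppression in N."""
    return math.sqrt(2 / (math.pi * n)) * (4 * p) ** (n / 2)


def one_bit_architecture(n, p, p_t=P_THRESHOLD):
    """p_T * (p / p_T)^(N^(log 2 / log 9)): sub-exponential suppression in N."""
    return p_t * (p / p_t) ** (n ** (math.log(2) / math.log(9)))


p = 1e-3
for n in (9, 81, 729):
    print(f"N={n:4d}  majority vote: {ideal_majority_vote(n, p):.3e}  "
          f"1-bit architecture: {one_bit_architecture(n, p):.3e}")
```

At equal $N$ the majority-vote error falls far faster, because its exponent grows linearly in $N$ while the architecture's exponent grows only as roughly $N^{0.32}$.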
[Plot annotations: example 8-bit words 00010000 and 10111111; "exponential suppression in N"; "sub-exponential suppression in N"]
T. Szkopek et al PRL 106, 176801 (2011).
CMOS
                                        45nm node (2010)   21nm node (2015)   11.9nm node (2020)
L, gate length [nm]                     27                 17                 10.7
Cg, gate capacitance [aF]               19.7               10.0               4.0
V, operating voltage [V]                0.97               0.81               0.68
N, electrons per inverter gate          240                100                34
N, electrons per NAND gate              480                200                68
M, transistors/chip                     2.2×10^9           8.8×10^9           35×10^9
f, clock freq. [GHz]                    5.9                8.5                12.4
P, error probability at 1000 FITs       2×10^−29           4×10^−30           4×10^−31
P, error probability at 1 fault/year    2×10^−27           4×10^−28           7×10^−29
International Technology Roadmap for Semiconductors, 2009 edition.
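Two rows of the CMOS table can be approximately reproduced from the others. The sketch below is a back-of-envelope check under stated assumptions, not the method used for the slide: it assumes two transistors per inverter so that $N \approx 2C_gV/e$, and it assumes one switching event per transistor per clock cycle so that the per-operation error budget at 1 fault/year is $1/(M f \times 1\,\text{year})$. All function and variable names are hypothetical.

```python
E_CHARGE = 1.602176634e-19            # elementary charge, in coulombs
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# (gate capacitance [F], voltage [V], transistors per chip, clock [Hz])
NODES = {
    "45nm (2010)":   (19.7e-18, 0.97, 2.2e9, 5.9e9),
    "21nm (2015)":   (10.0e-18, 0.81, 8.8e9, 8.5e9),
    "11.9nm (2020)": (4.0e-18, 0.68, 35e9, 12.4e9),
}


def electrons_per_inverter(c_gate, voltage, transistors_per_gate=2):
    """N = (transistors per gate) * C_g * V / e. Two per inverter is an assumption."""
    return transistors_per_gate * c_gate * voltage / E_CHARGE


def error_budget_one_fault_per_year(transistors, clock_hz):
    """Allowed error probability per switching event for one fault per year.

    Assumes every transistor switches once per clock cycle.
    """
    return 1.0 / (SECONDS_PER_YEAR * transistors * clock_hz)


for name, (c_gate, voltage, transistors, clock_hz) in NODES.items():
    n_electrons = electrons_per_inverter(c_gate, voltage)
    budget = error_budget_one_fault_per_year(transistors, clock_hz)
    print(f"{name}: N ~ {n_electrons:.0f} electrons, P(1 fault/year) ~ {budget:.1e}")
```

Under these assumptions the electron counts and the 1 fault/year row come out close to the tabulated values.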
Intel 45nm, strained Si
error rate comparison
T. Szkopek et al PRL 106, 176801 (2011).
structural disorder in transistor structures will increase error rates
[Plot annotation: 1 electron, $eV/k_BT$ = 0.97 eV / 26 meV]
N = 30 at eV = 1.00 eV is equivalent to N = 3000 at eV = 10 meV
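The equivalence stated here follows if the logical error depends on $N$ and $eV$ only through the product $N\,eV$, as in an exponential factor of the form $\exp(-N\,eV/8k_BT)$. The sketch below evaluates that exponent for both cases. It assumes $k_BT = 26$ meV, ignores any prefactor, and uses the hypothetical name `error_exponent`.

```python
K_B_T_EV = 0.026  # assumed thermal energy, in eV


def error_exponent(n_electrons, energy_ev):
    """Exponent N * eV / (8 k_B T) of the exponential error factor (prefactor ignored)."""
    return n_electrons * energy_ev / (8 * K_B_T_EV)


print(error_exponent(30, 1.00))     # N = 30 at eV = 1.00 eV
print(error_exponent(3000, 0.010))  # N = 3000 at eV = 10 meV
```

Both cases give the same exponent, so lowering the energy per electron by a factor of 100 has to be paid for with 100 times as many electrons.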
conclusions
• physics of transistors provides protection against logical errors
• for 1-bit protection, it is better to prevent errors than to correct errors
• error correction with multiple-bit code protection is an open problem
classical computing with spin
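magnetic moments interaction: $V \sim \mu^2/r^3$
interaction error: $\delta V \sim V\,\delta r/r$
rotation for distinguishable states: $\theta \sim Vt/\hbar$
rotation error: $\delta\theta \sim \delta r/r$
[Diagram: spins J1 and J2 separated by distance r; spin placement accurate to within δr]
probability of erroneous spin flip!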
classical computing with spin
spin 1/2
$\delta\theta \sim \delta r/r$
Probability of error: $p \sim \frac{1}{4}\delta\theta^2$
spin $j = N \times 1/2$
Probability of error: $p \sim \left(\frac{2}{\pi N}\right)^{1/2}\delta\theta^N$
classical computing with spin
$N \times$ spin 1/2
Probability of single error: $p \sim \frac{1}{4}\delta\theta^2$
Majority vote on $N$ spins: $P \approx \binom{N}{N/2}\,p^{N/2} \sim \left(\frac{2}{\pi N}\right)^{1/2}\delta\theta^N$
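The spin estimates can be checked numerically. For a spin 1/2 rotated by a small unwanted angle $\delta\theta$, the flip probability is $\sin^2(\delta\theta/2) \approx \delta\theta^2/4$, and inserting $p = \delta\theta^2/4$ into the majority-vote scaling $(2/\pi N)^{1/2}(4p)^{N/2}$ gives $(2/\pi N)^{1/2}\delta\theta^N$. The sketch below assumes $\delta\theta = \delta r/r$ with a unit prefactor. The function names and sample values are illustrative.

```python
import math


def single_spin_flip(delta_theta):
    """Flip probability sin^2(dtheta / 2) of a spin 1/2 rotated by dtheta."""
    return math.sin(delta_theta / 2) ** 2


def majority_vote_flip(n_spins, delta_theta):
    """Asymptotic majority-vote error (2 / (pi N))^(1/2) * dtheta^N."""
    return math.sqrt(2 / (math.pi * n_spins)) * delta_theta ** n_spins


delta_theta = 0.1  # assumed placement accuracy dr / r = 10%
print("single spin, exact      :", single_spin_flip(delta_theta))
print("single spin, dtheta^2/4 :", delta_theta ** 2 / 4)
print("majority vote, N = 10   :", majority_vote_flip(10, delta_theta))
```

For a 10% placement error the small-angle form is accurate to better than 0.1%, and ten spins already push the logical error to the $10^{-11}$ range.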