FERMAT: Formal Engineering Research using Methods, Abstractions and Transformations
Technical Report No: 2004-13
Reliability Evaluation of Multiplexing Based Defect-Tolerant Majority Circuits ∗
Debayan Bhaduri, Sandeep Shukla
FERMAT Lab
The Bradley Department of Electrical & Computer Engineering
Virginia Polytechnic Institute and State University
Blacksburg, VA 24060
E-mail: {dbhaduri, shukla}@vt.edu
∗This work was supported by NSF Grant CCR-0340740
Contents
1 Introduction
2 Background
3 Model Construction
4 Experiments and Results
5 Conclusion and Future work
List of Figures
1 A majority multiplexing unit
2 Reliability for I/O Bundle Size of 20
3 Reliability for I/O Bundle Size of 20
Abstract
Defect-tolerant architectures are gaining importance for building economical computing systems from billions of nanometer-scale devices. At the nanoscale, devices will be prone to errors due to manufacturing defects, ageing, transient faults and quantum physical effects. Because of the increase in device density, micro-architects may opt for redundancy based defect-tolerant techniques. In many non-silicon manufacturing methodologies, such as Quantum Dot Cellular Automata, logic circuits are implemented using three-input majority gates as the basic logic devices. We extend our previous work and analyze redundancy based majority gate architectures using probabilistic model checking techniques. Such analysis provides efficient evaluation of the reliability/redundancy trade-offs, whereas analytical probabilistic models for evaluating these trade-offs are error prone and cumbersome.
1 Introduction
In the future, nanotechnology will let us combine the fundamental building blocks of nature easily,
inexpensively and in most of the ways permitted by the laws of physics. Continued improvements in
lithography have resulted in line widths that are less than one micron. Sub-micron lithography is clearly
very valuable but it is equally clear that conventional lithography will not let us build semiconductor
devices in which individual dopant atoms are located at specific lattice sites. It is fairly widely
believed that silicon based technologies will continue for at least several more years and then
reach their practical limits. If we are to continue these miniaturization trends, we will have to develop new
manufacturing technologies that let us inexpensively build computer systems with mole quantities
of logic elements that are molecular in both size and precision and are interconnected in complex and
highly idiosyncratic patterns. Such technologies will increase the defect density, and it may no longer be
possible to assume error-free computing. Given the small feature size, redundancy based defect-tolerance
will likely be adopted, and conventional techniques such as von Neumann multiplexing [9] may be
implemented to obtain high reliability.
Non-silicon manufacturing methodologies such as quantum dots [8] and quantum cellular automata [1, 4]
use majority logic devices as the fundamental building blocks of a Boolean network. In this paper, we
analyze reliability/redundancy trade-offs for multiplexing based majority circuits by building a generic
multiplexing library. This library is also an enhancement of our probabilistic model checking based tool
NANOPRISM [2]. It can be applied to model any arbitrary Boolean circuit or a portion of
a large Boolean network, at different levels of granularity such as gate level, logic block level,
logic function level and unit level.
2 Background
Defect-Tolerant Computing: Formally, a defect-tolerant architecture is one which uses techniques to mitigate the effects of defects in the devices that make up the architecture, and guarantees a given level of reliability. In 1952, von Neumann introduced a redundancy technique called NAND multiplexing [9] for constructing reliable computation from unreliable devices (motivated by the unreliable valve based computers of that time). He showed that, if the failure probabilities of the gates are sufficiently small and failures are independent, then computations may be done with a high probability of correctness. Pippenger [6] showed that von Neumann's construction works only when the probability of failure per gate is strictly less than 1/2, and that computation in the presence of noise (which can be seen as the presence of defects) requires more layers of redundancy. In [3, 5], NAND multiplexing was compared to other techniques for fault-tolerance, and theoretical calculations showed that the redundancy level must be quite high to obtain acceptable levels of reliability.
Figure 1. A majority multiplexing unit (diagram labels: inputs X, Y, Z; U; Executive Stage; Restorative Stage; M = Majority Gate)
Multiplexing Based Defect-Tolerance: The basic technique of multiplexing is to replace a processing unit by a multiplexed unit with N copies of every input and output of the processing unit. In a multiplexing unit, there are N devices which process the copies of the inputs in parallel to give N outputs. If the inputs and devices are reliable, then each element of the output set will be identical and equal to the output of the processing unit. However, when there are errors in the inputs and the devices are faulty, the outputs will not be identical. Instead, after defining some critical level Δ ∈ (0, 0.5), the output of the multiplexing unit is considered stimulated (taking logical value true) if at least (1 − Δ)·N of the outputs are stimulated, and non-stimulated (taking logical value false) if no more than Δ·N outputs are stimulated. In cases where the number of stimulated outputs meets neither criterion, i.e. it lies in the interval (Δ·N, (1 − Δ)·N), the output is undecided, and hence a malfunction will occur. The basic design of a multiplexing unit consists of two stages: the executive stage, which performs the basic function of the processing unit to be replaced, and the restorative stage, which reduces the degradation in the executive stage caused by errors in both the inputs and faulty devices.
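The decision rule above can be sketched in a few lines of Python. This is only an illustration of the thresholds (1 − Δ)·N and Δ·N; the function name `classify_bundle` and the use of exact fractions for Δ are assumptions made here, not part of the original model.

```python
from fractions import Fraction


def classify_bundle(stimulated: int, n: int, delta: Fraction) -> str:
    """Interpret an output bundle of n lines, `stimulated` of which are logic 1.

    Hypothetical helper: returns "true", "false" or "undecided" according to
    the critical level delta in the open interval (0, 0.5).
    """
    if not (0 < delta < Fraction(1, 2)):
        raise ValueError("delta must lie in the open interval (0, 0.5)")
    if not (0 <= stimulated <= n):
        raise ValueError("stimulated must be between 0 and n")
    if stimulated >= (1 - delta) * n:
        return "true"        # at least (1 - delta) * N outputs stimulated
    if stimulated <= delta * n:
        return "false"       # no more than delta * N outputs stimulated
    return "undecided"       # strictly between the thresholds: malfunction
```

Exact fractions are used so that boundary cases such as 18 out of 20 with Δ = 1/10 do not depend on floating-point rounding.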
In this paper, we consider multiplexing when the processing unit is a single majority gate. We therefore replace the inputs and output of the gate with N copies and, in the executive stage, duplicate the majority gate N times, as in Figure 1. The unit U represents a random permutation of the input signals, that is, each signal of the first input is randomly grouped with a signal from each of the other inputs to form the inputs for one of the copies of the gate. Also shown in Figure 1 is the restorative stage, which takes the output of the executive stage as its inputs. To give a more effective restoration mechanism, this stage can be iterated [9].
Probabilistic Model Checking and PRISM: Probabilistic model checking is a range of techniques for calculating the likelihood of the occurrence of certain events during the execution of unreliable or unpredictable systems. The system is usually specified as a state transition system, with probability values attached to the transitions. A probabilistic model checker applies algorithmic techniques to analyze the state space and calculate performance measures. We use PRISM [7], a probabilistic model checker developed at the University of Birmingham. We use discrete-time Markov chains (DTMCs) to model the generic multiplexing library for majority logic gates. This model of computation is suitable for conventional digital circuits and the fault models considered. The fault models are manufacturing defects in the gates and transient errors that can occur at any point of time in a Boolean network.
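As a minimal illustration of what a DTMC is and of the kind of quantity a probabilistic model checker computes, the sketch below propagates a probability distribution through a two-state chain. It is plain Python, not PRISM input or PRISM's implementation; the chain (a signal that is inverted with probability `eps` at each step) and all function names are assumptions chosen for illustration.

```python
def step(dist, P):
    """One DTMC step: multiply the row vector dist by the transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def transient(dist, P, k):
    """Distribution over states after k steps, starting from dist."""
    for _ in range(k):
        dist = step(dist, P)
    return dist


def flip_chain(eps):
    """Two states: 0 = signal correct, 1 = signal inverted.

    At every step the signal is inverted with probability eps.
    """
    return [[1 - eps, eps],
            [eps, 1 - eps]]


# Probability that the signal is still correct after 5 faulty steps.
p_correct = transient([1.0, 0.0], flip_chain(0.01), 5)[0]
```

For this particular chain the result can be checked against the closed form 1/2 + (1 − 2·eps)^k / 2.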
3 Model Construction
In this section we explain the PRISM model of a majority gate multiplexing configuration. The first approach is to model the system directly as shown in Figure 1: a PRISM module is constructed for each multiplexing stage comprising N majority gates, and these modules are combined through synchronous parallel composition. However, this construction leads to the well-known state space explosion problem. At the same time, we observe that the actual values of the inputs and outputs of each stage are not important; one only needs to keep track of the total number of stimulated (and non-stimulated) inputs and outputs. Furthermore, to compute these values without having to store all the outputs of the majority gates in each stage, we replace the set of N majority gates working in parallel with N majority gates working in sequence. The same methodology is applied to the multiplexing stages of the system, so that the same module is reused for each stage while keeping a record of the outputs from the previous stage. This folds space into time; in other words, the same majority gate/stage is reused over time rather than replicated over space. This approach does not influence the performance of the system since each majority gate works independently and the probability of each gate failing is also independent.
The unit U in Figure 1 performs a random permutation. Consider the case when k outputs from the previous stage are stimulated for some 0 < k < N. Since there are k stimulated outputs, the next stage will have k of its inputs stimulated if U performs a random permutation. Therefore, the probability that either all or none of the inputs are stimulated is 0. This implies that the majority gates in a stage depend on one another: for example, if one majority gate has a stimulated input, then the probability of another gate having the same input stimulated decreases. It is difficult to calculate the reliability of a system by analytical techniques in such a scenario. In the PRISM model, by contrast, changing the number of restorative stages, the bundle size, the input probabilities or the probability of the majority gates failing requires only modifying parameters given at the start of the model description. Since PRISM can also represent non-deterministic behavior, one can set upper and lower bounds on the probability of gate failure and then obtain best and worst case reliability characteristics for the system under these bounds.
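The counting abstraction and the folding of N parallel gates into N sequential steps can be sketched as follows. This Python sketch is not the PRISM model itself: it assumes that the three input bundles are permuted independently, that gate failures invert the output with probability `eps`, and the name `stage_distribution` is hypothetical. The state records only how many stimulated signals remain unused in each bundle and how many outputs are stimulated so far, so the dependence between gates described above is captured by drawing inputs without replacement.

```python
from collections import defaultdict


def stage_distribution(n, kx, ky, kz, eps):
    """Exact distribution of the number of stimulated outputs of one stage.

    n          -- bundle size (number of majority gates in the stage)
    kx, ky, kz -- number of stimulated signals in the three input bundles
    eps        -- probability that a gate inverts its output
    Returns a list d with d[m] = P(m outputs are stimulated).
    """
    # State: (stimulated signals left in X, in Y, in Z, stimulated outputs).
    states = {(kx, ky, kz, 0): 1.0}
    for g in range(n):                      # process the gates one at a time
        rem = n - g                         # unused signals left per bundle
        nxt = defaultdict(float)
        for (rx, ry, rz, out), p in states.items():
            for bx, px in ((1, rx / rem), (0, 1 - rx / rem)):
                for by, py in ((1, ry / rem), (0, 1 - ry / rem)):
                    for bz, pz in ((1, rz / rem), (0, 1 - rz / rem)):
                        w = p * px * py * pz
                        if w == 0.0:
                            continue
                        maj = 1 if bx + by + bz >= 2 else 0
                        # The gate outputs maj, or its inverse on failure.
                        for o, po in ((maj, 1 - eps), (1 - maj, eps)):
                            if po > 0.0:
                                key = (rx - bx, ry - by, rz - bz, out + o)
                                nxt[key] += w * po
        states = nxt
    dist = [0.0] * (n + 1)
    for (_, _, _, out), p in states.items():
        dist[out] += p
    return dist
```

Because only counts are stored, the number of states grows polynomially in N rather than exponentially, which is the point of the abstraction.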
Figure 2. Reliability for I/O Bundle Size of 20
(a) Probability that at most 10% of the outputs are incorrect, plotted against the number of restorative stages, for probability of gate failure = 0.01, 0.02, 0.03 and 0.04
(b) Expected percentage of incorrect outputs (large probability of failure), plotted against the number of restorative stages, for probability of gate failure = 0.01, 0.02, 0.03 and 0.04
4 Experiments and Results
In this section we report the reliability measures of multiplexing based majority systems when the I/O bundles are of size 10 and 20. These bundle sizes are for illustration only; we have also investigated the performance of these systems for larger bundle sizes. In all the experiments reported in this paper, we assume that the inputs X, Y and Z are identical (this is often true in circuits containing similar devices) and that two of the inputs have a high probability (0.9) of being logic high while the third input has a 0.9 probability of being logic low. Thus the circuit's correct output should be stimulated. Also, it is assumed that the gate failure is a von Neumann fault, i.e. when a gate fails, the value of its output is inverted. In Figure 2, we consider a bundle size of 10 and a probability of gate failure varying from 0.01 to 0.04. The probability that the system error is less than 10% and the expected percentage of incorrect outputs are plotted against the number of restorative stages. As can be seen from the results, beyond a certain number of restorative stages, adding more does not make the system appreciably more reliable: the reliability tends to reach a steady state. This is because at large gate failure probabilities the restorative stages are themselves significantly affected, so adding more of them to the architectural configuration does not reduce the degradation in the reliability of computation. From these results we therefore conclude that, for a bundle size of 10, if the gate failure probability is greater than or equal to 0.01, then the system cannot be made more reliable once a sufficient number of restorative stages (in this case 5) have been added.
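The experimental setup can also be approximated by simulation. The following Python sketch is a Monte Carlo estimate, not the probabilistic model checking used in this paper, and it does not reproduce the reported figures. It assumes that each line of an input bundle is stimulated independently with the stated probability and that each restorative stage feeds three independently permuted copies of the previous output bundle into the majority gates; all function names are hypothetical.

```python
import random


def majority_stage(x, y, z, eps, rng):
    """One stage: unit U permutes each bundle, then N faulty majority gates."""
    x, y, z = (rng.sample(b, len(b)) for b in (x, y, z))
    out = []
    for a, b, c in zip(x, y, z):
        m = 1 if a + b + c >= 2 else 0
        if rng.random() < eps:      # von Neumann fault: output is inverted
            m = 1 - m
        out.append(m)
    return out


def simulate(n, stages, eps, p_inputs, rng):
    """Executive stage followed by `stages` restorative stages."""
    bundles = [[1 if rng.random() < p else 0 for _ in range(n)]
               for p in p_inputs]
    out = majority_stage(*bundles, eps, rng)
    for _ in range(stages):
        # Assumption: the output bundle drives all three gate inputs.
        out = majority_stage(out, out, out, eps, rng)
    return out


def prob_error_at_most(n, stages, eps, p_inputs=(0.9, 0.9, 0.1),
                       frac=0.1, runs=2000, seed=0):
    """Estimate P(at most frac * n outputs are non-stimulated, i.e. wrong)."""
    rng = random.Random(seed)
    ok = sum(simulate(n, stages, eps, p_inputs, rng).count(0) <= frac * n
             for _ in range(runs))
    return ok / runs
```

Calling `prob_error_at_most(10, s, 0.02)` for increasing `s` gives a rough, sampling-based view of how reliability changes with the number of restorative stages under these assumptions.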
Figure 3. Reliability for I/O Bundle Size of 20
(a) Probability that at most 10% of the outputs are incorrect, plotted against the error of an individual gate (10^x, for x from −8 to −3), for U with 3, 4, 5 and 7 stages
(b) Output distribution for 7 restorative stages: probability plotted against the number of non-stimulated outputs, for probability of gate failure 0.2, 0.1, 0.02 and 0.0001
On the other hand, Figure 3(a) plots the probability that at most 10% of the outputs of the overall system are incorrect (non-stimulated), for small gate failure probabilities and for a bundle size of 20. The number of restorative stages varies between 1 and 7. The results indicate that increasing the number of stages can greatly enhance the reliability of the system. However, the rate of increase in reliability decreases as more restorative stages are added, and there is a limit to the reliability that can be gained by adding stages. Figure 3(b) reports the distribution of the non-stimulated outputs (error) for different majority gate failure probabilities, for 7 restorative stages and the same bundle size. We have also computed the output distribution of the system for different numbers of restorative stages, and hence any measure of reliability can be calculated from these results. PRISM can also be used directly to compute other measures of reliability. As expected, the output distribution and the results from Figure 3(a) show that, as the probability of a gate failure decreases, the reliability of the multiplexing system increases.
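To make concrete how reliability measures follow from an output distribution, the short Python sketch below derives the two measures plotted in this section from a list `dist`, where `dist[k]` is the probability that exactly k outputs are non-stimulated. The function names and the example distribution are illustrative assumptions, not results from this paper.

```python
def prob_at_most(dist, max_bad):
    """P(number of non-stimulated outputs <= max_bad)."""
    return sum(p for k, p in enumerate(dist) if k <= max_bad)


def expected_percent_incorrect(dist):
    """Expected percentage of non-stimulated (incorrect) outputs."""
    n = len(dist) - 1                       # bundle size
    return 100.0 * sum(k * p for k, p in enumerate(dist)) / n


# Made-up distribution for a bundle of size 4, for illustration only.
example = [0.5, 0.25, 0.25, 0.0, 0.0]
```

For a bundle of size 20, "at most 10% of the outputs are incorrect" corresponds to `prob_at_most(dist, 2)`.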
5 Conclusion and Future work
In conclusion, this paper addresses the need for automated methodologies for analyzing the reliability of defect-tolerant architectural configurations. These architectures will be used to implement logic built from emerging non-silicon manufacturing technologies. We have extended our tool NANOPRISM [2] by developing a DTMC based generic multiplexing framework. A fragment of, or an entire, arbitrary Boolean network can be plugged into such a framework to evaluate the redundancy and reliability trade-offs. Analytical approaches may be error prone and cumbersome for complex networks of gates. Our probabilistic model checking methodology offers a complementary approach to such analytical methodologies for defect-tolerant nano architectures.
It is important to note that there is a difference between the bounds on the probability of gate failure required here for reliable computation and the theoretical bounds presented in the literature. This difference is to be expected: in this paper we evaluate the performance of the system under a fixed configuration (bundle size and number of restorative stages), whereas the bounds presented in the literature correspond to the scenario where the bundle size or number of restorative stages can be increased arbitrarily in order to achieve a reliable system.
References
[1] Islamshah Amlani, Alexei O. Orlov, Geza Toth, Gary H. Bernstein, Craig S. Lent, and Gregory L. Snider, Digital logic gate using quantum-dot cellular automata, Science 284 (1999), pp. 289–291, Available at: http://www.nd.edu/ qcahome/reprints/Amlani2.pdf.
[2] Debayan Bhaduri and Sandeep Shukla, Nanoprism: A tool for evaluating granularity vs. reliability trade-offs in nano architectures, GLSVLSI (Boston, MA), ACM, April 2004, http://fermat.ece.vt.edu/Publications/pubs/techrep/techrep0318.pdf.
[3] J. Han and P. Jonker, A system architecture solution for unreliable nanoelectronic devices, IEEE Transactions on Nanotechnology 1 (2002), 201–208.
[4] C. Lent, A device architecture for computing with quantum dots, Proceedings of the IEEE, April 1997, p. 85.
[5] K. Nikolic, A. Sadek, and M. Forshaw, Architectures for reliable computing with unreliable nanodevices, Proc. IEEE-NANO'01, IEEE, 2001, pp. 254–259.
[6] N. Pippenger, Reliable computation by formulas in the presence of noise, IEEE Transactions on Information Theory 34 (1988), no. 2, 194–197.
[7] Web Page: www.cs.bham.ac.uk/ dxp/prism/.
[8] R. Turton, The quantum dot: A journey into the future of microelectronics, Oxford University Press, U.K., 1995.
[9] J. von Neumann, Probabilistic logics and synthesis of reliable organisms from unreliable components, Automata Studies (1956), 43–98.