Enhanced N-Version Programming and Recovery Block Techniques for Web Service Systems
Kuan-Li Peng1, Chin-Yu Huang1, Pin-Heng Wang2, Chao-Jung Hsu3
1Department of Computer Science
National Tsing Hua University Hsinchu, Taiwan
2System Development Department III
Alpha Networks Inc. Hsinchu Taiwan
3Medical Department Altek Corporation Hsinchu, Taiwan
ABSTRACT
In recent years, web services (WS’s) have been widely used to
support interoperable machine-to-machine interaction over a
network. In order to ensure a reliable WS system, a number of
fault tolerance designs have been proposed. It is known that
network connection and hardware devices may fail. In addition,
the acceptance test (AT) as well as the decision mechanism (DM),
which are common in fault tolerance designs, could also fail
unexpectedly. Such uncertainties may affect the reliability of a
WS-based system but have not yet been carefully considered in
reliability modeling. Therefore, we propose extended NVP
(ENVP) and extended RB (ERB) for the reliability analysis.
Various operations of ENVP and ERB are discussed, and a
simulation procedure is implemented to evaluate the system
reliability and the failure probability of fault-tolerant WS-based
systems. The experimental results show a high degree of
correlation between the numbers of AT’s and the reliability
improvements. The proposed fault tolerance designs could
improve the system reliability, and the simulation procedure
could also help in exploring appropriate configurations of fault
tolerance designs for practitioners.
Categories and Subject Descriptors
C.4 [Performance of Systems]: design studies, fault tolerance,
modeling techniques, reliability, availability, and serviceability;
H.3.5 [Online Information Systems]: Web-based services.
General Terms
Management, Design, Reliability, Experimentation.
Keywords
Acceptance Test (AT), Decision Mechanism (DM), Fault
Tolerance Design, Reliability Assessment, Web Service (WS).
NOTATIONS
$n$: Number of functionally identical WS’s.
$m$: Number of AT’s.
$WS_i$: The $i$-th functionally identical WS. Specifically, $WS_1$ is the primary WS (PWS) and $WS_2$~$WS_n$ are the alternative WS’s (AWS).
$AT_r$: The $r$-th AT.
$H_{mid}$: Host machine of the middleware.
$H_i$: Host machine of $WS_i$.
$P_i$: Provider of $WS_i$.
$N_i$: Network connected to $WS_i$.
$N_C$: Network connected to a client.
$r_{comp}$: Reliability of the component.
$f_{comp}$: Failure probability of the component, $f_{comp} = 1 - r_{comp}$.
1. INTRODUCTION
A Web service (WS) is generally defined as a software system
designed to support interoperable machine-to-machine interaction
over a network [1]. In the past, different web-based techniques
such as XML, simple object access protocol (SOAP), universal
description discovery and integration (UDDI), and web services
description language (WSDL) have been widely used to realize
operations of the web service architecture [2]. By using WS-based
techniques, users on different platforms could access different
kinds of services easily. Software developers could also combine
various WS’s into systems in order to achieve agile development.
With more and more publicly available WS-based systems, how to
implement a reliable WS platform has become an
important issue [3-6].
On the other hand, it is hard to remove all faults during software
operational phases. The software practitioners instead consider
fault tolerance (FT) designs with redundancy or backups of core
components to build highly reliable systems. The N-version
programming (NVP) and recovery block (RB) are two popular
fault tolerance mechanisms [3, 5]. The NVP utilizes functionally
equivalent software components (versions) to enable software
fault tolerance [4], and the RB executes a primary version and, if
its result is rejected by an acceptance test, alternative versions in
turn to tolerate design faults [6]. Other types of fault tolerance
techniques, such as N-self-checking programming and distributed
recovery block, were proposed in previous studies [7-16].
However, the reliability of web-based systems is hard to evaluate
[8-10]. Berman et al. investigated several backup mechanisms and
cost assessments to enhance the accuracy of software reliability
analysis [6-8]. Communications through undependable networks
between Web services make reliability assurance even more
challenging. Specifically, host overload, software failures,
hardware problems, and network congestions could all make a
WS request fail. To better assess the reliability of WS-based fault
tolerance systems, the above factors need to be considered [8, 9].
In addition, the factors of network connection and hardware
devices have not been carefully considered in the reliability
analysis of fault-tolerant WS-based systems. It is also noted that
the acceptance test (AT) and the decision mechanism (DM) are
usually assumed error free, but in fact these components may still
fail and become a reliability bottleneck of a WS system [17]. In
this paper, we propose extended NVP (ENVP) and extended RB
(ERB) for WS-based fault tolerance systems. Besides, the
corresponding reliability models are developed to analyze the
proposed systems. The experiments are implemented by a
simulation procedure. Finally, the reliability improvement
suggestions for various fault tolerance configurations are
discussed.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights
for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee. Request permissions from
InnoSWDev'14, November 16-22, 2014, Hong Kong, China
Copyright 2014 ACM 978-1-4503-3226-2/14/11... $15.00
http://dx.doi.org/10.1145/2666581.2666587
2. RELATED WORKS
In the following, two fault tolerance techniques, NVP and RB, are
surveyed.
2.1 N-Version Programming (NVP)
NVP was first proposed by Elmendorf [2, 11] and further
developed by Avizienis and Chen [9, 12]. NVP utilizes design
diversity by incorporating functionally identical versions of a
program. Different versions of a program could be executed
concurrently, and a decision mechanism (DM) determines a
consensus result [4]. NVP enhances the dependability of the
software system under the assumption that two or more versions
of a program have a low probability of producing similar
erroneous results simultaneously.
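
The NVP idea described above can be illustrated with a minimal sketch. It assumes a strict-majority DM and runs the versions sequentially rather than concurrently; the function names and the three example versions are hypothetical and only serve the illustration.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(outputs: Sequence[int]) -> Optional[int]:
    """Decision mechanism: return the value produced by a strict majority, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def run_nvp(versions: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every version on the same input and let the DM pick the consensus."""
    return majority_vote([version(x) for version in versions])


# Three hypothetical versions of "square a number"; the third one is faulty.
versions = [lambda x: x * x, lambda x: x ** 2, lambda x: x + x]
consensus = run_nvp(versions, 3)  # two versions agree on 9, one returns 6
```

Here the faulty version is outvoted, so the consensus is 9. If all versions disagree, the DM in this sketch reports no result instead of guessing.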
In the past few decades, researchers have focused on NVP to
improve the overall quality and independence of the diverse
developments, such as N-self-checking programming and N-
version executive [9]. Lyu et al. applied an NVP design paradigm
and had a significant reliability improvement [17, 19].
Additionally, a number of reliability models were proposed for
evaluating the NVP designs. The stochastic reliability and safety
models were separately developed by Tomek, Dugan, Mainini,
and Rendell in [9]. It was noticed that the common cause failures
could occur in different versions of a program. Dai et al. took
account of correlated failures and further modified the NVP
reliability model [13]. After that, Teng and Pham developed
NVP-based software reliability growth models which considered
the error-introduction rate and the error-removal efficiency in the
testing and debugging phases [20]. These models can be used to
predict the system reliability and to assist the testing strategies.
2.2 Recovery Block (RB)
RB was first proposed by Horning et al. [21] and implemented
by Randell [22], and it is commonly used for improving system
reliability [2]. The primary version of a program is executed first,
followed by an acceptance test (AT) to examine the correctness of
results. If a failure is detected, the system is restored to a previous
checkpoint state and executes the next alternative version if there
is one. In real-time applications, a distinctive AT named a
watchdog timer is combined with the system [23].
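
The RB control flow described above can be sketched as follows. This is a simplified illustration, assuming the program state is a small dictionary that can be deep-copied as a checkpoint; `recovery_block`, `faulty_primary`, `alternate`, and `non_negative` are hypothetical names, not part of any cited implementation.

```python
import copy
from typing import Callable, Dict, Optional, Sequence

State = Dict[str, int]


def recovery_block(
    state: State,
    versions: Sequence[Callable[[State], int]],
    acceptance_test: Callable[[int], bool],
) -> Optional[int]:
    """Try the primary version, then each alternate, until one passes the AT."""
    checkpoint = copy.deepcopy(state)  # save the state before any attempt
    for version in versions:
        try:
            result = version(state)
            if acceptance_test(result):
                return result
        except Exception:
            pass  # a crash is treated like a rejected result
        state.clear()
        state.update(copy.deepcopy(checkpoint))  # roll back before the next alternate
    return None  # every version was rejected


def faulty_primary(state: State) -> int:
    state["x"] = -1  # corrupts the state and returns a bad value
    return state["x"]


def alternate(state: State) -> int:
    return state["x"] * 2


def non_negative(result: int) -> bool:
    return result >= 0


outcome = recovery_block({"x": 4}, [faulty_primary, alternate], non_negative)
```

The primary corrupts the state and is rejected by the AT; after the rollback the alternate sees the original value and returns 8.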
Zhou et al. showed that the reliability of a Web-service system
alone would not reflect the actual dependability users perceived
[8], and they categorized the failures of web-service systems into
four levels: service level, host level, provider level, and network
level. A number of reliability models were consequently
developed for Web-service systems. For example, Peng and
Huang [17] proposed a reliability framework for Service-Oriented
Architecture systems considering the reliabilities of services and
network conditions. Notably, the AT is also a possible source of
errors and could even be the reliability bottleneck of the RB [17].
Therefore, Elfawal et al. [14] proposed a fault-tolerant AT to
improve the service quality.
3. FAULT-TOLERANT RELIABILITY MODELING
In this section, we analyze the reliability of WS systems by
considering a number of fault-tolerant scenarios.
3.1 Extended NVP (ENVP)
Fig. 1 illustrates the message flow of ENVP with 3 AT’s. In the
ENVP scheme, client’s requests are forwarded to n functionally
identical WS’s through the middleware. The middleware serves
as the coordinator between the service users (client) and the
service providers (services). With the introduction of AT, after
receiving the outputs of the WS’s, the middleware could first
check the correctness of the outputs before passing them to the
DM, thus reducing the negative effects of common cause failures
as mentioned in section 2. Since the AT might still not be failure-
free, m redundant AT’s are used in case the originally selected AT
fails.
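The reliability of an ENVP is formulated as follows:
$$r_{ENVP} = r_{NVP} \cdot \left(1 - \prod_{r=1}^{m} f_{AT_r}\right) \cdot r_{DM} \cdot r_{H_{mid}} \cdot r_{N_C}, \quad m \ge 1, \tag{1}$$
where
$$r_{NVP} = \sum_{j=(n+1)/2}^{n} P(j), \quad n = 3, 5, 7, \ldots \tag{2}$$
and $P(j)$ is the probability that exactly $j$ functionally identical
WS’s are executed successfully. Specifically, for $n = 3$,
$$\begin{aligned} r_{NVP} &= P(2) + P(3) \\ &= r_{sub\text{-}1}\, r_{sub\text{-}2}\, f_{sub\text{-}3} + r_{sub\text{-}1}\, f_{sub\text{-}2}\, r_{sub\text{-}3} + f_{sub\text{-}1}\, r_{sub\text{-}2}\, r_{sub\text{-}3} + r_{sub\text{-}1}\, r_{sub\text{-}2}\, r_{sub\text{-}3}, \end{aligned} \tag{3}$$
where $r_{sub\text{-}i}$ is the probability that $WS_i$ produces a correct result
and the result is transmitted to the middleware successfully,
formulated by
$$r_{sub\text{-}i} = r_{WS_i}\, r_{H_i}\, r_{P_i}\, r_{N_i}, \quad 1 \le i \le n, \tag{4}$$
and
$$f_{sub\text{-}i} = 1 - r_{sub\text{-}i}. \tag{5}$$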
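
As a numerical illustration of (1)-(5), the following sketch evaluates the ENVP reliability for arbitrary odd n by enumerating every success/failure pattern of the WS’s. The function names and the input values are hypothetical and chosen only for the example.

```python
from itertools import product
from math import prod
from typing import Sequence


def r_sub(r_ws: float, r_h: float, r_p: float, r_n: float) -> float:
    """Eq. (4): WS_i answers correctly and the answer reaches the middleware."""
    return r_ws * r_h * r_p * r_n


def r_nvp(r_subs: Sequence[float]) -> float:
    """Eq. (2): probability that at least (n + 1) / 2 of the n WS's succeed (n odd)."""
    n = len(r_subs)
    total = 0.0
    for pattern in product((True, False), repeat=n):  # every success/failure pattern
        if sum(pattern) >= (n + 1) // 2:
            total += prod(r if ok else 1.0 - r for r, ok in zip(r_subs, pattern))
    return total


def r_envp(r_subs, f_ats, r_dm, r_hmid, r_nc) -> float:
    """Eq. (1): WS majority, at least one working AT, DM, middleware host, client network."""
    return r_nvp(r_subs) * (1.0 - prod(f_ats)) * r_dm * r_hmid * r_nc


# Hypothetical inputs: three WS's with r_WS = 0.9 on perfect hosts, providers and networks.
subs = [r_sub(0.9, 1.0, 1.0, 1.0)] * 3
example = r_envp(subs, f_ats=[0.1, 0.1], r_dm=1.0, r_hmid=1.0, r_nc=1.0)
```

With these inputs, (3) gives r_NVP = 3(0.9)(0.9)(0.1) + (0.9)^3 = 0.972, and two AT’s with failure probability 0.1 each give r_ENVP = 0.972 x 0.99 = 0.96228.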
Based upon (1), the failure-free probability during the whole
ENVP operation is
$$P(ENVP_{FF}) = r_{DM}\, r_{H_{mid}}\, r_{N_C} \prod_{i=1}^{n} \left[ r_{WS_i}\, r_{H_i}\, r_{P_i}\, r_{N_i} \sum_{r=1}^{m} p^{i}_{AT_r}\, r_{AT_r} \right], \tag{6}$$
where $p^{i}_{AT_r}$ is the probability that the $r$-th AT is chosen first for
$WS_i$. For the case of successful ENVP operation with some
masked failures of the components (such as the case illustrated by
Fig. 2), the probability can be obtained by
$$P(ENVP_{SO}) = r_{ENVP} - P(ENVP_{FF}). \tag{7}$$
ENVP will have incorrect results or throw an exception under one
of the following conditions:
(i) The majority of WS’s fail.
(ii) The DM or all AT’s fail.
(iii) The middleware or the network connections fail.
Its probability is obtained by
$$P(ENVP_{FO}) = 1 - r_{ENVP}. \tag{8}$$
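
The split of ENVP outcomes in (6)-(8) can be sketched in a few lines. The function names are hypothetical, the first-choice probabilities are assumed uniform over the AT’s, and the numeric inputs are illustrative values only.

```python
from math import prod


def p_envp_ff(r_ws, r_h, r_p, r_n, r_at, p_first, r_dm, r_hmid, r_nc) -> float:
    """Eq. (6): every path component works and the first-chosen AT works.

    p_first[i][r] is the probability that AT r is chosen first for WS i.
    """
    per_ws = (
        r_ws[i] * r_h[i] * r_p[i] * r_n[i]
        * sum(p * r for p, r in zip(p_first[i], r_at))
        for i in range(len(r_ws))
    )
    return r_dm * r_hmid * r_nc * prod(per_ws)


def split_outcomes(r_envp: float, p_ff: float):
    """Eqs. (7)-(8): (failure-free, success with masked failures, failed)."""
    return p_ff, r_envp - p_ff, 1.0 - r_envp


# Hypothetical inputs: three WS's with r_WS = 0.9, two AT's with r_AT = 0.9,
# each AT equally likely to be chosen first, all other components perfect.
ones = [1.0, 1.0, 1.0]
p_ff = p_envp_ff([0.9] * 3, ones, ones, ones, [0.9, 0.9],
                 [[0.5, 0.5]] * 3, 1.0, 1.0, 1.0)
ff, so, fo = split_outcomes(0.96228, p_ff)  # 0.96228 is r_ENVP from Eq. (1) for these inputs
```

For these inputs P(ENVP_FF) = (0.9 x 0.9)^3 = 0.531441, and the three probabilities add up to 1 by construction.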
3.2 Extended RB (ERB)
Fig. 3 illustrates the message flow of ERB with 3 AT’s. In the
ERB scheme, the client’s requests are forwarded to the primary
WS ($WS_1$). After receiving the outputs of the WS, the middleware
validates the outputs by an AT. When the current output is not
accepted by the AT, the system is restored to the checkpoint and
the next alternative WS ($WS_{i+1}$) is executed. There are $m$
redundant AT’s. For each validation request, an AT is selected
randomly, and the remaining AT’s serve as backups when the
originally selected AT fails.
The reliability of an ERB is formulated as follows:
$$r_{ERB} = \left[1 - \prod_{i=1}^{n} \left(1 - r_{sub\text{-}i}\right)\right] \cdot r_{H_{mid}} \cdot r_{N_C}, \tag{9}$$
where
$$r_{sub\text{-}i} = r_{WS_i}\, r_{H_i}\, r_{P_i}\, r_{N_i} \left(1 - \prod_{r=1}^{m} f_{AT_r}\right). \tag{10}$$
Fig. 1. ENVP message flow. Data passing is depicted by arrows. Circle-ended connections are failure operations. 1’s are requests from client. 2’s are outputs of WS’s. 3’s are responses to client.
Fig. 2. ENVP successful failure recovery.
Based upon (9), the failure-free probability during the whole ERB
operation is
$$P(ERB_{FF}) = r_{H_{mid}}\, r_{N_C}\, r_{WS_1}\, r_{H_1}\, r_{P_1}\, r_{N_1} \sum_{r=1}^{m} p^{1}_{AT_r}\, r_{AT_r}. \tag{11}$$
The probability of successful ERB with some masked failures of
the components (such as the case illustrated by Fig. 4) is obtained
by
$$P(ERB_{SO}) = r_{ERB} - P(ERB_{FF}). \tag{12}$$
Finally, the probability of incorrect or exceptional ERB
operations is
$$P(ERB_{FO}) = 1 - r_{ERB}. \tag{13}$$
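
A small numerical sketch of (9)-(13) follows. The function names are hypothetical, the first-choice probabilities are assumed uniform, and the input values are illustrative only.

```python
from math import prod
from typing import Sequence


def r_sub_erb(r_ws: float, r_h: float, r_p: float, r_n: float,
              f_ats: Sequence[float]) -> float:
    """Eq. (10): WS_i path works and at least one of the m AT's works."""
    return r_ws * r_h * r_p * r_n * (1.0 - prod(f_ats))


def r_erb(r_subs: Sequence[float], r_hmid: float, r_nc: float) -> float:
    """Eq. (9): at least one WS/AT combination succeeds, plus middleware host and client network."""
    return (1.0 - prod(1.0 - r for r in r_subs)) * r_hmid * r_nc


def p_erb_ff(r_ws1, r_h1, r_p1, r_n1, r_at, p_first, r_hmid, r_nc) -> float:
    """Eq. (11): the primary WS and the first-chosen AT both work."""
    first_at_ok = sum(p * r for p, r in zip(p_first, r_at))
    return r_hmid * r_nc * r_ws1 * r_h1 * r_p1 * r_n1 * first_at_ok


# Hypothetical inputs: two WS's with r_WS = 0.9, two AT's with f_AT = 0.1,
# all hosts, providers and networks perfect.
subs = [r_sub_erb(0.9, 1.0, 1.0, 1.0, [0.1, 0.1])] * 2
reliability = r_erb(subs, 1.0, 1.0)
failure_free = p_erb_ff(0.9, 1.0, 1.0, 1.0, [0.9, 0.9], [0.5, 0.5], 1.0, 1.0)
masked = reliability - failure_free  # Eq. (12)
failed = 1.0 - reliability           # Eq. (13)
```

Here each r_sub is 0.9 x 0.99 = 0.891, so r_ERB = 1 - 0.109^2 = 0.988119, P(ERB_FF) = 0.81, P(ERB_SO) = 0.178119, and P(ERB_FO) = 0.011881.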
4. SIMULATION AND ANALYSIS
In this section, a simulation procedure will be used to evaluate
various operations of ENVP and ERB. The system reliability of
ENVP and ERB will also be analyzed. We follow similar
simulation methods from [24] and assume that the failure process
can be described by a non-homogeneous Poisson process (NHPP).
A well-known NHPP model, the Goel-Okumoto (GO) model,
is selected to obtain the failure rates of AT’s and DM’s [25].
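
For reference, the GO model has the mean value function m(t) = a(1 - e^{-bt}) and the failure intensity λ(t) = ab e^{-bt}. The sketch below evaluates both, together with the standard NHPP probability of no failure in an interval. The parameter values are hypothetical; how the failure rates of the AT’s and DM’s are derived from the fitted model is not detailed here, so this only illustrates the formulas.

```python
import math


def go_mean_value(t: float, a: float, b: float) -> float:
    """Expected cumulative number of failures by time t: m(t) = a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t))


def go_intensity(t: float, a: float, b: float) -> float:
    """Failure intensity: lambda(t) = a * b * exp(-b * t)."""
    return a * b * math.exp(-b * t)


def go_reliability(x: float, t: float, a: float, b: float) -> float:
    """NHPP probability of no failure in (t, t + x]: exp(-(m(t + x) - m(t)))."""
    return math.exp(-(go_mean_value(t + x, a, b) - go_mean_value(t, a, b)))


# Hypothetical parameters: a = 100 expected total faults, b = 0.1 detection rate.
a, b = 100.0, 0.1
initial_intensity = go_intensity(0.0, a, b)
late_reliability = go_reliability(1.0, 100.0, a, b)
```

The intensity decays exponentially, so a component observed late in its life has a lower failure rate than the same component at release.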
4.1 Simulation Procedure of ENVP
Fig. 11 illustrates the procedure to simulate ENVP operations.
The complete simulation loop is highlighted with color: the logic
related to multiple AT’s is colored red, and the DM parts are
colored brown in the figure.
The simulation initially receives the reliabilities or failure rates of
each component within the system in step 1. These values are
fixed in the operational phase and could be provided by applying
the GO model from the middleware directly. During the
simulation, a failure-free operation is recorded when the
invocations of all the replica services (WS1-WSn) successfully
pass through the components along the WS transmission path in
steps 3-6 and a correct result (R1) is returned. Besides correct
results, failures detected in steps 3-4 and 6-10 lead to WS
failures (R2) or incorrect results (R3).
Fig. 3. ERB message flow.
Fig. 4. ERB successful failure recovery.
For each AT, a constant failure rate λ is compared with a random
variable from the interval (0, 1] in step 6. The AT fails when the
random variable is lower than λ, which forces the next AT to be
invoked, if there is one, in step 10. Tests for WS’s are similar
except that the comparisons are made with the failure rates of the
WS’s.
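
The random-draw test described above can be sketched as a small Monte Carlo loop. This is a simplified illustration rather than the procedure of Fig. 11: it only models component failures consistent with (1), it ignores false acceptance and false rejection by the AT’s, and it draws from [0, 1) because that is what Python's `random.random()` returns. All names and parameter values are hypothetical.

```python
import random
from typing import Sequence


def fails(failure_rate: float, rng: random.Random) -> bool:
    """A component fails when a uniform draw falls below its failure rate."""
    return rng.random() < failure_rate


def simulate_envp(f_sub: Sequence[float], f_at: Sequence[float], f_dm: float,
                  f_hmid: float, f_nc: float, runs: int, seed: int = 0) -> float:
    """Estimate ENVP reliability as the fraction of successful runs."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(runs):
        if fails(f_hmid, rng) or fails(f_nc, rng):
            continue  # middleware host or client network is down
        order = list(range(len(f_at)))
        rng.shuffle(order)  # pick an AT at random, the rest act as backups
        at_ok = any(not fails(f_at[r], rng) for r in order)
        correct = sum(not fails(f, rng) for f in f_sub)  # WS paths that succeed
        dm_ok = not fails(f_dm, rng)
        if at_ok and dm_ok and correct > len(f_sub) - correct:
            successes += 1  # a majority of correct outputs reaches the client
    return successes / runs


# Hypothetical setting: three WS paths and two AT's, each with failure rate 0.1.
estimate = simulate_envp([0.1] * 3, [0.1, 0.1], 0.0, 0.0, 0.0, runs=20000)
```

For this setting the closed form (1) gives 0.972 x 0.99 = 0.96228, and the estimate should approach that value as the number of runs grows.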
4.2 Simulation Procedure of ERB
Fig. 12 illustrates the procedure to simulate ERB operations. The
simulation loop is highlighted with color: the logic of normal RB
is colored red, and the mechanism of multiple AT’s is colored
brown in the figure.
In the beginning, simulation receives the reliabilities or failure
rates of the components within the system in step 1. During the
simulation, a failure-free operation will be recorded by
successfully passing all components along the transmission path
in step 3, the primary WS in steps 4 and 5, and the first chosen
AT in steps 6 and 7. When a failure occurs in steps 3, 5, or 7, the
ERB will successfully recover and still execute successfully as
long as alternative WS’s (step 10) or AT’s (step 12) are available
and the chosen AT does not falsely accept an incorrect result (step
9). ERB executes incorrectly (R3) when an incorrect result is not
identified by either the middleware in step 8 or the selected AT in
step 9. ERB may also fail (R2) when no correct results are
received and identified and no spare WS’s or AT’s are available.
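
A simplified Monte Carlo sketch of the ERB outcomes follows. It is not the full procedure of Fig. 12: incorrect results that slip past the middleware or the AT (R3) and false rejections are not modeled, so each run ends as failure-free, recovered, or failed, in line with (9)-(13). All names and parameter values are hypothetical.

```python
import random
from typing import Dict, Sequence


def simulate_erb(f_path: Sequence[float], f_at: Sequence[float], f_hmid: float,
                 f_nc: float, runs: int, seed: int = 0) -> Dict[str, float]:
    """Estimate the probability of each ERB outcome.

    f_path[i] is the failure rate of WS i together with its host, provider and network.
    """
    rng = random.Random(seed)
    counts = {"failure_free": 0, "recovered": 0, "failed": 0}
    for _ in range(runs):
        outcome = "failed"
        if rng.random() >= f_hmid and rng.random() >= f_nc:
            masked = False  # becomes True once any failure has been tolerated
            for f in f_path:  # primary WS first, then the alternatives
                if rng.random() < f:
                    masked = True
                    continue
                order = list(range(len(f_at)))
                rng.shuffle(order)  # random first AT, the rest are backups
                accepted = False
                for r in order:
                    if rng.random() >= f_at[r]:
                        accepted = True
                        break
                    masked = True
                if accepted:
                    outcome = "recovered" if masked else "failure_free"
                    break
                # every AT failed for this WS: fall through to the next alternative
        counts[outcome] += 1
    return {name: count / runs for name, count in counts.items()}


# Hypothetical setting: two WS paths and two AT's, each with failure rate 0.1.
result = simulate_erb([0.1, 0.1], [0.1, 0.1], 0.0, 0.0, runs=20000)
```

For this setting, (9) and (11) give a reliability of 0.988119 and a failure-free probability of 0.81, which the estimates should approximate.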
4.3 Experimental Results
The experimental results are illustrated to analyze ENVP and
ERB operations. In order to investigate the reliability
improvement of adding AT’s, the coefficient of determination $R^2$
is calculated to analyze the degree of association between the
system reliability of ENVP/ERB and the number of WS’s (#VP) or
AT’s (#AT) [26]. The correlation coefficient $R$ between two
random variables $X$ and $Y$ is defined as
$$R = \frac{\hat{S}_{XY}}{\hat{S}_X \hat{S}_Y}, \tag{14}$$
where
$$\hat{S}_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \quad \hat{S}_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \quad \hat{S}_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}, \tag{15}$$
and $R$ ranges from -1 to 1. The coefficient of determination $R^2$ [26]
is defined as the square of the correlation coefficient, taking on
values between 0 and 1, and is used to explain how much the
variation of the system reliability is affected by the numbers of
WS’s or AT’s in use.
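
As a worked example of (14) and (15), the sketch below computes $R^2$ between the number of AT’s and the ENVP reliabilities of the “1VP” row of Table 1. The function name is hypothetical; the data values are copied from Table 1.

```python
from math import sqrt
from typing import Sequence


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample correlation coefficient R = S_XY / (S_X * S_Y), Eqs. (14)-(15)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


num_ats = [1, 2, 3, 4]
reliability_1vp = [0.89193, 0.92584, 0.93588, 0.93876]  # "1VP" row of Table 1
r_squared = correlation(num_ats, reliability_1vp) ** 2
```

The result is approximately 0.8165, in line with the $R^2$ value reported for the “1VP” row of Table 1.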
In the ENVP and ERB simulations, each test is run 100,000 times.
The number of WS’s for ENVP is denoted as “1VP,”
“3VP,” and “5VP.” The number of WS’s for ERB is denoted as
“1RB,” “2RB,” “3RB,” and “4RB,” respectively. The number of
AT’s increases from “1AT” to “4AT” in ENVP and ERB
operations.
The results of various ENVP operations are shown in Tables 1-5
and Figs. 5-7. From Table 1, the $R^2$ values reveal a high degree of
correlation between the measured system reliabilities and the
numbers of AT’s or VP’s in ENVP operations. Moreover, from
Tables 2-3 and Fig. 5, the system reliability can be gradually
improved by increasing the numbers of AT’s and VP’s.
Further, the ENVP operations become more reliable from “1AT” to
“4AT.” Thus, the proposed fault-tolerant designs can help
improve the system reliability of ENVP. If there is only 1 AT,
increasing the number of VP’s does not have a great impact on the
system reliability of ENVP. In addition, we can find that the system
reliability improves steadily and slowly as the number of AT’s
increases to 2 or more. Therefore, for the sake of cost-
effectiveness, “3VP” and a small number of AT’s may be
considered.
Table 1. Reliability and Correlation of Various ENVP Configurations
#VP \ #AT   1AT       2AT       3AT       4AT       $R^2$
1VP         0.89193   0.92584   0.93588   0.93876   0.81647
3VP         0.91951   0.97461   0.98153   0.98256   0.70089
5VP         0.91803   0.97953   0.98467   0.98502   0.66546
$R^2$       0.70760   0.81810   0.79806   0.78976
Table 2. Probability of ENVP Failure-Free Operations
#VP \ #AT   1AT       2AT       3AT       4AT
1VP         0.89193   0.92584   0.93588   0.93876
3VP         0.73887   0.80125   0.82568   0.83484
5VP         0.61372   0.68756   0.71483   0.73143
Table 3. Probability of ENVP Successful Failure Recoveries
#VP \ #AT   1AT       2AT       3AT       4AT
1VP         0.00000   0.00000   0.00000   0.00000
3VP         0.18064   0.17336   0.15585   0.14772
5VP         0.30431   0.29197   0.26984   0.25359
Table 4. Probability of ENVP Incorrect Results
#VP \ #AT   1AT       2AT       3AT       4AT
1VP         0.00002   0.00000   0.00001   0.00000
3VP         0.00000   0.00000   0.00000   0.00000
5VP         0.00000   0.00000   0.00000   0.00000
Table 5. Probability of ENVP Failures (Exceptional Results)
#VP \ #AT   1AT       2AT       3AT       4AT
1VP         0.10805   0.07416   0.06411   0.06124
3VP         0.08049   0.02539   0.01847   0.01744
5VP         0.08197   0.02047   0.01533   0.01498
Tables 4 and 5 illustrate the probability of incorrect or exceptional
ENVP operations. We can clearly see that the failure probability
can be greatly reduced as AT’s and VP’s increase. This is because
the more WS’s we use, the higher the chances are that the correct
outputs can be obtained. Similarly, when the number of AT’s
increases, more reliable ENVP operations can be expected. It is
also noted from Table 4 that the probabilities of incorrect results
are basically negligible except in the “1VP” cases, where no
DM’s are actually used.
The results of various ERB operations are shown in Tables 6-10
and Figs. 8-10. From Table 6, the $R^2$ values reveal a high degree
of correlation between the measured system reliabilities and the
numbers of AT’s or RB’s. Furthermore, from Tables 7-8 it can be
seen that the system reliability can be gradually improved by
increasing the numbers of AT’s and RB’s. However, there are
only small reliability improvements when the number of AT’s
exceeds 2. Therefore, “2AT” may be a better strategy under the
constraints of cost-effectiveness.
Further, we can see that the proposed ERB can help improve the
system reliability as AT’s and RB’s increase. Tables 9-10 display
the probability of incorrect or exceptional ERB operations. It can
be seen that the probabilities of incorrect results are close to 0 and
ERB failures are significantly reduced as the numbers of AT’s or
RB’s increase.
Fig. 5. ENVP system reliabilities.
Fig. 6. ENVP operations (2AT cases).
Fig. 7. ENVP operations (3VP cases).
Table 6. Reliability and Correlation of Various ERB Operations
#RB \ #AT   1AT       2AT       3AT       4AT       $R^2$
1RB         0.86431   0.89845   0.91078   0.91093   0.79783
2RB         0.91880   0.96846   0.97682   0.98034   0.75670
3RB         0.92254   0.97824   0.98536   0.98691   0.70830
4RB         0.92373   0.98242   0.98809   0.98811   0.67015
$R^2$       0.66713   0.73530   0.71799   0.68094
Table 7. Probability of ERB Failure-Free Operations
#RB \ #AT   1AT       2AT       3AT       4AT
1RB         0.86431   0.87390   0.87599   0.86962
2RB         0.86422   0.87570   0.87198   0.86792
3RB         0.86347   0.87295   0.87252   0.86819
4RB         0.86427   0.87354   0.87379   0.86899
Table 8. Probability of ERB Successful Failure Recoveries
#RB \ #AT   1AT       2AT       3AT       4AT
1RB         0.00000   0.02455   0.03479   0.04131
2RB         0.05458   0.09276   0.10484   0.11242
3RB         0.05907   0.10529   0.11284   0.11872
4RB         0.05946   0.10888   0.11430   0.11912
Table 9. Probability of ERB Incorrect Results
#RB \ #AT   1AT       2AT       3AT       4AT
1RB         0.00015   0.00026   0.00023   0.00024
2RB         0.00031   0.00019   0.00018   0.00020
3RB         0.00027   0.00023   0.00021   0.00014
4RB         0.00028   0.00017   0.00028   0.00030
Table 10. Probability of ERB Failures (Exceptional Results)
#RB \ #AT   1AT       2AT       3AT       4AT
1RB         0.13554   0.10129   0.08899   0.08883
2RB         0.08089   0.03135   0.02300   0.01946
3RB         0.07719   0.02153   0.01443   0.01295
4RB         0.07599   0.01741   0.01163   0.01159
5. CONCLUSIONS
This paper presents the reliability analysis of ENVP and ERB
operations. ENVP and ERB enhance the reliability of fault
tolerance designs by adding redundancy logic to protect the
vulnerable AT’s, and the proposed reliability models are
integrated into ENVP and ERB operations. Extensive
experimental results show that the combination of “3VP” and
“2AT” may be considered in the ENVP designs to balance the
reliability enhancements and the extra costs of redundant AT’s.
Similarly, “2RB” with “2AT” may be a good choice for ERB
designs. The correlation analysis illustrates that there could be a
high degree of correlation between the numbers of AT’s and the
reliability improvements. Thus, our proposed fault-tolerant
designs can help improve the system reliability of ENVP and
ERB operations. Under cost-effectiveness constraints, the proposed
simulation procedure may help software practitioners explore
appropriate configurations of fault tolerance designs.
6. ACKNOWLEDGMENTS
The work described in this paper was supported by the Ministry
of Science and Technology, Taiwan, under Grants NSC 101-
2221-E-007-034-MY2, NSC 101-2220-E-007-005, and MOST
103-2220-E-007-022.
7. REFERENCES
[1] D. Booth, H. Haas, F. McCabe, E. Newcomer, M.
Champion, C. Ferris, and D. Orchard. Web Services
Architecture. W3C Working Group Note, 2004.
[2] L. L. Pullum. Software Fault Tolerance Techniques and
Implementation, Artech House Publishers, 2001.
[3] K. Goševa-Popstojanova and A. Grnarov. Performability
and Reliability Modeling of N Version Fault Tolerant
Software in Real Time Systems. In Proceedings of the
23rd EUROMICRO Conference, pages 532-539, Budapest,
Hungary, 1997.
[4] K. Goševa-Popstojanova and A. Grnarov. N-Version
Programming with Majority Voting Decision:
Dependability Modeling and Evaluation. In
Microprocessing and Microprogramming, 38(1-5): 811-818,
1993.
[5] A. Armoush, F. Salewski, and S. Kowalewski. Recovery
Block with Backup Voting: A New Pattern with Extended
Representation for Safety Critical Embedded Systems. In
Proceedings of the 11th International Conference on
Information Technology, ICIT 2008, pages 232-237,
Bhubaneswar, India, 2008.
[6] O. Berman and U.D. Kumar. Optimization Models for
Recovery Block Schemes. European Journal of
Operational Research, 115(2):368-379, 1999.
[7] J. B. Dugan and M. R. Lyu. System Reliability Analysis of
an N-Version Programming Application. IEEE Trans.
Reliability, 43(4):513-519, 1994.
[8] B. Zhou, K. Yin, S. Zhang, H. Jiang, and A. J. Kavs. A
Tree-Based Reliability Model for Composite Web Service
with Common-Cause Failures. In Proceedings of the 5th
International Conference on Advances in Grid and
Pervasive Computing, pages 418-429, Hualien, Taiwan,
2010.
[9] M. R. Lyu. Software Fault Tolerance, John Wiley & Sons
Ltd., 1995.
[10] C. J. Hsu and C. Y. Huang. Reliability analysis using
weighted combinational models for web-based software. In
Proceedings of the 18th international conference on World
wide web, WWW 2009, pages 1131-1132, Madrid, Spain,
2009.
[11] W. R. Elmendorf. Fault-Tolerant Programming. In The 2nd
Annual International Symposium on Fault Tolerant
Computing, FTCS-2, pages 79-83, 1972.
Fig. 8. ERB System reliabilities.
Fig. 9. ERB operations (2AT cases).
Fig. 10. ERB operations (3RB cases).
[12] A. Avizienis. On the Implementation of N-Version
Programming for Software Fault-Tolerance During
Execution. IEEE International Computer Software and
Applications Conference, COMPSAC 1977, pages 149-155,
1977.
[13] Y. S. Dai, M. Xie, K. L. Poh, and S. H. Ng. A Model for
Correlated Failures in N-Version Programming. IIE
Transactions, 36(12):1183-1192, 2004.
[14] H. E. Mansour and T. Dillon. Dependability and Rollback
Recovery for Composite Web Services. IEEE Trans.
Services Computing, 4(4):328-339, 2011.
[15] Z. Zheng and M. R. Lyu. A Distributed Replication
Strategy Evaluation and Selection Framework for Fault
Tolerant Web Services. In Proceedings of the 6th IEEE
International Conference on Web Services, ICWS 2008,
pages 145-152, Beijing, China, 2008.
[16] N. Milanovic. Contract-Based Web Service Composition
Framework with Correctness Guarantees. In Proceedings
of the 2nd International Symposium on Service Availability,
pages 52-67, Berlin, Germany, 2005.
[17] K. L. Peng and C. Y. Huang. Reliability Evaluation of
Service-Oriented Architecture Systems Considering Fault-
Tolerance Designs. Journal of Applied Mathematics, 2014.
DOI= http://dx.doi.org/10.1155/2014/160608.
[18] M. R. Lyu and Y. T. He. Improving the N-Version
Programming Process Through the Evolution of a Design
Paradigm. IEEE Trans. Reliability, 42(2):179-189, 1993.
[19] M. R. Lyu, J. Chen, and A. Avižienis. Experience in
Metrics and Measurements for N-Version Programming.
International Journal of Reliability, Quality and Safety
Engineering, 1(1):41-62, 1994.
[20] X. Teng and H. Pham. A Software-Reliability Growth
Model for N-Version Programming Systems. IEEE Trans.
Reliability, 51(3):311-321, 2002.
[21] J. J. Horning, H. C. Lauer, P. M. Melliar-Smith, and B.
Randell. A Program Structure for Error Detection and
Recovery. Lecture Notes in Computer Science, 61:171-187,
1974.
[22] B. Randell. System Structure for Software Fault Tolerance.
IEEE Trans. on Software Engineering, SE-1(2):220-232,
1975.
[23] H. Hecht. Fault Tolerant Software for Real-Time
Applications. ACM Computing Surveys, 8(4):391-407,
1976.
[24] S. S. Gokhale and M. R. Lyu. A Simulation Approach to
Structure-Based Software Reliability Analysis. IEEE Trans.
Software Engineering, 31(8):643-656, 2005.
[25] A. L. Goel and K. Okumoto. Time-Dependent Error-
Detection Rate Model for Software Reliability and Other
Performance Measures. IEEE Trans. Reliability, R-
28(3):206-211, 1979.
[26] G. Keller. Statistics for Management and Economics, 8th
edition, South-Western College, 2008.
Appendix
Fig. 11. Simulation procedure of ENVP.
Input: component failure rate estimates.
Steps:
1. Global initialization.
2. Local initialization.
3. Test components along the transmission path.
4. Test WS i.
5. Randomly select a spare AT.
6. Test AT.
7. Fault detected by middleware?
8. Fault detected by AT?
9. Falsely rejected by AT?
10. More alternative AT’s?
11. More WS’s?
12. Test DM.
13. Number of correct results > number of incorrect results?
14. More runs?
15. Generate simulation results.
Outcomes:
R1. Correct result recorded.
R2. Request to WS i fails.
R3. Incorrect result recorded.
S1. Successful execution recorded.
S2. Failed execution recorded.
Output: simulation report.
Fig. 12. Simulation procedure of ERB.
Input: component failure rate estimates.
Steps:
1. Global initialization.
2. Local initialization.
3. Test components along the transmission path.
4. Select next alternative WS i.
5. Test WS i.
6. Randomly select a spare AT.
7. Test AT.
8. Fault detected by middleware?
9. Fault detected by AT?
10. More alternative WS’s?
11. Falsely rejected by AT?
12. More alternative AT’s?
13. More runs?
14. Generate simulation results.
Outcomes:
R1. Successful execution recorded.
R2. System fails.
R3. Incorrect execution recorded.
Output: simulation report.