International Journal of Advance Foundation and Research in Computer (IJAFRC)
Volume 1, Issue 6, June 2014. ISSN 2348 - 4853
107 | © 2014, IJAFRC All Rights Reserved www.ijafrc.org
An Analysis of Cloud Reliability Approaches Based on Cloud Components and Reliability Techniques
Abishi Chowdhury1, Priyanka Tripathi2
National Institute of Technical Teachers’ Training and Research, Bhopal, India1,2
[email protected] 1, [email protected] 2
A B S T R A C T
Cloud computing powers business and organizational growth by providing three basic services: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). Because cloud users worldwide demand multiple services from the cloud at the same time, the reliability of the system is the most important concern for cloud service providers. The reliability of a system can be determined by the number of failures that occur in the cloud computing environment versus the total number of tasks done by the cloud. It depends on the reliability of each component of which the system is composed. In this paper, an attempt has been made to analyze different cloud reliability techniques, the components used for reliability measurement and the methodology for measuring reliability. Based on these parameters, we have prepared a table comparing these techniques.
Index Terms: Virtual Machine; Reliability; Cloud Manager; Fault Tolerance; Fault Manager; Fault Tolerance Middleware
I. INTRODUCTION
Cloud computing provides on-demand services to its users, who can request any kind of service, in the form of software, platform, infrastructure and so on, at any time and from anywhere [1][2]. The cloud preserves its abstract nature while providing these services. A cloud comprises different servers, and a datacenter can be a collection of thousands of servers. Users request cloud infrastructure and use the servers to carry out their tasks. The cloud provides infrastructure in the form of virtual machines (VMs), and it can also provide a whole virtual infrastructure for delivering services to its users. To do this, the cloud uses different types of approaches.
Any type of system can fail. Failure in the cloud environment is not an exceptional case, because faults are inherent in the technology, and a failure can occur at any time. Failure affects different aspects of the system. Most importantly, it affects the vast worldwide business of cloud computing. Sometimes a small failure can cause a great loss to the cloud service provider, affecting both its revenue and its long-term image.
Failures can be hardware failures or software failures, and each requires a different strategy for a solution. Hardware failures include failure of the memory, failure of the disk, etc. Software failures include application failure, execution time failure, timeout failure, etc.
The reliability of cloud resources is not determined by the reliability of the individual resources alone; it depends on the reliability of their collective working. While calculating the reliability of cloud computing, it should be kept in mind whether the components are working in parallel or not.
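To illustrate why the parallel or non-parallel arrangement matters, the following minimal sketch uses the standard textbook formulas for independent components. The formulas, the independence assumption and the sample values are illustrative and are not taken from the surveyed papers.

```python
from math import prod


def series_reliability(reliabilities):
    """Every component must work: R = R_1 * R_2 * ... * R_n."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one component must work: R = 1 - (1 - R_1) * ... * (1 - R_n)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical example: two VMs, each 90% reliable, failing independently.
vms = [0.9, 0.9]
print(series_reliability(vms))    # about 0.81: both VMs are needed
print(parallel_reliability(vms))  # about 0.99: either VM is enough
```

The same two components give a lower system reliability in series and a higher one in parallel, which is why the individual figures alone do not determine the reliability of the whole.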
This paper presents a study of cloud reliability and of the techniques proposed for measuring and improving it. First, the National Institute of Standards and Technology (NIST) standards about the reliability of cloud computing are discussed. Then different techniques for measuring cloud reliability and for overcoming faults in the cloud environment are discussed.
II. RELIABILITY CONCEPT
A. NIST Standards
As stated by the National Institute of Standards and Technology (NIST) [3], reliability is broadly a function of four main components of cloud computing:
• The software and hardware offered by the cloud service providers
• The personnel resources provided
• Connectivity
• The consumers’ personnel
Measuring the reliability of a cloud computing environment is a very difficult task, for two main reasons. First, a cloud environment has a number of components, and the individual reliability of these components is different from the reliability of the components taken together. Second, reliability is highly dynamic and depends on the environment. Therefore all the possible conditions of failure in the cloud environment have to be considered first; only then can a reliability model be considered.
B. Different Reliability Approaches
1. Adaptive Fault Tolerance (AFT)
Adaptive fault tolerance in real time cloud computing (AFTRC) [4] tolerates faults on the basis of the reliability of each virtual machine, and a virtual machine is selected based on its reliability. There are two types of nodes: the virtual machines running on the cloud and the adjudication node. Each running virtual machine hosts the real time application and an acceptance test for logical validity. The adjudication node hosts the decision system, the reliability assessor and the time checker. In brief, this technique uses:
• Acceptance test: This is for checking the results of the real time algorithms.
• Time checker: It checks the time of the results produced by each module.
• Reliability assessor: It assesses the reliability of every virtual machine.
• Decision mechanism: It is used for making decision about virtual machine.
• Recovery cache: It is used for the checkpoints.
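The way these parts cooperate can be sketched as follows. This is a minimal illustration only: the acceptance test, the reliability update rule (`step`), the `min_reliability` threshold and all names are hypothetical assumptions, not the rules defined in [4].

```python
from dataclasses import dataclass


@dataclass
class VMResult:
    vm_id: str
    output: object
    elapsed: float  # seconds taken to produce the output


def acceptance_test(output):
    """Hypothetical logical-validity check: any non-None output passes."""
    return output is not None


def assess(reliability, passed, step=0.05):
    """Hypothetical reliability assessor: reward a pass, penalise a failure more."""
    change = step if passed else -2 * step
    return min(1.0, max(0.0, reliability + change))


def adjudicate(results, reliabilities, deadline, min_reliability=0.5):
    """Decision mechanism: pick the passing VM with the highest reliability."""
    passing = []
    for res in results:
        # Acceptance test plus time checker.
        passed = acceptance_test(res.output) and res.elapsed <= deadline
        reliabilities[res.vm_id] = assess(reliabilities[res.vm_id], passed)
        if passed and reliabilities[res.vm_id] >= min_reliability:
            passing.append(res)
    if not passing:
        return None  # a real system would fall back to the recovery cache here
    return max(passing, key=lambda res: reliabilities[res.vm_id])
```

In this sketch a VM that misses the deadline loses reliability even if its output is logically valid, so it becomes less likely to be selected in later rounds.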
2. Cloud Service Reliability Modelling and Analysis
The authors of [5] present a reliability model for cloud computing that deals with several types of failures affecting the success of cloud services. First, a cloud computing system in VGrADS (Virtual Grid Application Development Software) is proposed. In this system, a CMS (Cloud Management System), composed of a set of servers, serves four different responsibilities:
• Managing a request queue that contains the requests of different cloud users
• Managing computing resources, such as PCs, etc.
• Managing data resources, such as, Databases, etc.
• Scheduling requests, and assigning these to different computing and data resources.
Every request passes through the CMS, which then provides resources to it. A number of failures were analyzed, such as computing resource missing, data resource missing, overflow failure, timeout failure, database failure, network failure, software failure and hardware failure [6][7][8]. The classification of reliability stages is given in Figure 1.
Figure 1. Reliability Stages
The solution proposed for request stage reliability is a Markov model, and for execution stage reliability a graph based model. This model is further enhanced by combining the graph based model with a Bayesian network model. The overall reliability is given by the product of the request stage reliability and the execution stage reliability.
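The product rule can be written out in a few lines. The function name and the sample values below are hypothetical; the Markov and graph based models that would produce the two stage reliabilities are not reproduced here.

```python
def service_reliability(request_stage, execution_stage):
    """Overall reliability as the product of the two stage reliabilities."""
    for r in (request_stage, execution_stage):
        if not 0.0 <= r <= 1.0:
            raise ValueError("a reliability must lie in [0, 1]")
    return request_stage * execution_stage


# Hypothetical stage values: 0.99 for the request stage, 0.95 for execution.
print(service_reliability(0.99, 0.95))  # about 0.9405
```

Because both factors are at most 1, the overall reliability can never exceed that of the weaker stage.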
3. Fault-Tolerant and Reliable Computation in Cloud Computing
In this paper the authors explore the security aspect of scientific computation in cloud computing. They investigate a proper cloud selection strategy and protection against faulty and mischievous clouds [9], taking large matrix multiplication as the scientific computation. They assume that there are several clouds, each of which contains several servers. These servers are partially trusted, based on the experience of the individual client, and the client knows the reliability and cost of each cloud; different clouds have different costs. The work is divided by distributing the multiplication of rows and columns over the different clouds, and the cost calculation for the different clouds is given. The problem is then stated as follows.
Suppose a client dispatches $l_i$ rows of matrix A to be multiplied with matrix B in cloud $i$. Then the overall cost will be:
$C = \sum_{i=1}^{n} l_i C_i$
The reliability of the dispatched task will be:
$R = \left( \sum_{i=1}^{n} l_i R_i \right) / l$
where $l$ is the number of rows in matrix A and $\sum_{i=1}^{n} l_i = l$. Here $n$ is the number of clouds, and $C_i$ and $R_i$ denote the per-row cost and the reliability of cloud $i$.
The main objective is to minimize the overall cost $C$, subject to $R \geq R_s$, where $R_s$ is a previously specified minimum reliability requirement. The overall reliability of this task can also be expressed as the minimum value of the reliability of all the clouds involved in this calculation, i.e.
$R = \min_{i = 1, 2, \ldots, n} R_i$
Then all the clouds with reliability less than the minimum required reliability $R_s$ can simply be discarded. From the remaining set of clouds, the cloud with reliability greater than $R_s$ and the lowest cost $C_i$ is chosen first. Other clouds with higher reliability can then be chosen as necessary.
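The selection rule described above can be sketched as a simple greedy procedure. This is an illustration under stated assumptions, not the algorithm of [9]: the `capacity` field (the maximum number of rows a cloud accepts) is hypothetical and is introduced only so that more than one cloud is needed.

```python
from typing import NamedTuple


class Cloud(NamedTuple):
    name: str
    reliability: float   # R_i
    cost_per_row: float  # C_i
    capacity: int        # hypothetical: maximum rows this cloud accepts


def select_clouds(clouds, rows, r_min):
    """Discard clouds below r_min, then fill the cheapest remaining clouds first."""
    eligible = sorted(
        (c for c in clouds if c.reliability >= r_min),
        key=lambda c: c.cost_per_row,
    )
    allocation, remaining = {}, rows
    for cloud in eligible:
        if remaining == 0:
            break
        share = min(cloud.capacity, remaining)
        if share > 0:
            allocation[cloud.name] = share  # l_i rows dispatched to this cloud
        remaining -= share
    if remaining:
        raise ValueError("not enough capacity among sufficiently reliable clouds")
    return allocation


def total_cost(clouds, allocation):
    """Overall cost C: the sum of l_i * C_i over the clouds that were used."""
    by_name = {c.name: c for c in clouds}
    return sum(n * by_name[name].cost_per_row for name, n in allocation.items())
```

Since every selected cloud already satisfies the threshold, both the weighted-average reliability and the minimum reliability of the allocation are at least the required value.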
4. Fault Tolerance and Resilience
In this paper [10], the concepts of fault, error and failure are expressed by the following chain:
Fault → Error → Failure
The failure behavior of the servers in a data center can be obtained from studies of server failures and hardware failures. Applying a fault tolerance system to enhance the reliability of hard disks is necessary in order to cut down the number of failures considerably. According to the study, failed machines are replaced. The failure behavior of networks should also be studied, as several network components are combined to construct the data center. Based on this study, the reliability of the overall data center network is observed to be almost 99.99%. Fault tolerance is the capability of a system to perform its function in spite of the presence of failures. Faults are classified into two categories, as shown in Figure 2:
Figure 2. Classification of Faults
First, crash faults cause system components to stop functioning or to remain idle at the time of failure, for example a hard disk crash or a power outage.
Second, Byzantine faults cause system components to behave incorrectly at the time of failure. As a result, the system shows erratic behavior.
The most popular methods for resolving these two types of faults are described below:
Checking and monitoring: The system is observed continuously during its runtime in order to verify that it conforms to its specification.
Checkpoint and restart: The state of the system is captured and stored so that, when the system goes through a failure, its correct state is restored using the checkpoint information.
Replication: The essential system components are replicated so that a copy of these components is available during a failure.
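As an illustration of the checkpoint and restart method, the sketch below keeps an in-memory copy of the last known good state and rolls back to it when a step fails. The class name, the retry limit and the in-memory storage are assumptions made for this example; a real system would persist checkpoints to stable storage.

```python
import copy


class CheckpointedTask:
    """Holds a working state and the last checkpoint taken of it."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self):
        """Capture the current state as the new known-good state."""
        self._checkpoint = copy.deepcopy(self.state)

    def restart(self):
        """Discard the current state and restore the last checkpoint."""
        self.state = copy.deepcopy(self._checkpoint)


def run_with_restart(task, steps, max_retries=3):
    """Run each step; on failure roll back to the checkpoint and retry."""
    for step in steps:
        for _ in range(max_retries + 1):
            try:
                step(task.state)
                task.checkpoint()
                break
            except Exception:
                task.restart()
        else:
            raise RuntimeError("step failed after all retries")
```

A step that crashes halfway leaves no partial changes behind, because the retry always starts from the state saved after the previous successful step.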
5. Fault Tolerance Middleware
The Low Latency Fault Tolerance (LLFT) middleware [11] provides fault tolerance for distributed applications within data centers that comprise several servers, storage and networks. Using a leader/follower approach, the LLFT middleware replicates the application processes in order to protect the application from several faults, particularly the crash fault and the timing fault. With a crash fault, a process or processor does not yield any further results; with a timing fault, it does not yield a result within a specific time constraint. Byzantine faults are not handled by this middleware. The LLFT middleware supports two types of leader/follower replication:
• Semi active replication: The primary replica orders the received messages, executes the operations and provides the ordering information for the non-deterministic operations to the backup replicas.
• Semi passive replication: In addition to the above, the primary replica always communicates state updates to the backup replicas. It uses less processing power than semi active replication, but if the primary fails it incurs greater latency for recovery and reconfiguration.
The LLFT middleware comprises the following:
• Low Latency Fault Tolerance (LLFT) Messaging Protocol: It provides two main services for application messages:
    • Reliable delivery: All the members of a group receive every message that is multicast to the group on a network connection.
    • Total ordering: The primary replica in a group communicates the ordering information to all the backups in the group, and all the members of the group deliver the messages to the application in the same order.
• Low Latency Fault Tolerance (LLFT) Membership Protocol: This protocol ensures that all the members of a group have a consistent view of the membership set and of the primary replica of that group. It is much faster than a multi-round consensus protocol, which matters mainly when the primary replica fails. The primary replica decides the inclusion and exclusion of backup replicas on the basis of their ranks and precedence. The precedence of a group member is determined by the order in which the members join the group. The rank of the primary replica is 1; the backup replicas have ranks 2, 3, 4…, which are assigned by the primary replica based on their precedence.
• Low Latency Fault Tolerance (LLFT) Virtual Determinizer Framework: Applications in a cloud computing environment commonly incorporate several sources of non-determinism. Therefore, to preserve strong replica consistency, it is vital to mask these sources of non-determinism. The framework records the ordering information and the results of each non-deterministic operation performed by the primary replica, and the backup replicas carry out the operations in the same order as the primary.
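The total ordering service can be illustrated with a small sketch in which the primary assigns sequence numbers and each backup holds a message back until the ordering information for it has arrived. The classes and method names are hypothetical and greatly simplified; they do not reproduce the actual LLFT messaging protocol.

```python
class Backup:
    """Delivers messages in the order dictated by the primary."""

    def __init__(self):
        self.pending = {}    # message id -> payload, received but not yet delivered
        self.order = {}      # sequence number -> message id, sent by the primary
        self.next_seq = 1
        self.delivered = []

    def on_message(self, msg_id, payload):
        self.pending[msg_id] = payload
        self._deliver_ready()

    def on_ordering(self, seq, msg_id):
        self.order[seq] = msg_id
        self._deliver_ready()

    def _deliver_ready(self):
        # Deliver only while the next sequence number and its message are both known.
        while (self.next_seq in self.order
               and self.order[self.next_seq] in self.pending):
            msg_id = self.order.pop(self.next_seq)
            self.delivered.append(self.pending.pop(msg_id))
            self.next_seq += 1


class Primary:
    """Orders messages as they arrive and tells the backups the order."""

    def __init__(self, backups):
        self.backups = backups
        self.seq = 0
        self.delivered = []

    def on_message(self, msg_id, payload):
        self.seq += 1
        self.delivered.append(payload)
        for backup in self.backups:
            backup.on_ordering(self.seq, msg_id)
```

Even if a backup receives the messages in a different order from the primary, it delivers them to the application in the primary's order.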
6. A System Level Approach
The purpose of this approach [12] is to overcome the limitations of existing methodologies by providing fault tolerance properties as on-demand services. It gives applications the flexibility to dynamically adjust their fault tolerance properties and the required level of availability and reliability over time. The cost of resources can be reduced to a certain extent, and the performance level can be adjusted according to particular business needs. It allows users to obtain explicit fault tolerance support for their applications without comprehensive knowledge of system level procedures. By adding a new dedicated service layer between the computing framework and the applications, reliable fault tolerance support can be provided to each application while abstracting the complexity of the underlying infrastructure. To provide well-developed support, the service layer must accommodate a range of reliability mechanisms and construct fault tolerance solutions that can be delivered to different applications. To accomplish this, a fault tolerance solution is viewed as a combination of a set of distinct activities. For example, each fault tolerance mechanism, such as fault detection, replication of an application, masking and recovery, can be seen as a distinct activity, and these activities are combined to build a fault tolerance solution.
Each individual activity can be realized as a stand-alone configurable module that provides a consistent solution to a recurring system failure. Moreover, each module is associated with a set of metadata that characterize its functional, structural and operational properties. These metadata can be inspected during runtime and compared with different users' requirements in order to choose relevant activities. This approach can be achieved by implementing each module separately as a web service described by a WSDL [13] document. The feasibility of the proposed approach is shown by designing a scheme, the Fault Tolerance Manager (FTM).
FTM is composed of the following components:
• Replication Manager: It incorporates techniques to maintain consistency in a replica group by updating the state of the backup replicas and the primary replica.
• Fault Detection/Prediction Manager: It is used to detect faults promptly after their occurrence and to notify the FTM Kernel so that services from the Fault Masking Manager and the Recovery Manager are invoked. It also notifies the Resource Manager about the faulty replica so that the resource state of the cloud is updated.
• Fault Masking Manager: This component involves a collection of algorithms that mask the occurrence of failures and contain the faults to meet the high availability demands of the cloud users.
• Recovery Manager: It incorporates all the mechanisms used to restore erroneous nodes to a normal state.
• Messaging Monitor: It is used to convey necessary messages among the different replicas of a replica group and also for inter-component communication.
• Client/Admin Interface: This is used to obtain users' requirements and acts as an interface between FTM and the end users.
• FTM Kernel: It is the pivotal computing component of FTM, which manages the reliability mechanisms present in the scheme.
• Resource Manager: This component is used to efficiently allocate the required resources and to avoid under provisioning and over provisioning during failures.
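The notification flow between these components can be sketched as follows. The class interfaces, method names and status strings are hypothetical and chosen only for illustration; the FTM design in [12] does not prescribe them.

```python
class ResourceManager:
    """Tracks the resource state of each replica."""

    def __init__(self):
        self.state = {}  # replica id -> "healthy" or "faulty"

    def update(self, replica, status):
        self.state[replica] = status


class FTMKernel:
    """Invokes the masking and recovery services when notified of a fault."""

    def __init__(self, mask_fault, recover):
        self.mask_fault = mask_fault  # entry point of the Fault Masking Manager
        self.recover = recover        # entry point of the Recovery Manager

    def on_fault(self, replica):
        self.mask_fault(replica)
        self.recover(replica)


class FaultDetector:
    """Fault Detection/Prediction Manager: notifies kernel and Resource Manager."""

    def __init__(self, kernel, resources):
        self.kernel = kernel
        self.resources = resources

    def report(self, replica):
        self.resources.update(replica, "faulty")
        self.kernel.on_fault(replica)
```

Passing the masking and recovery services in as callables mirrors the idea of independent, replaceable modules combined into one fault tolerance solution.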
7. Fault Tolerant Approaches in Cloud Infrastructure
In most recent approaches, fault tolerance is handled entirely by the cloud service providers or entirely by the customers, with no collaboration between them. This sometimes leads to a partial or faulty solution. To overcome this issue, different fault tolerance policies in cloud computing have been investigated in this paper [14]. There are mainly two types of policies: in the first, fault tolerance mechanisms are handled solely by either the cloud service provider or the customer; in the second, there is collaborative management between the customers and the service providers.
Figure 3. Cloud computing Architecture
In general there are three layers in a cloud platform, as shown in Figure 3: Applications, Virtual machines and Resources. Each of these is associated with several failures. That is why there
are mainly three types of failures: application failure, virtual machine failure and hardware failure. The fault tolerance solutions for these failures are described in Figure 4:
Figure 4. Fault Tolerance
The first fault tolerance method concentrates on stateless applications such as proxies, e.g. HAProxy or MySQL-Proxy. The second is a stateful method, in which the customer must implement functions for storing the state of the server so that this state can be resumed on the next start of the system. For repairing a VM fault, sensors can be used. First, the faulty VM is deallocated from the job. Second, a new VM is allocated. Third, the tasks that were running on the failed VM are started. Fourth, the state of the VM is restored. For the physical fault tolerance system, the customer cannot see all these types of fault; they can be monitored by the cloud service providers. This is done by a monitoring system composed of sensors. These techniques are used in [15][16].
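The four VM repair steps can be sketched as one function. The data layout (a `job` dictionary with `vms` and `state` entries), the `allocate_vm` callback and the `checkpoints` mapping are hypothetical and serve only to make the steps concrete.

```python
def repair_vm(job, failed_vm, allocate_vm, checkpoints):
    """Apply the four repair steps and return the id of the replacement VM."""
    # 1. Deallocate the faulty VM from the job, keeping its task list.
    tasks = job["vms"].pop(failed_vm)
    job["state"].pop(failed_vm, None)
    # 2. Allocate a new VM.
    new_vm = allocate_vm()
    # 3. Start the tasks of the failed VM on the new VM.
    job["vms"][new_vm] = tasks
    # 4. Restore the state that was saved for the failed VM.
    job["state"][new_vm] = checkpoints.get(failed_vm)
    return new_vm
```

In this sketch the saved state comes from a separate `checkpoints` store, which corresponds to the customer-implemented state storage of the stateful method.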
8. A Virtualization and Fault Tolerance Approach
Fault tolerance is provided to the cloud infrastructure by implementing a cloud manager [17], a load balancer, a fault handler and a decision maker. A parameter, the success rate, is used specifically for fault tolerance: a job is given to a virtual machine whose success rate is higher than some specific value. In this way the chance of a fault decreases. When a VM is found to be faulty, the fault handler is responsible for updating its performance table. Based on the success rate and the performance table, the cloud infrastructure is made more fault tolerant.
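A minimal sketch of success-rate based VM selection follows. It assumes that the success rate is the fraction of jobs a VM has completed successfully and that the performance table stores job and success counts; both the definition and the table layout are assumptions, not details taken from [17].

```python
def record_outcome(table, vm, succeeded):
    """Fault handler role: update the performance table and return the new rate."""
    entry = table.setdefault(vm, {"jobs": 0, "successes": 0})
    entry["jobs"] += 1
    if succeeded:
        entry["successes"] += 1
    return entry["successes"] / entry["jobs"]


def success_rate(table, vm):
    entry = table.get(vm)
    if not entry or entry["jobs"] == 0:
        return 0.0
    return entry["successes"] / entry["jobs"]


def pick_vm(table, threshold):
    """Decision maker role: choose a VM whose success rate exceeds the threshold."""
    eligible = [vm for vm in table if success_rate(table, vm) > threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda vm: success_rate(table, vm))
```

A VM that fails repeatedly drops below the threshold and stops receiving jobs until its recorded success rate recovers.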
III. TABULAR ANALYSIS
A tabular analysis of the different approaches is given in Table 1. After the serial number and the technique name, the first column lists the component used for reliability measurement, such as the VM; some approaches broadly consider the reliability of the whole infrastructure or system. The second column gives the methodology used for cloud reliability measurement. The third column gives the techniques and components used for reliability measurement, together with the effect of the techniques on the reliability of the system. The last column states what each technique is used for.
| Sl No. | Technique Name | Components for the reliability measurement | Methodology for measuring reliability | Reliability measurement | Used for |
|---|---|---|---|---|---|
| B.1 | Adaptive fault tolerance (AFT) | Reliability of each virtual machine is measured. | Virtual machines are divided into two categories: running and adjudication virtual machines. | Acceptance test, Time checker, Reliability assessor, Decision mechanism, Recovery cache are used. | Real time cloud computing |
| B.2 | Cloud Service Reliability Modelling and Analysis | Reliability of the system is measured | Reliability is divided into two parts: request time reliability and execution time reliability | Total reliability is measured by the product of the two reliabilities. | Handling different failures in Cloud Computing Environment |
| B.3 | Fault-Tolerant and Reliable Computation in Cloud Computing | Reliability of the server is measured | Reliability and the cost relation are studied. | Reliable component with the reliability greater than a threshold value and having less cost is selected. | General Scientific computation |
| B.4 | Fault Tolerance and Resilience | Failure of a machine | Faults are divided into two parts: Crash faults and Byzantine Faults. | Reliability of the system increases with the checkpoint, restart, replacement. | Characterizing recurrent failures in Cloud environment |
| B.5 | Fault Tolerance Middleware | Fault in the system | Faults are divided into two parts: Crash and Timing fault. | By providing a middleware service overall reliability of the system increases. | Distributed applications fault tolerance |
| B.6 | A System Level Approach | Reliability as a service | By introducing the FTM for reliability | Reliability of the system increases by using FTM. | Providing fault tolerance property as on demand service |
| B.7 | Fault Tolerant Approaches in Cloud Infrastructure | Faults in the system | Faults are divided into application, virtual machine and the physical node faults. | Reliability is increased by the stateless and the state-full approaches. Replication and the sensors increase the fault tolerance. | Autonomic repairing of faults |
| B.8 | A Virtualization and Fault Tolerance Approach | Cloud infrastructure | Faults are handled by the fault handler. | Automatic updating the system by the fault handler and using success rate parameter to increase the reliability of the system. | Reducing the service time and increasing the system availability in a Cloud environment |
IV. CONCLUSION AND FUTURE WORK
In this paper, we have studied different approaches to cloud computing reliability. Cloud reliability raises many issues, such as heterogeneity and its dynamic nature. The reliability of cloud computing depends on the reliability of its components, such as VMs, physical nodes and the applications running in the cloud environment. There are several types of faults in the cloud environment, such as crash faults, timing faults and application faults. The reliability of the cloud environment can be increased by replication, restart, continuous auditing of all the information about each component of the cloud environment, and efficient sensors for monitoring. In future we will work on improving cloud reliability by proposing a mechanism that implements a collection of these reliability approaches.
V. REFERENCES
[1] Peter Mell, Timothy Grance, NIST Special Publication 800-145, National Institute of Standards and Technology, U.S. Department of Commerce.
[2] Sun Microsystems, “Introduction to Cloud Computing Architecture”, white paper.
[3] Lee Badger, Tim Grance, Robert Patt-Corner, Jeff Voas, “Cloud Computing Synopsis and Recommendations”, NIST Special Publication 800-146.
[4] Sheheryar Malik, Fabrice Huet, “Adaptive Fault Tolerance in Real Time Cloud Computing”, World Congress on Services, IEEE, 2011.
[5] Yuan-Shun Dai, Bo Yang, Jack Dongarra, Gewei Zhang, “Cloud Service Reliability: Modeling and Analysis”.
[6] D. Abramson, R. Buyya, J. Giddy, “A computational economy for grid computing and its implementation in the Nimrod-G resource broker”, Future Generation Computer Systems.
[7] Y.S. Dai, M. Xie, K.L. Poh, “Reliability of grid service systems”, Computers & Industrial Engineering, 50(1-2), 130-147.
[8] Y.S. Dai, M. Xie, K.L. Poh, “Reliability Analysis of Grid Computing Systems”, the 9th IEEE Pacific Rim Symposium on Dependable Computing, IEEE Computer Press.
[9] Jing Deng, Scott C.-H. Huang, Yunghsiang S. Han, Julia H. Deng, “Fault-Tolerant and Reliable Computation in Cloud Computing”.
[10] Ravi Jhawar, Vincenzo Piuri, “Fault Tolerance and Resilience in Cloud Computing Environments”.
[11] Wenbing Zhao, P. M. Melliar-Smith, L. E. Moser, “Fault Tolerance Middleware for Cloud Computing”, 978-0-7695-4130-3/10, IEEE.
[12] Ravi Jhawar, Vincenzo Piuri, Marco Santambrogio, “A Comprehensive Conceptual System-Level Approach to Fault Tolerance in Cloud Computing”, 978-1-4673-0750-5, IEEE, 2012.
[13] T. Erl, “Service-Oriented Architecture: Concepts, Technology, and Design”, USA: Prentice Hall PTR.
[14] Alain Tchana, Laurent Broto, Daniel Hagimont, “Fault Tolerance Approaches in Cloud Computing Infrastructures”, ISBN: 978-1-61208-187-8, 2012 IEEE.
[15] Microsoft, “Windows azure: Microsoft’s cloud services platform”, http://www.microsoft.com/windowsazure/.
[16] John Paul Walters, Vipin Chaudhary, “A fault-tolerant strategy for virtualized hpc clusters”, The Journal of Supercomputing.
[17] Pranesh Das, Pabitra Mohan Khilar, “VFT: A Virtualization and Fault Tolerance Approach for Cloud Computing”, 978-1-4673-5758-6/13, IEEE, 2013.