8/9/2019 Design Considerations for Fault Management in Wireless Sensor Networks
Design Considerations for Fault Management in
Wireless Sensor Networks
Muhammad Z Khan, Madjid Merabti, Bob Askwith
School of Computing and Mathematical Sciences,
Liverpool John Moores University
Byrom St. Liverpool, L3 3AF, UK
Abstract- Wireless Sensor Networks (WSNs) are envisioned as densely deployed tiny sensors, left unattended to monitor and
interact with physical and environmental phenomena. Faults and
failures are inevitable in WSNs due to the inhospitable
environment and unattended deployment. In this paper, we
survey fault management in WSNs, and review and categorize
current approaches and techniques dealing with faults and failures
in WSNs at different levels. The categorization is based on the
different phases of fault management, i.e. fault detection, fault
diagnosis and fault recovery. Based on the literature survey, we
elaborate on issues and problems in existing approaches to
fault management. We find that most of these approaches are
application specific and address faults only at a certain level.
Therefore, there is no guarantee that a protocol developed for one
specific application can carry over directly to another application.
Finally, we outline design criteria for developing an
application-independent fault management architecture that can provide
extensive fault management for all types of faults and failures
with a more holistic approach, to enable a wider adoption of WSN
applications and technology. This survey is part of our ongoing
research to develop application-independent fault management.
We are currently investigating mechanisms to inject application
knowledge into the management infrastructure of WSNs.
Application knowledge drives the operation of this infrastructure,
allowing it to be tailored to the special needs of each application.
I. INTRODUCTION
Recent advances in wireless networking and communication,
the development of MEMS (Micro-Electro-Mechanical
Systems) and their integration with embedded microprocessors
have enabled a new breed of sensor networks suitable for a
wide range of civil, commercial, and military applications.
Modern WSNs are made up of a collection of densely
deployed, inexpensive, tiny sensor devices that are networked
through low-power wireless communication to cooperatively
monitor physical or environmental phenomena.
Figure 1 [1] shows an example of a general WSN, in which sensor
nodes are scattered over a sensor field, sense the environment and
send their results through a sink node back to the end user, who
performs local or remote monitoring. Proposed
applications of WSNs include environmental monitoring,
habitat monitoring, structure monitoring, healthcare, disaster
prediction and management, enemy tracking on the battlefield,
security surveillance, home appliances and entertainment. The
main reasons for their popularity are low cost and ease of
deployment; such networks are particularly useful in hazardous
and inaccessible environments with little or no human
access, e.g. battlefields and chemically polluted areas.
Figure 1. Wireless Sensor Network
Sensor nodes in WSNs are expected to operate autonomously
for long periods of time and may not be easily accessible
for battery replacement and maintenance due to their physical
deployment location. Faults and failures are therefore normal
occurrences in WSNs. To guarantee network quality
of service and performance, WSNs must be
able to detect faults and failures and to perform something akin
to healing and recovering from events that might cause faults
or misbehaviour in the network. A set of functions or
applications designed specifically for this purpose is called a
fault-management platform. Most existing fault
management approaches for WSNs are integrated with
application requirements, mainly because
WSNs are energy and resource constrained, and direct
application of traditional fault management techniques incurs
significant overhead. Therefore, to design an efficient,
application-independent fault management architecture, we
must take into account a wide variety of sensor applications
with diverse needs, different sources of faults and various
network configurations. In addition, scalability, mobility and
timeliness may have to be considered [2].
In this paper, we discuss faults and fault management in
WSNs. We categorize and compare existing fault management
approaches according to three main phases:
fault detection, fault diagnosis and fault recovery. From the
literature survey, we find that most existing fault
management approaches are tightly application specific and
address faults only at a certain level. We also discuss issues
and problems in the existing approaches, and argue that there is
a need for a fault management architecture with a more holistic
approach to enable a wider adoption of WSN applications and
technology. We also outline some design criteria for
developing an application-independent architecture for WSNs.
The rest of the paper is organized as follows: Section II defines
faults, sources of faults and types of faults in WSNs. Section
III explains fault management in WSNs. In Section IV we
survey and categorize state-of-the-art fault management
approaches for WSNs and discuss their issues and
problems. Section V describes design criteria for an
application-independent fault management architecture for
WSNs, and finally Section VI concludes the paper.
II. SOURCES OF FAULTS IN WSNs
A fault is any kind of defect that can lead the system to failure, and
a failure is a situation in which the system deviates from its
specification and cannot deliver its intended functionality.
Koushanfar et al. [3] categorized faults into three types:
• Permanent faults: These faults are continuous and stable in nature, e.g. hardware faults within a component of a sensor node.
• Intermittent faults: An intermittent fault has an occasional manifestation due to the unstable characteristics of the hardware, or as a consequence of software being in a particular subset of its state space.
• Transient faults: Transient faults are the result of some temporary environmental impact on otherwise correct hardware, e.g. the impact of cosmic radiation on the sensing enclosure of a sensor node.
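To make the three categories concrete, the Python sketch below labels a node's fault history from a sequence of periodic self-test results. The function names and the labelling rule (a fault that never clears once it appears is permanent, a single episode that clears is transient, recurring episodes are intermittent) are illustrative assumptions, not the classification procedure of Koushanfar et al. [3].

```python
from enum import Enum
from typing import Sequence


class FaultType(Enum):
    NONE = "none"
    PERMANENT = "permanent"
    INTERMITTENT = "intermittent"
    TRANSIENT = "transient"


def count_bursts(observations: Sequence[bool]) -> int:
    """Count maximal runs of consecutive failed self-tests."""
    bursts = 0
    previous = False
    for faulty in observations:
        if faulty and not previous:
            bursts += 1  # a new fault episode starts here
        previous = faulty
    return bursts


def classify_fault(observations: Sequence[bool]) -> FaultType:
    """Label a node's fault history (hypothetical heuristic).

    observations[i] is True if the node's self-test failed in window i.
    """
    if not any(observations):
        return FaultType.NONE
    first = observations.index(True)
    if all(observations[first:]):
        # Once the fault appeared it never cleared: continuous and stable.
        return FaultType.PERMANENT
    if count_bursts(observations) == 1:
        # One temporary episode on otherwise correct hardware.
        return FaultType.TRANSIENT
    # The fault keeps coming back: occasional manifestation.
    return FaultType.INTERMITTENT


print(classify_fault([False, True, False, True, False]))  # FaultType.INTERMITTENT
```

A fault that is still active at the end of the observation window is labelled permanent, so in practice the window must be long enough to tell a lasting defect from a long transient episode.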
Faults in WSNs can occur for various reasons. Some of the
prominent sources of faults mentioned in [2, 4] are:
1) Node Level Faults: Nodes are fragile; they may fail due to
battery depletion, hardware or software
malfunction, or the external impact of harsh environmental
conditions (e.g. direct contact with water causing a short circuit, or a node
crushed by a falling tree).
2) Network Level Faults: Unstable links between nodes
cause network partitions and dynamic changes in network
topology, which lead to network level faults.
3) Sink Level Faults: Failure of the sink leads to a massive
failure of the network. At the sink level, the software that stores
and processes data is subject to bugs, which can cause loss of data
for as long as the fault persists.
4) Faults caused by adversaries: Since WSNs are often
deployed for critical applications, attacks by adversaries may
cause node faults and, consequently, lead the network to failure.
The lack of infrastructure and the broadcast nature of the wireless
medium enable adversaries to intrude into the network and
disrupt the entire functionality (e.g. routing, aggregation)
of individual sensor nodes [5].
III. FAULT MANAGEMENT IN WSNs
Fault management is a very important component of network
management, concerned with detecting, diagnosing and
resolving faults in the network. Proper implementation of fault
management can keep the network running at an optimum level
and minimize the risk of failure, consequently making the
network more fault tolerant [6]. Important functions of fault
management include:
• constant monitoring of system status and usage level;
• general diagnostics;
• tracing the location of potential and actual failures;
• auto-recovery and self-healing in the event of failure.
A sensor network management system can be categorized
according to the approaches it takes to monitoring and control.
From the management system organization perspective, there
are two main categories of network monitoring [7]: passive vs.
active monitoring, and pro-active vs. reactive monitoring.
• Passive monitoring: The passive model triggers alarms when a fault is detected.
• Active monitoring: Sensor nodes continuously send keep-alive or update messages to the control centre to inform it of their existence.
• Pro-active monitoring: A management system actively collects and analyzes the current network state to detect past events and to predict future events in order to maintain network performance.
• Reactive monitoring: A management system gathers information about the network state to detect whether events of interest have occurred and then takes adaptive measures to re-configure the network.
Fault management in WSNs can also be classified according to its
network management system architecture [8]: centralized,
distributed or hierarchical.
1) Centralized Architecture: The base station or central
manager has rich, practically unlimited resources. It therefore
performs complex management tasks and controls the whole
network.
2) Distributed Architecture: Instead of a single central
controller, a distributed architecture employs multiple manager
stations throughout the network. Each manager controls
a sub-region of the network and may communicate directly
with other manager stations in a co-operative manner in order
to perform management functions.
3) Hierarchical Architecture: This is a hybrid of centralized
and distributed architectures. Sub-controllers, or managers, are
distributed throughout the network in a tree-shaped hierarchy
with lower and higher levels. These intermediate managers each
manage a sub-section of the network and perform management
functions, but they do not communicate directly with each other.
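Active monitoring can be sketched as a keep-alive timeout kept by a manager. In a centralized architecture one such monitor would run at the base station; in a distributed or hierarchical architecture each (intermediate) manager could run one for its own sub-region. The class name, the 30-second timeout and the use of plain numeric timestamps are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Manager-side failure suspicion based on keep-alive messages (sketch)."""

    timeout: float  # seconds of silence before a node is reported
    last_seen: Dict[int, float] = field(default_factory=dict)

    def on_keep_alive(self, node_id: int, now: float) -> None:
        # Record the arrival time of the latest keep-alive from this node.
        self.last_seen[node_id] = now

    def suspected_nodes(self, now: float) -> List[int]:
        # Nodes silent for longer than the timeout are reported as suspected.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=30.0)
monitor.on_keep_alive(1, now=0.0)
monitor.on_keep_alive(2, now=0.0)
monitor.on_keep_alive(2, now=25.0)
print(monitor.suspected_nodes(now=31.0))  # [1]
```

A missed keep-alive only raises suspicion: the monitor cannot tell a crashed node from a broken link, which is why fault diagnosis remains a separate phase.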
IV. FAULT MANAGEMENT APPROACHES
Fault management in WSNs differs from that in traditional
networks. Researchers have recently developed various
techniques and approaches to deal with various types of faults
at different layers of the network. To provide resilience in
faulty situations, three main actions (fault detection, fault
diagnosis and fault recovery) must be performed [2, 9]. We
categorize existing approaches according to these
phases of the fault management architecture. In this section,
we discuss these phases and the state-of-the-art approaches that
perform these functions. We also highlight issues and
problems in the proposed fault management approaches for
WSNs.
A. Fault Detection (First Phase)
In the fault management of WSNs, fault detection is the first
phase, in which faults and failures in the network are properly
identified by the management system. The aim of fault
detection is to ensure that the services being provided are
functioning properly and, in some cases, to predict whether they will
continue to function properly in the future. Generally, there are
two types of failure detection: explicit detection and implicit
detection (for more details see [9]). Implicit detection is
normally carried out with a passive or active model. Recent
research has investigated automatic fault detection techniques
for WSNs, because visual observation and
manual intervention are unsuitable for networks
deployed in inaccessible and hostile environments. Existing
fault detection approaches fall mainly into two types:
centralized and distributed approaches.
1) Centralized Approaches: In centralized approaches, most of
the management and monitoring tasks are performed by the
central manager or base station, which has powerful resources
(e.g. energy, computing and memory). The central manager
generally adopts an active monitoring model to detect faults and
to track network performance and the lifetime of individual
sensor nodes. Sympathy [10], a centralized, sink-based scheme,
provides a debugging tool to detect and
localize faults that may occur due to interactions between
different sensor nodes in the network. Sympathy has two main
types of nodes: Sympathy-sink and Sympathy-node.
The Sympathy-sink sends requests to Sympathy-nodes using a
message-flooding technique to collect event data and the current
state (metrics) of the network. The Sympathy management system
actively monitors and dynamically collects run-time node
states and flow information at the Sympathy-sink. In
addition, it detects possible faults by analyzing node states
together with network performance [11]. MANNA [12] is a
policy-based centralized approach that uses external
managers to detect faults in the network. MANNA assigns
different roles (managers or agents) to various sensor nodes
depending on the network characteristics (homogeneous vs.
heterogeneous) and topology. These designated nodes
exchange request and response messages with each other for
management purposes. MANNA performs centralized fault
detection based on the analysis of gathered WSN data.
The MANNA architecture requires manual configuration and
human intervention to set up agents, which is not practical for
sensor networks deployed in inaccessible terrain. The
agent-based fault detection mechanism in [13], based on data
aggregation at the sink node, is an efficient and fast fault detection
approach with minimal energy consumption.
WinMS [14] takes a centralized approach to detecting and preventing
potential failures: the central manager compares the current or
historical states of sensor nodes against an overall network state
model (i.e. topology map, energy map, coverage map etc.). WinMS has a lightweight
TDMA protocol design that provides energy-efficient
management, data transport and local repairs. However, the
initial setup cost of creating a data gathering tree and node
schedule depends on the network density [7]. Staddon et al.
[15] proposed a similar centralized approach for tracing failed nodes,
in which the manager monitors the health of individual sensor nodes to
detect node failures in the network. The base station constructs
the whole map of the network topology from the nodes'
routing update messages, providing a method for recovering
corrupted routes. Existing centralized approaches suffer
from insufficient scalability, availability and
flexibility as the network becomes more distributed.
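The comparison of node states against a network state model can be illustrated with the energy map alone. The sketch below is not WinMS's algorithm: the function name, the linear drain-rate model and the thresholds are hypothetical, chosen only to show how a central manager could flag missing nodes, nodes about to run out of energy (prevention) and nodes draining abnormally fast (detection).

```python
from typing import Dict, List, Tuple


def check_energy_map(
    current: Dict[int, float],
    previous: Dict[int, float],
    elapsed_s: float,
    expected_drain_per_s: float,
    low_energy: float,
    tolerance: float = 2.0,
) -> Tuple[List[int], List[int], List[int]]:
    """Compare a new residual-energy report with the stored energy map.

    Returns (missing, low, abnormal):
      missing  - nodes in the previous map that did not report this round
      low      - nodes below the low-energy threshold (likely to fail soon)
      abnormal - nodes that drained more than `tolerance` times the model's
                 expected consumption since the previous report
    """
    missing = sorted(set(previous) - set(current))
    low = sorted(n for n, e in current.items() if e < low_energy)
    budget = tolerance * expected_drain_per_s * elapsed_s
    abnormal = sorted(
        n for n, e in current.items() if n in previous and previous[n] - e > budget
    )
    return missing, low, abnormal


previous = {1: 100.0, 2: 100.0, 3: 100.0}
current = {1: 95.0, 2: 70.0}
print(check_energy_map(current, previous, 10.0, 0.5, low_energy=80.0))
# ([3], [2], [2])
```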
2) Distributed Approaches: Distributed approaches employ
local decision making and distribute management
functions throughout the network. The more decisions a local
node can take by itself, the fewer messages
need to be delivered to the central manager. These approaches
conserve considerable sensor node energy and extend the lifetime
of the network [9]. Hsin and Liu [16] proposed an efficient
distributed two-phase self-monitoring mechanism (TP) for fault
detection. In TP, the health of individual nodes is
monitored to detect malfunctioning nodes and intrusions that
can result in the destruction of nodes. Each node monitors its
own health and its neighbours' health to provide local fault
detection. TP performs either explicit or implicit fault
detection, based on a two-phase timer scheme for local
co-ordination and information exchange among nodes. This
approach requires the network to be pre-configured, and each
sensor must have a unique ID. Failure detection through
neighbour co-ordination is used in a number of other
schemes [17, 18]. In these approaches, nodes co-ordinate and
exchange messages with their neighbours to
detect and identify network faults before contacting the central
node. Cheng et al. [19] proposed a distributed mechanism for
fault and anomaly detection to identify failed or misbehaving
nodes in event-driven WSN applications.
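A simplified reading of the two-phase timer idea in TP [16] is sketched below for one node watching one neighbour: after a first timer expires without hearing from the neighbour, the node becomes suspicious and (conceptually) asks its other neighbours; if none of them has heard from the neighbour before a second timer expires, the neighbour is declared failed. The class, the state names, the timer values and the way vouching neighbours are passed in are assumptions; TP's actual message formats and its explicit/implicit detection modes are not reproduced.

```python
from enum import Enum
from typing import Iterable, Optional


class NeighbourState(Enum):
    ALIVE = "alive"
    SUSPECTED = "suspected"
    FAILED = "failed"


class TwoPhaseWatcher:
    """One node's view of a single neighbour, using two timers (sketch)."""

    def __init__(self, t1: float, t2: float) -> None:
        self.t1 = t1  # phase 1: silence tolerated before suspicion
        self.t2 = t2  # phase 2: time allowed for other neighbours to vouch
        self.last_heard = 0.0
        self.suspected_at: Optional[float] = None
        self.state = NeighbourState.ALIVE

    def heard(self, when: float) -> None:
        # Evidence that the neighbour is alive resets both phases.
        self.last_heard = when
        self.suspected_at = None
        self.state = NeighbourState.ALIVE

    def tick(self, now: float, vouchers: Iterable[float] = ()) -> NeighbourState:
        """Advance the timers.

        vouchers: times at which other neighbours last heard the watched node.
        """
        if self.state is NeighbourState.FAILED:
            return self.state
        if self.state is NeighbourState.ALIVE and now - self.last_heard > self.t1:
            # Phase 1 expired: become suspicious and query the neighbourhood.
            self.state = NeighbourState.SUSPECTED
            self.suspected_at = now
        if self.state is NeighbourState.SUSPECTED and self.suspected_at is not None:
            recent = [v for v in vouchers if v > self.last_heard]
            if recent:
                # Someone else still hears the node: only our link is affected.
                self.heard(max(recent))
            elif now - self.suspected_at > self.t2:
                # Nobody vouched before phase 2 expired: declare the node failed.
                self.state = NeighbourState.FAILED
        return self.state


w = TwoPhaseWatcher(t1=10.0, t2=5.0)
w.heard(0.0)
print(w.tick(11.0))                  # NeighbourState.SUSPECTED
print(w.tick(12.0, vouchers=[9.0]))  # NeighbourState.ALIVE
```

The second phase is what lets a node distinguish a failed neighbour from a broken link between the two, at the cost of extra local messages.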
Clustering has emerged as a technique for building
scalable and energy-balanced WSN applications. Recent
research has used clustering to distribute fault
management tasks evenly across the network. Clustering divides the whole
network into groups of sensor nodes called clusters; in each cluster, one
node is selected as the Cluster Head (CH), and the other nodes are its
cluster members. The CH
executes fault management functions to detect faults
inside the cluster. Cluster-based approaches for distributed fault
detection have been proposed by Venkataraman et al. [20] and
Yao-Chang et al. [21]. The approach adopted in these mechanisms
is the exchange of messages between the CH and its member
sensor nodes. If a node is failing due to energy depletion, it
sends a fail report message to its neighbours and its CH. The
CH can thus detect the potential fault and invoke the
fault recovery mechanism to keep the cluster connected.
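The cluster-head side of such a scheme can be sketched as follows: when a member announces, through a fail report, that its battery is nearly exhausted, the CH removes it and re-attaches the members that forwarded through it. The data layout (an intra-cluster next-hop map with the CH as node 0) and the recovery rule (orphaned members route directly to the CH) are hypothetical simplifications, not the mechanisms of [20] or [21]; in a real cluster the orphans may not be within radio range of the CH.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

CH_ID = 0  # hypothetical identifier of the cluster head itself


@dataclass
class ClusterHead:
    members: Set[int]
    next_hop: Dict[int, int]  # member -> next hop towards the CH inside the cluster
    failing: Set[int] = field(default_factory=set)

    def on_fail_report(self, node: int) -> List[int]:
        """Handle a fail report from a member whose energy is nearly depleted.

        Returns the members that used the failing node as next hop; they are
        re-attached directly to the CH to keep the cluster connected.
        """
        self.failing.add(node)
        self.members.discard(node)
        self.next_hop.pop(node, None)
        orphans = sorted(m for m, hop in self.next_hop.items() if hop == node)
        for m in orphans:
            self.next_hop[m] = CH_ID
        return orphans


ch = ClusterHead(members={1, 2, 3, 4}, next_hop={1: CH_ID, 2: 1, 3: 1, 4: CH_ID})
print(ch.on_fail_report(1))  # [2, 3]
```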
Clouqueur et al. [22] use decision fusion, in which sensors
co-ordinate with each other to obtain the same global
network state information; suspicious nodes can be detected when
they send inconsistent information to the decision fusion
center. WSNMP [23] is a hierarchical network management
system based on a clustered formation. WSNMP provides a
method to monitor the network state by collecting
management data, and accordingly to control and maintain the
network resources.
Most recently, researchers have turned their attention to
statistical analysis for fault detection. These
mechanisms are simpler to operate and perform as well
as other techniques. A Mobile Agent (MA) based approach
proposed by Al-Kasassbeh et al. [24] offers a promising
way to achieve distributed management. The technique uses
a statistical method based on the Wiener filter to capture
abnormal behaviour of MIB variables. The mobile agent migrates
from one node to another, accessing an appropriate subset of the
MIB on each node and analyzing it locally to perform fault detection.
The method is shown to be more scalable and efficient, and
requires little preparation time.
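The statistical idea can be illustrated with a first-order linear predictor: for order one, the minimum mean-squared-error (Wiener) coefficient is the lag-1 autocorrelation of the mean-removed series, estimated here Yule-Walker style from a fault-free training window. Samples whose one-step prediction error exceeds k standard deviations of the training residuals are flagged. This is a minimal sketch, assuming a single numeric MIB variable sampled at a fixed rate and k = 3; it is not the filter design or the mobile-agent framework of Al-Kasassbeh et al. [24].

```python
import statistics
from typing import List, Sequence, Tuple


def fit_predictor(train: Sequence[float]) -> Tuple[float, float, float]:
    """Fit x[t] - mu ~ a * (x[t-1] - mu) on fault-free training data.

    Returns (mu, a, sigma), where sigma is the spread of training residuals.
    """
    mu = statistics.fmean(train)
    d = [x - mu for x in train]
    r0 = sum(v * v for v in d)                           # lag-0 autocorrelation
    r1 = sum(d[t] * d[t - 1] for t in range(1, len(d)))  # lag-1 autocorrelation
    a = r1 / r0 if r0 else 0.0
    residuals = [d[t] - a * d[t - 1] for t in range(1, len(d))]
    return mu, a, statistics.pstdev(residuals)


def flag_anomalies(series: Sequence[float], mu: float, a: float, sigma: float,
                   k: float = 3.0) -> List[int]:
    """Return indices whose one-step prediction error exceeds k * sigma."""
    flagged = []
    for t in range(1, len(series)):
        predicted = mu + a * (series[t - 1] - mu)
        if abs(series[t] - predicted) > k * sigma:
            flagged.append(t)
    return flagged


mu, a, sigma = fit_predictor([10, 11, 10, 11, 10, 11, 10, 11])
print(flag_anomalies([10, 11, 10, 25, 10], mu, a, sigma))  # [3, 4]
```

The spike at index 3 also corrupts the prediction for index 4, so one outlier produces two flags; a practical detector would exclude flagged samples from later predictions.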
Distributed approaches mark a major shift in the design of
fault management architectures for WSNs. Management
responsibilities are transferred towards the sensor nodes
instead of a central manager, which ultimately makes the
network more reliable and self-managed.
B. Fault Diagnosis (Second Phase)
In a fault management architecture, fault diagnosis is the next
phase after fault detection. After faults (alarms) are detected,
the management system starts to identify their real
causes. In this way, detected faults are properly
identified and distinguished from other, irrelevant false
alarms. Accurate and correct detection has
already been partly achieved by the various fault detection
methods proposed so far. However, a more comprehensive model of
faults in sensor networks is still needed to
support accurate fault diagnosis [9]. In WSNs,
when a sink node does not receive messages from a specific
region of the routing tree, it is unknown whether this is due to the
failure of a key routing node or the failure of all nodes in the
region. Staddon et al. [15] proposed a fault tracing protocol to
differentiate these two types of faults. The protocol enables the
sink node to construct the complete topology of the network
(each node piggybacks its neighbours' IDs
along with its own reading). Failed nodes can then be traced
using a divide-and-conquer strategy based on adaptive route
update messages. This approach is unsuitable for large-scale
WSNs, because under constant failures the sink would be
frequently broadcasting route update messages to the nodes,
which incurs significant overhead. To overcome this
problem, Liu et al. [11] proposed a probabilistic diagnosis
(PAD) approach for inferring the root cause of abnormal
behaviour. PAD uses a packet marking algorithm to efficiently
construct and dynamically maintain the inference model.
The algorithm incurs no extra overhead for information
gathering and provides on-line diagnosis of an operational
sensor network system by passively observing network
symptoms at the sink. A distributed diagnosis algorithm for
isolating faulty sensor nodes in WSNs is presented in [17]. The
algorithm diagnoses transient faults, namely communication faults and
incorrect sensor reading faults. Faulty nodes are
isolated by identifying the fault-free nodes.
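The idea of isolating faulty nodes by first identifying fault-free ones can be illustrated with a neighbour-agreement test: a node whose reading agrees with at least half of its neighbours is treated as fault-free, and the others are isolated. This is a simplified, hypothetical rule in the spirit of [17], not that paper's algorithm; the function name and the max_diff threshold are assumptions, and the test presumes that neighbours observe roughly the same phenomenon.

```python
from typing import Dict, List, Set


def diagnose(readings: Dict[int, float], neighbours: Dict[int, List[int]],
             max_diff: float) -> Set[int]:
    """Return the nodes judged faulty by a neighbour-agreement test."""
    faulty: Set[int] = set()
    for node, value in readings.items():
        peers = [n for n in neighbours.get(node, []) if n in readings]
        if not peers:
            continue  # no evidence either way: leave the node undecided
        agree = sum(abs(value - readings[p]) <= max_diff for p in peers)
        if 2 * agree < len(peers):
            faulty.add(node)  # disagrees with most neighbours: isolate it
    return faulty


readings = {1: 20.0, 2: 20.5, 3: 21.0, 4: 35.0}
neighbours = {1: [2, 3, 4], 2: [1, 3, 4], 3: [1, 2, 4], 4: [1, 2, 3]}
print(diagnose(readings, neighbours, max_diff=1.0))  # {4}
```

The rule breaks down when faulty nodes form a majority of a neighbourhood, or at the edge of an event where correct neighbours legitimately disagree; handling such cases is what more elaborate distributed diagnosis algorithms add.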
Sympathy [10] is a diagnosis tool for detecting and debugging
node (self), path and sink faults. Sympathy monitors the regular
network traffic generated by each healthy node, i.e. sensor
readings, routing update messages, synchronization beacons
etc. Sympathy detects faults when nodes are not delivering
sufficient data to the sink, treating the absence of monitored
traffic as an indication of faults. Sympathy identifies
whether the root cause of a failure lies in node health, connectivity
or the sink by using an empirical decision tree [2].
Chen et al. [18] and Koushanfar et al. [3] focus on sensor
hardware and actuator faults respectively, which are particularly
prone to malfunction. Intermittent faults are also an
important class of failures. Khilar et al. [25] presented a
probabilistic approach to fault diagnosis in WSNs, considering
intermittent faults in sensor nodes and permanent faults in
wireless links. Clouqueur et al. [22], in contrast, considered nodes
made faulty by harsh environmental conditions.
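A root-cause decision tree of the kind Sympathy uses can be sketched as an ordered sequence of tests on per-node metrics. The metric names, the order of the tests and the root-cause labels below are hypothetical; they illustrate how node, path and sink causes can be separated, not Sympathy's actual tree.

```python
from dataclasses import dataclass
from enum import Enum


class RootCause(Enum):
    NO_FAULT = "no fault"
    SINK = "sink: data arrived but was not processed"
    NODE = "node: crashed or rebooted"
    NO_ROUTE = "connectivity: node has no route to the sink"
    PATH_LOSS = "connectivity: packets lost on the path"


@dataclass
class NodeMetrics:
    expected: int              # packets the sink expected from the node
    arrived: int               # packets received by the sink's radio
    delivered: int             # packets handed to the sink application
    heard_by_neighbours: bool  # do neighbours still hear the node?
    has_next_hop: bool         # does the node report a routing parent?


def root_cause(m: NodeMetrics) -> RootCause:
    if m.delivered >= m.expected:
        return RootCause.NO_FAULT
    if m.arrived >= m.expected:
        return RootCause.SINK  # the network did its job; the sink did not
    if not m.heard_by_neighbours:
        return RootCause.NODE  # silent to everyone: node-level failure
    if not m.has_next_hop:
        return RootCause.NO_ROUTE
    return RootCause.PATH_LOSS


print(root_cause(NodeMetrics(10, 10, 4, True, True)))  # RootCause.SINK
```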
C. Fault Recovery (Third Phase)
Having discussed techniques for fault detection and
fault diagnosis, we next discuss how faults can be treated. Fault
recovery is the final phase of the fault management architecture,
in which the sensor network is re-configured and restructured
so that faults and failures have no further impact on
network performance [9]. WinMS [14] is a centralized
architecture that analyses the network state to detect and predict
potential failures and takes corrective and preventive actions
accordingly. In WinMS there is a scheduled period in which local
nodes listen to their environment and can
self-configure in the event of failure without prior
knowledge of the full network topology. WinMS uses a
pro-active technique to instruct nodes to send data less frequently
to conserve energy. The main advantage of WinMS is that it
adapts the network by providing both local and central
recovery mechanisms. A distributed, localized cluster-based
approach for fault detection and network connectivity recovery
was proposed by Venkataraman et al. [20]. The scheme is energy
efficient and responsive; however, it considers only permanent
faults, which arise mainly from energy depletion and
ultimately lead to loss of connectivity and coverage in
the network. To improve robustness and efficiency in the
cluster-based scenario, Lai and Chen [26] proposed the
CMATO (Cluster-Member-based fAult Tolerant mechanism)
algorithm. CMATO views the cluster as a whole and takes
advantage of the inter-cluster monitoring of nodes to detect
faults. When cluster members detect a fault in
the cluster head, they co-operatively select a new cluster
head to replace the failed one.
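Co-operative replacement of a failed cluster head needs no further negotiation if every member applies the same deterministic rule to the same information. The sketch below assumes, hypothetically, that members share a table of residual energies and elect the live member with the most energy, breaking ties by the lowest ID; CMATO's actual selection criteria and message exchange are not reproduced.

```python
from typing import Dict, Optional, Set


def elect_new_head(energy: Dict[int, float], alive: Set[int]) -> Optional[int]:
    """Pick a replacement cluster head from the surviving members.

    Highest residual energy wins; ties go to the lowest node ID so that every
    member computing this function on the same data reaches the same answer.
    """
    candidates = [n for n in energy if n in alive]
    if not candidates:
        return None  # no surviving member can take over
    return max(candidates, key=lambda n: (energy[n], -n))


energy = {1: 0.9, 2: 0.7, 3: 0.9}
print(elect_new_head(energy, alive={1, 2, 3}))  # 1
```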
Koushanfar et al. [3] proposed a heterogeneous back-up
scheme for tolerating and recovering from sensor node hardware
malfunctions. They argued that a single type of hardware
resource can back up different types of resources. The
proposed protocol focuses on five types of resources: sensing,
computing, storage, communication and actuating, which can
replace each other through suitable changes in system and
application software. WSNMP [23] is a hierarchical network
management architecture based on a clustered
formation. The protocol monitors the network with minimum
overhead, collects management data and finally
re-configures the network periodically to recover it from failure.
The protocol describes an algorithm that generates the
topology of the entire network; once the topology is modeled,
the central manager (CM) can reconfigure the network with
minimum overhead in the event of any node or link failures.
The protocol focuses on providing management
schemes for monitoring and controlling the WSN. It
also detects network faults by identifying non-responsive nodes
and, if required, re-configures the routing path. WSNMP performs
well in static homogeneous WSNs, but provides no solution for
dynamically changing topologies.
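Once the central manager holds the topology, re-configuring routes around a failed node or link amounts to recomputing a routing tree on the surviving graph. The sketch below uses a breadth-first (minimum hop count) tree rooted at the sink; this routing metric, the function name and the symmetric adjacency representation are assumptions, not WSNMP's published algorithm. Nodes missing from the result are cut off from the sink, i.e. the network has been partitioned.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def rebuild_routes(
    adjacency: Dict[int, Set[int]],
    sink: int,
    failed_nodes: Iterable[int] = (),
    failed_links: Iterable[Tuple[int, int]] = (),
) -> Dict[int, int]:
    """Return a next-hop map for a minimum-hop routing tree rooted at the sink."""
    dead = set(failed_nodes)
    broken = {frozenset(link) for link in failed_links}
    next_hop: Dict[int, int] = {}
    visited = {sink}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in sorted(adjacency.get(u, set())):
            if v in visited or v in dead or frozenset((u, v)) in broken:
                continue
            visited.add(v)
            next_hop[v] = u  # v forwards its data through u
            queue.append(v)
    return next_hop


topology = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3}}
print(rebuild_routes(topology, sink=0))                    # {1: 0, 2: 0, 3: 1, 4: 3}
print(rebuild_routes(topology, sink=0, failed_nodes=[1]))  # {2: 0, 3: 2, 4: 3}
```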
Algorithms like LEACH (Low Energy Adaptive Clustering
Hierarchy) and HEED (Hybrid Energy-Efficient Distributed clustering)
mainly focus on balanced energy consumption and
efficient cluster formation. These approaches hold that recovery
through a neighbouring cluster head is better than recovery through
a gateway node. Asim et al. [27] proposed a distributed
fault detection and recovery architecture for homogeneous WSNs.
The scheme performs local detection and recovery through mutual
co-ordination among nodes. They divide the network into a virtual grid
instead of clusters, which is more energy-efficient and lightweight,
with minimal communication cost, and provides better
reliability. However, they consider only
permanent faults.
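A virtual grid replaces cluster formation with a purely geometric partition: each node can compute its own cell from its position, and each cell elects a manager locally. The sketch below assumes known node coordinates, square cells of a fixed size and an energy-based manager rule; these are illustrative choices rather than the scheme of Asim et al. [27]. Recovery from a manager failure is then simply a re-election among the surviving nodes of the same cell.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Cell = Tuple[int, int]


def assign_cells(positions: Dict[int, Tuple[float, float]],
                 cell_size: float) -> Dict[Cell, List[int]]:
    """Map each node to the grid cell that contains its (x, y) position."""
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for node, (x, y) in positions.items():
        cells[(int(x // cell_size), int(y // cell_size))].append(node)
    return dict(cells)


def cell_managers(cells: Dict[Cell, List[int]], energy: Dict[int, float],
                  failed: FrozenSet[int] = frozenset()) -> Dict[Cell, int]:
    """Choose, per cell, the surviving node with the most residual energy."""
    managers: Dict[Cell, int] = {}
    for cell, nodes in cells.items():
        alive = [n for n in nodes if n not in failed]
        if alive:  # an empty cell is a coverage hole; handling it is out of scope
            managers[cell] = max(alive, key=lambda n: (energy[n], -n))
    return managers


positions = {1: (1.0, 1.0), 2: (3.0, 2.0), 3: (12.0, 4.0), 4: (14.0, 9.0)}
energy = {1: 0.5, 2: 0.8, 3: 0.6, 4: 0.6}
cells = assign_cells(positions, cell_size=10.0)
print(cell_managers(cells, energy))                           # {(0, 0): 2, (1, 0): 3}
print(cell_managers(cells, energy, failed=frozenset({2})))    # {(0, 0): 1, (1, 0): 3}
```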
Most of the schemes (centralized and distributed) discussed
here are not fully adaptive and self-managed. Fault
management and recovery are carried out by exchanging
excessive messages between the central manager and nodes, or
between the CH and member nodes. To overcome this problem, Yu et al.
[28] proposed a biologically inspired, self-managed fault
management architecture for WSNs. The proposed
self-managed hierarchical architecture fully distributes the
management tasks among different sensor nodes in the
network. The scheme introduces more self-managing functions
to the sensor nodes, encouraging them to rely on
monitoring their own status instead of frequently
consulting their cluster head. In addition, the authors give a
solution for replacing faulty nodes in a self-configurable
WSN. In particular, the paper examines
self-management capabilities that adapt to various requirements (e.g.
sensor node failure) in a rapidly changing and hostile
environment. Instead of the conventional distributed
clustering technique, the authors introduce a new management
layer between the cluster head and its leaf nodes. This
makes the sensor nodes more self-managed (favouring local computation
over message transmission in sensor networks).
Table I shows the overall classification and comparison of
existing fault management approaches and architectures. The
table describes the approaches with their operational
organization and the types of faults they detect, diagnose and
recover from.
D. Problems and Issues
In this section, we highlight issues and problems
in previously proposed fault management approaches for
WSNs. We believe that there is a need for an application-independent
fault management architecture with a more holistic
approach. It is evident from the literature survey that
existing fault management approaches suffer from the following
problems:
• Due to the application-specific nature of WSNs, it is very challenging to apply an existing fault management architecture from one application to another.
• Most existing approaches [16, 29] focus mainly on failure detection; there is still no comprehensive solution for fault management in WSNs from the management architecture perspective.
• Different mechanisms proposed for fault recovery [3] are not directly relevant to fault recovery in terms of network system level management (e.g. network connectivity and network coverage area).
• Fault recovery mechanisms are mainly application specific (e.g. gateway recovery, common node recovery) and focus on small regions or individual nodes; they are therefore not fully scalable.
• Some decentralized approaches, e.g. Hsin and Liu [16], require the network to be pre-configured, which is very costly for resource-constrained WSNs.
• Some management frameworks, e.g. TinyDB, MOTE-VIEW and sNMP, require an external human manager to monitor the network management functionalities.
• Some schemes [20, 27] consider only permanent faults and ignore intermittent and transient faults.
• Most existing approaches in WSNs isolate [30] failed or misbehaving nodes directly from network communication, but no adequate fault recovery procedure is available.
V. APPLICATION INDEPENDENT ARCHITECTURE
In general, WSNs are tightly application-dependent. When
WSNs are deployed, applications are not stand-alone but are
integrated into the management infrastructure. The design of
applications and management architecture in WSNs also
depends on application semantics (e.g. application-specific
data processing combined with data routing). Therefore, unlike
traditional networks, resource-constrained WSNs limit the
ability of sensor nodes to accommodate a wide variety of applications.
Furthermore, application designers have to develop complex,
specialized protocols and algorithms for specific sensor
applications [9].
From the above discussion, we conclude that there is a need for
application-independent fault management for WSNs to
improve their robustness and reliability and to enable a wider
adoption of WSN applications and technology.
TABLE I
FAULT MANAGEMENT APPROACHES CATEGORIZATION
More specifically, the architecture should have the following capabilities:
• The unique characteristics and restrictions of WSNs must be taken into account when proposing a fault management architecture for WSNs.
• The fault management architecture should be application independent, with a holistic approach that tackles faults at a number of different levels with low overhead in computation, bandwidth and energy consumption, without sacrificing reliability.
• It should reduce the uncertainty associated with WSN operations through fault detection, diagnosis and recovery.
• The architecture should be context aware, adaptive, self-organized and distributed, so that its use of network resources is kept to a minimum while it performs its fault management responsibilities.
• The architecture should be lightweight in design and operation. For this purpose a layered system structure may be used, in which each functional component is designed and programmed separately for the various sensor applications and management functions.
• To provide continuous support for fault management across the various WSN applications, a generic common interface should be provided.
• To improve resilience against failures and make the network more fault tolerant, the management architecture needs to reconfigure its operations and functions in response to changes in the environment and circumstances. In other words, the fault management architecture should be self-configured and self-organized, so that it can continuously monitor the network for faults and technical problems with little human intervention.

This survey is part of our ongoing research to develop an application-independent fault management architecture for WSNs. We found that most of the proposed schemes and approaches for fault management are tightly application specific. Therefore, we outline design criteria for developing an application-independent fault management architecture with a more holistic approach. We integrate application knowledge into the MIB (Management Information Base) of the management infrastructure. Application knowledge is the driving force that directs the infrastructure's operation, so that it can be tailored to the special needs of each application. Application knowledge may contain information about the application's network topology, its deployment scenario (indoor/outdoor), data generation and traffic, the nature of the sensed phenomenon, and the nodes' power consumption. Integrating application knowledge into the MIB of the management infrastructure provides the basis for an application-independent fault management architecture with a more generic and holistic approach, one that can easily be carried over from one sensor application to another.
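A minimal sketch of how application knowledge held in a MIB could drive fault-management parameters follows. It assumes a hypothetical `ApplicationKnowledge` record carrying the attributes listed above, a plain dictionary standing in for the MIB, and invented rules that turn the expected reporting interval into a node-silence timeout and the power consumption into a low-energy warning threshold. None of these names, applications or numbers come from the paper; they only illustrate the idea that the same fault-management code can serve different applications when only the MIB entry changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationKnowledge:
    """Hypothetical MIB entry describing one sensor application."""
    topology: str             # e.g. "cluster-tree" or "flat"
    deployment: str           # "indoor" or "outdoor"
    report_interval_s: float  # expected data-generation period per node
    phenomenon: str           # nature of the sensed phenomenon
    node_power_mw: float      # average power consumption per node


@dataclass(frozen=True)
class FaultPolicy:
    """Fault-management parameters derived from application knowledge."""
    silence_timeout_s: float    # a node silent this long becomes a suspect
    low_energy_fraction: float  # battery fraction that triggers a warning


def derive_policy(k: ApplicationKnowledge) -> FaultPolicy:
    # Assumed rule: tolerate three missed reports, or four outdoors,
    # where links are assumed to be less reliable.
    missed = 4 if k.deployment == "outdoor" else 3
    timeout = missed * k.report_interval_s
    # Assumed rule: power-hungry nodes get an earlier low-energy warning.
    threshold = 0.25 if k.node_power_mw > 50.0 else 0.15
    return FaultPolicy(silence_timeout_s=timeout, low_energy_fraction=threshold)


# Hypothetical MIB: application identifier -> application knowledge.
MIB = {
    "habitat-monitoring": ApplicationKnowledge("cluster-tree", "outdoor", 300.0, "temperature", 20.0),
    "building-monitoring": ApplicationKnowledge("flat", "indoor", 60.0, "occupancy", 80.0),
}

for app, knowledge in MIB.items():
    print(app, derive_policy(knowledge))
```

Because `derive_policy` never checks which application it is serving, supporting a new application in this sketch means adding a MIB entry rather than changing the fault-management code.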
The above discussion of the issues considered and outlined is by no means exhaustive. Several other factors and design considerations must be tackled, including node deployment versus placement, synchronization, coverage and security, before we can design and develop an application-independent fault management architecture for WSNs.
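The layered structure, generic common interface and self-configuration listed among the design criteria above can be illustrated with a small sketch. Each management function is a separately written component behind one hypothetical `FaultComponent` interface with `detect`, `diagnose` and `recover` methods (the three fault-management phases), and a hypothetical `FaultManager` runs those phases over every registered component and adapts its own monitoring period to what it observes. The heartbeat-based detector, the battery-based diagnosis rule and the halve/double period rule are assumptions chosen for illustration, not mechanisms proposed in the paper.

```python
from abc import ABC, abstractmethod


class FaultComponent(ABC):
    """Hypothetical common interface: one component per management function."""

    @abstractmethod
    def detect(self, readings: dict) -> list:
        """Return identifiers of nodes that look faulty."""

    @abstractmethod
    def diagnose(self, node: str, readings: dict) -> str:
        """Return a short cause label for a suspect node."""

    @abstractmethod
    def recover(self, node: str, cause: str) -> str:
        """Return a description of the recovery action taken."""


class HeartbeatComponent(FaultComponent):
    """Flags nodes whose last heartbeat is older than a timeout."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s

    def detect(self, readings):
        return [n for n, r in readings.items() if r["since_heartbeat_s"] > self.timeout_s]

    def diagnose(self, node, readings):
        # Assumed rule: silence plus a nearly empty battery means energy depletion.
        return "energy depletion" if readings[node]["battery"] < 0.05 else "link or crash failure"

    def recover(self, node, cause):
        if cause == "energy depletion":
            return f"exclude {node} from routing"
        return f"reroute around {node}"


class FaultManager:
    """Runs detection, diagnosis and recovery over all registered components."""

    def __init__(self, period_s: float, min_s: float = 10.0, max_s: float = 600.0) -> None:
        self.period_s = period_s
        self.min_s, self.max_s = min_s, max_s
        self.components = []

    def register(self, component: FaultComponent) -> None:
        self.components.append(component)

    def run_round(self, readings: dict) -> list:
        actions = []
        for c in self.components:
            for node in c.detect(readings):
                actions.append(c.recover(node, c.diagnose(node, readings)))
        # Self-configuration (assumed rule): monitor twice as often after faults,
        # back off to save energy when a round finds nothing.
        if actions:
            self.period_s = max(self.min_s, self.period_s / 2)
        else:
            self.period_s = min(self.max_s, self.period_s * 2)
        return actions


manager = FaultManager(period_s=60.0)
manager.register(HeartbeatComponent(timeout_s=180.0))
readings = {
    "n1": {"since_heartbeat_s": 30.0, "battery": 0.80},
    "n2": {"since_heartbeat_s": 400.0, "battery": 0.02},
}
print(manager.run_round(readings))  # ['exclude n2 from routing']
print(manager.period_s)             # 30.0
```

Halving the period after faults and doubling it after a quiet round is only one possible adaptation rule; the minimum and maximum bounds keep this sketch from either flooding the network with monitoring traffic or going silent.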
Scheme | Management system organization | Types of faults & failures addressed | Action taken
Sympathy [10] | Centralized, hierarchical; pro-active monitoring | Node self faults, network faults, sink faults, crash & time-out omission failures | Fault detection & diagnosis
MANNA [12] | Centralized + distributed; passive monitoring | Node faults | Detection, diagnosis & recovery
WinMS [14] | Centralized + distributed (hierarchical); pro-active monitoring | Node faults (weak or faulty nodes) | Detection & recovery
WSNMP [23] | Centralized + distributed (hierarchical, clustering based) | Node faults, network faults | Detection & recovery
Cluster-based approach [20, 21] | Centralized + distributed | Node faults (energy failures), network faults (network connectivity), permanent faults | Detection & recovery
Passive diagnosis of WSNs [11] | Centralized + hierarchical, probabilistic approach; passive monitoring | Node faults, network faults, transient faults | Detection, diagnosis & recovery
Efficient tracing of failed nodes [15] | Centralized; active monitoring | Node faults, route faults | Detection, diagnosis & recovery

VI. CONCLUSION
In this paper we presented a survey of fault management in WSNs and reviewed current approaches dealing with faults and failures in WSNs at different levels. We surveyed state-of-the-art protocols, algorithms and techniques applied to fault management in WSNs. Based on our literature survey, we found that current fault management approaches provide solutions for faults and failures only in specific applications and scenarios. We also identified problems and issues in existing management approaches and argued that there is a need for an application-independent fault management architecture that can provide extensive fault management solutions for all types of faults and failures in WSNs. Finally, we proposed design criteria that need to be considered when designing an application-independent fault management architecture for WSNs with a more holistic approach. Integrating application knowledge into the management infrastructure provides the basis for developing such an architecture. We are further investigating mechanisms for injecting application knowledge into the MIB of the management infrastructure.
REFERENCES
[1] http://www.alicosystems.com/Wireless%20Sensor%20Netw.
[2] L. Paradis and Q. Han, "A Survey of Fault Management in Wireless Sensor Networks," Journal of Network and Systems Management, Springer Science + Business Media, LLC, vol. 15, pp. 171-190, June 2007.
[3] F. Koushanfar, M. Potkonjak, and A. Sangiovanni-Vincentelli, "Fault tolerance techniques for wireless ad hoc sensor networks," in Proceedings of IEEE Sensors, vol. 2, 2002, pp. 1491-1496.
[4] M. Ding, D. Chen, K. Xing, and X. Cheng, "Localized fault-tolerant event boundary detection in sensor networks," in INFOCOM 2005, 24th Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2, 2005, pp. 902-913.
[5] L. B. Ruiz, I. G. Siqueira, L. B. e Oliveira, H. C. Wong, J. M. S. Nogueira, and A. A. F. Loureiro, "Fault management in event-driven wireless sensor networks," in Proceedings of the 7th ACM International Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems, Venice, Italy: ACM, 2004, pp. 149-156.
[6] J. Suhonen, M. Kohvakka, M. Hannikainen, and T. D. Hamalainen, "Embedded Software Architecture for Diagnosing Network and Node Failures in Wireless Sensor Networks," in Embedded Computer Systems: Architectures, Modeling, and Simulation, vol. 5114, Springer Berlin/Heidelberg, July 18, 2008, pp. 258-267.
[7] W. L. Lee, A. Datta, and R. Cardell-Oliver, "Network Management in Wireless Sensor Networks," in Handbook on Mobile Ad Hoc and Pervasive Communications. American Scientific Publishers, 2006.
[8] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A Survey on Sensor Networks," IEEE Communications Magazine, pp. 102-114, August 2002.
[9] M. Yu, H. Mokhtar, and M. Merabti, "Fault Management in Wireless Sensor Networks," IEEE Wireless Communications, vol. 14, pp. 13-19, 2007.
[10] N. Ramanathan, E. Kohler, L. Girod, and D. Estrin, "Sympathy: a debugging system for sensor networks," in 29th Annual IEEE International Conference on Local Computer Networks, 2004, pp. 554-555.
[11] K. Liu, M. Li, Y. Liu, M. Li, Z. Guo, and F. Hong, "Passive diagnosis for wireless sensor networks," in Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems, SenSys '08, Raleigh, NC, USA: ACM, 2008, pp. 113-126.
[12] L. B. Ruiz, J. M. Nogueira, and A. A. F. Loureiro, "MANNA: a management architecture for wireless sensor networks," IEEE Communications Magazine, vol. 41, pp. 116-125, 2003.
[13] E. Shakshuki, X. Xing, and H. Zhang, "Agent-based Fault Detection Mechanism in Wireless Sensor Networks," in Proceedings of the 2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE Computer Society, 2007.
[14] W. L. Lee, A. Datta, and R. Cardell-Oliver, "WinMS: Wireless Sensor Network-Management System, An Adaptive Policy-Based Management for Wireless Sensor Networks," School of Computer Science & Software Engineering, The University of Western Australia, CSSE Technical Report UWA-CSSE-06-001, June 2006.
[15] J. Staddon, D. Balfanz, and G. Durfee, "Efficient tracing of failed nodes in sensor networks," in Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications, Atlanta, Georgia, USA: ACM, 2002, pp. 122-130.
[16] C. Hsin and M. Liu, "A Two-Phase Self-Monitoring Mechanism for Wireless Sensor Networks," Computer Communications, special issue on Sensor Networks, vol. 29, pp. 462-476, February 2006.
[17] M.-H. Lee and Y.-H. Choi, "Distributed diagnosis of wireless sensor networks," in IEEE Region 10 Conference, TENCON '07, 2007, pp. 1-4.
[18] J. Chen, S. Kher, and A. Somani, "Distributed fault detection of wireless sensor networks," in Proceedings of the 2006 Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks, Los Angeles, CA, USA: ACM, 2006.
[19] S.-T. Cheng, S.-Y. Li, and C.-M. Chen, "Distributed Detection in Wireless Sensor Networks," in Seventh IEEE/ACIS International Conference on Computer and Information Science, ICIS '08, 2008, pp. 401-406.
[20] G. Venkataraman, S. Emmanuel, and S. Thambipillai, "A Cluster-Based Approach to Fault Detection and Recovery in Wireless Sensor Networks," in 4th International Symposium on Wireless Communication Systems, ISWCS '07, 2007, pp. 35-39.
[21] Y.-C. Chang, Z.-S. Lin, and J.-L. Chen, "Cluster based self-organization management protocols for wireless sensor networks," IEEE Transactions on Consumer Electronics, vol. 52, pp. 75-80, 2006.
[22] T. Clouqueur, K. K. Saluja, and P. Ramanathan, "Fault Tolerance in Collaborative Sensor Networks for Target Detection," IEEE Transactions on Computers, vol. 53, pp. 320-333, 2004.
[23] M. M. Alam, M. Mamun-Or-Rashid, and C. S. Hong, "WSNMP: A Network Management Protocol for Wireless Sensor Networks," in 10th International Conference on Advanced Communication Technology, ICACT '08, vol. 1, 2008, pp. 742-747.
[24] M. Al-Kasassbeh and M. Adda, "Network fault detection with Wiener filter-based agent," Journal of Network and Computer Applications, vol. 32, pp. 824-833, 2009.
[25] P. M. Khilar and S. Mahapatra, "Intermittent Fault Diagnosis in Wireless Sensor Networks," in 10th International Conference on Information Technology, ICIT 2007, 2007, pp. 145-147.
[26] Y. Lai and H. Chen, "Energy-Efficient Fault-Tolerant Mechanism for Clustered Wireless Sensor Networks," in Proceedings of 16th International Conference on Computer Communications and Networks, ICCCN '07, 2007, pp. 272-277.
[27] M. Asim, H. Mokhtar, and M. Merabti, "A Fault Management Architecture for Wireless Sensor Network," in International Wireless Communications and Mobile Computing Conference, IWCMC '08, 2008, pp. 779-785.
[28] M. Yu, H. Mokhtar, and M. Merabti, "Self-Managed Fault Management in Wireless Sensor Networks," in The Second International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, UBICOMM '08, 2008, pp. 13-18.
[29] A. Perrig, R. Szewczyk, J. D. Tygar, V. Wen, and D. E. Culler, "SPINS: Security Protocols for Sensor Networks," in ACM MobiCom '01, Rome, Italy, 2001, pp. 189-199.
[30] S. Marti, T. J. Giuli, K. Lai, and M. Baker, "Mitigating routing misbehavior in mobile ad hoc networks," in Proceedings of the 6th Annual International Conference on Mobile Computing and Networking, Boston, Massachusetts, United States: ACM, 2000, pp. 255-265.