Design Considerations for Fault Management in Wireless Sensor Networks


Muhammad Z Khan, Madjid Merabti, Bob Askwith

School of Computing and Mathematical Sciences,

Liverpool John Moores University, Byrom St., Liverpool, L3 3AF, UK

[email protected]

Abstract- Wireless Sensor Networks (WSNs) are envisioned as densely deployed tiny sensors, left unattended to monitor and

interact with physical and environmental phenomena. Faults and

failures are inevitable in WSNs due to the inhospitable

environment and unattended deployment. In this paper, we

survey fault management in WSNs, and review and categorize

current approaches and techniques dealing with faults and failure

in WSNs at different levels. The categorization is based on the

different phases of fault management, i.e. fault detection, fault

diagnosis and fault recovery. Based on the literature survey, we
elaborate on the issues and problems in existing approaches to
fault management. We observe that most of these approaches are
application specific and address faults only at a certain level.
Therefore, there is no guarantee that a protocol developed for one
specific application can carry over directly to another application.
Finally, we outline design criteria for developing an application

independent fault management architecture, which can provide

extensive fault management for all types of faults and failures

with a more holistic approach to enable a wider adoption of WSNs

applications and technology. This survey is a part of our ongoing

research to develop application independent fault management.

We are currently investigating mechanisms to inject application
knowledge into the large computing management infrastructure
of WSNs. Application knowledge is the driving force that directs the
operation of this infrastructure, so that it can be tailored to the
special needs of each application.

I. INTRODUCTION

Recent advances in wireless networking and communication,

the development of MEMS (Micro-Electro-Mechanical

Systems) and its integration with embedded microprocessors

have enabled a new breed of sensor networks suitable for a

wide range of civil, commercial, and military applications.

Modern WSNs are made up of a collection of densely

deployed, inexpensive, tiny sensor devices that are networked

through low power wireless communication to cooperatively
monitor physical or environmental phenomena.

Figure 1 [1] shows an example of a general WSN, in which sensor
nodes are scattered in a sensor field, perform sensing, and
send the results back to the end user (who performs local or
remote monitoring) through the sink node. Proposed

applications of WSNs include environmental monitoring,

habitat monitoring, structure monitoring, healthcare, disaster

prediction and management, enemy tracking in the battlefield,

security surveillance, home appliances and entertainment. The
core reasons for the popularity of WSNs are their low price and ease of
deployment. Such networks are particularly useful in hazardous
and inaccessible environments where there is little or no human
access, e.g. battlefields and chemically polluted areas.

Figure 1. Wireless Sensor Network

Sensor nodes in WSNs are expected to operate autonomously

for a long period of time and may not be easily approachable

for battery replacement and maintenance due to their physical

deployment location. Therefore, faults and failures are a normal
occurrence in WSNs. Thus, in order to guarantee the network quality
of service and performance, it is essential for WSNs to be
able to detect faults and failures and to perform something akin
to healing and recovering from events that might cause faults

or misbehaviour in the network. A set of functions or

applications designed specifically for this purpose is called a

fault-management platform. Most of the existing fault

management approaches for WSNs have been integrated with

application requirements. The main reason for this is that

WSNs are energy and resource constrained, and direct

application of traditional fault management techniques incurs a

significant overhead. Therefore, to design an application

independent and efficient fault management architecture, we

must take into account a wide variety of sensor applications

with diverse needs, different sources of faults, and with various

network configurations. In addition, scalability, mobility, and
timeliness may have to be considered [2].

In this paper, we discuss faults and fault management in

WSNs. We categorize and compare existing fault management

approaches based on their classification into three main phases:

fault detection, fault diagnosis and fault recovery. From the

literature survey, we observe that most of the existing fault
management approaches are tightly application specific and
address faults only at a certain level. We also mention issues
and problems in the existing approaches, and argue that there is
a need for a fault management architecture with a more holistic



approach to enable a wider adoption of WSNs applications and

technology. We also outline some design criteria for

developing an application independent architecture for WSNs.

The rest of the paper is organized as follows: Section II defines

faults, sources of faults and types of faults in WSNs. Section
III explains fault management in WSNs. In Section IV we
survey and categorize state of the art fault management
approaches for WSNs and mention different issues and
problems in them. Section V describes design criteria for an
application independent fault management architecture for
WSNs, and finally the paper concludes in Section VI.

II. SOURCES OF FAULTS IN WSNs

A fault is any kind of defect that leads the system to failure, and
a failure is a situation in which the system deviates from its
specification and cannot deliver its intended functionality.

Koushanfar et al. [3] categorized faults into three types:

• Permanent faults: These faults are continuous and stable in nature, e.g. hardware faults within a component of a sensor node.
• Intermittent faults: An intermittent fault has an occasional manifestation due to the unstable characteristics of the hardware, or as a consequence of software being in a particular subset of its state space.
• Transient faults: Transient faults are the result of some temporary environmental impact on otherwise correct hardware, e.g. the impact of cosmic radiation on the sensing enclosure of a sensor node.

Faults in WSNs can occur for various reasons. Some of the

prominent sources of faults mentioned in [2, 4] are:

1) Node Level Faults: Nodes are fragile; they may fail due to
the depletion of batteries, hardware or software
malfunction, and the external impact of harsh environmental
conditions (e.g. direct contact with water causing a short circuit, or a node
crushed by a falling tree).

2) Network Level Faults: Instability of the links between nodes,
which causes network partitions and dynamic changes in network
topology, leads to network level faults.

3) Sink Level Faults: Failure of the sink leads to a massive
failure of the network. At the sink level, the software that stores
and processes data is subject to bugs, which can lead to loss of data
during the period in which the fault occurs.

4) Faults caused by adversaries: Since WSNs are often

deployed for critical applications, attacks by adversaries may

cause node faults and consequently, lead the network to failure.

The lack of infrastructure and the broadcast nature of the wireless
medium enable adversaries to intrude into the network and
disrupt the functionality (e.g. routing, aggregation etc.)
of individual sensor nodes [5].

III. FAULT MANAGEMENT IN WSNs

Fault management is a very important component of network

management concerned with detecting, diagnosing, and

resolving faults in the network. Proper implementation of fault

management can keep the network running at an optimum level

and minimize the risk of failure, and consequently make the

network more fault tolerant [6]. Important functions of fault

management include:

• constant monitoring of system status and usage level
• general diagnostics
• tracing the location of potential and actual failure
• auto-recovery and self-healing in the event of failure

A sensor network management system can be categorized

according to the approaches taken for monitoring, and control.

From the management system organization perspective, there

are two main categories of network monitoring [7]: passive vs.
active monitoring, and pro-active vs. reactive monitoring.

• Passive monitoring: The passive model triggers alarms when a fault is detected.
• Active monitoring: Sensor nodes continuously send keep-alive or update messages to the control centre to inform it of their existence.
• Pro-active monitoring: The management system actively collects and analyzes the present network states to detect past events and to predict future events, in order to maintain the performance of the network.
• Reactive monitoring: The management system gathers information about the network states to detect whether events of interest have occurred, and then takes adaptive measures to re-configure the network.

Fault management in WSNs can be classified according to its
network management system architecture [8]: centralized,
distributed or hierarchical.

1) Centralized architecture: The base station or the central

manager has rich and unlimited resources. Therefore, it

performs complex management tasks and controls the whole

network.

2) Distributed Architecture: Instead of having a single central

controller, distributed architecture employs multiple manager
stations throughout the whole network. Each manager controls

a sub-region of the network and may communicate directly

with other manager stations in a co-operative manner in order

to perform management functions.

3) Hierarchical Architecture: It is a hybrid between the centralized
and distributed architectures. Sub-controllers or managers are
distributed throughout the network in a tree-shaped hierarchy
with lower and higher levels. These
managers, referred to as intermediate managers, each manage a
sub-section of the network and perform the management
functions, but they do not communicate directly with each other.
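
As a rough illustration of the active monitoring model described in this section, the following sketch shows how a manager (central, distributed or intermediate) might flag nodes whose keep-alive messages have stopped arriving. The class name HeartbeatMonitor, the timeout value and the node identifiers are assumptions made for this example; they do not come from any of the surveyed systems.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical manager-side record of keep-alive messages."""
    timeout: float                      # seconds of silence before a node is suspected
    last_seen: dict = field(default_factory=dict)

    def record(self, node_id, now):
        # Called whenever a keep-alive or update message arrives from node_id.
        self.last_seen[node_id] = now

    def suspected(self, now):
        # A node is suspected if it has been silent for longer than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=30.0)
monitor.record("n1", now=0.0)
monitor.record("n2", now=0.0)
monitor.record("n1", now=25.0)          # n1 keeps reporting, n2 goes silent
print(monitor.suspected(now=40.0))
```

In a centralized architecture a single instance of such a monitor would run at the base station, whereas in a distributed or hierarchical architecture each manager would track only the nodes of its own sub-region.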

IV. FAULT MANAGEMENT APPROACHES

Fault management in WSNs differs from that in traditional

networks. Recently, researchers have developed various

techniques and approaches to deal with various types of faults

at different layers of the network. To provide resilience in

faulty situations these three main actions (fault detection, fault

diagnosis and fault recovery) must be performed [2, 9]. We

categorize these existing approaches according to the different
phases of the fault management architecture, i.e. fault

detection, fault diagnosis and fault recovery. In this section,



we will discuss these phases and state of the art approaches to

perform these functions. We also highlight different issues and

problems in the proposed fault management approaches for

WSNs.

A. Fault Detection (First Phase)

In the fault management of WSNs, fault detection is the first
phase, where faults and failures in the network are properly
identified by the management system. The aim of fault
detection is to ensure that the services being provided are

functioning properly, and in some cases to predict if they will

continue to function properly in the future. Generally, there are

two types of failure detection: explicit detection and implicit

detection; for more details see [9]. Implicit detection is

normally carried out with a passive or active model. Recent

research has investigated automatic fault detection techniques

for WSNs, because visual observation and
manual intervention for fault detection are unsuitable when networks are
deployed in inaccessible and hostile environments. Existing
fault detection approaches are mainly classified into two types:
centralized and distributed approaches.

1) Centralized Approaches: In centralized approaches most of

the management and monitoring tasks are performed by the

central manager or base station, which has powerful resources
(e.g. energy, computing and memory). The central manager

generally adopts an active monitoring model to detect faults,

states of the network performance, and life of an individual

sensor node. A centralized sink location based scheme,

Sympathy [10], provides a debugging tool to detect and

localize faults that may occur due to the interactions between

different sensor nodes in the network. Sympathy has two main

types of nodes: Sympathy-sink and Sympathy-node.
Sympathy-sink makes requests to Sympathy-nodes using a
message-flooding technique to pool event data and current
states (metrics) of the network. The Sympathy management system
actively monitors and dynamically collects run-time node
states and flow information towards the Sympathy-sink. In
addition, it detects possible faults by analyzing node states
together with network performance [11]. MANNA [12] is a
policy-based centralized approach that uses the concept of external
managers to detect faults in the network. MANNA assigns

different roles (Managers or Agents) to various sensor nodes

depending on the network characteristics (homogeneous vs.

heterogeneous) and topology. These distinguished nodes

exchange request and response messages with each other for

management purposes. MANNA performs centralized fault

detection based on the analysis of gathered WSN data.

The MANNA architecture requires manual configuration and
human intervention to set up agents, which is not practical for
sensor networks deployed in inaccessible terrain. Similarly,
the agent-based fault detection mechanism of [13], based on data
aggregation at the sink node, is reported to be an efficient and fast
fault detection approach with minimum energy consumption.

WinMS [14] proposes a centralized approach in which the central
manager compares the current or historical states of sensor nodes
against the overall network state model (i.e. topology map,
energy map, coverage map etc.) to detect and prevent
potential failures in the network. WinMS has a lightweight
TDMA protocol design that provides energy-efficient
management, data transport and local repairs. However, the
initial setup cost for creating a data gathering tree and node
schedule depends on the network density [7]. Staddon et al.
[15], while tracing failed nodes in the network, proposed a
similar centralized management approach, whereby the
manager monitors the health of individual sensor nodes to
detect node failures in the network. The base station constructs
a map of the whole network topology with the help of the nodes'
routing update messages, which also provides a method for recovering
corrupted routes. Existing centralized approaches suffer
from problems such as insufficient scalability, availability and
flexibility when the network becomes more distributed.

2) Distributed Approaches: Distributed approaches employ the

concept of local decision making and distribute management

functions throughout the network. The more decisions a local
node can take by itself, the fewer messages
need to be delivered to the central manager. These approaches
conserve a lot of sensor node energy and help ensure the longevity
of the network [9]. Hsin and Liu [16] proposed an efficient
distributed two-phase self-monitoring mechanism (TP) for fault
detection. In TP, the health of each individual node is
monitored to detect malfunctioning nodes and intrusions that
can result in the destruction of nodes. Each node monitors its
own health and its neighbours' health to provide local fault
detection. TP performs either explicit or implicit fault
detection, based on a two-phase timer scheme for local
co-ordination and information exchange among nodes. This
approach requires the network to be pre-configured and each
sensor to have a unique ID. Failure detection through

neighbour co-ordination is used in a number of different

schemes [17, 18]. In these approaches, nodes co-ordinate and

exchange messages and information with their neighbours to

detect and identify network faults before contacting the central

node. Cheng et al. [19] proposed a distributed mechanism for
fault and anomaly detection to identify failed or misbehaving
nodes in event-driven WSN applications.
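
Neighbour co-ordination of the kind described above can be pictured with a small sketch in which each node's reading is compared with the median of its neighbours' readings. This is only an illustrative stand-in, not the algorithm of [17], [18] or [19]; the function name suspect_nodes, the tolerance value and the sample readings are all assumptions made for the example.

```python
from statistics import median


def suspect_nodes(readings, neighbours, tolerance):
    """Flag nodes whose reading deviates from the median of their neighbours.

    readings:   dict node_id -> latest sensed value
    neighbours: dict node_id -> list of neighbouring node_ids
    tolerance:  largest deviation still considered normal (assumed, application specific)
    """
    suspects = []
    for node, value in readings.items():
        peer_values = [readings[p] for p in neighbours.get(node, []) if p in readings]
        if len(peer_values) < 2:
            continue  # too few neighbours to make a local decision
        if abs(value - median(peer_values)) > tolerance:
            suspects.append(node)
    return sorted(suspects)


readings = {"a": 20.1, "b": 20.4, "c": 35.0, "d": 19.8}
neighbours = {n: [p for p in readings if p != n] for n in readings}
print(suspect_nodes(readings, neighbours, tolerance=2.0))
```

Because the decision uses only local readings, a node flagged in this way could be reported to the central node in a single message, which is the saving in communication that distributed approaches aim for.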

Clustering has emerged as a technique for building
scalable and energy-balanced applications for WSNs. Recent
research has used the clustering approach to distribute fault
management tasks evenly in the network. Clustering divides the whole
network into groups of sensor nodes called clusters, where one
node in each cluster is selected as the Cluster Head (CH) and the
other nodes associated with it are called cluster members. The CH

executes different fault management functions to detect faults

inside the cluster. Cluster based approaches for distributed fault

detection are proposed by Venkataraman et al. [20] and Yao-

Chang et al. [21]. The approach adopted in these mechanisms

is the exchange of messages between the CH and its member

sensor nodes. If a node is failing due to energy depletion, it
sends a fail report message to its neighbours and its CH. The
CH in this way can detect the potential fault and invoke the
fault recovery mechanism to keep the cluster connected.
Clouqueur et al. [22] use the concept of decision fusion, in which sensors
co-ordinate with each other to obtain the same global
network state information. The scheme can detect suspicious nodes if



they then send inconsistent information to the decision fusion

center. WSNMP [23] is a hierarchical network management

system based on clustered formation. WSNMP provides a

method to monitor the network states by collecting

management data and accordingly control and maintain the

network resources.

Most recently, researchers have taken an interest in
statistical analysis for fault detection. These
mechanisms are simple to operate and are reported to perform as well
as other techniques. Al-Kasassbeh et al. [24] proposed a Mobile Agent (MB)
based approach as a technology that can help to achieve distributed
management. The proposed technique uses a statistical method
based on the Wiener filter to capture abnormal behaviour of
the MIB variables. The mobile agent migrates from one node to
another, accessing an appropriate subset of the MIB from each
node and analyzing it locally to perform fault detection.
The method is reported to be more scalable and efficient, and to
require little preparation time.
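
To give a feel for statistical fault detection on a monitored variable, the sketch below flags values that lie far from the recent mean. It deliberately uses a simple rolling mean and standard deviation rather than the Wiener filter of [24], so it should be read as a simplified stand-in under that assumption; the class name, window size and threshold are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags a value that deviates strongly from the recent history of a variable."""

    def __init__(self, window=10, threshold=3.0):
        self.history = deque(maxlen=window)   # most recent normal values
        self.threshold = threshold            # allowed deviation in standard deviations

    def observe(self, value):
        anomalous = False
        if len(self.history) >= 3:            # need a few samples before judging
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomalous = True
        if not anomalous:
            self.history.append(value)        # keep outliers out of the baseline
        return anomalous


detector = RollingAnomalyDetector(window=5, threshold=3.0)
samples = [10, 12, 11, 10, 12, 11, 50, 11]
print([detector.observe(v) for v in samples])
```

A mobile agent could carry such a detector from node to node and apply it to the locally stored values of a management variable, so that only the verdict, not the raw data, needs to travel.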

Distributed approaches provide a major shift in the design of

fault management architecture for WSNs. Management
responsibilities are transferred more towards the sensor nodes,

instead of a central manager, which ultimately makes the

network more reliable and self-managed.

B. Fault Diagnosis (Second Phase)

In a fault management architecture, fault diagnosis is the next

phase after fault detection. After the detection of faults

(alarms) the management system will start to identify the real

causes of faults. In this way detected faults are properly

identified and distinguished from the other irrelevant false

alarms. The accuracy and correctness of a detected fault have

already been partly achieved using various fault detection

methods already proposed. However, there is still a need for a
more comprehensive model of faults in sensor networks to

support the systems for accurate fault diagnosis [9]. In WSNs,

when a sink node does not receive messages from a specific

region of the routing tree, it is unknown whether it is due to the

failure of a key routing node, or failure of all nodes in the

region. Staddon et al. [15] proposed a fault tracing protocol to

differentiate these two types of faults. The protocol enables the
sink node to construct the complete topology of the network
(each individual node piggybacks its neighbour node's ID,
along with its own reading). Failed nodes can then be traced by
using a divide-and-conquer strategy based on adaptive route
update messages. This approach is unsuitable for large scale
WSNs, because if there are constant failures, the sink would be
frequently broadcasting routing update messages to the nodes,

which will incur significant overhead. To overcome this

problem Liu et al. [11] proposed a probabilistic diagnosis

(PAD) approach for inferring the root cause of abnormal

behavior. PAD uses a packet marking algorithm for efficiently

constructing and dynamically maintaining the inference model.

The algorithm does not incur extra overhead for information

gathering and provides an on-line diagnosis of an operational
sensor network system, which passively observes the network
symptoms from the sink. A distributed diagnosis algorithm for
isolating faulty sensor nodes in WSNs is presented in [17]. The
algorithm diagnoses transient faults: communication and

incorrect sensor reading faults. Faulty nodes are simply

isolated by identifying fault-free nodes.
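
The problem of a silent region of the routing tree, discussed above, can be narrowed down once the sink knows the topology. The sketch below is not the tracing protocol of [15]; it only illustrates, under assumed data structures, how the sink could pick out the silent nodes whose parent is still reporting, since these are the first nodes worth probing to tell a single failed routing node apart from a whole failed region. The function name and the example tree are hypothetical.

```python
def root_cause_candidates(parent, silent):
    """Return silent nodes whose parent in the routing tree is not silent.

    parent: dict node_id -> parent node_id (the sink maps to None)
    silent: iterable of node_ids from which the sink receives no messages
    """
    silent = set(silent)
    return sorted(node for node in silent if parent.get(node) not in silent)


# Assumed routing tree: sink -> a, b ; a -> c, d ; c -> e
parent = {"sink": None, "a": "sink", "b": "sink", "c": "a", "d": "a", "e": "c"}

# The whole subtree below "a" is silent: "a" is the only node to probe first.
print(root_cause_candidates(parent, {"a", "c", "d", "e"}))
```

If the probed candidate turns out to be alive, the search would continue one level further down the tree, which is the divide-and-conquer flavour of the tracing idea.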

Sympathy [10] is a diagnosis tool for detecting and debugging
node (self), path and sink faults. Sympathy monitors regular

network traffic generated by each healthy node, i.e. sensor

readings, routing update messages, synchronization beacons

etc. Sympathy detects faults when nodes are not delivering

sufficient data to the sink, and treats the absence of monitored
traffic as an indication of faults. Sympathy identifies
whether the root cause of a failure is node health, a connectivity
problem, or a problem at the sink by using an empirical decision tree [2].
Chen et al. [18] and Koushanfar et al. [3] focus on sensor

hardware and actuator faults respectively, which are more

prone to be malfunctioning. Intermittent faults are also an

important class of failures. Khilar et al. [25] presented a

probabilistic approach to fault diagnosis in WSNs, considering

intermittent faults in sensor nodes and permanent faults in

wireless links. However, Clouqueur et al. [22] considered

faulty nodes due to harsh environmental conditions.

C. Fault Recovery (Third Phase)

We have discussed different techniques for fault detection and
fault diagnosis; we next discuss how faults can be treated. Fault
recovery is the final phase of the fault management architecture,
whereby the sensor network is re-configured and restructured
in such a way that faults and failures do not impact further on
the network performance [9]. WinMS [14] is a centralized
architecture that analyses the network state to detect and predict
potential failures and takes corrective and preventive actions
accordingly. In WinMS there is a schedule period in which local
nodes listen to activity in their environment and can
self-configure in the event of failure without prior
knowledge of the full network topology. WinMS uses a
pro-active technique to instruct nodes to send data less frequently
to conserve energy. The main advantage of WinMS is that it
adaptively adjusts the network by providing local and central
recovery mechanisms. A distributed, localized cluster-based
approach for fault detection and network connectivity recovery
is proposed by Venkataraman et al. [20]. The scheme is energy
efficient and responsive; however, it considers only permanent
faults, in particular those caused by energy depletion,
which ultimately lead to the loss of connectivity and coverage in
the network. To improve the robustness and efficiency of
cluster-based schemes, Lai and Chen [26] proposed the
CMATO (Cluster-Member-based fAult Tolerant mechanism)
algorithm. CMATO views the cluster as a whole and takes
advantage of the inter-cluster monitoring of nodes to detect
faults. When the cluster members detect a fault that is caused by
the cluster head, they act co-operatively to select a new cluster
head to replace the failed one.
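
The replacement of a failed cluster head by its members can be sketched as a simple deterministic election. The text above does not state which selection criterion CMATO or the scheme of [20] uses, so the rule below (highest residual energy, ties broken by the smallest node ID) is purely an assumption for illustration, as are the function name and the energy values.

```python
def elect_cluster_head(members, failed_head):
    """Pick a replacement cluster head from the surviving members.

    members:     dict node_id -> residual energy (assumed unit: joules)
    failed_head: node_id of the cluster head detected as faulty
    """
    candidates = {n: e for n, e in members.items() if n != failed_head}
    if not candidates:
        raise ValueError("no surviving members to take over the cluster")
    # Highest energy wins; ties go to the smallest ID so every member,
    # applying the same rule to the same data, reaches the same decision.
    return min(candidates, key=lambda n: (-candidates[n], n))


cluster = {"ch": 0.0, "m1": 2.5, "m2": 3.1, "m3": 3.1}
print(elect_cluster_head(cluster, failed_head="ch"))
```

A deterministic rule of this kind lets the members agree on the new head without an extra round of voting messages, provided they share the same view of each other's energy levels.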

Koushanfar et al. [3] proposed a heterogeneous back-up
scheme for tolerating and recovering from sensor node hardware

malfunctioning. They argued that a single type of hardware

resource can back up different types of resources. The



proposed protocol focuses on five types of resources: sensing,

computing, storage, communication and actuating, which can

replace each other through suitable changes in a system and

application software. WSNMP [23] is a hierarchical network
management architecture based on cluster
formation. The protocol monitors the network with minimum
overhead, collects the management data and finally
re-configures the network periodically to recover it from failure.
The protocol describes an algorithm which generates the
topology of the entire network; once the topology is modeled,
the central manager (CM) can reconfigure the network with
minimum overhead in the event of any node or link failures.
The protocol focuses on applications that provide management
schemes in terms of monitoring and controlling the WSN. It
also detects network faults by identifying non-responding nodes,
and if required re-configures the routing path. WSNMP does
well in static homogeneous WSNs, but provides no solution for
dynamically changing topologies.

Algorithms like LEACH (Low Energy Adaptive Clustering
Hierarchy) and HEED (Hybrid Energy-Efficient Distributed clustering)
mainly focus on balanced energy consumption
and efficient cluster formation. It has been argued that recovery
through a neighbouring cluster head is better than through a gateway
node. For example, Asim et al. [27] proposed a distributed
fault detection and recovery architecture for homogeneous WSNs.
The scheme performs local detection and recovery through mutual
co-ordination between nodes. It divides the network into a virtual grid
instead of clusters, which the authors report to be more energy-efficient and
lightweight, with minimum communication cost and better
reliability. However, the scheme only considers
permanent faults.

Most of the schemes (centralized and distributed) discussed
here are not fully adaptive and self-managed. Fault
management and recovery are carried out by exchanging
excessive messages between the central manager and nodes, or between the
CH and member nodes. To overcome this problem, Yu et al.
[28] proposed a biologically inspired self-managed fault
management architecture for WSNs. The proposed
self-managed hierarchical architecture fully distributes the
management tasks among different sensor nodes in the
network. The scheme introduces more self-managing functions
to the sensor nodes, which encourages them to rely
on monitoring their own status instead of frequently
consulting their cluster head. In addition, the authors give a
solution for faulty node replacement in a self-configurable
WSN. The paper particularly examines
self-management capabilities that adapt to various requirements (e.g.
sensor node failure) in a rapidly changing and hostile
environment. Instead of the usual distributed
clustering technique, the authors introduce a new management
layer between the cluster head and its leaf nodes. This
makes the sensor nodes more self-managed (local computation
instead of message transmission in sensor networks).

Table I shows the overall classification and comparison of
existing fault management approaches and architectures. The
table describes the approaches with their operational
organization and the types of faults they detect, diagnose and
recover from.

D. Problems and Issues

In this section, we highlight different issues and problems
that exist in previously proposed fault management approaches for
WSNs. We believe that there is a need for an application
independent fault management architecture with a more holistic
approach. It is evident from the literature survey that different
approaches to fault management suffer from the following
problems:

• Due to the application specific nature of WSNs, it is very challenging to carry an existing fault management architecture over from one application to another.
• Most existing approaches [16, 29] mainly focus on failure detection. However, there is still no comprehensive solution for fault management in WSNs from the management architecture perspective.
• Different mechanisms proposed for fault recovery [3] are not directly relevant to fault recovery at the level of network system management (e.g. network connectivity and network coverage area).
• Fault recovery mechanisms are mainly application specific (e.g. gateway recovery, common node recovery etc.) and focus on a small region or on individual nodes, and are therefore not fully scalable.
• Some decentralized approaches, e.g. Hsin et al. [16], require the network to be pre-configured, which is very costly for resource constrained WSNs.
• Some management frameworks, e.g. TinyDB, MOTE-VIEW and sNMP, require an external human manager to monitor the network management functionalities.
• Some schemes [20, 27] only consider permanent faults and ignore intermittent and transient faults.
• Most existing approaches in WSNs isolate failed or misbehaving nodes directly from the network communication [30], but no adequate fault recovery procedure is available.

V. APPLICATION INDEPENDENT ARCHITECTURE

In general, WSNs are tightly application-dependent. When
WSNs are deployed, applications are not stand-alone but are
integrated into the management infrastructure. The design of
applications and management architectures in WSNs is also
dependent on application semantics (e.g. application specific
data processing combined with data routing). Therefore, unlike
in traditional networks, the resource constraints of WSNs limit the
ability of sensor nodes to accommodate a wide variety of applications.
Furthermore, application designers have to develop complex
and special protocols and algorithms for specific sensor
applications [9].

From the above discussion, we can attest that there is a need for

an application independent fault management for WSNs to

improve their robustness, reliability and to enable a wider

adoption of WSNs applications and technology.



TABLE I

FAULT MANAGEMENT APPROACHES CATEGORIZATION

More specifically, the architecture should have the following
capabilities:

• The unique characteristics and restrictions of WSNs must be taken into account when proposing a fault management architecture for WSNs.
• The fault management architecture should be application independent, with a holistic approach that tackles faults at a number of different levels with low overhead in terms of computing bandwidth, reliability and energy consumption.
• It should reduce the uncertainty associated with WSN operations through fault detection, diagnosis and recovery.
• The architecture should be context aware, adaptive, self-organized and distributed, so that the use of network resources is kept to a minimum while performing fault management responsibilities.
• The architecture should be lightweight in terms of design and operations. For this purpose a layered system structure may be used. In a layered system structure each functional component is designed and programmed separately for various sensor applications and management functions.
• In order to provide continuous support for fault management in various applications of WSNs, a generic common interface should be provided.
• To improve resilience against failures and make the network more fault-tolerant, the management architecture needs to reconfigure its operation and functions in response to changes in environment and circumstances. In other words, the fault management architecture should be self-configured and self-organized so that it can continuously monitor the network for faults and technical problems without too much human intervention.

This survey is a part of our ongoing research to develop an

application independent fault management architecture for

WSNs. We verified that most of the proposed schemes and

approaches for fault management are tightly applicationsspecific. Therefore, we outline a design criterion for

developing application independent fault management

architecture with more holistic approach. We integrate

application knowledge into the MIB (Management Information

Base) of management infrastructure. The applicationknowledge is the driving force to direct its operation in order to

tailor to the special needs of one application to another

application. Application knowledge may contain information

about the application's network topology, its deployed scenario(indoor/outdoor), data generation and traffic, nature of a sensed

phenomenon, and nodes' power consumption. Integrating the

application knowledge into the MIB of management

infrastructure, provide the basis to develop application

independent fault management architecture with more generic

and holistic approach, which can easily be applicable from one

sensor application to another.
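As a rough illustration of the idea above, application knowledge could be held as a structured MIB entry from which fault management parameters are derived. The sketch below is a minimal Python example under stated assumptions: the class names `ApplicationKnowledge` and `ManagementMIB`, the field names, the example values, and the rule of suspecting a node after three missed reports are all hypothetical and are not taken from any surveyed scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Deployment(Enum):
    INDOOR = "indoor"
    OUTDOOR = "outdoor"


@dataclass(frozen=True)
class ApplicationKnowledge:
    """Hypothetical MIB entry; the fields mirror the items listed in the text."""
    topology: str             # network topology, e.g. "flat" or "clustered"
    deployment: Deployment    # deployment scenario (indoor/outdoor)
    report_interval_s: float  # data generation: expected seconds between reports
    phenomenon: str           # nature of the sensed phenomenon
    avg_power_mw: float       # nodes' average power consumption


class ManagementMIB:
    """Toy in-memory MIB keyed by application name (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, app, knowledge):
        self._entries[app] = knowledge

    def silence_timeout_s(self, app, missed_reports=3):
        """Seconds of silence after which a node is suspected faulty.

        Assumed rule: tolerate `missed_reports` consecutive missing reports.
        """
        return self._entries[app].report_interval_s * missed_reports


# Two made-up applications with very different reporting behaviour.
mib = ManagementMIB()
mib.register("habitat", ApplicationKnowledge(
    "clustered", Deployment.OUTDOOR, 600.0, "temperature", 1.5))
mib.register("intrusion", ApplicationKnowledge(
    "flat", Deployment.INDOOR, 2.0, "motion", 12.0))

habitat_timeout = mib.silence_timeout_s("habitat")
intrusion_timeout = mib.silence_timeout_s("intrusion")
```

With these entries, the same detection logic waits 1800 s before suspecting a node in the slow-reporting habitat application but only 6 s in the intrusion application, which is the kind of per-application tailoring the text describes.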

The issues considered and outlined above are by no means exhaustive. Several other factors and design considerations, including node deployment versus placement, synchronization, coverage, and security, must be tackled before we can design and develop an application independent fault management architecture for WSNs.
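The layered structure and generic common interface called for in the capability list can be pictured with a small Python sketch. It is only one possible shape, not a design from this paper: the names `FaultManager` and `HeartbeatFaultManager`, the 30 s silence timeout, the 5% battery threshold, and the fault and action labels are assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod


class FaultManager(ABC):
    """Hypothetical generic interface covering the three phases."""

    @abstractmethod
    def detect(self, now):
        """Return the ids of nodes suspected to be faulty."""

    @abstractmethod
    def diagnose(self, node_id):
        """Return a fault label for a suspected node."""

    @abstractmethod
    def recover(self, node_id, fault):
        """Return the recovery action chosen for the fault."""

    def run_cycle(self, now):
        """Run detection, then diagnosis and recovery per suspected node."""
        actions = {}
        for node_id in self.detect(now):
            fault = self.diagnose(node_id)
            actions[node_id] = self.recover(node_id, fault)
        return actions


class HeartbeatFaultManager(FaultManager):
    """Toy implementation: suspects nodes silent for longer than timeout_s."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # node id -> time of last report
        self.battery = {}    # node id -> last reported battery fraction

    def report(self, node_id, now, battery):
        self.last_seen[node_id] = now
        self.battery[node_id] = battery

    def detect(self, now):
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)

    def diagnose(self, node_id):
        # Assumed heuristic: a nearly empty battery explains the silence.
        if self.battery[node_id] < 0.05:
            return "energy_depletion"
        return "unreachable"

    def recover(self, node_id, fault):
        if fault == "energy_depletion":
            return "exclude_node"
        return "reroute_and_probe"


mgr = HeartbeatFaultManager(timeout_s=30.0)
mgr.report("n1", now=0.0, battery=0.80)
mgr.report("n2", now=0.0, battery=0.02)
mgr.report("n3", now=50.0, battery=0.60)
actions = mgr.run_cycle(now=60.0)
```

Here `n1` and `n2` have been silent for 60 s and are handled differently according to their last battery reading, while `n3` reported 10 s ago and is left alone. A different application would supply its own subclass behind the same three-method interface.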

| Schemes | Management System Organization | Types of faults & failures addressed | Action taken |
|---|---|---|---|
| Sympathy [10] | Centralized Hierarchical, Pro-active monitoring | Node self, Network faults, Sink fault, Crash & time-out omission failures | Fault Detection & Diagnosis |
| MANNA [12] | Centralized + Distributed, Passive monitoring | Node faults | Detection, Diagnosis & Recovery |
| WinMS [14] | Centralized + Distributed (Hierarchical), Pro-active monitoring | Node faults (weak or faulty) | Detection & Recovery |
| WSNMP [23] | Centralized + Distributed (Hierarchical Clustering based) | Node faults, Network faults | Detection & Recovery |
| Cluster-Based approach [20, 21] | Centralized + Distributed | Node faults (energy failures), Network faults (network connectivity), Permanent faults | Detection & Recovery |
| Passive Diagnosis of WSNs [11] | Centralized + Hierarchical, Probabilistic approach, Passive monitoring | Node faults, Network faults, Transient faults | Detection, Diagnosis & Recovery |
| Efficient Tracing of failed nodes [15] | Centralized, Active monitoring | Node faults, Route Faults | Detection, Diagnosis & Recovery |

VI. CONCLUSION

In this paper we presented a survey of fault management in WSNs and reviewed current approaches to dealing with faults and failures in WSNs at different levels. We surveyed state-of-the-art protocols, algorithms and techniques applied to fault management in WSNs. Based on our literature survey, we find that current fault management approaches provide solutions for faults and failures only in specific applications and scenarios. We also discussed problems and issues in the existing management approaches and argue that there is a need for an application independent fault management architecture that can provide extensive fault management solutions for all types of faults and failures in WSNs. Finally, we proposed design criteria to be considered when designing an application independent fault management architecture for WSNs with a more holistic approach. Integrating application knowledge into the management infrastructure provides the basis for developing such an architecture. We are further investigating mechanisms for injecting application knowledge into the MIB of the management infrastructure.

REFERENCES

[1] http://www.alicosystems.com/Wireless%20Sensor%20Netw.

[2] L. Paradis and Q. Han, "A Survey of Fault Management in Wireless Sensor Networks," Journal of Network and Systems Management, Springer Science + Business Media, LLC, vol. 15, pp. 171-190, June 2007.

[3] F. Koushanfar, M. Potkonjak, and A. Sangiovanni-Vincentelli, "Fault tolerance techniques for wireless ad hoc sensor networks," in Proceedings of IEEE Sensors, 2002, vol. 2, pp. 1491-1496.

[4] M. Ding, D. Chen, K. Xing, and X. Cheng, "Localized fault-tolerant event boundary detection in sensor networks," in INFOCOM 2005, 24th Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2, D. Chen, Ed., 2005, pp. 902-913.

[5] L. B. Ruiz, I. G. Siqueira, L. B. e Oliveira, H. C. Wong, J. M. S. Nogueira, and A. A. F. Loureiro, "Fault management in event-driven wireless sensor networks," in Proceedings of the 7th ACM international symposium on Modeling, analysis and simulation of wireless and mobile systems, Venice, Italy: ACM, 2004, pp. 149-156.

[6] J. Suhonen, M. Kohvakka, M. Hannikainen, and T. D. Hamalainen, "Embedded Software Architecture for Diagnosing Network and Node Failures in Wireless Sensor Networks," in Embedded Computer Systems: Architectures, Modeling, and Simulation, vol. 5114/2008, Springer Berlin / Heidelberg, July 18, 2008, pp. 258-267.

[7] W. L. Lee, A. Datta, and R. Cardell-Oliver, "Network Management in Wireless Sensor Networks," in Handbook on Mobile Ad Hoc and Pervasive Communications, American Scientific Publishers, 2006.

[8] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A Survey on Sensor Networks," IEEE Communications Magazine, pp. 102-114, August 2002.

[9] M. Yu, H. Mokhtar, and M. Merabti, "Fault Management in Wireless Sensor Networks," IEEE Wireless Communications, vol. 14, pp. 13-19, 2007.

[10] N. Ramanathan, E. Kohler, L. Girod, and D. Estrin, "Sympathy: a debugging system for sensor networks [wireless networks]," in 29th Annual IEEE International Conference on Local Computer Networks, 2004, pp. 554-555.

[11] K. Liu, M. Li, Y. Liu, M. Li, Z. Guo, and F. Hong, "Passive diagnosis for wireless sensor networks," in Proceedings of the 6th ACM conference on Embedded network sensor systems, SenSys'08, Raleigh, NC, USA: ACM, 2008, pp. 113-126.

[12] L. B. Ruiz, J. M. Nogueira, and A. A. F. Loureiro, "MANNA: a management architecture for wireless sensor networks," IEEE Communications Magazine, vol. 41, pp. 116-125, 2003.

[13] S. Elhadi, X. Xinyu, and Z. Haiyi, "Agent-based Fault Detection Mechanism in Wireless Sensor Networks," in Proceedings of the 2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE Computer Society, 2007.

[14] W. L. Lee, A. Datta, and R. Cardell-Oliver, "WinMS: Wireless Sensor Network-Management System, An Adaptive Policy-Based Management for Wireless Sensor Networks," School of Computer Science & Software Engineering, The University of Western Australia, CSSE Technical Report UWA-CSSE-06-001, June 2006.

[15] S. Jessica, B. Dirk, and D. Glenn, "Efficient tracing of failed nodes in sensor networks," in Proceedings of the 1st ACM international workshop on Wireless sensor networks and applications, Atlanta, Georgia, USA: ACM, 2002, pp. 122-130.

[16] C. Hsin and M. Liu, "A Two-Phase Self-Monitoring Mechanism for Wireless Sensor Networks," Journal of Computer Communications, special issue on Sensor Networks, vol. 29, pp. 462-476, February 2006.

[17] L. Myeong-Hyeon and C. Yoon-Hwa, "Distributed diagnosis of wireless sensor networks," in IEEE Region 10 Conference, TENCON'07, 2007, pp. 1-4.

[18] J. Chen, S. Kher, and A. Somani, "Distributed fault detection of wireless sensor networks," in Proceedings of the 2006 workshop on Dependability issues in wireless ad hoc networks and sensor networks, Los Angeles, CA, USA: ACM, 2006.

[19] S.-T. Cheng, S.-Y. Li, and C.-M. Chen, "Distributed Detection in Wireless Sensor Networks," in Seventh IEEE/ACIS International Conference on Computer and Information Science, ICIS'08, 2008, pp. 401-406.

[20] G. Venkataraman, S. Emmanuel, and S. Thambipillai, "A Cluster-Based Approach to Fault Detection and Recovery in Wireless Sensor Networks," in 4th International Symposium on Wireless Communication Systems, ISWCS'07, 2007, pp. 35-39.

[21] C. Yao-Chung, L. Zhi-Sheng, and C. Jiann-Liang, "Cluster based self-organization management protocols for wireless sensor networks," IEEE Transactions on Consumer Electronics, vol. 52, pp. 75-80, 2006.

[22] C. Thomas, K. S. Kewal, and R. Parameswaran, "Fault Tolerance in Collaborative Sensor Networks for Target Detection," IEEE Trans. Comput., vol. 53, pp. 320-333, 2004.

[23] M. M. Alam, M. Mamun-Or-Rashid, and C. S. Hong, "WSNMP: A Network Management Protocol for Wireless Sensor Networks," in 10th International Conference on Advanced Communication Technology (ICACT'08), vol. 1, 2008, pp. 742-747.

[24] M. Al-Kasassbeh and M. Adda, "Network fault detection with Wiener filter-based agent," Journal of Network and Computer Applications, vol. 32, pp. 824-833, 2009.

[25] P. M. Khilar and S. Mahapatra, "Intermittent Fault Diagnosis in Wireless Sensor Networks," in 10th International Conference on Information Technology (ICIT 2007), 2007, pp. 145-147.

[26] L. Yongxuan and C. Hong, "Energy-Efficient Fault-Tolerant Mechanism for Clustered Wireless Sensor Networks," in Proceedings of 16th International Conference on Computer Communications and Networks, ICCCN'07, 2007, pp. 272-277.

[27] M. Asim, H. Mokhtar, and M. Merabti, "A Fault Management Architecture for Wireless Sensor Network," in International Wireless Communications and Mobile Computing Conference, IWCMC '08, 2008, pp. 779-785.

[28] M. Yu, H. Mokhtar, and M. Merabti, "Self-Managed Fault Management in Wireless Sensor Networks," in The Second International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, UBICOMM '08, 2008, pp. 13-18.

[29] A. Perrig, R. Szewczyk, J. D. Tygar, V. Wen, and D. E. Culler, "SPINS: Security Protocols for Sensor Networks," in ACM MobiCom '01, Rome, Italy, 2001, pp. 189-199.

[30] S. Marti, T. J. Giuli, K. Lai, and M. Baker, "Mitigating routing misbehavior in mobile ad hoc networks," in Proceedings of the 6th annual international conference on Mobile computing and networking, Boston, Massachusetts, United States: ACM, 2000, pp. 255-265.
