Design Considerations for Fault Management in Wireless Sensor Networks


  • 8/9/2019 Design Considerations for Fault Management in Wireless Sensor Networks


Design Considerations for Fault Management in Wireless Sensor Networks

Muhammad Z Khan, Madjid Merabti, Bob Askwith

School of Computing and Mathematical Sciences, Liverpool John Moores University

Byrom St. Liverpool, L3 3AF, UK

[email protected]

Abstract- Wireless Sensor Networks (WSNs) are envisioned as densely deployed tiny sensors, left unattended to monitor and interact with physical and environmental phenomena. Faults and failures are inevitable in WSNs due to the inhospitable environment and unattended deployment. In this paper, we survey fault management in WSNs, and we review and categorize current approaches and techniques that deal with faults and failures in WSNs at different levels. The categorization is based on the different phases of fault management, i.e. fault detection, fault diagnosis and fault recovery. Based on the literature survey, we elaborate on different issues and problems in existing approaches to fault management. We find that most of these approaches are application specific and address faults only at a certain level. Therefore, there is no guarantee that a protocol developed for one specific application can carry over directly to another application. Finally, we outline design criteria for developing an application-independent fault management architecture that can provide extensive fault management for all types of faults and failures with a more holistic approach, to enable a wider adoption of WSN applications and technology. This survey is part of our ongoing research to develop application-independent fault management. We are currently investigating mechanisms to inject application knowledge into the large computing management infrastructure of WSNs. Application knowledge is the driving force that directs its operations, so that they can be tailored to the special needs of one application or another.

I. INTRODUCTION

Recent advances in wireless networking and communication, the development of MEMS (Micro-Electro-Mechanical Systems) and their integration with embedded microprocessors have enabled a new breed of sensor networks suitable for a wide range of civil, commercial and military applications. Modern WSNs are made up of a collection of densely deployed, inexpensive, tiny sensor devices that are networked through low-power wireless communication to cooperatively monitor physical or environmental phenomena.

Figure 1 [1] shows an example of a general WSN, in which sensor nodes are scattered in a sensor field, perform sensing, and send their results back to the end user (performing local or remote monitoring) through the sink node. Proposed applications of WSNs include environmental monitoring, habitat monitoring, structural monitoring, healthcare, disaster prediction and management, enemy tracking on the battlefield, security surveillance, home appliances and entertainment. The core reasons for their popularity are their low price and ease of deployment; such networks are particularly useful in hazardous and inaccessible environments with little or no human access, e.g. battlefields and chemically polluted areas.

Figure 1. Wireless Sensor Network

Sensor nodes in WSNs are expected to operate autonomously for a long period of time and, because of their physical deployment location, may not be easily reachable for battery replacement and maintenance. Therefore, faults and failures are a normal fact of life in WSNs. In order to guarantee the network's quality of service and performance, a WSN must be able to detect faults and failures and to perform something akin to healing and recovery from events that might cause faults or misbehaviour in the network. A set of functions or applications designed specifically for this purpose is called a fault management platform. Most existing fault management approaches for WSNs are integrated with application requirements. The main reason is that WSNs are energy and resource constrained, and the direct application of traditional fault management techniques incurs significant overhead. Therefore, to design an application-independent and efficient fault management architecture, we must take into account a wide variety of sensor applications with diverse needs, different sources of faults and various network configurations. In addition, scalability, mobility and timeliness may have to be considered [2].

In this paper, we discuss faults and fault management in WSNs. We categorize and compare existing fault management approaches according to three main phases: fault detection, fault diagnosis and fault recovery. From the literature survey, we find that most existing fault management approaches are tightly application specific and address faults only at a certain level. We also discuss issues and problems in the existing approaches and argue that there is a need for a fault management architecture with a more holistic approach to enable a wider adoption of WSN applications and technology. We also outline some design criteria for developing an application-independent architecture for WSNs.

The rest of the paper is organized as follows: Section II defines faults, sources of faults and types of faults in WSNs. Section III explains fault management in WSNs. In Section IV, we survey and categorize state-of-the-art fault management approaches for WSNs and discuss their issues and problems. Section V describes design criteria for an application-independent fault management architecture for WSNs, and finally Section VI concludes the paper.

II. SOURCES OF FAULTS IN WSNs

A fault is any kind of defect that leads the system to failure, and a failure is a situation in which the system deviates from its specification and cannot deliver its intended functionality. Koushanfar et al. [3] categorized faults into three types:

• Permanent faults: These faults are continuous and stable in nature, e.g. hardware faults within a component of a sensor node.

• Intermittent faults: An intermittent fault manifests only occasionally, due to the unstable characteristics of the hardware or as a consequence of the software being in a particular subset of its state space.

• Transient faults: Transient faults are the result of some temporary environmental impact on otherwise correct hardware, e.g. the impact of cosmic radiation on the sensing enclosure of a sensor node.
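The three classes above differ mainly in how a fault manifests over time. As a rough illustration only, the sketch below labels a node's fault history, assuming a hypothetical per-round boolean record (`faulty[t]` is True when the node misbehaved in round t) and a hypothetical persistence threshold `min_persist`. The time pattern alone cannot reveal the physical cause, so this is a heuristic, not the taxonomy's definition.

```python
from typing import List


def count_episodes(faulty: List[bool]) -> int:
    """Number of maximal runs of consecutive faulty rounds."""
    episodes = 0
    previous = False
    for state in faulty:
        if state and not previous:
            episodes += 1
        previous = state
    return episodes


def classify_fault(faulty: List[bool], min_persist: int = 3) -> str:
    """Heuristically label a fault history (hypothetical rule).

    permanent    : once the fault appears it never clears, and it has
                   lasted at least `min_persist` rounds
    intermittent : the fault appears, clears and re-appears
    transient    : a single episode that cleared on its own (a fault that
                   started fewer than `min_persist` rounds ago is also
                   provisionally reported as transient)
    """
    if not any(faulty):
        return "none"
    first = faulty.index(True)
    tail = faulty[first:]
    if all(tail) and len(tail) >= min_persist:
        return "permanent"
    if count_episodes(faulty) >= 2:
        return "intermittent"
    return "transient"


print(classify_fault([False, False, True, True, True, True]))    # permanent
print(classify_fault([False, True, False, True, False, True]))   # intermittent
print(classify_fault([False, True, True, False, False, False]))  # transient
```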

Faults in WSNs can occur for various reasons. Some of the prominent sources of faults mentioned in [2, 4] are:

1) Node Level Faults: Nodes are fragile; they may fail due to battery depletion, hardware or software malfunction, or the external impact of harsh environmental conditions (e.g. direct contact with water causing a short circuit, or a node crushed by a falling tree).

2) Network Level Faults: Unstable links between nodes, which cause network partitions and dynamic changes in network topology, lead to network level faults.

3) Sink Level Faults: Failure of the sink leads to a massive failure of the network. At the sink level, the software that stores and processes data is subject to bugs, which can lead to loss of data during the period in which the fault occurs.

4) Faults caused by adversaries: Since WSNs are often deployed for critical applications, attacks by adversaries may cause node faults and, consequently, network failure. The lack of infrastructure and the broadcast nature of the wireless medium enable adversaries to intrude into the network and disrupt the whole functionality (e.g. routing, aggregation) of an individual sensor node [5].

III. FAULT MANAGEMENT IN WSNs

Fault management is a very important component of network management, concerned with detecting, diagnosing and resolving faults in the network. Proper implementation of fault management can keep the network running at an optimum level and minimize the risk of failure, consequently making the network more fault tolerant [6]. Important functions of fault management include:

• constant monitoring of system status and usage level
• general diagnostics
• tracing the location of potential and actual failures
• auto-recovery and self-healing in the event of failure

A sensor network management system can be categorized according to the approaches it takes to monitoring and control. From the management system organization perspective, there are two main categories of network monitoring [7]: passive vs. active monitoring, and proactive vs. reactive monitoring.

• Passive monitoring: The passive model triggers alarms when a fault is detected.

• Active monitoring: Sensor nodes continuously send keep-alive or update messages to the control centre to inform it of their existence.

• Proactive monitoring: A management system actively collects and analyzes the present state of the network to detect past events and to predict future events, in order to maintain the performance of the network.

• Reactive monitoring: A management system gathers information about the network state to detect whether events of interest have occurred and then takes adaptive measures to reconfigure the network.

Fault management in WSNs can also be classified according to its network management system architecture [8]: centralized, distributed or hierarchical.

1) Centralized architecture: The base station or central manager has rich resources compared with the sensor nodes. Therefore, it performs complex management tasks and controls the whole network.

2) Distributed Architecture: Instead of a single central controller, a distributed architecture employs multiple manager stations throughout the network. Each manager controls a sub-region of the network and may communicate directly with other manager stations in a co-operative manner in order to perform management functions.

3) Hierarchical Architecture: This is a hybrid between the centralized and distributed architectures. Sub-controllers or managers are distributed throughout the network in a tree-shaped hierarchy with lower and higher levels. These managers, referred to as intermediate managers, each manage a sub-section of the network and perform the management functions, but they do not communicate directly with each other.
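To make the hierarchical organization concrete, the sketch below models intermediate managers that each summarize the node reports from their own sub-section and forward only that summary to a central manager; as described above, the intermediate managers never exchange messages with each other. The class names, the report format and the choice to forward only the IDs of silent nodes are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IntermediateManager:
    """Manages one sub-section of the network (hypothetical model)."""
    name: str
    members: Set[int]
    heard: Set[int] = field(default_factory=set)  # members heard this round

    def receive_report(self, node_id: int) -> None:
        # Reports from nodes outside this sub-section are ignored.
        if node_id in self.members:
            self.heard.add(node_id)

    def summarize(self) -> List[int]:
        """Return only the silent members, keeping the upward message small."""
        silent = sorted(self.members - self.heard)
        self.heard.clear()  # start a new reporting round
        return silent


@dataclass
class CentralManager:
    """Top of the hierarchy: talks only to intermediate managers."""
    children: List[IntermediateManager]

    def collect(self) -> Dict[str, List[int]]:
        return {m.name: m.summarize() for m in self.children}


north = IntermediateManager("north", {1, 2, 3})
south = IntermediateManager("south", {4, 5})
for node in (1, 3, 4, 5):
    (north if node in north.members else south).receive_report(node)

print(CentralManager([north, south]).collect())  # {'north': [2], 'south': []}
```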

IV. FAULT MANAGEMENT APPROACHES

Fault management in WSNs differs from fault management in traditional networks. Recently, researchers have developed various techniques and approaches to deal with various types of faults at different layers of the network. To provide resilience in faulty situations, three main actions (fault detection, fault diagnosis and fault recovery) must be performed [2, 9]. We categorize existing approaches according to these phases of the fault management architecture. In this section, we discuss these phases and the state-of-the-art approaches that perform these functions. We also highlight different issues and problems in the proposed fault management approaches for WSNs.
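The three actions can be read as a processing pipeline in which each phase consumes the output of the previous one, and diagnosis filters out false alarms before any recovery is attempted. The following is a minimal sketch assuming hypothetical, caller-supplied functions for each phase; it is not an architecture proposed in the surveyed work, and the toy low-voltage detector, battery diagnosis and rerouting rule (with voltages in volts) are placeholders.

```python
from typing import Callable, Dict, List

Alarm = Dict[str, object]
Diagnosis = Dict[str, object]


def run_fault_management(
    readings: Dict[int, float],
    detect: Callable[[Dict[int, float]], List[Alarm]],
    diagnose: Callable[[Alarm], Diagnosis],
    recover: Callable[[Diagnosis], str],
) -> List[str]:
    """Detection -> diagnosis -> recovery, each phase supplied by the caller."""
    actions = []
    for alarm in detect(readings):
        diagnosis = diagnose(alarm)
        if diagnosis.get("false_alarm"):
            continue  # diagnosis discards irrelevant alarms
        actions.append(recover(diagnosis))
    return actions


# Toy phases (placeholders): node ID -> battery voltage.
def detect_low_voltage(readings: Dict[int, float]) -> List[Alarm]:
    return [{"node": n, "value": v} for n, v in sorted(readings.items()) if v < 2.5]


def diagnose_battery(alarm: Alarm) -> Diagnosis:
    depleted = alarm["value"] < 2.0  # a mild dip is treated as a false alarm
    return {"node": alarm["node"], "cause": "battery depleted", "false_alarm": not depleted}


def reroute(diagnosis: Diagnosis) -> str:
    return f"reroute traffic around node {diagnosis['node']}"


print(run_fault_management({1: 3.0, 2: 1.7, 3: 2.3},
                           detect_low_voltage, diagnose_battery, reroute))
# ['reroute traffic around node 2']
```

Keeping each phase behind a plain function signature is one way to swap detection or recovery logic per application without touching the pipeline itself.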

A. Fault Detection (First Phase)

In the fault management of WSNs, fault detection is the first phase, in which faults and failures in the network are identified by the management system. The aim of fault detection is to ensure that the services being provided are functioning properly and, in some cases, to predict whether they will continue to function properly in the future. Generally, there are two types of failure detection: explicit detection and implicit detection (for more details see [9]). Implicit detection is normally carried out with a passive or active model. Recent research has investigated automatic fault detection techniques for WSNs, because visual observation and manual intervention are unsuitable for networks deployed in inaccessible and hostile environments. Existing fault detection approaches are mainly classified into two types: centralized and distributed approaches.

1) Centralized Approaches: In centralized approaches, most of the management and monitoring tasks are performed by the central manager or base station, which has powerful resources (e.g. energy, computing and memory). The central manager generally adopts an active monitoring model to detect faults and to track network performance and the lifetime of individual sensor nodes. Sympathy [10], a centralized sink-based scheme, provides a debugging tool to detect and localize faults that may occur due to interactions between different sensor nodes in the network. Sympathy has two main types of nodes: the Sympathy-sink and Sympathy-nodes. The Sympathy-sink sends requests to Sympathy-nodes using a message-flooding technique to poll event data and the current state (metrics) of the network. A Sympathy management system actively monitors and dynamically collects run-time node states and flow information towards the Sympathy-sink. In addition, it detects possible faults by analyzing node states together with network performance [11]. MANNA [12] is a policy-based centralized approach that uses the concept of external managers to detect faults in the network. MANNA assigns different roles (managers or agents) to various sensor nodes depending on the network characteristics (homogeneous vs. heterogeneous) and topology. These distinguished nodes exchange request and response messages with each other for management purposes. MANNA performs centralized fault detection based on the analysis of gathered WSN data. However, the MANNA architecture requires manual configuration and human intervention to set up agents, which is not practical for sensor networks deployed in inaccessible terrain. Similarly, an agent-based fault detection mechanism [13], based on data aggregation at the sink node, is an efficient and fast fault detection approach with minimum energy consumption.
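Under the active monitoring model described above, the central manager can infer a failure from missing keep-alive messages. The sketch below is a generic timeout-based detector, not the mechanism of Sympathy, MANNA or [13]; the timeout value and the use of explicit timestamps (rather than a real clock) are assumptions made so that the example is deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Central manager that suspects a node after a missed keep-alive window."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[int, float] = {}  # node ID -> time of last keep-alive

    def on_keep_alive(self, node_id: int, now: float) -> None:
        self.last_seen[node_id] = now

    def suspects(self, now: float) -> List[int]:
        """Nodes whose last keep-alive is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=30.0)  # seconds, hypothetical
monitor.on_keep_alive(1, now=0.0)
monitor.on_keep_alive(2, now=0.0)
monitor.on_keep_alive(1, now=25.0)
print(monitor.suspects(now=40.0))  # [2]
```

A node that has never sent a keep-alive is not tracked here; a real manager would seed `last_seen` from the deployment list. The timeout embodies the usual trade-off of active monitoring: a shorter window detects failures sooner but needs more frequent keep-alive traffic and raises more false alarms on lossy links.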

By comparing the current or historical states of sensor nodes against the overall network state model (i.e. topology map, energy map, coverage map, etc.), the central manager in WinMS [14] detects and prevents potential failures in the network. WinMS has a lightweight TDMA protocol design that provides energy-efficient management, data transport and local repairs. However, the initial setup cost of creating a data gathering tree and node schedule depends on the network density [7]. Staddon et al. [15] proposed a similar centralized management approach for tracing failed nodes in the network, whereby the manager monitors the health of individual sensor nodes to detect node failures. The base station constructs a map of the whole network topology with the help of the nodes' routing update messages, providing a method for recovering corrupted routes. Existing centralized approaches suffer from problems such as insufficient scalability, availability and flexibility when the network becomes more distributed.
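The WinMS idea of checking node state against network-wide maps can be illustrated with a small rule set. The sketch below is a hypothetical approximation, not WinMS itself: it compares each node's residual energy (from an energy map) with a threshold, and its currently reported neighbours with the neighbours recorded in a topology map, flagging nodes likely to fail or to lose connectivity. The thresholds and the data layout are assumptions.

```python
from typing import Dict, List, Set


def potential_failures(
    energy_map: Dict[int, float],             # node -> residual energy (fraction of full)
    topology_map: Dict[int, Set[int]],        # node -> expected neighbours
    reported_neighbours: Dict[int, Set[int]], # node -> neighbours it currently reports
    energy_threshold: float = 0.1,
    max_lost_fraction: float = 0.5,
) -> Dict[int, List[str]]:
    """Flag nodes whose state deviates from the network model (illustrative rules)."""
    warnings: Dict[int, List[str]] = {}
    for node, expected in topology_map.items():
        reasons = []
        if energy_map.get(node, 0.0) < energy_threshold:
            reasons.append("low energy")
        seen = reported_neighbours.get(node, set())
        if expected:
            lost = len(expected - seen) / len(expected)
            if lost > max_lost_fraction:
                reasons.append("lost neighbours")
        if reasons:
            warnings[node] = reasons
    return warnings


energy = {1: 0.8, 2: 0.05, 3: 0.6}
topology = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
reported = {1: {2, 3}, 2: {1, 3}, 3: set()}
print(potential_failures(energy, topology, reported))
# {2: ['low energy'], 3: ['lost neighbours']}
```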

2) Distributed Approaches: Distributed approaches employ the concept of local decision making and distribute management functions throughout the network. The more decisions a node can take locally, the fewer messages need to be delivered to the central manager. These approaches conserve a lot of sensor node energy and extend the lifetime of the network [9]. Hsin and Liu [16] proposed an efficient distributed two-phase self-monitoring mechanism (TP) for fault detection. In TP, the health of individual nodes is monitored to detect malfunctioning nodes and intrusions that can result in the destruction of nodes. Each node monitors its own health and its neighbours' health to provide local fault detection. TP performs either explicit or implicit fault detection, based on a two-phase timer scheme for local co-ordination and information exchange among nodes. This approach requires the network to be pre-configured, and each sensor must have a unique ID. Failure detection through neighbour co-ordination is used in a number of different schemes [17, 18]. In these approaches, nodes co-ordinate and exchange messages and information with their neighbours to detect and identify network faults before contacting the central node. Cheng et al. [19] proposed a distributed mechanism for fault and anomaly detection to identify failed or misbehaving nodes in event-driven WSN applications.
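The two-phase timer idea can be sketched as follows, under a simplified reading that is an assumption rather than Hsin and Liu's exact protocol [16]: when a neighbour's update is overdue (the phase-1 timer `c1` expires), the monitoring node does not declare failure immediately; it asks other neighbours whether they still hear the target and waits up to a second timer `c2`. Only if no neighbour vouches for the target within that window is it declared failed. The callable `vouched_by_others` is a hypothetical stand-in for the neighbour query messages.

```python
from typing import Callable, Dict, Optional


class TwoPhaseMonitor:
    """Simplified two-phase timeout detector run by one node for its neighbours.

    Phase 1: wait up to `c1` for an update from a neighbour.
    Phase 2: if it is overdue, consult other neighbours for up to `c2`;
             declare failure only if none of them has heard from it.
    Targets must be registered with `on_update` before `check` is called.
    """

    def __init__(self, c1: float, c2: float) -> None:
        self.c1, self.c2 = c1, c2
        self.last_update: Dict[int, float] = {}
        self.phase2_start: Dict[int, Optional[float]] = {}

    def on_update(self, target: int, now: float) -> None:
        self.last_update[target] = now
        self.phase2_start[target] = None

    def check(self, target: int, now: float,
              vouched_by_others: Callable[[int], bool]) -> str:
        if now - self.last_update[target] <= self.c1:
            return "alive"
        if vouched_by_others(target):       # another neighbour still hears it
            self.on_update(target, now)     # treat the vouch as fresh evidence
            return "alive"
        start = self.phase2_start[target]
        if start is None:                   # phase 1 just expired: enter phase 2
            self.phase2_start[target] = now
            return "suspect"
        return "suspect" if now - start <= self.c2 else "failed"


monitor = TwoPhaseMonitor(c1=10.0, c2=5.0)  # time units hypothetical
monitor.on_update(7, now=0.0)
no_vouch = lambda node: False               # no other neighbour has heard node 7
print([monitor.check(7, t, no_vouch) for t in (8.0, 12.0, 16.0, 18.0)])
# ['alive', 'suspect', 'suspect', 'failed']
```

The second phase is what lets a node tolerate a bad link between itself and one neighbour: a single missed update leads to a query rather than a false failure report.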

Clustering has become an established technique for building scalable and energy-balanced applications for WSNs. Recent research has used clustering to distribute fault management tasks evenly across the network. Clustering divides the whole network into groups of sensor nodes called clusters; in each cluster, one node is selected as the Cluster Head (CH), and the other nodes are its cluster members. The CH executes different fault management functions to detect faults inside the cluster. Cluster-based approaches for distributed fault detection are proposed by Venkataraman et al. [20] and Yao-Chang et al. [21]. These mechanisms rely on the exchange of messages between the CH and its member sensor nodes. If a node is failing due to energy depletion, it sends a fail report message to its neighbours and its CH. In this way, the CH can detect the potential fault and invoke the fault recovery mechanism to keep the cluster connected.
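The CH-side logic described above can be sketched as follows. A member either announces its impending failure with a fail report or simply stops answering the CH's periodic poll; in both cases the CH marks it failed once and calls a recovery hook. The polling round, the method names and the `on_fault` hook are hypothetical and do not reproduce the exact protocols of [20] or [21].

```python
from typing import Callable, List, Set


class ClusterHead:
    """Cluster head that tracks its members' health (hypothetical message model)."""

    def __init__(self, members: Set[int], on_fault: Callable[[int], None]) -> None:
        self.members = set(members)
        self.failed: Set[int] = set()
        self.on_fault = on_fault  # hook that starts fault recovery

    def on_fail_report(self, node_id: int) -> None:
        """A member announces that it is about to fail (e.g. battery nearly empty)."""
        self._mark_failed(node_id)

    def on_poll_round(self, responders: Set[int]) -> None:
        """Members that did not answer the CH's poll are treated as failed."""
        for node_id in sorted(self.members - self.failed - responders):
            self._mark_failed(node_id)

    def _mark_failed(self, node_id: int) -> None:
        # Trigger recovery only once per member.
        if node_id in self.members and node_id not in self.failed:
            self.failed.add(node_id)
            self.on_fault(node_id)


recovered: List[int] = []
ch = ClusterHead({1, 2, 3, 4}, on_fault=recovered.append)
ch.on_fail_report(2)      # node 2 reports energy depletion
ch.on_poll_round({1, 3})  # node 4 stays silent
print(recovered)          # [2, 4]
```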

Clouqueur et al. [22] use the concept of decision fusion, in which sensors co-ordinate with each other to obtain the same global network state information. Suspicious nodes can then be detected if they send inconsistent information to the decision fusion center. WSNMP [23] is a hierarchical network management system based on cluster formation. WSNMP provides a method to monitor the network state by collecting management data and, accordingly, to control and maintain the network resources.

Most recently, researchers have taken an interest in statistical analysis for fault detection algorithms. These mechanisms are simpler to operate and perform as well as other techniques. A Mobile Agent (MA) based approach proposed by Al-Kasassbeh et al. [24] presents a reasonable new technology that can help to achieve distributed management. The proposed technique uses a statistical method based on the Wiener filter to capture abnormal behaviour of MIB variables. The mobile agent migrates from one node to another, accessing an appropriate subset of the MIB at each node and analyzing it locally to perform fault detection. The method is shown to be more scalable and efficient, and it does not require a long preparation time.
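The statistical idea behind Wiener-filter-based detection is to predict each new sample of a monitored variable from its past and to flag samples whose prediction error is unusually large. The sketch below uses the simplest case, a one-tap linear predictor whose coefficient a = r(1)/r(0) (estimated here by least squares on a training window) is the minimum mean-square-error (Wiener) solution for a zero-mean stationary series. It illustrates the principle only and is not Al-Kasassbeh et al.'s method [24]; the monitored MIB variable (here a hypothetical per-interval packet count), the training window and the threshold `k` are assumptions.

```python
import math
from typing import List, Tuple


def fit_wiener_1tap(train: List[float]) -> Tuple[float, float, float]:
    """Fit (x_t - mu) ~ a * (x_{t-1} - mu); return (mu, a, residual std)."""
    mu = sum(train) / len(train)
    c = [x - mu for x in train]
    num = sum(c[t] * c[t - 1] for t in range(1, len(c)))
    den = sum(c[t - 1] ** 2 for t in range(1, len(c)))
    a = num / den if den > 0 else 0.0
    residuals = [c[t] - a * c[t - 1] for t in range(1, len(c))]
    sigma = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    return mu, a, sigma


def flag_anomalies(series: List[float], mu: float, a: float, sigma: float,
                   k: float = 3.0) -> List[int]:
    """Indices t whose one-step prediction error exceeds k residual std devs."""
    floor = max(sigma, 1e-9)  # avoid a zero threshold on a perfectly predictable series
    flagged = []
    for t in range(1, len(series)):
        error = (series[t] - mu) - a * (series[t - 1] - mu)
        if abs(error) > k * floor:
            flagged.append(t)
    return flagged


train = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11]  # normal packet counts
mu, a, sigma = fit_wiener_1tap(train)
print(flag_anomalies([11, 10, 11, 40, 11, 10], mu, a, sigma))  # [3, 4]
```

A single spike produces two flags here (t = 3 and t = 4), because the sample after the spike is predicted from the spike itself; a practical detector would need to account for this.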

Distributed approaches represent a major shift in the design of fault management architectures for WSNs. Management responsibilities are transferred towards the sensor nodes instead of a central manager, which ultimately makes the network more reliable and self-managed.

B. Fault Diagnosis (Second Phase)

In a fault management architecture, fault diagnosis is the phase that follows fault detection. After faults (alarms) have been detected, the management system starts to identify their real causes. In this way, detected faults are properly identified and distinguished from irrelevant false alarms. The accuracy and correctness of fault detection have already been partly achieved by the various fault detection methods proposed so far. However, a more comprehensive model of faults in sensor networks is still needed to support accurate fault diagnosis [9]. In WSNs, when a sink node does not receive messages from a specific region of the routing tree, it is unknown whether this is due to the failure of a key routing node or the failure of all nodes in the region. Staddon et al. [15] proposed a fault tracing protocol to differentiate these two types of faults. The protocol enables the sink node to construct the complete topology of the network (each node piggybacks its neighbour's ID along with its own reading). Failed nodes can then be traced using a divide-and-conquer strategy based on adaptive route update messages. This approach is unsuitable for large-scale WSNs, because if failures are frequent, the sink must frequently broadcast route update messages to the nodes, which incurs significant overhead. To overcome this problem, Liu et al. [11] proposed a probabilistic diagnosis (PAD) approach for inferring the root cause of abnormal behaviour. PAD uses a packet marking algorithm to efficiently construct and dynamically maintain the inference model. The algorithm does not incur extra overhead for information gathering and provides on-line diagnosis of an operational sensor network by passively observing network symptoms from the sink. A distributed diagnosis algorithm for isolating faulty sensor nodes in WSNs is presented in [17]. The algorithm diagnoses transient faults: communication faults and incorrect sensor reading faults. Faulty nodes are isolated by identifying the fault-free nodes.
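The ambiguity that Staddon et al. address can be seen with a routing tree rebuilt from piggybacked IDs. The sketch below assumes, as a simplification, that each earlier report carried the ID of the reporting node's parent, so the sink knows a node-to-parent map. Given the nodes heard in the current round, it finds the silent nodes whose parent is still heard (or is the sink). Each such node roots a silent subtree: either it alone failed (a key routing node) and its descendants are merely cut off, or the whole subtree failed. Distinguishing the two needs further probing, such as the route updates of [15]. The function names and the sink ID 0 are hypothetical.

```python
from typing import Dict, List, Set

SINK = 0  # hypothetical sink ID


def silent_subtree_roots(parent: Dict[int, int], heard: Set[int]) -> List[int]:
    """Silent nodes whose parent is alive (or is the sink): the first point of silence."""
    return sorted(
        node for node, p in parent.items()
        if node not in heard and (p == SINK or p in heard)
    )


def subtree(parent: Dict[int, int], root: int) -> Set[int]:
    """All nodes whose path to the sink passes through `root` (including root)."""
    members = {root}
    changed = True
    while changed:  # grow the set until no further descendants are found
        changed = False
        for node, p in parent.items():
            if p in members and node not in members:
                members.add(node)
                changed = True
    return members


tree = {1: 0, 2: 0, 3: 1, 4: 3, 5: 3, 6: 2}  # node -> parent
heard = {1, 2, 6}                            # nodes 3, 4 and 5 went silent
for root in silent_subtree_roots(tree, heard):
    # Either node `root` failed alone, or every node in its subtree failed.
    print(root, sorted(subtree(tree, root)))  # 3 [3, 4, 5]
```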

Sympathy [10] is a diagnosis tool for detecting and debugging node (self), path and sink faults. Sympathy monitors the regular network traffic generated by each healthy node, i.e. sensor readings, routing update messages, synchronization beacons, etc. Sympathy detects faults when nodes are not delivering sufficient data to the sink, and it treats the absence of monitored traffic as an indication of faults. Sympathy identifies whether the root cause of a failure is node health, a connectivity problem or the sink by using an empirical decision tree [2]. Chen et al. [18] and Koushanfar et al. [3] focus on sensor hardware faults and actuator faults respectively, since these components are more prone to malfunction. Intermittent faults are also an important class of failures. Khilar et al. [25] presented a probabilistic approach to fault diagnosis in WSNs, considering intermittent faults in sensor nodes and permanent faults in wireless links. Clouqueur et al. [22], in contrast, considered nodes that become faulty due to harsh environmental conditions.
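A root-cause decision tree of the kind Sympathy uses can be illustrated as follows. The metrics, the delivery-ratio threshold and the order of the questions are hypothetical simplifications chosen for this example; they are not Sympathy's actual tree or metric names [10].

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """Hypothetical per-node observations available at the sink."""
    expected_packets: int      # packets the node should have delivered this epoch
    received_packets: int      # packets that actually reached the sink
    heard_by_neighbours: bool  # neighbours still overhear the node's traffic
    has_route: bool            # the node reports a next hop towards the sink
    sink_healthy: bool         # the sink is receiving normally from other nodes


def diagnose(m: NodeMetrics, min_ratio: float = 0.8) -> str:
    """Walk a fixed sequence of questions to a root-cause label."""
    if m.expected_packets == 0 or m.received_packets / m.expected_packets >= min_ratio:
        return "no fault"
    if not m.sink_healthy:
        return "sink fault"
    if not m.heard_by_neighbours:
        return "node (self) fault"
    if not m.has_route:
        return "path fault: no route"
    return "path fault: packets lost en route"


print(diagnose(NodeMetrics(10, 10, True, True, True)))   # no fault
print(diagnose(NodeMetrics(10, 2, False, True, True)))   # node (self) fault
print(diagnose(NodeMetrics(10, 0, True, False, True)))   # path fault: no route
print(diagnose(NodeMetrics(10, 0, True, True, False)))   # sink fault
```

Asking about the sink first reflects that a sink failure makes every node look silent, so it should be ruled out before blaming individual nodes or paths.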

C. Fault Recovery (Third Phase)

Having discussed different techniques for fault detection and fault diagnosis, we next discuss how faults can be treated. Fault recovery is the final phase of the fault management architecture, in which the sensor network is reconfigured and restructured in such a way that faults and failures have no further impact on network performance [9]. WinMS [14] is a centralized architecture that analyses the network state to detect and predict potential failures and takes corrective and preventive actions accordingly. In WinMS, there is a scheduled period during which local nodes listen to their environmental activities and can self-configure in the event of failure without prior knowledge of the full network topology. WinMS uses a proactive technique to instruct nodes to send data less frequently in order to conserve energy. The main advantage of WinMS is that it adaptively adjusts the network by providing both local and central recovery mechanisms. A distributed, localized cluster-based approach for fault detection and network connectivity recovery is proposed by Venkataraman et al. [20]. The scheme is energy efficient and responsive; however, it considers only permanent faults, occurring mainly due to energy depletion, that ultimately lead to loss of connectivity and coverage in the network. To improve the robustness and efficiency of cluster-based scenarios, Lai and Chen [26] proposed the CMATO (Cluster-Member-based fAult-Tolerant mechanism) algorithm. CMATO views the cluster as a whole and takes advantage of the inter-cluster monitoring of nodes to detect faults. When the cluster members detect a fault caused by the cluster head, they act co-operatively to select a new cluster head to replace the failed one.
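Co-operative replacement of a failed cluster head can be sketched as a deterministic election that every surviving member evaluates on the same shared information, so all members arrive at the same winner without extra negotiation. The selection rule used here (highest residual energy, ties broken by lowest ID) and the assumption that members share a consistent view of each other's energy are hypothetical; CMATO's own rules are given in [26].

```python
from typing import Dict, Optional


def elect_new_head(energy: Dict[int, float], failed_head: int) -> Optional[int]:
    """Pick a replacement cluster head among the surviving members.

    `energy` maps member ID -> residual energy as known to all members
    (assumed to be exchanged during normal monitoring inside the cluster).
    """
    candidates = {n: e for n, e in energy.items() if n != failed_head and e > 0}
    if not candidates:
        return None  # no viable member; merging with a neighbouring cluster is not modelled
    # Highest energy wins; on a tie, the smaller ID wins.
    return min(candidates, key=lambda n: (-candidates[n], n))


print(elect_new_head({1: 0.9, 2: 0.4, 5: 0.7, 7: 0.7}, failed_head=1))  # 5
```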

Koushanfar et al. [3] proposed a heterogeneous back-up scheme for tolerating and recovering from sensor node hardware malfunctions. They argued that a single type of hardware resource can back up different types of resources. The proposed protocol focuses on five types of resources: sensing, computing, storage, communication and actuation, which can replace each other through suitable changes in the system and application software. WSNMP [23] is a hierarchical network management architecture based on cluster formation. The protocol monitors the network with minimum overhead, collects management data and periodically reconfigures the network to recover it from failure. The protocol includes an algorithm that generates the topology of the entire network; once the topology is modeled, the central manager (CM) can reconfigure the network with minimum overhead in the event of any node or link failure. The protocol targets applications that require management schemes for monitoring and controlling the WSN. It also detects network faults by identifying non-responding nodes and, if required, reconfigures the routing path. WSNMP performs well in static homogeneous WSNs, but provides no solution for dynamically changing topologies.
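Once the central manager holds a model of the topology, reconfiguring routes around a failed node or link amounts to recomputing the routing tree on the remaining graph. The sketch below recomputes a minimum-hop tree with breadth-first search from the sink; the choice of hop count as the routing metric and the adjacency-set representation are assumptions, not WSNMP's published algorithm [23].

```python
from collections import deque
from typing import AbstractSet, Dict, Optional, Set, Tuple


def routing_tree(adj: Dict[int, Set[int]], sink: int,
                 failed_nodes: AbstractSet[int] = frozenset(),
                 failed_links: AbstractSet[Tuple[int, int]] = frozenset()
                 ) -> Dict[int, Optional[int]]:
    """Return node -> next hop towards `sink` (minimum hops), avoiding failures.

    Nodes that become unreachable are absent from the result.
    """
    broken = {frozenset(link) for link in failed_links}  # links are undirected
    parent: Dict[int, Optional[int]] = {sink: None}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in sorted(adj.get(u, ())):
            if v in failed_nodes or v in parent or frozenset((u, v)) in broken:
                continue
            parent[v] = u  # v now forwards its traffic to u
            queue.append(v)
    return parent


adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3}}
print(routing_tree(adj, sink=0))                    # {0: None, 1: 0, 2: 0, 3: 1, 4: 3}
print(routing_tree(adj, sink=0, failed_nodes={1}))  # {0: None, 2: 0, 3: 2, 4: 3}
```

When node 1 fails, node 3 is re-parented to node 2 and node 4 keeps its route through node 3; only the affected part of the tree changes next hop.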

Algorithms such as LEACH (Low Energy Adaptive Clustering Hierarchy) and HEED (Hybrid Energy-Efficient Distributed clustering) mainly focus on balanced energy consumption and efficient cluster formation. In these approaches, recovery through a neighbouring cluster head is considered better than recovery through a gateway node. For example, Asim et al. [27] proposed a distributed fault detection and recovery architecture for homogeneous WSNs. The scheme performs local detection and recovery through mutual co-ordination among nodes. Instead of clustering, the authors divide the network into a virtual grid, which is more energy-efficient and lightweight, has minimum communication cost, and provides better reliability. However, they consider only permanent faults.
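Dividing the field into a virtual grid, as in Asim et al. [27], lets each node compute its cell locally from its own position, without any cluster-formation messages. The sketch below assigns nodes to square cells and picks one manager per cell; the cell size, the coordinates and the highest-energy selection rule are illustrative assumptions rather than the published scheme.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Grid cell containing position (x, y); each node can compute this on its own."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def cell_managers(nodes: Dict[int, Tuple[float, float, float]],
                  cell_size: float) -> Dict[Cell, int]:
    """Choose one manager per cell: most residual energy, ties -> lower ID.

    `nodes` maps node ID -> (x, y, residual energy).
    """
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for node_id, (x, y, _) in nodes.items():
        cells[cell_of(x, y, cell_size)].append(node_id)
    return {c: min(members, key=lambda n: (-nodes[n][2], n))
            for c, members in cells.items()}


nodes = {1: (5, 5, 0.5), 2: (8, 2, 0.9), 3: (15, 4, 0.3), 4: (12, 18, 0.6)}
print(cell_managers(nodes, cell_size=10.0))  # {(0, 0): 2, (1, 0): 3, (1, 1): 4}
```

If a cell manager fails, the remaining members of that cell can rerun the same selection without it, so recovery stays local to a single cell.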

Most of the schemes (centralized and distributed) discussed here are not fully adaptive and self-managed. Fault management and recovery are carried out by exchanging excessive messages between the central manager and the nodes, or between the CH and its member nodes. To overcome this problem, Yu et al. [28] proposed a biologically inspired, self-managed fault management architecture for WSNs. The proposed self-managed hierarchical architecture fully distributes the management tasks among different sensor nodes in the network. The scheme introduces more self-managing functions to the sensor nodes, which encourages them to monitor their own status rather than frequently consulting their cluster head. In addition, the authors give a solution for replacing faulty nodes in a self-configurable WSN. In particular, the paper examines self-management capabilities that adapt to various requirements (e.g. sensor node failure) in a rapidly changing and hostile environment. Instead of the conventional distributed clustering technique, the authors introduce a new management layer between the cluster head and its leaf nodes. This makes the sensor nodes more self-managed (favouring local computation over message transmission).
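The trade-off behind this design, local computation instead of message transmission, can be illustrated by comparing a node that reports its status to its cluster head every round with one that evaluates its status locally and transmits only when its health state changes. The health levels and energy thresholds below are hypothetical, and the sketch does not reproduce Yu et al.'s architecture [28].

```python
from typing import List, Optional


def health_state(energy: float) -> str:
    """Map residual energy (fraction of full) to a coarse state (hypothetical levels)."""
    if energy < 0.1:
        return "critical"
    if energy < 0.3:
        return "low"
    return "healthy"


def reports_sent(energy_trace: List[float], report_on_change_only: bool) -> int:
    """Count status messages a node sends to its cluster head over a trace."""
    sent = 0
    previous: Optional[str] = None
    for energy in energy_trace:
        state = health_state(energy)  # evaluated locally every round
        if not report_on_change_only or state != previous:
            sent += 1
        previous = state
    return sent


trace = [1.0 - 0.05 * t for t in range(20)]  # steadily draining battery
print(reports_sent(trace, False), reports_sent(trace, True))  # 20 3
```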

Table I shows the overall classification and comparison of existing fault management approaches and architectures. The table describes the approaches in terms of their operational organization and the types of faults they detect, diagnose and recover from.

D. Problems and Issues

In this section, we highlight different issues and problems in the fault management approaches proposed so far for WSNs. We believe that there is a need for an application-independent fault management architecture with a more holistic approach. It is evident from the literature survey that existing approaches to fault management suffer from the following problems:

• Due to the application-specific nature of WSNs, it is very challenging to apply an existing fault management architecture from one application to another.

• Most existing approaches [16, 29] mainly focus on failure detection. There is still no comprehensive solution for fault management in WSNs from the management architecture perspective.

• Different mechanisms proposed for fault recovery [3] are not directly relevant to fault recovery with respect to network system level management (e.g. network connectivity and network coverage area).

• Fault recovery mechanisms are mainly application specific (e.g. gateway recovery, common node recovery) and focus on a small region or on individual nodes; they are therefore not fully scalable.

• Some decentralized approaches, e.g. Hsin and Liu [16], require the network to be pre-configured, which is very costly for resource-constrained WSNs.

• Some management frameworks, e.g. TinyDB, MOTE-VIEW and sNMP, require an external human manager to monitor the network management functionalities.

• Some schemes [20, 27] consider only permanent faults and ignore intermittent and transient faults.

• Most existing approaches in WSNs isolate failed or misbehaving nodes directly from network communication [30], but no adequate fault recovery procedure is available.

V. APPLICATION INDEPENDENT ARCHITECTURE

In general, WSNs are tightly application dependent. When WSNs are deployed, applications are not stand-alone but are integrated into the management infrastructure. The design of applications and management architecture in WSNs also depends on application semantics (e.g. application-specific data processing combined with data routing). Therefore, unlike traditional networks, resource-constrained WSNs limit the ability of sensor nodes to accommodate a wide variety of applications. Furthermore, application designers have to develop complex, special-purpose protocols and algorithms for specific sensor applications [9].

    From the above discussion, we conclude that there is a need for application-independent fault management in WSNs to improve their robustness and reliability and to enable a wider adoption of WSN applications and technology.


    TABLE 1

    FAULT MANAGEMENT APPROACHES CATEGORIZATION

    More specifically, the architecture should have the following capabilities:

    • The unique characteristics and restrictions of WSNs must be taken into account when proposing a fault management architecture for WSNs.
    • The fault management architecture should be application independent, with a holistic approach that tackles faults at a number of different levels with low overhead in terms of computation, bandwidth and energy consumption, without sacrificing reliability.
    • It should reduce the uncertainty associated with WSN operations through fault detection, diagnosis and recovery.
    • The architecture should be context aware, adaptive, self-organized and distributed, so that its use of network resources is kept to a minimum while it performs its fault management responsibilities.
    • The architecture should be lightweight in terms of design and operation. For this purpose a layered system structure may be used, in which each functional component is designed and programmed separately for various sensor applications and management functions.
    • To provide continuous support for fault management across various WSN applications, a generic common interface should be provided.
    • To improve resilience against failures and make the network more fault tolerant, the management architecture needs to reconfigure its operation and functions in response to changes in the environment and circumstances. In other words, the fault management architecture should be self-configured and self-organized so that it can continuously monitor the network for faults and technical problems without much human intervention.

    This survey is part of our ongoing research to develop

    application-independent fault management architecture for WSNs. We found that most of the proposed schemes and approaches for fault management are tightly application specific. Therefore, we outline design criteria for developing an application-independent fault management architecture with a more holistic approach. We propose to integrate application knowledge into the MIB (Management Information Base) of the management infrastructure. The application knowledge is the driving force that directs the architecture's operation, so that it can be tailored to the special needs of one application or another. Application knowledge may contain information about the application's network topology, its deployment scenario (indoor/outdoor), data generation and traffic, the nature of the sensed phenomenon, and the nodes' power consumption. Integrating application knowledge into the MIB of the management infrastructure provides the basis for developing an application-independent fault management architecture with a more generic and holistic approach, which can easily be carried over from one sensor application to another.
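As a rough sketch of this idea, and not the authors' design (the mechanism for injecting application knowledge is described here as still under investigation), the Python code below models application knowledge as a MIB-like record and lets a single, application-independent detector derive all of its thresholds from that record. The field names (`topology`, `deployment`, `report_interval_s`, `phenomenon_range`, `battery_floor_v`), the `FaultDetector` class and the outdoor timeout rule are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ApplicationKnowledge:
    """Hypothetical MIB entry describing one sensor application."""
    topology: str                          # e.g. "flat" or "cluster" (unused by this toy detector)
    deployment: str                        # "indoor" or "outdoor"
    report_interval_s: float               # expected reporting period of each node
    phenomenon_range: Tuple[float, float]  # plausible (min, max) sensed value
    battery_floor_v: float                 # voltage below which a node is failing


class FaultDetector:
    """Generic fault detector whose parameters come only from the MIB entry."""

    def __init__(self, mib: ApplicationKnowledge, missed_reports: int = 3):
        self.mib = mib
        # Assumption: outdoor links are lossier, so tolerate two extra misses.
        extra = 2 if mib.deployment == "outdoor" else 0
        self.timeout_s = mib.report_interval_s * (missed_reports + extra)

    def check(self, now_s: float, last_seen_s: float,
              reading: float, voltage: float) -> List[str]:
        """Return the faults suspected for one node (empty list if none)."""
        faults = []
        if now_s - last_seen_s > self.timeout_s:
            faults.append("silent node")
        low, high = self.mib.phenomenon_range
        if not low <= reading <= high:
            faults.append("implausible reading")
        if voltage < self.mib.battery_floor_v:
            faults.append("low energy")
        return faults


if __name__ == "__main__":
    greenhouse = ApplicationKnowledge("flat", "indoor", 60.0, (-10.0, 50.0), 2.2)
    detector = FaultDetector(greenhouse)
    # 300 s of silence exceeds the 180 s timeout; 85.0 is outside (-10, 50).
    print(detector.check(now_s=1000.0, last_seen_s=700.0,
                         reading=85.0, voltage=2.9))
    # ['silent node', 'implausible reading']
```

In this sketch, moving to another sensor application means supplying a different `ApplicationKnowledge` record rather than writing new detection code, which is the kind of portability the paragraph argues for. The detector also doubles as a simple generic common interface: every application is checked through the same `check` call.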

    The issues discussed above are by no means exhaustive. Several other factors and design considerations, including node deployment versus placement, synchronization, coverage and security, must be tackled before an application-independent fault management architecture for WSNs can be designed and developed.

    VI. CONCLUSION

    In this paper we presented a survey on fault management in WSNs and reviewed current approaches dealing with faults and failures in WSNs at different levels. We surveyed state-of-the-art protocols, algorithms and techniques applied for fault management in WSNs. Our literature survey shows that current approaches to fault management provide solutions for faults and failures only in specific applications and scenarios. We also discussed problems and issues in the existing management approaches and argued that there is a need for an application-independent fault management architecture that can provide extensive fault management solutions for all types of faults and failures in WSNs. Finally, we proposed design criteria that need to be considered when designing an application-independent fault management architecture for

    WSNs with a more holistic approach. Integrating application knowledge into the management infrastructure provides the basis for developing an application-independent fault management architecture. We are further investigating mechanisms for injecting application knowledge into the MIB of the management infrastructure.

    Scheme | Management system organization | Types of faults & failures addressed | Action taken
    Sympathy [10] | Centralized, hierarchical; pro-active monitoring | Node self faults, network faults, sink faults, crash and time-out omission failures | Fault detection & diagnosis
    MANNA [12] | Centralized + distributed; passive monitoring | Node faults | Detection, diagnosis & recovery
    WinMS [14] | Centralized + distributed (hierarchical); pro-active monitoring | Node faults (weak or faulty nodes) | Detection & recovery
    WSNMP [23] | Centralized + distributed (hierarchical, clustering based) | Node faults, network faults | Detection & recovery
    Cluster-based approach [20, 21] | Centralized + distributed | Node faults (energy failures), network faults (network connectivity), permanent faults | Detection & recovery
    Passive diagnosis of WSNs [11] | Centralized + hierarchical, probabilistic approach; passive monitoring | Node faults, network faults, transient faults | Detection, diagnosis & recovery
    Efficient tracing of failed nodes [15] | Centralized; active monitoring | Node faults, route faults | Detection, diagnosis & recovery
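The categorization of Table 1 can also be encoded as data and queried, for example to list the schemes that cover all three phases of fault management (detection, diagnosis and recovery). The encoding below is our own illustrative transcription of the "Action taken" column of Table 1 into Python; the `Scheme` type and the `covers_all_phases` helper are hypothetical names, not part of any surveyed system.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

PHASES = ("detection", "diagnosis", "recovery")


@dataclass(frozen=True)
class Scheme:
    name: str
    organization: str
    phases: FrozenSet[str]  # fault management phases the scheme performs


# Illustrative transcription of Table 1 (organization and action-taken columns).
TABLE_1 = [
    Scheme("Sympathy [10]", "centralized", frozenset({"detection", "diagnosis"})),
    Scheme("MANNA [12]", "centralized + distributed", frozenset(PHASES)),
    Scheme("WinMS [14]", "centralized + distributed", frozenset({"detection", "recovery"})),
    Scheme("WSNMP [23]", "centralized + distributed", frozenset({"detection", "recovery"})),
    Scheme("Cluster-based approach [20, 21]", "centralized + distributed",
           frozenset({"detection", "recovery"})),
    Scheme("Passive diagnosis of WSNs [11]", "centralized + hierarchical", frozenset(PHASES)),
    Scheme("Efficient tracing of failed nodes [15]", "centralized", frozenset(PHASES)),
]


def covers_all_phases(schemes: List[Scheme]) -> List[str]:
    """Names of schemes that perform detection, diagnosis and recovery."""
    return [s.name for s in schemes if s.phases >= set(PHASES)]


if __name__ == "__main__":
    for name in covers_all_phases(TABLE_1):
        print(name)
```

Run as a script, it prints MANNA [12], Passive diagnosis of WSNs [11] and Efficient tracing of failed nodes [15], that is, three of the seven listed schemes.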

    REFERENCES

    [1] http://www.alicosystems.com/Wireless%20Sensor%20Netw.
    [2] L. Paradis and Q. Han, "A Survey of Fault Management in Wireless Sensor Networks," Journal of Network and Systems Management, Springer Science + Business Media, LLC, vol. 15, pp. 171-190, June 2007.
    [3] F. Koushanfar, M. Potkonjak, and A. Sangiovanni-Vincentelli, "Fault tolerance techniques for wireless ad hoc sensor networks," in Proceedings of IEEE Sensors, vol. 2, 2002, pp. 1491-1496.
    [4] M. Ding, D. Chen, K. Xing, and X. Cheng, "Localized fault-tolerant event boundary detection in sensor networks," in INFOCOM 2005, 24th Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2, 2005, pp. 902-913.
    [5] L. B. Ruiz, I. G. Siqueira, L. B. e Oliveira, H. C. Wong, J. M. S. Nogueira, and A. A. F. Loureiro, "Fault management in event-driven wireless sensor networks," in Proceedings of the 7th ACM International Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems, Venice, Italy: ACM, 2004, pp. 149-156.
    [6] J. Suhonen, M. Kohvakka, M. Hannikainen, and T. D. Hamalainen, "Embedded Software Architecture for Diagnosing Network and Node Failures in Wireless Sensor Networks," in Embedded Computer Systems: Architectures, Modeling, and Simulation, vol. 5114, Springer Berlin / Heidelberg, July 2008, pp. 258-267.
    [7] W. L. Lee, A. Datta, and R. Cardell-Oliver, "Network Management in Wireless Sensor Networks," in Handbook on Mobile Ad Hoc and Pervasive Communications, American Scientific Publishers, 2006.
    [8] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A Survey on Sensor Networks," IEEE Communications Magazine, pp. 102-114, August 2002.
    [9] M. Yu, H. Mokhtar, and M. Merabti, "Fault Management in Wireless Sensor Networks," IEEE Wireless Communications, vol. 14, pp. 13-19, 2007.
    [10] N. Ramanathan, E. Kohler, L. Girod, and D. Estrin, "Sympathy: a debugging system for sensor networks," in 29th Annual IEEE International Conference on Local Computer Networks, 2004, pp. 554-555.
    [11] K. Liu, M. Li, Y. Liu, M. Li, Z. Guo, and F. Hong, "Passive diagnosis for wireless sensor networks," in Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems (SenSys '08), Raleigh, NC, USA: ACM, 2008, pp. 113-126.
    [12] L. B. Ruiz, J. M. Nogueira, and A. A. F. Loureiro, "MANNA: a management architecture for wireless sensor networks," IEEE Communications Magazine, vol. 41, pp. 116-125, 2003.
    [13] S. Elhadi, X. Xinyu, and Z. Haiyi, "Agent-based Fault Detection Mechanism in Wireless Sensor Networks," in Proceedings of the 2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE Computer Society, 2007.
    [14] W. L. Lee, A. Datta, and R. Cardell-Oliver, "WinMS: Wireless Sensor Network-Management System, An Adaptive Policy-Based Management for Wireless Sensor Networks," School of Computer Science & Software Engineering, The University of Western Australia, CSSE Technical Report UWA-CSSE-06-001, June 2006.
    [15] J. Staddon, D. Balfanz, and G. Durfee, "Efficient tracing of failed nodes in sensor networks," in Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications, Atlanta, Georgia, USA: ACM, 2002, pp. 122-130.
    [16] C. Hsin and M. Liu, "A Two-Phase Self-Monitoring Mechanism for Wireless Sensor Networks," Journal of Computer Communications, special issue on Sensor Networks, vol. 29, pp. 462-476, February 2006.
    [17] M.-H. Lee and Y.-H. Choi, "Distributed diagnosis of wireless sensor networks," in IEEE Region 10 Conference (TENCON '07), 2007, pp. 1-4.
    [18] J. Chen, S. Kher, and A. Somani, "Distributed fault detection of wireless sensor networks," in Proceedings of the 2006 Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks, Los Angeles, CA, USA: ACM, 2006.
    [19] S.-T. Cheng, S.-Y. Li, and C.-M. Chen, "Distributed Detection in Wireless Sensor Networks," in Seventh IEEE/ACIS International Conference on Computer and Information Science (ICIS '08), 2008, pp. 401-406.
    [20] G. Venkataraman, S. Emmanuel, and S. Thambipillai, "A Cluster-Based Approach to Fault Detection and Recovery in Wireless Sensor Networks," in 4th International Symposium on Wireless Communication Systems (ISWCS '07), 2007, pp. 35-39.
    [21] Y.-C. Chang, Z.-S. Lin, and J.-L. Chen, "Cluster based self-organization management protocols for wireless sensor networks," IEEE Transactions on Consumer Electronics, vol. 52, pp. 75-80, 2006.
    [22] T. Clouqueur, K. K. Saluja, and P. Ramanathan, "Fault Tolerance in Collaborative Sensor Networks for Target Detection," IEEE Transactions on Computers, vol. 53, pp. 320-333, 2004.
    [23] M. M. Alam, M. Mamun-Or-Rashid, and C. S. Hong, "WSNMP: A Network Management Protocol for Wireless Sensor Networks," in 10th International Conference on Advanced Communication Technology (ICACT '08), vol. 1, 2008, pp. 742-747.
    [24] M. Al-Kasassbeh and M. Adda, "Network fault detection with Wiener filter-based agent," Journal of Network and Computer Applications, vol. 32, pp. 824-833, 2009.
    [25] P. M. Khilar and S. Mahapatra, "Intermittent Fault Diagnosis in Wireless Sensor Networks," in 10th International Conference on Information Technology (ICIT 2007), 2007, pp. 145-147.
    [26] Y. Lai and H. Chen, "Energy-Efficient Fault-Tolerant Mechanism for Clustered Wireless Sensor Networks," in Proceedings of the 16th International Conference on Computer Communications and Networks (ICCCN '07), 2007, pp. 272-277.
    [27] M. Asim, H. Mokhtar, and M. Merabti, "A Fault Management Architecture for Wireless Sensor Network," in International Wireless Communications and Mobile Computing Conference (IWCMC '08), 2008, pp. 779-785.
    [28] M. Yu, H. Mokhtar, and M. Merabti, "Self-Managed Fault Management in Wireless Sensor Networks," in The Second International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM '08), 2008, pp. 13-18.
    [29] A. Perrig, R. Szewczyk, J. D. Tygar, V. Wen, and D. E. Culler, "SPINS: Security Protocols for Sensor Networks," in ACM MobiCom '01, Rome, Italy, 2001, pp. 189-199.
    [30] S. Marti, T. J. Giuli, K. Lai, and M. Baker, "Mitigating routing misbehavior in mobile ad hoc networks," in Proceedings of the 6th Annual International Conference on Mobile Computing and Networking, Boston, Massachusetts, United States: ACM, 2000, pp. 255-265.