
Model-Based Techniques for Data Reliability in Wireless Sensor Networks

Shoubhik Mukhopadhyay, Student Member, IEEE, Curt Schurgers, Member, IEEE,

Debashis Panigrahi, Member, IEEE, and Sujit Dey, Senior Member, IEEE

Abstract—Wireless Sensor Networks are a fast-growing class of systems. They offer many new design challenges due to stringent requirements such as tight energy budgets, low-cost components, limited processing resources, and small-footprint devices. Such strict design goals call for the use of technologies like nanometer-scale semiconductor design and low-power wireless communication. However, using them would also make the sensor data more vulnerable to errors, within both the sensor nodes' hardware and the wireless communication links. Assuring the reliability of the data is going to be one of the major design challenges of future sensor networks. Traditional methods for reliability cannot always be used, because they introduce overheads at different levels, from hardware complexity to the amount of data transmitted. This paper presents a new method that makes use of the properties of sensor data to enable reliable data collection. The approach consists of creating predictive models based on the temporal correlation in the data and using them for real-time error correction. This method handles multiple sources of errors together without imposing additional complexity or resource overhead at the sensor nodes. We demonstrate the ability to correct transient errors arising in sensor node hardware and wireless communication channels through simulation results on real sensor data.

Index Terms—Reliability, data models, wireless sensor networks, error correction.


1 INTRODUCTION

Recent advances in semiconductor technology have enabled the convergence of sensing, communicating, and computing on a single device. This has led to the emergence of wireless sensor networks, which are predicted to grow rapidly in the near future [3], [4] and to enable a wide range of new applications. This growth is mainly driven by developments in semiconductor design technology, e.g., the scaling of feature sizes and lowering of operating voltages, which are allowing sensor nodes to become smaller and more power efficient [5]. Many new sensor applications are being developed that can benefit from large numbers of such small sensor nodes [6], [7]. But as the nodes get smaller and cheaper, ensuring the reliability of sensor data becomes harder, since the hardware becomes less robust to many types of errors due to the effects of aggressive technology scaling. Similarly, errors in the wireless communication channels are another source of unreliability, as limitations on transmission power due to tight energy constraints make the links more susceptible to noise and interference. The problem is further aggravated by exposure to harsh physical environments, which is common in many typical sensing applications. Consequently, ensuring the reliability of the data in a sensor network is going to be a growing problem and a challenging part of designing sensor networks. In this paper, we propose a novel approach to enable reliable data collection with low-cost and small-footprint devices. Our approach utilizes properties specific to sensor networks and has two advantages: first, it handles multiple sources of errors, and second, it imposes no overhead at the sensor nodes.

To introduce our approach, let us consider a sensor network that monitors the temperature distribution in an office environment by taking new readings every few minutes. Since the "meaning" of the data carried over this network is known beforehand, certain properties of the data can be validated within the network during regular operation, and samples deduced to be impossible can be marked or ruled out. For example, if the temperature is measured every minute and is always observed to be within two degrees of 25 °C, a few isolated readings at 100 °C or -10 °C are likely to be due to errors and should be verified against other sources before triggering any response. Similarly, if a light sensor that is used to automatically control lighting reports a sudden change in the light level but falls back to the prior level in the next second, it is probably not a good design to change the lighting level immediately. Both of these cases illustrate how the meaning of the data, known to the application designer, can be used to test the validity of the samples within the network as they are recorded. The approach for reliability that we propose in this paper is a generalization of this idea, where knowledge of any properties of the data source can be systematically used to perform error correction. Such knowledge could be available during design, passed on during deployment, or inferred from historical data.

The different sources of errors affecting wireless sensor networks have been studied extensively, and many methods to handle them individually have been developed in the literature. However, the traditional approaches to providing reliability against each kind of error are based on adding redundancy and lead to resource overheads in terms of communication bandwidth or hardware complexity, as discussed in Section 2 of this paper. In contrast, our approach uses the properties of the data to distinguish them from errors introduced at any stage. Since it introduces no hardware, processing, or energy overheads at the sensor nodes, it allows simplification of the sensor node design and further miniaturization without impacting robustness. As we discuss in Section 3, this also facilitates a hierarchical network architecture for sensor networks.

IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 4, APRIL 2009

The authors are with the Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0407. E-mail: {shoubhik, curts, dpani, dey}@ece.ucsd.edu.

Manuscript received 4 June 2007; revised 27 Jan. 2008; accepted 18 Aug. 2008; published online 5 Sept. 2008. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TMC-2007-06-0153. Digital Object Identifier no. 10.1109/TMC.2008.131.

1536-1233/09/$25.00 © 2009 IEEE. Published by the IEEE CS, CASS, ComSoc, IES, & SPS

We demonstrate our approach through a methodology that uses temporal correlation in sensor data sets to perform error detection and correction at receiver nodes. Our method, described in Section 4, consists of generating predictions for future sensor data at runtime by using the knowledge about correlation and comparing sequences of these predictions with the observed data within a decision tree. The correlation properties are embodied in data models, which generate predictions based on the recent history of observations. While we limit our implementation to autoregressive (AR) models, the methodology can be extended to work with other types of models that are suitable for specific properties of data, e.g., periodicity. We also illustrate how our method can be used to complement traditional techniques like Cyclic Redundancy Check (CRC)-based error detection, e.g., to adjust the level of protection depending on the relative levels of correlation in the data and the levels of errors. We present the problem of data modeling in Section 5 and discuss the criteria that make a model suitable for use in our method. We also describe how the model can be adapted to changes in data properties at runtime.

2 MOTIVATION

In this section, we motivate the need for new approaches to reliable data collection in wireless sensor networks. We first describe the different types of errors that can affect the sensor data and then look at some of the traditional methods for handling such problems. We also discuss why these techniques will not be able to meet the requirements of many sensor networking applications and why a different approach for reliability is necessary.

2.1 Sources of Errors in Sensor Networks

In a wireless sensor network, sensor data samples are exposed to various sources of errors during the course of sensing, processing, and communication. First, the sensors can report faulty readings because of changing operating conditions (e.g., temperature, humidity, etc.) or bias and calibration drifts caused by aging [8], [9]. After sensing and quantization, the data can still be affected within the sensor nodes by various hardware errors such as crosstalk and radiation effects [10]. Afterward, the data samples are again exposed to errors in the wireless communication channel. These errors can differ widely in terms of severity, frequency of occurrence, and statistical properties. In this paper, we classify the sources of error into two types, transient and permanent, based on the effect they have on the sensor data. The method of error correction that we present here is designed to address the transient errors that affect the sensor data after quantization and is effective against errors from multiple sources.

The main driving force behind the feasibility of ultrasmall, cheap, low-power sensor nodes has been the aggressive technology scaling in VLSI circuits, which has enabled the small form factors and high battery efficiency of sensor nodes. However, as a side effect of this scaling, transient errors in hardware are becoming a prominent problem, as the shrinking of feature sizes to nanometer scales and the lowering of supply voltages to subvolt ranges are making circuits vulnerable to various noise and interference effects [11], [12], [13]. These effects include various temporary environmental conditions such as power supply and interconnect noise, electromagnetic interference, electrostatic discharges, and also neutron and alpha particle strikes [14], [15], [16]. Short-term disturbances caused by any of these effects can lead to transient errors in logic or memory circuits that would affect any data being processed or stored.

The other source of transient errors in sensor data is the communication channel. Effects like noise, interference, and fading are already substantial problems and, since they are fundamental properties of the physical medium, will persist in future generations of sensor networks too. Moreover, the need for longer battery life and low-power operation is going to limit the transmission power and the number of retransmissions that could be used to compensate for such channel impairments at the lower layers.

For the current prototypes of sensor nodes, the dominant sources of errors are the sensing errors and the communication channel errors. But the trends shown by semiconductor technology development indicate that as the nodes get smaller, the effect of soft errors in the hardware is going to become more dominant. Outdoor deployment in harsh physical conditions further aggravates the vulnerability of these nodes. Therefore, for future generations of sensor networks that are envisioned (e.g., Smartdust [7], [17]), errors in both the hardware and communication channels are going to be significant problems. We now look at how these problems may be handled by traditional approaches to error correction.

2.2 Traditional Methods for Error Correction

The different sources of errors have been studied extensively in the literature, and various methods have been developed to handle them individually. Some of these traditional methods for error correction are listed in the first four rows of Table 1, with the check marks indicating which sources of errors each method is able to address. However, each of these approaches also adds some overhead to the system, which can be prohibitive for many applications of sensor networks. As also listed in the table, these overheads can be in terms of transmitted data, costs, or hardware complexity. For example, the traditional methods to handle soft errors in circuits, such as Triple Modular Redundancy (TMR) or Error Correction Codes (ECCs), add extra hardware and complexity to the designs [18], [19], [20]. With the projected increase in the levels of soft errors, the cost and complexity overheads will continue to increase with successive generations of technology. Similarly, communication errors are traditionally handled by introducing redundancy, through either channel coding or Automatic Retransmission Requests (ARQs) coupled with error detection. Channel coding works using Forward Error Correction (FEC) codes, which add extra bits to transmitted packets that allow correct decoding when some of the bits are corrupted. One common example of FEC is Reed-Solomon coding [21], which is widely used in telecommunications, broadcasting, and data storage. It works by processing data in fixed-size blocks of multibit symbols, adding a fixed number of overhead (parity) symbols during encoding. The decoder can recover a block of data when the number of symbol errors in the block does not exceed half the number of overhead symbols. In Section 6.4, we compare our approach against Reed-Solomon coding and demonstrate the substantial energy savings that can be achieved by avoiding the overheads in packet size. The overheads of channel coding can be offset by data compression through source coding, but this will increase the complexity and cost requirements on the sensor nodes. ARQ is simple to implement and has little overhead under good channel conditions, but in the presence of errors, it can have high energy and bandwidth overheads due to retransmissions.

TABLE 1. Different Reliability Methods and Their Resource Overheads
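To make the overhead trade-off concrete, the following Python sketch computes the parity overhead and correction capability of a Reed-Solomon code RS(n, k) over 8-bit symbols, and the expected number of transmissions per packet under an idealized ARQ scheme. It is a back-of-the-envelope illustration under stated assumptions (independent bit errors, perfect error detection, unlimited retries), not the energy model used in Section 6.4; the function names are hypothetical.

```python
from typing import Tuple


def rs_parameters(n: int, k: int, symbol_bits: int = 8) -> Tuple[float, int]:
    """Parity overhead (relative to payload) and correctable symbol errors of RS(n, k)."""
    # A Reed-Solomon code over m-bit symbols has block length n <= 2^m - 1.
    if not 0 < k < n <= 2 ** symbol_bits - 1:
        raise ValueError("require 0 < k < n <= 2**symbol_bits - 1")
    parity = n - k
    # The decoder corrects up to floor((n - k) / 2) symbol errors per block.
    return parity / k, parity // 2


def arq_expected_transmissions(ber: float, packet_bits: int) -> float:
    """Expected transmissions per packet for an idealized ARQ scheme.

    Assumes independent bit errors, perfect error detection, and unlimited
    retries, so the number of attempts is geometric with success
    probability (1 - ber) ** packet_bits. Requires ber < 1.
    """
    p_success = (1.0 - ber) ** packet_bits
    return 1.0 / p_success


overhead, t = rs_parameters(255, 223)
print(f"RS(255,223): {overhead:.1%} extra symbols, corrects up to {t} symbol errors")
print(f"ARQ, BER=1e-3, 1000-bit packets: "
      f"{arq_expected_transmissions(1e-3, 1000):.2f} transmissions per packet")
```

For example, the widely used RS(255, 223) configuration spends 32 parity bytes per 223 data bytes (about 14%) to correct up to 16 byte errors per block, whereas the ARQ cost grows quickly with packet length once the bit error rate is no longer negligible.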

These observations show that the traditional techniques for error handling have significant resource overheads, which will make them prohibitively difficult to apply in many sensor networking applications. The effect of these overheads will increase further in future generations of sensor networks, with sensor nodes becoming smaller and more resource constrained. As a result, new techniques for reliability need to be developed to fit the needs of highly resource-constrained sensor networks, particularly for applications with large numbers of sensor nodes operating in the presence of multiple sources of errors. We propose an approach that uses the redundancy within sensor data sets to address transient errors in both computation and communication, thus enhancing reliability without any complexity overhead in the sensor nodes. We also demonstrate how this approach can be combined with traditional techniques depending on the requirements of the application and the level of redundancy within the data.

3 PAPER CONTRIBUTIONS AND OVERALL APPROACH

In this section, we present the contributions of this paper. The first contribution is a novel approach for error correction in wireless sensor networks, which can handle the different types of errors mentioned in Section 2. We describe this approach and its implications for the network architectures of sensor networks. The second contribution is model-based error correction, a methodology for error correction based on this approach. The final part of this section gives an overview of this method.

3.1 Using Sensor Data for Error Correction

The central idea of our proposed approach consists of using the properties of sensor data sources to detect and correct errors in the data. Sensor networks are designed to observe physical processes and typically oversample the sensors, which results in significant spatial and temporal correlations [22]. The correlation provides built-in redundancy within the data, which can be handled in different ways as part of the system design. One option could be to eliminate this redundancy close to the source by compression techniques, as is done in traditional communication networks. However, while this improves utilization of the communication channel, it also reduces the robustness of the data against errors and necessitates strong error coding methods that can add complexity to the end points. Instead, we propose to utilize this redundancy to correct errors from various sources. The approach is based on the difference between the correlation properties of the data and those of the sources of errors. For example, soft errors are completely uncoordinated [10], while the sensor data can have different types of correlation, depending upon the application, type of sensor, location of deployment, etc. Since the correlation properties of the sensor data can vary depending on many factors, the success of this approach depends on effectively identifying these properties. It also depends on how well the properties of the errors distinguish them from the data. In the later part of this section, we describe a method that tries to achieve both these goals, and in the next section, we discuss which properties of errors allow them to be distinguished.
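One rough way to check whether a data stream carries the kind of built-in temporal redundancy described above is to estimate its lag-1 sample autocorrelation. The Python sketch below is a minimal illustration on a synthetic signal we made up (a slow daily temperature cycle sampled once per minute, plus small noise), contrasted with independent noise standing in for errors that are uncorrelated from sample to sample; it does not use the paper's data sets.

```python
import math
import random
from typing import Sequence


def autocorrelation(x: Sequence[float], lag: int = 1) -> float:
    """Sample autocorrelation of x at the given lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


rng = random.Random(0)
# Hypothetical oversampled signal: a 2-degree daily cycle around 25 C,
# sampled once per minute for two days, with small measurement noise.
smooth = [25 + 2 * math.sin(2 * math.pi * t / 1440) + rng.gauss(0, 0.05)
          for t in range(2880)]
# Independent noise, standing in for sample-to-sample uncorrelated errors.
white = [rng.gauss(0, 1) for _ in range(2880)]

print(f"lag-1 autocorrelation, smooth signal: {autocorrelation(smooth):.3f}")
print(f"lag-1 autocorrelation, white noise:   {autocorrelation(white):.3f}")
```

A value close to 1 indicates that each sample is largely predictable from its predecessor, which is the property the rest of the method relies on; a value near 0 indicates that no such redundancy is available.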

There are two important benefits of being able to use the data properties for error correction. First, it makes the network design simpler and more efficient by addressing multiple types of errors together and eliminating the overheads of removing or adding redundancy through compression or error coding. Second, it allows all the processing required for error correction to be moved out of the sensor node. This can have a significant impact on the network architecture, since it lets the sensor nodes become simpler while specialized nodes perform error correction, possibly for multiple sensor nodes simultaneously. The resulting network architecture is described below.

3.1.1 Network Architecture

Hierarchical topologies are becoming increasingly popular for wireless sensor network design, e.g., in IEEE 802.15.4 and other research networks [23], [24], [25]. In a hierarchical topology, some of the nodes (cluster-heads) are dedicated to working as gateways for a group of other nodes (called leaf nodes), so that the latter can connect to a node outside their cluster only through the cluster-heads. Typically, the nodes in a hierarchical sensor network architecture are heterogeneous, since the cluster-heads carry more traffic while the leaf nodes are dedicated to sensing. Therefore, the separation of error correction functionality from the site of sensing and the asymmetric resource requirements of our approach make it a natural fit for hierarchical network architectures. Fig. 1 illustrates how our correction approach can be mapped onto a hierarchical network architecture. The sensing is done by clusters of dedicated sensor nodes that report the sensor data to more complex cluster-head nodes. Each cluster is organized in a star topology, but the cluster-heads can connect among themselves using any flat or clustered connection topology. The cluster-heads are distinct from the resource-limited sensor nodes by design as well as by functionality: they have more processing and energy resources and are capable of multiple complex functions that involve data processing and storage. Each cluster-head node can configure the scheduling of sensing, data reporting, and sleeping cycles of the sensor nodes around it.

Although our method is not constrained to such architectures, using it here makes a sensor network more flexible and scalable and enables handling of diverse sensing requirements without sacrificing the reliability of the overall system. Thus, the sensor nodes can be simple and very cheap, with only a sense-and-report functionality. Such sensors can be deployed in large numbers and can be replaced or supplemented when requirements change or individual nodes fail. On the other hand, the cluster-heads will be complex but can have most of their functionality implemented in software, which will allow a cost-effective generic design to be used across applications. Since our approach is based on using the correlation properties of sensor data, it can also benefit from the large number of sensor nodes and measurement points that such an architecture enables. When the level of redundancy in the data is extreme (too low or too high), it is possible to use our techniques in conjunction with traditional approaches like channel coding and source coding to tune the level of operation to current requirements. We illustrate this in Section 4.5 through a hybrid approach that incorporates the outcome of CRC checksum computation in our method.

3.2 Overview of Model-Based Error Correction

We implement the idea of using data properties for reliability in our proposed methodology for error correction, called model-based error correction. Our method consists of analyzing the sensor data and capturing the relevant properties in a data model, which is used during system operation to perform error detection and correction on incoming data. The algorithms in our approach run on the cluster-head nodes and can also be implemented in hardware to improve their efficiency or allow parallel execution.

There are two main parts in this method: data modeling and correction. The first step is the construction of the data model by analysis of the properties of the source data. Certain statistical properties will be generally applicable to the type of application and may be identified by offline analysis, e.g., knowing that the correlation time is on the order of hours rather than milliseconds. Other properties will need to be identified for each sensor and will require runtime tuning of the data model. Once a model is identified, it is used during normal operation of the network to detect and correct errors, as illustrated in Fig. 2. Before each sample of sensor data is received, its predicted value ($X_p$) is computed using the data model. Upon receiving the observed sample ($X$), the error correction block uses $X_p$ and the past observations to decide the likelihood of the observed data being erroneous. The algorithm chooses the corrected value ($X_c$) to report based on either the observed value or the predicted value, depending on the outcome of error detection. The proposed method also includes a facility for adapting the model online based on the collected data to improve the accuracy of prediction and correction. It should be noted that our approach is best suited for sensor networks designed to monitor variations that are slow relative to the sampling rate. This excludes event-detection applications or those with low sampling rates, where the data are normally expected to contain sharp variations lasting for one or two samples, and the transient errors can be indistinguishable from application events.
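The predict-compare-correct loop of Fig. 2 can be sketched in Python as follows. This is deliberately simplified and is not the paper's decision algorithm: it assumes a fixed AR-style predictor with hypothetical coefficients and replaces an observation by its prediction whenever the two differ by more than a hypothetical threshold, deciding immediately instead of using the delayed, tree-based comparison described in the rest of this section. Rejected observations are kept out of the history, so later predictions are computed from corrected values.

```python
from typing import List, Sequence


def predict(history: Sequence[float], coeffs: Sequence[float]) -> float:
    """AR-style one-step prediction: sum of coeffs[i] * history[-1 - i]."""
    return sum(a * history[-1 - i] for i, a in enumerate(coeffs))


def correct_stream(samples: Sequence[float], coeffs: Sequence[float],
                   threshold: float) -> List[float]:
    """Report X if |X - X_p| <= threshold, else X_p (no look-ahead)."""
    p = len(coeffs)
    corrected = list(samples[:p])          # assume the first p samples are error-free
    for x in samples[p:]:
        xp = predict(corrected, coeffs)    # X_p from the corrected history
        xc = x if abs(x - xp) <= threshold else xp
        corrected.append(xc)               # rejected observations never enter the history
    return corrected


readings = [20.0, 20.1, 20.2, 35.0, 20.4, 20.5]   # hypothetical data with one spike
print(correct_stream(readings, coeffs=[2.0, -1.0], threshold=1.0))
```

A single fixed threshold has an obvious weakness: a genuine but abrupt change in the monitored quantity would also be overwritten. That is why the method described in this section defers the decision until later samples have been seen.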

4 PREDICTIVE ERROR CORRECTION

In the previous section, we presented the idea of performing error correction using data models that can represent the correlation in the sensor data sources. Here, we first discuss how our approach for correction is shaped and supported by the characteristics of errors occurring in wireless sensor networks. We then present the details of our method and explain the correction algorithms and implementation framework. It is assumed in this section that a working data model is made available externally. The computation of the model is covered in Section 5.

4.1 Error Models

The approach for error correction proposed in this paper is to use the correlation properties within the data. For this to be effective, the correlation characteristics of the data must differ from those of the errors. Since our implementation is based on temporal correlation across successive data samples, it is expected to be effective against errors that are uncorrelated on the same time scale. We observed that, among the types of errors discussed in Section 2, many can be modeled as random bit errors in the quantized samples, uniformly distributed across the bit positions. Below, we explain how this model applies to two different types of errors and why it is important in the context of our error correction methodology.

Fig. 1. Hierarchical network architecture showing the functions of sensor and cluster-head nodes.

Fig. 2. Overall scheme for model-based error correction.

The first type consists of the transient errors that occur in hardware for the various reasons discussed in Section 2. These are well-known problems, and numerous fault models have been developed for different sources, e.g., probabilistic models for process variations or random particle strikes, as well as deterministic models based on circuit layouts and signals for crosstalk [26], [27]. However, these usually model the effect of the errors at the gate level or register level [28]. Since we are interested in the overall effect of errors on the data, we propose to model the various types of uncorrelated single-event upsets, as described in [16], as independent bit errors that are uniformly distributed across the data. This model is particularly appropriate for transient errors in memory chips, where the uniform structures for the logic and layout of memory cells make the cells equally susceptible to radiation and interference effects. Although it does not include design- and input-specific parameters, our model is also useful as a first-order approximation for logic circuits.

The other source of errors that we consider is the wireless communication channel. There are various well-known models to represent communication channel errors as well. The choice can depend on the source of the errors, which can include noise, interference, and fading due to mobility, as well as the specifics of the modulation scheme, channel conditions, transmission rates, etc. For channels dominated by fading or interference errors, the channel conditions are usually measured and used to estimate an error model, which is applied to a sequence of packets together. However, such models are most effective where data are transmitted in continuous streams, so that the channel properties across adjacent packets are correlated. In a typical sensor network scenario, by contrast, the sensor data may be sampled and transmitted at larger intervals, probably on the order of seconds [29]. This makes the packet interval larger than typical channel coherence times by orders of magnitude, so that estimating a channel model for shaping the transmissions of a sequence of packets will be very inefficient. In the absence of any such estimates, we propose to use a model of uncoordinated random bit errors that are independent across packets.

If both the hardware and communication errors can be represented with the uniform random model, our correction approach can distinguish them from the data by relying on the data correlation. Moreover, this also allows both types of errors to be addressed together, which is one of the main benefits of our method. For our simulations in Section 6, we generate the errors using a Bernoulli process with a uniform error probability for all bits in the data. It should be noted that although we have used the uniform bit error rate (BER) model for its simplicity, it is not a prerequisite for our method, which can be effective under bursty error conditions as well. The main requirement for our approach is that errors are uncorrelated across data samples. Therefore, as long as the packet transmission intervals exceed the channel coherence time or burst lengths, the correlation in the data can still be effective in identifying errors. For example, if the sensors are sampled every few seconds, 50-200 ms error bursts [30] are not going to affect multiple data samples, so the errors in successive samples will be uncorrelated. Moreover, if the high BER during an error burst affects multiple bits within each packet, it helps our approach by making such a sample easier to detect. We also confirm this in Section 6, using traces of real errors with bursty, nonidealized characteristics.
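The uniform bit error model used for the simulations can be sketched as a Bernoulli process over the bits of each quantized sample. The Python sketch below assumes integer ADC codes of a fixed width (10 bits in the usage example, a hypothetical choice) and flips each bit independently with probability `ber`; the function name is ours. The effect of a flip depends on its position: in a 10-bit sample, flipping the most significant bit changes the value by 512 codes, while flipping the least significant bit changes it by 1.

```python
import random
from typing import List, Sequence


def inject_bit_errors(samples: Sequence[int], bits: int, ber: float,
                      rng: random.Random) -> List[int]:
    """Flip each of the `bits` low-order bits of every sample independently with probability ber."""
    corrupted = []
    for s in samples:
        mask = 0
        for b in range(bits):
            if rng.random() < ber:     # one Bernoulli trial per bit position
                mask |= 1 << b
        corrupted.append(s ^ mask)     # XOR applies the selected bit flips
    return corrupted


rng = random.Random(42)
clean = [512] * 1000                   # hypothetical 10-bit readings
noisy = inject_bit_errors(clean, bits=10, ber=0.01, rng=rng)
flipped = sum(bin(a ^ b).count("1") for a, b in zip(clean, noisy))
print(f"{flipped} of {10 * len(clean)} bits flipped")
```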

4.2 Correction Methodology

Our error-correction method consists of computing a predicted value for each sample and using the prediction error, i.e., the difference between the predicted and the observed value, to detect whether the observation is correct or erroneous. In general, an abruptly large prediction error implies that the observation has been corrupted by an error. However, since the data source is a random process, there will always be a certain level of prediction error for even the best model. Moreover, as described in Section 5, limitations like complexity or size of history can also introduce inadequacies in the models, which lead to an increase in the level of prediction errors. Therefore, the main challenge in our method lies in determining the cause of a prediction error after it is observed, i.e., whether it is due to the randomness of the data or to an error introduced into the data after sensing.
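Written out for a zero-mean autoregressive model of order $p$ (the model family used in our implementation; the coefficient notation $a_i$, the history notation $\tilde{X}$, and the assumption that any mean has been removed are ours for illustration), the quantities compared by the method are the one-step prediction and its prediction error:

```latex
X_p[n] = \sum_{i=1}^{p} a_i \, \tilde{X}[n-i],
\qquad
e[n] = X[n] - X_p[n]
```

Here $\tilde{X}[n-i]$ denotes the value retained in the history for sample $n-i$: the observation if it was accepted, or the prediction if the observation was treated as an erasure.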

In our method, we compare the prediction error for each sample with those of its neighboring samples and delay the reporting of the final value by a few sample periods to allow comparison with future samples. Whenever there is a prediction error, our method decides whether to report the observed or the predicted value by looking at how the choice affects the prediction errors for the subsequent samples. Since the correlation model makes the prediction based on recent history, choosing an erroneous observation affects the prediction of future samples and results in a progressive degradation of prediction accuracy. On the other hand, if a prediction inaccuracy is due to randomness or modeling error, choosing the observed value is unlikely to impact the future levels of prediction errors. The details of the decision-making algorithms are discussed in Section 4.4. Since we assume that errors in adjacent samples are uncorrelated (see Section 2), whenever the decision algorithm detects an error in an observed data sample, the observation is marked and treated as an erasure. The sample is also excluded from use in computing subsequent predictions in order to limit propagation of errors.
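The look-ahead idea can be illustrated with a reduced Python sketch that evaluates only two alternatives for one suspicious sample: keep the observation, or substitute the prediction. In each case it sums the absolute prediction errors over the next few observations, which are assumed to be correct here. The function names, the AR-style coefficients, and the sum-of-absolute-errors criterion are illustrative assumptions, not the algorithms of Section 4.4; the PHT described next enumerates both choices for every sample in the window.

```python
from typing import List, Sequence


def predict(history: Sequence[float], coeffs: Sequence[float]) -> float:
    """AR-style one-step prediction: sum of coeffs[i] * history[-1 - i]."""
    return sum(a * history[-1 - i] for i, a in enumerate(coeffs))


def future_error(history: List[float], future: Sequence[float],
                 coeffs: Sequence[float]) -> float:
    """Sum of |X - X_p| over `future`, appending each observation to the history."""
    h = list(history)
    total = 0.0
    for x in future:
        total += abs(x - predict(h, coeffs))
        h.append(x)
    return total


def decide_with_lookahead(history: List[float], x: float, future: Sequence[float],
                          coeffs: Sequence[float]) -> float:
    """Return the value to report for observation x after looking ahead."""
    xp = predict(history, coeffs)
    cost_keep = future_error(history + [x], future, coeffs)    # accept the observation
    cost_erase = future_error(history + [xp], future, coeffs)  # treat it as an erasure
    return x if cost_keep <= cost_erase else xp


coeffs = [2.0, -1.0]          # hypothetical AR(2) coefficients (linear extrapolation)
base = [20.0, 20.1, 20.2]
print(decide_with_lookahead(base, 35.0, [20.4, 20.5], coeffs))  # isolated spike
print(decide_with_lookahead(base, 21.0, [21.1, 21.2], coeffs))  # persistent change
```

For the isolated spike, keeping 35.0 corrupts the next two predictions, so the prediction (about 20.3) is reported; for the persistent change, substituting the prediction is what degrades later predictions, so the observation 21.0 is kept.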

The main steps in our implementation of error detection are outlined in Fig. 3. There are two blocks, prediction and decision, which implement the main control functions. There are also two storage blocks, observation history and prediction history, which maintain the state information required for modeling and prediction, respectively. The prediction block implements the data prediction model obtained from an external model generator, which produces a predicted value for incoming data samples based on the recent history of observations. The output from the prediction block is stored in the prediction history and is used, along with the history of observations, as input by the prediction block to predict future samples. The decision block determines whether an error occurred in a sample by analyzing the effect of this decision on the predictions of future samples. To do this efficiently, the history of the predictions since the last corrected sample is stored in a tree structure, called the Prediction History Tree (PHT), which is processed by the decision algorithm. Below, we first explain the structure of the PHT, followed by a detailed description of the decision algorithms that can operate on the PHT structure. We also discuss a hybrid approach that can be used when the sensor node radio has some out-of-band error detection mechanism, such as a CRC checksum function built into the hardware. In that case, we discuss how the result of the checksum calculation can be used to complement the model-based error detection to improve the correction performance.

4.3 Data Structure for Prediction History: PHT

The PHT holds the few most recently observed sensor data samples at any time. It also stores all the possible sequences of predicted values for these samples, along with the corresponding prediction errors. It is a complete binary tree, where all nodes at a given level, i.e., at the same distance from the root, contain the different possible values for the same sample. The root node contains the last corrected data sample, and its two children hold the observed and predicted values of the next sample currently under consideration. The leaf nodes of the PHT hold the predicted values for the currently observed sample, and each path from the root to a leaf node holds a possible sequence of observed or predicted values for the samples from the one last corrected to the one most recently observed. The depth of the PHT, $N$, defined as the number of samples held in it after the first level, is a key parameter in the decision algorithm. It is called the decision delay, and its choice is bounded by the delay tolerance of the application and the available processing resources in the node.

The structure of the PHT and the method of updating it are illustrated with an example in Fig. 4, where $N = 2$, so that the tree has 4 ($N+2$) levels and 15 ($2^{N+2}-1$) nodes. Each level $l \in \{0, \dots, 3\}$ corresponds to the sample value at time $n-3+l$, and each node in this level holds a pair of values: the first is the observed or a predicted value for the data sample, and the second is the prediction error for nodes holding a predicted sample. Outgoing links marked 0 and 1 connect each node to two child nodes, containing the observed and predicted data values for the next sample, respectively. The nodes are numbered sequentially starting with 0 for the root, such that any node $i$ has two child nodes, $2i+1$ and $2i+2$, that hold the observed and predicted values, respectively. The root contains the last corrected value $X_c[n-3] = 100$ (i.e., $X_c[n-N-1]$), and the even-numbered leaf nodes contain the different predicted values of $X_p[n]$ that would be computed for different choices of previous values.
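The numbering above maps naturally onto a flat array. The following minimal Python sketch (not the authors' C/Matlab code) builds a PHT from the last corrected value, the observations $X[n-N], \dots, X[n]$, and a predictor supplied as a function of the path history. The names `build_pht`, `path_values`, and `children`, and the last-value predictor used in the example, are hypothetical stand-ins; the sample values in the example are invented rather than taken from Fig. 4.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A predictor maps the values on a root-to-node path (oldest first) to a prediction.
Predictor = Callable[[Sequence[float]], float]


def level(i: int) -> int:
    """Level (distance from the root) of node i in the flat-array numbering."""
    return (i + 1).bit_length() - 1


def parent(i: int) -> int:
    return (i - 1) // 2


def children(i: int) -> Tuple[int, int]:
    """(observed child, predicted child) of node i."""
    return 2 * i + 1, 2 * i + 2


def path_values(values: Sequence[float], i: int) -> List[float]:
    """Values on the root-to-i path, oldest first (the history seen along that path)."""
    path = []
    while True:
        path.append(values[i])
        if i == 0:
            break
        i = parent(i)
    return path[::-1]


def build_pht(root_value: float, observations: Sequence[float],
              predict: Predictor) -> Tuple[List[float], List[Optional[float]]]:
    """Build a PHT with len(observations) levels below the root.

    observations[k] is the observed sample at level k + 1, i.e. X[n-N], ..., X[n].
    Odd nodes store the observation; even nodes store the prediction made from the
    values on their own path, plus its absolute prediction error.
    """
    size = 2 ** (len(observations) + 1) - 1
    values: List[float] = [0.0] * size
    errors: List[Optional[float]] = [None] * size
    values[0] = root_value
    for i in range(1, size):  # parents always precede children in this numbering
        x_obs = observations[level(i) - 1]
        if i % 2 == 1:
            values[i] = x_obs
        else:
            xp = predict(path_values(values, parent(i)))
            values[i] = xp
            errors[i] = abs(x_obs - xp)
    return values, errors
```

For instance, `build_pht(100.0, [125.0, 110.0, 112.0], lambda h: h[-1])` yields a 15-node tree in which node 2 holds the prediction 100 for $X[n-2]$ and node 8 holds the prediction 110 for $X[n]$ made along path 0:8.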

Once the new sample $X[n]$ is observed, the prediction errors for all the values of $X_p[n]$ are computed. The decision algorithm is then run to choose the most likely value for $X_c[n-2]$ out of the values in nodes 1 and 2. The contending paths in the example are the ones ending in the even-numbered nodes of the last level, i.e., nodes 8, 10, 12, and 14. The prediction errors for the nodes on each of these paths are used as the basis of comparison by the algorithm, and finally, either the observation 125 (node 1) or the prediction 100 (node 2) is chosen for $X_c[n-2]$.

After making the decision, the algorithm updates the PHT for the next sample. We call the two subtrees rooted in nodes 1 and 2 the observation subtree and the prediction subtree, respectively. One of these nodes is chosen to become the new root node, and all the nodes in its subtree move up by one level. The values in the other subtree are discarded. A new level of leaf nodes is then added, with the even-numbered nodes containing the next set of predicted values $X_p[n+1]$. The next observed value $X[n]$ is inserted in all the odd-numbered nodes in level $N$, and the decision and update process is repeated for the next sample.
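Continuing the flat-array view (again with hypothetical helper names), the update step reduces to copying the selected subtree level by level, which re-indexes it so that node 1 or 2 becomes node 0, and then appending a leaf level. As an assumption made for brevity, the new level is created in one pass once the next observation is known, with the observation placed in every odd child and a prediction from the parent's path in every even child.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def level(i: int) -> int:
    return (i + 1).bit_length() - 1


def path_values(values: Sequence[float], i: int) -> List[float]:
    """Root-to-i values, oldest first."""
    path = [values[i]]
    while i > 0:
        i = (i - 1) // 2
        path.append(values[i])
    return path[::-1]


def promote(values: List[float], errors: List[Optional[float]],
            new_root: int) -> Tuple[List[float], List[Optional[float]]]:
    """Keep only the subtree rooted at new_root (1 or 2), moved up by one level."""
    depth = level(len(values) - 1)              # level of the current leaves
    new_values: List[float] = []
    new_errors: List[Optional[float]] = []
    for d in range(depth):                      # depth d below new_root
        start = (new_root + 1) * 2 ** d - 1     # leftmost descendant at that depth
        new_values.extend(values[start:start + 2 ** d])
        new_errors.extend(errors[start:start + 2 ** d])
    return new_values, new_errors


def add_leaf_level(values: List[float], errors: List[Optional[float]],
                   x_next: float, predict: Predictor) -> None:
    """Append a leaf level in place: observed x_next (odd) and a prediction (even) per leaf."""
    first_leaf = 2 ** level(len(values) - 1) - 1
    for i in range(first_leaf, len(values)):    # range is fixed before appending
        xp = predict(path_values(values, i))
        values.append(x_next)                   # lands at index 2i + 1
        errors.append(None)
        values.append(xp)                       # lands at index 2i + 2
        errors.append(abs(x_next - xp))
```

Copying by level preserves the invariant that the children of node $i$ sit at $2i+1$ and $2i+2$, so the decision algorithms can keep relying on index arithmetic alone.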

4.4 Decision Algorithm

Given the recent history stored in the PHT, the task of the decision algorithm is to determine at time $n$ whether $X[n-N]$, the observation $N$ samples back, was erroneous. In other words, $X[n-N]$ has to be assigned either the observed or the predicted value, stored in nodes 1 and 2 of the PHT, respectively (Fig. 4). The decision is based on how the choice affects the prediction accuracy of the next $N$ samples (up to $X[n]$). Each root-to-leaf path in the PHT contains a possible sequence of values for these samples, based on different choices of observed or predicted value for each. The algorithms compare the behavior of the prediction errors along these paths to reach the decision.

Fig. 3. Functional diagram of a predictive error correction block. The dotted line shows the hybrid approach, which incorporates the output from CRC detection when available.

Fig. 4. PHT with example data ($N = 2$).

One possible approach for the decision algorithm is to select the subtree of the PHT that contains the root-to-leaf path with the minimum average correction error. However, since the samples in the PHT are yet to be corrected, their correction errors are still unknown. Our first algorithm, MinErr, follows this approach, using the prediction errors as estimates of the correction errors. The average prediction error (RMS) is computed over all the predicted samples in each path, and the path with the minimum RMS error is selected (Fig. 5). This path contains either node 1 or node 2, whose value is returned as the corrected value $X_c[n-N]$. For example, in Fig. 4, among the four paths ending in nodes 8, 10, 12, and 14, the path with the minimum error is 0:14. Since it contains node 2, the predicted value is chosen for $X[n-2]$. The PHT is then updated as discussed earlier, and the steps are repeated for the next sample [1]. This algorithm is simple to implement and requires a small, fixed number of computations, but it is highly sensitive to the modeling performance. As mentioned in Section 4.2, prediction errors can occur even for correctly recorded samples due to modeling errors. While the modeling error reflects the quality of the model when averaged over a number of samples, the errors in individual samples can be unpredictably high or low. In the PHT, the effect of individual modeling errors is amplified for the paths that have a small number of predicted samples, which can allow a single sample to determine the overall decision. For example, for path 0:8 in Fig. 4, the path error is the same as the error in node 8. Now, if the observed value of $X[n]$ were 111 instead of 110, the errors for paths 0:8 and 0:14 would have been 19 and 21, respectively, and MinErr would have chosen the observed value for $X[n-2]$.
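A minimal sketch of the MinErr rule over the flat-array PHT follows. It assumes the absolute prediction errors are stored per node, with `None` for the root and the observed (odd) nodes; the function names are hypothetical, and this illustrates the rule described above rather than reproducing the pseudocode of Fig. 5.

```python
import math
from typing import List, Optional, Sequence


def path_to_level1(i: int) -> List[int]:
    """Node indices from level 1 down to node i (the root is excluded)."""
    path = []
    while i > 0:
        path.append(i)
        i = (i - 1) // 2
    return path[::-1]


def rms(xs: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def min_err(errors: Sequence[Optional[float]]) -> int:
    """Return 1 (keep the observation) or 2 (use the prediction) for X[n-N].

    errors[i] is the absolute prediction error of node i, or None for the root
    and observed (odd) nodes.
    """
    size = len(errors)
    best_node, best_rms = 1, math.inf
    for leaf in range(size // 2, size):        # leaves occupy the second half
        if leaf % 2 == 1:
            continue                           # only paths ending in a predicted leaf contend
        path = path_to_level1(leaf)
        score = rms([errors[j] for j in path if errors[j] is not None])
        if score < best_rms:
            best_node, best_rms = path[0], score  # path[0] is node 1 or node 2
    return best_node
```

With the invented errors `[None, None, 10, None, 30, None, 10, None, 1, None, 30, None, 10, None, 10]`, the single prediction on path 0:8 (error 1) alone decides the outcome and MinErr keeps the observation, which is exactly the sensitivity discussed above.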

This problem is avoided in the MinMax algorithm (Fig. 6), where the subtrees of nodes 1 and 2 are considered separately, and the path with the maximum average error in each subtree is found. The subtree with the smaller maximum error is selected for the decision. This approach favors the solution that is expected to perform better in the worst case; e.g., the predicted value is chosen in the example because it performs better over multiple paths. This makes MinMax more resilient to modeling errors, since a path with a spuriously low average error will not affect the solution if the other paths in its subtree have higher average errors. For example, in Fig. 4, only paths 0:10 and 0:12 are directly compared. In this case, $X[n]$ would have to be as high as 145 (path errors = 15, 32, 19, 33) to affect the final decision.
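The MinMax rule changes only the aggregation step: each subtree is scored by its worst path. In this sketch (hypothetical names, same error layout as a flat-array PHT with `None` for the root and observed nodes), ties are resolved in favor of keeping the observation, which is our assumption since the text does not specify tie-breaking.

```python
import math
from typing import Dict, List, Optional, Sequence


def path_to_level1(i: int) -> List[int]:
    """Node indices from level 1 down to node i (the root is excluded)."""
    path = []
    while i > 0:
        path.append(i)
        i = (i - 1) // 2
    return path[::-1]


def path_rms(errors: Sequence[Optional[float]], leaf: int) -> float:
    """RMS of the prediction errors stored on the path from level 1 to `leaf`."""
    errs = [errors[j] for j in path_to_level1(leaf) if errors[j] is not None]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def min_max(errors: Sequence[Optional[float]]) -> int:
    """Return the subtree root (1 = observation, 2 = prediction) whose worst path is better."""
    size = len(errors)
    worst: Dict[int, float] = {1: 0.0, 2: 0.0}
    for leaf in range(size // 2, size):
        if leaf % 2 == 0:                       # paths ending in a predicted leaf
            subtree = path_to_level1(leaf)[0]
            worst[subtree] = max(worst[subtree], path_rms(errors, leaf))
    # Ties keep the observation (an assumption; tie-breaking is not specified).
    return 2 if worst[2] < worst[1] else 1
```

On the same invented errors used for MinErr, `[None, None, 10, None, 30, None, 10, None, 1, None, 30, None, 10, None, 10]`, the worst paths score 30 (observation subtree) and 10 (prediction subtree), so MinMax picks the prediction even though path 0:8 has the lowest individual error.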

Though more resilient than MinErr to modeling errors, MinMax does not take into account the specific properties of the model, which can cause spurious detections for certain types of models. Consider a case where the size of the history required to compute the prediction is smaller than the depth of the PHT. Then, for some paths, the value of $X[n-N]$ may not have any effect on the prediction for $X[n]$. For example, if the model used in Fig. 4 uses only the previous sample for prediction, then the choice between the observed value in node 1 and the predicted value in node 2 has no effect on the predicted value in node 12. In some cases, the robustness of the decision can be further improved by excluding paths like these.

This is done in the Peer algorithm, shown in Fig. 7. Here, individual pairs of nodes in the two subtrees are compared, instead of full paths. The algorithm compares nodes in parallel (peer) positions within the observation and prediction subtrees in terms of their absolute prediction errors. After all the comparisons, the subtree that has more samples with lower prediction errors than their peer nodes is chosen. The correction engine also uses available knowledge of the model to identify the predictions that are independent of the choice of $X[n-N]$ and excludes them from the decision-making process. Therefore, for the example PHT in Fig. 4, the prediction errors for nodes 4, 8, and 10 are compared against those of nodes 6, 12, and 14, respectively, and the result is to choose the predicted value, since its subtree has lower prediction errors in two comparisons and a higher error in one. The property of the model is represented by the model order parameter $M$, which is the number of samples from history that are used by the model for prediction (Section 5.1). Before each comparison, it is ensured that either node 1 or 2, or a sample directly predicted from it, is among the previous $M$ samples of the ones under examination. As mentioned earlier, if $M = 1$, this step would exclude the comparison between nodes 8 and 12 in the example. Moreover, when the difference in prediction errors within a pair is much smaller than the average modeling error, that pair is disregarded as well. The parameter $E_{TH}$, representing this error threshold, is used to implement this. The resulting algorithm offers the most stable results among the three approaches.

Fig. 5. MinErr algorithm for error detection and correction.

Fig. 6. MinMax algorithm for error detection and correction.
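The Peer rule, including the model-order check and the $E_{TH}$ threshold, can be sketched over the flat-array PHT as follows. In this numbering, the peer of a node in the observation subtree sits $2^{d}$ positions to its right, where $d$ is its depth below level 1 (e.g., 4 and 6, or 8 and 12). The function names are hypothetical, and two details are our assumptions rather than statements from the text: the dependence test is applied transitively through predicted ancestors within the $M$-sample window, and a tied vote keeps the observation.

```python
from typing import List, Optional, Sequence, Tuple


def level(i: int) -> int:
    return (i + 1).bit_length() - 1


def parent(i: int) -> int:
    return (i - 1) // 2


def depends_on_decision(i: int, m: int) -> bool:
    """True if the prediction in even node i uses node 1/2, directly or through
    predicted ancestors, within its window of the M previous samples on its path."""
    a = parent(i)
    for _ in range(m):
        if a == 0:
            break
        if level(a) == 1:
            return True
        if a % 2 == 0 and depends_on_decision(a, m):
            return True
        a = parent(a)
    return False


def comparable_pairs(size: int, m: int) -> List[Tuple[int, int]]:
    """(observation-subtree node, peer in prediction subtree) for predicted nodes below level 1."""
    pairs = []
    for i in range(3, size):
        if i % 2 == 1:
            continue                             # only predicted (even) nodes carry errors
        d = level(i) - 1                         # depth below the level-1 node
        offset = i - (2 ** level(i) - 1)         # position within its level
        if offset >= 2 ** d:
            continue                             # i lies in the prediction subtree
        if depends_on_decision(i, m):
            pairs.append((i, i + 2 ** d))
    return pairs


def peer(errors: Sequence[Optional[float]], m: int, e_th: float) -> int:
    """Return 1 (observation) or 2 (prediction) by majority of peer comparisons."""
    obs_wins = pred_wins = 0
    for j, k in comparable_pairs(len(errors), m):
        diff = errors[j] - errors[k]
        if abs(diff) < e_th:
            continue                             # difference too small relative to E_TH
        if diff > 0:
            pred_wins += 1
        elif diff < 0:
            obs_wins += 1
    return 2 if pred_wins > obs_wins else 1
```

With $M = 1$, `comparable_pairs(15, 1)` returns only (4, 6) and (10, 14), reproducing the exclusion of nodes 8 and 12 described above; with $M = 2$, all three pairs are compared.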

The three algorithms described here have similar runtime computation and storage requirements, with the problem size defined as the tree depth $N$. Since the PHT has on the order of $2^{N+2}$ nodes ($2^{N+2}-1$ exactly), the space complexity is $O(2^{N+2})$. The worst-case time complexity has two components, from the prediction [$O(M \cdot 2^N)$] and from the decision algorithm [$O(2^{N+1})$], either of which can dominate depending on the relative sizes of the model order and the PHT. The correction accuracy is harder to analyze, because it depends on the statistics of the modeling errors as well as on the actual errors introduced in the observations. We observed in our experimental studies (Section 6) that all the algorithms perform well when the models are accurate. However, as discussed before, they have different degrees of resilience to modeling errors, and Peer is designed to perform best.
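The counts behind these expressions can be tabulated with a short helper (hypothetical name). It assumes, as our reading of the terms above, that each corrected sample requires growing one new leaf level of $2^{N+1}$ nodes, of which the $2^N$ even-numbered nodes each need one AR prediction costing $M$ multiply-adds.

```python
from typing import Dict


def pht_costs(n: int, m: int) -> Dict[str, int]:
    """Rough per-sample counts for decision delay N = n and model order M = m."""
    return {
        "levels": n + 2,                    # root plus N + 1 sample levels
        "nodes": 2 ** (n + 2) - 1,          # complete binary tree, storage term
        "new_leaf_nodes": 2 ** (n + 1),     # nodes visited by the decision step
        "new_predictions": 2 ** n,          # even nodes of the new leaf level
        "prediction_mults": m * 2 ** n,     # M multiply-adds per AR prediction
    }
```

For the example of Fig. 4 ($N = 2$), this gives 4 levels, 15 nodes, and 4 new predictions per sample; the prediction term overtakes the decision term once $M$ exceeds 2.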

4.5 Hybrid Correction

The method for model-based correction described above assumes that error detection has to be done without any additional information apart from the observed sensor data. However, most radios used in currently available sensor nodes already have a checksum function built into the hardware [31], and it is possible to complement our approach using the CRC output.

When the checksum is available, we use it to complement the model-based error detection and increase the overall performance of correction. The pseudocode for this hybrid method is shown in Fig. 8. Here, the result of the checksum computation is fed to the decision algorithm when it is available, as denoted in Fig. 3 by the dotted line. The normal model-based detection using the PHT is done on a data sample only when the checksum detects no errors. When an error is detected by the CRC, the sample is treated as missing, and the predicted value from the model is used. The modeling and update stages remain unchanged. This approach improves the correction performance by identifying communication errors that could otherwise have been misinterpreted by our approach as modeling errors. It does not capture errors originating before the communication link, i.e., sensing errors or hardware errors, which still have to be detected by the model-based approach.
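The CRC gating can be sketched at the level of a sample stream. Here the PHT-based decision is abstracted into a hypothetical callable `model_based(x, xp)`; this simplification ignores the decision delay of $N$ samples that the real Peer/PHT machinery introduces and only shows the control flow of treating CRC failures as missing samples.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]   # corrected history -> prediction
Detector = Callable[[float, float], float]       # (observed, predicted) -> value kept


def hybrid_stream(samples: Sequence[float], crc_ok: Sequence[bool],
                  predict: Predictor, model_based: Detector) -> List[float]:
    """Gate model-based correction with per-sample CRC results.

    A sample whose CRC fails is treated as missing and replaced by the model's
    prediction; otherwise the model-based detector decides what to keep.
    """
    corrected: List[float] = []
    for x, ok in zip(samples, crc_ok):
        xp = predict(corrected)
        corrected.append(model_based(x, xp) if ok else xp)
    return corrected
```

With a last-value predictor and a pass-through detector, the stream `[100, 101, 229, 103]` with the third CRC failing becomes `[100, 101, 101, 103]`: the corrupted sample never reaches the model-based detector.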

5 DATA MODELING

In the previous sections, we described how predictive data models can be used to perform error detection and correction on sensor data. So far, we have assumed that a data model, which can provide a prediction of the next data sample, is available to the error-correction framework. In this section, we discuss the problem of creating the data model, examine which characteristics make a model suitable, and describe the implementation of data modeling in our system.

The objective of the data model in our model-based correction system is to predict the next sample of the sensor data. Properties of the data source are identified and used to make the predictions. Previous research has shown that many types of models can be used for this purpose [32], [33], [34]. However, a number of additional requirements specific to our error-correction framework determine the suitability of a particular type of model. In this section, we first explore the most important of these requirements and show how our specific choice of model satisfies them. In the second part, we discuss the implementation issues for data modeling.

Fig. 7. Peer algorithm for error detection and correction.

Fig. 8. Hybrid Peer algorithm with CRC for Error Detection and Correction.

5.1 Requirements of Data Model

The performance of the model-based error correction depends on the accuracy of the predictions, so maximizing the prediction accuracy is the primary goal of the data model. However, for the model to be used effectively in our framework, the prediction also needs to be fast and have reasonably low computation and storage overheads. These requirements can place conflicting demands on the model, and it is necessary to strike a balance among them when choosing the model.

Ideally, to maximize the prediction accuracy, the model should completely represent the correlation present in the data set. But when a data set shows strong long-range correlation, a complete representation of the correlation can lead to prohibitive latency or resource overheads. Such long-range correlation, measured in samples, can also result from choosing a high sampling rate for a slowly varying data source. The resource limitation is most critical for the prediction step, since it is repeated for every path of the PHT, i.e., $2^N$ times, for each data sample during the data-correction process (Section 4.3). Moreover, with multiple sensors reporting to a cluster-head, the impact of any overhead is multiplied many times. For example, environmental sensor data such as outdoor temperatures can exhibit correlation in the short term, on the scale of hours (e.g., day versus night), as well as in the long term, on a scale of years (e.g., seasons) or multiyear cycles (e.g., El Niño). To capture the inherent correlation completely, the prediction model would have to refer to a very large data set spanning years. Even computing a single prediction would then lead to unacceptable latency. Similarly, the need to refer to large amounts of history would incur prohibitive memory requirements. As a result, it may be necessary to choose a less complex model of limited dimensions to make its use on the cluster-head feasible.

Another problem affecting the accuracy of the predictions occurs when the data source is not strictly stationary but has statistical properties that change over time. This causes the prediction accuracy to deteriorate progressively, so that a new model needs to be computed. Depending on the type of data source, it may be necessary to learn the model multiple times to reflect the current characteristics of the sensor data. However, while prediction accuracy is important, the best possible model may be too complex or too large to update efficiently.

Therefore, there are three main requirements to consider when selecting a data model: it has to produce accurate predictions, it should be easy to use for prediction, and it needs to be easy to learn when necessary. There are many types of models that can satisfy these requirements with different degrees of success. For our example implementation, we specifically chose linear autoregressive (AR) models, which capture the effect of recent history through an “aging” process. Such models have been shown to be effective for short-term prediction in time-series analysis and are widely used for forecasting [35], [36]. Choosing AR models also satisfies the additional requirements very well, since they use linear prediction functions that can be computed very efficiently with minimal resource overhead.

The AR model captures the autocorrelation in a data set by expressing the prediction as a linear combination of previous samples, as shown below:

$$X_p[n] = \sum_{i=1}^{M} a_i[n] \cdot X[n-i], \tag{1}$$

$$X[n] = X_p[n] + e[n], \tag{2}$$

where $X[n]$ and $X_p[n]$ represent the observed and predicted data values at time $n$, respectively, and $e[n]$ is the prediction error. The model is characterized in terms of the model order $M$, which defines the size of the model, and the $M$ coefficients $a_i$. This model has the advantage of a low prediction complexity, $O(M)$, which reduces the impact on the resource cost at the cluster-head. Learning the model consists of estimating the coefficients $a_i$, for which we use recursive LS estimation [32], [37].
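Equations (1) and (2), together with a recursive LS update of the coefficients, can be sketched in plain Python as follows. This is a textbook exponentially weighted RLS, not necessarily the exact variant used in [32], [37]; the class name, the forgetting factor (default 1.0, i.e., no forgetting), and the initialization $P = \delta I$ with $\delta = 1000$ are our assumptions.

```python
from typing import List, Sequence


class RecursiveLeastSquaresAR:
    """AR(M) predictor X_p[n] = sum_i a_i X[n-i] with coefficients learned by RLS."""

    def __init__(self, order: int, forgetting: float = 1.0, delta: float = 1000.0):
        self.m = order
        self.lam = forgetting                 # 1.0 = ordinary (growing-window) RLS
        self.a: List[float] = [0.0] * order
        # Inverse-correlation matrix P, initialized to delta * I (large delta = weak prior).
        self.P = [[delta if r == c else 0.0 for c in range(order)] for r in range(order)]

    def _regressor(self, history: Sequence[float]) -> List[float]:
        """[X[n-1], X[n-2], ..., X[n-M]] from a history ordered oldest to newest."""
        if len(history) < self.m:
            raise ValueError("need at least M past samples")
        return [history[-i] for i in range(1, self.m + 1)]

    def predict(self, history: Sequence[float]) -> float:
        phi = self._regressor(history)
        return sum(ai * xi for ai, xi in zip(self.a, phi))      # Eq. (1), O(M)

    def update(self, history: Sequence[float], x_new: float) -> float:
        """One RLS step with the new observation; returns the a priori error e[n]."""
        m, phi = self.m, self._regressor(history)
        p_phi = [sum(self.P[r][c] * phi[c] for c in range(m)) for r in range(m)]
        gain_den = self.lam + sum(phi[r] * p_phi[r] for r in range(m))
        k = [v / gain_den for v in p_phi]                         # gain vector
        err = x_new - self.predict(history)                       # Eq. (2)
        self.a = [ai + ki * err for ai, ki in zip(self.a, k)]
        # P <- (P - k phi^T P) / lambda; phi^T P equals p_phi^T because P is symmetric.
        self.P = [[(self.P[r][c] - k[r] * p_phi[c]) / self.lam for c in range(m)]
                  for r in range(m)]
        return err
```

For example, feeding it the noise-free sequence $X[n] = 1.6X[n-1] - 0.8X[n-2]$, started from two samples of 100, drives the estimated coefficients to approximately $(1.6, -0.8)$ within a few dozen samples.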

Some examples of the modeling performance are given below for four data sources. These data sets will be referred to and described in more detail in Section 6. Table 2 lists the sources and characteristics of the data, and Table 3 presents the modeling errors of the AR models estimated offline.

5.2 Implementation of Modeling and Update

TABLE 2. List of Data Sets

TABLE 3. Modeling Performance for Data Sets

Fig. 9. Detailed schematic of data modeling.

We designed the data modeling system in two parts: 1) offline identification of the type of the model through the analysis of statistical properties and 2) runtime updates to the model (Fig. 9). As described earlier in this section, we selected the AR model as a balance between accuracy and ease of updates. In our implementation, the offline modeling consists of computing the order of the AR model, based on the correlation time and the sampling rate from training data. The process of runtime model estimation involves two operations: one that determines when an update to the model is needed by tracking the prediction accuracy, and a second that performs the actual model update.
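The text does not specify how the order is derived from the correlation time and sampling rate. One plausible heuristic, sketched below purely as an assumption, takes the correlation time in samples (which already combines the physical correlation time with the sampling rate) to be the first lag at which the sample autocorrelation falls below $1/e$, and caps the resulting order at a hypothetical `max_order` to bound the cost of the $2^N$ predictions per sample.

```python
import math
from typing import Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Biased sample autocorrelation coefficient at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def order_from_correlation(x: Sequence[float], max_order: int = 8,
                           cutoff: float = 1 / math.e) -> int:
    """AR order = correlation time in samples, capped at max_order (heuristic)."""
    for lag in range(1, max_order + 1):
        if autocorrelation(x, lag) < cutoff:
            return lag
    return max_order
```

Under this heuristic, a rapidly alternating signal gets order 1, while an oversampled, slowly varying signal hits the cap; in practice the cap, or a sweep over candidate orders that minimizes the modeling error on training data, would decide the final value.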

To support the runtime model updates, we introduce a special mode of operation, called Estimation Mode, in which the sensor nodes temporarily report the data with additional error protection. Operating in this mode makes more reliable data available for computing updated data models at the cluster-heads. The Estimation Mode can be implemented through a variety of temporary software-based redundancy measures. For example, these may include taking redundant readings at double the sampling rate to avoid sensing errors, storing redundant copies to overcome hardware errors, and using multiple transmissions to protect against communication channel errors. During normal operation, the tracking process continuously monitors the trends in the prediction errors and triggers a model update request when necessary. The details of this monitoring process are discussed later. Upon receiving the update request, the error correction system pauses the regular data gathering process, instructs the sensor node to temporarily switch to the Estimation Mode, and starts updating the model with the protected data. After the update is complete, the sensor node returns to Correction Mode, the normal mode of operation.

It should be noted that the sensor data collected in the Estimation Mode are still useful to the application using the data, and the switching between modes is transparent to the application. Moreover, the Estimation Mode can be implemented without any additional hardware overhead. However, the redundancy measures introduced in the sensor node increase the energy cost per bit when operating in this mode. Therefore, when triggering updates, the benefits of updating the data model need to be weighed against the overhead of switching to the Estimation Mode. In some cases, this overhead may make it preferable to continue using the offline models instead of making the updates, especially if the properties of the data source do not change much. For many applications, the model updates can be made more efficient by sharing them across multiple neighboring cluster-heads, since a change in the observed physical phenomena that requires a model update is likely to affect many sensors deployed over a large area covering multiple clusters. In such a situation, a newly updated model from a neighboring cluster-head can serve as a good starting point for model estimation or even be used directly. While this would require additional traffic and synchronization between cluster-heads, it can lead to substantial savings in resources by reducing the time each sensor node spends in Estimation Mode.

5.2.1 Model Tracking

We track the model accuracy using the prediction error, i.e., the difference between the observed and predicted values, for the recently obtained samples. When operating in Correction Mode, a running windowed average of the prediction error is maintained and compared with a fixed threshold value to trigger a switch to Estimation Mode. Among the past samples, only the correctly received ones are used, and the threshold is scaled for the number of correct samples in the averaging window. Comparing the average prediction error with a fixed threshold gives a simple way to trigger model changes. The choice of the two parameters, the threshold value and the size of the averaging window, determines the frequency of updates. The optimal choice depends on various characteristics of the data source and the system: the level of randomness within the data, the stationarity properties of the generation process, the accuracy of the data model, and the cost overhead of operating in the Estimation Mode as compared to the Correction Mode. Thus, the parameters provide a way to tune the overall system operation to achieve cost-performance trade-offs according to a global policy.
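The tracking rule can be sketched as a small monitor class. The names are hypothetical, and how a sample is judged as correctly received (for example, by the correction block or by a CRC) is left to the caller. Comparing the windowed sum of errors from correct samples against the threshold multiplied by their count is one way to scale the threshold for the number of correct samples; it is equivalent to comparing their mean with the threshold.

```python
from collections import deque
from typing import Deque, Tuple


class ModelTracker:
    """Windowed prediction-error monitor that requests a model update (Estimation Mode)."""

    def __init__(self, window: int, threshold: float):
        self.threshold = threshold
        self.recent: Deque[Tuple[float, bool]] = deque(maxlen=window)  # oldest entries drop out

    def observe(self, prediction_error: float, received_ok: bool) -> bool:
        """Record one sample; return True if an update should be triggered."""
        self.recent.append((abs(prediction_error), received_ok))
        good = [e for e, ok in self.recent if ok]   # only correctly received samples
        if not good:
            return False
        # Threshold scaled by the number of correct samples in the window.
        return sum(good) > self.threshold * len(good)
```

The two constructor arguments correspond to the two tuning parameters named above: a longer window smooths over isolated modeling errors, while a lower threshold triggers Estimation Mode more often.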

5.2.2 Model Updates

The runtime update stage operates in the resource-heavy Estimation Mode, so it needs to estimate the model from a minimal set of data points, unlike offline modeling, which can use as many data points as are available. In order to reduce the overheads, we restrict the updates to estimating the parameters of an existing model instead of recomputing the model. Many other optimizations are possible to make the online updates efficient as well. For example, if long-term periodic variations (known as seasonality in the time-series literature) can be identified in the data during offline modeling, it may be possible to characterize the data-generating process in terms of a set of states among which it operates. The states can be associated with precomputed models, so that the runtime updates would only consist of matching the current conditions with the most likely state. For our specific case of AR models, we reestimate the coefficients but do not change the model's order at runtime.

The trade-off between the accuracy of the model and the resource costs of updating it is also important in determining when to stop the online estimation process. Increasing the number of samples used for estimation can increase the accuracy of the model. At the same time, sending more protected samples adds to the resource overhead at the sensor node and also reduces the number of data samples corrected by the cluster-head. In our implementation, we use the RMS prediction error as a measure of the adequacy of the model. The system moves from the Estimation Mode to the Correction Mode when the average prediction error over the current Estimation Mode window falls below a preset threshold.
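The stopping rule for Estimation Mode can be sketched as a loop over protected samples. To keep the sketch short, it uses a toy cumulative least-squares AR(1) estimator as a hypothetical stand-in for the AR model with recursive LS estimation, and it adds a sample budget `max_samples` (an assumption) to bound the time spent in the costly mode.

```python
import math
from collections import deque
from typing import Sequence


class LeastSquaresAR1:
    """Toy AR(1) estimator, a hypothetical stand-in for the AR/RLS model of Section 5.1."""

    def __init__(self) -> None:
        self.sxy = 0.0
        self.sxx = 0.0

    @property
    def a(self) -> float:
        return self.sxy / self.sxx if self.sxx else 0.0

    def predict(self, prev: float) -> float:
        return self.a * prev

    def update(self, prev: float, x: float) -> None:
        self.sxy += prev * x
        self.sxx += prev * prev


def run_estimation_mode(protected: Sequence[float], model: LeastSquaresAR1,
                        threshold: float, window: int, max_samples: int) -> int:
    """Consume protected samples until the windowed RMS prediction error drops
    below `threshold` (or the budget runs out); return the number of samples used."""
    recent = deque(maxlen=window)
    used = 0
    for prev, x in zip(protected, protected[1:]):
        if used >= max_samples:
            break
        recent.append(x - model.predict(prev))   # error before the model sees x
        model.update(prev, x)
        used += 1
        if len(recent) == window and math.sqrt(sum(e * e for e in recent) / window) < threshold:
            break                                # model adequate: back to Correction Mode
    return used
```

On the geometric sequence $100 \cdot 0.9^t$, the first prediction error is large, after which the fitted coefficient is 0.9 and the errors vanish, so with a window of 3 and a threshold of 1 the loop returns to Correction Mode after 4 samples.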

Fig. 10 shows the effect of runtime updates on the modeling errors for data set 4 in Table 2. The figure lists the coefficients of three data models computed using different subsets of the same data set as training data. Each subset is denoted by a range of time indexes (R0-R3). The table shows the modeling errors observed when the models computed over some of these ranges were used to predict the values of samples in the other regions. The figure illustrates how the prediction error for region R3 can be vastly reduced by recomputing the model at region R1 (Model 2) or R2 (Model 3), compared to using Model 1, which is computed over R0.

Fig. 10. Effect of runtime model updates on prediction accuracy for light sensor data (data set 4).

6 EXPERIMENTAL EVALUATION

To evaluate our method of model-based correction, we used simulations over multiple data sets. We start with a description of the evaluation setup and then describe the performance of our approach without and with online model updates. For each case, we present the performance of the models, followed by the performance of the overall error detection and correction algorithms. We also show the performance of the correction system when used together with an external error detection system such as CRC.

6.1 Evaluation Setup

In order to evaluate our model-based correction technique, we implemented our algorithms in C and Matlab [38] and evaluated their correction performance on real sensor data from different sources, under different levels of simulated error conditions. The specific data sets considered in the evaluation are listed in Table 2, along with their sampling periods and the number of samples available. The sources of the data in Table 2 include an indoor light-level sensor from a testbed network we implemented [1], publicly available environmental temperature and humidity sensor data from the California Data Exchange Center (CDEC) [39], and server rack temperature variations collected from a commercial data center.¹ Each of the data sets reports information about the sensed physical process as values quantized to 8-bit samples, i.e., the data values range from 0 to 255. It is important to note that the data sets differ in terms of autocorrelation properties and degrees of stationarity, which provided an opportunity to evaluate the correction performance under the real-world limitations of the model.

As discussed in Section 2, we model the errors in the sensor data as random bit errors. In our model, the data received at the cluster-head is represented as a sequence of 8-bit values, with an independent error probability for each bit. This error represents the transient errors occurring in both the sensor nodes and the wireless communication links. For our simulations, we vary the error probability over the range $10^{-4}$ to $10^{-2}$.
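This error model can be reproduced in a few lines of Python: each of the 8 bits of every sample is flipped independently with probability equal to the bit error rate. The function name and the use of `random.Random` for reproducible seeding are our choices for illustration.

```python
import random
from typing import List, Sequence


def inject_bit_errors(samples: Sequence[int], ber: float,
                      rng: random.Random) -> List[int]:
    """Flip each bit of each 8-bit sample independently with probability `ber`."""
    corrupted = []
    for s in samples:
        mask = 0
        for bit in range(8):
            if rng.random() < ber:
                mask |= 1 << bit
        corrupted.append(s ^ mask)   # XOR stays within 0..255 for 8-bit inputs
    return corrupted
```

Because the flips are independent per bit, an error in one of the high-order bits produces a large jump in value, which is what the sharp peaks in the corrupted traces correspond to.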

The performance of the correction is evaluated in terms of the errors in the sensor values after correction. To capture the higher impact of errors on smaller values and make a fair comparison across data sets with different average levels, we use a relative error measure. Using the notation described in Section 4, we define the metric as the percent error in the corrected output $X_c$:

$$E_{out} = 100 \cdot \sqrt{\frac{1}{N_{tot}} \sum_{i=1}^{N_{tot}} \left( \frac{X_s(i) - X_c(i)}{X_c(i)} \right)^2},$$

where $X_s(i)$ is the $i$th sample originally generated by the sensor, and $N_{tot}$ is the total number of sample values in a data set. In our evaluations, the output error $E_{out}$ is compared to the input error $E_{in}$, which represents the relative errors in the observed samples as received by the error correction block. $E_{in}$ is computed by replacing $X_c$ with $X$ in the above expression.
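The metric translates directly into code. The function below (hypothetical name) computes $E_{out}$ when given the corrected output $X_c$ and $E_{in}$ when given the received observations $X$; as with the formula itself, it is undefined when a denominator sample is 0, and the sketch simply lets Python raise `ZeroDivisionError` in that case.

```python
import math
from typing import Sequence


def relative_rms_percent(sensor: Sequence[float], values: Sequence[float]) -> float:
    """E = 100 * sqrt(mean(((X_s(i) - V(i)) / V(i))^2)).

    Pass the corrected output X_c as `values` to get E_out, or the received
    observations X to get E_in. Raises ZeroDivisionError if any value is 0.
    """
    if len(sensor) != len(values) or not sensor:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(((xs - v) / v) ** 2 for xs, v in zip(sensor, values))
    return 100.0 * math.sqrt(total / len(sensor))
```

For example, a single sample generated as 110 and output as 100 gives a relative deviation of 0.1 and hence an error of 10 percent.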

6.2 Performance with Only Offline Modeling

6.2.1 Data Modeling

We used AR models for predicting the sensor data for the different data sets listed in Table 2. For each data set, the model order was chosen to minimize the modeling error over the given data set using recursive LS estimation [37]. The models computed this way for each of the data sets are shown in Table 3. The table illustrates the differences in characteristics among the various data sets in terms of correlation and sampling rate. For example, data set 4 is likely to have been oversampled, given the very low modeling error that could be obtained even with an order-2 model. On the other hand, data sets 2 and 3 have high degrees of randomness, as indicated by the larger modeling errors of their models.

6.2.2 Data Correction

Fig. 11 shows the variation of the sensor data (data set 4) with time, as well as the effects of errors on the data. It also provides a qualitative idea of the error correction performance of the Peer algorithm. The probability of error for each bit is $10^{-2}$, and the many sharp peaks in the upper plot show the instances where errors occur in the higher-order data bits. Some of the uncorrected errors occur near points where the sensor data is also changing fast, in which case the modeling error is more likely to be high.


1. These are unpublished data sets obtained from HP Data Centers through HP Labs, Palo Alto, California.

Fig. 11. Error correction using the Peer algorithm on light sensor data (data set 4) with offline data modeling only ($BER = 10^{-2}$).


The plots in Fig. 12 compare the performance of the three algorithms, MinErr, MinMax, and Peer, when the same model is used without runtime reestimation. The final correction error ($E_{out}$) is plotted against the error in the observed data ($E_{in}$). The results demonstrate that Peer performs marginally better than MinMax for most of the measured error rates, and both substantially outperform MinErr. Moreover, these performance gaps widen as the error levels rise. The two plots show the difference in comparative performance for two data sets that differ in the distribution of data values and in the existing temporal correlation. Table 3 shows that the modeling error for data set 2 is substantially higher than that for data set 4. This agrees with the observation that the final correction errors for data set 4 are smaller as well. For example, the output error in data set 4 at $E_{in} = 1$ is less than the error in data set 2 at $E_{in} = 0.5$. Also, all the curves show a knee above which the output error starts to grow at a faster rate, marking the point where the effects of modeling and observation errors have similar characteristics. The position of the knee increases with the modeling error.

To validate our error models, we also evaluated our algorithms over real communication channel errors, using the same data set as in Fig. 12b. The errors were introduced using traces of transmissions between Zigbee CC2420 (Chipcon) radios and GNURadio receivers, which were obtained from [40]. The resulting plots in Fig. 13 show that the results from our simulations correspond closely with the performance over real error sources.

It can be observed from the plots that our method performs well in the presence of high error levels, but at very low levels of input error, a nonzero residual error remains in the output. There are two sources of errors in the output: modeling errors that are introduced when a predicted value is substituted for an erroneous sample, and observation errors that pass undetected because they cannot be distinguished from modeling errors. Moreover, when the modeling errors are high, they can also cause observations to be falsely detected as erroneous by the decision algorithms, in which case the modeling errors are passed into the output. At low input error levels, the output error is thus dominated by the modeling errors, and the residual error depends on the model quality. On close comparison of the output and input data, examples of residual errors were observed where there were sharp, narrow peaks in the data itself, which can happen when the data is undersampled. The decision algorithms interpret these as erroneous, since the signal quickly falls back to the previous trend, closely resembling a transient error. However, it should be noted that the rate of these errors is directly connected to a lack of correlation within the data. If this happens frequently, it may be possible to reduce the level of the residual error by adjusting the sampling rate so that genuine variations in the process are not interpreted as errors.

6.3 Performance with Runtime Model Estimation

6.3.1 Data Modeling

Table 4 shows the effect of updating the model parameters at runtime on the modeling error for the example data set 4. The first row shows the outcome of an idealized best-case scenario for offline-only modeling, where all the available data are used to estimate the model parameters. While this is impractical, it gives an idea of the minimum modeling error that can be attained by using the largest possible amount of data for model estimation. The second model, A2, is estimated using only the early portion of the data (samples 550-700) and shows an increase in modeling error when used for later samples (1,550-1,700 or 1,700-2,000). Estimating the model again at 1,550-1,700 (A3) improves the prediction for subsequent samples. These results illustrate the need to model different parts of the data differently. As mentioned in Section 5, we do this by runtime reestimation of the model parameters, whose effect on the overall correction performance is shown below.

Fig. 12. Comparison of correction performance of the three decision algorithms using offline modeling. (a) and (b) correspond to two data sets.

Fig. 13. Comparison of correction performance of the three algorithms with error traces using offline modeling. The sensor data set is the same as in Fig. 12b.

6.3.2 Data Correction with Runtime Model Estimation

The plot in Fig. 14 shows the effect of combining runtime model updates with the Peer algorithm. It shows part of the same data, with the same error level, as Fig. 11, but with online reestimation of the model parameters. The correction performance is better than without online updates. The time spent in Estimation mode is indicated by the line at the top. Because the variation in the sensor values is very regular, only a small fraction of the time is spent estimating the model in this case.

Table 5 shows the performance improvements obtained for two data sets by using model updates. It also shows the overhead of online estimation, defined as the ratio of the number of samples used for model updates to the number of samples corrected using the model. For both data sets, the improvement attained with model updates is larger when the input error is higher. This is expected, since higher modeling error makes it more likely that larger observation errors pass through the correction block, further degrading performance. We also note that the frequency of updates adjusts automatically to the error level; consequently, both data sets incur substantially larger modeling overheads when there are more errors at the input.
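One way to picture the switching between Estimation and Correction modes in Fig. 14, and the overhead ratio reported in Table 5, is the small state machine below. The switching rule is an assumption for illustration only: the hypothetical `run_modes` enters Estimation mode when an exponentially weighted prediction error exceeds `err_limit`, collects `window` samples, refits a first-order AR model, and returns to Correction mode. The paper's actual trigger is described in Section 5 and may differ. The overhead follows the definition in the text: samples used for model updates divided by samples corrected using the model.

```python
def run_modes(samples, window, err_limit, alpha=0.2):
    """Hypothetical Estimation/Correction state machine with an AR(1) model."""
    mode = "estimate"
    buf = []           # samples collected while estimating
    a = 1.0            # AR(1) coefficient: x[t] ~ a * x[t-1]
    run_err = 0.0      # exponentially weighted |prediction error|
    n_est = n_corr = 0
    modes = []         # mode after processing each sample
    prev = None
    for x in samples:
        if mode == "estimate":
            buf.append(x)
            n_est += 1
            if len(buf) == window:
                # Least-squares AR(1) coefficient from the collected window.
                num = sum(buf[i] * buf[i - 1] for i in range(1, window))
                den = sum(v * v for v in buf[:-1]) or 1.0
                a = num / den
                buf, run_err, mode = [], 0.0, "correct"
        else:
            # A real system would run its decision algorithm here; this
            # sketch only tracks how well the current model predicts.
            n_corr += 1
            run_err = (1 - alpha) * run_err + alpha * abs(x - a * prev)
            if run_err > err_limit:
                mode = "estimate"   # model looks stale: re-estimate
        modes.append(mode)
        prev = x
    overhead = n_est / n_corr if n_corr else float("inf")
    return modes, overhead


# A level shift at t = 30 makes the model stale and triggers one update.
modes, overhead = run_modes([1.0] * 30 + [5.0] * 30, window=5, err_limit=0.5)
print(modes.count("estimate"))  # → 9
print(overhead)                 # → 0.2  (10 estimation samples / 50 corrected)
```

In this sketch, larger observation errors push the running error over `err_limit` more often, so more samples are spent in Estimation mode; under the stated assumptions, this mirrors the trend of higher overhead at higher input error levels.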

6.4 Comparison with Traditional Approaches

Since the proposed approach addresses errors from multiple sources together, a one-to-one comparison of its correction performance with traditional error-handling techniques is difficult. Instead, we compare the costs of each approach in terms of the resource overheads incurred to perform error correction.

Fig. 15 compares our approach with Reed-Solomon coding, a traditional FEC method for the communication channel. The plot shows the error-correction performance of the FEC method in different configurations, with different code rates and overheads. Matching the performance of our approach with Reed-Solomon coding would require overheads of up to 86 percent for BERs ranging from 10^-4 to 10^-2. This overhead takes the form of additional transmitted bits. In comparison, the costs of our method are the storage and processing overheads of the PHT, which had 64 nodes (N = 4) in this experiment. Unlike FEC, the overhead of our approach occurs only at the cluster-head.
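To see why matching the model-based approach with FEC costs so many extra bits, the sketch below computes, for a Reed-Solomon code RS(n, k) over 8-bit symbols, the parity overhead and the probability that a codeword contains more than t = floor((n - k)/2) symbol errors, which is beyond the decoder's guaranteed correction capability, assuming independent bit errors at a given BER. The overhead definition (parity symbols relative to data symbols), the 8-bit symbol size, and the (n, k) pairs are assumptions for illustration; they are not the configurations behind Fig. 15.

```python
from math import comb


def rs_overhead(n, k):
    """Parity overhead of RS(n, k) as a fraction of the data symbols."""
    return (n - k) / k


def rs_block_failure(n, k, ber, bits_per_symbol=8):
    """P(more than t symbol errors in one codeword), independent bit errors."""
    t = (n - k) // 2                            # guaranteed correctable symbols
    ps = 1.0 - (1.0 - ber) ** bits_per_symbol   # symbol error probability
    # Sum the binomial tail directly to avoid cancellation in 1 - P(ok).
    return sum(comb(n, i) * ps**i * (1 - ps) ** (n - i)
               for i in range(t + 1, n + 1))


for n, k in [(255, 239), (255, 223), (255, 191)]:
    fails = [f"{rs_block_failure(n, k, ber):.1e}" for ber in (1e-4, 1e-3, 1e-2)]
    print(f"RS({n},{k}) overhead={rs_overhead(n, k):.0%} failure={fails}")
```

Under these assumptions, a code whose t falls below the expected number of symbol errors per codeword (n times the symbol error probability) fails most of the time, so the code rate has to drop sharply, and the overhead rise, as the BER approaches 10^-2.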


TABLE 4
Need for Model Updates: Data Set 4, Model Order 4

Fig. 14. Error correction using Peer on light sensor data (data set 4) with runtime model updates (BER = 10^-2). The plot at the top shows when the system state switches between Estimation and Correction modes.

TABLE 5
Effect of Dynamic Model Updates

Fig. 15. Comparison of model-based error correction with Reed-Solomon coding at different code rates (data set 4, N = 4).


The traditional methods of hardware error correction are circuit-hardening techniques such as ECC and TMR. These methods incur silicon-area overheads ranging from 25 to 180 percent [18].

In comparison, our proposed technique imposes no resource or complexity overhead on the sensor nodes to correct transient errors. Moreover, the proposed approach handles multiple types of random errors, whereas each of the traditional methods addresses only one source of error.
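For contrast with the node-side cost of replication, the sketch below shows the core of TMR: three copies of a value (here a hypothetical 16-bit sensor register) and a bitwise majority vote that masks a single-event upset in any one copy. It illustrates where the replication cost comes from, since every protected value is stored three times and passed through a voter; it is not a model of the specific circuits in [18].

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise majority vote: an output bit is set if at least two inputs set it."""
    return (a & b) | (a & c) | (b & c)


reading = 0x3A7C               # hypothetical 16-bit sensor sample
copies = [reading] * 3         # triple modular redundancy: three copies
copies[1] ^= 1 << 9            # a single-event upset flips bit 9 in one copy

voted = majority3(*copies)
print(hex(voted))              # → 0x3a7c  (the upset is masked)

copies[2] ^= 1 << 9            # a second upset in the same bit position
print(majority3(*copies) == reading)  # → False  (two faulty copies outvote one)
```

The second case shows the limit of the technique: TMR masks any single faulty copy, but not two copies corrupted in the same bit.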

6.5 Result of Using CRC Checksum

Fig. 16 shows the reliability obtained when the output of a CRC checksum can be utilized, simulated with data set 4. For these experiments, the model-based correction approach used the Peer algorithm in Fig. 7, while the hybrid approach used the modified version of the algorithm shown in Fig. 8. The error model used in this simulation was different, including only communication channel errors. The bit-error rate was varied from 10^-3 to 10^-1 to simulate very strong error conditions. The figure shows that using the CRC output can reduce the error by about 50 percent under such high error conditions.
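A minimal sketch of how a CRC outcome might be combined with a model-based check, assuming that each packet carries one 16-bit sample plus a CRC-32 computed with Python's `zlib.crc32`; that a failed CRC means the sample is replaced by the model prediction outright; and that a passing CRC still goes through a threshold test, because hardware errors that occur before the CRC is computed are invisible to it. The packet layout, the threshold rule, and the names `make_packet` and `hybrid_accept` are hypothetical; this is not the modified algorithm of Fig. 8.

```python
import struct
import zlib


def make_packet(sample: int) -> bytes:
    """Hypothetical packet: 16-bit unsigned sample followed by its CRC-32."""
    payload = struct.pack(">H", sample)
    return payload + struct.pack(">I", zlib.crc32(payload))


def hybrid_accept(packet: bytes, prediction: float, threshold: float) -> float:
    """Return the value to store for this packet."""
    payload = packet[:2]
    (crc,) = struct.unpack(">I", packet[2:6])
    (sample,) = struct.unpack(">H", payload)
    if zlib.crc32(payload) != crc:
        return prediction          # channel error detected: use the model
    if abs(sample - prediction) > threshold:
        return prediction          # CRC passed, but the value is implausible
    return float(sample)


good = make_packet(500)
corrupted = bytearray(good)
corrupted[0] ^= 0x04               # one payload bit flipped in the channel
print(hybrid_accept(good, prediction=498.0, threshold=20.0))              # → 500.0
print(hybrid_accept(bytes(corrupted), prediction=498.0, threshold=20.0))  # → 498.0
# Error introduced in node hardware before the CRC was computed:
print(hybrid_accept(make_packet(1524), prediction=498.0, threshold=20.0)) # → 498.0
```

Knowing which samples were hit by channel errors removes the need to infer them from the data alone, which is the intuition for the reduction shown in Fig. 16; the actual gain depends on the decision algorithm used.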

7 RELATED WORK

This paper presents a method for improving the reliability of sensor data against uncorrelated transient errors. In this section, we first compare our approach with prior work that handles the same types of errors. We then discuss approaches that address other types of data unreliability in wireless sensor networks.

Transient failures in wireless sensor networks result in loss or corruption of data, and studies of real deployments have shown loss rates to be significantly higher than in other wireless networks [41]. Traditional ways of handling communication errors include various types of FEC, e.g., Reed-Solomon and Turbo codes, all of which incur overheads in transmitted data [21]. Some channel coding schemes have been proposed for wireless sensor networks, but they still incur overheads at the sender for computing the codes, as well as for the additional transmitted bits [42], [43]. Another approach has been to add reliability in the MAC or transport layer, but these schemes rely on packet retransmissions and hence incur even higher energy overheads [44], [45]. Other works have addressed the problem in the context of multihop networks, taking into account the effect of errors on routing and clustering [46], [47]. Joint source-channel coding [48], [49] takes a different approach, optimizing the communication architecture for the properties of the data source to minimize the effect of communication errors. For circuit-level problems, replication-based methods such as TMR or error-corrected memory have been proposed to mitigate the effects of soft errors [18], [50]. However, the overhead of replication makes them unsuitable for use in sensor networks.

Many techniques have been developed to address other types of reliability problems in sensor networks. In [8], the authors describe a method that corrects measurement errors at the sensor based on prior knowledge of the data distribution and an error model. The idea is similar to our approach but is designed only for small additive errors. Permanent sensing errors such as bias or faulty sensors have been addressed by calibration methods [9] or distributed detection schemes [51], [52]. Application-level techniques have been proposed to meet coverage or event detection goals despite erroneous samples or malfunctioning nodes, such as query processing in the presence of erroneous data [53] or localization [54], where known properties of the data are used to maximize the probability of detection. Also, problems such as link or node outages have been addressed by robust aggregation and routing techniques that ensure the reliability of the data aggregation tree [25], [55], [56], [57]. These techniques are orthogonal to our approach and can be used on top of our methods to address other failures.

While we use the correlation properties of sensor data to distinguish data from errors, another approach to system design is to remove this correlation through compression. Several techniques have been proposed that use the correlations in sensor data to develop efficient compression algorithms [58], reduce memory usage for routing algorithms [59], etc.

8 CONCLUSIONS

In this article, we investigate transient data errors in wireless sensor networks and examine the problems of applying traditional error correction methods in this context. We present a novel approach to sensor network design that uses specific properties of sensor data to enable reliable data collection at no additional cost to the sensor nodes. We show that our approach, called model-based error correction, corrects transient errors introduced in the sensor node hardware as well as in the wireless communication channel. Through a simulation-based study on real sensor data, we demonstrate that with the proposed enhancements, the presented framework can be used efficiently to address transient errors in different types of sensor data across diverse applications. We also implemented the correction algorithm in software on a sensor testbed. Proposed future extensions include the exploration of other data models (e.g., probabilistic), the use of sophisticated estimation algorithms for correction, and addressing temporally correlated errors such as sensor bias through the use of spatial correlation in the sensor data.

Fig. 16. Error correction performance for the hybrid approach (with CRC).

ACKNOWLEDGMENTS

The authors thank Kyle Jamieson and Hari Balakrishnan for the error traces and Salil Pradhan and Malena Mesarina for the thermal sensor data used in our evaluations. The authors would also like to acknowledge the useful comments and suggestions of Niloy Mitra and all the anonymous reviewers, which helped improve the quality of this paper. This work was supported by the UC Discovery Grant (Grant com02-10126) and the Center for Wireless Communications, University of California, San Diego.

REFERENCES

[1] S. Mukhopadhyay, D. Panigrahi, and S. Dey, “Data Aware, Low Cost Error Correction for Wireless Sensor Networks,” Proc. IEEE Wireless Comm. and Networking Conf. (WCNC ’04), pp. 2492-2497, Mar. 2004.

[2] S. Mukhopadhyay, D. Panigrahi, and S. Dey, “Model Based Error Correction for Wireless Sensor Networks,” Proc. IEEE Sensor and Ad Hoc Comm. and Networks (SECON ’04), pp. 575-584, 2004.

[3] M. Hatler and C. Chi, “Wireless Sensor Networks: Growing Markets, Accelerating Demand,” technical report, ON World, Oct. 2005.

[4] Wireless Sensor Networks Market Expected to Skyrocket, http://www.controldesign.com/industrynews/2005/040.html, 2005.

[5] J. Hill et al., “System Architecture Directions for Networked Sensors,” Proc. Ninth Int’l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS ’00), pp. 93-104, 2000.

[6] N. Ferzli et al., “An Application of Smart Dust for Pavement Condition Monitoring,” Smart Structures and Materials: Proc. SPIE, vol. 6174, pp. 976-987, 2006.

[7] I. Mahgoub and M. Ilyas, Smart Dust: Sensor Network Applications, Architecture, and Design. CRC Press, 2006.

[8] E. Elnahrawy and B. Nath, “Cleaning and Querying Noisy Sensors,” Proc. Second ACM Int’l Workshop Wireless Sensor Networks and Applications (WSNA ’03), 2003.

[9] V. Bychkovskiy, S. Megerian, D. Estrin, and M. Potkonjak, “A Collaborative Approach to In-Place Sensor Calibration,” Proc. Int’l Workshop Information Processing in Sensor Networks (IPSN ’03), 2003.

[10] T. Karnik and P. Hazucha, “Characterization of Soft Errors Caused by Single Event Upsets in CMOS Processes,” IEEE Trans. Dependable and Secure Computing, vol. 1, no. 2, pp. 128-143, 2004.

[11] P. Shivakumar et al., “Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic,” Proc. Int’l Conf. Dependable Systems and Networks (DSN ’02), pp. 389-398, 2002.

[12] G. Schindlbeck, “Trend in DRAM Soft Errors,” Proc. 12th IEEE Int’l On-Line Testing Symp. (IOLTS ’06), p. 272, 2006.

[13] R. Baumann, “Technology Scaling Trends and Accelerated Testing for Soft Errors in Commercial Silicon Devices,” Proc. Ninth IEEE Int’l On-Line Testing Symp. (IOLTS ’03), p. 4, 2003.

[14] R. Baumann, “Radiation-Induced Soft Errors in Advanced Semiconductor Technologies,” IEEE Trans. Device and Materials Reliability, vol. 5, no. 3, pp. 305-316, 2005.

[15] P. Hazucha et al., “Neutron Soft Error Rate Measurements in a 90-nm CMOS Process and Scaling Trends in SRAM from 0.25-µm to 90-nm Generation,” IEEE Int’l Electron Devices Meeting, IEDM Technical Digest, pp. 21.5.1-21.5.4, 2003.

[16] C. Zhao, X. Bai, and S. Dey, “A Static Noise Impact Analysis Methodology for Evaluating Transient Error Effects in Digital VLSI Circuits,” Proc. Int’l Test Conf. (ITC ’05), p. 40.2, Oct. 2005.

[17] B. Warneke, M. Last, B. Liebowitz, and K. Pister, “Smart Dust: Communicating with a Cubic-Millimeter Computer,” Computer, vol. 34, no. 1, pp. 44-51, 2001.

[18] Y. Zhao and S. Dey, “Separate Dual Transistor Registers: A Circuit Solution for On-Line Testing of Transient Errors in UDSM-IC,” Proc. Ninth IEEE Int’l On-Line Testing Symp. (IOLTS ’03), pp. 7-11, 2003.

[19] K. Itoh, M. Horiguchi, and T. Kawahara, “Ultra-Low Voltage Nano-Scale Embedded RAMs,” Proc. IEEE Int’l Conf. Circuits and Systems (ISCAS ’06), pp. 25-28, 2006.

[20] S. Mukherjee, J. Emer, and S. Reinhardt, “The Soft Error Problem: An Architectural Perspective,” Proc. 11th Int’l Symp. High-Performance Computer Architecture (HPCA ’05), pp. 243-247, 2005.

[21] S.B. Wicker, Error Control Systems for Digital Communication and Storage. Prentice Hall, 1995.

[22] M.C. Vuran, O.B. Akan, and I.F. Akyildiz, “Spatio-Temporal Correlation: Theory and Applications for Wireless Sensor Networks,” Computer Networks, vol. 45, no. 3, pp. 245-259, 2004.

[23] L. Sankaranarayanan, G. Kramer, and N. Mandayam, “Hierarchical Sensor Networks: Capacity Bounds and Cooperative Strategies Using the Multiple-Access Relay Channel Model,” Proc. IEEE Sensor and Ad Hoc Comm. and Networks (SECON ’04), pp. 191-199, 2004.

[24] S. Bandyopadhyay and E. Coyle, “An Energy Efficient Hierarchical Clustering Algorithm for Wireless Sensor Networks,” Proc. IEEE INFOCOM, vol. 3, pp. 1713-1723, 2003.

[25] S. Soro and W. Heinzelman, “Prolonging the Lifetime of Wireless Sensor Networks via Unequal Clustering,” Proc. 19th IEEE Int’l Parallel and Distributed Processing Symp. (IPDPS ’05), p. 8, 2005.

[26] M. Zhang and N.R. Shanbhag, “A Soft Error Rate Analysis (SERA) Methodology,” Proc. IEEE/ACM Int’l Conf. Computer-Aided Design (ICCAD ’04), pp. 111-118, 2004.

[27] C. Zhao, X. Bai, and S. Dey, “A Scalable Soft Spot Analysis Methodology for Compound Noise Effects in Nano-Meter Circuits,” Proc. 41st Ann. Conf. Design Automation (DAC ’04), pp. 894-899, 2004.

[28] S. Mitra, N. Kee, and S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, 2005.

[29] R. Szewczyk et al., “Application Driven Systems Research: Habitat Monitoring with Sensor Networks,” Comm. ACM, special issue on sensor networks, pp. 34-40, June 2004.

[30] A. Willig and R. Mitschke, “Results of Bit Error Measurements with Sensor Nodes and Casuistic Consequences for Design of Energy-Efficient Error Control Schemes,” Proc. Third European Workshop Wireless Sensor Networks (EWSN ’06), 2006.

[31] L. Nachman et al., “The Intel Mote Platform: A Bluetooth-Based Sensor Network for Industrial Monitoring,” Proc. Fourth Int’l Symp. Information Processing in Sensor Networks (IPSN ’05), p. 61, 2005.

[32] H. Akaike, “Fitting Autoregressive Models for Prediction,” Annals of the Inst. Statistical Math., vol. 21, pp. 243-247, 1969.

[33] A. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge Univ. Press, 1989.

[34] H. Kantz, T. Schreiber, and D. Wojcik, “Nonlinear Time Series Analysis,” Pure and Applied Geophysics. Birkhauser, 1998.

[35] M. Thottan and C. Ji, “Fault Prediction at the Network Layer Using Intelligent Agents,” Proc. Sixth IFIP/IEEE Int’l Symp. Integrated Network Management (IM ’99), pp. 745-759, 1999.

[36] C. Chatfield, The Analysis of Time Series: An Introduction, sixth ed. CRC Press, 2004.

[37] S. Haykin, Adaptive Filter Theory, second ed. Prentice Hall, 1996.

[38] MATLAB: A High-Level Technical Computing Environment, http://www.mathworks.com/products/matlab, 2008.

[39] CDEC: California Data Exchange Center, California Dept. of Water Resources, http://cdec.water.ca.gov, 2008.

[40] K. Jamieson and H. Balakrishnan, “PPR: Partial Packet Recovery for Wireless Networks,” ACM SIGCOMM Computer Comm. Rev., vol. 37, no. 4, pp. 409-420, 2007.

[41] J. Zhao and R. Govindan, “Understanding Packet Delivery Performance in Dense Wireless Sensor Networks,” Proc. First Int’l Conf. Embedded Networked Sensor Systems (SenSys ’03), pp. 1-13, 2003.

[42] S. Howard, C. Schlegel, and K. Iniewski, “Error Control Coding in Low-Power Wireless Sensor Networks: When Is ECC Energy-Efficient?” EURASIP J. Wireless Comm. and Networking, vol. 2006, no. 2, p. 29, 2006.

[43] J. Jeong and C. Ee, Forward Error Correction in Sensor Networks. Univ. of California, May 2006.

[44] F. Stann and J. Heidemann, “RMST: Reliable Data Transport in Sensor Networks,” Proc. First Int’l Workshop Sensor Network Protocols and Applications (SNPA ’03), 2003.

[45] C.-Y. Wan, A.T. Campbell, and L. Krishnamurthy, Reliable Transport for Sensor Networks: PSFQ—Pump Slowly Fetch Quickly Paradigm. Kluwer Academic Publishers, 2004.

[46] Q. Cao et al., “Cluster-Based Forwarding for Reliable End-to-End Delivery in Wireless Sensor Networks,” Proc. IEEE INFOCOM, pp. 1928-1936, 2007.

[47] M. Vuran and I. Akyildiz, “Cross-Layer Analysis of Error Control in Wireless Sensor Networks,” Proc. IEEE Comm. Soc. Conf. Sensor and Ad Hoc Comm. and Networks (SECON ’06), vol. 2, pp. 585-594, 2006.

[48] S. Cui et al., “Energy-Efficient Joint Estimation in Sensor Networks: Analog versus Digital,” Proc. IEEE Int’l Conf. Acoustics, Speech, and Signal Processing (ICASSP ’05), vol. 4, 2005.

[49] M. Gastpar and M. Vetterli, “Power, Spatio-Temporal Bandwidth, and Distortion in Large Sensor Networks,” IEEE J. Selected Areas in Comm., vol. 23, no. 4, pp. 745-754, 2005.

[50] C. Zhao and S. Dey, “Improving Transient Error Tolerance of Digital VLSI Circuits Using Robustness Compiler (ROCO),” Proc. Seventh Int’l Symp. Quality Electronic Design (ISQED ’06), p. 6, 2006.

[51] X. Luo, M. Dong, and Y. Huang, “On Distributed Fault-Tolerant Detection in Wireless Sensor Networks,” IEEE Trans. Computers, vol. 55, no. 1, pp. 58-70, 2006.

[52] B. Krishnamachari and S. Iyengar, “Distributed Bayesian Algorithms for Fault-Tolerant Event Region Detection in Wireless Sensor Networks,” IEEE Trans. Computers, vol. 53, no. 3, pp. 241-250, Mar. 2004.

[53] J.M. Hellerstein, W. Hong, S. Madden, and K. Stanek, “Beyond Average: Towards Sophisticated Sensing with Queries,” Proc. Second Int’l Workshop Information Processing in Sensor Networks (IPSN ’03), Mar. 2003.

[54] A. Savvides, W. Garber, R. Moses, and M. Srivastava, “An Analysis of Error Inducing Parameters in Multihop Sensor Node Localization,” IEEE Trans. Mobile Computing, vol. 4, no. 6, pp. 567-577, 2005.

[55] A. Manjhi, S. Nath, and P. Gibbons, Tributaries and Deltas: Efficient and Robust Aggregation in Sensor Network Streams, pp. 287-298. ACM Press, 2005.

[56] D. Kempe, A. Dobra, and J. Gehrke, “Gossip-Based Computation of Aggregate Information,” Proc. 44th Ann. IEEE Symp. Foundations of Computer Science (FOCS ’03), p. 482, 2003.

[57] D. Ganesan, R. Govindan, S. Shenker, and D. Estrin, “Highly Resilient, Energy Efficient Multipath Routing in Wireless Sensor Networks,” Mobile Computing and Comm. Rev., vol. 1, no. 2, 2002.

[58] S.S. Pradhan and K. Ramchandran, “Distributed Source Coding: Symmetric Rates and Applications to Sensor Networks,” Proc. IEEE Data Compression Conf. (DCC ’00), Mar. 2000.

[59] M. Drinic et al., “Model-Based Compression in Wireless Ad Hoc Networks,” Proc. First Int’l Conf. Embedded Networked Sensor Systems (SenSys ’03), pp. 231-242, 2003.

Shoubhik Mukhopadhyay received the BTech degree in electronics and electrical communication engineering from the Indian Institute of Technology, Kharagpur, India, in 1999 and the MS degree in electrical and computer engineering from the University of California, San Diego, in 2004. He is currently working toward the PhD degree in electrical and computer engineering in the Department of Electrical and Computer Engineering, University of California, San Diego. He is a student member of the IEEE.

Curt Schurgers received the MS degree from the Katholieke Universiteit Leuven (KUL), Belgium, in 1997 and the PhD degree from the University of California, Los Angeles (UCLA), in 2002. He was a researcher at IMEC, Belgium, from 1997 to 1999 and at the Massachusetts Institute of Technology (MIT) in 2003. Currently, he is a professor in the Department of Electrical and Computer Engineering, University of California, San Diego. He is a member of the IEEE.

Debashis Panigrahi received the BTech degree in computer science and engineering from the Indian Institute of Technology, Kharagpur, India, in 1998 and the MS and CPhil degrees in computer engineering from the University of California, San Diego (UCSD), in 2001 and 2004, respectively. He is currently a software manager and one of the founding members of Ortiva Wireless Inc., San Diego, where he develops technologies and products for optimizing multimedia delivery over wireless networks. Prior to joining UCSD in 1999, he was with Synopsys India as a research and development engineer. He has coauthored 25 technical papers. He is a member of the IEEE.

Sujit Dey received the PhD degree in computer science from Duke University, Durham, North Carolina, in 1991. He is a professor with the Department of Electrical and Computer Engineering, University of California, San Diego (UCSD), where he heads the Mobile Systems Design and Test Laboratory, engaged in developing configurable platforms, consisting of adaptive wireless protocols and multimedia algorithms and deep-submicron adaptive system on chips, for next-generation wireless networks and applications. He is affiliated with the California Institute of Telecommunications and Information Technology (Cal-IT2) and the UCSD Center for Wireless Communications. Based on innovative technologies developed in his laboratory at UCSD, he founded Ortiva Wireless in 2004 and has served as its CEO and CTO. He has served as the chair of the advisory board of Zyray Wireless and as an advisor to multiple companies including STMicroelectronics and NEC. Prior to joining UCSD in 1997, he was a Senior Research Staff Member at the NEC C&C Research Laboratories, Princeton, New Jersey. He has coauthored more than 150 publications, including journal and conference papers, a book on low-power design, and several book chapters. He is the coinventor of 11 US patents, with 10 others pending, resulting in multiple technology licensing. He has been the recipient of several best paper awards and has chaired multiple IEEE conferences and workshops. He is a senior member of the IEEE.
