Model Based Techniques for DATA RELIABILITY in Wireless Sensor Networks.
-Hambi
Problem Definition
Assuring the reliability of sensor data will be one of the major design challenges of future sensor networks.
How do we achieve this under stringent constraints such as tight energy budgets, low-cost components, limited processing resources, and small-footprint devices?
Outline
• Brief introduction
• Motivation
• Overall approach
• Predictive error correction
• Data modeling
• Performance analysis
Introduction
The growth in wireless sensor networks is driven mainly by developments in semiconductor design technology, e.g., the scaling of feature sizes and the lowering of operating voltages, which allow sensor nodes to become smaller and more power efficient.
As nodes get smaller and cheaper, ensuring the reliability of sensor data becomes harder.
Sources of Errors
• Operating conditions.
• Aging, which causes calibration drifts.
• Cross talk and radiation effects.
• Errors in the communication channel.
Two types of errors: transient and permanent.
The shrinking of feature sizes to nanometer scales and the lowering of supply voltages to sub-volt ranges make devices vulnerable to various noise and interference effects.
Traditional Methods for Error Correction
Continued…
• Communication errors: channel coding, or Automatic Repeat reQuests (ARQs) coupled with error detection.
• Channel coding: Forward Error Correction (FEC) codes.
• FEC adds extra bits to transmitted packets that allow correct decoding when some of the bits are corrupted, e.g., Reed-Solomon coding.
• Data is processed in fixed-size blocks, with a fixed number of overhead bits added during encoding.
• The decoder can recover a block of data when the number of errors in the block does not exceed half the number of overhead bits.
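The last point can be sketched in one line; RS(255, 223) is a common parameter choice used here purely as an illustration:

```python
def rs_correctable(n: int, k: int) -> int:
    """Maximum number of symbol errors an (n, k) Reed-Solomon code
    can correct: half the number of overhead (parity) symbols."""
    return (n - k) // 2

# RS(255, 223): 32 parity symbols -> corrects up to 16 symbol errors.
print(rs_correctable(255, 223))  # 16
```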
Approach
Model-based error correction techniques.
• Use the sensor data itself for error correction.
• Increased redundancy? Yes, but it can be handled in different ways as part of the system design.
• Reducing redundancy may reduce the robustness of the data against errors.
• So why not use this redundancy to ensure reliability?
Advantages:
• The network is much more efficient.
• Multiple types of errors are handled together.
• Sensor nodes need not worry about error correction.
• Follows the KIS (keep it simple) principle: sensors are meant just for sensing.
Architecture
Continued…
Sensing is done by clusters of dedicated sensor nodes that report their data to more complex cluster-head nodes.
Cluster heads:
• Have more processing and energy resources.
• Are capable of multiple complex functions involving data processing and storage.
• Configure the scheduling of the sensing, data-reporting, and sleeping cycles of the nodes around them.
• Their functionality is implemented in software.
This approach can be used alongside the traditional methods.
Model-based Error Correction
The idea is to use the properties of the data for reliability:
• Analyze the sensor data.
• Capture the relevant properties in a data model.
Which properties?
• Depends on the type of application.
• E.g., knowing that the correlation time is in hours rather than milliseconds.
Use this model for error detection and correction.
Overall Scheme for Model-based Error Correction
What exactly are we doing here?
• Before each sample is received, a predicted value (Xp) is calculated using the data model.
• The observed value (X) is then received.
• Xp and the past observations (given in the data model) are used to decide whether the sample is erroneous; the decision algorithm makes that call.
• A corrected value (Xc) is reported, depending on the output of the error detection.
What else do we have?
• Adapt the model online based on the collected data; this improves the accuracy of prediction and correction.
Note: we are NOT dealing with sensors that are sampled slowly relative to the variations in the data, e.g., event-detection or low-sampling-rate applications, where the data is normally expected to contain sharp variations randomly spread in time.
Predictive Error Correction: Intro
• The correlation characteristics of the data should be different from those of the errors.
• Most errors are random bit errors, uniformly distributed across bit positions.
Dealing with transient errors:
• Probabilistic models for process variations and random particle strikes.
• Deterministic models for circuit layout and cross talk.
• The effect is visible only at the gate level or register level.
• The uniform structure of the logic and layout makes cells equally susceptible to radiation effects.
Continued…
• For communication channel errors, the channel conditions are measured.
• These are hard to model for large packet-arrival intervals.
• In the absence of any estimates, use a model of uncoordinated random bit errors.
• We use a Bernoulli process with a uniform error probability for all bits in the data.
• If sampling happens every few seconds, error bursts of 50-200 ms have no influence.
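A minimal sketch of this error model, assuming each bit position of a sample flips independently with probability p (function and parameter names are ours):

```python
import random

def flip_bits(value: int, width: int, p: float, rng: random.Random) -> int:
    """Bernoulli bit-error model: each of `width` bit positions is
    flipped independently with probability p."""
    for i in range(width):
        if rng.random() < p:
            value ^= 1 << i
    return value

# Corrupt an 8-bit sample with a 10% per-bit error probability.
corrupted = flip_bits(0b1010_1010, 8, 0.1, random.Random(42))
```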
Correction Methodology
• Calculate the difference between the predicted and the observed value.
• There will always be some level of prediction error; complexities such as the size of the history can also introduce inadequacy.
• Main challenge: identify the cause of the error, i.e., the randomness of the data versus an error introduced after sensing.
• Compare the prediction errors of the samples sequentially.
Continued…
• Delay the reporting by a few sample periods and compare with the future samples.
• Report depending on how the choice (Xp or X) affects the future samples.
How do we identify errors?
• An erroneous observation leads to a continuous degradation of the predicted values of the future samples.
• A modeling error is unlikely to introduce degrading predicted values.
• Whenever the decision algorithm detects an error in an observed data sample, the observation is marked and treated as an erasure.
Functional Diagram-PEC block
Continued…
Prediction and decision blocks are control blocks.
• The prediction block implements the prediction model and produces a predicted value based on the recent history.
Observation-history and prediction-history blocks are storage blocks.
• The output of the prediction block is stored in the PHT (prediction history tree).
• Both are inputs to the prediction block for predicting future samples.
Decision block:
• The PHT is processed by the decision algorithm in the decision block.
Data Structure of the PHT
• Holds the few most recently observed samples at any time.
• All possible sequences of Xp values and the corresponding error values (Xe) are stored.
• It is a binary tree.
• The root node contains the last corrected data sample.
• The root's children contain the X and Xp values of the very next sample.
• The leaf nodes hold the predicted values of the current sample.
Continued…
• Each path from root to leaf holds a possible sequence of observed or predicted values.
• Decision delay: the depth of the PHT (N); the tree has N+2 levels.
• Each node in a level holds a pair of values: <observed/predicted value, prediction error>.
• Nodes are numbered sequentially starting from 0; node i's children are node 2i+1, holding the observed value (odd-numbered), and node 2i+2, holding the predicted value (even-numbered).
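The numbering above is the standard implicit binary-tree indexing, which can be sketched as:

```python
def pht_children(i: int) -> tuple[int, int]:
    """Children of PHT node i: the odd-numbered child (2i+1) holds the
    observed value, the even-numbered child (2i+2) the predicted value."""
    return 2 * i + 1, 2 * i + 2

# The root's children are nodes 1 (observed) and 2 (predicted).
print(pht_children(0))  # (1, 2)
```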
Continued…
• In the running example, the root node contains the last corrected value, Xc[n-3] = 100.
• Even-numbered leaf nodes contain the different predicted values of Xp[n] that would be computed for different choices of previous values.
• Once the new sample X[n] is observed, the prediction errors for all the values of Xp[n] are computed.
• The decision algorithm then decides between nodes 1 and 2, using the prediction errors for the comparison; finally, Xc[n-2] is chosen.
Continued…
The algorithm updates the PHT for the next sample:
• Node 1 roots the observation sub-tree; node 2 roots the prediction sub-tree.
• One sub-tree is chosen and its nodes are moved up by one level; the other sub-tree is discarded.
• The decision and update process is then repeated for the next sample.
Decision Algorithms
Four algorithms:
• Min-Err algorithm
• Min-Max algorithm
• Peer algorithm
• Hybrid Peer with CRC check
Min-Err Algorithm
• The decision is based on how the choice affects the prediction accuracy of the next N samples.
• Different choices of observed/predicted values yield different sequences of samples.
• Select the sub-tree of the PHT that contains the root-leaf path with the minimum RMS prediction error.
• Example: among the paths ending with nodes 8, 10, 12, and 14, the RMS value is smallest for the path ending with node 14, hence node 2 is chosen.
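The Min-Err rule can be sketched as follows, assuming each sub-tree has already been flattened into a list of root-to-leaf prediction-error sequences (this representation and the function names are ours):

```python
import math

def path_rms(errors: list[float]) -> float:
    """RMS of the prediction errors along one root-to-leaf path."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def min_err_choice(obs_paths: list[list[float]],
                   pred_paths: list[list[float]]) -> str:
    """Keep the sub-tree whose best path has the smallest RMS error."""
    if min(map(path_rms, obs_paths)) <= min(map(path_rms, pred_paths)):
        return "observation"
    return "prediction"
```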
Min-Err Drawbacks
• Highly sensitive to the modeling performance: you may get prediction errors even for authentic observed values.
• The effect of a modeling error is amplified for paths that contain a small number of predicted samples.
• A single sample might end up driving the decision.
• Example: had the observed value been 111 in the earlier PHT example, the errors for the paths would have been 19 and 21, meaning the other node would have been chosen.
Min-Max Algorithm
• The sub-trees of nodes 1 and 2 are considered separately.
• In each sub-tree, the path with the maximum average error is found.
• The sub-tree with the smaller maximum average error is selected.
Pros:
• More resilient to modeling errors.
• In the earlier example, only the paths ending with nodes 10 and 12 are compared.
• N would need to be as large as 145 to affect the final value.
Min-Max Continued…
Cons:
• Does not take into account certain cases that cause spurious errors for certain models, e.g., when the size of the history is smaller than the depth of the PHT.
• Suppose the example model uses just the previous sample for prediction; then the value predicted at node 1 has no effect on the value predicted at node 12.
Peer Algorithm
• Individual pairs of nodes in each sub-tree are compared, instead of full paths.
• Nodes in peer positions are compared by their absolute prediction errors.
• The sub-tree that has more samples with lower prediction errors is selected.
• Predictions that are independent of the choice are excluded from the decision-making process.
• Nodes 4, 8, and 10 are compared against nodes 6, 12, and 14, respectively.
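A minimal sketch of the peer comparison, assuming the peer-positioned prediction errors have already been gathered into two parallel lists and that near-ties within the error threshold ETH are discarded (names and representation are ours):

```python
def peer_choice(obs_errs: list[float], pred_errs: list[float],
                eth: float = 0.0) -> str:
    """Compare peer-positioned nodes by absolute prediction error and
    keep the sub-tree that wins more pairs; pairs whose errors differ
    by less than eth are treated as inconclusive and skipped."""
    obs_wins = pred_wins = 0
    for a, b in zip(obs_errs, pred_errs):
        if abs(abs(a) - abs(b)) <= eth:
            continue  # inconclusive pair, discard it
        if abs(a) < abs(b):
            obs_wins += 1
        else:
            pred_wins += 1
    return "observation" if obs_wins >= pred_wins else "prediction"
```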
Continued…
• The model parameter M is the number of samples used for prediction from the PHT.
• Before each comparison, it is ensured that node 1 or 2, or a sample directly predicted from it, is among the previous M samples.
• If the difference between the prediction errors in a pair is smaller than the average modeling error, that pair is discarded; an error threshold (ETH) is used for this.
Hybrid Correction
Hybrid with CRC
• No additional information beyond the sensor data is used in the earlier schemes.
• A checksum function built into the hardware can complement the model-based error detection.
• The result of the checksum is fed to the decision algorithm when it is available.
• When an error is detected by the CRC, the sample is treated as missing and the predicted value from the model is used.
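The erasure handling can be sketched in a few lines; CRC-32 from the standard library stands in for whatever checksum the hardware provides, and the function and parameter names are ours:

```python
import zlib

def hybrid_correct(payload: bytes, received_crc: int,
                   observed: float, predicted: float) -> float:
    """Hybrid-with-CRC sketch: a failed checksum marks the sample as an
    erasure, so the model's predicted value replaces the observation."""
    if zlib.crc32(payload) != received_crc:
        return predicted  # treat as a missing sample
    return observed
```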
DATA MODEL
How do we create the data model?
• Properties of the data source are used for predictions.
What are the requirements?
• Maximize the prediction accuracy.
• Prediction needs to be fast.
• Low computation and storage overheads.
• E.g., handle cases where the data source is not strictly stationary.
Auto-Regressive Models
• Used for our implementation.
• Capture the effect of recent history through an "aging" process.
• Computationally very efficient, since linear prediction functions are used.
• The prediction is expressed as a linear combination of previous samples.
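An AR(M) prediction is just a weighted sum of the M most recent samples; a minimal sketch, with illustrative (not fitted) coefficients:

```python
def ar_predict(history: list[float], coeffs: tuple[float, ...]) -> float:
    """AR(M) prediction: x_p[n] = sum_i a_i * x[n-i], where coeffs[0]
    weights the most recent sample."""
    assert len(history) >= len(coeffs)
    return sum(a * x for a, x in zip(coeffs, reversed(history)))

# With a = (2, -1), the model extrapolates a linear trend.
print(ar_predict([1.0, 2.0, 3.0], (2.0, -1.0)))  # 4.0
```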
Continued…
Implementation
Implementation continued…
Modeling is done in two parts:
• Offline analysis of the statistical properties.
• Runtime updates to the model.
Offline model: compute the order of the AR model based on the correlation time and sampling rate.
Runtime updates: track the prediction accuracy and perform the model update.
Runtime Model
• Special mode of operation: estimation mode.
• Sensors temporarily report their data with additional protection, so more reliable data is available for computing updated models at the cluster heads; this is implemented by taking redundant readings.
• Trends in the prediction errors are continuously monitored, and a model-update request is triggered when necessary:
• Stop the data-gathering process.
• Temporarily switch to estimation mode.
• Update the model with the protected data.
• Return to correction mode, i.e., the normal mode of operation.
Pros and Cons…
• Sensor data collected during the update mode is still transparent to the application.
• Estimation mode is implemented without additional hardware overhead.
• The redundancy leads to an increase in energy cost per bit.
• This can be made more efficient by sharing across multiple cluster heads.
Model Tracking and Model Updates
• During correction mode, a running windowed average of the prediction error is maintained.
• The threshold is scaled by the number of correct samples in the averaging window.
• The threshold value and the size of the averaging window determine the frequency of updates; the optimal choice depends on the characteristics of the data source and the system.
• The update stage uses a minimal set of data points, since it runs in the resource-heavy estimation mode.
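The tracking rule can be sketched as follows, assuming a simple moving average of absolute prediction errors against a fixed threshold (names and values are ours, simplified from the scaled-threshold scheme above):

```python
from collections import deque

def needs_update(errors: list[float], window: int, threshold: float) -> bool:
    """Model-tracking sketch: average the absolute prediction error over
    the last `window` samples and request a model update when that
    running average exceeds the threshold."""
    recent = deque(errors, maxlen=window)  # keep only the last `window` values
    return sum(abs(e) for e in recent) / len(recent) > threshold
```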
Model Updates…
• States can be associated with pre-computed models.
• Increasing the number of samples may increase accuracy, but at the cost of additional resource overhead.
Performance Evaluation
• Peer > Min-Max >> Min-Err in offline mode.
• For dynamic model updates, the higher the error in the input, the greater the improvement attained with the updates.
• Correction performance is better with runtime updates than in offline mode.
• Reed-Solomon leads to an 86% overhead.
• Using the CRC output can reduce errors by 50% under high-error conditions.
Questions/Comments ?