
A Resilient Transport System for Wireless Sensor

Networks

Chieh-Yih Wan

Submitted in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

in the Graduate School of Arts and Sciences

Columbia University

2005

© 2005

Chieh-Yih Wan

All Rights Reserved

ABSTRACT

A Resilient Transport System for Wireless Sensor Networks

Chieh-Yih Wan

This thesis contributes toward the design of a new resilient transport system for

wireless sensor networks. Sensor networks have recently emerged as a vital new area

in networking research, one that tightly marries sensing, computing, and wireless

communications for the first time. Wireless sensors are embedded in the real world

and interact closely with the physical environment in which they reside. These net-

works must be designed to effectively deal with the network’s dynamically changing

resources, including available energy, bandwidth, processing power, node density,

and connectivity. This dissertation focuses on making the sensor network transport

system resilient to such changes, which in many cases are abrupt. We define

transport resilience as the ability of the network to deliver a sufficient amount of

sensing events to meet the applications’ fidelity requirement for a set of different

traffic classes while reducing the energy consumption of the network. More specifi-

cally, we investigate, study, and analyze two classes of transport resilience: (1) the

need to reliably deliver data under various error conditions; and (2) the need to

maintain the application’s fidelity under congested network conditions. We take an

experimental systems research approach to the problem of supporting resilience in

sensor networks by building an experimental sensor network testbed and evaluating

a set of new resilient transport algorithms under various workloads and changing

network conditions. We study the behavior of these algorithms under testbed condi-

tions, and apply what is learned toward the construction of larger and more scalable

resilient networks.

This thesis makes a number of contributions. First, we propose a new reliable

delivery transport paradigm for sensor networks called Pump Slowly Fetch Quickly

(PSFQ). PSFQ represents a lightweight, scalable and robust transport protocol that

is customizable to meet a wide variety of application needs (e.g., re-programming,

actuation, reliable event delivery). We present the design and implementation of

PSFQ, and evaluate the protocol using the ns-2 simulator and an experimental wire-

less sensor testbed based on Berkeley motes and the TinyOS operating system. The

PSFQ protocol represents the first reliable transport proposed for wireless sensor

networks.

Next, we present the design of an energy-efficient congestion control scheme for

sensor networks called CODA (COngestion Detection and Avoidance). We define

a new objective function for traffic control in sensor networks, which maximizes

the operational lifetime of the network while delivering acceptable data fidelity to

sensor network applications. CODA is founded on three important distributed con-

trol mechanisms: (1) an accurate and energy-efficient congestion detection scheme;

(2) a hop-by-hop backpressure algorithm; and (3) a sink to multi-source regulation

scheme. We evaluate a number of congestion scenarios and define new performance

metrics that capture the impact of CODA on the sensing application performance.

We analyze the performance benefits and practical engineering challenges of imple-

menting CODA in an experimental sensor network motes testbed. CODA represents

the first comprehensive solution to the congestion problem in sensor networks.

The final contribution of this dissertation explores a complementary solution to

CODA called dual radio virtual sinks that boosts the performance of sensor networks

even under persistent overload conditions. We propose to randomly distribute a

small number of all-wireless dual radio virtual sinks throughout the sensor field. In

essence, these virtual sinks operate as safety valves in the sensor field by selectively

siphoning off overload traffic in order to maintain the fidelity of the application

signal delivered to the network’s physical sink. A key feature of virtual sinks is

that they are equipped with a secondary higher bandwidth, long-range radio (e.g.,

IEEE 802.11), in addition to their primary low bandwidth, low power mote

radio. Virtual sinks are capable of dynamically forming a secondary ad hoc radio

network that can be used on-demand by the mote radio network. Rather than rate-

controlling packets during periods of congestion (as is the case with CODA), virtual

sinks take the congested traffic off the low-powered sensor network and move it on

to the secondary radio network, transiting it to the network’s physical sink. We

study, propose, and evaluate a set of algorithms for virtual sink discovery, selection,

traffic transiting and load balancing. We leverage the use of the Stargate platform

to support an all-wireless virtual sink approach in our sensor network testbed.

We believe that sensor networks must be built to be robust to various software

and hardware failures, and be resilient to dynamic resource changes such as node

failures, increased packet error rates, and traffic surges. Collectively, PSFQ, CODA,

and virtual sinks provide a set of energy-efficient, robust transport mechanisms that

serve as a foundation for making sensor networks more resilient.

Contents

1 Introduction

1.1. Overview

1.1.1 A Resilient Transport System for Sensor Networks

1.1.2 Problem Statement

1.1.3 Technical Barriers

1.2. Thesis Outline

1.2.1 Pump Slowly Fetch Quickly (PSFQ)

1.2.2 CODA (COngestion Detection and Avoidance)

1.2.3 Dual Radio Virtual Sinks

1.3. Thesis Contributions

2 Pump-Slowly, Fetch-Quickly (PSFQ): A Reliable Transport Protocol for Sensor Networks

2.1. Introduction

2.2. Protocol Design Space

2.2.1 Hop-by-Hop Error Recovery

2.2.2 Fetch/Pump Relationship

2.2.3 Multi-modal Operations

2.3. Protocol Description

2.3.1 Pump Operation

2.3.2 Fetch Operation

2.3.3 Report Operation

2.3.4 Single-Packet Message Delivery

2.4. Performance Evaluation

2.4.1 Simulation Approach

2.4.2 Simulation Results

2.5. Experimental Testbed Results

2.5.1 PSFQ Parameter Space and Timer Bounds

2.5.2 Messaging Overhead

2.5.3 Network Size versus Network Density

2.6. Related Work

2.7. Conclusion

3 Energy-Efficient Congestion Detection and Avoidance in Sensor Networks

3.1. Introduction

3.2. Design Considerations

3.2.1 CSMA Considerations

3.2.2 Congestion Detection

3.3. CODA Design

3.3.1 Open-Loop Hop-by-Hop Backpressure

3.3.2 Closed-Loop Multi-Source Regulation

3.4. Experimental Sensor Network Testbed

3.4.1 Measuring the β Value

3.4.2 Channel Loading Measurement and Utilization

3.4.3 Energy Tax, Fidelity Penalty, and Power

3.4.4 Open-loop Control

3.4.5 Combining Open-loop and Closed-loop Control

3.5. Simulation Results

3.5.1 Simulation Environment

3.5.2 Results and Discussion

3.6. Related Work

3.7. Conclusion

4 Dual Radio Virtual Sinks

4.1. Introduction

4.2. Related Work

4.3. Design Considerations

4.3.1 Funneling Effect

4.3.2 Small World Observations and Shortcuts

4.3.3 Traffic Redirection and Prioritization Issues

4.3.4 Transparency and Compatibility Issues

4.4. Siphon Design

4.4.1 Virtual Sink Discovery and Visibility Scope Control

4.4.2 Congestion Detection

4.4.3 Traffic Redirection

4.4.4 Congestion in the Secondary Network

4.5. Performance Evaluation

4.5.1 Simulation Environment

4.5.2 Delay Device and Directed Diffusion

4.5.3 Energy Tax, Fidelity Ratio and Residual Energy

4.5.4 Early Congestion Detection

4.5.5 Virtual Sink’s Visibility Scope Impact

4.5.6 Always-on versus On-demand Virtual Sinks

4.5.7 Partitioned Secondary Network

4.5.8 VS Density Impact

4.5.9 Load Balancing Feature

4.6. Sensor Network Testbed Implementation

4.6.1 Stargate and Mica Mote Testbed

4.6.2 Congestion Detection for Traffic Redirection Decision

4.6.3 A Generic Data Dissemination Application

4.6.4 Post-Facto Traffic Siphoning

4.7. Conclusion

5 Conclusion

5.1. The Critical Issue of Transport Resilience

5.2. Reliable Delivery

5.3. Congestion Control

5.4. Endnote

6 My Publications as a PhD Candidate

6.1. Journal Papers

6.2. Journal Papers under Submission

6.3. Magazine Papers, Review Articles and Book Chapters

6.4. Conference Papers

6.5. IETF Internet Drafts

References


List of Figures

1-1 The funneling effect. Sensors within the range of an event region/epicenter (enclosed by the dotted ellipse line) generate data that travels along a propagation funnel (enclosed by dotted line) toward the sink when an event occurs.

2-1 Probability of successful delivery of a message using an end-to-end model across a multi-hop network.

2-2 Probability of successful delivery of a message over one hop when the mechanism allows multiple retransmissions before the next packet arrival.

2-3 Sensor network in a building. A user node at location 0 injects 50 packets into the network within 0.5 seconds.

2-4 Error tolerance comparison: average delivery ratio as a function of the number of hops under various channel conditions for different packet error rates.

2-5 Comparison of average latency as a function of channel error rate.

2-6 Average delivery overhead as a function of channel error rate.

2-7 A 4-hop network physically arranged in a string/chain topology.

2-8 Breakdown of PSFQ messages. Average delivery overhead is 1.2±0.13.

2-9 Average delivery overhead as a function of network size and density.

2-10 Average delivery latency as a function of network size and density.

3-1 Total number of packets dropped by the sensor network per data event packet delivered at the sink (Drop Rate) as a function of the source rate. The x axis is plotted in log scale to highlight data points with low reporting rates. All packets that are dropped during the 50 second simulation session are counted as part of the drop rate including the MAC signaling (e.g., RTS/CTS/ACK and ARP), data event, and diffusion messaging packets.

3-2 A simple IEEE 802.11 wireless network of 5 nodes illustrates receiver-based congestion detection.

3-3 Channel load and buffer occupancy time series traces with and without virtual carrier sense (VC)+link-layer ACK, and packet delivery trace with VC.

3-4 Queueing performance of a real sensor network of Mica motes.

3-5 Closed-loop control model. The impact of Wsink and the multiplicative decrease factor d.

3-6 MAC layer stopwatch placement for β measurement. Diagram of receive and transmit state flows in the TinyOS MAC component code. The stopwatch start/stop trigger points are marked with an X.

3-7 A limit on measured channel load is imposed by β. Nominal load curve increases with constant slope as the source packet rate increases, while the measured load saturates at a value below 70%.

3-8 Experimental sensor network testbed topology. Nodes are well connected. Packets are unicast.

3-9 Improvement in energy tax with small fidelity penalty using CODA. Priority of Src-2 evident from the fidelity penalty results.

3-10 Experimental sensor network testbed topology to capture the funneling effect in a larger network with sparsely located sources.

3-11 Time series traces that present the rate control dynamics and the event fidelity/delivery performance of CODA. CODA’s rate control scheme does not increase the degree of variability to the event delivery performance.

3-12 Tradeoff between fidelity and energy tax that obtains the most benefit, i.e., maximum “power”, for the network.

3-13 Power of CODA versus non-CODA in an experimental Mica motes testbed.

3-14 Network of 30 nodes. Sensors within the range of the event epicenter, which is enclosed by the dotted ellipse, generate impulse data when an event occurs. The circle represents the radio range (40m) of the sensor.

3-15 Time series traces for densely deployed sources that generate high rate data.

3-16 (a) Packet delivery and (b) Packet drop time series traces for a 15-node network with low rate traffic. The plots show the traces for three cases: when only open-loop control (OCC) is used, when both open-loop and closed-loop control (CCC) are enabled, and when congestion control is disabled (noCC).

3-17 Average energy tax and fidelity penalty as a function of the network size when only CODA’s open loop control is used.

3-18 Energy tax as a function of network size for high and low rate data traffic. The difference between the data points with and without CODA indicates the energy saving achieved by CODA.

3-19 Fidelity penalty as a function of the network size for high and low rate data traffic.

4-1 The funneling effect. Sensors within the range of an event region/epicenter (enclosed by the dotted ellipse) generate impulse data that travel along a propagation funnel (enclosed by dotted line) toward the sink when an event occurs.

4-2 Reduction of average distance in a network with increasing percentage of dual-radio nodes that provide the shortcuts between nodes.

4-3 Early Congestion Detection. Different congestion level thresholds that can avoid congestion down the funnel.

4-4 The impact of the visibility scope of a VS for a network of 30 nodes.

4-5 Fidelity and Energy Tax performance in a network where there are always-on virtual sinks.

4-6 Fidelity and Energy Tax performance in a network where there are virtual sinks that are put into service only when congestion is detected.

4-7 Number of sensor nodes required to ensure connectivity in the corresponding areas of network coverage as well as the number of VSs (right vertical axis) required to ensure performance improvement.

4-8 Fraction of Virtual Sinks needed to assure improved network performance.

4-9 Requirement for a connected secondary network. The transmission range of the long-range radio is expressed as a multiple of the transmission radius of the low-power radio. The visibility scope requirement that assures both energy tax and fidelity improvement is plotted as a filled square in the figure.

4-10 Energy Distribution (Complementary CDF) of a 70-node network with 3 virtual sinks scattered randomly across the network. With Siphon’s load balancing feature, more nodes share the energy load. Therefore, fewer nodes have residual energy larger than 85%, but more nodes have larger residual energy (e.g., the percentage of nodes having residual energy larger than 75% increases from 60% to 85%), effectively increasing the operational lifetime of the network.

4-11 A sensor network testbed of 30 nodes.

4-12 Early congestion detection threshold. An appropriate choice for the early congestion detection threshold must be based on application loss tolerance parameters.

4-13 Queueing performance and buffer occupancy threshold for congestion avoidance.

4-14 Siphon performance in a real sensor network of 30 nodes. Notice CODA’s priority preference toward src-3.

4-15 Post-facto traffic redirection versus early-detection approach.

Acknowledgements

First of all, I would like to thank my adviser Professor Andrew T. Campbell for his

support and constant prodding over the past few years. Professor Campbell’s high

expectations for research have brought my thesis to the next level of practicality

and academic contribution. Furthermore, Professor Campbell’s persistence and

optimism have helped me overcome many obstacles and accomplish the impossible.

I want to thank Dr. Lakshman Krishnamurthy for giving me the opportunity to

work with him in 2001 for my summer internship with Intel. Through Lakshman, I

have learned to be practical in exploring the unknown and in seeking solutions.

Because of him, PSFQ, the first chapter of my thesis, came into being.

I am deeply grateful to Dr. Wai Chen. I was inspired by his teaching at Columbia

University, and I was also given the opportunity to work with him at Telcordia

Technologies during my first summer internship in the United States. Dr. Chen has

become more than just a professor or a superior at work to me; he is a true mentor.

He has given me consolation and encouragement as I faltered along during this

time of my life.

Many thanks to Professor Mischa Schwartz for his conscientiousness in editing

my thesis; I have great admiration for his wisdom.

Special thanks go to Shane B. Eisenman for being such a great help to my

research and most of all, such a wonderful friend. His friendship has lit up the

otherwise boring and seemingly never-ending Ph.D. life. I will always treasure our

“mini ball game” on the nights that we spent together at the COMET lab burning

the midnight oil for our research papers.

Deep appreciation and thanks to my dearest wife for her endurance in putting up

with my hectic lifestyle as a Ph.D. student. Finally, I want to dedicate my

thesis to my father in Malaysia and also to my beloved mother, who passed away a few

years ago.


Chapter 1

Introduction

1.1. Overview

Over the last several years sensor networks have emerged as a vital new area in

networking research, one that tightly marries sensing, computing, and wireless com-

munications for the first time. This new frontier in wireless sensing presents many

new technical challenges for the research community as well as many untapped op-

portunities for a diverse set of industries and early adopters, from environmental

sensing to ubiquitous computing. This new era is also likely to have a significant

impact on how we as individuals interact with the physical world around us. In this

dissertation, we address some important networking problems associated with this

nascent field that could limit this broad vision, if not solved.

The notion of sensor networks has been around for over two decades now [1], but

recently the coming together of sensing and wireless communications has revolution-

ized the field and enabled significant advances. Early sensor networks emerged in

the 1980s and included radar networks used in air traffic control systems, and the

national power grid. However, these networks were wired networks, and while some

researchers working on these early systems envisioned large numbers of small sen-

sors, the technology to do this has only recently emerged. Today’s sensor networks


comprise multiple distributed sensors that can collect information associated with

events of interest wirelessly. Each sensor node has an embedded processing capa-

bility, onboard storage and communication capability, and potentially has multiple

onboard sensors interfacing with, or monitoring, the physical environment in which

the sensor resides.

There have been a number of important enablers for advances in networked sens-

ing. Today, silicon technology continues to push into two complementary directions,

i.e., increased processing speed measured in GHz and the decreased circuit feature

size measured in µm and nm. Together with important advances in miniaturization

and low power wireless technologies, the emergence of low cost, small scale devices

with sensing, communication and computing capabilities has begun to enable the much vaunted

era of ubiquitous computing [2] proposed by Mark Weiser [3] in the early 1990s. The

necessary enabling technologies to revolutionize the world of sensing have finally arrived.

The current state of the art in microelectronics and microelectromechanical

systems (MEMS) [4] provides a foundation for autonomous microsystems that compose

wireless sensor networks. These networks offer low cost, distributed monitoring

solutions for a wide variety of applications and systems [5] [6] [7]. Anything from

monitoring weather patterns to tracking human movement can be done with these

large-scale, highly distributed systems of small, untethered, low-power, unattended

sensors and actuators.

Wireless sensor networks represent a new class of distributed systems that operate

under a new set of constraints. Many tasks are greatly complicated in sensor

networks due to their unique requirements and constraints: energy efficiency, resource

limitations in computing and communicating, scalability, automatic adaptation

to environmental dynamics, etc. In many ways, sensor networks are breathing

new life into a number of old problems. For example, because of the unpredictable


and possibly harsh environments where the sensors might be deployed, assuring re-

liable communications between sensor nodes for the purposes of control and data

collection is a significant challenge. This is in contrast with traditional networks

such as the Internet, in which reliable networking has been well studied and applied

with great success in support of everyday applications; the same comment applies

to cellular networks and wireless LAN-based networks that we use every day.

There has been a growing amount of networking research conducted on wireless

sensor networks focusing on several important problems; these include but are not

limited to:

• Data dissemination mechanisms [8] [9] [10] in which researchers study the data

routing problems in sensor networks. These studies have also identified the

unique data-centric nature [11] of sensor networks.

• Power conservation/coordination schemes [12] [13] that allow nodes to schedule

appropriate sleep cycles among themselves to save power, while maintaining

network connectivity.

• Energy-efficient medium access schemes [14] [15] [16] for low-power radio com-

munications.

• Coverage problems [17] [18] in which researchers study the sensing coverage

and the minimal/maximal exposure paths in breach-detection applications.

• Efficient and modular operating system support [19] [20] [21] [22] for low-end,

low-power sensor platforms such as the popular Berkeley motes series [23] [24]

[6] [25].

• Distributed time synchronization [26] [27] and lightweight geographic localiza-

tion [28] [29] mechanisms without GPS support.


• Data fusion and signal processing techniques [30] [31] for reliable event detec-

tion/sensing, etc.

Sensor networks are embedded in the real world and interact closely with the

changing physical environment in which they reside. To the best of our knowledge,

there has been little or no work on understanding to what degree sensor networks

and their associated data collection and network management algorithms need to

be resilient to dynamic changes in network conditions and in the environment in which

sensors are embedded. This is the broad question that this thesis addresses. We

argue that unless network algorithms are made resilient to potentially harsh and

fast changing operating conditions, the rollout of sensor network technology may be

jeopardized.

Furthermore, depending on the nature of the applications, sensor networks are

expected to operate under a wide variety of environmental conditions, and under

some scenarios, during moments of natural disasters. As a result, sensor networks

must be fault-tolerant and capable of adapting to the environment. Early sensor

network research suggests that in dense sensor networks, node or communication

failures in the network could be absorbed by the natural redundancy of large num-

bers of nodes or the large correlation of the signal in time and space [32]. We

conjecture in this dissertation that the benefit of such redundancy in the network,

while helpful, is limited and not sufficient to support resilient sensor network operations

and applications. We believe that there is a pressing need for reliable

communication protocols and congestion control in sensor networks. This disser-

tation studies resilience in sensor networks, specifically at the data transport level

where the problems of packet losses due to communication errors and congestion

can make these networks unstable, energy wasting, and potentially nonoperational.

Our conjecture is that existing sensor network design at the data transport level is

not robust enough to operate even under moderate conditions, and is certainly incapable

of delivering high fidelity data to applications under higher traffic loads, or

in environments where serious error conditions are experienced. This thesis investigates

aspects of this problem and explores a spectrum of general design principles

for resilient control and transport mechanisms that enable the robust operation of

sensor networks under a wide variety of conditions.

1.1.1 A Resilient Transport System for Sensor Networks

Sensor networks must deal with resources that are dynamically changing, including

energy, bandwidth, processing power, node density and connectivity. Furthermore,

they must also deal with the adverse effects from uncertain and dynamic physical

environments. Therefore, a resilient sensor network must operate autonomously,

changing its configuration as required and running algorithms that are optimized

for node survivability and energy usage.

We observe that there are three classes of data traffic patterns in sensor networks

associated with three different classes of applications. The first class represents periodic

traffic, which is associated with applications that generate periodic reports

on environmental conditions. Such applications include habitat monitoring

[33] and the structural/environmental monitoring of buildings or machines

[34], e.g., the periodic reporting of the temperature readings of a room or the acous-

tic signature of a machine. The next class represents discrete event traffic, which

is associated with event sensing applications that generate discrete event reports

triggered by an event of interest, e.g., enemy and target movements on the battle-

field. Finally, the third class represents impulse wave traffic, which is associated

with data impulse applications that typically monitor large-scale disastrous events

such as earthquakes, biochemical attacks, forest fires, etc. These applications can

generate large impulse waves of correlated data across a large area that can easily

overwhelm the sensor network if it is not designed to be robust to these conditions.

It is likely that future sensor networks will need to be designed to carry multiple

traffic classes simultaneously. This is very challenging for the existing state of the

art.

We define transport resilience in sensor networks as the ability to deliver a suf-

ficient amount of event data to meet an application’s fidelity (e.g., rate of events)

[35] needs (for the different traffic patterns discussed above) while minimizing the

energy consumption. We investigate two classes of transport resilience in this thesis:

1. The need to reliably deliver data with minimal energy expenditure under

various error conditions, some of which may be significantly harsh.

2. The need to maintain the fidelity of the signal to the applications [35] under

congested network conditions, which we show in this dissertation is a signifi-

cant problem in the emerging sensor networks.

In typical sensor network applications, data from the active sensors are delivered

through the network to a relatively small number of sink points (see Figure 1-1) that

are attached to the regular communication infrastructure (e.g., Internet). On the

other hand, users control/manage the network or provide actuation instructions [36]

to the network through control signals sent from the sinks. Hence, in a general sense,

there are two information flows in opposite directions in the network, as seen from

the sink’s point of view. In this dissertation, we consider the problems associated

with supporting a resilient transport in sensor networks from the same perspective,

i.e., sinks to sources and sources to sinks.

First, we consider the data that flows from sinks to sources for the purpose of

controlling (e.g., actuation) or managing (e.g., reprogramming) the network. For

example, new applications/software should be ready for rapid deployment in an ad

hoc fashion, supporting the re-tasking or reprogramming of sensors, or allowing the

network to adaptively reconfigure and repair itself in response to unpredictable run-

time dynamics. For example, a user can install a new data encoding scheme on each

sensor that supports a lower transmission rate but is less sensitive to the channel

quality after the network has been deployed in a high-interference environment.¹ As

a general comment, the information flow from sinks to sources is much more sensitive

to message loss than the flow from sources to sinks. In

this dissertation, we consider the issue of loss when the application is sensitive to

such loss as the reliable transport problem.

Second, we consider data that flows from sources to sinks for the purpose of ex-

tracting useful, reliable, and timely information from the deployed sensor network.

The information flow in this respect is generally more tolerant to packet loss be-

cause of the natural redundancy inherent in disseminated sensor data [32] [8] but

is prone to suffer significant fidelity degradation due to network congestion. In this

dissertation, we consider this as the congestion control problem.

Note that in general, both the reliable transport and congestion problems are

not limited to the specific downlink or uplink directions of data flows; although,

we observe that existing sensor network applications often exhibit this directional

property of the information flows in the network. In what follows, we discuss the

challenges in supporting reliable transport and congestion control in sensor networks.

¹ Note that the RFM [37] radio used by the Rene2 Berkeley mote [23] sensor platform supports this kind of operation.

1.1.1.1 Reliable Transport Challenges

The ability to provide reliable data delivery is the first step toward the creation

of resilient sensor networks. Such a capability benefits a spectrum of new sensor

network services. Examples include the assured delivery of important target information

and the ability to modify the software algorithms running in sensors (i.e., over-the-air

dynamic reconfiguration or re-tasking of sensor networks). There are several

sensor platforms that have the capability to support the re-tasking of individual

sensors but not a network of sensors. For example, the Berkeley motes [19] [23]

are capable of receiving code segments from the network and assembling them into

a completely new execution image in EEPROM secondary store before re-tasking

a sensor. Currently, however, there is no transport protocol capable of reliably

delivering code segments to groups of sensors. A number of groups have recently

started projects to build reliability into sensor networks and develop on-demand

over-the-air reprogramming infrastructure for re-tasking. For example, the SOS [22]

[38] project is developing an operating system that supports the dynamic loading of

software modules to create a system supporting dynamic addition, modification, and

removal of network services on a sensor node. Many other applications of reliable

transport are emerging, e.g., reliable actuation, reliable signaling, and the duplication

of databases [39] stored in remote sensors.

There are a number of challenges associated with the development of a reliable

transport protocol for sensor networks. Early experimental testbed results [40] [41]

reveal that the link quality of the wireless channel in sensor networks is highly vari-

able and unpredictable. For example, the packet reception rate can range from

90% to 50%. In the case of a re-tasking application, how can a transport support

such an application when possibly hundreds or thousands of nodes need to be re-

programmed in a controlled, reliable, robust, timely and scalable manner? Such a

reliable transport protocol must be lightweight and energy-efficient to be realized

on low-end sensor nodes, and capable of isolating applications from the unreliable

nature of wireless sensor networks. We address this problem in this dissertation

through the development of a reliable transport protocol and use a reprogramming

application capable of re-tasking sensor networks as an application driver.

1.1.1.2 Congestion Control Challenges

While developing the reliable transport protocol discussed in this dissertation, we

analyzed the loss patterns from a sensor network testbed and observed that signifi-

cant loss is also due to congestion for a wide range of workloads, including light and

moderate traffic. This observation leads us to the study of the congestion problem

in sensor networks.

Sensor networks are likely to suffer from various degrees of congestion depending

on the different classes of applications they run, which in turn generate different

traffic patterns (as discussed in Section 1.1.1). Even with a simple application

that generates periodic workloads, congestion can occur in wireless sensor networks

because of the poor and time-varying channel quality [41]. In this case, the time-

varying channel suffers from occasional deep fades for an extended period of time.

During these periods, the queue of the sending node grows quickly (when link-layer

ARQ is in use). This results in the eventual overflow of the sensor’s buffer and

potentially significant packet drops. In sensor networks that support applications

that generate discrete events and impulse waves, the congestion problem is much

more severe. Event-driven sensor networks typically operate under light load, but

can suddenly become active in response to a detected event. The transport of

event impulses is likely to trigger varying degrees of congestion, potentially leading

to the congestion collapse of a sensor network. Although a sensor network may

spend only a small fraction of its time dealing with data impulses, it is during these

impulse periods that the information it carries is of greatest importance, and it

is at exactly this time that the information in transit is more likely to be lost; this

is what makes the congestion problem in sensor networks so severe. While some

researchers have broadly discussed congestion issues in sensor networks [35], there is

no comprehensive approach to solving this important problem.

1.1.2 Problem Statement

In traditional networks (e.g., IP networks), congestion control and transport relia-

bility are often coupled into a single protocol solution (e.g., TCP). This approach,

however, is not necessarily correct in the context of sensor networks. In sensor net-

works, the energy expenditure is more important than occasional data loss because

of the natural redundancy inherent in disseminated sensor data. Depending on the

application and the direction of the information flow as discussed in Section 1.1.1,

not all data packets require strict reliability. For example, applications that monitor

the temperature of a certain geographic region can tolerate occasional packet loss.

Therefore, the complex protocol machinery that would ensure the reliable delivery

of data is not always needed. Due to this application-specific nature of sensor net-

works, we argue that there is a need for the separation of transport reliability and

congestion control in sensor networks.

This dissertation investigates the tradeoffs and performance limits of applying

existing techniques to solving the reliable transport and congestion problems in

sensor networks. Based on this analysis we propose new approaches that are specif-

ically designed to best fit the unique constraints of sensor networks and emerging

application needs. The resulting transport and control algorithms proposed in this

dissertation provide a general set of mechanisms that can be plugged into applica-

tions or the appropriate layers of the protocol stack in support of energy efficient

reliable transport and congestion control.

1.1.3 Technical Barriers

Sensor networks have unique system characteristics and constraints [5] [6] [7] that are

significantly different from traditional networks. For example, the communication

channel condition is highly variable and unpredictable due to (i) a low-power radio

that is susceptible to channel fading; and (ii) the highly dynamic and harsh physical

environment. It may be reasonable to expect that low-power radio performance

can be improved significantly in the foreseeable future, but it is unlikely that the

improvement in the radio design itself can counter all existing adverse effects from

the environment, for example during disasters such as floods and fires. In

what follows, we discuss the important technical barriers to realizing energy-efficient

reliable delivery and congestion avoidance in sensor networks.

1.1.3.1 Reliable Transport

Reliable data delivery in IP networks relies on end-to-end error recovery mechanisms

(e.g., TCP) in which only the final destination node is responsible for detecting loss

and requesting retransmissions. The end-to-end paradigm [42] has had a large im-

pact enabling the Internet and its success. However, we argue that the end-to-end

paradigm is not appropriate for the design of networking protocols in sensor net-

works, which are chiefly considered to be data-centric [11] [32] instead of user or

host-centric. In the case of reliable data delivery, the biggest problem with end-to-

end recovery has to do with the physical characteristics of the transport medium.

Sensor networks usually operate in harsh radio environments, and rely on multihop

forwarding techniques to exchange messages. Errors accumulate exponentially over

multiple hops; therefore, packet loss and reordering are more likely. Recent studies [40]

[43] [41] show that in sensor networks that use low-power radios without frequency

diversity, packet delivery performance is both highly variable and exhibits spatial

and temporal dependency. For example, the packet reception rate can range from

90% to 50%. Under such error-prone channel conditions, it is almost impossible to

deliver a single event using an end-to-end approach for larger networks. This obser-

vation suggests that end-to-end error recovery is not a good candidate for reliable

delivery in sensor networks. In Chapter 2, we propose an alternative approach that

is characterized by hop-by-hop error recovery and rate control.
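
The exponential accumulation of error described above can be made concrete with a short calculation. The Python sketch below compares the probability that a packet crosses a multihop path with no local recovery against the probability when each hop may retry locally. It assumes independent, identical per-hop losses and a hypothetical retry limit; the function names are illustrative only and are not the protocol proposed in Chapter 2.

```python
def end_to_end_delivery(p_hop: float, hops: int) -> float:
    """Probability that a packet survives every hop with no local recovery."""
    return p_hop ** hops


def hop_by_hop_delivery(p_hop: float, hops: int, max_tries: int) -> float:
    """Probability of delivery when each hop may try up to max_tries times.

    Assumption: tries on the same hop fail independently of each other.
    """
    per_hop = 1.0 - (1.0 - p_hop) ** max_tries
    return per_hop ** hops


if __name__ == "__main__":
    # Per-hop reception rates taken from the 90% to 50% range quoted above.
    for p in (0.9, 0.5):
        for n in (1, 5, 10):
            print(p, n,
                  round(end_to_end_delivery(p, n), 4),
                  round(hop_by_hop_delivery(p, n, max_tries=4), 4))
```

Under these assumptions, a 50% per-hop reception rate over 10 hops gives an unrecovered delivery probability of 0.5^10, which is below 0.1%, whereas allowing a few local tries per hop keeps the path usable.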

TCP uses a positive acknowledgment (ACK) approach for loss detection. It is

well-known that a positive ACK approach performs better in high error rate envi-

ronments than a negative acknowledgment (NAK) approach. However, the control

overhead of the positive ACK approach is very high even under error-free conditions

because of the requirement to acknowledge each individual packet. Each of these

ACKs consumes energy and is therefore costly. Energy conservation is a particularly

critical issue in sensor networks. In a NAK based system, a node pays a price in

terms of control overhead only when the channel condition is poor. In addition, an

ACK approach cannot support multicast or broadcast operations because of the

ACK implosion problem [44]. Many communication scenarios in sensor networks

involve group-based communications, i.e., one-to-many communications.

This is especially true for data that flows from sinks to sources for the

purpose of control or management of the network. As a result, a NAK approach

makes more sense to provide reliable delivery in sensor networks since packet deliv-

ery is free of overhead in error-free environments and group-based communications

can be supported.
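
The overhead argument above can be illustrated with a rough count of control packets. The sketch below assumes one ACK per received packet per receiver, one NAK per lost packet per receiver, no aggregation of NAKs, and no loss of the control packets themselves; these are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
def ack_control_packets(num_packets: int, receivers: int) -> int:
    """Control packets sent when every receiver acknowledges every packet."""
    return num_packets * receivers


def expected_nak_control_packets(num_packets: int, receivers: int,
                                 loss_rate: float) -> float:
    """Expected control packets when receivers only report losses.

    Assumption: each lost packet triggers exactly one NAK per receiver.
    """
    return num_packets * receivers * loss_rate


if __name__ == "__main__":
    # One sender disseminating 100 packets to a group of 10 receivers.
    for loss in (0.0, 0.1, 0.5):
        print(loss,
              ack_control_packets(100, 10),
              expected_nak_control_packets(100, 10, loss))
```

In this simple count the ACK cost is independent of channel quality and grows with the group size, while the NAK cost is zero on an error-free channel and grows only with the loss rate.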

Figure 1-1: The funneling effect. Sensors within the range of an event region/epicenter (enclosed by the dotted ellipse) generate data that travels along a propagation funnel (enclosed by the dotted line) toward the sink when an event occurs. Legend: physical sink, virtual sink, sensor, active sensor.

Another approach would be to layer a NAK transport upon a positive ACK link

layer in a unicast scenario. The feasibility of using such an approach relies on the

sensor’s radio design. Many tradeoffs exist: for example, some low-power radios

support low-overhead link-layer synchronous ACK [45] (e.g., the RFM radio used

in Mica [24]), others support built-in link-layer ACK for higher data rates up to

250 Kbps (e.g., the IEEE 802.15.4 radio used in Telos [25]), while with other radios

the support of ACKs is costly (e.g., the Chipcon radio [46] used in Mica2 [45]) in

terms of the energy and bandwidth consumption. Further complicating the issue,

these link-layer ACK approaches assume a symmetrical link environment, which is

not always present in today’s sensor networks [41]. As a result, there is a need to

study new transport approaches capable of supporting a wide range of channel error

conditions (including non-symmetrical link environments) that take energy savings,

scalability, and lightweight operations into account as key design goals.

1.1.3.2 Congestion Avoidance and Control

As shown in Figure 1-1, we observe that sensor networks exhibit a unique funneling

effect that significantly complicates the design of these networks. The funneling

effect results when events or periodic reports are generated and then move through

the sensor network on a hop-by-hop basis toward a relatively small number of phys-

ical sink points that are attached to the regular communication infrastructure (e.g.,

the Internet). The flow of events out of the network has similarities to the flow of

people from a large arena after sporting events complete. This leads to a number

of significant challenges including increased transit traffic intensity, congestion, and

packet loss at nodes closer to the sink.
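
A back-of-the-envelope model helps show why traffic intensity rises near the sink. The sketch below assumes, purely for illustration, that sensors are spread with uniform density in a disc around a single sink, that every sensor generates one unit of traffic, and that the nodes at hop distance h occupy a ring whose area is proportional to 2h - 1. None of these assumptions come from the testbed described in this dissertation, and the function name is hypothetical.

```python
def relative_load(hop: int, max_hops: int) -> float:
    """Average traffic carried per node at a given hop distance from the sink.

    The result is in units of one node's own generated traffic.
    Assumption: ring h holds nodes in proportion to 2h - 1, and every packet
    generated at hop h or beyond must be relayed through ring h.
    """
    if not 1 <= hop <= max_hops:
        raise ValueError("hop must lie between 1 and max_hops")
    nodes_in_ring = 2 * hop - 1
    nodes_at_or_beyond = max_hops ** 2 - (hop - 1) ** 2
    return nodes_at_or_beyond / nodes_in_ring


if __name__ == "__main__":
    # Load per node for a network that is 10 hops deep.
    for h in (1, 2, 5, 10):
        print(h, relative_load(h, 10))
```

In this idealized model a node at the edge of a 10-hop network carries only its own traffic, while a node one hop from the sink carries on average 100 times as much.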

In wired networks, congestion is signified by buffer drops and increased delays.

Therefore, monitoring the buffer size and transmission delays provides an accurate

measure for congestion detection. In sensor networks, since the transmission medium

is shared, traffic between other nodes in the neighborhood may cause interference.

As a result, radio channel quality can be seriously degraded in times of congestion,

resulting in an increase in packet error rates. Today, variants of CSMA MAC are

widely used in many sensor platforms [23]. In a CSMA-based network, because of

the contention on the medium, congestion causes an increase in packet collisions

in addition to an increase in packet error rates. These account for the majority of

the packet drops in times of congestion, as opposed to packet drops due to buffer

overflows in wired networks. As a result, buffer occupancy and transmission

delays alone can no longer provide an accurate and timely indication of congestion

in sensor networks. There is a need for new congestion detection techniques that

are efficient, accurate, and incur low cost in terms of energy, computation and

complexity.

TCP offers end-to-end flow control through its window-based control mechanism

and avoids congestion via the Additive Increase and Multiplicative Decrease (AIMD)

of the window size at the sending host. This mitigates congestion by aggressively

metering traffic being admitted into the network when congestion is detected. In

sensor networks, however, throttling transmission rates at the sources alone resolves

neither congestion nor its negative impact on the network. This is because

the major concerns during congestion are the degraded application data fidelity

measured at the sink, and the energy wasted due to the packet loss/drops in the

network. These occur as a result of packet collisions and degraded channel quality

due to interference. The main goal of congestion control in sensor networks is thus

to maintain data fidelity and reduce packet drops due to collision, in addition to

regulating the admitted traffic into the network at the sources.
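
For readers unfamiliar with AIMD, the following minimal sketch shows the window update rule referred to above. It is a simplification of TCP behavior (no slow start, timeouts, or fast recovery), and the parameter names and default values are assumptions chosen for illustration.

```python
def aimd_step(window: float, congested: bool, increase: float = 1.0,
              decrease: float = 0.5, min_window: float = 1.0) -> float:
    """Return the next window size under additive increase, multiplicative decrease."""
    if congested:
        # Multiplicative decrease: back off sharply when congestion is detected.
        return max(min_window, window * decrease)
    # Additive increase: probe gently for spare capacity.
    return window + increase


def aimd_trace(start: float, signals: list) -> list:
    """Apply aimd_step over a sequence of congestion signals (True = congested)."""
    trace = [start]
    for congested in signals:
        trace.append(aimd_step(trace[-1], congested))
    return trace


if __name__ == "__main__":
    print(aimd_trace(4.0, [False, False, True, False]))
```

Note that this rule acts only at the sending host, which is exactly the limitation discussed above: it regulates admitted traffic but does nothing about collisions among forwarding nodes inside the network.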

A number of distinct congestion scenarios are likely to arise in sensor networks.

First, densely deployed sensors generating impulse data events create persistent

hotspots (i.e., congested nodes or areas) proportional to the impulse rate at loca-

tions close to the sources (e.g., within one or two hops). In this scenario, localized,

fast time scale mechanisms capable of providing backpressure from the points of

congestion back to the sources may be potentially effective. Second, sparsely de-

ployed sensors generating low data rate events create transient hotspots potentially

anywhere in the sensor field but likely farther from the sources, toward the sink.

In this case, fast time scale resolution of localized hotspots using a combination

of localized backpressure (between nodes identified in a hotspot region) and rate

limiting techniques may be potentially more effective. Because of the transient na-

ture of congestion, source nodes may not be involved in the backpressure. Third,

sparsely deployed sensors generating high data-rate events create both transient and

persistent hotspots distributed throughout the sensor field. In this final scenario, a

combination of fast time scale actions to resolve localized transient hotspots, and

closed loop rate regulation of all sources that contribute toward creating persistent

congestion may be potentially effective.

1.2. Thesis Outline

To overcome the technical barriers to supporting a resilient transport system

in sensor networks discussed above, we propose to use a combination of simulation,

experimentation, and analytical modeling to best understand the problem

and solution space. We emphasize a methodology founded on an experimental sys-

tems research approach that builds a small experimental sensor network testbed in

the laboratory to best understand the problems discussed and our proposed algo-

rithms. We study resilience issues in this small testbed, and apply what is learned

toward the construction of larger and more scalable distributed systems. We adopt

a general design principle when studying reliable transport and congestion control

that makes minimum assumptions about the network and applications. Such an

approach is important because of the application-specific nature of sensor networks.

The outline of our study is as follows.

1.2.1 Pump Slowly Fetch Quickly (PSFQ)

Chapter 2 describes the development of Pump Slowly Fetch Quickly (PSFQ), a

lightweight, scalable and robust transport protocol that is customizable to meet

the needs of supporting reliable control and management of sensor networks, and

remotely programming/re-tasking sensor nodes over-the-air. PSFQ represents a sim-

ple approach with reduced requirements on the routing infrastructure (as opposed

to IP multicast routing requirements), and reduced signaling, thereby reducing the

communication cost for data reliability. Further, PSFQ is responsive to high error

rates, allowing successful operation even under highly error-prone conditions.

Several important contributions come out of this work. First, we propose and

justify hop-by-hop error recovery in which intermediate nodes also take responsi-

bility for loss detection and recovery, so that reliable data exchange is done on a

hop-by-hop basis rather than end-to-end. Second, we analyze a simplified model of

our NAK-based algorithm and determine a near-optimal ratio between the timers as-

sociated with the forwarding (pump) and retransmission (fetch) operations. Third,

PSFQ exhibits a unique multi-modal communication property that provides a grace-

ful tradeoff between the packet switching and store-and-forward paradigms, depend-

ing on the channel conditions encountered. This multi-modal transport behavior is

crucial to the performance of the reliable delivery service in sensor networks and is

responsive across a wide range of bit error rates, (i.e., low, moderate, and high bit

error rates).
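
The role of the ratio between the pump and fetch timers can be illustrated with a deliberately simplified calculation. The sketch below assumes, for illustration only and not as the model analyzed in Chapter 2, that a node that detects a gap can make one recovery attempt per fetch interval, that an attempt succeeds only if both the NAK and the retransmission get through, and that losses are independent. The names t_pump, t_fetch, and loss_rate are hypothetical.

```python
import math


def fetch_attempts(t_pump: float, t_fetch: float) -> int:
    """Number of whole fetch intervals that fit inside one pump interval."""
    if t_fetch <= 0:
        raise ValueError("t_fetch must be positive")
    return max(0, math.floor(t_pump / t_fetch))


def recovery_probability(t_pump: float, t_fetch: float,
                         loss_rate: float) -> float:
    """Probability a lost packet is recovered before the next pumped packet.

    Assumption: each attempt needs a NAK and a retransmission to both survive.
    """
    attempt_ok = (1.0 - loss_rate) ** 2
    attempts = fetch_attempts(t_pump, t_fetch)
    return 1.0 - (1.0 - attempt_ok) ** attempts


if __name__ == "__main__":
    # Timers in milliseconds; a larger pump/fetch ratio allows more attempts.
    for ratio in (1, 2, 5):
        print(ratio, round(recovery_probability(100, 100 / ratio, 0.3), 4))
```

Under these assumptions, fetching faster relative to the pump rate raises the chance that a gap is repaired locally before the next packet arrives, which is the intuition behind pumping slowly and fetching quickly.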

We present the design and implementation of PSFQ, and evaluate the proto-

col using the ns-2 simulator and an experimental wireless sensor testbed based on

Berkeley motes and the TinyOS operating system. We show that PSFQ can out-

perform existing related techniques and is highly responsive to the various error

conditions experienced in sensor networks.

1.2.2 CODA (COngestion Detection and Avoidance)

A resilient sensor network must be capable of balancing the offered load, while at-

tempting to maintain acceptable fidelity (e.g., rate of events) of the delivered signal

at the sink during periods of transient and more persistent congestion. In Chap-

ter 3, we present the design of an energy-efficient congestion control scheme for

sensor networks called CODA (COngestion Detection and Avoidance). We explore

and identify a new objective function for traffic control mechanisms in wireless sensor

networks, which attempts to maximize the operational lifetime of the network

while delivering acceptable data fidelity to the applications. CODA implements

three components to realize such an objective function: (1) a timely, accurate and

energy-efficient congestion detection scheme; (2) a hop-by-hop backpressure algo-

rithm; and (3) a sink to multi-source regulation scheme. To evaluate CODA in a

realistic environment, we study and analyze a number of congestion scenarios that

we believe will be prevalent in sensor networks. We define new performance metrics

suitable for sensor network transport that capture the impact of CODA on a sensing

application’s performance. Furthermore, we discuss the performance benefits and

practical engineering challenges of implementing CODA in an experimental sensor

network testbed based on Berkeley motes using CSMA. Both testbed and simu-

lation results indicate that CODA significantly improves the performance of data

dissemination applications such as Directed Diffusion [11] by mitigating hotspots,

and reducing the energy consumption and fidelity penalty of sensing applications.

1.2.3 Dual Radio Virtual Sinks

The combination of the funneling effect and low-power radio channel can signifi-

cantly limit the network’s ability to deliver high fidelity data from sources to sinks

(i.e., to applications). To overcome this capacity limitation, new technologies must

be studied and developed. We observe that CODA provides a conservative solution

to mitigating congestion in sensor networks and assumes that all nodes are equal

(with the exception of the sink) in trying to counter and react to the onset of conges-

tion. When congestion occurs and the channel becomes saturated, the application

fidelity, which can be viewed as the application’s quality of service measured at

the sink, can be significantly degraded. This is because CODA’s congestion control

policy at sources and forwarding nodes is to rate control the traffic during peri-

ods of persistent congestion. While CODA and other traditional congestion control

schemes are capable of avoiding congestion and costly packet loss and therefore en-

ergy waste, it is to the detriment of the maximum number of events that can be

delivered to the sink.

In Chapter 4, we study an alternative but complementary solution to CODA

that maintains the application’s fidelity during persistent overload conditions. A

number of observations inform our study and design; primarily, “small worlds” [47]

research has shown that a small fraction of shortcut nodes randomly distributed

in a network is enough to effectively reduce the network diameter resulting in a

fast distribution network. Inspired by this result, we propose a new approach to

mitigating congestion in sensor networks based on the concept of dual radio virtual

sinks. Our proposal is as follows. After randomly distributing a small number of

all-wireless dual radio virtual sinks throughout the sensor field, we propose to enable

these virtual sinks to operate as safety valves in the sensor field. Specifically, virtual

sinks selectively siphon off high load traffic in order to maintain the fidelity of the

application signal at the physical sink. Virtual sinks are equipped with a secondary

long-range radio interface, such as IEEE 802.11, in addition to their primary low

power mote radio. Virtual sinks are capable of dynamically forming a secondary ad

hoc radio network. Rather than rate controlling packets as is the case with CODA,

virtual sinks take the congested traffic off the low-powered low-bandwidth primary

sensor network and move it on to the higher-bandwidth secondary radio network,

transiting it to the final physical data sink.

Chapter 4 explores algorithms for virtual sink discovery, selection, traffic tran-

siting, and load balancing. We leverage the use of heterogeneous sensors and study

the use of Stargate [48] nodes to support an all-wireless virtual sink approach in our

sensor network testbed. We show that a small number of virtual sinks are sufficient

to significantly improve the data fidelity of the sensor networks while operating in

overload conditions.

1.3. Thesis Contributions

This dissertation provides several broad contributions summarized as follows:

1. There is a growing need to support reliable data communications in sensor

networks that are capable of supporting new applications, such as the assured

delivery of high priority events to sinks, the duplication of a database [39]

stored in a remote sensor on other sensors, the reliable control and manage-

ment of sensor networks, and remotely programming/re-tasking sensor nodes

over-the-air [19] [22]. This represents a significant challenge because there

is no prior work on reliable transports for sensor networks, and most of the

existing approaches in wired and mobile ad hoc (MANET) networks cannot

guide an efficient solution to this problem. Our work in PSFQ as described in

Chapter 2 has been widely recognized as the first contribution to the problem

of reliable delivery in sensor networks. Following the publication of PSFQ

[49], a number of follow-up studies have been conducted that continue to ad-

vance the development of a de facto standard for reliable transport in sensor

networks.

2. While some researchers have discussed congestion issues [35], there has been

no comprehensive approach to the problem. CODA is the first such general

algorithmic approach that includes a low-cost sampling scheme for congestion

detection, a backpressure algorithm, and sink to multi-source regulation, as

presented in Chapter 3. Hop-by-hop error recovery and flow control have been

proposed before in wired networks [50] [51] [52] but not for sensor networks.

The major contributions of CODA include a low cost congestion detection

technique, and a new objective function that governs the design of two energy-efficient

control mechanisms that are responsive to a wide set of congestion

scenarios and congestion time scales.

3. Dual radio wireless systems have been proposed in the literature [53] for cel-

lular and WLAN-based networks. However, there is limited understanding

of the use of dual radio nodes in sensor networks. The contribution of virtual

sinks and their supporting Siphon protocol, discussed in Chapter 4, allows the

network to maintain application fidelity even under overload conditions. We

believe it is essential that networks comprising a large number of sensors in-

corporate heterogeneity by means of special nodes (e.g., Stargate nodes) that

offer enhanced services to applications. These special nodes will likely offer

other services than the ones we study, such as additional storage and computa-

tional capability. As such, the Siphon protocol discussed in Chapter 4 is more

broadly applicable to new special node services. Therefore, this contribution

explores general design principles that exploit dual radio nodes in sensor net-

works. However, the specific contribution of virtual sinks is that they add a

new level of resilience to sensor networks.

Collectively, PSFQ, CODA, and virtual sinks considerably enhance the resilience

and performance of the transport system for wireless sensor networks.

Chapter 2

Pump-Slowly, Fetch-Quickly (PSFQ): A Reliable

Transport Protocol for Sensor Networks

2.1. Introduction

There is a considerable amount of research in the area of wireless sensor networks

ranging from real-time tracking to ubiquitous computing where users interact with

potentially large numbers of embedded devices. This chapter addresses the design of

system support for a new class of applications emerging in wireless sensor networks

that require reliable data delivery. One such application that is driving our research

is the reprogramming or “re-tasking” of groups of sensors over-the-air. This is one

new application in sensor networks that requires the underlying transport protocol

to support reliable data delivery. Today, sensor networks tend to be application

specific and are typically hard-wired to perform a specific task efficiently at low cost.

We believe that as the number of sensor network applications grows, there will be a

need to build more powerful general-purpose hardware and software environments

capable of reprogramming or re-tasking sensors to do a variety of tasks. These

general-purpose sensors would be capable of servicing new and evolving classes of

applications. Such systems are beginning to emerge. For example, the Berkeley

motes [23] [54] [24] are capable of receiving code segments from the network and

assembling them into a completely new execution image in EEPROM secondary

store before re-tasking a sensor.

Unlike traditional networks (e.g., IP networks), reliable data delivery is still an

open research question in the context of wireless sensor networks. To our knowledge

there has been little work on the design of reliable transport protocols for sensor

networks. This is expected because the vast majority of sensor network applica-

tions do not require reliable data delivery. For example, in applications such as

temperature monitoring or animal location tracking, the occasional loss of sensor

readings is tolerable, and therefore, the complex protocol machinery that would

ensure the reliable delivery of data is not needed. Directed Diffusion [8] is one of

a representative class of data dissemination mechanisms, specifically designed for a

general class of applications in sensor networks. Directed Diffusion provides robust

dissemination through the use of multi-path data forwarding, but the correct recep-

tion of all data messages is not assured. We observed that in the context of sensor

networks, data that flows from sources to sinks is generally tolerant of loss. On

the other hand, data that flows from sinks to sources for the purpose of control or

management (e.g., re-tasking sensors, actuation) is sensitive to message loss. For

example, disseminating a program image to sensor nodes is problematic. Loss of

a single message associated with a code segment or script would render the image

useless and the re-tasking operation a failure.

There are a number of challenges associated with the development of a reliable

transport protocol for sensor networks. For example, in the case of a re-tasking

application there may be a need to reprogram certain groups of sensors (e.g., within

a disaster recovery area). This would require addressing groups of sensors, loading

new binaries into them, and then, switching over to the new re-tasked application in


a controlled manner. Another example of new reliable data requirements relates to

simply injecting scripts into sensors to customize them rather than sending complete,

and potentially bandwidth demanding, code segments. Such re-tasking becomes in-

creasingly challenging as the number of sensor nodes in the network grows. How can

a transport protocol offer suitable support for such a re-tasking application where

possibly hundreds or thousands of nodes need to be reprogrammed in a controlled,

reliable, robust and scalable manner? Such a reliable transport protocol must be

lightweight and energy-efficient to be realized on low-end sensor nodes, such as the

Berkeley mote series of sensors, and capable of isolating applications from the unreli-

able nature of wireless sensor networks in an efficient and robust manner. The error

rates experienced by these wireless networks can vary widely, and therefore, any

reliable transport protocol must be capable of delivering reliable data to potentially

large groups of sensors under such conditions.

In this chapter, we propose PSFQ (Pump Slowly, Fetch Quickly), a new reli-

able transport protocol for wireless sensor networks. Due to the application-specific

nature of sensor networks, it is hard to generalize a specific scheme that can be

optimized for every application. Rather, the focus of this chapter is the design

and evaluation of a new transport system that is simple, robust, scalable, and

customizable to different applications’ needs. PSFQ represents a simple approach that

places reduced requirements on the routing infrastructure (as opposed to IP multicast

routing requirements), uses reduced signaling, thereby lowering the communication cost

of data reliability, and responds to high error rates, allowing successful

operation even under highly error-prone conditions.

This chapter represents an extended version of the work that first appeared in

[49] and is organized as follows. Section 2.2. presents the PSFQ model and discusses

important design choices. Section 2.3. details the design of the PSFQ pump, fetch


and report mechanisms. Section 2.4. presents an evaluation of the protocol and

comparison to existing related techniques such as Scalable Reliable Multicast (SRM)

[55] using the ns-2 simulator. We show that PSFQ can outperform an idealized SRM

scheme and is highly responsive to the various error conditions experienced in sensor

networks. Section 2.5. discusses experimental results from the implementation of

PSFQ in a wireless sensor testbed based on Berkeley motes. Section 2.6. discusses

related work, and finally, we present some concluding remarks in Section 2.7.

2.2. Protocol Design Space

The key idea that underpins the design of PSFQ is to distribute data from a source

node by pacing data at a relatively slow speed (“pump slowly”), but allowing nodes

that experience data loss to fetch (i.e., recover) any missing segments from immediate

neighbors very aggressively (local recovery, “fetch quickly”). A lost message

is detected when a node receives a higher sequence number than expected, which

triggers the fetch operation; PSFQ is thus based on an energy-efficient negative

acknowledgment system. The motivation behind our simple model is to

achieve loose delay bounds while reducing the loss recovery cost by using localized

recovery of data among immediate neighbors.

2.2.1 Hop-by-Hop Error Recovery

To achieve these goals we have taken a different approach in comparison to tradi-

tional end-to-end error recovery mechanisms in which only the final destination node

is responsible for detecting loss and requesting retransmission. The biggest problem

with end-to-end recovery has to do with the physical characteristics of the

transport medium. Sensor networks usually operate in harsh radio environments, and

rely on multihop forwarding techniques to exchange messages. Error accumulates


Figure 2-1: Probability of successful delivery of a message using an end-to-end model across a multi-hop network. (Plot: success rate versus network size in number of hops, with one curve each for error rates of 1%, 5%, 10%, 20%, 30%, 40%, and 50%.)

exponentially over multiple hops; therefore, packet loss and reordering are more likely.

To illustrate this simply, assume that the packet error rate of a wireless channel

is p; then the probability of exchanging a message successfully across n hops decreases

quickly to (1 − p)^n. Figure 2-1 illustrates this problem numerically. It plots

the success rate as a function of the network size in number of hops, and shows

that for larger networks it is almost impossible to deliver a single message using

an end-to-end approach in a lossy link environment when the error rate is larger

than 10%. In [56] [41] the authors show that it is not unusual to experience error

rates of 10% or above in dense wireless sensor networks. We believe that the error

rate could be even higher in many cases, such as, military applications, industrial

process monitoring, and disaster recovery activities. This observation suggests that

end-to-end error recovery is not a good candidate for reliable transport in wireless

sensor networks.
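The end-to-end argument above can be checked numerically. The following is a minimal sketch, assuming independent losses with the same error rate p on every hop; the function name `end_to_end_success` is hypothetical and is not part of PSFQ.

```python
def end_to_end_success(p: float, hops: int) -> float:
    """Probability that one message survives `hops` independent links,
    each with packet error rate p, i.e. (1 - p) ** hops."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return (1.0 - p) ** hops


if __name__ == "__main__":
    # Same error rates as the curves in Figure 2-1.
    for p in (0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
        row = ", ".join(f"{end_to_end_success(p, n):.3f}" for n in (2, 6, 10, 14))
        print(f"p={p:.2f}: success at 2/6/10/14 hops = {row}")
```

Under this assumption, a 10% per-hop error rate already leaves roughly a one-in-three chance of crossing 10 hops without loss, which is the effect the figure illustrates.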


Figure 2-2: Probability of successful delivery of a message over one hop when the mechanism allows multiple retransmissions before the next packet arrival. (Plot: success rate versus packet loss rate on a log scale, with one curve each for no retransmission and for 1, 3, 5, and 7 allowed retransmissions.)

We propose hop-by-hop error recovery in which intermediate nodes also take

responsibility for loss detection and recovery, so reliable data exchange is done on

a hop-by-hop basis rather than end-to-end. This approach essentially segments

multihop forwarding operations into a series of single hop transmission processes

that eliminate error accumulation. The hop-by-hop approach thus scales better

and is more tolerant of errors while reducing the likelihood of packet reordering in

comparison to end-to-end approaches.

2.2.2 Fetch/Pump Relationship

For a negative acknowledgment system, the data delivery latency would be depen-

dent on the expected number of retransmissions for successful delivery. To reduce

the latency, it is essential to maximize the probability of successful delivery of a

packet within a “controllable time frame”. An intuitive approach to doing this


would be to enable the possible multiple retransmissions of packet i (therefore in-

creasing the chances of successful delivery) before the next packet i + 1 arrives; in

other words, clear the queue at a receiver (e.g., an intermediate sensor) before new

packets arrive in order to keep the queue length small and hence reduce the delay.

However, it is non-trivial to determine the optimal number of retransmissions that

trades off the success rate (i.e., the probability of successful delivery of a single message

within a time frame) against wasting too much energy on retransmissions. In order

to investigate and justify this design decision, we analyze a simple model, which

approximates this mechanism. Assuming that the packet loss rate p stays constant

during the controllable time frame, it can be shown that in a negative acknowledg-

ment system, the probability of a successful delivery of a packet between two nodes

that allows k retransmissions can be expressed recursively as:

$$(1 - p) + p \times \Omega(k), \qquad k \ge 1 \tag{2.1}$$

where,

$$\Omega(k) = \Phi(1) + \Phi(2) + \dots + \Phi(k)$$

$$\Phi(k) = (1 - p)^2 \times \left[\,1 - p - \Phi(1) - \dots - \Phi(k - 1)\,\right], \qquad \Phi(0) = 0 \tag{2.2}$$

Ω(k) is the probability of a successful recovery of a missing segment within k

retransmissions, and Φ(k) is the probability of a successful recovery of the missing segment

at the kth retransmission. The above expressions are evaluated numerically against the

packet loss rate p, as shown in Figure 2-2, demonstrating the impact of increasing

the number of retransmissions up to k equal to 7. We can see that substantial

improvements in the success rate can be gained in the region where the channel

error rate is between 0 and 60%. However, the additional benefit of allowing more


retransmissions diminishes quickly and becomes negligible when k is larger than 5.

This simple analysis implies that the tolerable ratio between the timers associated

with the pump and fetch operations is within the range between 3 and 5.
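The diminishing return from additional retransmissions can be explored with a short script. This is only a sketch that transcribes expressions (2.1) and (2.2) as they are printed in this section; the function names `recovery_terms` and `success_probability` are hypothetical, and the script is not the code used to produce Figure 2-2.

```python
from typing import List


def recovery_terms(p: float, k: int) -> List[float]:
    """Return [Phi(1), ..., Phi(k)] using the recursion of (2.2) as printed:
    Phi(j) = (1 - p)^2 * [1 - p - Phi(1) - ... - Phi(j - 1)], with Phi(0) = 0."""
    phis: List[float] = []
    for _ in range(k):
        phis.append((1.0 - p) ** 2 * (1.0 - p - sum(phis)))
    return phis


def success_probability(p: float, k: int) -> float:
    """Expression (2.1): (1 - p) + p * Omega(k), where Omega(k) is the sum
    of Phi(1) through Phi(k). With k = 0 this reduces to 1 - p."""
    omega = sum(recovery_terms(p, k))
    return (1.0 - p) + p * omega


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.6, 0.9):
        values = ", ".join(f"k={k}: {success_probability(p, k):.4f}" for k in (0, 1, 3, 5, 7))
        print(f"p={p:.1f} -> {values}")
```

Printing the values for k = 3, 5, and 7 side by side makes it easy to see how little each extra retransmission adds once k grows beyond a few attempts.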

2.2.3 Multi-modal Operations

In a negative acknowledgment system, a local loss event could propagate to down-

stream nodes if higher sequence number packets are continuously forwarded. The

propagation of a loss event could cause a serious waste of energy because a loss

event will trigger error recovery operations that attempt to fetch the missing packet

quickly from immediate neighbors, whereas none of the downstream nodes’ neighbors

would have the missing packet. Therefore, the loss cannot be recovered and the

control messages associated with the fetch operation are wasted. As a result, it is

necessary to make sure that intermediate nodes only relay messages with continuous

sequence numbers.

The use of a data cache is required to buffer messages to ensure in-sequence

data forwarding and the complete recovery for any fetch operations from down-

stream nodes. Note that the cache size effect is not investigated here but for our

reference application (i.e., re-tasking) the cache keeps all code segments. This pump

mechanism not only prevents propagation of loss events and the triggering of un-

necessary fetch operations from downstream nodes, but it also greatly contributes

toward the error tolerance of the protocol against channel conditions. By localizing

loss events and not relaying any higher sequence number messages until recovery

has taken place, this mechanism operates in a similar fashion to a store-and-forward

approach where an intermediate node relays a file only after the node has received

the complete file. The store-and-forward approach is effective in highly error-prone

environments because it essentially segments the multi-hop forwarding operations


into a series of single hop transmission processes.

PSFQ benefits from the following tradeoff between store-and-forward and packet

switching. The pump operation operates in a multihop packet-switching mode dur-

ing periods of low errors when lost packets can be recovered quickly, and behaves

more like store-and-forward communications when the channel is highly error-prone.

Therefore, PSFQ exhibits a novel multi-modal communications property

that provides a graceful tradeoff between the packet switching and store-and-forward

paradigms, depending on the channel conditions encountered.

2.3. Protocol Description

PSFQ comprises three protocol functions: message relaying (pump operation), relay-

initiated error recovery (fetch operation), and selective status reporting (report op-

eration). A user injects messages into the network and intermediate nodes buffer

and relay messages with the proper schedule to achieve loose delay bounds. A relay

node maintains a data cache and uses cached information to detect data loss, initi-

ating error recovery operations if necessary. It is important for the user to obtain

statistics about the dissemination status in the network as a basis for subsequent

decision-making, such as the correct time to switch over to the new task in the

case of re-tasking/programming sensors over-the-air. Therefore, it is necessary to

incorporate a feedback and reporting mechanism into PSFQ that is flexible (i.e.,

adaptive to the environment) and scalable (i.e., minimizes the overhead).

In what follows, we describe the main PSFQ operations (viz. pump, fetch and

report) with specific reference to a re-tasking application – one in which a user

needs to re-task a set of sensor nodes by distributing control scripts or binary code

segments into the targeted sensor nodes.


2.3.1 Pump Operation

Recall that PSFQ is not a routing solution but a transport scheme. In a case where

a specific node needs to be addressed directly, instead of a whole group of sensors,

which is the norm, then PSFQ can operate on top of existing routing schemes (e.g.,

DSDV [57]) to support reliable data transport¹. A user node can use TTL-based

as well as group address filtering [19] methods to control the scope of its re-tasking

operation. Note, however, that this method does not provide accurate scope control

because in many cases the intended receivers cannot be neatly defined by a limit of

TTL. To enable local loss recovery and in-sequence data delivery, a data cache is

created and maintained at intermediate nodes. The pump operation is important

in controlling the timely dissemination of code segments to all target nodes, and

providing basic flow control so that the re-tasking operation does not overwhelm

the regular operations of the sensor network. This requires proper scheduling for

data forwarding. We adopt a simple scheduling scheme, which uses two timers Tmin

and Tmax for scheduling purposes.

2.3.1.1 Pump Timers

A user node broadcasts a packet to its neighbors every Tmin until all the data

fragments have been sent out. Neighbors that receive this packet will check against

their local data cache discarding any duplicates. If this is a new message PSFQ

will buffer the packet and decrease the TTL by 1. If the TTL value is not zero

and there is no gap in the sequence number, then PSFQ sets a schedule to forward

the message. The packet will be delayed for a random period between Tmin and

¹ To support reliable transport for any-to-any communication scenarios, PSFQ is layered upon a routing scheme and uses the unicast address of the destination node instead of a broadcast address in data packets. In order to support hop-by-hop error recovery, a “snoop” component is needed to copy packets from the routing agent to the PSFQ agent. In this case, only nodes en-route to the destination node, as determined by the routing algorithm, participate in the PSFQ operations.


Tmax and then relayed to its neighbors that are one or more hops away from the

source. In this specific reference case PSFQ simply rebroadcasts the packet. A

packet propagates outward from the source node up to TTL hops away in this

mode. The random delay before forwarding a message is necessary to avoid collisions

because RTS/CTS dialogues are inappropriate in broadcasting operations when the

timing of rebroadcasts among interfering nodes can be highly correlated.
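The relay behavior described above (duplicate check, TTL decrement, gap check, randomized forwarding delay) can be sketched as follows. This is an illustrative model only: the names `Packet`, `PumpRelay`, and `on_data` are hypothetical, sequence numbers are assumed to start at 1, and no radio or timer machinery is modeled.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Packet:
    seq: int          # sequence number of the file fragment
    ttl: int          # remaining hop budget
    payload: bytes = b""


@dataclass
class PumpRelay:
    t_min: float      # Tmin, lower bound of the forwarding delay
    t_max: float      # Tmax, upper bound of the forwarding delay
    cache: Dict[int, Packet] = field(default_factory=dict)

    def has_gap(self, seq: int) -> bool:
        """True if any fragment below `seq` is missing from the cache
        (assumes sequence numbers start at 1)."""
        return any(s not in self.cache for s in range(1, seq))

    def on_data(self, pkt: Packet, now: float, rng=random) -> Optional[float]:
        """Handle a received data packet. Returns the time at which the
        packet should be rebroadcast, or None if it is not forwarded now."""
        if pkt.seq in self.cache:
            return None                      # duplicate: discard
        ttl = pkt.ttl - 1                    # buffer with decremented TTL
        self.cache[pkt.seq] = Packet(pkt.seq, ttl, pkt.payload)
        if ttl <= 0 or self.has_gap(pkt.seq):
            return None                      # out of scope, or wait for recovery
        return now + rng.uniform(self.t_min, self.t_max)
```

A packet that arrives out of sequence is still cached here, so that it can be forwarded once the gap has been repaired; that later forwarding step is left out of the sketch.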

Several considerations govern the choice of Tmin. First, there is a need to provide a time-buffer

for local packet recovery. One of the main motivations behind the PSFQ paradigm

is to recover lost packets quickly among immediate neighboring nodes within a

controllable time frame. Tmin serves such a purpose in the sense that a node has an

opportunity to recover any missing segment before the next segment comes from its

upstream neighbors, since a node must wait at least Tmin before forwarding a packet

as part of the pump operation. Next, there is a need to reduce redundant broadcasts.

In a densely deployed network it is not unusual to have multiple immediate neighbors

within radio transmission range. In [58], the authors show that a rebroadcast system

can provide only 0-61% additional coverage over what was already covered by the

previous transmissions. Furthermore, it is shown that if a message has been heard

more than 4 times the additional coverage is below 0.05%. Tmin associated with

the pump operation provides an opportunity for a node to hear the same message

from other rebroadcasting nodes before it would actually have started to transmit

the message. A counter is used to keep track of the number of times the same

broadcast message is heard. If the counter reaches 4 before the scheduled rebroadcast

of a message then the transmission is cancelled and the node will not relay the

specific message because the expected benefit (additional coverage) is very limited

in comparison to the cost of transmission. Tmax can be used to provide a loose

statistical delay bound for the last hop to successfully receive the last segment of


a complete file (e.g., a program image or script). Assuming that any missing data

is recovered within one Tmax interval using the aggressive fetch operation described

in the next section, then the relationship between delay bound D(n) and Tmax is as

follows: D(n) = Tmax × n × (Number of hops), where n is the number of fragments

of a file.
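As a quick worked example of the delay bound and of the rebroadcast suppression rule, consider the sketch below. The function names and the sample values (Tmax of 0.5 s, 50 fragments, 5 hops) are assumptions chosen only for illustration.

```python
def delay_bound(t_max: float, fragments: int, hops: int) -> float:
    """Loose statistical delay bound D(n) = Tmax * n * (number of hops)."""
    return t_max * fragments * hops


def should_cancel_rebroadcast(times_heard: int, threshold: int = 4) -> bool:
    """Cancel a scheduled rebroadcast once the same message has been heard
    `threshold` times before the scheduled transmission time."""
    return times_heard >= threshold


if __name__ == "__main__":
    # Hypothetical numbers: Tmax = 0.5 s, 50 fragments, 5 hops.
    print(f"D(n) = {delay_bound(0.5, 50, 5):.1f} s")
    print("cancel after 4 copies heard:", should_cancel_rebroadcast(4))
```

With these assumed values the bound evaluates to 125 seconds, which shows how directly Tmax scales the worst-case completion time of a transfer.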

2.3.2 Fetch Operation

A node goes into the PSFQ fetch mode once a sequence number gap in a file’s

fragments is detected. A fetch operation is the proactive act of requesting a retrans-

mission from neighboring nodes once loss is detected at a receiving node. PSFQ

uses the concept of loss aggregation whenever loss is detected; that is, it attempts

to batch up all message losses in a single fetch operation whenever possible.

2.3.2.1 Loss Aggregation

Data loss is often correlated in time because of fading conditions and other chan-

nel impairments. As a result loss usually occurs in batches (bursty loss). PSFQ

aggregates loss such that the fetch operation deals with a “window” of lost packets

instead of a single packet loss. In a dense network where a node usually has more

than one neighbor it is possible that each of its neighbors only obtains or retains

part of the missing segments in the loss window. PSFQ allows different segments

of the loss window to be recovered from different neighbors. In order to reduce

redundant retransmissions of the same segment each neighbor waits for a random

time before transmitting segments. Other nodes that have the data and scheduled

retransmissions will cancel their timers if they hear the same repair from a neigh-

boring node. In poor radio environments successive loss could occur including loss

of retransmissions and fetch control messages. Therefore, it is not unusual to have


multiple gaps in the sequence number of messages received by a node after several

such failures. Aggregating multiple loss windows in the fetch operation increases

the likelihood of successful recovery in the sense that as long as one fetch control

message is heard by one neighbor then all the missing segments could be resent by

this neighbor.
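Loss aggregation amounts to turning the set of sequence numbers received so far into a list of missing ranges that a single NAK can carry. The sketch below shows one way to compute such loss windows; the function name `loss_windows`, the inclusive (first, last) representation, and the assumption that sequence numbers start at 1 are ours, not part of the PSFQ message format.

```python
from typing import Iterable, List, Tuple


def loss_windows(received: Iterable[int], first_seq: int = 1) -> List[Tuple[int, int]]:
    """Return the gaps in `received` as inclusive (first, last) ranges of
    missing sequence numbers, up to the highest sequence number seen."""
    windows: List[Tuple[int, int]] = []
    expected = first_seq
    for seq in sorted(set(received)):
        if seq > expected:
            windows.append((expected, seq - 1))   # everything in between is missing
        expected = max(expected, seq + 1)
    return windows


if __name__ == "__main__":
    # Segments 3-4 and 7-8 were lost; one NAK can list both windows.
    print(loss_windows([1, 2, 5, 6, 9]))
```

Losses after the highest received sequence number are invisible to this calculation, which is the case handled separately by the proactive fetch.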

2.3.2.2 Fetch Timer

In fetch mode, a node aggressively sends out NAK messages to its immediate neighbors

to request missing segments. If no reply is heard or only a partial set of missing

segments is recovered within a period Tr (Tr < Tmax; this timer defines the ratio

between pump and fetch, as discussed earlier), then the node will resend the NAK

every Tr interval (with slight randomization to avoid synchronization between neighbors)

until all the missing segments are recovered or the number of retries exceeds

a preset threshold, thereby ending the fetch operation. The first NAK is scheduled

to be sent out within a short delay that is randomly computed between 0 and ∆

(∆ ≪ Tr). The first NAK is cancelled (to keep the number of duplicates low) in the

case where a NAK for the same missing segments is overheard from another node

before the NAK is sent. Since ∆ is small the chance of this happening is relatively

small. In general, retransmissions in response to a NAK coming from other nodes

are not guaranteed to be overheard by the node that cancelled its first NAK. In

[58] the authors show that at most there is a 40% chance that the canceling node

receives the retransmitted data under such conditions. Note, however, that a node

that cancels its NAK will eventually resend a NAK within Tr if the missing seg-

ments are not recovered, therefore, such an approach is safe and beneficial given the

tradeoffs.

To avoid the message implosion problem NAK messages never propagate; that


is, neighbors do not relay NAK messages unless the number of times the same NAK

is heard exceeds a predefined threshold while the missing segments requested by the

NAK message are no longer retained in a node’s data cache. In this case, the NAK

is relayed once, which in effect broadens the NAK scope to one more hop to increase

the chances of recovery.

Each neighbor that receives a NAK message checks the loss window field. If

the missing segment is found in its data cache the neighboring node schedules a

reply event (sending the missing segment) at a random time between (Tr/4, Tr/2).

Neighbors will cancel this event whenever a reply to the same NAK for the same

segment is overheard. In the case where the loss window in a NAK message contains

more than one segment to be resent, or more than one loss window exists in the NAK

message, then neighboring nodes that are capable of recovering missing segments

will schedule their reply events such that packets are sent in-sequence at a speed

that is not faster than once every Tr/4.
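The timing rules of the fetch operation can be summarized in a few helper functions. This is a sketch under stated assumptions: the function names are hypothetical, the 10% jitter on the NAK retry interval is an arbitrary choice standing in for the "slight randomization" mentioned above, and repairs are paced at exactly Tr/4, the fastest rate the text allows.

```python
import random
from typing import Iterable, List, Tuple


def first_nak_delay(delta: float, rng=random) -> float:
    """Delay before the first NAK: uniform in [0, delta], with delta much smaller than Tr."""
    return rng.uniform(0.0, delta)


def nak_retry_interval(t_r: float, jitter: float = 0.1, rng=random) -> float:
    """Interval between NAK retries: Tr with a small random offset
    (the jitter fraction is an assumption, not specified in the text)."""
    return t_r * (1.0 + rng.uniform(-jitter, jitter))


def repair_schedule(t_r: float, segments: Iterable[int], rng=random) -> List[Tuple[int, float]]:
    """Schedule replies to a NAK as (sequence number, send offset) pairs.
    The first reply falls at a random time in (Tr/4, Tr/2); further
    segments follow in sequence, spaced Tr/4 apart."""
    start = rng.uniform(t_r / 4.0, t_r / 2.0)
    return [(seq, start + i * t_r / 4.0) for i, seq in enumerate(sorted(segments))]
```

A neighbor using such a schedule would still cancel any pending entry as soon as it overhears a repair for the same segment; that cancellation step is not modeled here.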

2.3.2.3 Proactive Fetch

As in many negative acknowledgment systems the fetch operation described above

is a reactive loss recovery scheme in the sense that a loss is detected only when a

packet with a higher sequence number is received. This could cause problems on

rare occasions; for example, if the last segment of a file is lost there is no way for

the receiving node to detect this loss since no packet with a higher sequence number

will be sent. In addition, if the file to be injected into the network is small (e.g., a

script instead of binary code), it is not unusual to lose all subsequent segments up

to the last segment following a bursty loss. In this case, the loss is also undetectable

and thus non-recoverable with such a reactive loss detection scheme. In order to

cope with these problems PSFQ supports a timer-based proactive fetch operation


such that a node can also enter the fetch mode proactively and send a NAK message

for the next segment or the remaining segments if the last segment has not been

received and no new packet is delivered after a period of time Tpro.

The proactive fetch mechanism is designed to automatically trigger the fetch

mode at the proper time. If the fetch mode is triggered too early, then the extra

control messaging might be wasted since upstream nodes may still be relaying mes-

sages or they may not have received the necessary segments. In contrast, if the fetch

mode is triggered too late, then the target node might waste too much time waiting

for the last segment of a file, significantly increasing the overall delivery latency of

a file transfer. The correct choice of Tpro must consider these two cases. In our

reference application, where each segment of a file needs to be kept in a data cache

or external storage for the re-tasking operation, the proactive fetch mechanism will

“Nak” for all the remaining segments up to the last segment if the last segment

has not been received and no new packet arrives after a period of time Tpro. Tpro

should be proportional to the difference between the last highest sequence number

(Slast) packet received and the largest sequence number (Smax) of the file (the dif-

ference is equal to the number of remaining segments associated with the file), i.e.,

Tpro = α × (Smax − Slast) × Tmax (α ≥ 1). α is a scaling factor to adjust the delay

in triggering the proactive fetch and should be set to 1 for most operational cases.

This definition of Tpro guarantees that a node will wait long enough until all

upstream nodes have received all segments before a node moves into the proactive

fetch mode. This enables a node to start the proactive fetch earlier when it is closer

to the end of a file, and wait longer when it is further from completion. Such an

approach adapts nicely to the quality of the radio environment. If the channel is in a

good condition, then it is unlikely to experience successive packet loss; therefore, the

reason for the reception of no new messages prior to the anticipated last segment is


most likely due to the loss of the last segment, hence, it is wise to start the proactive

fetch promptly. In contrast, a node is likely to suffer from successive packet loss

when the channel is error-prone; therefore, it makes sense to wait longer before

pumping more control messages into the channel. If the sensor network is known

to be deployed in a harsh radio environment then α should be set larger than 1 so

that a node waits longer before starting the proactive fetch operation.

In other applications where the data cache size is small and nodes can only keep

a portion of the segments that have been received, the proactive fetch mechanism

will “Nak” for the same number of segments (or fewer) that the data cache can

maintain. In this case, Tpro should be proportional to the size of the data cache. If

the data cache keeps n segments, then Tpro = α × n × Tmax (α ≥ 1). As discussed

previously, α should be set to 1 in low error environments and to a larger value in

harsher radio environments. This approach keeps the sequence number gap at any

node smaller than n, i.e., it makes sure that a node will fetch proactively after n

successive missing segments. Recall that a node waits at most Tmax before relaying

a message in the pump operation so that the probability of finding missing segments

in the data cache of upstream nodes is maximized.
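The two definitions of Tpro given in this section can be written down directly. The sketch below uses hypothetical function names and sample values; it only restates the formulas Tpro = α × (Smax − Slast) × Tmax and Tpro = α × n × Tmax.

```python
def t_pro_full_cache(t_max: float, s_max: int, s_last: int, alpha: float = 1.0) -> float:
    """Proactive fetch timer when every segment is cached:
    alpha * (Smax - Slast) * Tmax, with alpha >= 1."""
    if alpha < 1.0:
        raise ValueError("alpha must be at least 1")
    return alpha * (s_max - s_last) * t_max


def t_pro_limited_cache(t_max: float, cache_segments: int, alpha: float = 1.0) -> float:
    """Proactive fetch timer when the cache holds only n segments:
    alpha * n * Tmax, with alpha >= 1."""
    if alpha < 1.0:
        raise ValueError("alpha must be at least 1")
    return alpha * cache_segments * t_max


if __name__ == "__main__":
    # Hypothetical values: Tmax = 0.5 s, file of 100 segments, last received is 96.
    print(t_pro_full_cache(0.5, 100, 96))             # close to the end: short wait
    print(t_pro_full_cache(0.5, 100, 40))             # far from the end: long wait
    print(t_pro_limited_cache(0.5, 8, alpha=2.0))     # small cache, harsh channel
```

The first two calls illustrate the adaptive behavior described above: the closer a node is to the end of the file, the sooner it starts fetching proactively.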

The proactive fetch operation would ensure all intended receivers eventually

receive all of the data. But like any protocol that retries up to a maximum number of times before

giving up, PSFQ proactive fetch could stop after reaching a threshold, which is an

application-specific choice.

2.3.2.4 Signal Strength based Fetch

Recent studies [56] [41] [43] show that in sensor networks that use low-power radios

without frequency diversity, there exists very high variability in the packet delivery

performance that is both spatially and temporally dependent. Because intermittent


packet reception from nodes that are more than a single hop away (however weak the

signal is) can cause nodes to send unnecessary NAK messages and retransmissions,

PSFQ also takes into consideration the received signal strength of a packet during

the fetch and repair operations. A node maintains a table of parent nodes (i.e.,

nodes from which it receives messages) with their associated average signal quality

measurements. When a node detects a gap in the sequence number upon receiving

a packet it only responds and sends out a NAK if this packet comes from a parent

with the strongest average signal quality measurement. This effectively suppresses

unnecessary NAK messages triggered by the reception of packets that come from

nodes that are multiple hops away.

Similarly, when a node transmits a NAK message it includes the preferred parent

with the strongest average signal in the message. Nodes that receive this NAK will

determine if they are the preferred parent/neighbor. All non-preferred neighbors

double their response time delay in sending repair packets so that they have a greater

chance of hearing the repair packet from a better candidate node (i.e., preferred

parent/neighbor), allowing the node to cancel a repair whenever a response is heard

before sending. This approach prevents nodes sending redundant retransmissions

when they do not have a good chance of delivering these messages to a fetching

node.
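The parent table and the two rules built on it (NAK only on a gap seen from the strongest parent, and doubled repair delay for non-preferred neighbors) can be sketched as below. The class and function names are hypothetical, and a plain arithmetic mean of signal samples is assumed because the text does not specify how the average is maintained.

```python
from typing import Dict, List, Optional


class ParentTable:
    """Per-parent record of received signal quality samples."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = {}

    def record(self, parent: str, signal: float) -> None:
        """Store one signal quality measurement for a packet heard from `parent`."""
        self._samples.setdefault(parent, []).append(signal)

    def average(self, parent: str) -> float:
        samples = self._samples[parent]
        return sum(samples) / len(samples)

    def preferred_parent(self) -> Optional[str]:
        """Parent with the strongest average signal, or None if the table is empty."""
        if not self._samples:
            return None
        return max(self._samples, key=self.average)

    def should_nak(self, sender: str) -> bool:
        """React to a sequence gap only if the packet came from the preferred parent."""
        return sender == self.preferred_parent()


def repair_delay(base_delay: float, is_preferred: bool) -> float:
    """Non-preferred neighbors double their response delay before sending a repair."""
    return base_delay if is_preferred else 2.0 * base_delay
```

For example, after recording samples of -60 and -62 from node "A" and -85 from node "B" (signal values in dBm are an assumption), a gap detected on a packet from "B" would not trigger a NAK.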

2.3.3 Report Operation

PSFQ supports an optional report operation designed specifically to feed back data

delivery status to users in a simple and scalable manner. In wireless communication

it is well known that the communication cost of sending a long message is less

than sending the same amount of data using many shorter messages [59]. Given

the potentially large number of target nodes in a sensor network in addition to


potentially long paths (i.e., longer paths through multi-hops greatly increases the

delivery cost of data), the network would become overwhelmed if each node sent

feedback in the form of report messages. Therefore, there is a need to minimize the

number of messages used for feedback purposes.

A node enters the report mode when it receives a data message with the report bit

set in the message header. The user node sets the report bit in its injected message

whenever it needs to know the latest status of the surrounding nodes. To reduce the

number of report messages and to avoid report implosion, only the last hop nodes

(i.e., TTL = 1) respond immediately: each initiates a report message and sends it

to its parent node (the node from which the previous segment came) at a random time between

(0, ∆). Each node along the path toward the source node will piggyback its report

message by adding its own status information into the report, and then propagate

the aggregated report toward the user node. To avoid looping, each node will ignore

a report in which it finds its own ID. Nodes that are not last hop nodes

but are in report mode will wait for a period of time (Treport = Tmax × TTL + ∆)

sufficient to receive a report message from a last hop node, enabling them to piggyback

their state information. A node that has not received a report message after Treport in

the report mode will initiate its own report message and send it to its parent node.

If the network is very large then it is possible for a node to receive a report message

that has no space to append more state information. In this case, a node will create

a new report message and send it prior to relaying the previously received report

that had no space remaining to piggyback its state information. This ensures that

other nodes en-route toward the user node will use the newer report message rather

than creating new reports because they themselves received the original report with

no space for piggybacking additional status.
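The report rules above (wait timer, loop avoidance, piggybacking, and starting a fresh report when the received one is full) can be sketched as follows. The names `report_wait` and `handle_report`, the representation of a report as a list of (node ID, status) pairs, and the `capacity` parameter are all assumptions made for illustration; the actual report message layout is not specified here.

```python
from typing import List, Tuple

Report = List[Tuple[str, str]]   # (node ID, status) entries, oldest first


def report_wait(t_max: float, ttl: int, delta: float) -> float:
    """Treport = Tmax * TTL + delta: how long a non-last-hop node waits for
    a report from downstream before initiating its own."""
    return t_max * ttl + delta


def handle_report(report: Report, node_id: str, status: str, capacity: int) -> List[Report]:
    """Return the reports this node sends toward its parent, in sending order.

    - Own ID already present: ignore the report (loop avoidance).
    - Space left: piggyback own status and forward the aggregated report.
    - Report full: send a new report with own status first, then relay
      the full report unchanged.
    """
    if any(entry_id == node_id for entry_id, _ in report):
        return []
    if len(report) < capacity:
        return [report + [(node_id, status)]]
    return [[(node_id, status)], report]
```

Sending the new report ahead of the full one matches the ordering described above, so that upstream nodes see a report with free space before they see the full one.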


2.3.4 Single-Packet Message Delivery

There is a need to support the reliable delivery of single-shot atomic messages in

sensor networks, for example, in support of reliable control and management of

sensors. For messages that fit into a single packet (e.g., smaller than the network

MTU) delivery failure is undetectable using PSFQ’s NAK-based protocol without

the addition of explicit signaling. This is because PSFQ detects loss by observing

sequence number gaps or timeouts. To address this service need PSFQ makes use

of its reporting primitive to acquire application-specific feedback at the sink. PSFQ

sets the report bit at the sink in every single-packet message that requires reliable

delivery. Based on the feedback status the sink resends the packet until all receivers

confirm reception. This essentially turns PSFQ into a positive aggregated-ACK

protocol used in an on-demand manner by the sink for these special case messages.

The use of the report mechanism to support reliable data delivery of single-shot

atomic messages highlights the flexible use of PSFQ mechanisms to meet

application-specific needs.
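
The resend-until-confirmed behavior can be sketched as a small loop. This is an illustration under stated assumptions: `send_with_report` is a hypothetical callback standing in for "send the packet with the report bit set and collect the aggregated report", confirmations are assumed to accumulate across rounds, and the `max_rounds` cutoff is added here only so the loop terminates.

```python
import random


def deliver_single_packet(receivers, send_with_report, max_rounds=20):
    """Resend one packet (report bit set) until every receiver has confirmed.

    send_with_report() is a hypothetical callback that sends the packet and
    returns the set of node IDs listed in the aggregated report that comes back.
    Returns the number of rounds used, or None if max_rounds is exhausted.
    """
    pending = set(receivers)
    for round_no in range(1, max_rounds + 1):
        confirmed = send_with_report()
        pending -= confirmed  # assumption: a confirmation stays valid
        if not pending:
            return round_no
    return None


def lossy_channel(receivers, loss_rate, rng):
    """Toy channel: each receiver independently misses a round with loss_rate."""
    def send():
        return {r for r in receivers if rng.random() >= loss_rate}
    return send


if __name__ == "__main__":
    nodes = {1, 2, 3, 4}
    rounds = deliver_single_packet(nodes, lossy_channel(nodes, 0.3, random.Random(1)))
    print("rounds needed:", rounds)
```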

2.4. Performance Evaluation

We use packet-level simulation to study the performance of PSFQ in relation to

several evaluation metrics and discuss the benefits of some of our design choices.

Simulation results indicate that PSFQ is capable of delivering reliable data in wire-

less sensor networks even under highly error prone conditions.

2.4.1 Simulation Approach

We implement PSFQ as part of our reference re-tasking application using the ns-2

network simulator [60]. To highlight the different design choices made,

we compare the performance of PSFQ to an idealized version of Scalable Reliable

Multicast (SRM) [55], which has some similar properties to PSFQ, but is designed

to support reliable multicast services in IP networks. While there is a growing

body of work in multicast [61] [62] in mobile ad hoc networks and some initial work

on reliable multicast support [63] [64], we have chosen SRM as the best possible

candidate that is well understood in the literature. SRM supports reliable multicast

on top of IP and uses three control messages for reliable delivery, including session,

request and repair messaging. Because SRM is designed to operate on top of an IP

multicasting substrate, it assumes an environment where there is a single path from

a source to an individual receiver, and each node receives each multicast packet at

most once. SRM is also intended for a topology where routers are not active members

of the group and do not maintain state, except for that needed for multicast routing.

SRM represents a scheme that uses explicit signaling for reliable data delivery while

PSFQ is a more minimalist transport that can be unicast (on top of routing) or

broadcast, and does not require periodic signaling.

We compare PSFQ with the loss detection/recovery approach of SRM but strip

out the IP multicast substrate and replace it with an idealized omniscient multicast

routing scheme. We therefore only compare the reliable delivery portions of the

SRM and PSFQ protocols. Since PSFQ uses a simple broadcast mechanism as a

means for routing in our reference application, it makes sense to layer SRM over an

ideal omniscient multicast routing layer for simulation purposes. Using omniscient

multicast, the source transmits its data along the shortest-path multicast tree to all

intended receivers; the shortest-path computation and the tree construction

to every destination are free in terms of communication cost.
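
One way to picture the omniscient routing layer is a tree computed offline from global topology knowledge, so that no protocol messages are spent building it. The sketch below uses breadth-first search to obtain a hop-count shortest-path tree; the function name and the adjacency-dictionary representation are assumptions made for illustration and do not describe the ns-2 code.

```python
from collections import deque


def shortest_path_tree(adjacency, source):
    """Return {node: parent} for a hop-count shortest-path tree rooted at source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in parent:  # first visit is via a shortest path
                parent[neighbor] = node
                queue.append(neighbor)
    return parent


if __name__ == "__main__":
    # A 4-node chain such as a hallway: 0 - 1 - 2 - 3
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(shortest_path_tree(chain, 0))
```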

The major purpose of our comparison is to highlight the impact of different

design choices made. SRM represents a traditional receiver-based reliable transport

solution and is designed to be highly scalable for Internet applications. The SRM

service model has the closest resemblance to our reference application in sensor

networks. However, SRM is designed to operate in the wired Internet in which the

transport medium is highly reliable and does not suffer from the unique problems

found in wireless sensor networks, such as hidden terminals and interference. To

make a fair comparison, we idealize the lower layer for simulation purposes to minimize

the differences from the transport medium for which SRM is designed, and

focus solely on the reliable data delivery mechanism. We term this idealized SRM

scheme SRM-I.

The goal of our evaluation is also to justify the design choices of PSFQ. We choose

three metrics that underpin the major motivation behind the design of PSFQ:

• Average Delivery Ratio, which measures the ratio of the number of messages

a target node received to the number of messages a user node injects into the

network. This metric indicates the error tolerance of a scheme, that is, the point

at which the scheme fails to deliver 100% of the messages injected by a user node

within a certain time limit.

• Average Latency, which measures the average time elapsed between the trans-

mission of the first data packet from the user node until the reception of the

last packet by the last target node in the sensor network. This metric examines

the delay bound performance of a scheme.

• Average Delivery Overhead, which measures the total number of messages

sent per data message received by a target node. This metric examines the

communication cost to achieve reliable delivery over the network.

We study these metrics as a function of the channel error rate as well as the

network size.
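
The three metrics can be written down as simple ratios and differences. The sketch below is one possible reading of the definitions above for a single run; the function and parameter names are hypothetical, and averaging across nodes and runs is left to the caller.

```python
def delivery_ratio(messages_received, messages_injected):
    """Messages a target node received per message the user node injected."""
    return messages_received / messages_injected


def delivery_latency(first_tx_time, last_rx_times):
    """Time from the first data transmission until the last target node
    receives its last packet. last_rx_times holds one timestamp per target."""
    return max(last_rx_times) - first_tx_time


def delivery_overhead(total_messages_sent, data_messages_received):
    """Total messages sent per data message received by a target node."""
    return total_messages_sent / data_messages_received


if __name__ == "__main__":
    print(delivery_ratio(45, 50))
    print(delivery_latency(2.0, [3.5, 4.0, 3.0]))
    print(delivery_overhead(60, 50))
```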

Figure 2-3: Sensor network in a building. A user node at location 0 injects 50 packets into the network within 0.5 seconds. [Diagram labels: 20 m, 25 m; user node at location 0.]

To evaluate PSFQ in a realistic scenario, we simulate the re-tasking of a simple

sensor network in a disaster recovery scenario where the sensor nodes are deployed

along the hallway on each floor of a building. Figure 2-3 shows such a simple sensor

network in a space of dimensions 100 × 100 m². Neighboring sensor nodes are located 20

m apart. Nodes use radios with 2 Mbps bandwidth and a nominal radio

range of 25 m. Channel access is simple CSMA/CA and we use a uniformly

distributed channel error model. A user node at location 0 attempts to inject a

program image file of size equal to 2.5 KB into every node on the floor for the

purposes of re-tasking. The packet size is 50 bytes. Packets are generated from

the user node and transmitted at a rate of one packet every 10 ms. For PSFQ, the

timer parameters are set conservatively to follow the PSFQ paradigm: Tmax is 100

ms, Tmin is 50 ms, and Tr is 20 ms. Therefore, the fetch operation is 5 times faster

than the pump operation. Each experiment is run 10 times and the results shown are

an average of these runs.
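
The workload figures quoted above follow from simple arithmetic, shown here as a quick check. The only assumption is that 2.5 KB is read as 2500 bytes and that each 50-byte packet carries 50 bytes of the image, which matches the 50 packets stated in the caption of Figure 2-3.

```python
IMAGE_BYTES = 2500        # 2.5 KB program image, assuming 1 KB = 1000 bytes
PACKET_BYTES = 50
SEND_INTERVAL_MS = 10     # one packet every 10 ms
T_MAX_MS, T_MIN_MS, T_R_MS = 100, 50, 20

num_packets = IMAGE_BYTES // PACKET_BYTES
injection_time_ms = num_packets * SEND_INTERVAL_MS
pump_to_fetch_ratio = T_MAX_MS // T_R_MS

print(num_packets, injection_time_ms, pump_to_fetch_ratio)  # 50 500 5
```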

2.4.2 Simulation Results

One of the major goals of PSFQ is to be able to work correctly under a wide variety

of wireless channel conditions. The first experiment examines the error tolerance of

PSFQ and SRM-I, and compares their results.

Figure 2-4: Error tolerance comparison - average delivery ratio as a function of the number of hops under various channel conditions for different packet error rates. [Plot: x-axis, Network Size (number of hops); y-axis, Average Delivery Ratio; curves for SRM-I and PSFQ at 30%, 50%, and 70% error rates.]

In Figure 2-4, we present the results for PSFQ and SRM-I under various channel

error conditions as we increase the network size in terms of the number of hops. As

one might expect, the average delivery ratio of both schemes decreases as the channel

error rate increases. For larger error rates, the delivery ratio decreases rapidly when

the network size increases. Notice that the user node starts sending data packets

into the network at a constant rate of one packet every 10 ms at 2 seconds into

the simulation trace and finishes sending all 50 packets within 0.5 seconds. The

simulation ran for 100 seconds after the user node stopped sending data packets.

Observe from Figure 2-4 that SRM-I can achieve 100% delivery up to 13 hops away from

the source node only when the channel error rate is smaller than 30%. For 50% error

rate, the 100% delivery point decreases to within 5 hops; and for larger error rates,

SRM-I is only able to deliver a portion of the file up to two hops away from the user

node. In contrast, PSFQ achieves a much higher delivery ratio for all cases under

consideration for a wide range of channel error conditions. PSFQ achieves 100%

delivery up to 10 hops away from the user node even at 50% error rate and delivers

more than 90% of the packets up to 13 hops away. Even under extremely error-prone

conditions of 70% channel error rate, PSFQ is still able to deliver 100% of the data up to

4 hops away and 70% of the packets up to 13 hops, while SRM-I can only deliver

less than 30% of the data up to 2 hops.

The better error tolerance exhibited by PSFQ in comparison to SRM-I justifies

the design paradigm of pump slowly and fetch quickly for wireless sensor networks.

The in-sequence data pump operation prevents the propagation of loss events, as

discussed in Section 2.2.3. SRM-I does not attempt to provide ordered delivery of

data and loss events are propagated along the multicast tree. In contrast, PSFQ’s

aggressive fetch operation and loss aggregation techniques support multiple loss

windows in a single control message. One high-level design lesson here is the

ineffectiveness of control messages under high-loss rate scenarios. SRM relies on the

underlying MAC layer to reliably deliver explicit and periodic control messages

between members of a multicast group. The failure of the virtual carrier sense in the

IEEE 802.11 MAC under high-loss rate environments causes SRM-I to fail, whereas

PSFQ’s minimalist approach enables it to do efficient control broadcasting, even

under high-loss conditions.
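
As a rough illustration of why loss propagation matters: if each hop independently loses a packet with probability p and nothing is recovered along the way, the chance that a packet crosses n hops is (1 - p)^n. The independence model and the function name below are assumptions made for illustration; this is not the error model or a result of the simulations above.

```python
def end_to_end_success(p, hops):
    """Probability a packet survives `hops` independent lossy hops with no
    per-hop recovery, where p is the per-hop loss probability."""
    return (1.0 - p) ** hops


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        row = [round(end_to_end_success(p, n), 4) for n in (1, 2, 4, 8)]
        print(f"p={p}: {row}")
```

Under this simple model the success probability falls geometrically with the hop count, which is the effect that recovering losses hop by hop is meant to avoid.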

Our second experiment examines the data delivery latency of both schemes under

various channel conditions. The results are shown in Figure 2-5. The delivery

latency is determined only when all the intended target nodes have received all of

the data packets before the simulation terminates. For SRM-I, we know that 100%

delivery can be achieved only within a limited number of hops when the error rate

is high. In this experiment, we compare the two schemes using a 3-hop network and

investigate PSFQ’s performance with a larger number of hops since PSFQ has better

Figure 2-5: Comparison of average latency as a function of channel error rate. [Plot: x-axis, Channel Error Rate; y-axis, Average Delivery Latency (seconds); curves for SRM-I (3-hop network) and PSFQ (3-hop, 4-hop, and 5-hop networks).]

error tolerance properties. Figure 2-5 shows that SRM-I has a smaller delay than

PSFQ when the error rate is smaller than 40%, but its delay grows exponentially

as the error rate increases, while the delay of PSFQ grows more slowly until it hits its error

tolerance barrier at 70% error rate. SRM-I performs better than

PSFQ in terms of delay in the lower error rate region because of the “pump slowly”

mechanism, where each node delays a random period of time between Tmin and Tmax

before forwarding packets. Despite this small penalty in the lower error rate region

the coupling of this mechanism with the “fetch quickly” operation proves to be very

effective. As shown in Figure 2-5, PSFQ can provide delay assurances even at very

high error rates.
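
The size of the "pump slowly" penalty can be estimated with a small sketch. It assumes the per-node forwarding delay is drawn uniformly between Tmin and Tmax (the text says only "a random period"), ignores transmission time and any fetch activity, and uses hypothetical function names.

```python
import random


def mean_pump_delay_ms(t_min_ms, t_max_ms, forwarders):
    """Expected added delay after `forwarders` relaying nodes, assuming each
    waits a uniformly distributed time in [t_min_ms, t_max_ms]."""
    return forwarders * (t_min_ms + t_max_ms) / 2


def sample_pump_delay_ms(t_min_ms, t_max_ms, forwarders, rng):
    """One random realization of the same forwarding delay."""
    return sum(rng.uniform(t_min_ms, t_max_ms) for _ in range(forwarders))


if __name__ == "__main__":
    # Simulation settings from Section 2.4.1: Tmin = 50 ms, Tmax = 100 ms.
    print(mean_pump_delay_ms(50, 100, forwarders=2))
    print(sample_pump_delay_ms(50, 100, 2, random.Random(0)))
```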

In the next experiment, we study the communication cost for reliability in both

schemes under various channel conditions using a 3-hop network, including a

16-node (4 × 4) 3-hop grid network to explore PSFQ’s performance in a dense network

Figure 2-6: Average delivery overhead as a function of channel error rate. [Plot: x-axis, Channel Error Rate; y-axis, Average Delivery Overhead (transmissions/pkt); curves for SRM-I (MAC signaling not included), SRM-I (Total), PSFQ (Total), and PSFQ (Grid topology); vertical lines mark the 100% delivery limits of PSFQ and SRM-I.]

where nodes can have up to four neighbors. Communication cost is measured as the

average number of transmissions per data packet (i.e., average delivery overhead).

For SRM-I, we separate the communication cost of the SRM-specific loss recovery

mechanisms from the total communication cost, which includes the cost associated

with the link-layer loss recovery mechanisms (RTS/CTS/ACK). Figure 2-6 shows

that the cost for PSFQ is consistently several times smaller than that of SRM-I, even

after excluding the link-layer cost of SRM-I. We can observe from Figure 2-6 that the

communication cost in a denser grid network closely matches but is lower than its

chain-network counterpart, indicating that PSFQ can exploit neighbor redundancy

while suppressing unnecessary redundant transmissions. Figure 2-6 also illustrates

the 100% delivery barrier of both schemes (the two vertical lines). The 52% error

rate mark shows the limit for SRM-I while the 70% error rate mark shows the oper-

ational boundary for PSFQ. The different performance observed under simulation

is rooted in the distinct design choices made for each protocol. PSFQ utilizes a

passive, on-demand loss recovery mechanism, whereas SRM employs a periodic exchange

of session messages for loss detection/recovery.

2.5. Experimental Testbed Results

In what follows, we discuss experiences implementing PSFQ in an experimental

wireless sensor testbed using the TinyOS platform [23] [19] and Rene2 motes [23].

The Rene2 sensor device has an ATMEL 4 MHz, low power, 8-bit micro-controller

with 16K bytes of program memory and 1K bytes of data memory; a 32 KB EEPROM

serves as secondary storage. The radio is a single channel RF transceiver operating

at 916 MHz and capable of transmitting at 10 kbps using on-off-keying encoding.

TinyOS [19] is an event-based operating system employing a CSMA MAC. The

packet size is 36 bytes. With a link speed of 10 kbps the channel can deliver at

most 16 packets per second. Tuning the transmission power can change the radio

transmission range of the motes.

We implement the PSFQ pump, fetch, and report operations as a component

of TinyOS that interfaces with the lower layer radio components. In the imple-

mentation, every data fragment that is received correctly is stored in the external

EEPROM at a predefined location based on its sequence number. The sequence

number is used as an index to locate and retrieve data segments when a node re-

ceives a NAK from its neighbors.
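
The sequence-number indexing can be sketched with an in-memory stand-in for the EEPROM. Everything named here is hypothetical: the `FragmentStore` class, the fixed 32-byte slot size, and the `repairs_for` helper are illustrative choices, not the TinyOS component described in the text.

```python
FRAGMENT_BYTES = 32  # hypothetical slot size; not a value from the text


class FragmentStore:
    """In-memory stand-in for the external EEPROM, one slot per sequence number."""

    def __init__(self, num_fragments):
        self.memory = bytearray(num_fragments * FRAGMENT_BYTES)
        self.present = [False] * num_fragments

    def store(self, seq, payload):
        """Write a correctly received fragment at the offset derived from seq."""
        if not 0 <= seq < len(self.present):
            raise IndexError("sequence number out of range")
        if len(payload) > FRAGMENT_BYTES:
            raise ValueError("fragment larger than slot")
        offset = seq * FRAGMENT_BYTES
        padded = payload.ljust(FRAGMENT_BYTES, b"\0")
        self.memory[offset:offset + FRAGMENT_BYTES] = padded
        self.present[seq] = True

    def load(self, seq):
        """Read back the slot for seq (zero-padded to the slot size)."""
        offset = seq * FRAGMENT_BYTES
        return bytes(self.memory[offset:offset + FRAGMENT_BYTES])

    def repairs_for(self, nak_seqs):
        """Return {seq: fragment} for each NAKed sequence number held locally."""
        return {s: self.load(s) for s in nak_seqs if self.present[s]}
```

Because the offset is a pure function of the sequence number, answering a NAK needs no search: the node reads the slots named in the NAK and sends back those it actually holds.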

2.5.1 PSFQ Parameter Space and Timer Bounds

Among the various PSFQ operations, the most aggressive timer is the fetch timer,

as defined in Section 2.3.2.2. A successful recovery after a sequence number gap has

been detected relies on two successful packet receptions being accomplished within

Figure 2-7: A 4-hop network physically arranged in a string/chain topology. [Diagram labels: Base Station, 1, 13, 20, 21.]

Tr, (i.e., one for receiving the NAK at the neighbors and another for receiving

the repair packet at the fetching node). Since the transmission time of a single

packet is non-negligible in low bandwidth environments (i.e., approximately 67 ms

for Rene2 mote), Tr should be long enough to accommodate at least two packet

transmissions. There exists a lower bound for Tr that is defined at the granularity

of the transmission time of a single packet; assume this is Tpkt. Recall that upon

receiving a NAK, a node schedules a repair to be sent at a random time ∈ [Tr/4, Tr/2].

Therefore, Tr must be long enough to wait for the most delayed repair from the

neighborhood to avoid unnecessary retransmission of NAK messages (i.e., Tr ≥ Tr/2 + 2Tpkt),

which gives Tr ≥ 4Tpkt. In practice, to avoid using up all the available

channel bandwidth during fetch operations, we increase the lower bound by one

or two Tpkt times to allow at least one packet transmission for other operations or

applications, and to accommodate other possible processing delays. For example, a

reasonable bound is: Tr ≥ 6Tpkt and Tmax ≥ 5Tr ≥ 30Tpkt. These values are used in

all of our testbed experiments discussed in the remainder of this section.
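
The bounds above can be checked numerically. Rearranging Tr ≥ Tr/2 + 2Tpkt gives Tr/2 ≥ 2Tpkt, hence Tr ≥ 4Tpkt; adding a two-packet margin gives 6Tpkt, and the factor of five between Tmax and Tr gives 30Tpkt. The sketch below plugs in the approximate 67 ms packet time quoted for the Rene2 mote; the function and parameter names are hypothetical.

```python
def fetch_timer_bounds(t_pkt_ms, margin_pkts=2, pump_to_fetch_ratio=5):
    """Return (strict lower bound on Tr, practical Tr bound, Tmax bound) in ms."""
    strict_tr = 4 * t_pkt_ms                      # from Tr >= Tr/2 + 2*Tpkt
    practical_tr = (4 + margin_pkts) * t_pkt_ms   # leave room for other traffic
    t_max = pump_to_fetch_ratio * practical_tr    # Tmax >= 5*Tr
    return strict_tr, practical_tr, t_max


print(fetch_timer_bounds(67))  # (268, 402, 2010)
```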

2.5.2 Messaging Overhead

In our experiments, we manipulate the radio transmission power of the motes to cre-

ate multi-hop networks such that motes that are separated by 5 inches can maintain

90% ∼ 100% reception rate, while motes that are separated by 10 inches can hardly

hear each other, (i.e., the reception rate is between 0% ∼ 15%). Figure 2-7 shows

a 4-hop network in a string/chain topology in which each node is separated by 5

inches. Here we refer to a hop distance as the distance between nodes that can maintain

excellent communication (i.e., more than a 90% packet reception rate).

Figure 2-8: Breakdown of PSFQ messages. Average delivery overhead is 1.2 ± 0.13.

Our

test scenario sends a new execution image (i.e., image file of the TinyOS BLINK [19]

application segmented into 70 over-the-air packets) from the base station (BS) con-

nected to a PC to all the sensor nodes using PSFQ. When the base station confirms

the 100% reception of the image by all sensors (using the PSFQ report operation)

then it sends a single control message that propagates to all the sensor nodes to

initiate the process of transferring the new image from external EEPROM to the

internal Flash to complete the reprogramming of the application. Note that we use

PSFQ’s single packet reliable service to do this controlled application switchover at

sensors, as discussed in Section 2.3.4.

Figure 2-8 shows the result of our experiments in terms of communication over-

head with the breakdown of the PSFQ messages. Each data point in the figure

is an average of 10 independent experiments and the 95% confidence intervals are

all within 10% of the average value. The overall average delivery overhead is 1.2

transmissions per received packet.

Figure 2-9: Average delivery overhead as a function of network size and density. [Plot: x-axis, Network Size (# of hops); y-axis, Average Delivery Overhead (transmissions/pkt); curves for 1x, 2x, and 3x density.]

2.5.3 Network Size versus Network Density

In what follows, we examine the impact of the network density and the network

size on the performance of PSFQ in terms of delivery latency and average delivery

overhead.

Using the same test scenario described in Section 2.5.2, we measure both the

communication cost and delivery latency of PSFQ with various network sizes as

well as various node densities in our Rene2 testbed, in which motes are arranged in

a string/chain topology. Figures 2-9 and 2-10 present the results of these experiments.

Each data point is an average of 10 independent experiments and the corresponding

95% confidence intervals are plotted as y-axis error bars in the figures.

Figure 2-9 shows that the communication cost for reliable delivery increases

rapidly when the network size increases from 1 hop to multiple hops, but it also

levels off and stabilizes quickly for a network size of 4 to 5 hops.

Figure 2-10: Average delivery latency as a function of network size and density. [Plot: x-axis, Network Size (# of hops); y-axis, Average Delivery Latency (seconds); curves for 1x, 2x, and 3x density.]

The rapid rise of the communication cost is due to the well-known hidden terminal

problem in CSMA networks, which becomes evident only in multi-hop environments

and creates collisions that force packet drops. Nevertheless, PSFQ’s pump/fetch

operations can effectively prevent loss propagation along the distribution chain, and

are therefore able to maintain relatively low overhead (∼ 1.3 transmissions/packet)

as the network size increases. Interestingly, we can observe from Figure 2-9 that as

we increase the network density the communication cost actually decreases. This

indicates that PSFQ effectively suppresses redundant transmissions and takes ad-

vantage of overhearing transmissions from a dense neighborhood to reduce a node’s

transmissions, and hence, reduces the overall delivery overhead. Figure 2-10 shows

that the delivery latencies of PSFQ increase almost linearly with the network size

but they are rather independent of the network density, which indicates that PSFQ

can adapt well in a high-density environment.

2.6. Related Work

To the best of our knowledge, PSFQ represents the first reliable transport for sensor

networks. In what follows, we contrast PSFQ with more recent contributions [65] [66] [67] for

reliable data delivery in sensor networks that followed the initial publication of

PSFQ [49].

RMST (Reliable Multi-Segment Transport) [65] is a transport layer paradigm

for sensor networks that is closest to our work. RMST is designed to complement

Directed Diffusion [8] by adding a reliable data transport service on top of it. RMST

is a NAK-based protocol like PSFQ, with primarily timer-driven loss detection

and repair mechanisms. The authors analyze the tradeoff between hop-by-hop and

end-to-end repair and conclude that hop-by-hop recovery is important, which is

consistent with our analysis and simulation results. In contrast to PSFQ, which

provides reliability purely at the transport layer, RMST involves both the transport

and MAC layers (e.g., SMAC [14]) to provide reliable delivery.

In ESRT [66] the authors propose using an event-to-sink reliability model in

providing reliable event detection that embeds a congestion control component. In

contrast to PSFQ, ESRT does not deal with data flows that require strict delivery

guarantees; rather, the authors define the “desired event reliability” as the num-

ber of data packets required for reliable event detection that is determined by the

application.

A sink-to-sensor reliability solution is presented in [67] that focuses on communi-

cation reliability from the sink to the sensors in a static network. The authors pro-

pose using a two-radio approach where each node is equipped with a low frequency

“busy-tone” radio in addition to the default radio that is used for data transmis-

sion and reception. The busy-tone radio is used to ensure delivery of single-packet

messages or the first packet of a longer message. A NAK-based recovery core is

constructed from the minimum dominating set of the underlying graph.

2.7. Conclusion

We have presented PSFQ, a reliable transport protocol for wireless sensor networks.

PSFQ is a lightweight, simple mechanism that is scalable and robust, making minimum

assumptions about the underlying transport infrastructure. Based on our reference

application for remotely programming sensors over-the-air, we have discussed

a number of important design goals that underpin the protocol’s development, including

correct and efficient operation under high packet error rate conditions

and support for loose delay bounds for reliable data delivery. We evaluated PSFQ

using simulation and through implementation in an experimental motes testbed. We

found that PSFQ outperforms SRM-I in terms of error tolerance, communication

overhead, and, at higher error rates, delivery latency.

Our work on PSFQ has been widely recognized as the first contribution to the

problem of reliable delivery in sensor networks. Several important contributions

came out of this work [65] [67] [68]. First, we proposed hop-by-hop error recovery

in which intermediate nodes also take responsibility for loss detection and recovery,

so that reliable data exchange is done on a hop-by-hop basis rather than end-to-

end. Second, we analyzed a simplified model of our NAK-based algorithm and

determined a near-optimal ratio between the timers associated with the forwarding

(pump) and retransmission (fetch) operations. Third, PSFQ exhibits a novel multi-

modal communication property that provides a graceful tradeoff between packet

switching and store-and-forward paradigms, depending on the channel conditions

encountered.

While investigating the reliable transport issues, we analyzed the loss patterns

in our sensor network testbed and observed that significant loss is also due to con-

gestion, over a wide range of workloads, including light and moderate traffic. This

observation leads us to study congestion problems in sensor networks. While some

researchers have discussed congestion issues [35] in sensor networks there has been

no comprehensive approach to the problem proposed in the literature. In the next

chapter, we address this challenge and propose the first such general algorithmic

approach called CODA (COngestion Detection and Avoidance) that includes a low-

cost sampling scheme for congestion detection, a backpressure algorithm, and sink

to source regulation.

Chapter 3

Energy-Efficient Congestion Detection and

Avoidance in Sensor Networks

3.1. Introduction

Sensor networks come in a wide variety of forms, covering different geographical

areas, being sparsely or densely deployed, using devices with a variety of energy

constraints, and implementing an assortment of sensing applications. One applica-

tion driving the development of sensor networks is the reporting of conditions within

a region where the environment abruptly changes due to observed events, such as

target detection, earthquakes, floods, or fires, and in habitat monitoring. Sensor

networks may typically operate under light load, but can suddenly become active in

response to a detected event. These scenarios require data to be delivered through

the sensor network quickly to a relatively small number of physical sink points that

are attached to the regular communication infrastructure. Sensor networks exhibit

a unique funneling effect where events are generated en masse and then must be

quickly moved toward a sink point. The flow of events out of the network has

similarities to the flow of people from a large arena after sporting events complete.

This leads to a number of significant challenges including increased transit traffic

intensity, congestion, and packet loss (and therefore energy and bandwidth waste)

at nodes closer to the sink.

Depending on the application this can result in the generation of large, sudden,

and correlated impulses of data that must be delivered to a small number of sinks

without significantly disrupting the performance (i.e., fidelity) of the sensing appli-

cation. Although a sensor network may spend only a small fraction of time dealing

with impulses, it is during this time that the information it delivers is of greatest

importance.

The transport of event impulses is likely to lead to varying degrees of conges-

tion in sensor networks. In order to illustrate the congestion problem consider the

following simple but realistic simulation scenario. Figure 3-1 shows the impact of

congestion on data dissemination in a sensor network for a moderate number of

active sources with varying reporting rates. The ns-2 simulation results are for the

well-known Directed Diffusion scheme [11] operating in a moderately sized 30-node

sensor network using a 2 Mbps IEEE 802.11 MAC with 6 active sources and 3 sinks.

The 6 sources are randomly selected among the 30 nodes in the network and the

3 sinks are uniformly scattered across the sensor field. Each source generates data

event packets at a common fixed rate while the sinks subscribe (i.e., broadcast cor-

responding interest packets) to different sources at random times within the first 20

seconds of the 50-second simulation scenario. Event and interest packets are 64 and

36 bytes in size, respectively. The plot illustrates that as the source rate increases

beyond a certain network capacity threshold (10 events/s in this network), conges-

tion occurs more frequently and the total number of packets dropped per received

data packet at the sink increases rapidly. The plot shows that even with low to

moderate source event rates there is a large drop rate observed across the sensor

network. For example, with a source event rate of 20 events/s in the network one

Figure 3-1: Total number of packets dropped by the sensor network per data event packet delivered at the sink (Drop Rate) as a function of the source rate. The x axis is plotted in log scale to highlight data points with low reporting rates. All packets that are dropped during the 50 second simulation session are counted as part of the drop rate including the MAC signaling (e.g., RTS/CTS/ACK and ARP), data event, and diffusion messaging packets. [Plot: x-axis, Source Rate (events/s); y-axis, Drop Rate.]

packet is dropped across the sensor field for every data event packet received at the

sink. Dropped packets can include MAC signaling, data event packets themselves,

and the diffusion messaging packets. The drop rates shown in Figure 3-1 repre-

sent not only significant packet losses in the sensor network, but more importantly,

energy wasted by the sensing application.

Depending on the type of sensing application the rate of event impulses may vary

in frequency. Some applications may only generate light traffic from small regions

of the sensor network (e.g., target detection) while others (e.g., fire or earthquake

detection) may generate large waves of impulses, potentially across the whole sens-

ing area (which causes high loss, as shown in Figure 3-1). In traditional computer

networks, throughput and delay are two important performance metrics that im-

pact the users’ experience. Therefore, the objective function for control mechanisms

adopted to control the traffic is often defined as maximizing the ratio of throughput

to delay [69], i.e., the power. However, in the context of sensor networks, because

of their limited resources and application specific nature, we observe that maximiz-

ing this ratio does not necessarily result in the optimal performance. Rather, the

objective of sensor networks is to maximize the operational lifetime while delivering

acceptable data fidelity to the applications.

In response to this, future congestion control mechanisms for sensor networks

must be capable of balancing the offered load, while attempting to maintain accept-

able fidelity (e.g., rate of events) of the delivered signal at the sink during periods

of transient and more persistent congestion. A number of distinct congestion sce-

narios are likely to arise. First, densely deployed sensors generating impulse data

events will create persistent hotspots proportional to the impulse rate beginning

at a location very close to the sources (e.g., within one or two hops). In this sce-

nario, localized, fast time scale mechanisms capable of providing backpressure from

the points of congestion back to the sources could be effective. Second, sparsely

deployed sensors generating low data rate events will create transient hotspots po-

tentially anywhere in the sensor field but likely farther from the sources, toward the

sink. In this case, fast time scale resolution of localized hotspots using a combi-

nation of localized backpressure (between nodes identified in a hotspot region) and

rate limiting techniques could be more effective. Because of the transient nature of

congestion, source nodes may not be involved in the backpressure. Third, sparsely

deployed sensors generating high data-rate events will create both transient and

persistent hotspots distributed throughout the sensor field. In this final scenario, a

combination of fast time scale actions to resolve localized transient hotspots, and

closed loop rate regulation of all sources that contribute toward creating persistent

congestion could be effective.

In this chapter, we propose an energy efficient congestion control scheme for

sensor networks called CODA (COngestion Detection and Avoidance) that comprises

three mechanisms:

• Congestion detection. Accurate and efficient congestion detection plays an

important role in the congestion control of wireless networks. CODA uses

a combination of the present and past channel loading conditions, and the

current buffer occupancy, to infer accurate detection of congestion at each

receiver with low cost. Sensor networks must know the state of the channel

since the transmission medium is shared and may be congested with traffic

between other nodes in the neighborhood. Listening to the channel to measure

local loading incurs high energy costs, if performed all the time. Therefore,

CODA uses a sampling scheme that activates local channel monitoring at the

appropriate time to minimize cost while forming an accurate estimate. Once

congestion is detected, nodes signal their upstream neighbors via a backpres-

sure mechanism that is discussed next.

• Open-loop, hop-by-hop backpressure. In CODA a node broadcasts backpressure

messages as long as it detects congestion. Backpressure signals are propagated

upstream toward the source. In the case of impulse data events in dense net-

works it is very likely that backpressure will propagate directly to the sources.

Nodes that receive backpressure signals can throttle their sending rates based

on the local congestion policy (e.g., silence for a random time or AIMD, etc.).

When an upstream node (toward the source) receives a backpressure message

it decides whether or not to further propagate the backpressure upstream,

based on its own local measured network conditions.

• Closed-loop, multi-source regulation. In CODA, closed-loop regulation oper-

ates over a slower time scale and is capable of asserting congestion control

over multiple sources from a single sink in the event of persistent congestion.

When a source event rate is less than some fraction of the maximum theoret-

ical throughput of the channel, the source regulates itself. When this value

is exceeded, however, a source is more likely to contribute to congestion, and

therefore, closed-loop congestion control is triggered. The source only enters

sink regulation if this threshold is exceeded. At this point a source requires

constant, slow time-scale feedback (e.g., ACK) from the sink to maintain its

rate. The reception of ACKs at sources serves as a self-clocking mechanism

allowing sources to maintain their current event rates. In contrast, failure to

receive ACKs forces a source to reduce its own rate. Once a source has de-

termined congestion has passed it takes itself out of sink regulation under its

own direction.

The chapter is organized as follows. Section 3.2. discusses a number of important

design considerations for mitigating congestion in sensor networks including MAC

and congestion detection issues. Section 3.3. details CODA’s backpressure and rate

regulation mechanisms. Following this, an implementation of CODA is evaluated in

an experimental sensor testbed in Section 3.4. We define three important performance

metrics (i.e., energy tax, fidelity penalty, and power) to evaluate the impact

of CODA on the performance of sensing applications. Because CODA is designed

to interwork with existing data dissemination schemes, we also evaluate it using

one well-known dissemination mechanism. Section 3.5. presents our performance

evaluation of CODA working with Directed Diffusion [11] using the ns-2 simula-

tor. Section 3.6. presents the related work. Finally, some concluding remarks are

discussed in Section 3.7.

3.2. Design Considerations

In what follows, we discuss the technical considerations that underpin the design of

CODA; the detailed design is presented in Section 3.3.

The medium access control (MAC) plays a significant role in how well impulses of data

are managed in a shared wireless medium, including the detection of

congestion. A growing number of sensor networks use CSMA or variants for the

medium access control. For example, the widely used Berkeley motes [23] use a sim-

ple CSMA MAC as part of the TinyOS [19] platform. In [14] the authors proposed

a modified version of CSMA called S-MAC, which combines TDMA scheduling with

CSMA’s contention-based medium access, without a strict requirement for time syn-

chronization. S-MAC uses virtual carrier sense to avoid hidden-terminal problems,

allowing nodes other than the sender and receiver to enter sleep mode (during the

NAV after the RTS/CTS exchange), thus saving energy. A collision-minimizing

CSMA MAC is proposed in [70] that is optimized for event-driven sensor networks.

The authors propose to utilize a non-uniform probability distribution for nodes to

randomly select contention slots such that collisions between contending stations

are minimized.

There is a growing effort to design suitable TDMA schemes for sensor networks

where energy can be conserved by turning off nodes periodically. Congestion can

still occur when using TDMA or other schedule-based schemes when the incoming

traffic exceeds the node capacity and the queue overflows. Because TDMA and

other schedule-based access schemes (e.g., [71] [72]) can control and schedule traffic

flows in the network to provide collision-free communication, the impact of con-

gestion is less severe and the congestion control mechanism is simpler compared

to a CSMA MAC. For example, congestion can be reliably detected by monitoring

the queue size at each node (i.e., when the buffer overflows), eliminating the need for

the new congestion detection schemes proposed in the rest of this section. Nevertheless,

the new objective function for congestion control in sensor networks (as

discussed in the previous section) demands new feedback control mechanisms even for

TDMA/schedule-based networks. These new control mechanisms are discussed in

detail in Sections 3.3.1 and 3.3.2 and can be used seamlessly on both contention-based

and schedule-based networks.

A number of considerations shape the design of CODA. In what follows, we

discuss the MAC and congestion detection considerations with a focus toward CSMA

or contention-based schemes.

3.2.1 CSMA Considerations

3.2.1.1 Throughput Issues

The theoretical maximum throughput (channel utilization) for the CSMA scheme

is approximately [73]:

$$S_{\max} \approx \frac{1}{1 + 2\sqrt{\beta}} \quad (\text{for } \beta \ll 1), \tag{3.1}$$

where

$$\beta = \frac{\tau C}{L}. \tag{3.2}$$

The performance of CSMA is highly dependent on the value of β, which is a measure

of radio propagation delay and channel idle detection delay. τ is the sum of both

radio propagation delay and channel idle detection delay in seconds, C is the raw

channel bit rate and L is the expected number of bits in a data packet. If nodes can

detect idle periods quickly, in other words have a very small β value, then CSMA

can offer very good channel utilization regardless of the offered load.
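
As a rough numerical illustration of Equations (3.1) and (3.2), the following sketch evaluates β and Smax for one set of parameters. The values chosen for τ, C, and L are placeholders for illustration only, not measurements from this work, and the function names are hypothetical.

```python
import math


def beta(tau_s: float, bit_rate_bps: float, packet_bits: float) -> float:
    """Equation (3.2): beta = tau * C / L."""
    return tau_s * bit_rate_bps / packet_bits


def csma_max_utilization(b: float) -> float:
    """Equation (3.1): approximate maximum CSMA utilization, valid for beta << 1."""
    return 1.0 / (1.0 + 2.0 * math.sqrt(b))


# Assumed, illustrative parameters (not taken from the testbed):
tau_s = 0.0005          # propagation plus idle-detection delay, in seconds
bit_rate_bps = 38_400   # raw channel bit rate C
packet_bits = 36 * 8    # expected packet length L, in bits

b = beta(tau_s, bit_rate_bps, packet_bits)
s_max = csma_max_utilization(b)
print(f"beta = {b:.4f}, S_max = {s_max:.3f}")
```

Reducing τ or lengthening packets lowers β and pushes the bound toward one, which is the dependence the paragraph above describes.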

Equation (3.1) gives the channel capacity of CSMA within one hop. In [74] the

authors show that an ideal ad hoc multihop forwarding chain should be able to

achieve 25% of the throughput that a single-hop transmission can achieve. This

observation has important implications in the design of our congestion detection

and closed-loop regulation mechanisms, as discussed in Section 3.2.2 and Section

3.3.2, respectively.

3.2.1.2 Hidden Terminals

CSMA suffers from the well-known hidden terminal problem in multihop environ-

ments. IEEE 802.11 utilizes virtual carrier sense (VC), namely an RTS/CTS ex-

change, to eliminate hidden terminals. In order to reduce the signaling overhead

incurred by adding VC, IEEE 802.11 does not exchange RTS/CTS for small packets.

In sensor networks, packets are usually small in nature (i.e., on the order of a few

tens of bytes) because of the low duty cycle requirement and traffic characteristics

[5]. Therefore, the signaling cost is high if the RTS/CTS exchange is used for ev-

ery message. Furthermore, sensor nodes have a limited energy budget making the

energy cost of doing this prohibitively high.

Usually, nodes other than event source nodes and the forwarding nodes will

be silent most of the time. Therefore, loss due to hidden terminals rarely occurs

when the workload of the network is low. In [75], the authors show that in general,

when nodes are nicely randomized and coupled with appropriate delay in send-

ing/forwarding packets, the probability of having hidden terminals is low even in

dense networks. In S-MAC [14], an RTS/CTS exchange is used in an aggregated

manner (i.e., not for every single packet) to reduce the energy cost.

In the context of sensor networks, the VC scheme is costly and mostly unneces-

sary during normal operation. There is a need, however, to devise a scheme that can

work satisfactorily with or without the VC for collision avoidance, that incurs low

cost or no cost during normal operations, and yet is responsive enough to quickly

resolve congestion¹. In Section 3.2.2, we discuss such a scheme.

3.2.1.3 Link-layer ARQ

In the IEEE 802.11 MAC, a packet will be kept in the sending buffer until an

ACK is received or the number of retransmissions exceeds a certain threshold. This

mechanism increases the link reliability at the expense of energy and buffer space.

However, both of these resources are scarce in sensor nodes where support for re-

liability may not always be necessary under normal operations (i.e., due to the

application-specific nature of sensor networks not all data packets require strict

reliability²). Today, different sensor platforms utilize different radio technologies;

some radios support low-overhead synchronous ACK [45] (e.g., the RFM radio used

in Mica [24]) and some radios include built-in link-layer ACK supporting higher

data rates up to 250 Kbps (e.g., the IEEE 802.15.4 radio used in Telos [19]), while

in others supporting ACK could be costly (e.g., the Chipcon radio used in Mica2

[45]) in terms of energy and bandwidth consumption.

We believe there is a need for a separation between reliability and congestion

control in the design of sensor network protocols. The use of VC and link-layer ARQ

as a reliable means of communication is essential for critical information exchange

(e.g., routing signaling), but they are not necessarily relevant during congestion. In

sensor networks, energy expenditure is more important than occasional data loss

because of the natural redundancy inherent in disseminated sensor data. The main

objective function is therefore to minimize energy expenditure. This is in contrast

¹ Depending on the sensing applications and the radio technologies, a user might choose to omit the VC for data packets but retain it for critical signaling messages (e.g., control packets for the routing protocol) in order to reduce overhead.

² For example, applications that generate periodic workload can often reasonably assume that subsequent reports will supersede any lost data.

to TCP where the lost data is always recovered. In our design, congestion control

elements do not explicitly look at loss (unlike TCP), allowing CODA to decouple

reliability from other control mechanisms. CODA is therefore capable of working

with or without reliability elements, such as link-layer ARQ, depending on the

application’s needs and the radio technology used in the sensor platform.

3.2.2 Congestion Detection

Accurate and efficient congestion detection plays an important role in congestion

control of sensor networks. There is a need for new congestion detection techniques

that incur low cost in terms of energy and computation complexity. Several tech-

niques are possible.

3.2.2.1 Buffer Queue Length

Queue management is often used in traditional data networks for congestion detec-

tion. However, without link-layer ACK (some applications might not require this

and hence would omit it to save the overhead, as discussed above), buffer occu-

pancy or queue length cannot be used as a reliable indication of congestion. To

illustrate this, we perform an ns-2 simulation of the simple IEEE 802.11 wireless

5-node network shown in Figure 3-2. In the simulation, nodes 1 and 4 each start

sending (1 second apart in simulation time) CBR traffic that consumes 50% of the

channel capacity through node 2 to nodes 3 and 5, respectively. One of the sources

stops sending data after 10 seconds. We ran two simulation trials, one with the VC

enabled (including link ARQ), the other with it disabled and no link ARQ.

Figure 3-3 shows the time series traces for both channel loading and buffer

occupancy as well as the packet delivery ratio measured at the intermediate node 2.

It is clear from the plot that the channel loading almost immediately rises to 90%

Figure 3-2: A simple IEEE 802.11 wireless network of 5 nodes illustrates receiver-based congestion detection. [Diagram: nodes 1 to 5; annotations: Queue Length, Channel load.]

during the time both sources are on. Congestion occurs and the packet delivery

ratio drops from 100% to around 20% during this period. Note that the buffer

occupancy grows at a slower rate during this congestion period, particularly in

the trace corresponding to the simulation where the VC is disabled. The buffer

occupancy (without link ACK) even drops at around 5 seconds into the simulation,

which provides false information about the congestion state. This is because without

the link-layer ACK, the clearing of the queue at the transmitter does not mean that

congestion is alleviated since packets that leave the queue might fail to reach the

next hop as a result of collisions. Note that CSMA does not guarantee collision-free

transmissions among neighboring nodes because of the detection delay [73].

This simple simulation shows that the buffer occupancy alone does not provide

an accurate and timely indication of congestion even when the link ARQ is enabled,

except in the extreme case when the queue is empty or about to overflow. The first

case indicates good traffic conditions and the latter one signals serious congestion.

As shown in the figure, the queue takes a much longer time to grow beyond a

Figure 3-3: Channel load and buffer occupancy time series traces with and without virtual carrier sense (VC)+link-layer ACK, and packet delivery trace with VC. [Plot: normalized ratio (0 to 1) versus time (0 to 20 seconds); traces: Channel Load/Utilization; Buffer Occupancy, VC; Buffer Occupancy, no VC; Packet Delivery Ratio.]

high watermark level (e.g., 0.8) that signifies congestion compared to the channel

load. We argue that this bimodal behavior is too coarse, and its detection latency too

high, to provide accurate, timely and efficient congestion control,

especially in the case of event-driven sensor networks where short-lived hotspots

are likely to occur across different time-scales. Therefore, we propose augmenting

buffer monitoring with channel load measurement for fast and reliable congestion

detection in sensor networks.

3.2.2.2 Channel Loading

In CSMA networks, it is straightforward for sensors to listen to the channel, trace the

channel busy time, and calculate the local channel loading conditions. Since Smax in

Equation (3.1) gives the optimal utilization of the channel when β is determined, if

one senses that the channel loading reaches a certain fraction of the channel capacity,

this would indicate a high probability of collision [74].

Listening to the channel consumes a significant portion of energy [13] in a node.

Figure 3-4: Queueing performance of a real sensor network of Mica motes. [Plot: average queue size (pkts, 0 to 7) versus channel utilization (%, 0 to 70).]

Therefore, performing this operation all of the time is not practical in sensor net-

works. In Section 3.3.1, we propose a sampling scheme that activates local channel

monitoring only at the appropriate time to minimize the energy cost while forming

an accurate estimate of conditions.

Channel loading and buffer occupancy give accurate information about how busy

the surrounding network is. These provide a good congestion detection measure for

hop-by-hop flow control, but the scope of this control is inherently local. Hop-by-

hop flow control has limited effect, for example, in mitigating large-scale congestion

caused by data impulses from sparsely located sources that generate high-rate traffic.

To understand this limitation in a practical sensor network, we study the channel

load and queue performance using our Mica mote [23] testbed. We generate data

packets at different rates that drive the network to different levels of congestion and

measure the average queue size of the nodes in a small neighborhood that share the

wireless medium. The experimental results shown in Figure 3-4 plot the measured

average queue size against the channel load (utilization). The figure shows that the

queue size is very small (≪ 1) for all channel loads before the channel saturates

at a utilization of approximately 70%. Note that the curve resembles a typical

M/M/1 queue, except that it saturates at a utilization far lower than one, which

is a limitation imposed by the channel idle detection delay (this result is further

confirmed when we measure the β value in Section 3.4.1).
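
For reference, the textbook M/M/1 result behind this comparison is given below. Here $u$ denotes the utilization of an idealized single server and $\bar{N}$ the mean number of packets in the system; both symbols are introduced only for this note and are not used elsewhere in the chapter.

```latex
\bar{N} = \frac{u}{1 - u}, \qquad 0 \le u < 1
```

In this idealized model the mean occupancy stays below one packet for $u < 0.5$ and grows without bound as $u \to 1$, whereas the measured curve in Figure 3-4 rises sharply near 70% utilization.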

In CODA’s hop-by-hop flow control, a congested node, determined by the mea-

sured channel load and queue size, provides backpressure to its upstream neighbors.

Because of the funneling effect in sensor networks, particularly for sparsely located

sources, congestion is most likely to occur at downstream sensors closer to the sink.

Therefore, upstream sensors located closer to the sources within the propagation

funnel (i.e., data flowing from multiple sources toward a sink) are likely to experi-

ence lower channel load, and hence a low queue occupancy according to Figure 3-4.

As a result, the backpressure signal would most likely stop propagating before the

feedback signal reaches the sources. Therefore, the hop-by-hop backpressure mecha-

nism alone is not enough to mitigate large-scale congestion. A new mechanism that

resembles end-to-end closed-loop control and allows a user to control the desired

reporting rate of an application is needed; it is explored in the next section.

3.2.2.3 Reporting Rate/Fidelity Measurement

For typical applications in sensor networks [11], the sinks expect a certain sampling

rate or reporting rate coming from the sources. This rate is application-specific, and

can be seen as an indication of event fidelity [35]; that is, the reporting rate from

the sources with respect to certain phenomena should be high enough to satisfy

the applications’ desired accuracy. When a sink consistently receives a less than

desired reporting rate, it can be inferred that packets are being dropped along the

path, most probably due to congestion. On the other hand, most applications can

tolerate a certain degree of fidelity loss, allowing the user to trade off fidelity to

avoid congestion in the network, if needed. Therefore, fidelity measurements

such as the number of event packets received within a time period can be used as a

congestion signal and a metric for end-to-end control.

Such fidelity measurement schemes need to operate on a much longer time-scale

compared to the packet transmission time-scale, and consider:

• End-to-end delay between sources and sink nodes since only the sink can

recognize its own requirements on the sampling rate.

• Stability - to avoid unnecessary reaction to transient phenomena that could

cause oscillations, a sink should not respond too quickly to events, and there-

fore, should define an appropriate “observation period” (i.e., window) over a

longer time-scale for measurement.
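
The two considerations above can be sketched as a sink-side monitor that counts event packets per observation period and reacts only to a sustained shortfall. This is a minimal sketch under stated assumptions: the class name, the `patience` parameter (number of consecutive low windows before congestion is inferred), and the count-based definition of fidelity are all hypothetical choices made for illustration.

```python
class FidelityMonitor:
    """Sink-side check of the received reporting rate over observation windows."""

    def __init__(self, desired_events_per_window: int, patience: int = 3):
        self.desired = desired_events_per_window
        self.patience = patience   # assumed: low windows tolerated before reacting
        self.received = 0
        self.low_windows = 0

    def on_event(self) -> None:
        """Record one event packet received in the current window."""
        self.received += 1

    def end_window(self) -> bool:
        """Close the observation period; return True if congestion is inferred."""
        if self.received < self.desired:
            self.low_windows += 1
        else:
            self.low_windows = 0   # a good window clears the transient shortfall
        self.received = 0
        return self.low_windows >= self.patience
```

Requiring several consecutive low windows is one simple way to avoid reacting to transient phenomena; the appropriate window length and patience would depend on the application.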

We conjecture that pure window-based end-to-end control schemes like TCP

are not well suited to sensor networks. In addition to the excessive end-to-end

acknowledgment overhead, there exists a mismatch of the traffic model with the

applications (i.e., the data traffic in most sensor network applications is CBR in

nature and might experience a sudden increase in the data rate when an interesting

event occurs). In TCP, since every incoming ACK increases the transmission window

size, low-rate CBR could falsely inflate the window to a very large size that could

easily overwhelm the network with event-based applications. To avoid this well-known

large-window problem in TCP and the excessive ACK overhead, in Section

3.3.2 we propose a novel low-cost end-to-end closed-loop control mechanism that

combines window-based and rate-based control.

3.3. CODA Design

Hotspots (i.e., congestion) can occur in different regions of a sensor field due to

different congestion scenarios that arise. This motivates the need for CODA’s open-

loop hop-by-hop backpressure and closed-loop multi-source regulation mechanisms.

These two control mechanisms, while insufficient in isolation, complement each other

nicely. Different rate control functions are required at different nodes in the sen-

sor network depending on whether they are sources, sinks, or intermediate nodes.

Sources know the properties of the traffic they inject while intermediate nodes do

not. Sinks are best placed to understand the fidelity rate of the received signal, and

in some applications, sinks are powerful nodes that are capable of performing sophis-

ticated heuristics. The goal of CODA is to maintain low or no cost operations during

normal conditions, but be responsive enough to quickly mitigate congestion around

hotspots once it is detected. In what follows, we discuss CODA’s backpressure and

multi-source regulation mechanisms.

3.3.1 Open-Loop Hop-by-Hop Backpressure

Backpressure is the primary fast time scale control mechanism when congestion

occurs. The main idea is to use the components mentioned in Section 3.2.2 to do

local congestion detection at each node with low cost. Once congestion is detected,

the receiver will broadcast a backpressure message to its neighbors and at the same

time make local adjustments to prevent propagating the congestion downstream.

A node broadcasts backpressure messages as long as it detects congestion. Backpressure

signals are propagated upstream toward the source. In the case of impulse

data events in dense networks it is very likely that the backpressure will propagate

directly to the sources. Nodes that receive backpressure signals could throttle their

sending rates (e.g., be silent for a random period of time) or regulate data rates

based on some local congestion policy (e.g., AIMD).

When an upstream node (toward the source) receives a backpressure message,

based on its own local network conditions it determines whether or not to further

propagate the backpressure signal upstream. For example, nodes do not propagate

the backpressure message if they are not congested.
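
The propagation rule just described can be sketched as follows. This is an illustrative simplification, not the implementation evaluated in this chapter: it models the upstream direction as a single chain of nodes rather than a broadcast neighborhood, assumes a multiplicative-decrease throttling policy, and uses hypothetical names (`Node`, `on_backpressure`, `propagate`).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    congested: bool   # outcome of the node's own local congestion detection
    rate: float       # current sending rate, packets per second


def on_backpressure(node: Node, decrease: float = 0.5) -> bool:
    """Throttle locally; return True if the node forwards the message upstream."""
    node.rate *= decrease      # assumed local policy: multiplicative decrease
    return node.congested      # only congested nodes keep propagating


def propagate(upstream_path: List[Node]) -> List[str]:
    """Deliver a backpressure message hop by hop toward the source.

    upstream_path[0] is the first upstream neighbor of the node that
    detected congestion. Returns the names of the nodes that received it.
    """
    reached = []
    for node in upstream_path:
        reached.append(node.name)
        if not on_backpressure(node):
            break              # a non-congested node stops the propagation
    return reached
```

For a path whose first upstream node is congested and whose second is not, the message reaches both nodes and stops there, so nodes farther upstream keep their rates unchanged.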

We use the term depth of congestion to indicate the number of hops that the

backpressure message has traversed before a non-congested node is encountered.

The depth of congestion can be used by the routing protocol and local packet drop

policies to help balance the energy consumed during congestion across different

paths. Two simple schemes can be used:

• Consider the instantaneous depth of congestion as an indicator to the routing

protocol to select better paths, thereby reducing traffic over the paths suffering

deep congestion.

• Alternatively, rather than coupling congestion control and routing, the nodes

can silently suppress or drop important signaling messages associated with

routing or data dissemination protocols (e.g., interests [11], data advertise-

ments [9], etc.). Such actions would help to push event flows out of congested

regions and away from hotspots in a more transparent way.

Further investigation of using depth of congestion to assist routing is out of the

scope of this chapter. The rest of this section will describe the main elements and

detailed operations of CODA’s open-loop control.

3.3.1.1 Receiver-based Detection

As mentioned in Section 3.2.2, there are multiple good indications of congestion:

• a nearly overflowing queue.

• a measured channel load higher than a fraction of the optimum utilization.

This provides a probabilistic indication of congestion by observing how closely

the channel load approaches the upper bound.

Monitoring the queue size comes almost for free except for a little processing

overhead, but it provides only a bimodal indication with non-negligible latency.

Listening to the channel either to measure the channel loading or to acquire signaling

information for collision detection provides a fast and good indication but incurs high

energy cost if performed all the time. Therefore, it is crucial to activate the latter

component only at the appropriate time in order to minimize cost.

Consider the typical packet forwarding behavior of a sensor network node and

its normal radio operational modes. The radio stays in the listening mode except

when it is turned off or transmitting. When a carrier is detected on the channel,

the radio switches into the receiving mode to look for a transmission preamble and

continues to receive the packet bit stream. Before forwarding this packet to the

next hop, CSMA requires the radio to detect an idle channel which implies listening

for a certain amount of time. If the channel is clear during this period, then the

radio switches into the transmission mode and sends out a packet. There is no extra

cost to listen and measure channel loading when a node wants to transmit a packet

since carrier sense is required anyway before a packet transmission. Based on this

observation, we conclude that the proper time to activate the detection mechanism

is when a node’s send buffer is not empty. In other words, a node’s radio might

be turned off most of the time according to some node coordination schemes (e.g.,

GAF [13], SPAN [12], S-MAC [14], etc.), but, whenever receiving or transmitting a

packet, the radio must reside in the listening mode for a time.

Figure 3-2 illustrates a typical scenario in sensor networks in which hotspots or

congestion areas could be created. In this example, nodes 1 and 4 each send CBR

traffic that consumes 50% of the channel capacity through node 2 to nodes 3 and 5,

respectively. Packets that are received by node 2 stay in its queue because of the

very busy channel and are eventually dropped. This simple example shows that in

a congested neighborhood, a receiver’s (e.g., node 2, the forwarding node) buffer

occupancy is high or at least non-empty. A node that activates the channel loading

measurement during the moment when its buffer is not empty is highly responsive

with almost no cost. The channel loading measurement will stop naturally when

the buffer is cleared, which indicates with high probability that any congestion is

mitigated and data flows smoothly around the neighborhood. Based on this obser-

vation, there is little extra cost to measure the channel loading if a node activates

channel monitoring only when it is “receiving” a packet and needs to forward it

later on. The only time CODA needs to do this is when a node has something to

send, and it has to do carrier sense anyway for those situations.

3.3.1.2 Minimum Cost Sampling

A sensing epoch is defined as a multiple of the packet transmission time. When a

node starts sensing the channel (i.e., when it has something to send in its buffer),

we probe the MAC for at least 1 epoch time to measure the channel load. During an

epoch period, instead of forcing the MAC to continuously listen during the backoff

time, a node performs periodic sampling of the radio states (non-invasive probing,

i.e., we do not modify the radio state machine) so that the radio can be turned off

during the interval. This non-invasive sampling scheme provides an elegant way to

measure the channel load without adding any energy cost to the radio other than

the cost required by the original CSMA state machine. We use a simple sampling

scheme where the channel load is measured for N consecutive sensing epochs of

length E, with a predefined sampling rate to obtain channel state information; that

is, the number of times that the channel state is busy or idle within a single sensing

epoch. We then calculate the sensed channel load $\bar{\Phi}$ as the exponential average of

$\Phi_n$ (the measured channel load during epoch $n$) with parameter $\alpha$ over the previous

$N$ consecutive sensing epochs, as shown in Equation (3.3).

$$\bar{\Phi}_{n+1} = \alpha \bar{\Phi}_n + (1 - \alpha)\Phi_n, \quad n \in \{1, 2, \ldots, N\}, \quad \bar{\Phi}_1 = \Phi_1. \tag{3.3}$$

If the send buffer is cleared before n counts to N, then the average value is ignored

and n is reset to 1. The tuple (N,E, α) offers a way to tune the sampling scheme

to accurately measure the channel load for specific radio and system architectures.

In Section 3.4.2, we describe and demonstrate the tuning of these three parameters

in an experimental sensor network testbed comprised of Berkeley Mica motes.
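
The sampling scheme of this section can be sketched as below. The code is a hedged illustration rather than the mote implementation: it assumes each epoch's load is the fraction of samples that found the channel busy, reads Equation (3.3) as an exponential average seeded with the first epoch's measurement, and uses hypothetical names (`epoch_load`, `ChannelLoadEstimator`).

```python
from typing import List, Optional


def epoch_load(busy_samples: List[bool]) -> float:
    """Fraction of samples in one sensing epoch that found the channel busy."""
    return sum(busy_samples) / len(busy_samples)


class ChannelLoadEstimator:
    def __init__(self, n_epochs: int, alpha: float):
        self.n_epochs = n_epochs   # N in the tuple (N, E, alpha)
        self.alpha = alpha
        self.reset()

    def reset(self) -> None:
        """Discard the running average, e.g. when the send buffer empties."""
        self.count = 0
        self.avg: Optional[float] = None

    def add_epoch(self, phi_n: float) -> Optional[float]:
        """Fold in one epoch's load; return the average after N epochs, else None."""
        if self.avg is None:
            self.avg = phi_n       # initial condition of Equation (3.3)
        else:
            self.avg = self.alpha * self.avg + (1 - self.alpha) * phi_n
        self.count += 1
        if self.count < self.n_epochs:
            return None
        result = self.avg
        self.reset()               # start a fresh run of N epochs
        return result
```

A caller would invoke `reset()` whenever the send buffer is cleared before N epochs complete, which matches the rule that the partial average is then ignored.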

3.3.1.3 Backpressure Message

When the sensed channel load exceeds a threshold (this can simply be Smax, as

shown in later evaluation sections) or the buffer occupancy reaches a certain high

watermark level, congestion is inferred. The node then broadcasts a backpressure

message and at the same time exercises the local congestion policy. Although

there is no guarantee that all neighboring nodes will get this message, at least some

nodes will probably receive it. A node will

continue broadcasting this message up to a certain maximum number of times with

minimum separation as long as congestion persists. Alternatively, a node can set a

congestion bit in the header of every outgoing packet [76] instead of sending explicit

backpressure signals. However, this scheme requires all nodes to overhear traffic

from the neighborhood, which is difficult to realize in non-CSMA-based MACs such

as TDMA.
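
The detection condition and the bounded rebroadcast described above can be sketched as follows. All names and default values here are assumptions for illustration (`congestion_detected`, `BackpressureSender`, the 0.8 watermark taken from the earlier example); the actual thresholds are a tuning matter.

```python
from typing import Optional


def congestion_detected(channel_load: float, load_threshold: float,
                        queue_len: int, queue_capacity: int,
                        high_watermark: float = 0.8) -> bool:
    """True if the channel load exceeds its threshold or the buffer reaches the watermark."""
    return (channel_load > load_threshold
            or queue_len >= high_watermark * queue_capacity)


class BackpressureSender:
    """Limits explicit backpressure messages in count and in spacing."""

    def __init__(self, max_messages: int, min_separation_s: float):
        self.max_messages = max_messages
        self.min_separation_s = min_separation_s
        self.sent = 0
        self.last_sent_s: Optional[float] = None

    def poll(self, now_s: float, congested: bool) -> bool:
        """Return True if a backpressure message should be broadcast now."""
        if not congested:
            self.sent = 0              # congestion cleared: start afresh next time
            self.last_sent_s = None
            return False
        if self.sent >= self.max_messages:
            return False               # maximum number of messages reached
        if (self.last_sent_s is not None
                and now_s - self.last_sent_s < self.min_separation_s):
            return False               # enforce the minimum separation
        self.sent += 1
        self.last_sent_s = now_s
        return True
```

Resetting the counter when congestion clears is one possible design choice; a deployment could equally keep a longer memory to damp repeated bursts.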

The backpressure message provides the basis for the open loop backpressure

mechanism and can also serve as an on-demand “Clear To Send” (CTS) signal, so

that all other neighbors except a single sender (which could be picked randomly,

or a node can assign more chances to more desirable senders) can be silenced at

least for a single packet transmission time. This deals with hidden terminals and

supports an implicit priority scheme in CODA. The “chosen node” embedded in

the backpressure message can be selected based on data type or other metrics that

essentially assign the chosen sender a higher priority to use the bandwidth. All

nodes can share a priority list of data types, with a certain data type having higher

priority than others.

3.3.2 Closed-Loop Multi-Source Regulation

In sensor networks there is a need to assert congestion control over multiple sources

from a single sink in the event of persistent congestion, where the sink plays an

important role as a 1-to-N controller over multiple sources. Note that backpres-

sure alone cannot resolve congestion under all scenarios because our design does

not propagate the congestion signal in cases where nodes do not locally experi-

ence congestion - to do so would be very costly in terms of power and bandwidth

consumption.

The cost of closed-loop flow control is typically high in comparison to simple

open-loop control because of the required feedback signaling. We propose an ap-

proach that would dynamically regulate all sources associated with a particular

data event. Under normal operation sources would regulate themselves at prede-

fined rates (e.g., based on the data dissemination protocol [11] [9]) without the

intervention of closed loop sink regulation.

When the source event rate (r) is less than some fraction η of the maximum

theoretical throughput (Smax) of the channel the source regulates itself. When this

value is exceeded (r ≥ ηSmax), a source is more likely to contribute to congestion and

therefore closed-loop control is triggered. The threshold η here is not the same as the

threshold used in local congestion detection; in fact, η should be much smaller

because of the result suggested in [74]. The source only enters sink regulation if this

threshold is exceeded. At this point a source requires steady periodic feedback (e.g.,

ACKs) from the sink to maintain its rate (r). A source triggers sink regulation

when it detects (r ≥ ηSmax) by setting the regulate bit in the event packets it

forwards toward the sink. Reception of packets with the regulate bit set forces the

sink to send “aggregated ACKs” (e.g., 1 ACK per 100 events received at the sink)

to regulate all sources associated with a particular data event. ACKs could be sent

in an application specific manner. For example, the sink could send the ACK only

along paths it wants to reinforce in the case of a Directed Diffusion [11] application.

The reception of ACKs at sources serves as a self-clocking mechanism allowing the

sources to maintain the current event rate (r).
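The trigger and the aggregated-ACK behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the names EventPacket, make_event_packet, and Sink are hypothetical, and the default of one ACK per 100 events is simply the example value quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class EventPacket:
    source_id: int
    regulate: bool  # the "regulate bit" carried by the event packet


def make_event_packet(source_id, r, eta, s_max):
    """Source side: set the regulate bit once r >= eta * S_max."""
    return EventPacket(source_id=source_id, regulate=(r >= eta * s_max))


class Sink:
    """Sink side: one aggregated ACK per `events_per_ack` regulated events."""

    def __init__(self, events_per_ack=100):
        self.events_per_ack = events_per_ack
        self.pending = 0      # regulated events seen since the last ACK
        self.acks_sent = 0

    def receive(self, pkt):
        """Return True when an aggregated ACK should be sent for this packet."""
        if not pkt.regulate:
            return False      # self-regulating source: no feedback needed
        self.pending += 1
        if self.pending >= self.events_per_ack:
            self.pending = 0
            self.acks_sent += 1
            return True
        return False
```

With these assumed defaults, a sink that receives 250 regulated event packets sends two aggregated ACKs and holds 50 events toward the next one.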

When a source sets its regulate bit it expects to receive an ACK from the sink at

some predefined rate, or better, a certain number of ACKs over a predefined period

allowing for the occasional loss of ACKs due to transient congestion. If a source

receives a prescribed number of ACKs during this interval it maintains its rate (r).

When congestion builds up ACKs can be lost, forcing sources to drop their event

rate (r) according to some rate decrease function (e.g., multiplicative decrease, etc.).

The sink can stop sending ACKs based on its view of network conditions. The sink

is capable of measuring its own local channel loading (ρ) conditions and if this is

excessive (ρ ≥ γSmax) it can stop sending ACKs to sources.

Because the sink expects a certain reporting rate it can also take application-

specific actions when this rate is consistently less than the desired reporting rate

(i.e., the fidelity of the signal [35]). In this case the sink infers that packets are

being dropped along the path due to persistent congestion and stops sending ACKs

to sources. When congestion clears the sink can start to transmit ACKs again, and

as a result, the event rate of the source nodes will increase according to some rate

increase function (e.g., additive increase).
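The sink-side decision described in the last two paragraphs can be sketched as a single predicate. This is only an illustration: the function name, the idea of keeping a short window of recent reporting rates, and the reading of "consistently less" as "every recent sample is below the desired rate" are assumptions made for the example.

```python
def sink_keeps_acking(rho, gamma, s_max, recent_rates, desired_rate):
    """Return True if the sink should continue sending aggregated ACKs.

    rho:          locally measured channel load at the sink
    gamma, s_max: threshold fraction and maximum channel throughput
    recent_rates: recent measured reporting rates (hypothetical window)
    desired_rate: reporting rate the application expects
    """
    channel_overloaded = rho >= gamma * s_max
    # "Consistently" low is modelled here as: every recent sample is low.
    fidelity_too_low = bool(recent_rates) and all(
        rate < desired_rate for rate in recent_rates
    )
    return not (channel_overloaded or fidelity_too_low)
```

When the function returns False the sink withholds ACKs, and the sources decrease their rates; once it returns True again, ACKs resume and the sources increase their rates.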

Because in some applications the sink is powerful in comparison to sensors and

a point of data collection, it can maintain state information associated with specific

data types. By observing packet streams from sources at the sink, if congestion is

inferred the sink can send explicit control signals to those sources to lower their

threshold value η to force them to trigger sink regulation even at a lower rate than

others (i.e., other more important observers). This provides an implicit priority

mechanism as part of the closed-loop congestion control mechanism.

When the event rate at the sources is reset (e.g., via reinforcement [11]) to a value

(r) that is less than some factor η of the maximum theoretical throughput (Smax)

of the channel then the sources begin again to regulate themselves without the need

of ACKs from the sink. Such a multimodal congestion control scheme provides

the foundation for designing efficient and low cost control that can be practically

implemented in sensor networks based on the Berkeley motes series [23][24], as

discussed in Section 3.4. Overall, closed-loop multi-source regulation works closer

to the application layer and operates on a much larger (order of magnitude) time-

scale than its open-loop counterpart.

3.3.2.1 A Hybrid Window-based and Rate-based Algorithm

Figure 3-5: Closed-loop control model. The impact of Wsink and the multiplicative decrease factor d. [Plot: rate (pkt/s) versus time (seconds) for three settings: d=0.5, Wsink=50; d=0.8, Wsink=50; d=0.8, Wsink=25.]

In essence, CODA’s closed-loop control can be realized as a combination of window-based and rate-based schemes. We define the drop rate (i.e., number of packets dropped in the network per received packet at the sink) as an energy metric called the energy tax or ETax. The packet loss rate p is thus $\mathrm{ETax}/(1 + \mathrm{ETax})$. With a source event rate of r, the expected number of event packets received at the sink, which is a measure of application fidelity, is $r(1 - p)$ or $r/(1 + \mathrm{ETax})$. The application fidelity is approximately inversely proportional to ETax.
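The loss-rate expression in the definition of ETax can be checked with a short derivation. As a sketch, assume (this bookkeeping is not spelled out in the text) that every packet sent is either received at the sink or dropped in the network. The symbols D and R, for the numbers of dropped and received packets, are introduced here for illustration only.

```latex
\begin{align*}
\mathrm{ETax} &= \frac{D}{R}, \\
p &= \frac{D}{D + R} = \frac{D/R}{D/R + 1} = \frac{\mathrm{ETax}}{1 + \mathrm{ETax}}, \\
r(1 - p) &= r \cdot \frac{R}{D + R} = \frac{r}{1 + \mathrm{ETax}}.
\end{align*}
```

For example, ETax = 1 (one packet dropped per packet delivered) gives p = 0.5, so half of the source rate reaches the sink.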

Recall a key objective of sensor networks is to maximize the operational lifetime

while delivering acceptable data fidelity to the applications. This demands a mech-

anism to control the network so that the energy tax does not exceed an acceptable

value, which is in essence an application-specific choice. This is the objective func-

tion for CODA’s closed-loop control. Under overload conditions, assume that the

network does not drop ACKs from the sinks (i.e., ACKs are delivered through high

priority queues), and the majority of packet loss in the network is due to congestion.

We can then realize this objective through a hybrid rate-based and window-based

algorithm. This algorithm governs the window sizes at both source and sink with

the ETax in the following equation:

$$W_{src} = r(\tau_f + \tau_b) + W_{sink}(1 + \mathrm{ETax}) \qquad (3.4)$$

Wsrc is the window size or the number of event packets a source is allowed to send

at the current rate r without receiving an ACK from the sink. Wsink is the window

size or the number of accumulated event packets a sink receives before it sends

an aggregated ACK. r is the source rate during the current observation cycle and

(τf + τb) is the sum of the forward and backward one-way delays between a source

and the sink. The algorithm is such that, if a source does not receive an ACK after

it has sent out Wsrc event packets at rate r, it should decrease its rate from r to d · r

(d < 1 multiplicative decrease). If later an ACK is received at the source within

the next observation cycle Wsrc, then the source increases its rate from r to r + b

(additive increase). In other words, this control scheme ensures that a source would

cut its rate whenever the perceived energy tax rises beyond an acceptable value

ETax. Wsink determines the control overhead and the length of the decision period

that controls the convergence time of the rate control algorithm. To understand

the tradeoff between the control overhead and the convergence time, we numerically

evaluate Equation (3.4), simulating a network that experiences congestion when the

source rate exceeds 3 pkt/s but no congestion when the source rate is below 1.5

pkt/s.
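The window computation of Equation (3.4) and the rate update that goes with it can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function names, the default values of d and b, and the optional rate floor r_min are assumptions chosen for the example.

```python
def source_window(r, tau_f, tau_b, w_sink, etax):
    """Equation (3.4): W_src = r * (tau_f + tau_b) + W_sink * (1 + ETax)."""
    return r * (tau_f + tau_b) + w_sink * (1 + etax)


def next_rate(r, ack_received, d=0.5, b=0.1, r_min=0.0):
    """Rate for the next observation cycle.

    ACK received within the cycle:    additive increase, r -> r + b.
    No ACK after W_src packets sent:  multiplicative decrease, r -> d * r.
    """
    if ack_received:
        return r + b
    return max(r_min, d * r)
```

For instance, with r = 2 pkt/s, forward and backward delays of 0.5 s each, Wsink = 50, and an ETax threshold of 1, source_window returns 102 packets: 2 packets cover the round-trip delay and 100 are needed for 50 to arrive at the tolerated loss level.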

In Figure 3-5, we evaluate the impact of two values of multiplicative decrease

factor d and two values of Wsink. For a fixed Wsink (e.g., equal to 50 or 2% control

overhead for sending ACKs), we observe that the source rate with a smaller d (i.e.,

0.5) drops more quickly than a source with a larger d value (i.e., 0.8). However, the

rate with a smaller d oscillates and thus takes a longer time to restore and converge

to an acceptable rate that avoids congestion. Therefore, a smaller d can reduce the

energy tax but most likely will hurt the fidelity because of the longer convergence

time. On the other hand, a larger d would have a larger energy tax because of the

slower rate reduction, even though it could achieve higher data fidelity because of

the finer levels of granularity of rate reduction and thus can converge faster to an

acceptable rate. Note that Wsink controls the length of the “observation cycle” and

thus a smaller Wsink can accelerate the rate reduction process. In Figure 3-5, we can

see that a smaller Wsink (i.e., 25) causes the rate of a source with d = 0.8 to decrease

as fast as d = 0.5. This allows the algorithm to achieve the same reduction in energy

tax while maintaining high fidelity, at the expense of higher control overhead (i.e.,

an increase from 2% to 4%) because of the smaller value of Wsink. We study these

parameter tradeoffs in our mote testbed and discuss the result in Section 3.4.5 under

real-world experimental conditions.

3.4. Experimental Sensor Network Testbed

In this section, we discuss experiences implementing CODA on a real sensor system

using the TinyOS platform [19] on Mica motes [23]. We report evaluation results,

including measuring the β value, tuning the parameters for accurate channel load

measurement, and finally, evaluating CODA with a generic data dissemination ap-

plication.

The sensor device has an ATMEL 4 MHz, low power, 8-bit microcontroller with

128K bytes of program memory, 4K bytes of data memory, and a 512 KB external

flash that serves as secondary storage. The radio is a single channel RF transceiver

operating at 916 MHz and is capable of transmitting at 10 Kbps using on-off-keying

encoding. For all our experiments we use a Non-Persistent CSMA MAC on top of

the Mica motes.

3.4.1 Measuring the β Value

An important decision that must be made when using CODA’s open-loop control

mechanism described in Section 3.3.1 is the congestion threshold at which we should

start applying backpressure. A first step in making this decision is to determine the

maximum channel utilization achievable with the radio and MAC protocol being

used.

As noted in Equation 3.1 in Section 3.2.1, for the CSMA MAC protocol, the

channel utilization in a wireless network depends on the propagation delay between

the nodes with the maximum physical separation that can still interfere with each

other’s communications, and the channel idle detection delay. In sensor networks,

the maximum physical separation is typically tens of meters or less and as such the

propagation delay is negligible for most purposes. Thus, if the channel idle detection

delay is also negligible, CSMA should provide almost 100% utilization of the offered

load of the channel. However, in practice, the utilization is much less due to the

latency in the idle channel detection at the MAC layer. We can use the parameter β

as defined in Equation 3.2 to predict how much this latency degrades the maximum

channel utilization.

We measure the β value for the Mica mote using a simple experimental setup

involving two motes both running TinyOS [19]. Stopwatches inserted in the MAC

provide the basis for the measurement of β. Figure 3-6 shows the placement of the

stopwatches within the receive and transmit flows of the MAC layer. Mote A starts

its watch when the MAC receives a packet to be sent from the upper layers of the

network stack and stops its watch when it detects the start-symbol of an incoming

packet from mote B. The locations of the stopwatch trigger points in the mote B

MAC are the same as in mote A, but the operations are reversed. It starts the

watch when it receives a packet and stops it when it starts to transmit.

A single iteration of the measurement consists of mote A sending a packet to

mote B and mote B immediately reflecting the packet back to mote A. Due to the

symmetry inherent in the placement of the stopwatch trigger points, β is propor-

tional to half the difference between Stopwatch A and Stopwatch B:

$$\beta = \frac{\mathrm{Stopwatch}_A - \mathrm{Stopwatch}_B}{2 \times (\text{Packet transmission time})} \qquad (3.5)$$

Over 50 iterations, we measure an average β of 0.030 ± 0.003 (with confidence

level of 95%) for the Mica motes. Substituting β into Equation 3.1, the standard

expression for CSMA throughput (Smax), we predict a maximum channel utilization

of approximately 73%. The same measurement procedure executed on the Mica2

mote predicts a maximum throughput of approximately 36% with the default MAC

in TinyOS-1.1.0. Note that the measurement of β is simply a way to provide theo-

retical rationale to determine a reasonable threshold. Alternatively, one can always

determine a suitable threshold experimentally.
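The per-iteration computation of Equation (3.5) and the averaging over iterations can be sketched as follows. The function names are hypothetical, and the confidence interval uses a normal approximation (z = 1.96), which is an assumption: the text does not say how the 95% interval was obtained.

```python
import math
import statistics


def beta_from_stopwatches(stopwatch_a, stopwatch_b, packet_time):
    """Equation (3.5): half the stopwatch difference, in packet-time units.

    All three arguments must use the same time unit.
    """
    return (stopwatch_a - stopwatch_b) / (2.0 * packet_time)


def mean_with_ci95(samples):
    """Return (mean, half_width) using a normal approximation (z = 1.96)."""
    mean = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width
```

Feeding the β values from all iterations into mean_with_ci95 yields a summary in the same "average ± half-width" form as the result reported above.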

3.4.2 Channel Loading Measurement and Utilization

Setting the channel loading threshold that will trigger the backpressure mechanism

requires consideration of the tradeoff between energy savings and fidelity. Conserv-

ing energy implies a strategy that senses the channel load sparsely in time (fewer

timer interrupts and processing). However, the channel load measurement is most

accurate when sensing densely in time. As a compromise between dense and sparse

sampling, we use the scheme discussed in Section 3.3.1.2 where the channel load

is measured for N consecutive epochs of length E (with some fixed channel state

sampling rate within this epoch), and an exponential average, with parameter α,

is calculated to represent the sensed channel load. The problem then becomes to

manipulate these three parameters (N, E, α) so that the node’s sensed channel load is as close as possible to the actual channel load.

Figure 3-6: MAC layer stopwatch placement for β measurement. Diagram of receive and transmit state flows in the TinyOS MAC component code. The stopwatch start/stop trigger points are marked with an X. [Diagram: motes A and B, each showing the transmit flow (T_carrier sense, T_encode, Tx Byte) and the receive flow (T_preamble search, T_start symbol search, T_read bits, T_decode, Rx Byte) below the upper layers, with Start Timer and Stop Timer trigger points; T_prop is assumed to be approximately 0.]

To do this optimization experimentally, we use two motes running TinyOS with

a CSMA MAC. Mote S is a randomized CBR source that sends at 4 packets per

second. Mote R is the receiver that senses the channel load using the scheme men-

tioned in the previous paragraph. The channel is sampled once per millisecond for

each epoch E for a total of N epochs. Using this setup we tested all combinations

of N ∈ {2, 3, 4, 5}; E ∈ {100ms, 200ms, 300ms} and α ∈ {0.75, 0.80, 0.85, 0.90}. A

time series average, of the exponential averages, is taken over 256 seconds for each

combination (1024 packets are sent). Using this method we found that the combi-

nation (4, 100ms, 0.85) yielded the average sensed channel load at mote R closest to

the actual average channel load (in % terms) calculated by mote S with an accuracy

of 0.16±0.07. In general, we observe that the detection accuracy is not very sensitive

(the difference is within 5%) to these three parameters. Therefore, manual calibration

for each new CSMA-based radio might not be necessary. Our experience with

the new generation of Mica2 motes, which use a different radio/MAC than Mica, is

consistent with this conjecture.
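The (N, E, α) measurement scheme can be sketched as below. The precise update rule is defined in Section 3.3.1.2, so the form used here (new average = α × old average + (1 − α) × epoch load, seeded with the first epoch when no earlier average exists) is an assumption made for illustration, as are the function names.

```python
def epoch_load(samples):
    """Fraction of channel-state samples in one epoch that were busy.

    `samples` holds one entry per sampling instant: 1 = busy, 0 = idle.
    """
    return sum(samples) / len(samples)


def sensed_channel_load(epochs, alpha, previous=None):
    """Exponentially average the per-epoch loads of N consecutive epochs.

    `previous` carries over an earlier average, if there is one.
    """
    average = previous
    for samples in epochs:
        load = epoch_load(samples)
        if average is None:
            average = load            # seed with the first epoch
        else:
            average = alpha * average + (1.0 - alpha) * load
    return average
```

With the combination (4, 100ms, 0.85) and one sample per millisecond, `epochs` would hold four lists of 100 samples each.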

In order to address the more realistic case of a node that both listens to and

forwards packets, a third mote F is added to the previous experimental setup with

all motes well within the transmission range of each other. Mote F forwards packets

sent from mote S in a random interval between 30 and 130 milliseconds after it

receives them, and also senses the channel load using the same scheme with the same

(N,E, α) parameters that mote R uses. There is now contention for the channel

since there are two packet sources (motes S and F). To minimize the probability

of dropping packets from the application layer because of buffer limitations, we use

a buffer size of 3 packets at the MAC layer. This decision is based on the queue

performance result shown in Figure 3-4, where we observe that the average queue size is 3 before the channel saturates. Mote R remains as a reference node to check the channel load sensed by mote F and also to keep track of the number of packets sent by motes S and F to calculate the delivery ratio.

Figure 3-7: A limit on measured channel load is imposed by β. Nominal load curve increases with constant slope as the source packet rate increases, while the measured load saturates at a value below 70%. [Plot: channel load/delivery ratio (%) versus source rate (packets/sec); curves: nominal load, measured load, application delivery ratio, MAC delivery ratio.]

With mote S sending 1024 packets, we measure the packet delivery ratio and

channel load sensing accuracy using different source packet rates (viz. 4, 5, 6.25,

7.69, 9.09, 10, 16.67). The average sensed channel load at R and F, along with the

nominal channel load (calculated based strictly on offered load), are plotted against

the source packet rate in Figure 3-7.

Figure 3-7 shows the β-dependency of the CSMA MAC on the Mica mote. We

can see from the plot of the nominal channel load that the offered load is more

than enough to saturate the channel at points above 7.69 packets per second (source

packet rate). However, we can also observe that regardless of the source packet rate,

the measured channel load/utilization saturates below 70%. This is in agreement

with the limitation predicted by β (as shown in Section 3.4.1), if we can assume that

packet collision and buffer limitation do not contribute significantly to the observed

reduced channel load. To verify this assumption, we analyze the packet delivery

ratio at both the MAC and application layer in Figure 3-7.

We define the MAC packet delivery ratio as the percentage of packets sent by the

MAC layer at motes S and F that are actually received by mote R. The application

delivery ratio is the percentage of packets sent by the application layer (i.e., passed

down to the MAC queue) at motes S and F that are actually received by mote R.

Figure 3-7 shows that both application and MAC delivery ratios match each other

closely, indicating that nearly every packet that gets into the MAC queue is sent and

received successfully, which rules out packet collisions and buffer overflow as

causes of the reduced channel load.
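The two delivery ratios defined above differ only in their denominators, as the following sketch shows. The function and argument names are hypothetical.

```python
def delivery_ratio(received, sent):
    """Percentage of `sent` packets that were received (0.0 if none sent)."""
    return 100.0 * received / sent if sent else 0.0


def delivery_ratios(received_at_r, sent_by_mac, sent_by_app):
    """Return (MAC delivery ratio, application delivery ratio) in percent.

    received_at_r: packets from motes S and F received by mote R
    sent_by_mac:   packets actually transmitted by the MAC layers of S and F
    sent_by_app:   packets passed down to the MAC queues of S and F
    """
    return (delivery_ratio(received_at_r, sent_by_mac),
            delivery_ratio(received_at_r, sent_by_app))
```

If the two ratios are close, few packets are lost between the MAC queue and the air; a large gap would instead point to drops inside the sender.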

3.4.3 Energy Tax, Fidelity Penalty, and Power

We define three metrics to analyze the performance of CODA on sensing applica-

tions:

• Average Energy Tax - this metric calculates the ratio between the total number of packets dropped in the sensor network and the total number of packets received at the sinks over an experiment, as introduced in Section 3.3.2.1. Since packet transmission/reception consumes the main portion of the energy of a node, the number of wasted packets per received packet directly indicates the energy saving aspect of CODA when compared to the case of systems without CODA.

• Average Fidelity Penalty - we define the data fidelity as the delivery to the sink of the required number of data event packets within a certain time limit (i.e., event delivery rate). This metric measures the difference between the average number of data packets received at a sink when using CODA and when using the ideal scheme discussed in Appendix A. Since CODA’s control policy is to rate control the sources during periods of congestion, fidelity is necessarily degraded on the average. This fidelity difference, when normalized to the ideal fidelity obtained at the sink, indicates the fidelity penalty for using CODA. A lower fidelity penalty is desired by CODA to efficiently alleviate congestion while attempting not to impact the system performance seen by sensing applications.

• Power - this metric calculates the ratio of data fidelity to energy tax. Traditional end-to-end congestion control schemes often define power as the ratio of throughput to delay where the objective function is to maximize the power. We borrow the same idea but maximize the power by operating the network to minimize energy tax (thereby maximizing the operational lifetime of the network) while delivering acceptable data fidelity to the applications. This is the objective of our closed-loop control.

Figure 3-8: Experimental sensor network testbed topology. Nodes are well connected. Packets are unicast. [Diagram: sources Src-1, Src-2, and Src-3 and a single sink.]
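The three metrics defined above reduce to simple ratios. The sketch below uses hypothetical function names and assumes the packet counts (or rates) have already been collected over an experiment.

```python
def energy_tax(dropped, received):
    """Packets dropped in the network per packet received at the sinks."""
    return dropped / received


def fidelity_penalty(received_with_coda, received_ideal):
    """Fidelity difference normalized to the ideal fidelity at the sink."""
    return (received_ideal - received_with_coda) / received_ideal


def power(fidelity, tax):
    """Ratio of data fidelity (e.g., event delivery rate) to energy tax."""
    return fidelity / tax
```

For example, 50 packets dropped for 100 received gives an energy tax of 0.5, and a delivery rate of 4 packets per second at that tax gives a power of 8.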

3.4.4 Open-loop Control

We create a simple generic data dissemination application to evaluate our congestion

control scheme in a wireless sensor network. The simple application implements the

open-loop fast time scale component of our scheme using TinyOS and runs on our

Mica mote testbed. When an intermediate (non source/sink) node receives a packet

to forward, it enables channel load sensing. It disables sensing when its packet

queue is emptied. If the channel load exceeds a given threshold value (e.g., 73%

as discussed in Section 3.4.1) during the sensing period or its buffer overflows, it

transmits a backpressure packet. The sources use a multiplicative rate reduction

policy. When a source receives a backpressure message, it reduces its rate by half.

A minimum rate of 2 packets per second is imposed such that a source sending

at this rate will ignore subsequent backpressure messages. An intermediate node

stops transmitting for a random amount of time (up to 400 ms) when it receives

a backpressure message except if it is the “chosen node”, as discussed in Section

3.3.1.3. No link-layer ACKs are used in any testbed experiments.
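The source and intermediate-node reactions to a backpressure message can be sketched as follows. The function names are hypothetical, and two details are assumptions: a halved rate that would fall below the 2 packets per second floor is clamped to the floor, and the random silence period is drawn uniformly from 0 to 400 ms.

```python
import random

MIN_RATE_PPS = 2.0        # minimum source rate from the text
MAX_BACKOFF_MS = 400.0    # upper bound on the random silence period


def on_backpressure_at_source(rate_pps):
    """Halve the source rate, never going below the minimum rate."""
    if rate_pps <= MIN_RATE_PPS:
        return rate_pps   # at the floor: backpressure is ignored
    return max(MIN_RATE_PPS, rate_pps / 2.0)


def backoff_ms_at_intermediate(node_id, chosen_node_id, rng=random):
    """Silence period for an intermediate node; the chosen node keeps sending."""
    if node_id == chosen_node_id:
        return 0.0
    return rng.uniform(0.0, MAX_BACKOFF_MS)
```

Starting from 8 packets per second, two backpressure messages bring a source to 4 and then 2 packets per second, after which further messages have no effect.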

The experimental sensor network testbed topology is shown in Figure 3-8. Pack-

ets are unicast, with the arrows in Figure 3-8 indicating the unicast paths. The

topology represents a dense deployment of motes so that the radio range of many

of the motes in the graph overlap. The local congestion policy of the intermediate

nodes can include the designation of a “chosen parent” (i.e., the chosen node, as

discussed in Section 3.3.1.3) or set of parents, such that a backpressure message

sent by this node will invoke the suppression method at its neighbors except for

the chosen parent(s). This supports traffic prioritization. In Figure 3-8, the thick

arrows show the “chosen paths”. Paths funnel events toward the sink node. The

three source nodes provide a high traffic load to the network, representing a data

impulse. The source rates are Src-1: 8 pps (packets per second), Src-2: 4 pps, and Src-3: 7 pps.

Figure 3-9: Improvement in energy tax with small fidelity penalty using CODA. Priority of Src-2 evident from the fidelity penalty results. [Bar chart: fidelity penalty and energy tax for Src-1, Src-2, and Src-3, each with and without CODA.]

The sink node counts the number of packets it receives from each respective

source. Each source node counts the number of packets it actually sends over the

air and the number of packets the application tries to send. The difference between

these last two counters measures the number of packets a source’s MAC layer drops.

Using ten 120-second trials, we obtain average values for the packets received,

sent, and attempted to be sent but failed (e.g., because of a busy channel, buffer

overflow, etc.) corresponding to each of the three sources. From this measured

data, we calculate the energy tax and fidelity penalty for each of the three sources.

Figure 3-9 shows the result of experiments with and without CODA enabled. We can see from the figure that, with a small fidelity penalty compared with a non-CODA system, we can achieve a 3x reduction in energy tax on average. We observe that without CODA the fidelity penalty is the same for all three sources. With CODA the penalty for Src-2 is less than that of the other two sources. In contrast with the other sources, the fidelity penalty for Src-2 is lower with CODA than without CODA. This is because the data type of Src-2 has the highest priority. When CODA is used in the presence of congestion, the suppression mechanism favors Src-2 packets over the others.

Figure 3-10: Experimental sensor network testbed topology to capture the funneling effect in a larger network with sparsely located sources. [Diagram: sources Src1, Src2, and Src3 and a single sink.]

3.4.5 Combining Open-loop and Closed-loop Control

We reuse the application described above but increase the network size by adding

more motes into the testbed to capture the funneling effect in a larger network as

shown in Figure 3-10. We implement CODA’s closed-loop control component, as

discussed in Section 3.3.2, into the application running in parallel with the open-

loop component. The first experiment examines the rate control dynamics of CODA.

Figure 3-11 presents time series traces taken at one of the sources in the topology,

i.e., Src1 in Figure 3-10. Wsink is set to 25 (representing 4% of control overhead)

and we examine our closed-loop model using two values for multiplicative factor d,

of 0.5 and 0.8, respectively. The two time series traces (source rate) closely resemble

the numerical example traces shown in Figure 3-5. We observe that the source rate oscillates when using a smaller d = 0.5 and converges more slowly when compared to its counterpart when d = 0.8. In the experiment, the open-loop control component is running in parallel and backpressure signals originate from the mote closest to the sink (i.e., at the funnel neck). However, we observe that none of the signals propagate back to any of the sources, confirming our postulation regarding the limitations of open-loop control discussed in Section 3.2.2.2.

Figure 3-11: Time series traces that present the rate control dynamics and the event fidelity/delivery performance of CODA. CODA’s rate control scheme does not increase the degree of variability to the event delivery performance. [Plot: source rate (pkts/sec) and fidelity or aggregate delivery rate (pkts/sec) versus time (secs); traces: Src-1 rate with d=0.5, Src-1 rate with d=0.8, event delivery with CODA, event delivery with no CODA.]

To understand the impact of our rate control algorithm on the stability of event

delivery/fidelity at the sink, we plot the event delivery rate measured at the sink as

time series traces in Figure 3-11. While the traces exhibit a high degree of variability

even without any rate control (trace with no CODA), we observe that CODA rate

control does not increase the degree of variation. Rather, the trace with CODA is

more stable (has less variation) after the rate converges to a value that is determined

by the ETax threshold in the closed-loop model.

The next experiment examines our closed-loop control model that controls the tradeoffs between the perceived energy tax of the network and the perceived application fidelity. As discussed in Section 3.3.2.1, a smaller value of d yields a larger saving of energy tax but negatively impacts the data fidelity. Similarly, allowing a smaller value of ETax threshold in the network (Equation 3.4) would reduce Wsrc and hence shorten the observation cycle. This makes the control algorithm more sensitive to packet loss, so it reduces the rate and energy tax more aggressively but adversely affects the data fidelity. To achieve a balance and obtain the best benefit out of the closed-loop control, we calculate the power metric, defined in Section 3.4.3 as the ratio of data fidelity to energy tax, and present the results with different control parameters in Figure 3-12. The result clearly indicates that a smaller value of ETax almost always yields a higher power. Therefore, a smaller observation cycle can gain more in energy tax than it harms the fidelity. On the other hand, although a smaller value of d gives a higher average power, the gain is less stable as observed in the high degree of variability (indicated by the error bars, which represent the corresponding 95% confidence intervals).

Figure 3-12: Tradeoff between fidelity and energy tax that obtains the most benefit, i.e., maximum “power”, for the network. [Plot: power (pkts/sec) versus preset ETax threshold; curves: d=0.5, Wsink=12 (8% overhead); d=0.8, Wsink=12 (8% overhead); d=0.5, Wsink=25 (4% overhead); d=0.8, Wsink=25 (4% overhead).]

Finally, Figure 3-13 presents the performance gain of CODA compared to the cases without CODA under different network workloads. We observe that CODA is able to prevent the network power from degrading exponentially when the workload increases.

Figure 3-13: Power of CODA versus non-CODA in an experimental Mica motes testbed. [Plot: power (packets/sec) versus aggregate source rate (packets/sec); curves: CODA, no CODA.]

3.5. Simulation Results

We use packet-level simulation to study the scalability performance and the network

dynamics of CODA in large networks.

3.5.1 Simulation Environment

We implemented both open-loop backpressure and closed-loop regulation in the ns-

2 [60] simulator in their simplest instantiation; that is, a simple AIMD function

is implemented at each sensor source by an application agent. The reception of

backpressure messages at the source, or, in the case of closed-loop control, not

receiving a sufficient number of ACKs from the sink over a predefined period of time,

will cause a source to cut its rate by half (i.e., d = 0.5). For intermediate nodes

(non source/sink), local congestion policy is such that backpressure messages will

halt a node’s transmission for a small random number of packet times (i.e., packet

transmission times) unless a node is the chosen node specified in the backpressure

message, as discussed in Section 3.3.1.3.

In all our experiments, we use random topologies with different network sizes.

We generate sensor networks of different sizes by placing nodes randomly in a square

area. Different sizes are generated by scaling the square size and keeping the nominal

radio range (40 meters in our simulations) constant in order to approximately keep

the average density of sensor nodes constant. In most of our simulations, we study

five different sensor fields with size ranging from 30 to 120 nodes in increments of 20

nodes. For each network size, our results are averaged over five different generated

topologies and each value is reported with its corresponding 95% confidence interval.
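The constant-density scaling described above can be sketched as follows. The function names are hypothetical, and the density value is a free parameter in this sketch because the text fixes only the radio range (40 meters), not the node density itself.

```python
import math
import random


def square_side(n_nodes, density):
    """Side length (m) of the square holding n_nodes at `density` nodes/m^2."""
    return math.sqrt(n_nodes / density)


def random_topology(n_nodes, density, seed=None):
    """Place n_nodes uniformly at random in a square scaled to keep density constant."""
    rng = random.Random(seed)
    side = square_side(n_nodes, density)
    return [(rng.uniform(0.0, side), rng.uniform(0.0, side))
            for _ in range(n_nodes)]
```

Because the side grows with the square root of the node count, a network with four times as many nodes occupies a square with twice the side length and the same average density.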

Our simulations use a 2 Mbps IEEE 802.11 MAC provided in the ns-2 simulator,

with some modifications. First, we disable the use of RTS/CTS exchanges and link-

layer ARQ for data packets. We do this for the reasons discussed in Section 3.2.1

because we want to capture the realistic cases where reliable delivery of data is not

needed and the fidelity can be compromised to save energy. Although we use IEEE

802.11 in the simulation, most sensor platforms use simpler link technologies where

the ARQ is not enabled by default, (e.g., Berkeley motes). Next, we added code to

the MAC to measure the channel loading using the epoch parameters (N = 3, E =

200ms, α = 0.5), as defined in Section 3.3.1.2. The choice of the parameters is not

crucial because the ns-2 simulator does not model the details of the IEEE 802.11

physical layer. The MAC broadcasts backpressure messages when the measured

channel load exceeds a threshold of 80%. We have added code to model the channel

idle detection delay with a β of 0.01, which yields an Smax of 80%. Closed-loop multi-source regulation is implemented as an application agent attached to source-sink pairs. Wsink is set to 100 and the ETax threshold to 2 for the closed-loop control parameters.

Figure 3-14: Network of 30 nodes. Sensors within the range of the event epicenter, which is enclosed by the dotted ellipse, generate impulse data when an event occurs. The circle represents the radio range (40 m) of the sensor. [Diagram: node placement with the region labeled Event Epicenter.]

Finally, we use Directed Diffusion [11] as the routing core in the simulations

since our congestion control fits nicely into the diffusion paradigm, and since doing

so allows insight into CODA’s interaction with a realistic data routing model where

congestion can occur.

In most of our simulations, we use a fixed workload that consists of 6 sources

and 3 sinks. All sources are randomly selected from nodes in the network. Sinks

are uniformly scattered across the sensor field. A sink subscribes to 2 data types

corresponding to two different sources. This models the typical case in which there

are fewer sinks than sources in a sensor field. Each source generates packets at a

different rate. Event packets are 64 bytes and interest packets are 36 bytes in size [11].


3.5.2 Results and Discussion

We evaluate CODA under the three distinct congestion scenarios discussed in the

Introduction section to best understand its behavior and dynamics in responding

to the different types of congestion found in sensor networks. First we look at a

densely deployed sensor field that generates impulse data events. Next, we examine

the behavior of our scheme when dealing with transient hotspots in sparsely deployed

sensor networks of different sizes. Last, we examine the case where both transient

and persistent hotspots occur in a sparsely deployed sensor field generating data at

a high rate.

3.5.2.1 Congestion Scenario - Dense Sources, High Rate

We simulate a network with 30 nodes, as shown in Figure 3-14, emulating a disaster-

related event (e.g., fire, earthquake) that occurs 10 seconds into the simulation. Each

node within the epicenter region, which is enclosed by the dotted ellipse, generates

at least 100 packets per second sent toward the sinks, shown as filled black dots in

the figure.

Figure 3-15 shows both the number of packets delivered and the packets dropped

as time series traces. For the packet delivery trace, we count the number of data

packets a sink receives every fixed interval of 500ms, which indicates the fidelity

of the data samples. For the packet dropped trace, we count the number of data

packets dropped within the whole network every 500ms.
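The per-interval counting used for both traces can be sketched as below. This is only an illustration of the binning step: the function name, the input format (a list of packet timestamps in seconds), and the sample timestamps are assumptions, not part of the simulation scripts.

```python
from collections import Counter


def packets_per_interval(timestamps_s, interval_s=0.5, duration_s=30.0):
    """Count packets falling in each fixed-length interval of a trace.

    Returns a list with one count per interval; timestamps outside
    [0, duration_s) are ignored.
    """
    n_bins = int(round(duration_s / interval_s))
    counts = Counter()
    for t in timestamps_s:
        idx = int(t // interval_s)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return [counts.get(i, 0) for i in range(n_bins)]


# Hypothetical receive times at a sink (seconds into the simulation).
received = [0.1, 0.4, 0.5, 10.2, 29.9, 30.0]
trace = packets_per_interval(received)
print(len(trace), trace[0], trace[1], trace[20], trace[59])
```

The same function applies to the drop trace by passing the timestamps of packets dropped anywhere in the network instead of those received at a sink.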

From the traces, it is clear that the difference in data delivery (fidelity) with and

without CODA is small, while the number of packets dropped is an order of magni-

tude smaller (hence the energy savings) when congestion control is applied. We can

also observe from the plot that the congestion is effectively relieved within 2 to 3

seconds. This shows the adaptive property of CODA. The delivery plot reflects the

Figure 3-15: Time series traces for densely deployed sources that generate high rate data. (Plot: number of packets, 0 to 1000, versus time, 0 to 30 s; traces: packet-delivery-trace-CC, packet-delivery-trace-noCC, packet-drop-trace-CC, packet-drop-trace-noCC.)

real system goodput, which is highly dependent on the system capacity, indicating

the maximum channel utilization. When impulses happen, the channel is saturated

so it can deliver only a fraction of the event’s data. CODA’s open-loop backpressure

(even with a very simple policy) adapts well to operate close enough to the channel

saturation, as shown in Figure 3-15, while efficiently alleviating congestion. This

greatly reduces the number of packets dropped thereby saving energy, which is the

key objective function for CODA. The same simulation scenario is repeated 5 times

using different topologies of the same size. Overall, CODA achieves packet

(energy) savings of up to 88 ± 2%, while the fidelity penalty paid is only 3 ± 11%.

3.5.2.2 Congestion Scenario - Sparse Sources, Low Rate

To examine the ability to deal with transient hotspots, in these simulations all six

sources send at low data rates, at most 20 packets per second. Four of the sources

are randomly selected so that they are turned on and off at a random time between

10 and 20 seconds into the simulation.

Figure 3-16: (a) Packet delivery and (b) packet drop time series traces for a 15-node network with low rate traffic. The plots show the traces for three cases: when only open-loop control (OCC) is used, when both open-loop and closed-loop control (CCC) are enabled, and when congestion control is disabled (noCC). (Axes: number of packets versus time, 0 to 30 s.)

Figure 3-17: Average energy tax and fidelity penalty as a function of the network size when only CODA’s open loop control is used. (Axes: ratio versus network size in number of nodes; series: E.Tax.noCODA, E.Tax.CODA, Fidelity.Penalty.CODA.)

Figure 3-16 shows the packet delivery and packet drop traces for one of the simulation

sessions in a network of 15 nodes. Observe in Figure 3-16(a) that the difference in

fidelity between the three cases is small, except for around 20 seconds into the trace,

where only open-loop control is used. Figure 3-16(b) shows a large improvement in

energy savings (i.e., packet drop reduction) especially when closed-loop control is

also enabled together with open-loop control. Again, the figure shows that at around

20 seconds into the trace, open-loop control cannot resolve congestion as there is no

reduction in the number of dropped packets and there is low delivery during this pe-

riod. This is because transient hotspots turn into persistent congestion at around 18

seconds into the trace until four of the sources turn off after 20 seconds. Open-loop

control cannot deal with persistent congestion unless the hotspots are close to the

sources, as discussed in Section 3.2.2.2. On the other hand, the trace corresponding

to closed-loop regulation also shows that the fidelity is maintained while effectively

alleviating congestion with only a small amount of additional signaling overhead.

Importantly, the signaling cost of CODA is less than 1% with respect to the number

of data packets delivered to the sink.

Figure 3-18: Energy tax as a function of network size for high and low rate data traffic. The difference between the data points with and without CODA indicates the energy saving achieved by CODA. (Axes: ratio versus network size, 20 to 120 nodes; series: High.noCODA.E.Tax, Low.noCODA.E.Tax, High.CODA.E.Tax, Low.CODA.E.Tax.)

The same behavior can be observed in Figure 3-17, where the two metrics (i.e.,

energy tax and fidelity penalty) are plotted as a function of the network size. Note

that when using only open-loop control, the energy savings show a large variation,

indicated by the error bars that represent 95% confidence intervals. This indicates

that congestion is not always resolved, especially for larger-sized networks. This is

because in larger networks, persistent hotspots, which localized open-loop control is

unable to resolve, are more likely to occur given the long routes between source-sink

pairs. When closed-loop control is also enabled, the energy savings are large (up to

500%) with a small variation, and increase with the growing network size, as shown

in Figure 3-18.
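For readers reproducing error bars of this kind, one common way to obtain a mean and a 95% confidence interval from a handful of repeated runs is sketched below. The text does not state how the intervals were computed, so the use of a Student-t interval is an assumption, and the sample energy-tax values are hypothetical rather than taken from the simulations.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 9: 2.262}


def mean_with_ci95(values):
    """Return (mean, half_width) of a 95% t-interval for a small sample."""
    df = len(values) - 1
    if df not in T_CRIT_95:
        raise ValueError("no tabulated critical value for this sample size")
    mean = statistics.mean(values)
    half_width = T_CRIT_95[df] * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Hypothetical energy-tax ratios from 5 runs on different topologies.
runs = [4.1, 5.0, 4.6, 5.3, 4.5]
mean, half = mean_with_ci95(runs)
print(f"{mean:.2f} +/- {half:.2f}")
```

A wide half-width relative to the mean, as seen for open-loop control alone, signals that the outcome depends strongly on the particular topology drawn for each run.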

Overall, the gain from using open-loop control in larger networks is limited.

Hotspots are likely to persist when the sources are generating data at a low rate

because of possible long routes and the funneling effect. Enabling closed-loop control

even at low source rates can improve the performance significantly, with the addition

of a small overhead for the control packets from sinks. Note that the amount of


overhead is only a small fraction (i.e., 1%) of the number of data packets that the

sink receives. This result suggests that except for small networks, always enabling

closed-loop control is beneficial, regardless of the source rate. This is an important

observation that guides the use of CODA’s mechanisms in sensor networks.

3.5.2.3 Congestion Scenario - Sparse Sources, High Rate

We examine the performance of our scheme in resolving both transient and per-

sistent hotspots where sparsely located sources generate high data traffic. In the

simulations, all sources generate 50 packets per second data traffic over the 30 sec-

ond simulation time. Both open-loop and closed-loop control are used throughout

the simulations. Figure 3-18 shows that CODA can obtain up to 15 times or 1500%

energy savings. Figure 3-19 shows that CODA can maintain a relatively low fidelity

penalty of less than 40% as compared to the ideal scheme. Observe that energy tax

increases as the network grows in general. However, in Figure 3-18 we can see that

the energy tax actually decreases when the network grows beyond the size of 100

nodes (the same behavior can be observed in Figure 3-17). Under

a fixed workload, which is the case in our simulations, a network’s capacity can

increase when the network grows beyond a certain size, because the data

dissemination paths from the sources to the sinks spread across a broader network

and the funneling effect is lessened.

3.6. Related Work

There is a growing interest in the problem of congestion in sensor networks. The

need for congestion avoidance techniques is identified in [35] while discussing the

infrastructure tradeoffs for sensor networks. Tilak, Abu-Ghazaleh, and Heinzelman

[35] show the impact of increasing the density and reporting rate on the performance

Figure 3-19: Fidelity penalty as a function of the network size for high and low rate data traffic. (Axes: ratio, 0 to 1, versus network size, 20 to 120 nodes; series: LowRate.Fidelity.Penalty, HighRate.Fidelity.Penalty.)

of the network. While the authors do not propose any congestion avoidance mech-

anisms, they do note that any such mechanism must converge on a reporting rate

that is just sufficient to meet the performance or fidelity of the sensing application.

This is an important observation in the context of sensor networks.

Some existing data dissemination schemes [11] [49] can be configured or mod-

ified to be responsive to congestion. For example, Directed Diffusion [11] can use

in-network data reduction techniques such as aggressive aggregation when conges-

tion is detected. Other protocols, such as PSFQ (Pump Slowly Fetch Quickly [49], a

reliable transport protocol for sensor networks discussed in Chapter 2) can adapt the

protocol (i.e., modulate its pump/fetch ratio) to avoid congestion. However, such

approaches involve highly specialized parameter tuning, accurate timing configura-

tion, and an in-depth understanding of the protocol’s internal operations. There is a

need for a comprehensive set of congestion control mechanisms specifically designed

to best fit the unique constraints and requirements of sensor networks and their

emerging applications. These mechanisms should provide a general set of compo-


nents that can be plugged into applications or the MAC in support of energy efficient

congestion control.

In [75] a comprehensive study of carrier sensing mechanisms for sensor networks

is reported. The authors propose an adaptive rate control mechanism that supports

fair bandwidth allocation for all nodes in the network. Implicit loss (i.e., failed

attempts to inject a packet into the network) is used as a collision signal to ad-

just the transmission rate of nodes. The paper focuses on fairness issues in access

control but not congestion control. In [77] the authors assume homogeneous appli-

cations in an indoor environment where sinks are sensor access points (SAPs) that

work collaboratively to collect data from a sensor field. The authors propose using

a combination of a hop-by-hop flow control scheme and a SAP selection routing

metric that considers packet loss probabilities, path load, and path length to select

congestion-free paths to SAPs, improving the capacity of the network.

In [66] an event-to-sink reliable transport protocol (ESRT) provides support for

congestion control. ESRT regulates the reporting rate of sensors in response to

congestion detected in the network. This paper is inspired, as our work is, by the

observations of Tilak, Abu-Ghazaleh, and Heinzelman [35] discussed above. ESRT

monitors the local buffer level of sensor nodes and sets a congestion notification

bit in the packets it forwards to sinks if the buffer overflows. If a sink receives a

packet with the congestion notification bit set it infers congestion and broadcasts

a control signal informing all source nodes to reduce their common reporting fre-

quency according to some function. As discussed in [66] the sink must broadcast

this control signal at high energy so that all sources in the sensor field can hear it.

Such a signal has a number of potential drawbacks, however, particularly in large

sensor networks. Any on-going event transmission would be disrupted by such a

high powered congestion signal to sources. In addition, rate regulating all sources


in the manner proposed in [66] is fine for homogeneous applications where all sen-

sors in the network have the same reporting rate but not for heterogeneous sources.

Even with homogeneous sources, ESRT always regulates all sources regardless of

where the hotspot occurs in the sensor field or whether the observed hotspot im-

pacts a path between a source and sink. We believe there is a need to support

heterogeneous sources and only regulate those sources that are responsible for, or

impacted by, transient or persistent congestion conditions. Furthermore, we be-

lieve that closed-loop regulation of sources should not use high energy but instead

hop-by-hop signaling that does not interfere with on-going data dissemination.

More recently, Ee and Bajcsy study the fairness issues of congestion control in

sensor networks [78]. They propose a distributed congestion control algorithm in

the transport layer of the traditional network stack model to ensure the fair delivery

of packets to a central node. In [76], Hull et al. experimentally investigate

the end-to-end performance of various congestion avoidance techniques in a 55-node

sensor network. They propose a strategy called Fusion that combines three congestion

control techniques that operate at different layers of the traditional protocol

stack. These techniques include a version of hop-by-hop flow control (similar to

CODA’s open-loop control), a source rate limiting scheme (similar to the adaptive

rate control mechanism proposed in [75]) that meters traffic being admitted into

the network, and a prioritized MAC layer that gives a backlogged node priority

over non-backlogged nodes for access to the shared medium. Based on an extensive

amount of experimental data from the sensor network, the paper shows the ad-

verse effects of network congestion and demonstrates that Fusion, the combination

of these three techniques, can greatly improve the network efficiency (up to 300%)

under realistic workloads.

A number of other groups have looked at the issue of congestion control in


wireless networks other than sensor networks. For example, WTCP [79] monitors

the ratio of inter-packet separation for senders and receivers to detect and react to

congestion in wireless LANs. SWAN [80] forces sources to re-negotiate end-to-end

flows if congestion is detected in wireless ad hoc networks. RALM [81] employs

TCP-like congestion and error control mechanisms for multicast support in wireless

ad hoc networks. While multicast congestion control and congestion control in

wireless networks are of interest they do not address the same problem space as

energy efficient congestion detection and avoidance for sensor networks.

3.7. Conclusion

In this chapter, we have presented an energy efficient congestion control scheme for

sensor networks called CODA. The framework is targeted at CSMA-based sensors (see footnote 3),

and comprises three key mechanisms: (i) receiver-based congestion detection, (ii)

open-loop hop-by-hop backpressure, and (iii) closed-loop multi-source regulation.

We have presented experimental results from a small sensor network testbed based

on TinyOS running on Berkeley Mica motes. We defined three performance metrics,

average energy tax, average fidelity penalty and power, which capture the impact

of CODA on sensing applications’ performance. A number of important results

came out of our study and implementation. It was straightforward to measure

β, channel loading at the receiver, and to evaluate CODA with a generic data

dissemination scheme. We have also demonstrated through simulation that CODA

can be integrated to support data dissemination schemes and be responsive to a

number of different congestion control scenarios that we believe will be prevalent in

future sensor network deployments. Simulation results indicated that CODA can

[Footnote 3: Other than the congestion detection component, the two control components are independent of the MAC used and can work with schedule-based MACs such as TDMA.]

improve the performance of Directed Diffusion by significantly reducing the average

energy tax with minimal fidelity penalty to sensing applications. These results are

very promising and provide a basis for further larger scale experimentation.

Our study of congestion problems in this chapter also reveals that the unique

funneling effect in sensor networks as well as the low-power radio communication

channel can significantly limit the networks’ ability to deliver high fidelity data

from sources to sinks. For example, in Section 3.4.4 our experiment with CODA

in a real testbed setting showed that the application fidelity penalty (i.e., the mea-

sured degradation in application quality as measured at the sink) during periods of

congestion can be as high as 80% (see Figure 3-9). To overcome this capacity

limitation, new technologies must be introduced. In the next chapter, we address

this challenge and explore alternative or complementary solutions that can maintain

the application fidelity during persistent overload conditions based on the concept

of dual radio virtual sinks.


A. Experimentally determining the ideal fidelity of a network

Assume that there exists an ideal congestion control scheme that is capable of

rate-controlling each source to share the network capacity equally without dropping

each other’s packets. The problem then becomes finding out the network capac-

ity or at least the upper bound of the network capacity. The actual capacity of

the network is application-specific depending on several factors including the radio

bandwidth, the MAC operations, the routing/data dissemination schemes, and the

traffic pattern. Assume that the network is homogeneous in the sense that all wire-

less links are symmetrical and equal. We can determine the upper bound of the

network capacity in a simple and practical manner through experimentation. The

idea is as follows:

Def: Cmax,i = Maximum data delivery rate of a path i associated with source i,

in which the packet drop rate is minimum.

Consider that multiple distinct sources send data toward a common sink trav-

elling along different paths. Assume these dissemination paths from the sources to

the sink coincide with each other and share at least one common link. This is a

reasonable assumption given the funneling effect toward the sink: these

transmissions must share at least the air around the sink. Therefore, the data

dissemination capacity for a sink is limited by max_i Cmax,i. Thus we can find the

upper bound and calculate the ideal fidelity by measuring max_i Cmax,i experimentally.
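The procedure above can be sketched in a few lines. This is a hedged illustration: the dictionary of measured per-path rates, the function names, and the sample values are hypothetical, and dividing the bound equally among sources is an assumption based on the ideal scheme sharing capacity equally.

```python
def capacity_upper_bound(cmax_per_path):
    """Upper bound on a sink's dissemination capacity.

    cmax_per_path maps each source to the maximum delivery rate measured
    on its path (packets/s) at minimum packet drop rate. Because the paths
    share at least one link, the bound is the largest of these rates.
    """
    return max(cmax_per_path.values())


def ideal_fair_share(cmax_per_path):
    """Per-source rate if an ideal scheme split the bound equally (assumption)."""
    return capacity_upper_bound(cmax_per_path) / len(cmax_per_path)


# Hypothetical measurements: each source sends alone and its best rate is recorded.
measured = {"source_1": 18.0, "source_2": 24.0, "source_3": 21.0}
print(capacity_upper_bound(measured))  # 24.0
print(ideal_fair_share(measured))      # 8.0
```

In practice each Cmax,i would be found by running one source at a time and sweeping its sending rate until drops begin to rise.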


Chapter 4

Dual Radio Virtual Sinks

4.1. Introduction

Wireless sensor networks [5] [11] comprise emerging technologies that offer a low

cost, distributed monitoring solution for a wide variety of applications and systems.

One application driving the development of sensor networks is the reporting of

conditions within a region of interest where the environment can abruptly change

due to a sudden event, such as target movements on the battlefield, biochemical

attack or fire. This chapter focuses specifically on sensor systems that are to be

designed to efficiently deliver information during and immediately following an event

that triggers an abrupt change.

This chapter describes a strategy of handling sudden impulses of data, which

will otherwise move the sensor network almost instantaneously from light load to

overload, while maintaining application fidelity. Data must be delivered through

the sensor network quickly to a relatively small number of physical sink points that

are attached to the regular communication infrastructure. Sensor networks exhibit

a unique funneling effect where events are generated en masse and then have to

quickly move toward a sink point. The flow of events has similarities to the flow of

people from a large arena after a sporting event completes. This leads to a number


of significant challenges that include increased transit traffic intensity, congestion

and packet loss (and therefore energy and bandwidth waste) at nodes closer to the

sink. These have a detrimental effect on the operational lifetime of sensor networks.

The major limitation in the design of existing sensor networks is that they are

ill-equipped to deal with data impulses. In [82] [41], the authors show that existing

sensor network protocols and technologies experience and allow large packet loss

even under light to moderate load. What is needed is a well-planned, coordinated

(in a distributed and scalable manner) data exit strategy through a likely small

number of physical sinks.

We propose the novel idea of randomly distributing throughout the sensor field

a small number of all-wireless dual-radio virtual sinks (VSs) that are capable of

offering enhanced congestion avoidance services to the existing low-power sensor

network. While such special nodes can be exploited to support a variety of ap-

plication specific (e.g. aggregation, coding) and common network functions (e.g.,

storage, localized activation), we focus here on the ability to selectively siphon off

data events from regions of the sensor field under critical overload. In essence vir-

tual sinks operate as safety valves in the sensor field to divert selected packets from

congested areas in order to maintain the fidelity of the application signal (e.g., as

simple as events/sec, or more complex) at the physical sink, and to alleviate the

funneling effect.

In this chapter we call these specialized nodes virtual sinks (VSs) to distinguish

them from physical sinks that typically provide a gateway to the Internet via a

wireline interface. Virtual sinks are equipped with a secondary long-range radio

interface, such as IEEE 802.11, in addition to their primary low power mote radio.

Virtual sinks are capable of dynamically forming a secondary ad hoc radio network

that is rooted at a physical sink. Rather than rate controlling as in the case of


the congestion avoidance techniques such as those used by CODA [82], described in the

previous chapter, virtual sinks take some traffic off the low-powered sensor network

(i.e., off the primary radio network) when persistent congestion is detected, and

move it to the physical sink using the secondary radio network.

The chapter is organized as follows. Section 4.2 presents the related work. Section

4.3 discusses a number of important design considerations such as the funneling

effect, small world observations that can be exploited, and traffic redirection strategies.

Section 4.4 provides the detailed design of the Siphon algorithms. While

our design addresses overload traffic management in sensor networks, we believe the

Siphon algorithms are more generally applicable to a broader class of new applications

that exploit special nodes with additional capability (e.g., dual radio, more

computational capability or more storage). Section 4.5 studies the performance

properties of Siphon using the ns-2 simulator, which is enhanced to support dual

radio virtual sink nodes. Experimental results from a Stargate implementation of

virtual sinks in a Mica sensor network testbed are reported in Section 4.6. Section

4.7 concludes.

4.2. Related Work

Event-based sensor networks generate impulse data traffic triggered by events of

interest. Large scale events (e.g. forest fires, earthquakes) can generate large im-

pulse waves of correlated data across a large area, creating a bottleneck down the

propagation funnel toward a sink even when the report rate is low.

Existing congestion control [82] [66] [76] schemes do not adequately address the

funneling effect and the type of congestion scenarios exhibited because of this effect.

Our prior work on CODA discussed in the previous chapter is representative of the first

generation congestion control schemes. CODA provides a conservative solution to


mitigating congestion in sensor networks and assumes that all nodes are equal (with

the exception of the sink) in trying to counter and react to the onset of congestion.

CODA’s congestion control policy at source and forwarding nodes is to rate control

the traffic through a hop-by-hop backpressure mechanism as well as a closed-loop

multi-source regulation scheme during periods of persistent congestion. Thus, when

congestion occurs and the channel becomes saturated, the application fidelity [35],

which can be viewed as the application’s quality of service measured at the sink,

can be significantly degraded.

ESRT [66] regulates the reporting rate of sensors in response to congestion de-

tected in the network by monitoring the local buffer level of sensor nodes. In [76],

Hull et al. experimentally investigate the end-to-end performance of various con-

gestion avoidance techniques in a 55-node sensor network. They propose a strategy

called Fusion that combines three congestion control techniques that operate at dif-

ferent layers of the traditional protocol stack. These techniques include a version

of hop-by-hop flow control similar to CODA’s open-loop control, a source rate lim-

iting scheme similar to the adaptive rate control mechanism proposed in [75] that

meters traffic being admitted into the network, and a prioritized MAC layer that

gives a backlogged node priority over non-backlogged nodes for access to the shared

medium.

A collision-minimizing CSMA MAC is proposed in [70] that is optimized for

event-driven sensor networks. The authors propose to utilize a non-uniform prob-

ability distribution for nodes to randomly select contention slots such that colli-

sions between contending stations are minimized. This MAC can reduce packet loss

around hotspots but cannot completely resolve congestion due to the funneling

effect when the incoming traffic exceeds the node capacity and the queue overflows.

While CODA and other schemes are capable of avoiding congestion/collision and


costly packet loss and therefore energy waste, it is to the detriment of the maximum

number of events that can be funnelled to the sink. The fundamental question that

this chapter addresses is whether alternative or complementary solutions exist that

maintain the application fidelity during persistent overload conditions.

Recently, the idea of utilizing multiple coordinated radios operating over multiple

channels to improve and optimize wireless network capacity has been proposed. In

[83] the authors exploit the possibility of adding a second low-power radio of lower

complexity and capability into a node in a wireless LAN network to increase the

battery lifetime of the node. The main idea is to use the secondary lower-power

radio to wake up a node, allowing the node to shutdown the primary radio during

idle periods. In [53], the authors explore the implications of using multiple radios

on each node that work in an integrated manner to solve a number of existing

problems in wireless networking with a focus toward the energy management and

capacity enhancement issues in wireless LAN environments.

In [84], the authors propose leveraging processing and energy heterogeneity in

sensor networks to improve the network performance, including a topology control

protocol which systematically shifts the network’s routing burden to energy-rich

nodes. More recently, in [85] the authors propose to extend the idea of exploiting

heterogeneity in sensor networks to use a modest number of line-powered back-

hauled nodes that connect to the wired backbone network. They prove analytically

that this approach increases network reliability and lifetime in a grid network. They

do not, however, consider congestion or the funneling effect in the network.

Finally, [86] [87] investigate the optimal radio transmission range that balances

the wireless capacity and network connectivity from a theoretical viewpoint. Results

suggest that the smallest transmission range that is just enough to assure network

connectivity is optimal. However, the papers do not consider the dual-radio config-


uration, where a separate secondary channel can increase the network capacity.

4.3. Design Considerations

A number of questions arise when studying the deployment of virtual sinks. What

is the optimal number and distribution of virtual sinks to minimize congestion and

energy consumption? Utilizing a longer-range radio is usually demanding in terms of

energy consumption. Therefore, one should only activate the secondary long-range

radio when it is needed. When does a virtual sink offer such hotspot services to local

sensors? How do sensors discover local virtual sinks? When congestion or overload

conditions occur which packets should be redirected onto the secondary long-range

network? How can sensor networks automatically benefit from the existence of

virtual sinks in their neighborhoods, but maintain uninterrupted services in their

absence using the existing congestion avoidance mechanisms, such as those discussed

earlier [82] [66] [76]? What if the virtual sinks cannot form a connected network

with the physical sink on their own? In what follows, we explore these questions

and discuss the technical considerations that underpin the design of Siphon. The

detailed design is presented in Section 4.4.

4.3.1 Funneling Effect

Conventional networks assume traffic flows in all directions. However, sensor net-

works exhibit a unique funneling effect where events are generated en masse and

then have to quickly move toward a relatively small number of physical sink points.

Figure 4-1 illustrates the funneling effect. Sensors within the range of an event

epicenter generate impulse data that travels along a propagation funnel toward a

sink when an event occurs. One or more physical sinks can exist at any location in

the sensor field to collect the event data from the active sensors. Sensors located

Figure 4-1: The funneling effect. Sensors within the range of an event region/epicenter (enclosed by the dotted ellipse) generate impulse data that travel along a propagation funnel (enclosed by dotted line) toward the sink when an event occurs. (Legend: physical sink, virtual sink, sensor, active sensor.)

within the propagation funnel between the event epicenter and the physical sink

will typically consume more energy. This leads to a number of significant challenges

that virtual sinks can help address.

First, the funneling effect places heavier load on sensors that are closer to a

sink point. As a result the sensors nearest the sink will use energy at the fastest

rate, significantly impacting the operational lifetime of the network. Second, traffic

intensifies at the neck of the funnel causing congestion, packet loss, and therefore

wasted energy and bandwidth. The aggregation of data events can help offset con-

gestion and the disproportionate amount of energy consumed by forwarding nodes

located nearer the sink by trading off computation and communications resources.

However, it is unlikely that aggregation techniques alone can completely resolve the

congestion problem and funneling effect. Because of the build up of traffic close to

the sink, loss of aggregated data packets is more likely. This can severely impact the


reporting capability (i.e., the fidelity) of the network to meet the application’s needs.

Aggregated packets not only represent the reporting of accumulated events from the

network, and are therefore considered more “valuable” than non-aggregated packets,

but they also consume more energy on average than non-aggregated packets

by the time they eventually reach the sink. This argues for priority treatment of

aggregated packets in the case of congestion.

4.3.2 Small World Observations and Shortcuts

By using specialized dual-radio nodes as virtual sinks, the second long-range radio

can serve the purpose of creating “shortcuts” in the sensor network among other

virtual sinks and one or more physical sinks. Our goal is to design control mecha-

nisms such that the network can automatically benefit from the existence of virtual

sinks in the neighborhood, with a gain that grows as the number of

these nodes increases, while maintaining uninterrupted baseline services even without

any of these nodes.

While a dual-radio sensor platform (e.g. Stargate [48]) is feasible, the cost is still

much higher than a single-radio platform (e.g., Berkeley motes series [23]). There-

fore, the cost of deploying a large number of dual-radio sensor platforms in sensor

networks is prohibitive. Recent “small world” studies conducted by Watts and

Strogatz [47] have shown that a small fraction of shortcut nodes randomly distributed

in a network is enough to effectively reduce the network diameter, resulting in a

fast distribution network. From this, we conjecture that only a small fraction of

shortcut nodes (i.e., virtual sinks) would be needed to create a fast secondary radio

distribution network for overload traffic.

To examine this conjecture, we simulate a sensor network of 100 nodes using the

ns-2 simulator, with nodes randomly distributed across a 350m x 350m square. The

[Figure 4-2 plot. x-axis: Fraction of dual-radio nodes (%), 0 to 30. y-axis: Normalized shortest distance (ratio), 0.2 to 1. Series: Maximum Shortest Distance; Average Shortest Distance.]

Figure 4-2: Reduction of average distance in a network with an increasing percentage of dual-radio nodes that provide the shortcuts between nodes.

transmission radii of the sensors are 30m for the primary low power, short-range

radio and 150m for the secondary long-range radio. We vary the number of dual-

radio virtual sink nodes from 1% to 30% of the total number of nodes in the sensor

field. A node is randomly selected as a virtual sink from the set of all nodes until we

have the designated fraction of virtual sinks in the network. The shortest distances

for the network are computed using the Floyd-Warshall algorithm [88] which has an

O(N^3) complexity. We independently generate ten networks. Figure 4-2 shows the

average shortest distance and the maximum shortest distance in the network with

a varying fraction of virtual sinks. Both shortcut metrics are normalized against

results when there are no virtual sinks present. Each data point is an average

over the ten network configurations. Assuming physical sinks can be located at

any location in the network, the figure clearly shows that when only 5% of the nodes are

virtual sinks, the average distance is halved. This is a very promising

result because it indicates that a small number of nodes can be used to form a

fast, low-diameter secondary distribution network that can serve to re-route traffic

to a physical sink. This result underpins the viability of the virtual sink concept,

showing that enhanced congestion control and load balancing in a sensor network can be

implemented in a cost-effective manner.
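The structure of this experiment can be illustrated with a short Python sketch. It is not the ns-2 code behind Figure 4-2, and it rests on several assumptions that the text does not state: distances are hop counts, a long-range link exists only between two dual-radio nodes within 150m of each other, and the averages are taken over node pairs that are already connected in the baseline network (a random placement with a 30m radius may be disconnected). The function names and the `seed` parameter are hypothetical.

```python
import math
import random

INF = float("inf")


def floyd_warshall(adj):
    """All-pairs shortest hop counts; adj[i][j] is True when i and j share a link."""
    n = len(adj)
    dist = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
            for i in range(n)]
    for k in range(n):                      # O(N^3) triple loop
        dk = dist[k]
        for i in range(n):
            dik = dist[i][k]
            if dik == INF:
                continue
            di = dist[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    return dist


def build_links(pos, virtual_sinks, short_range=30.0, long_range=150.0):
    """Short-range links between any two nodes; long-range links only between
    two dual-radio nodes (an assumption of this sketch)."""
    n = len(pos)
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(pos[i], pos[j])
            both_vs = i in virtual_sinks and j in virtual_sinks
            if d <= short_range or (both_vs and d <= long_range):
                adj[i][j] = adj[j][i] = True
    return adj


def normalized_distances(n=100, side=350.0, vs_fraction=0.05, seed=0):
    """Return (average, maximum) shortest distance with virtual sinks,
    each normalized against the same network without virtual sinks."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    base = floyd_warshall(build_links(pos, set()))
    vs = set(rng.sample(range(n), round(vs_fraction * n)))
    short = floyd_warshall(build_links(pos, vs))
    # Only pairs reachable in the baseline network are compared.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if base[i][j] < INF]
    if not pairs:
        return 1.0, 1.0
    avg = sum(short[i][j] for i, j in pairs) / sum(base[i][j] for i, j in pairs)
    mx = max(short[i][j] for i, j in pairs) / max(base[i][j] for i, j in pairs)
    return avg, mx
```

Averaging `normalized_distances` over several seeds for each fraction from 1% to 30% would give curves of the same kind as Figure 4-2, although the exact values depend on the assumptions listed above.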

To ensure a reasonable probability of finding a local virtual sink in the neigh-

borhood of a propagation funnel, there is a lower bound on the number of virtual

sinks needed in the network. In Section 4.5.8 we present an analytical model to offer

insights into finding this lower bound.

4.3.3 Traffic Redirection and Prioritization Issues

The goal of virtual sinks is to steer overload traffic, i.e., redirect event packets

away from congested regions toward a physical sink. Figure 4-1 illustrates this

idea; virtual sinks behave as local sinks that “attract” part of the traffic from a

congested neighborhood, and send it over a long-range radio link that provides a

shortcut distribution path toward the physical sink. Since the secondary, long-range

distribution path usually involves shorter hop distances, delivery latency is likely

to be reduced especially when the secondary radio has higher bandwidth, as in the

case of the Stargate [48] platform.

Utilizing long-range radio is usually expensive in terms of energy consumption;

therefore a virtual sink should only activate its secondary radio when it is necessary

and beneficial. Assuming nodes can measure and quantify their local congestion

levels, the point at which sensors should start (or stop) using available local virtual

sink services must be determined. Further, there is a need to develop redirect

algorithms capable of redirecting designated traffic (e.g., data impulses) as early in

the propagation funnel as possible. That is, data impulse traffic will be redirected

to a virtual sink at the farthest upstream locations within the propagation funnel,

possibly even under lower congestion levels. This would effectively divert part of the

traffic to nearby virtual sinks (if they exist) to counter the funneling effect. Section

4.4.2 presents an efficient scheme to address this issue.

Another important issue is deciding what portion of the local forwarding traffic

virtual sinks should redirect. As a general approach, nodes will maintain a dynamic

list of event packet types, e.g., data types (possibly prioritized), aggregates, and

control messages. When congestion exceeds a certain threshold, a node that has

a virtual sink within l hops distance determines whether to redirect certain data

types to its local sink according to local policy. One example is only redirecting

data types associated with impulse data (e.g., seismic data) as opposed to periodic

events (e.g., temperature). Another policy would be to redirect prioritized or im-

portant target event data directly to the nearest virtual sink because its expedited

service can increase the timeliness and reliability of packet delivery. The traffic redi-

rection through virtual sinks may unavoidably introduce out-of-order data delivery

into the network, at least for a short period during which packets sent earlier

through the primary network (before the redirection occurs) lag behind later

packets that reach the physical sink through virtual sinks.

However, because sensor data are typically time-stamped [26] for the purposes of

local collaborations or aggregations, the impact of a short period of out-of-order

data delivery is minimal to the applications.
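The local policy decision described above can be sketched as follows. This is a minimal illustration rather than Siphon's implementation: the class names, the normalized congestion level in [0, 1], and the representation of a policy as a set of redirectable event classes are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class EventClass(Enum):
    IMPULSE = "impulse"      # e.g., seismic data
    PERIODIC = "periodic"    # e.g., temperature
    AGGREGATE = "aggregate"
    CONTROL = "control"


@dataclass(frozen=True)
class RedirectPolicy:
    congestion_threshold: float              # hypothetical level in [0, 1]
    redirect_classes: FrozenSet[EventClass]  # classes this node may redirect

    def should_redirect(self, event_class: EventClass, congestion_level: float,
                        hops_to_vs: Optional[int], scope: int) -> bool:
        """Redirect only when a virtual sink lies within `scope` (l) hops,
        congestion exceeds the threshold, and the class is listed."""
        if hops_to_vs is None or hops_to_vs > scope:
            return False
        if congestion_level <= self.congestion_threshold:
            return False
        return event_class in self.redirect_classes
```

A policy that redirects only impulse data would be `RedirectPolicy(0.7, frozenset({EventClass.IMPULSE}))`; adding `EventClass.AGGREGATE` to the set would express the priority treatment of aggregates argued for in Section 4.3.1.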

4.3.4 Transparency and Compatibility Issues

Our goal is to design a general solution that is “application-agnostic”, i.e. to provide

a common set of fundamental building blocks that can be readily combined and in-

corporated into emerging and off-the-shelf sensor networking technologies, including

MAC [14] [16] [72], transport and routing [11] [89] protocols. Hence, it is important

to design a scheme that maintains maximum transparency and compatibility with a

wide variety of sensor technologies, as mentioned above. Furthermore, virtual sinks

can be incrementally deployed in sensor networks in response to different workloads

(i.e., traffic patterns, such as continuous event, and impulse-based offered load) and

fidelity needs.

One way to maintain maximum transparency is to use the same routing logic

on both primary and secondary radio networks, i.e. the networking layer views the

network as a homogeneous system, despite the existence of special dual-radio nodes.

A virtual sink simply advertises the lower routing cost calculated among its two radio

interfaces to its neighbors. In this approach, a virtual sink is likely to be always

utilized by its neighbors (even if there is no congestion) because of the shortcut

routes (hence lower routing cost) through the secondary radio network. Routing

protocols in sensor networks [11] [89] often take into account the link qualities of

the neighbors in calculating routing cost. Therefore, congestion will eventually be

reflected on the routing cost as it degrades the link qualities of neighboring nodes,

forcing data packets to be routed through other non-congested nodes, possibly a

virtual sink. While this is a simple approach to exploit virtual sinks, it lacks the

flexibility to fulfill our goals, specifically for the following reasons:

1. A virtual sink is not necessarily energy-rich, so it might not be feasible to

continuously utilize the secondary longer range radio for an extended amount of

time.

2. Because of the stability requirement of a routing protocol as well as its sig-

naling overhead considerations, the routing cost updates occur at a relatively

longer time scale. Relying on routing cost updates to route around a hotspot

could be too slow to deal with impulse data traffic since the congestion could

be short-lived.

3. Shortest paths or minimum cost paths are not always the best choice during

congestion, considering the funneling effect as discussed in Section 4.3.1. Our

goal is to divert data traffic in a timely manner to avoid congestion while

maximizing the application fidelity, even if it requires involving sub-optimal

paths through a virtual sink.

In this chapter, we investigate a different approach. We explicitly expose the

existence of special nodes, the virtual sinks, to the network to fully exploit the het-

erogeneity in the network. Different radio technologies are designed and optimized

for different usages. For example, the RFM [37] or Chipcon [46] radios used on

the Berkeley Mote platforms [23] are energy efficient, but support low data rates

in short range, while the IEEE 802.11 radio supports high data rate, long range

operations but consumes a large amount of energy. Because of the different radio

characteristics, the best routing strategy for different radio networks is likely to be

different. For example, the RFM radio’s lack of frequency diversity makes it vulner-

able to multipath fading and it suffers from time-varying link quality, while IEEE

802.11 radio offers a much better resistance to multipath channel fading because of

its spread spectrum design. Therefore, different routing metrics are likely to be used

in different radio networks to take advantage of, or to complement, the characteristics

of different radio technologies. Section 4.4 presents an algorithmic approach that

supports the above features.

4.4. Siphon Design

The key aim driving the design of Siphon is to exploit the existence of virtual sinks

in the network to siphon overload traffic from the network, thereby mitigating the

funneling effect. The realization of this idea requires a protocol with the follow-

ing characteristics: (i) an energy-efficient mechanism to discover the existence of

virtual sinks in the neighborhood, (ii) an accurate congestion detection technique

to determine the correct time to utilize the discovered virtual sinks, and finally

(iii) a transparent mechanism to influence the dissemination path of event data. In

what follows, we discuss the detailed design of these components. While our design

addresses overload traffic management in sensor networks we believe the Siphon al-

gorithms are more generally applicable to a broader class of new applications that

exploit special nodes with additional capability (e.g., dual radio, more computa-

tional capability or more storage).

4.4.1 Virtual Sink Discovery and Visibility Scope Control

Like any new service, we envision Siphon may be deployed in an incremental fash-

ion, either for logistical reasons or in response to anticipated traffic characteristics.

Specifically, the physical sink might not be equipped with a secondary radio¹. As a

result, there is no guarantee that the virtual sinks (VSs) can form a connected sec-

ondary network rooted at a physical sink through their long-range radio. Further,

due to the relatively sparse required concentration of VSs, as discussed in Sections

4.3.2 and 4.5.8, there is no assurance that a VS is adjacent to a congested region.

Consequently, a congested node requires a method to discover, in an energy-efficient

manner, the existence of a local VS that could be multiple hops away.

We propose an in-band signaling approach that embeds a signature byte into any

periodic control packets originated by a physical sink. In typical sensor network ap-

plications, a physical sink is required to send periodic signaling into the network for

management purposes. For example, Directed Diffusion requires periodic interest

refreshes [11], and in MultiHopRouter [89], a routing protocol included in TinyOS

[19] for mote-based sensor networks, route control messages are periodically broadcast

from each node in the network to estimate routing cost and monitor link quality.

In these cases the Siphon signature byte can ride for free, allowing for nearly zero-

overhead VS discovery. For applications that do not require periodic sink control

messages, an independent signature byte application is invoked that broadcasts low-

rate (once every few minutes) VS signaling messages from a physical sink, resulting in

a small overhead. This overhead can be minimized through smart management at

the sink, as discussed in Section 4.4.2.2. As shown in the following discussion, the

embedded signature byte approach to VS discovery is also used for controlling the

visibility of the VS to its neighbors.

¹ Siphon’s algorithms do not assume that the physical sinks have secondary radios and can operate even under these conditions. However, in the case of our application we generally assume that physical sinks are equipped with secondary radios. We do not assume, however, that a connected VS overlay exists that includes the physical sinks, even though this is likely under the applications we consider.

This signature byte contains a VS-TTL (virtual sink TTL) field that specifies

the scope (hop count) over which a VS is advertised. A VS-TTL of l allows nodes

up to l hops from a VS to utilize Siphon’s congestion mitigation services. Clearly,

a larger value of l allows more nodes to utilize a local VS, but increasing l does not

necessarily lead to better network performance. First, packets from nodes reached

only by a large l have longer paths to the VS and may not benefit from its use.

Also, a broad VS scope advertisement increases the chance of localized congestion

around a VS (where each VS potentially creates a funneling effect similar to the

original problem). On the other hand, a smaller value of l implies shorter redirect

paths, improving delivery latency and energy consumption, but confines the benefit

to fewer nodes. Section 4.5.5 investigates the tradeoffs involved in determining the

optimal value of l under different conditions.

The handling of signature byte messages is different for VS and non-VS nodes

in the network; the process flow for each case is outlined below. Note that physical

sinks that do not have a secondary radio broadcast Siphon control packets (i.e.,

any non-data packet that includes a Siphon signature byte) with the VS-TTL set to

NULL; otherwise, physical sinks set VS-TTL to l.

For VS nodes:

For any incoming packets,
    IF (non-data packet)
        IF (signature byte exists)
            IF (packet arrives via secondary radio)
                set VS-TTL to l;
            ELSE
                leave VS-TTL as NULL;
            ENDIF
            identify the forwarder of this packet,
            set it as the next Siphon hop;
        ENDIF
        forward the packet through both radios;
    ENDIF

Notice that VSs receiving control packets containing the Siphon signature byte

via their low power radios leave the VS-TTL as NULL and thus do not advertise

their presence to the neighborhood. Such a VS has no path to a physical sink via

its secondary network and, thus, other nodes derive no extra benefit by forwarding

packets through this node. However, the Siphon protocol definition allows for a

graph of VSs not connected to any dual-radio physical sink to carry traffic on its

secondary network. We evaluate this scenario in Section 4.5.7 and discuss whether

this yields any performance benefits.
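The VS-side handling of control packets outlined above can be sketched in Python. The sketch is illustrative only: the `ControlPacket` and `VirtualSink` types, the radio labels, and the use of `None` for a NULL VS-TTL are hypothetical, and "leave VS-TTL as NULL" is interpreted here as leaving the received field untouched.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

PRIMARY, SECONDARY = "primary", "secondary"


@dataclass(frozen=True)
class ControlPacket:
    forwarder: int                 # id of the node that last forwarded the packet
    has_signature: bool            # True when the Siphon signature byte is present
    vs_ttl: Optional[int] = None   # None models the NULL VS-TTL


@dataclass
class VirtualSink:
    node_id: int
    scope: int                                        # the advertised scope l
    next_siphon_hop: Optional[Tuple[int, str]] = None  # (neighbor id, radio)

    def on_control_packet(self, pkt: ControlPacket,
                          arrival_radio: str) -> List[Tuple[str, ControlPacket]]:
        """Handle a non-data packet and return the copies to transmit,
        one per radio interface."""
        vs_ttl = pkt.vs_ttl
        if pkt.has_signature:
            if arrival_radio == SECONDARY:
                vs_ttl = self.scope        # advertise with scope l
            # Otherwise the VS-TTL is left as received (assumed NULL).
            self.next_siphon_hop = (pkt.forwarder, arrival_radio)
        out = replace(pkt, forwarder=self.node_id, vs_ttl=vs_ttl)
        return [(PRIMARY, out), (SECONDARY, out)]
```

Recording the arrival radio together with the forwarder lets the VS later send redirected data either onward over the secondary network or back onto the primary network, depending on where the most recent signature arrived.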

For non-VS nodes:

For any incoming packets,
    IF (non-data packet)
        IF (signature byte exists)
            IF (VS-TTL > 0)
                identify the forwarder of this packet,
                set it as a VS neighbor;
                VS-TTL--;
            ENDIF
        ENDIF
        forward the packet;
    ENDIF

Note that the existence of a VS neighbor indicates a VS is located in the neigh-

borhood and can be reached through this specific neighbor. Through this procedure

a sensor maintains a list of neighbors through which neighborhood VSs are accessi-

ble. Because of the small fraction of VSs in the network (as governed by our small

world insights from Section 4.3.2), there is usually only one neighbor in the list.

Therefore, the memory overhead for maintaining a VS list is negligible. In fact, in

many cases the overhead could be reduced to a single bit in each neighbor entry of

the routing table.
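The non-VS procedure above can likewise be sketched in Python. The types and names are hypothetical, `None` stands for a NULL VS-TTL, and the VS neighbor list is kept as a set of neighbor ids, which is only one of the possible representations mentioned in the text.

```python
from dataclasses import dataclass, field, replace
from typing import Optional, Set


@dataclass(frozen=True)
class ControlPacket:
    forwarder: int                 # id of the node that last forwarded the packet
    has_signature: bool            # True when the Siphon signature byte is present
    vs_ttl: Optional[int] = None   # None models the NULL VS-TTL


@dataclass
class SensorNode:
    node_id: int
    vs_neighbors: Set[int] = field(default_factory=set)

    def on_control_packet(self, pkt: ControlPacket) -> ControlPacket:
        """Handle a non-data packet and return the copy to forward."""
        vs_ttl = pkt.vs_ttl
        if pkt.has_signature and vs_ttl is not None and vs_ttl > 0:
            # A VS is reachable through the node that forwarded this packet.
            self.vs_neighbors.add(pkt.forwarder)
            vs_ttl -= 1
        return replace(pkt, forwarder=self.node_id, vs_ttl=vs_ttl)
```

With a VS-TTL of 1, the first node that hears the advertisement records the forwarder and passes the packet on with a VS-TTL of 0, so nodes two hops away learn nothing about the VS.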

4.4.2 Congestion Detection

Accurate and efficient congestion detection plays an important role in the Siphon

framework inasmuch as it indicates the proper time a sensor should attempt to

utilize any VSs it has discovered. We describe two techniques for congestion detec-

tion control: (i) node-initiated congestion detection; and (ii) physical sink initiated

“post-facto” actuation of the VS infrastructure. In what follows we discuss the two

techniques and their application in Siphon.

4.4.2.1 Node-initiated Congestion Detection

In our previous CODA work discussed in Chapter 3 we describe a CSMA-based,

energy-efficient congestion detection technique where wireless receivers use a com-

bination of the present and past channel loading conditions, obtained through a

low-cost sampling technique, and the current buffer occupancy to infer congestion.

In Siphon, we adopt these mechanisms proposed in Chapter 3 to determine the local

congestion levels that a node is experiencing.

While the above congestion detection scheme is CSMA-based or contention-

based, the idea can be generalized to other MACs that are often used in sensor net-

works, including schedule-based [71] [72] and hybrid-based MACs (e.g., S-MAC [14],

T-MAC [16]). For pure schedule-based MACs that attempt to guarantee collision-

free communication, queue occupancy provides a good measure of the congestion

level. For hybrid-based MACs such as T-MAC [16], a good measure is a combina-

tion of the queue occupancy and the duty cycle length of the scheduled activity of

a node.

However congestion level is measured, when the local channel load approaches

or exceeds the theoretical upper bound of the channel throughput [82], or when the

buffer occupancy grows beyond a high water mark, a sensor node located within

the visibility scope of a VS will activate its redirect algorithm (see Section 4.4.3 for

details) to divert designated traffic (e.g., data impulses, prioritized traffic, etc.) out

of the neighborhood, utilizing the VSs. To best counter the funneling effect, it is

essential to redirect data impulses as early as is possible in the propagation funnel.

However, in order not to diminish any possible aggregation effort of correlated data

in the network (aggregation is most effective deep in the funnel), it is beneficial to

redirect traffic later in the funnel. To achieve a balance, it is best to redirect data

at a location just before congestion is most likely to occur in the funnel. In Section

4.5.4 we verify this conjecture.
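The trigger condition for node-initiated redirection can be written as a small predicate. This is a sketch under assumptions: channel load and its upper bound are expressed as fractions of channel capacity, and the `approach_margin` parameter is a hypothetical way of expressing "approaches or exceeds"; the text does not specify how close to the bound the load must be.

```python
from typing import Collection


def should_activate_redirect(channel_load: float,
                             load_upper_bound: float,
                             queue_length: int,
                             high_water_mark: int,
                             vs_neighbors: Collection[int],
                             approach_margin: float = 0.05) -> bool:
    """Return True when a node inside a VS visibility scope should start
    redirecting designated traffic."""
    if not vs_neighbors:
        # The node is outside the scope of any VS, so there is nothing to use.
        return False
    load_trigger = channel_load >= load_upper_bound * (1.0 - approach_margin)
    buffer_trigger = queue_length > high_water_mark
    return load_trigger or buffer_trigger
```

Lowering `load_upper_bound` or raising `approach_margin` makes redirection start earlier in the funnel, which is the tradeoff against in-network aggregation discussed above.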

4.4.2.2 Physical Sink initiated “Post-Facto” Congestion Detection

As an alternative approach to the node initiated congestion detection discussed in

the previous section, we consider the “post-facto” activation of the VS infrastruc-

ture via congestion inference at a physical sink. The physical sink, as a point of

data collection in the funnel, can do smart monitoring of the event data quality and

the measured application fidelity [35], and initiate VS signaling only when the mea-

sured application fidelity is degraded below a certain threshold. In this approach,

the siphoning service is enabled only after congestion or fidelity degradation is mea-

sured in the primary low-power radio network. As such, the approach has limited

capabilities dealing with transient congestion deep in the network, but may be ad-

equate when congestion occurs closer to the physical sink. This technique has the

advantage of not requiring underlying congestion detection support at each node.

To propagate the signal in a timely manner from the physical sink, a control mes-

sage is broadcast through its non-congested secondary radio network (a connected

secondary network is required). Because the traffic siphoning in the “post-facto”

approach is based on the perceived performance measured at the physical sink but

not the congestion levels experienced in the network, we conjecture that it also has

the advantage of avoiding premature traffic siphoning especially when network-wide

aggregation [90] is used. In Section 4.6, we examine the effectiveness of employing

the “post-facto” congestion approach in a sensor network testbed.

4.4.3 Traffic Redirection

Traffic redirection in Siphon is enabled by the use of one redirection bit in the net-

work layer header. We consider two approaches in setting the redirection bit: (i)

on-demand redirection, in which the redirection bit is set only when congestion is

detected; and (ii) always-on redirection, in which the redirection bit is always set.

We discuss the tradeoffs of these two approaches in Section 4.5.6. The basic redirec-

tion mechanism is as follows. A sensor that receives a packet with the redirection

bit set forwards the packet to its VS neighbor, a process through which the redi-

rected packet would eventually reach a VS. If the redirection bit is not set then

routing follows the paths determined by the underlying data dissemination/routing

protocol.

When a VS receives a redirected packet, it forwards it to the neighbor through

which it most recently received a control message embedding the signature byte.

As discussed in Section 4.4.1, such control packets can arrive either through a VS’s

primary or secondary radio interface. In the best case, all VSs are connected to a

physical sink via the secondary network overlay and all physical sink-bound packets

routed through the VS are forwarded on a fast track all the way to the physical

sink. When the secondary network is partitioned, the last VS (closest to the sink)

in the secondary network fragment must direct all sink-bound packets back on the

primary network, specifically to the sensor it has identified in the discovery phase.

From here, packets are again routed to the physical sink according to the default

routing paths.
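The forwarding rules above reduce to two next-hop decisions, sketched here in Python. The names are hypothetical, and one behavior is an assumption not stated in the text: a sensor that receives a packet with the redirection bit set but knows no VS neighbor falls back to its default route.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PRIMARY, SECONDARY = "primary", "secondary"


@dataclass(frozen=True)
class DataPacket:
    payload: bytes
    redirect: bool = False   # the one-bit redirection flag in the network header


def next_hop_sensor(pkt: DataPacket, default_next_hop: int,
                    vs_neighbor: Optional[int]) -> int:
    """Ordinary sensor: follow the VS neighbor when the bit is set."""
    if pkt.redirect and vs_neighbor is not None:
        return vs_neighbor
    return default_next_hop   # underlying routing protocol decides


def next_hop_virtual_sink(pkt: DataPacket, default_next_hop: int,
                          last_signature_hop: Optional[Tuple[int, str]]
                          ) -> Tuple[int, str]:
    """Virtual sink: send redirected packets to the neighbor (and radio)
    from which the latest signature-bearing control message arrived."""
    if pkt.redirect and last_signature_hop is not None:
        return last_signature_hop
    return (default_next_hop, PRIMARY)
```

When the secondary network is partitioned, `last_signature_hop` at the last VS names a primary-radio neighbor, so the same rule hands the packet back to the primary network.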

Recent experimental studies [41] [89] show that sensor networks using low-power

radios often suffer from highly variant wireless link quality that is both time and

location dependent. To ensure that traffic siphoning through the VS infrastructure

does not degrade the network’s primary packet forwarding service, only neighbors

with good link quality are utilized to redirect packets to a VS. Many routing proto-

cols (e.g., MultiHopRouter [89]) maintain a neighbor table that includes a continu-

ously updated link quality estimation for a selected set of neighbors. When a sensor

located within the visibility scope of a VS detects congestion while forwarding event

packets, it makes a decision to redirect a specific type of data packet based on local

policy, as discussed in Section 4.3.3.

As a general policy rule for traffic redirection in Siphon, an alternate (redirect)

next hop neighbor must have a link estimation that is within 15% (lower bound) of

the link estimation of the current chosen next hop. Otherwise the VS should not be

utilized. If the redirect policy parameters are met, the congested sensor marks the

redirection bit in the routing header of the data packet being forwarded and redirects

it to a VS neighbor selected from its local list. Conformance with an appropriate

policy allows use of the VS infrastructure to improve application data fidelity by

bypassing funnel congestion, without potentially incurring an unacceptable level of

packet loss through use of low quality links to the local VS.
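The link-quality rule can be illustrated with a short Python function. Two points are assumptions of this sketch: "within 15%" is read as a relative bound (the redirect neighbor's estimate must be at least 85% of the current next hop's estimate), and when several VS neighbors qualify the one with the best estimate is chosen. The function name and the dictionary of link estimates are hypothetical.

```python
from typing import Dict, Iterable, Optional


def select_redirect_neighbor(link_quality: Dict[int, float],
                             current_next_hop: int,
                             vs_neighbors: Iterable[int],
                             tolerance: float = 0.15) -> Optional[int]:
    """Return a VS neighbor whose link estimate is no more than `tolerance`
    below that of the current next hop, or None if no neighbor qualifies."""
    floor = link_quality[current_next_hop] * (1.0 - tolerance)
    candidates = [n for n in vs_neighbors
                  if link_quality.get(n, 0.0) >= floor]
    if not candidates:
        return None          # the VS should not be utilized
    return max(candidates, key=lambda n: link_quality[n])
```

A caller would set the redirection bit and forward to the returned neighbor, or keep the default next hop when the result is `None`.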

VSs offer shortcuts and possibly higher bandwidth pipes for data delivery in

sensor networks. Traffic siphoning through VSs may subtly impact the routing protocols

operating in the primary and secondary networks only if the routing metrics

used are sensitive to enhanced service characteristics, such as the delay or loss

associated with the data delivery paths in the network. For example, data-centric

dissemination protocols such as Directed Diffusion [11] and variants of DSDV-based

routing protocols [84] are capable of choosing empirically good paths that dynamically

adapt to changing network conditions. These protocols are therefore delay-sensitive,

and their routing decisions could be impacted by traffic siphoning in a

subtle way². In Section 4.5.2 we discuss these interactions and propose a simple

and elegant method to deal seamlessly with such behavior. As an example, we show

how Siphon interworks seamlessly with Directed Diffusion.

² As a general comment, only protocols that base their routing decisions on the actual data delivery service perceived at the receiver are potentially impacted. Other protocols that base their routing decisions on fixed routing metrics, such as shortest path routing, geographical routing [91], or the routing on a curve [92] approach, are not affected.

Using the same routing logic (e.g., Directed Diffusion) on both radio networks

has the benefit of having simpler and consistent control/management at the network

level. However, as discussed in Section 4.3.4, this approach limits the flexibility for

optimization. In general, different routing protocols use different routing metrics.

As discussed in Section 4.3.4, the primary network and the secondary network can

run different routing protocols that are completely independent and optimized for

each radio network. We conjecture this is the favorable approach when the two

radio networks have significantly different communication properties. In that case,

there is minimal interaction between the two networks once packets are redirected

to a nearby VS. For example, Stargate supports running AODV [93] on its IEEE

802.11 network while interfacing with a mote-based sensor network running the

single-destination MultiHopRouter protocol [89]. The routing decisions on both

networks are completely independent from each other and hence traffic siphoning

requires minimal or no interaction with the routing protocols. As a result, Siphon

works naturally with this approach.

4.4.4 Congestion in the Secondary Network

The traffic siphoning service is complementary to the first generation congestion

control schemes such as CODA and Fusion [76], and as such can be run in parallel

with these techniques on the primary and secondary networks. When the

secondary network is also overloaded, traffic redirection through VSs offers little

benefit. Therefore, a VS always monitors its own congestion levels on both primary

and secondary radio channels and does not advertise its existence when either one

of its radio networks is overloaded. For the IEEE 802.11 radio (which we use in our

experimentation), Murty et al. [94] propose an algorithm to calculate the normal-

ized collision-induced bit error rate as part of their scheme to predict congestion

and dynamically adjust the MAC parameters for throughput optimization. We use

this technique as a reliable scheme to detect congestion on Siphon’s secondary IEEE

802.11 network. This forces a VS to refrain from offering service or to reduce its

scope of service according to the level of detected congestion.

When both primary and secondary networks are overloaded, the congestion lev-

els on both networks will eventually rise beyond certain thresholds. In that case,

CODA’s backpressure mechanism (or the similar mechanism in Fusion) will be triggered

(i.e., the system falls back to the traditional schemes that rate-control the

source and forwarding nodes to alleviate congestion [82]). In general, VSs are less

likely to be congested since they can send and receive packets at the same time

through the two different radios, in channels with different characteristics (fading,

throughput, delay, etc.).

4.5. Performance Evaluation

We use packet-level simulation to obtain preliminary performance evaluation results

for Siphon. We also discuss the implications of our results on the design choices that

shape Siphon.

4.5.1 Simulation Environment

We implement Siphon as an extension to the ns-2 [60] simulator in its simplest in-

stantiation. First, to model a VS node we add support for a second long-range radio

interface that has a transmission range of 250m. The primary low-power radio used

in our simulations is configured to have a 40m transmission range to model a typical

sensor node. We use Directed Diffusion [11] as the routing core in the simulations,

which allows the simulations to shed light on Siphon’s interaction with a realistic

data routing model where congestion can occur. For data-centric dissemination pro-

tocols that utilize empirically good paths, such as Directed Diffusion [11], there is

a simple and elegant scheme for transparent traffic redirection through the use of

a delay device. We describe the specifics of this scheme, including the appropriate

local redirection rules, in Section 4.5.2.

Our simulations use the 2 Mbps IEEE 802.11 MAC provided in ns-2 with some

modifications. We add code to the MAC to measure channel loading using the epoch

parameters (N = 3, E = 200ms, α = 0.5), as defined in Chapter 3. The MAC of

a node sets a congestion flag at the routing agent when the measured channel load

exceeds a threshold of 70%. To perform early congestion detection, as discussed

in Section 4.4.2, it might be beneficial to trigger the congestion flag at a lower

threshold. We verify this in Section 4.5.4.
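The congestion flag logic can be sketched as follows. The exact roles of N, E, and α are defined in Chapter 3 and are not restated in this section, so the sketch makes explicit assumptions: the channel is sensed over epochs of length E (the caller supplies the busy fraction of each epoch), the per-epoch values are smoothed with an exponential average of weight α, and a load estimate is only trusted after N consecutive epochs. The class and method names are hypothetical.

```python
from typing import Optional


class ChannelLoadMonitor:
    """Exponentially averaged channel load with a congestion threshold."""

    def __init__(self, n_epochs: int = 3, alpha: float = 0.5,
                 threshold: float = 0.70) -> None:
        self.n_epochs = n_epochs      # N: epochs needed before reporting
        self.alpha = alpha            # weight given to the previous average
        self.threshold = threshold    # congestion flag threshold (70%)
        self._avg: Optional[float] = None
        self._count = 0

    def end_epoch(self, busy_fraction: float) -> bool:
        """Record the busy fraction of one sensing epoch (length E) and
        return True when the congestion flag should be set."""
        if self._avg is None:
            self._avg = busy_fraction
        else:
            self._avg = (self.alpha * self._avg
                         + (1.0 - self.alpha) * busy_fraction)
        self._count += 1
        if self._count < self.n_epochs:
            return False              # not enough epochs observed yet
        return self._avg > self.threshold

    def reset(self) -> None:
        """Discard the running estimate, e.g., when the node goes idle."""
        self._avg = None
        self._count = 0
```

Passing a lower `threshold` models the early congestion detection mentioned above without changing the measurement itself.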

In all our experiments, we use random topologies with different network sizes.

For each network size, our results are averaged over five different generated topolo-

gies and each value is reported with its corresponding 95% confidence interval. In

most of our simulations, we use a fixed workload that consists of six sources and

two physical sinks. A sink subscribes to three data types corresponding to three

different sources [11]. Note, however, that the network dynamics in the simulations

are non-deterministic because each sink subscribes at a random time to a set of

sources that is randomly chosen over different simulation sessions. Therefore, the

congestion periods and locations are also non-deterministic because of Directed Dif-

fusion’s ability to choose empirically good paths that are dynamically adapted to

the network conditions.

To model the large-scale, impulse type data traffic that is generated from an event

epicenter, all sources are located in a neighborhood of a node randomly selected from

nodes in the network. Sinks are uniformly scattered across the sensor field.

4.5.2 Delay Device and Directed Diffusion

Here we describe a scheme to seamlessly integrate Siphon redirection with data-

centric dissemination protocols using Directed Diffusion as an example. In Directed

Diffusion, the sources initially generate low rate data packets that are marked ex-

ploratory and disseminated through multiple paths toward the physical sink. Based

on the measured delivery performance, the sink later reinforces one or more empirically good paths capable of delivering high-quality data traffic (i.e., with lowest

latency and highest fidelity delivery). Subsequently, the sources generate higher

rate data packets, no longer marked exploratory, which are transported along the

reinforced paths.

As mentioned in Section 4.4.3, when routing protocols are delay-sensitive, the

enhanced service offered by Siphon can affect routing decisions. This is certainly the

case for Directed Diffusion, which is used as the routing protocol for both primary

and secondary networks in our simulations. Specifically, exploratory data packets

traversing the low-delay paths provided by the VS secondary network will almost

certainly reach the physical sink before packets passing through the primary net-

work. As a result, paths using the VS secondary network will always be reinforced,

regardless of the congestion state of the network when using delay-sensitive protocols such as Directed Diffusion. In Section 4.5.6, we discuss the merits of such

always-on operation of the secondary network. In general, however, a mechanism

that allows for the conditional (i.e., “on-demand”) usage of the low-delay VS paths

is required.

To this end, we implement a delay device on each VS that operates on the

secondary radio interface and is activated whenever a VS forwards an exploratory

data packet through the long-range radio. The device delays the forwarding of

exploratory data packets via the secondary radio by D seconds. D should be large

enough such that these exploratory data packets will not be the first to be delivered

to the physical sink, instead allowing packets to reach the physical sink via the

primary radio network first. For example, in our simulations we use the maximum

round-trip delay between two nodes that are furthest apart from each other in the

network as the value of D for the delay device. In this manner, paths on the primary

network (instead of the secondary network via VSs) will be reinforced by Directed

Diffusion, and this situation will persist while the network is in a non-congested

state.

When a node within the visibility scope of a VS detects congestion while for-

warding data packets, it takes action such that the VS secondary network becomes

more attractive to Directed Diffusion, allowing traffic to be siphoned from the con-

gested region through the VSs. Specifically, such a node selectively duplicates a

data packet (e.g., one in every fifty data packets), marks it exploratory (using the

Directed Diffusion exploratory bit) and sets the redirection bit (the same bit as

described in Section 4.4.3), and forwards the duplicate to its VS neighbors. Note that the original packet is still forwarded along the existing routing paths during this period.

A VS receiving an exploratory data message with the redirection bit set will dis-

able the delay device and forward the message immediately through both interfaces

(assuming they have matching gradient entries [11]). Without the delay added by

the delay device, a dissemination path over the VS secondary network is likely to

be reinforced by the sink. Subsequently, high rate data will be redirected over the

secondary network, until the congestion ends. At that point, the node that had

originally signaled the congestion stops setting the redirection bit in data packets

it forwards through the VS. The VS will reinstitute the delay device, and Directed

Diffusion will ultimately again reinforce the best path(s) on the primary network.

Note that the low rate exploratory data message serves the purpose of probing for

new routes that are better (e.g., have lower latencies). If the VS’s long-range radio

link is overloaded, then delivery performance will suffer and it will not necessarily

be used. This self-balancing feature helps to prevent an overloaded VS from being

utilized.
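
The interaction between a congested node and the delay device can be sketched as follows. This is a simplified, per-packet model written for illustration only: the `Packet`, `CongestedNode`, and `DelayDevice` names are hypothetical, the delay decision is made statelessly from the packet's bits, and gradient matching and the radio interfaces are omitted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    seq: int
    exploratory: bool = False   # Directed Diffusion exploratory bit
    redirection: bool = False   # Siphon redirection bit


class CongestedNode:
    """Node inside a VS's visibility scope that has detected congestion."""

    def __init__(self, duplicate_every=50):
        self.duplicate_every = duplicate_every  # e.g., one in every fifty packets
        self.forwarded = 0

    def forward(self, pkt):
        """Return the original packet plus, occasionally, a marked duplicate."""
        self.forwarded += 1
        out = [pkt]  # the original always follows the existing routing path
        if self.forwarded % self.duplicate_every == 0:
            # The duplicate is the copy that is sent to the VS neighbors.
            out.append(replace(pkt, exploratory=True, redirection=True))
        return out


class DelayDevice:
    """Delay applied by a VS before forwarding on its long-range radio."""

    def __init__(self, delay_s):
        self.delay_s = delay_s  # D

    def forwarding_delay(self, pkt):
        if pkt.exploratory and not pkt.redirection:
            return self.delay_s  # hold back so primary paths are reinforced
        return 0.0               # redirection bit set, or ordinary data
```

In this model, once the congested node stops emitting marked duplicates, every exploratory packet reaching the VS is again delayed by D, which mirrors the reinstatement of the delay device described above.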

The use of delay devices on the VSs provides an elegant way for Siphon to seam-

lessly interact with Directed Diffusion. This delay device resolves any undesirable

behavior that is brought about through the interaction of Siphon with a delay-sensitive routing scheme such as Directed Diffusion. It can also be used by the VS

to assert active control. For example, a VS may selectively enable/disable the delay

device based on its own view of the environment or local conditions, such as the

amount of energy left. A VS that is not energy-constrained might disable the delay

device, thus enabling the siphoning function even when there is no congestion in the

neighborhood.

4.5.3 Energy Tax, Fidelity Ratio and Residual Energy

We define three metrics to analyze the performance of Siphon on sensing applica-

tions:

• Average Energy Tax - this metric is the ratio between the total number of packets dropped in the sensor network and the total number of packets received at the physical sinks over a simulation session. Dropped packets include the MAC signaling (e.g., RTS/CTS/ACK and ARP), event data, and Diffusion messaging packets. Since packet transmission/reception consumes the main portion of the energy of a node, the number of wasted packets per received packet directly indicates the energy-saving aspect of Siphon when compared to the case without it.

• Average Fidelity Ratio - we define the data fidelity as the delivery of the

required number of data packets within a certain time limit. This metric is

the ratio between the average number of data packets received at a physical

sink when using Siphon and when using vanilla Directed Diffusion. The ratio

indicates the fidelity improvement/degradation by using Siphon.

• Residual Energy - we use the ns-2 energy model for IEEE 802.11 networks

to measure the remaining energy of each node at the end of a simulation.

This metric is calculated by normalizing the remaining energy to each node’s

initial energy. The residual energy distribution allows us to examine the load

balancing feature of Siphon and to estimate effective network lifetime.

We use these three metrics to evaluate and quantify the benefits of using Siphon

under different scenarios and configurations in the following sections.
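
The three metrics defined above reduce to simple ratios. The sketch below restates them in code; the function and parameter names are hypothetical, and the inputs are assumed to be counts and energy values collected at the end of a simulation session.

```python
def energy_tax(packets_dropped, packets_received_at_sinks):
    """Dropped packets in the network per packet received at the physical sinks."""
    if packets_received_at_sinks <= 0:
        raise ValueError("at least one packet must be received at the sinks")
    return packets_dropped / packets_received_at_sinks


def fidelity_ratio(received_with_siphon, received_with_diffusion):
    """Average packets received at a sink with Siphon, relative to vanilla Directed Diffusion."""
    if received_with_diffusion <= 0:
        raise ValueError("the baseline must have received at least one packet")
    return received_with_siphon / received_with_diffusion


def residual_energy(remaining, initial):
    """Remaining energy of each node, normalized to that node's initial energy."""
    return [rem / init for rem, init in zip(remaining, initial)]
```

A fidelity ratio above 1 indicates an improvement from using Siphon, and a value below 1 indicates a degradation.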

4.5.4 Early Congestion Detection

Using fidelity and energy tax performance as a guide, we first search for a congestion

(channel load) threshold that will trigger traffic siphoning to best avoid congestion

in the funnel. We simulate a network of 30 nodes, where 2 nodes are randomly

selected as VSs, one of which is also selected as the physical sink. There is only one

VS within the network that can be utilized to redirect data traffic. Six nodes are

selected as sources according to the process described previously in Section 4.5.1.

Each source generates 15 packets per second sent toward the physical sink starting

at a random time distributed uniformly from 10 to 15 seconds into the simulation,

Figure 4-3: Early Congestion Detection. Different congestion level thresholds that can avoid congestion down the funnel. (Plot of normalized ratio against congestion level threshold, in channel load %; curves: Fidelity Ratio, Energy Tax.)

and run for 100 more seconds. In the simulations, we vary the congestion threshold

at which we should start redirecting data traffic to a nearby VS. Figure 4-3 plots

both metrics (fidelity and energy tax) against different congestion levels.

In the simulation, we strategically place the VS at a location within a few hops

of the propagation funnel toward the physical sink. Figure 4-3 shows that as long

as the VS is utilized for traffic siphoning, the data fidelity is improved regardless of

the congestion level threshold. However, the energy tax of the network rises quickly

when the threshold is set higher than 80%. In our simulations, we observe that a

channel utilization of 80% is where the channel saturates and suffers from frequent

collisions between neighboring nodes. Note that this is also the threshold chosen to

trigger CODA’s open-loop backpressure scheme in Chapter 3. This indicates that

when the threshold is set too high it is too late to divert traffic at a location that

is deep in the funnel. Considering Siphon is a complementary scheme (to CODA)

that prevents congestion by diverting traffic earlier in the funnel, Figure 4-3 indicates

that a threshold that is slightly lower than the channel saturation level would be

appropriate. For example, 70% is appropriate since the energy tax is only slightly

Figure 4-4: The impact of the visibility scope of a VS for a network of 30 nodes. (Plot of fidelity ratio and energy tax saving against the visibility scope of the virtual sink, in number of hops; curves: Fidelity, Energy Tax Saving.)

higher than that incurred by the lower thresholds. Notice that utilizing a VS at a

lower threshold means its energy is more quickly drained.

While a high buffer occupancy can also serve as a good indicator for congestion,

we observe that, in our simulator, it grows at a much slower rate than the channel

load. In Section 4.6.2 we investigate an appropriate buffer occupancy level threshold

that best predicts congestion in our sensor testbed.

4.5.5 Virtual Sink’s Visibility Scope Impact

In what follows, we investigate the visibility scope of a virtual sink. We vary the

scope l from 1 to 5 and measure the fidelity ratio as well as the average energy tax.

In Figure 4-4, the energy tax is normalized such that it represents the energy tax

savings when using Siphon.

Figure 4-4 shows that for all values of l, the average fidelity ratio is larger than

1 (despite the high variability when l is larger than 2), indicating that fidelity can

be improved whenever a VS is utilized. However, the energy tax savings decrease when l is larger than 2, and drop rapidly below zero, indicating that the nodes

actually consume more energy when they utilize and redirect data traffic to a VS

Figure 4-5: Fidelity and Energy Tax performance in a network where there are always-on virtual sinks. (Plot of normalized ratio; curves: Fidelity Ratio, Energy Tax Saving.)

that is more than two hops away. Through careful examination of the details of our

simulation, we observe that when l is larger than 2, it often creates local congestion

around the VS as more nodes within the funnel are trying to redirect data through

the same VS. This causes frequent collisions and therefore more packet drops and

more retransmissions.

Figure 4-4 shows that when l is 2, both the fidelity gain (20%) and energy tax

saving (60%) are the highest, and have smaller confidence intervals, indicating that

the optimal scope is l equal to 2.
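
For analysis purposes, the set of nodes that fall within a VS's visibility scope of l hops can be obtained with a breadth-first search over the primary-network topology. The sketch below is an offline illustration under that assumption; it is not the scope control mechanism itself, and the adjacency-dictionary representation and function name are hypothetical.

```python
from collections import deque


def nodes_in_scope(adjacency, vs, scope_hops):
    """Return {node: hop_count} for all nodes within scope_hops of the VS."""
    dist = {vs: 0}
    queue = deque([vs])
    while queue:
        node = queue.popleft()
        if dist[node] == scope_hops:
            continue  # do not expand beyond the visibility scope
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[vs]  # the VS itself is not a candidate for redirection
    return dist
```

With a larger scope, more nodes in the funnel become candidates for redirecting traffic through the same VS, which is the condition under which local congestion around the VS was observed.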

4.5.6 Always-on versus On-demand Virtual Sinks

An always-on VS continuously powers up the secondary radio to help forward nearby

data traffic, regardless of the congestion conditions in the neighborhood. This can

help to enhance data delivery service in the field at the expense of consuming more

of its own energy. On the other hand, an on-demand VS will not power up its

long-range radio unless its visibility scope overlaps a congestion region. In what

follows, we specifically investigate the tradeoff of these two approaches. To model

Figure 4-6: Fidelity and Energy Tax performance in a network where there are virtual sinks that are put into service only when congestion is detected. (Plot of normalized ratio; curves: Fidelity Ratio, E-Tax Saving, Fidelity Ratio (Partitioned), E-Tax Saving (Partitioned).)

an always-on VS, we simply disable the delay device on the node, as discussed in

Section 4.5.2.

Figures 4-5 and 4-6 present the fidelity ratio and energy tax saving performance

of these two approaches in a set of networks of different sizes. Six sources and two

physical sinks form two propagation funnels in the network. In each simulation,

5% of the nodes are randomly selected to be the VSs; the process is similar to that discussed in Section 4.3.2. In this scenario, the 5% VSs are uniformly distributed

across the field and form a connected secondary network over long range radios.

Figure 4-6 also includes another set of plots that present the scenario when the VSs

cannot form a connected secondary network, as discussed in Section 4.5.7.

Always-on VSs are utilized whenever they are able to deliver data events to a

physical sink with lower delay and higher fidelity. Figure 4-5 shows that Siphon is

able to obtain greater fidelity gain in a larger network, although the energy gain does

not follow the same trend. The fidelity gain increases almost linearly with increasing

network sizes, while the gain in energy tax levels off after a network size of 50 nodes.

This indicates that when the number of nodes in the network increases, the number

of dropped packets increases almost linearly because of longer propagation path

and more intense funneling effect. But with Siphon, the VSs are able to siphon off

events to maintain the fidelity level regardless of the linearly increasing number of

packet drops. Without Siphon, the packet delivery service degrades linearly while

the number of packets dropped (wasted) increases rapidly. This indicates that the

energy tax of Siphon degrades much more slowly than that of vanilla Directed Diffusion.

When the network size increases, Siphon can continue to obtain larger fidelity gains,

although the energy benefit obtained does not keep pace.

Figure 4-6 closely agrees with Figure 4-5, except with a much higher degree of

variability (indicated by the error bars that represent the 95% confidence intervals).

This indicates that, instead of utilizing VSs in an always-on fashion and possibly exhausting their energy, it may be preferable to use on-demand VSs, which power up the secondary long-range radio only in times of congestion yet achieve almost as good energy savings and fidelity improvement. Figure 4-6 also shows the efficacy

of our congestion detection scheme since it enables the nodes within the visibility

scope of a VS to correctly detect congestion and utilize the nearby VS. However,

the on-demand nature of this approach increases the dynamics and introduces more

disturbance into the network, hence the high degree of variability in the plot. This

result clearly illustrates the tradeoff between data delivery, service stability, and

energy consumption of the VSs.

4.5.7 Partitioned Secondary Network

If only a small number of VSs are deployed in the network or if the physical sink

does not support a secondary radio, then the VSs may not form a connected network

and the primary short-range channel must be used to deliver packets between VSs.

To model this, we move VS functionality from one of the physical sinks to another

node. This action conserves the number of VSs, while partitioning the secondary

network. Each of the simulations described in the previous section is repeated. The

result is presented in Figure 4-6, which shows that both the fidelity and energy tax

gains are much smaller (and have higher variability) than their connected network

counterparts, especially for energy tax performance in smaller networks. For exam-

ple, in a 30-node network, the energy tax sometimes is even higher than without

siphoning, indicated by the error bars of energy tax saving that include negative val-

ues in Figure 4-6. We observe that in smaller networks, the paths that connect VSs

through the primary channel often coincide with the original propagation funnels

toward the physical sinks. Although this could improve load balancing by diverting

traffic through VSs and their surrounding neighbors, it does not eliminate network

bottlenecks caused by the funneling effect. This result suggests that a connected secondary network is required to reap a consistent benefit from traffic siphoning for

the purpose of congestion avoidance.

4.5.8 VS Density Impact

In Section 4.5.5 we demonstrate that a visibility scope of 2 hops is most appropriate. Section 4.5.7 describes the two disadvantages of having a partitioned secondary network. These results influence the distribution of VSs in a network. Kumar and Xue

[86] proved an asymptotic result for full connectivity within a randomly distributed

wireless network, i.e., in a wireless network consisting of n nodes, the network is asymptotically connected if each node connects to more than 5.1774 log n nearest

neighbors.

Figure 4-7: Number of sensor nodes required to ensure connectivity in the corresponding areas of network coverage, as well as the number of VSs (right vertical axis) required to ensure performance improvement. (Plot titled "Connectivity-proof requirement of network coverage": area of network coverage in m^2 on the left axis and number of V-sinks on the right axis, against the number of sensor nodes; curves: Ordinary Sensor, Virtual Sink.)

This result provides an analytical foundation and an elegant strategy for connected ad hoc network deployment. Considering a radio communication range of r and using the uniform i.i.d. node placement assumption in [86], we can derive that a randomly distributed network of N nodes can fully cover an area and guarantee full connectivity therein if it fulfills the constraint:

$$\mathrm{Area} = \frac{N \pi r^{2}}{5.1774 \log N}. \qquad (4.1)$$

Meanwhile, considering a visibility scope of l hops, we can calculate the number of VSs required in this network deployment as:

$$N_{v\text{-}sink} = \frac{\mathrm{Area}}{\pi (l r)^{2}} = \frac{N}{5.1774 \log N \cdot l^{2}}. \qquad (4.2)$$

Using Equations 4.1 and 4.2, one can determine the number of sensors and VSs required to populate a designated area of interest. Figure 4-7 plots the above expressions numerically against the number of sensors N. The radio communication range

of a sensor is r = 40m, while the long-range radio communication range of a VS

is 250m and the visibility scope l is 2 hops. With this specific setup, according

Figure 4-8: Fraction of Virtual Sinks needed to assure improved network performance. (Plot of the fraction of V-sinks, in %, against the number of sensor nodes; curve: Percentage of V-sinks.)

to Figure 4-7, an area of 600² m² would require 1000 randomly distributed sensors

to ensure network connectivity, while 16 VSs are enough to guarantee performance

improvement from siphoning. To further understand the cost of siphoning, Figure

4-8 presents the fraction of VSs needed in this scenario. We observe that as the

network size increases, the cost actually decreases, e.g., only 1.6% of nodes need

to be VSs for a 1000-node network.
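
The deployment figures quoted above can be checked numerically. The sketch below evaluates the covered area from Equation 4.1 and the number of VSs as Area / (π (l r)²). The base of the logarithm is not stated in the text; the sketch assumes base 10, which roughly reproduces the quoted values, and lets the caller supply a different logarithm. The function names are hypothetical.

```python
import math


def coverage_area(n_nodes, radio_range_m, log=math.log10):
    """Area (m^2) that n_nodes can cover while remaining connected (Equation 4.1)."""
    return n_nodes * math.pi * radio_range_m ** 2 / (5.1774 * log(n_nodes))


def virtual_sinks_needed(n_nodes, radio_range_m, scope_hops, log=math.log10):
    """Number of VSs so that each covers a disc of radius scope_hops * radio_range_m."""
    area = coverage_area(n_nodes, radio_range_m, log)
    return area / (math.pi * (scope_hops * radio_range_m) ** 2)


if __name__ == "__main__":
    area = coverage_area(1000, 40.0)
    n_vs = virtual_sinks_needed(1000, 40.0, 2)
    print(f"side of square area: {math.sqrt(area):.0f} m")
    print(f"virtual sinks: {n_vs:.1f} ({100 * n_vs / 1000:.1f}% of nodes)")
```

Under the base-10 assumption, a 1000-node network with r = 40 m and l = 2 yields an area slightly below 600² m² and about 16 VSs, i.e., roughly 1.6% of the nodes.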

On the other hand, the connected secondary network requirement imposes a

lower-bound on the ratio between the transmission range of the long-range radio

and the low-power radio. In Figure 4-9, we consider the network area covered

by the secondary long-range radio of VSs with different transmission range ratios,

and plot them against the number of VSs in the network. We observe that for a

visibility scope of l = 2, when the transmission ratio > 5, the network coverage

of the required number of VSs is always smaller than the network coverage of the

same number of VSs needed to ensure connectivity in the secondary network. In other

words, for a specific area, the number of VSs required for a visibility scope of 2 is

Figure 4-9: Requirement for a connected secondary network. The transmission range of the long-range radio is expressed as a multiple of the transmission radius of the low-power radio. The visibility scope requirement that assures both energy tax and fidelity improvement is plotted as filled squares in the figure. (Plot titled "Fully connected secondary network requirement": area of network coverage in m^2 against the number of V-sinks; curves: 6.25x, 5x, 4x, 3x, and 2x Xmit Radius, and Visibility scope of 2.)

always larger than that required to ensure full connectivity in the same area. This

indicates that if the transmission range of a VS’s long-range radio is at least 5 times

that of its low-power radio, then the number of VSs required for visibility scope of

2 also guarantees a connected secondary network. Therefore, Figures 4-7 and 4-8

provide an important and accurate roadmap for network deployment.

4.5.9 Load Balancing Feature

In what follows, we study the load balancing feature of Siphon in terms of its energy

impact on the network. We simulate a moderate-size network of 70 nodes. Three VSs

are scattered at random locations in the network. One of these VSs is selected

as the physical sink, subscribing to six randomly designated sources that generate

data at 10 packets per second. We measure the residual energy of each node at the

end of each simulation and plot the average complementary cumulative distribution

function (complementary CDF) of the residual energy distribution of the network in Figure 4-10.

Figure 4-10: Energy Distribution (Complementary CDF) of a 70-node network with 3 virtual sinks scattered randomly across the network. With Siphon’s load balancing feature, more nodes share the energy load. Therefore, fewer nodes have residual energy larger than 85%, but more nodes have larger residual energy (e.g., the percentage of nodes having residual energy larger than 75% increases from 60% to 85%), effectively increasing the operational lifetime of the network. (Plot of the complementary CDF, Prob(X > b), against residual energy in %; curves: Directed Diffusion, Siphon+Directed Diffusion.)

Figure 4-10 shows that the minimum residual energy of the network increases

from 67% to 72%. In other words, all of the nodes have residual energy larger than

72% of their initial energy capacity. However, the plot also shows that the probability of any node having residual energy larger than 86% is 0%, while without Siphon,

the probability is higher. This indicates that the maximum residual energy of the

nodes decreases because more nodes are involved in forwarding packets in Siphon,

thus, more nodes share the energy consumption. In other words, there are fewer rich

nodes (in terms of energy), but overall there are more richer nodes in the network

as a result of the load balancing feature of Siphon. Note that there is no node

possessing residual energy more than 88%; all nodes at least spend some energy.

This is because of the periodic interest flooding requirement of Directed Diffusion.

In summary, Siphon can balance the load in the network so that more nodes have

higher residual energy as more nodes share the energy load, effectively increasing

the operational lifetime of the network.
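
The complementary CDF used in this analysis gives, for each energy level b, the fraction of nodes whose residual energy exceeds b. A minimal sketch of that computation follows; the function name and the list-based inputs are assumptions made for illustration.

```python
def complementary_cdf(residuals, thresholds):
    """Fraction of nodes with residual energy strictly greater than each threshold."""
    if not residuals:
        raise ValueError("at least one residual energy sample is required")
    total = len(residuals)
    return [sum(1 for r in residuals if r > b) / total for b in thresholds]
```

Reading the curve at its left edge gives the minimum residual energy in the network (the largest b at which the value is still 1), and the point at which it reaches 0 gives the maximum.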

4.6. Sensor Network Testbed Implementation

In this section, we discuss the implementation of Siphon on a real sensor network

using the TinyOS platform [19] on Mica motes [23] and the Stargate platform [48].

We report evaluation results, including appropriate threshold values for congestion

levels that should trigger the traffic redirection, and an evaluation of Siphon in a

generic data dissemination application as compared to CODA [82]. We also evaluate

a “post-facto” approach that activates the VSs only after congestion has occurred

and impacted the application’s fidelity, as discussed in Section 4.4.2.2.

4.6.1 Stargate and Mica Mote Testbed

The sensor device (Mica) has an ATMEL 4 MHz, low power, 8-bit microcontroller

with 128K bytes of program memory and 4K bytes of data memory. The radio is a single

channel RFM radio transceiver operating at 916 MHz and capable of transmitting

at 10 kbps using on-off-keying encoding. The Stargate is a powerful single board

computer with enhanced communications and sensor signal processing capabilities

that runs a version of embedded Linux. The Stargate uses Intel’s 400 MHz X-Scale

processor and supports serial communication with Mica through a 51-pin connector.

Also, the Stargate supports a PCMCIA slot where we can install an IEEE 802.11b

network card, enabling the Stargate to become a VS that talks to both the long-

range IEEE 802.11 network as well as the short-range RFM radio network formed

by the Mica motes.

We implement CODA’s open-loop control function that supports priority for-

warding of packets from a list of pre-defined data types. This includes a channel

load measurement MAC module as described in Chapter 3 on Mica. To support

Figure 4-11: A sensor network testbed of 30 nodes. (Diagram labels: Virtual Sink, three Sources, Sink.)

Siphon, we use Stargates as VSs and implement the traffic redirecting function as

well as the VS visibility scope control function on both Mica and Stargate platforms.

4.6.2 Congestion Detection for Traffic Redirection Decision

An important decision that must be made when using Siphon is the congestion

threshold at which it is appropriate to start redirecting data traffic to a nearby VS.

To determine this threshold experimentally, we deployed a 30-node sensor network

using the Mica motes, as shown in Figure 4-11. The topology is arranged to capture

the funneling effect described in Section 4.3.1.

4.6.2.1 Channel Load Threshold

Due to the application-specific nature of sensor networks, an appropriate choice for

the early congestion detection threshold must be based on application loss tolerance

parameters. The energy tax that we define in Section 4.5.3 is an appropriate metric

Figure 4-12: Early congestion detection threshold. An appropriate choice for the early congestion detection threshold must be based on application loss tolerance parameters. (Plot of energy tax against channel load in %; curves: Single Stream, Three Streams.)

to measure loss tolerances of an application because it represents the number of drops

in the network per single event delivered at the physical sink. In the experiment,

we set up a node at one end to be a source and a sink at the other end (Figure

4-11). The source generates data packets at different rates that drive the network

to different levels of congestion. Each scenario is repeated five times. We calculate

the average energy tax of the network and plot it against the average channel load

measurement of a node that is located at the middle position in the topology. Figure

4-12 shows that for a single stream of data traffic, the energy tax increases from 0

to 3 when the channel load increases from 10% to 30%. It increases exponentially

when the channel load reaches 70%, which indicates that the channel is saturated. (Note that this result closely matches the result reported in Chapter 3, in which we measured the β value that limits the channel throughput upper bound of a Mica mote radio to be around 70%.)

To capture a more realistic traffic pattern and congestion scenario, in Figure 4-12 we also plot the curve for three forwarding streams involving three different data sources disseminating data toward a sink (Figure 4-11). We observe that the

two curves closely match each other between the channel load range of 30% and

70%. This indicates that our channel load measurement is accurate in reflecting

the energy tax irrespective of the number of traffic streams in the network. This

is an important feature since a node is not generally aware of the number of traffic

streams in the network and should thus not be required to make routing decisions

based on it (e.g., when to redirect the traffic to a nearby VS).

Following the rationale suggested in Section 4.5.4 from the simulation results, a

threshold that is slightly lower than the channel saturation level would be appropriate. According to Figure 4-12, a threshold between 60% and 70% should be

appropriate to trigger traffic siphoning to avoid congestion. However, considering

the much higher energy tax (around 10 for a channel load of 60%) of the RFM radio

network compared to the IEEE 802.11 network, we choose a smaller threshold, i.e.,

30%, for early congestion detection.

4.6.2.2 Buffer Occupancy Threshold

Queue management is often used in traditional data networks for congestion de-

tection, i.e., congestion is signified when a node’s buffer occupancy grows beyond a

high water mark level. However, as discussed previously in Section 4.5.4, we observe

through simulations that the channel load provides a much faster and more reliable indication of network congestion than buffer occupancy. While the same observation holds true for our mote-based sensor testbed, we observe one exception: when the time-varying channel suffers from occasional deep fades for an extended

time period. During this period, while the measured channel load is low, few pack-

ets can be delivered between forwarding nodes. Therefore, the queue of the sending

node grows quickly (when link-ARQ is used) and eventually overflows and starts

Figure 4-13: Queueing performance and buffer occupancy threshold for congestion avoidance. (Plot of delivery ratio and buffer occupancy against channel load/utilization; curves: Packet Delivery Ratio, Buffer Occupancy.)

dropping packets. Based on this observation, it is beneficial to determine an ap-

propriate buffer occupancy level that can reliably indicate congestion in addition to

channel load indication.

In our testbed, we generate data packets at different rates and measure the aver-

age queue size of the nodes in a small neighborhood that share the wireless medium.

Figure 4-13 plots the measured normalized average buffer occupancy against the

channel load (utilization). We also plot the packet delivery ratio between neigh-

boring nodes in the same figure. We observe that the buffer occupancy is small

(≤ 15%) when the channel quality is excellent and the packet delivery ratio is high.

On the other hand, when the buffer occupancy ≥ 20%, the packet delivery ratio

drops quickly signifying the onset of congestion. Based on this result, we set the

buffer occupancy threshold to 20% in our testbed for all experiments discussed in

the next section.
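
The two testbed thresholds can be combined into a single redirection decision. The sketch below assumes that either indicator alone is sufficient to trigger siphoning, so that the buffer occupancy check covers the deep-fade case in which the measured channel load stays low; this combination rule and the function name are assumptions made for illustration.

```python
def should_redirect(channel_load, buffer_occupancy,
                    load_threshold=0.30, buffer_threshold=0.20):
    """Decide whether a node should start redirecting traffic to a nearby VS.

    channel_load and buffer_occupancy are fractions in [0, 1]. The defaults are
    the testbed thresholds chosen in Sections 4.6.2.1 and 4.6.2.2.
    """
    load_congested = channel_load > load_threshold
    buffer_congested = buffer_occupancy >= buffer_threshold
    return load_congested or buffer_congested
```

For example, `should_redirect(0.10, 0.25)` returns `True`: the channel appears lightly loaded, but the growing queue indicates that packets are not being delivered.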

4.6.3 A Generic Data Dissemination Application

In what follows, we evaluate Siphon using a realistic data dissemination application

and compare the result with CODA’s open loop control. We reuse the sensor network

topology in Figure 4-11 to carry out the experiments, only this time we add two

Stargates into the sensor network testbed, one of which is also a physical sink.

The other is a VS that is placed at random locations in the testbed for different

experiments.

For every scenario, we collect data for three different cases:

• The vanilla application without any congestion control/avoidance mechanism

• CODA open-loop control with priority support, in which one of the sources (Src-3) generates data packets with higher priority than the other two

• Siphon with one VS that is placed at random locations for different experi-

ments.

Five independent experiments are conducted for each case and we calculate the

average energy tax savings and fidelity ratio. Both metrics are normalized to the

result obtained for the case without any congestion control/avoidance mechanism.

Figure 4-14 presents the results in bar charts. From the figure, we observe that

in terms of energy tax, CODA’s open-loop hop-by-hop backpressure scheme has

limited benefits in this scenario since the hotspot is far away from the sources and

the congestion is persistent. However, CODA’s priority support for Src-3 improves

both its energy tax (up to 55%) and fidelity (up to 200%). Siphon, on the other

hand, improves both energy tax (12% to 68%) and fidelity (10% to 110%) for

all sources.

Figure 4-14: Siphon performance in a real sensor network of 30 nodes. Notice the priority favor of CODA toward Src-3. (Two bar charts comparing Siphon and CODA: Energy Tax Saving, axis 0 to 0.7, and Fidelity Ratio, axis 0 to 2.)

Figure 4-15: Post-facto traffic redirection versus early-detection approach. (Plot of Normalized Ratio, axis 0 to 3, against Average Traffic/Channel Load, 0.1 to 0.7; series: Fidelity Ratio and E-Tax Saving, each for post-facto and early detection.)

4.6.4 Post-Facto Traffic Siphoning

As discussed in Section 4.4.2.2, a physical sink can infer congestion by monitoring

the event data quality, and enable “post-facto” traffic siphoning through the secondary

network only when the measured application fidelity is degraded below a certain

threshold. We implement an application agent that analyzes in real-time the event

data delivery ratio of each source on the physical sink. The agent calculates the

moving average of the data delivery ratio using a window of five seconds, and will

initiate VS signaling when the measured delivery ratio is lower than 60% for at least

10 seconds. Figure 4-15 presents the results as compared to its early-detection based

counterpart under different traffic loads.
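
The triggering rule of the agent can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the agent's actual code: the class name `PostFactoAgent`, the `update` method, and the choice of one delivery-ratio sample per call are hypothetical. Only the five-second window, the 60% threshold and the ten-second persistence requirement come from the description above.

```python
from collections import deque


class PostFactoAgent:
    """Tracks one source's delivery ratio and decides when to signal the VS."""

    def __init__(self, window_s=5.0, threshold=0.60, hold_s=10.0):
        self.window_s = window_s      # moving-average window (seconds)
        self.threshold = threshold    # delivery ratio considered degraded
        self.hold_s = hold_s          # how long it must stay degraded
        self._samples = deque()       # (timestamp, delivery_ratio) pairs
        self._below_since = None      # time the average first dropped

    def update(self, now, delivery_ratio):
        """Record a sample; return True when VS signaling should start."""
        self._samples.append((now, delivery_ratio))
        # Drop samples that have fallen out of the moving-average window.
        while self._samples[0][0] <= now - self.window_s:
            self._samples.popleft()
        average = sum(r for _, r in self._samples) / len(self._samples)
        if average < self.threshold:
            if self._below_since is None:
                self._below_since = now
            return now - self._below_since >= self.hold_s
        # The average recovered, so the persistence timer is reset.
        self._below_since = None
        return False


if __name__ == "__main__":
    agent = PostFactoAgent()
    for t in range(30):
        ratio = 0.9 if t < 5 else 0.3  # made-up trace: quality drops at t = 5
        if agent.update(float(t), ratio):
            print("initiate VS signaling at t =", t)
            break
```

Resetting the timer whenever the average recovers means that brief dips in delivery ratio do not trigger siphoning; only sustained degradation does.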

Figure 4-15 shows that while the post-facto approach does not perform as well as

the early-detection approach under high traffic load scenarios (≥ 50% channel load),

it performs as well in the lower traffic load region. In fact, the post-facto approach

performs better than the early-detection approach at traffic loads lower than 30%.

We observe that under low traffic load, the network sometimes suffers from poor

connectivity or frequent collisions due to hidden terminals during the periods in

which both the measured channel load and buffer occupancy are low. As a result,

the measured data delivery ratio degrades and triggers post-facto traffic siphoning

that improves subsequent data delivery, while in the early-detection approach the VS is

not utilized because of the perceived low channel load and buffer occupancy.

4.7. Conclusion

There is a growing need for improved congestion control and load balancing support

in emerging sensor networks. The first generation of congestion avoidance mecha-

nisms for sensors are effective at limiting packet loss due to congestion and allowing

the network to find a stable operating point under increasing load. However, these

mechanisms are not sufficient to deal with the new types of congestion that are

an artifact of the funneling effect and a product of data impulse applications. In

this chapter, we have taken a new approach and proposed dual-radio virtual sinks.

We have discussed Siphon and its algorithms for deploying virtual sinks in sensor

networks. Siphon is evaluated using extensive simulations to gain insights into its

performance and properties under a variety of different conditions. Preliminary

results from an implementation of Siphon in an experimental testbed using Mica

motes and Stargate virtual sinks show that our approach provides substantial im-

provements over the first generation congestion control approaches.

Recent studies [53] [95] [83] show that smart utilization of multiple radios can

increase the network capacity, improve channel performance, or save energy.

In this chapter, we utilize dual-radio virtual sinks to counter traffic funnels caused

by impulse type applications, avoiding congestion and balancing load. As a broader

comment, our contribution is the exploration of general design principles that en-

able exploitation of special nodes, such as dual-radio virtual sinks, to increase the

resilience of sensor networks at affordable cost. The idea of using special nodes can

be pushed into a higher level of abstraction. For example, in this chapter we exploit

a virtual sink’s characteristic of longer transmission range. The same concept can

be extended to special nodes with higher transit bandwidth or larger storage space.

The same signaling mechanisms in traffic siphoning can be used for the different

kinds of exploitation mentioned above. Therefore, we believe Siphon’s algorithms

are more broadly applicable for a class of new applications that exploit special nodes

with additional capability (e.g., dual radio, more computation capability, more stor-

age).

Chapter 5

Conclusion

5.1. The Critical Issue of Transport Resilience

This dissertation has focused on new transport and control paradigms for a class

of new wireless networks based on sensing and actuation. Sensor networks are

embedded in the real world and interact closely with the physical environment in

which they reside. These networks must be designed to effectively deal with the

network’s dynamically changing resources, including energy, bandwidth, processing

power, node density, and connectivity. Importantly, these sensor networks must be

designed to be responsive to such changing conditions while supporting a wide range

of traffic demands from sensors. Traffic demands in sensor networks are different

from those of other traditional networks that have been studied because the injected traffic

is strongly influenced by, and coupled to, changes in the physical environment that

has been instrumented. Furthermore, sensor networks have to deal with the adverse

effects from uncertain and dynamic physical environments.

Because of these challenges we believe that sensor networks must be designed

and built with resilience as a primary design goal, and not, as in many cases in the

current state of the art, as a secondary add-on. As a general architectural comment,

a resilient sensor network must operate autonomously, changing its configuration as

required and running algorithms that are optimized for node survivability and energy

usage. This thesis does not address the resilience issues across the broad architecture

(from radio, MAC, routing, aggregation, transport to application). Rather, we focus

on one aspect of this broader architectural problem: making the sensor network

transport systems much more resilient to changes (in many cases abrupt) in the

network resources and environment, and application traffic demands.

In this dissertation, we have argued that a resilient transport system is a funda-

mental building block of future sensor networks; it is fundamental to the operations

of the network, fundamental to the stability of the network, and finally, fundamen-

tal to the energy-conserving performance goals of the network. In this dissertation,

we define transport resilience as the ability to deliver a sufficient number of events

to meet the applications’ fidelity requirements for a set of different traffic patterns

(i.e., periodic, discrete, impulse) while minimizing the energy consumption of the

network. When our study began there was little in the literature related to transport

resilience. Our investigation identified two classes of transport resilience:

1. The need to reliably deliver data with minimal energy expenditure under var-

ious error conditions.

2. The need to maintain the fidelity of the signal delivered to the applications

under congested network conditions.

We addressed the first challenge of reliable data delivery in Chapter 2 and the

congestion problem in Chapter 3 and Chapter 4.

5.2. Reliable Delivery

The first contribution of this dissertation focused on proposing a new reliable de-

livery transport paradigm for sensor networks. The Pump Slowly Fetch Quickly

(PSFQ) protocol represented the first reliable transport proposed for wireless

sensor networks. PSFQ represents a lightweight, scalable and robust transport

protocol that is customizable to meet a wide variety of application needs (e.g.,

reprogramming, actuation, reliable event delivery). We specifically focused on one

novel reliable data delivery application associated with remotely programming/re-

tasking sensor nodes over-the-air. This was the first realization of such remote

over-the-air programming capability for sensor networks, to the best of our

knowledge. PSFQ represented an enabling technology for such advanced applications that

had not been feasible prior to the development of PSFQ. In Chapter 2, we presented

the design and implementation of PSFQ, and evaluated the protocol using the ns-

2 simulator and an experimental wireless sensor testbed based on Berkeley motes

and the TinyOS operating system. We showed that PSFQ can outperform existing

related techniques (e.g., an idealized SRM) and is highly responsive to the various

error conditions experienced in sensor networks.

Chapter 2 provided several important contributions to the problem of reliable

transport in sensor networks. First, we proposed and justified hop-by-hop error

recovery in which intermediate nodes also take responsibility for loss detection and

recovery, so that reliable data exchange is done on a hop-by-hop basis rather than

end-to-end. Second, we analyzed a simplified model of our NAK-based algorithm

and showed the optimal ratio between the timers associated with the forwarding

(pump) and retransmission (fetch) operations. Third, PSFQ exhibits a novel multi-

modal communication property that provides a graceful tradeoff between the packet

switching and store-and-forward paradigms, depending on the channel conditions

encountered.
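
To make the case for hop-by-hop recovery concrete, the following sketch compares the two approaches under a deliberately simple assumption: every hop loses a packet independently with the same probability. It is an illustrative calculation only, not the analytical model of Chapter 2, and the function names are hypothetical.

```python
def end_to_end_success(p_loss: float, hops: int) -> float:
    """Probability that one packet crosses all hops with no recovery."""
    if not 0.0 <= p_loss < 1.0:
        raise ValueError("p_loss must be in [0, 1)")
    return (1.0 - p_loss) ** hops


def expected_end_to_end_attempts(p_loss: float, hops: int) -> float:
    """Expected source retransmission rounds when only the end points recover."""
    return 1.0 / end_to_end_success(p_loss, hops)


def expected_per_hop_attempts(p_loss: float) -> float:
    """Expected transmissions on a single hop when each hop recovers locally."""
    if not 0.0 <= p_loss < 1.0:
        raise ValueError("p_loss must be in [0, 1)")
    return 1.0 / (1.0 - p_loss)


if __name__ == "__main__":
    for hops in (1, 5, 10, 20):
        print(hops,
              round(end_to_end_success(0.1, hops), 4),
              round(expected_end_to_end_attempts(0.1, hops), 2),
              round(expected_per_hop_attempts(0.1), 2))
```

Under this assumption the end-to-end success probability decays exponentially with the number of hops, whereas the per-hop recovery effort depends only on the local loss rate.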

The results presented in Chapter 2 assume the data cache of each node keeps all

fragments of a file. They also assume a fixed sensor network where there is no node

mobility in the network. Future work in this area includes the investigation of cache

size limits and the impact of node mobility on the basic reliable transport protocol

design and operation. We also plan to explore other important transport issues

in sensor networks utilizing the PSFQ paradigm; we plan to investigate different

variants of PSFQ that can be optimized by different metrics such as “delay sensi-

tive” reliable delivery, (e.g., by increasing the degree of pipelining in pumping data

packets thereby speeding up data delivery), or adaptive “network density-aware”

transports, (e.g., by modulating the pump/fetch ratio to take advantage of the

node redundancy in a high density environment).

5.3. Congestion Control

Chapter 3 presented the design of an energy-efficient congestion control scheme for

sensor networks called CODA (COngestion Detection and Avoidance). We explored

a new objective function for traffic control in sensor networks that maximizes the

operational lifetime of the network while delivering acceptable data fidelity to sensor

network applications.

To enable the dynamic adaptation of sensor applications to the network condi-

tions while meeting the proposed objective function, CODA was founded on three

important distributed control mechanisms: (1) an accurate and energy-efficient con-

gestion detection scheme; (2) a hop-by-hop backpressure algorithm; and (3) a sink to

multi-source regulation scheme. In Chapter 3, we explored a number of congestion

scenarios and defined new performance metrics that captured the impact of CODA

on sensing applications’ performance. We discussed the performance benefits and

practical engineering challenges of implementing CODA in an experimental sensor

network testbed based on Berkeley motes. Both testbed and simulation results in-

dicated that CODA significantly improved the performance of data dissemination

applications such as Directed Diffusion by mitigating hotspots, and reducing the

energy consumption and fidelity penalty on sensing applications. These results are

very promising and provide a basis for further larger-scale experimentation.

CODA represents the first comprehensive solution to the congestion problem in

sensor networks. That is not to say that CODA is the final answer to this prob-

lem. Many other open issues exist. As part of future work we are studying the

performance benefits of using CODA with reliable transport mechanisms such as

PSFQ. The results from Chapter 3 also highlighted significant problems associated

with the stable operation of sensor networks that complicate the design of more

effective congestion control mechanisms; the funnelling effect dominates the trans-

port control design in sensor networks. As shown in Chapter 3, the funnelling

effect significantly limits the network capacity in delivering high fidelity data to

the applications. While first generation congestion control schemes such as CODA

are capable of avoiding congestion and costly packet loss (energy waste) in sensor

networks, they do so by using rate control techniques at sources and intermediate

sensors that limit the maximum number of events that can be transported to the

sink. This raised a significant problem that we addressed in Chapter 4 where we

explored a complementary solution to CODA that was capable of maintaining the

application fidelity during persistent overload conditions - something that CODA

was not capable of doing.

Chapter 4 introduced the concept of dual-radio virtual sinks. We proposed to

randomly distribute a small number of all-wireless dual radio virtual sinks through-

out the sensor field. In essence these virtual sinks operated as safety valves in the

sensor field by selectively siphoning off overload traffic in order to maintain the fi-

delity of the application signal at the physical sink. A key attribute of virtual sinks

is that they are equipped with a secondary long-range radio interface, such as

IEEE 802.11, in addition to their primary low power mote radio. Virtual sinks are

capable of dynamically forming a secondary ad hoc radio network. Rather than

rate-controlling packets during congestion, as is the case with CODA, virtual sinks

take the congested traffic off the low-powered sensor network and move it on to the

secondary radio network, transiting it to the final physical data sink.

Chapter 4 explored algorithms for virtual sink discovery, selection, traffic tran-

siting and load balancing. We described the use of the Stargate [48] platform to

support an all-wireless virtual sink approach in our sensor network testbed. We

showed that a small number of virtual sinks are enough to significantly improve

the data fidelity of the sensor networks. Virtual sinks and the Siphon protocols

discussed in Chapter 4 represent a new direction in traffic overload management

in sensor networks. In fact, one can design radically different protocol architectures

using dual-radio systems, as indicated with the use of multi-radio in wireless mesh

networks. We believe networks incorporating multi-radio platforms constitute a

promising direction for sensor networks. Future work will extend our model to more

than two separate radio networks and examine more affordable secondary radio sys-

tems. The interaction between the two radio systems in support of sensor traffic

needs further study. For example, we presented some initial ideas associated with

congestion push between one radio network and the other. Such behavior could lead

to instabilities if the dual-radio network is not designed with these types of subtle

interactions in mind. We plan to explore these interactions and to build more robust

multi-radio network systems.

5.4. Endnote

There have been a number of significant advances in the area of protocol design

for sensor networks over the last few years. However, we believe that the issue of

transport resilience has been neglected in the design of these protocols. As such,

we argue, the usefulness of existing sensor networks and their ability to operate

in a stable and energy efficient manner when resources, the environment, or traffic

demands suddenly shift is limited. Without building resilience into the transport

we believe the network will at best significantly underperform and at worst be

unusable, for example, because of congestion collapse.

In this dissertation we have shown the types of problems that can emerge if

resilience is not built into the transport as a first class citizen. We discussed a

number of considerations that must be included when designing resilient transport

systems for sensor networks. Our contributions are three-fold. At the transport we

designed, implemented and experimentally evaluated the PSFQ protocol for reli-

able data delivery in sensor networks. PSFQ is capable of successful operation even

under conditions that experience significant packet error losses. The last two contri-

butions are linked by the goal of solving the congestion problem which significantly

limits the operation and roll out of sensor networks because of the unique funnelling

effect that these networks exhibit. We first presented the design, implementation,

and experimental evaluation of CODA to mitigate congestion in sensor networks.

While CODA’s simple control mechanisms are effective at suppressing congestion

and allowing the network to operate at a stable point under varying traffic loads, it

does so at the cost of limiting the application’s fidelity. The final contribution of

the dissertation addresses this issue. Virtual sinks represent an enabling technology

that allows sensor networks to quickly control congestion using techniques such as

CODA, but to do so while maintaining the application’s fidelity needs. We consider

this interplay between controlled traffic transit under potentially high impulse loads

while maximizing the fidelity of the network, a compelling problem that virtual

sinks begin to address.

The work presented in this dissertation takes an experimental systems research

approach through building small scale testbeds and studying how PSFQ, CODA and

Virtual Sinks can scale to even larger distributed networked systems. The source

code for PSFQ, CODA and Virtual Sinks is freely available from the web [96] for

further experimentation.

Collectively, we hope that PSFQ, CODA, and Virtual Sinks provide a set of suit-

able energy-efficient, robust transport building blocks that can serve as a foundation

for building more resilient sensor networks.

Chapter 6

My Publications as a PhD Candidate

My publications as a Ph.D. candidate (2000-2005) are listed below. This list also

includes research papers that are indirectly related to the work presented in this

thesis, including the design and implementation of a seamless mobility solution

(Cellular IP) for mobile networks, and the investigation and comparison of various

micromobility protocols based on ns-2 simulations.

6.1. Journal Papers

• Chieh-Yih Wan, Andrew T. Campbell, and Lakshman Krishnamurthy. Pump-

Slowly, Fetch-Quickly (PSFQ): A Reliable Transport Protocol for Sensor Net-

works. IEEE Journal on Selected Areas in Communications, Vol. 23, No. 4,

pp. 862-872, April 2005.

• A. T. Campbell, J. Gomez, C-Y. Wan, S. Kim, Z. Turanyi, and A. Valko.

Internet Micromobility. Journal of High Speed Networks, Special Issue on

Multimedia in Wired and Wireless Environment, 11(3-4):177-198, September

2002.

6.2. Journal Papers under Submission

• Chieh-Yih Wan, Shane B. Eisenman, and Andrew T. Campbell. Congestion

Detection and Avoidance in Sensor Networks. IEEE/ACM Transactions on

Networking (in submission).

6.3. Magazine Papers, Review Articles and Book Chapters

• Chieh-Yih Wan, Andrew T. Campbell, and Lakshman Krishnamurthy. Reli-

able Transport for Sensor Networks. (Eds.) Taieb Znati, Krishna M. Sivalingam,

and Cauligi Raghavendra. Wireless Sensor Networks, Kluwer Academic/Springer

Verlag Publishers, Chapter 8. pp. 153-182, ISBN:1-4020-7883-8, May 2004.

• A. T. Campbell, J. Gomez, C-Y. Wan, S. Kim, Z. Turanyi, and A. Valko.

Comparison of IP Micro-Mobility Protocols. IEEE Wireless Communications

Magazine, Vol. 9, No. 1, pp. 72-78, February 2002.

• J. Gomez, S. Kim, C-Y. Wan, Z. Turanyi, A. Valko, and A. T. Campbell.

Design, Implementation and Evaluation of Cellular IP. IEEE Personal Com-

munications, Special Issue on IP-based Mobile Telecommunications Networks,

Vol. 7, No. 4, pp. 42-49, August 2000.

6.4. Conference Papers

• Chieh-Yih Wan, Shane B. Eisenman, Andrew T. Campbell and Jon Crowcroft.

Siphon: Overload Traffic Management using Multi-Radio Virtual Sinks in

Sensor Networks. Accepted and to be published in SenSys 2005.

• Chieh-Yih Wan, Andrew T. Campbell, and Jon Crowcroft. A Case for All-

Wireless, Dual Radio Virtual Sinks (poster abstract). In Proc. of Second

ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pp.

267-268, Baltimore, Nov 3-5, 2004.

• Chieh-Yih Wan, Shane B. Eisenman, and Andrew T. Campbell. CODA +

PSFQ + Virtual Sinks = Enabling Technologies for Resilient Sensor Net-

working (demo abstract). In Proc. of Second ACM Conference on Embedded

Networked Sensor Systems (SenSys 2004), p. 308, Baltimore, Nov 3-5, 2004.

• Chieh-Yih Wan, Shane B. Eisenman, and Andrew T. Campbell. CODA: COn-

gestion Detection and Avoidance in Sensor Networks. In Proc. of First ACM

Conference on Embedded Networked Sensor Systems (SenSys 2003), pp. 266-

279, Los Angeles, November 5-7, 2003.

• Chieh-Yih Wan, Andrew T. Campbell, and Lakshman Krishnamurthy. PSFQ:

A Reliable Transport Protocol For Wireless Sensor Networks. In Proc. of First

ACM International Workshop on Wireless Sensor Networks and Applications

(WSNA 2002), pp. 1-11, Atlanta, September 28, 2002.

• S. Kim, C-Y. Wan, A. T. Campbell, J. Gomez and A. G. Valko. A Cellular

IP Testbed Demonstrator. In Proc. Sixth IEEE International Workshop on

Mobile Multimedia Communications (MOMUC’99), San Diego, California, 15-

17 November 1999.

6.5. IETF Internet Drafts

• Z. D. Shelby, D. Gatzounas, C-Y. Wan, and A. T. Campbell. “Cellular IPv6”,

Internet Draft, draft-ietf-seamoby-cellularipv6-00.txt, IETF Mobile IP Working

Group Document, November 2000.

• A. T. Campbell, S. Kim, J. Gomez, C-Y. Wan, Z. Turanyi and A. Valko. “Cellular

IP”, Internet Draft, draft-ietf-mobileip-cellularip-00.txt, IETF Mobile IP

Working Group Document, December 1999.

• A. T. Campbell, S. Kim, J. Gomez, C-Y. Wan, Z. Turanyi, and A. Valko.

“Cellular IP Performance”, Internet Draft, draft-gomez-cellularip-perf-00.txt,

October 1999.

• A. T. Campbell, J. Gomez, C-Y. Wan, Z. Turanyi, and A. Valko. “Cellular

IP”, Internet Draft, draft-valko-cellularip-01.txt, October 1999.

References

[1] Chee-Yee Chong and Srikanta P. Kumar. Sensor Networks: Evolution, Opportunities, and Challenges. Proceedings of the IEEE, 91(8):1247–1256, August 2003.

[2] Ubiquitous Computing - The Third Wave in Computing. http://www.ubiq.com/hypertext/weiser/UbiHome.html.

[3] Mark Weiser. http://www.ubiq.com/weiser/.

[4] J. W. Gardner, V. K. Varadan, and O. O. Awadelkarim. Microsensors, MEMS and Smart Devices. Wiley, New York, 2001.

[5] G. Pottie and W.J. Kaiser. Wireless integrated network sensors. Communications of the ACM, 43(5):51–58, May 2000.

[6] Brett Warneke, Matt Last, Brian Liebowitz, and Kristofer S. J. Pister. Smart dust: Communicating with a cubic-millimeter computer. Computer, 34(1):44–51, 2001.

[7] David Culler, Deborah Estrin, and Mani Srivastava. Overview of sensor networks. IEEE Computer, Special Issue in Sensor Networks, August 2004.

[8] Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin, John Heidemann, and Fabio Silva. Directed diffusion for wireless sensor networking. ACM/IEEE Transactions on Networking, 11(1):2–16, February 2002.

[9] W.R. Heinzelman, J. Kulik, and H. Balakrishnan. Adaptive protocols for information dissemination in wireless sensor networks. In Proc. of the 5th Annual International Conference on Mobile Computing and Networking (Mobicom 1999), pages 174–185, 1999.

[10] John Heidemann, Fabio Silva, and Deborah Estrin. Matching data dissemination algorithms to application requirements. In Proceedings of the ACM SenSys Conference, pages 218–229, Los Angeles, California, USA, November 2003. ACM.

[11] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scalable and robust communication paradigm for sensor networks. In Proc. of the 6th Annual International Conference on Mobile Computing and Networking (Mobicom 2000), pages 56–67, August 2000.

[12] Benjie Chen, Kyle Jamieson, and Hari Balakrishnan. An energy efficient coordination algorithm for topology maintenance in ad hoc wireless networks. In Proc. of the 7th Annual International Conference on Mobile Computing and Networking (Mobicom 2001), pages 85–96, July 2001.

[13] Y. Xu, J. Heidemann, and D. Estrin. Geography-informed energy conservation for ad hoc routing. In Proc. of the 7th Annual International Conference on Mobile Computing and Networking (Mobicom 2001), pages 70–84, July 2001.

[14] W. Ye, J. Heidemann, and D. Estrin. An energy efficient MAC protocol for wireless sensor networks. In Proc. of the 21st International Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2002), pages 1567–1576. New York, June 2002.

[15] Joe Polastre, Jason Hill, and David Culler. Versatile low power media access for wireless sensor networks. In Proc. of Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pages 95–107. Baltimore, November 3-5 2004.

[16] T. V. Dam and K. Langendoen. An adaptive energy-efficient MAC protocol for wireless sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys’03), pages 171–180. Los Angeles, November 5-7 2003.

[17] X. Wang, G. Xing, Y. Zhang, C. Lu, R. Pless, and C. Gill. Integrated coverage and connectivity configuration in wireless sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 28–39. Los Angeles, November 5-7 2003.

[18] G. Veltri, Q. Huang, G. Qu, and M. Potkonjak. Minimal and maximal exposure path algorithms for wireless embedded sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 40–50. Los Angeles, November 5-7 2003.

[19] TinyOS homepage. http://webs.cs.berkeley.edu/tos/.

[20] Philip Levis and David Culler. Mate: A tiny virtual machine for sensor networks. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS X), 2002.

[21] Philip Levis, Sam Madden, David Gay, Joe Polastre, Robert Szewczyk, Alec Woo, Eric Brewer, and David Culler. The emergence of networking abstractions and techniques in TinyOS. In Proc. of the First USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI 2004), 2004.

[22] Roy Shea, Chih-Chieh Han, and Ram Rengaswamy. Motivations Behind SOS. Technical Report SOS2000-1, University of California Los Angeles, Networked Embedded Systems Lab, Los Angeles, CA, February 2004.

[23] Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, and Kristofer Pister. System architecture directions for network sensors. In Proc. of the 9th International Conf. on Arch. Support for Programming Languages and Operating Systems, pages 93–104, November 2000.

[24] Jason Hill and David Culler. Mica: A wireless platform for deeply embedded networks. IEEE Micro, 22(6):12–24, November/December 2002.

[25] moteiv: Wireless sensor networks. http://www.moteiv.com/.

[26] Jeremy Elson. Time synchronization in wireless sensor networks. Ph.D. dissertation, May 2003.

[27] S. Ganeriwal, R. Kumar, and M. B. Srivastava. Time-sync protocol for sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 138–149. Los Angeles, November 5-7 2003.

[28] Lewis Girod and Deborah Estrin. Robust range estimation using acoustic and multimodal sensing. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001). Maui, Hawaii, October 2001.

[29] Nirupama Bulusu, John Heidemann, and Deborah Estrin. GPS-less low cost outdoor localization for very small devices. IEEE Personal Communications Magazine, 7(5):28–34, October 2000.

[30] S. Grime and H. F. Durrant-Whyte. Data fusion in decentralized sensor networks. Control Engineering Practice, 2(5):849–863, 1994.

[31] D. Guo and X. Wang. Dynamic sensor collaboration via sequential Monte Carlo. IEEE Journal on Selected Areas in Communications, 22(6):1037–1047, August 2004.

[32] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar. Next century challenges: Scalable coordination in sensor networks. In Proc. of the 5th Annual International Conference on Mobile Computing and Networking (ACM Mobicom 1999), August 1999.

[33] Robert Szewczyk, Joe Polastre, Alan Mainwaring, John Anderson, and David Culler. An analysis of a large scale habitat monitoring application. In Proc. of Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pages 214–226. Baltimore, November 3-5 2004.

[34] Mark D. Yarvis, W. Steven Conner, Lakshman Krishnamurthy, Jasmeet Chhabra, Brent Elliott, and Alan Mainwaring. Real-world experiences with an interactive ad hoc sensor network. In Proceedings of the International Workshop on Ad Hoc Networking (IWAHN 2002), pages 143–151. Vancouver, British Columbia, Canada, August 2002.

[35] Sameer Tilak, Nael B. Abu-Ghazaleh, and Wendi Heinzelman. Infrastructure tradeoffs for sensor networks. In Proc. of First ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2002), pages 49–58. Atlanta, September 2002.

[36] I. F. Akyildiz and I. H. Kasimoglu. Wireless sensor and actor networks: Research challenges. Ad Hoc Networks Journal (Elsevier), 2:351–367, October 2004.

[37] RF Monolithics. TR1000 916.50 MHz hybrid transceivers. http://www.rfm.com.

[38] SOS operating system. http://nesl.ee.ucla.edu/projects/sos.

[39] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and S. Shenker. GHT: A Geographic Hash Table for Data-Centric Storage. In Proc. of First ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2002), pages 78–87. Atlanta, September 2002.

[40] Chalermek Intanagonwiwat, Deborah Estrin, Ramesh Govindan, and John Heidemann. Impact of network density on data aggregation in wireless sensor networks. In Proc. of International Conference on Distributed Computing Systems (ICDCS), July 2002.

[41] Jerry Zhao and Ramesh Govindan. Understanding packet delivery performance in dense wireless sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 1–13. Los Angeles, November 5-7 2003.

[42] J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-end arguments in systemdesign. ACM Transactions on Computer Systems, 2(4), November 1984.

[43] D. Ganesan, B. Krishnamachari, A. Woo, D. Culler, D. Estrin, and S. Wicker.Complex behavior at scale: An experimental study of low-power wireless sen-sor networks. In Technical Report UCLA/CSD-TR02-0013. Computer ScienceDepartment, UCLA, July 2002.

[44] Sridhar Pingali, Don Towsley, and James F. Kurose. A comparison of sender-initiated and receiver-initiated reliable multicast protocols. In Proceedings ofthe Sigmetrics Conference on Measurement and Modeling of Computer Systems,pages 221–230, New York, NY, USA, 1994. ACM Press.

[45] Philip Levis, Sam Madden, David Gay, Joe Polastre, Robert Szewczyk, AlecWoo, Eric Brewer, and David Culler. The emergence of networking abstractionsand techniques in tinyos. In Proc. of the First USENIX/ACM Symposium onNetworked Systems Design and Implementation (NSDI 2004), 2004.

[46] Chipcon Corporation. CC1000 low power FSK transceiver, April 2002.http://www.chipcon.com/files/CC1000 Data Sheet 2 1.pdf.

[47] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks.Nature, 393:440–442, 1998.

[48] Stargate datasheet. http://www.xbow.com/Products/Product-pdf-files/.

[49] Chieh-Yih Wan, Andrew T. Campbell, and Lakshman Krishnamurthy. PSFQ: A Reliable Transport Protocol for Wireless Sensor Networks. In Proc. of First ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2002), pages 1–11. Atlanta, September 2002.

[50] P. Mishra and H. Kanakia. A hop by hop rate-based congestion control scheme. In Proc. of the ACM SIGCOMM Conf., pages 112–123. Baltimore, MD, August 1992.

[51] W. Noureddine and F. Tobagi. Selective Backpressure in Switched Ethernet LANs. In Proc. of the IEEE GLOBECOM Conf., pages 1256–1263. Rio De Janeiro, Brazil, December 1999.

[52] C. Ozveren, R. Simcoe, and G. Varghese. Reliable and efficient hop-by-hop flow control. In Proc. of the ACM SIGCOMM Conf. London, UK, August 1994.

[53] P. Bahl, A. Adya, J. Padhye, and A. Wolman. Reconsidering wireless systems with multiple radios. ACM SIGCOMM Computer Communications Review (CCR), July 2004.

[54] Cots dust - large scale models for smart dust. http://www-bsac.eecs.berkeley.edu/shollar/macro motes/macromotes.html.

[55] S. Floyd, V. Jacobson, C. Liu, S. McCanne, and L. Zhang. A reliable multicast framework for lightweight session and application layer framing. IEEE/ACM Transactions on Networking, 5(6):784–803, December 1997.

[56] J. Zhao, R. Govindan, and D. Estrin. Computing aggregates for monitoring wireless sensor networks. In Proc. of the IEEE ICC Workshop on Sensor Network Protocols and Applications. Anchorage, AK, May 2003.

[57] C.E. Perkins and P. Bhagwat. Highly dynamic destination-sequenced distance-vector routing (dsdv) for mobile computers. In SIGCOMM Symposium on Communications Architectures and Protocols, pages 212–225, September 1994.

[58] S.-Y. Ni, Y.-C. Tseng, Y.-S. Chen, and J.-P. Sheu. The broadcast storm problem in a mobile adhoc network. In Proc. of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking (Mobicom 1999), pages 151–162, August 1999.

[59] D. A. Maltz. On-demand routing in multi-hop wireless mobile ad hoc networks.Ph.D. dissertation, 2001.

[60] The network simulator - ns2. http://www.isi.edu/nsnam/ns/.

[61] J.J. Garcia-Luna-Aceves and E. L. Madruga. The core assisted mesh protocol. IEEE Journal on Selected Areas in Communications, 17(8):1380–1394, August 1999.

[62] S.-J. Lee, M. Gerla, and C.-C. Chiang. On-demand multicast routing protocol. In Proc. IEEE Wireless Communications and Networking Conf., pages 1298–1304, Sep. 21-25 1999.

[63] C. Ho, K. Obraczka, G. Tsudik, and K. Viswanath. Flooding for Reliable Multicast in Multi-Hop Ad Hoc Networks. In Mobicom Workshop on Discrete Algorithms and Methods for Mobility (DialM.99), August 1999.

[64] E. Pagani and G. Rossi. Reliable Broadcast in Mobile Multihop Packet Networks. In Proc. of the third annual ACM/IEEE international conference on Mobile computing and networking, pages 34–42, September 1997.

[65] R. Stann and J. Heidemann. Rmst: Reliable data transport in sensor networks. In 1st IEEE International Workshop on Sensor Net Protocols and Applications (SNPA). Anchorage, Alaska, USA, May 2003.

[66] Yogesh Sankarasubramaniam, Ozgur Akan, and Ian Akyildiz. Event-to-sink reliable transport in wireless sensor networks. In Proc. of the 4th ACM Symposium on Mobile Ad Hoc Networking & Computing (MobiHoc 2003), pages 177–188. Annapolis, Maryland, June 2003.

[67] S-J. Park and R. Sivakumar. Sink-to-Sensors Reliability in Sensor Networks. In Proc. of the 4th ACM Symposium on Mobile Ad Hoc Networking & Computing (MobiHoc 2003). Annapolis, Maryland, June 2003.

[68] Jonathan Hui and David Culler. The dynamic behavior of a data dissemination protocol for network programming at scale. In Proc. of Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pages 81–94. Baltimore, November 3-5 2004.

[69] K. K. Ramakrishnan and R. Jain. A binary feedback scheme for congestion avoidance in computer networks. ACM Transactions on Computer Systems, 8(2):158–181, May 1990.

[70] Y.C. Tay, Kyle Jamieson, and Hari Balakrishnan. Collision-minimizing csma and its applications to wireless sensor networks. IEEE Journal on Selected Areas in Communications, 22(6):1048–1057, August 2004.

[71] L. P. Clare, G. Pottie, and J. R. Agre. Self-organizing distributed microsensor networks. In Proc. of SPIE 13th Annual Internl. Symp. on Aerospace/Defense Sensing, Simulation, and Controls (AeroSense). Orlando, FL, April 1999.

[72] V. Rajendran, K. Obraczka, and J.J. Garcia. Energy-efficient, collision-free medium access control for wireless sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys’03), pages 181–192. Los Angeles, November 5-7 2003.

[73] D. Bertsekas and R. Gallager. DATA NETWORKS, second edition. Prentice Hall, Upper Saddle River, New Jersey, 1992.

[74] J. Li, C. Blake, D. De Couto, H. Lee, and R. Morris. Capacity of ad hoc wireless networks. In Proc. of the Seventh Annual International Conference on Mobile Computing and Networking, pages 61–69, July 2001.

[75] Alec Woo and David Culler. A transmission control scheme for media access in sensor networks. In Proc. of the 7th Annual International Conference on Mobile Computing and Networking (Mobicom 2001), pages 221–235, July 2001.

[76] Bret Hull, Kyle Jamieson, and Hari Balakrishnan. Mitigating congestion in wireless sensor networks. In Proc. of Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pages 134–147. Baltimore, November 3-5 2004.

[77] Bret Hull, Kyle Jamieson, and Hari Balakrishnan. Bandwidth management in wireless sensor networks (poster abstract). In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 306–307. Los Angeles, November 5-7 2003.

[78] Cheng Tien Ee and Ruzena Bajcsy. Congestion control and fairness for many-to-one routing in sensor networks. In Proc. of Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pages 148–161. Baltimore, November 3-5 2004.

[79] P. Sinha, N. Venkitaraman, R. Sivakumar, and V. Bharghavan. Wtcp: A reliable transport protocol for wireless wide-area networks. In Proc. of the 5th Annual International Conference on Mobile Computing and Networking (Mobicom 1999). Seattle, August 1999.

[80] G.-S. Ahn, A. T. Campbell, Andras Veres, and Li-Hsiang Sun. Supporting service differentiation for real-time and best effort traffic in stateless wireless ad hoc networks (swan). IEEE Transactions on Mobile Computing, 1(3):192–207, July-September 2002.

[81] K. Tang and M. Gerla. Reliable on-demand multicast routing with congestion control in wireless ad hoc networks. In Proceedings of SPIE 2001. Denver, August 2001.

[82] Chieh-Yih Wan, Shane B. Eisenman, and Andrew T. Campbell. CODA: COngestion Detection and Avoidance in Sensor Networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 266–279. Los Angeles, November 5-7 2003.

[83] Eugene Shih, Paramvir Bahl, and Michael J. Sinclair. Wake on wireless: An event driven energy saving strategy for battery operated devices. In Proc. of the 8th Annual International Conference on Mobile Computing and Networking. Atlanta, GA, September 2002.

[84] W. S. Conner, J. Chhabra, M. Yarvis, and L. Krishnamurthy. Experimental evaluation of synchronization and topology control for in-building sensor network applications. In Proc. of 2nd ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2003), pages 38–49. San Diego, CA, September 2003.

[85] M. Yarvis, N. Kushalnagar, H. Singh, A. Rangarajan, Y. Liu, and S. Singh. Exploiting heterogeneity in sensor networks. In Proceedings of IEEE INFOCOM 2005 (to appear). Miami, FL, March 2005.

[86] Feng Xue and P. R. Kumar. The number of neighbors needed for connectivity of wireless networks. Wireless Networks, 10(2):169–181, March 2004.

[87] P. Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Transactions on Information Theory, IT-46(2):388–404, March 2000.

[88] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, McGraw Hill, 1990.

[89] Alec Woo and David Culler. Taming the underlying challenges of reliable multi-hop routing in sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 14–27. Los Angeles, November 5-7 2003.

[90] T. He, B. M. Blum, J. A. Stankovic, and T. Abdelzaher. Aida: Adaptive application-independent data aggregation in wireless sensor networks. ACM Transactions on Embedded Computing Systems, 3(2):426–457, May 2004.

[91] J. C. Navas and Tomasz Imielinski. Geographic addressing and routing. In Proc. of the 3rd Annual International Conference on Mobile Computing and Networking (Mobicom 1997). Budapest, Hungary, September 1997.

[92] Badri Nath and Dragos Niculescu. Routing on a curve. In Proc. of the First Workshop on Hot Topics in Networks (HotNets-I). Princeton, NJ, October 2002.

[93] C. E. Perkins, E. M. Belding-Royer, and S. Das. Ad hoc on demand distance vector (aodv) routing. IETF RFC 3561.

[94] R. Murty, E. H. Qi, and M. Hazra. An adaptive approach to wireless network performance optimization. In Wireless World Research Forum (WWRF11 Meeting). Oslo, Norway, June 10-11 2004.

[95] A. Adya, P. Bahl, J. Padhye, A. Wolman, and L. Zhou. A multi-radio unification protocol for ieee 802.11 wireless networks. In Technical Report (MSR-TR-2003-44). Microsoft Research, July 2003.

[96] The armstrong project. http://www.comet.columbia.edu/armstrong.
