Wireless Sensor Networks Data and Databases Professor Jack Stankovic Department of Computer Science...


Wireless Sensor Networks

Data and Databases

Professor Jack Stankovic
Department of Computer Science
University of Virginia

Outline

• Overview of
  – Database Perspective for WSN
  – Storage Issues
  – General Architectures
  – Queries (what they look like)
  – TinyDB/TAG

• Example Protocol
  – SEAD

Classical DB

[Figure: Query → Query Optimizer → Plan, executed over a Database (data, indices) with a schema, e.g., personnel records; streams such as stock market quotes and news feeds (confidence of data)]

Ad Hoc WSN – DB View

[Figure: Query → Optimizer → Plan over the WSN; nodes hold Data(i), e.g., for a temperature map; a cluster head has more storage]

Why Different?

• Amount of memory small
  – No disks
• Highly decentralized
• Volatile
  – Nodes sleep/awake
  – Nodes fail
  – RAM (and FLASH)
• Data is transient
• Data is uncertain (range queries)
• Query on time/location/area

Why Different?

• Multiple queries that follow each other
• Real-Time Streams
• Cost models for optimizing the plans for executing queries are difficult
  – Goal: Answer the query to a specified confidence level at minimum cost
    • Minimize energy, messages, time, …

Why Different?

• Data is correlated
  – Average temperature in an area of 10 nodes
    • Expensive to query all 10
    • Nodes near each other have similar temperatures
  – Learn correlation
    • A sensor on a window sill is x degrees warmer than the center of the room on sunny days
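A minimal Python sketch of learning such a correlation, assuming each node differs from a reference node by a constant additive offset learned from past readings (the function names, node names, and numbers are hypothetical). Once the offsets are known, the area average can be estimated from one reference reading instead of querying every node.

```python
from statistics import mean
from typing import Dict, List


def learn_offsets(history: List[Dict[str, float]], ref: str) -> Dict[str, float]:
    """Learn each node's average offset from the reference node.

    history holds one {node_id: temperature} dict per sample time.
    Assumes a constant additive offset per node (a simplifying assumption).
    """
    nodes = history[0].keys()
    return {n: mean(h[n] - h[ref] for h in history) for n in nodes}


def estimate_area_average(ref_reading: float, offsets: Dict[str, float]) -> float:
    """Estimate the average over all nodes from a single reference reading."""
    return mean(ref_reading + off for off in offsets.values())


# Training data: "sill" sits on a sunny window sill and reads about 3 degrees warmer.
history = [
    {"center": 20.0, "sill": 23.0, "door": 19.5},
    {"center": 21.0, "sill": 24.0, "door": 20.5},
]
offsets = learn_offsets(history, ref="center")
# Later: query only the reference node instead of all nodes.
estimate = estimate_area_average(22.0, offsets)
```

The constant-offset model is the simplest possible correlation; a real deployment would likely need offsets that depend on time of day or weather (e.g., only on sunny days, as the slide notes).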

WSN - Data Perspective

• Raw sensor readings – data
• Process data into information
• Example
  – Magnetic + acoustic + motion => vehicle
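The vehicle example can be sketched as a simple rule-based fusion step in Python. The thresholds, units, and the AND rule are illustrative assumptions, not values from the lecture; real classifiers are usually tuned per sensor and deployment.

```python
from dataclasses import dataclass


@dataclass
class Readings:
    magnetic: float  # magnetometer deviation (arbitrary units)
    acoustic: float  # sound level (dB)
    motion: bool     # motion sensor triggered


# Hypothetical thresholds; real values depend on the hardware and site.
MAG_THRESHOLD = 5.0
ACOUSTIC_THRESHOLD = 60.0


def classify(r: Readings) -> str:
    """Fuse three raw readings into one piece of information."""
    if r.magnetic > MAG_THRESHOLD and r.acoustic > ACOUSTIC_THRESHOLD and r.motion:
        return "vehicle"
    if r.motion:
        return "motion (not a vehicle)"
    return "nothing detected"
```

Only the short label, not the three raw streams, would then need to leave the node.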

WSN – Data Perspective

• In-network aggregation
  – Minimize energy used
  – Reduce end-to-end delay
• Archive all data?
• Handle (dynamic, periodic) queries
  – Disseminate queries into WSN
• Raise level of abstraction
  – View as a database
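In-network aggregation of an average can be sketched with partial state records: each node forwards one (sum, count) pair instead of every raw reading below it. This Python sketch assumes a known routing tree (hypothetical node names and readings) and uses recursion to stand in for the bottom-up exchange of messages.

```python
from typing import Dict, List, Tuple

Partial = Tuple[float, int]  # (sum, count): a partial state record for AVG


def aggregate(node: str, children: Dict[str, List[str]],
              reading: Dict[str, float]) -> Partial:
    """Merge this node's reading with its children's partial records.

    In a real network each child would send its (sum, count) pair to the
    parent in one message; here the recursive call plays that role.
    """
    total, count = reading[node], 1
    for child in children.get(node, []):
        s, c = aggregate(child, children, reading)
        total += s
        count += c
    return total, count


# Hypothetical routing tree; n1 forwards its result to the base station.
children = {"n1": ["n2", "n3"], "n2": ["n4", "n5"], "n3": ["n6"]}
reading = {"n1": 20.0, "n2": 22.0, "n3": 21.0, "n4": 19.0, "n5": 23.0, "n6": 21.0}

total, count = aggregate("n1", children, reading)
average = total / count
```

Every node sends exactly one fixed-size message per epoch, regardless of how many readings lie below it, which is where the energy saving comes from.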

Data Storage

• Collect physical measurements, including data streams
• Store the data: where?

[Figure: sensor nodes (small memory, flash, no disk, append-only storage) turn raw data into detection information and classifications, then communicate with a cache/log mote for situation assessment]

Data Storage

• In-network processing to reduce storage requirements
• Send results of queries back to (multiple) users
  – Users can be mobile
  – Replicate data stored in the network for
    • Efficiency
    • Reliability

Data Storage

• Tag data with confidence level
• Encrypt data
• Compress data
• Drop data
• Age data
• Aggregate data (min, max, mean, …)
• Blur data => privacy
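Aging and aggregating data can be combined in a small bounded store. This Python sketch, with hypothetical capacity parameters, keeps recent readings raw and replaces the oldest batch with a (min, max, mean) summary when space runs out.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Tuple


@dataclass
class AgingStore:
    """Keeps recent samples raw and ages older ones into summaries.

    raw_capacity and batch are hypothetical tuning knobs for a small node.
    """
    raw_capacity: int = 8
    batch: int = 4
    raw: List[float] = field(default_factory=list)
    summaries: List[Tuple[float, float, float]] = field(default_factory=list)  # (min, max, mean)

    def append(self, value: float) -> None:
        self.raw.append(value)
        if len(self.raw) > self.raw_capacity:
            # Age the oldest batch: replace it with one (min, max, mean) record.
            old, self.raw = self.raw[:self.batch], self.raw[self.batch:]
            self.summaries.append((min(old), max(old), mean(old)))


store = AgingStore()
for v in range(9):
    store.append(float(v))
```

Older data thus loses resolution gradually instead of being dropped outright; summaries could themselves be merged again as they age.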

Data Storage

• Data consists of real world measurements and is inherently noisy
  – Exact match queries not always useful
  – Range-based queries more appropriate
• Real-time queries
  – Sample rates
  – Deadlines
  – Data freshness (temporal validity)
  – Continuous, long running queries
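A range-based query with a freshness check might look like this in Python; the Reading type, the validity window, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    node: str
    value: float
    timestamp: float  # seconds


def range_query(readings: List[Reading], low: float, high: float,
                now: float, validity: float) -> List[Reading]:
    """Return fresh readings whose value lies in [low, high].

    A reading is temporally valid only if now - timestamp <= validity.
    """
    return [r for r in readings
            if low <= r.value <= high and now - r.timestamp <= validity]


readings = [
    Reading("a", 24.9, 100.0),
    Reading("b", 25.1, 100.0),
    Reading("c", 30.0, 100.0),  # fresh but out of range
    Reading("d", 25.0, 50.0),   # in range but stale
]
hits = range_query(readings, 24.5, 25.5, now=105.0, validity=10.0)
```

An exact match test such as `value == 25.0` would return only the stale reading from node "d" and miss "a" and "b", which differ from 25.0 only by sensor noise.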

More Complicated Scenario

Tree construction:
• Hierarchical Structure
• Subscription Requests
• Replica Placement
• Mobility Management

Data Association

• Tracking “N” targets
  – People, vehicles, animals
• RFID tags
• Known/friendly targets

Smart Living Space

Architecture (1)

[Figure: data flows from the nodes to the base station; data is stored there and queries are performed there]

Applications

• Monitor soil moisture
• Create temperature maps
• …

Architecture (2)

[Figure: the base station floods queries into the network; data is stored decentralized at each node]

Applications

• Number of horses in meadow
• Tank appears
• …
• DD (Directed Diffusion)
• RAP

Architecture (3)

[Figure: hierarchical network; the base station sends queries to rendezvous points (Stargates/log motes)]

Applications

• Medical
• Environmental
• …

Architecture (4)

[Figure: disconnected system; data is stored decentralized at each node and collected by data mules for a distant workstation]

Applications

• Environmental Studies
• Bridge Analysis
• Structural Assessments
• Difficult to access areas – use helicopters
• …

Example of SQL Query

• Retrieve, every 45 seconds, the rainfall level if it is greater than 50 mm

SELECT R.Sensor.getRainfallLevel()
FROM RFSensors R
WHERE R.Sensor.getRainfallLevel() > 50
AND $every(45);
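The semantics of this continuous query can be emulated in Python over pre-collected samples. This is a hypothetical illustration, not TinyDB's or any other system's API: the `samples` layout is assumed, and `$every(45)` is modeled as evaluating the WHERE clause only at times that are multiples of 45 seconds.

```python
from typing import Dict, Iterator, Tuple

EPOCH_SECONDS = 45     # from $every(45)
THRESHOLD_MM = 50.0    # from getRainfallLevel() > 50


def run_query(samples: Dict[int, Dict[str, float]]) -> Iterator[Tuple[int, str, float]]:
    """Evaluate the rainfall query once per epoch.

    samples maps a time in seconds to {sensor_id: rainfall_mm}.
    """
    for t in sorted(samples):
        if t % EPOCH_SECONDS != 0:
            continue  # not an epoch boundary: the query does not fire
        for sensor, level in samples[t].items():
            if level > THRESHOLD_MM:    # WHERE clause
                yield t, sensor, level  # SELECT output


samples = {
    0: {"r1": 40.0, "r2": 55.0},
    30: {"r1": 80.0},               # between epochs, ignored
    45: {"r1": 51.0, "r2": 50.0},   # 50.0 is not > 50
}
results = list(run_query(samples))
```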

Queries - Extensions

• Choose area
• Choose lifetime
• Aggregate data over a group of sensors
• Set conditions restricting which sensors can contribute data
• Correlate data from different sensors
  – Sound alarm whenever two sensors within 10 m of each other detect abnormality
• Specify probabilities for equality tests
• Ask for range data
• Ask for confidence level on answer
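The alarm example under "Correlate data from different sensors" can be sketched in Python as a pairwise check over abnormal sensors. Sensor positions in meters and the abnormality flags are assumed inputs; the function name is hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Position = Tuple[float, float]  # (x, y) in meters


def alarm_pairs(positions: Dict[str, Position], abnormal: Dict[str, bool],
                max_distance: float = 10.0) -> List[Tuple[str, str]]:
    """Return pairs of abnormal sensors that are within max_distance meters."""
    flagged = sorted(s for s, bad in abnormal.items() if bad)
    return [(a, b) for a, b in combinations(flagged, 2)
            if math.dist(positions[a], positions[b]) <= max_distance]


positions = {"s1": (0.0, 0.0), "s2": (6.0, 8.0), "s3": (30.0, 0.0)}
pairs = alarm_pairs(positions, {"s1": True, "s2": True, "s3": True})
sound_alarm = bool(pairs)
```

Checking every pair is quadratic in the number of abnormal sensors, which is acceptable when abnormalities are rare; evaluating the condition in-network would require neighbors to exchange their flags.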

Examples of Queries

• Military Surveillance
  – ???
• Medical Domain
  – Assisted Living Spaces
    • ???
  – Nursing Homes
    • ???
• Environmental
  – ???

Disseminating the Query

• Flooding
• Selective Flooding (to an area)
• Spanning Tree
  – Multiple trees needed if there are multiple base stations
  – Multiple trees needed for different queries at the same base station
• Store data by name and hash to that location to retrieve the data
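Spanning-tree dissemination can be sketched as a breadth-first flood from the base station in which each node adopts the first neighbor it hears the query from as its parent. The Python below assumes a known, symmetric neighbor table (hypothetical node names); a real network would build the same structure with broadcast messages.

```python
from collections import deque
from typing import Dict, List


def build_spanning_tree(neighbors: Dict[str, List[str]], root: str) -> Dict[str, str]:
    """Flood from root; return {node: parent}, with the root as its own parent."""
    parent = {root: root}
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()
        for nb in neighbors[node]:
            if nb not in parent:  # duplicate suppression: ignore repeated copies
                parent[nb] = node
                frontier.append(nb)
    return parent


neighbors = {
    "bs": ["a", "b"],
    "a": ["bs", "b", "c"],
    "b": ["bs", "a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}
tree = build_spanning_tree(neighbors, "bs")
```

A second base station would run the same procedure from its own root and obtain a different tree, which is why multiple trees are needed.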

Geographic Hash Table (GHT)

• Translate from an attribute to a storage location
• Distribute data evenly over the network
• Example: GHT system (A Geographic Hash Table for Data Centric Storage – see Ch 6.6 in text)

GHT

• Events are named with keys
• Storage and retrieval of events are performed with these keys
• Key is hashed to a geographic position
• Locate the node closest to this position

[Figure: a key hashes to position x; the node closest to x stores the event]
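A Python sketch of the GHT idea, assuming a rectangular deployment area and global knowledge of node positions. SHA-256 is used only as an example hash, and the function names are hypothetical; the published GHT system reaches the home node by geographic routing toward the hashed location rather than by scanning a node list.

```python
import hashlib
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def hash_to_location(key: str, width: float, height: float) -> Point:
    """Deterministically map an event name to a point in the deployment area."""
    digest = hashlib.sha256(key.encode()).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32 * width
    y = int.from_bytes(digest[4:8], "big") / 2**32 * height
    return x, y


def home_node(key: str, nodes: Dict[str, Point], width: float, height: float) -> str:
    """The node closest to the hashed point stores (and answers for) the key."""
    target = hash_to_location(key, width, height)
    return min(nodes, key=lambda n: math.dist(nodes[n], target))


nodes = {"n1": (10.0, 10.0), "n2": (50.0, 50.0), "n3": (90.0, 20.0)}
target = hash_to_location("tank", 100.0, 100.0)
store_at = home_node("tank", nodes, 100.0, 100.0)  # where a detector puts "tank" events
query_at = home_node("tank", nodes, 100.0, 100.0)  # where a query for "tank" goes
```

Because the hash is deterministic, the storing node and the querying base station independently compute the same location without any directory.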

GHT

[Figure: tank information is stored at its hashed location; the base station's query is sent to that same location]

Disseminating the Query

• Given a cost model for using the WSN
• Given the request with a confidence level
• Create a plan to disseminate the query at minimum cost to obtain the answer AND meet confidence in the answer
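One way to make such a plan concrete, under strong simplifying assumptions: each candidate sensor has a probability p (with 0 ≤ p < 1) of answering correctly and a cost of being queried, sensors are independent, and confidence is 1 - Π(1 - p). The greedy Python sketch below picks the best confidence gain per unit cost until the target is met; it is a heuristic that is not guaranteed to find the minimum cost plan, and the names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def plan_query(sensors: Dict[str, Tuple[float, float]],
               target: float) -> Tuple[List[str], float, float]:
    """Greedy plan: add sensors until the answer's confidence reaches target.

    sensors maps id -> (p, cost). Returns (chosen ids, confidence, total cost).
    If the target is unreachable, all sensors are chosen and the returned
    confidence stays below target.
    """
    def value(item):
        p, cost = item[1]
        return -math.log(1 - p) / cost  # confidence "gain" per unit cost

    chosen, miss, total = [], 1.0, 0.0
    for sid, (p, cost) in sorted(sensors.items(), key=value, reverse=True):
        if 1 - miss >= target:
            break
        chosen.append(sid)
        miss *= 1 - p  # probability that every chosen sensor is wrong
        total += cost
    return chosen, 1 - miss, total


sensors = {"near": (0.9, 2.0), "mid": (0.6, 1.0), "far": (0.3, 1.0)}
chosen, confidence, cost = plan_query(sensors, target=0.95)
```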

TinyDB

• For periodic (environmental) applications
• Integrates query and query response with power management by scheduling sleep/wake-up times depending on the depth of the tree
  – Coordinate sleep/wake-up with sensing
• Note the need for clock synchronization

TAG of TinyDB

[Figure: routing tree rooted at the base station]

2 Phases (sleep when possible)
• Disseminate periodic query
• Collect data (scheduled)

Epoch pipelining
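The depth-based scheduling behind the collection phase can be sketched as follows, assuming synchronized clocks and a known depth for every sensor (the base station's children have depth 1). Each epoch is split into one slot per tree level; deeper nodes send first so that a parent has its children's partial results before its own slot. The function and parameters are hypothetical, and pipelining across epochs is not modeled.

```python
from typing import Dict, Tuple


def tag_schedule(depth: Dict[str, int], epoch_ms: float) -> Dict[str, Tuple[float, float]]:
    """Return {node: (start_ms, end_ms)}, the node's transmit slot in an epoch.

    A node at depth d sends in slot (levels - d), so leaves report first.
    A parent only needs to be awake to listen during its children's slot and
    to send during its own; it can sleep for the rest of the epoch.
    """
    levels = max(depth.values())
    slot = epoch_ms / levels
    return {node: ((levels - d) * slot, (levels - d + 1) * slot)
            for node, d in depth.items()}


# Hypothetical tree: a and b are children of the base station, c is under a, e under c.
depth = {"a": 1, "b": 1, "c": 2, "e": 3}
schedule = tag_schedule(depth, epoch_ms=3000.0)
```

The dependence on synchronized clocks is visible here: if a child's clock drifts, its slot no longer lines up with the window in which its parent is awake.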

Another Issue - Indexing

• Indexing
  – Cost of building and maintaining an index may be too high for WSN
  – Becomes more practical as nodes gain more storage/memory
  – Example system: DIFS (A Distributed Index for Features in Sensor Networks)
    • Low average search costs
    • Hash chooses a location within a region, not over the whole system as in GHT

Underlying Support

• ELF: An Efficient Log-Structured Flash File System
  – Persistent storage
  – Appending data to file
  – Later delivery to base station
  – Supports garbage collection
  – Accounts for limited number of writes to flash (e.g., 10,000 writes)
    • Wear leveling
  – API – open(); read(); write(); delete()
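The flash constraints listed above (append-only writes, garbage collection, limited write endurance, wear leveling) can be illustrated with a toy Python model. It is not ELF's design or API: the class and method names are hypothetical, each page holds one record, every write counts against that page's endurance, and new records go to the least-worn free page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlashLog:
    """Toy append-only log over flash pages with simple wear leveling."""
    num_pages: int
    max_writes: int = 10_000  # endurance figure quoted on the slide
    wear: List[int] = field(init=False, default_factory=list)
    pages: List[Optional[bytes]] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self.wear = [0] * self.num_pages
        self.pages = [None] * self.num_pages

    def append(self, record: bytes) -> int:
        free = [i for i, p in enumerate(self.pages)
                if p is None and self.wear[i] < self.max_writes]
        if not free:
            raise OSError("log full: deliver to base station, then garbage-collect")
        page = min(free, key=lambda i: self.wear[i])  # wear leveling: least-worn page
        self.pages[page] = record
        self.wear[page] += 1
        return page

    def collect_garbage(self, delivered: List[int]) -> None:
        """Free pages whose records were already delivered to the base station."""
        for i in delivered:
            self.pages[i] = None
```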

Content Distribution

[Figure: an ad hoc wireless sensor network turns environmental measurements into information (quality dimensions: refresh rate, accuracy, …) for mobile monitoring agents; energy (computation, communication) and time must be minimized]

Content Dissemination

[Figure: information source (aperiodic or periodic updates) → data replicas (placement?) → receivers (refresh rate, accuracy)]

Goal: Find the optimal communication path to send sensory data from a monitored source to multiple mobile sinks such that energy usage is minimized and requirements are met.

Applications

• Soldiers with PDA monitoring for chemical contamination
• Note:
  – Current: 1 to n (one source, n sinks)
  – Multiple sinks and multiple sources are possible

Issues

• Building the dissemination tree
• Maintaining it as nodes enter/leave
• Disseminating the data
• Maintaining linkage to mobile sinks
• Save energy!!!
• Meet end-to-end delay
• Meet refresh rate

Self-Organizing Dissemination Trees

[Figure: in a dense sensor network, a source can reach the sinks by unicast (geographic forwarding), by a minimum spanning tree, or by a Steiner tree whose Steiner points act as replicas]
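A small worked comparison shows why Steiner points (replicas) can save energy. In this Python sketch the cost of a path is its Euclidean length, the coordinates are hypothetical, and the Steiner point (9, 0) is hand-picked rather than optimal; computing a minimum Steiner tree is NP-hard in general.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def unicast_cost(source: Point, sinks: List[Point]) -> float:
    """A separate geographic-forwarding path from the source to every sink."""
    return sum(math.dist(source, s) for s in sinks)


def mst_cost(points: List[Point]) -> float:
    """Prim's algorithm over the complete Euclidean graph of the points."""
    in_tree = {0}
    total = 0.0
    while len(in_tree) < len(points):
        d, nxt = min((math.dist(points[i], points[j]), j)
                     for i in in_tree for j in range(len(points)) if j not in in_tree)
        in_tree.add(nxt)
        total += d
    return total


def steiner_cost(source: Point, sinks: List[Point], steiner: Point) -> float:
    """The source sends once to a Steiner point (replica), which fans out."""
    return math.dist(source, steiner) + sum(math.dist(steiner, s) for s in sinks)


source, sinks = (0.0, 0.0), [(10.0, 1.0), (10.0, -1.0)]
unicast = unicast_cost(source, sinks)
mst = mst_cost([source] + sinks)
steiner = steiner_cost(source, sinks, (9.0, 0.0))
```

With these coordinates the three costs are roughly 20.10, 12.05, and 11.83: the spanning tree avoids sending the same data twice over the long stretch, and the extra Steiner point shortens the tree further.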

Sending the Data

[Figure: regular multicast over a Steiner tree carries every update at the source update rate R on every branch; asynchronous multicast over a weighted Steiner tree uses caching Steiner points, so branches carry the receiver refresh rates (r1, r2, r3, r4 are receiver refresh rates) instead of R]
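The difference between regular and asynchronous multicast can be sketched by computing how many updates per second each branch must carry. This Python sketch assumes that a caching Steiner point needs updates only as fast as the fastest receiver below it, so each branch carries the maximum downstream refresh rate; the tree, rates, and function name are hypothetical.

```python
from typing import Dict, List


def branch_rates(children: Dict[str, List[str]], demand: Dict[str, float],
                 node: str, rates: Dict[str, float]) -> float:
    """Fill rates[child] with the rate needed on the branch into each child.

    Returns the rate required at `node` (the max over its subtree).
    """
    if node in demand:  # receiver (leaf): its own refresh rate
        return demand[node]
    needed = 0.0
    for child in children.get(node, []):
        rates[child] = branch_rates(children, demand, child, rates)
        needed = max(needed, rates[child])
    return needed


children = {"src": ["s1", "s2"], "s1": ["r1", "r4"], "s2": ["r2", "r3"]}
demand = {"r1": 1.0, "r4": 4.0, "r2": 2.0, "r3": 0.5}  # receiver refresh rates (updates/s)
R = 10.0                                               # source update rate (updates/s)

rates: Dict[str, float] = {}
branch_rates(children, demand, "src", rates)
regular = R * len(rates)            # regular multicast: every branch carries R
asynchronous = sum(rates.values())  # each branch carries only what is needed below it
```

Here the six branches carry 60 updates/s in total under regular multicast but 13.5 updates/s under asynchronous multicast; weighting each branch by its length gives the tree cost used later in SEAD.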

SEAD: Scalable Energy Efficient Asynchronous Dissemination Protocol

• An asynchronous content distribution multicast tree is maintained
• Tree is modified when
  – a sink joins
  – a sink leaves
  – a sink moves beyond some threshold
• Cost of building tree is minimized

[Figure: weighted Steiner tree with caching Steiner points serving receivers with refresh rates r1–r4]

Dissemination to Mobile Sinks

[Figure: a mobile node attaches to an access node; a forwarding chain links them as the mobile node moves]

Assumptions

• The WSN is static; mobile nodes (e.g., PDAs) then enter the network
• Dissemination trees are built among the static nodes
  – Important: mobile nodes are NOT part of the tree
• SEAD works with an overlay network
  – Source, sink representatives (access points), and Steiner points (see previous slide)

4 Phases

• First - Subscription Query
  – Mobile node attaches to nearest node as access point
  – Access node sends join query to source

Subscription Query (1)

[Figure: Source; Sink 1 (access node); Sink 2 (access node)]

4 Phases

• Subscription Query
  – Mobile node attaches to nearest node as access point
  – Access node sends join query to source
• Second - Gate replica search
  – Attach new node to the current tree at the best gate replica

Gate Replica Search (2)

[Figure: Source; Sink 1 (access node); gate replicas (assume they exist for some current tree, not shown)]

Gate Replica Search (2)

[Figure: information source and receivers]

Attach to most appropriate gate replica
• Saves energy
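Choosing the most appropriate gate replica can be sketched with the branch cost metric (distance × packet rate). The Python below is an illustrative cost model, not SEAD's exact rule: it assumes each candidate replica knows its current incoming rate and the length of its path to the source, and that attaching a faster sink raises the rate along that whole upstream path. The names and numbers are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]
# Candidate replica: (position, current incoming rate, path length to source)
Replica = Tuple[Point, float, float]


def attach_cost(access: Point, replica: Replica, new_rate: float) -> float:
    """Extra cost of hanging the new access node off this replica."""
    pos, current_rate, upstream_length = replica
    new_branch = math.dist(pos, access) * new_rate
    # If the new sink needs updates faster than the replica currently gets
    # them, every upstream branch must carry the higher rate.
    upstream_increase = upstream_length * max(0.0, new_rate - current_rate)
    return new_branch + upstream_increase


def choose_gate_replica(access: Point, replicas: Dict[str, Replica],
                        new_rate: float) -> Tuple[str, float]:
    """Return (replica id, added cost) for the cheapest attachment point."""
    best = min(replicas, key=lambda r: attach_cost(access, replicas[r], new_rate))
    return best, attach_cost(access, replicas[best], new_rate)


replicas = {"A": ((8.0, 0.0), 1.0, 20.0), "B": ((4.0, 0.0), 2.0, 14.0)}
gate, cost = choose_gate_replica((10.0, 0.0), replicas, new_rate=2.0)
```

In this example the nearest replica "A" is not the cheapest: it would cost 24 because its upstream path would have to speed up, while "B" already receives updates at the needed rate and costs 12.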

Adjust Replica

• If the node moves and its access point is no longer appropriate, re-adjust the tree in the local area if necessary
• If the new access node can attach without increasing cost, no additional replica is needed (i.e., no better neighbors exist)
• If a new replica is needed, it is chosen to minimize the overall cost for this area

4 Phases

• Subscription Query
  – Mobile node attaches to nearest node as access point
  – Access node sends join query to source
• Gate replica search
  – Attach new node to the current tree at the best gate replica
• Third - Replica Placement
  – Locally adjust the tree to a better dissemination tree

Gate Replica Placement (3)

[Figure: Source; Sink 1 (access node); Sink 2 (access node); gate replica]

Replica Placement

• Branch cost must reflect the amount of energy spent on communication along the branch
• Tree cost = sum of branch costs

[Figure: dissemination tree with receiver rates r1–r4; what is the cost of each branch?]

Branch Cost Metric

Branch Cost = Distance × Packet_Rate

[Figure: a branch realized by geographic forwarding over a given distance]
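Applying the metric, the tree cost is the sum of distance × packet rate over all branches. A minimal Python sketch with hypothetical coordinates and rates:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Branch = Tuple[Point, Point, float]  # (from, to, packets per second)


def tree_cost(branches: List[Branch]) -> float:
    """Tree cost = sum over branches of distance x packet rate."""
    return sum(math.dist(a, b) * rate for a, b, rate in branches)


branches = [
    ((0.0, 0.0), (3.0, 4.0), 2.0),   # length 5, 2 packets/s -> 10
    ((3.0, 4.0), (3.0, 10.0), 1.0),  # length 6, 1 packet/s  -> 6
]
cost = tree_cost(branches)
```

Distance stands in for the number of geographic-forwarding hops, so a long branch that carries few packets can cost less than a short branch that carries many.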

Cost Minimizing Replica Placement Example

[Figure sequence: Source; Sink 1 (access node); Sink 2 (access node); the replica is relocated at each step]

Current cost: sum of branch lengths weighted by rate

1. Broadcast replica request
2. Collect replica cost bids
3. If local cost decreased, choose least cost child

Process repeated until the cost can't be reduced further; replica placement then terminates.

• Access nodes are NOT replicas
• Minimum cost replica placement found (new replica)
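The placement steps above can be sketched as a local hill-climbing search. In this Python sketch, candidate positions are grid points (a stand-in for neighboring nodes), the replica's local cost is its upstream branch plus its branches to the sinks, each weighted by rate, and the upstream branch is assumed to carry the fastest sink rate. Positions, rates, and function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


def local_cost(p: Point, source: Point, sinks: Dict[Point, float]) -> float:
    """Rate-weighted branch lengths around a replica placed at p."""
    up_rate = max(sinks.values())
    return (math.dist(source, p) * up_rate
            + sum(math.dist(p, s) * r for s, r in sinks.items()))


def place_replica(start: Point, source: Point, sinks: Dict[Point, float],
                  neighbors: Callable[[Point], List[Point]]) -> Tuple[Point, float]:
    """Repeat: broadcast request, collect bids, move to the cheapest bidder
    while that lowers the local cost; stop when no neighbor can improve."""
    current, cost = start, local_cost(start, source, sinks)
    while True:
        bids = {n: local_cost(n, source, sinks)
                for n in neighbors(current) if n not in sinks}  # access nodes are not replicas
        if not bids:
            return current, cost
        best = min(bids, key=bids.get)
        if bids[best] >= cost:
            return current, cost  # can't reduce cost further: terminate
        current, cost = best, bids[best]


def grid_neighbors(p: Point) -> List[Point]:
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


sinks = {(10, 2): 1.0, (10, -2): 1.0}  # access nodes -> refresh rates
position, cost = place_replica((0, 0), (0, 0), sinks, grid_neighbors)
```

Starting at the source, the replica moves step by step toward the two sinks and stops at (9, 0), where its local cost is 9 + 2√5. Like any local search, it finds a local minimum, which is what the per-area adjustment in SEAD aims for.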

4 Phases

• Subscription Query
  – Mobile node attaches to nearest node as access point
  – Access node sends join query to source
• Gate replica search
  – Attach new node to the current tree at the best gate replica
• Replica Placement
  – Locally adjust the tree to a better dissemination tree
• Fourth - Sink Mobility
  – Change access point if the mobile node moves too far away

Sink Mobility (4)

[Figure: Source; Sink 1 (access node); Sink 2 (access node)]
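The sink mobility rule can be sketched as a threshold test; the same nearest-node choice also models how a mobile node first picks its access point in the subscription phase. The threshold, positions, and function name are hypothetical, and the actual tree adjustment is left to the replica logic.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def update_access_node(mobile: Point, current: str, nodes: Dict[str, Point],
                       threshold: float) -> Tuple[str, bool]:
    """Keep the current access node while the sink stays within threshold
    meters of it; otherwise re-attach to the nearest static node.

    Returns (access node id, whether the tree must be adjusted).
    """
    if math.dist(mobile, nodes[current]) <= threshold:
        return current, False  # no tree change needed
    nearest = min(nodes, key=lambda n: math.dist(nodes[n], mobile))
    return nearest, nearest != current


nodes = {"a": (0.0, 0.0), "b": (20.0, 0.0)}  # static nodes only
stay = update_access_node((5.0, 0.0), "a", nodes, threshold=10.0)
move = update_access_node((18.0, 0.0), "a", nodes, threshold=10.0)
```

The threshold trades tree maintenance against path stretch: a larger value means fewer tree adjustments but longer last hops to the moving sink.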

Adjust Replica

• If the node moves and its access point is no longer appropriate, re-adjust the tree in the local area if necessary
• If the new access node can attach without increasing cost, no additional replica is needed (i.e., no better neighbors exist)
• If a new replica is needed, it is chosen to minimize the overall cost for this area
  – All previous children of the previous replica and this new access node are divided between the two replicas based on cost

Key Points

• Mobile nodes are never used to route information
  – Minimizes the need to recalculate the dissemination tree
• Source node can be a leader for an area of nodes
• Scales – DD does not scale

Key Points

• SEAD is an overlay network

[Figure: each overlay link is realized by geographic forwarding over some distance]

Other Issues For SEAD

• Multiple simultaneous dissemination trees
• Multiple sources and 1 destination, e.g., a base station

Performance of SEAD

• Less energy consumed and less end-to-end delay than
  – Directed Diffusion
  – Two Tier Directed Diffusion
  – Mobile ad hoc multicast

Summary

• Information is the whole point of the WSN
• Measure/Search - the physical world
• Spectrum - Raw data to application-specific information (sensor fusion)
  – Elderly person
    • Raw sensor data
    • Out of bed, near counter, pill bottle moved, bathroom
    • Implies the person is OK

Summary

• General queries versus fixed queries
  – Space, time, streaming, uncertain, …
• Data aging ideas
• Rendezvous points for queries and data
• Uncertain and range-based data
• High-level view of using the system
• Real-Time
• Replication
• Indices
