Wireless Sensor Networks
Data and Databases
Professor Jack Stankovic
Department of Computer Science
University of Virginia
Outline
• Overview of
– Database Perspective for WSN
– Storage Issues
– General Architectures
– Queries (what they look like)
– TinyDB/TAG
• Example Protocol
– SEAD
Classical DB
Query
Query Optimizer
Plan
Schema (Personnel Records)
Database
Data Indices
Streams: Stock Market Quotes, News Feeds
(Confidence of Data)
Ad Hoc WSN – DB View
Query
Optimizer
Plan
Temperature Map
Data(i)
Cluster Head
More Storage
Why Different?
• Amount of memory small
– No disks
• Highly decentralized
• Volatile
– Nodes sleep/awake
– Nodes fail
– RAM (and FLASH)
• Data is transient
• Data is uncertain (range queries)
• Query on time/location/area
Why Different?
• Multiple queries that follow each other
• Real-Time Streams
• Cost models for optimizing the plans for executing queries are difficult
– Goal: Answer the query to a specified confidence level at minimum cost
• Minimize energy, messages, time, …
Why Different?
• Data is correlated
– Av. temperature in area of 10 nodes
• Expensive to query all 10
• Nodes near each other have similar temperatures
– Learn correlation
• Sensor on a window sill is x degrees warmer than center of room on sunny days
WSN – Data Perspective
• Raw sensor readings – data
• Process data into information
• Example
– Magnetic + acoustic + motion => vehicle
WSN – Data Perspective
• In-network aggregation
– Minimize energy used
– Reduce end-to-end delay
• Archive all data??
• Handle (dynamic, periodic) queries
– Disseminate queries into WSN
• Raise level of abstraction
– View as a database
Data Storage
• Collect physical measurements, including data streams
• Store the data – where?
[Diagram: sensor node pipeline – raw data → detection → information → classification → situation assessment; node memory is small (Flash, no disk, append-only); data is cached/logged on the mote and communicated onward]
Data Storage
• In-network processing to reduce storage requirements
• Send results of queries back to (multiple) users
– Can be mobile
– Replicate in-network stored data for
• Efficiency
• Reliability
Data Storage
• Tag data with confidence level
• Encrypt data
• Compress data
• Drop data
• Age data
• Aggregate data (min, max, mean, …)
• Blur data => privacy
Data Storage
• Data consists of real-world measurements and is inherently noisy
– Exact match queries not always useful
– Range-based queries more appropriate
• Real-time queries
– Sample rates
– Deadlines
– Data freshness (temporal validity)
– Continuous, long-running queries
More Complicated Scenario
Tree construction:
• Hierarchical Structure
• Subscription Requests
• Replica Placement
• Mobility Management
Data Association
• Tracking “N” targets
– People, vehicles, animals
• RFID tags
• Known/friendly targets
Smart Living Space
Architecture (1)
Base Station
Data stored here
Queries performed here
Applications
• Monitor soil moisture• Create temperature maps• …
Architecture (2)
Base Station
Queries: Flood
Data stored decentralized at each node
Applications
• Number of horses in meadow
• Tank appears
• …
• DD (Directed Diffusion)
• RAP
Architecture (3)
Base Station
Query to Rendezvous Points
Stargates/Log motes
Hierarchical Network
Applications
• Medical• Environmental• …
Architecture (4)
Distant Workstation
Data Stored Decentralized at Each Node Collected by Data Mules
Disconnected System
Applications
• Environmental Studies
• Bridge Analysis
• Structural Assessments
• Difficult-to-access areas – use helicopters
• …
Example of SQL Query
• Retrieve, every 45 seconds, the rainfall level if it is greater than 50 mm
SELECT R.Sensor.getRainfallLevel()
FROM RFSensors R
WHERE R.Sensor.getRainfallLevel() > 50
AND $every(45);
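As a sketch of what a node's runtime does with such a periodic threshold query, the loop below polls a hypothetical rainfall sensor each period and reports readings over 50 mm; `get_rainfall_level` and `run_query` are illustrative names, not the real query-engine API.

```python
import random

def get_rainfall_level():
    """Hypothetical sensor read; a real node would sample its ADC."""
    return random.uniform(0, 100)  # mm

def run_query(period_s=45, threshold_mm=50, rounds=3):
    """Every `period_s` seconds, report readings above `threshold_mm`."""
    results = []
    for _ in range(rounds):
        level = get_rainfall_level()
        if level > threshold_mm:
            results.append(level)
        # time.sleep(period_s)  # omitted so the sketch runs instantly
    return results

print(run_query())
```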
Queries - Extensions
• Choose area
• Choose lifetime
• Aggregate data over a group of sensors
• Set conditions restricting which sensors can contribute data
• Correlate data from different sensors
– Sound alarm whenever two sensors within 10 m of each other detect abnormality
• Specify probabilities for equality tests
• Ask for range data
• Ask for confidence level on answer
Examples of Queries
• Military Surveillance
– ???
• Medical Domain
– Assisted Living Spaces
• ???
– Nursing Homes
• ???
• Environmental
– ???
Disseminating the Query
• Flooding
• Selective Flooding (to an area)
• Spanning Tree
– Multiple needed if multiple base stations
– Multiple needed for different queries at same base station
• Store data by name and hash to that location to retrieve the data
Geographic Hash Table (GHT)
• Translate from an attribute to a storage location
• Distribute data evenly over the network
• Example: GHT system (A Geographic Hash Table for Data-Centric Storage – see Ch. 6.6 in text)
GHT
• Events are named with keys
• Storage and retrieval of an event are performed with these keys
• Key is hashed to a geographic position x
• Locate node closest to this position
[Diagram: key hashed to point x; the node closest to x stores the event]
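A minimal sketch of the GHT idea, assuming a flat field with Euclidean distance; the node positions, hash choice, and `put`/`get` names are illustrative, not the real GHT API.

```python
import hashlib
import math

def hash_to_position(key, width=100.0, height=100.0):
    """Hash an event name to a geographic point in the field."""
    d = hashlib.sha256(key.encode()).digest()
    x = int.from_bytes(d[:4], "big") / 2**32 * width
    y = int.from_bytes(d[4:8], "big") / 2**32 * height
    return (x, y)

def closest_node(point, nodes):
    """The node nearest the hashed position is the key's 'home' node."""
    return min(nodes, key=lambda n: math.dist(n, point))

nodes = [(10, 10), (90, 20), (50, 80), (25, 60)]
store = {}  # stands in for per-node storage

def put(key, value):
    store[(key, closest_node(hash_to_position(key), nodes))] = value

def get(key):
    # Re-hashing the key finds the same home node, so no lookup table is needed.
    return store.get((key, closest_node(hash_to_position(key), nodes)))

put("tank-detection", {"count": 2})
print(get("tank-detection"))
```

Because any node can recompute the same hash, storage and retrieval rendezvous at the same location without a directory.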
GHT
Base Station
Query
Store Tank Info Here
Disseminating the Query
• Given a cost model for using the WSN
• Given the request with a confidence level
• Create a plan to disseminate the query at minimum cost to obtain the answer AND meet confidence in the answer
TinyDB
• For Periodic (Environmental) Applications
• Integrates query and query response with power management by scheduling sleep/wake-up times depending on the depth of the tree– Coordinate sleep/wake-up with sensing
• Note the need for clock sync
TAG of TinyDB
Base Station
2 Phases (sleep when possible)
• Disseminate periodic query
• Collect data (scheduled)
Epochs
Pipelining
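The depth-based scheduling can be sketched as below: deeper nodes transmit earlier in the epoch so each parent wakes just in time to aggregate its children's data. The slot layout is an assumption for illustration, not TinyDB's exact timing.

```python
def collection_slot(depth, max_depth, epoch_s=30.0):
    """
    Assign each tree depth a wake-up slot within an epoch so that
    leaves transmit first and the root collects last.
    """
    slot_len = epoch_s / (max_depth + 1)
    slot = max_depth - depth          # leaves (max depth) get slot 0
    start = slot * slot_len
    return (start, start + slot_len)  # (wake, sleep) offsets in seconds

# In a depth-3 tree, depth-3 nodes transmit first; the root is last.
for d in range(4):
    print(d, collection_slot(d, max_depth=3))
```

Note the need for clock sync: the slot offsets only line up if nodes agree on when the epoch starts.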
Another Issue - Indexing
• Indexing
– Cost of building and maintaining an index may be too high for WSN
– More likely when nodes begin to have more storage/memory
– Example system: DIFS (A Distributed Index for Features in Sensor Networks)
• Low average search costs
• Hash chooses a location within a region, not over the whole system like GHT
Underlying Support
• ELF: An Efficient Log-Structured Flash File System
– Persistent storage
– Appending data to file
– Delivery to base station – later
– Supports garbage collection
– Accounts for limited number of writes to flash (e.g., 10,000 writes)
• Wear leveling
– API – open(); read(); write(); delete()
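A toy sketch of the log-structured idea: writes only ever append, deletes are logical marks for the garbage collector, and per-page write counts support wear leveling. Class and method names here are illustrative, not the real ELF API.

```python
class AppendOnlyLog:
    """Minimal sketch of an ELF-style append-only flash abstraction."""

    def __init__(self):
        self._pages = []   # each write lands on a fresh "page"
        self._writes = {}  # per-page write counts, for wear leveling

    def write(self, record):
        page = len(self._pages)  # never overwrite in place: append only
        self._pages.append(record)
        self._writes[page] = self._writes.get(page, 0) + 1

    def read(self, index):
        return self._pages[index]

    def delete(self, index):
        # Logical delete: mark the record; the garbage collector
        # reclaims the page later, avoiding an in-place erase now.
        self._pages[index] = None

log = AppendOnlyLog()
log.write(b"temp=21")
log.write(b"temp=22")
log.delete(0)
print(log.read(1))
```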
Content Distribution
Ad hoc Wireless Sensor Network
Environmental Measurements
Information (Quality dimensions: Refresh Rate, Accuracy, …)
Energy (Computation, Communication)
Mobile Monitoring Agents
Time must be minimized
Content Dissemination
Information source (aperiodic or periodic updates)
Data replicas (placement?)
Receivers (refresh rate, accuracy)
Goal: Find the optimal communication path to send sensory data from a monitored source to multiple mobile sinks such that energy usage is minimized and requirements are met.
Applications
• Soldiers with PDAs monitoring for chemical contamination
• Note:
– Current: 1 to n
– Multiple sinks and multiple sources are possible
Issues
• Building the dissemination tree
• Maintaining it as nodes enter/leave
• Disseminating the data
• Maintaining linkage to mobile sinks
• Save energy!!!
• Meet end-to-end delay
• Meet refresh rate
Self-Organizing Dissemination Trees
Dense sensor network
Unicast (Geographic Forwarding)
Dissemination Trees
Unicast (Geographic Forwarding)
Minimum Spanning Tree
Dense sensor network
Dissemination Trees
Unicast (Geographic Forwarding)
Minimum Spanning Tree
Steiner Point: replicas
Steiner Tree
Dense sensor network
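Why a Steiner point helps: for three terminals, adding one extra junction can beat the minimum spanning tree over the terminals alone. A small worked example with an equilateral triangle of side 2 (the numbers are illustrative):

```python
import math

terminals = [(0.0, 0.0), (2.0, 0.0), (1.0, math.sqrt(3))]

# MST over the terminals alone: any two sides of the triangle, length 2 each.
mst_cost = 2.0 + 2.0

# Adding a Steiner point at the Fermat point (here the centroid,
# since the triangle is equilateral) shortens the total wiring.
steiner = (1.0, math.sqrt(3) / 3)
steiner_cost = sum(math.dist(t, steiner) for t in terminals)

print(mst_cost, round(steiner_cost, 3))  # the Steiner tree is shorter
```

In a dense WSN, the Steiner points are realized by intermediate sensor nodes that also act as replicas.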
Sending the Data
[Diagram: regular multicast over a Steiner tree – every branch carries the full update rate R]
Sending the Data
[Diagram comparison, source update rate = R; r1, r2, r3, r4 are receiver refresh rates:
• Regular Multicast over a Steiner tree – every branch carries the full rate R
• Weighted Steiner Tree / Asynchronous Multicast – caching Steiner points forward each branch only at the refresh rates r1…r4 its downstream receivers need]
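The weighted-tree idea can be sketched as: each caching Steiner point forwards a branch only at the fastest refresh rate requested downstream, rather than the source rate R. The tree shape and rates below are made up for illustration.

```python
def branch_rate(node, children, receiver_rate):
    """Rate on the branch feeding `node`: max over downstream receivers."""
    if node in receiver_rate:                 # leaf receiver
        return receiver_rate[node]
    return max(branch_rate(c, children, receiver_rate)
               for c in children[node])

# Hypothetical tree: source -> s1 -> {r1, s2}, s2 -> {r2, r3},
# where s1, s2 are caching Steiner points and r* are receivers.
children = {"source": ["s1"], "s1": ["r1", "s2"], "s2": ["r2", "r3"]}
rates = {"r1": 1.0, "r2": 0.2, "r3": 0.5}     # refresh rates r_i <= R

print(branch_rate("s1", children, rates))     # fastest downstream rate
print(branch_rate("s2", children, rates))     # only r2, r3 matter here
```

The branch into s2 runs at 0.5 instead of the full source rate, which is where the energy savings come from.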
SEAD: Scalable Energy-Efficient Asynchronous Dissemination Protocol
• An asynchronous content distribution multicast tree is maintained
• Tree is modified when
– a sink joins
– a sink leaves
– a sink moves beyond some threshold
• Cost of building tree is minimized
[Diagram: dissemination to mobile sinks – caching Steiner points serve receivers with rates r1…r4; each mobile node reaches the tree through an access node and a forwarding chain]
Assumptions
• WSN is static; mobile nodes (e.g., PDAs) then enter the network
• Dissemination trees are among the static nodes
– Important – mobile nodes are NOT part of the tree
• SEAD works with an overlay network
– Source, sink representatives (access points), and Steiner points (see previous slide)
4 Phases
• First – Subscription Query
– Mobile node attaches to nearest node as access point
– Access node sends join query to source
Subscription Query (1)
Source
Sink 1 (access node)
Sink 2 (access node)
4 Phases
• Subscription Query
– Mobile node attaches to nearest node as access point
– Access node sends join query to source
• Second – Gate Replica Search
– Attach new node on current tree at best gate replica
Gate Replica Search (2)
Source
Sink 1 (access node)
Gate Replicas (assume they exist for some current tree – not shown)
Gate Replica Search (2)
Information source
Receivers
Attach to most appropriate gate replica
• Saves energy
Adjust Replica
• If a node moves and its access point is no longer appropriate, re-adjust the tree in the local area, if necessary
• If the new access node can attach without increasing cost, then no additional replica is needed (i.e., no better neighbors exist)
• If a new replica is needed, it is chosen to minimize the overall cost for this area
4 Phases
• Subscription Query
– Mobile node attaches to nearest node as access point
– Access node sends join query to source
• Gate Replica Search
– Attach new node on current tree at best gate replica
• Third – Replica Placement
– Locally adjust the tree to a better dissemination tree
Gate Replica Placement (3)
Source
Sink 1 (access node)
Sink 2 (access node)
Gate Replica
Replica Placement
• Branch cost must reflect the amount of energy spent on communication along the branch
• Tree cost = sum of branch costs
[Diagram: tree with receiver rates r1…r4 – what is the cost of each branch?]
Branch Cost Metric
Geographic Forwarding
Branch Cost = Distance × Packet_Rate
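The metric above as a one-liner; the coordinates and rates below are made-up values for illustration.

```python
import math

def branch_cost(a, b, packet_rate):
    """Branch Cost = Distance x Packet_Rate (the SEAD branch metric)."""
    return math.dist(a, b) * packet_rate

# A long, slow branch can cost less than a short, fast one.
print(branch_cost((0, 0), (10, 0), 0.2))  # distance 10 at rate 0.2
print(branch_cost((0, 0), (3, 4), 1.0))   # distance 5 at rate 1.0
```

Distance stands in for per-hop transmission energy under geographic forwarding, and the packet rate scales it by how often the branch is actually used.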
Cost Minimizing Replica Placement – Example
Source, Sink 1 (access node), Sink 2 (access node)
Current cost: sum of branch lengths weighted by rate
1. Broadcast replica request
2. Collect replica cost bids
3. If local cost decreased, choose least cost child
Process repeated down the tree until the cost can’t be reduced further; replica placement then terminates
• Access nodes are NOT replicas
• Minimum cost replica placement found – New Replica
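The bid-and-move process above can be sketched as a greedy local search. The candidate positions, sinks, and rates below are invented for illustration; real SEAD restricts each round's bids to neighboring tree nodes rather than evaluating all candidates at once.

```python
import math

def tree_cost(replica, sinks, rates):
    """Sum of branch lengths weighted by each sink's refresh rate."""
    return sum(math.dist(replica, s) * r for s, r in zip(sinks, rates))

def place_replica(start, candidates, sinks, rates):
    """Repeatedly 'bid' the cost at each candidate; move while cost drops."""
    current = start
    while True:
        best = min(candidates, key=lambda c: tree_cost(c, sinks, rates))
        if tree_cost(best, sinks, rates) < tree_cost(current, sinks, rates):
            current = best
        else:
            # Can't reduce cost further: replica placement terminates.
            return current

nodes = [(0, 0), (5, 5), (8, 2), (4, 8)]   # hypothetical candidate positions
sinks = [(10, 0), (10, 10)]                # access nodes of the two sinks
print(place_replica((0, 0), nodes, sinks, rates=[1.0, 1.0]))
```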
4 Phases
• Subscription Query
– Mobile node attaches to nearest node as access point
– Access node sends join query to source
• Gate Replica Search
– Attach new node on current tree at best gate replica
• Replica Placement
– Locally adjust the tree to a better dissemination tree
• Fourth – Sink Mobility
– Change access point if mobile node moves too far away
Sink Mobility (4)
Source
Sink 1 (access node)
Sink 2 (access node)
Adjust Replica
• If a node moves and its access point is no longer appropriate, re-adjust the tree in the local area, if necessary
• If the new access node can attach without increasing cost, then no additional replica is needed (i.e., no better neighbors exist)
• If a new replica is needed, it is chosen to minimize the overall cost for this area
– All previous children of the previous replica and this new access node are divided among the two replicas based on cost
Key Points
• Mobile nodes are never used to route information
– Minimizes need for recalculation of dissemination tree
• Source node can be a leader for an area of nodes
• Scales – DD (Directed Diffusion) does not scale
Key Points
• SEAD is an overlay network on top of geographic forwarding
Other Issues For SEAD
• Multiple simultaneous dissemination trees
• Multiple sources and 1 destination, e.g., a base station
Performance of SEAD
• Less energy consumed and less end-to-end delay than
– Directed Diffusion
– Two-Tier Directed Diffusion
– Mobile ad hoc multicast
Summary
• Information is the whole point of the WSN
• Measure/search the physical world
• Spectrum – raw data to application-specific information (sensor fusion)
– Elderly person
• Raw sensor data: out of bed, near counter, pill bottle moved, bathroom
• Implies the person is OK
Summary
• General queries versus fixed queries
– Space, time, streaming, uncertain, …
• Data aging ideas
• Rendezvous points for queries and data
• Uncertain and range-based data
• High-level view of using system
• Real-Time
• Replication
• Indices