In Network Processing
When processing is cheaper than transmitting
Daniel V. Uhlig, Maryam Rahmaniheris
Basic Problem
How to gather interesting data from thousands of Motes?
• Tens to thousands of motes
• Individually unreliable
Goal: collect and analyze the data
• Long-term, low-energy deployment
• Can use the processing power at each Mote
Analyze locally before sharing data
Costs
• Transmitting data is expensive compared to CPU cycles
  ◦ Transmitting 1 Kb over 100 meters costs about as much energy as 3 million CPU instructions
  ◦ An AA-powered Mote can transmit 1 message per day for about two months (assuming no other power draws)
• Power density is growing very slowly compared to computation power, storage, etc.
Analyze and process locally, transmitting only what is required
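A back-of-the-envelope sketch of why local processing pays off. Only the ratio of 3 million instructions per Kb sent over 100 m comes from the slide; treating 1 Kb as 1000 bits, the 16-bit reading size, and the one-instruction-per-reading cost of a local MAX are hypothetical assumptions.

```python
# Costs are expressed in "CPU-instruction equivalents" of energy.
# From the slide: sending 1 Kb over 100 m ~ 3 million CPU instructions.
INSTR_PER_KBIT = 3_000_000

# Hypothetical parameters, for illustration only.
BITS_PER_KBIT = 1000      # assume 1 Kb = 1000 bits
BITS_PER_READING = 16     # assumed size of one sensor reading
INSTR_PER_READING = 1     # assumed cost of one comparison in a local MAX


def transmit_cost(bits: int) -> float:
    """Energy of sending `bits` over 100 m, in instruction equivalents."""
    return bits * INSTR_PER_KBIT / BITS_PER_KBIT


def raw_cost(n_readings: int) -> float:
    """Ship every reading unprocessed."""
    return transmit_cost(n_readings * BITS_PER_READING)


def local_max_cost(n_readings: int) -> float:
    """Compute MAX locally, then send a single reading."""
    return n_readings * INSTR_PER_READING + transmit_cost(BITS_PER_READING)


if __name__ == "__main__":
    n = 100
    print(f"send all {n} readings: {raw_cost(n):,.0f}")        # 4,800,000
    print(f"local MAX, send one:  {local_max_cost(n):,.0f}")  # 48,100
```

Under these assumptions the local computation adds only about 1% on top of the single transmission; the ratio matters more than the exact numbers.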
Framework of Problem
• Minimize communications
  ◦ Minimize broadcast/receive time
  ◦ Minimize message size
  ◦ Move computations to individual nodes
• Nodes pass data in multi-hop fashion towards a root
• Select connectivity so the graph helps with processing
• Handle faulty nodes within the network
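Example of Problem (MAX)
Readings at each node:
• A: 7, 1, 6
• B: 4, 7, 6
• C: 4, 6
• D: 3, 4, 6
• E: 3, 5, 1
• F: 2, 7, 5, 10
[Figure: aggregation tree over nodes A-F; edges are labeled with the partial maxima each node forwards, and the overall maximum, 10 (read at F), propagates up the tree]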
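A minimal sketch of in-network MAX over the readings above. The slide's tree edges are not recoverable, so the parent assignment below (with A as the root) is a hypothetical topology. Each non-root node sends exactly one message, carrying the maximum of its own readings and its children's partial maxima.

```python
from collections import defaultdict

# Readings per node, taken from the MAX example.
READINGS = {
    "A": [7, 1, 6], "B": [4, 7, 6], "C": [4, 6],
    "D": [3, 4, 6], "E": [3, 5, 1], "F": [2, 7, 5, 10],
}
# Hypothetical routing tree (child -> parent); the slide's edges are not recoverable.
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}
ROOT = "A"


def in_network_max(readings, parent, root):
    """Return (max at the root, number of messages sent)."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)

    messages = 0

    def partial_max(node):
        nonlocal messages
        value = max(readings[node])  # aggregate local readings first
        for child in children[node]:
            value = max(value, partial_max(child))
            messages += 1  # one message from this child to its parent
        return value

    return partial_max(root), messages


result, msgs = in_network_max(READINGS, PARENT, ROOT)
print(result, msgs)  # 10 5
```

Forwarding all 18 raw readings hop by hop to the root would need more messages than the 5 single-value messages used here.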
Complications
• Max is very simple. What about Count?
  ◦ Need to avoid double counting due to redundant paths
• What about spatial events?
  ◦ Need to evaluate readings across multiple sensors
  ◦ Correlation between events
• Failures of nodes can lose branches of the tree
Design Decisions
• Connectivity graph: unstructured, or how to structure it
• Diffusion of requests and how to combine data
• Maintenance messages vs. query messages
• Reliability of results
• Load balancing
  – Message traffic
  – Storage
• Storage costs at different nodes
TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks
S. Madden, M. Franklin, J. Hellerstein, and W. Hong
Intel Research, 2002
TAG
• Aggregates values in a low-power, distributed network
• Implemented on TinyOS Motes
• SQL-like language to search for values or sets of values
  – Simple declarative language
• Energy savings
• Tree-based methodology
  – Root node generates requests and disseminates them down to its children
TAG Functions
• Three functions to aggregate results
  – f (merge function)
    • Each node runs f to combine values
    • <z> = f(<x>, <y>)
    • EX: f(<SUM1, COUNT1>, <SUM2, COUNT2>) = <SUM1+SUM2, COUNT1+COUNT2>
  – i (initializer function)
    • Generates a state record at the lowest level of the tree
    • EX: <SUM, COUNT>
  – e (evaluator function)
    • Root uses e to generate the final result
    • RESULT = e(<z>)
    • EX: SUM/COUNT
• Functions must be preloaded on Motes or distributed via software protocols
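The i/f/e decomposition for AVERAGE can be sketched as follows. The Python names `init`, `merge`, and `evaluate` are illustrative stand-ins for i, f, and e, not TAG's actual API.

```python
from functools import reduce
from typing import Tuple

State = Tuple[float, int]  # partial state record <SUM, COUNT>


def init(reading: float) -> State:
    """i: turn a single sensor reading into a partial state record."""
    return (reading, 1)


def merge(x: State, y: State) -> State:
    """f: <SUM1, COUNT1>, <SUM2, COUNT2> -> <SUM1+SUM2, COUNT1+COUNT2>."""
    return (x[0] + y[0], x[1] + y[1])


def evaluate(z: State) -> float:
    """e: run once at the root to turn the final state into the AVERAGE."""
    total, count = z
    return total / count


# Partial states can be merged in any grouping, e.g. per subtree first.
left = reduce(merge, map(init, [10, 20]))
right = reduce(merge, map(init, [30, 40, 50]))
print(merge(left, right), evaluate(merge(left, right)))  # (150, 5) 30.0
```

The partial state stays two numbers no matter how many readings it summarizes, which is what makes AVERAGE cheap to aggregate in-network.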
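TAG
[Figure: COUNT via the aggregation tree, with each node annotated with its partial count; the root obtains Count = 10. A companion panel is captioned "Max via tree".]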
TAG Taxonomy
All aggregates have different properties that affect aggregation performance
• Duplicate insensitive: unaffected by double counting (Max, Min) vs. duplicate sensitive (Count, Average)
  – Restricts network properties
• Exemplary: return one value (Max/Min)
  – Sensitive to failure
• Summary: computation over values (Average)
  – Less sensitive to failure
TAG Taxonomy
• Distributive: partial states are the same as the final state (Max)
• Algebraic: partial states are of fixed size but differ from the final state (Average, with partial state <Sum, Count>)
• Holistic: partial states contain all sub-records (Median)
  – Unique: similar to Holistic, but partial records may be smaller than holistic
• Content Sensitive: size of partial records depends on content (Count Distinct)
TAG
• Diffusion of requests, then collection of information
• Epochs are subdivided so that each level of the tree has time to complete its task
  ◦ Saves energy
  ◦ Limits the rate of data flow
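A sketch of one way to subdivide an epoch by tree level, assuming equal-length intervals (one per level) with deeper nodes transmitting first, so that each parent has heard its children before its own slot. The function name and parameters are hypothetical; TAG's actual scheduling details may differ.

```python
from typing import Tuple


def transmit_window(depth: int, max_depth: int, epoch_ms: float) -> Tuple[float, float]:
    """Return the (start, end) time within an epoch in which a node at `depth` sends.

    The epoch is split into `max_depth` equal intervals. Leaves at `max_depth`
    send in the first interval, their parents in the next, and so on, so nodes
    at depth 1 report to the root (depth 0) in the last interval. Outside its
    own window and its children's window, a node can sleep.
    """
    if not 1 <= depth <= max_depth:
        raise ValueError("depth must be between 1 and max_depth")
    slot = epoch_ms / max_depth
    index = max_depth - depth  # deeper nodes go first
    return (index * slot, (index + 1) * slot)


for d in range(1, 5):
    print(d, transmit_window(d, max_depth=4, epoch_ms=4000))
```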
TAG Optimizations
• Snooping: broadcast messages so that other nodes can hear them
  ◦ Rejoin the tree if the parent fails
  ◦ Listen to other broadcasts and only broadcast if the node's values are needed
    – In the case of MAX, do not broadcast if a peer has transmitted a higher value
• Hypothesis testing: the root guesses at the value to minimize traffic
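The MAX snooping rule reduces to a one-line check. This sketch assumes the node has already collected the values it overheard from peers during the current epoch and that an overheard peer's value will reach the root anyway; the function name is hypothetical.

```python
from typing import Iterable


def should_broadcast_max(own_partial: float, overheard: Iterable[float]) -> bool:
    """Return False if some peer was overheard transmitting a higher value.

    Under MAX, a partial maximum that is beaten by a value already on its way
    to the root cannot change the final answer, so the node can stay silent.
    """
    best_heard = max(overheard, default=float("-inf"))
    return own_partial >= best_heard


print(should_broadcast_max(6, [3, 5]))  # True: our 6 may be the maximum
print(should_broadcast_max(6, [10]))    # False: a peer already sent 10
```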
TAG - Results
• Theoretical results for 2500 nodes
• Savings depend on the function
  ◦ Duplicate-insensitive, summary aggregates do best
  ◦ Distributive helps
  ◦ Holistic is the worst
TAG Real World Results
• 16-Mote network
• Count the number of motes in 4 sec epochs
• No optimizations
• Better count quality is due to less radio contention in TAG
• Centralized approach used 4685 messages vs. TAG's 2330
• About a 50% reduction, but less than the theoretical results
  – Different loss model, node placement
Advantages/Disadvantages
• Loss of nodes and subtrees
  – Maintenance required for structured connectivity
• Single message per node per epoch
  – Message size might increase at higher-level nodes
  – Root gets overloaded (does it always matter?)
• Epochs give a method for idling nodes
  – Snooping not included, timing issues
Synopsis Diffusion for Robust Aggregation in Sensor Networks
S. Nath, P. Gibbons, S. Seshan, Z. Anderson
Microsoft Research, 2008
Motivation
• TAG
  ◦ Not robust against node or link failure
  ◦ A single node failure leads to loss of the entire sub-branch's data
• Synopsis Diffusion
  ◦ Exploits the broadcast nature of the wireless medium to enhance reliability
  ◦ Separates routing from aggregation
  ◦ The final aggregated data at the sink is independent of the underlying routing topology
  ◦ Can be used on top of any routing structure
  ◦ The order of evaluation and the number of times each datum is included in the result are irrelevant
[Figure: TAG computing COUNT (Count = 10) over a tree; a single node or link failure removes an entire subtree's partial count]
Synopsis Diffusion
• Multi-path routing
  ◦ Benefits: robust, energy-efficient
  ◦ Challenges: duplicate sensitivity, order sensitivity
[Figure: COUNT over multi-path routing; partial counts that reach the sink along several paths are counted more than once]
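Why multi-path routing breaks a duplicate-sensitive aggregate like COUNT can be seen with a tiny hypothetical DAG in which one node (D) is heard by two parents: summing partial counts along both paths counts D twice.

```python
# Hypothetical DAG: node -> parents that hear its broadcast. "S" is the sink.
PARENTS = {
    "A": ["S"], "B": ["S"],
    "C": ["A"], "D": ["A", "B"],  # D reaches the sink along two paths
}


def naive_multipath_count(parents, sink):
    """Each node sends its partial COUNT to every parent that hears it."""
    children = {}
    for node, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(node)

    def partial(node):
        return 1 + sum(partial(c) for c in children.get(node, []))

    return partial(sink) - 1  # do not count the sink itself


naive = naive_multipath_count(PARENTS, "S")
exact = len(PARENTS)
print(naive, exact)  # 5 4
```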
Contributions
• A novel aggregation framework
  ◦ ODI (order- and duplicate-insensitive) synopsis: a small-sized digest of the partial results
    – Bit vectors, samples, histograms
• Better aggregation topologies
  ◦ Multi-path routing
  ◦ Implicit acknowledgment
  ◦ Adaptive rings
• Example aggregates
• Performance evaluation
Aggregation
The exact definitions of these functions depend on the particular aggregation function:
◦ SG(·), Synopsis Generation: takes a sensor reading and generates a synopsis
◦ SF(·,·), Synopsis Fusion: takes two synopses and generates a new one
◦ SE(·), Synopsis Evaluation: translates a synopsis into the final answer
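The three functions can be wired together in a few lines. This hedged sketch fuses the per-reading synopses in a random order, with one reading duplicated, using MAX as the aggregate because it is trivially order- and duplicate-insensitive. The `diffuse` helper is hypothetical and ignores radio transmission entirely.

```python
import random
from functools import reduce
from typing import Callable, List, TypeVar

R = TypeVar("R")  # sensor reading
S = TypeVar("S")  # synopsis
A = TypeVar("A")  # final answer


def diffuse(readings: List[R], sg: Callable[[R], S], sf: Callable[[S, S], S],
            se: Callable[[S], A], rng: random.Random) -> A:
    """SG every reading, fuse the synopses in a random order with SF, then apply SE."""
    synopses = [sg(r) for r in readings]
    rng.shuffle(synopses)
    return se(reduce(sf, synopses))


def identity(x):
    return x


# MAX: SG and SE are the identity and SF is max(); 7 appears twice (a duplicate).
readings = [7, 1, 6, 4, 7, 6, 10]
answers = {diffuse(readings, identity, max, identity, random.Random(seed)) for seed in range(5)}
print(answers)  # {10}
```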
Synopsis Diffusion Algorithm
• Distribution phase
  ◦ The aggregate query is flooded
  ◦ The aggregation topology is constructed
• Aggregation phase
  ◦ Aggregated values are routed toward the sink
  ◦ SG() and SF() are used to create partial results
Ring Topology
• The sink is in ring $R_0$
• A node is in ring $R_i$ if it is i hops away from the sink
• Nodes in $R_{i-1}$ should hear the broadcasts by nodes in $R_i$
• Loose synchronization between nodes in different rings
• Each node transmits only once
  ◦ Energy cost is the same as for a tree
[Figure: concentric rings R0 (sink), R1, R2, R3, with example nodes A, B, C]
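A sketch of ring assignment and ring-by-ring fusion on a hypothetical connectivity graph. Rings come from breadth-first hop counts; rings transmit from the outermost inwards, each node broadcasting once. For readability the synopsis here is the exact set of node ids with set union as SF (order- and duplicate-insensitive but not compact); it stands in for a small synopsis such as the bit vector used for Count.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical connectivity graph (undirected adjacency lists).
NEIGHBORS: Dict[str, List[str]] = {
    "sink": ["A", "B"],
    "A": ["sink", "B", "C", "D"],
    "B": ["sink", "A", "D"],
    "C": ["A", "E"],
    "D": ["A", "B", "E"],
    "E": ["C", "D"],
}


def assign_rings(neighbors: Dict[str, List[str]], sink: str) -> Dict[str, int]:
    """Ring i = hop distance i from the sink, found by breadth-first search."""
    ring = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in ring:
                ring[v] = ring[u] + 1
                queue.append(v)
    return ring


def ring_aggregate(neighbors: Dict[str, List[str]], sink: str) -> Tuple[int, Dict[str, int]]:
    """Count contributing nodes, fusing ring by ring from the outermost ring inwards.

    Each node broadcasts once; every neighbor one ring closer to the sink
    fuses (unions) what it hears.
    """
    ring = assign_rings(neighbors, sink)
    synopsis: Dict[str, Set[str]] = {node: {node} for node in ring}  # SG
    for i in range(max(ring.values()), 0, -1):
        for u in [n for n, r in ring.items() if r == i]:
            for v in neighbors[u]:
                if ring[v] == i - 1:
                    synopsis[v] |= synopsis[u]  # SF: set union
    return len(synopsis[sink] - {sink}), ring  # SE: size, excluding the sink


count, ring = ring_aggregate(NEIGHBORS, "sink")
print(count)  # 5: E reaches the sink along several paths but is counted once
```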
Example: Count
• Coin-tossing experiment CT(x), used in Flajolet and Martin's algorithm:
  ◦ For $i = 1, \dots, x-1$: $CT(x) = i$ with probability $2^{-i}$
  ◦ Simulates the behavior of the exponential hash function
• Synopsis: a bit vector of length $k > \log(n)$, where n is an upper bound on the number of sensor nodes in the network
• SG(): a bit vector of length k with only the CT(k)th bit set
• SF(): bitwise Boolean OR
• SE(): if i is the index of the lowest-order 0 in the bit vector, output $2^{i-1}/0.77$ (0.77 is the "magic constant")
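A minimal sketch of the Count synopsis, assuming bit vectors are stored as Python ints with bit 1 as the lowest-order (rightmost) bit. The value of k and the simulated number of nodes are hypothetical; the constant 0.77 is the slide's.

```python
import random

MAGIC = 0.77  # the slide's "magic constant" (about 0.77351 in Flajolet and Martin)


def coin_toss(k: int, rng: random.Random) -> int:
    """CT(k): returns i in 1..k-1 with probability 2**-i, and k otherwise."""
    i = 1
    while i < k and rng.random() < 0.5:
        i += 1
    return i


def sg(k: int, rng: random.Random) -> int:
    """SG: a k-bit vector, stored as an int, with only bit CT(k) set."""
    return 1 << (coin_toss(k, rng) - 1)


def sf(s1: int, s2: int) -> int:
    """SF: bitwise Boolean OR (commutative, associative, idempotent)."""
    return s1 | s2


def se(s: int) -> float:
    """SE: find the 1-based index i of the lowest-order 0 bit; estimate 2**(i-1) / MAGIC."""
    i = 1
    while s & (1 << (i - 1)):
        i += 1
    return 2 ** (i - 1) / MAGIC


if __name__ == "__main__":
    rng = random.Random(1)
    k = 16  # hypothetical: k > log2(n) for up to ~65,000 nodes
    fused = 0
    for _ in range(1000):  # 1000 hypothetical nodes
        fused = sf(fused, sg(k, rng))
    print(se(fused))  # a rough estimate of 1000; a single sketch has high variance
```

Because SF is an OR, a node's bit survives any number of duplicate deliveries, which is what lets this synopsis ride on multi-path routing.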
Example: Count
The number of live sensor nodes, N, is proportional to $2^{i-1}$.
[Figure: 6-bit synopses generated at individual nodes are ORed together on their way to the sink; the fused synopsis 011011 has its lowest-order 0 at bit i = 3 (counting from the right), giving $2^{i-1} = 4$]
Intuition: The probability of N nodes all failing to set the ith bit is $(1-2^{-i})^N$, which is approximately 0.37 when $N = 2^i$ and even smaller for larger N.
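A quick numerical check of the intuition, assuming each node independently sets bit i with probability $2^{-i}$ (as SG does): at $N = 2^i$ the probability approaches $e^{-1} \approx 0.37$, and at $N = 4 \cdot 2^i$ it is already small.

```python
import math


def prob_bit_unset(i: int, n: int) -> float:
    """P(no node among n sets bit i), when each node sets bit i with probability 2**-i."""
    return (1 - 2.0 ** -i) ** n


for i in (4, 8, 12):
    n = 2 ** i
    print(i, round(prob_bit_unset(i, n), 4), round(prob_bit_unset(i, 4 * n), 4))
print(round(math.exp(-1), 4))  # 0.3679, the limit of (1 - 1/n)**n
```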
ODI-Correctness
For any aggregation DAG, the resulting synopsis is identical to the synopsis produced by the canonical left-deep tree.
[Figure: left, an aggregation DAG in which readings r1-r5 each pass through SG() and are combined by several SF() nodes into the synopsis s; right, the canonical left-deep tree, in which SG(r1), ..., SG(r5) are fused one at a time by a chain of SF() nodes into s]
A Simple Test for ODI-Correctness
◦ P1: SG() preserves duplicates: if two readings are considered duplicates, then the same synopsis is generated
◦ P2: SF() is commutative: SF(s1, s2) = SF(s2, s1)
◦ P3: SF() is associative: SF(s1, SF(s2, s3)) = SF(SF(s1, s2), s3)
◦ P4: SF() is same-synopsis idempotent: SF(s, s) = s
Theorem: Properties P1-P4 are necessary and sufficient for ODI-correctness.
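Properties P2-P4 can be spot-checked mechanically on sample synopses. This is a sketch: testing a finite sample can only find counterexamples, not prove the properties, and P1 depends on SG and on what counts as a duplicate, so it is not checked here. The helper name is hypothetical.

```python
import itertools
import random


def check_sf_properties(sf, samples):
    """Return (commutative, associative, idempotent) for SF over the sample synopses."""
    commutative = all(sf(a, b) == sf(b, a)
                      for a, b in itertools.product(samples, repeat=2))
    associative = all(sf(a, sf(b, c)) == sf(sf(a, b), c)
                      for a, b, c in itertools.product(samples, repeat=3))
    idempotent = all(sf(s, s) == s for s in samples)
    return commutative, associative, idempotent


rng = random.Random(0)
bit_vectors = [rng.getrandbits(8) for _ in range(6)]
print(check_sf_properties(lambda a, b: a | b, bit_vectors))  # (True, True, True): Count's OR
print(check_sf_properties(lambda a, b: a + b, [1, 2, 3]))     # (True, True, False): plain SUM
```

Plain SUM fails P4, which is exactly why it double counts under multi-path routing.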
More Examples
• Uniform Sample of Readings
  ◦ Synopsis: a sample of size K of <value, random number, sensor id> tuples
  ◦ SG(): output the tuple <val_u, r_u, id_u>
  ◦ SF(s, s'): output the K tuples in s ∪ s' with the K largest r_i
  ◦ SE(s): output the set of values val_i in s
  ◦ Useful for holistic aggregates
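A minimal sketch of the uniform-sample synopsis. The sample size K = 3 and the readings are hypothetical; ties on the random number are broken by sensor id so that SF is deterministic.

```python
import heapq
import random
from functools import reduce
from typing import FrozenSet, List, Tuple

K = 3  # sample size (hypothetical choice)
Record = Tuple[float, float, int]  # <value, random number, sensor id>
Synopsis = FrozenSet[Record]


def sg(value: float, sensor_id: int, rng: random.Random) -> Synopsis:
    """SG: tag the reading with a random number drawn once at the sensor.

    Every copy that travels along different paths carries the same tuple,
    so duplicates collapse under set union (property P1).
    """
    return frozenset({(value, rng.random(), sensor_id)})


def sf(s1: Synopsis, s2: Synopsis) -> Synopsis:
    """SF: keep the K tuples of s1 ∪ s2 with the largest random numbers."""
    return frozenset(heapq.nlargest(K, s1 | s2, key=lambda t: (t[1], t[2])))


def se(s: Synopsis) -> List[float]:
    """SE: the sampled values."""
    return sorted(t[0] for t in s)


rng = random.Random(42)
synopses = [sg(float(v), i, rng) for i, v in enumerate([5, 3, 9, 1, 7, 2, 8])]
a = reduce(sf, synopses)
b = reduce(sf, list(reversed(synopses)) + synopses[:3])  # other order, plus duplicates
print(a == b, se(a))  # True, followed by K sampled values
```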
More Examples
• Frequent Items (items occurring at least T times)
  ◦ Synopsis: a set of <val, weight> pairs; the values are unique and the weights are at least log(T)
  ◦ SG(): compute CT(k), where k > log(n), and call this the weight; if it is at least log(T), output <val, weight>
  ◦ SF(s, s'): for each distinct value, discard all but the <val, weight> pair with the maximum weight; output the remaining pairs
  ◦ SE(s): output <val, $2^{weight}$> for each <val, weight> pair in s, as a frequent value and its approximate count
  ◦ Intuition: a value occurring at least T times is expected to have at least one of its calls to CT() return at least log(T), since each call does so with probability p ≈ 1/T
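A sketch of the frequent-items synopsis, assuming base-2 logarithms, one CT() call per reading made at the sensor, and a synopsis stored as a dict from value to weight. The stream, k = 16, and T = 64 are hypothetical.

```python
import math
import random
from typing import Dict

Synopsis = Dict[str, int]  # value -> weight


def coin_toss(k: int, rng: random.Random) -> int:
    """CT(k): i in 1..k-1 with probability 2**-i; k with the remaining probability."""
    i = 1
    while i < k and rng.random() < 0.5:
        i += 1
    return i


def sg(value: str, k: int, threshold: int, rng: random.Random) -> Synopsis:
    """SG: keep <value, weight> only if the weight reaches log2(T)."""
    weight = coin_toss(k, rng)
    return {value: weight} if weight >= math.log2(threshold) else {}


def sf(s1: Synopsis, s2: Synopsis) -> Synopsis:
    """SF: for each distinct value keep only the pair with the maximum weight."""
    fused = dict(s1)
    for value, weight in s2.items():
        fused[value] = max(weight, fused.get(value, 0))
    return fused


def se(s: Synopsis) -> Dict[str, int]:
    """SE: report each surviving value with approximate count 2**weight."""
    return {value: 2 ** weight for value, weight in s.items()}


rng = random.Random(7)
stream = ["x"] * 200 + ["y"] * 5 + ["z"] * 3
fused: Synopsis = {}
for v in stream:
    fused = sf(fused, sg(v, k=16, threshold=64, rng=rng))
print(se(fused))  # "x" (200 occurrences) is very likely reported; rare values usually are not
```

Taking the maximum per value makes SF commutative, associative, and idempotent, so the synopsis passes the ODI test above.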
Error Bounds of Approximation
• Communication error
  ◦ Communication error = 1 - fraction of nodes contributing
  ◦ h: height of the DAG
  ◦ k: number of neighbors each node has
  ◦ p: probability of loss
  ◦ Overall communication error upper bound: $1-(1-p^k)^h$
  ◦ If p = 0.1 and h = 10, the error is negligible with k = 3
• Approximation error
  ◦ Introduced by the SG(), SF(), and SE() functions
  ◦ Theorem 2: any approximation error guarantee provided for the centralized data stream scenario immediately applies to a synopsis diffusion algorithm, as long as the data stream synopsis is ODI-correct.
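Evaluating the communication-error bound makes the effect of k concrete. One way to read the bound, assumed here rather than taken from the slide, is that a hop fails only if all k neighbors miss the broadcast (probability $p^k$ with independent losses), and a reading must survive h hops.

```python
def communication_error_bound(p: float, k: int, h: int) -> float:
    """Upper bound 1 - (1 - p**k)**h on the fraction of nodes whose data is lost."""
    return 1 - (1 - p ** k) ** h


for k in (1, 2, 3):
    print(k, round(communication_error_bound(0.1, k, 10), 4))
```

With p = 0.1 and h = 10, the bound falls from about 65% for k = 1 to about 10% for k = 2 and about 1% for k = 3, matching the slide's "negligible with k = 3".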
Adaptive Rings
• Implicit acknowledgment provided by ODI synopses
  ◦ Retransmission: high energy cost and delay
  ◦ Adapting the topology: when the number of times a node's transmission is included in its parents' transmissions falls below a threshold, assign the node to a ring in which it can have a good number of parents
• Assign a node in ring i, with probability p, to:
  ◦ Ring i+1 if $n_i > n_{i-1}$, $n_{i+1} > n_{i-1}$, and $n_{i+2} > n_i$
  ◦ Ring i-1 if $n_{i-2} > n_{i-1}$, $n_{i-1} < n_{i+1}$, and $n_{i-2} > n_i$
Effectiveness of Adaptation
[Figure panels: Rings vs. Adaptive Rings]
• Random placement of sensors in a 20 × 20 grid with a realistic communication model
• The solid squares indicate the nodes not accounted for in the final answer
Realistic Loss Experiment
• The algorithms are implemented in the TAG simulator
• 600 sensors deployed randomly in a 20 ft × 20 ft grid
• The query node is in the center
• Loss probabilities are assigned based on the distance between nodes
Impact of Packet Loss
[Figure panels: RMS error; % of values included]
Synopsis Diffusion
• Pros
  ◦ High reliability and robustness
  ◦ More accurate answers
  ◦ Implicit acknowledgment
  ◦ Dynamic topology adaptation
  ◦ Moderately affected by mobility
• Cons
  ◦ Approximation error
  ◦ Low node density decreases the benefits
  ◦ The fusion functions must be defined for each aggregation function
  ◦ Increased message size
Overall Discussion points
• Is there any benefit in coupling routing with aggregation?
  ◦ Choosing the paths and finding the optimal aggregation points
  ◦ Routing the sensed data along a longer path to maximize aggregation
  ◦ Finding the optimal routing structure
    – Considering the energy cost of links: NP-complete; heuristics (Greedy Incremental)
• Considering data correlation in the aggregation process
  ◦ Spatial
  ◦ Temporal
  ◦ Defining a threshold: TiNA
Overall Discussion points
• Could the energy saved by aggregation be outweighed by its cost?
  ◦ Aggregation function cost: storage cost, computation cost (number of CPU cycles)
• No mobility
  ◦ Static aggregation tree
• Structure-less or structured? That is the question…
  ◦ Continuous
  ◦ On-demand
Generalize Problem to Other Areas
• Transmitting large amounts of data over the internet is slow
  ◦ Better to process locally and transmit only the interesting parts
Overall Discussion points
• How does query rate affect design decisions?
• Load balancing between levels of the tree
  ◦ Overloaded root and main nodes
• How will the video capabilities of the Imote affect aggregation models?