In Network Processing When processing is cheaper than transmitting Daniel V Uhlig Maryam Rahmaniheris 1


In Network Processing

When processing is cheaper than transmitting

Daniel V Uhlig Maryam Rahmaniheris

Basic Problem
• How to gather interesting data from thousands of motes?
◦ Tens to thousands of motes
◦ Individually unreliable
• To collect and analyze data
◦ Long-term, low-energy deployment
◦ Can use the processing power at each mote
• Analyze locally before sharing data

Costs
• Transmission of data is expensive compared to CPU cycles
◦ 1 Kb transmitted 100 meters = 3 million CPU instructions
◦ An AA-powered mote can transmit 1 message per day for about two months (assuming no other power draws)
◦ Power density is growing very slowly compared to computation power, storage, etc.
• Analyze and process locally, transmitting only what is required
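The trade-off on this slide can be put into numbers. The sketch below is only an illustration: it takes the slide's figure of about 3 million CPU instructions per Kb transmitted as given, and the function names and the example workload are hypothetical.

```python
# Assumption taken from the slide: sending 1 Kb over 100 m costs roughly as
# much energy as executing 3 million CPU instructions.
INSTRUCTIONS_PER_KB = 3_000_000


def transmit_cost(kilobits: float) -> float:
    """Energy of a transmission, expressed in CPU-instruction equivalents."""
    return kilobits * INSTRUCTIONS_PER_KB


def local_processing_pays_off(raw_kb: float, summary_kb: float,
                              processing_instructions: float) -> bool:
    """True when summarizing locally and sending the summary is cheaper
    than sending the raw data."""
    aggregated = processing_instructions + transmit_cost(summary_kb)
    return aggregated < transmit_cost(raw_kb)


# Hypothetical workload: 10 Kb of raw readings reduced to a 1 Kb summary
# at the price of 1 million instructions of local computation.
print(local_processing_pays_off(10, 1, 1_000_000))
```

Under these assumed numbers the summary costs 4 million instruction-equivalents against 30 million for the raw data, so local processing wins by a wide margin.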

Framework of Problem
• Minimize communications
◦ Minimize broadcast/receive time
◦ Minimize message size
◦ Move computations to individual nodes
• Nodes pass data in multi-hop fashion towards a root
• Select connectivity so that the graph helps with processing
• Handle faulty nodes within the network

Example of Problem (MAX)
Local readings at each node:
A: 7, 1, 6
B: 4, 7, 6
C: 4, 6
D: 3, 4, 6
E: 3, 5, 1
F: 2, 7, 5, 10
[Figure: each node forwards a partial maximum (edge labels 5, 6, 7, 10) hop by hop towards the root, which obtains MAX = 10.]
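The MAX example can be sketched in a few lines. The readings are the ones on the slide; the parent/child topology is not recoverable from the figure, so the `CHILDREN` map below is a hypothetical one chosen for illustration.

```python
# Readings from the slide.
READINGS = {
    "A": [7, 1, 6], "B": [4, 7, 6], "C": [4, 6],
    "D": [3, 4, 6], "E": [3, 5, 1], "F": [2, 7, 5, 10],
}
# Hypothetical routing tree rooted at A (an assumption, not from the slide).
CHILDREN = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
            "D": [], "E": [], "F": []}


def partial_max(node, sent):
    """Combine the node's own readings with its children's partial maxima.

    `sent` records the single value each node passes to its parent
    (for the root, the final answer).
    """
    best = max(READINGS[node])
    for child in CHILDREN[node]:
        best = max(best, partial_max(child, sent))
    sent[node] = best
    return best


sent = {}
print(partial_max("A", sent), sent)
```

Each node transmits one number regardless of how many readings sit below it, which is where the saving over shipping every raw reading to the root comes from.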

Complications
• Max is very simple
• What about Count?
◦ Need to avoid double counting due to redundant paths
• What about spatial events?
◦ Need to evaluate readings across multiple sensors
• Correlation between events
• Failures of nodes can lose branches of the tree

Design Decisions
• Connectivity graph – unstructured, or how to structure it
• Diffusion of requests and how to combine data
• Maintenance messages vs. query messages
• Reliability of results
• Load balancing
– Message traffic
– Storage
• Storage costs at different nodes

TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks
S. Madden, M. Franklin, J. Hellerstein, and W. Hong
Intel Research, 2002

TAG
• Aggregates values in a low-power, distributed network
• Implemented on TinyOS motes
• SQL-like language to search for values or sets of values
– Simple declarative language
• Energy savings
• Tree-based methodology
– Root node generates requests and disseminates them down to the children

TAG Functions
• Three functions to aggregate results
– f (merge function)
  • Each node runs f to combine values
  • <z> = f(<x>, <y>)
  • EX (Average): f(<SUM1, COUNT1>, <SUM2, COUNT2>) = <SUM1+SUM2, COUNT1+COUNT2>
– i (initialize function)
  • Generates the state record at the lowest level of the tree
  • EX (Average): i(x) = <x, 1>, i.e. <SUM, COUNT>
– e (evaluator function)
  • Root uses e to generate the final result
  • RESULT = e(<z>)
  • EX (Average): e(<SUM, COUNT>) = SUM/COUNT
• Functions must be preloaded on motes or distributed via software protocols
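As a minimal sketch of the three TAG functions for Average, the partial state record can be modeled as a `(sum, count)` pair. The Python names `init`, `merge`, and `evaluate` stand in for the slide's i, f, and e; they are illustrative, not TAG's actual API.

```python
from functools import reduce


def init(reading):
    """i: turn one sensor reading into a partial state record <SUM, COUNT>."""
    return (reading, 1)


def merge(a, b):
    """f: combine two partial state records into one."""
    return (a[0] + b[0], a[1] + b[1])


def evaluate(state):
    """e: the root turns the final state record into the answer."""
    total, count = state
    return total / count


readings = [4, 6, 8, 10]          # hypothetical readings
state = reduce(merge, map(init, readings))
print(state, evaluate(state))      # (28, 4) 7.0
```

The record stays the same size at every level of the tree, which is what makes Average cheap to forward even though its final value cannot be merged directly.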

TAG
[Figure: aggregation via tree. Each node forwards the partial count of its subtree to its parent, and the root reports Count = 10.]

TAG Taxonomy
All searches have different properties that affect aggregate performance
• Duplicate insensitive – unaffected by double counting (Max, Min), as opposed to duplicate-sensitive aggregates (Count, Average)
– Restricts network properties
• Exemplary – returns one value (Max/Min)
– Sensitive to failure
• Summary – computation over values (Average)
– Less sensitive to failure

TAG Taxonomy
• Distributive – partial states are the same as the final state (Max)
• Algebraic – partial states are of fixed size but differ from the final state (Average: Sum, Count)
• Holistic – partial states contain all sub-records (Median)
– Unique – similar to holistic, but partial records may be smaller than holistic
• Content sensitive – size of partial records depends on content (Count Distinct)

TAG
• Diffusion of requests and then collection of information
• Epochs subdivided for each level to complete its task
◦ Saves energy
◦ Limits rate of data flow
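One way to picture the subdivided epoch is as a slot schedule in which the deepest level reports first, so that each parent has heard its children before its own slot. The sketch below assumes equal-length slots and a known maximum depth; both are simplifications made for illustration, and TAG's actual timing rules may differ.

```python
def slot_for_depth(depth, max_depth, epoch_seconds):
    """Return (start, end) of the transmit slot for nodes at `depth`.

    Assumptions (not from the slide): the root has depth 0 and does not
    transmit, and the epoch is split into `max_depth` equal slots.
    """
    if not 1 <= depth <= max_depth:
        raise ValueError("depth must be between 1 and max_depth")
    slot = epoch_seconds / max_depth
    start = (max_depth - depth) * slot   # deepest level goes first
    return (start, start + slot)


# Hypothetical network: 4 levels below the root, 4-second epochs.
for d in range(4, 0, -1):
    print(d, slot_for_depth(d, 4, 4.0))
```

Outside its own slot and the slot of its children, a node can keep its radio off, which is the energy saving the slide refers to.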

TAG Optimizations
• Snooping – broadcast messages so that other nodes can hear them
◦ Rejoin the tree if a parent fails
◦ Listen to other broadcasts and only broadcast if the node's own values are needed
– In the case of MAX, do not broadcast if a peer has transmitted a higher value
• Hypothesis testing – root guesses at the value to minimize traffic
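The snooping rule for MAX can be sketched as a filter over nodes that share a radio neighborhood and take turns transmitting. The turn order, the node names, and the assumption that every node hears every earlier broadcast are all illustrative simplifications.

```python
def broadcasts_with_snooping(turns):
    """Return the (node, value) pairs that are actually transmitted.

    `turns` lists (node, local_max) in transmit order. A node stays silent
    when it has already overheard a value at least as large as its own.
    """
    heard_max = None
    sent = []
    for node, value in turns:
        if heard_max is None or value > heard_max:
            sent.append((node, value))
            heard_max = value
    return sent


# Hypothetical neighborhood of four nodes.
turns = [("B", 7), ("C", 6), ("F", 10), ("D", 6)]
print(broadcasts_with_snooping(turns))   # [('B', 7), ('F', 10)]
```

Two of the four messages are suppressed here, yet the parent still learns the correct maximum.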

TAG - Results
• Theoretical results for 2500 nodes
• Savings depend on the function
• Duplicate insensitive, summary best
◦ Distributive helps
• Holistic is the worst

TAG Real World Results
• 16-mote network
• Count the number of motes in 4 sec epochs
• No optimizations
• Quality of count is due to less radio contention in TAG
• Centralized used 4685 messages vs TAG's 2330
• 50% reduction, but less than the theoretical results
– Different loss model, node placement

Advantages/Disadvantages
• Loss of nodes and subtrees
– Maintenance for structured connectivity
• Single message per node per epoch
– Message size might increase at higher-level nodes
– Root gets overloaded (does it always matter?)
• Epochs give a method for idling nodes
– Snooping not included, timing issues

Synopsis Diffusion for Robust Aggregation in Sensor Networks
S. Nath, P. Gibbons, S. Seshan, Z. Anderson
Microsoft Research, 2008

Motivation
• TAG
◦ Not robust against node or link failure
◦ A single node failure leads to loss of the entire sub-branch's data
• Synopsis Diffusion
◦ Exploits the broadcast nature of the wireless medium to enhance reliability
◦ Separates routing from aggregation
◦ The final aggregated data at the sink is independent of the underlying routing topology
◦ Synopsis diffusion can be used on top of any routing structure
◦ The order of evaluations and the number of times each data item is included in the result are irrelevant

TAG
Not robust against node or link failure
[Figure: the Count = 10 aggregation tree with a failed node; the partial count of the whole subtree below the failure is lost.]

Synopsis Diffusion
• Multi-path routing
◦ Benefits: robust, energy-efficient
◦ Challenges: duplicate sensitivity, order sensitivity
[Figure: Count aggregated over a multi-path topology.]

Contributions
• A novel aggregation framework
◦ ODI (order- and duplicate-insensitive) synopsis: small-sized digest of the partial results (bit-vectors, sample, histogram)
• Better aggregation topologies
◦ Multi-path routing
◦ Implicit acknowledgment
◦ Adaptive rings
• Example aggregates
• Performance evaluation

Aggregation
The exact definition of these functions depends on the particular aggregation function:
◦ SG(.) – Synopsis Generation: takes a sensor reading and generates a synopsis
◦ SF(.,.) – Synopsis Fusion: takes two synopses and generates a new one
◦ SE(.) – Synopsis Evaluation: translates a synopsis into the final answer

Synopsis Diffusion Algorithm
• Distribution phase
◦ The aggregate query is flooded
◦ The aggregation topology is constructed
• Aggregation phase
◦ Aggregated values are routed toward the sink
◦ SG() and SF() functions are used to create partial results

Ring Topology
• The sink is in R0
• A node is in Ri if it is i hops away from the sink
• Nodes in Ri-1 should hear the broadcast by nodes in Ri
• Loose synchronization between nodes in different rings
• Each node transmits only once
◦ Energy cost same as tree
[Figure: concentric rings R0 to R3 around the sink, with example nodes A, B, and C.]
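Ring membership as defined above is just the hop count from the sink, so a breadth-first search reproduces it. The connectivity map and node names below are hypothetical.

```python
from collections import deque


def assign_rings(neighbors, sink):
    """Map every reachable node to its ring index (hop count from the sink)."""
    ring = {sink: 0}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in ring:
                ring[nb] = ring[node] + 1
                queue.append(nb)
    return ring


def upstream_listeners(node, neighbors, ring):
    """Neighbors one ring closer to the sink, i.e. those that can pick up
    the node's single broadcast."""
    return sorted(nb for nb in neighbors[node] if ring[nb] == ring[node] - 1)


# Hypothetical radio connectivity.
NEIGHBORS = {
    "sink": ["A", "B"], "A": ["sink", "C"], "B": ["sink", "C"],
    "C": ["A", "B", "D"], "D": ["C"],
}
rings = assign_rings(NEIGHBORS, "sink")
print(rings, upstream_listeners("C", NEIGHBORS, rings))
```

Node C transmits once but is heard by both A and B, which is the multi-path redundancy obtained at the energy cost of a tree.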

Example: Count
• Coin tossing experiment CT(x) used in Flajolet and Martin's algorithm:
◦ For i = 1, …, x-1: CT(x) = i with probability $2^{-i}$
◦ Simulates the behavior of the exponential hash function
• Synopsis: a bit vector of length k > log(n)
◦ n is an upper bound on the number of sensor nodes in the network
• SG(): a bit vector of length k with only the CT(k)th bit set
• SF(): bitwise Boolean OR
• SE(): if i is the index of the lowest-order 0 in the bit vector, output $2^{i-1}/0.77$ (0.77 is the "magic constant")
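A minimal sketch of this Count synopsis follows, reading the slide's estimate as 2^(i-1)/0.77 with bits indexed from 1. The bit vector is stored in a Python integer, and the helper `fuse`, the vector length, and the node count are illustrative choices.

```python
import random

MAGIC = 0.77  # correction constant quoted on the slide


def ct(x, rng):
    """Coin tossing: return i with probability 2**-i for i = 1..x-1, else x."""
    for i in range(1, x):
        if rng.random() < 0.5:
            return i
    return x


def sg(k, rng):
    """SG: k-bit vector with only bit CT(k) set (bit 1 is the lowest-order bit)."""
    return 1 << (ct(k, rng) - 1)


def sf(s1, s2):
    """SF: bitwise OR of two synopses."""
    return s1 | s2


def se(s):
    """SE: find the lowest-order 0 bit (index i) and return 2**(i-1) / 0.77."""
    i = 1
    while s & (1 << (i - 1)):
        i += 1
    return 2 ** (i - 1) / MAGIC


def fuse(synopses):
    result = 0
    for s in synopses:
        result = sf(result, s)
    return result


rng = random.Random(0)
node_synopses = [sg(16, rng) for _ in range(200)]   # 200 hypothetical nodes
once = fuse(node_synopses)
twice = fuse(node_synopses + node_synopses)          # every synopsis heard twice
print(se(once), once == twice)
```

Because OR ignores repeats and ordering, hearing every synopsis twice leaves the fused vector unchanged; the price is that the estimate is only approximate.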

Example: Count
The number of live sensor nodes, N, is proportional to $2^{i-1}$
[Figure: 6-bit synopses generated at the nodes are fused with bitwise OR along multiple paths; the sink ends up with 0 1 1 0 1 1 (labeled "Count 1 bits": 4).]
Intuition: The probability of N nodes all failing to set the ith bit is $(1 - 2^{-i})^N$, which is approximately 0.37 when $N = 2^i$ (since $(1 - 1/m)^m \approx 1/e$) and even smaller for larger N.

ODI-Correctness
For any aggregation DAG, the resulting synopsis is identical to the synopsis produced by the canonical left-deep tree
[Figure: left, an aggregation DAG over readings r1 to r5, with SG applied to each reading and SF at the internal nodes, producing synopsis s; right, the canonical left-deep tree over the same readings, producing the same s.]

A Simple Test for ODI-Correctness
◦ P1: SG() preserves duplicates – if two readings are considered duplicates, then the same synopsis is generated
◦ P2: SF() is commutative – SF(s1, s2) = SF(s2, s1)
◦ P3: SF() is associative – SF(s1, SF(s2, s3)) = SF(SF(s1, s2), s3)
◦ P4: SF() is same-synopsis idempotent – SF(s, s) = s
Theorem: Properties P1-P4 are necessary and sufficient properties for ODI-correctness
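Properties P2 to P4 concern only the fusion function, so they can be spot-checked mechanically on a handful of sample synopses. The checker below is an illustrative sketch: passing it is evidence, not a proof, and P1 (which concerns SG) is not covered.

```python
from itertools import product


def passes_fusion_checks(sf, samples):
    """Check commutativity (P2), associativity (P3) and same-synopsis
    idempotence (P4) of `sf` on every combination of `samples`."""
    for a, b, c in product(samples, repeat=3):
        if sf(a, b) != sf(b, a):
            return False
        if sf(a, sf(b, c)) != sf(sf(a, b), c):
            return False
    return all(sf(s, s) == s for s in samples)


samples = [0b0001, 0b0110, 0b1000]
print(passes_fusion_checks(lambda a, b: a | b, samples))  # bitwise OR
print(passes_fusion_checks(max, samples))                 # MAX
print(passes_fusion_checks(lambda a, b: a + b, samples))  # plain SUM
```

Plain addition fails the idempotence check (s + s is not s), which is why a tree-style Sum or Count cannot be run unchanged over multiple paths.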

More Examples
Uniform Sample of Readings
◦ Synopsis: a sample of size K of <value, random number, sensor id> tuples
◦ SG(): output the tuple <val_u, r_u, id_u>
◦ SF(s, s'): output the K tuples in s ∪ s' with the K largest r_i
◦ SE(s): output the set of values val_i in s
◦ Useful for holistic aggregation
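The uniform-sample synopsis can be sketched with sets of tuples. The sample size `K`, the tie-break on sensor id, and the concrete tuples are assumptions made for the example.

```python
import random

K = 3  # sample size (illustrative)


def sg(value, sensor_id, rng):
    """SG: a one-element synopsis <value, random number, sensor id>."""
    return frozenset({(value, rng.random(), sensor_id)})


def sf(s1, s2):
    """SF: keep the K tuples of the union with the largest random numbers.
    Ties are broken by sensor id so that the result is deterministic."""
    merged = sorted(s1 | s2, key=lambda t: (t[1], t[2]), reverse=True)
    return frozenset(merged[:K])


def se(s):
    """SE: the sampled values (as a sorted list, since values may repeat)."""
    return sorted(t[0] for t in s)


# Hand-built synopses with fixed random numbers, for a reproducible demo.
a = frozenset({(10, 0.9, 1)})
b = frozenset({(20, 0.1, 2), (30, 0.5, 3)})
c = frozenset({(40, 0.7, 4)})
print(se(sf(sf(a, b), c)), se(sf(a, sf(c, b))))
```

Since each tuple carries its own random number and id, a reading that arrives over several paths collapses to one tuple in the union, and any fusion order selects the same top K.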

More Examples
Frequent Items (items occurring at least T times)
◦ Synopsis: a set of <val, weight> pairs; the values are unique and the weights are at least log(T)
◦ SG(): compute CT(k), where k > log(n), and call this the weight; if it is at least log(T), output <val, weight>
◦ SF(s, s'): for each distinct value, discard all but the pair <value, weight> with maximum weight; output the remaining pairs
◦ SE(s): output <val, $2^{weight}$> for each <val, weight> pair in s as a frequent value and its approximate count
◦ Intuition: a value occurring at least T times is expected to have at least one of its calls to CT() return at least log(T) (p = 1/T)
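The frequent-items synopsis maps naturally onto a dictionary from value to weight. The sketch assumes base-2 logarithms and reads the evaluation step as reporting 2^weight as the approximate count; both are interpretations of the slide rather than confirmed details.

```python
import math
import random


def ct(x, rng):
    """Coin tossing: return i with probability 2**-i for i = 1..x-1, else x."""
    for i in range(1, x):
        if rng.random() < 0.5:
            return i
    return x


def sg(value, k, threshold, rng):
    """SG: emit {value: weight} only if the weight reaches log2(threshold)."""
    weight = ct(k, rng)
    return {value: weight} if weight >= math.log2(threshold) else {}


def sf(s1, s2):
    """SF: per distinct value, keep the pair with the maximum weight."""
    merged = dict(s1)
    for value, weight in s2.items():
        merged[value] = max(weight, merged.get(value, 0))
    return merged


def se(s):
    """SE: report each surviving value with approximate count 2**weight."""
    return {value: 2 ** weight for value, weight in s.items()}


fused = sf({"a": 3}, {"a": 5, "b": 4})   # hand-built synopses
print(fused, se(fused))
```

Taking the maximum per value makes the fusion commutative, associative, and idempotent, so the synopsis can travel over multiple paths.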

Error Bounds of Approximation
• Communication error
◦ 1 − (percent contributing)
◦ h: height of DAG
◦ k: the number of neighbors each node has
◦ p: probability of loss
◦ The overall communication error upper bound: $1 - (1 - p^k)^h$
◦ If p=0.1, h=10 then the error is negligible with k=3
• Approximation error
◦ Introduced by SG(), SF(), and SE() functions
◦ Theorem 2: any approximation error guarantee provided for the centralized data stream scenario immediately applies to a synopsis diffusion algorithm, as long as the data stream synopsis is ODI-correct.
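Reading the slide's bound as 1 - (1 - p^k)^h, the quoted example (p = 0.1, h = 10, k = 3) can be checked numerically. The function name is illustrative.

```python
def communication_error_bound(p, k, h):
    """Upper bound 1 - (1 - p**k)**h: a hop is lost only if all k listeners
    miss the broadcast (probability p**k), and a reading must survive h hops."""
    return 1 - (1 - p ** k) ** h


for k in (1, 2, 3):
    print(k, communication_error_bound(0.1, k, 10))
```

With a single listener per hop the bound is about 65%, with two it is about 10%, and with three it drops below 1%, which matches the slide's remark that the error is negligible at k = 3.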

Adaptive Rings
• Implicit acknowledgement provided by ODI synopses
◦ Retransmission: high energy cost and delay
◦ Adapting the topology
– When the number of times a node's transmission is included in the parents' transmission is below a threshold
– Assign the node to a ring that can have a good number of parents
– Assign a node in ring i with probability p to:
  Ring i+1 if n_i > n_{i-1}, n_{i+1} > n_{i-1}, and n_{i+2} > n_i
  Ring i-1 if n_{i-2} > n_{i-1}, n_{i-1} < n_{i+1}, and n_{i-2} > n_i

Effectiveness of Adaptation
[Figure: two panels, Rings and Adaptive Rings.]
• Random placement of sensors in a 20*20 grid with a realistic communication model
• The solid squares indicate the nodes not accounted for in the final answer

Realistic Loss Experiment
• The algorithms are implemented in the TAG simulator
• 600 sensors deployed randomly in a 20 ft * 20 ft grid
• The query node is in the center
• Loss probabilities are assigned based on the distance between nodes

Impact of Packet Loss
[Figure: two panels, RMS Error and % Value Included.]

Synopsis Diffusion
• Pros
◦ High reliability and robustness
◦ More accurate answers
◦ Implicit acknowledgment
◦ Dynamic topology adaptation
◦ Moderately affected by mobility
• Cons
◦ Approximation error
◦ Low node density decreases the benefits
◦ The fusion functions should be defined for each aggregation function
◦ Increased message size

Overall Discussion points
• Is there any benefit in coupling routing with aggregation?
◦ Choosing the paths and finding the optimal aggregation points
◦ Routing the sensed data along a longer path to maximize aggregation
◦ Finding the optimal routing structure: considering the energy cost of links; NP-complete; heuristics (Greedy Incremental)
• Considering data correlation in the aggregation process
◦ Spatial
◦ Temporal: defining a threshold (TiNA)

Overall Discussion points
• Could the energy saving gained by aggregation be outweighed by its cost?
◦ Aggregation function cost: storage cost, computation cost (number of CPU cycles)
• No mobility
◦ Static aggregation tree
• Structure-less or structured? That is the question…
◦ Continuous
◦ On-demand

Generalize Problem to other areas
• Transmitting large amounts of data on the internet is slow
◦ Better to process locally and transmit the interesting parts only

Overall Discussion points
• How does query rate affect design decisions?
• Load balancing between levels of the tree
◦ Overload root and main nodes
• How will video capabilities of Imote affect aggregation models?