In Network Processing: When processing is cheaper than transmitting. Daniel V Uhlig, Maryam Rahmaniheris


Page 1: In Network Processing

In Network Processing

When processing is cheaper than transmitting

Daniel V Uhlig, Maryam Rahmaniheris

Page 2: In Network Processing

Basic Problem

How to gather interesting data from thousands of motes?
• Tens to thousands of motes
• Unreliable individually

To collect and analyze data:
• Long-term, low-energy deployment
• Can use processing power at each mote to analyze data locally before sharing it

Page 3: In Network Processing

Costs

Transmission of data is expensive compared to CPU cycles:
• 1 Kb transmitted 100 meters = 3 million CPU instructions
• An AA-powered mote can transmit 1 message per day for about two months (assuming no other power draws)
• Battery energy density is growing very slowly compared to computation power, storage, etc.

Analyze and process locally, transmitting only what is required

Page 4: In Network Processing

Framework of Problem

Minimize communications:
◦ Minimize broadcast/receive time
◦ Minimize message size
◦ Move computations to individual nodes

Nodes pass data in multi-hop fashion towards a root
Select connectivity so the graph helps with processing
Handle faulty nodes within the network

Page 5: In Network Processing

Example of Problem (MAX)

[Figure: a multi-hop tree of motes. Local readings: A: 7, 1, 6; B: 4, 7, 6; C: 4, 6; D: 3, 4, 6; E: 3, 5, 1; F: 2, 7, 5, 10. Each node forwards a single value, the maximum of its own readings and its children's reports, so the root learns the global maximum (10) without collecting every reading.]
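The idea above can be sketched in a few lines; the tree topology below is an assumption, since the slide's figure is only partly legible:

```python
# Minimal sketch of in-network MAX aggregation. Each node transmits one
# value: the max of its own readings and its children's reports.
# The tree shape here is hypothetical, not taken from the slide.

readings = {
    "A": [7, 1, 6], "B": [4, 7, 6], "C": [4, 6],
    "D": [3, 4, 6], "E": [3, 5, 1], "F": [2, 7, 5, 10],
}
children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}  # assumed topology

def report(node):
    """Value a node transmits to its parent: max over readings and children."""
    child_reports = [report(c) for c in children.get(node, [])]
    return max(readings[node] + child_reports)

print(report("A"))  # root A learns the global maximum: 10
```

Only one value crosses each link, instead of every reading in the subtree.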

Page 6: In Network Processing

Complications

Max is very simple. What about Count?
◦ Need to avoid double counting due to redundant paths

What about spatial events?
◦ Need to evaluate readings across multiple sensors
◦ Correlation between events

Failures of nodes can lose branches of the tree

Page 7: In Network Processing

Design Decisions
• Connectivity graph: unstructured, or how to structure it
• Diffusion of requests and how to combine data
• Maintenance messages vs. query messages
• Reliability of results
• Load balancing: message traffic, storage
• Storage costs at different nodes

Page 8: In Network Processing

TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks

S. Madden, M. Franklin, J. Hellerstein, and W. Hong
Intel Research, 2002

Page 9: In Network Processing

TAG
• Aggregates values in a low-power, distributed network
• Implemented on TinyOS motes
• SQL-like language to search for values or sets of values
 – Simple declarative language
• Energy savings
• Tree-based methodology
 – Root node generates requests and disseminates them down to its children

Page 10: In Network Processing

TAG Functions
• Three functions compute an aggregate:
 – f (merge function)
  • Each node runs f to combine partial state records: <z> = f(<x>, <y>)
  • EX (AVERAGE): f(<SUM1, COUNT1>, <SUM2, COUNT2>) = <SUM1+SUM2, COUNT1+COUNT2>
 – i (initializer function)
  • Generates a state record from a single reading at the lowest level of the tree
  • EX: <SUM, COUNT> = <reading, 1>
 – e (evaluator function)
  • Root uses e to generate the final result: RESULT = e(<z>)
  • EX: SUM/COUNT
• Functions must be preloaded on motes or distributed via software-update protocols
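The three functions for AVERAGE can be sketched as follows (the names are illustrative stand-ins for f, i, and e, not the actual TAG API):

```python
# Sketch of TAG's i/f/e functions for the AVERAGE aggregate.
# Partial state records are <SUM, COUNT> pairs.

def init_record(reading):
    """i: turn one sensor reading into a partial state record <SUM, COUNT>."""
    return (reading, 1)

def merge(x, y):
    """f: combine two partial state records."""
    return (x[0] + y[0], x[1] + y[1])

def evaluate(z):
    """e: root turns the merged state into the final answer."""
    s, c = z
    return s / c

# Leaves initialize, interior nodes merge, the root evaluates:
readings = [7, 1, 6, 4]
state = init_record(readings[0])
for r in readings[1:]:
    state = merge(state, init_record(r))
print(evaluate(state))  # average of the readings: 4.5
```

Because merge only ever sees fixed-size <SUM, COUNT> pairs, AVERAGE is an algebraic aggregate in the taxonomy that follows.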

Page 11: In Network Processing

TAG

[Figure: a Count aggregate computed via the tree. Each node reports 1 for itself plus the sum of its children's reports; partial counts (1, 1, 3, 7, ...) merge up the tree until the root obtains Count = 10. Max is computed via the tree in the same way.]

Page 12: In Network Processing

TAG Taxonomy
Aggregates have different properties that affect performance:
• Duplicate insensitive: unaffected by double counting (Max, Min), vs. duplicate sensitive (Count, Average)
 – Restricts allowable network properties
• Exemplary: return one representative value (Max/Min)
 – Sensitive to failure
• Summary: a computation over all values (Average)
 – Less sensitive to failure

Page 13: In Network Processing

TAG Taxonomy
• Distributive: partial state is the same as the final state (Max)
• Algebraic: partial states are of fixed size but differ from the final state (Average: Sum, Count)
• Holistic: partial states contain all sub-records (Median)
• Unique: similar to holistic, but partial records may be smaller than holistic ones
• Content sensitive: the size of partial records depends on content (Count Distinct)

Page 14: In Network Processing

TAG

Diffusion of requests, then collection of information
Epochs are subdivided so each tree level completes its task in turn
◦ Saves energy
◦ Limits the rate of data flow

Page 15: In Network Processing

TAG Optimizations

Snooping: messages are broadcast, so other nodes can overhear them
◦ Rejoin the tree if a parent fails
◦ Listen to peers' broadcasts and transmit only if one's own value is needed; for MAX, do not broadcast if a peer has already transmitted a higher value

Hypothesis testing: the root guesses at the value to minimize traffic

Page 16: In Network Processing

TAG Results

Theoretical results for 2500 nodes
Savings depend on the aggregate function:
◦ Duplicate-insensitive and summary aggregates do best
◦ Distributive helps
◦ Holistic is the worst

Page 17: In Network Processing

TAG Real-World Results
• 16-mote network
• Count the number of motes in 4-second epochs
• No optimizations
• The quality of the count is due to less radio contention under TAG
• Centralized collection used 4685 messages vs. TAG's 2330
• A 50% reduction, but less than the theoretical results
 – Different loss model and node placement

Page 18: In Network Processing

Advantages/Disadvantages
• Loss of nodes loses subtrees
 – Maintenance required for structured connectivity
• Single message per node per epoch
 – Message size might increase at higher-level nodes
 – Root gets overloaded (does it always matter?)
• Epochs give a method for idling nodes
 – Snooping not included; timing issues

Page 19: In Network Processing

Synopsis Diffusion for Robust Aggregation in Sensor Networks

S. Nath, P. Gibbons, S. Seshan, Z. Anderson
Microsoft Research, 2008

Page 20: In Network Processing

Motivation

TAG:
◦ Not robust against node or link failure
◦ A single node failure loses the entire sub-branch's data

Synopsis Diffusion:
◦ Exploits the broadcast nature of the wireless medium to enhance reliability
◦ Separates routing from aggregation
◦ The final aggregate at the sink is independent of the underlying routing topology
◦ Can be used on top of any routing structure
◦ The order of evaluation, and the number of times each datum is included in the result, are irrelevant

Page 21: In Network Processing

TAG: not robust against node or link failure

[Figure: the counting tree from before; when a single interior node fails, its entire subtree's partial count is lost, so the root undercounts the true Count = 10.]

Page 22: In Network Processing

Synopsis Diffusion

Multi-path routing
◦ Benefits: robust, energy-efficient
◦ Challenges: duplicate sensitivity, order sensitivity

[Figure: with multi-path routing a reading can reach the sink along several paths, so a naive Count is inflated by duplicates (the figure contrasts a count of 23 with the true 20).]

Page 23: In Network Processing

Contributions

A novel aggregation framework
◦ ODI synopsis: a small-sized digest of the partial results (bit vectors, samples, histograms)

Better aggregation topologies
◦ Multi-path routing
◦ Implicit acknowledgment
◦ Adaptive rings

Example aggregates
Performance evaluation

Page 24: In Network Processing

Aggregation

SG: synopsis generation; SF: synopsis fusion; SE: synopsis evaluation
The exact definitions of these functions depend on the particular aggregation function:
◦ SG(.): takes a sensor reading and generates a synopsis
◦ SF(.,.): takes two synopses and generates a new one
◦ SE(.): translates a synopsis into the final answer

Page 25: In Network Processing

Synopsis Diffusion Algorithm

Distribution phase
◦ The aggregate query is flooded
◦ The aggregation topology is constructed

Aggregation phase
◦ Aggregated values are routed toward the sink
◦ The SG() and SF() functions are used to create partial results

Page 26: In Network Processing

Ring Topology

The sink is in ring R0
A node is in ring Ri if it is i hops away from the sink
Nodes in Ri-1 can hear the broadcasts of nodes in Ri
Loose synchronization between nodes in different rings
Each node transmits only once
◦ Energy cost is the same as a tree

[Figure: concentric rings R0-R3 around the sink, with example nodes A, B, and C; each node's broadcast can be fused by several listeners in the next ring inward.]

Page 27: In Network Processing

Example: Count

Coin-tossing experiment CT(x), as used in Flajolet and Martin's algorithm:
◦ For i = 1, ..., x-1: CT(x) = i with probability 2^-i
◦ Simulates the behavior of the exponential hash function
◦ Synopsis: a bit vector of length k > log(n), where n is an upper bound on the number of sensor nodes in the network
◦ SG(): a bit vector of length k with only the CT(k)-th bit set
◦ SF(): bit-wise Boolean OR
◦ SE(): if i is the index of the lowest-order 0 bit in the vector, output 2^(i-1)/0.77351 (0.77351 is the algorithm's bias-correcting "magic constant")
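The Count synopsis can be sketched as a small simulation (illustrative only, not the paper's implementation; bit vectors are held as Python ints):

```python
# Sketch of the Flajolet-Martin-style Count synopsis.
import random

K = 16  # synopsis length; should exceed log2 of the max node count

def ct(k):
    """Coin-tossing experiment: CT(k) = i with probability 2**-i."""
    i = 1
    while i < k and random.random() < 0.5:
        i += 1
    return i

def sg():
    """SG: a bit vector (as an int) with only the CT(K)-th bit set."""
    return 1 << (ct(K) - 1)

def sf(s1, s2):
    """SF: synopsis fusion by bitwise OR (order- and duplicate-insensitive)."""
    return s1 | s2

def se(s):
    """SE: estimate the count from the index i of the lowest-order 0 bit."""
    i = 1
    while s & (1 << (i - 1)):
        i += 1
    return 2 ** (i - 1) / 0.77351

random.seed(0)
fused = 0
for _ in range(1000):   # 1000 live nodes each contribute a synopsis
    fused = sf(fused, sg())
print(round(se(fused)))  # rough estimate of 1000; FM is only accurate
                         # to within a small constant factor
```

Because fusion is a plain OR, receiving the same synopsis twice along redundant paths changes nothing.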

Page 28: In Network Processing

Example: Count

The number of live sensor nodes, N, is proportional to 2^i, where i is the index of the lowest-order 0 bit in the fused bit vector.

[Figure: individual bit-vector synopses, each with a single bit set, are OR-ed together hop by hop; the fused vector's run of low-order 1 bits (4 bits here) determines the estimate.]

Intuition: the probability that N nodes all fail to set the i-th bit is (1 - 2^-i)^N, which is approximately 0.37 when N = 2^i, and even smaller for larger N.

Page 29: In Network Processing

ODI-Correctness

Any aggregation DAG can be rearranged into a canonical left-deep tree of SG and SF applications.
ODI-correctness: for any aggregation DAG, the resulting synopsis is identical to the synopsis produced by the canonical left-deep tree.

[Figure: an arbitrary aggregation DAG over readings r1-r5, and the equivalent canonical left-deep tree SF(SF(SF(SF(SG(r1), SG(r2)), SG(r3)), SG(r4)), SG(r5)); both produce the same synopsis s.]

Page 30: In Network Processing

A Simple Test for ODI-Correctness

Theorem: properties P1-P4 are necessary and sufficient for ODI-correctness.
◦ P1: SG() preserves duplicates: if two readings are considered duplicates, the same synopsis is generated
◦ P2: SF() is commutative: SF(s1, s2) = SF(s2, s1)
◦ P3: SF() is associative: SF(s1, SF(s2, s3)) = SF(SF(s1, s2), s3)
◦ P4: SF() is same-synopsis idempotent: SF(s, s) = s
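For the OR-based Count synopsis, P2-P4 can be checked exhaustively on small synopses (P1 concerns SG() and duplicate readings, so it is not exercised here). A quick sanity check, not a proof:

```python
# Exhaustive check that bitwise-OR fusion satisfies P2-P4
# on all 4-bit synopses.
from itertools import product

def sf(s1, s2):
    return s1 | s2  # synopsis fusion for the Count example

synopses = range(16)  # every possible 4-bit synopsis

# P2: commutative
assert all(sf(a, b) == sf(b, a) for a, b in product(synopses, repeat=2))
# P3: associative
assert all(sf(a, sf(b, c)) == sf(sf(a, b), c)
           for a, b, c in product(synopses, repeat=3))
# P4: same-synopsis idempotent
assert all(sf(s, s) == s for s in synopses)

print("OR fusion passes P2-P4 on all 4-bit synopses")
```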

Page 31: In Network Processing

More Examples

Uniform Sample of Readings
◦ Synopsis: a sample of size K of <value, random number, sensor id> tuples
◦ SG(): node u outputs the tuple <val_u, r_u, id_u>, where r_u is a random number
◦ SF(s, s'): output the K tuples in s ∪ s' with the K largest r_i
◦ SE(s): output the set of values val_i in s
◦ A useful holistic aggregate
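A sketch of this sample synopsis (illustrative Python; tuple layout and helper names are assumptions):

```python
# Sketch of the ODI uniform-sample synopsis: keep the K tuples with the
# largest random tags. Not the paper's implementation.
import random

K = 3  # sample size

def sg(node_id, value):
    """SG: a single <value, random number, sensor id> tuple."""
    return [(value, random.random(), node_id)]

def sf(s1, s2):
    """SF: union the tuples, keep the K with the largest random numbers."""
    return sorted(set(s1) | set(s2), key=lambda t: t[1], reverse=True)[:K]

def se(s):
    """SE: the sampled values."""
    return [t[0] for t in s]

random.seed(1)
synopsis = []
for node_id in range(10):
    synopsis = sf(synopsis, sg(node_id, value=node_id * 10))
print(se(synopsis))  # a uniform sample of 3 of the 10 readings
```

Using a set union inside SF makes the fusion same-synopsis idempotent (P4), so a tuple that arrives along several paths is kept only once.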

Page 32: In Network Processing

More Examples

Frequent Items (items occurring at least T times)
◦ Synopsis: a set of <val, weight> pairs, where the values are unique and the weights are at least log(T)
◦ SG(): compute CT(k), where k > log(n), and call it weight; if it is at least log(T), output <val, weight>
◦ SF(s, s'): for each distinct value, discard all but the pair <val, weight> with the maximum weight; output the remaining pairs
◦ SE(s): output <val, 2^weight> for each <val, weight> pair in s, as a frequent value and its approximate count
◦ Intuition: a value occurring at least T times is expected to have at least one of its CT() calls return at least log(T), since each call does so with probability p = 1/T
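A sketch of the frequent-items synopsis, reusing the coin-tossing experiment from the Count example (illustrative; the base-2 logs and dict representation are assumptions consistent with the slide):

```python
# Sketch of the frequent-items synopsis: values seen >= T times.
import math
import random

T = 8   # frequency threshold
K = 16  # CT range; should exceed log2 of the node count

def ct(k):
    """Coin-tossing experiment: CT(k) = i with probability 2**-i."""
    i = 1
    while i < k and random.random() < 0.5:
        i += 1
    return i

def sg(val):
    """SG: emit <val, weight> only if weight >= log2(T)."""
    weight = ct(K)
    return {val: weight} if weight >= math.log2(T) else {}

def sf(s1, s2):
    """SF: for each distinct value, keep the maximum weight."""
    return {v: max(s1.get(v, 0), s2.get(v, 0)) for v in s1.keys() | s2.keys()}

def se(s):
    """SE: approximate count 2**weight for each surviving value."""
    return {v: 2 ** w for v, w in s.items()}

random.seed(4)
synopsis = {}
for reading in ["hot"] * 40 + ["cold"] * 2:  # "hot" is frequent, "cold" is not
    synopsis = sf(synopsis, sg(reading))
print(se(synopsis))  # "hot" very likely survives with a rough count estimate
```

Keeping only the maximum weight per value makes SF commutative, associative, and idempotent, so the synopsis is ODI-correct over redundant paths.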

Page 33: In Network Processing

Error Bounds of Approximation

Communication error
◦ Defined as 1 minus the fraction of readings contributing to the final answer
◦ h: height of the DAG; k: number of neighbors each node has; p: probability of message loss
◦ Upper bound on the overall communication error: 1 - (1 - p^k)^h
◦ If p = 0.1 and h = 10, the error is negligible with k = 3

Approximation error
◦ Introduced by the SG(), SF(), and SE() functions
◦ Theorem 2: any approximation-error guarantee provided for the centralized data-stream scenario immediately applies to a synopsis diffusion algorithm, as long as the data-stream synopsis is ODI-correct
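The bound can be checked numerically: a reading fails to contribute only if, at some hop out of h, all k forwarding transmissions are lost.

```python
# Numeric check of the communication-error bound 1 - (1 - p**k)**h.
def comm_error_bound(p, k, h):
    # A reading survives one hop unless all k transmissions fail (prob p**k);
    # it must survive h hops to reach the sink.
    return 1 - (1 - p ** k) ** h

# The slide's example: p = 0.1 loss, h = 10 hops, k = 3 neighbors.
print(comm_error_bound(0.1, 3, 10))  # about 0.00996, i.e. under 1%
```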

Page 34: In Network Processing

Adaptive Rings

Implicit acknowledgement is provided by ODI synopses
◦ Retransmission: high energy cost and delay
◦ Adapting the topology: when the number of times a node's transmission is included in its parents' transmissions falls below a threshold, assign the node to a ring in which it has a good number of potential parents

With probability p, move a node in ring i to:
◦ Ring i+1 if ni > ni-1, ni+1 > ni-1, and ni+2 > ni
◦ Ring i-1 if ni-2 > ni-1, ni-1 < ni+1, and ni-2 > ni

Page 35: In Network Processing

Effectiveness of Adaptation

[Figure: Rings vs. Adaptive Rings. Random placement of sensors in a 20×20 grid with a realistic communication model; solid squares indicate nodes not accounted for in the final answer.]

Page 36: In Network Processing

Realistic Loss Experiment

• The algorithms are implemented in the TAG simulator
• 600 sensors deployed randomly in a 20 ft × 20 ft grid
• The query node is in the center
• Loss probabilities are assigned based on the distance between nodes

Page 37: In Network Processing

Impact of Packet Loss

[Figure: RMS error, and the percentage of values included in the answer, plotted against packet-loss rate.]

Page 38: In Network Processing

Synopsis Diffusion

Pros
◦ High reliability and robustness
◦ More accurate answers
◦ Implicit acknowledgment
◦ Dynamic topology adaptation
◦ Only moderately affected by mobility

Cons
◦ Approximation error
◦ Low node density decreases the benefits
◦ The fusion functions must be defined for each aggregation function
◦ Increased message size

Page 39: In Network Processing

Overall Discussion Points

Is there any benefit in coupling routing with aggregation?
◦ Choosing the paths and finding the optimal aggregation points
◦ Routing the sensed data along a longer path to maximize aggregation
◦ Finding the optimal routing structure
  Considering the energy cost of links makes this NP-complete; heuristics exist (e.g., Greedy Incremental)

Considering data correlation in the aggregation process
◦ Spatial
◦ Temporal
◦ Defining a threshold (TiNA)

Page 40: In Network Processing

Overall Discussion Points

Could the energy saving gained by aggregation be outweighed by its cost?
◦ Aggregation-function cost: storage cost, computation cost (number of CPU cycles)

No mobility
◦ Static aggregation tree

Structure-less or structured? That is the question…
◦ Continuous
◦ On-demand

Page 41: In Network Processing

Generalize the Problem to Other Areas

Transmitting large amounts of data over the internet is slow
◦ Better to process locally and transmit only the interesting parts

Page 42: In Network Processing

Overall Discussion Points

How does query rate affect design decisions?

Load balancing between levels of the tree
◦ Overloading the root and main nodes

How will the video capabilities of the Imote affect aggregation models?