Collecting Correlated Information from a Sensor Network Micah Adler University of Massachusetts, Amherst


Collecting Correlated Information from a Sensor Network

Micah Adler

University of Massachusetts, Amherst

Fundamental Problem

• Collecting information from distributed sources.
  – Objective: correlations reduce the bits that must be sent.
• Correlation examples in sensor networks:
  – Weather in a geographic region.
  – Similar views of the same image.
• Our focus: information theory.
  – Number of bits sent.
  – Ignore network topology.

Modeling Correlation

• k sensor nodes each have an n-bit string.
• Input drawn from a distribution D.
  – A sample specifies all kn bits.
  – Captures correlations and a priori knowledge.
• Objective:
  – Inform the server of all k strings.
  – Ideally: nodes send H(D) bits.
  – H(D): binary entropy of D.

[Figure: k sensor nodes holding strings x1, x2, …, xk report to a central server]

Binary entropy of D

• Optimal code to describe a sample from D:
  – Expected number of bits required ≈ H(D).
• H(D) ranges from 0 to kn.
• Easy if the entire sample is known by a single node:
  – Idea: shorter codewords for more likely samples.
• Challenge of our problem: the input is distributed.

H(D) = −Σ_{x ∈ D} Pr[x] log₂ Pr[x]
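The definition above is easy to check numerically. A minimal sketch (the function name `binary_entropy` and the dictionary representation of D are illustrative choices, not from the talk):

```python
import math

def binary_entropy(dist):
    """H(D) = -sum_{x in D} Pr[x] * log2 Pr[x].

    `dist` maps each possible input x to its probability.
    """
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

# Uniform distribution over M = 8 inputs: H(D) = log2 M = 3 bits.
uniform = {x: 1 / 8 for x in range(8)}
print(binary_entropy(uniform))  # 3.0

# A point mass has zero entropy: one possible input, probability 1.
print(binary_entropy({"x": 1.0}))  # 0.0
```

For a support-uniform D with M possible inputs this reduces to log₂ M, the quantity the protocol later targets.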

Distributed Source Coding

• [Slepian-Wolf, 1973]:
  – Simultaneously encode r independent samples.
  – As r → ∞:
    • Bits sent by nodes → rH(D).
    • Probability of error → 0.
• Drawback: relies on r → ∞.
  – Recent research: try to remove this.

[Figure: each node i holds r independent samples x_i^1, x_i^2, …, x_i^r]

Recent Research on DSC

• Survey [XLC 2004]: over 50 recent DSC papers.
  – All previous techniques require:
    • Significant restrictions on D, and/or
    • Large values of r.
      – Can also be viewed as a restriction on D.
  – Generalizing D: most important open problem.
• Impossibility result:
  – There are D such that no encoding for small r achieves O(H(D) + k) bits sent from the nodes.
• Our result: general D, r = 1, O(H(D) + k) bits.

New approach

• Allow interactive communication!
  – Nodes receive “feedback” from the server.
    • Also utilized for DSC in [CPR 2003].
  – Server at least as powerful as the nodes.
• Power utilization:
  – Central issue for sensor networks.
  – Node sending: power intensive.
  – Node receiving: requires less power.
    • Analogy: portable radio vs. cellphone.

[Figure: server broadcasting feedback to nodes holding x1, …, xk]

New approach

• Communication model:
  – Synchronous rounds:
    • Nodes send bits to the server.
    • Server sends bits back to the nodes.
    • Nothing sent directly between nodes.
• Objectives:
  – Minimize bits sent by the nodes.
    • Ideally O(H(D) + k).
  – Minimize bits sent by the server.
  – Minimize rounds of communication.


Asymmetric Communication Channels

• Adler-Maggs 1998: k=1 case.

• Subsequent work: [HL2002] [GGS2001] [W2000] [WAF2001]

• Other applications:– Circumventing web censorship [FBHBK2002]

– Design of websites [BKLM2002]

• Sensor networks problem: natural parallelization.

[Figure: a single node holding x drawn from D; the server knows D]

Who knows what?

• Nodes: only know their own string.
  – Can also assume they know the distribution D.
• Server: knows the distribution D.
  – Typical in work on DSC.
  – Some applications: D must be learned by the server.
    • In most such cases, D varies with time.
    • Crucial to have r as small as possible.

[Figure: server knows D; each node i knows its string X_i and D]

New Contributions

• New technique to communicate interactively:
  – O(H(D) + k) node bits.
  – O(kn + H(D) log n) server bits.
  – O(log min(k, H(D))) rounds.
• Lower bound:
  – kn bits must be exchanged if no error is allowed.
• If the server is allowed error with probability ∆:
  – O(H(D) + k) node bits.
  – O(k log(kn/∆) + H(D) log n) server bits.
  – O(log min(k, H(D))) rounds.

General strategy

• Support-uniform case:
  – D is uniform over the set of possible inputs.
• General distributions:
  – Technique from [Adler-Maggs 1998]:
    • “Reduce” to the support-uniform case.
  – Requires modification to the support-uniform protocol.
• Allowing for error:
  – Same protocol with some enhancements.

Support Uniform Input

• D: a k-dimensional binary matrix.
  – Side length 2^n.
• Choose X: a uniformly random 1-entry of the matrix.
• Server is given the matrix, wants to know X.
• Node i is given the ith coordinate of X.
• H(D) = log M.
  – M: number of possible inputs.

Basic Building Block

• Standard fingerprinting techniques:
  – Class of hash functions f:
    • n-bit string → 1 bit.
    • For a randomly chosen f, if x ≠ y, then Pr[f(x) = f(y)] ≈ 1/2.
    • Description of f requires O(log n) bits.
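One concrete family with exactly this collision behavior is the random inner product mod 2, sketched below. Note that this particular family needs n random bits to describe f rather than the O(log n) quoted above, which standard constructions (e.g. based on random primes) achieve; it is used here only to make the collision property concrete.

```python
import random

def make_fingerprint(n, rng):
    """Sample f: {0,1}^n -> {0,1} with f(x) = <r, x> mod 2 for a random r.

    For x != y, f(x) xor f(y) = <r, x xor y>, which is uniform over {0,1}
    because x xor y is nonzero, so Pr[f(x) = f(y)] is exactly 1/2.
    """
    r = rng.getrandbits(n)
    return lambda x: bin(r & x).count("1") % 2

rng = random.Random(0)
x, y = 0b1010101010101010, 0b1010101010101011
fs = [make_fingerprint(16, rng) for _ in range(1000)]
collisions = sum(f(x) == f(y) for f in fs)
print(collisions)  # close to 500, i.e. collision rate near 1/2
```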

[Figure: inputs split into “possible” and “not possible” regions by node 1’s fingerprint bits]

Protocol for k=1

• Server sends the node log M fingerprint functions.

• Node sends back resulting fingerprint bits.

• Intuition: each bit removes half inputs left.
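A toy simulation of this exchange, using the inner-product fingerprints above in place of O(log n)-bit hash descriptions (names such as `k1_protocol` are mine, not from the talk):

```python
import random

def k1_protocol(candidates, x, n, rng):
    """Server knows the candidate set; the node knows x.  Each iteration
    the server picks a random fingerprint (modeled by a random n-bit
    mask), the node answers with one fingerprint bit, and the server
    discards every candidate inconsistent with that bit."""
    possible = set(candidates)
    node_bits = 0
    while len(possible) > 1:
        r = rng.getrandbits(n)                    # fingerprint description
        bit = bin(r & x).count("1") % 2           # the node's one-bit answer
        possible = {v for v in possible if bin(r & v).count("1") % 2 == bit}
        node_bits += 1
    return possible.pop(), node_bits

rng = random.Random(0)
M, n = 64, 16
candidates = rng.sample(range(2 ** n), M)
x = candidates[7]
recovered, bits = k1_protocol(candidates, x, n, rng)
print(recovered == x)  # True: x is never eliminated by its own fingerprint bits
print(bits)            # roughly log2 M = 6 bits in expectation
```

Each answered bit is expected to eliminate about half of the surviving candidates, matching the intuition on the slide.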

What about k = 2?

[Figure: fingerprint bits from node 1 and node 2 jointly restrict the 2-dimensional matrix of possible inputs]

First step: allow many rounds.

• Each round:
  – Server chooses one node.
  – That node sends a single fingerprint bit.
• Objectives:
  – Ideal: each bit removes half the remaining inputs.
  – Our goal: expected number of inputs removed is a constant fraction.
  – Possibility: no node can send such a bit.
    • Need to distinguish “useful” bits from “not useful” bits.

Balanced Bits

• A fingerprint bit sent by node i is balanced if:
  – no value for node i is consistent with > 1/2 of the possible inputs,
  – given all information considered so far.
• Balanced bits, in expectation, eliminate a constant fraction of the inputs.
• Protocol goal:
  – Collect O(log M) balanced bits.
  – Don’t collect ω(log M) unbalanced bits.

[Figure: example of a balanced bit vs. an unbalanced bit]
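In code, the balance condition is a simple count over the inputs still possible. A sketch with hypothetical names (`is_balanced`, tuples standing in for the k node strings); it uses the strict “no more than half” reading of the definition:

```python
def is_balanced(possible, i, f):
    """A fingerprint bit from node i is balanced iff neither answer value
    is consistent with more than half of the still-possible inputs."""
    ones = sum(f(x[i]) for x in possible)
    zeros = len(possible) - ones
    return max(ones, zeros) <= len(possible) / 2

# k = 2 nodes holding 1-bit strings; the identity map serves as fingerprint.
possible = [(0, 0), (0, 1), (1, 0), (1, 1)]
ident = lambda b: b
print(is_balanced(possible, 0, ident))  # True: each answer leaves 2 inputs

possible = [p for p in possible if p[0] == 1]  # node 0's bit is learned to be 1
print(is_balanced(possible, 0, ident))  # False: node 0 is now defined
```

Once a node is defined (only one value left), every bit it could send is unbalanced, which is why the protocol stops querying it.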

Objective: minimize rounds.

• Must send multiple bits from multiple nodes.
  – But shouldn’t send too many unbalanced bits!
• Difficulty:
  – Must decide how many bits at the start of a round.
  – As bits are processed, the matrix evolves.
    • A node may only be able to send unbalanced bits.

Number of bits sent per round

• Defined node: only one possible value left.
  – Should no longer send bits.
• First try:
  – Round i: k_i undefined nodes, each sends ⌈log M / k_i⌉ bits.
• Possible problem:
  – Most nodes become defined at the start of round i.
  – Nodes might send Ω(log k · log M / log log k) total bits.

Protocol Description

• Phases: a first round and a second round.
  – First round of phase i: undefined nodes send b_i bits.
    • Server processes the bits in any order.
    • Counts the number of balanced bits.
  – Second round: if a node had any unbalanced bits,
    • query whether it holds the first heavy string.
• Continue until Ω(log M) balanced bits collected,
  – or until the entire input is known.
  – Send the nodes their remaining possibilities.

b_i = min( ⌈log M / k_i⌉, 3·b_{i−1}/2 )

Performance of protocol

• Theorem:
  – Expected bits sent by nodes: O(k + log M).
  – Expected bits sent by server: O(kn + log M · log n).
  – Expected number of rounds: O(min(log log M, log k)).

Proof: O(log M + k) node bits.

• Key: if node i sends an unbalanced bit in a phase,
  – Pr[i defined by end of phase] ≥ 1/2.
• Expected answers to heavy queries: O(1).
• Accounting for unbalanced bits:
  – Charge to a bit sent by the same node:
    • in the most recent balanced round,
    • if none, then in the first round.
  – Spread bits charged to a round evenly.

Proof: O(log M + k) node bits.

• Expected unbalanced bits charged to any bit:

  Σ_{i=1}^∞ (3/2)^i (1/2)^{i−1} = O(1)

• Total balanced bits: O(log M).
• Total first-round bits: k·⌈log M / k⌉ = O(k + log M).
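The series in the first bullet is geometric: (3/2)^i (1/2)^{i−1} = 2·(3/4)^i, so it converges (to 6, by my arithmetic; the slide only needs that it is O(1)). A quick numerical check:

```python
# sum_{i>=1} (3/2)**i * (1/2)**(i-1) = 2 * sum_{i>=1} (3/4)**i = 2 * 3 = 6
total = sum((3 / 2) ** i * (1 / 2) ** (i - 1) for i in range(1, 200))
print(total)  # ~ 6.0
```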

Proof: server bits and rounds

• Server bits: after O(log M) balanced bits,
  – Pr[an incorrect input is not eliminated] = 1/M.
  – Expected possible inputs remaining: O(1).
• Rounds:
  – Sparse phase: fewer than (2/3) log M fingerprint bits collected.
  – O(1) non-sparse phases.
  – O(min(log k, log log M)) sparse phases.

Conclusions

• New technique for collecting distributed, correlated information.
• Allows for arbitrary distributions and correlations.
• A single sample is sufficient.
• Open problems:
  – Lower bound on rounds.
  – Achieving the full Slepian-Wolf rate region.
  – Incorporating network topology.