Fountain Codes Based Distributed Storage Algorithms for Large-scale Wireless Sensor Networks

Salah A. Aly, Zhenning Kong, Emina Soljanin

IEEE IPSN 2008


Outline

Introduction
Fountain Codes
LT-Codes Based Distributed Storage (LTCDS) Algorithms
  - With limited global information: LTCDS-I
  - Without any global information: LTCDS-II
Performance Evaluation
Conclusion


Introduction

Nodes in wireless sensor networks have limited resources, e.g., CPU power, bandwidth, memory, and lifetime.

They can monitor objects, detect fires, floods, and other disaster phenomena.

We consider a network with n randomly distributed nodes; among them are k sensor nodes, k/n = 10%.

Our goal is to find techniques to redundantly distribute data from k source nodes to n storage nodes.

So, by querying any (1+ε)k nodes, one can retrieve the original information acquired by the k sources with low computational complexity.


Introduction

An example: 25 sensors monitor an area, and there are 225 additional storage nodes. Information acquired by the sensors should

1) be available in any neighborhood,
2) be easy to compute from storage,
3) be extractable from any 25+ nodes.

Fig. 1. A sensor network has 25 sensors (big dots) monitoring an area and 225 storage nodes (small dots). A good distributed storage algorithm should enable us to recover the original 25 source packets from any 25+ nodes (e.g., the set of nodes within any one of the three illustrated circular regions).


Introduction

We know how to solve the centralized version of this problem by coding (e.g. Fountain Codes, MDS Codes, linear codes).

Our contribution: Solve the problem in a distributed, decentralized, randomized way on a network.

Problem: Find an efficient strategy to
  - add some redundancy,
  - distribute information randomly through the network,
  - decode easily from (1+ε)k nodes.


Network Model

Suppose a network with n storage nodes randomly distributed.

k source nodes have information to be disseminated randomly throughout the network for storage.

Every source node si generates an independent packet. We will use Fountain codes and random walks to disseminate information from the k sources to the n nodes. The idea is to build a system of n equations in k variables. For example,

y1 = x1 ⊕ x2 ⊕ x3
y2 = x2 ⊕ x3 ⊕ x5 ⊕ xi
y3 = x1 ⊕ x3 ⊕ … ⊕ xk
⋮
yn = x4 ⊕ x6 ⊕ xi ⊕ … ⊕ xk

Decode easily from (1+ε)k equations, for ε > 0.
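Such a system can be decoded by the standard "peeling" process used for Fountain codes: repeatedly find an equation with a single unknown, solve it, and substitute. A minimal sketch in Python (the packet values and equation supports are made up for illustration):

```python
def peel_decode(equations, k):
    """Peeling (belief-propagation) decoder for XOR equations.
    `equations` is a list of (support, value) pairs, where `support`
    is the set of source indices XORed into `value`. Repeatedly find
    an equation with exactly one unknown source block, recover it,
    and substitute it into the remaining equations."""
    recovered = {}
    eqs = [(set(idx), val) for idx, val in equations]
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for idx, val in eqs:
            unknown = idx - recovered.keys()
            if len(unknown) == 1:
                (i,) = unknown
                for j in idx - unknown:   # XOR out already-known blocks
                    val ^= recovered[j]
                recovered[i] = val
                progress = True
    return recovered if len(recovered) == k else None

# toy instance with k = 3 one-byte "packets"
x = {1: 0x0A, 2: 0x5C, 3: 0xF0}
eqs = [({1}, x[1]), ({1, 2}, x[1] ^ x[2]), ({2, 3}, x[2] ^ x[3])]
assert peel_decode(eqs, 3) == x
```

If no degree-1 equation remains before all k blocks are recovered, decoding fails and the function returns None; this is exactly why the degree distribution must be chosen carefully.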


Fountain codes

Assume k source blocks S = {x1, x2, …, xk}. Each output block yi is obtained by XORing some source blocks from S. d(yi) is the number of source blocks in the ith equation, 1 ≤ d(yi) ≤ k.

The Fountain code idea: Choose d(yi) randomly according to a probability distribution such that it is easy to decode from any (1+ε)k output blocks.

Easy to decode:        Hard to decode:
x1                     x1 ⊕ x2
x1 ⊕ x2                x2 ⊕ x3
x1 ⊕ x2 ⊕ x3           x1 ⊕ x4 ⊕ x5

LT and Raptor codes are two classes of Fountain codes.


Fountain codes

For k source blocks {x1, x2, …, xk} and a probability distribution Ω(d) with 1 ≤ d ≤ k, a Fountain code with parameters (k, Ω) is a potentially limitless stream of output blocks {y1, y2, …}

Each output block yi is obtained by XORing d randomly and independently chosen source blocks.

Figure 1. The encoding operations of Fountain codes: each output is obtained by XORing d source blocks chosen uniformly and independently at random from k source inputs, where d is drawn according to a probability distribution Ω(d).
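The encoding operation described above can be sketched directly; here `omega_sample` is a placeholder for whatever degree distribution Ω is in use (the toy example below always picks d = 2):

```python
import random

def fountain_encode_block(source, omega_sample, rng=random):
    """Generate one Fountain-code output block: draw a degree d from
    the distribution Omega, pick d distinct source blocks uniformly at
    random, and XOR them. Returns (support set, encoded value)."""
    k = len(source)
    d = omega_sample(k)                  # d ~ Omega(d), 1 <= d <= k
    support = rng.sample(range(k), d)    # d distinct source indices
    y = 0
    for i in support:
        y ^= source[i]
    return set(support), y

# usage: three source blocks, a toy degree distribution with d = 2 always
src = [0x11, 0x22, 0x44]
support, y = fountain_encode_block(src, lambda k: 2)
```

Because XOR is its own inverse, the decoder only needs the support set (the "header") and the encoded value to run the peeling process.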


LT Codes

Definition 2. (Code Degree) For Fountain codes, the number of source blocks used to generate an encoded output y is called the code degree of y, denoted dc(y). By construction, the code degree distribution Ω(d) is the probability distribution of dc(y).

LT (Luby Transform) codes are a special class of Fountain codes which use the Ideal Soliton or Robust Soliton distributions. The Ideal Soliton distribution Ωis(d) for k source blocks is given by

Ωis(1) = 1/k,    Ωis(d) = 1/(d(d−1)) for d = 2, 3, …, k.

The Robust Soliton distribution is a modification of the Ideal Soliton distribution that adds extra probability mass at certain degrees to make the decoding process more robust in practice.
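The Ideal Soliton distribution has a simple closed form, so it can be tabulated and sampled directly; a small sketch:

```python
import random

def ideal_soliton(k):
    """Ideal Soliton pmf for k source blocks:
    Omega_is(1) = 1/k, Omega_is(d) = 1/(d*(d-1)) for d = 2..k.
    The sum telescopes: 1/k + sum_{d=2}^{k} 1/(d(d-1)) = 1."""
    pmf = {1: 1.0 / k}
    for d in range(2, k + 1):
        pmf[d] = 1.0 / (d * (d - 1))
    return pmf

def sample_degree(k, rng=random):
    """Draw a code degree d according to the Ideal Soliton
    distribution by walking the cumulative distribution."""
    u, acc = rng.random(), 0.0
    for d, p in ideal_soliton(k).items():
        acc += p
        if u < acc:
            return d
    return k
```

Note that half of the probability mass sits on d = 2 (Ωis(2) = 1/2), which is what keeps degree-1 equations appearing during peeling.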


LT Codes

Lemma 3 (Luby [12]). For LT codes with Robust Soliton distribution, k original source blocks can be recovered from any k + O(√k ln2(k/δ)) encoded output blocks with probability 1 − δ.

Both the encoding and decoding complexity are O(k ln(k/δ)).


LT-Codes Based Distributed Storage (LTCDS) Algorithms

We propose two LT-Codes Based Distributed Storage (LTCDS) algorithms. In both, the source packets are disseminated throughout the network by simple random walks and combined according to the Robust Soliton distribution. i) In Algorithm 1, called LTCDS-I, we assume that

each node in the network knows the global information k and n.

ii) Algorithm 2, called LTCDS-II, is a fully distributed algorithm in which the values of n and k are not known. The price we pay is extra transmissions of the source packets to obtain estimates of n and k.


Previous Work

Previous work focused on techniques based on pre-assumptions about the network, such as geographical locations or routing tables. Lin et al. [INFOCOM 2007] studied the question "how to retrieve historical data that the sensors have gathered even if some nodes failed or disappeared?" They proposed two decentralized algorithms using Fountain codes to guarantee the persistence and reliability of cached data on unreliable sensors. But they assume that the maximum degree of a node is known and that each source sends b packets (high complexity).

Dimakis et al. [ICASSP 2006] used a decentralized implementation of Fountain codes based on geographic routing, in which every node has to know its location. They applied their work to grid networks.

Kamra et al. [SIGCOMM 2006] proposed a novel technique called growth codes to increase data persistence, i.e., the amount of information that can be recovered at the sink.


Algorithm 1: Knowing global information k and n

We use a simple random walk for each source to disseminate its information. Each node u that has packets to transmit chooses one node v among its neighbors uniformly and independently at random. Each node accepts each source packet with equal probability. Each source packet should visit every node in the network at least once.


The algorithm consists of three phases. (i) Initialization Phase:

(1) Each node u draws a random number dc(u) according to the distribution Ωis(d). Each source node si, i = 1, …, k, generates a header for its source packet xsi, putting in its ID and a counter c(xsi) = 0.

(2) Each source node si sends out its source packet xsi to one of its neighbors u, chosen uniformly at random among all its neighbors N(si).

(3) The node u accepts this source packet with probability dc(u)/k and updates its storage as yu = yu ⊕ xsi.



Algorithm 1: Knowing global information k and n

(ii) Encoding Phase:

(1) In each round, when a node u has received at least one source packet before the current round, u forwards the head-of-line (HOL) packet x in its forward queue to a neighbor v chosen uniformly at random.

(2) The node v makes its decision: If this is the first time that x visits v, then v accepts the source packet with probability dc(v)/k and updates its storage as yv = yv ⊕ x.

Else if c(x) < C1 n log n, where C1 is a system parameter, node v puts x into its forward queue and increases its counter by one: c(x) = c(x) + 1.

If c(x) ≥ C1 n log n, then node v discards packet x forever.

(iii) Storage Phase:

When a node u has made its decision for all the source packets xs1, xs2, …, xsk, i.e., all these packets have visited node u at least once, node u finishes its encoding process, and yu is the storage packet of u.
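The three phases can be compressed into a toy simulation. A sketch under stated assumptions: the graph, the degree sampler, and the simplified per-hop counting below are illustrative placeholders (the paper uses the Robust Soliton distribution on a random geometric graph):

```python
import math
import random

def ltcds_1(adj, sources, draw_degree, C1=3, rng=random):
    """Toy sketch of LTCDS-I on an undirected graph (adjacency dict).
    Each node u draws a code degree dc(u); every source packet does a
    simple random walk, and on its first visit to a node v it is
    XOR-accepted with probability dc(v)/k, until its hop counter
    reaches C1 * n * log(n)."""
    n, k = len(adj), len(sources)
    limit = C1 * n * math.log(n)
    dc = {u: draw_degree(k) for u in adj}   # (i) initialization phase
    y = {u: 0 for u in adj}                 # storage packets
    seen = {u: set() for u in adj}          # packets that visited u
    for s, packet in sources.items():       # one random walk per source
        u, c = s, 0
        while c < limit:                    # (ii) encoding phase
            v = rng.choice(adj[u])          # uniform random neighbor
            if s not in seen[v]:            # first visit to v
                seen[v].add(s)
                if rng.random() < dc[v] / k:
                    y[v] ^= packet          # accept: XOR into storage
            u, c = v, c + 1
        # counter reached C1*n*log(n): packet discarded
    return y                                # (iii) storage phase result
```

For example, on a 10-node ring with two sources this runs in well under a second; decoding from a subset of the returned storage packets then proceeds exactly as for ordinary LT codes.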


Algorithm 1: Knowing global information k and n

Theorem 7. Suppose a sensor network has n nodes and k sources, and the LTCDS-I algorithm uses the Robust Soliton distribution Ωrs. Then, when n and k are sufficiently large, the k original source packets can be recovered from any k + O(√k ln²(k/δ)) storage nodes with probability 1 − δ. The decoding complexity is O(k ln(k/δ)).

Theorem 8. Let T denote the total number of transmissions of the LTCDS-I algorithm; then T = O(kn log n),

where k is the total number of sources and n is the total number of nodes in the network.


Algorithm 2: Without knowing global information n and k

In the previous algorithm, the values of n and k are known.

We do not assume anything about the network topology.

No node needs to maintain a routing table or know the maximum degree of the graph.

We design the LTCDS-II algorithm for large values of n and k.


Algorithm 2: Without knowing global information n and k

We design a fully distributed storage algorithm which does not require any global information i.e., values of k and n are not known.

The idea is to utilize simple random walks to let each node infer its own estimates of n and k.

We use the inter-visit times of random walks. Definition 9. (Inter-Visit Time or Return Time)

For a random walk on a graph, the inter-visit time of node u, Tvisit(u), is the amount of time between any two consecutive visits of the random walk to node u. This inter-visit time is also called the return time.

Our goal is to compute E[Tvisit(u)] and E[Tpacket(u)] (the mean inter-packet time, defined below).


Algorithm 2: Without knowing global information n and k

Lemma 10. For a node u with node degree dn(u) in a random geometric graph, the mean inter-visit time is given by

E[Tvisit(u)] = μn / dn(u),

where μ is the mean degree of the graph. The total number of nodes n can then be estimated as

n̂ = dn(u) E[Tvisit(u)] / μ.

However, the mean degree μ is global information and may be hard to obtain. Thus, approximating μ by dn(u), we have

n̂ = E[Tvisit(u)].


Algorithm 2: Without knowing global information n and k

Definition 11. (Inter-Packet Time) For k random walks on a graph, the inter-packet time of node u, Tpacket(u), is the amount of time between any two consecutive visits of those k random walks to node u.

Lemma 12. For a node u with node degree dn(u) in a random geometric graph with k simple random walks, the mean inter-packet time is given by

E[Tpacket(u)] = E[Tvisit(u)] / k = μn / (k dn(u)).

An estimate of k can then be obtained by

k̂ = E[Tvisit(u)] / E[Tpacket(u)].

After obtaining estimates of both n and k, we can employ techniques similar to those in LTCDS-I to do LT coding and storage.
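The n̂ estimator can be checked with a toy random walk. A sketch on a ring, where every node has degree 2, so μ = dn(u) and the approximation n̂ = E[Tvisit(u)] is exact in expectation (the graph choice and sample count are illustrative assumptions):

```python
import random

def mean_return_time(adj, target, samples=2000, rng=random):
    """Empirical mean inter-visit (return) time: average time between
    consecutive visits of a simple random walk to `target`."""
    u, t, last, total, count = target, 0, 0, 0, 0
    while count < samples:
        u = rng.choice(adj[u])      # one random-walk step
        t += 1
        if u == target:
            total += t - last       # length of this excursion
            last = t
            count += 1
    return total / count

# ring of n nodes: every node has degree 2, so E[Tvisit(u)] = n
n = 20
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
random.seed(7)
n_hat = mean_return_time(ring, 0)   # estimate of n; the true mean is 20
```

With k walks running at once, averaging the gaps in the merged visit sequence gives the mean inter-packet time, and k̂ = E[Tvisit(u)] / E[Tpacket(u)] as in Lemma 12.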


Algorithm 2: Without knowing global information n and k

The algorithm consists of four phases: (i) Initialization Phase; (ii) Inference Phase; (iii) Encoding Phase; (iv) Storage Phase.


Performance Evaluation

Definition 15. (Decoding Ratio) The decoding ratio η is the ratio between the number of queried nodes h and the number of sources k: η = h/k.

Definition 16. (Successful Decoding Probability) The successful decoding probability Ps is the probability that all k source packets are recovered from the h queried nodes. In simulations, Ps = Ms / M, the fraction of successful trials.


Performance Evaluation

Figure 3. Decoding performance of LTCDSI algorithm with small number of nodes and sources

•When the decoding ratio is above 2, the successful decoding probability is about 99%.

•When the total number of nodes increases but the ratio between k and n and the decoding ratio η are kept as constants, the successful decoding probability Ps increases when η ≥ 1.5 and decreases when η < 1.5.


Performance Evaluation

Figure 4. Decoding performance of LTCDS-I algorithm with medium number of nodes and sources


Performance Evaluation

Figure 5. Decoding performance of LTCDS-I algorithm with different number of nodes

• Fixing the ratio between k and n at 10% (k/n = 0.1).
• As n grows, the successful decoding probability increases until it reaches a plateau, which is the successful decoding probability of real LT codes.


Performance Evaluation

• Figure 6. Decoding performance of LTCDS-I algorithm with different system parameter C1

• Studying values of the constant C1: for C1 ≥ 3, Ps is almost constant and close to 1. This means that after 3n log n steps, almost all source packets have visited each node at least once.


Performance Evaluation

Figure 7. Decoding performance of LTCDSII algorithm with small number of nodes and sources

• The decoding performance of the LTCDS-II algorithm is slightly worse than that of LTCDS-I when the decoding ratio η is small, and almost the same when η is large.


Performance Evaluation

Figure 8. Decoding performance of LTCDS-II algorithm with medium number of nodes and sources


Performance Evaluation

Figure 9. Estimation results in LTCDS-II algorithm with n = 200 nodes and k = 20 sources: (a) estimations of n; (b) estimations of k.

• The estimates of k are more accurate and concentrated than the estimates of n.


Performance Evaluation

Figure 10. Estimation results in LTCDS-II algorithm with n = 1000 nodes and k = 100 sources: (a) estimations of n; (b) estimations of k.


Performance Evaluation

Figure 11. Decoding performance of LTCDS-II algorithm with different system parameter C2

• When C2 is chosen to be small, the performance of the LTCDS-II algorithm is very poor.
• This is due to inaccurate estimates of k and n at each node.
• When C2 is large, for example C2 ≥ 30, the performance is almost the same.


Conclusion

We proposed two new decentralized algorithms that utilize Fountain codes and random walks to distribute information sensed by k source nodes to n storage nodes.


References

[1] D. Aldous and J. Fill. Reversible Markov Chains and Random Walks on Graphs. Preprint, available at http://statwww.berkeley.edu/users/aldous/RWG/book.html, 2002.

[6] A. G. Dimakis, V. Prabhakaran, and K. Ramchandran. Distributed fountain codes for networked storage. In Proc. IEEE ICASSP 2006, May 2006.

[9] A. Kamra, V. Misra, J. Feldman, and D. Rubenstein. Growth codes: Maximizing sensor network data persistence. In Proc. ACM SIGCOMM 2006, pages 255-266, Pisa, Italy, 2006.

[10] Y. Lin, B. Li, and B. Liang. Differentiated data persistence with priority random linear codes. In Proc. 27th International Conference on Distributed Computing Systems (ICDCS 2007), Toronto, Canada, June 2007.

[11] Y. Lin, B. Liang, and B. Li. Data persistence in large-scale sensor networks with decentralized fountain codes. In Proc. 26th IEEE INFOCOM, Anchorage, Alaska, May 6-12, 2007.

[12] M. Luby. LT codes. In Proc. 43rd IEEE Symposium on Foundations of Computer Science (FOCS 2002), Vancouver, BC, Canada, November 2002.

[13] D. S. Lun, N. Ratnakar, R. Koetter, M. Medard, E. Ahmed, and H. Lee. Achieving minimum-cost multicast: A decentralized approach based on network coding. In Proc. 24th IEEE INFOCOM, volume 3, pages 1607-1617, March 2005.

[14] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.

[17] S. Ross. Stochastic Processes. Wiley, New York, second edition, 1995.