Analytic Combinatorics (of P. Flajolet) Profile of Digital Trees Wojciech Szpankowski Department of Computer Science Purdue University, W. Lafayette, IN U.S.A. August 14, 2012 Polish Combinatorial Conference, Bedlewo, 2012 Dedicated to PHILIPPE FLAJOLET Joint work with H-K. Hwang, M. Drmota, P . Flajolet, P . Nicodeme, G. Park, and M. Ward.


Page 1: Analytic Combinatorics (of P. Flajolet) Profile of Digital ... · Analytic Combinatorics (of P. Flajolet) Profile of Digital Trees ... Analysis of Algorithms and Analytic Information

Analytic Combinatorics (of P. Flajolet)

Profile of Digital Trees

Wojciech Szpankowski∗

Department of Computer Science

Purdue University, W. Lafayette, IN

U.S.A.

August 14, 2012

Polish Combinatorial Conference, Bedlewo, 2012

Dedicated to PHILIPPE FLAJOLET

∗Joint work with H-K. Hwang, M. Drmota, P. Flajolet, P. Nicodeme, G. Park, and M. Ward.


Outline of the Presentation

1. Tries and Suffix Trees

2. Profiles of Tries

• Trie Parameters

• Main Results

• Sketch of Proofs

• Consequences (height, fill-up, shortest path)

3. Profile of Digital Search Trees

4. Applications: Error Resilient Lempel-Ziv’77 (Suffix Trees)

5. Analysis of Algorithms and Analytic Information Theory


Tries and Suffix Trees

[Figure: a binary trie storing the keys 00, 0100, 01010, 01011, 011, 1010, and 1011; edges are labeled 0 and 1.]

A trie, or prefix tree, is an ordered tree data structure that stores keys, usually represented by strings.

Tries were introduced by de la Briandais (1959) and Fredkin (1960); Fredkin coined the name "trie", derived from retrieval.

A suffix tree is a trie built from the suffixes of one string.

Other digital trees are PATRICIA tries and digital search trees (DST).

Typical Tries: In this talk we mostly discuss random tries (and DSTs) built from n independent sequences generated by a binary memoryless source, with p denoting the probability of generating a "0" (q = 1 − p ≤ p).


Digital Trees

Digital Trees:

Tries, PATRICIA Tries, Digital Search Trees:

Figure 1: A trie, a PATRICIA trie, and a digital search tree built from $x_1 = 11100\ldots$, $x_2 = 10111\ldots$, $x_3 = 00110\ldots$, and $x_4 = 00001\ldots$.


Usefulness of Tries

Tries, suffix trees, and DSTs are widely used in diverse applications:

• automatically correcting words in texts; Kukich (1992);

• taxonomies of regular language; Watson (1995);

• event history in datarace detection for multi-threaded object-oriented programs; Choi et al. (2002);

• internet IP address lookup; Nilsson and Tikkanen (2002);

• data compression, Lempel-Ziv, . . . ; W.S. (2001);

• distributed hash tables; Malkhi et al. (2002) and Adler et al. (2003).

Fundamental, prototype data structures:

• have a large number of variations and extensions (Patricia, DST, bucket digital search trees, k-d tries, quadtries, LC-tries, multiple-tries, etc.);

• are closely connected to several splitting procedures using coin-flipping: collision resolution in multi-access (or broadcast) communication models, loser selection or leader election, etc.;

• have direct combinatorial interpretations in terms of words, urn models, etc.


Outline Update

1. Tries and Suffix Trees

2. Usefulness of Tries

3. Profiles of Tries

• Trie Parameters

• Main Results

• Sketch of Proofs

• Consequences

4. Applications

(G. Park) (P. Nicodeme)

G. Park, H-K. Hwang, P. Nicodeme, W. Szpankowski,

“Profile in Tries,”

SIAM J. Computing, 38, 5, 1821-1880, 2009.


External and Internal Profiles

[Figure: the trie on the keys 00, 0100, 01010, 01011, 011, 1010, and 1011, annotated level by level with its external and internal profiles.]

Level k:    0   1   2   3   4   5
$B_n^k$:    0   0   1   1   3   2
$I_n^k$:    1   2   2   2   1   0

External profile and internal profile:

$B_n^k$ = # external nodes at distance k from the root;

$I_n^k$ = # internal nodes at distance k from the root.
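The two profiles of the example trie can be recomputed with a short Python sketch. It is only an illustration: the function name `trie_profiles` and the prefix-counting approach are assumptions of this sketch, not part of the talk. A prefix shared by at least two keys is taken to be an internal node, and each key is stored at its shortest prefix that no other key shares.

```python
from collections import Counter

def trie_profiles(keys):
    """Return (B, I): external and internal node counts per level of the trie."""
    keys = list(keys)
    # through[prefix] = number of keys whose path passes through that prefix
    through = Counter()
    for key in keys:
        for d in range(len(key) + 1):
            through[key[:d]] += 1
    internal = Counter()
    for prefix, count in through.items():
        if count >= 2:                      # branching or shared path node
            internal[len(prefix)] += 1
    external = Counter()
    for key in keys:
        for d in range(len(key) + 1):
            if through[key[:d]] == 1:       # shortest prefix unique to this key
                external[d] += 1
                break
        else:
            raise ValueError(f"key {key!r} is a prefix of another key")
    levels = max(external) + 1
    B = [external[k] for k in range(levels)]
    I = [internal[k] for k in range(levels)]
    return B, I

keys = ["00", "0100", "01010", "01011", "011", "1010", "1011"]
B, I = trie_profiles(keys)
print(B)  # external profile by level
print(I)  # internal profile by level
```

For the seven keys of the figure this reproduces the level-by-level values listed on the slide.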


Why Study Profiles?

• Fine, informative shape characteristic;

• Related to path length, depth, height, shortest path, width, etc.;

• Breadth-first search;

• Compression algorithms.

• Mathematically challenging, phenomenally interesting!

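Example: Parameters such as the height $H_n$, the shortest path $s_n$, the fill-up level $F_n$, and the depth $D_n$ can be studied through the profiles, since

$$H_n = \max\{k : B_n^k > 0\}, \qquad s_n = \min\{k : B_n^k > 0\}, \qquad F_n = \max\{k : I_n^k = 2^k\}, \qquad \Pr(D_n = k) = \frac{E[B_n^k]}{n}.$$

[Figure: a trie on the strings $x_1, \ldots, x_5$ with the levels $F_n$, $s_n$, $D_n$, and $H_n$ marked.]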
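As a small illustration of these identities, the following Python sketch reads the height, shortest path, and fill-up level off a pair of profile lists. The helper names are hypothetical, and `depth_distribution` is the empirical analogue for one fixed tree (the slide's formula uses the expectation $E[B_n^k]$).

```python
def height(B):
    """H_n = max{k : B_n^k > 0}."""
    return max(k for k, b in enumerate(B) if b > 0)

def shortest_path(B):
    """s_n = min{k : B_n^k > 0}."""
    return min(k for k, b in enumerate(B) if b > 0)

def fill_up_level(I):
    """F_n = max{k : I_n^k = 2^k}, the last completely full level."""
    return max(k for k, i in enumerate(I) if i == 2 ** k)

def depth_distribution(B):
    """Fraction of keys stored at each level of this particular tree."""
    n = sum(B)
    return [b / n for b in B]

# Profiles of the seven-key example trie.
B = [0, 0, 1, 1, 3, 2]
I = [1, 2, 2, 2, 1, 0]
print(height(B), shortest_path(B), fill_up_level(I))
print(depth_distribution(B))
```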


Recurrence for the Profiles

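External Profile $B_n^k$:

Define the probability generating function as

$$B_n^k(u) = E\left[u^{B_n^k}\right] = \sum_{\ell \ge 0} P(B_n^k = \ell)\, u^{\ell}.$$

[Figure: a trie with root $B_n^k(u)$ whose two subtrees contribute $B_i^{k-1}(u)$ and $B_{n-i}^{k-1}(u)$.]

Then

$$B_n^k(u) = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i}\, B_i^{k-1}(u)\, B_{n-i}^{k-1}(u),$$

with $B_n^0(u) = 1$ for $n \neq 1$ and $B_1^0(u) = u$.

Internal Profile: the probability generating function $I_n^k(u) = E\left[u^{I_n^k}\right]$ satisfies the same recurrence with $I_n^0(u) = u$ for $n > 1$ and $I_0^0(u) = I_1^0(u) = 1$.

Average External Profile:

$$E[B_n^k] = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} \left(E[B_i^{k-1}] + E[B_{n-i}^{k-1}]\right), \qquad n \ge 2,\ k \ge 1,$$

under some initial conditions (e.g., $E[B_0^k] = 0$ for all k).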
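The recurrence for the average external profile can be iterated numerically. The sketch below is a direct, unoptimized transcription; the function name and the boundary values are assumptions of the sketch ($E[B_n^0] = 1$ only for $n = 1$, and $E[B_n^k] = 0$ for $n \le 1$, $k \ge 1$, which is what the trie definition gives for empty and one-key tries).

```python
from math import comb

def expected_external_profile(n_max, k_max, p):
    """mean[k][n] = E[B_n^k] for 0 <= k <= k_max and 0 <= n <= n_max."""
    q = 1.0 - p
    # Level 0: the root is an external node only when the trie holds one key.
    mean = [[1.0 if n == 1 else 0.0 for n in range(n_max + 1)]]
    for k in range(1, k_max + 1):
        prev = mean[-1]
        row = [0.0] * (n_max + 1)          # entries for n = 0, 1 stay zero
        for n in range(2, n_max + 1):
            total = 0.0
            for i in range(n + 1):
                weight = comb(n, i) * p ** i * q ** (n - i)
                total += weight * (prev[i] + prev[n - i])
            row[n] = total
        mean.append(row)
    return mean

mean = expected_external_profile(n_max=10, k_max=200, p=0.6)
# Every key sits in exactly one external node, so the levels should add up to n.
print(sum(mean[k][10] for k in range(201)))
```

A useful sanity check is that the expected profile summed over all levels equals the number of keys, up to the truncation at `k_max`.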


Main Results

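Notation: $r = p/q = p/(1-p) > 1$ and $\alpha := \alpha_{n,k} = k/\log n$. Three thresholds:

$$\alpha_1 := \frac{1}{\log(1/q)}, \qquad \alpha_2 := \frac{p^2 + q^2}{p^2 \log(1/p) + q^2 \log(1/q)}, \qquad \alpha_3 := \frac{2}{\log(1/(p^2 + q^2))}.$$

[Figure: $\alpha_1$, $\alpha_2$, and $\alpha_3$ plotted as functions of p for 0.5 ≤ p < 1.]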
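To get a feel for the three thresholds, they can be evaluated for a given p. This is a plain transcription of the definitions with natural logarithms; the function name is an assumption of the sketch.

```python
from math import log

def thresholds(p):
    """Return (alpha1, alpha2, alpha3) for a source with P('0') = p >= 1/2."""
    q = 1.0 - p
    alpha1 = 1.0 / log(1.0 / q)
    alpha2 = (p * p + q * q) / (p * p * log(1.0 / p) + q * q * log(1.0 / q))
    alpha3 = 2.0 / log(1.0 / (p * p + q * q))
    return alpha1, alpha2, alpha3

for p in (0.5, 0.75, 0.9):
    print(p, thresholds(p))
```

In the symmetric case p = 0.5 the first two thresholds coincide at $1/\log 2$, so the polynomial-growth range between them disappears.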

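1: Exponential Growth ($0 < \alpha < \alpha_1$):

Let $1 \le k \le \frac{1}{\log q^{-1}}\left(\log n - \log\log\log n + \log(r-1) - \varepsilon\right)$. Then

$$E[B_n^k] = n q^k (1 - q^k)^{n-1} \left(1 + O\left((\log n)^{-\delta}\right)\right) = O\left(2^{-n^{\nu}}\right).$$

2: Logarithmic Growth ($0 < \alpha < \alpha_1$):

Let $1 \le k \le \frac{1}{\log q^{-1}}\left(\log n - \log\log\log n + m\log(r-1) - \varepsilon\right)$. Then

$$E[B_n^k] = O\left(\log\log n \cdot \log^{m-\beta} n\right),$$

where m and β are constants (β may be smaller or greater than m).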


Phase Transitions

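3: Polynomial Growth: $\alpha_1 \log n < k < \alpha_2 \log n$ ($\alpha_1 < \alpha < \alpha_2$):

$$E[B_n^k] \sim G_1(\log n)\, \frac{p^{\rho} q^{\rho} (p^{-\rho} + q^{-\rho})}{\sqrt{2\pi\alpha}\, \log(p/q)} \cdot \frac{n^{\upsilon_1}}{\sqrt{\log n}},$$

where $G_1(x)$ is a periodic function and

$$\upsilon_1 = -\rho + \alpha \log(p^{-\rho} + q^{-\rho}), \qquad \rho = -\frac{1}{\log(p/q)} \log\left(\frac{-1 - \alpha\log q}{1 + \alpha\log p}\right).$$

Figure 2: The fluctuating part of the periodic function $G_1(x)$ for p = 0.55, 0.65, . . . , 0.95. [Plot: the vertical ranges shown are about $\pm 6 \times 10^{-23}$ for p = 0.55, $\pm 3 \times 10^{-7}$ for p = 0.65, and $\pm 6 \times 10^{-3}$ for p = 0.75.]

4: Polynomial Growth/Decay: $\alpha_2 \log n < k$ ($\alpha_2 < \alpha$):

$$E[B_n^k] = \frac{2pq}{p^2 + q^2}\, n^{\nu_2} + O(n^{\nu_3}),$$

where $\nu_2 = 2 + \alpha\log(p^2 + q^2)$, for some $\nu_3 < \nu_2$.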


External Shapes

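[Figure: typical shape of the external profile in the symmetric case (p = 0.5, $\alpha_1 = \alpha_2 = 1/\log 2$) and in an asymmetric case (p = 0.75). Marked levels include $\log_2 n + O(1)$ and $2\log_2 n + O(1)$ for p = 0.5, and $\log_{1/q} n$, $\frac{\log n}{p\log(1/p) + q\log(1/q)}$, and $\frac{2}{\log(1/(p^2+q^2))}\log n + O(1)$ for p = 0.75.]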


Average Internal Profile

1: Almost Full Tree: $k < \alpha_1 \log n$:

$$E[I_n^k] = 2^k - E[B_n^k](1 + o(1)).$$

2: Phase Transition I: $\alpha_1 \log n < k < \alpha_0 \log n$, where $\alpha_0 = \frac{2}{\log(1/p) + \log(1/q)}$:

$$E[I_n^k] = 2^k - G_2(\log n)\, E[B_n^k](1 + o(1)),$$

where $G_2(x)$ is a periodic function.

3: Phase Transition II: $\alpha_0 \log n < k < \alpha_2 \log n$:

$$E[I_n^k] = G_2(\log n)\, E[B_n^k](1 + o(1)),$$

where $G_2(x)$ is a periodic function.

4: Polynomial Growth/Decay: $\alpha_2 \log n < k$:

$$E[I_n^k] = \frac{1}{2}\, n^{\nu_2}(1 + o(1)),$$

where $\nu_2 = 2 + \alpha\log(p^2 + q^2)$.


Variance and Limiting Distributions of the External Profile

Variance:

1: $k < \alpha_1 \log n$: $V[B_n^k] \sim E[B_n^k]$.

2: $\alpha_1 \log n < k < \alpha_2 \log n$: $V[B_n^k] \sim G_3(\log n)\, E[B_n^k]$, where $G_3(\log n)$ is a periodic function.

3: $\alpha_2 \log n < k$: $V[B_n^k] \sim 2E[B_n^k]$.

Limiting Distributions:

Central Limit Theorem: For $\alpha_1 \log n < k < \alpha_3 \log n$:

$$\frac{B_n^k - E[B_n^k]}{\sqrt{V[B_n^k]}} \to N(0, 1),$$

where N(0, 1) is the standard normal distribution.

Poisson Distribution: For $\alpha_3 \log n < k$:

$$P(B_n^k = 2m) = \frac{\lambda_0^m}{m!} e^{-\lambda_0} + o(1), \qquad P(B_n^k = 2m + 1) = o(1),$$

where $\lambda_0 := pq\, n^2 (p^2 + q^2)^{k-1}$.


Consequences

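Height: For large n (cf. Flajolet, 1980, Pittel, 1985, W.S., 1988, Devroye, 1992)

$$H_n = \frac{2}{\log (p^2 + q^2)^{-1}} \log n = \alpha_3 \log n =: k_H \quad \text{(whp)}.$$

Upper Bound: $P(H_n > (1 + \epsilon)k_H) \le P(B_n^k \ge 1) \le E[B_n^k] \to 0$.

Lower Bound:

$$P(H_n < (1 - \epsilon)k_H) \le P\left(B_n^{\lceil (1-\epsilon)k_H \rceil} = 0\right) \le \frac{V\left[B_n^{\lceil (1-\epsilon)k_H \rceil}\right]}{\left(E\left[B_n^{\lceil (1-\epsilon)k_H \rceil}\right]\right)^2} = O\left(\frac{1}{E\left[B_n^{\lceil (1-\epsilon)k_H \rceil}\right]}\right) \to 0.$$

Define: $k_S := \left\lfloor \frac{1}{\log q^{-1}}\left(\log n - \log\log\log n + \log(e \log r)\right) \right\rfloor$.

Shortest Path: For large n (cf. Knessl and W.S., 2005)

$$P(s_n = k_S \text{ or } s_n = k_S + 1) \to 1.$$

Fill-up: For large n (cf. Pittel, 1986, Devroye, 1992, Knessl & W.S., 2005)

$$P(F_n = k_S - 1 \text{ or } F_n = k_S) \to 1.$$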
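For concrete numbers, the two levels $k_H$ and $k_S$ can be evaluated directly. The sketch below transcribes the two definitions with natural logarithms; the function names are hypothetical, and it assumes p > 1/2 (so that r > 1) and n large enough for $\log\log\log n$ to be defined.

```python
from math import e, floor, log

def height_level(n, p):
    """k_H = alpha_3 * log n, the typical height."""
    q = 1.0 - p
    return 2.0 * log(n) / log(1.0 / (p * p + q * q))

def shortest_path_level(n, p):
    """k_S, the level around which shortest path and fill-up concentrate."""
    q = 1.0 - p
    r = p / q
    inner = log(n) - log(log(log(n))) + log(e * log(r))
    return floor(inner / log(1.0 / q))

n, p = 10 ** 6, 0.75
print(height_level(n, p), shortest_path_level(n, p))
```

For a million keys and p = 0.75 the two levels are far apart, which illustrates how unbalanced an asymmetric trie typically is.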


Sketch of the Proof

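1. Recurrence: $E[B_n^k] = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} \left(E[B_i^{k-1}] + E[B_{n-i}^{k-1}]\right)$, $n \ge 2$, $k \ge 1$.

2. Poisson Transform: $E_k(z) = \sum_{n=0}^{\infty} E[B_n^k] \frac{z^n}{n!} e^{-z}$:

$$E_k(z) = E_{k-1}(zp) + E_{k-1}(zq), \qquad k \ge 2.$$

3. Mellin Transform: $E_k^*(s) := \int_0^{\infty} z^{s-1} E_k(z)\, dz = (p^{-s} + q^{-s}) E_{k-1}^*(s)$:

$$E_k^*(s) = (p^{-s} + q^{-s})^{k-1} \cdot s \cdot (p^{-s} + q^{-s} - 1)\, \Gamma(s)$$

for $\Re(s) \in (-2, \infty)$, where $\Gamma(s)$ is the Euler Gamma function.

4. Inverse Mellin Transform: $E_k(z) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} z^{-s} E_k^*(s)\, ds$:

$$E_k(z) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} s\, (p^{-s} + q^{-s} - 1)\, \Gamma(s)\, z^{-s} (p^{-s} + q^{-s})^{k-1}\, ds$$

through the saddle point method (see next slide for more).

5. Depoissonization: From the Poisson transform $E_k(z)$ to $E[B_n^k]$.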
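Step 2 can be checked numerically. The following sketch computes the exact means from the recurrence and compares truncated Poisson transforms on both sides of $E_k(z) = E_{k-1}(zp) + E_{k-1}(zq)$. The helper names, the truncation at 80 terms, and the boundary values ($E[B_n^0] = 1$ only for n = 1, and zero means for $n \le 1$ at deeper levels) are assumptions of the sketch.

```python
from math import comb, exp

def exact_means(n_max, k_max, p):
    """mean[k][n] = E[B_n^k] from the trie recurrence."""
    q = 1.0 - p
    mean = [[1.0 if n == 1 else 0.0 for n in range(n_max + 1)]]
    for _ in range(k_max):
        prev = mean[-1]
        row = [0.0] * (n_max + 1)
        for n in range(2, n_max + 1):
            row[n] = sum(comb(n, i) * p ** i * q ** (n - i) * (prev[i] + prev[n - i])
                         for i in range(n + 1))
        mean.append(row)
    return mean

def poisson_transform(coeffs, z):
    """Truncated sum of coeffs[n] * e^{-z} z^n / n!."""
    total, weight = 0.0, exp(-z)           # weight = e^{-z} z^n / n! at n = 0
    for n, c in enumerate(coeffs):
        total += c * weight
        weight *= z / (n + 1)
    return total

p, z = 0.7, 4.0
mean = exact_means(n_max=80, k_max=3, p=p)
lhs = poisson_transform(mean[3], z)
rhs = poisson_transform(mean[2], z * p) + poisson_transform(mean[2], z * (1 - p))
print(lhs, rhs)
```

At k = 1 the same comparison is off by exactly $z e^{-z}$, because the recurrence does not hold for n = 1 at that level; this is consistent with the restriction $k \ge 2$ on the slide.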


Saddle Point Method: Phase Transitions

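By depoissonization we have $E[B_n^k] \sim E_k(n)$, where, recall,

$$E_k(n) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} g(s)\, \Gamma(s + 1)\, n^{-s} (p^{-s} + q^{-s})^k\, ds = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} g(s)\, \Gamma(s + 1) \exp(h(s)\log n)\, ds, \qquad k = \alpha\log n,$$

where $g(s) = 1 - \frac{1}{p^{-s} + q^{-s}}$ (note $g(-1) = 0$) and $h(s) = \alpha\log(p^{-s} + q^{-s}) - s$. For $k = \alpha\log n$, the factor $n^{-s}(p^{-s} + q^{-s})^k = \exp(h(s)\log n)$ is large.

Saddle Point Method:

Let F(z) be an analytic function with nonnegative coefficients and "fast growth". Then

$$f_n = [z^n]F(z) = \frac{1}{2i\pi} \oint_C F(z)\, \frac{dz}{z^{n+1}} = \frac{1}{2i\pi} \oint_C e^{H(z)}\, dz,$$

where $H(z) := \log F(z) - (n + 1)\log z$.

Let the saddle point $z_0$ be defined by $H'(z_0) = 0$, so that

$$H(z) = H(z_0) + \frac{1}{2}(z - z_0)^2 H''(z_0) + O\left(H'''(z_0)(z - z_0)^3\right).$$

Then

$$[z^n]F(z) = \frac{\exp(H(z_0))}{\sqrt{2\pi |H''(z_0)|}} \cdot \left(1 + O\left(\frac{H'''(z_0)}{(H''(z_0))^{3/2}}\right)\right).$$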
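A standard toy instance of the saddle point formula is $F(z) = e^z$, whose coefficients $1/n!$ are known exactly. The Python sketch below (an illustration chosen for this note, not an example from the talk) evaluates the estimate $\exp(H(z_0))/\sqrt{2\pi |H''(z_0)|}$ in logarithmic form and compares it with $1/n!$.

```python
from math import exp, lgamma, log, pi

def saddle_point_log_estimate(n):
    """Logarithm of the saddle-point estimate of [z^n] e^z."""
    z0 = n + 1.0                        # root of H'(z) = 1 - (n + 1) / z
    h0 = z0 - (n + 1) * log(z0)         # H(z0) with H(z) = z - (n + 1) log z
    h2 = (n + 1) / z0 ** 2              # H''(z0)
    return h0 - 0.5 * log(2.0 * pi * h2)

def ratio_to_exact(n):
    """Estimate divided by the exact coefficient 1 / n!."""
    return exp(saddle_point_log_estimate(n) + lgamma(n + 1))

for n in (5, 20, 200):
    print(n, ratio_to_exact(n))
```

The ratio tends to 1 as n grows, which is Stirling's formula recovered by the saddle point method.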


Infinity of Saddle Points

The saddle point equation is $h'(s) = 0$, where

$$h(s) = \alpha\log(p^{-s} + q^{-s}) - s.$$

It has a unique real root:

$$\rho = \frac{-1}{\log r} \log\left(\frac{\alpha\log q^{-1} - 1}{1 - \alpha\log p^{-1}}\right), \qquad \frac{1}{\log q^{-1}} < \alpha < \frac{1}{\log p^{-1}}.$$

Note $\left|p^{-\rho - it_j} + q^{-\rho - it_j}\right| = p^{-\rho} + q^{-\rho}$ for $t_j = 2\pi j/\log r$, $j \in \mathbb{Z}$.

[Figure: the saddle points $\rho \pm i\, \frac{2\pi j}{\log r}$, $j \ge 1$, aligned on the vertical line $\Re(s) = \rho$.]

Phase Transitions:

1. There are infinitely many saddle points $\rho + it_j$.

2. $\rho \to \infty$ as $\alpha \downarrow 1/\log q^{-1} = \alpha_1$.

3. $\rho \to -\infty$ as $\alpha \uparrow 1/\log p^{-1}$.

4. Saddle points coalesce with poles of the $\Gamma(s + 1)$ function at $s = -2, -3, \ldots$. The pole at $s = -2$ leads to $\alpha_2$.
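The closed form for the real saddle point and the remark about the points $\rho + it_j$ can be verified numerically. In the sketch below, the function names and the test values p = 0.75, α = 1 are choices made for illustration only.

```python
import cmath
from math import log, pi

def rho(alpha, p):
    """Real root of h'(s) = 0 for 1/log(1/q) < alpha < 1/log(1/p)."""
    q = 1.0 - p
    r = p / q
    return -log((alpha * log(1 / q) - 1) / (1 - alpha * log(1 / p))) / log(r)

def h_prime(s, alpha, p):
    """Derivative of h(s) = alpha * log(p^-s + q^-s) - s at a real s."""
    q = 1.0 - p
    ps, qs = p ** (-s), q ** (-s)
    return alpha * (ps * log(1 / p) + qs * log(1 / q)) / (ps + qs) - 1.0

def kernel_modulus(s, p):
    """|p^-s + q^-s| for a complex s."""
    q = 1.0 - p
    return abs(cmath.exp(-s * log(p)) + cmath.exp(-s * log(q)))

p, alpha = 0.75, 1.0
r0 = rho(alpha, p)
t1 = 2 * pi / log(p / (1 - p))
print(r0, h_prime(r0, alpha, p))
print(kernel_modulus(complex(r0, t1), p), kernel_modulus(complex(r0, 0.0), p))
```

Evaluating `rho` at $\alpha = \alpha_2$ returns −2, which matches the statement that the pole at s = −2 is what produces the threshold $\alpha_2$.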


Analytic Depoissonization

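Theorem 1 (Jacquet and W.S., 1998). Let $G(z) = \sum_{n \ge 0} g_n \frac{z^n}{n!} e^{-z}$ be the Poisson transform of $g_n$. In a linear cone $S_\theta$ assume two conditions:

(I) For $z \in S_\theta$ and some reals $B, R > 0$ and $\nu$,

$$|z| > R \ \Rightarrow\ |G(z)| \le B |z|^{\nu}\, \Psi(|z|),$$

where $\Psi(x)$ is a slowly varying function.

(O) For $z \notin S_\theta$ and some $A$ and $\alpha < 1$,

$$|z| > R \ \Rightarrow\ |G(z) e^{z}| \le A \exp(\alpha |z|).$$

Then

$$g_n = G(n) + O\left(n^{\nu - 1}\, \Psi(n)\right).$$

Using the depoissonization theorem, we get $E[B_n^k] = E_k(n) + O\left(\frac{n^{\nu - 1}}{\sqrt{\log n}}\right)$.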
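A toy sequence makes the statement of the theorem concrete. For $g_n = n^2$ the Poisson transform is $G(z) = z^2 + z$, so $g_n - G(n) = -n$, which is $O(n^{\nu - 1})$ with $\nu = 2$. The sketch below checks this numerically; the example sequence and the helper name are assumptions of this illustration, not taken from the talk.

```python
from math import exp

def poisson_transform(g, z, terms=200):
    """Truncated Poisson transform: sum of g(n) e^{-z} z^n / n! for n < terms."""
    total, weight = 0.0, exp(-z)        # weight = e^{-z} z^n / n! at n = 0
    for n in range(terms):
        total += g(n) * weight
        weight *= z / (n + 1)
    return total

def square(n):
    return float(n * n)

n = 10
G_n = poisson_transform(square, float(n))
print(G_n, square(n) - G_n)             # G(n) and the depoissonization error
```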


Outline Update

1. Tries and Suffix Trees

2. Profiles of Tries

3. Profile of Digital Search Trees

4. Applications: Error Resilient Lempel-Ziv’77

M. Drmota and W. Szpankowski,

“The Expected Profile of Digital Search Trees”,

J. Combin. Theory, Ser. A, 118, 1939-1965, 2011.


Digital Search Trees

Figure 3: A digital search tree built on eight strings s1, . . . , s8 (i.e.,

s1 = 0 . . ., s2 = 1 . . ., s3 = 01 . . ., s4 = 11 . . ., etc.) with internal and

external (squares) nodes, and its profiles.


Profile of Digital Search Trees

1. Recurrence: $E[B_{n+1}^{k+1}] = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} \left(E[B_i^k] + E[B_{n-i}^k]\right)$, $n \ge 2$, $k \ge 0$.

2. Poisson Transform: $E_k(z) = \sum_{n=0}^{\infty} E[B_n^k] \frac{z^n}{n!} e^{-z}$:

$$E'_{k+1}(z) + E_{k+1}(z) = E_k(zp) + E_k(zq), \qquad k \ge 2.$$

3. Mellin Transform: $E_k^*(s) := \int_0^{\infty} z^{s-1} E_k(z)\, dz = -\Gamma(s) F_k(s)$:

$$F_{k+1}(s) - F_{k+1}(s - 1) = (p^{-s} + q^{-s}) \cdot F_k(s)$$

for $\Re(s) \in (-k - 1, 0)$, and $F_0(s) = 1$.

4. The Power Series: $f(x, s) = \sum_{k \ge 0} F_k(s)\, x^k$ becomes

$$f(x, s) = \frac{g(x, s)}{g(x, 0)}, \qquad g(x, s) = \frac{h(x, s)}{1 - x(p^{-s} + q^{-s})},$$

where $g(x, s) = 1 + x \sum_{j \ge 0} g(x, s - j)(p^{-s+j} + q^{-s+j})$, and asymptotically

$$F_k(s) \sim A(s)(p^{-s} + q^{-s})^k,$$

where A(s) is analytic with $A(-r) = 0$ for $r = 0, 1, 2, \ldots$.

5. Inverse Mellin Transform by saddle point and depoissonization.

6. Asymptotically, the DST profile behaves similarly to the profile of tries.
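The DST recurrence of step 1 can be iterated in the same way as the trie recurrence. The sketch below makes two assumptions that the slide does not state: it starts from $E[B_0^0] = 1$ (an empty DST is a single external node) and applies the recurrence from n = 0 onward, which is the usual DST model in which the first key occupies the root and the remaining n keys split binomially.

```python
from math import comb

def dst_expected_external_profile(n_max, p):
    """mean[k][n] = E[B_n^k] in a digital search tree on n keys."""
    q = 1.0 - p
    k_max = n_max                 # an external node of a DST on n keys has depth <= n
    mean = [[0.0] * (n_max + 1) for _ in range(k_max + 1)]
    mean[0][0] = 1.0              # empty tree: one external node at the root
    for k in range(k_max):
        for n in range(n_max):
            total = 0.0
            for i in range(n + 1):
                weight = comb(n, i) * p ** i * q ** (n - i)
                total += weight * (mean[k][i] + mean[k][n - i])
            mean[k + 1][n + 1] = total
    return mean

mean = dst_expected_external_profile(n_max=5, p=0.6)
# A DST on n keys has n + 1 external nodes in total.
print(sum(mean[k][5] for k in range(6)))
```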


Main Results

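Theorem 2. Let $E\, I_n^k$ denote the expected internal profile in (asymmetric) digital search trees, $0 < p < q = 1 - p < 1$. The following assertions hold:

1. If $\alpha_1 + \varepsilon \le \frac{k}{\log n} \le \alpha_0 - \varepsilon$, where $\alpha_1 = \frac{1}{\log(1/p)}$, then

$$E\, I_n^k = 2^k - G_3\left(\log p^k n\right) \frac{(p^{-\rho_{n,k}} + q^{-\rho_{n,k}})^k\, n^{-\rho_{n,k}}}{\sqrt{2\pi \beta(\rho_{n,k})\, k}} \left(1 + O\left(k^{-1/2}\right)\right),$$

where $G_3(x)$ is a non-zero periodic function with period 1.

2. If $k = \alpha_0\left(\log n + \xi \sqrt{\alpha_0 \beta(0) \log n}\right)$, where $\xi = o\left((\log n)^{1/6}\right)$, then

$$E\, I_n^k = 2^k\, \Phi(-\xi) \left(1 + O\left(\frac{1 + |\xi|^3}{\sqrt{\log n}}\right)\right),$$

where $\Phi$ is the normal distribution function.

3. If $\alpha_0 + \varepsilon \le \frac{k}{\log n} \le \alpha_4 - \varepsilon$, where $\alpha_4 = \frac{1}{\log(1/q)}$, then

$$E\, I_n^k = G_3\left(\log p^k n\right) \frac{(p^{-\rho_{n,k}} + q^{-\rho_{n,k}})^k\, n^{-\rho_{n,k}}}{\sqrt{2\pi \beta(\rho_{n,k})\, k}} \left(1 + O\left(k^{-1/2}\right)\right),$$

where $\rho_{n,k} = \rho(k/\log n)$.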


Thank You!


Outline Update

1. Tries and Suffix Trees

2. Usefulness of Tries

3. Profiles of Tries

4. Applications: Error Resilient Lempel-Ziv’77 (Suffix Trees)

S. Lonardi and M. Ward, and W. Szpankowski,

“Error Resilient LZ’77 Compression: Algorithms, Analysis, and Experiments”,

IEEE Trans. Information Theory, 53, 1799-1813, 2007.


Error Resilient LZ’77 Scheme

1. The Lempel-Ziv’77 scheme works on-line: it compresses phrases by replacing the longest prefix by the (pointer, length) of its copy.

2. Castelli and Lastras in 2004 proved that a single error in LZ’77 corrupts $O(n^{2/3})$ phrases, thus about $O(n^{2/3} \log n)$ symbols, where n is the size.

3. There are multiple copies of the longest prefix; we denote their number by $M_n$ for a database of length n.

4. By a judicious choice of pointers in the LZ’77 scheme, we can recover $\lfloor \log_2 M_n \rfloor$ bits without losing a bit in compression. Parity bits recovered from the multiple copies are used for the Reed-Solomon channel coding.

[Figure: the history and the current position in the database, with pointers to the copies of the longest prefix labeled 00, 01, 10, and 11.]
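The idea of hiding bits in the choice of pointer can be sketched with a naive quadratic search. This is only an illustration under stated assumptions: the function names are hypothetical, copies are allowed to overlap the current position as in standard LZ'77, and the actual LZRS'77 encoder of the cited paper is not reproduced here.

```python
from math import floor, log2

def longest_match_copies(text, pos):
    """Return (length, starts): the longest prefix of text[pos:] that also
    begins at an earlier position, and every earlier position where it begins."""
    best_len, starts = 0, []
    for i in range(pos):
        length = 0
        while pos + length < len(text) and text[i + length] == text[pos + length]:
            length += 1
        if length > best_len:
            best_len, starts = length, [i]
        elif length == best_len and length > 0:
            starts.append(i)
    return best_len, starts

def embeddable_bits(copies):
    """floor(log2 M) bits can be encoded by the choice among M copies."""
    return floor(log2(copies)) if copies >= 1 else 0

text = "abcabdababz"
length, starts = longest_match_copies(text, 8)
print(length, starts, embeddable_bits(len(starts)))
# With one spare bit b, the encoder would emit the pointer starts[b].
```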


Analysis of Mn Via Suffix Trees

Why does LZRS’77 work so well?

Performance of LZRS’77 depends on Mn. How does Mn typically behave?

Build a suffix tree from the first n suffixes of the database X (i.e., $S_1 = X_1^{\infty}$, $S_2 = X_2^{\infty}$, . . . , $S_n = X_n^{\infty}$). Then insert the (n+1)st suffix, $S_{n+1} = X_{n+1}^{\infty}$.

The depth of insertion of $S_{n+1}$ is the (n+1)st phrase length, and $M_n$ is the size of the subtree that starts at the insertion point of the (n+1)st suffix.

Figure 6: A suffix tree on $S_1, \ldots, S_4$; $M_4$ (= 2) is the size of the subtree at the insertion point of $S_5$.


Analysis of Mn for Independent Tries

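1. Consider digital tries built over n independent strings. The average $E[M_n^I]$ and the probability generating function $E[u^{M_n^I}]$ satisfy the following recurrences (p = 1 − q is the probability of generating a "1"):

[Figure: a trie of size n, with $M_n^I$ at the root, split into subtrees of sizes k and n − k, with $M_k^I$ and $M_{n-k}^I$, chosen with probabilities p and q = 1 − p.]

$$E[M_n^I] = p^n\left(qn + pE[M_n^I]\right) + q^n\left(pn + qE[M_n^I]\right) + \sum_{k=1}^{n-1} \binom{n}{k} p^k q^{n-k} \left(pE[M_k^I] + qE[M_{n-k}^I]\right),$$

$$E[u^{M_n^I}] = p^n\left(qu^n + pE[u^{M_n^I}]\right) + q^n\left(pu^n + qE[u^{M_n^I}]\right) + \sum_{k=1}^{n-1} \binom{n}{k} p^k q^{n-k} \left(pE[u^{M_k^I}] + qE[u^{M_{n-k}^I}]\right).$$

• (Analytic) Poissonization, with $W(z) = \sum_{n \ge 0} E[M_n^I] \frac{z^n}{n!} e^{-z}$:

$$W(z) = qpz\, e^{-qz} + pqz\, e^{-pz} + pW(pz) + qW(qz).$$

• Mellin Transform, with $f^*(s) = \int_0^{\infty} f(x)\, x^{s-1}\, dx$:

$$W^*(s) = \frac{\Gamma(s + 1)\left(pq^{-s} + qp^{-s}\right)}{1 - p^{-s+1} - q^{-s+1}}.$$

• Inverse Mellin Transform: W(z) = 1/h + fluctuations.

• (Analytic) Depoissonization: $E[M_n^I] = 1/h$ + fluctuations.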
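The claim that the mean settles near 1/h can be observed by iterating the recurrence for the mean. Since $E[M_n^I]$ appears on both sides, the sketch below first moves those terms to the left, giving the factor $1 - p^{n+1} - q^{n+1}$. The function name is hypothetical and this is only a numerical illustration.

```python
from math import comb, log

def mean_copies(n_max, p):
    """m[n] = E[M_n^I] for 1 <= n <= n_max (m[0] is unused)."""
    q = 1.0 - p
    m = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        rhs = p ** n * q * n + q ** n * p * n
        for k in range(1, n):
            weight = comb(n, k) * p ** k * q ** (n - k)
            rhs += weight * (p * m[k] + q * m[n - k])
        # E[M_n] also occurs on the right with weight p^(n+1) + q^(n+1).
        m[n] = rhs / (1.0 - p ** (n + 1) - q ** (n + 1))
    return m

p = 0.5
m = mean_copies(100, p)
h = -p * log(p) - (1 - p) * log(1 - p)
print(m[1], m[2], m[100], 1.0 / h)
```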


Analysis of Mn for Dependent Strings

2. Suffix Trees: Using analytic combinatorics on words we prove that

$$M(z, u) = \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} P(M_n = k)\, u^k z^n = \sum_{\substack{w \in \mathcal{A}^* \\ \alpha \in \mathcal{A}}} \frac{u\, P(\beta) P(w)}{D_w(z)} \cdot \frac{D_{w\alpha}(z) - (1 - z)}{D_w(z) - u\left(D_{w\alpha}(z) - (1 - z)\right)},$$

where $D_w(z) = (1 - z) S_w(z) + z^m P(w)$ and $S_w(z)$ is the autocorrelation polynomial:

$$S_w(z) = \sum_{k \in \mathcal{P}(w)} P\left(w_{k+1}^m\right) z^{m-k}.$$

Here $\mathcal{P}(w)$ denotes the set of positions k of w satisfying $w_1 \ldots w_k = w_{m-k+1} \ldots w_m$.

For any ε > 0 there exists β > 1 such that (all hard analytic work is here!)

$$\left|\Pr(M_n = k) - \Pr(M_n^I = k)\right| = O\left(n^{-\varepsilon} \beta^{-k}\right).$$

Random suffix trees resemble random independent tries (cf. P. Jacquet, W.S., 1994, Lonardi, W.S., Ward, 2005: IEEE Trans. Inf. Th., 2007).
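The autocorrelation polynomial $S_w(z)$ and the denominator $D_w(z)$ are easy to compute for a concrete word. The sketch below assumes a binary memoryless source with a hypothetical parameter `p0` for the probability of the symbol "0"; the function names are likewise assumptions of the sketch.

```python
def autocorrelation_polynomial(w, p0):
    """coeffs[d] is the coefficient of z^d in S_w(z)."""
    m = len(w)
    prob = {"0": p0, "1": 1.0 - p0}
    coeffs = [0.0] * m
    for k in range(1, m + 1):
        if w[:k] == w[m - k:]:          # k belongs to P(w): prefix equals suffix
            tail = 1.0
            for symbol in w[k:]:        # P(w_{k+1} ... w_m)
                tail *= prob[symbol]
            coeffs[m - k] += tail
    return coeffs

def d_w(w, p0, z):
    """D_w(z) = (1 - z) S_w(z) + z^m P(w)."""
    prob = {"0": p0, "1": 1.0 - p0}
    p_w = 1.0
    for symbol in w:
        p_w *= prob[symbol]
    s_w = sum(c * z ** d for d, c in enumerate(autocorrelation_polynomial(w, p0)))
    return (1.0 - z) * s_w + z ** len(w) * p_w

print(autocorrelation_polynomial("101", 0.5))
print(d_w("101", 0.5, 1.0))
```

For w = 101 the overlaps are k = 1 and k = 3, so $S_w(z) = 1 + P(01) z^2$.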


Main Results

Theorem 4 (Ward, W.S., 2005). Let $z_k = \frac{2kr\pi i}{\ln p}$ for all $k \in \mathbb{Z}$, where $\frac{\ln p}{\ln q} = \frac{r}{s}$ for some relatively prime integers r, s (i.e., $\frac{\ln p}{\ln q}$ is rational).

The jth factorial moment $E[(M_n)_j] = E[M_n(M_n - 1) \cdots (M_n - j + 1)]$ is

$$E[(M_n)_j] = \Gamma(j)\, \frac{q(p/q)^j + p(q/p)^j}{h} + \delta_j(\log_{1/p} n) + O(n^{-\eta}),$$

where $h = -p\log p - q\log q$ is the entropy rate, η > 0, Γ is the Euler gamma function, and

$$\delta_j(t) = \sum_{k \neq 0} -\frac{e^{2kr\pi i t}\, \Gamma(z_k + j)\left(p^j q^{-z_k - j + 1} + q^j p^{-z_k - j + 1}\right)}{p^{-z_k + 1} \ln p + q^{-z_k + 1} \ln q}.$$

$\delta_j$ is a periodic function that has a small magnitude and exhibits fluctuation when $\frac{\ln p}{\ln q}$ is rational.

Note: On average there are $E[M_n] \sim 1/h$ additional pointers.

   j     $\frac{1}{\ln 2} \sum_{k \neq 0} \left|\Gamma\left(j - \frac{2ki\pi}{\ln 2}\right)\right|$
   1     1.4260 × 10^-5
   3     1.2072 × 10^-3
   5     1.1421 × 10^-1
   6     1.1823 × 10^0
   8     1.4721 × 10^2
   9     1.7798 × 10^3
  10     2.2737 × 10^4
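The entries of the table can be reproduced without a complex gamma routine. The sketch below uses the classical identity $|\Gamma(1 + iy)|^2 = \pi y / \sinh(\pi y)$ together with $|\Gamma(m + 1 + iy)| = |m + iy|\, |\Gamma(m + iy)|$; the function name and the truncation to ten terms are assumptions of the sketch (later terms are negligibly small).

```python
from math import log, pi, sinh, sqrt

def fluctuation_bound(j, terms=10):
    """(1 / ln 2) * sum over k != 0 of |Gamma(j - 2 k i pi / ln 2)|."""
    total = 0.0
    for k in range(1, terms + 1):
        y = 2.0 * pi * k / log(2.0)
        modulus = sqrt(pi * y / sinh(pi * y))      # |Gamma(1 + iy)|
        for m in range(1, j):
            modulus *= sqrt(m * m + y * y)         # step up to |Gamma(m + 1 + iy)|
        total += 2.0 * modulus                     # k and -k give equal moduli
    return total / log(2.0)

for j in (1, 3, 5):
    print(j, fluctuation_bound(j))
```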


Distribution of Mn

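Theorem 5 (Ward, W.S., 2005). Let $z_k = \frac{2kr\pi i}{\ln p}$ for all $k \in \mathbb{Z}$, where $\frac{\ln p}{\ln q} = \frac{r}{s}$. Then

$$P(M_n = j) = \frac{p^j q + q^j p}{jh} + \sum_{k \neq 0} -\frac{e^{2kr\pi i \log_{1/p} n}\, \Gamma(z_k)\left(p^j q + q^j p\right)(z_k)^{\overline{j}}}{j!\left(p^{-z_k + 1} \ln p + q^{-z_k + 1} \ln q\right)} + O(n^{-\eta}),$$

where η > 0, and Γ is the Euler gamma function.

Therefore, $M_n$ follows the logarithmic series distribution with mean 1/h (plus some fluctuations).

The logarithmic series distribution $\frac{p^j q + q^j p}{jh}$ is well concentrated around its mean $E M_n \approx 1/h$.

[Figure: plot of the logarithmic series distribution for x from 2 to 8, with values between 0 and 0.8.]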
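The leading term is a proper probability distribution with mean 1/h, which can be confirmed numerically. The helper names and the truncation at 400 terms are assumptions of this short sketch.

```python
from math import log

def entropy(p):
    """h = -p log p - q log q (natural logarithm)."""
    q = 1.0 - p
    return -p * log(p) - q * log(q)

def log_series_pmf(j, p):
    """Leading term of P(M_n = j): (p^j q + q^j p) / (j h)."""
    q = 1.0 - p
    return (p ** j * q + q ** j * p) / (j * entropy(p))

p = 0.7
total = sum(log_series_pmf(j, p) for j in range(1, 401))
mean = sum(j * log_series_pmf(j, p) for j in range(1, 401))
print(total, mean, 1.0 / entropy(p))
```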


Analytic Information Theory and Analysis of Algorithms

• In the 1997 Shannon Lecture Jacob Ziv presented compelling

arguments for “backing off” from first-order asymptotics in order to

predict the behavior of real systems with finite length description.

• To overcome these difficulties we propose replacing first-order analyses

by full asymptotic expansions and more accurate analyses (e.g., large

deviations, central limit laws).

• Following Hadamard’s precept¹, we study information theory problems

using techniques of complex analysis such as generating functions,

combinatorial calculus, Rice’s formula, Mellin transform, Fourier series,

sequences distributed modulo 1, saddle point methods, analytic

poissonization and depoissonization, and singularity analysis.

• This program, which applies complex-analytic tools of analysis of

algorithms to information theory, constitutes analytic information

theory.

¹ The shortest path between two truths on the real line passes through the complex plane.


Thank You!