Sketching and Streaming Entropy via Approximation Theory


  • Sketching and Streaming Entropy via Approximation Theory
    Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)

  • Streaming Model
    A stream of m updates ("increment xᵢ") implicitly defines a vector x ∈ ℤⁿ, starting from x = (0, 0, 0, 0, …, 0).
    Goal: compute statistics of x, e.g. ||x||₁, ||x||₂.
    Trivial solution: store x (or store all updates), O(n log m) space.
    Goal: compute using O(polylog(nm)) space.
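A minimal Python model of this setting (the function name and toy stream are ours), showing the trivial store-x-explicitly baseline the slide contrasts against:

```python
def process_stream(n, updates):
    """Apply a stream of 'increment x_i' updates to an explicit vector x."""
    x = [0] * n
    for i in updates:                         # each update: "increment x_i"
        x[i] += 1
    return x

x = process_stream(5, [0, 3, 0, 3, 3])        # m = 5 updates
l1 = sum(abs(v) for v in x)                   # ||x||_1 = 5
l2 = sum(v * v for v in x) ** 0.5             # ||x||_2 = sqrt(13)
print(x, l1, l2)                              # [2, 0, 0, 3, 0] 5 3.605...
```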

  • Streaming Algorithms (a very brief introduction)
    Fact [Alon-Matias-Szegedy 99], [Bar-Yossef et al. 02], [Indyk-Woodruff 05], [Bhuvanagiri et al. 06], [Indyk 06], [Li 08], [Li 09]:
    Can compute F̃p = (1±ε)Fp using
      O(ε⁻² log^c n) bits of space (if 0 ≤ p ≤ 2)
      O(ε^{-O(1)} · n^{1-2/p} · log^{O(1)} n) bits (if p > 2)
  • Practical Motivation
    General goal: dealing with massive data sets (internet traffic, large databases, …).
    Network monitoring & anomaly detection: the stream consists of internet packets, and xᵢ = # packets sent to port i.
    Under typical conditions x is very concentrated; under a port-scan attack, x is less concentrated.
    Can detect the attack by estimating the empirical entropy [Lakhina et al. 05], [Xu et al. 05], [Zhao et al. 07].

  • Entropy
    Probability distribution a = (a₁, a₂, …, aₙ)
    Entropy: H(a) = -Σᵢ aᵢ lg(aᵢ)
    Examples:
      a = (1/n, 1/n, …, 1/n): H(a) = lg(n)
      a = (0, …, 0, 1, 0, …, 0): H(a) = 0
    Entropy is small when a is concentrated, LARGE when it is not.
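The definition and both examples as a small Python check (the helper name is ours):

```python
import math

def shannon_entropy(a):
    """H(a) = -sum_i a_i * lg(a_i), with the convention 0 * lg 0 = 0."""
    return -sum(p * math.log2(p) for p in a if p > 0)

n = 8
print(shannon_entropy([1.0 / n] * n))             # lg(n) = 3.0 (spread out)
print(shannon_entropy([0, 0, 1, 0, 0, 0, 0, 0]))  # 0.0 (fully concentrated)
```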

  • Streaming Algorithms for Entropy
    How much space to estimate H(x)?
    [Guha-McGregor-Venkatasubramanian 06], [Chakrabarti-Do Ba-Muthu 06], [Bhuvanagiri-Ganguly 06]
    [Chakrabarti-Cormode-McGregor 07]:
      multiplicative (1±ε) approximation: O(ε⁻² log² m) bits
      additive ε approximation: O(ε⁻² log⁴ m) bits
      Ω(ε⁻²) lower bound for both

    Our contributions:
      Additive ε or multiplicative (1±ε) approximation in Õ(ε⁻² log³ m) bits, and can handle deletions.
      Can sketch entropy in the same space.

  • First Idea
    If you can estimate Fp for p ≈ 1, then you can estimate H(x).
    Why? Rényi entropy.

  • Review of Rényi Entropy
    Definition: Hp(x) = lg(Σᵢ xᵢᵖ) / (1 - p), for a distribution x and p ≠ 1.
    Convergence to Shannon: Hp(x) → H(x) as p → 1.
    [Plot: Hp(x) vs. p (ticks at 0, 1, 2), converging to Shannon entropy at p = 1; portraits of Alfréd Rényi and Claude Shannon]
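A quick numerical illustration of the convergence, assuming the standard Rényi definition above (the helper name is ours):

```python
import math

def renyi_entropy(a, p):
    """H_p(a) = lg(sum_i a_i^p) / (1 - p) for a distribution a, p != 1."""
    return math.log2(sum(q ** p for q in a if q > 0)) / (1 - p)

a = [0.5, 0.25, 0.125, 0.125]                  # Shannon entropy H(a) = 1.75
for p in [2.0, 1.5, 1.1, 1.01, 1.001]:
    print(p, renyi_entropy(a, p))              # approaches 1.75 as p -> 1
```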

  • Overview of Algorithm
    Set p = 1.01 and let x̃ = x / ||x||₁.
    Compute F̃p = (1±ε) Σᵢ x̃ᵢᵖ (using Li's compressed counting).
    Set H̃ = lg(F̃p) / (1 - p).
    Analysis: H̃ ≈ Hp(x̃) ≈ H(x̃).
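A plain Python transcription of this single-point estimator, with exact moments standing in for Li's compressed-counting sketch (so it shows only the arithmetic, not the space savings; names are ours):

```python
import math

def entropy_via_renyi(x, p=1.01):
    """Single-point estimator: H~ = lg(F_p / F_1^p) / (1 - p) = H_p(x / ||x||_1).
    Exact moments stand in here for Li's compressed-counting (1±eps) sketch."""
    f1 = sum(x)                       # F_1 = ||x||_1
    fp = sum(v ** p for v in x if v)  # F_p = sum_i x_i^p
    return math.log2(fp / f1 ** p) / (1 - p)

x = [10, 5, 3, 1, 1]
h = -sum((v / 20) * math.log2(v / 20) for v in x)  # exact H(x~), ||x||_1 = 20
print(entropy_via_renyi(x), h)                     # close, since p is near 1
```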

  • Making the Tradeoff
    How quickly does Hp(x) converge to H(x)?
    Theorem: Let x be a distribution with minᵢ xᵢ ≥ 1/m. Taking p close enough to 1 (as a function of ε and log m) makes Hp(x) a multiplicative (1±ε) approximation of H(x), and a weaker closeness requirement already suffices for an additive ε approximation.
    Plugging in: O(ε⁻³ log⁴ m) bits of space suffice for an additive ε approximation.
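An empirical look at the convergence rate the theorem quantifies (the test distribution is ours): the gap |Hp(a) - H(a)| shrinks roughly linearly in p - 1, and smaller p - 1 costs more sketching space.

```python
import math

def shannon(a):
    return -sum(p * math.log2(p) for p in a if p > 0)

def renyi(a, p):
    return math.log2(sum(q ** p for q in a if q > 0)) / (1 - p)

a = [0.6, 0.2, 0.1, 0.05, 0.05]
for p in [1.1, 1.01, 1.001, 1.0001]:
    print(p - 1, abs(renyi(a, p) - shannon(a)))   # gap ~ C * (p - 1)
```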

  • Proof: A Trick Worth Remembering
    Let f : ℝ → ℝ and g : ℝ → ℝ be differentiable functions with f(x), g(x) → 0. L'Hôpital's rule says that lim f(x)/g(x) = lim f′(x)/g′(x).
    It actually says more! It says f(x)/g(x) converges to that limit at least as fast as f′(x)/g′(x) does.
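For concreteness, here is the L'Hôpital computation behind the Rényi-to-Shannon limit (a standard derivation, supplied here rather than taken from the slides):

```latex
% H_p(a) = lg(sum_i a_i^p) / (1 - p) is a 0/0 form at p = 1, since sum_i a_i = 1.
\[
  \lim_{p \to 1} H_p(a)
  = \lim_{p \to 1} \frac{\lg\!\big(\sum_i a_i^p\big)}{1 - p}
  = \lim_{p \to 1} \frac{\sum_i a_i^p \ln a_i}{-(\ln 2)\sum_i a_i^p}
  = -\sum_i a_i \lg a_i
  = H(a).
\]
```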

  • Improvements
    Status: additive ε approximation using O(ε⁻³ log⁴ m) bits.
    How to reduce space further? Interpolate with multiple points: Hp₁(x), Hp₂(x), …

  • Analyzing Interpolation
    Let f(z) be a C^{k+1} function. Interpolate f with the polynomial q satisfying q(zᵢ) = f(zᵢ), 0 ≤ i ≤ k.
    Fact: f(y) - q(y) = f^{(k+1)}(ζ) · Πᵢ (y - zᵢ) / (k+1)!, where y, ζ, zᵢ ∈ [a, b].
    Our case: set f(z) = H_{1+z}(x). Goal: analyze f^{(k+1)}(z).
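A numerical check of the error fact on a stand-in function (f = exp on [0, 1], so |f^{(k+1)}| ≤ e; the whole setup is ours):

```python
import math

def interpolate(zs, fs, y):
    """Evaluate the Lagrange interpolant through (z_i, f_i) at y."""
    total = 0.0
    for i, zi in enumerate(zs):
        w = fs[i]
        for j, zj in enumerate(zs):
            if j != i:
                w *= (y - zj) / (zi - zj)
        total += w
    return total

k = 4
zs = [i / k for i in range(k + 1)]            # equispaced nodes in [0, 1]
fs = [math.exp(z) for z in zs]
y = 0.37
err = abs(math.exp(y) - interpolate(zs, fs, y))
bound = math.e * math.prod(abs(y - z) for z in zs) / math.factorial(k + 1)
print(err, bound, err <= bound)               # the error respects the bound
```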

  • Bounding Derivatives
    Rényi derivatives are messy to analyze. Switch to Tsallis entropy: f(z) = S_{1+z}(x), where S_q(x) = (1 - Σᵢ xᵢ^q) / (q - 1).
    Can prove Tsallis also converges to Shannon as q → 1.
    Fact: the resulting derivative bounds on [a, b] (with a = -Θ(1/(k log m)) and b = 0) let us set k = log(1/ε) + log log m.
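Tsallis entropy in Python, assuming the natural-log normalization (so the q → 1 limit is Shannon entropy in nats, i.e. H(a)·ln 2; the base convention is our choice):

```python
import math

def tsallis_entropy(a, q):
    """S_q(a) = (1 - sum_i a_i^q) / (q - 1); converges to Shannon entropy
    in nats as q -> 1."""
    return (1 - sum(p ** q for p in a if p > 0)) / (q - 1)

a = [0.5, 0.25, 0.125, 0.125]          # H(a) = 1.75 bits = 1.2130... nats
for q in [1.1, 1.01, 1.001]:
    print(q, tsallis_entropy(a, q))    # approaches 1.2130...
```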

  • Key Ingredient: Noisy Interpolation
    We don't have f(zᵢ); we have f̃(zᵢ) = f(zᵢ) ± ε.
    How to interpolate in the presence of noise?
    Idea: we pick our zᵢ very carefully.

  • Chebyshev Polynomials
    Rogosinski's Theorem: if q(x) has degree k and |q(ξⱼ)| ≤ 1 at the Chebyshev extremal points ξⱼ = cos(jπ/k), 0 ≤ j ≤ k, then |q(x)| ≤ |Tk(x)| for |x| > 1.
    Map [-1, 1] onto the interpolation interval [z₀, z_k]; choose zⱼ to be the image of ξⱼ, j = 0, …, k.
    Let q(z) interpolate f(zⱼ) and q̃(z) interpolate f̃(zⱼ).
    Then r(z) = (q(z) - q̃(z)) / ε satisfies Rogosinski's conditions!
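A sketch of the whole key ingredient on a stand-in function: interpolate exact and ε-noisy values at Chebyshev points mapped into [z₀, z_k], then check the difference at 0 against ε·|Tk(preimage(0))| (all parameters here are illustrative):

```python
import math
import random

def cheb_T(k, x):
    """T_k(x) via the three-term recurrence T_{j+1} = 2x*T_j - T_{j-1}."""
    t0, t1 = 1.0, x
    if k == 0:
        return t0
    for _ in range(k - 1):
        t0, t1 = t1, 2 * x * t1 - t0
    return t1

def interpolate(zs, fs, y):
    total = 0.0
    for i, zi in enumerate(zs):
        w = fs[i]
        for j, zj in enumerate(zs):
            if j != i:
                w *= (y - zj) / (zi - zj)
        total += w
    return total

k, eps = 6, 1e-3
z0, zk = -0.2, -0.01                   # interpolation interval, left of 0
# z_j = image of the Chebyshev points cos(j*pi/k) under [-1, 1] -> [z0, zk]
zs = [z0 + (zk - z0) * (math.cos(j * math.pi / k) + 1) / 2 for j in range(k + 1)]
f = math.exp                           # smooth stand-in for S_{1+z}(x)
exact = [f(z) for z in zs]
noisy = [v + random.uniform(-eps, eps) for v in exact]
# Rogosinski: the two interpolants differ at y = 0 by at most eps * |T_k(t)|,
# where t is the preimage of 0 under the affine map (|t| > 1 here).
t = 2 * (0.0 - z0) / (zk - z0) - 1
diff = abs(interpolate(zs, noisy, 0.0) - interpolate(zs, exact, 0.0))
print(diff, eps * abs(cheb_T(k, t)))   # diff stays within the bound
```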

  • Tradeoff in Choosing z_k
    z_k close to 0 ⇒ |Tk(preimage(0))| is still small, but z_k close to 0 ⇒ high space complexity.
    Just how close do we need 0 and z_k to be? Tk grows quickly once it leaves [z₀, z_k].
    [Figure: the interval [z₀, z_k] sitting just to the left of 0 on the real line]

  • The Magic of Chebyshev
    [Paturi 92]: Tk(1 + 1/k^c) ≤ e^{4k^{1-c/2}}. Set c = 2, so Tk(1 + 1/k²) ≤ e⁴ = O(1).
    Suffices to set z_k = -Θ(1/(k³ log m)).
    Translates to Õ(ε⁻² log³ m) space.
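A quick check of Paturi's bound at c = 2: the constant e⁴, uniform in k, is what makes evaluating just outside the interval affordable.

```python
import math

def cheb_T(k, x):
    t0, t1 = 1.0, x
    if k == 0:
        return t0
    for _ in range(k - 1):
        t0, t1 = t1, 2 * x * t1 - t0
    return t1

# T_k(1 + 1/k^2) stays bounded by e^4 for every k.
for k in [4, 16, 64, 256]:
    print(k, cheb_T(k, 1 + 1 / k**2), "<=", math.exp(4))
```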

  • The Final Algorithm (additive ε approximation)
    Set k = lg(1/ε) + lg lg m and zⱼ = (k² cos(jπ/k) - (k² + 1)) / (9k³ lg m), for 0 ≤ j ≤ k.
    Estimate S̃_{1+zⱼ} = (1 - F̃_{1+zⱼ} / (F̃₁)^{1+zⱼ}) / zⱼ for 0 ≤ j ≤ k.
    Interpolate the degree-k polynomial q̃ with q̃(zⱼ) = S̃_{1+zⱼ}.
    Output q̃(0).
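A non-sketching transcription of these steps, with exact moments F_p in place of the (1±ε) sketched estimates, to show the interpolation scaffolding (the eps-to-k rounding and the natural-log output convention are our assumptions):

```python
import math

def interpolate(zs, fs, y):
    total = 0.0
    for i, zi in enumerate(zs):
        w = fs[i]
        for j, zj in enumerate(zs):
            if j != i:
                w *= (y - zj) / (zi - zj)
        total += w
    return total

def entropy_estimate(x, eps=0.01):
    """Final algorithm with exact moments; output is Tsallis at z = 0,
    i.e. Shannon entropy in nats."""
    m = sum(x)
    k = max(2, math.ceil(math.log2(1 / eps) + math.log2(math.log2(m))))
    # z_j = (k^2 cos(j*pi/k) - (k^2 + 1)) / (9 k^3 lg m), all negative
    zs = [(k**2 * math.cos(j * math.pi / k) - (k**2 + 1)) / (9 * k**3 * math.log2(m))
          for j in range(k + 1)]
    f1 = float(m)
    # S~_{1+z} = (1 - F_{1+z} / F_1^{1+z}) / z
    ss = [(1 - sum(v ** (1 + z) for v in x if v) / f1 ** (1 + z)) / z for z in zs]
    return interpolate(zs, ss, 0.0)

x = [10, 5, 3, 1, 1]
h = -sum((v / 20) * math.log(v / 20) for v in x)    # exact Shannon (nats)
print(entropy_estimate(x), h)                       # the two should agree closely
```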

  • Multiplicative Approximation
    How to get a multiplicative approximation? An additive ε approximation is already multiplicative unless H(x) is small, and H(x) is small only when x is concentrated on one coordinate [CCM 07].
    Let i* be the index of the largest xᵢ and define the residual moments RFp = Σ_{i ≠ i*} xᵢᵖ.
    We combine (1±ε)RF₁ and (1±ε)RF_{1+zⱼ} to get (1±ε)f(zⱼ).
    Question: how do we get (1±ε)RFp? Two different approaches:
      a general approach (for any p, and negative frequencies)
      an approach exploiting p ≈ 1, only for nonnegative frequencies (better by a log m factor)
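A minimal reading of the residual moments in code (the definition RFp = Σ_{i≠i*} xᵢᵖ is our interpretation of the elided formula, consistent with "residual moments" later in the deck):

```python
def residual_moment(x, p):
    """RF_p = sum over i != i* of x_i^p, where i* maximizes x_i."""
    i_star = max(range(len(x)), key=lambda i: x[i])
    return sum(v ** p for i, v in enumerate(x) if i != i_star and v)

print(residual_moment([100, 2, 1, 1], 1.01))  # the dominant coordinate is dropped
```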

  • Questions / Thoughts
    For what other problems can we use this generalize-then-interpolate strategy? Some non-streaming problems too?
    The power of moments?
    The power of residual moments? CountMin [CM 05] + CountSketch [CCF 02], HSS (Ganguly et al.)
    WANTED: faster moment estimation (some progress in [Cormode-Ganguly 07])
