Sketching and Streaming Entropy via Approximation Theory

Nick Harvey (MSR / Waterloo)
Jelani Nelson (MIT)
Krzysztof Onak (MIT)
Streaming Model

- A vector x ∈ R^n undergoes m updates of the form "increment x_i" (e.g. "increment x_1", "increment x_4").
- Goal: compute statistics of x, e.g. ||x||_1, ||x||_2.
- Trivial solution: store x (or store all updates): O(n log(m)) space.
- Goal: compute using O(polylog(nm)) space.
Streaming Algorithms (a very brief introduction)

Fact [Alon-Matias-Szegedy 99], [Bar-Yossef et al. 02], [Indyk-Woodruff 05], [Bhuvanagiri et al. 06], [Indyk 06], [Li 08], [Li 09]: one can compute (1±ε)F_p, where F_p = Σ_i |x_i|^p, using
- O(ε^{-2} log^c n) bits of space (if 0 ≤ p ≤ 2)
- O(ε^{-O(1)} n^{1-2/p} log^{O(1)}(n)) bits (if p > 2)
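For intuition about what such moment sketches look like, here is a toy illustration of the [Alon-Matias-Szegedy 99] F_2 estimator. The function name `ams_f2_estimate` and the fresh-random-signs simplification are mine; the real algorithm uses 4-wise independent hash functions and a median of means, and processes the stream once.

```python
import random

def ams_f2_estimate(stream, n, reps=400, seed=0):
    """Toy AMS-style F_2 estimator.  For each repetition draw a random sign
    vector s in {-1,+1}^n, maintain acc = sum_i s_i * x_i while processing
    the stream of increments, and average acc^2 over repetitions;
    E[acc^2] = F_2 = sum_i x_i^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        s = [rng.choice((-1, 1)) for _ in range(n)]
        acc = 0
        for i in stream:          # each stream element means "increment x_i"
            acc += s[i]
        total += acc * acc
    return total / reps

stream = [0, 1, 1, 2, 2, 2, 2, 3]     # x = (1, 2, 4, 1), so F_2 = 22
```

Averaging over `reps` independent repetitions drives the estimator's variance down to O(F_2^2 / reps).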
Practical Motivation

General goal: dealing with massive data sets (internet traffic, large databases, ...)
Network monitoring & anomaly detection:
- The stream consists of internet packets; x_i = # packets sent to port i.
- Under typical conditions, x is very concentrated.
- Under a port scan attack, x is less concentrated.
- Can detect attacks by estimating the empirical entropy [Lakhina et al. 05], [Xu et al. 05], [Zhao et al. 07].
Entropy

Probability distribution a = (a_1, a_2, ..., a_n)
Entropy: H(a) = -Σ_i a_i lg(a_i)
Examples:
- a = (1/n, 1/n, ..., 1/n): H(a) = lg(n)
- a = (0, ..., 0, 1, 0, ..., 0): H(a) = 0
Entropy is small when the distribution is concentrated, LARGE when it is not.
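The two examples can be checked directly; a minimal helper (the function name is mine, with the usual convention 0·lg 0 = 0):

```python
import math

def entropy(a):
    """Shannon entropy H(a) = -sum_i a_i * lg(a_i), with 0 * lg(0) = 0."""
    return -sum(p * math.log2(p) for p in a if p > 0)

n = 8
uniform = [1.0 / n] * n          # H = lg(n) = 3 bits
point = [0.0] * (n - 1) + [1.0]  # H = 0 bits
```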
Streaming Algorithms for Entropy

How much space to estimate H(x)?
[Guha-McGregor-Venkatasubramanian 06], [Chakrabarti-Do Ba-Muthu 06], [Bhuvanagiri-Ganguly 06]
[Chakrabarti-Cormode-McGregor 07]:
- multiplicative (1±ε) approximation: O(ε^{-2} log^2 m) bits
- additive ε approximation: O(ε^{-2} log^4 m) bits
- Ω(ε^{-2}) lower bound for both
Our contributions:
- Additive or multiplicative (1±ε) approximation
- Õ(ε^{-2} log^3 m) bits of space, and can handle deletions
- Can sketch entropy in the same space
First Idea

If you can estimate F_p for p ≈ 1, then you can estimate H(x).
Why? Rényi entropy.
Review of Rényi Entropy

Definition: H_p(x) = (1/(1-p)) · lg(Σ_i x_i^p)
Convergence to Shannon: H_p(x) → H(x) as p → 1.
[Photos: Alfréd Rényi, Claude Shannon]
Overview of Algorithm

Set p = 1.01 and let x̃ = x/||x||_1 (the empirical distribution).
So H_p(x̃) = (1/(1-p)) · lg(Σ_i (x_i/||x||_1)^p) = (1/(1-p)) · lg(F_p / F_1^p), which can be estimated from (1±ε)-approximations of F_p and F_1 (using Li's compressed counting).
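A small offline illustration of the identity H_p(x̃) = lg(F_p / F_1^p) / (1-p), with exact moments standing in for the (1±ε) estimates that compressed counting would supply in a stream (helper names are mine):

```python
import math

def renyi_from_moments(x, p):
    """Renyi entropy H_p of the normalized vector x/||x||_1, computed
    from the frequency moments F_p = sum_i x_i^p and F_1 = sum_i x_i."""
    F1 = sum(x)
    Fp = sum(xi ** p for xi in x)
    return math.log2(Fp / F1 ** p) / (1 - p)

def shannon(x):
    """Shannon entropy (in bits) of the normalized vector x/||x||_1."""
    F1 = sum(x)
    return -sum((xi / F1) * math.log2(xi / F1) for xi in x if xi > 0)

x = [5, 3, 3, 2, 1, 1]
# at p = 1.01, H_p of the empirical distribution is already close to H
```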
Making the Tradeoff

How quickly does H_p(x) converge to H(x) as p → 1?
Theorem: Let x be a distribution with min_i x_i ≥ 1/m. Then, for p sufficiently close to 1 (as a function of ε and log m), H_p(x) is an additive ε-approximation (resp. a multiplicative (1±ε)-approximation) to H(x).
Plugging in: O(ε^{-3} log^4 m) bits of space suffice for an additive approximation, with a comparable bound for a multiplicative approximation.
Proof: A Trick Worth Remembering

Let f : R → R and g : R → R be such that lim f(x) = lim g(x) = 0. l'Hôpital's rule says that lim f(x)/g(x) = lim f'(x)/g'(x). It actually says more! It says that f(x)/g(x) converges to the limit at least as fast as f'(x)/g'(x) does.
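Applied to Rényi entropy, the rule itself already gives the convergence stated earlier: as p → 1, both lg(Σ_i x_i^p) and 1 - p tend to 0 (for a distribution x), and differentiating in p gives

```latex
\lim_{p \to 1} H_p(x)
  = \lim_{p \to 1} \frac{\lg \sum_i x_i^p}{1 - p}
  = \lim_{p \to 1} \frac{\tfrac{d}{dp}\, \lg \sum_i x_i^p}{\tfrac{d}{dp}\,(1 - p)}
  = \lim_{p \to 1} \frac{\sum_i x_i^p \ln x_i}{-\ln 2 \cdot \sum_i x_i^p}
  = -\sum_i x_i \lg x_i
  = H(x).
```

The stronger form of the rule is what controls *how fast* H_p approaches H.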
Improvements

Status: additive ε-approximation using O(ε^{-3} log^4 m) bits. How to reduce space further?
Idea: interpolate with multiple points: H_{p_1}(x), H_{p_2}(x), ...
Let f(z) be a C^{k+1} function, and interpolate f with the polynomial q satisfying q(z_i) = f(z_i), 0 ≤ i ≤ k.

Fact: f(y) - q(y) = (f^{(k+1)}(ξ) / (k+1)!) · Π_{i=0}^{k} (y - z_i) for some ξ ∈ [a,b], where y, z_i ∈ [a,b].

Our case: set f(z) = H_{1+z}(x). Goal: analyze f^{(k+1)}(z).
Bounding Derivatives

The derivatives of Rényi entropy are messy to analyze. Switch to Tsallis entropy: f(z) = S_{1+z}(x), where S_q(x) = (1 - Σ_i x_i^q) / (q - 1). Can prove Tsallis also converges to Shannon as q → 1.
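A quick numeric check that Tsallis entropy S_q(a) = (1 - Σ_i a_i^q)/(q - 1) converges to Shannon entropy as q → 1; note the limit is entropy measured in nats, i.e. with the natural logarithm (helper names are mine):

```python
import math

def tsallis(a, q):
    """Tsallis entropy S_q(a) = (1 - sum_i a_i^q) / (q - 1)."""
    return (1 - sum(p ** q for p in a)) / (q - 1)

def shannon_nats(a):
    """Shannon entropy in nats, the q -> 1 limit of S_q."""
    return -sum(p * math.log(p) for p in a if p > 0)

a = [0.5, 0.25, 0.125, 0.125]
# tsallis(a, q) approaches shannon_nats(a) as q -> 1
```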
Fact: the interpolation error is at most ε when [a,b] = [-O(1/(k log m)), 0], so we can set k = log(1/ε) + loglog m.
Key Ingredient: Noisy Interpolation

We don't have f(z_i); we only have estimates f̃(z_i) with |f̃(z_i) - f(z_i)| ≤ ε.
How to interpolate in the presence of noise?
Idea: we pick our z_i very carefully.
Rogosinski's Theorem: if q has degree k and |q(cos(jπ/k))| ≤ 1 for 0 ≤ j ≤ k, then |q(x)| ≤ |T_k(x)| for |x| > 1.
- Map [-1,1] onto the interpolation interval [z_0, z_k] by an affine map.
- Choose z_j to be the image of cos(jπ/k), j = 0, ..., k.
- Let q(z) interpolate the f(z_j) and q̃(z) interpolate the f̃(z_j).
- Then r(z) = (q(z) - q̃(z))/ε satisfies Rogosinski's conditions!
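A small demo of this construction, using `exp` as an arbitrary stand-in for the true values f(z_j) and uniform noise of magnitude ε for the estimates; the gap between the two interpolants at 0 is at most ε·|T_k(preimage(0))|, exactly as Rogosinski's theorem promises (all names are mine):

```python
import math
import random

def lagrange_at(zs, ys, z):
    """Evaluate the interpolating polynomial through (zs[j], ys[j]) at z."""
    total = 0.0
    for j in range(len(zs)):
        term = ys[j]
        for i in range(len(zs)):
            if i != j:
                term *= (z - zs[i]) / (zs[j] - zs[i])
        total += term
    return total

k, eps = 6, 0.01
a, b = -0.1, -0.01
# nodes: images of the Chebyshev points cos(j*pi/k) under the affine
# map sending [-1, 1] onto [a, b]
ts = [math.cos(j * math.pi / k) for j in range(k + 1)]
zs = [((b - a) * t + (a + b)) / 2 for t in ts]
f = math.exp                                       # stand-in for the true f
rng = random.Random(1)
ys = [f(z) for z in zs]                            # exact values f(z_j)
noisy = [y + rng.uniform(-eps, eps) for y in ys]   # estimates f~(z_j)
diff = abs(lagrange_at(zs, ys, 0.0) - lagrange_at(zs, noisy, 0.0))
t0 = (0.0 - (a + b) / 2) / ((b - a) / 2)           # preimage of 0, |t0| > 1
bound = eps * math.cosh(k * math.acosh(t0))        # eps * T_k(t0)
```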
Tradeoff in Choosing z_k

- z_k close to 0 ⇒ |T_k(preimage(0))| is still small,
- but z_k close to 0 ⇒ high space complexity.
Just how close do we need 0 and z_k to be? T_k grows quickly once it leaves [z_0, z_k].
The Magic of Chebyshev

[Paturi 92]: T_k(1 + 1/k^c) ≤ e^{4k^{1-c/2}}. Set c = 2: then T_k(1 + 1/k^2) ≤ e^4 = O(1).
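A quick numeric check of this bound, using the closed form T_k(x) = cosh(k·arccosh(x)) for x ≥ 1 (function name is mine):

```python
import math

def cheb(k, x):
    """Chebyshev polynomial T_k(x), via the cosh form for x >= 1."""
    return math.cosh(k * math.acosh(x)) if x >= 1 else math.cos(k * math.acos(x))

k = 100
# only 1/k^2 outside [-1, 1]: T_k stays below e^4 (Paturi's bound with c = 2)
near = cheb(k, 1 + 1 / k**2)
# but 1/k outside [-1, 1]: T_k is already enormous
far = cheb(k, 1 + 1 / k)
```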
Suffices to set z_k = -O(1/(k^3 log m)).
Translates to Õ(ε^{-2} log^3 m) space.
The Final Algorithm (additive approximation)

1. Set k = lg(1/ε) + lglg(m) and z_j = (k^2 cos(jπ/k) - (k^2 + 1)) / (9k^3 lg(m)) for 0 ≤ j ≤ k.
2. Estimate S̃_{1+z_j} = (1 - F̃_{1+z_j}/(F̃_1)^{1+z_j}) / z_j for 0 ≤ j ≤ k.
3. Interpolate the degree-k polynomial q̃ with q̃(z_j) = S̃_{1+z_j}.
4. Output q̃(0).
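A non-streaming sketch of this pipeline, using exact moments F_p where a real sketch would use (1±ε) estimates; it returns entropy in nats, since the Tsallis values converge to -Σ_i p_i ln p_i (function names and the nats convention are mine):

```python
import math

def estimate_entropy(x, eps):
    """Offline sketch of the interpolation algorithm: evaluate Tsallis
    entropy S_{1+z} at shifted Chebyshev nodes z_j (computed here from
    exact moments F_p), interpolate a degree-k polynomial through the
    values, and evaluate it at 0.  Returns entropy in nats."""
    m = sum(x)
    lgm = math.log2(m)
    k = max(2, math.ceil(math.log2(1 / eps) + math.log2(lgm)))
    # shifted Chebyshev nodes in a tiny interval just to the left of 0
    zs = [(k**2 * math.cos(j * math.pi / k) - (k**2 + 1)) / (9 * k**3 * lgm)
          for j in range(k + 1)]
    F1 = float(m)

    def tsallis_from_moments(z):
        Fp = sum(xi ** (1 + z) for xi in x)   # moment F_{1+z}
        return (1 - Fp / F1 ** (1 + z)) / z   # S_{1+z} of x / F1

    ys = [tsallis_from_moments(z) for z in zs]
    # Lagrange interpolation through (z_j, S_{1+z_j}), evaluated at z = 0
    est = 0.0
    for j in range(k + 1):
        term = ys[j]
        for i in range(k + 1):
            if i != j:
                term *= (0.0 - zs[i]) / (zs[j] - zs[i])
        est += term
    return est
```

With exact moments the only error is the interpolation error, which the Chebyshev analysis above makes negligible.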
Multiplicative Approximation

How to get a multiplicative approximation?
- An additive approximation is also multiplicative, unless H(x) is small.
- H(x) small ⇒ one frequency x_{i*} is large [CCM 07].
- Let i* = argmax_i x_i and define the residual moment RF_p = Σ_{i ≠ i*} x_i^p.
- We combine (1±ε)RF_1 and (1±ε)RF_{1+z_j} to get (1±ε)f(z_j).
Question: how do we get (1±ε)RF_p? Two different approaches:
- a general approach (for any p, and allowing negative frequencies);
- an approach exploiting p ≈ 1, only for nonnegative frequencies (better by a log(m) factor).
Questions / Thoughts

- For what other problems can we use this generalize-then-interpolate strategy? Some non-streaming problems too?
The power of moments?
The power of residual moments? CountMin [CM 05] + CountSketch [CCF 02]; HSS [Ganguly et al.]
WANTED: Faster moment estimation (some progress in [Cormode-Ganguly 07])