# Sketching and Streaming Entropy via Approximation Theory

DESCRIPTION

Sketching and Streaming Entropy via Approximation Theory. Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT). Streaming model: $x \in \mathbb{Z}^n$ receives $m$ updates. Initially $x = (0, 0, 0, 0, \ldots, 0)$; after "Increment $x_1$", $x = (1, 0, 0, 0, \ldots, 0)$; after "Increment $x_4$", $x = (1, 0, 0, 1, \ldots, 0)$.

### Text of Sketching and Streaming Entropy via Approximation Theory

• Sketching and Streaming Entropy via Approximation Theory. Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)

• Streaming Model

A vector $x \in \mathbb{Z}^n$ receives $m$ updates, e.g. "Increment $x_1$", "Increment $x_4$".

Goal: Compute statistics, e.g. $\|x\|_1$, $\|x\|_2$, …

Trivial solution: Store $x$ (or store all updates), using $O(n \log m)$ space.

Goal: Compute using $O(\mathrm{polylog}(nm))$ space.
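The trivial solution can be sketched in a few lines of Python. This is only an illustration of the model; the names `apply_updates`, `l1_norm`, and `l2_norm` are hypothetical and do not come from the slides.

```python
import math

def apply_updates(n, updates):
    """Trivial solution: keep the whole vector x in memory."""
    x = [0] * n
    for i in updates:          # each update is "Increment x_i" (1-indexed)
        x[i - 1] += 1
    return x

def l1_norm(x):
    return sum(abs(v) for v in x)

def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))

# The two updates from the slide: Increment x_1, then Increment x_4.
x = apply_updates(5, [1, 4])
```

Storing `x` costs one counter per coordinate, which is exactly the $O(n \log m)$ space the streaming algorithms aim to avoid.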

• Streaming Algorithms (a very brief introduction)

Fact: [Alon-Matias-Szegedy 99], [Bar-Yossef et al. 02], [Indyk-Woodruff 05], [Bhuvanagiri et al. 06], [Indyk 06], [Li 08], [Li 09]

Can compute $(1\pm\varepsilon)\|x\|_p^p = (1\pm\varepsilon)F_p$ using
$O(\varepsilon^{-2} \log^c n)$ bits of space (if $0 \le p \le 2$),
$O(\varepsilon^{-O(1)} n^{1-2/p} \log^{O(1)}(n))$ bits (if $p > 2$).
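As one concrete instance of moment estimation, here is a minimal sketch of the classic "tug-of-war" estimator for $F_2$ from Alon-Matias-Szegedy. It is not the method used for general $p$ in the cited works, and it is not space-efficient as written: the explicit sign table is an assumption made for readability, where a real implementation would derive signs from 4-wise independent hash functions. The function name and parameters are hypothetical.

```python
import random

def ams_f2_estimate(updates, n, num_counters=2000, seed=0):
    """Tug-of-war estimate of F_2 = sum_i x_i^2 from a stream of increments.

    Illustration only: the sign table takes O(n * num_counters) space,
    whereas a real sketch would use 4-wise independent hash functions.
    """
    rng = random.Random(seed)
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(num_counters)]
    counters = [0] * num_counters
    for i in updates:                      # update "Increment x_i" (0-indexed)
        for c in range(num_counters):
            counters[c] += signs[c][i]
    # Each squared counter has expectation F_2; averaging reduces the variance.
    return sum(z * z for z in counters) / num_counters

# A stream in which each of 10 items is incremented 20 times.
updates = [i % 10 for i in range(200)]
estimate = ams_f2_estimate(updates, n=10)
exact = sum(updates.count(i) ** 2 for i in range(10))
```

Only the counters depend on the stream, so an update touches each counter once and never needs the vector $x$ itself.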
• Practical Motivation

General goal: Dealing with massive data sets (Internet traffic, large databases, …).

Network monitoring & anomaly detection: The stream consists of internet packets, with $x_i$ = # packets sent to port $i$. Under typical conditions, $x$ is very concentrated. Under a port scan attack, $x$ is less concentrated. Can detect by estimating empirical entropy [Lakhina et al. 05], [Xu et al. 05], [Zhao et al. 07].

• Entropy

Probability distribution $a = (a_1, a_2, \ldots, a_n)$.

Entropy $H(a) = -\sum_i a_i \lg(a_i)$.

Examples:
$a = (1/n, 1/n, \ldots, 1/n)$: $H(a) = \lg(n)$;
$a = (0, \ldots, 0, 1, 0, \ldots, 0)$: $H(a) = 0$.

$H(a)$ is small when $a$ is concentrated, LARGE when not.
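The two examples can be checked directly. The following is a small sketch, with the function name `entropy` chosen here for illustration and the usual convention that $0 \cdot \lg 0 = 0$.

```python
import math

def entropy(a):
    """Shannon entropy H(a) = -sum_i a_i lg(a_i), with 0 * lg(0) taken as 0."""
    return -sum(p * math.log2(p) for p in a if p > 0)

n = 8
uniform = [1 / n] * n                    # spread out: H = lg(n)
point_mass = [0, 0, 1, 0, 0, 0, 0, 0]    # fully concentrated: H = 0
```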

• Streaming Algorithms for Entropy

How much space to estimate $H(x)$? [Guha-McGregor-Venkatasubramanian 06], [Chakrabarti-Do Ba-Muthu 06], [Bhuvanagiri-Ganguly 06]

[Chakrabarti-Cormode-McGregor 07]:
multiplicative $(1\pm\varepsilon)$ approx: $O(\varepsilon^{-2} \log^2 m)$ bits;
additive $\varepsilon$ approx: $O(\varepsilon^{-2} \log^4 m)$ bits;
$\Omega(\varepsilon^{-2})$ lower bound for both.

Our contributions:
Additive $\varepsilon$ or multiplicative $(1\pm\varepsilon)$ approximation in $\tilde{O}(\varepsilon^{-2} \log^3 m)$ bits, and can handle deletions.
Can sketch entropy in the same space.

• First Idea

If you can estimate $F_p$ for $p \approx 1$, then you can estimate $H(x)$. Why? Rényi entropy.

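• Review of Rényi

Definition: $H_p(x) = \frac{1}{1-p} \lg\left(\sum_i x_i^p\right)$ for a probability distribution $x$ and $p \neq 1$.

Convergence to Shannon: $\lim_{p \to 1} H_p(x) = H(x)$.

[Figure: plot of $H_p(x)$ against $p$, with axis ticks at 0, 1, 2; portraits of Alfred Rényi and Claude Shannon.]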
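The convergence can be observed numerically. This sketch assumes the standard definition $H_p(a) = \lg(\sum_i a_i^p)/(1-p)$ and an arbitrary example distribution; the function names are illustrative.

```python
import math

def shannon_entropy(a):
    return -sum(q * math.log2(q) for q in a if q > 0)

def renyi_entropy(a, p):
    """H_p(a) = lg(sum_i a_i^p) / (1 - p), for p != 1."""
    return math.log2(sum(q ** p for q in a if q > 0)) / (1 - p)

a = [0.5, 0.25, 0.125, 0.125]            # example distribution, H(a) = 1.75 bits
# Gap between Renyi and Shannon entropy as p approaches 1 from above.
gaps = [abs(renyi_entropy(a, p) - shannon_entropy(a)) for p in (1.5, 1.1, 1.01, 1.001)]
```

The gaps shrink as $p$ approaches 1, which is what makes an estimate of $F_p$ near $p = 1$ useful for entropy.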

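• Overview of Algorithm

Set $p = 1.01$ and let $\tilde{x} = x / \|x\|_1$.

Compute $\tilde{F}_p = (1\pm\varepsilon) F_p(\tilde{x})$ (using Li's compressed counting).

Set $\tilde{H}_p(x) = \lg(\tilde{F}_p) / (1-p)$.

So $\tilde{H}_p(x) \approx H_p(x) \approx H(x)$.

Analysis: it remains to bound the error in each of the two approximations.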

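• Making the tradeoff

How quickly does $H_p(x)$ converge to $H(x)$?

Theorem: Let $x$ be a distribution with $\min_i x_i \ge 1/m$. The theorem gives two choices of $p$ close to 1: one for which $H_p(x)$ is a multiplicative approximation of $H(x)$, and one for which it is an additive approximation.

Plugging in: $O(\varepsilon^{-3} \log^4 m)$ bits of space suffice for additive $\varepsilon$ approximation.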
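The tension in this tradeoff is that moving $p$ closer to 1 shrinks the gap between $H_p$ and $H$ but amplifies any error in the moment estimate, since $\lg((1+\delta)F_p)/(1-p) = H_p + \lg(1+\delta)/(1-p)$. A minimal numeric sketch, where the relative error `rel_err` and the example distribution are assumptions chosen for illustration:

```python
import math

def renyi_from_moment(f_p, p):
    """Renyi entropy computed from the moment F_p = sum_i a_i^p."""
    return math.log2(f_p) / (1 - p)

a = [0.5, 0.25, 0.125, 0.125]
p = 1.01
f_p = sum(q ** p for q in a)
rel_err = 1e-3                                   # assumed relative error in the F_p estimate
exact = renyi_from_moment(f_p, p)
noisy = renyi_from_moment((1 + rel_err) * f_p, p)
shift = noisy - exact                            # equals lg(1 + rel_err) / (1 - p)
```

Here a relative error of 0.1% in $F_p$ moves the entropy estimate by more than 0.1, so the moment has to be estimated far more accurately than the target additive error.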

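• Proof: A trick worth remembering

Let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ be such that $f$ and $g$ both tend to 0 at the limit point. l'Hôpital's rule says that $\lim f/g = \lim f'/g'$. It actually says more! It says $f/g$ converges to its limit at least as fast as $f'/g'$ does.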
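As a worked instance of the rule, the convergence of Rényi entropy to Shannon entropy follows in one step. The derivation below assumes $x$ is a probability distribution (so $\sum_i x_i = 1$) and uses natural logarithms; dividing by $\ln 2$ gives the $\lg$ version.

```latex
\begin{align*}
H_p(x) &= \frac{\ln\left(\sum_i x_i^p\right)}{1-p},
  \qquad \text{numerator and denominator both} \to 0 \text{ as } p \to 1, \\
\lim_{p \to 1} H_p(x)
  &= \lim_{p \to 1}
     \frac{\frac{d}{dp}\ln\left(\sum_i x_i^p\right)}{\frac{d}{dp}(1-p)}
   = \lim_{p \to 1} \frac{\sum_i x_i^p \ln x_i}{-\sum_i x_i^p}
   = -\sum_i x_i \ln x_i .
\end{align*}
```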

• Improvements

Status: additive $\varepsilon$ approx using $O(\varepsilon^{-3} \log^4 m)$ bits. How to reduce space further? Interpolate with multiple points: $H_{p_1}(x)$, $H_{p_2}(x)$, ...

• Analyzing Interpolation

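Let $f(z)$ be a $C^{k+1}$ function. Interpolate $f$ with polynomial $q$ with $q(z_i) = f(z_i)$, $0 \le i \le k$.

Fact: $|f(y) - q(y)| \le \sup_{z \in [a,b]} |f^{(k+1)}(z)| \cdot \frac{(b-a)^{k+1}}{(k+1)!}$, where $y, z_i \in [a,b]$.

Our case: Set $f(z) = H_{1+z}(x)$. Goal: Analyze $f^{(k+1)}(z)$.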
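A small numeric check of the interpolation error, assuming the standard bound $|f(y) - q(y)| \le \sup |f^{(k+1)}| \cdot (b-a)^{k+1}/(k+1)!$. The test function $f(z) = e^z$, the equally spaced nodes, and the helper `lagrange_eval` are choices made here for illustration, not taken from the slides.

```python
import math

def lagrange_eval(nodes, values, y):
    """Evaluate at y the polynomial of degree <= k through (nodes[i], values[i])."""
    total = 0.0
    for i, (zi, fi) in enumerate(zip(nodes, values)):
        weight = 1.0
        for j, zj in enumerate(nodes):
            if j != i:
                weight *= (y - zj) / (zi - zj)
        total += fi * weight
    return total

k, a, b = 4, -1.0, 0.0
nodes = [a + (b - a) * i / k for i in range(k + 1)]
values = [math.exp(z) for z in nodes]            # f(z) = e^z, so every derivative is e^z
y = -0.3
error = abs(math.exp(y) - lagrange_eval(nodes, values, y))
# sup of |f^(k+1)| on [-1, 0] is e^0 = 1.
bound = 1.0 * (b - a) ** (k + 1) / math.factorial(k + 1)
```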

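• Bounding Derivatives

Rényi derivatives are messy to analyze. Switch to Tsallis entropy $f(z) = S_{1+z}(x)$, where $S_{1+z}(x) = \left(1 - F_{1+z}/F_1^{1+z}\right)/z$. Can prove Tsallis also converges to Shannon.

Fact: a bound on the derivatives of $f$ (when $a = -O(1/(k \log m))$, $b = 0$) $\Rightarrow$ can set $k = \log(1/\varepsilon) + \log\log m$.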
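The convergence of Tsallis entropy to Shannon entropy can be seen numerically. This sketch assumes the definition $S_{1+z}(a) = (1 - \sum_i a_i^{1+z})/z$ for a probability vector $a$; with this normalization the limit as $z \to 0$ is the Shannon entropy in nats (natural logarithm). Function names are illustrative.

```python
import math

def tsallis_entropy(a, z):
    """S_{1+z}(a) = (1 - sum_i a_i^(1+z)) / z for a probability vector a, z != 0."""
    return (1 - sum(q ** (1 + z) for q in a if q > 0)) / z

def shannon_nats(a):
    return -sum(q * math.log(q) for q in a if q > 0)

a = [0.5, 0.25, 0.125, 0.125]
# Negative z approaching 0, matching an interpolation interval [a, b] with b = 0.
gaps = [abs(tsallis_entropy(a, z) - shannon_nats(a)) for z in (-0.1, -0.01, -0.001)]
```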

• Key Ingredient: Noisy Interpolation

We don't have $f(z_i)$, we have $\tilde{f}(z_i) = f(z_i) \pm \varepsilon$.

How to interpolate in presence of noise?

Idea: we pick our zi very carefully

• Chebyshev Polynomials

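Rogosinski's Theorem: $q(x)$ of degree $k$ and $|q(\eta_j)| \le 1$ $(0 \le j \le k)$ $\Rightarrow$ $|q(x)| \le |T_k(x)|$ for $|x| > 1$, where $\eta_j = \cos(j\pi/k)$.

Map $[-1,1]$ onto interpolation interval $[z_0, z_k]$. Choose $z_j$ to be image of $\eta_j$, $j = 0, \ldots, k$. Let $q(z)$ interpolate $f(z_j)$ and $\tilde{q}(z)$ interpolate $\tilde{f}(z_j)$. Then $r(z) = (q(z) - \tilde{q}(z))/\varepsilon$ satisfies Rogosinski's conditions!

$T_k$ grows quickly once leaving $[z_0, z_k]$. Just how close do we need 0 and $z_k$ to be? $z_k$ close to 0 $\Rightarrow$ $|T_k(\mathrm{preimage}(0))|$ still small, but $z_k$ close to 0 $\Rightarrow$ high space complexity.

[Figure: number line marking $z_0$, $z_k$, and 0.]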
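The effect of noise on extrapolation from Chebyshev nodes can be measured directly. If every node value is perturbed by at most $\varepsilon$, the extrapolated value at $x$ moves by at most $\varepsilon \sum_j |L_j(x)|$, where $L_j$ are the Lagrange basis polynomials. The sketch below, with nodes $\eta_j = \cos(j\pi/k)$ and hypothetical helper names, compares that worst-case factor to $T_k(x)$ at a point just outside $[-1, 1]$.

```python
import math

def lagrange_basis(nodes, j, x):
    """Value at x of the j-th Lagrange basis polynomial for the given nodes."""
    w = 1.0
    for i, node in enumerate(nodes):
        if i != j:
            w *= (x - node) / (nodes[j] - node)
    return w

def chebyshev_T(k, x):
    """T_k(x) for x >= 1, using T_k(x) = cosh(k * arccosh(x))."""
    return math.cosh(k * math.acosh(x))

k = 8
eta = [math.cos(j * math.pi / k) for j in range(k + 1)]   # extrema of T_k on [-1, 1]
x = 1 + 1 / k**2                                          # a point just outside [-1, 1]
# Worst-case growth at x of a degree-k polynomial with |q(eta_j)| <= 1.
amplification = sum(abs(lagrange_basis(eta, j, x)) for j in range(k + 1))
```

With these nodes the worst case is attained by $T_k$ itself, so the noise is amplified by only a small constant as long as the evaluation point stays close to the interval.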

• The Magic of Chebyshev

[Paturi 92]: $T_k(1 + 1/k^c) \le e^{4k^{1-(c/2)}}$. Set $c = 2$.

Suffices to set $z_k = -O(1/(k^3 \log m))$.

Translates to $\tilde{O}(\varepsilon^{-2} \log^3 m)$ space.
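With $c = 2$ the bound reads $T_k(1 + 1/k^2) \le e^4$, independent of $k$. A quick numeric sanity check for a few values of $k$, using the identity $T_k(x) = \cosh(k \operatorname{arccosh} x)$ for $x \ge 1$; the helper names are chosen here for illustration.

```python
import math

def chebyshev_T(k, x):
    """T_k(x) for x >= 1, using T_k(x) = cosh(k * arccosh(x))."""
    return math.cosh(k * math.acosh(x))

def paturi_bound(k, c):
    """Right-hand side e^(4 k^(1 - c/2)) of the bound quoted from [Paturi 92]."""
    return math.exp(4 * k ** (1 - c / 2))

# (k, T_k(1 + 1/k^2), bound with c = 2) for several degrees k.
checks = [(k, chebyshev_T(k, 1 + 1 / k**2), paturi_bound(k, 2)) for k in (2, 4, 8, 16, 64)]
```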

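• The Final Algorithm (additive approximation)

Set $k = \lg(1/\varepsilon) + \lg\lg(m)$, $z_j = (k^2 \cos(j\pi/k) - (k^2+1)) / (9k^3 \lg(m))$ $(0 \le j \le k)$.

Estimate $\tilde{S}_{1+z_j} = \left(1 - \tilde{F}_{1+z_j} / (\tilde{F}_1)^{1+z_j}\right) / z_j$ for $0 \le j \le k$.

Interpolate degree-$k$ polynomial $\tilde{q}(z_j) = \tilde{S}_{1+z_j}$.

Output $\tilde{q}(0)$.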
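The steps above can be traced in a short Python sketch. Several things here are assumptions rather than part of the slides: exact moments stand in for the streaming estimates $\tilde{F}_p$, $k$ is rounded up to an integer, the stream contains only increments so that $m = \|x\|_1$, and the output is compared against Shannon entropy in nats, which is the limit of the Tsallis formula as written.

```python
import math

def moment(x, p):
    """Frequency moment F_p = sum_i x_i^p over the nonzero entries of x."""
    return sum(v ** p for v in x if v > 0)

def entropy_by_interpolation(x, eps):
    """Trace the slide's steps, with exact moments replacing streaming estimates."""
    m = sum(x)                                   # stream length (increments only)
    k = math.ceil(math.log2(1 / eps) + math.log2(math.log2(m)))
    z = [(k**2 * math.cos(j * math.pi / k) - (k**2 + 1)) / (9 * k**3 * math.log2(m))
         for j in range(k + 1)]
    f1 = moment(x, 1)
    # Tsallis values S_{1+z_j} = (1 - F_{1+z_j} / F_1^(1+z_j)) / z_j.
    s = [(1 - moment(x, 1 + zj) / f1 ** (1 + zj)) / zj for zj in z]
    # Lagrange form of the degree-k interpolant, evaluated at 0.
    q0 = 0.0
    for j in range(k + 1):
        w = 1.0
        for i in range(k + 1):
            if i != j:
                w *= (0 - z[i]) / (z[j] - z[i])
        q0 += s[j] * w
    return q0                                    # entropy estimate in nats

x = [500, 250, 125, 125]                         # example frequency vector, m = 1000
total = sum(x)
exact_nats = -sum((v / total) * math.log(v / total) for v in x)
estimate = entropy_by_interpolation(x, eps=0.05)
```

In a streaming setting the calls to `moment` would be replaced by sketches for $F_1$ and each $F_{1+z_j}$, and the accuracy of those estimates then determines the final additive error.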

• Multiplicative Approximation

How to get multiplicative approximation? Additive approximation is multiplicative, unless $H(x)$ is small. $H(x)$ small $\Rightarrow$ the largest frequency is large [CCM 07]. Suppose … and define … . We combine $(1\pm\varepsilon)RF_1$ and $(1\pm\varepsilon)RF_{1+z_j}$ to get $(1\pm\varepsilon)f(z_j)$.

Question: How do we get $(1\pm\varepsilon)RF_p$? Two different approaches:
A general approach (for any $p$, and negative frequencies).
An approach exploiting $p \approx 1$, only for nonnegative freqs (better by $\log(m)$).

• Questions / Thoughts

For what other problems can we use this "generalize-then-interpolate" strategy? Some non-streaming problems too?

The power of moments?

The power of residual moments? CountMin (CM 05) + CountSketch (CCF 02) HSS (Ganguly et al.)

WANTED: Faster moment estimation (some progress in [Cormode-Ganguly 07])
