Distribution-free testing algorithms for monomials with a sublinear number of queries

Elya Dolev & Dana Ron

Tel-Aviv University

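Property testing of (Boolean) functions ("standard/uniform" version)

$f : \{0,1\}^n \to \{0,1\}$ - the tested function

$F$ - family of functions (e.g. linear functions)

Given a distance parameter $\epsilon$ and query access to $f$ (a query $x$ is answered with $f(x)$):

If $f \in F$, then accept w.p. $\ge 2/3$.

If $\mathrm{dist}(f,F) > \epsilon$, then reject w.p. $\ge 2/3$,

where $\mathrm{dist}(f,F) = \min_{g \in F}\{\mathrm{dist}(f,g)\}$ and $\mathrm{dist}(f,g) = \Pr_{x \sim U}[f(x) \ne g(x)]$.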

Property testing of (Boolean) functions (distribution-free version)

$f : \{0,1\}^n \to \{0,1\}$ - the tested function

$F$ - family of functions (e.g. linear functions)

$D$ - (unknown) underlying distribution

Given a distance parameter $\epsilon$, access to examples $x \sim D$, and query access to $f$ (a query $x$ is answered with $f(x)$):

If $f \in F$, then accept w.p. $\ge 2/3$.

If $\mathrm{dist}_D(f,F) > \epsilon$, then reject w.p. $\ge 2/3$,

where $\mathrm{dist}_D(f,F) = \min_{g \in F}\{\mathrm{dist}_D(f,g)\}$ and $\mathrm{dist}_D(f,g) = \Pr_{x \sim D}[f(x) \ne g(x)]$.
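The two distance notions can be compared on a toy example. The following is a minimal sketch, assuming the distribution is given explicitly as a dictionary from points to probabilities (a real tester only gets samples from D); the helper names `dist_D` and `uniform` and the two example functions are illustrative, not from the talk.

```python
from itertools import product

def dist_D(f, g, D):
    """Exact dist_D(f, g) = Pr_{x~D}[f(x) != g(x)], for D given as {point: probability}."""
    return sum(p for x, p in D.items() if f(x) != g(x))

def uniform(n):
    """Uniform distribution over {0,1}^n (the 'standard' setting)."""
    points = list(product((0, 1), repeat=n))
    return {x: 1.0 / len(points) for x in points}

# Two hypothetical functions on 3 bits (coordinates are 0-based in the code).
f = lambda x: x[0] & x[1]   # the monomial x1 AND x2
g = lambda x: x[0]          # the monomial x1

# Under the uniform distribution, f and g differ exactly when x1 = 1 and x2 = 0.
print(dist_D(f, g, uniform(3)))

# A distribution supported only on points where f and g agree hides the difference.
D = {(1, 1, 0): 0.5, (0, 0, 1): 0.5}
print(dist_D(f, g, D))
```

The second distribution illustrates why the distance must be measured with respect to D: the same pair of functions can be far apart under the uniform distribution and indistinguishable under D.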

Inspired by dist-free PAC learning model [Valiant]

(Dist-free) Testing and Learning

Dist-free testing was initially considered in [Goldreich,Goldwasser,R]. Observed that testing is no harder than (proper) learning (in particular, dist-free+queries).

Q1: When is standard/dist-free testing easier than learning?

Q2: What is the relation between the complexity of standard and dist-free testing?

Testing and Learning

Quite a few classes for which standard testing is easier than learning (under the unif. dist. + queries):

• Linear functions [Blum,Luby,Rubinfeld]
• Low-degree polynomials [Rubinfeld&Sudan]
• Singletons, monomials, small monotone DNF [Parnas,R,Samorodnitsky]
• Monotone functions [Ergun,Kannan,Kumar,Rubinfeld,Viswanathan] [Dodis,Goldreich,Lehman,Raskhodnikova,R,Samorodnitsky]
• Small juntas [Fischer,Kindler,R,Safra,Samorodnitsky]
• Small decision lists, decision trees, DNF (general) [Diakonikolas,Lee,Matulef,Onak,Rubinfeld,Servedio,Wan]
• Linear thresh. functions [Matulef,O’Donnell,Rubinfeld,Servedio]
• . . .

Fewer positive results for dist-free testing ([Halevy,Kushilevitz], two papers). Tends to be more challenging.

Background on distribution-free testing

One of the main positive (and general) results: if a class has a standard tester and can be self-corrected, then it has a dist-free tester [Halevy&Kushilevitz].

In particular gives dist-free testers for linear functions and low-degree polynomials.

What about other classes of interest (e.g., from learning point of view) which don’t have self-correctors?


[Glasner&Servedio] considered question for monomials (monotone/general), decision lists, linear thresh. func.

They prove that every dist-free tester must perform $\Omega((n/\log n)^{1/5})$ queries (for constant $\epsilon$), in contrast to standard testing of these classes, where there is no dependence on $n$ (and a polynomial dependence on $1/\epsilon$).

Shows that strong dependence on n is unavoidable, but can we get some sublinear dependence on n? (Dist-free learning + queries requires linear dependence [Turan])

Our Results

We give a positive answer to the question for monomials – both monotone and general.

The complexity of our dist-free testing algorithms is $O(n^{1/2}\log(n)/\epsilon)$.

Dist-free testing of monotone monomials

Let MM denote the class of monotone monomials (over n variables). Consider any f in MM.

Observe:

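• For each y s.t. f(y)=0, there exists j s.t. $y_j=0$ and $x_j \in f$ (i.e., $x_j$ is a variable of the monomial).

• For each y s.t. f(y)=1, for every j s.t. $y_j=0$, $x_j \notin f$.

Example:

$y^0=010$, $f(y^0)=0$: $x_1$ or $x_3$ must be in the monomial.

$y^1=011$, $f(y^1)=1$: $x_1$ cannot be in the monomial.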


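Notation: $Z(y)=\{j : y_j=0\}$.

Def of the violation hypergraph $H_f$ of a function $f$:
- Its vertex set is $\{0,1\}^n$;
- Each (hyper)edge is a subset $e=\{y^0,y^1,\dots,y^t\}$ where $f(y^0)=0$ and $f(y^i)=1$ for every $i>0$, s.t. $Z(y^0) \subseteq \bigcup_{i>0} Z(y^i)$ (so that there is no g in MM consistent with f on e).

Example: $y^0=010$, $y^1=011$, $y^2=110$ ($f(y^0)=0$, $f(y^1)=f(y^2)=1$). Here $Z(y^0)=\{1,3\}$, so $x_1$ or $x_3$ must be in the monomial, while $Z(y^1)=\{1\}$ and $Z(y^2)=\{3\}$ rule out $x_1$ and $x_3$ respectively.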
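The edge condition can be checked mechanically. Below is a minimal sketch, assuming points are tuples of bits and f is any Python callable; the helper names `Z` and `is_edge` are illustrative. Indices are 1-based here to match the slides.

```python
def Z(y):
    """Zero set Z(y) = {j : y_j = 0}, with 1-based indices as on the slides."""
    return {j + 1 for j, bit in enumerate(y) if bit == 0}

def is_edge(f, y0, ys):
    """Check whether {y0} together with the points in ys is an edge of H_f."""
    if f(y0) != 0 or any(f(y) != 1 for y in ys):
        return False
    # The zeros of y0 must all be "ruled out" by the zeros of the 1-points.
    covered = set().union(*(Z(y) for y in ys))
    return Z(y0) <= covered

# The example from the slide: f(010) = 0, f(011) = f(110) = 1.
ones = {(0, 1, 1), (1, 1, 0)}
f = lambda y: int(y in ones)

print(is_edge(f, (0, 1, 0), [(0, 1, 1), (1, 1, 0)]))  # Z(y0) = {1,3} is covered by {1} and {3}
print(is_edge(f, (0, 1, 0), [(0, 1, 1)]))             # index 3 is not covered
```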



By definition, if f is in MM then $H_f$ has no edges.

Lemma: If $\mathrm{dist}_D(f,MM) > \epsilon$, then $D(C) > \epsilon$ for every vertex cover C of $H_f$.

Testing algorithm tries to find an edge in $H_f$.

Claim: Let $R \subseteq \{0,1\}^n$. If no $e \in E(H_f)$ is a subset of R, then there exists g in MM that agrees with f on R.


Proof of claim: Let $S(R) = \{i : y_i=1 \ \forall y \in R \cap f^{-1}(1)\}$ (if $R \cap f^{-1}(1)=\emptyset$ then $S=[n]$). Define $g(x) = \bigwedge_{i \in S(R)} x_i$. Hence $g(y)=f(y)$ for all $y \in R \cap f^{-1}(1)$.

Consider $y \in R \cap f^{-1}(0)$. Suppose $g(y)=1$, i.e., $y_i=1$ for all $i \in S$. Then $Z(y) \subseteq [n] \setminus S(R) = \bigcup_{y' \in R \cap f^{-1}(1)} Z(y')$, so $\{y\} \cup (R \cap f^{-1}(1))$ is an edge in $H_f$, contrary to the premise of the claim.
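The construction in the proof can be sketched in a few lines. This is an illustration only, assuming R is a non-empty list of bit tuples and f is a callable; the function name `consistent_monomial` is hypothetical, and indices are 0-based in the code.

```python
def consistent_monomial(f, R):
    """Build S(R) and g(x) = AND of x_i over i in S(R), following the proof of the claim."""
    n = len(R[0])
    positives = [y for y in R if f(y) == 1]
    # all(...) over an empty list is True, so S = [n] when R has no 1-points.
    S = {i for i in range(n) if all(y[i] == 1 for y in positives)}
    g = lambda x: int(all(x[i] == 1 for i in S))
    return S, g

ones = {(0, 1, 1), (1, 1, 0)}
f = lambda y: int(y in ones)

# R contains no edge: g agrees with f on all of R.
R = [(0, 1, 0), (0, 1, 1)]
S, g = consistent_monomial(f, R)
print(S, all(g(y) == f(y) for y in R))

# R contains the edge {010, 011, 110}: g is forced to disagree with f on 010.
R_bad = [(0, 1, 0), (0, 1, 1), (1, 1, 0)]
S_bad, g_bad = consistent_monomial(f, R_bad)
print(S_bad, g_bad((0, 1, 0)), f((0, 1, 0)))
```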


Testing algorithm tries to find an edge in Hf.

Notation: for $Z \subseteq [n]$, $y(Z)$ has all coordinates in Z equal 0, and others 1 (e.g., $y(\{1,3\}) = 0101$, $y(\{2\}) = 1011$).

Basic building block: procedure that given $y \in f^{-1}(0)$ searches for index j s.t. $y_j=0$ and $f(y(\{j\}))=0$ (i.e. $x_j$ must be in monomial if f in MM).

Procedure performs binary search.
- Starts with $Z = Z(y)$.
- In each iteration partitions Z into two equal parts $Z_1$, $Z_2$, and queries $y(Z_1)$ and $y(Z_2)$.
- Continues with $Z_i$ s.t. $f(y(Z_i))=0$ (if $f(y(Z_1))=f(y(Z_2))=1$ then $\{y(Z),y(Z_1),y(Z_2)\}$ is an edge so can reject).
- Stops when $|Z|=1$.

[Figure: Z is split into $Z_1$ and $Z_2$ repeatedly until a single index $\{j\}$ remains, the representative ("rep") index of y.]
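The search procedure can be sketched as follows. This is a minimal illustration, assuming f is a callable on bit tuples and using 0-based indices; the names `y_of` and `search` and the return convention are illustrative. The handling of the all-ones point is an assumption, since the slides do not discuss it.

```python
def y_of(zero_set, n):
    """y(Z): 0 on the coordinates in Z, 1 elsewhere (0-based indices)."""
    return tuple(0 if j in zero_set else 1 for j in range(n))

def search(f, y):
    """Binary search for a representative index of y, where f(y) = 0.

    Returns ("index", j) with y[j] == 0 and f(y({j})) == 0, or
    ("edge", witness) when the queried points form an edge of H_f.
    """
    n = len(y)
    Z = [j for j in range(n) if y[j] == 0]
    if not Z:
        # Assumption: y = 1^n with f(y) = 0 is treated as a failed search,
        # since every monotone monomial evaluates to 1 on 1^n.
        return ("edge", (y,))
    while len(Z) > 1:
        Z1, Z2 = Z[:len(Z) // 2], Z[len(Z) // 2:]
        p1, p2 = y_of(set(Z1), n), y_of(set(Z2), n)
        v1, v2 = f(p1), f(p2)          # two queries per iteration
        if v1 == 0:
            Z = Z1
        elif v2 == 0:
            Z = Z2
        else:
            return ("edge", (y_of(set(Z), n), p1, p2))
    return ("index", Z[0])

# Hypothetical examples on 4 bits.
print(search(lambda x: x[0] & x[2], (0, 1, 0, 1)))  # monomial: an index is found
print(search(lambda x: x[0] | x[2], (0, 1, 0, 1)))  # not a monomial: an edge is found
```

Each iteration halves Z and makes two queries, so the procedure uses O(log n) queries per point.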


Testing algorithm for MM

- Obtain sample T of $\Theta(n^{1/2}/\epsilon)$ points distributed according to D.

- For each y in T s.t. f(y)=0, run the search procedure on y.

- If the search failed for some y then reject (and halt): found edge $\{y(Z),y(Z_1),y(Z_2)\}$. Otherwise, let J be the union of all indices returned.

- Obtain sample T’ of $\Theta(n^{1/2}/\epsilon)$ points distributed according to D.

- If there exists y’ in T’ s.t. f(y’)=1 and $Z(y') \cap J \ne \emptyset$ then reject (found edge $\{y(\{j\}),y'\}$ for some $j \in Z(y') \cap J$), o.w. accept.

Query complexity of alg: $|T| \cdot \log(n)+|T'| = O(n^{1/2}\log(n)/\epsilon)$

If f in MM, alg always accepts.
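Putting the pieces together, the algorithm can be sketched as below. This is an illustration under stated assumptions: `sample` is a hypothetical callable returning one point drawn from D, the constant `c` in the sample size is a placeholder (the slides only give the order of magnitude), and failure of the search on the all-ones point is treated as a rejection.

```python
import math
import random

def y_of(zero_set, n):
    """y(Z): 0 on the coordinates in Z, 1 elsewhere (0-based indices)."""
    return tuple(0 if j in zero_set else 1 for j in range(n))

def search(f, y):
    """Return a representative index of y (where f(y) = 0), or None if the search fails."""
    n = len(y)
    Z = [j for j in range(n) if y[j] == 0]
    if not Z:
        return None  # assumption: f(1^n) = 0 counts as a failure
    while len(Z) > 1:
        Z1, Z2 = Z[:len(Z) // 2], Z[len(Z) // 2:]
        if f(y_of(set(Z1), n)) == 0:
            Z = Z1
        elif f(y_of(set(Z2), n)) == 0:
            Z = Z2
        else:
            return None  # {y(Z), y(Z1), y(Z2)} is an edge
    return Z[0]

def mm_tester(f, n, eps, sample, c=4):
    """Return True (accept) or False (reject); c is a hypothetical constant."""
    m = math.ceil(c * math.sqrt(n) / eps)
    J = set()
    for _ in range(m):                    # sample T
        y = sample()
        if f(y) == 0:
            j = search(f, y)
            if j is None:
                return False
            J.add(j)
    for _ in range(m):                    # sample T'
        y2 = sample()
        if f(y2) == 1 and any(y2[j] == 0 for j in J):
            return False                  # {y({j}), y'} is an edge
    return True

n = 8
uniform = lambda: tuple(random.randint(0, 1) for _ in range(n))
monomial = lambda x: x[1] & x[4] & x[6]
print(mm_tester(monomial, n, 0.1, uniform))  # a monotone monomial is always accepted
```

The one-sided error is visible in the code: for a monotone monomial every search returns a variable of the monomial, so no 1-point can have a zero on an index in J.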

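If $\mathrm{dist}_D(f,MM) > \epsilon$, we prove that the algorithm rejects w.p. $\ge 2/3$.

Lemma: If $\mathrm{dist}_D(f,MM) > \epsilon$, then w.p. $\ge 5/6$ over the choice of T (of size $\Theta(n^{1/2}/\epsilon)$), either $T^0=T \cap f^{-1}(0)$ contains a point that fails the search, or $D(Y_1(J))=\Omega(\epsilon/n^{1/2})$, where $J=J(T^0)$ is the union of indices returned by the search, and $Y_1(J)=\{y \in f^{-1}(1): Z(y) \cap J \ne \emptyset\}$.

In the second case, a sample T’ of $\Theta(n^{1/2}/\epsilon)$ points contains a point of $Y_1(J)$ with high constant probability, and the algorithm rejects.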


Prove contrapositive. Call a point for which the search fails an "empty point". If w.p. $> 1/6$ over the choice of T (of size $\Theta(n^{1/2}/\epsilon)$):
- $T^0=T \cap f^{-1}(0)$ does not contain any empty point, and
- $D(Y_1(J))=O(\epsilon/n^{1/2})$ (where $Y_1(J)=\{y \in f^{-1}(1): Z(y) \cap J \ne \emptyset\}$),

then we can construct a vertex cover C of $H_f$ s.t. $D(C) \le \epsilon$, so that $\mathrm{dist}_D(f,MM) \le \epsilon$.

First put in C all empty points. Total weight of these points is very small ($O(\epsilon/n^{1/2})$).

Continue in $O(n^{1/2})$ iterations. In iteration r add to C a subset $Y^r \subseteq f^{-1}(1)$ s.t. $D(Y^r)=O(\epsilon/n^{1/2})$.

Why cover? For $y \in f^{-1}(0)$ and $j \in Z(y)$, if $j \in J$ then $Y_1(J)$ covers all edges in $H_f$ that contain y (e.g., y=0101, j=3, J={3,4}: if y is in edge $\{y=y^0,y^1,\dots,y^t\}$, then some $y^i$ has the form ??0?, so that $y^i \in Y_1(J)$).

Can show (by a probabilistic argument) that in each iteration (but the last) there exists T s.t. $D(Y_1(J(T^0)))= O(\epsilon/n^{1/2})$ and $J(T^0)$ contains $\Omega(n^{1/2})$ new indices. After the last iteration add all $y \in f^{-1}(0)$ whose rep index did not appear in any iteration (can show that these have small weight).


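Suppose w.p. $> 1/6$ over the choice of T (of size $\Theta(n^{1/2}/\epsilon)$), $T^0=T \cap f^{-1}(0)$ does not contain an empty point and $D(Y_1(J(T^0)))=O(\epsilon/n^{1/2})$.

Initialization: $C \leftarrow \{\text{empty points}\}$, $J^* \leftarrow \emptyset$ ($J^*$ is the set of "covered indices"). At this stage $D(C)=O(\epsilon/n^{1/2})$.

Iteration 1: $T_1 \to T_1^0 \to J_1=J(T_1^0) \to Y^1 = Y_1(J_1)$; $C \leftarrow C \cup Y^1$, $J^* \leftarrow J^* \cup J_1$. Here $|J_1|=\Omega(n^{1/2})$ and $D(Y^1)=O(\epsilon/n^{1/2})$.

Iteration 2: $T_2 \to T_2^0 \to J_2=J(T_2^0) \to Y^2 = Y_1(J_2)$; $C \leftarrow C \cup Y^2$, $J^* \leftarrow J^* \cup J_2$. Here $|J_2 \setminus J^*|=\Omega(n^{1/2})$ and $D(Y^2)=O(\epsilon/n^{1/2})$.

...

Iteration s ($s=O(n^{1/2})$): $T_s \to T_s^0 \to J_s=J(T_s^0) \to Y^s = Y_1(J_s)$; $C \leftarrow C \cup Y^s$, $J^* \leftarrow J^* \cup J_s$. Here $D(Y^s)=O(\epsilon/n^{1/2})$.

Final step: $C \leftarrow C \cup \{\text{all } y \in f^{-1}(0) \text{ s.t. } Z(y) \cap J^* = \emptyset\}$.

Since each of the $O(n^{1/2})$ steps adds weight $O(\epsilon/n^{1/2})$, the constants can be set so that $D(C) \le \epsilon$.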

Dist-free testing of general monomials

Let GM denote the class of (general) monomials (over n variables). Consider any f in GM.

Observe:

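• For each y s.t. f(y)=0, either there exists j s.t. $y_j=0$ and $x_j \in f$, or there exists j s.t. $y_j=1$ and $\bar{x}_j \in f$.

• For each y s.t. f(y)=1, for every j s.t. $y_j=0$, $x_j \notin f$, and for every j s.t. $y_j=1$, $\bar{x}_j \notin f$.

Example:

$y^0=010$, $f(y^0)=0$: $x_1$ or $\bar{x}_2$ or $x_3$ must be in the monomial.

$y^1=011$, $f(y^1)=1$: none of $x_1$, $\bar{x}_2$, $\bar{x}_3$ can be in the monomial.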




Dist-free testing of general monomials

First, modify the notion of the violation hypergraph $H_f$: each edge $\{y^0,y^1,\dots,y^t\}$ still satisfies $f(y^0)=0$ and $f(y^i)=1$ for $i>0$, but now $Z(y^0) \subseteq \bigcup_{i>0} Z(y^i)$ and $O(y^0) \subseteq \bigcup_{i>0} O(y^i)$, where $O(y)=\{j : y_j=1\}$ is defined analogously to $Z(y)$.

Next, binary search is performed on y in $f^{-1}(0)$ but "w.r.t." w in $f^{-1}(1)$. Search finds index j s.t. $f(w')=0$ for w' that differs from w only on the j'th coordinate (in the monotone case, implicitly $w = 1^n$).

After performing the search on $O(n^{1/2}/\epsilon)$ sample points in $f^{-1}(0)$ (w.r.t. the same w) and obtaining a set J of "representative indices", take an additional sample and see if it contains y in $f^{-1}(1)$ s.t. $y_j \ne w_j$ for some j in J.
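One way to realize the search "with respect to w" is sketched below. The slides only state what the search finds, so the details here are an assumption: the search runs over the coordinates where y and w differ, and each query is w with a subset of those coordinates replaced by the values of y. The names `mix`, `search_general` and `is_witness` are illustrative, and indices are 0-based.

```python
def mix(w, y, idx):
    """w with the coordinates in idx replaced by the values of y."""
    return tuple(y[j] if j in idx else w[j] for j in range(len(w)))

def search_general(f, y, w):
    """Given f(y) = 0 and f(w) = 1, return j such that flipping w on coordinate j
    gives a 0-point, or None if the queried points form a violation."""
    Z = [j for j in range(len(y)) if y[j] != w[j]]   # non-empty since f(y) != f(w)
    while len(Z) > 1:
        Z1, Z2 = Z[:len(Z) // 2], Z[len(Z) // 2:]
        if f(mix(w, y, set(Z1))) == 0:
            Z = Z1
        elif f(mix(w, y, set(Z2))) == 0:
            Z = Z2
        else:
            return None
    return Z[0]

def is_witness(f, y2, w, J):
    """Second phase: a 1-point that differs from w on a representative index."""
    return f(y2) == 1 and any(y2[j] != w[j] for j in J)

# Hypothetical general monomial on 3 bits: x1 AND (NOT x2).
f = lambda x: x[0] & (1 - x[1])
w = (1, 0, 1)                      # f(w) = 1
print(search_general(f, (0, 1, 0), w))

# With w = 1^n the queries mix(w, y, Z) are exactly the points y(Z) of the monotone case.
print(mix((1, 1, 1, 1), (0, 1, 0, 1), {0, 2}))
```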

Summary and Open problems

Give sublinear ($\tilde{O}(n^{1/2})$) algorithms for dist-free testing of monotone/general monomials. (Alg for general monomials extends alg for monotone monomials.)

Two natural questions:

• What is the exact complexity of dist-free testing of monomials? (Lower bound of [GS] is $\tilde{\Omega}(n^{1/5})$.)

• What about other classes studied by [GS]? (Decision lists and linear threshold functions.)

Thanks

Standard vs. dist-free testing of monomials

When the underlying distribution is uniform (standard testing), if f is a k-monomial, then $\Pr[f(x)=1] = 2^{-k}$, and so one can effectively consider only monomials where $k = O(\log(1/\epsilon))$.
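The arithmetic behind this remark can be checked directly. The sketch below enumerates $\{0,1\}^n$ for a k-monomial on the first k variables and computes the smallest k for which $2^{-k} \le \epsilon$; the function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product

def accept_prob(k, n):
    """Pr_{x~U}[x_1 AND ... AND x_k = 1], by enumeration over {0,1}^n."""
    hits = sum(all(x[:k]) for x in product((0, 1), repeat=n))
    return Fraction(hits, 2 ** n)

def smallest_negligible_k(eps):
    """Smallest k with 2^-k <= eps, i.e. ceil(log2(1/eps))."""
    return math.ceil(math.log2(1 / eps))

print(accept_prob(3, 6))             # equals 2^-3
print(smallest_negligible_k(0.1))    # 2^-4 <= 0.1 < 2^-3
```

Under the uniform distribution, a monomial with more than this many variables is 1 on at most an $\epsilon$ fraction of the points, which is why larger k need not be treated separately.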

This is not generally true in dist-free case. Specifically, lower bound of [GS] constructs functions that depend on many variables and underlying dist. D helps to “hide non-monomiality”.

Note: dist-free testing for (monotone) k-monomials when k is fixed, can be done using exp(k) samples+queries (combine [PRS] and [HK])

