Testing Collections of Properties
Reut Levi Dana Ron
Ronitt Rubinfeld
ICS 2011
Shopping distribution
What properties do your distributions have?
Testing closeness of two distributions:
Transactions in California vs. transactions in New York: trend change?
Testing independence:
Shopping patterns: independent of zip code?
This work: Many distributions
One distribution:
D is arbitrary black-box distribution over [n], generates iid samples.
Sample complexity in terms of n? (can it be sublinear?)
[Figure: samples from D are fed to a Test, which outputs Pass/Fail]
Uniformity: Θ(n^{1/2}) [Goldreich, Ron 00] [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] [Paninski 08]
Identity: Θ̃(n^{1/2}) [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01]
Closeness: Θ̃(n^{2/3}) [Batu, Fortnow, Rubinfeld, Smith, White], [Valiant 08]
Independence: O(n1^{2/3} n2^{1/3}), Ω(n1^{2/3} n2^{1/3}) [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01], this work
Entropy: n^{1/β^2+o(1)} [Batu, Dasgupta, Kumar, Rubinfeld 05], [Valiant 08]
Support size: Θ(n/log n) [Raskhodnikova, Ron, Shpilka, Smith 09], [Valiant, Valiant 10]
Monotonicity on total order: Θ̃(n^{1/2}) [Batu, Kumar, Rubinfeld 04]
Monotonicity on poset: n^{1-o(1)} [Bhattacharyya, Fischer, Rubinfeld, Valiant 10]
Some answers…
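As an illustration of how a sublinear test for a single distribution can work, here is a minimal sketch of a collision-based uniformity check, in the spirit of the uniformity results cited above. It relies on the fact that the uniform distribution on [n] has collision probability 1/n, while any distribution that is eps-far from uniform in L1 distance has collision probability at least (1 + eps^2)/n. The function names, the midpoint threshold, and the parameter eps are assumptions made for this sketch; it is not the cited authors' exact algorithm and carries no tuned constants.

```python
def collision_rate(samples):
    """Fraction of unordered sample pairs (s, t) with s == t.

    Requires at least two samples.
    """
    total_pairs = len(samples) * (len(samples) - 1) // 2
    counts = {}
    for s in samples:
        counts[s] = counts.get(s, 0) + 1
    # A value seen c times contributes c*(c-1)/2 colliding pairs.
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    return collisions / total_pairs


def looks_uniform(samples, n, eps):
    """Pass if the empirical collision rate is below a midpoint threshold.

    Assumption of this sketch: the threshold sits halfway between 1/n
    (uniform) and (1 + eps^2)/n (lower bound for eps-far distributions).
    """
    threshold = (1 + eps ** 2 / 2) / n
    return collision_rate(samples) <= threshold
```

With on the order of n^{1/2} samples (for constant eps) the empirical collision rate starts to concentrate, which is where the square-root dependence on n comes from.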
Collection of distributions:
Two models:
Sampling model: Get (i,x) for random i, x ~ Di
Query model: Get (i,x) for query i and x ~ Di
Sample complexity in terms of n,m?
[Figure: samples from D1, D2, …, Dm are fed to a Test, which outputs Pass/Fail]
Further refinement: Known or unknown distribution on i’s?
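The two access models can be sketched as a small oracle class. This is only an illustration: the class name `Collection`, its methods, the 0-based indexing, and the default of uniform weights on the i's are all assumptions of the sketch, not notation from the talk.

```python
import random


class Collection:
    """Hypothetical black-box access to distributions D0, ..., D(m-1) over {0, ..., n-1}."""

    def __init__(self, dists, weights=None, seed=0):
        self.dists = dists                      # m lists of n probabilities each
        self.m = len(dists)
        # Distribution on the indices i; uniform if not given (an assumption).
        self.weights = weights or [1 / self.m] * self.m
        self.rng = random.Random(seed)

    def _draw(self, i):
        """Draw x ~ Di."""
        n = len(self.dists[i])
        return self.rng.choices(range(n), weights=self.dists[i])[0]

    def sample(self):
        """Sampling model: i is drawn at random by the weights, then x ~ Di."""
        i = self.rng.choices(range(self.m), weights=self.weights)[0]
        return i, self._draw(i)

    def query(self, i):
        """Query model: the tester chooses i, then x ~ Di."""
        return i, self._draw(i)
```

In the known-weights refinement the tester would also be given `weights`; in the unknown-weights one it sees only the pairs returned by `sample()`.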
Properties considered:
Equivalence: All distributions are equal
Clusterability: Distributions can be clustered into k clusters such that, within a cluster, all distributions are close
Equivalence vs. independence
Process of drawing pairs: Draw i from [m], x ~ Di; output (i,x)
Easy fact: (i,x) independent iff Di's are equal
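The easy fact can be checked numerically on small examples. The sketch below builds the joint distribution of (i,x) when i is drawn with weight w_i and x ~ Di, then tests whether the joint equals the product of its marginals. The helper names are hypothetical, exact fractions are used to avoid rounding issues, and the sketch assumes every weight w_i is positive.

```python
from fractions import Fraction as F


def joint(weights, dists):
    """Joint probabilities Pr[(i, x)] = w_i * Di(x)."""
    return [[w * p for p in d] for w, d in zip(weights, dists)]


def is_independent(weights, dists):
    """True iff Pr[(i, x)] = Pr[i] * Pr[x] for every pair (i, x)."""
    J = joint(weights, dists)
    n = len(dists[0])
    # Marginal of x; the marginal of i is w_i because each Di sums to 1.
    px = [sum(row[x] for row in J) for x in range(n)]
    return all(
        J[i][x] == weights[i] * px[x]
        for i in range(len(weights))
        for x in range(n)
    )


w = [F(1, 3), F(2, 3)]
same = [[F(1, 4), F(3, 4)], [F(1, 4), F(3, 4)]]
different = [[F(1, 4), F(3, 4)], [F(3, 4), F(1, 4)]]
print(is_independent(w, same), is_independent(w, different))
```

The reason is that independence forces w_i * Di(x) = w_i * Pr[x] for all i and x, so every Di must equal the common marginal of x.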
Results
Def: (D1,…,Dm) has the Equivalence property if Di = Di' for all 1 ≤ i, i' ≤ m.
         Lower Bound            Upper Bound
n > m    Ω(n^{2/3} m^{1/3})     Õ(n^{2/3} m^{1/3})  (unknown weights)
m > n    Ω(n^{1/2} m^{1/2})     Õ(n^{1/2} m^{1/2})  (known weights)
Also yields “tight” lower bound for independence testing
Clusterability
Can we cluster distributions s.t. in each cluster, distributions are (very) close?
Sample complexity of test is O(k n^{2/3}) for n = domain size, k = number of clusters
No dependence on number of distributions
Closeness requirement is very stringent
Open Questions
• Clusterability in the sampling model, less stringent notion of close
• Other properties of collections?
• E.g., all distributions are shifts of each other?
Thank you