SURAJIT CHAUDHURI, RAJEEV MOTWANI, VIVEK NARASAYYA
On Random Sampling over Joins
Presented by: Srikantha Nema
Outline
Semantics of Sample
Difficulty of Join Sampling
Algorithms for Sampling
Sampling Strategies
New Strategies for Join Sampling
Experimental Evaluation
Conclusions
Terminology
SAMPLE(R, f) is an SQL operation that returns a random sample of R
R is the relation obtained by evaluating a query Q
f is the sampling fraction, i.e., the fraction of the tuples of R to be sampled
Semantics of Sample
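Sampling with Replacement (WR): f·|R| independent draws, each uniform over R; duplicates are possible
Sampling without Replacement (WoR): f·|R| distinct tuples; every subset of that size is equally likely
Independent Coin Flips (CF): each tuple is included independently with probability f, so the sample size is random with expected value f·|R|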
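The three semantics can be illustrated with a minimal Python sketch over an in-memory list. The function names, the `rng` parameter, and rounding f·|R| to the nearest integer are assumptions made for illustration, not part of the paper.

```python
import random

def sample_wr(R, f, rng=random):
    """WR: round(f * |R|) independent uniform draws; duplicates are possible."""
    r = round(f * len(R))
    return [rng.choice(R) for _ in range(r)]

def sample_wor(R, f, rng=random):
    """WoR: round(f * |R|) distinct tuples; every subset of that size is equally likely."""
    r = round(f * len(R))
    return rng.sample(R, r)

def sample_cf(R, f, rng=random):
    """CF: keep each tuple independently with probability f (size is random)."""
    return [t for t in R if rng.random() < f]

R = [(i, f"t{i}") for i in range(100)]
rng = random.Random(0)
wr = sample_wr(R, 0.1, rng)    # exactly 10 draws, may contain repeats
wor = sample_wor(R, 0.1, rng)  # exactly 10 distinct tuples
cf = sample_cf(R, 0.1, rng)    # about 10 tuples on average
```

Only CF can be evaluated without knowing |R| in advance, which is why the WR and WoR variants need either the relation size or a one-pass technique.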
Difficulty of Join Sampling
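Example: two relations joined on attribute A
$$R_1(A,B) = \{(a_1,b_0), (a_2,b_1), (a_2,b_2), \ldots, (a_2,b_k)\}$$
$$R_2(A,C) = \{(a_2,c_0), (a_1,c_1), (a_1,c_2), \ldots, (a_1,c_k)\}$$
Can $\mathrm{SAMPLE}(R_1 \bowtie R_2, f)$ be computed as $\mathrm{SAMPLE}(R_1, f_1) \bowtie \mathrm{SAMPLE}(R_2, f_2)$?
In general, no. R1 ⋈ R2 has 2k tuples, k with A = a1 and k with A = a2, but every a1 tuple needs (a1, b0) from R1 and every a2 tuple needs (a2, c0) from R2. Sampling the inputs keeps each of these with probability only about f1 (resp. f2), so for small fractions the join of the samples is usually empty or missing one half of the join entirely; and even when f1·f2 = f, the join tuples are not included independently of one another.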
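A small simulation makes the problem concrete. It assumes CF semantics for the input samples and f1 = f2 = 0.1; the helper names are hypothetical, and the nested-loop join is used only for brevity.

```python
import random

def build_example(k):
    # R1(A, B) and R2(A, C) from the example; tuples are (A, B) and (A, C)
    R1 = [("a1", "b0")] + [("a2", f"b{i}") for i in range(1, k + 1)]
    R2 = [("a2", "c0")] + [("a1", f"c{i}") for i in range(1, k + 1)]
    return R1, R2

def join(S1, S2):
    # Nested-loop equi-join on A; each output tuple is (A, B, C)
    return [(a, b, c) for (a, b) in S1 for (a2, c) in S2 if a == a2]

def cf_sample(R, f, rng):
    # Independent coin flips: keep each tuple with probability f
    return [t for t in R if rng.random() < f]

k, f1, f2 = 1000, 0.1, 0.1
R1, R2 = build_example(k)
full = join(R1, R2)  # 2k tuples: k with A = a1, k with A = a2
rng = random.Random(42)
naive = join(cf_sample(R1, f1, rng), cf_sample(R2, f2, rng))
# A CF sample of the join with f = f1 * f2 = 0.01 would hold about 20 tuples
# drawn from both halves; `naive` contains a1 tuples only if ("a1", "b0")
# survived and a2 tuples only if ("a2", "c0") survived.
print(len(full), len(naive))
```

With these fractions, both "hub" tuples are dropped with probability 0.9 × 0.9 = 0.81, in which case the join of the samples is empty.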
Classification of the Join Sampling Problem
Case A: No information is available for either R1 or R2
Case B: No information is available for R1, but indexes and/or statistics are available for R2
Case C: Indexes/statistics are available for both R1 and R2
Algorithms for Sampling
Unweighted sequential WR sampling: Black-Box U1, Black-Box U2
Weighted sequential WR sampling: Black-Box WR1, Black-Box WR2
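The black boxes are sequential (one-pass) WR samplers. As a hedged sketch of the two settings they address (total size or weight known in advance versus unknown), the code below uses two standard techniques: sequential binomial allocation of the r draws, and r independent size-1 reservoirs. The function names are hypothetical, setting all weights to 1 gives the unweighted case, and the code is not claimed to match U1, U2, WR1, or WR2 line for line.

```python
import random

def binomial(n, p, rng):
    # Bin(n, p) as a sum of n Bernoulli(p) trials; adequate for small r
    return sum(rng.random() < p for _ in range(n))

def sequential_wr(tuples, weights, r, rng=random):
    """Known total weight. Emits each tuple as many times as it is drawn.
    Weights are assumed positive."""
    remaining_weight = sum(weights)  # weight of tuples not yet scanned
    remaining_draws = r
    out = []
    for t, w in zip(tuples, weights):
        if remaining_draws == 0:
            break
        # Each outstanding draw lands on t with prob. w / remaining_weight;
        # the last tuple takes all leftover draws (guards float round-off).
        p = w / remaining_weight if remaining_weight > w else 1.0
        x = binomial(remaining_draws, p, rng)
        out.extend([t] * x)
        remaining_draws -= x
        remaining_weight -= w
    return out

def reservoir_wr(stream, r, rng=random):
    """Unknown total weight: `stream` yields (tuple, weight) pairs.
    Keeps r independent size-1 reservoirs; weights are assumed positive."""
    slots = [None] * r
    seen = 0.0  # total weight seen so far
    for t, w in stream:
        seen += w
        for j in range(r):
            # Slot j switches to t with probability w / seen, so at the end
            # it holds each tuple with probability proportional to its weight
            if rng.random() < w / seen:
                slots[j] = t
    return slots
```

`sequential_wr` returns its sample grouped in scan order; shuffle the result if a random output order is required. `reservoir_wr` never needs the total weight, at the cost of r random numbers per input tuple.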
New Strategies for Join Sampling
Strategy Stream-Sample
Strategy Group-Sample
Strategy Frequency-Partition-Sample
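The sketch below assumes Stream-Sample works as a weighted WR sample of R1 followed by index lookups into R2 (our reading of the strategy, stated here as an assumption). It shows why such a scheme yields a uniform WR sample of R1 ⋈ R2: t1 is drawn with probability m2(t1.A)/|R1 ⋈ R2|, where m2(v) is the number of R2 tuples with A = v, and one of its m2(t1.A) partners is then chosen uniformly, so each join tuple has probability 1/|R1 ⋈ R2| per draw. The function name, the join on the first field, and the dictionary standing in for an index are all hypothetical.

```python
import random
from collections import defaultdict

def stream_sample(R1, R2, r, rng=random):
    """Uniform WR sample of size r from R1 join R2 on the first field.
    Assumes R2 can be indexed on A and its frequencies m2(v) are known."""
    index = defaultdict(list)  # stand-in for an index on R2.A
    for t2 in R2:
        index[t2[0]].append(t2)
    # Weight of t1 is m2(t1.A), the number of R2 tuples it joins with
    weights = [len(index[t1[0]]) for t1 in R1]
    # Weighted WR sample of R1; random.choices is used only for brevity
    # (a one-pass weighted sampler could be used on a stream instead)
    S1 = rng.choices(R1, weights=weights, k=r)
    out = []
    for t1 in S1:
        # Pair t1 with a uniformly chosen matching R2 tuple
        t2 = rng.choice(index[t1[0]])
        out.append(t1 + t2[1:])
    return out

R1 = [(1, "x"), (2, "y"), (3, "z")]
R2 = [(1, "p"), (1, "q"), (2, "r")]
print(stream_sample(R1, R2, 5, random.Random(7)))
```

Here (1, "x") carries weight 2 and (2, "y") weight 1, so each of the three join tuples is equally likely; (3, "z") has no partner and is never drawn.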
Conclusions
Difficulty of join sampling
Classification of the problem into 3 cases
Strategies for join sampling
New schemes for sequential random sampling, for both uniform and weighted sampling
More efficient strategies can be developed for the case of a single join
More work is needed to understand the problem of sampling the result of join trees