Kaushik Chakrabarti(Univ Of Illinois) Minos Garofalakis(Bell Labs) Rajeev Rastogi(Bell Labs) Kyuseok Shim(KAIST and AITrc) Presented at 26 th VLDB Conference, Cairo, Egypt Presented By Supriya Sudheendra

Approximate Query Processing using Wavelets

Page 1: Approximate Query Processing using Wavelets

Kaushik Chakrabarti (Univ. of Illinois)
Minos Garofalakis (Bell Labs)
Rajeev Rastogi (Bell Labs)
Kyuseok Shim (KAIST and AITrc)
Presented at the 26th VLDB Conference, Cairo, Egypt

Presented by Supriya Sudheendra

Page 2: Approximate Query Processing using Wavelets

Outline

Page 3: Approximate Query Processing using Wavelets

Introduction
o Approximate query processing is a viable solution for:
    - Huge amounts of data
    - High query complexities
    - Stringent response-time requirements
o Decision Support Systems (DSS)
    - Support business and organizational decision-making activities
    - Help decision makers compile useful information from raw data, solve problems, and make decisions

Page 4: Approximate Query Processing using Wavelets

Introduction…
o DSS users pose very complex queries to the DBMS
    - These queries require complex operations over gigabytes or terabytes of disk-resident data
    - Exact answers take a very long time to compute
    - In a number of scenarios, users prefer a fast, approximate answer

Page 5: Approximate Query Processing using Wavelets

Prior Work
o Previous approximate query processing techniques
    - Focused on specific forms of aggregate queries
    - Data reduction mechanism: how to obtain the synopses of the data
o Sampling-based techniques
    - A join operator applied to two uniform random samples yields a non-uniform sample with very few tuples
    - For non-aggregate queries, sampling produces a small subset of the exact answer, which may be empty when joins are involved

Page 6: Approximate Query Processing using Wavelets

Prior Work…
o Histogram-based techniques
    - Problematic for high-dimensional data
    - Storage overhead
    - High construction cost
o Wavelet-based techniques
    - Wavelets are a mathematical tool for the hierarchical decomposition of functions
    - Apply wavelet decomposition to the input data collection -> data synopsis
    - Avoid high construction costs and storage overhead

Page 7: Approximate Query Processing using Wavelets

Contribution of the Paper
o Viability and effectiveness of wavelets as a generic tool for high-dimensional DSS
o New, I/O-efficient wavelet decomposition algorithm for relational tables
o Novel query processing algebra for wavelet-coefficient data synopses
o Extensive experiments

Page 8: Approximate Query Processing using Wavelets

Background
o Wavelets are a mathematical tool to hierarchically decompose functions
o A function is described by a coarse overall approximation together with detail coefficients that influence the function at various scales
o Haar wavelets are conceptually simple and fast to compute
o Variety of applications, such as image editing and querying

Page 9: Approximate Query Processing using Wavelets

One-Dimensional Haar Wavelets
o How to compute, given a data array:
    - Average the values together pairwise to get a "lower-resolution" representation of the data
    - Detail coefficients -> differences of the averaged values from the computed pairwise average
    - Reconstruction of the data array is possible
    - Why detail coefficients

Page 10: Approximate Query Processing using Wavelets

One-dimensional Haar Wavelets

o Wavelet transform: the overall average followed by the detail coefficients in increasing order of resolution. Each entry -> wavelet coefficient

o WA = [4, -2, 0, -1]

o For vectors containing similar values, most detail coefficients have small values that can be eliminated
    - This introduces only small errors
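
As a rough illustration of the pairwise averaging and differencing described above, here is a minimal Python sketch. The function names `haar_decompose` and `haar_reconstruct` are hypothetical, and the input `[2, 2, 5, 7]` is an assumption: the slide does not show the data array, and this one was chosen because it reproduces WA = [4, -2, 0, -1].

```python
def haar_decompose(data):
    """Return [overall average] + detail coefficients, coarsest level first."""
    values = list(data)
    if not values or len(values) & (len(values) - 1):
        raise ValueError("length must be a power of two")
    details = []
    while len(values) > 1:
        pairs = list(zip(values[0::2], values[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        # (a - b) / 2 equals the pairwise average minus the second value.
        level_details = [(a - b) / 2 for a, b in pairs]
        details = level_details + details  # coarser levels go in front
        values = averages
    return values + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose by re-expanding one resolution level at a time."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level_details = coeffs[pos:pos + len(values)]
        pos += len(values)
        expanded = []
        for avg, det in zip(values, level_details):
            expanded += [avg + det, avg - det]
        values = expanded
    return values


print(haar_decompose([2, 2, 5, 7]))      # [4.0, -2.0, 0.0, -1.0]
print(haar_reconstruct([4, -2, 0, -1]))  # [2, 2, 5, 7]
```

The round trip shows why the detail coefficients are kept: together with the overall average they are enough to rebuild the original array exactly.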

Page 11: Approximate Query Processing using Wavelets

One-dimensional Haar Wavelets
o The overall average is more important than any detail coefficient
o To normalize the final entries of WA, each wavelet coefficient is divided by √(2^l)
    - l: level of resolution (l = 0 is the coarsest level)
    - WA = [4, -2, 0, -1/√2]
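
The normalization step can be sketched as follows. This is only an illustration under stated assumptions: the divisor is taken to be the square root of 2^l, with l = 0 for the coarsest level (the overall average and the first detail coefficient), and the function name `normalize` is hypothetical.

```python
import math


def normalize(coeffs):
    """Divide each coefficient by sqrt(2**l), where l is its resolution level."""
    normalized = [coeffs[0]]  # overall average, treated as level 0
    level, pos = 0, 1
    while pos < len(coeffs):
        count = 2 ** level             # detail coefficients at this level
        scale = math.sqrt(2 ** level)  # assumed normalization factor
        normalized += [c / scale for c in coeffs[pos:pos + count]]
        pos += count
        level += 1
    return normalized


normalized = normalize([4, -2, 0, -1])
# The last entry is -1 / sqrt(2); the level-0 entries keep their values.
```

After normalization, coefficients at finer levels are scaled down, so comparing absolute values gives a fairer picture of which coefficients can be dropped.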

Page 12: Approximate Query Processing using Wavelets

Multi-dimensional Haar Wavelets
o Haar wavelets can be extended to multi-dimensional arrays
    - Standard decomposition
        - Fix an ordering for the data dimensions (1, 2, …, d)
        - Apply the complete 1-D wavelet transform to each 1-D row of array cells along dimension k, for k = 1, …, d
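
A minimal sketch of the standard decomposition, restricted to two dimensions for brevity (the slide describes the general d-dimensional case). The helper names `haar_1d` and `standard_decompose_2d` are hypothetical, and the input is assumed to be a list of equal-length rows whose side lengths are powers of two.

```python
def haar_1d(values):
    """Complete 1-D Haar transform: overall average, then details by level."""
    values = list(values)
    details = []
    while len(values) > 1:
        pairs = list(zip(values[0::2], values[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        values = [(a + b) / 2 for a, b in pairs]
    return values + details


def standard_decompose_2d(array):
    """Apply the full 1-D transform along dimension 1, then along dimension 2."""
    rows = [haar_1d(row) for row in array]       # transform every row
    cols = [haar_1d(col) for col in zip(*rows)]  # then every column
    return [list(row) for row in zip(*cols)]     # transpose back


print(standard_decompose_2d([[1, 2], [3, 4]]))  # [[2.5, -0.5], [-1.0, 0.0]]
```

The top-left entry is the overall average of the array, mirroring the role of the first entry in the 1-D transform.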

Page 13: Approximate Query Processing using Wavelets

Nonstandard Decomposition
    - Alternates between dimensions during successive steps of pairwise averaging and differencing for each 1-D row of array cells along dimension k
    - Repeated recursively on the quadrant containing all averages across all dimensions

Page 14: Approximate Query Processing using Wavelets

Non-standard Decomposition

Pairwise averaging and differencing for one positioning of the 2x2 box with root [2i_1, 2i_2]

Distribution of the results in the wavelet transform array

The process then recurses on the lower-left quadrant of WA
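
The three steps above can be sketched in Python for a square 2-D array. This is an illustration under assumptions, not the paper's exact layout: the function names are hypothetical, the sign convention and the placement of the three detail coefficients within the array are choices made here, and the quadrant holding the averages is the one at the lowest indices (the "lower-left" quadrant when the origin is drawn at the bottom left).

```python
def nonstandard_decompose(array):
    """Nonstandard 2-D Haar decomposition of an n x n array, n a power of two."""
    n = len(array)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in array):
        raise ValueError("array must be square with power-of-two side length")
    wa = [[float(x) for x in row] for row in array]
    size = n
    while size > 1:
        half = size // 2
        out = [row[:] for row in wa]
        for i in range(half):
            for j in range(half):
                # One positioning of the 2x2 box with root [2i, 2j].
                a, b = wa[2 * i][2 * j], wa[2 * i][2 * j + 1]
                c, d = wa[2 * i + 1][2 * j], wa[2 * i + 1][2 * j + 1]
                out[i][j] = (a + b + c + d) / 4                # average
                out[i][j + half] = (a - b + c - d) / 4         # detail
                out[i + half][j] = (a + b - c - d) / 4         # detail
                out[i + half][j + half] = (a - b - c + d) / 4  # detail
        wa = out
        size = half  # recurse on the quadrant that holds the averages
    return wa


def nonstandard_reconstruct(wa):
    """Invert nonstandard_decompose, expanding from the coarsest level."""
    n = len(wa)
    data = [row[:] for row in wa]
    size = 2
    while size <= n:
        half = size // 2
        out = [row[:] for row in data]
        for i in range(half):
            for j in range(half):
                avg, dh = data[i][j], data[i][j + half]
                dv, dd = data[i + half][j], data[i + half][j + half]
                out[2 * i][2 * j] = avg + dh + dv + dd
                out[2 * i][2 * j + 1] = avg - dh + dv - dd
                out[2 * i + 1][2 * j] = avg + dh - dv - dd
                out[2 * i + 1][2 * j + 1] = avg - dh - dv + dd
        data = out
        size *= 2
    return data
```

For a 4 x 4 input, the loop runs twice: once over the four 2x2 boxes of the full array, and once over the single 2x2 box of averages.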

Page 15: Approximate Query Processing using Wavelets

Example Decomposition of a 4 X 4 Array

Page 16: Approximate Query Processing using Wavelets

Multi-dimensional Haar Coefficients: Semantics and Representation
o The d-dimensional Haar basis function corresponding to a wavelet coefficient W is defined by:
    - A d-dimensional rectangular support region
    - Quadrant sign information

Page 17: Approximate Query Processing using Wavelets

Support Regions for 16 Nonstandard 2-D Haar Basis Functions

Blank areas: regions of A whose reconstruction is independent of the coefficient

WA[0,0]: overall average

WA[3,3]: contributes only to the upper-right quadrant

Page 18: Approximate Query Processing using Wavelets

Haar Coefficients: Semantics and Representation
o W = <R, S, v>
    - W.R: the d-dimensional support hyper-rectangle of W; it encloses all cells in A to which W contributes
    - The hyper-rectangle is represented by its low and high boundaries across each dimension j, 1 <= j <= d: W.R.boundary[j].lo and W.R.boundary[j].hi
    - W contributes to each data cell A[i_1, …, i_d] where W.R.boundary[j].lo <= i_j <= W.R.boundary[j].hi for all j

Page 19: Approximate Query Processing using Wavelets

o W.S: sign information for all d-dimensional quadrants of W.R
    - Denoted by W.S.sign[j].lo and W.S.sign[j].hi, corresponding to the lower and upper halves of W.R's extent along dimension j
    - The sign of a quadrant is computed as the product of the d sign-vector entries that map to that quadrant
o W.v: scalar magnitude of W
    - The quantity that W contributes (positively or negatively, depending on W.S) to all data array cells enclosed in W.R
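
A minimal sketch of the W = <R, S, v> representation as a Python data class. The class name `WaveletCoefficient`, its field layout, and the `contribution` method are hypothetical; the sketch also assumes that the upper half of a support interval starts at its midpoint. The 1-D example reuses the coefficients WA = [4, -2, 0, -1] from the earlier slide, with support regions and signs filled in by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WaveletCoefficient:
    boundary: List[Tuple[int, int]]  # W.R: (lo, hi) per dimension
    sign: List[Tuple[int, int]]      # W.S: (sign of lower half, sign of upper half)
    v: float                         # W.v: scalar magnitude

    def contribution(self, cell):
        """Signed amount this coefficient adds to the given data cell."""
        total_sign = 1
        for (lo, hi), (s_lo, s_hi), i in zip(self.boundary, self.sign, cell):
            if not lo <= i <= hi:
                return 0.0  # cell lies outside the support hyper-rectangle
            mid = (lo + hi + 1) // 2  # first index of the upper half
            total_sign *= s_lo if i < mid else s_hi
        return total_sign * self.v


coefficients = [
    WaveletCoefficient([(0, 3)], [(1, 1)], 4),    # overall average
    WaveletCoefficient([(0, 3)], [(1, -1)], -2),
    WaveletCoefficient([(0, 1)], [(1, -1)], 0),
    WaveletCoefficient([(2, 3)], [(1, -1)], -1),
]
cells = [sum(c.contribution((i,)) for c in coefficients) for i in range(4)]
# cells now holds the reconstructed values 2, 2, 5, 7
```

Each cell value is simply the sum of the signed contributions of every coefficient whose support region encloses that cell.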

Page 20: Approximate Query Processing using Wavelets

Building Wavelet Coefficient Synopses
o Relation R with d attributes X_1, X_2, …, X_d
o R can be represented as a d-dimensional array A_R
o The j-th dimension is indexed by the values of attribute X_j
o Each cell contains the count of tuples in R having the corresponding combination of attribute values
o A_R: the joint frequency distribution of all attributes of R
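
A small sketch of how a two-attribute relation maps to its joint frequency array. The function name `joint_frequency_array` and the sample tuples are hypothetical, and attribute values are assumed to be already encoded as integers starting at 0.

```python
def joint_frequency_array(tuples, size_x1, size_x2):
    """Count how many tuples fall into each (X1, X2) cell."""
    array = [[0] * size_x2 for _ in range(size_x1)]
    for x1, x2 in tuples:
        array[x1][x2] += 1
    return array


# Hypothetical relation with two attributes, X1 in 0..2 and X2 in 0..3.
relation = [(0, 1), (0, 1), (2, 3), (1, 0), (2, 3), (2, 3)]
a_r = joint_frequency_array(relation, 3, 4)
print(a_r)  # [[0, 2, 0, 0], [1, 0, 0, 0], [0, 0, 0, 3]]
```

The wavelet decomposition is then applied to this array rather than to the tuples themselves.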

Page 21: Approximate Query Processing using Wavelets

Chunk-based organization of relational tables
o The joint frequency array A_R is split into d-dimensional chunks
o Tuples of R that belong to the same chunk are stored contiguously on disk
o If R is not chunked, one extra pre-processing step is needed to reorganize R on disk

Page 22: Approximate Query Processing using Wavelets

ComputeWavelet Algorithm

When a chunk is loaded for the first time, ComputeWavelet can perform the entire decomposition computation for that chunk

Pairwise averaging and differencing is performed as soon as 2^d averages are accumulated

Memory-efficient: no more than one active sub-array at a time for each level of resolution
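
The memory behaviour described above can be illustrated with a one-dimensional analogue. This is not the ComputeWavelet algorithm itself, which works on d-dimensional chunks and combines 2^d averages at a time; it is a hypothetical sketch (class name `StreamingHaar`) in which values arrive one by one, two averages are combined as soon as both are available, and at most one pending average is held per level.

```python
class StreamingHaar:
    """1-D streaming Haar transform with one pending average per level."""

    def __init__(self):
        self.pending = []  # pending[k]: average waiting for its partner, or None
        self.details = []  # details[k]: detail coefficients produced at depth k

    def push(self, value, depth=0):
        """Feed one value; depth 0 is the finest resolution."""
        if depth == len(self.pending):
            self.pending.append(None)
            self.details.append([])
        if self.pending[depth] is None:
            self.pending[depth] = value  # wait for the second value of the pair
            return
        first = self.pending[depth]
        self.pending[depth] = None
        self.details[depth].append((first - value) / 2)
        self.push((first + value) / 2, depth + 1)  # pass the average upward

    def coefficients(self):
        """Transform of the input so far; valid after 2**m values were pushed."""
        result = [self.pending[-1]]  # the overall average
        for level_details in reversed(self.details):
            result += level_details  # coarsest details first
        return result


stream = StreamingHaar()
for value in [2, 2, 5, 7]:
    stream.push(value)
print(stream.coefficients())  # [4.0, -2.0, 0.0, -1.0]
```

Because each level keeps only a single pending average, the working memory grows with the number of resolution levels rather than with the size of the data.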

Page 23: Approximate Query Processing using Wavelets

Processing Relational Queries in Wavelet Coefficient Domain

[Figure: two ways of obtaining the approximate result relation S from the wavelet-coefficient synopses WT1, WT2, …, WTk]

Wavelet-coefficient domain: WT1, WT2, …, WTk -> Op(WT1, …, WTk) -> RS of wavelet coefficients WS -> Render(WS) -> S

Relational domain: WT1, WT2, …, WTk -> Render(WT1, …, WTk) -> approximate relations T1, T2, …, Tk -> Op(T1, T2, …, Tk) -> S

Page 24: Approximate Query Processing using Wavelets

Selection Operator

Our selection operator has the general form select_pred(W_T), where pred represents a generic conjunctive predicate on a subset of the d attributes in T; that is, pred = (l_{i_1} ≤ X_{i_1} ≤ h_{i_1}) ∧ … ∧ (l_{i_k} ≤ X_{i_k} ≤ h_{i_k}), where l_{i_j} and h_{i_j} denote the low and high boundaries of the selected range along each selection dimension D_{i_j}, j = 1, 2, …, k, k ≤ d.

Page 25: Approximate Query Processing using Wavelets

Selection - Relational Domain

o In the relational domain, we are interested only in the cells inside the query range

o In the wavelet domain, we are interested only in the coefficients that contribute to those cells

Relation

Dim D1 (Attr1)   Dim D2 (Attr2)   Count
0                6                6
1                2                3
1                3                4
1                5                6
1                6                8
2                6                7
3                0                1
4                2                3
5                2                2
6                1                3
6                2                2
6                5                1
6                6                3

[Figure: joint data distribution array over Dim. D1 and Dim. D2, holding the counts above, with the query range marked]
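
The wavelet-domain view of selection on this slide can be sketched as a filter over support regions: a coefficient is kept only if its support hyper-rectangle overlaps the query range, and its boundaries are then clipped to that range. This is a simplified illustration under assumptions. The function name `select`, the tuple layout, and the sample coefficients are hypothetical, dimensions are numbered from 0, and any adjustment of the sign information is left out.

```python
def select(coefficients, query):
    """Keep coefficients whose support overlaps the query range.

    coefficients: list of (boundary, value), boundary = [(lo, hi), ...] per dimension
    query: dict mapping a dimension index to its selected (lo, hi) range
    """
    selected = []
    for boundary, value in coefficients:
        clipped = list(boundary)
        overlaps = True
        for dim, (q_lo, q_hi) in query.items():
            lo, hi = boundary[dim]
            if hi < q_lo or lo > q_hi:
                overlaps = False  # contributes to no cell inside the range
                break
            clipped[dim] = (max(lo, q_lo), min(hi, q_hi))
        if overlaps:
            selected.append((clipped, value))
    return selected


# Hypothetical coefficients over an 8 x 8 array, and a range on dimension 0.
coeffs = [
    ([(0, 7), (0, 7)], 4.0),
    ([(0, 3), (4, 7)], 1.5),
    ([(4, 7), (0, 3)], -2.0),
]
result = select(coeffs, {0: (4, 6)})
# result keeps the first and third coefficients, clipped to 4..6 on dimension 0
```

Coefficients whose support lies entirely outside the range are discarded without ever materializing the underlying cells.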

Page 26: Approximate Query Processing using Wavelets

Projection Operator

Page 27: Approximate Query Processing using Wavelets

Projection- Wavelet Domain

Page 28: Approximate Query Processing using Wavelets

Join Operator

Page 29: Approximate Query Processing using Wavelets

Join Operator- Wavelet Domain

Page 30: Approximate Query Processing using Wavelets

Experimental Study
o Improved answer quality
o Low synopsis construction costs
o Fast query execution

Page 31: Approximate Query Processing using Wavelets

Query Execution Times

Page 32: Approximate Query Processing using Wavelets

SELECT-JOIN-SUM

Page 33: Approximate Query Processing using Wavelets

SELECT Query errors on real-life data

Page 34: Approximate Query Processing using Wavelets

Conclusion
o Multidimensional wavelets are an effective tool for general-purpose approximate query processing in modern, high-dimensional applications

o The query processing algorithms operate directly on the wavelet-coefficient synopses of relational data, thus allowing for very fast processing of arbitrarily complex queries entirely in the wavelet-coefficient domain

o Extensive experimental study with synthetic as well as real-life data sets that verifies the effectiveness of the wavelet-based approach compared to both sampling and histograms

Page 35: Approximate Query Processing using Wavelets