qCube: Efficient integration of range query operators over a high dimension data cube


DESCRIPTION

Presents a new cube approach designed for high-dimension range queries. Our approach, named Query Cube (qCube), implements the Equal, Not Equal, Greater or Less Than, Some, Between and Similar range query operators, and the Distinct, Sub-cube and Top-k Similar inquire query operators.


  • qCube: Efficient integration of range query operators over a high dimension data cube. Rodrigo Rocha Silva, Doctorate Student; Prof. Dr. Celso Massaki Hirata, Advisor; Prof. Dr. Joubert de Castro Lima, Co-Advisor. ITA, Instituto Tecnológico de Aeronáutica, Electronic Engineering and Computer Science Division (EEC/I), Department of Computer Science, Brazil
  • Goal: Present a new cube approach designed for high-dimension range queries. Our approach, named Query Cube (qCube), implements the Equal, Not Equal, Greater or Less Than, Some, Between and Similar range query operators, and the Distinct, Sub-cube and Top-k Similar inquire query operators. (Wednesday, October 02, 2012, 28º Simpósio Brasileiro de Banco de Dados, Rodrigo Rocha Silva)
  • Topics: Motivation; Data Cube; Related Work; Query Cube (qCube); Experiments; Results; Conclusions
  • Motivation: Users need to view data in a tangible form, such as reports, cross tables and histograms.
  • Motivation: Suppose that a decision-making process requires the following information: (1) "What is the variance of the impact of journal research papers by women, for months {1, 3, 5, 7, 11}, year 2012 and ages from 25 to 40? Return results for all countries." (2) "The average temperatures above 30 degrees Celsius on the weekends of leap years in the last 200 years."
  • Data Cube: A data cube, introduced by Gray et al. (1996), generalizes the group-by operator over all possible combinations of dimensions, with aggregates at various granularities.
  • Data Cube: A data cube has exponential complexity with respect to the number of dimensions. For an input with d dimensions, the cube has 2^d group-bys (cuboids).
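Each cuboid keeps some subset of the d dimensions and replaces the rest with *. Choosing k grouped dimensions can be done in \binom{d}{k} ways, and summing over k gives the total below. This assumes flat dimensions with no hierarchies. For d = 3 it yields the 8 cuboids (), A, B, C, AB, AC, BC and ABC of the 3-D example that follows.

```latex
N_{\text{cuboids}}(d) = \sum_{k=0}^{d} \binom{d}{k} = 2^{d},
\qquad
N_{\text{cuboids}}(3) = 1 + 3 + 3 + 1 = 8
```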
  • Data Cube: Hierarchies (figure; level labels include Hour, Day, Year, Department and Discipline)
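Hierarchies make the count grow further. In the standard formula (as given, for example, in Han and Kamber), L_i is the number of levels of dimension i, excluding the virtual top level "all". With no hierarchies every L_i = 1, and the formula reduces to the 2^d above.

As a purely hypothetical illustration using labels from the figure, take a dimension with levels Hour, Day and Year (L_1 = 3) and another with levels Department and Discipline (L_2 = 2). Together they give 12 cuboids instead of 4.

```latex
T = \prod_{i=1}^{d} (L_i + 1),
\qquad
L_i = 1 \;\forall i \;\Rightarrow\; T = 2^{d},
\qquad
(3 + 1)(2 + 1) = 12
```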
  • Data Cube: example with a base relation R of 11 tuples over dimensions A, B and C, and the full 3-D cube computed from it with COUNT (38 tuples).
    Base relation R (each tuple has COUNT = 1):
    a1 b1 c1; a1 b1 c2; a1 b3 c2; a2 b1 c1; a2 b1 c2; a2 b2 c1; a2 b2 c2; a2 b3 c2; a3 b1 c1; a3 b1 c2; a3 b3 c2
    Full 3-D cube, listed by cuboid (cell = COUNT):
    ()     * * * = 11
    (A)    a1 * * = 3; a2 * * = 5; a3 * * = 3
    (B)    * b1 * = 6; * b2 * = 2; * b3 * = 3
    (C)    * * c1 = 4; * * c2 = 7
    (AB)   a1 b1 * = 2; a1 b3 * = 1; a2 b1 * = 2; a2 b2 * = 2; a2 b3 * = 1; a3 b1 * = 2; a3 b3 * = 1
    (AC)   a1 * c1 = 1; a1 * c2 = 2; a2 * c1 = 2; a2 * c2 = 3; a3 * c1 = 1; a3 * c2 = 2
    (BC)   * b1 c1 = 3; * b1 c2 = 3; * b2 c1 = 1; * b2 c2 = 1; * b3 c2 = 3
    (ABC)  the 11 tuples of R, each with COUNT = 1
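The example can be checked with a brute-force sketch of the cube operator. It runs one group-by for every subset of dimensions over R and counts the tuples in each cell, with "*" marking an aggregated dimension. This is only an illustration of the definition, not how qCube or any real cubing algorithm computes the cube. The function name `full_cube` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Base relation R from the example: 11 tuples over dimensions (A, B, C).
R = [
    ("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b3", "c2"),
    ("a2", "b1", "c1"), ("a2", "b1", "c2"), ("a2", "b2", "c1"),
    ("a2", "b2", "c2"), ("a2", "b3", "c2"), ("a3", "b1", "c1"),
    ("a3", "b1", "c2"), ("a3", "b3", "c2"),
]


def full_cube(rows, ndims):
    """Return {cell: COUNT} over all 2**ndims cuboids ('*' = aggregated)."""
    cube = Counter()
    for k in range(ndims + 1):
        # Each k-subset of dimension positions is one group-by (cuboid).
        for kept in combinations(range(ndims), k):
            for row in rows:
                cell = tuple(v if i in kept else "*" for i, v in enumerate(row))
                cube[cell] += 1
    return cube


cube = full_cube(R, 3)
print(len(cube))                  # 38 non-empty cells
print(cube[("*", "*", "*")])      # 11
print(cube[("a2", "*", "c2")])    # 3
```

Because only non-empty cells are created, the 38 cells match the slide's count. Empty combinations such as (a1, b2, *) never appear.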
  • Related Work: Frag-Cubing approach. It partitions the data vertically and reduces a high-dimensional cube to a set of lower-dimensional cubes. The reduction is lossless. It offers trade-offs between the amount of pre-processing and the speed of online computation. (From Han and Kamber, Data Mining: Concepts and Techniques)
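Counting materialized cuboids shows the size side of this trade-off. The sketch below makes one assumption: dimensions are cut into consecutive fragments of at most f dimensions. Each fragment of size s stores its 2^s − 1 non-empty cuboids, which matches the 7 cuboids the slides list for fragment ABC. The function names are hypothetical.

```python
def full_cube_cuboids(d):
    """Cuboids in a full cube over d flat dimensions: every subset of them."""
    return 2 ** d


def shell_fragment_cuboids(d, f):
    """Cuboids materialized when d dimensions are cut into fragments of size <= f.

    A fragment of size s stores 2**s - 1 cuboids
    (for fragment ABC: A, B, C, AB, AC, BC, ABC).
    """
    total = 0
    for start in range(0, d, f):
        size = min(f, d - start)
        total += 2 ** size - 1
    return total


# The 5-D example split into (A, B, C) and (D, E): 7 + 3 cuboids.
print(shell_fragment_cuboids(5, 3), full_cube_cuboids(5))
# A 60-D input with fragments of size 3: 20 fragments x 7 cuboids.
print(shell_fragment_cuboids(60, 3), full_cube_cuboids(60))
```

The price for this compact shell is paid at query time. Cells spanning several fragments must be assembled online.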
  • Related Work: Frag-Cubing example. Let the cube aggregation function be count.
    tid  A   B   C   D   E
    1    a1  b1  c1  d1  e1
    2    a1  b2  c1  d2  e1
    3    a1  b2  c1  d1  e2
    4    a2  b1  c1  d1  e2
    5    a2  b1  c1  d1  e3
    Divide the 5 dimensions into 2 shell fragments: (A, B, C) and (D, E). (From Han and Kamber, Data Mining: Concepts and Techniques)
  • Related Work: Frag-Cubing 1-D inverted indices. Build a traditional inverted index (RID list) for each attribute value:
    Attribute value  TID list    List size
    a1               1 2 3       3
    a2               4 5         2
    b1               1 4 5       3
    b2               2 3         2
    c1               1 2 3 4 5   5
    d1               1 3 4 5     4
    d2               2           1
    e1               1 2         2
    e2               3 4         2
    e3               5           1
    (From Han and Kamber, Data Mining: Concepts and Techniques)
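The inverted index can be built with a single scan of the base table. The minimal sketch below maps every attribute value to the sorted list of TIDs containing it, reproducing the table above. It relies on one assumption: attribute values are globally unique (a1, b1, ...), so a single flat dictionary suffices. The name `inverted_index` is hypothetical.

```python
from collections import defaultdict

# Base table of the Frag-Cubing example: tid -> values of A, B, C, D, E.
BASE = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}


def inverted_index(base):
    """Map each attribute value to the ascending list of TIDs that contain it."""
    index = defaultdict(list)
    for tid in sorted(base):          # ascending scan keeps TID lists sorted
        for value in base[tid]:
            index[value].append(tid)
    return dict(index)


idx = inverted_index(BASE)
for value in sorted(idx):
    print(value, idx[value], len(idx[value]))
```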
  • Related Work: Frag-Cubing approach. Generalize the 1-D inverted indices to multi-dimensional ones in the data cube sense. Compute all cuboids for the fragment cubes ABC and DE while retaining the inverted indices. For example, shell fragment cube ABC contains 7 cuboids: A, B, C; AB, AC, BC; ABC. This completes the offline computation stage.
    Cell    Intersection              TID list  List size
    a1 b1   {1, 2, 3} ∩ {1, 4, 5}     {1}       1
    a1 b2   {1, 2, 3} ∩ {2, 3}        {2, 3}    2
    a2 b1   {4, 5} ∩ {1, 4, 5}        {4, 5}    2
    a2 b2   {4, 5} ∩ {2, 3}           {}        0
    (From Han and Kamber, Data Mining: Concepts and Techniques)
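The offline stage can be sketched by intersecting TID sets. For each cuboid of a fragment, the sketch starts from the cells of its first dimension and refines them by intersecting with the TID sets of each further dimension. The input is the fragment (A, B, C) from the example. Empty cells are kept, as in the a2 b2 row above. The data layout and the name `fragment_cube` are assumptions for illustration.

```python
from itertools import combinations

# 1-D inverted indices of shell fragment (A, B, C): dimension -> value -> TIDs.
FRAGMENT_ABC = {
    "A": {"a1": {1, 2, 3}, "a2": {4, 5}},
    "B": {"b1": {1, 4, 5}, "b2": {2, 3}},
    "C": {"c1": {1, 2, 3, 4, 5}},
}


def fragment_cube(fragment):
    """Materialize every non-empty cuboid of one fragment as {cell: TID set}."""
    dims = sorted(fragment)
    cube = {}
    for k in range(1, len(dims) + 1):
        for group in combinations(dims, k):
            # Cells of the first dimension, then refine dimension by dimension.
            cells = {(v,): tids for v, tids in fragment[group[0]].items()}
            for dim in group[1:]:
                cells = {
                    cell + (v,): tids & other
                    for cell, tids in cells.items()
                    for v, other in fragment[dim].items()
                }
            cube[group] = cells
    return cube


cube = fragment_cube(FRAGMENT_ABC)
print(len(cube))                                 # 7 cuboids
print(sorted(cube[("A", "B")][("a1", "b2")]))    # [2, 3]
print(cube[("A", "B")][("a2", "b2")])            # set()
```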
  • Related Work: Frag-Cubing measure table. If measures other than count are present, store them in an ID_measure table, separate from the shell fragments:
    tid  count  sum
    1    5      70
    2    3      10
    3    8      20
    4    5      40
    5    2      30
    (From Han and Kamber, Data Mining: Concepts and Techniques)
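Once a cell's TID list is known, its measures come from the ID_measure table. In this sketch, count and sum are combined by summation over the TIDs, and an average is derived from them. Using avg as the derived measure is an illustrative assumption, and the name `aggregate` is hypothetical. For cell (a1, b2), the TID list {2, 3} gives count 3 + 8 = 11 and sum 10 + 20 = 30.

```python
# ID_measure table from the example: tid -> (count, sum).
ID_MEASURE = {1: (5, 70), 2: (3, 10), 3: (8, 20), 4: (5, 40), 5: (2, 30)}


def aggregate(tids, measure=ID_MEASURE):
    """Combine per-tuple count/sum over a TID list; derive avg = sum / count."""
    count = sum(measure[t][0] for t in tids)
    total = sum(measure[t][1] for t in tids)
    avg = total / count if count else None   # undefined for an empty cell
    return count, total, avg


print(aggregate([2, 3]))   # cell (a1, b2)
print(aggregate([]))       # cell (a2, b2): (0, 0, None)
```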
  • Related Work: Frag-Cubing query. Given the fragment cubes, process a query as follows:
    1. Divide the query into fragments that match the shell fragments.
    2. Fetch the corresponding TID list for each fragment from the fragment cube.
    3. Intersect the TID lists from each fragment to construct the instantiated base table.
    4. Compute the data cube from the instantiated base table with any cubing algorithm.
    (From Han and Kamber, Data Mining: Concepts and Techniques)
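The four steps can be sketched on the 5-D example. The query format is an assumption: a dictionary mapping each dimension to a value (instantiated) or to "?" (inquired), with unmentioned dimensions aggregated. Two further simplifications apply:

- Step 2 intersects the fragment's 1-D lists on the fly instead of reading a precomputed fragment-cube cell.
- Step 4 uses a brute-force COUNT cube, standing in for "any cubing algorithm".

The name `answer` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# 1-D inverted indices of the example, grouped by shell fragment.
FRAGMENTS = [
    {"A": {"a1": {1, 2, 3}, "a2": {4, 5}},
     "B": {"b1": {1, 4, 5}, "b2": {2, 3}},
     "C": {"c1": {1, 2, 3, 4, 5}}},
    {"D": {"d1": {1, 3, 4, 5}, "d2": {2}},
     "E": {"e1": {1, 2}, "e2": {3, 4}, "e3": {5}}},
]
ALL_TIDS = {1, 2, 3, 4, 5}


def answer(query):
    """query: dimension -> value (instantiated) or "?" (inquired)."""
    lists = []
    for fragment in FRAGMENTS:
        # Step 1: the part of the query that falls inside this fragment.
        part = [(d, v) for d, v in query.items() if d in fragment and v != "?"]
        if part:
            # Step 2: this fragment's TID list.
            lists.append(set.intersection(*(fragment[d][v] for d, v in part)))
    # Step 3: intersect across fragments -> TIDs of the instantiated base table.
    tids = set.intersection(ALL_TIDS, *lists)
    inquired = sorted(d for d, v in query.items() if v == "?")
    index = {d: values for f in FRAGMENTS for d, values in f.items()}
    table = [tuple(next(v for v, s in index[d].items() if t in s) for d in inquired)
             for t in sorted(tids)]
    # Step 4: cube the small table (brute-force COUNT over every group-by).
    cube = Counter()
    for k in range(len(inquired) + 1):
        for kept in combinations(range(len(inquired)), k):
            for row in table:
                cube[tuple(v if i in kept else "*" for i, v in enumerate(row))] += 1
    return sorted(tids), table, cube


tids, table, cube = answer({"A": "a1", "D": "d1", "B": "?", "E": "?"})
print(tids, table)   # [1, 3] [('b1', 'e1'), ('b2', 'e2')]
```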
  • Related Work: Frag-Cubing approach (figure: base table over dimensions A to N, and the online computation stage). (From Han and Kamber, Data Mining: Concepts and Techniques)
  • qCube Approach: qCube stores a set of tuple identifiers (TIDs) for each dimension attribute value, similar to Frag-Cubing. Therefore, qCube can answer point queries using TID intersections, and range queries using unions plus intersections, regardless of the measure function type. Frag-Cubing implements only point queries and some inquire queries. It has no solution for queries such as "What is the variance of the impact of journal research papers by women, for months {1, 3, 5, 7, 11}, year 2012 and ages from 25 to 40? Return results for all countries."
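The example query can be answered with unions and intersections of TID sets. Each range predicate (Some on month, Between on age) becomes the union of the TID sets of its matching values. The predicates of different dimensions are then intersected, and any measure function can be applied to the surviving TIDs. Population variance is used here, grouped by country. This is a minimal sketch under stated assumptions, not qCube's actual implementation: the toy tuples are invented, and the dimension names and the helpers `build_index` and `union` are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical toy tuples: tid -> (gender, month, year, age, country, impact).
ROWS = {
    1: ("F", 1, 2012, 30, "BR", 2.0),
    2: ("F", 3, 2012, 41, "BR", 5.0),
    3: ("F", 5, 2012, 25, "BR", 4.0),
    4: ("M", 7, 2012, 33, "US", 3.0),
    5: ("F", 11, 2012, 40, "US", 1.0),
    6: ("F", 2, 2012, 28, "US", 9.0),
    7: ("F", 7, 2011, 35, "US", 6.0),
    8: ("F", 7, 2012, 35, "US", 5.0),
}
DIMS = ("gender", "month", "year", "age", "country")   # impact is the measure


def build_index(rows):
    """One TID set per (dimension, value), like Frag-Cubing's 1-D indices."""
    index = {d: defaultdict(set) for d in DIMS}
    for tid, row in rows.items():
        for d, v in zip(DIMS, row):   # zip stops before the measure column
            index[d][v].add(tid)
    return index


def union(index, dim, predicate):
    """Range predicate on one dimension: union of TID sets of matching values."""
    return set().union(*(t for v, t in index[dim].items() if predicate(v)))


idx = build_index(ROWS)
tids = (idx["gender"]["F"]                                       # Equal
        & union(idx, "month", lambda m: m in {1, 3, 5, 7, 11})   # Some
        & idx["year"][2012]                                      # Equal
        & union(idx, "age", lambda a: 25 <= a <= 40))            # Between

by_country = defaultdict(list)   # "return results for all countries"
for tid in sorted(tids):
    by_country[ROWS[tid][4]].append(ROWS[tid][5])
result = {c: pvariance(vals) for c, vals in by_country.items()}
print(sorted(tids), result)
```

Only TID sets are combined, so the measure is fetched last. This is why the approach does not depend on whether the measure is distributive, algebraic or holistic.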
  • qCube Approach: implements the range query operators Equal, Not Equal, Greater or Less Than, Some, Between and Similar, and the inquire query operators Distinct, Sub-cube and Top-k Similar, over a high-dimension data cube.
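Several of these operators can be read as set expressions over a single dimension's inverted index. The sketch below rests on a few assumptions:

- Not Equal is taken as the complement within that dimension.
- Greater and Less are strict comparisons.
- Distinct returns the values of a dimension that occur in a given TID set.

Similar, Sub-cube and Top-k Similar are left out because their semantics are not defined in this section. The data and the names `select` and `distinct` are hypothetical.

```python
# Hypothetical 1-D index for an "age" dimension: value -> TID set.
AGE = {25: {3}, 28: {6}, 30: {1}, 33: {4}, 35: {7, 8}, 40: {5}, 41: {2}}


def select(index, op, arg):
    """Return the TID set selected by one range operator on a single dimension."""
    def matching(pred):
        return set().union(*(t for v, t in index.items() if pred(v)))

    if op == "equal":
        return set(index.get(arg, set()))
    if op == "not_equal":
        return set().union(*index.values()) - index.get(arg, set())
    if op == "greater":
        return matching(lambda v: v > arg)
    if op == "less":
        return matching(lambda v: v < arg)
    if op == "some":                      # arg: a collection of values
        return matching(lambda v: v in arg)
    if op == "between":                   # arg: inclusive (low, high)
        low, high = arg
        return matching(lambda v: low <= v <= high)
    raise ValueError(f"unsupported operator: {op}")


def distinct(index, tids):
    """Distinct values of this dimension among the given TIDs."""
    return sorted(v for v, t in index.items() if t & tids)


print(select(AGE, "between", (30, 35)))   # {1, 4, 7, 8}
print(distinct(AGE, {1, 2, 7}))           # [30, 35, 41]
```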
  • qCube: Efficient integration of range