Compressing Relations And Indexes

Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft

Department of Computer Sciences, University of Wisconsin-Madison

June 18, 1997

Agenda

• Introduction
• Compressing A Relation
• Compression Applied to Rectangle Based Indexes
• Performance Evaluation
• Questions and Remarks

Introduction

• Page level Compression
• Performance Study
• Application to B-trees and R-trees
• Multidimensional bulk loading algorithm

Compressing A Relation

• Frames Of Reference
• Non numeric attributes
• File level compression

Frames of Reference
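
As an illustration of the frame-of-reference idea named on this slide, here is a minimal sketch. It assumes integer attributes and a per-page frame made of each attribute's minimum and maximum, so that every value can be stored as a small offset from the minimum. The names `encode_page` and `decode_row` and the in-memory layout are assumptions for illustration, not the authors' page format.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def encode_page(rows: Sequence[Row]) -> Tuple[List[int], List[int], List[Row]]:
    """Encode one page of integer tuples against a per-page frame of reference."""
    dims = len(rows[0])
    mins = [min(r[d] for r in rows) for d in range(dims)]
    maxs = [max(r[d] for r in rows) for d in range(dims)]
    # Bits needed to represent the largest offset in each dimension.
    widths = [(hi - lo).bit_length() for lo, hi in zip(mins, maxs)]
    # Each value is stored as its distance from the page minimum.
    offsets = [tuple(r[d] - mins[d] for d in range(dims)) for r in rows]
    return mins, widths, offsets


def decode_row(mins: Sequence[int], offset: Row) -> Row:
    """Rebuild a single tuple; no other tuple on the page is needed."""
    return tuple(m + o for m, o in zip(mins, offset))


page = [(1000, 7), (1003, 7), (1001, 9)]
mins, widths, offsets = encode_page(page)
restored = [decode_row(mins, o) for o in offsets]
```

In this example each attribute needs only 2 bits per value once the frame is known, and each tuple can be decoded on its own from the page frame.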

Lossy Compression

Point approximation in lossy compression

Compressing an indexing structure

• Compressing a B-tree
• Compressing a rectangle based indexing structure
• Compression oriented Bulk Loading

Rectangle Based indexing qualities

Changing the frame of reference

Bulk-Loading Algorithm

Input. A set of points in some n-dimensional space.

Output. A partition of the input into subsets.

Requirements. The partition should group points that are close to each other in the same group as much as possible.
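
The input/output contract above can be sketched with a simple recursive split. This is a generic kd-tree-style median split, swapped in for illustration only; it is not the GB-Pack algorithm. The function name `partition` and the `capacity` parameter (maximum points per group, at least 1) are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def partition(points: Sequence[Point], capacity: int) -> List[List[Point]]:
    """Split points into groups of at most `capacity`, keeping near points together."""
    if len(points) <= capacity:
        return [list(points)]
    dims = len(points[0])

    def spread(d: int) -> float:
        # Extent of the point set along dimension d.
        return max(p[d] for p in points) - min(p[d] for p in points)

    # Cut along the dimension where the points are most spread out.
    d = max(range(dims), key=spread)
    ordered = sorted(points, key=lambda p: p[d])
    mid = len(ordered) // 2
    return partition(ordered[:mid], capacity) + partition(ordered[mid:], capacity)


pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
groups = partition(pts, capacity=3)
```

With the six sample points, the two clusters end up in separate groups, and every input point appears in exactly one group.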

GB-Pack compression oriented bulk loading

Qualities:
• Trading off some tree quality for increased compression.
• Number of entries per page is data-dependent.
• Cutting a dimension at a value boundary in the data.
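
One possible reading of "cutting at a value boundary" is that a cut along a dimension never separates two equal values. The sketch below illustrates only that property under this assumption; it is not GB-Pack itself, and the name `cut_at_value_boundary` and the `target` parameter are hypothetical.

```python
from typing import Optional, Sequence


def cut_at_value_boundary(values: Sequence[int], target: int) -> Optional[int]:
    """Return the cut index nearest `target` that lies between two distinct values.

    `values` must be sorted ascending. Returns None when every value is equal,
    because no boundary exists in that case.
    """
    boundaries = [i for i in range(1, len(values)) if values[i - 1] != values[i]]
    if not boundaries:
        return None
    return min(boundaries, key=lambda i: abs(i - target))


values = [1, 1, 1, 2, 2, 2, 2, 3]
cut = cut_at_value_boundary(values, target=len(values) // 2)
left, right = values[:cut], values[cut:]
```

Here the ideal midpoint would split the run of 2s, so the cut moves to the nearest boundary and the two sides come out with different sizes, which is one way the number of entries per group becomes data-dependent.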

Performance Evaluation

• Relational Compression Experiments.
• CPU vs. I/O Costs.
• Comparison With Techniques in commercial systems.
• Importance of Tuple-Level Decompression.
• R-tree Compression Experiments.

Synthetic Data Sets

• Size: The number of tuples in the relation.
• Dimensionality: The number of attributes of the relation.
• Range: The range of values for the attributes.
• Distribution: uniform (worst case) / exponential.
• Partition Strategy.
• Page size.
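
A data set driven by the first four parameters above could be generated along these lines. This is a hedged sketch: the function name `synthetic_relation`, the `seed` argument, and the choice of exponential mean (one tenth of the range width, clamped to the range) are assumptions, not the settings used in the experiments. Partition strategy and page size apply to the loading step and are not modeled here.

```python
import random
from typing import List, Tuple


def synthetic_relation(
    size: int,
    dimensionality: int,
    value_range: Tuple[int, int],
    distribution: str = "uniform",
    seed: int = 0,
) -> List[Tuple[int, ...]]:
    """Generate `size` integer tuples with `dimensionality` attributes each."""
    rng = random.Random(seed)
    lo, hi = value_range

    def draw() -> int:
        if distribution == "uniform":
            return rng.randint(lo, hi)
        if distribution == "exponential":
            # Assumed mean: one tenth of the range width; values are clamped to hi.
            mean = max((hi - lo) / 10, 1e-9)
            return min(hi, lo + int(rng.expovariate(1.0 / mean)))
        raise ValueError(f"unknown distribution: {distribution}")

    return [tuple(draw() for _ in range(dimensionality)) for _ in range(size)]


relation = synthetic_relation(100, 3, (0, 1000), "exponential")
```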

Sales Data Set

Sales data set. Compression Achieved versus dimensionality

CPU vs. I/O Costs

R-tree Compression Experiments

Testing the quality of R-trees on Sales Data Set.

Questions And Remarks