This presentation was prepared for the presentation of the thesis titled "Data Compression for Large Multidimensional Data Warehouses", completed in partial fulfillment of the undergraduate degree in the Dept. of CSE, KUET, Bangladesh.
Data Compression for Large Multidimensional Data Warehouses
Supervisor: Dr. K.M. Azharul Hasan, Associate Professor, Head of the Department, Department of CSE, KUET
Presented by: Abdullah Al Mahmud, Roll: 0507006; Md. Mushfiqur Rahman, Roll: 0507029
This slide was prepared by Abdullah Al Mahmud for the presentation of the thesis completed in partial fulfillment of the undergraduate degree at Khulna University of Engineering & Technology (KUET), Bangladesh.
Presentation Layout
Objectives
Existing Compression Schemes
Traditional Extendible Array
Proposed Compression Scheme EXCS (Extendible Array Based Compression Scheme)
Comparative Analysis
Conclusion
Abdullah Al Mahmud, Student ID: 0507006, Dept. of CSE, KUET, Bangladesh
Objectives
Data compression technology reduces the effective price of logical data storage capacity and improves query performance.
Multidimensional arrays are widely used in a large number of scientific research applications.
An efficient compression scheme for multidimensional arrays can handle the large multidimensional data sets of data warehouses.
Existing Compression Schemes (1/3)
Bitmap compression
Run Length Encoding
Header compression
Compressed Column Storage
Compressed Row Storage
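Two of the classical schemes listed above, bitmap compression and run length encoding, can be sketched in a few lines. This is a minimal illustration, not the thesis code: it works on a linearized (one-dimensional) array, assumes 0 marks an empty cell, and the function names are hypothetical. A production bitmap would pack bits into machine words rather than a Python list.

```python
from typing import List, Tuple


def bitmap_compress(data: List[float]) -> Tuple[List[int], List[float]]:
    """Bitmap scheme: one bit per cell flags a non-empty (non-zero) cell;
    only the non-empty values are stored, in cell order."""
    bitmap = [1 if v != 0 else 0 for v in data]
    values = [v for v in data if v != 0]
    return bitmap, values


def bitmap_decompress(bitmap: List[int], values: List[float]) -> List[float]:
    """Walk the bitmap, pulling the next stored value for each set bit."""
    it = iter(values)
    return [next(it) if bit else 0 for bit in bitmap]


def rle_compress(data: List[float]) -> List[Tuple[float, int]]:
    """Run length encoding: each run of equal values becomes (value, length)."""
    runs: List[Tuple[float, int]] = []
    for v in data:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs


def rle_decompress(runs: List[Tuple[float, int]]) -> List[float]:
    """Expand every (value, length) pair back into cells."""
    out: List[float] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


cells = [0, 0, 7, 0, 0, 0, 3, 3]
print(bitmap_compress(cells))  # ([0, 0, 1, 0, 0, 0, 1, 1], [7, 3, 3])
print(rle_compress(cells))     # [(0, 2), (7, 1), (0, 3), (3, 2)]
```

Changing a single cell inside a run can split that run into three, which is why run-based layouts have to be readjusted on update.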
Existing Compression Schemes (2/3)
Figure: (a) A sparse array. (b) The CRS scheme.
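The CRS scheme shown in the figure can be sketched as follows. This is an illustrative sketch under the usual CRS conventions, which we assume here: RO holds row offsets (one more entry than there are rows), CO the column index of each non-zero, and VL the non-zero values; 0 marks an empty cell. The function names are hypothetical.

```python
from typing import List, Tuple


def crs_compress(a: List[List[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Compressed Row Storage of a 2-D array (zeros treated as empty).
    The non-zeros of row i sit at positions RO[i] .. RO[i+1]-1 of CO and VL."""
    ro: List[int] = [0]
    co: List[int] = []
    vl: List[float] = []
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                co.append(j)   # column index of the non-zero
                vl.append(v)   # the non-zero value itself
        ro.append(len(vl))     # running count closes the current row
    return ro, co, vl


def crs_get(ro: List[int], co: List[int], vl: List[float], i: int, j: int) -> float:
    """Random access: scan only row i's slice of CO."""
    for k in range(ro[i], ro[i + 1]):
        if co[k] == j:
            return vl[k]
    return 0


sparse = [
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [3, 0, 0, 8],
]
ro, co, vl = crs_compress(sparse)
print(ro, co, vl)                 # [0, 1, 1, 3] [1, 0, 3] [5, 3, 8]
print(crs_get(ro, co, vl, 2, 3))  # 8
```

An empty row costs only one RO entry, which is what makes CRS attractive for sparse data.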
Existing Compression Schemes (3/3)
Classical methods cannot support updates without completely readjusting runs.
They target the compression of sparse arrays.
They do not support extendibility.
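The update problem can be made concrete with CRS. The hypothetical helper below inserts one non-zero into an existing CRS layout, assuming the target cell is currently empty and that column indices are kept sorted within each row. Every CO/VL entry after the insertion point shifts and every later RO offset is renumbered, so the cost grows with the amount of data already stored rather than with the size of the change.

```python
from typing import List


def crs_insert(ro: List[int], co: List[int], vl: List[float],
               i: int, j: int, v: float) -> int:
    """Insert non-zero v at the currently empty cell (i, j) of a CRS
    layout, in place. Returns how many existing stored entries had to be
    moved or renumbered to make room."""
    # Find the slot inside row i that keeps column indices sorted.
    k = ro[i]
    while k < ro[i + 1] and co[k] < j:
        k += 1
    moved = len(co) - k            # every later CO/VL entry shifts right by one
    co.insert(k, j)
    vl.insert(k, v)
    renumbered = 0
    for r in range(i + 1, len(ro)):
        ro[r] += 1                 # every later row offset grows by one
        renumbered += 1
    return moved + renumbered


# CRS of [[0, 5, 0, 0], [0, 0, 0, 0], [3, 0, 0, 8]]
ro, co, vl = [0, 1, 1, 3], [1, 0, 3], [5.0, 3.0, 8.0]
touched = crs_insert(ro, co, vl, 0, 0, 9.0)
print(touched, ro, co, vl)  # 6 [0, 2, 2, 4] [0, 1, 0, 3] [9.0, 5.0, 3.0, 8.0]
```

Filling a newly added column would repeat this shift once per row.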
Traditional Extendible Array
TEA supports dynamic extension of the size of each dimension: an extension appends a new sub-array without relocating the existing elements.
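Figure 1: TEA Construction and Access
Cell addresses (rows = dimension 1, columns = dimension 2):
0 1 4 9
2 3 5 10
6 7 8 11
History counter: 0 1 2 3 4 5
Dimension 1: History Table H1 = [0, 2, 4], Address Table Address1 = [0, 2, 6]
Dimension 2: History Table H2 = [0, 1, 3, 5], Address Table Address2 = [0, 1, 4, 9]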
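Accessing position <1, 3>: H1[1] = 2 < H2[3] = 5, so the cell lies in the sub-array added for index 3 of dimension 2.
Address of cell = Address2[3] + 1 = 9 + 1 = 10 (the offset 1 is the cell's index along dimension 1).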
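The construction and access rule of Figure 1 can be sketched in Python for the two-dimensional case. This is an illustrative sketch, not the thesis implementation: the class and method names are hypothetical, the figure's dimensions 1 and 2 become indices 0 (rows) and 1 (columns), and sub-arrays live in memory rather than in secondary storage.

```python
from typing import List


class ExtendibleArray2D:
    """Minimal 2-D traditional extendible array (TEA) sketch.
    Each extension appends one sub-array (a row or a column) at the end of
    storage and records, per dimension index, the history counter value
    (history table) and the first address of that sub-array (address table)."""

    def __init__(self) -> None:
        # Start with a single cell at address 0 and history counter 0.
        self.history: List[List[int]] = [[0], [0]]  # history[d][k]
        self.address: List[List[int]] = [[0], [0]]  # address[d][k]
        self.counter = 0
        self.size = 1  # total number of allocated cells

    def extend(self, d: int) -> None:
        """Add one index along dimension d (0 = rows, 1 = columns)."""
        other = 1 - d
        self.counter += 1
        self.history[d].append(self.counter)
        self.address[d].append(self.size)
        # The new sub-array spans the current length of the other dimension.
        self.size += len(self.history[other])

    def addr(self, i: int, j: int) -> int:
        """Map logical position <i, j> to its storage address."""
        if self.history[0][i] >= self.history[1][j]:
            # Row i was added later: the cell is in row i's sub-array at offset j.
            return self.address[0][i] + j
        # Column j was added later: the cell is in column j's sub-array at offset i.
        return self.address[1][j] + i


t = ExtendibleArray2D()
for d in (1, 0, 1, 0, 1):    # columns, rows, columns, rows, columns
    t.extend(d)
print(t.history, t.address)  # [[0, 2, 4], [0, 1, 3, 5]] [[0, 2, 6], [0, 1, 4, 9]]
print(t.addr(1, 3))          # 10
```

The extension sequence in the usage lines is chosen to be consistent with the history tables of Figure 1. Only the two small tables per dimension are needed for access, so the array can keep growing without moving any stored cell.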
Proposed Compression Scheme
Multidimensional arrays are important for sparse array operations
Extendibility of multidimensional arrays
A compression technique that can work on multidimensional extendible arrays
Our proposed compression scheme is EXCS (Extendible array based Compression Scheme)
Extendible array based Compression Scheme (EXCS) 1/3
We implemented the multidimensional extendible array in secondary memory
We considered three dimensions (n = 3) in our experiments.
The sub-arrays are identified individually so that each one can be stored separately in secondary memory.
Extendible array based Compression Scheme (EXCS) 2/3
Each sub-array has n-1 (= 2) dimensions.
A large number of sub-arrays is generated for compression.
Sub-arrays are taken as input dynamically.
Only the maximum number of sub-arrays needs to be specified.
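Why every sub-array of a 3-D extendible array is 2-D can be seen by tracking shapes: extending one dimension adds a slab that covers the current extent of all the other dimensions. The sketch below is a hypothetical helper under that assumption; it only records shapes and stores no data.

```python
from typing import List, Tuple


def subarray_shapes(initial: List[int], extensions: List[int]) -> List[Tuple[int, ...]]:
    """Shape of each (n-1)-dimensional sub-array created when an
    n-dimensional extendible array grows by one index along each
    dimension listed in `extensions`, in order."""
    sizes = list(initial)
    shapes: List[Tuple[int, ...]] = []
    for d in extensions:
        # The new sub-array spans the current extent of every other dimension.
        shapes.append(tuple(s for k, s in enumerate(sizes) if k != d))
        sizes[d] += 1
    return shapes


# A 3-D array starting at 2 x 2 x 2, extended along dimensions 0, 1, 2, 0.
print(subarray_shapes([2, 2, 2], [0, 1, 2, 0]))
# [(2, 2), (3, 2), (3, 3), (3, 3)]
```

Given only a maximum number of sub-arrays, a driver could generate the extension list itself, e.g. `[k % 3 for k in range(max_subarrays)]` for a round-robin order; that order is an assumption, not something the slides specify.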
Extendible array based Compression Scheme (EXCS) 3/3
Each sub-array is compressed individually
The compression technique used is similar to CRS
The compressed elements are written to secondary memory as the RO, CO and VL arrays of subarray_1, subarray_2, …, subarray_N.
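The write-out step can be sketched as follows: each 2-D sub-array is compressed on its own with a CRS-style encoding, and its RO, CO and VL arrays are appended to one file in sub-array order. This is a hypothetical sketch, not the thesis file format: we assume RO holds row offsets, CO column indices and VL values, and we add a small length header per sub-array so the file can be read back. The actual layout may differ.

```python
import os
import struct
import tempfile
from typing import List, Tuple

CRS = Tuple[List[int], List[int], List[float]]


def crs(sub: List[List[float]]) -> CRS:
    """CRS of one 2-D sub-array: row offsets, column indices, values."""
    ro: List[int] = [0]
    co: List[int] = []
    vl: List[float] = []
    for row in sub:
        for j, v in enumerate(row):
            if v != 0:
                co.append(j)
                vl.append(float(v))
        ro.append(len(vl))
    return ro, co, vl


def write_excs(path: str, subarrays: List[List[List[float]]]) -> None:
    """Compress each sub-array individually and append its RO, CO, VL
    in the order subarray_1, subarray_2, ..., subarray_N."""
    with open(path, "wb") as f:
        for sub in subarrays:
            ro, co, vl = crs(sub)
            # Assumed 12-byte header holding the three array lengths.
            f.write(struct.pack("<3I", len(ro), len(co), len(vl)))
            f.write(struct.pack(f"<{len(ro)}I", *ro))
            f.write(struct.pack(f"<{len(co)}I", *co))
            f.write(struct.pack(f"<{len(vl)}d", *vl))


def read_excs(path: str) -> List[CRS]:
    """Read back the per-sub-array (RO, CO, VL) triples in write order."""
    out: List[CRS] = []
    with open(path, "rb") as f:
        while header := f.read(12):
            n_ro, n_co, n_vl = struct.unpack("<3I", header)
            ro = list(struct.unpack(f"<{n_ro}I", f.read(4 * n_ro)))
            co = list(struct.unpack(f"<{n_co}I", f.read(4 * n_co)))
            vl = list(struct.unpack(f"<{n_vl}d", f.read(8 * n_vl)))
            out.append((ro, co, vl))
    return out


subs = [[[0, 5], [0, 0]], [[3, 0, 0], [0, 0, 8]]]
path = os.path.join(tempfile.mkdtemp(), "excs.bin")
write_excs(path, subs)
print(read_excs(path))
# [([0, 1, 1], [1], [5.0]), ([0, 1, 2], [0, 2], [3.0, 8.0])]
```

Because sub-arrays are compressed independently, a newly added sub-array can be appended to the file without touching the ones already written.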
Performance Measurement
Performance is measured against two key factors of the compression schemes:
Data density
Length of dimension / number of data
Compression ratio = compressed data size / original data size
Space savings = 1 - compression ratio
We report space savings as a percentage; a negative value means the compressed form is larger than the original.
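As a worked example of the two formulas, the sketch below computes space savings for a CRS-style layout. It assumes, for illustration only, that every stored index and value costs one unit, so a rows × cols array costs rows × cols units uncompressed and (rows + 1) + 2 × nnz units in CRS. The numbers are not results from the thesis.

```python
def crs_size(rows: int, cols: int, nnz: int) -> int:
    """Stored entries for CRS of a rows x cols array (cols does not affect
    the count): RO has rows + 1 offsets, CO and VL one entry per non-zero."""
    return (rows + 1) + 2 * nnz


def space_savings_percent(compressed: float, original: float) -> float:
    """Space savings = 1 - compression ratio, expressed in percent."""
    compression_ratio = compressed / original
    return (1 - compression_ratio) * 100


original = 10 * 10
for nnz in (20, 50):  # densities of 20% and 50%
    print(nnz, round(space_savings_percent(crs_size(10, 10, nnz), original), 2))
# 20 49.0
# 50 -11.0
```

At higher density the index overhead outweighs the skipped zeros, which is how space savings become negative.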
Comparative Analysis (1/4)
Figure: Comparison with fixed density = 20% (x-axis: No. of data, 64, 729, 4096, 15625, 46656; y-axis: Space savings (%); schemes: Header, Bitmap, CRS, EACRS, Offset)
Comparative Analysis (2/4)
Figure: Comparison with fixed density = 25% (x-axis: No. of data, 64, 729, 4096, 15625, 46656; y-axis: Space savings (%); schemes: Header, Bitmap, CRS, EACRS, Offset)
Comparative Analysis (3/4)
Figure: Comparison with fixed no. of data = 64 (x-axis: Density of data, 10 to 50; y-axis: Compression ratio; schemes: Header, Bitmap, CRS, EACRS, Offset)
Comparative Analysis (4/4)
Figure: Comparison with fixed no. of data = 4096 (x-axis: Density of data, 10 to 50; y-axis: Compression ratio; schemes: Header, Bitmap, CRS, EACRS, Offset)
Performance Measurement
Extendibility of arrays
Use of multidimensional arrays
Extendibility toward any dimension
EXCS allows dynamic extension of arrays.
In the analysis, data can be extended up to n dimensions.
Performance is good for a large number of data.
Conclusion
Our proposed compression scheme has been evaluated experimentally on data of up to 3 dimensions.
In future work, it can be extended to compress n-dimensional data and evaluated experimentally.
EXCS is effective for large multidimensional data warehouses