Data compression for Large Multidimensional Data Warehouses


This presentation was prepared for the thesis titled "Data compression for Large Multidimensional Data Warehouses", completed in partial fulfillment of the undergraduate degree in the Dept. of CSE, KUET, Bangladesh.


Page 1: Data compression for Large Multidimensional Data Warehouses


Data Compression for Large Multidimensional Data Warehouses

Supervisor:
Dr. K.M. Azharul Hasan
Associate Professor,
Head of the Department, Department of CSE, KUET

Presented by:
Abdullah Al Mahmud, Roll: 0507006
Md. Mushfiqur Rahman, Roll: 0507029

This slide deck was prepared by Abdullah Al Mahmud for the presentation of a thesis completed in partial fulfillment of the undergraduate degree at Khulna University of Engineering & Technology (KUET), Bangladesh.

Page 2: Data compression for Large Multidimensional Data Warehouses


Presentation Layout

• Objectives
• Existing Compression Schemes
• Traditional Extendible Array
• Proposed Compression Scheme: EXCS (Extendible Array Based Compression Scheme)
• Comparative Analysis
• Conclusion

Abdullah Al Mahmud, Student ID: 0507006, Dept. of CSE, KUET, Bangladesh

Page 3: Data compression for Large Multidimensional Data Warehouses

Objectives

Data compression technology reduces the effective price of logical data storage capacity and improves query performance.

Multidimensional arrays are widely used in a large number of scientific research applications.

An efficient compression scheme for multidimensional arrays can handle the large multidimensional data sets of data warehouses.

Page 4: Data compression for Large Multidimensional Data Warehouses

Existing Compression Schemes (1/3)

• Bitmap compression
• Run Length Encoding
• Header compression
• Compressed Column Storage (CCS)
• Compressed Row Storage (CRS)
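To make two of the listed schemes concrete, here is a minimal Python sketch of run-length encoding and bitmap compression applied to one row of a sparse array. The function names are hypothetical, zero is assumed to mark an empty cell, and the bitmap is kept as a list of 0/1 integers rather than packed bits for readability; it is an illustration, not the implementation used in this thesis.

```python
def rle_encode(values):
    """Run-length encoding: collapse each run of equal values into (value, run_length)."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, length in runs for _ in range(length)]


def bitmap_encode(values):
    """Bitmap compression: one flag per cell marks a non-zero cell; only non-zeros are stored."""
    bitmap = [1 if v != 0 else 0 for v in values]
    return bitmap, [v for v in values if v != 0]


def bitmap_decode(bitmap, nonzeros):
    """Refill non-zeros in order wherever the bitmap flag is set."""
    it = iter(nonzeros)
    return [next(it) if bit else 0 for bit in bitmap]


row = [0, 0, 0, 4, 4, 0, 9, 0, 0, 0]
print(rle_encode(row))     # [(0, 3), (4, 2), (0, 1), (9, 1), (0, 3)]
print(bitmap_encode(row))  # ([0, 0, 0, 1, 1, 0, 1, 0, 0, 0], [4, 4, 9])
```

RLE pays off only when long runs of equal values occur, whereas the bitmap always costs one flag per cell plus the stored non-zeros.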

Page 5: Data compression for Large Multidimensional Data Warehouses

Existing Compression Schemes (2/3)

Figure: (a) A sparse array. (b) The CRS scheme.
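The CRS scheme in the figure can be sketched as follows. This is a minimal Python illustration, assuming a 2-D array stored as nested lists with 0 for empty cells; the array names RO, CO and VL are borrowed from the EXCS slides and are assumed here to mean row offsets, column indices and non-zero values. The function names are hypothetical.

```python
def crs_compress(matrix):
    """Row-major CRS: RO holds row offsets, CO column indices, VL non-zero values."""
    ro, co, vl = [0], [], []
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:
                co.append(j)
                vl.append(value)
        ro.append(len(vl))  # the non-zeros of row i are VL[RO[i]:RO[i + 1]]
    return ro, co, vl


def crs_get(ro, co, vl, i, j):
    """Return cell (i, j) by scanning only row i's slice of CO."""
    for k in range(ro[i], ro[i + 1]):
        if co[k] == j:
            return vl[k]
    return 0


sparse = [
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [3, 0, 0, 7],
]
ro, co, vl = crs_compress(sparse)
print(ro, co, vl)  # [0, 1, 1, 3] [1, 0, 3] [5, 3, 7]
```

Empty rows cost only one RO entry (row 1 above), which is why CRS suits arrays where most cells are zero.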

Page 6: Data compression for Large Multidimensional Data Warehouses

Existing Compression Schemes (3/3)

Classical methods cannot support updates without completely readjusting runs.

They focus on compressing sparse arrays.

They do not support extendibility.

Page 7: Data compression for Large Multidimensional Data Warehouses

Traditional Extendible Array

TEA supports dynamic extension of dimension size.

Figure 1: TEA Construction and Access (two-dimensional extendible array with cells numbered 0 to 11, plus the address table and history table of each dimension; history counter values 0 to 5)

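Position <1,3>: H1[1] < H2[3], so the cell lies in the sub-array created when index 3 of dimension 2 was added, at offset 1 (its index in dimension 1).

Address of Cell = Address2[3] + 1 = 10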
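The construction in Figure 1 and the address computation for <1,3> can be reproduced with a small Python model of a two-dimensional TEA. This is a sketch that keeps only the bookkeeping (history, address and size per dimension); the class and method names are hypothetical, dimensions 1 and 2 of the slide map to indices 0 and 1 in the code, and the extension order (2, 1, 2, 1, 2) is inferred from the figure.

```python
class ExtendibleArray2D:
    """Hypothetical 2-D TEA model: keeps only the history, address and size tables.

    Index 0 stands for dimension 1 of the slide and index 1 for dimension 2.
    """

    def __init__(self):
        self.history = [[0], [0]]  # history[d][i]: counter value when index i of dim d was added
        self.address = [[0], [0]]  # address[d][i]: first address of that index's sub-array
        self.size = [1, 1]         # the initial array is the single cell 0
        self.counter = 0
        self.next_address = 1

    def extend(self, dim):
        """Add one index to dimension `dim` by allocating a new sub-array at the end."""
        other = 1 - dim
        self.counter += 1
        self.history[dim].append(self.counter)
        self.address[dim].append(self.next_address)
        self.next_address += self.size[other]  # the sub-array spans the other dimension
        self.size[dim] += 1

    def locate(self, i, j):
        """Return the address of cell <i, j>."""
        if self.history[0][i] > self.history[1][j]:
            # Row i was added later, so the cell is in row i's sub-array at offset j.
            return self.address[0][i] + j
        # Otherwise the cell is in column j's sub-array at offset i.
        return self.address[1][j] + i


tea = ExtendibleArray2D()
for dim in (1, 0, 1, 0, 1):  # slide dimensions 2, 1, 2, 1, 2
    tea.extend(dim)
print(tea.address, tea.history)  # [[0, 2, 6], [0, 1, 4, 9]] [[0, 2, 4], [0, 1, 3, 5]]
print(tea.locate(1, 3))          # 10
```

Each extension allocates one contiguous sub-array at the end of storage, so existing cells never move when the array grows.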

Page 8: Data compression for Large Multidimensional Data Warehouses

Proposed Compression Scheme

Multidimensional arrays are important for sparse array operations.

Multidimensional arrays also need to be extendible.

A compression technique is needed that can work on multidimensional extendible arrays.

Our proposed compression scheme is EXCS (Extendible array based Compression Scheme).

Page 9: Data compression for Large Multidimensional Data Warehouses

Extendible array based Compression Scheme (EXCS) 1/3

We implemented the multidimensional extendible array in secondary memory.

We considered three-dimensional arrays (n = 3) in our experiments.

The sub-arrays are identified individually so that each one can be stored separately in secondary memory.

Page 10: Data compression for Large Multidimensional Data Warehouses

Extendible array based Compression Scheme (EXCS) 2/3

The sub-arrays are (n-1)-dimensional, i.e., two-dimensional for n = 3.

A large number of sub-arrays is generated for compression.

Sub-arrays are taken as input dynamically.

Only the maximum number of sub-arrays needs to be specified.
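The relationship between extensions and (n-1)-dimensional sub-arrays can be sketched as follows for n = 3. The thesis does not state the order in which dimensions are extended, so this hypothetical generator assumes a simple round-robin order and stops once the requested maximum number of sub-arrays has been produced.

```python
from math import prod


def generate_subarrays(max_subarrays, n=3):
    """Extend an n-D extendible array round-robin until max_subarrays sub-arrays exist.

    Returns the list of (extended_dimension, subarray_shape) and the final dimension sizes.
    """
    sizes = [1] * n  # the initial array holds a single cell
    subarrays = []
    dim = 0
    while len(subarrays) < max_subarrays:
        # The new sub-array spans the current sizes of every other dimension,
        # so it is (n - 1)-dimensional.
        shape = tuple(sizes[k] for k in range(n) if k != dim)
        subarrays.append((dim, shape))
        sizes[dim] += 1
        dim = (dim + 1) % n
    return subarrays, sizes


subarrays, sizes = generate_subarrays(6)
print(subarrays)  # [(0, (1, 1)), (1, (2, 1)), (2, (2, 2)), (0, (2, 2)), (1, (3, 2)), (2, (3, 3))]
print(sizes)      # [3, 3, 3]
```

The initial cell plus the cells of all sub-arrays always equals the product of the final dimension sizes (1 + 1 + 2 + 4 + 4 + 6 + 9 = 27 = 3 × 3 × 3 here), so each sub-array can be compressed and stored on its own without losing any cell.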

Page 11: Data compression for Large Multidimensional Data Warehouses

Extendible array based Compression Scheme (EXCS) 3/3

Each sub-array is compressed individually.

The compression technique used is similar to CRS.

The compressed elements are written to secondary memory as the RO, CO and VL arrays of subarray_1, subarray_2, …, subarray_N.
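A minimal sketch of the storage step, assuming integer cell values and a hypothetical on-disk layout in which each sub-array is written as two int32 lengths followed by its RO, CO and VL arrays. The slides give the order (RO, CO, VL of subarray_1 through subarray_N) but not the byte format, so the format and function names here are assumptions; an in-memory `io.BytesIO` stands in for a file in secondary memory.

```python
import io
import struct


def crs_compress(matrix):
    """Return (RO, CO, VL) for one dense 2-D sub-array given as nested lists."""
    ro, co, vl = [0], [], []
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:
                co.append(j)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl


def pack_ints(values):
    """Pack a list of ints as little-endian int32."""
    return struct.pack(f"<{len(values)}i", *values)


def write_subarrays(subarrays, stream):
    """Compress each sub-array on its own and append it to the stream in order."""
    for sub in subarrays:
        ro, co, vl = crs_compress(sub)
        stream.write(struct.pack("<2i", len(ro), len(vl)))  # assumed per-sub-array header
        stream.write(pack_ints(ro) + pack_ints(co) + pack_ints(vl))


def read_subarrays(stream):
    """Read back the (RO, CO, VL) triples in the order they were written."""
    result = []
    while header := stream.read(8):
        n_ro, n_vl = struct.unpack("<2i", header)
        ro = list(struct.unpack(f"<{n_ro}i", stream.read(4 * n_ro)))
        co = list(struct.unpack(f"<{n_vl}i", stream.read(4 * n_vl)))
        vl = list(struct.unpack(f"<{n_vl}i", stream.read(4 * n_vl)))
        result.append((ro, co, vl))
    return result


subarrays = [
    [[0, 5], [0, 0]],  # subarray_1
    [[1, 0], [0, 2]],  # subarray_2
]
buffer = io.BytesIO()
write_subarrays(subarrays, buffer)
buffer.seek(0)
restored = read_subarrays(buffer)
print(restored)  # [([0, 1, 1], [1], [5]), ([0, 1, 2], [0, 1], [1, 2])]
```

Storing a length header per sub-array lets a reader skip to any subarray_k without decompressing the ones before it.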

Page 12: Data compression for Large Multidimensional Data Warehouses

Performance Measurement

Performance is measured by varying two key factors of the compression schemes:
• Data density
• Length of dimension / number of data

Compression ratio = compressed data size / original data size

Space savings = 1 - compression ratio

We report space savings as a percentage.
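The two measures can be computed directly. To show how a scheme can reach negative space savings, the sketch below also estimates the size of a CRS encoding, assuming that a row offset, a column index and a value each occupy the same space as one cell of the original array; the function names and that size model are assumptions, not the measurement procedure of the thesis.

```python
def crs_size(rows, nonzeros):
    """Estimated CRS size in cells: RO has rows + 1 entries, CO and VL one entry per non-zero."""
    return (rows + 1) + 2 * nonzeros


def compression_ratio(compressed_size, original_size):
    """Compression ratio = compressed data size / original data size."""
    return compressed_size / original_size


def space_savings_percent(compressed_size, original_size):
    """Space savings = 1 - compression ratio, expressed as a percentage."""
    return (1 - compression_ratio(compressed_size, original_size)) * 100


original = 8 * 8  # an 8 x 8 array holds 64 cells
for nonzeros in (13, 32):  # roughly 20% and 50% density
    size = crs_size(8, nonzeros)
    print(nonzeros, compression_ratio(size, original), space_savings_percent(size, original))
# 13 0.546875 45.3125
# 32 1.140625 -14.0625
```

Under this size model, once the array is dense enough the index arrays outweigh the cells saved, the compression ratio exceeds 1 and space savings become negative.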

Page 13: Data compression for Large Multidimensional Data Warehouses

Comparative Analysis (1/4)

Figure: Comparison with fixed density = 20% (y-axis: space savings in %; x-axis: number of data, 64 to 46656; series: Header, Bitmap, CRS, EACRS, Offset)

Page 14: Data compression for Large Multidimensional Data Warehouses

Comparative Analysis (2/4)

Figure: Comparison with fixed density = 25% (y-axis: space savings in %; x-axis: number of data, 64 to 46656; series: Header, Bitmap, CRS, EACRS, Offset)

Page 15: Data compression for Large Multidimensional Data Warehouses

Comparative Analysis (3/4)

Figure: Comparison with fixed number of data = 64 (y-axis: compression ratio; x-axis: density of data, 10 to 50; series: Header, Bitmap, CRS, EACRS, Offset)

Page 16: Data compression for Large Multidimensional Data Warehouses


Comparative Analysis (4/4)

Figure: Comparison with fixed number of data = 4096 (y-axis: compression ratio; x-axis: density of data, 10 to 50; series: Header, Bitmap, CRS, EACRS, Offset)

Page 17: Data compression for Large Multidimensional Data Warehouses

Performance Measurement

• Extendibility of arrays: using multidimensional arrays, the array can be extended toward any dimension.
• EXCS allows dynamic extension of arrays.
• In the analysis, data can be extended up to n dimensions.
• Performance is good for a large number of data items.

Page 18: Data compression for Large Multidimensional Data Warehouses


Conclusion

Our proposed compression scheme has been evaluated experimentally on data of up to three dimensions.

In future work, it can be extended experimentally to compress n-dimensional data.

EXCS is effective for large multidimensional data warehouses.
