Achieving Scalability in OLAP Materialized View Selection

Thomas P. Nadeau, Toby J. Teorey

University of Michigan

DOLAP 2002


Topics

• Overview of OLAP
• Exponentiality in View Selection
• Our Polynomial Greedy Algorithm (PGA)
• Test Results
• Conclusions
• Current Work


Example Star Schema

• Sell (fact table): CustID, DateID, BindID, Cost
• Calendar: DateID, Month, Quarter, Year
• Customer: CustID, Name, City, State/Prov
• Bind Style: BindID, Desc


Star Schema Viewed with Data

Bind Style
BindID | Desc
PB | Paper Back
HC | Hard Cover

Calendar
DateID | Month | Quarter | Year
1/1/98 | Jan | 1 | 1998
1/2/98 | Jan | 1 | 1998
12/31/00 | Dec | 4 | 2000

Customer
CustID | Name | City | State/Prov
00001 | U of M | Ann Arbor | MI
00002 | Smith & Co. | Toronto | Ont

Sell (fact table, many rows)
CustID | DateID | BindID | Cost
… | … | … | $600
00002 | 12/31/00 | PB | $500
… | … | … | $1300
00222 | 1/1/99 | HC | $1100
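A query that groups the fact table by a dimension attribute is the kind of result a materialized view precomputes. The Python sketch below illustrates this on the sample data above. It assumes only the two Sell rows whose fields are fully legible. The function name `cost_by_bind_style` is hypothetical and does not come from the presentation.

```python
from collections import defaultdict

# Bind Style dimension table from the slide: BindID -> Desc.
BIND_STYLE = {"PB": "Paper Back", "HC": "Hard Cover"}

# Sell fact rows (CustID, DateID, BindID, Cost in dollars); only the two
# fully legible rows from the slide are included here.
SELL = [
    ("00002", "12/31/00", "PB", 500),
    ("00222", "1/1/99", "HC", 1100),
]


def cost_by_bind_style(facts, bind_dim):
    """Join each fact row to Bind Style on BindID and sum Cost per Desc.

    Storing this result would act as a small materialized view: later
    queries grouped by bind style could read it instead of the fact table.
    """
    totals = defaultdict(int)
    for _cust_id, _date_id, bind_id, cost in facts:
        totals[bind_dim[bind_id]] += cost
    return dict(totals)


print(cost_by_bind_style(SELL, BIND_STYLE))
```

A real system would run the equivalent SQL GROUP BY over the full fact table. The dictionary version only shows the join-then-aggregate shape.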


Eight Dimensions of Book Database

Attribute | Hierarchy Levels
Trim Width | 4
Trim Length | 4
Pages | 4
Quantity | 4
Stock Width | 4
Stock Length | 4
Bind Style | 4
Press | 4


Combinatorial Explosion

• Possible views = $\prod_{i=1}^{d} \ell_i$, where $d$ = |dimensions| and $\ell_i$ = |levels| in dimension $i$

• Book database example (4 levels per dimension):
– 2 dimensions: $4^2$ = 16 views
– 4 dimensions: $4^4$ = 256 views
– 6 dimensions: $4^6$ = 4,096 views
– 8 dimensions: $4^8$ = 65,536 views
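The product formula can be checked directly. This minimal Python sketch multiplies the per-dimension level counts. For small cases it also enumerates the views, one level choice per dimension. The helper names are illustrative and do not come from the presentation. The last line covers the non-hierarchical case: each dimension there has two levels (the attribute is kept or aggregated away), which gives $2^d$ views.

```python
import math
from itertools import product


def possible_views(levels):
    """Number of views: product of the level counts of all dimensions."""
    return math.prod(levels)


def enumerate_views(levels):
    """Each view picks one level index per dimension (0 .. levels[i]-1)."""
    return list(product(*(range(n) for n in levels)))


# Book database: every dimension has 4 hierarchy levels.
for d in (2, 4, 6, 8):
    print(f"{d} dimensions: {possible_views([4] * d):,} views")

# Non-hierarchical lattice with 3 dimensions: 2 levels each, 2**3 views.
print(len(enumerate_views([2, 2, 2])))
```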


Recap

• Materialized views quicken query responses

• Disk space limits view materialization

• Update window is a constraint

• Solution: Select strategic views


Our OLAP Optimization Approach

[Figure: diagram of four components (View Size Estimation, View Selection, View Maintenance, Query Optimization) linked by the data passed between them: the fact table's initial data, sample data, estimate requests, estimated view sizes, strategic views, current views, incremental data for updates, and user queries with quick responses. A legend distinguishes completed work from current work.]


View Selection: Example of Hypercube Lattice [HRU96]

p = Part, s = Supplier, c = Customer (view sizes in rows)

{c, p, s} 6M
{p, s} 0.8M   {c, s} 6M   {c, p} 6M
{s} 0.01M   {p} 0.2M   {c} 0.1M
{} 1


Example of HRU Algorithm [HRU96]

Iteration 1: benefits of possible materialization choices (initially only the top view {c, p, s} is materialized)
{p, s}: 5.2M x 4 = 20.8M
{c, s}: 0 x 4 = 0
{c, p}: 0 x 4 = 0
{s}: 5.99M x 2 = 11.98M
{p}: 5.8M x 2 = 11.6M
{c}: 5.9M x 2 = 11.8M
{}: 6M - 1


Example of HRU

Iteration 2: benefits of possible materialization choices ({p, s}, the largest benefit in iteration 1, is now materialized)
{c, s}: 0 x 4 = 0
{c, p}: 0 x 4 = 0
{s}: 0.79M x 2 = 1.58M
{p}: 0.6M x 2 = 1.2M
{c}: 5.9M x 2 = 11.8M
{}: 0.8M - 1
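A short Python sketch of the HRU greedy step can reproduce the benefit arithmetic. It assumes the lattice sizes from the example and treats the top view {c, p, s} as always available. It computes a view's benefit as the sum, over the view and all its descendants, of the row-count saving relative to each one's cheapest materialized ancestor. The function names are illustrative.

```python
# View sizes in rows, from the hypercube lattice example [HRU96].
# p = Part, s = Supplier, c = Customer; frozenset() is the empty view {}.
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}


def cost(view, materialized):
    """Rows scanned to answer `view`: size of its smallest materialized ancestor."""
    return min(SIZES[m] for m in materialized if view <= m)


def benefit(v, materialized):
    """Sum over v and each descendant w of max(0, current cost of w - |v|)."""
    return sum(
        max(0, cost(w, materialized) - SIZES[v])
        for w in SIZES
        if w <= v
    )


def hru_greedy(k):
    """Pick k views, each time taking the unmaterialized view with the largest benefit."""
    materialized = {frozenset("cps")}  # the top view is always available
    picks = []
    for _ in range(k):
        candidates = [v for v in SIZES if v not in materialized]
        best = max(candidates, key=lambda v: benefit(v, materialized))
        picks.append((best, benefit(best, materialized)))
        materialized.add(best)
    return picks


for view, gain in hru_greedy(2):
    print(sorted(view), f"{gain:,}")
```

Iteration 1 matches the slide: {p, s} wins with 20.8M. In iteration 2 the sketch also selects {c}. It computes that benefit as 5.9M + 0.7M = 6.6M, because {} can already be answered from {p, s} at 0.8M rows (the same cost behind the "0.8M - 1" entry). The slide's 5.9M x 2 therefore appears to count the full 5.9M saving for {} as well. Either way, {c} is the greedy choice.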


Exponentiality in HRU

• $O(kn^2)$ time, where $k$ = |views to select|, $n$ = |possible views|

• $n = 2^d$ in a non-hierarchical database, where $d$ = |dimensions|

• The HRU algorithm is therefore $O(k\,2^{2d})$ time

• Two sources of exponentiality:
– Each possible view is evaluated
– Each view evaluation considers the effect of materialization on every descendant
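Read as a worked bound, the two sources of exponentiality multiply. Each of the $k$ selections evaluates up to $n$ candidate views, and each evaluation visits up to $n$ descendants. Substituting the lattice size gives the exponential terms used in the complexity comparison. Here $\ell_i$ is the number of levels in dimension $i$, as in the view-count formula, and $g$ is the geometric mean of those level counts, so that $\prod_{i=1}^{d} \ell_i = g^d$.

```latex
\begin{aligned}
T_{\mathrm{HRU}} &= O\big(\underbrace{k}_{\text{selections}} \cdot \underbrace{n}_{\text{views evaluated}} \cdot \underbrace{n}_{\text{descendants}}\big) = O(k n^2) \\
\text{non-hierarchical: } n = 2^d &\;\Rightarrow\; T_{\mathrm{HRU}} = O\big(k\,(2^d)^2\big) = O\big(k\,2^{2d}\big) = O(k\,4^d) \\
\text{hierarchical: } n = \textstyle\prod_{i=1}^{d} \ell_i = g^d &\;\Rightarrow\; T_{\mathrm{HRU}} = O\big(k\,g^{2d}\big)
\end{aligned}
```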


Polynomial Greedy Algorithm (PGA)

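[Flowchart with two phases, Nomination and Selection:]
• Start: select the fact table
• Nomination: nominate the smallest child view; repeat while the path continues, and move to Selection when the path has ended
• Selection: for each candidate, evaluate its benefit; when no candidates remain, select a view greedily
• If the termination condition is met, stop; otherwise start a new path and return to Nomination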


Example of PGA [NT02]

p = Part, s = Supplier, c = Customer

{c, p, s} 6M
{p, s} 0.8M   {c, s} 6M   {c, p} 6M
{s} 0.01M   {p} 0.2M   {c} 0.1M
{} 1


Example of PGA

Nomination 1, candidates: {p, s}, {s}, {}

Selection, iteration 1 (largest benefit: {p, s}):
{p, s}: 5.2M x 4 = 20.8M
{s}: 5.99M x 2 = 11.98M
{}: 6M - 1

Nomination 2, candidates: {c, s}, {s}, {c}, {}

Selection, iteration 2 (largest benefit: {c}):
{c, s}: 0 x 2 = 0
{s}: 0.79M x 2 = 1.58M
{c}: 5.9M x 2 = 11.8M
{}: 6M - 1


Nomination Complexity

• Maximum swatch width is d.

• Maximum path length is d.

• Finding one path is $O(d^2)$ time

• Our strategy nominates a path each time a view is selected, so nomination takes $O(d^2 k)$ time overall
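One nomination pass can be sketched as follows. This minimal Python version assumes a non-hierarchical lattice, where a view's children come from dropping one dimension, so there are at most $d$ of them. It follows the smallest child until it reaches {}. `nominate_path` is an illustrative name. The sketch does not model the rule PGA uses to pick the starting view of later paths. Starting at {c, p, s} with the example sizes, it yields the first candidate set from the PGA example.

```python
# View sizes (rows) from the example lattice: p = Part, s = Supplier, c = Customer.
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}


def children(view):
    """Children in a non-hierarchical lattice: drop exactly one dimension."""
    return [view - {dim} for dim in view]


def nominate_path(start, sizes):
    """Walk from `start` to the empty view, always taking the smallest child.

    The path has at most d steps and each step compares at most d children,
    so one path costs O(d^2) size lookups.
    """
    path = []
    view = start
    while view:  # an empty frozenset (the view {}) ends the path
        view = min(children(view), key=lambda child: sizes[child])
        path.append(view)
    return path


for v in nominate_path(frozenset("cps"), SIZES):
    print(sorted(v))
```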


Evaluating Views in PGA

• Polynomial time evaluation requires approximating materialization benefits

• Account for smallest ancestor

• Account for materialized view with largest overlap in descendants

• Complexity of our algorithm is $O(d^2 k^2)$


Complexities

d = | dimensions |

g = geometric mean of the number of hierarchical levels per dimension

k = | views selected for materialization |

ℓ = | layers in lattice |

Database Type | HRU | PGA
Non-Hierarchical | $O(k\,2^{2d})$ time | $O(d^2 k^2)$ time, $O(d^2 k)$ space
Hierarchical | $O(k\,g^{2d})$ time | $O(d k^2 \ell)$ time, $O(d k \ell)$ space
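The sketch below shows how far apart the two non-hierarchical bounds drift. It evaluates only their growth terms, $k \cdot 2^{2d}$ for HRU and $d^2 k^2$ for PGA, with constant factors ignored. The choice $k = 10$ is an arbitrary illustration. The numbers are operation-count proxies, not measured running times.

```python
def hru_growth(k, d):
    """HRU growth term for a non-hierarchical lattice: k * 2^(2d)."""
    return k * 2 ** (2 * d)


def pga_growth(k, d):
    """PGA growth term for a non-hierarchical lattice: d^2 * k^2."""
    return d ** 2 * k ** 2


k = 10  # illustrative number of views to select
for d in (4, 8, 16):
    ratio = hru_growth(k, d) / pga_growth(k, d)
    print(f"d={d:2}: HRU {hru_growth(k, d):>14,}  PGA {pga_growth(k, d):>6,}  ratio {ratio:,.0f}")
```

The HRU term grows by a factor of 4 per added dimension, while the PGA term grows only quadratically in $d$.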


Near Optimal Selection

[Figure: query costs (rows) versus materialization costs (rows) for the Optimal, HRU, and Polynomial Greedy selections; d = 2, ℓ = 4.]


Query Costs at Four Dimensions

[Figure: query costs (thousands of rows) versus materialization costs (thousands of rows), HRU vs. PGA.]


Query Costs at Six Dimensions

[Figure: query costs (millions of rows) versus materialization costs (thousands of rows), HRU vs. PGA.]


Query Costs at Eight Dimensions

[Figure: query costs (millions of rows) versus materialization costs (thousands of rows), HRU vs. PGA.]


Performance at Four Dimensions

[Figure: processing time (seconds) versus materialization costs (thousands of rows), HRU vs. PGA.]


Performance at Six Dimensions

[Figure: processing time (minutes) versus materialization costs (thousands of rows), HRU vs. PGA.]


Performance at Eight Dimensions

[Figure: processing time (minutes) versus materialization costs (thousands of rows), HRU vs. PGA.]


Conclusions

• PGA finds a good set of views to materialize in cases where HRU fails because of its exponential complexity

• PGA extends the usefulness of OLAP systems to higher dimensionality


Current Work

[Figure: the optimization-approach diagram again (View Size Estimation, View Selection, View Maintenance, Query Optimization), with its legend distinguishing completed work from current work.]


Current Work

• Design alternative data structures for materialized views in OLAP

• Test impact of new data structures on update and query costs

• Integrate our work into an OLAP system


References

• [HRU96] V. Harinarayan, A. Rajaraman, J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of the 1996 ACM SIGMOD Conference, pp. 205–216, Montreal, Canada.

• [NT01] T. P. Nadeau, T. J. Teorey. A Pareto Model for OLAP View Size Estimation. CASCON 2001, pp. 1–13, Toronto, Canada.

• [NT02] T. P. Nadeau, T. J. Teorey. Achieving Scalability in OLAP Materialized View Selection. Technical Report (extended version). http://www.eecs.umich.edu/~teorey/cv.html