Eco-DMW: Eco-Design Methodology for Data Warehouses Amine ROUKH 1 , Ladjel BELLATRECHE 2 , Ahcène BOUKORCA 2 , Selma BOUARAR 2 1 University of Mostaganem, Algeria 2 LIAS/ISAE-ENSMA, Futuroscope, France [email protected] Laboratoire d’Informatique et d’Automatique pour les Systèmes

Page 1

Eco-DMW: Eco-Design Methodology for Data Warehouses

Amine ROUKH 1, Ladjel BELLATRECHE 2, Ahcène BOUKORCA 2, Selma BOUARAR 2

1 University of Mostaganem, Algeria
2 LIAS/ISAE-ENSMA, Futuroscope, France

[email protected]

Laboratoire d’Informatique et d’Automatique pour les Systèmes

Page 2

Context

Energy Consumption of a Data Center
o IT Equipment: 68%
o Cooling: 28%
o Building and IT Power Loss: 4%
o Lighting and General Receptacle: 0.01%

Database Design: Big Picture
o The DBMS is one of the major energy consumers;
o Performance-oriented design.

Towards an Energy-aware DBMS Design

Page 3

Energy in the DW World: Scenarios

[Figure: data warehouse design life cycle. Sources (Excel, ERP, Databases) feed Extraction, Transformation and Loading. The design phases shown are Conceptual Design, Logical Design, Physical Design and Deployment Design. Query Processing is illustrated by a cube with the dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr), Cars (Peugeot, Toyota, Renault) and Cities (Paris, Poitiers, Tours), with sum aggregates.]

Annotations on the figure: [Xu13], [Lang09], [Harizopoulos09], [Kunjir12], [Lang11]; Mirabel Project (FP7).

Page 4

Physical Design: Selection of Optimization Structures (OS)

Revisit of the Physical Design Phase: recommendations of Stavros Harizopoulos (CIDR’09) and Goetz Graefe (HP)

Formalization
1. A DW schema
2. A very large workload Q
3. A set of OS
4. A set of constraints C related to OS
5. Non-functional requirements (NFR)

Objective: select schemes of OS satisfying the NFR and respecting C

Characteristics of OS
o Redundant (ROS), e.g. materialized views. Constraints: storage; maintenance.
o Non-redundant (NROS), e.g. horizontal partitioning. Constraint: number of final fragments.

Page 5

Agenda

1. Background & State of Art
2. Contributions
3. Experimental Studies
4. Summary

Page 6

Basic Concepts

Energy is the ability to do work, measured in joules.

Power is the rate of doing work, i.e. energy per unit of time, measured in watts.

Baseline power: the power consumed when the machine is idle.

Active power: the power consumed due to the execution of the workload.

Peak power: the maximum power.

Average power: the average power consumed during the query execution.
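The definitions above can be made concrete with a small sketch. It assumes a list of power readings taken at a fixed interval while a query runs, and it assumes that active power can be estimated as the measured average minus the idle baseline. The function name `power_metrics` and the sample values are hypothetical, chosen only for illustration.

```python
def power_metrics(samples_watts, baseline_watts, interval_s=1.0):
    """Summarise a power trace recorded while a query runs.

    samples_watts: power readings in watts, one per sampling interval.
    baseline_watts: power of the idle machine (assumed to be known).
    interval_s: time between two readings, in seconds.
    """
    if not samples_watts:
        raise ValueError("need at least one power sample")
    average = sum(samples_watts) / len(samples_watts)
    return {
        "peak_w": max(samples_watts),
        "average_w": average,
        # Assumption: active power = measured average minus idle baseline.
        "active_w": average - baseline_watts,
        # Energy (joules) = sum of power (watts) x interval (seconds).
        "energy_j": sum(samples_watts) * interval_s,
    }


# Hypothetical four-second trace sampled once per second.
trace = [60.0, 80.0, 100.0, 80.0]
metrics = power_metrics(trace, baseline_watts=50.0)
print(metrics)
```

The energy line is the discrete form of "energy is power integrated over time", which is why a configuration that draws more power can still use less energy if it finishes sooner.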

Page 7

State of Art

Definition of query processing cost models @ operation processing level, e.g. [Xu’13, Kunjir’12, Roukh’15]

[Figure: query plan over the tables Purchase, Person and Product, built from selection (σ), join (⋈) and projection (π) operators. Each node is annotated with "Current Node Power + Inherited Power": 3 watts, 5 watts and 2 watts at the leaves, then 3 watts + 2 watts, 5 watts + 3 watts, 10 watts + 13 watts, 15 watts + 28 watts and 3 watts + 43 watts. Total Power: 46 watts.]

New Techniques:
o The proposition of cost-driven techniques for reducing energy.
o QED (Explicit Delay) [Lang’09], E2DBMS (Automatic Feedback Control) implemented in Postgres [Tu’11]
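The "current node power + inherited power" annotation of the plan figure can be sketched as a recursive sum over a plan tree. The wattages below are the ones shown on the slide. The tree shape, and which table sits under which operator, are assumptions picked so that the intermediate sums (13, 28, 43) and the total (46 watts) match the figure. `PlanNode` is a hypothetical helper type.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    label: str
    power: float  # current node power, in watts
    children: list = field(default_factory=list)


def inherited_power(node):
    """Power handed up by the subtrees below this node."""
    return sum(total_power(child) for child in node.children)


def total_power(node):
    """Current node power plus inherited power."""
    return node.power + inherited_power(node)


# Assumed plan shape; only the wattages come from the slide.
scan_purchase = PlanNode("scan Purchase", 2.0)
scan_person = PlanNode("scan Person", 3.0)
scan_product = PlanNode("scan Product", 5.0)
select_a = PlanNode("select", 3.0, [scan_purchase])        # 3 + 2
select_b = PlanNode("select", 5.0, [scan_person])          # 5 + 3
join_1 = PlanNode("join", 10.0, [select_a, select_b])      # 10 + 13
join_2 = PlanNode("join", 15.0, [join_1, scan_product])    # 15 + 28
root = PlanNode("project", 3.0, [join_2])                  # 3 + 43

print(total_power(root))
```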

Page 8

Energy @ Physical Design

Challenges:
1. Accurate cost models (power consumption and processing time);
2. A good trade-off between these two costs.

Case Study: Materialized View Selection Problem (MVSP)

MV Cube

Page 9

Motivating Example

7 Queries of SSB

Materialized Views      Execution Time (min)   Power (watts)
C, P, D, S, L           10.83                  16.07
J32                     3.2                    19.73
J31                     5.13                   18.06
J31, J32, J30           2.28                   21.17
J29                     6.18                   17.66
J29, J31, J30           2.45                   21.01
J29, J30, J31, J32      1.9                    23.11

Materialized view selection using two objective functions: (1) query processing cost and (2) power consumption

Page 10

Cost Models

The power required by a given query Qi with ni operations is (intuitively):

$$Power(Q_i) = \beta_{cpu} \sum_{j=1}^{n_i} CPU\_COST_j + \beta_{io} \sum_{j=1}^{n_i} IO\_COST_j$$

1. IO_COST: number of I/Os required to run the specified operation;
2. CPU_COST: number of CPU cycles and buffer cache gets required.

Use a machine learning technique to calculate the βi: multiple polynomial regression

$$Power(Q_i) = \beta_0 + \beta_1 (IO\_COST) + \beta_2 (CPU\_COST) + \beta_3 (IO\_COST^2) + \beta_4 (CPU\_COST^2) + \beta_5 (IO\_COST \times CPU\_COST) + \dots + \beta_{13} (IO\_COST^4) + \beta_{14} (CPU\_COST^4) + \epsilon$$

Processing time of a given query Qi (ni operations):

$$Time(Q_i) = \alpha_{cpu} \sum_{j=1}^{n_i} CPU\_COST_j + \alpha_{io} \sum_{j=1}^{n_i} IO\_COST_j$$

1. αCPU: CPU time of one CPU cycle.
2. αIO: IO time to execute one IO operation.
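As a rough illustration of the two cost models, the sketch below evaluates the linear time model over a list of per-operation (CPU_COST, IO_COST) pairs and expands IO_COST and CPU_COST into the 15 monomials of a degree-4 polynomial, one per coefficient β0 to β14. The function names, the coefficient values and the ordering of the monomials are assumptions. The slide fixes only the first six terms and the last two, and the regression fit itself (done in R according to the slides) is not reproduced here.

```python
def query_time(operations, alpha_cpu, alpha_io):
    """Linear time model: alpha_cpu * sum(CPU_COST) + alpha_io * sum(IO_COST).

    operations: list of (cpu_cost, io_cost) pairs, one per plan operation.
    """
    cpu_total = sum(cpu for cpu, _ in operations)
    io_total = sum(io for _, io in operations)
    return alpha_cpu * cpu_total + alpha_io * io_total


def polynomial_terms(io_cost, cpu_cost, degree=4):
    """All monomials IO^a * CPU^b with a + b <= degree.

    The order used here (by total degree, then decreasing IO exponent)
    is an assumption, not the numbering used on the slide.
    """
    terms = []
    for total in range(degree + 1):
        for io_exp in range(total, -1, -1):
            cpu_exp = total - io_exp
            terms.append((io_cost ** io_exp) * (cpu_cost ** cpu_exp))
    return terms


def predict_power(betas, io_cost, cpu_cost, degree=4):
    """Polynomial power model: dot product of coefficients and monomials."""
    terms = polynomial_terms(io_cost, cpu_cost, degree)
    if len(betas) != len(terms):
        raise ValueError(f"expected {len(terms)} coefficients, got {len(betas)}")
    return sum(b * t for b, t in zip(betas, terms))


# Hypothetical operation costs and coefficients, for illustration only.
ops = [(100.0, 10.0), (200.0, 20.0)]
print(query_time(ops, alpha_cpu=0.5, alpha_io=2.0))
print(len(polynomial_terms(2.0, 3.0)))
```

A degree-4 polynomial in two variables has exactly 15 monomials, which agrees with the coefficient range β0 to β14 in the regression formula.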

Page 11

Identification of Pareto Front Set

7 queries

Evolutionary algorithms are used to select the non-dominated solutions (due to the size of the MVSP search space).
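To show what "non-dominated" means here, the sketch below applies a naive pairwise dominance filter to the (execution time, power) pairs of the motivating example, minimizing both objectives. This quadratic filter is only an illustration. It is neither the evolutionary algorithm of the slides nor the block-nested-loops baseline, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points):
    """Keep the configurations that no other configuration dominates."""
    return {
        name: objectives
        for name, objectives in points.items()
        if not any(dominates(other, objectives) for other in points.values())
    }


# (execution time in minutes, power in watts) from the motivating example.
configurations = {
    "C, P, D, S, L": (10.83, 16.07),
    "J32": (3.2, 19.73),
    "J31": (5.13, 18.06),
    "J31, J32, J30": (2.28, 21.17),
    "J29": (6.18, 17.66),
    "J29, J31, J30": (2.45, 21.01),
    "J29, J30, J31, J32": (1.9, 23.11),
}

front = pareto_front(configurations)
print(sorted(front))
```

On these seven configurations every faster option draws more power, so none dominates another and all seven lie on the front. This is the trade-off that motivates a two-objective selection.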

Page 12

Genetic Algorithm

[Flowchart: boxes Coding, Fitness Evaluation, Selection, Crossover, Mutation and Stop, with a loop test Itr < Itr_Nbr (Yes/No). WSOF* is applied to the Set of MV Conf to obtain the Final MV Conf.]

o Coding: bit string representation, e.g. 1 1 0 1 encodes {J29, J30, J32}
o Pareto ranking based on a multi-objective genetic algorithm (NSGA-II)
o Half uniform crossover:
  P1 = 0 1 0 1 1 1 0 1 0    P2 = 0 1 1 1 1 0 1 1 0
  C1 = 0 1 0 1 1 1 1 1 0    C2 = 0 1 1 1 1 0 0 1 0
o Flip bit mutation

*WSOF: weighted sum of the objective functions
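The operators named on this slide can be sketched in a few lines. The sketch covers the bit string decoding, half uniform crossover, flip bit mutation and a weighted-sum pick among candidate configurations. It is a minimal illustration, not the MOEA Framework implementation. The function names are hypothetical, and dividing each objective by its maximum before weighting is an assumption.

```python
import random

VIEWS = ["J29", "J30", "J31", "J32"]


def decode(bits, names=VIEWS):
    """Map a bit string to the set of selected materialized views."""
    return {name for bit, name in zip(bits, names) if bit == 1}


def half_uniform_crossover(p1, p2, rng):
    """Swap half (rounded down) of the positions where the parents differ."""
    differing = [i for i in range(len(p1)) if p1[i] != p2[i]]
    to_swap = rng.sample(differing, len(differing) // 2)
    c1, c2 = list(p1), list(p2)
    for i in to_swap:
        c1[i], c2[i] = p2[i], p1[i]
    return c1, c2


def flip_bit_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def weighted_sum_choice(candidates, w_time, w_power):
    """Pick the candidate with the smallest weighted sum of both objectives.

    candidates: list of (name, time, power) tuples.
    Assumption: each objective is divided by its maximum before weighting.
    """
    max_time = max(t for _, t, _ in candidates)
    max_power = max(p for _, _, p in candidates)

    def score(candidate):
        _, t, p = candidate
        return w_time * t / max_time + w_power * p / max_power

    return min(candidates, key=score)[0]


rng = random.Random(0)
parent_1 = [0, 1, 0, 1, 1, 1, 0, 1, 0]  # P1 from the slide
parent_2 = [0, 1, 1, 1, 1, 0, 1, 1, 0]  # P2 from the slide
child_1, child_2 = half_uniform_crossover(parent_1, parent_2, rng)
print(decode([1, 1, 0, 1]), child_1, child_2)
```

The slide's parents differ in three positions, so one position is swapped. Which one depends on the random generator, so the children printed here need not equal C1 and C2 from the slide.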

Page 13

Experimental Study (I)

Environment
o Linux, Dell Precision, Intel Core i5 2.27GHz, 4GB RAM
o Oracle 11gR2, Java
o R language for regression
o MOEA Framework

Data Warehouse
o SSB benchmark
o Scale Factor = 10

Power Measurement:
o Watts UP? Pro ES power meter
o 1 Hz sampling frequency

Page 14

Experimental Study (II)

Study of the Quality of our Cost Models

Page 15

Experimental Study (III)

Our Algorithm vs. Exhaustive Algorithm

Workload of 200 queries:
1. MOEA: 7s
2. BNL*: 4 days!

*Block-Nested Loops algorithm to get Pareto front points

Page 16

Experimental Study (IV)

Size of Materialized Views vs. Performance and Power Consumption.

#Cost Analysis

Page 17

Experimental Study (V)

Performance and Power/Energy Saving

• Origin: workload without optimization

[Formula residue: PowerSaving, a ratio of Power terms multiplied by 100, with the subscripts i, MV and time.]

Page 18

Experimental Study (VI)

Environment
o Dell PowerEdge, Intel Xeon E3 2.67GHz, 10GB RAM, 1TB HDD

Data Warehouse
o TPC-H, TPC-DS
o Scale Factor = 10, 100

[Chart: Average Power Error (Avg Error) for TPC-H 10GB, TPC-H 100GB and TPC-DS 100GB; vertical axis from 0.0% to 4.0%.]

Page 19

Summary

Energy-aware Physical Design
o Power & query processing cost models (machine learning)
o MVSP, very large workload
o A multi-objective materialized view selection algorithm

Experimental Studies
o Active power savings up to 38% and total energy savings up to 84%

Generalization of the methodology to other OS

Study of the variation of deployment platforms of DBMS

Integration of energy in earlier phases of the design.