Data Warehousing, Lecture-12
Relational OLAP (ROLAP)
Virtual University of Pakistan
Ahsan Abdullah, Assoc. Prof. & Head
Center for Agro-Informatics Research, www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan@cluxing.com
Why ROLAP?
Scalability: MOLAP suffers from the curse of dimensionality.
ROLAP can deploy significantly larger dimension tables than MOLAP by using secondary storage.
Aggregate awareness allows some front-end tools to use pre-built summary tables.
Star schema designs are usually used to facilitate ROLAP querying (covered in the next lecture).
ROLAP as a "Cube"
OLAP data is stored in a relational database (e.g. a star schema).
The fact table is a way of visualizing an "un-rolled" cube.
So where is the cube? It's a matter of perception: visualize the fact table as an elementary cube.
[Figure: cube with axes Product, Geog. and Time]
Fact Table
Month   Product   Zone   Sale K Rs.
M1      P1        Z1     250
M2      P2        Z1     500
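
To make the "un-rolled cube" idea concrete, here is a minimal Python sketch (not part of the lecture) that loads the two fact-table rows into a dictionary keyed by (month, product, zone). The names `fact_rows`, `cube` and `cell` are illustrative assumptions.

```python
# Each fact-table row is one populated cell of the cube.
# Rows are taken from the lecture's fact table; sales are in thousand Rs.
fact_rows = [
    ("M1", "P1", "Z1", 250),
    ("M2", "P2", "Z1", 500),
]

# "Rolling up" the table: the dimension values become cube coordinates.
cube = {(month, product, zone): sale for month, product, zone, sale in fact_rows}


def cell(month, product, zone):
    """Return the sale at one cube coordinate, or None for an empty cell."""
    return cube.get((month, product, zone))


print(cell("M1", "P1", "Z1"))  # a populated cell
print(cell("M1", "P2", "Z1"))  # an empty cell: no such row in the fact table
```

The relational table stores only the populated cells, which is one way to read the remark that the cube is "a matter of perception".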
How to create a "Cube" in ROLAP
A cube is a logical entity containing values of a certain fact at a certain aggregation level at an intersection of a combination of dimensions.
The following table can be created using 3 queries:
SUM(Sales_Amt)   M1   M2   M3   ALL
P1
P2
P3
Total
(columns: Month_ID; rows: Product_ID)
How to create a "Cube" in ROLAP using SQL
For the table entries, without the totals:
SELECT S.Month_Id, S.Product_Id, SUM(S.Sales_Amt)
FROM Sales S
GROUP BY S.Month_Id, S.Product_Id;
For the row totals:
SELECT S.Product_Id, SUM(S.Sales_Amt)
FROM Sales S
GROUP BY S.Product_Id;
For the column totals:
SELECT S.Month_Id, SUM(S.Sales_Amt)
FROM Sales S
GROUP BY S.Month_Id;
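
The three queries can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the four sample rows are made up, the `Sales` column types are an assumption, the alias `S` is declared in each FROM clause, and an ORDER BY is added purely to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Month_Id TEXT, Product_Id TEXT, Sales_Amt INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [("M1", "P1", 10), ("M1", "P2", 20), ("M2", "P1", 30), ("M2", "P1", 5)],
)

# Query 1: the table entries, without the totals.
cells = conn.execute(
    "SELECT S.Month_Id, S.Product_Id, SUM(S.Sales_Amt) FROM Sales S "
    "GROUP BY S.Month_Id, S.Product_Id ORDER BY S.Month_Id, S.Product_Id"
).fetchall()

# Query 2: the row totals (one per product).
row_totals = conn.execute(
    "SELECT S.Product_Id, SUM(S.Sales_Amt) FROM Sales S "
    "GROUP BY S.Product_Id ORDER BY S.Product_Id"
).fetchall()

# Query 3: the column totals (one per month).
col_totals = conn.execute(
    "SELECT S.Month_Id, SUM(S.Sales_Amt) FROM Sales S "
    "GROUP BY S.Month_Id ORDER BY S.Month_Id"
).fetchall()

print(cells)
print(row_totals)
print(col_totals)
```

Each query scans the `Sales` table separately, which is the cost the next slide objects to.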
Problem With Simple Approach
The number of required queries increases exponentially with the number of dimensions.
It's wasteful to compute all the queries independently.
In the example, the first query can do most of the work of the other two queries.
If we could save that result and aggregate over Month_Id and Product_Id, we could compute the other queries more efficiently.
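
A small Python sketch of that reuse, assuming the result of the first query has been saved as a dictionary (the values are made-up illustration data). It works because SUM can be re-aggregated from partial sums; it is not presented as how any particular ROLAP engine does it.

```python
from collections import defaultdict

# Saved result of the first query: (Month_Id, Product_Id) -> SUM(Sales_Amt).
cells = {("M1", "P1"): 10, ("M1", "P2"): 20, ("M2", "P1"): 35}

row_totals = defaultdict(int)  # per product, aggregated over Month_Id
col_totals = defaultdict(int)  # per month, aggregated over Product_Id

# One pass over the already-aggregated cells; the detail rows are not rescanned.
for (month, product), amount in cells.items():
    row_totals[product] += amount
    col_totals[month] += amount

print(dict(row_totals))
print(dict(col_totals))
```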
CUBE Clause
The CUBE clause is part of SQL:1999.
GROUP BY CUBE (v1, v2, …, vn)
Equivalent to a collection of GROUP BYs, one for each of the subsets of v1, v2, …, vn.
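
The "one GROUP BY per subset" reading can be sketched in plain Python. This is an emulation for illustration, not a database implementation: the function name `group_by_cube` and the sample rows are assumptions, and the string "ALL" stands in for the NULL that SQL reports in a rolled-up column.

```python
from collections import defaultdict
from itertools import combinations


def group_by_cube(rows, dims, measure):
    """Emulate GROUP BY CUBE(dims): one grouping per subset of dims.

    Dimensions left out of a grouping are reported as "ALL".
    """
    result = {}
    for k in range(len(dims), -1, -1):
        for subset in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] if d in subset else "ALL" for d in dims)
                totals[key] += row[measure]
            result[subset] = dict(totals)
    return result


rows = [
    {"Month_Id": "M1", "Product_Id": "P1", "Sales_Amt": 10},
    {"Month_Id": "M1", "Product_Id": "P2", "Sales_Amt": 20},
    {"Month_Id": "M2", "Product_Id": "P1", "Sales_Amt": 35},
]
cube = group_by_cube(rows, ("Month_Id", "Product_Id"), "Sales_Amt")

print(len(cube))          # number of groupings for two dimensions
print(cube[()])           # the empty subset: the grand total
```

With n dimensions there are 2^n subsets, so two dimensions give four groupings; the empty subset is the grand total over all rows.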
ROLAP & Space Requirement
If one is not careful, the number of summary tables gets very large as the number of dimensions increases.
Consider the example discussed earlier, with the following two dimensions on the fact table:
Time: Day, Week, Month, Quarter, Year, All Days
Product: Item, Sub-Category, Category, All Products
EXAMPLE: ROLAP & Space Requirement
A naïve implementation requires a summary table for every combination of aggregation levels across the dimensions.
With 6 Time levels and 4 Product levels that is 6 × 4 = 24 summary tables; adding geography results in 120 tables.
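
The arithmetic behind these counts is a product of the number of levels per dimension. In the sketch below the Time and Product levels come from the lecture; the geography levels are NOT listed in the lecture, so five placeholder names are assumed only because 120 / 24 = 5.

```python
from math import prod

levels = {
    "Time": ["Day", "Week", "Month", "Quarter", "Year", "All Days"],
    "Product": ["Item", "Sub-Category", "Category", "All Products"],
}


def summary_table_count(dimension_levels):
    """One summary table per combination of one level from each dimension."""
    return prod(len(v) for v in dimension_levels.values())


two_dims = summary_table_count(levels)

# ASSUMPTION: placeholder geography levels; the lecture gives only the total.
levels["Geography"] = ["G1", "G2", "G3", "G4", "G5"]
three_dims = summary_table_count(levels)

print(two_dims, three_dims)
```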
ROLAP Issues
Maintenance.
Non-standard hierarchy of dimensions.
Non-standard conventions.
Explosion of storage space requirement.
Aggregation pitfalls.
ROLAP Issue: Maintenance
Summary tables are more of a maintenance issue (similar to MOLAP) than a storage issue.
Notice that summary tables get much smaller as dimensions get less detailed (e.g., year vs. day).
Plan for ROLAP summaries to take about twice the size of the unsummarized data in most environments.
Assuming "to-date" summaries, every detail record received into the warehouse must be aggregated into EVERY summary table.
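
The maintenance cost of "to-date" summaries can be sketched as follows. The three summary levels, the record layout and the function name `load_detail_record` are hypothetical choices for illustration only.

```python
from collections import defaultdict

# One "to-date" summary table per aggregation level (hypothetical selection).
summaries = {
    ("month", "product"): defaultdict(int),
    ("year", "product"): defaultdict(int),
    ("year",): defaultdict(int),
}


def load_detail_record(record):
    """Fold one detail record into every summary table; return tables touched."""
    touched = 0
    for level, table in summaries.items():
        key = tuple(record[attr] for attr in level)
        table[key] += record["sales_amt"]
        touched += 1
    return touched


record = {"day": "2004-01-15", "month": "2004-01", "year": "2004",
          "product": "P1", "sales_amt": 250}
touched = load_detail_record(record)
print(touched)  # one incoming row, one update per summary table
```

The work per incoming record grows with the number of summary tables, which is why the slide treats summaries as a maintenance problem first.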
ROLAP Issue: Hierarchies
Dimensions are NOT always simple hierarchies
Dimensions can be more than simple hierarchies such as item, subcategory, category, etc.
The product dimension might also branch off by trade styles that cross simple hierarchy boundaries, such as:
Looking at sales of air conditioners that cross manufacturer boundaries, such as COY1, COY2, COY3 etc.
Looking at sales of all "green colored" items that even cross product categories (washing machine, refrigerator, split-AC, etc.).
Looking at a combination of both.
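
A short sketch of why such groupings need their own summaries. The detail rows, amounts and the helper `total_by` are made up for illustration; only the category, manufacturer and color examples come from the slide.

```python
from collections import defaultdict

# Hypothetical detail rows: category follows the product hierarchy,
# while manufacturer and color cut across it.
sales = [
    {"category": "refrigerator", "manufacturer": "COY1", "color": "green", "amt": 40},
    {"category": "split-AC", "manufacturer": "COY2", "color": "green", "amt": 60},
    {"category": "split-AC", "manufacturer": "COY1", "color": "white", "amt": 30},
    {"category": "washing machine", "manufacturer": "COY3", "color": "green", "amt": 20},
]


def total_by(rows, attribute):
    """Sum the amount per distinct value of one attribute."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[attribute]] += row["amt"]
    return dict(totals)


by_category = total_by(sales, "category")
by_color = total_by(sales, "color")
print(by_category)
print(by_color)
```

The "green" total mixes three categories, so it cannot be derived from the per-category summary alone; it needs either the detail rows or a separate summary.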
ROLAP Issue: Convention
Conventions are NOT absolute
Example: What is a calendar year? What is a week?
Calendar:
01 Jan. to 31 Dec. or
01 Jul. to 30 Jun. or
01 Sep. to 31 Aug.
Week:
Mon. to Sat. or Thu. to Wed.
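
The effect of differing year conventions can be illustrated with a small helper. The function `year_label` and the choice to label a year by the calendar year in which it starts are assumptions made for this sketch.

```python
from datetime import date


def year_label(d, start_month):
    """Label the year containing d by the calendar year in which it starts,
    for a convention where the year begins on the 1st of start_month."""
    return d.year if d.month >= start_month else d.year - 1


d = date(2004, 8, 15)
labels = {
    name: year_label(d, start_month)
    for name, start_month in [("Jan-Dec", 1), ("Jul-Jun", 7), ("Sep-Aug", 9)]
}
print(labels)
```

The same sale date lands in different "years" depending on the convention, so a yearly summary built under one definition cannot serve users of another.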
ROLAP Issue: Storage space explosion
Summary tables are required for non-standard groupings.
Summary tables are required along different definitions of year, week, etc.
A brute-force approach would quickly overwhelm the system storage capacity due to a combinatorial explosion.
ROLAP Issue: Aggregation pitfalls
Coarser granularity correspondingly decreases potential cardinality.
Aggregating whatever can be aggregated.
Throwing away the detail data after aggregation.
How to Reduce Summary Tables?
Many ROLAP products have developed means to reduce the number of summary tables by:
Building summaries on-the-fly as required by end-user applications.
Enhancing performance on common queries at coarser granularities.
Providing smart tools to assist DBAs in selecting the "best" aggregations to build, i.e. a trade-off between speed and space.
Performance vs. Space Trade-Off
Maximum performance boost implies using lots of disk space to store every pre-calculation.
Minimum performance boost implies no disk space, with zero pre-calculation.
Using metadata to determine the best level of pre-aggregation, from which all other aggregates can be computed.
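
As a rough illustration of the last point, the sketch below keeps a single pre-aggregated level and derives a coarser one from it on demand using a metadata mapping. The month-to-quarter mapping, the stored values and the function name are all assumptions; real tools choose the stored level with far more information.

```python
from collections import defaultdict

# Metadata describing how the stored level rolls up (assumed mapping).
month_to_quarter = {"M1": "Q1", "M2": "Q1", "M3": "Q1", "M4": "Q2"}

# The one pre-aggregated level kept on disk: (month, product) -> sales.
stored = {("M1", "P1"): 10, ("M2", "P1"): 35, ("M4", "P1"): 5, ("M1", "P2"): 20}


def roll_up_to_quarter(month_level):
    """Compute the (quarter, product) aggregate from the stored month level."""
    totals = defaultdict(int)
    for (month, product), amount in month_level.items():
        totals[(month_to_quarter[month], product)] += amount
    return dict(totals)


quarterly = roll_up_to_quarter(stored)
print(quarterly)
```

Storing only the month level costs less disk space than also storing the quarter level, at the price of computing the quarter totals at query time.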
Performance vs. Space Trade-off using Wizard
[Figure: % Gain (y-axis, 20 to 100) plotted against MB (x-axis, 2 to 8), annotated "Aggregation answers most queries" and "Aggregation answers few queries"]
HOLAP
The target is to get the best of both worlds.
HOLAP (Hybrid OLAP) allows co-existence of pre-built MOLAP cubes alongside relational OLAP or ROLAP structures.
How much to pre-build?
DOLAP
Cube on the remote server
Local Machine/Server
Subset of the cube is transferred to the local machine