View
216
Download
0
Category
Preview:
Citation preview
Master of Science in Engineering in Computer Science (MSE-CS)
DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE
ANTONIO RUBERTI
(MSE-CS)
Seminars in Software and Services for the Information Society
Umberto Nanni
1Seminars of Software and Services for the Information SocietyUmberto Nanni
Data Warehousing – Introduction to the Multidimensional Model
Multidimensional model
event in the real worldU. Nanni, di anni 28,
acquista
uno scopettone
Pippo, di anni 32,
acquista
brillantante
Pluto, di anni 77,
acquista
dentifricio
Paperino, di anni 16,
acquista
lucido da scarpe
U. Nanni, aged 28,
buysDB design:
datum in the operational DB(rappresenting the event)
vendita di
uno scopettoneone box of “DS”
dish soap is sold
uno scopettonebrillantantedentifriciolucido da scarpebuys
a box of dish soapDB design:
selection of all and only features
having operational relevance
DW design:
..., definition of aggregations (with a given granularity)
of analytic interest to some users
2Seminars of Software and Services for the Information SocietyUmberto Nanni 2
fact in the enterpise DW
(measurement
in an n-dimensional
space)
sold N units of article X
at point od sale Y
on day Z
of analytic interest to some users
Events and Facts
• EVENT (in the real world) and FACT (of
interest) are terms whose meaning is interest) are terms whose meaning is
determined by the granularity (level of detail)
of the multidimensional representation of the
Data Warehouse
Example of FACT:
3Seminars of Software and Services for the Information SocietyUmberto Nanni
Example of FACT:
on 02/05/2012 in “Rome 23” PoS (Point of Sale) 278
boxes of soap have been sold, cashing 745 Euro
Multidimensional Cube
13
/11
/20
12
06
/11
/20
12
07
/11
/20
12
08
/11
/20
12
11
/11
/20
12
12
/11
/20
12
05
/11
/20
12
05
/05
/20
04
brusher
Roma
toothpaste
brusher
dish soap
broom
13
/11
/20
12
06
/11
/20
12
07
/11
/20
12
08
/11
/20
12
11
/
12
/11
/20
12
05
/11
/20
12
Firenze
L’Aquila
Roma
Torino
Venezia
4Seminars of Software and Services for the Information SocietyUmberto Nanni
units: 278
sales: € 745
Foggia
Palermo
Dimensions
13
/11
/20
12
06
/11
/20
12
07
/11
/20
12
08
/11
/20
12
11
/11
/20
12
12
/11
/20
12
05
/11
/20
12
← Ame →
Firenze
L’Aquila
Roma
Torino
Venezia
13
/11
/20
12
06
/11
/20
12
07
/11
/20
12
08
/11
/20
12
11
/
12
/11
/20
12
05
/11
/20
12
← p
lace
→
toothpaste
brusher
dish soap
broom
5Seminars of Software and Services for the Information SocietyUmberto Nanni
Foggia
Venezia
Palermo
A relational schema
for the multidimensional cube
TABLE(dimens-1, …, dimens-k, measur-1, …, measur-h)
dimensions(classification attributes)
measurements(variables, metrics, indicators, …)
Example:
SALES(product, city, date, numUnitsSold, sales)
6Seminars of Software and Services for the Information SocietyUmberto Nanni
IDENTIFICATIVE KEY
Functional dependence: dimensions → measurements
In the example: product, city, date → numUnitsSold, sales
Dimensional hierarchy
Each dimension is associated with a hierarchy that groups the values at different levels of aggregation (also orthogonally).(also orthogonally).
Each node of the hierarchy is called DIMENSIONAL ATTRIBUTE
Example (time dimension):
day
Functional Dependencies:
7Seminars of Software and Services for the Information SocietyUmberto Nanni
weekmonth
four-month
period
quarter
year
Functional Dependencies:
day→ week
day → month
month → quarter
month → four-month-period
quarter → year
four-month-period → year
ALL
Access to Data Warehouse
Reporting
• predefined needs (poss., parametrized)
• automated extraction of information
OLAP
• analysis needs that are not identifiable in advance (browsing on dimensions and indicators)
• interactive data exploration, searching for information of interest
8Seminars of Software and Services for the Information SocietyUmberto Nanni
information of interest
Data Mining
• analysis needs that are not identifiable in advance
• (semi)automatic data exploration
Reporting
Report with predefined structure and format
• query– typically based on restrictions and / or aggregations– typically based on restrictions and / or aggregations
• presentation– interleaved with text data
– in tabular and / or graphic form
– customized with environment variables (date, user, ...)
• generation– on user demand
9Seminars of Software and Services for the Information SocietyUmberto Nanni
– on user demand
– periodic
– under specified conditions
• distribution– preview/print, e-mail, web, …
OLAP
• implementation:
– analysis session, divided into a series of steps, – analysis session, divided into a series of steps,
each dependent on the results obtained
previously
• tipical users:
10Seminars of Software and Services for the Information SocietyUmberto Nanni
– domain experts, not necessarily computer
experts
OLAP: example
06
/11
/20
12
07
/11
/20
12
08
/11
/20
12
/20
12
12
/11
/20
12
05
/11
/20
12
Firenze
L’Aquila
Roma
Torino0
6/1
1/2
01
2
07
/11
/20
12
08
/11
/20
12
11
/11
/20
12
12
/11
/20
12
05
/11
/20
12
toothpaste
brusher
dish soap
11Seminars of Software and Services for the Information SocietyUmberto Nanni
Palermo
OLAP – Basic operations in the multidimensional
model
RESTRICTION (filtering based on values)
• selection on specific values of dimensional attributes• selection on specific values of dimensional attributes
• in case of equality with a single value, the operator is called
SLICING (one dimension disappears)
AGGREGATION (level of abstrazion increases)
• considered one dimension, group one dimensional attribute
12Seminars of Software and Services for the Information SocietyUmberto Nanni
• considered one dimension, group one dimensional attribute
to a higher level of granularity
• joining up to the highest level (the most generic, consisting of
a single value), a dimension disappears
OLAP: Roll-Up Operator
Aggregation along one dimension
→ use of an aggregate operator on a measurement attribute
Examples:
a. sales of each product by city and month
b. daily sales per city (total on all products)
c. daily sales by product (North / Central / South)
← day →← month →
13Seminars of Software and Services for the Information SocietyUmberto Nanni
← city
→
← city
→(a)
OLAP: Aggregative Operators
DISTRIBUTIVE: computable from:
1. partial aggregates1. partial aggregates
– Examples: SUM, MAX, MIN, …
ALGEBRAIC: computable from:
1. partial aggregates
2. finite set of support information
– Examples : AVG, standard deviation, …
14Seminars of Software and Services for the Information SocietyUmberto Nanni
– Examples : AVG, standard deviation, …
OLISTIC: computable from:
1. original data
– Example: rank of an array
OLAP: Drill-Down Operator
Disaggregation over one dimension
→ disaggregated data are assumed to be available
Examples:Examples:
a. from “sales of each product by city and month”
to “daily sales”
b. from “monthly sales by city (total on all products)”
to “daily sales by city and product”← date →← month →
15Seminars of Software and Services for the Information SocietyUmberto Nanni
← city
→
← city
→ (a)
OLAP: Cuboids (dimensions)
• given a data cube, you can perform different roll-up
operations up to the disappearance of a dimension
• given a k-dimensional data cube, there exist 2k• given a k-dimensional data cube, there exist 2k
possible cuboidsall
product date city
roll-up
16Seminars of Software and Services for the Information SocietyUmberto Nanni
product, date, city
product, data product, city date, citydrill-down
roll-up
OLAP: Cuboids (aggregation levels)
• given a data cube, you can perform different roll-up operations,
increasing the aggregation level (up to the disappearance of a
dimension)dimension)
• given a k-dimensional data cube, resp. with n1, n2, …, nk
agregation levels, there exist n1×n2 ×…×nk possible cuboids
{ }
brand month
roll-up
17Seminars of Software and Services for the Information SocietyUmberto Nanni
article, date
articlebrand, month
date
article, month brand, date
drill-down
roll-up
OLAP: Slice-and-Dice Operators
Selection (up to “slicing”)
Examples:
rinsing detergents
Examples:
from “monthly sales of products by city” to:
a. monthly sales of products by city in central Italy
b. monthly sales by city of dishwasher detergents
Se
pt-
12
Oct
-12
Au
g-1
2
Sp
tt-1
2
Oct
-12
Au
g-1
218Seminars of Software and Services for the Information SocietyUmberto Nanni
brooms
rinsing detergentsdishwasher detergents
Firenze
L’Aquila
Roma
Torino
Palermo
(b)
dishwasher detergents
Firenze
L’Aquila
Roma
Torino
Palermo
Sp
tt
Oct
OLAP: Pivoting Operators
Rotating dimensions on a bidimensional table
Example
from: sales of (products by month) by (city)from: sales of (products by month) by (city)
to: sales of (products by city) by (month)
RM FI AQ TO PArinsing det aug-12 34 23 12 56 65
sep-12 56 45 23 44 67oct-12 76 34 34 55 45
dishw. det. aug-12 57 46 35 79 88
aug-12 sep-12 oct-12rinsing det. RM 34 56 76
FI 23 45 34AQ 12 23 34TO 56 44 55PA 65 67 45
dishw. det. RM 57 79 99FI 46 68 57
AQ 35 46 57two
19Seminars of Software and Services for the Information SocietyUmberto Nanni
sep-12 79 68 46 67 90oct-12 99 57 57 78 68
brooms aug-12 46 35 24 68 77sep-12 68 57 35 56 79oct-12 88 46 46 67 57
AQ 35 46 57TO 79 67 78PA 88 90 68
brooms RM 46 68 88FI 35 57 46
AQ 24 35 46TO 68 56 67PA 77 79 57
twooperations
OLAP: Drill-Through Operator
• access data at a level of aggregation lower
than the Data Warehouse
• access to reconciled data (three-level
architecture) or to the operational DB
20Seminars of Software and Services for the Information SocietyUmberto Nanni
• a sort of extreme Drill-Down
OLAP: Drill-Across Operators
• Correlate data from two or more cubes, based on
the dimensional values, returning the value in a new the dimensional values, returning the value in a new
cube
Example:
Starting from
sales by product, date, city
employers by city
21Seminars of Software and Services for the Information SocietyUmberto Nanni
employers by city
calculate:
average sales by product, employee
A Book on Datawarehouse
Matteo Golfarelli, Stefano Rizzi
Data Warehouse DesignData Warehouse Design
Modern Principles and Methodologies
McGraw-Hill, 2009.
Matteo Golfarelli, Stefano Rizzi
22Seminars of Software and Services for the Information SocietyUmberto Nanni
Matteo Golfarelli, Stefano Rizzi
Data Warehouse
Teoria e Pratica della Progettazione (2a ed.)
McGraw-Hill, 2006.
Recommended