N.PRAT, J.AKOKA – BDA 2002

From UML to ROLAP multidimensional databases using a pivot model
Nicolas PRAT, ESSEC Business School
Jacky AKOKA, CNAM Paris
BDA 2002, INT, Evry, October 2002

Overview
1. Introduction
2. Unified multidimensional metamodel
3. Design method
4. Conclusion

1. Introduction

- The data warehousing and OLAP market is growing rapidly => need for a systematic, tool-supported method for data warehouse/multidimensional database design.
- The difficulty of data warehouse design is often underestimated by OLAP tool vendors. However, it is a crucial phase => data warehouse design should follow the conceptual/logical/physical design phases (as in transactional database design).

State of the art
- Many papers propose multidimensional data models (sometimes with an associated algebra/query language).
- Only a few data warehouse design methods exist (Akoka 97, Akoka 01, Golfarelli 98, Cabibbo 98, Moody 00).
- Distinction between conceptual/logical/physical: often unclear, and/or missing phases.
- Our contribution:
  - Data warehouse design method based on UML, spanning the three design phases.
  - Metamodels for each design step (including a unified multidimensional metamodel => pivot model).
  - Transformations operating on the concepts of the metamodels.
  - Specification of the transformations in OCL (Object Constraint Language).

2. Unified multidimensional metamodel

Multidimensional metamodel
- Problem with the multidimensional metamodel:
  - No agreement on the concepts of this model (e.g. facts).
  - No agreement on the level of this model: physical, logical, or conceptual.
- We consider the multidimensional metamodel to be at the logical level:
  - It exists independently of implementation.
  - Its concepts (e.g. dimension) are not as close to reality as concepts like the object or the entity.
  - Strong parallel with the relational model.
- We have defined a unified multidimensional model.

Multidimensional modeling
[Figure: cube of the measure "Quantity sold" along three dimensions: PRODUCT (P1 to P7, with attributes product name and unit price, grouped by CATEGORY), DAY (3 March 99 to 9 March 99, with hierarchy levels MONTH, QUARTER, YEAR) and CITY (Bordeaux, Brest, Lyon, Nantes, Paris, grouped by REGION). Legend: dimension, attribute, measure, hierarchy.]

Unified multidimensional metamodel
[Figure: UML class diagram of the unified multidimensional metamodel. Elements recoverable from the diagram:]
- ModelElement (name : Name)
- MultidimensionalModel, owning 1..* MultidimensionalModelElement (role ownedElement)
- Dimension, owner of 0..* DimensionAttribute (role attribute)
- Measure (dummyMeasure : Boolean)
- Dimensioning (strong : Boolean), between 1..* dimension and 0..* measure
- AggregateFunction (name : FunctionName, restrictionLevel : Integer), with 1..* / 0..* associations
- DimensionLink, with 1 source and 1 target Dimension (0..* dimensionLink on each side)
- DimensionHierarchy, an {ordered} list of 2..* dimensionLink; each link belongs to 1..* dimensionHierarchy
- level : Integer (shown next to the hierarchy/link association)

3. Design method

Overview
[Figure: phases of the design method and their results.]
- CONCEPTUAL DESIGN: universe of discourse -> conceptual modeling -> UML model -> enrichment/transformation -> enriched/transformed UML model.
- LOGICAL DESIGN: logical mapping -> unified multidimensional model.
- PHYSICAL DESIGN: physical mapping -> ROLAP star schema, ROLAP snowflake schema, or MOLAP schema.
- DATA CONFRONTATION: source confrontation -> data warehouse metadata.

Conceptual design
- Multidimensional representation of data (OLAP).
- A conceptual phase is necessary (vs. direct representation of data in ROLAP stars/snowflakes or MOLAP cubes).
- Choice of UML for the conceptual phase:
  - Standard and well-known formalism.
  - Simple and powerful constructs to represent data at a high level of abstraction.
  - "Easy" mapping to relational and multidimensional systems.
- 2-step conceptual design:
  - Definition of a UML model (class diagram without operations).
  - Enrichment/transformation of this model to facilitate further automatic mapping to a unified multidimensional model => need to enrich the UML metamodel.

Enriched UML metamodel
[Figure: UML class diagram of the enriched UML metamodel. Elements recoverable from the diagram:]
- ModelElement (name : Name)
- UMLModel, owning 1..* UMLModelElement (role ownedElement)
- Constraint and GeneralizationConstraint; 0..* constraint apply to an {ordered} list of 0..* constrainedElement
- Class, OrdinaryClass, AssociationClass
- Attribute (measure : Boolean), AttributeOfOrdinaryClass (identifyingAttribute : Boolean), AttributeOfAssociationClass; a class (role owner) has an {ordered} list of 0..* attribute
- Relationship, Association, OrdinaryAssociation, Generalization
- AssociationEnd (aggregation : AggregationKind, multiplicity : Multiplicity); an association has an {ordered} list of 2..* connection, and each end has 1 participant class
- Generalization, with 1 parent and 1 child class (roles specialization and generalization, 0..* each)
- Three specializations are constrained {disjoint,complete}

Conceptual design (step 1)
[Figure: UML class diagram of the example. Elements recoverable from the diagram:]
- Classes and attributes:
  - Year (year), Quarter (quarter), Date (dd_mm_yy)
  - Media_type (media_type, insertion), Media (media_name, advertising_price)
  - Region (region, number_of_inhabitants)
  - Target (target_code, status, minimum_age, maximum_age, sex)
  - Product_type (product_type, product_unit), Product (product_code, product_name)
  - Advertising_campaign (campaign_code)
  - Shareholder (shareholder_name), Private_shareholder, Public_shareholder (public_shareholder_level), Person, Company (manager_name)
- Association attributes: percentage_of_region, exposure (media_exposure), consumption (product_consumption)
- Named associations: main_shareholder, gets, may_be_advertised_in, during, is_strongly_influenced_by, in, for
- Generalization constraints: {overlapping,complete} and {disjoint,complete}

Conceptual design (step 2)
- Enrichment/transformation of the UML model with 4 types of successive transformations:
  - Determination of identifying attributes
  - Determination of attributes representing measures
  - Migration of association attributes
  - Transformation of generalizations.
- Determination of identifying attributes:
  - Identifier = not a standard concept in UML
  - Necessary in order to define dimensions in the logical phase
  - Necessary for ordinary classes only
  - Use of the tagged value {id}.

Conceptual design (step 2)
- Determination of attributes representing measures:
  - Measures vs. qualitative values
  - The distinction cannot be performed automatically based on types
  - Not necessary for identifiers (defined previously)
  - Use of the tagged value {meas}.
- Migration of 1-1 and 1-N association attributes:
  - Check validity of representation first.
  - Transformation Tcc3: Each attribute belonging to a 1-1 association is transferred to one of the classes involved in the association.
  - Transformation Tcc4: Each attribute belonging to a 1-N association is transferred to the N-class, i.e. the class involved several times in the association.
- Transformation of generalizations:
  - No direct mapping of UML generalizations to multidimensional hierarchies.
  - UML generalizations are transformed into aggregations and classes.
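
The migration rules Tcc3 and Tcc4 can be sketched in Python as follows. This is only an illustration: the names `UmlClass`, `Association` and `migrate_association_attributes` are hypothetical and do not come from the prototype, multiplicities are reduced to "1" or "N" per association end, and for a 1-1 association the sketch arbitrarily picks the first class (the slide only says "one of the classes"). The association name `region_target` and its 1-N reading are assumptions about the example.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Association:
    # ends: list of (class, multiplicity) pairs; multiplicity is "1" or "N"
    name: str
    ends: list
    attributes: list = field(default_factory=list)


def migrate_association_attributes(assoc):
    """Apply Tcc3 (1-1) or Tcc4 (1-N) to a binary association, in place."""
    if len(assoc.ends) != 2:
        return  # N-ary associations are left untouched
    (c1, m1), (c2, m2) = assoc.ends
    if m1 == "1" and m2 == "1":
        receiver = c1  # Tcc3: either class is allowed; the first is an arbitrary choice
    elif {m1, m2} == {"1", "N"}:
        receiver = c1 if m1 == "N" else c2  # Tcc4: the N-class receives the attributes
    else:
        return  # N-M associations keep their attributes
    receiver.attributes.extend(assoc.attributes)
    assoc.attributes.clear()


region = UmlClass("Region", ["region", "number_of_inhabitants"])
target = UmlClass("Target", ["target_code", "status"])
# Assumed reading of the example: one region, several targets.
link = Association("region_target", [(region, "1"), (target, "N")],
                   ["percentage_of_region"])
migrate_association_attributes(link)
print(target.attributes)
```

Under these assumptions, percentage_of_region ends up on Target, the N-class of the association.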

Conceptual design (step 2)
- Transformation of generalizations (cont'd):
  - Transformation Tcc5: For each level i of specialization of a class C, a class named Type-C-i is created. The occurrences of these classes define all the specializations of C. In case of overlapping between specializations, a special value is created for each overlapping between two or more sub-classes of C. In case of incomplete specialization, the special value "others" is created. A N-1 aggregation is created between the classes C and Type-C-i.
[Figure: transformation Tcc5 applied to the Shareholder generalizations.]
Before:
  Shareholder (shareholder_name {id})
  Private_shareholder, Public_shareholder (public_shareholder_level)
  Person, Company (manager_name)
  Generalization constraints: {overlapping,complete} and {disjoint,complete}
After:
  Shareholder (shareholder_name {id}, public_shareholder_level, manager_name)
  Shareholder_type (shareholder_type {id})
  Private_shareholder_type (private_shareholder_type {id})
  N-1 aggregations between Shareholder and the two type classes
Occurrences of shareholder_type: {private, public, both}
Occurrences of private_shareholder_type: {person, company, others}
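
The way Tcc5 derives the occurrences of a Type-C-i class can be sketched as below. The function name `type_class_values` and the naming of overlap values (sub-class names joined with "+", where the slide uses "both") are assumptions made for illustration. The second call assumes an incomplete specialization, which is what the value "others" in the listed occurrences suggests.

```python
from itertools import combinations


def type_class_values(subclasses, overlapping, complete):
    """Return the occurrences of the class Type-C-i created by Tcc5."""
    values = list(subclasses)
    if overlapping:
        # One special value per combination of two or more sub-classes.
        for size in range(2, len(subclasses) + 1):
            for combo in combinations(subclasses, size):
                values.append("+".join(combo))
    if not complete:
        # Incomplete specialization: instances outside every sub-class.
        values.append("others")
    return values


shareholder_type = type_class_values(["private", "public"],
                                     overlapping=True, complete=True)
private_shareholder_type = type_class_values(["person", "company"],
                                             overlapping=False, complete=False)
print(shareholder_type)
print(private_shareholder_type)
```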

Logical design
- From enriched/transformed UML model to unified multidimensional model. Mapping of:
  - Ordinary classes and their attributes (transformations Tcl1 to Tcl3)
  - Associations and their attributes (transformations Tcl4 to Tcl6).
- Mapping ordinary classes and their attributes:
  - Transformation Tcl1: The identifying attribute of each ordinary class is mapped into a dimension in the multidimensional model.
  - Transformation Tcl2: The non-identifying attributes of each ordinary class are mapped into dimension attributes in the multidimensional model if these non-identifying attributes are not measures of interest.
  - Transformation Tcl3: The non-identifying attributes of each ordinary class are mapped into measures in the multidimensional model if these non-identifying attributes are measures of interest.
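
Transformations Tcl1 to Tcl3 amount to a three-way split of the attributes of an ordinary class. A minimal sketch follows; the names `Attribute` and `map_ordinary_class` are hypothetical, and the check for exactly one identifying attribute per class is an assumption of this sketch. The input is the Target class of the example, with its {id} and {meas} tagged values.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    is_id: bool = False       # tagged value {id}
    is_measure: bool = False  # tagged value {meas}


def map_ordinary_class(attributes):
    """Apply Tcl1-Tcl3 to the attributes of one ordinary class."""
    ids = [a.name for a in attributes if a.is_id]
    if len(ids) != 1:
        # Assumption of this sketch: exactly one identifying attribute.
        raise ValueError("expected exactly one identifying attribute")
    dimension = ids[0]  # Tcl1: identifier -> dimension
    # Tcl2: non-identifying, non-measure attributes -> dimension attributes
    dim_attributes = [a.name for a in attributes
                      if not a.is_id and not a.is_measure]
    # Tcl3: non-identifying measure attributes -> measures
    measures = [a.name for a in attributes if not a.is_id and a.is_measure]
    return dimension, dim_attributes, measures


target_attributes = [
    Attribute("target_code", is_id=True),
    Attribute("status"),
    Attribute("minimum_age"),
    Attribute("maximum_age"),
    Attribute("sex"),
    Attribute("percentage_of_region", is_measure=True),
]
dimension, dim_attributes, measures = map_ordinary_class(target_attributes)
print(dimension, dim_attributes, measures)
```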

Logical design
- Specifying transformation Tcl3 with OCL:

    context UMLModel::Tcl3(nonIdentifier : Attribute,
                           multidimensionalModel : MultidimensionalModel) : Measure
    pre:  nonIdentifier.owner.oclIsTypeOf(OrdinaryClass) = true
          and nonIdentifier.identifyingAttribute = false
          and nonIdentifier.measure = true
    post: result.name = nonIdentifier.name
    post: nonIdentifier.owner.attribute->forAll(a1 : Attribute |
            if a1.identifyingAttribute = true
            then result.dimension = Tcl1(a1)
            endif)
    post: multidimensionalModel->includes(result)

Logical design
- Mapping ordinary classes and their attributes (example):

Enriched/transformed UML model:
  Target (target_code {id}, status, minimum_age, maximum_age, sex, percentage_of_region {meas})
  Region (region {id}, number_of_inhabitants {meas})
  Media (media_name {id}, advertising_price)
  Media_type (media_type {id}, insertion)
  Quarter (quarter {id})
  Year (year {id})
  exposure (media_exposure {meas})
  gets

Unified multidimensional model:
  Transformation Tcl1:
    dimension target_code
    dimension quarter
    dimension year
    dimension region
    dimension media_name
    dimension media_type
  Transformation Tcl2:
    attribute status [target_code]
    attribute minimum_age [target_code]
    attribute maximum_age [target_code]
    attribute sex [target_code]
    attribute insertion [media_type]
    attribute advertising_price [media_name]
  Transformation Tcl3:
    measure percentage_of_region [target_code]
    measure number_of_inhabitants [region]

Logical design
- Mapping associations and their attributes:
  - Transformation Tcl4: The attributes of each association class are mapped into measures, associated with dimensions obtained by mapping the identifying attributes of the ordinary classes directly or indirectly participating in the association class (transformation Tcl1).
  - Transformation Tcl5: A path formed by N-1 associations is mapped into a hierarchy in the multidimensional model.
  - Transformation Tcl6: Every N-M or N-ary association without at least one attribute that is always defined is mapped into a dummy measure, associated with dimensions obtained by mapping the identifying attributes of the ordinary classes directly or indirectly participating in the association (transformation Tcl1).
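
The three association mappings can be sketched as small Python functions. The names `map_association` and `map_n1_path` and the tuple-based result format are hypothetical. As a simplification, every attribute passed to `map_association` is assumed to be always defined, so an empty attribute list stands for the Tcl6 case. The inputs are taken from the example model (exposure, gets, and the quarter-year path).

```python
def map_association(name, participant_ids, attributes):
    """Tcl4 / Tcl6 for an N-M or N-ary association.

    participant_ids: identifying attributes of the participating ordinary
    classes, i.e. the dimensions produced by Tcl1.
    attributes: attribute names of the association (assumed always defined).
    """
    dims = list(participant_ids)
    if attributes:
        # Tcl4: each attribute becomes a measure over the dimensions.
        return [("measure", attr, dims) for attr in attributes]
    # Tcl6: no usable attribute, so the association itself becomes a dummy measure.
    return [("dummy measure", name, dims)]


def map_n1_path(hierarchy_name, ids_along_path):
    """Tcl5: a path of N-1 associations becomes a hierarchy, finest level first."""
    return ("hierarchy", hierarchy_name, "->".join(ids_along_path))


exposure = map_association("exposure",
                           ["media_name", "target_code", "quarter"],
                           ["media_exposure"])
gets = map_association("gets", ["region", "media_name"], [])
time = map_n1_path("time", ["quarter", "year"])
print(exposure, gets, time)
```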

Logical design
- Mapping associations and their attributes (example):

Enriched/transformed UML model:
  Target (target_code {id}, status, minimum_age, maximum_age, sex, percentage_of_region {meas})
  Region (region {id}, number_of_inhabitants {meas})
  Media (media_name {id}, advertising_price)
  Media_type (media_type {id}, insertion)
  Quarter (quarter {id})
  Year (year {id})
  exposure (media_exposure {meas})
  gets

Unified multidimensional model:
  dimension target_code
  dimension quarter
  dimension year
  dimension region
  dimension media_name
  dimension media_type
  attribute status [target_code]
  attribute minimum_age [target_code]
  attribute maximum_age [target_code]
  attribute sex [target_code]
  attribute insertion [media_type]
  attribute advertising_price [media_name]
  measure percentage_of_region [target_code]
  measure number_of_inhabitants [region]
  Transformation Tcl4:
    measure media_exposure [media_name,target_code,quarter]
  Transformation Tcl5:
    hierarchy time quarter->year
    hierarchy media_type media_name->media_type
  Transformation Tcl6:
    dummy measure gets [region,media_name]

Physical design
- For each type of target system: metamodel + associated transformations (elaborates on/completes OMG's Common Warehouse Metamodel).
- Example transformation (ROLAP star):
  - Transformation Tls4: Every hierarchy D1->D2->…->Dn of the logical model is mapped by considering all the sub-hierarchies Dj->Dj+1…->Dn where 1<=j<n and Dj dimensions at least one measure. A sub-hierarchy Dj->Dj+1…->Dn is mapped in the physical model by defining in the dimension table identified by Dj a column corresponding to each of the Di (where j<i<=n).
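
A minimal sketch of Tls4, assuming a hierarchy is given as a list of dimension names (finest level first) and that the dimensions which dimension at least one measure are given as a set. The function name `star_dimension_columns` and the dictionary result are hypothetical; the sketch only computes which columns each dimension table receives, not the full star schema.

```python
def star_dimension_columns(hierarchy, dimensioning):
    """Columns added to dimension tables for one hierarchy (Tls4 sketch).

    hierarchy: list [D1, ..., Dn], finest level first.
    dimensioning: set of dimensions that dimension at least one measure.
    Returns {Dj: [Dj+1, ..., Dn]} for every qualifying Dj with j < n.
    """
    columns = {}
    for j, dim in enumerate(hierarchy[:-1]):  # Dn itself has no coarser level
        if dim in dimensioning:
            columns[dim] = hierarchy[j + 1:]
    return columns


# quarter dimensions media_exposure in the example, so its table gets a year column.
time_columns = star_dimension_columns(["quarter", "year"], {"quarter"})
print(time_columns)
```

With a longer hierarchy such as day->month->quarter->year where both day and quarter dimension a measure, the same function yields one entry per qualifying starting level.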

4. Conclusion

- Data warehouse design method based on UML:
  - Spans the conceptual/logical/physical levels
  - Each step: metamodels + associated transformations
  - Unified multidimensional metamodel at the logical level (pivot metamodel).
- Tool support (prototype developed).
- Future work:
  - Complete/specialise set of transformations
  - Further experimentation
  - Reverse engineering.