
On Querying Versions of Multiversion Data Warehouse


Tadeusz Morzy

Robert Wrembel

Poznań University of Technology

Institute of Computing Science

Poznań, Poland

[email protected]


Morzy T., Wrembel R.: On Querying Versions of Multiversion Data Warehouse, DOLAP 2004

Presentation Outline

Context and motivation of our work

Related work

Contribution
• The concept of a multiversion data warehouse
• Querying a multiversion data warehouse

Ongoing and future work


Context and Motivation (1)

The research area encompasses
• handling the dynamics of external data sources (EDSs) in a data warehouse (DW) by means of applying a multiversion data warehouse (MVDW)
• querying a MVDW

Dynamic nature of EDSs
• data dynamics: user data processing in EDSs leads to DW refreshing
• schema dynamics: new user requirements, the dynamic nature of the real world, and tuning purposes require changes to EDS schemas, which lead to changes to the DW structure


Context and Motivation (2)

European Union extension
• compare the sum of membership fees paid by the countries in 2002, 2003, and 2004
• in 2004: a substantial increase; did the countries pay more in 2004?
• without knowledge about the EU extensions we could end up with confusing conclusions

[Figure: the EU hierarchy in 2002, 2003, and 2004. In 2002 and 2003 the EU comprises Belgium, ..., UK; in 2004 it additionally comprises Poland, Slovenia, ...]


Context and Motivation (3)

Sales analysis: reclassification of products to categories
• e.g., the tax on building elements changed from 7% to 22% (Poland)
• compute and compare the sum of income from brick sales in 2003 and 2004
• did sales of bricks increase in 2004 by 15%?
• by simply updating the VAT on bricks from 7% to 22% we lose the information that in the past the VAT was 7%


Challenges

The dynamic nature of EDSs and the real world should be reflected in a DW

New functionality of a data warehouse
• supporting changes of fact and level tables, and of dimension instance structures
• providing tools for appropriate analysis of data coming from different time periods


Related Approaches (1)

Schema and data evolution
• [Blaschka et al., DaWaK99], [Hurtado et al., ICDE99, DOLAP99], [Koeller et al., DOLAP98]

Temporal extensions
• [Chamoni et al., DaWaK99], [Eder et al., DaWaK01, CAISE02], [Mendelzon et al., VLDB00]

Versioning
• implicit: [Kang et al., VLDB02], [Quass et al., SIGMOD97], [Kulkarni et al., IDEAS99], [Teschke et al., DEXA98]
• explicit: [Bellahsene, DEXA98], [Body et al., DOLAP02]
• virtual: [Balmin et al., VLDB00]


Other Limitations

The approaches assume that time is linear (DW states are ordered by time)
• true for the past
• not always true for the future (what-if analysis)


Our Approach (1)

Multiversion Data Warehouse
• a MVDW is composed of a set of its versions
• changes in the DW structure and data are reflected in a new, explicitly derived version of the DW

DW version
• a schema version (facts, dimensions, levels, level instances)
• an instance version (stores the set of data consistent with its schema version; measures/cell values)
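The composition described on this slide can be pictured as a small data structure. The following is only a sketch under assumptions: the class and field names (`SchemaVersion`, `InstanceVersion`, `DWVersion`, `MVDW`, `cells`) are hypothetical and are not taken from the authors' prototype, and schema elements are reduced to plain names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaVersion:
    """Structure of one DW version (element names only, for illustration)."""
    facts: tuple
    dimensions: tuple
    levels: tuple
    level_instances: tuple = ()


@dataclass
class InstanceVersion:
    """Data (measure/cell values) consistent with exactly one schema version."""
    schema: SchemaVersion
    cells: list = field(default_factory=list)


@dataclass
class DWVersion:
    """A DW version pairs a schema version with an instance version."""
    name: str
    schema: SchemaVersion
    instance: InstanceVersion

    def __post_init__(self):
        # The instance version must belong to this version's schema.
        if self.instance.schema != self.schema:
            raise ValueError("instance version does not match schema version")


@dataclass
class MVDW:
    """A multiversion DW is a set of DW versions (keyed by name here)."""
    versions: dict = field(default_factory=dict)

    def add(self, version):
        if version.name in self.versions:
            raise ValueError(f"version {version.name} already exists")
        self.versions[version.name] = version
```

Keeping the schema check in `DWVersion` mirrors the statement that an instance version stores data consistent with its schema version; a real system would enforce this at the storage level rather than by object comparison.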


Our Approach (2)

Types of DW versions

real
• reflects changes in the real world
• real versions are linearly ordered by the time they are valid within
• derived from another real version

alternative
• created for simulation purposes (what-if analysis)
• alternative versions form a DAG
• derived from another real or alternative version
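The derivation rules for real and alternative versions can be sketched as a small graph structure. This is an illustration only: the names `Version`, `VersionGraph`, and `derive` are hypothetical, and the rule that a new real version must derive from the latest real one is an assumption used here to keep the real versions linearly ordered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    name: str
    kind: str                     # "real" or "alternative"
    parent: Optional[str] = None  # name of the version it was derived from


class VersionGraph:
    def __init__(self, root_name):
        # The initial real version has no parent.
        self.versions = {root_name: Version(root_name, "real")}
        self.real_chain = [root_name]  # real versions in derivation order

    def derive(self, name, kind, parent):
        if name in self.versions:
            raise ValueError(f"version {name} already exists")
        if parent not in self.versions:
            raise ValueError(f"unknown parent version {parent}")
        if kind == "real":
            # Assumption: deriving only from the latest real version
            # keeps the real versions in one linear chain.
            if parent != self.real_chain[-1]:
                raise ValueError("a real version must derive from the latest real version")
            self.real_chain.append(name)
        elif kind != "alternative":
            raise ValueError(f"unknown version kind {kind}")
        # Alternative versions may derive from any existing version;
        # requiring an existing parent keeps the graph acyclic.
        version = Version(name, kind, parent)
        self.versions[name] = version
        return version
```

For example, after `g = VersionGraph("RV1")`, the calls `g.derive("RV2", "real", "RV1")` and `g.derive("AV2.1", "alternative", "RV2")` succeed, while deriving a real version from `AV2.1` raises `ValueError`.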


Data Model and Concepts

Formal model of MVDW
• International Conference on Enterprise Information Systems (ICEIS), France, 2003

Time integrity constraints for DW versions
• ACM Symposium on Applied Computing (SAC), Cyprus, 2004

Data sharing concept and evaluation
• 6th Baltic Conference on Databases & Information Systems, Riga, 2004
• Conference on Current Trends in Theory and Practice of Informatics (SOFSEM), Slovakia, 2005 (to appear)

Transaction concept
• International Conference on Enterprise Information Systems (ICEIS), Portugal, 2004


Querying MVDW

Step 1
• query decomposition into partial queries (PQs)
• PQ execution
• retrieval and presentation of partial results (PRs)

Step 2
• PR integration
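The two steps can be sketched as a tiny pipeline over in-memory rows. This is a simplified illustration, not the prototype's implementation: the function names `decompose`, `execute`, and `integrate`, the dictionary layout of a version, and the sample rows are all hypothetical, and integration is reduced to concatenating partial results tagged with their version.

```python
def decompose(query, versions):
    """Step 1: create one partial query (PQ) per addressed DW version."""
    return [(v["name"], query, v["rows"]) for v in versions]


def execute(partial_queries):
    """Step 1: run every PQ; partial results (PRs) are keyed by version name."""
    return {name: query(rows) for name, query, rows in partial_queries}


def integrate(partial_results):
    """Step 2: integrate PRs; here a plain concatenation tagged with the version."""
    merged = []
    for name, rows in partial_results.items():
        merged.extend({**row, "version": name} for row in rows)
    return merged


# Made-up sample data for two DW versions.
versions = [
    {"name": "RV1", "rows": [{"product": "bricks", "amount": 10},
                             {"product": "pipes", "amount": 3}]},
    {"name": "RV2", "rows": [{"product": "bricks", "amount": 12}]},
]


def bricks_only(rows):
    return [row for row in rows if row["product"] == "bricks"]


partial_results = execute(decompose(bricks_only, versions))
integrated = integrate(partial_results)
```

After step 1, `partial_results` can be shown to the user version by version; step 2 is optional and only meaningful when the partial results share a common schema.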


MVQ User Interface (1)


MVQ User Interface (2)


Modes of Querying (1)

Querying the current DW version
• by default a user addresses the latest real DW version

Querying the set of real DW versions
• by specifying the time period of interest; the query addresses the real versions that are valid within the period from begin validity time to end validity time

select ...
from ...
where ...
group by ...
version from date 'begin date' to date 'end date'
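How the VERSION FROM ... TO clause might pick real versions can be sketched as an interval test. This is a sketch under assumptions: the names `RealVersion`, `versions_in_period`, and `current_version` are hypothetical, "valid within" is interpreted as overlap between the validity interval and the requested period, and the validity dates are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RealVersion:
    name: str
    begin_validity: date
    end_validity: Optional[date] = None  # None: still valid (latest version)


def versions_in_period(versions, begin, end):
    """Return real versions whose validity interval overlaps [begin, end]."""
    selected = []
    for v in versions:
        v_end = v.end_validity or date.max
        if v.begin_validity <= end and v_end >= begin:
            selected.append(v)
    return sorted(selected, key=lambda v: v.begin_validity)


def current_version(versions):
    """Default mode: the latest real version."""
    return max(versions, key=lambda v: v.begin_validity)


# Made-up validity periods.
real_versions = [
    RealVersion("RV1", date(2002, 1, 1), date(2002, 12, 31)),
    RealVersion("RV2", date(2003, 1, 1), date(2003, 12, 31)),
    RealVersion("RV3", date(2004, 1, 1)),
]
```

With this data, a period from mid-2003 to mid-2004 addresses `RV2` and `RV3`, and a query without a version clause addresses `RV3`.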


Modes of Querying (2)

Querying the set of alternative DW versions
• a user has to explicitly provide the set of alternative versions of interest

select ...
from ...
where ...
group by ...
alternative version in (ver_id | ver_name, ...)


Modes of Querying (3)

Merging results of partial queries
• by default, every result set of a partial query is presented to a user separately
• in some cases the results of partial queries can be merged into one result set
• merging the results obtained by partial queries is defined by including the MERGE INTO clause

select ...
from ...
where ...
group by ...
version from date 'begin date' to date 'end date'
merge into {ver_id | ver_name}

• original partial results have to be transformed into a common schema
• transformation methods have to exist in the MVDW data dictionary


Modes of Querying (4)

Merging results of partial queries into a common DW version is possible if
• the multiversion query addresses attributes that are present in all versions of interest
• there exist transformation methods between adjacent DW versions
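The two conditions can be checked mechanically before a MERGE INTO is attempted. The sketch below is an illustration only: `can_merge` and its arguments are hypothetical, transformation methods are represented just by pairs of version names, and a method registered in either direction is accepted as a simplification.

```python
def can_merge(query_attrs, version_attrs, transformations):
    """Check both merge conditions.

    query_attrs:     attributes addressed by the multiversion query
    version_attrs:   version name -> set of attributes, in derivation order
    transformations: set of (version, version) pairs with a registered method
    """
    names = list(version_attrs)
    # Condition 1: every addressed attribute exists in every version of interest.
    attrs_present = all(set(query_attrs) <= version_attrs[n] for n in names)
    # Condition 2: a transformation method links each pair of adjacent versions.
    adjacent_linked = all(
        (a, b) in transformations or (b, a) in transformations
        for a, b in zip(names, names[1:])
    )
    return attrs_present and adjacent_linked


# Made-up example.
version_attrs = {
    "RV1": {"product", "amount"},
    "RV2": {"product", "amount", "vat"},
}
```

Here a query on `product` and `amount` is mergeable when a method between `RV1` and `RV2` is registered, whereas a query on `vat` is not, because `vat` is missing in `RV1`.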


Heterogeneous Schema Versions

Every version of a schema may have a different structure, which causes problems in querying

Cases causing schema heterogeneity handled by our prototype
• reclassification of level instances
• merging level instances
• splitting level instances
• level detachment
• level inclusion
• changing table name
• changing attribute name
• changing attribute domain
• dropping an attribute
• adding an attribute


Reclassification of level instances

Result sets from RV2 and RV3 annotated with metadata information

Dimension PRODUCT: Level PRODUCT: change association: Ytong bricks (vat 7% → vat 22%)


Merging level instances

Result sets from RV3 and RV4 annotated with metadata information

Merge (Castorama, Marx Pipes) → Castorama


Level Detachment

Result sets from RV4 and RV5 annotated with metadata information

Dimension Shop: level detachment City
Dimension Shop: source attribute: Shop.city_name → City.city_name


Changing table/attribute name

Result sets from RV5 and RV6 annotated with metadata information

Table name changing: Sale → Poland_Sale
Attribute name changing: City.city_name → City.city
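A name change like the one above can be handled by a rename mapping applied to the older partial result before it is combined with the newer one. The following is a minimal sketch: `rename_columns`, the mapping dictionary, and the sample rows are hypothetical, and only the attribute names come from the slide.

```python
def rename_columns(rows, renames):
    """Return copies of rows with columns renamed according to the mapping."""
    return [
        {renames.get(column, column): value for column, value in row.items()}
        for row in rows
    ]


# Attribute rename taken from the slide: City.city_name became City.city.
renames_rv5_to_rv6 = {"City.city_name": "City.city"}

# Made-up partial results from the two versions.
rows_rv5 = [{"City.city_name": "Poznań", "amount": 100}]
rows_rv6 = [{"City.city": "Poznań", "amount": 150}]

# After renaming, both partial results share the RV6 column names.
common = rename_columns(rows_rv5, renames_rv5_to_rv6) + rows_rv6
```

Columns without an entry in the mapping pass through unchanged, so the same helper covers versions in which only some attributes were renamed.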


Changing attribute domain

Handling changes of attribute domains between DW versions

A forward and a backward conversion method have to be provided and registered in the data dictionary
• Attr_Mappings.am_forward_meth_name
• Attr_Mappings.am_backward_meth_name

Conversion methods are implemented by a DW admin
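The pairing of forward and backward conversion methods can be sketched as a lookup by method name. Only the two column names `am_forward_meth_name` and `am_backward_meth_name` come from the slide; the registry layout, the key by attribute and version pair, the method names, and the domain change itself (an amount stored in hundredths versus whole units) are assumptions made for illustration.

```python
# Hypothetical conversion methods, as a DW admin might implement them.
CONVERSION_METHODS = {
    "hundredths_to_units": lambda value: value / 100,
    "units_to_hundredths": lambda value: round(value * 100),
}

# Hypothetical stand-in for rows of the Attr_Mappings dictionary table,
# keyed by (attribute, older version, newer version).
attr_mappings = {
    ("Sale.amount", "RV5", "RV6"): {
        "am_forward_meth_name": "hundredths_to_units",
        "am_backward_meth_name": "units_to_hundredths",
    },
}


def convert(attribute, source, target, value):
    """Convert a value between the domains of two adjacent DW versions."""
    forward = attr_mappings.get((attribute, source, target))
    if forward is not None:
        return CONVERSION_METHODS[forward["am_forward_meth_name"]](value)
    backward = attr_mappings.get((attribute, target, source))
    if backward is not None:
        return CONVERSION_METHODS[backward["am_backward_meth_name"]](value)
    raise KeyError(f"no conversion registered for {attribute}: {source} -> {target}")
```

Registering both directions lets partial results be merged into either the older or the newer version, which is why both method names are stored for one mapping.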


Prototype Limitations

All predicates of the SELECT command apply to all DW versions pointed to in the VERSION FROM and VERSION IN clauses
• it is not possible to express a predicate on a single DW version

The query parser is unable to infer the appropriate versions of interest from the WHERE clause

The query parser is able to compute an integrated result set of a multiversion query using the basic aggregate functions: SUM, MIN, MAX, AVG
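One way to see why these four aggregates lend themselves to integration is that each can be recomputed from per-version partial aggregates. The sketch below shows the general technique, not the prototype's implementation: `integrate_aggregate` and the layout of the partial results are hypothetical, and AVG is assumed to be rebuilt from per-version sums and counts.

```python
def integrate_aggregate(function, partials):
    """Combine per-version partial aggregates into one integrated value.

    partials: list of dicts with keys "sum", "min", "max", "count",
              one dict per partial query result.
    """
    if function == "SUM":
        return sum(p["sum"] for p in partials)
    if function == "MIN":
        return min(p["min"] for p in partials)
    if function == "MAX":
        return max(p["max"] for p in partials)
    if function == "AVG":
        # Averaging the per-version averages would be wrong when the
        # versions hold different numbers of rows, so use sums and counts.
        total = sum(p["sum"] for p in partials)
        count = sum(p["count"] for p in partials)
        return total / count
    raise ValueError(f"unsupported aggregate function {function}")


# Made-up partial aggregates from two DW versions.
partials = [
    {"sum": 10, "min": 1, "max": 5, "count": 4},
    {"sum": 20, "min": 2, "max": 9, "count": 6},
]
```

For this data the integrated SUM is 30, MIN is 1, MAX is 9, and AVG is 3.0.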


MVDW Metamodel


Ongoing Work

Building a Multiversion DWMS
• a project supported by the Polish State Committee for Scientific Research

ODS
• data integration and buffering
• detection of schema and data changes
• propagation of schema and data changes

[Figure: system architecture with data sources DS1, DS2, DS3 and the components ODSM, MVDWM, and MVQI]


Future Work

Open issues
• indexing multiversion data
• handling quality of information in a MVDW