
OLAP Operators for Complex Object Data Cubes

Doulkifli Boukraâ 1, Omar Boussaïd 2, Fadila Bentayeb 2

1 High School of Computer Science, Oued-Smar, Algiers
d [email protected]

2 Lumière University - Lyon 2, 5 avenue Pierre Mendès-France, 69676 Bron Cedex
{omar.boussaid,fadila.bentayeb}@univ-lyon2.fr

Abstract. Nowadays, multidimensional models are recognized to best reflect the decision makers’ analytical view of data. The classical multidimensional models were meant to analyze conventional data (numerical and categorical). However, they fail to handle data complexity, which is expressed by the multiplicity of data sources, the heterogeneity of formats, the diversity of structures, etc. To address this, new multidimensional models have been proposed for OLAP purposes. Nevertheless, these models cover data complexity only partially, which may impair decision making. In our previous work, we proposed to integrate data complexity within a complex object-based multidimensional model. In this paper, based on our proposed model, we provide adapted OLAP operators that take data complexity into account. Thus, we define operators to create complex data cubes, to visualize them and to analyze them.

Key words: Multidimensional model, complex object, complex cube, OLAP operator

1 Introduction

1.1 Context and related work

Nowadays, multidimensional modeling is recognized to best reflect the decision makers’ analytical view of data, as witnessed by the rich literature on multidimensional models. These models were surveyed in [1]. Associated with the models are the OLAP operators that allow expressing analysis needs such as slice-and-dice, rollup and drill-down [5]. Besides, decision making involves more and more complex data (multiple sources, heterogeneous formats, diverse structures, etc.). Warehousing and analyzing complex data are not straightforward activities. Moreover, we believe that the more aspects of data complexity are considered in the warehousing process, the more accurate the decisions are.

Recently, there have been several papers on warehousing and analyzing non-conventional data. Examples of related work deal with unstructured textual data [9], semistructured data represented with XML [8], temporal data [16] and spatial data [7].


Regarding data modeling and analysis, three approaches can be distinguished. The first approach consists in using existing OLAP tools to analyze non-conventional data. In this case, data may remain in their sources as in [10] and OLAP operates on an integrated virtual schema of a middleware. User queries are then translated according to the data source schemas. Some other papers propose to capture the multidimensional concepts from non-conventional data in a bottom-up way and provide multidimensional models that can integrate into existing OLAP tools [8]. The advantage of this approach is to benefit from the maturity of existing OLAP technologies. However, data complexity is lost due to the limitations of the underlying multidimensional models. A mixed solution consists in extending traditional OLAP querying to external object data [14]. In this case, the object data serves as a decoration of the retrieved multidimensional data.

In the second approach, new multidimensional models are provided to deal with one or many aspects of data complexity. Some models are brought to the conceptual level, such as object models [12, 13], temporal data models [16] or spatial models [2]. Other models are described at the logical level, especially with XML for semistructured data. Besides, the underlying OLAP operations are revisited with respect to the nature of the data and new operators are proposed. Examples include the XML OLAP operators [17] and textual data aggregation [15].

1.2 Motivation and Contributions

The related work shows that many aspects of data complexity are covered, yet separately. Furthermore, there is a lack of a framework that integrates as many aspects as possible. We believe that such a framework would improve the decision making process since it provides the analysts with different points of view of the same data, which is likely to be the case in real life. For instance, to best diagnose a medical case, doctors would combine numerical data (e.g. measurements) with textual reports, radiographies, etc. Moreover, a complex data warehousing and analysis framework has to address the following issues:

– Cover as many data complexity aspects as possible in the multidimensional model;

– Integrate into existing OLAP tools when only some aspects of complexity are considered;

– Support a large set of OLAP visualization techniques.

In a previous work, we proposed a multidimensional model that addresses the first issue [3]. Our model is based on the concept of complex object that covers many aspects of data complexity, such as the multiplicity of structures, formats, sources, etc. In this paper, our main contributions are the following:

– a set of OLAP operators to construct complex data cubes;
– a set of OLAP operators to visualize the data cubes and to analyze them.

The remainder of this paper is organized as follows. In section 2, we recall the concepts of our proposed model. Then, we present in section 3 the set of operators that construct complex cubes, visualize their data and analyze them. In section 4, we present some implementation details. Finally, we conclude in section 5 and give some perspectives.

2 The Complex Object-based Multidimensional Model

In this section, we recall the main concepts underlying our multidimensional model of complex objects. Further details can be found in [3].

2.1 Concepts and Definitions

The Complex Object. The concept of complex object (CO) was proposed by Boussaïd et al. [4] as a solution for complex data integration. According to the authors, a CO is a physical or abstract entity composed of one or many sub-documents (see footnote 3). Each sub-document may represent a simple or tagged text, a relational view, an image or temporal data (e.g. sound, video). Basically, a CO describes low-level features of data (e.g. image color) and other features such as the object’s source name and languages, but it can be extended to include other features such as semantic information (e.g. image content). In our model, we use a CO to represent both facts and dimension members. Compared to existing object multidimensional models, a fact or dimension member can be represented with a whole UML class diagram rather than a single UML class as in many models. The advantage of such a conceptualization is the possibility to describe structurally and semantically rich facts and dimensions. A CO fact is equivalent to the xFact introduced by Nassis et al. [13]. In addition, we abstract a CO as a set of attributes. An attribute may be simple (e.g. image color) or complex if composed of simple or other complex attributes (e.g. a whole image). Attributes are then related to each other via several relationships such as composition and association. At this stage, however, we consider only one kind of relationship, which organizes attributes in hierarchies, as will be seen later.

Definition 1. A CO is a pair $Obj = (ID^{Obj}, SA^{Obj})$ where $ID^{Obj}$ represents the object’s identifier and $SA^{Obj} = \{A_i^{Obj} / i \in \mathbb{N}\}$ represents the set of its attributes.

The Complex Relationship. A complex relationship (CR) is an explicit link between two COs. It may range from simple associations to aggregations, compositions, specialization/generalization, etc. A CR is characterized by its name and by the names of the two COs that it links.

Definition 2. A CR is a pair $R = (Obj_s^R, Obj_t^R)$ where $Obj_s^R$ represents the source object of $R$ and $Obj_t^R$ represents its target object.

3 the term document is used in a broad sense


The Attribute Hierarchy. An attribute hierarchy (AH) is a special relationship between a CO’s attributes. It is characterized by its name and by the set of its attributes, each one having a level within the hierarchy.

Definition 3. An AH is denoted by $AH^{Obj} = \{A_i^{Obj} \in SA^{Obj} \cup \{ID^{Obj}\} / i \in \mathbb{N}\} \cup \{All^A\}$ where $All^A$ denotes a dummy attribute at the least detailed level.

The Object Hierarchy. An object hierarchy (OH) is similar to an AH but it is defined between many COs rather than between attributes. An OH is characterized by its name and by the set of COs, each one having a level within the hierarchy.

Definition 4. An OH is denoted by $OH = \{Obj_i / i \in \mathbb{N}\} \cup \{All^{Obj}\}$ where $All^{Obj}$ represents a dummy object at the least detailed level.

The Multidimensional Schema. The multidimensional schema is composed of: (1) the set of complex objects, (2) the set of complex relationships, (3) the set of attribute hierarchies and (4) the set of object hierarchies.

Definition 5. The multidimensional schema is denoted by $SCM = (SO, SR, SAH, SOH)$ where $SO = \{Obj_i / i \in \mathbb{N}\}$, $SR = \{R_j / j \in \mathbb{N}\}$, $SAH = \{AH_k / k \in \mathbb{N}\}$ and $SOH = \{OH_m / m \in \mathbb{N}\}$.

2.2 Example 1

A research laboratory wishes to warehouse data about scientific publishing in order to answer different analysis needs such as (1) assessing the quality of publications according to different criteria (e.g. publication ratings) and (2) assessing the scientific production of a researcher according to his/her publishing frequency. Data about scientific publishing can be considered complex: they originate from many sources (e.g. DBLP, PubZone.org), they may have different formats (e.g. images in conference websites) and they may be diversely structured (e.g. publications are typically semistructured). In order to meet the users’ analysis needs, the data may be organized using our model as follows.

1. The complex objects: Publication, Author, Proceedings, Conference, Journal_number, Journal_volume, Journal, Date. Examples of attributes for the object Publication are title, pages, keyword, type and the identifier publication_id;

2. The relationships: Authored_by between Publication and Author, Date_pub between Publication and Date, Publi_conf between Publication and Proceedings, Publi_journal between Publication and Journal_number;


3. The attribute hierarchies: H_pub associated with Publication and composed of publication_id and type, H_time associated with Date and composed of date_id, month and year;

4. The object hierarchies: H_conf composed of Proceedings and Conference, H_journal composed of Journal_number, Journal_volume and Journal.

The multidimensional schema described above is depicted in Figure 1.
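To make Definitions 1 to 5 concrete, the schema of Example 1 can be written down as plain data structures. The following Python sketch is only an illustration: the class and field names are our own choices, underscores are assumed in the identifiers, and the identifiers of objects other than Publication and Date are placeholders that Example 1 does not specify.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComplexObject:
    """Definition 1: a CO is an identifier plus a set of attributes."""
    name: str                             # label of the CO (assumption)
    identifier: str                       # ID^Obj
    attributes: frozenset = frozenset()   # SA^Obj


@dataclass(frozen=True)
class ComplexRelationship:
    """Definition 2: a link from a source CO to a target CO."""
    name: str
    source: str
    target: str


@dataclass(frozen=True)
class Hierarchy:
    """Definitions 3 and 4: levels listed from most to least detailed."""
    name: str
    levels: tuple

    def all_levels(self):
        # Append the dummy top level (All^A or All^Obj).
        return self.levels + ("All",)


@dataclass
class Schema:
    """Definition 5: SCM = (SO, SR, SAH, SOH)."""
    objects: dict = field(default_factory=dict)
    relationships: dict = field(default_factory=dict)
    attribute_hierarchies: dict = field(default_factory=dict)
    object_hierarchies: dict = field(default_factory=dict)


def build_publication_schema():
    scm = Schema()
    scm.objects["Publication"] = ComplexObject(
        "Publication", "publication_id",
        frozenset({"title", "pages", "keyword", "type"}))
    scm.objects["Date"] = ComplexObject(
        "Date", "date_id", frozenset({"month", "year"}))
    # Placeholder identifiers: Example 1 does not name them.
    for name in ("Author", "Proceedings", "Conference",
                 "Journal_number", "Journal_volume", "Journal"):
        scm.objects[name] = ComplexObject(name, name.lower() + "_id")
    for rel in (
        ComplexRelationship("Authored_by", "Publication", "Author"),
        ComplexRelationship("Date_pub", "Publication", "Date"),
        ComplexRelationship("Publi_conf", "Publication", "Proceedings"),
        ComplexRelationship("Publi_journal", "Publication", "Journal_number"),
    ):
        scm.relationships[rel.name] = rel
    scm.attribute_hierarchies["H_pub"] = Hierarchy(
        "H_pub", ("publication_id", "type"))
    scm.attribute_hierarchies["H_time"] = Hierarchy(
        "H_time", ("date_id", "month", "year"))
    scm.object_hierarchies["H_conf"] = Hierarchy(
        "H_conf", ("Proceedings", "Conference"))
    scm.object_hierarchies["H_journal"] = Hierarchy(
        "H_journal", ("Journal_number", "Journal_volume", "Journal"))
    return scm


scm_pub = build_publication_schema()
```

Keeping the four component sets as separate dictionaries mirrors the tuple of Definition 5 and makes it straightforward to select a subset of components later on.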

Fig. 1. Example of a complex object-based multidimensional schema

3 OLAP operators

Our proposed multidimensional model is independent from any analysis context, i.e. there is no a priori fact or dimension. Indeed, the fact and dimensions are defined on-line using a projection operation of the multidimensional schema on a set of its components. The projection produces a new structure, called complex cube, that can be materialized and visualized. Furthermore, existing cubes can serve as the basis to create new ones, by modifying either their structure or their data. In this section, we present all these operations.

3.1 The cube construction operators

Cubic Projection. The objective of this operation is to construct a complex cube from the multidimensional schema by projecting it on the following elements:

1. one complex object to play the role of the fact;
2. a set of measures, each one associated with
   (a) one attribute of the fact containing the basic (most detailed) values of the measure;
   (b) one function that aggregates the measure values;
   (c) a set of relationships along which the aggregation of the measure values makes sense (i.e. respecting the measure additivity);
3. a set of relationships that link the fact to the other objects;
4. a set of objects that are directly linked to the fact;
5. a set of object hierarchies containing the objects projected in 4;
6. a set of attribute hierarchies associated with the objects projected in 4 and 5.

A dimension is then composed of the members of all the object hierarchies that contain the object directly linked to the fact.

Definition 6. Let $SCM = (SO, SR, SAH, SOH)$ be a multidimensional schema and $SAF = \{af_i, i \in \mathbb{N}\}$ be a set of aggregation functions. The cubic projection is denoted by $\Pi^C_{Obj}(SCM) = C = (F, SM, SR^C, SD, SAH^C, SOH^C)$ where

– $F$ is the fact object such that $F \in SO$;
– $SM$ is the set of measures such that $SM = \{M_i / i \in \mathbb{N}\}$ where $M_i$ represents a measure. We also define the following three functions. (1) AttM associates each measure $M_i$ with one of the fact attributes, denoted by $A^{M_i}$ where $A^{M_i} \in \{ID^F\} \cup SA^F$. (2) AggRel associates each measure $M_i$ with the set $SR^{M_i}$. The set $SR^{M_i}$ is such that $SR^{M_i} = SR^C$ if $M_i$ is additive, $SR^{M_i} = \emptyset$ if $M_i$ is non-additive and $SR^{M_i} \subset SR^C$ if $M_i$ is semi-additive. (3) AggFun associates each measure $M_i$ with a function $af^{M_i} \in SAF$;
– $SR^C = \{R_i^C, i \in \mathbb{N}\}$ is the set of relationships where $SR^C \subseteq SR$;
– $SD = \{D_j, j \in \mathbb{N}\}$ is the set of dimensional objects where $SD \subseteq SO$;
– $SAH^C = \{AH_k^C, k \in \mathbb{N}\}$ is the set of reduced attribute hierarchies;
– $SOH^C = \{OH_m^C, m \in \mathbb{N}\}$ is the set of reduced object hierarchies.

Example 2. Consider the multidimensional schema (Fig. 1) called SCM_pub. Let us suppose that the user aims at analyzing the publication ratings and their keywords to get the maximum ratings of publications by author and period of time and the top keywords by author and conference. The corresponding cube for this analysis context is the result of projecting SCM_pub on the CO Publication. We denote such a cube by C_pub. Then, $C\_pub = \Pi^C_{Publication}(SCM\_pub) = (F, SM, SR^{C\_pub}, SD, SAH^{C\_pub}, SOH^{C\_pub})$ such that

– F = Publication
– SM = {max_rating, top_keyword} where
  • AttM(max_rating) = Rating
  • AggRel(max_rating) = {Authored_by, Date_pub}
  • AggFun(max_rating) = max
  • AttM(top_keyword) = keyword
  • AggRel(top_keyword) = {Authored_by, Publi_conf}
  • AggFun(top_keyword) = top_keyword
– $SR^{C\_pub}$ = {Authored_by, Date_pub, Publi_journal, Publi_conf}
– SD = {Date, Author, Proceedings, Conference, Journal_number, Journal_volume, Journal}
– $SAH^{C\_pub}$ = {H_time}
– $SOH^{C\_pub}$ = {H_conf, H_journal}
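The three measure functions of Definition 6 (AttM, AggRel, AggFun) can be illustrated on a handful of made-up facts. The sketch below is not the implementation described in section 4: the fact records, the member names and the reading of top_keyword as "most frequent keyword" are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Measure:
    name: str
    attribute: str       # AttM: fact attribute holding the basic values
    agg_rel: frozenset   # AggRel: relationships along which aggregation makes sense
    agg_fun: Callable    # AggFun: aggregation function


def top_keyword(values):
    """Most frequent value (an assumed reading of the top_keyword function)."""
    return Counter(values).most_common(1)[0][0]


def aggregate(facts, measure, relationship):
    """Aggregate a measure over the dimension members reached via a relationship."""
    if relationship not in measure.agg_rel:
        raise ValueError(
            f"{measure.name} cannot be aggregated along {relationship}")
    groups = defaultdict(list)
    for fact in facts:
        for member in fact[relationship]:
            groups[member].append(fact[measure.attribute])
    return {member: measure.agg_fun(vals) for member, vals in groups.items()}


max_rating = Measure("max_rating", "Rating",
                     frozenset({"Authored_by", "Date_pub"}), max)
top_kw = Measure("top_keyword", "keyword",
                 frozenset({"Authored_by", "Publi_conf"}), top_keyword)

# Made-up Publication facts; each relationship maps to the linked members.
facts = [
    {"publication_id": "p1", "Rating": 3, "keyword": "olap",
     "Authored_by": ["alice"]},
    {"publication_id": "p2", "Rating": 4, "keyword": "olap",
     "Authored_by": ["alice", "bob"]},
    {"publication_id": "p3", "Rating": 5, "keyword": "xml",
     "Authored_by": ["bob"]},
    {"publication_id": "p4", "Rating": 2, "keyword": "xml",
     "Authored_by": ["bob"]},
]

rating_by_author = aggregate(facts, max_rating, "Authored_by")
keyword_by_author = aggregate(facts, top_kw, "Authored_by")
```

Checking the relationship against AggRel before grouping is one way to enforce the additivity constraint: asking for max_rating along Publi_conf is rejected because that relationship is not in AggRel(max_rating).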


Constructing new cubes from existing cubes. In our model, a complex cube is a materialized view of the multidimensional schema. Materialized views (MV) are used to significantly enhance query response time in data warehouses [11]. In our work, we use MVs (existing cubes) to construct new cubes as an alternative to using the multidimensional schema. This choice is justified as follows. First, from a structural standpoint, the analysis contexts are likely to share many common elements (the same fact, the same dimensions, etc.) and it is natural to modify the structure of existing cubes by adding or deleting some elements. Secondly, from a content standpoint, many analysis contexts may correspond to the same cube structure while containing different data. In this section, we provide two kinds of cube-based operators: (1) structure-related operations (Table 1) that modify the structure of existing cubes and (2) data-related operations (Table 2) that produce same-structured cubes that contain different data.

Operation                              Definition
Add / remove a relationship            $ADD_R(C, R^+)$ | $REM_R(C, R^C_-)$
Add / remove an attribute hierarchy    $ADD_{AH}(C, AH^+)$ | $REM_{AH}(C, AH^C_-)$
Add / remove an object hierarchy       $ADD_{OH}(C, OH^+)$ | $REM_{OH}(C, OH^C_-)$
Add / remove a measure                 $ADD_M(C, M^+)$ | $REM_M(C, M^C_-)$

Table 1. Structure-related operators for complex cube construction
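The operators of Table 1 can be read as pure functions that take a cube and return a new one. The following Python sketch assumes a simplified cube that only records the names of its components; the class, the function names and the underscored identifiers are illustrative assumptions, and the cube used is the C_pub of Example 2.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cube:
    fact: str
    measures: frozenset
    relationships: frozenset
    attribute_hierarchies: frozenset
    object_hierarchies: frozenset


def _add(cube, field_name, item):
    # Return a copy of the cube with the item added to one component set.
    return replace(cube, **{field_name: getattr(cube, field_name) | {item}})


def _remove(cube, field_name, item):
    current = getattr(cube, field_name)
    if item not in current:
        raise KeyError(f"{item} is not part of the cube")
    return replace(cube, **{field_name: current - {item}})


def add_r(cube, r): return _add(cube, "relationships", r)
def rem_r(cube, r): return _remove(cube, "relationships", r)
def add_ah(cube, ah): return _add(cube, "attribute_hierarchies", ah)
def rem_ah(cube, ah): return _remove(cube, "attribute_hierarchies", ah)
def add_oh(cube, oh): return _add(cube, "object_hierarchies", oh)
def rem_oh(cube, oh): return _remove(cube, "object_hierarchies", oh)
def add_m(cube, m): return _add(cube, "measures", m)
def rem_m(cube, m): return _remove(cube, "measures", m)


c_pub = Cube(
    fact="Publication",
    measures=frozenset({"max_rating", "top_keyword"}),
    relationships=frozenset(
        {"Authored_by", "Date_pub", "Publi_journal", "Publi_conf"}),
    attribute_hierarchies=frozenset({"H_time"}),
    object_hierarchies=frozenset({"H_conf", "H_journal"}),
)

# Keep only max_rating, analyzed by author and by date.
c_pub1 = rem_oh(rem_oh(rem_r(rem_r(rem_m(c_pub, "top_keyword"),
                                   "Publi_journal"), "Publi_conf"),
                       "H_conf"), "H_journal")
```

Because each operator returns a new immutable cube, the original cube stays available as a starting point for other analysis contexts, which matches the idea of reusing materialized cubes.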

Operation                                               Definition
Data selection according to a predicate on an object    $\sigma^C_C(P(Obj_\sigma))$
Union of two cubes                                      $C_1 \cup C_2$
Difference of two cubes based on one object             $C_1 -^C C_2(Obj_-)$
Intersection of two cubes based on one object           $C_1 \cap C_2(Obj_\cap)$

Table 2. Data-related operators for complex cube construction

Example 3. Based on the cube C_pub of example 2, the user can create a new cube in order to analyze max_rating by author and by date. We can write $C\_pub1 = REM_{OH}(REM_{OH}(REM_R(REM_R(REM_M(C\_pub, top\_keyword), Publi\_journal), Publi\_conf), H\_conf), H\_journal)$. Now, based on C_pub1, the user can switch from max_rating to top_keyword and analyze it by author and by journal. We can write $C\_pub2 = ADD_{OH}(ADD_R(REM_R(REM_M(ADD_M(C\_pub1, top\_keyword), max\_rating), Date\_pub), Publi\_journal), H\_journal)$. Then, based on C_pub2, the user can create two cubes that contain respectively publications whose titles contain the word database: $C_1 = \sigma^C(Contains(Publication.Title, = \text{'database'}))$ and authors whose surnames begin with the letter A: $C_2 = \sigma^C(FirstLetter(Author.FamilyName = \text{'A'}))$. Finally, based on $C_1$ and $C_2$, a cube can be created for publications whose titles contain the word database and that are written, among others, by authors whose surnames begin with A: $C_1 \cap C_2(Publication)$.
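The data-related part of Example 3 (two selections followed by an intersection based on Publication) can be sketched on toy data. The representation below, where a cube's data is a dictionary of publications with their linked author surnames, is an assumption made for illustration, as are the function names and the sample records.

```python
def select_on_fact(cube, predicate):
    """Selection with a predicate on the fact object (here Publication)."""
    return {pid: rec for pid, rec in cube.items() if predicate(rec)}


def select_on_author(cube, predicate):
    """Selection with a predicate on the Author object.

    Only matching authors are kept, together with the publications
    that are linked to at least one of them.
    """
    result = {}
    for pid, rec in cube.items():
        kept = [a for a in rec["authors"] if predicate(a)]
        if kept:
            result[pid] = {**rec, "authors": kept}
    return result


def intersect_on_publication(c1, c2):
    """Intersection based on Publication: publications present in both cubes.

    Records are taken from c1, so co-authors not selected in c2 are kept
    (the 'among others' of Example 3).
    """
    return {pid: rec for pid, rec in c1.items() if pid in c2}


# Made-up content for the cube C_pub2.
c_pub2 = {
    "p1": {"title": "A database survey", "authors": ["Adams", "Brown"]},
    "p2": {"title": "Database tuning", "authors": ["Clark"]},
    "p3": {"title": "Text mining", "authors": ["Allen"]},
}

c1 = select_on_fact(c_pub2, lambda rec: "database" in rec["title"].lower())
c2 = select_on_author(c_pub2, lambda surname: surname.startswith("A"))
result = intersect_on_publication(c1, c2)
```

With this sample data, only p1 has both a title containing the word database and an author whose surname begins with A, and its co-author Brown is preserved in the result.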

3.2 Visualization operators

In order to perform OLAP analyses, data are displayed using different visualization solutions (e.g. cross-tabs). Choosing a convenient solution for the user is an issue in OLAP visualization [6]. Therefore, as stated in [6], there is a need for an abstraction layer of OLAP visualization that enables switching from one visualization technique to another. In this paper, we introduce the notion of view over a complex cube as an abstract visualization solution. Thus, it is possible to describe OLAP analysis operations at an abstract level and then let the user choose the appropriate interface according to the nature of the data to be analyzed. Furthermore, we provide a formal description of a view that can be stored and grouped with other view definitions to form navigation contexts. Then, the navigation contexts can be further processed using different techniques (e.g. data mining) to enhance or personalize the OLAP user interface.

View Projection. A view projection operation is similar to the Display operation introduced in [15]. In our work, this operation displays the following elements:

– A view fact (VF) that maps onto the fact of the complex cube. A VF is characterized by its name and by the set of its features. A feature maps onto a fact attribute of the complex cube;
– A set of measures, selected among the measures of the cube;
– A view dimension (VD) per relationship of the cube. A VD corresponds to a dimension of the cube and it is characterized by its name and by the set of its features. A VD feature maps onto one attribute of a dimension member of the cube. Moreover, in order to know the aggregation level of the measure values, we associate the VD with two elements:
  • a complex object that belongs to one object hierarchy of the dimension, if there is any. We denote such an object by AO. In case there is no hierarchy, the VD is associated with the object directly linked to the fact;
  • an attribute that belongs to one attribute hierarchy related to AO. Let us call this attribute AA.

The aggregated values of a measure are calculated along the AO then along AA. The VD features that may be displayed on a view depend on AO and AA. Figure 2 depicts the notion of view over a complex cube. In this figure, the VF features are $A^{VF}_1$ and $A^{VF}_2$, the measures are $M_1$ and $M_2$, the VDs are $VD_1$, $VD_2$ and $VD_3$, which correspond to the relationships $R_1$, $R_2$ and $R_3$, and the VD features are $A^{VD_1}_1$, $A^{VD_1}_2$, $A^{VD_2}_1$, $A^{VD_2}_2$, $A^{VD_2}_3$ and $A^{VD_3}_1$.


Fig. 2. The notion of view over a complex cube

Definition 7. Let $C = (F, SM, SR^C, SD, SAH^C, SOH^C)$ be a complex cube. The view projection is denoted by $\Pi^V(C) = V^C = (F^V, SM^V, SD^V)$ where

– $F^V$ is the VF such that $F^V = \{A^{F^V}_p / p \in \mathbb{N}\}$ where $F^V \subseteq SA^F \cup \{ID^F\}$ if the view displays the basic data of the cube (no aggregation is performed) and it is set to a constant Undefined if the view displays aggregated values of at least one measure;
– $SM^V = \{M^V_i / i \in \mathbb{N}\} \subseteq SM$ is the set of measures to be displayed;
– $SD^V = \{D^V_{R_j} / j \in \mathbb{N}\}$ where $D^V_{R_j}$ is a view dimension corresponding to the relationship $R_j$ of $C$. Moreover, we define the function $ViewObj(D^V_{R_j}) = VO^V_{R_j}$ which associates $D^V_{R_j}$ with an object belonging to the cube dimension that corresponds to the relationship $R_j$. Finally, we define the function $ViewAtt(D^V_{R_j}) = VA^V_{R_j}$ that associates $D^V_{R_j}$ with an attribute belonging to one attribute hierarchy of $VO^V_{R_j}$.

Example 4. Let us suppose that the user wants to analyze max_rating by author and by year and top_keyword by author and by proceedings. The user wants to display the publication titles, the authors’ first names and surnames and the proceedings’ titles. The view that corresponds to such an analysis is depicted in Fig. 3. Here, the VF features (the publication titles) are set to Undefined because the values of max_rating are aggregated at the year level along the hierarchy H_time. Formally, let C be the cube defined by this analysis context and V_RK the view that the user wants to display. Then, $V\_RK = \Pi^V(C) = (F^{V\_RK}, SM^{V\_RK}, SD^{V\_RK})$ where

– $F^{V\_RK}$ = Undefined;
– $SM^{V\_RK}$ = {max_rating, top_keyword};
– $SD^{V\_RK}$ = {Authors, Time, Conferences} such that
  • Authors = {Author.Firstname, Author.Lastname} where
    ∗ ViewObj(Authors) = Author
    ∗ ViewAtt(Authors) = Author.Author_id
  • Time = {Date.Year} where
    ∗ ViewObj(Time) = Date
    ∗ ViewAtt(Time) = Date.Year
  • Conferences = {Proceedings.Name} where
    ∗ ViewObj(Conferences) = Proceedings
    ∗ ViewAtt(Conferences) = Proceedings.Proceeding_ID
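The rule that sets the view fact to Undefined as soon as some measure is aggregated can be sketched as follows. This is an illustration under assumptions: the class names, the `base_att` field (the identifier of the object directly linked to the fact) and the choice of `None` to stand for Undefined are ours, not part of the model's formal definition.

```python
from dataclasses import dataclass
from typing import Optional

UNDEFINED = None  # stands for the constant Undefined of Definition 7


@dataclass(frozen=True)
class ViewDimension:
    name: str
    features: tuple   # displayed attributes of the dimension members
    view_obj: str     # ViewObj: the associated object (AO)
    view_att: str     # ViewAtt: the associated attribute (AA)
    base_att: str     # identifier of the object linked to the fact (assumption)

    def is_aggregated(self):
        # Measures are aggregated when AA is above the most detailed level.
        return self.view_att != self.base_att


@dataclass(frozen=True)
class View:
    fact_features: Optional[tuple]
    measures: tuple
    dimensions: tuple


def project_view(requested_fact_features, measures, dimensions):
    """Build a view; fact features are Undefined if any dimension aggregates."""
    aggregated = any(d.is_aggregated() for d in dimensions)
    fact_features = UNDEFINED if aggregated else tuple(requested_fact_features)
    return View(fact_features, tuple(measures), tuple(dimensions))


authors = ViewDimension("Authors", ("Author.Firstname", "Author.Lastname"),
                        "Author", "Author.Author_id", "Author.Author_id")
time_by_year = ViewDimension("Time", ("Date.Year",),
                             "Date", "Date.Year", "Date.date_id")
conferences = ViewDimension("Conferences", ("Proceedings.Name",),
                            "Proceedings", "Proceedings.Proceeding_ID",
                            "Proceedings.Proceeding_ID")

v_rk = project_view(("Publication.title",),
                    ("max_rating", "top_keyword"),
                    (authors, time_by_year, conferences))
```

In this sketch the Time dimension sits at the year level, above date_id, so the publication titles cannot be displayed and the view fact of V_RK is Undefined, as in Example 4.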

Fig. 3. Example of a view over a complex cube

Structuring the view and restricting the data. Structuring a view consists in adding or deleting features or measures according to the user’s needs, i.e. to have more or less information displayed. The operations of restricting data are similar to slice-and-dice operations in traditional OLAP. They consist in applying a selection predicate either to the fact/dimension feature values or to the detailed/aggregated measure values. Conversely, unrestricting data consists in displaying all the values of a feature or a measure and it applies to previously restricted values. Table 3 summarizes these operations.

Operation                                   Definition
Add / remove measures                       $ADD_{VM}(V^C, M^+)$ | $REM_{VM}(V^C, M^-)$
Add / remove features of fact               $ADD_{FF}(V^C, FF^+)$ | $REM_{FF}(V^C, FF^-)$
Add / remove features of dimension          $ADD_{DF}(V^C, DF^+)$ | $REM_{DF}(V^C, DF^-)$
Restrict / unrestrict data of fact          $\sigma^{FF}_{V^C}(P(FF))$ | $\mu^{FF}_{V^C}(FF)$
Restrict / unrestrict data of dimension     $\sigma^{DF}_{V^C}(P(DF))$ | $\mu^{DF}_{V^C}(DF)$
Restrict / unrestrict data of measure       $\sigma^{M}_{V^C}(P(M))$ | $\mu^{M}_{V^C}(M)$

Table 3. Structure and data-related operators on views over complex cubes


Example 5. Examples of visualization operations on the view V_RK of example 4 are (1) adding the names of conferences: $ADD_{DF}(V\_RK, Conferences.Conference.Name)$ and (2) displaying only publications of year 2000: $\sigma^{DF}_{V\_RK}(Time.Date.year = \text{'2000'})$.
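Restricting and unrestricting can be thought of as attaching and detaching a predicate on a feature, with the displayed rows recomputed on demand. The class below is a minimal sketch under that assumption; the class name, the row layout and the sample values are hypothetical.

```python
class RestrictableView:
    """A view whose displayed rows can be restricted per feature."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._predicates = {}   # feature name -> selection predicate

    def restrict(self, feature, predicate):
        # Counterpart of the sigma operators of Table 3.
        self._predicates[feature] = predicate
        return self

    def unrestrict(self, feature):
        # Counterpart of the mu operators: show all values again.
        self._predicates.pop(feature, None)
        return self

    def rows(self):
        return [row for row in self._rows
                if all(pred(row[feat])
                       for feat, pred in self._predicates.items())]


# Made-up aggregated rows of a view such as V_RK.
view = RestrictableView([
    {"Author.Lastname": "Adams", "Date.year": "2000", "max_rating": 4},
    {"Author.Lastname": "Brown", "Date.year": "2001", "max_rating": 5},
    {"Author.Lastname": "Clark", "Date.year": "2000", "max_rating": 3},
])

view.restrict("Date.year", lambda year: year == "2000")
rows_2000 = view.rows()
view.unrestrict("Date.year")
all_rows = view.rows()
```

Storing the predicate instead of filtering the rows destructively is what makes the unrestrict operation possible without going back to the cube.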

3.3 Aggregate operators

An aggregate operation consists in changing the associated object of a view dimension or its associated attribute. The measure values are then aggregated or detailed accordingly. We define three kinds of aggregate operations.

– the Object hierarchy-based operations consist in replacing the current AO with the object at the upper level according to an object hierarchy (rollup) or at the lower level (drill down);

– the Attribute hierarchy-based operations consist in replacing the current AA with the attribute at the upper level according to an attribute hierarchy (rollup) or at the lower level (drill down);

– the Hierarchyless operations are applicable if AO (resp. AA) does not belong to any object hierarchy (resp. to any attribute hierarchy). The hierarchyless rollup and drill-down consist respectively in removing and displaying the VD.

Note. Due to space limitations, we omit the formal notations for aggregate operations. Yet, we summarize them in Table 4.

Operation     Kind                         Definition
Rollup        Object hierarchy-based       $RollUp_{OH}(V^C, D^V_{R_{RU}}, OH^C)$
              Attribute hierarchy-based    $RollUp_{AH}(V^C, D^V_{R_{RU}}, AH^C)$
              Hierarchyless                $RollUp_{HL}(V^C, D^V_{R_{RU}})$
Drill down    Object hierarchy-based       $DrillDown_{OH}(V^C, D^V_{R_{RU}}, OH^C)$
              Attribute hierarchy-based    $DrillDown_{AH}(V^C, D^V_{R_{RU}}, AH^C)$
              Hierarchyless                $DrillDown_{HL}(V^C, D^V_{R_{RU}})$

Table 4. Granularity-related operators

Example 6. Examples of aggregate operations on the view V_RK are the following: (1) Attribute hierarchy-based rollup along the hierarchy H_time and according to the relationship Date_pub to get max_rating by author and for all periods of time: $RollUp_{AH}(V\_RK, Time, H\_time)$. (2) Object hierarchy-based rollup along the hierarchy H_conf to get top_keyword by author and by conference: $RollUp_{OH}(V\_RK, Conferences, H\_conf)$. (3) Hierarchyless rollup to get max_rating by year and top_keyword by proceedings for all authors: $RollUp_{HL}(V\_RK, Authors)$. The drill down operations are the converse of the previous operations.
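An attribute hierarchy-based rollup amounts to moving the associated attribute AA one level up and recomputing the measure at the new level. The sketch below illustrates this for max_rating along H_time; the fact records, the function names and the flat representation of the hierarchy are assumptions made for illustration.

```python
# H_time from Example 1, ordered from most to least detailed,
# with the dummy level All on top.
H_TIME = ("date_id", "month", "year", "All")


def roll_up(level, hierarchy):
    """Return the level just above the current one."""
    i = hierarchy.index(level)
    if i + 1 == len(hierarchy):
        raise ValueError("already at the least detailed level")
    return hierarchy[i + 1]


def drill_down(level, hierarchy):
    """Return the level just below the current one."""
    i = hierarchy.index(level)
    if i == 0:
        raise ValueError("already at the most detailed level")
    return hierarchy[i - 1]


def max_rating_by(facts, level):
    """Aggregate the rating with max at the given level of H_time."""
    result = {}
    for fact in facts:
        key = "All" if level == "All" else fact[level]
        result[key] = max(result.get(key, fact["rating"]), fact["rating"])
    return result


# Made-up facts with their Date attributes flattened.
FACTS = [
    {"date_id": "2000-01-15", "month": "2000-01", "year": 2000, "rating": 3},
    {"date_id": "2000-06-02", "month": "2000-06", "year": 2000, "rating": 5},
    {"date_id": "2001-03-09", "month": "2001-03", "year": 2001, "rating": 4},
]

by_year = max_rating_by(FACTS, "year")
upper = roll_up("year", H_TIME)            # the view moves from year to All
by_all_periods = max_rating_by(FACTS, upper)
```

Rolling up from year lands on the dummy level All, which corresponds to "for all periods of time" in the first operation of Example 6; drill_down is its converse.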


4 Implementation

In order to validate our multidimensional schema and operators, we implemented the core of a warehousing and analysis framework (Fig. 4). We have translated the conceptual modeling elements into the logical and physical levels using XML. The choice of XML is motivated by its widespread use and its power to describe heterogeneous and diversely structured data, or complex data in general. Thus, we developed an XML schema that describes the structure of any data warehouse and any complex cube. Then, for a functional validation, we used the dblp.xml file as the main source of our data warehouse. At the metadata level, we defined an XML file dblp_xwh.xml that describes the content of the data warehouse, which is then materialized as a set of other XML files.

The modules of the platform are the following. (1) The ETL module reads dblp.xml and loads the data into the data warehouse XML files. These files are then stored in a native XML database (eXist). (2) The cube specification module implements the cubic projection operator. It reads the metadata file (dblp_xwh.xml) as well as the data files and produces a metadata file (cube.xml) and a set of XML documents that contain the actual data. The visualization and analysis-related operators will be implemented as a next step.
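As an illustration of what a cube metadata document might look like, the following Python sketch serializes the structure of the cube C_pub with the standard library. The element and attribute names are invented for this example; the actual XML schema of the platform is not reproduced in this paper.

```python
import xml.etree.ElementTree as ET


def cube_metadata(fact, measures, relationships):
    """Build a hypothetical metadata tree for a complex cube.

    measures maps a measure name to (fact attribute, aggregation function).
    """
    root = ET.Element("cube", {"fact": fact})
    measures_el = ET.SubElement(root, "measures")
    for name, (attribute, function) in measures.items():
        ET.SubElement(measures_el, "measure",
                      {"name": name, "attribute": attribute,
                       "function": function})
    relationships_el = ET.SubElement(root, "relationships")
    for rel in relationships:
        ET.SubElement(relationships_el, "relationship", {"name": rel})
    return root


root = cube_metadata(
    "Publication",
    {"max_rating": ("Rating", "max"),
     "top_keyword": ("keyword", "top_keyword")},
    ["Authored_by", "Date_pub", "Publi_journal", "Publi_conf"],
)
xml_text = ET.tostring(root, encoding="unicode")
```

The resulting string could be written to a file or stored in an XML database; reading it back with `ET.fromstring` gives access to the measures and relationships through simple path queries.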

Fig. 4. System architecture

5 Conclusion

In this paper, we have presented a set of OLAP operators for a complex object-based multidimensional model. The first set of operators allows constructing complex data cubes from the multidimensional schema or from existing cubes. The structure-related operators produce new cubes having a different structure than the original cubes, whereas the data-related operators produce same-structured cubes containing more or less data. We have also defined a projection operator over a data cube in order to display its data. Other view-related operators aim at displaying more or fewer features or measures on a view (structure-related operators) or at getting more or less data displayed (data-related operators). Finally, the aggregate operators aim at having more or less detail about the measures’ values. There are many perspectives for our work. First, we plan to consider more complex-structured measures instead of simple attributes. Then, we shall extend the cube structure by considering the attribute hierarchies in relation to the fact and the object hierarchies that contain the fact. These hierarchies will then allow observing the same measure at different levels of the hierarchies and thus enable fact-based aggregate operations. Finally, we plan to extend the visualization and analysis operations to displaying the internal structure of the complex objects (e.g. the relationships between attributes) and thus enable relationship-aware analyses.

References

1. Alberto Abelló, José Samos, and Fèlix Saltor. A framework for the classification and description of multidimensional data models. In Heinrich C. Mayr, Jiří Lažanský, Gerald Quirchmayr, and Pavel Vogel, editors, Proceedings of the 12th International Conference on Database and Expert Systems Applications (DEXA’01), Munich, Germany, Lecture Notes in Computer Science, pages 668–677. Springer, 2001.

2. Sandro Bimonte, Anne Tchounikine, and Maryvonne Miquel. Towards a spatial multidimensional model. In Il-Yeol Song and Juan Trujillo, editors, Proceedings of the ACM 8th International Workshop on Data Warehousing and OLAP (DOLAP’05), Bremen, Germany, pages 39–46. ACM, 2005.

3. Omar Boussaïd and Doulkifli Boukraa. Multidimensional Modeling of Complex Data, pages 1358–1364. Encyclopedia of Data Warehousing and Mining, Second Edition. IGI Publishing, Hershey, PA, USA, 2009.

4. Omar Boussaïd, Adrian Tanasescu, Fadila Bentayeb, and Jérôme Darmont. Integration and dimensional modelling approaches for complex data warehousing. Journal of Global Optimization, 37(4):571–591, April 2007.

5. Surajit Chaudhuri and Umeshwar Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record, 26(1):65–74, 1997.

6. Alfredo Cuzzocrea and Svetlana Mansmann. OLAP Visualization: Models, Issues, and Techniques, pages 1439–1446. Encyclopedia of Data Warehousing and Mining, Second Edition. IGI Publishing, Hershey, PA, USA, 2009.

7. M. L. Damiani and Stefano Spaccapietra. Spatial Data Warehouse Modelling. In Processing and Managing Complex Data for Decision Support. Idea Group Publishing, 2006.

8. Matteo Golfarelli, Stefano Rizzi, and Boris Vrdoljak. Data warehouse design from XML sources. In Proceedings of the 4th ACM International Workshop on Data Warehousing and OLAP (DOLAP 2001), Atlanta, Georgia, USA, 2001.

9. Akihiro Inokuchi and Koichi Takeda. A method for online analytical processing of text data. In Mário J. Silva, Alberto H. F. Laender, Ricardo A. Baeza-Yates, Deborah L. McGuinness, Bjørn Olstad, Øystein Haug Olsen, and André O. Falcão, editors, Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM 2007), Lisbon, Portugal, pages 455–464. ACM, 2007.

10. Mikael R. Jensen, Thomas H. Møller, and Torben Bach Pedersen. Specifying OLAP cubes on XML data. Journal of Intelligent Information Systems, 17(2-3):255–280, 2001.

11. Beixin Lin, Yu Hong, and Zu-Hsu Lee. Data Warehouse Performance, pages 580–585. Encyclopedia of Data Warehousing and Mining, Second Edition. IGI Publishing, Hershey, PA, USA, 2009.

12. Sergio Luján-Mora, Juan Trujillo, and Il-Yeol Song. Multidimensional modeling with UML package diagrams. In Stefano Spaccapietra, Salvatore T. March, and Yahiko Kambayashi, editors, Proceedings of the 21st International Conference on Conceptual Modeling (ER’02), Tampere, Finland, volume 2503 of Lecture Notes in Computer Science, pages 199–213. Springer, 2002.

13. Vicky Nassis, Rajagopal Rajugan, Tharam S. Dillon, and J. Wenny Rahayu. Conceptual design of XML document warehouses. In Yahiko Kambayashi, Mukesh K. Mohania, and Wolfram Wöß, editors, Proceedings of the 6th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2004), volume 3181 of Lecture Notes in Computer Science, pages 1–14. Springer, 2004.

14. Torben Bach Pedersen, Junmin Gu, Arie Shoshani, and Christian S. Jensen. Object-extended OLAP querying. Data & Knowledge Engineering, 68(5):453–480, 2009.

15. Franck Ravat, Olivier Teste, Ronan Tournier, and Gilles Zurfluh. Algebraic and graphic languages for OLAP manipulations. International Journal of Data Warehousing and Mining, 4(1):17–46, 2008.

16. Alejandro A. Vaisman and Alberto O. Mendelzon. A temporal query language for OLAP: Implementation and a case study. In Giorgio Ghelli and Gösta Grahne, editors, 8th International Workshop on Database Programming Languages (DBPL’01), Frascati, Italy, volume 2397 of Lecture Notes in Computer Science, pages 78–96. Springer, 2001.

17. Nuwee Wiwatwattana, H. V. Jagadish, Laks V. S. Lakshmanan, and Divesh Srivastava. X3: A cube operator for XML OLAP. In Proceedings of the 23rd International Conference on Data Engineering (ICDE’07), Istanbul, Turkey, pages 916–925. IEEE, 2007.