The ORA-SS Approach for Designing Semistructured Databases

Xiaoying Wu, Tok Wang Ling, Mong Li Lee
National University of Singapore

Gillian Dobbie
University of Auckland, New Zealand


2

Outline

1. Motivation
2. Introduction to the ORA-SS (Object-Relationship-Attribute) model
3. From ORA-SS to XML DTD
4. Normal form for ORA-SS schema diagrams
5. Designing ORA-SS schema diagrams into normal form
6. Comparison with related proposals
7. Summary

3

1. Motivation

Example 1.1: Redundancy in an XML document

<department>
  <name>cs</name>
  <professor>
    <staffnumber>12</staffnumber>
    <name>Smith</name>
    <course>
      <coursecode>230</coursecode>
      <title>Database</title>
    </course>
  </professor>
  <professor>
    <staffnumber>22</staffnumber>
    <name>Jones</name>
    <course>
      <coursecode>230</coursecode>
      <title>Database</title>
    </course>
  </professor>
</department>
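The redundancy in Example 1.1 can be made visible with a few lines of Python. This is only an illustrative sketch: the function name `repeated_course_titles` is hypothetical, and it assumes that a course is identified by its `coursecode` element.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The document of Example 1.1.
XML_DOC = """
<department>
  <name>cs</name>
  <professor>
    <staffnumber>12</staffnumber>
    <name>Smith</name>
    <course><coursecode>230</coursecode><title>Database</title></course>
  </professor>
  <professor>
    <staffnumber>22</staffnumber>
    <name>Jones</name>
    <course><coursecode>230</coursecode><title>Database</title></course>
  </professor>
</department>
"""


def repeated_course_titles(xml_text):
    """Return {coursecode: [title, ...]} for courses stored more than once."""
    root = ET.fromstring(xml_text.strip())
    titles = defaultdict(list)
    for course in root.iter("course"):
        titles[course.findtext("coursecode")].append(course.findtext("title"))
    # A course code with several stored titles means its title is duplicated.
    return {code: ts for code, ts in titles.items() if len(ts) > 1}


duplicates = repeated_course_titles(XML_DOC)
print(duplicates)
```

The title of course 230 is stored once per professor who teaches it, so a title change has to be applied in every copy.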

4

1. Motivation (Cont.) Example 1.1 (Cont.)

[Figure: (a) OEM database for the department document. The department node (name CS) has two professor nodes, Smith (staff number 12) and Jones (staff number 22), and each professor node has its own course node with course code 230 and title database. (b) DataGuide of the OEM database, with the labels department, name, professor, staff number, name, course, course code and grade.]

5

1. Motivation (Cont.) Example 1.1 (Cont.): Corresponding ORA-SS instance diagram and schema diagram

[Figure: (a) ORA-SS instance diagram. The department (name: cs) has professors Smith (staff number: 12) and Jones (staff number: 22); each professor has a nested course object with course code: 230 and title: Database. (b) Nested object classes in an ORA-SS schema diagram. department (name) is linked to professor (staff number, name) by a binary relationship type labelled 2, 1:n, 1:1, and professor is linked to course (course code, title) by a binary relationship type labelled 2, 1:n, 1:n.]

6

1. Motivation (Cont.) Example 1.1 (Cont.)

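[Figure: A better designed ORA-SS schema diagram. department (name) is linked to professor (staff number, name) by a binary relationship type labelled 2, 1:n, 1:1; professor is linked to the reference object class course1 by a binary relationship type labelled 2, 1:n, 1:n; course1 references the object class course (course code, title) through Course-Ref.]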

7

1. Motivation (Cont.) Example 1.1 (Cont.)

[Figure: A better designed ORA-SS instance diagram. The department (name: C.S.) has professors Smith (staff number: 12) and Jones (staff number: 22). Each professor has a course1 object that points through Course-Ref to the single course object (course code: 230, title: database).]

8

1. Motivation (Cont.) Example 1.2: Ambiguity in an OEM database and its DataGuide

[Figure: An OEM database with root JMP and project nodes J1, J2 and J3. Project nodes have id, name and member children; member nodes (for example M1) have name, position and publication children; publication nodes (Pub1, Pub2, Pub3) have number and title children.]

[Figure: DataGuide of the OEM database, with the labels project, id, name, member, name, position, publication, number and title.]

9

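1. Motivation (Cont.) Example 1.2 (Cont.): Ternary Relationship Type Representation

[Figure: (a) ORA-SS schema diagram in which mp is a ternary relationship type. Object classes project, member and publication are nested in that order; project and member are linked by jm (2, +, +), and publication is linked by mp (3, 0:n, 1:m). Attribute labels shown: id, name, position, name, title, number. (b) A data instance of (a) over projects j1, j2, j3, members m1, m2 and publications pub1, pub2, pub3. (c) DataGuide with the labels project, id, name, member, name, position, publication, number and title.]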

10

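1. Motivation (Cont.) Example 1.2 (Cont.): Binary Relationship Type Representation

[Figure: (a) ORA-SS schema diagram in which mp is a binary relationship type. Object classes project, member and publication are nested in that order; project and member are linked by jm (2, +, +), and member and publication are linked by mp (2, *, +). Attribute labels shown: id, name, position, name, title, number. (b) A data instance of (a) over projects j1, j2, j3, members m1, m2 and publications pub1, pub2, pub3. (c) DataGuide with the labels project, id, name, member, name, position, publication, number and title.]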

Note the DataGuide for the schema diagram is the same as for the previous schema!
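The ambiguity can be reproduced with a small sketch. It approximates a DataGuide by the set of label paths reachable from the root, which is an assumption that only holds for tree-shaped data; the instance values (j1, m1, pub1, ...) and the root label `db` are made up for illustration.

```python
def label_paths(node, prefix=()):
    """Collect every label path from the root; atomic values are ignored."""
    label, children = node
    path = prefix + (label,)
    paths = {path}
    for child in children:
        if isinstance(child, tuple):  # strings are atomic values, not nodes
            paths |= label_paths(child, path)
    return paths


def pub(number):
    return ("publication", [("number", [number])])


def member(name, pubs):
    return ("member", [("name", [name])] + pubs)


def project(pid, members):
    return ("project", [("id", [pid])] + members)


# Ternary reading: which publication m1 has depends on the project.
ternary = ("db", [project("j1", [member("m1", [pub("pub1")])]),
                  project("j2", [member("m1", [pub("pub2")])])])

# Binary reading: m1 has the same publications whatever the project.
binary = ("db", [project("j1", [member("m1", [pub("pub1"), pub("pub2")])]),
                 project("j2", [member("m1", [pub("pub1"), pub("pub2")])])])

same_guide = label_paths(ternary) == label_paths(binary)
print(same_guide)
```

Both instances yield the same label paths, so the path summary alone cannot tell whether mp is binary or ternary; the degree recorded in the ORA-SS schema diagram can.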

11

2. Introduction to ORA-SS Model

Four concepts:
– object classes
– relationship types
– attributes
– references

Four diagrams:
– schema diagram
– instance diagram
– functional dependency diagram
– inheritance diagram

12

2. Introduction to ORA-SS Model (Cont.): Object Class

– attributes of object class
  • single-valued
  • multi-valued
– ordering on object class

[Figure: Object class employee with attributes in an ORA-SS schema diagram. employee has the attributes name, SSN, age and hobby; hobby carries the * marker.]

13

2. Introduction to ORA-SS Model (Cont.): Relationship Type

– attributes of relationship type
  • single-valued
  • multi-valued
– degree of n-ary relationship type
– participation constraints of objects in relationship type
– disjunctive relationship type
– recursive relationship type

14

2. Introduction to ORA-SS Model (Cont.): Relationship type (Cont.)

Representing a binary relationship type

[Figure: (a) ORA-SS schema diagram in which mp is a binary relationship type: project, member and publication nested in that order, linked by jm (2, +, +) and mp (2, *, +), with the attribute labels id, name, position, name, title and number. (b) A data instance of (a) over projects j1, j2, j3, members m1, m2 and publications pub1, pub2, pub3.]

15

2. Introduction to ORA-SS Model (Cont.): Relationship type (Cont.)

Representing a ternary relationship type

[Figure: (a) ORA-SS schema diagram in which mp is a ternary relationship type: project, member and publication nested in that order, linked by jm (2, +, +) and mp (3, 0:n, 1:m), with the attribute labels id, name, position, name, title and number. (b) A data instance of (a) over projects j1, j2, j3, members m1, m2 and publications pub1, pub2, pub3.]

16

2. Introduction to ORA-SS Model (Cont.): Attributes

– key attribute and identifier
– composite attribute
– disjunctive attribute
– attribute with unknown structure (ANY)
– ordering on attribute
– attributes of object class / relationship type
– single-valued / multi-valued attribute
– fixed and default values of attribute
– derived attribute

17

2. Introduction to ORA-SS Model (Cont.): Attributes (Cont.)

[Figure: Object classes with relationship type and attributes in an ORA-SS schema diagram. Object classes course and student are linked by the relationship type cs (2, 4:n, 3:8). Labels shown: title, number, dept-prefix (D:CS), ANY, first-name, last-name, number, hobby (*), and grade and mark (each labelled cs).]

18

2. Introduction to ORA-SS Model (Cont.): Attributes (Cont.)

[Figure: Disjunctive attribute and relationship in an ORA-SS schema diagram. Labels shown: course, code, title, lecture theatre, laboratory, exam venue, project, topic, homework, number, deadline and algorithm; relationship type assign (2, 1:n, 1:1).]

19

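2. Introduction to ORA-SS Model (Cont.): References

[Figure: Referencing an object class in an ORA-SS schema diagram. course (code, title, exam venue with the alternatives lecture theatre and laboratory, text book marked +) is linked by cs (2, 1:n, 1:m) to the reference object class student1, which carries the relationship attribute grade (labelled cs) and points through Student-Ref to the object class student (number, address, name composed of first name and last name).]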

20

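2. Introduction to ORA-SS Model (Cont.): References (Cont.)

[Figure: Recursive relationship type in an ORA-SS schema diagram. course (code, title) is linked by cp (2, 0:5, 1:n) to prereq (title), which references course through course-prereq.]

[Figure: Symmetric relationship sets in an ORA-SS schema diagram. course (code, title) is linked by cs (2, +, +) to student1 with attribute grade (labelled cs), and student (number, name) is linked by cs (2, +, +) to course1 with attribute grade (labelled cs); student1 and course1 point to student and course through Student-Ref and Course-Ref.]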

21

3. Mapping ORA-SS schema diagram to XML DTD

Algorithm 1: Mapping ORA-SS Schema Diagram to XML DTD
Input: an ORA-SS schema diagram SD
Output: an XML DTD
Begin
For each object class O in SD do:
  Step 1. Generate <!ELEMENT O (subelementsList)>, where subelementsList lists the sub-object classes of O.
  Step 2. For each attribute A of O:
    Case (1) A is a single-valued simple attribute: generate <!ATTLIST O A type>.
    Case (2) A is a single-valued composite attribute: replace A with its components and add each of them as <!ATTLIST O attributeName type>.
    Case (3) A is a multi-valued simple attribute: generate <!ELEMENT A (#PCDATA)>.
    Case (4) A is a multi-valued composite attribute: generate <!ELEMENT A EMPTY> and, for A's components, <!ATTLIST A componentName type>.

22

3. Mapping ORA-SS schema diagram to XML DTD (Cont.)

Algorithm 1: Mapping ORA-SS schema diagram to XML DTD (Cont.)

  Step 3. For each relationship attribute A under O:
    Case (1) A is a simple attribute: generate <!ELEMENT A (#PCDATA)> and add A to O's subelementsList.
    Case (2) A is a multi-valued simple attribute: generate <!ELEMENT A (#PCDATA)> and add A to O's subelementsList.
    Case (3) A is a single-valued composite attribute: generate <!ELEMENT A (#PCDATA)> and, for A's components, <!ATTLIST A componentName type>.
    Case (4) A is a multi-valued composite attribute: generate <!ELEMENT A (#PCDATA)> and, for A's components, <!ATTLIST A componentName type>; add A to O's subelementsList.
  Step 4. For each reference O-Ref:
    Case (1) O is a child object class of O1 and has no extra attributes or child object classes: generate <!ATTLIST O1 O-Ref IDREF(S)>.
    Case (2) O is a root object class, or it has nested attributes or child object classes: generate <!ATTLIST O O-Ref IDREF(S)>.
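Steps 1 and 2 of Algorithm 1 can be sketched for simple attributes as follows. This is a partial illustration, not the authors' implementation: the `Attribute` and `ObjectClass` classes, the `CDATA` type and the `#IMPLIED` default are assumptions, composite attributes, relationship attributes and references are left out, and multi-valued attributes are added to the content model because Example 3.1 does so for textbook.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    multivalued: bool = False
    occurrence: str = "*"      # used only when multivalued (assumption)
    dtd_type: str = "CDATA"    # assumed attribute type
    default: str = "#IMPLIED"  # assumed default declaration


@dataclass
class ObjectClass:
    name: str
    attributes: list = field(default_factory=list)
    children: list = field(default_factory=list)  # (child name, occurrence)


def object_class_to_dtd(oc):
    """Emit DTD declarations for one object class (Step 1, Step 2 cases 1 and 3)."""
    multi = [a for a in oc.attributes if a.multivalued]
    single = [a for a in oc.attributes if not a.multivalued]
    # Step 1: content model built from sub-object classes and element-valued attributes.
    sub = [a.name + a.occurrence for a in multi]
    sub += [name + occ for name, occ in oc.children]
    content = "(" + ", ".join(sub) + ")" if sub else "EMPTY"
    lines = [f"<!ELEMENT {oc.name} {content}>"]
    # Case (1): single-valued simple attributes become XML attributes.
    if single:
        decls = " ".join(f"{a.name} {a.dtd_type} {a.default}" for a in single)
        lines.append(f"<!ATTLIST {oc.name} {decls}>")
    # Case (3): multi-valued simple attributes become #PCDATA elements.
    lines += [f"<!ELEMENT {a.name} (#PCDATA)>" for a in multi]
    return lines


course = ObjectClass(
    "course",
    attributes=[Attribute("code", default="#REQUIRED"),
                Attribute("title"),
                Attribute("textbook", multivalued=True, occurrence="+")],
    children=[("student1", "+")],
)
dtd_lines = object_class_to_dtd(course)
print("\n".join(dtd_lines))
```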

23

3. Mapping ORA-SS schema diagram to XML DTD (Cont.)

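Example 3.1

[Figure: Referencing an object class in an ORA-SS schema diagram. course (code, title, exam venue with the alternatives lecture theatre and laboratory, text book marked +) is linked by cs (2, 1:n, 1:m) to the reference object class student1, which carries the relationship attribute grade (labelled cs) and points through Student-Ref to the object class student (number, address, name composed of first name and last name).]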

24

3. Mapping ORA-SS schema diagram to XML DTD (Cont.)

Example 3.1 (Cont.)

<!ELEMENT course (textbook+, student1+)>
<!ATTLIST course
    code            CDATA #REQUIRED
    title           CDATA #IMPLIED
    lecture-theater CDATA #IMPLIED
    laboratory      CDATA #IMPLIED>
<!ELEMENT textbook (#PCDATA)>
<!ELEMENT student1 (grade)>
<!ATTLIST student1 Student-Ref IDREF #REQUIRED>
<!ELEMENT grade (#PCDATA)>
<!ELEMENT student (name)>
<!ATTLIST student
    number  ID    #REQUIRED
    address CDATA #IMPLIED>
<!ELEMENT name EMPTY>
<!ATTLIST name
    first-name CDATA #IMPLIED
    last-name  CDATA #IMPLIED>

An XML DTD for the ORA-SS schema diagram
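To illustrate how the Student-Ref attribute of Example 3.1 is meant to be used, the sketch below resolves the references by hand. The sample data, the wrapper element `db` and the function `grades_by_student` are invented for the illustration; `xml.etree.ElementTree` does not validate against a DTD, so the ID/IDREF lookup is done with a plain dictionary.

```python
import xml.etree.ElementTree as ET

# Invented sample instance shaped like the DTD of Example 3.1.
DOC = """<db>
  <course code="CS230" title="Database">
    <textbook>Database Systems</textbook>
    <student1 Student-Ref="s1"><grade>A</grade></student1>
    <student1 Student-Ref="s2"><grade>B</grade></student1>
  </course>
  <student number="s1"><name first-name="Ann" last-name="Lee"/></student>
  <student number="s2"><name first-name="Bob" last-name="Tan"/></student>
</db>"""

root = ET.fromstring(DOC)

# Index student elements by their ID attribute (number).
students = {s.get("number"): s for s in root.iter("student")}


def grades_by_student(course):
    """Follow each Student-Ref and pair the student's name with the grade."""
    result = {}
    for ref in course.findall("student1"):
        name = students[ref.get("Student-Ref")].find("name")
        full_name = f'{name.get("first-name")} {name.get("last-name")}'
        result[full_name] = ref.findtext("grade")
    return result


grades = grades_by_student(root.find("course"))
print(grades)
```

Student details are stored once under student, while the grade, an attribute of the relationship type cs, stays with the reference element student1.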

25

4. Normal form for ORA-SS schema diagram

Observation: ORA-SS is similar to nested relations
– tree-like structure
– repeating groups or multiple occurrences of objects

E.g., the corresponding nested relation for the following ORA-SS schema diagram is

Dept (dept-name, course (code, title, student (number, s-name, grade)*)*)

[Figure: department (dept name) is linked to course (code, title) by a binary relationship type labelled 2, 1:n, 1:1; course is linked to student (number, s-name) by cs (2, 1:n, 1:n); grade is an attribute of cs.]

26

4. Normal form for ORA-SS schema diagram(Cont.)

Objective: to ensure that the set of nested relations corresponding to the ORA-SS schema diagram is in the normal form for sets of nested relations (NF-NR) [5,6].

We will define:
– Object class normal form (O-NF)
– Relationship type normal form (R-NF)
– ORA-SS normal form schema (ORA-SS NF)

27

4. Normal form for ORA-SS schema diagram(Cont.)

Defn: object class normal form (O-NF)

An object class O of an ORA-SS schema diagram is said to be in object class normal form (O-NF), if the nested relation constructed by O’s single valued attributes as its atomic attributes, O’s multivalued attributes as its repeating groups, is in normal form NF-NR.

28

4. Normal form for ORA-SS schema diagram (Cont.)

Example 4.1: Assume the functional dependencies {S# → dept, dept → faculty} hold for the following ORA-SS schema diagram:

[Figure: object class staff with attributes S#, dept and faculty.]

The corresponding nested relation for the schema diagram is Staff(S#, dept, faculty). It is not in 3NF, since faculty is transitively dependent on S#; hence the relation is not in NF-NR.

A better designed ORA-SS schema diagram:

[Figure: faculty, dept and staff as separate object classes nested in that order, each pair linked by a binary relationship type labelled 2, 1:n, 1:1.]

The transitive functional dependency is removed.
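The 3NF violation in Example 4.1 can be checked mechanically with the standard attribute-closure computation. This is a textbook sketch rather than part of the ORA-SS method; the function names are hypothetical and the check assumes that S# is the only candidate key of Staff.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, key, fds):
    """List FDs X -> A where X is not a superkey and A is not a key attribute."""
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        nonprime = set(rhs) - set(key) - set(lhs)
        if not is_superkey and nonprime:
            violations.append((frozenset(lhs), frozenset(nonprime)))
    return violations


STAFF = {"S#", "dept", "faculty"}
FDS = [({"S#"}, {"dept"}), ({"dept"}, {"faculty"})]

violations = third_nf_violations(STAFF, {"S#"}, FDS)
print(violations)
```

Only dept → faculty is reported, which is the dependency that the better design moves into its own object classes.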

29

4. Normal form for ORA-SS schema diagram(Cont.)

Defn: relationship type normal form (R-NF)

A relationship type R of an ORA-SS schema diagram D is said to be in relationship type normal form (R-NF), if the nested relation constructed by the identifiers of the participating object classes, and R’s atomic attributes as its atomic attributes, R’s multivalued attributes and composite attributes as its repeating groups, is in normal form NF-NR.

30

4. Normal form for ORA-SS schema diagram (Cont.)

Example 4.2: The ORA-SS schema diagram attempts to show that every lecturer of a course uses all the textbooks listed for that course in the curriculum, i.e. it should satisfy the MVD constraint course-code →→ isbn | staff#.

[Figure: course (course code, title) is linked to text (isbn, title) by ct (2, 1:n, 1:n), and lecturer (staff#, name, office) is nested under text by the ternary relationship type ctl (3, 1:n, 1:n).]

The nested relation for the relationship type ctl is ctl(course-code, isbn, staff#). It is not in 4NF, so it is not in NF-NR; hence the relationship type ctl is not in R-NF.

[Figure: A better design, in which the MVD is removed. course (code, title) is linked to text (isbn, title) by ct (2, 1:n, 1:n) and to lecturer (staff#, name, office) by cl (2, 1:n, 1:n).]
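The effect of the MVD in Example 4.2 can be illustrated on a small relation. The rows below are invented sample data, and `satisfies_mvd` is a hypothetical helper that tests course-code →→ isbn | staff# by checking that ctl equals the join of its two binary projections.

```python
def satisfies_mvd(rows):
    """True if rows (course, isbn, staff) equal the join of (course, isbn) and (course, staff)."""
    ct = {(c, i) for c, i, _ in rows}   # projection corresponding to ct
    cl = {(c, s) for c, _, s in rows}   # projection corresponding to cl
    joined = {(c, i, s) for c, i in ct for c2, s in cl if c == c2}
    return joined == set(rows)


# Invented sample: course 230 has three textbooks and two lecturers.
BOOKS = ["isbn-1", "isbn-2", "isbn-3"]
STAFF = ["12", "22"]
CTL = {("230", b, s) for b in BOOKS for s in STAFF}

ct_size = len({(c, i) for c, i, _ in CTL})
cl_size = len({(c, s) for c, _, s in CTL})
print(satisfies_mvd(CTL), len(CTL), ct_size + cl_size)
```

When the MVD holds, ctl stores every textbook and lecturer combination (six rows here), whereas the two binary relationship types ct and cl carry the same information in five rows.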

31

4. Normal form for ORA-SS schema diagram(Cont.)

Defn: ORA-SS normal form schema

An ORA-SS schema diagram D is in normal form (NF) iff it satisfies the following conditions:
1. Every object class in D is in O-NF.
2. For every relationship type R in D:
  (a) R is in R-NF.
  (b) Case (1) R is a binary relationship type from object class A to object class B: all of B's attributes can stay with B only if R is a one-to-many or one-to-one binary relationship type from A to B. All the attributes of R (if any) should be attached to B.
      Case (2) R is an n-ary relationship type (n > 2) with participating object classes O1, O2, …, On, and the path going downward from the top of D linking those object classes is /O1/O2/…/On: then for each object class Oi (2 ≤ i ≤ n),
        (i) Oi should have an i-ary relationship type Ri with its ancestors O1, O2, …, Oi-1.
        (ii) The attributes of Oi can stay with Oi only if the functional dependency Oi → O1, O2, …, Oi-1 can be derived from the functional dependency diagram for D. The attributes of Ri (if any) should be attached to Oi.
3. No relationship type is nested under a many-to-many or many-to-one binary relationship type, or under an n-ary (n > 2) relationship type.
4. No relationship type can be derived from other relationship types in D.

32

4. Normal form for ORA-SS schema diagram(Cont.)

Example 4.4: The first ORA-SS schema diagram below is not in NF if a professor is also an employee of the department: the qualifications of a professor can be derived from those of the employee, so this information is repeated in the underlying database.

[Figure: An ORA-SS schema diagram that is not in NF. department (name) has the child object classes professor and employee, each linked by a binary relationship type labelled 2, 1:n, 1:1. professor has staff#, title, research interests (*), grad student (*) and Qual. (+, with degree and year); employee has staff#, name, Qual. (+, with degree and year) and job-history (*, with company and j-date).]

[Figure: An ORA-SS schema diagram that is in NF. The structure is the same, except that Qual. appears only under employee and professor refers to employee through Staff-Ref.]

33

5. Converting ORA-SS Schema Diagrams into Normal Form

Two approaches for designing semistructured databases:

Approach 1.
– Based on the users' requirements, come up with an initial ORA-SS schema diagram.
– Normalize the ORA-SS schema diagram into its normal form.
– Map it to an XML DTD or XML Schema.

Approach 2.
– Extract a schema from the instances using schema extraction techniques.
– Translate the schema into an ORA-SS schema diagram. Semantic enrichment is needed here, since not all the required semantics are available from the extracted schema.
– Convert the ORA-SS schema diagram into its normal form.
– Translate the NF ORA-SS schema diagram back to an XML DTD or XML Schema.
– Restructure the initial data instance to conform to the generated XML DTD or XML Schema.

34

5. Converting ORA-SS Schema Diagrams into Normal Form(Cont.)

Algorithm 2: Converting an ORA-SS schema diagram into an NF ORA-SS schema diagram
Input: an ORA-SS schema diagram SD and its functional dependency diagram
Output: an NF ORA-SS schema diagram
{
  Step 1. Convert any object class that is not in O-NF to O-NF.
  Step 2. Put each relationship type R in R-NF.
  Step 3. This step involves two sub-steps.
    (1) Construct a diagram for each object class with its attributes.
    (2) Represent each relationship type R. We make R satisfy item (b) of condition 2 as well as condition 3 of the NF definition by introducing referencing object classes, and by requiring each relationship type to start with an object class with attributes (i.e., a non-reference object class).
  Step 4. Remove the relationship types, along with their associated attributes, that can be derived from other relationship types in the schema diagram, to satisfy condition 4 of the NF definition.
}

35

5. Converting ORA-SS Schema Diagrams into Normal Form(Cont.)

Example 5.1: There is a many-to-many binary relationship type pc between professor and course, and a many-to-many binary relationship type ct between course and textbook.

The schema diagram is not in ORA-SS NF since it violates condition 3 of the NF definition.

[Figure: (a) Initial ORA-SS schema diagram. professor (staff#, name) is linked to course (code, title) by pc (2, *, *), and course is linked to textbook (ISBN, title, author marked +) by ct (2, *, *).]

36

5. Converting ORA-SS Schema Diagrams into Normal Form(Cont.)

Example 5.1 (Cont.)

Step 1. The three given object classes are already in O-NF.
Step 2. The two relationship types pc and ct are already in R-NF.
Step 3. (1) Generate three diagrams for the object classes with attributes.

[Figure: (b) Fragment diagrams for object classes: professor (staff#, name), course (code, title) and textbook (ISBN, title, author marked +).]

37

5. Converting ORA-SS Schema Diagrams into Normal Form(Cont.)

Example 5.1 (Cont.)

Step 3 (Cont.). (2) Represent the binary relationship type pc by creating a reference object class course1 that references course, and nest course1 under professor.

[Figure: (c) Diagrams after representing relationship pc. professor (staff#, name) is linked by pc (2, *, *) to course1, which references course (code, title) through c-ref; textbook (ISBN, title, author marked +) is still a separate fragment.]

38

5. Converting ORA-SS Schema Diagrams into Normal Form(Cont.)

Example 5.1 (Cont.)

Step 3 (Cont.). (2) Represent the binary relationship type ct by creating a reference object class textbook1 that references textbook, and nest textbook1 under course.

[Figure: (d) Final ORA-SS schema diagram, which is in NF. professor (staff#, name) is linked by pc (2, *, *) to course1, which references course through c-ref; course (code, title) is linked by ct (2, *, *) to textbook1, which references textbook (ISBN, title, author marked +) through t-ref.]

Step 4 (passed). The schema generated is in NF.
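Sub-step (2) of step 3, as applied in Example 5.1, can be sketched as follows. The sketch covers only a chain of many-to-many binary relationship types; the dictionary representation and the function name are assumptions, and the n-ary case of Example 5.2, where a reference object class is nested under another reference object class, is not handled.

```python
def introduce_references(chain):
    """Un-nest a top-down chain of object classes linked by many-to-many binary relationships.

    Every object class becomes a top-level fragment; each relationship is
    represented by a reference object class (child name + "1") nested under
    the parent and pointing back to the child.
    """
    schema = {name: [] for name in chain}
    for parent, child in zip(chain, chain[1:]):
        schema[parent].append({"reference_class": child + "1",
                               "references": child})
    return schema


schema = introduce_references(["professor", "course", "textbook"])
for object_class, refs in schema.items():
    print(object_class, refs)
```

For the chain professor, course, textbook this yields course1 under professor and textbook1 under course, matching diagram (d); each object class keeps its attributes in a single place.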

39

5. Converting ORA-SS Schema Diagrams into Normal Form(Cont.)

Example 5.2.

There is a binary relationship type cs between course and student, and a ternary relationship type cst among course, student and tutor. grade is an attribute of the binary relationship type cs, and feedback is an attribute of the ternary relationship type cst.

The schema diagram is not in ORA-SS NF since it violates item (ii) of case 2 in condition 2(b) of the NF definition.

[Figure: (a) Initial ORA-SS schema diagram. course (cid, title) is linked to student (sid, name, age) by cs (2, 0:m, 0:n), and student is linked to tutor (tid, name, interest) by cst (3, 0:m, 0:n); grade is labelled cs and feedback is labelled cst. Some attributes carry ? or * markers.]

40

5. Converting ORA-SS Schema Diagrams into Normal Form(Cont.)

Example 5.2(Cont.)

Step 1. The three given object classes are already in O-NF.
Step 2. The two relationship types cs and cst are already in R-NF.
Step 3. (1) Generate three diagrams for the object classes with attributes.

[Figure: (b) Fragment diagrams for object classes: course (cid, title), student (sid, name, age) and tutor (tid, name, interest), with the ? and * markers of the initial diagram.]

41

5. Converting ORA-SS Schema Diagrams into Normal Form(Cont.)

Example 5.2 (Cont.)

Step 3 (Cont.). (2) Represent the binary relationship type cs. We create a reference object class student1 that references student, and nest student1 under course. The relationship attribute grade is attached to student1.

[Figure: (c) Diagram representing binary relationship cs. course (cid, title) is linked by cs (2, 0:m, 0:n) to student1, which carries grade (labelled cs) and references student (sid, name, age) through s-ref; tutor (tid, name, interest) is still a separate fragment.]

42

5. Converting ORA-SS Schema Diagrams into Normal Form(Cont.)

Example 5.2 (Cont.)

Step 3 (Cont.). (2) Represent the relationship type cst. We create a reference object class tutor1 that references tutor, and nest tutor1 under student1. The relationship attribute feedback is attached to tutor1.

[Figure: (d) Final ORA-SS schema diagram, which is in NF. course (cid, title) is linked by cs (2, 0:m, 0:n) to student1, which carries grade (labelled cs) and references student (sid, name, age) through s-ref; student1 is linked by cst (3, 0:m, 0:n) to tutor1, which carries feedback (labelled cst) and references tutor (tid, name, interest) through t-ref.]

Step 4 (passed). The schema generated is in NF.

43

6. Comparison with Related Proposal

The first attempt to define a normal form for semistructured data [4]:
– Defines a schema called S3-Graph, a labeled graph in which vertices correspond to objects and edges represent the object-subobject relationship. Its data instance is called a semistructured data graph.
– An S3-Graph cannot show the degree of an n-ary relationship type, nor can it distinguish between attributes of object classes and attributes of relationship types.

44

6. Comparison with Related Proposal(Cont.)

The first attempt to define a normal form for semistructured data [4] (Cont.):
– Defines a dependency constraint called SS-dependency.
– Proposes S3-NF. An S3-Graph is in S3-NF if there is no transitive SS-dependency. Hence, only this kind of redundancy can be recognized by S3-NF.

45

6. Comparison with Related Proposal(Cont.)

The first attempt to define a normal form for semistructured data [4] (Cont.):
– Presents two approaches to designing S3-NF databases:
  1. The decomposition method can remove the identified transitive SS-dependencies and achieve S3-NF, but it may not be able to remove partial functional dependencies inside an entity type or object class, or the redundancy that results from over-nesting.
  2. The transformation of a normal form ER diagram into an S3-Graph. The result may not be unique, because it depends on the path constructed. Hence some results may not satisfy the application requirements or agree with the user's viewpoint.

46

6. Comparison with Related Proposal(Cont.)

The most recent proposal: XNF (XML Normal Form) [2]
– It mainly provides algorithms to translate a schema, represented in a conceptual model called a CM hypergraph, into a scheme-tree forest in XNF.
– A CM hypergraph has no concept of attribute (so there are too many objects) and no hierarchical structure.
– The given algorithms are non-deterministic and inefficient.
– Adding newly required information requires redesigning the schema.
– The algorithms generate a large number of solutions rather than verifying whether a semistructured schema is in normal form.
– ISA hierarchies are removed from the CM hypergraph before it is input to the algorithms.

47

6. Comparison with Related Proposal(Cont.)

The advantages of our proposal:
– 2-level design: incremental and iterative
  • First, identify the object classes and relationship types from the user requirements.
  • Then add attributes to the object classes and relationship types.

In contrast, XNF requires all the needed information to be presented at once. Even a small change in the information requirements requires redesigning the whole schema.

48

6. Comparison with Related Proposal(Cont.)

The advantages of our proposal (Cont.):
– Preserves the hierarchical structure that satisfies the users' requirements.

In contrast, since a CM hypergraph has no hierarchy, XNF needs to generate many solutions. The XNF approach fails when the user already has a hierarchical structure and wants to preserve it and verify whether the design is good.

49

7. Summary

– The ORA-SS model helps to detect redundancy in semistructured data.
– We need a normal form for ORA-SS, since ORA-SS schema diagrams may contain redundancies and suffer from considerable updating anomalies.
– We define a normal form ORA-SS schema diagram. It ensures
  • no unnecessary redundancy and
  • no updating anomalies
  for semistructured databases generated from the schema.
– We present an algorithm for mapping an ORA-SS schema diagram into an XML DTD/Schema.

50

7. Summary (Cont.)

– We give a design methodology and present a comprehensive algorithm for normalizing an ORA-SS schema diagram into its normal form. The steps presented can also be used as guidelines for designing semistructured databases using the ORA-SS model.
  • As ORA-SS distinguishes objects from attributes, the design complexity is reduced.
  • ORA-SS allows 2 levels of design: first object classes and relationship types, then attributes.
– We show that the ORA-SS design approach compares favourably with related proposals: it supports incremental design and preserves the users' hierarchical structure.

51

References

1. G. Dobbie, X. Y. Wu, T. W. Ling and M. L. Lee. ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. Technical Report TR21/00, School of Computing, National University of Singapore, 2000.
2. D. W. Embley and W. Y. Mok. Developing XML Documents with Guaranteed "Good" Properties. ER 2001.
3. R. Goldman and J. Widom. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. Proceedings of the Twenty-Third International Conference on Very Large Data Bases, pages 436-445, Athens, Greece, August 1997.
4. S. Y. Lee, M. L. Lee, T. W. Ling and L. A. Kalinichenko. Designing Good Semi-structured Databases. ER 1999: 131-145.
5. T. W. Ling. A Normal Form for Entity-Relationship Diagrams. Proc. 4th International Conference on Entity-Relationship Approach, 1985.
6. T. W. Ling. A Normal Form for Sets of Not-Necessarily Normalized Relations. In Proceedings of the 22nd Hawaii International Conference on System Sciences, pp. 578-586. United States: IEEE Computer Society Press, 1989.
7. X. Y. Wu, T. W. Ling, M. L. Lee and G. Dobbie. Designing Semistructured Databases Using ORA-SS Model. In Proceedings of the 2nd International Conference on Web Information Systems Engineering (WISE), IEEE Computer Society, Kyoto, Japan, December 2001.