1
The ORA-SS Approach for Designing Semistructured Databases
Xiaoying Wu, Tok Wang Ling, Mong Li Lee
National University of Singapore
Gillian Dobbie
University of Auckland, New Zealand
2
Outline
1. Motivation
2. Introduction to the ORA-SS (Object-Relationship-Attribute) model
3. From ORA-SS to XML DTD
4. Normal form for ORA-SS schema diagrams
5. Designing ORA-SS schema diagrams into normal form
6. Comparison with related proposals
7. Summary
3
1. Motivation
Example 1.1: Redundancy in an XML document

<department>
  <name>cs</name>
  <professor>
    <staffnumber>12</staffnumber>
    <name>Smith</name>
    <course>
      <coursecode>230</coursecode>
      <title>Database</title>
    </course>
  </professor>
  <professor>
    <staffnumber>22</staffnumber>
    <name>Jones</name>
    <course>
      <coursecode>230</coursecode>
      <title>Database</title>
    </course>
  </professor>
</department>
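The redundancy in Example 1.1 can be made concrete with a short script. This is only an illustrative sketch: it parses the document from the example with Python's standard `xml.etree.ElementTree` module and counts how often the same (course code, title) pair is stored. The variable names are chosen here for illustration and are not part of the original slides.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The XML document of Example 1.1, reproduced as a string.
DOC = """
<department>
  <name>cs</name>
  <professor>
    <staffnumber>12</staffnumber>
    <name>Smith</name>
    <course><coursecode>230</coursecode><title>Database</title></course>
  </professor>
  <professor>
    <staffnumber>22</staffnumber>
    <name>Jones</name>
    <course><coursecode>230</coursecode><title>Database</title></course>
  </professor>
</department>
"""

root = ET.fromstring(DOC)

# Count every (course code, title) pair found anywhere in the document.
copies = Counter(
    (course.findtext("coursecode"), course.findtext("title"))
    for course in root.iter("course")
)

# A pair stored more than once is redundant: changing the title of
# course 230 would require updating every copy.
redundant = {pair: n for pair, n in copies.items() if n > 1}
print(redundant)  # {('230', 'Database'): 2}
```

The title of course 230 is stored once per professor who teaches it, which is the redundancy the better-designed schema later removes with a reference.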
4
1. Motivation (Cont.)
Example 1.1 (Cont.)
[Figure: (a) OEM database for the document in Example 1.1, in which professors Smith (staff number 12) and Jones (staff number 22) of department CS each carry their own copy of course 230, database; (b) the corresponding DataGuide.]
5
1. Motivation (Cont.)
Example 1.1 (Cont.): Corresponding ORA-SS instance diagram and schema diagram
[Figure: (a) ORA-SS instance diagram: department cs contains professors Smith (staff number 12) and Jones (staff number 22), each with a nested course (course code 230, title Database). (b) Nested object classes in an ORA-SS schema diagram: department (name) with nested professor (staff number, name) through a relationship type labelled 2, 1:n, 1:1, and course (course code, title) nested under professor through a relationship type labelled 2, 1:n, 1:n.]
6
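1. Motivation (Cont.)
Example 1.1 (Cont.)
[Figure: A better-designed ORA-SS schema diagram. department (name) has nested professor (staff number, name) through a relationship type labelled 2, 1:n, 1:1; professor has a nested object class course1 through a relationship type labelled 2, 1:n, 1:n; course1 refers to the separate object class course (course code, title) via Course-Ref.]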
7
1. Motivation (Cont.)
Example 1.1 (Cont.)
[Figure: A better-designed ORA-SS instance diagram. department C.S. contains professors Smith (staff number 12) and Jones (staff number 22); each professor has a course1 object whose Course-Ref points to the single course object (course code 230, title database).]
8
1. Motivation (Cont.)
Example 1.2: Ambiguity in an OEM database and its DataGuide
[Figure: An OEM database of projects (J1, J2, J3), their members (e.g. M1) and the members' publications (Pub1, Pub2, Pub3), together with its DataGuide over the labels project, member, publication, name, id, position, number and title.]
9
1. Motivation (Cont.)
Example 1.2 (Cont.): Ternary relationship type representation
[Figure: (a) ORA-SS schema diagram in which project, member and publication are nested in that order; jm (2, +, +) is a binary relationship type and mp (3, 0:n, 1:m) is a ternary relationship type. (b) A data instance of (a) over projects j1, j2, j3, members m1, m2 and publications pub1, pub2, pub3. (c) The DataGuide, with labels project, member, publication, name, id, position, number and title.]
10
1. Motivation (Cont.)
Example 1.2 (Cont.): Binary relationship type representation
[Figure: (a) ORA-SS schema diagram in which project, member and publication are nested in that order; jm (2, +, +) and mp (2, *, +) are both binary relationship types. (b) A data instance of (a) over projects j1, j2, j3, members m1, m2 and publications pub1, pub2, pub3. (c) The DataGuide, with labels project, member, publication, name, id, position, number and title.]
Note that the DataGuide for this schema diagram is the same as for the previous one: the DataGuide alone cannot tell whether mp is a binary or a ternary relationship type.
11
2. Introduction to the ORA-SS Model
Four concepts:
– object classes
– relationship types
– attributes
– references
Four diagrams:
– schema diagram
– instance diagram
– functional dependency diagram
– inheritance diagram
12
2. Introduction to the ORA-SS Model (Cont.)
Object class
– attributes of object class
  • single-valued
  • multi-valued
– ordering on object class
[Figure: Object class employee with attributes name, SSN, age and hobby (marked *) in an ORA-SS schema diagram.]
13
2. Introduction to the ORA-SS Model (Cont.)
Relationship type
– attributes of relationship type
  • single-valued
  • multi-valued
– degree of n-ary relationship type
– participation constraints of objects in relationship type
– disjunctive relationship type
– recursive relationship type
14
2. Introduction to the ORA-SS Model (Cont.)
Relationship type (Cont.)
Representing a binary relationship type
[Figure: (a) ORA-SS schema diagram over project, member and publication in which jm (2, +, +) and mp (2, *, +) are binary relationship types. (b) A data instance of (a) over projects j1, j2, j3, members m1, m2 and publications pub1, pub2, pub3.]
15
2. Introduction to the ORA-SS Model (Cont.)
Relationship type (Cont.)
Representing a ternary relationship type
[Figure: (a) ORA-SS schema diagram over project, member and publication in which jm (2, +, +) is a binary relationship type and mp (3, 0:n, 1:m) is a ternary relationship type. (b) A data instance of (a) over projects j1, j2, j3, members m1, m2 and publications pub1, pub2, pub3.]
16
2. Introduction to the ORA-SS Model (Cont.)
Attributes
– key attribute and identifier
– composite attribute
– disjunctive attribute
– attribute with unknown structure (ANY)
– ordering on attribute
– attributes of object class / relationship type
– single-valued / multi-valued attribute
– fixed and default values of attribute
– derived attribute
17
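2. Introduction to the ORA-SS Model (Cont.)
Attributes (Cont.)
[Figure: Object classes with a relationship type and attributes in an ORA-SS schema diagram. course and student are related through the binary relationship type cs (2, 4:n, 3:8). Attribute labels shown: number, title, dept-prefix (annotated D:CS), first-name, last-name, hobby, an ANY attribute, and grade and mark (both labelled cs); some labels are marked *.]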
18
2. Introduction to the ORA-SS Model (Cont.)
Attributes (Cont.)
[Figure: Disjunctive attribute and relationship in an ORA-SS schema diagram. Labels shown: course, code, title, exam venue, lecture theatre, laboratory, homework, number, deadline, project, topic, algorithm, and the relationship type assign (2, 1:n, 1:1).]
19
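2. Introduction to the ORA-SS Model (Cont.)
References
[Figure: Referencing an object class in an ORA-SS schema diagram. course (code, title, exam venue given as lecture theatre or laboratory, text book marked +) has a nested object class student1 through cs (2, 1:n, 1:m) with relationship attribute grade; student1 refers to student (number, address, name composed of first name and last name) via Student-Ref.]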
20
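2. Introduction to the ORA-SS Model (Cont.)
References (Cont.)
[Figure: Recursive relationship type in an ORA-SS schema diagram: course (code, title) with nested prereq (title) through cp (2, 0:5, 1:n); the reference is labelled course-prereq.]
[Figure: Symmetric relationship sets in an ORA-SS schema diagram: course (code, title) with nested student1 through cs (2, +, +), and student (number, name) with nested course1 through cs (2, +, +); grade is attached to cs in both places, and student1 and course1 refer to student and course via Student-Ref and Course-Ref.]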
21
3. Mapping ORA-SS schema diagram to XML DTD
Algorithm 1: Mapping an ORA-SS schema diagram to an XML DTD
Input: an ORA-SS schema diagram SD
Output: an XML DTD
Begin
For each object class O in SD do:
Step 1. Generate <!ELEMENT O (subelementsList)>, where subelementsList lists the sub-object classes of O.
Step 2. For each attribute A of O:
  Case (1) A is a single-valued simple attribute: generate <!ATTLIST O A type>.
  Case (2) A is a single-valued composite attribute: replace A with its components and add them to <!ATTLIST O attributeName type>.
  Case (3) A is a multi-valued simple attribute: generate <!ELEMENT A (#PCDATA)>.
  Case (4) A is a multi-valued composite attribute: generate <!ELEMENT A EMPTY> and, for A's components, <!ATTLIST A componentName type>.
22
3. Mapping ORA-SS schema diagram to XML DTD (Cont.)
Algorithm 1: Mapping an ORA-SS schema diagram to an XML DTD (Cont.)
Step 3. For each relationship attribute A under O:
  Case (1) A is a single-valued simple attribute: generate <!ELEMENT A (#PCDATA)> and add A to O's subelementsList.
  Case (2) A is a multi-valued simple attribute: generate <!ELEMENT A (#PCDATA)> and add A to O's subelementsList.
  Case (3) A is a single-valued composite attribute: generate <!ELEMENT A (#PCDATA)> and, for A's components, <!ATTLIST A componentName type>.
  Case (4) A is a multi-valued composite attribute: generate <!ELEMENT A (#PCDATA)> and, for A's components, <!ATTLIST A componentName type>; add A to O's subelementsList.
Step 4. For each reference O-Ref:
  Case (1) O is a child object class of O1 and has no extra attributes or child object classes: generate <!ATTLIST O1 O-Ref IDREF(S)>.
  Case (2) O is a root object class, or it has nested attributes or child object classes: generate <!ATTLIST O O-Ref IDREF(S)>.
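Steps 1 and 2 of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Attribute` and `ObjectClass` classes and the function `object_class_to_dtd` are hypothetical names; every attribute is given the type `CDATA #IMPLIED`, which the slides leave open as "type"; and multi-valued attributes are added to the parent's content model with `*`, which the algorithm text does not spell out for Step 2. Steps 3 and 4 (relationship attributes and references) are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    multivalued: bool = False
    components: List[str] = field(default_factory=list)  # non-empty => composite


@dataclass
class ObjectClass:
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    children: List[str] = field(default_factory=list)  # names of sub-object classes


def object_class_to_dtd(oc: ObjectClass) -> List[str]:
    """Apply Steps 1 and 2 of Algorithm 1 to one object class."""
    subelements = list(oc.children)
    own_attrs = []   # entries of <!ATTLIST O ...>
    extra = []       # declarations for multi-valued attributes
    for a in oc.attributes:
        if not a.multivalued and not a.components:      # Case (1)
            own_attrs.append(f"{a.name} CDATA #IMPLIED")
        elif not a.multivalued:                         # Case (2)
            own_attrs.extend(f"{c} CDATA #IMPLIED" for c in a.components)
        elif not a.components:                          # Case (3)
            extra.append(f"<!ELEMENT {a.name} (#PCDATA)>")
            subelements.append(f"{a.name}*")            # assumption, see lead-in
        else:                                           # Case (4)
            extra.append(f"<!ELEMENT {a.name} EMPTY>")
            comps = " ".join(f"{c} CDATA #IMPLIED" for c in a.components)
            extra.append(f"<!ATTLIST {a.name} {comps}>")
            subelements.append(f"{a.name}*")            # assumption, see lead-in
    content = f"({', '.join(subelements)})" if subelements else "EMPTY"
    decls = [f"<!ELEMENT {oc.name} {content}>"]
    if own_attrs:
        decls.append(f"<!ATTLIST {oc.name} {' '.join(own_attrs)}>")
    return decls + extra


# The employee object class from the earlier slide: hobby is multi-valued.
employee = ObjectClass(
    "employee",
    [Attribute("name"), Attribute("SSN"), Attribute("age"),
     Attribute("hobby", multivalued=True)],
)
dtd = object_class_to_dtd(employee)
print("\n".join(dtd))
```

For the employee object class this produces an element declaration whose content model is `(hobby*)`, one attribute list for name, SSN and age, and a `#PCDATA` element declaration for hobby.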
23
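3. Mapping ORA-SS schema diagram to XML DTD (Cont.)
Example 3.1
[Figure: Referencing an object class in an ORA-SS schema diagram. course (code, title, exam venue given as lecture theatre or laboratory, text book marked +) has a nested object class student1 through cs (2, 1:n, 1:m) with relationship attribute grade; student1 refers to student (number, address, name composed of first name and last name) via Student-Ref.]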
24
3. Mapping ORA-SS schema diagram to XML DTD (Cont.)
Example 3.1 (Cont.)
An XML DTD for the ORA-SS schema diagram:

<!-- #IMPLIED is assumed wherever the original slide omitted a default declaration -->
<!ELEMENT course (textbook+, student1+)>
<!ATTLIST course
    code            CDATA #REQUIRED
    title           CDATA #IMPLIED
    lecture-theater CDATA #IMPLIED
    laboratory      CDATA #IMPLIED>
<!ELEMENT textbook (#PCDATA)>
<!ELEMENT student1 (grade)>
<!ATTLIST student1 Student-Ref IDREF #REQUIRED>
<!ELEMENT grade (#PCDATA)>
<!ELEMENT student (name)>
<!ATTLIST student
    number  ID    #REQUIRED
    address CDATA #IMPLIED>
<!ELEMENT name EMPTY>
<!ATTLIST name
    first-name CDATA #IMPLIED
    last-name  CDATA #IMPLIED>
25
4. Normal form for ORA-SS schema diagram
Observation: ORA-SS is similar to nested relations:
– tree-like structure
– repeating groups or multiple occurrences of objects.
E.g., the corresponding nested relation for the following ORA-SS schema diagram is
Dept (dept-name, course (code, title, student (number, s-name, grade)*)*)
[Figure: ORA-SS schema diagram: department (dept name) with nested course (code, title) through a relationship type labelled 2, 1:n, 1:1, and student (number, s-name) nested under course through cs (2, 1:n, 1:n) with relationship attribute grade.]
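The correspondence between a nested schema and its nested relation can be sketched as a small recursive function. This is an illustrative sketch only: the function name `nested_relation` and the tuple encoding of the schema are assumptions made here, and relationship attributes such as grade are simply listed with the child they are attached to.

```python
def nested_relation(name, attrs, children=()):
    """Render an object class, its attributes and its nested children
    as a nested relation; each nested child is a repeating group (*)."""
    parts = list(attrs) + [nested_relation(*child) + "*" for child in children]
    return f"{name} ({', '.join(parts)})"


# Schema of the slide: department > course > student, with grade on student.
student = ("student", ["number", "s-name", "grade"])
course = ("course", ["code", "title"], [student])
dept = ("Dept", ["dept-name"], [course])

relation = nested_relation(*dept)
print(relation)
# Dept (dept-name, course (code, title, student (number, s-name, grade)*)*)
```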
26
4. Normal form for ORA-SS schema diagram (Cont.)
Objective: to ensure that the set of nested relations corresponding to the ORA-SS schema diagram is in the normal form for sets of nested relations (NF-NR) [5,6].
We will define:
– object class normal form (O-NF)
– relationship type normal form (R-NF)
– ORA-SS normal form schema (ORA-SS NF)
27
4. Normal form for ORA-SS schema diagram (Cont.)
Defn: object class normal form (O-NF)
An object class O of an ORA-SS schema diagram is said to be in object class normal form (O-NF) if the nested relation constructed with O's single-valued attributes as its atomic attributes and O's multi-valued attributes as its repeating groups is in NF-NR.
28
4. Normal form for ORA-SS schema diagram (Cont.)
Example 4.1: Assume we have the following functional dependencies for the ORA-SS schema diagram: {S# → dept, dept → faculty}.
[Figure: ORA-SS schema diagram: object class staff with attributes S#, dept and faculty.]
The corresponding nested relation for the schema diagram is Staff(S#, dept, faculty). It is not in 3NF, since faculty is transitively dependent on S#; hence the relation is not in NF-NR.
A better-designed ORA-SS schema diagram, in which the transitive functional dependency is removed:
[Figure: faculty with nested dept through a relationship type labelled 2, 1:n, 1:1, and staff nested under dept through a relationship type labelled 2, 1:n, 1:1.]
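The transitive dependency in Example 4.1 can be checked mechanically with the standard attribute-closure computation from relational design theory. The sketch below is illustrative; the function name `closure` and the encoding of functional dependencies as (left-hand side, right-hand side) pairs are choices made here, not notation from the slides.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Functional dependencies of Example 4.1: S# -> dept, dept -> faculty.
fds = [(("S#",), ("dept",)), (("dept",), ("faculty",))]

staff_key_closure = closure({"S#"}, fds)
dept_closure = closure({"dept"}, fds)

# faculty depends on S# only through dept, and dept is not a key of Staff
# (it does not determine S#), so the dependency is transitive.
is_transitive = (
    "dept" in staff_key_closure
    and "faculty" in dept_closure
    and "S#" not in dept_closure
)
print(is_transitive)  # True
```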
29
4. Normal form for ORA-SS schema diagram (Cont.)
Defn: relationship type normal form (R-NF)
A relationship type R of an ORA-SS schema diagram D is said to be in relationship type normal form (R-NF) if the nested relation constructed with the identifiers of the participating object classes and R's atomic attributes as its atomic attributes, and R's multi-valued and composite attributes as its repeating groups, is in NF-NR.
30
4. Normal form for ORA-SS schema diagram (Cont.)
Example 4.2: The ORA-SS schema attempts to show that a lecturer can teach all the courses using all the textbooks described in the curriculum, i.e. it should satisfy the MVD constraint course-code →→ isbn | staff#.
[Figure: ORA-SS schema diagram: course (course code, title) with nested text (isbn, title) through ct (2, 1:n, 1:n), and lecturer (staff#, name, office) nested under text through the ternary relationship type ctl (3, 1:n, 1:n).]
The nested relation for the relationship type ctl is ctl(course-code, isbn, staff#). It is not in 4NF, so it is not in NF-NR; hence the relationship type ctl is not in R-NF.
A better design, in which the MVD is removed:
[Figure: course (code, title) with text (isbn, title) nested through ct (2, 1:n, 1:n) and lecturer (staff#, name, office) nested through cl (2, 1:n, 1:n).]
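To see why the MVD in Example 4.2 causes redundancy, it can help to check it on a small instance. The sketch below is illustrative only: the rows are hypothetical data invented here (the slides give no instance of ctl), and `satisfies_mvd` and `decompose` are names introduced for this example. An instance satisfies course-code →→ isbn | staff# exactly when, for each course, the stored (isbn, staff#) pairs form the full cross product of that course's textbooks and lecturers.

```python
from collections import defaultdict
from itertools import product


def satisfies_mvd(rows):
    """Check x ->> y | z on an iterable of (x, y, z) tuples."""
    groups = defaultdict(set)
    for x, y, z in rows:
        groups[x].add((y, z))
    for pairs in groups.values():
        ys = {y for y, _ in pairs}
        zs = {z for _, z in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True


def decompose(rows):
    """Split ctl(course-code, isbn, staff#) into ct and cl."""
    ct = {(x, y) for x, y, _ in rows}
    cl = {(x, z) for x, _, z in rows}
    return ct, cl


# Hypothetical instance: course 230 uses two textbooks and has two lecturers.
ctl = [
    ("230", "isbn-1", "12"), ("230", "isbn-1", "22"),
    ("230", "isbn-2", "12"), ("230", "isbn-2", "22"),
]

ct, cl = decompose(ctl)
print(satisfies_mvd(ctl), len(ctl), len(ct), len(cl))  # True 4 2 2
```

With the MVD holding, the ternary relation stores four rows for information that the two binary relations ct and cl capture in two rows each, which mirrors the better design on the slide.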
31
4. Normal form for ORA-SS schema diagram (Cont.)
Defn: ORA-SS normal form schema
An ORA-SS schema diagram D is in normal form (NF) iff it satisfies the following conditions:
1. Every object class in D is in O-NF.
2. For every relationship type R in D:
   (a) R is in R-NF.
   (b) Case (1): R is a binary relationship type from object class A to object class B. Then all of B's attributes can stay with B only if R is a one-to-many or one-to-one binary relationship type from A to B. All the attributes of R (if any) should be attached to B.
       Case (2): R is an n-ary relationship type with n (n > 2) participating object classes O1, O2, …, On, and the path going downward from the top of D linking those object classes is /O1/O2/…/On. Then for each object class Oi (2 ≤ i ≤ n):
       (i) Oi should have an i-ary relationship type Ri with its ancestors O1, O2, …, Oi-1.
       (ii) The attributes of Oi can stay with Oi only if the functional dependency Oi → O1, O2, …, Oi-1 can be derived from the functional dependency diagram for D. The attributes of Ri (if any) should be attached to Oi.
3. No relationship type is nested under another many-to-many or many-to-one binary or n-ary (n > 2) relationship type.
4. No relationship type can be derived from other relationship types in D.
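Condition 3 lends itself to a simple mechanical check for chains of binary relationship types. The sketch below rests on an interpretation that should be treated as an assumption: in a label such as "2, 1:n, 1:1" the last component is read as the child's participation constraint, and a relationship type counts as many-to-many or many-to-one when that constraint allows a child to have more than one parent. The function names and the shorthand table (?, *, +) are introduced here for illustration, and n-ary relationship types are not handled.

```python
def max_is_many(participation):
    """True if the upper bound of a participation constraint exceeds 1.

    Accepts "min:max" forms such as "1:1", "0:n", "3:8" and the
    shorthands "?" (0:1), "*" (0:n) and "+" (1:n).
    """
    shorthand = {"?": False, "*": True, "+": True}
    if participation in shorthand:
        return shorthand[participation]
    upper = participation.split(":")[1]
    return upper != "1"


def violates_condition_3(path):
    """path lists the child participation constraints of the binary
    relationship types met going down one branch, top-most first.

    Condition 3 is violated when any relationship type other than the
    last lets a child have many parents, because a further relationship
    type is then nested under it.
    """
    return any(max_is_many(c) for c in path[:-1])


# professor -pc (2, *, *)-> course -ct (2, *, *)-> textbook
print(violates_condition_3(["*", "*"]))      # True
# department -(2, 1:n, 1:1)-> professor -(2, 1:n, 1:n)-> course
print(violates_condition_3(["1:1", "1:n"]))  # False
```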
32
4. Normal form for ORA-SS schema diagram (Cont.)
Example 4.4: The ORA-SS schema diagram is not in NF if a professor is also an employee in the department: the qualification of a professor can be derived from that of the employee, so this information will be repeated in the underlying database.
[Figure: An ORA-SS schema diagram that is not in NF: department (name) with nested professor and employee, each through a relationship type labelled 2, 1:n, 1:1; both professor and employee carry staff# and Qual. (degree, year). Other labels shown: title, research interests, grad student, name, job-history (company, j-date).]
[Figure: An ORA-SS schema diagram that is in NF: the same diagram, except that professor no longer carries its own Qual. and refers to employee via Staff-Ref.]
33
5. Converting ORA-SS Schema Diagrams into Normal Form
Two approaches for designing semistructured databases:
Approach 1.
– Based on the users' requirements, come up with an initial ORA-SS schema diagram.
– Normalize the ORA-SS schema diagram into its normal form.
– Map it to an XML DTD or XML Schema.
Approach 2.
– Extract a schema from the instances using schema extraction techniques.
– Translate the schema into an ORA-SS schema diagram. Here we need semantic enrichment, since not all the semantics needed are available from the extracted schema.
– Convert the ORA-SS schema diagram into its normal form.
– Translate the NF ORA-SS schema diagram back to an XML DTD or XML Schema.
– Restructure the initial data instance to conform to the generated XML DTD or XML Schema.
34
5. Converting ORA-SS Schema Diagrams into Normal Form (Cont.)
Algorithm 2: Converting an ORA-SS schema diagram into an NF ORA-SS schema diagram
Input: an ORA-SS schema diagram SD and its functional dependency diagram.
Output: an NF ORA-SS schema diagram.
{
Step 1. Convert any non-O-NF object class to O-NF.
Step 2. Make each relationship type R be in R-NF.
Step 3. This step involves two sub-steps.
  (1) Construct diagrams for each object class with its attributes.
  (2) Represent each relationship type R. We make R satisfy item (b) of condition 2 as well as condition 3 of the NF definition by introducing referencing object classes and requiring each relationship type to start with an object class with attributes (i.e., a non-reference object class).
Step 4. Remove those relationship types, along with their associated attributes, that can be derived from other relationship types in the schema diagram, to satisfy condition 4 of the NF definition.
}
35
5. Converting ORA-SS Schema Diagrams into Normal Form (Cont.)
Example 5.1: There is a many-to-many binary relationship pc between professor and course, and a many-to-many binary relationship ct between course and textbook. The schema is not in ORA-SS NF since it violates condition 3 of the NF definition.
[Figure: (a) Initial ORA-SS schema diagram: professor (staff#, name) with nested course (code, title) through pc (2, *, *), and textbook (ISBN, title, author) nested under course through ct (2, *, *).]
36
5. Converting ORA-SS Schema Diagrams into Normal Form (Cont.)
Example 5.1 (Cont.)
Step 1. The three given object classes are already in O-NF.
Step 2. The two relationship types pc and ct are already in R-NF.
Step 3. (1) Generate three diagrams for the object classes with attributes.
[Figure: (b) Fragment diagrams for object classes: professor (staff#, name), course (code, title), textbook (ISBN, title, author).]
37
5. Converting ORA-SS Schema Diagrams into Normal Form (Cont.)
Example 5.1 (Cont.)
Step 3 (Cont.). (2) Represent the binary relationship pc by creating a reference object class course1 referencing course, and nest course1 under professor.
[Figure: (c) Diagrams after representing relationship pc: professor (staff#, name) with nested course1 through pc (2, *, *); course1 refers to course (code, title) via c-ref; textbook (ISBN, title, author) remains a separate fragment.]
38
5. Converting ORA-SS Schema Diagrams into Normal Form (Cont.)
Example 5.1 (Cont.)
Step 3 (Cont.). (2) Represent the binary relationship ct by creating a reference object class textbook1 referencing textbook, and nest textbook1 under course.
[Figure: (d) Final ORA-SS schema diagram, which is in NF: professor (staff#, name) with nested course1 through pc (2, *, *), course1 referring to course via c-ref; course (code, title) with nested textbook1 through ct (2, *, *), textbook1 referring to textbook (ISBN, title, author) via t-ref.]
Step 4 (passed). The schema generated is in NF.
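Step 3 of Algorithm 2, as applied in Example 5.1, can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it handles binary relationship types only, the dictionary-based schema encoding and the function name `introduce_references` are assumptions made here, and the naming of reference object classes (child name plus "1") simply follows the course1 and textbook1 names used in the example.

```python
def introduce_references(object_classes, relationships):
    """Sketch of Step 3 for binary relationship types.

    object_classes: {name: [attribute names]}
    relationships:  [(relationship name, parent, child)]
    """
    # Step 3 (1): one top-level fragment per object class with its attributes.
    schema = {
        name: {"attributes": list(attrs), "children": []}
        for name, attrs in object_classes.items()
    }
    # Step 3 (2): nest a reference object class under the parent instead of
    # nesting the child object class itself.
    for rel_name, parent, child in relationships:
        schema[parent]["children"].append(
            {"name": child + "1", "references": child, "relationship": rel_name}
        )
    return schema


example_5_1 = introduce_references(
    {
        "professor": ["staff#", "name"],
        "course": ["code", "title"],
        "textbook": ["ISBN", "title", "author"],
    },
    [("pc", "professor", "course"), ("ct", "course", "textbook")],
)

for name, fragment in example_5_1.items():
    print(name, fragment["attributes"], fragment["children"])
```

Every object class keeps its attributes in exactly one place, and each relationship type now starts from a non-reference object class, so no relationship type is nested under another one.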
39
5. Converting ORA-SS Schema Diagrams into Normal Form (Cont.)
Example 5.2: There is a binary relationship cs between course and student, and a ternary relationship cst among course, student and tutor. grade is an attribute of the binary relationship cs, and feedback is an attribute of the ternary relationship cst. The schema is not in ORA-SS NF since it violates item (ii) of case 2 in condition 2(b) of the NF definition.
[Figure: (a) Initial ORA-SS schema diagram: course (cid, title) with nested student (sid, name, age) through cs (2, 0:m, 0:n), and tutor (tid, name) nested under student through cst (3, 0:m, 0:n); grade is labelled cs and feedback is labelled cst. The attribute labels shown also include interest.]
40
5. Converting ORA-SS Schema Diagrams into Normal Form (Cont.)
Example 5.2 (Cont.)
Step 1. The three given object classes are already in O-NF.
Step 2. The two relationship types cs and cst are already in R-NF.
Step 3. (1) Generate three diagrams for the object classes with attributes.
[Figure: (b) Fragment diagrams for the object classes course, student and tutor with their attributes.]
41
5. Converting ORA-SS Schema Diagrams into Normal Form (Cont.)
Example 5.2 (Cont.)
Step 3 (Cont.). (2) Represent the binary relationship cs. We create a reference object class student1 referencing student and nest student1 under course. The relationship attribute grade is attached to student1.
[Figure: (c) Diagram representing binary relationship cs: course with nested student1 through cs (2, 0:m, 0:n), grade attached to student1, and student1 referring to student via s-ref; tutor remains a separate fragment.]
42
5. Converting ORA-SS Schema Diagrams into Normal Form (Cont.)
Example 5.2 (Cont.)
Step 3 (Cont.). (2) Represent the relationship cst. We create a reference object class tutor1 referencing tutor, and nest tutor1 under student1. The relationship attribute feedback is attached to tutor1.
[Figure: (d) Final ORA-SS schema diagram, which is in NF: course with nested student1 through cs (2, 0:m, 0:n) and tutor1 nested under student1 through cst (3, 0:m, 0:n); grade is attached to student1 and feedback to tutor1; student1 and tutor1 refer to student and tutor via s-ref and t-ref.]
Step 4 (passed). The schema generated is in NF.
43
6. Comparison with Related Proposals
The first attempt to define a normal form for semistructured data [4]:
– Defines a schema called S3-Graph, a labeled graph in which vertices correspond to objects and edges represent the object-subobject relationship. Its data instance is called a semistructured data graph.
– S3-Graph cannot show the degree of an n-ary relationship type, nor can it distinguish between attributes of object classes and attributes of relationship types.
44
6. Comparison with Related Proposals (Cont.)
The first attempt to define a normal form for semistructured data [4] (Cont.):
– Defines a dependency constraint called SS-dependency.
– Proposes S3-NF. An S3-Graph is in S3-NF if there is no transitive SS-dependency. Hence, only this kind of redundancy can be recognized by S3-NF.
45
6. Comparison with Related Proposals (Cont.)
The first attempt to define a normal form for semistructured data [4] (Cont.):
– Presents two approaches to designing S3-NF databases:
  1. The decomposition method can remove identified transitive SS-dependencies and achieve S3-NF, but it may not be able to remove partial functional dependencies inside an entity type or object class, or the redundancy resulting from over-nesting.
  2. The transformation of a normal form ER diagram into an S3-Graph. The result may not be unique; it depends on the path constructed. Hence some results may not satisfy the application requirements or comply with the user's viewpoint.
46
6. Comparison with Related Proposals (Cont.)
The most recent proposal: XNF (XML Normal Form) [2]
– It mainly provides algorithms to translate a schema, represented in a conceptual model called a CM hypergraph, into a scheme-tree forest in XNF.
– The CM hypergraph has no concept of attribute (so there are too many objects) and no hierarchical structure.
– The given algorithms are non-deterministic and inefficient.
– Adding new required information requires redesigning the schema.
– The algorithms generate a large number of solutions rather than verifying whether a semistructured schema is in normal form.
– ISA hierarchies are removed from the CM hypergraph before it is input to the algorithms.
47
6. Comparison with Related Proposals (Cont.)
The advantages of our proposal:
– 2-level design: incremental and iterative
  • First, identify the object classes and relationship types from the user requirements.
  • Then add attributes to the object classes and relationship types.
In contrast, XNF requires all the needed information to be presented at once. Even a small change in information requirements requires redesigning the whole schema.
48
6. Comparison with Related Proposals (Cont.)
The advantages of our proposal (Cont.):
– Preserves the hierarchical structure that satisfies the users' requirements.
In contrast, since the CM hypergraph has no hierarchy, XNF needs to generate many solutions. The XNF approach fails when the user already has a hierarchical structure and wants to preserve it and verify whether the design is good.
49
7. Summary
– The ORA-SS model helps to detect redundancy in semistructured data.
– We need a normal form for ORA-SS, since ORA-SS schema diagrams may contain redundancies and suffer from considerable update anomalies.
– We define a normal form for ORA-SS schema diagrams. It ensures
  • no unnecessary redundancy and
  • no update anomalies
  for semistructured databases generated from the schema.
– We present an algorithm for mapping an ORA-SS schema diagram into an XML DTD/Schema.
50
7. Summary (Cont.)
– We give a design methodology and present a comprehensive algorithm for normalizing an ORA-SS schema diagram into its normal form. The steps presented can also be used as guidelines for designing semistructured databases using the ORA-SS model:
  • As ORA-SS distinguishes objects from attributes, the design complexity is reduced.
  • ORA-SS allows 2 levels of design: first object classes and relationship types, then attributes.
– We show that the ORA-SS design approach outperforms other related proposals.
51
References
1. G. Dobbie, X. Y. Wu, T. W. Ling and M. L. Lee. ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. Technical Report TR21/00, School of Computing, National University of Singapore, 2000.
2. D. W. Embley and W. Y. Mok. Developing XML Documents with Guaranteed "Good" Properties. ER 2001.
3. R. Goldman and J. Widom. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. Proceedings of the Twenty-Third International Conference on Very Large Data Bases, pages 436-445, Athens, Greece, August 1997.
4. S. Y. Lee, M. L. Lee, T. W. Ling and L. A. Kalinichenko. Designing Good Semi-structured Databases. ER 1999, pages 131-145.
5. T. W. Ling. A Normal Form for Entity-Relationship Diagrams. Proc. 4th International Conference on Entity-Relationship Approach, 1985.
6. T. W. Ling. A Normal Form for Sets of Not-Necessarily Normalized Relations. In Proceedings of the 22nd Hawaii International Conference on System Sciences, pages 578-586. IEEE Computer Society Press, United States, 1989.
7. X. Y. Wu, T. W. Ling, M. L. Lee and G. Dobbie. Designing Semistructured Databases Using ORA-SS Model. In Proceedings of the 2nd International Conference on Web Information Systems Engineering (WISE), IEEE Computer Society, Kyoto, Japan, December 2001.