XML-to-Relational Schema Mapping Algorithm ODTDMap

Speaker: Artem Chebotko*

Email: [email protected]

Wayne State University

Joint work with Mustafa Atay, Shiyong Lu and Farshad Fotouhi


Introduction

• XML has emerged as the standard for representing and exchanging data on the World Wide Web.

• The growing number of XML documents creates a need to store and query them efficiently.

Current approaches to storing and querying XML documents

• Native XML repositories, e.g., Software AG’s Tamino, eXcelon’s XIS.

• XML-enabled commercial database systems such as SQL Server, Oracle, and DB2

• Using RDBMS/ODBMS to store and query XML documents.

Issues of the relational approach

• Schema Mapping
  – The XML data model needs to be mapped into the relational model

• Data Mapping
  – XML documents need to be shredded and composed into tuples to be inserted into the relational database

• Query Mapping
  – XML queries need to be translated into SQL queries

• Reverse Data Mapping
  – Query results need to be tagged to XML format

Our contributions

• We propose a schema mapping algorithm, ODTDMap, which generates a relational schema from an XML DTD for storing and querying ordered XML documents.

• Improvements over the existing algorithms
  – Losslessness
  – Efficient support for XML queries
  – Completeness (recursion, set-valued attributes, DTD operators)

Outline of the talk

• Introduction to XML DTDs

• Mapping DTDs to relational schemas
  – Simplifying DTDs
  – Creating and inlining DTD graphs
  – Generating relational schemas

• An example

• Conclusions and future work

An overview of DTDs

A DTD example

<!DOCTYPE memo [
<!ELEMENT memo (to, from, date, subject?, body)>
<!-- #IMPLIED (optional, no default value) is an assumed default declaration -->
<!ATTLIST memo security CDATA #IMPLIED>
<!ATTLIST memo lang CDATA #IMPLIED>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (para+)>
<!ELEMENT para (#PCDATA)>
]>
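To make the parent-child structure declared by such a DTD concrete, here is a minimal Python sketch that extracts the element declarations of the memo example with a regular expression. The names `MEMO_DTD` and `element_children` are illustrative assumptions, and the pattern only handles flat content models like the ones in this example, not nested groups.

```python
import re

# Element declarations of the memo DTD (ATTLIST lines left out for brevity).
MEMO_DTD = """
<!ELEMENT memo (to, from, date, subject?, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (para+)>
<!ELEMENT para (#PCDATA)>
"""

# Matches <!ELEMENT name (content-model)> for flat (non-nested) models only.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+\((.*?)\)\s*>")

def element_children(dtd_text):
    """Map each element name to its declared children (operators kept)."""
    children = {}
    for name, model in ELEMENT_RE.findall(dtd_text):
        if model.strip() == "#PCDATA":
            children[name] = []          # text-only element, no children
        else:
            children[name] = [part.strip() for part in model.split(",")]
    return children

structure = element_children(MEMO_DTD)
print(structure["memo"])   # ['to', 'from', 'date', 'subject?', 'body']
print(structure["body"])   # ['para+']
```

The operators `?` and `+` are kept on the child names here; they are exactly what the simplification step of the mapping later rewrites.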

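DTD: Document Type Definition

• <!DOCTYPE root-element [ doctype-declaration... ]>

• <!ELEMENT element-name content-model>, where the content model combines child elements with “|” (choice), “,” (sequence), “*” (zero or more), “+” (one or more), and “?” (optional)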

• <!ATTLIST element-name attr-name attr-type attr-default ...>

DTD: Document Type Definition (cont’d)

• <!ATTLIST element-name attr-name attr-type attr-default ...> declares which attributes are allowed or required in which elements.

• Attribute types:
  – CDATA: any value is allowed (the default)
  – (value|...): enumeration of allowed values
  – ID, IDREF, IDREFS: ID attribute values must be unique (they carry "element identity"); IDREF attribute values must match some ID (a reference to an element)
  – ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: not needed for this talk

• Attribute defaults:
  – #REQUIRED: the attribute must be explicitly provided
  – #IMPLIED: the attribute is optional, no default provided
  – "value": if not explicitly provided, this value is inserted by default
  – #FIXED "value": as above, but only this value is allowed

Mapping DTDs to relational schemas

• Simplifying DTDs

• Creating and inlining DTD graphs

• Generating relational schemas

Simplifying DTDs

• A DTD might be very complex due to nesting, e.g.,

<!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)>

• An XML query language is concerned with:

– The parent-child relationships between XML elements

– The relative order relationships between siblings (add an ordinal attribute to each relation)
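As a rough illustration of why an ordinal attribute is enough to keep sibling order, the following Python sketch stores three `para` elements out of order in an in-memory SQLite table and reads them back in document order. The table layout and the column names (`id`, `parent_id`, `ordinal`, `value`) are assumptions made for this sketch, not the schema that ODTDMap itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical relation for <para> elements; "ordinal" records the position
# of each element among its siblings.
conn.execute(
    "CREATE TABLE para ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, ordinal INTEGER, value TEXT)"
)

# Rows are inserted in arbitrary order on purpose.
rows = [
    (3, 1, 2, "Second paragraph"),
    (2, 1, 1, "First paragraph"),
    (4, 1, 3, "Third paragraph"),
]
conn.executemany("INSERT INTO para VALUES (?, ?, ?, ?)", rows)

# Sorting by the ordinal column restores the sibling order under parent 1.
ordered = [
    value
    for (value,) in conn.execute(
        "SELECT value FROM para WHERE parent_id = ? ORDER BY ordinal", (1,)
    )
]
print(ordered)
```

Without the `ORDER BY ordinal` clause the relational engine gives no guarantee about row order, which is why the order has to be stored explicitly.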

DTD simplification rules

1. e+ → e*

2. e? → e

3. (e1 | … | en) → (e1, … , en)

4. (a) (e1, … , en)* → (e1*, … , en*)
   (b) e** → e*

5. (a) …, e, …, e, … → …, e*, …, …
   (b) …, e, …, e*, … → …, e*, …, …
   (c) …, e*, …, e, … → …, e*, …, …
   (d) …, e*, …, e*, … → …, e*, …, …

Example of simplifying a DTD

<!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)>

simplified to

<!ELEMENT a (b*, c*, d, e, f, g*, h*)>
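The simplification of this example can be reproduced with a short Python sketch. It assumes the content model has already been parsed into a nested structure; the tuple encoding (`"seq"`, `"alt"`, `"rep"`) and the function names `flatten` and `simplify` are choices made for this illustration and are not taken from the original algorithm description.

```python
# Content model encoding assumed for this sketch:
#   "name"                      an element
#   ("seq", [m1, m2, ...])      sequence  m1, m2, ...
#   ("alt", [m1, m2, ...])      choice    m1 | m2 | ...
#   ("rep", m, op)              m followed by op, one of "*", "+", "?"

def flatten(node, starred=False):
    """Return (element, is_starred) pairs in left-to-right order."""
    if isinstance(node, str):
        return [(node, starred)]
    kind = node[0]
    if kind in ("seq", "alt"):            # rule 3: a choice becomes a sequence
        pairs = []
        for child in node[1]:
            pairs.extend(flatten(child, starred))   # rule 4(a): * distributes
        return pairs
    if kind == "rep":
        _, child, op = node
        # rule 1: e+ becomes e*; rule 2: e? becomes e; rule 4(b): e** is e*
        return flatten(child, starred or op in ("*", "+"))
    raise ValueError(f"unknown node kind: {kind!r}")

def simplify(node):
    """Apply rules 1 to 5 and return the simplified content model as text."""
    merged = {}                            # dicts keep insertion order
    for name, star in flatten(node):
        # rule 5: a repeated element is kept once, at its first position, as e*
        merged[name] = True if name in merged else star
    return ", ".join(name + ("*" if star else "") for name, star in merged.items())

# ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)
model = ("seq", [
    ("rep", ("seq", [("rep", "b", "+"), ("rep", "c", "*"), ("rep", "d", "?")]), "?"),
    ("rep", ("seq", [
        ("rep", "e", "?"),
        "f",
        ("rep", ("seq", [("rep", "g", "*"), ("rep", "h", "?")]), "+"),
    ]), "?"),
])

print(simplify(model))   # b*, c*, d, e, f, g*, h*
```

Note that `h` ends up starred because the group `(g*, h?)+` is repeated, even though `h` itself is only optional inside the group.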

Creating and inlining DTD graphs

• We create a DTD graph based on the simplified DTD.

• Definition 3.2 (DTD graph) The structure of a DTD can be represented by a labeled graph, in which nodes represent elements and attributes, and edges represent their parent-child relationships. The edges are labeled by either `*' (star edge) or `,' (normal edge), where the label `,' is not shown for simplicity.

• Idea: inline a child c to its parent p if p can contain at most one occurrence of c.

• Rationale: a parent and the elements inlined into it produce a single relation.
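A DTD graph in the sense of Definition 3.2 can be sketched in Python as a list of labeled edges. The example below uses the simplified memo DTD; the dictionary layout, the `@` prefix for attribute nodes, and the function names are assumptions made for this illustration.

```python
# Simplified memo DTD: parent -> ordered (child, repeated?) pairs.
# Attribute nodes are written with an "@" prefix (a convention of this sketch).
SIMPLIFIED_DTD = {
    "memo": [("@security", False), ("@lang", False), ("to", False),
             ("from", False), ("date", False), ("subject", False),
             ("body", False)],
    "body": [("para", True)],   # body (para+) simplifies to para*
}

def build_dtd_graph(dtd):
    """Return labeled edges (parent, child, label): '*' star, ',' normal."""
    edges = []
    for parent, children in dtd.items():
        for child, repeated in children:
            edges.append((parent, child, "*" if repeated else ","))
    return edges

def incoming_edges(edges):
    """Group edges by child so each node's incoming edges can be inspected."""
    incoming = {}
    for parent, child, label in edges:
        incoming.setdefault(child, []).append((parent, label))
    return incoming

edges = build_dtd_graph(SIMPLIFIED_DTD)
print(incoming_edges(edges)["para"])   # [('body', '*')]
```

Here `para` is reached only through a star edge, so `body` can contain many `para` children and `para` is not a candidate for inlining.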

Inlinable node and subtree, shared node

• Definition 3.3 (Inlinable node) Given a DTD graph, a node is inlinable if and only if it has exactly one incoming edge and that edge is a normal edge.

• Definition 3.4 (Inlinable subtree) Given a DTD graph and a node e in the graph, e and all other inlinable nodes that are reachable from e by normal edges constitute a subtree. This subtree is called the inlinable subtree for the node e (it is rooted at e).

• Definition 3.5 (Shared node) Given a DTD graph, a node is called a shared node if it has more than one incoming edge.
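Definitions 3.3 to 3.5 translate almost directly into predicates over a list of labeled edges. The sketch below is one possible reading: the edge list `EDGES` is a made-up example, and `inlinable_subtree` assumes that the traversal stops at any child that is not itself inlinable, which is an interpretation of Definition 3.4 rather than something stated on the slide.

```python
# Hypothetical DTD graph: (parent, child, label), '*' star edge, ',' normal edge.
EDGES = [
    ("book", "booktitle", ","),
    ("book", "author", "*"),
    ("article", "author", "*"),
    ("author", "name", ","),
    ("author", "address", ","),
    ("address", "city", ","),
]

def incoming_labels(node, edges):
    """Labels of all edges that point to node."""
    return [label for _, child, label in edges if child == node]

def is_inlinable(node, edges):
    """Definition 3.3: exactly one incoming edge, and it is a normal edge."""
    return incoming_labels(node, edges) == [","]

def is_shared(node, edges):
    """Definition 3.5: more than one incoming edge."""
    return len(incoming_labels(node, edges)) > 1

def inlinable_subtree(root, edges):
    """Definition 3.4: root plus inlinable nodes reached via normal edges."""
    subtree, stack = [root], [root]
    while stack:
        current = stack.pop()
        for parent, child, label in edges:
            if (parent == current and label == ","
                    and is_inlinable(child, edges) and child not in subtree):
                subtree.append(child)
                stack.append(child)
    return subtree

print(is_inlinable("booktitle", EDGES))            # True
print(is_shared("author", EDGES))                  # True
print(sorted(inlinable_subtree("author", EDGES)))  # ['address', 'author', 'city', 'name']
```

In this example `author` is shared by `book` and `article`, so it roots its own inlinable subtree instead of being inlined into either parent.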

Inlining

• Case 1: Node a is connected to b by a normal edge and b has no other incoming edges: b is inlined into a.

• Case 2: Node a is connected to b by a normal edge but b has other incoming edges: b is a shared node, so it is not inlined.

• Case 3: Node a is connected to b by a star edge: b is not inlined.
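The three cases can be checked mechanically for any edge of a DTD graph. The following Python sketch does so for a small made-up graph; the edge list and the function name `inlining_case` are assumptions for illustration only.

```python
# Hypothetical graph: (parent, child, label), '*' star edge, ',' normal edge.
EDGES = [
    ("a", "b", ","),   # b has a single incoming normal edge
    ("a", "c", ","),   # c is also reachable from d, so it is shared
    ("d", "c", ","),
    ("a", "e", "*"),   # e is reached through a star edge
]

def inlining_case(parent, child, edges):
    """Return 1, 2 or 3 according to the three inlining cases."""
    label = next(l for p, c, l in edges if p == parent and c == child)
    if label == "*":
        return 3                       # star edge: no inlining
    incoming = sum(1 for _, c, _ in edges if c == child)
    return 1 if incoming == 1 else 2   # 1: inline child; 2: shared, no inlining

for parent, child in [("a", "b"), ("a", "c"), ("a", "e")]:
    print(parent, child, inlining_case(parent, child, EDGES))
```

Only case 1 leads to inlining; in cases 2 and 3 the child keeps its own place in the graph.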

Inlining (cont’d)

Inlining DTD graphs

Complexity of inlining

• Theorem 3.7 (Time Complexity)

The time complexity of our inlining algorithm is O(n) where n is the number of elements in the input DTD.

The inlining procedure

The inlining procedure (cont’d): INCORRECT

The inlining procedure (cont’d): CORRECT

Generating relational schema

Generating schema mapping info.

• Definition 3.8 (σ Mapping) σ is a mapping from X to R, where X is the set of XML element types and attribute types in the input XML DTD, and R is the set of relations in the relational database. Given an XML element type e, σ(e) returns the relation that is used to store e. Similarly, given an XML attribute type a of element type e, σ(e.a) returns the relation that is used to store a of e.
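A mapping of the kind described in Definition 3.8 can be pictured as a lookup table from element and attribute types to relation names. The Python sketch below derives such a table for the memo DTD under two assumptions that go beyond the slides: each relation is named after the node that roots it, and a child shares its parent's relation exactly when it is inlinable (one incoming edge, which is a normal edge).

```python
from collections import deque

# DTD graph of the simplified memo DTD; attributes are keyed as "element.attr".
EDGES = [
    ("memo", "memo.security", ","), ("memo", "memo.lang", ","),
    ("memo", "to", ","), ("memo", "from", ","), ("memo", "date", ","),
    ("memo", "subject", ","), ("memo", "body", ","),
    ("body", "para", "*"),
]

def build_mapping(edges, root):
    """Map every element/attribute type to the relation assumed to store it."""
    incoming = {}
    for _, child, _ in edges:
        incoming[child] = incoming.get(child, 0) + 1

    mapping = {root: root}             # the root always gets its own relation
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for p, child, label in edges:
            if p != parent or child in mapping:
                continue               # skip other parents and visited nodes
            inlined = label == "," and incoming[child] == 1
            mapping[child] = mapping[parent] if inlined else child
            queue.append(child)
    return mapping

mapping = build_mapping(EDGES, "memo")
print(mapping["body"])            # memo
print(mapping["memo.security"])   # memo
print(mapping["para"])            # para
```

With this table, a data-mapping or query-mapping step can look up which relation to touch for a given element or attribute type.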

A complete example

DTD graph and inlined DTD graph

Generated relational schema

Conclusions

• We defined the schema mapping algorithm ODTDMap, which has several improvements over the existing ones.

• It is lossless in the sense that one can reconstruct the original XML document, in its original document order, based on the target relational schema generated by ODTDMap.

• It has efficient support for recursive queries and schemas.

• It defines how to map set-valued XML attributes.

• Experimental results showed good performance and scalability of the algorithm.

Future work

• Extending our work to XML Schema to support data types other than the string type.

• Maintaining ID/IDREF/IDREFS semantics in terms of key and foreign key constraints.