IS432 Semi-Structured Data Lecture 1: SSD & XML Dr. Gamal Al-Shorbagy


In this lecture

• What semi-structured data is
  – Why we need it
  – How it is represented and processed
  – Related technologies

• What XML is
  – XML syntax
  – XML Query data model
  – Comparison of XML with semi-structured data

Papers:
  – "XML, Java, and the Future of the Web", Jon Bosak, Sun Microsystems
  – "XML Query Data Model" (W3C), Mary Fernandez and Jonathan Robie


The Data

A document/page for a common user (unstructured data)

• Type of data? Difficult to identify.
• Is there any order? No particular format or sequence.
• Does it follow any rules? Can we predict anything about the data?

Management and representation: unmanageable by nature; often found as text, video, sound, or images.

Query and search: brute force, like finding a needle in a haystack.


The Data

A table for organizations (structured data)

• Data follows a certain model, e.g. relational: entities, same attributes, order, and relations
• Schema/data separation: first the schema, then the data
• Data elements are strongly "typed" and "ordered"
• Corporate ownership

Management and representation: a specialized DBMS engine handles management, storage, and query formulation; data is represented as entities/tuples or classes/objects.

Query and search: optimized via indexes, trees, …


(Figure: a group of tables)


The Data

• name: Some Body
• email: …, …
------------------------------------------------------------------
• name:
  • first name: Ceaser
  • last name: The Great
• email: …
------------------------------------------------------------------
• name: Ranjeet Singh
• affiliation: Punjab
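The three Person records above share no fixed schema: one has two emails, one has a structured name, and one has an affiliation instead of an email. A minimal Python sketch, assuming plain dicts as a stand-in for self-describing records (the email addresses are hypothetical placeholders, since the originals are not recoverable), shows how code can still query them by checking what each record actually contains:

```python
# Self-describing records: every record carries its own attribute names.
# The email addresses below are hypothetical placeholders.
people = [
    {"name": "Some Body", "email": ["a@example.org", "b@example.org"]},
    {"name": {"first name": "Ceaser", "last name": "The Great"},
     "email": ["c@example.org"]},
    {"name": "Ranjeet Singh", "affiliation": "Punjab"},
]


def display_name(person: dict) -> str:
    """Return a printable name whether 'name' is atomic or structured."""
    name = person.get("name")
    if isinstance(name, dict):
        # Structured name: join whichever parts are present.
        parts = (name.get("first name", ""), name.get("last name", ""))
        return " ".join(p for p in parts if p)
    return name if name else "(unknown)"


def emails(person: dict) -> list:
    """A missing attribute gives an empty answer instead of an error."""
    return person.get("email", [])


for p in people:
    print(display_name(p), emails(p))
```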


The Data

A graph/web for advanced users: semi-structured data (a mixture)

• Schema-less, self-describing (description vs. prescription)
• Schema may evolve over time
• Schema may be larger than the data itself
• Irregular, incomplete, evolving structure
• Entities may have different or missing attributes (example: Person)
• Ownership is often shared among organizations

Management and representation: data representation and exchange on the WWW; labeled directed graph representation

Query and search: getting better …


Kinds of Data

(Figure panels: Structured, Unstructured, Semi-structured)

Structured:

Title | Author FN | Author LN | Publisher | Page
A     | D         | Edd       | IEEE      | 233
B     | Ted       | Hee       | ACM       | 553

• Semi-structured:

(Figure: a labeled directed graph rooted at Bib (&o1) with paper and book objects; arcs are labeled author, title, year, http, publisher, page, references, firstname, lastname, first, and last; leaf values include "Serge", "Abiteboul", "Victor", "Vianu", 1997, 122, and 133.)


Why Is Semi-Structured Data Important?

Scenario: an organization A publishes movie data on its web pages (HTML), generated from a DBMS. A second organization B wants some of the movie information, but can access only the web data.

(Figure: DBMS → A → HTML → B)

Use case: when we want to treat web sources like a database but cannot constrain these sources with a schema.


Why Is Semi-Structured Data Important?

Scenario: the Electronic Data Interchange (EDI) standard, i.e. computer-to-computer interchange of strictly formatted messages, http://www.itl.nist.gov/fipspubs/fip161-2.htm

Use case: when we want a flexible format for data exchange between disparate systems/databases.

Electronic Data Interchange ISO Standard


Semi-Structured Data (Pros & Cons)

Advantages
• No need to update the schema continuously
• Easy to discover new data and load it
• Easy to integrate heterogeneous data
• Easy to query without knowing data types

Disadvantages
• Loss of type information
• Harder storage, query optimization, and management


Managing Semi-Structured Data

• How do we model it? (directed labeled graphs)
• How do we query it? (many proposals, all including regular path expressions)
• How do we optimize queries? (beginning to understand)
• How do we store the data? (looking for patterns)
• Integrity constraints, views, updates, …


Semi-Structured Data: OEM

Object Exchange Model (OEM)

Data in OEM is schema-less and self-describing, and can be thought of as a labeled directed graph whose nodes are objects, each consisting of:
• a unique object identifier (for example, &7),
• a descriptive textual label (street),
• a type (string),
• a value ("22 Deer Rd").

Objects are atomic or complex: an atomic object contains a value of a base type (e.g., integer or string) and has no outgoing edges in the diagram. All other objects are complex objects, whose types are a set of object identifiers.

Lore: an OEM-conforming data storage system, http://infolab.stanford.edu/lore/
Lorel: the Lore query language
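The OEM description above maps naturally onto a small record type. This is a sketch under stated assumptions: the class name OEMObject, the oids &5 and &8, and the rooms object are hypothetical; only the street object (&7, "22 Deer Rd") comes from the slide. A complex object stores the oids of its children, while an atomic object stores a base-type value:

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class OEMObject:
    oid: str                           # unique object identifier, e.g. "&7"
    label: str                         # descriptive textual label, e.g. "street"
    type: str                          # base type ("string", "integer") or "complex"
    value: Union[str, int, List[str]]  # atomic value, or child oids if complex

    def is_atomic(self) -> bool:
        # Atomic objects hold a base-type value and have no outgoing edges.
        return self.type != "complex"


# Hypothetical object store keyed by oid, built around the slide's &7.
store: Dict[str, OEMObject] = {
    "&5": OEMObject("&5", "PropertyForRent", "complex", ["&7", "&8"]),
    "&7": OEMObject("&7", "street", "string", "22 Deer Rd"),
    "&8": OEMObject("&8", "rooms", "integer", 4),
}


def children(obj: OEMObject) -> List[OEMObject]:
    """Follow the outgoing edges of a complex object (none for atomic ones)."""
    return [] if obj.is_atomic() else [store[oid] for oid in obj.value]


for child in children(store["&5"]):
    print(child.oid, child.label, child.type, child.value)
```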


Semi-structured data model example

(Figure: the Object Exchange Model (OEM) graph from the "Kinds of Data" slide, rooted at Bib (&o1), with one node marked as a complex object and a leaf marked as an atomic object.)

Nodes are objects; labels on the arcs are attribute names.


Querying Semi-Structured Data

Important features:
• Ability to navigate the data (regular path expressions)
• Querying the attribute names (arc variables)
• Creating new structures
• Type coercion

Languages:
• Lorel (Stanford), http://infolab.stanford.edu/pub/papers/lorel96.ps
• UnQL (U. Penn), http://www.unqlspec.org/display/UnQL/Home


Semi-Structured Data: Lore and Lorel

Lore (Lightweight Object Repository)
• A DBMS
• Has an external data manager

Lorel (the Lore language):
• Returns meaningful results even when some data is absent
• Operates uniformly over single-valued and set-valued data
• Accepts data with different types
• Can return heterogeneous objects
• Allows the object structure to be only partially known

Example: find all properties that have an annual rent.

SELECT DreamHome.PropertyForRent WHERE DreamHome.PropertyForRent.annualRent

Answer:
PropertyForRent &6
    street &14 "18 Dale Rd"
    type &15 1
    annualRent &16 7200
    OverseenBy &4
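The Lorel features above (navigating with path expressions, tolerating missing data) can be imitated over a labeled directed graph. A minimal sketch, assuming a hypothetical graph loosely modeled on the DreamHome answer: the second property &8, its street object &18, and that street value are invented for illustration, and the `_` (any one label) and `*` (any label sequence) wildcards are this sketch's own notation, not Lorel syntax:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical graph: node -> outgoing (edge label, child node) pairs.
edges: Dict[str, List[Tuple[str, str]]] = {
    "&1": [("PropertyForRent", "&6"), ("PropertyForRent", "&8")],
    "&6": [("street", "&14"), ("annualRent", "&16")],
    "&8": [("street", "&18")],               # no annualRent: data is missing
}
values = {"&14": "18 Dale Rd", "&16": 7200, "&18": "6 Lawrence St"}


def eval_path(start: Set[str], path: str) -> Set[str]:
    """Evaluate a dotted path expression from a set of start nodes.
    '_' matches any single label; '*' matches any sequence of labels."""
    current = set(start)
    for step in path.split("."):
        if step == "*":
            # Closure: every node reachable in zero or more steps.
            reached, frontier = set(current), list(current)
            while frontier:
                for _, child in edges.get(frontier.pop(), []):
                    if child not in reached:
                        reached.add(child)
                        frontier.append(child)
            current = reached
        else:
            current = {child
                       for node in current
                       for label, child in edges.get(node, [])
                       if step in ("_", label)}
    return current


DreamHome = {"&1"}
# "Find all properties that have an annual rent": a property qualifies when
# the annualRent path from it is non-empty; a missing attribute is not an error.
with_rent = {p for p in eval_path(DreamHome, "PropertyForRent")
             if eval_path({p}, "annualRent")}
print(with_rent)                                                # {'&6'}
print(sorted(values[n] for n in eval_path(DreamHome, "*.street")))
```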


Data Models Timeline

• Network Data Models (1964)

• Hierarchical Data Models (1968)

• Relational Data Models (1970)

• Object-oriented Data Models (~ 1985)

• Object-relational Data Models (~ 1990)

• Semi-structured Data Models (XML 1.0) (~1998)


XML

• a W3C standard to complement HTML

• origins: SGML (structured text)

• motivation:
  – HTML describes presentation
  – XML describes content

• http://www.w3.org/TR/2000/REC-xml-20001006 (XML 1.0 Second Edition, 10/2000)

(Figure: SGML, XML, HTML 4.0)


XML: An Embodiment of Semi-Structured Data

Meta-language
• A de facto language to represent semi-structured data
• Used to create new languages (WAP, VoiceXML, MathML)

Extensibility
• Create new elements
• Create new languages (WML, WAP)

Markup
• Text markup
• Element = data + markup
• Document = nested elements

<note>

<to>Rana </to>

<from>Tunga </from>

<heading>Hello </heading>

<body>What’s up ! </body>

</note>


From HTML to XML

HTML describes the presentation.

HTML

<h1> Bibliography </h1>

<p> <i> Foundations of Databases </i>

Abiteboul, Hull, Vianu

<br> Addison Wesley, 1995

<p> <i> Data on the Web </i>

Abiteboul, Buneman, Suciu

<br> Morgan Kaufmann, 1999

XML

<bibliography>

<book> <title> Foundations… </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison Wesley </publisher>

<year> 1995 </year>

</book>

</bibliography>

XML describes the content.

XML VS. HTML

XML and HTML were designed with different goals:

XML was designed to describe data and to focus on what the data is.

HTML was designed to display data and to focus on how the data looks.

It is important to understand that XML is not a replacement for HTML.


XML Data Model

Several competing models:

• Document Object Model (DOM):
  – http://www.w3.org/TR/2001/WD-DOM-Level-3-CMLS-20010209/ (2/2001)
  – class hierarchy (node, element, attribute, …)
  – objects have behavior
  – defines an API to inspect/modify the document

• XSL data model

• Infoset
  – PSV (post-schema-validation) infoset

• XML Query data model (next)


Why XML

• Portability
• Language neutrality
• Platform independence
• Program-data decoupling
  – Logic and notation
  – Data and metadata
  – Information and structure
  – Content and form


Why XML

• Data evolution: schema updates are not required
• Integration: prior knowledge of the schema is not necessary
• Sharing between incompatible formats
• Interoperability without rebuilding the systems

Report: concrete examples


How Computers Understand XML

Parsers: software that understands XML; removes the markup and retrieves the data
• Document Object Model (DOM): models a document as a tree
• Simple API for XML (SAX): sequential access


What XML is not

It may be a little hard to understand, but XML does not DO anything. XML was created to structure, store, and send information.

<note>

<to>Rana </to>

<from>Tunga </from>

<heading>Hello </heading>

<body>What’s up ! </body>

</note>

The note contains a heading, a message body, and sender and receiver information. But still, this XML document does not DO anything.

It is just information wrapped in XML tags. Someone must write a piece of software to send, receive, or display it.
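To make the point concrete, here is a minimal Python sketch of such a piece of software, using the standard library's xml.etree.ElementTree; the function name render_note and the plain-text output format are hypothetical choices:

```python
import xml.etree.ElementTree as ET

NOTE = """<note>
  <to>Rana </to>
  <from>Tunga </from>
  <heading>Hello </heading>
  <body>What's up ! </body>
</note>"""


def render_note(xml_text: str) -> str:
    """Parse the note and turn it into a plain-text message.
    The XML does nothing by itself; this function is what acts on it."""
    note = ET.fromstring(xml_text)
    # findtext returns the text of the first matching child (or the default).
    return (f"To: {note.findtext('to', '').strip()}\n"
            f"From: {note.findtext('from', '').strip()}\n"
            f"Subject: {note.findtext('heading', '').strip()}\n\n"
            f"{note.findtext('body', '').strip()}")


print(render_note(NOTE))
```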


XML Terminology

• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document has a single root element

Well-formed XML document: one whose tags match (among other syntax rules).


More XML: Attributes

<book price = "55" currency = "USD">

<title> Foundations of Databases </title>

<author> Abiteboul </author>

<year> 1995 </year>

</book>

Attributes are an alternative way to represent data.
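A short Python sketch (standard-library ElementTree; the alternative <price>/<currency> encoding is hypothetical) shows that a parser hands attributes back as an untyped name-to-string mapping, while the same facts written as child elements are reached by navigation:

```python
import xml.etree.ElementTree as ET

BOOK = """<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  <year> 1995 </year>
</book>"""

book = ET.fromstring(BOOK)
# Attributes come back as a dict of strings: no type information.
print(book.attrib)                       # {'price': '55', 'currency': 'USD'}
# Child elements are navigated by tag; their text keeps surrounding spaces.
print(book.findtext("year").strip())     # 1995

# The same facts could equally be written as child elements instead.
ALT = "<book><price>55</price><currency>USD</currency></book>"
alt = ET.fromstring(ALT)
print({child.tag: child.text for child in alt})  # {'price': '55', 'currency': 'USD'}
```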


Parsers and Well-formed XML Documents

• XML parser
  – Processes an XML document:
    • Reads the XML document
    • Checks the syntax
    • Reports errors (if any)
    • Allows programmatic access to the document's contents


Parsers and Well-formed XML Documents (cont.)

• XML document syntax
  – Considered well formed if syntactically correct:
    • Single root element
    • Each element has a start tag and an end tag
    • Tags properly nested
    • Attribute values in quotes
    • Proper capitalization (case sensitive)
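These rules are what a non-validating parser enforces. A minimal sketch using Python's ElementTree, where any syntax violation raises ParseError (the sample documents are hypothetical, one per broken rule):

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the parser accepts the document (syntax only, no schema)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "ok":            '<book price="55"><title>T</title></book>',
    "two roots":     "<a/><b/>",
    "missing end":   "<book><title>T</book>",
    "bad nesting":   "<a><b></a></b>",
    "unquoted attr": "<book price=55/>",
    "case mismatch": "<Book></book>",
}
for name, doc in samples.items():
    print(f"{name:15} {is_well_formed(doc)}")
```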


Parsers and Well-formed XML Documents (cont.)

• XML parsers support
  – Document Object Model (DOM)
    • Builds a tree structure containing the document data in memory
  – Simple API for XML (SAX)
    • Generates events when tags, comments, etc. are encountered
    • (Events are notifications to the application)
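The difference between the two parser styles can be seen with Python's standard xml.dom.minidom (DOM) and xml.sax (SAX) modules. In this sketch (the two-book document and the TitleHandler class are hypothetical), DOM builds the whole tree first, while SAX only calls handler methods as tags and text stream past:

```python
import xml.sax
from xml.dom import minidom

DOC = ("<bib><book><title>Foundations</title></book>"
       "<book><title>Data on the Web</title></book></bib>")

# DOM: the whole document becomes an in-memory tree we can navigate freely.
dom = minidom.parseString(DOC)
dom_titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]


# SAX: the parser pushes events to a handler; nothing is kept unless we keep it.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []
        self.events = []            # event names, to show the stream order

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name == "title":
            self.in_title = True
            self.titles.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so append rather than assign.
        if self.in_title:
            self.titles[-1] += content

    def endElement(self, name):
        self.events.append(("end", name))
        if name == "title":
            self.in_title = False


handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(dom_titles)         # ['Foundations', 'Data on the Web']
print(handler.titles)     # ['Foundations', 'Data on the Web']
print(handler.events[:3])  # [('start', 'bib'), ('start', 'book'), ('start', 'title')]
```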


Parsing an XML Document with msxml

• XML document
  – Contains data
  – Does not contain formatting information
  – Load the XML document into Internet Explorer 5.0
    • The document is parsed by msxml
    • Places plus (+) or minus (-) signs next to container elements
      – A plus sign indicates that all child elements are hidden; clicking it expands the container element
        » Displays children
      – A minus sign indicates that all child elements are visible; clicking it collapses the container element
        » Hides children
    • An error is generated if the document is not well formed


(Figure: XML document shown in IE5.)


Characters

• Character set
  – Characters that may be represented in an XML document
  – e.g., the ASCII character set:
    » Letters of the English alphabet
    » Digits (0-9)
    » Punctuation characters, such as !, - and ?


Character Set

• XML documents may contain
  – Carriage returns
  – Line feeds
  – Unicode characters (Section 5.5.4)
    » Enable computers to process characters for several languages


Characters vs. Markup

• XML must differentiate between
  – Markup text
    • Enclosed in angle brackets (< and >)
    • e.g., child elements
  – Character data
    • Text between a start tag and an end tag
    • e.g., Fig. 5.1, line 7: Welcome to XML!


White Space, Entity References and Built-in Entities

• Whitespace characters
  – Spaces, tabs, line feeds and carriage returns
  – Significant (preserved by the application)
  – Insignificant (not preserved by the application)
    • Normalization
      » Whitespace collapsed into a single whitespace character
      » Sometimes whitespace removed entirely

<markup>This   is    character   data</markup>

after normalization, becomes

<markup>This is character data</markup>
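Normalization is a policy of the application, not of XML itself: an XML processor passes character data through. A one-function Python sketch of the collapsing policy described above, similar in spirit to XPath's normalize-space() function (the function name is hypothetical):

```python
def normalize_space(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, CR, LF) into one
    space and trim both ends; one common normalization policy."""
    return " ".join(text.split())


raw = "This   is \t character\n\n  data"
print(normalize_space(raw))   # This is character data
```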


White Space, Entity References and Built-in Entities (cont.)

• XML-reserved characters
  – Ampersand (&)
  – Left-angle bracket (<)
  – Right-angle bracket (>)
  – Apostrophe (')
  – Double quote (")

• Entity references
  – Allow XML-reserved characters to be used in character data
  – Begin with an ampersand (&) and end with a semicolon (;)
  – Prevent character data from being misinterpreted as markup


White Space, Entity References and Built-in Entities (cont.)

• Built-in entities
  – Ampersand (&amp;)
  – Left-angle bracket (&lt;)
  – Right-angle bracket (&gt;)
  – Apostrophe (&apos;)
  – Quotation mark (&quot;)
• Marking up the characters "<>&" in element message:

<message>&lt;&gt;&amp;</message>
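In Python's standard library, the parser performs this replacement when reading, and xml.sax.saxutils provides escape/unescape helpers for writing. A short sketch (the message element is from the slide; the other strings are hypothetical):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Reading: the parser replaces entity references with the characters they stand for.
msg = ET.fromstring("<message>&lt;&gt;&amp;</message>")
print(msg.text)                               # <>&

# Character references (&#38; decimal, &#x26; hex) are replaced the same way.
print(ET.fromstring("<e>&#38;&#x26;</e>").text)  # &&

# Writing: escape reserved characters before emitting them as text.
print(escape("<>&"))                          # &lt;&gt;&amp;
print(escape('say "hi"', {'"': "&quot;"}))    # say &quot;hi&quot;
print(unescape("&lt;&gt;&amp;"))              # <>&
```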


More XML: Oids and References

<person id="o555"> <name> Jane </name> </person>

<person id="o456"> <name> Mary </name>

<children idref="o123 o555"/>

</person>

<person id="o123" mother="o456"><name>John</name>

</person>

Oids and references in XML are just syntax.
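Because these attributes are just syntax (no DTD here declares them as ID/IDREF(S)), the application has to turn them into links itself. A minimal Python sketch with ElementTree; the enclosing <people> root is a hypothetical addition so the fragment forms a single-root document, and name_of is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in a hypothetical <people> root so it parses.
PEOPLE = """<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(PEOPLE)
# To the parser, id/idref/mother are plain strings; we build the index ourselves.
by_id = {p.get("id"): p for p in root.iter("person")}


def name_of(oid: str) -> str:
    return by_id[oid].findtext("name")


mary = by_id["o456"]
kids = [name_of(ref) for ref in mary.find("children").get("idref").split()]
print(kids)                                   # ['John', 'Jane']
print(name_of(by_id["o123"].get("mother")))   # Mary
```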


More XML: CDATA Section

• Syntax: <![CDATA[ .....any text here...]]>

• Example:

<example> <![CDATA[ some text here </notAtag> <>]]>

</example>
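Parsing the example with Python's ElementTree (a minimal sketch) confirms that CDATA content comes back as ordinary text: </notAtag> does not become an element, and no escaping was needed. The escaped document at the end is the equivalent written without CDATA:

```python
import xml.etree.ElementTree as ET

doc = "<example> <![CDATA[ some text here </notAtag> <>]]>\n</example>"
example = ET.fromstring(doc)

print(len(example))            # 0: the parser created no child elements
print(example.text.strip())    # some text here </notAtag> <>

# The same characters written without CDATA must use entity references.
escaped = ET.fromstring(
    "<example>some text here &lt;/notAtag&gt; &lt;&gt;</example>")
print(escaped.text)            # some text here </notAtag> <>
```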


(Figure: Using a CDATA section)

More XML: Entity References

• Syntax: &entityname;

• Example: <element> this is less than &lt; </element>

• Some entities:
  &lt;     <
  &gt;     >
  &amp;    &
  &apos;   '
  &quot;   "
  &#38;    character reference to Unicode code point 38 (&)


More XML: Processing Instructions

• Syntax: <?target argument?>

• Example:

<product> <name> Alarm Clock </name> <?ringBell 20?> <price> 19.99 </price></product>

• What do they mean?


More XML: Comments

• Syntax: <!-- ... comment text ... -->

• Yes, they are part of the data model!
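Whether comments and processing instructions survive depends on the parser. In Python's ElementTree (3.8 or later) they are dropped by default but can be kept by configuring the TreeBuilder. A minimal sketch using the product example from the processing-instruction slide, with a hypothetical comment added:

```python
import xml.etree.ElementTree as ET

DOC = """<product> <!-- sample comment -->
  <name> Alarm Clock </name> <?ringBell 20?> <price> 19.99 </price>
</product>"""

# Default parsing discards comments and processing instructions.
plain = ET.fromstring(DOC)
print([child.tag for child in plain])        # ['name', 'price']

# Asking the TreeBuilder to keep them puts them back into the tree.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
full = ET.fromstring(DOC, parser=ET.XMLParser(target=builder))
for child in full:
    if child.tag is ET.Comment:
        print("comment:", child.text.strip())
    elif child.tag is ET.ProcessingInstruction:
        print("PI:", child.text)             # target and argument together
    else:
        print("element:", child.tag)
```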


XML Namespaces

• http://www.w3.org/TR/REC-xml-names (1/99)

• name ::= [prefix:]localpart

<book xmlns:isbn="www.isbn-org.org/def">

<title> … </title>

<number> 15 </number>

<isbn:number> …. </isbn:number>

</book>


XML Namespaces

<tag xmlns:mystyle="http://…">

<mystyle:title> … </mystyle:title>

<mystyle:number> … </mystyle:number>

</tag>

• syntactic: <number>, <isbn:number>

• semantic: provide a URL for the schema (the mystyle prefix is defined here, in the xmlns:mystyle attribute)


XML Namespaces

• Naming collisions
  – Two different elements have the same name

<subject>Math</subject>

<subject>Thrombosis</subject>

• Namespaces
  – Differentiate elements that have the same name

<school:subject>Math</school:subject>

<medical:subject>Thrombosis</medical:subject>

• school and medical are namespace prefixes
  – Prepended to element and attribute names
  – Tied to a uniform resource identifier (URI)
    » A series of characters for differentiating names


XML Namespaces

• Creating namespaces
  – Use the xmlns keyword

xmlns:text = "urn:deitel:textInfo"

xmlns:image = "urn:deitel:imageInfo"

  • Creates two namespace prefixes, text and image
  • urn:deitel:textInfo is the URI for prefix text
  • urn:deitel:imageInfo is the URI for prefix image
  – Default namespaces
    • Child elements of this namespace do not need a prefix

xmlns = "urn:deitel:textInfo"


<?xml version = "1.0"?>

<!-- Fig. 5.8 : namespace.xml -->
<!-- Namespaces -->

<directory xmlns:text = "urn:deitel:textInfo"
           xmlns:image = "urn:deitel:imageInfo">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>

Element directory contains two namespace prefixes.

Use prefix text to describe elements file and description.

Apply prefix image to describe elements file, description and size.
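A parser resolves each prefix to its URI, so applications compare URIs, not prefixes. A minimal Python sketch with ElementTree, using the namespace.xml document above (XML declaration and comments omitted; the query prefixes t and img are hypothetical and deliberately differ from the document's):

```python
import xml.etree.ElementTree as ET

DIRECTORY = """<directory xmlns:text="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
  <text:file filename="book.xml">
    <text:description>A book list</text:description>
  </text:file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width="200" height="100"/>
  </image:file>
</directory>"""

root = ET.fromstring(DIRECTORY)

# Prefixes are gone after parsing: each tag is "{namespace URI}local name".
tags = [child.tag for child in root]
print(tags)  # ['{urn:deitel:textInfo}file', '{urn:deitel:imageInfo}file']

# Queries may use any prefixes, as long as they map to the same URIs.
ns = {"t": "urn:deitel:textInfo", "img": "urn:deitel:imageInfo"}
print(root.find("img:file", ns).get("filename"))             # funny.jpg
print(root.findtext("t:file/t:description", namespaces=ns))  # A book list
```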


<?xml version = "1.0"?>

<!-- Fig. 5.9 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->

<directory xmlns = "urn:deitel:textInfo"
           xmlns:image = "urn:deitel:imageInfo">

   <file filename = "book.xml">
      <description>A book list</description>
   </file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>

urn:deitel:textInfo is the default namespace.

Element file is in the default namespace.

Elements in the image namespace still specify their namespace prefix.
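With ElementTree, the effect of the default namespace shows up in the expanded tag names. A minimal sketch on an abridged copy of defaultnamespace.xml (the query prefix t is a hypothetical choice):

```python
import xml.etree.ElementTree as ET

DIRECTORY = """<directory xmlns="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
  <file filename="book.xml">
    <description>A book list</description>
  </file>
  <image:file filename="funny.jpg">
    <image:size width="200" height="100"/>
  </image:file>
</directory>"""

root = ET.fromstring(DIRECTORY)
print(root.tag)                          # {urn:deitel:textInfo}directory
print([child.tag for child in root])
# ['{urn:deitel:textInfo}file', '{urn:deitel:imageInfo}file']

# Unprefixed elements inherit the default namespace, so a bare "file" misses.
print(root.find("file"))                 # None
ns = {"t": "urn:deitel:textInfo"}
text_file = root.find("t:file", ns)
print(text_file.get("filename"))         # book.xml
# Default namespaces do not apply to unprefixed attributes.
print(text_file.attrib)                  # {'filename': 'book.xml'}
```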


XML Stylesheet

• Extensible Stylesheet Language (XSL): a language for document transformation
  – Transformation: converting XML to another form
  – Formatting objects: layout of an XML document
  – Defined by the W3C

http://www.codeproject.com/Articles/294380/Applying-XSLT-Stylesheet-to-an-XML-File-at-Runtime


XML Path (XPath)

WHY
• To access particular parts of an XML document
• To navigate within an XML document

WHAT
• Analogous to the SELECT statement in SQL

HOW
• It views an XML document as a tree
• The root of the tree is a node that doesn't correspond to anything in the document
• Internal nodes are elements
• Leaves are either attributes, text nodes, or comments
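Python's standard ElementTree implements only a subset of XPath (full XPath 1.0 is available in third-party libraries such as lxml), but the subset is enough to illustrate path navigation. In this sketch the bibliography data comes from the HTML/XML slides above; putting year in an attribute is a hypothetical change so that an attribute predicate can be shown:

```python
import xml.etree.ElementTree as ET

BIB = """<bibliography>
  <book year="1995">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
  </book>
  <book year="1999">
    <title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
  </book>
</bibliography>"""

root = ET.fromstring(BIB)

# Roughly "SELECT title FROM book": every title child of a book child.
titles = [t.text for t in root.findall("./book/title")]
print(titles)   # ['Foundations of Databases', 'Data on the Web']

# Descendant axis (//) combined with an attribute predicate.
authors_1999 = [a.text for a in root.findall(".//book[@year='1999']/author")]
print(authors_1999)   # ['Abiteboul', 'Buneman', 'Suciu']

# Predicate on a child's text: books that have Hull as an author.
with_hull = [b.findtext("title") for b in root.findall("book[author='Hull']")]
print(with_hull)   # ['Foundations of Databases']
```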


XML Path

XML Query

WHAT
• XQuery can be used to:
  – Extract information to use in a Web Service
  – Generate summary reports
  – Transform XML data to XHTML
  – Search Web documents for relevant information

WHY
• Need to extract parts of XML documents (database)
• Need to transform documents into different forms:
  – Another XML form
  – HTML (to display in a Web browser)
  – Other (e.g. BibTeX)
• Need to relate (join) parts of the same or different documents

HOW
• The XML-QL language
• XQuery: W3C standard
  – Very powerful, fairly intuitive, SQL-style


XML Query Data Model

• http://www.w3.org/TR/query-datamodel/ (2/2001)

• Describes XML as a tree with specialized nodes

• Uses a functional-style notation (think ML)


XML Query Data Model

• Node ::= DocNode | ElemNode | ValueNode | AttrNode | NSNode | PINode | CommentNode | InfoItemNode | RefNode


XML Query Data Model

Element node (simplified definition):

• elemNode : (QNameValue, {AttrNode}, [ElemNode | ValueNode]) → ElemNode

• QNameValue means "a tag name"
• {...} means "set of ..."
• [...] means "list of ..."


XML Query Data Model

• Reads: “give me a tag, a set of attributes, a list of elements/values, and I will return an element”


XML Query Data Model

Example

<book price = "55"

currency = "USD">

<title> Foundations … </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<year> 1995 </year>

</book>


book1 = elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])

price2 = attrNode(…) /* next */
currency3 = attrNode(…)
title4 = elemNode(title, string9)
…



XML Query Data Model

Attribute node:

• attrNode : (QNameValue, ValueNode) → AttrNode


XML Query Data Model

Example

<book price = "55"

currency = "USD">

<title> Foundations … </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<year> 1995 </year>

</book>


price2 = attrNode(price, string10)
string10 = valueNode(…) /* next */
currency3 = attrNode(currency, string11)
string11 = valueNode(…)



XML Query Data Model

Value node:
• ValueNode = StringValue | BoolValue | FloatValue …
• stringValue : string → StringValue
• boolValue : boolean → BoolValue
• floatValue : float → FloatValue


XML Query Data Model

Example

<book price = "55"

currency = "USD">

<title> Foundations … </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<year> 1995 </year>

</book>


price2 = attrNode(price, string10)
string10 = valueNode(stringValue("55"))
currency3 = attrNode(currency, string11)
string11 = valueNode(stringValue("USD"))

title4 = elemNode(title, string9)
string9 = valueNode(stringValue("Foundations…"))

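The constructor notation above can be transcribed almost literally into Python. This is a sketch, not the W3C specification's own code: the dataclass names, the use of frozenset for {…} and tuple for […], the empty attribute sets on child elements, and the string value for year8 are assumptions. It shows the key consequence of the notation: attribute order does not matter, child order does.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class ValueNode:
    value: Union[str, bool, float]          # stringValue | boolValue | floatValue


@dataclass(frozen=True)
class AttrNode:
    name: str                               # QNameValue
    value: ValueNode


@dataclass(frozen=True)
class ElemNode:
    name: str                               # QNameValue ("a tag name")
    attributes: FrozenSet[AttrNode]         # {AttrNode}: a set, unordered
    children: Tuple[Union["ElemNode", ValueNode], ...]  # [...]: a list, ordered


def elemNode(name, attrs, children) -> ElemNode:
    """elemNode : (QNameValue, {AttrNode}, [ElemNode | ValueNode]) -> ElemNode"""
    return ElemNode(name, frozenset(attrs), tuple(children))


def attrNode(name, value) -> AttrNode:
    """attrNode : (QNameValue, ValueNode) -> AttrNode"""
    return AttrNode(name, value)


def valueNode(v) -> ValueNode:
    return ValueNode(v)


string9, string10, string11 = valueNode("Foundations…"), valueNode("55"), valueNode("USD")
price2, currency3 = attrNode("price", string10), attrNode("currency", string11)
title4 = elemNode("title", set(), [string9])          # no attributes: empty set
author5, author6, author7 = (elemNode("author", set(), [valueNode(n)])
                             for n in ("Abiteboul", "Hull", "Vianu"))
year8 = elemNode("year", set(), [valueNode("1995")])  # assumed string value
book1 = elemNode("book", {price2, currency3},
                 [title4, author5, author6, author7, year8])

# Attribute order is irrelevant (a set); child order matters (a list).
same_book = elemNode("book", [currency3, price2], list(book1.children))
print(same_book == book1)                                                     # True
print(elemNode("book", {price2, currency3}, reversed(book1.children)) == book1)  # False
```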


Semi-structured Data vs. XML

• both are best described by a graph

• both are schema-less and self-describing

• Attributes ---> tags

• objects ---> elements

• atomic values ---> CDATA (characters)

• Order? Assumed in XML.

• XML attributes (fixable)

• References in XML.


Similarities and Differences

<person id="o123">

<name> Alan </name>

<age> 42 </age>

<email> ab@com </email>

</person>


{ person: &o123

{ name: “Alan”,

age: 42,

email: “ab@com” }

}


(Figure: the person data drawn as a tree, with name, age, and email arcs leading to Alan, 42, and ab@com; a second drawing adds father arcs.)

<person father="o123"> …</person>

{ person: { father: &o123 …}}

Similar on trees, different on graphs.

More Differences

• XML is ordered, ssd is not

• XML can mix text and elements:

<talk> Making Java easier to type and easier to type

<speaker> Phil Wadler </speaker>

</talk>

• XML has lots of other stuff: entities, processing instructions, comments

Very important: these differences make XML data management harder.
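Mixed content and ordering are easy to see in Python's ElementTree, which stores the text before the first child in .text and the text following a child in that child's .tail. A minimal sketch with the talk example above:

```python
import xml.etree.ElementTree as ET

talk = ET.fromstring(
    "<talk> Making Java easier to type and easier to type\n"
    "<speaker> Phil Wadler </speaker>\n</talk>")

# Text before the first child lives in .text; text after a child lives in
# that child's .tail, so the interleaving of text and elements is kept.
print(repr(talk.text))   # ' Making Java easier to type and easier to type\n'
speaker = talk.find("speaker")
print(speaker.text.strip())   # Phil Wadler
print(repr(speaker.tail))     # '\n'

# itertext() walks the mixed content in document order.
print(" ".join("".join(talk.itertext()).split()))
```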