Yazici XML Ex


XML and Internet Databases

Outline

• Background: documents (SGML/HTML) and databases (structured and semistructured data)

• XML Basics and Document Type Descriptors

• XML query languages: XPath, XQuery

Part I: Background

What’s the difference between the world of documents and information retrieval and databases and query interfaces?

Documents vs Databases

Document world
> plenty of small documents
> usually static
> implicit structure: section, paragraph, toc
> tagging
> human friendly
> content: form/layout, annotation
> paradigms: “Save as”
> meta-data: author name, date, subject

Database world
> a few large databases
> usually dynamic
> explicit structure (schema)
> records
> machine friendly
> content: schema, data, methods
> paradigms: Atomicity, Consistency, Isolation, Durability
> meta-data: schema description

What to do with them

Documents
• editing
• printing
• spell-checking
• counting words
• retrieving (IR)
• searching

Database
• updating
• cleaning
• querying
• composing/transforming

HTML

• Lingua franca for publishing hypertext on the World Wide Web.

• HTML is widely used for formatting and structuring Web documents.

• Designed to describe how a Web browser should arrange text, images and push-buttons on a page.

• Easy to learn, but does not convey the structure and meaning of the data in Web pages.

• Fixed tag set.

<HTML>
<HEAD><TITLE>Welcome to the XML course</TITLE></HEAD>
<BODY>
<H1>Introduction</H1>
<IMG SRC="dragon.jpeg" WIDTH="200" HEIGHT="150">
</BODY>
</HTML>

In this example, <TITLE> is an opening tag and </TITLE> its closing tag. "Welcome to the XML course" is text (PCDATA). <IMG ...> is a “bachelor” tag with no closing tag. SRC is an attribute name and "dragon.jpeg" is an attribute value.

Semistructured Data

1. Information integration: an important new application that motivates what follows.

2. Semistructured data: a new data model designed to cope with problems of information integration.

3. XML: a new Web standard that is essentially semistructured data.

4. XQuery: an emerging standard query language for XML data.

Information Integration

Problem: related data exists in many places. The sources talk about the same things, but differ in model, schema, and conventions (e.g., terminology).

Example: In the real world, every bar has its own database.

• Some may have relations like beer-price; others have a Microsoft Word file from which the menu is printed.

• Some keep phones of manufacturers but not addresses.

• Some distinguish beers and ales; others do not.

Two approaches

1. Warehousing: Make copies of the information at each data source centrally.
   – Reconstruct data daily/weekly/monthly, but do not try to keep it up-to-date.

2. Mediation: Create a view of all information, but do not make copies.
   – Answer queries by sending appropriate queries to the sources.

Warehousing

[Figure: the user sends a query to the Warehouse and receives the result. A Combiner loads the Warehouse with data extracted from DB1 and DB2, through one Wrapper per source.]

Mediation

[Figure: the user sends a query to the Mediator and receives the result. The Mediator sends queries to one Wrapper per source, each Wrapper queries DB1 or DB2, and results flow back along the same path.]
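The two architectures can be contrasted in a few lines of Python. This is a toy sketch, not a real integration system. The two in-memory sources, their schemas, and the functions wrapper1, wrapper2, mediator, and build_warehouse are all hypothetical. The warehouse copies all data once, playing the Combiner's role. The mediator instead forwards each query to the sources at the moment it is asked.

```python
# Two hypothetical bar databases that store the same facts with different schemas.
db1 = [{"beer": "Bud", "price": 2.50}, {"beer": "Miller", "price": 3.00}]
db2 = {"Sapporo": "4.00", "Bud": "2.75"}  # beer -> price, stored as text


def wrapper1(beer):
    """Answer the global query 'price of beer' against db1's schema."""
    return [row["price"] for row in db1 if row["beer"] == beer]


def wrapper2(beer):
    """Answer the same global query against db2's (different) schema."""
    return [float(db2[beer])] if beer in db2 else []


WRAPPERS = [wrapper1, wrapper2]


def mediator(beer):
    """Mediation: keep no copies; send the query to every source when asked."""
    results = []
    for wrap in WRAPPERS:
        results.extend(wrap(beer))
    return results


def build_warehouse():
    """Warehousing: copy everything into one central table (rerun periodically)."""
    rows = [(row["beer"], row["price"]) for row in db1]
    rows += [(beer, float(price)) for beer, price in db2.items()]
    return rows


warehouse = build_warehouse()


def warehouse_query(beer):
    """Answer from the central copy only; the sources are not contacted."""
    return [price for name, price in warehouse if name == beer]


print(mediator("Bud"))         # [2.5, 2.75]
print(warehouse_query("Bud"))  # [2.5, 2.75]
```

The warehouse is only rebuilt periodically. A change in db2 therefore shows up in mediator results immediately, but in warehouse results only after the next build_warehouse() run.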

Semistructured Data

• A different kind of data model, more suited to information-integration applications than either relational or OO.
   – Think of objects, but with the type of each object being its own business rather than the business of the class to which it belongs.
   – Allows information from several sources, with related but different properties, to be fit together in one whole.

• Major application: XML documents.

Graph Representation of Semistructured Data

• Nodes = objects.
• Nodes connected in a general rooted graph structure.
• Labels on arcs.
• Atomic values on leaf nodes.
• Big deal: no restriction on labels (roughly = attributes).
   – Zero, one, or many children of a given label type are all OK.
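A labeled graph like this can be sketched in Python as one object per node, each holding a list of (label, child) arcs. The class name Node and the sample bar/beer data are illustrative assumptions. The sketch shows three properties of the model. A node may have zero, one, or many children with the same label. Leaf nodes hold atomic values. An arc may point back to an existing object, so the structure is a graph rather than a tree.

```python
class Node:
    """A semistructured-data object: labeled outgoing arcs, or an atomic value."""

    def __init__(self, value=None):
        self.value = value  # atomic value for leaf nodes, None for inner nodes
        self.arcs = []      # list of (label, Node); any label string is allowed

    def add(self, label, child):
        """Add an arc with the given label and return the child node."""
        self.arcs.append((label, child))
        return child

    def children(self, label):
        """All children reached through arcs with this label (0, 1, or many)."""
        return [child for lab, child in self.arcs if lab == label]


root = Node()
bar = root.add("bar", Node())
bar.add("name", Node("Joe's"))
bar.add("addr", Node("Maple"))

bud = root.add("beer", Node())
bud.add("name", Node("Bud"))
bud.add("servedAt", bar)  # points to an existing object: a graph, not a tree

print([n.value for n in bar.children("name")])  # ["Joe's"]
print(bar.children("phone"))                    # [] (zero children is fine)
```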

XML (Extensible Markup Language)

• HTML uses tags for formatting (e.g., “italic”).
• XML uses tags for semantics (e.g., “this is an address”).
• Two modes:
1. Well-formed XML: a document that obeys the “nested tags” rule and does not repeat an attribute within a tag is said to be well-formed. It allows you to invent your own tags, much like labels in semistructured data.
2. Valid XML involves a DTD (Document Type Definition) that lists the allowed labels and gives a grammar for how they may be nested.

Well-Formed XML

1. Declaration = <? ... ?>.
   – Normal declaration is <?xml version="1.0" standalone="yes"?>
   – “Standalone” means that the document does not depend on an external DTD. In the simple examples here, no DTD is specified at all.

2. A root tag surrounds the entire balance of the document.
   – <FOO> is balanced by </FOO>, as in HTML.

3. Any balanced structure of tags is OK.
   – Unlike HTML, XML has no tags that may be left unbalanced (such as <P> in HTML). An element with no content can be written as a single self-closing tag, e.g., <BR/>.

The Structure of XML

• XML consists of tags and text

• Tags come in pairs: <date> ... </date>

• They must be properly nested:
   <date> <day> ... </day> ... </date> --- good
   <date> <day> ... </date> ... </day> --- bad
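A quick way to see the nesting rule in action is to give both fragments to a parser. This sketch uses Python's standard xml.etree.ElementTree, which raises ParseError on input that is not well-formed. The helper name is_well_formed is our own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<date> <day> 1 </day> 2 </date>"
bad = "<date> <day> 1 </date> 2 </day>"      # </date> closes while <day> is open
repeated = '<date day="1" day="2"/>'        # the same attribute twice in one tag

print(is_well_formed(good))      # True
print(is_well_formed(bad))       # False
print(is_well_formed(repeated))  # False
```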

XML text

XML has only one “basic” type -- text.

It is bounded by tags, e.g., <title> The Big Sleep </title> <year> 1935 </year> --- 1935 is still text.

XML text is called PCDATA (for parsed character data). Its characters are drawn from Unicode, and documents are commonly encoded in UTF-8 or UTF-16.

XML structure

Nesting tags can be used to express various structures, e.g., a tuple (record):

<person>
  <name> Malcolm Atchison </name>
  <tel> (215) 898 4321 </tel>
  <email> mp@dcs.gla.ac.sc </email>
</person>

Terminology

The segment of an XML document between an opening tag and the corresponding closing tag is called an element.

<person>
  <name> Malcolm Atchison </name>
  <tel> (215) 898 4321 </tel>
  <tel> (215) 898 4321 </tel>
  <email> mp@dcs.gla.ac.sc </email>
</person>

The whole <person> ... </person> segment is an element. <name> ... </name> is also an element, and it is a sub-element of person. A segment that does not run from an opening tag to its matching closing tag (e.g., from </name> to <tel>) is not an element.

XML is tree-like

[Figure: a tree with root person and children name, tel, tel, email. Their leaf values are Malcolm Atchison, (215) 898 4321, (215) 898 4321, and mp@dcs.gla.ac.sc.]

A Complete XML Document

<?xml version="1.0"?>
<person>
  <name> Malcolm Atchison </name>
  <tel> (215) 898 4321 </tel>
  <email> mp@dcs.gla.ac.sc </email>
</person>
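Because the document is a tree, a parser returns it as nested element objects. A minimal sketch with Python's standard xml.etree.ElementTree parses the document above. It then walks the children of the root in document order, printing each tag with its text.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<person>
  <name> Malcolm Atchison </name>
  <tel> (215) 898 4321 </tel>
  <email> mp@dcs.gla.ac.sc </email>
</person>"""

root = ET.fromstring(doc)    # the root element of the tree
print(root.tag)              # person
for child in root:           # direct sub-elements, in document order
    print(child.tag, "->", child.text.strip())
```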

Example

[Figure: a semistructured data graph of bars and beers. Arc labels: bar, beer, name, manf, prize, year, award, servedAt, addr. Leaf values: Bud, A.B., M’lob, 1995, Gold, Joe’s, Maple.]

Example

<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> ...
</BARS>

Representing relational DBs: two ways

projects: (title, budget, managedBy)

employees: (name, ssn, age)

Project and Employee relations in XML

Projects and employees are intermixed

<db>
  <project>
    <title> Pattern recognition </title>
    <budget> 10000 </budget>
    <managedBy> Joe </managedBy>
  </project>
  <employee>
    <name> Sandra </name>
    <ssn> 2234 </ssn>
    <age> 35 </age>
  </employee>
  <project>
    <title> Auto guided vehicle </title>
    <budget> 70000 </budget>
    <managedBy> Sandra </managedBy>
  </project>
  <employee>
    <name> Joe </name>
    <ssn> 344556 </ssn>
    <age> 34 </age>
  </employee>
  :
</db>

Project and Employee relations in XML (cont’d)

Employees follow projects

<db>
  <projects>
    <project>
      <title> Pattern recognition </title>
      <budget> 10000 </budget>
      <managedBy> Joe </managedBy>
    </project>
    <project>
      <title> Auto guided vehicles </title>
      <budget> 70000 </budget>
      <managedBy> Sandra </managedBy>
    </project>
    :
  </projects>
  <employees>
    <employee>
      <name> Joe </name>
      <ssn> 344556 </ssn>
      <age> 34 </age>
    </employee>
    <employee>
      <name> Sandra </name>
      <ssn> 2234 </ssn>
      <age> 35 </age>
    </employee>
    :
  </employees>
</db>

Project and Employee relations in XML (cont’d)

Or without “separator” tags …

<db>
  <projects>
    <title> Pattern recognition </title>
    <budget> 10000 </budget>
    <managedBy> Joe </managedBy>
    <title> Auto guided vehicles </title>
    <budget> 70000 </budget>
    <managedBy> Sandra </managedBy>
    :
  </projects>
  <employees>
    <name> Joe </name>
    <ssn> 344556 </ssn>
    <age> 34 </age>
    <name> Sandra </name>
    <ssn> 2234 </ssn>
    <age> 35 </age>
    :
  </employees>
</db>
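The "employees follow projects" layout can be generated mechanically from the two relations. A minimal sketch with Python's standard xml.etree.ElementTree follows. The function name relation_to_xml is hypothetical, and the tuples are the ones from the example. Each relation becomes a wrapper element, each tuple a record element, and each column a sub-element.

```python
import xml.etree.ElementTree as ET

projects = [("Pattern recognition", 10000, "Joe"),
            ("Auto guided vehicles", 70000, "Sandra")]
employees = [("Joe", 344556, 34), ("Sandra", 2234, 35)]


def relation_to_xml(parent, table_tag, row_tag, columns, rows):
    """Append <table_tag> holding one <row_tag> per tuple, one sub-element per column."""
    table = ET.SubElement(parent, table_tag)
    for row in rows:
        record = ET.SubElement(table, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)  # everything is text
    return table


db = ET.Element("db")
relation_to_xml(db, "projects", "project", ["title", "budget", "managedBy"], projects)
relation_to_xml(db, "employees", "employee", ["name", "ssn", "age"], employees)
print(ET.tostring(db, encoding="unicode"))  # the whole document on one line
```

Omitting the row_tag level would produce the third layout, without "separator" tags. That layout loses the grouping of columns into tuples, and nothing in the document records which title belongs with which budget except their order.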

Attributes

An (opening) tag may contain attributes. These are typically used to describe the content of an element.

<entry>
  <word language="en"> cheese </word>
  <word language="fr"> fromage </word>
  <word language="ro"> branza </word>
  <meaning> A food made … </meaning>
</entry>
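Attributes are read separately from element content. In Python's standard xml.etree.ElementTree, Element.get returns an attribute value, and the limited XPath subset accepts an [@attr='value'] filter. The sketch below picks the French word out of the entry above. The meaning text stays elided as in the original.

```python
import xml.etree.ElementTree as ET

entry = ET.fromstring("""<entry>
  <word language="en"> cheese </word>
  <word language="fr"> fromage </word>
  <word language="ro"> branza </word>
  <meaning> A food made ... </meaning>
</entry>""")

french = entry.find("word[@language='fr']").text.strip()   # filter on an attribute
languages = [w.get("language") for w in entry.findall("word")]

print(french)     # fromage
print(languages)  # ['en', 'fr', 'ro']
```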

Attributes (cont’d)

Another common use for attributes is to express dimension or type:

<picture>
  <height dim="cm"> 2400 </height>
  <width dim="in"> 96 </width>
  <data encoding="gif" compression="zip">
    M05-.+C$@02!G96YE&lt;FEC ...
  </data>
</picture>

Using IDs

<family>
  <person id="jane" mother="mary" father="john">
    <name> Jane Doe </name>
  </person>
  <person id="john" children="jane jack">
    <name> John Doe </name>
  </person>
  <person id="mary" children="jane jack">
    <name> Mary Doe </name>
  </person>
  <person id="jack" mother="mary" father="john">
    <name> Jack Doe </name>
  </person>
</family>

An object-oriented schema

class Movie
  ( extent Movies, key title )
{
  attribute string title;
  attribute string director;
  relationship set<Actor> casts
    inverse Actor::acted_In;
  attribute int budget;
};

class Actor
  ( extent Actors, key name )
{
  attribute string name;
  relationship set<Movie> acted_In
    inverse Movie::casts;
  attribute int age;
  attribute set<string> directed;
};

An example

<db>
  <movie id="m1">
    <title>Waking Ned Divine</title>
    <director>Kirk Jones III</director>
    <casts idrefs="a1 a3"></casts>
    <budget>100,000</budget>
  </movie>
  <movie id="m2">
    <title>Dragonheart</title>
    <director>Rob Cohen</director>
    <casts idrefs="a2 a9 a21"></casts>
    <budget>110,000</budget>
  </movie>
  <movie id="m3">
    <title>Moondance</title>
    <director>Dagmar Hirtz</director>
    <casts idrefs="a1 a8"></casts>
    <budget>90,000</budget>
  </movie>
  :
  <actor id="a1">
    <name>David Kelly</name>
    <acted_In idrefs="m1 m3 m78"></acted_In>
  </actor>
  <actor id="a2">
    <name>Sean Connery</name>
    <acted_In idrefs="m2 m9 m11"></acted_In>
    <age>68</age>
  </actor>
  <actor id="a3">
    <name>Ian Bannen</name>
    <acted_In idrefs="m1 m35"></acted_In>
  </actor>
  :
</db>

Part II: Document Type Descriptors (DTD)

Imposing structure on XML documents

Document Type Descriptors

• Document Type Descriptors (DTDs) impose structure on an XML document.

• There is some relationship between a DTD and a schema, but it is not close. There is still a need for additional “typing” systems.

• The DTD is a syntactic specification.

Document Type Definitions (DTD)

• Essentially a grammar describing the legal nesting of tags.
• The intention is that DTDs will be standards for a domain, used by everyone preparing or using data in that domain.
   – Example: a DTD for describing protein structure; a DTD for describing bar menus, etc.

Gross structure of a DTD:
<!DOCTYPE root-tag [
  <!ELEMENT name (components)>
  more elements
]>

Example: An Address Book

<person>
  <name> MacNiel, John </name>
  <greet> Dr. John MacNiel </greet>
  <addr>1234 Huron Street </addr>
  <addr> Rome, OH 98765 </addr>
  <tel> (321) 786 2543 </tel>
  <fax> (321) 786 2543 </fax>
  <tel> (321) 786 2543 </tel>
  <email> jm@abc.com </email>
</person>

Notes: exactly one name; at most one greeting; as many address lines as needed (in order); mixed telephones and faxes; as many email addresses as needed.

Specifying the structure

• name   to specify a name element
• greet?   to specify an optional (0 or 1) greet element
• name, greet?   to specify a name followed by an optional greet

Specifying the structure (cont)

• addr*   to specify 0 or more address lines
• tel | fax   a tel or a fax element
• (tel | fax)*   0 or more repeats of tel or fax
• email*   0 or more email elements

A DTD for the address book

<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person
     (name, greet?, addr*, (fax | tel)*, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>
  <!ELEMENT addr (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>

Two DTDs for the relational DB

<!DOCTYPE db [
  <!ELEMENT db (projects, employees)>
  <!ELEMENT projects (project*)>
  <!ELEMENT employees (employee*)>
  <!ELEMENT project (title, budget, managedBy)>
  <!ELEMENT employee (name, ssn, age)>
  ...
]>

Summary of XML regular expressions

• Each element name is a tag.
• Its components are the tags that appear nested within, in the order specified.
   A         The tag A occurs
   e1, e2    The expression e1 followed by e2
   e*        0 or more occurrences of e
   e?        Optional -- 0 or 1 occurrences
   e+        1 or more occurrences
   e1 | e2   either e1 or e2
   (e)       grouping
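A content model is a regular expression over child tags. Checking one element against its model therefore amounts to matching the sequence of its children's tags. The sketch below hand-translates the address-book model (name, greet?, addr*, (fax | tel)*, email*) into a Python re pattern. In that pattern "," becomes concatenation, while ?, *, + and | keep their meaning, and each tag name is followed by a space so that names cannot run together. It illustrates the idea and is not a DTD validator. Python's built-in parsers do not perform DTD validation. The helper name matches_model is our own.

```python
import re
import xml.etree.ElementTree as ET

# (name, greet?, addr*, (fax | tel)*, email*), with each tag written as "tag ".
PERSON_MODEL = re.compile(r"name (greet )?(addr )*((fax|tel) )*(email )*")


def matches_model(element, model):
    """True if the sequence of the element's child tags matches the content model."""
    tags = "".join(child.tag + " " for child in element)
    return model.fullmatch(tags) is not None


person = ET.fromstring(
    "<person><name>MacNiel, John</name><greet>Dr. John MacNiel</greet>"
    "<addr>1234 Huron Street</addr><addr>Rome, OH 98765</addr>"
    "<tel>(321) 786 2543</tel><fax>(321) 786 2543</fax>"
    "<tel>(321) 786 2543</tel><email>jm@abc.com</email></person>")
no_name = ET.fromstring("<person><greet>Hi</greet></person>")

print(matches_model(person, PERSON_MODEL))   # True
print(matches_model(no_name, PERSON_MODEL))  # False: name is required
```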

Back to the object-oriented schema: the classes Movie (extent Movies, key title) and Actor (extent Actors, key name) defined earlier, linked by the inverse relationships casts and acted_In. The following DTD encodes this schema.

Schema.dtd

<!DOCTYPE db [
  <!ELEMENT db (movie+, actor+)>
  <!ELEMENT movie (title, director, casts, budget)>
  <!ATTLIST movie id ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT director (#PCDATA)>
  <!ELEMENT casts EMPTY>
  <!ATTLIST casts idrefs IDREFS #REQUIRED>
  <!ELEMENT budget (#PCDATA)>

Schema.dtd (cont’d)

  <!ELEMENT actor (name, acted_In, age?, directed*)>
  <!ATTLIST actor id ID #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT acted_In EMPTY>
  <!ATTLIST acted_In idrefs IDREFS #REQUIRED>
  <!ELEMENT age (#PCDATA)>
  <!ELEMENT directed (#PCDATA)>
]>

Elements of a DTD

• An element is a name (its tag) and a parenthesized description of tags within an element.
• Special case: (#PCDATA) after an element name means it is text.

Example
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>

Example of (a): the DTD is included in the document

<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> ...
</BARS>

Example of (b)

Suppose our bars DTD is in file bar.dtd:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> ...
</BARS>

Attribute Lists

• Opening tags can have “arguments” that appear within the tag, in analogy to constructs like <A HREF = ...> in HTML.
• Keyword !ATTLIST introduces a list of attributes and their types for a given element.

Example:
<!ELEMENT BAR (NAME, BEER*)>
<!ATTLIST BAR type (sushi | sports | other) #IMPLIED>

• Bar objects can have a type, and the value of that type is limited to the three strings shown.
• Example of use:
<BAR type="sushi">
 . . .
</BAR>

IDs and IDREFs

• ID stands for identifier. No two attributes of type ID in a document may have the same value, and each ID value must be an XML name (e.g., it cannot start with a digit).

• IDREF stands for identifier reference. Every value associated with an IDREF attribute must exist as an ID attribute value.

• These are pointers from one object to another, analogous to NAME = foo and HREF = #foo in HTML.

• Allows the structure of an XML document to be a general graph, rather than just a tree.

• An attribute of type ID can be used to give the object (the string between opening and closing tags) a unique string identifier.

• An attribute of type IDREF refers to some object by its identifier.

• Also IDREFS, to allow multiple object references within one tag. That is, an IDREFS value specifies one or more identifiers, separated by whitespace.
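The two ID/IDREF rules can be checked, and the pointers followed, with a few lines of Python over xml.etree.ElementTree. A non-validating parser does not know which attributes a DTD declares as ID or IDREFS. The sketch therefore assumes, as in the movie example, that id is the ID attribute and idrefs holds IDREFS. The helper check_ids and its parameter names are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<db>
  <movie id="m1"><title>Waking Ned Divine</title><casts idrefs="a1 a3"/></movie>
  <actor id="a1"><name>David Kelly</name><acted_In idrefs="m1"/></actor>
  <actor id="a3"><name>Ian Bannen</name><acted_In idrefs="m1 m35"/></actor>
</db>""")


def check_ids(root, id_attr="id", refs_attr="idrefs"):
    """Return (ID -> element index, list of references with no matching ID)."""
    index = {}
    for elem in root.iter():
        key = elem.get(id_attr)
        if key is not None:
            if key in index:                      # rule 1: IDs must be unique
                raise ValueError(f"duplicate ID {key!r}")
            index[key] = elem
    dangling = [ref                               # rule 2: every reference must exist
                for elem in root.iter()
                for ref in (elem.get(refs_attr) or "").split()
                if ref not in index]
    return index, dangling


index, dangling = check_ids(doc)
# Follow the casts pointers of movie m1 to the actors' names.
cast = [index[ref].find("name").text
        for ref in index["m1"].find("casts").get("idrefs").split()]
print(cast)      # ['David Kelly', 'Ian Bannen']
print(dangling)  # ['m35']: this fragment omits movie m35
```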

Example

Let us include in our Bars document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object.

<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*, MANF*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT MANF (ADDR)>
  <!ATTLIST MANF name ID #REQUIRED>
  <!ELEMENT ADDR (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ATTLIST BEER manf IDREF #REQUIRED>
  <!ELEMENT PRICE (#PCDATA)>
]>

Connecting the document with its DTD

In line:
<?xml version="1.0"?>
<!DOCTYPE db [<!ELEMENT ...> … ]>
<db> ... </db>

Another file:
<!DOCTYPE db SYSTEM "schema.dtd">

A URL:
<!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">

DTDs vs. Schemas (or Types)

• By database (or programming language) standards, DTDs are rather weak specifications.
   – Only one base type -- PCDATA
   – No useful “abstractions”, e.g., sets
   – IDREFs are untyped. You point to something, but you don’t know what!
   – No constraints, e.g., child is inverse of parent
   – No methods
   – Tag definitions are global

• Some of the XML extensions impose something like a schema or type on an XML document. We’ll see these later.

Lots of possibilities for schemas

• XML Schema (under W3C’s spotlight)
• XDR (Microsoft’s BizTalk)
• SOX (Schema for Object-Oriented XML)
• Schematron
• DSD (AT&T Labs and BRICS)
• and more.

Some tools

• XML Authority
   http://www.extensibility.com/tibco/solutions/xml_authority/index.htm

• XML Spy
   http://www.xmlspy.com/download.html

Summary

• XML is a new data format. Its main virtues are widespread acceptance and the (important) ability to handle semistructured data (data without schema).

• DTDs provide some useful syntactic constraints on documents. As schemas they are weak.

Why a query language? Extracting, restructuring, integration, browsing…

XML-QL: http://www.w3.org/TR/NOTE-xml-ql, http://db.cis.upenn.edu/XML-QL/

XPath (part of a query language): http://www.w3.org/TR/xpath

XSLT: http://www.w3.org/TR/xslt, http://www.mulberrytech.com/quickref/XSLTquickref.pdf

Quilt: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html, http://db.cis.upenn.edu/Kweelt/

XPath

• Reasonably widely adopted -- in XML Schema and query languages.
• Neither more expressive nor less expressive than regular path expressions.
• Primary goal: to give access to selected nodes of a given document.
• XPath main construct: axis navigation.
• An XPath path consists of one or more navigation steps, separated by /.
• A navigation step is a triplet: axis + node-test + list of predicates.
• Examples
   – /descendant::node()/child::author
   – /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
• XPath also offers some shortcuts:
   – no axis means child
   – // means /descendant-or-self::node()/

XPath: child axis navigation

• author is shorthand for child::author. Examples:
   – aaa -- all the child nodes labeled aaa (1, 3)
   – aaa/bbb -- all the bbb grandchildren of aaa children (4)
   – */bbb -- all the bbb grandchildren of any child (4, 6)
   – . -- the context node
   – / -- the root node

[Figure: an example tree below the context node. Its children are numbered 1, 2, 3 (1 and 3 are labeled aaa, 2 is labeled bbb). Its grandchildren are numbered 4, 5, 6, 7 (labeled bbb, aaa, bbb, ccc).]

XPath: child axis navigation (cont)
   – /doc -- all the doc children of the root
   – ./aaa -- all the aaa children of the context node (equivalent to aaa)
   – text() -- all the text children of the context node
   – node() -- all the children of the context node (includes text nodes, but not attribute nodes, which are not children in XPath)
   – .. -- parent of the context node
   – .// -- the context node and all its descendants
   – // -- the root node and all its descendants
   – //para -- all the para nodes in the document
   – //text() -- all the text nodes in the document
   – @font -- the font attribute node of the context node
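Python's standard xml.etree.ElementTree implements a limited subset of XPath, enough to try several of the child-axis paths above. The tree below is an assumed example, built so that it gives the answers listed on the previous slide. Each node's n attribute holds its number. text() and the attribute axis are not part of that subset: use .text and .get("font") instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: children 1-3 of the context node, grandchildren 4-7.
ctx = ET.fromstring(
    '<ctx><aaa n="1"><bbb n="4"/><aaa n="5"/></aaa>'
    '<bbb n="2"><bbb n="6"/><ccc n="7"/></bbb><aaa n="3"/></ctx>')


def ids(nodes):
    """The n attributes of the selected nodes, in document order."""
    return [node.get("n") for node in nodes]


print(ids(ctx.findall("aaa")))      # ['1', '3']       child::aaa
print(ids(ctx.findall("aaa/bbb")))  # ['4']            bbb grandchildren of aaa children
print(ids(ctx.findall("*/bbb")))    # ['4', '6']       bbb grandchildren of any child
print(ids(ctx.findall(".//aaa")))   # ['1', '5', '3']  all aaa descendants
```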

Predicates
   – [2] -- the second child node of the context node
   – chapter[5] -- the fifth chapter child of the context node
   – [last()] -- the last child node of the context node
   – chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction"
   – person[.//firstname = "Joe"] -- the person children of the context node that have in their descendants a firstname element with string-value "Joe"
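ElementTree's XPath subset also accepts positional predicates ([2], [last()]) and child-value predicates ([title='introduction']). Several of the predicates above can therefore be tried directly. The book and people documents are made-up examples. The subset only supports a fixed set of predicate forms. For the descendant test person[.//firstname = "Joe"], this sketch filters in plain Python instead.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book>"
    "<chapter><title>preface</title></chapter>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>results</title></chapter>"
    "</book>")

second = book.find("chapter[2]")                       # chapter[2] (1-based)
last = book.find("chapter[last()]")                    # chapter[last()]
intro = book.findall("chapter[title='introduction']")  # chapter[title="introduction"]
print(second.findtext("title"), last.findtext("title"), len(intro))
# introduction results 1

people = ET.fromstring(
    "<people><person><info><firstname>Joe</firstname></info></person>"
    "<person><firstname>Ann</firstname></person></people>")
# person[.//firstname = "Joe"]: any firstname descendant with text "Joe".
joes = [p for p in people.findall("person")
        if any(f.text == "Joe" for f in p.iter("firstname"))]
print(len(joes))  # 1
```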

Unions of Path Expressions

• employee | consultant -- the union of the employee and consultant nodes that are children of the context node

• For some reason, person/(employee|consultant) is not allowed in XPath 1.0 (XPath 2.0 does allow it).

• However, person/node()[self::employee or self::consultant] is allowed and selects the same nodes.

Axis navigation

• So far, nearly all our expressions have moved us down the tree by moving to child nodes. Exceptions were:
   – . -- stay where you are
   – / -- go to the root
   – // -- all descendants of the root
   – .// -- all descendants of the context node

• All other expressions have been abbreviations for child::…, e.g., child::para. child is an example of an axis.

• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self.
   – Some of these (self, parent) describe single nodes; others describe sequences of nodes.
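ElementTree elements do not store a pointer to their parent, so axes such as parent, ancestor, and following-sibling must be computed. A minimal sketch builds a child-to-parent dictionary once and derives three of the axes from it. The function names and the sample book document are hypothetical. preceding_siblings returns the nearest sibling first.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<book><chapter><title>A</title><para>p1</para><para>p2</para></chapter>"
    "<chapter><title>B</title></chapter></book>")

# child -> parent for every element (elements hash by identity).
parent = {child: elem for elem in root.iter() for child in elem}


def ancestors(node):
    """ancestor axis: parent, grandparent, ... up to the root."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    if node not in parent:
        return []
    siblings = list(parent[node])
    return siblings[siblings.index(node) + 1:]


def preceding_siblings(node):
    """preceding-sibling axis, nearest sibling first."""
    if node not in parent:
        return []
    siblings = list(parent[node])
    return siblings[:siblings.index(node)][::-1]


title = root.find("chapter/title")
print([a.tag for a in ancestors(title)])            # ['chapter', 'book']
print([s.text for s in following_siblings(title)])  # ['p1', 'p2']
```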

XPath Navigation Axes

[Figure: the XPath navigation axes drawn relative to the self node: ancestor, preceding-sibling, following-sibling, preceding, following, child, descendant, attribute, namespace.]

XPath abbreviated syntax

(nothing)   child::
@           attribute::
//          /descendant-or-self::node()/
.           self::node()
.//         self::node()/descendant-or-self::node()/
..          parent::node()
/           (document root)

Examples of XPath queries

• If the Company XML document is stored at the location www.company.com/info.xml, then the first XPath expression can be written as
   doc("www.company.com/info.xml")/company

• Some examples of XPath expressions on XML documents that follow the XML schema file Company are:
   • /company – returns the company root node and all its descendant nodes, that is, the whole XML document.
   • /company/department – returns the department nodes that are children of the company root node.
   • //employee [employeeSalary gt 70000]/employeeName – returns all employeeName nodes that are direct children of an employee node, such that the employee node has another child element employeeSalary whose value is gt 70000.
   • /company/employee [employeeSalary gt 70000]/employeeName – the same, restricted to employee nodes that are children of the company root node.
   • /company/project/projectWorker [hours ge 20.0] – returns the projectWorker nodes that have a child node hours with a value ge 20.0.

XQuery

• XPath allows one to write expressions that select nodes from a tree-structured XML document.
• XQuery permits the specification of more general queries on one or more XML documents.
• The typical form of a query in XQuery is known as a FLWR expression (the final W3C standard calls it FLWOR and adds an optional ORDER BY clause):
   • FOR <variable bindings to individual nodes (elements)>
   • LET <variable bindings to collections of nodes (elements)>
   • WHERE <qualifier conditions>
   • RETURN <query result specification>
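The four clauses can be related to ordinary code with a rough Python analogue over a tiny company document. The element names follow the examples on these slides. The employees and salaries are made up for illustration. In this analogue, FOR becomes a loop and LET a variable bound to a whole collection. WHERE becomes a filter, and RETURN builds one new element per surviving binding. It only mimics the clauses and is not an XQuery implementation.

```python
import xml.etree.ElementTree as ET

company = ET.fromstring("""<company>
  <employee><employeeName><firstName>Ann</firstName><lastName>Lee</lastName></employeeName>
    <employeeSalary>80000</employeeSalary></employee>
  <employee><employeeName><firstName>Bob</firstName><lastName>Ray</lastName></employeeName>
    <employeeSalary>60000</employeeSalary></employee>
</company>""")

results = []
for x in company.findall("employee"):                # FOR $x IN .../company/employee
    names = x.findall("employeeName/*")               # LET $names := $x/employeeName/*
    if int(x.findtext("employeeSalary")) > 70000:     # WHERE $x/employeeSalary gt 70000
        res = ET.Element("res")                       # RETURN <res>{ $names }</res>
        res.extend(names)
        results.append(res)

for res in results:
    print(ET.tostring(res, encoding="unicode"))
# <res><firstName>Ann</firstName><lastName>Lee</lastName></res>
```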

Examples for XQuery queries

• for $x in
      doc("www.company.com/info.xml")//employee[employeeSalary gt 70000]/employeeName
  return <res> {$x/firstName, $x/lastName} </res>

• for $x in doc("www.company.com/info.xml")/company/employee
  where $x/employeeSalary gt 70000
  return <res> {$x/employeeName/firstName, $x/employeeName/lastName} </res>

• for $x in doc("www.company.com/info.xml")/company/project[projectNumber = 5]/projectWorker,
      $y in doc("www.company.com/info.xml")/company/employee
  where $x/hours gt 20.0 and $y/ssn = $x/ssn
  return <res> {$y/employeeName/firstName, $y/employeeName/lastName, $x/hours} </res>

XQuery

Emerging standard for querying XML documents.

Basic form:
FOR <variables ranging over sets of elements>
WHERE <condition>
RETURN <set of elements>

• Sets of elements are described by paths, consisting of:
1. A URL, if necessary.
2. Element names forming a path in the semistructured data graph, e.g., //BAR/NAME = “start at any BAR node and go to a NAME child.”
3. An ending condition of the form [<condition about subelements, @attributes, and values>]

Example

The file http://www.cse.ucsc.edu/bars.xml:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR type="sports"><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR type="sushi"><NAME>Homma's</NAME>
    <BEER><NAME>Sapporo</NAME><PRICE>4.00</PRICE></BEER>
  </BAR> ...
</BARS>

XQuery Query

• Query: Find the prices charged for Bud by sports bars that serve Miller.

for $ba in doc("http://www.cse.ucsc.edu/bars.xml")//BAR[@type = "sports"],
    $be in $ba/BEER[NAME = "Bud"]
where $ba/BEER[NAME = "Miller"]
return $be/PRICE
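To see what the for/where/return clauses compute, the same query can be evaluated by hand in Python over the bars document. The document is inlined here without its DOCTYPE, and the elided bars are omitted. This is a sketch of the query's meaning using ElementTree's XPath subset, not an XQuery engine.

```python
import xml.etree.ElementTree as ET

bars = ET.fromstring("""<BARS>
  <BAR type="sports"><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR type="sushi"><NAME>Homma's</NAME>
    <BEER><NAME>Sapporo</NAME><PRICE>4.00</PRICE></BEER>
  </BAR>
</BARS>""")

prices = []
for ba in bars.findall(".//BAR[@type='sports']"):    # $ba in //BAR[@type = "sports"]
    if ba.find("BEER[NAME='Miller']") is None:        # where $ba/BEER[NAME = "Miller"]
        continue
    for be in ba.findall("BEER[NAME='Bud']"):         # $be in $ba/BEER[NAME = "Bud"]
        prices.append(be.findtext("PRICE"))           # return $be/PRICE
print(prices)  # ['2.50']
```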

Conclusions

• XML is a data format for which there is an increasing number of useful tools for
   – constructing schemas
   – programming
   – querying

• Although it is likely that a query language will soon emerge as a standard, there is less agreement or understanding on how to store XML data efficiently.

• Many other database issues remain to be solved before XML is useful for manipulating large amounts of data.
