XML and Internet Databases 1

Yazici XML Ex


Page 1: Yazici XML Ex

XML and Internet Databases

Page 2: Yazici XML Ex

Outline

• Background: documents (SGML/HTML) and databases (structured and semistructured data)
• XML Basics and Document Type Descriptors
• XML query languages: XPath, XQuery

Page 3: Yazici XML Ex

Part I: Background

What’s the difference between the world of documents and information retrieval and the world of databases and query interfaces?

Page 4: Yazici XML Ex

Documents vs Databases

Document world
> plenty of small documents
> usually static
> implicit structure: section, paragraph, toc
> tagging
> human friendly
> content: form/layout, annotation
> paradigms: “Save as”
> meta-data: author name, date, subject

Database world
> a few large databases
> usually dynamic
> explicit structure (schema)
> records
> machine friendly
> content: schema, data, methods
> paradigms: Atomicity, Consistency, Isolation, Durability
> meta-data: schema description

Page 5: Yazici XML Ex

What to do with them

Documents
• editing
• printing
• spell-checking
• counting words
• retrieving (IR)
• searching

Database
• updating
• cleaning
• querying
• composing/transforming

Page 6: Yazici XML Ex

HTML
• Lingua franca for publishing hypertext on the World Wide Web.
• HTML is widely used for formatting and structuring Web documents.
• Designed to describe how a Web browser should arrange text, images and push-buttons on a page.
• Easy to learn, but does not convey structure and meaning of data in the Web pages.
• Fixed tag set.

  <HTML>
  <HEAD><TITLE>Welcome to the XML course</TITLE></HEAD>
  <BODY>
  <H1>Introduction</H1>
  <IMG SRC="dragon.jpeg" WIDTH="200" HEIGHT="150">
  </BODY>
  </HTML>

Parts of the example:
• Opening tag: <HTML>
• Closing tag: </HTML>
• Text (PCDATA): Welcome to the XML course
• “Bachelor” tag (no closing tag): <IMG ...>
• Attribute name: SRC
• Attribute value: "dragon.jpeg"

Page 7: Yazici XML Ex

Semistructured data

1. Information integration: important new application that motivates what follows.
2. Semistructured data: a new data model designed to cope with problems of information integration.
3. XML: a new Web standard that is essentially semistructured data.
4. XQUERY: an emerging standard query language for XML data.

Page 8: Yazici XML Ex

Information Integration

Problem: related data exists in many places. The sources talk about the same things, but differ in model, schema, and conventions (e.g., terminology).

Example: In the real world, every bar has its own database.
• Some may have relations like beer-price; others have a Microsoft Word file from which the menu is printed.
• Some keep phones of manufacturers but not addresses.
• Some distinguish beers and ales; others do not.

Page 9: Yazici XML Ex

Two approaches

1. Warehousing: Make copies of information at each data source centrally.
   – Reconstruct data daily/weekly/monthly, but do not try to keep it up-to-date.
2. Mediation: Create a view of all information, but do not make copies.
   – Answer queries by sending appropriate queries to sources.

Page 10: Yazici XML Ex

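Warehousing

[Figure: warehousing architecture. The user sends a query to the warehouse and gets the result back from it. The warehouse is filled by a combiner, which collects data through one wrapper per source (DB1, DB2).]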

Page 11: Yazici XML Ex

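Mediation

[Figure: mediation architecture. The user sends a query to the mediator and gets the result back from it. The mediator sends queries to one wrapper per source and collects their results; each wrapper in turn queries its own source (DB1, DB2) and receives the result.]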
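The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two in-memory "sources", their contents, and the names `wrapper1`, `wrapper2`, `Warehouse`, and `Mediator` are all hypothetical and do not come from the slides.

```python
# Two toy "sources" with different schemas (hypothetical data).
DB1 = [{"beer": "Bud", "price": 2.50}]      # relation-like rows
DB2 = [("Miller", "3.00")]                  # (name, price-as-text) tuples


def wrapper1():
    """Translate DB1 rows into the common (name, price) form."""
    return [(row["beer"], float(row["price"])) for row in DB1]


def wrapper2():
    """Translate DB2 tuples into the common (name, price) form."""
    return [(name, float(price)) for name, price in DB2]


class Warehouse:
    """Keeps a central copy; it is only as fresh as the last rebuild."""

    def __init__(self):
        self.copy = []

    def rebuild(self):
        # The "combiner": merge the wrapped sources into one stored copy.
        self.copy = wrapper1() + wrapper2()

    def query(self, max_price):
        return sorted(n for n, p in self.copy if p <= max_price)


class Mediator:
    """Stores nothing; every query is forwarded to the sources."""

    def query(self, max_price):
        rows = wrapper1() + wrapper2()
        return sorted(n for n, p in rows if p <= max_price)


warehouse = Warehouse()
warehouse.rebuild()
mediator = Mediator()

DB2.append(("Sapporo", "2.00"))   # a source changes after the rebuild

stale = warehouse.query(3.00)     # answered from the stored copy
fresh = mediator.query(3.00)      # answered from the live sources
```

The warehouse answers from its copy and misses the new row until the next rebuild, while the mediator pays the cost of contacting every source on each query.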

Page 12: Yazici XML Ex

Semistructured Data

• A different kind of data model, more suited to information-integration applications than either relational or OO.
  – Think of “objects,” but with the type of an object being its own business rather than the business of the class to which it belongs.
  – Allows information from several sources, with related but different properties, to be fit together in one whole.
• Major application: XML documents.

Page 13: Yazici XML Ex

Graph Representation of Semistructured Data

• Nodes = objects.
• Nodes connected in a general rooted graph structure.
• Labels on arcs.
• Atomic values on leaf nodes.
• Big deal: no restriction on labels (roughly = attributes).
  – Zero, one, or many children of a given label type are all OK.
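A minimal Python sketch of this graph model, assuming a node is either an atomic value or a list of labeled arcs. The encoding and the helper name `children` are illustrative assumptions; the sample values are borrowed from the bar/beer example used later in these slides.

```python
# Each node is either an atomic value (a leaf) or a list of (label, child) arcs.
bud = [("name", "Bud"), ("manf", "A.B."),
       ("prize", [("year", 1995), ("award", "Gold")])]
joes = [("name", "Joe's"), ("addr", "Maple")]
root = [("beer", bud), ("bar", joes)]


def children(node, label):
    """Return the children reached from `node` by arcs carrying `label`."""
    if not isinstance(node, list):      # atomic leaf: no outgoing arcs
        return []
    return [child for arc_label, child in node if arc_label == label]


# One child with the requested label ...
beer_names = [n for b in children(root, "beer") for n in children(b, "name")]
# ... and zero children with a label that this bar simply does not have.
bar_phones = [p for b in children(root, "bar") for p in children(b, "phone")]
```

Nothing in the structure declares which labels a node may carry, which is the "no restriction on labels" point: a missing `phone` arc is not an error, it just yields an empty result.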

Page 14: Yazici XML Ex

XML (Extensible Markup Language)

HTML uses tags for formatting (e.g., “italic”). XML uses tags for semantics (e.g., “this is an address”).
• Two modes:
1. Well-formed XML: A document that obeys the “nested tags” rule and does not repeat an attribute within a tag is said to be well-formed. It allows you to invent your own tags, much like labels in semistructured data.
2. Valid XML involves a DTD (Document Type Definition) that tells the labels and gives a grammar for how they may be nested.

Page 15: Yazici XML Ex

Well-Formed XML

1. Declaration = <? ... ?>.
   – Normal declaration is
     <?xml version="1.0" standalone="yes"?>
   – “Standalone” means that there is no DTD specified.
2. Root tag surrounds the entire balance of the document.
   – <FOO> is balanced by </FOO>, as in HTML.
3. Any balanced structure of tags OK.
   – Option of tags that don’t require balance, like <P> in HTML; in XML these are written as empty-element tags such as <FOO/>.

Page 16: Yazici XML Ex

The Structure of XML

• XML consists of tags and text
• Tags come in pairs <date> ... </date>
• They must be properly nested
  <date> <day> ... </day> ... </date> --- good
  <date> <day> ... </date> ... </day> --- bad
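The nesting rule can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` parser, which raises `ParseError` on input that is not well-formed; the helper name `is_well_formed` and the sample text inside the tags are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<date> <day> 14 </day> July </date>"
bad = "<date> <day> 14 </date> July </day>"

good_ok = is_well_formed(good)   # tags nest properly
bad_ok = is_well_formed(bad)     # </date> appears before </day> is closed
```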

Page 17: Yazici XML Ex

XML text

XML has only one “basic” type -- text.

It is bounded by tags, e.g.
  <title> The Big Sleep </title>
  <year> 1935 </year> --- 1935 is still text

XML text is called PCDATA (for parsed character data). It uses Unicode; every XML parser must accept the UTF-8 and UTF-16 encodings.

Page 18: Yazici XML Ex

XML structure

Nesting tags can be used to express various structures. E.g., a tuple (record):

  <person>
    <name> Malcolm Atchison </name>
    <tel> (215) 898 4321 </tel>
    <email> [email protected] </email>
  </person>

Page 19: Yazici XML Ex

Terminology

The segment of an XML document between an opening and a corresponding closing tag is called an element.

  <person>
    <name> Malcolm Atchison </name>
    <tel> (215) 898 4321 </tel>
    <email> [email protected] </email>
  </person>

• The whole <person> ... </person> segment is an element.
• <tel> (215) 898 4321 </tel> is an element, a sub-element of <person>.
• A span that does not run from an opening tag to its matching closing tag is not an element.

Page 20: Yazici XML Ex

XML is tree-like

[Figure: the person document drawn as a tree. The root node person has the children name, tel, and email; their leaves hold the text Malcolm Atchison, (215) 898 4321, and [email protected].]

Page 21: Yazici XML Ex

A Complete XML Document

  <?xml version="1.0"?>
  <person>
    <name> Malcolm Atchison </name>
    <tel> (215) 898 4321 </tel>
    <email> [email protected] </email>
  </person>
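To see the tree view concretely, the document can be parsed and walked with Python's standard `xml.etree.ElementTree`. This is a sketch: the e-mail address below is a placeholder (the one on the slide is obscured), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<person>
  <name> Malcolm Atchison </name>
  <tel> (215) 898 4321 </tel>
  <email> someone@example.org </email>
</person>"""

root = ET.fromstring(doc)           # the <person> element is the tree's root

# Iterating over an element yields its child elements in document order.
child_tags = [child.tag for child in root]

# The leaves carry text; there is no other basic type.
leaf_text = {child.tag: child.text.strip() for child in root}
```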

Page 22: Yazici XML Ex

Example

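[Figure: semistructured-data graph for the bar/beer example. Arcs labeled bar and beer lead from the root to the bar and beer nodes. Other arc labels: name, manf, prize, year, award, servedAt, addr. Leaf values: Bud, A.B., M’lob, 1995, Gold, Joe’s, Maple.]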

Page 23: Yazici XML Ex

Example

  <?xml version="1.0" standalone="yes"?>
  <BARS>
    <BAR><NAME>Joe's Bar</NAME>
      <BEER><NAME>Bud</NAME>
        <PRICE>2.50</PRICE></BEER>
      <BEER><NAME>Miller</NAME>
        <PRICE>3.00</PRICE></BEER>
    </BAR>
    <BAR> ...
  </BARS>

Page 24: Yazici XML Ex

Representing relational DBs: Two ways

projects:  title | budget | managedBy
employees: name | ssn | age

Page 25: Yazici XML Ex

Project and Employee relations in XML

Projects and employees are intermixed

  <db>
    <project>
      <title> Pattern recognition </title>
      <budget> 10000 </budget>
      <managedBy> Joe </managedBy>
    </project>
    <employee>
      <name> Joe </name>
      <ssn> 344556 </ssn>
      <age> 34 </age>
    </employee>
    <employee>
      <name> Sandra </name>
      <ssn> 2234 </ssn>
      <age> 35 </age>
    </employee>
    <project>
      <title> Auto guided vehicle </title>
      <budget> 70000 </budget>
      <managedBy> Sandra </managedBy>
    </project>
    :
  </db>

Page 26: Yazici XML Ex

Project and Employee relations in XML (cont’d)

Employees follow projects

  <db>
    <projects>
      <project>
        <title> Pattern recognition </title>
        <budget> 10000 </budget>
        <managedBy> Joe </managedBy>
      </project>
      <project>
        <title> Auto guided vehicles </title>
        <budget> 70000 </budget>
        <managedBy> Sandra </managedBy>
      </project>
      :
    </projects>
    <employees>
      <employee>
        <name> Joe </name>
        <ssn> 344556 </ssn>
        <age> 34 </age>
      </employee>
      <employee>
        <name> Sandra </name>
        <ssn> 2234 </ssn>
        <age> 35 </age>
      </employee>
      :
    </employees>
  </db>

Page 27: Yazici XML Ex

Project and Employee relations in XML (cont’d)

Or without “separator” tags …

  <db>
    <projects>
      <title> Pattern recognition </title>
      <budget> 10000 </budget>
      <managedBy> Joe </managedBy>
      <title> Auto guided vehicles </title>
      <budget> 70000 </budget>
      <managedBy> Sandra </managedBy>
      :
    </projects>
    <employees>
      <name> Joe </name>
      <ssn> 344556 </ssn>
      <age> 34 </age>
      <name> Sandra </name>
      <ssn> 2234 </ssn>
      <age> 35 </age>
      :
    </employees>
  </db>
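The layouts above differ only in how tuples are grouped, so one set of relations can be serialized either way. The following Python sketch builds the intermixed layout and the layout with `<projects>`/`<employees>` containers using the standard `xml.etree.ElementTree`; the function names `add_tuple`, `grouped`, and `intermixed` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

projects = [
    {"title": "Pattern recognition", "budget": "10000", "managedBy": "Joe"},
    {"title": "Auto guided vehicles", "budget": "70000", "managedBy": "Sandra"},
]
employees = [
    {"name": "Joe", "ssn": "344556", "age": "34"},
    {"name": "Sandra", "ssn": "2234", "age": "35"},
]


def add_tuple(parent, tag, row):
    """Append one tuple as <tag> with one child element per column."""
    elem = ET.SubElement(parent, tag)
    for column, value in row.items():
        ET.SubElement(elem, column).text = value
    return elem


def grouped(projects, employees):
    """Layout with <projects> and <employees> container elements."""
    db = ET.Element("db")
    ps = ET.SubElement(db, "projects")
    for row in projects:
        add_tuple(ps, "project", row)
    es = ET.SubElement(db, "employees")
    for row in employees:
        add_tuple(es, "employee", row)
    return db


def intermixed(projects, employees):
    """Layout with <project> and <employee> directly under <db>."""
    db = ET.Element("db")
    for row in projects:
        add_tuple(db, "project", row)
    for row in employees:
        add_tuple(db, "employee", row)
    return db


g = grouped(projects, employees)
m = intermixed(projects, employees)
grouped_top = [child.tag for child in g]
mixed_top = [child.tag for child in m]
managers = [e.text for e in g.findall("projects/project/managedBy")]
```

Calling `ET.tostring(g, encoding="unicode")` would give the serialized text. The variant without separator tags is deliberately left out: once the `<project>` and `<employee>` wrappers are dropped, tuple boundaries can only be recovered by counting columns.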

Page 28: Yazici XML Ex

Attributes

An (opening) tag may contain attributes. These are typically used to describe the content of an element.

  <entry>
    <word language="en"> cheese </word>
    <word language="fr"> fromage </word>
    <word language="ro"> branza </word>
    <meaning> A food made … </meaning>
  </entry>

Page 29: Yazici XML Ex

Attributes (cont’d)

Another common use for attributes is to express dimension or type.

  <picture>
    <height dim="cm"> 2400 </height>
    <width dim="in"> 96 </width>
    <data encoding="gif" compression="zip">
      M05-.+C$@02!G96YE<FEC ...
    </data>
  </picture>

Page 30: Yazici XML Ex

Using IDs

  <family>
    <person id="jane" mother="mary" father="john">
      <name> Jane Doe </name>
    </person>
    <person id="john" children="jane jack">
      <name> John Doe </name>
    </person>
    <person id="mary" children="jane jack">
      <name> Mary Doe </name>
    </person>
    <person id="jack" mother="mary" father="john">
      <name> Jack Doe </name>
    </person>
  </family>
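The id values act as pointers that an application has to follow itself. A minimal Python sketch, assuming the family document above and using only the standard `xml.etree.ElementTree`; the index `by_id` and the helper `name_of` are illustrative names, not part of any XML API.

```python
import xml.etree.ElementTree as ET

family = ET.fromstring("""
<family>
  <person id="jane" mother="mary" father="john"><name> Jane Doe </name></person>
  <person id="john" children="jane jack"><name> John Doe </name></person>
  <person id="mary" children="jane jack"><name> Mary Doe </name></person>
  <person id="jack" mother="mary" father="john"><name> Jack Doe </name></person>
</family>
""")

# Index every person by its id so that references can be followed.
by_id = {p.get("id"): p for p in family.findall("person")}


def name_of(person_id):
    """Follow one reference and return the referenced person's name."""
    return by_id[person_id].findtext("name").strip()


# A single reference (mother) and a space-separated list of them (children).
janes_mother = name_of(by_id["jane"].get("mother"))
johns_children = [name_of(i) for i in by_id["john"].get("children").split()]
```

With these references the document describes a graph: jane points to mary and john, who point back to jane through `children`.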

Page 31: Yazici XML Ex

An object-oriented schema

  class Movie
  ( extent Movies, key title )
  {
    attribute string title;
    attribute string director;
    relationship set<Actor> casts
      inverse Actor::acted_In;
    attribute int budget;
  };

  class Actor
  ( extent Actors, key name )
  {
    attribute string name;
    relationship set<Movie> acted_In
      inverse Movie::casts;
    attribute int age;
    attribute set<string> directed;
  };

Page 32: Yazici XML Ex

An example

  <db>
    <movie id="m1">
      <title>Waking Ned Divine</title>
      <director>Kirk Jones III</director>
      <cast idrefs="a1 a3"></cast>
      <budget>100,000</budget>
    </movie>
    <movie id="m2">
      <title>Dragonheart</title>
      <director>Rob Cohen</director>
      <cast idrefs="a2 a9 a21"></cast>
      <budget>110,000</budget>
    </movie>
    <movie id="m3">
      <title>Moondance</title>
      <director>Dagmar Hirtz</director>
      <cast idrefs="a1 a8"></cast>
      <budget>90,000</budget>
    </movie>
    :
    <actor id="a1">
      <name>David Kelly</name>
      <acted_In idrefs="m1 m3 m78"></acted_In>
    </actor>
    <actor id="a2">
      <name>Sean Connery</name>
      <acted_In idrefs="m2 m9 m11"></acted_In>
      <age>68</age>
    </actor>
    <actor id="a3">
      <name>Ian Bannen</name>
      <acted_In idrefs="m1 m35"></acted_In>
    </actor>
    :
  </db>

Page 33: Yazici XML Ex

Part II: Document Type Descriptors (DTD)

Imposing structure on XML documents

Page 34: Yazici XML Ex

Document Type Descriptors

• Document Type Descriptors (DTDs) impose structure on an XML document.
• There is some relationship between a DTD and a schema, but it is not close – there is still a need for additional “typing” systems.
• The DTD is a syntactic specification.

Page 35: Yazici XML Ex

Document Type Definitions (DTD)

Essentially a grammar describing the legal nesting of tags.
• Intention is that DTDs will be standards for a domain, used by everyone preparing or using data in that domain.
  – Example: a DTD for describing protein structure; a DTD for describing bar menus, etc.

Gross Structure of a DTD:
  <!DOCTYPE root-tag [
    <!ELEMENT name (components)>
    more elements
  ]>

Page 36: Yazici XML Ex

Example: An Address Book

  <person>
    <name> MacNiel, John </name>            (exactly one name)
    <greet> Dr. John MacNiel </greet>       (at most one greeting)
    <addr> 1234 Huron Street </addr>        (as many address lines as needed, in order)
    <addr> Rome, OH 98765 </addr>
    <tel> (321) 786 2543 </tel>             (mixed telephones and faxes)
    <fax> (321) 786 2543 </fax>
    <tel> (321) 786 2543 </tel>
    <email> [email protected] </email>      (as many as needed)
  </person>

Page 37: Yazici XML Ex

Specifying the structure

• name -- to specify a name element
• greet? -- to specify an optional (0 or 1) greet element
• name, greet? -- to specify a name followed by an optional greet

Page 38: Yazici XML Ex

Specifying the structure (cont)

• addr* -- to specify 0 or more address lines
• tel | fax -- a tel or a fax element
• (tel | fax)* -- 0 or more repeats of tel or fax
• email* -- 0 or more email elements

Page 39: Yazici XML Ex

A DTD for the address book

  <!DOCTYPE addressbook [
    <!ELEMENT addressbook (person*)>
    <!ELEMENT person
      (name, greet?, addr*, (fax | tel)*, email*)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT greet (#PCDATA)>
    <!ELEMENT addr (#PCDATA)>
    <!ELEMENT tel (#PCDATA)>
    <!ELEMENT fax (#PCDATA)>
    <!ELEMENT email (#PCDATA)>
  ]>

Page 40: Yazici XML Ex

Two DTDs for the relational DB

  <!DOCTYPE db [
    <!ELEMENT db (projects, employees)>
    <!ELEMENT projects (project*)>
    <!ELEMENT employees (employee*)>
    <!ELEMENT project (title, budget, managedBy)>
    <!ELEMENT employee (name, ssn, age)>
    ...
  ]>

Page 41: Yazici XML Ex

Summary of XML regular expressions

• Each element name is a tag.
• Its components are the tags that appear nested within, in the order specified.
• A -- the tag A occurs
• e1,e2 -- the expression e1 followed by e2
• e* -- 0 or more occurrences of e
• e? -- optional: 0 or 1 occurrences
• e+ -- 1 or more occurrences
• e1 | e2 -- either e1 or e2
• (e) -- grouping
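Because a content model is a regular expression over child tags, it can be imitated with an ordinary regex engine. The Python sketch below hand-translates the `person` content model of the address book, using the `addr` tag of the example document, into a pattern over the space-terminated sequence of child tag names. It is an illustration of the idea rather than a DTD validator, and the names `PERSON_MODEL` and `matches_person_model` are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# (name, greet?, addr*, (tel | fax)*, email*) rewritten as a regex over
# a string such as "name addr addr tel email " (one tag name per child).
PERSON_MODEL = re.compile(r"(name )(greet )?(addr )*(tel |fax )*(email )*")


def matches_person_model(person):
    """Check the order and multiplicity of the children of one <person>."""
    child_sequence = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_sequence) is not None


ok = ET.fromstring(
    "<person><name>n</name><addr>a1</addr><addr>a2</addr>"
    "<tel>t</tel><fax>f</fax><tel>t</tel><email>e</email></person>")
two_greets = ET.fromstring(
    "<person><name>n</name><greet>g</greet><greet>g</greet></person>")
wrong_order = ET.fromstring(
    "<person><addr>a</addr><name>n</name></person>")

results = [matches_person_model(p) for p in (ok, two_greets, wrong_order)]
```

The second document fails because `greet?` allows at most one greeting, and the third because `name` must come first.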

Page 42: Yazici XML Ex

Back to the object-oriented schema

  class Movie
  ( extent Movies, key title )
  {
    attribute string title;
    attribute string director;
    relationship set<Actor> casts
      inverse Actor::acted_In;
    attribute int budget;
  };

  class Actor
  ( extent Actors, key name )
  {
    attribute string name;
    relationship set<Movie> acted_In
      inverse Movie::casts;
    attribute int age;
    attribute set<string> directed;
  };

Page 43: Yazici XML Ex

Schema.dtd

  <!DOCTYPE db [
    <!ELEMENT db (movie+, actor+)>
    <!ELEMENT movie (title, director, casts, budget)>
      <!ATTLIST movie id ID #REQUIRED>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT director (#PCDATA)>
    <!ELEMENT casts EMPTY>
      <!ATTLIST casts idrefs IDREFS #REQUIRED>
    <!ELEMENT budget (#PCDATA)>

Page 44: Yazici XML Ex

Schema.dtd (cont’d)

    <!ELEMENT actor (name, acted_In, age?, directed*)>
      <!ATTLIST actor id ID #REQUIRED>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT acted_In EMPTY>
      <!ATTLIST acted_In idrefs IDREFS #REQUIRED>
    <!ELEMENT age (#PCDATA)>
    <!ELEMENT directed (#PCDATA)>
  ]>

Page 45: Yazici XML Ex

Elements of a DTD

An element is a name (its tag) and a parenthesized description of the tags within the element.
• Special case: (#PCDATA) after an element name means it is text.

Example
  <!DOCTYPE BARS [
    <!ELEMENT BARS (BAR*)>
    <!ELEMENT BAR (NAME, BEER+)>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT BEER (NAME, PRICE)>
    <!ELEMENT PRICE (#PCDATA)>
  ]>

Page 46: Yazici XML Ex

Example of (a): the DTD is included in the document

  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE BARS [
    <!ELEMENT BARS (BAR*)>
    <!ELEMENT BAR (NAME, BEER+)>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT BEER (NAME, PRICE)>
    <!ELEMENT PRICE (#PCDATA)>
  ]>
  <BARS>
    <BAR><NAME>Joe's Bar</NAME>
      <BEER><NAME>Bud</NAME>
        <PRICE>2.50</PRICE></BEER>
      <BEER><NAME>Miller</NAME>
        <PRICE>3.00</PRICE></BEER>
    </BAR>
    <BAR> ...
  </BARS>

Page 47: Yazici XML Ex

Example of (b): the DTD is in a separate file

Suppose our bars DTD is in file bar.dtd:

  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE BARS SYSTEM "bar.dtd">
  <BARS>
    <BAR><NAME>Joe's Bar</NAME>
      <BEER><NAME>Bud</NAME>
        <PRICE>2.50</PRICE></BEER>
      <BEER><NAME>Miller</NAME>
        <PRICE>3.00</PRICE></BEER>
    </BAR>
    <BAR> ...
  </BARS>

Page 48: Yazici XML Ex

Attribute Lists

• Opening tags can have “arguments” that appear within the tag, in analogy to constructs like <A HREF = ...> in HTML.
• Keyword !ATTLIST introduces a list of attributes and their types for a given element.

Example:
  <!ELEMENT BAR (NAME, BEER*)>
  <!ATTLIST BAR type (sushi | sports | other) #IMPLIED>

• Bar objects can have a type, and the value of that type is limited to the three strings shown.
• Example of use:
  <BAR type = "sushi">
    . . .
  </BAR>

Page 49: Yazici XML Ex

ID’s and IDREF’s

• ID stands for identifier. No two ID attributes in a document may have the same value, and the value must be an XML name.
• IDREF stands for identifier reference. Every value associated with an IDREF attribute must exist as an ID attribute value.
• These are pointers from one object to another, analogous to NAME = foo and HREF = #foo in HTML.
• Allows the structure of an XML document to be a general graph, rather than just a tree.
• An attribute of type ID can be used to give the object (string between opening and closing tags) a unique string identifier.
• An attribute of type IDREF refers to some object by its identifier.
• Also IDREFS to allow multiple object references within one tag. That is, IDREFS specifies several (1 or more) identifiers.

Page 50: Yazici XML Ex

Example

Let us include in our Bars document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object.

  <!DOCTYPE BARS [
    <!ELEMENT BARS (BAR*)>
    <!ELEMENT BAR (NAME, BEER+)>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT MANF (ADDR)>
      <!ATTLIST MANF name ID #REQUIRED>
    <!ELEMENT ADDR (#PCDATA)>
    <!ELEMENT BEER (NAME, PRICE)>
      <!ATTLIST BEER manf IDREF #REQUIRED>
    <!ELEMENT PRICE (#PCDATA)>
  ]>

Page 51: Yazici XML Ex

Connecting the document with its DTD

In line:
  <?xml version="1.0"?>
  <!DOCTYPE db [<!ELEMENT ...> … ]>
  <db> ... </db>

Another file:
  <!DOCTYPE db SYSTEM "schema.dtd">

A URL:
  <!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">

Page 52: Yazici XML Ex

DTDs vs. Schemas (or Types)

• By database (or programming language) standards DTDs are rather weak specifications.
  – Only one base type -- PCDATA
  – No useful “abstractions”, e.g., sets
  – IDREFs are untyped. You point to something, but you don’t know what!
  – No constraints, e.g., child is inverse of parent
  – No methods
  – Tag definitions are global
• Some of the XML extensions impose something like a schema or type on an XML document. We’ll see these later.

Page 53: Yazici XML Ex

Lots of possibilities for schemas

• XML Schema (under W3C’s spotlight)
• XDR (Microsoft’s BizTalk)
• SOX (Schema for Object-Oriented XML)
• Schematron
• DSD (AT&T Labs and BRICS)
• and more.

Page 54: Yazici XML Ex

Some tools

• XML Authority
  http://www.extensibility.com/tibco/solutions/xml_authority/index.htm
• XML Spy
  http://www.xmlspy.com/download.html

Page 55: Yazici XML Ex

Summary

• XML is a new data format. Its main virtues are widespread acceptance and the (important) ability to handle semistructured data (data without schema).
• DTDs provide some useful syntactic constraints on documents. As schemas they are weak.

Page 56: Yazici XML Ex

Why a query language? Extracting, Restructuring, Integration, Browsing…

XML-QL
  http://www.w3.org/TR/NOTE-xml-ql
  http://db.cis.upenn.edu/XML-QL/
XPATH (part of a query language)
  http://www.w3.org/TR/xpath
XSLT
  http://www.w3.org/TR/xslt
  http://www.mulberrytech.com/quickref/XSLTquickref.pdf
QUILT
  http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
  http://db.cis.upenn.edu/Kweelt/

Page 57: Yazici XML Ex

XPath

• Reasonably widely adopted -- in XML-Schema and query languages.
• Neither more expressive nor less expressive than regular path expressions
• Primary goal = to permit access to some nodes from a given document
• XPath main construct: axis navigation
• An XPath path consists of one or more navigation steps, separated by /
• A navigation step is a triplet: axis + node-test + list of predicates
• Examples
  – /descendant::node()/child::author
  – /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
• XPath also offers some shortcuts
  – no axis means child
  – // means /descendant-or-self::node()/

Page 58: Yazici XML Ex

XPath - child axis navigation

• author is shorthand for child::author. Examples:
  – aaa -- all the child nodes labeled aaa (1,3)
  – aaa/bbb -- all the bbb grandchildren of aaa children (4)
  – */bbb -- all the bbb grandchildren of any child (4,6)
  – . -- the context node
  – / -- the root node

[Figure: tree below the context node. Its children are numbered 1 to 3 and its grandchildren 4 to 7, with labels drawn from aaa, bbb, and ccc; nodes 1 and 3 are labeled aaa, nodes 4 and 6 are labeled bbb.]

Page 59: Yazici XML Ex

XPath - child axis navigation (cont)

  – /doc -- all the doc children of the root
  – ./aaa -- all the aaa children of the context node (equivalent to aaa)
  – text() -- all the text children of the context node
  – node() -- all the children of the context node (includes text nodes, but not attribute nodes, which are reached through the attribute axis)
  – .. -- parent of the context node
  – .// -- the context node and all its descendants
  – // -- the root node and all its descendants
  – //para -- all the para nodes in the document
  – //text() -- all the text nodes in the document
  – @font -- the font attribute node of the context node
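Several of these abbreviated paths can be tried with Python's standard `xml.etree.ElementTree`, which implements only a small subset of XPath (child steps, `*`, `.`, `..`, `//`, and simple predicates). The tree below is an assumed reconstruction of the numbered figure, chosen so that it agrees with the results quoted on the slide; the `n` attributes and the helper `numbers` are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the slide's tree; the n attributes are the node numbers.
context = ET.fromstring("""
<ctx>
  <aaa n="1"><bbb n="4"/><aaa n="5"/></aaa>
  <ccc n="2"><bbb n="6"/></ccc>
  <aaa n="3"><ccc n="7"/></aaa>
</ctx>
""")


def numbers(path):
    """Evaluate `path` from the context node and list the matched node numbers."""
    return [int(e.get("n")) for e in context.findall(path)]


aaa_children = numbers("aaa")      # child::aaa
aaa_bbb = numbers("aaa/bbb")       # bbb children of aaa children
any_bbb = numbers("*/bbb")         # bbb children of any child
all_bbb = numbers(".//bbb")        # bbb descendants of the context node
```

Full XPath features such as explicit axes (`child::`, `ancestor::`) or `text()` steps are not available in this module; a dedicated XPath processor would be needed for those.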

Page 60: Yazici XML Ex

Predicates

  – [2] -- the second child node of the context node
  – chapter[5] -- the fifth chapter child of the context node
  – [last()] -- the last child node of the context node
  – chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is “introduction”
  – person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value “joe”
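Positional and value predicates of this kind are also covered by the XPath subset in Python's `xml.etree.ElementTree`. The small book document and the helper `titles` below are made up for illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("""
<book>
  <chapter><title>introduction</title></chapter>
  <chapter><title>syntax</title></chapter>
  <chapter><title>queries</title></chapter>
</book>
""")


def titles(path):
    """Evaluate `path` from <book> and return the titles of the matches."""
    return [c.findtext("title") for c in book.findall(path)]


second = titles("chapter[2]")                        # positions start at 1
last = titles("chapter[last()]")                     # the last chapter child
intro = titles("chapter[title='introduction']")      # test on a child's text
```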

Page 61: Yazici XML Ex

Unions of Path Expressions

• employee | consultant -- the union of the employee and consultant nodes that are children of the context node
• For some reason person/(employee|consultant) is not allowed (in XPath 1.0)
• However person/node()[boolean(employee|consultant)] is allowed!!

Page 62: Yazici XML Ex

Axis navigation• So far, nearly all our expressions have moved us down the by

moving to child nodes. Exceptions were – . -- stay where you are– / go to the root– // all descendants of the root// all descendants of the root– .// all descendants of the context node

• All other expressions have been abbreviations for child::… hild hild i l f ie.g. child::para. child:is an example of an axis

• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-g gsibling, namespace, parent, preceding, preceding-sibling, self– Some of these (self, parent) describe single nodes, others

describe sequences of nodes.describe sequences of nodes.

62

Page 63: Yazici XML Ex

XPath Navigation Axes

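[Figure: the navigation axes drawn around the context node (self): ancestor above it, child and descendant below it, preceding and preceding-sibling before it, following and following-sibling after it, plus the attribute and namespace axes.]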

Page 64: Yazici XML Ex

XPath abbreviated syntax

  (nothing)   child::
  @           attribute::
  //          /descendant-or-self::node()/
  .           self::node()
  .//         self::node()/descendant-or-self::node()/
  ..          parent::node()
  /           (document root)

Page 65: Yazici XML Ex

Examples of XPath queries

• If the Company XML document is stored at the location www.company.com/info.xml, then the first XPath expression can be written as
  doc("www.company.com/info.xml")/company
• Some examples of XPath expressions on XML documents that follow the XML schema file Company are:
  – /company -- returns the company root node and all its descendant nodes, that is, the whole XML document.
  – /company/department -- returns all department nodes that are children of the company root node.
  – //employee[employeeSalary gt 70000]/employeeName -- returns all employeeName nodes that are direct children of an employee node, such that the employee node has another child element employeeSalary whose value is greater than 70000.
  – /company/employee[employeeSalary gt 70000]/employeeName -- the same selection, restricted to employee nodes that are direct children of the company root node.
  – /company/project/projectWorker[hours ge 20.0] -- returns the projectWorker nodes under /company/project that have a child node hours with a value of at least 20.0 hours.

Page 66: Yazici XML Ex

XQuery

• XPath allows us to write expressions that select nodes from a tree-structured XML document.
• XQuery permits the specification of more general queries on one or more XML documents.
• The typical form of a query in XQuery is known as a FLWR expression.

  FOR <variable bindings to individual nodes (elements)>
  LET <variable bindings to collection of nodes (elements)>
  WHERE <qualifier conditions>
  RETURN <query result specification>

Page 67: Yazici XML Ex

Examples for XQuery queries

• FOR $x IN doc("www.company.com/info.xml")//employee[employeeSalary gt 70000]/employeeName
  RETURN <res> $x/firstName, $x/lastName </res>

• FOR $x IN doc("www.company.com/info.xml")/company/employee
  WHERE $x/employeeSalary gt 70000
  RETURN <res> $x/employeeName/firstName, $x/employeeName/lastName </res>

• FOR $x IN doc("www.company.com/info.xml")/company/project[projectNumber = 5]/projectWorker,
      $y IN doc("www.company.com/info.xml")/company/employee
  WHERE $x/hours gt 20.0 AND $y/ssn = $x/ssn
  RETURN <res> $y/employeeName/firstName, $y/employeeName/lastName, $x/hours </res>

Page 68: Yazici XML Ex

XQuery

Emerging standard for querying XML documents.

Basic form:
  FOR <variables ranging over sets of elements>
  WHERE <condition>
  RETURN <set of elements>;

• Sets of elements described by paths, consisting of:
  1. URL, if necessary.
  2. Element names forming a path in the semistructured data graph, e.g., //BAR/NAME = “start at any BAR node and go to a NAME child.”
  3. Ending condition of the form
     [<condition about subelements, @attributes, and values>]

Page 69: Yazici XML Ex

Example

The file http://www.cse.ucsc.edu/bars.xml:

  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE BARS SYSTEM "bar.dtd">
  <BARS>
    <BAR type = "sports">
      <NAME>Joe's Bar</NAME>
      <BEER><NAME>Bud</NAME>
        <PRICE>2.50</PRICE></BEER>
      <BEER><NAME>Miller</NAME>
        <PRICE>3.00</PRICE></BEER>
    </BAR>
    <BAR type = "sushi">
      <NAME>Homma's</NAME>
      <BEER><NAME>Sapporo</NAME>
        <PRICE>4.00</PRICE></BEER>
    </BAR> ...
  </BARS>

Page 70: Yazici XML Ex

XQUERY Query

• Query: Find the prices charged for Bud by sports bars that serve Miller.

  FOR $ba IN document("http://www.cse.ucsc.edu/bars.xml")//BAR[@type = "sports"],
      $be IN $ba/BEER[NAME = "Bud"]
  WHERE $ba/BEER[NAME = "Miller"]
  RETURN $be/PRICE;
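To make the evaluation order of the FOR/WHERE/RETURN clauses concrete, the same query can be imitated with nested loops in Python over the bars document. This is an emulation with the standard `xml.etree.ElementTree`, not XQuery itself; the document is inlined instead of fetched from the URL, and the variable names mirror `$ba` and `$be`.

```python
import xml.etree.ElementTree as ET

bars = ET.fromstring("""
<BARS>
  <BAR type="sports">
    <NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR type="sushi">
    <NAME>Homma's</NAME>
    <BEER><NAME>Sapporo</NAME><PRICE>4.00</PRICE></BEER>
  </BAR>
</BARS>
""")

prices = []
# FOR $ba IN //BAR[@type = "sports"]
for ba in bars.findall(".//BAR[@type='sports']"):
    # FOR $be IN $ba/BEER[NAME = "Bud"]
    for be in ba.findall("BEER[NAME='Bud']"):
        # WHERE $ba/BEER[NAME = "Miller"]  (true if at least one match exists)
        if ba.findall("BEER[NAME='Miller']"):
            # RETURN $be/PRICE
            prices.append(be.findtext("PRICE"))
```

Each FOR clause becomes one loop over a path result, WHERE becomes a filter on the current bindings, and RETURN contributes one item per surviving binding.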

Page 71: Yazici XML Ex

Conclusions

• XML is a data format for which there are an increasing number of useful tools for
  – Constructing schemas
  – Programming
  – Querying
• Although it is likely that a query language will soon emerge as a standard, there is less agreement or understanding on how to store XML data efficiently.
• Many other database issues remain to make it useful for manipulating large amounts of data.