Module 5
Introduction to XQuery
01/31/07 2
XML is now everywhere
• Google search (warning: unreliable numbers)
  - 285,000,000 for XML
  - 1,000,000 for XQuery
  - 11,000,000 for XSLT
  - 12,000,000 for XML Schema
  - 60,000,000 for .NET
  - 200,000,000 for Java
  - 64,000,000 for SQL
• The highest Google number among all the technology buzzwords that I searched (except RSS)

Sources of XML data
1. Inter-application communication data (WS, REST, etc.)
2. Mobile devices communication data
3. Logs
4. Blogs (RSS)
5. Metadata (e.g. Schema, WSDL, XMP)
6. Presentation data (e.g. XHTML)
7. Documents (e.g. Word)
8. Views of other sources of data
   - Relational, LDAP, CSV, Excel, etc.
9. Sensor data

Some vertical application domains for XML
• Health Level Seven (HL7): http://www.hl7.org/
• Geography Markup Language (GML)
• Systems Biology Markup Language (SBML): http://sbml.org/
• XBRL, the XML-based business reporting standard: http://www.xbrl.org/
• Global Justice XML Data Model (GJXDM): http://it.ojp.gov/jxdm
• ebXML: http://www.ebxml.org/
• Encoded Archival Description application: http://lcweb.loc.gov/ead/
• Digital photography metadata (XMP)
• An XML grammar for sensor data (SensorML)
• Really Simple Syndication (RSS 2.0)
• Basically everywhere.

Processing the XML data
• Huge amount of XML information, and growing
• We need to “manage” it, and then “process” it
  - Store it efficiently
  - Verify the correctness
  - Filter, search, select, join, aggregate
  - Create new pieces of information
  - Clean, normalize the data
  - Update it
  - Take actions based on the existing data
  - Write complex execution flows
• No conceptual organization like for relational databases (applications are too heterogeneous)

Frequent solutions to XML data management
1. Map it to generic programming APIs (e.g. DOM, SAX, StAX)
2. Manually map it to non-generic APIs
3. Automatically map it to non-generic structures
4. Use XML extensions of existing languages
5. Shredding for relational stores
6. Native XML processing through XSLT and XQuery

1. Mapping to generic structures
• Represent the data:
  - Original Unicode form, or
  - Some binary representation (e.g. Fast Infoset)
• Store it:
  - Directly on a file system, or
  - On a “transacted” file system (e.g. SleepyCat, or a relational database)
• Map the XML data to generic XML programmatic APIs
  - E.g. DOM, SAX, StAX (JSR 173), XMLReader
• Use the native programming languages (e.g. Java, C#) to manipulate the data
• Re-serialize it at the end

1. Manual mapping to generic structures (example)

<purchaseOrder>
  <lineItem>……</lineItem>
  <lineItem>……</lineItem>
</purchaseOrder>

<book>
  <author>…</author>
  <title>….</title>
  ……
</book>

interface DomNode {
    String getNodeName();
    String getNodeValue();
    void setNodeValue(String nodeValue);
    short getNodeType();
}

Hard coded mappings
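
The generic-API approach can be sketched with Python's standard `xml.dom.minidom`, used here as a stand-in for the DOM-style interface on the slide. The sample document content and the `describe` helper are illustrative assumptions, not part of the original material.

```python
from xml.dom import minidom

# Illustrative input: element names follow the slide, the text content is made up.
XML = "<book><author>A. Author</author><title>A Title</title></book>"


def describe(node):
    """Return (nodeName, nodeType) for a node and its descendants, in document order."""
    out = [(node.nodeName, node.nodeType)]
    for child in node.childNodes:
        out.extend(describe(child))
    return out


doc = minidom.parseString(XML)
book = doc.documentElement

# Generic access: the code only knows about "nodes", not about books or authors.
title_text = book.getElementsByTagName("title")[0].firstChild
title_text.nodeValue = "Another Title"  # plays the role of setNodeValue()

# Re-serialize at the end.
result = book.toxml()
```

The application logic has to spell out element names as strings and walk the tree by hand, which is the "hard coded mapping" cost of this approach.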

2. Manual mapping to non-generic structures

<purchaseOrder>
  <lineItem>……</lineItem>
  <lineItem>……</lineItem>
</purchaseOrder>

<book>
  <author>…</author>
  <title>….</title>
  ……
</book>

interface PurchaseOrder {
    List getLineItems();
    // ……..
}

Hard coded mappings

interface Book {
    List getAuthor();
    String getTitle();
    // ……
}
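
A minimal sketch of a hand-written mapping to a non-generic structure, in Python. The `Book` dataclass and the `book_from_xml` function are hypothetical names chosen for illustration; the point is only that someone has to write and maintain this mapping code by hand.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Book:
    title: str
    authors: List[str] = field(default_factory=list)


def book_from_xml(text: str) -> Book:
    """Hand-written ("hard coded") mapping from a <book> element to a Book object."""
    root = ET.fromstring(text)
    if root.tag != "book":
        raise ValueError("expected a <book> element")
    return Book(
        title=root.findtext("title", default=""),
        authors=[a.text or "" for a in root.findall("author")],
    )


book = book_from_xml("<book><author>A. Author</author><title>A Title</title></book>")
```

Application code now works with `book.title` instead of generic nodes, at the price of a mapping that must be updated whenever the XML vocabulary changes.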

3. Automatic mapping to non-generic structures

<type name="book-type">
  <sequence>
    <attribute name="year" type="xs:integer"/>
    <element name="title" type="xs:string"/>
    <sequence minoccurs="0">
      <element name="author" type="xs:string"/>
    </sequence>
  </sequence>
</type>
<element name="book" type="book-type"/>

interface BookType {
    int getYear();
    String getTitle();
    List getAuthors();
    // ……..
}

Automatic mapping (e.g. XMLBeans)

4. XML extensions of existing procedural languages
• Examples: C-omega, ECMAScript, PHP extensions, Python extensions, etc.
• Most of them define:
  - A way of importing XML data into their native type system
  - A rich API for XML data manipulation
  - A way of navigating/searching/querying the XML data via their extensions (XPath based or XPath inspired)

5. Native XML processing: XSLT and XQuery
• Most promising alternative for the future.
• The only alternative such that:
  - the data is modeled only once
  - it is well integrated with the XML Schema type system
  - it preserves the logical/physical data independence
  - the code deals with non-generic structures
  - the code can be optimized automatically
• Data is stored:
  - in plain file systems, or
  - in sophisticated data stores (e.g. XML extensions of relational stores)
• Missing pieces, under development
  - E.g. no procedural logic

Why XQuery?
• Why a “query” language for XML?
  - Need to process XML data
  - Preserve logical/physical data independence
    - The semantics is described in terms of an abstract data model, independent of the physical data storage
  - Declarative programming
    - Such programs should describe the “what”, not the “how”
• Why a native query language? Why not SQL?
  - We need to deal with the specificities of XML (hierarchical, ordered, textual, potentially schema-less structure)
• Why another XML processing language? Why not XSLT?
  - The template nature of XSLT was not appealing to the database people. Not declarative enough.

What is XQuery?
• A programming language that can express arbitrary XML to XML data transformations
  - Logical/physical data independence
  - “Declarative”
  - “High level”
  - “Side-effect free”
  - “Strongly typed” language
• “An expression language for XML.”
• Commonalities with functional programming, imperative programming and query languages
• The “query” part might be a misnomer (***)

XQuery family of standards
• XQuery 1.0: An XML Query Language: an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
• XSL Transformations (XSLT) Version 2.0: transforms data model instances (XML and non-XML) into other documents, including into XSL-FO for printing
• XML Path Language (XPath) 2.0: expression syntax for referring to parts of XML documents
• XQuery 1.0 and XPath 2.0 Functions and Operators: the functions you can call in XPath expressions and the operations you can perform on XPath 2.0 data types
• XQuery 1.0 and XPath 2.0 Data Model (XDM): representation and access for both XML and non-XML sources
• XSLT 2.0 and XQuery 1.0 Serialization: how to output the results of XSLT 2.0 and XML Query evaluation in XML, HTML or as text
• XML Syntax for XQuery 1.0 (XQueryX): an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
• XQuery 1.0 and XPath 2.0 Formal Semantics: the type system used in XQuery and XSLT 2 via XPath defined precisely for implementers

XQuery, XPath, XSLT
[Figure: timeline from 1999 to 2007. XSLT 1.0 uses XPath 1.0; XPath 2.0 extends XPath 1.0 (almost backwards compatible); XSLT 2.0 uses XPath 2.0; XQuery 1.0 extends XPath 2.0 with FLWOR expressions, node constructors and validation.]

Roadmap for today
• XQuery Data Model (XDM)
• XQuery type system
• XQuery environment
• XQuery basic constructs
  - variables
  - constants
  - function calls, function library
  - arithmetic operations
  - boolean operations
  - path expressions
  - conditionals

The need for an abstract XML data model
• The XML 1.0 specification only talks about characters
• We cannot have a programming language processing “characters” (one by one)
• An XML abstract/logical data model!?
• Unfortunately, too many of those
  - Infoset, PSVI, DOM, XDM, etc.

XML Data Model (XDM)
• Abstract (i.e. logical) data model for XML data
• Same role for XQuery as the relational data model for SQL
• Purely logical: no standard storage or access model (on purpose)
• XQuery is closed with respect to the Data Model
[Figure: Infoset and PSVI feed the XML Data Model, on which XQuery, XPath 2.0 and XSLT 2.0 operate.]
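
XML Data model life cycle
[Figure: .xml documents are parsed, and optionally validated against .xsd schemas, into XQuery Data Model instances; XPath 2.0, XQuery and XSLT 2.0 take data model instances to data model instances; results are serialized back to .xml or consumed in an application-dependent way.]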

XML Data Model
• Instance of the data model: a sequence composed of zero or more items
  - The empty sequence is often considered as the “null value”
• Items: nodes or atomic values
• Nodes: document | element | attribute | text | namespaces | PI | comment
• Atomic values
  - Instances of all XML Schema atomic types: string, boolean, ID, IDREF, decimal, QName, URI, ...
  - untyped atomic values
• Typed (i.e. schema validated) and untyped (i.e. non schema validated) nodes and values
(Remember Lisp?)

Sequences
• Can be heterogeneous (nodes and atomic values): (<a/>, 3)
• Can contain duplicates (by value and by identity): (1,1,1)
• Are not necessarily ordered in document order
• Nested sequences are automatically flattened: ( 1, 2, (3, 4) ) = (1, 2, 3, 4)
• Single items and singleton sequences are the same: 1 = (1)
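
The flattening and singleton rules can be modelled in a few lines of Python. This is only a sketch of the data model behaviour, with Python tuples standing in for sequences; `seq` is a hypothetical helper name.

```python
def seq(*items):
    """Build a flat, XDM-like sequence.

    Nested sequences (tuples) are flattened, duplicates are kept, and a
    single item is indistinguishable from a one-item sequence.
    """
    flat = []
    for item in items:
        if isinstance(item, tuple):   # a nested sequence
            flat.extend(seq(*item))
        else:                         # a single item
            flat.append(item)
    return tuple(flat)
```

For example, `seq(1, 2, (3, 4))` and `seq(1, 2, 3, 4)` build the same value, and `seq(1)` equals `seq((1,))`.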

Atomic values
• The values of the 19 atomic types available in XML Schema
  - E.g. xs:integer, xs:boolean, xs:date
• All the user defined derived atomic types
  - E.g. myNS:ShoeSize
• xs:untypedAtomic
• Atomic values carry their type together with the value
  - (8, myNS:ShoeSize) is not the same as (8, xs:integer)

XML nodes
• 7 types of nodes: document | element | attribute | text | namespaces | PI | comment
• Every node has a unique node identifier
  - Scope of node identifier uniqueness is implementation dependent
• Nodes have children and an optional parent
  - conceptual “tree”
• Nodes are ordered based on the topological order in the tree (“document order”)

Node accessors
• node-kind : xs:string
• node-name : xs:QName ?
• parent : node() ?
• string-value : xs:string
• typed-value : xs:anyAtomicType *
• type-name : xs:QName ?
• children : node() *
• attributes : attribute() *
• namespaces : node() *

Example of well formed XML data

<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
</book>

• 3 element nodes, 1 attribute node, 5 text nodes
• name(book element) = {-}:book
• In the absence of schema validation
  - type(book element) = xs:untyped
  - type(author element) = xs:untyped
  - type(year attribute) = xs:untypedAtomic
  - typed-value(author element) = ("R.D. Laing", xs:untypedAtomic)
  - typed-value(year attribute) = ("1967", xs:untypedAtomic)

XML schema example

<type name="book-type">
  <sequence>
    <attribute name="year" type="xs:integer"/>
    <element name="title" type="xs:string"/>
    <sequence minoccurs="0">
      <element name="author" type="xs:string"/>
    </sequence>
  </sequence>
</type>
<element name="book" type="book-type"/>

Schema validated XML data

<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
</book>

• After schema validation
  - type(book element) = {uri}:book-type
  - type(author element) = xs:string
  - type(year attribute) = xs:integer
  - typed-value(author element) = ("R.D. Laing", xs:string)
  - typed-value(year attribute) = (1967, xs:integer)
• Schema validation impacts the data model representation and therefore the XQuery semantics!!

Lexical and binary aspect of the data
• Every node holds (logically) redundant information: <a xsi:type="xs:integer">001</a>
  - dm:string-value(): "001" as xs:string
  - dm:typed-value():
    - "001" as an xs:untypedAtomic before validation
    - 1 as an xs:integer after validation
• Implementations can store:
  - The string value
    - Retrieve the typed value dynamically based on the type, every time it is needed
  - The typed value
    - Retrieve an acceptable lexical value for that type every time this is required
  - Both
• In case of unvalidated data the two are the same
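
A small Python sketch of the "store the string value, derive the typed value on demand" option. The `ElementNode` class and the `CASTS` table are hypothetical and cover only three types; a real processor would support every XML Schema type.

```python
from dataclasses import dataclass
from typing import Optional

# Tiny subset of casts, enough for the example (an assumption of this sketch).
CASTS = {"xs:integer": int, "xs:string": str, "xs:double": float}


@dataclass
class ElementNode:
    string_value: str                # lexical form, always available
    type_name: Optional[str] = None  # None models "not schema validated"

    def typed_value(self):
        """Return (value, type), computed from the string value on demand."""
        if self.type_name is None:
            return (self.string_value, "xs:untypedAtomic")
        return (CASTS[self.type_name](self.string_value), self.type_name)
```

With this model, `ElementNode("001")` has the typed value `("001", "xs:untypedAtomic")`, while `ElementNode("001", "xs:integer")` has `(1, "xs:integer")`; the original lexical form "001" is still kept in `string_value`.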

Typed vs. untyped XML Data
• Untyped data (non XML Schema validated)
  - <a>3</a> eq 3
  - <a>3</a> eq "3"
• Typed data (after XML Schema validation)
  - <a xsi:type="xs:integer">3</a> eq 3
  - <a xsi:type="xs:string">3</a> eq 3
  - <a xsi:type="xs:integer">3</a> eq "3"
  - <a xsi:type="xs:string">3</a> eq "3"

XML data equivalence
• XQuery has multiple notions of data “equality”
  - “=”, “eq”, “is”, “fn:deep-equal()”
• Expected properties:
  - Transitivity, reflexivity and symmetry
  - Necessary for grouping, indexing and hashing
• Additional property:
  - if ( data1 equal data2 ) then ( f(data1) equal f(data2) )
  - Necessary for memoization, caching
• None of the equality relationships above (except “is”) satisfies those properties
  - The “is” relationship only applies to nodes
• Careful implementations for indexes, hashing, caches
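
To see why “=” cannot back an index or a hash table, here is a Python sketch of its existential behaviour on sequences of atomic values. `exists_equal` is a hypothetical name, and type coercion is deliberately left out of this model.

```python
def exists_equal(left, right):
    """Model of the general comparison "=": true if ANY pair of items is equal."""
    return any(a == b for a in left for b in right)


# Not transitive: A = B and B = C hold, but A = C does not.
A, B, C = (1, 2), (2, 3), (3, 4)
transitive_counterexample = (
    exists_equal(A, B), exists_equal(B, C), exists_equal(A, C)
)

# Not reflexive: the empty sequence is not "=" to itself.
reflexive_counterexample = exists_equal((), ())
```

Under this model `transitive_counterexample` is `(True, True, False)` and `reflexive_counterexample` is `False`, so grouping or hashing built directly on “=” would be unsound.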

Document order

<book year="1967" price="45.32">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
</book>

• How many nodes here?
• What is the order between nodes?

Document order

<book(n1) year(n2)="1967" price(n3)="45.32">(n4)
  <title(n5)>(n6)The politics of experience</title>(n7)
  <author(n8)>(n9)R.D. Laing</author>
</book>

• How many nodes here? 9
• What is the order between nodes?
  - n1 before all the others
  - order of n2 and n3 non-deterministic
  - n2 and n3 are before n4, n5, n6, n7, n8, n9
  - n4 < n5 < n6 < n7 < n8 < n9 (top-down, left to right among the children)
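
Document order can be sketched as a pre-order walk that visits an element, then its attributes, then its children. The example below uses Python's `xml.dom.minidom`; `document_order` is a hypothetical helper, and the input is written on one line so that no whitespace-only text nodes appear (the node count therefore differs from the annotated slide).

```python
from xml.dom import minidom


def document_order(node):
    """List nodes in document order: an element, then its attributes,
    then its children (recursively), left to right."""
    out = [node]
    if node.nodeType == node.ELEMENT_NODE:
        attrs = node.attributes
        # The relative order of attributes is implementation dependent.
        out.extend(attrs.item(i) for i in range(attrs.length))
        for child in node.childNodes:
            out.extend(document_order(child))
    return out


XML = ('<book year="1967" price="45.32">'
       '<title>The politics of experience</title>'
       '<author>R.D. Laing</author></book>')

nodes = document_order(minidom.parseString(XML).documentElement)
names = [n.nodeName for n in nodes]
```

The element comes first, its two attributes follow in an unspecified relative order, and the children come last, top-down and left to right.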

XQuery type system
• XQuery has a powerful (and complex!) type system
• XQuery types are imported from XML Schemas
• Every XML data model instance has a dynamic type
• Every XQuery expression has a static type
• Pessimistic static type inference
• The goal of the type system is:
  1. detect statically errors in the queries
  2. infer the type of the result of valid queries
  3. ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
XQuery type system XQuery type system componentscomponents
Atomic typesAtomic types xs:untypedAtomicxs:untypedAtomic All 19 primitive XML Schema typesAll 19 primitive XML Schema types All user defined atomic typesAll user defined atomic types
Empty, NoneEmpty, None Type constructors (simplification!)Type constructors (simplification!)
Elements: Elements: element name {type}element name {type} Attributes: Attributes: attribute name {type}attribute name {type} Alternation : Alternation : type1 | type 2type1 | type 2 Sequence: Sequence: type1, type2type1, type2 Repetition: Repetition: type*type* Interleaved product: Interleaved product: type1 & type2type1 & type2
• type1 intersect type2 ?• type1 subtype of type2 ?• type1 equals type2 ?

XML queries
• An XQuery basic structure: a prolog + an expression
• Role of the prolog: populate the context where the expression is compiled and evaluated
• The prolog contains:
  - namespace definitions
  - schema imports
  - default element and function namespace
  - function definitions
  - collation declarations
  - function library imports
  - global and external variable definitions
  - etc.

XQuery processing

XQuery expressions

XQuery Expr :=
    Constants | Variable | FunctionCalls | PathExpr |
    ComparisonExpr | ArithmeticExpr | LogicExpr |
    FLWORExpr | ConditionalExpr | QuantifiedExpr |
    TypeSwitchExpr | InstanceofExpr | CastExpr |
    UnionExpr | IntersectExceptExpr |
    ConstructorExpr | ValidateExpr

• Expressions can be nested with full generality!
• Functional programming heritage (ML, Haskell, Lisp)

Constants
• XQuery grammar has built-in support for:
  - Strings: "125.0" or '125.0'
  - Integers: 150
  - Decimal: 125.0
  - Double: 125.e2
• 19 other atomic types available via XML Schema
• Values can be constructed
  - with constructors in F&O doc: fn:true(), xs:date("2002-05-20")
  - by casting
  - by schema validation

Variables
• $ + QName (e.g. $x, $ns:foo)
• bound, not assigned
• XQuery does not allow variable assignment
• created by let, for, some/every, typeswitch expressions, function parameters
• example:
    let $x := ( 1, 2, 3 )
    return count($x)
• above scoping ends at conclusion of return expression

A built-in function sampler
• fn:document(xs:anyURI) => document?
• fn:empty(item*) => boolean
• fn:index-of(item*, item) => xs:unsignedInt?
• fn:distinct-values(item*) => item*
• fn:distinct-nodes(node*) => node*
• fn:union(node*, node*) => node*
• fn:except(node*, node*) => node*
• fn:string-length(xs:string?) => xs:integer?
• fn:contains(xs:string, xs:string) => xs:boolean
• fn:true() => xs:boolean
• fn:date(xs:string) => xs:date
• fn:add-date(xs:date, xs:duration) => xs:date
See the Functions and Operators W3C specification. Several of these names follow earlier working drafts: in the final XQuery 1.0 and XPath 2.0 specifications, documents are loaded with fn:doc(), dates are built with the xs:date() constructor, and union and except are operators rather than functions.

Atomization
• fn:data(item*) -> xs:anyAtomicType*
• Extracting the “value” of a node, or returning the atomic value
• Implicitly applied:
  - Arithmetic expressions
  - Comparison expressions
  - Function calls and returns
  - Cast expressions
  - Constructor expressions for various kinds of nodes
  - order by clauses in FLWOR expressions
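
A rough Python model of atomization for untyped data, using `xml.etree.ElementTree` elements as stand-ins for nodes. The `atomize` function is a hypothetical helper; it ignores schema types and simply replaces each element by its string value.

```python
import xml.etree.ElementTree as ET


def atomize(sequence):
    """Model of fn:data() for untyped data: nodes are replaced by their
    string value, atomic values are passed through unchanged."""
    result = []
    for item in sequence:
        if isinstance(item, ET.Element):
            # String value of an element: all descendant text, concatenated.
            result.append("".join(item.itertext()))
        else:
            result.append(item)
    return result


values = atomize([ET.fromstring("<a>4<b>2</b></a>"), 1, "x"])
```

Here the element contributes the string "42" (the text of `a` and of its child `b` concatenated), while the atomic values 1 and "x" are returned as they are.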

Constructing sequences
    (1, 2, 2, 3, 3, <a/>, <b/>)
• “,” is the sequence concatenation operator
• Nested sequences are flattened: (1, 2, 2, (3, 3)) => (1, 2, 2, 3, 3)
• Range expressions: (1 to 3) => (1, 2, 3)

Combining sequences
• Union, Intersect, Except
• Work only for sequences of nodes, not atomic values
• Eliminate duplicates and reorder to document order
    $x := <a/>, $y := <b/>, $z := <c/>
    ($x, $y) union ($y, $z) => (<a/>, <b/>, <c/>)
• The F&O specification provides other functions & operators; e.g. fn:distinct-values() and fn:distinct-nodes() are particularly useful
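
The node-set operators can be modelled in Python as follows. The `Node` class and its `pos` field (a position in document order) are assumptions of this sketch; in a real processor the relative order of nodes from different trees is implementation dependent.

```python
class Node:
    """Stand-in for an XML node: identity is object identity,
    `pos` is a hypothetical position in document order."""

    def __init__(self, name, pos):
        self.name, self.pos = name, pos

    def __repr__(self):
        return f"<{self.name}/>"


def _normalize(nodes):
    # Remove duplicates by identity, then sort into document order.
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: n.pos)


def union(xs, ys):
    return _normalize(list(xs) + list(ys))


def intersect(xs, ys):
    ids = {id(n) for n in ys}
    return _normalize(n for n in xs if id(n) in ids)


def except_(xs, ys):
    ids = {id(n) for n in ys}
    return _normalize(n for n in xs if id(n) not in ids)


x, y, z = Node("a", 1), Node("b", 2), Node("c", 3)
combined = union([x, y], [y, z])
```

Duplicates are detected by node identity, not by value, which is why these operators are defined only for nodes.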

Arithmetic expressions
    1 + 4
    $a div 5
    5 div 6
    $b mod 10
    1 - (4 * 8.5)
    -55.5
    <a>42</a> + 1
    <a>baz</a> + 1
    validate {<a xsi:type="xs:integer">42</a>} + 1
    validate {<a xsi:type="xs:string">42</a>} + 1
• Apply the following rules:
  - atomize all operands; if either operand is (), => ()
  - if an operand is untyped, cast to xs:double (if unable, => error)
  - if the operand types differ but can be promoted to a common type, do so (e.g. xs:integer can be promoted to xs:double)
  - if the operator is consistent with the types, apply it; the result is either an atomic value or an error
  - if the type is not consistent, throw a type exception
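
The arithmetic rules above can be sketched in Python for the “+” operator. This is a simplified model on already-atomized operands: `Untyped` and `xq_add` are hypothetical names, `None` stands for the empty sequence, and Python's `int`, `Decimal` and `float` stand in for xs:integer, xs:decimal and xs:double.

```python
from decimal import Decimal


class Untyped(str):
    """Marks a value as xs:untypedAtomic (e.g. from an unvalidated node)."""


def xq_add(left, right):
    """Model of XQuery "+" on already-atomized operands."""
    if left is None or right is None:
        return None  # () + anything => ()
    operands = []
    for v in (left, right):
        if isinstance(v, Untyped):
            v = float(v)  # untyped => xs:double; ValueError if impossible
        if isinstance(v, bool) or not isinstance(v, (int, float, Decimal)):
            raise TypeError(f"operator + is not defined for {v!r}")
        operands.append(v)
    a, b = operands
    # Numeric promotion: integer -> decimal -> double.
    if isinstance(a, float) or isinstance(b, float):
        return float(a) + float(b)
    if isinstance(a, Decimal) or isinstance(b, Decimal):
        return Decimal(a) + Decimal(b)
    return a + b
```

In this model `xq_add(Untyped("42"), 1)` gives a double, `xq_add(Untyped("baz"), 1)` fails during the cast, and `xq_add("42", 1)` (a typed string) is rejected as a type error, mirroring the four `<a>…</a> + 1` cases on the slide.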
01/31/07 46
Logical expressions
    expr1 and expr2
    expr1 or expr2
    fn:not() as a function
• Return true or false
• Different from SQL
  • two-value logic, not three-value logic
• Different from imperative languages
  • and, or are commutative in XQuery, but not in Java
  • so the first test below does not reliably guard the cast, because the operands may be evaluated in either order:
      if (($x castable as xs:integer) and (($x cast as xs:integer) eq 2)) ...
• Non-deterministic
  • false and error => false, or error (non-deterministically)
• Rules:
  • first compute the Effective Boolean Value (EBV) of each operand:
    • if it is (), "", NaN or 0, return false
    • if the operand is of type xs:boolean, return it
    • if the operand is a sequence whose first item is a node, return true
    • if it is any other single string or number, return true
    • else raise an error
  • then use standard two-value Boolean logic on the two EBVs as appropriate
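The operand rules above can be written down as one small function. This is a minimal Python sketch of the idea behind fn:boolean, assuming sequences are modeled as lists and nodes by a placeholder `Node` class; both are illustrative assumptions, not part of any XQuery API.

```python
import math


class Node:
    """Placeholder for an XML node; only its presence matters here."""


def effective_boolean_value(seq):
    if len(seq) == 0:
        return False                      # ()
    if isinstance(seq[0], Node):
        return True                       # first item is a node
    if len(seq) == 1:
        item = seq[0]
        if isinstance(item, bool):        # check bool before int: bool is an int in Python
            return item
        if isinstance(item, str):
            return len(item) > 0          # "" is false, any other string is true
        if isinstance(item, (int, float)):
            is_nan = isinstance(item, float) and math.isnan(item)
            return not (item == 0 or is_nan)
    raise TypeError("no effective boolean value for this sequence")


print(effective_boolean_value([]))             # False
print(effective_boolean_value([""]))           # False
print(effective_boolean_value([float("nan")])) # False
print(effective_boolean_value([Node(), 1]))    # True
```

A sequence such as `(1, 2)` falls through to the final line and raises an error, which is the "else" case of the rules.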
01/31/07 47
Comparisons

Kind      Operators                 Use
Value     eq, ne, lt, le, gt, ge    for comparing single values
General   =, !=, <=, <, >, >=       existential quantification + automatic type coercion
Node      is                        for testing identity of single nodes (isnot was dropped from the final XQuery 1.0)
Order     <<, >>                    testing relative position of one node vs. another (in document order)
01/31/07 48
Value and general comparisons
    <a>42</a> eq "42"                   => true
    <a>42</a> eq 42                     => error
    <a>42</a> eq "42.0"                 => false
    <a>42</a> eq 42.0                   => error
    <a>42</a> = 42                      => true
    <a>42</a> = 42.0                    => true
    <a>42</a> eq <b>42</b>              => true
    <a>42</a> eq <b> 42</b>             => false
    <a>baz</a> eq 42                    => error
    () eq 42                            => ()
    () = 42                             => false
    (<a>42</a>, <b>43</b>) = 42.0       => true
    (<a>42</a>, <b>43</b>) = "42"       => true
    ns:shoesize(5) eq ns:hatsize(5)     => true
    (1,2) = (2,3)                       => true
01/31/07 49
Algebraic properties of comparisons
• General comparisons are neither reflexive nor transitive
  • (1,3) = (1,2) is true, but so are !=, <, >, <= and >= on the same operands
  • Reasons: implicit existential quantification, dynamic casts
• Negation rule does not hold
  • fn:not($x = $y) is not equivalent to $x != $y
• Value comparisons are almost transitive
  • Exception: xs:decimal, due to the loss of precision
• Impact on grouping, hashing, indexing, caching!
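These properties follow directly from the existential reading of general comparisons, which is easy to reproduce. The sketch below models only that quantification in Python, on plain numbers; the dynamic casts are left out, and the function name `general` is an assumption made for illustration.

```python
import operator
from itertools import product


def general(op, left, right):
    # True if SOME pair (l, r) satisfies the comparison
    return any(op(l, r) for l, r in product(left, right))


a, b = [1, 3], [1, 2]
ops = [operator.eq, operator.ne, operator.lt, operator.gt, operator.le, operator.ge]
print([general(op, a, b) for op in ops])  # all six comparisons hold at once

# Not reflexive: () = () is false
print(general(operator.eq, [], []))

# Not transitive: (1,2) = (2,3) and (2,3) = (3,4), yet (1,2) = (3,4) is false
print(general(operator.eq, [1, 2], [2, 3]),
      general(operator.eq, [2, 3], [3, 4]),
      general(operator.eq, [1, 2], [3, 4]))

# Negation rule fails: not($x = $y) differs from $x != $y
print(not general(operator.eq, a, b), general(operator.ne, a, b))
```

An index or hash table keyed on `=` cannot assume that equal keys form equivalence classes, which is why the slide flags grouping, hashing, indexing and caching.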
01/31/07 50
XPath expressions
• An expression that defines the set of nodes where the navigation starts + a series of selection steps that explain how to navigate into the XML tree
• A step:
    axis '::' nodeTest
• Axes control the navigation direction in the tree
  • attribute, child, descendant, descendant-or-self, parent, self
  • The other XPath 1.0 axes (following, following-sibling, preceding, preceding-sibling, ancestor, ancestor-or-self) are optional in XQuery
• Node test by:
  • Name (e.g. publisher, myNS:publisher, *:publisher, myNS:*, *)
  • Kind of item (e.g. node(), comment(), text())
  • Type test (e.g. element(ns:PO, ns:PoType), attribute(*, xs:integer))
01/31/07 51
Examples of path expressions
    doc("bibliography.xml")/child::bib
    $x/child::bib/child::book/attribute::year
    $x/parent::*
    $x/child::*/descendant::comment()
    $x/child::element(*, ns:PoType)
    $x/attribute::attribute(*, xs:integer)
    $x/ancestor::document-node(schema-element(ns:PO))
    $x/(child::element(*, xs:date) | attribute::attribute(*, xs:date))
    $x/f(.)
01/31/07 52
XPath abbreviated syntax
• Axis can be missing
  • By default the child axis
      $x/child::person -> $x/person
• Short-hands for common axes
  • Descendant-or-self
      $x/descendant-or-self::node()/child::comment() -> $x//comment()
  • Parent
      $x/parent::node() -> $x/..
  • Attribute
      $x/attribute::year -> $x/@year
  • Self
      $x/self::node() -> $x/.
01/31/07 53
XPath filter predicates
• Syntax:
    expression1 [ expression2 ]
• [ ] is an overloaded operator
• Filtering by position (if numeric value):
    /book[3]
    /book[3]/author[1]
    /book[3]/author[position() = (1 to 2)]
• Filtering by predicate:
    //book[author/firstname = "ronald"]
    //book[@price < 25]
    //book[count(author[@gender = "female"]) > 0]
• Classical XPath mistake
  • $x/a/b[1] means $x/a/(b[1]) and not ($x/a/b)[1]
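The classical mistake is easy to reproduce with any XPath-capable library. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a small XPath subset, so the parenthesized form `($x/a/b)[1]` is emulated by slicing the result list; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

x = ET.fromstring(
    "<x>"
    "<a><b>1</b><b>2</b></a>"
    "<a><b>3</b><b>4</b></a>"
    "</x>"
)

# $x/a/b[1]: the predicate applies per step, so the first b of EACH a is kept
first_b_of_each_a = [b.text for b in x.findall("a/b[1]")]

# ($x/a/b)[1]: the predicate applies to the whole result, so one node is kept
first_b_overall = [b.text for b in x.findall("a/b")[:1]]

print(first_b_of_each_a)  # ['1', '3']
print(first_b_overall)    # ['1']
```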
01/31/07 54
Conditional expressions
    if ( $book/@year < 1980 )
    then ns:WS(<old>{$x/title}</old>)
    else ns:WS(<new>{$x/title}</new>)
• Only one branch, the one selected by the test, is allowed to raise execution errors
• Impacts scheduling and parallelization