

Information Systems 33 (2008) 182–202


Toward microbenchmarking XQuery

Philippe Michiels (a), Ioana Manolescu (b), Cédric Miachon (c)

(a) University of Antwerp, Belgium
(b) INRIA Futurs, France
(c) LRI—Université Paris-Sud 11, France

Abstract

A substantial part of the database research field focuses on optimizing XQuery evaluation. However, there is a lack of tools that allow different implementations of isolated language features to be compared easily. This implies that there is no overview of which engines perform best at certain XQuery aspects, which in turn makes it hard to pick a reference platform for an objective comparison. This paper is the first to give an overview of a large subset of the open source XQuery implementations in terms of performance. Several specific XQuery features are tested for each engine on the same hardware to give an impression of the strengths and weaknesses of each implementation. This paper aims at guiding implementors in benchmarking and improving their products.

© 2007 Elsevier B.V. All rights reserved.

Keywords: XML; Query; XQuery; Benchmark; Microbenchmark; Performance

1. Introduction

In the recent past, a lot of energy has been spent on optimizing XML querying. This has resulted in many implementations of the corresponding specifications, notably XQuery and XPath. Usually, little time and space is spent on thorough measurements across different implementations. This makes it hard for implementors to compare their implementations to the state-of-the-art technology, since no one really knows which system actually represents it.

As is pointed out in [1], there are two possible approaches for comparing systems using benchmarks. Application benchmarks like XMark [2], XMach-1 [3], X007 [4] and XBench [5] are used to evaluate the overall performance of a database system by testing as many query language features as possible, using only a limited set of queries. As such, these kinds of benchmarks are not very useful for XPath/XQuery implementors, since they are mainly interested in isolated aspects of an implementation that need improvement.

Micro-benchmarks, on the other hand, are designed to verify the performance of isolated features of a system. We believe that microbenchmarks are crucial in order to get a good understanding of an implementation. Moreover, it rarely happens that one platform is the fastest on all aspects. Only microbenchmarks can reveal which implementation performs best for isolated features. Our focus is to benchmark a set of important XQuery constructs that form the foundation of the language and thus greatly impact overall query engine performance. These features are:

- XPath navigation
- XPath predicates (including positional predicates)

- XQuery FLWORs
- XQuery node construction.

The selected XQuery processors are chosen to represent both in-memory and disk-based implementations of the language.

We hope to continue this effort using automated tools such as XCheck [6,7]. This continuation involves populating a repository with a large number of ready-made micro-benchmarks, as well as benchmarking many more platforms. We hope that this work can guide XQuery implementors to improve their products based on objective, thorough and relevant measurements.

Limitations: We take the view that detailed performance measures should document as much as possible the time spent by an XQuery processing engine in each stage of query evaluation, for instance separating query optimization from query execution and from XML result serialization. From our experience, in the case of large-result queries, the serialization time can easily dominate the other evaluation times (sometimes by orders of magnitude)! Unfortunately, some engines do not provide a means to isolate the serialization time from the other execution components, e.g. when execution is streamed. Therefore, we have decided to measure the time to run each query as such, and then the time to run a query that simply counts the query results, with the hope that the latter time is a reasonable approximation of the time to run the query without serializing the result.

We are aware of two possible problems with this approach. First, a very simplistic implementation may serialize the results and then count them, thus including, against our will, the serialization time in the running time of the counting query. Second, a sophisticated implementation may answer counting queries from some data statistics, e.g. histograms or indexes, without actually accessing the data. In this case, the execution time is incomparable with the time to run the simple query, without the count. Despite these shortcomings, we found the counting queries useful in practice as a means to approximate the otherwise inaccessible XML serialization time.
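As an illustration, here is a hypothetical query pair of the kind just described (the path is ours, chosen only to show the pattern):

  (: plain query: the engine navigates and serializes every match :)
  /t1/t2/t3

  (: counting variant: same navigation, but only the cardinality is
     returned, so the serialization cost should (ideally) disappear :)
  count(/t1/t2/t3)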

2. Settings

In this section, we present the documents (Section 2.1) and queries (Sections 2.2–2.4) used for the performance measures in this paper, as well as the rationale for choosing them. Section 2.5 describes our hardware and software environment, and the system versions used.

All documents, queries, settings and (links to) the systems used in these measures can be found at [8].

2.1. Documents

In order to have full control over the parameters characterizing our documents, we used synthetic ones, generated by the MemBeR project's XML document generator [1,9]. MemBeR-generated documents consist of simple XML elements, whose element names and tree structure are controlled by the generator's user. Each element has a single attribute called @id, whose value is the element's positional order in the document. The elements have no text children.
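For illustration, a minimal fragment of this shape could look as follows; the fragment is hypothetical, but it respects the rules above (a single @id attribute per element, @id values following document order, no text children):

  <t1 id="1">
    <t2 id="2">
      <t3 id="3"/>
    </t2>
    <t2 id="4"/>
  </t1>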

Some of the systems we tested are based on a persistent store, while the others run completely in memory. While we are aware of the inherent limitations that an in-memory system encounters, we believe it is interesting to include both classes of systems in our comparison, since performant techniques have been developed independently on both sides, and the research community can learn interesting lessons from both. To enable uniform testing of all systems, we settled for moderate-sized documents of about 11 MB, which most systems can handle well. As such, the stress testing of the systems below focuses on query scalability, rather than data scalability.

To these documents, we added a family of 10 more documents of varying size, going from 100,000 nodes to 1,000,000 nodes. The purpose of this last document family is to enable an analysis of the way query engines process path queries. The interesting feature of such queries is that a naive implementation requires sorting and duplicate elimination to be performed after each XPath step, whereas efficient engines are able to avoid it. Our 10 chosen documents allow tracing the data scalability of an engine on path queries, which in turn allows some inferences about the engine's inner workings.

The document structures are outlined in Fig. 1. In this figure, white nodes represent elements, whose names range from t1 to t19; dark nodes represent @id attributes. The exponential2.xml, layered.xml and mixed.xml documents have a depth of 19, which we chose so that complex navigation can be studied, and in accordance with the average-to-high document depth recorded in a previous experimental study [10].


Fig. 1. Outline of the documents exponential2.xml, layered.xml, mixed.xml and blockingn.xml used in the measures.



- The exponential2.xml document's size is 11.39 MB. At level i (where the root is at level 1), the document has 2^(i-1) elements labeled ti.
- The layered.xml document's size is 12.33 MB. The root is labeled t1, and it has 32,768 children labeled t2. At any level i between 3 and 19, there are 32,768 nodes labeled ti. Each element labeled ti, with 3 ≤ i ≤ 18, has exactly one child labeled t(i+1). Elements labeled t19 are leaves.
- The mixed.xml document's size is 12.33 MB. The root is labeled t1, and it has 3268 children labeled t2. Each such child (at level 2) has 10 children labeled (with equal probability) t3, t4, ..., t12. Nodes at levels between 3 and 18 each have one child, labeled (with equal probability) t3, t4, ..., t12. At level 19, all nodes are leaves, and are labeled t13.
- The blockingn.xml documents all have a fixed fanout of 3 for every node except the leaf nodes. Their depth is variable and depends on the size of the document. Document blocking1.xml contains 100,000 nodes, blocking2.xml contains 200,000 nodes, and so on. This corresponds to 10 documents whose sizes vary from 2.09 to 21.83 MB.

The rationale for choosing these documents is the following. The document exponential2.xml allows studying the impact of an increasing number of nodes at a given level on the performance of path traversal queries. At the same time, in this document, the size of a subtree rooted at level i is exponential in i. At the other extreme, the document layered.xml has the same depth and approximate tree size as exponential2.xml, but the size of subtrees rooted at various levels depends only linearly on the level. The subtree shapes exhibited by both exponential2.xml and layered.xml are quite extreme; subtrees from real-life documents are likely to be somewhere in between. Controlling both the path depth and the size of the subtrees rooted at each depth is important, since these parameters have important, independent impacts on query performance: the first determines the performance of navigation queries, while the second determines the performance of reconstructing (or retrieving) full document subtrees.

The third document was chosen so as: (i) to be of overall size and aspect close to the two previous documents; (ii) to feature different tags uniformly distributed over many levels, thus allowing us to vary, in a controlled manner, the structural selectivity of various queries (by allowing some tags to range over increasingly large subsets of {t1, t2, ..., t13}).

Finally, the series of documents blockingn.xml is chosen to have a linearly increasing number of nodes in the intermediate results of path expressions that select the entire document tree. A superlinear tendency in the data scalability plot can indicate the presence of sorting operations.


2.2. XPath queries

We measured on the document exponential2.xml seven parameterized XPath queries, denoted Q1.1(n), Q1.2(n), ..., Q1.7(n), where 1 ≤ n ≤ 19, as follows (concrete instances of some templates are sketched after the list):

- Q1.1(n) is: /t1/t2/.../tn

  This query retrieves nodes at increasing depths of the document. Its output size decreases as the roots of the returned subtrees move lower in the document. This query is designed to test the ability of the query processor to deal with increasing lengths of path expressions. For instance, intermediate materialization will have an increasing performance impact for longer queries. We also measured Q1.1(n) on layered.xml, which provided some interesting insights when compared with the results on the first document.

- Q1.2(n) is: /t1/t2/.../tn/data(@id)

  To distinguish the impact of navigation from the impact of subtree serialization in the output, we also use the query Q1.2(n), which navigates to the same depth as Q1.1(n) but only returns simple attribute values.

- Q1.3(n) is: (/t1/t2/.../tn)[1]/data(@id)

  This query is used to see whether query engines are able to take advantage of the [1] predicate to short-circuit navigation as soon as a single node is found.

- Q1.4(n) is: (/t1/t2/.../tn)[position()=last()]/data(@id)

  This query is similar to Q1.3(n), but it uses the [position()=last()] predicate, which does not easily allow the same optimization as [1] if the engine is coded to navigate over the target nodes in the order dictated by XPath's semantics [11]. Measuring both Q1.3(n) and Q1.4(n) hints at the navigation optimization techniques supported by the engine.

- Q1.5(n) is: /t1[t2/.../tn]/data(@id)

  The queries Q1.5(n) aim at quantifying the impact of increasingly deep navigation along existential branches. Once again this verifies whether query engines can avoid materializing the predicate results and/or whether they are capable of using short-circuit evaluation once a single predicate result has been found.

- Q1.6(n) is: /t1[t2/.../tn]/t2/.../tn/data(@id)

  Query Q1.6(n) presents an optimization opportunity (the existential branch can be suppressed without changing the query semantics); its results are to be interpreted together with those of Q1.2(n).

- Q1.7(n) is: //tn

  Finally, query Q1.7(n) retrieves all elements with a given tag. Its results are to be compared with those of Q1.1(n), to see whether the user's knowledge of the depth of the desired elements simplifies the query processor's task.
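As announced above, here are concrete instances of some of these templates for n = 3; they are illustrative expansions of the parameterized definitions, not additional benchmark queries:

  /t1/t2/t3                        (: Q1.1(3) :)
  (/t1/t2/t3)[1]/data(@id)         (: Q1.3(3) :)
  /t1[t2/t3]/data(@id)             (: Q1.5(3) :)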

2.3. Recursive XPath queries

In order to get a better understanding of how exactly XPath expressions are being evaluated, we run the following two queries:

- Qxpr.1 is: count(//*/*)
- Qxpr.2 is: count(/descendant-or-self::*/descendant-or-self::*)

on all blockingn.xml documents. Examining system performance on these queries allows verifying whether query processors perform sorting and duplicate elimination.
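To see why Qxpr.2 stresses order and duplicate handling, consider a hypothetical three-node chain (the document is ours, for illustration only):

  (: on the chain <a><b><c/></b></a>:
     /descendant-or-self::* yields (a, b, c); a naive second
     descendant-or-self::* step then yields (a, b, c, b, c, c),
     which must be sorted and de-duplicated back to (a, b, c)
     before counting. :)
  count(/descendant-or-self::*/descendant-or-self::*)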

2.4. XQuery queries

We measured on the document mixed.xml a set of six parameterized XQuery queries, denoted Q2.1(n), Q2.2(n), ..., Q2.6(n), where 0 ≤ n ≤ 9, as follows:

- Q2.1(n) is:

    for $x in /t1/t2 return
      <res>{ for $y in $x/*/*/.../*
             return string($y/@id) }</res>

  where the path expression to which $y is bound features n child navigation steps starting from $x. These queries test the performance of increasingly deep navigation in the return clause, while returning results of modest (and, on mixed.xml, constant) size.

- Q2.2(n) is:

    for $x in /t1/t2 return
      <res>{ $x/*/*/.../* }</res>


  where the path expression in the return clause features n child navigation steps starting from $x. The queries Q2.2(n) combine increasingly deep navigation with decreasingly large subtrees to be copied into the output (recall that XQuery semantics [12] requires the content of newly created elements to be copied; a small illustration follows this list).

- Q2.3(n) is:

    for $x in /t1/t2 return
      <res>{ $x//t3, $x//t4, ..., $x//t(n+2) }</res>

  The queries Q2.3(n) are designed to return results whose size is expected to increase with n, given that elements labeled t3, t4, ..., t12 are uniformly distributed over levels 3–18 in mixed.xml. Moreover, the elements which Q2.3(n) must retrieve are scattered over the document. This allows us to verify how increasingly big and scattered results are dealt with by the processor.

- Q2.4(n) is:

    for $x in /t1/t2 return
      <res>{ $x/*[position() <= n] }</res>

  Q2.4(n) returns results whose size grows linearly with n; however, in this case, the elements to be returned are grouped in contiguous subtrees in the original document. The performance of Q2.3(n) compared with that of Q2.4(n) provides interesting insight into the node clustering strategy used by the system (if any).

- Q2.5(n) is:

    for $x in /t1/t2 return
      $x/*[position() <= n]

  The queries Q2.5(n) are similar to Q2.4(n); however, Q2.5(n) does not construct new elements. Strictly speaking, Q2.5(n) could have been expressed in XPath, but we keep it in the XQuery group for comparison with Q2.4(n). Given that Q2.5(n) does not construct new elements, there is an opportunity for a more efficient evaluation than in the case of Q2.4(n), since no tree copy operation is needed.

- The queries Q2.6(n) have increasingly deeply nested return clauses. All of them retrieve t13 elements and return them "wrapped" in increasingly deep <res> elements. Rather than giving the general form, which is quite difficult to read, we provide some examples:

  Q2.6(0):

    for $x in /t1/t2 return
      <res>{ for $x1 in $x/* return
        <res>{ $x1//t13 }</res> }</res>

  Q2.6(1):

    for $x in /t1/t2 return
      <res>{ for $x1 in $x/* return
        <res>{ for $x2 in $x1/* return
          <res>{ $x2//t13 }</res> }</res> }</res>

  Q2.6(2):

    for $x in /t1/t2 return
      <res>{ for $x1 in $x/* return
        <res>{ for $x2 in $x1/* return
          <res>{ for $x3 in $x2/* return
            <res>{ $x3//t13 }</res> }</res> }</res> }</res>

  Query Q2.6(n) returns subtrees consisting of n+2 recursively nested <res> elements, each of which encloses some t13 elements. On the document mixed.xml, all these queries return 3268 <res> elements, since there are 3268 t2 elements in mixed.xml. Moreover, each such <res> element will include copies of 10 t13 elements. As n grows, however, the number of "layers" of <res> elements in which the t13 elements are wrapped increases. The queries and the document have been chosen to capture the impact of result nesting only.
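As referenced in the Q2.2(n) item above, the following small sketch (ours, not a benchmark query) makes the copy semantics of node construction concrete:

  (: the <res> constructor copies the selected /t1/t2 nodes, so the
     children of $r are new nodes rather than the original ones; this
     copying is the extra cost that Q2.4(n) pays over Q2.5(n) :)
  let $r := <res>{ /t1/t2 }</res>
  return count($r/*)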

2.5. Hardware and software environment

Our measures were performed on a single machine. Although this allows absolute comparisons of the tested systems, we are mostly interested in identifying the tendency of each system's running time across increasingly complex queries. The relevant machine parameters are as follows:

- Processor: 2.00 GHz Pentium 4 (512 KB cache);
- Memory: 512 MB DDR SDRAM;


- Hard disk: Maxtor DM+8, 40 GB, 2 MB buffer, avg. seek time < 10 ms;
- Operating system: Linux 2.6.12-10-386.

The limited amount of memory on the testing machine can be a disadvantage, especially for the main memory engines. Most engines do well in terms of memory for the tests we are running here. Some, however, run into trouble very fast and require a lot of memory swapping. This is why we measure CPU time here, rather than wall clock time. Whenever an XQuery processor fails to produce usable results for a benchmark due to swapping, this is mentioned in that engine's discussion in Section 3. We tested the following systems:

Berkeley DB XML (v 2.2.13): Oracle Berkeley DB XML is an open source, embeddable XML database with XQuery-based access to documents stored in containers and indexed based on their content. Oracle Berkeley DB XML is built on top of Oracle Berkeley DB. Queries can be passed to the database system by means of scripts. We used the command line dbxml -s dbxml.script to execute queries, where dbxml.script is a straightforward wrapper script that contains the query.

CDuce/CQL (v 0.4.1): CDuce (pronounced "seduce") is a general purpose typed functional programming language implemented in OCaml, which is specifically targeted at XML applications. It conforms to basic standards such as DTDs, Namespaces, etc. Theoretical foundations of CDuce's type system can be found in [13]. Recently, a "Select-From-Where" syntax similar to that of SQL has been added on top of CDuce for user convenience [14]; it is translated into CDuce. An important characteristic of CDuce is its pattern algebra, which extends the XDuce [15] pattern algebra, allowing complex and powerful pattern matching. Pattern matching in CDuce is implemented by means of automata constructed "just-in-time". CDuce has a strong type system, which enables static verification of safe program composition. Among the XQuery features not supported is node identity: CDuce is value-based, that is, it does not distinguish two distinct nodes having the same serialized value (other than by their respective positions in the original document). In particular, there is no way to test whether two variable bindings correspond to the same node, in the XQuery sense [12]. CDuce execution proceeds in two stages: first the query is compiled by invoking cduce --compile query.cd --obj-dir cdo, then the query is executed by invoking cduce --run query.cdo -I cdo.

eXist (v 1.1rc): eXist is an open source native XML database, implemented in Java. It features index-based XQuery processing and automatic indexing. The database implements the current XQuery 1.0 working drafts, with the exception of the schema import and schema validation features defined as optional in the XQuery specification. Performance measurements were performed by launching an eXist server using the startup.sh script that is part of the distribution, and then executing each query by passing it to the client: client.sh -F query -c /db -n 10,000. Here query is the plain query file, and /db is the collection containing the document. Note that eXist limits serialization to a maximum of 10,000 nodes, although usually the engine will run out of memory before being able to compute reasonably large results.

Galax (v 0.6.10): Galax is an open-source OCaml-based implementation of XQuery. Galax closely tracks the definition of XQuery 1.0 as specified by the W3C. The command line used to launch the experiment was: galax-run -inline-variables on -streaming-shebang on -factorization on -monitor-time on query.xq.

MonetDB/XQuery (Pathfinder v 0.12.0): MonetDB/XQuery provides an XQuery implementation, which is constructed as an independent compiler, producing code for the MonetDB server backend. Both systems are implemented in C++. We used a 32-bit compilation with no optimizations. Queries were run as follows: pf query.xq | Mserver.

QizX/open (v 1.1.p2): Qizx/open is an open-source Java implementation of the XML Query specifications, working in main memory. We launched the execution by calling qizxopen_batch.sh query.xq > buf, where query.xq is a file containing the query, and buf is a temporary buffer file receiving the output. The script qizxopen_batch.sh is part of the distribution.

Qexo/Kawa (v 1.8): Qexo is a partial main memory implementation of the XML Query language, which attempts to achieve high performance by compiling queries down to Java bytecode using the Kawa framework. We used the following command line to evaluate the queries: qexo -f /tmp/qexo_query.xq.

Saxon-B (v 8.8J): Saxon is a high-performance implementation of the XQuery, XPath and XSLT recommendations. The system is implemented in Java and operates in main memory. The command line used was: java -Xmx1024m -cp saxon8.jar net.sf.saxon.Query -t query.xq, as recommended by the documentation.



We chose systems that (i) were freely available (if possible open source), (ii) had a user community and/or (iii) were the target of recently published research works. Our choice of systems includes some endowed with a persistent store (Berkeley DB, eXist and MonetDB), as well as purely in-memory systems (Saxon, Galax, QizX, Qexo and CDuce). In this work, we did not specifically target our measures at the disk-based retrieval times of disk-resident systems (although, of course, this aspect is interesting). Rather, we aimed at studying the performance of the various algorithms implemented in the engines once they run in memory.

To that end, we ran each measure four times and report the average CPU time of the last three (hot) runs. Although we cover a substantial part of the XQuery implementation field, we are aware that more systems meeting our criteria exist. We plan to extend our tests to such systems in the near future.

3. Benchmark results

This section presents the results for each of the benchmarks. It is organized as follows:

(i) First, we look at XPath query scalability over non-recursive documents. Here, the result size for each additional navigation step is kept constant, in order to monitor query scalability independently of the result size.

(ii) Second, we run additional tests to obtain more information about the evaluation strategy for XPath expressions. More specifically, we see whether and how XQuery processors deal with order and duplicates during XPath evaluation.

(iii) Lastly, we run all XPath (Q1.x(n), 1 ≤ x ≤ 7) and XQuery (Q2.x(n), 1 ≤ x ≤ 6) queries on exponential2.xml and mixed.xml, respectively. We do this for each engine separately and relate the results to the two benchmarks above.

For completeness, we first list the document processing times for the main memory processors. These are always included in the measurements below. As Fig. 2 points out, processing times scale linearly with the document size for all engines except Saxon, which seems to scale sublinearly. We point out that the document processing time for Galax does not include materialization. Many queries indeed do not require materialization, but in case they do, there will be an extra performance penalty. It is unknown to what extent this holds for other processors.

3.1. XPath performance for non-recursive documents

We started by running queries Q1.1(n) on the layered.xml document for all engines. The queries were run once more, wrapped in a count() function call, to eliminate the cost of serialization. The results are depicted in Fig. 3.

The eXist engine failed to produce results beyond Q1.1(13). The right graph of Fig. 3 shows that eXist scales linearly, suggesting that the evaluation of sequences of child steps scales linearly with the size of the output. The same holds for Saxon, Qizx and MonetDB. Examples of XPath algorithms that have these properties are nested loop with delayed sorting and duplicate elimination [16], Staircase Joins [17] as used by MonetDB, and Twig Joins [18]. Galax uses streaming for this query. Clearly, there is an observable extra cost for each extra step in the pipeline, though Galax scales sublinearly. The streaming approach has the additional advantage that the cost of subsequent operations, like serialization, can be amortized over the query evaluation cost. BerkeleyDB shows a similar scalability here, although it is unclear what causes this. The Qexo graph is too irregular to draw any conclusions from. We now continue our discussion engine by engine for each benchmark.

3.2. XPath, order and duplicates

We ran Qxpr.1 and Qxpr.2 on the blockingn.xml documents. The results are depicted in Fig. 4. eXist either returned an error or a wrong result for both queries on all documents. CDuce raised an exception mentioning a stack overflow for both queries on every document. This is because descendant navigation is implemented in CDuce via a stack that keeps all descendants matched during recursive function calls (and automata), unlike child, parent and sibling navigation, which are implemented by automata alone. Hence, eXist and CDuce are excluded from the discussion below.

Qizx failed to execute Qxpr.2 on documents with more than 300,000 nodes. This suggests that Qizx uses an evaluation strategy that does not deal well with duplicates in the intermediate result. Naive nested loop evaluation may cause such behavior.


Fig. 2. Document processing times for the main memory XQuery engines.

Fig. 3. Results for queries Q1.1(n) using layered.xml on all processors, including serialization (left) and without serialization (right).


The same holds for BerkeleyDB, which shows a clear superlinear tendency for both queries. Saxon seems to scale linearly for Qxpr.1, but slightly superlinearly for Qxpr.2. This may mean that Saxon delays sorting until the end when possible, but that this does not help for the second query. However, the effect of sorting in Saxon is seemingly small compared to the total query processing time. Qexo behaves in a similar way, but does not seem to delay sorting and duplicate removal for the first query.

The other processors seem to use more advanced evaluation strategies that eliminate the need for sorting and duplicate elimination for our queries. Galax uses a streaming approach for this, whereas MonetDB is known to use the staircase join.

3.3. Berkeley DB

3.3.1. XPath

The BerkeleyDB XQuery implementation successfully completed all XPath queries.


Fig. 4. Results for queries Qxpr.1 (left) and Qxpr.2 (right) on blockingn.xml for all XQuery processors.

Fig. 5. Results for queries Q1.1(n)–Q1.7(n), using exponential2.xml on BerkeleyDB, including serialization (left) and without serialization (right).


In Fig. 5, Q1.3(n) and Q1.5(n) show nearly constant execution times for all n, which suggests that BerkeleyDB has an efficient way of handling existential predicates and some positional predicates. All other queries show quite poor scalability when serialization is involved. Only Q1.7(n) scales better when serialization is not included, but the results in Section 3.2 reveal a naive evaluation strategy for descendant as well. However, child queries like Q1.1(n) do not seem to benefit from this index, and the plot suggests level-per-level navigation. The graphs for Q1.6(n) and Q1.2(n) almost coincide, meaning that the redundant predicate test induces little or no extra cost. In general, the numbers suggest that there is a lot of room for improvement with respect to common XPath steps in BerkeleyDB.

3.3.2. XQuery

BerkeleyDB experienced serious problems running the XQuery experiments. Notably, the queries Q2.2(n), Q2.3(n) and Q2.6(n) hit the swapping boundary very early on in the experiment. As such, BerkeleyDB failed to produce usable results for these benchmarks within a reasonable time span. Clearly, the BerkeleyDB developers have some memory issues to solve here. Moreover, the difference between Q2.5(n) in the left and right graphs shows how expensive serialization is in absolute numbers, although it does scale linearly.


Q2.5(n) is the only query for which the behavior is comparable with the other engines. The plot for Q2.1(n) is a bumpy ride from which there seems little to deduce. BerkeleyDB is the only engine that shows a pronounced superlinear scalability for Q2.4(n). The fact that Q2.5(n) is the only query without constructors suggests that BerkeleyDB has some issues with those (Fig. 6).

3.4. CDuce

3.4.1. XPath

The CDuce times reported in the left graph of Fig. 7 are the sum of: the automaton construction time, determined by the query and the input document's DTD; the execution time, during which the results are constructed but not yet serialized; and the serialization time.

Fig. 6. Results for queries Q2.1(n), Q2.4(n) and Q2.5(n), using mixed.xml on BerkeleyDB, including serialization (left) and without serialization (right).

Fig. 7. Results for queries Q1.1(n)–Q1.7(n), using exponential2.xml on CDuce, including serialization (left) and without serialization (right).

The running times of Q1.1(n) and Q1.7(n) are comparatively more important, and decrease as n grows, due to the decreasing total size of the query result (Fig. 7, left). The curves for Q1.1(n) and Q1.7(n) are roughly similar.

Both Q1.3(n) and Q1.4(n) have very short (and almost constant) running times. The translation of these XPath queries to CQL, combined with the pattern matching concept underlying the system, allows the [1] and [position()=last()] predicates to be propagated up, hence the efficient evaluation.



Fig. 8. Results for queries Q2.1(n), Q2.4(n) and Q2.5(n), using mixed.xml on CDuce, including serialization (left) and without serialization (right).


The time for Q1.2(n) exhibits exponential growth in Fig. 7, as the number of returned/visited nodes grows exponentially. The same observation holds for Q1.6(n), which is more expensive than Q1.2(n), although they are equivalent. The reason is that the pattern resulting from the translation of Q1.6(n) has twice the size of the pattern for Q1.2(n) (in other words, the pattern is not minimized to eliminate the redundant existential branch).

Q1.5(n) times tend to grow appreciably for large values of n. The reason is that the corresponding pattern's size grows linearly with n, and the curve reflects the complexity of automata-based evaluation for such patterns.

These results confirm earlier results discussed in [19], which also gives some details on the impact of the presence of a precise level-by-level DTD against which exponential2.xml validates.

3.4.2. XQuery

Fig. 8 presents CDuce's running times on the XQuery queries. Q2.1(n) grows very slowly with n, reflecting a moderately increasing navigation time and a small serialization effort. The running time of Q2.2(n) decreases as the size of the returned subtrees decreases. Q2.3(n), which builds the largest results, failed to run for n ≥ 2. The reason is that each serialized element in CDuce is written into an OCaml string variable, and the maximum size of such string variables is constrained by the language environment. Elements returned by Q2.3(n) grow larger with n and at some point outgrow the available space. Clearly, this is an engineering problem, whose solution involves using some buffered data structure that continuously writes to a file.

Q2.4(n) and Q2.5(n) have roughly the same running time, since for CDuce, a newly constructed node is just another value (there is no need to deeply copy trees). Finally, Q2.6(n) grows linearly with the number of result nesting levels.

3.5. eXist

3.5.1. XPath

The eXist version that was tested consistently failed to produce output for all queries Q1.x(n) where n ≥ 14. The eXist engine did not report any errors, leaving this behavior unexplained. For queries Q1.1(n) and Q1.7(n) where n ≤ 4, eXist failed with an error message indicating that it had run out of memory. This may be related to the relatively large size of the result for these queries, indicating the need for materialization when serialization is required. These problems did not occur during prior experiments [19], and we suspect this difference in behavior to be related to the reduced amount of physical memory of the testing machine. Fig. 9 shows the running times for the queries that were successfully executed.

Queries Q1.1(n) and Q1.7(n) virtually coincide, indicating the use of a path index for simple path queries. The relatively large result size is responsible for the high serialization cost of these queries.


Fig. 9. Results for queries Q1.1(n)–Q1.7(n), using exponential2.xml on eXist, including serialization (left) and without serialization (right).

Fig. 10. Results for queries Q2.1(n)–Q2.6(n), using mixed.xml on eXist, including serialization (left) and without serialization (right).


The evaluation times without serialization all lie within a 0.05 s interval, and it is hard to see any pattern in those timings; in fact, query evaluation time is almost constant. Note, however, that we have observed degrading scalability of eXist for all queries except Q1.1(n) and Q1.7(n) for n ≥ 14 in earlier experiments [19]. The experimental results in Fig. 3 show that the evaluation of simple path expressions like Q1.1(n) scales linearly with the size of the result. The lack of results for longer path expressions keeps us from drawing definitive conclusions from the eXist XPath experiments, but given the exponential growth of the result, an equally exponential tendency is to be expected.

3.5.2. XQuery

We experienced many problems while running the XQuery benchmarks on eXist. For most queries, i.e., Q2.2(n) (n < 1), Q2.3(n), Q2.4(n) (n < 2) and Q2.6(n), eXist reported an error of the following form: XMLDBException during query: The constructed document fragment exceeded the predefined size limit (current: 10,001; allowed: 10,000). The query has been killed. This results in the absence of measurements in Fig. 10 for those queries. It does come as a surprise that an XQuery engine would limit the output to a certain number of nodes. Only Q2.5(n) succeeded when serialization was disabled. Our earlier measures reported in [19] were performed using twice the amount of memory used here, and they show substantially better results. Thus, the repeated failures of eXist reported here are likely due to the limited available memory.


3.6. Galax

3.6.1. XPath

Compared to earlier experiments [19], Galax has undergone substantial changes. The most important change is the addition of pull-based query evaluation [20]. The only XPath queries that show poor scalability in Fig. 11 are Q1.4(n) and, to a lesser degree, Q1.6(n). Obviously, Galax materializes intermediate results in order to evaluate arbitrary positional predicates, although the position() = 1 predicate seems to be handled fine. The redundant existential predicate in Q1.6(n) is not optimized and, as a result, Galax is unable to evaluate these queries without materialization. The linear scalability of queries Q1.1(n)–Q1.3(n) and the constant running time of Q1.7(n) are properties of Galax's evaluation strategy. Query Q1.5(n) scales well because the intermediate results that need materialization for evaluating the predicate are small. The extreme behavior of Q1.4(n) may be related to the fact that the Galax data model has a large memory footprint. Notice that the serialization cost is amortized over the query evaluation cost. This results in only a limited additional serialization cost for queries Q1.1(n) and Q1.7(n).

3.6.2. XQuery

The results of the XQuery experiments for Galax are given in Fig. 12. The difference with the measurements in [19] is surprisingly large.

Fig. 11. Results for queries Q1.1(n)–Q1.7(n), using exponential2.xml on Galax, including serialization (left) and without serialization (right).

The steepness of the curve for Q2.3(n) has decreased substantially, and the evaluation times for all queries except Q2.3(n) have dropped to well under 50 s, compared to more than 70 s before.

The serialization overhead in the left graph is responsible for the downward curve of Q2.2(n), which produces smaller results for larger queries. However, the right-hand curve for Q2.2(n) also shows a decrease in evaluation time, indicating that the size of the intermediate results matters here. The pull-based execution of Galax allows the serialization cost to be amortized over the rest of the query execution cost, explaining the difference with the more sharply decreasing slope of earlier measurements [19]. When serialization is avoided, queries except Q2.1(n), Q2.2(n) and Q2.5(n) show near constant execution times. The relatively high cost of Q2.3(n), compared with other similar queries, is due to the inability to avoid materialization, caused by the multiple uses of the variable $x. Adding individual navigation steps to Q2.1(n) and Q2.2(n) hardly influences the total cost of the query, since these steps do not affect the cardinality of the result. Hence, the graphs on the right for Q2.1(n) and Q2.2(n) show a constant tendency. For all other queries, the output size grows linearly, explaining the linear tendency in the corresponding plots.

The constructor operation in Galax has a streaming interface, i.e., it returns a SAX token cursor. As a result, the Galax compiler picks a special implementation of the count function, triggering lazy evaluation. In contrast, the query without the constructor does not pick up this approach and uses the default count function, which materializes before actually counting. This explains the staggering difference between Q2.4(n) and Q2.5(n) in the right graph of Fig. 12.
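A sketch of the two counting variants at play here, our rendering of the count() wrapping described in Sections 1 and 3.1, instantiated for the illustrative value n = 4:

  (: counted constructor query (Q2.4(4) wrapped in count()): the <res>
     constructor returns a SAX token cursor, so Galax counts lazily :)
  count(for $x in /t1/t2 return <res>{ $x/*[position() <= 4] }</res>)

  (: counted non-constructor query (Q2.5(4) wrapped in count()): the
     default count function materializes its input before counting :)
  count(for $x in /t1/t2 return $x/*[position() <= 4])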



Fig. 12. Results for queries Q2.1(n)–Q2.6(n), using mixed.xml on Galax, including serialization (left) and without serialization (right). The plot for Q2.3(n) grows linearly to 200 and 89 s for query 9 in the left and right graphs, respectively.



3.7. MonetDB

3.7.1. XPath

The MonetDB engine seemed to have some problems with the use of the fn:data function. The reported error was: type error: can't cast type 'untypedAtomic*' to type 'node*'. Therefore, we ran MonetDB on the same queries without the function call. Normally, this should raise an error, since the serialization of free-standing nodes is not allowed by the XQuery 1.0 Candidate Recommendation, but fortunately MonetDB did not complain about this.
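As an illustration of the workaround, assuming the data() call is simply dropped (our reading of the sentence above), Q1.2(3) would be run as:

  (: Q1.2(3) as specified in Section 2.2 :)
  /t1/t2/t3/data(@id)

  (: variant actually submitted to MonetDB, without the fn:data call :)
  /t1/t2/t3/@id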

As for many other engines, the queries Q1.4(n) are problematic for MonetDB, probably due to the need to materialize the results in order to evaluate the predicate. For n > 13 MonetDB suddenly stops producing output and exits with the following error: glibc detected free(): invalid pointer: 0x8a957008. Q1.7(n) is the only series of queries showing constant scalability in the absence of serialization. All other queries (including Q1.1(n)) show a superlinear tendency for larger values of n. The superlinear tendency for Q1.1(n) is explained by pointing out that the staircase join scales with the sum of the input and output nodes for a step, which grows exponentially here as navigation moves deeper into the document tree. This behavior propagates to all other queries except Q1.7(n) and is confirmed by the experiments at the beginning of this section and the corresponding results in Fig. 3. Despite the superlinear tendency, the predicate queries, with the exception of Q1.4(n), seem to scale well in absolute numbers. However, as Q1.4(n) shows, if the intermediate results were to grow significantly larger for these queries, performance might suffer more. The small difference in absolute numbers here leaves us inconclusive about how the redundant predicate in Q1.6(n) is handled by MonetDB. Finally, the downward curves in the left graph of Fig. 13 show that serialization accounts for a substantial part of the query evaluation cost. The smaller the serialized values get, the faster the queries finish.

3.7.2. XQuery

MonetDB failed to execute query Q2.1(n), reporting a problem with the signature of the string function. The message was: type error: no variant of function fn:string accepts the given argument type(s): attribute id {atomic}. The other queries did finish successfully and their results are depicted in Fig. 14. The downward slope of the plot of Q2.2(n) in the right graph corresponds to the decreasing result size and, as in Galax, must partly be caused by the relatively large intermediate results, since it also shows up when serialization is not performed. There is a slight superlinear tendency for queries Q2.3(n) and Q2.6(n), but it is too small to draw conclusions from. Queries Q2.2(n), Q2.4(n) and Q2.5(n) scale linearly with the size of the output. As the Q2.4(n) plot shows, adding a constructor to Q2.5(n) changes the behavior of the query from constant to linear. Nesting constructors in Q2.6(n) seems considerably more expensive.


Fig. 13. Results for queries Q1.1(n)–Q1.7(n), using exponential2.xml on MonetDB, including serialization (left) and without serialization (right). The plot for Q1.4(n) grows exponentially to 15.4 and 14.8 s for query 13 in the left and right graphs, respectively. For queries beyond Q1.4(13), MonetDB no longer produces output.

Fig. 14. Results for queries Q2.1(n)–Q2.6(n), using mixed.xml on MonetDB, including serialization (left) and without serialization (right). The plot for Q2.3(n) grows linearly to 48.3 and 24.5 s for query 9 in the left and right graphs, respectively.



3.8. Qexo

3.8.1. XPath

Like MonetDB, Qexo was unable to evaluate the queries that contained the fn:data function call; in Qexo, the function is simply not implemented. We resorted to running the same queries as we did for MonetDB.

The Qexo results are depicted in Fig. 15. All queries show superlinear scalability, with Q1.4(n) and Q1.6(n) unsurprisingly showing the worst scalability. The bad scalability of Q1.2(n) (coinciding on the graph with Q1.4(n)) is a bit surprising, considering how little it differs from Q1.1(n), i.e., by just one extra attribute step. The superlinear behavior for Q1.7(n) is unique among all the processors. The experiments in Section 3.2 suggest a naive XPath evaluation strategy. Despite the superlinear tendency, the Qexo engine performs remarkably well in absolute numbers and shows a very low extra cost for serialization.


Fig. 15. Results for queries Q1.1(n)–Q1.7(n), using exponential2.xml on Qexo, including serialization (left) and without serialization (right).

Fig. 16. Results for queries Q2.1(n)–Q2.6(n), using mixed.xml on Qexo, including serialization (left) and without serialization (right). The plot for Q2.3(n) grows linearly to 76.5 and 77.7 s for query 9 in the left and right graphs, respectively.


3.8.2. XQuery

The Qexo results for the XQuery experiments are depicted in Fig. 16. The very small difference between serializing and non-serializing queries is again quite striking. Qexo is the only engine for which Q2.2(n) does not show a downward slope. It seems that Qexo is less sensitive to the size of the (intermediate) results. This may be partly because query processing time dominates serialization time. There is very little difference between Q2.4(n) and Q2.5(n), indicating a marginal cost for a single constructor. Nested constructors, however, are relatively expensive (consider Q2.6(n)).

3.9. QizX

3.9.1. XPath

Fig. 17 shows the results for Qizx. The decreasing times in the left graph for Q1.1(n) and Q1.7(n) correspond to the decreasing volumes of data that need to be materialized. The scalability of the XPath queries Q1.1(n), which use only child steps, is worse than that of the equivalent queries Q1.7(n), which use the descendant step. The reason for this is that the cost of the separate steps scales linearly with the size of the result, as we have seen in the experiment at the beginning of this section.


Fig. 17. Results for queries Q1.1(n)–Q1.7(n), using exponential2.xml on Qizx, including serialization (left) and without serialization (right).

Fig. 18. Results for queries Q2.1(n)–Q2.6(n), using mixed.xml on Qizx, including serialization (left) and without serialization (right). The plot for Q2.3(n) grows linearly to 30.4 and 13.5 s for query 9 in the left and right graphs, respectively.


It is also interesting to see that Q1.3(n) scales better than Q1.2(n), suggesting that Qizx deals well with the positional predicate. The redundant predicate in Q1.6(n) and the positional predicate in Q1.4(n) are clearly not optimized.

3.9.2. XQuery

The Qizx experiments run substantially faster than the Qexo ones, but show surprisingly similar graph shapes in Fig. 18. The only difference is that, in the absence of serialization, there seems to be a very slight superlinear tendency for Q2.3(n). The similarity with other main memory processors suggests similar evaluation strategies, in which case the same conclusions apply, i.e., adding a constructor to a query can change its behavior from constant to linear, and nesting constructors steepens the slope of the query scalability plot. Q2.3(n) is consistently more expensive, since it requires materializing quite large intermediate results.

3.10. Saxon

3.10.1. XPath

The results for Saxon are given in Fig. 19. Q1.1(n) and Q1.7(n) nearly coincide in both graphs, but start diverging from the x-axis for larger values of n in the right graph, suggesting superlinear behavior.


Fig. 19. Results for queries Q1.1(n)–Q1.7(n), using exponential2.xml on Saxon, including serialization (left) and without serialization (right).

Fig. 20. Results for queries Q2.1(n)–Q2.6(n), using mixed.xml on Saxon, including serialization (left) and without serialization (right). The plot for Q2.3(n) grows linearly to 23.4 and 16.2 s for query 9 in the left and right graphs, respectively.


However, the absolute difference between the numbers is too small to draw definitive conclusions from. The evaluation of child and descendant steps seems to scale linearly with the size of the result. Q1.4(n) shows somewhat poorer scalability, although performance is still very good in absolute numbers, especially when compared with some of the secondary storage systems. Q1.3(n) performs better than Q1.2(n) in the absence of serialization, indicating a smart way of dealing with the positional predicate. The redundant predicate in Q1.6(n) seems to be handled quite well, since the graph for Q1.6(n) almost coincides with that of Q1.2(n). Generally speaking, Saxon's combination of scalability and absolute performance makes it stand out, even compared with the secondary storage systems.

3.10.2. XQuery

The results for the Saxon experiments are given in Fig. 20. The graphs are quite similar to those of other engines. Adding a constructor to Q2.5(n) (which runs in near-constant time) makes the query suddenly scale linearly. This may indicate that some buffering is going on. Nesting constructors in Q2.6(n) is slightly more expensive. The inability to avoid buffering in Q2.3(n) has obvious consequences.


4. Concluding remarks

In this section, we summarize some of our experiments' conclusions and hint at avenues for future work.

4.1. Methodology lessons learned

A first observation is that benchmarking needs to be performed for a quite substantial number of data points. Failing to do so comes at the risk of not picking up important facts. For instance, in the Saxon case, exponential behavior of XPath evaluation only becomes apparent for path expressions that are long enough (see Fig. 19).

Another important part of our approach is to vary only one benchmark parameter at a time. For instance, varying both document size and query size at the same time may cause effects that cancel each other out, eventually blurring or hiding the impact of the separate changes. This is why we ran the queries Q1.1(n) on the layered document, in order to see the relation between the scalability of XPath and the size of the result.

We also believe it is important to decompose query evaluation times into their components, as this is the only way to understand the impact of several parameters on the times. Especially serialization time and—at least for main memory processors—document loading times are important. For instance, for Galax (Fig. 11) and Qizx (Fig. 17) we have seen that query evaluation times are sometimes dwarfed by the time needed to serialize the result. It would be quite inappropriate to only report query evaluation time for such queries and systems, ignoring serialization. It would be just as puzzling to report the overall time without checking the respective evolution of its components.

A major problem is that an easy way to report an exact decomposition of the query processing time does not always exist. Lazy evaluation poses some serious challenges here. Further, current system releases make it increasingly difficult or impossible to properly evaluate component times. The names and interpretations of such component times also vary. For instance, while the separation between "execution" and "serialization" seemed clean in all systems, in the case of Qizx this separation is quite arbitrary, as our previous study has shown [19]. Extreme caution is therefore advised to the careful tester: few assumptions should be made on what a system does, or what its intermediate times mean, as long as these assumptions have not been checked with the system developers. This is why we have resorted to tricks like using the count() function to discover the impact of several XQuery processing phases on the total execution time. Special care should be taken that the tested engines do not use this function application to enhance the query performance.
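As an illustration, such a count()-based probe might look as follows; this is a hypothetical sketch, and the path shown merely stands in for whichever benchmark query is under test:

    (: Timing this query includes the cost of serializing every
       matched subtree in the result. :)
    /t1/t2/t3

    (: Wrapping the same expression in count() returns a single
       integer, so serialization cost becomes negligible. The gap
       between the two timings approximates the serialization
       component, assuming the engine does not rewrite or
       short-circuit the count() itself. :)
    count(/t1/t2/t3)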

Another unexpected twist came up when running all the implementations on one machine with a rather limited amount of physical memory. Clearly, some implementations—especially BerkeleyDB and eXist—suffer severe performance penalties from memory swapping, or are simply incapable of completing queries when the system's memory is rather small. This is important information and calls for extending the benchmarking effort toward measuring memory consumption. Clearly, BerkeleyDB is not a good candidate for XML query processing on embedded systems.

4.2. Performance lessons learned

Our study does not warrant a claim of having found the best XQuery processor among the systems we tested. It seems likely that different systems perform well at different features and under different circumstances. For instance, MonetDB (Fig. 13) performs quite badly on Q1.4(n) compared to Qizx (Fig. 17), but performs better for all other queries. Galax performs badly in absolute numbers, but has very favorable scalability for streamable queries.

The impact of the implementation language on performance is quite limited. We might have expected a system implemented in C++ to perform an order of magnitude faster; however, Java and OCaml times are very competitive and sometimes even better. This is reason for optimism, as we would indeed prefer algorithms and efficient techniques to have a bigger impact on performance than the simple choice of the development language.

The performance of child axis navigation, a basic XPath feature, can be assessed by inspecting the execution time of Q1.1(n) on exponential2.xml and layered.xml:

• On exponential2.xml, the number of returned nodes is in O(2^n), and the number of nodes visited by a naïve XPath evaluation strategy is also in O(2^n).
• On layered.xml, the number of returned nodes is constant (and equal to the number of nodes returned by Q1.1(14)), whereas the number of nodes visited by a naïve XPath evaluation strategy is in O(n).

A linearly increasing execution time for Q1.1(n) on layered.xml may show that the engine indeed implements a naïve top-down XPath evaluation strategy, which visits all intermediate nodes between the root and the target nodes. This is probably the case for Qexo (Fig. 3, right graph). On the contrary, for systems such as MonetDB (Fig. 13), we see that the execution time is extremely small (and roughly constant).

Apparently, many engines (BerkeleyDB, Qexo, Qizx, Saxon) use a straightforward XPath implementation strategy that requires some form of intermediate sorting and duplicate removal. Others (eXist, Galax, MonetDB) use more specialized algorithms that avoid these blocking operations.
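A classical shape of expression that triggers these blocking operations is a sequence of descendant steps; the query below is an illustrative sketch, not one of our benchmark queries:

    (: When t2 elements nest within each other, a t3 node below two
       nested t2 ancestors is reached twice, and per-branch results
       may arrive out of document order. A naive engine must then
       sort and eliminate duplicates before returning the result. :)
    //t2//t3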

The impact of imprecise navigation (expressed with // steps) can be assessed by comparing Q1.1(n) with Q1.7(n). In our tests, these queries are equivalent, while the second uses //. There is little difference among these queries for eXist, Galax, MonetDB and Saxon, which shows that these systems do not need to visit irrelevant nodes when evaluating Q1.7(n). BerkeleyDB, Qexo and Qizx exhibit some performance differences, suggesting that // navigation is suboptimal for these engines.
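For illustration, such an equivalent pair could look as follows; this is a hypothetical sketch, as the concrete Q1.1(n)/Q1.7(n) definitions are given earlier in the paper:

    (: Precise navigation: each step names the expected child tag,
       so an engine only needs to touch nodes on the target path. :)
    /t1/t2/t3/t4

    (: Imprecise navigation: on a document where t4 occurs only at
       this depth, the result is identical, but a weak engine may
       scan all descendants of the root to locate the t4 nodes. :)
    //t4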

Early-stop optimizations allow the evaluation of queries like Q1.3(n) to proceed much faster than Q1.2(n). CDuce is a clear winner here. Results for MonetDB and Qexo show that Q1.3(n) actually runs slower than Q1.2(n), indicating that these optimizations are not picked up.
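The pattern at stake can be sketched as follows; these are hypothetical queries (the actual Q1.2(n)/Q1.3(n) definitions appear earlier in the paper), assuming the positional variant selects the first match:

    (: Without a positional predicate, all t3 children must be
       visited to build the full result. :)
    /t1/t2/t3

    (: With [1], an engine supporting early stop can abandon the
       scan of each t2 node's children after the first t3 match. :)
    /t1/t2/t3[1]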

Existential branches or path predicates exhibit different performance depending on the tested system. This can be assessed by comparing Q1.2(n) with Q1.6(n), as well as comparing Q1.2(n) with Q1.5(n). Each of these pairs contains equivalent queries on the tested documents. For Galax, MonetDB and Qexo, Q1.6(n), featuring an existential test, is more expensive than Q1.2(n), whereas for BerkeleyDB and Saxon the existential branch seems to cause no overhead at all. In Galax, Q1.5(n) is more expensive, since it prohibits streaming evaluation, but for the other engines this is not the case, probably due to efficient application of shortcut evaluation.
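A hypothetical pair of this kind is shown below; on a document where every t2 has a t3 child, both queries return the same nodes, so any timing gap is attributable to how the engine handles the redundant existential test:

    (: Plain downward path. :)
    /t1/t2/t3

    (: The same path with an existential branch: [t3] first filters
       t2 nodes by the existence of a t3 child, and the final step
       then navigates to that child again. :)
    /t1/t2[t3]/t3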

Result serialization times corresponding to our queries are an important component of total running times, and in some cases the dominant one (see, for instance, Fig. 13). This depends of course on the chosen queries and data; however, our tests stress how important this (often ignored) part of query execution time can be. Overall, serialization time reflects the number and size of the returned subtrees. The impact of some OCaml (or system programming) issues leads either to surprising performance variations (Fig. 11), or to unfeasible queries, as for eXist (Figs. 9 and 10).

Complex result construction, such as required by Q2.3(n) and Q2.6(n), is related to the problem of serializing large results; Q2.6(n) constructs more complex new trees, but of smaller size. Interestingly, Galax (Fig. 12) handles Q2.6(n) well, whereas BerkeleyDB and eXist do not (missing measurements in Figs. 10 and 6).

New node creation is also an interesting feature in our measures. Queries Q2.4(n) and Q2.5(n) differ only by the fact that Q2.4(n) constructs new trees (which requires copying some t2 descendants), whereas Q2.5(n) returns these nodes from the original documents. Several strategies are possible. Galax copies the nodes in a lazy manner, avoiding the full materialization of the copied trees. All other engines—except Qexo, apparently—copy and materialize the subtrees under the constructed nodes, causing Q2.4(n) to be substantially less performant.
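The contrast between the two behaviors can be sketched as follows; these are hypothetical queries in the spirit of Q2.4(n)/Q2.5(n), whose real definitions are given earlier in the paper:

    (: Returning the original nodes: no copying is needed, so this
       form is typically much cheaper. :)
    for $x in /t1/t2 return $x

    (: Constructing a new element: XQuery semantics require the t2
       subtrees to be copied into the new tree, which an eager
       engine materializes in full. :)
    for $x in /t1/t2 return <copy>{ $x }</copy>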

Finally, we also remark on the good performance of main memory implementations for the moderate-sized documents used in the experiments.

4.3. Future work

This paper is only a starting point in a greater effort, which targets a better understanding of the drawbacks and advantages of XPath/XQuery implementation strategies. An important step toward such a better understanding is the development of many more microbenchmarks, to reveal detailed information regarding the performance of implementations at isolated language features. We also need to encourage developers to use our microbenchmarks, which can help—as proven before—to identify bugs and bottlenecks in their implementations.

Feedback from the experiences in running the experiments in this paper will be used to improve the XCheck platform. One thing that we can take away from the results of our tests is that a clearer separation of query evaluation times and serialization times is desirable. This could be included in the platform mentioned above by doing an additional run without serialization and/or a run for measuring document loading times. Additionally, tools for measuring memory usage would be a very welcome addition.

Important additional tests to be performed are system stress tests, which identify the operational boundaries of XPath and XQuery implementations. Other interesting aspects to be tested include: the impact of XML Schema information on navigation queries; the handling of atomic values, value joins, etc.; and the handling of increasing branching factors by XPath.

References

[1] L. Afanasiev, I. Manolescu, P. Michiels, MemBeR: a micro-benchmark repository for XQuery, in: XSym, Lecture Notes in Computer Science, vol. 3671, Springer, Berlin, 2005, pp. 144–161.

[2] A.R. Schmidt, F. Waas, M.L. Kersten, D. Florescu, M.J. Carey, I. Manolescu, R. Busse, Why and how to benchmark XML databases, SIGMOD Record 30 (3) (2001) 27–32.

[3] T. Böhme, E. Rahm, XMach-1: a benchmark for XML data management, in: Proceedings of BTW2001, Oldenburg, 7–9 March, Springer, Berlin, 2001.

[4] S. Bressan, G. Dobbie, Z. Lacroix, M. Lee, Y. Li, U. Nambiar, B. Wadhwa, X007: applying 007 benchmark to XML query processing tool, in: CIKM, ACM, New York, 2001, pp. 167–174.

[5] B. Yao, T. Özsu, N. Khandelwal, XBench benchmark and performance testing of XML DBMSs, in: ICDE, IEEE Computer Society, Silver Spring, 2004, pp. 621–633.

[6] L. Afanasiev, M. Franceschet, M. Marx, E. Zimuel, XCheck: a platform for benchmarking XQuery engines (demo), in: VLDB, 2006.

[7] L. Afanasiev, M. Marx, XCheck, an automated XQuery benchmark tool. <http://ilps.science.uva.nl/Resources/XCheck>, 2005.

[8] L. Afanasiev, I. Manolescu, C. Miachon, P. Michiels, Micro-benchmarking XQuery experiments page. <http://www-rocq.inria.fr/~manolesc/microbenchmarks.html>, 2006.

[9] L. Afanasiev, I. Manolescu, P. Michiels, MemBeR: a micro-benchmark repository. <http://ilps.science.uva.nl/Resources/MemBeR/>, 2005.

[10] L. Mignet, D. Barbosa, P. Veltri, The XML web: a first study, in: WWW Conference, 2003.

[11] World Wide Web Consortium, XML Path Language (XPath) Version 2.0—W3C Working Draft, 2005. <http://www.w3.org/TR/xpath20/>.

[12] S. Boag, D. Chamberlin, M. Fernández, D. Florescu, J. Robie, J. Siméon, XQuery 1.0: An XML Query Language, W3C Working Draft, April 2005. <http://www.w3.org/TR/xquery>.

[13] A. Frisch, G. Castagna, V. Benzaken, Semantic subtyping, in: Proceedings, Seventeenth Annual IEEE Symposium on Logic in Computer Science, IEEE Computer Society Press, Silver Spring, 2002, pp. 137–146.

[14] V. Benzaken, G. Castagna, C. Miachon, A full pattern-based paradigm for XML query processing, in: PADL 05, 7th International Symposium on Practical Aspects of Declarative Languages, Lecture Notes in Computer Science, vol. 3350, Springer, Berlin, January 2005, pp. 235–252.

[15] H. Hosoya, B. Pierce, XDuce: a typed XML processing language, ACM Trans. Internet Technol. 3 (2) (2003) 117–148.

[16] M.F. Fernández, J. Hidders, P. Michiels, J. Siméon, R. Vercammen, Optimizing sorting and duplicate elimination in XQuery path expressions, in: K.V. Andersen, J.K. Debenham, R. Wagner (Eds.), DEXA, Lecture Notes in Computer Science, vol. 3588, Springer, Berlin, 2005, pp. 554–563.

[17] T. Grust, M. van Keulen, J. Teubner, Staircase join: teach a relational DBMS to watch its (axis) steps, in: J.C. Freytag, P.C. Lockemann, S. Abiteboul, M.J. Carey, P.G. Selinger, A. Heuer (Eds.), VLDB, Morgan Kaufmann, Los Altos, 2003, pp. 524–535.

[18] N. Bruno, N. Koudas, D. Srivastava, Holistic twig joins: optimal XML pattern matching, in: M.J. Franklin, B. Moon, A. Ailamaki (Eds.), SIGMOD, ACM, New York, 2002, pp. 310–321.

[19] I. Manolescu, C. Miachon, P. Michiels, Towards XML microbenchmarking, in: First International Workshop on Performance and Evaluation of Data Management Systems (EXPDB), Chicago, 2006.

[20] M. Stark, M. Fernández, P. Michiels, J. Siméon, XQuery streaming à la carte, in: ICDE, 2007, to appear.