
XML and Related Technologies certification prep, Part 3: XML processing

Explore how to parse and validate XML documents plus how to use XQuery

Skill Level: Intermediate

Mark Lorenz ([email protected])
Senior Application Architect
Hatteras Software, Inc.

26 Sep 2006

Parsing and validation represent the core of XML. Knowing how to use these capabilities well is vital to the successful introduction of XML to your project. This tutorial on XML processing teaches you how to parse and validate XML files as well as use XQuery. It is the third tutorial in a series of five tutorials that you can use to help prepare for the IBM certification Test 142, XML and Related Technologies.

Section 1. Before you start

In this section, you'll find out what to expect from this tutorial and how to get the most out of it.

About this series

This series of five tutorials helps you prepare to take the IBM certification Test 142, XML and Related Technologies, to attain the IBM Certified Solution Developer - XML and Related Technologies certification. This certification identifies an intermediate-level developer who designs and implements applications that make use of XML and related technologies such as XML Schema, Extensible Stylesheet Language Transformation (XSLT), and XPath. This developer has a strong understanding of XML fundamentals; has knowledge of XML concepts and related technologies; understands how data relates to XML, in particular with issues associated with information modeling, XML processing, XML rendering, and Web services; has a thorough knowledge of core XML-related World Wide Web Consortium (W3C) recommendations; and is familiar with well-known best practices.

© Copyright IBM Corporation 1994, 2008. All rights reserved.

Anyone working in software development for the last few years is aware that XML provides cross-platform capabilities for data, just as the Java® programming language does for application logic. This series of tutorials is for anyone who wants to go beyond the basics of using XML technologies.

About this tutorial

This tutorial is the third in the "XML and Related Technologies certification prep" series that takes you through the key aspects of effectively using XML technologies on Java projects. This third tutorial focuses on XML processing, that is, how to parse and validate XML documents. It lays the groundwork for Part 4, which focuses on transformation, including the use of XSLT, XPath, and Cascading Style Sheets (CSS).

This tutorial is written for Java programmers who have a basic understanding of XML and whose skills and experience are at a beginning to intermediate level. You should have a general familiarity with defining, validating, and reading XML documents, as well as a working knowledge of the Java language.

Objectives

After completing this tutorial, you will know how to:

• Parse XML documents using the Simple API for XML 2 (SAX2) and Document Object Model 2 (DOM2) parsers

• Validate XML documents against Document Type Definitions (DTDs) and XML Schemas

• Access XML content from databases using XQuery

Prerequisites

This tutorial is written for developers who have a background in programming and scripting and who have an understanding of basic computer-science models and data structures. You should be familiar with the following XML-related, computer-science concepts: tree traversal, recursion, and reuse of data. You should be familiar with Internet standards and concepts, such as Web browser, client-server, documenting, formatting, e-commerce, and Web applications. Experience designing and implementing Java-based computer applications and working with relational databases is also recommended.

developerWorks® ibm.com/developerWorks


System requirements

To run the examples in this tutorial, you need a Linux® or Microsoft® Windows® box with at least 50MB of free disk space and administrative access to install software. The tutorial uses, but does not require, the following software:

• Java software development kit (JDK) 1.4.2 or later

• Eclipse 3.1 or later

• XMLBuddy 2.0 or later (Note: Some portions of the series use capabilities of XMLBuddy Pro, which is not free.)

See Resources for links to download the above software.

Section 2. Parsing XML documents

You can parse an XML document in multiple ways (see Part 1 of this series, which focuses on architecture), but the SAX parser and the DOM parser constitute the primary ways. Part 1 features a high-level comparison of the two (see Resources).

StAX
A new API, called Streaming API for XML (StAX), is to be released in late 2006. It is a pull API, as opposed to SAX's push model, so it keeps control with the application rather than the parser. StAX also includes a writer API, so you can produce a modified copy of a document as you read it. Read more in "An Introduction to StAX" (see Resources).
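As a rough sketch of the pull model, the hypothetical StaxTitles class below uses the javax.xml.stream API, which ships with Java SE 6 and later (an assumption about your JDK, since this tutorial targets JDK 1.4.2). The application drives the loop and decides when to ask the reader for the next event; the inline sample is a trimmed-down version of the DVD catalog.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical StAX sketch: pull the text of each <title> out of a catalog. */
public class StaxTitles {

    public static List<String> titles(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            // The application, not the parser, asks for each event with next().
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads text-only content and leaves the
                    // reader positioned on the matching END_ELEMENT.
                    result.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<catalog>"
                + "<dvd code=\"_1234567\"><title>Terminator 2</title></dvd>"
                + "<dvd code=\"_7654321\"><title>The Matrix</title></dvd>"
                + "</catalog>";
        System.out.println(titles(xml)); // prints [Terminator 2, The Matrix]
    }
}
```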

XML instance document

This tutorial uses a store's catalog of available DVDs for purchase as the document throughout. Conceptually, the catalog contains a collection of DVDs with information about each DVD associated with it. The actual document is a short catalog with only four DVDs in it, but it has enough complexity for you to learn about XML processing, including validation. Listing 1 shows the file.

Listing 1. The XML instance document for the DVD catalog

<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "dvd.dtd">
<!-- DVD inventory -->
<catalog>
  <dvd code="_1234567">
    <title>Terminator 2</title>
    <description>
      A shape-shifting cyborg is sent back from the future
      to kill the leader of the resistance.
    </description>
    <price>19.95</price>
    <year>1991</year>
  </dvd>
  <dvd code="_7654321">
    <title>The Matrix</title>
    <price>12.95</price>
    <year>1999</year>
  </dvd>
  <dvd code="_2255577" genre="Drama">
    <title>Life as a House</title>
    <description>
      When a man is diagnosed with terminal cancer,
      he takes custody of his misanthropic teenage son.
    </description>
    <price>15.95</price>
    <year>2001</year>
  </dvd>
  <dvd code="_7755522" genre="Action">
    <title>Raiders of the Lost Ark</title>
    <price>14.95</price>
    <year>1981</year>
  </dvd>
</catalog>

Using the SAX parser

As Part 1 of this series discussed, the SAX parser is an event-based parser. This means that the parser sends events to callback methods as it parses a document (see Figure 1). For simplicity, Figure 1 doesn't show all the events that would actually occur.

Figure 1. SAX parser events


These events are pushed out to the application in real time, as the parser moves across the document contents. One benefit of this processing model is that you can handle large documents with relatively little memory. A downside is that you have more work to do to handle all these events.

The org.xml.sax package contains a set of interfaces. One of them, XMLReader, is the interface to the parser. You can set up for parsing like this:

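import java.io.IOException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

try {
    XMLReader parser = XMLReaderFactory.createXMLReader();
    parser.parse( "myDocument.xml" ); //system ID: a URI or file path
} catch ( SAXParseException e ) {
    //document is not well-formed
} catch ( SAXException e ) {
    //for example, could not find an implementation of XMLReader
} catch ( IOException e ) {
    //problem reading document file
}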

Apache Xerces2 parser
If you need a parser, you can download the open source Apache Xerces2 parser from The Apache Software Foundation Web site (see Resources).


Tip: Reuse the parser instance if possible, because creating a parser is expensive. A parser instance is not thread-safe, however, so if you have multiple threads running, give each thread its own instance or take instances from a resource pool.
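One way to follow this tip is sketched below, assuming each thread parses its documents one after another: keep one XMLReader per thread in a ThreadLocal, so each instance is reused but never shared between threads. The class ParserPerThread is hypothetical, and it obtains readers through JAXP's SAXParserFactory rather than XMLReaderFactory, which recent JDKs mark as deprecated.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: reuse one SAX parser per thread instead of one per document. */
public class ParserPerThread {

    private static final SAXParserFactory FACTORY = SAXParserFactory.newInstance();

    // Each thread lazily gets its own XMLReader; no reader is shared across threads.
    private static final ThreadLocal<XMLReader> READER =
            ThreadLocal.withInitial(ParserPerThread::newReader);

    private static XMLReader newReader() {
        try {
            synchronized (FACTORY) { // factories are not guaranteed to be thread-safe
                return FACTORY.newSAXParser().getXMLReader();
            }
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Cannot create a SAX parser", e);
        }
    }

    /** Returns the calling thread's parser: the same instance on every call. */
    public static XMLReader reader() {
        return READER.get();
    }

    /** Parses a document with the calling thread's reused parser. */
    public static void parse(String xml, DefaultHandler handler)
            throws SAXException, IOException {
        XMLReader parser = reader();
        parser.setContentHandler(handler);
        parser.setErrorHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
    }
}
```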

This is all well and good so far, but how does your application get events from the parser? I'm glad you asked.

Handling SAX events

To receive events from the parser, you implement the ContentHandler interface. This interface has a number of methods that you can implement to process your document. Alternatively, if you only want to handle one or two callbacks, you can subclass DefaultHandler, which implements all the ContentHandler methods (doing nothing), and override only the methods you need.
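A minimal sketch of the DefaultHandler approach: the hypothetical TitleCollector below overrides only startElement, characters, and endElement to gather the text of every title element, inheriting the do-nothing versions of all other callbacks. Because a parser may split one element's text across several characters() calls, the handler accumulates text in a buffer rather than assuming a single call per element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects the text of every <title>; other callbacks do nothing. */
public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <title>

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("title".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may run several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            text = null;
        }
    }

    public List<String> getTitles() {
        return titles;
    }

    /** Parses xml and returns the titles it contains. */
    public static List<String> collect(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TitleCollector handler = new TitleCollector();
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.getTitles();
    }
}
```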

Either way, you write logic to do whatever processing you require upon receiving startElement, characters, endDocument, and other callback methods invoked by the SAX parser. You can see all the method calls for a document, in the order in which they occur, on pages 351-355 of XML in a Nutshell, Third Edition (see Resources).

The callback events are the normal events from a document as it's being parsed. You can also handle validity callbacks by implementing an ErrorHandler. I'll discuss this topic after I go over validation, so stay tuned.

To learn more about parsing with SAX, check out Chapter 20 of XML in a Nutshell, Third Edition or read "Serial Access with the Simple API for XML (SAX)" (see Resources).

SAX parser exception handling

By default, the parser ignores validity errors; only a well-formedness (fatal) error stops parsing, with a SAXParseException. To take action on an invalid document, you must implement an ErrorHandler (note that DefaultHandler implements this interface as well as ContentHandler) and define an error() method:

public class SAXEcho extends DefaultHandler {
    ...
    //Handle validity errors
    public void error( SAXParseException e ) {
        echo( e.getMessage() );
        echo( "Line " + e.getLineNumber() +
              " Column " + e.getColumnNumber() );
    }
    ...
}

Then you must turn on the validation feature:

parser.setFeature( "http://xml.org/sax/features/validation", true );

Finally, register your handler with the parser:

parser.setErrorHandler( saxEcho );


Remember, parser is an instance of XMLReader. The parser calls the error() method if the document violates a schema (DTD or XML Schema) rule.

Other ErrorHandler methods
ErrorHandler also has warning() and fatalError() methods, for conditions that the XML specification does not classify as errors and for well-formedness violations, respectively. You don't normally need to do anything in these methods.
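To watch error() fire without separate DTD files, the sketch below validates against an internal DTD subset passed in with the document (an assumption made only to keep the example self-contained; the tutorial's catalog uses an external DTD). The hypothetical ValidityCollector records each validity error and lets parsing continue, while its fatalError() rethrows so that a well-formedness violation still stops the parse. It turns validation on through SAXParserFactory.setValidating(true), which has the same effect as setting the http://xml.org/sax/features/validation feature on the reader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Hypothetical ErrorHandler: records validity errors instead of ignoring them. */
public class ValidityCollector implements ErrorHandler {
    private final List<String> errors = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // Not an error under the XML specification; ignored in this sketch.
    }

    @Override
    public void error(SAXParseException e) {
        // Validity violation: record it and keep parsing.
        errors.add("Line " + e.getLineNumber() + " Column " + e.getColumnNumber()
                + ": " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        throw e; // well-formedness violation: stop parsing
    }

    public List<String> getErrors() {
        return errors;
    }

    /** Validates xml against its DOCTYPE and returns the recorded validity errors. */
    public static List<String> validate(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation
        XMLReader parser = factory.newSAXParser().getXMLReader();
        ValidityCollector handler = new ValidityCollector();
        parser.setErrorHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.getErrors();
    }
}
```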

Echoing SAX events

As an exercise for the SAX parser skills you've learned, use the SAXEcho.java code in Listing 2 to output the parser events for the catalog.xml file.

Listing 2. Echoing SAX events

package com.xml.tutorial;

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

/**
 * A handler for SAX parser events that outputs certain event
 * information to standard output.
 *
 * @author mlorenz
 */
public class SAXEcho extends DefaultHandler {
    public static final String XML_DOCUMENT_DTD = "catalogDTD.xml"; //validates via catalog.dtd
    public static final String XML_DOCUMENT_XSD = "catalogXSD.xml"; //validates via catalog.xsd
    public static final String NEW_LINE = System.getProperty("line.separator");
    protected static Writer writer;

    /**
     * Constructor
     */
    public SAXEcho() {
        super();
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        //-- Set up my instance to handle SAX events
        DefaultHandler eventHandler = new SAXEcho();
        //-- Echo to standard output
        writer = new OutputStreamWriter( System.out );
        try {
            //-- Create a SAX parser
            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.setContentHandler( eventHandler );
            parser.setErrorHandler( eventHandler );
            parser.setFeature(
                "http://xml.org/sax/features/validation", true );
            //-- Validation via DTD --
            echo( "=== Parsing " + XML_DOCUMENT_DTD + " ===" + NEW_LINE );


            //-- Parse my XML document, reporting DTD-related errors
            parser.parse( XML_DOCUMENT_DTD );
            //-- Validation via XSD --
            parser.setFeature(
                "http://apache.org/xml/features/validation/schema", true );
            echo( NEW_LINE + NEW_LINE + "=== Parsing " +
                  XML_DOCUMENT_XSD + " ===" + NEW_LINE );
            //-- Parse my XML document, reporting XSD-related errors
            parser.parse( XML_DOCUMENT_XSD );
        } catch (SAXException e) {
            System.out.println( "Parsing Exception occurred" );
            e.printStackTrace();
        } catch (IOException e) {
            System.out.println( "Could not read the file" );
            e.printStackTrace();
        }
        System.exit(0);
    }

    //-- Implement SAX callback events of interest (default is do nothing) --

    /* (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#startElement(java.lang.String,
     *      java.lang.String, java.lang.String, org.xml.sax.Attributes)
     * @see org.xml.sax.ContentHandler interface
     * Element and its attributes
     */
    @Override
    public void startElement( String uri,
                              String localName,
                              String qName,
                              Attributes attributes )
            throws SAXException {
        if( localName.length() == 0 )
            echo( "<" + qName );
        else
            echo( "<" + localName );
        if( attributes != null ) {
            for( int i=0; i < attributes.getLength(); i++ ) {
                if( attributes.getLocalName(i).length() == 0 ) {
                    echo( " " + attributes.getQName(i) +
                          "=\"" + attributes.getValue(i) + "\"" );
                }
            }
        }
        echo( ">" );
    }

    /* (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#endElement(java.lang.String,
     *      java.lang.String, java.lang.String)
     * End tag
     */
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        echo( "</" + qName + ">" );
    }

    /* (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#characters(char[], int, int)
     * Character data inside an element
     */
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String s = new String(ch, start, length);
        echo(s);
    }

    //-- Add additional event echoing at your discretion --

    /**
     * Output aString to standard output
     * @param aString
     */
    protected static void echo( String aString ) {
        try {
            writer.write( aString );


            writer.flush();
        } catch (IOException e) {
            System.out.println( "I/O error during echo()" );
            e.printStackTrace();
        }
    }

    /* (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#error(org.xml.sax.SAXParseException)
     * @see org.xml.sax.ErrorHandler interface
     */
    @Override
    public void error(SAXParseException e) throws SAXException {
        echo( NEW_LINE + "*** Failed validation ***" + NEW_LINE );
        super.error(e);
        echo( "* " + e.getMessage() + NEW_LINE +
              "* Line " + e.getLineNumber() +
              " Column " + e.getColumnNumber() + NEW_LINE +
              "*************************" + NEW_LINE );
        try {
            Thread.sleep( 10 );
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }
    }
}

You can use the code in SAXEcho.java to see how SAX parsing comes together. Note that this code does not handle all events, so not everything from the original document is echoed (see Listing 3). Take a look at the ContentHandler interface to see what other callbacks you might receive.

Listing 3. Output from SAXEcho execution

=== Parsing catalogDTD.xml ===
<catalog><dvd><title>Terminator 2</title><description>
A shape-shifting cyborg is sent back from the future
to kill the leader of the resistance.
</description><price>19.95</price><year>1991</year></dvd><dvd><title>The Matrix</title><price>10.95</price>
<year>1999</year></dvd><dvd><title>Life as a House</title><description>When a man is diagnosed with terminal cancer,
he takes custody of his misanthropic teenage son.
</description><price>15.95</price><year>2001</year></dvd><dvd><title>Raiders of the Lost Ark</title><price>14.95</price><year>1981</year></dvd></catalog>

=== Parsing catalogXSD.xml ===
<catalog><dvd>
<title>Terminator 2</title><description>A shape-shifting cyborg is sent back from the future
to kill the leader of the resistance.</description><price>19.95</price><year>1991</year>
</dvd><dvd>
<title>The Matrix</title><price>10.95</price><year>1999</year>
</dvd><dvd>
<title>Life as a House</title><description>When a man is diagnosed with terminal cancer,
he takes custody of his misanthropic teenage son.
</description><price>15.95</price><year>2001</year>
</dvd><dvd>
<title>Raiders of the Lost Ark</title><price>14.95</price><year>1981</year>
</dvd></catalog>

Using the DOM parser

In contrast to the SAX parser, the DOM parser builds a tree structure based on the XML document contents (see Figure 2). For simplicity, some parsing actions are not shown.

Figure 2. DOM parser tree

DOM Level 2 doesn't specify an interface for the XML parser, so different vendors have different parser classes. I'll continue to use the Xerces parser, which has a DOMParser class.

You set up a DOM parser like this:

import java.io.IOException;
import org.apache.xerces.parsers.DOMParser; //Apache Xerces2
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

DOMParser parser = new DOMParser();
try {
    parser.parse( "myDocument.xml" );
    Document document = parser.getDocument();
} catch (DOMException e) {
    // a DOM operation failed (for example, an invalid tree manipulation)
} catch (SAXException e) {
    // well-formedness (or other parsing) action here
} catch (IOException e) {
    // take I/O action here
}
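If you would rather not tie your code to the Xerces DOMParser class, the standard JAXP API offers a vendor-neutral route. The sketch below (the class DomLoader is hypothetical) builds a Document with DocumentBuilderFactory and registers an ErrorHandler, since a validating DOM build reports validity errors through that callback rather than as a DOMException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: build a DOM tree through JAXP instead of a vendor class. */
public class DomLoader {

    public static Document load(String xml, boolean validate)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validate); // DTD validation, like the SAX validation feature
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // this sketch treats validity errors as failures
            }
            // fatalError() inherited from DefaultHandler already throws
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```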

Traversing the DOM tree

DOM incurs an expense in time and memory to construct an entire document tree. The payback comes from the many ways that you can traverse and manipulate the document's content using the tree structure. Figure 3 shows a portion of the DVD catalog document.

Figure 3. Traversing the DOM tree

The tree has a root, which you can access through the Document.getDocumentElement() method. From any Node, you can use Node.getChildNodes() to get a NodeList of the children of the current Node. Note that attributes are not children of the element that contains them; you reach them through Node.getAttributes(). You can create new Nodes, append them, insert them, locate them by name, and remove them. These are just a few of the available capabilities.
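The navigation methods described here are enough for a recursive, preorder walk. In this sketch, the hypothetical DomWalk class visits children with getChildNodes(), records each element's name, and reads attributes separately through getAttributes(), because they never appear among the children.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: recursive preorder traversal with core DOM Node methods. */
public class DomWalk {

    /** Appends "name" for each element and "@name" for each of its attributes. */
    public static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(node.getNodeName());
            // Attributes are not children; they come from getAttributes().
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                out.add("@" + attrs.item(i).getNodeName());
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out); // text and comment nodes add nothing
        }
    }

    public static List<String> preorder(Element root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }
}
```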

One of the more powerful methods is Document.getElementsByTagName(), which returns a NodeList of all the descendant elements with a given tag name, in document order. The DOM tree is available on the client as well as the server.
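For example, summing every price in the catalog needs a single getElementsByTagName() call and no explicit recursion. The hypothetical PriceTotal helper below assumes each price element holds a plain decimal number, as in Listing 1, and uses BigDecimal to avoid floating-point rounding.

```java
import java.math.BigDecimal;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reach elements at any depth with getElementsByTagName(). */
public class PriceTotal {

    public static BigDecimal total(Document document) {
        // Every <price> below the root, in document order
        NodeList prices = document.getElementsByTagName("price");
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = 0; i < prices.getLength(); i++) {
            // getTextContent() (DOM Level 3) returns the element's text
            sum = sum.add(new BigDecimal(prices.item(i).getTextContent().trim()));
        }
        return sum;
    }
}
```

For the four DVDs in Listing 1, total() would return 63.80.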


Client traversal

You can traverse the DOM tree in the client, and you can validate actions on an XHTML page through JavaScript from within the browser. For example, the client might need to find out whether a title with a particular value already exists:

//-- make sure a new DVD's title is unique
var titles = document.getElementsByTagName("title");
var newTitleValue = newTitle.textContent;
var nextTitle;
for( var i=0; i < titles.length; i++ ) {
    nextTitle = titles.item(i); //NodeList access by index
    if( nextTitle.textContent === newTitleValue ) {
        //take some action
    }
}

Server traversal

On the server, you will often need to manipulate the tree, for example to add a new child to a Node:

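//-- add a new DVD with aName and description
public void createNewDvd( String aName, String description ) {
    Element catalog = document.getDocumentElement(); //root
    Element newDvd = document.createElement( "dvd" );
    Element title = document.createElement( "title" );
    title.appendChild( document.createTextNode( aName ) );
    Element dvdDescription = document.createElement( "description" );
    dvdDescription.appendChild( document.createTextNode( description ) );
    newDvd.appendChild( title );
    newDvd.appendChild( dvdDescription );
    catalog.appendChild( newDvd ); //as last element
}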
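A self-contained variant of the same idea, assuming the catalog structure of Listing 1: the hypothetical addDvd method below builds a dvd element with a code attribute and title, price, and year children, then appends it as the last child of the root. Text goes in through createTextNode(), which returns a Text node rather than an Element, and these DOM calls do not check your changes against the DTD, so keeping the children in the order the catalog expects is up to you.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper: append a new <dvd> to the root of a catalog Document. */
public class CatalogEditor {

    /** Creates <childName>text</childName> and appends it to parent. */
    private static void appendTextElement(Document doc, Element parent,
                                          String childName, String text) {
        Element child = doc.createElement(childName);
        child.appendChild(doc.createTextNode(text)); // a Text node, not an Element
        parent.appendChild(child);
    }

    public static Element addDvd(Document doc, String code, String title,
                                 String price, String year) {
        Element dvd = doc.createElement("dvd");
        dvd.setAttribute("code", code);
        appendTextElement(doc, dvd, "title", title);
        appendTextElement(doc, dvd, "price", price);
        appendTextElement(doc, dvd, "year", year);
        doc.getDocumentElement().appendChild(dvd); // last child of <catalog>
        return dvd;
    }
}
```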

XHTML as an alternative
This tutorial works with a data document, but the document could easily be an XHTML page, in which case you'd see Nodes such as head, body, p, td, and li.

Caution: Make sure to access the tree through DOM interfaces, such as NodeList or NamedNodeMap, rather than through values you cached earlier. The DOM tree is dynamic, meaning it is updated immediately based on changes you're making, and NodeList and NamedNodeMap objects are live views of it, so if you use local variables to cache values, they might be wrong. For example, NodeList.getLength() returns a different value after a call to removeChild().
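The sketch below (hypothetical names) demonstrates the pitfall with a live NodeList: removing children while counting upward skips every other child, because the list shrinks and shifts after each removeChild(). Walking the list backward avoids the problem.

```java
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical demo: removing nodes while iterating a live NodeList. */
public class LiveListDemo {

    /** Buggy: indexes shift after each removal, so some children survive. */
    public static void removeAllForward(Node parent) {
        NodeList children = parent.getChildNodes(); // live list
        for (int i = 0; i < children.getLength(); i++) {
            parent.removeChild(children.item(i));
        }
    }

    /** Correct: walking from the end means removals never shift unvisited items. */
    public static void removeAllBackward(Node parent) {
        NodeList children = parent.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            parent.removeChild(children.item(i));
        }
    }
}
```

On an element with four child elements and no whitespace between them, removeAllForward() leaves two of the children in place, while removeAllBackward() removes all four.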

DOM parser exception handling

DOM3
DOM Level 3 added a DOMErrorHandler interface, which provides a callback mechanism for errors found while a document is loaded or normalized, as an alternative to handling exceptions. Here is some example code:


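DOMParser parser = new DOMParser();
parser.parse( "myDocument.xml" );
DOMConfiguration domConfig = parser.getDocument().getDomConfig();
domConfig.setParameter( "error-handler", handler ); //handler implements DOMErrorHandler
parser.getDocument().normalizeDocument(); //errors are reported to handler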

The class that implements the DOMErrorHandler interface has a handleError(DOMError error) method, which returns true to continue processing or false to stop processing (fatal errors always stop processing).
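For a runnable version, assuming a JDK whose built-in DOM implementation supports DOM Level 3 Load and Save (current JDKs do), the sketch below registers a handler through the error-handler parameter of an LSParser's DOMConfiguration. The class CollectingHandler is hypothetical; it copies each error's message and severity out of the callback, since an implementation may reuse the DOMError object between calls.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

/** Hypothetical DOMErrorHandler that records what it is told. */
public class CollectingHandler implements DOMErrorHandler {
    private final List<String> messages = new ArrayList<>();
    private short worstSeverity = 0; // 0 means no error seen yet

    @Override
    public boolean handleError(DOMError error) {
        // Copy the details out; the DOMError object may be reused.
        messages.add(error.getMessage());
        worstSeverity = (short) Math.max(worstSeverity, error.getSeverity());
        // true asks to continue; a fatal error stops processing regardless
        return error.getSeverity() != DOMError.SEVERITY_FATAL_ERROR;
    }

    public List<String> getMessages() { return messages; }

    public short getWorstSeverity() { return worstSeverity; }

    /** Parses xml with a DOM Level 3 LSParser and returns the handler it reported to. */
    public static CollectingHandler parse(String xml) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        CollectingHandler handler = new CollectingHandler();
        parser.getDomConfig().setParameter("error-handler", handler);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
        } catch (LSException e) {
            // Thrown when the document cannot be loaded; the handler has the details.
        }
        return handler;
    }
}
```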

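DOM methods throw a DOMException when an operation cannot be performed, such as an illegal change to the tree. DOMException is a RuntimeException, since some languages don't support checked exceptions, but you should still catch it (or deliberately let it propagate) wherever it can occur in your Java code. Problems found while parsing, by contrast, are reported by the Xerces DOMParser as a SAXException or to a registered ErrorHandler.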

To detect manipulation problems, check the code of a DOMException. These codes tell you what is wrong, such as an attempt to modify the type of an underlying object (DOMException.INVALID_MODIFICATION_ERR) or a reference to a Node in a context where it does not exist (DOMException.NOT_FOUND_ERR). The DOMException section within Chapter 9 of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX offers a complete list of DOMException codes with explanations (see Resources).
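A small sketch of reacting to the code, with hypothetical class and method names: calling removeChild() with an element that is not a child of the target raises a DOMException whose code is NOT_FOUND_ERR, and a switch on e.code can handle each case separately.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Element;

/** Hypothetical helper: inspect DOMException.code to decide how to react. */
public class DomErrorCodes {

    /** Tries to remove child from parent and describes the outcome. */
    public static String tryRemove(Element parent, Element child) {
        try {
            parent.removeChild(child);
            return "removed";
        } catch (DOMException e) {
            switch (e.code) {
                case DOMException.NOT_FOUND_ERR:
                    return "not a child of " + parent.getTagName();
                case DOMException.NO_MODIFICATION_ALLOWED_ERR:
                    return "read-only node";
                default:
                    return "DOMException code " + e.code;
            }
        }
    }
}
```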

Echoing the DOM tree

As an exercise for the DOM parser skills you've learned, use the DOMEcho.java code in Listing 4 to output the contents of the DOM tree for the catalog.xml file. After this code echoes the tree information, it changes the tree and echoes the updated tree.

Listing 4. Echoing a DOM tree

package com.xml.tutorial;

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.SAXException;

// Apache Xerces2 parser (xercesImpl.jar must be on the classpath);
// the JDK-internal com.sun.org.apache.xerces.internal copy is not a public API
import org.apache.xerces.parsers.DOMParser;

/**
 * A handler to output certain information about a DOM tree
 * to standard output.
 *
 * @author lorenzm
 */
public class DOMEcho {
    public static final String XML_DOCUMENT_DTD =
        "catalogDTD.xml"; //validates via catalog.dtd
    public static final String NEW_LINE = System.getProperty("line.separator");
    protected static Writer writer;

    // Types of DOM nodes, indexed by nodeType value (e.g. Attr = 2)
    protected static final String[] nodeTypeNames = {
        "none",        //0
        "Element",     //1
        "Attr",        //2
        "Text",        //3
        "CDATA",       //4
        "EntityRef",   //5
        "Entity",      //6
        "ProcInstr",   //7
        "Comment",     //8
        "Document",    //9
        "DocType",     //10
        "DocFragment", //11
        "Notation",    //12
    };
    //-- DOMImplementation features (we only need one for now)
    protected static final String TRAVERSAL_FEATURE = "Traversal";
    //-- DOM versions (we're using DOM2)
    protected static final String DOM_2 = "2.0";

    /**
     * Constructor
     */
    public DOMEcho() {
        super();
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        //Echo to standard output
        writer = new OutputStreamWriter( System.out );
        //use the Xerces parser
        try {
            DOMParser parser = new DOMParser();
            parser.setFeature( "http://xml.org/sax/features/validation", true );
            parser.parse( XML_DOCUMENT_DTD ); //use DTD grammar for validation
            Document document = parser.getDocument();
            echoAll( document );
            //-- add description for Indiana Jones movie
            //---- find parent Node
            Element indianaJones = document.getElementById("_7755522");
            //---- insert a description before the price
            //     (anywhere else would be invalid)
            NodeList prices = indianaJones.getElementsByTagName("price");
            Node desc = document.createElement("description");
            desc.setTextContent(
                "Indiana Jones is hired to find the Ark of the Covenant");
            indianaJones.insertBefore( desc, prices.item(0) );
            //-- now, echo the document again to see the change
            echoAll( document );
        } catch (DOMException e) { //handle invalid manipulations
            short code = e.code;
            if( code == DOMException.INVALID_MODIFICATION_ERR ) {
                //take action when invalid manipulation attempted
            } else if( code == DOMException.NOT_FOUND_ERR ) {
                //take action when element or attribute not found
            } //add more checks here as desired
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Echo all the Nodes, in preorder traversal order, for a Document
     * @param aDocument
     */


    protected static void echoAll(Document aDocument) {
        if( aDocument.getImplementation().hasFeature(
                TRAVERSAL_FEATURE, DOM_2) ) {
            echo( "=== Echoing " + XML_DOCUMENT_DTD + " ===" + NEW_LINE );
            Node root = (Node) aDocument.getDocumentElement();
            int whatToShow = NodeFilter.SHOW_ALL;
            NodeFilter filter = null;
            boolean expandRefs = false;
            //-- depth first, preorder traversal
            DocumentTraversal traversal = (DocumentTraversal)aDocument;
            TreeWalker walker = traversal.createTreeWalker(
                (org.w3c.dom.Node) root, //where to start
                                         //(cannot go "above" the root)
                whatToShow,              //what to include
                filter,                  //what to exclude
                expandRefs);             //include referenced entities or not
            //-- nextNode() starts after the root, so the root element
            //   itself is not echoed
            for( Node nextNode = (Node) walker.nextNode(); nextNode != null;
                 nextNode = (Node) walker.nextNode() ) {
                echoNode( nextNode );
            }
        } else {
            echo( NEW_LINE + "*** " + TRAVERSAL_FEATURE +
                  " feature is not supported" + NEW_LINE );
        }
    }

    /**
     * Output aNode's name, type, and value to standard output.
     * @param aNode
     */
    protected static void echoNode( Node aNode ) {
        String type = nodeTypeNames[aNode.getNodeType()];
        String name = aNode.getNodeName();
        StringBuffer echoBuf = new StringBuffer();
        echoBuf.append(type);
        if( !name.startsWith("#") ) { //do not output duplicate names
            echoBuf.append(": ");
            echoBuf.append(name);
        }
        if( aNode.getNodeValue() != null ) {
            if( echoBuf.indexOf("ProcInst") == 0 )
                echoBuf.append( ", " );
            else
                echoBuf.append( ": " );
            //output only to first newline
            String trimmedValue = aNode.getNodeValue().trim();
            int nlIndex = trimmedValue.indexOf("\n");
            if( nlIndex >= 0 ) //found newline
                trimmedValue = trimmedValue.substring(0,nlIndex);
            echoBuf.append(trimmedValue);
        }
        echo( echoBuf.toString() + NEW_LINE );
        echoAttributes( aNode );
    }

    /**
     * Output aNode's attributes to standard output.
     * @param aNode
     */
    protected static void echoAttributes(Node aNode) {
        NamedNodeMap attr = aNode.getAttributes();
        if( attr != null ) {
            StringBuffer attrBuf = new StringBuffer();
            for( int i = 0; i < attr.getLength(); i++ ) {
                String type = nodeTypeNames[attr.item(i).getNodeType()];
                attrBuf.append(type);
                attrBuf.append( ": " + attr.item(i).getNodeName() + "=" );
                attrBuf.append( "\"" + attr.item(i).getNodeValue() + "\"" +
                    NEW_LINE );
            }
            echo( attrBuf.toString() );
        }
    }

    /**
     * Output aString to standard output
     * @param aString
     */
    protected static void echo( String aString ) {
        try {
            writer.write( aString );
            writer.flush();
        } catch (IOException e) {
            System.out.println( "I/O error during echo()" );
            e.printStackTrace();
        }
    }
}

Look at some portions of the logic:

protected static final String[] nodeTypeNames = {...

};

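This array maps the int value returned by Node.getNodeType() to a name for each of the types of Nodes that you can encounter. Before walking the tree, DOMEcho checks that the DOM implementation supports the Traversal feature: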

if( aDocument.getImplementation().hasFeature(TRAVERSAL_FEATURE,DOM_2) ) {

DOM1 versus DOM2

DOM Level 1 had no separate traversal API; you navigated the document tree with Node methods such as getFirstChild(), getNextSibling(), and getParentNode(). DOM Level 2 added a Traversal module with two interfaces. A NodeIterator (optionally combined with a NodeFilter) presents the Nodes in a "linear" fashion, with previous and next Nodes. A TreeWalker adds the concept of a current Node, with movement to parent, child, and sibling.

You can read about DOM2's NodeIterator, NodeFilter, and TreeWalker interfaces in Chapter 12 of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX (see Resources).

Bruno R. Preiss explains different tree traversals (see Resources).

DOMEcho takes advantage of the TreeWalker interface introduced in DOM2 (see DOM1 versus DOM2). To be safe, check to make sure your parser supports this feature. You can read about all the available features in the "DOM Modules" section in Chapter 9 of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX (see Resources).
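If you want to experiment with the TreeWalker interface outside DOMEcho, the following minimal Java sketch may help. It is an illustration under stated assumptions, not DOMEcho's echoAll() method: the class TreeWalkerSketch and its elementNames() method are hypothetical names, the walker shows only Element nodes (NodeFilter.SHOW_ELEMENT) instead of every Node, and the sketch checks for the DOM2 DocumentTraversal interface directly rather than calling hasFeature(). It assumes the JDK's built-in DOM implementation, which implements DocumentTraversal.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class TreeWalkerSketch {

    // Parses xml and returns the element names in preorder (document) order.
    static List<String> elementNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> names = new ArrayList<>();
        if (!(doc instanceof DocumentTraversal)) {
            return names; // this DOM implementation lacks DOM2 Traversal
        }
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(),  // root of the walk
                NodeFilter.SHOW_ELEMENT,   // skip Text, Comment, and other nodes
                null,                      // no NodeFilter
                true);                     // expand entity references
        // The walker starts on its root; nextNode() advances in preorder.
        for (Node n = walker.getCurrentNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><dvd code=\"_1234567\"><title>Terminator 2</title>"
                + "<price>19.95</price></dvd></catalog>";
        System.out.println(elementNames(xml)); // [catalog, dvd, title, price]
    }
}
```

Passing NodeFilter.SHOW_ALL instead also visits Text and Comment nodes, which is closer to what DOMEcho prints.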

Basically, DOMEcho has an echoAll(Document aDoc) method, which uses the TreeWalker with no filtering to get the Nodes in preorder traversal order (see DOM1 versus DOM2). echoNode(Node aNode) is then called for each Node, and in turn, echoNode calls echoAttributes(Node aNode) for that Node. Before echoing the document a second time, DOMEcho changes the DOM tree with this code:

//---- find parent Node
Element indianaJones =
    document.getElementById("_7755522");

//---- insert a description before the price
// (anywhere else would be invalid)
NodeList prices =
    indianaJones.getElementsByTagName("price");
Node desc =
    document.createElement("description");
desc.setTextContent(
    "Indiana Jones is hired to find the Ark of the Covenant");
indianaJones.insertBefore( desc, prices.item(0) );

This section of code is what changes the DOM tree. It adds a description in the correct place (after the title and before the price) so that the tree is still valid according to the document's schema.
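The getElementById() call above works only because the DOM knows that code is an ID-type attribute. In catalogDTD.xml, the <!ATTLIST dvd code ID #REQUIRED> declaration supplies that information, and without any such declaration the method returns null. The self-contained sketch below parses a DTD-less fragment, so it uses the DOM Level 3 Element.setIdAttribute() method as a stand-in for the DTD declaration (an assumption of the sketch, not something DOMEcho does). The class InsertDescriptionSketch and its method are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InsertDescriptionSketch {

    // Inserts a <description> before the <price> of the dvd whose code is id,
    // then returns the names of that dvd's child elements, in order.
    static List<String> insertDescription(String xml, String id) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Stand-in for <!ATTLIST dvd code ID #REQUIRED>: mark each code as an ID.
        NodeList dvds = document.getElementsByTagName("dvd");
        for (int i = 0; i < dvds.getLength(); i++) {
            ((Element) dvds.item(i)).setIdAttribute("code", true);
        }

        //---- find parent Node
        Element dvd = document.getElementById(id);

        //---- insert a description before the price
        NodeList prices = dvd.getElementsByTagName("price");
        Node desc = document.createElement("description");
        desc.setTextContent("Indiana Jones is hired to find the Ark of the Covenant");
        dvd.insertBefore(desc, prices.item(0));

        // Collect child element names to show where the description landed.
        List<String> names = new ArrayList<>();
        for (Node n = dvd.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><dvd code=\"_7755522\"><title>Raiders of the Lost Ark</title>"
                + "<price>14.95</price><year>1981</year></dvd></catalog>";
        System.out.println(insertDescription(xml, "_7755522")); // [title, description, price, year]
    }
}
```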

Listing 5 shows the resulting output from DOMEcho.

Listing 5. Output from DOMEcho

=== Echoing catalogDTD.xml ===
Text:
Comment: DVD inventory
Text:
Element: dvd
Attr: code="_1234567"
Text:
Element: title
Text: Terminator 2
Text:
Element: description
Text: A shape-shifting cyborg is sent back from the future to kill the leader of the resistance.
Text:
Element: price
Text: 19.95
Text:
Element: year
Text: 1991
Text:
Text:
Element: dvd
Attr: code="_7654321"
Text:
Element: title
Text: The Matrix
Text:
Element: price
Text: 10.95
Text:
Element: year
Text: 1999
Text:
Text:
Element: dvd
Attr: code="_2255577"
Attr: genre="Drama"
Text:
Element: title
Text: Life as a House
Text:
Element: description
Text: When a man is diagnosed with terminal cancer, he takes custody of his misanthropic teenage son.
Text:
Element: price
Text: 15.95
Text:
Element: year
Text: 2001
Text:
Text:
Element: dvd
Attr: code="_7755522"
Attr: genre="Action"
Text:
Element: title
Text: Raiders of the Lost Ark
Text:
Element: price
Text: 14.95
Text:
Element: year
Text: 1981
Text:
Text:
=== Echoing catalogDTD.xml ===
Text:
Comment: DVD inventory
Text:
Element: dvd
Attr: code="_1234567"
Text:
Element: title
Text: Terminator 2
Text:
Element: description
Text: A shape-shifting cyborg is sent back from the future to kill the leader of the resistance.
Text:
Element: price
Text: 19.95
Text:
Element: year
Text: 1991
Text:
Text:
Element: dvd
Attr: code="_7654321"
Text:
Element: title
Text: The Matrix
Text:
Element: price
Text: 10.95
Text:
Element: year
Text: 1999
Text:
Text:
Element: dvd
Attr: code="_2255577"
Attr: genre="Drama"
Text:
Element: title
Text: Life as a House
Text:
Element: description
Text: When a man is diagnosed with terminal cancer, he takes custody of his misanthropic teenage son.
Text:
Element: price
Text: 15.95
Text:
Element: year
Text: 2001
Text:
Text:
Element: dvd
Attr: code="_7755522"
Attr: genre="Action"
Text:
Element: title
Text: Raiders of the Lost Ark
Text:
Element: description
Text: Indiana Jones is hired to find the Ark of the Covenant
Element: price
Text: 14.95
Text:
Element: year
Text: 1981
Text:
Text:

Whitespace

You'll notice a lot of Text Nodes in the DOMEcho output (Listing 5), many of them with nothing apparent as content. Why would that be?

The parser reports whitespace (extra spaces, tabs, and carriage returns) that occurs within the document's element content.

Notice what's not reported: whitespace inside tags, such as the spaces surrounding attributes. Not shown here, but also not reported, is whitespace in the prolog. Note that there is a Text node for the description, but DOMEcho trims its value (and prints it only up to the first newline), so the extra whitespace characters before and after the nonwhitespace content don't appear in the output.

Whitespace-only Text nodes in element content (the content of an element that the DTD declares to contain only child elements) are called ignorable whitespace. Ignorable whitespace is not part of validation, as you're about to see in Figure 4.

Figure 4. Whitespace processing
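If you'd rather have the parser drop ignorable whitespace than see it in your DOM tree, JAXP's DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) does that. Because a parser can recognize element content only from a content model, the JAXP documentation states that this setting requires validation, so this sketch turns validation on and declares the content model in an internal DTD subset. The class name and the sample document are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WhitespaceSketch {

    // A small, valid document whose <catalog> has element-only content.
    static final String XML =
            "<!DOCTYPE catalog [\n"
          + "  <!ELEMENT catalog (dvd+)>\n"
          + "  <!ELEMENT dvd (title)>\n"
          + "  <!ELEMENT title (#PCDATA)>\n"
          + "]>\n"
          + "<catalog>\n"
          + "  <dvd>\n"
          + "    <title>The Matrix</title>\n"
          + "  </dvd>\n"
          + "</catalog>";

    // Returns how many child nodes <catalog> has after parsing.
    static int catalogChildCount(boolean ignoreWhitespace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // ignorable whitespace relies on the DTD content model
        factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(catalogChildCount(false));
        System.out.println(catalogChildCount(true));
    }
}
```

With the setting off, <catalog> has three children (a whitespace Text node, the <dvd> element, and another whitespace Text node); with it on, only the <dvd> element remains.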

Section 3. Validating XML documents


Validation consists of ensuring the proper structure and content of XML documents using a grammar. You can specify a grammar by using an XML schema, which can take the form of a DTD or XML Schema file (see Schemas). This section of the tutorial discusses DTD and XML Schema files.

Schemas

Technically speaking, DTDs, XML Schemas (capital S), and RELAX NG are all types of XML schema (little s). XML Schemas (capital S) are strictly called W3C XML Schemas. In this tutorial, whenever you see XML Schema, realize that it's the W3C language and not the generic schema document description.

Validating using a DTD

A DTD defines constraints to put on an XML instance document. These constraints are not related to well-formedness. In fact, a document that is not well-formed is not considered an XML document at all. Constraints relate to business rules about content that must hold true for you to be able to use the document with an application.

A DTD specifies the elements and attributes that an XML instance document may contain, and how they can be combined, for the document to be considered valid. You can associate a document with a DTD by including a DOCTYPE declaration near the top of the document (after the XML declaration and before the root element):

<!DOCTYPE catalog SYSTEM "catalog.dtd">

Before you go through the catalog.dtd file, note that to validate a document, you need to turn validation on and use a validating parser. With this code, turn on validation for the SAX parser:

saxParser.setFeature("http://xml.org/sax/features/validation", true );

With this code, turn on validation for the DOM parser (Xerces uses the same feature URI for its DOM parser as for its SAX parser):

domParser.setFeature("http://xml.org/sax/features/validation", true );
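If you work through the standard JAXP factories rather than a Xerces parser object, DocumentBuilderFactory.setValidating(true) requests the same DTD validation. Validity errors are reported to an ErrorHandler instead of stopping the parse, so this sketch collects them in a list. The class name, the SAMPLE_DTD constant, and its tiny internal DTD subset are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DomValidationSketch {

    // A hypothetical two-element DTD, declared inline so no external file is needed.
    static final String SAMPLE_DTD = "<!DOCTYPE catalog [<!ELEMENT catalog (dvd+)>"
            + "<!ELEMENT dvd (title, price)>"
            + "<!ELEMENT title (#PCDATA)><!ELEMENT price (#PCDATA)>]>";

    // Parses xml with DTD validation on and returns the validation error messages.
    static List<String> validationErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed: give up
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String valid = SAMPLE_DTD + "<catalog><dvd><title>T</title><price>1</price></dvd></catalog>";
        String wrongOrder = SAMPLE_DTD + "<catalog><dvd><price>1</price><title>T</title></dvd></catalog>";
        System.out.println(validationErrors(valid));      // []
        System.out.println(validationErrors(wrongOrder)); // a content-model error for dvd
    }
}
```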

Figure 5 shows the catalog.dtd file.

Figure 5. Catalog DTD


Go line by line through the DTD to see what is being specified:

<!ELEMENT catalog (dvd+)>

The dvd+ specifies that a <catalog> element has one or more <dvd>s. Makes sense; otherwise, you aren't going to be selling too many DVDs!

<!ELEMENT dvd (title, description?, price, year)>

The title, ..., year list is called a sequence. It means that the named elements must appear in this order as children of a <dvd> element. The question mark after description means that a <dvd> has zero or one description elements -- in other words, it's optional, but if it is specified, there can only be one (an asterisk means zero or more, and a plus sign means one or more).

<!ATTLIST dvd code ID #REQUIRED>

An ID-type attribute must have a value that is unique within the document. You'll notice that in the catalog.xml file, the IDs begin with an underscore. An ID value must be an XML name, and an XML name cannot start with a digit, but an underscore (or a letter or many other nondigit characters) is fine. An element can have only one ID-type attribute. #REQUIRED, as you might have guessed, means that a <dvd> must have a code.

<!ATTLIST dvd genre ( Drama | Comedy | SciFi | Action | Romance ) #IMPLIED>

This is an enumeration. Since it is #IMPLIED, it is optional. However, if it does appear in the document, it must be one of the enumerated values (read them as "Drama or Comedy or ...").

<!ELEMENT title (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT year (#PCDATA)>

These remaining lines declare elements that contain only parsed character data (#PCDATA), that is, text. None of these elements may have child elements.

Now try to change the instance document to make sure the rules work correctly. First, add a <description>, but put it at the end of the <dvd>. As expected, you get an error (see Figure 6).

Figure 6. Description error

Now, add a genre (see Figure 7).

Figure 7. Genre error


Why didn't that work?! Science fiction is in the list! D'oh -- XML is case-sensitive, as you know, so "scifi" won't work. It needs to be "SciFi".

Now check to see if IDs really need to be unique. Copy the same code into another <dvd> (see Figure 8).

Figure 8. ID error

Sure enough, you get an appropriate error. You get the idea. Feel free to use the DTD and XML files to try out other changes (see Download for the source files).
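If you'd rather try these edits from code than in an editor, the following sketch turns on the SAX validation feature and counts the errors reported for each change. It reassembles the catalog DTD from the declarations walked through above as an internal subset, an assumption that lets the example run without an external catalog.dtd file. CatalogDtdSketch and errorCount() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CatalogDtdSketch {

    // The catalog DTD declarations from this section, as an internal subset.
    static final String DOCTYPE =
            "<!DOCTYPE catalog [\n"
          + "<!ELEMENT catalog (dvd+)>\n"
          + "<!ELEMENT dvd (title, description?, price, year)>\n"
          + "<!ATTLIST dvd code ID #REQUIRED>\n"
          + "<!ATTLIST dvd genre ( Drama | Comedy | SciFi | Action | Romance ) #IMPLIED>\n"
          + "<!ELEMENT title (#PCDATA)>\n"
          + "<!ELEMENT description (#PCDATA)>\n"
          + "<!ELEMENT price (#PCDATA)>\n"
          + "<!ELEMENT year (#PCDATA)>\n"
          + "]>\n";

    // Returns the number of validity errors reported for DOCTYPE + body.
    static int errorCount(String body) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature("http://xml.org/sax/features/validation", true);
        int[] errors = {0};
        reader.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) {
                errors[0]++; // validity errors do not stop the parse
            }
        });
        reader.parse(new InputSource(new StringReader(DOCTYPE + body)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        String ok = "<title>The Matrix</title><price>10.95</price><year>1999</year>";
        System.out.println("valid:       " + errorCount(
            "<catalog><dvd code='_7654321' genre='SciFi'>" + ok + "</dvd></catalog>"));
        System.out.println("late desc:   " + errorCount(
            "<catalog><dvd code='_7654321'>" + ok
            + "<description>Late</description></dvd></catalog>"));
        System.out.println("genre scifi: " + errorCount(
            "<catalog><dvd code='_7654321' genre='scifi'>" + ok + "</dvd></catalog>"));
        System.out.println("duplicate:   " + errorCount(
            "<catalog><dvd code='_7654321'>" + ok + "</dvd><dvd code='_7654321'>"
            + ok + "</dvd></catalog>"));
    }
}
```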

DTD exception handling


To have DTD validation errors reported, you must turn on validation. For Xerces, set the validation feature to true (the separate http://apache.org/xml/features/validation/schema feature is for XML Schema validation):

parser.setFeature("http://xml.org/sax/features/validation", true );

You can read about the different Xerces parser features at The Apache Software Foundation Web site (see Resources). To read more about validation with DTDs, check out Chapter 3 of XML in a Nutshell, Third Edition (see Resources).

Validating with SAXEcho

Now, check out the validation. Comment out the price for the Life as a House dvd in the XML document and see the results, using both DTD and XSD files for validation. Listing 6 shows the output.

Listing 6. Output from SAXEcho execution

=== Parsing catalogDTD.xml ===
<catalog>
  <dvd>
    <title>Terminator 2</title>
    <description>
      A shape-shifting cyborg is sent back from the future
      to kill the leader of the resistance.
    </description>
    <price>19.95</price>
    <year>1991</year>
  </dvd>
  <dvd>
    <title>The Matrix</title>
    <price>10.95</price>
    <year>1999</year>
  </dvd>
  <dvd>
    <title>Life as a House</title>
    <description>
      When a man is diagnosed with terminal cancer,
      he takes custody of his misanthropic teenage son.
    </description>
    <year>2001</year>
*** Failed validation ***
* The content of element type "dvd" must match "(title,description?,price,year)".
*************************
  </dvd>
  <dvd>
    <title>Raiders of the Lost Ark</title>
    <price>14.95</price>
    <year>1981</year>
  </dvd>
</catalog>
=== Parsing catalogXSD.xml ===
<catalog>
  <dvd>
    <title>Terminator 2</title>
    <description>A shape-shifting cyborg is sent back from the future
      to kill the leader of the resistance.</description>
    <price>19.95</price>
    <year>1991</year>
  </dvd>
  <dvd>
    <title>The Matrix</title>
    <price>10.95</price>
    <year>1999</year>
  </dvd>
  <dvd>
    <title>Life as a House</title>
    <description>When a man is diagnosed with terminal cancer,
      he takes custody of his misanthropic teenage son.
    </description>
*** Failed validation ***
* cvc-complex-type.2.4.a: Invalid content was found starting with element 'year'. One of '{"":price}' is expected.
*************************
    <year>2001</year>
  </dvd>
  <dvd>
    <title>Raiders of the Lost Ark</title>
    <price>14.95</price>
    <year>1981</year>
  </dvd>
</catalog>

Validating using an XML schema

Perhaps you're wondering: If I have DTDs to make sure a document's structure and content are valid, why do I need another way to validate documents? I'll give you a few reasons:

• Granular control over element and attribute values: XML Schema allows you to specify the format, length, and data type.

• Complex data types: XML Schema supports the creation of new data types and specialization from existing types.

• Element occurrence: With XML Schema, you can set exact minimum and maximum numbers of occurrences (minOccurs and maxOccurs), rather than only the zero-or-one, zero-or-more, and one-or-more choices that a DTD offers.

• Namespaces: XML Schema works with namespaces, which become important for organizations that deal with other organizations.

The XML Schema language is more powerful than the DTD language and thus is also more complicated. One nice aspect is that XML Schemas are written in XML, whereas DTDs are not.

XSD

XML Schema is also known as XML Schema Definition, thus the file extension .xsd.

Let's validate the same XML instance document that you used for DTD validation in Listing 1. Listing 7 shows the XML Schema:

Listing 7. Catalog XML Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema elementFormDefault="qualified" xml:lang="EN"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Our DVD catalog contains four or more DVDs -->
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence minOccurs="4" maxOccurs="unbounded">
        <xs:element ref="dvd"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- DVDs have a title, an optional description, a price, and a release year -->
  <xs:element name="dvd">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="description" type="descriptionString"
            minOccurs="0"/>
        <xs:element name="price" type="priceValue"/>
        <xs:element name="year" type="yearString"/>
      </xs:sequence>
      <xs:attribute name="code" type="xs:ID"/> <!-- requires a unique ID -->
      <xs:attribute name="genre"> <!-- default = optional -->
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="Drama"/>
            <xs:enumeration value="Comedy"/>
            <xs:enumeration value="SciFi"/>
            <xs:enumeration value="Action"/>
            <xs:enumeration value="Romance"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

  <!-- Descriptions must be between 10 and 120 characters long -->
  <xs:simpleType name="descriptionString">
    <xs:restriction base="xs:string">
      <xs:minLength value="10"/>
      <xs:maxLength value="120"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Price must be < 100.00 -->
  <xs:simpleType name="priceValue">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="4"/>
      <xs:fractionDigits value="2"/>
      <xs:maxExclusive value="100.00"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Year must be 4 digits, between 1900 and 2099 -->
  <xs:simpleType name="yearString">
    <xs:restriction base="xs:string">
      <xs:pattern value="(19|20)\d\d"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>

Notice that the XML Schema is a lot more involved than the corresponding DTD. In fact, even taking out the comments and spacing, this schema is more than 50 lines long, as opposed to the DTD, which is nine lines long. (Granted, this schema does more detailed checking than the DTD does.) So, along with more granular control comes more complexity -- a lot more complexity. The message is: If your validation needs don't require an XML Schema, use a DTD.

Review the added-value list for XML Schemas to see how the DVD catalog documents benefit, in addition to enforcing constraints comparable to those of the DTD you used before:

• Granular control over element and attribute values: Unlike the DTD, which allows any character values, the XSD constrains the values of descriptions (10 to 120 characters), prices (less than 100.00, with at most four digits in total and two after the decimal point), and years (1900 to 2099).

• Complex data types: You created new, reusable data types (descriptionString, priceValue, and yearString) that you can restrict even further, plus a global dvd element declaration that other declarations reference by name.

• Element occurrence: Since this tutorial uses a small document, I set the number of DVDs to four or more so the document would be valid. In reality, the minimum would probably be a larger number, but you can see that these types of constraints are possible.

• Namespaces: You only used a namespace for the XML Schema types (the xs prefix), but since XML Schemas are namespace-aware, you can add more namespaces to avoid name collisions.

Let's discuss some more points about the XML Schema to understand its contents:

• xs:complexType and xs:simpleType. A complexType describes an element that contains child elements or attributes:

<xs:element name="dvd">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      ...

A simpleType describes text-only content: an attribute value, or an element that contains only text and has no attributes or child elements:

<xs:simpleType name="yearString">
  <xs:restriction base="xs:string">
    <xs:pattern value="(19|20)\d\d"/>
  </xs:restriction>
</xs:simpleType>

In this particular case, you define a new type called yearString that must contain four digits and begin with either "19" or "20." You use the xs:restriction element to derive a new, constrained type from an existing (base) type. You use the xs:pattern facet element to compare values to see if they match the specified expression (see Facets).

• xs:sequence. The child elements must appear in the exact order listed (although minOccurs can make an element optional, as you saw):

<xs:sequence>
  <xs:element name="title" type="xs:string"/>
  <xs:element name="description" type="descriptionString" minOccurs="0"/>
  <xs:element name="price" type="priceValue"/>
  <xs:element name="year" type="yearString"/>
</xs:sequence>

The sequence declares that dvds in a valid document must have a title, optionally followed by a description of between 10 and 120 characters, followed by a price of less than US$100 with at most four digits, two of them after the decimal point (such as "14.95"), and finally a year.

Facets

Schemas support a set of possible aspects for values. These aspects are called facets and are used with a restriction to constrain the valid values. The following facet types are available:

• pattern

• enumeration

• length, minLength, and maxLength

• minInclusive, maxInclusive, minExclusive, and maxExclusive

• totalDigits and fractionDigits

• whiteSpace

Note: In the XMLBuddy editor used for the figures in this tutorial, validation against XML Schemas requires XMLBuddy Pro.

Now make some edits and verify that your constraints are being enforced. Add a genre of Adventure, enter a description more than 120 characters long, and duplicate a dvd code (see Figure 9).

Figure 9. XSD errors

You can see that the genre, unique ID, and description length are all enforced.
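Outside the editor, the standard javax.xml.validation API in the JDK can enforce the same kinds of constraints from code. This sketch compiles a cut-down version of the Listing 7 schema (an assumption to keep it short: it omits the four-DVD minimum and the price and year elements) and collects the error messages for a document that repeats the edits from Figure 9. XsdValidationSketch and validate() are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class XsdValidationSketch {

    // Trimmed version of Listing 7: no four-DVD minimum, no price or year.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'><xs:complexType>"
        + "<xs:sequence maxOccurs='unbounded'><xs:element ref='dvd'/></xs:sequence>"
        + "</xs:complexType></xs:element>"
        + "<xs:element name='dvd'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='description' type='descriptionString' minOccurs='0'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='code' type='xs:ID'/>"
        + "<xs:attribute name='genre'><xs:simpleType><xs:restriction base='xs:string'>"
        + "<xs:enumeration value='Drama'/><xs:enumeration value='Comedy'/>"
        + "<xs:enumeration value='SciFi'/><xs:enumeration value='Action'/>"
        + "<xs:enumeration value='Romance'/>"
        + "</xs:restriction></xs:simpleType></xs:attribute>"
        + "</xs:complexType></xs:element>"
        + "<xs:simpleType name='descriptionString'><xs:restriction base='xs:string'>"
        + "<xs:minLength value='10'/><xs:maxLength value='120'/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:schema>";

    // Validates xml against XSD and returns every error message (empty if valid).
    static List<String> validate(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed: give up
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String tooLong = "x".repeat(121); // String.repeat needs Java 11 or later
        List<String> errors = validate(
            "<catalog><dvd code='_1' genre='Adventure'><title>The Matrix</title></dvd>"
          + "<dvd code='_1'><title>Raiders of the Lost Ark</title>"
          + "<description>" + tooLong + "</description></dvd></catalog>");
        errors.forEach(System.out::println);
    }
}
```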

XML Schema is capable of much more. Here are a few highlights:

• xs:choice: One of the child elements must appear.


• xs:all: Each of the child elements listed can appear at most once (exactly once unless its minOccurs is 0), and they can appear in any order.

• xs:group: A named group of elements can be defined once and then referenced (through ref="groupName").

• xs:attributeGroup: This is the corresponding indicator for attributes, as xs:group is for elements.

• xs:date: This is a Gregorian calendar date as defined in ISO 8601, formatted as YYYY-MM-DD.

• xs:time: The time is represented by hh:mm:ss, optionally followed by a time zone such as "Z" for UTC.

• xs:duration: An amount of years, months, days, hours, minutes, and seconds, written in the form PnYnMnDTnHnMnS (for example, P1Y2M3DT4H for 1 year, 2 months, 3 days, and 4 hours).

As you can see, a lot of built-in power is available when you write an XML Schema.Can't find what you need? Create a new type.

Data types

A powerful feature of XML Schema is the capability to create new data types. You saw new types used extensively in the catalog.xsd file, including the creation of the yearString and priceValue types. In this case, these types are only used in the dvd element declaration, but you could use them anywhere that years or prices appear in the document.

These types restrict the built-in xs:decimal and xs:string types:

<!-- Price must be < 100.00 -->
<xs:simpleType name="priceValue">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="4"/>
    <xs:fractionDigits value="2"/>
    <xs:maxExclusive value="100.00"/>
  </xs:restriction>
</xs:simpleType>

<!-- Year must be 4 digits, between 1900 and 2099 -->
<xs:simpleType name="yearString">
  <xs:restriction base="xs:string">
    <xs:pattern value="(19|20)\d\d"/>
  </xs:restriction>
</xs:simpleType>

As I mentioned before, you can specialize an existing type using the restriction element in combination with one or more facets. When a restriction combines different facets, as priceValue does with totalDigits, fractionDigits, and maxExclusive, a value must satisfy all of them to be valid.
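To watch several facets work together, this sketch compiles a one-element schema whose price element uses the priceValue type above (the wrapper schema and the class name PriceFacetSketch are assumptions) and checks a few candidate values. It relies on the documented default behavior of javax.xml.validation.Validator: with no ErrorHandler set, a validation error is thrown as a SAXException.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class PriceFacetSketch {

    // Hypothetical wrapper schema: a lone <price> element of type priceValue.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='price' type='priceValue'/>"
        + "<xs:simpleType name='priceValue'><xs:restriction base='xs:decimal'>"
        + "<xs:totalDigits value='4'/><xs:fractionDigits value='2'/>"
        + "<xs:maxExclusive value='100.00'/>"
        + "</xs:restriction></xs:simpleType></xs:schema>";

    // Returns true if <price>value</price> satisfies every facet of priceValue.
    static boolean isValidPrice(String value) throws Exception {
        Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
        try {
            validator.validate(new StreamSource(
                    new StringReader("<price>" + value + "</price>")));
            return true;
        } catch (SAXException e) {
            return false; // default error handling throws on a validation error
        }
    }

    public static void main(String[] args) throws Exception {
        for (String v : new String[] {"14.95", "99.99", "100.00", "9.999"}) {
            System.out.println(v + " -> " + isValidPrice(v));
        }
    }
}
```

Here 100.00 violates maxExclusive, and 9.999 violates fractionDigits even though it is below 100.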

Pattern matching

The pattern facet element supports a rich expression syntax that is similar to Perl's. You saw it used for the yearString, where you can read the pattern "(19|20)\d\d" as "the string must start with either one-nine or two-zero and must be followed by two decimal digits." (An XML Schema pattern must match the entire value, so nothing can follow those four digits.) Table 1 shows a few more patterns.


Table 1. XML Schema pattern-matching expressions

Pattern    Matches
(A|B)      A string that matches A or B
A?         Zero or one occurrence of a string that matches A
A*         Zero or more occurrences of a string that matches A
A+         One or more occurrences of a string that matches A
[abcd]     A character that matches one of the specified characters
[^abc]     A character other than those specified
\t         A tab character
\\         A backslash character
\c         An XML name character
\s         A space, tab, carriage-return, or line-feed character
.          Any character except a carriage return or line feed

To read more about the many possibilities for expressions, see pages 427-429 of XML in a Nutshell, Third Edition or view Table 24-5 in Chapter 24 of XML Bible, Second Edition online (see Resources).
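If you want to experiment with the yearString pattern in Java, java.util.regex is close enough for a rough check, though the two syntaxes are not identical. Two differences matter here: an XML Schema pattern must match the entire value, which corresponds to Matcher.matches() rather than find(); and the XML Schema \d matches any Unicode decimal digit, whereas Java's \d matches only ASCII 0-9 unless you enable UNICODE_CHARACTER_CLASS. Escapes such as \c also mean something different in Java (a control character), so treat this YearPatternSketch class as a hypothetical experiment, not a validator.

```java
import java.util.regex.Pattern;

public class YearPatternSketch {

    // The yearString facet from Listing 7, compiled as a Java regex.
    // XML Schema patterns are implicitly anchored, so use matches(), not find().
    private static final Pattern YEAR = Pattern.compile("(19|20)\\d\\d");

    static boolean isYearString(String value) {
        return YEAR.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1991", "2001", "2099", "1899", "2100", "199", "19915"}) {
            System.out.println(v + " -> " + isYearString(v));
        }
    }
}
```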

XSD exception handling

To have XML Schema validation errors reported, you must turn on validation. For Xerces, set the schema validation feature to true:

parser.setFeature("http://apache.org/xml/features/validation/schema", true );

Xerces reports schema validation errors only when the general validation feature (http://xml.org/sax/features/validation) is also set to true. You can read about the different Xerces parser features on The Apache Software Foundation Web site (see Resources).

I previously discussed DOMExceptions that can occur due to manipulation problems. The DOMException's code indicates what type of problem has occurred.

DOMEcho revisited

Change the logic of DOMEcho.java to cause a DOMException. Here's the new logic:

//---- find parent Node
Element indianaJones = document.getElementById("_7755522");

//---- insert a description before the price
// (anywhere else would be invalid)
NodeList years = indianaJones.getElementsByTagName("price");
Node desc = document.createTextNode(
    "Indiana Jones is hired to find the Ark of the Covenant");
// This call now throws a DOMException (NOT_FOUND_ERR),
// because indianaJones is not one of its own children.
indianaJones.insertBefore( desc, indianaJones );

This results in the following code being executed:

short code = e.code;
...
} else if( code == DOMException.NOT_FOUND_ERR ) {
    //take action when element or attribute not found
    echo( "*** Element not found" );
    System.exit(code);
}
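Here is a self-contained sketch of that failure using the JDK's DOM. The DOM specification defines NOT_FOUND_ERR for insertBefore() when the reference node is not a child of the node you call it on, which is the case when indianaJones is passed as its own reference node. The sketch finds the <dvd> with getElementsByTagName() rather than getElementById() so that it needs no DTD; DomExceptionSketch and faultyInsertCode() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomExceptionSketch {

    // Attempts the faulty insert and returns the DOMException code (-1 if none).
    static int faultyInsertCode() throws Exception {
        String xml = "<catalog><dvd code=\"_7755522\"><title>Raiders of the Lost Ark</title>"
                + "<price>14.95</price><year>1981</year></dvd></catalog>";
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element indianaJones = (Element) document.getElementsByTagName("dvd").item(0);
        Node desc = document.createTextNode(
                "Indiana Jones is hired to find the Ark of the Covenant");
        try {
            // The reference node must be a child of indianaJones; it is not.
            indianaJones.insertBefore(desc, indianaJones);
            return -1;
        } catch (DOMException e) {
            short code = e.code;
            if (code == DOMException.NOT_FOUND_ERR) {
                System.out.println("*** Element not found");
            }
            return code;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(faultyInsertCode() == DOMException.NOT_FOUND_ERR);
    }
}
```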

To read more about validation with XML Schemas, check out Chapter 17 of XML in a Nutshell, Third Edition, W3Schools, or "Interactive XML tutorials" (see Resources).

Section 4. Using XQuery

XML Query (XQuery) is a language for writing expressions that return matching results from XML data, often in a database. The functionality is like that provided by SQL for non-XML content:

"Like SQL, XQuery contains functions for extracting, summarizing, aggregating, and joining data from multiple datasets." -- "Java theory and practice: Screen-scraping with XQuery" by Brian Goetz (see Resources)

XQuery expands upon XPath expressions, which the fourth tutorial in this series, on XML transformations, discusses in detail. An XPath expression is also a valid XQuery expression. So, why do you need XQuery? The value-add for XQuery is due to clauses that XQuery adds to its expressions, allowing for more complicated expressions, much like a SELECT statement does in SQL.

XQuery clauses

XQuery's FLWOR expressions (pronounced "flower") are built from the clauses that make up the acronym: for, let, where, order by, return. Table 2 shows these parts.

Table 2. FLWOR clauses

Clause     Description
for        You use this looping construct to assign values to variables used within the other clauses. You declare the variables with a dollar sign, as in $name, and get values assigned to them from the search results, one item at a time.
let        You use a let clause to bind a variable to the entire result of an expression at once, rather than iterating over it as for does.
where      Much like in SQL, you use a where clause to filter the results based on some criteria.
order by   You use this clause to determine how to sort the result set (ascending or descending).
return     You use the return clause to determine the contents of the output of the query. The contents can include literals, XML document contents, HTML markup, or many other possibilities.

The search criteria in a FLWOR expression are conditions that evaluate to true or false, typically placed in a where clause. Look at some examples. You can use the dvd.xml file shown in Listing 8 as the XML instance document.

Listing 8. dvd.xml

<?xml version="1.0"?>
<!-- DVD inventory -->
<catalog>
  <dvd code="1234567">
    <title>Terminator 2</title>
    <price>19.95</price>
    <year>1991</year>
  </dvd>
  <dvd code="7654321">
    <title>The Matrix</title>
    <price>12.95</price>
    <year>1999</year>
  </dvd>
  <dvd code="2255577">
    <title>Life as a House</title>
    <price>15.95</price>
    <year>2001</year>
  </dvd>
  <dvd code="7755522">
    <title>Raiders of the Lost Ark</title>
    <price>14.95</price>
    <year>1981</year>
  </dvd>
</catalog>

Saxon

You can get the free Saxon tools at Saxonica if you want to try out XQuery yourself (see Resources).

To try this out, I used the Saxon XQuery tools. All my files are in the directory I unpacked Saxon into. To use XQuery to create an HTML page that lists all the DVD titles in ascending order, I used the dvdTitles.xq file shown in Listing 9, which also shows the output. I used the following command to execute this query:

java -cp saxon8.jar net.sf.saxon.Query -t dvdTitles.xq > dvdTitles.html


Listing 9. XQuery to list DVD titles in ascending order

dvdTitles.xq:

<html>
<body>Available DVDs:<br/>
<ol>
{
  for $title in doc("dvd.xml")/catalog/dvd/title
  order by $title
  return <li>{data($title)}</li>
}
</ol>
</body>
</html>

dvdTitles.html:

<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>Available DVDs:<br/>
<ol>
<li>Life as a House</li>
<li>Raiders of the Lost Ark</li>
<li>Terminator 2</li>
<li>The Matrix</li>
</ol>
</body>
</html>

In Listing 9, look at the XQuery logic in detail. First of all, because the query is embedded inside literal HTML elements, it must be enclosed in curly braces ("{}"); otherwise, it would be copied to the output as plain text instead of being evaluated. You can see in this example that three of the clauses are used (for, order by, and return). You use the doc() function to open an XML document. $title is a variable that is set to each of the search results during each loop. In this case, it is set to each result of the /catalog/dvd/title expression -- thus, its name. The data() function in the return clause pulls out just the value from the XML without the tags. If you just put $title, you would get "<title>value</title>," which you don't want in your HTML output. Notice that the XQuery is surrounded with all the HTML needed to complete the page.
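The JDK does not include an XQuery engine, but because an XPath expression is also a valid XQuery expression, you can approximate what this FLWOR expression does with the standard javax.xml.xpath API plus a Java sort. This sketch illustrates the for, order by, and return steps; it is not a replacement for Saxon, and the class name DvdTitlesSketch and the inline copy of dvd.xml are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DvdTitlesSketch {

    // An inline copy of the Listing 8 data.
    static final String DVD_XML = "<catalog>"
        + "<dvd code='1234567'><title>Terminator 2</title><price>19.95</price><year>1991</year></dvd>"
        + "<dvd code='7654321'><title>The Matrix</title><price>12.95</price><year>1999</year></dvd>"
        + "<dvd code='2255577'><title>Life as a House</title><price>15.95</price><year>2001</year></dvd>"
        + "<dvd code='7755522'><title>Raiders of the Lost Ark</title><price>14.95</price><year>1981</year></dvd>"
        + "</catalog>";

    // for $title in /catalog/dvd/title  order by $title  return data($title)
    static List<String> sortedTitles(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList titles = (NodeList) xpath.evaluate("/catalog/dvd/title",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titles.getLength(); i++) {
            result.add(titles.item(i).getTextContent()); // like data($title)
        }
        Collections.sort(result); // like order by $title (ascending, by code point)
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String title : sortedTitles(DVD_XML)) {
            System.out.println("<li>" + title + "</li>"); // like the return clause
        }
    }
}
```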

Now, suppose you want to output the prices for DVDs that cost less than US$15, in descending order. Listing 10 shows the XQuery and output files.

Listing 10. DVD prices < US$15 in descending order

dvdPriceThreshold.xq

<html>
<body>DVD prices below $15.00:<br/>
<ol>
{
  for $price in doc("dvd.xml")/catalog/dvd/price
  where $price < 15.00
  order by $price descending
  return <li>{data($price)}</li>
}
</ol>
</body>
</html>

dvdPrices.html

<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>DVD prices below $15.00:
<br/>
<ol>
<li>14.95</li>
<li>12.95</li>
</ol>
</body>
</html>

The main difference with this query is that you specified a where clause. Just for fun, you also reversed the sort order.
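Continuing the XPath approximation, the where clause maps naturally onto an XPath predicate, and the descending order becomes a reverse sort in Java. This is a sketch under the same assumptions as before: DvdPriceSketch is a hypothetical name, and PRICES_XML is a price-only excerpt of Listing 8.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DvdPriceSketch {

    // Price-only excerpt of the Listing 8 data.
    static final String PRICES_XML = "<catalog>"
        + "<dvd><price>19.95</price></dvd><dvd><price>12.95</price></dvd>"
        + "<dvd><price>15.95</price></dvd><dvd><price>14.95</price></dvd>"
        + "</catalog>";

    // for $price in /catalog/dvd/price
    // where $price < 15.00
    // order by $price descending
    // return data($price)
    static List<String> pricesBelow15(String xml) throws Exception {
        NodeList prices = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/catalog/dvd/price[. < 15.00]",  // the where clause as a predicate
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<BigDecimal> values = new ArrayList<>();
        for (int i = 0; i < prices.getLength(); i++) {
            values.add(new BigDecimal(prices.item(i).getTextContent().trim()));
        }
        values.sort(Collections.reverseOrder()); // order by ... descending
        List<String> result = new ArrayList<>();
        for (BigDecimal v : values) {
            result.add(v.toPlainString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pricesBelow15(PRICES_XML)); // [14.95, 12.95]
    }
}
```

One design note: the sketch sorts the prices as numbers. XQuery's order by compares untyped values such as these as strings, which happens to give the same order for 14.95 and 12.95; for values like 9.95 and 14.95 you would write order by number($price) descending to get a numeric sort.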

Obviously, you can do a lot more to learn the power of XQuery, but I've covered enough to show you some of the possibilities. To learn more, check out "XQuery" and "Five Practical XQuery Applications" (see Resources).

Section 5. Conclusion

The core of XML is parsing and validation. Knowing how to use these capabilities well is vital to the successful introduction of XML to your project.

Summary

In this tutorial on XML processing, you've seen how to:

• Parse XML documents using the SAX2 and DOM2 parsers

• Validate XML documents against DTDs and XML Schemas

• Access XML content from databases using XQuery
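The validation point can be made concrete with the standard javax.xml.validation API, the subject of "The Java XML Validation API" in Resources. The sketch below compiles a small W3C XML Schema once and validates documents against it. The schema, the sample documents, and the class and method names are hypothetical and are not the tutorial's sample files; DTD validation uses a different mechanism and is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class CatalogValidation {

    // Hypothetical schema: a catalog of dvd elements, each with a title and a decimal price.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
      + "<xs:element name='dvd' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "<xs:element name='price' type='xs:decimal'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Compiles the schema; a Schema object is thread-safe and can be reused. */
    static Schema loadSchema() {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is not valid", e);
        }
    }

    /** Returns true if xml conforms to the schema, false on the first error. */
    static boolean isValid(Schema schema, String xml) {
        try {
            // With no ErrorHandler set, validate() throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = loadSchema();
        String good = "<catalog><dvd><title>Sample</title><price>14.95</price></dvd></catalog>";
        String bad = "<catalog><dvd><title>Sample</title><price>cheap</price></dvd></catalog>";
        System.out.println(isValid(schema, good)); // true
        System.out.println(isValid(schema, bad));  // false: "cheap" is not an xs:decimal
    }
}
```

A production validator would usually install its own ErrorHandler to collect every error with its line number instead of stopping at the first one.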


Downloads

Description | Name | Size | Download method
Sample DTD and XML files | x-cert1423-code-samples.zip | 16KB | HTTP



Resources

Learn

• XML and Related Technologies certification prep (developerWorks, August - October 2006): With this series of five tutorials, prepare to take the IBM certification Test 142, XML and Related Technologies, to attain the IBM Certified Solution Developer - XML and Related Technologies certification.

• XML: A Manager's Guide, Second Edition (Kevin Dick, Addison-Wesley Professional, 2002): Read about uses of XML technologies in enterprise applications.

• XML in a Nutshell, 3rd Edition (Elliotte Rusty Harold and W. Scott Means, O'Reilly Media, 2004, ISBN: 0596007647): Check out this comprehensive XML reference, which covers everything from fundamental syntax rules, DTD and XML Schema creation, and XSLT transformations to processing APIs, XML 1.1, SAX2, and DOM Level 3.

• XQuery (Jim Keogh and Ken Davidson, McGraw-Hill/Osborne, 2005, ISBN: 0072262109): Learn to write XQuery expressions in this excerpt from Chapter 9 of the book XML DeMYSTiFieD.

• Five Practical XQuery Applications (Tim Matthews and Srinivas Pandrangi, 9 May 2003): Add XQuery to your own apps to simplify difficult or tedious tasks.

• An Introduction to StAX (Elliotte Rusty Harold, O'Reilly Media, 17 September 2003): Read more about the Streaming API for XML (StAX) in this article.

• Interactive XML tutorials: Explore a variety of XML topics, including SVG, DTD, Schema, XSLT, DOM, and SAX, complete with student problems and access to online parsers that process your answers for immediate feedback.

• W3Schools online Web tutorials: Discover Web-building tutorials, from basic HTML and XHTML to advanced XML, SQL, databases, multimedia, and WAP.

• Java theory and practice: Screen-scraping with XQuery (Brian Goetz, developerWorks, 22 Mar 2005): See how effectively you can use XQuery as an HTML screen-scraping engine.

• Power your mashups with XQuery (Ning Yan, developerWorks, July 2006): Create a mashup application that uses XQuery to couple Web content with XML data and Web services.

• The Java XML Validation API (Elliotte Rusty Harold, developerWorks, August 2006): Check your documents for conformance to schemas with this XML validation API.

• Saxonica: XSLT and XQuery Processing: Learn about this collection of tools for processing XML documents that includes XSLT 2.0, XPath 2.0, XQuery 1.0, and XML Schema 1.0 processors.

• DOMException from Chapter 9 of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX (Elliotte Rusty Harold, Addison-Wesley Professional, 2002): Read about DOMException, a generic runtime exception.

• DOM Modules section in Chapter 9 of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX (Elliotte Rusty Harold, Addison-Wesley Professional, 2002): Read about the fourteen modules in eight different packages of DOM2.

• Chapter 12, The DOM Traversal Module, of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX (Elliotte Rusty Harold, Addison-Wesley Professional, 2002): Delve into this collection of utility interfaces that perform most of the work of traversing a DOM tree, making your programs simpler.

• Setting Features (The Apache Software Foundation, 2005): Read how to set and query parser features.

• Serial Access with the Simple API for XML (SAX): Discover SAX, the event-driven, serial-access mechanism for accessing XML documents.

• Tree traversals: Bruno R. Preiss explains different tree traversals.

• XML Bible, Second Edition (Elliotte Rusty Harold): View Table 24-5 in Chapter 24 for a grammar of regular expression symbols for XML Schema.

• IBM XML 1.1 certification: Become an IBM Certified Developer in XML 1.1 and related technologies.

• XML: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.

• developerWorks technical events and webcasts: Stay current with technology in these sessions.

Get products and technologies

• Apache Xerces2 parser: Download the source for this open source, XML-compliant parser, which includes the Xerces Native Interface (XNI) framework for building parser components and configurations.

• Java software development kit (JDK) 1.4.2 or later: Download the JDK to build standards-based, interoperable apps, applets, and Web services.

• Eclipse 3.1 or later: Download this open source, extensible development platform and application framework for building software.

• XMLBuddy 2.0 or later: Download and start to work in XML-related technologies, including XML, DTD, XML Schema, RELAX NG, RELAX NG compact syntax, and XSLT. You can get XMLBuddy as an Eclipse plug-in.

Discuss

• XML zone discussion forums: Participate in any of several XML-centered forums.

• developerWorks blogs: Get involved in the developerWorks community.


About the author

Mark Lorenz

Mark Lorenz is the founder of Hatteras Software, an object-oriented consulting firm, and the author of multiple books on software development. He is certified in object-oriented analysis and design (OOAD), XML, RAD, and Java. He uses XHTML, Web services, Ajax, JSF, Spring, BIRT, and related Eclipse-based tools to develop Java enterprise applications. You can read Mark's blog on technology.

Trademarks

IBM, DB2, Lotus, Rational, Tivoli, and WebSphere are trademarks of IBM Corporation in the United States, other countries, or both.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
