CHAPTER GOALS
• Understanding XML elements and attributes
• Understanding the concept of an XML parser
• Being able to read and write XML documents
• Being able to design Document Type Definitions for XML documents
XML
• Stands for Extensible Markup Language
• Lets you encode complex data in a form that the recipient can parse easily
• Is independent from any programming language
Advantages of XML
• XML files are readable by both computers and humans
• XML formatted data is resilient to change
o It is easy to add new data elements
o Old programs can process the old information in the new data format
Differences Between XML and HTML
• Both are descendants of SGML (Standard Generalized Markup Language)
• XML is a simplified version of SGML
• XML is very strict but HTML (as used today) is not
• XML tells what the data means; HTML tells how to display data
Differences Between XML and HTML
• XML tags are case-sensitive
o <LI> is different from <li>
• Every XML start tag must have a matching end tag
• If a tag has no end-tag, it must end in />
o <img src="hamster.jpeg"/>
• XML attribute values must be enclosed in quotes
o <img src="hamster.jpeg" width="400" height="300"/>
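The strictness rules above can be checked by handing small fragments to the DOM parser from the standard library. This is a minimal sketch, not part of the original slides: the class name WellFormedCheck and the method isWellFormed are hypothetical, and a DefaultHandler is installed only to keep the parser from printing fatal errors to System.err.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck
{
   /** Returns true if the parser accepts the text as well-formed XML. */
   static boolean isWellFormed(String xml)
   {
      try
      {
         DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         // DefaultHandler rethrows fatal errors instead of printing them
         builder.setErrorHandler(new DefaultHandler());
         builder.parse(new InputSource(new StringReader(xml)));
         return true;
      }
      catch (SAXException e)
      {
         return false; // the parser rejected the text
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(isWellFormed("<li>item</li>"));             // matching tags
      System.out.println(isWellFormed("<LI>item</li>"));             // case mismatch
      System.out.println(isWellFormed("<img src=\"hamster.jpeg\">")); // no end tag
      System.out.println(isWellFormed("<img src=\"hamster.jpeg\"/>"));
      System.out.println(isWellFormed("<img src=hamster.jpeg/>"));   // unquoted value
   }
}
```

Only the first and fourth fragments are accepted; each of the others breaks one of the rules listed on this slide.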
Structure of an XML Document
• An XML data set is called a document
• The document starts with a header
<?xml version="1.0"?>
• The data are contained in a root element
<?xml version="1.0"?>
<purse>
more data
</purse>
• The document contains elements and text
Structure of an XML Document
• An XML element has one of two forms
<elementTag optional attributes> contents </elementTag>
or
<elementTag optional attributes/>
• The contents can be elements or text or both
• An example of an element with both elements and text (mixed content):
<p>Use XML for <strong>robust</strong> data formats.</p>
• Avoid mixed content for data descriptions
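To see why mixed content is awkward for data, it helps to look at what a DOM parser produces for the example above. The following is a minimal sketch; the class name MixedContentDemo and the method childKinds are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class MixedContentDemo
{
   /** Lists the kinds of child nodes of the root element, in document order. */
   static String childKinds(String xml)
   {
      Element root;
      try
      {
         DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         root = builder.parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }

      StringBuilder kinds = new StringBuilder();
      NodeList children = root.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
      {
         Node child = children.item(i);
         if (kinds.length() > 0) kinds.append(' ');
         if (child instanceof Element) kinds.append("Element");
         else if (child instanceof Text) kinds.append("Text");
         else kinds.append("Other");
      }
      return kinds.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(childKinds(
         "<p>Use XML for <strong>robust</strong> data formats.</p>"));
      // prints: Text Element Text
   }
}
```

A program reading such an element has to handle text and element children interleaved in any order, which is why data-oriented formats usually keep text in leaf elements only.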
Structure of an XML Document
• An element can have attributes
• The a element in HTML has an href attribute
<a href="http://java.sun.com"> ... </a>
• An attribute has a name (such as href) and a value
• The attribute value is enclosed in either single or double quotes
• An attribute is intended to provide information about the element content
<value currency="USD">0.5</value>
or
<value currency="EUR">0.5</value>
• An element can have multiple attributes
Parsing XML Documents
• A parser is a program that
o Reads a document
o Checks whether it is syntactically correct
o Takes some action as it processes the document
• There are two kinds of XML parsers
o SAX (Simple API for XML)
o DOM (Document Object Model)
Parsing XML Documents
• SAX parser
o Event-driven
o It calls a method you provide to process each construct it encounters
o More efficient for handling large XML documents
• DOM parser
o Builds a tree that represents the document
o When the parser is done, you can analyze the tree
o Easier to use for most applications
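The rest of these slides use the DOM parser, so here is a brief sketch of the event-driven SAX style for comparison. The class name SaxItemCounter and the method countElements are hypothetical; the handler's startElement method is the callback that the parser invokes for every start tag it encounters.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxItemCounter
{
   /** Counts the elements with the given tag name without building a tree. */
   static int countElements(String xml, final String tagName)
   {
      final int[] count = { 0 };
      DefaultHandler handler = new DefaultHandler()
      {
         @Override
         public void startElement(String uri, String localName,
            String qName, Attributes attributes)
         {
            // Called by the parser once for every start tag
            if (qName.equals(tagName)) count[0]++;
         }
      };
      try
      {
         SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
         parser.parse(new InputSource(new StringReader(xml)), handler);
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
      return count[0];
   }

   public static void main(String[] args)
   {
      String xml = "<items><item/><item/></items>";
      System.out.println(countElements(xml, "item")); // prints 2
   }
}
```

Because nothing is stored except the counter, this approach uses little memory even for very large documents, at the cost of having to track context yourself.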
JAXP
• Stands for Java API for XML Processing
• Provides a standard mechanism for DOM parsers to read and create documents
• Part of Java 1.4 and above
• With earlier versions, you need to download additional libraries
Parsing XML Documents
• The Document interface describes the tree structure of an XML document
• A DocumentBuilder can generate an object of a class that implements the Document interface
• Get a DocumentBuilderFactory by calling the static newInstance method of the DocumentBuilderFactory class
• Call the newDocumentBuilder method of the factory to get a DocumentBuilder
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Parsing XML Documents
• To read a document from a file
String fileName = . . . ;
File f = new File(fileName);
Document doc = builder.parse(f);
• To read a document from a URL on the Internet
String urlName = . . . ;
URL u = new URL(urlName);
Document doc = builder.parse(u.openStream());
• To read from an input stream
InputStream in = . . . ;
Document doc = builder.parse(in);
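The input stream variant is convenient for trying out the parser without a file. A minimal sketch, assuming the XML text is held in a string; the class name ParseFromStream and the method rootName are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseFromStream
{
   /** Parses XML text through an in-memory stream; returns the root tag name. */
   static String rootName(String xml)
   {
      try
      {
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         DocumentBuilder builder = factory.newDocumentBuilder();
         InputStream in
            = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
         Document doc = builder.parse(in);
         return doc.getDocumentElement().getTagName();
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(
         rootName("<?xml version=\"1.0\"?><purse>more data</purse>"));
      // prints: purse
   }
}
```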
Parsing XML Documents
• You can inspect or modify the document
• The document tree consists of nodes
• Two node types are Element and Text
• Element and Text are subinterfaces of the Node interface
An XML Document
<?xml version="1.0"?>
<items>
   <item>
      <product>
         <description>Ink Jet Refill Kit</description>
         <price>29.95</price>
      </product>
      <quantity>8</quantity>
   </item>
   <item>
      <product>
         <description>4-port Mini Hub</description>
         <price>19.95</price>
      </product>
      <quantity>4</quantity>
   </item>
</items>
Parsing XML Documents
• Start inspection of the tree by getting the root element
Element root = doc.getDocumentElement();
• To get the child nodes of an element
o Use the getChildNodes method of the Element interface
o The nodes are stored in an object of a class that implements the NodeList interface
• Use a NodeList to visit the child nodes of an element
o The getLength method gives the number of nodes
o The item method gets an item in the node list
• Code to get a child node
NodeList nodes = root.getChildNodes();
int i = . . . ; // a value between 0 and getLength() - 1
Node child = nodes.item(i);
• The XML parser keeps all white space if you don't use a DTD
o You can include a test to ignore the white space
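The effect of retained white space is easy to observe. The sketch below parses a small indented document without a DTD, then counts all child nodes and only the element children; the class name ChildNodeWalk, its methods, and the SAMPLE constant are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodeWalk
{
   // Line breaks and indentation become Text nodes between the elements
   static final String SAMPLE = "<items>\n  <item/>\n  <item/>\n</items>";

   static Element parseRoot(String xml)
   {
      try
      {
         DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         return builder.parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   /** Counts every child node of the root, white space included. */
   static int allChildren(String xml)
   {
      return parseRoot(xml).getChildNodes().getLength();
   }

   /** Counts only the children that are elements. */
   static int elementChildren(String xml)
   {
      NodeList nodes = parseRoot(xml).getChildNodes();
      int count = 0;
      for (int i = 0; i < nodes.getLength(); i++)
      {
         // This test skips the white space Text nodes
         if (nodes.item(i) instanceof Element) count++;
      }
      return count;
   }

   public static void main(String[] args)
   {
      System.out.println(allChildren(SAMPLE));     // prints 5
      System.out.println(elementChildren(SAMPLE)); // prints 2
   }
}
```

The five nodes are three white space Text nodes and two item elements, which is why code that processes children without a DTD needs the instanceof test.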
Parsing XML Documents
• Get an element name with the getTagName method
Element priceElement = . . . ;
String name = priceElement.getTagName();
• To find the value of the currency attribute
String attributeValue = priceElement.getAttribute("currency");
• You can also iterate through all attributes
o Use a NamedNodeMap
o Each attribute is stored in a Node
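Iterating through a NamedNodeMap can be sketched as follows. The class name AttributeLister, the method describeAttributes, and the unit attribute in the sample are hypothetical. The names are copied into a TreeMap because a NamedNodeMap does not promise any particular order.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeLister
{
   /** Returns the attributes of the root element, sorted by name. */
   static String describeAttributes(String xml)
   {
      Element root;
      try
      {
         DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         root = builder.parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }

      Map<String, String> result = new TreeMap<String, String>();
      NamedNodeMap attributes = root.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++)
      {
         Node attribute = attributes.item(i); // each attribute is a Node
         result.put(attribute.getNodeName(), attribute.getNodeValue());
      }
      return result.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(describeAttributes(
         "<price currency=\"USD\" unit=\"each\">29.95</price>"));
      // prints: {currency=USD, unit=each}
   }
}
```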
Parsing XML Documents
• Some elements have children that contain text
• The document builder creates nodes of type Text
• If you don't use mixed content elements
o Any element containing text has a single Text child node
o Use the getFirstChild method to get it
o Use the getData method to read the text
• To determine the price stored in the price element
Element priceNode = . . . ;
Text priceData = (Text) priceNode.getFirstChild();
String priceString = priceData.getData();
double price = Double.parseDouble(priceString);
File ItemListParser.java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

/**
   An XML parser for item lists
*/
public class ItemListParser
{
   /**
      Constructs a parser that can parse item lists
   */
   public ItemListParser()
         throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
         = DocumentBuilderFactory.newInstance();
      builder = factory.newDocumentBuilder();
   }

   /**
      Parses an XML file containing an item list
      @param fileName the name of the file
      @return an array list containing all items in the XML file
   */
   public ArrayList<Item> parse(String fileName)
         throws SAXException, IOException
   {
      File f = new File(fileName);
      Document doc = builder.parse(f);

      // get the <items> root element

      Element root = doc.getDocumentElement();
      return getItems(root);
   }

   /**
      Obtains an array list of items from a DOM element
      @param e an <items> element
      @return an array list of all <item> children of e
   */
   private static ArrayList<Item> getItems(Element e)
   {
      ArrayList<Item> items = new ArrayList<Item>();

      // get the <item> children; skip white space and other nodes

      NodeList children = e.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
      {
         Node childNode = children.item(i);
         if (childNode instanceof Element)
         {
            Element childElement = (Element) childNode;
            if (childElement.getTagName().equals("item"))
            {
               Item c = getItem(childElement);
               items.add(c);
            }
         }
      }
      return items;
   }

   /**
      Obtains an item from a DOM element
      @param e an <item> element
      @return the item described by the given element
   */
   private static Item getItem(Element e)
   {
      NodeList children = e.getChildNodes();
      Product p = null;
      int quantity = 0;
      for (int j = 0; j < children.getLength(); j++)
      {
         Node childNode = children.item(j);
         if (childNode instanceof Element)
         {
            Element childElement = (Element) childNode;
            String tagName = childElement.getTagName();
            if (tagName.equals("product"))
               p = getProduct(childElement);
            else if (tagName.equals("quantity"))
            {
               Text textNode = (Text) childElement.getFirstChild();
               String data = textNode.getData();
               quantity = Integer.parseInt(data);
            }
         }
      }
      return new Item(p, quantity);
   }

   /**
      Obtains a product from a DOM element
      @param e a <product> element
      @return the product described by the given element
   */
   private static Product getProduct(Element e)
   {
      NodeList children = e.getChildNodes();
      String name = "";
      double price = 0;
      for (int j = 0; j < children.getLength(); j++)
      {
         Node childNode = children.item(j);
         if (childNode instanceof Element)
         {
            Element childElement = (Element) childNode;
            String tagName = childElement.getTagName();
            Text textNode = (Text) childElement.getFirstChild();

            String data = textNode.getData();
            if (tagName.equals("description"))
               name = data;
            else if (tagName.equals("price"))
               price = Double.parseDouble(data);
         }
      }
      return new Product(name, price);
   }

   private DocumentBuilder builder;
}
File ItemListParserTest.java
import java.util.ArrayList;

/**
   This program parses an XML file containing an item list.
   It prints out the items that are described in the XML file.
*/
public class ItemListParserTest
{
   public static void main(String[] args) throws Exception
   {
      ItemListParser parser = new ItemListParser();
      ArrayList items = parser.parse("items.xml");
      for (int i = 0; i < items.size(); i++)
      {
         Item anItem = (Item) items.get(i);
         System.out.println(anItem.format());
      }
   }
}
Creating XML Documents
• We can build a Document object in a Java program and then save it as an XML document
• We need a DocumentBuilder object to create a new, empty document
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument(); // empty document
• The Document interface has methods to create elements and text nodes
Creating XML Documents
• To create an element, use the createElement method and pass it a tag
Element itemElement = doc.createElement("item");
• To create a text node, use createTextNode and pass it a string
Text quantityText = doc.createTextNode("8");
• Use the setAttribute method to add an attribute to the tag
priceElement.setAttribute("currency", "USD");
Creating XML Documents
• To construct the tree structure of a document
o start with the root
o add children with appendChild
• To build an XML tree that describes an item
// create elements
Element itemElement = doc.createElement("item");
Element productElement = doc.createElement("product");
Element descriptionElement = doc.createElement("description");
Element priceElement = doc.createElement("price");
Element quantityElement = doc.createElement("quantity");
Text descriptionText = doc.createTextNode("Ink Jet Refill Kit");
Text priceText = doc.createTextNode("29.95");
Text quantityText = doc.createTextNode("8");
// add elements to the document
doc.appendChild(itemElement);
itemElement.appendChild(productElement);
itemElement.appendChild(quantityElement);
productElement.appendChild(descriptionElement);
productElement.appendChild(priceElement);
descriptionElement.appendChild(descriptionText);
priceElement.appendChild(priceText);
quantityElement.appendChild(quantityText);
Creating XML Documents
• Use a Transformer to write an XML document to a stream
• Create a transformer
Transformer t = TransformerFactory.newInstance().newTransformer();
• Create a DOMSource from your document
• Create a StreamResult from your output stream
• Call the transform method of your transformer
t.transform(new DOMSource(doc), new StreamResult(System.out));
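The building and writing steps can be combined into one small program. This is a sketch under stated assumptions: the class name BuildAndWrite and the method itemXml are hypothetical, the result is collected in a StringWriter rather than written to System.out, and the XML declaration is suppressed with an output property so that only the item element appears.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BuildAndWrite
{
   /** Builds an item tree and returns it as XML text. */
   static String itemXml(String description, String price, String quantity)
   {
      try
      {
         DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         Document doc = builder.newDocument();

         // create elements
         Element itemElement = doc.createElement("item");
         Element productElement = doc.createElement("product");
         Element descriptionElement = doc.createElement("description");
         Element priceElement = doc.createElement("price");
         Element quantityElement = doc.createElement("quantity");
         priceElement.setAttribute("currency", "USD");

         // assemble the tree, starting with the root
         doc.appendChild(itemElement);
         itemElement.appendChild(productElement);
         itemElement.appendChild(quantityElement);
         productElement.appendChild(descriptionElement);
         productElement.appendChild(priceElement);
         descriptionElement.appendChild(doc.createTextNode(description));
         priceElement.appendChild(doc.createTextNode(price));
         quantityElement.appendChild(doc.createTextNode(quantity));

         // write the tree to a string
         Transformer t = TransformerFactory.newInstance().newTransformer();
         t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
         StringWriter out = new StringWriter();
         t.transform(new DOMSource(doc), new StreamResult(out));
         return out.toString();
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(itemXml("Ink Jet Refill Kit", "29.95", "8"));
   }
}
```

The output contains the same nesting as the tree: a product element with description and price children, followed by the quantity element.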
File ItemListBuilder.java
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

/**
   Builds a DOM document for an array list of items.
*/
public class ItemListBuilder
{
   /**
      Constructs an item list builder.
   */
   public ItemListBuilder()
         throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
         = DocumentBuilderFactory.newInstance();
      builder = factory.newDocumentBuilder();
   }

   /**
      Builds a DOM document for an array list of items.
      @param items the items
      @return a DOM document describing the items
   */
   public Document build(ArrayList<Item> items)
   {
      doc = builder.newDocument();
      Element root = createItemList(items);
      doc.appendChild(root);
      return doc;
   }

   /**
      Builds a DOM element for an array list of items.
      @param items the items
      @return a DOM element describing the items
   */
   private Element createItemList(ArrayList<Item> items)
   {
      Element itemsElement = doc.createElement("items");
      for (int i = 0; i < items.size(); i++)
      {
         Item anItem = items.get(i);
         Element itemElement = createItem(anItem);
         itemsElement.appendChild(itemElement);
      }
      return itemsElement;
   }

   /**
      Builds a DOM element for an item.
      @param anItem the item
      @return a DOM element describing the item
   */
   private Element createItem(Item anItem)
   {
      Element itemElement = doc.createElement("item");
      Element productElement
         = createProduct(anItem.getProduct());
      Text quantityText = doc.createTextNode(
         "" + anItem.getQuantity());
      Element quantityElement = doc.createElement("quantity");
      quantityElement.appendChild(quantityText);

      itemElement.appendChild(productElement);
      itemElement.appendChild(quantityElement);
      return itemElement;
   }

   /**
      Builds a DOM element for a product.
      @param p the product
      @return a DOM element describing the product
   */
   private Element createProduct(Product p)
   {
      Text descriptionText
         = doc.createTextNode(p.getDescription());
      Text priceText = doc.createTextNode("" + p.getPrice());

      Element descriptionElement
         = doc.createElement("description");
      Element priceElement = doc.createElement("price");

      descriptionElement.appendChild(descriptionText);
      priceElement.appendChild(priceText);

      Element productElement = doc.createElement("product");

      productElement.appendChild(descriptionElement);
      productElement.appendChild(priceElement);

      return productElement;
   }

   private DocumentBuilder builder;
   private Document doc;
}
File ItemListBuilderTest.java
import java.util.ArrayList;
import org.w3c.dom.Document;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

/**
   This program tests the item list builder. It prints the
   XML file corresponding to a DOM document containing a list
   of items.
*/
public class ItemListBuilderTest
{
   public static void main(String[] args) throws Exception
   {
      ArrayList<Item> items = new ArrayList<Item>();
      items.add(new Item(new Product("Toaster", 29.95), 3));
      items.add(new Item(new Product("Hair dryer", 24.95), 1));

      ItemListBuilder builder = new ItemListBuilder();
      Document doc = builder.build(items);
      Transformer t = TransformerFactory
         .newInstance().newTransformer();
      t.transform(new DOMSource(doc),
         new StreamResult(System.out));
   }
}
Document Type Definitions
• A DTD is a set of rules for correctly formed documents of a particular type
o Describes the legal attributes for each element type
o Describes the legal child elements for each element type
• Legal child elements are described with an ELEMENT rule
<!ELEMENT items (item*)>
• The items element (the root in this case) can have 0 or more item elements
• Definition of an item node
<!ELEMENT item (product, quantity)>
• Children of the item node must be a product node followed by a quantity node
Document Type Definitions
• Definition of product node
<!ELEMENT product (description, price)>
• The other nodes
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
• #PCDATA stands for parsed character data, which is just text
o Can contain any characters
o Special characters have to be encoded when they occur in character data
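The encoding of special characters is handled by the library in both directions. The sketch below, with the hypothetical class name EscapingDemo and hypothetical methods serialize and parseText, writes a text node containing & and < and then reads encoded character data back. It uses getTextContent, a DOM Level 3 method that the slides do not cover, to collect the text of the element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EscapingDemo
{
   /** Wraps the text in a description element and returns the XML. */
   static String serialize(String text)
   {
      try
      {
         DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         Document doc = builder.newDocument();
         Element description = doc.createElement("description");
         description.appendChild(doc.createTextNode(text));
         doc.appendChild(description);

         Transformer t = TransformerFactory.newInstance().newTransformer();
         t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
         StringWriter out = new StringWriter();
         t.transform(new DOMSource(doc), new StreamResult(out));
         return out.toString();
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   /** Parses the XML and returns the decoded text of the root element. */
   static String parseText(String xml)
   {
      try
      {
         DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         Document doc = builder.parse(new InputSource(new StringReader(xml)));
         return doc.getDocumentElement().getTextContent();
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      // The & and < characters are written as &amp; and &lt;
      System.out.println(serialize("Salt & Pepper < 5"));
      // The parser decodes them again
      System.out.println(
         parseText("<description>Salt &amp; Pepper &lt; 5</description>"));
   }
}
```

Programs that assemble XML by string concatenation have to do this encoding themselves, which is one reason to build a DOM tree and let the Transformer write it.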
DTD for Item List
<!ELEMENT items (item)*>
<!ELEMENT item (product, quantity)>
<!ELEMENT product (description, price)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
Document Type Definitions
• A DTD gives you control over the allowed attributes of an element
<!ATTLIST Element Attribute Type Default>
• Type can be any sequence of character data specified as CDATA
• Type can also specify a finite number of choices
<!ATTLIST price currency (USD | EUR | JPY) #REQUIRED>
Document Type Definitions
• #IMPLIED keyword means you can supply an attribute or not.
<!ATTLIST price currency CDATA #IMPLIED >
• If you omit the attribute, the application processing the XML data implicitly assumes some default value
• You can specify a default to be used if the attribute is not specified
<!ATTLIST price currency CDATA "USD" >
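The difference between #IMPLIED and an explicit default shows up when a program asks for the attribute. A minimal sketch, assuming a document whose DTD is written inside the DOCTYPE declaration; the class name AttributeDefaultDemo and the method currencyOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttributeDefaultDemo
{
   /**
      Parses a one-element document whose DTD contains the given
      ATTLIST rule and returns the currency attribute of the price.
   */
   static String currencyOf(String attlistRule, String priceElement)
   {
      String xml = "<?xml version=\"1.0\"?>"
         + "<!DOCTYPE price [<!ELEMENT price (#PCDATA)>" + attlistRule + "]>"
         + priceElement;
      try
      {
         DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         Document doc = builder.parse(new InputSource(new StringReader(xml)));
         return doc.getDocumentElement().getAttribute("currency");
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      String withDefault = "<!ATTLIST price currency CDATA \"USD\">";
      String implied = "<!ATTLIST price currency CDATA #IMPLIED>";

      // The parser supplies the default from the DTD
      System.out.println(currencyOf(withDefault, "<price>29.95</price>"));
      // An explicit value overrides the default
      System.out.println(
         currencyOf(withDefault, "<price currency=\"EUR\">29.95</price>"));
      // With #IMPLIED the attribute is absent; getAttribute returns ""
      System.out.println(
         currencyOf(implied, "<price>29.95</price>").isEmpty());
   }
}
```

With #IMPLIED, any default has to be applied by the program itself after it sees the empty string.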
Parsing with Document Type Definitions
• Specify a DTD with every XML document
• Instruct the parser to check that the document follows the rules of the DTD
• Then the parser can be more intelligent about parsing
• If the parser knows that the children of an element are elements, it can suppress white spaces
Parsing with Document Type Definitions
• An XML document can reference a DTD in one of two ways
• The document may contain the DTD
• The document may refer to a DTD stored elsewhere
• A DTD is introduced with a DOCTYPE declaration
Parsing with Document Type Definitions
• If the document contains the DTD, the declaration looks like this:
<!DOCTYPE rootElement [ rules ]>
• Example
<?xml version="1.0"?>
<!DOCTYPE items [
<!ELEMENT items (item*)>
<!ELEMENT item (product, quantity)>
<!ELEMENT product (description, price)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<items>
   <item>
      <product>
         <description>Ink Jet Refill Kit</description>
         <price>29.95</price>
      </product>
      <quantity>8</quantity>
   </item>
   <item>
      <product>
         <description>4-port Mini Hub</description>
         <price>19.95</price>
      </product>
      <quantity>4</quantity>
   </item>
</items>
Parsing with Document Type Definitions
• If the DTD is stored outside the document, use the SYSTEM keyword inside the DOCTYPE declaration
• This indicates that the system must locate the DTD
• The location of the DTD follows the SYSTEM keyword
• A DOCTYPE declaration can point to a local file
<!DOCTYPE items SYSTEM "items.dtd">
• A DOCTYPE declaration can point to a URL
<!DOCTYPE items SYSTEM "http://www.mycompany.com/dtds/items.dtd">
Parsing with Document Type Definitions
• When your XML document has a DTD, use validation when parsing
• Then the parser will check that all child elements and attributes conform to the ELEMENT and ATTLIST rules in the DTD
• The parser reports an error if the document is invalid; install an ErrorHandler with the setErrorHandler method of the DocumentBuilder if you want validity errors to be thrown as exceptions
• Use the setValidating method of the DocumentBuilderFactory before calling the newDocumentBuilder method
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(. . .);
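Validation can be tried out on an in-memory document. The sketch below installs an ErrorHandler that rethrows validity errors, so that an invalid document reliably surfaces as a SAXException. The class name ValidatingParse, the method isValid, and the DTD constant are hypothetical; the rules in the constant are the item list DTD from these slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidatingParse
{
   static final String DTD = "<!DOCTYPE items ["
      + "<!ELEMENT items (item*)>"
      + "<!ELEMENT item (product, quantity)>"
      + "<!ELEMENT product (description, price)>"
      + "<!ELEMENT quantity (#PCDATA)>"
      + "<!ELEMENT description (#PCDATA)>"
      + "<!ELEMENT price (#PCDATA)>"
      + "]>";

   /** Returns true if the document conforms to its DTD. */
   static boolean isValid(String xml)
   {
      try
      {
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         factory.setValidating(true); // must precede newDocumentBuilder
         DocumentBuilder builder = factory.newDocumentBuilder();
         builder.setErrorHandler(new ErrorHandler()
         {
            public void warning(SAXParseException e) {}
            public void error(SAXParseException e) throws SAXException
            {
               throw e; // validity errors arrive here
            }
            public void fatalError(SAXParseException e) throws SAXException
            {
               throw e; // well-formedness errors arrive here
            }
         });
         builder.parse(new InputSource(new StringReader(xml)));
         return true;
      }
      catch (SAXException e)
      {
         return false;
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      String product = "<product><description>Toaster</description>"
         + "<price>29.95</price></product>";
      String quantity = "<quantity>3</quantity>";

      // product followed by quantity, as the DTD requires
      System.out.println(isValid(
         DTD + "<items><item>" + product + quantity + "</item></items>"));
      // quantity before product violates the item rule
      System.out.println(isValid(
         DTD + "<items><item>" + quantity + product + "</item></items>"));
   }
}
```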
Parsing with Document Type Definitions
• If the parser validates the document with a DTD, you can avoid validity checks in your code
• You can tell the parser to ignore white space in non-text elements
factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);
• If the parser has access to a DTD, it can fill in defaults for attributes
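The effect of these two settings can be measured by counting the children of the root element with and without them. A minimal sketch; the class name WhitespaceDemo, the method childCount, and the SAMPLE constant are hypothetical, and the sample document carries the item list DTD inside its DOCTYPE declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WhitespaceDemo
{
   // One item, with a line break before and after it
   static final String SAMPLE = "<!DOCTYPE items ["
      + "<!ELEMENT items (item*)>"
      + "<!ELEMENT item (product, quantity)>"
      + "<!ELEMENT product (description, price)>"
      + "<!ELEMENT quantity (#PCDATA)>"
      + "<!ELEMENT description (#PCDATA)>"
      + "<!ELEMENT price (#PCDATA)>"
      + "]>"
      + "<items>\n"
      + "<item><product><description>Toaster</description>"
      + "<price>29.95</price></product><quantity>3</quantity></item>\n"
      + "</items>";

   /** Counts the child nodes of the root element. */
   static int childCount(String xml, boolean ignoreWhitespace)
   {
      try
      {
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         if (ignoreWhitespace)
         {
            // The parser needs the DTD's content model, hence validation
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);
         }
         DocumentBuilder builder = factory.newDocumentBuilder();
         Document doc = builder.parse(new InputSource(new StringReader(xml)));
         return doc.getDocumentElement().getChildNodes().getLength();
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(childCount(SAMPLE, false)); // prints 3
      System.out.println(childCount(SAMPLE, true));  // prints 1
   }
}
```

With the white space gone, every child of items is known to be an item element, which is what lets the revised parser below cast children directly.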
File ItemListParser.java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

/**
   An XML parser for item lists
*/
public class ItemListParser
{
   /**
      Constructs a parser that can parse item lists
   */
   public ItemListParser()
         throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
         = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      factory.setIgnoringElementContentWhitespace(true);
      builder = factory.newDocumentBuilder();
   }

   /**
      Parses an XML file containing an item list
      @param fileName the name of the file
      @return an array list containing all items in the XML file
   */
   public ArrayList<Item> parse(String fileName)
         throws SAXException, IOException
   {
      File f = new File(fileName);
      Document doc = builder.parse(f);

      // get the <items> root element

      Element root = doc.getDocumentElement();
      return getItems(root);
   }

   /**
      Obtains an array list of items from a DOM element
      @param e an <items> element
      @return an array list of all <item> children of e
   */
   private static ArrayList<Item> getItems(Element e)
   {
      ArrayList<Item> items = new ArrayList<Item>();

      // get the <item> children; the DTD guarantees that
      // every child is an <item> element

      NodeList children = e.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
      {
         Element childElement = (Element) children.item(i);
         Item c = getItem(childElement);
         items.add(c);
      }
      return items;
   }

   /**
      Obtains an item from a DOM element
      @param e an <item> element
      @return the item described by the given element
   */
   private static Item getItem(Element e)
   {
      NodeList children = e.getChildNodes();

      // child 0 is the <product>, child 1 the <quantity>

      Product p = getProduct((Element) children.item(0));

      Element quantityElement = (Element) children.item(1);
      Text quantityText
         = (Text) quantityElement.getFirstChild();
      int quantity = Integer.parseInt(quantityText.getData());

      return new Item(p, quantity);
   }

   /**
      Obtains a product from a DOM element
      @param e a <product> element
      @return the product described by the given element
   */
   private static Product getProduct(Element e)
   {
      NodeList children = e.getChildNodes();

      // child 0 is the <description>, child 1 the <price>

      Element descriptionElement = (Element) children.item(0);
      Text descriptionText
         = (Text) descriptionElement.getFirstChild();
      String description = descriptionText.getData();

      Element priceElement = (Element) children.item(1);
      Text priceText
         = (Text) priceElement.getFirstChild();
      double price = Double.parseDouble(priceText.getData());

      return new Product(description, price);
   }

   private DocumentBuilder builder;
}
File ItemListParserTest.java
import java.util.ArrayList;

/**
   This program parses an XML file containing an item list.
   The XML file should reference the items.dtd
*/
public class ItemListParserTest
{
   public static void main(String[] args) throws Exception
   {
      ItemListParser parser = new ItemListParser();
      ArrayList items = parser.parse("items.xml");
      for (int i = 0; i < items.size(); i++)
      {
         Item anItem = (Item) items.get(i);
         System.out.println(anItem.format());
      }
   }
}