Document Object Model (DOM)
Cheng-Chia Chen
What is DOM ?
• DOM (Document Object Model)
• A tree-view data model of XML documents
• An API for XML document processing
– works across programming languages
– language neutral
– defined in terms of CORBA IDL
– language-specific bindings supplied for ECMAScript, Java, ….
DOM (Document Object Model)
What is the Document Object Model of the following document:
<?xml version="1.0" encoding="UTF-8" ?>
<TABLE><TBODY>
  <TR> <TD>紅樓夢 </TD> <TD>曹雪芹 </TD> </TR>
  <TR> <TD>三國演義 </TD> <TD>羅貫中 </TD> </TR>
</TBODY></TABLE>
Tree view (DOM view) of an XML Document
[Figure: tree with the document node as root, element nodes (TABLE, TBODY, TR, TD) below it, and text nodes (紅樓夢, 曹雪芹, 三國演義, 羅貫中) as leaves]
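The tree view can be reproduced with a few lines of Java. The following is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name DomTreeOutline and its outline method are hypothetical names chosen for illustration. It parses a document and writes one line per node, indented by depth.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: prints the DOM tree of an XML string, one node per line.
class DomTreeOutline {
    /** Parses the XML text and returns one line per node, indented by depth. */
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    private static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Show the character data of text nodes next to their "#text" name.
            out.append(" \"").append(node.getNodeValue().trim()).append('"');
        }
        out.append('\n');
        // Visit children through the firstChild / nextSibling links.
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
                "<TABLE><TBODY><TR><TD>紅樓夢</TD><TD>曹雪芹</TD></TR></TBODY></TABLE>"));
    }
}
```

Whitespace between tags in the source document would show up as additional text nodes in the outline.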
Class/interface hierarchy of the DOM (Core) Level 1 & 2 spec.
Node
– Attr
– CharacterData
  – Comment
  – Text
    – CDATASection
– Document
– DocumentFragment
– DocumentType
– Element
– (general) Entity
– EntityReference
– Notation
– ProcessingInstruction
Types that do not extend Node: DOMImplementation, NamedNodeMap, NodeList, DOMException
Possible children of different kinds of nodes
• Document
– Element (≤ 1), DocumentType (≤ 1), ProcessingInstruction, Comment
• Element, DocumentFragment, EntityReference, Entity
– Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
• Attr
– Text, EntityReference
• Text, CDATASection, Comment, Notation, ProcessingInstruction, DocumentType
– are leaves (no children)
Notes:
1. Attr is not a child of any element.
2. Entities and Notations defined in the DTD can be accessed via getEntities() and getNotations() of DocumentType.
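Two of these rules are easy to observe in code. The sketch below assumes the JDK's bundled DOM implementation, which enforces the rules when nodes are inserted; ChildRulesSketch and its method names are hypothetical. It shows that a Document rejects a second element child and that an attribute does not appear among an element's children.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class for the child rules listed above.
class ChildRulesSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the DOMException code raised by adding a second root element, or -1 if none. */
    static int secondRootErrorCode() {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("first"));
        try {
            doc.appendChild(doc.createElement("second")); // Document allows at most one Element
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    /** Attributes are reached via getAttributes(), not via the child list. */
    static int attributeOnlyChildCount() {
        Document doc = newDocument();
        Element item = doc.createElement("item");
        item.setAttribute("id", "1");
        return item.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println("second root -> DOMException code " + secondRootErrorCode());
        System.out.println("children of <item id='1'/>: " + attributeOnlyChildCount());
    }
}
```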
Node and NodeType constants
public interface Node {
    // NodeType: there are 12 kinds of nodes
    public static final short ELEMENT_NODE = 1;
    public static final short ATTRIBUTE_NODE = 2;
    public static final short TEXT_NODE = 3;
    public static final short CDATA_SECTION_NODE = 4;
    public static final short ENTITY_REFERENCE_NODE = 5;
    public static final short ENTITY_NODE = 6;
    public static final short PROCESSING_INSTRUCTION_NODE = 7;
    public static final short COMMENT_NODE = 8;
    public static final short DOCUMENT_NODE = 9;
    public static final short DOCUMENT_TYPE_NODE = 10;
    public static final short DOCUMENT_FRAGMENT_NODE = 11;
    public static final short NOTATION_NODE = 12;
    // attributes and methods of Node are listed on the following slides
}
IDL2Java Mapping of IDL attributes
// syntax of IDL attributes:
[readonly] attribute <type> <attrName> [// raises(<exception>) ]*
// which we abbreviate as
<type>[R]:<attrName>
Each IDL attribute is translated into one or two Java methods:
• public <type> get<AttrName>() [throws {<exceptions>}];
if it is readable, and
• public void set<AttrName>(<type> <newAttValue>) [throws {<exceptions>}];
if it is writable.
Example:
• The following attributes of the Node interface:
readonly attribute DOMString nodeName;
attribute DOMString nodeValue;
// raises(DOMException) on setting
// raises(DOMException) on retrieval
readonly attribute Node parentNode;
are abbreviated as:
String[R]:nodeName,
String:nodeValue,
Node[R]:parentNode,
respectively, and are mapped to 4 Java methods:
public String getNodeName();
public String getNodeValue() throws DOMException;
public void setNodeValue(String nodeValue) throws DOMException;
public Node getParentNode();
Node attributes
// nodeName, nodeType and nodeValue
• String[R] : nodeName;
• short[R] : nodeType;
• String : nodeValue;
// raises(DOMException) on get/set
// namespace support: DOM2 only
• String[R] : namespaceURI;
• String[R] : localName;
• String : prefix
// node owner:
• Document[R]: ownerDocument;
Values of nodeName, nodeValue and attributes in a Node
Interface               nodeName                    nodeValue                   attributes
Attr                    name of attribute           value of attribute          null
CDATASection            #cdata-section              content                     null
Comment                 #comment                    content                     null
Document                #document                   null                        null
DocumentFragment        #document-fragment          null                        null
DocumentType            document type name          null                        null
Element                 tag name                    null                        NamedNodeMap
Entity                  entity name                 null                        null
EntityReference         name of entity referenced   null                        null
Notation                notation name               null                        null
ProcessingInstruction   target                      content excluding target    null
Text                    #text                       content of the text node    null
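Several rows of this table can be checked directly. The following sketch, assuming the JDK's bundled DOM implementation, creates a few node kinds and formats each as "nodeName / nodeValue"; NodeNamesSketch, describe and samples are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class: reports nodeName and nodeValue for a few node kinds.
class NodeNamesSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Formats a node as "nodeName / nodeValue"; a null value prints as "null". */
    static String describe(Node n) {
        return n.getNodeName() + " / " + n.getNodeValue();
    }

    static String[] samples() {
        Document doc = newDocument();
        Element book = doc.createElement("book");
        book.setAttribute("lang", "zh");
        return new String[] {
            describe(doc),                                   // Document
            describe(book),                                  // Element
            describe(book.getAttributeNode("lang")),         // Attr
            describe(doc.createTextNode("hi")),              // Text
            describe(doc.createComment("note")),             // Comment
            describe(doc.createProcessingInstruction("target", "data")) // ProcessingInstruction
        };
    }

    public static void main(String[] args) {
        for (String line : samples()) {
            System.out.println(line);
        }
    }
}
```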
Node attributes
// node relatives
• Node[R] : parentNode, firstChild, lastChild,
• Node[R] : previousSibling, nextSibling;
• NodeList [R] : childNodes;
• NamedNodeMap[R]: attributes;
[Figure: a node ("this") with its parentNode above, previousSibling and nextSibling beside it, and its childNodes below, running from firstChild to lastChild]
Node manipulation and testing methods
public Node insertBefore(Node newChild, Node refChild)
public Node replaceChild(Node newChild, Node oldChild)
public Node removeChild(Node oldChild)
public Node appendChild(Node newChild)
// all of the above 4 methods throw DOMException
public boolean hasChildNodes();
public Node cloneNode(boolean deep);
// Introduced in DOM Level 2:
public boolean hasAttributes(); // true if this node is an element and has attributes
public void normalize(); // merge adjacent descendant Text nodes into one
public boolean isSupported(String feature, String version);
// same as hasFeature(feature, version) in DOMImplementation
NodeList and NamedNodeMap
public interface NodeList { // access node collection by index
    public Node item(int index); // zero-based
    public int getLength();
}
public interface NamedNodeMap {
    public Node getNamedItem(String name); // by nodeName
    public Node setNamedItem(Node arg) throws DOMException;
    // insert/replace the node whose nodeName equals arg.getNodeName()
    public Node removeNamedItem(String name) throws DOMException;
    public Node item(int index);
    public int getLength();
    // Introduced in DOM Level 2:
    public Node getNamedItemNS(String namespaceURI, String localName);
    public Node setNamedItemNS(Node arg) throws DOMException;
    public Node removeNamedItemNS(String namespaceURI, String localName) throws DOMException;
}
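Both collection interfaces are typically walked with item(i) and getLength(). The sketch below assumes the JDK's bundled JAXP parser; NodeCollectionsSketch and its method names are hypothetical. It counts child elements through a NodeList and looks up attributes through a NamedNodeMap.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for NodeList and NamedNodeMap access.
class NodeCollectionsSketch {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the element children of the root by index-based NodeList access. */
    static int countChildElements(String xml) {
        NodeList children = parseRoot(xml).getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    /** Looks up a root attribute by name; returns null when it is absent. */
    static String attributeValue(String xml, String name) {
        NamedNodeMap attrs = parseRoot(xml).getAttributes();
        Node attr = attrs.getNamedItem(name);
        return attr == null ? null : attr.getNodeValue();
    }

    static int attributeCount(String xml) {
        return parseRoot(xml).getAttributes().getLength();
    }

    public static void main(String[] args) {
        String xml = "<r a='1' b='2'><x/>text<y/></r>";
        System.out.println(countChildElements(xml));
        System.out.println(attributeValue(xml, "b"));
        System.out.println(attributeCount(xml));
    }
}
```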
Element
public interface Element extends Node {
    public String getTagName(); // String[R]:tagName; same value as getNodeName()
    public String getAttribute(String name); // returns the attribute value
    // set/replace attr; value is not parsed; for a value with entity references,
    // use setAttributeNode instead
    public void setAttribute(String name, String value) throws DOMException;
    public void removeAttribute(String name) throws DOMException;
    public Attr getAttributeNode(String name);
    public Attr setAttributeNode(Attr newAttr) throws DOMException;
    // add/replace newAttr; returns the replaced attr or null
    public Attr removeAttributeNode(Attr oldAttr) throws DOMException;
    public NodeList getElementsByTagName(String name);
    // and additional DOM2 methods …
Additional Element methods in DOM2
// Introduced in DOM Level 2:
String getAttributeNS(namespaceURI, localName);
void setAttributeNS(namespaceURI, qualifiedName, value)
throws DOMException;
// set/replace attribute; value not parsed
void removeAttributeNS(namespaceURI, localName) throws DOMException;
Attr getAttributeNodeNS(namespaceURI, localName);
Attr setAttributeNodeNS(Attr newAttr) throws DOMException;
NodeList getElementsByTagNameNS(namespaceURI, localName);
boolean hasAttribute(name);
boolean hasAttributeNS(namespaceURI, localName); };
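The namespace-aware attribute methods can be tried as follows. This is a sketch assuming the JDK's bundled DOM implementation; NamespaceAttributesSketch is a hypothetical name, and the SVG and XLink namespace URIs are only sample data.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class for the DOM2 namespace-aware Element methods.
class NamespaceAttributesSketch {
    static final String SVG = "http://www.w3.org/2000/svg";
    static final String XLINK = "http://www.w3.org/1999/xlink";

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        Document doc = newDocument();
        Element a = doc.createElementNS(SVG, "svg:a");
        a.setAttributeNS(XLINK, "xlink:href", "#top"); // set with a qualified name
        a.setAttribute("id", "link1");                  // plain DOM1-style attribute
        return a.getAttributeNS(XLINK, "href")          // read back with the local name
                + " " + a.hasAttributeNS(XLINK, "href")
                + " " + a.hasAttribute("id")
                + " " + a.getLocalName()
                + " " + a.getAttribute("missing").isEmpty(); // absent attribute yields ""
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the setters take the qualified name (prefix included), while the getters take the local name.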
The Document node
public interface Document extends Node {
    // 3 attributes:
    DocumentType[R]: doctype;
    DOMImplementation[R]: implementation;
    Element[R]: documentElement;
    // factory methods: <nodetype> create<nodetype>(data);
    Element createElement(String tagName) throws DOMException;
    DocumentFragment createDocumentFragment();
    Text createTextNode(String data);
    Comment createComment(String data);
    CDATASection createCDATASection(String data) throws DOMException;
    ProcessingInstruction createProcessingInstruction(String target, String data)
        throws DOMException;
the Document node (cont’d)
    Attr createAttribute(name) throws DOMException;
    EntityReference createEntityReference(name) throws DOMException;
    // end of factory methods
    NodeList getElementsByTagName(tagname);
    // DOM 2
    Node importNode(Node importedNode, boolean deep) throws DOMException;
    Element createElementNS(namespaceURI, qualifiedName) throws DOMException;
    Attr createAttributeNS(namespaceURI, qualifiedName) throws DOMException;
    NodeList getElementsByTagNameNS(namespaceURI, localName);
    public Element getElementById(String elementId);
}
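A short sketch of the factory methods in use, assuming the JDK's bundled DOM implementation; DocumentFactorySketch and the element names are hypothetical. Nodes are created by the owning Document and then attached, and a node from another document is copied in with importNode.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class for the Document factory methods.
class DocumentFactorySketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String build() {
        Document doc = newDocument();
        Element root = doc.createElement("library");
        doc.appendChild(root);
        root.appendChild(doc.createComment("sample data"));

        Element book = doc.createElement("book");
        book.appendChild(doc.createTextNode("title"));
        root.appendChild(book);

        // A node created by another document must be imported before it can be attached.
        Document other = newDocument();
        Element foreign = other.createElement("book");
        root.appendChild(doc.importNode(foreign, true));

        return doc.getDocumentElement().getTagName()
                + ":" + doc.getElementsByTagName("book").getLength();
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```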
CharacterData
public interface CharacterData extends Node {
    public String getData() throws DOMException;
    public void setData(String data) throws DOMException;
    public int getLength();
    public String substringData(int offset, int count) throws DOMException;
    public void appendData(String arg) throws DOMException;
    public void insertData(int offset, String arg) throws DOMException;
    public void deleteData(int offset, int count) throws DOMException;
    public void replaceData(int offset, int count, String arg) throws DOMException;
}
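The offset-based editing methods can be traced step by step. The following sketch assumes the JDK's bundled DOM implementation; CharacterDataSketch is a hypothetical name. Offsets are zero-based, as in the interface above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Text;

// Hypothetical demo class for the CharacterData editing methods.
class CharacterDataSketch {
    static String edit() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Text t = doc.createTextNode("Hello DOM");
        t.appendData("!");           // Hello DOM!
        t.insertData(5, ",");        // Hello, DOM!
        t.replaceData(7, 3, "XML");  // Hello, XML!
        t.deleteData(5, 1);          // Hello XML!
        return t.getData() + "|" + t.substringData(6, 3) + "|" + t.getLength();
    }

    public static void main(String[] args) {
        System.out.println(edit());
    }
}
```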
Attr, Text and Comment
public interface Attr extends Node {
    public String getName();
    public boolean getSpecified();
    public String getValue();
    public void setValue(String value);
    public Element getOwnerElement(); // DOM2
}
public interface Text extends CharacterData {
    public Text splitText(int offset) throws DOMException;
}
public interface Comment extends CharacterData { }
CDATASection, DocumentType and Notation
public interface CDATASection extends Text {}
public interface DocumentType extends Node {
    String getName();
    NamedNodeMap getEntities(); // general entities (internal/external) only; parameter entities excluded
    NamedNodeMap getNotations();
    // DOM2 only methods
    String getPublicId(); // publicId and
    String getSystemId(); // systemId of the external subset, if any
    String getInternalSubset(); // internal subset as a string
}
public interface Notation extends Node {
    public String getPublicId();
    public String getSystemId();
}
Entity, EntityReference and ProcessingInstruction
public interface Entity extends Node { // for general or unparsed entities only
public String getPublicId();
public String getSystemId();
public String getNotationName(); }
// An Entity's replacement text is stored as its childNodes, if available.
public interface EntityReference extends Node { }
public interface ProcessingInstruction extends Node {
public String getTarget();
public String getData();
public void setData(String data) throws DOMException; }
DOMException
public abstract class DOMException extends RuntimeException {
public DOMException(short code, String message) {
super(message); this.code = code; }
public short code;
// ExceptionCode
public static final short INDEX_SIZE_ERR = 1;
public static final short DOMSTRING_SIZE_ERR = 2;
public static final short HIERARCHY_REQUEST_ERR = 3;
public static final short WRONG_DOCUMENT_ERR = 4;
public static final short INVALID_CHARACTER_ERR = 5;
public static final short NO_DATA_ALLOWED_ERR = 6;
public static final short NO_MODIFICATION_ALLOWED_ERR = 7;
public static final short NOT_FOUND_ERR = 8;
public static final short NOT_SUPPORTED_ERR = 9;
public static final short INUSE_ATTRIBUTE_ERR = 10;
DOMException (cont’d)
// DOM2 only DOMException code
public static final short INVALID_STATE_ERR = 11;
public static final short SYNTAX_ERR = 12;
public static final short INVALID_MODIFICATION_ERR = 13;
public static final short NAMESPACE_ERR = 14;
public static final short INVALID_ACCESS_ERR = 15;}
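Because DOMException is a RuntimeException, callers inspect its public code field to tell the cases apart. The sketch below assumes the JDK's bundled DOM implementation, which performs these checks; DomErrorCodesSketch and its method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical demo class: triggers three DOMException codes on purpose.
class DomErrorCodesSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs the action and returns the DOMException code it raised, or 0 if none. */
    static short codeOf(Runnable action) {
        try {
            action.run();
            return 0;
        } catch (DOMException e) {
            return e.code;
        }
    }

    /** Attaching a node created by a different document without importNode. */
    static short wrongDocumentCode() {
        Document a = newDocument();
        Document b = newDocument();
        Element host = a.createElement("host");
        return codeOf(() -> host.appendChild(b.createElement("guest")));
    }

    /** Creating an element whose name is not a legal XML name. */
    static short invalidCharacterCode() {
        Document doc = newDocument();
        return codeOf(() -> doc.createElement("not a name"));
    }

    /** Reading character data from an offset past the end. */
    static short indexSizeCode() {
        Text t = newDocument().createTextNode("abc");
        return codeOf(() -> t.substringData(99, 1));
    }

    public static void main(String[] args) {
        System.out.println(wrongDocumentCode() == DOMException.WRONG_DOCUMENT_ERR);
        System.out.println(invalidCharacterCode() == DOMException.INVALID_CHARACTER_ERR);
        System.out.println(indexSizeCode() == DOMException.INDEX_SIZE_ERR);
    }
}
```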
DOMImplementation and DocumentFragment
public interface DOMImplementation {
public boolean hasFeature(String feature, String version);
public DocumentType createDocumentType(qName, publicId, systemId) throws DOMException;
public Document createDocument(
namespaceURI, // namespace URI of the document element
qName, // QName of the document element
DocumentType doctype) throws DOMException;
}
public interface DocumentFragment extends Node { }
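A sketch of DOMImplementation in use follows. It assumes the implementation is obtained through JAXP's DocumentBuilder.getDOMImplementation(), which is only one of several ways to get one; DomImplementationSketch is a hypothetical name and the XHTML identifiers are sample data.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical demo class: creates a document with a doctype from a DOMImplementation.
class DomImplementationSketch {
    static String create() {
        DOMImplementation impl;
        try {
            impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        DocumentType doctype = impl.createDocumentType(
                "html",
                "-//W3C//DTD XHTML 1.0 Strict//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
        // The document element is created together with the document.
        Document doc = impl.createDocument("http://www.w3.org/1999/xhtml", "html", doctype);
        return doc.getDocumentElement().getTagName()
                + " " + doc.getDoctype().getName()
                + " " + impl.hasFeature("XML", "1.0");
    }

    public static void main(String[] args) {
        System.out.println(create());
    }
}
```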
Legal feature strings
Module                                        Feature string
XML                                           XML
HTML                                          HTML
Views                                         Views
StyleSheets                                   StyleSheets
CSS                                           CSS
CSS (extended interfaces)                     CSS2
Events                                        Events
User Interface Events (UIEvent interface)     UIEvents
Mouse Events (MouseEvents interface)          MouseEvents
Mutation Events (MutationEvent interface)     MutationEvents
HTML Events                                   HTMLEvents
Traversal                                     Traversal
Range                                         Range
Module dependence
Module Implies
Views XML or HTML
StyleSheets StyleSheets and XML or HTML
CSS StyleSheets, Views and XML or HTML
CSS2 CSS, StyleSheets, Views and XML or HTML
Events XML or HTML
UIEvents Views, Events and XML or HTML
MouseEvents UIEvents, Views, Events and XML or HTML
MutationEvents Events and XML or HTML
HTMLEvents Events and XML or HTML
DOMParsers and DOMImplementations
Problems:
• How to get a DOM object from an XML document?
– use a DOMParser
• How to construct DOM objects directly in a program?
– get a DOMImplementation
• How to get a DOM object from an XML document and modify it in a program?
– get a DOMParser, then get the DOMImplementation from the DOM object.
[Figure: XML Document → DOMParser → DOM Document]
Use Apache’s Xerces for DOM
• XML2DOM:
// the DOM parser implementation class: org.apache.xerces.parsers.DOMParser
DOMParser parser = new DOMParser();
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setFeature("http://xml.org/sax/features/namespaces", true);
…
parser.parse(url_or_inputSource);
Document doc = parser.getDocument();
DOMImplementation impl = doc.getImplementation();
• Construct DOM from scratch:
// the DOMImplementation class: org.apache.xerces.dom.DOMImplementationImpl
DOMImplementation dm = new DOMImplementationImpl();
// or dm = DOMImplementationImpl.getDOMImplementation(); // non-DOM (Xerces-specific) method
Document doc = dm.createDocument(…);
Element e = doc.createElement(…);
Attr attr = doc.createAttributeNS(…);
Text txt = doc.createTextNode("…");
JAXP (Java API for XML Processing) 1.1
• Sun’s Java API for XML Processing
• three modules:
– for DOM processing
– for SAX processing
– for transformation
• 5 packages
1. javax.xml.parsers
– Provides classes allowing the processing of XML documents.
– Two types of pluggable parsers are supported:
– SAX (Simple API for XML)
– DOM (Document Object Model)
2. javax.xml.transform (+ javax.xml.transform.dom, javax.xml.transform.sax, javax.xml.transform.stream)
– APIs for processing transformation instructions and performing a transformation from source to result.
JAXP’s DOM pluggability mechanism
JAXP API for DOM
• javax.xml.parsers.DocumentBuilder
– Defines the API to obtain DOM Document instances from an XML document. Using this class, an application programmer can obtain a Document from XML.
• javax.xml.parsers.DocumentBuilderFactory
– Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents.
– abstract class
– A concrete subclass can be obtained by the static method DocumentBuilderFactory.newInstance()
– The desired capabilities of the parser can be specified by setting the various properties of the obtained factory instance.
Example Code
import java.io.IOException;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

DocumentBuilder builder;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
String location = "http://myserver/mycontent.xml";
try {
    builder = factory.newDocumentBuilder();
    Document doc1 = builder.parse(location);
    Document doc2 = builder.newDocument(); // empty document
} catch (SAXException se) { // handle error
} catch (IOException ioe) { // handle error
} catch (ParserConfigurationException pce) { // handle error
}
javax.xml.parsers.DocumentBuilder
• abstract DOMImplementation getDOMImplementation()
– Obtain an instance of a DOMImplementation object.
• abstract Document newDocument()
– Obtain a new instance of a DOM Document object to build a DOM tree with.
• abstract boolean isNamespaceAware()
– Indicates whether or not this parser is configured to understand namespaces.
• abstract boolean isValidating()
– Indicates whether or not this parser is configured to validate XML documents.
• Document parse(File | InputSource | InputStream [, systemId] | uriString)
– Parse the content of the given source as an XML document and return a new DOM Document object.
• abstract void setEntityResolver(EntityResolver er)
– Specify the EntityResolver to be used to resolve entities present in the XML document to be parsed.
• abstract void setErrorHandler(ErrorHandler eh)
– Specify the ErrorHandler to be used to report errors present in the XML document to be parsed.
javax.xml.parsers.DocumentBuilderFactory
• Object getAttribute(String name)
• void setAttribute(String name, Object value)
– Allow the user to get/set specific attributes on the underlying implementation.
• boolean isIgnoringComments(), setIgnoringComments(boolean)
– Indicates (or sets) whether the factory is configured to produce parsers that ignore comments.
• Other properties:
– IgnoringElementContentWhitespace; ExpandEntityReferences;
– Coalescing; // merge adjacent text and CDATA into a single text node
– NamespaceAware; Validating;
• abstract DocumentBuilder newDocumentBuilder()
– Creates a new instance of a DocumentBuilder using the currently configured parameters.
• static DocumentBuilderFactory newInstance()
– Obtain a new instance of a DocumentBuilderFactory.
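The effect of a factory property is visible in the resulting tree. This sketch assumes the JDK's bundled JAXP parser; FactoryPropertiesSketch and commentCount are hypothetical names. It parses the same text twice, with and without IgnoringComments, and counts the Comment children of the root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows how a factory setting changes the parsed tree.
class FactoryPropertiesSketch {
    /** Counts Comment children of the root element under the given factory setting. */
    static int commentCount(String xml, boolean ignoreComments) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(ignoreComments); // configure before creating the builder
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            int count = 0;
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.COMMENT_NODE) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<r><!-- note --><x/></r>";
        System.out.println(commentCount(xml, false));
        System.out.println(commentCount(xml, true));
    }
}
```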
How DocumentBuilderFactory finds its instance
The following are tried in order:
• Use the javax.xml.parsers.DocumentBuilderFactory system property.
• Use the same property in the file "lib/jaxp.properties" in the JRE directory.
• Look for the class name in the file META-INF/services/javax.xml.parsers.DocumentBuilderFactory in jars available to the runtime.
• Fall back to the platform default DocumentBuilderFactory instance, which is "org.apache.crimson.jaxp.DocumentBuilderFactoryImpl" for JAXP 1.1 and Crimson 1.1.
Bootstrap DOM (level 3 core)
• Problem: how to get a DOMImplementation?
– implementation dependent prior to Level 3
– Xerces => org.apache.xerces.dom.DOMImplementationImpl
– Crimson => org.apache.crimson.tree.DOMImplementationImpl
• two supporting classes/interfaces:
– DOMImplementationRegistry
– DOMImplementationSource
public interface DOMImplementationSource {
    DOMImplementation getDOMImplementation(String features);
};
DOMImplementationRegistry
import java.util.StringTokenizer;
import java.util.Vector;

public class DOMImplementationRegistry {
    // The system property to specify the DOMImplementationSource class names.
    public static String PROPERTY = "org.w3c.dom.DOMImplementationSourceList";
    private static Vector sources = new Vector();
    private static boolean initialized = false;

    // Instantiate every source class named in the system property.
    private static void initialize()
            throws ClassNotFoundException, InstantiationException, IllegalAccessException {
        initialized = true;
        String p = System.getProperty(PROPERTY);
        if (p == null) { return; }
        StringTokenizer st = new StringTokenizer(p);
        while (st.hasMoreTokens()) {
            Object source = Class.forName(st.nextToken()).newInstance();
            sources.addElement(source);
        }
    }

    // Return the first implementation that supports the requested features, or null.
    public static DOMImplementation getDOMImplementation(String features)
            throws ClassNotFoundException, InstantiationException, IllegalAccessException {
        if (!initialized) { initialize(); }
        int len = sources.size();
        for (int i = 0; i < len; i++) {
            DOMImplementationSource source = (DOMImplementationSource) sources.elementAt(i);
            DOMImplementation impl = source.getDOMImplementation(features);
            if (impl != null) { return impl; }
        }
        return null;
    }

    /* Register an implementation. */
    public static void addSource(DOMImplementationSource s)
            throws ClassNotFoundException, InstantiationException, IllegalAccessException {
        if (!initialized) { initialize(); }
        sources.addElement(s);
        // update the system property accordingly (the property may not be set yet)
        String current = System.getProperty(PROPERTY);
        StringBuffer b = new StringBuffer(current == null ? "" : current + " ");
        b.append(s.getClass().getName());
        System.setProperty(PROPERTY, b.toString());
    }
}
Get Your DOMImplementation via DOMImplementationRegistry
1. Add all known DOMImplementationSource classes or class names to your JVM:
A. put all class names (space separated) into the system property "org.w3c.dom.DOMImplementationSourceList":
System.setProperty(PROPERTY, classnames);
B. or register a source object directly:
DOMImplementationRegistry.addSource(domImplementationSource);
2. Query the DOMImplementationRegistry:
DOMImplementation impl = DOMImplementationRegistry
.getDOMImplementation("XML 1.0");
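For comparison, the registry that shipped with the final DOM Level 3 Core and is bundled with current JDKs (org.w3c.dom.bootstrap.DOMImplementationRegistry) is obtained through newInstance() and queried with instance methods, unlike the static draft API shown on these slides. A minimal sketch, with RegistryLookupSketch as a hypothetical class name:

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical demo class: looks up a DOMImplementation through the DOM Level 3 registry.
class RegistryLookupSketch {
    /** Returns true if the registry yields an implementation supporting "XML 1.0". */
    static boolean hasXmlImplementation() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation impl = registry.getDOMImplementation("XML 1.0");
            return impl != null && impl.hasFeature("XML", "1.0");
        } catch (ReflectiveOperationException | ClassCastException e) {
            // newInstance() may fail if a configured source class cannot be loaded
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasXmlImplementation());
    }
}
```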
Example: XDXTest
import java.io.File;
import org.w3c.dom.Document;
import org.apache.xerces.parsers.DOMParser;

public class XDXTest {
    public void test(String xmlDocument, String outputFilename) throws Exception {
        File outputFile = new File(outputFilename);
        DOMParser parser = new DOMParser();
        // Get the DOM tree as a Document object
        parser.parse(xmlDocument);
        Document doc = parser.getDocument();
        // Serialize
        DOM2XML d2x = new DOM2XML();
        d2x.toXML(doc, outputFile);
    }
DOM SerializerTest (continued)
public static void main(String[] args) {
if (args.length != 2) {
System.out.println(
"Usage: java XDXTest " +
"[XML document to read] " +
"[filename to write out to]");
System.exit(0); }
try {
XDXTest tester = new XDXTest();
tester.test(args[0], args[1]); // input file, output file name
} catch (Exception e) {
e.printStackTrace();
}
}
}
DOMSerializer
import java.io.*;
import org.w3c.dom.*;

public class DOM2XML {
    private String indent;        // Indentation to use
    private String lineSeparator; // Line separator to use

    public DOM2XML() { indent = ""; lineSeparator = "\n"; }
    public void setIndent(String indent) { this.indent = indent; }
    public void setLineSeparator(String lineSeparator) { … }

    public void toXML(Document doc, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out);
        toXML(doc, writer);
    }
    public void toXML(Document doc, File file) throws IOException { … }
    public void toXML(Document doc, Writer writer) throws IOException {
        // Start serialization recursion with no indenting
        serializeNode(doc, writer, "");
        writer.flush();
    }

    public void serializeNode(Node node, Writer writer, String indentLevel)
            throws IOException {
        // Determine action based on node type
        switch (node.getNodeType()) {
        case Node.DOCUMENT_NODE:
            writer.write("<?xml version=\"1.0\"?>");
            writer.write(lineSeparator);
            // recurse on each child
            NodeList nodes = node.getChildNodes();
            if (nodes != null) {
                for (int i = 0; i < nodes.getLength(); i++) {
                    serializeNode(nodes.item(i), writer, "");
                }
            }
            break;
        case Node.ELEMENT_NODE:
            String name = node.getNodeName();
            writer.write(indentLevel + "<" + name);
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node current = attributes.item(i);
                writer.write(" " + current.getNodeName() + "=\""
                        + current.getNodeValue() + "\"");
            }
            writer.write(">"); // end of start tag
            NodeList children = node.getChildNodes();
            if (children != null) {
                // break the line after the start tag if the first child is an element
                if ((children.item(0) != null)
                        && (children.item(0).getNodeType() == Node.ELEMENT_NODE)) {
                    writer.write(lineSeparator);
                }
                for (int i = 0; i < children.getLength(); i++) {
                    serializeNode(children.item(i), writer, indentLevel + indent);
                }
                // indent the end tag if the last child is an element
                if ((children.item(0) != null)
                        && (children.item(children.getLength() - 1).getNodeType()
                                == Node.ELEMENT_NODE)) {
                    writer.write(indentLevel);
                }
            }
            writer.write("</" + name + ">");
            writer.write(lineSeparator);
            break;
        case Node.TEXT_NODE:
            writer.write(node.getNodeValue());
            break;
        case Node.CDATA_SECTION_NODE:
            writer.write("<![CDATA[" + node.getNodeValue() + "]]>");
            break;
        case Node.COMMENT_NODE:
            writer.write(indentLevel + "<!-- " + node.getNodeValue() + " -->");
            writer.write(lineSeparator);
            break;
        case Node.PROCESSING_INSTRUCTION_NODE:
            writer.write("<?" + node.getNodeName() + " " + node.getNodeValue() + "?>");
            writer.write(lineSeparator);
            break;
        case Node.ENTITY_REFERENCE_NODE:
            writer.write("&" + node.getNodeName() + ";");
            break;
        case Node.DOCUMENT_TYPE_NODE:
            DocumentType docType = (DocumentType) node;
            writer.write("<!DOCTYPE " + docType.getName());
            if (docType.getPublicId() != null) {
                writer.write(" PUBLIC \"" + docType.getPublicId() + "\" ");
            } else {
                writer.write(" SYSTEM ");
            }
            writer.write("\"" + docType.getSystemId() + "\">");
            writer.write(lineSeparator);
            break;
        }
    }
}
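As an alternative to a hand-written serializer, the JAXP transformation module listed earlier can write a DOM tree back to text with an identity transform. The sketch below assumes the transformer bundled with the JDK; TransformerSerializerSketch and roundTrip are hypothetical names, and this is not the DOM2XML class above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses XML into a DOM tree and serializes it again via JAXP.
class TransformerSerializerSketch {
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // A Transformer created without a stylesheet copies source to result unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<a><b>x</b></a>"));
    }
}
```

The hand-written version remains useful when full control over indentation and line separators is needed.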