SE 5145 eXtensible Markup Language (XML): DOM (Document Object Model) (Part I)
2011-12/Spring, Bahçeşehir University, Istanbul
Slide 2
4th Assignment: DOM implementation of XML Resume
Implement DOM for your resume.
Required: There must be checkboxes labeled "Technical" and "Management". When a label is checked, the relevant sections of your resume must be highlighted, with a different color for each label.
Optional: A "Show experience duration" button should calculate the total duration of the checked experiences.
You can write the implementation in one of the following ways: a) JavaScript (due April 27), or b) Java, C#, or any other language you prefer (due May 11).
Notes:
1. A sample JavaScript-based DOM implementation is shown on the next slide. The relevant files will also be sent to you.
2. The details of the assignment were already discussed in the last lecture. If you did not attend it, please ask your classmates rather than me for the details, since I am currently not available to reply to individual emails.
Slide 3
Sample DOM Implementation
Inspect the files JoesCafe.html and JoesCafeCode.js.
Slide 4
DOM (Document Object Model)
Q: How can we provide uniform access to structured documents in diverse applications (parsers, browsers, editors, databases)?
A: Use an XML API (DOM or SAX).
Slide 5
XML APIs
XML processors make the structure and contents of XML documents available to applications through APIs.
Event-based APIs notify the application through parsing events, e.g., the SAX callback interfaces.
Object-model (or tree-based) APIs provide a full parse tree, e.g., DOM, a W3C Recommendation. They are more convenient, but they may require too many resources for the largest documents.
Major parsers support both SAX and DOM.
Slide 6
DOM: What is it?
A W3C standard, adopted in 1998.
An object-based API for XML and HTML documents.
Developed to support "dynamic HTML": it provides a standard tree interface to document structure across browsers, for use in JavaScript.
Provides a standardized way of building documents, navigating their structure, and adding, modifying, or deleting elements and content programmatically (from programs and scripts).
Provides a foundation for developing querying, filtering, transformation, rendering, and similar applications on top of DOM implementations.
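To make "building, navigating, adding, modifying, deleting" concrete, here is a minimal Java sketch. It assumes the JDK's built-in JAXP parser (javax.xml.parsers) and the org.w3c.dom Java binding. The <resume> and <skill> element names are hypothetical and were chosen only to echo the assignment.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: parse a document, then add, modify and delete elements through DOM.
// The <resume>/<skill> vocabulary is hypothetical.
class DomEditDemo {
    // Parse an XML string with the JDK's default (non-validating) DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static String edit(String xml) {
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();

        // Add: build a new <skill> element and append it under the root.
        Element added = doc.createElement("skill");
        added.appendChild(doc.createTextNode("XML"));
        root.appendChild(added);

        // Navigate + modify: replace the text of the first <skill>.
        NodeList skills = root.getElementsByTagName("skill"); // a live list
        skills.item(0).setTextContent("Java");

        // Delete: remove the second <skill>; the live NodeList reflects it.
        root.removeChild(skills.item(1));

        // Read back the remaining skills in document order.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < skills.getLength(); i++) {
            if (i > 0) sb.append(',');
            sb.append(skills.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(edit("<resume><skill>C</skill><skill>SQL</skill></resume>"));
    }
}
```

The document is changed in place in memory. Writing it back to a file is a separate step, and DOM Level 1 leaves that step to each implementation.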
Slide 7
DOM: What is it?
Platform-, browser-, and language-neutral.
Programming-language-specific mappings for JavaScript and Java are part of the specification; implementations exist in other languages as well: C++, Python, C#, ...
Mnemonic: just as SAX can be read as "Serial Access XML", DOM can be read as "Directly Obtainable in Memory".
Slide 8
SAX vs. DOM
DOM reads the entire XML document into memory and stores it as a tree data structure.
SAX reads the XML document and sends an event for each start tag, end tag, and run of character data that it encounters.
Consequences:
DOM provides random access into the XML document; SAX provides only sequential access.
DOM is slower and needs memory proportional to the size of the document, so it is impractical for very large XML documents.
SAX is fast and requires very little memory, so it can be used for huge documents (or large numbers of documents). This makes SAX much more popular for web sites.
DOM implementations provide methods for changing the XML document in memory; SAX implementations do not.
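The contrast can be sketched by counting the elements of one document both ways. This is a minimal example under the assumption that the JDK's JAXP SAX and DOM parsers are used. The SAX handler sees each element only once as an event, while the DOM version first materializes the whole tree and then queries it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: count the elements of one document with SAX and with DOM (JAXP parsers).
class SaxVsDom {
    // SAX: react to startElement events as they stream past; no tree is kept.
    static int countWithSax(String xml) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return count[0];
    }

    // DOM: build the whole tree in memory first, then query it (random access).
    static int countWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches every element; called on the Document, it includes the root.
            return doc.getElementsByTagName("*").getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b/><c><d/></c></a>";
        System.out.println(countWithSax(xml) + " " + countWithDom(xml));
    }
}
```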
Slide 9
Overview of W3C DOM Specification
The second one in the XML family of W3C recommendations.
Different levels of implementation:
Level 1 (W3C Recommendation, Oct. 1998): flat object model (two parts: Core and HTML).
Level 2 (W3C Recommendation, Nov. 2000): API structured into multiple modules: Core, XML, HTML, Range, Traversal, ...
Level 3 (W3C Working Draft as of January 2002): new and revised modules. New: Load and Save, Validation. Revised: Core, Events. In progress as W3C Notes: XPath, Views and Formatting, ... (Level 3 Core, Load and Save, and Validation became W3C Recommendations in 2004.)
Slide 10
DOM Features
Core: represent the basic structure of well-formed XML documents
XML: access entities, notations, ...
Events: communicate user interaction and document changes to the application (HTMLEvents, MutationEvents, UIEvents, ...)
Range: select portions of a document
Traversal: process/filter nodes in sequence
Views: access alternative representations of a document
StyleSheet/CSS: represent style sheets
HTML: represent HTML documents
LS, LS-Async: Load and Save
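As an illustration of the optional Traversal feature, here is a small Java sketch that visits only the element nodes of a document, in document order, with a TreeWalker. It assumes that the JDK's built-in DOM implementation supports the Level 2 Traversal module, which is why the code checks `instanceof DocumentTraversal` before casting. The invoice element names are borrowed loosely from the invoice example used later in these slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Sketch: DOM Level 2 Traversal, visiting only element nodes in document order.
class TraversalDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static String elementNames(String xml) {
        Document doc = parse(xml);
        // Traversal is an optional module; assumed present in the JDK's implementation.
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("Traversal module not available");
        }
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        // The walker starts on its root; nextNode() then filters out text nodes.
        StringBuilder sb = new StringBuilder(walker.getCurrentNode().getNodeName());
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            sb.append(',').append(n.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(elementNames(
                "<invoice><invoicepage><name>Leila</name><address/></invoicepage></invoice>"));
    }
}
```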
Slide 11
DOM Tree & Nodes
Text nodes are separate nodes in the tree; they are not part of the element nodes (tags) that contain them.
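A quick Java check of this point, assuming the JDK's JAXP DOM parser. Parsing `<p>Hello<b>world</b></p>` gives a `p` element whose own node value is null. Its children are a separate "#text" node holding "Hello" and a `b` element, and "world" hangs below `b`, not below `p`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: show that text content lives in separate Text child nodes.
class TextNodeDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Lists the root element and its direct children as name=value pairs.
    static String describeRoot(String xml) {
        Element root = parse(xml).getDocumentElement();
        StringBuilder sb = new StringBuilder();
        sb.append(root.getNodeName()).append('=').append(root.getNodeValue());
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            sb.append(" | ").append(c.getNodeName()).append('=').append(c.getNodeValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeRoot("<p>Hello<b>world</b></p>"));
    }
}
```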
Slide 12
DOM Tree & Nodes
Slide 13
DOM Tree & Nodes
Slide 14
DOM Tree & Nodes
Slide 15
DOM structure model
Based on object-oriented concepts: methods (to access or change an object's state), interfaces (declarations of a set of methods), and objects (encapsulations of data and methods).
Roughly similar to the XSLT/XPath data model (to be discussed later): a parse tree.
The tree-like structure is implied by the abstract relationships defined by the programming interfaces; it does not necessarily reflect the data structures used by an implementation (but probably does).
Slide 16
Structure of DOM Level 1
I: DOM Core Interfaces
Fundamental interfaces: basic interfaces to structured documents
Extended interfaces: XML-specific: CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
II: DOM HTML Interfaces: more convenient access to HTML documents (we ignore these)
Slide 17
DOM Level 2
Level 1: basic representation and manipulation of document structure and content (no access to the contents of a DTD).
DOM Level 2 adds:
support for namespaces
access to elements by ID attribute values
optional features: interfaces to document views and style sheets; an event model (for, say, user actions on elements); methods for traversing the document tree and manipulating regions of a document (e.g., regions selected by the user of an editor)
Loading and writing of documents are not specified (-> Level 3).
Slide 18
Structure of the DOM tree
The DOM tree is composed of Node objects. Node is an interface; some of its more important subinterfaces are Element, Attr, and Text.
An Element node may have children; Attr and Text nodes are leaves.
Additional types are Document, ProcessingInstruction, Comment, Entity, CDATASection, and several others.
Hence, the DOM tree is composed entirely of Node objects, but the Node objects can be downcast into more specific types as needed.
Slide 19
DOM structure model
[Figure: an example invoice document (elements invoice, invoicepage, addressee, addressdata, name, address, streetaddress, postoffice; attributes form="00" and type="estimatedbill"; text "Leila Laskuprintti", "Pyynpolku 1", "70460 KUOPIO", ...) drawn as a tree of Document, Element, NamedNodeMap, and Text nodes]
Slide 20
Core Interfaces: Node & its variants
Node
    Document
    DocumentFragment
    Element
    Attr
    CharacterData
        Text
            CDATASection
        Comment
    Extended interfaces:
        DocumentType
        Notation
        Entity
        EntityReference
        ProcessingInstruction
Slide 21
DOM interfaces: Node
[Figure: the invoice example tree of Slide 19 (Document, Element, NamedNodeMap, and Text nodes)]
Node methods:
getNodeType, getNodeValue, getOwnerDocument, getParentNode
hasChildNodes, getChildNodes, getFirstChild, getLastChild, getPreviousSibling, getNextSibling
hasAttributes, getAttributes
appendChild(newChild), insertBefore(newChild, refChild), replaceChild(newChild, oldChild), removeChild(oldChild)
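The four structural methods at the end of the list can be exercised together in a short Java sketch, assuming the JDK's JAXP DOM parser. The <list> and single-letter element names are hypothetical. Each comment shows the child order after that line runs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: appendChild, insertBefore, replaceChild and removeChild on one parent.
class NodeEditDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Child names in order, walked with getFirstChild()/getNextSibling().
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }

    static String rearrange() {
        Document doc = parse("<list><b/><d/></list>");
        Element list = doc.getDocumentElement();
        Node b = list.getFirstChild();
        Node d = list.getLastChild();
        list.insertBefore(doc.createElement("a"), b); // a b d
        list.insertBefore(doc.createElement("c"), d); // a b c d
        list.replaceChild(doc.createElement("e"), d); // a b c e
        list.removeChild(b);                          // a c e
        list.appendChild(doc.createElement("f"));     // a c e f
        return childNames(list);
    }

    public static void main(String[] args) {
        System.out.println(rearrange());
    }
}
```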
Slide 22
Operations on Nodes, I
The results returned by getNodeName(), getNodeValue(), getNodeType(), and getAttributes() depend on the subtype of the node, as follows:

                  Element          Text             Attr
getNodeName()     tag name         "#text"          name of attribute
getNodeValue()    null             text contents    value of attribute
getNodeType()     ELEMENT_NODE     TEXT_NODE        ATTRIBUTE_NODE
getAttributes()   NamedNodeMap     null             null
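The table can be reproduced in Java with a small sketch, assuming the JDK's JAXP DOM parser. It parses `<a href="x.html">link</a>` and prints one row per node: name, value, numeric type (ELEMENT_NODE = 1, ATTRIBUTE_NODE = 2, TEXT_NODE = 3), and whether getAttributes() returns a map.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: getNodeName/getNodeValue/getNodeType/getAttributes for Element, Text and Attr.
class NodePropsDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // name | value | numeric type | NamedNodeMap or null
    static String row(Node n) {
        return n.getNodeName() + "|" + n.getNodeValue() + "|" + n.getNodeType()
                + "|" + (n.getAttributes() == null ? "null" : "NamedNodeMap");
    }

    static String[] rows(String xml) {
        Element a = parse(xml).getDocumentElement();
        Node text = a.getFirstChild();          // the Text child "link"
        Attr href = a.getAttributeNode("href"); // reached via the element, not as a child
        return new String[] {row(a), row(text), row(href)};
    }

    public static void main(String[] args) {
        for (String r : rows("<a href=\"x.html\">link</a>")) {
            System.out.println(r);
        }
    }
}
```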
Slide 23
Distinguishing Node types
Here's an easy way to tell what kind of node you are dealing with:

    switch (node.getNodeType()) {
        case Node.ELEMENT_NODE:
            Element element = (Element) node;
            // ... process the element
            break;
        case Node.TEXT_NODE:
            Text text = (Text) node;
            // ... process the text
            break;
        case Node.ATTRIBUTE_NODE:
            Attr attr = (Attr) node;
            // ... process the attribute
            break;
        default:
            // ... other node types
    }
Slide 24
Operations on Nodes, II
Tree-walking operations that return a Node: getParentNode(), getFirstChild(), getNextSibling(), getPreviousSibling(), getLastChild()
Tests that return a boolean: hasAttributes(), hasChildNodes()
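These operations are enough for a full depth-first walk. Below is a minimal recursive Java sketch, assuming the JDK's JAXP DOM parser. It records each element as "depth:name" and appends "@" when hasAttributes() is true. The input is a trimmed, hypothetical version of the invoice example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: depth-first tree walk using hasChildNodes/getFirstChild/getNextSibling.
class TreeWalkDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static void collect(Node node, int depth, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(depth + ":" + node.getNodeName() + (node.hasAttributes() ? "@" : ""));
        }
        if (node.hasChildNodes()) {
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                collect(c, depth + 1, out); // text children are visited but not recorded
            }
        }
    }

    static List<String> walk(String xml) {
        List<String> out = new ArrayList<>();
        collect(parse(xml).getDocumentElement(), 0, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(walk(
                "<invoice><invoicepage form=\"00\"><name>Leila</name></invoicepage></invoice>"));
    }
}
```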
Slide 25
DOM interfaces: Document
[Figure: the invoice example tree of Slide 19 (Document, Element, NamedNodeMap, and Text nodes)]
Document methods: getDocumentElement, createAttribute(name), createElement(tagName), createTextNode(data), getDoctype(), getElementById(IdVal)
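Here is a Java sketch of these Document methods, assuming the JDK's JAXP implementation. One subtlety: getElementById only finds attributes that are typed as IDs. There is no DTD here, so the sketch marks "id" as an ID with the DOM Level 3 method Element.setIdAttribute. The "p1" value is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: Document factory methods plus getDocumentElement, getDoctype, getElementById.
class DocumentDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        Document doc = newDocument();
        Element invoice = doc.createElement("invoice");
        doc.appendChild(invoice);                  // becomes the document element

        Element page = doc.createElement("invoicepage");
        Attr type = doc.createAttribute("type");   // factory method for attributes
        type.setValue("estimatedbill");
        page.setAttributeNode(type);
        page.setAttribute("id", "p1");             // hypothetical ID value
        page.setIdAttribute("id", true);           // no DTD, so declare "id" as an ID (Level 3)
        page.appendChild(doc.createTextNode("Leila Laskuprintti"));
        invoice.appendChild(page);

        Element found = doc.getElementById("p1");
        return doc.getDocumentElement().getTagName() + ","
                + (doc.getDoctype() == null) + ","  // built in memory: no DOCTYPE
                + found.getAttribute("type") + ","
                + found.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```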
Slide 26
DOM interfaces: Element
[Figure: the invoice example tree of Slide 19 (Document, Element, NamedNodeMap, and Text nodes)]
Element methods: getTagName, getAttributeNode(name), setAttributeNode(attr), removeAttribute(name), getElementsByTagName(name), hasAttribute(name)
Slide 27
Operations for Elements
String getTagName(): returns the name of the tag.
boolean hasAttribute(String name): returns true if this Element has the named attribute.
String getAttribute(String name): returns the (String) value of the named attribute, or the empty string if there is no such attribute.
boolean hasAttributes(): returns true if this Element has any attributes. This method is actually inherited from Node; it returns false if applied to a Node that isn't an Element.
NamedNodeMap getAttributes(): returns a NamedNodeMap of all the Element's attributes. This method is actually inherited from Node; it returns null if applied to a Node that isn't an Element.
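A compact Java check of these behaviors, assuming the JDK's JAXP DOM parser. The input reuses the invoicepage attributes from the invoice example. "missing" is a hypothetical attribute name used to show the empty-string result of getAttribute. The element's first child is a Text node, which shows the non-Element results of hasAttributes() and getAttributes().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: Element operations, and the inherited Node methods applied to a non-Element.
class ElementOpsDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static String summarize(String xml) {
        Element e = parse(xml).getDocumentElement();
        Node child = e.getFirstChild(); // a Text node, not an Element
        return e.getTagName()
                + " " + e.hasAttribute("type")
                + " " + e.getAttribute("type")
                + " [" + e.getAttribute("missing") + "]"  // absent attribute -> ""
                + " " + e.getAttributes().getLength()
                + " " + child.hasAttributes()             // false for non-Elements
                + " " + (child.getAttributes() == null);  // null for non-Elements
    }

    public static void main(String[] args) {
        System.out.println(summarize(
                "<invoicepage form=\"00\" type=\"estimatedbill\">text</invoicepage>"));
    }
}
```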
Slide 28
Operations on Texts
Text is a subinterface of CharacterData, which in turn is a subinterface of Node. In addition to inheriting the Node methods, Text inherits these methods (among others) from CharacterData:
public String getData() throws DOMException: returns the text contents of this Text node.
public int getLength(): returns the number of 16-bit (UTF-16) units in the text.
public String substringData(int offset, int count) throws DOMException: returns a substring of the text contents.
Text also declares some methods of its own, for example:
public String getWholeText(): returns a concatenation of all logically adjacent text nodes.
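Here is a Java sketch of these Text operations, assuming the JDK's JAXP implementation. It builds two adjacent Text nodes under one element, so that getWholeText() has something to concatenate. A parser would normally merge such adjacent text into a single node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Sketch: CharacterData methods inherited by Text, plus Text.getWholeText().
class TextOpsDemo {
    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        Element name = doc.createElement("name");
        doc.appendChild(name);
        // Two logically adjacent Text nodes under the same parent.
        Text first = doc.createTextNode("Leila ");
        Text second = doc.createTextNode("Laskuprintti");
        name.appendChild(first);
        name.appendChild(second);
        return first.getData() + "|" + first.getLength() + "|"
                + second.substringData(0, 5) + "|" + first.getWholeText();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```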
Slide 29
Operations on Attrs
String getName(): returns the name of this attribute.
Element getOwnerElement(): returns the Element node this attribute is attached to, or null if this attribute is not in use.
boolean getSpecified(): returns true if this attribute was explicitly given a value in the original document.
String getValue(): returns the value of the attribute as a String.
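To see getSpecified() return false, an attribute has to get its value from a DTD default rather than from the document. The following Java sketch relies on an internal DTD subset for this and assumes that the JDK's non-validating parser applies internal-subset attribute defaults. The "lang" and "title" attributes are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: Attr.getName/getValue/getSpecified/getOwnerElement.
class AttrOpsDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static String demo() {
        // The internal DTD subset gives <a> a default "lang" attribute.
        Document doc = parse("<!DOCTYPE a [<!ATTLIST a lang CDATA \"en\">]><a href=\"x.html\"/>");
        Element a = doc.getDocumentElement();
        Attr href = a.getAttributeNode("href");    // written in the document
        Attr lang = a.getAttributeNode("lang");    // supplied by the DTD default
        Attr loose = doc.createAttribute("title"); // created but never attached
        return href.getName() + "=" + href.getValue()
                + " specified=" + href.getSpecified()
                + " owner=" + href.getOwnerElement().getTagName()
                + "; " + lang.getName() + "=" + lang.getValue()
                + " specified=" + lang.getSpecified()
                + "; " + loose.getName() + " owner=" + loose.getOwnerElement();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```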
Slide 30
Object Creation in DOM
Each DOM object X lives in the context of a Document: X.getOwnerDocument().
Objects implementing interface X are created by factory methods D.createX(), where D is a Document object, e.g., createElement("A"), createAttribute("href"), createTextNode("Hello!").
The creation and persistent saving of Documents are left to be specified by implementations.
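In Java, both implementation-specific steps are handled by JAXP rather than by the DOM interfaces themselves. DocumentBuilder.newDocument() creates an empty Document, and an identity Transformer can save one. The following is a minimal sketch under that assumption. The link target "hello.html" is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: factory methods D.createX(), then saving via JAXP (not part of DOM Level 1).
class CreationDemo {
    static Document build() {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element a = d.createElement("A");
            Attr href = d.createAttribute("href");
            href.setValue("hello.html");               // hypothetical link target
            a.setAttributeNode(href);
            a.appendChild(d.createTextNode("Hello!"));
            d.appendChild(a);
            return d;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Serialize with an identity Transformer (no XML declaration, no indentation).
    static String serialize(Document d) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(d), new StreamResult(w));
            return w.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document d = build();
        System.out.println(d.getDocumentElement().getOwnerDocument() == d);
        System.out.println(serialize(d));
    }
}
```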
Slide 31
Accessing properties of a Node
Node.getNodeName(): for an Element, the same as getTagName(); for an Attr, the name of the attribute; for Text, "#text"; etc.
Node.getNodeValue(): the content of a text node, the value of an attribute, ...; null for an Element (!!) (in XSLT/XPath, by contrast, the value of an element is its full textual content).
Node.getNodeType(): numeric constants (1, 2, 3, ..., 12) for ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, ..., NOTATION_NODE.
Slide 32
Content and element manipulation
Manipulating CharacterData D:
D.substringData(offset, count)
D.appendData(string)
D.insertData(offset, string)
D.deleteData(offset, count)
D.replaceData(offset, count, string) (= delete + insert)
Accessing attributes of an Element object E:
E.getAttribute(name)
E.setAttribute(name, value)
E.removeAttribute(name)
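Here is a step-by-step Java trace of these calls, assuming the JDK's JAXP implementation. The comment after each CharacterData call shows the text it leaves behind. The "class"/"technical" pair is a hypothetical nod to the highlighting assignment.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Sketch: CharacterData editing methods and Element attribute accessors.
class CharDataDemo {
    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        Element p = doc.createElement("p");
        doc.appendChild(p);
        Text t = doc.createTextNode("Hello");
        p.appendChild(t);

        t.appendData(" world");     // "Hello world"
        t.insertData(5, ",");       // "Hello, world"
        t.deleteData(0, 7);         // "world"
        t.replaceData(0, 5, "DOM"); // "DOM" (delete 5 chars, then insert)

        p.setAttribute("class", "technical"); // hypothetical attribute value
        p.setAttribute("lang", "en");
        p.removeAttribute("lang");
        return t.getData() + "|" + p.getAttribute("class") + "|" + p.hasAttribute("lang");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```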
Slide 33
Additional Core Interfaces (1)
NodeList for ordered lists of nodes, e.g., from Node.getChildNodes() or Element.getElementsByTagName("name"), which returns all descendant elements of type "name" in document order (the wildcard "*" matches any element type).
Accessing a specific node, or iterating over all nodes of a NodeList:
E.g., Java code to process all children: for (i=0;
i