Introduction to SAX: a standard interface for event-based XML parsing

Cheng-Chia Chen


Page 1: Introduction to SAX: a standard interface for event-based XML parsing

An Introduction to SAX

Transparency No. 1

Introduction to SAX: a standard interface for event-based XML parsing

Cheng-Chia Chen

Page 2: Introduction to SAX: a standard interface for event-based XML parsing


What is SAX?

SAX: Simple API for XML
  Started as a community-driven project on the xml-dev mailing list
  Originally designed as a Java API
  Other languages (C++, Python, Perl) are now supported

SAX2 adds
  Namespaces
  configurable features and properties

Page 3: Introduction to SAX: a standard interface for event-based XML parsing


SAX Features

Event-driven
  You provide various event handlers
Fast and lightweight
  Document does not have to be entirely in memory
Sequential read access only
  Does not support modification of the document

Page 4: Introduction to SAX: a standard interface for event-based XML parsing


SAX Processing Model

Page 5: Introduction to SAX: a standard interface for event-based XML parsing


What is an Event-Based Interface?

Two major types of XML APIs:

Tree-based APIs ==> DOM
  compile an XML document into an internal tree structure, then allow an application to navigate that tree.

Event-based APIs ==> SAX
  report parsing events (such as the start and end of elements) directly to the application through callbacks, and usually do not build an internal tree.
  The application implements handlers to deal with the different events, much like handling events in a graphical user interface.

Comparison:
  Tree-based APIs are useful for many applications, but they require more system resources, especially if the document is large.

Page 6: Introduction to SAX: a standard interface for event-based XML parsing


How an event-based API works

Sample document:

  <?xml version="1.0"?>
  <doc>
    <para>Hello, world!</para>
  </doc>

An event-based interface will break down the structure of this document into a sequence of SAX events:

  start document
  start element: doc
  start element: para
  characters: Hello, world!
  end element: para
  end element: doc
  end document
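The event sequence above can be reproduced with a small handler that records every callback. This is only a sketch: the class name EventTrace and its trace method are made up for illustration, and it obtains the parser through the JDK's javax.xml.parsers.SAXParserFactory rather than the XMLReaderFactory used elsewhere in these slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records each SAX callback as one line of text.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("start document"); }
    @Override public void endDocument() { events.add("end document"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Whitespace between tags also arrives here; skip it to keep the trace short.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("characters: " + text);
    }

    // Parses an XML string and returns the recorded event sequence.
    static List<String> trace(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EventTrace handler = new EventTrace();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><doc><para>Hello, world!</para></doc>";
        trace(xml).forEach(System.out::println);
    }
}
```

Note that nothing is kept in memory except what the handler itself chooses to store, which is the main point of the event-based model.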

Page 7: Introduction to SAX: a standard interface for event-based XML parsing


Quick Start for SAX2 Application Writers

1. Make sure you have the required libraries (available in the JDK):
   1. the SAX2 interfaces and classes, and
   2. an XML parser that supports SAX2, e.g. Xerces => org.apache.xerces.parsers.SAXParser or com.sun.org.apache.xerces.internal.parsers.SAXParser

2. Get the parser via XMLReaderFactory#createXMLReader():
   XMLReader parser = XMLReaderFactory.createXMLReader();

3. Create event handlers to receive information about the document.
   The most important one is the ContentHandler, which receives events for the start and end of elements, character data, processing instructions, and other basic XML structure.
   You can just subclass the built-in adapter class DefaultHandler and then implement only the methods that you need.

Page 8: Introduction to SAX: a standard interface for event-based XML parsing


Example: (MyHandler.java)

prints a message each time an element starts or ends:

  import org.xml.sax.Attributes;
  import org.xml.sax.helpers.DefaultHandler;
  import static java.lang.System.out;

  public class MyHandler extends DefaultHandler {
      // Called at every start tag; localName is valid because Namespace
      // processing is on by default.
      @Override
      public void startElement(String uri, String localName, String qName,
                               Attributes atts) {
          out.println("Start element: " + localName);
      }

      // Called at every end tag.
      @Override
      public void endElement(String uri, String localName, String qName) {
          out.println("End element: " + localName);
      }
  }

Page 9: Introduction to SAX: a standard interface for event-based XML parsing


The main program (SAXApp.java)

  import org.xml.sax.XMLReader;
  import org.xml.sax.helpers.DefaultHandler;
  import org.xml.sax.helpers.XMLReaderFactory;

  public class SAXApp {
      // static final String parserClass =
      //     "org.apache.xerces.parsers.SAXParser"; // use my own parser!

      public static void main(String[] args) throws Exception {
          XMLReader xr = XMLReaderFactory.createXMLReader(/*parserClass*/);
          DefaultHandler handler = new MyHandler();
          xr.setContentHandler(handler);
          // Parse each document named on the command line.
          for (int i = 0; i < args.length; i++) {
              xr.parse(args[i]);
          }
      }
  }

Page 10: Introduction to SAX: a standard interface for event-based XML parsing


The input

the input XML document (roses.xml):

  <?xml version="1.0"?>
  <poem>
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and I love you.</line>
  </poem>

To parse this with your SAXApp application, you would supply the absolute URL of the document on the command line:

  java SAXApp file://localhost/tmp/roses.xml
or
  java SAXApp file:///tmp/roses.xml

Page 11: Introduction to SAX: a standard interface for event-based XML parsing


The output

The output should be as follows:

  Start element: poem
  Start element: line
  End element: line
  Start element: line
  End element: line
  Start element: line
  End element: line
  Start element: line
  End element: line
  End element: poem

Page 12: Introduction to SAX: a standard interface for event-based XML parsing


[Figure: SAX (1.0) architecture. Labels: implementations of Parser, AttributeList, and Locator (supplied by the driver writer); parts supplied by the application writer; the SAX driver's parser class name.]

Page 13: Introduction to SAX: a standard interface for event-based XML parsing


[Figure: SAX 2 architecture. Labels: implementations of Parser/XMLReader, Attributes, and Locator (supplied by the driver writer); parts supplied by the application writer; the SAX driver's parser class name; Content; XMLReader.]

Page 14: Introduction to SAX: a standard interface for event-based XML parsing


SAX 2.0: Java Road Map

The SAX Java distribution contains
  17 core classes/interfaces,
  10 helper classes,
  2 extension interfaces + 6 extension implementations

For application writers
  7 interfaces are available, but most XML applications will need only one or two of them.

Page 15: Introduction to SAX: a standard interface for event-based XML parsing


SAX classes and interfaces

Falling into six groups:

1. Interfaces implemented by the parser: XMLReader, Attributes (required), and Locator (optional)

2. Interfaces implemented by the application:
   ContentHandler, ErrorHandler, DTDHandler, and EntityResolver (all optional; ContentHandler will be the most important one for typical XML applications)
   XMLFilter: for cascaded applications
   DeclHandler, LexicalHandler: for additional DTD/lexical events

3. Standard SAX classes supplied by SAX2: InputSource, SAXException, SAXParseException, SAXNotSupportedException, SAXNotRecognizedException

Page 16: Introduction to SAX: a standard interface for event-based XML parsing


SAX classes and interfaces

4. Helper classes in the org.xml.sax.helpers package:
   Default implementations: AttributesImpl, LocatorImpl, XMLFilterImpl
   Namespace support: NamespaceSupport
   Factory classes: XMLReaderFactory

5. Legacy SAX 1.0 classes: Parser, ParserFactory, HandlerBase, AttributeList, AttributeListImpl, DocumentHandler.

6. Conversion between SAX 1.0 and SAX 2.0 (Parser/XMLReader): ParserAdapter, XMLReaderAdapter

Page 17: Introduction to SAX: a standard interface for event-based XML parsing


Interfaces for Parser Writers (org.xml.sax package)

A SAX-conformant XML parser needs to implement only two or three simple interfaces:

1. XMLReader
   the main interface to a SAX parser: allows the user to register handlers for callbacks, to query and set features and properties, and to start an XML parse.

2. Attributes
   allows users to iterate through an attribute list.
   a convenience implementation is available in AttributesImpl.

3. Locator
   allows users to find the location of the current event in the XML source document.

Page 18: Introduction to SAX: a standard interface for event-based XML parsing


Interfaces for Application Writers (org.xml.sax package)

A SAX application may implement any or none of the following interfaces, as required.
  Most applications need only ContentHandler and possibly ErrorHandler.
  An application can implement all of these interfaces in a single class.

1. ContentHandler
   receives notification of basic document-related events like the start and end of elements.
   This is the interface applications use most often; in many cases, it is the only one needed.

2. ErrorHandler
   used for special error handling.

Page 19: Introduction to SAX: a standard interface for event-based XML parsing


Interfaces for Application Writers (cont’d)

3. DTDHandler
   To receive notification of NOTATION and unparsed ENTITY declarations.

4. EntityResolver
   Redirection of URIs in documents (or other types of custom handling).

5. DeclHandler
   To receive notification of element and attribute-list declarations in the DTD.

6. LexicalHandler
   To receive notification of markup boundary events:
   comments, CDATA sections (begin and end), entity expansion (begin and end), …

7. XMLFilter
   For cascading applications.

Page 20: Introduction to SAX: a standard interface for event-based XML parsing


Standard SAX Classes (org.xml.sax package)

1. InputSource
   Input for a parser.
   Wraps information for a single input, including a public identifier, system identifier, byte stream, and character stream (as appropriate).
   May be instantiated by EntityResolvers.

2. SAXException: represents a general SAX exception.
   SAXParseException: represents a SAX exception tied to a specific point in an XML source document.
   SAXNotSupportedException, SAXNotRecognizedException

3. DefaultHandler (in the org.xml.sax.helpers package)
   Default implementations for ContentHandler, ErrorHandler, DTDHandler, and EntityResolver.
   Users can subclass this to simplify handler writing.

Page 21: Introduction to SAX: a standard interface for event-based XML parsing


Helper Classes (org.xml.sax.helpers package)

provided simply as a convenience for Java programmers.

1. XMLReaderFactory
   used to load SAX parsers dynamically at run time, based on the class name.

2. AttributesImpl
   default implementation of Attributes.
   can be used to make a copy of an Attributes object.

3. LocatorImpl
   used to make a persistent snapshot of a Locator's values at a specific point in the parse.

4. XMLFilterImpl

Page 22: Introduction to SAX: a standard interface for event-based XML parsing


SAX2: Features and Properties

Standard methods to query and set features and properties in an XMLReader.
  Features are boolean properties.
  You can request an XMLReader to validate (or not to validate) a document, or to intern (or not to intern) all names.

Use the getFeature, setFeature, getProperty, and setProperty methods to get/set a feature/property of an XMLReader:

EX:
  // check if a parser is doing validation!
  try {
      if (xmlReader.getFeature("http://xml.org/sax/features/validation")) {
          out.println("Parser is validating.");
      } else {
          out.println("Parser is not validating.");
      }
  } catch (SAXException e) {
      out.println("Parser may or may not be validating.");
  }
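Setting a feature follows the same pattern as querying one, with two distinct exceptions to tell "unknown feature" apart from "known, but not settable now". The sketch below assumes the JDK's bundled parser obtained through SAXParserFactory; the class FeatureDemo, the helper trySet, and the http://example.org/no-such-feature URI are made up for illustration.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper: tries to switch a feature and reports what happened.
class FeatureDemo {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    static String trySet(XMLReader reader, String feature, boolean value) {
        try {
            reader.setFeature(feature, value);
            return feature + " = " + reader.getFeature(feature);
        } catch (SAXNotRecognizedException e) {
            // The parser has never heard of this feature URI.
            return feature + " is not recognized by this parser";
        } catch (SAXNotSupportedException e) {
            // The parser knows the feature but cannot set this value now.
            return feature + " cannot be set to " + value + " right now";
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(trySet(reader, VALIDATION, true));
        // Made-up URI, used only to trigger the "not recognized" branch.
        System.out.println(trySet(reader, "http://example.org/no-such-feature", true));
    }
}
```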

Page 23: Introduction to SAX: a standard interface for event-based XML parsing


SAX2 features

See the SAX2 standard feature flags for more.
Anyone can define his own features (by designating a unique URI).
A feature may be read-only or read/write, and it may be modifiable only when parsing, or only when not parsing.

http://xml.org/sax/features/namespaces
  true: Perform Namespace processing. (URI + local part) reported + prefix-mapping events generated.
  false: Optionally do not perform Namespace processing (implies namespace-prefixes).
  access: (parsing) read-only; (not parsing) read/write

…/namespace-prefixes   // qName + xmlns* attributes reported
  true: qualified names (prefix:local) reported, and Namespace declarations (xmlns*) treated as attributes as well.
  false: no Namespace declarations reported, and optionally no qualified names reported.
  access: (parsing) read-only; (not parsing) read/write

Page 24: Introduction to SAX: a standard interface for event-based XML parsing


standard Features supplied by SAX2

…/string-interning
  true: All element names, prefixes, attribute names, Namespace URIs, and local names are interned using java.lang.String#intern().
  access: (parsing) read-only; (not parsing) read/write

…/validation
  true: Report all validation errors (implies external-general-entities and external-parameter-entities).
  access: (parsing) read-only; (not parsing) read/write

…/external-general-entities
  true: Include all external general (text) entities.
  access: (parsing) read-only; (not parsing) read/write

…/external-parameter-entities
  true: Include all external parameter entities, including the external DTD subset.
  false: Do not include any external parameter entities, even the external DTD subset.
  access: (parsing) read-only; (not parsing) read/write

Page 25: Introduction to SAX: a standard interface for event-based XML parsing


SAX2 Properties

See the standard SAX2 properties for more.

http://xml.org/sax/properties/lexical-handler
  data type: org.xml.sax.ext.LexicalHandler
  description: The registered lexical handler.
  access: read/write

…/declaration-handler
  data type: org.xml.sax.ext.DeclHandler
  description: The registered declaration handler.
  access: read/write

…/document-xml-version
  data type: String
  description: the XML version, "1.0" or "1.1"

…/dom-node
  data type: org.w3c.dom.Node
  description: the current DOM node being visited, if this is a DOM tree walker
  access: (parsing) read-only; (not parsing) read/write

…/xml-string   // not supported by Xerces
  data type: java.lang.String
  description: The string source for the current event.
  access: read-only

Page 26: Introduction to SAX: a standard interface for event-based XML parsing


SAX2 Namespace Support

Standardized Namespace support is essential for higher-level standards like XSL, XML Schemas, RDF, and XLink.

Namespace processing affects only element and attribute names.
  ex: <x:e y:att="z:val"/>   // the x and y prefixes are resolved, but not z

With Namespace processing:
  name = [URI] + localName (which must not contain ':'), and qName may or may not be valid
Without Namespace processing:
  name = qName (a qualified name may contain ':')

SAX2 supports either of these views, or both simultaneously.

Page 27: Introduction to SAX: a standard interface for event-based XML parsing


Sax2 namespace support

Affects the ContentHandler and Attributes interfaces.

In SAX2, the startElement and endElement callbacks in a content handler look like this:

  public void startElement(String uri, String localName, String qName, Attributes atts)
      throws SAXException;

  public void endElement(String uri, String localName, String qName)
      throws SAXException;

By default, an XML reader will report a Namespace URI and a local name for every element, in both the start and end handler.

Example: <html:hr xmlns:html="http://www.w3.org/1999/xhtml"/>
  uri = "http://www.w3.org/1999/xhtml"
  localName = "hr"
  qName = "html:hr" or "", depending on whether the namespace-prefixes feature is set or not

Page 28: Introduction to SAX: a standard interface for event-based XML parsing


startPrefixMapping, endPrefixMapping

SAX2 also reports the scope of Namespace declarations, so that applications can resolve prefixes in attribute values or character data if necessary.

  public void startPrefixMapping(String prefix, String uri) throws SAXException;

  public void endPrefixMapping(String prefix) throws SAXException;

Ex: Before the start-element event, the XML reader would call:

  startPrefixMapping("html", "http://www.w3.org/1999/xhtml")

After the end-element event, the XML reader would call:

  endPrefixMapping("html")
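The ordering of prefix-mapping and element events can be checked with a handler that logs all four callbacks. This is a sketch under assumptions: the class NamespaceTrace and its trace method are hypothetical names, and the parser comes from SAXParserFactory, which is not namespace-aware unless asked, so the code switches that on explicitly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: logs prefix-mapping and element events in arrival order.
class NamespaceTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        events.add("startPrefixMapping(" + prefix + ", " + uri + ")");
    }

    @Override
    public void endPrefixMapping(String prefix) {
        events.add("endPrefixMapping(" + prefix + ")");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement {" + uri + "}" + localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement {" + uri + "}" + localName);
    }

    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories default to namespace-unaware
        XMLReader reader = factory.newSAXParser().getXMLReader();
        NamespaceTrace handler = new NamespaceTrace();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<html:hr xmlns:html=\"http://www.w3.org/1999/xhtml\"/>")
            .forEach(System.out::println);
    }
}
```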

Page 29: Introduction to SAX: a standard interface for event-based XML parsing


Configuring Namespace Support

"http://xml.org/sax/features/namespaces" feature
  true [default] => Namespace URIs + local names are valid, and start/endPrefixMapping events are reported.

"http://xml.org/sax/features/namespace-prefixes" feature
  true => prefixed names (qName) are valid, and Namespace declarations (xmlns* attributes) are reported in the attributes.
  false [default] => qualified (prefixed) names (qName) may optionally be reported (in practice, all are reported), but xmlns* attributes must not be reported.

Note:
  1. At least one of the two features must be true.
Suggestion:
  1. Namespace-aware application: use the default setting.
  2. No use of Namespaces: toggle the default settings (namespaces false, namespace-prefixes true).

Page 30: Introduction to SAX: a standard interface for event-based XML parsing


Configuration Example

Consider the following simple sample document:

  <h:hello xmlns:h="http://www.greeting.com/ns/" id="a1" h:person="David"/>

namespaces true, namespace-prefixes false (the default): prefix-mapping events are reported, and names are reported as URI + local name:
  h:hello  => "http://www.greeting.com/ns/" + "hello"
  xmlns:h  => does not appear in the attributes
  id       => "" (empty string) + "id"
  h:person => "http://www.greeting.com/ns/" + "person"

namespaces and namespace-prefixes both true: prefix-mapping events are reported, and names are reported as URI + local name + qName:
  h:hello  => "http://www.greeting.com/ns/" + "hello" + "h:hello"
  xmlns:h  => "…" + "h" + "xmlns:h"
  id       => "" (empty string) + "id" + "id"
  h:person => "http://www.greeting.com/ns/" + "person" + "h:person"

namespaces false and namespace-prefixes true: names are reported as "" + "" + qName:
  "" + "" + "h:hello"
  "" + "" + "xmlns:h"
  "" + "" + "id"
  "" + "" + "h:person"
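The first two configurations can be compared by listing the element and attribute names a parser reports for the sample document. A minimal sketch, assuming the JDK's bundled parser; the class AttributeNames and the method list are hypothetical, and only URI + local name are printed because the exact qName and xmlns* details may vary between parsers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: lists element and attribute names as {uri}localName.
class AttributeNames extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add("{" + uri + "}" + localName);
        for (int i = 0; i < atts.getLength(); i++) {
            names.add("@{" + atts.getURI(i) + "}" + atts.getLocalName(i));
        }
    }

    // namespaces is always true here; namespacePrefixes selects the second feature.
    static List<String> list(String xml, boolean namespacePrefixes) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature("http://xml.org/sax/features/namespace-prefixes", namespacePrefixes);
        AttributeNames handler = new AttributeNames();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<h:hello xmlns:h=\"http://www.greeting.com/ns/\" id=\"a1\" h:person=\"David\"/>";
        System.out.println(list(xml, false)); // xmlns:h is not among the attributes
        System.out.println(list(xml, true));  // xmlns:h is reported as an attribute too
    }
}
```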

Page 31: Introduction to SAX: a standard interface for event-based XML parsing


SAX2 packages

3 packages:

org.xml.sax

org.xml.sax.helpers
  XMLReaderFactory, DefaultHandler, AttributesImpl, LocatorImpl, NamespaceSupport, XMLFilterImpl
  AttributeListImpl, ParserAdapter, ParserFactory, XMLReaderAdapter (SAX 1.0 support; AttributeListImpl and ParserFactory are deprecated)

org.xml.sax.ext
  DeclHandler: for DTD declaration events
  LexicalHandler: for lexical events
  DefaultHandler2, Locator2, Locator2Impl, EntityResolver2, Attributes2, Attributes2Impl

Page 32: Introduction to SAX: a standard interface for event-based XML parsing


Package: org.xml.sax for SAX2

Interfaces:
  AttributeList (SAX1)
  Attributes, Attributes2 (Attributes2 is in org.xml.sax.ext)
  ContentHandler
  DocumentHandler (SAX1)
  DTDHandler
  EntityResolver, EntityResolver2 (EntityResolver2 is in org.xml.sax.ext)
  ErrorHandler
  Locator, Locator2 (Locator2 is in org.xml.sax.ext)
  Parser (SAX1)
  XMLReader
  XMLFilter

Classes:
  HandlerBase (SAX1)
  InputSource

Exceptions:
  SAXException
  SAXParseException
  SAXNotRecognizedException
  SAXNotSupportedException

Page 33: Introduction to SAX: a standard interface for event-based XML parsing


Interface org.xml.sax.AttributeList(SAX1.0 deprecated)

Methods index:
  getLength(): Return the number of attributes in this list.
  getName(int index): Return the name of an attribute in this list (by position).
  getType(int index): Return the type of an attribute in the list (by position).
  getValue(int index): Return the value of an attribute in the list (by position).
  getType(String name): Return the type of an attribute in the list (by name).
  getValue(String name): Return the value of an attribute in the list (by name).

Page 34: Introduction to SAX: a standard interface for event-based XML parsing


Interface org.xml.sax.Attributes and org.xml.sax.ext.Attributes2 (methods marked [Attributes2] belong to the extension)

int getLength()

int getIndex(String qName)
int getIndex(String uri, String localName)
  Look up the index (0-based) of an attribute by qName or by uri + localName.

String getLocalName(int index)
String getQName(int index)
String getURI(int index)

String getType(int index)
String getType(String qName)
String getType(String uri, String localName)
  possible results: "CDATA", "ID", "IDREF", "IDREFS", "NMTOKEN" (also used for enumerations), "NMTOKENS", "ENTITY", "ENTITIES", "NOTATION"

String getValue(int index)
String getValue(String qName)
String getValue(String uri, String localName)

isDeclared(index | qName | uri, localName)   [Attributes2]
  true if the attribute was declared in the DTD
isSpecified(index | qName | uri, localName)   [Attributes2]

Note: The lookup methods return null (getIndex returns -1) if Namespace processing does not support them, e.g. if the namespaces feature is false => getValue(uri, localName) returns null.

Page 35: Introduction to SAX: a standard interface for event-based XML parsing


interface ContentHandler

startDocument()
endDocument()
startElement(uri, localName, qName, Attributes atts)
endElement(uri, localName, qName)
startPrefixMapping(prefix, uri): Begin the scope of a prefix-URI Namespace mapping.
endPrefixMapping(prefix): End the scope of a prefix-URI mapping; there is no guarantee of proper nesting among start- and end-prefix-mapping events.
characters(char[] ch, int start, int length): Receive notification of character data.
ignorableWhitespace(char[] ch, int start, int length)
processingInstruction(target, data)
setDocumentLocator(Locator locator): Receive an object for locating the origin of SAX document events. Will be invoked only once, before any other method is called.
skippedEntity(name): Receive notification of a skipped entity.
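One practical point about characters(): a parser is free to deliver the text of a single element in several chunks, so handlers usually buffer the text and use it in endElement. The sketch below shows that pattern; the class TextCollector and the element name "line" (borrowed from roses.xml) are illustrative choices, not part of SAX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: collects the full text of every <line> element.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("line".equals(qName)) buffer.setLength(0); // start a fresh buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text, so append rather than assign.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("line".equals(qName)) lines.add(buffer.toString());
    }

    static List<String> collect(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TextCollector handler = new TextCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<poem><line>Roses are red,</line><line>Violets &amp; blue.</line></poem>";
        collect(xml).forEach(System.out::println);
    }
}
```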

Page 36: Introduction to SAX: a standard interface for event-based XML parsing


interface ContentHandler

skippedEntity(name)
  Receive notification of a skipped entity.
  The parser will invoke this method once for each entity skipped.
  Non-validating processors may skip entities if they have not seen the declarations (because, for example, the entity was declared in an external DTD subset).
  All processors may skip external entities, depending on the values of the http://xml.org/sax/features/external-general-entities and http://xml.org/sax/features/external-parameter-entities features.

Example:
  <test> <a/>&ge1;bc<c/> </test>

Page 37: Introduction to SAX: a standard interface for event-based XML parsing


Interface org.xml.sax.DTDHandler

Method Index

notationDecl(String name, String publicId, String systemId) throws SAXException
  Receive notification of a notation declaration event.
  Ex: <!NOTATION GIF PUBLIC "abc">
      => notationDecl("GIF", "abc", null)

unparsedEntityDecl(name, publicId, systemId, notationName)
  Receive notification of an unparsed entity declaration event.
  Ex: <!ENTITY aPic SYSTEM "here" NDATA GIF>
      => unparsedEntityDecl(
             "aPic",
             null,    // publicId (none given)
             "here",  // systemId
             "GIF")   // notationName

Page 38: Introduction to SAX: a standard interface for event-based XML parsing


Interface org.xml.sax.Parser(SAX1.0; skipped!)

Method index
  parse(InputSource): Parse an XML document.
  parse(String): Parse an XML document from a system identifier (URI).
  setDocumentHandler(DocumentHandler): Allow an application to register a document event handler.
  setDTDHandler(DTDHandler): Allow an application to register a DTD event handler.
  setEntityResolver(EntityResolver): Allow an application to register a custom entity resolver.
  setErrorHandler(ErrorHandler): Allow an application to register an error event handler.
  setLocale(Locale): Allow an application to request a locale for errors and warnings.

Note: all return types are void.

Page 39: Introduction to SAX: a standard interface for event-based XML parsing


interface XMLReader

ContentHandler:
  ContentHandler getContentHandler()
  setContentHandler(ContentHandler handler)

DTDHandler:
  DTDHandler getDTDHandler()
  setDTDHandler(DTDHandler handler)

EntityResolver:
  EntityResolver getEntityResolver()
  setEntityResolver(EntityResolver resolver)

ErrorHandler:
  ErrorHandler getErrorHandler()
  setErrorHandler(ErrorHandler handler)

parse:
  parse(InputSource input)
  parse(String systemId)

Features and Properties:
  boolean getFeature(name)
  Object getProperty(name)
  setFeature(name, boolean value)
  setProperty(name, Object value)

Page 40: Introduction to SAX: a standard interface for event-based XML parsing


Interface org.xml.sax.DocumentHandler(SAX1.0 skipped)

Method Index
  characters(char[], int, int): Receive notification of character data.
  endDocument(): Receive notification of the end of a document.
  endElement(String): Receive notification of the end of an element.
  ignorableWhitespace(char[], int, int): Receive notification of ignorable whitespace in element content.
  processingInstruction(String, String): Receive notification of a processing instruction.
  setDocumentLocator(Locator): Receive an object for locating the origin of SAX document events.
  startDocument(): Receive notification of the beginning of a document.
  startElement(String, AttributeList): Receive notification of the beginning of an element.

Page 41: Introduction to SAX: a standard interface for event-based XML parsing


Interface org.xml.sax.Locator, org.xml.sax.ext.Locator2

Method Index
  getColumnNumber(): Return the column number where the current document event ends.
  getLineNumber(): Return the line number where the current document event ends.
  getPublicId(): Return the public identifier for the current document event.
  getSystemId(): Return the system identifier for the current document event.
  getEncoding() [Locator2]: String; the character encoding used
  getXMLVersion() [Locator2]: String; the XML version for the entity

Note: If an implementation supports Locator2, XMLReader.getFeature("…/use-locator2") will return true.
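A Locator is typically stored when setDocumentLocator is called and consulted later from other callbacks. The following sketch records the line on which each start tag ends; the class LineReporter and the "name@line" output format are made up for illustration, and the handler falls back to -1 because supplying a Locator is optional for parsers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: reports "elementName@lineNumber" for each start tag.
class LineReporter extends DefaultHandler {
    private Locator locator; // stays null if the parser supplies no Locator
    final List<String> positions = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // called once, before any other event
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        int line = (locator == null) ? -1 : locator.getLineNumber();
        positions.add(qName + "@" + line);
    }

    static List<String> report(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LineReporter handler = new LineReporter();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.positions;
    }

    public static void main(String[] args) throws Exception {
        report("<a>\n<b/>\n</a>").forEach(System.out::println);
    }
}
```

The Locator values are only meaningful during a callback; to keep a position for later, copy it (for example with LocatorImpl).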

Page 42: Introduction to SAX: a standard interface for event-based XML parsing


Interface org.xml.sax.EntityResolver, org.xml.sax.ext.EntityResolver2

InputSource resolveEntity(String publicId, String systemId)

InputSource resolveEntity(entityName, publicId, baseURI, systemId)   [EntityResolver2]
  // baseURI + systemId => absolute URI
  Allows the application to resolve external entities.
  The parser will call this method before opening any external entity, including:
    the external DTD subset (entityName is "[dtd]"),
    external entities referenced within the DTD or within the document element:
      parameter entity: %name
      general entity: name

InputSource getExternalSubset(rootName, baseURI)   [EntityResolver2]
  Allows applications to provide an external subset for documents that don't explicitly define one (either no DOCTYPE, or a DOCTYPE with no external subset given).
  rootName: document root name; baseURI: absolute, an additional hint.

To use version 2, you must call setFeature("…/use-entity-resolver2", true). Version 2 will hide version 1 if it is used.

Page 43: Introduction to SAX: a standard interface for event-based XML parsing


Special entity processing for XHTML dtd

  import java.io.FileReader;
  import java.io.IOException;
  import org.xml.sax.EntityResolver;
  import org.xml.sax.InputSource;

  public class MyResolver implements EntityResolver {
      public InputSource resolveEntity(String publicId, String systemId) throws IOException {
          // publicId may be null, so the constants go on the left of equals().
          if ("-//W3C//DTD XHTML 1.0 Strict//EN".equals(publicId)
                  || "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd".equals(systemId)) {
              // return my local XHTML 1.0 DTD
              return new InputSource(new FileReader("myXhtmlDtdFile.dtd"));
          }
          return null; // use the default behaviour
      }
  }

Page 44: Introduction to SAX: a standard interface for event-based XML parsing


Interface org.xml.sax.ErrorHandler

Method Index
  error(SAXParseException): Receive notification of a recoverable error.
  fatalError(SAXParseException): Receive notification of a non-recoverable error.
  warning(SAXParseException): Receive notification of a warning.
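A minimal sketch of an ErrorHandler that records what the parser reports instead of printing it. The class CollectingErrorHandler and the message format are hypothetical; rethrowing from fatalError is one common choice, since the document cannot be parsed reliably after a fatal error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical demo class: collects parser diagnostics as "severity: line N".
class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        messages.add("warning: line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
        messages.add("error: line " + e.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        messages.add("fatal: line " + e.getLineNumber());
        throw e; // stop parsing
    }

    static List<String> check(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError above.
        }
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<a/>"));       // well-formed document
        System.out.println(check("<a><b></a>")); // mismatched end tag
    }
}
```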

Page 45: Introduction to SAX: a standard interface for event-based XML parsing


interface org.xml.sax.ext.DeclHandler

attributeDecl(String eName, String aName, String type, String valueDefault, String value)
  Report an attribute type declaration.
  valueDefault: "#IMPLIED", "#REQUIRED", "#FIXED", or null if none of these applies.
  value: a string representing the attribute's default value, or null if there is none.
  type: for enumerations or notations => [NOTATION] (nm1|…|nmk)

elementDecl(name, String model)
  Report an element type declaration.

externalEntityDecl(name, publicId, systemId)
  Report a parsed external entity declaration.
  parameter entity => name begins with %.

internalEntityDecl(name, String value)
  Report an internal entity declaration.
  parameter entity => name begins with %; value is the replacement text.

Page 46: Introduction to SAX: a standard interface for event-based XML parsing


Interface org.xml.sax.ext.LexicalHandler

An optional extension handler for SAX2 that provides lexical information about an XML document, such as comments and CDATA section boundaries; XMLReaders are not required to support it.
Its events apply to the entire document, not just to the document element, and all lexical handler events must appear between the startDocument and endDocument events.

To set a LexicalHandler/DeclHandler for an XMLReader:

  try {
      xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", aLexicalHandler);
      xmlReader.setProperty("http://xml.org/sax/properties/declaration-handler", aDeclHandler);
  } catch (SAXNotRecognizedException e) {
  } catch (SAXNotSupportedException e) {
  }

Page 47: Introduction to SAX: a standard interface for event-based XML parsing


interface LexicalHandler

startDTD(String name, String publicId, String systemId): Report the start of DTD declarations, if any.
endDTD(): Report the end of DTD declarations.
startCDATA(): Report the start of a CDATA section.
endCDATA(): Report the end of a CDATA section.
comment(char[] ch, int start, int length): Report an XML comment anywhere in the document.
endEntity(String name): Report the end of an entity [expansion], general or parameter; a parameter entity name begins with %.

Page 48: Introduction to SAX: a standard interface for event-based XML parsing


interface LexicalHandler

startEntity(String name)
  Report the beginning of an entity in the document.
  name: name of the entity; a parameter entity name begins with '%', and the external DTD subset is named "[dtd]".

NOTE:
  Entity references in attribute values, and the start and end of the document entity, are never reported.
  Skipped entities will be reported through the skippedEntity event, which is part of the ContentHandler interface.
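A short sketch of a LexicalHandler in use, assuming the parser supports the lexical-handler property (the JDK's bundled parser is expected to). It extends org.xml.sax.ext.DefaultHandler2, which supplies empty implementations of the LexicalHandler methods; the class CommentCollector is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class: records comments and CDATA section boundaries.
class CommentCollector extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        events.add("comment:" + new String(ch, start, length));
    }

    @Override public void startCDATA() { events.add("startCDATA"); }
    @Override public void endCDATA() { events.add("endCDATA"); }

    static List<String> collect(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CommentCollector handler = new CommentCollector();
        reader.setContentHandler(handler);
        // Lexical handlers are registered as a property, not through a setter.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        collect("<a><!-- hi --><![CDATA[x<y]]></a>").forEach(System.out::println);
    }
}
```

The text inside the CDATA section still arrives through characters() on the ContentHandler; the lexical events only mark its boundaries.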

Page 49: Introduction to SAX: a standard interface for event-based XML parsing


Class org.xml.sax.InputSource

Constructors:
  InputSource(): Zero-argument default constructor.
  InputSource(InputStream): Create a new input source with a byte stream.
  InputSource(Reader): Create a new input source with a character stream.
  InputSource(String): Create a new input source with a system identifier.

access order: character stream, byte stream, systemId, publicId.

Methods:
  getByteStream(): Get the byte stream for this input source.
  getCharacterStream(): Get the character stream for this input source.
  getEncoding(): Get the character encoding for a byte stream or URI.
  getPublicId(): Get the public identifier for this input source.
  getSystemId(): Get the system identifier for this input source.


Class org.xml.sax.InputSource

  setByteStream(InputStream) Set the byte stream for this input source.
  setCharacterStream(Reader) Set the character stream for this input source.
  setEncoding(String) Set the character encoding, if known.
  setPublicId(String) Set the public identifier for this input source.
  setSystemId(String) Set the system identifier for this input source.
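As an illustration of the constructors and setters above, the sketch below wraps in-memory text in an InputSource and hands it to the JDK's default parser. The class InputSourceDemo, its method names, and the example system identifier are assumptions made for this example only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of SAX or of the original slides.
class InputSourceDemo {

    // Builds an InputSource around a character stream and labels it with a system identifier.
    static InputSource fromText(String xml, String systemId) {
        InputSource src = new InputSource(new StringReader(xml));
        // The parser reads the character stream; the system identifier is kept
        // for error messages and for resolving relative URIs.
        src.setSystemId(systemId);
        return src;
    }

    // Returns true if the default parser accepts the input as well-formed XML.
    static boolean isWellFormed(InputSource src) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(src, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // a fatal parse error was reported
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InputSource src = fromText("<a><b/></a>", "http://example.org/doc.xml");
        System.out.println(src.getSystemId() + " well-formed: " + isWellFormed(src));
    }
}
```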


Class org.xml.sax.HandlerBase (SAX 1.0, deprecated)
Constructor: HandlerBase()
Methods:
  characters(char[], int, int) Receive notification of character data inside an element.
  endDocument() Receive notification of the end of the document.
  endElement(String) Receive notification of the end of an element.
  error(SAXParseException) Receive notification of a recoverable parser error.
  fatalError(SAXParseException) Report a fatal XML parsing error.
  ignorableWhitespace(char[], int, int) Receive notification of ignorable whitespace in element content.
  notationDecl(String, String, String) Receive notification of a notation declaration.
  processingInstruction(String, String) Receive notification of a processing instruction.


Class org.xml.sax.HandlerBase (cont’d)

  resolveEntity(String, String) Resolve an external entity.
  setDocumentLocator(Locator) Receive a Locator object for document events.
  startDocument() Receive notification of the beginning of the document.
  startElement(String, AttributeList) Receive notification of the start of an element.
  unparsedEntityDecl(String, String, String, String) Receive notification of an unparsed entity declaration.
  warning(SAXParseException) Receive notification of a parser warning.


Class org.xml.sax.SAXException

Constructors:
  SAXException(Exception) Create a new SAXException wrapping an existing exception.
  SAXException(String) Create a new SAXException.
  SAXException(String, Exception) Create a new SAXException from an existing exception.
Methods:
  getException() Return the embedded exception, if any.
  getMessage() Return a detail message for this exception.
  toString() Return a string representation of this exception.


Class org.xml.sax.SAXParseException

extends SAXException
Constructors:
  SAXParseException(message, locator) Create a new SAXParseException from a message and a Locator.
  SAXParseException(message, locator, exception) Wrap an existing exception in a SAXParseException.
  SAXParseException(message, pubID, sysID, lineNo, colNo) Create a new SAXParseException.
  SAXParseException(message, pubID, sysID, lineNo, colNo, exception) Create a new SAXParseException with an embedded exception.


Class org.xml.sax.SAXParseException

Methods:
  getColumnNumber() The column number of the end of the text where the exception occurred.
  getLineNumber() The line number of the end of the text where the exception occurred.
  getPublicId() Get the public identifier of the entity where the exception occurred.
  getSystemId() Get the system identifier of the entity where the exception occurred.
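A short sketch of how SAXException and SAXParseException are typically used: one method reads the error position from a SAXParseException thrown for malformed input, the other wraps a lower-level exception so that it can be thrown from a SAX callback. ParseErrorDemo and its method names are hypothetical, and the JDK's default parser is assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of SAX or of the original slides.
class ParseErrorDemo {

    // Returns the line number of the first fatal error, or -1 if the text parses cleanly.
    static int errorLine(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return -1;
        } catch (SAXParseException e) {
            // DefaultHandler.fatalError rethrows the exception it receives.
            return e.getLineNumber();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Wraps a lower-level exception so it can be thrown from a handler method.
    static SAXException wrap(Exception cause) {
        return new SAXException("wrapped: " + cause.getMessage(), cause);
    }

    public static void main(String[] args) {
        System.out.println("error line: " + errorLine("<a>\n<b></a>"));
        System.out.println("embedded: " + wrap(new java.io.IOException("disk")).getException());
    }
}
```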


public class SAXNotRecognizedException

extends SAXException
Exception class for an unrecognized identifier. An XMLReader will throw this exception when it finds an unrecognized feature or property identifier.
Constructor:
  SAXNotRecognizedException(String message) Construct a new exception with the given message.


public class SAXNotSupportedException

extends SAXException
Exception class for an unsupported operation. An XMLReader will throw this exception when it recognizes a feature or property identifier, but cannot perform the requested operation (setting a state or value).
Constructor:
  SAXNotSupportedException(String message) Construct a new exception with the given message.
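The difference between the two exceptions can be observed by probing a reader with feature identifiers. The sketch below is illustrative only: FeatureProbe and probe are invented names, the second feature URI in main is deliberately bogus, and the behaviour shown is that of the JDK's default parser.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical demo class; not part of SAX or of the original slides.
class FeatureProbe {

    // Tries to switch a feature on and reports how the reader reacted.
    static String probe(String featureId) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(featureId, true);
            return "ok";
        } catch (SAXNotRecognizedException e) {
            return "not recognized"; // the identifier is unknown to this reader
        } catch (SAXNotSupportedException e) {
            return "not supported";  // known identifier, but the value cannot be set
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("http://xml.org/sax/features/namespaces"));
        System.out.println(probe("http://example.org/no-such-feature"));
    }
}
```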


package org.xml.sax.helpers for SAX2

  AttributeListImpl: implements AttributeList
  AttributesImpl: implements Attributes
  DefaultHandler
  LocatorImpl: implements Locator
  NamespaceSupport
  ParserAdapter
  ParserFactory
  XMLFilterImpl: implements XMLFilter
  XMLReaderAdapter
  XMLReaderFactory


public class org.xml.sax.helpers.AttributesImpl
extends java.lang.Object implements Attributes
Default implementation of the Attributes interface, with the addition of manipulators so that the list can be modified or reused.
Typical uses of this class:
  1. take a persistent snapshot of an Attributes object in a startElement event;
  2. construct or modify an Attributes object in a SAX2 XMLReader or filter.
Replaces the deprecated SAX1 AttributeListImpl class; a much more efficient implementation, using arrays rather than Vector.


public class org.xml.sax.helpers.AttributesImpl, org.xml.sax.ext.Attributes2Impl
Constructors:
  AttributesImpl()
  AttributesImpl(Attributes atts)
Methods:
  addAttribute(uri, localName, qName, type, value)
  clear()
  removeAttribute(int index)
  setAttribute(int index, uri, localName, qName, type, value)
  setLocalName(int index, localName)
  setQName(int index, qName)
  setType(int index, java.lang.String type)
  setURI(int index, java.lang.String uri)
  setValue(int index, java.lang.String value)
  setDeclared(index, boolean)…, setSpecified(index, boolean) (Attributes2Impl only)
  + the methods declared in Attributes2
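The "persistent snapshot" use can be sketched as follows. The class AttributesSnapshot and the sample attribute values are made up for illustration; only the AttributesImpl calls listed above are used.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; not part of SAX or of the original slides.
class AttributesSnapshot {

    // Copies the attributes so they remain valid after startElement returns.
    static AttributesImpl snapshot(Attributes atts) {
        return new AttributesImpl(atts);
    }

    // Builds a small attribute list by hand, as a filter or reader might.
    static AttributesImpl sample() {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "id", "id", "CDATA", "42");
        atts.addAttribute("http://www.w3.org/XML/1998/namespace", "lang", "xml:lang", "CDATA", "en");
        return atts;
    }

    public static void main(String[] args) {
        AttributesImpl original = sample();
        AttributesImpl copy = snapshot(original);
        original.setValue(0, "changed");   // later changes to the original...
        original.removeAttribute(1);
        // ...do not affect the snapshot.
        System.out.println(copy.getLength() + " attributes, id=" + copy.getValue("id"));
    }
}
```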


public class DefaultHandler

extends Object implements EntityResolver, DTDHandler, ContentHandler, ErrorHandler
A convenience base class for SAX2 applications: it provides default empty implementations for all four interfaces:
  EntityResolver
  DTDHandler
  ContentHandler
  ErrorHandler
Application writers usually extend this class when they need to implement only part of an interface.
Constructor: public DefaultHandler()
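A minimal sketch of extending DefaultHandler and overriding a single callback. ElementNameCollector is a hypothetical class written for this example; it uses the JDK's default (non-namespace-aware) JAXP parser, so the qName argument carries the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; overrides only startElement and inherits the rest.
class ElementNameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);
    }

    // Parses the text and returns the element names in document order.
    static List<String> namesIn(String xml) {
        ElementNameCollector handler = new ElementNameCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        System.out.println(namesIn("<doc><p>one</p><p>two</p></doc>"));
    }
}
```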


org.xml.sax.ext.DefaultHandler2 extends DefaultHandler

Empty implementations of the additional methods of three extension handlers:
  LexicalHandler
  DeclHandler
  EntityResolver2
Constructor Summary
  DefaultHandler2()


public class org.xml.sax.helpers.LocatorImpl
extends java.lang.Object implements Locator
A convenience implementation of Locator, available mainly for application writers, who can use it to make a persistent snapshot of a locator at any point during a document parse:
  Locator locator;
  Locator startloc;
  public void setDocumentLocator(Locator locator) {
    // called by the parser before startDocument()
    this.locator = locator;
  }
  public void startDocument() {
    // save the location of the start of the document for future use
    startloc = new LocatorImpl(locator);
  }


org.xml.sax.helpers.LocatorImpl, org.xml.sax.ext.Locator2Impl
Constructor Summary
  LocatorImpl() / Locator2Impl()
  LocatorImpl(Locator locator) / Locator2Impl(Locator locator): copy constructor.
Method Summary
  setColumnNumber(int columnNumber)
  setLineNumber(int lineNumber)
  setPublicId(String publicId)
  setSystemId(String systemId)
  setEncoding(String encoding) (Locator2Impl only)
  setXMLVersion(String version) (Locator2Impl only)
  + the getXXX() methods defined in Locator2.


example: print the end location of an endElement event
  public class MyHandler extends DefaultHandler {
    Locator loc; // locator provided by setDocumentLocator(…)
    …
    public void setDocumentLocator(Locator l) {
      loc = l;
    }
    …
    public void endElement(String uri, String lName, String qName) {
      …
      System.out.println("end of " + qName + " element at column: " + loc.getColumnNumber() + " line: " + loc.getLineNumber());
      …
    }
  }
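A runnable variant of the same idea, recording positions in a list instead of printing them. EndLineRecorder is a hypothetical class; the sketch assumes the parser supplies a Locator (SAX recommends this but does not require it), as the JDK's default parser does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that notes the line on which each element ends.
class EndLineRecorder extends DefaultHandler {
    private Locator loc; // supplied by the parser before startDocument()
    final List<String> log = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator l) {
        loc = l;
    }

    @Override
    public void endElement(String uri, String lName, String qName) {
        // The locator is only meaningful while a callback is running.
        int line = (loc == null) ? -1 : loc.getLineNumber();
        log.add(qName + "@" + line);
    }

    static List<String> record(String xml) {
        EndLineRecorder handler = new EndLineRecorder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.log;
    }

    public static void main(String[] args) {
        System.out.println(record("<a>\n<b/>\n</a>"));
    }
}
```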


public class org.xml.sax.helpers.NamespaceSupport
extends java.lang.Object
Encapsulates Namespace logic for use by SAX drivers.
Tracks the declarations currently in force for each context and automatically processes XML 1.0 qNames into their Namespace parts.
Namespace support objects are reusable, but the reset method must be invoked between each session.
A simple session:
  // when startDocument()
  String[] parts = new String[3];
  NamespaceSupport support = new NamespaceSupport();
  support.pushContext(); // before first prefixMapping
  support.declarePrefix("", "http://www.w3.org/1999/xhtml");
  support.declarePrefix("dc", "http://www.purl.org/dc#");


public class org.xml.sax.helpers.NamespaceSupport
  // when startElement() or charData(…)
  parts = support.processName("p", parts, false); // isAttribute = false
  System.out.println("Namespace URI: " + parts[0]);
  System.out.println("Local name: " + parts[1]);
  System.out.println("Raw name: " + parts[2]);
  parts = support.processName("dc:title", parts, false);
  System.out.println("Namespace URI: " + parts[0]);
  System.out.println("Local name: " + parts[1]);
  System.out.println("Raw name: " + parts[2]);
  …
  support.pushContext();
  // when endElement() or endDocument() is encountered
  support.popContext();


public class org.xml.sax.helpers.NamespaceSupport
Field:
  static String XMLNS // The XML Namespace as a constant.
Constructor Summary
  NamespaceSupport()
Method Summary
  boolean declarePrefix(prefix, uri) Declare a Namespace prefix.
  Enumeration getDeclaredPrefixes() Return an enumeration of all prefixes declared in this context.
  Enumeration getPrefixes() Return an enumeration of all active prefixes.
  String getURI(prefix)
  void popContext() Revert to the previous Namespace context.
  String[] processName(rawName, String[] parts, boolean isAttribute) Process a raw XML 1.0 name.
  void pushContext() Start a new Namespace context.
  void reset() Reset this Namespace support object for reuse.
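The NamespaceSupport session can be condensed into a self-contained sketch. NamespaceSession and resolve are names invented for this example; the prefix declarations are the ones used in the session shown on the earlier slide.

```java
import java.util.Arrays;
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical demo class; not part of SAX or of the original slides.
class NamespaceSession {

    // Splits a raw element name into {namespace URI, local name, raw name};
    // returns null when the prefix has not been declared.
    static String[] resolve(String rawName) {
        NamespaceSupport support = new NamespaceSupport();
        support.pushContext(); // new context before the first declaration
        support.declarePrefix("", "http://www.w3.org/1999/xhtml");
        support.declarePrefix("dc", "http://www.purl.org/dc#");
        String[] parts = support.processName(rawName, new String[3], false); // not an attribute
        support.popContext();
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resolve("p")));
        System.out.println(Arrays.toString(resolve("dc:title")));
        System.out.println(Arrays.toString(resolve("undeclared:x")));
    }
}
```

Passing isAttribute = true would leave an unprefixed name without a Namespace URI, because the default Namespace does not apply to attributes.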


public class org.xml.sax.helpers.XMLReaderFactory

Contains static methods for creating an XML reader from an explicit class name, or for creating an XML reader based on the value of the org.xml.sax.driver system property:

  try {
    XMLReader myReader = XMLReaderFactory.createXMLReader([aClassName]);
  } catch (SAXException e) {
    System.err.println(e.getMessage());
  }


public class org.xml.sax.helpers.XMLReaderFactory

Method Summary
  static XMLReader createXMLReader() Attempt to create an XML reader from the system property "org.xml.sax.driver".
  static XMLReader createXMLReader(String className) Attempt to create an XML reader from a class name.
How to use XMLReaderFactory to create an XMLReader:
  1. XMLReader rd = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
  // or
  2.1 System.getProperties().put("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser");
  2.2 XMLReader rd = XMLReaderFactory.createXMLReader();
  note: 2.1 can be replaced by the command-line option
    java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser


Apache Xerces: org.apache.xerces.parsers.SAXParser

Implements org.xml.sax.Parser and org.xml.sax.XMLReader; provides a parser that implements the SAX1 and SAX2 parser APIs.
Constructor Summary
  SAXParser() // Default constructor.
Methods
  String[] getFeaturesRecognized()
  String[] getPropertiesRecognized()
  …
How to create an XMLReader / SAX parser directly:
  org.xml.sax.XMLReader rd = new SAXParser();
  org.xml.sax.Parser parser = new SAXParser();


The pluggability mechanism of Sun's JAXP
http://java.sun.com/xml
package: javax.xml.parsers
Class Summary
  DocumentBuilder: Defines the API to obtain DOM Document instances from an XML document.
  DocumentBuilderFactory: Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents.
  SAXParser: Defines the API that wraps an XMLReader implementation class.
  SAXParserFactory: Defines a factory API that enables applications to configure and obtain a SAX-based parser to parse XML documents.


sample code

  SAXParser parser;
  DefaultHandler handler = new MyApplicationParseHandler();
  SAXParserFactory factory = SAXParserFactory.newInstance();
  factory.setNamespaceAware(true); // JAXP default: false; SAX default: true
  factory.setValidating(true);
  try {
    parser = factory.newSAXParser();
    parser.parse("http://myserver/mycontent.xml", handler);
  } catch (SAXException se) {
    // handle error
  } catch (IOException ioe) {
    // handle error
  } catch (ParserConfigurationException pce) {
    // handle error
  }


How JAXP's SAXParserFactory.newInstance() finds an implementation
The order used to find a SAXParserFactory implementation class:
  1. The javax.xml.parsers.SAXParserFactory system property.
  2. The same property, read from the file "lib/jaxp.properties" in the JRE directory.
  3. The class name in the file META-INF/services/javax.xml.parsers.SAXParserFactory in jars available to the runtime.
  4. The platform default SAXParserFactory, which is "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl" in JAXP 1.2 and 1.3.


javax.xml.parsers.SAXParserFactory

abstract boolean getFeature(String name)
abstract void setFeature(String name, boolean value)
  Get/set the particular feature in the underlying implementation of org.xml.sax.XMLReader.
boolean isNamespaceAware()
void setNamespaceAware(boolean awareness)
  Get/set the namespace support of the parsers produced by this factory.
boolean isValidating()
void setValidating(boolean validating)
  Get/set the validating property of the produced parsers.
static SAXParserFactory newInstance()
  Obtain a new instance of a SAXParserFactory.
abstract SAXParser newSAXParser()
  Create a new instance of a SAXParser using the currently configured factory parameters.


javax.xml.parsers.SAXParser

abstract XMLReader getXMLReader()
abstract Parser getParser()
  Return the XMLReader or SAX1 Parser that is encapsulated by the implementation of this class.
abstract Object getProperty(String name)
abstract void setProperty(String, Object)
abstract boolean isNamespaceAware()
abstract boolean isValidating()
void parse(input, handler)
  handler: a DefaultHandler or a HandlerBase; input: a File, InputSource, InputStream or URI (String).
void parse(InputStream, HandlerBase | DefaultHandler, String uri)
  uri is used for resolving relative URIs.
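Putting the factory and the parser together, here is a sketch that parses an in-memory byte stream with a namespace-aware SAXParser. JaxpNamespaceDemo, its method name and the urn:demo namespace are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of JAXP or of the original slides.
class JaxpNamespaceDemo {

    // Returns each element as "{namespaceURI}localName" in document order.
    static List<String> qualifiedNames(byte[] xmlBytes) {
        final List<String> out = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(xmlBytes), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    out.add("{" + uri + "}" + localName);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<d:doc xmlns:d='urn:demo'><d:item/><plain/></d:doc>";
        System.out.println(qualifiedNames(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```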


SAX2: Filters

The SAX interface assumes two basic streams:

1. a stream of requests flowing from the application to the SAX driver; and

2. a stream of events (and other information) flowing from the SAX driver to the application.

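[Figure: the Application (ContentHandler, DTDHandler, ErrorHandler, …) sends requests such as parse(…), setFeature() and setProperty() to the SAX driver (XMLReader); the driver reads the input source and sends events such as startDocument() and endDocument() back to the application.]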


extend SAX model to support a processing chain

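[Figure: a processing chain in which an XMLFilter sits between the Application and the parent SAX driver (XMLReader), which reads the input source. The filter receives parse(…), setFeature() and setProperty() requests from the application like a SAX driver, and receives startDocument(), endDocument() and other events from its parent like an application.]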


SAX2 support of XMLFilter

A new interface, org.xml.sax.XMLFilter, and a new helper class, org.xml.sax.helpers.XMLFilterImpl.
public interface XMLFilter extends XMLReader
  void setParent(XMLReader)
  XMLReader getParent()
public class XMLFilterImpl implements XMLFilter, ContentHandler, ErrorHandler, DTDHandler, EntityResolver
  // by default it delegates every event it receives to the registered external application handler.
  // note: XMLFilterImpl implements the same four handler interfaces as DefaultHandler.


Example

A simple filter that changes the Namespace URI http://www.foo.com/ns/ to http://www.bar.com/ns/ wherever it appears in an element name:
  public class FooFilter extends XMLFilterImpl {
    public FooFilter() { }
    public FooFilter(XMLReader parent) { super(parent); }
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
      if (uri.equals("http://www.foo.com/ns/"))
        uri = "http://www.bar.com/ns/";
      super.startElement(uri, localName, qName, atts);
    }
    public void endElement(String uri, String localName, String qName) throws SAXException {
      if (uri.equals("http://www.foo.com/ns/"))
        uri = "http://www.bar.com/ns/";
      super.endElement(uri, localName, qName);
    }
  }

[Figure: MyXMLFilter.startElement(…) calls super.startElement(); XMLFilterImpl.startElement(…) then forwards the event to the Application's ContentHandler.startElement() when a handler is registered:]
  startElement(…) {
    if (cntHandler != null)
      cntHandler.startElement(..);
  }
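To see a filter at work in a chain, the sketch below places a URI-renaming filter between the JDK's default XMLReader and an application handler. FilterChainDemo, RenamingFilter and urisSeen are hypothetical names; the filter logic mirrors the FooFilter example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; not part of SAX or of the original slides.
class FilterChainDemo {

    // Rewrites one Namespace URI and passes the event on via XMLFilterImpl.
    static class RenamingFilter extends XMLFilterImpl {
        RenamingFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (uri.equals("http://www.foo.com/ns/")) {
                uri = "http://www.bar.com/ns/";
            }
            super.startElement(uri, localName, qName, atts); // forward to the application handler
        }
    }

    // Returns the Namespace URIs the application handler sees, in document order.
    static List<String> urisSeen(String xml) {
        final List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parent = factory.newSAXParser().getXMLReader();
            XMLFilterImpl filter = new RenamingFilter(parent);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    seen.add(uri);
                }
            });
            // The application talks to the filter exactly as it would to a reader.
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(urisSeen("<f:a xmlns:f='http://www.foo.com/ns/'><b xmlns='urn:other'/></f:a>"));
    }
}
```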


XMLWriter

XMLReader: XML document (InputSource) => SAX events
XMLWriter (extends XMLFilterImpl): SAX events => XML document (fragment)
Ex:
  XMLWriter w = new XMLWriter();
  w.startDocument();
  w.startElement("greeting");
  w.characters("Hello, world!");
  w.endElement("greeting");
  w.endDocument();
=> output:
  <?xml version="1.0" standalone="yes" ?>
  <greeting>Hello, world!</greeting>