5.Java Parser


  • 8/14/2019 5.Java Parser


    XML Parsers

    Parsing


    XML parsing is required so that our application can inspect, retrieve and modify the document contents.

    An XML parser is a program that sits between the XML document and our application.

    In an attempt to standardize the way parsers should work, two specifications have emerged that spell out the interfaces an application can expect from a parser:

    SAX: the Simple API for XML. SAX processes the XML document a tag at a time and generates events.

    DOM: the Document Object Model. DOM describes the document as a tree data structure. It first loads the entire XML document into memory as a tree; the application can then traverse and edit any node.


    SAX vs. DOM

    When it comes to fast, efficient reading of XML data, SAX is hard to beat. It requires little memory, because it does not construct an internal representation (tree structure) of the XML data. Instead, it simply sends data to the application as it is read; your application can then do whatever it wants with the data it sees. But you can't go back to an earlier position or leap ahead to a different position.

    In general, SAX works well when you simply want to read data and have the application act on it.

    DOM is less suitable for that kind of streaming use, since it has to read the entire document before the application can act on it. It also requires more memory.

    But when you need to modify an XML structure, especially when you need to modify it interactively, an in-memory structure like the Document Object Model (DOM) is the better fit.


    JAXP API

    The Java API for XML Processing (JAXP) is for processing XML data using applications written in the Java programming language.

    JAXP leverages the parser standards SAX (Simple API for XML) and DOM (Document Object Model), so that you can choose to parse your data as a stream of events or to build an object representation of it.

    JAXP also supports the XSLT (Extensible Stylesheet Language Transformations) standard, giving you control over the presentation of the data and enabling you to convert the data to other XML documents or to other formats, such as HTML.

    JAXP also provides namespace support, allowing you to work with DTDs that might otherwise have naming conflicts.

    JAXP ships with the standard Java SDK.
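    The XSLT support mentioned above can be sketched with the javax.xml.transform classes that ship with the JDK. This is only a minimal illustration: the class name XsltSketch, the inline stylesheet and the sample note document are assumptions made up for the example, not part of the slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical stylesheet: turns <note><to>...</to></note> into one HTML paragraph.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/note'><p>To: <xsl:value-of select='to'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the generated HTML.
    static String toHtml(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml("<note><to>you</to></note>"));
    }
}
```

    In a real application the stylesheet would normally live in its own .xsl file and be loaded with new StreamSource(file) instead of a string.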


    Steps to write an application

    1. Obtain a parser object.
    2. Obtain a source of XML data.
    3. Give that source to the parser to parse.

    JAXP itself provides only the interfaces for SAX and DOM, plus abstract classes with factory methods for obtaining instances of a parser and an XML data source.

    4 packages:

    org.xml.sax: SAX distribution
    org.xml.sax.helpers: SAX distribution
    org.w3c.dom: DOM in Java
    javax.xml.parsers: JAXP distribution
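    The three steps can be sketched as follows, here with the DOM flavour of JAXP and an in-memory string as the XML source. The class name ThreeSteps and the helper rootName are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ThreeSteps {
    // Returns the name of the root element of the given XML text.
    static String rootName(String xml) {
        try {
            // 1. Obtain a parser object from the factory
            DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // 2. Obtain a source of XML data (a file or URL would work the same way)
            InputSource source = new InputSource(new StringReader(xml));
            // 3. Give that source to the parser to parse
            Document document = parser.parse(source);
            return document.getDocumentElement().getNodeName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<note><to>you</to></note>"));
    }
}
```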


    SAX Programming model

    Not a W3C standard, but widely adopted, including by IBM and Sun.

    The standard SAX distribution for Java contains 2 packages:

    org.xml.sax
    org.xml.sax.helpers

    They contain 11 classes and interfaces.


    Classes

    Classes related to the parser:

    org.xml.sax.XMLReader is the interface that an XML parser's SAX2 driver must implement. It is an interface for reading an XML document using callbacks.

    javax.xml.parsers.SAXParser defines the API that wraps an XMLReader implementation class. An instance of this class can be obtained from the javax.xml.parsers.SAXParserFactory.newSAXParser() method.
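    To make the relationship between the two types concrete, the sketch below obtains a SAXParser from the factory, asks it for the XMLReader it wraps, and registers a handler on that reader directly. The class name XmlReaderSketch and the helper elementNames are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class XmlReaderSketch {
    // Collects element names in document order using the wrapped XMLReader.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            XMLReader reader = saxParser.getXMLReader();   // the SAX2 driver behind the parser
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    names.add(qName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<note><to>you</to><from>God</from></note>"));
    }
}
```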


    Classes related to the application that we write:

    org.xml.sax.ContentHandler: This is the main interface that most SAX applications implement. It defines the methods that the parser uses as callbacks. The parser expects an object of this type: it is registered with XMLReader.setContentHandler(), or passed to SAXParser.parse() in the form of a DefaultHandler.

    org.xml.sax.helpers.DefaultHandler is a class that implements ContentHandler. It is the default base class for SAX2 event handlers.

    Exception classes: SAXException, SAXParseException

    Helper classes: SAXParserFactory

    When the parser reaches the end of the document, the only data in memory is what your application saved.


    SAX Programming model

    [Diagram: SAXParserFactory (1) creates the SAXParser. The XML source, the optional DTD and a class implementing ContentHandler are (2) input to the SAXParser. While parsing, the SAXParser raises events by calling the handler methods (startDocument, startElement, characters, endElement, endDocument, etc.), and the handler produces the output.]


    org.xml.sax.ContentHandler

    This is the interface that declares the event-handling methods of SAX:

    void startDocument()
    void endDocument()
    void startElement(String uri, String localName, String qName, Attributes attributes)
    void endElement(String uri, String localName, String qName)
    void characters(char ch[], int start, int length)
    void processingInstruction(String target, String data)
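    A small handler that records every callback shows the order in which a parser invokes these methods. This is a sketch under stated assumptions: the class name EventLogger, the log helper and the sample document are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("processingInstruction " + target);
    }

    // Parses the given XML text and returns the recorded callbacks in order.
    static List<String> log(String xml) {
        EventLogger handler = new EventLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        log("<?app hint?><note><to>you</to></note>").forEach(System.out::println);
    }
}
```

    The document events always bracket everything else, and each startElement is matched by an endElement for the same name.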


    DefaultHandler and SAXParser

    DefaultHandler: The easiest way to implement the ContentHandler interface is to extend the DefaultHandler class, defined in the org.xml.sax.helpers package.

    SAXParserFactory, SAXParser: SAXParser is an abstract class. The static newInstance() method of SAXParserFactory returns a factory, and the factory's newSAXParser() method returns a new concrete implementation of SAXParser. newSAXParser() throws a ParserConfigurationException if it is unable to produce a parser that matches the specified configuration of options.

    Xerces parser from Apache: implements the parser and plugs into the JAXP API (org.apache.xerces.jaxp).
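    As a rough illustration of configuring the factory before asking it for a parser, the sketch below switches on namespace awareness and handles ParserConfigurationException separately from parse errors. The class name FactoryConfigSketch, the helper qualifiedRoot and the urn:example namespace are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FactoryConfigSketch {
    // Returns the root element as "{namespace-uri}local-name".
    static String qualifiedRoot(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // off by default; needed for uri/localName
        final String[] root = new String[1];
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    if (root[0] == null) {
                        root[0] = "{" + uri + "}" + localName;
                    }
                }
            });
        } catch (ParserConfigurationException e) {
            // Raised by newSAXParser() when no parser matches the requested options
            throw new IllegalStateException("no parser supports this configuration", e);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        System.out.println(qualifiedRoot("<n:note xmlns:n='urn:example'><n:to>you</n:to></n:note>"));
    }
}
```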


    // Program 1: Counting no. of elements

    import java.io.File;
    import org.xml.sax.Attributes;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.helpers.DefaultHandler;

    public class CountSax extends DefaultHandler {

        // Running count of elements seen in the current document
        private static int ele = 0;

        public static void main(String s[]) throws Exception {
            if (s.length != 1) {
                System.out.println("Usage: cmd filename");
                System.exit(0);
            }
            // Use the default (non-validating) parser
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Create a new SAXParser using the currently configured factory parameters
            SAXParser saxParser = factory.newSAXParser();
            File f = new File(s[0]);
            if (f.exists())
                // Parse the input
                saxParser.parse(f, new CountSax());
            else
                System.out.println("unknown file");
        }

        public void startDocument() { ele = 0; }

        // Called once per opening tag, so each call is one element
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            ele++;
        }

        public void endDocument() {
            System.out.println("Number of elements :" + ele);
        }
    }

    Execution:

    java CountSax note.xml

    Number of elements :4


    /* Program 2: Creating an HTML document to represent note.xml */

    import java.io.*;
    import org.xml.sax.*;
    import javax.xml.parsers.*;
    import org.xml.sax.helpers.DefaultHandler;

    public class NoteSax extends DefaultHandler {

        PrintWriter out;

        public NoteSax() throws Exception {
            out = new PrintWriter(new BufferedWriter(new FileWriter("note.html")));
        }

        public static void main(String s[]) throws Exception {
            if (s.length != 1) {
                System.out.println("Usage: cmd filename");
                System.exit(0);
            }
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            File f = new File(s[0]);
            if (f.exists())
                saxParser.parse(f, new NoteSax());
            else
                System.out.println("unknown file");
        }

        public void startDocument() {}

        // NOTE: the HTML tags inside the string literals did not survive in this
        // copy of the slides. Empty strings are kept as they appear; each "<br>"
        // below is an assumed stand-in for a lost line-break tag.
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals("note"))
                out.println("Note");
            if (qName.equals("to"))
                out.println(" To, ");
            if (qName.equals("from"))
                out.println("<br>-from ");
            if (qName.equals("body") && (attrs.getLength() > 0)) {
                for (int i = 0; i < attrs.getLength(); i++) {
                    String aName = attrs.getQName(i);
                    String value = attrs.getValue(i);
                    if (aName.equals("type")) {
                        if (value.equals("warm"))
                            out.println("");
                        if (value.equals("cold"))
                            out.println("");
                        if (value.equals("formal"))
                            out.println("");
                    }
                    if (aName.equals("subject"))
                        out.println("" + value + ":");
                } // end of for
            } // end of if
        }

        // endElement takes three arguments; an extra Attributes parameter would
        // create an overload that the parser never calls
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("body"))
                out.println("");
            if (qName.equals("from"))
                out.println("<br>");
        }

        public void endDocument() {
            out.println("");
            out.close();
        }

        public void characters(char buf[], int offs, int l) throws SAXException {
            String s = new String(buf, offs, l);
            out.println(s + "<br>");
        }
    }


    note.xml:

    <note>
      <to>you</to>
      <body subject="Contemplation">If today was a perfect day then there would be no tomorrow</body>
      <from>God</from>
    </note>

    Execution:

    java NoteSax note.xml

    creates note.html


    note.html:

    Note

    To,
    you
    Contemplation:
    If today was a perfect day then there would be no tomorrow

    -from
    God
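    Program 2 prints whatever each characters() call delivers. The SAX specification allows a parser to hand over one run of text in several chunks, so a more defensive handler buffers the text and uses it only when the element ends. The sketch below shows that pattern for leaf elements; the class name TextCollector and the collect helper are hypothetical, and mixed content is deliberately not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    final Map<String, String> textByElement = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0);                   // start a fresh buffer for each element
    }

    @Override
    public void characters(char[] buf, int offset, int length) {
        text.append(buf, offset, length);    // may be called more than once per text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            textByElement.put(qName, value); // record only elements that held text
        }
        text.setLength(0);
    }

    // Parses the given XML text and returns element name -> text for leaf elements.
    static Map<String, String> collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.textByElement;
    }

    public static void main(String[] args) {
        System.out.println(collect("<note><to>you</to><from>God</from></note>"));
    }
}
```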


    DOM

    Document Object Model. It is a standard produced by the W3C.

    All DOM processing assumes that you have read and parsed a complete document into memory, so that all parts are equally accessible. The data is represented in the form of a tree.

    Disadvantages

    1. It is pretty clumsy if you only want to pick out a few elements.
    2. The memory requirement can become restrictive for large documents.


    org.w3c.dom package

    Interfaces:

    Node

    Document (extends Node): The Document interface represents the entire HTML or XML document.

    NodeList: provides the abstraction of an ordered collection of nodes.

    The Node interface defines static constants for checking the node type returned by getNodeType(), such as Node.ELEMENT_NODE and Node.CDATA_SECTION_NODE.
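    A minimal sketch of checking node types against the Node constants is shown below. The class name NodeTypeSketch and its helpers are assumptions for illustration; the sample document simply contains one child of each kind.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeTypeSketch {
    // Maps the short type code from getNodeType() to a readable label.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:       return "ELEMENT";
            case Node.TEXT_NODE:          return "TEXT";
            case Node.CDATA_SECTION_NODE: return "CDATA_SECTION";
            case Node.COMMENT_NODE:       return "COMMENT";
            default:                      return "OTHER";
        }
    }

    // Lists the types of the root element's direct children, in document order.
    static List<String> childTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> types = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                types.add(typeName(children.item(i)));
            }
            return types;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childTypes("<a><!--c--><b/>t<![CDATA[x]]></a>"));
    }
}
```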


    Methods

    Document methods:

    public NodeList getElementsByTagName(String tagname)
    public Element createElement(String tagName) throws DOMException
    public Comment createComment(String data)
    public Text createTextNode(String data)

    NodeList methods:

    public int getLength()
    public Node item(int index)
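    The lookup and iteration methods listed above combine as in the following sketch, which walks the NodeList returned by getElementsByTagName with getLength() and item(). The class name TagLookupSketch, the helper recipients and the sample document are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TagLookupSketch {
    // Returns the text of every <to> element, in document order.
    static List<String> recipients(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName("to");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                result.add(matches.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(recipients(
            "<notes><note><to>you</to></note><note><to>me</to></note></notes>"));
    }
}
```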


    Node methods:

    Methods to access information about the current node:

    public String getNodeName()
    public short getNodeType()
    public NodeList getChildNodes()

    Methods to modify the node's children:

    public Node appendChild(Node newChild) throws DOMException
    public Node removeChild(Node oldChild) throws DOMException
    public Node replaceChild(Node newChild, Node oldChild) throws DOMException
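    The child-modifying methods can be tried on a document built from scratch, as in this sketch. The class name ChildEditSketch and the element names are assumptions for illustration; note that replaceChild takes the new node first and the node being replaced second.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ChildEditSketch {
    // Builds a small tree, edits it, and returns the names of the root's children.
    static List<String> editedChildren() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM parser available", e);
        }
        Element note = doc.createElement("note");
        doc.appendChild(note);

        Element to = doc.createElement("to");
        to.appendChild(doc.createTextNode("you"));
        note.appendChild(to);                    // children: to

        Comment draft = doc.createComment("draft");
        note.appendChild(draft);                 // children: to, comment

        Element from = doc.createElement("from");
        note.replaceChild(from, draft);          // children: to, from
        note.removeChild(to);                    // children: from

        List<String> names = new ArrayList<>();
        NodeList children = note.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(editedChildren());
    }
}
```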


    DOM Programming model

    [Diagram: DocumentBuilderFactory (1) creates the DocumentBuilder. The XML source and the optional DTD are (2) input to the DocumentBuilder, which (3) parses them and builds the tree as a Document (DOM) made of Nodes. A search mechanism then recursively searches the nodes to produce the output.]


    // Program 1: counting no. of elements

    import org.w3c.dom.*;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.DocumentBuilder;
    import java.io.*;

    public class CountDom {

        public static void main(String str[]) throws Exception {
            File f = new File(str[0]);
            Node n = readFile(f);
            int ele = getElementCount(n);
            System.out.println(ele);
        }

        // Parse the whole file into an in-memory DOM tree
        public static Document readFile(File f) throws Exception {
            Document d;
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            d = db.parse(f);
            return d;
        }

        // Count this node (if it is an element) plus all elements below it
        public static int getElementCount(Node node) {
            if (node == null)
                return 0;
            int sum = 0;
            boolean isElement = (node.getNodeType() == Node.ELEMENT_NODE);
            if (isElement)
                sum = 1;
            NodeList children = node.getChildNodes();
            if (children == null)
                return sum;
            // The slide text stops at "for(int i=0;i"; the rest of this loop
            // is the straightforward recursive completion
            for (int i = 0; i < children.getLength(); i++)
                sum += getElementCount(children.item(i));
            return sum;
        }
    }


    // Program 2: Adding a comment and a node and displaying

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.DocumentBuilder;
    import java.io.*;
    import org.w3c.dom.*;

    public class AddNodeDom {

        static Node n1;
        static Comment c;

        public static void main(String str[]) throws Exception {
            File f = new File(str[0]);
            Document n = readFile(f);
            // c is never assigned in the slides as shown, which would make
            // appendChild(c) fail; the comment text here is a placeholder
            c = n.createComment("placeholder comment");
            setElements(n);
            display(n);
            System.out.println("done");
        }

        public static Document readFile(File f) throws Exception {
            Document d;
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            d = db.parse(f);
            return d;
        }

        // Print element names and the text of text and comment nodes, depth first
        public static void display(Node node) {
            if (node.getNodeType() == Node.ELEMENT_NODE)
                System.out.print(node.getNodeName() + ":");
            if (node.getNodeType() == Node.TEXT_NODE || node.getNodeType() == Node.COMMENT_NODE)
                System.out.println(node.getNodeValue().trim());
            NodeList children = node.getChildNodes();
            if (children != null)
                // Loop bodies after "for(int i=0;i" are cut off in the slides;
                // recursing into each child is the evident completion
                for (int i = 0; i < children.getLength(); i++)
                    display(children.item(i));
        }

        // Remember the display-name element, then move it (with the comment)
        // under the servlet element
        public static void setElements(Node node) {
            if (node == null) return;
            boolean isEle = (node.getNodeType() == Node.ELEMENT_NODE);
            if (isEle && node.getNodeName().equals("display-name"))
                n1 = node;
            if (isEle && node.getNodeName().equals("servlet")) {
                node.appendChild(c);
                if (n1 != null)  // guard: no display-name seen yet
                    node.appendChild(n1);
            }
            NodeList children = node.getChildNodes();
            if (children != null)
                for (int i = 0; i < children.getLength(); i++)
                    setElements(children.item(i));
        }
    }