April 18, 2023 Copyright © 2004 James A Farley
Distributed and Enterprise Computing
Lecture 2: XML Basics
February 10, 2004
Agenda
° Announcements
° Unit overview
  ° …and tackling the assignment
° XML
  ° Basics of XML (documents, entities, DTDs, Schemas)
  ° Parsing modes: SAX and DOM
  ° Java parsers
  ° XML-based client/server
Announcements
° Reminders:
  ° Assignment 0 due Feb 16
  ° Assignment 1 due Feb 27
Unit 1: Fundamentals and Tools of the Trade
Overview of the Unit
Deconstructing the Assignment
[Diagram: assignment components: file structures, file I/O routines, XML parse/generate code, internal data structures, web components (Servlet/JSP), user security logic]
One possible scenario
° Week 1: XML
  ° Build basic I/O and XML utils
° Week 2: Web components
  ° Build a subset of the UI components
° Week 3: Web components extended
  ° Integrate user security
  ° Finish the UI
[Diagram: assignment components: web components (Servlet/JSP), user security logic, file I/O routines, XML parse/generate code, internal data structures, file structures]
Unit 1 : Fundamentals and Tools of the Trade
eXtensible Markup Language (XML)
Why are we starting with XML?
° It’s a technological Swiss Army knife
  ° Great to keep in the toolbox
° It’s a basic element at every level of enterprise systems
  ° Data representations
  ° Inter-process communications (RPC, SOAP, etc.)
  ° User-interface representations
° Good entrée to the overall J2EE environment
  ° APIs and SPIs
  ° Standards vs. tool particulars
eXtensible Markup Language (XML)
° A data protocol language
  ° Defined and controlled by the W3C
  ° A means for defining data protocols
  ° Document structure defined by Document Type Definitions (DTDs) or XML Schemas (newer, richer format)
  ° XML documents can be validated against these structure rules
° Roots in Standard Generalized Markup Language (SGML)
  ° SGML: “Represent content only, separate from display.”
  ° HTML: “Simple content, simple display info.”
  ° XML: “Separate content (again), keep it simple but extensible.”
XML Basics
° XML documents are indivisible units of data
  ° Data is held and delivered in this form, nothing smaller (well, usually)
° Documents are composed of elements
  ° Elements contain:
    ° Other elements (hierarchical)
    ° “Markup data”
    ° Character data or unparsed data
° Markup includes anything not part of the data itself
  ° Element start and end tags, any element attributes
° Character data is written directly into elements, with markup characters escaped as entity references (e.g., &lt;, &amp;), or delimited by CDATA sections
  ° CDATA sections are handy when the characters could be confused with XML markup
  ° Different character encodings (UTF-8, international character sets, etc.) can be used
° Unparsed data are things like external entities referenced by URIs
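To make the escaping-versus-CDATA point concrete, here is a minimal sketch using the JAXP DOM parser that ships with current JDKs (not the JDK 1.4 setup these slides assume). The `<expr>` element, its text, and the `CharacterDataDemo` class are invented for illustration; the point is that both spellings parse to identical character data.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CharacterDataDemo {
    /** Parses a one-element document and returns its root element's character data. */
    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates text and CDATA-section children alike
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Markup characters escaped with the predefined entity references ...
        String escaped = "<expr>if (a &lt; b &amp;&amp; c) { ... }</expr>";
        // ... or left raw inside a CDATA section
        String cdata = "<expr><![CDATA[if (a < b && c) { ... }]]></expr>";
        System.out.println(textOf(escaped)); // if (a < b && c) { ... }
        System.out.println(textOf(cdata));   // if (a < b && c) { ... }
    }
}
```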
XML Example
[Callouts on the slide mark an element, an attribute, character data, and a child element in this example.]
<?xml version='1.0' encoding='us-ascii'?>
<purchase-order>
  <account-id>127-0045-1496-01</account-id>
  <line-item idx="1">
    <prod-desc>Apple PowerBook G4 17"</prod-desc>
    <prod-code>APP-987-00856-3</prod-code>
    <units>1</units>
    <price-quote>2999.99</price-quote>
  </line-item>
  <line-item idx="3">
    <prod-desc>Free catalog</prod-desc>
    <prod-code>INT-0001</prod-code>
  </line-item>
  <shipped status="yes"/>
</purchase-order>
Defining XML Protocols
° Previous example is well-formed
  ° All the syntax is correct, tags ended properly, etc.
  ° Can be parsed cleanly by a compliant non-validating XML parser
° But no rules about the structure have been given
  ° What sub-elements are required/optional?
  ° What contexts can elements be used in?
  ° What attributes are appropriate for the element types?
° In order to validate the XML data, we need a definition of the expected structure (aka, the protocol)
  ° Document type definition (DTD) or XML Schema
  ° Two standards for defining XML document structure
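A quick way to check well-formedness from Java is to run a non-validating parse and treat any SAXException as a syntax error. The sketch below assumes the JDK's built-in JAXP parser; the `WellFormedCheck` class and the sample strings are hypothetical. The second sample wraps an attribute value in curly quotes, a common copy-paste error that no conforming parser accepts.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** True if a non-validating parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // the default; spelled out for clarity
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler's fatalError() rethrows without printing to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // syntax error: mismatched tags, bad quotes, etc.
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<shipped status=\"yes\"/>"));          // true
        System.out.println(isWellFormed("<shipped status=\u201Cyes\u201D/>")); // false: curly quotes
        System.out.println(isWellFormed("<a><b></a>"));                        // false: mismatched tags
    }
}
```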
Document Type Definitions (DTDs)
° A definition of the elements that can exist in a particular class of XML documents
  ° The elements can be declared to contain no data, any data, or specific data
  ° Elements are declared with the “<!ELEMENT . . .>” declaration
  ° Attributes are declared using the “<!ATTLIST…>” declaration

<!ELEMENT br EMPTY >
<!ELEMENT container ANY >
<!ELEMENT p (#PCDATA) >
<!ELEMENT pos (x, y, z?) >
<!ELEMENT view (front|back|top) >
<!ATTLIST line-item idx ID #REQUIRED >
DTD for our example
<!-- DTD for purchase orders -->
<!-- Orders are the root of the object hierarchy. -->
<!ELEMENT purchase-order (account-id, line-item+, shipped) >
<!ELEMENT account-id (#PCDATA) >
<!ELEMENT line-item (prod-desc, prod-code, units?, price-quote?) >
<!ELEMENT prod-desc (#PCDATA) >
<!ELEMENT prod-code (#PCDATA) >
<!ELEMENT units (#PCDATA) >
<!ELEMENT price-quote (#PCDATA) >
<!ELEMENT shipped EMPTY >
<!ATTLIST line-item idx CDATA #REQUIRED >
<!ATTLIST shipped status (yes|no) #REQUIRED >
Referencing a DTD
<?xml version='1.0' encoding='us-ascii'?>
<!DOCTYPE purchase-order SYSTEM "http://my.server.com/po-data.dtd">
<purchase-order>
  . . .
</purchase-order>
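Validation against a DTD can be sketched with a validating JAXP parser. To keep the example self-contained, this version embeds the purchase-order DTD as an internal subset instead of fetching it by URL, and declares `idx` as CDATA: an attribute of type ID must hold an XML name, and names cannot start with a digit, so `idx="1"` would fail under ID. The `DtdValidation` class and the sample documents are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidation {
    // Purchase-order DTD, embedded as an internal subset so no network fetch is needed
    static final String DOCTYPE =
          "<!DOCTYPE purchase-order [\n"
        + "<!ELEMENT purchase-order (account-id, line-item+, shipped)>\n"
        + "<!ELEMENT account-id (#PCDATA)>\n"
        + "<!ELEMENT line-item (prod-desc, prod-code, units?, price-quote?)>\n"
        + "<!ELEMENT prod-desc (#PCDATA)>\n"
        + "<!ELEMENT prod-code (#PCDATA)>\n"
        + "<!ELEMENT units (#PCDATA)>\n"
        + "<!ELEMENT price-quote (#PCDATA)>\n"
        + "<!ELEMENT shipped EMPTY>\n"
        + "<!ATTLIST line-item idx CDATA #REQUIRED>\n"
        + "<!ATTLIST shipped status (yes|no) #REQUIRED>\n"
        + "]>\n";

    /** True if the document body is valid against the purchase-order DTD. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity errors are non-fatal by default; make them fail the parse
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String order = "<purchase-order><account-id>127-0045-1496-01</account-id>"
            + "<line-item idx=\"3\"><prod-desc>Free catalog</prod-desc><prod-code>INT-0001</prod-code></line-item>"
            + "<shipped status=\"yes\"/></purchase-order>";
        System.out.println(isValid(order));                                // true
        System.out.println(isValid(order.replace("<shipped status=\"yes\"/>", ""))); // false: no <shipped>
    }
}
```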
XML Schema
° Much richer way to describe classes of XML documents
  ° More complex structure possible
  ° Includes data type specifications
° But it’ll cost you
  ° More complicated to author XML Schemas
    ° Analogy to RDBMS schemas isn’t by accident
  ° Schema-validating parsers will typically be more heavyweight
° Currently a W3C Recommendation (1.0 version)
  ° Highest level in the W3C track
  ° Reached formal status May 2001
  ° Tool support has been growing steadily
    ° Xerces 2.x, oXygen, XMLSpy, MSXML, among others
    ° Support is being accelerated by SOAP usage, to some degree
XML Schema example
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- Element used to hold data about the content service. -->
  <xs:element name="content-service">
    <xs:complexType>
      <!-- Document can either contain a set of users, subs and content,
           or an error element -->
      <xs:choice>
        <xs:sequence>
          <xs:element name="user" type="userType"
                      minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="subscription" type="subscriptionType"
                      minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="content-set" type="content-setType"/>
        </xs:sequence>
        <xs:element name="error" maxOccurs="unbounded">
          . . .
        </xs:element>
      </xs:choice>
    </xs:complexType>
    . . .
XML Schema example (cont)
. . .
<!-- Complex type used for articles -->
<xs:complexType name="articleType">
  <xs:sequence>
    <xs:element ref="article-id"/>
    <xs:element ref="title"/>
    <xs:element ref="by-line" minOccurs="0"/>
    <xs:element ref="abstract" minOccurs="0"/>
    <xs:element ref="pub-date"/>
    <xs:element ref="content-type"/>
    <xs:element ref="content-url" minOccurs="0"/>
    <xs:element ref="content" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
<!-- Basic elements supporting the article type -->
<xs:element name="abstract" type="xs:string"/>
. . .
<xs:element name="content-url" type="xs:anyURI"/>
<xs:element name="pub-date" type="xs:string"/>
. . .
Referencing a Schema
<?xml version="1.0" encoding="UTF-8"?>
<!-- Specify that our document follows the schema defined on the course site -->
<content-service
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://courses.dce.harvard.edu/…/content.xsd">
  <user>
    <account-id>1001</account-id>
    <name>John Smith</name>
  </user>
  . . .
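Schema validation in Java is usually done with the `javax.xml.validation` API, which arrived with JAXP 1.3 (Java 5), after these slides were written. The sketch below uses a small, hypothetical schema for a single `<user>` element rather than the course's content.xsd; declaring `account-id` as `xs:positiveInteger` is an assumption made to show data-type checking, something a DTD cannot express. The `SchemaValidation` class name is also hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaValidation {
    // Hypothetical schema for one <user>, loosely modeled on the content-service example
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='user'><xs:complexType><xs:sequence>"
        + "<xs:element name='account-id' type='xs:positiveInteger'/>"
        + "<xs:element name='name' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compile the schema once; a Schema object is thread-safe and reusable
    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance document satisfies the schema, including data types. */
    static boolean isValid(String xml) {
        try {
            Validator validator = SCHEMA.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<user><account-id>1001</account-id><name>John Smith</name></user>")); // true
        System.out.println(isValid("<user><account-id>abc</account-id><name>John Smith</name></user>"));  // false
    }
}
```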
Parsing: Simple API for XML (SAX)
° Implement org.xml.sax.HandlerBase (SAX1; in SAX2, extend org.xml.sax.helpers.DefaultHandler instead)
° Set parser’s doc handler
° Run parser
° Event-related callbacks are called during parse
° Less direct, more implicit
° More options for optimization
[Diagram: the parser fires events to the handler in parse order: content-service, user, account-id, CDATA, name, CDATA]
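The parse order in the diagram can be reproduced with a SAX2 handler that simply logs each callback. This is a minimal sketch, assuming the JDK's bundled parser obtained through JAXP; `SaxEventLog` is a hypothetical name, and it extends `DefaultHandler`, the SAX2 replacement for `HandlerBase`. Character data is buffered because `characters()` may deliver one run of text in several chunks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text run
    }

    /** Records buffered non-blank text as a single event, then clears the buffer. */
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) events.add("text " + s);
        text.setLength(0);
    }

    static List<String> parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            SaxEventLog handler = new SaxEventLog();
            reader.setContentHandler(handler); // receives the parse events
            reader.setErrorHandler(handler);   // DefaultHandler also implements ErrorHandler
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<content-service><user><account-id>1001</account-id>"
                   + "<name>John Smith</name></user></content-service>";
        parse(xml).forEach(System.out::println);
    }
}
```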
Parsing: Document Object Model (DOM)
° Create a parser, call its parse() operation
  ° Returns an org.w3c.dom.Document
° Browse the nodes in the document, add, edit, delete
° Simpler to implement
° More costly (creation of Node objects, etc.)
[Diagram: the parser builds a DOM tree: content-service > user > account-id (CDATA) and name (CDATA)]
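Browsing and modifying a DOM tree might look like the following sketch, which assumes the JDK's JAXP DOM parser and the content-service fragment from the diagram. `DomWalk`, its methods, and the added `<email>` element are hypothetical, chosen only to show one add, one edit and one delete.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomWalk {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Indented outline of element names and (quoted) non-blank text nodes. */
    static String outlineOf(Document doc) {
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(out, depth).append(node.getNodeName()).append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) indent(out, depth).append('"').append(text).append("\"\n");
        }
        // Recurse into children in document order
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    private static StringBuilder indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) out.append("  ");
        return out;
    }

    /** Edits, deletes and adds nodes in place. */
    static void edit(Document doc) {
        // Edit: replace the text of the first <name>
        doc.getElementsByTagName("name").item(0).setTextContent("Jane Smith");
        // Delete: detach the first <account-id> from its parent
        Node acct = doc.getElementsByTagName("account-id").item(0);
        acct.getParentNode().removeChild(acct);
        // Add: append a new (hypothetical) <email> child to the first <user>
        Element email = doc.createElement("email");
        email.setTextContent("jane@example.com");
        doc.getElementsByTagName("user").item(0).appendChild(email);
    }

    public static void main(String[] args) {
        Document doc = parse("<content-service><user><account-id>1001</account-id>"
                + "<name>John Smith</name></user></content-service>");
        System.out.print(outlineOf(doc));
        edit(doc);
        System.out.print(outlineOf(doc));
    }
}
```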
Java and XML
° Easy enough to write XML documents, but how do we access/create them from Java?
° Need a Java API to use SAX/DOM parsers
  ° SAX: Event-driven parsing; entities encountered in the document fire specific event handlers
    ° Faster runtime, but generally more work to implement
  ° DOM: Parse all entities into a tree hierarchy, then the app can walk the tree, extract/convert/alter data
    ° Not as efficient, but very simple to implement
° Many Java parsers available
  ° Apache Xerces (the standard one for our course)
  ° JDOM (independent effort, “Java-centric” document API layered on top of a parser)
  ° Crimson from Sun (bundled with JAXP and JDK 1.4 as “reference impl”)
Parsing APIs : SAX and DOM
° SAX:
  ° Representations for parsing process, not document elements
  ° Parse engine represented by org.xml.sax.Parser (SAX1) or org.xml.sax.XMLReader (SAX2)
  ° Handlers registered by app to receive parse events
    ° ContentHandler, ErrorHandler
° DOM:
  ° Representations for document elements, not parsing process
  ° org.w3c.dom.Document contains hierarchy of Nodes
Parsing APIs : Xerces
° Provides concrete implementation of SAX and DOM
  ° Concrete SAX Parser/XMLReader: org.apache.xerces.parsers.SAXParser
  ° Concrete DOM Document and Node: org.apache.xerces.dom.DocumentImpl, NodeImpl
° Provides non-standard, “native” APIs:

import java.io.StringReader;
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Xerces-specific: parse the message text held in body
DOMParser parser = new DOMParser();
parser.parse(new InputSource(new StringReader(body)));
Document doc = parser.getDocument();

// Standard DOM APIs from here on
// Get the first (and only) "id" element
Node idNode = doc.getElementsByTagName("id").item(0);
String msgId = idNode.getFirstChild().getNodeValue();
// Get the first (and only) "body" element
Node bodyNode = doc.getElementsByTagName("body").item(0);
String msgBody = bodyNode.getFirstChild().getNodeValue();
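The same extraction can be written against standard APIs only, so the code no longer depends on Xerces classes. This is a sketch under the assumption that the JAXP parser bundled with a current JDK is acceptable; `PortableMessageParser` and `parseMessage` are hypothetical names. It also swaps `getFirstChild().getNodeValue()` for `getTextContent()`, which copes with empty elements such as `<body/>`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PortableMessageParser {
    /** Returns {id, body} for a <message> document, using only JAXP and standard DOM. */
    static String[] parseMessage(String body) {
        try {
            // JAXP replaces the Xerces-specific DOMParser/getDocument() pair
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(body)));
            // getTextContent() (DOM Level 3) yields "" for an empty element, where
            // getFirstChild().getNodeValue() would throw a NullPointerException
            String msgId = doc.getElementsByTagName("id").item(0).getTextContent();
            String msgBody = doc.getElementsByTagName("body").item(0).getTextContent();
            return new String[] { msgId, msgBody };
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String[] msg = parseMessage("<message><id>greeting</id><body>Hi there</body></message>");
        System.out.println(msg[0] + " / " + msg[1]); // greeting / Hi there
    }
}
```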
Parsing APIs : JAXP
° Java API for XML Parsing (JSR 000005)
  ° Specification 1.0 released Mar 2000, 1.2 in Aug 2002
  ° Actually covers both parsing and transforming XML
  ° Parsing package very simple
    ° Standardizes the initialization stage
    ° Uses SAX and DOM APIs for parsing handlers and document representations
° Other parsers implement the JAXP API
  ° Pluggability layer (system properties such as javax.xml.parsers.DocumentBuilderFactory) allows you to specify which to use
  ° JAXP implementations provided by Xerces 1.4/2.x, Crimson
° JAXP-compliant parser required by J2EE 1.3 spec.
XML in Action
° Simple message format
  ° Messages are delivered with the following structure:
NOTE: We’ve omitted the XML header of this document
<message>
  <id>greeting</id>
  <body>Hi there</body>
</message>
Parse Using DOM
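import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// cin is the client connection's input stream; WrappedInputStream is a
// helper class (not shown) that wraps it
InputStream clientIn = new WrappedInputStream(cin);
// Parse the client request into a Document
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document requestData = docBuilder.parse(clientIn);
// Get the message node, and then its children (id and body)
Element msgNode = (Element) requestData.getElementsByTagName("message").item(0);
Element idNode = (Element) msgNode.getElementsByTagName("id").item(0);
Element bodyNode = (Element) msgNode.getElementsByTagName("body").item(0);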
Parse Using SAX
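° Looks simpler, but all the callbacks are in your event handler

import java.io.FileInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
// MySAXContentHandler: your own subclass of org.xml.sax.helpers.DefaultHandler
parser.parse(new FileInputStream(. . .), new MySAXContentHandler());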
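The slide leaves `MySAXContentHandler` to the reader. One possible shape for it, handling the `<message>` format, is sketched here under the assumption that the JDK's JAXP SAX parser is used; the `getId()`/`getBody()` accessors and the static `parse()` helper are hypothetical additions, not part of any standard API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MySAXContentHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String id;
    private String body;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // begin collecting this element's character data
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("id".equals(qName)) {
            id = text.toString();
        } else if ("body".equals(qName)) {
            body = text.toString();
        }
    }

    String getId() { return id; }
    String getBody() { return body; }

    /** Runs the parse from the slide and returns the populated handler. */
    static MySAXContentHandler parse(InputStream in) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            MySAXContentHandler handler = new MySAXContentHandler();
            parser.parse(in, handler);
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<message><id>greeting</id><body>Hi there</body></message>"
                .getBytes(StandardCharsets.UTF_8);
        MySAXContentHandler h = parse(new ByteArrayInputStream(xml));
        System.out.println(h.getId() + ": " + h.getBody()); // greeting: Hi there
    }
}
```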
Generate XML Using DOM
// Create a new DOM document, using JAXP calls
DocumentBuilder docBuilder =
    DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document msgDoc = docBuilder.newDocument();
// Create the top-most message element
Node msgNode = msgDoc.createElement("message");
// Add the id child
Node tmp = msgNode.appendChild(msgDoc.createElement("id"));
tmp.appendChild(msgDoc.createTextNode(msgId));
// Add the body to the message
tmp = msgNode.appendChild(msgDoc.createElement("body"));
tmp.appendChild(msgDoc.createTextNode(msgBody));
// Append the top-most element to the document
msgDoc.appendChild(msgNode);
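Building the DOM is only half the job; to send the message it must also be serialized. JAXP's transformation API can do this with an identity `Transformer`. The sketch below assumes the JDK's built-in transformer; `MessageWriter` and its methods are hypothetical names. Note that the serializer escapes markup characters in text nodes, so a body such as `a < b` is written as `a &lt; b`.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MessageWriter {
    /** Builds the <message> document following the slide's steps. */
    static Document build(String msgId, String msgBody) {
        try {
            DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document msgDoc = docBuilder.newDocument();
            Element msgNode = msgDoc.createElement("message");
            Element id = msgDoc.createElement("id");
            id.appendChild(msgDoc.createTextNode(msgId));
            msgNode.appendChild(id);
            Element body = msgDoc.createElement("body");
            body.appendChild(msgDoc.createTextNode(msgBody));
            msgNode.appendChild(body);
            msgDoc.appendChild(msgNode);
            return msgDoc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes a DOM document to a string via an identity transform. */
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build("greeting", "Hi there")));
    }
}
```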
What has XML provided?
° Extensible message/data format
  ° Using a standard meta-language
° Support for non-Java peers
  ° No serialized objects or Java-specific protocols
  ° Anything with an XML parser can play with us
° Some limitations
  ° Non-text messages problematic at best
    ° References to external data (e.g., URLs), or encoded binary data
    ° We’ll return to this when we discuss SOAP services
  ° Inefficient encoding protocol: lots of overhead
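Embedding binary content as Base64 text is a common workaround for the binary-data limitation, and it also illustrates the overhead point: Base64 spends 4 characters for every 3 bytes. A minimal sketch, assuming Java 8's `java.util.Base64` (not available in the JDK 1.4 era of these slides) and a hypothetical `<attachment>` element; `BinaryInXml` is likewise a made-up name.

```java
import java.util.Arrays;
import java.util.Base64;

class BinaryInXml {
    /** Wraps raw bytes as Base64 text in a hypothetical <attachment> element. */
    static String toElement(byte[] data) {
        // The Base64 alphabet (A-Z, a-z, 0-9, '+', '/', '=') needs no XML escaping
        String encoded = Base64.getEncoder().encodeToString(data);
        return "<attachment encoding=\"base64\">" + encoded + "</attachment>";
    }

    /** Recovers the original bytes from the element's character data. */
    static byte[] fromText(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] data = new byte[300];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        String element = toElement(data);
        // Character data between the start tag's '>' and the end tag's '<'
        String text = element.substring(element.indexOf('>') + 1, element.lastIndexOf('<'));
        System.out.println(text.length());                       // 400: 4 chars per 3 bytes
        System.out.println(Arrays.equals(fromText(text), data)); // true
    }
}
```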
J2EE and XML
° Only loosely tied into the specification
  ° Application deployment
    ° All config data provided as XML
    ° Application- and component-level
  ° J2EE 1.3: A JAXP 1.1 implementation is guaranteed
  ° XML is the basis for web service protocols
° Other practical uses across the board, though
  ° Use XML for JMS message content
    ° Some JMS providers have extensions for this
  ° Generate/consume XML from JSPs or servlets
  ° Use XML data sources behind EJBs
    ° More on this later, practical use is somewhat limited
XML Bottom-line
° Remember its roots: extensible data protocols
  ° Great for data interchange, extensible data formats
° Lots of tool support
  ° Same idea as Excel and tab-delimited data feeds
° Not a cure-all
  ° Not all enterprise systems have native XML support
  ° Others only support proprietary DTDs/Schemas/envelope protocols
  ° Laws of physics still prevail: interfaces need to be developed
° Be careful with overhead
  ° Parsing/validating XML isn’t cheap
  ° Protocol itself has built-in bandwidth overhead
Further Reading
° XML Basics
  ° http://www.oreilly.com/catalog/learnxml2/chapter/ch02.pdf
° XML Schema
  ° Tutorial: http://www.xfront.com/xml-schema.html
° XSLT: XML stylesheets and transformations
  ° XSLT tutorial on ZVON: http://www.zvon.org/xxl/XSLTutorial/Output/index.html
  ° Xalan docs and sample apps: http://xml.apache.org/xalan-j/index.html