March 27, 2022 Copyright © 2004 James A Farley Distributed and Enterprise Computing Lecture 2 : XML Basics February 10, 2004


April 18, 2023 Copyright © 2004 James A Farley

Distributed and Enterprise Computing

Lecture 2 : XML Basics

February 10, 2004


Agenda

° Announcements
° Unit overview
  ° …and tackling the assignment
° XML
  ° Basics of XML (documents, entities, DTDs, Schemas)
  ° Parsing modes: SAX and DOM
  ° Java parsers
  ° XML-based client/server


Announcements

° Reminders:
  ° Assignment 0 due Feb 16
  ° Assignment 1 due Feb 27


Unit 1: Fundamentals and Tools of the Trade

Overview of the Unit


Enterprise Applications


Scope for Unit 1


Deconstructing the Assignment

[Diagram: components of the assignment]
° Web components (Servlet/JSP)
° User security logic
° Internal data structures
° XML parse/generate code
° File I/O routines
° File structures


One possible scenario

° Week 1: XML
  ° Build basic I/O and XML utils
° Week 2: Web components
  ° Build a subset of the UI components
° Week 3: Web components extended
  ° Integrate user security
  ° Finish the UI

[Diagram: assignment components: web components (Servlet/JSP), user security logic, file I/O routines, XML parse/generate code, internal data structures, file structures]


Unit 1 : Fundamentals and Tools of the Trade

eXtensible Markup Language (XML)


Why are we starting with XML?

° It’s a technological Swiss Army knife
  ° Great to keep in the toolbox
° It’s a basic element at every level of enterprise systems
  ° Data representations
  ° Inter-process communications (RPC, SOAP, etc.)
  ° User-interface representations
° Good entrée to the overall J2EE environment
  ° APIs and SPIs
  ° Standards vs. tool particulars


eXtensible Markup Language (XML)

° A data protocol language
  ° Defined and controlled by W3C
  ° Means for defining data protocols
  ° Document structure defined by Document Type Definitions (DTDs) or XML Schemas (newer, richer format)
  ° XML documents are validated against these structure rules
° Roots in Standard Generalized Markup Language (SGML)
  ° SGML: “Represent content only, separate from display.”
  ° HTML: “Simple content, simple display info.”
  ° XML: “Separate content (again), keep it simple but extensible.”


XML Basics

° XML documents are indivisible units of data
  ° Data is held and delivered in this form, nothing smaller (well, usually)
° Documents are composed of elements
  ° Elements contain:
    ° Other elements (hierarchical)
    ° “Markup data”
    ° Character data or unparsed data
° Markup includes anything not part of the data itself
  ° Element start and end tags, any element attributes
° Character data is inserted raw into the entities, or delimited by CDATA sections
  ° E.g., if the characters could be confused as XML markup
  ° Different types of character data (UTF-8, international character sets, etc.) can be used
° Unparsed data are things like external entities referenced by URIs
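The terms on this slide can be seen in a few lines of Java. This is a minimal sketch, assuming a made-up `note` document (the element and attribute names are hypothetical, not from the course material): it parses one element with an attribute, a child element, and a CDATA section whose characters would otherwise be read as markup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class XmlBasicsSketch {
    // Hypothetical document: root element with an attribute, a child element,
    // and a CDATA section protecting characters that look like markup.
    static final String XML =
        "<note lang=\"en\"><title>Tags</title>"
        + "<text><![CDATA[Use <b> & </b> for bold]]></text></note>";

    // Parses the string and returns the root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(XML);
        System.out.println("element:   " + root.getTagName());
        System.out.println("attribute: " + root.getAttribute("lang"));
        // The CDATA content comes back as plain character data.
        System.out.println("data:      "
            + root.getElementsByTagName("text").item(0).getTextContent());
    }
}
```

Without the CDATA delimiters, the `<b>` and `&` characters would have to be escaped as `&lt;b&gt;` and `&amp;`.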


XML Example

[Slide callouts on the example: Element, Attribute, Data, Child element]

<?xml version='1.0' encoding='us-ascii'?>
<purchase-order>
  <account-id>127-0045-1496-01</account-id>
  <line-item idx="1">
    <prod-desc>Apple PowerBook G4 17"</prod-desc>
    <prod-code>APP-987-00856-3</prod-code>
    <units>1</units>
    <price-quote>2999.99</price-quote>
  </line-item>
  <line-item idx="3">
    <prod-desc>Free catalog</prod-desc>
    <prod-code>INT-0001</prod-code>
  </line-item>
  <shipped status="yes"/>
</purchase-order>


Defining XML Protocols

° Previous example is well-formed
  ° All the syntax is correct, tags ended properly, etc.
  ° Can be parsed cleanly by a compliant non-validating XML parser
° But no rules about the structure have been given
  ° What sub-elements are required/optional?
  ° What contexts can elements be used in?
  ° What attributes are appropriate for the element types?
° In order to validate the XML data, we need a definition of the expected structure (aka, the protocol)
  ° Document type definition (DTD) or XML Schema
  ° Two standards for defining XML document structure
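To make the distinction concrete, here is a small sketch of a well-formedness check with a non-validating JAXP parser. The class and method names are illustrative assumptions. The parser rejects mismatched tags, but happily accepts a purchase order with a nonsensical structure, because no structure rules have been supplied.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class WellFormedCheck {
    // Returns true if the text parses cleanly with a non-validating parser.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // the default, stated for clarity
            factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // syntax problem: not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Tags properly nested and closed.
        System.out.println(isWellFormed("<a><b>text</b></a>"));
        // End tags in the wrong order.
        System.out.println(isWellFormed("<a><b>text</a></b>"));
        // Well-formed, yet structurally meaningless as a purchase order.
        System.out.println(isWellFormed(
            "<purchase-order><shipped/><shipped/></purchase-order>"));
    }
}
```

The default JAXP error handler may also print a diagnostic to standard error for the malformed input.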


Document Type Definitions (DTDs)

° A definition of the elements that can exist in a particular class of XML documents
  ° The elements can be declared to contain no data, any data, or specific data
  ° Elements are declared with the “<!ELEMENT . . .>” declaration.
° Attributes are declared using the “<!ATTLIST…>” declaration

<!ELEMENT br EMPTY >
<!ELEMENT container ANY >
<!ELEMENT p (#PCDATA) >
<!ELEMENT pos (x, y, z?) >
<!ELEMENT view (front|back|top) >

<!ATTLIST line-item idx ID #REQUIRED >


DTD for our example

<!-- DTD for purchase orders -->
<!-- Orders are the root of the object hierarchy. -->
<!ELEMENT purchase-order (account-id, line-item+, shipped) >
<!ELEMENT account-id (#PCDATA) >
<!ELEMENT line-item (prod-desc, prod-code, units?, price-quote?)>
<!ELEMENT prod-desc (#PCDATA)>
<!ELEMENT prod-code (#PCDATA)>
<!ELEMENT units (#PCDATA)>
<!ELEMENT price-quote (#PCDATA)>
<!ELEMENT shipped EMPTY>

<!ATTLIST line-item idx ID #REQUIRED >
<!ATTLIST shipped status (yes|no) #REQUIRED >


Referencing a DTD

<?xml version='1.0' encoding='us-ascii'?>
<!DOCTYPE purchase-order SYSTEM "http://my.server.com/po-data.dtd">
<purchase-order>
  . . .
</purchase-order>
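A validating parse against the purchase-order DTD might look like the sketch below. To stay self-contained it inlines the DTD as an internal subset instead of fetching it through a SYSTEM identifier; that choice, the class name, and the sample documents are all assumptions made for illustration. Because values of an ID attribute must be XML names, which cannot begin with a digit, the sample uses `idx="item1"`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class DtdValidationSketch {
    // The purchase-order DTD, inlined as an internal subset (an assumption;
    // the slide references an external DTD by URL instead).
    static final String DOCTYPE =
        "<!DOCTYPE purchase-order [\n"
        + "<!ELEMENT purchase-order (account-id, line-item+, shipped)>\n"
        + "<!ELEMENT account-id (#PCDATA)>\n"
        + "<!ELEMENT line-item (prod-desc, prod-code, units?, price-quote?)>\n"
        + "<!ELEMENT prod-desc (#PCDATA)>\n"
        + "<!ELEMENT prod-code (#PCDATA)>\n"
        + "<!ELEMENT units (#PCDATA)>\n"
        + "<!ELEMENT price-quote (#PCDATA)>\n"
        + "<!ELEMENT shipped EMPTY>\n"
        + "<!ATTLIST line-item idx ID #REQUIRED>\n"
        + "<!ATTLIST shipped status (yes|no) #REQUIRED>\n"
        + "]>\n";

    // Parses DOCTYPE + body with validation on; returns the validity errors found.
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (SAXException e) {
            errors.add(e.getMessage()); // not even well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String item = "<line-item idx=\"item1\"><prod-desc>Free catalog</prod-desc>"
            + "<prod-code>INT-0001</prod-code></line-item>";
        String valid = "<purchase-order><account-id>127-0045-1496-01</account-id>"
            + item + "<shipped status=\"yes\"/></purchase-order>";
        String missingShipped = "<purchase-order><account-id>127-0045-1496-01</account-id>"
            + item + "</purchase-order>";
        System.out.println(validate(valid));
        System.out.println(validate(missingShipped));
    }
}
```

Validity errors are reported through the `ErrorHandler` rather than thrown, so the handler decides whether a structural violation stops the parse.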


XML Schema

° Much richer way to describe classes of XML documents
  ° More complex structure possible
  ° Includes data type specifications
° But it’ll cost you
  ° More complicated to author XML Schemas
    ° Analogy to RDBMS schemas isn’t by accident
  ° Schema-validating parsers will typically be more heavyweight
° Currently a W3C Recommendation (1.0 version)
  ° Highest level in the W3C track
  ° Reached formal status May 2001
  ° Tool support has been growing steadily
    ° Xerces 2.x, oXygen, XMLSpy, MSXML, among others
    ° Support is being accelerated by SOAP usage, to some degree


XML Schema example

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <!-- Element used to hold data about the content service. -->
  <xs:element name="content-service">
    <xs:complexType>
      <!-- Document can either contain a set of users, subs and content, or an error element -->
      <xs:choice>
        <xs:sequence>
          <xs:element name="user" type="userType" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="subscription" type="subscriptionType" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="content-set" type="content-setType"/>
        </xs:sequence>
        <xs:element name="error" maxOccurs="unbounded">
          . . .
        </xs:element>
      </xs:choice>
    </xs:complexType>
    . . .


XML Schema example (cont)

. . .
<!-- Complex type used for articles -->
<xs:complexType name="articleType">
  <xs:sequence>
    <xs:element ref="article-id"/>
    <xs:element ref="title"/>
    <xs:element ref="by-line" minOccurs="0"/>
    <xs:element ref="abstract" minOccurs="0"/>
    <xs:element ref="pub-date"/>
    <xs:element ref="content-type"/>
    <xs:element ref="content-url" minOccurs="0"/>
    <xs:element ref="content" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
<!-- Basic elements supporting the article type -->
<xs:element name="abstract" type="xs:string"/>
. . .
<xs:element name="content-url" type="xs:anyURI"/>
<xs:element name="pub-date" type="xs:string"/>
. . .


Referencing a Schema

<?xml version="1.0" encoding="UTF-8"?>
<!-- Specify that our document follows the schema defined on the course site -->
<content-service xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:noNamespaceSchemaLocation="http://courses.dce.harvard.edu/…/content.xsd">
  <user>
    <account-id>1001</account-id>
    <name>John Smith</name>
  </user>
  . . .
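Schema validation from Java can be sketched with the standard `javax.xml.validation` package. The tiny schema below is a stand-in, not the course's content.xsd: it describes only a `user` element, and typing `account-id` as `xs:int` is an assumption chosen to show the data type checking that DTDs lack.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaValidationSketch {
    // Hypothetical schema: a user with an integer account-id and a string name.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='user'><xs:complexType><xs:sequence>"
        + "<xs:element name='account-id' type='xs:int'/>"
        + "<xs:element name='name' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the document is valid against XSD.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // structure or data type violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<user><account-id>1001</account-id><name>John Smith</name></user>"));
        // "abc" is well-formed character data but not an xs:int.
        System.out.println(isValid(
            "<user><account-id>abc</account-id><name>John Smith</name></user>"));
    }
}
```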


Parsing: Simple API for XML (SAX)

° Implement org.xml.sax.HandlerBase (SAX1; SAX2 code extends org.xml.sax.helpers.DefaultHandler instead)
° Set parser’s doc handler
° Run parser
° Event-related callbacks are called during parse
° Less direct, more implicit
° More options for optimization

[Diagram: the Parser reads the document in parse order (content-service, user, account-id, CDATA, name, CDATA) and delivers each item as an event to the Handler]


Parsing: Document Object Model (DOM)

° Create a parser, call its parse() operation
° Returns an org.w3c.dom.Document
° Browse the nodes in the document, add, edit, delete
° Simpler to implement
° More costly (creation of Node objects, etc.)

[Diagram: the Parser builds a DOM tree: content-service contains user, which contains account-id (CDATA) and name (CDATA)]
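Browsing the tree usually comes down to a short recursive walk. The sketch below is one way to do it under stated assumptions (the class name and the indented-outline output format are invented for illustration): it parses a document and lists its element nodes, indented by depth.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomWalkSketch {
    // Appends one line per element node, indented two spaces per level.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        // Text nodes are visited too, but only elements are printed.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    // Parses the XML and returns the element outline.
    static String outline(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<content-service><user><account-id>1001</account-id>"
            + "<name>John Smith</name></user></content-service>"));
    }
}
```

The whole document is in memory before the walk starts, which is where the extra cost relative to SAX comes from.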


Java and XML

° Easy enough to write XML documents, but how do we access/create them from Java?
° Need a Java API to use SAX/DOM parsers
  ° SAX: Event-driven parsing, entities encountered in the document fire specific event handlers
    ° Faster runtime, but generally more work to implement
  ° DOM: Parse all entities into a tree hierarchy, then app can walk the tree, extract/convert/alter data
    ° Not as efficient, but very simple to implement
° Many Java parsers available
  ° Apache Xerces (the standard one for our course)
  ° JDOM (independent effort, “Java-centric” API)
  ° Crimson from Sun (bundled with JAXP and JDK 1.4 as “reference impl”)


Parsing APIs : SAX and DOM

° SAX:
  ° Representations for parsing process, not document elements
  ° Parse engine represented by org.xml.sax.Parser (SAX1) or org.xml.sax.XMLReader (SAX2)
  ° Handlers registered by app to receive parse events
    ° ContentHandler, ErrorHandler
° DOM:
  ° Representations for document elements, not parsing process
  ° org.w3c.dom.Document contains hierarchy of Nodes
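The SAX2 side of this slide can be sketched as follows: obtain an `XMLReader`, register a `ContentHandler` and an `ErrorHandler`, and run the parse. The element-counting task and the class name are assumptions picked to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class XmlReaderSketch {
    // Counts start-element events; no document tree is ever built.
    static int countElements(String xml) {
        final int[] count = {0};
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler provides no-op implementations of the handler interfaces.
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // treat recoverable errors as failures in this sketch
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/></a>"));
    }
}
```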


Parsing APIs : Xerces

° Provides concrete implementation of SAX and DOM
  ° Concrete SAX Parser/XMLReader: org.apache.xerces.parsers.SAXParser
  ° Concrete DOM Document and Node: org.apache.xerces.dom.DocumentImpl, NodeImpl
° Provides non-standard, “native” APIs:

    // Xerces-specific: create the parser, parse, and fetch the document
    DOMParser parser = new DOMParser();
    parser.parse(new InputSource(new StringReader(body)));
    Document doc = parser.getDocument();

    // Standard DOM APIs from here on
    // Get the first (and only) "id" element
    Node idNode = doc.getElementsByTagName("id").item(0);
    String msgId = idNode.getFirstChild().getNodeValue();
    // Get the first (and only) "body" element
    Node bodyNode = doc.getElementsByTagName("body").item(0);
    String msgBody = bodyNode.getFirstChild().getNodeValue();


Parsing APIs : JAXP

° Java API for XML Parsing (JSR 000005)
  ° Specification 1.0 released Mar 2000, 1.2 in Aug 2002
  ° Actually covers both parsing and transforming XML
  ° Parsing package very simple
    ° Standardizes the initialization stage
    ° Uses SAX and DOM APIs for parsing handlers and document reps.
° Other parsers implement the JAXP API
  ° Pluggability layer (system properties) allows you to specify which to use
  ° JAXP implementations provided by Xerces 1.4/2.x, Crimson
° JAXP-compliant parser required by J2EE 1.3 spec.
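The pluggability layer is easy to observe. This sketch (class and method names are illustrative) prints the system property JAXP consults for the DOM factory and the concrete factory classes that were actually selected. Application code only ever names the abstract `javax.xml.parsers` types.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

final class JaxpPluggabilitySketch {
    // Class name of whichever DOM factory implementation JAXP selected.
    static String domFactoryImpl() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Class name of whichever SAX factory implementation JAXP selected.
    static String saxFactoryImpl() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // JAXP checks this system property first; it is null unless set.
        String key = "javax.xml.parsers.DocumentBuilderFactory";
        System.out.println(key + " = " + System.getProperty(key));
        System.out.println("DOM factory: " + domFactoryImpl());
        System.out.println("SAX factory: " + saxFactoryImpl());
    }
}
```

With Xerces on the classpath, running with `-Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl` should switch the DOM factory without any code change.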


XML in Action

° Simple message format
  ° Messages are delivered with the following structure:

    <message>
      <id>greeting</id>
      <body>Hi there</body>
    </message>

NOTE: We’ve omitted the XML header of this document


Parse Using DOM

InputStream clientIn = new WrappedInputStream(cin);
// Parse the client request into a Document
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document requestData = docBuilder.parse(clientIn);
// Get the message node, and then its children (id and body)
Element msgNode = (Element)requestData.getElementsByTagName("message").item(0);
Element idNode = (Element)msgNode.getElementsByTagName("id").item(0);
Element bodyNode = (Element)msgNode.getElementsByTagName("body").item(0);


Parse Using SAX

° Looks simpler, but all the callbacks are in your event handler

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
parser.parse(new FileInputStream(. . .), new MySAXContentHandler());
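The slide does not show the handler itself. The sketch below is one plausible `MySAXContentHandler` for the message format from this lecture; its fields and buffering strategy are assumptions, not the course's implementation. It collects character data between tags and stores it when the matching end tag arrives.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxParseSketch {
    // Hypothetical handler: remembers the text of the <id> and <body> elements.
    static final class MySAXContentHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        String id;
        String body;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            text.setLength(0); // new element: discard text seen so far
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one run of text, so accumulate.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("id".equals(qName)) {
                id = text.toString();
            } else if ("body".equals(qName)) {
                body = text.toString();
            }
        }
    }

    // Runs a SAX parse over the string and returns the populated handler.
    static MySAXContentHandler parse(String xml) {
        try {
            MySAXContentHandler handler = new MySAXContentHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalArgumentException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        MySAXContentHandler h =
            parse("<message><id>greeting</id><body>Hi there</body></message>");
        System.out.println(h.id + ": " + h.body);
    }
}
```

The three-line driver stays tiny because the state tracking that DOM does for you has moved into the handler.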


Generate XML Using DOM

// Create a new DOM document, using JAXP calls
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document msgDoc = docBuilder.newDocument();
// Create the top-most message element
Node msgNode = msgDoc.createElement("message");
// Add the id child
Node tmp = msgNode.appendChild(msgDoc.createElement("id"));
tmp.appendChild(msgDoc.createTextNode(msgId));
// Add the body to the message
tmp = msgNode.appendChild(msgDoc.createElement("body"));
tmp.appendChild(msgDoc.createTextNode(msgBody));
// Append the top-most element to the document
msgDoc.appendChild(msgNode);
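Building the tree is only half of generating XML: the document still has to be written out as text. One way to do that with standard JAXP calls is an identity `Transformer`, sketched below. The class name and the choice to omit the XML declaration are assumptions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomToStringSketch {
    // Builds the <message> document in memory and serializes it to a string.
    static String buildMessage(String msgId, String msgBody) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element msg = doc.createElement("message");
            Element id = doc.createElement("id");
            id.appendChild(doc.createTextNode(msgId));
            msg.appendChild(id);
            Element body = doc.createElement("body");
            body.appendChild(doc.createTextNode(msgBody));
            msg.appendChild(body);
            doc.appendChild(msg);

            // A Transformer created without a stylesheet copies input to output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildMessage("greeting", "Hi there"));
        // Markup characters in text nodes are escaped by the serializer.
        System.out.println(buildMessage("math", "1 < 2"));
    }
}
```

Going through the DOM and a serializer, instead of concatenating strings, means escaping of characters such as `<` and `&` is handled for you.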


What has XML provided?

° Extendable message/data format
  ° Using a standard meta-language
° Support for non-Java peers
  ° No serialized objects or Java-specific protocols
  ° Anything with an XML parser can play with us
° Some limitations
  ° Non-text messages problematic at best
    ° References to external data (e.g., URLs), or encoded binary data
    ° We’ll return to this when we discuss SOAP services
  ° Inefficient encoding protocol – lots of overhead
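The "encoded binary data" option and its overhead can be illustrated with Base64, one common text encoding for binary payloads. The `attachment` element and its `encoding` attribute are hypothetical names invented for this sketch.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class BinaryPayloadSketch {
    // Base64 turns every 3 bytes into 4 text characters.
    static String encode(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload);
    }

    // Wraps the encoded payload in a (hypothetical) attachment element.
    static String toElement(byte[] payload) {
        return "<attachment encoding=\"base64\">" + encode(payload) + "</attachment>";
    }

    // Parses the element and decodes its character data back to bytes.
    static byte[] fromElement(String xml) {
        try {
            String text = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
            return Base64.getDecoder().decode(text);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read attachment", e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[300];
        String xml = toElement(payload);
        System.out.println("raw bytes:      " + payload.length);
        System.out.println("encoded chars:  " + encode(payload).length());
        System.out.println("with markup:    " + xml.length());
        System.out.println("round trip len: " + fromElement(xml).length);
    }
}
```

The encoded form is a third larger than the raw bytes before any markup is added, which is the overhead the slide warns about.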


J2EE and XML

° Only loosely tied into the specification
  ° Application deployment
    ° All config data provided as XML
    ° Application- and component-level
  ° J2EE 1.3: A JAXP 1.1 implementation is guaranteed
  ° XML is the basis for web service protocols
° Other practical uses across the board, though
  ° Use XML for JMS message content
    ° Some JMS providers have extensions for this
  ° Generate/consume XML from JSPs or servlets
  ° Use XML data sources behind EJBs
    ° More on this later, practical use is somewhat limited


XML Bottom-line

° Remember its roots: extensible data protocols
  ° Great for data interchange, extensible data formats
    ° Lots of tool support
    ° Same idea as Excel and tab-delimited data feeds
° Not a cure-all
  ° Not all enterprise systems have native XML support
  ° Others only support proprietary DTDs/Schemas/envelope protocols
  ° Laws of physics still prevail: interfaces need to be developed
° Be careful with overhead
  ° Parsing/validating XML isn’t cheap
  ° Protocol itself has built-in bandwidth overhead


Further Reading

° XML Basics
  ° http://www.oreilly.com/catalog/learnxml2/chapter/ch02.pdf
° XML Schema
  ° Tutorial:
    ° http://www.xfront.com/xml-schema.html
° XSLT: XML stylesheets and transformations
  ° XSLT tutorial on ZVON
    ° http://www.zvon.org/xxl/XSLTutorial/Output/index.html
  ° Xalan docs and sample apps
    ° http://xml.apache.org/xalan-j/index.html


Suggested Reading for Next Time

° Check the course site…