Essential Guide XML PDF


    THE ESSENTIAL GUIDE TO XML

    BY SHARON L. HOFFMAN, AUGUST 2005

    XML is a key technology for sharing data

    between business entities because it

    bridges different ways of storing and

    referencing data. Although XML can be described as a

    language, the extensible nature of XML means that it's

    more correctly classified as a standard.

    Many interrelated standards (for a list, see "Essential

    XML Standards" on page 4) complement XML and expand

    its capabilities. XML is also a fundamental building block

    for other standards. For example, many Web-services

    standards, such as Simple Object Access Protocol (SOAP)

    and Web Services Description Language (WSDL), are based

    on XML. To give you a sense of how you might use XML

    in your own applications, let's start with a quick look at

    XML syntax and how XML compares with languages

    used for related tasks.

    SUPPLEMENT TO iSeries NEWS 2005

    XML in Context

    An XML document is made up of XML elements. Each

    element contains a starting tag, an ending tag, and (usually)

    data nested between the two tags. By choosing descriptive

    names for elements, you can make your XML documents

    more human-readable and therefore self-documenting. In

    Figure 1, the highlighted line is a single element called

    product_code. If a document contains more than one element

    of the same type, the tags will be repeated for each element

    as shown for the product_code and requested_qty elements

    in Figure 1. For more information about XML syntax, see

    "Essential XML Syntax and Terminology" on page 3.
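
To make the element structure concrete, here is a minimal Python sketch that parses a small document with the standard library's xml.etree.ElementTree module. The element names quote_request, customer_reference, date_required, customer, requested_products, product_code, and requested_qty all appear in this guide. The product wrapper element, the company_name element, and the product codes and quantities are assumptions made for illustration; this is not a reconstruction of Figure 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory inquiry. The <product> and <company_name> elements
# and all product data values are illustrative assumptions.
SAMPLE = """\
<quote_request>
  <customer_reference>bike component availability</customer_reference>
  <date_required>9/1/2005</date_required>
  <customer>
    <company_name>Acme Company</company_name>
  </customer>
  <requested_products>
    <product>
      <product_code>A100</product_code>
      <requested_qty>5</requested_qty>
    </product>
    <product>
      <product_code>B200</product_code>
      <requested_qty>25</requested_qty>
    </product>
  </requested_products>
</quote_request>
"""

root = ET.fromstring(SAMPLE)

# Every repeated element carries its own start and end tags, so the data
# can be read without consulting a separate record layout.
for product in root.iter("product"):
    code = product.findtext("product_code")
    qty = int(product.findtext("requested_qty"))
    print(code, qty)
```

Each product repeats the product_code and requested_qty tags, which is the repetition the paragraph above describes.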

    Repeating the data description for every element means

    that XML documents are entirely self-contained; you

    won't need to refer to a database layout, for example.

    However, the overhead of repeating all

    the element-description information

    quickly becomes unwieldy. As a result,

    most developers prefer using data-description languages (e.g., SQL, DDS)

    to define databases. However, XML shines

    in data-transfer applications that involve

    relatively small amounts of data (these are

    typically single transactions such as an

    inventory inquiry or a purchase order).

    Data transfer is by far the most common

    XML application in iSeries environments.

    However, you can also use XML to add

    meaning to text within documents. Used

    in this way, XML becomes a powerful

    Figure 1: Sample XML document

    [Markup lost in extraction. Surviving data values: bike component availability; 9/1/2005; Acme Company; Sharon; 1234556789225]


    1 XML is case sensitive.

    2 Generally, white space (e.g., indents, blank lines) in an XML document is ignored.

    3 You can choose any element names you like as long as they conform to a few basic rules:

      • Element names cannot contain spaces.

      • Element names must begin with a letter or an underscore.

      • After the first character, element names can contain numbers, hyphens, periods, colons, letters, and underscores. (Colons are usually avoided in element names because they have special meaning within XML.)

      • Element names cannot begin with the letters xml, regardless of case (i.e., xml, XML, xMl, and Xml are all invalid).

    4 Elements can contain one or more attributes. In many cases, the XML designer may choose whether to use elements or attributes to define a particular structure. As a rule of thumb, attributes should be used for information that is not integral to the element.

    5 An element cannot contain more than one attribute with the same name.

    6 Both starting and ending tags are required for all elements except empty elements. Empty elements occur most often when an element is completely defined by its attributes.

    7 Elements must be properly nested (i.e., once an inner element tag is opened, it must be closed before any outer tags).

    The following nesting is correct:

    [Example markup lost in extraction; surviving content: Sharon, Hoffman]

    The following nesting is syntactically correct, although it doesn't make much sense:

    [Example markup lost in extraction; surviving content: Sharon]

    The following nesting is syntactically incorrect:

    [Example markup lost in extraction; surviving content: Sharon, Hoffman]

    8 The outermost element in any XML document is referred to as the root element.

    9 The root element may be preceded by a document declaration and processing instructions.

    10 Built-in XML entities are used to include a character that has special meaning in XML (e.g., a greater-than sign) within XML content. You can also define additional entities as shorthand for text and structures that you use repeatedly.

    11 An XML document that has correct syntax is well formed.

    12 An XML document that conforms to the structure defined by its Document Type Definition (DTD) or schema is valid. It is possible for an XML document to be well formed but invalid, but the reverse is not possible.
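
Several of these rules can be checked mechanically, because a conforming parser rejects any document that is not well formed. The following Python sketch uses the standard library's xml.etree.ElementTree to test a few fragments. The fragments are illustrative assumptions (customer_name and first_name appear elsewhere in this guide; last_name and note are hypothetical), not the original sidebar examples.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Rule 7: properly nested, inner elements close before the outer one does.
correct = ("<customer_name><first_name>Sharon</first_name>"
           "<last_name>Hoffman</last_name></customer_name>")

# Rule 7 violated: <last_name> opens inside <first_name> but closes after it.
overlapping = ("<customer_name><first_name>Sharon<last_name></first_name>"
               "Hoffman</last_name></customer_name>")

# Rule 1: names are case sensitive, so </first_name> does not close <First_Name>.
case_mismatch = "<First_Name>Sharon</first_name>"

# Rule 10: a literal less-than sign in content is written as the entity &lt;.
escaped = "<note>qty &lt; 10</note>"
unescaped = "<note>qty < 10</note>"

for label, text in [("correct", correct), ("overlapping", overlapping),
                    ("case_mismatch", case_mismatch), ("escaped", escaped),
                    ("unescaped", unescaped)]:
    print(label, is_well_formed(text))
```

Note that this checks only well-formedness (rule 11); it says nothing about validity against a DTD or schema (rule 12).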


    tool for organizing information and improving search

    capabilities. To understand the benefits of an XML-

    encoded document, you should consider the differences

    between XML and HTML.

    Although the two languages are syntactically similar

    because they have the same antecedents (see "Essential XML

    History" on page 5 for information), they have different

    strengths. HTML is best used to format information for

    display, while the descriptive information in XML tags

    makes it easier to deal with document content. For example,

    suppose you have a document containing a list of PC

    printers that contains information about the features of each

    printer model. If the document is stored in HTML, it's

    difficult to create a search that finds all printers that support

    color printing and duplex printing and can print at least 10

    pages per minute. Conversely, if you store the same document

    using XML, you would probably create separate elements

    for each important feature (e.g., maximum_print_speed)

    and could easily develop an application that searches for

    all printers that meet your criteria. Of course, a database

    is ideal for such a search, but XML provides database-like

    search capabilities for information that is stored in documents

    such as user manuals or marketing brochures. As you'll

    see in the following section, the XML data can easily be

    converted into HTML for display purposes.
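
As a rough illustration of the printer search described above, the following Python sketch filters a small XML document by feature. Only the maximum_print_speed element name comes from the text; the other element names, the yes/no encoding, and the printer models are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical printer list; only maximum_print_speed is named in the article.
PRINTERS = """\
<printers>
  <printer>
    <model>Alpha 100</model>
    <color>yes</color>
    <duplex>yes</duplex>
    <maximum_print_speed>12</maximum_print_speed>
  </printer>
  <printer>
    <model>Beta 200</model>
    <color>yes</color>
    <duplex>no</duplex>
    <maximum_print_speed>20</maximum_print_speed>
  </printer>
  <printer>
    <model>Gamma 300</model>
    <color>yes</color>
    <duplex>yes</duplex>
    <maximum_print_speed>8</maximum_print_speed>
  </printer>
</printers>
"""

def find_printers(xml_text, min_ppm):
    """Return models that print in color, duplex, and at least min_ppm pages per minute."""
    root = ET.fromstring(xml_text)
    matches = []
    for printer in root.findall("printer"):
        if (printer.findtext("color") == "yes"
                and printer.findtext("duplex") == "yes"
                and int(printer.findtext("maximum_print_speed")) >= min_ppm):
            matches.append(printer.findtext("model"))
    return matches

print(find_printers(PRINTERS, 10))
```

Because each feature lives in its own named element, the search reads the feature values directly instead of guessing at them from display formatting.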

    Because XML documents are plain text, you can write

    XML using any text editor (e.g., Notepad). However, as you

    begin working with XML, you'll quickly find that an XML-

    aware editor is a big time-saver. An XML editor should

    help you write XML by providing syntax-checking and

    document-generation capabilities. For example, if you begin

    to create a new element, some editors will automatically

    generate the ending tag for you.

    An XML document can stand entirely on its own, without

    any related documents. More often, though, an XML

    document is part of a larger application architecture that

    includes components that define the structure required for

    a particular type of XML document, solutions that reformat

    XML data (e.g., create an HTML document for display using

    data from an XML document), and applications that process

    ESSENTIAL XML SYNTAX AND TERMINOLOGY


    ESSENTIAL XML STANDARDS

    XML itself is a standard, but it also involves many related standards. Here are

    some of the most widely used XML standards.

    • XLink is a standard for defining hyperlinks in XML.

    • XML Namespaces make it possible to create unique element names.

    • XML Schemas define the rules for the specialized XML documents used to define the structure of other XML documents.

    • XPath addresses each part of an XML document via a hierarchical structure (e.g., first_name within customer_name within quote_request).

    • XQuery is a relatively new standard that provides SQL-like query capabilities for XML documents.

    • Extensible Stylesheet Language (XSL) formats XML documents for display. There are two components of the XSL standard: XSL Transformations (XSLT) and XSL Formatting Objects (XSL FO).
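
The hierarchical addressing that XPath provides can be tried out with Python's xml.etree.ElementTree, which implements a limited subset of XPath. The sketch below is illustrative only: the document layout, the product wrapper element, the last_name element, and the data values are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; <product>, <last_name>, and the values are assumptions.
DOC = """\
<quote_request>
  <customer>
    <customer_name>
      <first_name>Sharon</first_name>
      <last_name>Hoffman</last_name>
    </customer_name>
  </customer>
  <requested_products>
    <product>
      <product_code>A100</product_code>
      <requested_qty>5</requested_qty>
    </product>
    <product>
      <product_code>B200</product_code>
      <requested_qty>25</requested_qty>
    </product>
  </requested_products>
</quote_request>
"""

root = ET.fromstring(DOC)  # root is the quote_request element

# A path relative to the root: first_name within customer_name within customer.
first_name = root.findtext("customer/customer_name/first_name")

# ".//" selects matching elements at any depth below the root.
all_codes = [element.text for element in root.findall(".//product_code")]

# A predicate selects the product whose product_code child has the given text.
qty_b200 = root.findtext(".//product[product_code='B200']/requested_qty")

print(first_name, all_codes, qty_b200)
```

A full XPath implementation supports many more expressions (functions, axes, numeric comparisons) than the subset used here.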

    XML documents. Understanding how these pieces work

    together is vital to understanding XML.

    The Big Picture

    An XML document is almost always associated with a

    second document that defines the valid structure for a

    particular type of document. For example, an XML

    document might contain a particular inventory inquiry from

    XYZ Company, but the structural-definition document would

    define the format for all inventory inquiry documents.

    There are two standards for these structural-definition

    documents: DTD is the older and simpler standard, whereas

    XML schema is the newer standard. DTDs and schemas

    serve the same purpose, but their complexity and capabilities

    vary significantly.

    Figure 2 contains a DTD that you could use to define the

    XML document in Figure 1, and Figure 3 contains the schema

    for the same document. Both the DTD and the schema were

    generated using an XML editor (WebSphere Development

    Studio Client for iSeries, or WDSc, in this case). You'll find

    that creating a sample document (e.g., an inventory inquiry)

    and using it to generate an initial version of the DTD or

    schema is often the simplest way to create a structural-

    definition document. While you may need to clean up the

    generated code, it will give you a good starting point for

    developing the DTD or schema.
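
The difference between a well-formed document and a valid one can be sketched in a few lines of Python. This is a hand-written stand-in, not DTD or schema validation: Python's standard library parsers do not validate, so the sketch simply compares the root element's children with the sequence that survives in Figure 2 (customer_reference, date_required, customer, requested_products). The root element name quote_request is borrowed from the XPath example in this guide and is an assumption here.

```python
import xml.etree.ElementTree as ET

# Child order taken from the content model fragment in Figure 2.
EXPECTED_CHILDREN = ["customer_reference", "date_required",
                     "customer", "requested_products"]

def check_structure(xml_text):
    """Return a list of problems; an empty list means the check passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Not even well formed, so validity cannot be assessed.
        return [f"not well formed: {err}"]
    actual = [child.tag for child in root]
    if actual != EXPECTED_CHILDREN:
        return [f"expected children {EXPECTED_CHILDREN}, found {actual}"]
    return []

good = ("<quote_request><customer_reference/><date_required/>"
        "<customer/><requested_products/></quote_request>")

# Well formed, but the children are out of order, so the structure is invalid.
reordered = ("<quote_request><date_required/><customer_reference/>"
             "<customer/><requested_products/></quote_request>")

# Missing end tag: not well formed at all.
broken = "<quote_request><customer_reference></quote_request>"

for name, text in [("good", good), ("reordered", reordered), ("broken", broken)]:
    print(name, check_structure(text))
```

In practice a validating parser would apply the full DTD or schema, including nested content models and data types, rather than a single hard-coded list.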

    Whether you use a DTD or a schema, there is

    typically a one-to-many relationship between the

    DTD or schema and the XML documents. For

    example, you could publish a DTD or a schema

    (or both) specifying the format for incoming

    inventory inquiries and, hopefully, many of your customers

    would then begin to send you inventory

    inquiries in XML format. DTDs and schemas for

    external documents (versus documents that are internal

    to a particular company) are usually published

    online so that they can be shared more easily.

    Ideally, everybody would use the same structure

    for the same type of document (e.g., inventory inquiries), but

    that's not always the case, not even within a single industry.

    Fortunately, many industry groups are working on standards

    that should help alleviate some of the Tower-of-Babel aspects

    of XML. You'll find the latest information on industry-specific

    XML structures online at xml.org.

    In addition to DTDs and schemas, other components can

    be associated with XML documents. For example, if you

    plan to display an XML document in a Web page, you'll

    probably want to first convert the XML document into

    an HTML document. Similarly, you often might need to

    create multiple XML documents that contain the same

    general information but use slightly different structures.

    If you need to convert lots of documents between the same

    two structures, it makes sense to automate the process. The

    simplest way to do this is via an Extensible Stylesheet

    Language Transformations (XSLT) document that defines

    how input elements should be formatted in the output (XML

    or HTML) document. For example, if several of your vendors

    accept inventory inquiries in XML, but each uses a slightly

    different schema, you could develop a generic XML

    inventory inquiry, then create the variations using XSLT.

    As with DTDs and schemas, your XML editor should include

    tools to help you create XSLT documents.
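
The idea of deriving vendor-specific variations from one generic document can be illustrated without an XSLT processor. The Python sketch below is explicitly not XSLT: it is a plain-Python stand-in that renames elements according to a lookup table, which is roughly what a very simple stylesheet would express declaratively. The vendor element names (InventoryInquiry, ItemNumber, Quantity) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the generic element names to one vendor's names.
VENDOR_A_NAMES = {
    "quote_request": "InventoryInquiry",
    "product_code": "ItemNumber",
    "requested_qty": "Quantity",
}

def rename_elements(xml_text, name_map):
    """Return the document with element names replaced according to name_map."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() visits the root and all descendants
        element.tag = name_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

generic = ("<quote_request><product_code>A100</product_code>"
           "<requested_qty>5</requested_qty></quote_request>")

print(rename_elements(generic, VENDOR_A_NAMES))
```

A real XSLT stylesheet can also reorder, filter, and restructure elements; keeping those rules in a stylesheet rather than in program code is what makes the conversion easy to maintain for each vendor.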

    An XSLT document works in conjunction with an XSLT

    Figure 2: A DTD generated by WDSc for the XML document in Figure 1

    [Only a fragment of the DTD survives: (customer_reference,date_required,customer,requested_products)>]


    ESSENTIAL XML PARSER CONCEPTS

    Although most XML editors include an XML parser, you'll also need an XML parser for production applications. XML parsers may be part of a Web application server, or they may be available as separate software options. There are two general standards for XML parsers: Document Object Model (DOM) and Simple API for XML (SAX).

    The only functional difference between DOM parsers and SAX parsers is that DOM parsers can modify an XML document, while SAX parsers are read-only (of course, an application that uses a SAX parser can always write out a new XML document in a different format than the incoming XML document). The other differences between DOM and SAX parsers don't affect their capabilities, but they can have an impact on ease of use and, in some cases, performance.

    SAX parsers are event-driven and are best suited for applications that need to choose specific elements from a larger XML document. You'll find the SAX parsers more intuitive if your programming background includes languages that have event-driven capabilities (e.g., Visual Basic, Java).

    DOM parsers read an entire XML document into an application where the elements can be referenced, much as an RPG program might reference fields in a record format. Therefore, DOM parsers have an advantage over SAX parsers when you need to process a high percentage of the elements in an XML document. In addition, DOM parsers generally feel more natural than SAX parsers if your programming background includes procedural languages such as RPG and Cobol.

    ESSENTIAL XML HISTORY

    The histories of individual computer languages are mostly just curiosities, but XML's history provides a glimpse into its syntax as well. XML is part of the same family of languages as HTML and is based on Standard Generalized Markup Language (SGML). SGML is a direct descendant of Generalized Markup Language, which was developed by IBM researchers in the 1960s.

    The concept behind markup languages is to separate document content from document structure and display. Thus, in both XML and HTML, the tags contain information about data: formatting information in HTML, and context information in XML.

    SGML became an ISO standard in 1986. HTML, which evolved somewhat independently but incorporates many SGML concepts, is slowly being brought back into compliance with the larger SGML standard.

    In 1996, developers began working on a simplified version of SGML that focuses on document structure rather than document format. That project is the basis for XML, which became a World Wide Web Consortium standard in 1998.

    ESSENTIAL XML RESOURCES

    Charles F. Goldfarb's All the XML Books in Print

    Goldfarb, one of the developers of SGML, attempted to list all the XML books in print. Although the list was last updated in early 2004, it's still a useful resource.

    xmlbooks.com

    The CoverPages

    The XML CoverPages include XML news, background material, and technical tips.

    xml.coverpages.org

    DevX.com

    XML FAQs, articles, discussion groups, and more.

    devx.com/xml

    World Wide Web Consortium XML page

    w3.org/XML

    XML.com

    O'Reilly Media, Inc., a premier technical book publisher, maintains this XML information site.

    xml.com

    IBM RESOURCES

    Developerworks XML site

    www-106.ibm.com/developerworks/xml

    iSeries XML information home page

    www-1.ibm.com/servers/enable/site/xml/iseries/index.html

    Two IBM white papers illustrate how to process XML documents using RPG or Cobol:

    • Parsing XML documents using the new V5R3 ILE COBOL syntax

      www-1.ibm.com/servers/enable/site/education/abstracts/3db2_abs.html

    • XML Interface for RPG maps XML into DB2 UDB for iSeries

      www-1.ibm.com/servers/enable/site/education/ibo/record.html?xmlface

    processor, software that applies the rules defined in the

    XSLT document to an incoming XML document and produces

    an output document in HTML, XML, or text format.

    An XSLT processor is typically bundled into a Web application

    server such as WebSphere Application Server (WAS)

    and can be accessed by calling APIs in an application.

    Most XML editors also include an XSLT processor for

    testing purposes.

    From XML to the Database and Vice Versa

    In an iSeries environment, XML projects

    almost invariably involve extracting

    data from DB2 UDB for iSeries or

    moving data from XML documents

    into the database. While it's possible

    to store entire XML documents in

    iSeries files, more often you'll need to

    separate the data for one or more elements

    from its tags and store the data itself as a

    field or fields within existing iSeries database

    records. You'll also find lots of requirements for

    the opposite task: creating XML documents using data

    from one or more database records.

    The underlying software that is used to separate an

    XML document into data and data-description components

    is an XML parser. An XML parser understands the rules

    of XML syntax, just as the parser that is part of the RPG

    compiler understands RPG syntax. For more about XML

    parsers, see "Essential XML Parser Concepts" on page 5.

    As you begin developing in XML, you might not even

    realize that you're using an XML parser. For example,

    when an XML editor validates an XML document against

    its associated DTD or schema, an XML parser is invoked

    to perform the validation. XML parsers, including those

    for iSeries, are typically free. The iSeries-specific XML

    parser support is packaged in the no-charge licensed program

    product, XML Toolkit for iSeries (5733-XT1).

    If you're working with very low document

    volumes, it may be possible to assemble and

    disassemble XML documents using the tools

    built into an XML editor. However, for

    production processing of XML documents,

    you'll usually need to develop code that

    moves data back and forth between a particular

    type of XML document (e.g., an

    inventory inquiry) and the associated database

    records.
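
The kind of code this paragraph describes might look like the following Python sketch, which separates element data from its tags and stores it as database rows. It is a sketch under stated assumptions: the standard library's sqlite3 module stands in for DB2 UDB for iSeries, and the inquiry_line table, the product wrapper element, and the data values are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incoming inquiry; <product> and the values are assumptions.
INQUIRY = """\
<quote_request>
  <requested_products>
    <product>
      <product_code>A100</product_code>
      <requested_qty>5</requested_qty>
    </product>
    <product>
      <product_code>B200</product_code>
      <requested_qty>25</requested_qty>
    </product>
  </requested_products>
</quote_request>
"""

def store_inquiry(conn, xml_text):
    """Parse the inquiry and insert one row per requested product."""
    root = ET.fromstring(xml_text)
    rows = [
        (product.findtext("product_code"), int(product.findtext("requested_qty")))
        for product in root.iter("product")
    ]
    conn.executemany(
        "INSERT INTO inquiry_line (product_code, requested_qty) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)

# sqlite3 in-memory database used purely as a stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inquiry_line (product_code TEXT, requested_qty INTEGER)")
stored = store_inquiry(conn, INQUIRY)
print(stored)
```

The parser does the work of splitting tags from data; the application code only decides which elements map to which columns.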

    You can create an XML document using a

    variety of techniques. At one end of the spectrum,

    you could write an RPG program that creates an XML

    document as an iSeries database file by hand-coding the

    tags and their contents. Then, you could convert the database

    file to a stream file using the CPYTOSTMF (Copy to Stream

    File) CL command. Other options include using APIs to

    output a stream file from an RPG program, generating an

    XML document using the results of an SQL query, or

    writing a Java application that builds an XML document.
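
Of the options listed above, generating an XML document from the results of an SQL query is easy to sketch. The Python example below is an illustration only: sqlite3 stands in for the iSeries database, and the inquiry_line table, the product wrapper element, and the sample rows are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_inquiry(conn):
    """Build a quote_request document from the rows of the inquiry_line table."""
    root = ET.Element("quote_request")
    products = ET.SubElement(root, "requested_products")
    query = "SELECT product_code, requested_qty FROM inquiry_line ORDER BY product_code"
    for code, qty in conn.execute(query):
        product = ET.SubElement(products, "product")
        ET.SubElement(product, "product_code").text = code
        ET.SubElement(product, "requested_qty").text = str(qty)
    # The library writes the tags and escapes special characters in the data.
    return ET.tostring(root, encoding="unicode")

# sqlite3 in-memory database used purely as a stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inquiry_line (product_code TEXT, requested_qty INTEGER)")
conn.executemany(
    "INSERT INTO inquiry_line (product_code, requested_qty) VALUES (?, ?)",
    [("A100", 5), ("B200", 25)],
)

document = build_inquiry(conn)
print(document)
```

Building the document through a library rather than hand-coding the tags avoids mismatched end tags and unescaped special characters.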

    Although you can write custom code to extract data from

    an XML document, it's simpler to leverage the capabilities

    of an XML parser. For example, you might write code that

    invokes specific parser functions such as reading the data

    for a particular type of element (e.g., product_code).

    Java is the language of choice for working with XML

    because it includes extensive support for accessing parser

    APIs. However, you can also invoke parser APIs using

    RPG or Cobol, and products are available that will automate

    part of the process of assembling or disassembling

    XML documents.
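
To show what invoking parser functions looks like in practice, the following sketch reads every product_code value twice: once through an event-driven SAX handler and once through a DOM tree. It uses Python's standard xml.sax and xml.dom.minidom modules rather than the Java, RPG, or Cobol APIs discussed above, and the sample document (including the product wrapper element and the values) is hypothetical.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical fragment; <product> and the values are assumptions.
DOC = ("<requested_products>"
       "<product><product_code>A100</product_code><requested_qty>5</requested_qty></product>"
       "<product><product_code>B200</product_code><requested_qty>25</requested_qty></product>"
       "</requested_products>")

class ProductCodeHandler(xml.sax.ContentHandler):
    """SAX handler that collects the text of every product_code element."""

    def __init__(self):
        super().__init__()
        self.codes = []
        self._in_code = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "product_code":
            self._in_code = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_code:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "product_code":
            self.codes.append("".join(self._buffer))
            self._in_code = False

def codes_with_sax(xml_text):
    """Event-driven: the parser calls the handler as it reads the document."""
    handler = ProductCodeHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.codes

def codes_with_dom(xml_text):
    """Tree-based: the whole document is loaded, then elements are looked up."""
    dom = minidom.parseString(xml_text)
    return [node.firstChild.data for node in dom.getElementsByTagName("product_code")]

print(codes_with_sax(DOC))
print(codes_with_dom(DOC))
```

The SAX version needs more code but never holds the whole document in memory; the DOM version is shorter and lets the application revisit or modify any element after parsing.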

    Explore XML

    XML is a powerful tool for communicating data between

    applications using different databases and running on different

    platforms, and it is rapidly becoming the medium of choice for

    transaction-level data transfer. XML can also organize information

    within a document, thus making it easier to modify

    and search large amounts of text. For all its strengths, XML is

    still a relatively new technology with a maze of confusing,

    and sometimes competing, standards. To take advantage of

    XML, it helps to have a clearly defined goal and the flexibility

    to experiment with various tools and techniques. It's

    also useful to understand how other businesses are using XML.

    To explore the opportunities XML offers, visit the Web

    sites listed in "Essential XML Resources" on page 5.

    Sharon L. Hoffman is a senior technical editor for iSeries NEWS.

    Figure 3: An XML schema generated by WDSc for the XML document in Figure 1
