Introduction to XML Marek Podgorny and Lukasz Beca EECS SU and CollabWorx, Inc. Syracuse University Fall 2002


Introduction to XML CPS606, Fall 2002, EECS SU & CollabWorx 2

Markup Languages
Marking up text is a methodology for encoding data with information about itself
– A yellow highlighter is a valid markup methodology
  – You decide which parts of the document are important
  – It is portable: others can benefit from your markup
Two critical properties of a valid markup:
– A standard must be in place to define what valid markup is
  – Above, markup is defined as a bit of yellow ink atop text
  – In HTML, markup is a <font color=yellow>tag</font>
– A standard must be in place to define what markup means
  – A yellow highlight means the highlighted text represents an important point
  – In HTML, each tag carries a well-defined formatting instruction

What is XML?
Like HTML, XML (Extensible Markup Language) is a markup language that relies on the concept of rule-specifying tags and the use of a tag-processing application that knows how to deal with the tags
For HTML, the application is a browser
– This is because HTML is a presentation markup
For XML, the application can be anything
– XML may be processed by browsers, but its application domain is huge and not even completely understood today

eXtensibility of XML
The most important technical difference between XML and HTML is that while HTML is a closed set of tags, XML is a meta-language for defining other markup languages
– XML specifies the standards with which you can define your own markup languages with their own sets of tags
– This very statement makes people nervous…
– We will discuss the methodology for defining a new language, but in practice very few people will ever write a DTD

Made-up Markup Language (MuML)

<CONTACT>
  <NAME>Kim Smith</NAME>
  <ID>027</ID>
  <COMPANY>WebtopSystems Inc.</COMPANY>
  <EMAIL>[email protected]</EMAIL>
  <PHONE>315 443-4868</PHONE>
  <STREET>111 College Pl</STREET>
  <CITY>Syracuse</CITY>
  <STATE>New York</STATE>
  <ZIP>13244</ZIP>
</CONTACT>

This is a chunk of well-formed XML. How is it useful?
The Netscape browser surely doesn't know what to do with it…
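A browser may not know what to do with MuML, but a general-purpose XML parser can already read it. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The function name `contact_fields` is made up for this example, and the e-mail address is a placeholder because the value is not legible in the slide.

```python
import xml.etree.ElementTree as ET

# The MuML contact record from the slide; the e-mail value is a placeholder.
MUML = """
<CONTACT>
  <NAME>Kim Smith</NAME>
  <ID>027</ID>
  <COMPANY>WebtopSystems Inc.</COMPANY>
  <EMAIL>kim@example.com</EMAIL>
  <PHONE>315 443-4868</PHONE>
  <STREET>111 College Pl</STREET>
  <CITY>Syracuse</CITY>
  <STATE>New York</STATE>
  <ZIP>13244</ZIP>
</CONTACT>
"""

def contact_fields(xml_text):
    """Return the child elements of <CONTACT> as a {tag: text} dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}

fields = contact_fields(MUML)
print(fields["NAME"], "-", fields["CITY"])
```

The parser recovers the structure without knowing anything about what a contact is; giving the tags meaning is left to the application.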

How to make MuML useful?
There must be a set of rules allowing us (or a computer) to understand the syntax of the language
– In XML, this information is provided to the processing application by a Document Type Definition (DTD)
– The DTD specifies what it means to be a valid tag: the syntax for marking up
There must be a set of rules defining the meaning (semantics) of the markup
– To specify what valid tags mean, XML documents are also associated with style sheets, which provide GUI instructions for a processing application such as a web browser
– Note that other application domains of XML might do without a style sheet, e.g., an application using XML as an object serialization technique
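To give a feel for the kind of rule a DTD states, here is a rough sketch in Python. Python's standard library parses XML but does not validate against a DTD, so the content model is hand-coded as a plain list. `CONTACT_MODEL` and `check_contact` are hypothetical names, and the required order of tags is an assumption taken from the MuML example, not an actual DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model for MuML: the tags a <CONTACT> must contain,
# in order. A real DTD would express this as an <!ELEMENT ...> declaration.
CONTACT_MODEL = ["NAME", "ID", "COMPANY", "EMAIL", "PHONE",
                 "STREET", "CITY", "STATE", "ZIP"]

def check_contact(xml_text):
    """Return a list of problems; an empty list means the record passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The text is not even well-formed XML (e.g. a missing close tag).
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "CONTACT":
        problems.append(f"root element is <{root.tag}>, expected <CONTACT>")
    found = [child.tag for child in root]
    if found != CONTACT_MODEL:
        problems.append(f"children {found} do not match {CONTACT_MODEL}")
    return problems

good = "<CONTACT>" + "".join(f"<{t}>x</{t}>" for t in CONTACT_MODEL) + "</CONTACT>"
bad = "<CONTACT><NAME>Kim Smith</NAME><FAX>none</FAX></CONTACT>"
broken = "<CONTACT><NAME>Kim Smith</CONTACT>"

for sample in (good, bad, broken):
    print(check_contact(sample))
```

The sketch separates two questions: whether the text is well-formed XML at all, and whether it follows the rules of this particular language. Only the second is the DTD's job.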

Style Sheet Pseudo-Code
Anytime you see a <CONTACT>, display it using a <UL> tag. </CONTACT> tags should be converted to </UL>
All <NAME> tags can be replaced with <LI> tags, and </NAME> tags should be replaced with </LI>
All <EMAIL> tags can be replaced with <LI> tags, and </EMAIL> tags should be ignored
The style sheet uses the functionality of HTML to define the formatting of MuML.
For non-browser apps, the HTML translation is irrelevant
The processing application combines the logic of the style sheet, the DTD, and the data of the MuML document, and displays the data according to those rules.
So instead of simple HTML we got three different chunks. Why the pain?
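The pseudo-code above can be sketched as a small Python function. This is an illustration only, not XSL: the `RULES` table and the function `contact_to_html` are hypothetical, and skipping tags that the pseudo-code does not mention is an assumption.

```python
import xml.etree.ElementTree as ET
from html import escape

# Rules from the pseudo-code: (HTML open tag, HTML close tag) per MuML tag.
# An empty close tag means "ignore the closing tag".
RULES = {
    "NAME": ("<LI>", "</LI>"),
    "EMAIL": ("<LI>", ""),
}

def contact_to_html(xml_text):
    """Render a <CONTACT> record as an HTML list using RULES."""
    root = ET.fromstring(xml_text)
    lines = ["<UL>"]                      # <CONTACT> becomes <UL>
    for child in root:
        if child.tag not in RULES:        # assumption: unlisted tags are skipped
            continue
        open_tag, close_tag = RULES[child.tag]
        lines.append(f"  {open_tag}{escape(child.text or '')}{close_tag}")
    lines.append("</UL>")                 # </CONTACT> becomes </UL>
    return "\n".join(lines)

html_out = contact_to_html(
    "<CONTACT><NAME>Kim Smith</NAME><ID>027</ID>"
    "<EMAIL>kim@example.com</EMAIL></CONTACT>"
)
print(html_out)
```

Changing the presentation means editing `RULES`, not the contact data, which is the point of keeping the style sheet separate.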

Complex XML World
We need a processing agent that will put together the DTD, the style sheet, and the data
– Note: Web browsers are barely up to the task yet
Formal definition:
– "A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application."
And this is not all yet…

Build your own ColdFusion?
XML allows each specific industry to develop its own tag sets to meet its unique needs
– Doesn't force everyone's browser to incorporate zillions of tag sets, or developers to settle for a tag set that is too generic to be useful
– Compelling? Well…
The real power of XML:
– Not only can you define your own set of tags, but the rules specified by those tags are not limited to formatting rules
– XML allows you to define all sorts of tags with all sorts of rules: tags representing business rules, or tags representing data description or data relationships
– As these tags are reflected in the DOM, you can do computation on documents!
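As a small illustration of computing on a document through the DOM, the sketch below uses Python's standard `xml.dom.minidom`. The order document and its tag names (`order`, `item`, `price`, `quantity`) are invented for this example and do not come from the slides.

```python
from xml.dom.minidom import parseString

# Hypothetical order document; the tag names are made up for illustration.
ORDER = """<order>
  <item><name>window</name><price>40</price><quantity>2</quantity></item>
  <item><name>door</name><price>30</price><quantity>1</quantity></item>
</order>"""

def text_of(parent, tag):
    """Return the text inside the first <tag> element under parent."""
    return parent.getElementsByTagName(tag)[0].firstChild.data

def order_total(xml_text):
    """Walk the DOM tree and sum price * quantity over all <item> elements."""
    dom = parseString(xml_text)
    total = 0
    for item in dom.getElementsByTagName("item"):
        total += int(text_of(item, "price")) * int(text_of(item, "quantity"))
    return total

total = order_total(ORDER)
print(total)
```

Because the tags describe the data rather than its appearance, the program can find prices and quantities by name instead of guessing from layout.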

Why are HTML's days numbered?
The GUI is embedded in the data
– What happens if you decide that you like a table-based presentation better than a list-based presentation?
Searching for information in the data is tough
The data is tied to the logic and language of HTML and hence to browsers
– What if I want to use my data in a Java applet?
HTML: <LI>State: Ohio <LI>State: Oregon
XML: <state>Ohio</state> <state>Oregon</state>
How do I find all records for Ohio?
What is the relationship of Ohio and Oregon?
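The difference between the two searches can be sketched in Python. The sample data extends the two-item example from the slide with one extra entry per format (an assumption made to show a false match), and the function names and the `record` and `note` tags are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

HTML = "<UL><LI>State: Ohio <LI>State: Oregon <LI>Visited Ohio in 2001</UL>"
XML = ("<records>"
       "<record><state>Ohio</state></record>"
       "<record><state>Oregon</state><note>Moved from Ohio</note></record>"
       "</records>")

def html_matches(html_text, word):
    """Text search: return every list item whose text mentions the word."""
    items = re.split(r"<LI>", html_text, flags=re.IGNORECASE)[1:]
    items = [re.sub(r"<[^>]+>", "", item).strip() for item in items]
    return [item for item in items if word in item]

def xml_matches(xml_text, state):
    """Structured search: return records whose <state> element equals state."""
    root = ET.fromstring(xml_text)
    return [rec for rec in root.findall("record")
            if rec.findtext("state") == state]

html_hits = html_matches(HTML, "Ohio")
xml_hits = xml_matches(XML, "Ohio")
print(len(html_hits), "HTML hits,", len(xml_hits), "XML hit")
```

The HTML search can only match words, so it also returns the item that merely mentions Ohio. The XML search asks for the `<state>` element and ignores the note.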

HTML Search in Action

Long Live XML!
With XML, the GUI and data are divorced
– Thus, changes to the display do not require messing with the data: a separate style sheet will specify a table display or a list display
Searching the data is easy and efficient
– Search engines can parse description-bearing tags rather than muddling in the data. Tags provide them with the intelligence they otherwise lack
Complex relationships (trees, inheritance, classes) can be communicated
The code is much more legible to a lay person
– It is obvious that <ID>911</ID> represents an ID, whereas <LI>911 might not. XML is self-describing

Why isn't it there if it is so good?
No XML applications…
– IE 5.0 provides some support for XSL and XML if the output is HTML
– Netscape 5.0 (Mozilla) also implements support for XML but not for XSL
A quote:
"XML isn't about display -- it's about structure. This has implications that make the browser question secondary. So the whole issue of what is to be displayed and by what means is intentionally left to other applications. You can target the same XML (with different XSL) for different devices (standard web browser, palm pilot, printer, etc.). You should not get the impression that XML is useless until browsers support it. This is definitely not true -- we are using it at NASA in ways where no browser plays any role." - Ken Sall, NASA IT Manager

XML Design Goals
Enable better search algorithms (metadata)
Enable presentation of various views for the same data
Integrate data from different sources
Provide easy use over the Internet
Create documents readable even by humans
Support data interchange
Enable easy development of document processing applications

XML - Summary
Extensible Markup Language - a subset of Standard Generalized Markup Language (SGML)
Universal format for describing structured data on the Web
Specification developed by the World Wide Web Consortium (W3C), supervised by the XML Working Group

Applications of XML
XML languages
XML protocols
Support for XML
– Client side
– Server side
XML and databases
Data interchange

XML Deployment
XML is a basis for the development of industry language and protocol standards
Corporations and academic organizations form special organizations (consortiums or forums) in order to develop standards for whole branches of industry. Examples: the World Wide Web Consortium and the WAP Forum

Extensible HyperText Markup Language (XHTML)
XML-based syntax
Extensibility through XHTML modules allows the combination of existing and new feature sets when developing content and when designing new user agents (web browsers, portable devices, etc.)
Examples of modules:
– required modules: structure, basic text, hypertext, lists
– optional modules: presentation, forms, tables, images, stylesheets, applets, frames, etc.
XHTML is designed with general user agent interoperability in mind: XHTML documents should be displayable on any type of XHTML-compliant device
Current version - XHTML™ 1.0, DTD specification available at the http://www.w3.org site

Synchronized Multimedia Integration Language (SMIL)
SMIL allows developers to mix media presentations to be presented and synchronized with each other
For example, a SMIL document can specify:
– the position where the visual content appears in the player
– when audio or video (or another type of stream) starts and stops playing
Users need a special player to view SMIL documents
Products supporting SMIL: RealNetworks RealPlayer, Apple QuickTime
See http://www.empirenet.com/~joseram/smil_intro/smil_intro.html for a tutorial about SMIL written in SMIL
Current version - SMIL 1.0, specification available at the http://www.w3.org site

Wireless Application Protocol (WAP) and Wireless Markup Language (1)
Forecasted users of wireless services by 2001 - 530 million
Current and upcoming devices have multimedia capabilities: receiving and sending e-mail, accessing the Internet
Wireless Application Protocol - a standard for the presentation and delivery of wireless information and telephony on mobile phones and other wireless terminals
– handset manufacturers that represent 90 percent of the world market support this standard
Wireless Markup Language (WML) - part of the standard, designed to describe information to be presented on small displays
WML documents can be accessed over the Internet using the standard HTTP protocol
– traditional servers can be used for hosting WML documents

Simple Object Access Protocol (SOAP)
Support for Remote Procedure Call and messaging mechanisms over various protocols (for example, HTTP), implemented in XML
Describes conventions for the definition of:
– method calls
– method parameters
– results of method calls
– serialization mechanisms for encoding application-defined data types
Since SOAP messages can be transported over HTTP, the currently deployed Web infrastructure becomes one distributed computing platform (distributed objects can be placed on HTTP servers)
Current version - SOAP 1.1 (status: note), specification available at the http://www.w3.org site
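To make the idea of a method call encoded in XML concrete, here is a rough Python sketch that builds a SOAP 1.1 envelope with the standard `xml.etree.ElementTree` module. The envelope namespace is the one defined by SOAP 1.1. The application namespace `urn:example:parts`, the method `GetPrice`, and the parameter `partId` are all hypothetical. Sending the message over HTTP is left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:parts"                            # hypothetical application namespace

def build_call(method, params):
    """Return a SOAP 1.1 envelope (as a string) encoding one method call."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")  # the method call element
    for name, value in params.items():                   # one child per parameter
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

message = build_call("GetPrice", {"partId": "p001"})
print(message)
```

The result of the call would travel back in the same kind of envelope, with a response element in the body in place of the call element.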

Support for XML in Web Browsers
Internet Explorer 5.0+
– Extensible Markup Language
– Extensible Stylesheet Language
– Cascading Stylesheets
– Document Object Model
– Data Islands
Mozilla 5.0
– Extensible Markup Language
– Cascading Stylesheets
– Document Object Model
– Graphical User Interface built using XUL (XML User Interface Language): users can provide their own user interface documents to customize the layout of the browser
Microbrowsers for portable devices
– Wireless Markup Language

Support for XML on Server Side
Web servers can host XML documents
XML documents can be dynamically generated by servlets, JSP pages, and ASP pages
XML adapters allow translation from application-specific formats to XML
XML documents can be stored in databases for fast retrieval
Enterprise applications with XML processing functionality can be easily built using available XML parser components and XSL processors

XML Document and Database (1)

Part Name   Part ID   Price   InStock
window      001       $40     yes
muffler     002       $150    yes
door        003       $30     no

Information stored in database

XML Document and Database (2)

<store>
  <part id="p001">
    <part-name>window</part-name>
    <price>40</price>
    <instock>yes</instock>
  </part>
  <part id="p002">
    <part-name>muffler</part-name>
    <price>150</price>
    <instock>yes</instock>
  </part>
</store>

The same information represented as an XML document
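The mapping from database rows to an XML document can be sketched in Python. The rows are copied from the table on the previous slide. The function name `rows_to_xml` is made up, and using a plain list of tuples in place of a real database query is an assumption made to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Rows from the table on the previous slide: (part name, part ID, price, in stock).
ROWS = [
    ("window", "001", 40, True),
    ("muffler", "002", 150, True),
    ("door", "003", 30, False),
]

def rows_to_xml(rows):
    """Build a <store> document with one <part> element per database row."""
    store = ET.Element("store")
    for name, part_id, price, in_stock in rows:
        part = ET.SubElement(store, "part", id=f"p{part_id}")
        ET.SubElement(part, "part-name").text = name
        ET.SubElement(part, "price").text = str(price)
        ET.SubElement(part, "instock").text = "yes" if in_stock else "no"
    return store

store = rows_to_xml(ROWS)
print(ET.tostring(store, encoding="unicode"))
```

Each column becomes a child element and the key becomes the `id` attribute, following the layout used on the slide.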

Data Interchange
One of the most costly aspects of Enterprise Application Integration is the conversion of proprietary data formats to other data formats
XML - a new data interchange standard
Information handled by different applications and data sources can be converted into XML to provide a uniform data format
Using XML:
– applications can exchange data easily
– application-specific data can be used on the Internet
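A minimal sketch of such an exchange, in Python: one hypothetical application exports its comma-separated records as XML, and a second hypothetical application imports that XML into its own lookup table. Both applications, their formats, and the function names are invented for illustration. The `<store>` layout follows the earlier database slide.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text):
    """Application A: export its comma-separated records as a <store> document."""
    store = ET.Element("store")
    for row in csv.DictReader(io.StringIO(csv_text)):
        part = ET.SubElement(store, "part", id=row["id"])
        ET.SubElement(part, "part-name").text = row["name"]
        ET.SubElement(part, "price").text = row["price"]
    return ET.tostring(store, encoding="unicode")

def xml_to_prices(xml_text):
    """Application B: import the shared XML into its own {id: price} table."""
    root = ET.fromstring(xml_text)
    return {part.get("id"): int(part.findtext("price"))
            for part in root.findall("part")}

exported = csv_to_xml("id,name,price\np001,window,40\np002,muffler,150\n")
prices = xml_to_prices(exported)
print(prices)
```

Neither application needs to know the other's native format. Each one only needs a converter to and from the shared XML.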