2 December 2005 Web Technologies XML and Related Technologies Prof. Beat Signer Department of Computer Science Vrije Universiteit Brussel http://www.beatsigner.com

XML and Related Technologies - Web Technologies (1019888BNR)


Beat Signer - Department of Computer Science - [email protected] 2November 25, 2016

What is XML?

Standardised text format for (semi-)structured information

Meta markup language

tool for defining other markup languages

- e.g. XHTML, WML, VoiceXML, SVG, Office Open XML (OOXML)

Data surrounded by text markup that describes the data

ordered labeled tree

<note date="2013-10-17">
  <to>Reinout Roels</to>
  <from>Beat Signer</from>
  <content>Let us discuss exercise 7 this afternoon ...</content>
</note>
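The "ordered labeled tree" view of the note can be made visible with a few lines of Python. This is only a minimal sketch using the standard library's ElementTree parser; the helper name `outline` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

NOTE = """<note date="2013-10-17">
  <to>Reinout Roels</to>
  <from>Beat Signer</from>
  <content>Let us discuss exercise 7 this afternoon ...</content>
</note>"""


def outline(element, depth=0):
    """Return one line per element: indentation, tag (label), attributes, text."""
    text = (element.text or "").strip()
    lines = [f"{'  ' * depth}{element.tag} {element.attrib} {text!r}"]
    for child in element:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(NOTE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The labels are the element names, the order of the children is the document order, and the attribute is attached to the note element.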


... and What is it Not?

XML is not a programming language

however, it can be used to represent program instructions, configuration files etc.

note that there is an XML application (XSLT) which is Turing complete

XML is not a database

XML is often used to store long-term data but it lacks many database features

many existing databases offer an XML import/export

more recently there also exist native XML databases

- e.g. BaseX or eXist


XML Example

<?xml version="1.0"?>
<publications>
  <publication type="inproceedings">
    <title>An Architecture for Open Cross-Media Annotation Services</title>
    <author>
      <surname>Signer</surname>
      <forename>Beat</forename>
    </author>
    <author>
      <surname>Norrie</surname>
      <forename>Moira</forename>
    </author>
    <howpublished>Proceedings of WISE 2009</howpublished>
    <month>10</month>
    <year>2009</year>
  </publication>
  <publication type="article">
    ...
</publications>


Evolution of XML

Descendant of Standard Generalized Markup Language (SGML)

SGML is more powerful but (too) complex

HTML is an SGML application

XML was developed as an “SGML-Lite” version

XML 1.0 published in February 1998

Since the initial XML release numerous associated standards have been published


Why has XML been so Successful?

Simple

General

Accepted

Many associated standards

Many (freely) available tools


XML Specification

Provides a grammar for XML documents in terms of

placement of tags

legal element names

how attributes are attached to elements

...

General tools

parsers that can parse all XML documents regardless of particular application tags

editors and various programming APIs

Specification available at http://www.w3.org/TR/xml/


XML Tree Document Structure

An XML document tree can contain 7 types of nodes

root node

- always exactly one root node

element nodes

- element node with optional attribute nodes

attribute nodes

- name/value pairs

text nodes

- text belonging to an element or attribute

comment nodes

processing instruction nodes

- pass information to a specific application via <? ... ?>

namespace nodes
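Most of these node types can be observed with a DOM parser. The following is a minimal sketch using Python's standard `xml.dom.minidom`; the processing instruction target `app-hint` and the helper `node_types` are assumptions made for illustration. Note that the DOM calls the root node a document node and exposes namespace declarations as attributes rather than as separate namespace nodes.

```python
from xml.dom import Node, minidom

DOC = """<?xml version="1.0"?>
<?app-hint demo?>
<!-- a comment -->
<note date="2013-10-17"><to>Reinout Roels</to></note>"""

NODE_TYPE_NAMES = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def node_types(node, found=None):
    """Collect the node type names of a DOM tree in document order."""
    found = [] if found is None else found
    found.append(NODE_TYPE_NAMES.get(node.nodeType, "other"))
    if node.nodeType == Node.ELEMENT_NODE:
        # attribute nodes hang off their element, they are not children
        for index in range(node.attributes.length):
            attribute = node.attributes.item(index)
            found.append(NODE_TYPE_NAMES[attribute.nodeType])
    for child in node.childNodes:
        node_types(child, found)
    return found


types = node_types(minidom.parseString(DOC))
print(types)
```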


Well-Formedness and Validity

An XML document is well-formed if it follows the rules of the XML specification

An XML document can be valid according to its Document Type Definition (DTD) or XML Schema

completely self-describing about its structure and content through

- the document content

- auxiliary files referred to in the document

validity can be checked by a validating XML parser

- online validation service available at http://validator.w3.org

<!ELEMENT publication (title, author+, howpublished?, month, year)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (surname, forename)>
<!ATTLIST publication type CDATA #IMPLIED>
…
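Well-formedness, as opposed to validity, can be checked with any XML parser. The sketch below uses Python's standard ElementTree parser, which is non-validating: it reports violations of the XML syntax rules but does not check a document against a DTD or XML Schema. The function name and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "matching tags": "<p>text</p>",
    "missing end tag": "<p>text",
    "case mismatch": "<Bold>text</BOLD>",
    "unquoted attribute": "<note date=2013-10-17/>",
    "two root elements": "<a/><b/>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'not well-formed'}")
```

Checking validity additionally requires a validating parser together with the DTD or schema.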


Differences Between XML and HTML

XML is a tool for specifying markup languages rather than a markup language itself

specify “special markup languages for special applications”

XML is not a presentation language

defines content rather than presentation

HTML mixes content, structure and presentation

XML was designed to support a number of applications and not just web browsing

XML documents should be well-formed and valid

XML documents are easier to process by a program (parser)


Differences Between XML and HTML ...

Readability is more important than conciseness

e.g. <tablerow> rather than <tr>

Matching of tags is case sensitive

e.g. start tag <Bold> does not match end tag </BOLD>

Markup requires matching start and end tags

e.g. <p> and </p>

exceptions are special non-enclosing tags, e.g. <br/> or <image ... />

Whitespace in text is significant


XHTML

XHTML is a reformulation of HTML to make it an XML application

we accept that HTML is here to stay

improve it by using XML with minimal effort

W3C stopped their work on XHTML (as discussed in lecture 3)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Vrije Universiteit Brussel</title>
  </head>
  <body>
    ...
  </body>
</html>


Differences Between XHTML and HTML

Documents must be valid

XHTML namespace must be declared in <html> element

<head> and <body> elements cannot be omitted

<title> element must be the first element in the <head>

End tags are required for non-empty elements

empty elements must consist of a start-tag and end-tag pair or an empty-element tag (e.g. <br/>)

Element and attribute names must be in lowercase

Attribute values must always be quoted

Attribute names cannot be used without a value


XML Technologies

XPointer

XLink

XPath

XQuery

XSLT


Overview of XML Technologies

XPath and XPointer

addressing of XML elements and parts of elements

XSL (Extensible Stylesheet Language)

transforming XML documents (XSLT) and formatting them (XSL-FO)

XLink (XML Linking Language)

linking in XML

XQuery (XML Query Language)

querying XML documents

Document Type Definition (DTD) and XML Schema

definition of schemas for XML documents

DTDs have a very limited expressive power

XML Schema introduces datatypes, inheritance etc.


Overview of XML Technologies ...

SAX (Simple API for XML)

event-based programming API for reading XML documents

DOM (Document Object Model)

programming API to access and manipulate XML documents as tree structures

RDF (Resource Description Framework)

framework used by the Semantic Web, with RDF/XML as its XML encoding


Document Object Model (DOM)

Defines a language neutral API for accessing and manipulating XML documents as a tree structure

have already seen the HTML DOM model

The entire document must be read and parsed before it can be used by a DOM application

DOM parser not suited for large documents!

Two different types of DOM Core interfaces for accessing supported content types

generic Node interface

node type-specific interfaces

Various available DOM parsers

e.g. JDOM parser specifically for Java
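As a rough illustration of the two kinds of interfaces, the following sketch uses `xml.dom.minidom` from the Python standard library instead of a Java parser such as JDOM. The shortened publication document is an assumption derived from the earlier example.

```python
from xml.dom import minidom

DOC = (
    "<publications><publication type='inproceedings'>"
    "<title>An Architecture for Open Cross-Media Annotation Services</title>"
    "<year>2009</year>"
    "</publication></publications>"
)

# The whole document is parsed into an in-memory tree before it can be used
document = minidom.parseString(DOC)

# Type-specific interface: only element nodes offer getAttribute()
publication = document.getElementsByTagName("publication")[0]
pub_type = publication.getAttribute("type")

# Generic Node interface: firstChild and nodeValue exist on every node
title_text = publication.firstChild.firstChild.nodeValue

# Manipulation: insert a <month> element in front of <year>
month = document.createElement("month")
month.appendChild(document.createTextNode("10"))
year = publication.getElementsByTagName("year")[0]
publication.insertBefore(month, year)

child_names = [child.nodeName for child in publication.childNodes]
serialised = document.documentElement.toxml()
print(child_names)
print(serialised)
```

Because the complete tree is kept in memory, this style of processing becomes expensive for very large documents.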


Document Object Model (DOM) ...

Different DOM levels

DOM Level 1

- concentrates on HTML and XML document models

- contains functionality for document navigation and manipulation

DOM Level 2

- supports XML Namespaces

- stylesheet object model and operations to manipulate it

DOM Level 3

- specifies content models (DTD and Schemas)


XPath

Expression language to address elements of an XML document (used in XPointer, XSLT and XQuery)

A location path is a sequence of location steps separated by a slash (/)

various navigation axes such as child, parent, following etc.

have a look at our XSLT/XPath reference document that is available on PointCarré for all the details about XPath

XPath expressions look similar to file pathnames

/publications/publication

/publications/publication[year>2008]/title

//author[3]
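Location paths like the ones above can be tried out in Python. This is a sketch under stated assumptions: the standard ElementTree module only implements a subset of XPath (child steps, `//`, positional predicates and equality predicates, but no numeric comparisons such as `year>2008`), paths are evaluated relative to the root element, and the second publication in the sample document is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

DOC = """<publications>
  <publication type="inproceedings">
    <title>An Architecture for Open Cross-Media Annotation Services</title>
    <author><surname>Signer</surname><forename>Beat</forename></author>
    <author><surname>Norrie</surname><forename>Moira</forename></author>
    <year>2009</year>
  </publication>
  <publication type="article">
    <title>Placeholder Title</title>
    <author><surname>Placeholder</surname><forename>Author</forename></author>
    <year>2007</year>
  </publication>
</publications>"""

root = ET.fromstring(DOC)  # root is the <publications> element

# /publications/publication
publications = root.findall("publication")

# /publications/publication[year>2008]/title
# (no numeric comparison in ElementTree, so the filter is done in Python)
recent_titles = [
    p.findtext("title") for p in publications if int(p.findtext("year")) > 2008
]

# //author[2]: every <author> that is the second <author> child of its parent
second_authors = [a.findtext("surname") for a in root.findall(".//author[2]")]

# equality predicates are supported directly
titles_2009 = [t.text for t in root.findall("publication[year='2009']/title")]

print(recent_titles, second_authors, titles_2009)
```

A full XPath 1.0 implementation, as used by XSLT processors, evaluates the original expressions directly.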


XML Pointer Language (XPointer)

Address points or ranges in an XML document

Uses XPath expressions

Introduces addressing relative to elements

supports links to points without anchors


XML Linking Language (XLink)

Standard way for creating links in XML documents

Fixes limitations of HTML links where

anchors must be placed within documents

only entire documents or predefined marks (#) can be linked

only one-to-one unidirectional links are supported

XLinks can be defined in separate documents

third-party link (metadata) server

Two types of links

simple links

- associate exactly one local and one remote resource (similar to HTML links)

extended links

- associate an arbitrary number of resources


XML Linking Language (XLink) ...

Other XLink features

linking parts of resources

links can be defined at the attribute level

typed links

The Annotea project uses XLink for managing external annotations

for example used in the Amaya Web Browser

New Microsoft Edge browser

annotation of arbitrary webpages

[Figure: Annotation in the Amaya Browser]


Simple API for XML (SAX)

Event-based API for XML document parsing

many free SAX parsers available (e.g. Apache Xerces)

Scans the document from start to end

invokes callback methods

Different kinds of events

start of document

end of document

start tag of an element

end tag of an element

character data

processing instruction

SAX parser needs less memory than DOM parser

DOM parser often uses SAX parser to build the DOM tree
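The callback style can be sketched with the `xml.sax` module of the Python standard library (rather than Apache Xerces). The handler class `EventLogger` and the processing instruction target `app-hint` are assumptions made for illustration; the callback names are those of the SAX `ContentHandler` interface.

```python
import xml.sax

DOC = b"""<?xml version="1.0"?>
<?app-hint demo?>
<note date="2013-10-17"><to>Reinout Roels</to><from>Beat Signer</from></note>"""


class EventLogger(xml.sax.ContentHandler):
    """Records the callbacks invoked by the parser, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []  # structural events
        self.text = []    # character data chunks

    def startDocument(self):
        self.events.append("start of document")

    def endDocument(self):
        self.events.append("end of document")

    def startElement(self, name, attrs):
        self.events.append(f"start tag <{name}>")

    def endElement(self, name):
        self.events.append(f"end tag </{name}>")

    def characters(self, content):
        # a parser may deliver one text run in several chunks
        self.text.append(content)

    def processingInstruction(self, target, data):
        self.events.append(f"processing instruction {target}")


handler = EventLogger()
xml.sax.parseString(DOC, handler)
print("\n".join(handler.events))
```

The handler only keeps what it explicitly stores, which is why a SAX-based application can process documents that would not fit into memory as a DOM tree.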


XML Transformations

Developers want to be able to transform data from one format to another

processing of XML documents

- XML to XML transformation

post-processing of documents

- e.g. XML to XHTML, XML to WML, XML to PDF, ...

The Extensible Stylesheet Language Transformations (XSLT) language can be used for that purpose


XSLT Processor

The XSLT processor (e.g. Xalan) applies an XSLT stylesheet to an XML document and produces the corresponding output document

[Figure: XSLT processing. Input document: XML document (with DTD) and XSLT stylesheet → DOM parser → source tree and stylesheet tree → XSLT processor → result tree → output document: XHTML, WML, ... (with DTD)]


XSL Transformations (XSLT)

Most important part of XSL

uses XPath for the navigation

XSLT is an expression-based language based on functional programming concepts

XSLT uses

pattern matching to select parts of documents

templates to perform transformations

Most web browsers support XSLT

transformation can be done on the client side based on an XML document and an associated XSLT document


Example

<?xml version="1.0"?>
<publications>
  <publication type="inproceedings">
    <title>An Architecture for Open Cross-Media Annotation Services</title>
    <author>
      <surname>Signer</surname>
      <forename>Beat</forename>
    </author>
    <author>
      <surname>Norrie</surname>
      <forename>Moira</forename>
    </author>
    <howpublished>Proceedings of WISE 2009</howpublished>
    <month>10</month>
    <year>2009</year>
  </publication>
  <publication type="article">
    ...
</publications>


XSLT Stylesheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  ...
  <xsl:template match="author">
    <p><xsl:value-of select="surname"/></p>
  </xsl:template>
  ...
</xsl:stylesheet>

output

<?xml version="1.0" encoding="utf-8"?>
<html>
  ...
  <p>Signer</p>
  <p>Norrie</p>
  ...
</html>
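To make the effect of the author template more tangible, the rule can be emulated in plain Python. This is not XSLT and not how an XSLT processor is implemented (the Python standard library does not ship one); it is only a sketch that mimics the single template, and the function name `apply_author_template` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = """<publications>
  <publication type="inproceedings">
    <title>An Architecture for Open Cross-Media Annotation Services</title>
    <author><surname>Signer</surname><forename>Beat</forename></author>
    <author><surname>Norrie</surname><forename>Moira</forename></author>
  </publication>
</publications>"""


def apply_author_template(source_root):
    """Mimic <xsl:template match="author">: one <p> per matched <author>."""
    paragraphs = []
    for author in source_root.iter("author"):  # "pattern matching" on author
        paragraph = ET.Element("p")
        # counterpart of <xsl:value-of select="surname"/>
        paragraph.text = author.findtext("surname")
        paragraphs.append(paragraph)
    return paragraphs


html = ET.Element("html")
html.extend(apply_author_template(ET.fromstring(SOURCE)))
result = ET.tostring(html, encoding="unicode")
print(result)
```

In real XSLT the processor decides which template applies to which node; here that selection is hard-coded in the loop.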


Other XSLT Statements

<xsl:for-each select="...">

select every XML element of a specified node-set

<xsl:if test="...">

conditional test

<xsl:sort select="..."/>

sort the output

...

Have a look at the XSLT/XPath reference document that is available on PointCarré

in exercise 7 you will have the chance to implement and execute different XSLT transformations


XML for Data Interchange

Standard representation to exchange information between different systems

General way to query data from different systems

e.g. via the XML Query (XQuery) language

Connect applications running on different operating systems and computers with different architectures

XML Remote Procedure Call (XML-RPC)

Simple Object Access Protocol (SOAP) which is a successor of XML-RPC and used for accessing Big Web Services

- discussed later in the course


XML Remote Procedure Call (XML-RPC)

XML-RPC specification released in April 1998

Advantages

XML-based lingua franca understood by different applications

HTTP as carrier protocol

not tied to a single object model (as for example in CORBA)

easy to implement (based on HTTP and XML standards)

lightweight protocol

built-in error handling

Disadvantages

slower than specialised protocols that are used in closed networks


XML-RPC Request and Response

XML-RPC Request

POST /RPC2 HTTP/1.0
User-Agent: Java1.2
Host: macrae.vub.ac.be
Content-Type: text/xml;charset=UTF-8
Content-length: 245

<?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall>
  <methodName>Math.multiply</methodName>
  <params>
    <param>
      <value><double>128.0</double></value>
    </param>
    <param>
      <value><double>256.0</double></value>
    </param>
  </params>
</methodCall>

XML-RPC Response

HTTP/1.1 200 OK
Connection: close
Content-Length: 159
Content-Type: text/xml
Server: macbain.vub.ac.be

<?xml version="1.0" encoding="ISO-8859-1"?>
<methodResponse>
  <params>
    <param>
      <value><double>32768.0</double></value>
    </param>
  </params>
</methodResponse>
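The XML payloads of such a call do not have to be written by hand. As a minimal sketch, Python's standard `xmlrpc.client` module can build the request body for `Math.multiply` and decode a response body like the one shown; no network connection is opened here, and the exact whitespace of the generated XML differs from the slide.

```python
import xmlrpc.client

# Build the <methodCall> document for Math.multiply(128.0, 256.0)
request_xml = xmlrpc.client.dumps((128.0, 256.0), methodname="Math.multiply")
print(request_xml)

# Decoding a request yields the parameters and the method name again
params, method_name = xmlrpc.client.loads(request_xml)

# Decode a <methodResponse> body such as the one returned by the server
RESPONSE = """<?xml version="1.0" encoding="ISO-8859-1"?>
<methodResponse>
  <params>
    <param>
      <value><double>32768.0</double></value>
    </param>
  </params>
</methodResponse>"""

(result,), _ = xmlrpc.client.loads(RESPONSE)
print(result)
```

For a real call, `xmlrpc.client.ServerProxy` wraps the HTTP POST and the encoding and decoding steps behind an ordinary method call on a proxy object.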


XML-RPC Error Message

XML-RPC Response

HTTP/1.1 200 OK
Connection: close
Content-Length: 159
Content-Type: text/xml
Server: macbain.vub.ac.be

<?xml version="1.0" encoding="ISO-8859-1"?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member>
          <name>faultCode</name>
          <value><int>873</int></value>
        </member>
        <member>
          <name>faultString</name>
          <value><string>Error message</string></value>
        </member>
      </struct>
    </value>
  </fault>
</methodResponse>
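On the client side a fault response is typically turned into an exception. A minimal sketch with Python's standard `xmlrpc.client` module, which raises `xmlrpc.client.Fault` when it decodes a fault structure like the one above:

```python
import xmlrpc.client

FAULT_RESPONSE = """<?xml version="1.0" encoding="ISO-8859-1"?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member>
          <name>faultCode</name>
          <value><int>873</int></value>
        </member>
        <member>
          <name>faultString</name>
          <value><string>Error message</string></value>
        </member>
      </struct>
    </value>
  </fault>
</methodResponse>"""

fault = None
try:
    xmlrpc.client.loads(FAULT_RESPONSE)
except xmlrpc.client.Fault as error:
    # the two struct members become attributes of the exception
    fault = error

print(fault.faultCode, fault.faultString)
```

Note that the HTTP status is still 200 OK; the error is signalled in the XML body only.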


XML-RPC Scalar Values

XML-Tag | Type | Corresponding Java Type

<i4> or <int> | four-byte signed integer | Integer

<boolean> | 0 or 1 | Boolean

<string> | ASCII string | String

<double> | double-precision signed float | Double

<dateTime.iso8601> | date/time | Date

<base64> | base64-encoded binary | byte[]


XML-RPC Composed Values

Complex data types can be represented by nested <struct> and <array> structures

XML-Tag | Type | Corresponding Java Type

<struct> | A structure contains <member> elements and each member contains a <name> and a <value> element | Hashtable

<array> | An array contains a single <data> element which can contain any number of <value> elements | Vector
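The same type mapping exists for other languages. As a sketch, Python's standard `xmlrpc.client` module encodes `int`, `bool`, `str`, `float`, `datetime.datetime` and `bytes` as the scalar tags listed earlier, a `dict` as `<struct>` and a `list` as `<array>`. The method name `demo.echo` and the sample values are assumptions made for illustration.

```python
import datetime
import xmlrpc.client

values = (
    7,                                         # <int>
    True,                                      # <boolean>
    "text",                                    # <string>
    2.5,                                       # <double>
    datetime.datetime(2005, 12, 2, 10, 0, 0),  # <dateTime.iso8601>
    b"\x00\x01",                               # <base64>
    {"name": "Signer", "year": 2009},          # <struct> with <member> elements
    ["XML", "XSLT"],                           # <array> with a single <data>
)

encoded = xmlrpc.client.dumps(values, methodname="demo.echo")
print(encoded)

# use_builtin_types=True decodes dates and binary data to datetime and bytes
decoded, method_name = xmlrpc.client.loads(encoded, use_builtin_types=True)
```

Structs and arrays can be nested arbitrarily, for example a struct whose member values are arrays of structs.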


XML-RPC Example: GOMES

Object-Oriented GUI for the Object Model Multi-User Extended File System

GOMES is implemented in Java and uses XML-RPC to communicate with the Object Model Multi-User Extended File System (OMX-FS) which was implemented in the Oberon programming language

[Figure: GOMES communicating with OMX-FS via XML-RPC]


Framework for Universal Client Access

Generic database interface instead of developing a new interface from scratch for each new device type

The presented eXtensible Information Management Architecture (XIMA) is based on

OMS Java object database

- managing the application data

Java Servlet Technology

generic XML database interface

- separation of content and representation

XSLT

- appropriate XSLT stylesheet chosen based on User-Agent HTTP header field
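The stylesheet selection step could look roughly as follows. This is a hypothetical sketch and not the XIMA implementation (which is based on Java servlets): the User-Agent markers, the stylesheet file names and the function `choose_stylesheet` are all assumptions made for illustration.

```python
# Hypothetical mapping from User-Agent substrings to stylesheet files
STYLESHEETS = [
    ("voicexml", "vxml.xsl"),  # assumed marker for voice browsers
    ("wap", "wml.xsl"),        # assumed marker for WAP browsers
]
DEFAULT_STYLESHEET = "xhtml.xsl"  # assumed fallback for regular web browsers


def choose_stylesheet(user_agent):
    """Pick a stylesheet based on the User-Agent HTTP header field."""
    normalised = user_agent.lower()
    for marker, stylesheet in STYLESHEETS:
        if marker in normalised:
            return stylesheet
    return DEFAULT_STYLESHEET


# Made-up header values for demonstration purposes
print(choose_stylesheet("ExampleVoiceXMLBrowser/1.0"))
print(choose_stylesheet("ExampleWAPPhone/2.0"))
print(choose_stylesheet("Mozilla/5.0"))
```

The same XML content is then transformed with the selected stylesheet, which keeps content and representation separate.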


XIMA Architecture

[Figure: XIMA architecture. HTML, WML and VXML browsers access a Main Entry Servlet, which delegates requests to an HTML Servlet, WML Servlet or VXML Servlet (XML + XSLT → Response). The XML Server builds XML based on JDOM and accesses the OMS Java Workspace through the OMS Java API. OM Model: collections, associations, multiple inheritance and multiple instantiation.]


Generic XIMA Interfaces

[Figures: XHTML Interface and WML Interface]


Voice Interfaces

Trend for ubiquitous information services

small screens, keyboards etc. often clumsy to use

Sometimes it is necessary to have hands-free interfaces

e.g. while driving or operating a machine

Alternative input modality for visually impaired users

Voice interfaces can be accessed by a regular phone

no new device is required

no installation effort

Improvements in speech recognition and text-to-speech synthesis make automatic voice interfaces more feasible

e.g. for call centers


VoiceXML Architecture

Various solutions

development: IBM WebSphere Voice Server SDK

deployment: BeVocal Cafe Voice Portal

[Figure: VoiceXML processing pipeline. Voice input (speech) → Speech Recogniser (converts voice input into text, based on a speech model) → text → Language Analyser (extracts meaning from text, based on a grammar) → meaning → Application Server (gets data (text) from the application database) → text → Speech Synthesiser (generates speech output, based on pronunciation rules) → voice output (speech)]


VoiceXML Architecture (for XIMA)

[Figure: components of the VoiceXML architecture for XIMA: XIMA Framework, Apache Web Server, Tomcat, OMS Java Database, WebSphere Voice Server SDK, BeVocal Voice Portal]


Basic VoiceXML Concepts

Dialogue

conversational state in a form or menu

form

- interaction that collects values for field item variables

menu

- presents user with a choice of options

- transition to next dialogue based on choice

Input

recognition of spoken input (or recording of spoken input)

recognition of DTMF (dual-tone multi-frequency) input

Output

speech synthesis (TTS)

recorded audio files


VoiceXML Form Example

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
  version="2.0">
  <form id="drinkForm">
    <field name="drink">
      <prompt>Would you like to order beer, wine, whisky, or nothing?</prompt>
      <grammar src="drinks.grxml" type="application/srgs+xml"/>
    </field>
    <block>
      <submit next="http://www.wise.vub.ac.be/drinks.php"/>
    </block>
  </form>
</vxml>


VoiceXML Menu Example

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
  version="2.0">
  <menu id="mainMenu">
    <prompt>This is the main menu. What would you like to order? <enumerate/></prompt>
    <choice next="#foodForm">food</choice>
    <choice next="#drinkForm">drink</choice>
  </menu>
  ...
</vxml>
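Since VoiceXML is plain XML, a menu like this one can be inspected with any XML parser. The sketch below uses Python's standard ElementTree module on a shortened copy of the menu (schema location attributes and the elided parts are left out, which is an assumption made to keep the sample self-contained). Because the document declares a default namespace, all element names have to be qualified when searching.

```python
import xml.etree.ElementTree as ET

VXML = """<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <menu id="mainMenu">
    <prompt>This is the main menu. What would you like to order? <enumerate/></prompt>
    <choice next="#foodForm">food</choice>
    <choice next="#drinkForm">drink</choice>
  </menu>
</vxml>"""

# Prefix "v" is chosen locally; only the namespace URI has to match
NS = {"v": "http://www.w3.org/2001/vxml"}

root = ET.fromstring(VXML)
menu = root.find("v:menu", NS)

# Text of the prompt up to the <enumerate/> element
prompt = menu.find("v:prompt", NS).text.strip()

# Each choice maps a spoken option to the dialogue it leads to
choices = {c.text: c.get("next") for c in menu.findall("v:choice", NS)}

print(prompt)
print(choices)
```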


[Figure: voice dialogue flow for browsing the database, with branches for collections, associations and objects. Prompts shown in the figure:]

- The database contains #Collections and #Associations. Would you like to go to the collections, to the associations, directly to an object or back to the main menu?

collections

- The database contains the following # collections. Choose a collection

- Collection 'name' contains #M. Would you like to list the members or go back?

- Collection 'name' contains the following # members. Choose one of the members

associations

- The database contains the following # associations. Choose an association

- Association 'name' contains #A. Would you like to list the members or go back?

- Association 'name' contains the following # associations. Choose a 'domaintype' or a 'rangetype' or say back

objects

- The database contains #Objects. Choose an object or say back

- Object 'oID' is dressed with type 'type' and currently viewed as type 'type'. It contains #Attr, #Links, and #Methods. Would you like to hear the attributes, the links or the methods or go back?

- The object contains the following # attributes

- You can choose among the following links. Choose a link or say back

- You can choose among the following methods. Choose a method or say back

- The result of the method is Result

- You can view the object as the following types. Choose one of the types or say back


Example: Avalanche Forecasting System

Project to provide WAP and voice access


Other XML Applications

Synchronized Multimedia Integration Language (SMIL)

animations (timing, transitions etc.)

Mathematical Markup Language (MathML)

mathematical notations (content and structure)

Scalable Vector Graphics (SVG)

two-dimensional vector graphics (static or dynamic)

Ink Markup Language (InkML)

digital ink representation (e.g. from digital pen)

Note that XML standards can also be combined

e.g. XHTML+Voice Profile 1.0


Other XML Applications …

Office Open XML (OOXML)

file format (ZIP) for representing word processing documents, presentations etc. (e.g. *.docx, *.pptx and *.xlsx)

- various XML files within these ZIP documents

- specific markup languages for different domains (wordprocessingML, presentationML, spreadsheetML, …)

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
  xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  ...
  <a:p>
    <a:r>
      <a:rPr lang="en-GB" dirty="0" smtClean="0" />
      <a:t>Other XML</a:t>
    </a:r>
    <a:r>
      <a:rPr lang="en-GB" dirty="0" smtClean="0" />
      <a:t>Applications ...</a:t>
    </a:r>
    <a:endParaRPr lang="en-GB" dirty="0" />
  </a:p>
  ...
</p:sld>

single slide from a pptx file


Exercise 7

XML and Related Technologies


References

Elliotte Rusty Harold and W. Scott Means, XML in a Nutshell, O'Reilly Media, September 2004

XML and XML Technology Tutorials, http://www.w3schools.com

Masoud Kalali, Using XML in Java, http://refcardz.dzone.com/refcardz/using-xml-java

VoiceXML Version 2.0, http://www.w3.org/TR/voicexml20/


References ...

Amaya Web Browser, http://www.w3.org/Amaya/

XML-RPC Homepage, http://www.xmlrpc.com

B. Signer et al., Aural Interfaces to Databases based on VoiceXML, Proceedings of VDB6, Brisbane, Australia, 2002, http://beatsigner.com/publications/signer_VDB6.pdf

eXtensible Information Management Architecture (XIMA), http://www.beatsigner.com/xima.html


Next Lecture: Web 2.0 Patterns and Technologies