XML Basics

Roberto Bruni, Dipartimento di Informatica, Università di Pisa

Models and Languages for Coordination and Orchestration

IMT- Institutions Markets Technologies - Alti Studi Lucca

Roberto Bruni @ IMT Lucca 8 March 2005

Content
  • XML
  • DTD
  • XML Schema

XML for Web Services
  • Web Services are loosely coupled software components delivered over Internet standard technologies
  • Today the standard technology for interoperability is XML (like it or not)
  • All WS technologies are based on XML

What is XML?
  • XML (eXtensible Markup Language) is an industry-standard, text-based markup language
      - a system-independent way of representing data
      - designed to make data portable
  • Data are identified using tags
      - identifiers enclosed in angle brackets <…>
      - Ex. <message>Hello World</message>
      - collectively, tags are known as "markup"
  • For background and motivation for XML see "XML and the Second-Generation Web" by Jon Bosak and Tim Bray, Scientific American, May 6 1999 - http://www.sciam.com/

"XML and the Second-Generation Web"

Give people a few hints, and they can figure out the rest. They can look at a list of groceries and see shopping instructions. They can look at some rows of numbers and understand the state of their bank account.

Computers, of course, are not that smart; they need to be told exactly what things are, how they are related and how to deal with them.

Extensible Markup Language (XML for short) is a new language designed to do just that, to make information self-describing.

This simple-sounding change in how computers communicate has the potential to extend the Internet beyond information delivery to many other kinds of human activity.

Indeed, since XML was completed in early 1998 by the W3C, the standard has spread like wildfire through science and into industries ranging from manufacturing to medicine.

XML in Practice
  • An XML document is usually stored in a (text) file with extension .xml
  • Document publication
  • Archiving
  • Data exchange
  • Data processing
  • Document-driven programming

The XML Family (in part)
  • XML
  • XML-NameSpace
      - mechanism for disambiguating tag names
  • DTD, XML Schema
      - define the structure of XML documents
  • XSL, XSLT
      - style and style transformation languages
  • XPointer, XLink, XBase, XPath
      - languages for hyperlinks and addressing

HTML and XML
  • Like HTML (HyperText Markup Language), XML encloses data in tags
  • XML tags are case sensitive (HTML tags are not)
  • XML tags relate to the meaning of the enclosed text, while HTML tags tell how to display the enclosed text
  • XML is extensible (you can write your own tags), while with HTML you are limited to using only predefined tags (from the HTML specification)
  • XML documents must be well-formed
      - http://www.ucc.ie/xml/#FAQ-VALIDWF
  • You can define classes of valid documents
      - DTD (Document Type Definition), XML Schema, and others
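
The case-sensitivity point can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module (the choice of Python is an assumption of this example, not something the slides prescribe): a start-tag and end-tag that differ only in case do not match, so the document is rejected.

```python
import xml.etree.ElementTree as ET

# Start-tag and end-tag agree exactly: the document parses.
ok = ET.fromstring("<message>Hello World</message>")
print(ok.tag, ok.text)  # message Hello World

# <Message> and </message> are different names in XML, so the
# parser reports a mismatched tag instead of building a tree.
try:
    ET.fromstring("<Message>Hello World</message>")
    case_error = None
except ET.ParseError as err:
    case_error = err

print("rejected:", case_error is not None)  # rejected: True
```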

Bad HTML: Example

Why is XML important?
  • Plain text, not binary format
      - files can be created and edited with anything from standard text editors to visual development environments
      - easy to debug
      - can store any amount of data (scalability)
  • Data identification
      - XML describes the kind of each piece of data
      - data are easy to search, extract, process, use
  • Stylability
      - XML is inherently style free, but you can use different stylesheets to produce output in PostScript, LaTeX, PDF or other formats (even ones not invented yet!)

What Makes XML Portable?
  • XML documents are written in text format
      - which is readable by both human beings and text-editing software
  • A schema gives XML data its portability
      - a parser uses schemas to understand the structure of valid documents (and to validate documents)
  • XML documents do not include formatting instructions
      - they can be easily displayed in various ways

A Debatable Argument
  • Since identifying the data gives you some sense of
      - what it means
      - how to interpret it
      - what you should do with it
  • XML is sometimes described as a mechanism for specifying the semantics (meaning) of the data!
  • Advanced reading: "The Essence of XML" by J. Siméon and P. Wadler, Proc. POPL 2003.

Example

<message>
  <to>[email protected]</to>
  <from>[email protected]</from>
  <subject>XML class</subject>
  <text>
    What is XML?
  </text>
</message>
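
As an illustration of how such a document is consumed, here is a small sketch that parses the message with Python's standard xml.etree.ElementTree and collects each child element by tag name. The two e-mail addresses are hypothetical stand-ins, since the originals are not recoverable from the slide.

```python
import xml.etree.ElementTree as ET

# The addresses below are hypothetical placeholders, not the slide's values.
MESSAGE = """\
<message>
  <to>student@example.org</to>
  <from>teacher@example.org</from>
  <subject>XML class</subject>
  <text>What is XML?</text>
</message>"""

root = ET.fromstring(MESSAGE)

# Each child of <message> is one tagged piece of data.
fields = {child.tag: child.text for child in root}

print(root.tag)           # message
print(fields["subject"])  # XML class
print(fields["text"])     # What is XML?
```

Because every piece of data carries its own tag, the program can look values up by name rather than by position.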

XML Features
  • The ability of one tag to contain others lets XML represent any hierarchical data structure
  • Documents can contain comments
      - XML comments look just like HTML comments
  • Tags can have attributes (like HTML)
  • Inline reusability
      - XML entities can be included "in line" in a document

Example Revised

<message to="[email protected]"
         from="[email protected]"
         subject="XML class">
  <!-- Revised using attributes and empty tags -->
  <text>
    What is XML?
  </text>
  <unread />
</message>
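
A sketch of how the revised document looks to a program, again assuming Python's xml.etree.ElementTree and hypothetical addresses: attributes are read from the element itself, the comment is skipped by the default parser, and the empty tag appears as a child with no text.

```python
import xml.etree.ElementTree as ET

# Addresses are hypothetical placeholders for the ones elided on the slide.
REVISED = """\
<message to="student@example.org"
         from="teacher@example.org"
         subject="XML class">
  <!-- Revised using attributes and empty tags -->
  <text>What is XML?</text>
  <unread />
</message>"""

msg = ET.fromstring(REVISED)

# Attributes belong to the <message> element, not to its children.
print(msg.get("subject"))     # XML class

# The default parser drops comments, so only two children remain.
children = [child.tag for child in msg]
print(children)               # ['text', 'unread']

# An empty element is present in the tree but carries no text.
unread = msg.find("unread")
is_unread = unread is not None
print(is_unread, unread.text)  # True None
```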

Heuristics for Designing XML Data Structures I
  • Attributes or elements?
  • Forced choices for elements
      - the data contains substructures
      - the data contains multiple lines or paragraphs
      - multiple occurrences are possible
      - data changes frequently
  • Forced choices for attributes
      - data is a small simple string that rarely (if ever) changes
      - DTDs are used and data is confined to a small number of fixed choices

Heuristics for Designing XML Data Structures II
  • Attributes or elements?
  • Stylistic criteria (a bit nebulous, as for art or music)
  • Visibility
      - if data is intended to be shown, then elements are better
      - otherwise attributes are ok
  • Containers vs characteristics
      - elements are containers
      - attributes are characteristics of containers
  • More at http://www.oasis-open.org/cover/elementsAndAttrs.html

Heuristics for Designing XML Data Structures: Example

  • In XML documents for slideshows
      - the type of the slide (which audience it is aimed at) is best modeled as an attribute: it is a characteristic, not to be shown
      - the title of the slide is part of the content and it has to be displayed: better to have it as an element
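
The slideshow heuristic can be sketched as follows. The element and attribute names (slideshow, slide, type, title) and the audience values are hypothetical, chosen only to illustrate the idea: the attribute is used to select slides and is never displayed, while the title element is the content that gets shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the slide describes the design but not the actual tags.
SLIDESHOW = """\
<slideshow>
  <slide type="all"><title>Overview</title></slide>
  <slide type="tech"><title>Parser internals</title></slide>
  <slide type="exec"><title>Costs and benefits</title></slide>
</slideshow>"""

def titles_for(audience):
    """Return the titles of slides aimed at this audience or at everyone."""
    show = ET.fromstring(SLIDESHOW)
    return [
        slide.findtext("title")                     # element: displayed content
        for slide in show.findall("slide")
        if slide.get("type") in (audience, "all")   # attribute: a characteristic
    ]

print(titles_for("tech"))  # ['Overview', 'Parser internals']
print(titles_for("exec"))  # ['Overview', 'Costs and benefits']
```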

The XML Prolog
  • An XML file normally starts with a prolog (the XML declaration; optional in XML 1.0 but recommended)
      - it has the form of a processing instruction <?app instr ?>
      - minimal: <?xml version="1.0"?>
  • Attributes
      - version: XML version used in the data
      - (optional) encoding: character set used to encode data (ex. ISO-8859-1, UTF-8)
      - (optional) standalone: whether or not the document depends on external declarations (e.g. an external DTD or external entities)
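
To see what the encoding attribute is for, consider this minimal sketch (Python's xml.etree.ElementTree is an assumed choice, and the sample element is made up for the illustration). The same text is stored as two different byte sequences, and in each case the prolog tells the parser how to decode them.

```python
import xml.etree.ElementTree as ET

# One document, serialized in two encodings; each prolog names its own.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>Università</name>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><name>Università</name>'
).encode("utf-8")

# The byte sequences on disk are different...
same_bytes = latin1_doc == utf8_doc

# ...but the parser uses the declared encoding and recovers the same text.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

print(same_bytes)                 # False
print(latin1_text == utf8_text)   # True
```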

XML Parser
  • Document processing
      - Read phase
      - Syntax check
      - Validation (validating parsers only)
      - Error reporting (fatal errors, errors, warnings)
      - Access to data
  • DOM (Document Object Model) parser
      - generates the whole tree-like data structure of the document (favours random access)
  • SAX (Simple API for XML) parser
      - event-driven serial access protocol
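
The difference between the two access styles can be sketched with the DOM and SAX implementations in Python's standard library (xml.dom.minidom and xml.sax); the sample document and the TagCollector class are made up for this illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = b"<message><to>a</to><from>b</from><subject>XML class</subject></message>"

# DOM: the whole tree is built in memory first, then any node
# can be reached directly (random access).
dom = minidom.parseString(DOC)
subject = dom.getElementsByTagName("subject")[0].firstChild.data

# SAX: no tree is built; the parser calls back into a handler
# for each event as it reads the input from start to end.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

collector = TagCollector()
xml.sax.parseString(DOC, collector)

print(subject)         # XML class
print(collector.tags)  # ['message', 'to', 'from', 'subject']
```

DOM is convenient when the program needs to move around the document; SAX keeps memory use low because nothing is retained unless the handler stores it.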

XML Browsing
  • Some browsers can parse XML
      - MS Internet Explorer and Netscape Navigator use DOM parsers
  • Elements can be hidden

Well-Formed Documents I
  • All attribute values must be in quotes
      - The single-quote character (the apostrophe) may be used if the value contains a double-quote character, and vice versa
      - For isolated quotes as data, you can use &apos; or &quot;
      - Do not under any circumstances use the automated typographic (‘curly’) inverted commas substituted by some word-processors for quoting attribute values (as in some of these PowerPoint slides!!)
  • Elements must nest inside each other properly
      - no overlapping markup (same as for HTML)
  • Exactly one root element (after the declaration)
  • All tags must be balanced
      - every element which may contain character data or sub-elements must have both the start-tag and the end-tag present
      - (omission is not allowed except for EMPTY elements)
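
These rules can be tried out against a real parser. The sketch below (Python's xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are made up for the illustration) feeds one fragment per rule to the parser and records whether it is accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "double-quoted attribute": '<a href="x">ok</a>',
    "single-quoted attribute": "<a title='say \"hi\"'>ok</a>",
    "unquoted attribute":      "<a href=x>bad</a>",
    "curly-quoted attribute":  "<a href=\u201cx\u201d>bad</a>",
    "overlapping markup":      "<b><i>bad</b></i>",
    "two root elements":       "<a/><b/>",
    "missing end-tag":         "<a><b>bad</a>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, accepted in results.items():
    print(f"{label}: {'well-formed' if accepted else 'rejected'}")
```

Only the first two fragments are accepted; each of the others violates one of the rules listed above.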

Well-Formed Documents II
  • Any EMPTY elements (like HTML's IMG, BR, HR and others) must either end with /> or they must look like non-EMPTY elements by having a real end-tag (but no content)
      - Example: <br> would become either <br/> or <br></br> (with nothing in between)
  • There must not be any isolated markup-start characters (< or &) in your text data
      - They must be given as &lt; and &amp;
  • The sequence ]]> may only occur as the end of a CDATA marked section
      - if you are using it for any other purpose it must be given as ]]&gt;
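
A short sketch of both points, assuming Python's standard library: the parser accepts the two legal spellings of an empty element but not a bare <br>, and xml.sax.saxutils.escape rewrites the markup characters (including the > of ]]>) so that arbitrary text can be placed inside an element. The helper name parses and the sample strings are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def parses(text):
    """Return True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Empty elements: <br/> and <br></br> are fine, a lone <br> is not.
empty_tag_ok = parses("<p>line<br/>next</p>")
paired_tag_ok = parses("<p>line<br></br>next</p>")
bare_tag_ok = parses("<p>line<br>next</p>")
print(empty_tag_ok, paired_tag_ok, bare_tag_ok)  # True True False

# Text data: escape() replaces &, < and > with entity references.
raw = "x < y & y > z, end marker ]]>"
escaped = escape(raw)
print(escaped)  # x &lt; y &amp; y &gt; z, end marker ]]&gt;

# After escaping, the text survives a round trip through the parser.
round_trip = ET.fromstring(f"<text>{escaped}</text>").text
print(round_trip == raw)  # True
```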

Well-Formed Documents III
  • XML files with no DTD are considered to have the following entities predefined and thus available for use
      - &lt; (it represents the character < )
      - &gt; (it represents the character > )
      - &apos; (it represents the character ' )
      - &quot; (it represents the character " )
      - &amp; (it represents the character & )
  • With a DTD, all other entities must be declared; the XML 1.0 specification requires processors to recognize these five in any case, but recommends declaring them too for interoperability
  • DTDless well-formed documents may use attributes on any element, but the attributes are all assumed to be of type CDATA.

Parsing Errors: Example

Entities: Example

<?xml version="1.0"?>
<message>
  <from><!-- Deitel and Associates -->
    &#1583;&#1575;&#1610;&#1578;&#1614; &#1604;&#1571;&#1606;&#1583;
  </from>
  <subject>&lt;&quot;it&apos;s me&quot;&gt;</subject>
</message>
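
What the parser hands to the application after resolving these references can be sketched as follows (Python's xml.etree.ElementTree; the document is a shortened variant of the example, keeping only the first two character references).

```python
import xml.etree.ElementTree as ET

# Shortened variant of the slide's example: two numeric character
# references and the five predefined entities.
DOC = """<?xml version="1.0"?>
<message>
  <from>&#1583;&#1575;</from>
  <subject>&lt;&quot;it&apos;s me&quot;&gt;</subject>
</message>"""

root = ET.fromstring(DOC)

# Numeric references become the characters with those code points.
sender = root.findtext("from")
print([ord(ch) for ch in sender])  # [1583, 1575]

# Predefined entities become the literal markup characters.
subject = root.findtext("subject")
print(subject)  # <"it's me">
```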

CDATA Sections
  • When large blocks of text include many special characters, it is inconvenient to use entity references
  • Character Data (CDATA) sections can be used instead
      - analogous to HTML tags <pre> ... </pre>
      - they start with <![CDATA[
      - they finish with ]]>
      - characters in the middle are NOT INTERPRETED by the parser

CDATA: Example

<?xml version="1.0"?>
<diagram>
<![CDATA[
    A --"execute"---> B
    ^
    |
    |"compile"
    |
    C
]]>
</diagram>
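
A minimal sketch of how a parser treats a CDATA section, using Python's xml.etree.ElementTree. The one-line diagram text is a simplified variant of the example, with an extra & and <C> added here (they are not in the slide) to show that markup characters pass through untouched.

```python
import xml.etree.ElementTree as ET

# Inside CDATA the characters <, > and & need no escaping.
# The trailing "& <C>" is an addition made for this illustration.
WITH_CDATA = '<diagram><![CDATA[A --"execute"---> B & <C>]]></diagram>'

# The same content written with entity references instead.
WITH_ENTITIES = '<diagram>A --"execute"---&gt; B &amp; &lt;C&gt;</diagram>'

cdata_text = ET.fromstring(WITH_CDATA).text
entity_text = ET.fromstring(WITH_ENTITIES).text

print(cdata_text)                 # A --"execute"---> B & <C>
print(cdata_text == entity_text)  # True
```

Both spellings deliver the same character data to the application; CDATA only changes how the text is written in the file.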