University of Helsinki – Department of Computer Science

The XML Metalanguage

Mika Raento

[email protected]


Mika Raento The XML Metalanguage – p.1/442 2003-09-15


Preliminaries


Preliminaries

Motivation

Practicalities

Course Overview


Motivation

XML is used to

Publish documents (Linux documentation in DocBook, reference works such as dictionaries) in several formats from the same content

Publish news items on the web via RDF (for example Slashdot, CNN, Mozillazine) that can be incorporated into other web sites or client software

Store program settings and preferences (Gnome)


Motivation

XML is used to

Exchange business documents, such as invoices and inventories (ebXML – new standard for EDI, for example Finnish customs documents)

Make remote procedure calls over the Internet (SOAP = Web Services allows calling Google from your code)

Build message-based large-scale software (SAP R/3 integration, Ascade Cockpit application)


Motivation

XML is

Fairly easy to learn

Human and machine-readable

Lightweight for processing

Easy to find software for


Practicalities

2 study weeks – 8 × 2 hours of lectures – 8 × 2 hours of exercises

Lecturer Mika Raento, D419, available Thu 16–17

Lectures in Finnish, material in English, one exercise group in English

One course exam on Nov 10th

Literature: The XML Companion, 3rd edition, by Neil Bradley, or use the web

Course web site at http://www.cs.helsinki.fi/u/mraento/teaching/xml_s03/

Newsgroup news:hy.opiskelu.tktl.xml


Practicalities

Exam + project work

Two ways to complete project work:

Exercises + smaller project that will be partially done at the exercises. You are allowed to miss at most two exercises. OR

Larger standalone project work (details at the course web page)

So: cancel your registration at an exercise group if you don't think you'll be able to attend


Practicalities

Grading: maximum 60 points

Exam 30 points, 15 points minimum to pass

Project work 30 points, 15 points minimum to pass (10 points from exercise attendance, 20 points from work OR 30 points from larger project work)

3 extra points (above 60) for exercises attended after the minimum six

No exercise points taken into account if you take the separate exam


Course Overview

1. Introduction. History. Motivation

2. From HTML to XML. Well-formedness and validity.

3. DTD basics. Document modelling with DTDs.

4. DTD limitations. Alternatives

5. Namespaces. XML processing.

6. XSLT transformations, XPath. Mind set. Techniques, strength.

7. FO. Basics, mind set, more advanced topics.

8. Combining. Wrapup. Related standards.


Introduction


Introduction

What is XML?

What does it look like?

What does ’metalanguage’ mean?

XML-processors

DTDs

Transformations

Style definitions


eXtensible Markup Language

W3C recommendation

Version 1.0 (1.1 candidate recommendation)

1st edition 1998-02-10, 2nd (current) ed. 2000-10-06

An agreed-upon textual format for representing tree-structured data

For storing, combining, exchanging and publishing information

Human- and machine-readable


XML document instance

<!-- Example document instance -->        ← comment
<university>
  <department>                            ← tag
    <name>
      Department of Computer Science      ← element
    </name>
    <address>
      Teollisuuskatu 23
    </address>
  </department>                           ← tag
</university>


XML document instance

Actual document contents that have been marked up in an agreed way

Self-describing (for humans) tags

Elements and nested elements, meaningful units of information

Text within elements

Comments
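The parts listed above can be seen by loading the example document with a parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper name `outline` is made up for this illustration. It walks the element tree and reports each element's nesting depth, name and text. The comment is not part of the resulting tree.

```python
import xml.etree.ElementTree as ET

# The example document instance from the slides.
DOCUMENT = """<!-- Example document instance -->
<university>
  <department>
    <name>Department of Computer Science</name>
    <address>Teollisuuskatu 23</address>
  </department>
</university>"""


def outline(element, depth=0):
    """Return (depth, tag, text) triples for element and its descendants."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:  # iterating an element yields its child elements
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOCUMENT)
for depth, tag, text in outline(root):
    # Indent by nesting depth to make the tree structure visible.
    print("  " * depth + tag + (": " + text if text else ""))
```

Elements that only contain other elements (`university`, `department`) have no text of their own beyond whitespace, which is why the sketch strips it.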


Logical vs. physical structure

Logical structure

  Logical relationships and constraints

  Describes the structure of the information content

Physical structure

  Entities

  Characters and character set

  Files


XML Processors

XML parser

  Finds errors

  Provides information to applications

Entity (document part) management

  Combines entities into documents

  Combines physical files
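Both roles of an XML processor can be tried out with a small sketch, assuming Python's standard `xml.etree.ElementTree` module as the processor. The function name `check_well_formed` and the entity name `dept` are made up for this illustration. The first part shows the parser reporting an error for a document with a missing end tag; the second shows an internal entity being expanded into the document content.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return str(error)
    return None


GOOD = "<department><name>Computer Science</name></department>"
BAD = "<department><name>Computer Science</department>"  # <name> is never closed

# An internal entity declared in the document type declaration.
WITH_ENTITY = """<!DOCTYPE department [
  <!ENTITY dept "Department of Computer Science">
]>
<department><name>&dept;</name></department>"""

print(check_well_formed(GOOD))  # no error to report
print(check_well_formed(BAD))   # an error message with the position of the problem
print(ET.fromstring(WITH_ENTITY).findtext("name"))  # entity reference replaced by its text
```

The application only sees the expanded text; it does not need to know that part of the content came from an entity.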


Metalanguage XML

XML provides a general syntax for tree-structured data

Users provide an application-specific grammar for this syntax

Element names

Element order and nesting

Certain reserved words

This grammar is called a Document Type Definition, DTD


Example DTD

<!-- Document Type Definition (DTD) example -->
<!ELEMENT university (department+)>
<!ELEMENT department (name, address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>


DTD

Defines the type of the document, or its structure

One rule per element

Name of the element

Allowed content

Grammar for document instances

"Regular expressions" for element content (may be recursive)

Not required in XML
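The idea that a DTD gives one regular expression per element can be illustrated with a rough sketch. Python's standard library has no validating parser, so this is not real DTD validation: the content models of the example DTD are translated by hand into Python regular expressions over child element names, and the names `CONTENT_MODELS` and `validate` are made up for this illustration. Attribute declarations, entities and stray text inside element-only content are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from the example DTD. Each content model becomes a regular
# expression over the space-terminated names of an element's children.
CONTENT_MODELS = {
    "university": r"(department )+",  # (department+)
    "department": r"name address ",   # (name, address)
    "name": r"",                      # (#PCDATA): no child elements
    "address": r"",                   # (#PCDATA): no child elements
}


def validate(element):
    """Return a list of problems found in element and its descendants."""
    problems = []
    model = CONTENT_MODELS.get(element.tag)
    children = "".join(child.tag + " " for child in element)
    if model is None:
        problems.append("undeclared element: " + element.tag)
    elif re.fullmatch(model, children) is None:
        found = children.strip() or "(no child elements)"
        problems.append("bad content in <" + element.tag + ">: " + found)
    for child in element:
        problems.extend(validate(child))
    return problems


valid = ET.fromstring(
    "<university><department><name>CS</name>"
    "<address>Teollisuuskatu 23</address></department></university>"
)
swapped = ET.fromstring(
    "<university><department><address>Teollisuuskatu 23</address>"
    "<name>CS</name></department></university>"
)
print(validate(valid))    # empty list: the document matches the grammar
print(validate(swapped))  # one problem: address comes before name
```

Both documents are well-formed; only the first one is valid with respect to the grammar, because `(name, address)` fixes the order of the children.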


Advantages of using DTDs

Allows a validating parser

Checks that the document instance corresponds to the DTD

Consistent use of the tags

Standard DTDs for specific applications

A common vocabulary


Descriptive (declarative) markup

Generalized markup (not necessarily any formatting information)

Syntactic form without semantics

However, includes element names that describe content

In addition we need a way to describe formatting and e.g. links so that we can present the information to humans


Descriptive vs. procedural markup

Descriptive

Categorises the document into parts

logical (logical parts and their relations)

self-describing

content and format separated

E.g. XML


Descriptive vs. procedural markup

Procedural

Defines what processing is to be carried out on the document

Visible (e.g. LaTeX) or invisible (e.g. Word)

Formatting information

Content and format mixed

This distinction is neither clear-cut (e.g. LaTeX with only \section, \subsection etc., or applications of XML such as XSL) nor all-encompassing, but provides a useful starting point


Stylesheets

Define the presentation (output) format

Possibly several per DTD and/or document instance

Cascading Style Sheets (CSS)

Extensible Stylesheet Language (XSL)

(DSSSL, not covered in this course)


Other applications of XML

Data transfer

Subsets (views) of relational databases

EDI (Electronic data interchange)

Message-based applications

Coarse-grained RPC

Publishing

Documents

Metadata

Etc.


Publishing process

[Figure: an XML document and an XSL/CSS stylesheet are the inputs to formatting, which produces the formatted document]
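The publishing process can be imitated in a few lines. This is only a sketch of the idea, not XSL or CSS: a plain Python dictionary stands in for the stylesheet, and the names `HEADING_STYLE`, `LIST_STYLE` and `format_element` are made up for this illustration. It shows the same document being formatted in two ways without changing its content.

```python
import xml.etree.ElementTree as ET
from html import escape

DOCUMENT = (
    "<university><department>"
    "<name>Department of Computer Science</name>"
    "<address>Teollisuuskatu 23</address>"
    "</department></university>"
)

# Hypothetical "stylesheets": each maps an element name to an HTML tag.
HEADING_STYLE = {"university": "div", "department": "div", "name": "h1", "address": "p"}
LIST_STYLE = {"university": "ul", "department": "li", "name": "b", "address": "i"}


def format_element(element, style):
    """Render element and its children as HTML according to style."""
    tag = style[element.tag]
    text = escape((element.text or "").strip())
    children = "".join(format_element(child, style) for child in element)
    return "<{0}>{1}{2}</{0}>".format(tag, text, children)


root = ET.fromstring(DOCUMENT)
print(format_element(root, HEADING_STYLE))
print(format_element(root, LIST_STYLE))
```

Because the document itself carries no formatting information, switching the presentation only requires switching the stylesheet.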


History of XML – SGML

Standard Generalized Markup Language

Based (in part) on IBM's GML (1969)

Introduced in 1974, ISO standard 1986

Large and complicated

Tools correspondingly large and complicated, hence few and expensive

XML has basically the same expressive power (almost a proper subset)

Still widely used in publishing of very large documents / document collections


History of XML – HTML

HyperText Markup Language (first proposal 1989–1990, HTML 2.0 IETF (RFC) standard 1995)

Huge success, basis for the web

Non-standard extensions problematic (although nowadays mainly in JavaScript/DOM)

Lots of tools available


History of XML – HTML

An SGML DTD + predefined semantics

Practical when the only purpose is to present information

Easy and pleasing presentation in a browser

Focuses on tags for book-like document structure,presentation and linking


XML – SGML – HTML

XML combines good features from both SGML (expressiveness, extensibility) and HTML (simple, easy to understand)

Lots of tools available

XHTML is HTML cast into an XML DTD (instead of SGML)

SGML may still be the best format for very large documents

XML does not solve all problems

All three languages are needed


XML Design principles

1. XML shall be straightforwardly usable over the Internet

2. XML shall support a wide variety of applications

3. XML shall be compatible with SGML

4. It shall be easy to write programs which process XML documents

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero


XML Design principles

6. XML documents should be human-legible and reasonably clear

7. The XML design should be prepared quickly

8. The design of XML shall be formal and concise

9. XML documents shall be easy to create

10. Terseness is of minimal importance


Related standards

XLink: hyperlinking for XML

XPath: locating and selecting XML document parts

XPointer: the reference language for XLink

XSL: Extensible Stylesheet Language

XSLT: XSL Transformations

SAX: stream-oriented XML API

DOM: tree-oriented XML API
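The difference between the two APIs in the list above is easiest to see side by side. The following is a minimal sketch, assuming Python's standard `xml.sax` and `xml.dom.minidom` modules; the class name `TagCollector` is made up for this illustration. With SAX the application receives events while the parser streams through the document; with DOM the whole tree is built first and then navigated.

```python
import xml.dom.minidom
import xml.sax

DOCUMENT = (
    "<university><department>"
    "<name>Department of Computer Science</name>"
    "<address>Teollisuuskatu 23</address>"
    "</department></university>"
)


class TagCollector(xml.sax.ContentHandler):
    """SAX handler that records element names in the order their start tags appear."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


# SAX: the parser calls back into the handler as it reads the input.
collector = TagCollector()
xml.sax.parseString(DOCUMENT.encode("utf-8"), collector)
print(collector.tags)

# DOM: parse everything into a tree, then look things up in it.
dom = xml.dom.minidom.parseString(DOCUMENT)
name = dom.getElementsByTagName("name")[0]
print(name.firstChild.data)
```

The stream-oriented style keeps no document in memory, which suits large inputs; the tree-oriented style is more convenient when the application needs to move around in the document.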


On this course

Introduction, motivation, background

XML documents (XML)

XML DTDs (XML)

XML transformations (XSLT)

Stylesheets (CSS, XSL)

Some tools

Related information (XML Schema, XPath, XML Namespaces)


Literature

Bradley: The XML Companion

http://www.w3.org/XML/

http://www.xml.org

http://www.xml.com

http://www.xmlsoftware.com

http://www.xslinfo.com

http://xml.coverpages.org

http://xml.apache.org

http://www.cs.helsinki.fi/u/ruini/structure/xml/


This lecture in literature

Bradley: 2, 3, 31

or

XML in 10 points, http://www.w3.org/XML/1999/XML-in-10-points

Norman Walsh, A Technical Introduction to XML, http://nwalsh.com/docs/articles/xml/ , up to 'Entity References'

Greg Meyer, An Overview of the XML..., http://www.stsc.hill.af.mil/crosstalk/1998/06/xml.asp

Connolly et al., The Evolution of Web Documents, http://www.xml.com/pub/a/w3j/s3.connolly.html
