University of Helsinki – Department of Computer Science
The XML Metalanguage
Mika Raento
Mika Raento The XML Metalanguage – p.1/442 2003-09-15
Preliminaries
Preliminaries
Motivation
Practicalities
Course Overview
Motivation
XML is used to
Publish documents (Linux documentation in DocBook, reference works such as dictionaries) in several formats from the same content
Publish news items on the web via RDF (for example: Slashdot, CNN, Mozillazine) that can be incorporated into other web sites or client software
Store program settings and preferences (Gnome)
Motivation
XML is used to
Exchange business documents, such as invoices and inventories (ebXML – new standard for EDI, for example Finnish customs documents)
Make remote procedure calls over the Internet (SOAP, i.e. Web Services, allows calling Google from your code)
Build message-based large-scale software (SAP R/3 integration, Ascade Cockpit application)
Motivation
XML is
Fairly easy to learn
Human- and machine-readable
Lightweight to process
Easy to find software for
Practicalities
2 study weeks – 8 × 2 hours of lectures – 8 × 2 hours of exercises
Lecturer Mika Raento, D419, available Thu 16–17
Lectures in Finnish, material in English, one exercise group in English
One course exam on Nov 10th
Literature: The XML Companion, 3rd edition, by Neil Bradley (or use the web)
Course web site at http://www.cs.helsinki.fi/u/mraento/teaching/xml_s03/
Newsgroup news:hy.opiskelu.tktl.xml
Practicalities
Exam + project work
Two ways to complete the project work:
  Exercises + a smaller project that will be partially done at the exercises. You are allowed to miss at most two exercises. OR
  A larger standalone project work (details on the course web page)
So: cancel your registration in an exercise group if you don't think you'll be able to attend
Practicalities
Grading: maximum 60 points
Exam 30 points, 15 points minimum to pass
Project work 30 points, 15 points minimum to pass (10 points from exercise attendance and 20 points from the smaller project, OR 30 points from the larger project work)
3 extra points (above 60) for exercises attended after the minimum six
No exercise points are taken into account if you take the separate exam
Course Overview
1. Introduction. History. Motivation.
2. From HTML to XML. Well-formedness and validity.
3. DTD basics. Document modelling with DTDs.
4. DTD limitations. Alternatives.
5. Namespaces. XML processing.
6. XSLT transformations, XPath. Mind set. Techniques, strengths.
7. FO. Basics, mind set, more advanced topics.
8. Combining. Wrap-up. Related standards.
Introduction
Introduction
What is XML?
What does it look like?
What does 'metalanguage' mean?
XML processors
DTDs
Transformations
Style definitions
eXtensible Markup Language
W3C recommendation
Version 1.0 (1.1 candidate recommendation)
1st edition 1998-02-10, 2nd (current) ed. 2000-10-06
An agreed-upon textual format for representing tree-structured data
For storing, combining, exchanging and publishing information
Human- and machine-readable
XML document instance
<!-- Example document instance -->        ← comment
<university>
  <department>                            ← tag
    <name>
      Department of Computer Science      ← element
    </name>
    <address>
      Teollisuuskatu 23
    </address>
  </department>                           ← tag
</university>
XML document instance
Actual document contents that have been marked up in an agreed way
Self-describing (for humans) tags
Elements and nested elements, meaningful units ofinformation
Text within elements
Comments
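The parts listed above can be made visible with a short Python sketch. It uses the standard-library `xml.etree.ElementTree` module on the example document instance; the helper name `outline` is made up for this illustration and is not part of the course material.

```python
import xml.etree.ElementTree as ET

# The example document instance from the slides.
DOCUMENT = """<!-- Example document instance -->
<university>
  <department>
    <name>
      Department of Computer Science
    </name>
    <address>
      Teollisuuskatu 23
    </address>
  </department>
</university>"""


def outline(element, depth=0):
    """Return one line per element: indented tag name plus its stripped text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# The default parser drops the comment and keeps elements and their text.
root = ET.fromstring(DOCUMENT)
print("\n".join(outline(root)))
```

The nesting of the printed lines mirrors the nesting of the elements; the comment does not appear because this parser discards comments by default.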
Logical vs. physical structure
Logical structure
  Logical relationships and constraints
  Describes the structure of the information content
Physical structure
  Entities
  Characters and character set
  Files
XML Processors
XML parser
  Finds errors
  Provides information to applications
Entity (document part) management
  Combines entities into documents
  Combines physical files
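A minimal sketch of both roles, assuming Python's standard-library `xml.etree.ElementTree` as the XML processor. The sample documents, the entity name `street` and the helper `check` are invented for this illustration; only an internal entity is shown, not the combining of separate files.

```python
import xml.etree.ElementTree as ET

# Well-formed: every start tag has a matching end tag.
GOOD = "<department><name>Department of Computer Science</name></department>"
# Not well-formed: the </name> end tag is missing.
BAD = "<department><name>Department of Computer Science</department>"
# An internal entity, declared in the document type declaration and
# referenced in the content as &street;.
WITH_ENTITY = (
    '<!DOCTYPE department [<!ENTITY street "Teollisuuskatu 23">]>'
    "<department><address>&street;</address></department>"
)


def check(document):
    """Return (True, root element) if the text parses, else (False, error message)."""
    try:
        return True, ET.fromstring(document)
    except ET.ParseError as error:
        return False, str(error)


ok, root = check(GOOD)
print(ok, root.find("name").text)     # information handed to the application

ok, message = check(BAD)
print(ok, message)                    # the parser reports the error

ok, root = check(WITH_ENTITY)
print(ok, root.find("address").text)  # the entity reference has been replaced
```

The application never sees the `&street;` reference itself: the processor substitutes the entity's replacement text before passing the content on.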
Metalanguage XML
XML provides a general syntax for tree-structured data
Users provide an application-specific grammar for this syntax
  Element names
  Element order and nesting
  Certain reserved words
This grammar is called a Document Type Definition, DTD
Example DTD
<!-- Document Type Definition (DTD) example -->
<!ELEMENT university (department+)>
<!ELEMENT department (name, address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
DTD
Defines the type of the document, or its structure
One rule per element
  Name of the element
  Allowed content
Grammar for document instances
"Regular expressions" for element content (may be recursive)
Not required in XML
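To illustrate the "regular expressions" remark, the sketch below translates the content models of the example DTD by hand into Python regular expressions over child element names and checks a parsed document against them. This is a simplified illustration under stated assumptions, not a validating parser: the `CONTENT_MODELS` table and the `conforms` function are made up here, and text inside elements is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the example DTD's content models into regular
# expressions over child element names (each name followed by one space).
# #PCDATA elements hold text only, so they allow no child elements.
CONTENT_MODELS = {
    "university": r"(department )+",  # <!ELEMENT university (department+)>
    "department": r"name address ",   # <!ELEMENT department (name, address)>
    "name": r"",                      # <!ELEMENT name (#PCDATA)>
    "address": r"",                   # <!ELEMENT address (#PCDATA)>
}


def conforms(element):
    """Check an element and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # element name is not declared
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False  # wrong children, wrong order or wrong count
    return all(conforms(child) for child in element)


valid = ET.fromstring(
    "<university><department><name>CS</name>"
    "<address>Teollisuuskatu 23</address></department></university>"
)
# address before name violates the (name, address) sequence.
invalid = ET.fromstring(
    "<university><department><address>Teollisuuskatu 23</address>"
    "<name>CS</name></department></university>"
)
print(conforms(valid), conforms(invalid))
```

Each DTD rule becomes one pattern: `+` carries over directly, and the comma of a sequence becomes plain concatenation.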
Advantages of using DTDs
Allows a validating parser
Checks that the document instance corresponds to the DTD
Consistent use of the tags
Standard DTDs for specific applications
A common vocabulary
Descriptive (declarative) markup
Generalized markup (not necessarily any formatting information)
Syntactic form without semantics
However, it includes element names that describe the content
In addition we need a way to describe formatting and e.g. links so that we can present the information to humans
Descriptive vs. procedural markup
Descriptive
  Categorises the document into parts
  Logical (logical parts and their relations)
  Self-describing
  Content and format separated
  E.g. XML
Descriptive vs. procedural markup
Procedural
  Defines what processing is to be carried out on the document
  Visible (e.g. LaTeX) or invisible (e.g. Word)
  Formatting information
  Content and format mixed
This distinction is neither clear-cut (e.g. LaTeX with only \section, \subsection etc., or applications of XML such as XSL) nor all-encompassing, but it provides a useful starting point
Stylesheets
Define the presentation (output) format
Possibly several per DTD and/or document instance
Cascading Style Sheets (CSS)
Extensible Stylesheet Language (XSL)
(DSSSL, not covered in this course)
Other applications of XML
Data transfer
  Subsets (views) of relational databases
  EDI (Electronic data interchange)
Message-based applications
  Coarse-grained RPC
Publishing
  Documents
  Metadata
Etc.
Publishing process
XML document + XSL/CSS stylesheet → Formatting → Formatted document
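The process can be imitated in a few lines of Python. This is only a sketch of the idea, not XSL or CSS: the "stylesheet" here is an invented dictionary that maps element names to HTML tags, and `format_element` is a hypothetical helper that plays the role of the formatter.

```python
import xml.etree.ElementTree as ET
from html import escape

# Stand-in "stylesheet": maps element names to HTML tags. The mapping is a
# made-up illustration, not XSL or CSS syntax.
STYLESHEET = {
    "university": "div",
    "department": "section",
    "name": "h1",
    "address": "p",
}


def format_element(element, stylesheet):
    """Render an element as HTML, choosing each tag from the stylesheet."""
    tag = stylesheet.get(element.tag, "span")
    text = escape((element.text or "").strip())
    inner = "".join(format_element(child, stylesheet) for child in element)
    return f"<{tag}>{text}{inner}</{tag}>"


document = ET.fromstring(
    "<university><department>"
    "<name>Department of Computer Science</name>"
    "<address>Teollisuuskatu 23</address>"
    "</department></university>"
)
print(format_element(document, STYLESHEET))
```

Passing a different mapping to `format_element` yields a different presentation of the same document, which is the point of keeping content and format apart. Text that follows a child element inside its parent is ignored in this simplified version.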
History of XML: SGML
Standard Generalised Markup Language
Based (in part) on IBM's GML (1969)
Introduced in 1974, ISO standard 1986
Large and complicated
Tools correspondingly large and complicated, hence few and expensive
XML has basically the same expressive power (almost a proper subset)
Still widely used in publishing of very large documents / document collections
History of XML: HTML
HyperText Markup Language (first proposal 1989–1990, HTML 2.0 IETF (RFC) standard 1995)
Huge success, basis for the web
Non-standard extensions problematic (although nowadays mainly in JavaScript/DOM)
Lots of tools available
History of XML: HTML
An SGML DTD + predefined semantics
Practical when the only purpose is to present information
Easy and pleasing presentation in a browser
Focuses on tags for book-like document structure, presentation and linking
XML, SGML and HTML
XML combines good features from both SGML (expressiveness, extensibility) and HTML (simple, easy to understand)
Lots of tools available
XHTML is HTML cast into an XML DTD (instead of SGML)
SGML may still be the best format for very large documents
XML does not solve all problems
All three languages are needed
XML Design principles
1. XML shall be straightforwardly usable over the Internet
2. XML shall support a wide variety of applications
3. XML shall be compatible with SGML
4. It shall be easy to write programs which process XML documents
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero
XML Design principles
6. XML documents should be human-legible and reasonably clear
7. The XML design should be prepared quickly
8. The design of XML shall be formal and concise
9. XML documents shall be easy to create
10. Terseness is of minimal importance
Related standards
XLink: hyperlinking for XML
XPath: locating and selecting XML document parts
XPointer: the reference language for XLink
XSL: Extensible Stylesheet Language
XSLT: XSL Transformations
SAX: stream-oriented XML API
DOM: tree-oriented XML API
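The difference between the two APIs in the list above can be sketched with Python's standard-library `xml.sax` and `xml.dom.minidom` modules. The sample document and the class name `ElementNameCollector` are made up for this illustration.

```python
import xml.sax
from xml.dom import minidom

DOCUMENT = (
    "<university><department>"
    "<name>Department of Computer Science</name>"
    "<address>Teollisuuskatu 23</address>"
    "</department></university>"
)


class ElementNameCollector(xml.sax.ContentHandler):
    """SAX handler: the parser calls startElement for each start tag it reads."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# Stream-oriented: events arrive in document order and no tree is kept.
collector = ElementNameCollector()
xml.sax.parseString(DOCUMENT.encode("utf-8"), collector)
print(collector.names)

# Tree-oriented: the whole document is loaded first, then navigated freely.
dom = minidom.parseString(DOCUMENT)
address = dom.getElementsByTagName("address")[0]
print(address.firstChild.data)
```

With SAX the application must remember whatever it needs as the events pass by; with DOM it can jump straight to the `address` element, at the cost of holding the whole tree in memory.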
On this course
Introduction, motivation, background
XML documents (XML)
XML DTDs (XML)
XML transformations (XSLT)
Stylesheets (CSS, XSL)
Some tools
Related information (XML Schema, XPath, XML Namespaces)
Literature
Bradley: The XML Companion
http://www.w3.org/XML/
http://www.xml.org
http://www.xml.com
http://www.xmlsoftware.com
http://www.xslinfo.com
http://xml.coverpages.org
http://xml.apache.org
http://www.cs.helsinki.fi/u/ruini/structure/xml/
This lecture in literature
Bradley: 2, 3, 31
or
XML in 10 points, http://www.w3.org/XML/1999/XML-in-10-points
Norman Walsh: A Technical Introduction to XML, http://nwalsh.com/docs/articles/xml/, up to 'Entity References'
Greg Meyer: An Overview of the XML..., http://www.stsc.hill.af.mil/crosstalk/1998/06/xml.asp
Connolly et al.: The Evolution of Web Documents, http://www.xml.com/pub/a/w3j/s3.connolly.html