Pal gov.tutorial2.session1.xml basics and namespaces

PalGov © 2011

The Palestinian eGovernment Academy

www.egovacademy.ps

Dr. Ismail M. Romi

Palestine Polytechnic University

Tutorial II: Data Integration and Open Information Systems

Session1

XML Basics and Namespaces


About

This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps

Project Consortium:

University of Trento, Italy

University of Namur, Belgium

Vrije Universiteit Brussel, Belgium

TrueTrust, UK

Birzeit University, Palestine (Coordinator)

Palestine Polytechnic University, Palestine

Palestine Technical University, Palestine

Université de Savoie, France

Ministry of Local Government, Palestine

Ministry of Telecom and IT, Palestine

Ministry of Interior, Palestine

Coordinator:

Dr. Mustafa Jarrar

Birzeit University, P.O.Box 14- Birzeit, Palestine

Tel/Fax: +972 2 2982935

http://www.egovacademy.ps/


© Copyright Notes

Everyone is encouraged to use this material, or part of it, but should properly cite the project (logo and website) and the author of that part.

No part of this tutorial may be reproduced or modified in any form or by any means without prior written permission from the project, which holds the full copyright on the material.

Attribution-NonCommercial-ShareAlike

CC-BY-NC-SA

This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under identical terms.


Tutorial Map

Session 1: XML Basics and Namespaces (3 h)

Session 2: XML DTDs (3 h)

Session 3: XML Schemas (3 h)

Session 4: Lab-XML Schemas (3 h)

Session 5: RDF and RDFS (3 h)

Session 6: Lab-RDF and RDFS (3 h)

Session 7: OWL (Web Ontology Language) (3 h)

Session 8: Lab-OWL (3 h)

Session 9: Lab-RDF Stores: Challenges and Solutions (3 h)

Session 10: Lab-SPARQL (3 h)

Session 11: Lab-Oracle Semantic Technology (3 h)

Session 12_1: The Problem of Data Integration (1.5 h)

Session 12_2: Architectural Solutions for the Integration Issues (1.5 h)

Session 13_1: Data Schema Integration (1 h)

Session 13_2: GAV and LAV Integration (1 h)

Session 13_3: Data Integration and Fusion using RDF (1 h)

Session 14: Lab-Data Integration and Fusion using RDF (3 h)

Session 15_1: Data Web and Linked Data (1.5 h)

Session 15_2: RDFa (1.5 h)

Session 16: Lab-RDFa (3 h)

Intended Learning Objectives

A: Knowledge and Understanding

2a1: Describe tree and graph data models.

2a2: Understand the notation of XML, RDF, RDFS, and OWL.

2a3: Demonstrate knowledge of querying techniques for data models, such as SPARQL and XPath.

2a4: Explain the concepts of identity management and Linked Data.

2a5: Demonstrate knowledge of the integration and fusion of heterogeneous data.

B: Intellectual Skills

2b1: Represent data using tree and graph data models (XML and RDF).

2b2: Describe data semantics using RDFS and OWL.

2b3: Manage and query data represented in RDF, XML, and OWL.

2b4: Integrate and fuse heterogeneous data.

C: Professional and Practical Skills

2c1: Use Oracle Semantic Technology and/or Virtuoso to store and query RDF stores.

D: General and Transferable Skills

2d1: Working in a team.

2d2: Presenting and defending ideas.

2d3: Use of creativity and innovation in problem solving.

2d4: Develop communication skills and logical reasoning abilities.


Session ILOs:

After completing this session, students will be able to:

• Describe tree and graph data models.

• Understand the notation of XML.


Session1: XML Basics and Namespaces

Session Overview:

< Markup language />

< What is XML? />

< Components of an XML document />

< Why we need namespaces />

< The syntax for using namespaces />

< What is a URI, a URL, and a URN />


Markup

• Information added to a document that enhances its meaning.

• It identifies the parts of the document and how they relate to each other.


Markup language

A system for annotating a text in a way that is syntactically distinguishable from that text: a set of words and symbols for describing the identity of pieces of a document (for example 'this is a paragraph', 'this is a heading', 'this is a list', 'this is the caption of this figure', etc.).

Programs can combine this markup with a style sheet to create output for screen, print, audio, video, Braille, etc.

Some markup languages (e.g. those used in word processors) only describe appearance ('this is italics', 'this is bold'). Such markup can only be used for display and is not normally reusable for anything else.

http://en.wikipedia.org/wiki/Annotation


History of Markup

Efforts started in the 1960s.

TROFF, TeX:

Presentation and formatting of printed documents.

GenCode (generic coding):

Uses descriptive generic tags to assemble documents from multiple pieces.

GML (IBM Generalized Markup Language):

Encodes documents for use with multiple information subsystems.

Documents can be edited, formatted, and searched by different programs.


History of Markup…Cont

SGML: Standard Generalized Markup Language.

A framework for developing specialized markup languages.

Encodes general-purpose documents (books, journals, ...).

Flexible, all-encompassing coding scheme.

Used for very large documentation projects.

Its usefulness was limited to large organizations (high requirements).

Companies developed their own SGML applications, which were not compatible with web browsers (Microsoft Internet Explorer, Netscape, ...).


History of Markup…Cont -

HTML: Hypertext Markup Language

Developed in the early 1990s

Simple

Based on generic coding principles

Specific tags (commands)

Tags are presentational and limited

Open standard (free, not tied to any technology)

Limited in its scope and can't be extended.


XML: Extensible markup language

Combines the flexibility of SGML and the simplicity of HTML.

The W3C released the official XML 1.0 specification in 1998.

XML quickly gained popularity in the web community.

XML itself is not a language, but rather a set of rules that can be used to create markup languages.

History of Markup…Cont


What is XML?

• A protocol for containing and managing information.

XML is really all about creating your own markup.

Technically, XML is a meta-language, which means it's a language that lets you create your own markup languages.

Unlike HTML, XML is meant for storing data, not displaying it.

XML provides you with a way of containing, shaping, structuring, and protecting data in documents.

XML is a general-purpose information storage system.

XML documents are portable because they can be interpreted by many different applications.


Because anyone is free to mark up data in any way using the language, even if others are doing it in different ways.

We have full control over the creation of our XML document.

Data can be shaped in any preferred way:

• If you want to create data that only one particular computer program will ever use, you can do so.

• If you want to share your data with other programs, or even with other companies across the Internet, XML gives you the flexibility to do that as well.

You are free to structure the same data in different ways that suit the requirements of an application or category of applications.

Why "Extensible"?


Functions of XML

1. Store and retrieve data.

2. Format documents: put data in a presentable form.

3. Ensure data integrity: guarantee a minimal level of trust in the data (that it hasn't been corrupted, truncated, mistyped, left incomplete, broken, ...).

4. Support multiple languages: support the Unicode character set, which covers many scripts (Latin, Arabic, ...).


How Do I Get Started? Initial Requirements

1. Text editor:

An XML editor helps you compose and read the document and helps prevent mistakes.

You can use Notepad or any other editor that supports the character encoding used by the document.

2. XML parser:

A software program (an XML processor) is required to process an XML document (e.g. Stylus Studio).

3. A Document Type Definition (DTD) or a schema.

4. Viewing the document:

View the document in a browser or in an XML environment (e.g. Stylus Studio).


Where XML Can Be Used

• Reducing server load:

• Keep all information on the client for as long as possible, and then send it to the server in one big XML document.

• Website content:

• Transform the same XML document into many formats.

• Combine content from many formats into one XML file.

• Distributed computing:

• XML can be used to send data in distributed computing, where objects on one computer call objects on another computer to do work.

• e-Commerce:

• XML is well suited to exchanging data between computer processes and applications.

• Computer-to-computer data transfer.


Components of an XML Document

• XML Declaration

• Elements

• Attributes

• Entities

• Comments


Tag

• Construct that begins with < and ends with >

• Start tag <name>

• End tag </name>

• Tags constitute the markup of the document.


• A logical component of a document, used to describe data. It consists of:

– A start tag

– Content

– An end tag

• Example:

<first>John</first>

• The text between the start-tag and the end-tag of an element is called the element content.

Element


Rules for Elements/ Well-formed Document

Every start-tag must have a matching end-tag, or be a self-closing tag.

Tags can't overlap; elements must be properly nested.

XML documents can have only one root element.

Element names must obey XML naming conventions.

XML is case sensitive.

XML will keep whitespace in your PCDATA.
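A quick way to check these rules is to give small documents to a parser and see which ones it accepts. The sketch below is a minimal illustration that assumes Python 3 and its standard-library xml.etree.ElementTree module, which is built on the expat parser. The helper `is_well_formed` and the sample strings are hypothetical and were written only for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents, one per rule
samples = {
    "matched tags": "<first>John</first>",
    "self-closing tag": "<first/>",
    "missing end-tag": "<first>John",
    "overlapping tags": "<b><i>text</b></i>",
    "two root elements": "<first/><last/>",
    "case mismatch": "<First>John</first>",
}

for label, text in samples.items():
    print(f"{label:18} well-formed={is_well_formed(text)}")
```

Only the first two samples are accepted. Each of the others breaks exactly one rule from the list.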


Naming Rules

√ Names can start with letters or the underscore (_) character, but not with numbers or other punctuation characters.

√ After the first character, numbers, hyphens, and periods are allowed.

√ Names can't contain spaces.

√ Names should not contain the colon (:) character; it is reserved for namespace prefixes.

√ Names starting with the letters xml, in uppercase, lowercase, or mixed case, are reserved, so don't use them.

√ There can't be a space after the opening < character; the name of the element must come immediately after it.
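The naming rules can be checked the same way, by asking a parser whether it accepts each start-tag. This is a sketch that assumes Python 3's xml.etree.ElementTree (expat). The `parses` helper and the element names are hypothetical. The sketch does not test the "xml" prefix rule, because that rule is a reservation and parsers do not necessarily reject such names.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if expat accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(parses("<_name/>"))         # True: an underscore may start a name
print(parses("<first.name-2/>"))  # True: digits, hyphens, periods after the first char
print(parses("<2name/>"))         # False: a digit cannot start a name
print(parses("<-name/>"))         # False: a hyphen cannot start a name
print(parses("<first name/>"))    # False: the space ends the name ("name" is a bad attribute)
print(parses("< name/>"))         # False: no space allowed after '<'
```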


Whitespace in PCDATA

• Whitespace includes things such as:

• The space character

• New lines (what you get when you press the Enter key)

• Tabs

• Whitespace is used to separate words, as well as to make text more readable.

• In XML, no whitespace stripping takes place for PCDATA.

• Example:

<Tag>This is a paragraph. It has a whole bunch

of space.</Tag>

• The PCDATA is:

This is a paragraph. It has a whole bunch

of space.
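To see that the whitespace inside PCDATA survives parsing, here is a minimal sketch. It assumes Python 3's xml.etree.ElementTree, and the sample text with its extra spaces, newline and tab is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Several spaces, a newline and a tab inside the element content
root = ET.fromstring("<Tag>This is a paragraph.   It has a whole bunch\n\tof space.</Tag>")

# The parser hands the PCDATA back exactly as written: nothing is collapsed.
print(repr(root.text))
```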


Whitespace in Markup

• There can be whitespace within an XML document that's not actually part of the data:

<Tag>

  <AnotherTag>This is some XML</AnotherTag>

</Tag>

• Any whitespace contained within <AnotherTag>'s PCDATA is part of the data.

• The newline after <Tag> and the spaces before <AnotherTag> could be there just to make the document easier to read, without actually being part of its data.

• This "readability" whitespace is called extraneous whitespace.
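A parser cannot know on its own which whitespace is extraneous, so it usually reports all of it and leaves the decision to the application. The sketch below assumes Python 3's xml.etree.ElementTree, which stores whitespace before a child in the parent's `text` and whitespace after a child in that child's `tail`.

```python
import xml.etree.ElementTree as ET

doc = "<Tag>\n  <AnotherTag>This is some XML</AnotherTag>\n</Tag>"
root = ET.fromstring(doc)
child = root[0]

print(repr(root.text))   # '\n  '  newline + indentation before <AnotherTag>
print(repr(child.text))  # 'This is some XML'  the real data
print(repr(child.tail))  # '\n'  newline before </Tag>
```

An application that only cares about the data would typically ignore `text` and `tail` values that consist entirely of whitespace.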


Attributes

• Simple name/value pairs associated with an element.

• Attributes are attached to the start-tag, not to the end-tag.

• Example:

<name univ="PPU">

• Attributes must have values, even if that value is just an empty string (such as "").

• Attribute values must be in quotes: single (') or double (").

• Quotes must be matched.

• You can include the other kind of quote character in the attribute value (for example, a ' inside a double-quoted value).

• Attribute names must be unique within the same element.

• Attribute names are subject to the naming rules.
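The quoting rules can be tried out directly. This is a small sketch assuming Python 3's xml.etree.ElementTree. The attribute names `nick`, `motto` and `note` are hypothetical additions to the slide's `univ` example.

```python
import xml.etree.ElementTree as ET

# Double-quoted values may contain ', single-quoted values may contain "
doc = """<name univ="PPU" nick="John's" motto='say "hi"' note=""/>"""
elem = ET.fromstring(doc)

print(elem.get("univ"))        # PPU
print(elem.get("nick"))        # John's
print(elem.get("motto"))       # say "hi"
print(repr(elem.get("note")))  # '' : an empty string is still a value

# An unquoted attribute value is not well-formed.
try:
    ET.fromstring("<name univ=PPU/>")
    unquoted_ok = True
except ET.ParseError as err:
    unquoted_ok = False
    print("rejected:", err)
```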


Attributes ….Cont

• The order in which attributes are included on an element is not considered relevant.

• If an XML parser encounters an element like:

<name first="John" middle="Fitzgerald Johansen" last="Doe"></name>

• it doesn't necessarily have to give us the attributes in that order, but can do so in any order it wishes.
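One way to treat attribute order as irrelevant is to compare attributes as an unordered mapping. The sketch below assumes Python 3's xml.etree.ElementTree, where `attrib` is a plain dict, so two elements with the same attributes in a different order compare equal.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<name first="John" middle="Fitzgerald Johansen" last="Doe"></name>')
b = ET.fromstring('<name last="Doe" first="John" middle="Fitzgerald Johansen"></name>')

# Dict equality ignores order, so both elements carry the same attribute information.
print(a.attrib == b.attrib)  # True
print(a.get("middle"))       # Fitzgerald Johansen
```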


When to Use Attributes

• Use attributes to separate different types of information.

• Attributes use less space.

• Elements can be more complex than attributes.

• Attributes are unordered.

Problems in Using Attributes

• Attributes can't contain multiple values; elements can.

• Attributes can't contain tree structure; elements can.

• Attributes are not easily expandable; elements are.

• Attributes can't force order; elements can.
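The "multiple values" problem shows up as soon as you try to repeat an attribute. This sketch assumes Python 3's xml.etree.ElementTree. The `phone` name and the numbers are hypothetical data for the illustration.

```python
import xml.etree.ElementTree as ET

# Repeating an attribute name on one element is not well-formed...
try:
    ET.fromstring('<person phone="111" phone="222"/>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False

# ...whereas child elements can repeat and keep their document order.
person = ET.fromstring("<person><phone>111</phone><phone>222</phone></person>")
phones = [p.text for p in person.findall("phone")]

print(duplicate_ok, phones)  # False ['111', '222']
```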


Empty Elements

• An empty element has no content; it can only have attributes.

• Examples:

<product prodid="1345" />

<product></product>

<product/>

<product

prodid="1345"

/>

• Used when an element has no PCDATA, or when its PCDATA is optional and omitted.
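The empty-element forms above are equivalent for a parser. A minimal check, assuming Python 3's xml.etree.ElementTree, shows that each form gives an element with no text and no children, and with attributes only where they were written.

```python
import xml.etree.ElementTree as ET

forms = [
    "<product></product>",
    "<product/>",
    '<product prodid="1345" />',
    '<product\n  prodid="1345"\n/>',
]

for text in forms:
    elem = ET.fromstring(text)
    # tag, text content (None = none), attributes, number of children
    print(elem.tag, elem.text, elem.attrib, len(elem))
```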


Trees

• XML is hierarchical in nature.

• Information is structured like a tree, with parent-child relationships.

• This means that the information has to be arranged in a tree structure.

• An XML document forms a tree that starts at the root and branches out to the leaves.


Trees- Used Symbols

[Figure: legend of the tree-diagram symbols, with one symbol each for "element appears multiple times", "element appears one time only", and "element can be further broken down".]


Tree- Example

<bookstore>

<book category="COOKING">

<title lang="en">Everyday Italian</title>

<author>Giada De Laurentiis</author>

<year>2005</year>

<price>30.00</price>

</book>

<book category="CHILDREN">

<title lang="en">Harry Potter</title>

<author>J K. Rowling</author>

<year>2005</year>


</book>

<book category="WEB">

<title lang="en">Learning XML</title>

<author>Erik T. Ray</author>

<year>2003</year>


</book>

</bookstore>
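Walking the bookstore document recursively makes its tree shape visible: root, then branches (book), then leaves (title, author, ...). This is a sketch assuming Python 3's xml.etree.ElementTree. The `show` helper is hypothetical, and the prices the slide omits for the second and third books are left out here as well.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the slide (only the first book lists a price there)
BOOKSTORE = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
  </book>
</bookstore>"""


def show(elem: ET.Element, depth: int = 0) -> None:
    """Print each element indented by its depth, with any non-blank text."""
    label = elem.tag
    if elem.text and elem.text.strip():
        label += ": " + elem.text.strip()
    print("  " * depth + label)
    for child in elem:  # children in document order
        show(child, depth + 1)


root = ET.fromstring(BOOKSTORE)
show(root)
```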


Comments

• XML comments are normally ignored by the application that processes the XML document.

• Useful for:

– Documentation

– Others viewing the document

Syntax

<!-- Comment -->

Example:

<!-- this is an xml class -->
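As a concrete case of a processor ignoring comments: with its default settings, Python 3's xml.etree.ElementTree parser drops comments and keeps only the elements. The small `<class>` document below is a hypothetical wrapper around the slide's example comment.

```python
import xml.etree.ElementTree as ET

doc = "<class><!-- this is an xml class --><topic>XML</topic></class>"

# The default parser discards the comment; only the <topic> element remains.
root = ET.fromstring(doc)
print([child.tag for child in root])  # ['topic']
```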


XML Declarations

• A small collection of details that prepare XML processors for working with a document.

Syntax:

<?xml version="1.0" encoding="UTF-16" standalone="yes"?>

• The XML declaration starts with the characters <?xml and ends with the characters ?>.

• If you include a declaration, you must include the version, but the encoding and standalone attributes are optional.

• The version, encoding, and standalone attributes must appear in that order.

• The version should be 1.0 or 1.1.

• The XML declaration must be right at the beginning of the file; not even whitespace may come before it.
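The placement and ordering rules are enforced by real parsers. This sketch assumes Python 3's xml.etree.ElementTree (expat) and feeds it raw bytes, as a file would be read. The `parses` helper and the one-element documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if expat accepts the bytes as a well-formed document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


print(parses(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>'))  # True
print(parses(b'\n<?xml version="1.0"?><a/>'))                # False: not at the very start
print(parses(b'<?xml encoding="UTF-8" version="1.0"?><a/>'))  # False: wrong attribute order
```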


Version

• The version attribute specifies which version of the XML specification the document adheres to.

• There are two versions of the XML specification: 1.0 and 1.1.

Example:

<?xml version="1.0"?>

Or

<?xml version="1.1"?>

• Version 1.1 is newer and less widely supported; most processors support 1.0.


Encoding

• Text is stored in computers as numbers (1s and 0s).

• A character code is a one-to-one mapping between a set of characters and the numbers used to represent those characters.

• Character encoding is the method used to represent the numbers in a character code digitally (for example, how many bytes are used for each number).

• ASCII: represents 128 characters (English letters, digits, punctuation, and control characters) as the numbers 0 to 127.

• ISO-8859-1: created to add characters not covered by ASCII (for example, accented Western European letters).

• UTF-16: uses two bytes for most characters (2 bytes = 16 bits = 65,536 possible values); characters outside that range take four bytes.


Encoding ….Cont

UTF-8: uses one byte for the characters covered by ASCII.

• Any other character is represented by two or more bytes.

• UTF-8 & UTF-16:

√ For text made up mostly of ASCII characters (such as English), UTF-8 results in smaller files, because each of those characters requires only one byte.

√ For text in some other languages, UTF-16 can be smaller, because UTF-8 needs three bytes for many characters (for example, Chinese), whereas UTF-16 needs only two.
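The size trade-off is easy to measure by encoding a few short strings. This sketch assumes Python 3, and the sample words are arbitrary. It uses the "utf-16-le" codec because Python's plain "utf-16" codec prepends a 2-byte byte order mark, which would distort the comparison.

```python
samples = {"English": "Hello", "Arabic": "مرحبا", "Chinese": "你好"}

sizes = {}
for label, text in samples.items():
    utf8 = len(text.encode("utf-8"))
    # "utf-16-le" avoids the 2-byte byte order mark that "utf-16" prepends
    utf16 = len(text.encode("utf-16-le"))
    sizes[label] = (utf8, utf16)
    print(f"{label:8} chars={len(text)} utf-8={utf8} bytes utf-16={utf16} bytes")

# Characters beyond the 16-bit range need a 4-byte surrogate pair in UTF-16.
print(len("\U0001F600".encode("utf-16-le")))  # 4
```

English comes out smaller in UTF-8, Arabic is the same size in both, and Chinese is smaller in UTF-16.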


Specifying a Character Encoding for XML … Cont

Examples:

• <?xml version="1.0" encoding="UTF-16" ?>

• <?xml version="1.0" encoding="UTF-8" ?>

• <?xml version="1.0" encoding="US-ASCII" ?>

• <?xml version="1.0" encoding="ISO-8859-1" ?>
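The declared encoding only works if the bytes really use that encoding. The sketch below assumes Python 3's xml.etree.ElementTree. It encodes two hypothetical one-element documents with the encodings their declarations name and parses them back.

```python
import xml.etree.ElementTree as ET

# Encode each document with the same encoding its declaration names.
utf16_doc = '<?xml version="1.0" encoding="UTF-16"?><name>سلام</name>'.encode("utf-16")
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Université</name>'.encode("iso-8859-1")

print(ET.fromstring(utf16_doc).text)   # سلام
print(ET.fromstring(latin1_doc).text)  # Université
```

The "utf-16" codec writes a byte order mark, which lets the parser detect the byte order before it reads the declaration.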


Standalone

• standalone="yes" or standalone="no"

• Yes: specifies that the document exists

entirely on its own, without depending on any

other files.

• No: indicates that the document may depend

on an external DTD.


Why We Need Namespaces

Used to differentiate elements and attributes of different XML document types from each other when combining them in one document, or even when processing multiple documents simultaneously.


<person>

<name>

<title>Sir</title>

<first>John</first>

<middle>Fitzgerald Johansen</middle>

<last>Doe</last>

</name>

<position>Vice President of Marketing</position>

<résumé>

<html>

<head><title>Resume of John Doe</title></head>

<body>

<h1>John Doe</h1>

<p>John's a great guy, you know?</p>

</body>

</html>

</résumé>

</person>

To an XML parser, there isn't any difference between the two <title> elements in this document.
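Here is a minimal demonstration of the clash, assuming Python 3's xml.etree.ElementTree and a shortened version of the person/résumé document. Searching for `title` returns both the personal title and the HTML page title, because nothing in their names tells them apart.

```python
import xml.etree.ElementTree as ET

doc = """<person>
  <name><title>Sir</title></name>
  <résumé>
    <html><head><title>Resume of John Doe</title></head></html>
  </résumé>
</person>"""

root = ET.fromstring(doc)
titles = [t.text for t in root.iter("title")]
print(titles)  # ['Sir', 'Resume of John Doe']
```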


Using Prefixes

• The best way is for every element in a document to have a completely distinct name.

• This can be done as follows:

– Group the elements.

– Give each group a unique prefix.

– Use the prefix in element names: Prefix:ElementName.


<pers:person>

<pers:name>

<pers:title>Sir</pers:title>

<pers:first>John</pers:first>

<pers:middle>Fitzgerald Johansen</pers:middle>

<pers:last>Doe</pers:last>

</pers:name>

<pers:position>Vice President of Marketing</pers:position>

<pers:résumé>

<xhtml:html>

<xhtml:head><xhtml:title>Resume of John Doe</xhtml:title>

</xhtml:head>

<xhtml:body>

<xhtml:h1>John Doe</xhtml:h1>

<xhtml:p>John's a great guy, you know?</xhtml:p>

</xhtml:body>

</xhtml:html>

</pers:résumé>

</pers:person>
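Prefixes on their own are not enough for a namespace-aware parser, which requires every prefix to be bound to a URI. The sketch below assumes Python 3's xml.etree.ElementTree, which parses with namespace processing turned on. It shows an undeclared `pers:` prefix being rejected, while the same element with an `xmlns:pers` declaration parses.

```python
import xml.etree.ElementTree as ET

undeclared = "<pers:person><pers:name/></pers:person>"
declared = ('<pers:person xmlns:pers="http://www.wiley.com/pers">'
            "<pers:name/></pers:person>")

try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError as err:
    undeclared_ok = False
    print("rejected:", err)  # expat reports an unbound prefix

root = ET.fromstring(declared)
print(root.tag)  # {http://www.wiley.com/pers}person
```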


Why Doesn't XML Just Use These Prefixes?

• Prefixes have to be unique.

• A problem will occur if two companies use the same prefix.

• To solve this problem, XML takes advantage of the already unambiguous Internet domain names in existence: each namespace is identified by a URI, and a prefix is only a short local alias for that URI.

• URI (Uniform Resource Identifier) is a string of characters that identifies a resource.

• It can come in one of two flavors:

– URL (Uniform Resource Locator)

– URN (Uniform Resource Name)


How XML Namespaces Work

• The XML Namespaces Recommendation introduces a standard syntax for declaring namespaces and identifying the namespace of a given element or attribute in an XML document.

• The XML namespaces specification is located at http://www.w3.org/TR/REC-xml-names/

• To use XML namespaces in your documents, elements are given qualified names.

• In W3C specifications, "qualified name" is abbreviated to QName.

• These qualified names consist of two parts:

– The local part, which is the same as the names we have been giving elements all along.

– The namespace prefix, which specifies which namespace this name belongs to.









How XML Namespaces Work…Cont

Example:

• To declare a namespace called http://www.wiley.com/pers and associate a <person> element with that namespace, you would write something like the following:

<pers:person xmlns:pers="http://www.wiley.com/pers"/>

• The key is the xmlns:pers attribute (xmlns stands for XML Namespace).

• Here you are declaring the pers namespace prefix and the URI of the namespace that it represents (http://www.wiley.com/pers).


How XML Namespaces Work…Cont

• The prefix can be used on any descendants of the <pers:person> element to denote that they also belong to the http://www.wiley.com/pers namespace, as shown in the following example:

<pers:person xmlns:pers="http://www.wiley.com/pers">

<pers:name>

<pers:title>Sir</pers:title>

</pers:name>

</pers:person>

• Internally, when this document is parsed, the parser effectively replaces any namespace prefixes with the namespace URI itself.

• A parser might consider <pers:person> to be similar to <{http://www.wiley.com/pers}person>.
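Python's xml.etree.ElementTree uses exactly this {URI}local notation for element tags. The sketch below assumes Python 3 and parses the slide's example. Searches take a caller-chosen prefix map, and the prefix "p" in it is deliberately different from the document's "pers", to show that only the URI matters.

```python
import xml.etree.ElementTree as ET

doc = """<pers:person xmlns:pers="http://www.wiley.com/pers">
  <pers:name>
    <pers:title>Sir</pers:title>
  </pers:name>
</pers:person>"""

root = ET.fromstring(doc)
for elem in root.iter():
    print(elem.tag)  # e.g. {http://www.wiley.com/pers}person

# Searches use a prefix-to-URI map chosen by the caller, not the document's prefix.
ns = {"p": "http://www.wiley.com/pers"}
print(root.find("p:name/p:title", ns).text)  # Sir
```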


Default Namespaces

• A default namespace is just like a regular namespace, except that you don't have to specify a prefix for all of the elements that use it.

• Example:

<person xmlns="http://www.wiley.com/pers">

<name>

<title>Sir</title>

</name>

</person>

• The element and all its descendant elements belong to the specified namespace.


Default Namespaces…Cont

• You can declare more than one namespace for an element, but only one can be the default.

• This allows you to write XML like this:

<person xmlns="http://www.wiley.com/pers"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<name/>

<xhtml:p>This is XHTML</xhtml:p>

</person>


Default Namespaces…Cont

• Here the namespaces and their prefixes (if applicable) are declared in the root element, so that all elements in the document can use these prefixes.

• You can't write XML like this:

<person xmlns="http://www.wiley.com/pers"

xmlns="http://www.w3.org/1999/xhtml">

• This tries to declare two default namespaces.

• In this case, the XML parser wouldn't be able to figure out which namespace the element belongs to.
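A parser enforces the one-default rule. This sketch assumes Python 3's xml.etree.ElementTree (expat), which reports a second xmlns attribute on the same element as a duplicate attribute. One default combined with a prefixed namespace parses fine.

```python
import xml.etree.ElementTree as ET

one_default = ('<person xmlns="http://www.wiley.com/pers" '
               'xmlns:xhtml="http://www.w3.org/1999/xhtml">'
               '<name/><xhtml:p>This is XHTML</xhtml:p></person>')
two_defaults = ('<person xmlns="http://www.wiley.com/pers" '
                'xmlns="http://www.w3.org/1999/xhtml"/>')

root = ET.fromstring(one_default)
print([child.tag for child in root])

try:
    ET.fromstring(two_defaults)
    rejected = False
except ET.ParseError as err:
    rejected = True
    print("rejected:", err)
```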


Declaring Namespaces on Descendants

• Namespace prefixes can be declared in any element in the document.

• Example:

<person xmlns="http://www.wiley.com/pers">

<name/>

<xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">

This is XHTML</xhtml:p>

</person>

• This can make the document more readable, because namespaces are declared closer to where they'll actually be used.

• The prefix is available only in the declaring element and its descendants.
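The scoping rule can be checked with a parser. This sketch assumes Python 3's xml.etree.ElementTree. The second document is a hypothetical variant in which a sibling element reuses the `xhtml:` prefix outside the element that declared it, and the parser rejects that use as an unbound prefix.

```python
import xml.etree.ElementTree as ET

XHTML_DECL = 'xmlns:xhtml="http://www.w3.org/1999/xhtml"'

in_scope = ('<person xmlns="http://www.wiley.com/pers"><name/>'
            f'<xhtml:p {XHTML_DECL}>This is XHTML</xhtml:p>'
            '</person>')
# The second <xhtml:p> is a sibling, outside the element that declared xhtml.
out_of_scope = ('<person xmlns="http://www.wiley.com/pers">'
                f'<xhtml:p {XHTML_DECL}>This is XHTML</xhtml:p>'
                '<xhtml:p>Out of scope</xhtml:p>'
                '</person>')


def parses(text: str) -> bool:
    """Return True if the namespace-aware parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(parses(in_scope))      # True
print(parses(out_of_scope))  # False: unbound prefix
```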


Declaring Default Namespaces on Descendants

• You can also declare a namespace to be the default namespace for a particular element and its descendants.

• Example:

<person xmlns="http://www.wiley.com/pers">

<name/>

<p xmlns="http://www.w3.org/1999/xhtml">This is XHTML</p>

</person>

• http://www.wiley.com/pers is the default namespace for the document as a whole.

• http://www.w3.org/1999/xhtml is the default namespace for the <p> element and any of its descendants.

• The http://www.w3.org/1999/xhtml namespace overrides the http://www.wiley.com/pers namespace, so that the latter doesn't apply to the <p> element.
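To see the override in a parser, the sketch below parses a <person> document whose <p> child redeclares the default namespace. It assumes Python 3's xml.etree.ElementTree and prints each element's expanded {URI}local name.

```python
import xml.etree.ElementTree as ET

doc = """<person xmlns="http://www.wiley.com/pers">
  <name/>
  <p xmlns="http://www.w3.org/1999/xhtml">This is XHTML</p>
</person>"""

tags = [elem.tag for elem in ET.fromstring(doc).iter()]
for tag in tags:
    print(tag)
```

Only <p> ends up in the XHTML namespace. The other two elements stay in the document-wide default.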


Canceling Default Namespaces

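• To cancel a default namespace, set the xmlns attribute to an empty string (xmlns="").

• Example:

<employee>

<name>Jane Doe</name>

<notes>

<p xmlns="http://www.w3.org/1999/xhtml">I've worked

with <name xmlns="">Jane Doe</name> for over a year

now.</p>

</notes>

</employee>

• Inside the XHTML <p> element, xmlns="" cancels the XHTML default, so the inner <name> is in no namespace, just like <employee> and the first <name>.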
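The effect of xmlns="" can be observed in the expanded names. This sketch assumes Python 3's xml.etree.ElementTree, and the document wraps the note in an XHTML <p> element. An element in no namespace gets a plain tag with no {URI} part.

```python
import xml.etree.ElementTree as ET

doc = """<employee>
  <name>Jane Doe</name>
  <notes>
    <p xmlns="http://www.w3.org/1999/xhtml">I've worked
    with <name xmlns="">Jane Doe</name> for over a year
    now.</p>
  </notes>
</employee>"""

root = ET.fromstring(doc)
p = root.find("notes/{http://www.w3.org/1999/xhtml}p")

print(p.tag)                    # {http://www.w3.org/1999/xhtml}p
print(p[0].tag)                 # name : the default was cancelled
print(root[0].tag == p[0].tag)  # True: both <name> elements are unqualified
```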


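Do Different Notations Make Any Difference?

<person xmlns="http://www.wiley.com/pers">

<name/>

<p xmlns="http://www.w3.org/1999/xhtml">This is XHTML</p>

</person>

<pers:person xmlns:pers="http://www.wiley.com/pers"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<pers:name/>

<xhtml:p>This is XHTML</xhtml:p>

</pers:person>

<person xmlns="http://www.wiley.com/pers"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<name/>

<xhtml:p>This is XHTML</xhtml:p>

</person>

• No. To a namespace-aware parser the three documents are equivalent, because every element has the same namespace URI and local name in each of them.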
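One way to answer this slide's question is to parse all three notations and compare what the parser reports. The sketch below assumes Python 3's xml.etree.ElementTree. The helper `expanded` is hypothetical and returns each element's {URI}local tag and text.

```python
import xml.etree.ElementTree as ET

PERS = "http://www.wiley.com/pers"
XHTML = "http://www.w3.org/1999/xhtml"

docs = [
    f'<person xmlns="{PERS}"><name/><p xmlns="{XHTML}">This is XHTML</p></person>',
    f'<pers:person xmlns:pers="{PERS}" xmlns:xhtml="{XHTML}">'
    '<pers:name/><xhtml:p>This is XHTML</xhtml:p></pers:person>',
    f'<person xmlns="{PERS}" xmlns:xhtml="{XHTML}">'
    '<name/><xhtml:p>This is XHTML</xhtml:p></person>',
]


def expanded(doc: str) -> list:
    """Return (tag, text) for every element, with tags in {uri}local form."""
    return [(e.tag, e.text) for e in ET.fromstring(doc).iter()]


results = [expanded(d) for d in docs]
print(results[0] == results[1] == results[2])  # True
```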


Namespaces and Attributes

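• Do namespaces work the same for attributes as they do for elements?

• The answer is no, they don't.

• In fact, attributes usually don't have namespaces the way elements do: an unprefixed attribute is in no namespace, even when its element is in a default namespace. Only an attribute written with a prefix belongs to a namespace.

• Unprefixed attributes are just "associated" with the elements to which they belong.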
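The difference between prefixed and unprefixed attributes shows up in the parsed attribute names. This sketch assumes Python 3's xml.etree.ElementTree. The attributes `id` and `pers:rank` are hypothetical, and the same URI is bound both as the default and as the `pers` prefix.

```python
import xml.etree.ElementTree as ET

PERS = "http://www.wiley.com/pers"
doc = (f'<person xmlns="{PERS}" xmlns:pers="{PERS}" '
       'id="25" pers:rank="2"/>')

root = ET.fromstring(doc)
print(root.tag)     # {http://www.wiley.com/pers}person
print(root.attrib)  # the unprefixed id has no namespace; pers:rank does
```

The element picks up the default namespace but `id` does not. Only the prefixed `rank` is expanded to {URI}rank.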


Understanding URIs

• URI (Uniform Resource Identifier) is a string of characters that identifies a resource.

• It can come in one of two flavors:

– URL (Uniform Resource Locator)

– URN (Uniform Resource Name)

• A resource is anything that has identity, for example:

– An item that is retrievable over the Internet, such as an HTML document.

– An item that is not retrievable over the Internet, such as the person who wrote that HTML document.
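Both flavors share the generic URI syntax, a scheme followed by scheme-specific parts. Python's urllib.parse.urlparse splits either kind, so it can be used for a rough comparison. This sketch assumes Python 3. The URN is the DocBook example given in RFC 3986, and it has no network location because it names a resource without saying where to retrieve it.

```python
from urllib.parse import urlparse

url = urlparse("http://www.wiley.com/pers")
urn = urlparse("urn:oasis:names:specification:docbook:dtd:xml:4.1.2")

print(url.scheme, url.netloc, url.path)        # http www.wiley.com /pers
print(urn.scheme, repr(urn.netloc), urn.path)  # urn '' oasis:names:...
```

A namespace name such as http://www.wiley.com/pers only has to be a unique identifier. Nothing requires a document to be retrievable at that address.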


Summary

• What XML is and why it's so useful:

– A protocol for containing and managing information.

– Store and retrieve data, format documents, put data in a presentable form, ensure data integrity, support multiple languages.

• Namespaces are used to differentiate elements and attributes of different XML document types from each other when combining them in one document, or even when processing multiple documents simultaneously.




<e-Gov> Thank you </e-Gov>

