18th October, 2003
1
Khalid Khan
Please turn off your mobile phones
Feel free to interrupt me if you have any questions
Main topics
What is XML?
What are DTD and XML Schema?
What is XSLT?
Different tools, APIs and parsers
What is XML?
XML stands for eXtensible Markup Language.
XML can be used to create new languages
  XML is the mother of WML (Wireless Markup Language), CML (Chemical Markup Language), ThML (Theological Markup Language) and so on …
XML can be used to exchange data
  With XML, data can be exchanged between incompatible systems (portable data)
XML is a cross-platform, software and hardware independent tool for transmitting information.
XML Syntax
XML documents use a self-describing and simple syntax
<?xml version="1.0" encoding="UTF-8"?>
<project>
  <description>Yahoo für das Invisible Web Scatter/Gather-Clustering für semistrukturierte Daten</description>
  <participant>
    <professor>
      <lastName>Fuhr</lastName>
      <firstName>Norbert</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2526</phone>
    </professor>
    <mitarbeiter>
      <lastName>Fischer</lastName>
      <firstName>Gudrun</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2206</phone>
    </mitarbeiter>
    <students number="1">
      <lastName>Khan</lastName>
      <firstName>Khalid</firstName>
      <matNumber>745784</matNumber>
      <sex>Male</sex>
      <age>28</age>
      <studiengang>AOS</studiengang>
      <thema>XML und Werkzeuge</thema>
    </students>
  </participant>
</project>
Namespace
XML Namespaces provide a method to avoid element name conflicts
Name conflicts
Since element names in XML are not fixed, a name conflict often occurs when two different documents use the same name to describe two different types of elements.
<project xmlns="http://www.uni-duisburg.de"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.uni-duisburg.de
                             project.xsd">
1. First, using a default namespace declaration, tell the schema-validator that all of the elements used in this instance document come from the http://www.uni-duisburg.de namespace.
2. Second, with schemaLocation tell the schema-validator that the http://www.uni-duisburg.de namespace is defined by project.xsd (i.e., schemaLocation contains a pair of values).
3. Third, tell the schema-validator that the schemaLocation attribute we are using is the one in the XMLSchema-instance namespace.
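A minimal sketch of how a namespace-aware parser sees the default namespace declaration described above. It uses the standard JAXP DOM API; the class name NamespaceDemo and the helper qualifiedRootName are made up for this illustration, and the one-element documents are cut-down stand-ins for project.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    // Hypothetical helper: returns "{namespaceURI}localName" for the root element.
    static String qualifiedRootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Default namespace: the unprefixed element belongs to the namespace.
        System.out.println(qualifiedRootName("<project xmlns=\"http://www.uni-duisburg.de\"/>"));
        // Same namespace through a prefix: same qualified name.
        System.out.println(qualifiedRootName("<p:project xmlns:p=\"http://www.uni-duisburg.de\"/>"));
        // No declaration: the element is in no namespace, so it is a different element.
        System.out.println(qualifiedRootName("<project/>"));
    }
}
```

Two elements named project only conflict if they are also in the same namespace; the parser compares the pair (namespace URI, local name), not the spelling in the file.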
Schema
Schema and Schema languages
A schema is a definition of the syntax of an XML-based language.
A schema language is a formal language for expressing schemas.
The document being validated is called an instance document or application document.
The main schema languages are
  DTD
  XML Schema
DTD
project.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT project (description, participant+)>
<!ELEMENT description ANY>
<!ELEMENT participant (professor, mitarbeiter, students+)>
<!ELEMENT professor (lastName, firstName, fachgebiet, phone)>
<!ELEMENT mitarbeiter (lastName, firstName, fachgebiet, phone)>
<!ELEMENT students (lastName, firstName, matNumber, sex, age,
                    studiengang, thema)>
<!ATTLIST students
          number CDATA #IMPLIED>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT matNumber (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT studiengang (#PCDATA)>
<!ELEMENT thema (#PCDATA)>
<!ELEMENT fachgebiet (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
project.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE project SYSTEM "project.dtd">
<project>
  <description>Yahoo für das Invisible Web Scatter/Gather-Clustering für semistrukturierte Daten</description>
  <participant>
    <professor>
      <lastName>Fuhr</lastName>
      <firstName>Norbert</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2526</phone>
    </professor>
    <mitarbeiter>
      <lastName>Fischer</lastName>
      <firstName>Gudrun</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2206</phone>
    </mitarbeiter>
    <students number="1">
      <lastName>Khan</lastName>
      <firstName>Khalid</firstName>
      <matNumber>745784</matNumber>
      <sex>Male</sex>
      <age>28</age>
      <studiengang>AOS</studiengang>
      <thema>XML und Werkzeuge</thema>
    </students>
  </participant>
</project>
Problems with DTD
Not itself written in XML syntax
No constraints on character data
  if character data is allowed, any character data is allowed
No support for Namespaces
  of course, XML 1.0 was defined before Namespaces
No embedded, structured self-documentation
  <!-- comments --> are not enough
Attribute value models are too simple
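The "no constraints on character data" point can be tried out with a validating JAXP parser. This is only a sketch: the DTD is a cut-down version of project.dtd placed in an internal subset so that no files are needed, and the class name DtdValidationDemo and the helper validate are assumptions of this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdValidationDemo {
    // Cut-down DTD for a single students element, as an internal subset.
    static final String DTD =
          "<!DOCTYPE students [\n"
        + "  <!ELEMENT students (lastName, age)>\n"
        + "  <!ATTLIST students number CDATA #IMPLIED>\n"
        + "  <!ELEMENT lastName (#PCDATA)>\n"
        + "  <!ELEMENT age (#PCDATA)>\n"
        + "]>\n";

    // Returns the validity errors reported for the given document body.
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors are recoverable
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        // Structure is checked: a missing lastName is reported.
        System.out.println(validate("<students><age>28</age></students>"));
        // Character data is not checked: a non-numeric age passes.
        System.out.println(validate(
            "<students><lastName>Khan</lastName><age>not a number</age></students>"));
    }
}
```

The DTD catches the wrong element structure but accepts any text inside age, which is the gap XML Schema data types are meant to close.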
XML Schema (1)
XML Schemas are a tremendous advancement over DTDs
Enhanced data types:
  44+ versus 10
  Can create your own data types
Written in the same syntax as instance documents
Object-oriented-ish
Global (= top-level) and local (= inlined) type definitions
Structured self-documentation
Uses and supports Namespaces
Modularization (schema inclusion and redefinitions)
And many more
XML Schema (2)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.uni-duisburg.de"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.uni-duisburg.de"
           elementFormDefault="qualified">
  <xs:element name="project"/>
  <xs:element name="description" type="xs:string"/>
  <xs:element name="participant">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="professor" minOccurs="1" maxOccurs="1"/>
        <xs:element ref="mitarbeiter"/>
        <xs:element ref="students" maxOccurs="12"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="professor">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lastName"/>
        <xs:element name="firstName"/>
        <xs:element name="fachgebiet"/>
        <xs:element name="phone"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="mitarbeiter">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lastName"/>
        <xs:element name="firstName"/>
        <xs:element name="fachgebiet"/>
        <xs:element name="phone"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="students">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lastName"/>
        <xs:element name="firstName"/>
        <xs:element name="matNumber"/>
        <xs:element name="sex"/>
        <xs:element name="age"/>
        <xs:element name="studiengang"/>
        <xs:element name="thema"/>
      </xs:sequence>
      <xs:attribute name="number"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="lastName" type="xs:string"/>
  <xs:element name="firstName" type="xs:string"/>
  <xs:element name="fachgebiet" type="xs:string"/>
  <xs:element name="matNumber" type="xs:string"/>
  <xs:element name="sex" type="xs:string"/>
  <xs:element name="age" type="xs:string"/>
  <xs:element name="studiengang" type="xs:string"/>
  <xs:element name="thema" type="xs:string"/>
</xs:schema>
XML Schema (3)
All XML schemas have "schema" as the root element
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.uni-duisburg.de"
           xmlns="http://www.uni-duisburg.de"
           elementFormDefault="qualified">
1. xmlns:xs: the elements and data types that are used to construct schemas (schema, element, complexType, sequence, string, ...) come from the http://www.w3.org/2001/XMLSchema namespace.
2. targetNamespace: indicates that the elements defined by this schema (project, description, participant, ...) are to go in the http://www.uni-duisburg.de namespace.
3. xmlns: the default namespace is http://www.uni-duisburg.de, which is the targetNamespace.
4. elementFormDefault: this is a directive to any instance documents which conform to this schema: any element used by the instance document which was declared in this schema must be namespace qualified.
XML Schema (4)
project.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://www.uni-duisburg.de"
         xsi:schemaLocation="http://www.uni-duisburg.de project.xsd">
  <description>Yahoo für das Invisible Web Scatter/Gather-Clustering für semistrukturierte Daten</description>
  <participant>
    <professor>
      <lastName>Fuhr</lastName>
      <firstName>Norbert</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2526</phone>
    </professor>
    <mitarbeiter>
      <lastName>Fischer</lastName>
      <firstName>Gudrun</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2206</phone>
    </mitarbeiter>
    <students number="1">
      <lastName>Khan</lastName>
      <firstName>Khalid</firstName>
      <matNumber>745784</matNumber>
      <sex>Male</sex>
      <age>28</age>
      <studiengang>AOS</studiengang>
      <thema>XML und Werkzeuge</thema>
    </students>
  </participant>
</project>
XML Schema (5)
project.xml
  schemaLocation="http://www.uni-duisburg.de project.xsd"
  project.xml uses the schema from the namespace http://www.uni-duisburg.de
project.xsd
  targetNamespace="http://www.uni-duisburg.de"
  Defines elements in the namespace http://www.uni-duisburg.de
A schema defines a new vocabulary. Instance documents use that new vocabulary.
Multiple levels of checking
project.xml -> project.xsd -> XMLSchema.xsd (schema-for-schemas)
Validate that the XML document conforms to the rules described in project.xsd
Validate that project.xsd is a valid schema document, i.e., it conforms to the rules described in the schema-for-schemas
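Both levels of checking can be sketched with the standard javax.xml.validation API. The schema below is a deliberately reduced stand-in for project.xsd (one students element with two children), held in a string so that the example is self-contained; SchemaValidationDemo and validate are names invented for this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaValidationDemo {
    static final String NS = "http://www.uni-duisburg.de";

    // Reduced, hypothetical stand-in for project.xsd.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='" + NS + "' xmlns='" + NS + "'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='students'><xs:complexType><xs:sequence>"
        + "<xs:element name='lastName' type='xs:string'/>"
        + "<xs:element name='age' type='xs:integer'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns null if the instance is valid, otherwise the first error message.
    static String validate(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Level 1: compiling the schema checks that it is itself a valid schema document.
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // Level 2: the instance document is checked against the compiled schema.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<students xmlns='" + NS + "'><lastName>Khan</lastName><age>28</age></students>";
        String badAge = "<students xmlns='" + NS + "'><lastName>Khan</lastName><age>abc</age></students>";
        String noNamespace = "<students><lastName>Khan</lastName><age>28</age></students>";
        System.out.println(validate(ok));
        System.out.println(validate(badAge));
        System.out.println(validate(noNamespace));
    }
}
```

The third document has the right structure but is rejected because its elements are not in the target namespace of the schema.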
Built-in Data types
Derived types
  normalizedString, integer, nonPositiveInteger, negativeInteger, long, int, short, byte, nonNegativeInteger, unsignedLong, unsignedInt, unsignedShort, unsignedByte, positiveInteger
Primitive Datatypes
  string, boolean, decimal, float, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION
Creating own data type (1)
A new data type can be defined from an existing data type by specifying values for one or more of the optional facets.
Examples of the facets:
string has six facets
  length, minLength, maxLength, pattern, enumeration, whiteSpace
integer has eight facets
  pattern, maxInclusive, maxExclusive, totalDigits, enumeration, whiteSpace, minInclusive, minExclusive
Creating own data type (2)
PhoneType
<xs:simpleType name="phoneType">
  <xs:restriction base="xs:string">
    <xs:pattern value="\d{4}-\d{3}-\d{4}"/>
  </xs:restriction>
</xs:simpleType>
MatriculationType
<xs:simpleType name="matType">
  <xs:restriction base="xs:integer">
    <xs:totalDigits value="6"/>
    <!--<xs:pattern value="\d{6}"/> -->
  </xs:restriction>
</xs:simpleType>
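To see what the two facets accept and reject, the types above can be wrapped in a tiny schema and tried against sample values. This is a sketch under assumptions: the schema has no target namespace, the global elements phone and matNumber are added only so that there is something to validate, and FacetDemo and isValid are invented names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class FacetDemo {
    // phoneType and matType as on the slide, plus two test elements.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:simpleType name='phoneType'><xs:restriction base='xs:string'>"
        + "<xs:pattern value='\\d{4}-\\d{3}-\\d{4}'/>"
        + "</xs:restriction></xs:simpleType>"
        + "<xs:simpleType name='matType'><xs:restriction base='xs:integer'>"
        + "<xs:totalDigits value='6'/>"
        + "</xs:restriction></xs:simpleType>"
        + "<xs:element name='phone' type='phoneType'/>"
        + "<xs:element name='matNumber' type='matType'/>"
        + "</xs:schema>";

    // True if <element>value</element> is valid against the schema above.
    static boolean isValid(String element, String value) {
        String xml = "<" + element + ">" + value + "</" + element + ">";
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("phone", "0203-379-2526"));  // matches the pattern
        System.out.println(isValid("phone", "0203 379 2526"));  // blanks instead of dashes
        System.out.println(isValid("matNumber", "745784"));     // six digits
        System.out.println(isValid("matNumber", "7457845"));    // seven digits
        System.out.println(isValid("matNumber", "12"));         // fewer than six digits
    }
}
```

totalDigits is an upper bound, so a two-digit number still passes; requiring exactly six digits needs the pattern facet that is commented out in matType.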
Better Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.uni-duisburg.de"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.uni-duisburg.de"
           elementFormDefault="qualified">
  <xs:element name="project">
    <xs:annotation>
      <xs:documentation>This is the root of my schema</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="description" type="xs:string"/>
  <xs:element name="participant">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="professor" type="empType"/>
        <xs:element name="mitarbeiter" type="empType"/>
        <xs:element name="students" type="stuType" maxOccurs="12"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="empType">
    <xs:sequence>
      <xs:element name="lastName" type="xs:string"/>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="fachgebiet" type="xs:string"/>
      <xs:element name="phone" type="phoneType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="stuType">
    <xs:sequence>
      <xs:element name="lastName" type="xs:string"/>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="matNumber" type="matType"/>
      <xs:element name="sex" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
      <xs:element name="studiengang" type="xs:string"/>
      <xs:element name="thema" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="number" type="xs:integer" use="required"/>
  </xs:complexType>
  <xs:simpleType name="phoneType">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d{4}-\d{3}-\d{4}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="matType">
    <xs:restriction base="xs:integer">
      <xs:totalDigits value="6"/>
      <xs:pattern value="\d{6}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
What is XSLT?
XSLT stands for eXtensible Stylesheet Language Transformations.
It is a part of XSL, which consists of three parts:
  XSLT
  XPath
  XSL Formatting Objects
To understand XSLT we first need to understand XPath
XPath
XPath is a set of syntax rules for defining parts of an XML document.
XPath uses paths to define XML elements
XPath defines a library of standard functions
XPath is a major element in XSLT
Axes (1)
Axes (2)
[Figure panels: following-sibling, preceding-sibling, parent, child]
Axes (3)
[Figure panels: following, descendant, preceding, ancestor]
Axes and Abbreviations
Syntactic sugar: convenient notation for common situations
Normal syntax                   Abbreviation
child::                         nothing (so child is the default axis)
attribute::                     @
/descendant-or-self::node()/    //
self::node()                    .
parent::node()                  ..
Example
.//@href
selects all href attributes on the context node and its descendants.
Core function library
Node-set functions:
  last()                        returns the position number of the last node
  position()                    returns the context position
  count(node-set)               number of nodes in node-set
  name(node-set)                name of the first node in node-set
  …
String functions:
  string(value)                 type cast to string
  concat(string, string, ...)   string concatenation
  …
Boolean functions:
  boolean(value)                type cast to boolean
  not(boolean)                  boolean negation
  …
Number functions:
  number(value)                 type cast to number
  sum(node-set)                 sum of the number value of each node in node-set
  …
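A few of these functions and abbreviations can be tried with the standard javax.xml.xpath API. The sketch evaluates each expression against a shortened participant fragment; XPathDemo, eval and the shortened document are assumptions of this example, not part of the project files.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathDemo {
    // Shortened fragment of project.xml, without namespaces.
    static final String XML =
          "<participant>"
        + "<professor><lastName>Fuhr</lastName><firstName>Norbert</firstName></professor>"
        + "<students number='1'><lastName>Khan</lastName><age>28</age></students>"
        + "</participant>";

    // Evaluates the expression with the document node as context and
    // returns the result converted to a string.
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("count(//lastName)"));                 // node-set function
        System.out.println(eval("name(/participant/*[last()])"));     // last() in a predicate
        System.out.println(eval("/participant/*[position()=1]/lastName"));
        System.out.println(eval("//students/@number"));               // abbreviated syntax
        System.out.println(eval(
            "/descendant-or-self::node()/child::students/attribute::number")); // same, unabbreviated
        System.out.println(eval("concat(//firstName, ' ', //professor/lastName)"));
        System.out.println(eval("sum(//age)"));
    }
}
```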
XSLT (1)
The basic idea of XSLT
•An XSLT stylesheet is declarative and uses pattern matching and templates to specify the transformation.
•On the Web, an XSLT transformation can be done either on the client (e.g. Internet Explorer or Mozilla) or on the server (e.g. Apache Xalan).
XSLT (2)
An XSLT stylesheet is itself an XML document:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0" xmlns="...">
  ..                                <- other top-level elements
  <xsl:template match="pattern">
    template                        <- a template rule
  </xsl:template>
  ..                                <- other top-level elements
</xsl:stylesheet>
Templates (1)
There are different kinds of template constructs:
  literal result fragments
  recursive processing
  computed result fragments
  conditional processing
  sorting
  numbering
  variables and parameters
  keys
Templates (2)
Recursive processing instructions:
  <xsl:apply-templates select="node-set expression" .../>
  <xsl:call-template name="..."/>
  <xsl:for-each select="node-set expression"> template </...>
  <xsl:copy> template </...>
  <xsl:copy-of select="..."/>
Example
  <xsl:template match="students">
    <h1>
      <xsl:apply-templates select="age"/>
    </h1>
  </xsl:template>
A literal result fragment is:
  a text constant
  <xsl:text ...> ... </xsl:text>
  <xsl:comment> .. </xsl:comment>
Example
  <xsl:template match="...">
    this text is written directly to output
    when this template is instantiated
  </xsl:template>
Computed result fragments:
  <xsl:value-of select="..."/>
  <xsl:element name="..." namespace="..."> ... </...>
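The template constructs above can be run with the standard javax.xml.transform API (JAXP). This is a small sketch: the stylesheet is a made-up two-rule example that combines apply-templates, value-of, xsl:text and a literal text fragment, and the names XsltDemo and transform are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltDemo {
    // Hypothetical stylesheet producing plain text.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // recursive processing: hand each child of participant to its template rule
        + "<xsl:template match='/'><xsl:apply-templates select='participant/*'/></xsl:template>"
        // literal result fragment ('Prof. ') followed by a computed fragment
        + "<xsl:template match='professor'>Prof. <xsl:value-of select='lastName'/>"
        + "<xsl:text>; </xsl:text></xsl:template>"
        + "<xsl:template match='students'><xsl:value-of select='firstName'/>"
        + "<xsl:text> </xsl:text><xsl:value-of select='lastName'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet above to the given XML text.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<participant>"
            + "<professor><lastName>Fuhr</lastName><firstName>Norbert</firstName></professor>"
            + "<students><lastName>Khan</lastName><firstName>Khalid</firstName></students>"
            + "</participant>";
        System.out.println(transform(xml)); // prints "Prof. Fuhr; Khalid Khan"
    }
}
```

Each child of participant is matched against the template rules by pattern, so the order of the rules in the stylesheet does not matter; the output follows document order.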
Templates and Functions
Conditional processing
  <xsl:if test="expression"> ... </...>
  <xsl:choose>
    <xsl:when test="expression"> ... </...>
    ...
    <xsl:otherwise> ... </...>
  </...>
Sorting
  <xsl:sort select="expression" .../>
  Some extra attributes:
    order="ascending/descending"
    lang="..."
    data-type="text/number"
    case-order="upper-first/lower-first"
Some XSLT functions
  current()
    Returns the current node
  document()
    Used to access the nodes in an external XML document
  element-available()
    Tests whether the element specified is supported by the XSLT processor
  format-number()
    Converts a number into a string
  function-available()
    Tests whether the function specified is supported by the XSLT processor
  generate-id()
    Returns a string value that uniquely identifies a specified node
Project Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body bgcolor="cornflower">
        <h2>
          <font color="blue"><u> Student Project</u></font>
        </h2>
        <h4>Description:</h4>
        <font color="blue">
          <xsl:value-of select="project/description"/>
        </font>
        <hr/>
        <h3>Dozent(en):</h3>
        <xsl:for-each select="project/participant/*">
          <font color="blue">
            <h5>
              <xsl:if test="descendant-or-self::professor">
                Prof. Dr.
              </xsl:if>
              <xsl:if test="descendant-or-self::mitarbeiter">
                Dip. Inform.
              </xsl:if>
              <xsl:if test="descendant-or-self::professor | descendant-or-self::mitarbeiter">
                <xsl:value-of select="firstName"/>
                <xsl:text> </xsl:text>
                <xsl:value-of select="lastName"/>
              </xsl:if>
            </h5>
          </font>
        </xsl:for-each>
        <h3>Students: </h3>
        <xsl:for-each select="project/participant/*">
          <xsl:if test="self::students">
            (<xsl:value-of select="@number"/>)
            <br/>
            Name:
            <font color="blue">
              <xsl:value-of select="firstName"/>
              <xsl:text> </xsl:text>
              <xsl:value-of select="lastName"/>
            </font>
            <br/>
            Matrikulation Nr:
            <font color="blue">
              <xsl:value-of select="matNumber"/>
            </font>
            <br/>
            Studiengang:
            <font color="blue">
              <xsl:value-of select="studiengang"/>
            </font>
            <br/>
            Thema:
            <font color="blue">
              <xsl:value-of select="thema"/>
            </font>
            <br/>
            <br/>
          </xsl:if>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Output in Browser
Different Parsers
Apache Xerces-J
IBM XML4J
James Clark’s parser, XP
MS XML Parser
Different APIs
The Document Object Model (DOM) API
  The official W3C recommendation
The Simple API for XML (SAX) API
  The first widely adopted API for XML in Java and the de facto standard
The JDOM API
  An API that is tailored to Java
JAXP
  The official API for XML processing from Sun
The Streaming API for XML (StAX) API
  A promising new model
DOM
DOM Parsing
DOM is a tree-based parsing technique that builds up the entire tree in memory. It allows complete, dynamic access to the whole XML document.
// DOMParser is assumed here to be the Apache Xerces class org.apache.xerces.parsers.DOMParser
DOMParser p = new DOMParser();
p.parse("project.xml");
Document doc = p.getDocument();
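The same idea can be sketched with the parser-independent JAXP entry point instead of a parser-specific class. Once the tree is in memory it can be walked in any order; DomDemo and studentNames are names chosen for this example, and the XML is passed as a string so that no file is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomDemo {
    // Builds the whole tree, then collects "firstName lastName" for every students element.
    static List<String> studentNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList students = doc.getElementsByTagName("students");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < students.getLength(); i++) {
                Element student = (Element) students.item(i);
                // Random access: firstName is read before lastName although
                // it comes later in the document.
                String first = student.getElementsByTagName("firstName").item(0).getTextContent();
                String last = student.getElementsByTagName("lastName").item(0).getTextContent();
                names.add(first + " " + last);
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<participant>"
            + "<professor><lastName>Fuhr</lastName><firstName>Norbert</firstName></professor>"
            + "<students number='1'><lastName>Khan</lastName><firstName>Khalid</firstName></students>"
            + "</participant>";
        System.out.println(studentNames(xml)); // prints [Khalid Khan]
    }
}
```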
SAX (1)
SAX Parsing
SAX is an event-driven push model for processing XML. SAX started as a grassroots movement, but has gained an official standing. An XML tree is not viewed as a data structure, but as a stream of events generated by the parser.
The kinds of events are:
  the start of the document is encountered
  the end of the document is encountered
  the start tag of an element is encountered
  the end tag of an element is encountered
  character data is encountered
  a processing instruction is encountered
SAX (2)
Scanning the XML file from start to end, each event invokes a corresponding callback method that the programmer writes.
public class MyHandler extends DefaultHandler { ……….. }
SAXParser sp = new SAXParser();
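A minimal sketch of such a handler, using the JAXP SAXParserFactory rather than a parser-specific constructor. The callbacks only record which event fired; SaxDemo and events are names made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    // Parses the XML text and returns the events in the order the parser fired them.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void endDocument() { log.add("endDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.add("start:" + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<students><age>28</age></students>")) {
            System.out.println(event);
        }
    }
}
```

No tree is built: the handler sees each event once, in document order, and must keep whatever state it needs itself.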
JDOM
JDOM is designed to be simple and Java-specific.
JDOM is a small library, since it is used on top of either DOM or SAX.
JDOM contains five Java packages:
  org.jdom - defines the basic model of an XML tree
  org.jdom.adapters - defines wrappers for various DOM implementations
  org.jdom.input - defines means for reading XML documents
  org.jdom.output - defines means for writing XML documents
  org.jdom.transform - defines an interface to JAXP XSLT
Editors
XMLSpy (I just love it)
  www.altova.com/download.html
Eclipse with XML plug-ins
  www.eclipse.org
X/HTML Kit
  www.chami.com/html-kit
Many more
Acknowledgments
During the preparation of this presentation, I studied and used material available online and in print. I thank all of these organizations and authors for their wonderful work.
Here is a small list (sorry to those I missed, I am really sleepy now, 3.00 am):
  www.w3.org/xml
  www.ibm.com/developerworks
  www.w3school.com/default.asp
  www.java.sun.com/products/xml
  DB2 Magazine
  Oracle Magazine
  Roger L. Costello
  Anders Moller (The XML Revolution)
  Michael I. Schwartzbach (The XML Revolution)
  Brett McLaughlin (Java & XML)
  http://xml.Apache.org
  many many more ……….
I would also like to thank my friends Sohail and Asif for their continuous support.