
Khan Slides - Informationssysteme

Page 1: Khan Slides - Informationssysteme

18th October, 2003

1

Khalid Khan

Please turn off your mobile phones

Feel free to interrupt me if you have any questions

Page 2: Khan Slides - Informationssysteme

Main topics

• What is XML?
• What are DTD and XML Schema?
• What is XSLT?
• Different tools, APIs and parsers

Page 3: Khan Slides - Informationssysteme

What is XML?

XML stands for eXtensible Markup Language.

• XML can be used to create new languages
    • XML is the mother of WML (Wireless Markup Language), CML (Chemical Markup Language), ThML (Theological Markup Language) and so on …
• XML can be used to exchange data
    • With XML, data can be exchanged between incompatible systems (portable data)
• XML is a cross-platform, software- and hardware-independent tool for transmitting information.

Page 4: Khan Slides - Informationssysteme

XML Syntax

XML documents use a simple, self-describing syntax.

<?xml version="1.0" encoding="UTF-8"?>
<project>
  <description>Yahoo für das Invisible Web Scatter/Gather-Clustering für semistrukturierte Daten</description>
  <participant>
    <professor>
      <lastName>Fuhr</lastName>
      <firstName>Norbert</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2526</phone>
    </professor>
    <mitarbeiter>
      <lastName>Fischer</lastName>
      <firstName>Gudrun</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2206</phone>
    </mitarbeiter>
    <students number="1">
      <lastName>Khan</lastName>
      <firstName>Khalid</firstName>
      <matNumber>745784</matNumber>
      <sex>Male</sex>
      <age>28</age>
      <studiengang>AOS</studiengang>
      <thema>XML und Werkzeuge</thema>
    </students>
  </participant>
</project>

Page 5: Khan Slides - Informationssysteme

Namespace

XML Namespaces provide a method to avoid element name conflicts.

Name conflicts: since element names in XML are not fixed, a name conflict occurs when two different documents use the same name to describe two different types of elements.

<project xmlns="http://www.uni-duisburg.de"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.uni-duisburg.de project.xsd">

1. xmlns: first, using a default namespace declaration, tell the schema validator that all of the elements used in this instance document come from the http://www.uni-duisburg.de namespace.

2. xsi:schemaLocation: second, tell the schema validator that the http://www.uni-duisburg.de namespace is defined by project.xsd (i.e., schemaLocation contains a pair of values: the namespace and the schema file).

3. xmlns:xsi: third, tell the schema validator that the schemaLocation attribute we are using is the one in the XMLSchema-instance namespace.
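
How a parser resolves the default namespace declaration can be checked with a few lines of Java. This is a minimal sketch using the JAXP DOM API bundled with the JDK; the class name NamespaceDemo and the shortened instance document are assumptions made for illustration and are not part of the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NamespaceDemo {
    // Hypothetical, shortened instance document using the default namespace from the slide.
    static final String XML =
        "<project xmlns=\"http://www.uni-duisburg.de\">"
      + "<description>demo</description>"
      + "</project>";

    /** Returns the namespace URI the parser assigns to the root element. */
    static String rootNamespace() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            Element root = doc.getDocumentElement();
            return root.getNamespaceURI();
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootNamespace());
    }
}
```

The unprefixed element project ends up in the http://www.uni-duisburg.de namespace because of the default declaration on the root element.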

Page 6: Khan Slides - Informationssysteme

Schema


Page 7: Khan Slides - Informationssysteme

Schema and Schema languages

• A schema is a definition of the syntax of an XML-based language.
• A schema language is a formal language for expressing schemas.
• The document being validated is called an instance document or application document.
• The main schema languages are
    • DTD
    • XML Schema

Page 8: Khan Slides - Informationssysteme

DTD

project.dtd

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT project (description, participant+)>
<!ELEMENT description ANY>
<!ELEMENT participant (professor, mitarbeiter, students+)>
<!ELEMENT professor (lastName, firstName, fachgebiet, phone)>
<!ELEMENT mitarbeiter (lastName, firstName, fachgebiet, phone)>
<!ELEMENT students (lastName, firstName, matNumber, sex, age, studiengang, thema)>
<!ATTLIST students number CDATA #IMPLIED>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT matNumber (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT studiengang (#PCDATA)>
<!ELEMENT thema (#PCDATA)>
<!ELEMENT fachgebiet (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

project.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE project SYSTEM "project.dtd">
<project>
  <description>Yahoo für das Invisible Web Scatter/Gather-Clustering für semistrukturierte Daten</description>
  <participant>
    <professor>
      <lastName>Fuhr</lastName>
      <firstName>Norbert</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2526</phone>
    </professor>
    <mitarbeiter>
      <lastName>Fischer</lastName>
      <firstName>Gudrun</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2206</phone>
    </mitarbeiter>
    <students number="1">
      <lastName>Khan</lastName>
      <firstName>Khalid</firstName>
      <matNumber>745784</matNumber>
      <sex>Male</sex>
      <age>28</age>
      <studiengang>AOS</studiengang>
      <thema>XML und Werkzeuge</thema>
    </students>
  </participant>
</project>
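
DTD validation can be tried out from Java with the JAXP parser in the JDK. The sketch below is only an illustration under stated assumptions: the class name DtdValidationDemo is made up, and a strongly shortened DTD is placed in the internal subset so that no project.dtd file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class DtdValidationDemo {
    // Hypothetical, shortened DTD kept in the internal subset for self-containment.
    static final String DOCTYPE =
        "<!DOCTYPE project [\n"
      + "<!ELEMENT project (description, participant+)>\n"
      + "<!ELEMENT description (#PCDATA)>\n"
      + "<!ELEMENT participant (lastName)>\n"
      + "<!ELEMENT lastName (#PCDATA)>\n"
      + "]>\n";

    /** Returns true if the given document body is valid against DOCTYPE. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validity errors into exceptions; by default they are only reported.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed or not valid
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<project><description>d</description>"
            + "<participant><lastName>Khan</lastName></participant></project>"));
        System.out.println(isValid("<project><description>d</description></project>"));
    }
}
```

The second document is rejected because the content model of project requires at least one participant element.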

Page 9: Khan Slides - Informationssysteme

Problems with DTD

• A DTD is not itself written in XML syntax
• No constraints on character data
    • if character data is allowed, any character data is allowed
• No support for Namespaces
    • of course, XML 1.0 was defined before Namespaces
• No embedded, structured self-documentation
    • <!-- comments --> are not enough
• Attribute value models are too simple

Page 10: Khan Slides - Informationssysteme

XML Schema (1)

XML Schemas are a tremendous advancement over DTDs:

• Enhanced data types
    • 44+ versus 10
    • Can create your own data types
• Written in the same syntax as instance documents
• Object-oriented-ish
• Global (= top-level) and local (= inlined) type definitions
• Structured self-documentation
• Uses and supports Namespaces
• Modularization (schema inclusion and redefinitions)
• and many more

Page 11: Khan Slides - Informationssysteme

XML Schema (2)

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.uni-duisburg.de"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.uni-duisburg.de"
           elementFormDefault="qualified">
  <xs:element name="project"/>
  <xs:element name="description" type="xs:string"/>
  <xs:element name="participant">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="professor" minOccurs="1" maxOccurs="1"/>
        <xs:element ref="mitarbeiter"/>
        <xs:element ref="students" maxOccurs="12"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="professor">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lastName"/>
        <xs:element name="firstName"/>
        <xs:element name="fachgebiet"/>
        <xs:element name="phone"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="mitarbeiter">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lastName"/>
        <xs:element name="firstName"/>
        <xs:element name="fachgebiet"/>
        <xs:element name="phone"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="students">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lastName"/>
        <xs:element name="firstName"/>
        <xs:element name="matNumber"/>
        <xs:element name="sex"/>
        <xs:element name="age"/>
        <xs:element name="studiengang"/>
        <xs:element name="thema"/>
      </xs:sequence>
      <xs:attribute name="number"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="lastName" type="xs:string"/>
  <xs:element name="firstName" type="xs:string"/>
  <xs:element name="fachgebiet" type="xs:string"/>
  <xs:element name="matNumber" type="xs:string"/>
  <xs:element name="sex" type="xs:string"/>
  <xs:element name="age" type="xs:string"/>
  <xs:element name="studiengang" type="xs:string"/>
  <xs:element name="thema" type="xs:string"/>
</xs:schema>

Page 12: Khan Slides - Informationssysteme

XML Schema (3)

All XML schemas have "schema" as the root element:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.uni-duisburg.de"
           xmlns="http://www.uni-duisburg.de"
           elementFormDefault="qualified">

1. xmlns:xs: the elements and data types that are used to construct schemas (schema, element, complexType, sequence, string, …) come from the http://www.w3.org/2001/XMLSchema namespace.

2. targetNamespace: indicates that the elements defined by this schema (project, description, participant, …) are to go in the http://www.uni-duisburg.de namespace.

3. xmlns: the default namespace is http://www.uni-duisburg.de, which is the targetNamespace.

4. elementFormDefault: this is a directive to any instance document that conforms to this schema: any element used by the instance document that was declared in this schema must be namespace-qualified.

Page 13: Khan Slides - Informationssysteme

XML Schema (4)

project.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://www.uni-duisburg.de"
         xsi:schemaLocation="http://www.uni-duisburg.de project.xsd">
  <description>Yahoo für das Invisible Web Scatter/Gather-Clustering für semistrukturierte Daten</description>
  <participant>
    <professor>
      <lastName>Fuhr</lastName>
      <firstName>Norbert</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2526</phone>
    </professor>
    <mitarbeiter>
      <lastName>Fischer</lastName>
      <firstName>Gudrun</firstName>
      <fachgebiet>Informationssysteme</fachgebiet>
      <phone>0203-379-2206</phone>
    </mitarbeiter>
    <students number="1">
      <lastName>Khan</lastName>
      <firstName>Khalid</firstName>
      <matNumber>745784</matNumber>
      <sex>Male</sex>
      <age>28</age>
      <studiengang>AOS</studiengang>
      <thema>XML und Werkzeuge</thema>
    </students>
  </participant>
</project>

Page 14: Khan Slides - Informationssysteme

XML Schema (5)

project.xml
    schemaLocation="http://www.uni-duisburg.de project.xsd"
    project.xml uses the schema from the namespace http://www.uni-duisburg.de

project.xsd
    targetNamespace="http://www.uni-duisburg.de"
    Defines the elements in the namespace http://www.uni-duisburg.de

A schema defines a new vocabulary. Instance documents use that new vocabulary.

Page 15: Khan Slides - Informationssysteme

Multiple levels of checking

project.xml → project.xsd → XMLSchema.xsd (schema-for-schemas)

• Validate that the XML document conforms to the rules described in project.xsd.
• Validate that project.xsd is a valid schema document, i.e., it conforms to the rules described in the schema-for-schemas.
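
Both levels of checking can be observed with the javax.xml.validation API that ships with the JDK. The following is a rough sketch, not the tooling used for the slides: the class name XsdValidationDemo and the one-element schema are assumptions chosen to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class XsdValidationDemo {
    // Hypothetical, shortened schema in the target namespace used on the slides.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
      + " targetNamespace=\"http://www.uni-duisburg.de\""
      + " xmlns=\"http://www.uni-duisburg.de\""
      + " elementFormDefault=\"qualified\">"
      + "<xs:element name=\"project\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"description\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns true if the instance document conforms to XSD. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            // Second level: compiling fails if XSD is not a valid schema document.
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema document is invalid", e);
        }
        try {
            // First level: check the instance document against the compiled schema.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<project xmlns=\"http://www.uni-duisburg.de\">"
            + "<description>demo</description></project>"));
        System.out.println(isValid("<project><description>demo</description></project>"));
    }
}
```

The second document fails because its elements are not in the target namespace, which is what elementFormDefault="qualified" demands of instance documents.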

Page 16: Khan Slides - Informationssysteme

Built-in Data types

Derived types:
normalizedString, integer, nonPositiveInteger, negativeInteger, long, int, short, byte, nonNegativeInteger, unsignedLong, unsignedInt, unsignedShort, unsignedByte, positiveInteger

Primitive data types:
string, boolean, decimal, float, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION

Page 17: Khan Slides - Informationssysteme

Creating own data type (1)

A new data type can be defined from an existing data type by specifying values for one or more of the optional facets.

Examples of facets:

• string has six facets
    • length
    • minLength
    • maxLength
    • pattern
    • enumeration
    • whiteSpace
• integer has eight facets
    • pattern
    • maxInclusive
    • maxExclusive
    • totalDigits
    • enumeration
    • whiteSpace
    • minInclusive
    • minExclusive

Page 18: Khan Slides - Informationssysteme

Creating own data type (2)

PhoneType

<xs:simpleType name="phoneType">
  <xs:restriction base="xs:string">
    <xs:pattern value="\d{4}-\d{3}-\d{4}"/>
  </xs:restriction>
</xs:simpleType>

MatriculationType

<xs:simpleType name="matType">
  <xs:restriction base="xs:integer">
    <xs:totalDigits value="6"/>
    <!--<xs:pattern value="\d{6}"/> -->
  </xs:restriction>
</xs:simpleType>
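
The effect of the pattern facet can be checked by validating a single phone element against phoneType. This is a minimal sketch under assumptions: the class name PhoneTypeDemo and the tiny no-namespace schema wrapped around phoneType are made up for the example; only the type definition itself comes from the slide.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class PhoneTypeDemo {
    // phoneType as on the slide, plus a hypothetical top-level phone element that uses it.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:simpleType name=\"phoneType\">"
      + "<xs:restriction base=\"xs:string\">"
      + "<xs:pattern value=\"\\d{4}-\\d{3}-\\d{4}\"/>"
      + "</xs:restriction>"
      + "</xs:simpleType>"
      + "<xs:element name=\"phone\" type=\"phoneType\"/>"
      + "</xs:schema>";

    /** Returns true if the text is accepted as the content of a phone element. */
    static boolean isValidPhone(String text) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                         .newSchema(new StreamSource(new StringReader(XSD)));
            String xml = "<phone>" + text + "</phone>";
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the value violates the pattern facet
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("0203-379-2526"));
        System.out.println(isValidPhone("379-2526"));
    }
}
```

A pattern facet always has to match the whole value, so the shorter number is rejected even though it contains digits and hyphens only.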

Page 19: Khan Slides - Informationssysteme

Better Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.uni-duisburg.de"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.uni-duisburg.de"
           elementFormDefault="qualified">
  <xs:element name="project">
    <xs:annotation>
      <xs:documentation>This is the root of my schema</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="description" type="xs:string"/>
  <xs:element name="participant">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="professor" type="empType"/>
        <xs:element name="mitarbeiter" type="empType"/>
        <xs:element name="students" type="stuType" maxOccurs="12"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="empType">
    <xs:sequence>
      <xs:element name="lastName" type="xs:string"/>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="fachgebiet" type="xs:string"/>
      <xs:element name="phone" type="phoneType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="stuType">
    <xs:sequence>
      <xs:element name="lastName" type="xs:string"/>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="matNumber" type="matType"/>
      <xs:element name="sex" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
      <xs:element name="studiengang" type="xs:string"/>
      <xs:element name="thema" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="number" type="xs:integer" use="required"/>
  </xs:complexType>
  <xs:simpleType name="phoneType">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d{4}-\d{3}-\d{4}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="matType">
    <xs:restriction base="xs:integer">
      <xs:totalDigits value="6"/>
      <xs:pattern value="\d{6}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Page 20: Khan Slides - Informationssysteme

What is XSLT?

XSLT stands for eXtensible Stylesheet Language Transformations. It is a part of XSL, which consists of three parts:

• XSLT
• XPath
• XSL Formatting Objects

To understand XSLT, we first need to understand XPath.

Page 21: Khan Slides - Informationssysteme

XPath

XPath is a set of syntax rules for addressing parts of an XML document.

XPath uses path expressions to select XML elements

XPath defines a library of standard functions

XPath is a major element in XSLT

Page 22: Khan Slides - Informationssysteme

Axes (1)

Page 23: Khan Slides - Informationssysteme

Axes (2)

[Figure: the following-sibling, preceding-sibling, parent and child axes]

Page 24: Khan Slides - Informationssysteme

Axes (3)

[Figure: the following, descendant, preceding and ancestor axes]

Page 25: Khan Slides - Informationssysteme

Axes and Abbreviations

Syntactic sugar: convenient notation for common situations

Normal syntax                     Abbreviation
child::                           nothing (so child is the default axis)
attribute::                       @
/descendant-or-self::node()/      //
self::node()                      .
parent::node()                    ..

Example: .//@href

selects all href attributes of the context node and of its descendants.
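
The equivalence between the abbreviated and the full syntax can be checked with the javax.xml.xpath API from the JDK. This is a small sketch under assumptions: the class name XPathAbbreviationDemo and the sample document with a and div elements are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathAbbreviationDemo {
    // Hypothetical sample: two of the three a elements carry an href attribute.
    static final String XML =
        "<page><a href=\"a.html\"/><div><a href=\"b.html\"/><a/></div></page>";

    /** Evaluates the expression with the root element as context node and counts the result nodes. */
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, doc.getDocumentElement(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(".//@href"));
        System.out.println(count("self::node()/descendant-or-self::node()/attribute::href"));
    }
}
```

Both expressions select the same two attribute nodes, since the first is only a shorthand for the second.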

Page 26: Khan Slides - Informationssysteme

Core function library

Node-set functions:
    last()                        returns the position number of the last node
    position()                    returns the context position
    count(node-set)               number of nodes in node-set
    name(node-set)                name of the first node in node-set
    …

String functions:
    string(value)                 type cast to string
    concat(string, string, ...)   string concatenation
    …

Boolean functions:
    boolean(value)                type cast to boolean
    not(boolean)                  boolean negation
    …

Number functions:
    number(value)                 type cast to number
    sum(node-set)                 sum of the number values of all nodes in node-set
    …
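
A few of these functions can be tried out with the javax.xml.xpath API from the JDK. The sketch below is illustrative only: the class name XPathFunctionsDemo is made up, and the second student in the sample data is invented so that count() and sum() have more than one node to work on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathFunctionsDemo {
    // Sample data; the second students element is hypothetical.
    static final String XML =
        "<participant>"
      + "<students><firstName>Khalid</firstName><lastName>Khan</lastName><age>28</age></students>"
      + "<students><firstName>Anna</firstName><lastName>Beispiel</lastName><age>30</age></students>"
      + "</participant>";

    private static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
    }

    /** Evaluates an expression whose result is converted to a number. */
    static double number(String expression) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates an expression whose result is converted to a string. */
    static String string(String expression) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(), XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("count(//students)"));
        System.out.println(number("sum(//age)"));
        System.out.println(string("concat(//firstName, ' ', //lastName)"));
    }
}
```

When a node-set is passed where a string is expected, as in the concat() call, XPath 1.0 uses the string value of the first node in document order.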

Page 27: Khan Slides - Informationssysteme

XSLT (1)

The basic idea of XSLT

• An XSLT stylesheet is declarative and uses pattern matching and templates to specify the transformation.

• Tools: on the Web, an XSLT transformation can be done either on the client (e.g. Internet Explorer or Mozilla) or on the server (e.g. Apache Xalan).

Page 28: Khan Slides - Informationssysteme

XSLT (2)

An XSLT stylesheet is itself an XML document:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0" xmlns="...">
  ..
  <xsl:template match="pattern">    <!-- a template rule -->
    template
  </xsl:template>
  ..                                <!-- other top-level elements -->
</xsl:stylesheet>

Page 29: Khan Slides - Informationssysteme

Templates (1)

There are different kinds of template constructs:

• literal result fragments
• recursive processing
• computed result fragments
• conditional processing
• sorting
• numbering
• variables and parameters
• keys

Page 30: Khan Slides - Informationssysteme

Templates (2)

Recursive processing instructions:

<xsl:apply-templates select="node-set expression" .../>
<xsl:call-template name="..."/>
<xsl:for-each select="node-set expression"> template </...>
<xsl:copy> template </...>
<xsl:copy-of select="..."/>

Example:

<xsl:template match="students">
  <h1>
    <xsl:apply-templates select="age"/>
  </h1>
</xsl:template>

A literal result fragment is:

• a text constant
• <xsl:text ...> ... </xsl:text>
• <xsl:comment> .. </xsl:comment>

Example:

<xsl:template match="...">
  this text is written directly to output
  when this template is instantiated
</xsl:template>

Computed result fragments:

<xsl:value-of select="..."/>
<xsl:element name="..." namespace="..."> ... </...>

Page 31: Khan Slides - Informationssysteme

Templates and Functions

Conditional processing:

<xsl:if test="expression"> ... </...>
<xsl:choose>
  <xsl:when test="expression"> ... </...>
  ...
  <xsl:otherwise> ... </...>
</...>

Sorting:

<xsl:sort select="expression" .../>

Some extra attributes:
    order="ascending/descending"
    lang="..."
    data-type="text/number"
    case-order="upper-first/lower-first"

Some XSLT functions:

    current()              Returns the current node
    document()             Used to access the nodes in an external XML document
    element-available()    Tests whether the element specified is supported by the XSLT processor
    format-number()        Converts a number into a string
    function-available()   Tests whether the function specified is supported by the XSLT processor
    generate-id()          Returns a string value that uniquely identifies a specified node

Page 32: Khan Slides - Informationssysteme

Project Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body bgcolor="cornflower">
        <h2>
          <font color="blue">
            <u> Student Project</u>
          </font>
        </h2>
        <h4>Description:</h4>
        <font color="blue">
          <xsl:value-of select="project/description"/>
        </font>
        <hr/>
        <h3>Dozent(en):</h3>
        <xsl:for-each select="project/participant/*">
          <font color="blue">
            <h5>
              <xsl:if test="descendant-or-self::professor">
                Prof. Dr.
              </xsl:if>
              <xsl:if test="descendant-or-self::mitarbeiter">
                Dip. Inform.
              </xsl:if>
              <xsl:if test="descendant-or-self::professor | descendant-or-self::mitarbeiter">
                <xsl:value-of select="firstName"/>
                <xsl:text> </xsl:text>
                <xsl:value-of select="lastName"/>
              </xsl:if>
            </h5>
          </font>
        </xsl:for-each>
        <h3>Students: </h3>
        <xsl:for-each select="project/participant/*">
          <xsl:if test="self::students">
            (<xsl:value-of select="@number"/>)
            <br/>
            Name:
            <font color="blue">
              <xsl:value-of select="firstName"/>
              <xsl:text> </xsl:text>
              <xsl:value-of select="lastName"/>
            </font>
            <br/>
            Matrikulation Nr:
            <font color="blue">
              <xsl:value-of select="matNumber"/>
            </font>
            <br/>
            Studiengang:
            <font color="blue">
              <xsl:value-of select="studiengang"/>
            </font>
            <br/>
            Thema:
            <font color="blue">
              <xsl:value-of select="thema"/>
            </font>
            <br/>
            <br/>
          </xsl:if>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
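
A stylesheet of this kind can be applied from Java with the javax.xml.transform API in the JDK. The sketch below is not the setup used for the slides: the class name XsltDemo is hypothetical, and both the stylesheet and the input are cut down to a single for-each loop that writes plain text instead of HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltDemo {
    // Shortened input document with one student.
    static final String XML =
        "<project><participant><students number=\"1\">"
      + "<firstName>Khalid</firstName><lastName>Khan</lastName>"
      + "</students></participant></project>";

    // Hypothetical, cut-down stylesheet: prints "firstName lastName" for every student.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"project/participant/students\">"
      + "<xsl:value-of select=\"firstName\"/>"
      + "<xsl:text> </xsl:text>"
      + "<xsl:value-of select=\"lastName\"/>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies XSL to XML and returns the result as a string. */
    static String transform() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform());
    }
}
```

The xsl:text element is what keeps the single space between the two names; whitespace-only text directly inside a template would be stripped.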

Page 33: Khan Slides - Informationssysteme

Output in Browser

Page 34: Khan Slides - Informationssysteme

Different Parsers

Apache Xerces-J

IBM XML4J

James Clark’s parser, XP

MS XML Parser

Page 35: Khan Slides - Informationssysteme

Different APIs

• The Document Object Model (DOM) API
    • The official W3C proposal
• The Simple API for XML (SAX)
    • The first widely adopted API for XML in Java and the de facto standard
• The JDOM API
    • An API that is tailored to Java
• JAXP
    • The official API for XML processing from Sun
• The Streaming API for XML (StAX)
    • A promising, newly introduced model
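
To give an impression of the StAX model from the last item, here is a minimal pull-parsing sketch with the javax.xml.stream API that is part of current JDKs. The class name StaxDemo and the sample input are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxDemo {
    /** Collects the local names of all elements in document order. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // The application asks for the next event itself (pull model).
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(
            "<project><description>x</description><participant/></project>"));
    }
}
```

Unlike with SAX, the loop is driven by the application, so parsing can stop as soon as the interesting part of the document has been read.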

Page 36: Khan Slides - Informationssysteme

DOM

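DOM Parsing

DOM is a tree-based parsing technique that builds the entire tree in memory. It allows complete, dynamic access to the whole XML document.

// Assumption: DOMParser is the Xerces-J class org.apache.xerces.parsers.DOMParser
DOMParser p = new DOMParser();
p.parse("project.xml");            // read and parse the whole file
Document doc = p.getDocument();    // org.w3c.dom.Document holding the complete tree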
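
The same idea can be tried with the JDK alone through the JAXP DocumentBuilder, which hides the concrete parser. This is a minimal sketch; the class name DomDemo and the in-memory sample document are assumptions made so that the example does not need a project.xml file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomDemo {
    /** Parses the document into a DOM tree and returns the text of the first lastName element. */
    static String firstLastName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // The whole tree is in memory, so any part of it can be reached at any time.
            NodeList names = doc.getElementsByTagName("lastName");
            return names.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLastName(
            "<project><participant><professor><lastName>Fuhr</lastName></professor>"
          + "<mitarbeiter><lastName>Fischer</lastName></mitarbeiter></participant></project>"));
    }
}
```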

Page 37: Khan Slides - Informationssysteme

SAX (1)

SAX Parsing

SAX is an event-driven push model for processing XML. SAX started as a grassroots movement, but has gained an official standing. An XML tree is not viewed as a data structure, but as a stream of events generated by the parser.

The kinds of events are:

• the start of the document is encountered
• the end of the document is encountered
• the start tag of an element is encountered
• the end tag of an element is encountered
• character data is encountered
• a processing instruction is encountered

Page 38: Khan Slides - Informationssysteme

SAX (2)

While the XML file is scanned from start to end, each event invokes a corresponding callback method that the programmer writes.

public class MyHandler extends DefaultHandler { ……….. }
SAXParser sp = new SAXParser();
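
A complete, if tiny, handler could look as follows. This sketch uses the JAXP SAXParserFactory from the JDK instead of instantiating a parser class directly; the class name SaxDemo and the choice to count start tags are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxDemo {
    /** Counts the start tags that the parser reports while scanning the document. */
    static int countElements(String xml) {
        final int[] count = {0};
        // DefaultHandler supplies empty callbacks; only the interesting ones are overridden.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements(
            "<project><description>x</description><participant/></project>"));
    }
}
```

No tree is built here: once the parser has moved past an element, the handler has to remember whatever it still needs.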

Page 39: Khan Slides - Informationssysteme

JDOM

• JDOM is designed to be simple and Java-specific.
• JDOM is a small library, since it is used on top of either DOM or SAX.
• JDOM contains five Java packages:
    • org.jdom - defines the basic model of an XML tree
    • org.jdom.adapters - defines wrappers for various DOM implementations
    • org.jdom.input - defines means for reading XML documents
    • org.jdom.output - defines means for writing XML documents
    • org.jdom.transform - defines an interface to JAXP XSLT

Page 40: Khan Slides - Informationssysteme

Editors

• XMLSpy (I just love it): www.altova.com/download.html

• Eclipse with XML plug-ins: www.eclipse.org

• X/HTML Kit: www.chami.com/html-kit

Many more

Page 41: Khan Slides - Informationssysteme

Acknowledgments

During the preparation of this presentation, I studied and used material available online and in print. I thank all of these organizations and authors for their wonderful work. Here is a small list (sorry to those I missed, I am really sleepy now, 3.00 am):

• www.w3.org/xml
• www.ibm.com/developerworks
• www.w3school.com/default.asp
• www.java.sun.com/products/xml
• DB2 Magazine
• Oracle Magazine
• Roger L. Costello
• Anders Moller (The XML Revolution)
• Michael I. Schwartzbach (The XML Revolution)
• Brett McLaughlin (Java & XML)
• http://xml.Apache.org
• many, many more ……….

I would also like to thank my friends Sohail and Asif for their continuous support.
