peter-winstanley
A presentation on XML from 2006! but still useful
XML From The Ground Up
Fixed Width Field
12345678901234567890123456789
simpson   bart    springfield
flintstonefred    bedrock
rubble    barney  bedrock
Fixed Width cont…
simpson bart springfield
flintstone fred bedrock
rubble barney bedrock
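A minimal sketch of how a fixed-width record is read: each field is cut out by column position and the padding is stripped. The column boundaries (surname in columns 1-10, first name in 11-18, town in 19-29) are an assumption inferred from the 29-character ruler, and `FIELDS` and `parse_fixed_width` are illustrative names.

```python
# Assumed layout: surname = columns 1-10, first name = 11-18, town = 19-29.
FIELDS = [("surname", 0, 10), ("first_name", 10, 18), ("town", 18, 29)]

RECORDS = [
    "simpson   bart    springfield",
    "flintstonefred    bedrock",
    "rubble    barney  bedrock",
]


def parse_fixed_width(line):
    """Slice one record by column position and strip the padding."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}


rows = [parse_fixed_width(line) for line in RECORDS]
for row in rows:
    print(row["surname"], row["first_name"], row["town"])
```

Note that "flintstonefred" only splits correctly because the reader knows the column widths in advance; nothing in the data itself marks the boundary.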
CSV
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! air - moon roof -
loaded",4799.00
CSV cont…
1997 | Ford  | E350                       | ac, abs, moon                       | 3000.00
1999 | Chevy | Venture "Extended Edition" |                                     | 4900.00
1996 | Jeep  | Grand Cherokee             | MUST SELL! air - moon roof - loaded | 4799.00
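As a sketch, the same records can be read with Python's standard `csv` module, which handles the three awkward cases shown: commas inside quotes, doubled quotes, and a quoted field that spans two lines. The `DATA` string is typed in from the slide.

```python
import csv
import io

# The three records from the slide; the last quoted field spans two lines.
DATA = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""",,4900.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL! air - moon roof -\nloaded",4799.00\n'
)

rows = list(csv.reader(io.StringIO(DATA)))
for row in rows:
    print(row)
```

Splitting each line on commas would not work here, which is why a CSV parser is needed even for such a "simple" format.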
MARC
01041cam 2200265 a 450000100200000000300040002000
50017000240080041000410100024000820200025001060200 04400131040001800175050002400193082001800217100003 20023524500870026724600360035425000120039026000370 04023000029004395000042004685200220005106500033007 30650001200763^###89048230#/AC/r91^DLC^19911106082 810.9^891101s1990####maua###j######000#0#eng##^##$ a###89048230#/AC/r91^##$a0316107514 :$c$12.95^##$a 0316107506 (pbk.) :$c$5.95 ($6.95 Can.)^##$aDLC$cD LC$dDLC^00$aGV943.25$b.B74 1990^00$a796.334/2$220^ 10$aBrenner, Richard J.,$d1941-^10$aMake the team. $pSoccer :$ba heads up guide to super soccer! /$cR ichard J. Brenner.^30$aHeads up guide to super soccer.^##$a1st ed.^##$aBoston :$bLittle, Brown,$cc19 90.^##$a127 p. :$bill. ;$c19 cm.^##$a"A Sports ill ustrated for kids book."^##$aInstructions for improving soccer skills. Discusses dribbling, heading, playmaking, defense, conditioning, mental attitud e, how to handle problems with coaches, parents, and other players, and the history of soccer.^#0$aS occer$vJuvenile literature.^#1$aSoccer.^\
MARC cont…
Leader                 01041cam 2200265 a 4500
Control No.     001    ###89048230
Control No. ID  003    DLC
DTLT            005    19911106082810.9
Fixed Data      008    891101s1990 maua j 001 0 eng
LCCN            010 ## $a ###89048230
ISBN            020 ## $a 0316107514 : $c $12.95
ISBN            020 ## $a 0316107506 (pbk.) : $c $5.95 ($6.95 Can.)
Cat. Source     040 ## $a DLC $c DLC $d DLC
LC Call No.     050 00 $a GV943.25 $b .B74 1990
Dewey No.       082 00 $a 796.334/2 $2 20
…
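A rough sketch of why the raw MARC record is machine-readable despite looking like noise: after the 24-character leader comes a directory of 12-character entries, each made of a 3-character tag, a 4-digit field length and a 5-digit start offset. The four entries below are copied from the raw record above; `parse_directory` is an illustrative name, not part of any MARC library.

```python
# First four directory entries, copied from the raw record on the slide.
DIRECTORY = "001002000000" "003000400020" "005001700024" "008004100041"


def parse_directory(directory):
    """Split a MARC directory into (tag, field_length, start_offset) tuples."""
    entries = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3]
        length = int(entry[3:7])
        start = int(entry[7:12])
        entries.append((tag, length, start))
    return entries


entries = parse_directory(DIRECTORY)
for tag, length, start in entries:
    print(tag, length, start)
```

Each start offset equals the previous offset plus the previous length, so the fields sit back to back in the data area.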
GML
:p.Here's an example of some BASIC statements:
:xmp.
10 PRINT USING 55 A, B, C
20 LET J = K + 2
30 IF J = X GO TO 80
:exmp.
:pc.that will solve this problem.
:fig place=inline width=page frame=box.
AN INLINE, PAGE-WIDE FIGURE
Because the contents of a figure format EXACTLY as entered, you can enter blanks on the line (before text) and the lines will print exactly the same as they were entered!
:figcap.An Inline, Page-Wide Figure
:figdesc.This is the first figure I have entered myself.
:efig.
:p.This paragraph follows the FIG end tag. Here we have another figure (inline and column wide):
:fig place=inline width=column.
Let's create another figure that is column wide,
which will create a second item for a list of illustrations in a future exercise.
:figcap.A Column-Wide Figure
:efig.
GML cont…
SGML
<QUOTE TYPE="example"> typically something like <ITALICS>this</ITALICS>
</QUOTE>
HTML
XML - 1
<stats21>
  <ARN ref="E008026">
    <AttendantCircumstancesRecord>
      <PoliceForce>96</PoliceForce>
      <YearOfRecord>00</YearOfRecord>
      <MonthOfRecord>00</MonthOfRecord>
      <AccidentReferenceNumber>E008026</AccidentReferenceNumber>
      <AccidentSeverity>3</AccidentSeverity>
      <NumberOfVehicles>002</NumberOfVehicles>
      <NumberOfCasualties>001</NumberOfCasualties>
      …
    </AttendantCircumstancesRecord>
  </ARN>
</stats21>
XML - 2
Is for structuring data
Is derived from SGML (as is HTML)
Is text, but isn’t meant to be read
Is verbose by design
Basic Syntax of XML
All XML elements must have a closing tag
Empty elements must close with /
XML tags are case sensitive
All XML elements must be properly nested
All XML documents must have a root element
Attribute values must always be quoted
XML entities must be used for special characters
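The rules above can be sketched as a set of small test documents fed to a non-validating parser, here Python's standard `xml.etree.ElementTree`. The sample strings and the `is_well_formed` helper are made up for illustration; each bad sample breaks exactly one rule.

```python
import xml.etree.ElementTree as ET

# One well-formed sample, then one sample per broken rule.
SAMPLES = {
    "well-formed": '<note id="1"><to>Peter</to><empty/></note>',
    "missing closing tag": "<note><to>Peter</note>",
    "case mismatch": "<Note><to>Peter</to></note>",
    "bad nesting": "<note><b><i>text</b></i></note>",
    "two roots": "<note/><note/>",
    "unquoted attribute": "<note id=1/>",
    "bare ampersand": "<note>Fish & chips</note>",
}


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {label: is_well_formed(text) for label, text in SAMPLES.items()}
for label, ok in results.items():
    print(f"{label}: {'ok' if ok else 'rejected'}")
```

A conforming XML parser must stop at the first such error rather than guess, which is a deliberate difference from the forgiving behaviour of HTML browsers.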
Special Characters in XML strings
&  -  &amp;
<  -  &lt;
>  -  &gt;
"  -  &quot;
'  -  &apos;
Example of Special Characters
Invalid XML (not well-formed)
<Organization>Logica & SE</Organization>
Valid XML
<Organization>Logica &amp; SE</Organization>
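In practice the escaping is rarely done by hand. A minimal sketch using `xml.sax.saxutils.escape` from the Python standard library, with the organization name from the slide as input; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

name = "Logica & SE"

# escape() replaces &, < and > with their entity references.
escaped = escape(name)
document = f"<Organization>{escaped}</Organization>"
print(document)

# Parsing the document turns the entity reference back into the character.
parsed = ET.fromstring(document)
print(parsed.text)

# Quotes are only escaped on request, e.g. for use inside attribute values.
quoted = escape('say "hi"', {'"': "&quot;", "'": "&apos;"})
print(quoted)
```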
XML Structure
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="xmlstyle.css"?>
<bookstore xml:lang="en-US" xmlns:def="Definitions">
  <book id="1">The Bible</book>
  …
</bookstore>
Prolog (optional): the <?xml …?> declaration
Processing Instruction (optional): <?xml-stylesheet …?>
Document Element (namespace/s): <bookstore …>
Child node/s: <book>
Closing tag of Document Element: </bookstore>
XML Example
<?xml version="1.0" encoding="UTF-8"?>
<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <Instructions>
    <step>Mix all ingredients together, and knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again, place in a tin, and then bake in the oven.</step>
  </Instructions>
</Recipe>
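[Figure: tree diagram of the Recipe document. The Root node has two children: the ?xml node and the Recipe element. Recipe carries the attributes name (bread), prep_time (5 mins) and cook_time (3 hours), and has the child elements title (text "Basic…"), ingredient (attribute amount = 3, text "Flour") and Instructions, which has three step children (text "Mix…", "Cover…", "Knead…"). Legend: root, p-i, element, attribute, text.]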
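The same tree can be printed from code. A minimal sketch with `xml.etree.ElementTree`, using a shortened copy of the recipe (one ingredient, one step); the `outline` helper is an illustrative name. Note that ElementTree does not expose the XML declaration as a node, so the walk starts at the Recipe element.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe document from the slide.
RECIPE = """<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <Instructions>
    <step>Mix all ingredients together, and knead thoroughly.</step>
  </Instructions>
</Recipe>"""


def outline(element, depth=0):
    """Return indented lines: the element, then its attributes, text and children."""
    pad = "  " * depth
    lines = [f"{pad}element {element.tag}"]
    for key, value in element.attrib.items():
        lines.append(f"{pad}  attribute {key} = {value}")
    text = (element.text or "").strip()
    if text:
        lines.append(f"{pad}  text {text}")
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


lines = outline(ET.fromstring(RECIPE))
print("\n".join(lines))
```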
Attributes vs Elements
Data can be stored in child elements or in attributes.
<person sex="female">
  <fname>Anna</fname>
  <lname>Smith</lname>
</person>
<person>
  <sex>female</sex>
  <fname>Anna</fname>
  <lname>Smith</lname>
</person>
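The choice shows up in how the data is read back. A small sketch with `xml.etree.ElementTree`: the attribute form is read with `get`, the child-element form with `findtext`. Variable names are illustrative.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person sex="female"><fname>Anna</fname><lname>Smith</lname></person>'
)
as_element = ET.fromstring(
    "<person><sex>female</sex><fname>Anna</fname><lname>Smith</lname></person>"
)

# Same information, two different access paths.
sex_from_attribute = as_attribute.get("sex")
sex_from_element = as_element.findtext("sex")
print(sex_from_attribute, sex_from_element)
```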
Namespaces
Disambiguation mechanism
<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the "edi" prefix is bound to http://ecommerce.org/schema for the "x" element and contents -->
</x>
<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the 'price' element's namespace is http://ecommerce.org/schema -->
  <edi:price units='Euro'>32.18</edi:price>
</x>
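A sketch of what the parser makes of the prefix, using `xml.etree.ElementTree`: the element name becomes the pair (namespace URI, local name), and the prefix itself is only shorthand. The local prefix `e` in the lookup is arbitrary and deliberately different from `edi`.

```python
import xml.etree.ElementTree as ET

DOC = """<x xmlns:edi='http://ecommerce.org/schema'>
  <edi:price units='Euro'>32.18</edi:price>
</x>"""

root = ET.fromstring(DOC)

# Any prefix works for the lookup as long as it maps to the same URI.
ns = {"e": "http://ecommerce.org/schema"}
price = root.find("e:price", ns)
print(price.tag)
print(price.text, price.get("units"))

# An unqualified search does not find the namespaced element.
unqualified = root.find("price")
print(unqualified)
```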
XML Document Structure
Tree Representation
Tree
Pruning
Grafting
Hierarchy
Tree Traversal
Tree Models
Trees – Nested Set view
Take Home …
XML is a syntax for marking up data
Markup tags are not pre-defined
Namespaces make identical tag names unique
An XML instance document is made up of markup tags and text (data)
XML documents are tree structures
XPath
language for addressing part/s of an XML document
designed to be used by XSLT
models XML document as tree of nodes
fully supports XML Namespaces
XPath & XML Document Structure
<xml>
  <table>
    <rec id="1">
      <numField>123</numField>
      <stringField>StringValue</stringField>
    </rec>
    <rec id="2">
      <numField>346</numField>
      <stringField>Text Value</stringField>
    </rec>
  </table>
</xml>
XPathMain.htm
xml
xml/table
xml/table/rec
xml/table/rec/numField
xml/table/rec/stringField
xml/table/rec/@id
xml/table/rec[@id='2']
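A minimal sketch of these paths in use, with the caveat that Python's `xml.etree.ElementTree` implements only a subset of XPath. Paths are written relative to the root element, so the leading `xml/` step is dropped, and the final `@id` step is replaced by a call to `get`.

```python
import xml.etree.ElementTree as ET

DOC = """<xml><table>
  <rec id="1"><numField>123</numField><stringField>StringValue</stringField></rec>
  <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
</table></xml>"""

root = ET.fromstring(DOC)  # root is the <xml> element

# xml/table/rec/numField
num_fields = [e.text for e in root.findall("table/rec/numField")]

# xml/table/rec/@id (attribute read with get() instead of an @id step)
ids = [rec.get("id") for rec in root.findall("table/rec")]

# xml/table/rec[@id='2']
second = root.find("table/rec[@id='2']")

print(num_fields, ids, second.findtext("stringField"))
```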
XSL/XSLT
XSL/XSLT Example - Source
<persons>
  <person username="MP123456">
    <name>John</name>
    <family_name>Smith</family_name>
  </person>
  <person username="PK123456">
    <name>Sally</name>
    <family_name>Jones</family_name>
  </person>
</persons>
XSLT Stylesheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <transform>
      <xsl:apply-templates/>
    </transform>
  </xsl:template>
  <xsl:template match="person">
    <record>
      <username>
        <xsl:value-of select="@username" />
      </username>
      <name>
        <xsl:value-of select="name" />
      </name>
    </record>
  </xsl:template>
</xsl:stylesheet>
Transformed Output
<?xml version="1.0" encoding="UTF-8"?>
<transform>
  <record>
    <username>MP123456</username>
    <name>John</name>
  </record>
  <record>
    <username>PK123456</username>
    <name>Sally</name>
  </record>
</transform>
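For comparison, here is a sketch of the same restructuring written procedurally in Python. This is not XSLT and does not run the stylesheet (the standard library has no XSLT processor); it only shows what the two templates amount to: one `record` per `person`, carrying the username attribute and the name element. The `transform` function is an illustrative name.

```python
import xml.etree.ElementTree as ET

SOURCE = """<persons>
  <person username="MP123456"><name>John</name><family_name>Smith</family_name></person>
  <person username="PK123456"><name>Sally</name><family_name>Jones</family_name></person>
</persons>"""


def transform(source_text):
    """Build <transform> with one <record> per <person> in the source."""
    source = ET.fromstring(source_text)
    result = ET.Element("transform")
    for person in source.iter("person"):
        record = ET.SubElement(result, "record")
        ET.SubElement(record, "username").text = person.get("username")
        ET.SubElement(record, "name").text = person.findtext("name")
    return result


output = ET.tostring(transform(SOURCE), encoding="unicode")
print(output)
```

The XSLT version states the mapping declaratively, as patterns and templates, whereas this version spells out the loop.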
XSLT Functions
current, document, element-available, format-number, function-available, generate-id, key, system-property, unparsed-entity-uri
XPath Functions
boolean, ceiling, concat, contains, count, false, floor, id, lang, last, local-name, name, namespace-uri, normalize-space, not, number, position, round, starts-with, string, string-length, substring, substring-after, substring-before, sum, translate, true
XSL-FO Processor
Take Home …
XPath to address data within XML
XSLT to re-structure XML
They operate on collections of nodes
They work with any type of XML
XSLT_test.htm
XML Schema
A pattern for XML documents
Content
Structure
Constraints
XML Schema Defines …
Content
  elements & attributes
Structure
  parent-child relationships
  order of child elements
  number of child elements
Constraints
  whether an element is empty or can include text
  data types for elements and attributes
  default/fixed values for elements & attributes
Example: Simple XML File
<?xml version="1.0"?>
<note>
  <to>Peter</to>
  <from>Clare</from>
  <heading>Reminder</heading>
  <body>Don't forget the pub this weekend!</body>
</note>
Example: XML Schema
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
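Because a schema is itself XML, it can be read with the same tools as the documents it describes. The sketch below is a toy check, not schema validation: it pulls the child names out of the `xs:sequence` and compares them, in order, with the children of an instance document. Real validation needs a schema-aware validator, which the Python standard library does not include. `expected_children` and `matches_sequence` are illustrative names.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def expected_children(schema_text, element_name):
    """List the child element names declared in the element's xs:sequence."""
    schema = ET.fromstring(schema_text)
    for decl in schema.findall(f"{XS}element"):
        if decl.get("name") == element_name:
            sequence = decl.find(f"{XS}complexType/{XS}sequence")
            return [child.get("name") for child in sequence.findall(f"{XS}element")]
    raise KeyError(element_name)


def matches_sequence(instance_text, schema_text):
    """Toy check: do the instance's children match the declared names and order?"""
    root = ET.fromstring(instance_text)
    return [child.tag for child in root] == expected_children(schema_text, root.tag)


GOOD = ("<note><to>Peter</to><from>Clare</from><heading>Reminder</heading>"
        "<body>Don't forget the pub this weekend!</body></note>")
BAD = "<note><from>Clare</from><to>Peter</to></note>"

print(matches_sequence(GOOD, SCHEMA), matches_sequence(BAD, SCHEMA))
```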
Schema components [1]
The <schema> element
<?xml version="1.0"?><xs:schema …..... ...
</xs:schema>
Schema components [2]
A simple element can contain only text. It cannot contain any other elements or attributes.
<xs:element name="to" type="xs:string"/>
Schema components [3]
Attributes
e.g. <xs:attribute name="lang" type="xs:string"/>
<lastname lang="EN">Smith</lastname>
Schema components [4]
Built-in data types, e.g.:
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
Schema restrictions [restriction base]
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Schema restrictions [enumeration]
<xs:element name="car">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Schema restrictions [pattern/regular expression]
<xs:element name="letter">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-z]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
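To make the three restrictions concrete, here is a hand-written approximation of each as a plain Python check. The function names are illustrative, and the checks only mimic the facets shown (range, enumeration, pattern); a schema validator would apply them directly from the schema.

```python
import re


def valid_age(text):
    """xs:integer restricted by minInclusive=0 and maxInclusive=100."""
    if re.fullmatch(r"[+-]?[0-9]+", text.strip()) is None:
        return False
    return 0 <= int(text) <= 100


def valid_car(text):
    """xs:string restricted to an enumeration of three values."""
    return text in {"Audi", "Golf", "BMW"}


def valid_letter(text):
    """xs:string restricted by the pattern [a-z]: exactly one lowercase letter."""
    return re.fullmatch(r"[a-z]", text) is not None


for value in ["0", "100", "101", "ten"]:
    print("age", value, valid_age(value))
for value in ["BMW", "bmw"]:
    print("car", value, valid_car(value))
for value in ["q", "Q", "ab"]:
    print("letter", value, valid_letter(value))
```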
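Regular Expressions
Wildcards on steroids
ab|c{2}|de matches “ab”; “cc”; “de”
[A-Z]{1,4} matches “ABDS”; “A”; “ZXS”
19[7-9][0-9]|20[0-2][0-9]|2030 matches years in the range 1970 to 2030 (a character class such as [1970-2030] matches a single character, not a range of years)
[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2} matches Post Codes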
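A quick way to try such patterns is Python's `re` module with `fullmatch`, since an XML Schema pattern has to match the whole value. This is a sketch under the assumption that the simple constructs used here behave the same in both regex dialects. It also illustrates a common trap: a character class such as `[1970-2030]` matches one character, so a range of years has to be spelled out with alternation. `YEARS`, `POSTCODE` and `matches` are illustrative names, and the post codes are sample values.

```python
import re


def matches(pattern, text):
    """True if the whole of text matches the pattern, as a schema pattern requires."""
    return re.fullmatch(pattern, text) is not None


YEARS = r"19[7-9][0-9]|20[0-2][0-9]|2030"
POSTCODE = r"[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}"

print(matches(r"ab|c{2}|de", "cc"))
print(matches(r"[A-Z]{1,4}", "ZXS"))
print(matches(YEARS, "1985"), matches(YEARS, "2031"))
print(matches(POSTCODE, "EH6 6QQ"), matches(POSTCODE, "SW1A 1AA"))

# The character class accepts a single digit and rejects a four-digit year.
print(matches(r"[1970-2030]", "1"), matches(r"[1970-2030]", "1985"))
```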
Restrictions for Datatypes
enumeration, fractionDigits, length, maxExclusive, maxInclusive, maxLength, minExclusive, minInclusive, minLength, pattern, totalDigits, whiteSpace
Complex Element
contains other elements and/or attributes. [4 kinds]
1) empty elements
2) elements that contain only other elements
3) elements that contain only text
4) elements that contain both other elements and text
Complex Element examples
a) <product pid="1345"/>
b) <employee>
     <firstname>John</firstname>
     <lastname>Smith</lastname>
   </employee>
c) <food type="dessert">Ice cream</food>
Complex Element Definition
<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Complex Element Definition /2
Reference to complex type
<xs:element name="employee" type="personinfo"/>
<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
Type Reuse
Several elements based on same type
<xs:element name="employee" type="personinfo"/>
<xs:element name="student" type="personinfo"/>
<xs:element name="member" type="personinfo"/>
Type Extension
<xs:complexType name="fullpersoninfo">
  <xs:complexContent>
    <xs:extension base="personinfo">
      <xs:sequence>
        <xs:element name="address" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="country" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
Indicators
Seven types of indicator enable composition
Order indicators: all, choice, sequence
Occurrence indicators: maxOccurs, minOccurs
Group indicators: group name, attributeGroup name
<any>
The <any> element enables us to extend the XML document with elements not specified by the schema.
The <anyAttribute> element enables us to extend the XML document with attributes not specified by the schema.
Where’s the beef?
XML Schema permits…
Standard libraries of data specifications
Formal specification of data models
Automated validation of XML instance files based on XML Schema
Simplified creation of structured documents
XML Schema QA
Automated using a QA XSLT
GovTalk – Schema QA Stylesheet
schemaQA_1.htm
Schema Libraries
Govtalk
Ordnance Survey MasterMap
Environmental Information Exchange
XML Toolkit
Parsers (validating & non-validating)
DOM (Document Object Model)
SAX (Simple API for XML)
Hybrid pull parsers
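The three parser styles can be compared on one small document using only the Python standard library. This is a sketch: `DOC` and `RecHandler` are made up for illustration, and `ElementTree.iterparse` stands in for a pull parser, since the application iterates over events at its own pace.

```python
import io
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

DOC = "<table><rec id='1'>123</rec><rec id='2'>346</rec></table>"

# DOM: load the whole tree into memory, then navigate it.
dom = xml.dom.minidom.parseString(DOC)
dom_ids = [node.getAttribute("id") for node in dom.getElementsByTagName("rec")]


# SAX: the parser pushes events to a handler; nothing is stored unless we store it.
class RecHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "rec":
            self.ids.append(attrs["id"])


handler = RecHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_ids = handler.ids

# Pull style: the application asks for the next event when it is ready for it.
pull_ids = [
    element.get("id")
    for event, element in ET.iterparse(io.BytesIO(DOC.encode("utf-8")), events=("start",))
    if element.tag == "rec"
]

print(dom_ids, sax_ids, pull_ids)
```

DOM is the most convenient for small documents; the event-based styles matter when files are too large to hold in memory.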
Schema & Validation
Schemas provide a basis for automated validation of XML
xmlValidation.dot
Schema & Document Creation
SAS XML Mapper
SAS XMLMap
<?xml version="1.0" encoding="UTF-8" ?>
<SXLEMAP >
  <TABLE name="docDscr_citation__titl">
    <TABLE-PATH syntax="XPath">/codeBook/docDscr/citation/titlStmt/titl</TABLE-PATH>
    <COLUMN name="docDscrcitationtitl">
      <PATH syntax="XPath">/codeBook/docDscr/citation/titlStmt/titl</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>950</LENGTH>
      <LABEL>Full authoritative title of the documentation (DC Title)</LABEL>
    </COLUMN>
  </TABLE>
</SXLEMAP>
SAS XMLMap Manager Plugin
Benefits of the XML route
Open Standards
Vendor Neutral
e-GIF/OSIAF compliant
Very flexible – one source, many uses
Problems with the XML route
XML files tend to be large
DOM (Drudgery Object Model)
Inter-record linking & validation across records is not trivial
Many tools are not mature (but this situation is improving rapidly.)
OK, What next…?
Vocabularies
Schemas
Additional intra-record validation based on XSLT and XPath
Publish
Vocabularies
Domain experts identify data items and agree a vocabulary.
Arrange items into logical data groupings
XML Schemas
Model the data items (UML?)
Isolate common data definitions
Prepare Schemas
Disambiguate using namespaces
Validate model
QA Schemas for compliance with standards (automated)
Intra-record validation
Options include…
XSLT
XPath
(SE examples: Pupil Census; Road Accident Stats.)
Publication
Add to Schema Library
  Govtalk
  Ordnance Survey MasterMap
  Environmental Information Exchange
Example: BS7666