TEI, ODT, DOCX, and ePub

Sebastian Rahtz

May 2011

Summer School 2011 1/62

The OpenDocument format of OpenOffice

OpenOffice implements OpenDocument (ODF), an ISO/IEC international standard: ISO/IEC 26300:2006, Open Document Format for Office Applications (OpenDocument) v1.0. The most common filename extensions are:

.odt for word processing documents

.ods for spreadsheets

.odp for presentations

.odb for databases

.odg for graphics

.odf for formulae and mathematical equations

An OpenDocument file is a zip archive of XML files.

File open in OpenOffice


Inside a typical ODF file

mimetype                                         Gives MIME type
Pictures/100000000000028000000168D2C0ED14.jpg    A graphics file
content.xml                                      Main body of document
manifest.rdf                                     List of files
styles.xml                                       Definition of styles
meta.xml                                         Document metadata
Thumbnails/thumbnail.png                         Document thumbnail
settings.xml                                     Settings for application
META-INF/manifest.xml                            List of files
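Because an ODF file is an ordinary zip archive, its members can be listed with any zip library. The following is a minimal sketch in Python, not part of the original talk: it builds a small stand-in package in memory (the member contents are placeholders, and the helper names are invented for illustration) and then reads back the MIME type and the file list.

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"

def build_sample_package() -> bytes:
    """Build a tiny stand-in for an .odt file in memory (placeholder contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry is written first and stored without compression.
        z.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name in ("content.xml", "styles.xml", "meta.xml", "META-INF/manifest.xml"):
            z.writestr(name, "<placeholder/>")
    return buf.getvalue()

def describe_package(data: bytes) -> dict:
    """Return the declared MIME type and the member names of a zip package."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        return {
            "mimetype": z.read("mimetype").decode("ascii"),
            "members": z.namelist(),
        }

info = describe_package(build_sample_package())
print(info["mimetype"])
for member in info["members"]:
    print(member)
```

To inspect a real document, the bytes of an .odt file on disk could be passed to describe_package instead of the in-memory sample.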

The metadata

<office:document-meta office:version="1.2"
    grddl:transformation="http://docs.oasis-open.org/office/1.2/xslt/odf2rdf.xsl">
  <office:meta>
    <meta:initial-creator>Sebastian Rahtz</meta:initial-creator>
    <dc:creator>Sebastian Rahtz</dc:creator>
    <meta:editing-cycles>1</meta:editing-cycles>
    <meta:creation-date>2011-05-23T21:41:00</meta:creation-date>
    <dc:date>2011-05-23T22:50:35</dc:date>
    <meta:editing-duration>PT3S</meta:editing-duration>
    <meta:generator>LibreOffice/3.3$Unix LibreOffice_project/330m19$Build-6</meta:generator>
    <meta:document-statistic meta:table-count="0" meta:image-count="1"
        meta:object-count="0" meta:page-count="1" meta:paragraph-count="11"
        meta:word-count="116" meta:character-count="655"/>
    <meta:user-defined meta:name="AppVersion">14.0000</meta:user-defined>
    <meta:user-defined meta:name="Company">University of Oxford</meta:user-defined>
    <meta:template xlink:type="simple" xlink:actuate="onRequest" xlink:title="Normal.dotm" xlink:href=""/>
  </office:meta>
</office:document-meta>

The document

<office:document-content office:version="1.2">
  <office:body>
    <office:text>
      <draw:frame text:anchor-type="page" text:anchor-page-number="0"
          draw:z-index="0" draw:name="Picture 1" draw:style-name="gr1"
          draw:text-style-name="P7" svg:width="387.98pt" svg:height="192.33pt"
          svg:x="0pt" svg:y="0pt">
        <draw:image xlink:href="Pictures/100000000000028000000168D2C0ED14.jpg"
            xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad">
          <text:p/>
        </draw:image>
      </draw:frame>
      <text:h text:style-name="P4" text:outline-level="1">Flights cancelled as ash cloud heads towards UK</text:h>
      <text:p text:style-name="P3">
        <text:p text:style-name="P4">The threat of further disruption led US President Barack Obama to fly out of the Republic of Ireland a day early to get to <text:span text:style-name="T2">London</text:span> for a state visit.</text:p>
      </text:p>
      <text:list xml:id="list830950205" text:style-name="L2">
        <text:list-item>
          <text:p text:style-name="P5">
            <text:a xlink:type="simple" xlink:href="http://www.bbc.co.uk/news/business-13507675">
              <text:span text:style-name="T5">Airline shares hit by ash fears</text:span>
            </text:a>
          </text:p>
        </text:list-item>
      </text:list>
    </office:text>
  </office:body>
</office:document-content>

Simple building blocks

<text:h>          heading (with @text:outline-level)
<text:p>          paragraph
<text:list>       list
<text:list-item>  list item
<text:span>       inline span

All styling is controlled by @text:style-name.
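To see how little is needed to read these building blocks, here is a hedged Python sketch (not from the talk) that walks the block-level children of office:text and reports each element's name, outline level, style name and text. The sample document is a cut-down invention; the two namespace URIs are the standard ODF ones for the office: and text: prefixes.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# A cut-down, invented content.xml for illustration only.
SAMPLE = f"""<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body><office:text>
    <text:h text:style-name="P4" text:outline-level="1">Flights cancelled</text:h>
    <text:p text:style-name="P3">Travel to <text:span text:style-name="T2">London</text:span> disrupted.</text:p>
  </office:text></office:body>
</office:document-content>"""

def outline(xml_text):
    """Return (element name, outline level, style name, text) per block element."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    rows = []
    for el in body:
        kind = el.tag.split("}")[1]                     # local name, e.g. "h" or "p"
        level = el.get(f"{{{TEXT_NS}}}outline-level")   # only present on headings
        style = el.get(f"{{{TEXT_NS}}}style-name")
        text = "".join(el.itertext())                   # flattens inline spans
        rows.append((kind, level, style, text))
    return rows

for row in outline(SAMPLE):
    print(row)
```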

The styles

<style:style style:name="Heading_20_1" style:display-name="Heading 1"
    style:family="paragraph" style:parent-style-name="Standard"
    style:next-style-name="Text_20_body" style:default-outline-level="1"
    style:list-style-name="" style:class="text">
  <style:paragraph-properties fo:margin-top="1.39pt" fo:margin-bottom="1.39pt"/>
  <style:text-properties style:font-name="Times" fo:font-size="24pt"
      fo:language="en" fo:country="GB" fo:font-weight="bold"
      style:letter-kerning="true" style:font-size-asian="24pt"
      style:font-weight-asian="bold" style:font-size-complex="24pt"
      style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="P5" style:family="paragraph"
    style:parent-style-name="Heading_20_1" style:master-page-name="Standard">
  <style:paragraph-properties style:page-number="auto"/>
</style:style>
<style:style style:name="P3" style:family="paragraph" style:parent-style-name="Standard">
  <style:paragraph-properties fo:margin-top="0pt" fo:margin-bottom="13.49pt"
      style:line-height-at-least="13.49pt"/>
  <style:text-properties fo:color="#333333" style:font-name="Arial1"
      fo:font-size="10pt" fo:language="en" fo:country="GB" fo:font-weight="bold"
      style:font-size-asian="10pt" style:font-weight-asian="bold"
      style:font-name-complex="Arial2" style:font-size-complex="10pt"
      style:font-weight-complex="bold"/>
</style:style>

Implementation of TEI/ODT conversion

Some simple principles:

In ODT to TEI, use recursive <xsl:for-each-group> to interpolate structure from headings

ODT paragraphs, lists, items, spans all map more or less 1:1 to <p>, <list>, <item> and <hi>

ODT pictures more or less map to <figure> and <graphic>

As always, table mapping is complicated by simplicity of table model in TEI (no formatting)

When making ODT, unpack a template file (to avoid generating all the style info), and then overwrite the content.xml file.
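The "unpack a template and overwrite content.xml" step can be sketched as follows. This is an illustrative Python version under stated assumptions, not the scripting the TEI stylesheets actually use: the template is a placeholder built in memory, and make_template and replace_content are invented names.

```python
import io
import zipfile

def make_template() -> bytes:
    """Build a placeholder template package in memory (invented contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("styles.xml", "<styles-from-template/>")
        z.writestr("content.xml", "<template-body/>")
    return buf.getvalue()

def replace_content(template: bytes, new_content: str) -> bytes:
    """Copy every member of the template, swapping in a new content.xml."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template)) as src, zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            if item.filename == "content.xml":
                data = new_content.encode("utf-8")
            else:
                data = src.read(item.filename)
            # Reusing the ZipInfo keeps each member's name and compression method.
            dst.writestr(item, data)
    return out.getvalue()

result = replace_content(make_template(), "<office:document-content/>")
with zipfile.ZipFile(io.BytesIO(result)) as z:
    print(z.namelist())
    print(z.read("content.xml").decode("utf-8"))
```

The style definitions travel with the template untouched, which is the point of the technique: only the body has to be generated.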

The flat headings problem

What we see is:

<text>
  <head level="1">Top-level heading 1</head>
  <p>Lorem ipsum</p>
  <p>Lorem ipsum</p>
  <head level="2">Second-level heading 1</head>
  <p>Lorem ipsum</p>
  <head level="2">Second-level heading 2</head>
  <p>Lorem ipsum</p>
  <p>Lorem ipsum</p>
  <head level="1">Top-level heading 2</head>
  <p>Lorem ipsum</p>
  <p>Lorem ipsum</p>
</text>

The flat headings problem (2)

What we want is:

<text>
  <div>
    <head>Top-level heading 1</head>
    <p>Lorem ipsum</p>
    <p>Lorem ipsum</p>
    <p>Lorem ipsum</p>
    <p>Lorem ipsum</p>
    <div>
      <head>Second-level heading 1</head>
      <p>Lorem ipsum</p>
      <p>Lorem ipsum</p>
    </div>
    <div>
      <head>Second-level heading 2</head>
      <p>Lorem ipsum</p>
      <p>Lorem ipsum</p>
      <p>Lorem ipsum</p>
    </div>
  </div>
  <div>
    <head>Top-level heading 2</head>
    <p>Lorem ipsum</p>
    <p>Lorem ipsum</p>
  </div>
</text>

Breaking it up with <xsl:for-each-group> and @group-starting-with

How does that <xsl:for-each-group> work?

Assuming we have <head> elements with a @level attribute:

<xsl:template match="office:text">
  <body>
    <xsl:for-each-group select="*" group-starting-with="head[@level='1']">
      <xsl:choose>
        <xsl:when test="self::head[@level='1']">
          <xsl:call-template name="group-by-section"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="inSection"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </body>
</xsl:template>

Case 1: this is a heading

<xsl:template name="group-by-section">
  <xsl:variable name="ThisHeader" select="number(@level)"/>
  <xsl:variable name="NextHeader" select="number(@level)+1"/>
  <div>
    <head>
      <xsl:apply-templates/>
    </head>
    <xsl:for-each-group select="current-group() except ."
        group-starting-with="head[number(@level)=$NextHeader]">
      <xsl:choose>
        <xsl:when test="self::head">
          <xsl:call-template name="group-by-section"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="inSection"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </div>
</xsl:template>

Case 2: other elements

<xsl:template name="inSection">
  <xsl:for-each select="current-group()">
    <xsl:apply-templates select="."/>
  </xsl:for-each>
</xsl:template>
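The recursion in these templates can be hard to follow in XSLT alone. As a rough analogue only (it is not the stylesheet's code), the same idea can be written in Python: split a flat sequence into groups that each start at a heading of the current level, then recurse into each group with the next level. The tuple representation and the function names are assumptions made for this sketch.

```python
def group_starting_with(items, is_start):
    """Split items into runs, opening a new run at each item matching is_start."""
    groups = []
    for item in items:
        if is_start(item) or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups

def nest(items, level=1):
    """Turn flat (kind, level, text) tuples into nested sections."""
    out = []
    starts = lambda it: it[0] == "head" and it[1] == level
    for group in group_starting_with(items, starts):
        first = group[0]
        if starts(first):
            # Case 1: the group opens with a heading, so wrap it and recurse.
            out.append({"head": first[2], "body": nest(group[1:], level + 1)})
        else:
            # Case 2: content before any heading of this level is passed through.
            # Headings that skip a level would land here too; the sample has none.
            out.extend(it[2] for it in group)
    return out

FLAT = [
    ("head", 1, "Top-level heading 1"),
    ("p", 0, "Lorem ipsum"),
    ("p", 0, "Lorem ipsum"),
    ("head", 2, "Second-level heading 1"),
    ("p", 0, "Lorem ipsum"),
    ("head", 2, "Second-level heading 2"),
    ("p", 0, "Lorem ipsum"),
    ("p", 0, "Lorem ipsum"),
    ("head", 1, "Top-level heading 2"),
    ("p", 0, "Lorem ipsum"),
    ("p", 0, "Lorem ipsum"),
]

tree = nest(FLAT)
print(tree)
```

As with group-starting-with, any items that precede the first matching heading form a group of their own, which is why both cases are needed.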

Moving to Word: the OOXML data format

Microsoft Office 2007 (Office 2008/2011 on a Mac) is more or less an implementation of ISO/IEC 29500 (OOXML); this defines

a family of interlinked XML schemas to describe office documents

a file hierarchy structure

a packaging format (zip)

There is a (smallish) difference between what is in Word and what should be there according to the spec.

The architecture of a Word docx (OOXML) file

(Useful picture from http://en.wikipedia.org/wiki/Office_Open_XML)

XML namespaces in Word

urn:schemas-microsoft-com:mac:vml (Drawing)
http://schemas.microsoft.com/office/mac/office/2008/main
http://schemas.openxmlformats.org/markup-compatibility/2006
urn:schemas-microsoft-com:office:office
http://schemas.openxmlformats.org/officeDocument/2006/relationships (Links)
http://schemas.openxmlformats.org/officeDocument/2006/math (Maths)
urn:schemas-microsoft-com:vml (Another bit of drawing)
urn:schemas-microsoft-com:office:word
http://schemas.openxmlformats.org/wordprocessingml/2006/main (Normal text)
http://schemas.microsoft.com/office/word/2006/wordml
http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing (More drawing)

The contents of the package


What are the files for?

[Content_Types].xml            mime types of files
_rels/.rels                    links between names and objects
word/_rels/document.xml.rels   links between names and support files
word/document.xml              document body
word/media/image1.jpeg         picture
docProps/thumbnail.jpeg        document thumbnail
word/settings.xml              settings
word/webSettings.xml           settings for HTML export
word/styles.xml                style definitions
word/numbering.xml             numbering schemes
docProps/core.xml              document properties
word/fontTable.xml             font details
docProps/app.xml               application details

All of these, except media files, are XML files (despite some weird names).

Simple text in Word

The main building blocks are

<p>  block-level object (‘paragraph’)
<r>  inline object
<t>  text ‘run’

with corresponding style objects:

<pPr>  block-level object style rules
<rPr>  inline style rules

There is no hierarchy, just a flat set of block-level objects.
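A small Python sketch (an illustration, not code from the talk) shows how flat this model is: every paragraph is a direct child of the body, its style is found under pPr, and its text is the concatenation of the t elements inside its r children. The sample document is invented; the namespace URI is the wordprocessingml one listed among the Word namespaces.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# An invented, minimal word/document.xml for illustration only.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Normative references</w:t></w:r></w:p>
  <w:p><w:r><w:t>ISO </w:t></w:r><w:r><w:rPr><w:i/></w:rPr><w:t>13909-2:2001</w:t></w:r></w:p>
</w:body></w:document>"""

def paragraphs(xml_text):
    """Return (style name or None, plain text) for each top-level paragraph."""
    root = ET.fromstring(xml_text)
    result = []
    for p in root.iterfind("w:body/w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        style_name = style.get(f"{{{W}}}val") if style is not None else None
        text = "".join(t.text or "" for t in p.iterfind("w:r/w:t", NS))
        result.append((style_name, text))
    return result

for style_name, text in paragraphs(SAMPLE):
    print(style_name, "|", text)
```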

Word/TEI conversion implementation

As for ODT, with the following extra issues:

(from DOCX) There is no heading element, just paragraphs with a particular style name

(from DOCX) There is no list wrapper or list item. Groups of paragraphs marked as list items have to be wrapped in a <list>

(to DOCX) Graphics files have to be listed in a separate file, and linked up with ID/IDREF

(to DOCX) Graphics files have to be read to get their natural size, needed in the XML markup

Converting other OOXML formats

So can I convert between TEI and Powerpoint, or Excel?

Theoretically, yes. But the models of presentations and spreadsheets differ much more from what TEI does. Don’t expect it to be easy.

Sample of Powerpoint markup

<p:txBody>
  <a:bodyPr/>
  <a:lstStyle/>
  <a:p>
    <a:pPr marL="457200" indent="-457200" algn="l">
      <a:buFont typeface="Arial"/>
      <a:buChar char="•"/>
    </a:pPr>
    <a:r>
      <a:rPr lang="en-US" dirty="0" smtClean="0"/>
      <a:t>But the unexpected</a:t>
    </a:r>
  </a:p>
  <a:p>
    <a:pPr marL="457200" indent="-457200" algn="l">
      <a:buFont typeface="Arial"/>
      <a:buChar char="•"/>
    </a:pPr>
    <a:r>
      <a:rPr lang="en-US" dirty="0" smtClean="0"/>
      <a:t>And more besides </a:t>
    </a:r>
    <a:endParaRPr lang="en-US" dirty="0"/>
  </a:p>
</p:txBody>

Example: references in Word


Example: references in OOXML (Word), part 1

<w:p w:rsidR="008A0CE8" w:rsidRPr="00250571" w:rsidRDefault="008A0CE8" w:rsidP="008A0CE8">
  <w:pPr>
    <w:pStyle w:val="Heading1"/>
    <w:tabs>
      <w:tab w:val="clear" w:pos="400"/>
      <w:tab w:val="clear" w:pos="560"/>
      <w:tab w:val="left" w:pos="403"/>
      <w:tab w:val="left" w:pos="562"/>
    </w:tabs>
  </w:pPr>
  <w:bookmarkStart w:id="8" w:name="_Toc201542376"/>
  <w:r w:rsidRPr="00250571">
    <w:t>Normative references</w:t>
  </w:r>
  <w:bookmarkEnd w:id="8"/>
</w:p>
<w:p w:rsidR="008A0CE8" w:rsidRPr="00250571" w:rsidRDefault="008A0CE8" w:rsidP="008A0CE8">
  <w:r w:rsidRPr="00250571">
    <w:t>The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.</w:t>
  </w:r>
</w:p>

Example: references in OOXML (Word), part 2

<w:p w:rsidR="008A0CE8" w:rsidRPr="00250571" w:rsidRDefault="008A0CE8" w:rsidP="008A0CE8">
  <w:pPr><w:pStyle w:val="RefNorm"/></w:pPr>
  <w:r><w:rPr><w:sz w:val="19"/><w:szCs w:val="19"/></w:rPr><w:t>ISO </w:t></w:r>
  <w:r w:rsidRPr="00250571"><w:rPr><w:sz w:val="19"/><w:szCs w:val="19"/></w:rPr><w:t>13909-2:2001,</w:t></w:r>
  <w:r w:rsidRPr="00250571"><w:t xml:space="preserve"> </w:t></w:r>
  <w:r w:rsidRPr="00250571"><w:rPr><w:i/></w:rPr><w:t>Hard coal and coke</w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t> —</w:t></w:r>
  <w:r w:rsidRPr="00250571"><w:rPr><w:i/></w:rPr><w:t xml:space="preserve"> Mechanical sampling</w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t> —</w:t></w:r>
  <w:r w:rsidRPr="00250571"><w:rPr><w:i/></w:rPr><w:t xml:space="preserve"> </w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t>Part </w:t></w:r>
  <w:r w:rsidRPr="00250571"><w:rPr><w:i/></w:rPr><w:t>2: Coal</w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t> —</w:t></w:r>
  <w:r w:rsidRPr="00250571"><w:rPr><w:i/></w:rPr><w:t xml:space="preserve"> Sampling from moving streams</w:t></w:r>
</w:p>

Example: references in XML (TEI)

<div type="normativeReferences">
  <head>Normative references</head>
  <p>The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.</p>
  <listBibl type="normativeReferences">
    <bibl type="dated">
      <publisher>ISO</publisher>
      <idno type="docNumber">13909</idno>
      <idno type="docPartNumber">1</idno>
      <edition>2001</edition>
      <title rend="italic">Hard coal and coke — Mechanical sampling —<seg/>Part 1: General introduction</title>
    </bibl>
  </listBibl>
</div>

Example: math in Word


Example: math in XML (MathML)

<p>The required overall precision on a lot should be agreed between the parties concerned. In the absence of such agreement, a value of one tenth of the ash content may be assumed.</p>
<p>The theory of precision is given in ISO 13909-7. The following equation is derived:</p>
<p>
  <formula>
    <mml:math>
      <mml:msub>
        <mml:mrow><mml:mi>P</mml:mi></mml:mrow>
        <mml:mrow><mml:mtext>L</mml:mtext></mml:mrow>
      </mml:msub>
      <mml:mo>=</mml:mo>
      <mml:mn>2</mml:mn>
      <mml:msqrt>
        <mml:mfrac>
          <mml:mrow>
            <mml:mfrac>
              <mml:mrow>
                <mml:msub>
                  <mml:mrow><mml:mi>V</mml:mi></mml:mrow>
                  <mml:mrow><mml:mtext>l</mml:mtext></mml:mrow>
                </mml:msub>
              </mml:mrow>
              <mml:mrow><mml:mi>n</mml:mi></mml:mrow>
            </mml:mfrac>
            <mml:mo>+</mml:mo>
            <mml:mfenced separators="|">
              <mml:mrow>
                <mml:mn>1</mml:mn>
                <mml:mo>-</mml:mo>
                <mml:mfrac>
                  <mml:mrow><mml:mi>u</mml:mi></mml:mrow>
                  <mml:mrow><mml:mi>m</mml:mi></mml:mrow>
                </mml:mfrac>
              </mml:mrow>
            </mml:mfenced>
            <mml:msub>
              <mml:mrow><mml:mi>V</mml:mi></mml:mrow>
              <mml:mrow><mml:mtext>m</mml:mtext></mml:mrow>
            </mml:msub>
            <mml:mo>+</mml:mo>
            <mml:msub>
              <mml:mrow><mml:mi>V</mml:mi></mml:mrow>
              <mml:mrow><mml:mtext>PT</mml:mtext></mml:mrow>
            </mml:msub>
          </mml:mrow>
          <mml:mrow><mml:mi>u</mml:mi></mml:mrow>
        </mml:mfrac>
      </mml:msqrt>
    </mml:math><lb/><c rend="tab"/>(1)</formula>
</p>

Challenges in the XSLT conversion

Interpolating hierarchy from flat section headings: we use XSLT 2.0 <xsl:for-each-group> heavily to create document structure.

Making decisions depend on generated structure: the conversion makes 3 passes over the data within the one XSLT transform, each time adding more structure or resolving anomalies.

Table management, depending on what table model we target: a Word table differs from a CALS table in how it models spanning cells and tables, which causes considerable problems in mapping.

Putting together similar objects

Another technique with for-each-group is to put similar items together.

Input:

<text>
  <p>Lorem ipsum</p>
  <item>cats</item>
  <item>dogs</item>
  <item>horses</item>
  <p>Lorem ipsum</p>
</text>

Output:

<text>
  <p>Lorem ipsum</p>
  <list>
    <item>cats</item>
    <item>dogs</item>
    <item>horses</item>
  </list>
  <p>Lorem ipsum</p>
</text>

XSL to do grouping

The @group-adjacent attribute must return something to check.

<xsl:for-each-group select="*" group-adjacent="if (self::item) then 1 else 2">
  <xsl:choose>
    <xsl:when test="current-grouping-key()=1">
      <list>
        <xsl:copy-of select="current-group()"/>
      </list>
    </xsl:when>
    <xsl:otherwise>
      <xsl:copy-of select="current-group()"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:for-each-group>

ex9.xsl
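For readers more at home in a general-purpose language, group-adjacent behaves much like Python's itertools.groupby: consecutive items with the same key form one group. The sketch below is only an analogy to the XSLT above, using an invented tuple representation of the elements.

```python
from itertools import groupby

def wrap_items(nodes):
    """Wrap each run of adjacent ("item", text) nodes in a ("list", [...]) node."""
    out = []
    for is_item, run in groupby(nodes, key=lambda node: node[0] == "item"):
        run = list(run)
        if is_item:
            out.append(("list", run))   # one wrapper per run of adjacent items
        else:
            out.extend(run)             # everything else is copied through
    return out

NODES = [
    ("p", "Lorem ipsum"),
    ("item", "cats"),
    ("item", "dogs"),
    ("item", "horses"),
    ("p", "Lorem ipsum"),
]

for node in wrap_items(NODES):
    print(node)
```

The key only has to distinguish "item" from "not item", which is why the XSLT version can get away with the arbitrary values 1 and 2.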

Working with group-adjacent to differentiate similar elements

<xsl:template match="w:p">
  <!-- We are looking for: - Lists -> 1 - Table of Contents -> 2 - Figures -> 3 -->
  <xsl:for-each-group select="."
      group-adjacent="if (teidocx:is-list(.)) then 1
                      else if (teidocx:is-toc(.)) then 2
                      else if (teidocx:is-figure(.)) then 3
                      else 4">
    <!-- For each defined grouping call a specific template. If there is no
         grouping defined, apply templates with mode paragraph -->
    <xsl:choose>
      <xsl:when test="current-grouping-key()=1">
        <xsl:call-template name="listSection"/>
      </xsl:when>
      <xsl:when test="current-grouping-key()=2">
        <xsl:call-template name="tocSection"/>
      </xsl:when>
      <xsl:when test="current-grouping-key()=3">
        <xsl:call-template name="figureSection"/>
      </xsl:when>
      <!-- it is not a defined grouping .. apply templates -->
      <xsl:otherwise>
        <xsl:apply-templates select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each-group>
</xsl:template>

How do those functions work?

<xsl:function name="teidocx:is-toc" as="xs:boolean">
  <xsl:param name="p"/>
  <xsl:choose>
    <xsl:when test="$p[contains(w:pPr/w:pStyle/@w:val,'toc')]">true</xsl:when>
    <xsl:otherwise>false</xsl:otherwise>
  </xsl:choose>
</xsl:function>
<xsl:function name="teidocx:is-figure" as="xs:boolean">
  <xsl:param name="p"/>
  <xsl:choose>
    <xsl:when test="$p[contains(w:pPr/w:pStyle/@w:val,'Figure')]">true</xsl:when>
    <xsl:when test="$p[contains(w:pPr/w:pStyle/@w:val,'Caption')]">true</xsl:when>
    <xsl:otherwise>false</xsl:otherwise>
  </xsl:choose>
</xsl:function>

Handling incoming Word style

We use TEI @rend a lot to preserve style names.

<xsl:template match="w:p[w:pPr/w:pStyle/@w:val='Figure text']" mode="paragraph">
  <p>
    <xsl:if test="w:pPr/w:jc/@w:val">
      <xsl:attribute name="iso:align">
        <xsl:value-of select="w:pPr/w:jc/@w:val"/>
      </xsl:attribute>
    </xsl:if>
    <xsl:attribute name="rend">
      <xsl:text>Figure_text</xsl:text>
    </xsl:attribute>
    <xsl:apply-templates/>
  </p>
</xsl:template>

Handling incoming TEI element

<xsl:template match="tei:front/tei:div/tei:p[@type='foreword']">
  <xsl:call-template name="block-element">
    <xsl:with-param name="pPr">
      <w:pPr>
        <w:pStyle>
          <xsl:attribute name="w:val">
            <xsl:value-of
                select="concat(translate(substring(parent::tei:div/@type,1,1),$lowercase,$uppercase),substring(parent::tei:div/@type,2))"/>
          </xsl:attribute>
        </w:pStyle>
      </w:pPr>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>
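The concat/translate/substring expression in this template simply upper-cases the first letter of the parent div's @type to form a Word style name. Expressed as a small Python sketch (the function name is invented, and str.upper handles more than the letters that $lowercase and $uppercase presumably list):

```python
def style_name_from_type(div_type: str) -> str:
    """Upper-case the first character and keep the rest unchanged."""
    return div_type[:1].upper() + div_type[1:]

print(style_name_from_type("foreword"))
```

Slicing with [:1] rather than indexing with [0] means an empty @type yields an empty string instead of raising an error.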

Corrigenda and addenda (TEI XML)

<p>This fourth edition cancels and replaces the third edition (ISO 6579:<del when="2009-10-30T13:19:00Z" type="COR" n="1">1993</del><add when="2009-10-30T13:19:00Z" type="COR" n="1">1999</add>), which has been technically revised.</p>
<bibl>
  <add when="2009-10-30T09:27:00Z" type="AMD" n="1">ISO/TS 11133-1, <title rend="italic">Microbiology of food and animal feeding stuffs — Guidelines on preparation and production of culture media — Part 1: General guidelines on quality assurance for the preparation of culture media in the laboratory</title></add>
</bibl>

Displaying corrigenda and addenda (HTML)


Supporting new styles in DOCX to TEI: a real story

Our target is a complex Word document, carefully prepared with maximum use of styles.

Styles for headings

Styles for exercises, and inline styles

1. Map some styles to TEI elements

<xsl:template match="w:p[w:pPr/w:pStyle/@w:val='ITLP Caption']" mode="paragraph">
  <head><xsl:apply-templates/></head>
</xsl:template>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val='ITLP Table Heading']" mode="paragraph">
  <head><xsl:apply-templates/></head>
</xsl:template>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val='ITLP Ex Tasks Bulleted']" mode="paragraph">
  <item><xsl:apply-templates/></item>
</xsl:template>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val='ITLP BodyText Bulletted']" mode="paragraph">
  <item><xsl:apply-templates/></item>
</xsl:template>

2. Identify list structures

The conversion uses functions which check whether something is a list, and decide what sort of list.

<xsl:function name="teidocx:is-list" as="xs:boolean">
  <xsl:param name="p"/>
  <xsl:choose>
    <xsl:when test="$p[contains(w:pPr/w:pStyle/@w:val,'List')]">true</xsl:when>
    <xsl:when test="$p[contains(w:pPr/w:pStyle/@w:val,'Bulletted')]">true</xsl:when>
    <xsl:when test="$p[contains(w:pPr/w:pStyle/@w:val,'Bulleted')]">true</xsl:when>
    <xsl:otherwise>false</xsl:otherwise>
  </xsl:choose>
</xsl:function>
<xsl:function name="teidocx:get-listtype" as="xs:string">
  <xsl:param name="style"/>
  <xsl:choose>
    <xsl:when test="$style='ITLP BodyText Bulletted'">
      <xsl:text>unordered</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
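The teidocx:is-list, teidocx:is-toc and teidocx:is-figure functions are all substring tests on the paragraph style name, and their results feed the 1/2/3/4 grouping key used with group-adjacent. A hedged Python rendering of that logic, working directly on style-name strings rather than w:p elements (a simplification made for this sketch), might look like this:

```python
from itertools import groupby

def is_list(style: str) -> bool:
    return any(word in style for word in ("List", "Bulletted", "Bulleted"))

def is_toc(style: str) -> bool:
    return "toc" in style

def is_figure(style: str) -> bool:
    return "Figure" in style or "Caption" in style

def grouping_key(style) -> int:
    """1 = list, 2 = table of contents, 3 = figure, 4 = anything else."""
    style = style or ""          # a paragraph with no style falls through to 4
    if is_list(style):
        return 1
    if is_toc(style):
        return 2
    if is_figure(style):
        return 3
    return 4

def group_paragraphs(styles):
    """Group adjacent paragraph styles that share a grouping key."""
    return [(key, list(run)) for key, run in groupby(styles, key=grouping_key)]

STYLES = ["ITLP H1", "ITLP BodyText Bulletted", "ITLP Ex Tasks Bulleted",
          "Figure text", "Caption", None]
for key, run in group_paragraphs(STYLES):
    print(key, run)
```

Like XPath contains(), the in operator is case-sensitive, so "toc" does not match a style called "TOC 1".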

3. Identify headings

Similarly, we need to know if something is a section heading. Top-level headings are a bit different.

<xsl:function name="teidocx:is-firstlevel-heading" as="xs:boolean">
  <xsl:param name="p"/>
  <xsl:choose>
    <xsl:when test="$p[w:pPr/w:pStyle/@w:val='ITLP H1']">true</xsl:when>
    <xsl:when test="$p[w:pPr/w:pStyle/@w:val='ITLP Anonymous Heading1']">true</xsl:when>
    <xsl:otherwise>false</xsl:otherwise>
  </xsl:choose>
</xsl:function>
<xsl:function name="teidocx:is-heading" as="xs:boolean">
  <xsl:param name="p"/>
  <xsl:variable name="s" select="$p/w:pPr/w:pStyle/@w:val"/>
  <xsl:choose>
    <xsl:when test="$s=''">false</xsl:when>
    <xsl:when test="$s='ITLP Anonymous Heading 1'">true</xsl:when>
    <xsl:when test="$s='ITLP Anonymous Heading 2'">true</xsl:when>
    <xsl:when test="$s='ITLP H1'">true</xsl:when>
    <xsl:when test="$s='ITLP H2'">true</xsl:when>
    <xsl:when test="$s='ITLP H3'">true</xsl:when>
    <xsl:when test="$s='Heading1'">true</xsl:when>
    <xsl:when test="$s='Heading2'">true</xsl:when>
    <xsl:when test="$s='Heading3'">true</xsl:when>
    <xsl:when test="$s='Heading4'">true</xsl:when>
    <xsl:otherwise>false</xsl:otherwise>
  </xsl:choose>
</xsl:function>

4. Some cases where the TEI has no structural component, so use @rend

<xsl:template match="w:p[w:pPr/w:pStyle/@w:val='ITLP Ex Explanation']" mode="paragraph">
  <p rend="ExampleExplanation"><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val='ITLP Task Text']" mode="paragraph">
  <p rend="ExampleTask"><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val='ITLP Step Text']" mode="paragraph">
  <p rend="ExampleStep"><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val='ITLP Ex Heading']" mode="paragraph">
  <p rend="ExampleHeading"><xsl:apply-templates/></p>
</xsl:template>

5. Now the inline styles

<xsl:template match="w:r[w:rPr/w:rStyle/@w:val='ITLP FileSpec']">
  <code rend="FileSpec"><xsl:apply-templates/></code>
</xsl:template>
<xsl:template match="w:r[w:rPr/w:rStyle/@w:val='ITLP Button']">
  <code rend="Button"><xsl:apply-templates/></code>
</xsl:template>
<xsl:template match="w:r[w:rPr/w:rStyle/@w:val='ITLP Input']">
  <code rend="Input"><xsl:apply-templates/></code>
</xsl:template>
<xsl:template match="w:r[w:rPr/w:rStyle/@w:val='ITLP Key']">
  <code rend="Key"><xsl:apply-templates/></code>
</xsl:template>
<xsl:template match="w:r[w:rPr/w:rStyle/@w:val='ITLP Label']">
  <code rend="Label"><xsl:apply-templates/></code>
</xsl:template>
<xsl:template match="w:r[w:rPr/w:rStyle/@w:val='ITLP Menu']">
  <code rend="Menu"><xsl:apply-templates/></code>
</xsl:template>
<xsl:template match="w:r[w:rPr/w:rStyle/@w:val='ITLP Software']">
  <code rend="Software"><xsl:apply-templates/></code>
</xsl:template>

eBooks

A long history of attempts to make replacements for books on a small tablet computer looking like a book

Most successful is Amazon Kindle

Followed by Apple iPad

And then the Sony Reader, Barnes and Noble Nook etc

Largely marketed for reading modern fiction

What are eBooks like?

Designers compare them to “1990s web”

Designers too used to painting pictures

Inconsistent support for ePub

Too many reader apps on iPad

Kindle format annoyingly different

Formats

Most ebook formats based on HTML

Open ePub format has most support

Amazon Kindle is variant but can be created by converting ePub

ePub is simply a zipped bundle of XML/HTML files, CSS, graphics etc

iBooks: http://www.apple.com/

Free app on iPhone and iPad

Renders ePub and PDF books

Managed using iTunes on host computer

Implements some extensions to ePub (video, fixed format)

iBooks issues

Pretty good support for ePub / CSS features

Still too slow with big books, 1000 pages or more (not yet tried with iPad 2)

Badly needs MathML support

Bookshelf layout still primitive

iTunes interface politically uncomfortable for some

We need an ePub previewer for the Mac desktop!

ePub specs and apps

ePub @ IDPF: http://idpf.org/epub

Adobe Digital Editions: http://www.adobe.com/products/digitaleditions/

Making ePub from Apple Pages: http://support.apple.com/kb/ht4168

Making ePub using InDesign: http://blogs.adobe.com/digitalpublishing/2010/03/create_epub_ebooks_with_adobe_indesign.html

Software in the cloud to convert any webpage into an e-book: http://dotepub.com/

Useful links

ePub syntax checker, an essential tool for checking whether a package is properly constructed: http://code.google.com/p/epubcheck/

Stanza ePub reader for iPhone and Mac is not bad, but fails on large files and does not do all the formatting: http://www.lexcycle.com/

Aldiko on Google phones is quite complete: http://www.aldiko.com/

FBReader ePub reader for Linux and Android, usable for some texts, but not a very complete renderer: http://www.fbreader.org/FBReaderJ/

EPUBReader Firefox extension allows you to view ePubs seamlessly in Firefox: https://addons.mozilla.org/en-US/firefox/addon/45281/

Calibre is a good package for ePub conversions and management: http://calibre-ebook.com/

Things to read

A nice ePub book called epub straight to the point by Liz Castro (http://www.elizabethcastro.com/epub/)

One of the guides on making ePub (there are many): http://www.lexcycle.com/faq/how_to_create_epub

ePub-related blogs which I find useful are http://www.pigsgourdsandwikis.com/ and http://blog.threepress.org

Files in a typical ePub

mimetype                    gives MIME type (uncompressed)
META-INF/container.xml      gives name of directory where files are (OEBPS)
OEBPS/content.opf           metadata, file manifest, order of chapters etc
OEBPS/media/image0.png      image for book
OEBPS/stylesheet.css        CSS stylesheet
OEBPS/s2.html               HTML chapter
OEBPS/s3.html               HTML chapter
OEBPS/page-template.xpgt    instructions for ADE
OEBPS/titlepage.html        HTML for front page
OEBPS/titlepageback.html    HTML for back page
OEBPS/toc.ncx               table of contents
OEBPS/index.html            HTML main part
OEBPS/s1.html               HTML part
OEBPS/cover.jpg             book cover image
OEBPS/print.css             CSS stylesheet for printing
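The one packaging detail that a generic zip tool tends to get wrong is the mimetype entry, which has to come first and be stored uncompressed. A minimal Python sketch of the packaging step follows; it is an illustration rather than the scripting used in the TEI tools, the file contents are placeholders, and make_epub is an invented name.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

def make_epub(files: dict) -> bytes:
    """Zip up an ePub: mimetype first and uncompressed, then everything else."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML)
        for name, data in files.items():
            z.writestr(name, data)       # compressed with the archive default
    return buf.getvalue()

# Placeholder payload; a real book would carry the OPF, NCX, HTML and CSS files.
epub = make_epub({
    "OEBPS/content.opf": "<package/>",
    "OEBPS/index.html": "<html/>",
})
with zipfile.ZipFile(io.BytesIO(epub)) as z:
    print(z.namelist())
```

A package built this way still needs a valid content.opf and toc.ncx before a checker such as epubcheck would accept it.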

First part of metadata

<metadata>
  <dc:title>Collected Poems</dc:title>
  <dc:language xsi:type="dcterms:RFC3066">en</dc:language>
  <dc:subject>Oxford Text Archive</dc:subject>
  <dc:subject>Poems -- Great Britain -- 20th century</dc:subject>
  <dc:identifier id="dcidid" opf:scheme="URI">http://ota.ox.ac.uk/id/3020</dc:identifier>
  <dc:description>Collected Poems / Owen, Wilfred, 1893-1918</dc:description>
  <dc:creator>Owen, Wilfred</dc:creator>
  <dc:publisher>Oxford Text Archive, Oxford University</dc:publisher>
  <dc:date opf:event="creation">1920</dc:date>
  <dc:date opf:event="epubpublication" xsi:type="dcterms:W3CDTF">2010-09-21</dc:date>
  <dc:rights>Creative Commons Attribution</dc:rights>
  <meta name="cover" content="cover-image"/>
</metadata>

Second part of metadata

<manifest>
  <item href="cover.jpg" id="cover-image" media-type="image/jpeg"/>
  <item href="stylesheet.css" id="css" media-type="text/css"/>
  <item href="titlepage.html" id="titlepage" media-type="application/xhtml+xml"/>
  <item href="titlepageback.html" id="titlepageback" media-type="application/xhtml+xml"/>
  <item id="print.css" href="print.css" media-type="text/css"/>
  <item id="apt" href="page-template.xpgt" media-type="application/adobe-page-template+xml"/>
  <item id="start" href="index.html" media-type="application/xhtml+xml"/>
  <item href="s1.html" media-type="application/xhtml+xml" id="section1"/>
  <item href="s2.html" media-type="application/xhtml+xml" id="section34"/>
  <item href="s3.html" media-type="application/xhtml+xml" id="section57"/>
  <item href="media/image0.png" id="image-1" media-type="image/png"/>
  <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
</manifest>

Third part of metadata

<spine toc="ncx">
  <itemref idref="titlepage" linear="yes"/>
  <itemref idref="start" linear="yes"/>
  <itemref linear="yes" idref="section1"/>
  <itemref linear="yes" idref="section34"/>
  <itemref linear="yes" idref="section57"/>
  <itemref idref="titlepageback" linear="no"/>
</spine>
<guide>
  <reference type="text" href="titlepage.html" title="Cover"/>
  <reference type="text" title="Start" href="index.html"/>
  <reference type="text" href="s1.html" title="War Poems"/>
  <reference type="text" href="s2.html" title="Other Poems, and Fragments"/>
  <reference type="text" href="s3.html" title="Minor Poems, and Juvenilia"/>
  <reference href="titlepageback.html" type="text" title="About this book"/>
</guide>
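The manifest and the spine work together: the spine lists item IDs in reading order, and the manifest maps each ID to a file. The following Python sketch resolves that order for a shortened, invented package document; it assumes the usual OPF namespace (http://www.idpf.org/2007/opf) on the package element, which the fragments above do not show.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
NS = {"opf": OPF}

# A shortened package document modelled on the manifest and spine fragments.
SAMPLE = f"""<package xmlns="{OPF}" version="2.0">
  <manifest>
    <item href="titlepage.html" id="titlepage" media-type="application/xhtml+xml"/>
    <item href="index.html" id="start" media-type="application/xhtml+xml"/>
    <item href="s1.html" id="section1" media-type="application/xhtml+xml"/>
    <item href="titlepageback.html" id="titlepageback" media-type="application/xhtml+xml"/>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="titlepage" linear="yes"/>
    <itemref idref="start" linear="yes"/>
    <itemref idref="section1" linear="yes"/>
    <itemref idref="titlepageback" linear="no"/>
  </spine>
</package>"""

def reading_order(opf_text):
    """Return (href, is_linear) pairs in spine order."""
    root = ET.fromstring(opf_text)
    hrefs = {item.get("id"): item.get("href")
             for item in root.iterfind("opf:manifest/opf:item", NS)}
    return [(hrefs[ref.get("idref")], ref.get("linear", "yes") == "yes")
            for ref in root.iterfind("opf:spine/opf:itemref", NS)]

for href, linear in reading_order(SAMPLE):
    print(href, "linear" if linear else "non-linear")
```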

So let's make our own

Export to ePub from InDesign

Export to ePub from Apple Pages

Convert from other formats using Calibre

Convert web pages (dotepub)

Your Favourite System may have an export

Roll it yourself with an HTML editor

TEI ePub method

Use TEI XML as pivot format

Adapt HTML transforms and CSS

Generate extra components of ePub format as part of XSLT transform

Generate cover images automatically from metadata

Scripting to manage zip-packaging and graphic files

Initially command-line, then as web service

Samples
