mFiL 2015 1
Linguistic markup and processing of transclusion in XML documents
Simon Dew BA MISTC
6 November 2015
Copyright © Simon Dew 2015.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Transclusion
• Theodor Holm Nelson, 1981: Literary Machines
• The inclusion of an electronic document, or part of a document, in the rendering of another document.
• The main document does not contain a copy of the transcluded text, but only a reference to it.
• The software used to render the document obtains the transcluded material and incorporates it into the main work.
[Photo: Ted Nelson, by Dgies. Licensed under CC BY-SA 3.0.]
This presentation focuses on transclusion in XML (Extensible Markup Language) documents, including, but not limited to:
• DocBook
• DITA
• TEI
• XHTML
Transclusion can be large scale / context-free:
Transclusion can be small scale / parametrised:
• General entities
Definition:
<!ENTITY device "Euro 500">
Reference:
<title>Configuring the &device;</title>
Result:
<title>Configuring the Euro 500</title>
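A minimal Python sketch of the same substitution, assuming the entity is declared in the document's internal DTD subset (the slide does not say where the declaration lives). The standard-library parser, which is built on expat, expands the reference while parsing, so the application never sees `&device;`. The names `SOURCE` and `expand_entities` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declares the general entity; the document body
# holds only a reference to it (&device;), not a copy of the text.
SOURCE = """<!DOCTYPE title [
  <!ENTITY device "Euro 500">
]>
<title>Configuring the &device;</title>"""


def expand_entities(xml_text: str) -> str:
    """Parse the document; expat replaces &device; with its declared text."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


if __name__ == "__main__":
    print(expand_entities(SOURCE))  # Configuring the Euro 500
```

Because the expansion happens inside the parser, every occurrence of the entity receives exactly the same string, whatever its grammatical context.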
• XInclude
Definition:
<phrase xml:id="device">Euro 500</phrase>
Reference:
<title>Configuring the <xi:include xpointer="xpath(id('device')/node())"/></title>
Result:
<title>Configuring the Euro 500</title>
• Specific transclusion mechanisms, e.g. DITA conref
Definition:
<ph id="device">Euro 500</ph>
Reference:
<title>Configuring the <ph conref="device"/></title>
Result:
<title>Configuring the <ph>Euro 500</ph></title>
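The conref behaviour shown above can be sketched in a few lines of Python. This is a hypothetical simplification, not a DITA processor: the definition and the reference are assumed to sit in one toy tree, and `conref` holds a bare `id` as on the slide rather than the full `file.dita#topicid/elementid` form. The function name `resolve_conrefs` is illustrative. The referencing element keeps its own name and receives a copy of the target's content, which reproduces the result above.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical single-file example: definition and reference share one tree,
# and @conref holds a bare id as on the slide.
SOURCE = """<topic>
  <ph id="device">Euro 500</ph>
  <title>Configuring the <ph conref="device"/></title>
</topic>"""


def resolve_conrefs(root: ET.Element) -> None:
    """Give every element with @conref a copy of the content of the element
    whose @id it names, then drop the @conref attribute."""
    targets = {el.get("id"): el for el in root.iter() if el.get("id")}
    # Collect first: the tree must not be restructured while iterating it.
    referring = [el for el in root.iter() if el.get("conref") is not None]
    for el in referring:
        source = targets[el.attrib.pop("conref")]
        el.text = source.text
        el[:] = [copy.deepcopy(child) for child in source]


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    resolve_conrefs(root)
    title = root.find("title")
    # tostring() also emits the element's tail, so strip trailing whitespace.
    print(ET.tostring(title, encoding="unicode").strip())
    # <title>Configuring the <ph>Euro 500</ph></title>
```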
Transcluded content may vary.
1. Local redefinition
2. Conditional processing:
• Conditional profiling (DocBook), e.g. an XSLT stylesheet parameter:
<xsl:param name="profile.vendor" select="'ACME'"/>
• DITAVAL files (DITA), e.g.:
<val>
  <prop action="include" att="product" val="ACME"/>
  <prop action="exclude" att="product" val="Yoyodyne"/>
</val>
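A DITAVAL-style filter can be sketched in Python as follows. The rule table mirrors the `<val>` file above; the sample paragraph, the function names, the rule that an element is dropped only when every value in its filtering attribute is excluded, and the way text following a removed element is preserved are all assumptions of this sketch, not a restatement of the DITA specification. DocBook profiling works on a similar principle, keyed on attributes such as `vendor`.

```python
import xml.etree.ElementTree as ET

# Hypothetical filter rules mirroring the <val> file above:
# (attribute, value) -> action; values not listed are kept.
RULES = {("product", "ACME"): "include", ("product", "Yoyodyne"): "exclude"}

# Hypothetical source paragraph with one variant per product.
SOURCE = (
    '<para>Connect the <phrase product="ACME">Euro 500</phrase>'
    '<phrase product="Yoyodyne">Oz 500</phrase> to the network.</para>'
)


def is_excluded(el: ET.Element, rules: dict) -> bool:
    """True if, for some filtering attribute, every value on el is excluded."""
    for att in {att for att, _ in rules}:
        values = el.get(att, "").split()
        if values and all(rules.get((att, v)) == "exclude" for v in values):
            return True
    return False


def _remove_keep_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove child but keep the text that followed it (its tail)."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)


def filter_tree(parent: ET.Element, rules: dict) -> None:
    """Recursively drop excluded descendants of parent."""
    for child in list(parent):
        if is_excluded(child, rules):
            _remove_keep_tail(parent, child)
        else:
            filter_tree(child, rules)


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    filter_tree(root, RULES)
    print("".join(root.itertext()))  # Connect the Euro 500 to the network.
```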
Linguistic consequences
A different form of the transcluded word or phrase may be required depending on the environment into which it is placed:
• Orthography, e.g. writing systems with upper case
• Syntactic case
• Definiteness
• Number
• Others, e.g. initial consonant mutation
<title>_____ Details</title>
organisational unit [TITLE CASE] → Organisational Unit
<para>Om nödvändigt, välj _____.</para> ("If necessary, select _____.")
organisationsenhet ("organisational unit") [+DEFINITE] → organisationsenheten
If the transcluded word or phrase is the head of a phrase, it may demand agreement from dependent words.
• Phonetics
• Gender
• Number
• Case
• Definiteness
<para>Configuring a _____ Server</para>
Oz 500 [_V] → Configuring an Oz 500 Server
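<para>Pour configurer le _____ auquel le modem est connecté : </para> ("To configure the _____ to which the modem is connected:")
tablette ("tablet") [_C] [FEM] [SING] → Pour configurer la tablette à laquelle le modem est connecté :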
Principles
1. Linguistic markup scheme
Defining a transcluded term:
• Mark up all forms of the term to be transcluded
• Mark up the features that affect dependent words
Where a transcluded term is required:
• Mark up the required form
• Mark up dependent words
2. Linguistic pre-processing
Markup
XML attributes
• Extend markup schema
• Wrapper element: DocBook <phrase>, DITA <ph>, HTML <span>
• Namespace: http://stanleysecurity.github.io/PACBook/ns/linguistics
• Prefix: ling
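Because the attributes live in their own namespace, a processor should match them by namespace URI rather than by the `ling` prefix, which any document may rebind. A minimal Python sketch, using the namespace URI from the slide; the sample element and the function name `ling_features` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace URI as given on the slide; the prefix itself is arbitrary in XML.
LING_NS = "http://stanleysecurity.github.io/PACBook/ns/linguistics"

SOURCE = f"""<phrase xmlns:ling="{LING_NS}" vendor="ACME"
        ling:case="gen" ling:class="def">organisationsenhetens</phrase>"""


def ling_features(el: ET.Element) -> dict:
    """Return the element's linguistic attributes as {local-name: value}.

    ElementTree reports namespaced attributes in Clark notation,
    e.g. "{http://...}case", so we match on the namespace URI rather
    than on the "ling" prefix. Non-linguistic attributes are ignored.
    """
    prefix = "{" + LING_NS + "}"
    return {
        name[len(prefix):]: value
        for name, value in el.attrib.items()
        if name.startswith(prefix)
    }


if __name__ == "__main__":
    print(ling_features(ET.fromstring(SOURCE)))
```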
ling:pron   Phonetic environment (V, C, ...)
ling:num   Grammatical number (sg, pl, ...)
ling:case   Grammatical case (nom, gen, dat, acc, ...)
ling:gen   Grammatical gender (c, m, f, n, ...)
ling:class   Definiteness / inflectional class (strong, weak, mixed, ind, def, ...)
ling:orth   Orthographic case (upper, lower, title, sentence)
ling:type   head (form of a head word); depend (dependent word)
Resource — features of head noun that demand agreement
<resource xl:label="Product_Name">
  <phrase vendor="ACME" ling:pron="C">Euro 500</phrase>
  <phrase vendor="Yoyodyne" ling:pron="V">Oz 500</phrase>
</resource>
Phonetic environment:
⟨Euro⟩ /ˈjʊəɹəʊ/ _C
⟨Oz⟩ /ˈɒz/ _V
Resource — all possible forms of head noun:
<resource xl:label="Org_Unit">
  <phrase ling:gen="c" ling:num="sg">
    <phrase ling:type="head" ling:case="nom" ling:class="ind">organisationsenhet</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="ind">organisationsenhets</phrase>
    <phrase ling:type="head" ling:case="nom" ling:class="def">organisationsenheten</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="def">organisationsenhetens</phrase>
  </phrase>
</resource>
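Head selection (the job the slides later assign to LingHead.xsl) can be sketched in Python: given the features requested at the point of use, keep the one head form whose `ling:` attributes match. The defaults for unspecified features (`case="nom"`, `class="ind"`), the rule that an unmarked feature on a head form matches anything, and the function name `select_head` are hypothetical choices of this sketch. Note also that the worked example later in these slides shows the stylesheet pruning the tree rather than returning a string.

```python
import xml.etree.ElementTree as ET

LING = "{http://stanleysecurity.github.io/PACBook/ns/linguistics}"

# The Org_Unit resource from the slide (xl:label omitted; ling: declared).
RESOURCE = """<resource xmlns:ling="http://stanleysecurity.github.io/PACBook/ns/linguistics">
  <phrase ling:gen="c" ling:num="sg">
    <phrase ling:type="head" ling:case="nom" ling:class="ind">organisationsenhet</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="ind">organisationsenhets</phrase>
    <phrase ling:type="head" ling:case="nom" ling:class="def">organisationsenheten</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="def">organisationsenhetens</phrase>
  </phrase>
</resource>"""

# Hypothetical defaults for features the referring phrase does not specify.
DEFAULTS = {"case": "nom", "class": "ind"}


def select_head(resource: ET.Element, required: dict) -> str:
    """Return the text of the single head form matching the required features.

    A head form that does not mark a feature at all is treated as compatible
    with any value of it (an assumption of this sketch).
    """
    wanted = {**DEFAULTS, **required}
    matches = [
        el.text
        for el in resource.iter()
        if el.get(LING + "type") == "head"
        and all(el.get(LING + k) in (None, v) for k, v in wanted.items())
    ]
    if len(matches) != 1:
        raise ValueError(f"expected one head form for {wanted}, got {matches}")
    return matches[0]


if __name__ == "__main__":
    org_unit = ET.fromstring(RESOURCE)
    # Corresponds to <phrase ling:class="def" content:ref="Org_Unit"/>
    print(select_head(org_unit, {"class": "def"}))  # organisationsenheten
```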
Document — mark up required form of transcluded term
<para>Om nödvändigt, välj <phrase ling:class="def" content:ref="Org_Unit"/>.</para>
<title><phrase ling:orth="title" content:ref="Org_Unit"/> Details</title>
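Assuming the transcluded term has already been resolved to plain text inside the phrase (here the English term organisational unit from the earlier example), the casing step can be sketched like this. The mapping of the four `ling:orth` values to string operations, and the names `apply_orth` and `apply_casing`, are assumptions of this sketch rather than a description of LingCasing.xsl; in particular, house rules that keep small words lower case in titles are not modelled.

```python
import xml.etree.ElementTree as ET
from typing import Optional

LING = "{http://stanleysecurity.github.io/PACBook/ns/linguistics}"

# The title from the slide after transclusion (hypothetical resolved text).
SOURCE = (
    '<title xmlns:ling="http://stanleysecurity.github.io/PACBook/ns/linguistics">'
    '<phrase ling:orth="title">organisational unit</phrase> Details</title>'
)


def _cap(word: str) -> str:
    """Upper-case the first character only; leave the rest untouched."""
    return word[:1].upper() + word[1:]


def apply_orth(text: str, orth: Optional[str]) -> str:
    """Apply one ling:orth value; title case capitalises every word."""
    if orth is None:
        return text
    if orth == "upper":
        return text.upper()
    if orth == "lower":
        return text.lower()
    if orth == "title":
        return " ".join(_cap(w) for w in text.split(" "))
    if orth == "sentence":
        return _cap(text)
    raise ValueError(f"unknown ling:orth value: {orth!r}")


def apply_casing(root: ET.Element) -> None:
    """Rewrite the leading text of every element that carries ling:orth."""
    for el in root.iter():
        if el.text:
            el.text = apply_orth(el.text, el.get(LING + "orth"))


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    apply_casing(root)
    print("".join(root.itertext()))  # Organisational Unit Details
```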
Document — mark up dependent words in text
<title>Configuring <wordasword ling:type="depend">a</wordasword> <phrase content:ref="Product_Name"/> Server</title>
<para>Wenn <phrase>
    <wordasword ling:type="depend">ein</wordasword>
    <phrase content:ref="Device"/>
  </phrase> konfiguriert wird, werden die Details <phrase>
    <wordasword ling:type="depend">der</wordasword>
    <phrase content:ref="Device" ling:case="gen"/>
  </phrase> auf der Weboberfläche angezeigt.</para>
("When a _____ is configured, the details of the _____ are displayed on the web interface.")
Dictionary
Complies with the dictionaries module of the TEI.
<entry n="a">
  <form>
    <gramGrp><usg value="C"/></gramGrp>
    <orth>a</orth>
  </form>
  <form>
    <gramGrp><usg value="V"/></gramGrp>
    <orth>an</orth>
  </form>
</entry>
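A dictionary lookup for a dependent word can be sketched in Python: pick the `<form>` whose `<usg value>` matches the phonetic environment of the following head word, taken from its `ling:pron` in the Product_Name resource. The TEI namespace is omitted for brevity, and the fallback to the entry's base form and the name `inflect_for_pron` are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# The TEI entry from the slide; the TEI namespace is omitted for brevity.
ENTRY = """<entry n="a">
  <form><gramGrp><usg value="C"/></gramGrp><orth>a</orth></form>
  <form><gramGrp><usg value="V"/></gramGrp><orth>an</orth></form>
</entry>"""


def inflect_for_pron(entry: ET.Element, pron: str) -> str:
    """Return the <orth> of the <form> whose <usg value> matches pron.

    Falls back to the entry's base form (@n) if no form matches; that
    fallback is an assumption of this sketch.
    """
    for form in entry.iter("form"):
        usg = form.find("gramGrp/usg")
        if usg is not None and usg.get("value") == pron:
            return form.findtext("orth")
    return entry.get("n")


if __name__ == "__main__":
    a = ET.fromstring(ENTRY)
    # Product_Name resource: Euro 500 has ling:pron="C", Oz 500 has "V".
    for term, pron in [("Euro 500", "C"), ("Oz 500", "V")]:
        print(f"Configuring {inflect_for_pron(a, pron)} {term} Server")
```

This prints "Configuring a Euro 500 Server" for the ACME build and "Configuring an Oz 500 Server" for the Yoyodyne build.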
<usg>   Phonetic environment (V, C, ...)
<num>   Grammatical number (sg, pl, ...)
<case>   Grammatical case (nom, gen, dat, acc, ...)
<gen>   Grammatical gender (c, m, f, n, ...)
<oVar>   Definiteness / inflectional class (strong, weak, mixed, ind, def, ...)
<orth>   Output form
Software
Transformational stylesheets
PACBook XSLT transformations:
• LingHead.xsl: selects the required declension of head nouns.
• LingDepend.xsl: inflects dependent words.
• LingCasing.xsl: sets the orthographic case of specified text.
Licence: GNU Lesser General Public License (LGPL) v3
Repository: https://github.com/STANLEYSecurity/PACBook
Limitations
• Only noun phrases.
• Only tested with a small handful of languages.
• Linguistic markup differs for translated texts.
• Linguistic markup can be complex for authors.
Related work
• Various linguistic markup schemas / ontologies
• Internationalisation markup
• Nothing else?
• What should we call this?
Collaboration
• Dictionary: Wiktionary.
• Testing and improving.
• Integrating with other publication workflows.
Development fork: https://github.com/janiveer/PACBook
Examples
Example
Resource:
<resource xl:label="Doc">
  <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
    <phrase ling:type="head" ling:case="nom">Dokument</phrase>
    <phrase ling:type="head" ling:case="acc">Dokument</phrase>
    <phrase ling:type="head" ling:case="gen">Dokuments</phrase>
    <phrase ling:type="head" ling:case="dat">Dokument</phrase>
  </phrase>
  <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
    <phrase ling:type="head" ling:case="nom">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="acc">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="gen">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
  </phrase>
</resource>
Document:
<para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase content:ref="Doc" ling:case="dat"/> nicht enthalten.</para>
("The IP address setting is not included in this _____.")
After transclusion:
<para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase ling:case="dat">
  <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
    <phrase ling:type="head" ling:case="nom">Dokument</phrase>
    <phrase ling:type="head" ling:case="acc">Dokument</phrase>
    <phrase ling:type="head" ling:case="gen">Dokuments</phrase>
    <phrase ling:type="head" ling:case="dat">Dokument</phrase>
  </phrase>
  <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
    <phrase ling:type="head" ling:case="nom">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="acc">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="gen">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
  </phrase>
</phrase> nicht enthalten.</para>
After head transformation:
<para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase ling:case="dat">
  <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
    <phrase ling:type="head" ling:case="dat">Dokument</phrase>
  </phrase>
  <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
    <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
  </phrase>
</phrase> nicht enthalten.</para>
After conditional processing:
PDF output:
<para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase ling:case="dat">
  <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
    <phrase ling:type="head" ling:case="dat">Dokument</phrase>
  </phrase>
</phrase> nicht enthalten.</para>
CHM output:
<para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase ling:case="dat">
  <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
    <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
  </phrase>
</phrase> nicht enthalten.</para>
After dependent transformation:
PDF output:
<para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">diesem</wordasword> <phrase ling:case="dat">
  <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
    <phrase ling:type="head" ling:case="dat">Dokument</phrase>
  </phrase>
</phrase> nicht enthalten.</para>
CHM output:
<para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dieser</wordasword> <phrase ling:case="dat">
  <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
    <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
  </phrase>
</phrase> nicht enthalten.</para>
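The dependent transformation can be sketched in Python, starting from the output of conditional processing. The German dictionary data for dies, the default of nominative when no case is marked, the helper `para` that rebuilds the paragraph for one output format, and the rule that gender and number come from the first element carrying `ling:gen` are all hypothetical; the slides do not show PACBook's German dictionary, and LingDepend.xsl may locate the head differently.

```python
import xml.etree.ElementTree as ET

NS = "http://stanleysecurity.github.io/PACBook/ns/linguistics"
LING = "{" + NS + "}"

# Hypothetical dictionary data for the demonstrative "dies" (singular only),
# keyed by (gender, number, case). PACBook keeps this in a TEI dictionary.
DICTIONARY = {
    "dies": {
        ("m", "sg", "nom"): "dieser", ("f", "sg", "nom"): "diese",
        ("n", "sg", "nom"): "dieses", ("m", "sg", "dat"): "diesem",
        ("f", "sg", "dat"): "dieser", ("n", "sg", "dat"): "diesem",
    }
}


def para(fmt: str, gen: str, noun: str) -> str:
    """Build the 'after conditional processing' paragraph for one format."""
    return (
        f'<para xmlns:ling="{NS}">Die Einstellung der IP-Adresse ist in '
        '<wordasword ling:type="depend">dies</wordasword> '
        '<phrase ling:case="dat">'
        f'<phrase outputformat="{fmt}" ling:gen="{gen}" ling:num="sg">'
        f'<phrase ling:type="head" ling:case="dat">{noun}</phrase>'
        "</phrase></phrase> nicht enthalten.</para>"
    )


def inflect_dependents(root: ET.Element) -> None:
    """Inflect each ling:type="depend" word from the phrase that follows it:
    case comes from that phrase, gender and number from the first element
    inside it that carries ling:gen."""
    for parent in root.iter():
        children = list(parent)
        for dep, following in zip(children, children[1:]):
            if dep.get(LING + "type") != "depend":
                continue
            case = following.get(LING + "case", "nom")
            group = next(e for e in following.iter() if e.get(LING + "gen"))
            key = (group.get(LING + "gen"), group.get(LING + "num"), case)
            dep.text = DICTIONARY[dep.text][key]


if __name__ == "__main__":
    for fmt, gen, noun in [("PDF", "n", "Dokument"), ("CHM", "f", "Hilfedatei")]:
        root = ET.fromstring(para(fmt, gen, noun))
        inflect_dependents(root)
        print("".join(root.itertext()))
```

For the two builds this yields "... ist in diesem Dokument nicht enthalten." and "... ist in dieser Hilfedatei nicht enthalten.", matching the slide.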
Questions?
References
• [Nelson] Theodor Holm Nelson. 1981. Literary Machines. Mindful Press, Sausalito, California.
• [XML] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, editors. 26 November 2008. Extensible Markup Language (XML) 1.0 (Fifth Edition). World Wide Web Consortium (W3C).
• [DocBook] DocBook Technical Committee. 1 November 2009. The DocBook Schema Version 5.0. Organization for the Advancement of Structured Information Standards (OASIS).
• [DITA] OASIS DITA Technical Committee. 1 December 2010. Darwin Information Typing Architecture (DITA) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS).
• [TEI] TEI Consortium, eds. 20 January 2014. TEI P5: Guidelines for Electronic Text Encoding and Interchange, 2.6.0. TEI Consortium.
• [HTML] Ian Hickson, Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara, Edward O’Connor, Silvia Pfeiffer, editors. 28 October 2014. HTML5. World Wide Web Consortium (W3C).
• [XInclude] Jonathan Marsh, David Orchard, and Daniel Veillard, editors. 15 November 2006. XML Inclusions (XInclude) Version 1.0 (Second Edition). World Wide Web Consortium (W3C).
• [XSLT] James Clark, editor. 16 November 1999. XSL Transformations (XSLT) Version 1.0. World Wide Web Consortium (W3C).
• [Ant] Stephane Bailliez, et al. 29 December 2013. Apache Ant™ 1.9.3 Manual. The Apache Software Foundation.
• [XProc] Norman Walsh, Alex Milowski, and Henry S. Thompson, editors. 11 May 2010. XProc: An XML Pipeline Language. World Wide Web Consortium (W3C).
• [XLIFF] OASIS XLIFF Technical Committee. 1 February 2008. XML Localisation Interchange File Format (XLIFF) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS).
• [GOLD] Scott Farrar and D. Terence Langendoen. 2003. A linguistic ontology for the Semantic Web. GLOT International, 7(3), pp. 97-100.
• [ISOcat] M. Kemps-Snijders, M. A. Windhouwer, P. Wittenburg, S. E. Wright. November 2009. ISOcat: Remodeling Metadata for Language Resources. International Journal of Metadata, Semantics and Ontologies (IJMSO), 4(4), pp. 261-276.
• [ICU] ICU Project Management Committee. 7 October 2015. ICU 56. ICU - International Components for Unicode.