Atelier Villejuif, LAMOP, Paris, May 2008

Digital Philology from XML & XSLT to XPath 2.0 & XQuery in Medieval Transcriptions

The Arts & Humanities Research Council

The Roberts Fund

CNRS

Agence Nationale de la Recherche



www.python.org

xmlfr.org

A laboratory for digital philology: a grant won from the Roberts Fund by Mansfield in 2007 to prepare XML tools for research engineers working on Middle French

UTF-8 is an 8-bit Unicode Transformation Format

It forgives non-accented characters, but you must manage things when your web page has accented characters
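The difference between non-accented and accented characters can be sketched in Python. This is only an illustration: the helper name `utf8_length` and the sample letters are assumptions, not part of the project's own scripts.

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))

plain = "e"
accented = "\u00e9"  # e with an acute accent

print(utf8_length(plain))     # 1 byte: identical to plain ASCII
print(utf8_length(accented))  # 2 bytes: the accent needs a multi-byte sequence

# Reading UTF-8 bytes with the wrong encoding garbles the accented letter,
# which is why the web page must declare its encoding.
garbled = accented.encode("utf-8").decode("latin-1")
print(garbled)
```

A non-accented letter survives a wrong encoding declaration because its single byte is the same in ASCII, Latin-1 and UTF-8; the accented letter does not.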

e.g. Georgian commas • ii •

While an ampersand, or une esperluette, we show as an HTML entity: &amp;
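As a small illustration, Python's standard `html` module can produce and reverse the ampersand entity. The sample string is borrowed from the title slide; the variable names are only for this sketch.

```python
from html import escape, unescape

line = "The Arts & Humanities Research Council"

# escape() replaces the bare ampersand with its entity form.
encoded = escape(line)
print(encoded)  # The Arts &amp; Humanities Research Council

# unescape() turns the entity back into the character.
decoded = unescape(encoded)
print(decoded == line)  # True
```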

November 2007 TEI P5 Announced

Full elements need an opening and a closing tag, while an 'empty' element is complete in one tag.

A full element is opened <name> and closed </name>, while an empty element is complete within itself: <abbr expan="n"/>t
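A minimal sketch of how a parser sees the two kinds of element, using Python's standard `xml.etree.ElementTree`. The wrapping `<p>` element and the name "Harley" inside `<name>` are assumptions made only so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one full element and one empty element.
snippet = '<p><name>Harley</name> tie<abbr expan="n"/>t</p>'
root = ET.fromstring(snippet)

name = root.find("name")
abbr = root.find("abbr")

print(name.text)          # Harley : a full element carries content between its tags
print(abbr.text)          # None   : an empty element has no content of its own
print(abbr.get("expan"))  # n      : its information lives in the attribute
print(abbr.tail)          # t      : text that follows the empty element
```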


Our empty elements, most of which Python converts from simple rules: tie<abbr expan="n"/>t

<cb n="b"/> here marking the start of column b

<lb n="CDAM.369c:01"/>

<layout ruledLines="38"/>

<pb n="241" side="r"/> page break, here marking the start of fol. 241r
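The empty elements above can be read back with a few lines of Python. This is a sketch only: the wrapping `<div>`, the order of the elements and the `describe` helper are assumptions for illustration, not the project's actual conversion script.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment built from the empty elements shown above.
fragment = """<div>
  <pb n="241" side="r"/>
  <cb n="b"/>
  <lb n="CDAM.369c:01"/>
</div>"""

def describe(elem):
    """Turn a milestone element into a short human-readable label."""
    if elem.tag == "pb":
        return f"fol. {elem.get('n')}{elem.get('side')}"
    if elem.tag == "cb":
        return f"column {elem.get('n')}"
    if elem.tag == "lb":
        return f"line {elem.get('n')}"
    return elem.tag

labels = [describe(e) for e in ET.fromstring(fragment)]
print(labels)  # ['fol. 241r', 'column b', 'line CDAM.369c:01']
```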

The only one you will need to type in will be: soubz<space/>rire, which will show as 'soubz rire' in the diplomatic transcription and as 'soubzrire' in the modern edition
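The two renderings of `<space/>` can be sketched as follows. In the project the rendering is done by the XSLT transforms; this Python stand-in, with its hypothetical `render` function and temporary `<w>` wrapper, only illustrates the rule.

```python
import xml.etree.ElementTree as ET

def render(xml_word: str, mode: str) -> str:
    """Render a tagged word; mode is 'diplomatic' or 'modern'."""
    # Wrap the input in a temporary <w> element so that it parses as XML.
    root = ET.fromstring(f"<w>{xml_word}</w>")
    parts = [root.text or ""]
    for child in root:
        # The diplomatic view keeps the scribe's gap; the modern view closes it.
        if child.tag == "space" and mode == "diplomatic":
            parts.append(" ")
        parts.append(child.tail or "")
    return "".join(parts)

print(render("soubz<space/>rire", "diplomatic"))  # soubz rire
print(render("soubz<space/>rire", "modern"))      # soubzrire
```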

XPath 2.0 in EditiX

XML + XSL = HTML Web Page

EditiX and Oxygen have built-in transform processors, so that you can apply XSLT scripts to your XML transcription at any time during your tagging work

You cannot change the HTML web page. It is frozen. However, it does highlight for you any potential problems

Our Portfolio of Essential Files

transcription.xml is the text you are transcribing.

4431.dtd is our Document Type Definition, which lists in alphabetic order the elements and their attributes which we have chosen from the TEI P5 guidelines.

mat/table.xsl is our main transform program for the whole manuscript and for individual books. It differs from our specialised XSLT programs (which include the XSLTransform glossary.xsl, for rendering a table of glosses from all those we have typed in inline). Please be careful when downloading XSLTransform files, because some systems (especially on Windows) add a further file extension of .xml after the .xsl.

classes.css is our full list of class definitions for our cascading style-sheet which renders font size and colour on the final web version. 

javascript.js is CM's JavaScript or ECMAScript code for the interactive glosses and name pop-up windows.

fleur.png is the triangular red notes pointer image for the web versions of the online Harley transcription.

get.html is the pilot version of our Glossary File and system which passes a link to the DMF2 at our partner laboratory ATILF.
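The warning about downloaded XSLTransform files gaining a stray .xml extension can be checked with a short Python helper. The function name `strip_stray_xml` is hypothetical; the sketch only computes the corrected name and does not rename anything on disk.

```python
def strip_stray_xml(filename: str) -> str:
    """Remove a stray '.xml' that a download added after '.xsl'."""
    if filename.lower().endswith(".xsl.xml"):
        return filename[:-len(".xml")]
    return filename

print(strip_stray_xml("glossary.xsl.xml"))   # glossary.xsl
print(strip_stray_xml("transcription.xml"))  # transcription.xml (left alone)
```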

But for the 'grand public' (the general public) we need a way out of the building site

A Native XML Database

Native, so we do not have to do any additional coding beyond our TEI P5 XML

http://sligachan.lib.ed.ac.uk:8080/exist/admin/admin.xql

Fin

charlie.mansfield à vodafone.net

arnoul.vjf.cnrs.fr/actes

Future Funding

HERA Joint Research Programmes

http://www.heranet.info/Default.aspx?ID=274

National Contacts:

France - CNRS Bruno [email protected]

United Kingdom - AHRC Christelle Pellecuer [email protected]