Biblissima’s Choices of Tools and Methodology for Interoperability Purposes

Eduard FRUNZEANU, Régis ROBINEAU

Équipex Biblissima

http://biblissima-condorcet.fr

5th HÉLOÏSE WORKSHOP, Madrid, 19-21 October 2015


New technologies for a new library

Partners and challenges

Around 40 databases with DATA and IMAGES

codicology

catalography

manuscript transmission

Esprit des livres (ENC)

Codicologia (IRHT)

Bibale (IRHT)

BnF Archives et Manuscrits

Pinakes (IRHT)

Reliures (BnF)

Partners and challenges

Around 40 databases with DATA and IMAGES

iconography

prosopography

Mandragore (BnF)

Initiale (IRHT)

textual corpora

Bibliothèques Virtuelles Humanistes (CESR)

Prosopographie des inventaires (MRSH Caen)

BUDE (IRHT)

Sermones (CIHAM)

Solutions and tools to handle and build interoperability of DATA & IMAGES

❖  Ontology (based on CIDOC-CRM and FRBRoo)
❖  Thesaurus / Authority File (Ginco / BaseX)
❖  Viewer (Mirador)
❖  Semantic Web Application Framework (CubicWeb)

Building the Thesaurus / Authority File

•  Thesaurus:
   –  Types of data:
      •  geographical names
      •  iconographical descriptors
      •  specialised terminology (codicology, palaeography)
      •  languages, etc.
   –  Standard / Tool: SKOS / Ginco

•  Authority File:
   –  Types of data:
      •  persons and corporate bodies
      •  works
   –  Standard / Tool: XML-TEI / BaseX

Geographical Thesaurus

Types of geographical data:
•  descriptors: geographical places identified in miniatures (historical, disappeared, fictional, non-identified, current)
•  places of origin: city or abbey where an item (manuscript or printed work) was copied / edited / painted
•  holding institutions: archives, libraries, museums

Structure and format of geographical data:
•  hierarchical thesaurus or flat lists for the descriptors
•  places of origin associated with the relevant provinces, countries and geographical areas
•  Country / City / Repository for the holding institutions

Starting point for Biblissima’s GeoThesaurus

2 datasets: Mandragore (BnF) & Initiale (IRHT)

Linked Data repositories & methods used for alignment:
•  automatic alignment, checked & manually corrected, to geonames.org, data.bnf.fr (Map Department & Rameau), dbpedia.org
•  manual alignment to specialised repositories: pleiades.stoa.org, trismegistos.org, bibelwissenschaft.de

SKOS properties used to label the alignment and organise the thesaurus: prefLabel, altLabel, broader, narrower, exactMatch, closeMatch, relatedMatch

Hierarchical thesaurus in Mandragore (BnF)

Dewey Classification

Administrative division (country & department)

Physical geography

Hierarchical thesaurus in Initiale (IRHT)

Hierarchy of Biblissima’s GeoThesaurus

I. General notions
II. Political geography (based on feature codes of Geonames)
   A. Geographical areas (= Dewey classification)
      1. Countries
         a) Counties
            (1) Cities
      2. Ancient cities and provinces
III. Physical geography (based on feature codes of Geonames)
   A. Continents
   B. Islands & Peninsulas
   C. Deserts & Oases
   D. Rivers, Lakes, Seas
   E. Mountains & Volcanoes
   F. Forests & Parks
IV. Human constructions (based on feature codes of Geonames)
   A. Monasteries
   B. Castles & palaces
   C. Religious sites
   D. Bridges
   E. Towers & fortresses
V. Fictional places
VI. Non-identified places
VII. Disappeared places
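One way to picture how Geonames feature codes can drive this hierarchy is a lookup table from code to branch. The sketch below is an assumption about how such a mapping might look, not the project's actual configuration: the feature codes are genuine Geonames codes, but their assignment to branches and the `status` parameter for branches V to VII are illustrative.

```python
# Hypothetical mapping from Geonames feature codes to GeoThesaurus branches.
BRANCH_BY_FEATURE_CODE = {
    "PCLI": "II.A.1 Countries",
    "ADM2": "II.A.1.a Counties",
    "PPL": "II.A.1.a.(1) Cities",
    "CONT": "III.A Continents",
    "ISL": "III.B Islands & Peninsulas",
    "PEN": "III.B Islands & Peninsulas",
    "DSRT": "III.C Deserts & Oases",
    "OAS": "III.C Deserts & Oases",
    "STM": "III.D Rivers, Lakes, Seas",
    "LK": "III.D Rivers, Lakes, Seas",
    "SEA": "III.D Rivers, Lakes, Seas",
    "MT": "III.E Mountains & Volcanoes",
    "VLC": "III.E Mountains & Volcanoes",
    "FRST": "III.F Forests & Parks",
    "PRK": "III.F Forests & Parks",
    "MSTY": "IV.A Monasteries",
    "CSTL": "IV.B Castles & palaces",
    "PAL": "IV.B Castles & palaces",
    "RLG": "IV.C Religious sites",
    "BDG": "IV.D Bridges",
    "TOWR": "IV.E Towers & fortresses",
    "FT": "IV.E Towers & fortresses",
}

# Branches that no Geonames code can express; chosen by a curator (assumption).
BRANCH_BY_STATUS = {
    "fictional": "V. Fictional places",
    "non-identified": "VI. Non-identified places",
    "disappeared": "VII. Disappeared places",
}


def branch_for(feature_code=None, status=None):
    """Return the thesaurus branch for a place; an editorial status wins over the code."""
    if status in BRANCH_BY_STATUS:
        return BRANCH_BY_STATUS[status]
    return BRANCH_BY_FEATURE_CODE.get(feature_code, "I. General notions")
```

Places with an unknown or missing code fall back to the first branch here, which is only a convenient default for the sketch.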

Integrating a geographical thesaurus into the CubicWeb application used to build the Biblissima portal

Geonames feature codes (e.g. MNMT, OBS, RLG)

Record for a geographic place

Get access to the data via a cartographic representation

http://nossl.demo.logilab.fr/biblissima/descr-map

Administrative interface of Ginco platform

Hierarchy and structure of geographical terms

Ginco-Diff: thesaurus page

Thesaurus URI: http://data.biblissima.fr/thesaurus/page/ark:/43093/b6957cdd-bade-4373-b058-ca63680ee39b

Ginco-Diff: concept page

Concept URI: http://data.biblissima.fr/thesaurus/page/ark:/43093/359ed56c-3026-4323-8bef-73abd11d4b04

Authority File

Data about persons:
-  Personal Name Heading
-  Alternative Name Forms
-  Gender
-  Date of birth / death
-  Place of birth / death
-  Titles / Relators
-  Works
-  Alignments with linked data repositories: data.bnf.fr, viaf.org
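As an illustration of how such person data might be laid out in XML-TEI, the sketch below builds a TEI `person` element with Python's standard library. The element choices (`persName`, `sex`, `birth`, `death`, `idno`) are standard TEI, but their arrangement, the `type` values and the sample identifier are assumptions; the project's real schema may differ.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def build_person(xml_id, heading, variants=(), sex=None,
                 birth=None, death=None, idnos=None):
    """Build a TEI <person> record; the field layout is an illustrative assumption."""
    person = ET.Element(f"{{{TEI_NS}}}person", {f"{{{XML_NS}}}id": xml_id})
    main = ET.SubElement(person, f"{{{TEI_NS}}}persName", {"type": "heading"})
    main.text = heading
    for variant in variants:
        alt = ET.SubElement(person, f"{{{TEI_NS}}}persName", {"type": "variant"})
        alt.text = variant
    if sex:
        ET.SubElement(person, f"{{{TEI_NS}}}sex", {"value": sex})
    if birth:
        ET.SubElement(person, f"{{{TEI_NS}}}birth", {"when": birth})
    if death:
        ET.SubElement(person, f"{{{TEI_NS}}}death", {"when": death})
    for kind, value in (idnos or {}).items():
        idno = ET.SubElement(person, f"{{{TEI_NS}}}idno", {"type": kind})
        idno.text = value
    return person


if __name__ == "__main__":
    record = build_person(
        "abbo_floriacensis",  # hypothetical XML ID
        "Abbo Floriacensis",
        variants=["Abbon de Fleury"],
        idnos={"databnf": "http://data.bnf.fr/ark:/12148/cb12584637x"},
    )
    print(ET.tostring(record, encoding="unicode"))
```

The `xml:id` attribute is what a stable public identifier for the record could be derived from.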

Relationships between persons (to be modelled in the near future):

-  academic (master of / student of)
-  genealogical (father of / husband of)
-  institutional (friar of / member of)
-  intellectual (translator of / copyist of / editor of / illuminator of)
-  socio-cultural (dedicatee of / donor of / patron of / sponsor of)

Authority record in XML-TEI edited in XXE framework

HTML page created from XML-TEI file

Identifier based on the XML ID

Prosopographie des inventaires - Centre Michel de Boüard & MRSH de Caen

Other projects using XML-TEI / BaseX solution

Curate and enrich datasets

Operations:
•  identify identical items with different graphical forms
•  align with other linked data repositories:
   –  data.bnf.fr
   –  dbpedia.org
   –  viaf.org
   –  geonames.org
•  extract complementary information
•  dispatch the complementary information to the original datasets

Tools:
•  OpenRefine
•  GoogleXML
•  PHP scripts

OpenRefine: alignment with open source linked data repositories

OpenRefine: clustering based on the similarity of character strings
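String-similarity clustering of this kind can be approximated with a key-collision method: each name is reduced to a normalised key, and names sharing a key form a cluster. The sketch below is written in the spirit of OpenRefine's fingerprint keying but is not its implementation; the sample names are illustrative.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a name to a key: ASCII-fold, lowercase, strip punctuation, sort unique tokens."""
    folded = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s]", "", folded.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep only groups with several distinct forms."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [forms for forms in groups.values() if len(set(forms)) > 1]


if __name__ == "__main__":
    names = ["Abbo Floriacensis", "Floriacensis, Abbo", "abbo  floriacensis", "Titus Livius"]
    for forms in cluster(names):
        print(forms)
```

Key collision only catches variants that differ in case, accents, punctuation or word order; spelling variants need a distance-based method and a manual check.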

OpenRefine: clustering based on data.bnf.fr URIs

OpenRefine: clustering based on VIAF IDs
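Once name forms have been aligned, clustering on the identifier is a simple grouping step. The sketch below assumes rows of (source database, name form, URI); the two sample rows reuse the Titus Livius alignment shown elsewhere in this presentation, and the third row is a made-up unaligned entry.

```python
from collections import defaultdict


def cluster_by_uri(rows):
    """Group (source, name_form, uri) rows by URI; rows without a URI are left out."""
    clusters = defaultdict(list)
    for source, name_form, uri in rows:
        if uri:
            clusters[uri].append((source, name_form))
    return dict(clusters)


if __name__ == "__main__":
    rows = [
        ("Mandragore", "Titus Livius", "http://data.bnf.fr/ark:/12148/cb11886799m"),
        ("Initiale", "Livius", "http://data.bnf.fr/ark:/12148/cb11886799m"),
        ("Initiale", "Unaligned name", None),  # hypothetical row with no alignment yet
    ]
    for uri, forms in cluster_by_uri(rows).items():
        print(uri, forms)
```

Unlike string similarity, this groups forms that share no characters at all, provided the alignment itself is correct.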

SPARQL query to retrieve the personal name heading for an author via data.bnf.fr/sparql

Alignment of the database graphical form Abbo Floriacensis = http://data.bnf.fr/ark:/12148/cb12584637x

SPARQL query to retrieve alternative forms of an author’s name via data.bnf.fr/sparql

Alignment of the database graphical form Abbo Floriacensis = http://data.bnf.fr/ark:/12148/cb12584637x

SPARQL query to retrieve URIs from other linked data repositories

Alignment of the database graphical form Abbo Floriacensis = http://data.bnf.fr/ark:/12148/cb12584637x
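The three queries above were shown as screenshots; a rough reconstruction of their shape is sketched here. It assumes that the heading, the variant forms and the external alignments are exposed as skos:prefLabel, skos:altLabel and skos:exactMatch on the concept URI, and that the endpoint accepts a `format` parameter for JSON results; both points should be checked against the endpoint before use.

```python
from urllib.parse import urlencode

ENDPOINT = "http://data.bnf.fr/sparql"

# Assumed mapping between the three needs and SKOS properties.
PROPERTY_BY_KIND = {
    "heading": "skos:prefLabel",
    "variants": "skos:altLabel",
    "alignments": "skos:exactMatch",
}


def build_query(uri, kind):
    """Build a one-pattern SPARQL query for a given concept URI."""
    prop = PROPERTY_BY_KIND[kind]
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        f"SELECT ?value WHERE {{ <{uri}> {prop} ?value . }}"
    )


def request_url(query):
    """Encode the query as a GET URL (the 'format' parameter is an assumption)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    abbo = "http://data.bnf.fr/ark:/12148/cb12584637x"
    for kind in PROPERTY_BY_KIND:
        print(request_url(build_query(abbo, kind)))
```

The sketch only builds the request URLs; sending them and parsing the results is left out so that it runs without network access.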

SPARQL queries applied to an entire set of data

Retrieve the personal name heading for an author

Retrieve the variant forms for an author’s name

Extract URIs from other linked data repositories

Retrieve other types of information

Biographical Note

Scientific works used for the authority record
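Applying a query to an entire dataset can be done either by looping over the URIs or by sending them in batches. The sketch below shows the batch variant with a SPARQL 1.1 VALUES clause; it assumes the endpoint supports VALUES and that the wanted information sits directly on the concept URI, and the batch size is arbitrary.

```python
SKOS_PREFIX = "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>"


def batch_query(uris, prop="skos:prefLabel"):
    """Build one query returning (concept, value) pairs for a batch of URIs."""
    values = " ".join(f"<{uri}>" for uri in uris)
    return (
        f"{SKOS_PREFIX}\n"
        "SELECT ?concept ?value WHERE {\n"
        f"  VALUES ?concept {{ {values} }}\n"
        f"  ?concept {prop} ?value .\n"
        "}"
    )


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    uris = [
        "http://data.bnf.fr/ark:/12148/cb12584637x",
        "http://data.bnf.fr/ark:/12148/cb11886799m",
    ]
    for batch in chunked(uris, 50):  # 50 is an arbitrary batch size
        print(batch_query(batch, "skos:altLabel"))
```

Batching keeps the number of HTTP requests low, at the cost of having to split the result rows back per concept afterwards.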

GoogleXML: extract information about an author from HTML code

GoogleXML: identify the appropriate tags

Apply a GoogleXML formula to retrieve alternative name forms

GoogleXML: extract geographical coordinates (decimal degrees) from HTML code

GoogleXML: identify the appropriate tags

Apply a GoogleXML formula
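The extraction step amounts to an XPath-style lookup on the page markup. The sketch below reproduces that idea with Python's standard library on a made-up, well-formed snippet; the `span` elements and their `property` attributes are hypothetical and do not describe the markup of any real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: real pages will use different tags and attributes.
SAMPLE_MARKUP = (
    "<div>"
    '<span property="geo:lat">45.75</span>'
    '<span property="geo:long">4.85</span>'
    "</div>"
)


def extract_coordinates(markup):
    """Return (lat, long) in decimal degrees, or None when either value is missing."""
    root = ET.fromstring(markup)
    lat = root.find(".//span[@property='geo:lat']")
    lon = root.find(".//span[@property='geo:long']")
    if lat is None or lon is None:
        return None
    return float(lat.text), float(lon.text)


if __name__ == "__main__":
    print(extract_coordinates(SAMPLE_MARKUP))
```

`xml.etree.ElementTree` only accepts well-formed XML, so real-world HTML would usually need a tolerant HTML parser before a lookup like this can be applied.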

PHP script

•  Input (CSV): list of places aligned with data.bnf URIs

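•  Get lat/long (SPARQL query in the loop):

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?concept ?spatialThing ?long ?lat WHERE {
  ?concept skos:closeMatch <".$uri."> .
  ?concept foaf:focus ?spatialThing .
  ?spatialThing geo:long ?long .
  ?spatialThing geo:lat ?lat .
}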

•  Output (CSV): source data enriched with latitude/longitude coordinates
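The input, loop and output steps of this script can be sketched as follows. This is a Python illustration rather than the original PHP: the column names `place` and `uri` are assumptions, and the network call is passed in as a function so that the sketch runs offline.

```python
import csv
import io

# Same graph pattern as the query above; {uri} is substituted for each CSV row.
QUERY_TEMPLATE = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?concept ?spatialThing ?long ?lat WHERE {
  ?concept skos:closeMatch <{uri}> .
  ?concept foaf:focus ?spatialThing .
  ?spatialThing geo:long ?long .
  ?spatialThing geo:lat ?lat .
}"""


def build_query(uri):
    """Insert one aligned URI into the query template."""
    return QUERY_TEMPLATE.replace("{uri}", uri)


def enrich_csv(source, target, fetch_coordinates):
    """Copy a CSV from source to target, adding lat/long columns.

    fetch_coordinates(uri) must return a (lat, long) pair or None; in a real
    run it would send build_query(uri) to the SPARQL endpoint.
    """
    reader = csv.DictReader(source)
    fieldnames = list(reader.fieldnames) + ["lat", "long"]
    writer = csv.DictWriter(target, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        coords = fetch_coordinates(row["uri"])  # "uri" column name is an assumption
        row["lat"], row["long"] = coords if coords else ("", "")
        writer.writerow(row)


if __name__ == "__main__":
    source = io.StringIO("place,uri\nLyon,http://example.org/lyon\n")
    target = io.StringIO()
    # Stub lookup standing in for the endpoint call.
    enrich_csv(source, target, lambda uri: ("45.75", "4.85"))
    print(target.getvalue())
```

Rows for which no coordinates are found keep empty lat/long cells, which makes them easy to filter for manual checking.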

Using technical solutions to build a prototype based on Initiale & Mandragore data

demos.biblissima-condorcet.fr/prototype/

One of the prototype’s main objectives: to build interoperability between two datasets from the iconographical databases Initiale (IRHT) and Mandragore (BnF)

Aligning different forms of a name

•  Titus Livius / Database Mandragore (BnF) http://mandragore.bnf.fr

http://data.bnf.fr/ark:/12148/cb11886799m

Aligning different forms of a name

•  Livius / Database Initiale (IRHT) http://initiale.irht.cnrs.fr

http://data.bnf.fr/ark:/12148/cb11886799m

Find relevant results for data from two different datasets in the same interface

Initiale

Mandragore

Find relevant results in a web search engine by searching the URI

BnF  

Biblissima  

= Titus Livius

New visualisation tools to enhance research

Introducing Mirador:
•  IIIF-compatible web viewer (Shared Canvas / OA)
•  Zoom, compare, annotate, share
•  multi-window workspace, cross-repository (interoperability)

iiif.io projectmirador.org
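Zooming and cross-repository comparison rest on the IIIF Image API, in which every image region is addressed by a URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The sketch below builds such URLs following the 2.x syntax; the server address and identifier are placeholders, and version 3 of the API uses `max` where 2.x uses `full` for the size.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API (2.x-style) URL for a whole image or a region of it."""
    encoded_id = quote(identifier, safe="")  # slashes in identifiers must be escaped
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Placeholder server and identifier (assumptions).
    base = "https://example.org/iiif"
    print(iiif_image_url(base, "ms-5/f12"))
    # A 300x400 pixel detail starting at (100, 200), scaled to 600 pixels wide.
    print(iiif_image_url(base, "ms-5/f12", region="100,200,300,400", size="600,"))
```

Because every compliant server answers the same URL pattern, a viewer can request details from images held by different institutions without knowing anything else about their systems.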

Autograph handwriting and personal identity

Mirador and its potential uses:

•  Trace, identify and index an author’s personal annotations. Ex.: Marginalia by Florus of Lyon on St Petersburg, National Library of Russia, Lat.F.papyr. I.1, b (annotated in Mirador)
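An annotation of this kind can be stored as an Open Annotation attached to a rectangular region of a IIIF canvas. The sketch below follows the IIIF Presentation 2.x pattern for a plain-text comment; the canvas URI, the coordinates and the comment text are hypothetical, and it does not claim to reproduce what Mirador stores internally.

```python
import json


def make_annotation(canvas_uri, x, y, w, h, text):
    """Build an Open Annotation commenting on a rectangular region of a canvas."""
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@type": "oa:Annotation",
        "motivation": "oa:commenting",
        "resource": {
            "@type": "dctypes:Text",
            "format": "text/plain",
            "chars": text,
        },
        # The region is given as a media-fragment selector on the canvas URI.
        "on": f"{canvas_uri}#xywh={x},{y},{w},{h}",
    }


if __name__ == "__main__":
    annotation = make_annotation(
        "https://example.org/iiif/manuscript/canvas/f1",  # placeholder canvas URI
        120, 340, 600, 180,
        "Marginal note (hypothetical example text)",
    )
    print(json.dumps(annotation, indent=2))
```

Indexing an author's marginalia then comes down to collecting such annotations per canvas and searching their text.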

Autograph handwriting and personal identity

Mirador and its potential uses:
•  Create a database of autographs in order to better identify scribes and writers

Note about an autograph letter by Jean Hervin in the manuscript Paris, BnF Français 17708, f. 210r

Autograph handwriting and personal identity

BnF Français 17708, f. 210r, annotated in Mirador viewer: http://demos.biblissima-condorcet.fr/mirador/?json=56241541e4b01190df3c263d

Stylistic features and personal identity

Mirador and its potential uses: Compare stylistic features to better identify artists (e.g. Willem Vrelant in Initiale)  

Paris, Bibliothèque Sainte-Geneviève, ms. 0811, f. 005
Reddition de Valenciennes à Herman, comte de Mons (Surrender of Valenciennes to Herman, Count of Mons)
Attribution: Willem Vrelant (entourage)

Paris, Bibliothèque Sainte-Geneviève, ms. 0809, f. 317
Siège de Mayence par les Romains (Siege of Mainz by the Romans)
Attribution: Willem Vrelant (entourage)

Restoring the relationship between an artistic work and its textual context

The interpretation of a miniature depends on its original textual context. However, many manuscripts are damaged:

•  around 280 notes about cut miniatures in the Initiale database
•  very few of these cut miniatures have been located

Example: Châteauroux BM, ms. 5, Grandes Chroniques de France
http://demos.biblissima-condorcet.fr/chateauroux/

What’s next?

•  Integrate all partner databases using a common XML format to simplify the ingestion of data into the CubicWeb application

•  Integrate the Mirador viewer within the web portal (with new functionalities)

•  Enhance the search engine and navigation within the portal

•  Propose new visual representations of data