Knowledge engineering techniques for the creation of a semantic digital edition of Saussure's manuscripts Gilles Falquet, Luka Nerima, Massimo Brero

Fribourg Workshop – 27.02.2014

Université de Genève - CUI

1.  Storage, visualization, annotation, transcriptions of manuscripts from Ferdinand de Saussure

2.  Digital scholarly publishing of manuscripts

A system for the visualization, annotation, and transcription of manuscripts from Ferdinand de Saussure

Swiss linguist (1857 – 1913)

Famous for
•  modern linguistics
•  structuralism
•  Cours de linguistique générale

Very few publications in his lifetime, but 15'000 sheets of paper given to libraries (Harvard, Paris, Geneva)

Aims of the project

A usable tool for researchers, supporting

1.  Visualization
2.  Annotation
3.  Transcription

of manuscripts from F. de Saussure

Typical (human) task: Reconstructing the reading order

Main concepts

[Diagram: concept model relating Writing surface, Covered surface, Pictures, Zone, Annotation, Transcription, and Transcription element]

Data/Knowledge Model

Represent
•  basic metadata about manuscripts
   •  location, date, image file, ...
•  (scientific) transcriptions
•  annotations
•  semantic annotations

Available on the semantic web
•  expressed in RDF/S
•  stored in an RDF triple store
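
As a rough illustration of what "expressed in RDF/S" could mean for the basic metadata listed above, the sketch below serialises one manuscript page as N-Triples. The namespace http://example.org/saussure# and the names WritingSurface, location, date and imageFile are placeholders chosen for this example; the slides do not give the project's actual vocabulary.

```python
# Minimal sketch: serialise manuscript metadata as RDF triples (N-Triples syntax).
# The namespace and every class/property name below are hypothetical placeholders,
# not the vocabulary actually used by the project.

NS = "http://example.org/saussure#"  # assumed namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(local_name: str) -> str:
    """Wrap a local name in the assumed namespace as an N-Triples IRI."""
    return f"<{NS}{local_name}>"


def literal(value: str) -> str:
    """Encode a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def manuscript_triples(page_id: str, location: str, date: str, image_file: str) -> list[str]:
    """Return one triple per metadata field, plus a type statement."""
    subject = iri(page_id)
    return [
        f"{subject} <{RDF_TYPE}> {iri('WritingSurface')} .",
        f"{subject} {iri('location')} {literal(location)} .",
        f"{subject} {iri('date')} {literal(date)} .",
        f"{subject} {iri('imageFile')} {literal(image_file)} .",
    ]


if __name__ == "__main__":
    # "undated" is a stand-in value, not a fact about this manuscript.
    for triple in manuscript_triples(
        "ms_fr_03951_10_f028v_029", "BGE", "undated", "ms_fr_03951_10_f028v_029.tif"
    ):
        print(triple)
```

Lines in this form can be loaded into most triple stores, which is one reason the example uses N-Triples rather than a more compact syntax.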

From classification numbers to URIs

Semantic web => universal identification (URI)
•  library classification number → URI

Example (BGE)
•  Shelfmark: Ms. fr. 3951/10, f. 28
•  File name: ms_fr_03951_10_f028v_029.tif

URI:
•  x:ms_fr
•  x:ms_fr_03951
•  x:ms_fr_03951_10
•  x:ms_fr_03951_10_f028v_029
•  x:ms_fr_03951_10_f028v_029-DOT-jp2
•  x:ms_fr_03951_10_f028v_029_Z_001
•  x:ms_fr_03951_10_f028v_029_Z_001_annot_001
•  x:ms_fr_03951_10_f028v_029_Z_001_Shape_001
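
The URI list above reads as a hierarchy in which each level extends the previous one. The sketch below shows one way such URIs could be derived from an image file name. The regular expression is an assumption based on the single example shown, and the functions hierarchy_uris, zone_uri and annotation_uri are hypothetical helpers, not part of the project's code.

```python
import re

PREFIX = "x:"  # prefix as shown on the slide; the namespace it expands to is not given

# Assumed file-name pattern, inferred from the one example on the slide:
# <collection>_<5-digit manuscript>_<part>_<folio...>.<extension>
FILENAME_RE = re.compile(r"^(ms_fr)_(\d{5})_(\d+)_(f\w+)\.(\w+)$")


def hierarchy_uris(filename: str) -> list[str]:
    """Return URIs from the collection down to the page, one per level."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    collection, manuscript, part, folio, _ext = match.groups()
    levels = [collection, manuscript, part, folio]
    # Each URI is the cumulative join of the levels seen so far.
    return [PREFIX + "_".join(levels[: i + 1]) for i in range(len(levels))]


def zone_uri(page_uri: str, zone_number: int) -> str:
    """URI of a zone on a page, numbered with three digits."""
    return f"{page_uri}_Z_{zone_number:03d}"


def annotation_uri(zone: str, annotation_number: int) -> str:
    """URI of an annotation attached to a zone."""
    return f"{zone}_annot_{annotation_number:03d}"


if __name__ == "__main__":
    uris = hierarchy_uris("ms_fr_03951_10_f028v_029.tif")
    zone = zone_uri(uris[-1], 1)
    print(*uris, zone, annotation_uri(zone, 1), sep="\n")
```

Because every URI is a prefix of its descendants, the parent of any resource can be recovered by string manipulation alone, without querying the triple store.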

Data Model

System / User Interface

Manuscript visualization

IIP Image server

•  Tiles
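
Tiled delivery means the viewer fetches only the small image squares it needs for the current zoom level and viewport. The sketch below shows how a client might address one tile through the IIP protocol, where FIF names the image and JTL=resolution,index requests a JPEG tile. The server address, image path and the 256-pixel tile size are assumptions for illustration; the slides do not state the actual configuration.

```python
import math
from urllib.parse import urlencode

TILE_SIZE = 256  # assumed tile edge in pixels; the actual server setting is not stated


def tile_index(col: int, row: int, level_width: int, tile_size: int = TILE_SIZE) -> int:
    """Index of the tile at (col, row), counting left to right, top to bottom."""
    tiles_per_row = math.ceil(level_width / tile_size)
    if not 0 <= col < tiles_per_row:
        raise ValueError("column outside the image")
    return row * tiles_per_row + col


def tile_url(server: str, image_path: str, resolution: int, index: int) -> str:
    """Build an IIP request: FIF names the image, JTL=resolution,index asks for a JPEG tile."""
    query = urlencode({"FIF": image_path, "JTL": f"{resolution},{index}"}, safe="/,")
    return f"{server}?{query}"


if __name__ == "__main__":
    # Hypothetical server and path, used only to show the shape of the request.
    index = tile_index(col=1, row=2, level_width=1000)
    print(tile_url("http://example.org/iipsrv.fcgi", "/images/page.jp2", 3, index))
```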

Creating Annotations (texts or concepts)

Navigation in the corpus

Full text search

System Architecture

Web Server/ Front end (REST)

Back end Storage control (updates, authentication)

Image import

Example: Inserting a new annotation

•  Insert request sent to the RDF server
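
An insert request of this kind could take the form of a SPARQL 1.1 Update statement. The sketch below only builds the request text and does not send it. The prefix IRI and the names x:Annotation, x:annotates and x:text are hypothetical, as the slides do not show the actual vocabulary or endpoint.

```python
# Hypothetical vocabulary: the prefix IRI and the property names are placeholders.
PREFIXES = "PREFIX x: <http://example.org/saussure#>\n"


def escape_literal(text: str) -> str:
    """Escape characters that would break a double-quoted SPARQL string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def insert_annotation_request(zone: str, number: int, text: str) -> str:
    """Build a SPARQL INSERT DATA request that attaches a text annotation to a zone."""
    annotation = f"{zone}_annot_{number:03d}"
    return (
        PREFIXES
        + "INSERT DATA {\n"
        + f"  x:{annotation} a x:Annotation ;\n"
        + f"    x:annotates x:{zone} ;\n"
        + f'    x:text "{escape_literal(text)}" .\n'
        + "}"
    )


if __name__ == "__main__":
    # The annotation text is an invented sample.
    print(insert_annotation_request("ms_fr_03951_10_f028v_029_Z_001", 1, "marginal note"))
```

Escaping the annotation text before embedding it matters here, since the text comes from users and is placed inside a quoted literal.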

Usability Testing

Methodology
•  14 users (linguists, librarians, ...)
•  13 tasks (4 scenarios)
   •  find a manuscript, create an annotation, ...
•  Measurements:
   •  number of completed tasks
   •  time to complete each task
   •  user satisfaction
      •  System Usability Scale (SUS) questionnaire
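
For reference, a SUS questionnaire has ten items answered on a 1 to 5 scale, and the standard scoring turns them into a single value between 0 and 100. The sketch below implements that standard rule. The sample answers are invented for illustration and are not data from this study.

```python
def sus_score(responses: list[int]) -> float:
    """Score one SUS questionnaire: ten answers on a 1-5 scale, in question order."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten answers between 1 and 5")
    total = 0
    for number, answer in enumerate(responses, start=1):
        # Odd questions are positively worded, even questions negatively worded.
        total += (answer - 1) if number % 2 == 1 else (5 - answer)
    # The summed contributions (0-40) are rescaled to 0-100.
    return total * 2.5


if __name__ == "__main__":
    invented_answers = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]  # made-up example
    print(sus_score(invented_answers))
```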

Results

[Charts: Task completion by task; Task completion by user. Labelled values: 100%, 85%, 50%]

Satisfaction evaluation

[Charts: SUS scores (by question); SUS scores (by user). Labelled value: 68]

Demo Site

fds.unige.ch/iipmooviewer/homepage.php

Digital scholarly publishing of manuscripts

a knowledge representation and management model ... and a system for the digital edition of large corpora of original works

Context and goals

Digital Critical Edition – current state
•  based on paper critical edition
   •  DCE of Nietzsche, Peirce, Wittgenstein
•  other obstacles:
   •  no scientific catalogue

Digital edition of Saussure's manuscripts project
•  to provide a cooperative edition platform for the next 20 years
•  to use computers as convergence and mediation tools
•  the scientific catalogue and the critical edition will be the outputs

Digital editions as knowledge networks

[Diagram, built up over three slides: Manuscripts, Transcriptions, terminologies, Articles/Monographs, and ontologies, linked by Semantic indexes, Alignment, and Inferred relations]

Knowledge modeling challenge

To represent the current state of our knowledge about the manuscripts

different types of resources
•  direct transcriptions
•  scholarly transcriptions
•  related terminologies, ontologies, dictionaries
•  annotations
•  ...

and resource interconnections
•  semantic indexes
•  text alignments / ontology alignments

Operations

[Diagram: resources (Manuscripts, Transcriptions, multiword lexical units, Articles/Monographs, ontologies) connected by operations (Handwriting recognition, MLU extraction, Semantic indexing, Ontology Alignment)]

alignment operations: finding correspondences between elements of different resources
•  aligning ontologies
•  aligning texts at the sentence or term level

enrichment operations: creating new resources that describe an existing one
•  add transcriptions to manuscript pictures
•  extract collocations from texts
•  create a semantic index

•  Specific to each type of resource
•  Based on OCR, NLP, AI algorithms

Challenge: define a minimal and expressive set of operations
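
To make the idea of an enrichment operation concrete, the sketch below derives a new resource (a list of candidate collocations) from an existing one (a transcription text). It uses plain bigram counting as a stand-in technique. The slides do not say which NLP method the project uses, and the function name and sample sentence are invented for this example.

```python
import re
from collections import Counter


def extract_collocations(text: str, min_count: int = 2) -> list[tuple[str, int]]:
    """Return adjacent word pairs that occur at least min_count times."""
    words = re.findall(r"\w+", text.lower())
    # Count every pair of neighbouring words.
    pairs = Counter(zip(words, words[1:]))
    return [
        (" ".join(pair), count)
        for pair, count in pairs.most_common()
        if count >= min_count
    ]


if __name__ == "__main__":
    # Invented sample sentence, not a quotation from the manuscripts.
    sample = "the linguistic sign is arbitrary; the linguistic sign is linear"
    for phrase, count in extract_collocations(sample):
        print(phrase, count)
```

Seen this way, each operation takes one or more resources as input and returns a new resource, which is one possible reading of the "minimal and expressive set" the slide asks for.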

System/Workbench for linguists/knowledge engineers

•  Transcription acquisition
   •  crowdsourcing
•  Indexing
   •  word spotting, handwriting recognition?
•  Knowledge network operations
   •  NLP techniques for multiword lexical unit extraction
   •  terminology extraction
   •  semantic indexing
   •  resource alignment (existing ontologies, terminologies, ...)
   •  define operation workflows
   •  define virtual (hyper) document generation

Thank you

Questions?