Knowledge engineering techniques for the creation of a semantic digital edition of
Saussure's manuscripts
Gilles Falquet, Luka Nerima, Massimo Brero
Fribourg Workshop – 27.02.2014
Université de Genève - CUI
1. Storage, visualization, annotation, and transcription of manuscripts from Ferdinand de Saussure
2. Digital scholarly publishing of manuscripts
A system for the visualization, annotation, and transcription of manuscripts from Ferdinand de Saussure
Swiss linguist (1857 – 1913)
Famous for
• modern linguistics
• structuralism
• Cours de linguistique générale
Very few publications in his lifetime, but 15,000 sheets of paper given to libraries (Harvard, Paris, Geneva)
Aims of the project
A usable tool for researchers
1. Visualization
2. Annotation
3. Transcription
of manuscripts from F. de Saussure
Typical (human) task: reconstructing the reading order
Main concepts
[Diagram: the main concepts are writing surface, pictures, covered surface, zone, annotation, transcription, and transcription element]
Data/Knowledge Model
Represent
• basic metadata about manuscripts
  • location, date, image file, ...
• (scientific) transcriptions
• annotations
• semantic annotations
Available on the semantic web
• expressed in RDF/S
• stored in an RDF triple store
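As an illustration of what "expressed in RDF/S" could look like, here is a minimal sketch that serializes basic metadata and one annotation as N-Triples lines. The namespace `http://example.org/saussure/` and the property names (`location`, `imageFile`, `annotates`, `text`) are assumptions made for this example; the slides do not give the project's actual vocabulary.

```python
# Hypothetical namespaces: the real vocabulary is not given in the slides.
BASE = "http://example.org/saussure/"
VOCAB = "http://example.org/saussure/vocab#"


def triple(subject, predicate, obj, literal=False):
    """Serialize one statement as an N-Triples line."""
    if literal:
        # Escape backslashes, quotes and newlines inside the literal.
        escaped = (
            obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{BASE}{obj}>"
    return f"<{BASE}{subject}> <{VOCAB}{predicate}> {obj_term} ."


def describe_page(page_id, location, image_file):
    """Basic metadata about one manuscript page (assumed properties)."""
    return [
        triple(page_id, "location", location, literal=True),
        triple(page_id, "imageFile", image_file, literal=True),
    ]


def describe_annotation(annotation_id, zone_id, text):
    """An annotation attached to a zone (assumed properties)."""
    return [
        triple(annotation_id, "annotates", zone_id),
        triple(annotation_id, "text", text, literal=True),
    ]


if __name__ == "__main__":
    page = "ms_fr_03951_10_f028v_029"
    for line in describe_page(page, "BGE", page + ".tif"):
        print(line)
```

The resulting lines could be loaded into any triple store that accepts N-Triples; which store the project uses is not stated here.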
From classification numbers to URIs
Semantic web => universal identification (URI)
• library classification number → URI
Example (BGE)
• Call number: Ms. fr. 3951/10, f. 28
• File name: ms_fr_03951_10_f028v_029.tif
URI:
• x:ms_fr
• x:ms_fr_03951
• x:ms_fr_03951_10
• x:ms_fr_03951_10_f028v_029
• x:ms_fr_03951_10_f028v_029-DOT-jp2
• x:ms_fr_03951_10_f028v_029_Z_001
• x:ms_fr_03951_10_f028v_029_Z_001_annot_001
• x:ms_fr_03951_10_f028v_029_Z_001_Shape_001
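The naming scheme in the example can be sketched as a handful of string functions. This is an assumption-laden reading of the single example on the slide: the function names are hypothetical, the split into collection, manuscript, and sub-unit levels is inferred from that one file name, and the namespace behind the `x:` prefix is not given.

```python
PREFIX = "x:"  # prefix used on the slide; its namespace IRI is not given


def page_uri(filename):
    """Derive the page URI from an image file name by dropping the extension."""
    stem = filename.rsplit(".", 1)[0]
    return PREFIX + stem


def ancestors(page):
    """URIs of the levels above a page.

    Assumes the first two tokens name the collection and that each of the
    next two tokens adds one level, as in the slide's single example.
    """
    tokens = page[len(PREFIX):].split("_")
    return [PREFIX + "_".join(tokens[:n]) for n in (2, 3, 4)]


def image_uri(page, extension="jp2"):
    """URI of an image file of the page; the dot is spelled out as -DOT-."""
    return f"{page}-DOT-{extension}"


def zone_uri(page, number):
    """URI of the n-th zone on a page."""
    return f"{page}_Z_{number:03d}"


def annotation_uri(zone, number):
    """URI of the n-th annotation of a zone."""
    return f"{zone}_annot_{number:03d}"


def shape_uri(zone, number):
    """URI of the n-th shape of a zone."""
    return f"{zone}_Shape_{number:03d}"


if __name__ == "__main__":
    page = page_uri("ms_fr_03951_10_f028v_029.tif")
    zone = zone_uri(page, 1)
    for uri in ancestors(page) + [page, image_uri(page), zone,
                                  annotation_uri(zone, 1), shape_uri(zone, 1)]:
        print(uri)
```

Because every identifier is a prefix of the identifiers below it, the containing page of a zone or annotation can be recovered from the URI string alone.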
Data Model
System / User Interface
Manuscript visualization
IIP Image server
• Tiles
Creating Annotations (texts or concepts)
Navigation in the corpus
Full text search
System Architecture
Web Server/ Front end (REST)
Back end Storage control (updates, authentication)
Image import
Example: Inserting a new annotation
• Insert request sent to the RDF server
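One plausible shape for such an insert request is a SPARQL 1.1 Update `INSERT DATA` operation. The sketch below only builds the request text; the prefix IRI and the properties `x:annotates` and `x:text` are assumptions for illustration, and the project's actual endpoint and vocabulary are not given in the slides.

```python
def build_insert(annotation_id, zone_id, text,
                 prefix_iri="http://example.org/saussure/"):
    """Build a SPARQL 1.1 Update request that stores one text annotation.

    The prefix IRI and the property names are hypothetical placeholders.
    """
    # Escape characters that would break a double-quoted SPARQL literal.
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return (
        f"PREFIX x: <{prefix_iri}>\n"
        "INSERT DATA {\n"
        f"  x:{annotation_id} x:annotates x:{zone_id} ;\n"
        f'      x:text "{escaped}" .\n'
        "}"
    )


if __name__ == "__main__":
    print(build_insert(
        "ms_fr_03951_10_f028v_029_Z_001_annot_001",
        "ms_fr_03951_10_f028v_029_Z_001",
        "example annotation text",
    ))
```

Under the SPARQL 1.1 Protocol, a string like this would typically be sent in an HTTP POST to the store's update endpoint, either as the `update` form parameter or with the `application/sparql-update` content type.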
Usability Testing
Methodology
• 14 users (linguists, librarians, ...)
• 13 tasks (4 scenarios)
  • find a manuscript, create an annotation, ...
• Measurements:
  • number of completed tasks
  • time to complete each task
  • user satisfaction
    • System Usability Scale (SUS) questionnaire
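For reference, the standard SUS scoring rule can be written in a few lines. The questionnaire has 10 items answered on a 1 to 5 scale; odd-numbered items contribute (answer - 1), even-numbered items contribute (5 - answer), and the sum is multiplied by 2.5 to give a score between 0 and 100. The function names below are hypothetical, and no data from the study is reproduced here.

```python
def sus_score(responses):
    """Score one respondent's 10 SUS answers (each an integer from 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 answers")
    if any(answer not in (1, 2, 3, 4, 5) for answer in responses):
        raise ValueError("each answer must be an integer from 1 to 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        if item % 2 == 1:
            total += answer - 1   # odd items are positively worded
        else:
            total += 5 - answer   # even items are negatively worded
    return total * 2.5


def mean_sus(all_responses):
    """Average SUS score over several respondents."""
    scores = [sus_score(responses) for responses in all_responses]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up answers for two imaginary respondents, not study data.
    print(mean_sus([[4, 2, 4, 2, 4, 2, 4, 2, 4, 2], [3] * 10]))
```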
Results
[Charts: task completion by task; task completion by user. Percentages shown on the slide: 100%, 85%, 50%]
Satisfaction evaluation
[Charts: SUS scores by question; SUS scores by user. Value shown on the slide: 68]
Demo Site
fds.unige.ch/iipmooviewer/homepage.php
Digital scholarly publishing of manuscripts
a knowledge representation and management model ... and a system for the digital edition of large corpora of original works
Context and goals
Digital Critical Edition – current state
• based on paper critical edition
  • DCE of Nietzsche, Peirce, Wittgenstein
• other obstacles:
  • no scientific catalogue
Digital edition of Saussure’s manuscripts project
• to provide a cooperative edition platform for the next 20 years
• to use computers as convergence and mediation tools
• the scientific catalogue and the critical edition will be the outputs
Digital editions as knowledge networks
[Diagram: a network of resources: manuscripts, transcriptions, terminologies, articles/monographs, ontologies]
[Diagram, second step: the same resources, now with semantic indexes and alignment between them]
[Diagram, third step: the same network with semantic indexes and alignment, plus inferred relations]
Knowledge modeling challenge
To represent the current state of our knowledge about the manuscripts
different types of resources
• direct transcriptions
• scholarly transcriptions
• related terminologies, ontologies, dictionaries
• annotations
• ...
and resource interconnections
• semantic indexes
• text alignments / ontology alignments
Operations
[Diagram: resources (manuscripts, transcriptions, multiword lexical units, articles/monographs, ontologies) and the operations shown with them (MLU extraction, ontology alignment, handwriting recognition, semantic indexing)]
Operations
Alignment operations: finding correspondences between elements of different resources
• aligning ontologies
• aligning texts at the sentence or term level
Enrichment operations: creating new resources that describe an existing one
• adding transcriptions to manuscript pictures
• extracting collocations from texts
• creating a semantic index
Both kinds of operations are
• specific to each type of resource
• based on OCR, NLP, AI algorithms
Challenge: define a minimal and expressive set of operations
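To make "create a semantic index" concrete, here is a deliberately naive sketch of that enrichment operation: it links each ontology concept to the transcriptions in which one of its labels occurs as a whole word. The function name, the concept identifiers, and the sample texts are all invented for illustration (the texts are not quotations from the manuscripts), and a real system would presumably rely on the NLP techniques mentioned above rather than plain string matching.

```python
import re
from collections import defaultdict


def semantic_index(transcriptions, concept_labels):
    """Map each concept id to the sorted ids of transcriptions that mention it.

    transcriptions: dict of transcription id -> text
    concept_labels: dict of concept id -> list of labels for that concept
    """
    index = defaultdict(set)
    for concept, labels in concept_labels.items():
        # Whole-word, case-insensitive match for each label.
        patterns = [
            re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
            for label in labels
        ]
        for doc_id, text in transcriptions.items():
            if any(pattern.search(text) for pattern in patterns):
                index[concept].add(doc_id)
    return {concept: sorted(ids) for concept, ids in index.items()}


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    transcriptions = {
        "t1": "Notes on the sign and on the system of signs",
        "t2": "A remark about sound change",
    }
    concepts = {"c:sign": ["sign", "signs"], "c:sound": ["sound"]}
    print(semantic_index(transcriptions, concepts))
```

The output is itself a new resource describing existing ones, which is what distinguishes an enrichment operation from an alignment between two existing resources.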
System/Workbench for linguists/knowledge engineers
• Transcription acquisition
  • crowdsourcing
• Indexing
  • word spotting, handwriting recognition?
• Knowledge network operations
  • NLP techniques for multiword lexical unit extraction
  • terminology extraction
  • semantic indexing
  • resource alignment (existing ontologies, terminologies, ...)
  • define operation workflows
  • define virtual (hyper) document generation
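As a rough idea of what multiword lexical unit extraction involves, the sketch below proposes candidate two-word units by counting adjacent word pairs across transcriptions. It is a toy frequency baseline, not the project's method: the function name, the stopword list, and the sample sentences are assumptions, and practical extractors usually add part-of-speech patterns and statistical association measures.

```python
import re
from collections import Counter


def candidate_bigrams(texts, min_count=2, stopwords=frozenset()):
    """Return (bigram, count) pairs seen at least min_count times.

    Pairs containing a stopword are skipped. Results are ordered from the
    most to the least frequent.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        for first, second in zip(tokens, tokens[1:]):
            if first in stopwords or second in stopwords:
                continue
            counts[(first, second)] += 1
    return [
        (" ".join(pair), count)
        for pair, count in counts.most_common()
        if count >= min_count
    ]


if __name__ == "__main__":
    # Invented sample sentences, for illustration only.
    sample = [
        "the linguistic sign is arbitrary",
        "a linguistic sign has two sides",
    ]
    print(candidate_bigrams(sample, stopwords={"the", "is", "a", "has"}))
```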
Thank you
Questions?