Session II: Chair Luc Van Eycken
Encoding Cultural Heritage Information for the Semantic Web
Procedures for Data Integration through CIDOC CRM Mapping
Ø. Eide*, A. Felicetti**, C. E. Ore*, A. D’Andrea*** and J. Holmen*
* Unit for Digital Documentation, University of Oslo - Norway
** PIN Scarl VASTLAB, University of Florence - Italy
*** CISA, University of Naples “L’Orientale” - Italy
EPOCH Conference on OPEN DIGITAL CULTURAL HERITAGE SYSTEMS - Rome 25th feb 2008
The BABEL tower….
In CH circles the BABEL tower is a metaphor for the lack of communication, or incommunicability, caused by too many languages and different cultures
Languages, Terminologies, Thesauri, Standards, Knowledges…
The BABEL tower….
As regards archaeological documentation (forms, reports, etc.) we have different streams:
• National Rules
• Archaeological Backgrounds
• Best Practices
• Standards
• Local Experiences
• Legal Obligations
The BABEL tower….
A paradoxical situation could be:
"Io sono un preistorico e scavo seguendo le superfici…" (I'm a prehistorian and I dig following the surfaces…)
"I'm a classical archaeologist. I dig following inscriptions and texts…."
"Je ne fais pas de fouilles archéologiques... mais je rassemble la documentation suivant le Standard ICOMOS…." (I don't carry out archaeological excavations... but I collect documentation following the ICOMOS standard….)
Here is what can happen in the same archaeological area…..
The BABEL tower….
Could technology help us overcome these issues? SURE!!!!
The BABEL tower….
Languages, Terminologies, Thesauri, Standards: Human approach
+ National Rules, Legal obligations, Standards, Guidelines: Legal approach
+ Program Languages, Formats, Software, Platforms: ITC
Online resources with limited access, not always useful or usable because of the lack of interoperability
A new challenge
A new challenge: the reconciliation of these disparate sources, stored separately from each other (we also wanted to try to reconcile different disciplinary groups, separated by cultures as much as by content: methods, theories…)
A SEMANTIC APPROACH… The integration of knowledge… through ontologies
The use of CIDOC-CRM as a sort of interlingua that guarantees the integration of data and schemas among cultural heritage information sources, archives and repositories
What is the CIDOC CRM?
• It is not a metadata standard
• It is a Conceptual Reference Model for the analysis and design of cultural information systems
• It does not define the terminology used to document these data structures
• It is intended to explain the logic of what they “do” in documentation
WP 3.3 Common Infrastructure
As part of EPOCH, a complete framework for the mapping and management of cultural heritage information in a semantic web context has been developed by
• PIN (University of Florence, Italy)
• CISA (University of Naples “L’Orientale”, Italy)
• EDD (University of Oslo, Norway)
AMA (Archive Mapper for Archaeology) is one of the NEWTON projects (NEW TOols Needed)
Background
BUT!!!
mapping requires skills and knowledge which are uncommon among cultural heritage professionals
AMA Project
The aim of the AMA project was to develop tools for semi-automated mapping of cultural heritage data to CIDOC-CRM (ISO 21127:2006).
The reason for this investment was that such tools can enhance interoperability among the different archives and datasets produced in the field of Cultural Heritage.
We created a tool able to extract and encode legacy information coming from diverse sources, to store and manage this information using a semantic-enabled container, and to make it available for query and reuse.
AMA project: Two key-words
The tool relied upon two concepts:
Mapping = the rules that allow different schemas to be written and reconciled
Template = the instantiation of the abstract mapping between the source data structure and the mapping target (CIDOC-CRM compliant). Templates capture the semantic structure of the sources and their transformations, ensuring the modularity and future maintenance of the system.
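The difference between the two concepts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AMA's actual data model: the source field names (`compiler`, `year`), the sample record and the `MappingRule` type are hypothetical, and the CIDOC-CRM class labels are spelled as they appear in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """One abstract rule: a source field corresponds to a target class."""
    source_field: str
    target_class: str


# A template instantiates the abstract mapping for one concrete source
# schema (here a hypothetical excavation form).
FORM_TEMPLATE = [
    MappingRule("compiler", "E21_person"),
    MappingRule("year", "E50_date"),
]


def apply_template(template, record):
    """Return (target_class, value) pairs for the fields the template covers."""
    return [
        (rule.target_class, record[rule.source_field])
        for rule in template
        if rule.source_field in record
    ]


# Hypothetical source record; "notes" has no rule and is left unmapped.
record = {"compiler": "A. Rossi", "year": "1998", "notes": "unmapped field"}
encoded = apply_template(FORM_TEMPLATE, record)
print(encoded)
```

Because the template is plain data, it can be stored, reviewed and reused for every record that follows the same source schema, which is where the modularity mentioned above would come from.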
A schema for a mapping-tool
[Diagram: a Template connects the schema to be mapped with CIDOC-CRM classes and properties. Labels shown: US, Site, Compiler, YearForm, CompilationEvent; E31_InformationCarrier, E65_Event, E39_actor, E21_person, E50_date; P94_was created by, P14_carried out by, P70 is documented in, is a.]
AMATool
The tool set developed in the AMA project includes:
• A powerful mapping application for the creation of mappings from existing datasets
• A tool for mapping cultural heritage information contained in free text into a CIDOC-CRM compliant data model
• Templates describing relations between the structure of existing archives and CIDOC-CRM
• A semantic framework to store, manage and browse the encoded information, providing user-friendly interfaces
AMATool: partners
CISA (Italy)
PIN (Italy)
UNIREL (Italy)
CIMEC (Romania)
IAA (Israel)
Oxford ArchDigital (UK)
University of Kent (UK)
Paveprime Ltd (UK)
University of Oslo (Norway)
ROB (Netherlands)
VARTEC (Belgium)
Tools developed
AMA
• Current release
• New online application
• Future development
• Software release after EPOCH
MAD
• Review release
• Development after review
• Future development
• Software release after EPOCH
• Complementary tools
AMA: Archive Mapper for Archaeology
Creation of an Open Source tool for mapping existing archaeological datasets to CIDOC-CRM compliant structures
Enhance data interchange and interoperability between existing and future repositories
http://www.epoch-net.org/AMA/
AMA: the new online application
A new and powerful online version of the application is under development (PHP)
Beta version available at http://ama.ilbello.com
AMA: new version features
Possibility to upload both the starting schema and the target ontology for any kind of mapping (not only towards CIDOC-CRM)
Ontology-to-ontology mapping capabilities
Advanced mapping features for complex mapping definitions (i.e. creation of new entities and properties to represent implicit elements and relations)
Creation of shortcuts to simplify the mapping process
Generation of mapping templates to be applied to information stored in databases in order to obtain fully converted semantic archives
Graphic visualization of simple and complex relations obtained during the mapping process.
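One way to picture "new entities for implicit elements" together with "shortcuts" is the sketch below. It assumes, purely for illustration, that a source form stores its compiler as a direct field, while the target model reaches the compiler through an intermediate event that the source never names. The function `expand_shortcut`, the node identifiers and the path are hypothetical; the class and property labels are copied from earlier slides, and nothing here is taken from AMA's implementation.

```python
from itertools import count


def expand_shortcut(subject, obj, path, new_id):
    """Expand a direct source relation into a chain of triples.

    `path` alternates properties and intermediate classes:
    [property, class, property, ..., property]. Each intermediate class
    becomes a newly created (implicit) node.
    """
    triples = []
    current = subject
    # Walk over (property, intermediate class) pairs; the final property
    # links the last implicit node to the explicit object.
    for i in range(0, len(path) - 1, 2):
        prop, cls = path[i], path[i + 1]
        node = f"{cls}/{new_id()}"
        triples.append((current, prop, node))
        current = node
    triples.append((current, path[-1], obj))
    return triples


_counter = count(1)


def new_id():
    """Hand out sequential identifiers for implicit nodes."""
    return next(_counter)


# Hypothetical shortcut: "form -> compiler" stands for
# form --P94--> (implicit event) --P14--> compiler.
COMPILER_SHORTCUT = ["P94_was created by", "E65_Event", "P14_carried out by"]

triples = expand_shortcut(
    "E31_InformationCarrier/form-12",
    "E39_actor/compiler-3",
    COMPILER_SHORTCUT,
    new_id,
)
for t in triples:
    print(t)
```

The person defining the mapping only states the shortcut once; the expansion then creates the intermediate event for every record.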
AMA: forthcoming developments
Implementation of a high-level mapping language to describe the mappings (under development by Martin Doerr's team in Crete)
Implementation and archiving of mapping templates to be used as mapping starting points
Integration with MAD and other tools for automatic data extraction and storage in a semantic container
AMA releases after EPOCH
2 versions: Online Application and a stand-alone application
GPL licenses
Supported formats: all XML compliant documents, including semantic grammars (RDF and OWL) and schema formats (XSD/RDFS)
CIDOC-CRM
The most central classes and properties:
TEI guidelines
A set of guidelines for XML encoding of cultural heritage texts. There are three primary functions of the TEI guidelines:
• guidance for individual or local practice in text creation and data capture;
• support of data interchange;
• support of application-independent local processing.
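As a small illustration of application-independent processing, the sketch below reads a TEI-style fragment with Python's standard XML library and pulls out the marked-up names. The fragment is invented for the example and is not a complete, valid TEI document (it has no header); `persName` and `placeName` are TEI elements, and the namespace is the one used by TEI P5.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample text, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>The trench was opened by <persName>Anna Berg</persName> near
       <placeName>Oslo</placeName>.</p>
  </body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Any XML-aware tool can find the encoded entities without knowing
# which application produced the markup.
names = [el.text for el in root.iterfind(".//tei:persName", TEI_NS)]
places = [el.text for el in root.iterfind(".//tei:placeName", TEI_NS)]
print(names, places)
```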
From text to CIDOC-CRM
• Free text documents contain useful information
• This information is more useful in a computerised environment if it can be extracted into a well defined format
• This is impossible to do correctly by automatic tools
• This is very time consuming to do manually
• Therefore: A semi-automatic approach
Amatexttool
• Developed by EDD, University of Oslo
• Based on 14 years of experience with XML content markup of texts
• Tool for semi-automatic semantic enrichment of XML markup
• Creates CIDOC-CRM compliant markup
• Information stored in XML
What to encode in a text
• Reference to entities (e.g. person names)
• Co-reference in the text (e.g. "he" refers to the same person as the name on the previous line)
• Relations between entities (e.g. family relations between persons)
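The three kinds of information listed above could be carried by markup along the following lines. The element and attribute names (`name`, `ref`, `relation`, `target`, `from`, `to`) and the sample sentence are hypothetical, chosen only to keep the sketch short; they are not the markup that Amatexttool produces and are not claimed to be CIDOC-CRM compliant.

```python
import xml.etree.ElementTree as ET

# Invented sample with hypothetical markup for the three kinds of encoding.
MARKED_UP = """<p>
  <name type="person" id="p1">Ole Hansen</name> owned the farm.
  Later <ref target="p1">he</ref> gave it to
  <name type="person" id="p2">Per Olsen</name>.
  <relation type="father_of" from="p1" to="p2"/>
</p>"""

root = ET.fromstring(MARKED_UP)

# 1. References to entities: id -> name as written in the text.
entities = {el.get("id"): el.text for el in root.iter("name")}

# 2. Co-references: each pronoun resolved to the entity it points at.
coreferences = [(el.text, entities[el.get("target")]) for el in root.iter("ref")]

# 3. Relations between entities, resolved to readable names.
relations = [
    (entities[el.get("from")], el.get("type"), entities[el.get("to")])
    for el in root.iter("relation")
]

print(entities, coreferences, relations)
```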
Search—choose—tag
• In the tool, a document can be searched for patterns
• The result list is a KWIC concordance
• All or some of the results can be chosen for tagging
• Iterative process —semantic bootstrapping
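The search, choose and tag cycle can be sketched as follows. This is a minimal illustration with a regular expression and plain strings, not the tool's implementation: the functions `kwic` and `tag`, the `person` element name and the sample text are all assumptions made for the example.

```python
import re


def kwic(text, pattern, width=20):
    """Return (start, end, left context, match, right context) for each hit."""
    hits = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((m.start(), m.end(), left, m.group(), right))
    return hits


def tag(text, chosen, element):
    """Wrap the chosen hits in <element>...</element>.

    Hits are processed right to left so that earlier offsets stay valid.
    """
    for start, end, *_ in sorted(chosen, reverse=True):
        text = f"{text[:start]}<{element}>{text[start:end]}</{element}>{text[end:]}"
    return text


# Invented sample text.
text = "Found by Mr. Berg in 1874. Mr. Lund sold it. The Mr. is unclear."

# Search: list every hit with its context (keyword in context).
hits = kwic(text, r"Mr\. [A-Z]\w+")
for start, end, left, match, right in hits:
    print(f"{left:>20} [{match}] {right}")

# Choose: a human reviews the concordance and picks hits to tag;
# here both hits are accepted.
tagged = tag(text, hits, "person")
print(tagged)
```

The tagged text can then be searched again with a refined pattern, which is the iterative part of the process.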
Amatexttool example
MAD: Managing archaeological data
Application designed to manage structured and unstructured archaeological excavation datasets encoded in XML
Developed using Open Source technologies and entirely based on XML and W3C standards
The core: eXist Native XML Database (http://exist-db.org)
Fully XPath/XQuery and SPARQL aware
Featuring dynamic XSLT transformation and presentation of documents and query results
CIDOC-CRM ontology for semantic data encoding
Data structure of MAD
MAD
XML documents indexed and stored in a file-system-like structure of folders and subfolders
• XPath and XQuery to query XML documents
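To give a feel for path-based querying of such documents, the sketch below runs two XPath-style selections with Python's standard library. Note the assumptions: `xml.etree.ElementTree` supports only a limited subset of XPath and stands in here for a native XML database, and the document structure (`excavation`, `unit`, `description`) and its content are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented excavation document with a hypothetical schema.
DOC = """<excavation site="demo">
  <unit id="US1" type="layer"><description>Dark soil</description></unit>
  <unit id="US2" type="cut"><description>Pit</description></unit>
  <unit id="US3" type="layer"><description>Ash</description></unit>
</excavation>"""

root = ET.fromstring(DOC)

# Select units by attribute value.
layer_ids = [u.get("id") for u in root.findall(".//unit[@type='layer']")]

# Select units by the text of a child element.
ash_units = [u.get("id") for u in root.findall(".//unit[description='Ash']")]

print(layer_ids, ash_units)
```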
MAD: Testing the application
First release of MAD tested on the EPOCH Partners’ XML database (http://partners.epoch-net.org:8080/exist/partners/index.xml)
... and on an XML database of relevant European and Italian projects (http://www.epoch-net.org:8080/exist/projects/index.xml)
MAD: The SAD extension
Implementation of the RDF language
Full SPARQL and RQL query languages support
Semantic Browser: intuitive set of interfaces to navigate semantic models
Full CIDOC-CRM compliance
Presented at the last EPOCH review in July 2007
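For readers unfamiliar with RDF query languages, the sketch below shows the core idea behind a query over triples: patterns with variables are matched against the stored statements and joined on shared variables. It is a toy matcher written for this illustration, not SPARQL, not RQL and not the SAD implementation; the triples and identifiers are invented, and the property labels are copied from earlier slides.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Terms starting with '?' are variables. Returns the extended binding,
    or None when the pattern does not match.
    """
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Return all variable bindings that satisfy every pattern (a join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# Invented statements for illustration.
TRIPLES = [
    ("form-12", "P94_was created by", "event-1"),
    ("event-1", "P14_carried out by", "actor-3"),
    ("form-13", "P94_was created by", "event-2"),
    ("event-2", "P14_carried out by", "actor-4"),
]

# "Who carried out the event that created form-12?"
result = query(
    [
        ("form-12", "P94_was created by", "?event"),
        ("?event", "P14_carried out by", "?actor"),
    ],
    TRIPLES,
)
print(result)
```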
MAD: Real Life applications
Archaeological excavation dataset of Cuma containing information on stratigraphical units and other related resources, converted from Syslat (HyperCard) and managed using MAD
The MAD framework was used to create an online application for the complete management of coin collections for the COINS Project (http://www.coins-project.eu)
Some features of MAD will be used to create a wide XML repository of ancient Iberian pottery (University of Jaen - Spain)
MAD: The future
Full integration of GML geospatial features within the CIDOC-CRM framework
The native XML support of MAD can be used in the field of digital preservation for setting up annotation repositories and creating co-reference resolution services
Possible COLLADA/X3D support for 3D objects represented in XML-based formats
AJAX integration and XForms support for easy web application development and complex XML information deployment and integration
Development of interfaces to query complex semantic data in a visual and user-friendly way
MAD releases
2 versions: Online Application and Downloadable Version, to be used as a stand-alone application or integrated in a client-server context (i.e. using distributed archives)
GPL License
Supported formats: all XML compliant documents, including semantic grammars (RDF and OWL), graphic formats (SVG), geographic formats (GML) and 3D formats (COLLADA/X3D)