TEI, CIDOC-CRM and a Possible Interface between the Two? Øyvind Eide & Christian-Emil Ore Unit for Digital Documentation, University of Oslo, Norway


TEI, CIDOC-CRM and a Possible Interface between the Two?

Øyvind Eide & Christian-Emil Ore

Unit for Digital Documentation, University of Oslo, Norway

The CIDOC Conceptual Reference Model (cidoc.ics.forth.gr)

• What is the CIDOC CRM?
  – An object-oriented ontology developed by ICOM-CIDOC, 1996-2005
  – Accepted as ISO 21127 in June 2005
  – About 80 classes and 130 properties for cultural and natural history
  – CRM instances can be encoded in many forms: RDBMS, ooDBMS, XML, RDF(S), OWL

• What is the CIDOC CRM for?
  – Intellectual guide to create schemata, formats, profiles
  – Best practice guide
  – A language for analysis of existing sources and models for data integration (mapping)
  – Transportation format for data integration / migration / Internet

• Ongoing activities
  – CRM-Core
  – Harmonisation with the object-oriented version of FRBR (Functional Requirements for Bibliographic Records, IFLA); first version to be published in fall 2006
  – Extension of CRM with a categorical level, e.g. reoccurring events

The CIDOC CRM Top-level Classes relevant for Integration

[Diagram. Classes: E39 Actors (persons, inst.), E55 Types, E28 Conceptual Objects, E18 Physical Things, E2 Temporal Entities (Events), E41 Appellations, E53 Places, E52 Time-Spans. Relation labels: participate in; affect or refer to; have location; at; within; refer to / identify; refer to / refine.]


CIDOC CRM: Class hierarchy


CIDOC CRM: Events


CIDOC CRM: Things and Conceptual object

Motivation: Grey literature in Museums

[Workflow diagram]

• Original text (text witness)
• Step 1: registration. Result: bibliographical record
• Step 2: reproduction. Result: facsimile
• Step 3: transcription. Result: text with XML mark-up (1. structural mark-up, 2. lemmatization etc.)
• Step 4: content mark-up. Result: text with XML mark-up in which information elements are identified and marked up according to a simple information model (DTD)
• Museum database (artefacts, excavations, referential information), built on an event/object oriented model (CIDOC-CRM compatible)


Catalogue entry

8.Malayan dagger, taken from pirates of the Indian Oceans.

Beautiful handle, graven as a human figure above waistline. Snake winded blade. VII, IX, p, 2. Daa,O., 99.

Donated April 11 1856 from Captain Teiste.

Motivation: Grey literature in Museums


Catalogue entry with mark up

<NRPAR> <CATNR NRID="EM8"> 8</CATNR>. <ARTIFDATA><PROD><USE><PEOPLE><PLACE> Malayan </PLACE></PEOPLE></USE></PROD> <ARTIFACT> dagger </ARTIFACT> , <AQUISITION> taken from <AQUFROM>pirates</AQUFROM> of the Indian Oceans. </AQUISITION>

<DESCR>Beautiful handle, graven as a human figure above waistline. Snake winded blade. <LIT_REF>VII, IX, p, 2. Daa,O., 99.</LIT_REF></DESCR>

<AQUISITION> Donated <AQUTIME> April 11 1856 </AQUTIME> from <AQUFROM> Captain Teiste </AQUFROM>. </AQUISITION> </ARTIFDATA> </NRPAR>

Motivation: Grey literature in Museums
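The marked-up catalogue entry can be read mechanically once it is well-formed. The following is a minimal sketch, assuming the entry is held as a string with the DESCR element closed by a matching tag; the function names `text_of` and `extract_record` and the output keys are illustrative choices, not part of the original project.

```python
import xml.etree.ElementTree as ET

# The catalogue entry from the slide, with DESCR closed by a matching tag.
ENTRY = """<NRPAR><CATNR NRID="EM8">8</CATNR>.
<ARTIFDATA><PROD><USE><PEOPLE><PLACE>Malayan</PLACE></PEOPLE></USE></PROD>
<ARTIFACT>dagger</ARTIFACT>,
<AQUISITION>taken from <AQUFROM>pirates</AQUFROM> of the Indian Oceans.</AQUISITION>
<DESCR>Beautiful handle, graven as a human figure above waistline.
Snake winded blade. <LIT_REF>VII, IX, p, 2. Daa,O., 99.</LIT_REF></DESCR>
<AQUISITION>Donated <AQUTIME>April 11 1856</AQUTIME> from
<AQUFROM>Captain Teiste</AQUFROM>.</AQUISITION>
</ARTIFDATA></NRPAR>"""


def text_of(elem):
    """Return the full text content of an element with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def extract_record(xml_text):
    """Pull the marked-up information elements out of one catalogue entry."""
    root = ET.fromstring(xml_text)
    acquisitions = []
    for acq in root.iter("AQUISITION"):
        time = acq.find("AQUTIME")
        acquisitions.append({
            "from": text_of(acq.find("AQUFROM")),
            "time": text_of(time) if time is not None else None,
        })
    return {
        "catalogue_id": root.find("CATNR").get("NRID"),
        "artifact": text_of(root.find(".//ARTIFACT")),
        "origin": text_of(root.find(".//PLACE")),
        "acquisitions": acquisitions,
    }


record = extract_record(ENTRY)
```

The entry carries two acquisition statements (seizure from pirates, later donation), which is why the sketch returns a list rather than a single value.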


The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.

Motivation: Grey literature in Museums

The content of the text expressed in CIDOC-CRM

[Diagram: the sample text as a CIDOC-CRM graph.]

• Nodes: E31 Document; E55 Type "Archaeological report"; E7 Activity; E55 Type "Archaeological excavation"; E21 Person (actor); E82 Actor appellation "Dr. Diggey"; E52 Time span; E50 Date "2005"; E53 Place; E44 Place appellation "Wasteland"; E11 Modification "Breaking of the sword"; E22 Man-Made object "Sword"; E42 Object identifier "C50435"
• Property labels: P1 is identified by; P2 has type; P4 has time-span; P7 took place at; P9 forms part of; P12 was present at; P14 carried out by; P70 documents; P78 is identified by; P87 is identified by

TEI - where did it come from?

• Originally, a research project within the humanities
  – Founded in 1987-88
  – Sponsored by three professional associations
  – Funded 1990-1994 by US NEH, EU LE Programme et al.
• Major influences
  – digital libraries and text collections
  – language corpora
  – scholarly datasets
• International consortium established June 1999 (see http://www.tei-c.org/)

(According to L. Burnard)

Goals of the TEI

• better interchange and integration of scholarly data
• support for all texts, in all languages, from all periods
• guidance for the perplexed: what to encode, hence a user-driven codification of existing best practice
• assistance for the specialist: how to encode, hence a loose framework into which unpredictable extensions can be fitted
• These apparently incompatible goals result in a highly flexible, modular environment

(According to L. Burnard)

TEI Deliverables

• A set of recommendations for text encoding, covering both generic text structures and some highly specific areas, based on (but not limited by) existing practice
• A very large collection (400+) of element definitions, with associated declarations for various schema languages
• A modular system for creating personalized schemas or DTDs from the foregoing
• For the full picture see http://www.tei-c.org/TEI/Guidelines/

(According to L. Burnard)

Legacy of the TEI

• a way of looking at what ‘text’ really is
• a codification of current scholarly practice
• (crucially) a set of shared assumptions about the digital agenda:
  – focus on content and function (rather than presentation)
  – identify generic solutions (rather than application-specific ones)

(According to L. Burnard)

TEI - the header

• Elements for detailed bibliographic description:
  – File description
    • Title statement
    • Edition statement
    • Extent statement
    • Publication statement
    • Series statement
    • Notes
    • Source description
      – bibliographic elements
      – (Manuscript description)
  – Encoding description
  – Profile description
  – Revision description
• Mapping to other metadata standards
  – MARC: discussed
  – Dublin Core: unfinished

TEI additional element sets

• Base Tag Set for Verse
• Performance Texts
• Transcription of Speech
• Print Dictionaries
• Manuscript description
• Linking and alignment; analysis
• Feature structures
• Certainty; physical transcription; textual criticism
• Names and dates
• Graphs, networks and trees
• Graphics, figures and tables
• Language Corpora
• Representation of non-standard characters and glyphs
• Feature System Declaration

Some “ontological” elements in TEI: Events

• History
  – groups elements describing the full history of a manuscript or manuscript part.
• Origin
  – contains any descriptive or other information concerning the origin of a manuscript or manuscript part.
• CustEvent
  – describes a single event during the custodial history of a manuscript.
• Provenance
  – contains any descriptive or other information concerning a single identifiable episode during the history of a manuscript or manuscript part, after its creation but before its acquisition.
• Acquisition
  – contains any descriptive or other information concerning the process by which a manuscript or manuscript part entered the holding institution.

Some “ontological” elements in TEI: Events, time appellations

• Event
  – (Event) any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication. E.g. “ceiling collapses” during a recorded interview.
• persEvent
  – contains a description of a particular event of significance in the life of a person.
• Birth, death
  – contains information about a person's birth/death, such as its date and place.
• Date
  – contains a date in any format.
• Occasion
  – a temporal expression (either a date or a time) given in terms of a named occasion such as a holiday, a named time of day, or some notable event.

Some “ontological” elements in TEI: Actors and appellations

• Person
  – provides information about an identifiable individual, for example a participant in a language interaction, or a person referred to in a historical source.
• Hand
  – used in the header to define each distinct scribe or handwriting style.
• Author
  – in a bibliographic reference, contains the name of the author(s), personal or corporate, of a work; the primary statement of responsibility for any bibliographic item.
• Name
  – (name, proper noun) contains a proper noun or noun phrase.

Some “ontological” elements in TEI: Person example (from P5 guidelines)

<person xml:id="Ovi01" sex="1" role="poet">
  <persName xml:lang="en">Ovid</persName>
  <persName xml:lang="la">Publius Ovidius Naso</persName>
  <birth date="-0044-03-20">
    20 March 43 BC
    <placeName>
      <settlement type="city">Sulmona</settlement>
      <country reg="IT">Italy</country>
    </placeName>
  </birth>
  <death notBefore="17" notAfter="18">
    17 or 18 AD
    <placeName>
      <settlement type="city">Tomis (Constanta)</settlement>
      <country reg="RO">Romania</country>
    </placeName>
  </death>
</person>
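To illustrate how much actor and event information such a person element already carries, here is a minimal sketch that reads the example with Python's standard library. The function name `summarise_person` and the keys of the returned dictionary are assumptions made for illustration; the example is treated as a standalone fragment without a TEI namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

PERSON = """<person xml:id="Ovi01" sex="1" role="poet">
  <persName xml:lang="en">Ovid</persName>
  <persName xml:lang="la">Publius Ovidius Naso</persName>
  <birth date="-0044-03-20">20 March 43 BC
    <placeName><settlement type="city">Sulmona</settlement>
      <country reg="IT">Italy</country></placeName>
  </birth>
  <death notBefore="17" notAfter="18">17 or 18 AD
    <placeName><settlement type="city">Tomis (Constanta)</settlement>
      <country reg="RO">Romania</country></placeName>
  </death>
</person>"""


def summarise_person(xml_text):
    """Collect the identifier, names, and birth/death data of one person."""
    person = ET.fromstring(xml_text)
    death = person.find("death")
    return {
        "id": person.get(XML_NS + "id"),
        # One appellation per language code.
        "names": {n.get(XML_NS + "lang"): n.text
                  for n in person.findall("persName")},
        "birth_date": person.find("birth").get("date"),
        "birth_place": person.findtext("birth/placeName/settlement"),
        "death_range": (death.get("notBefore"), death.get("notAfter")),
    }


summary = summarise_person(PERSON)
```

The birth and death elements combine an event, a date and a place, which is the kind of structure an event-oriented model represents as separate linked entities.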

A simple extension of the TEI DTD

<!-- #REQUIRED is assumed throughout; the slide gave no attribute defaults -->

The root CIDOC-CRM element
<!ELEMENT crm (crmClass*, crmProperty*)>
<!ATTLIST crm id ID #REQUIRED>

The class element
<!ELEMENT crmClass (#PCDATA)>
<!ATTLIST crmClass
  id        ID    #REQUIRED
  className CDATA #REQUIRED>

The property element
<!ELEMENT crmProperty EMPTY>
<!ATTLIST crmProperty
  id       ID    #REQUIRED
  propName CDATA #REQUIRED
  from     IDREF #REQUIRED
  to       IDREF #REQUIRED>
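As a rough illustration of the three declarations, the sketch below builds a small crm element programmatically and refuses properties whose from/to attributes do not point at a declared class. The function `build_crm` and its tuple-based input format are hypothetical helpers, and the reference check is stricter than a DTD IDREF, which may point at any ID in the document.

```python
import xml.etree.ElementTree as ET


def build_crm(model_id, classes, properties):
    """Build a <crm> element.

    classes:    list of (id, className, text or None)
    properties: list of (id, propName, from_id, to_id)
    """
    crm = ET.Element("crm", {"id": model_id})
    known_ids = set()
    for class_id, class_name, text in classes:
        elem = ET.SubElement(
            crm, "crmClass", {"id": class_id, "className": class_name})
        elem.text = text
        known_ids.add(class_id)
    for prop_id, prop_name, source, target in properties:
        # Both ends of a property must refer to a class declared above.
        if source not in known_ids or target not in known_ids:
            raise ValueError(
                f"{prop_id}: dangling reference {source!r} -> {target!r}")
        ET.SubElement(crm, "crmProperty", {
            "id": prop_id, "propName": prop_name,
            "from": source, "to": target})
    return crm


crm = build_crm(
    "crm-mod1",
    [("ent1", "E7_Activity", None),
     ("ent2", "E55_Type", "archaeological excavation")],
    [("prop1", "P2_has_type", "ent1", "ent2")],
)
xml_text = ET.tostring(crm, encoding="unicode")
```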


The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.

The sample text revisited

The text expressed with a TEI mark-up

<p id="p1">The <rs id="e1">excavation in
<name type="place" id="n1">Wasteland</name></rs> in
<date id="d1">2005</date> was performed by
<name type="person" id="n2">Dr. Diggey</name>. He had the misfortune of
<rs id="e2">breaking <rs id="o1">the beautiful sword
<rs id="o_id1">(C50435)</rs></rs> into 30 pieces
</rs>.</p>
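The id attributes in this paragraph are the anchor points that CRM entities can later refer to. A minimal sketch, assuming the paragraph is available as a standalone string with straight quotes; the function name `annotated_spans` is an illustrative choice.

```python
import xml.etree.ElementTree as ET

TEI_P = (
    '<p id="p1">The <rs id="e1">excavation in '
    '<name type="place" id="n1">Wasteland</name></rs> in '
    '<date id="d1">2005</date> was performed by '
    '<name type="person" id="n2">Dr. Diggey</name>. He had the misfortune of '
    '<rs id="e2">breaking <rs id="o1">the beautiful sword '
    '<rs id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p>'
)


def annotated_spans(xml_text):
    """Map each marked-up id to (element name, type attribute, covered text)."""
    root = ET.fromstring(xml_text)
    spans = {}
    for elem in root.iter():
        if elem is root:
            continue  # skip the enclosing paragraph itself
        covered = " ".join("".join(elem.itertext()).split())
        spans[elem.get("id")] = (elem.tag, elem.get("type"), covered)
    return spans


spans = annotated_spans(TEI_P)
plain_text = "".join(ET.fromstring(TEI_P).itertext())
```

Stripping the tags gives back the running text unchanged, so the mark-up adds anchors without altering the source sentence.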

Encoding the information in an RDF-triplet fashion

<crm id="crm-mod1">
  <crmClass id="ent1" className="E7_Activity"></crmClass>
  <crmClass id="ent2" className="E55_Type">archaeological excavation</crmClass>
  <crmClass id="ent3" className="E21_Person"></crmClass>
  <crmClass id="ent4" className="E82_Actor_Appellation">Dr. Diggey</crmClass>
  <crmClass id="ent5" className="E31_Document"></crmClass>
  <crmClass id="ent6" className="E52_Time-span"></crmClass>
  <crmClass id="ent7" className="E50_Date">2005</crmClass>
  <crmClass id="ent8" className="E31_Document"></crmClass>
  …
  <crmProperty id="prop1" propName="P2_has_type" from="ent1" to="ent2"/>
  <crmProperty id="prop2" propName="P14_carried_out_by" from="ent1" to="ent3"/>
  <crmProperty id="prop3" propName="P131_is_identified_by" from="ent3" to="ent4"/>
  <crmProperty id="prop4" propName="P70_is_documented_in" from="ent1" to="ent8"/>
  <crmProperty id="prop5" propName="P70_is_documented_in" from="ent4" to="ent5"/>
  <crmProperty id="prop6" propName="P4_has_time_span" from="ent1" to="ent6"/>
  <crmProperty id="prop7" propName="P78_is_identified_by" from="ent6" to="ent7"/>
  …
</crm>
<linkGrp type="TEI-CRM interface">
  <link targets="#ent5 #n2"/>
  <link targets="#ent8 #e1"/>
  …
</linkGrp>
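Because each crmProperty names a source, a property and a target, the encoding can be read back as subject-predicate-object triples. The sketch below is one possible reading, using a subset of the slide's elements; the wrapping `<doc>` element and the function names `read_triples` and `read_links` are assumptions made so the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# Subset of the slide's example, wrapped in a hypothetical <doc> root.
FRAGMENT = """<doc>
<crm id="crm-mod1">
  <crmClass id="ent1" className="E7_Activity"></crmClass>
  <crmClass id="ent2" className="E55_Type">archaeological excavation</crmClass>
  <crmClass id="ent3" className="E21_Person"></crmClass>
  <crmClass id="ent4" className="E82_Actor_Appellation">Dr. Diggey</crmClass>
  <crmClass id="ent5" className="E31_Document"></crmClass>
  <crmProperty id="prop1" propName="P2_has_type" from="ent1" to="ent2"/>
  <crmProperty id="prop2" propName="P14_carried_out_by" from="ent1" to="ent3"/>
  <crmProperty id="prop3" propName="P131_is_identified_by" from="ent3" to="ent4"/>
  <crmProperty id="prop5" propName="P70_is_documented_in" from="ent4" to="ent5"/>
</crm>
<linkGrp type="TEI-CRM interface">
  <link targets="#ent5 #n2"/>
</linkGrp>
</doc>"""


def read_triples(root):
    """Return (subject, property, object) triples with readable node labels."""
    labels = {}
    for cls in root.iter("crmClass"):
        value = (cls.text or "").strip()
        name = cls.get("className")
        labels[cls.get("id")] = f'{name} "{value}"' if value else name
    return [
        (labels[p.get("from")], p.get("propName"), labels[p.get("to")])
        for p in root.iter("crmProperty")
    ]


def read_links(root):
    """Map each CRM entity id to the TEI element id it is linked to."""
    links = {}
    for link in root.iter("link"):
        crm_ref, tei_ref = (t.lstrip("#") for t in link.get("targets").split())
        links[crm_ref] = tei_ref
    return links


root = ET.fromstring(FRAGMENT)
triples = read_triples(root)
links = read_links(root)
```

The linkGrp is what ties the model back to the text: here the document entity ent5 is anchored at the TEI element with id n2.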


CRM-Core – a dtd for encoding information [suggested by CRM-SIG]

Encoding the information in CRM Core (factoids)

<CRM_Core>
  <Category>E31 Document</Category>
  <Classification>Archaeological report</Classification>
  <Identification>Wasteland excavation 2005 report</Identification>
  <Event>
    <Role_in_Event>P70_documents</Role_in_Event>
    <Identification>Wasteland_2005_excavation</Identification>
    <Event_Type>E7_Activity</Event_Type>
    <Participant>Dr. Diggey</Participant>
    <Participant_Type>excavator</Participant_Type>
    <Thing_Present>C50435 sword</Thing_Present>
    <Date>2005</Date>
    <Place>Wasteland</Place>
  </Event>
  <Event>
    <Role_in_Event>P70_documents</Role_in_Event>
    <Identification>damage_to_artifact_C50435</Identification>
    <Event_Type>E11_Modification</Event_Type>
    <Participant>Dr. Diggey</Participant>
    <Participant_Type>excavator</Participant_Type>
    <Thing_Present>C50435 sword</Thing_Present>
    <RelatedEvent>
      <Role_in_Event>P9_forms_part_of</Role_in_Event>
      <Identification>Wasteland_2005_excavation</Identification>
    </RelatedEvent>
  </Event>
</CRM_Core>
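A CRM-Core record groups its statements by event, so it can be flattened into simple (subject, role, event) statements. This is a minimal sketch over an abridged copy of the document record; the function name `record_statements` and the tuple layout are illustrative assumptions, not part of the CRM-Core proposal.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the E31 Document record from the slide.
RECORD = """<CRM_Core>
  <Category>E31 Document</Category>
  <Classification>Archaeological report</Classification>
  <Identification>Wasteland excavation 2005 report</Identification>
  <Event>
    <Role_in_Event>P70_documents</Role_in_Event>
    <Identification>Wasteland_2005_excavation</Identification>
    <Event_Type>E7_Activity</Event_Type>
    <Participant>Dr. Diggey</Participant>
    <Date>2005</Date><Place>Wasteland</Place>
  </Event>
  <Event>
    <Role_in_Event>P70_documents</Role_in_Event>
    <Identification>damage_to_artifact_C50435</Identification>
    <Event_Type>E11_Modification</Event_Type>
    <Participant>Dr. Diggey</Participant>
    <RelatedEvent>
      <Role_in_Event>P9_forms_part_of</Role_in_Event>
      <Identification>Wasteland_2005_excavation</Identification>
    </RelatedEvent>
  </Event>
</CRM_Core>"""


def record_statements(xml_text):
    """Flatten one CRM-Core record into (subject, role, event id) tuples."""
    record = ET.fromstring(xml_text)
    subject = record.findtext("Identification")
    statements = []
    for event in record.findall("Event"):
        event_id = event.findtext("Identification")
        # The record's own relation to the event.
        statements.append((subject, event.findtext("Role_in_Event"), event_id))
        # Relations from this event to other events.
        for related in event.findall("RelatedEvent"):
            statements.append((
                event_id,
                related.findtext("Role_in_Event"),
                related.findtext("Identification"),
            ))
    return statements


statements = record_statements(RECORD)
```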

Encoding the information in CRM Core (factoids)

<CRM_Core>
  <Category>E21 Person</Category>
  <Classification>archaeologist</Classification>
  <Identification>Dr. Diggey</Identification>
  <Event>
    <Role_in_Event>P14 carried out by</Role_in_Event>
    <Identification>damage_to_artifact_C50435</Identification>
    <Event_Type>E11 Modification</Event_Type>
    <Participant_Type>excavator</Participant_Type>
    <Thing_Present>C50435 sword</Thing_Present>
  </Event>
</CRM_Core>
<CRM_Core>
  <Category>E82 Actor appellation</Category>
  <Classification>formal name</Classification>
  <Identification>mention of name</Identification>
  <Relation>
    <To>Wasteland_excavation_2005_report#n2</To>
    <Relation_Type> <referred_to_by/> </Relation_Type>
  </Relation>
</CRM_Core>

Conclusions and further work

• Possible now
  – TEI extended with an RDF-like CIDOC-CRM
  – TEI extended with CRM-Core records
• Future:
  – Make a mapping from TEI elements to CRM
  – Make a mapping from the TEI header into ooFRBR
  – Create an extension of the TEI definition
  – Write guidelines for CIDOC-CRM encoding of information in TEI documents
  – Convince the TEI users