1 Digital Library Content Model Dagobert Soergel College of Information Studies University of Maryland Department of Library and Information Studies University at Buffalo


Digital Library Content Model

Dagobert Soergel

College of Information Studies University of Maryland

Department of Library and Information Studies University at Buffalo


The Problem

Digital libraries must

1. Store a wide variety of often complex information objects and display these objects on different platforms. This requires modeling information objects, their internal structure, and the relationships among them.

2. Provide data that support discovery, interpretation, use, and management of information objects. This requires a good metadata model.

3. Support annotation of information objects. Annotations turn out to be surprisingly diverse, and an annotation may refer to only a part of an information object. This requires an elegant model that can deal with many cases.


Purpose of the talk

To reexamine a number of basic notions regarding the content of a digital library (or, more generally, any information system) to achieve sound definitions

Developed in the framework of the

DELOS Digital Library Reference Model

a framework for describing digital libraries, their content, users, and functions and, for each, their qualities and associated policies


Premisses

• Modeling the content domain is complex and much thinking is muddled

• Need to be able to handle both “data” and “documents”

• Any reference model

  • needs to be abstract and must not commit to any particular standard or design decision

  • rather, it must provide a framework for specifying the commitments of any particular DL (or information system)


Issues

0 Scope of this talk and modeling constructs

1a Content in the overall context of a DL reference model

1b Modeling information objects

1c Levels, versions, and relationships

1d Composite information objects / resources

1e Resource identifiers

2 Metadata, including provenance, context, usage

3 Annotation


Scope of this talk

• A reference model for a broadly conceived digital library will be able to model most any information system, thus will be useful very broadly.

• The focus on digital libraries is in the application, especially the type of collection, to which the model is applied.


Scope: level of abstraction

• The reference model should stay on an abstract level. It should not require specific standards but rather allow for plugging in any standard, such as RDA or DC.

• A DL should indicate to its users what standard it uses for things like time, place, type of relationship, and type of resource.

• The reference model should not require design choices but rather provide a framework for specifying design choices, such as the selectivity of the collection. A DL will then indicate whether its collection is selective or fully inclusive.


Modeling constructs

• The reference model should be based on an entity-relationship model (E-R model).

• Second-order logic: relationship instances are resources that can in turn be related to anything. Apply this pragmatically for useful navigation and common-sense inferences; stay away from types of reasoning that run into problems with second-order logic.

• Must add mechanisms for indicating the degree of precision or the degree of certainty of statements.
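One possible mechanism for the last point, as a minimal sketch only: each statement carries optional qualifiers for certainty and precision. The class name `QualifiedStatement`, the 0.0 to 1.0 certainty scale, the free-text precision field, and the example values are all assumptions made for illustration; the reference model itself does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifiedStatement:
    """A statement carrying optional certainty and precision qualifiers."""
    subject: str
    relationship: str
    target: str
    certainty: Optional[float] = None  # assumed scale: 0.0 (doubtful) to 1.0 (certain)
    precision: Optional[str] = None    # assumed free-text unit, e.g. "decade"

    def __post_init__(self) -> None:
        # Reject certainty values outside the assumed scale.
        if self.certainty is not None and not 0.0 <= self.certainty <= 1.0:
            raise ValueError("certainty must lie between 0.0 and 1.0")


# Made-up values for illustration only.
dated = QualifiedStatement("Event1", "happenedIn", "1850s",
                           certainty=0.8, precision="decade")
print(dated)
```

A DL that commits to a particular certainty scale or precision vocabulary would state that commitment, in line with the framework idea above.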




Content in the overall context of a DL reference model

• Resources

• Structured data

• Unstructured data, text

• Uses of data


Everything is a resource

W3C definition

A resource is anything that can be identified or named. Any resource is represented by a resource identifier.

Resource includes

● external (non-digital) objects or events and

● digital objects or events, wherever they may reside or occur.

Same as topic in topic maps

In an E-R model, entity types, entity instances (entity values), relationship types, and relationship instances are all resources

In RDA: Resource is restricted to information object. The advantages of the broader definition will become clear.


Structured data = statements

Resource 1 <relationship> Resource 2

SoftwareModule <createdBy> LegalEntity

SoftwareModule <annotatedBy> Information object

Event <happenedIn> (Date1, Date2)

Multi-way relationships, frames

Statements are information objects, that is, they are resources that can in turn be related to anything

A statement is also called a proposition or assertion (or fact)
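The point that statements are themselves resources can be sketched as follows. This is an illustration under stated assumptions, not part of the reference model: the class names `Resource` and `Statement`, the `assertedBy` relationship, and all identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Anything that can be identified or named."""
    identifier: str


@dataclass(frozen=True)
class Statement(Resource):
    """Resource 1 <relationship> Resource 2; itself a resource."""
    subject: Resource
    relationship: str
    target: Resource


# Hypothetical identifiers.
module = Resource("SoftwareModule-17")
entity = Resource("LegalEntity-4")
cataloger = Resource("Actor-9")

created = Statement("S1", module, "createdBy", entity)
# Because a statement is a resource, it can fill a slot in another statement.
asserted = Statement("S2", created, "assertedBy", cataloger)

print(asserted.subject.relationship)
```

Making `Statement` a subclass of `Resource` is one way to express that a statement can in turn be related to anything; other encodings, such as giving each statement its own identifier in a separate table, would serve equally well.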


More on structured data

Data consist of statements about resources.

Such statements can be conceived as relationship instances in which the resource in focus occupies one argument slot. A statement may be simple, using a binary relationship, or it may use a multi-way relationship (a frame instance with slots filled, comparable to an object in an object-oriented database).

Drug treatment frame instance

Drug Taxoteer

treatsDisease Cancer, estrogen-negative

inPopulationGroup Elderly

hasSuccessRate 55%
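The frame instance above can be sketched as a small data structure. This is a minimal illustration assuming Python dataclasses; the class name `DrugTreatmentFrame`, the snake_case slot names, and the identifier `frame-1` are hypothetical, and the success rate is stored as a fraction by assumption.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DrugTreatmentFrame:
    """A multi-way relationship: one frame instance with four filled slots."""
    drug: str
    treats_disease: str
    in_population_group: str
    has_success_rate: float  # stored as a fraction between 0 and 1


frame = DrugTreatmentFrame(
    drug="Taxoteer",
    treats_disease="Cancer, estrogen-negative",
    in_population_group="Elderly",
    has_success_rate=0.55,
)

# The same frame viewed as binary statements about the frame instance itself.
frame_id = "frame-1"  # hypothetical identifier
statements = [(frame_id, slot, value) for slot, value in asdict(frame).items()]

for statement in statements:
    print(statement)
```

The decomposition shows that a multi-way relationship can always be rewritten as a set of binary statements once the frame instance has its own identifier.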


More on structured data

Slot fillers are also known as data values.

A data value makes sense only when it is seen in relation to one or more resources, for example as a slot filler in a frame.

Examples

The value 55% makes sense only in the right context, such as in the success slot of a drug treatment frame

The value 185 cm makes sense only if we know it is the height of a person or the length of a pair of skis.


There are two ways to communicate such statements.

1. Structured data: One learns what one wants to know about the resource in focus immediately from a relationship instance.

Hamlet <authoredBy> Shakespeare

The drug treatment frame on Taxoteer

The actual data of interest are represented in a database


There are two ways to communicate such statements.

2. Unstructured data: One needs to extract what one wants to know from a text or image that is related to the resource in focus.

Shakespeare schrieb den Hamlet im Jahre 1625 (Shakespeare wrote Hamlet in the year 1625)

Hamlet wurde von Shakespeare verfasst (Hamlet was written by Shakespeare)

Taxoteer ist effektiv in der Behandlung von Krebsen die keine Rezeptoren fuer Estrogen haben. In aelteren Personen liegt die Erfolgsrate bei 50%. (Taxoteer is effective in treating cancers that have no estrogen receptors. In elderly patients the success rate is 50%.)

The data of interest are stored in what is commonly known as a document.


Functions of data

Data about a resource may serve any of the following functions:

• learn about the resource and its various characteristics

• learn about the history and context of the resource

• learn how to use the resource

• manage the resource

• preserve the resource

The sections about metadata (roughly: data about an information object) will specialize this list


Relationship as the basic modeling construct

Important principle:

Many concepts in a DL reference model are best modeled based on relationships rather than based on entities

For example, “annotation-hood” resides not in an information object but in the relationship

InformationObjectA <annotates> InformationObjectB

InformationObjectB <annotatedBy> InformationObjectA


Resource type examples

• Information objects, incl. documents, data streams, databases, queries and their results (virtual information objects, such as database reports, virtual collections)

• Actors that can search for, create, and manage resources

• Functions and services

• Software modules

• Policies

• Languages

• Ideas, concepts


Inheritance

Many reference model constructs are specified at the level of resource.

They inherit down to the different resource types, especially information objects

For example, the following statement types are valid for Resource

Resource <identifiedBy> Identifier

Resource <characterizedBy> QualityParameter

Resource <regulatedBy> Policy

Therefore, they are also valid for InformationObject or Actor or Policy
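The inheritance of statement types down the resource type hierarchy can be sketched with ordinary class inheritance. This is an illustration only: the registry `STATEMENT_TYPES` and the function `valid_relationships` are hypothetical names, and `hasCreator` is included as an example of a statement type declared at the narrower level of InformationObject.

```python
class Resource:
    """Most general resource type."""


class InformationObject(Resource):
    pass


class Actor(Resource):
    pass


class Policy(Resource):
    pass


# Each statement type is declared once, at the most general type it applies to.
STATEMENT_TYPES = {
    "identifiedBy": Resource,
    "characterizedBy": Resource,
    "regulatedBy": Resource,
    "hasCreator": InformationObject,
}


def valid_relationships(resource_type):
    """Statement types valid for a resource type, including inherited ones."""
    return sorted(
        name
        for name, declared_for in STATEMENT_TYPES.items()
        if issubclass(resource_type, declared_for)
    )


print(valid_relationships(Actor))
print(valid_relationships(InformationObject))
```

Declaring a statement type once at the Resource level avoids restating it for every resource type below.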




Information objects 1

1. A formal relationship instance (such as a row in a table or a structured data record)

2. A document (written or spoken text, image, sound) from which a human reader can learn about the resource in focus or about the relationships among several resources.

Information extraction: document → formal relationship instances.

A collection of information objects is in turn an information object

• a table in a relational database = a collection of rows, each representing a relationship instance or a collection of relationship instances

• a collection of documents
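Information extraction, the step from a document to formal relationship instances, can be illustrated with a deliberately tiny sketch. The regular expression handles only one English sentence pattern and is an assumption made for illustration; real extraction needs far more than pattern matching.

```python
import re

# Toy pattern for sentences of the form "<Work> was written by <Author>."
PATTERN = re.compile(r"^(?P<work>\w+) was written by (?P<author>\w+)\.?$")


def extract(sentence):
    """Return a (work, 'authoredBy', author) statement, or None if no match."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    return (match.group("work"), "authoredBy", match.group("author"))


print(extract("Hamlet was written by Shakespeare."))
print(extract("Shakespeare wrote many plays"))
```

The first call yields a formal relationship instance; the second returns `None` because the sentence does not fit the single pattern the sketch knows.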


Information objects 2

An information object may be a close representation of an external object or event, for example

• An image (photograph or painting) of a building. There may be many such images taken from different angles etc.

• A video recording of a soccer game. There may be several such video recordings, each capturing different scenes, or capturing the same scene from different angles, or following different players, etc. These are different information objects representing the same external event.


Real world objects, concepts, ideas

To provide full access to the information objects it contains, a digital library must manage data about any kind of object (real world objects, concepts, ideas) in its subject domain.

Why?

1. The DL may represent data in the form of a database

2. Users look for information objects that deal with or are digital representations of any kind of object.

This idea underlies Topic Maps which were originally designed to improve access to documents by relating the topics discussed in these documents.


Real world objects, concepts, ideas

Examples (these are all resources)

• People (focus of biographical reference tools)

• Organizations (focus of organization directories)

• Events (focus of developing "event gazetteers")

• Places (focus of gazetteers)

• Dates

• Mathematical theorems (focus of mathematical encyclopedias)

• Concepts, ideas

• Problems and proposed solutions

• Computer programs (focus of software directories or libraries)

The reference model should have a more complete list and indicate sources dealing with these




Levels, versions, and relationships

• Work, manifestation, item (individual copy)

• Linked through relationships


Work

Intellectual or artistic entity, as the abstract essence or as a text, image, or piece of music.

Range:

• A basic story or theme

  • the story of Faust

  • the myth of the Great Flood

• A text telling the story, such as

  • Goethe's Faust

  • the account of the Great Flood in the Bible (original Hebrew)

  • the account of the same myth in another culture

• A specific version of the account in the Hebrew Bible

• A Latin translation of the account in the Hebrew Bible


Manifestation

A specific rendering of a work by means of a graphical image or sound, taken in the abstract; the idea of such a rendering.

Examples:

• The text of Goethe's Faust printed in a particular typeface and layout. A performance at which the text is recited also renders the text but is more properly considered a separate, but related, work.

• A specific score of a given version of Schubert's Fifth. A performance of that version of Schubert’s Fifth also renders the piece of music but is considered a separate, but related, work.

Also the rendering of a work in the form of digital storage that can be transformed to a graphical image or sound, again taken as the abstract pattern of digital signals.


Item, individual copy

The embodiment of a manifestation in a physical object

We can perceive the content of a manifestation only through an individual copy of it (unless we have memorized the visual expression manifest in a manifestation and can conjure it up from memory).

There are works that have only one manifestation of which there is only one copy.


Relationships among information objects

The story of Faust <dealsWith> Pact with the devil

The story of Faust <isToldIn> Marlowe’s Faust

The story of Faust <isToldIn> Goethe’s Faust

Goethe’s Faust <authoredBy> Goethe, Johann Wolfgang von

Goethe’s Faust <hasManifestation> R1231

R1231 <publishedBy> Cotta

R1231 <hasDate> 1871

R1232 <isCopyOf> R1231

R1232 <ownedBy> (HRieth, 1896, 1956)

R1232 <ownedBy> (DSoergel, 1956, *)
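The two `<ownedBy>` statements are multi-way relationships that add a time span to the owner. A minimal sketch of how such statements could be stored and queried follows; the names `Ownership` and `owners_in` are hypothetical, and treating each span as half-open (start year included, end year excluded) is an assumption made here to resolve the shared year 1956.

```python
from typing import NamedTuple, Optional


class Ownership(NamedTuple):
    item: str
    owner: str
    start: int
    end: Optional[int]  # None stands for the open end written as * above


OWNERSHIPS = [
    Ownership("R1232", "HRieth", 1896, 1956),
    Ownership("R1232", "DSoergel", 1956, None),
]


def owners_in(item, year):
    """Owners of an item in a given year; spans are half-open [start, end)."""
    return [
        o.owner
        for o in OWNERSHIPS
        if o.item == item
        and o.start <= year
        and (o.end is None or year < o.end)
    ]


print(owners_in("R1232", 1900))
print(owners_in("R1232", 2000))
```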


Hierarchical inheritance

• Data about a work inherit to all works below it along <isToldIn>, <hasVersion> etc. Therefore

Goethe’s Faust <dealsWith> Pact with the devil

• Data about a work inherit to all its manifestations. Therefore

R1231 <authoredBy> Goethe, Johann Wolfgang von

• Data about a manifestation inherit to all its items

• Hierarchical inheritance increases efficiency

  • More efficient catalog input

  • More efficient catalog storage

  • More efficient representation and reading of search results
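Hierarchical inheritance can be sketched as a walk up the chain from item to manifestation to work. The code below is an illustration under stated assumptions: statements are stored as plain triples, the split of structural relationships into `DOWNWARD` and `UPWARD` sets is a hypothetical design, and the data are assumed to contain no cycles.

```python
STATEMENTS = [
    ("The story of Faust", "dealsWith", "Pact with the devil"),
    ("The story of Faust", "isToldIn", "Goethe's Faust"),
    ("Goethe's Faust", "authoredBy", "Goethe, Johann Wolfgang von"),
    ("Goethe's Faust", "hasManifestation", "R1231"),
    ("R1231", "publishedBy", "Cotta"),
    ("R1232", "isCopyOf", "R1231"),
]

# Structural relationships along which data inherit.
DOWNWARD = {"isToldIn", "hasVersion", "hasManifestation"}  # subject is the parent
UPWARD = {"isCopyOf"}                                      # object is the parent
STRUCTURAL = DOWNWARD | UPWARD


def parents(resource):
    """Resources directly above the given resource in the hierarchy."""
    found = []
    for subject, relationship, target in STATEMENTS:
        if relationship in DOWNWARD and target == resource:
            found.append(subject)
        elif relationship in UPWARD and subject == resource:
            found.append(target)
    return found


def describe(resource):
    """Own data plus data inherited from every ancestor (assumes no cycles)."""
    data = [
        (relationship, target)
        for subject, relationship, target in STATEMENTS
        if subject == resource and relationship not in STRUCTURAL
    ]
    for parent in parents(resource):
        data.extend(describe(parent))
    return data


for pair in describe("R1232"):
    print(pair)
```

Nothing is stored directly about the copy R1232 except its link to R1231; publisher, author, and subject are all obtained by inheritance, which is the efficiency gain listed above.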


More relationships

R271 The man I killed, by Michael Halliday

R519 The man I killed, play by Christopher Wern

R519 <isBasedOn> R271

R315 Handbook of commercial geography, by Robert Chisholm

R783 Chisholm's handbook of commercial geography, entirely rewritten by L. Dudley Stamp and S. Carter Gilmour.

R783 <entirelyRewrittenFrom> R315



Relationship to FRBR

Notes on Terminology

• The FRBR distinction between work and expression should be rethought. It is unclear and consequently poorly understood, and it may not be necessary. Just have work. The intuition FRBR tries to capture in this distinction is better handled through relationships among works as defined here.

• Following FRBR I use the term manifestation. Other term: edition (in the sense of German Ausgabe), but edition also means German Auflage, so use of the term edition can be confusing.

• It would be nice to be able to use graphic expression as a synonym for rendering, but to avoid any further confusion with FRBR it is best not to use the term expression at all.


Version control

Important, but not elaborated here




Composite information objects / resources

Examples

• Book divided into chapters, sections, paragraphs, words (XML Document Object Model, DOM, or TEI). Each part can be seen as a separate information object.

• Movie with images, soundtrack, closed captions, script, all coordinated (MPEG-7)

• A medical record with patient data, test data, images, live monitoring data streams, diagnoses, drugs prescribed, etc.


Composite information objects / resources

Abstractly: Each component is a separate information object, composition expressed through relationships

In practice:

Many document models for composite (or compound) documents supporting presentation

DL needs to allow specification, for each document, of the particular document model used
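The abstract view, in which each component is a separate information object and composition is expressed through relationships, can be sketched as follows. The `hasPart` relationship name, the identifiers, and the function `components` are hypothetical and serve only as an illustration; no particular document model is implied.

```python
# Composition expressed purely through hasPart relationships (parts kept in order).
HAS_PART = {
    "book": ["chapter-1", "chapter-2"],
    "chapter-1": ["section-1.1", "section-1.2"],
    "chapter-2": [],
}


def components(information_object):
    """All direct and indirect parts, depth-first in document order."""
    parts = []
    for part in HAS_PART.get(information_object, []):
        parts.append(part)
        parts.extend(components(part))
    return parts


print(components("book"))
```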




Identifying information objects

1 Initial definition upon entry into the digital library.

2 Definition on the spot

Examples

Annotate a specific segment of a text document or a region of an image or sound document, or

Anchor an annotation to a specific location in a document.

The segment or anchor is a new information object that is included in the original information object, and this new information object is linked with any of several annotation relationships to a new information object created by the user.

Related to composite objects. More on this under annotation
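Definition on the spot can be sketched for the text case. The example assumes that a segment is addressed by character offsets, which is only one of several possible addressing schemes; the class name `Segment`, the `includes` relationship, and all identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A new information object defined on the spot inside a text document."""
    document_id: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


DOCUMENTS = {"doc-1": "Digital libraries must support annotation."}


def text_of(segment):
    """The text covered by a segment."""
    return DOCUMENTS[segment.document_id][segment.start:segment.end]


segment = Segment("doc-1", 0, 17)
note = "note-1"  # identifier of the information object created by the user

statements = [
    ("doc-1", "includes", segment),    # the segment is part of the original
    (segment, "annotatedBy", note),    # the annotation attaches to the segment
]

print(text_of(segment))
```

The segment receives its identity at the moment of annotation rather than upon entry into the digital library.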




Data about information objects

Metadata = data about information objects if used for discovering, interpreting, and using information objects

Relate information objects to other types of resources. Examples:

InformationObject <hasCreator> Actor

InformationObject <dealsWith> Actor

InformationObject <containsText> Text (or, more specifically Word)

Relate a word in a text to the concept that is the meaning in which the word is used in this particular position.

InformationObjectA <hasAbstract> InformationObjectB

InformationObjectA <hasCriticalCommentary> InformationObjectC

InformationObjectD <hasSupportiveCommentary> InformationObjectC


More on defining metadata

The “metadata-hood” of an information object does not reside in the information object, but in its relationship to another information object and, more specifically, in its use

A piece of data is used as metadata if it is used for the purpose of discovering, interpreting, and using information objects, which then give the ultimate data wanted.

The same piece of data may fill the ultimate need of the user in one situation and be used as metadata in another situation.


Not metadata

• Data about resources that are not information objects are not metadata even if they are similar in form.

• Data about information objects are not always used as metadata. For example, author data may be used to count a faculty member's publications, or citation data to compute impact.

• Extensive discussion of what exactly is the definition of metadata is not a good use of resources. A system should provide the data that are useful to a user for whatever purpose; what each piece of data is called is less important.


Metadata typologies

Metadata (and data in general) can be divided into categories from several perspectives, and within each perspective there exist several approaches. Some examples of how to categorize metadata:

• by purpose or use. Since the same unit of metadata can be used for several purposes, the resulting categories overlap.

• by source, for example, extracted, assigned by cataloger, assigned by user (social tagging), from usage tracking

• by intrinsic characteristics, for example data about provenance or about the format of the information object


Some metadata uses

A Learn about information objects and interpret them; this includes

A1 Learn about the identity and characteristics of information objects (descriptive metadata)

A2 Learn about the history and other features of the context of the information object (contextual metadata)

B Learn how to use an information object, including

B1 Learn how to gain legal access (access and rights metadata)

B2 Learn how to gain technical access to the information object (what machinery and software is needed to access the information object for a given purpose, such as assimilation by a person or processing by a computer program)

C Manage information objects (administrative metadata), in particular

C1 Manage the preservation of information objects (preservation metadata).
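Because the same unit of metadata can serve several of these uses, categories by use overlap. One way to represent overlapping categories is a flag enumeration, sketched below. The assignment of uses to the relationship types `hasCreator` and `hasFormat` is made up for illustration, and `hasFormat` itself is a hypothetical relationship name.

```python
from enum import Flag, auto


class MetadataUse(Flag):
    DESCRIPTIVE = auto()       # A1
    CONTEXTUAL = auto()        # A2
    ACCESS_RIGHTS = auto()     # B1
    TECHNICAL_ACCESS = auto()  # B2
    ADMINISTRATIVE = auto()    # C
    PRESERVATION = auto()      # C1


# Illustrative assignments only: one relationship type, several uses.
USES = {
    "hasCreator": MetadataUse.DESCRIPTIVE | MetadataUse.CONTEXTUAL,
    "hasFormat": MetadataUse.TECHNICAL_ACCESS | MetadataUse.PRESERVATION,
}


def serves(relationship, use):
    """True if data of this relationship type can serve the given use."""
    return bool(USES[relationship] & use)


print(serves("hasFormat", MetadataUse.PRESERVATION))
print(serves("hasFormat", MetadataUse.DESCRIPTIVE))
```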


Usage data

Data on usage of resources and on usage rights, usage history, and future use / preservation are important for discovering, interpreting, and using resources as well as for managing them.

Some of these data can be collected automatically

If the resource in question is an information object, this kind of data is often used as metadata




Annotation

InformationObjectA <annotatedBy> InformationObjectB

InformationObjectB may be created on the spot in order to annotate A (InformationObjectB and the annotation relationship have the same author) or B may preexist (the annotation relationship between A and B is introduced by a third party)

Specific type of annotation expressed by specializing the annotatedBy relationship, for example

InformationObjectA <criticizedBy> InformationObjectB

InformationObjectA <hasCriticalCommentary> InformationObjectC

InformationObjectD <hasSupportiveCommentary> InformationObjectC

InformationObjectE <isPartOfSpeech> PartOfSpeech

Annotation-hood is in the relationship, not in the information object
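Specializing the annotatedBy relationship can be sketched as a small hierarchy of relationship types. The flat hierarchy in `SPECIALIZES` and the function name `is_a` are assumptions for illustration; a real inventory would likely have more levels.

```python
# Each relationship type points to the broader type it specializes.
SPECIALIZES = {
    "criticizedBy": "annotatedBy",
    "hasCriticalCommentary": "annotatedBy",
    "hasSupportiveCommentary": "annotatedBy",
    "isPartOfSpeech": "annotatedBy",
}


def is_a(relationship, ancestor):
    """True if relationship equals ancestor or specializes it at any depth."""
    while relationship is not None:
        if relationship == ancestor:
            return True
        relationship = SPECIALIZES.get(relationship)
    return False


STATEMENTS = [
    ("InformationObjectA", "criticizedBy", "InformationObjectB"),
    ("InformationObjectA", "hasCreator", "Actor1"),
    ("InformationObjectD", "hasSupportiveCommentary", "InformationObjectC"),
]

# Whether a statement is an annotation is decided by its relationship type alone.
annotations = [s for s in STATEMENTS if is_a(s[1], "annotatedBy")]
print(annotations)
```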


Annotation

Annotation-hood is in the relationship, not in the information object

There is a wide range of relationship types that are called annotations. Linguists think of annotations differently than scholars making comments on a text.

Rather than trying to define exactly what “annotation” means, the reference model should include a comprehensive list of relationship types that might be considered annotation by somebody so that anybody can define their meaning of annotation by giving the appropriate subset of annotation relationship types.
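The idea of defining "annotation" by naming a subset of an inventory can be sketched with plain sets. The inventory shown is far from comprehensive, the relationship names `hasWordSense` and `hasMark` are hypothetical, and the two community definitions are invented for illustration.

```python
# Hypothetical inventory of relationship types somebody might call annotation.
INVENTORY = frozenset({
    "hasCriticalCommentary",
    "hasSupportiveCommentary",
    "isPartOfSpeech",
    "hasWordSense",  # hypothetical name
    "hasMark",       # hypothetical name
})

# Each community defines "annotation" by naming a subset of the inventory.
DEFINITIONS = {
    "linguist": frozenset({"isPartOfSpeech", "hasWordSense"}),
    "commentator": frozenset(
        {"hasCriticalCommentary", "hasSupportiveCommentary", "hasMark"}
    ),
}

# Every definition must stay within the shared inventory.
for community, subset in DEFINITIONS.items():
    if not subset <= INVENTORY:
        raise ValueError(f"{community} uses types outside the inventory")


def is_annotation(relationship, community):
    """True if this community counts the relationship type as annotation."""
    return relationship in DEFINITIONS[community]


print(is_annotation("isPartOfSpeech", "linguist"))
print(is_annotation("isPartOfSpeech", "commentator"))
```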

The same thought applies to metadata, discussed on a later slide.


Special resource types for annotations

Some annotations require special types of resources.

Examples

Annotate a text with part-of-speech indications
annotated resource: a one-word fragment of the text
annotating resource: a value from a list of parts of speech

Annotate a text with meaning for word sense disambiguation
annotated resource: a word or phrase in the text
annotating resource: a value from a list of meanings defined in some way

Annotation through underlining or other marks
annotated resource: a fragment of text or other information object
annotating resource: a pair (sign, meaning), e.g. (underline, important) or (?, check this out) or (X, nonsense)

The annotated resource and the annotating resource may be very short


Annotation and metadata

Metadata and annotation data overlap, and different communities and individuals have different definitions of what is included in metadata and what is included in annotations.

The precise nature of a unit of data about an information object is determined by the relationship type and the resource that is linked to. The interpretation of each type of data is in the eye of the beholder.

Need an inventory of relationship types (a type of ontology)

For example, the CIDOC Conceptual Reference Model (CIDOC CRM) is an inventory of broad relationship types.

In such an inventory, one could indicate who considers a given relationship type as usable as metadata and/or as belonging to annotation.


Take-home message 1

The entity-relationship model (E-R model) provides the unifying principle for a digital library content model

The E-R model allows representation of structured data of any complexity on a conceptual level.

Defining relationships between information objects handles

• Modeling information objects

• Levels, versions, and relationships

• Composite information objects / resources

• Metadata

• Annotation

Many notions are captured better through relationships than fine distinctions of entity types


Take-home message 2

Any reference model

• needs to be abstract and must not commit to any particular standard or design decision

• rather, it must provide a framework for specifying the commitments of any particular DL (or information system)

A reference model provides a systematic framework for description and analysis, not a prescription


Dagobert Soergel

dsoergel at umd.edu

www.dsoergel.com



Omitted slides



Construction process

• Need to be sure all applicable concepts from various sources such as the 5S model and FRBR/CRM are included, either in the skeleton model or in a list of values / choices, as appropriate

• There is still work to be done to pull reference model subject matter out of the reference architecture document, and vice versa.


Construction process

• We should have an online version of the reference model document with the following properties:

• Links to discussion of issues and underlying rationale, capturing some of the discussion in the group.

• Links from the reference model to the appropriate section of the reference architecture

• The Wiki page may not quite do it.


There are two ways to communicate such statements.

1. One learns what one wants to know about the resource in focus immediately from a relationship instance.

Hamlet <authoredBy> Shakespeare

The drug treatment frame on Taxoteer

The actual data of interest are represented in a database that captures these statements (relationship instances), such as

• a collection of Prolog statements
• a relational database
• an object-oriented database

2. One needs to consult an information object that is related to the resource in focus.

Shakespeare wrote Hamlet in the year 1625
Hamlet was written by Shakespeare

Taxoteer is effective in the treatment of cancers that have no receptors for estrogen. In older persons the success rate is 50%.
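The difference between the two ways can be sketched as follows. This is an illustration only: the triple layout, the relationship type `isDescribedIn`, the identifier `text1`, and the sample sentence stored under it are assumptions, not part of the reference model.

```python
# Way 1: the fact is itself a relationship instance (subject, relationship, object).
statements = {
    ("Hamlet", "authoredBy", "Shakespeare"),
    # Way 2 starts from a link to an information object that has to be read.
    ("Hamlet", "isDescribedIn", "text1"),
}

# Information objects: the fact is only stated inside the text.
information_objects = {
    "text1": "Hamlet was written by Shakespeare.",
}


def lookup(subject: str, relationship: str) -> list[str]:
    """Return all objects linked to `subject` by `relationship`."""
    return sorted(o for s, r, o in statements if s == subject and r == relationship)


def direct_answer(resource: str) -> list[str]:
    """Way 1: the answer is read straight off a relationship instance."""
    return lookup(resource, "authoredBy")


def related_texts(resource: str) -> list[str]:
    """Way 2: return related information objects; a reader must still interpret them."""
    return [information_objects[i] for i in lookup(resource, "isDescribedIn")]
```

In the second case the system delivers a text, and extracting the statement from it is left to the person or program that consumes it.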


• The DL designer must decide how to identify the new resource that is a part of an existing resource and the new text object created by the annotator, and how to store the link between these two information objects



Identifying information objects

Architecture issues

Definition on the spot, options

(1) use completely independent identifiers and store the relationship explicitly

(2) use dependent identifiers

A part of a document can be identified by the document identifier followed by information that uniquely identifies the part. The part relation is implied by the structure of the identifier.

The annotation information object could be identified by the identifier of the resource being annotated followed by a short string that identifies the nth annotation of this resource (like a footnote). The relationship between the resource and the resource annotating it would be implied by the identifier (however, the specific type of the annotation relationship would not be captured this way). The resource that annotates still can be referenced from any other context.

Implicit representation

Embedded annotations: The annotation is embedded in the document, linked to a point in a text that is identified only by the place of the annotation. This could be converted to an explicit representation.
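Options (1) and (2) can be contrasted in a short sketch. The identifier syntax (`res-N`, `#` before a part locator, `/annN` for the nth annotation) and all function names are hypothetical choices made for this example; a real DL would define its own scheme.

```python
from itertools import count

# Option (1): completely independent identifiers, relationship stored explicitly.
_next_number = count(1)
explicit_links = []  # (annotating resource id, relationship type, annotated resource id)


def independent_id() -> str:
    """Mint an identifier that carries no information about other resources."""
    return f"res-{next(_next_number)}"


def annotate_independent(resource_id: str, relationship: str) -> str:
    """Create an annotation id and record the typed link to the annotated resource."""
    ann_id = independent_id()
    explicit_links.append((ann_id, relationship, resource_id))
    return ann_id


# Option (2): dependent identifiers, relationship implied by identifier structure.
def part_id(document_id: str, part_locator: str) -> str:
    """Document identifier followed by information that identifies the part."""
    return f"{document_id}#{part_locator}"


def annotation_id(resource_id: str, n: int) -> str:
    """Identifier of the nth annotation of a resource (like a footnote number)."""
    return f"{resource_id}/ann{n}"


def annotated_resource(ann_id: str) -> str:
    """Recover the annotated resource from a dependent annotation identifier.

    The specific type of the annotation relationship cannot be recovered this way.
    """
    return ann_id.rsplit("/ann", 1)[0]
```

For example, `annotation_id(part_id("doc42", "sec3"), 2)` gives `doc42#sec3/ann2`, from which the annotated part can be read off, while the explicit link in option (1) also keeps the relationship type.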


Some metadata uses

This is a specialization of the functions of data given above

A learn about other data, that is, information objects, and understand them; this includes

A1 learn about the identity and characteristics of information objects (descriptive metadata)

A2 learn about the history and other features of the context of the information object (contextual metadata)

B learn how to use an information object (source of data), including

B1 learn how to gain legal access to the information object (access and use rights metadata)

B2 learn how to gain technical access to the information object (what machinery and software is needed to access the information object for a given purpose, such as assimilation by a person or processing by a computer program)

C manage information objects (administrative metadata), in particular

C1 manage the preservation of information objects (preservation metadata).
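A typology like the one above could be recorded as a small hierarchy so that a DL description can refer to it by code. This is only a sketch: the dictionary layout and function names are assumptions, and the label given to B2 is a paraphrase, since the slide does not name that metadata type.

```python
# code -> (label, broader code or None); codes follow the list above.
METADATA_USES = {
    "A": ("learn about and understand information objects", None),
    "A1": ("descriptive metadata", "A"),
    "A2": ("contextual metadata", "A"),
    "B": ("learn how to use an information object", None),
    "B1": ("access and use rights metadata", "B"),
    "B2": ("technical access (machinery and software needed)", "B"),  # paraphrased label
    "C": ("administrative metadata", None),
    "C1": ("preservation metadata", "C"),
}


def broader_uses(code: str) -> list[str]:
    """Return the chain of broader uses, nearest first."""
    chain = []
    parent = METADATA_USES[code][1]
    while parent is not None:
        chain.append(parent)
        parent = METADATA_USES[parent][1]
    return chain


def narrower_uses(code: str) -> list[str]:
    """Return the uses listed directly under `code`."""
    return sorted(c for c, (_, parent) in METADATA_USES.items() if parent == code)
```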


Metadata in the reference model

When describing a DL using the reference model, need to be able to indicate any typology of metadata used in the DL