Extending Models for Controlled Vocabularies to Classification Systems: Modelling DDC with FRSAD

Joan S. Mitchell, OCLC, Inc.
Marcia Lei Zeng, Kent State University
Maja Žumer, University of Ljubljana, Slovenia




The big question

Can the FRSAD conceptual model be extended beyond subject authority data (its original focus) to model classification data?

Outline

1. From Knowledge Organisation Systems (KOS) to data and conceptual models

2. FRSAD conceptual model

3. FRSAD model for classification systems

4. DDC case study

5. Findings and limitations

6. Future work

1. From Knowledge Organisation Systems to Data and Conceptual Models: Timeline

[Figure: timeline]
1876 DDC
1898 LCSH
1905 UDC
1967 TEST (Thesaurus of Engineering and Scientific Terms)
1974 ISO 2788, Guidelines for the Establishment and Development of Monolingual Thesauri
1985 ISO 5964, Guidelines for the Establishment and Development of Multilingual Thesauri
1998 FRBR
2004-2009 SKOS, OWL
2009 FRAD
2010 FRSAD

From Knowledge Organisation Systems to Data and Conceptual Models: Modelling efforts

[Figure: the timeline with entries labelled by type of vocabulary (classification, subject headings, thesauri, KOS, ontology). Dates and models shown: 1876 and 1905 (classification); 1898 (subject headings); 1967; 1974 ISO 2788; 1985 ISO 5964; 1998 FRBR; 2004-2009 SKOS, OWL; 2009 FRAD; 2010 FRSAD.]

Thesauri: mostly comply with ISO 2788 and ISO 5964.

Subject heading schemes: have adopted the basic structure of the thesaurus since the 1990s.

Classification systems: have implemented different practices and are usually constructed according to specific conventions and examples.

The “FRBR family”

FRBR: the original framework

All entities, focusing on Group 1 entities: work, expression, manifestation, item

Published 1998

FRAD: Functional Requirements for Authority Data

Focusing on Group 2 entities: person, corporate body, family

Published 2009

FRSAD: Functional Requirements for Subject Authority Data

Focusing on Group 3 entities

FRSAR WG established in 2005

Published 2010

The FRBR family models: main entities and relationships

[Figure: main entities and relationships of FRBR, FRAD and FRSAD]

2. FRSAD Conceptual Model

2.1 The core of the FRSAD conceptual model

FRSAD – generalisation of FRBR


FRSAD Part 1: WORK has as subject THEMA / THEMA is subject of WORK

FRSAD Part 2: THEMA has appellation NOMEN / NOMEN is appellation of THEMA

NOMEN = any sign or sequence of signs (alphanumeric characters, symbols, sound, etc.) that a thema is known by, referred to or addressed as

The ‘has appellation’ relationship between thema and nomen in a controlled vocabulary

Note: in a given controlled vocabulary and within a domain, a nomen should be an appellation of only one thema.
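The two core relationships and the one-thema-per-nomen note can be sketched as a small data structure. This is only an illustration, not part of the FRSAD report: the class names `Vocabulary` and `Work`, the `label` field, and the sample book title are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Thema:
    """Whatever a work has as its subject (WORK has as subject THEMA)."""
    label: str  # hypothetical handle for this sketch, not a FRSAD attribute


@dataclass(frozen=True)
class Nomen:
    """A sign or sequence of signs that a thema is known by."""
    value: str
    scheme: str  # controlled vocabulary in which the nomen is used


@dataclass
class Work:
    title: str
    subjects: List[Thema] = field(default_factory=list)  # has as subject


@dataclass
class Vocabulary:
    """Records THEMA has appellation NOMEN for one controlled vocabulary."""
    scheme: str
    appellations: Dict[Nomen, Thema] = field(default_factory=dict)

    def add_appellation(self, thema: Thema, value: str) -> Nomen:
        nomen = Nomen(value, self.scheme)
        existing = self.appellations.get(nomen)
        # Within one vocabulary a nomen should name only one thema.
        if existing is not None and existing != thema:
            raise ValueError(
                f"{value!r} is already an appellation of {existing.label!r}"
            )
        self.appellations[nomen] = thema
        return nomen

    def nomens_of(self, thema: Thema) -> List[Nomen]:
        return [n for n, t in self.appellations.items() if t == thema]


# Illustrative data only.
ddc = Vocabulary("DDC")
retrieval = Thema("Information storage and retrieval systems")
number = ddc.add_appellation(retrieval, "025.04")
work = Work("A hypothetical book on search engines", subjects=[retrieval])
```

A thema may have any number of nomens, but registering an existing nomen for a different thema in the same vocabulary raises an error, which mirrors the note above.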

An example of nomens in an authority record for a chemical compound

[Figure: authority record annotated with the labels "Nomen 1-8" and "Nomen 9"]

Source: STN Database Summary Sheet: USAN (The USP Dictionary of U.S. Adopted Names and International Drug Names)

Nomens in different types of KOS

themas represented by:

• thesauri: terms (preferred & non-preferred)
• classification schemes: notations
• subject heading systems: terms of pre-coordinated strings
• taxonomies: category labels (with or without notations)
• controlled lists: terms or identifiers
• … …

2.2 Relationships

(1) Thema-to-thema relationships

• Hierarchical
  • The generic relationship
  • The hierarchical whole-part relationship
  • The instance relationship
  • Other hierarchical relationships
• Associative [most commonly considered categories are listed in the report]
• Other thema-to-thema relationships are domain- or implementation-dependent

2.2 Relationships

(2) Nomen-to-nomen relationships

• Equivalence: two nomens are considered equivalent only if they are appellations of the same thema in a controlled vocabulary.
• Partitive: an instance of a nomen may have parts. A whole-part relationship may exist between a nomen and its components.
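The two nomen-to-nomen relationships can be illustrated with a minimal sketch. The mapping below, the thema identifiers in it, and the choice of "/" as the component separator are assumptions for the example, not something the FRSAD model prescribes.

```python
from typing import Dict, List

# Hypothetical appellation table for one controlled vocabulary:
# nomen string -> identifier of the thema it names.
appellation_of: Dict[str, str] = {
    "025.04": "class-025.04",
    "Information storage and retrieval systems": "class-025.04",
    "025": "class-025",
}


def are_equivalent(nomen_a: str, nomen_b: str, table: Dict[str, str]) -> bool:
    """Equivalence: both nomens are appellations of the same thema."""
    thema_a = table.get(nomen_a)
    thema_b = table.get(nomen_b)
    return thema_a is not None and thema_a == thema_b


def parts_of(nomen: str, separator: str = "/") -> List[str]:
    """Partitive: split a compound nomen into its components."""
    return [part.strip() for part in nomen.split(separator) if part.strip()]


caption = (
    "Computer science, information & general works/"
    "Library & information sciences/"
    "Operations of libraries, archives, information centers/"
    "Information storage and retrieval systems"
)
components = parts_of(caption)
```

Nomens that are not registered in the table are never treated as equivalent, since equivalence is defined only relative to a shared thema.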

2.3 Attributes

Some general attributes of thema and nomen are proposed.

(1) Thema attributes: type of thema, scope note
In an implementation, themas can be organized based on category, kind, or type.

(2) Nomen attributes: see next slide
In an implementation, additional attributes may be recorded.

Nomen attributes

Nomen attributes include, but are not limited to:

Type of nomen (identifier, controlled name, …)
Scheme (LCSH, DDC, UDC, ULAN, ISO 8601…)
Reference source of nomen (Encyclopaedia Britannica…)
Representation of nomen (alphanumeric, sound, visual,...)
Language of nomen (English, Japanese, Slovenian,…)
Script of nomen (Cyrillic, Thai, Chinese-simplified,…)
Script conversion (Pinyin, ISO 3601, Romanisation of Japanese…)
Form of nomen (full name, abbreviation, formula…)
Time of validity of nomen (until xxxx, after xxxx, from… to …)
Audience (English-speaking users, scientists, children …)
Status of nomen (provisional, accepted, official,...)

Note: examples of attribute values are given in parentheses.
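One way to picture these attributes is as optional fields on a nomen record. The sketch below is an assumption about how an implementation might store them: the field names, the `NomenRecord` class and the sample values are all hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NomenRecord:
    """A nomen plus the optional attributes listed above (names are assumed)."""
    value: str
    nomen_type: Optional[str] = None        # identifier, controlled name, ...
    scheme: Optional[str] = None            # LCSH, DDC, UDC, ...
    reference_source: Optional[str] = None
    representation: Optional[str] = None    # alphanumeric, sound, visual, ...
    language: Optional[str] = None
    script: Optional[str] = None
    script_conversion: Optional[str] = None
    form: Optional[str] = None              # full name, abbreviation, ...
    time_of_validity: Optional[str] = None
    audience: Optional[str] = None
    status: Optional[str] = None            # provisional, accepted, ...

    def recorded(self) -> Dict[str, str]:
        """Return only the attributes that were actually recorded."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Illustrative values, not taken from an actual authority record.
notation = NomenRecord(
    value="025.04",
    nomen_type="identifier",
    scheme="DDC",
    representation="alphanumeric",
)
```

Making every attribute optional reflects that the list is open-ended and that an implementation records only what it needs.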

2.4 The importance of the THEMA-NOMEN model to subject authority data

• Separates what are usually called concepts (or topics, subjects, classes [of concepts]) from what they are known by, referred to, or addressed as
• A general abstract model, not limited to any particular domain or implementation
• Potential for interoperability within the library field and beyond

3. FRSAD model for classification systems

• Each class corresponds to a thema

• Notation associated with the class is the nomen

• Thema is the full category description of the class

• Nomen is the symbol (or surrogate) used to represent the full category description

4. DDC case study

Thema: Class 025.04

Nomens: DDC number, Full caption, URI

• DDC number: 025.04
• Full caption: Computer science, information & general works/Library & information sciences/Operations of libraries, archives, information centers/Information storage and retrieval systems
• URI: http://dewey.info/class/025.04/
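The case-study class can be written out as one thema with three nomens, each of which resolves back to the same class. This is a minimal sketch: the `ClassThema` type, its `key` field and the `build_index` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Nomen:
    kind: str   # "DDC number", "Full caption" or "URI"
    value: str


@dataclass(frozen=True)
class ClassThema:
    key: str                    # hypothetical internal key for the class
    nomens: Tuple[Nomen, ...]


CAPTION = (
    "Computer science, information & general works/"
    "Library & information sciences/"
    "Operations of libraries, archives, information centers/"
    "Information storage and retrieval systems"
)

class_025_04 = ClassThema(
    key="Class 025.04",
    nomens=(
        Nomen("DDC number", "025.04"),
        Nomen("Full caption", CAPTION),
        Nomen("URI", "http://dewey.info/class/025.04/"),
    ),
)


def build_index(themas: Iterable[ClassThema]) -> Dict[str, ClassThema]:
    """Map every nomen value to the single thema it is an appellation of."""
    index: Dict[str, ClassThema] = {}
    for thema in themas:
        for nomen in thema.nomens:
            if nomen.value in index and index[nomen.value] != thema:
                raise ValueError(f"{nomen.value!r} names more than one thema")
            index[nomen.value] = thema
    return index


index = build_index([class_025_04])
```

Looking up the number, the caption or the URI returns the same `ClassThema`, which is the sense in which the three nomens are alternative appellations of one class.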

Thema: Any topic co-extensive with the full meaning of the class

[Figure label: topics that are functionally equivalent to the class]

Scope note: Text describing or defining the thema or specifying its scope within a particular system

[Figure label: Scope note (≠ thema/class)]

Thema-to-thema relationships

[Figure labels: associative relationship; (poly)hierarchical relationship]

Alternative nomens: Relative Index terms with equivalence relationship to class

[Figure legend: equivalence relationship; SN = scope note; ? = unknown relationship]

Derived alternative nomens

150 ## $a Databanks
260 ## $i see also $a Databases

[Figure legend: equivalence relationship; SN = scope note; ? = unknown relationship; Derived]

5. Findings and limitations

• FRSAD conceptual model appears to accommodate DDC data at a broad level

• Topic-to-topic relationships require further study

• The study did not consider the usefulness of classification data modelled using FRSAD in real-world applications

6. Future work

• Specify all relationships between Relative Index terms and classes (see earlier work by Green, Mitchell)



• Investigate DDC translations and mappings in context of model

[Figure: DDC translations. Labels shown: DDC 22 (French, German, Italian, Swedish Mixed); A14 (Italian, Vietnamese, French, Spanish, Hebrew); 200 Religion Class; Guide (French); DDC Sachgruppen (German); DDC Summaries; English, French, Italian, Rhaeto-Romansch; Afrikaans, Arabic, Chinese, French, German, Norwegian, Portuguese, Russian, Scots Gaelic, Spanish, Swedish.]

Mappings and crosswalks

[Figure: DDC linked to LCSH, MeSH, SWD, RAMEAU, SAB, BISAC, SEARS, CSH, UDC, LCC, SAO and Nuovo Soggettario]

Thema-to-thema relationships across languages:

Class 025.04 (22/swe) = Class 025.04 (22)

Thema-to-thema relationships (Complex case):

T2—43414 (22) = T2—43414 (22/ger), but . . .

English edition (22):
T2—43414 Giessen district (Giessen Regierungsbezirk)
    Including *Lahn River
    [Lahn River: not equivalent to thema/class T2—43414]

German edition (22/ger):
T2—43414 Regierungsbezirk Gießen
T2—434147 Lahn-Dill-Kreis
    Hier auch: der Fluss *Lahn
    [Lahn River: functionally equivalent to thema/class T2—434147]
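The complex case can be made concrete by recording, per edition, where a topic sits and how it relates to its class. The sketch assumes one reading of the slide's annotations: the Lahn River is not equivalent to class T2—43414 in the English edition, and is functionally equivalent to class T2—434147 in the German edition. The `Placement` type and `placed_alike` function are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Placement(NamedTuple):
    notation: str   # class under which the topic is mentioned
    relation: str   # relationship of the topic to that class


# (edition, topic) -> placement; values follow the reading described above.
placements: Dict[Tuple[str, str], Placement] = {
    ("22", "Lahn River"): Placement("T2—43414", "not equivalent"),
    ("22/ger", "Lahn River"): Placement("T2—434147", "functionally equivalent"),
}


def placed_alike(
    topic: str,
    edition_a: str,
    edition_b: str,
    table: Dict[Tuple[str, str], Placement],
) -> bool:
    """True if the topic has the same class and relation in both editions."""
    return table[(edition_a, topic)] == table[(edition_b, topic)]


lahn_matches = placed_alike("Lahn River", "22", "22/ger", placements)
```

Here `lahn_matches` is `False`: the notation T2—43414 exists in both editions, yet a topic placed under it in one edition belongs to a different class in the other, so equal notations alone do not settle whether two editions' classes cover the same ground.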


• Investigate modelling the Relative Index as a separate controlled vocabulary to provide a topic-centered view

• Experiment with modelling other classification schemes

• Investigate usefulness of classification data modelled using FRSAD