INDEXES AND INDEXING Ma. Theresa B. Villanueva Head, Microforms and Digital Resource Center Rizal Library, Ateneo De Manila University April 15-16, 2013 James O’Brien Library-Ateneo de Naga University


Page 1: INDEXES AND   INDEXING

INDEXES AND INDEXING

Ma. Theresa B. Villanueva

Head, Microforms and Digital Resource Center

Rizal Library, Ateneo De Manila University

April 15-16, 2013

James O’Brien Library, Ateneo de Naga University

Page 2: INDEXES AND   INDEXING

DEFINITION OF TERMS

Index

- a tool that indicates to a user the information, or the source of information, that one needs

- a systematic guide designed to indicate subjects, topics, or features of documents in order to facilitate their retrieval

Page 3: INDEXES AND   INDEXING

Indexing

the process of identifying and assigning index terms to a document, either to describe its physical characteristics, give facts about its creator or distribution, or describe its content


Page 4: INDEXES AND   INDEXING

General Purposes of Indexes

To construct representations of documents in a form that is suitable for users to browse

To maximize the searching success of the users

To minimize the time and effort in finding information

Page 5: INDEXES AND   INDEXING

Uses of Indexes

• facilitate reference to specific material or help locate wanted information

• serve as a filter to withhold irrelevant materials

• make the information storage and retrieval system useful to individuals

• disclose related information

• serve as a tool for current awareness services

Page 6: INDEXES AND   INDEXING


Page 7: INDEXES AND   INDEXING

By Arrangement

a. Alphabetical Index – is based on the orderly principle of the letters of the alphabet; used for the arrangement of subheadings and cross references as well as main headings

b. Classified Index – contents are arranged systematically by classes or subject headings

c. Concordance – is an alphabetical index of all principal words appearing in a single text, or in the multi-volume works of a single author, with a pointer to the exact point at which each word occurs.
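A concordance as defined above can be sketched in a few lines of Python. This is only an illustration: the function name `build_concordance` and the sample text are assumptions, and the sketch records every word rather than only the "principal" ones (a stopword filter would be needed for that).

```python
import re
from collections import defaultdict


def build_concordance(text):
    """Map each word to the (line number, word position) pairs where it occurs."""
    concordance = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        words = re.findall(r"[A-Za-z']+", line.lower())
        for word_no, word in enumerate(words, start=1):
            concordance[word].append((line_no, word_no))
    # Arrange the entries alphabetically, as a concordance requires.
    return dict(sorted(concordance.items()))


# Hypothetical two-line sample text.
sample = "To be or not to be\nthat is the question"
concordance = build_concordance(sample)
for word, locations in concordance.items():
    print(word, locations)
```

Each entry pairs a word with precise locators, which is what distinguishes a concordance from a subject index.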

Page 8: INDEXES AND   INDEXING

By Physical Form


a) Card index – an index in which 3” x 5” cards are used as the tools

b) Printed index – a tool for indexing or for researching and retrieval of information that is in printed form

c) Microform index – index to microforms such as microfiche and microfilm

d) Computerized index – uses computers to construct indexes

Page 9: INDEXES AND   INDEXING

By Type of Materials Index

a. Audiovisual Material Index

- textual labeling (index terms or description) is needed along with image matching

- search on words may retrieve a particular image related to the search term which in turn can be used as input to find other related entries


Page 10: INDEXES AND   INDEXING

b. Book index

- a list of words or groups of words, arranged alphabetically at the back of the book, giving the page location of the subject or name associated with each word.
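As a rough illustration of the book index just described, the sketch below collects page locators for a list of chosen terms. The function `build_book_index`, the page texts, and the term list are all hypothetical, and plain substring matching stands in for the judgment a human indexer would apply.

```python
from collections import defaultdict


def build_book_index(pages, terms):
    """Return {term: sorted page numbers} for each term found in the page text."""
    index = defaultdict(set)
    for page_no, text in pages.items():
        lowered = text.lower()
        for term in terms:
            # Naive substring match; assumed good enough for a sketch.
            if term.lower() in lowered:
                index[term].add(page_no)
    # Alphabetical arrangement of the entries.
    return {term: sorted(index[term]) for term in sorted(index, key=str.lower)}


# Hypothetical page contents keyed by page number.
pages = {
    1: "Indexing assigns index terms to a document.",
    2: "A thesaurus is a controlled vocabulary.",
    3: "Indexing with a thesaurus improves consistency.",
}
book_index = build_book_index(pages, ["thesaurus", "indexing", "consistency"])
for term, page_numbers in book_index.items():
    print(f"{term}, {', '.join(map(str, page_numbers))}")
```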


Page 11: INDEXES AND   INDEXING

c. Periodical Index/Newspaper Index

- open-ended projects, usually performed by a group of people

- consistency is a challenge, since each periodical issue may deal with unrelated topics by several authors, written in different styles and aimed at different users.

Page 12: INDEXES AND   INDEXING

Periodicals Indexes

Classified Index

Entry points are arranged in a hierarchy of related topics, starting with generic or broad topics and working down to the specific ones.

Examples:

- Index Medicus – classified index in the field of medicine and related disciplines

- Engineering Index – classified index in the field of engineering and related disciplines

Alphabetical Subject Index

An alphabetical subject index covers a number of different kinds of indexes. The arrangement is in alphabetical order and follows a familiar pattern.

Examples:

- Readers’ Guide to Periodical Literature (RGPL)

- Index to Philippine Periodicals (IPP)

Author Index

Entry points are names of persons, organizations, government agencies, institutions, etc.

Examples:

- Development Bank of the Philippines

- Philippine Chamber of Commerce and Industry

- Romulo, Carlos P.

Page 13: INDEXES AND   INDEXING

INDEXING PRINCIPLES

Exhaustivity – refers to the extent to which a document is analyzed to identify its subject content

Specificity – refers to the extent to which a concept or topic in a document is identified by a precise term in the hierarchy of its genus-species relations

Consistency – refers to the extent to which agreement exists on the terms to be used to index the contents of documents

Page 14: INDEXES AND   INDEXING

Principle of Exhaustivity

• Exhaustive indexing

use of various index terms to fully cover the major and minor themes of a document

• Selective indexing

use of a few terms to cover only the main or major theme of a document

Exhaustivity results in high recall but low precision.

Page 15: INDEXES AND   INDEXING

Principle of Specificity

Example:

Genus: Citrus Fruits

Species: ORANGES

LEMONS

LIMES

GRAPEFRUITS

Specificity results in high precision but low recall.
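The recall/precision trade-off mentioned for exhaustivity and specificity can be made concrete with the standard definitions: recall is the share of relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. The document identifiers below are invented for illustration.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search result."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection: four documents are actually relevant.
relevant = {"d1", "d2", "d3", "d4"}

# Exhaustive indexing tends to retrieve more documents, relevant or not.
broad_search = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
# Specific indexing tends to retrieve fewer, more exact matches.
narrow_search = {"d1", "d2"}

print(recall_and_precision(broad_search, relevant))   # (1.0, 0.5)
print(recall_and_precision(narrow_search, relevant))  # (0.5, 1.0)
```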

Page 16: INDEXES AND   INDEXING

Principle of Consistency

There are two types of consistency level:

Inter-indexer consistency refers to the agreement between or among indexers in assigning subject terms to a particular article

Intra-indexer consistency refers to the extent to which one indexer is consistent with himself/herself in assigning subject terms.
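The slides do not give a formula for consistency. One common way to quantify inter-indexer agreement, assumed here purely for illustration, is the overlap ratio: terms assigned by both indexers divided by terms assigned by either. The term lists are hypothetical.

```python
def indexer_consistency(terms_a, terms_b):
    """Overlap ratio: shared terms / all distinct terms (assumed measure)."""
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    if not a and not b:
        return 1.0  # two empty term sets agree trivially
    return len(a & b) / len(a | b)


# Hypothetical terms assigned to the same article by two indexers.
indexer_1 = ["Rice", "Hybrid rice", "Agriculture"]
indexer_2 = ["Rice", "Agriculture", "Food supply", "Philippines"]
score = indexer_consistency(indexer_1, indexer_2)
print(f"{score:.2f}")  # 2 shared terms out of 5 distinct terms
```

The same function could be applied to one indexer's terms for the same article at two different times to estimate intra-indexer consistency.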

Page 17: INDEXES AND   INDEXING

Indexing Methods

1. Derived or derivative indexing

– a method by which words and phrases occurring in the title or text of a documentary unit are extracted by a human or computer to serve as indexing terms.

- also called extractive indexing.
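A minimal sketch of derived indexing by computer: the words of a title are extracted and common function words are dropped. The stopword list is a small assumed sample, not a standard list, and the title is taken from the sample article used later in this presentation.

```python
import re

# Assumed, deliberately short stopword list for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "towards"}


def derive_index_terms(title):
    """Extract candidate index terms directly from the words of the title."""
    words = re.findall(r"[a-z]+", title.lower())
    terms = []
    for word in words:
        if word not in STOPWORDS and word not in terms:
            terms.append(word)
    return terms


terms = derive_index_terms("Hybrid rice: a new hope towards a bountiful Philippines")
print(terms)
```

Every term comes from the document itself, which is the defining feature of derived indexing.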


Page 18: INDEXES AND   INDEXING

2. Assigned indexing

- a method by which terms, descriptors or subject headings are selected by a human or computer to represent the topics or features of a documentary unit

- assigned terms are often taken from a source other than the document itself.

Page 19: INDEXES AND   INDEXING

Indexing Language

An indexing language is a language that is used by the indexer to represent the subject content of a document.

Page 20: INDEXES AND   INDEXING

Purposes and Uses of Indexing Language:


to represent the subject content of a document either using the words of the author or assigning appropriate descriptors from a controlled vocabulary

to help users discriminate between terms and reduce ambiguity in the language

Page 21: INDEXES AND   INDEXING

Types of Indexing Language

1. Natural Language

- uses terms or words occurring in the printed text as index entries; it is sometimes called a derived-term system

Page 22: INDEXES AND   INDEXING

Characteristics of using Natural Language:

• Improves recall because it provides more access points, but reduces precision

• Redundancy is greater

• Uses more current terms

• Tends to be favored by end-users

Page 23: INDEXES AND   INDEXING

2. Controlled vocabulary

- represents the general conceptual structure of one or more subject areas and presents a guide to the users of the index

- categorized as an assigned-term system

Page 24: INDEXES AND   INDEXING

Controlled Vocabulary provides cross references in the form of Use:

To show the three relationships of terms:

a) equivalence

b) hierarchical

c) associative

This is achieved by providing or showing under:

- broader term (BT)

- narrower term (NT)

- related terms (RT)

- use for (UF)

- see also (SA)

Page 25: INDEXES AND   INDEXING

Relationships of Terms:

a. Equivalence relationship - implies that there will be more than one term denoting the same concept


Page 26: INDEXES AND   INDEXING

Equivalence relationship:

Example 1

Use for (UF) or Use reference (see reference)

Example: EMPLOYEES

UF: Personnel

Staff

Workers

- refers the user to a preferred descriptor from a non-usable term

Page 27: INDEXES AND   INDEXING

Equivalence relationship

Example 2:

BIRTH CONTROL UF : Family Planning

- reference deals primarily with synonymous or variant forms of the preferred descriptor

- it is also used to lead the indexer to more general terms


Page 28: INDEXES AND   INDEXING

Examples that indicate Equivalence relationship:


Synonyms (e.g. Reason; Cause)

Quasi-synonyms (e.g. Law; Law Management)

Preferred spelling (e.g. Catalog; Catalogue)

Acronyms and abbreviations (e.g. ASEAN; Association of Southeast Asian Nations)

Current and established terms (e.g. Cellular Radio; Cellular Phone)

Translation (e.g. Coconut Coir; Bunot)

Page 29: INDEXES AND   INDEXING

b. Hierarchical relationship

– refers to the general and specific or broad and narrow type of relationship


Page 30: INDEXES AND   INDEXING

Hierarchical relationship: Example 1

Broader term (BT)

Employees

BT: People

- shows hierarchical relationship upward in the classification ranking

- it differs from the use for reference in that both the basic term and its broader term are descriptors, and both can be used

Page 31: INDEXES AND   INDEXING

Hierarchical relationship: Example 2

Cats

BT: ANIMALS

“ANIMALS” is a broader term to “CATS” because all cats are animals.

Reference: http://publish.uwo.ca/~craven/677/thesaur/main05.htm

Page 32: INDEXES AND   INDEXING

Hierarchical relationship: Example 3

Narrower term (NT)

Employees

NT: HOTEL EMPLOYEES

RAILROAD EMPLOYEES

- similar to the broader term reference, except that it goes down in the classification ranking

Page 33: INDEXES AND   INDEXING

Hierarchical relationship: Example 4

Head

NT: NOSE

“NOSE” might be a narrower term to “HEAD”, because noses are normally parts of heads.

Reference: http://publish.uwo.ca/~craven/677/thesaur/main05.htm

Page 34: INDEXES AND   INDEXING

Genus–species relationship (represents class inclusion)

Example: Animals

    Domestic Animals

        Cats

Whole-part relationship

Example: Hand

    Fingers

Instance relationship

Example: Mountains

    Mount Apo

Page 35: INDEXES AND   INDEXING

c. Associative relationship

- refers to a non-hierarchical relationship of terms


Page 36: INDEXES AND   INDEXING

Associative relationship

Example 1:

Related term (RT)

EMPLOYEE

RT: EMPLOYMENT

- refers to a descriptor that can be used in addition to the basic term but is not in a hierarchical relationship with it

Page 37: INDEXES AND   INDEXING

Associative relationship

Other Examples:

Teachers – Students

Tables – Chairs

Education – Teaching

Men – Women
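The three relationships can be pictured as one thesaurus record per descriptor. The sketch below gathers the EMPLOYEES examples from the preceding slides into a single record (using the plural form throughout, which is an assumption); the dictionary layout and the function names are illustrative, not a standard thesaurus format.

```python
# One record per preferred descriptor, keyed by relationship code.
THESAURUS = {
    "EMPLOYEES": {
        "UF": ["Personnel", "Staff", "Workers"],            # equivalence
        "BT": ["PEOPLE"],                                   # hierarchical, upward
        "NT": ["HOTEL EMPLOYEES", "RAILROAD EMPLOYEES"],    # hierarchical, downward
        "RT": ["EMPLOYMENT"],                               # associative
    },
}


def build_use_references(thesaurus):
    """Invert the UF lists so each non-preferred term points to its descriptor."""
    use = {}
    for descriptor, record in thesaurus.items():
        for variant in record.get("UF", []):
            use[variant.lower()] = descriptor
    return use


def preferred_term(term, thesaurus):
    """Return the descriptor to use for a term, or None if it is unknown."""
    if term.upper() in thesaurus:
        return term.upper()
    return build_use_references(thesaurus).get(term.lower())


print(preferred_term("Staff", THESAURUS))      # EMPLOYEES
print(preferred_term("employees", THESAURUS))  # EMPLOYEES
print(preferred_term("Cats", THESAURUS))       # None
```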

Page 38: INDEXES AND   INDEXING

Scope Note (SN) & Qualifier

- used to inform users about a descriptor’s usage restrictions or to clarify ambiguity; a scope note may also give additional instructions to indexers

Scope Note:

Examples:

INDEXING (SN) Assigning of natural language terms to documents

HOSPITALIZATION (SN) Assign also terms for the conditions for which patients were hospitalized, if applicable

Qualifier:

Example: Security (Law)

Security (Psychology)

Reference: http://publish.uwo.ca/~craven/677/thesaur/main08.htm

Page 39: INDEXES AND   INDEXING

Functions of Controlled Vocabulary:

• To control synonyms by choosing one form as the standard term

• To make distinctions among homographs

• To link or bring together those terms whose meanings are closely related

Example: Cereals and Wheat

• To control variant spellings

Page 40: INDEXES AND   INDEXING

A controlled vocabulary may take the form of verbal expressions, as illustrated by subject headings lists and thesauri, or coded/nonverbal expressions, as shown by classification schemes.

Subject headings lists – are lists of terms representing several subject fields; some focus on specific fields

Thesauri – are another kind of authority device, covering more specific or narrower subject fields

Classification schemes – generally contain coded expressions or notations for the relevant topics in a particular class or subclass

Page 41: INDEXES AND   INDEXING

INDEXING GUIDELINES & PROCEDURES

Part 2


Page 42: INDEXES AND   INDEXING

INDEXING PROCESS:

1. Recording of bibliographic data

- recording of the important information or the elements that identify a particular document

The International Organization for Standardization (ISO) sets a standard for bibliographic references:

ISO 690 1975 (E) – “Bibliographic References: Essential and Supplementary Elements”

Page 43: INDEXES AND   INDEXING

- When indexing the contents of a collection of documents, locators should give complete information about each document.

- For periodical articles, the essential elements of each entry are:

Name(s) of Author(s) with forenames

Title of the article

Title of the periodical or Source

Volume Number

Issue Number

Date of the issue

Page number

Page 44: INDEXES AND   INDEXING

Example:

Name(s) of Author(s): [Xian, Jie]

Title of the article: [Hybrid rice: a new hope towards a bountiful Philippines]

Title of the periodical or Source: [Impact]

Volume Number: [46]

Issue Number: [9]

Date of the issue: [September 2007]

Page number: [4-8]

Page 45: INDEXES AND   INDEXING

Sample entry:

ISO FORMAT:

________________ (subject/Topic)

Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact, Vol. 46, no.9, S ‘12, p. 4-8.
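The ISO-style sample entry can be assembled mechanically from the recorded elements. The sketch below only mirrors the punctuation pattern visible in that one sample; the function name `format_entry` is hypothetical, the date abbreviation is passed in exactly as it appears on the slide, and nothing here should be read as the wording of ISO 690 itself.

```python
def format_entry(author, title, periodical, volume, issue, date, pages):
    """Join the bibliographic elements in the order shown in the sample entry."""
    return (
        f"{author}. {title}. {periodical}, "
        f"Vol. {volume}, no.{issue}, {date}, p. {pages}."
    )


entry = format_entry(
    author="Xian, Jie",
    title="Hybrid rice: a new hope towards a bountiful Philippines",
    periodical="Impact",
    volume=46,
    issue=9,
    date="S ‘12",  # abbreviated issue date, copied from the slide
    pages="4-8",
)
print(entry)
```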

Page 46: INDEXES AND   INDEXING


ATENEO FORMAT:

OTHER FORMAT:

________________ (subject/Topic)

_______________ (subject/topic)

Format comparison:

_______________ (subject/topic)

ISO FORMAT:

Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact, Vol. 46, no.9, S ‘12, p. 4-8.

Hybrid rice: a new hope towards a bountiful Philippines. Xian, Jie. Impact 46 (9) : 4-8. S ‘12.

Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact 46 (9) : 4-8. S ‘12

Page 47: INDEXES AND   INDEXING

2. Subject determination

- determining the “aboutness” of the material and formulating a concept list

• Choose the most appropriate concepts; consider the users & the purpose of the index

• No arbitrary limit should be set on the number of terms or descriptors which can be assigned to a document.

- it should be determined fully by the amount of information contained in the document

- it should be related to the expected needs of the users of the index.

Page 48: INDEXES AND   INDEXING

• Modify the indexing guidelines and procedures if needed; but modification should not compromise the structure or logic of the indexing language.

• Concepts should be as specific as possible. More general concepts may be preferred in some circumstances, depending upon the following factors:

– over-specificity might adversely affect the performance of the indexing system.

– if an idea is not fully developed, or is referred to only casually by the author, then it might be justified to index at a more general level


Page 49: INDEXES AND   INDEXING

3. Content/Conceptual analysis

– identifying the topics discussed in a document and determining which aspects of it users will be interested in

Page 50: INDEXES AND   INDEXING

Content Analysis

- Decide which topics in the item are relevant to the potential user of the document.

- Decide which topics truly capture the content of the document.

- Determine terms that come as close as possible to the terminology used in the document.

- Decide on index terms and the specificity of those terms.

Page 51: INDEXES AND   INDEXING

Parts of the document that have to be analyzed

Title of the document/article

- it is considered the basic indexing unit

- it is the first stop in determining the subject content

Abstract

- an information-packed miniature of the document

- a good abstract can be a fundamental indicator of subject content

Page 52: INDEXES AND   INDEXING

Text itself

- includes the introduction, summary, conclusion, section headings, and the first & last sentences of paragraphs

Illustrations, diagrams, tables and captions

References

- reference sources cited by the author may also be considered as subject indicators

Page 53: INDEXES AND   INDEXING

Factors that may affect content analysis:

- a labor shortage or other critical time factor

- the guidelines and policies imposed by institutions, which generally concern the selection of index content

- the indexer's decisions about which aspects of the subject will be emphasized and which will be de-emphasized

Page 54: INDEXES AND   INDEXING

4. Translation

- involves the conversion of terms in the natural language into standard terms drawn from a controlled vocabulary such as a thesaurus, subject headings list, etc.

- match terms in the concept list against those available in the controlled vocabulary
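The matching step can be sketched as a lookup: each concept from the concept list is replaced by its preferred descriptor when one exists, and anything unmatched is set aside as a candidate term. The tiny vocabulary below reuses the BIRTH CONTROL and EMPLOYEES examples from earlier slides; the table names and the `translate` function are assumptions made for illustration.

```python
# Non-preferred term -> preferred descriptor (from the UF examples).
USE_REFERENCES = {
    "family planning": "BIRTH CONTROL",
    "personnel": "EMPLOYEES",
    "staff": "EMPLOYEES",
    "workers": "EMPLOYEES",
}
DESCRIPTORS = {"BIRTH CONTROL", "EMPLOYEES"}


def translate(concepts):
    """Split a concept list into assigned descriptors and unmatched candidates."""
    assigned, candidates = [], []
    for concept in concepts:
        key = concept.strip().lower()
        if key.upper() in DESCRIPTORS:
            term = key.upper()
        else:
            term = USE_REFERENCES.get(key)
        if term is None:
            candidates.append(concept)   # new concept, needs checking
        elif term not in assigned:
            assigned.append(term)        # avoid duplicate descriptors
    return assigned, candidates


assigned, candidates = translate(["Staff", "Workers", "Family Planning", "Hybrid rice"])
print(assigned)    # ['EMPLOYEES', 'BIRTH CONTROL']
print(candidates)  # ['Hybrid rice']
```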

 


Page 55: INDEXES AND   INDEXING

Practices to follow in the Translation process:

- Concepts which are already represented in the indexing language should be translated into their preferred terms

- Terms which represent new concepts should be checked for accuracy and acceptability against reference tools such as:

◦ Dictionaries and encyclopedias

◦ Thesauri (UNBIS Thesaurus)

◦ Classification schemes (Library of Congress)

◦ Established indexes (Readers’ Guide to Periodical Literature)

Page 56: INDEXES AND   INDEXING

- Subject specialists, particularly those with some knowledge of indexing or documentation, may also be consulted

- If the concepts are not found in an existing thesaurus or classification scheme, they may be:

• expressed by terms or descriptors which are admitted into the indexing language

• represented temporarily by more general terms, with the new concepts proposed as candidates for later addition

Page 57: INDEXES AND   INDEXING

Translation

- Group references to information that is scattered in the text of the document.

- Combine heading and subheadings into related multilevel headings.

- Direct the user seeking information under terms not used to those that are being used by means of see references and to related terms with see also references.

- Arrange the index into a systematic presentation


Page 58: INDEXES AND   INDEXING

Generating Index Entries

Index entries may be generated manually or by computer.

Manual generation – involves generating index entries one by one using an ordinary or electric typewriter

Machine generation – involves the use of computers in generating index entries; various software packages are available

Page 59: INDEXES AND   INDEXING

Indexing Techniques for Periodicals

1. Topics that can be considered for indexing are the following:

- persons

- local politics

- sports events

- entertainment

- economic news

- editorials & columns

- special features

- first and last events

- social trends

Page 60: INDEXES AND   INDEXING

• All articles that have permanent value should be indexed under all topics and issues dealt with

• Editorials should be indexed under their topics like any other article, but differentiated from other articles by adding (Ed.) or (E). The titles of editorials may be indexed under a collective heading “Editorials”.

• Letters to the editor, if considered indexable, should be indexed by topic, not under a caption that may have been assigned by the editor. It is advisable to index at least the name of the person who criticized an article as well as the author’s response.

Page 61: INDEXES AND   INDEXING

2. Preference and Forms of Headings based on the International Organization for Standardization (ISO 999)

Personal Names:

– Provide as full a form as possible

– Choose the most recent/most commonly used form of personal name as the heading and add “see” cross-references from other forms

– Personal names should take the form used in the document, but if the text is not consistent the indexer should adopt one form.

Page 62: INDEXES AND   INDEXING

– Compound and multiple surnames, whether hyphenated or not, should be indexed under the first part

e.g. Lee Chua, Queena, Loren ; Perez de Cuellar, Javier

– Persons normally identified by title of honor or nobility should be indexed under the first name

e.g. Prince Charles see Charles, Prince of Wales

Queen Elizabeth I see Elizabeth I, Queen of England

Page 63: INDEXES AND   INDEXING

Corporate Bodies

• Names of corporate bodies should normally be indexed without transposition and in as full a form as necessary. An initial article is omitted, unless specifically required for semantic or grammatical reasons

e.g. Lopez Museum

• Transposition may be used if it is considered that this would help the users of the index

e.g. Department of Energy see Energy, Department of

• Choose the most recent, or the most commonly used, form of corporate name as the main heading and add “see” cross references from other forms

e.g. Philippine Normal College see Philippine Normal University

Page 64: INDEXES AND   INDEXING

Geographic Names

• Geographic names should be as full as is necessary for clarity, with additions to avoid confusion with otherwise identical names

Example: J.P. Rizal (Quezon City)

J.P. Rizal (Marikina)

• An article or preposition should be retained in a geographic name of which it forms an integral part

Example: Santolan, Pasig City

• Where the article or preposition does not form an integral part of a name, it should be omitted

Example: New Day rather than The New Day

Page 65: INDEXES AND   INDEXING

INDEXING STANDARDS

Part 3

Page 66: INDEXES AND   INDEXING

Standards serve as models and guidelines for the analysis of documents, construction and organization of indexes, indexing terminology, construction and use of thesauri, etc. They promote consistency and uniformity.

Page 67: INDEXES AND   INDEXING

A. International Organization for Standardization

-is a network of the national standards institutes of 146 countries, on the basis of one member per country, with a Central Secretariat in Geneva, Switzerland that coordinates the system.


Page 68: INDEXES AND   INDEXING

ISO 5963: 1985 Documentation – Methods for examining documents, determining their subjects, and selecting indexing terms

ISO 999: 1996 Information and documentation – Guidelines for the content, organization and presentation of indexes

ISO 4: 1997 Information and documentation – Rules for the abbreviation of title words and titles of publications. It publishes a List of Serial Title Word Abbreviations, which includes title word abbreviations in over 50 languages.

Page 69: INDEXES AND   INDEXING

B. National Information Standards Organization (NISO)

A nonprofit association accredited by the American National Standards Institute (ANSI) that identifies, develops, maintains and publishes technical standards to manage information in our changing and ever-more digital environment.

NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, repurposing, storage, metadata, and presentation.

Page 70: INDEXES AND   INDEXING

Standards developed by NISO:

– ANSI/NISO Z39.2 – 1994 (R2001) Information interchange format

*Equivalent international standard: ISO 2709

– ANSI/NISO Z39.19 – 2003 Guidelines for the construction, format, and management of monolingual thesauri

*Equivalent international standard: ISO 2788

Page 71: INDEXES AND   INDEXING

C. British Standards Institution (BSI)

– as the National Standards Body of the UK, it develops standards and applies innovative standardization solutions to meet the needs of business and society.

Standards developed by BSI (related to library and information science):

– BS 1749: 1985 Recommendations for alphabetical arrangement and the filing order of numbers and symbols

• Provides guidance on arranging entries within lists of all kinds, e.g. bibliographies, catalogues, directories and indexes.

– BS ISO 999: 1996 Information and documentation – Guidelines for the content, organization and presentation of indexes

Page 72: INDEXES AND   INDEXING

Automatic Indexing

refers to indexing by machine, or the analysis of text by means of computer algorithms.

- The focus is on automatic methods used behind the scenes with little or no input from individual searchers, with the exception of relevance feedback.

- It does not include searching options and techniques used by human searchers, such as methods for creating effective search statements, adding weights to terms, specifying proximity requirements, using truncation or wild cards, or combining terms with Boolean or role operators.

Page 73: INDEXES AND   INDEXING

Four Types of Approaches

• Statistical – based on counts of words, statistical associations, and collation techniques that assign weights and cluster similar words

Example: tf-idf (term frequency–inverse document frequency), which is frequently used in many search engines.

The intuitive philosophy behind tf-idf is that terms that are frequent in many documents are less suited to make discriminations, while terms that are frequent within a single document may indicate that this document has much information about the things the terms refer to.

Source: Cleveland & Cleveland, 2001, p. 211
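The tf-idf intuition can be checked with a small computation. The sketch uses one common variant, assumed here because the slide gives no formula: term frequency is the term's count divided by the document length, and inverse document frequency is the natural logarithm of (number of documents / number of documents containing the term). The three short "documents" are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini-collection.
docs = [
    "rice rice hybrid farming",
    "rice prices rise",
    "library indexing rice",
]
weights = tf_idf(docs)
print(weights[0])
```

Because "rice" occurs in every document, its weight is zero everywhere, while "hybrid" and "farming" receive positive weights in the first document: frequent-everywhere terms do not discriminate.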

Page 74: INDEXES AND   INDEXING

• Syntactical – stresses grammar and parts of speech, identifying concepts found in designated grammatical combinations, such as noun phrases

• Semantic – systems are concerned with the context sensitivity of words in the text

Example: What does “cat” mean in terms of its context? House cats? Heavy earthmoving equipment?

• Knowledge-based – systems go beyond thesaurus or equivalence relationships to knowing the relationships between words

Example: ‘tibia’ is part of a leg, thus the document is indexed under ‘leg injuries’.
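The tibia example can be imitated with a small part-of table. Everything in this sketch is assumed for illustration: the `PART_OF` table, the function name, and the premise that the collection consists of injury reports (so that a mention of a body part maps to an "injuries" heading). A real knowledge-based system would rely on a far richer knowledge base.

```python
# Hypothetical part-of knowledge: body part -> the limb it belongs to.
PART_OF = {"tibia": "leg", "femur": "leg", "ulna": "arm"}


def knowledge_based_terms(text):
    """Assign '<whole> injuries' headings for every known part mentioned."""
    terms = []
    for word in text.lower().split():
        word = word.strip(".,;:'\"")
        whole = PART_OF.get(word)
        if whole:
            heading = f"{whole} injuries"
            if heading not in terms:
                terms.append(heading)
    return terms


print(knowledge_based_terms("Fracture of the tibia."))  # ['leg injuries']
```

The word "leg" never appears in the text; the heading is reached only through the stored relationship.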


Page 75: INDEXES AND   INDEXING

Human / Manual Indexing vs. Automatic Indexing

• Automatic methods have trouble handling synonyms, homonyms, and semantic relations. Conceptualizing is very poor. Human indexers go through cognitive processes that may be influenced by their background experience, education, training, intelligence, and common sense.

• Computers can, and humans cannot, organize all words in a text and in a given database and perform statistical operations on them (e.g. tf-idf).

Page 76: INDEXES AND   INDEXING

Websites for Indexers

Indexing Services

H.W. Wilson Home Page (http://www.hwwilson.com/)

Wright Information (http://mindspring.com/~jancw/)

Susan Holbert Indexing Services (http://abbington.com/holbert/)

Special Formats and Subjects Indexing

ASIS Thesaurus of Information Science (http://www.asis.org/Publications/Thesaurus/isframe.htm)

The Library of Congress Thesauri (http://lcweb.loc.gov/pmei/lexico/liv/bsearch.html)

Standards

National Information Standards Organization (http://www.niso.org/)

ANSI/NISO Z39.14-1997 Guidelines for Abstracts (http://www.ansi.org/)

ANSI Z39.4-1984 Basic Criteria for Indexes (http://www.ansi.org/)

Indexing software

HTML Indexer (for Windows) http://www.html-indexer.com/

Cindex (for DOS, Windows, and Macintosh) http://www.indexres.com

Page 77: INDEXES AND   INDEXING
