U N E S C O
INFORMATION PROGRAMME
STUDY FOR INDEXING AND
ABSTRACTING SYSTEMS FOR
ARABIC COLLECTIONS
IN
ARAB LEAGUE DOCUMENTATION CENTER
" A L D O C "
CONSULTANCY MISSION
PROJECT : RAB/79/030
PERIOD : 6 AUGUST TO 5 SEPTEMBER 1982
SHAWKY SALEM
UNESCO CONSULTANT
1982
DISTRIBUTION SYSTEM
UNESCO HEADQUARTERS
AUTHOR
TOTAL
1. INTRODUCTION
2. ARABIC COLLECTION IN "ALDOC" : NATURE & TYPES
3. INDEXING AND ABSTRACTING SYSTEMS.
3.1 Definition of subject profile for items
selected for indexing and abstracting.
3.2 Definition of indexing and abstracting
levels.
3.3 Definition of indexing tools and thesaurus
construction.
3.4 Definition of indexing and abstracting
procedures.
3.5 Definition of indexing and abstracting
rates and priority.
3.6 Evaluation of indexing and abstracting
processes.
4. ESTABLISHING ABSTRACTS FOR ARABIC COLLECTION
5. ACKNOWLEDGEMENTS
APPENDIX 1.
1. INTRODUCTION
This mission was carried out in August 1982 over four weeks:
two weeks at ALDOC (Arab League Documentation Center), Arab
League, Tunis, and two weeks preparing this report, which
is concerned with indexing and abstracting systems for the
Arabic collections in "ALDOC".
My first visit to "ALDOC" was in April 1981. During my
second visit I noticed many changes and much progress in
different activities (e.g. Computer, Microfilm, Library,
Services), and I hope that "ALDOC" will achieve its
objectives and goals within the time schedule of the project.
The first mission was to establish a Microfilm Information
System for the Arab League collections, especially the Arabic
collections. The second mission is closely related to the
first: to establish indexing and abstracting systems for
the Arabic collections and to train the staff in those
operations and procedures.
2. ARABIC COLLECTIONS IN "ALDOC": NATURE & TYPES
The Arabic collections in "ALDOC" were acquired from
different sources, cover different subjects and are of
different types.
We can divide the Arabic collections into three categories:
First category: Arabic collections issued by the Arab
League, which contain:
A. Summit Conference decisions
B. Arab League Council decisions
C. Arab League special councils' decisions
D. Secretary General decisions
E. Steering Committees' reports, meetings and
decisions.
F. Technical Committees' reports, studies and decisions.
G. Arab League Departments' reports and documents.
Second category: Arabic collections issued by Arab
organizations related to the Arab League,
which contain reports, studies,
documents, annual reports, etc.
Third category: Arabic collections issued by Arab unions
in different fields, which contain all
the documents related to their activities.
The quantity of these different Arabic collections is
not small, and it continues to grow. It has not yet been
surveyed accurately, but the flow of documents from the
three sources mentioned above appears to be active and
continuous. In addition, the Arabic collections are
incomplete: many documents are missing because of
the transfer of the Arab League from Cairo to Tunis.
The types of available Arabic collections were surveyed
and take the following forms:
Studies - reports - research papers - speeches - press
statements & press conferences - recommendations -
agendas and meeting proceedings - regulations - treaties
and charters - committee decisions - budgets, final
statements and financial data - annual reports - periodic
pamphlets and bulletins, etc.
Some of these types are in bound form and the others
are loose sheets or a few pages. Some are issued
periodically, such as annual reports and budgets, but
most are issued irregularly, according to the activities of
the Arab League departments or of the Arab organizations
and unions.
The name "Arabic collections" gives the impression that
these types are entirely in the Arabic language, but
some of them also appear in foreign languages as
translations, e.g. the Secretary General's
speeches in Arabic, English and French.
The classification system for these collections was
established according to my recommendation in the first
mission to build a classification system based on the
"organization structure" of the Arab League and its
organizations.
The cataloguing system for the Arabic collections is
to some extent systematic and has the general features of
AACR 2, but it does not follow its rules and regulations
completely.
However, classification and cataloguing
are not the aim of this study, and perhaps they will be
studied in detail by another consultant in the future.
3. INDEXING & ABSTRACTING SYSTEMS
3.1 DEFINITION OF SUBJECT PROFILE FOR ITEMS SELECTED
FOR INDEXING & ABSTRACTING.
Three consultancy studies related to this subject
were done in the past: the studies by Dr. Aman,
Mr. El-Nagdawy and Dr. Khidr. In spite of these
studies, it seems that the indexers in "ALDOC" are not aware
of them, and they therefore frequently ask about the selection
areas for items related to Arab League activities.
Accordingly, I am defining the areas of interest for the
Arab League.
These areas and topics are listed at the end of this
study under "Appendix 1". The definition of these areas
will be an important tool to help the indexers, cataloguers
and classifiers in selecting items related to these areas,
because they represent user requirements in the Arab League,
its organizations and other Arab unions and
establishments.
The definition of these areas and topics will help in:
A) Establishing a clear policy and plan for selecting the
related documents, which will be indexed and abstracted,
and ignoring the unrelated documents.
B) Coordination and consistency between the selected
documents and user requirements.
C) Linking the selection policy with the collection
development policy, especially for the items which prove
to be core documents for the users.
D) Updating the selection areas plan as needed,
according to developments and changes in user
requirements. This means that the selection policy will be
the tool for measuring user profiles and subjects of interest.
3.2 THE DEFINITION OF INDEXING AND ABSTRACTING LEVEL
It is well known that the depth of indexing increases
with the number of accurate indexing
keywords per document. But this increase is not unlimited:
we will reach a stage at which any further increase will be
harmful to the retrieval system because of redundancy
and loss of retrieval precision.
There are no specific rules or standards that give the
minimum or maximum number of indexing keywords per document.
These limits can be decided by the indexer himself,
according to his relations with users and his knowledge
of their requirements. But we can recommend levels
for the indexing keywords until the indexers define
them for themselves. These levels are:
Minimum 6 keywords/document (including subject and
non-subject keywords).
Maximum 12 keywords/document (including subject and
non-subject keywords).
The reasons for this recommendation are:
1) Most of the Arabic collections are limited to a few pages,
and sometimes they are decisions of a few lines.
2) The minimum and maximum rates mentioned can produce a
good indexing system.
3) Any increase in indexing rates means an increase in
cost and in the need for highly skilled technical manpower
to do the indexing operations.
For the level of abstracts, it is preferable to have
abstracts of 100-150 words expressing the subjects of
the documents.
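As an illustration only, the recommended levels can be expressed as a
small check on each indexing record. The sketch below is in Python; the
function name check_levels and the record layout (a list of keywords plus
an optional abstract text) are assumptions made for this example and do
not describe the ALDOC system. Words are counted by splitting on white
space, which is also an assumption.

```python
MIN_KEYWORDS, MAX_KEYWORDS = 6, 12                   # recommended keywords per document
MIN_ABSTRACT_WORDS, MAX_ABSTRACT_WORDS = 100, 150    # recommended abstract length


def check_levels(keywords, abstract=None):
    """Return a list of problems found in one indexing record (empty if none)."""
    problems = []
    # Count distinct keywords only, ignoring case and surrounding spaces.
    unique = {k.strip().lower() for k in keywords if k.strip()}
    if len(unique) < MIN_KEYWORDS:
        problems.append(f"too few keywords: {len(unique)} < {MIN_KEYWORDS}")
    elif len(unique) > MAX_KEYWORDS:
        problems.append(f"too many keywords: {len(unique)} > {MAX_KEYWORDS}")
    if abstract is not None:
        n_words = len(abstract.split())
        if not MIN_ABSTRACT_WORDS <= n_words <= MAX_ABSTRACT_WORDS:
            problems.append(f"abstract length {n_words} outside 100-150 words")
    return problems
```

A record with no reported problems lies within the recommended levels;
the limits are kept as named constants so that the indexers can adjust
them once they define their own levels.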
3.3 THE DEFINITION OF INDEXING TOOL AND CONTROL
THESAURUS
It was observed that the indexing tools in "ALDOC" were
several and differed in structure and subject areas. The
tools are:
1) Keywords selected from the OECD Macrothesaurus, English/
French edition 1978, and from the Arabic translation
which was done by IDCAS in Cairo in 1979.
2) Keywords selected from the United Nations Thesaurus
in English and translated into Arabic.
3) Keywords selected from the New York Times Thesaurus in
English and translated into Arabic.
4) Arabic keywords selected from the documents themselves
(free indexing).
5) The translation into Arabic was done by referring
to some Arabic subject heading tools (which differ
completely from thesaurus logic and structure). The
subject heading tools are:
a) Al-Khazendar : Subject headings list
b) Al-Sewidan : Subject headings list
This means that there is no clear policy for indexing
processes in Arabic and there is no single controlled list of
terms to help indexers in their work and to standardize
the operations. Moreover, the different sources of keywords
produce overlapping terms in semantics, syntax
and concept construction, in addition to the complexity of
controlling these keywords and the impossibility of defining
their analytical relations.
For these reasons we recommend stopping all these mixed
and ambiguous operations. The solutions for this matter
are the following options:
Fast Solutions Options:
First Option: to use one controlled thesaurus in English
for the indexing of all documents in ALDOC. It
can be the OECD Macrothesaurus, and we can modify its subjects
by adding keywords related to users' interests and
fitting them into the thesaurus's hierarchical structure and
logical construction.
Advantages                       Disadvantages
1. The possibility of            1. The indexing of the Arabic
   beginning the work               collections will be done
   directly.                        with English terms.
2. The availability of a good    2. The computer handling of
   control tool which has been      Arabic documents will be
   tried in different               done twice: once in Arabic
   information and                  for bibliographic
   documentation centers.           description, and again in
                                    English for keyword
                                    indexing.
Second option: to use the KWIC (Key-Word-In-Context)
system or free indexing for the Arabic documents; accordingly,
after some time there will be a nucleus for an Arabic
thesaurus. This option is easy and more practical.
Advantages                       Disadvantages
1. Economical and cheap.         1. Retrieval is not precise.
2. Easy to use and to begin.     2. Retrieval is not
3. Serves as a base for a           comprehensive because we
   controlled Arabic                do not use controlled
   thesaurus.                       keywords.
Slow Solutions Options:
Third Option: to add the Arabic translation to the OECD
Macrothesaurus (English/French) so as to have in the end a
multilingual thesaurus in three languages (English/French/Arabic).
Advantages                       Disadvantages
1. Retrieving all documents      1. The translation is usually
   in the database by a term        not precise, especially if
   in any language                  it is not done by
   (multilingual access).           specialised persons.
2. The translation operation     2. The translated concept,
   is easier than building          however precise, does not
   a thesaurus.                     match the original concept
                                    exactly, and this will
                                    produce bad retrieval.
                                 3. The structure of a
                                    translation differs from
                                    thesaurus construction:
                                    translation is easier in
                                    process but poor in
                                    retrieval, whereas the
                                    hierarchical logic of a
                                    thesaurus is more difficult
                                    in process but better in
                                    retrieval.
                                 4. What will happen when we
                                    find collections in a
                                    fourth language, e.g.
                                    Italian or Russian? Will
                                    we add a fourth
                                    translation, or index this
                                    collection in one of the
                                    first three languages?
                                    Which one of them, and why?
Fourth option: to establish an Arabic thesaurus from
the available Arabic collections. This requires at least
two years' work by persons professional in thesaurus
construction to collect keywords, group them, classify
them, build the hierarchical trees and produce the
analytical relations.
Advantages                       Disadvantages
1. The Arabic thesaurus is       1. Requires a long period
   one of the ultimate goals        to establish.
   of ALDOC.                     2. Requires staff of high
2. The Arabic thesaurus             professional calibre to
   represents the Arabic            establish.
   environment by
   standardized Arabic
   keywords.
3. The Arabic thesaurus can
   be developed in future
   and can be a good
   experiment for other
   specialized thesauri.
We recommend beginning with option two (KWIC/free indexing)
as a first stage; the free keywords can then form the
nucleus of option four (Arabic thesaurus).
The matter now needs a clear decision from the ALDOC Director
after consultation with the CTA of the project.
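A minimal sketch of how the recommended first stage could work: a KWIC
index is produced from document titles, and the free keywords are then
counted as candidates for the future Arabic thesaurus. The sketch is in
Python. The function names, the use of titles as input and the short
English stop-word list are assumptions for illustration only; an Arabic
stop-word list would be needed in practice.

```python
# Hypothetical stop-word list; a real system would use an Arabic list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_index(titles, stopwords=STOPWORDS):
    """Build a KWIC index: one entry per significant word in each title.

    Each entry is (keyword, left context, right context, document number),
    and the entries are returned sorted alphabetically by keyword.
    """
    entries = []
    for doc_no, title in enumerate(titles, start=1):
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:()\"'").lower()
            if not key or key in stopwords:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((key, left, right, doc_no))
    return sorted(entries)


def keyword_frequencies(entries):
    """Count how many distinct documents use each keyword."""
    docs_per_key = {}
    for key, _left, _right, doc_no in entries:
        docs_per_key.setdefault(key, set()).add(doc_no)
    return {key: len(docs) for key, docs in docs_per_key.items()}
```

Keywords that occur in many documents are natural candidates for the
nucleus of the controlled thesaurus; the grouping and hierarchical work
of option four would still have to be done by thesaurus specialists.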
3.4 DEFINITION OF INDEXING AND ABSTRACTING PROCEDURES
This operation is divided into eight
stages (Chart-1).
A) Stage of receiving documents
Documents are received from the Acquisition
Section so that technical operations can be carried out on
them. There must be a follow-up system to trace
the location of any document during this stage in case
a user asks for it.
B) Stage of selecting items for indexing and abstracting
This stage was discussed in detail at the beginning of
this study.
C) Stage of understanding the selected items
This stage depends on a fast reading of the items in
general, focusing on the following points:
1) The title.
2) The author, and knowledge of his job, special
field and interests if available in the document.
3) Reading the author's abstract.
4) Reading the introduction (which usually presents the
subject of the document).
5) Reading the beginning and end of chapters and parts,
and reading the subject headings in the document.
6) Reading the conclusions of the document.
7) Looking at the illustrations, maps, tables and
diagrams in the document.
8) Looking for any underlined words or words printed in
block letters.
9) Reading the parts which the author identifies as
contributions, new experiments or original work.
All these points can give the indexer a general idea
and a specific outline of the concepts of the document.
CHART-1 : INDEXING STAGES
[Chart. Document side: documents input - content analysis of
document - document concept stage - transfer concepts to
keywords (thesaurus) - document keywords - store.
Inquiry side: inquiry - content analysis of inquiry - inquiry
concept - transfer inquiry concepts to keywords (thesaurus) -
inquiry keywords.
Retrieving from store: matching user inquiry keywords with
document keywords - output.]
D) Stage of building concepts related to the document
This stage usually depends on stage C. The indexer
can note his concepts about the document on a separate sheet
until he can begin standardizing these concepts
against the controlled thesaurus.
E) Stage of transferring concepts to thesaurus terms
The indexer begins by transferring the concepts to terms
in his mind and checking them in the thesaurus. If he
does not find a term, he checks under synonyms related
to the concept. We can call this stage "the
dialogue with the thesaurus", and this operation is considered
a point of evaluation for distinguishing a good indexer from
a bad one, because the dialogue depends on his experience,
culture, knowledge and anticipation of user requirements.
In this dialogue the indexer usually uses comprehensiveness,
part/whole relations and the semantic and syntactic structure
of the language, as well as the concept of synonyms. In the
end, the indexer arrives at the definite keywords and terms
which represent the document, chosen from the thesaurus and
controlled under standardized rules.
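The "dialogue with the thesaurus" can be sketched as a lookup in which
each concept is checked first as a preferred term and then through its
synonyms. The Python example below is a sketch under stated assumptions:
the miniature thesaurus, its terms and the function names are
hypothetical and are not taken from any of the thesauri mentioned in
this study.

```python
# Hypothetical miniature thesaurus: preferred term -> its synonyms.
THESAURUS = {
    "treaties": {"pacts", "charters", "agreements"},
    "budgets": {"financial estimates"},
    "refugees": set(),
}


def build_lookup(thesaurus):
    """Map every preferred term and synonym (lower-cased) to its preferred term."""
    lookup = {}
    for preferred, synonyms in thesaurus.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


def concepts_to_keywords(concepts, thesaurus):
    """Return (controlled keywords, concepts with no thesaurus entry)."""
    lookup = build_lookup(thesaurus)
    keywords, unmatched = [], []
    for concept in concepts:
        term = lookup.get(concept.strip().lower())
        if term is None:
            unmatched.append(concept)      # left for the indexer to resolve
        elif term not in keywords:
            keywords.append(term)          # each controlled term only once
    return keywords, unmatched
```

The list of unmatched concepts is where the indexer's own judgement is
still required; it can also serve as a source of candidate terms for
updating the thesaurus.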
F) Stage of registering the selected keywords
After arriving at the exact representative keywords in
language form, the indexer registers these words on
manual cards, on a computer input sheet or directly through
an online computer terminal, to be stored in the database.
G) Stage of establishing abstract structure
This stage comes after stage E and depends on
it in building an informative abstract for the document,
which helps to inform users of the documents that
cover their requirements.
H) Stage of returning the documents to the shelves
After processing, documents must be returned to the
Information Services Section to be arranged on shelves
according to the storage system, or directed to the
microfilm section to be microfilmed. In the end, documents
must be well organized in the stores so that retrieval
operations are correct.
3.5 THE DEFINITION OF INDEXING AND ABSTRACTING RATES
AND THE PRIORITY OF HANDLING.
Most of the international systems for indexing and
abstracting specify 20-30 keywords to analyse a document
and 20-25 documents per indexer per day (man/day);
usually the indexer is a professional
with high qualifications in his field and excellent
experience in indexing and abstracting.
Clearly, the indexers in ALDOC are not of the same calibre,
and at the beginning they can carry out indexing without
abstracting. For that reason we estimate the rate at
15-20 documents per day per indexer in ALDOC (man/day).
According to this rate we can design an indexing plan
and the priority of handling document types (Chart-2).
The schedule can follow two trends:
1) Current trend: to begin indexing current
documents from the beginning of 1983, covering all the
documents received by ALDOC.
2) Retrospective trend: to index the documents stored
before 1983 gradually and in an organized way,
one document type after another.
The implementation plan requires the following:
A) A clear definition of document types.
B) A clear definition of the priority of document types
according to their use and handling.
C) An accurate timetable for implementing the
indexing operation for each type, with reserve
time of not more than 10% of the total for any emergency
or delay.
D) The implementation of the whole plan must be controlled
and followed up.
E) The implementation time (man/days) will be calculated
according to the rates mentioned above (these
rates can be changed slightly according to manpower
capability until the plan can be executed firmly).
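The timetable calculation in points C and E can be sketched as follows
in Python. The function name and its parameters are assumptions made for
this example; only the rate of 15-20 documents per indexer per day and
the reserve of not more than 10% come from this study. The figures in
the usage note are hypothetical.

```python
def indexing_days(documents, indexers, docs_per_day=15, reserve_percent=10):
    """Estimate the working days needed to index one document type.

    docs_per_day is the rate per indexer (15-20 is suggested for ALDOC);
    reserve_percent is the extra time kept for emergency or delay
    (not more than 10). Integer arithmetic is used so that the result
    is rounded up exactly.
    """
    if documents < 0 or indexers <= 0 or docs_per_day <= 0:
        raise ValueError("documents must be >= 0; indexers and rate must be > 0")
    if not 0 <= reserve_percent <= 10:
        raise ValueError("reserve must be between 0 and 10 percent")
    numerator = documents * (100 + reserve_percent)
    denominator = indexers * docs_per_day * 100
    return -(-numerator // denominator)    # ceiling division
```

For example, a hypothetical stock of 3000 documents of one type handled
by 4 indexers at 15 documents per day would need 50 working days, or 55
days with the full 10% reserve. Repeating the calculation for each
document type in priority order gives the timetable of the plan.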
CHART-2: IMPLEMENTATION PLAN FOR INDEXING
[Chart. Indexing plan in two streams. Retrospective documents:
begin retrospectively from end of 1982, one type after another.
Current documents: begin currently from beginning of 1983,
one type after another.]
3.6 EVALUATION OF INDEXING PROCESSES
Evaluation is a very important operation
for knowing whether the indexing is accurate or not. This
evaluation must be done in the following ways:
A) Using experiments to evaluate keyword quality and
abstract quality through user opinions and
questionnaires or through daily retrieval operations.
B) Using mathematical measurements to evaluate indexing
and abstracting quality, namely:
(1) The accuracy of retrieval compared with the
number of analytical keywords per document.
(2) Recall measurement, to distinguish the related
retrieved documents from the related unretrieved documents.
(3) Precision measurement, to distinguish the related
retrieved documents from the unrelated retrieved documents.
(4) Core and aboutness measurement, to define the
core terms and aboutness terms and their scattering
from the document concepts.
(5) Synonym measurement, to define the coverage of
an original term and its synonyms.
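The recall and precision measurements in (2) and (3) follow the usual
definitions: recall is the number of related documents retrieved divided
by all related documents, and precision is the number of related
documents retrieved divided by all documents retrieved. A short Python
sketch is given below; the function name and the use of document numbers
as input are assumptions for illustration.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    retrieved: document numbers returned by the search
    relevant:  document numbers judged related to the inquiry
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant            # related documents that were retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision
```

If a search retrieves four documents of which two are related, and five
related documents exist in the store, recall is 2/5 and precision is
2/4. Averaging these values over a set of test inquiries gives one way
of comparing indexing with different numbers of keywords per document.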
The evaluation produces clear results and gives
reasons for good or bad indexing. These depend on the
indexer's experience, the indexer's knowledge of thesaurus
construction and hierarchical structure, the difficulty of
the document's subject and whether it needs a professional
indexer or an ordinary indexer, and the logical use of
search strategy in retrieval operations.
The evaluation operation is essential for setting indexing
criteria.
4. THE BUILDING OF ABSTRACTS FOR ARABIC COLLECTION
The importance of abstracts lies in the need for an easy
and fast tool to help users reach related documents.
This tool provides an organized form of representation and
a fast way to follow fields of interest.
We recommend implementing the indexing service together with
the abstracting service, especially as "ALDOC" will have a
sophisticated computer system.
We also recommend the informative type of abstract, which
gives more information to guide users to their needs
instead of reading the documents themselves, especially as
most of the Arabic collections are decisions and
recommendations; here the bibliographic description is very
important for abstracts because of the nature of the Arabic
collections.
For the reports, studies and large documents
issued by Arab League Departments & Organizations, it is
recommended to ask these departments to prepare informative
abstracts for their documents before issuing them.
We also recommend giving the full bibliographic
description:
(1) Author
(2) Title
(3) Source
The abstracting procedures can be carried out during
the indexing procedures and with the same philosophy.
5. ACKNOWLEDGEMENT
I wish to thank Mrs. Zahawy, Director of ALDOC,
Dr. Zehery, Chief Technical Advisor for ALDOC, Mr. Salaheih,
UNDP Representative in Tunis, Mr. Vasarhelyi, Chief,
Operational Section, PGI, UNESCO, and all the colleagues
in ALDOC who extended their support and cooperated
with me by contributing their time and ideas and by helping
me to prepare this study in its present form.
SHAWKY SALEM
Kuwait, 5th September 1982.
APPENDIX 1
FIELDS OF INTERESTS
AND
SUBJECT AREAS
FOR
"ALDOC"
1. Arab League and its organizations: documents published
by these organizations.
2. Documents discussing the Arab League and its organizations,
e.g. activities, structure, role, rules, regulations,
etc.
3. Law:
3.1 Arab, Islamic and International Laws.
3.2 Rules, regulations, decisions, recommendations and
documents of the international courts of justice.
3.3 Official gazettes published by Arab and some African
Countries.
3.4 Law journals published in Arabic, English, French,
and other languages of interest.
3.5 Law texts and reference books (law dictionaries,
encyclopedias, directories, indexes, etc.).
4. Military Science:
4.1 Peace and war with Israel and Israeli intelligence.
4.2 Settlements.
4.3 Armament and disarmament.
4.4 International and Arab military development.
4.5 Alliances, pacts, treaties, etc. (NATO, Warsaw Pact).
4.6 Military bases and presence in Arab seas and
the Indian Ocean.
4.7 Military operations and manoeuvres.
4.8 Military communication.
4.9 Military leaders in Arab, Islamic and African countries, and Israel.
4.10 Military industries and technologies including
hardware and specifications.
4.11 Foreign military interventions (Afghanistan, etc.)
4.12 Arab and foreign reactions to military interven
tions in Arab, African and Muslim countries.
4.13 Military science references and periodicals such
as defence national, space and aviation, etc.
5. Historical and political sciences:
5.1 The Palestine problem.
5.2 Water supply to Israel.
5.3 Palestinians outside Palestine and the Arab world.
5.4 Arab-Israel border disputes.
5.5 Arab nations border disputes (Iraq & Iran, Algeria
and Morocco, Libya and Tunis, etc.).
5.6 Politics and political parties in the Arab World,
and in select friendly and non-friendly nations.
5.7 Political impact of the Arab oil, business and
banking.
5.8 Human rights of the Arabs.
5.9 Views of Arab, Israeli, and Western Press on world
politics in general and Arab politics in particular.
5.10 Studies and research on the Arab World published
within and outside the Arab World.
5.11 Who's who in world politics, especially those for
or against Arabs.
5.12 Arabs abroad and brain drain in the Arab World.
5.13 Arab-African, and Arab-European dialogues.
5.14 Clippings from Arab and foreign newspapers.
5.15 History and geography of the Arab countries.
5.16 Maps of the Arab countries.
5.17 Reference and non-reference books, who's who,
dictionaries, current events, e.g. Statesman's
Year-Book, Europa Publications, Facts on File,
Keesing's Contemporary Archives, etc.
6. Economics and statistics:
6.1 Foreign trade and agreements between Arab and
foreign countries.
6.2 Demographic aspects and projections.
6.3 Statistics by subjects (e.g. agriculture, banking,
mining, fishing, etc.).
6.4 Arab statistics (book form and magnetic tapes) the
latter available from Geneva.
6.5 Proceedings and other publications of the permanent
committee on statistics.
6.6 Arab and Islamic banks and development funds.
6.7 World bank.
6.8 Arab investments abroad and foreign investments
in Arab countries.
6.9 Arab countries' budgets.
6.10 Arab census data.
6.11 Books, periodicals and references on, inter alia,
economics, management, finance, development
planning, etc.
7. International Relations:
7.1 U.N. Documents.
7.2 Documents and publications from Arab foreign
ministries.
7.3 Documents of the Organization of African Unity
(OAU) and the Organization of American States (OAS).
7.4 Arab-African and European dialogues.
7.5 Arab and non-Arab nominations to the UN (back
ground information on nominees).
7.6 Africa and its politics and leaders.
7.7 Israeli intervention and influence in Africa.
7.8 PLO publications.
7.9 African liberation movements and leaders.
7.10 African political strategies.
7.11 Who's who in Africa.
7.12 Arab resistance to Israeli Occupation.
7.13 Israeli political parties.
7.14 Autonomy rule.
7.15 Scripts of Israeli broadcasting and clippings
from Israeli newspapers.
8. Social sciences:
8.1 Arab woman in particular and women in general.
8.2 Palestinian woman.
8.3 Handicapped (mentally, physically).
8.4 Youth and Youth activities.
8.5 Immigration of Arabs and non-Arabs from and to
Arab countries.
8.6 Refugees.
8.7 Child welfare.
8.8 Social planning and development.
8.9 Social services and agencies.
9. Information and mass media:
9.1 Arab and non-Arab public and official opinion and
reaction to Arab-related issues.
9.2 Palestinian question.
9.3 Editorials and journal articles on Arab issues,
problems and concerns.
9.4 General Arab and foreign periodicals and newspapers.
9.5 Periodicals on journalism and mass media, e.g.
Journal of Communication, Public Opinion Quarterly.
9.6 Books on Arabs.
9.7 Who's who information on Arabs and others.