ISSN 0005-1055, Automatic Documentation and Mathematical Linguistics, 2013, Vol. 47, No. 3, pp. 69–92. © Allerton Press, Inc., 2013. Original Russian Text © A.N. Libkind, V.A. Markusova, I.A. Libkind, M. Jansz, K.N. Ivanov, 2013, published in Nauchno-Tekhnicheskaya Informatsiya, Seriya 2, 2013, No. 3, pp. 9–34.
INTRODUCTION
It is well known that the creation of the Science Citation Index (SCI) by Eugene Garfield at the Institute for Scientific Information (ISI, now part of Thomson Reuters) was a powerful impulse for the development of scientometrics. The accumulation of enormous files of bibliographic information at ISI and the development of increasingly powerful computer facilities made possible the creation of a new information product based on the interrelations of scientific journals: the Journal Citation Reports (JCR). The JCR, first published in 1975 and containing statistics on some 3000 scientific journals, is published annually. A special version of the JCR for the social sciences (JCR Social Science Edition) has been published since 1978. Both versions of the JCR have been accessible online since 2010 as components of the large, multidimensional information system Web of Knowledge (WoK). Today, the JCR contains more than ten quantitative characteristics for each journal. For the purposes of the present research, the most important characteristics are the number of articles and the impact factor value of each journal. The concept of the impact factor was introduced by Dr. Eugene Garfield in cooperation with Dr. Irving Sher in 1955 [1]. Discussing the SCI and the JCR, Garfield stressed that the main purpose of these resources is to provide information to researchers; he noted that, in addition, the sets of bibliographic records are an invaluable source of scientometric information [2]. Scientometrics researchers soon appreciated the opportunities offered by the JCR and increasingly used JCR data in their research. The number of studies and papers based on journal statistics from the JCR is so large that even simply listing them is not an easy task. In these circumstances, the reliability and correctness of these statistics become a topical and important problem [3–11]. These questions are even more important in view of the fact that JCR data are also used for making science and technology policy decisions.

¹ The article was translated by the authors.
In this paper we try to estimate the stability over time of some important bibliometric indicators for sets of journals which, in the opinion of the world scientific community, belong to the most authoritative sources of scientific papers in the corresponding fields of science. In other words, the aim of the article is to describe how the statistical indicators of journal sets change with time and to build a mathematical model describing this process. For this purpose we will try to form the sets of journals in such a way that each of the journal sets at a given moment in time will
Modeling the Dynamics of the Retentivity Process of Journals Among the Most Authoritative Scientific Serials¹
A. N. Libkindᵃ, V. A. Markusovaᵇ, I. A. Libkindᶜ, M. Janszᵈ, and K. N. Ivanovᵉ
ᵃ PhD, Head of department of VINITI RAS, Moscow, Russia
ᵇ Dr. Sci., Head of department of VINITI RAS, Moscow, Russia
ᶜ Programming supervisor of Financial University under the Government of the Russian Federation, Moscow, Russia
ᵈ Program Director at Technology Foundation STW, Utrecht, Netherlands
ᵉ Programming supervisor of Financial University under the Government of the Russian Federation, Moscow, Russia
e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Received January 11, 2013
Abstract: The article examines how journals are retained over time among the most authoritative sources of scientific papers. Formal concepts of retentivity, retentivity orders, and retentivity directions are introduced. The postulates formulated in the article link the probability of retention to the time interval between the two compared journal lists, to the ratio of the sizes of these lists, and to some qualitative characteristics of the journals. A mathematical model of the retentivity process is built on the basis of these postulates. The model is compared with data from the Journal Citation Reports Science Edition for a 16-year period. The results of this comparison show a high degree of conformity of the model to the real process of retentivity and reveal important features of this process.
Keywords: lists of journals; impact factor; journal output; Journal Citation Reports; journal retention process; mathematical model
DOI: 10.3103/S0005105513030011
AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS Vol. 47 No. 3 2013
LIBKIND et al.
represent practically all the fields and branches of science (social sciences and humanities excluded). After selecting these sets of world (national and international) journals, we will consider them without regard to their thematic spectrum.
BASIC CONCEPT AND DEFINITIONS
The central concept of this paper is the concept of "retentivity of journals as the most authoritative sources of scientific papers". This concept is close to the concept of "retentivity of regular sources for a given specific field", which was introduced earlier by one of the authors of this article in co-authorship with Michael V. Arapov [12, 13].
It is obvious that, on the one hand, the concept of "retentivity of journals as the most authoritative sources of scientific papers" is a kind of simplification (reduction) of the concept of "retentivity of regular sources for a given specific field". Unlike the concept in [12, 13], the concept presented in this article abstracts away from the subject of a journal and thus reduces the concept of retentivity. On the other hand, the concept of retentivity suggested here may also be regarded as an extension of the concept in [12, 13]. Namely, according to [12, 13], the probability of retention of a journal as a source for a given subject after a certain period of time depends on its productivity (the number of articles on the subject) at some initial time and on the size of the time interval. The notion of retentivity introduced in this article likewise links the retention or non-retention of a journal with its productivity and the time interval, and in addition takes into account such important characteristics as the impact factor (as an indicator of the scientific level of a journal) and the expected response (as an indicator of its degree of influence on the research carried out in the world of science; see below). With the help of this concept we will try to formalize the assumption that the majority of the journals which, in the opinion of the world scientific community at some given moment, are characterized as having a high scientific level will remain for quite a long period of time in the subset of authoritative sources of scientific papers.
We will regard the inclusion of a journal in the list of journals of the information resource "Journal Citation Reports Science Edition" (JCR SE) of Thomson Reuters as the basis for recognizing the journal as an authoritative source.
Let us consider a pair of lists $L_t$ and $L_{t+\tau}$, separated by a time interval of $\tau$ years. We will say that journal $x_i$ from list $L_t$ ($x_i \in L_t$) is retentive in list $L_{t+\tau}$ if $x_i \in L_{t+\tau}$ also holds, i.e. if the implication $x_i \in L_t \Rightarrow x_i \in L_{t+\tau}$ is true. By analogy, we will say that journal $y_j$ from list $L_{t+\tau}$ ($y_j \in L_{t+\tau}$) is retentive in list $L_t$ if $y_j \in L_t$ also holds ($y_j \in L_{t+\tau} \Rightarrow y_j \in L_t$). It is clear that under this condition the equation $n_{L_t} = n_{L_{t+\tau}}$ is always true, where $n_{L_t}$ is the number of journals from list $L_t$ that are retentive in list $L_{t+\tau}$, and $n_{L_{t+\tau}}$ is the number of journals from list $L_{t+\tau}$ that were also present in list $L_t$.
Let us characterize each of the compared lists $L_t$ and $L_{t+\tau}$ by the following four quantitative characteristics:
- the number of journals in the list $L_i$ of the given year $i$;
- the sum of the numbers of articles published in the journals from the list $L_i$ in the given year $i$;
- the sum of the values of the 2-year impact factor of the journals from the list $L_i$ in the given year $i$;
- the sum of the expected response values to the articles published in the journals from the list $L_i$ in the given year $i$. The expected response (ER) of a journal is the number of articles it published in the given year $i$ multiplied by the value of its impact factor in the same year.
Let us briefly elaborate on what we mean by the above-mentioned quantitative characteristics.
The number of journals in the list of a given year is the number of scientific periodicals and proceedings editions included in the "Master List" of the JCR SE in a given year.
The number of articles is the number of research and review articles ("citable items" in JCR terminology) which a given journal published in a given year according to the JCR SE. Our analysis of SCI-Expanded data shows that in the period 1995–2010 citable items constitute about 75% of the total number of publications in this resource.
The sum of the 2-year impact factor values of the journals. Today, JCR SE gives two values for a journal's impact factor: a 2-year impact factor and a 5-year impact factor. For an n-year impact factor, JCR SE takes into account all citations in a given year of the papers published in the journal in the previous n years. Our research covers a period starting in 1995. The values of the 2-year impact factor were regularly included in the JCR long before 1995, whereas JCR SE has given the values of the 5-year impact factor only since 2007. Hence our choice of the 2-year impact factor as one of the quantitative characteristics for this study.
The sum of the expected response values. We take the value of the expected response (ER) to the articles of a journal to be the product of the number of articles published in a given year and the value of the impact factor of the journal in the same year. The sum of the expected response values is then the sum of the ER values for all journals included in the "Master List" of JCR SE for a given year.
Let us introduce the concept of retentivity of journals by a given quantitative characteristic.
By retentivity of journals by a given quantitative characteristic we mean the ratio between the value of that characteristic for the subset of the retentive journals from list $L_i$ in list $L_j$ and the full value of the characteristic for the whole set of journals of list $L_i$.
It is important to emphasize that the value corresponding to the subset of retentive journals from the list of year $t_i$ in the list of some other year $t_j$ is always taken relative to the total value of the quantitative characteristic for the year $t_i$.
First case. Since $\tau$ is a non-negative integer, $t + \tau \ge t$ is always true ($t + \tau = t$ describes the degenerate case, in which we consider the retention of a list in itself). Accordingly, list $L_t$ dates from an earlier year than list $L_{t+\tau}$. Thus, when estimating the retentivity of journals by a given parameter from list $L_t$ in list $L_{t+\tau}$, the researcher "moves" in the direction of the "flow of time".
Second case. When looking at the retentivity of journals by a given parameter from list $L_{t+\tau}$ in list $L_t$, the researcher moves in the reverse direction, i.e. into the past. In the first case we will speak of direct retentivity, in the second of reverse retentivity. At first glance, reverse retentivity may seem of purely academic interest. However, this study shows that with the help of this concept we can address some questions that would otherwise remain unanswered.
Retentivity of the first order (retentivity by number of journals). We define the direct retentivity of the first order $q_{1\_L_{t,t+\tau}}$ of journals from list $L_t$ in list $L_{t+\tau}$ by the following relationship:

$$q_{1\_L_{t,t+\tau}} = \frac{n_{L_t}}{N_{L_t}}, \tag{1}$$

where $n_{L_t}$ is the number of journals from list $L_t$ which are present in list $L_{t+\tau}$; $N_{L_t}$ is the total number of journals in list $L_t$ (if we view list $L_t$ as a set, then $N_{L_t}$ is the cardinality of the set $L_t$: $N_{L_t} = |L_t|$); and $q_{1\_L_{t,t+\tau}}$, generally speaking, characterizes the probability of the retention of an average journal from list $L_t$ in list $L_{t+\tau}$.
We define the reverse retentivity of the first order $q_{1\_L_{t+\tau,t}}$ of journals from list $L_{t+\tau}$ in list $L_t$ by the following relationship:

$$q_{1\_L_{t+\tau,t}} = \frac{n_{L_{t+\tau}}}{N_{L_{t+\tau}}}, \tag{2}$$

where $n_{L_{t+\tau}}$ is the number of journals from list $L_{t+\tau}$ which are present in list $L_t$, and $N_{L_{t+\tau}}$ is the number of all the journals in list $L_{t+\tau}$ (if we view list $L_{t+\tau}$ as a set, then $N_{L_{t+\tau}}$ is the cardinality of the set $L_{t+\tau}$: $N_{L_{t+\tau}} = |L_{t+\tau}|$).
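The first-order definitions can be illustrated with a minimal sketch that treats each list as a set of journal identifiers. The function name and the sample identifiers are hypothetical and serve only as an illustration; they are not taken from the JCR.

```python
def first_order_retentivity(list_t, list_t_tau):
    """Return (direct, reverse) first-order retentivity for two journal lists."""
    earlier = set(list_t)        # L_t
    later = set(list_t_tau)      # L_{t+tau}
    if not earlier or not later:
        raise ValueError("both lists must be non-empty")
    retained = earlier & later   # journals present in both lists
    n = len(retained)            # the same count is used in both directions
    direct = n / len(earlier)    # share of L_t retained in L_{t+tau}
    reverse = n / len(later)     # share of L_{t+tau} already present in L_t
    return direct, reverse


# Hypothetical identifiers, invented for illustration only
l_1995 = {"J1", "J2", "J3", "J4"}
l_2000 = {"J2", "J3", "J4", "J5", "J6"}
print(first_order_retentivity(l_1995, l_2000))  # (0.75, 0.6)
```

Because the number of shared journals is the same in both directions, the two values differ only through the sizes of the lists.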
Retentivity of the second order (retentivity by the sum of the number of articles). If $U_t$ is the number of articles published by all the journals from list $L_t$ in year $t$, and $u_t$ is the number of articles published in the same year $t$ by those journals from list $L_t$ which in year $t+\tau$ (i.e. $\tau$ years later) are also present in list $L_{t+\tau}$, then the direct retentivity of journals of the second order $q_{2\_L_{t,t+\tau}}$ is defined by the following relationship:

$$q_{2\_L_{t,t+\tau}} = \frac{u_t}{U_t}. \tag{3}$$

With $U_{t+\tau}$ being the number of articles published by all the journals from list $L_{t+\tau}$ in year $t+\tau$, and $u_{t+\tau}$ the number of articles published in the same year $t+\tau$ only by those journals from list $L_{t+\tau}$ which in year $t$ (i.e. $\tau$ years earlier) are present in list $L_t$, the reverse retentivity of journals of the second order $q_{2\_L_{t+\tau,t}}$ is defined by the following relationship:

$$q_{2\_L_{t+\tau,t}} = \frac{u_{t+\tau}}{U_{t+\tau}}. \tag{4}$$

As in formula (2), $n_{L_{t+\tau}}$ is the number of journals from list $L_{t+\tau}$ present in list $L_t$, and $N_{L_{t+\tau}}$ is the number of journals in list $L_{t+\tau}$ (if we view list $L_{t+\tau}$ as a set, $N_{L_{t+\tau}}$ is its cardinality: $N_{L_{t+\tau}} = |L_{t+\tau}|$).
The relationship presented in formula (3) characterizes the ratio of the contribution of those journals from list $L_t$ which are present (are retentive) in list $L_{t+\tau}$ to the total number of articles published in year $t$ by all the journals from list $L_t$. Similarly, the relationship presented in formula (4) characterizes the ratio of the contribution of those journals from list $L_{t+\tau}$ which are present in list $L_t$ for year $t$ to the total number of articles published by all the journals from list $L_{t+\tau}$ in year $t+\tau$.
Retentivity of the third order (retentivity by the sum of the impact factor values). If $W_t$ is the total sum of 2-year impact factor values of all the journals from list $L_t$ for year $t$, and $w_t$ is the sum of the 2-year impact factor values in the same year $t$ of those journals from list $L_t$ which are present (are retentive) in list $L_{t+\tau}$ for year $t+\tau$ (i.e. $\tau$ years later), then the direct retentivity of journals of the third order is defined by the following relationship:

$$q_{3\_L_{t,t+\tau}} = \frac{w_t}{W_t}. \tag{5}$$

With $W_{t+\tau}$ being the total sum of 2-year impact factor values of all the journals from list $L_{t+\tau}$ for year $t+\tau$, and $w_{t+\tau}$ the sum of the 2-year impact factor values in the same year $t+\tau$ of those journals from list $L_{t+\tau}$ which are present (are retentive) in list $L_t$ for year $t$ (i.e. $\tau$ years earlier), the reverse retentivity of journals of the third order is defined by the following relationship:

$$q_{3\_L_{t+\tau,t}} = \frac{w_{t+\tau}}{W_{t+\tau}}. \tag{6}$$

The parameter $q_{3\_L_{t,t+\tau}}$ characterizes the relative contribution of the 2-year impact factor values of those journals from list $L_t$ which are present (are retentive) in list $L_{t+\tau}$ to the total sum of 2-year impact factor values of the journals in list $L_t$. Similarly, the parameter $q_{3\_L_{t+\tau,t}}$ corresponds to the relative contribution of the 2-year impact factor values of those journals in list $L_{t+\tau}$ which are also present in list $L_t$ to the total sum of 2-year impact factor values of the journals in list $L_{t+\tau}$.
Retentivity of the fourth order (retentivity by the sum of the expected response values). We will consider the expected response (ER) to the articles of a journal as a measure of the influence of that journal on global research and define the ER as the product of the number of articles published in journal $j$ in a given year $t$ and the 2-year impact factor value of this journal in the same year $t$:

$$ER_j^t = f_j^t \, IF_j^t, \tag{7}$$

where $f_j^t$ is the number of articles published by journal $j$ in year $t$; $IF_j^t$ is the 2-year impact factor of journal $j$ in year $t$; and $ER_j^t$ is the expected response to the articles published in journal $j$ in year $t$.
The sum of expected responses to all articles published in all the journals in list $L_t$ in year $t$ is:

$$S\_ER_t^{t} = \sum_{j=1}^{N_{L_t}} ER_j^t, \tag{8}$$

where $N_{L_t}$ is the total number of journals in list $L_t$.
The sum of expected response values $S\_ER_t^{t,t+\tau}$ in year $t$ for only those journals from list $L_t$ which are also present in list $L_{t+\tau}$ for the year $t+\tau$ is given by:

$$S\_ER_t^{t,t+\tau} = \sum_{j=1}^{n_{L_t}} ER_j^t, \tag{9}$$

where $n_{L_t}$ is the number of journals from list $L_t$ which are also present in list $L_{t+\tau}$ for the year $t+\tau$.
Then we can write the direct retentivity of journals of the fourth order as:

$$q_{4\_L_{t,t+\tau}} = \frac{S\_ER_t^{t,t+\tau}}{S\_ER_t^{t}}, \tag{10}$$

and the reverse retentivity of journals of the fourth order as:

$$q_{4\_L_{t+\tau,t}} = \frac{S\_ER_{t+\tau}^{t+\tau,t}}{S\_ER_{t+\tau}^{t+\tau}}, \tag{11}$$

where $S\_ER_{t+\tau}^{t+\tau}$ is the sum of expected response values to all articles published in all journals from list $L_{t+\tau}$ for year $t+\tau$, and $S\_ER_{t+\tau}^{t+\tau,t}$ is the sum of expected response values in year $t+\tau$ for only those journals from list $L_{t+\tau}$ which are also present in list $L_t$ for year $t$.
SOURCE DATA AND FORMING PAIRS OF LISTS
As the data source we used the annual editions of JCR SE published during the 16-year period 1995–2010 (with the exception of 2001, because the JCR SE edition for 2001 was unavailable to us). In these editions of JCR SE we were interested in the following data:
- the journals in each year (as the identifier of a journal we used either its title² or its ISSN);
- the number of articles published by each journal in each year;
- the value of the 2-year impact factor of each journal in each year.
In total, 15 lists of journals were extracted with the corresponding data for each of the four quantitative characteristics, and pairs of lists to be compared were formed. The number of pairs for each of the quantitative characteristics is equal to the number of permutations:

$$A_n^m = \frac{n!}{(n-m)!}.$$

In our case $n = 15$ (the number of lists) and $m = 2$ (the lists are compared in pairs). Thus, the number of pairs of lists we obtain for each of the four quantitative characteristics (counting direct and reverse retentivity together) is:

$$A_{15}^{2} = \frac{15!}{(15-2)!} = 210.$$

The total number of comparisons is 840 (210 × 4).
Let us specify how each pair of lists and/or the corresponding numerical values were created, depending on the order of retentivity. This clarification is necessary for the following reasons. In some cases, JCR SE gives the number of articles and/or the impact factor for a particular journal in one list of a given pair, while in the second list these values are not indicated. Sometimes the JCR states that the corresponding values are zero. In the case of the number of articles, this situation is probably due to the fact that, at the time the next JCR SE issue was prepared, the specialists at Thomson Reuters, for one reason or another, did not have information about the number of articles published by the journal.

² We did not track the cases in which a journal changed its title, became a part of another journal, was divided into several independent publications, or ceased to exist. An assessment of the frequency of such cases shows that the share of those journals is usually less than 0.5–1% of the total number of journals in the list of the corresponding year.
The situation is clearer in the case of the impact factor. Collecting the statistics needed to calculate the impact factor (the number of references to the papers of the journal) requires at least 2 years of observation. If the journal was included in the JCR SE for the first time, such statistics do not exist. As a result, in JCR SE the impact factor value for the journal is either absent or given as zero. Our analysis showed that the share of journals for which the number of articles is absent or equal to 0 is in the range of 0.5–3%. The corresponding share for the impact factor is in the range of 1–3.5%. In some cases of missing data, we were able to restore the number of articles for a given journal. This was done by using the JCR SE issue for the year following the issue in which the number of articles for the journal was not indicated or was given as zero. This procedure was possible because the number of articles published in a journal in the previous year ($t - 1$) is used in the JCR SE (in year $t$) for calculating the impact factor of the journal in year $t$. Unfortunately, it was not possible to restore the number of articles for all such journals. The procedure is quite time-consuming, since in each of the 15 lists the information for one hundred or more journals needed to be restored. In addition, the journal could be missing entirely from the list of journals in the JCR issue for year $t + 1$.
Pairs of lists were created as follows:
- For retentivity by number of journals (retentivity of the first order): list $L_t$ includes without exception all journals present in JCR SE for the year $t$; similarly, list $L_{t+\tau}$ includes all journals present in JCR SE for the year $t+\tau$. That is, no journal was excluded from any of the 15 lists.
- For retentivity by the sum of the number of articles (retentivity of the second order): a journal $j$ was excluded from both lists $L_t$ and $L_{t+\tau}$ of the pair if, for one of them, the number of articles in this journal was missing or equal to zero.
- For retentivity by the sum of impact factor values (retentivity of the third order): a journal $j$ was excluded from both lists $L_t$ and $L_{t+\tau}$ of the pair if, for one of them, the impact factor value of the journal was missing or equal to zero.
- For retentivity by the sum of the expected response values (retentivity of the fourth order): a journal $j$ was excluded from both lists $L_t$ and $L_{t+\tau}$ of the pair if, for one of them, the number of articles and/or the impact factor value of the journal was missing or equal to zero.
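The exclusion rules above, combined with the definition of retentivity by a quantitative characteristic, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: each list is assumed to be a mapping from a journal identifier to a pair (number of articles, impact factor), with `None` or 0 marking a missing value, and a journal with an unusable value in either list is dropped from both. All function names are hypothetical.

```python
def usable(record, order):
    """True if a journal record (articles, impact_factor) can be used."""
    articles, impact = record
    if order == 2:
        return bool(articles)
    if order == 3:
        return bool(impact)
    if order == 4:
        return bool(articles) and bool(impact)
    raise ValueError("order must be 2, 3 or 4")


def value(record, order):
    """Quantitative characteristic of one journal for the given order."""
    articles, impact = record
    if order == 2:
        return articles
    if order == 3:
        return impact
    return articles * impact  # order 4: expected response (ER)


def retentivity(list_a, list_b, order):
    """Retentivity of order 2, 3 or 4 of journals from list_a in list_b."""
    # Assumed rule: a journal unusable in either list leaves both lists.
    bad = {j for lst in (list_a, list_b)
           for j, rec in lst.items() if not usable(rec, order)}
    a = {j: rec for j, rec in list_a.items() if j not in bad}
    kept_b = set(list_b) - bad
    total = sum(value(rec, order) for rec in a.values())
    if total == 0:
        raise ValueError("list_a has no usable journals")
    retained = sum(value(rec, order) for j, rec in a.items() if j in kept_b)
    return retained / total
```

With this sketch, direct retentivity is `retentivity(l_t, l_t_tau, order)` and reverse retentivity is `retentivity(l_t_tau, l_t, order)`: the values are always taken from the list whose retention is being studied.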
BUILDING THE MODEL AND COMPUTATION OF ITS PARAMETERS
Let us formulate the following postulates:
Postulate 1. The probability of retentivity of journals from list $L_i$ in list $L_j$, which are separated by the time interval $\tau$, depends mainly on the value of $\tau$ and on the relationship between the numbers of journals $N_i$ and $N_j$ in the compared lists.
Postulate 2. The retentivity of journals from list $L_i$ in list $L_j$, which are separated by the time interval $\tau$, is inversely dependent on $\tau$.
It is obvious that, if all other conditions are equal, the probability of retentivity of some journal from a "shorter" list (a list that includes a smaller number of journals) in a "longer" list (a list that includes a larger number of journals) is higher. Conversely, this probability is lower in the opposite case. This leads us to formulate a third postulate:
Postulate 3. If all other conditions are equal (i.e. with a fixed value of $\tau$) for different pairs of lists to be compared, the probability of the retentivity of journals from list $L_i$ in list $L_j$ grows when $N_i / N_j < 1$ and decreases when $N_i / N_j > 1$.
NB: Because in reality we observe a regular growth over time in the number of journals included in the lists, the condition $N_i / N_j < 1$ will generally correspond to direct retentivity, and the condition $N_i / N_j > 1$ to reverse retentivity.
And finally:
Postulate 4. Postulates 1–3 are true not only for the retentivity of lists of journals, but also for the retentivity with regard to the other characteristics: the total number of articles, the sum of impact factor values, and the sum of expected response values.
Let us briefly discuss the suggested postulates. Because postulate 1 only gives general presumptions, which would be verified if postulates 2 and 3 are found to be true, it makes sense to discuss its truthfulness only after discussing postulates 2 and 3. In our opinion, postulate 2 can hardly be doubted: common sense tells us that the larger the time interval between the compared lists, the lower the probability of retention of journals from one list in the other. Postulate 3 also looks quite natural to us. Indeed, the probability of retentivity of some part of a shorter list in a longer list (a list with a larger number of journals) must be higher than the probability of retentivity of some part of a longer list in a shorter list.
There can be doubt about postulate 4, which says that all three previous postulates are also true for the other numerical characteristics of retentivity. We will examine the validity of these postulates against the empirical retentivity data.
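For first-order retentivity, postulate 3 can be illustrated directly from the definitions. The short derivation below is offered only as an illustration and is not part of the authors' model; it uses the definitions of direct and reverse first-order retentivity and the fact, noted earlier, that the number of retained journals is the same in both directions, and it assumes that at least one journal is retained.

```latex
q_{1\_L_{t,t+\tau}} = \frac{n_{L_t}}{N_{L_t}}, \qquad
q_{1\_L_{t+\tau,t}} = \frac{n_{L_{t+\tau}}}{N_{L_{t+\tau}}}, \qquad
n_{L_t} = n_{L_{t+\tau}}
\;\Longrightarrow\;
\frac{q_{1\_L_{t,t+\tau}}}{q_{1\_L_{t+\tau,t}}} = \frac{N_{L_{t+\tau}}}{N_{L_t}}.
```

So whenever the earlier list is the shorter one ($N_{L_t} < N_{L_{t+\tau}}$), direct retentivity exceeds reverse retentivity for the same pair of lists, which agrees with postulate 3.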
It should be noted that the postulates, and the mathematical model created on the basis of these postulates, are only an approximation describing the process of retentivity. Indeed, these postulates do not consider irregularities in the growth of the number of journals and of the papers published in them. For example, they do not take into account some external, socio-economic factors, in particular: a change in the world economy, which could affect the intensity of research and, therefore, the number and productivity of scientific journals; possible changes in the general rules by which the lists of journals in JCR SE are compiled; and sharp changes of emphasis and trends in the direction of basic research (the influence of possible scientific revolutions). The degree to which the model and the postulates describe the real process of retentivity can indicate the level of influence of these factors.
We now turn to the creation of a mathematical model for the process of retentivity. The different orders of retentivity will be denoted as $k_h$, where $h$ takes the values 1, 2, 3 and 4.
It is obvious that retentivity is an estimator of the probability of retention with respect to a given quantitative characteristic. From postulates 1–3 it follows that the retentivity for a given quantitative characteristic depends mainly on the following two variables:
- The time interval $\tau_{k_h}$ between a pair of lists for a given $k_h$, where $\tau_{k_h} = |t_i - t_j|$, $t_i$ is the time (year) corresponding to list $i$, and $t_j$ is the time corresponding to list $j$.
- The ratio between the values of the characteristic in the two lists $L_i$ and $L_j$ (for a given order of retentivity $k_h$).
Based on the above, we can write the two-parameter regression equation:

$$y = a x_1 + b x_2 + c. \tag{12}$$

In this equation let us interpret the variables and the coefficients of the regressors in line with our goal (the interpretation of the intercept term $c$ in equation (12) will be discussed after completion of the model). The regressand $y$ in equation (12) will be interpreted as the remaining share (retentivity) $p_{k_h}$ of the value of a quantitative characteristic $R_{k_h}$ corresponding to a given retentivity order $k_h$.
According to postulate 2, an increase of the time interval leads to a drop in retentivity. Hence, the regressor $x_1$ can be interpreted as a function of the time interval $\tau$ between the lists being compared. Since there is no certainty that this is a linear function, we set $x_1 = \tau^{\gamma}$. The coefficient $a$ of $x_1$ (i.e. the coefficient $a$ of $\tau^{\gamma}$) in the case $\gamma = 1$ can be interpreted as the decrease in retentivity during 1 year, i.e. when the time interval between the lists is 1 year ($\tau = 1$).
Based on postulate 3, the regressor $x_2$ in equation (12) will be considered as a variable that reflects the influence of the ratio of the values $R_{t,k_h}$ and $R_{t+\tau,k_h}$ on the retentivity changes for lists $L_t$ and $L_{t+\tau}$. The coefficient $b$ of $x_2$ is then the constant that describes the strength of this influence. The values of the quantitative characteristics change over time: if we move from the past towards the present, these values tend to increase, and vice versa, they decrease when we move backwards in time. Generally speaking, there is no reason to suppose that the rate of change of the values in time is constant, i.e. we may assume that the rate of change is time-dependent and that this dependence is not necessarily linear. For this reason we can consider the variable $x_2$ as an analogue of acceleration, which characterizes the rate of change of retentivity in time. Based on this assumption, we denote $x_2$ as the "acceleration" $\nu_{R_{k_h}}$. Obviously, $\nu_{R_{k_h}}$ will have a positive or negative value depending on the direction in time: in the case of direct retentivity $\nu_{R_{k_h}}$ will be positive, in the case of reverse retentivity negative. We will denote the acceleration for direct retentivity by $\nu_{R_{k_h},direct}$ and for reverse retentivity by $\nu_{R_{k_h},reverse}$.
Based on the above considerations, we can write for direct retentivity

$$\nu_{R_{k_h},direct} = \frac{R_{t+\tau,k_h} - R_{t,k_h}}{R_{t,k_h}\,\tau^{\beta} + \lambda}, \tag{13}$$

and for reverse retentivity

$$\nu_{R_{k_h},reverse} = \frac{R_{t,k_h} - R_{t+\tau,k_h}}{R_{t+\tau,k_h}\,\tau^{\beta} + \lambda}. \tag{14}$$

Here $R_{t,k_h}$ and $R_{t+\tau,k_h}$ are the values of the given characteristic for the pair of lists $L_t$ and $L_{t+\tau}$ for a given order of retentivity $k_h$. The exponent $\beta$ of $\tau$ in formulas (13) and (14) is introduced on the basis of the assumption made above that the changes of the numerical values $R_{t,k_h}$ and $R_{t+\tau,k_h}$ are time-dependent and that this dependence is not necessarily linear. We introduced the constant $\lambda = 1$ only for the stability of the model in the "degenerate case", i.e. in the case $\tau = 0$. The value $\lambda = 1$ was chosen because, in normal circumstances, it should have practically no effect on the values of $\nu_{R_{k_h}}$ (the values of $R_{t,k_h}$ and $R_{t+\tau,k_h}$ are 3–6 orders of magnitude larger than the value of $\lambda$).
Now we can write the final form of the equation for direct retentivity as follows:

$$p_{direct,k_h} = a\,\tau^{\gamma} + b\,\frac{R_{t+\tau,k_h} - R_{t,k_h}}{R_{t,k_h}\,\tau^{\beta} + \lambda} + c, \tag{15}$$

and for reverse retentivity:

$$p_{reverse,k_h} = a\,\tau^{\gamma} + b\,\frac{R_{t,k_h} - R_{t+\tau,k_h}}{R_{t+\tau,k_h}\,\tau^{\beta} + \lambda} + c. \tag{16}$$
We will consider these equations as the mathematical model of the retentivity process.
NB. In equations (15) and (16) the free term $c$ would be equal to 1 only in the case $\tau = 0$, that is, when a list of journals is retained in itself (degenerate case): in this case the probability of retentivity is equal to 1. In all other cases $c = 1$ is not quite correct. Indeed, as mentioned above, the proposed postulates do not account for the influence of several socio-economic factors, so neither does this model, which is based on those postulates. It can be assumed that the value $c - 1$ will assess the degree of influence of these unaccounted factors on retentivity.
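The structure of the model (a power-law time term, an "acceleration" term, and a free term) can be sketched in a few lines. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, and the placement of $\lambda$ (added to the denominator as a whole, so that it guards against division by zero at $\tau = 0$ while staying negligible next to the values of the characteristic) is an assumption of this sketch.

```python
def acceleration(r_from, r_to, tau, beta, lam=1.0):
    """'Acceleration' term: change of a characteristic, damped by tau**beta.

    r_from is the value for the list whose retention is studied and
    r_to the value for the list it is compared with.
    Assumption: lam is added to the whole denominator, so tau = 0
    does not cause a division by zero.
    """
    return (r_to - r_from) / (r_from * tau ** beta + lam)


def model_retentivity(r_from, r_to, tau, a, b, c, gamma, beta, lam=1.0):
    """General form of the model: p = a * tau**gamma + b * nu + c."""
    nu = acceleration(r_from, r_to, tau, beta, lam)
    return a * tau ** gamma + b * nu + c
```

For direct retentivity one would pass the value for the earlier list as `r_from` and the value for the later list as `r_to`; for reverse retentivity the two are swapped, which makes the acceleration term negative when the characteristic grows over time. In the degenerate case (`tau = 0`, identical values) the function returns `c`.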
COMPARISON OF THE MODEL WITH THE OBSERVED RETENTIVITY PROCESS
We compared the mathematical model of the retentivity process with the empirical data in an iterative process, using a specially developed computer program (in T-SQL). Assuming a chi-square value of 0.03 ($\chi^2 = 0.03$), we obtained, for each of the four orders of direct and reverse retentivity, the coefficients and exponents for the first two terms of equations (15) and (16), as well as the values of the free terms.
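The fitting procedure itself is not described beyond the statement above, so the following is not the authors' T-SQL program. It is a minimal sketch of one way such parameters could be estimated: a grid search over the two exponents, with ordinary least squares for the coefficients at each grid point, using the sum of squared residuals rather than a chi-square criterion. The function names, the exponent grid, and the form of the acceleration term (with the constant added to the whole denominator) are assumptions of this sketch.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * 3
    for i in (2, 1, 0):
        s = sum(a[i][k] * x[k] for k in range(i + 1, 3))
        x[i] = (a[i][3] - s) / a[i][i]
    return x


def fit(observations, exponents=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0), lam=1.0):
    """Grid search over (gamma, beta); linear least squares for (a, b, c).

    observations: iterable of (tau, r_from, r_to, observed_retentivity).
    Returns (sse, a, b, c, gamma, beta) for the best exponent pair.
    """
    obs = list(observations)
    ys = [p for *_, p in obs]
    best = None
    for gamma in exponents:
        for beta in exponents:
            # Regressors: tau**gamma, the acceleration term, and a constant.
            rows = [(tau ** gamma,
                     (r_to - r_from) / (r_from * tau ** beta + lam),
                     1.0) for tau, r_from, r_to, _ in obs]
            # Normal equations (X^T X) w = X^T y
            xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
                   for i in range(3)]
            xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
            try:
                a, b, c = solve3(xtx, xty)
            except ZeroDivisionError:
                continue  # singular system for this exponent pair
            sse = sum((a * r[0] + b * r[1] + c - y) ** 2
                      for r, y in zip(rows, ys))
            if best is None or sse < best[0]:
                best = (sse, a, b, c, gamma, beta)
    return best
```

Each observation would correspond to one pair of lists: the time interval, the two values of the characteristic, and the observed retentivity. The grid of exponents is arbitrary and can be refined around the best pair.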
Substituting the parameters thus obtained into equations (15) and (16), we can write equations (17)–(24), which together constitute the mathematical model of the retentivity process.
Direct retentivity of the first order (direct retentivity by number of journals):

$$p_{\mathrm{direct},1} = -0.034\,\tau^{0.8} + 0.354\,\frac{R_{t+\tau,1} - R_{t,1}}{\tau^{0.8} R_{t,1} + 1} + 0.993. \qquad (17)$$

Reverse retentivity of the first order (reverse retentivity by number of journals):

$$p_{\mathrm{reverse},1} = -0.138\,\tau^{0.5} + 1.411\,\frac{R_{t,1} - R_{t+\tau,1}}{\tau^{0.5} R_{t+\tau,1} + 1} + 1.149. \qquad (18)$$

Direct retentivity of the second order (direct retentivity by the sum of the number of articles):

$$p_{\mathrm{direct},2} = -0.027\,\tau^{0.8} + 0.798\,\frac{R_{t+\tau,2} - R_{t,2}}{\tau^{0.8} R_{t,2} + 1} + 0.972. \qquad (19)$$

Reverse retentivity of the second order (reverse retentivity by the sum of the number of articles):

$$p_{\mathrm{reverse},2} = -0.046\,\tau^{0.8} - 0.099\,\frac{R_{t,2} - R_{t+\tau,2}}{\tau^{0.8} R_{t+\tau,2} + 1} + 1.015. \qquad (20)$$

Direct retentivity of the third order (direct retentivity by the sum of the impact-factor values):

$$p_{\mathrm{direct},3} = -0.029\,\tau^{0.8} + 0.432\,\frac{R_{t+\tau,3} - R_{t,3}}{\tau^{0.8} R_{t,3} + 1} + 0.977. \qquad (21)$$

Reverse retentivity of the third order (reverse retentivity by the sum of the impact-factor values):

$$p_{\mathrm{reverse},3} = -0.056\,\tau^{0.8} - 0.321\,\frac{R_{t,3} - R_{t+\tau,3}}{\tau^{0.8} R_{t+\tau,3} + 1} + 0.991. \qquad (22)$$

Direct retentivity of the fourth order (direct retentivity by the sum of the expected response values):

$$p_{\mathrm{direct},4} = -0.016\,\tau^{0.8} + 0.279\,\frac{R_{t+\tau,4} - R_{t,4}}{\tau^{0.8} R_{t,4} + 1} + 0.987. \qquad (23)$$

Reverse retentivity of the fourth order (reverse retentivity by the sum of the expected response values):

$$p_{\mathrm{reverse},4} = -0.039\,\tau^{0.8} - 0.178\,\frac{R_{t,4} - R_{t+\tau,4}}{\tau^{0.8} R_{t+\tau,4} + 1} + 1.019. \qquad (24)$$

EVALUATION OF THE MODEL AND POSTULATES IN VIEW OF THEIR COMPLIANCE WITH EMPIRICAL DATA
The observed and model values of retentivity are presented in Tables 1–4. The comparison shows a high degree of agreement between observed and model values.
Let us consider the degree of compliance of the model and postulates with the empirical data in general. Table 1 contains data on retentivity by number of journals, Table 2 on retentivity by the sum of the number of articles, Table 3 on retentivity by the sum of the impact-factor values, and Table 4 on retentivity by the sum of the expected response values. Each of these tables is a matrix that contains three types of data. In the next paragraph we describe Table 1 in detail; the structure of the other tables and the types of data they contain are identical.
Each of the rows in the table corresponds to a particular year and is divided into three sub-rows. Each of the columns in the table also corresponds to a year. Each intersection of a row with a column corresponds to a particular pair of lists. E.g., if a cell is in the row for the year 2000 and in the column for the year 2007, the data relate to the comparison of the lists for 2000 and 2007. Each of these intersections is subdivided into three cells. The lower cell is filled only on the diagonal and gives the number of journals for the
Table 1. Retentivity by number of journals (retentivity of the first order)*
Years 1995 1996 1997 1998 1999 2000 2002 2003 2004 2005 2006 2007 2008 2009 2010
1995
1.000 0.965 0.935 0.918 0.896 0.871 0.827 0.807 0.795 0.783 0.765 0.753 0.744 0.736 0.736
1.000 0.971 0.949 0.938 0.914 0.893 0.852 0.832 0.814 0.796 0.779 0.764 0.748 0.738 0.727
4623
1996
0.933 1.000 0.966 0.945 0.921 0.894 0.849 0.828 0.815 0.802 0.784 0.772 0.765 0.756 0.755
0.964 1.000 0.973 0.964 0.925 0.912 0.870 0.850 0.830 0.813 0.795 0.780 0.764 0.753 0.742
4779
1997
0.869 0.928 1.000 0.973 0.946 0.921 0.875 0.853 0.838 0.825 0.807 0.796 0.788 0.779 0.778
0.883 0.956 1.000 0.995 0.958 0.933 0.888 0.867 0.847 0.829 0.811 0.795 0.779 0.768 0.757
4963
1998
0.774 0.824 0.863 1.000 0.966 0.936 0.885 0.860 0.844 0.829 0.810 0.798 0.787 0.778 0.777
0.782 0.826 0.881 1.000 0.964 0.942 0.899 0.878 0.858 0.840 0.822 0.807 0.790 0.779 0.768
5464
1999
0.744 0.791 0.846 0.952 1.000 0.963 0.910 0.885 0.867 0.853 0.832 0.819 0.808 0.799 0.798
0.753 0.795 0.848 0.990 1.000 0.968 0.920 0.898 0.877 0.859 0.840 0.824 0.808 0.797 0.785
5550
2000
0.705 0.749 0.802 0.899 0.939 1.000 0.940 0.914 0.894 0.878 0.856 0.842 0.830 0.821 0.818
0.720 0.758 0.805 0.914 0.976 1.000 0.941 0.917 0.896 0.877 0.858 0.841 0.825 0.814 0.802
5684
2002
0.647 0.688 0.737 0.822 0.857 0.909 1.000 0.967 0.944 0.926 0.904 0.888 0.876 0.865 0.862
0.668 0.701 0.741 0.822 0.863 0.920 1.000 0.961 0.937 0.916 0.896 0.879 0.861 0.850 0.838
5876
2003
0.628 0.667 0.714 0.794 0.829 0.878 0.962 1.000 0.974 0.955 0.933 0.917 0.903 0.892 0.888
0.648 0.680 0.717 0.792 0.829 0.878 0.998 1.000 0.963 0.940 0.917 0.900 0.881 0.871 0.859
5907
2004
0.612 0.649 0.695 0.771 0.805 0.851 0.929 0.964 1.000 0.978 0.953 0.936 0.922 0.910 0.906
0.627 0.657 0.693 0.761 0.795 0.839 0.938 0.997 1.000 0.966 0.940 0.922 0.903 0.892 0.880
5968
2005
0.591 0.627 0.671 0.743 0.776 0.819 0.894 0.926 0.959 1.000 0.972 0.955 0.941 0.928 0.924
0.603 0.632 0.665 0.726 0.759 0.798 0.881 0.924 0.983 1.000 0.963 0.945 0.924 0.914 0.902
6088
2006
0.571 0.605 0.648 0.717 0.779 0.788 0.861 0.893 0.923 0.960 1.000 0.980 0.965 0.951 0.946
0.583 0.610 0.642 0.701 0.729 0.765 0.840 0.876 0.922 0.993 1.000 0.974 0.949 0.939 0.926
6166
2007
0.538 0.572 0.613 0.677 0.706 0.744 0.812 0.843 0.870 0.904 0.941 1.000 0.982 0.967 0.962
0.555 0.580 0.610 0.664 0.689 0.722 0.786 0.816 0.852 0.901 0.954 1.000 0.970 0.963 0.949
6427
2008
0.517 0.549 0.589 0.648 0.676 0.712 0.777 0.805 0.831 0.866 0.899 0.953 1.000 0.982 0.975
0.532 0.556 0.584 0.634 0.658 0.687 0.746 0.772 0.803 0.844 0.885 0.970 1.000 0.998 0.978
6620
2009
0.460 0.489 0.525 0.578 0.602 0.635 0.692 0.717 0.739 0.769 0.798 0.846 0.885 1.000 0.984
0.491 0.513 0.538 0.581 0.602 0.628 0.677 0.698 0.722 0.752 0.779 0.829 0.871 1.000 0.994
7347
2010
0.419 0.445 0.477 0.525 0.547 0.576 0.627 0.649 0.670 0.697 0.723 0.766 0.799 0.895 1.000
0.457 0.477 0.500 0.538 0.557 0.580 0.623 0.641 0.661 0.685 0.706 0.744 0.774 0.884 1.000
8073
* For each intersection the top cell contains the value of the observed retentivity, the middle cell the model value for the comparison of the relevant years' lists. On the diagonal, the number of journals in the JCR SE release for that year is given in the bottom cell in bold.
Table 2. Retentivity by the sum of number of articles (retentivity of the second order)*
Years 1995 1996 1997 1998 1999 2000 2002 2003 2004 2005 2006 2007 2008 2009 2010
1995
1.000 0.981 0.960 0.934 0.914 0.893 0.868 0.854 0.848 0.843 0.822 0.820 0.817 0.811 0.808
1.000 0.977 0.947 0.934 0.918 0.903 0.874 0.865 0.860 0.852 0.835 0.830 0.819 0.813 0.808
607049
1996
0.964 1.000 0.978 0.945 0.924 0.902 0.879 0.864 0.857 0.852 0.831 0.826 0.823 0.816 0.814
0.973 1.000 0.951 0.944 0.928 0.913 0.885 0.875 0.871 0.863 0.846 0.840 0.829 0.823 0.818
631137
1997
0.930 0.963 1.000 0.967 0.948 0.925 0.901 0.886 0.880 0.874 0.854 0.851 0.848 0.841 0.839
0.938 0.970 1.000 0.971 0.951 0.933 0.902 0.892 0.888 0.880 0.862 0.857 0.845 0.839 0.833
635386
1998
0.888 0.917 0.951 1.000 0.979 0.954 0.928 0.911 0.905 0.899 0.876 0.874 0.867 0.862 0.859
0.908 0.938 0.972 1.000 0.964 0.946 0.914 0.905 0.901 0.893 0.874 0.869 0.857 0.851 0.845
656014
1999
0.860 0.889 0.921 0.967 1.000 0.976 0.949 0.931 0.923 0.917 0.894 0.890 0.885 0.879 0.876
0.880 0.908 0.939 0.972 1.000 0.963 0.929 0.920 0.917 0.908 0.889 0.884 0.871 0.865 0.859
671466
2000
0.830 0.856 0.890 0.932 0.965 1.000 0.967 0.951 0.941 0.934 0.908 0.905 0.900 0.893 0.891
0.853 0.879 0.908 0.938 0.971 1.000 0.945 0.936 0.935 0.926 0.904 0.899 0.886 0.880 0.874
686146
2002
0.784 0.811 0.844 0.882 0.914 0.943 1.000 0.981 0.970 0.963 0.933 0.930 0.925 0.917 0.914
0.802 0.826 0.853 0.879 0.908 0.938 1.000 0.979 0.980 0.967 0.939 0.934 0.920 0.912 0.906
716304
2003
0.767 0.794 0.827 0.865 0.894 0.924 0.975 1.000 0.985 0.978 0.948 0.946 0.940 0.932 0.928
0.778 0.802 0.827 0.853 0.880 0.908 0.973 1.000 0.997 0.986 0.953 0.948 0.932 0.925 0.919
747060
2004
0.737 0.762 0.794 0.830 0.880 0.910 0.959 0.959 1.000 0.990 0.936 0.933 0.927 0.918 0.916
0.754 0.778 0.803 0.828 0.854 0.881 0.942 0.976 1.000 0.989 0.952 0.925 0.937 0.930 0.924
802988
2005
0.710 0.736 0.768 0.803 0.859 0.888 0.937 0.937 0.978 1.000 0.939 0.936 0.930 0.920 0.918
0.731 0.755 0.779 0.803 0.828 0.855 0.911 0.942 0.975 1.000 0.948 0.960 0.945 0.940 0.935
847143
2006
0.717 0.744 0.777 0.812 0.838 0.867 0.918 0.943 0.958 0.977 1.000 0.994 0.986 0.976 0.974
0.708 0.731 0.755 0.778 0.803 0.828 0.882 0.910 0.939 0.970 1.000 0.996 0.976 0.968 0.961
856476
2007
0.685 0.712 0.743 0.777 0.8905 0.833 0.884 0.911 0.925 0.946 0.967 1.000 0.991 0.981 0.978
0.686 0.709 0.732 0.755 0.779 0.804 0.855 0.882 0.910 0.940 0.976 1.000 0.974 0.973 0.968
914264
2008
0.668 0.695 0.725 0.759 0.784 0.813 0.865 0.890 0.905 0.925 0.947 0.979 1.000 0.989 0.982
0.664 0.686 0.709 0.732 0.755 0.779 0.829 0.855 0.882 0.909 0.941 0.973 1.000 0.998 0.990
945679
2009
0.621 0.649 0.678 0.713 0.738 0.764 0.812 0.836 0.851 0.873 0.894 0.927 0.945 1.000 0.989
0.643 0.664 0.687 0.709 0.732 0.756 0.804 0.830 0.855 0.882 0.911 0.941 0.975 1.000 0.998
1009590
2010
0.582 0.608 0.636 0.668 0.692 0.718 0.765 0.787 0.802 0.823 0.844 0.880 0.897 0.951 1.000
0.621 0.643 0.665 0.687 0.710 0.733 0.780 0.805 0.830 0.855 0.884 0.911 0.943 0.976 1.000
1080209
* For each intersection the top cell contains the value of the observed retentivity, the middle cell the model value for the comparison of the relevant years' lists. On the diagonal, the number of articles in the JCR SE release for that year is given in the bottom cell in bold.
Table 3. Retentivity by the sum of the impact-factor values (retentivity of the third order)*
Years 1995 1996 1997 1998 1999 2000 2002 2003 2004 2005 2006 2007 2008 2009 2010
1995
1.000 0.976 0.934 0.927 0.909 0.899 0.878 0.861 0.853 0.843 0.820 0.819 0.817 0.774 0.815
1.000 0.976 0.942 0.936 0.927 0.911 0.872 0.871 0.858 0.847 0.834 0.823 0.823 0.806 0.807
5873.2
1996
0.947 1.000 0.966 0.955 0.940 0.927 0.902 0.885 0.876 0.864 0.841 0.840 0.838 0.799 0.838
0.955 1.000 0.948 0.949 0.941 0.924 0.895 0.883 0.870 0.858 0.845 0.834 0.834 0.817 0.817
6251.8
1997
0.893 0.946 1.000 0.980 0.959 0.946 0.918 0.903 0.892 0.880 0.857 0.854 0.853 0.817 0.855
0.905 0.936 1.000 0.988 0.974 0.952 0.918 0.905 0.891 0.879 0.865 0.853 0.853 0.836 0.836
6255.8
1998
0.845 0.893 0.942 1.000 0.971 0.956 0.923 0.907 0.898 0.884 0.857 0.855 0.855 0.814 0.855
0.876 0.910 0.963 1.000 0.987 0.963 0.928 0.915 0.900 0.888 0.874 0.862 0.861 0.843 0.843
6840.3
1999
0.814 0.863 0.907 0.962 1.000 0.979 0.944 0.927 0.918 0.904 0.877 0.876 0.875 0.834 0.874
0.845 0.878 0.924 0.962 1.000 0.970 0.938 0.925 0.910 0.897 0.883 0.871 0.870 0.851 0.851
7462.8
2000
0.779 0.825 0.871 0.921 0.957 1.000 0.958 0.939 0.928 0.914 0.888 0.886 0.886 0.846 0.883
0.811 0.844 0.884 0.918 0.951 1.000 0.955 0.942 0.926 0.912 0.898 0.885 0.884 0.865 0.864
7851.1
2002
0.718 0.759 0.796 0.841 0.871 0.916 1.000 0.977 0.965 0.953 0.925 0.924 0.924 0.882 0.919
0.749 0.779 0.814 0.845 0.877 0.913 1.000 0.980 0.959 0.944 0.928 0.913 0.913 0.892 0.891
8755.3
2003
0.685 0.725 0.762 0.806 0.836 0.877 0.967 1.000 0.985 0.973 0.946 0.944 0.944 0.898 0.937
0.720 0.749 0.783 0.813 0.844 0.879 0.957 1.000 0.972 0.958 0.941 0.926 0.927 0.904 0.903
9401.3
2004
0.668 0.705 0.742 0.782 0.811 0.850 0.935 0.970 1.000 0.986 0.956 0.954 0.953 0.908 0.947
0.690 0.719 0.752 0.781 0.811 0.844 0.916 0.952 1.000 0.977 0.957 0.942 0.944 0.919 0.918
9938.2
2005
0.643 0.680 0.715 0.753 0.781 0.819 0.902 0.936 0.968 1.000 0.968 0.967 0.966 0.919 0.957
0.662 0.691 0.722 0.751 0.780 0.812 0.880 0.915 0.956 1.000 0.972 0.957 0.961 0.934 0.933
10617.9
2006
0.622 0.656 0.692 0.728 0.755 0.791 0.869 0.903 0.931 0.962 1.000 0.997 0.978 0.944 0.985
0.634 0.662 0.692 0.721 0.749 0.780 0.845 0.878 0.915 0.952 1.000 0.976 0.984 0.952 0.951
11422.0
2007
0.602 0.635 0.669 0.703 0.727 0.761 0.836 0.867 0.895 0.924 0.962 1.000 0.958 0.944 0.985
0.607 0.634 0.664 0.691 0.720 0.750 0.813 0.844 0.879 0.914 0.955 1.000 0.997 0.970 0.970
12057.8
2008
0.605 0.640 0.676 0.712 0.738 0.772 0.849 0.883 0.911 0.942 0.963 0.980 1.000 0.944 0.984
0.581 0.608 0.634 0.665 0.693 0.723 0.783 0.817 0.852 0.888 0.929 0.979 1.000 0.954 0.968
13453.9
2009
0.537 0.569 0.603 0.634 0.658 0.689 0.759 0.789 0.815 0.842 0.874 0.913 0.895 1.000 0.989
0.553 0.580 0.609 0.636 0.663 0.692 0.752 0.783 0.815 0.848 0.884 0.922 0.940 1.000 0.999
14703.0
2010
0.515 0.542 0.572 0.601 0.624 0.651 0.716 0.744 0.767 0.792 0.824 0.861 0.844 0.899 1.000
0.528 0.555 0.583 0.610 0.637 0.666 0.725 0.755 0.787 0.820 0.855 0.892 0.919 0.978 1.000
16216.0
* For each intersection the top cell contains the value of the observed retentivity, the middle cell the model value for the comparison of the relevant years' lists. On the diagonal, the sum of impact factor values of journals in the JCR SE release for that year is given in the bottom cell in bold.
Table 4. Retentivity by the sum of the expected response values (retentivity of the fourth order)*
Years 1995 1996 1997 1998 1999 2000 2002 2003 2004 2005 2006 2007 2008 2009 2010
1995
1.000 0.991 0.977 0.971 0.958 0.950 0.937 0.927 0.923 0.918 0.909 0.909 0.907 0.852 0.902
1.000 0.993 0.969 0.962 0.957 0.948 0.932 0.927 0.922 0.917 0.910 0.904 0.904 0.892 0.893
1175704.6
1996
0.977 1.000 0.986 0.979 0.966 0.957 0.944 0.935 0.930 0.926 0.918 0.916 0.915 0.860 0.910
0.993 1.000 0.968 0.965 0.962 0.953 0.937 0.932 0.928 0.922 0.915 0.909 0.909 0.897 0.897
1269083.2
1997
0.957 0.983 1.000 0.992 0.979 0.969 0.957 0.947 0.943 0.938 0.930 0.929 0.927 0.874 0.923
0.958 0.978 1.000 0.986 0.981 0.969 0.951 0.945 0.940 0.935 0.927 0.921 0.921 0.908 0.908
1254494.1
1998
0.938 0.963 0.981 1.000 0.987 0.976 0.964 0.953 0.949 0.944 0.936 0.935 0.933 0.978 0.929
0.934 0.956 0.989 1.000 0.994 0.978 0.959 0.953 0.948 0.942 0.935 0.928 0.928 0.915 0.915
1322313.0
1999
0.914 0.940 0.958 0.980 1.000 0.989 0.975 0.964 0.959 0.954 0.945 0.944 0.942 0.889 0.938
0.912 0.934 0.964 0.994 1.000 0.981 0.964 0.959 0.954 0.948 0.940 0.933 0.933 0.919 0.919
1430310.8
2000
0.892 0.918 0.936 0.958 0.978 1.000 0.984 0.974 0.969 0.964 0.955 0.954 0.952 0.899 0.948
0.889 0.910 0.937 0.963 0.987 1.000 0.975 0.969 0.965 0.958 0.950 0.942 0.942 0.828 0.928
1483081.0
2002
0.951 0.877 0.895 0.916 0.937 0.962 1.000 0.988 0.983 0.979 0.968 0.967 0.965 0.910 0.960
0.946 0.866 0.890 0.913 0.935 0.961 1.000 0.993 0.987 0.979 0.968 0.960 0.960 0.944 0.944
1632429.6
2003
0.826 0.854 0.873 0.895 0.916 0.943 0.983 1.000 0.993 0.988 0.978 0.977 0.975 0.919 0.969
0.826 0.846 0.869 0.891 0.913 0.937 0.993 1.000 0.996 0.987 0.975 0.966 0.967 0.950 0.950
1760771.8
2004
0.803 0.831 0.851 0.874 0.900 0.926 0.968 0.981 1.000 0.994 0.979 0.978 0.975 0.919 0.968
0.807 0.826 0.849 0.870 0.891 0.915 0.967 0.995 1.000 0.992 0.980 0.971 0.972 0.955 0.955
1922874.5
2005
0.785 0.814 0.833 0.855 0.882 0.910 0.953 0.968 0.986 1.000 0.983 0.981 0.979 0.920 0.972
0.787 0.807 0.826 0.849 0.870 0.893 0.941 0.967 0.993 1.000 0.986 0.978 0.980 0.961 0.961
2071938.5
2006
0.765 0.793 0.813 0.838 0.858 0.887 0.934 0.954 0.969 0.983 1.000 0.998 0.989 0.938 0.987
0.768 0.787 0.808 0.828 0.849 0.871 0.917 0.940 0.964 0.989 1.000 0.988 0.993 0.970 0.971
2192911.0
2007
0.742 0.771 0.791 0.812 0.834 0.864 0.912 0.933 0.949 0.964 0.983 1.000 0.982 0.939 0.987
0.749 0.768 0.788 0.808 0.828 0.849 0.893 0.916 0.938 0.962 0.990 1.000 0.998 0.978 0.981
2322726.7
2008
0.732 0.764 0.783 0.905 0.826 0.855 0.907 0.929 0.947 0.963 0.979 0.988 1.000 0.942 0.989
0.731 0.750 0.770 0.789 0.809 0.830 0.873 0.895 0.918 0.942 0.970 0.999 1.000 0.965 0.978
2556148.1
2009
0.685 0.717 0.737 0.760 0.780 0.809 0.856 0.877 0.894 0.911 0.934 0.955 0.951 1.000 0.994
0.712 0.730 0.749 0.769 0.788 0.808 0.850 0.871 0.892 0.914 0.938 0.963 0.977 1.000 0.998
2737072.5
2010
0.658 0.689 0.707 0.727 0.747 0.777 0.824 0.845 0.862 0.877 0.902 0.925 0.922 0.928 1.000
0.694 0.712 0.731 0.750 0.769 0.789 0.830 0.851 0.872 0.894 0.917 0.942 0.963 0.997 1.000
2976525.8
* For each intersection the top cell contains the value of the observed retentivity, the middle cell the model value for the comparison of the relevant years' lists. On the diagonal, the sum of expected response values of journals in the JCR SE release for that year is given in the bottom cell in bold.
relevant year in JCR SE (the column and row on the diagonal, of course, refer to the same year). Observed retentivity is indicated in the upper cell, and the model retentivity in the middle cell. Data above the diagonal refer to direct retentivity, those below the diagonal to reverse retentivity.
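To make the layout of the first-order table concrete, here is a small sketch of how one cell pair could be computed from two journal lists. It assumes that direct retentivity is the share of the earlier list's journals still present in the later list, and reverse retentivity the share of the later list's journals already present in the earlier one; the function name and the toy journal identifiers are hypothetical.

```python
def first_order_retentivity(earlier, later):
    """Return (direct, reverse) retentivity by number of journals for two lists."""
    earlier, later = set(earlier), set(later)
    shared = len(earlier & later)
    direct = shared / len(earlier)   # goes above the diagonal (row: earlier year)
    reverse = shared / len(later)    # goes below the diagonal (row: later year)
    return direct, reverse


# Toy lists: three of four earlier journals survive, two new ones appear.
direct, reverse = first_order_retentivity({"A", "B", "C", "D"}, {"B", "C", "D", "E", "F"})
print(direct, reverse)  # 0.75 0.6
```

When the later list is larger than the earlier one, the reverse value is necessarily smaller than the direct value for the same pair of years, which matches the pattern seen in Table 1.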
Tables 1–4 contain detailed information about the retentivity. In a more compact form, the degree of agreement of the model with the empirical data can be described (using the data of Tables 1–4) by the following relationship:
$$\Delta_{relat} = \frac{|p_{theor} - p_{exp}|}{p_{exp}} \cdot 100, \qquad (25)$$

where:
$\Delta_{relat}$ is the relative deviation of the model retentivity from the observed retentivity;
$p_{theor}$ is the model value of the retentivity;
$p_{exp}$ is the observed value of the retentivity;
$|p_{theor} - p_{exp}|$ is the absolute value of the difference between the model and observed retentivity.
Table 5 shows the fraction of cases (of the total number of comparisons) for which the values of Δrelat fall within the ranges shown in columns 3–7 of the table.
Table 5. Level of agreement of the model with empirical data*
1 2 3 4 5 6 7
Direct retentivity of the first order (by number of journals) 105 32.4 45.7 19.0 2.9 0.0
Reverse retentivity of the first order (by number of journals) 105 23.8 26.7 28.6 15.2 5.7
Direct retentivity of the second order (by the sum of number of articles) 105 56.2 39.0 4.8 0.0 0.0
Reverse retentivity of the second order (by the sum of number of articles) 105 28.6 21.9 22.9 16.2 10.5
Direct retentivity of the third order (by the sum of the impact-factor values) 105 60.0 22.9 11.4 3.8 1.9
Reverse retentivity of the third order (by the sum of the impact-factor values) 105 36.2 22.9 17.1 9.5 14.3
Direct retentivity of the fourth order (by the sum of the expected response values) 105 67.6 18.1 2.9 6.7 4.8
Reverse retentivity of the fourth order (by the sum of the expected response values) 105 57.1 26.7 6.7 6.7 2.9
* Columns: 1, order and direction of retentivity; 2, number of comparisons of observed and model retentivity; 3, fraction of comparisons for which Δrelat ≤ 1 (%); 4, fraction of comparisons for which 1 < Δrelat ≤ 2 (%); 5, fraction of comparisons for which 2 < Δrelat ≤ 3 (%); 6, fraction of comparisons for which 3 < Δrelat ≤ 4 (%); 7, fraction of comparisons for which Δrelat > 4 (%).
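The way a row of Table 5 is obtained from a set of (model, observed) pairs can be sketched as follows. This is an illustrative reading of equation (25) and of the column definitions in the table footnote, not the authors' code; the function names are hypothetical, and the four sample pairs are the first four direct values of the 1995 row of Table 1.

```python
def relative_deviation(p_theor, p_exp):
    """Relative deviation of the model value from the observed value, in percent."""
    return abs(p_theor - p_exp) / p_exp * 100.0


def bucket_shares(pairs):
    """Percentage of comparisons falling in the bins <=1, (1,2], (2,3], (3,4], >4."""
    counts = [0, 0, 0, 0, 0]
    total = 0
    for p_theor, p_exp in pairs:
        d = relative_deviation(p_theor, p_exp)
        index = 0 if d <= 1 else 1 if d <= 2 else 2 if d <= 3 else 3 if d <= 4 else 4
        counts[index] += 1
        total += 1
    return [100.0 * c / total for c in counts]


# (model, observed) pairs for 1995 vs 1996, 1997, 1998 and 1999 from Table 1.
pairs = [(0.971, 0.965), (0.949, 0.935), (0.938, 0.918), (0.914, 0.896)]
print(bucket_shares(pairs))  # [25.0, 25.0, 50.0, 0.0, 0.0]
```

Applied to all 105 comparisons of one order and direction, the same routine would yield the five percentages of the corresponding table row.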
Fig. 1. Comparison of model and empirical data for the 1st order of retentivity (by number of journals). [Plot: retentivity (0.2–1.0) versus time interval, years (1–15). Data points: real direct and reverse retentivity by number of journals. Curves: theoretical direct and reverse retentivity by number of journals.]
Table 5 allows us to draw some conclusions. Firstly, the model is in close enough agreement with the empirical data: the proportion of cases where the calculated (theoretical) values of retentivity differ from the observed values by no more than 1% is in the range 23.8–67.6%. The proportion of cases where the deviation is no more than 2% is in the range 50.5–95.2% (the sum of the corresponding values of columns 3 and 4), and for a deviation of no more than 3% it is in the range 73.4–100% (the sum of the corresponding values of columns 3, 4 and 5). The maximum difference found is 9.6%; there is only one such large deviation among the 840 comparisons. Secondly, in all cases of direct retentivity the model is in closer agreement with the observed values than in the cases of reverse retentivity (see also Figs. 1–4). Thirdly, the agreement of the model with the empirical data increases with increasing retentivity order.
We also find the highest values of retentivity for the 4th order of retentivity, and the lowest for the first order (see also Figs. 1–4).
Fig. 2. Comparison of model and empirical data for the 2nd order of retentivity (by the sum of number of articles). [Plot: retentivity (0.2–1.0) versus time interval, years (1–15). Data points: real direct and reverse retentivity by the sum of number of articles. Curves: theoretical direct and reverse retentivity by the sum of number of articles.]
Fig. 3. Comparison of model and empirical data for the 3rd order of retentivity (by the sum of the impact factor values). [Plot: retentivity (0.2–1.0) versus time interval, years (1–15). Data points: real direct and reverse retentivity by the sum of the values of impact factor. Curves: theoretical direct and reverse retentivity by the sum of the values of impact factor.]
The level of agreement between the model and the empirical data is presented in a visually accessible form in Figs. 1–4, which correspond to retentivity orders 1–4. The curves in these graphs correspond to the model retentivity (upper curve: direct retentivity; lower curve: reverse retentivity), and the data points represent the observed retentivity values.
In interpreting these graphs one should keep the following in mind. If values of the observed retentivity correspond to different periods of physical time but refer to the same time interval, then these values usually differ only slightly from each other and often coincide up to the third decimal place. In these cases, the symbols for these values on the graph will overlap and merge into a single data point and may be perceived as a single value. This happens most often in the case of direct retentivity of the 2nd, 3rd and 4th order. It is especially true for small time intervals, for which, naturally, more data about retentivity are available. For example, for an interval of 1 year we have 14 different values, for an interval of 2 years 13 values, etc. At the same time, sporadic outliers are clearly visible and seem to have a greater "weight", although most of the empirical data lie close to the graphs, indicating good agreement of the model.
Fig. 4. Comparison of model and empirical data for the 4th order of retentivity (by the sum of the expected response values). [Plot: retentivity (0.2–1.0) versus time interval, years (1–15). Data points: real direct and reverse retentivity. Curves: theoretical direct and reverse retentivity by the sum of values of the expected response.]
Fig. 5. Direct retentivity of the 1st order. Influence on retentivity of (1) the time interval τ between two compared lists (1st term of the equation); (2) the changes over time of the ratio of the journal numbers in these lists (2nd term of the equation); (3) factors which the postulates do not take into account directly (the difference between the free term c and 1). [Plot: decrease/increase of retentivity (share of 1) versus time interval, years (1–15).]
Fig. 6. Reverse retentivity of the 1st order. Influence on retentivity of (1) the time interval τ between two compared lists (1st term of the equation); (2) the changes over time of the ratio of the journal numbers in these lists (2nd term of the equation); (3) the factors which the postulates do not take into account directly (the difference between the free term c and 1). [Plot: decrease/increase of retentivity (share of 1) versus time interval, years (1–15).]
Fig. 7. Direct retentivity of the 2nd order. Influence on retentivity of (1) the time interval τ between two compared lists (1st term of the equation); (2) the changes over time of the ratio of the sum of number of articles in these lists (2nd term of the equation); (3) the factors which the postulates do not take into account directly (the difference between the free term c and 1). [Plot: decrease/increase of retentivity (share of 1) versus time interval, years (1–15).]
FINAL VERIFICATION OF THE POSTULATES
While the tables and figures presented above show good agreement between the model and the observed retentivities, unfortunately, those data do not allow us to estimate the specific impact of each term in the corresponding equation of the model. Without such an assessment, we cannot check the validity of the four postulates which form the basis of the model. To solve this key problem, we will consider the mathematical structure of each term of the relevant equations and use the graphs in Figs. 5–12. Each of these graphs reflects both the impact of the interval between the compared lists on retentivity (1st term of the equation) and the change in the ratio between the quantitative characteristic values assigned to a given pair of lists (2nd term of the equation). In addition, each graph gives an indication of the influence of the socio-economic factors which the postulates do not take into account. The graphs show this influence as a curve of the change over time of c – 1, where c is the value of the free term of the equation and 1 is the probability of retention in the degenerate case, i.e. of the retention of the list in itself.
Fig. 8. Reverse retentivity of the 2nd order. Influence on retentivity of (1) the time interval τ between two compared lists (1st term of the equation); (2) the changes over time of the ratio of the sum of number of articles in these lists (2nd term of the equation); (3) the factors which the postulates do not take into account directly (the difference between the free term c and 1). [Plot: decrease/increase of retentivity (share of 1) versus time interval, years (1–15).]
Fig. 9. Direct retentivity of the 3rd order. Influence on retentivity of (1) the time interval τ between two compared lists (1st term of the equation); (2) the changes over time of the ratio of the sum of the impact-factor values in these lists (2nd term of the equation); (3) the factors which the postulates do not take into account directly (the difference between the free term c and 1). [Plot: decrease/increase of retentivity (share of 1) versus time interval, years (1–15).]
Fig. 10. Reverse retentivity of the 3rd order. Influence on retentivity of (1) the time interval τ between two compared lists (1st term of the equation); (2) the changes over time of the ratio of the sum of the impact-factor values in these lists (2nd term of the equation); (3) the factors which the postulates do not take into account directly (the difference between the free term c and 1). [Plot: decrease/increase of retentivity (share of 1) versus time interval, years (1–15).]
Fig. 11. Direct retentivity of the 4th order. Influence on retentivity of (1) the time interval τ between two compared lists (1st term of the equation); (2) the changes over time of the ratio of the sum of the expected response values in these lists (2nd term of the equation); (3) the factors which the postulates do not take into account directly (the difference between the free term c and 1). [Plot: decrease/increase of retentivity (share of 1) versus time interval, years (1–15).]
For the numerical evaluation of the impact of each term of the equations, we will use the following ratio:
$$r = \frac{\delta^{m}_{i,k,D}}{p_{k,D}} \cdot 100, \qquad (26)$$
where:
$\delta^{m}_{i,k,D}$ is the numerical value of the contribution of a given term to retentivity/non-retentivity; i corresponds to the 1st, 2nd or 3rd term; k is the order of retentivity; D is the direction of retentivity;
$p_{k,D}$ is the numerical value of the retentivity of a given order k for direction D;
r is the fraction of the contribution of a given term of the equation to the retentivity/non-retentivity (%).
For convenience we will take the time interval to be 15 years, i.e. the maximum time difference between lists considered in the present study.
The influence of the 1st term of the equation. Looking at the first term on the right side of the model equations, we find that in all eight cases (four cases of direct and four cases of reverse retentivity) an increase of the time interval between lists is connected with a drop in retentivity. Indeed, in equations (17)–(24) the value of the coefficient a related to τ is always negative.
The influence of the time interval on retentivity decreases with the increase of the order of retentivity. For example, in the case of first-order reverse retentivity a = –0.034, but for the fourth order a = –0.016.
The retentivity decreases non-linearly with the time interval (the exponent in all cases is less than 1), which results in a decrease of the absolute value of the increment of retentivity reduction with increasing time interval. Therefore, the dependence of retentivity on the time interval weakens, even if only slowly. The non-linear nature of this dependence is more characteristic of reverse retentivity (see Figs. 6, 8, 10 and 12). This is most clearly illustrated by Fig. 6, in which the lower curve is significantly concave and corresponds to the reverse retentivity by number of journals.
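The decreasing increments can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the time-interval term has the power-law form a·τ^γ with 0 < γ < 1; the exact form of the term is the one given by equations (17)–(24). The function names are hypothetical, and the default values a = –0.034 and γ = 0.5 are those quoted in this section for first-order reverse retentivity, used here only as sample numbers.

```python
def first_term(tau, a=-0.034, gamma=0.5):
    """Assumed power-law form a * tau**gamma of the time-interval term."""
    return a * tau ** gamma


def yearly_increments(max_tau, a=-0.034, gamma=0.5):
    """Change of the term when tau grows by one year, for tau = 1..max_tau."""
    return [
        first_term(t, a, gamma) - first_term(t - 1, a, gamma)
        for t in range(1, max_tau + 1)
    ]


increments = yearly_increments(15)
for tau, step in enumerate(increments, start=1):
    # Every step is negative, but its absolute value shrinks as tau grows.
    print(f"tau={tau:2d}  term={first_term(tau):+.4f}  step={step:+.4f}")
```

Under this assumption each additional year lowers the term by less than the previous year did, which is the slow weakening of the dependence described above.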
The influence of increasing time interval τ is stronger for the reverse retentivity than for the corresponding cases of direct retentivity, as is very well illustrated by the graphs in Figs. 5–12. This is indicated by the ratio of the values of coefficient a of the reverse and direct retentivity, which is always greater than 1; for the 2nd, 3rd and 4th orders of retentivity it ranges between 1.7 and 2.4. We do not consider here the value of this ratio for the 1st order, which is even greater (4.05): in this case the values of the exponent γ for reverse and direct retentivity are not equal (0.5 and 0.8 respectively), which makes such a comparison incorrect.
Fig. 12. Reverse retentivity of the 4th order. Influence on retentivity of (1) the time interval τ between two compared lists (1st term of the equation); (2) the changes over time of the ratio of the sum of the expected response values in these lists (2nd term of the equation); (3) the factors which the postulates do not take into account directly (the difference between the free term c and 1). Horizontal axis: time interval, years. Vertical axis: decrease/increase of retentivity, share of 1.
To illustrate the impact of the 1st term of the equations, we consider the ratio r (26) for the maximum value of τ (i.e. a time difference of 15 years, comparing 1995 and 2010). For direct retentivity we find r to be –40.9%, –29.3%, –32.0% and –15.8% for the 1st, 2nd, 3rd and 4th order retentivity respectively. For reverse retentivity of the 2nd, 3rd and 4th order we find a stronger influence, with r values of –63.8%, –91.7% and –48.5% respectively. For reverse retentivity of the 1st order (by number of journals), the losses of retentivity described by the first term of the equation exceed the total retentivity (r = –116.3%); these are compensated only by the influence of the remaining two terms of the equation.
Looking at the influence of the 1st term of the model equations, we can conclude that postulate 2 is fully confirmed.
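As a minimal sketch of how ratio (26) behaves, the following Python function divides the contribution of one term by the retentivity and expresses the result in percent. The function name and the input numbers are hypothetical and are not taken from the study; they only show that the share exceeds 100% in absolute value whenever the loss described by a term is larger than the retentivity itself.

```python
def contribution_share(delta, retentivity):
    """Ratio (26): share (in %) of one term's contribution in the retentivity."""
    if retentivity == 0:
        raise ValueError("retentivity must be non-zero")
    return 100.0 * delta / retentivity


# Hypothetical inputs: a loss smaller than the retentivity ...
print(contribution_share(-0.25, 0.5))  # -50.0
# ... and a loss larger than the retentivity (share beyond -100%).
print(contribution_share(-0.75, 0.5))  # -150.0
```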
The influence of the 2nd term of the equation. The expressions $\frac{R_{t+\tau,k} - R_{t,k}}{R_{t,k}\,(\tau^{\beta} + 1)}$ and $\frac{R_{t,k} - R_{t+\tau,k}}{R_{t+\tau,k}\,(\tau^{\beta} + 1)}$ in equations (15) and (16), which are part of the 2nd term of these equations, can be considered an analogue of “acceleration” (either positive or negative) that characterizes the process of change of the quantitative characteristics of the lists over time. This is indicated by the value of the exponent β of τ in the denominator: β is always less than 1 (0.5 ≤ β ≤ 0.8).
When looking at the second term of the right side of equations (15) and (16) we must keep in mind that the expression $\frac{R_{t+\tau,k} - R_{t,k}}{R_{t,k}\,(\tau^{\beta} + 1)}$, which corresponds to the case of direct retentivity, as a rule has a positive value, and the expression $\frac{R_{t,k} - R_{t+\tau,k}}{R_{t+\tau,k}\,(\tau^{\beta} + 1)}$, which matches the case of reverse retentivity, always has a negative value. At the same time, as follows from equations (17)–(24), this is not true for the values of the coefficient b. Hence, unlike in the case of the 1st term, we cannot estimate the character and the degree of influence of the 2nd term on retentivity directly from its coefficient b. Instead, we look at Figs. 5–12 and the value of the ratio r in equation (26) to assess the influence of the 2nd term. Based on these data we come to the following conclusions.
1. In all four cases of direct retentivity, the 2nd term makes a positive contribution to the retentivity. This influence increases with higher orders of retentivity. For the retentivity by number of journals (1st order of direct retentivity, Fig. 5) and τ = 15, r = +4.2%. For the direct retentivity by the sum of the number of articles (2nd order of direct retentivity, Fig. 7) it rises to r = +8.9%. For the retentivity by the sum of the impact-factor values (3rd order of direct retentivity, Fig. 9) r = 10.9%, and for the direct retentivity by the sum of the expected response values (4th order of direct retentivity, Fig. 11) r = 5.4%.³
2. The influence of the 2nd term on reverse retentivity is less unequivocal. For reverse retentivity by number of journals (1st order of reverse retentivity) the agreement with postulate 3 is fair: the contribution of the 2nd term of the equation to the retentivity in this case is negative. Looking at Fig. 6, we find a rather significant contribution of the 2nd term: r = 34.1%.
3. For the reverse retentivity of the 2nd, 3rd and 4th orders it becomes clear that postulate 3 does not describe the real processes of retentivity change:
Firstly, according to postulate 3 the 2nd term of the equation should make a negative contribution to the reverse retentivity. In reality, we find for all three specified cases of reverse retentivity a positive contribution of the 2nd term, albeit a very insignificant one. Indeed, from Figs. 8, 10 and 12, we find the ratios for the 2nd, 3rd and 4th order retentivity to be +0.8%, +4.4% and +1.8%.
Secondly, the contribution of the 2nd term to the reverse retentivity depends to only a very small degree on the value of τ: the curves corresponding to the contribution of the 2nd term in all three specified figures are practically parallel to the abscissa. The explanation for this paradox probably lies in the presence, in each annual list of the JCR SE, of a core of the most authoritative journals, characterized by high productivity (number of articles) and a high academic level (high impact factor) and, consequently, a great influence on world science (high expected response).
There is indeed such a core of 3259 journals (40.4% of the JCR SE list of journals in 2010), each of which is present in every annual JCR SE list we considered. Moreover, in 2010 those 40.4% provided 57.2%, 51.0% and 66.3% of the total number of articles, the sum of impact factor values and the sum of the expected response values, respectively. The ratios of the corresponding characteristics of this set of journals confirm the strong influence of this core: the average number of articles per core journal is 1.4 times the average for the entire list in 2010 (192.01 and 137.3 respectively), the average impact factor is 1.25 times higher (2.540 and 2.032), and the average expected response is 1.62 times higher (610.6 and 376.4).
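The arithmetic behind these core statistics can be rechecked with a short Python sketch. All numbers are the ones quoted in the text (the size of the 1995 list, 4623 journals, is taken from the Conclusions); the variable and function names are hypothetical. The sketch also assumes that the share of a total contributed by the core should roughly equal the share of journals multiplied by the ratio of the averages, which holds only approximately because the quoted figures are rounded.

```python
# Figures quoted in the text for the stable core of JCR SE journals.
CORE_JOURNALS = 3259
CORE_SHARE_OF_2010_LIST = 0.404  # 40.4% of the 2010 list
LIST_1995 = 4623                 # size of the 1995 list (from the Conclusions)

# (core average, whole-list average) per journal in 2010
AVERAGES = {
    "articles": (192.01, 137.3),
    "impact factor": (2.540, 2.032),
    "expected response": (610.6, 376.4),
}
# Shares of the 2010 totals that the text attributes to the core
REPORTED_SHARES = {
    "articles": 0.572,
    "impact factor": 0.510,
    "expected response": 0.663,
}


def core_to_list_ratio(name):
    """Ratio of the core average to the average over the whole 2010 list."""
    core_avg, list_avg = AVERAGES[name]
    return core_avg / list_avg


def implied_share(name):
    """Share of the total implied by (share of journals) x (ratio of averages)."""
    return CORE_SHARE_OF_2010_LIST * core_to_list_ratio(name)


for name in AVERAGES:
    print(f"{name}: ratio {core_to_list_ratio(name):.2f}, "
          f"implied share {implied_share(name):.1%}, "
          f"reported {REPORTED_SHARES[name]:.1%}")

print(f"implied size of the 2010 list: {CORE_JOURNALS / CORE_SHARE_OF_2010_LIST:.0f}")
print(f"core as a share of the 1995 list: {CORE_JOURNALS / LIST_1995:.1%}")
```

The implied and reported shares agree to within about one percentage point, and the core amounts to 70.5% of the 1995 list, which matches the figure given in the Conclusions.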
Influence of factors that are not accounted for in the postulates. We need to consider the influence on retentivity of some factors which are not addressed directly by the postulates, but which nevertheless are indirectly mentioned in postulate 1: there, the expression “basically depends” leaves room for the influence of such factors. Above we already suggested that such factors could be of a socio-economic nature: in particular, the state of the global economy, trends in basic research, and changes in the general rules for forming the journal lists of the JCR SE.
3 It is noticeable that in all four of these graphs, the curves corresponding to the 2nd term of the equations sometimes look like jagged lines. This is because these curves are calculated from the actual values of the numerical characteristics, which sometimes do not vary smoothly (see the values on the diagonals of Tables 1–4).
The model does not allow us to estimate separately the degree of influence of each of the socio-economic factors listed above on retentivity. It does, however, give us an opportunity to estimate their total influence in each of the eight considered cases of retentivity.
As a visual assessment of the influence of these factors, we use the curves corresponding to c – 1 in Figs. 5–12. Let us recall that c is the constant term of the corresponding equation, and 1 is the probability of retention of a list in itself. To illustrate and evaluate this effect, we use the value of the ratio r from (26).
The influence of socio-economic factors on the direct retentivity. In the graphs (Figs. 5, 7, 9 and 11), the curves corresponding to the difference c – 1 always lie below 0. Hence, it can be argued that these factors always have a negative impact on the direct retentivity. It should be noted that this impact is very small. In the case of the direct retentivity by number of journals, the share of the impact of these factors in the total direct retentivity for τ = 15 is less than 1% (r = +0.9%). In the case of direct retentivity by the total number of articles it is somewhat higher (r = +3.5%); for the direct retentivity by the sum of the impact factor values this characteristic is r = +2.8%, and for direct retentivity by the sum of the expected response values r = +1.4%.
The influence of socio-economic factors on the reverse retentivity. From the graphs it is immediately evident that the influence of both the difference c – 1 and of the 2nd term on the reverse retentivity is not as clear as it is for the direct retentivity. For the 1st order retentivity that influence is positive and significant at 32.4% for τ = 15 (see Fig. 6). For the other cases of reverse retentivity, the influence of socio-economic factors is practically negligible (see Figs. 8, 10 and 12): positive with r = +2.4% for the 2nd order and r = +2.7% for the 4th order, and negative at –1.7% for the 3rd order.
With regard to the postulates formulated at the beginning of this article, we can conclude that postulate 2, according to which the probability of retention for a journal as an authoritative source of papers (1st order retentivity) is inversely dependent on the time interval, is true for both direct and reverse retentivity.
Postulate 3 states that, for a fixed time interval, the probability of retention for a journal as an authoritative source of papers (1st order retentivity) increases when the number of journals in the starting list is smaller than the number in the list to which it is compared and, conversely, decreases when the number of journals in the starting list is larger. Both suppositions were confirmed.
As postulates 2 and 3 are elaborations of postulate 1, this postulate is also not in contradiction with the real process of retentivity of journals as the most authoritative source of scientific papers.
As to postulate 4, according to which postulates 2 and 3 are also valid for the 2nd, 3rd and 4th orders of retentivity, we can only partly confirm its validity. The generalization of postulate 2 to the retentivity of the 2nd, 3rd and 4th orders is found to be valid, both for direct and for reverse retentivity. However, this is not true for postulate 3 in the case of reverse retentivity. Indeed, the ratio between the quantitative characteristics of the two compared lists has practically no influence on the retentivity for the given characteristic.
CONCLUSIONS
1. The article analyzes the lists of journals from the JCR SE for the 16 years from 1995 to 2010. The results of this analysis indicate that the statistical characteristics of these sets of scientific journals, which are the starting point for the construction of most scientometric indicators, are sufficiently stable. This in turn suggests that it is reasonable to use these indicators to assess trends in science.
2. We constructed a mathematical model of the process of change over time of the retentivity of journals as the most authoritative sources of scientific papers. The model is based on postulates that connect the probability of a journal’s retention with the time interval between the two lists of journals that are compared, and with the ratio of the sizes of these lists: (1) the probability of retention for journals falls with increasing time interval; (2) the probability of retention for a journal from a shorter list in a longer list is higher than the probability of retention of a journal from a longer list in a shorter list.
3. We introduced the concept of retentivity of journals as the most authoritative sources of scientific papers, as well as the concepts of retentivity order and retentivity direction. There are four retentivity orders, each of which has its own quantitative characteristic. We define the 1st order as retentivity by the number of journals in the given list; the 2nd order as retentivity by the sum of the number of articles published by the journals from a given list; the 3rd order as retentivity by the sum of the impact-factor values of the journals from a given list; the 4th order as retentivity by the sum of the expected response values for the articles which have been published in the journals from a given list.
4. The concepts of direct and reverse retentivity were introduced. When journals are retained in a later list, we call this direct retentivity. We speak of reverse retentivity when journals from a list are also present in an earlier list.
5. Comparing the model with empirical data shows that it gives quite a good description of the process of change of retentivity with time. The validity of the postulates which were used to build the model was also checked.
We found that, indeed, the retentivity falls with an increase of the time interval between two lists, both for direct and for reverse retentivity.
The statement about the inverse dependence of direct retentivity on the ratio of the sizes of the compared journal lists is true. Moreover, the effect of this dependence is significant. The same holds for all orders of direct retentivity.
For reverse retentivity it is true only for the 1st order; in all other cases of reverse retentivity we find only minimal impact. The reason for this seemingly paradoxical finding lies in the fact that a substantial set of journals, characterized by high productivity and/or a high academic level (high impact factor value) and, consequently, of great influence on the world of science (high expected response value), form a stable core in the JCR SE lists; the probability of their inclusion depends very little on any external factors.
6. An analysis of the mathematical structure of the model and its direct comparison with empirical data shows that the negative correlation of the retentivity with the time interval is nonlinear, i.e. the absolute value of the increment of retentivity change decreases with time and, therefore, the dependence of the retentivity on the time interval fades, albeit slowly.
7. The changes in the size of the lists and in the other numerical characteristics are nonlinear as well (this also follows from the structure of the model and from the comparison between the model and empirical data).
8. The effect of increasing time interval between lists is always stronger in cases of reverse retentivity than in the corresponding cases of direct retentivity.
9. We were able to draw a number of additional conclusions from the model and a comparison with empirical data. Thus, for direct retentivity by the number of journals, we found considerable influence of factors not covered by the postulates, which we assume to be of a socio-economic nature (we call them external). In contrast, the influence of these factors on the 2nd, 3rd and 4th orders of direct retentivity is minimal. The influence of external factors is insignificant for all orders of reverse retentivity. This fact is indirect evidence that the core of the world’s scientific journals, which was formed before the period we studied, is quite stable and very little influenced by external factors. Indeed, 70.5% of the full list in 1995 (3259 out of 4623 journals) has been included in every subsequent list, despite a very significant increase in the size of subsequent JCR SE journal lists. The recognition of these core journals is confirmed by the fact that they have significantly higher values of productivity (number of articles), scientific level (impact factor values) and degree of influence on the world of science (expected response values) than the average journal in the lists.
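The notions of direct and reverse retentivity summarized in conclusions 3 and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: retentivity is taken to be the retained share of the starting list, the weighted variant assumes that higher orders weight each journal by its characteristic value in the starting list, and the function names and journal identifiers are made up. Lists are assumed to be non-empty.

```python
def direct_retentivity(earlier, later):
    """Share of the earlier list's journals that are still present in the later list."""
    earlier, later = set(earlier), set(later)
    return len(earlier & later) / len(earlier)


def reverse_retentivity(earlier, later):
    """Share of the later list's journals that were already present in the earlier list."""
    earlier, later = set(earlier), set(later)
    return len(earlier & later) / len(later)


def weighted_retentivity(start_values, other):
    """Assumed higher-order variant: retained share of a summed characteristic.

    start_values maps each journal of the starting list to its characteristic
    (e.g. number of articles); other is the list it is compared with.
    """
    retained = sum(v for journal, v in start_values.items() if journal in other)
    return retained / sum(start_values.values())


# Toy lists with made-up journal identifiers.
list_1995 = {"J1", "J2", "J3", "J4"}
list_2010 = {"J2", "J3", "J4", "J5", "J6", "J7", "J8", "J9"}
articles_1995 = {"J1": 100, "J2": 300, "J3": 200, "J4": 400}

print(direct_retentivity(list_1995, list_2010))        # 0.75
print(reverse_retentivity(list_1995, list_2010))       # 0.375
print(weighted_retentivity(articles_1995, list_2010))  # 0.9
```

In this toy case the later list is twice as long as the earlier one, so the direct retentivity is higher than the reverse retentivity, in line with the second postulate stated in conclusion 2.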
ACKNOWLEDGMENTS
The study was supported by the Russian Foundation for Humanities (project 12-03-00070).
REFERENCES
1. Garfield, E., Citation Indexing for science: New dimension in documentation through association of ideas, Science, 1955, no. 122, pp. 108–111.
2. Garfield, E., A century of citation indexing. Keynote address, Proc. 13th Int. Conf. on Webometrics, Informetrics and Scientometrics and COLLNET Meeting, Istanbul, 2011, pp. 20–23.
3. Johnstone, M.J., Journal impact factors: implications for the nursing profession, Int. Nursing Rev., 2007, vol. 54, pp. 35–40.
4. Leydesdorff, L. and Wagner, C., Macro-Level Indicators of the Relations between Research Funding and Research Output. http://www.leydesdorff.net/roadmap/roadmap.pdf
5. Braun, T., Glanzel, W., and Schubert, A., A Hirsch-type index for journals, Scientometrics, 2006, vol. 69, pp. 169–173.
6. Glanzel, W., On the h-index - a mathematical approach to a new measure of publication activity and citation impact, Scientometrics, 2006, vol. 67, pp. 121–129.
7. Halfman, W. and Leydesdorff, L., Is inequality among universities increasing? Gini coefficients and the elusive rise of elite universities. http://www.loetleydesdorff.net
8. Bergstrom, C.T., Eigen factor as the value and prestige of scholarly journals, College Res. Libr. New, 2007, vol. 68. http://www.ala.org/ala/acr1pubs/crlnews/backissues2007/may07/egenfactor.cfm
9. Michels, C. and Schmoch, U., The growth of science and database coverage, Scientometrics, 2012, vol. 93, pp. 831–846.
10. Markusova, V.A., Quality of scientific journals and base criterions for including to information system Web of Science of Thomson Reuters Co., Acta Naturae, 2012, vol. 4, pp. 6–13.
11. Gilyarevsky, R.S., Mulchenko, Z.M., Terekhin, A.T., and Cherny, A.B., Experience with the study of the Science Citation Index, in Prikladnaya dokumentalistika (Applied Documentation), Moscow: Nauka, 1968, pp. 32–53.
12. Arapov, M.V. and Libkind, A.N., Change of productivity in regular sources of information, Nauchn.-Tekhn. Inform., Ser. 2, 1976, no. 10, pp. 3–14.
13. Libkind, A.N., One approach to study communication in science, Scientometrics, 1985, vol. 8, nos. 3–4, pp. 217–231.