Bibliometric [scientometric, webometric, informetric …] searching
Data used for assessing impact of scholarly output
Tefko Saracevic
[email protected]; http://comminfo.rutgers.edu/~tefko/


Central idea

• Use of quantitative methods – statistics – to study & characterize recorded communication - ‘literature’ - of all kinds
• In order to:
  – describe research output with various indicators & distributions
  – use in evaluating scholarly scientific performance
• New tools have significantly increased & changed the role of searching & searchers


ToC

1. Goals, definitions

2. Reasons, applications – why?

3. Data sources for bibliometric analyses

4. Methods & measures – how?

5. A sample of examples

6. Implications for searching. Caveats


1. Goals, definitions
Bibliometrics, scientometrics, webometrics …


Metric studies

• Applied in many fields: Sociometrics, Econometrics, Biometrics …
  – deal with statistical properties, relations, & principles of a variety of entities in their domain

• Metric studies in information science follow these by concentrating on statistical properties & the discovery of associated relations & principles of information objects, structures, & processes


Goals of metric studies

• To characterize statistically entities under study
  – more ambitiously to discover regularities & relations in their distributions & dynamics in order to observe predictive regularities & formulate laws
• describe numerically, predict, apply
• Same in information science
  – portray statistically entities under study:
    • literature, documents, … all kinds of information objects & processes as related to science, institutions, the Web …
    • but also people – authors
    • more recently: also scholarly productivity


Definitions

• biblio: derived from “biblion”, Greek word for book
• metrics: derived from “metrikos”, Greek word for measurement
• Bibliometrics
  – “... the application of mathematical and statistical methods to books and other media of communication.” Alan Pritchard (1969)
  – “… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.” Robert Fairthorne (1969)


Definitions … more, but with differing contexts

• Scientometrics: bibliometric & other metric studies specifically concentrating on science
• Informetrics: study of the quantitative aspects of information in any form - broadest
• Webometrics: quantitative analysis of web-related phenomena
• Cybermetrics: quantitative aspects of information resources on the whole Internet
• E-metrics: measures of electronic resources, particularly in libraries


For simplicity, we will use here bibliometrics to cover all


2. Base, reasons, use
Why? What? What for?


Based on what entities have been & could be COUNTED

• In documents (as entities):
  – authors
  – their institutions, countries
  – sources – e.g. journals
  – references – who & what is cited
  – age of references
• & anything else that is countable
• In Web entities
  – identifying relationships between Web objects
  – link structures
    • out-links
    • in-links
    • self-links
    • nodes, central nodes
    • in a way analogous to citations


And derivation of structures based on any of these
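The link counts listed for Web entities can be sketched in a few lines. This is only an illustration: the function name `link_counts` and the (source, target) pair format are assumptions made here, not something taken from any webometric tool.

```python
from collections import Counter

def link_counts(links):
    """Tally out-links, in-links and self-links per node.

    `links` is an iterable of (source, target) pairs; this input
    format is an assumption made for the sketch.
    """
    out_links, in_links, self_links = Counter(), Counter(), Counter()
    for source, target in links:
        if source == target:
            # A page linking to itself is counted separately.
            self_links[source] += 1
        else:
            out_links[source] += 1
            in_links[target] += 1
    return out_links, in_links, self_links

# Hypothetical toy web of three sites.
links = [("a", "b"), ("c", "b"), ("b", "b"), ("a", "c")]
out_links, in_links, self_links = link_counts(links)
print(in_links.most_common(1))  # the node with most in-links, a candidate "central node"
```

In-link counts play the role that citation counts play for documents, which is the analogy drawn above.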


A lot is based on citations

• Citation analysis:
  – analysis of data derived from references cited in footnotes or bibliographies of scholarly publications
• Used to be just counts
• Now it also leads to examination & mapping of intellectual impact of scholars, projects, institutions, journals, disciplines, and nations


Becoming increasingly popular & widely used – with important implications for searching


Reasons for bibliometric studies

• Understanding of patterns
  – discovery of regularities, behavior
  – “order out of documentary chaos” [Bradford, 1948]
• Analysis of structures & dynamics
  – discovery of connections, relations, networks
  – search for regularities - possible predictions
• Discovery of impacts, effects
  – relation between entities & amounts of their various uses
  – providing support for making of decisions, policies


Major branches of bibliometrics

Relational
• Older - patterns, structures, relations, mappings
  – where bibliometrics started
• Data on what was observed
  – e.g. no. of articles/citations by/to an author; no. of journals with articles relevant to a topic; no. of articles/citations in/to a journal …
• Used for description, mapping of relations & prediction

Evaluative
• Newer – impacts, effects
  – where bibliometrics became a big deal in many arenas
• Data from what was observed but looking for
  – measures of impact, prominence, ranking, bang
• Discovers who’s up & how much up
• Used for decisions, policies


Seeking … Thelwall (2008)

Relational
• Relational bibliometrics seeks to illuminate relationships within research, such as the cognitive structure of research fields, the emergence of new research fronts, or national and international co-authorship patterns

Evaluative
• Evaluative bibliometrics seeks to assess the impact of scholarly work, usually to compare the relative scientific contributions of two or more individuals or groups


Major approaches

Empirical
• Collection & study of data
  – establishment of measures
  – statistical & graphic analyses
• We will pursue some of these here
  – concentrate on empirical

Theoretical
• Building of generalized models, theories
  – often mathematical, abstract
  – becoming highly specialized
• We will NOT pursue this here
  – but you should be aware that there are a lot of theoretical efforts


Users

Relational
• Mostly scholars
• Mostly research oriented
• But also librarians for decisions
  – e.g. on collections, purchase, weeding

Evaluative – new audience
• Library managers
• Analysts
• University administrators (deans, provosts)
• Directors of institutional research
• National governments & ministries
• Grant & funding agencies


Used in a variety of functions & areas

• In collection development
  identifying the most-useful materials: by analyzing circulation records; journal / e-journal usage statistics; etc.
• In information retrieval
  identifying top-ranked documents, authors: those most highly-cited; most highly co-cited; most popular; etc.
• In the sociology of knowledge
  identifying structural and temporal relationships between documents, authors, research areas, universities etc.
• In policy making
  justifying, managing or prioritizing support for a course of action in a number of areas – e.g. science policy, institutional policy


Use of evaluative bibliometrics


• Academic, research & government institutions for:
  – promotion and tenure, hiring, salary raising
  – decisions for support of departments, disciplines
  – grants decision; research policy making
  – visualization of scholarly networks, identifying key contributions & contributors
  – monitoring scholarly developments
  – determining journal citation impact
• Resource allocation:
  – identifying authors most worthy of support
  – research areas most worthy of funding
  – journals most worthy of support or purchase; etc.


Major bibliometric factors for evaluation of academic performance

For individuals
• Number of publications in peer reviewed journals
• The impact factor of those journals
• The h-index

For institutions
• Total no. of publications
• Total no. of citations
• Various ratios - per faculty, project …


Impact indicators and studies

• Several governments mandate citation analysis to
  – assess quality of research and institutions
  – inform decisions on support
  – determine support for journals
  – rank institutions, programs, departments, projects
• Many institutions practice it regularly


3. Data sources for bibliometric analyses

Where does stuff for analysis come from?


Main sources for bibliometric analyses

• Bibliographies, indexes
  – once popular, not any more
  – once done manually - limited
• Documents in databases
  – computerization enabled wide collection of data & development of new methods
• Science statistics
• And then there are citations
  – as they became automated, use of bibliometrics exploded
• Web & Internet
  – mining connections & other networked aspects
  – but also applying some older methods to new data


Institute for Scientific Information (ISI, now Thomson Reuters)

• ISI launched in 1962 by Eugene Garfield
  – started by publishing Science Citation Index (SCI) & later Social Sciences Citation Index (SSCI) and Arts & Humanities Citation Index (A&HCI) [all still in Dialog]
  – these morphed into Web of Science (WoS)
• All cover only an ISI-selected set of journals
  – thus all citation results & studies are based on that set of journals, not the universe of journals and books, but the citations themselves are to whatever is cited
  – true of any database – Scopus, Google Scholar etc.


Impact of ISI citation databases

• Major source for bibliometric analysis
• Revolutionized use of citations
  – e.g. easy citation counts, tracing, establishment of connections … became possible
• Provided data for new types of analysis
  – e.g. mapping of fields, identifying research fronts
• Laid base for evaluative bibliometrics
• Instigated new types of searching
  – above & beyond subject searching


Expansion of citation data sources

• Since the early 2000s, citation data have been offered by a number of databases other than Web of Science, most notably
  – Scopus
  – Google Scholar
  – and a host of others
• This dramatically expanded the availability of data & types of analyses
  – a number of innovations were introduced
  – use of such data also expanded
• Challenge to WoS databases


Connections

• Data from relational bibliometrics is used for sorting, ranking, mapping … in evaluative bibliometrics

• Raw data obtained from relational analyses is then “milked” in many ways
  – often combined with other data
    • e.g. ranked citation counts and financial data, enrollment data …


4. Methods & measures – how?


Overview

• A few older bibliometric laws & methods:
  – Lotka’s law: deals with distribution of authors in a field
  – Bradford’s law: deals with distribution of articles relevant to a subject across journals where they appear
• From citations:
  – citation age (or obsolescence)
  – co-citation
  – clustering & co-citation maps
  – bibliographic coupling
  – journal impact factor
  – self citation (auto-citation)
  – & many more.


Lotka’s law (1926) – papers & authors
Alfred Lotka (1880-1949, American mathematician, chemist and statistician)

Formal
The number of authors who have published n papers in a given field is roughly $1/n^2$ of the number of authors who have published one paper only

English
A large proportion of the total literature in a field is authored by a small proportion of the total number of authors, falling off regularly, where the majority of authors produce but one paper

e.g. if 100 authors each wrote one article over a specific period, we also have 25 authors with 2 articles ($100/2^2 = 25$), 11 with 3 articles ($100/3^2 \approx 11$), 6 with 4 articles ($100/4^2 \approx 6$) etc.
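The inverse-square rule (authors with n papers ≈ authors with one paper divided by n squared) can be checked with a short sketch. The function name `lotka_expected_authors` is made up for this illustration, and rounding to whole authors is an assumption; real author distributions only follow the rule approximately.

```python
def lotka_expected_authors(single_paper_authors, max_papers):
    """Expected number of authors with n = 1..max_papers papers,
    assuming Lotka's inverse-square rule holds exactly."""
    return {
        n: round(single_paper_authors / n**2)
        for n in range(1, max_papers + 1)
    }

# Reproduces the slide's example for 100 single-paper authors.
print(lotka_expected_authors(100, 4))  # {1: 100, 2: 25, 3: 11, 4: 6}
```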


Bradford’s law (1934) – papers & journals
Samuel C. Bradford (1878-1948, British mathematician and librarian)

Formal
If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2 : n^3$ …

n is called the Bradford multiplier

English
• Basically states that most articles in a subject are produced by few journals (called nucleus) and the rest are made up of many separate sources that increase in numbers in a regular, exponential way
• Like Lotka’s law this is a law that generally follows laws of diminishing returns


Bradford’s law: How did he do it?

• He grouped periodicals with articles relevant to a subject (from a bibliography) into 3 zones in order of decreasing yield
  – from journals with largest no. of articles to those with smallest; at the end are journals with one article each on the subject
• Each zone had the SAME number of articles but different no. of journals
• The number of journals in each zone increases exponentially
  – e.g. if there are 5 journals in the first zone that produced 12 relevant articles, there may be 10 journals in the second zone for the next 12 articles & 20 for the next 12
  – the Bradford multiplier (n) found here is 10/5 = 2
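The zone arithmetic in this example can be sketched as follows. Both helper names are hypothetical, and the sketch assumes the idealized case where zone sizes grow by exactly the same multiplier; observed bibliographies only approximate this.

```python
def bradford_zone_sizes(nucleus_journals, multiplier, zones=3):
    """Number of journals per zone under an ideal Bradford distribution:
    nucleus, nucleus * n, nucleus * n^2, ..."""
    return [nucleus_journals * multiplier**k for k in range(zones)]

def estimate_multiplier(zone_sizes):
    """Average ratio between consecutive zone sizes (a rough estimate of n)."""
    ratios = [later / earlier for earlier, later in zip(zone_sizes, zone_sizes[1:])]
    return sum(ratios) / len(ratios)

print(bradford_zone_sizes(5, 2))          # [5, 10, 20], as in the example
print(estimate_multiplier([5, 10, 20]))   # 2.0
```

For a searcher this shows the cost of recall: each further batch of 12 relevant articles requires scanning twice as many journals as the previous one.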


Cited half-life

Formal
• Definition: the number of years that the number of citations takes to decline to 50% of its current total value

English
• How far back in time one must go to account for one half of the citations a journal receives in a given year
  – e.g. if in 2008 the journal XYZ has a cited half-life of 7.0, it means that articles published in XYZ from 2002 to 2008 (inclusive) account for 50% of all citations to articles from that journal (anyplace) in 2008
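A whole-year version of this calculation can be sketched as below. It is a simplification: Journal Citation Reports publishes fractional half-lives, which this sketch does not attempt to reproduce, and the function name and the citation counts in the example are invented for illustration.

```python
def cited_half_life(citations_by_pub_year, census_year):
    """Years back from census_year (inclusive) needed to cover at least
    half of the citations the journal received in census_year.

    citations_by_pub_year maps the publication year of the cited
    articles to the number of citations they received in census_year.
    """
    counts = {y: c for y, c in citations_by_pub_year.items() if y <= census_year}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no citations recorded")
    running = 0
    for age, year in enumerate(range(census_year, min(counts) - 1, -1), start=1):
        running += counts.get(year, 0)
        if 2 * running >= total:
            return age

# Invented data for journal XYZ: 100 citations received in 2008.
xyz_2008 = {2008: 5, 2007: 10, 2006: 10, 2005: 10, 2004: 5,
            2003: 5, 2002: 5, 2001: 20, 2000: 15, 1999: 15}
print(cited_half_life(xyz_2008, 2008))  # 7, i.e. 2002 to 2008 inclusive
```

The same function gives a whole-year citing half-life if it is fed the publication years of the references that a journal cites in the census year.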


Citing half-life

Formal
• Definition: the median age of all cited articles in the journal during a given year

English
• A measure of how current (or how old) are the references cited in a journal
  – e.g. if in 2008 the citing half-life for journal XYZ was 9.0, it means that 50% of articles cited (references) in XYZ were published between years 2000 and 2008 (inclusive)


Co-citation: a popular similarity measure between two entities

Formal
The frequency with which two items of earlier literature are cited together by the later literature
1. frequency with which two documents are cited together, or
2. frequency with which two authors are cited together irrespective of what document

English
• As for 2.: How often are two authors cited together
• If authors A and B are both cited by C, they may be said to be related to one another, even though they don’t directly reference each other
  – if A and B are both cited by many other articles, they have a stronger relationship. The more items they are cited by, the stronger their relationship is
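Counting co-citations comes down to counting, for every pair of cited items, how many citing papers list both. A minimal sketch, assuming each citing paper is given as a list of the authors (or documents) it cites; the function name and the toy data are illustrative only.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how many citing papers cite each pair of items together.

    reference_lists holds one list of cited items per citing paper.
    Pairs are stored in sorted order, so (A, B) and (B, A) are the same key.
    """
    counts = Counter()
    for refs in reference_lists:
        # Use a set so an item cited twice in one paper is counted once.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

citing_papers = [["A", "B"], ["A", "B", "D"], ["B", "D"]]
counts = cocitation_counts(citing_papers)
print(counts[("A", "B")])  # 2: A and B are co-cited by two papers
```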


Use of co-citation

• Co-citation is often used as a measure of similarity
  – if authors or documents are co-cited they are likely to be similar in some way
• This means that if collections of documents are arranged according to their co-citation counts then this should produce a pattern reflecting cognitive scientific relationships
• Author co-citation analysis (ACA) is a technique that measures the similarity of pairs of authors through the frequency with which their work is co-cited
• These are then arranged in maps showing a structure of a field, domain, area of research …


Map of Author Co-citation Analysis of information science
Zhao & Strotmann (2008)


Bibliographic coupling

Formal

• Links two items that reference the same items, so that if A and B both reference C, they may be said to be related, even though they don't directly reference each other. The more items they both reference in common, the stronger their relationship is

• It is backward chaining, while co-citation is forward chaining

English
• Occurs when two works reference a common third work in their bibliographies, e.g.
  – if in one article Saracevic cites Kantor, P. & in another article Belkin cites Kantor, P.,
  – but neither Saracevic nor Belkin cites the other in those articles
  – then Saracevic & Belkin are bibliographically coupled because they cite Kantor
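Coupling strength is simply the number of references two works share. A minimal sketch, with a made-up function name and made-up reference lists apart from the shared Kantor reference used in the example:

```python
def coupling_strength(refs_a, refs_b):
    """Number of references that two works have in common."""
    return len(set(refs_a) & set(refs_b))

# Reference lists are invented except for the shared item.
saracevic_article = ["Kantor, P.", "Reference X"]
belkin_article = ["Kantor, P.", "Reference Y"]
print(coupling_strength(saracevic_article, belkin_article))  # 1
```

Because a work's reference list is fixed once it is published, this strength never changes, whereas a co-citation count can keep growing as later literature appears.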


Journal Impact Factor in Journal Citation Reports (JCR)

Formal
The average number of times articles from the journal published in the past two years have been cited in the JCR year.

The number of citations published in the year X to articles in the journal published in years X − 1 and X − 2, divided by the number of articles published in the journal in the years X − 1 and X − 2.

English
• Measures how often articles in a specific journal have been cited
  – a Journal Impact Factor for journal XYZ of 2.5 means that, on average, the articles published in XYZ one or two years ago have been cited two and a half times

• How to use Journal Citation Reports
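The two-year ratio in the formal definition can be written out directly. The function and parameter names are assumptions for this sketch, and the numbers in the example are invented to give the 2.5 used for journal XYZ.

```python
def journal_impact_factor(citations_to_prior_two_years, articles_in_prior_two_years):
    """Citations received in year X to articles from years X-1 and X-2,
    divided by the number of articles published in X-1 and X-2."""
    if articles_in_prior_two_years <= 0:
        raise ValueError("journal published no articles in the two-year window")
    return citations_to_prior_two_years / articles_in_prior_two_years

# Invented figures: 250 citations in year X to 100 articles from X-1 and X-2.
print(journal_impact_factor(250, 100))  # 2.5
```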


h-index - Hirsch (2005)

Formal

• For a scientist, it is the largest number h such that s/he has at least h publications cited at least h times & the other publications have no more than h citations each
  – it is more than a straight citation count because it takes into account BOTH: number of publications one had AND number of citations one received

English
• The number h of papers a scientist has published that each received at least h citations
• I published (as listed in Scopus):
  – 74 articles
  – 31 of which were considered for h-index (their criteria)
  – of these 15 were cited at least 15 times
  – others were cited less
  – my h-index is 15
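The definition translates into a short routine: rank papers by citations and find the last rank at which the paper still has at least as many citations as its rank. This is a sketch of the definition only, not of how Scopus or any other database selects which papers to consider; the citation counts are invented.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations each
print(h_index([25, 8, 5, 3, 3]))  # 3: one very highly cited paper does not raise h
```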


h-index differences

• There are differences in typical h values in different fields, determined in part by
  – the average number of references in a paper in the field
  – the average number of papers produced by each scientist in the field
  – the size (number of scientists) of the field
• Thus, comparison of h-indexes of scientists in different fields may not be valid
• Keep it to the same field!
  – e.g. h indices in biological sciences tend to be higher than in physics


Citation frequency: citations are skewed

• A few articles are cited a lot, others less, a lot very little or not at all
  – 80-20 distribution: 20% of articles may account for 80% of the citations
  – from 1900-2005, about one half of one percent of cited papers were cited over 200 times. Out of about 38 million source items about half were not cited at all. (Garfield, 2005)

Research front
• This led to identifying of a “research front”
  – cluster of highly cited papers in a domain
  – showing also links among the highly cited papers in form of maps
    • indicating what papers are frequently cited together, i.e. co-cited
• For searchers: identifying current & evolving research fronts in a domain
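The skew described on this slide can be measured for any set of citation counts. The sketch below uses invented counts and hypothetical helper names; it shows what share of all citations goes to the top 20% of papers and what share of papers is never cited.

```python
def top_share(citations, fraction=0.2):
    """Share of all citations received by the most-cited `fraction` of papers."""
    ranked = sorted(citations, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no citations to distribute")
    top_n = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_n]) / total

def uncited_share(citations):
    """Share of papers with zero citations."""
    return sum(1 for c in citations if c == 0) / len(citations)

# Invented citation counts for ten papers.
counts = [40, 40, 5, 5, 5, 5, 0, 0, 0, 0]
print(top_share(counts))      # 0.8: the top 2 of 10 papers hold 80% of citations
print(uncited_share(counts))  # 0.4
```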


Aggregate article & citation statistics

• Derived from citation databases
  – combined statistics for a variety of entities
• “Milked” in great many, even ingenious ways
  – e.g. a major component in ranking of universities (shown later)
• The number of citations to all articles in a
  – journal (base for Journal Impact Factor)
  – or all articles or citations received by
    • author
    • research group
    • institution
    • country


5. A sample of examples


Scopus citation tracking for an author


Scopus journal analyzer - three journals selected for comparison

could be further analyzed by tabs or listed in a table


Web of Science citation report for an author


Web of Science Journal Citation Report for three journals


Histogram for JASIST using Garfield's HistCite
LCS = Local Citation Score; count of how much cited in JASIST
GCS = Global Citation Score; count of how much cited in all journals in WoS
LCR = Local Cited References; how many references from JASIST
NCR = Number of Cited References; how many references in the paper


WoS: Essential Science Indicators


WoS: InCites


SCImago Journal & Country Rank (SJR) a great resource – from Spain


SJR Journal Analysis for Information Processing & Management


SJR Country Indicators


University rankings

• Times Higher Education ranking: QS World University Rankings 2008 - Top 400 Universities
  http://www.topuniversities.com/worlduniversityrankings/results/2008/overall_rankings/fullrankings/
• Shanghai ranking: Academic Ranking of World Universities – 2007 - Shanghai Jiao Tong University
  http://www.arwu.org/rank/2007/ranking2007.htm
  – Miscellaneous Information on University Rankings
    http://www.arwu.org/rank/2008/200810/ARWU2008Resources.htm
• Leiden ranking: Top 100 & 250 universities, Europe & world, 2008 - Centre for Science and Technology Studies (CWTS), Leiden University, Netherlands
  http://www.cwts.nl/ranking/LeidenRankingWebSite.html


6. Implications for searching. Caveats

What to watch for? Ethical issues as well


Role of searchers

Relational bibliometric searching
Older:
• Connected with subject searches
  – adding dimension of authors, sources …
• Performing citation analyses
  – e.g. identifying key papers, authors, sources
  – citation pearl growing

Evaluative bibliometric searching
Newer - higher responsibility:
• Called to perform searches related to bibliometric indicators of impact
  – often by administrators, decision makers, policy wonks, managers, e.g. for tenure & promotion; resource allocation; grants; purchase decisions; justification …


Implication for searching because of scatter

• Journals & articles are scattered, so are authors
  – many articles are in core journals – easy to find
  – BUT: a number of relevant articles will be scattered throughout other journals
  – These need to be found
    • not to miss relevant articles in non-core journals
• High precision searching concentrates on top producing journals and authors in a subject
• High recall searching includes the long tail of authors and journals
  – but the long tail could be very long
    • need to know when to stop

Tefko Saracevic 57

Key: Adjusting effectiveness & efficiency of searching to laws of diminishing returns
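
The diminishing-returns point can be made concrete with a small sketch. It assumes a searcher already has a count of relevant articles per journal; the function names and the distribution below are made up for illustration and do not come from any real search.

```python
from itertools import accumulate


def coverage_by_rank(articles_per_journal):
    """Cumulative share of relevant articles as journals are added,
    most productive journal first."""
    counts = sorted(articles_per_journal, reverse=True)
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


def journals_needed(articles_per_journal, target_recall):
    """Smallest number of top-ranked journals that reaches target_recall."""
    shares = coverage_by_rank(articles_per_journal)
    for n, share in enumerate(shares, start=1):
        if share >= target_recall:
            return n
    return len(shares)


# Hypothetical subject: 3 core journals plus a long tail (100 articles in all).
counts = [40, 25, 15] + [2] * 5 + [1] * 10

for recall in (0.8, 0.9, 1.0):
    print(recall, journals_needed(counts, recall))
```

In this invented distribution, three journals give 80% of the articles, but full recall requires all eighteen. Each further step in recall costs more journals than the previous one, which is the signal for deciding when to stop.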


Caveats for citations (and there are many)

• Citation rates & practices differ greatly among fields
  – citation & publication practices are NOT homogeneous within specialties and fields of science (Leydesdorff, 2008)
• The context could be negative
• A citation may not be relevant to the work
• The second, third … author may not be cited at all
• Matthew effect (rich get richer) or success-breeds-success mechanism works in citations
  – already well-known individuals receive a disproportionately high rate of citation
• Self citation practices & citation padding
  – author citing him/herself; journal articles citing their own journal

Tefko Saracevic 58


Caveat for author & citation disambiguation

• Distinguishing Saracevic, T. from other authors is not hard – to zero in on that one author
  – Belkin, N. is harder; Kantor, P. still harder; Ying, Z. almost impossible
  – thus, VERY careful disambiguation is necessary
    • sometimes very time consuming; sometimes never sure
• Citations in articles are often messy & careless
  – e.g. my name while being cited was misspelled in many creative ways – no corrections are made by databases
  – thus, variations have to be explored to be included in citation counts
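
One way to explore name variations is to normalize cited-author strings and flag near matches for manual review. The sketch below is only an illustration: it assumes names are written surname first, the helper names and threshold are hypothetical, and the misspelling is invented. It proposes candidates and does not settle whether two strings are the same person.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Reduce a 'Surname, Given' string to 'surname initial' in plain ASCII."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    )
    parts = re.sub(r"[^a-z ]", " ", ascii_name.lower()).split()
    if not parts:
        return ""
    surname, rest = parts[0], parts[1:]
    initial = rest[0][0] if rest else ""
    return f"{surname} {initial}".strip()


def likely_same(a, b, threshold=0.85):
    """True if the normalized forms are similar enough to merit a manual check."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


# Invented variants of one cited name, plus an unrelated author.
print(normalize("Saračević, Tefko"))                   # saracevic t
print(likely_same("Saracevic, T.", "Sarasevic, T"))    # candidate variant
print(likely_same("Saracevic, T.", "Belkin, N."))      # not a candidate
```

A fuzzy match like this also pairs different people with similar names, which is exactly why common names stay hard: the output is a list to check by hand, not a final citation count.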

Tefko Saracevic 59


Caveats for h-index - (Hirsch, 2005)

• “Obviously, a single number can never give more than a rough approximation to an individual’s multifaceted profile, and many other factors should be considered in combination in evaluating an individual.”

• “Furthermore, the fact that there can always be exceptions to rules should be kept in mind, especially in life-changing decisions such as the granting or denying of tenure.”
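
Why a single number is only a rough approximation can be seen in a short sketch of the h-index calculation (the largest h such that h papers have at least h citations each). The two citation profiles are made up for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Two hypothetical authors with very different citation records.
author_a = [5, 5, 5, 5, 5]
author_b = [900, 450, 120, 30, 5, 1, 0]

print(h_index(author_a), sum(author_a))
print(h_index(author_b), sum(author_b))
```

Both invented authors get the same h-index even though one has 25 citations in total and the other over 1,500, so the index alone hides most of the profile.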

Tefko Saracevic 60


Caveat for webometrics & Web sources – Thelwall (2008)

• Web data is not quality controlled
  – caveat emptor (search for what it means)
• Web data is not standardized
  – e.g. there does not seem to be a simple way to separate out web citations in online journal articles from those in online course reading lists
• It can be impossible to find the publication date of a web page
  – results typically combine new and old web pages
• Web data is incomplete in several senses and in arbitrary ways

Tefko Saracevic 61


Caveat for Journal Impact Factor (JIF)

• Assumption: journals with higher JIFs tend to publish higher impact research & hence tend to be better regarded. But:
  – JIFs vary greatly from field to field, because citation practices differ greatly
  – even within discrete subject fields, ranking journals based upon JIFs is problematic
  – it is but one measure, other characteristics are important
  – because of the JIF's popularity, journal citations are misused:
    • recommendations to authors to cite other articles in a given journal to improve its JIF
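
How much journal self-citation can move a JIF is easy to sketch. This assumes the usual two-year definition (citations received in a year to items from the previous two years, divided by the citable items published in those two years). All numbers are hypothetical.

```python
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Two-year impact factor: citations divided by citable items."""
    if citable_items_prev_two_years <= 0:
        raise ValueError("need at least one citable item")
    return citations_to_prev_two_years / citable_items_prev_two_years


# Hypothetical journal: 200 citable items in the two preceding years.
citable_items = 200
citations_from_other_journals = 300
citations_from_own_articles = 100  # e.g. prompted by "cite our journal" advice

with_self = impact_factor(
    citations_from_other_journals + citations_from_own_articles, citable_items
)
without_self = impact_factor(citations_from_other_journals, citable_items)

print(with_self, without_self)
```

In this made-up case a third of the apparent impact factor comes from the journal citing itself, so comparing values with and without self-citations is a useful check when a JIF is used in a decision.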

Tefko Saracevic 62


Caveat for coverage: differences can be substantial

• Different databases cover different articles, citations, handle them differently …
  – there is no one answer to: “How many citations did X receive?”
• For the same author (institution …) different databases will provide different
  – no. of articles, citations; h-index; … overlap may not be great
  – in citations there are even ghost citations (listed as citing an article but there is no actual citation in the article)
• Careful comparisons & use of multiple databases are necessary
• A whole literature on these inconsistencies emerged
  – one of the frequent analyzers is Peter Jacso, U of Hawaii
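
A comparison across databases can be sketched as simple set arithmetic. The sketch assumes the citing documents from each database have already been matched to a shared identifier (itself a disambiguation task); the database names and IDs are placeholders, not real results.

```python
def compare_coverage(citing_by_db):
    """citing_by_db maps a database name to the set of citing-document IDs
    it reports for the same cited author or article."""
    if not citing_by_db:
        raise ValueError("need at least one database")
    id_sets = list(citing_by_db.values())
    all_ids = set().union(*id_sets)
    in_all = set.intersection(*id_sets)
    return {
        "counts": {db: len(ids) for db, ids in citing_by_db.items()},
        "union": len(all_ids),
        "in_all": len(in_all),
        "overlap": len(in_all) / len(all_ids) if all_ids else 0.0,
    }


# Placeholder data for three hypothetical databases.
sample = {
    "db_a": {"d1", "d2", "d3", "d4"},
    "db_b": {"d3", "d4", "d5"},
    "db_c": {"d1", "d4", "d6", "d7", "d8"},
}

report = compare_coverage(sample)
print(report)
```

Here each database reports a different count (4, 3 and 5), the combined total is 8, and only one citing document appears in all three. Reporting the per-database counts together with the union shows the reader why there is no one answer.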

Tefko Saracevic 63


Searching ….

Tefko Saracevic 64