

Library and information science (LIS) as we see it: An overview at the state and country level from 1965-2010

Erjia Yan Indiana University

1320 E10th St, LI011 [email protected]

Ying Ding Indiana University

1320 E10th St, LI011 [email protected]

ABSTRACT

This study explores and visualizes the dynamic citation impact at the state and country level. We present several information visualization approaches, including cartograms and global citation maps, to understand the distribution of citation impact in library and information science (LIS). We find that citation impact in LIS is highly skewed towards North American and European countries. The dynamic analysis shows that states such as GEORGIA and FLORIDA and countries such as CHINA and SOUTH KOREA are becoming more impactful.

Keywords cartogram, citation, network, library and information science.

INTRODUCTION

Scholars now have unprecedented access to comprehensive academic repositories. The vast amount of scholarly data raises the hope that mankind’s scientific endeavor can be fully analyzed and interpreted. Nevertheless, these collections are underused and have great potential for further use. For instance, previous citation impact analyses have mainly focused on research aggregates such as authors and journals, whereas higher levels of aggregation, such as state/province or country, have not been well explored. Findings from country-level analysis provide rich background information for author- or journal-level analyses and can moreover confirm and validate studies on these lower-level aggregates. A common problem for bibliometricians is interpreting results in a meaningful way. An environment parameter is needed to interpret a result such as “author A received 20% more citations from 2009 to 2010”. The findings from country-level analysis can serve as such environment parameters, providing a useful context for these analyses. Therefore, the present study explores and visualizes the dynamic citation impact at the state and country level.

The typical approach to geographical visualization is to plot a data set on a standard projection with a color code. For example, in the U.S. election map, states are colored red or blue to indicate whether the majority of their voters voted for the Republican or the Democratic candidate. Such a visualization, however, can be misleading because many states with large areas have few electoral votes. An effective solution to this problem is to plot the data in cartograms. Cartograms are maps in which the sizes of geographic regions are purposely distorted and displayed in proportion to a chosen measurement, such as population or gross domestic product (GDP). For the problem mentioned above, cartograms correct the miscommunication by scaling the size of each state according to its number of electoral votes (Gastner & Newman, 2004). In U.S. election cartograms, therefore, equal area indicates an equal number of electoral votes rather than equal physical area, so that states with more electoral votes are more visible. Cartograms are thus considered more effective and informative geographical presentations.
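The proportionality idea behind a cartogram can be sketched in a few lines. The snippet below is only an illustration of how target areas follow from a measurement; it is not the diffusion-based algorithm of Gastner and Newman (2004), which additionally keeps regions contiguous. The function name, the region labels, and all numbers are invented for the example.

```python
def cartogram_target_areas(values, areas):
    """Return the area each region should occupy so that area is
    proportional to its value, keeping the total map area unchanged."""
    total_value = sum(values.values())
    total_area = sum(areas.values())
    if total_value <= 0:
        raise ValueError("values must sum to a positive number")
    return {name: total_area * values[name] / total_value for name in values}


# Toy, made-up numbers (not real states): region A is large but has few
# votes, region C is small but has many.
areas = {"A": 600.0, "B": 300.0, "C": 100.0}  # physical area, arbitrary units
votes = {"A": 5, "B": 10, "C": 35}            # measurement driving the map

targets = cartogram_target_areas(votes, areas)

# Linear scale factor per region: below 1 shrinks it, above 1 enlarges it.
scale = {name: (targets[name] / areas[name]) ** 0.5 for name in areas}
```

In this toy case region C, with most of the votes, grows to occupy most of the map, while region A shrinks even though it is physically the largest.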

A popular and comprehensive country-level citation impact analysis is provided by SCImago (2010). Using Scopus citation data, SCImago developed an easily accessible interface that allows users to search and compare country-level indicators, such as number of documents, number of citations, and h-index. With regard to visualization, both cartographic and geographic maps have been used to show scientific productivity. For example, Carvalho and Batty (2006) used cartograms to illustrate the geographical distribution of scientific productivity in computer science in the U.S. They found that institutional productivity still follows a power law distribution when transformed from geographic to cartographic space. Leydesdorff and Persson (2010) overlaid an institution collaboration network on several geographic maps, using Google Earth, Google Maps, and Pajek. Previous country-level analyses mainly concentrated on productivity (e.g., number of publications, number of coauthors); the current study focuses on the citation

ASIST 2011, October 9-13, 2011, New Orleans, LA, USA.


Time        Size of cited U.S. states   No. of internal citations (states)   Size of cited world countries   No. of internal citations (countries)
1965-1990   49+D.C.                     5,408                                52                              8,494
1991-2000   50+D.C.                     15,826                               73                              25,957
2001-2005   50+D.C.                     18,782                               72                              35,500
2006-2010   50+D.C.                     32,733                               73                              58,451
1965-2010   50+D.C.                     72,743                               80                              128,402

Table 1. Size of institution citation networks

impact, which is a more effective candidate for scientific performance evaluation. In addition, cartograms are adopted to display the results in a vivid and instructive manner.

DATA

The dataset used in this analysis was drawn from all documents in the 59 journals indexed in the Information Science & Library Science category of the 2008 version of the Journal Citation Reports (JCR) [1]. All document types published in these journals from January 1965 to February 2010 were downloaded for analysis [2].

Data were processed in several steps. The first step was to filter out documents without addresses. Documents that were not cited by other documents in the data set were then excluded, because citations serve as the unit indicating a relationship between documents. The final dataset therefore consisted of documents with at least one author affiliation that had been cited by another document (also containing author affiliations) within the dataset. Citation counts between documents were then calculated using the concept of “internal citation”, that is, the number of times an article has been cited by other articles in the network, which reflects local impact.
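The filtering and counting described above can be sketched as follows. This is a minimal illustration under assumed data structures (a dictionary of affiliations per document and a list of citing/cited pairs), not the authors' actual processing code; the function name and the toy records are hypothetical.

```python
from collections import Counter


def internal_citation_counts(docs, citations):
    """Count internal citations per cited document.

    docs: dict mapping doc_id -> list of affiliation strings (may be empty).
    citations: iterable of (citing_id, cited_id) pairs.
    """
    # Step 1: keep only documents that carry at least one address.
    with_address = {doc_id for doc_id, affs in docs.items() if affs}

    counts = Counter()
    for citing, cited in citations:
        # A link counts only if both ends are in the data set with an address.
        if citing in with_address and cited in with_address:
            counts[cited] += 1
    # Documents never cited internally are simply absent from the result.
    return counts


# Toy records, invented for the example.
docs = {
    "d1": ["Indiana University, IN, USA"],
    "d2": ["Drexel University, PA, USA"],
    "d3": [],  # no address: filtered out
    "d4": ["University of Sheffield, England"],
}
citations = [("d1", "d2"), ("d4", "d2"), ("d3", "d2"), ("d2", "d9"), ("d1", "d4")]

counts = internal_citation_counts(docs, citations)
```

Here the citation from "d3" is dropped because the citing document has no address, and the citation to "d9" is dropped because that document lies outside the data set.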

The second step involved identifying U.S. state names and world country names from the affiliation data. Country names were unified under ISO 3166-1. The dataset was then divided into four subsets based on the citing papers’ year of publication. The first two periods span more years because the early years contained too little data to produce comparable networks. Table 1 shows the size of the data set for the four time periods and for the full span (1965-2010).
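A minimal sketch of this second step is given below, assuming each citation record carries the citing paper's year and the cited country as a raw string. The period boundaries come from Table 1; the alias table is a hypothetical stand-in, since the source does not list which name variants were unified.

```python
from collections import Counter, defaultdict

# The four citing-year windows used in this study (inclusive bounds).
PERIODS = [(1965, 1990), (1991, 2000), (2001, 2005), (2006, 2010)]

# Hypothetical alias table for illustration only; the actual variants that
# were unified are not given in the source.
COUNTRY_ALIASES = {"USA": "UNITED STATES", "ENGLAND": "UNITED KINGDOM"}


def unify_country(raw_name):
    """Upper-case a raw country string and map known variants to one name."""
    name = raw_name.strip().upper()
    return COUNTRY_ALIASES.get(name, name)


def period_of(citing_year):
    """Return the period label for a citing year, or None if out of range."""
    for start, end in PERIODS:
        if start <= citing_year <= end:
            return f"{start}-{end}"
    return None


def counts_by_period(records):
    """records: iterable of (citing_year, cited_country) pairs."""
    result = defaultdict(Counter)
    for year, country in records:
        label = period_of(year)
        if label is not None:
            result[label][unify_country(country)] += 1
    return result


# Toy records, invented for the example.
toy = [(1988, "USA"), (1995, "England"), (1995, "United Kingdom"), (2012, "Canada")]
by_period = counts_by_period(toy)
```

Binning by the citing year rather than the cited year means each subset reflects the impact a region had during that period, regardless of when the cited work was published.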

RESULTS

Table 2 shows the citation counts (CC) for the U.S. states in the four time periods.

[1] There are 61 journals categorized as Information Science & Library Science in 2008; two journals written in foreign languages were excluded: PROF INFORM and Z BIBL BIBL. The total number of journals in the data set is 59.
[2] See detailed data processing procedures at Yan & Sugimoto (2011).

State name (1965-1990)       CC      %
PENNSYLVANIA, USA            811     15.00%
ILLINOIS, USA                569     10.52%
NEW YORK, USA                558     10.32%
CALIFORNIA, USA              557     10.30%
MASSACHUSETTS, USA           266     4.92%
OHIO, USA                    261     4.83%
NEW JERSEY, USA              191     3.53%
TEXAS, USA                   171     3.16%
DISTRICT OF COLUMBIA, USA    167     3.09%
INDIANA, USA                 172     3.18%

State name (1991-2000)       CC      %
PENNSYLVANIA, USA            2285    14.44%
CALIFORNIA, USA              1360    8.59%
NEW YORK, USA                1305    8.25%
ILLINOIS, USA                1119    7.07%
MASSACHUSETTS, USA           952     6.02%
TEXAS, USA                   772     4.88%
OHIO, USA                    699     4.42%
MINNESOTA, USA               639     4.04%
INDIANA, USA                 586     3.70%
MICHIGAN, USA                465     2.94%

State name (2001-2005)       CC      %
PENNSYLVANIA, USA            1706    9.08%
CALIFORNIA, USA              1560    8.31%
NEW YORK, USA                1362    7.25%
TEXAS, USA                   1333    7.10%
GEORGIA, USA                 1291    6.87%
MASSACHUSETTS, USA           1099    5.85%
ILLINOIS, USA                973     5.18%
INDIANA, USA                 758     4.04%
FLORIDA, USA                 733     3.90%
MINNESOTA, USA               661     3.52%

State name (2006-2010)       CC      %
GEORGIA, USA                 3195    9.76%
PENNSYLVANIA, USA            2688    8.21%
CALIFORNIA, USA              2629    8.03%
TEXAS, USA                   2093    6.39%
NEW YORK, USA                2013    6.15%
FLORIDA, USA                 1889    5.77%
MASSACHUSETTS, USA           1504    4.59%
ILLINOIS, USA                1372    4.19%
MARYLAND, USA                1067    3.26%
MICHIGAN, USA                1040    3.18%

Table 2. Citation counts for U.S. states
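The percentage column in Table 2 appears to be each state's citation count divided by the total number of internal citations among states for that period (Table 1). The short check below reproduces the first four rows for 1965-1990 under that assumption; the function name is illustrative.

```python
def citation_shares(counts, period_total):
    """Return each region's share of the period total, in percent (2 d.p.)."""
    return {name: round(100.0 * cc / period_total, 2) for name, cc in counts.items()}


# Citation counts from Table 2 (1965-1990) and the state total from Table 1.
cc_1965_1990 = {
    "PENNSYLVANIA": 811,
    "ILLINOIS": 569,
    "NEW YORK": 558,
    "CALIFORNIA": 557,
}
total_1965_1990 = 5408

pct = citation_shares(cc_1965_1990, total_1965_1990)
# pct["PENNSYLVANIA"] is 15.0, matching the 15.00% reported in Table 2.
```

Because the denominator grows from 5,408 to 32,733 across the four periods, a state can gain citations in absolute terms while its share falls.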

PENNSYLVANIA ranked first in the first three time periods, as it is home to several renowned information schools, including those at Drexel University, the University of Pittsburgh, and Pennsylvania State University. GEORGIA overtook PENNSYLVANIA in 2006-2010 owing to its strong research capability in information systems in recent years; specifically, the University of Georgia and Georgia State University are emerging centers of information systems research.

Figure 1 visualizes the geographical distribution of citations among U.S. states through cartograms. The four cartograms were produced with Gastner and Newman’s (2004) diffusion-based method.

Figure 1. Cartogram visualizations of citation distributions among U.S. states

As can be seen from the four cartograms, citations are skewed towards the northeastern part of the U.S., including PENNSYLVANIA, NEW YORK, MASSACHUSETTS, NEW JERSEY, and DISTRICT OF COLUMBIA. Notably, DISTRICT OF COLUMBIA is more visible because, in the cartograms, the size of a state is proportional to its number of citations rather than its physical area. Comparing the four cartograms shows that southern states, such as GEORGIA and FLORIDA, are becoming more impactful, while some states in the middle part of the U.S., such as ILLINOIS, INDIANA, and OHIO, have lost citation share.

Table 3 shows the citation counts (CC) for countries worldwide in the four time periods.

Country name (1965-1990)     CC       %
UNITED STATES                5408     63.68%
UNITED KINGDOM               1690     19.90%
CANADA                       259      3.05%
RUSSIA                       155      1.83%
HUNGARY                      150      1.77%
INDIA                        100      1.18%
NETHERLANDS                  89       1.05%
FRANCE                       73       0.86%
BELGIUM                      68       0.80%
GERMANY                      64       0.75%

Country name (1991-2000)     CC       %
UNITED STATES                15833    61.00%
UNITED KINGDOM               3414     13.15%
CANADA                       1181     4.55%
NETHERLANDS                  713      2.75%
HUNGARY                      622      2.40%
FRANCE                       478      1.84%
BELGIUM                      439      1.69%
GERMANY                      431      1.66%
INDIA                        385      1.48%
DENMARK                      345      1.33%

Country name (2001-2005)     CC       %
UNITED STATES                18789    52.93%
UNITED KINGDOM               4213     11.87%
CANADA                       2098     5.91%
NETHERLANDS                  1068     3.01%
BELGIUM                      1054     2.97%
DENMARK                      763      2.15%
GERMANY                      745      2.10%
FRANCE                       732      2.06%
HUNGARY                      694      1.95%
CHINA                        658      1.85%

Country name (2006-2010)     CC       %
UNITED STATES                32733    56.00%
UNITED KINGDOM               5295     9.06%
CANADA                       3465     5.93%
NETHERLANDS                  1639     2.80%
BELGIUM                      1389     2.38%
HUNGARY                      1305     2.23%
CHINA                        1236     2.11%
GERMANY                      1210     2.07%
AUSTRALIA                    1131     1.93%
FINLAND                      1021     1.75%

Table 3. Citation counts for countries

The UNITED STATES accounted for more than 50% of the world’s citation counts in LIS in every period. The UNITED KINGDOM ranked second in all four time periods; however, its share declined steadily from 20% to 9%. CANADA, on the other hand, increased its share from 3% in 1965-1990 to 6% in 2006-2010.

Figure 2. Cartogram visualizations of citation distributions among countries


Figure 3. Citation traffic visualization (1965-2010)

The four cartograms in Figure 2 were produced with Dougenik, Chrisman, and Niemeyer’s (1985) technique. They show that some European countries, such as NETHERLANDS, BELGIUM, DENMARK, GERMANY, FRANCE, and HUNGARY, held stable shares across the four time periods. Meanwhile, the rise of some Asian countries, including CHINA and SOUTH KOREA, can also be identified. Countries in South America and Africa are strongly under-represented, so their impact is barely visible in LIS. By contrast, countries in North America and Europe are over-represented. These countries had high scientific productivity, their publications in turn received more citations, and they can therefore be regarded as the research centers of LIS. Finally, the cumulative worldwide citation traffic in LIS (1965-2010) is displayed in Figure 3, which confirms the findings from Figure 2.

CONCLUSION

We have witnessed a growing trend over the past decade in the study of scholarly networks. However, these studies have not sufficiently addressed the impact of physical location on scientific productivity. As pointed out by Fox and Hendler (2011), visualizations are critical to the understanding of complex data. In this study, we present several information visualization approaches, including cartograms and global citation maps, to better perceive the distribution of productivity in LIS at the state and country level.

We find that productivity and impact in LIS are highly skewed towards North American and European countries. The dynamic analysis shows that states such as GEORGIA and FLORIDA and countries such as CHINA and SOUTH KOREA are becoming more impactful. The visualization approaches used in this study can serve as a starting point for ongoing scientific inquiries in spatial bibliometrics. Future studies should extend the current data set to include publications from other disciplines and explore how geospatial distribution patterns differ across disciplines.

ACKNOWLEDGEMENT

The authors would like to thank Dr. Michael Gastner for his help in generating the cartograms.

REFERENCES

Carvalho, R., & Batty, M. (2006). The geography of scientific productivity: Scaling in US computer science. Journal of Statistical Mechanics: Theory and Experiment, 10. DOI: 10.1088/1742-5468/2006/10/P10012

Dougenik, J. A., Chrisman, N. R., & Niemeyer, D. R. (1985). An algorithm to construct continuous area cartograms. The Professional Geographer, 37, 75–81.

Fox, P., & Hendler, J. (2011). Changing the equation on scientific data visualization. Science, 331(6018), 705-708.

Gastner, M. T., & Newman, M. E. J. (2004). Diffusion-based method for producing density-equalizing maps. Proceedings of the National Academy of Sciences of the United States of America, 101(20), 7499-7504.

Leydesdorff, L., & Persson, O. (2010). Mapping the geography of science: Distribution patterns and networks of relations among cities and institutes. Journal of the American Society for Information Science & Technology, 61(8), 1622-1634.

SCImago (2007). SJR: SCImago Journal & Country Rank. Retrieved August 31, 2009 from http://www.scimagojr.com

Yan, E., & Sugimoto, C. R. (2011, forthcoming). Institutional interactions: Exploring the social, cognitive, and geographic relationships between institutions as demonstrated through citation networks. Journal of the American Society for Information Science and Technology.