
Rock around the clock? Exploring scholars’ downloading patterns


Our results on researchers’ downloading patterns are consistent with those obtained by other studies:
•  Wang et al. (2012): the overall number of downloads decreases on weekends
•  Magnone (2013) and Cabanac & Hartley (2013): a decline in the number of paper submissions during weekends
•  Wang et al. (2012): American researchers’ timetable is much steadier than that of other countries

However:
•  Magnone (2013): the number of submissions is lower in fall and winter
•  while paper submissions provide an indicator of researcher-authors’ working habits, downloads can be performed by a much broader set of users, such as undergraduates, practitioners or the general public (Moed & Halevi, 2016)
•  Cabanac & Hartley (2013): an increase in researchers’ work-related online activity between 2001 and 2012
•  the period we analysed is too short to show any change in researchers’ online habits from one year to the next (the hourly and weekly patterns were similar from 2011 to 2015 for the three countries, except for a very slight increase in downloads during the weekend for Canada)

Our results suggest that there are different types of Érudit users: while most Érudit users in Canada and Quebec might be undergraduate students, downloads made in France and, especially, in the United States might be performed by researchers. They also show the influence of special events.


1. CONTEXT 2. METHODS

Sarah Cameron-Pesant¹, Yorrick Jansen² and Vincent Larivière³

[email protected]
¹ Université de Montréal, École de bibliothéconomie et des sciences de l’information, 3150 Jean-Brillant, H3T 1N8 Montréal, Qc. (Canada)
² 1science, 3863 St-Laurent Blvd. Suite 206, H2W 1Y1 Montréal, Qc. (Canada)
³ Université de Montréal, École de bibliothéconomie et des sciences de l’information, C.P. 6128, Succ. Centre-Ville, H3C 3J7 Montréal, Qc. (Canada) and Université du Québec à Montréal, Centre interuniversitaire de recherche sur la science et la technologie (CIRST), Observatoire des sciences et des technologies (OST), C.P. 8888, Succ. Centre-Ville, H3C 3P8 Montréal, Qc. (Canada)

Canada Research Chair on the Transformations of Scholarly Communication, Prof. Vincent Larivière

IF (the referring field is unassigned
    AND no images are requested
    AND the IP address downloads more than 100 scholarly papers / day)
OR (no JavaScript files are requested
    AND no CSS files are requested
    AND the IP address downloads more than 100 scholarly papers / day)
THEN classify as a web robot
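The rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `IpDayActivity` record and its field names are hypothetical, and the sketch assumes that requests have already been aggregated per IP address and per day, with `has_referrer` set when at least one request carried a referring field.

```python
from dataclasses import dataclass

DAILY_PAPER_THRESHOLD = 100  # "more than 100 scholarly papers / day" in the rule


@dataclass
class IpDayActivity:
    """Hypothetical per-IP, per-day summary built from the web logs."""
    has_referrer: bool        # False when the referring field is unassigned
    image_requests: int
    javascript_requests: int
    css_requests: int
    paper_downloads: int


def is_web_robot(activity: IpDayActivity) -> bool:
    """Apply the two-branch IF/OR rule; both branches share the volume test."""
    heavy = activity.paper_downloads > DAILY_PAPER_THRESHOLD
    no_referrer_no_images = (not activity.has_referrer
                             and activity.image_requests == 0)
    no_js_no_css = (activity.javascript_requests == 0
                    and activity.css_requests == 0)
    return heavy and (no_referrer_no_images or no_js_no_css)


# Made-up examples for illustration only.
crawler = IpDayActivity(False, 0, 3, 1, 250)
reader = IpDayActivity(True, 40, 12, 6, 8)
```

Because the volume condition appears in both branches, it is factored out here; the result is logically equivalent to the rule as written.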

4. DISCUSSION

3. RESULTS

Number of log files: 2,062
Total number of HTTP requests in the log files: 999,367,190
Percentage of parsable HTTP requests in the log files: 99.99 %
Percentage of HTTP requests that refer to scholarly papers ([…] and […]): 10.34 %
Percentage of HTTP requests that refer to scholarly papers ([…] only): 3.95 %
Total number of downloads of scholarly papers by […] in Érudit’s web logs: 39,437,659
Number of downloads of scholarly papers by […] analysed (excluded: NULLs and 2010 because of missing data): [19,318,374 .. 19,430,509]
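As a quick consistency check on the figures above, the percentages can be recomputed from the raw counts. The sketch below assumes that the total of 39,437,659 counts downloads of scholarly papers and corresponds to the 3.95 % row; the variable names are illustrative only.

```python
# Counts copied from the table above.
total_requests = 999_367_190
paper_downloads = 39_437_659
analysed_min, analysed_max = 19_318_374, 19_430_509

# Share of all HTTP requests that are paper downloads (assumed to be the 3.95 % row).
share_of_requests = 100 * paper_downloads / total_requests

# Share of paper downloads retained in the analysed range.
analysed_share_min = 100 * analysed_min / paper_downloads
analysed_share_max = 100 * analysed_max / paper_downloads

print(f"Paper downloads as a share of all requests: {share_of_requests:.2f} %")
print(f"Share of paper downloads analysed: "
      f"{analysed_share_min:.1f} % to {analysed_share_max:.1f} %")
```

Under that assumption, roughly half of the recorded paper downloads remain in the analysed set.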

References

Cabanac, G., & Hartley, J. (2013). Issues of work–life balance among JASIST authors and editors. Journal of the American Society for Information Science and Technology, 64(10), 2182-2186. doi:10.1002/asi.22888

Doran, D., & Gokhale, S. S. (2010). Web robot detection techniques: overview and limitations. Data Mining and Knowledge Discovery, 22(1-2), 183-210. doi:10.1007/s10618-010-0180-z

Geens, N., Huysmans, J., & Vanthienen, J. (2006). Evaluation of web robot discovery techniques: a benchmarking study. In P. Perner (Ed.), Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining (pp. 121-130). Springer Berlin Heidelberg. Retrieved from http://link.springer.com/chapter/10.1007/11790853_10

Guest, D. E. (2002). Perspectives on the study of work-life balance. Social Science Information, 41(2), 255-279. doi:10.1177/0539018402041002005

Magnone, E. (2013). A scientometric look at calendar events. Journal of Informetrics, 7, 101-108. doi:10.1016/j.joi.2012.09.006

Moed, H. F., & Halevi, G. (2016). On full text download and citation distributions in scientific-scholarly journals. Journal of the Association for Information Science and Technology, 67(2), 412-431. doi:10.1002/asi

Wang, X., Xu, S., Peng, L., Wang, Z., Wang, C., Zhang, C., & Wang, X. (2012). Exploring scientists’ working timetable: do scientists often work overtime? Journal of Informetrics, 6(4), 655-660. doi:10.1016/j.joi.2012.07.003

Hourly and weekly working patterns

•  for the three countries: Monday, Tuesday and Wednesday are busiest; downloads on Friday and Sunday are similar

•  start day slightly earlier than the French

•  effect of lunch time and dinner

•  activity much lower during the weekend than the French and American

•  continue to work later in the afternoon and early evening

•  effect of lunch time and dinner

•  active at night, even during the weekends

•  activity steadier across the day and across the week
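Hourly and weekly profiles of this kind can be derived by counting downloads per hour of day and per weekday for each country. The following is a minimal sketch, assuming each download is available as a (country, local timestamp) pair; the function name and the sample records are hypothetical, and the conversion from server time to each user's local time is left out.

```python
from collections import Counter, defaultdict
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def activity_profiles(downloads):
    """Return per-country shares of downloads by hour (0-23) and by weekday."""
    by_hour = defaultdict(Counter)
    by_weekday = defaultdict(Counter)
    for country, timestamp in downloads:
        by_hour[country][timestamp.hour] += 1
        by_weekday[country][WEEKDAYS[timestamp.weekday()]] += 1

    def to_shares(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    hourly = {country: to_shares(c) for country, c in by_hour.items()}
    weekly = {country: to_shares(c) for country, c in by_weekday.items()}
    return hourly, weekly


# Made-up records for illustration only (not data from the study).
sample = [
    ("Canada", datetime(2014, 3, 3, 10, 15)),  # a Monday
    ("Canada", datetime(2014, 3, 3, 14, 40)),
    ("Canada", datetime(2014, 3, 8, 21, 5)),   # a Saturday
    ("France", datetime(2014, 3, 4, 17, 30)),  # a Tuesday
]
hourly, weekly = activity_profiles(sample)
```

Working with shares rather than raw counts keeps countries with very different download volumes comparable.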

Downloads throughout the academic year

•  especially active from September to November and from February to March

•  a decrease in March

•  a decrease in December and January (holidays)

•  downloads go down as the academic year goes by

•  a decrease in December and January (holidays)

•  downloads go down as the academic year goes by

•  activity steadier across the academic year

•  lower proportion of downloads in winter & summer
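The seasonal proportions mentioned above can be obtained by collapsing monthly counts into seasons. The sketch below is an illustration only: it assumes meteorological seasons for the northern hemisphere (the source does not state how seasons were defined), and the function name and monthly counts are made up.

```python
from collections import Counter

# Assumption: meteorological seasons, northern hemisphere.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def seasonal_shares(monthly_counts):
    """Collapse a {month: downloads} mapping into {season: share of total}."""
    per_season = Counter()
    for month, count in monthly_counts.items():
        per_season[SEASON_OF_MONTH[month]] += count
    total = sum(per_season.values())
    return {season: count / total for season, count in per_season.items()}


# Made-up monthly counts for one country (not data from the study).
example = {1: 60, 2: 110, 3: 120, 4: 90, 5: 50, 6: 40,
           7: 30, 8: 40, 9: 130, 10: 140, 11: 130, 12: 60}
shares = seasonal_shares(example)
```

Computing the shares per country, rather than over all downloads, is what allows a statement such as "lower proportion of downloads in winter and summer" to be compared across countries.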

•  Data source: 166,098 papers from 106 scholarly journals, mostly in the social sciences and humanities (SSH), from Érudit’s collection
•  Web log data from April 1st 2010 to December 31st 2015
•  Users geolocated from their IP addresses
•  Data cleaning:
    •  Identification of successful downloads of scholarly papers
    •  Robot detection technique to exclude downloads performed by web crawlers and robots behaving like humans
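The identification of successful downloads can be sketched as a filter over parsed log lines. This is an illustration under stated assumptions, not Érudit's actual pipeline: it assumes logs in the Apache combined log format, treats HTTP status 200 on a GET request as a successful download, and uses a hypothetical URL pattern (`PAPER_URL`) to recognise scholarly papers.

```python
import re

# Apache combined log format (assumed; the source does not name the log format).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical pattern for paper URLs; the real URL scheme is not given in the source.
PAPER_URL = re.compile(r"\.(pdf|html)$")


def successful_paper_download(line):
    """Return the parsed fields if the line is a successful paper download, else None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # unparsable request
    fields = match.groupdict()
    if fields["method"] != "GET" or fields["status"] != "200":
        return None
    if not PAPER_URL.search(fields["url"]):
        return None
    return fields


# Made-up log line using a documentation-reserved IP address.
line = ('192.0.2.10 - - [03/Mar/2014:10:15:32 -0500] '
        '"GET /revue/example/2014/v1/n1/article.pdf HTTP/1.1" 200 48213 '
        '"-" "Mozilla/5.0"')
hit = successful_paper_download(line)
```

A referrer of "-" is how this log format records an unassigned referring field, which is one of the signals used by the robot detection rule.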

Sample sizes shown on the four figure panels: n = 19,318,374; n = 19,413,395; n = 19,413,395; n = 19,413,395

The issue of work-life balance is increasingly debated in modern societies as the pressures of work grow heavier. Scholars are no exception, especially in the “publish or perish” culture. Different authors have investigated researchers’ working habits, such as work-life balance in information science (Cabanac & Hartley, 2013), calendar effects on the dissemination of science (Magnone, 2013), researchers’ timetables (Wang et al., 2012), and seasonal influences and academic life cycles (Moed & Halevi, 2016).

The aim of this project is to explore scholars’ downloading behaviour on four timescales: hourly, weekly, monthly and by season. This is performed by analysing the web log data of Érudit, the main diffusion platform for French-Canadian journals, for the three countries (Canada, France and the USA) that account for 58.48 % of all downloads.