Capturing collabportunities: A method to evaluate collaboration opportunities in information search using pseudocollaboration

Rutgers University has made this article freely available. Please share how this access benefits you. Your story matters. [https://rucore.libraries.rutgers.edu/rutgers-lib/47405/story/]

This work is an ACCEPTED MANUSCRIPT (AM). This is the author's manuscript for a work that has been accepted for publication. Changes resulting from the publishing process, such as copyediting, final layout, and pagination, may not be reflected in this document. The publisher takes permanent responsibility for the work. Content and layout follow the publisher's submission requirements. Citations for this version and the definitive version are shown below.

Citation to Publisher Version: González-Ibáñez, Roberto, Shah, Chirag, & White, Ryen W. (2014). Capturing collabportunities: A method to evaluate collaboration opportunities in information search using pseudocollaboration. Journal of the Association for Information Science and Technology, 66(9), 1897-1912. https://dx.doi.org/10.1002/asi.23288

Citation to this Version: González-Ibáñez, Roberto, Shah, Chirag, & White, Ryen W. (2014). Capturing collabportunities: A method to evaluate collaboration opportunities in information search using pseudocollaboration. Journal of the Association for Information Science and Technology, 66(9), 1897-1912. Retrieved from http://dx.doi.org/doi:10.7282/T31R6SB8

Terms of Use: Copyright for scholarly resources published in RUcore is retained by the copyright holder. By virtue of its appearance in this open access medium, you are free to use this resource, with proper attribution, in educational and other non-commercial settings. Other uses, such as reproduction or republication, may require the permission of the copyright holder.

Article begins on next page

SOAR is a service of RUcore, the Rutgers University Community Repository. RUcore is developed and maintained by Rutgers University Libraries.



Capturing Collabportunities: A Method to Evaluate Collaboration Opportunities in Information Search Using Pseudocollaboration

Roberto González-Ibáñez
Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Santiago 9170124, Chile. E-mail: [email protected]

Chirag Shah
School of Communication & Information (SC&I), Rutgers University, 4 Huntington Street, New Brunswick, NJ 08901-1071. E-mail: [email protected]

Ryen W. White
Microsoft Research, One Microsoft Way, Redmond, WA 98052. E-mail: [email protected]

In explicit collaborative search, two or more individuals coordinate their efforts toward a shared goal. Every day, Internet users with similar information needs have the potential to collaborate. However, online search is typically performed in solitude. Existing search systems do not promote explicit collaborations, and collaboration opportunities (collabportunities) are missed. In this article, we describe a method to evaluate the feasibility of transforming these collabportunities into recommendations for explicit collaboration. We developed a technique called pseudocollaboration to evaluate the benefits and costs of collabportunities through simulations. We evaluate the performance of our method using three data sets: (a) data from single users' search sessions, (b) data with collaborative search sessions between pairs of searchers, and (c) logs from a large-scale search engine with search sessions of thousands of searchers. Our results establish when and how collabportunities would significantly help or hinder the search process versus searches conducted individually. The method that we describe has implications for the design and implementation of recommendation systems for explicit collaboration. It also connects system-mediated and user-mediated collaborative search, whereby the system evaluates the likely benefits of collaborating for a search task and helps searchers make more informed decisions on initiating and executing such a collaboration.

Introduction

Many Internet users search for information about similar or identical topics simultaneously with no awareness of each other's highly related search activity. As motivation for our research, we analyzed search engine logs containing traces of search behavior from consenting users of a browser toolbar distributed by the Microsoft Bing search engine on a particular exploratory search topic, in this case, the British Petroleum (BP) Deepwater Horizon oil spill (explained later). This analysis revealed that, around the time of the incident in 2010, there were always at least two people searching for related information within the same hour, and about 38% of the time there were at least two people searching for the topic within the same minute. Figure 1 illustrates this phenomenon for users searching on this topic in each hour of the day during October 6, 2010 (Figure 1a), and also in each minute from 12:00 to 1:00 PM of the same day (Figure 1b). Unless people previously coordinate and agree to collaborate during search—with or without the support of tools—collaboration opportunities (referred to in this article as collabportunities) are typically missed, but they could be captured. To address this issue, search engines could provide direct support for connecting searchers with candidate collaborators who are pursuing similar search tasks simultaneously. Searchers could then decide whether to pursue the suggested collaboration.
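The per-minute co-occurrence analysis described above can be sketched as follows. This is a minimal illustration; the log record format and the toy records are our assumptions, not drawn from the Bing logs:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (user_id, timestamp, query).
log = [
    ("u1", "2010-10-06 12:03:10", "bp oil spill causes"),
    ("u2", "2010-10-06 12:03:55", "deepwater horizon spill"),
    ("u3", "2010-10-06 12:07:02", "gulf of mexico oil spill"),
    ("u1", "2010-10-06 12:07:40", "bp spill cleanup"),
]

def users_per_minute(records):
    """Group distinct user ids by the minute in which they issued a query."""
    buckets = defaultdict(set)
    for user, ts, _query in records:
        minute = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(second=0)
        buckets[minute].add(user)
    return buckets

buckets = users_per_minute(log)
# Fraction of active minutes in which at least two users searched the topic
frac = sum(len(users) >= 2 for users in buckets.values()) / len(buckets)
```

Applied to real logs, `frac` corresponds to the kind of statistic reported above (two or more searchers in the same minute about 38% of the time).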

In information retrieval (IR) research, many methods have been developed to help people search more effectively,

Received July 16, 2013; revised February 25, 2014; accepted March 6, 2014

© 2014 ASIS&T • Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.23288

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, ••(••):••–••, 2014


including query suggestions (Anick & Kantamneni, 2008), result recommendations (Resnick & Varian, 1997), information filtering (Adomavicius & Tuzhilin, 2005), and personalization (Belkin, 2000). In addition to supporting an individual's search activity, researchers have argued that collaboration could facilitate enhanced information exploration in part by exploiting the different skills and experiences of collaborators (Hyldegard, 2006, 2009; Shah, 2010b; Shah & González-Ibáñez, 2011; Twidale, Nichols, & Paice, 1997). For instance, a subject specialist (e.g., an attorney) and a search specialist (e.g., a legal librarian) could collaborate to retrieve better results and gain a deeper understanding of the material and its sources. There are many other similar scenarios in which two or more individuals can combine their skills via collaboration to fulfill shared information needs and improve the quality of search outcomes. Unfortunately, in many search settings, such opportunities to collaborate are latent and often remain undiscovered. To the best of our knowledge, there are no current IR solutions that dynamically identify and evaluate collaboration opportunities to promote explicit and intentional collaboration among searchers.

Capitalizing on collabportunities could allow an information seeker to discover information that is difficult to find or synthesize by an individual (Pickens, Golovchinsky, Shah, Qvarfordt, & Back, 2008), or it could help boost novelty/diversity in addition to relevance (Shah, 2010a). There are two ways such collaborative information-seeking (CIS) activities are currently supported, system mediated and user mediated (Golovchinsky, Pickens, & Back, 2008). The former imposes roles that can enhance search performance at the expense of autonomy for searchers, whereas the latter gives more control to searchers with little or no intervention from the system. These approaches have their advantages and disadvantages, but a common disadvantage is that neither of them can provide a sense of relative benefit to the searcher from engaging in collaboration, which leads to lost collabportunities. This problem could be addressed by (a) identifying collabportunities and (b) informing searchers about potential benefits of collaborating during their search.

On the assumption that specific collabportunities exist at a given time, one challenge that we have to address is determining whether they should be transformed into

FIG. 1. Number of users searching for information about the British Petroleum (BP) oil spill per hour on October 6, 2010 (a), and per minute within a single hour on October 6, 2010 (b). Log sample obtained from the Microsoft Bing web search engine.



recommendations for collaboration and offered to users. To perform this evaluation, it is necessary to measure the potential of a collabportunity, estimating how it could help or hinder the information search process. In other words, we have to measure the relative costs and benefits of collaborating on a search task at a given time. As stated by González-Ibáñez, Shah, and White (2012), collaboration is typically associated with more successful searching; however, in the same way that collaboration can help, it can also hurt the process and its products, for example, reducing efficiency by introducing the need to coordinate search activity or reducing effectiveness by directing searchers to irrelevant material. In the context of collabportunity evaluation, if the benefits of collaborating with another searcher exceed the associated costs, then such opportunities can be transformed into actual recommendations for collaboration and recommended to both searchers by the search system.

There are many aspects of collabportunities that we could investigate, including the ways in which search activity could be disrupted or enhanced through collaboration. To narrow our exploration in this area, we focus our study on a single exploratory topic deemed representative of the types of scenarios in which we believe that this approach could help. We answer the following research question.

• When are there good opportunities for people to collaborate in an exploratory search? In other words, can we identify points along a user's search process at which it is advantageous for them to collaborate with other searchers performing a similar task?

To address this question, we propose a method to evaluate collabportunities using an adaptation of the pseudocollaboration technique described by González-Ibáñez et al. (2012). In our previous work, we devised pseudocollaboration as a method to provide timely assistance to searchers. In the work reported in this article, we extend that technique to evaluate the feasibility of potential collaborations between searchers who share similar information needs based on the simulation of collaborative search processes between them. From these simulations, we evaluate at each time point (each minute in our case) the benefits (help) and costs (hurt) of collaboration opportunities in terms of two measures, search effectiveness and search efficiency.

As mentioned, we focus on a typical search scenario comprising exploratory information-seeking behavior. Such search tasks occur frequently, and we believe that they are sufficiently lengthy and complex (the research of Borlund and Ingwersen [1999] offers an empirical justification of this assumption) that assistance from another searcher may be valuable. As part of our investigation, we design and conduct an experiment with three data sets comprising logs of search behavior from different sources: (a) logs from a laboratory study with single users performing the exploratory search task, (b) logs from a laboratory study with collaborative pairs performing the exploratory search task, and (c) logs

from the Microsoft Bing search engine covering thousands of users' search sessions on the same topic as was used for (a) and (b). The search engine log data also provide us with an opportunity to assess the performance of our method in a large-scale naturalistic setting similar to that in which it would operate in the wild.

The remainder of this article is organized as follows. The next section describes some relevant literature on collaborative IR, mediated collaboration, and other relevant areas such as mining on-task behaviors from historic search logs. The third section provides a detailed explanation and an example of collaboration opportunities. The fourth section describes our approach to evaluate collabportunities. The fifth section describes our experiment, data, and measures. The sixth section provides a detailed explanation of the results derived from our experiments and revisits our research question. Finally, we discuss implications for the design and implementation of search systems capable of realizing the benefits of collaboration where appropriate and present future directions of our work.

Background

Previous research on collaborative IR (CIR) and collaborative information seeking has focused on implicit/unintentional and explicit/intentional forms of collaboration (Fidel et al., 2000; Fidel, Pejtersen, Cleal, & Bruce, 2004; Golovchinsky et al., 2008; Morris, 2008; Shah, 2009, 2010a). Research on implicit/unintentional collaboration, such as collaborative filtering, has targeted methods capable of exploiting individual behaviors in a population to aid users within it.

Instead of diving into definitional details, we provide Figure 2, which shows collabportunity evaluation with

FIG. 2. Collaboration opportunities (collabportunities) evaluation with respect to collaborative information retrieval (CIR), collaborative information behavior (CIB), and collaborative filtering.



respect to related research topics on the human−system and explicit−implicit dimensions.1 The research focus of our study lies at the intersection of human−computer interaction, information retrieval, and human information behavior. Moreover, as we explain in the following section, collabportunity evaluation has as an objective the identification, assessment, and transformation of opportunities to collaborate in information search that are typically found in noncollaborative situations and also in scenarios in which implicit/unintentional forms of collaboration are developed. We also note that our approach is different from others such as collaborative filtering (Resnick & Varian, 1997) in that collabportunity evaluation focuses on search processes (queries and result selections) rather than products (e.g., task outcomes).

Search logs collected by search engines have also been analyzed to extract the on-task behavior of searchers. The matching with the current search situation has been performed at the query level (Agichtein, Brill, & Dumais, 2006; Joachims, 2002) using sequences of related queries (Radlinski & Joachims, 2005) and, most recently, more sophisticated models of on-task behavior (White et al., 2013). These approaches leverage the historic behavior of many searchers captured in search logs to identify resources of potential interest to the current searcher given their ongoing task. Long-term personalization of search behavior (e.g., Bennett et al., 2012) is also relevant in this context, focusing on prior task-relevant behavior from a single searcher only. None of this research on mining historic search behavior supports the initiation of synchronous collaborations between searchers as we propose in this article.

Research on explicit/intentional collaboration in information search has focused primarily on the study of behaviors and collaborative practices that can be supported with technology. Work on CIR is often characterized by the nature of collaboration, as outlined by Pickens et al. (2008):

• System or algorithmically mediated. The system acts as an active agent and provides mediation among the collaborators to enhance their productivity and experience. Example systems include Cerchiamo (Golovchinsky, Adcock, Pickens, Qvarfordt, & Back, 2008; Golovchinsky et al., 2008) and Querium (Golovchinsky, Dunnigan, & Diriye, 2012).

• User or interface mediated. The control lies with the collaborators, with the system being a passive component. The users drive the collaboration, and the system primarily provides various functions at the interface level. Examples include Ariadne (Twidale & Nichols, 1996), SearchTogether (Morris & Horvitz, 2007), and Coagmento (Shah, 2010a).

In both cases (system and user mediated), it is assumed that collaboration over search happens as a part of a larger, ongoing collaborative project among participants. Golovchinsky et al. (2012) describe a tool called Querium that supports and mediates explicit collaboration among users. It does so by providing specialized tools that enable them to perform actions such as information sharing and communication. In addition, the tool incorporates algorithmic mediation, which is system driven and does not require explicit user intervention. In systems such as SearchTogether (Morris & Horvitz, 2007) and Coagmento (Shah, 2010a), users have complete control over a set of features that support the collaborative search process. However, unlike Querium, in these systems, collaboration depends exclusively on explicit actions from users. Synchronous social question-answering systems, such as Aardvark (Horowitz & Kamvar, 2010) and IBM Community Tools (Weisz, Erickson, & Kellogg, 2006), let people pose questions to others via synchronous communication channels such as instant messaging. They assume that the participants in the dialog have clearly defined roles (an asker with a question and an answerer with the subject-matter expertise to furnish a possible answer), rather than the searcher roles that we target with our method, in which both parties assume similar roles and have similar information-seeking objectives simultaneously.

Our approach is oriented toward the detection and evaluation of latent opportunities to collaborate to determine their viability to be transformed into explicit and intentional collaboration. In this sense, we consider the identification and evaluation of collabportunities as a specialized form of system-mediated collaboration.

Collabportunities

The preceding sections have briefly introduced the concept of collabportunity in information search. Collabportunity refers to latent opportunities for collaboration between two or more individuals with common information needs at a similar time. As illustrated in Figure 1, there are often multiple users searching for the same topic at or around the same point in time (at least in the case of the exploratory topic that we targeted in the investigation). If these searchers could be connected, then they could benefit from shared experiences, including sharing resources that they have encountered during their searches so far, queries that have been particularly useful, or perspectives on the topic itself.

To illustrate this idea, consider the following example depicted in Figure 3: Users A and B have a common information need; however, they are not aware of this situation. On a given day, both users are seeking and gathering information about the same search topic simultaneously, namely, the Deepwater Horizon oil spill, an environmental disaster that occurred in 2010 in the Gulf of Mexico. They are seeking information about causes, implications, and reactions pertaining to this event. Despite the collaboration potential between users A and B (the collabportunity)

1See Appendix for definitions of relevant terms.



while working on a search task using current search engine technology, this opportunity vanishes because of the lack of a reliable method for identifying, assessing, and promoting it as an explicit and intentional form of collaboration.

Figure 3 depicts two types of search processes, ongoing search processes (reflecting current search activity) and past search processes (reflecting historic search activity). In past search processes, people conduct search sessions on a variety of topics, and multiple people search on the same topic (e.g., users C ≈ A, D ≈ B, E ≈ A, F ≈ B). We use the symbol ≈ to denote similarity along one or more dimensions between users (e.g., C ≈ A means that user C is similar to user A). Missing collabportunities like those represented between users C ≈ A and D ≈ B in Figure 3, or even actual collaboration episodes, could serve as valuable sources of knowledge to simulate and predict how viable current collabportunities are. This idea is further developed in the following section.

On every occasion when people share common information needs, as in the case of users A and B in Figure 3, there are latent collabportunities. In this sense, all collabportunities are potential candidates to become recommendations for actual collaborations. However, not every collabportunity guarantees a successful collaboration. A collabportunity indicates only that there exists the potential for actual collaborations between people. For instance, in our example, if users A and B were connected to allow collaboration, the process and its results might end up being better or worse than if they were to continue working in isolation. Thus, a fundamental problem to solve is the estimation of costs and benefits of collabportunities before actual collaborations take place or recommendations are made. Once collabportunities are properly evaluated, if their benefits significantly outweigh the associated costs, then collabportunities can be transformed into recommendations for collaborations and offered to the user. At that point, the decision resides with users on whether they accept the invitation and pursue the collaboration,

although the system could provide an estimate of the likely benefit to help inform the decision. The next section describes a method to evaluate collabportunities to determine whether they should be promoted to recommendations for collaboration.

Methods

The evaluation of collabportunities in information search requires procedures and measures to properly evaluate the collaboration process and its products. Unfortunately, at any given time, collabportunities are just opportunities; they are neither complete processes nor products to be evaluated. To overcome this problem, we propose an evaluation approach based on simulations of collaboration processes and estimations of their potential results. This estimation is performed using an adapted version of pseudocollaboration (González-Ibáñez et al., 2012), which supports searchers in their information-seeking processes by estimating an opportune moment to recommend products (e.g., pages found, queries generated) derived from past sessions that help users to find useful information and minimize the required effort.

Pseudocollaboration works through simulated collaborations using search logs of users with complete search sessions (see Figure 3). The simulation of collaborations follows six steps.

1. Given a user A with a search session in progress, locate in the session logs the set of users {≈A} with search patterns similar to those displayed by A during the first few minutes of the session.

2. Perform all possible combinations of users in {≈A} and their respective search sessions to simulate collaborative search processes.

3. For each simulated collaboration, sc, evaluate its performance per minute.

4. Determine when such simulated collaborations show significant benefits with respect to the individual search sessions of users in {≈A}.

5. Select products such as pages or queries from sc that display the highest improvement.

6. With the selected products, provide product recommendations to A when predictions display the best relation of cost to benefit (help−hurt).
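Step 1 above, locating users with early-session search patterns similar to the target user's, might be illustrated as follows. This is a sketch only; the (minute, query) session format and the Jaccard term-overlap matching are our assumptions, not the paper's actual matching method:

```python
def query_terms(session, upto_minute):
    """Terms from queries issued during the first `upto_minute` minutes."""
    return {t for minute, q in session if minute < upto_minute for t in q.split()}

def similar_users(target, logs, upto_minute=3, threshold=0.3):
    """Return users whose early query terms overlap the target's (Jaccard).

    `target` is a session; `logs` maps user ids to sessions, where a session
    is a hypothetical chronological list of (minute, query) pairs.
    """
    a = query_terms(target, upto_minute)
    matches = []
    for user, session in logs.items():
        b = query_terms(session, upto_minute)
        if a and b and len(a & b) / len(a | b) >= threshold:
            matches.append(user)
    return matches

# Toy example: one topically similar searcher, one unrelated searcher
target = [(0, "bp oil spill"), (2, "deepwater horizon")]
logs = {
    "c": [(1, "bp oil spill causes"), (2, "deepwater horizon news")],
    "x": [(0, "digital camera reviews")],
}
matches = similar_users(target, logs)
```

Here `matches` would contain only user "c", the searcher on the same topic.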

As explained by González-Ibáñez et al. (2012), the term pseudocollaboration is used to indicate a "lack of explicitness and intentionality in which convenient and impersonal combinations of users' search sessions take place" (p. 2). In this article, however, we use pseudocollaboration to predict benefits and costs of collabportunities, which later could become explicit and intentional forms of collaboration. Let us explore this further by using the following scenario. Two independent users, A and B, are looking for digital cameras in a popular online store at about the same time. Based on their queries and the types of cameras

FIG. 3. View of collaboration opportunities (collabportunities) over time. coc, current collabportunities; cop, past collabportunities based on complete search sessions.



they have explored so far, it seems that they are looking for similar specifications. With this information, it is possible to find in the search logs of the online store that many other past users in {≈A} and {≈B} have been looking for similar cameras, leading them to purchase different models. Our method would perform all combinations of the completed search processes of users in {≈A} and {≈B} to simulate how the collaboration between A and B would play out. Successful combinations of users in {≈A} and {≈B} could show increasing exposure to useful content that helps users to make informed decisions. Multiple combinations displaying significant improvements compared with individual performance could serve as an indicator of the viability of the collabportunity between A and B.

Adapting the pseudocollaboration method to evaluatecollabportunities involves the following steps.

1. Having a collabportunity, coc, between users A and B (note that the approach can be applied to more than two users), find pairs of users in {≈A} and {≈B} with search patterns similar to those of A and B, respectively, during the first few minutes of the search session. This is illustrated in Figure 3.

2. For each user pair (α,β) with α ∈ {≈A} and β ∈ {≈B}, align and combine the individual search sessions of α and β to simulate collaborative search processes.

3. For each simulated collaboration, sc, evaluate its performance in each minute.

4. Identify when simulated collaborations sc show significant improvements with respect to the baseline performance of users in {≈A} and {≈B}.

5. Determine the help−hurt relation for each sc to estimate the goodness of an eventual collaboration between A and B.

6. Finally, if the ratio between the number of sc that help and the total number of simulated collaborations is greater than a given threshold, then coc is captured; otherwise, it is discarded.

Algorithm 1 presents a generalized form of this procedure to evaluate a collabportunity with N users over time. As detailed in Algorithm 1, the procedure EvaluateCollabportunity(coc) relies on different functions and subevaluations, briefly explained below.

Algorithm 1. General procedure to evaluate a collabportunity and determine whether it should be captured or discarded.

Proc EvaluateCollabportunity(coc: collabportunity, threshold)
1: output capture
2: begin
3:   /* find collabportunities cop in logs with N users similar to those in coc */
4:   COp ← getPastCollabportunities(coc)
5:   for each cop ∈ COp do
6:     sc ← simulateCollaboration(cop)
7:     for each user in cop do
8:       for t ← t0 to getMaxTime(sc) do
9:         psc ← getPerformance(sc, t)
10:        pu ← getPerformance(user, t)
11:        if (psc > pu) or (psc < pu) at p = 0.05 then
12:          // save performance difference at time t
13:          Δperformance[user, t] ← psc − pu
14:        end if
15:      end for
16:    end for
17:    /* help: (psc − pu) > 0 and hurt: (psc − pu) < 0 */
18:    if HelpHurtRatio(Δperformance) > 1 ∀ user in cop then
19:      help ← help + 1
20:    end if
21:  end for
22:  if (help / size(COp) > threshold) then
23:    capture ← true
24:  else
25:    capture ← false
26:  end if
27: end



First, the function getPastCollabportunities finds and retrieves past collabportunities cop similar to coc. Unlike the current collabportunity coc, past collabportunities cop are based on search sessions that are complete and available through logs. Therefore, for each cop a simulated collaborative search process sc is generated through the function simulateCollaboration in line 6. The function getMaxTime in line 8 returns the calculated time in which simulated collaboration sc is complete. Then, the function getPerformance in lines 9 and 10 can be any performance measure such as number of useful pages visited, search effectiveness, and search efficiency. The function HelpHurtRatio in line 18 computes the relative cost−benefit of simulated collaboration sc with respect to each user by using results contained in the array Δperformance. As specified in line 18, an sc will be considered as positive if the benefits exceed the costs for each user in the corresponding collabportunity, cop; in such a case, a collabportunity is considered to be helpful (line 19). Finally, after all past collabportunities cop are evaluated, the decision on whether the input collabportunity coc is captured depends on the number of cop in COp that succeed versus the total number of cop in COp evaluated, which is obtained with the function size in line 22. This result is then compared with a given threshold, which can be tuned for precision and recall as needed in the application scenario.
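The control flow of Algorithm 1 can be sketched in Python as follows. The injected callables stand in for the components described above (getPastCollabportunities, simulateCollaboration, getMaxTime, getPerformance, and the p = .05 significance test); their concrete implementations are assumptions left to the application:

```python
def help_hurt_ratio(deltas, user):
    """Ratio of summed positive to summed negative performance differences
    for one user; `deltas` maps (user, t) to a performance difference."""
    gains = sum(d for (u, _t), d in deltas.items() if u == user and d > 0)
    losses = -sum(d for (u, _t), d in deltas.items() if u == user and d < 0)
    return gains / losses if losses else float("inf")

def evaluate_collabportunity(co_c, threshold, get_past_collabportunities,
                             simulate_collaboration, get_max_time,
                             get_performance, significantly_different):
    """Return True if the collabportunity co_c should be captured."""
    past = get_past_collabportunities(co_c)
    helped = 0
    for co_p in past:                      # each co_p is a list of users
        sc = simulate_collaboration(co_p)
        deltas = {}
        for user in co_p:
            for t in range(get_max_time(sc)):
                p_sc = get_performance(sc, t)
                p_u = get_performance(user, t)
                if significantly_different(p_sc, p_u):
                    deltas[(user, t)] = p_sc - p_u
        # sc helps only if benefits outweigh costs for every user in co_p
        if all(help_hurt_ratio(deltas, user) > 1 for user in co_p):
            helped += 1
    return bool(past) and helped / len(past) > threshold
```

As in line 22 of Algorithm 1, the final decision compares the fraction of helpful simulated collaborations against the tunable threshold.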

Given the above general procedure to evaluate collabportunities, the next section presents an evaluation approach to test our method with a particular focus on the research question stated earlier. In other words, we seek to determine the location of good opportunities for individuals to collaborate in an exploratory search task.

Evaluation

The data that we used were gathered from laboratory settings and through logs of the Microsoft Bing Web search engine. The use of a range of data types allowed us to evaluate the performance of our method from different perspectives, improving the robustness and generalizability of the conclusions that we can draw from our investigation. Below we describe the experiment, the performance measures, and the data that we used in this study.

Experiment

To test our approach, we designed an experiment using three data sets. The experiment focuses on evaluating whether a given collabportunity should be captured and considers the following initial condition. Given a noncollaborative situation, let coc be a collabportunity between users A and B while working on an exploratory search task about the Deepwater Horizon oil spill. The objective of this experiment is to evaluate past collabportunities found in the different data sets (with searchers performing the same task) to estimate the feasibility of coc.

With each data set, we created all possible pairs of users, where each pair represents a past collabportunity cop equivalent to coc. Because the data sets contain complete search sessions, simulated collaborations were generated for each cop by aligning and combining the search sessions, which include queries, search result pages (SERPs), and content pages.
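A minimal sketch of this pairing-and-merging step, assuming each session is a list of time-stamped event records (queries, SERP visits, content pages); the record layout and function names are our own illustration, not the authors' code:

```python
from itertools import combinations

def make_collabportunities(sessions):
    """Every unordered pair of complete sessions is one past collabportunity cop."""
    return list(combinations(sessions, 2))

def simulate_collaboration(cop):
    """Align and combine the two members' time-stamped events into one timeline."""
    session_a, session_b = cop
    return sorted(session_a + session_b, key=lambda event: event["time"])
```

Downstream performance computations would then treat the merged timeline as a single (simulated) collaborative session, deduplicating pages as needed.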

Performance Measures

Search sessions from each data set constitute baselines with which to compare the performance of the simulated collaborative processes. For the performance measures required by Algorithm 1, we use search effectiveness and search efficiency, and a procedure to perform comparisons over time in terms of the area under the curve (AUC) of the search effectiveness and search efficiency plots.

Search effectiveness (Equation 1) corresponds to precision in terms of the fraction of information found that is useful (useful coverage) until time t. The purpose of this measure is to quantify the extent to which searchers are reaching useful content that helps with task completion.

$$\mathrm{Search\ Effectiveness}(u,t) = \frac{\mathrm{UsefulCoverage}(u,t)}{\mathrm{OverallCoverage}(u,t)} \qquad (1)$$

In the definition of search effectiveness presented in Equation 1, useful coverage comprises all content pages (reached through a SERP click or through post-click navigation) on which the searcher spends 30 seconds or more. A 30-second timeout has been used in previous work (Fox, Karnawat, Mydland, Dumais, & White, 2005; White & Huang, 2010) to identify cases in which searchers derive utility from examined web pages. The overall coverage that appears in the denominator of Equation 1 refers to all distinct pages visited by the user until time t. In an ideal scenario, this would be computed with respect to coverage of relevant pages (widely used measures in IR such as precision and recall), but our approach is concerned with measuring performance from the perspective of searchers. In this sense, search effectiveness is based on implicit measures of usefulness, which, as pointed out by González-Ibáñez et al. (2012), show high overlap with explicit relevant coverage.

Although search effectiveness provides evidence of searchers’ ability to find useful pages, it does not consider the effort required in that process. We consider that effort can be expressed in terms of query formulations, which can be related to the underlying cognitive processes involved in the transformation or expression of internal information needs into a set of terms that a search engine can process. We formalize this relation as search efficiency (Equation 2), which indicates the relationship between search effectiveness and the effort required from the searcher to achieve that usefulness. Effort in this case is expressed in terms of the number of queries issued until time t. Search efficiency increases if the precision in finding useful pages increases and the number of queries used in the process remains low.

$$\mathrm{Search\ Efficiency}(u,t) = \frac{\mathrm{Search\ Effectiveness}(u,t)}{\mathrm{NumQueries}(u,t)} \qquad (2)$$
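Both measures can be sketched over a log of page visits as follows; the (url, time, dwell) record layout and the list of query timestamps are our assumptions, while the 30-second dwell threshold follows the definition of useful coverage above:

```python
DWELL_THRESHOLD = 30  # seconds; usefulness proxy per the definition above

def search_effectiveness(visits, t):
    """Equation 1: UsefulCoverage(u,t) / OverallCoverage(u,t)."""
    seen = {v["url"] for v in visits if v["time"] <= t}  # overall coverage
    useful = {v["url"] for v in visits
              if v["time"] <= t and v["dwell"] >= DWELL_THRESHOLD}
    return len(useful) / len(seen) if seen else 0.0

def search_efficiency(visits, query_times, t):
    """Equation 2: SearchEffectiveness(u,t) / NumQueries(u,t)."""
    n_queries = sum(1 for qt in query_times if qt <= t)
    return search_effectiveness(visits, t) / n_queries if n_queries else 0.0
```

Any other performance measure with the same (session, t) signature could be substituted without changing the evaluation procedure.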


Our goal is to identify good opportunities to collaborate, so we must compare sessions to estimate the costs and benefits of collaborating at different times. For discrete search sessions, we use the trapezoidal rule for nonuniform grids as a numerical method of integration (Equation 3) to perform comparisons in terms of the area under the curve (AUC) in a window of time.

$$\int_{t_0}^{t_c} f(t)\,dt \approx \sum_{k=0}^{c-1} (t_{k+1}-t_k)\,\frac{f(t_k)+f(t_{k+1})}{2} \qquad (3)$$

AUC in this case represents how effective and efficient searchers are until time t. For a given searcher or team, the larger the AUCs for each of the two measures, the more effective or efficient the user is in finding useful pages.
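For sampled (possibly nonuniformly spaced) performance values, Equation 3 reduces to a short helper; `times` and `values` are assumed to be parallel lists of sample points and measure values:

```python
def auc_trapezoid(times, values):
    """Area under a performance curve sampled at nonuniform times (Equation 3)."""
    return sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))
```

Comparing the AUC of a simulated pair with the AUC of its members' individual baselines over the same window yields the cumulative comparisons reported below.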

Data

To perform the experiment described above, we used three data sets: (a) SULab, logs from a laboratory study with single users’ search sessions; (b) CPLab, logs from a laboratory study with collaborative pairs’ search sessions; and (c) SUSEngine, logs from the Microsoft Bing search engine with search sessions of single users. Because each data set contains complete search sessions about the Deepwater Horizon oil spill, we were sure of the existence of past collabportunities cop between searchers that can be used to evaluate a collabportunity coc according to the procedure described in Algorithm 1. The selection of this topic is based on its relevance and popularity at the time when the laboratory studies were conducted (early autumn 2010). As reported by González-Ibáñez, Haseki, and Shah (2013), preliminary studies on the topic and pilot runs conducted before the laboratory studies showed considerable amounts of information about the oil spill. In the particular case of SULab and CPLab, the participants in the study were instructed to address the following task, which was designed to be a realistic work task (Borlund, 2000):

A leading newspaper has hired your team to create a comprehensive report on the causes, effects, and consequences of the recent Gulf oil spill. As a part of your contract, you are required to collect all the relevant information from any available online sources that you can find. To prepare this report, search and visit any website that you want and look for specific aspects as given in the guideline below. As you find useful information, highlight and save relevant snippets. Make sure you also rate a snippet to help you in ranking them based on their quality and usefulness. Later, you can use these snippets to compile your report, no longer than 200 lines, as instructed. Your report on this topic should address the following issues: description of how the oil spill took place, reactions by BP as well as various government and other agencies, impact on economy and life (people and animals) in the Gulf, attempts to fix the leaking well and to clean the waters, long-term implications, and lessons learned.

We used the three data sets independently as follows. First, the SULab data set was used to evaluate coc with our method using a small sample of single users’ search sessions collected in a controlled study. CPLab, on the other hand, was used to evaluate coc with a larger data set and also to investigate the potential of collabportunities compared with actual collaborative search sessions. Finally, SUSEngine was used to test the external validity of the approach with behavioral data collected in an uncontrolled setting (the Microsoft Bing search engine). The remainder of this section describes each of the data sets in more detail.

Single users’ lab study data set: SULab. The single users’ data set comprises 11 search sessions, one session per participant, lasting approximately 20 minutes each. Participants in this study were undergraduate students from different fields at Rutgers University. Six participants were women, and five were men. Their ages ranged between 18 and 24 years (mean 20.50, SD 1.71). Recruitment was conducted through public announcements (e.g., e-mail distribution lists, flyers posted on campus) and a web form to register recruits’ information and schedule their sessions. During the sessions, the participants were instructed to collect relevant information about different aspects of the Deepwater Horizon oil spill such as reactions, causes, and effects (see task description above for specifics). The logs collected during this study comprised time-stamped data containing the list of identifiers for each user, queries issued, SERPs, content pages, relevance judgments assigned to the pages encountered by the user, and pages’ dwell times. Participants were compensated with $10 for a 1-hour session, and they also competed for additional financial rewards based on their performance in the task.

Collaborative pairs lab study data set: CPLab. The collaborative data set comprises search logs of 60 pairs, or 120 participants, again with approximately 20 minutes of active searching. As in the single-user lab study, participants were undergraduate students from different fields at Rutgers University. Seventy-one participants were women, 49 were men. Their ages ranged between 17 and 30 years (mean 21.03, SD 2.44). Recruitment was conducted through public announcements (e.g., listservs, announcements posted at bus stops and Rutgers facilities) and a web form to register recruits’ information and schedule their sessions. Pairs in this study were also instructed to collect relevant information about the Deepwater Horizon oil spill (see task description above). Each session log consists of time-stamped data containing unique identifiers for each team, unique identifiers for each user, queries issued, SERPs, content pages, relevance judgments, and pages’ dwell times. A key requirement of this study was that participants enroll in pairs with someone with whom they had experience working together. As with


SULab, participants were compensated with $10 for a 1-hour session. They also competed as teams for financial rewards based on task performance.

Large-scale search engine data set: SUSEngine. We had access to search logs obtained from a large-scale search engine. Logs were gathered from Microsoft Bing for a 2-week period (from October 1, 2010, to October 14, 2010) when there was still significant interest in the Deepwater Horizon oil spill among web searchers. We used anonymized logs of URLs visited by users who explicitly consented to provide data through a widely distributed browser toolbar. At the time of toolbar installation, or upon first use if the toolbar was preinstalled, users were presented with a dialog requesting their consent for data sharing. Log entries included a unique user identifier, a timestamp for each page view, and the URL of the page visited. We excluded intranet and secure (https) URL visits at the source. To remove variability caused by cultural and linguistic variation in search behavior, we included only log entries generated by users in the English-speaking United States. From these logs, we extracted search sessions on Google, Yahoo!, and Bing via a methodology similar to White and Drucker (2007). Search sessions comprised queries, clicks on search results, and pages visited during navigation once users had left the search engine. Search sessions ended after a period of user inactivity exceeding 30 minutes. Logs were analyzed on premises at Microsoft by one of the authors (R. White).

We intersected the text of queries generated by the laboratory participants (described in the previous section) with those of search engine users to find sessions related to the oil spill task. Note that these sessions were not the actual sessions from the laboratory searchers; they only shared queries in common with the laboratory study. In total, 149 distinct queries (21.3%; e.g., “attempts to fix the oil spill,” “cleaning up the oil spill,” “how fix bp oil leak,” “bp oil spill impact on economy”) matched the logged Bing queries, yielding a total of 14,934 search sessions from 12,173 users. We excluded common queries that may be unrelated to oil spill searching (e.g., “crude oil,” “greenpeace”), giving us a final set of 8,969 search sessions from 8,051 users with which to assess the performance of our method.
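The intersection step can be sketched as follows; the session structure and the exact normalization (trimming, lowercasing, whitespace collapsing) are our assumptions about how such query matching is typically done:

```python
def normalize(query):
    """Trim, lowercase, and collapse whitespace before matching."""
    return " ".join(query.lower().split())

def matching_sessions(engine_sessions, lab_queries, excluded_queries):
    """Keep engine sessions sharing at least one normalized lab query,
    after removing queries deemed unrelated to the oil spill task."""
    lab = ({normalize(q) for q in lab_queries}
           - {normalize(q) for q in excluded_queries})
    return [s for s in engine_sessions
            if any(normalize(q) in lab for q in s["queries"])]
```
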

Results

As indicated earlier, all experiments comprise an evaluation of a collabportunity coc between two users independently working on an exploratory search task about the Deepwater Horizon oil spill.

Experiment on SULab

Because logs in this data set correspond to users performing the same task, we generated all possible pairs with the 11

FIG. 4. Aggregated results for search effectiveness and search efficiency in each minute for the data set SULab. a: Search effectiveness and search efficiency in each minute. b: Average AUC for search efficiency and search effectiveness at each minute. c: Help−hurt relation in terms of search effectiveness in each minute. d: Help−hurt relation in terms of search efficiency in each minute. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


participants in the data set. Each pair (α,β) with α ∈ {≈A} and β ∈ {≈B} represents a collabportunity cop between users α and β that will be employed to estimate the feasibility of coc. Overall, we produced 55 collabportunities cop and the corresponding simulated collaborative searches sc. We then computed the performance (using the measures explained above) in each minute for all cop and compared that performance with the individual performance of the users. Figure 4a shows how simulated collaborations outperform individual search effectiveness at different minutes. The average search effectiveness of simulated pairs with reported significant differences is higher than the baseline (users in cop working individually) most of the time. A comparison in terms of search efficiency shows that fewer than 20% of the simulated pairs obtain higher search efficiency during minutes 3 and 4. The cost of collaborating according to this data set exceeds the benefits (all of the simulated pairs showed search efficiencies significantly lower than the baseline). For individual cases, this means that a single searcher was able to cover more useful pages with fewer queries (search efficiency) than in the simulated collaborations with the other 10 searchers in the data set. For instance, at the end of the session, participant P129 covered 10 useful pages with a search effectiveness of 0.63. To find such pages, P129 issued seven queries, which led him to a search efficiency of 0.09. However, in a simulated collaboration with participant P141, the pair covered 14 useful pages with a search effectiveness of 0.53. This was achieved through 16 queries, leading to a search efficiency of 0.03. When considering cumulative scores (Figure 4b), both search effectiveness and search efficiency significantly outperform the baselines starting in the sixth minute.

Figures 4c and 4d show the percentage of simulated sessions with significant differences (positive [helped] and negative [hurt]) at p < 0.05 (using z-score) for both performance measures. No bars are displayed during the first 2 minutes because no significant differences were found. For the case of search effectiveness (Figure 4c), starting in minute 3, more than 60% of the simulated collaborations sc displayed significant benefits. The highest peaks are found in minutes 3 and 5 (100%), dropping below 50% only after minute 18. Conversely, search efficiency (Figure 4d) shows significant benefits for all simulated collaborations from minute 5 to minute 19. At early times (minutes 3 and 4), more than 80% of the simulated sessions display significant benefits with respect to the baseline. Overall, these results suggest that coc is likely to improve search effectiveness and search efficiency. Pairs achieve higher exposure to useful pages than their individual members. Moreover, they do so with less effort as expressed in the number of queries formulated.

Experiment on CPLab

In a similar manner, we ran the experiment on the CPLab data set. This data set contains logs from pairs of users performing a search task collaboratively, so we divided each collaborative search session and produced a total of 120 individual sessions. We then generated all possible combinations of pairs, discarding those that matched actual pairs in the original data set. Each generated pair represents a collabportunity cop. Overall, we produced 7,080 cop and the corresponding simulated collaborative searches, which provides us with a larger set of samples of cop for estimating the feasibility of coc. Next, we calculated search effectiveness and search efficiency in each minute for all cop and compared them with the individual performance of their users and also with that of the original pairs in the data set. Figure 5 provides a more detailed

FIG. 5. Performance comparison in terms of search effectiveness and search efficiency of a simulated collaboration sc against two baselines. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


view of the performance comparison between one simulated user pair (α1,β1), with α1 ∈ {≈A} and β1 ∈ {≈B}, and two baselines: (a) the individual performance of user α1 and (b) the collaborative performance of user α1 working with his actual collaborator as indicated in the data set (here referred to as β2 ∈ {≈B}). Results for both performance measures show that the simulated collaboration outperforms the two baselines approximately between minutes 2 and 5 (p < 0.05 using z-score). After minute 5, this gap is reduced, and slight differences are observed in terms of search effectiveness. Unlike the example provided in the previous section for a single user in SULab, here our approach showed the potential benefits of a collabportunity in which a simulated pair (α1,β1) is able to reach higher search effectiveness. In addition, the ratio between search effectiveness and the required effort, expressed in terms of the number of queries issued by the simulated pair (α1,β1), is lower than that for the individual user (α1) or if working with an actual partner (β2) during the study.

A summary of the complete evaluation and comparison with all the simulated pairs sc is provided in Figure 6. Results in terms of averages of search effectiveness and search efficiency in each minute show significantly higher levels of both measures at p < 0.05 (using z-score) along the session, with the exception of minutes 2 and 3 (Figure 6a). Upon examining the cumulative scores in Figure 6b, we can see that the simulated collaborations have higher cumulative AUCs for both measures during the entire session. In terms of costs and benefits, Figure 6c indicates that most of the simulated pairs (80 to 100%) show significant benefit at p < 0.05 (again using z-score) in terms of search effectiveness until minute 9. After this point, the probability of achieving higher search effectiveness in coc drops to 70%, although the collaboration is still much more likely to help than hurt. A similar trend is observed for search efficiency; significant benefits are found mainly at early stages of the search process (Figure 6d). Previous research has shown that search queries can become more specific as the session proceeds (Aula, Khan, & Guan, 2010; Radlinski & Joachims, 2005; Xiang et al., 2010), meaning that the benefit from collaboration with another searcher, who might have his own specific needs, could diminish.

It is interesting to note that simulated collaborations also outperform the search effectiveness and search efficiency of actual pairs during the entire session. This finding suggests that there may be other searchers in a given population who could help team members achieve better results by adding them to the team. In addition, by incrementing the set of sc, our evaluation procedure tests a wide variety of collaborative alternatives to estimate more precisely the impact of a coc.

FIG. 6. Aggregated results for search effectiveness and search efficiency in each minute for the data set CPLab. a: Effectiveness and efficiency in each minute. b: Average AUC for search efficiency and search effectiveness at each minute. c: Help−hurt relation in terms of search effectiveness in each minute. d: Help−hurt relation in terms of search efficiency in each minute. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


Experiment on SUSEngine

Our last evaluation consisted of running the experiment on the SUSEngine data set. Using this data set, we constructed teams of paired users following the same procedure used with the laboratory study data sets. We could not guarantee that users in the log data were actually engaged in the same task (even though there was substantial topical similarity), so we estimated the association automatically from the log data. To do this, we matched users’ sessions based on the first 2 minutes of their searching. Specifically, given a search session s, we looked in our log data for other sessions {≈S} from other users with at least one query in common (exact match following trimming and lowercasing) during the first 2 minutes of that session. Each user for each session in {≈S} that met this criterion was regarded as a teammate to produce cop and their corresponding sc sessions using the method described earlier. Each user was included only once in the set of candidate matches for a given reference session. If there were multiple candidate sessions from a given user, one was randomly selected. Using this method, we identified on average 1,386.4 matching sessions for each session of interest (12,387,851 in total). Because the laboratory tasks were 20 minutes in duration, we focused on users’ logged interactions during the first 20 minutes of each search session.
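The pairing rule described above can be sketched as follows; the session record layout is our assumption, and for simplicity this sketch keeps each user's first qualifying session rather than selecting one at random:

```python
WINDOW = 120  # seconds: "first 2 minutes" of a session

def early_queries(session):
    """Normalized queries issued within the first 2 minutes of the session."""
    start = session["queries"][0]["time"]
    return {q["text"].strip().lower() for q in session["queries"]
            if q["time"] - start <= WINDOW}

def candidate_teammates(reference, others):
    """Sessions sharing an exact early query with the reference session,
    with each user included at most once."""
    ref = early_queries(reference)
    matched, seen_users = [], set()
    for session in others:
        if session["user"] in seen_users:
            continue
        if ref & early_queries(session):
            matched.append(session)
            seen_users.add(session["user"])
    return matched
```
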

Unlike our results with the previous data sets (SULab and CPLab), simulated pairs always outperformed the baseline in terms of search effectiveness and search efficiency (Figure 7a), with steady gains over the course of the search session (Figure 7b). Possible explanations for these differences include differences in the size and constitution of the population from which other searchers are drawn or differences in the search tasks that people are attempting, even though they are topically related. The fact that search-engine log data are inherently noisier also might have some effect on the findings. More investigation into the nature of the observed differences is required, but overall the trends in the log-based results mirror what was observed in the other two studies.

In terms of search effectiveness, our results indicate that, more than 80% of the time, simulated pairs offer significant benefits at p < 0.05 (using z-score) between minutes 7 and 8 (Figure 7c). The same evaluation indicates that the worst time to collaborate would be the first minute, for which our results indicate that approximately 30% of the simulated pairs showed more benefits than costs with respect to the baseline. For search efficiency, Figure 7d illustrates that, before minute 10, collaborating could help to increase the likelihood of finding more useful information with less effort. This situation changes from minute 10 to minute 20,

FIG. 7. Aggregated results for search effectiveness and search efficiency in each minute for the data set SUSEngine. a: Effectiveness and efficiency in each minute. b: Average AUC for search efficiency and search effectiveness at each minute. c: Help−hurt relation in terms of search effectiveness in each minute. d: Help−hurt relation in terms of search efficiency in each minute. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


when the number of sessions that outperform the baseline falls between 60% and 70%. As highlighted in the previous section, this variation could be attributed to searchers becoming more specific in their searches, which makes collaboration less likely to enhance individual search effectiveness and search efficiency in finding useful information for the specific search task.

Summary and Discussion

We have presented results of an experiment aimed at understanding the performance of our method using three different data sets. The primary objective of this experiment was to determine whether a collabportunity coc in a noncollaborative situation was beneficial. We used different data sets to test our evaluation procedure and demonstrate potential benefits of coc at different times. This assumes that users A and B in coc will have search trends similar to those found in past collabportunities cop in each data set.

Evaluation results obtained for each data set can be used to determine whether coc is captured or discarded, a decision that depends on a predetermined threshold (see line 22 in Algorithm 1). For example, if the threshold is defined as 70% (which can be adjusted depending on the task and requirements), then results from the experiment with the three data sets suggest that a collabportunity between users A and B is feasible between the second and the ninth minutes. Therefore, coc should be promoted to a recommendation for collaboration during that window of time.

Our results, although different, were consistent in indicating the potential benefits of coc in enhancing search effectiveness. On the other hand, we found that search efficiency is sensitive to the variability of the data available to simulate collaborative behavior. As explained in the previous section, both SULab and CPLab used search sessions of a small group of users with specific characteristics (i.e., students from Rutgers University), whereas SUSEngine contains search sessions from thousands of users, who are assumed to have different demographic characteristics. Moreover, although we are not aware of the nature and goals of the search tasks performed by users in SUSEngine, it is likely that their searches took place in natural settings and possibly with motivations beyond the scope of the motivators provided in the studies (i.e., financial rewards). In this sense, not only the size of the data available but also similarities and, more important, differences among users could be key factors in the identification of collabportunities with a greater potential to become actual collaboration.

Among the results derived from this study, the experiment on CPLab was found to be especially interesting, in that it showed that our approach could outperform a baseline of real teams, in which users collaborated intentionally and explicitly. Although we observed these gains, differences and similarities between searchers might have a significant effect on the utility of this approach when employed in practice. Those in the real teams shared common ground (i.e., they were friends, couples, relatives, etc.), suggesting similarities on a range of different levels, including personality, interests, and skills. Conversely, the simulated pairs involve combinations of users who may lack common ground and may possess different characteristics. The evaluation of simulated pairs does not consider the underlying processes of explicit/intentional collaboration such as communication and coordination, which in the case of users without common ground could result in poor collaborations (Clark & Brennan, 1991; McCarthy, Miles, & Monk, 1991).

Although the method that we propose does not guarantee successful collaboration, it provides an estimation of the costs and benefits of a collaboration opportunity. Such an estimation could be utilized by a search system and is the first step in the selection of collabportunities with the potential to become an actual collaboration. Despite the promise of our method, it may be necessary to incorporate additional layers of evaluation that consider similarities and differences among searchers, aiming to maximize the likelihood of transforming collabportunities into successful collaborations.

We have shown that our method is capable of indicating when in the search process would be a good moment to collaborate. Appropriate timing in this context refers to specific periods when the benefits exceed the costs. It is important to mention that these results are specific to the task and topic evaluated. The method may report similar or different time periods depending on the task, topic, and information available about past search sessions.

We explored two performance measures that can be integrated with our method. Search effectiveness shows when a collabportunity could lead users to find information more precisely. Search efficiency quantifies the expected effort at a given time to achieve a particular search effectiveness level. These measures are complementary and help to estimate benefits and costs from different perspectives. Although we limited our study to these performance measures, we note that other performance measures and features could be integrated with our approach.

Finally, we highlight the experiment on the search engine data set SUSEngine, which demonstrates the validity of our method when applied to data obtained from noncontrolled settings. This helps us understand the potential practical utility of our approach and highlights the potential of collabportunities in large-scale search engines.

Conclusions and Future Work

Beyond retrieving and ranking, many solutions and approaches have been developed to support the information search processes of users. Solutions based on query suggestions, results recommendation, information filtering, and personalization are some examples found in the literature. Although some of these cases, such as information filtering, consist of implicit and unintentional forms of collaboration, some researchers have indicated that collaboration is another such possibility that could enhance


information exploration (Hyldegard, 2006, 2009; Shah, 2010b; Twidale et al., 1997). It is often the case, however, that information search is performed in solitude even when many others may be performing the same or similar tasks around the same time. Having many users (potentially many thousands, depending on the topic and the search environment) searching for information about the same topic at the same time implies latent opportunities for collaboration between searchers. These opportunities are usually missed because of the lack of methods to identify them.

We have explored these collabportunities, in particular a way to estimate their feasibility. This raises the possibility of opportunities for collaboration being captured rather than lost over time. We have proposed a method to perform this evaluation under the assumption that the underlying IR systems are capable of accessing data about past search sessions. Our approach is based on simulated collaborative search processes and performance measures that are used to determine the potential benefits and costs of collaborating with another individual. In addition, this method also indicates when good opportunities for searchers to collaborate exist.

We evaluated our method with an experimental design for three different data sets, which included data from laboratory studies and also logs from a large-scale search engine. We relied on two performance measures, namely, effectiveness and efficiency. Results derived from this study were consistent in demonstrating the potential of a given collabportunity to enhance individual performance.
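As a rough proxy for these two measures (the study's precise definitions appear earlier in the article; the formulas below are simplified assumptions), effectiveness can be read as coverage of the known relevant material, and efficiency as relevant material obtained per query:

```python
def effectiveness(found: set, all_relevant: set) -> float:
    """Fraction of all known relevant documents covered by a session (recall-style proxy)."""
    return len(found & all_relevant) / len(all_relevant) if all_relevant else 0.0

def efficiency(found: set, num_queries: int) -> float:
    """Relevant documents obtained per query issued (cost-normalized proxy)."""
    return len(found) / num_queries if num_queries else 0.0

# Example: a merged (pseudocollaborative) session covers more of the relevant
# pool than the solo session did, at the same per-query cost.
pool = {"d1", "d2", "d3", "d4", "d5"}
solo = {"d1", "d2"}
merged = {"d1", "d2", "d3", "d4"}
print(effectiveness(solo, pool), effectiveness(merged, pool))  # 0.4 0.8
print(efficiency(merged, 2))                                   # 2.0
```

Measures of this shape make the benefit/cost trade-off of a candidate collaboration directly comparable across sessions.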

The method is presented here in a generic form. We focus on an implementation in the back end of systems capable of keeping track of search sessions from multiple users over time. For example, implementations at the level of IR systems could help to identify and evaluate collabportunities in real time by connecting different searchers with similar queries. Integration with other collaboration approaches, such as algorithmic mediation for collaboration in information search (Pickens et al., 2008), is possible.
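A back-end matcher of this kind could, for instance, flag searchers whose active queries overlap strongly. The following sketch uses simple Jaccard term overlap as the similarity signal; the function names and the threshold value are hypothetical illustrations, not part of the method described here:

```python
def jaccard(q1: str, q2: str) -> float:
    """Term-overlap (Jaccard) similarity between two queries."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def find_candidates(query: str, active: dict, threshold: float = 0.5) -> list:
    """Return ids of active users whose current query is similar enough to `query`."""
    return [uid for uid, q in active.items() if jaccard(query, q) >= threshold]

# Example: only the searcher on the same topic passes the threshold.
active_queries = {
    "u7": "deepwater horizon oil spill",
    "u9": "oil spill gulf cleanup",
    "u4": "python list comprehension",
}
print(find_candidates("deepwater oil spill effects", active_queries))  # ['u7']
```

In practice, a production system would likely combine richer signals (session history, clicked documents, task models) before evaluating a match as a collabportunity.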

Implementing mechanisms to detect and evaluate collaboration opportunities to promote explicit and intentional collaboration between users poses several challenges beyond the technical domain. One such challenge is protecting the privacy of users. As explained, our approach relies on active monitoring of user search activity to identify potential collaborations. Moreover, it requires methods to mine past users' logs to evaluate the feasibility of current collabportunities. There are also sensitivities around connecting searchers with strangers simply based on the queries issued, and user consent may need to be sought before users are entered into a pool of potential collaborators and their ongoing search activity is shared with others. These aspects are beyond the scope of this study; however, we acknowledge their importance and the need for in-depth investigations that help researchers to understand the cultural feasibility, social implications, and legal limitations of implementing this kind of technology, as well as the privacy-preserving mechanisms required.

Once collabportunities are properly evaluated and promoted to recommendations for collaboration, other questions must be addressed. For example, if users accept collaboration, how are they going to coordinate their actions? Will a communication channel be enabled while the collaboration takes place? Or will collaboration be based solely on information sharing? How best can these recommendations be communicated to users in an unobtrusive yet informative way? These are fundamental questions that must be answered in developing systems to lessen the number of missed collabportunities and to help searchers help each other. Responding to these questions is beyond the scope of this article. Further explorations are required to investigate the experience of users, as well as other human and social factors, when users are offered the possibility to collaborate with someone else.

We have evaluated our method in the context of a particular exploratory search task and topic, namely, the Deepwater Horizon oil spill. Although further work with more topics is needed, we believe that exploratory search scenarios of this nature are especially likely to benefit from the additional support that we have described, primarily because they may extend over many queries and involve examining resources from multiple locations. Marchionini (2006) considers broad search activities such as learn and investigate as essential parts of exploratory search, which include specific actions of discovery, knowledge acquisition, synthesis, and evaluation, among others. In this sense, our future work targets two important objectives: (a) the definition and evaluation of a method capable of identifying collabportunities in real time and (b) the refinement and testing of our current evaluation method with different tasks and scenarios (i.e., controlled and semicontrolled) to validate our findings and address questions such as those stated above.

Finally, we note that, although the experiments reported here were conducted using laboratory study data sets and web search logs, the approach itself is independent of such data. One could take the algorithm we presented and run similar analyses on a different data set to achieve similar effects. This paves the way for research and development on performing selective collaboration and bridging system-mediated and user-mediated collaborative search.

Acknowledgments

The work described in this article was supported in part by the Institute of Museum and Library Services (IMLS) Early Career Development grant #RE-04-12-0105-12.

References

Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749.

Agichtein, E., Brill, E., & Dumais, S. (2006). Improving web search ranking by incorporating user behavior information. In Proceedings of SIGIR'06 (pp. 19–26).


Anick, P., & Kantamneni, R.G. (2008). A longitudinal study of real-time search assistance adoption. In Proceedings of ACM SIGIR'08 (pp. 701–702).

Aula, A., Khan, R.M., & Guan, Z. (2010). How does search behavior change as search becomes more difficult? In Proceedings of SIGCHI'10 (pp. 34–44).

Belkin, N.J. (2000). Helping people find what they don't know. Communications of the ACM, 43(8), 58–61.

Bennett, P., White, R.W., Chu, W., Dumais, S., Bailey, P., Borisyuk, F., & Cui, X. (2012). Modeling the impact of short- and long-term behavior on search personalization. In Proceedings of SIGIR'12 (pp. 185–194). Portland, OR: ACM Digital Library.

Borlund, P. (2000). Experimental components for the evaluation of interactive information retrieval systems. Journal of Documentation, 56(1), 71–90.

Borlund, P., & Ingwersen, P. (1999). The application of work tasks in connection with the evaluation of interactive information retrieval systems: Empirical results. In Proceedings of MIRA: Electronic Workshops in Computing (pp. 29–46). ACM Digital Library.

Clark, H.H., & Brennan, S.E. (1991). Grounding in communication. In L.B. Resnick, R.M. Levine, & S.D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127–149). The American Psychological Association.

Fidel, R., Bruce, H., Pejtersen, A.M., Dumais, S.T., Grudin, J., & Poltrock, S. (2000). Collaborative Information Retrieval (CIR). The New Review of Information Behaviour Research, 1(1), 235–247.

Fidel, R., Pejtersen, A.M., Cleal, B., & Bruce, H. (2004). A multidimensional approach to the study of human-information interaction: A case study of collaborative information retrieval. Journal of the American Society for Information Science and Technology, 55(11), 939–953. doi:10.1002/asi.20041

Fisher, K.E., Erdelez, S., & McKechnie, L. (2005). Theories of information behavior. Medford, NJ: Information Today.

Fox, S., Karnawat, K., Mydland, M., Dumais, S., & White, T. (2005). Evaluating implicit measures to improve web search. ACM Transactions on Information Systems (TOIS), 23(2), 147–168.

Golovchinsky, G., Pickens, J., & Back, M. (2008). A taxonomy of collaboration in online information seeking. In Proceedings of JCDL 2008 Workshop on Collaborative Exploratory Search (pp. 1–3). Pittsburgh, PA.

Golovchinsky, G., Adcock, J., Pickens, J., Qvarfordt, P., & Back, M. (2008). Cerchiamo: A collaborative exploratory search tool. In Proceedings of CSCW'08 (pp. 1–2). San Diego, CA.

Golovchinsky, G., Dunnigan, A., & Diriye, A. (2012). Designing a tool for exploratory information seeking. In Proceedings of ACM SIGCHI'12 (pp. 1799–1804). ACM Digital Library.

González-Ibáñez, R., Shah, C., & White, R. (2012). Pseudo-collaboration as a method to perform selective algorithmic mediation in collaborative IR systems. In Proceedings of ASIS&T'12 (pp. 1–4).

González-Ibáñez, R., Haseki, M., & Shah, C. (2013). Let's search together, but not too close! An analysis of communication and performance in collaborative information seeking. Information Processing and Management, 49(5), 1165–1179.

Horowitz, D., & Kamvar, S.D. (2010). The anatomy of a large-scale social search engine. In Proceedings of WWW'10 (pp. 431–440). ACM Digital Library.

Hyldegard, J. (2006). Collaborative information behaviour—exploring Kuhlthau's Information Search Process model in a group-based educational setting. Information Processing and Management, 42(1), 276–298.

Hyldegard, J. (2009). Beyond the search process—exploring group members' information behavior in context. Information Processing and Management, 45(1), 142–158.

Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of SIGKDD'02 (pp. 133–142).

Marchionini, G. (2006). Exploratory search. Communications of the ACM, 49(4), 41–46.

McCarthy, J.C., Miles, V.C., & Monk, A.F. (1991). An experimental study of common ground in text-based communication. In Proceedings of ACM SIGCHI'91 (pp. 209–215). ACM Digital Library.

Morris, M.R. (2008). A survey of collaborative web search practices. In Proceedings of ACM SIGCHI'08 (pp. 1657–1660).

Morris, M.R., & Horvitz, E. (2007). SearchTogether: An interface for collaborative web search. In Proceedings of ACM UIST'07 (pp. 3–12). ACM Digital Library.

Pickens, J., Golovchinsky, G., Shah, C., Qvarfordt, P., & Back, M. (2008). Algorithmic mediation for collaborative exploratory search. In Proceedings of ACM SIGIR'08 (pp. 315–322). ACM Digital Library.

Radlinski, F., & Joachims, T. (2005). Query chains: Learning to rank from implicit feedback. In Proceedings of SIGKDD'05 (pp. 239–248). ACM Digital Library.

Resnick, P., & Varian, H.R. (1997). Recommender systems. Communications of the ACM, 40(3), 56–58.

Shah, C. (2009). Lessons and challenges for Collaborative Information Seeking (CIS) systems developers. In Proceedings of GROUP 2009 Workshop on Collaborative Information Behavior (pp. 1–4).

Shah, C. (2010a). Coagmento—A collaborative information seeking, synthesis and sense-making framework. In Integrated demo at CSCW 2010 (pp. 1–4).

Shah, C. (2010b). Working in collaboration—What, why, and how? In Proceedings of the Second Workshop on Collaborative Information Seeking at CSCW 2010 (pp. 1–6).

Shah, C. (2012). Collaborative information seeking: The art and science of making the whole greater than the sum of all. The Information Retrieval Series, Vol. 34. Springer.

Shah, C., & González-Ibáñez, R. (2011). Evaluating the synergic effect of collaboration in information seeking. In Proceedings of ACM SIGIR'11 (pp. 913–922). ACM Digital Library.

Twidale, M.B., & Nichols, D.M. (1996). Collaborative browsing and visualisation of the search process. In Proceedings of ELVIRA 3, Aslib, 48(7–8), 177–182.

Twidale, M.B., Nichols, D.M., & Paice, C.D. (1997). Browsing is a collaborative process. Information Processing and Management, 33(6), 761–783.

Weisz, J.D., Erickson, T., & Kellogg, W.A. (2006). Synchronous broadcast messaging: The use of ICT. In Proceedings of SIGCHI'06 (pp. 1293–1302). ACM Digital Library.

White, R.W., & Drucker, S.M. (2007). Investigating behavioral variability in web search. In Proceedings of WWW'07 (pp. 21–30). ACM Digital Library.

White, R.W., & Huang, J. (2010). Assessing the scenic route: Measuring the value of search trails in web logs. In Proceedings of ACM SIGIR'10 (pp. 587–594). ACM Digital Library.

White, R.W., Chu, W., Hassan, A., He, X., Song, Y., & Wang, H. (2013). Enhancing personalized search by mining and modeling task behavior. In Proceedings of WWW'13 (pp. 1411–1420). ACM Digital Library.

Xiang, B., Jiang, D., Pei, J., Sun, X., Chen, E., & Li, H. (2010). Context-aware ranking in web search. In Proceedings of ACM SIGIR'10 (pp. 451–458). ACM Digital Library.

Appendix: Definitions (Based on Shah, 2012)

• Information retrieval (IR) is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as searching structured storage, relational databases, and the web.

• Information seeking (IS) is the process or activity of attempting to obtain information in both human and technological contexts. IS, in this work, is seen as incorporating IR.

• Information behavior/human information behavior is the study of the interaction among people, information, and the situations (contexts) in which they interact (Fisher, Erdelez, & McKechnie, 2005).

• Coordination is a process of connecting different agents together for a harmonious action. This often involves bringing people or systems under an umbrella at the same time and place. During this process, the involved agents may share resources, responsibilities, and goals.

• Cooperation is a relationship in which different agents with similar interests take part in planning activities, negotiating roles, and sharing resources to achieve joint goals. In addition to coordination, cooperation involves all the agents following some rules of interaction.

• Collaboration is a process involving various agents who may see different aspects of a problem. They engage in a process through which they can go beyond their own individual expertise and vision by constructively exploring their differences and searching for common solutions. In contrast to cooperation, collaboration involves creating a solution that is more than merely the sum of each party's contribution. The authority in such a process is vested in the collaborative rather than in an individual entity.

• Explicit collaboration occurs when various aspects of collaboration are clearly stated and understood. For instance, a group of students working on a science project together knows (a) that they are collaborating and (b) who is responsible for doing what.

• Implicit collaboration occurs when collaboration happens without explicit specifications. For instance, visitors to Amazon.com receive recommendations based on other people's searching and buying behavior without knowing those people.

• Active collaboration is similar to explicit collaboration, the key difference being the willingness and awareness of the user. For instance, when a user of Netflix rates a movie, he is actively playing a part in collaborating with other users. However, because he did not explicitly agree to collaborate with others, he might not even know those users.

• Passive collaboration is similar to implicit collaboration, the key difference being the willingness and awareness of the user. For instance, when a user visits a video on YouTube, he passively contributes to the popularity of that video, affecting its ranking and popularity for others. The key difference between active and passive collaboration is the user's willingness and control over the actions. In the case of active collaboration, the user agrees to act (ratings, comments), whereas, in the case of passive collaboration, the user has very little control (click-through, browsing patterns).
