RepliCHI - 8 Challenges in Replicating a Study

Page 1: RepliCHI - 8 Challenges in Replicating a Study

Dr Max L. Wilson http://cs.nott.ac.uk/~mlw/

Teaching HCI Methods: Replicating a Study of Collaborative Search

Max L. Wilson

Replicated Paper:

Evaluating the synergic effect of collaboration in information seeking. In SIGIR 2011

by Shah & González-Ibáñez

Wednesday, 8 May 13

Page 2: RepliCHI - 8 Challenges in Replicating a Study

Background

• I was teaching HCI Methods in Swansea

• Replicating a study is “a good way to learn”

• I’d just finished teaching the Information Seeking module - and I’m very interested in Collaborative Information Seeking

Page 3: RepliCHI - 8 Challenges in Replicating a Study

http://coagmento.org

Page 4: RepliCHI - 8 Challenges in Replicating a Study

Chirag Shah

• Assistant Prof at Rutgers LIS

• Built Coagmento during his PhD

• Now working with his PhD student

Page 5: RepliCHI - 8 Challenges in Replicating a Study

Collaborative Information Seeking

Page 6: RepliCHI - 8 Challenges in Replicating a Study

Study Conditions

relevant information, and using it to compose a report.

6. Participants filled out post-task questionnaires.

The researcher conducting the study communicated with the participants through the chat-box at different times during the study instructing them to start/stop the task or fill in a questionnaire.

3.4 Conditions

To study the difference between individual information seeking and CIS, as well as to understand how various CIS settings can affect a collaborative team’s effectiveness in accomplishing an information-seeking task, we conducted experiments with four different conditions: single participants, two participants at the same computer, two participants in the same room but different computers, and two participants in different rooms with individual computers.

In order to have a baseline to study the synergic effect of collaboration, we artificially created pairs of users from C1 (single users). We generated all possible combinations of pairs in groups of 5, reaching a total of 49 groups and creating 245 artificial teams in total. This was done in order to cover all possible pairs of users while avoiding a given user appearing in more than one team within the same group of teams.
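As a rough illustration of the artificial-team baseline, the sketch below pairs up single participants and merges their visited pages, which is one plausible way to treat two independent searchers as a nominal team. The participant IDs, URLs, and the helper name `artificial_teams` are hypothetical, and the paper's further step of arranging the pairs into groups of 5 is not reproduced here.

```python
from itertools import combinations

def artificial_teams(coverage_by_user):
    """Build every possible pair of single users and merge their visited pages.

    coverage_by_user maps a (hypothetical) participant ID to the set of
    webpages that participant visited while working alone.
    """
    teams = {}
    for a, b in combinations(sorted(coverage_by_user), 2):
        # Assumption: a nominal team's coverage is the union of its members' coverage.
        teams[(a, b)] = coverage_by_user[a] | coverage_by_user[b]
    return teams

# Made-up data for three single participants.
singles = {
    "p1": {"bp.example/response", "news.example/spill"},
    "p2": {"news.example/spill", "epa.example/cleanup"},
    "p3": {"wiki.example/deepwater"},
}
teams = artificial_teams(singles)
print(len(teams))           # 3 pairs from 3 participants
print(teams[("p1", "p2")])  # pages visited by either p1 or p2
```

Because the pairs never actually worked together, any advantage a real team shows over these nominal teams can be attributed to collaboration rather than to simply having two people.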

These five conditions are summarized in Table 1. Setups for four of these conditions are also depicted in Figure 3. Note that in the real experiment, those in C5 condition were located in different rooms separated by walls, and not just a partition. They could not see or talk to each other directly, and the only communication channel they had was the text-box provided with the system.

Table 1: Experimental conditions.

Cond.  Description
C1     Single participants
C2     Artificial team
C3     Co-located using the same computer
C4     Co-located using different computers
C5     Remotely located

3.5 Task

We chose “gulf oil spill” as the topic for this experimentation since it was quite popular and relevant at the time the study was being conducted. Our preliminary investigations, including a few pilot runs, indicated that there was a huge amount of material on this topic, and that the participants would find it interesting and challenging enough as an exploratory search task. Each participant was given the following task description.

“A leading newspaper has hired your team to create a comprehensive report on the causes, effects, and consequences of the recent gulf oil spill. As a part of your contract, you are required to collect all the relevant information from any available online sources that you can find.

To prepare this report, search and visit any website that you want and look for specific aspects as given in the guideline below. As you find useful information, highlight and save relevant snippets. Make sure you also rate a snippet to help you in ranking them based on their quality and usefulness. Later, you can use these snippets to compile your report, no longer than 200 lines, as instructed.

Your report on this topic should address the following issues: description of how the oil spill took place, reactions by BP as well as various government and other agencies, impact on economy and life (people and animals) in the gulf, attempts to fix the leaking well and to clean the waters, long-term implications and lessons learned.” The participants saw this description on the screen (phase 4 in the study), and were also given a printed copy to refer to during their session.

Figure 3: Experimental setups for four different conditions.

4. EVALUATION

In order to evaluate the effectiveness of the participants in various conditions, we employed a number of traditional and non-traditional evaluation measures, which are presented below. Here we also describe other useful constructs and definitions that will later be used while reporting and discussing the results.

4.1 Universe of webpages

In order to compute quantities such as coverage, we needed a universal set of webpages. Given that the search domain for our experiments was the open web, we needed a more confined set that we could use to compare with. We decided to take the union of all the webpages visited by all of our participants (total 70). In other words, the universe of webpages was defined by combining the visited webpages of each participant/team in every condition.

$$U = \bigcup_{t} \mathrm{Coverage}(t) \tag{1}$$

Here, Coverage(t) is the coverage (webpages visited) by participant/team t.

4.2 Relevant webpages

This corresponds to the webpages that participants either bookmarked using the toolbar or from where one or more snippets were collected. Once again, we took the union of all such webpages by each participant/team to form a universe of relevant webpages.
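The universe of webpages described in 4.1 amounts to a set union over every participant or team. A minimal sketch, assuming each team's visited pages are available as a Python set (the team labels and URLs below are made up):

```python
def universe(coverage_by_team):
    """Union of the webpages visited by every participant/team."""
    pages = set()
    for visited in coverage_by_team.values():
        pages |= visited
    return pages

# Made-up coverage sets for three participants/teams.
coverage_by_team = {
    "team_a": {"a.example/1", "b.example/2"},
    "team_b": {"b.example/2", "c.example/3"},
    "solo_c": {"d.example/4"},
}
U = universe(coverage_by_team)
print(len(U))  # 4 distinct pages across all teams

# Share of the universe each team covered (an illustrative ratio, not
# necessarily the measure reported in the paper).
for team, visited in coverage_by_team.items():
    print(team, len(visited) / len(U))
```

The same function applied to bookmarked or snippet-source pages would give the universe of relevant webpages described in 4.2.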

Page 7: RepliCHI - 8 Challenges in Replicating a Study

What they found

• Remote collaborators were more independent (less overlap), and more synergetic than random pairs

• Significant differences between conditions

• Across several measures

- page diversity

- page coverage

- relevance (precision, recall, F-measure)

- page usefulness

• No difference in NASA TLX
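For readers unfamiliar with the relevance measures listed above, the standard set-based definitions can be sketched as follows. This assumes a team's collected pages are compared against a pooled set of relevant pages; the function name and sample data are hypothetical, and the slides do not give the exact formulation used in the paper.

```python
def precision_recall_f(collected, relevant):
    """Standard set-based precision, recall, and F-measure (harmonic mean)."""
    hits = len(collected & relevant)
    precision = hits / len(collected) if collected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Made-up example: a team collected 4 pages, 3 of which are in a relevant pool of 6.
collected = {"p1", "p2", "p3", "p9"}
relevant = {"p1", "p2", "p3", "p4", "p5", "p6"}
p, r, f = precision_recall_f(collected, relevant)
print(p, r, f)  # precision 3/4, recall 3/6, F-measure 0.6
```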

Page 8: RepliCHI - 8 Challenges in Replicating a Study

Why was this a good paper to Replicate?

• 1) Coagmento was a downloadable tool

• 2) The study is clearly reported in the paper

• 3) I know Chirag relatively well

• 4) The study used more than the basic metrics

Page 9: RepliCHI - 8 Challenges in Replicating a Study

Many Challenges

• 1) Software

• 2) Data capture

• 3) Task Design

• 4) Team Research Experience

• 5) Financial Support

• 6) Time Scales

• 7) Data Processing

• 8) Data Analysis

Page 10: RepliCHI - 8 Challenges in Replicating a Study

Challenge #1

Software Versions

• Coagmento had evolved/improved (after this study)

• BUT - Roberto González-Ibáñez offered to try to roll back the software

- This created a small project delay (hard for teaching)

- But meant we were using more comparable software

• BUT - Sadly this process was only semi-successful

Page 11: RepliCHI - 8 Challenges in Replicating a Study

Challenge #2

Data Capture

• The raw data was logged server-side, on the Coagmento servers

• We began to consider recording the screens and manually creating the logs

• BUT - Roberto offered to create a new server instance

• And provided us with a zip of the data at the end!

Page 12: RepliCHI - 8 Challenges in Replicating a Study

Challenge #3

Task Design

• A leading newspaper has hired your team to create a comprehensive report on the causes, effects, and consequences of the recent gulf oil spill. As a part of your contract, you are required to collect all the relevant information from any available online sources that you can find.

• To prepare ... [prompts to use features of the software]

• Your report on this topic should address the following issues: description of how the oil spill took place, reactions by BP as well as various government and other agencies, impact on economy and life (people and animals) in the gulf, attempts to fix the leaking well and to clean the waters, long-term implications and lessons learned.

Page 13: RepliCHI - 8 Challenges in Replicating a Study

Challenge #3

Task Design

• Temporally irrelevant

• Culturally less relevant?

- less intrinsic motivation for participants

• Should we create a new one, or use the original one?

Page 15: RepliCHI - 8 Challenges in Replicating a Study

Challenge #3

Task Design

• A leading newspaper has hired your team to create a comprehensive report on the causes, effects, and consequences of the Olympic Games. As a part of your contract, you are required to collect all the relevant information from any available online sources that you can find.

• To prepare ... [prompts to use features of the software]

• Your report on this topic should address the following issues: Impact on economy of host countries (people and animals), long-term implications on the host country, conditions and voting policy to become hosting nation and the next host country and their preparations to host the games.

Page 16: RepliCHI - 8 Challenges in Replicating a Study

Challenge #4

Research Team

• 6 novice MSc students, rather than 1 solid PhD student

- each taking several modules at the time

• Potential for high variance between students

• Tried to create anchors - like a fixed script etc.

• Not clear how important the variance would be

Page 17: RepliCHI - 8 Challenges in Replicating a Study

Challenge #5

Financial Incentives

• Original participants received $15 (x70) each

• And prizes for the best collaborating teams

• We had no budget - managed £50 of prizes

• Decided the prize for best team was the most important to replicate

Page 18: RepliCHI - 8 Challenges in Replicating a Study

Challenge #6

Time Limitations

• Had to be within a fixed module (a semester)

• They had other modules to work on

• We had some time-slips in setting up the study

• We were only able to run 20 pairs, rather than 30 pairs

Page 19: RepliCHI - 8 Challenges in Replicating a Study

Challenge #7

Data Processing

• This was very interesting

• For example - they removed search result pages

- this can never be an explicit known set

- especially as the task domain was different

• We followed their principles for data processing for analysis

• But we could not be sure we did this the same way
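One way to make a rule such as "remove search result pages" explicit, so that later replications can apply the same filter, is to encode it as a small, shareable function. The sketch below is an assumption about what such a filter could look like; the host list and the path/query heuristics are illustrative and are not the rules used in the original study.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical rule set: (host suffix, path prefix, query parameter).
SERP_RULES = [
    ("google.com", "/search", "q"),
    ("bing.com", "/search", "q"),
    ("search.yahoo.com", "/search", "p"),
]

def is_search_result_page(url):
    """Return True if the URL matches one of the illustrative SERP rules."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    query = parse_qs(parsed.query)
    for suffix, path_prefix, param in SERP_RULES:
        host_ok = host == suffix or host.endswith("." + suffix)
        if host_ok and parsed.path.startswith(path_prefix) and param in query:
            return True
    return False

def drop_search_result_pages(urls):
    """Keep only the content pages from a list of logged URLs."""
    return [u for u in urls if not is_search_result_page(u)]

# Made-up browsing log.
log = [
    "https://www.google.com/search?q=olympic+games+economy",
    "https://www.bbc.co.uk/sport/olympics",
    "https://www.bing.com/search?q=host+city+voting",
]
print(drop_search_result_pages(log))  # only the BBC page remains
```

Publishing the rule list alongside the paper would let a replication in a different task domain reuse or extend it, rather than guess at it.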

Page 20: RepliCHI - 8 Challenges in Replicating a Study

Challenge #8

Data Analysis

• The exact stages of statistical analysis are not always clear

• For example - they used a modified NASA TLX

- consequently, the exact analysis was not clear

- in particular, as to whether pair-wise comparisons were made

• Also, as the scales were Likert, and the stats reported as ANOVA

- we weren’t sure if ANOVA on Ranks was used

- or a traditional ANOVA

• (also the novice students didn’t know the difference)
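The difference between the two analyses can be made concrete with a small example. The sketch below computes the F statistic of a traditional one-way ANOVA, which treats Likert scores as interval data, and the H statistic of the Kruskal-Wallis test (commonly called ANOVA on ranks), which first replaces the scores with pooled ranks. It is plain Python with made-up responses; it reports test statistics only, not p-values, and it is not a reconstruction of the analysis in the paper.

```python
def sums_of_squares(groups):
    """Return (between-group SS, total SS, total sample size)."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    total = sum((x - grand_mean) ** 2 for x in values)
    return between, total, len(values)

def one_way_anova_f(groups):
    """F statistic of a traditional one-way ANOVA (needs within-group variance > 0)."""
    between, total, n = sums_of_squares(groups)
    k = len(groups)
    return (between / (k - 1)) / ((total - between) / (n - k))

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected H statistic: (N - 1) * SS_between / SS_total on pooled ranks."""
    ranks = average_ranks([x for g in groups for x in g])
    ranked, start = [], 0
    for g in groups:
        ranked.append(ranks[start:start + len(g)])
        start += len(g)
    between, total, n = sums_of_squares(ranked)
    return (n - 1) * between / total

# Made-up 7-point Likert responses for three conditions.
c3 = [2, 3, 3, 4, 7]
c4 = [4, 5, 5, 6, 6]
c5 = [1, 2, 2, 3, 3]
print("F =", one_way_anova_f([c3, c4, c5]))
print("H =", kruskal_wallis_h([c3, c4, c5]))
```

The two statistics are compared against different reference distributions (F versus chi-squared), so a paper that reports only "ANOVA" on Likert data leaves a replicator unsure which one to compute.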

Page 21: RepliCHI - 8 Challenges in Replicating a Study

The Outcome

• Is almost irrelevant - but we did not find the same results

• They found remote collaborators to be more independent, but more synergetic than random pairs

• There were so many potential reasons though

- smaller sample size

- different task difficulty

- different software performance

- different financial incentives

- novice researchers

Page 22: RepliCHI - 8 Challenges in Replicating a Study

RepliCHI discussion points

• 1) How should we handle different software versions?

• 2) Should we be using original tasks?

• 3) How to support data processing for future researchers?

• 4) Is there community value from replicating as teaching?

Page 26: RepliCHI - 8 Challenges in Replicating a Study

Many Challenges

• 1) Software

• 2) Data capture

• 3) Task Design

• 4) Team Research Experience

• 5) Financial Support

• 6) Time Scales

• 7) Data Processing

• 8) Data Analysis

General Replication Issues

Replication for Teaching Issues

Publishing Issues
