A Task-Centered Framework for Computationally Grounded Science Collaborations

Yolanda Gil (1), Felix Michel (1, 2), Varun Ratnakar (1), Matheus Hauder (2), Christopher Duffy (3), Hilary Dugan (4), and Paul Hanson (4)

(1) Information Sciences Institute, University of Southern California
(2) Department of Software Engineering for Business Information Systems, Technical University of Munich
(3) Department of Civil and Environmental Engineering at Penn State University
(4) Center for Limnology at the University of Wisconsin Madison

11th IEEE International Conference on eScience

Organic Data Science: http://www.organicdatascience.org/

USC INFORMATION SCIENCES INSTITUTE

Outline: Motivation, Introduction, Approach, Evaluation, Conclusion



Page 2

Evolution of the scientific enterprise

Evolution of the scientific enterprise from [Barabasi, 2005] extended with the ATLAS Detector Project at the Large Hadron Collider [The ATLAS Collaboration, 2012].

Motivation

[Figure labels: single authorship; co-authorship; large number of co-authors; the community as author]

Page 3

Taxonomy of Science Communities

Collaboration types with resources and activities [Bos et al 2007]

Introduction

Aggregating across distance (loose coupling, often asynchronously)
Tools (instruments): Shared Instrument (NEON)
Information (data): Communication Data System (PDB)
Knowledge (new findings): Virtual Learning Community (GLEON), Virtual Community of Practice (VIVO)

Co-creating across distance (requires tighter coupling, often synchronously)
Tools (instruments): Infrastructure (CSDMS)
Information (data): Open Community Contribution System (Zooniverse)
Knowledge (new findings): Distributed Research Center (ENCODE)

Page 4

Introduction

R1: Multi-disciplinary contributions
R2: Significant coordination
R3: Engaging unanticipated participants

Goal: Supporting Distributed Research Activities with Unanticipated Participants Joining Over Time

Page 5

Approach

1) Workflow creation activities: implement computational data analysis. Supported by workflow systems.

[Figure: an algorithm shown as a black-box software component with inputs (x1, x2), parameters, and outputs (y1, y2); a metadata description ("This component uses the X model to generate …"; factor: 20, repeat: 16 times, min: 0.5 units, max: 11.5 units); and records of executions in 2011, 2012, 2013, and 2014, each with its input and results. Labels: Modeling, Analyze, Provenance.]

Computationally Grounded Science Collaboration: Layers
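The black-box component in the figure can be read as a small record: named inputs and outputs, parameter settings, a free-text description, and a list of past executions. The sketch below is only an illustration of that reading. The class names, field names, and the component name are hypothetical, and the per-execution inputs and results are left empty because the slide does not show them.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    """One provenance record: when the component ran, on what, with what result."""
    year: int
    inputs: dict
    results: dict


@dataclass
class Component:
    """A black-box software component plus its metadata description (hypothetical schema)."""
    name: str
    description: str
    inputs: list
    outputs: list
    parameters: dict
    provenance: list = field(default_factory=list)

    def record(self, year, inputs, results):
        """Append a provenance record for one execution."""
        self.provenance.append(Execution(year, inputs, results))

    def runs_in(self, year):
        """Return all recorded executions from the given year."""
        return [e for e in self.provenance if e.year == year]


component = Component(
    name="model_x_component",  # hypothetical name, not from the slides
    description="This component uses the X model to generate ...",
    inputs=["x1", "x2"],
    outputs=["y1", "y2"],
    parameters={"factor": 20, "repeat": 16, "min": 0.5, "max": 11.5},
)

# The slide lists executions for 2011-2014 but elides their inputs and results.
for year in (2011, 2012, 2013, 2014):
    component.record(year, inputs={}, results={})
```

Keeping provenance next to the description means a newcomer can see both what a component is meant to do and how it has actually been run.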

Page 6

Approach

Computationally Grounded Science Collaboration: Layers

2) Software development activities: select/develop software. Supported by shared software repositories.

[Figure: the algorithm opened up as code, with inputs (x1, x2), parameters, and outputs (y1, y2), and a metadata description ("This component uses the X model to generate …"; factor: 20, repeat: 16 times, min: 0.5 units, max: 11.5 units).]

Page 7

Approach

Computationally Grounded Science Collaboration: Layers

3) Meta-workflow design activities (our focus): select problems, strategies, data, models, methods, etc.

Organic Data Science

[Figure: a workflow shown as a black box with data and parameters, plus a metadata description ("Model X with data source Y indicates …"). Labels: Computational Workflow, Models.]

Page 8

Computationally Grounded Science Collaboration: Layers

Meta-workflow design activities

Workflow creation activities

Software development activities

Approach

Page 9

Collaboration that occurs in distributed research activities with unanticipated participants joining over time

Meta-workflow design layer: scientists working together to agree on a problem to solve and a strategy to solve it

Reducing the coordination effort and lowering the barriers to growing the community

Focus of this work

Approach

Page 10

Social Design Principles

Selected social principles from [Kraut and Resnick 2012] for building successful online communities that can be applied to Organic Data Science.

A: Starting communities
A1: Carve a niche of interest, scoped in terms of topics, members, activities, and purpose
A2: Relate to competing sites, integrate content
A3: Organize content, people, and activities into subspaces once there is enough activity
A4: Highlight more active tasks
A5: Inactive tasks should have “expected active times”
A6: Create mechanisms to match people to activities

B: Encouraging contributions through motivation
B1: Make it easy to see and track needed contributions
B2: Ask specific people on tasks of interest to them
B3: Simple tasks with challenging goals are easier to comply with
B4: Specify deadlines for tasks, while leaving people in control
B5: Give frequent feedback specific to the goals
… B10 …

C: Encouraging commitment
C1: Cluster members to help them identify with the community
C2: Give subgroups a name and a tagline
C3: Put subgroups in the context of a larger group
C4: Make community goals and purpose explicit
C5: Interdependent tasks increase commitment and reduce conflict

D: Attracting and engaging newcomers
D1: Members recruiting colleagues is most effective
D2: Appoint people responsible for immediate friendly interactions
D3: Introducing newcomers to members increases interactions
D4: Entry barriers for newcomers help screen for commitment
D5: When small, acknowledge each new member
… D12 …

Approach

Page 11

Best Practices from Polymath and ENCODE

Selected best practices from the Polymath [Nielsen 2012] project and lessons learned from ENCODE [Encode 2004].

E: Best practices from Polymath
E1: Permanent URLs for posts and comments, so others can refer to them
E2: Appoint a volunteer to summarize periodically
E3: Appoint a volunteer to answer questions from newcomers
E4: Low barrier of entry: make it VERY easy to comment
E5: Advance notice of tasks that are anticipated
E6: Keep few tasks active at any given time, helps focus

F: Lessons learned from ENCODE
F1: Spine of leadership, including a few leading scientists and 1-2 operational project managers, that resolves complex scientific and social problems and has transparent decision making
F2: Written and publicly accessible rules to transfer work between groups, to assign credit when papers are published, to present the work
F3: Quality inspection with visibility into intermediate steps
F4: Export of data and results, integration with existing standards

Approach

Page 12

Self-Organization through Dynamic Task Decomposition

Approach


Page 13

Organic Data Science: Contributors

https://github.com/IKCAP/organicdatascience

Approach

Page 14

Organic data science is a novel approach to on-line scientific collaboration that supports:

Self-organization of communities by enabling any user to specify and decompose tasks

On-line community support by incorporating social sciences principles and best practices

An open science process by capturing new kinds of metadata about the collaboration that give necessary context to newcomers

Task-oriented self-organizing on-line communities for open collaboration in science
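Self-organization through task decomposition can be pictured as a tree that any user may extend. The following is a minimal sketch under stated assumptions, not the framework's actual data model: the `Task` class, its `owner` and `participants` fields, the `decompose` method, and the example titles and user names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through parent/subtask links
class Task:
    """A task that any user can split into subtasks (hypothetical model)."""
    title: str
    owner: str
    participants: set = field(default_factory=set)
    parent: Optional["Task"] = field(default=None, repr=False)
    subtasks: list = field(default_factory=list)

    def decompose(self, title, user):
        """Add a subtask; in this sketch the user who creates it becomes its owner."""
        child = Task(title=title, owner=user, parent=self)
        child.participants.add(user)
        self.subtasks.append(child)
        return child

    def walk(self):
        """Yield this task and all of its descendants, depth first."""
        yield self
        for sub in self.subtasks:
            yield from sub.walk()


# Hypothetical example data.
root = Task("Study the age of water", owner="alice")
model = root.decompose("Calibrate hydrology model", user="bob")
data = model.decompose("Collect lake data", user="carol")
titles = [t.title for t in root.walk()]
```

Because decomposition is open to every user, the tree grows wherever participants see work to be done rather than following a plan fixed in advance.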

Organic Data Science

Approach

Page 15

Ongoing Communities

Age of Water is a community of hydrologists and limnologists studying the age of water in an ecosystem.

ENIGMA is a consortium for neuroimaging genetics that includes more than 70 institutions collaborating on joint neuroscience studies.

GPF is a group of geoscientists publishing a special issue of a journal. All articles include the datasets, software, and workflows used to generate the results in the paper.

ODST assigns all new users a set of pre-defined tasks that involves learning aspects of the framework.

ODSF coordinates the development and improvement of the Organic Data Science Framework.

Approach

Page 16

Evolution of the collaboration in the GPF community

The GPF community was seeded with five organizers of the special issue

One of the organizers served as the host for the authors

The authors shared more and more tasks as the collaboration progressed

The lines are noticeably thicker in the final graph (❹)

Evaluation

Page 17

Age of Water Community

[Figure: two panels. Task Hierarchy: number of tasks plotted against number of ancestors. Social Task Network: node = participant, edge = tasks in common.]

Evaluation
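In the social task network panels, a node is a participant and an edge stands for tasks in common. A minimal sketch of how such a network could be derived from task assignments follows. The function name, the input format, and the example tasks and people are assumptions made for illustration; the edge weight counts shared tasks, which is one plausible way to set the line thickness in a drawing.

```python
from collections import Counter
from itertools import combinations


def social_task_network(task_participants):
    """Return {(a, b): shared_task_count} for each participant pair with a task in common."""
    edges = Counter()
    for participants in task_participants.values():
        # Sort so every pair is stored in one canonical order.
        for a, b in combinations(sorted(set(participants)), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical assignments: task title -> participants.
tasks = {
    "define research question": ["ana", "ben", "chen"],
    "prepare datasets": ["ana", "ben"],
    "write summary": ["chen"],
}
network = social_task_network(tasks)
```

A task with a single participant adds no edge, so a community whose tasks are all done alone would show up as isolated nodes.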

Page 18

ENIGMA Community

[Figure: two panels. Task Hierarchy: number of tasks plotted against number of ancestors. Social Task Network: node = participant, edge = tasks in common.]

Evaluation
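The task hierarchy panels plot the number of tasks against the number of ancestors, that is, how many tasks sit at each depth of the decomposition. The sketch below shows one way to compute that distribution, assuming the hierarchy is available as a child-to-parent mapping. The mapping format, the function name, and the example task names are hypothetical.

```python
from collections import Counter


def ancestor_histogram(parent_of):
    """Map number of ancestors -> number of tasks, given child -> parent (None for roots)."""

    def depth(task):
        # Walk up the parent links until a root is reached.
        count = 0
        while parent_of[task] is not None:
            task = parent_of[task]
            count += 1
        return count

    return dict(sorted(Counter(depth(t) for t in parent_of).items()))


# Hypothetical hierarchy: one root, two subtasks, and deeper subtasks under "a".
parent_of = {
    "root": None,
    "a": "root",
    "b": "root",
    "a1": "a",
    "a2": "a",
    "a1x": "a1",
}
histogram = ancestor_histogram(parent_of)
```

A histogram concentrated at small ancestor counts indicates a flat task structure, while a long tail indicates that participants kept decomposing tasks further.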

Page 19

GPF Community

[Figure: two panels. Task Hierarchy: number of tasks plotted against number of ancestors. Social Task Network: node = participant, edge = tasks in common.]

Evaluation

Page 20

ODSF Community

[Figure: two panels. Task Hierarchy: number of tasks plotted against number of ancestors. Social Task Network: node = participant, edge = tasks in common.]

Evaluation

Page 21

ODST Community

[Figure: two panels. Task Hierarchy: number of tasks plotted against number of ancestors. Social Task Network: node = participant, edge = tasks in common.]

No collaboration is needed to accomplish the ODS Training, so only two users are shown.

Evaluation

Page 22

Task metadata analysis

Evaluation

Page 23

Conclusions

The Organic Data Science Framework:
supports collaborations that are distributed research activities with unanticipated participants joining over time
addresses the meta-workflow design layer: scientists working together to agree on a problem to solve and a strategy to solve it
is based on social design principles
has preliminary data on use in different communities

Future work: Evaluation to assess how the framework supports scientific collaboration and whether it increases productivity and community growth.

Page 24

Thank You

Development: https://github.com/IKCAP/organicdatascience

Organic Data Science: http://www.organicdatascience.org/

Acknowledgments

We gratefully acknowledge funding from the US National Science Foundation under grant IIS-1344272.