A Task-Centered Framework for Computationally Grounded Science Collaborations
Yolanda Gil1, Felix Michel1,2, Varun Ratnakar1, Matheus Hauder2, Christopher Duffy3, Hilary Dugan4, and Paul Hanson4
1 Information Sciences Institute, University of Southern California
2 Department of Software Engineering for Business Information Systems, Technical University of Munich
3 Department of Civil and Environmental Engineering, Penn State University
4 Center for Limnology, University of Wisconsin-Madison
11th IEEE International Conference on eScience
Organic Data Science: http://www.organicdatascience.org/
USC INFORMATION SCIENCES INSTITUTE 2
Motivation | Introduction | Approach | Evaluation | Conclusion
Evolution of the scientific enterprise
Evolution of the scientific enterprise, adapted from [Barabasi, 2005] and extended with the ATLAS detector project at the Large Hadron Collider [The ATLAS Collaboration, 2012].
Motivation
Stages shown: single authorship → co-authorship → large number of co-authors → the community as author
Taxonomy of Science Communities
Collaboration types by resource and activity [Bos et al. 2007]
Activity | Tools (instruments) | Information (data) | Knowledge (new findings)
Aggregating across distance (loose coupling, often asynchronous) | Shared Instrument: NEON | Community Data System: PDB | Virtual Learning Community: GLEON; Virtual Community of Practice: VIVO
Co-creating across distance (requires tighter coupling, often synchronous) | Infrastructure: CSDMS | Open Community Contribution System: Zooniverse | Distributed Research Center: ENCODE
Introduction
Introduction
R1: Multi-disciplinary contributions
R2: Significant coordination
R3: Engaging unanticipated participants
Goal: Supporting Distributed Research Activities with Unanticipated Participants Joining Over Time
Approach
Figure: a software component shown as an algorithm black box, with inputs x1 and x2, parameters, and outputs y1 and y2; a description ("This component uses the X model to generate …"); example parameter values (factor: 20, repeat: 16 times, min: 0.5 units, max: 11.5 units); and a meta description. Further panels: Modeling, Analyze, Provenance, with execution records for 2011, 2012, 2013, and 2014 (each listing Input and Results).
1) Workflow creation activities: implement computational data analysis
Supported by workflow systems
Computationally Grounded Science Collaboration: Layers
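Workflow systems run a set of components in an order that respects their data dependencies. As a rough sketch of that idea only, the following Python uses the standard-library `graphlib` module to order a hypothetical workflow; the step names (`load_data`, `run_model`, `analyze`, `record_provenance`) and the toy step functions are invented for illustration and are not the interface of any particular workflow system.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical workflow: each step lists the steps whose outputs it consumes.
WORKFLOW: dict[str, set[str]] = {
    "load_data": set(),
    "run_model": {"load_data"},
    "analyze": {"run_model"},
    "record_provenance": {"run_model", "analyze"},
}


def execution_order(workflow: dict[str, set[str]]) -> list[str]:
    """Order the steps so that each one comes after every step it depends on."""
    return list(TopologicalSorter(workflow).static_order())


def run_workflow(
    workflow: dict[str, set[str]],
    steps: dict[str, Callable[[dict[str, object]], object]],
) -> dict[str, object]:
    """Run each step in dependency order, passing it its predecessors' outputs."""
    results: dict[str, object] = {}
    for name in execution_order(workflow):
        inputs = {dep: results[dep] for dep in workflow[name]}
        results[name] = steps[name](inputs)
    return results


# Toy step implementations, purely illustrative.
steps = {
    "load_data": lambda _: [1.0, 2.0, 3.0],
    "run_model": lambda inp: [2 * x for x in inp["load_data"]],
    "analyze": lambda inp: sum(inp["run_model"]) / len(inp["run_model"]),
    "record_provenance": lambda inp: sorted(inp),  # names of the steps it received
}

results = run_workflow(WORKFLOW, steps)
```

Deriving the order from declared dependencies, rather than from a hand-written sequence, means a new step only has to state which outputs it consumes.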
Approach
Figure: the same software component, now showing its code and algorithm, with inputs x1 and x2, parameters, outputs y1 and y2, a description ("This component uses the X model to generate …"), example parameter values (factor: 20, repeat: 16 times, min: 0.5 units, max: 11.5 units), and a meta description.
Select/develop software
Computationally Grounded Science Collaboration: Layers
2) Software development activities
Supported by shared software repositories
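At the software development layer, a shared component is described by its inputs, outputs, parameters, and a free-text description. A minimal sketch of such a description, assuming a hypothetical `ComponentSpec` schema; treating the slide's min and max values (0.5 and 11.5 units) as the allowed range of a single parameter, here called `threshold`, is an assumption made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterSpec:
    """A numeric parameter with a default and an allowed range (hypothetical schema)."""
    name: str
    default: float
    minimum: float
    maximum: float


@dataclass
class ComponentSpec:
    """Hypothetical description of a shared software component."""
    name: str
    description: str
    inputs: list[str]
    outputs: list[str]
    parameters: list[ParameterSpec] = field(default_factory=list)

    def validate(self, values: dict[str, float]) -> list[str]:
        """Return the problems with the given parameter values (empty list if valid)."""
        problems = []
        known = {p.name for p in self.parameters}
        for key in values:
            if key not in known:
                problems.append(f"unknown parameter: {key}")
        for p in self.parameters:
            value = values.get(p.name, p.default)  # fall back to the default
            if not p.minimum <= value <= p.maximum:
                problems.append(f"{p.name}={value} outside [{p.minimum}, {p.maximum}]")
        return problems


# Hypothetical component mirroring the slide's inputs and outputs.
component = ComponentSpec(
    name="ModelX",
    description="Hypothetical component wrapping model X",
    inputs=["x1", "x2"],
    outputs=["y1", "y2"],
    parameters=[ParameterSpec("threshold", default=1.0, minimum=0.5, maximum=11.5)],
)
```

Keeping this description machine-readable is what would let a repository check a proposed configuration before anyone runs the component.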
Approach
Select problems, strategies, data, models, methods, etc.
Organic Data Science
Figure: a workflow shown as a black box with data and parameters, a description ("Model X with data source Y indicates …"), and a meta description; panel labels: Computational Workflow, Models.
Computationally Grounded Science Collaboration: Layers
3) Meta-workflow design activities
Our focus
Computationally Grounded Science Collaboration: Layers
Meta-workflow design activities
Workflow creation activities
Software development activities
Approach
Collaboration that occurs in distributed research activities with unanticipated participants joining over time
Meta-workflow design layer: scientists working together to agree on a problem to solve and a strategy to solve it
Reduce the coordination effort and lower the barriers to growing the community
Focus of this work
Approach
Social Design Principles
Selected social principles from [Kraut and Resnick 2012] for building successful online communities that can be applied to Organic Data Science.
A: Starting communities
A1: Carve a niche of interest, scoped in terms of topics, members, activities, and purpose
A2: Relate to competing sites; integrate content
A3: Organize content, people, and activities into subspaces once there is enough activity
A4: Highlight more active tasks
A5: Inactive tasks should have “expected active times”
A6: Create mechanisms to match people to activities
B: Encouraging contributions through motivation
B1: Make it easy to see and track needed contributions
B2: Ask specific people to work on tasks of interest to them
B3: Simple tasks with challenging goals are easier to comply with
B4: Specify deadlines for tasks, while leaving people in control
B5: Give frequent feedback specific to the goals
… B10 …
C: Encouraging commitment
C1: Cluster members to help them identify with the community
C2: Give subgroups a name and a tagline
C3: Put subgroups in the context of a larger group
C4: Make community goals and purpose explicit
C5: Interdependent tasks increase commitment and reduce conflict
D: Attracting and engaging newcomers
D1: Members recruiting colleagues is most effective
D2: Appoint people responsible for immediate friendly interactions
D3: Introducing newcomers to members increases interactions
D4: Entry barriers for newcomers help screen for commitment
D5: When small, acknowledge each new member
… D12 …
Approach
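Principles A4, A5, and B4 translate naturally into task metadata. A minimal sketch, assuming each task carries an expected start date and a target date; the `Task` fields, the status rules, and the example dates are hypothetical and not taken from the framework itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    """Hypothetical task record carrying the dates behind principles A5 and B4."""
    title: str
    expected_start: date   # A5: when an inactive task is expected to become active
    target_date: date      # B4: a deadline that the task owner can still change
    progress: float = 0.0  # fraction complete, from 0.0 to 1.0


def status(task: Task, today: date) -> str:
    """Classify a task relative to today's date."""
    if task.progress >= 1.0:
        return "completed"
    if today < task.expected_start:
        return "inactive"
    if today > task.target_date:
        return "overdue"
    return "active"


def highlighted(tasks: list[Task], today: date) -> list[str]:
    """A4: titles of active or overdue tasks, nearest deadline first."""
    current = [t for t in tasks if status(t, today) in ("active", "overdue")]
    return [t.title for t in sorted(current, key=lambda t: t.target_date)]


# Hypothetical tasks and reference date.
today = date(2015, 3, 15)
tasks = [
    Task("Collect lake data", date(2015, 1, 1), date(2015, 3, 1), progress=0.4),
    Task("Calibrate model", date(2015, 4, 1), date(2015, 6, 1)),
    Task("Write paper", date(2015, 2, 1), date(2015, 5, 1), progress=0.1),
]
```

Here the calibration task is shown as inactive with a known expected start, rather than as a stalled task, which is the distinction A5 asks for.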
Best Practices from Polymath and ENCODE
Selected best practices from the Polymath [Nielsen 2012] project and lessons learned from ENCODE [Encode 2004].
E: Best practices from Polymath
E1: Permanent URLs for posts and comments, so others can refer to them
E2: Appoint a volunteer to summarize periodically
E3: Appoint a volunteer to answer questions from newcomers
E4: Low barrier of entry: make it VERY easy to comment
E5: Advance notice of tasks that are anticipated
E6: Keep few tasks active at any given time, which helps focus
F: Lessons learned from ENCODE
F1: Spine of leadership, including a few leading scientists and one or two operational project managers, that resolves complex scientific and social problems and has transparent decision making
F2: Written and publicly accessible rules to transfer work between groups, to assign credit when papers are published, and to present the work
F3: Quality inspection with visibility into intermediate steps
F4: Export of data and results, integration with existing standards
Approach
Self-Organization through Dynamic Task Decomposition
Approach
eScience
Organic Data Science: Contributors
https://github.com/IKCAP/organicdatascience
Approach
Organic data science is a novel approach to on-line scientific collaboration that supports:
Self-organization of communities by enabling any user to specify and decompose tasks
On-line community support by incorporating social sciences principles and best practices
An open science process by capturing new kinds of metadata about the collaboration that give necessary context to newcomers
Task-oriented self-organizing on-line communities for open collaboration in science
Organic Data Science
Approach
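Dynamic task decomposition means that any participant can break a task into subtasks, so the task tree grows along with the community. A minimal sketch of such a tree, assuming a hypothetical `Task` class with one owner per task; the task titles and user names are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task node; any participant may add subtasks to any task."""
    title: str
    owner: str
    subtasks: list[Task] = field(default_factory=list)

    def decompose(self, title: str, owner: str) -> Task:
        """Create a subtask under this task and return it."""
        child = Task(title, owner)
        self.subtasks.append(child)
        return child

    def leaves(self) -> list[str]:
        """Titles of the tasks that have not been decomposed further."""
        if not self.subtasks:
            return [self.title]
        return [leaf for child in self.subtasks for leaf in child.leaves()]

    def participants(self) -> set[str]:
        """Everyone who owns a task anywhere in this subtree."""
        people = {self.owner}
        for child in self.subtasks:
            people |= child.participants()
        return people


# Hypothetical decomposition by three participants.
root = Task("Estimate the age of water", "alice")
data = root.decompose("Gather tracer data", "bob")
data.decompose("Find stream gauges", "carol")
root.decompose("Choose a model", "alice")
```

No central planner is involved: each `decompose` call is a local decision by whoever is working on that task.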
Ongoing Communities
Age of Water: a community of hydrologists and limnologists studying the age of water in an ecosystem.
ENIGMA: a consortium for neuroimaging genetics that includes more than 70 institutions collaborating on joint neuroscience studies.
GPF: a group of geoscientists publishing a special issue of a journal in which every article includes the datasets, software, and workflows used to generate its results.
ODST (ODS Training): assigns all new users a set of predefined tasks that involve learning aspects of the framework.
ODSF: coordinates the development and improvement of the Organic Data Science Framework.
Approach
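The ODST description suggests a simple onboarding rule: every new user receives their own copy of a fixed set of training tasks. A minimal sketch, assuming hypothetical `Assignment` records and placeholder task titles, since the slides do not list the actual training tasks.

```python
from dataclasses import dataclass

# Placeholder curriculum: the actual ODST training tasks are not listed in the slides.
TRAINING_TASKS = [
    "Training step 1",
    "Training step 2",
    "Training step 3",
]


@dataclass(frozen=True)
class Assignment:
    """One training task assigned to one user (hypothetical record)."""
    user: str
    task: str
    done: bool = False


def onboard(user: str, assignments: list[Assignment]) -> list[Assignment]:
    """Assign every training task to a new user, skipping ones they already have."""
    already = {(a.user, a.task) for a in assignments}
    new = [Assignment(user, t) for t in TRAINING_TASKS if (user, t) not in already]
    return assignments + new


def training_complete(user: str, assignments: list[Assignment]) -> bool:
    """True once the user has at least one training task and has finished all of them."""
    mine = [a for a in assignments if a.user == user]
    return bool(mine) and all(a.done for a in mine)


assignments = onboard("dana", [])  # "dana" is a hypothetical new user
```

Making `onboard` idempotent means it can safely run every time a user logs in, not only at account creation.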
Evolution of the collaboration in the GPF community
❶ The GPF community was seeded with the five organizers of the special issue.
❷ One of the organizers served as the host for the authors.
❸ The authors shared more and more tasks as the collaboration progressed.
❹ The lines are noticeably thicker in the final graph.
Evaluation
Age of Water Community
Figure panels: Task Hierarchy (x-axis: Number of Ancestors; y-axis: Number of Tasks) and Social Task Network (node = participant; edge = tasks in common)
Evaluation
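The social task network legend (node = participant, edge = tasks in common) describes a weighted co-participation graph. A minimal sketch of how such a graph could be derived, assuming task membership is available as a hypothetical mapping from task title to participant names, and taking the edge weight to be the number of shared tasks, which is one plausible reading of the line thickness in these graphs.

```python
from collections import Counter
from itertools import combinations


def social_task_network(
    task_participants: dict[str, set[str]],
) -> tuple[set[str], Counter]:
    """Build a participant graph from task membership.

    Nodes are participants. Two participants are joined by an edge if they share
    at least one task; the edge weight is the number of tasks they share.
    """
    nodes: set[str] = set().union(*task_participants.values())
    edges: Counter = Counter()
    for people in task_participants.values():
        # sorted() gives each unordered pair a single canonical key
        for a, b in combinations(sorted(people), 2):
            edges[(a, b)] += 1
    return nodes, edges


# Hypothetical task assignments.
tasks = {
    "Define the problem": {"ann", "ben"},
    "Gather data": {"ann", "ben", "cy"},
    "Run the model": {"cy"},
}
nodes, edges = social_task_network(tasks)
```

A participant who works only on solo tasks (here, on "Run the model" alone) still appears as a node, just without extra edges from that task.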
ENIGMA Community
Figure panels: Task Hierarchy (x-axis: Number of Ancestors; y-axis: Number of Tasks) and Social Task Network (node = participant; edge = tasks in common)
Evaluation
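The task-hierarchy panel on each community slide plots the number of tasks against their number of ancestors, that is, how many levels of decomposition sit above each task. A minimal sketch of that count, assuming the hierarchy is stored as a hypothetical task-to-parent mapping that forms a tree; the task titles are invented.

```python
from collections import Counter
from typing import Optional


def ancestor_histogram(parent: dict[str, Optional[str]]) -> dict[int, int]:
    """Count how many tasks have 0, 1, 2, ... ancestors.

    `parent` maps every task to its parent task, or to None for a top-level task.
    Assumes the hierarchy is a tree (no cycles), otherwise the walk never ends.
    """

    def num_ancestors(task: str) -> int:
        count = 0
        while parent[task] is not None:
            task = parent[task]
            count += 1
        return count

    counts = Counter(num_ancestors(task) for task in parent)
    return dict(sorted(counts.items()))


# Hypothetical hierarchy: one root task decomposed over three levels.
hierarchy = {
    "Root task": None,
    "Collect data": "Root task",
    "Select a model": "Root task",
    "Find tracer data": "Collect data",
    "Find stream gauges": "Collect data",
    "Clean gauge records": "Find stream gauges",
}

histogram = ancestor_histogram(hierarchy)
```

Plotting `histogram` keys on the x-axis and values on the y-axis gives the kind of panel shown on these slides.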
GPF Community
Figure panels: Task Hierarchy (x-axis: Number of Ancestors; y-axis: Number of Tasks) and Social Task Network (node = participant; edge = tasks in common)
Evaluation
ODSF Community
Figure panels: Task Hierarchy (x-axis: Number of Ancestors; y-axis: Number of Tasks) and Social Task Network (node = participant; edge = tasks in common)
Evaluation
ODST Community
Figure panels: Task Hierarchy (x-axis: Number of Ancestors; y-axis: Number of Tasks) and Social Task Network (node = participant; edge = tasks in common)
Evaluation
Completing the ODS Training requires no collaboration, so only two users are shown.
Task metadata analysis
Evaluation
Conclusions
The Organic Data Science Framework supports collaborations that are distributed research activities with unanticipated participants joining over time, at the meta-workflow design layer, where scientists work together to agree on a problem to solve and a strategy to solve it.
The framework is based on social design principles.
Preliminary data are available on its use in different communities.
Future work: Evaluation to assess how the framework supports scientific collaboration and whether it increases productivity and community growth.
Thank You
https://github.com/IKCAP/organicdatascience
Organic Data Science: http://www.organicdatascience.org/
Development
Acknowledgments: We gratefully acknowledge funding from the US National Science Foundation under grant IIS-1344272.