Tales of the Field: Building Small Science Cyberinfrastructure
Andrea Wiggins
iSchool @ Syracuse University
31 October, 2009


DESCRIPTION

Presentation for a cyberinfrastructure methods panel at the Society for Social Studies of Science (4S), on experiences building small-science cyberinfrastructure and reflections on the implications for other pre-paradigmatic domains.



Free/Libre Open Source Software

• FLOSS development
– Large-scale social phenomenon of “collaborative” software development

• Observing FLOSS research
– Reflexive examination of the small scholarly community studying FLOSS development
– Specifically, work on building cyberinfrastructure (CI) for FLOSS research

http://www.flickr.com/photos/pmtorrone/304696349/


eScience Proof of Concept

• Some FLOSS research is a good candidate for eScience approaches
– Lots of data, due to the scale of the phenomenon
– Research community ethos of sharing

• Data repositories
• Research paper archive
• Analysis artifacts


FLOSS Research Community

• Little Science
– Interdisciplinary: primarily software engineering, but also social sciences across a wide spectrum
– Fairly small community: under 500 researchers worldwide

http://www.flickr.com/photos/circulating/997909242/


FLOSS Data

• Many types of data; the focus here is on digital “trace” data
– Archival, secondary
– By-product of FLOSS work: easy to get but hard to use

• Federated repositories of repositories (RoRs)
– Research data drawn from project hosting “forges”
– ~1 TB across 3 RoRs

http://www.flickr.com/photos/smiteme/2379630899/


Research Methods & Tools

• Methods used with RoR data vary, but are generally quantitative
– Correlational studies
– Longitudinal analysis
– Code metrics

• Two main approaches
– Bespoke scripts or tools
– eScience workflow tools


Barriers to Uptake

• Little Science
– Lack of agreement over epistemology, research questions (RQs), methods, tools
– Researcher isolation, few incentives to collaborate

• Bimodal distribution of skills
– “I can’t possibly do that! I can’t write code!”
– “Why bother? I just write my own Python script; you should too.”

http://www.flickr.com/photos/noner/1739876378/


Technology Skills Required

• Taverna
• SVN
• (more) SSH, Unix terminal, XML
• R, plus packages
• SQL, relational DB management
• Java & Eclipse (just enough)
• OWL, RDF, SPARQL
• Knowledge of opaque data sources


Implications for Small Sciences

• Critical mass
– Need stewardship, dedicated resources

• Skills gap
– eScience tools require a fairly high level of technical competency

• Convergence of research
– Common questions, modes of research

• Motivations to contribute
– Academic credit

http://www.flickr.com/photos/askpang/327577395/


Potential Solutions

• $$$
– Maintaining and developing resources is not free, even if they are freely shared

• Curricular integration
– Broaden the contributor base by drawing in students through coursework

• Deliberately cultivate a community
– Train PhD students early in their studies

• Mechanisms to incentivize contribution


Conclusions

• Without external imperatives, CI for little science seems unlikely to emerge on its own

• CI requires standardization and a movement toward “normal science,” which may be premature or simply inappropriate for many social sciences

• Benefits for early adopters: tools support efficient collaboration, enable rigorous research provenance, permit replication of analyses, and shorten time to results