
OntoSoft: A Distributed Semantic Registry for Scientific Software


Page 1

Yolanda Gil, USC Information Sciences Institute, gil@isi.edu

OntoSoft: A Distributed Semantic Registry for Scientific Software

Yolanda Gil, Daniel Garijo, Saurabh Mishra, Varun Ratnakar

Information Sciences Institute and Department of Computer Science

University of Southern California

@yolandagil, @dgarijov

{gil,dgarijo,saurabhm,varunr}@isi.edu

http://www.ontosoft.org

Building Block

Yolanda Gil, Daniel Garijo, Saurabh Mishra, Varun Ratnakar eScience 2016

Page 2

We have all been here…


Page 3

The Value of Software: Reproducibility

Financial

Human lives

Reliability

Scientific integrity


Trust

Source: “Retracted Scientific Studies: A Growing List,” NYTimes.com, May 28, 2015


http://nyti.ms/1HPVX1t


The retraction by Science of a study of changing attitudes about gay marriage is the latest prominent withdrawal of research results from scientific literature. And it very likely won't be the last. A 2011 study in Nature found a 10-fold increase in retraction notices during the preceding decade.

Many retractions barely register outside of the scientific field. But in some instances, the studies that were clawed back made major waves in societal discussions of the issues they dealt with. This list recounts some prominent retractions that have occurred since 1980.

Vaccines and Autism

In 1998, The Lancet, a British medical journal, published a study by Dr. Andrew Wakefield that suggested that autism in children was caused by the combined vaccine for measles, mumps and rubella. In 2010, The Lancet retracted the study following a review of Dr. Wakefield's scientific methods and financial conflicts.

Despite challenges to the study, Dr. Wakefield's research had a strong effect on many parents. Vaccination rates tumbled in Britain, and measles cases grew. American antivaccine groups also seized on the research. The United States had more cases of measles in the first month of 2015 than the number that is typically diagnosed in a full year.

Stem Cell Production

Papers published by Japanese researchers in Nature in 2014 claimed to provide an easy method to create multipurpose stem cells, with eventual implications for the treatment of diseases and injuries. Months later, the authors, including Haruko Obokata, issued a retraction. An investigation by one of Japan's most prestigious scientific institutes, where much of the research occurred, found that the author had manipulated some of the images published in the study. Approximately one month after the retraction, one of Ms. Obokata's co-authors, Yoshiki Sasai, was found hanging in a stairwell of his office. He had taken his own life.


Page 4

Quantifying the Value of Software through “Reproducibility Maps” [Bourne & Gil et al 12]

2 months of effort in reproducing a published method (in PLoS’10)

Authors’ expertise was required

Comparison of ligand binding sites

Comparison of dissimilar protein structures

Graph network generation

Molecular docking

Work with P. Bourne of UCSD

Page 5

Software Today

There are repositories of domain-specific software (e.g., geosciences)

There are general software repositories with no standard metadata

Most scientists are not aware of the value of their software


Page 6

“Dark Software”

Models that are not published

• E.g., from a PhD thesis

Data preparation software

• Data pre-processing and QC can take up to 80% of a project’s effort

Visualization software

“Dark Software” is the counterpart of “Dark Data” [Heidorn 2008]


Page 7

Why Is Software Not Shared?

“No one would use my code if I shared it”

“My code is really bad”

“My code is not ready to be shared”

“Sharing my software will take a lot of time”

“I won’t get anything out of sharing my software”

“I’ve shared software before, bad things happened”

“I work for the government”

“I want to commercialize my software”

“I don’t want anyone to sell my software”

“I don’t know where to start!”

Page 8

Contributions: OntoSoft

Registry for software

• Complements code repositories

• Scientist-centered software metadata

• Community curated software metadata

• Training scientists on best practices


Page 9

OntoSoft Architecture

[Figure: OntoSoft architecture diagram (dated 5/31/2016). A legend distinguishes OntoSoft components from external components. Labeled elements: OntoSoft Software Metadata Repository; Ontologies (Geosciences, Domain Ontologies); OntoSoft software metadata import, publish, and query; OntoSoft User Interface (Publish, Browse/Search); Solr Search Index; Recommend; Domain-Specific UI; Standard Names; OntoSoft Training (Lessons, Videos); VM Environment Generator (Docker, Vagrant); External Repository, Push (GitHub, Apache SVN, CSDMS); External Repository, Pull; Adapters (e.g., BMI); CSDMS, CF, ESMF, …; NOAA; Other OntoSoft Installations; PROV; Web Access Control; Metadata Access Control.]
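The diagram places a Solr search index behind the Browse/Search interface. To illustrate what such an index provides, here is a minimal in-memory inverted index over metadata entries. It is a stand-in for illustration only, not Solr and not OntoSoft's implementation; the entry identifiers, property names, and the tokenization rule are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class MetadataIndex:
    """Minimal inverted index mapping token -> set of entry ids (not Solr)."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, entry_id: str, metadata: dict[str, str]) -> None:
        # Index every property value of a (hypothetical) metadata entry.
        for value in metadata.values():
            for token in tokenize(value):
                self._postings[token].add(entry_id)

    def search(self, query: str) -> set[str]:
        """Return ids of entries that contain every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        # .get() avoids inserting empty postings into the defaultdict.
        results = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            results &= self._postings.get(token, set())
        return results


if __name__ == "__main__":
    index = MetadataIndex()
    # Hypothetical entries, for illustration only.
    index.add("model-a", {"name": "ModelA", "shortDescription": "Hydrologic model for watersheds"})
    index.add("model-b", {"name": "ModelB", "shortDescription": "Sediment transport model"})
    print(sorted(index.search("model")))       # ['model-a', 'model-b']
    print(sorted(index.search("watersheds")))  # ['model-a']
```

A production search service adds ranking, stemming, and faceting on top of this basic token-to-document mapping.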

Page 10

The OntoSoft Ontology for Describing Scientific Software Metadata [Gil et al 2015]

An ontology for scientific software metadata

• Intended to describe scientific software

• Designed with scientists in mind to guide them to deposit and describe their software in a software registry

Major categories of metadata: what does a scientist need to do?

1. identify software,

2. understand what it does and its utility for research,

3. execute the software,

4. get support if questions arise,

5. do research with it, and

6. contribute to its development.
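The six categories can be read as a checklist against which an entry's completeness is measured, which is one way to compute per-category completeness indicators. The sketch assumes a hypothetical mapping from property names to categories; the actual OntoSoft ontology terms are documented at http://www.ontosoft.org/software and may differ.

```python
from enum import Enum


class Category(Enum):
    """The six major metadata categories listed on the slide."""
    IDENTIFY = "identify software"
    UNDERSTAND = "understand what it does and its utility for research"
    EXECUTE = "execute the software"
    SUPPORT = "get support if questions arise"
    RESEARCH = "do research with it"
    CONTRIBUTE = "contribute to its development"


# HYPOTHETICAL property names, for illustration only; not the OntoSoft ontology terms.
PROPERTY_CATEGORY: dict[str, Category] = {
    "name": Category.IDENTIFY,
    "version": Category.IDENTIFY,
    "shortDescription": Category.UNDERSTAND,
    "installationInstructions": Category.EXECUTE,
    "operatingSystem": Category.EXECUTE,
    "supportContact": Category.SUPPORT,
    "exampleUsage": Category.RESEARCH,
    "codeRepository": Category.CONTRIBUTE,
}


def completeness(entry: dict[str, str]) -> dict[Category, float]:
    """Fraction of each category's mapped properties that have a non-blank value."""
    totals = {c: 0 for c in Category}
    filled = {c: 0 for c in Category}
    for prop, category in PROPERTY_CATEGORY.items():
        totals[category] += 1
        if entry.get(prop, "").strip():
            filled[category] += 1
    # Skip any category that has no mapped properties to avoid dividing by zero.
    return {c: filled[c] / totals[c] for c in Category if totals[c] > 0}


if __name__ == "__main__":
    example = {"name": "ExampleModel", "operatingSystem": "Linux"}  # hypothetical entry
    for category, score in completeness(example).items():
        print(f"{category.value}: {score:.0%}")
```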


Page 11

OntoSoft Metadata Categories

http://www.ontosoft.org/software


Page 12

Describing Scientific Software in OntoSoft

http://www.ontosoft.org/portal

Metadata can be exported in several formats (HTML, RDF, JSON)
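To make the export formats concrete, this sketch serializes one metadata entry to JSON, to RDF in N-Triples syntax, and to an HTML fragment using only the Python standard library. The namespace `http://example.org/ontosoft-sketch#` and the property names are hypothetical placeholders, not the published OntoSoft vocabulary, and the portal's actual export layout may differ.

```python
import html
import json

# HYPOTHETICAL namespace for this sketch; not the OntoSoft vocabulary.
NS = "http://example.org/ontosoft-sketch#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_json(entry_id: str, entry: dict[str, str]) -> str:
    """JSON object with a stable key order (assumes entry has no 'id' key)."""
    return json.dumps({"id": entry_id, **entry}, indent=2, sort_keys=True)


def _nt_literal(value: str) -> str:
    """Quote a plain string literal using N-Triples escape sequences."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(entry_id: str, entry: dict[str, str]) -> str:
    """RDF in N-Triples syntax, one triple per line (entry_id assumed IRI-safe)."""
    subject = f"<{NS}{entry_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{NS}Software> ."]
    for prop, value in sorted(entry.items()):
        lines.append(f"{subject} <{NS}{prop}> {_nt_literal(value)} .")
    return "\n".join(lines) + "\n"


def to_html(entry_id: str, entry: dict[str, str]) -> str:
    """HTML definition list with all text escaped."""
    items = "".join(
        f"<dt>{html.escape(prop)}</dt><dd>{html.escape(value)}</dd>"
        for prop, value in sorted(entry.items())
    )
    return f'<dl id="{html.escape(entry_id)}">{items}</dl>'


if __name__ == "__main__":
    entry = {"name": "ExampleModel", "shortDescription": "A hypothetical model"}
    print(to_json("example-model", entry))
    print(to_ntriples("example-model", entry))
    print(to_html("example-model", entry))
```

N-Triples was chosen because it can be written correctly with string formatting alone; a full RDF library would be needed for Turtle or RDF/XML output.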

Metadata for 3DDY Software

Metadata properties collected through simple questions

Set permissions for 3DDY

Metadata properties organized into categories that make sense to scientists

Automatic import of metadata from other repositories

Indicators of metadata completeness
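Automatic import can be pictured as a field mapping from another repository's metadata to registry properties. This sketch assumes the JSON shape returned by GitHub's REST API for a repository (fields such as `name`, `description`, `html_url`, `language`, `homepage`, and `license.spdx_id`); the target property names are hypothetical, and this is not OntoSoft's importer.

```python
import json


def import_github_repo(payload: str) -> dict[str, str]:
    """Map a GitHub repository JSON object to HYPOTHETICAL sketch properties."""
    repo = json.loads(payload)
    # GitHub field -> hypothetical registry property name.
    field_map = {
        "name": "name",
        "description": "shortDescription",
        "html_url": "codeRepository",
        "language": "programmingLanguage",
        "homepage": "website",
    }
    entry: dict[str, str] = {}
    for gh_field, prop in field_map.items():
        value = repo.get(gh_field)
        # Unset fields arrive as null or an empty string; skip both.
        if isinstance(value, str) and value.strip():
            entry[prop] = value.strip()
    # "license" is null when GitHub detects no license.
    license_info = repo.get("license") or {}
    spdx = license_info.get("spdx_id")
    # "NOASSERTION" means GitHub could not identify the license.
    if spdx and spdx != "NOASSERTION":
        entry["license"] = spdx
    return entry


if __name__ == "__main__":
    sample = json.dumps({  # hypothetical repository
        "name": "example-model",
        "description": "A hypothetical model",
        "html_url": "https://github.com/example-org/example-model",
        "language": "Python",
        "homepage": None,
        "license": {"spdx_id": "MIT"},
    })
    print(import_github_repo(sample))
```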

Page 13

Access control

http://www.ontosoft.org/portal


Users and permissions for the 3DDY software component

Setting permissions for editing 3DDY metadata

W3C Web Access Control ontology
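The W3C Web Access Control (WAC) vocabulary expresses permissions as `acl:Authorization` resources that link an agent (`acl:agent`), a resource (`acl:accessTo`), and access modes (`acl:mode` with `acl:Read`, `acl:Write`, `acl:Append`, `acl:Control`). This sketch emits one such authorization as N-Triples and performs a simplified permission check. The IRIs are hypothetical, and how OntoSoft stores or enforces these permissions is not shown.

```python
from dataclasses import dataclass

ACL = "http://www.w3.org/ns/auth/acl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ACCESS_MODES = {"Read", "Write", "Append", "Control"}  # WAC access modes


@dataclass(frozen=True)
class Authorization:
    """One acl:Authorization: an agent may use the given modes on a resource."""
    iri: str
    agent: str
    resource: str
    modes: frozenset[str]

    def __post_init__(self) -> None:
        unknown = self.modes - ACCESS_MODES
        if unknown:
            raise ValueError(f"unknown access modes: {sorted(unknown)}")

    def to_ntriples(self) -> str:
        """Serialize using rdf:type, acl:agent, acl:accessTo, and acl:mode."""
        s = f"<{self.iri}>"
        lines = [
            f"{s} <{RDF_TYPE}> <{ACL}Authorization> .",
            f"{s} <{ACL}agent> <{self.agent}> .",
            f"{s} <{ACL}accessTo> <{self.resource}> .",
        ]
        lines += [f"{s} <{ACL}mode> <{ACL}{m}> ." for m in sorted(self.modes)]
        return "\n".join(lines) + "\n"


def is_allowed(auths: list[Authorization], agent: str, resource: str, mode: str) -> bool:
    """Simplified check: exact agent, resource, and mode match only.

    WAC also supports group agents (acl:agentGroup) and inherited
    defaults (acl:default); neither is modeled here.
    """
    return any(a.agent == agent and a.resource == resource and mode in a.modes
               for a in auths)


if __name__ == "__main__":
    auth = Authorization(  # hypothetical IRIs
        "https://example.org/acl#editors",
        "https://example.org/users/alice",
        "https://example.org/software/3DDY",
        frozenset({"Read", "Write"}),
    )
    print(auth.to_ntriples())
    print(is_allowed([auth], auth.agent, auth.resource, "Write"))
```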

Page 14

Software entries from distributed repositories are readily accessible

Semantic search

Comparison matrix of software entries

PIHM, PIHMgis, DrEICH, TauDEM, WBMsed

Metadata completion highlighted

Software is contrasted by property
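A comparison matrix like this one can be built by treating properties as rows and software entries as columns, marking gaps so that incomplete metadata stands out. The sketch uses hypothetical entries (ModelA, ModelB) and property names rather than the actual metadata for PIHM, TauDEM, or the other listed software.

```python
MISSING = "(missing)"


def comparison_matrix(entries: dict[str, dict[str, str]],
                      properties: list[str]) -> list[list[str]]:
    """Rows are properties, columns are software entries; gaps are marked."""
    names = list(entries)
    rows = [["property", *names]]
    for prop in properties:
        rows.append([prop, *(entries[n].get(prop) or MISSING for n in names)])
    return rows


def completion(entries: dict[str, dict[str, str]],
               properties: list[str]) -> dict[str, float]:
    """Fraction of the compared properties that each entry fills in."""
    if not properties:
        raise ValueError("need at least one property to compare")
    return {
        name: sum(1 for p in properties if meta.get(p)) / len(properties)
        for name, meta in entries.items()
    }


def render(rows: list[list[str]]) -> str:
    """Left-aligned plain-text table with two spaces between columns."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


if __name__ == "__main__":
    entries = {  # hypothetical metadata
        "ModelA": {"license": "MIT", "language": "C"},
        "ModelB": {"language": "Python"},
    }
    props = ["language", "license"]
    print(render(comparison_matrix(entries, props)))
    print(completion(entries, props))
```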

Page 15

[Figure: OntoSoft engagement diagram organized around Community, Learning, and Publication. Labeled capabilities: recommender system, interoperability, structured metadata, interactive advice, best practices, multimedia lessons. Labeled partners and communities: UK Software Institute, Software Carpentry, CIG, ESMF, Critical Zone Observatory, Early Career Advisory Board, FES/ESIP, CSDMS, EarthCube Building Blocks, EarthCube RCNs (collaborating with SEN, C4P, EC3), Omics, Code meta initiative.]

Page 16

Conclusions

Software is a valuable research product

• Must embed best practices of software sharing into research activities

Improve productivity, quality, reproducibility

OntoSoft contributions

• Ontology of scientific software metadata

• Portal for software registry

Do you want to use OntoSoft? Let us know!

http://www.ontosoft.org

http://www.ontosoft.org/software

http://www.ontosoft.org/portal


Page 17

More Information

http://www.ontosoft.org

http://www.ontosoft.org/software

http://www.ontosoft.org/portal

http://www.ontosoft.org/gpf

OntoSoft: Capturing Scientific Software Metadata. Yolanda Gil, Varun Ratnakar, and Daniel Garijo. Proceedings of the Eighth ACM International Conference on Knowledge Capture (K-CAP), 2015.

OntoSoft: A Distributed Semantic Registry for Scientific Software. Yolanda Gil, Daniel Garijo, Saurabh Mishra, and Varun Ratnakar. Under review, 2016.

DRAT: An Unobtrusive, Scalable Approach to Large Scale Software License Analysis. Chris A. Mattmann, Ji-Hyun Oh, Tyler Palsulich, Lewis John McGibbney, Yolanda Gil, and Varun Ratnakar. Proceedings of the Fourth International Workshop on Software Mining, held in conjunction with the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015.

Cyber-Innovated Watershed Research at the Shale Hills Critical Zone Observatory. Xuan Yu, Chris Duffy, Yolanda Gil, Lorne Leonard, Gopal Bhatt, and Evan Thomas. IEEE Systems Journal, to appear.

Collaborative Software Development Needs in Geosciences. Yolanda Gil, Eunyoung Moon, and James Howison. Proceedings of the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2), held in conjunction with the IEEE/ACM International Conference on High Performance Computing (SC), New Orleans, LA, November 2014.

Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users. Daniel Garijo, Oscar Corcho, Yolanda Gil, Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson, and Arthur W. Toga. Proceedings of the IEEE Conference on e-Science, 2014.

FragFlow: Automated Fragment Detection in Scientific Workflows. Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. Proceedings of the IEEE Conference on e-Science, Guarujá, Brazil, October 2014.

An Overview of Mobile Applications for Field Science. Anna Zeng, Kevin Zeng, Yolanda Gil, and Matty Mookerjee. GeoSoft Project Report, September 2014.

The CSDMS Standard Names: Cross-Domain Naming Conventions for Describing Process Models, Data Sets and Their Associated Variables. Scott D. Peckham. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.

Web Applications that Share Level-12 HUC Data and Models of the CONUS. Lorne Leonard and Chris Duffy. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.

Intelligent Workflow Systems and Provenance-Aware Software. Yolanda Gil. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.


Page 18

Acknowledgements

The OntoSoft project team includes Chris Duffy (PSU), Chris Mattmann (JPL), Scott Peckham (CU), Ji-Hyun Oh (USC), Varun Ratnakar (USC), and Erin Robinson (ESIP)

Thank you to James Howison (UT), Lisa Kempler (MathWorks), and Greg Wilson (Software Carpentry) for their feedback on best practices for software sharing

Thank you to the scientists and other colleagues who have contributed ideas and asked hard questions about software stewardship

Thank you to the National Science Foundation and the EarthCube program for supporting this work

EarthCube, ICER-1440323, ICER-1343800
