The Era of Open

Presented at the WikiSym and OpenSym joint conference in Hong Kong on August 7, 2013.

The Era of Open

Philip E. Bourne

University of California San Diego

pbourne@ucsd.edu

WikiSym+OpenSym Aug 7, 2013 1

The Era of Open Has The Potential to Deinstitutionalize

Daniel Hulshizer/Associated Press

An Example of That Potential: The Story of Meredith

http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne

The Era of Open Has The Potential to Deinstitutionalize

Deinstitutionalization Vs Conservatism

It Starts with the Metrics of Success

[Adapted from Carole Goble]

Committee on Academic Promotions

• What Counts
– Money
– Grants
– Papers
– Teaching
– Service

• What Does Not
– Sharing data
– Sharing software
– Open access
– Collaboration
– Patents
– Startups

Getting Ahead as a Computational Biologist in Academia PLOS Comp Biol

The Era of Open Has The Potential to Deinstitutionalize

Interim Solution: Use the Traditional Reward System

The Wikipedia Experiment – Topic Pages

Identify areas of Wikipedia that relate to the journal and are missing or are stubs

Develop a Wikipedia page in the sandbox

Have a Topic Page Editor Review the page

Publish the copy of record with associated rewards

Release the living version into Wikipedia

MOOCs Are Another Form of Disruption

In Short, Most Academic Institutions Have Yet to Embrace the Open Digital Enterprise They Surely Will Become

• Anyone, anything, anytime

• Publication access, data, models, source code, resources, transparent methods, standards, formats, identifiers, APIs, licenses, education, policies

• “accessible, intelligible, assessable, reusable”

http://royalsociety.org/policy/projects/science-public-enterprise/report/

[Carole Goble]

Business Models Rule

• The Internet demanded new business models to support scholarly communication

• Open access was one such sustainable model:
– Began with the community
– Was driven by new organizations (PLOS, BMC, F1000, eLife, Dryad, Mendeley, etc.)
– Was NOT driven by academic institutions
– Was driven by policies and funders

One Metric of Change: Multidisciplinary Open Access Mega Journal

• This year PLOS ONE will publish over 30,000 papers!

This Disruption Got Us Thinking About…

• A paper as only one form of knowledge discovery

• The use of interaction and rich media from which to learn and actually do science

• Reproducibility
• Reward structures
• Better management of the research lifecycle

P.E. Bourne 2005 In the Future will a Biological Database Really be Different from a Biological Journal? PLOS Comp. Biol. 1(3) e34

Better Management of the Research Lifecycle is Not a New Concept

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995

datasets, data collections, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware

Morin et al. Shining Light into Black Boxes. Science 13 April 2012: 336(6078) 159-160

Ince et al. The case for open computer programs. Nature 482, 2012

[Carole Goble]

The Research Lifecycle

IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION

[Figure: tools and resources along the research lifecycle]
• Authoring Tools
• Lab Notebooks
• Data Capture
• Software Repositories
• Analysis Tools
• Visualization
• Scholarly Communication
• Commercial & Public Tools
• Git-like Resources By Discipline
• Data Journals
• Discipline-Based Metadata Standards
• Community Portals
• Institutional Repositories
• New Reward Systems
• Commercial Repositories
• Training

• Automate: workflows, pipeline & service integrative frameworks
• Pool, share & collaborate: web systems
• Nanopub
• Semantics & ontologies
• Machine-readable documentation
• Scientific software engineering
• CSSE

[Carole Goble]

Why is This Important to Me Personally?

• My wife is being treated for stage 1 breast cancer

• This highlights for me the disparity between what is happening in the lab and what is happening in the clinic
– In the lab cancer is a personalized and treatable condition
– In the clinic we are still equally “poisoning” patients with drugs first introduced 10-20 years ago

http://sagecongress.org/Presentations/Sommer.pdf

[Josh Sommer]

Most Laboratories

• We are the long tail
• Goodbye to the student is goodbye to the data
• Very few of us have complied (or will comply) with the data management plans we write into grants

• Too much software is unusable

S. Veretnik, J.L. Fink, and P.E. Bourne 2008 Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136

Today’s Research Lifecycle is Digitally Fragmented at Best

• Proof:
– I can’t immediately reproduce the research in my own laboratory
  • It took an estimated 280 hours for an average user to approximately reproduce the paper
– Workflows are maturing and becoming helpful
– Data and software versions and accessibility prevent exact reproducibility

Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE under review.

At the Same Time The Disruption Continues

G8 open data charter

http://opensource.com/government/13/7/open-data-charter-g8

And the Disruption Continues

• In the US alone...
– March 2012: OSTP commits $200M to Big Data
– OSTP demands sharing plans by August 2013
– GBMF/Sloan provide institutional awards for data science
– NCBI considers data catalog and MyBibliography

Where Will It End?

First We Should Ask What It Is We Wish to Accomplish

Here is What I Want – The Paper As Experiment

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers

PLoS Comp. Biol. 2005 1(3) e34

Here is What I Want – Knowledge Push

• Each evening the lab’s “Evernote” notebooks are scanned for commonalities from the day’s activities. These are seeds in a deep search of the web’s research lifecycles that have become available since the last search. Results are ranked and presented for consideration over coffee the next morning.

http://www.discoveryinformaticsinitiative.org/diw2012
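The Knowledge Push idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any existing system: the notebook entries are plain strings, "commonalities" are approximated as words that appear in at least two separate notes, and the names `Resource`, `seed_terms`, and `knowledge_push` are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Assumed, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "with", "on", "is", "was"}


@dataclass
class Resource:
    """Hypothetical record for something newly published on the web."""
    title: str
    text: str
    published: date


def terms(text):
    """Lowercase word tokens, minus stopwords and tokens shorter than 3 characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS}


def seed_terms(notes, min_notes=2):
    """Commonalities: terms that occur in at least `min_notes` separate notes."""
    counts = Counter(t for note in notes for t in terms(note))
    return {t for t, n in counts.items() if n >= min_notes}


def knowledge_push(notes, resources, last_search, top_n=3):
    """Rank resources published after `last_search` by overlap with the seed terms."""
    seeds = seed_terms(notes)
    scored = []
    for r in resources:
        if r.published <= last_search:
            continue  # already covered by an earlier search
        score = len(seeds & terms(r.title + " " + r.text))
        if score > 0:
            scored.append((score, r.title))
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


notes = [
    "Docked ligand against kinase binding site",
    "Kinase binding site comparison finished",
    "Ordered pipette tips",
]
resources = [
    Resource("Kinase binding site atlas", "new dataset of binding pockets", date(2013, 8, 6)),
    Resource("Kinase inhibitor workflow", "pipeline release", date(2013, 8, 6)),
    Resource("Old kinase binding paper", "binding site survey", date(2013, 7, 1)),
    Resource("Pipette review", "tips", date(2013, 8, 6)),
]
ranked = knowledge_push(notes, resources, last_search=date(2013, 8, 5))
print(ranked)
```

A real version would need far better matching than word overlap (ontologies, entity recognition), but the scan, seed, search, and rank stages stay the same.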

Will End With …

• Infrastructure:
– Science, Nature, Cell and megajournals all “open access”
– An array of coupled institutional repositories
– A central repository – PubMed Central
– Open software in full support of the research lifecycle
– The research lifecycle in the cloud

Will End With …

• Sociologically:
– An end to “build it and they will come”
– Alternative metrics accepted by the community
– Alternative reward systems that recognize the realities of today’s scholarship, namely:
  • Open data availability
  • Software availability
  • Collaborative research

We Have a Way to Go – Consider the Life Sciences

• Good News
– We have NCBI/EBI
– Publishers are starting to embrace data
– Workflows in support of the research lifecycle are catching on

• Bad News
– Sustainability remains a noun not a verb
– Data are organized by type not by questions asked (silos)
– Tenure committees are still in the dark ages

What Can We Do As a Community?

Build Trust

Data

Trust in the data and the derived knowledge

What I Have Learned About Trust 1/2

• Trust is like compound interest

• Comes from listening

• Comes from engaging the community in every aspect of the process

• Comes from data consistency and level of annotation

• Comes from responsiveness

• Comes from the quality of the delivery service

What I Have Learned About Trust 2/2

• Quality begets trust
– Quality requires data models/ontologies

• Quality requires people
– Annotators are the unsung heroes

• Trust requires provenance & versioning

• Trust requires explaining that all data and knowledge are not created equal

Beyond Building Trust What Else Can We Do?

Think Globally Act Locally

• Support emergent community commons/portals
• Be involved in the support and development of metadata standards
• Contribute to workflow development etc. to drive an open research lifecycle
• Educate your mentors on the importance of open science and scholarly communication
• Write software thinking of an App model

Understand That All Data/Knowledge Are NOT Created Equal

• We need to understand how data are used
• Sustainability is not more money from the funding agencies; it’s about business models
• Reductionism is not a dirty word
• We need to do more with the long tail

On the Future of Genomic Data. Science 11 February 2011: vol. 331 no. 6018 728-729

Recognize That Institutions Must Play a Greater Role

• We need institutional data/knowledge sharing plans

• We need data/information scientists to be better recognized by institutions – it’s not all about papers – this implies new metrics

Learn from the App Store

• The App model
– Think of it operating on a content base rather than a mobile device
– Simple and consistent user interface
– Needs to pass some quality control
– Has a reward

• The App+ Model
– Apps interoperate through a generic workflow interface
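One way to picture the App and App+ models is the small Python sketch below. Everything in it is an assumption made for illustration: a "content base" record is modeled as a dictionary, every app shares one call signature, quality control is reduced to a single check on a sample input, and the reward is approximated by a usage count. The names `AppStore`, `register`, and `run_workflow` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Content = Dict[str, object]          # assumed stand-in for a content-base record
App = Callable[[Content], Content]   # consistent interface: content in, content out


class AppStore:
    def __init__(self) -> None:
        self._apps: Dict[str, App] = {}
        self.usage: Counter = Counter()  # assumed "reward": credit per use

    def register(self, name: str, app: App, sample: Content) -> None:
        """Assumed quality control: the app must return a dict for a sample input."""
        result = app(dict(sample))
        if not isinstance(result, dict):
            raise ValueError(f"{name} failed quality control")
        self._apps[name] = app

    def run_workflow(self, names: List[str], content: Content) -> Content:
        """Generic workflow interface: chain apps, feeding each output to the next."""
        for name in names:
            content = self._apps[name](dict(content))
            self.usage[name] += 1
        return content


def count_residues(content: Content) -> Content:
    content["length"] = len(content["sequence"])
    return content


def flag_short(content: Content) -> Content:
    content["short"] = content["length"] < 10
    return content


store = AppStore()
store.register("count", count_residues, {"sequence": "MKT"})
store.register("flag", flag_short, {"sequence": "MKT", "length": 3})
result = store.run_workflow(["count", "flag"], {"sequence": "MKTAYIAK"})
print(result, dict(store.usage))
```

Because every app takes and returns the same kind of record, any registered app can follow any other in a workflow, which is the interoperability the App+ model asks for.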

In Summary

• Open science is a means to accelerate the rate of discovery

• Disruption has begun, but there is great inertia in the system

• All of us are stakeholders and capable of invoking further positive change

• We need to get institutions and more scientists involved….

Acknowledgements

www.force11.org

pbourne@ucsd.edu

• Force11 Manifesto
• Fourth Paradigm: Data Intensive Scientific Discovery http://research.microsoft.com/enus/collaboration/fourthparadigm/
