The Era of Open Philip E. Bourne University of California San Diego [email protected] WikiSym+OpenSym Aug 7, 2013 1

The Era of Open

Presented at the WikiSym and OpenSym joint conference in Hong Kong on August 7, 2013.

Page 1: The Era of Open

The Era of Open

Philip E. Bourne

University of California San Diego

[email protected]

WikiSym+OpenSym Aug 7, 2013 1

Page 2: The Era of Open

The Era of Open Has The Potential to Deinstitutionalize

Daniel Hulshizer/Associated Press

Page 4: The Era of Open

An Example of That Potential: The Story of Meredith

http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne

Page 5: The Era of Open

The Era of Open Has The Potential to Deinstitutionalize

Daniel Hulshizer/Associated Press

Page 6: The Era of Open

Deinstitutionalization Vs Conservatism

Daniel Hulshizer/Associated Press

Page 7: The Era of Open

It Starts with the Metrics of Success

[Adapted from Carole Goble]

Page 8: The Era of Open

Committee on Academic Promotions

• What Counts
  – Money
  – Grants
  – Papers
  – Teaching
  – Service

• What Does Not
  – Sharing data
  – Sharing software
  – Open access
  – Collaboration
  – Patents
  – Startups

Getting Ahead as a Computational Biologist in Academia. PLOS Comp. Biol.

Page 9: The Era of Open

The Era of Open Has The Potential to Deinstitutionalize

Daniel Hulshizer/Associated Press

Page 10: The Era of Open

Interim Solution: Use the Traditional Reward System
The Wikipedia Experiment – Topic Pages

Identify areas of Wikipedia that relate to the journal and are missing or are stubs

Develop a Wikipedia page in the sandbox

Have a Topic Page Editor Review the page

Publish the copy of record with associated rewards

Release the living version into Wikipedia

Page 11: The Era of Open

MOOCs Are Another Form of Disruption

Page 12: The Era of Open

In Short, Most Academic Institutions Have Yet to Embrace the Open Digital Enterprise They Surely Will Become

Page 13: The Era of Open

• Anyone, anything, anytime

• Publication access, data, models, source code, resources, transparent methods, standards, formats, identifiers, APIs, licenses, education, policies

• “accessible, intelligible, assessable, reusable”

http://royalsociety.org/policy/projects/science-public-enterprise/report/

[Carole Goble]

Page 14: The Era of Open

Business Models Rule

• The Internet demanded new business models to support scholarly communication

• Open access was one such sustainable model:
  – Began with the community
  – Was driven by new organizations (PLOS, BMC, F1000, eLife, Dryad, Mendeley etc.)
  – Was NOT driven by academic institutions
  – Was driven by policies and funders

Page 15: The Era of Open

One Metric of Change: Multidisciplinary Open Access Mega Journal

• This year PLOS ONE will publish over 30,000 papers!

Page 16: The Era of Open

This Disruption Got Us Thinking About…

• A paper as only one form of knowledge discovery

• The use of interaction and rich media from which to learn and actually do science

• Reproducibility
• Reward structures
• Better management of the research lifecycle

P.E. Bourne 2005 In the Future will a Biological Database Really be Different from a Biological Journal? PLOS Comp. Biol. 1(3) e34

Page 18: The Era of Open

Better Management of the Research Lifecycle is Not a New Concept

Page 19: The Era of Open

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995

datasets, data collections, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware

Morin et al. Shining Light into Black Boxes. Science 13 April 2012: 336(6078) 159-160

Ince et al. The case for open computer programs. Nature 482, 2012

[Carole Goble]

Page 20: The Era of Open

The Research Lifecycle

IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION

Authoring Tools
Lab Notebooks
Data Capture
Software Repositories
Analysis Tools
Visualization
Scholarly Communication
Commercial & Public Tools
Git-like Resources
By Discipline
Data Journals
Discipline-Based Metadata Standards
Community Portals
Institutional Repositories
New Reward Systems
Commercial Repositories
Training

Page 22: The Era of Open

automate: workflows, pipeline & service integrative frameworks

pool, share & collaborate web systems

nanopub

semantics & ontologies

machine readable documentation

scientific software engineering

CSSE

[Carole Goble]

Page 23: The Era of Open

Why is This Important to Me Personally?

• My wife is being treated for stage 1 breast cancer

• This highlights for me the disparity between what is happening in the lab and what is happening in the clinic
  – In the lab cancer is a personalized and treatable condition
  – In the clinic we are still equally “poisoning” patients with drugs first introduced 10-20 years ago

Page 24: The Era of Open

http://sagecongress.org/Presentations/Sommer.pdf

[Josh Sommer]

Page 26: The Era of Open

Most Laboratories

• We are the long tail
• Goodbye to the student is goodbye to the data
• Very few of us have complied (or will comply) with the data management plans we write into grants
• Too much software is unusable

S. Veretnik, J.L. Fink, and P.E. Bourne 2008 Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136

Page 27: The Era of Open

Today’s Research Lifecycle is Digitally Fragmented at Best

• Proof:
  – I can’t immediately reproduce the research in my own laboratory
    • It took an estimated 280 hours for an average user to approximately reproduce the paper
  – Workflows are maturing and becoming helpful
  – Data and software versions and accessibility prevent exact reproducibility

Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE under review.
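One way to picture the point about data and software versions is a small provenance manifest written next to each analysis. The sketch below is an illustration only, not anything described in the talk: the file name `targets.csv`, the choice of `numpy` as the package to record, and the helper names are all hypothetical. It records the Python version, the installed version of each named package, and a SHA-256 checksum of each data file, so a later run can at least detect that something has changed.

```python
import hashlib
import json
import platform
import tempfile
from importlib import metadata
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a data file in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_version(name: str) -> str:
    """Installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def build_manifest(data_files, packages):
    """Collect interpreter, package, and data-file fingerprints in one dict."""
    return {
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
        "data": {path.name: sha256_of(path) for path in data_files},
    }


# Hypothetical example data, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "targets.csv"
    data_file.write_bytes(b"protein,ligand\nP1,L1\n")
    manifest = build_manifest([data_file], ["numpy"])
    manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
```

Storing `manifest_json` alongside the results does not make a study reproducible by itself, but it turns "which version did we use?" into a question with a recorded answer.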

Page 28: The Era of Open

At the Same Time The Disruption Continues

Page 29: The Era of Open

G8 open data charter
http://opensource.com/government/13/7/open-data-charter-g8

Page 30: The Era of Open

And the Disruption Continues

• In the US alone...
  – March 2012 OSTP commits $200M to Big Data
  – OSTP demands sharing plans by August 2013
  – GBMF/Sloan provide institutional awards for data science
  – NCBI considers data catalog and MyBibliography

Page 31: The Era of Open

Where Will It End?

First We Should Ask What It Is We Wish to Accomplish

Page 32: The Era of Open

Here is What I Want – The Paper As Experiment

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on thumbnail
2. Metadata and a webservices call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers

PLoS Comp. Biol. 2005 1(3) e34

Page 33: The Era of Open

Here is What I Want – Knowledge Push

• Each evening the lab’s “Evernote” notebooks are scanned for commonalities from the day’s activities. These are seeds for a deep search of the web’s research lifecycles that have become available since the last search. Results are ranked and presented for consideration over coffee the next morning

http://www.discoveryinformaticsinitiative.org/diw2012
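The knowledge-push idea can be sketched in a few lines, purely as a toy. Everything here is an assumption made for illustration: the notebooks and candidate documents are hard-coded strings rather than anything fetched from Evernote or the web, "commonality" is reduced to a term appearing in at least two notebooks, and ranking is a plain count of shared terms.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real system would need much more.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "with", "on", "is", "was"}


def terms(text):
    """Set of lowercase words in a text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def common_seeds(notebooks, min_notebooks=2):
    """Terms that occur in at least `min_notebooks` different notebooks."""
    counts = Counter()
    for text in notebooks.values():
        counts.update(terms(text))
    return {term for term, n in counts.items() if n >= min_notebooks}


def rank(candidates, seeds):
    """Order candidate documents by number of shared seed terms, best first."""
    scored = [(len(terms(text) & seeds), title) for title, text in candidates.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in ordered if score > 0]


# Hypothetical notebook entries from one day in the lab.
notebooks = {
    "alice": "Docked ligand against kinase structure from the PDB",
    "bob": "Kinase expression assay; ligand binding looked weak",
}
# Hypothetical newly available items found by the overnight search.
candidates = {
    "New kinase ligand screen": "A ligand screen for kinase inhibitors",
    "Soil microbiome survey": "Sequencing survey of soil samples",
    "Ligand docking workflow": "A docking workflow with ligand libraries",
}

seeds = common_seeds(notebooks)
morning_list = rank(candidates, seeds)
```

Here `seeds` contains the two terms both notebooks share, and `morning_list` puts the item matching both seeds ahead of the item matching one, dropping the unrelated one.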

Page 34: The Era of Open

Will End With …

• Infrastructure:
  – Science, Nature, Cell and megajournals all “open access”
  – An array of coupled institutional repositories
  – A central repository – PubMed Central
  – Open software in full support of the research lifecycle
  – The research lifecycle in the cloud

Page 35: The Era of Open

Will End With …

• Sociologically:
  – An end to build it and they will come
  – Alternative metrics accepted by the community
  – Alternative reward systems that recognize the realities of today’s scholarship, namely:
    • Open data availability
    • Software availability
    • Collaborative research

Page 36: The Era of Open

We Have a Way to Go
Consider the Life Sciences

• Good News
  – We have NCBI/EBI
  – Publishers are starting to embrace data
  – Workflows in support of the research lifecycle are catching on

• Bad News
  – Sustainability remains a noun not a verb
  – Data are organized by type not by questions asked (silos)
  – Tenure committees are still in the dark ages

Page 37: The Era of Open

What Can We Do As a Community?

Page 38: The Era of Open

Build Trust

Data

Trust in the data and the derived knowledge

Page 39: The Era of Open

What I Have Learned About Trust 1/2

• Trust is like compound interest

• Comes from listening

• Comes from engaging the community in every aspect of the process

• Comes from data consistency and level of annotation

• Comes from responsiveness

• Comes from the quality of the delivery service

Page 40: The Era of Open

What I Have Learned About Trust 2/2

• Quality begets trust
  – Quality requires data models/ontologies

• Quality requires people
  – Annotators are the unsung heroes

• Trust requires provenance & versioning

• Trust requires explaining that all data and knowledge are not created equal

Page 41: The Era of Open

Beyond Building Trust What Else Can We Do?

Page 42: The Era of Open

Think Globally Act Locally

• Support emergent community commons/portals
• Be involved in the support and development of metadata standards
• Contribute to workflow development etc. to drive an open research lifecycle
• Educate your mentors on the importance of open science and scholarly communication
• Write software thinking of an App model

Page 43: The Era of Open

Understand That All Data/Knowledge Are NOT Created Equal

• We need to understand how data are used
• Sustainability is not more money from the funding agencies; it’s about business models
• Reductionism is not a dirty word
• We need to do more with the long tail

On the Future of Genomic Data. Science 11 February 2011: vol. 331 no. 6018 728-729

Page 44: The Era of Open

Recognize That Institutions Must Play a Greater Role

• We need institutional data/knowledge sharing plans

• We need data/information scientists to be better recognized by institutions – it’s not all about papers – this implies new metrics

Page 45: The Era of Open

Learn from the App Store

• The App model
  – Think of it operating on a content base rather than a mobile device
  – Simple and consistent user interface
  – Needs to pass some quality control
  – Has a reward

• The App+ Model
  – Apps interoperate through a generic workflow interface
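As a rough illustration of apps interoperating through a generic workflow interface, the sketch below assumes a single shared contract: every app exposes a `run` method that takes a content dictionary and returns an enriched one. The `App` protocol, the two example apps, and `run_workflow` are hypothetical names invented for this sketch and do not come from any existing framework.

```python
from typing import Any, Dict, Iterable, Protocol


class App(Protocol):
    """The assumed generic interface: content in, enriched content out."""

    name: str

    def run(self, content: Dict[str, Any]) -> Dict[str, Any]:
        ...


class CountResidues:
    """Toy app: records the length of a sequence held in the content base."""

    name = "count_residues"

    def run(self, content: Dict[str, Any]) -> Dict[str, Any]:
        return {**content, "length": len(content["sequence"])}


class FlagShort:
    """Toy app: flags sequences shorter than a threshold."""

    name = "flag_short"

    def __init__(self, min_length: int = 10) -> None:
        self.min_length = min_length

    def run(self, content: Dict[str, Any]) -> Dict[str, Any]:
        return {**content, "short": content["length"] < self.min_length}


def run_workflow(apps: Iterable[App], content: Dict[str, Any]) -> Dict[str, Any]:
    """Chain apps in order; each one sees the output of the previous one."""
    for app in apps:
        content = app.run(content)
    return content


result = run_workflow(
    [CountResidues(), FlagShort(min_length=10)],
    {"sequence": "MKTAYIAK"},
)
```

Because each app depends only on the shared `run` contract and the keys it reads, apps written by different groups could be reordered or swapped without changing the workflow runner, which is the property the App+ bullet points at.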

Page 46: The Era of Open

In Summary

• Open science is a means to accelerate the rate of discovery

• Disruption has begun, but there is great inertia in the system

• All of us are stakeholders and capable of invoking further positive change

• We need to get institutions and more scientists involved….

Page 47: The Era of Open

Acknowledgements
www.force11.org

Page 48: The Era of Open

[email protected]

• Force11 Manifesto
• Fourth Paradigm: Data Intensive Scientific Discovery http://research.microsoft.com/enus/collaboration/fourthparadigm/