Presented at Kansas State University as Part of Open Access Week, October 23, 2012.
Open Data – Where Do We Stand from a Researcher's Perspective?
Philip E. Bourne
University of California San Diego
My Perspective…
• Mine is a biomedical sciences perspective
• My lab distributes, for free, data equivalent to one quarter of the Library of Congress every month
• I am a supporter of open access (provided there is a business/sustainability model) and the founding Editor-in-Chief of PLOS Computational Biology
• I am a co-founder of SciVee Inc. and believe innovation comes from open access to knowledge
• I recently became UCSD’s AVC of Innovation, which is giving me a more institutional perspective
I Readily Acknowledge That Each Discipline Is Different
My General Opinion: Where Does the Open Access Debate Stand Today?
• It's not a question of “if” but a question of “when” and “how” for most disciplines
• We are at the tip of the iceberg in our ability to use OA content
• OA will gain momentum in an increasingly knowledge-based economy
The State of Play: UC Open Access Policy Debate: Opt Out vs. Opt In
• For
– Publicly funded research should be public
– Institutional perspective: the open provision of data, and of the knowledge derived from these data, appears to be an unidentified asset at this time
• Against
– Cost to some disciplines
– Impact on societies
– Journal quality with respect to promotion
– Extra work
– Administration
– UC as “Big Brother”
We will come back to this, but first let us explore why open knowledge is so important (to me at least)
Open Data May Save Lives?
[Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010, including 1RUZ (1918 H1 hemagglutinin) and 3B7E (neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir)]
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
Open Science Can Accelerate the Scientific Process…
For some people the change may be too slow to save their lives
Josh Sommer – A Remarkable Young Man
Co-founder & Executive Director of the Chordoma Foundation
http://sagecongress.org/Presentations/Sommer.pdf
Chordoma
• A rare cancer of the skull base and spine
• No known drugs
• Treatment: surgical resection followed by intense radiation therapy
http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
Adapted: http://sagecongress.org/Presentations/Sommer.pdf
If I have seen further it is only by standing on the shoulders of giants
Isaac Newton
From Josh’s point of view, the climb up just takes too long:
more than 15 years and more than $850M, to be more precise
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
The Story of Meredith
What Does Meredith Tell Us?
• The Wikipedia / Khan Academy / YouTube generation knows no bounds
• Bounds are too often imposed by tradition rather than what makes the most sense
• Another example of an underexploited asset at this time?
Another Way of Thinking About the Implications of What Josh and Meredith Represent Is the Need for New Forms of Knowledge Management and Access
Let's Explore This Notion with an Emphasis on Data
The Silos of Data & Knowledge Are Starting to Coalesce
Is a Biological Database Really Different than a Biological Journal? PLoS Comp. Biol. 2005 1(3) e34
• Supplemental information has exploded
• Data journals are emerging
• The use of rich media is increasing
• Software and other processes are becoming available
• Databases are now knowledgebases
• Science can be done on the fly
• Biocuration is a respected career
PLoS Comp. Biol. 2008. 4(7): e1000136
Where Does That Take Us?
• A paper is an artifact of a previous era
• It is not the logical end product of eScience, hence:
– Work is omitted
– Article vs. supplement is a mess
– Visualization may be limited
– Interaction and enquiry are non-existent
– Rich media can help, but barriers remain
Where Does That Take Us? Data Sharing Policies
• From the NSF:
• Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing. See Award & Administration Guide (AAG) Chapter VI.D.4.
Big Data is Off…
• March 2012 OSTP commits $200M to Big Data
• NSF, DOD, NIH all announce programs
• GBMF think tank leads to soon-to-be-announced institutional awards
Where Does That Take Us? Add into the Mix:
• Reproducibility: it really is a myth!
• Maintainability: DNA doubles in 5 months
• Usability: go ahead and try!
• Reward: tenure for data? No way
Notwithstanding, dreams do emerge… Here is mine:
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Here Is What I Want
1. User clicks on a thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
The Knowledge and Data Cycle
PLoS Comp. Biol. 2005 1(3) e34
The Knowledge Economy Begins
[Figure panels: Immunology Literature; Cardiac Disease Literature]
Simultaneously, Discovery Informatics Emerges
• Google will not suffice as a scientific knowledge discovery tool
• Google is broad but shallow
• Science is cross-disciplinary, narrower, and deeper
NSF Discovery Informatics Workshop
• Discoveries surpass an individual's ability: we need intelligent tools
• Need to increase connections between knowledge and data
• Need to combine diverse human abilities
Discovery informatics brings together computer scientists, domain scientists, and social scientists: http://www.isi.edu/~gil/diw2012/NSFDiscoveryInformatics2012-FinalReport.pdf
This is Just the Beginning of Discovery Informatics
• Each evening the lab’s “Evernote” notebooks are scanned for commonalities among the day’s activities. These commonalities seed a deep search of the web for knowledge and data that have become available since the last search. Results are ranked and presented for consideration over coffee the next morning
http://www.discoveryinformaticsinitiative.org/diw2012
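The nightly scan can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system the slide envisions: notebook entries are plain strings (no Evernote API is used), a "commonality" is assumed to mean a word that appears in at least two of the day's notes, and candidate web items are assumed to have already been fetched with a publication date. The names `commonalities`, `rank_new_items`, and `Item` are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def terms(text: str) -> set[str]:
    """Lower-cased alphanumeric words in `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def commonalities(notes: list[str], min_notes: int = 2) -> set[str]:
    """Hypothetical seed rule: terms occurring in at least `min_notes` of the day's notes."""
    counts = Counter(t for note in notes for t in terms(note))
    return {t for t, n in counts.items() if n >= min_notes}


@dataclass
class Item:
    """A hypothetical piece of web content returned by the deep search."""
    title: str
    text: str
    published: datetime


def rank_new_items(seeds: set[str], items: list[Item], last_search: datetime) -> list[tuple[int, str]]:
    """Keep items published since the last search and rank them by seed-term overlap."""
    scored = [
        (len(seeds & terms(f"{item.title} {item.text}")), item.title)
        for item in items
        if item.published > last_search
    ]
    # Highest overlap first, ties broken alphabetically; items with no overlap are dropped.
    return sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))


notes = [
    "Docked zanamivir against 1918 neuraminidase",
    "Zanamivir binding to neuraminidase 3B7E",
    "Lab meeting notes",
]
seeds = commonalities(notes)  # seeds == {"zanamivir", "neuraminidase"}
items = [
    Item("Zanamivir resistance in neuraminidase mutants", "", datetime(2012, 10, 23)),
    Item("Chordoma cell lines", "", datetime(2012, 10, 23)),
    Item("An older neuraminidase study", "", datetime(2011, 1, 1)),
]
print(rank_new_items(seeds, items, last_search=datetime(2012, 10, 22)))
# [(2, 'Zanamivir resistance in neuraminidase mutants')]
```

Simple term overlap stands in here for whatever relevance ranking a real discovery-informatics tool would use; the point is the shape of the loop: seed from the day's work, search only what is new, rank, present.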
Unimaginable Connections Made Automatically Through RDF Descriptions
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.html
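How shared identifiers let independently published RDF descriptions connect, without anyone planning the link, can be illustrated with plain Python tuples standing in for RDF triples. This is a toy sketch, not an RDF toolkit: the `http://example.org/` URIs, the `mentions` and `hasStructure` predicates, and the `connect` function are hypothetical; only the PDB ID 3B7E comes from the slides. A real deployment would use an RDF library and SPARQL queries.

```python
# Each triple is (subject, predicate, object); all URIs below are hypothetical.
Triple = tuple[str, str, str]
EX = "http://example.org/"

# Published by a (hypothetical) literature database.
literature: list[Triple] = [
    (EX + "paper/42", EX + "mentions", EX + "protein/neuraminidase"),
    (EX + "paper/43", EX + "mentions", EX + "protein/unlinked"),
]

# Published independently by a (hypothetical) structure database.
structures: list[Triple] = [
    (EX + "protein/neuraminidase", EX + "hasStructure", EX + "structure/3B7E"),
]


def connect(a: list[Triple], b: list[Triple]) -> list[tuple[str, str, str, str, str]]:
    """Join two triple sets wherever an object URI in `a` is a subject URI in `b`."""
    by_subject: dict[str, list[tuple[str, str]]] = {}
    for s, p, o in b:
        by_subject.setdefault(s, []).append((p, o))
    return [
        (s, p1, o, p2, o2)
        for s, p1, o in a
        for p2, o2 in by_subject.get(o, [])
    ]


for path in connect(literature, structures):
    print(" -> ".join(path))
```

Neither source knows about the other; the paper-to-structure path appears only because both use the same URI for the protein.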
Before We Get Too Heady, Let's Look at the Realities of the Situation from My Perspective
• Data repositories are broken
• There is a “high noon” effect
• NCBI has been a wonderful model to date…
Data/Institutional Repositories
• “Build it and they will come” fails most of the time
• Institutional repository is an oxymoron
• NCBI works because:
– It was established by an act of the US Congress
– It has strong leadership
– It has a monopoly on the literature
– Its IT has been thought out over many years
“Innkeeper at the Roach Motel”, D. Salo, 2008: http://muse.jhu.edu/journals/library_trends/v057/57.2.salo.html
• “High Noon” Effect
– Publishers make getting knowledge in very difficult, but at least getting knowledge out, albeit limited, is consistent, intuitive, and easy to use
– Data repositories make data in and data out very difficult – they strive to be different when in fact users want them to be the same
Data and Journals
• That journals are thinking about data is good
• Dryad and similar services are welcome, but they are a stopgap measure
• Fully functional data journals will not occur without a change to the reward system
• Data papers can help shift the reward system
• Are PLoS Topic Pages a sign?
Interim Solution: Use the Traditional Reward System
The Wikipedia Experiment – Topic Pages
Identify areas of Wikipedia that relate to the journal and are missing or are stubs
Develop a Wikipedia page in the sandbox
Have a Topic Page Editor Review the page
Publish the copy of record with associated rewards
Release the living version into Wikipedia
Think Globally, Act Locally: What Can Our Institutions Do Now to Move Us in the Right Direction?
Institutional Response
• Have repositories that are useful:
– Use common standards
– Are vetted by the community
– Are fully open and searchable
• Reward all forms of scholarship
• Leverage the asset …
Most Laboratories
• We are the long tail
• Goodbye to the student is goodbye to the data
• Very few of us have complied (or will comply) with the data management plans we write into grants
UCSD Dropbox
• Simple!!!!
• Can drop large files easily
• Asks for limited metadata and permission to “discover”
• Has guaranteed quality of service and security not available in the cloud
• Is the data management plan and is charged against grants
• Is a rich campus corpus open to discovery informatics
The UCSD Dropbox Discovery Environment
• Scenarios:
– Fosters known collaborations through simplified data exchange
– Discovers new collaborators through the same or related data elements
– A corpus whose intrinsic value is as yet unknown
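The second scenario, discovering new collaborators through the same data elements, can be sketched as a pairwise overlap check. The sketch assumes, hypothetically, that each deposit has been reduced to its owner, whether that owner granted the permission to “discover”, and a set of data-element identifiers; it handles only identical elements, not merely related ones. `suggest_collaborators` and all identifiers are illustrative, not part of any UCSD system.

```python
from itertools import combinations

# Hypothetical deposit index: owner -> (granted "discover" permission?, data-element identifiers).
Deposits = dict[str, tuple[bool, set[str]]]


def suggest_collaborators(
    deposits: Deposits, known_pairs: set[frozenset[str]]
) -> list[tuple[str, str, list[str]]]:
    """Pair owners who opted in to discovery, share data elements, and are not known collaborators."""
    discoverable = sorted(owner for owner, (opted_in, _) in deposits.items() if opted_in)
    suggestions = []
    for a, b in combinations(discoverable, 2):
        shared = deposits[a][1] & deposits[b][1]
        if shared and frozenset((a, b)) not in known_pairs:
            suggestions.append((a, b, sorted(shared)))
    return suggestions


deposits: Deposits = {
    "lab_a": (True, {"PDB:3B7E", "sample:flu-2012-07"}),
    "lab_b": (True, {"PDB:3B7E"}),
    "lab_c": (False, {"PDB:3B7E"}),  # did not grant permission to discover
    "lab_d": (True, {"PDB:1RUZ"}),
}
print(suggest_collaborators(deposits, known_pairs=set()))
# [('lab_a', 'lab_b', ['PDB:3B7E'])]
print(suggest_collaborators(deposits, known_pairs={frozenset(("lab_a", "lab_b"))}))
# []
```

The opt-in check comes first so that a deposit without the “discover” permission never contributes to a suggestion, whatever it contains.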
What Do I Want by 2020 or Earlier as a Researcher?
• Answer biological questions, not just retrieve data
• Understand all there is to know about the availability and quality of a unit of biological data
• Operate on data in a way that is simpler, more productive, and reproducible
What Do We Need to Do to Get There? A Data Registry?
• Individual repositories register their metadata, which includes access statistics, commentary, etc. (DataCite is a beginning)
• Identify identical data objects and their respective metadata for comparative analysis
• Funders support registration
• Publishers support registration
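One way such a registry could identify identical data objects held by different repositories is to compare content checksums and pool the metadata registered for each. The sketch assumes SHA-256 digests as the identity test (DataCite itself is built around DOIs and descriptive metadata, not content hashes), and `DataRegistry`, its methods, and the repository names are hypothetical.

```python
import hashlib
from collections import defaultdict


class DataRegistry:
    """Toy registry: repositories register data objects with metadata; byte-identical
    objects are grouped under the same SHA-256 digest for comparative analysis."""

    def __init__(self) -> None:
        self._by_digest: dict[str, list[dict]] = defaultdict(list)

    def register(self, repository: str, object_id: str, content: bytes, metadata: dict) -> str:
        """Record one repository's copy of an object and return its content digest."""
        digest = hashlib.sha256(content).hexdigest()
        self._by_digest[digest].append({"repository": repository, "object_id": object_id, **metadata})
        return digest

    def copies(self, digest: str) -> list[dict]:
        """All registered records for one data object (empty if the digest is unknown)."""
        return list(self._by_digest.get(digest, []))

    def duplicates(self) -> dict[str, list[dict]]:
        """Data objects registered more than once, e.g. by different repositories."""
        return {d: recs for d, recs in self._by_digest.items() if len(recs) > 1}


registry = DataRegistry()
data = b">seq1\nACGTACGT\n"
d1 = registry.register("repo_east", "E-001", data, {"downloads": 120, "comment": "curated"})
d2 = registry.register("repo_west", "W-777", data, {"downloads": 35})
registry.register("repo_west", "W-778", b">seq2\nTTGA\n", {"downloads": 4})

total_downloads = sum(r["downloads"] for r in registry.copies(d1))
print(d1 == d2, len(registry.duplicates()), total_downloads)  # True 1 155
```

Pooling by digest is what lets access statistics and commentary from separate repositories be compared for the same object; byte-level identity is a deliberately strict choice, and detecting "the same data in a different format" would need something richer.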
What Do We Need to Do to Get There? An App+ Store?
• The App model
– Think of it operating on a content base rather than a mobile device
– Simple and consistent user interface
– Needs to pass some quality control
– Has a reward
• The App+ model
– Apps interoperate through a generic workflow interface
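The App+ idea of apps interoperating through a generic workflow interface can be sketched as a structural interface that every app implements. Here each app is assumed to take a list of records and return a list of records, so any apps can be chained in any order. The `App` interface, the two example apps, and the record fields (`id`, `resolution`) are hypothetical illustrations, not a description of an existing app store.

```python
from typing import Any, Protocol

Record = dict[str, Any]


class App(Protocol):
    """Hypothetical generic workflow interface shared by all apps."""

    name: str

    def run(self, records: list[Record]) -> list[Record]: ...


class FilterByResolution:
    name = "filter-by-resolution"

    def __init__(self, max_angstrom: float) -> None:
        self.max_angstrom = max_angstrom

    def run(self, records: list[Record]) -> list[Record]:
        return [r for r in records if r["resolution"] <= self.max_angstrom]


class SortByResolution:
    name = "sort-by-resolution"

    def run(self, records: list[Record]) -> list[Record]:
        return sorted(records, key=lambda r: r["resolution"])


def run_workflow(apps: list[App], records: list[Record]) -> list[Record]:
    """Pass the output of each app to the next one."""
    for app in apps:
        records = app.run(records)
    return records


structures = [
    {"id": "A", "resolution": 2.5},
    {"id": "B", "resolution": 1.8},
    {"id": "C", "resolution": 3.4},
]
result = run_workflow([FilterByResolution(3.0), SortByResolution()], structures)
print([r["id"] for r in result])  # ['B', 'A']
```

Because every app shares one input/output shape, the workflow runner never needs to know what a given app does; that uniformity is what the "+" in App+ asks for.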
In Summary
• We have at hand the means to accelerate the rate of discovery
• To do so we need to place more value on the data, the individuals that produce it and the institutions that maintain it
• We are all stakeholders in this endeavor
• Here is one way to get involved…
Get Involved: FORCE11
• Tools and Resource catalog
• Article database in Mendeley
• Discussion Forum via Google
• Blogs courtesy of blog sites and RSS feeds
• Web site via Drupal
• Announcements via
http://force11.org
General References
• Force11 Manifesto
• The Fourth Paradigm: Data-Intensive Scientific Discovery http://research.microsoft.com/enus/collaboration/fourthparadigm/