Making your data work for you: Scratchpads, publishing & the Biodiversity Data Journal
Vince Smith¹, Dave Roberts¹ & Lyubomir Penev²
¹ Natural History Museum, London; ² Pensoft Publishers, Sofia, Bulgaria
Linnean Society, UK, 20 September 2012
Our informatics grand challenge…
“Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses”
Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001
This requires data, information & knowledge to be…
• Digital, not printed paper
• Openly accessible, not behind barriers
• Linked-up, not in silos
• 15-20k new spp. described annually (2M total)¹
• 30k nomenclatural acts (12M total)¹
• 20k phylogenies (750k total)²
• 31k taxa sequenced (360k taxa total)³
• 800k BioMed papers (40M total pp. of taxonomy)⁴
• Countless specimens, images, maps, keys…
Most of our output is not digital, open or linked
Typically generated by small communities for “local” research projects
Figures from: 1) Zhang, Zootaxa 2011 4, 1-4; 2) Web of Science; 3) GenBank; 4) PubMed.
Scratchpad Virtual Research Environments
Making taxonomy digital, open & linked
1. Your data → 2. Uploaded & tagged → 3. “Published” & reviewed on your site
Fast, intuitive, fit for use
What is a Scratchpad?
A website for you & your community
Scratchpads
• EDIT (2007–11), ViBRANT / eMonocot (2011–13)
• Hosted websites for taxonomists
• Taxonomic, regional or societal
• Research & publication platform
• Supports the taxonomic workflow
• Modular (Drupal) & flexible
• Two full-time developers
• Ecosystem of communities (~450)
http://scratchpads.eu
Categories of Scratchpads
Taxa (classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)
Projects, Conservation, Regions, Societies
+ Administration: Change your site information; Change your front page; Change your logo; Activity and access logs
+ Backup: Backing up your data; Restoring your data
+ Bibliography: Creating a record; Importing from a ref. manager; Exporting to a reference manager
+ Blog: Creating and adding a blog
+ Custom Content: Defining a CCK; Importing from a spreadsheet; Creating a custom view
+ Fileshare: Creating and using a fileshare
+ Forum: Altering the forum settings; Creating a container for a forum; Creating a new forum; Creating a new topic inside a forum
+ Groups: Creating a group; Subscribing to a group
+ Image: Uploading & basic annotation; Linking image & location records; Linking image & specimen records; Linking image & publication records; Overlay annotations on images
+ Layout: Change your theme; Menus; Blocks and sidebars
+ Locations: Creating a record; Importing from a spreadsheet
+ Pages: Creating, editing, cloning & deleting; Configuring the panels template
+ Panels: Adding & configuring content; Creating a new panel; Citing a Panels page
+ Phylogeny: Adding a phylogenetic tree
+ Specimens: Creating a record; Importing from a spreadsheet; Linking specimen & location records; Linking specimen & pub. records
+ Tasks: Creating a tasklist
+ Taxonomy: Importing from a spreadsheet; Importing from ClassificationBank; Starting from scratch; Taxonomy manager; Displaying a classification; Adding names; Deleting names; Taxonomy & panels
+ Users: Your settings; Adding a new user; User roles and permissions; Adding and editing user profile fields; Logging in
+ Webform: Creating and using webforms
What can Scratchpads do?
Summary of what Scratchpads can do
• Taxon pages, generated from tagged content (plant/animal)
• Bibliography management
• Character matrices
• Specimen records
• Distribution maps (from specimens and regional)
• Images, video and sound (bulk import)
• Excel spreadsheet import (dynamically generated templates)
• Darwin Core Archive export
• Tabular data editing
• Custom content
• User management
• Custom webforms
• EOL data import (taxonomy, species information)
• GBIF map integration
Scratchpad v.1 usage (2007 to Mar. 2012)
Nodes: 430,948; Sites: 326; Users: 6,809; Active users: 5,733 (273 w / 759 m)
[Chart: sites and users over time]
• Prof. scientists
• Amateur naturalists
• Citizen scientists
Range: 1–1,049; Mean: 15; Mode: 1
Scratchpad 2 – the new version of Scratchpads
• More professional
• Easier to…
  - configure (workflows)
  - navigate (facets)
  - & populate (MS Excel templates)
• Greater standardisation
• Still highly flexible
• Project profiles (eMonocot)
• Framework for integration
• Launched March 2012
• 120 sites to date
• EOL Fellows
• SP1 migration ongoing
e.g. http://ihs.myspecies.info/
Getting data in and out of Scratchpads 2
Sustainable training, support & development
• Wiki
  - Training manuals, videos & glossary
• In-site support
  - One-click help within your site
• Training courses (12 in 2012)
  - UK (6), Sweden (2), Greece (1), Bulgaria (1), South Africa (1), Brazil (1)
• Ambassadors Programme
  - Enthusiastic, experienced users
  - Local support
• Embedded issues queue
  - Bug reports
  - Feature requests
• Sandbox site
  - http://sandbox.scratchpad.eu
• Open-source development
  - http://scratchpad.eu/develop
  - http://scratchpad.eu/help
Online community revision
Freeloader flies: http://milichiidae.info
• Taxonomy is in perpetual beta
  - Constantly evolving
  - Changing contributors
  - Small, granular contributions
• Sustainability
  - A permanent space to work
  - Guaranteed access (2016)
  - Easy ways to get the data out
• Open science
  - Beyond Open Access
  - New ways of working
  - Data management plans
• Need incentives to use
  - More efficient (functions & reuse)
  - Attribution & provenance
  - Credit via citation
• New forms of publication
Publishing observations & taxon data
Specimen records & species pages on Scratchpads are pushed to GBIF & EOL (requires site registration with GBIF & EOL)
• On Scratchpads: >19k specimen records, >122k species pages
• GBIF: >377M specimen records; EOL: >1M species pages
http://scratchpads.eu > http://gbif.org & http://eol.org
Darwin Core Archive (DwCA)
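A Darwin Core Archive is, at its core, a zip file holding a delimited data table plus a `meta.xml` descriptor that says which column maps to which Darwin Core term. The sketch below illustrates that structure for a small occurrence table. It is a minimal sketch under stated assumptions, not the Scratchpads exporter: the helper names `build_meta_xml` and `write_dwca`, the four-column layout and the sample record are hypothetical, and an archive registered with GBIF would normally also carry an EML metadata file (`eml.xml`), which is omitted here.

```python
import csv
import io
import zipfile

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical column layout for this sketch; each name is a standard Darwin Core term.
FIELDS = ["occurrenceID", "scientificName", "country", "basisOfRecord"]


def build_meta_xml(fields):
    """Return a meta.xml string mapping each column of occurrence.txt to a Darwin Core term."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
        # "\\t" and "\\n" are written as literal backslash sequences, as meta.xml expects.
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"'
        f' fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="{DWC_TERMS}Occurrence">',
        "    <files><location>occurrence.txt</location></files>",
        '    <id index="0"/>',  # column 0 (occurrenceID) identifies each row
    ]
    for index, term in enumerate(fields):
        lines.append(f'    <field index="{index}" term="{DWC_TERMS}{term}"/>')
    lines += ["  </core>", "</archive>"]
    return "\n".join(lines)


def write_dwca(records, target):
    """Write records (dicts keyed by FIELDS) as a zipped archive to a path or binary file object."""
    table = io.StringIO()
    writer = csv.DictWriter(
        table,
        fieldnames=FIELDS,
        delimiter="\t",
        lineterminator="\n",
        # Unquoted output matches fieldsEnclosedBy="" above; a value containing a tab
        # makes the csv module raise csv.Error instead of silently corrupting the table.
        quoting=csv.QUOTE_NONE,
    )
    writer.writeheader()
    writer.writerows(records)

    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("meta.xml", build_meta_xml(FIELDS))
        archive.writestr("occurrence.txt", table.getvalue())
```

For example, `write_dwca(records, "specimens-dwca.zip")` would produce an archive that a harvester can read by consulting `meta.xml` first; keeping the column-to-term mapping outside the data table is what lets different sites export different column sets in the same format.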
Experiments with article publishing
Paper assembled from Scratchpad database
XML submission, peer review & marked-up publication by Pensoft
5-step workflow for selecting data, adding metadata & previewing
Published in ZooKeys & PhytoKeys (worldwide coverage)
http://scratchpads.eu > http://pensoft.net
doi:10.3897/zookeys.50.539
Example papers via Scratchpads…
• Blagoderov V, Hippa H, Nel A (2010). ZooKeys 50: 79–90. doi:10.3897/zookeys.50.506
• Faulwetter S, Chatzigeorgiou G, Galil BS, Nicolaidou A, Arvanitidis C (2011). ZooKeys 150: 327–345. doi:10.3897/zookeys.150.1877
• Brake I, von Tschirnhaus M (2010). ZooKeys 50: 91–96. doi:10.3897/zookeys.50.505
Live (updated) versions of these papers:
• http://milichiidae.info/node/14995
• http://polychaetes.marbigen.org/node/35
• http://sciaroidea.info/node/44428
But…
• Limited uptake in 2 years
  - 1 genus
  - 6 n. spp.
  - 11 re-descriptions
• Software bugs
  - Pushing the boundaries of SP1
  - Fixed in SP2
• Focused on synthetic papers
  - Not suited to small papers
  - Less emphasis on data
  - Hard to properly link in the data
• More effort than MS Word
  - Especially for new SP users
BDJ: The Biodiversity Data Journal
Making small data big!
BUT…
• We need to encourage taxonomists to mobilize & describe their data
• This takes considerable effort (e.g. Scratchpads)
• “Arguably” this is best rewarded through credit
• This means papers and citations
• Process must be very easy for authors
• Process must facilitate data reuse
• Meet “Open Data” policy commitments
• The Biodiversity Data Journal is very different…
Why do we need another new journal?
Taxonomy needs less fragmentation, not more!
Biodiversity Data Journal (BDJ)
• All data matters: no lower or upper limit of manuscript size!
• Multiple publishing routes (not just Scratchpads)
• ALL within a single online collaborative platform, including the writing of the manuscript!
• New collaborative article authoring tool
• Community peer review with “open” & “public” options
• This is in addition to conventional peer review
• Online editorial process and version control
• Standards-compliant (Darwin Core, Dublin Core, NLM etc.)
• Pre-defined Code-compliant article templates
BDJ publication & dissemination workflow
Pensoft manuscript writing tool
• Collaborative online editing
• Rich text capabilities
• Various templates for taxon treatments
• Identification key builder
• Assembling plates from single figures
• References import (CrossRef, PubMed Central, etc.)
• Species occurrence data import (Darwin Core compliant)
• Smart citation for figures, tables, references & automated positioning
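Importing Darwin Core compliant occurrence data implies a step where an author's own spreadsheet columns are matched to Darwin Core terms and checked before they reach the manuscript. How the Pensoft tool performs this step is not described here, so the following is only a hypothetical sketch of the idea. The header mapping, the set of required terms and the function name `to_darwin_core` are assumptions for illustration; only the term names themselves (`scientificName`, `decimalLatitude`, `decimalLongitude`, `eventDate`, `recordedBy`) are standard Darwin Core.

```python
import csv
import io

# Hypothetical mapping from an author's spreadsheet headers (lower-cased) to Darwin Core terms.
HEADER_TO_DWC = {
    "species": "scientificName",
    "lat": "decimalLatitude",
    "long": "decimalLongitude",
    "date collected": "eventDate",
    "collector": "recordedBy",
}

# Assumed minimum for this sketch; not an official Darwin Core requirement.
REQUIRED = {"scientificName", "decimalLatitude", "decimalLongitude"}


def to_darwin_core(csv_text):
    """Rename spreadsheet columns to Darwin Core terms and flag unusable rows.

    Returns (records, problems), where problems is a list of (line number, reasons).
    Line numbers assume one record per line, with the header on line 1.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records, problems = [], []
    for line_no, row in enumerate(reader, start=2):
        record = {}
        for header, value in row.items():
            if header is None or value is None:  # ragged row: extra or missing cells
                continue
            term = HEADER_TO_DWC.get(header.strip().lower())
            if term and value.strip():  # skip unmapped columns and empty cells
                record[term] = value.strip()

        missing = sorted(REQUIRED.difference(record))
        if missing:
            problems.append((line_no, missing))
            continue

        try:
            lat = float(record["decimalLatitude"])
            lon = float(record["decimalLongitude"])
        except ValueError:
            problems.append((line_no, ["non-numeric coordinates"]))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append((line_no, ["coordinates out of range"]))
            continue

        records.append(record)
    return records, problems
```

Rows that fail are reported with their line number rather than silently dropped, so an author can correct the spreadsheet before the occurrence data are cited in a manuscript.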
Testing screenshots of the writing tool
[Screenshots: ID key preview; multi-figure plates; plate layout; ID key builder; manuscript preview]
Why publish in the BDJ?
• Joining (small) data into a large data pool
• Open access, archiving and re-use of your data through data aggregators
• Providing a citation record and credit for data in the form of peer-reviewed publications
• Facilitating online article authoring and the editorial process for authors, reviewers and editors
• Truly innovative dissemination of atomized content
• Very low cost: free in the launch phase, thereafter at a fee that anyone can afford!
What will BDJ publish?
• Single taxon treatments and nomenclatural acts
• Local or regional checklists
• Sampling reports and occasional inventories
• Habitat-based checklists and inventories
• Ecological and biological observations of species and communities?
• Single identification keys
• ANY KIND of biodiversity-related database, including genomic, ecological and environmental data (data papers)
• Biodiversity-related software tools
Starting late 2012 / early 2013
Recruiting editors now
Acknowledgements
• Scratchpad technical development: Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boulton
• Scratchpad outreach: Irina Brake, Laurence Livermore, Dimitris Koureas
• eMonocot: Paul Wilkin & the Kew team, Charles Godfray & the Oxford team
• ViBRANT: Dave Roberts, Lucy Reeve & many, many more
• Pensoft: Lyubomir Penev, Teodor Georgiev & colleagues
• Our 7,000+ users
Why we need new methods of publishing…
Primary data
Drawings: Slavena Peneva
Publishing and sharing of primary data
Re-use of content
Source: Wikipedia