alison-specht
DataONE Data Life Cycle: Tools and Tips
The DataONE Data Life Cycle
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
Field Research: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
Monitoring Project: Publish, Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
Synthesis Project: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze, Publish
Develop Solutions for Research: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
1. Plan: Create and Follow a Data Management Plan
Michener WK (2015) Ten Simple Rules
for Creating a Good Data Management Plan.
PLoS Comput Biol 11(10): e1004525.
doi:10.1371/journal.pcbi.1004525
2. Collect and Organize: Logically Structure the Data to Support Use
CC image by Justin See on Flickr
Jones et al. 2007
2. Collect and Organize
• Each column of data is consistent: only numbers, only dates, or only text
• Names, codes, and formats (e.g., dates) are used consistently within each column
• Data are all in one table, which is much easier for a statistical program to work with than multiple small tables that each require human intervention
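The column-consistency rule above can be checked automatically. The following is a minimal sketch, not a DataONE tool: it assumes the data are in CSV form and that dates use the ISO format YYYY-MM-DD. The sample table and the function names `classify` and `column_types` are hypothetical.

```python
import csv
import io
from datetime import datetime


def classify(value: str) -> str:
    """Label one cell as 'number', 'date', or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        # Assumption: dates are written as YYYY-MM-DD; anything else is text.
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def column_types(csv_text: str) -> dict[str, set[str]]:
    """Map each column name to the set of value types found in it."""
    found: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, value in row.items():
            found.setdefault(name, set()).add(classify(value.strip()))
    return found


# Hypothetical field data: the 'count' column mixes numbers with text.
SAMPLE = """site,date,count
A1,2015-06-01,12
A1,2015-06-02,none
B2,2015-06-03,7
"""

types = column_types(SAMPLE)
# A column is inconsistent when more than one value type appears in it.
mixed = sorted(name for name, kinds in types.items() if len(kinds) > 1)
print(mixed)
```

Columns reported as mixed are the ones a statistical program is likely to read as text, so they are worth fixing before analysis.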
Google Docs Forms
Data Entry Tools: Excel
Excel: Data Validation
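Excel's Data Validation feature restricts what may be entered in a cell, for example to a whole number within a range or to a value from a list. The same kind of rule can be applied to data that have already been entered. This is a rough sketch only: the column names, the allowed habitat codes, and the helpers `whole_number_between`, `one_of`, and `validate` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    column: str
    check: Callable[[str], bool]
    message: str


def whole_number_between(low: int, high: int) -> Callable[[str], bool]:
    """Accept only whole numbers in the inclusive range [low, high]."""
    def check(value: str) -> bool:
        try:
            return low <= int(value) <= high
        except ValueError:
            return False
    return check


def one_of(*allowed: str) -> Callable[[str], bool]:
    """Accept only values from a fixed list (exact, case-sensitive match)."""
    def check(value: str) -> bool:
        return value in allowed
    return check


def validate(rows: list[dict[str, str]], rules: list[Rule]) -> list[str]:
    """Return one message per cell that breaks a rule."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for rule in rules:
            if not rule.check(row[rule.column]):
                problems.append(f"row {number}, {rule.column}: {rule.message}")
    return problems


# Hypothetical rules and data, for illustration only.
RULES = [
    Rule("count", whole_number_between(0, 500), "expected a whole number from 0 to 500"),
    Rule("habitat", one_of("forest", "grassland", "wetland"), "expected a listed habitat code"),
]
ROWS = [
    {"count": "12", "habitat": "forest"},
    {"count": "-3", "habitat": "Forest"},
]

problems = validate(ROWS, RULES)
for line in problems:
    print(line)
```

Checking at entry time, as Excel does, prevents the error; checking afterwards, as here, only reports it.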
3. Assure: Incorporate Quality Assurance & Quality Control
Quality Engine
MetaDIG DIBBs
3. Assure
• JMP
• R
• MATLAB
• many others
4. Describe: Develop Comprehensive, Standardized Metadata
Darwin Core – species and biodiversity collections (http://rs.tdwg.org/dwc/)
EML – Ecological Metadata Language
ISO 19115 – geospatial data
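As one illustration of standardized description, local column names can be renamed to Darwin Core terms before the data are shared. This is a minimal sketch: the local column names, the mapping, and the single observation are made up, while `scientificName`, `eventDate`, `decimalLatitude`, and `decimalLongitude` are Darwin Core terms.

```python
import csv
import io

# Hypothetical local column names mapped to Darwin Core terms.
TO_DARWIN_CORE = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Rename each row's keys to Darwin Core terms, dropping unmapped columns."""
    return [
        {TO_DARWIN_CORE[key]: value for key, value in row.items() if key in TO_DARWIN_CORE}
        for row in rows
    ]


# Made-up observation, for illustration only.
local_rows = [
    {"species": "Quercus alba", "date": "2015-06-01", "lat": "35.9", "lon": "-79.0", "notes": "seedling"},
]

records = to_darwin_core(local_rows)

# Write the renamed records as CSV with the standard terms as the header row.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(TO_DARWIN_CORE.values()), lineterminator="\n")
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```

Unmapped columns such as `notes` are dropped here for brevity; in practice they would need their own documented terms.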
4. Describe
Tools
Specify: http://specifyx.specifysoftware.org
Morpho: https://knb.ecoinformatics.org/#tools/morpho
5. Preserve: Protect and Preserve Data for Long-term Use
Catalog of 1,500+ Data Repositories
Exercise
• Search for repositories that host particular types of data (e.g., biodiversity, trait)
• Visit one of the repositories and identify the services that they offer
6. Discover: Search a Domain Portal
Dryad links to journals
Provides citation instructions
6. Discover: Search a Data Aggregator
Data Federations (DataONE, GBIF)
• carbon cycling
• plant biomass
• ocean nitrogen
• avian distribution
Exercise
• Search datadryad.org for plant trait
• Search DataONE.org for plant trait
6. Discover: Support Discovery of Relevant Data
Dryad DataONE Google
plant trait 2,137 26,300,000
plant trait datadryad 803 1,908 17,400
• Differential content searched
• Automated annotation via ontologies and other approaches
• Differential filtering
• Different definitions of data sets (e.g., entire package vs individual data sets)
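The last point, that result counts depend on what is counted as a data set, can be shown with a small sketch. The hit list below is entirely hypothetical and does not reflect how Dryad, DataONE, or Google index content.

```python
# Hypothetical search hits as (package identifier, file name) pairs.
HITS = [
    ("pkg-1", "traits.csv"),
    ("pkg-1", "sites.csv"),
    ("pkg-1", "readme.txt"),
    ("pkg-2", "leaf_area.csv"),
]

# Counting every matching file as its own data set.
individual_count = len(HITS)

# Counting each package once, however many of its files match.
package_count = len({package for package, _ in HITS})

print(individual_count, package_count)
```

The same four hits give a count of 4 or 2 depending on the definition, so raw result totals from different services are not directly comparable.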
7. Integrate: Enable Data Integration from Different Sources
Jones et al. 2007
7. Integrate: DataONE Provenance Tracking System
8. Analyze
VisTrails: https://www.vistrails.org
Kepler: http://kepler-project.org
Taverna: https://taverna.incubator.apache.org
myExperiment: https://www.myexperiment.org/
DataONE.org Education Resources: Best Practices, Webinar series, Lessons and Exercises
DataONE Vision and Mission
dataone.org