The Future of Microalgal Taxonomy
Anne Thessen, [email protected]
Patterson, [email protected]
(Data Conservancy, Life Sciences)
Setting the stage for a ‘big new biology’
• BIG = data-centric (like particle physics and astronomy)
• Characterized by data sharing via a virtual pool
• New = new skill sets, tools, cyber-infrastructure to exploit the data pool
• Data-driven discovery as a new means of understanding
• GenBank as a model within the Life Sciences
Small science
Large number of providers with small amounts of data.
Small number of providers with lots of data.
Aa paleacea
Limulus polyphemus
Kiwa hirsuta
Osedax frankpressi
Kingia australis
Names
Pieris japonica
Pieris rapae
Trypanosoma brucei
Homo sapiens
Many names for one taxon
Didimosphenia geminata
Didymosphenia geminata
Didymosphenia geminata
Didymosphenia geminata
Rock snot
Didymo
Echinella geminata
Gomphonema geminatum
Gomphonema vulgare
Reconciliation Group
Didymosphenia geminata
Didimosphenia geminata
Didymo
Rock Snot
Echinella geminata
Gomphonema geminatum
Gomphonema vulgare
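A reconciliation group can be sketched as a lookup table that maps every known name string to one representative name. This is a minimal illustration, not how any particular names service is built: `RECONCILIATION_GROUPS`, `normalize`, `build_index` and `reconcile` are hypothetical names, and using Didymosphenia geminata as the representative is an assumption based on it being listed first.

```python
# Hypothetical reconciliation table. The name strings are the ones shown on
# the slide; choosing "Didymosphenia geminata" as the key is an assumption.
RECONCILIATION_GROUPS = {
    "Didymosphenia geminata": [
        "Didymosphenia geminata",
        "Didimosphenia geminata",
        "Didymo",
        "Rock Snot",
        "Echinella geminata",
        "Gomphonema geminatum",
        "Gomphonema vulgare",
    ],
}


def normalize(name: str) -> str:
    """Collapse whitespace and case so trivially different strings match."""
    return " ".join(name.split()).casefold()


def build_index(groups: dict) -> dict:
    """Invert the groups: each normalized variant points to its group key."""
    index = {}
    for representative, variants in groups.items():
        for variant in variants:
            index[normalize(variant)] = representative
    return index


INDEX = build_index(RECONCILIATION_GROUPS)


def reconcile(name: str):
    """Return the representative name for a string, or None if unknown."""
    return INDEX.get(normalize(name))
```

With this table, `reconcile("rock snot")` and `reconcile("Gomphonema vulgare")` resolve to the same group, so data filed under either name can be linked.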
One name for many taxa
Cyclophora
Cyclophora Castracane 1878: Cyclophora tenuis
Cyclophora Hübner 1822: Cyclophora porata
Contextual data
Diatom, Chloroplast, Frustule, Benthic, Marine
Disambiguate by authority, species, contextual data
Contextual data
Food, Moth, Wings, Exoskeleton, Caterpillar
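Disambiguation of a shared name could look roughly like the sketch below: prefer an exact authority match, and otherwise score each candidate by how many of its contextual terms appear in the surrounding text. The `Homonym` class, the `CANDIDATES` list and the `disambiguate` function are hypothetical, and the word-overlap scoring is an assumed stand-in for whatever a real service would use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Homonym:
    name: str
    authority: str
    example_species: str
    context: frozenset  # contextual terms taken from the slides


# The two uses of "Cyclophora" shown above, with their contextual terms.
CANDIDATES = [
    Homonym("Cyclophora", "Castracane 1878", "Cyclophora tenuis",
            frozenset({"diatom", "chloroplast", "frustule", "benthic", "marine"})),
    Homonym("Cyclophora", "Hübner 1822", "Cyclophora porata",
            frozenset({"food", "moth", "wings", "exoskeleton", "caterpillar"})),
]


def disambiguate(name, authority=None, text=""):
    """Pick one candidate for `name`, or return None if it stays ambiguous."""
    matches = [c for c in CANDIDATES if c.name == name]

    # 1. An exact authority match settles the question.
    if authority is not None:
        exact = [c for c in matches if c.authority == authority]
        if exact:
            return exact[0]

    # 2. Otherwise count contextual terms that occur in the surrounding text.
    words = set(re.findall(r"[a-z]+", text.lower()))
    scored = [(len(c.context & words), c) for c in matches]
    best = max((score for score, _ in scored), default=0)
    winners = [c for score, c in scored if score == best]

    # Only answer when exactly one candidate has supporting evidence.
    if best > 0 and len(winners) == 1:
        return winners[0]
    return None
```

Returning `None` when there is no evidence, or a tie, is a deliberate choice here: an unresolved name can be flagged for an expert rather than silently attached to the wrong taxon.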
Global Names Architecture
[Diagram: GNA at the center, connected to data and service providers, data and service consumers, and experts through provider services and consumer services]
Names-based cyberinfrastructure
• Managing names to manage biodiversity data
  - All names (scientific, vernacular, surrogate)
  - For all organisms
  - Many names for one species reconciled
  - One name for many species disambiguated
• Global Names Architecture - a virtual layer, using names services to link together distributed data
• Globalnames.org
• Micro*scope (microscope.mbl.edu) and Encyclopedia of Life (eol.org)
Legacy Data
• Narrative tradition in biology
• Too much for a human
• Can we get a machine to do the work?
• NLP!!!
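As a toy illustration of letting a machine read narrative text, the sketch below scans a passage for strings shaped like a Latin binomial and keeps those found in a list of known names. It is a deliberately naive stand-in for real NLP: the regular expression, the `KNOWN_NAMES` set and the `find_names` function are all assumptions made for this example.

```python
import re

# Hypothetical lookup list; a real system would query a names service.
KNOWN_NAMES = {
    "Didymosphenia geminata",
    "Limulus polyphemus",
    "Homo sapiens",
}

# Naive pattern: a capitalized word followed by a lowercase word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{2,})\b")


def find_names(text: str) -> list:
    """Return known binomials in the order they appear in `text`."""
    found = []
    for match in BINOMIAL.finditer(text):
        candidate = f"{match.group(1)} {match.group(2)}"
        # The pattern also matches ordinary phrases such as "Blooms of",
        # so only candidates present in the lookup list are kept.
        if candidate in KNOWN_NAMES:
            found.append(candidate)
    return found
```

The lookup step is what keeps false positives out; without it, any sentence-initial word followed by a lowercase word would be reported as a name.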
The New Workforce
• Informatics/computing training
• Modified workflows
• Importance of data management and preservation
In Summary
• Big New Biology is coming, and taxonomy can benefit from being part of it
• Existing data can be made machine-readable using information extraction algorithms
• Existing workflows can be modified to capture data close to the source
• Data can be shared using the semantic web