Presentation of the two agINFRA Germplasm data sources (CGRIS, China and CRA, Italy) and the metadata used for the description of their germplasm accessions. Presented during Session 2 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)
Metadata analysis of germplasm collections
The case of agINFRA
Dr. Vassilis Protonotarios
Agricultural Biotechnologist, PhD
Agro-Know Technologies, Greece
e-Conference on Germplasm Data Interoperability
Session 2: “Status of data and metadata for germplasm”
Structure of the presentation
1. The agINFRA germplasm data sources
   – Chinese Crop Germplasm Information System
   – Italian National Germplasm Database
2. Current status
   – Mappings
   – Linked Data approach
3. Conclusions
The agINFRA germplasm data sources
agINFRA germplasm data sources
• Italian Germplasm Database (CRA)
  – Data available through EURISCO -> GENESYS
  – Uses EURISCO set of descriptors
  – Data also available through GBIF
• Chinese Crop Germplasm Information System (CGRIS/CAAS)
  – Data unavailable through aggregators
  – Own schema used for description of germplasm accessions
  – Metadata exposure in CSV
agINFRA germplasm data analysis
1. Analysis of agINFRA germplasm data sources
2. Analysis of metadata schemas used
3. Identification of external schemas
   – Review of existing work
4. Definition of a base schema (descriptors)
5. Mappings of various schemas to the base one
6. Development of a linked data approach for linking germplasm data sources
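
Step 5 (mapping each schema to the base one) can be illustrated with a minimal sketch. It assumes a simple one-to-one crosswalk from source field names to base descriptors. The source field names and the sample record are hypothetical; the target names are MCPD-style descriptor names used only as an example of a base schema. Fields without a mapping are kept separately so they can be reviewed rather than silently dropped.

```python
# Hypothetical crosswalk: source field name -> base-schema descriptor.
# The source names are invented for illustration; the targets are
# MCPD-style descriptor names.
CROSSWALK = {
    "accession_no": "ACCENUMB",
    "genus_name": "GENUS",
    "species_name": "SPECIES",
    "origin_country": "ORIGCTY",
}


def map_record(record, crosswalk):
    """Rename fields to base descriptors; return (mapped, unmapped)."""
    mapped = {}
    unmapped = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            # No equivalent in the base schema yet: keep for manual review.
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


# Made-up record in the hypothetical source schema.
source_record = {
    "accession_no": "X0001",
    "genus_name": "Triticum",
    "species_name": "aestivum",
    "plant_height_cm": "95",
}

mapped, unmapped = map_record(source_record, CROSSWALK)
print(mapped)
print(unmapped)
```

Real mappings are rarely purely one-to-one (values may need recoding or splitting), so a table like this is only a starting point.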
1. Chinese Crop Germplasm Information System (CGRIS / CAAS)
Chinese Crop Germplasm Information System (CGRIS)
• Provided by: Chinese Academy of Agricultural Sciences
• A central repository for all types of plant genetic resources information. It consists of six subsystems:
  1. The management system of the National Crop Gene Bank (NCGB)
  2. The management system of the long-term storage in Qinghai
  3. The management system of the National Germplasm Resources Nursery
  4. The crop characterization and evaluation database system
  5. The database system for germplasm exchange at home and abroad
  6. The management system of the medium-term storage in Beijing
URL: http://icgr.caas.net.cn/cgrisintroduction.html
CGRIS: Data
At present, CGRIS holds
• > 2000 MB of data on 180 kinds of crops
  – including food crops, fibre plants, oil crops, vegetables, fruit trees, tea, mulberry, tobacco, sugar crops, green manure crops, tropical crops, etc.
• 390,000 germplasm accessions
CGRIS: Accessions (indicative list)
http://icgr.caas.net.cn/cgrisintroduction.html
Crop Germplasm Classification
Info on wheat varieties
CGRIS: Germplasm Data Query
CGRIS Metadata
• CGRIS germplasm descriptors are based on its own schema
  – can be seen as the de facto standard for germplasm accession information in China
  – based on metadata schema standards such as those developed by IPGRI (Bioversity) and GRIN
CGRIS: Basic Descriptors
CGRIS: Wheat descriptors
CGRIS Metadata: Next steps
• A mapping to the Multi-crop Passport Descriptors (MCPD) standard is planned
  – According to CAAS subject experts, such a mapping should be rather easy to produce.
CGRIS: Exposing data
• Data stored in relational DBs
• Hosted in an SQL server
• Exposure of data as CSV files (partially in Chinese)
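
Because the CSV exports are partially in Chinese, a consumer would typically read them as UTF-8 and translate the column headers before any further processing. The sketch below shows one way to do this with the Python standard library. The Chinese header names, their English translations, and the sample row are all assumptions for illustration; the actual CGRIS column names and file encoding are not given in these slides.

```python
import csv
import io

# Hypothetical header translations; the real CGRIS column names are not
# documented here.
HEADER_TRANSLATIONS = {
    "统一编号": "accession_number",
    "品种名称": "variety_name",
    "原产地": "origin",
}

# Made-up sample standing in for a CSV export (assumed to be UTF-8).
SAMPLE_CSV = "统一编号,品种名称,原产地\nZM000001,Example wheat,Beijing\n"


def read_rows(text, translations):
    """Parse CSV text and rename known headers; unknown headers are kept."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        rows.append({translations.get(key, key): value for key, value in row.items()})
    return rows


rows = read_rows(SAMPLE_CSV, HEADER_TRANSLATIONS)
print(rows[0])
```

For a file on disk, the same reader would be fed from `open(path, encoding="utf-8", newline="")`, assuming the export really is UTF-8.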
CGRIS: IPR information
• The CGRIS website is public and accessible to everybody. The information is provided free of charge but is subject to copyright.
• With regard to data exchange, there is no explicit policy to follow.
• CGRIS does not have an Open Access mandate, and the members of the CGRIS network apply their own institutional policies.
2. Italian Germplasm Database (CRA)
Italian Germplasm Database
• Provided by: Italian Council for Research and Experimentation in Agriculture
• Developed in the context of the “Plant Genetic Resources/FAO” project in 2004
  – Research Centres and Units of the CRA
  – The Institute of Plant Genetics of the CNR in Bari
  – NGO “Rete Semi Rurali”
  – University collections (Perugia, Potenza etc.)
URL: http://fru.entecra.it
CRA Germplasm: Data
Current status of germplasm data (CRA)
• 20,954 records from Italy are included in EURISCO, of which 17,212 are from CRA
• 28,509 records for 275 plant species in the National Inventory (in general)
  – does not allow for identifying the number of CRA germplasm records
CRA: Accessions (indicative list)
URL: http://fru.entecra.it/accessioni.php
Info on specific species
EURISCO descriptors
CRA Metadata
• Most CRA institutional databases use the MCPD
  – however, in the records provided to the National Inventory several fields are often not filled.
• Some CRA collections also use descriptors defined by
  – the Union for the Protection of New Varieties of Plants (UPOV) and
  – the National Register of New Varieties.
• Ensure mapping to the Multi-crop Passport Descriptors (MCPD)/EURISCO
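
The observation that several fields are often not filled can be quantified before mapping, by counting empty values per descriptor. The following is a minimal sketch. The checked fields are a small selection of MCPD descriptor names chosen for illustration, and the records (including the `ITA000` institute code) are made-up placeholders, not CRA data.

```python
from collections import Counter

# A few MCPD descriptor names chosen for illustration, not the full list.
CHECKED_FIELDS = ["INSTCODE", "ACCENUMB", "GENUS", "SPECIES", "ORIGCTY"]


def count_missing(records, fields):
    """Count, per descriptor, how many records leave it absent or blank."""
    missing = Counter()
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return missing


# Made-up passport records; "ITA000" is a placeholder institute code.
records = [
    {"INSTCODE": "ITA000", "ACCENUMB": "A1", "GENUS": "Malus",
     "SPECIES": "domestica", "ORIGCTY": "ITA"},
    {"INSTCODE": "ITA000", "ACCENUMB": "A2", "GENUS": "Prunus",
     "SPECIES": "", "ORIGCTY": None},
]

missing = count_missing(records, CHECKED_FIELDS)
for field in CHECKED_FIELDS:
    print(field, missing[field])
```

A per-descriptor count like this helps decide which descriptors are reliable enough to serve as linking keys.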
CRA: IPR information
• The CRA website is public and accessible to everybody. The information is provided free of charge but is subject to copyright.
• The Multilateral System (MLS) of the Treaty demands free availability of the information on the PGRFA that are under the management and control of the Contracting Parties and in the public domain (Treaty, Art. 11.2).
• This excludes
  – germplasm accessions that are subject to IPR and
  – other legally binding protection which restricts the Contracting Party’s control over the material.
  – Accessions that are not covered by IPR include old and autochthonous varieties, crop wild relatives and other material found in in-situ conditions, new cultivars not protected by IPR and cultivars whose IPR have expired.
Conclusions
Current status
• First version of mappings is available
• EURISCO descriptors used as base schema
  – MCPD
  – Darwin Core for Genebanks
  – ABCD
  – CGRIS
  – CRA
Mapping table
Development of decision trees
Linked Data
• A linked data approach will be used by agINFRA for linking germplasm data sources
• OpenAGRIS already aggregates germplasm data using AGROVOC
Conclusions
• Both schemas / sets of descriptors can be mapped to the EURISCO ones
• Linked Data approach will facilitate linking of germplasm data from CRA/CGRIS
• EURISCO descriptors to be published as linked data
  – To be used as the base of passport data
• Linking to other germplasm standards
  – e.g. Darwin Core for Genebanks*
*https://code.google.com/p/darwincore-germplasm/wiki/DarwinCoreGermplasmMapping
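
To make the idea of publishing passport data as linked data more concrete, the sketch below turns one mapped record into RDF statements in N-Triples syntax. This is an illustration under stated assumptions, not the agINFRA design: the `http://example.org/` namespace, the URI patterns, and the sample record are placeholders, and N-Triples is simply a convenient serialization that needs no external library. Literal escaping is kept minimal.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # placeholder namespace, not an agINFRA URI


def escape_literal(value):
    """Escape backslash, quote and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(record, id_field="ACCENUMB"):
    """Emit one triple per non-empty descriptor of a passport record."""
    subject = f"<{BASE}accession/{quote(str(record[id_field]), safe='')}>"
    lines = []
    for descriptor, value in sorted(record.items()):
        if value is None or value == "":
            continue  # skip unfilled descriptors
        predicate = f"<{BASE}descriptor/{descriptor}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


# Made-up record already mapped to MCPD-style descriptor names.
record = {"ACCENUMB": "A1", "GENUS": "Triticum", "SPECIES": "aestivum"}
triples = record_to_ntriples(record)
print("\n".join(triples))
```

Once two sources use the same descriptor URIs as predicates, their records can be queried together, which is the practical benefit of agreeing on a base schema first.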
Take home message
• The identification of common properties between different metadata schemas will facilitate the linked data framework
(Indicative) List of References
• agINFRA Deliverable D2.3 “Review of Content Requirements”
• agINFRA Deliverable D5.3 “Conceptual specification of linked agricultural data framework”
• agINFRA Germplasm Working Group Wiki http://wiki.aginfra.eu/index.php/Germplasm_Working_Group
• EURISCO passport descriptors http://www.ecpgr.cgiar.org/germplasm_databases.html
• Draft Mapping of EURISCO Descriptors to ABCD 2.06 http://www.bgbm.org/TDWG/CODATA/Schema/Mappings/EURISCO-2-ABCD.pdf
Contact me: [email protected]