WheatIS: Genetics and Genomics Virtual Research Environment for the Wheat Research Community
Plant Genetics Congress
11th-12th May 2015 London
Wheat Initiative
Reinforce synergies between bread and durum wheat national and international research programs to increase food security
Coordinate worldwide research efforts in the fields of wheat genetics, genomics, physiology, breeding and agronomy
Foster communication between the research community, funders and global policy makers at the international level to meet their research and development goals
Sharing resources, methods and expertise to improve and stabilize yields
Meeting of G20 Agriculture Ministers www.wheatinitiative.org
WheatIS Expert Working Group
The Wheat Initiative steering board commissioned a WheatIS Expert Working Group (EWG) to:
• Build projects
• Build infrastructure
• Report to the Wheat Initiative
WheatIS Goals
Provide the wheat research community with a single entry point of access to genetic and genomics resources.
Promote the development of services on top of current wheat / Triticeae databases.
Act as the authority that defines guidelines for data curation, nomenclature, standards and integration.
Provide a registry for bioinformatics tools.
Project Principles
• Federated network of platforms
• Ensure each platform's sustainability by promoting its visibility and adoption.
• WheatIS will not replace the current wheat databases
Build collectively to better cope with the needs of the community
• Evolving infrastructure
• Rely on a distributed system.
• Use virtual machine and cloud computing technologies to share data and tools easily.
Incremental implementation to rapidly build an operational site.
• WheatIS to support the adoption of standards and the integration of data
• Gene nomenclature
• Best-practice guidelines
Promote data exchange following an open-access model.
WheatIS Architecture
[Figure: a WheatIS Core (file repository, web portal, search, integrated DB) linked through distributed storage to several WheatIS nodes, each holding its own DB indexes and storage.]
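One possible reading of this architecture is that the core search fans a query out to the index held by each node and merges the answers, so the data itself stays on the nodes. The sketch below illustrates that idea only; the node names, record identifiers and the functions `search_node` and `federated_search` are hypothetical and are not part of WheatIS.

```python
def search_node(index, term):
    """Look a term up in one node's local index (a dict: keyword -> record ids)."""
    return sorted(index.get(term.lower(), []))


def federated_search(nodes, term):
    """Query every node's index and keep only the nodes that returned hits."""
    results = {}
    for node_name, index in nodes.items():
        hits = search_node(index, term)
        if hits:
            results[node_name] = hits
    return results


# Hypothetical node indexes: each node keeps its own records and exposes an index.
nodes = {
    "node_a": {"rht": ["record-7", "record-1"]},
    "node_b": {"rht": ["record-3"], "vrn": ["record-4"]},
    "node_c": {},
}

found = federated_search(nodes, "Rht")
for node_name, record_ids in found.items():
    print(node_name, record_ids)
```

Keeping the records on the nodes and sharing only indexes matches the stated principle that WheatIS does not replace the existing databases.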
A bioinformatics infrastructure for wheat improvement
Genome
• Annotation
• Transcriptome
• SNP / Structural variant
Genetics
• Genetic maps
• Genetic markers
• Genetic resources
Phenotypes
• GxE
• QTL maps
• GWAS
WheatIS
Integration process
Get data
• Extract
• Transform
• Load
Link data
• Consistency
• Persistency
• Efficient storage
Build data sets
• No persistency
• Analysis oriented
• Efficient query
Data stores: Integrative DB, QueryDB
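The integration steps above (extract, transform, load, then build query-oriented data sets) can be sketched in a few lines. This is an illustration under assumptions: the record fields, the in-memory dictionaries standing in for the integrative DB and the QueryDB, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    marker_id: str
    chromosome: str
    position_bp: int


def extract(rows):
    """Extract: yield raw records from a node export (here an in-memory list)."""
    for row in rows:
        yield row


def transform(raw):
    """Transform: normalise field names and types into one shared record shape."""
    return Marker(
        marker_id=raw["id"].strip(),
        chromosome=raw["chr"].strip().upper(),
        position_bp=int(raw["pos"]),
    )


def load(records, integrative_db):
    """Load: store records keyed by identifier, rejecting duplicates."""
    for rec in records:
        if rec.marker_id in integrative_db:
            raise ValueError(f"duplicate marker id: {rec.marker_id}")
        integrative_db[rec.marker_id] = rec


def build_query_view(integrative_db):
    """Build a disposable, query-oriented view: chromosome -> markers by position."""
    view = {}
    for rec in integrative_db.values():
        view.setdefault(rec.chromosome, []).append(rec)
    for markers in view.values():
        markers.sort(key=lambda m: m.position_bp)
    return view


# Hypothetical export from one node, with inconsistent formatting.
node_export = [
    {"id": " m2 ", "chr": "1a", "pos": "500"},
    {"id": "m1", "chr": "1A", "pos": "120"},
    {"id": "m3", "chr": "3b", "pos": "42"},
]

integrative_db = {}
load((transform(r) for r in extract(node_export)), integrative_db)
query_db = build_query_view(integrative_db)
print([m.marker_id for m in query_db["1A"]])
```

The view can be rebuilt from the integrative store at any time, which is why it does not need to be persistent.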
Develop semantic web interoperability
• Ontology-based annotation of database schemas
• RDF triple store modeling of the database schemas
• Semantic web services
• Integrate dispersed data
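To make the idea of modeling a database schema as RDF triples concrete, here is a minimal sketch that turns a small schema description into N-Triples using only the standard library. The namespace `http://example.org/wheat-schema/`, the table and column names, and the ontology term IRI are all hypothetical; the RDF and RDFS IRIs are the standard W3C ones.

```python
BASE = "http://example.org/wheat-schema/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical schema: table -> {column: ontology term IRI or None}.
schema = {
    "germplasm": {
        "accession_name": None,
        "trait": "http://example.org/ontology/TRAIT_0001",
    }
}


def schema_to_triples(schema):
    """Describe each table as a class and each column as a property of it."""
    triples = []
    for table, columns in schema.items():
        table_iri = BASE + table
        triples.append((table_iri, RDF_TYPE, RDFS_CLASS))
        for column, term in columns.items():
            column_iri = f"{table_iri}/{column}"
            triples.append((column_iri, RDF_TYPE, RDF_PROPERTY))
            triples.append((column_iri, RDFS_DOMAIN, table_iri))
            if term is not None:
                # Ontology-based annotation: point the column at a term.
                triples.append((column_iri, RDFS_SEEALSO, term))
    return triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) IRIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = schema_to_triples(schema)
print(to_ntriples(triples))
```

In practice a library such as rdflib and a real triple store would replace the hand-written serialisation; the sketch only shows the shape of the data.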
The Wheat Data Interoperability (WDI) working group in brief
Endorsed by the Research Data Alliance (RDA) in March 2014
Members: about 30, of whom 15 are active; wheat scientists and data and metadata technologists
The goal: contribute to the improvement of wheat-related data interoperability by
• Building a common interoperability framework (metadata, data formats and vocabularies)
• Providing guidelines for describing, representing and linking wheat-related data
The results (so far)
Surveys
• Landscape of wheat-related standards and their use by the community
• Comprehensive overview of wheat-related ontologies and vocabularies
Workshops
• Recommendations
• Mappings between different data formats
• Actions to conduct in order to improve the current level of wheat-related data interoperability
• Interoperability use cases
Implementation
• Interactive cookbook: recommendations + guidelines
• A repository of wheat-related linked vocabularies (BioPortal)
Data type | Data formats currently used (standardized, tool-specific, non-standardized) | Recommendations
SNPs | VCF; BAM/SAM, BED, VARSCAN, VEP | VCF files generated by using the survey sequences of IWGSC + metadata about VCF files to enrich the information about the SNPs
Genome annotations | GenBank Flat File, General Feature Format (GFF), EMBL | GFF3 + specifications regarding the description of specific columns
Germplasms | MCPD, ABCD, Darwin Core, Darwin Core Germplasm; GRIN-Global; tabulated | MCPD
Gene expression | Many format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress | Existing format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress + ENA
Physical maps | GFF; Cmap, fpc | GFF3
Genetic maps | Cmap, gnpmap | GFF3 (to be confirmed)
Phenotypes | Drops, ped, isa-tab, ephesis; tabulated | ISA-Tab
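Recommendations like these are easy to encode as a lookup that a submission tool could consult. The sketch below covers only four rows of the table (SNPs, genome annotations, physical maps, phenotypes); the function `check_format` and its messages are hypothetical and not part of any WDI deliverable.

```python
# Subset of the recommendations, keyed by lower-cased data type.
RECOMMENDED = {
    "snps": "VCF",
    "genome annotations": "GFF3",
    "physical maps": "GFF3",
    "phenotypes": "ISA-Tab",
}


def check_format(data_type, declared_format):
    """Compare a declared file format with the recommended one for its data type."""
    key = data_type.strip().lower()
    if key not in RECOMMENDED:
        return f"no recommendation recorded for {data_type!r}"
    recommended = RECOMMENDED[key]
    if declared_format.strip().lower() == recommended.lower():
        return "ok"
    return f"convert {declared_format} to {recommended}"


print(check_format("SNPs", "vcf"))
print(check_format("Phenotypes", "ped"))
print(check_format("Proteomics", "mzML"))
```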
Examples of use cases
Title: Searching for germplasm with specific traits
Description: Example of searching for germplasm with specific traits - tagged with ontology terms?
Data types: Germplasm, Phenotype
Challenges:
● Metadata very important ~ standardized format
● Association of genes to traits, linked to germplasm, marker information
● Need for quality controls - how confident are you of the data source?
● Provenance of the germplasm - pedigree, ownership
● Standard system for tracking germplasm, names
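A search for germplasm tagged with ontology terms, as in this use case, could look like the sketch below. It assumes each accession already carries a set of trait term identifiers and a recorded source; the accession names, sources and `TRAIT:` identifiers are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Accession:
    name: str
    source: str  # provenance: where the record came from
    trait_terms: set = field(default_factory=set)


def search_by_traits(accessions, required_terms):
    """Return accessions annotated with every required trait term."""
    required = set(required_terms)
    return [a for a in accessions if required <= a.trait_terms]


# Hypothetical accessions tagged with hypothetical ontology term identifiers.
accessions = [
    Accession("ACC-001", "genebank A", {"TRAIT:drought_tolerance", "TRAIT:rust_resistance"}),
    Accession("ACC-002", "genebank B", {"TRAIT:rust_resistance"}),
    Accession("ACC-003", "genebank A"),
]

hits = search_by_traits(accessions, ["TRAIT:rust_resistance"])
print([(a.name, a.source) for a in hits])
```

Returning the source alongside each hit keeps provenance visible, which the challenges above single out as a concern.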
Title: Identification of wheat genes that control root growth
Description: Requires annotated genes (Gene Ontology, PFam, and other functional annotation)
Data types: Genomic annotations? - Gene location? (IWGS-SS ID or MIPS HCS link)
Challenges: Mapping between wheat genes and orthologs from other species (deduce function by seq. similarity); access to RNASeq data (genes that are not expressed in roots may be irrelevant); mapping of wheat genes and information on their function based on literature
Title: Query on trial data associated with varieties
Data types: Phenotypic data, GIS data, (wheat economy/production data)
Description: To search wheat varieties with distribution maps, production figures, performances in wheat mega environments, associated projects worldwide plus layers of climatic data on specific wheat production areas and disease prevention information.
Challenges: Phenotypic data should be linked to GIS data. Using keywords or ontology terms, a system or a tool should be able to pull out such information from different websites/systems developed by the wheat community.
Project concept
[Figure: input files pass through a Galaxy pipeline running on a VM and produce output files. The pipeline sits alongside the WheatIS Core (file repository, web portal, integrated DB, search, standards, WS), which is linked through distributed storage to WheatIS nodes, each with its own DB indexes and storage.]
Data Integration / data mining Workbench
[Figure: workbench components are QueryDB, QueryDB indexes, RDF, external DB, data banks and user files; the actors are the user and the developer.]
Genome analysis
•Gene structure annotation
•SNPs, short Indels
•RNAseq analysis
Functional analysis
•Integration with gene networks, semantic data integration
•Prioritize candidate genes in QTL regions
Analysis of non-coding regions and their functional impact
•Repeat detection and annotation
•Transcriptomics and epigenetics analysis
GWAS analysis and QTL identification
• regions/genes important for agronomical traits
QTL analysis and wheat genetic selection
•Map QTL and markers on genome sequence
•Provide list of genotypes for breeding
Phenotype modeling
•GxE interaction
Nanochannel optical mapping
•Assembly, visualisation
•Large structural variation analysis
Durum wheat
•Datasets extraction from bread wheat data
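Two of the analyses listed above, prioritizing candidate genes in QTL regions and mapping QTL and markers on the genome sequence, come down to interval overlap on shared coordinates. The following is a minimal sketch with invented coordinates and names; `Feature`, `overlaps` and `genes_in_qtl` are hypothetical helpers, and real prioritization would add functional evidence on top.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(a, b):
    """True when two features share at least one base on the same chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def genes_in_qtl(qtl, genes):
    """List the genes whose coordinates overlap the QTL interval."""
    return [g.name for g in genes if overlaps(qtl, g)]


# Hypothetical QTL interval and gene models.
qtl = Feature("qtl_example", "3B", 1_000, 5_000)
genes = [
    Feature("gene_a", "3B", 900, 1_100),    # straddles the left boundary
    Feature("gene_b", "3B", 5_001, 6_000),  # starts just past the interval
    Feature("gene_c", "1A", 2_000, 3_000),  # different chromosome
    Feature("gene_d", "3B", 4_000, 4_500),  # fully inside
]

candidates = genes_in_qtl(qtl, genes)
print(candidates)
```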
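Towards an e-Infrastructure for predictive biology
[Figure: datacenter(s) host the Query DB, the integrating DB and computation. Databases and tools supply integrated data; users' data and methods/models feed in and predictions come out. Developers contribute to the infrastructure.]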
Acknowledgments
• WheatIS expert working group
– Chair: Hadi Quesneville
– Co-chairs: Dave Edwards, Gerard Lazo, Mario Caccamo
– Members: Michael Alaux, Ruth Bastow, Ute Baumann, Fran Clarke, Jorge Dubcovsky, David Edwards, Takeshi Itoh, Paul Kersey, David Marshall, Kate Dreher, Dave Matthews, Klaus Mayer, Amidou N’Diaye, Chris Rawlings, Franck Rober, Doreen Ware.
• RDA Wheat Data Interoperability Working Group
– Esther Dzalé Yeumo Kadoré, Richard Fulss et al.
• TransPLANT
– Paul Kersey et al.