WheatIS: Genetics and Genomics Virtual Research Environment for the Wheat research community. Plant Genetics Congress, 11th-12th May 2015, London

Wheat Information System

Wheat Initiative

Reinforce synergies between bread and durum wheat national and international research programs to increase food security

Coordinate worldwide research efforts in the fields of wheat genetics, genomics, physiology, breeding and agronomy

Foster communication between the research community, funders and global policy makers at the international level to meet their research and development goals

Sharing resources, methods and expertise to improve and stabilize yields

Meeting of G20 Agriculture Ministers www.wheatinitiative.org

WheatIS Expert Working Group

The Wheat Initiative steering board commissioned a WheatIS Expert Working Group (EWG):

• Build projects

• Build infrastructure

• Report to the Wheat Initiative

WheatIS Goals

Provide the wheat research community with a single entry point of access to genetic and genomics resources.

Promote the development of services on top of current wheat / Triticeae databases.

Authority to define guidelines for data curation, nomenclature, standards and integration.

Registry for bioinformatics tools.

Project Principles

• Federated network of platforms

• Ensure each platform sustainability through promoting their visibility and adoption.

• WheatIS will not replace the current wheat databases

Build collectively to better cope with the needs of the community

• Evolving infrastructure

• Rely on a distributed system.

• Use of Virtual Machine and Cloud Computing technologies to easily share data and tools.

Incremental implementation to rapidly build an operational site.

• WheatIS to support the adoption of standards and the integration of data

• Gene nomenclature

• Best-practice guidelines

Promote data exchange following an open-access model.

WheatIS Architecture

[Diagram: the WheatIS Core (file repository, web portal, search, integrated DB) is linked through distributed storage to the WheatIS nodes, each with its own DB, indexes and storage.]

WheatIS nodes (#12)

[Node labels shown: CerealsDB, UCW]

A bioinformatic infrastructure for wheat improvement

Genome

• Annotation

• Transcriptome

• SNP / Structural variant

Genetics

• Genetic maps

• Genetic markers

• Genetic resources

Phenotypes

• GxE

• QTL maps

• GWAS

WheatIS

Integration process

Get data

• Extract

• Transform

• Load

Integrative DB

• Consistency

• Persistency

• Efficient storage

Link data, build data sets

QueryDB

• No persistency

• Analysis oriented

• Efficient query
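The integration process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker export, its column names and the table layout are hypothetical, SQLite stands in for the integrative DB, and a plain dictionary stands in for the QueryDB. The slides do not name the technologies actually used.

```python
import csv
import io
import sqlite3

# Hypothetical tabular export from one node; names and values are invented.
RAW = "marker,chrom,pos\nwsnp_1, 1a ,1200\nwsnp_2,2B,not_available\nwsnp_3,3D,5300\n"

def extract(text):
    """Extract: read raw rows from a node's export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise values and skip rows that fail a consistency check."""
    clean = []
    for row in rows:
        pos = row["pos"].strip()
        if not pos.isdigit():
            continue  # inconsistent record, not loaded
        clean.append((row["marker"].strip(), row["chrom"].strip().upper(), int(pos)))
    return clean

def load(conn, records):
    """Load: persist the records in the integrative DB (assumed to be relational)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS marker (name TEXT PRIMARY KEY, chrom TEXT, pos INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO marker VALUES (?, ?, ?)", records)
    conn.commit()

def build_query_view(conn):
    """Build a non-persistent, analysis-oriented view that can be rebuilt at any time."""
    view = {}
    for name, chrom, pos in conn.execute("SELECT name, chrom, pos FROM marker ORDER BY pos"):
        view.setdefault(chrom, []).append((name, pos))
    return view

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
query_view = build_query_view(conn)
print(query_view)
```

The split mirrors the two stores on the slide: the integrative DB is the consistent, persistent copy, while the query view is disposable and shaped for fast lookups.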

wheatis.org

DATA DISCOVERY

Data discovery in distributed databases

[Diagram labels: CerealsDB, UCW, data model]
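One way to picture data discovery across distributed databases is a search that fans out to each node's index and merges the hits. The sketch below is an assumption-laden illustration: the node names, index contents and record identifiers are invented, and in-memory dictionaries replace the remote indexes that a real deployment would query over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node indexes: lower-cased keyword -> record identifiers.
NODE_INDEXES = {
    "node_a": {"rht-b1": ["a:gene/17"], "yield": ["a:qtl/3", "a:qtl/9"]},
    "node_b": {"yield": ["b:trial/42"]},
    "node_c": {},
}

def search_node(node, keyword):
    """Look up one keyword in a single node's index."""
    index = NODE_INDEXES[node]
    return [(node, record) for record in index.get(keyword.lower(), [])]

def federated_search(keyword):
    """Query every node concurrently and merge the hits in node-name order."""
    nodes = sorted(NODE_INDEXES)
    with ThreadPoolExecutor() as pool:
        per_node = list(pool.map(lambda node: search_node(node, keyword), nodes))
    return [hit for hits in per_node for hit in hits]

hits = federated_search("Yield")
print(hits)
```

Each hit keeps the name of the node it came from, so a portal could link back to the source database instead of copying its records.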

Develop semantic web interoperability

Ontology based annotation of database schemas

RDF triple store modeling of the databases schemas

Semantic web services

Integrated dispersed data
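To make the idea of ontology-based schema annotation concrete, here is a minimal sketch that stores schema annotations as subject-predicate-object triples and finds the columns in different databases that carry the same ontology term. Everything in it is hypothetical: the database, column and term identifiers are invented, and a Python set stands in for a real RDF triple store.

```python
# Triples are (subject, predicate, object); all identifiers are illustrative.
TRIPLES = {
    ("db1:marker.chrom", "rdf:type", "schema:Column"),
    ("db1:marker.chrom", "annotatedWith", "onto:Chromosome"),
    ("db2:snp.chr_name", "rdf:type", "schema:Column"),
    ("db2:snp.chr_name", "annotatedWith", "onto:Chromosome"),
    ("db2:snp.allele", "annotatedWith", "onto:Allele"),
}

def match(pattern, triples=TRIPLES):
    """Return the triples matching a pattern in which None acts as a wildcard."""
    return sorted(
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    )

def columns_for(term):
    """List every schema column annotated with the given ontology term."""
    return [subject for subject, _, _ in match((None, "annotatedWith", term))]

print(columns_for("onto:Chromosome"))
```

Two columns with different local names resolve to the same term, which is the hook a semantic web service could use to query dispersed data uniformly.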

DATA STANDARDS

RDA Wheat Data Interoperability

Working Group

The WDI working group in brief

Endorsed by RDA in March 2014

Members: ~30 members and 15 active members (wheat scientists, data and metadata technologists)

The goal: contribute to the improvement of wheat-related data interoperability by

Building a common interoperability framework (metadata, data formats and vocabularies)

Providing guidelines for describing, representing and linking Wheat related data

The results (so far)

Surveys

•Landscape of Wheat related standards and their use by the community

•Comprehensive overview of Wheat related ontologies and vocabularies

Workshops

•Recommendations

•Mappings between different data formats

•Actions to conduct in order to improve the current level of Wheat related data interoperability

• Interoperability use cases

Implementation

• Interactive cookbook: recommendations + guidelines

•A repository of Wheat related linked vocabularies (Bioportal)

Data type | Data formats currently used (standardized, tool-specific, non-standardized) | Recommendations

SNPs | VCF, BAM/SAM, BED, VARSCAN, VEP | VCF files generated by using the survey sequences of IWGSC + metadata about VCF files to enrich the information about the SNPs

Genome annotations | GenBank flat file, General Feature Format (GFF), EMBL | GFF3 + specifications regarding the description of specific columns

Germplasms | MCPD, ABCD, Darwin Core, Darwin Core Germplasm, Grin Global, tabulated | MCPD

Gene expression | Many format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress | Existing format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress + ENA

Physical maps | GFF, Cmap, fpc | GFF3

Genetic maps | Cmap, gnpmap | GFF3 (to be confirmed)

Phenotypes | Drops, ped, ISA-Tab, ephesis, tabulated | ISA-Tab
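The GFF3 recommendation for genome annotations above refers to a format with nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). As a rough illustration of what describing specific columns involves, the sketch below parses one feature line. The sample line and its identifiers are invented, and multi-valued (comma-separated) attributes are deliberately left unsplit.

```python
from urllib.parse import unquote

GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)

def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for comments, directives and blanks."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])  # 1-based, inclusive coordinates
    record["end"] = int(record["end"])
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if pair:
                key, _, value = pair.partition("=")
                attributes[unquote(key)] = unquote(value)  # undo percent-encoding
    record["attributes"] = attributes
    return record

# Hypothetical feature line, not taken from any real annotation.
example = "chr1A\texample_source\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=hypothetical%20gene"
gene = parse_gff3_line(example)
print(gene["type"], gene["start"], gene["end"], gene["attributes"])
```

A shared agreement on which attribute keys are mandatory, and on how the source and type columns are filled, is the kind of specification the recommendation points to.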

Examples of use cases

Title Searching for germplasm with specific traits

Description Example of searching for germplasm with specific traits - tagged with ontology terms?

Data types Germplasm

Phenotype

Challenges ● Metadata very important ~ standardized format

● Association of genes to traits, linked to germplasm, marker information

● Need for quality controls: how confident are you of the data source?

● Provenance of the germplasm: pedigree, ownership

● Standard system for tracking germplasm, names
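The germplasm use case above hinges on records being tagged with ontology terms. A minimal sketch of such a search follows, assuming an invented trait hierarchy and invented accessions. A query for a broad term also returns accessions tagged with any of its more specific descendants.

```python
# Hypothetical trait ontology as child -> parent links; the term IDs are invented.
PARENT = {
    "TRAIT:0007": "TRAIT:0002",  # e.g. a specific resistance under "disease resistance"
    "TRAIT:0008": "TRAIT:0002",
}

# Hypothetical germplasm records tagged with trait terms.
GERMPLASM = [
    {"accession": "ACC-001", "traits": {"TRAIT:0001", "TRAIT:0007"}},
    {"accession": "ACC-002", "traits": {"TRAIT:0008"}},
    {"accession": "ACC-003", "traits": {"TRAIT:0001"}},
]

def lineage(term):
    """Return the term itself followed by all of its ancestors."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def search_germplasm(term):
    """Find accessions tagged with the term or with any of its descendants."""
    return sorted(
        record["accession"]
        for record in GERMPLASM
        if any(term in lineage(tag) for tag in record["traits"])
    )

print(search_germplasm("TRAIT:0002"))
```

The search only works as well as the tagging, which is why the challenges listed above stress standardized metadata and reliable germplasm identifiers.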

Title Identification of wheat genes that control root growth

Description Requires: Annotated genes (Gene Ontology, PFam, and other functional annotation)

Data types Genomic annotations? - Gene location ? (IWGS-SS ID or MIPS HCS link)

Challenges Mapping between wheat genes and orthologs from other species (deduce function by sequence similarity); access to RNA-Seq data (genes that are not expressed in roots may be irrelevant); mapping of wheat genes and information on their function based on literature

Title Query on trial data associated with varieties

Data types Phenotypic data, GIS data, (wheat economy/production data)

Description To search wheat varieties with distribution maps, production figures, performances in wheat mega-environments, associated projects worldwide, plus layers of climatic data on specific wheat production areas and disease prevention information.

Challenges Phenotypic data should be linked to GIS data. Using keywords or ontology terms, a system or tool should be able to pull such information from the different websites/systems developed by the wheat community.

COMPUTE INFRASTRUCTURE

Project concept

[Diagram: input files are run through a Galaxy pipeline in a VM to produce output files, on top of the WheatIS architecture: WheatIS Core (file repository, web portal, search, integrated DB, standards, WS) and WheatIS nodes (DB, indexes, storage) linked by distributed storage.]

Data Integration / data mining Workbench

[Diagram labels: QueryDB, QueryDB indexes, RDF, external DB, data banks, user files, user, developer]

Genome analysis

•Gene structure annotation

•SNPs, short Indels

•RNAseq analysis

Functional analysis

•Integration with gene networks, semantic data integration

•Prioritize candidate genes in QTL regions

Analysis of non-coding regions and their functional impact

•Repeat detection and annotation

•Transcriptomics and epigenetics analysis

GWAS analysis and QTL identification

• regions/genes important for agronomical traits

QTL analysis and wheat genetic selection

•Map QTL and markers on genome sequence

•Provide list of genotypes for breeding

Phenotype modeling

•GxE interaction

Nanochannel optical mapping

•Assembly, visualisation

•Large structural variation analysis

Durum wheat

•Datasets extraction from bread wheat data

Towards an e-Infrastructure for predictive biology

[Diagram labels: datacenter(s), Query DB, Integrating DB, integrated data, databases, tools, computation, methods/models, users' data, predictions, developer]

Acknowledgments

• WheatIS expert working group

– Chair: Hadi Quesneville

– Co-chairs: Dave Edwards, Gerard Lazo, Mario Caccamo

– Members: Michael Alaux, Ruth Bastow, Ute Baumann, Fran Clarke, Jorge Dubcovsky, David Edwards, Takeshi Itoh, Paul Kersey, David Marshall, Kate Dreher, Dave Matthews, Klaus Mayer, Amidou N’Diaye, Chris Rawlings, Franck Rober, Doreen Ware.

• RDA Wheat Data Interoperability Working Group

– Esther Dzalé Yeumo Kadoré, Richard Fulss et al.

• TransPLANT

– Paul Kersey et al.