CASIMIR Networking Meeting Heathrow, July 2007 CASIMIR WP4 Data Representation John Hancock Duncan Davidson

Objectives

• Assessment of technical aspects of database interoperability as a barrier to scientific and financial sustainability

• Assessment of the variability of practice in the semantics of biological data representation, e.g. genotype, gene expression

• Assessment of emerging standards and current practice for data representation, annotation and ontologies

• 4.1 - D9 - Classified list of data representations in European mouse-centric and related databases

• 4.4 - Network meeting 1 - June-Sep 07 - Bring together bioinformatics reps from (EU-funded) mouse projects to discuss data representation

• 4.4 - Joint work package meeting to discuss results (4-5 Oct 07)

• 4.5 - Sep - Dec 07 - Report of network meeting

• 4.6 - Present conclusions at meetings

Discussion Points

• What do we understand by “data representation” - is it just CVs/Ontologies?
– Interaction with other work packages

• What kinds of data?

• What ontologies? How many on the PRIME list do you use? Do you use others? Do you use OBO ontologies by default?

• What processes are they involved in elsewhere to discuss/unify data representation?

Future: Cross-Species Interactions

• Mouse-Human must be a priority because of the disease angle

• Mouse-Rat - already quite well integrated (to what extent?) because of MGI-RGD-OBO interactions

• Other important models
– Chick (ChickEST (UK), ChickVD (CN), Ensembl, others?)
– Xenopus
– Zebrafish
– Drosophila
– C. elegans
– Yeast, E. coli

• In longer term get together with community reps to discuss similarities & differences

Extant Resources

• PRIME Expert Group Report and Outcomes

• Euromouse

• Interphenome discussion group & pilots

• EUMORPHIA/EUMODIC bioinformaticians

PRIME Expert Group

• Draft lists of:
– Databases
– Ontologies

Interphenome

• Phenotype data:

– Common data description

– Common protocol description

– Standard for data exchange

Interphenome - Current Status

• Ontologies
– Investigate cross-mapping of current approaches and eventual possible convergence (?)

• Protocols
– Work on developing a format that can accommodate all information needed for a protocol
– Encode this as an XML schema
– PPML?

• Data Exchange
– Work on an XML schema that will allow structured exchange of phenotype data and metadata - started work on this in EUMODIC

Publication in Mammalian Genome 18, 157-163 (March 2007): “Integration of Mouse Phenome Data Resources” by The Mouse Phenotype Database Integration Consortium
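
To make the idea of structured exchange of phenotype data and metadata more concrete, here is a minimal Python sketch of one observation written to XML and read back. It is not the EUMODIC or Interphenome schema: every element and attribute name, the protocol and term identifiers, and the measurement value are invented placeholders for illustration only.

```python
import xml.etree.ElementTree as ET


def build_phenotype_record(strain, protocol_id, parameter, value, unit, ontology_term):
    """Build one phenotype observation as an XML element.

    All element and attribute names here are hypothetical; a real
    exchange format would be defined by an agreed XML schema.
    """
    record = ET.Element("phenotypeRecord")
    ET.SubElement(record, "strain").text = strain
    # Metadata: which protocol produced the measurement.
    ET.SubElement(record, "protocol", attrib={"id": protocol_id})
    # Data: the measured value with its parameter name and unit.
    measurement = ET.SubElement(
        record, "measurement", attrib={"parameter": parameter, "unit": unit}
    )
    measurement.text = str(value)
    # Annotation: an ontology term describing the observation.
    ET.SubElement(record, "annotation", attrib={"term": ontology_term})
    return record


def parse_phenotype_record(xml_text):
    """Read the hypothetical record back into a plain dictionary."""
    root = ET.fromstring(xml_text)
    measurement = root.find("measurement")
    return {
        "strain": root.findtext("strain"),
        "protocol_id": root.find("protocol").get("id"),
        "parameter": measurement.get("parameter"),
        "unit": measurement.get("unit"),
        "value": float(measurement.text),
        "ontology_term": root.find("annotation").get("term"),
    }


# Made-up example values, not real data.
record = build_phenotype_record(
    "C57BL/6J", "PROTO-001", "body_weight", 24.3, "g", "EXAMPLE:0000001"
)
xml_text = ET.tostring(record, encoding="unicode")
round_trip = parse_phenotype_record(xml_text)
```

The point of the round trip is that a receiving database can recover the value, its unit, the protocol and the ontology annotation without knowing anything about the sender's internal tables.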

WP4 - 1st Actions

• Update the PRIME list of European mouse projects

• Also identify “mouse-related” projects

• Identify contacts

• To hold a meaningful dialogue, get as many as possible to a networking meeting

Ontologies - So Far

• We have a little list

• Test how many of these are actually in use - Questionnaire

• Check how up to date it is, and track developments (e.g. Relationships Ontology, potential Synapse Ontology)
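
Once questionnaire answers are in, the "how many are actually in use" check could be tallied along the following lines. This is only a sketch: the function name, the database names and the short candidate list are hypothetical placeholders, not the PRIME list or real responses.

```python
from collections import Counter


def tally_ontology_use(candidate_list, responses):
    """Count how many databases report using each listed ontology.

    candidate_list: ontology names from the list being tested.
    responses: mapping of database name -> ontologies it reports using.
    Returns (usage counts for listed ontologies, ontologies reported
    by respondents that are missing from the list).
    """
    counts = Counter()
    for ontologies in responses.values():
        # Use a set so one database counts at most once per ontology.
        counts.update(set(ontologies))
    in_use = {name: counts[name] for name in candidate_list}
    unlisted = sorted(set(counts) - set(candidate_list))
    return in_use, unlisted


# Placeholder inputs for illustration only.
candidates = ["GO", "MP", "MA"]
responses = {
    "database_a": ["GO", "MP"],
    "database_b": ["MP", "PATO"],
    "database_c": [],
}
in_use, unlisted = tally_ontology_use(candidates, responses)
```

Listed ontologies with a count of zero are candidates for pruning, and the unlisted names show where the list has fallen behind current practice.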

The CASIMIR Questionnaire

• http://www.casimir.org.uk/questionnaire.php

• 1a. Are you using a relational database, object database or flat files?

• 1b. If relational, what is your chosen RDBMS (Relational Database Management System)?

• 2a. Is your database providing external links to other on-line resources; possibly via URL/HTTP (if yes please name them)?

• 2b. Supported/Installed Web Services (if yes please name them)? Do you plan to install or develop web services in the near future?

The CASIMIR Questionnaire

• 3a. Please list the sorts of data entities you store (e.g. protein sequence data, mouse strain information etc...)

• 4a. Can you provide a brief explanatory description/schema of your data/data structure?

• 4b. Are you willing to provide an entity-relationship diagram, and would you be willing to provide it under an open source license?

The CASIMIR Questionnaire

• 5a. Are you currently using or do you intend to use any ontologies or controlled vocabularies to describe your data?

• 5b. Do you plan to expand your use of ontologies in future?

• 5c. Do you use OBO ontologies?

• 5d. Do you perceive the need for additional ontologies to serve your domain of knowledge?

The CASIMIR Questionnaire

• 6. Do you make use of Minimum Information standards (such as MIAME for microarray experiments) to describe any data? If so, which ones? If you do not make use of these standards, are you likely to do so in future?

Minimum Standards

• MIAME - Brazma et al. (2001) Nat. Genet. 29, 365-71

The CASIMIR Questionnaire

• 7. What do you perceive as the main limiting factor in data representation/interoperability etc. in European bioinformatics databases?

• 8. Do you have any comments/thoughts on standards for data representation that need to be developed or that you might like discussed in CASIMIR?

The CASIMIR Questionnaire

Please fill it in as soon as humanly possible!

We will be chasing around database coordinators over the next few months to make sure we have as much information as possible

Agenda for Today

• Reports from some databases:
– MUGEN - Christina Chandras
– EMMA - Glenn Proctor
– EUMODIC - Niels Adams
– EUCLIS - Eduardo Mendoza

• Discussion, e.g.
– Comments on the questionnaire/CASIMIR’s aims
– How to get widest possible participation
– What do people see as the main obstacles to the aim of integrating all this data?

Mouse to Human

[Diagram: Human DISEASE with its phenotypic attributes, and Mouse PHENOTYPING with its phenotypic measures]