4DXpress: a database for cross-species expression pattern comparisons


Published online 4 October 2007. Nucleic Acids Research, 2008, Vol. 36, Database issue D847–D853, doi:10.1093/nar/gkm797

Yannick Haudry1, Hugo Berube2, Ivica Letunic1, Paul-Daniel Weeber1, Julien Gagneur1, Charles Girardot1, Misha Kapushesky2, Detlev Arendt1, Peer Bork1, Alvis Brazma2, Eileen E. M. Furlong1, Joachim Wittbrodt1 and Thorsten Henrich1,*

1European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany and 2European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

Received August 14, 2007; Revised September 14, 2007; Accepted September 17, 2007

ABSTRACT

In the major animal model species such as mouse, fish or fly, detailed spatial information on gene expression over time can be acquired through whole mount in situ hybridization experiments. In these species, the expression patterns of many genes have been studied, and the data have been integrated into dedicated model organism databases such as ZFIN for zebrafish, MEPD for medaka, BDGP for Drosophila or GXD for mouse. However, a central repository that allows users to query and compare gene expression patterns across different species has not yet been established. We have therefore integrated expression patterns for zebrafish, Drosophila, medaka and mouse into a central public repository called 4DXpress (expression database in four dimensions). Users can query anatomy ontology-based expression annotations across species and quickly jump from one gene to its orthologues in other species. Genes are linked to public microarray data in ArrayExpress. We have mapped developmental stages between the species to be able to compare developmental time phases. We store the largest collection of gene expression patterns available to date in a single resource, covering 16 505 annotated genes. 4DXpress will be an invaluable tool for developmental as well as computational biologists interested in gene regulation and evolution. 4DXpress is available at http://ani.embl.de/4DXpress.

INTRODUCTION

Precise spatio-temporal gene expression is crucial during the development of an organism. Combinations of transcription factors give distinct identities to embryonic structures, tissues and cell types and trigger complex developmental processes such as embryonic patterning, morphogenesis and differentiation. Knowing the exact time and location of gene transcripts is essential when studying the functions of genes involved in developmental processes, as well as when trying to decipher the code of cis-regulatory modules. Expression localization data have therefore been gathered by dedicated model species databases such as ZFIN for zebrafish (1), BDGP (2) and FlyBase (3) for Drosophila, MEPD (4) for medaka, Aniseed for Ciona, XDB3 for Xenopus, and GXD (5) and EMAGE (6) for mouse. However, a central platform that allows users to compare gene expression across species has not yet been established. Such a resource would be invaluable not only for complementing missing expression information in one species with annotations made in other species, but also for studying the evolutionary origin of embryonic structures.

Here we provide a platform for a cross-species expression pattern resource. 4DXpress (expression database in four dimensions) stores images, which let biologists see and judge expression patterns, together with an organized annotation. It allows users to query the data and makes the data accessible to computational analysis. Our vision is that within a few years the exact localization of every single transcript will be known for the major model species. We hope that our resource will help to store these data in an organized way, to compare expression patterns between species and to provide tools to analyse them.

DATA INTEGRATION

Data integration is a major challenge of the project. Besides the differences between the model organisms themselves, databases provide gene expression data in different formats (flat files, SQL dumps, direct database access), and annotation has been done differently (screens, literature, curators).

*To whom correspondence should be addressed. Tel: +49 6221 387 516; Fax: +49 6221 387 166; Email: henrich@embl.de

© 2007 The Author(s)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Expression data

So far we have integrated expression data for zebrafish (1), Drosophila (2), medaka (4) and mouse (5). Table 1 gives an overview of the expression pattern annotations that have been integrated into 4DXpress. The best-annotated model species at the moment are Drosophila and zebrafish, with almost 6000 annotated genes each. Mouse follows with 3893 annotated genes; some of these annotations were made using a 3D virtual embryo (6).

Expression data have also been gathered differently. For medaka and Drosophila, most of the annotation results from a screen. Expression has been analysed at distinct time points, covering between 3 and 4 stages per gene on average (Table 1, stages per gene), whereas zebrafish expression patterns are additionally annotated from the literature by a team of database curators, and annotation is done for continuous developmental stages.

Anatomy ontologies are often very rich; however, only a limited fraction of their terms is actually used for expression annotation (Table 1, distinct anatomy terms). Again, ZFIN uses a rich vocabulary, with almost 700 distinct terms. The values for mouse and medaka need to be treated with care, as the ontologies used for annotation in these cases are the cross product of the anatomy and stage ontologies and therefore overestimate vocabulary richness.

Our database schema can store all information required by the MISFISHIE standard (minimum information specification for in situ hybridization and immunohistochemistry experiments) (7). This will allow us to efficiently integrate other model species as well as to develop a data exchange format to keep up to date with other resources.

Cross-species relationships

One of the major goals of our project is to be able to compare gene expression patterns between the different model species. To do so, relationships need to be established between genes (orthology), between time windows (developmental stages) and, most challengingly, between anatomical structures (homology/analogy).

Orthology mapping. EnsEMBL Compara (8) provides a reliable source of sequence homology relationships, computed using a tree-based approach. We have chosen to use it and to update it regularly with each new EnsEMBL release. We assigned each gene to a cluster of orthologues using the EnsEMBL notation for one2one, one2many and many2many orthology relationships. Through the web interface (described below), these clusters are visualized as a network, and homology relationships are used to sort the gene list retrieved by a query as well as to provide quick links from one gene to its orthologues in other species.
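The text does not spell out how clusters are formed from the pairwise relationships. As a minimal sketch, assuming that a cluster of orthologues is simply a connected component of the graph of pairwise orthology links, the grouping could be done with a union-find structure. The gene identifiers and relation labels below are hypothetical examples, not 4DXpress data.

```python
from collections import defaultdict


def orthologue_clusters(pairs):
    """Group genes into clusters of orthologues.

    Assumption (not stated in the source): a cluster is a connected component
    of the graph whose edges are pairwise orthology links. Every relation type
    ('one2one', 'one2many', 'many2many') joins its two genes into one cluster.
    """
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving keeps trees shallow
            gene = parent[gene]
        return gene

    for gene_a, gene_b, _relation in pairs:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    # Deterministic output: members sorted, clusters sorted by first member.
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


# Hypothetical links as (gene_a, gene_b, relation) tuples.
links = [
    ("mouse:Six3", "medaka:Six3", "one2one"),
    ("mouse:Six3", "fly:optix", "one2many"),
    ("fly:optix", "zebrafish:six3a", "one2many"),
    ("fly:optix", "zebrafish:six3b", "one2many"),
    ("fly:geneX", "mouse:geneY", "one2one"),
]
for cluster in orthologue_clusters(links):
    print(cluster)
```

Under this assumption the relation type only matters for display (for example as edge labels in the network view); for clustering, every link merges two components.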

Developmental stage mapping. It is very difficult to identify corresponding developmental stages in two species, even when comparing two closely related fish species such as medaka and zebrafish. For instance, in medaka the head and brain develop faster, whereas the tail and somites develop more slowly than in zebrafish. Consequently, the zebrafish stage that matches a given medaka stage by number of somites (a very popular staging feature) is earlier than the zebrafish stage that matches it by head features.

However, there are key events in development that allow researchers to define a list of eight stages, described in all developmental biology textbooks and common to all bilaterian animals: zygote, cleavage, blastula, gastrula, neurula, organogenesis, juvenile and adult. By mapping each species-specific stage onto one of these bilaterian stages, stages can be linked between species while avoiding a combinatorial explosion: a new species only needs to be mapped to the common stages (Figure 1, top right) and not against all stages of all other species (Figure 1, top left).

Obviously, temporal resolution is lost when a list of 40 developmental stages is mapped onto only eight common stages, but these eight stages seem to be the largest set shared by all bilaterian species, and they represent the key events in the development of an organism. To retain high temporal resolution, the original species-specific stage annotation is not replaced by the stage mapping terms. Instead, the stage mapping establishes temporal relationships that can be used for cross-species queries.
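The hub-and-spoke idea behind the stage mapping can be sketched in a few lines. The mapping tables below are hypothetical and heavily abbreviated (they are not the 4DXpress tables); the species-specific stage names stay as keys, so the original annotation is kept alongside the common stage.

```python
# The eight stages common to all bilaterian animals, in developmental order.
BILATERIAN_STAGES = ["zygote", "cleavage", "blastula", "gastrula",
                     "neurula", "organogenesis", "juvenile", "adult"]

# Hypothetical, abbreviated mapping tables (illustration only). Species-specific
# stage names are kept as keys, so the original, finer-grained annotation survives.
STAGE_MAP = {
    "zebrafish": {"1-cell": "zygote", "8-cell": "cleavage",
                  "Sphere": "blastula", "Shield": "gastrula"},
    "mouse": {"TS1": "zygote", "TS2": "cleavage", "TS10": "gastrula"},
}

# Every species stage must point at one of the common bilaterian stages.
for species, table in STAGE_MAP.items():
    for stage, common in table.items():
        assert common in BILATERIAN_STAGES, (species, stage)


def corresponding_stages(species_a, stage, species_b):
    """Stages of species_b that share a common bilaterian stage with `stage`."""
    common = STAGE_MAP[species_a][stage]
    return sorted(s for s, c in STAGE_MAP[species_b].items() if c == common)


def mapping_tables_needed(n_species):
    """(tables via the common stages, species pairs to map directly)."""
    return n_species, n_species * (n_species - 1) // 2


print(corresponding_stages("zebrafish", "Shield", "mouse"))  # ['TS10']
print(mapping_tables_needed(5))  # (5, 10)
```

With the five species of Figure 1, mapping through the common stages needs five tables, whereas direct mapping needs ten species pairs, each relating every stage of one species to every stage of the other; adding a sixth species costs one table instead of five new pairs.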

Anatomy mapping. Anatomy mapping will be an ongoing process, just as there is an ongoing debate in the scientific community about which structures can be defined as homologous. We have not yet carried out a complete anatomy mapping, but we have set up the resources and tools for doing so. Evidence from different analyses will need to be integrated to approach this problem. Lexical, anatomy-structure and co-expression cues can be used to establish relationships between anatomical terms. The first two cues can be obtained simply by comparing the anatomy ontologies available for the model species (9). To include co-expression, we are currently examining conserved network patterns in species-specific co-expression networks via orthology relationships. Users can already exploit lexical cues using the term-based expression search (described below).

Table 1. Content of 4DXpress. Annotation status of gene expression patterns at present time

Source              Genes   Stages  Stages/gene  Anatomy terms  Terms/gene  Terms/stage  Distinct anatomy terms
Drosophila (BDGP)    5951   21 048         3.54         29 867        5.02         1.42  288
Medaka (MEPD)         882     2746         3.11           5047        5.72         1.84  338
Zebrafish (ZFIN)     5779  102 671        17.77        178 851       30.95         1.74  694
Mouse (MGI)          3893   12 799         3.29         17 291        4.44         1.35  1661
Total              16 505  139 264         8.44        231 056       14.00         1.66  2981

Stages and Anatomy terms are the total numbers of gene–stage and anatomy-term annotations; Stages/gene, Terms/gene (anatomy terms per gene) and Terms/stage (anatomy terms per stage) are ratios of these totals.
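The ratio columns of Table 1 can be recomputed from the raw totals. This short check is an illustration, not part of 4DXpress; it assumes, as the numbers imply, that "Stages" and "Anatomy terms" are total counts of gene–stage and anatomy-term annotations.

```python
# Raw totals from Table 1: (genes, gene-stage annotations, anatomy-term annotations).
TABLE1 = {
    "Drosophila (BDGP)": (5951, 21048, 29867),
    "Medaka (MEPD)": (882, 2746, 5047),
    "Zebrafish (ZFIN)": (5779, 102671, 178851),
    "Mouse (MGI)": (3893, 12799, 17291),
}


def ratio_columns(genes, stages, terms):
    """Stages/gene, anatomy terms/gene and anatomy terms/stage, rounded to 2 decimals."""
    return round(stages / genes, 2), round(terms / genes, 2), round(terms / stages, 2)


for source, counts in TABLE1.items():
    print(source, ratio_columns(*counts))

# The Total row is the column-wise sum of the species rows.
total = tuple(sum(column) for column in zip(*TABLE1.values()))
print("Total", total, ratio_columns(*total))
```

The high zebrafish value for stages per gene reflects the continuous stage annotation described above, compared with the 3 to 4 distinct time points per gene in the screen-based data sets.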

The Common Anatomy Reference Ontology (CARO) is being developed to facilitate interoperability between existing anatomy ontologies for different species. It aims to provide a template for building new anatomy ontologies. We think CARO could serve as a template for building an anatomy ontology shared by all bilaterians. As with the stage mapping, we then want to map species-specific anatomy terms onto this common ontology.

4D ArrayExpress data warehouse

Expression data acquired through in situ hybridization, antibody staining or transgenic reporter expression can be complemented by microarray data. The former methods provide high-resolution data in both space and time, which microarray data cannot provide; microarray experiments, however, can quickly give a quantitative overview of the expression of all genes in a genome. Time series, which provide insight into expression changes during development, are especially useful. For this reason we have set up a complementary project at ArrayExpress (10), which stores corresponding microarray data. The project is called 4D ArrayExpress data warehouse (4DDW) and is accessible at http://www.ebi.ac.uk/microarray-as/4DDW_EMBL/. The 4DDW will be described in detail elsewhere.

So far we have established 4737 reciprocal links for mouse, Drosophila and zebrafish genes. When querying microarray data at the 4DDW, users can quickly go to 4DXpress and vice versa. The close linkage of these two resources allows researchers, for example, to quickly examine the expression patterns of a list of genes that cluster together in a microarray experiment.

Expression similarity

Expression patterns within a species can easily be compared by representing the expression annotation as a binary vector (1 for expressed, 0 for not expressed). Different methods can be applied to calculate the similarity between these vectors. As a start, we have chosen the Jaccard coefficient as the similarity measure; it is simple to calculate and was used for the same purpose in the first BDGP release (2).

The Jaccard distance was calculated between the expression vectors of gene pairs. The binary expression vector was compiled considering both stage and anatomy: if a gene is expressed (has a positive annotation) at a given stage in a given anatomical structure, the corresponding vector value is set to true, otherwise to false.

The Jaccard similarity coefficient is defined as the size of the intersection divided by the size of the union of the sample vectors:

Jaccard similarity coefficient: $J(A,B) = \dfrac{|A \cap B|}{|A \cup B|}$

Jaccard distance: $J_\delta(A,B) = 1 - J(A,B)$
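The two definitions translate directly into code. In this minimal sketch each gene's binary vector is stored as the set of (stage, anatomy term) positions that are true, which is equivalent to the vector but avoids materializing the full stage × anatomy grid. The annotations are hypothetical, and returning 0.0 for two empty annotations is an assumption the text does not address.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B|.

    Each set holds the (stage, anatomy term) positions whose binary
    expression value is true, which is equivalent to the binary vector.
    """
    union = a | b
    if not union:
        return 0.0  # assumption: two empty annotations are not called similar
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance J_delta(A, B) = 1 - J(A, B)."""
    return 1.0 - jaccard(a, b)


# Hypothetical annotations for two genes of the same species.
gene_a = {("gastrula", "mesoderm"), ("neurula", "neural plate"), ("neurula", "somite")}
gene_b = {("neurula", "neural plate"), ("neurula", "somite"), ("organogenesis", "eye")}
print(jaccard(gene_a, gene_b), jaccard_distance(gene_a, gene_b))  # 0.5 0.5
```

Here two positions are shared out of four distinct positions overall, giving J = 2/4 = 0.5.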

The Jaccard distance is intended to estimate how different two expression patterns are. However, its value depends on the extent and quality of the expression annotation, so where annotations are incomplete or inconsistent this measure might be misleading. In addition, the method treats all anatomical structures equally: relations defined in the anatomy ontology are not taken into account. In future we will provide additional similarity measures, such as semantic similarity, that account for these relations.

[Figure 1 (image not reproduced): top left, direct mapping between the stages of all species; top right, mapping of each species' stages onto the common Bilateria stages (zygote, cleavage, blastula, gastrula, neurula, organogenesis, juvenile, adult). Species shown: mouse, Platynereis, medaka, zebrafish and Drosophila; panels list the zebrafish and mouse stages.]

Figure 1. Developmental stages were mapped between species via a list of stages common to all bilaterian animals.


Still, the Jaccard distance provides a quick and easy way to identify similarly annotated genes. The values are stored in the database and help users to find genes with similar expression patterns within a species. This measure can also be used to cluster genes with similar expression pattern annotations, as shown for Drosophila (2). We use these similarity relationships to generate co-expression networks and plan to search for conserved network patterns across species using orthology relationships.
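Both uses of the stored similarity values, listing the most similar genes of a species and turning similarities into a co-expression network, can be sketched as follows. The gene names, annotations and the 0.5 edge threshold are hypothetical; the text does not say how 4DXpress chooses a cut-off.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of (stage, anatomy) annotations."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query, annotations, k=3):
    """Top-k genes of the same species ranked by Jaccard similarity to `query`."""
    scored = [(jaccard(annotations[query], annotations[g]), g)
              for g in annotations if g != query]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best first, ties by name
    return scored[:k]


def coexpression_edges(annotations, threshold=0.5):
    """Gene pairs whose similarity reaches a (hypothetical) threshold."""
    return [(a, b) for a, b in combinations(sorted(annotations), 2)
            if jaccard(annotations[a], annotations[b]) >= threshold]


# Hypothetical within-species annotations.
ann = {
    "geneA": {("st1", "eye"), ("st2", "eye"), ("st2", "brain")},
    "geneB": {("st1", "eye"), ("st2", "eye")},
    "geneC": {("st2", "brain"), ("st3", "gut")},
    "geneD": {("st3", "gut")},
}
print(most_similar("geneA", ann, k=2))
print(coexpression_edges(ann))  # [('geneA', 'geneB'), ('geneC', 'geneD')]
```

Computing all pairs is quadratic in the number of genes, which is one reason to precompute and store the values as described above rather than calculating them per query.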

CROSS-SPECIES OVERLAP

Model species differ in morphology and function, in their accessibility to molecular methods, in their genomic and computational resources, and in the size of the scientific community working on them. This is reflected in the number of genes annotated for each model species throughout development (Figure 2). Although mouse has a large scientific community, it does not produce offspring in amounts comparable to the egg-laying fish and Drosophila, and its embryos develop internally. This results in a smaller number of annotated genes, albeit with high-quality annotation. Zebrafish and Drosophila are the most complete data sets, but they were compiled very differently: whereas the Drosophila data were acquired in a single screen with only a few annotators, the zebrafish data were collected from several large-scale screens and from expression patterns annotated from the literature.

When comparing expression data, it is important to examine the data overlap between species. For example, how many genes are annotated at corresponding developmental stages? And how many orthologues have expression annotation?

In Figure 2, corresponding developmental stages are marked with the same colour (stage mapping as described above). Zebrafish annotation spans most developmental stages, and the largest temporal overlap exists with Drosophila (from cleavage to organogenesis). Neurula and organogenesis are the best-annotated stages in all four species and are therefore the most promising for comparison.

Besides temporal overlap, we need orthology overlap to be able to compare expression patterns across species. The overlap between annotated genes for the different species combinations is shown in Table 2. These numbers become particularly important for global computational analyses. For two-species comparisons, zebrafish and Drosophila annotations yield the largest overlap, with 964 annotated orthologous groups; for three-species comparisons, mouse should be added (336 groups shared by mouse, zebrafish and Drosophila).
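Counts such as those in Table 2 can be derived from the orthologous groups and the set of annotated genes. This is a sketch under an assumption the text does not confirm: a group is counted for a species combination when every species in the combination has at least one annotated member, regardless of which other species are present (the table is consistent with this, since each larger combination never exceeds its subsets). The groups and genes below are hypothetical.

```python
from itertools import combinations


def overlap_counts(groups, annotated, min_species=2):
    """Count orthologous groups with annotated genes in every species of a combination.

    groups: dict group_id -> set of (species, gene) members
    annotated: set of (species, gene) pairs that have expression annotation
    Assumption: a group counts for a combination when each listed species has
    at least one annotated member (other species may also be present).
    """
    annotated_species = {
        gid: {species for species, gene in members if (species, gene) in annotated}
        for gid, members in groups.items()
    }
    all_species = sorted(set().union(*annotated_species.values()))
    counts = {}
    for k in range(min_species, len(all_species) + 1):
        for combo in combinations(all_species, k):
            counts[combo] = sum(1 for sp in annotated_species.values() if set(combo) <= sp)
    return counts


# Hypothetical groups; one zebrafish member lacks expression annotation.
groups = {
    "OG1": {("mouse", "Six3"), ("medaka", "Six3"), ("drosophila", "optix")},
    "OG2": {("mouse", "geneP"), ("zebrafish", "geneQ")},
    "OG3": {("zebrafish", "geneR"), ("drosophila", "geneS")},
}
annotated = {m for members in groups.values() for m in members} - {("zebrafish", "geneQ")}
for combo, n in overlap_counts(groups, annotated).items():
    if n:
        print(", ".join(combo), n)
```

The example shows why expression annotation, and not orthology alone, limits the overlap: group OG2 has a zebrafish orthologue but does not count for mouse and zebrafish because that orthologue is unannotated.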

WEB INTERFACE

4DXpress is a Java-based application with a web front-end powered by the Tomcat servlet container; data are stored in a PostgreSQL relational database. The web application is based on a model-view-controller (MVC) architecture using the Struts framework and is enhanced with applets, JavaScript and AJAX (Asynchronous JavaScript and XML) technologies to provide an interactive, user-friendly interface. 4DXpress is available at http://ani.embl.de/4DXpress.

Other projects can link to 4DXpress gene entry pages using the URL http://ani.embl.de/4DXpress/reg/all/search/bquery.do?id= followed by a gene identifier, which can be an EnsEMBL ID, a gene symbol or a primary identifier from another public resource (e.g. FlyBase IDs, MGI IDs or ZFIN IDs).
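A linking project only needs to append the identifier as the `id` query parameter. This sketch builds the link with the standard library and percent-encodes the identifier, on the assumption that the server decodes standard query-string encoding; it only constructs the string and does not check whether the server still answers. The MGI identifier is a hypothetical placeholder.

```python
from urllib.parse import urlencode

# Entry-page URL pattern quoted in the text.
BASE_URL = "http://ani.embl.de/4DXpress/reg/all/search/bquery.do"


def gene_entry_url(identifier: str) -> str:
    """Link to a 4DXpress gene entry page for an EnsEMBL ID, symbol or other ID."""
    return f"{BASE_URL}?{urlencode({'id': identifier})}"


print(gene_entry_url("Six3"))
print(gene_entry_url("MGI:12345"))  # hypothetical MGI ID; ':' becomes %3A
```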

Query genes

Genes can be searched either by a range of external identifiers, symbols and names or by their expression pattern annotation. In the ontology-based form, selecting a species loads the corresponding stage and anatomy ontologies, together with the number of genes annotated with each listed term. The term-based form allows more complex queries, and cross-species queries can be performed by selecting 'Bilateria'. A list of search terms can then be entered manually or with the help of auto-completion over the terms that have been used for annotation. Because corresponding structures often have similar names in different species, this tool allows meaningful cross-species queries.
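The auto-completion and the lexical cross-species query described above can be illustrated with a sorted term list and a prefix search. Everything here is a hypothetical stand-in for the 4DXpress implementation, which is not described: the data layout, the case-insensitive exact-term matching, and treating `species=None` as the 'Bilateria' option.

```python
import bisect


class TermIndex:
    """Prefix completion over anatomy terms actually used for annotation,
    plus a lexical term search within one species or across all of them."""

    def __init__(self, annotations):
        # annotations: dict (species, gene) -> set of anatomy term names (hypothetical layout)
        self.annotations = {key: {t.lower() for t in terms} for key, terms in annotations.items()}
        self.terms = sorted(set().union(*self.annotations.values()))

    def complete(self, prefix, limit=10):
        """Used terms starting with `prefix` (case-insensitive), alphabetically."""
        prefix = prefix.lower()
        i = bisect.bisect_left(self.terms, prefix)  # first term >= prefix
        matches = []
        while i < len(self.terms) and self.terms[i].startswith(prefix) and len(matches) < limit:
            matches.append(self.terms[i])
            i += 1
        return matches

    def genes_with_term(self, term, species=None):
        """Genes annotated with `term`; species=None mimics a cross-species query."""
        term = term.lower()
        return sorted(key for key, terms in self.annotations.items()
                      if term in terms and (species is None or key[0] == species))


# Hypothetical annotations.
index = TermIndex({
    ("zebrafish", "six3a"): {"eye", "forebrain"},
    ("medaka", "Six3"): {"Eye", "optic vesicle"},
    ("drosophila", "optix"): {"head", "eye-antennal disc"},
})
print(index.complete("ey"))          # ['eye', 'eye-antennal disc']
print(index.genes_with_term("eye"))  # [('medaka', 'Six3'), ('zebrafish', 'six3a')]
```

Completing only over used terms keeps suggestions short, and the shared name "eye" is exactly the kind of lexical cue that makes a cross-species query return zebrafish and medaka genes together.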

Submitting a query returns a gene list that gives the user a summary overview. By default, this list is ordered by orthologous group, which facilitates the comparison of orthologous genes in the different species.
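Ordering the result list by orthologous group amounts to sorting on a group key first. In this sketch the group identifiers, the placement of ungrouped genes at the end, and the order of groups by identifier are assumptions; the text only states that the list is ordered by orthologous group.

```python
def order_by_orthologous_group(hits, group_of):
    """Sort query hits so that members of the same orthologous group are adjacent.

    hits: list of (species, gene); group_of: dict (species, gene) -> group id.
    Genes without a known group are placed at the end (an assumption).
    """
    def key(hit):
        group = group_of.get(hit)
        return (group is None, group or "", hit[0], hit[1])

    return sorted(hits, key=key)


# Hypothetical query result and group assignments.
hits = [("zebrafish", "six3a"), ("mouse", "geneP"), ("drosophila", "geneZ"),
        ("drosophila", "optix"), ("medaka", "Six3")]
group_of = {("zebrafish", "six3a"): "OG1", ("drosophila", "optix"): "OG1",
            ("medaka", "Six3"): "OG1", ("mouse", "geneP"): "OG2"}
for hit in order_by_orthologous_group(hits, group_of):
    print(hit)
```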

Selecting an individual gene entry displays the full information on that gene: external identifiers, gene description, expression pattern annotation using stage and anatomy ontologies, images of stained embryos and orthology relationships (visualized as a network). From the gene view, a list of orthologues can be selected, and their expression annotations and images can be compared on a single page. Figure 3 shows a cropped screenshot comparing medaka Six3 and its Drosophila orthologue Optix.

A list of similarly expressed genes within the same species, calculated using the Jaccard coefficient (see above), is also provided. Users can select some of these genes and compare them as shown in Figure 3.

Ontology browser

Ontologies are becoming widely used to annotate units of information by providing controlled vocabularies and structured knowledge. Anatomy ontologies are therefore useful for enforcing standard terminology in gene expression annotation and for making this information accessible to computational analysis, but at the same time they make the database more complex to use for non-expert users. We provide a tree-based tool that helps users browse the ontologies used for expression pattern annotation. It allows users to query terms and to expand or collapse individual nodes.


Developmental stage ontologies can be browsed by species, and external links provide more information on stage definitions. Species-specific stage ontologies were mapped onto a common stage list (Figure 1), thereby establishing temporal relationships that can be accessed via the web interface.

Annotation tool

Our annotation tool allows users to annotate gene expression patterns resulting from any of three types of experiments: whole mount in situ hybridization, transgenic reporter gene expression or antibody staining. The same tool can be used for all supported species (currently zebrafish, mouse, medaka, Drosophila and Platynereis). Species-specific ontologies for developmental stages and anatomy can be loaded, and users can customize a list of favourite terms.

CONCLUSIONS AND FUTURE DIRECTIONS

We have integrated expression data on 16 505 genes from four important developmental model species: mouse, zebrafish, Drosophila and medaka. We developed a stable database schema and a web interface to access these data. The interface supports cross-species queries, facilitated by a stage mapping. An expression similarity measure is implemented to find genes with similar expression patterns, and links to all original data sources are provided.

With these tools in place, we aim to integrate more species whose data are available in the public domain, such as Xenopus laevis with 17 000 images, Ciona and Caenorhabditis elegans. We have set up the infrastructure to analyse and compare the data. We will analyse the features of in situ co-expression networks and compare them between species and with co-expression networks derived from microarray data. We will use conserved network patterns to assess the mapping of anatomical structures.

[Figure 2 (image not reproduced): four bar-chart panels (zebrafish, mouse, medaka, Drosophila) showing the number of annotated genes (y-axis) at each species-specific developmental stage (x-axis: zebrafish 1-cell to Days 45-89, mouse TS1 to TS27, medaka st1 to st44, Drosophila egg through embryonic, larval instar and pupal stages). Bar colours indicate the common stages: zygote, cleavage, blastula, gastrula, neurula, organogenesis, juvenile, adult.]

Figure 2. Comparison of the number of genes annotated at species-specific developmental stages in zebrafish, Drosophila, mouse and medaka. Corresponding developmental stages have the same colour. The colour legend for all panels is shown in the mouse panel.

Table 2. Number of orthologous groups with genes that are annotated in more than one species

Number of species   Species                               COGs
Two species         Zebrafish, Drosophila                  964
                    Mouse, Zebrafish                       913
                    Mouse, Drosophila                      764
                    Drosophila, Medaka                     341
                    Zebrafish, Medaka                      329
                    Mouse, Medaka                          260
Three species       Mouse, Zebrafish, Drosophila           336
                    Zebrafish, Drosophila, Medaka          156
                    Mouse, Zebrafish, Medaka               135
                    Medaka, Mouse, Drosophila              124
Four species        Mouse, Zebrafish, Drosophila, Medaka    68


ACKNOWLEDGEMENTS

This work was carried out in the Centre for Computational Biology at EMBL. We are grateful to the model organism database teams for providing us with their expression pattern data. In particular, we thank Martin Ringwald and Susan McClatchy for giving us direct access to the MGI database, Monte Westerfield and Judy Sprague for helping us with the ZFIN data reports, and Pavel Tomancak for helping us to understand the BDGP MySQL schema. We thank Mirana Ramialison for extending the medaka annotation and Francois Spitz for mouse in situ images. We are grateful to the MISFISHIE team for initiating an expression pattern data exchange format. Funding to pay the Open Access publication charges for this article was provided by EMBL.

Conflict of interest statement. None declared.

REFERENCES

1. Sprague,J., Bayraktaroglu,L., Clements,D., Conlin,T., Fashena,D., Frazer,K., Haendel,M., Howe,D.G., Mani,P. et al. (2006) The zebrafish information network: the zebrafish model organism database. Nucleic Acids Res., 34, D581–D585.

2. Tomancak,P., Berman,B.P., Beaton,A., Weiszmann,R., Kwan,E., Hartenstein,V., Celniker,S.E. and Rubin,G.M. (2007) Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol., 8, R145.

3. Grumbling,G. and Strelets,V. (2006) FlyBase: anatomical data, images and queries. Nucleic Acids Res., 34, D484–D488.

4. Henrich,T., Ramialison,M., Wittbrodt,B., Assouline,B., Bourrat,F., Berger,A., Himmelbauer,H., Sasaki,T., Shimizu,N. et al. (2005) MEPD: a resource for medaka gene expression patterns. Bioinformatics, 21, 3195–3197.

Figure 3. The comparative view shows the expression patterns of a list of selected genes (medaka Six3 and its Drosophila orthologue Optix). Expression annotations and images can easily be compared between the genes on a single page.


5. Smith,C.M., Finger,J.H., Hayamizu,T.F., McCright,I.J., Eppig,J.T., Kadin,J.A., Richardson,J.E. and Ringwald,M. (2007) The mouse gene expression database (GXD): 2007 update. Nucleic Acids Res., 35, D618–D623.

6. Christiansen,J.H., Yang,Y., Venkataraman,S., Richardson,L., Stevenson,P., Burton,N., Baldock,R.A. and Davidson,D.R. (2006) EMAGE: a spatial database of gene expression patterns during mouse embryo development. Nucleic Acids Res., 34, D637–D641.

7. Deutsch,E.W., Ball,C.A., Bova,G.S., Brazma,A., Bumgarner,R.E., Campbell,D., Causton,H.C., Christiansen,J., Davidson,D. et al. (2006) Development of the minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). Omics, 10, 205–208.

8. Hubbard,T.J., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F. et al. (2007) Ensembl 2007. Nucleic Acids Res., 35, D610–D617.

9. Zhang,S. and Bodenreider,O. (2003) Aligning representations of anatomy using lexical and structural methods. AMIA Annu. Symp. Proc., 00, 753–757.

10. Parkinson,H., Kapushesky,M., Shojatalab,M., Abeygunawardena,N., Coulson,R., Farne,A., Holloway,E., Kolesnykov,N., Lilja,P. et al. (2007) ArrayExpress – a public database of microarray experiments and gene expression profiles. Nucleic Acids Res., 35, D747–D750.
