
Databases of free expression

John R. Walker, Tim Wiltshire

Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, California 92121, USA

Received: 6 April 2006 / Accepted: 29 August 2006

Abstract

The rapid development of microarray technologies has led to a similar progression in gene expression analysis methods, gene expression applications, and gene expression databases. Public gene expression databases enable any researcher to examine expression of their favorite genes across a wide variety of samples, download sample data for development of new analysis methods, or answer broad questions about gene expression regulation, among other applications. A wide variety of public gene expression databases exist, and they vary in their content, analysis capabilities, and ease of use. This review highlights the current features and describes examples of two broad categories of mammalian microarray databases: tissue gene expression databases and data warehouses.

Introduction

With the development of microarray technology over the past decade, global gene expression surveys have become a popular method to study biological processes. Along with the technology have come tools to more easily extract useful information from experiments. Many reviews have described methods to distinguish signal from noise from microarray images as well as software to visualize and statistically analyze differences between experimental groupings. More recent software tools now assemble biological pathways that are significantly altered in a gene expression experiment. All of these developments have allowed for easier extraction of relevant data points and interpretation of results. But even after expression changes are validated by other technologies such as quantitative polymerase chain reaction (qPCR), results still need to be interpreted in a broader context. For example, how do we know if a particular biological pathway that appears to be important in a list of differentially expressed genes actually participates in the biology under examination? Are those genes that make up the pathway expressed or enriched in the tissue or cells used in the current experiment? Has anybody else performed a comparable experiment and seen similar individual gene and pathway changes? Do members of a gene list appear as differentially expressed in expression data sets from other areas of biology? Because gene expression data sets are complex, published articles most likely do not describe all of the gene expression changes in their experiments. Thankfully, journals have required submission of entire data sets to public gene expression databases so that others have the opportunity to extract additional information from the data.

This review focuses on types of publicly available gene expression data sources that can be queried for particular genes and biological processes of interest. We focus primarily on gene expression resources of microarray data in mammals. However, other gene expression resources are cited where they add qualitative information. We discuss two categories of databases. The first comprises databases in which researchers are seeking tissue expression location information about genes of interest. These databases typically contain data obtained by a single group using a single microarray type, so cross-sample comparisons are possible. The second category of expression database we discuss is that in which researchers seek entire gene expression data sets obtained in a biological area of interest. Cross-sample or cross-experiment comparisons in these databases need to be interpreted with caution because there is a considerable amount of data variability across experiments, array types, and protocols (see below). We comment on the different types of information that can be extracted from these databases and highlight their characterizing features.

Correspondence to: John R. Walker; E-mail: [email protected]

DOI: 10.1007/s00335-006-0043-5 · Volume 17, 1141–1146 (2006) · © Springer Science+Business Media, Inc. 2006

Review

Tissue Gene Expression Databases

One major stumbling block for researchers is what to do with genes of unknown function that appear as differentially expressed in their gene expression experiments. Because many microarray platforms are based on Unigene cluster members (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene), there may be little information on what type of protein a target sequence may encode, which means that its function is very likely unknown. More often than not, transcripts with no known function are the first to be ignored when analyzing gene expression data. This may represent a high percentage of sequences on some arrays. For example, for the September 2005 annotation for the Affymetrix 430 2.0 array, only 45% of all sequences on the array are associated with a Gene Ontology (GO) biological process category (http://www.geneontology.org/; http://www.affymetrix.com/support/technical/byproduct.affx?product=moe430-20). Obviously, any additional information about these transcripts might be useful to determine whether they should be considered for further analysis.

Tissue location of expression can be a valuable tool to determine gene function. Mootha et al. (2003) used tissue gene expression to find a previously unknown gene responsible for a mitochondrial disorder. The location of this gene in a genomic region responsible for the disorder, as well as its strong coexpression with other known mitochondrial genes across tissues, hinted that it may be involved in the disease. Additional experiments proved that the gene did indeed cause the disorder and was most likely a mitochondrial gene. Tissue gene expression databases, along with other databases, have also been used to categorize, at a whole-genome level, genes potentially involved in a particular type of disease category (Calvo et al. 2006). Tissue gene expression data sets can also help prioritize potentially causative genes in human association (as described above) or rodent quantitative trait locus (QTL) or ethyl-N-nitrosourea (ENU) mutagenesis studies (Brown et al. 2005; Wen et al. 2004). Finally, tissue expression can determine if a gene product is a realistic target for pharmacotherapy. For example, if one is interested in targets for prostate cancer therapy, an ideal candidate would be one whose expression is activated in cancer yet whose expression is low in other normal tissues besides prostate (Welsh et al. 2003).
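The coexpression reasoning described above can be illustrated with a minimal sketch. It assumes expression profiles are available as plain lists of values measured over the same ordered set of tissues, and it uses the Pearson correlation coefficient as the similarity measure; the gene names and values are invented for illustration and are not taken from any of the cited studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no coexpression information
    return cov / (sd_x * sd_y)


def rank_coexpressed(query, profiles):
    """Rank all other genes by correlation with the query gene, best first."""
    query_profile = profiles[query]
    scores = [
        (gene, pearson(query_profile, profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical profiles over four tissues (same tissue order for every gene).
profiles = {
    "known_mito_gene": [10.0, 80.0, 95.0, 20.0],
    "candidate_A": [12.0, 70.0, 90.0, 25.0],
    "candidate_B": [60.0, 15.0, 10.0, 55.0],
}

ranked = rank_coexpressed("known_mito_gene", profiles)
for gene, r in ranked:
    print(f"{gene}\t{r:.3f}")
```

In this toy data set the candidate whose profile tracks the known gene ranks first; in practice one would likely work with log-transformed values and many more tissues before drawing any conclusion.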

Several websites provide useful information about the tissue localization of gene expression. Some are geared toward particular tissues and disease areas; they are not described in detail here. The websites presented here provide expression data for a wider range of tissues. Various highlights of the different data sets are discussed. Links and some key features of these databases are provided in Table 1.

Symatlas (http://symatlas.gnf.org/SymAtlas/) is a continually updated gene expression source for mouse, rat, and human gene expression data across a wide variety of tissues (Su et al. 2004). Currently, expression data are available across 79 human tissues, 61 mouse tissues, 29 rat tissues, and 83 commonly used human cell lines. The mouse and human data sets cover nearly the entire "transcriptome," and data from older-version Affymetrix arrays are also provided. Queries are gene centric, so multiple probe sets for every gene across each data set are displayed when searching by gene, accession number, or sequence. A particularly useful feature for candidate gene analyses is the chromosome interval search. Queries are flexible because there are options to combine searches via intersections and unions, and results can be filtered. Displays consist of bar graphs of expression across all tissues in any particular data set chosen. There are links to public information about protein function, genomic location using the UCSC genome browser (http://genome.ucsc.edu/), and probe and target sequence. Mining for transcripts that are enriched in expression in a particular tissue or are coexpressed across tissues with a chosen transcript can be performed. Various results of queries such as bar graph images, complete updated gene annotation, and processed data can be downloaded. Finally, all raw and processed data for the mouse, human, and rat data sets are available upon request.

The characterizing feature of Symatlas is its gene-centric architecture. Queries result in all information about a gene and expression data for each probe for mouse, rat, and human across all array types in which a particular gene is represented. Because custom microarrays were used for the mouse and half of the human data sets, a disadvantage to using Symatlas is that direct comparisons to commercial microarrays are cumbersome.
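The idea of combining a chromosome interval search with a tissue-enrichment search through intersections and unions can be sketched with ordinary set operations. This is not Symatlas code and does not reflect how Symatlas is implemented; the annotation layout, the threefold-over-median enrichment rule, and all gene names and values are assumptions chosen only for illustration.

```python
from statistics import median


def genes_in_interval(annotations, chrom, start, end):
    """Genes whose coordinates overlap the interval [start, end] on chrom."""
    return {
        gene
        for gene, (g_chrom, g_start, g_end) in annotations.items()
        if g_chrom == chrom and g_start <= end and g_end >= start
    }


def enriched_in_tissue(expression, tissue, fold=3.0):
    """Genes expressed at least `fold` times the median of all other tissues."""
    hits = set()
    for gene, by_tissue in expression.items():
        others = [value for name, value in by_tissue.items() if name != tissue]
        if others and by_tissue[tissue] >= fold * median(others):
            hits.add(gene)
    return hits


# Hypothetical annotation: gene -> (chromosome, start, end).
annotations = {
    "geneA": ("chr7", 1_000_000, 1_050_000),
    "geneB": ("chr7", 1_200_000, 1_260_000),
    "geneC": ("chr2", 500_000, 540_000),
}

# Hypothetical expression values: gene -> {tissue: signal}.
expression = {
    "geneA": {"liver": 900, "brain": 100, "heart": 120},
    "geneB": {"liver": 110, "brain": 100, "heart": 130},
    "geneC": {"liver": 800, "brain": 90, "heart": 100},
}

in_interval = genes_in_interval(annotations, "chr7", 900_000, 1_300_000)
liver_enriched = enriched_in_tissue(expression, "liver")

candidates = in_interval & liver_enriched  # intersection: both criteria hold
either = in_interval | liver_enriched      # union: at least one criterion holds
```

For a QTL-style candidate search, the intersection is usually the more interesting result, since it narrows an interval to genes expressed in the tissue relevant to the trait.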

Stanford University's SOURCE (http://genome-www5.stanford.edu/cgi-bin/source/sourceSearch) also provides useful alias searches (Diehn et al. 2003). An advantage and disadvantage of SOURCE is that it links to external sources for gene expression information. Though this results in a diverse view of expression across tissues, sometimes evidence is based only on representation in expressed sequence tag (EST) libraries and not more quantitative microarray data. Also, because of this dependence, there is not one source of tissue gene expression data that can be downloaded or from which processed data for particular genes can be retrieved.

Visualization of gene expression in SOURCE, when applicable, is in the form of a heat map. Finding nearest expression neighbors is as simple as clicking on the heat map. Useful additional features of SOURCE are availability of clone information for every gene and ability to retrieve upstream genomic sequence. The one unique feature of SOURCE is its ability to retrieve diverse gene expression data sets when a single gene is queried.

The Mouse Gene Prediction Database from the Hughes lab at the University of Toronto (http://mgpd.med.utoronto.ca/) provides a simple yet very useful query interface and a rich mouse gene expression data source (Zhang et al. 2004). Output consists of a heat map of gene expression across tissues along with nearest expression neighbors. Various links to gene annotation are available as well as immediate access to the oligo sequence that was on the array. The raw and processed microarray data for this data set are available via links to the cited publication. A distinguishing feature is the ability to search for coexpression of genes that belong to a GO category that yields immediate function-expression correlations.

Though the Oncogenomics Normal Tissue Database (http://ntddb.abcc.ncifcrf.gov/cgi-bin/nltissue.pl) represents only 19 human tissues, there are many individual replicates of each tissue allowing for examination of expression variability across human donors (Son et al. 2005). As with the Mouse Gene Prediction Database, chromosome interval and GO searches are possible. This database is password-protected, though passwords are easily attainable. Distinguishing features of this database include many options to search for differential expression across tissues, simple and rapid downloading of data behind heat maps, correlation searches in which the hits are sorted by correlation coefficient, and a bar graph display of expression across the individual replicates.
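One simple way to summarize expression variability across donors, once replicate values for a tissue have been downloaded, is the coefficient of variation (standard deviation divided by the mean). The sketch below assumes one list of replicate values per gene; the choice of statistic and the numbers are illustrative assumptions and do not describe what the database itself computes.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (0.0 for a zero mean)."""
    average = mean(values)
    if average == 0:
        return 0.0
    return stdev(values) / average


def rank_by_variability(replicates):
    """Return (gene, CV) pairs, most variable gene first."""
    scored = [
        (gene, coefficient_of_variation(values))
        for gene, values in replicates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical signals for one tissue from four donors.
replicates = {
    "geneX": [100.0, 102.0, 98.0, 101.0],   # stable across donors
    "geneY": [50.0, 150.0, 80.0, 200.0],    # strongly donor dependent
}

variability = rank_by_variability(replicates)
for gene, cv in variability:
    print(f"{gene}\tCV={cv:.3f}")
```

A gene with a high coefficient of variation in normal tissue may be a poor choice as a baseline when looking for disease-related changes.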

RIKEN (http://read.gsc.riken.go.jp/) offered one of the first tissue gene expression databases to the public (Bono et al. 2002). This mouse database was designed around RIKEN's rich clone collection and therefore contains comprehensive clone information. Characteristic features include a somewhat rich developmental data set, expression correlation searches, and tissue-enrichment search capability.

The TeraGenomics database (http://publicweb.teragenomics.com/public/login.asp) does not offer a query tool to examine expression of input gene(s). However, because it contains expression data for several tissues and mouse strains and because it is possible to download processed data for offline queries, it deserves mention here. Distinguishing characteristics include expression data for several mouse strains, detailed experimental procedures, high-quality samples and hybridizations, and detailed hybridization reports allowing for comparison of sample and array quality.

Table 1. Key features of tissue expression databases and data warehouses

| URL | Source name | Key features |
| --- | --- | --- |
| Tissue Expression Sources | | |
| http://symatlas.gnf.org/SymAtlas/ | Symatlas | Synonym searching, gene centric, many different tissues |
| http://genome-www5.stanford.edu/cgi-bin/source/sourceSearch | SOURCE | Links between genes and clones, multiple data sources |
| http://mgpd.med.utoronto.ca/ | Mouse Gene Prediction Database | GO category searches |
| http://ntddb.abcc.ncifcrf.gov/cgi-bin/nltissue.pl | Oncogenomics Normal Tissue Database | Data normalization and display options, many replicates |
| http://read.gsc.riken.go.jp/ | RIKEN | Embryonic tissues, clone information |
| http://publicweb.teragenomics.com/public/login.asp | TeraGenomics | Different mouse strains, rich in experimental details |
| http://www.genenetwork.org/home.html | WebQTL | Phenotype data, multiple strains |
| http://www.tigr.org/index.shtml | TIGR | Software, EST library information |
| http://www.informatics.jax.org/menus/expression_menu.shtml | Jackson Laboratory | Developmental tissues, alternative expression technologies |
| http://www.brain-map.org | Allen Brain Atlas | In situ hybridization of mouse brain |
| Data Warehouses | | |
| http://www.ncbi.nlm.nih.gov/geo/ | GEO | Abundant data, analysis tools |
| http://www.ebi.ac.uk/arrayexpress/ | Array Express | Analysis tools, extensive sample annotation |
| genome-www.stanford.edu/microarray | Stanford Microarray Database | Free software, analysis tools |
| http://www.cbil.upenn.edu/RAD/php/index.php | RAD | Rapid data downloads |
| http://proteogenomics.musc.edu/ma/musc_madb.php? | MUSC | Rapid data downloads |

WebQTL (http://www.genenetwork.org/home.html) combines microarray expression data in several tissues with mouse phenotype and DNA sequence variation data across recombinant inbred mouse strains (Wang et al. 2003). This combination of data allows for QTL mapping and association of various phenotypic traits and facilitates the integration of networks of genes, transcripts, and traits.

There also are sites that provide queries for expression across tissues, yet the data sources are not from microarrays. However, these sites provide additional useful information such as how to obtain clones for particular genes or EST libraries, and supplementary information using other expression technologies including Northern blots and in situ hybridization across select tissues. Though these sources may not cover the entire genome nor probe many tissues, they are worth querying for known genes because they may provide information that is not available from microarray technology.

TIGR (http://www.tigr.org/index.shtml), though a leader in microarray-based expression data analysis, offers only expression data for mammals from EST libraries. Though this does not allow for highly quantitative visualization of expression across tissues, it does provide rich information about individual clones in a wide array of EST libraries for pig, dog, and cattle in addition to mouse, rat, and human.

The Jackson Laboratory (http://www.informatics.jax.org/searches/expression_form.shtml) is known as a rich source of mouse phenotypic data. It also has a mouse gene expression collection using in situ hybridization, Northern blotting, reverse transcriptase PCR (RT-PCR), and RNase protection (Ringwald et al. 2001). In addition, it offers protein expression data using Western blots and immunohistochemistry. These sources, therefore, could provide more sensitive expression information than microarrays (RT-PCR), better anatomically defined mRNA localization (in situ hybridization), protein size and semiquantitation (Western), and protein localization (immunohistochemistry). A distinguishing feature of this database is its developmental expression data.

The Allen Brain Atlas (http://www.brain-map.org/welcome.do;jsessionid=70CE335B9D84FCBEC571AB2F1E0027BE) is a rapidly expanding source of gene expression in the mouse brain. Currently, over 12,000 genes can be queried for very detailed in situ hybridization data, with plans for 20,000 genes by the end of 2006. Tools to examine differential expression across brain regions are already available.

Gene Expression Data Warehouses

Often researchers have a particular gene or set of genes for which they would like information in addition to tissue gene expression. Besides where a gene is expressed, it would also be useful to know when it is expressed. For example, it could be useful to determine what happens to expression of a particular gene when another gene is knocked down or overexpressed in mice or in cell culture, when animals or cells are treated with a particular drug or given a specific stimulus, or in a particular disease condition. Genewise queries across all samples are not yet possible across large publicly available data sets, probably because it would take a huge effort to organize the data and set up the analyses. For example, deciding which experimental factors to compare and which appropriate statistical tests to use would have to be done in advance. In addition, as described below, queries across laboratories and data platforms may produce expression artifacts.

At times, researchers will already have a hypothesis in mind and will want to search through multicondition databases to find an experimental condition of interest to them. For this purpose, it would be useful to search for differential expression of genes across this data set and/or download the data for analysis with gene expression software of their choice. Currently, there are only a few databases that provide enough gene expression data in one place to make these searches useful; some are developing tools to analyze and download portions of this data.

An ideal data warehouse would make it easy for users to find differentially expressed genes in any data set, visualize expression of those genes across the given conditions, and download expression data with gene and sample annotation. But before this stage, finding experiments of interest should be made easy with keyword searches. This requires that submitters include adequate information about the experiment, the samples, and the microarray procedures. Because microarray data analysis methods vary widely and are continually developing, data warehouses would not be expected to provide tools to satisfy every user. For this reason, raw data should be easily available for download.
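As an illustration of what finding differentially expressed genes in a downloaded data set can involve, the following minimal sketch computes a log2 fold change and a Welch t statistic between two groups of samples. The data layout, the sample names, the fold-change cutoff, and the choice of statistic are all assumptions made for this example; no warehouse discussed here is claimed to use this procedure, and a real analysis would also need p values and multiple-testing correction.

```python
from math import log2, sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic (unequal variances) for two groups of values."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        return 0.0
    return (mean(group_a) - mean(group_b)) / se


def differential_genes(expression, control_ids, treated_ids, min_abs_log2fc=1.0):
    """Return (gene, log2 fold change, t) for genes passing the cutoff."""
    hits = []
    for gene, by_sample in expression.items():
        control = [by_sample[s] for s in control_ids]
        treated = [by_sample[s] for s in treated_ids]
        log2fc = log2(mean(treated) / mean(control))
        if abs(log2fc) >= min_abs_log2fc:
            hits.append((gene, log2fc, welch_t(treated, control)))
    # Largest absolute change first.
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical processed signals: gene -> {sample id: signal}.
expression = {
    "geneUp": {"c1": 100.0, "c2": 110.0, "c3": 90.0,
               "t1": 400.0, "t2": 420.0, "t3": 380.0},
    "geneFlat": {"c1": 200.0, "c2": 210.0, "c3": 190.0,
                 "t1": 205.0, "t2": 195.0, "t3": 200.0},
}

hits = differential_genes(expression, ["c1", "c2", "c3"], ["t1", "t2", "t3"])
for gene, log2fc, t in hits:
    print(f"{gene}\tlog2FC={log2fc:.2f}\tt={t:.1f}")
```

The need to fix the groups, the cutoff, and the test in advance is exactly the kind of decision that makes genewise queries across whole warehouses difficult to precompute.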

The currently available data warehouses that are reviewed contain some of the above-described options and features. These warehouses are growing at a constant rate and are adding new features to meet the demands of the research community. A summary table of these warehouses is provided in Table 1.

Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) is one database that many journals recommend as a location to submit microarray data upon submission of a manuscript (Edgar et al. 2002). GEO now contains data for over 85,000 samples that cover many species and microarray platforms. Data submission requires data in GEO's format (SOFT), but they also accept MAGE-ML formatted data. Submitters are encouraged to submit raw data formats (CEL files for Affymetrix data, for example) so that site visitors can download and process the data on their own.
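For readers who want to process downloaded SOFT text themselves, the sketch below reads a single sample record. It assumes the line-based SOFT conventions in which a line starting with `^` names an entity, `!` introduces an attribute, `#` describes a data column, and a tab-delimited table sits between `!sample_table_begin` and `!sample_table_end`. The excerpt, including the accession `GSM_EXAMPLE` and the probe names, is invented, and the parser handles only this simplified case rather than the full format.

```python
def parse_soft_sample(text):
    """Parse one simplified SOFT sample record into (metadata, table rows)."""
    metadata = {}
    rows = []
    header = None
    in_table = False
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("!sample_table_begin"):
            in_table = True
            continue
        if line.startswith("!sample_table_end"):
            in_table = False
            continue
        if in_table:
            fields = line.split("\t")
            if header is None:
                header = fields  # first table line holds the column names
            else:
                rows.append(dict(zip(header, fields)))
        elif line[0] in "^!" and "=" in line:
            key, value = line[1:].split("=", 1)
            metadata[key.strip()] = value.strip()
        # Lines starting with '#' describe columns and are skipped here.
    return metadata, rows


# Invented excerpt in SOFT-like layout (hypothetical accession and probes).
soft_text = (
    "^SAMPLE = GSM_EXAMPLE\n"
    "!Sample_title = liver, replicate 1\n"
    "#ID_REF = probe identifier\n"
    "#VALUE = normalized signal\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t512.3\n"
    "probe_2\t48.9\n"
    "!sample_table_end\n"
)

metadata, rows = parse_soft_sample(soft_text)
values = {row["ID_REF"]: float(row["VALUE"]) for row in rows}
```

Keeping metadata and measurements in separate structures makes it straightforward to select samples by description first and load their values afterwards.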

GEO has recently added query tools (Barrett et al. 2005) that allow visitors to examine differential expression in any data set. GEO's SOFT submission format requires description of experiments and samples, so experiments of interest can be found with a simple query. Researchers who visit any of NCBI's databases (PubMed, Entrez Gene) will be familiar with search tools and input options available in GEO. Once experiments of interest are found, differential expression can be examined and expression of individual genes across designated samples can be displayed via heat maps and bar graphs.

Array Express from EMBL (http://www.ebi.ac.uk/arrayexpress/) is also recommended by many journals as a location to deposit microarray data, though its collection is smaller than GEO's (Brazma et al. 2003). The most distinguishing characteristic of Array Express is the extensive sample annotation they require to pass MIAME (http://www.mged.org/Workgroups/MIAME/miame.html) standards. They provide software for this task, and MAGE-ML formats of submitted data sets are easily obtainable.

Data analysis in Array Express is performed after data are imported into their Expression Profiler software. A unique feature is the ability to normalize data in several ways. There is also a wide variety of options for clustering and differential expression analysis. It is relatively easy to download both raw and processed data, and many download parameters are available. It is also worth mentioning that Array Express is developing a tissue gene expression search tool using various data sources in their warehouse.

The Stanford Microarray Database (http://genome-www5.stanford.edu/) has been a leader in providing open-source data and creating microarray data tools (Ball et al. 2005). Most of its data submissions originate from Stanford University and collaborators (over 60,000 experiments of which around 11,000 are public). In addition to being a source for microarray data, there are links to microarray company sites, collections of microarray publications, and microarray-related learning materials. There are downloads available for free microarray-related software developed at Stanford and an extensive list of links to external software. Similar to Array Express, there are comprehensive experimental and sample annotation requirements. In addition, the Stanford Microarray Database provides many options to analyze, visualize, and download data.

Other much smaller gene expression data sources are worth mentioning because they might contain data of interest to particular researchers. RAD (http://www.cbil.upenn.edu/RAD/php/index.php) from the University of Pennsylvania and MUSC (http://proteogenomics.musc.edu/ma/musc_madb.php?page=home&act=manage) from the Medical University of South Carolina contain a limited number of mouse, rat, and human gene expression data sets (Argraves et al. 2003; Manduchi et al. 2004). At MUSC experiment descriptions are complete and data are downloaded rapidly. Because of the quick downloads, it is worthwhile to examine both of these sites for experiments of interest.

The above-mentioned data warehouses allow for some degree of within-experiment comparisons of samples. Because data from most experiments can be downloaded, it might be tempting to compare arrays from different experiments. However, caution should be used when comparing experiments between laboratories and across platforms (Tan et al. 2003). Encouragingly, one recent set of studies shows that data collected using standardized methods from experienced laboratories can be reproducible across platforms and laboratories, at least for a subset of differentially expressed genes (Bammler et al. 2005; Irizarry et al. 2005; Larkin et al. 2005).
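A first, rough check before comparing results from two platforms or laboratories is to ask how often the genes measured on both agree in the direction of change. The sketch below assumes each experiment has already been reduced to a mapping from gene to log ratio; the gene names and values are invented, and direction agreement is only one of many possible concordance measures, not the method used in the cited studies.

```python
def direction_agreement(log_ratios_a, log_ratios_b):
    """Fraction of shared genes whose log ratios have the same sign."""
    shared = sorted(set(log_ratios_a) & set(log_ratios_b))
    if not shared:
        return 0.0, shared
    same_direction = sum(
        1 for gene in shared
        if (log_ratios_a[gene] > 0) == (log_ratios_b[gene] > 0)
    )
    return same_direction / len(shared), shared


# Hypothetical log2 ratios (treated vs control) from two platforms.
platform_a = {"g1": 1.8, "g2": -1.2, "g3": 0.9, "g4": 2.0}
platform_b = {"g1": 1.1, "g2": -0.7, "g3": -0.4, "g5": 1.5}

fraction, shared_genes = direction_agreement(platform_a, platform_b)
print(f"{len(shared_genes)} shared genes, {fraction:.0%} agree in direction")
```

Restricting the comparison to genes present on both platforms matters, because probe content differs between array types and unmatched genes cannot inform the comparison.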

Conclusions

There have been many improvements in gene expression technologies over the past decade. Along with those improvements have come better algorithms and software to extract and analyze data. In addition, journals and gene expression warehouses have been enforcing better descriptions of samples and experiments. All of these trends have resulted in better data submitted to gene expression warehouses. Some of these warehouses are now developing user-friendly and powerful tools to more easily extract meaningful information from these databases. All of these trends will allow us to get more out of each other's data.


Acknowledgments

The authors thank the reviewers for helpful comments and suggestions and the Novartis Research Foundation for financial support.

References

1. Argraves GL, Barth JL, Argraves WS (2003) The MUSC DNA Microarray Database. Bioinformatics 19, 2473–2474

2. Ball CA, Awad IA, Demeter J, Gollub J, Hebert JM, et al. (2005) The Stanford Microarray Database accommodates additional microarray platforms and data formats. Nucl Acids Res 33(1), D580–D582

3. Bammler T, Beyer RP, Bhattacharya S, Boorman GA, Members of the Toxicogenomics Research Consortium (2005) Standardizing global gene expression analysis between laboratories and across platforms. Nat Methods 2, 351–356; Erratum (2005) 2, 477

4. Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, et al. (2005) NCBI GEO: mining millions of expression profiles--database and tools. Nucl Acids Res 33(Database issue), D562–D566

5. Bono H, Kawukawa T, Hayashizaki Y, Okazaki Y (2002) READ: RIKEN Expression Array Database. Nucl Acids Res 30, 211–213

6. Brazma A, Parkinson H, Sarkans U, Shajatalab M, Vilo J, et al. (2003) ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucl Acids Res 31, 68–71

7. Brown A, Olver WI, Donnelly CJ, May ME, Naggert JK, et al. (2005) Searching QTL by gene expression: analysis of diabesity. BMC Genet 6, 12

8. Calvo S, Jain M, Xie X, Sheth SA, Chang B, et al. (2006) Systematic identification of human mitochondrial disease genes through integrative genomics. Nat Genet 38, 576–582

9. Diehn M, Sherlock G, Binkley G, Jin H, Matese JC, et al. (2003) SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucl Acids Res 31(1), 219–223

10. Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucl Acids Res 30, 207–210

11. Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, et al. (2005) Multiple-laboratory comparison of microarray platforms. Nat Methods 2, 345–350

12. Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J (2005) Independence and reproducibility across microarray platforms. Nat Methods 2, 337–344

13. Manduchi E, Grant GR, He H, Liu J, Mailman MD, et al. (2004) RAD and the RAD Study-Annotator: an approach to collection, organization and exchange of all relevant information for high-throughput gene expression studies. Bioinformatics 20, 452–459

14. Mootha VK, Lepage P, Miller K, Bunkenborg J, Reich M, et al. (2003) Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc Natl Acad Sci USA 100, 605–610

15. Ringwald M, Eppig JT, Begley DA, Corradi JP, McCright IJ, et al. (2001) The Mouse Gene Expression Database (GXD). Nucl Acids Res 29(1), 98–101

16. Son CG, Bilke S, Davis S, Greer BT, Wei JS, et al. (2005) Database of mRNA gene expression profiles of multiple human organs. Genome Res 15, 443–450

17. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 101, 6062–6067

18. Tan PK, Downey TJ, Spitznagel EL Jr, Xu P, Fu D, et al. (2003) Evaluation of gene expression measurements from commercial microarray platforms. Nucl Acids Res 31, 5676–5684

19. Wang J, Williams RW, Manly KF (2003) WebQTL: Web-based complex trait analysis. Neuroinformatics 1, 299–308

20. Welsh JB, Sapinoso LM, Kern SG, Brown DA, Liu T, et al. (2003) Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum. Proc Natl Acad Sci USA 100, 3410–3415

21. Wen BG, Pletcher MT, Warashina M, Choe SH, Ziaee N, et al. (2004) Inositol (1,4,5) trisphosphate 3 kinase B controls positive selection of T cells and modulates Erk activity. Proc Natl Acad Sci USA 101, 5604–5609

22. Zhang W, Morris QD, Chang R, Sahi O, Bakowski A, et al. (2004) The functional landscape of mouse gene expression. J Biol 3, 21
