Comparative and Functional Genomics
Comp Funct Genom 2004; 5: 633–641.
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.447
Conference Paper
Standardization initiatives in the (eco)toxicogenomics domain: a review
Susanna Assunta Sansone1*, Norman Morrison2, Philippe Rocca-Serra1 and Jennifer Fostel3
1 EMBL-EBI, The European Bioinformatics Institute, Cambridge CB10 1SD, UK
2 The University of Manchester, Kilburn Building, School of Computer Science, Oxford Road, Manchester M13 9PL, UK
3 National Institute of Environmental Health Sciences, National Center for Toxicogenomics, Research Triangle Park, NC 27709, USA
*Correspondence to: Susanna Assunta Sansone, EMBL-EBI, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK. E-mail: [email protected]
Revised: 24 November 2004
Accepted: 26 November 2004
Abstract
The purpose of this document is to provide readers with a resource of different ongoing standardization efforts within the ‘omics’ (genomics, proteomics, metabolomics) and related communities, with particular focus on toxicological and environmental applications. The review includes initiatives within the research community as well as in the regulatory arena. It addresses data management issues (format and reporting structures for the exchange of information) and database interoperability, highlighting key objectives, target audience and participants. A considerable amount of work still needs to be done and, ideally, collaboration should be optimized and duplication and incompatibility should be avoided where possible. The consequence of failing to deliver data standards is an escalation in the burden and cost of data management tasks. Copyright 2005 John Wiley & Sons, Ltd.
Keywords: toxicogenomics; ecotoxicogenomics; toxicology; environment; functional genomics; standards; database
Introduction
Molecular-based approaches, such as transcriptomics, proteomics, metabolomics and metabonomics, are being used to study the impact of chemicals on human and wildlife populations. These high-throughput (eco)toxicogenomics investigations are information-intensive and, by producing massive amounts of data, have placed the informatics challenge under the spotlight. The need to provide easy access to integrated data in a structured standard format is clearly significant. Several efforts are already under way to promote standardization, tackle data management issues and develop databases to facilitate data exchange. We have seen the value of these collaborative efforts already. The Microarray Gene Expression Data (MGED; http://www.mged.org) Society has been successful in developing the MIAME standard and related ontology and object models for microarray data (reviewed in Quackenbush, 2004). The Reporting Structure for Biological Investigations (RSBI; http://www.mged.org/Workgroups/rsbi) is a new working group formed under the MGED Society umbrella, planning to act as a ‘single point of focus’ for Toxicogenomics, Environmental Genomics and Nutrigenomics communities working towards an international and compatible informatics platform for data exchange. Discipline-specific initiatives are regarded as important because they target ‘real world’ data capture requirements for the particular omics technologies being used. A consequence of this, however, is that, by remaining within each given discipline, the standardization effort fragments, resulting in duplication and the development of different terminology and data models, thereby limiting the potential for data exchange. One of the objectives of the RSBI working group is to ensure that these initiatives are coordinated, so that synergy and cross-discipline communication can be maximized, and duplicated effort can be minimized.
To capitalize on these efforts, representatives of the RSBI working group are also directly participating in certain initiatives and, by fostering interactions, are laying the ground for further collaborations. One forum for such interaction is the Standards and Ontologies for Functional Genomics (SOFG; http://www.sofg.org) Conference. We invite comments on the work of the RSBI at [email protected]
Standardization initiatives
Data standardization is now being considered beyond the research application of high-throughput technologies (reviewed in Quackenbush, 2004) and regulatory bodies, such as the US Food and Drug Administration (FDA) and Environmental Protection Agency (EPA), are developing their policy or guidance on genomics data submissions (http://www.fda.gov/cder/guidance/5900dft.doc; http://www.epa.gov/osa/genomics.htm). Several organizations and committees are tackling data standardization; however, there is a fundamental difference in both the design and objectives of the efforts around regulatory submission of data vs. the needs of the research community, who need databases and tools for discovery. The former aims to accelerate the review process, facilitate proprietary data submission and optimize data visualization in a way that does not impact the vocabulary used by the individual submitter. The research community needs to ease deposition in public databases and facilitate data mining by the use of common annotation standards and ontologies. There is some overlap between the needs of these communities and some level of interaction. Thus, there is value in assessing the commonality between regulatory, research community and database designers’ objectives in the design of data standards. Specifically, a unified approach to describing and reporting the experimental biological metadata that is common to the different ‘omics’ technologies (transcriptomics, proteomics and metabonomics/metabolomics) or disciplines (e.g. pharmacogenomics, toxicogenomics, environmental genomics) is a goal of the RSBI. Undoubtedly specialized information is needed by certain applications, but a high-level unified model for description of metadata would be able to encompass these applications. Here, metadata refers to biological information relating to samples and the information about experimental design. Data refers to measured values relating to samples (e.g. toxicological endpoints and gene expression) under given experimental conditions.
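The distinction between metadata and data drawn above can be illustrated with a minimal sketch. It assumes a simple in-memory representation; the class names, field names and all values are hypothetical placeholders chosen for illustration and are not taken from any of the standards discussed in this review.

```python
from dataclasses import dataclass


@dataclass
class SampleMetadata:
    """Biological information about a sample plus experimental design details."""
    sample_id: str
    organism: str          # hypothetical field
    treatment: str         # hypothetical field
    time_point_h: float    # hypothetical field


@dataclass
class Measurement:
    """A measured value for one sample under the recorded conditions."""
    sample_id: str
    technology: str   # e.g. "transcriptomics" or "clinical chemistry"
    variable: str     # e.g. a gene name or a toxicological endpoint
    value: float      # placeholder number, not a real result


def measurements_for(metadata, measurements, treatment):
    """Return the measurements whose sample received the given treatment."""
    treated = {m.sample_id for m in metadata if m.treatment == treatment}
    return [x for x in measurements if x.sample_id in treated]


# Illustrative records only.
samples = [
    SampleMetadata("S1", "Rattus norvegicus", "compound_X", 24.0),
    SampleMetadata("S2", "Rattus norvegicus", "vehicle", 24.0),
]
values = [
    Measurement("S1", "transcriptomics", "gene_A", 2.0),
    Measurement("S1", "clinical chemistry", "endpoint_B", 3.0),
    Measurement("S2", "transcriptomics", "gene_A", 1.0),
]
hits = measurements_for(samples, values, "compound_X")
```

Keeping the sample description separate from the measured values is what allows the same metadata record to be shared by measurements from different technologies, which is the motivation given for a unified metadata model.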
This paper is not an exhaustive list of all activity but provides a summary of standardization efforts for toxicological and environmental applications, which address reporting standards (e.g. what should be reported), and management issues (e.g. how reported information should be stored and exchanged, and which ontologies should be used to annotate data and metadata). The various initiatives fall into six broad categories, summarized in Table 1 and explored in detail below.

Table 1. The initiatives divided according to six broad categories

| Category | Description | Acronym | Domain | URL |
|---|---|---|---|---|
| Omics technology communities | Academic grass roots communities that have joined forces with commercial vendors to create technology-driven standards | MGED | Microarray | http://www.mged.org |
| | | PSI | Proteomics | http://psidev.sourceforge.net |
| | | SMRS | Metabolomics and metabonomics | http://www.smrsgroup.org |
| Measurement and methods validations | Efforts focusing on validation programs and production of standard materials and methods | ECVAM | Array-based toxicogenomics | http://ecvam.jrc.cec.eu.int |
| | | ERCC | Microarrays and quantitative RT-PCR | http://www.cstl.nist.gov/biotech/workshops/ERCC2004 |
| | | MARG ABRF | Microarray | http://www.abrf.org/index.cfm/group.show/Microarray.30.htm |
| | | MfB | | http://www.mfbprog.org.uk |
| Regulatory-driven discussion fora | Efforts aiming for a broader understanding and use of omics data, defining data models for data submission to regulators that preserve the terms and observations used by the submitter | CDISC | Clinical data | http://www.cdisc.org |
| | | PGx | Pharmacogenomics data | To be announced |
| | | SEND | Animal toxicity data | http://www.cdisc.org/models/send/v1.5 |
| Domain-driven discussion fora | Efforts aiming for a broader exchange and integration of toxicity and ecological data | DSSTox | Chemical toxicity data | http://www.epa.gov/nheerl/dsstox/ |
| | | SEEK | Ecological data | http://seek.ecoinformatics.org |
| World-wide organizations | Efforts producing internationally agreed instruments, decisions and recommendations or acting as facilitator | IPCS | Toxicogenomics | http://www.who.int/ipcs/en/ |
| | | NAS | (Eco)toxicogenomics | http://dels.nas.edu/emergingissues |
| | | OECD | Ecotoxicogenomics | http://www.oecd.org |
| | | BSC IEEE | Bioscience | http://www.csbcon.org |
| Infrastructure | Standards-compliant infrastructure, assisting in development of useful and usable standards | ArrayExpress and Tox-MIAMExpress | Array-based data and toxicology endpoint values | http://www.ebi.ac.uk/arrayexpress; http://www.ebi.ac.uk/tox-miamexpress |
| | | CEBS | Toxicogenomics | http://cebs.niehs.nih.gov |
| | | CTD | Genes and proteins | http://ctd.mdibl.org |
| | | maxd | Array-based data and environmental metadata | http://bioinf.man.ac.uk/microarray/maxd |
| | | TIS (ArrayTrack) | Toxicogenomics | http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack |

‘Omics’ technology communities
These are academic grass roots communities that have joined forces with commercial vendors to address content standards and reporting needs for a single high-throughput technology.
MGED Society
The MGED Society has established standards for microarray data annotation (MIAME; Brazma et al., 2001; Ball et al., 2002) and exchange (MAGE-ML; Spellman et al., 2002) that have facilitated the creation of microarray databases and related supporting software (MAGE-OM; Spellman et al., 2002). The response from the scientific community to these community standards has been extremely positive (Editorial, 2002). Most of the major scientific journals and some funding agencies require publications describing microarray experiments to comply with MIAME, and for the data to be submitted to public repositories, such as ArrayExpress (Brazma et al., 2003), GEO (Edgar et al., 2002) and CIBEX (Ikeo et al., 2003). Consequently, the MIAME model has been adopted by other communities (Quackenbush, 2004). MGED is now working with other initiatives, such as HUPO-PSI in the proteomics field and SMRS (see below). There have been several extensions to MIAME: MIAME-Tox, an array-based toxicogenomics standard developed by the ILSI Health and Environmental Sciences Institute (HESI) (http://hesi.ilsi.org/index.cfm?pubentityid=120); the National Institute of Environmental Health Sciences (NIEHS); the National Center for Toxicogenomics (NCT; http://www.niehs.nih.gov/nct); the FDA National Center for Toxicological Research (NCTR; http://www.fda.gov/nctr); and the European Bioinformatics Institute (EBI; http://www.ebi.ac.uk). MIAME/Env has been developed by the NERC Environmental Genomics Thematic Programme Data Centre (EGTDC; http://envgen.nox.ac.uk) to fulfil the diverse needs of those working in the functional genomics of ecosystems, invertebrates and vertebrates which are not covered by the model organism community. However, extending MIAME to meet domain-specific requirements is only a partial solution. As multi-technology investigations become commonplace, these checklists will soon be insufficient. Currently, the above communities are working together with the RSBI group to develop a reporting structure for describing multi-platform technology investigations. The proposed RSBI Tiered Checklist (RSBI TC; http://www.mged.org/Workgroups/rsbi) will be a modular context-dependent structure.
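One way to picture a modular, context-dependent checklist of the kind proposed for the RSBI TC is sketched below. This is only an illustration under stated assumptions: the module names, the field names and the example report are hypothetical and do not reproduce the content of MIAME, MIAME-Tox, MIAME/Env or the RSBI TC itself.

```python
# Hypothetical modules: a shared core plus technology- and domain-specific parts.
CHECKLIST_MODULES = {
    "core": ["investigation_title", "organism", "experimental_design"],
    "microarray": ["array_design", "hybridization_protocol"],
    "proteomics": ["instrument", "sample_preparation"],
    "toxicology": ["compound", "dose", "exposure_duration"],
}


def required_fields(contexts):
    """Assemble the checklist for an investigation from the modules it uses."""
    fields = list(CHECKLIST_MODULES["core"])
    for context in contexts:
        fields.extend(CHECKLIST_MODULES[context])
    return fields


def missing_fields(report, contexts):
    """List required fields that are absent or empty in a submitted report."""
    return [f for f in required_fields(contexts) if not report.get(f)]


# Illustrative report for an array-based toxicology study (placeholder values).
report = {
    "investigation_title": "Example study",
    "organism": "Danio rerio",
    "experimental_design": "dose response",
    "array_design": "hypothetical-array-1",
    "compound": "compound_X",
    "dose": "placeholder",
}
gaps = missing_fields(report, ["microarray", "toxicology"])
```

In this sketch a multi-technology investigation simply lists more contexts, so the core description is written once rather than repeated in each technology-specific checklist.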
Proteomics Standardization Initiative (PSI)
The HUPO (Human Proteome Organization; http://www.hupo.org) PSI (http://psidev.sourceforge.net) includes the major protein databases, government and industry and is defining standards for data representation in proteomics to facilitate data comparison, exchange and verification. Current focus is on mass spectrometry and protein–protein interaction data. A set of open source standards is being developed along MIAME lines, including a content standard, the Minimum Information About Proteomics Experiments (MIAPE), an XML standard data exchange format (Hermjakob et al., 2004) and an ontology of clearly defined general proteomics terms.
Standard Metabolic Reporting Structure (SMRS)
SMRS (http://www.smrsgroup.org) comprises industry, software developers, governmental representatives and academia, who are investigating the reporting and design of metabonomics and metabolomics studies in plants, microbial systems, environment, in vivo and in vitro applications, as well as human studies. A set of draft recommendations has been produced as a discussion document. It considers the factors in a metabolic study that could be recorded and standardized, including the origin of a biological sample, the technologies and methods for analysis and the chemometric and statistical approaches. The recommendations also touch on the granularity of information required for different reporting needs, including journal submissions, public databases and regulatory submissions.
Measurement and methods validations
As high-throughput technologies are used in industry and are considered by regulatory agencies, the methodology itself comes under scrutiny. Agreement on data formats will do little good if experimental protocols are inconsistent. Currently, standardization of microarray experiment procedures is key to the broad acceptance and use of these data. The very variability of microarray data generation, analysis, future validation of the technology and production of standard materials is now the focus of many initiatives.
MfB (Measurements for Biotechnology) program
MfB (http://www.mfbprog.org.uk) is a UK programme that addresses bio-measurements of importance for industry. The ‘Comparability of Gene Expression Measurements on Microarrays’ project is an industry-based consortium led by LGC (http://www.lgc.co.uk). The project is designed to determine the accuracy and comparability of gene expression measurements made on different array platforms and also to evaluate data analysis methods. A second phase is now looking at the standardization of array-based toxicogenomics and will build on the analysis framework to develop a panel of quality metrics for validating and standardizing array-based toxicogenomics measurements.
The Microarray Research Group (MARG) of the Association of Biomolecular Resource Facilities (ABRF)
The MARG (http://www.abrf.org/index.cfm/group.show/Microarray.30.htm) is a research-focused consortium of academic laboratories promoting communication and cooperation among core academic and industrial microarray and data analysis services providers. The resulting data is used to help laboratories evaluate their performance and achieve the highest quality results possible from the use of microarray technologies.
The European Centre for the Validation of Alternative Methods (ECVAM)
The ECVAM (http://ecvam.jrc.cec.eu.int) coordinates and funds validation studies of alternative methods that could reduce, refine or replace the use of laboratory animals in regulatory toxicology. Both the new EU Chemical Policy (REACH) (Editorials, 2003a, 2003b), which proposes the re-evaluation of about 30 000 chemicals, and the 7th Amendment to the Cosmetics Directive, which foresees the complete replacement of animal experiments by 2013, call for the development and implementation of alternative methods. ECVAM is working with the US Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM; http://iccvam.niehs.nih.gov/home.htm) and National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM; http://iccvam.niehs.nih.gov/home.htm) to investigate the specific considerations necessary for adequate validation of array-based toxicogenomics test methods. At present, recommendations are being prepared which will cover topics such as description of the biological systems, methodological/technical issues, data analysis, and data format and storage.
External RNA Controls Consortium (ERCC)
ERCC (http://www.cstl.nist.gov/biotech/workshops/ERCC2004) originated at a US National Institute of Standards and Technology (NIST; http://www.nist.gov) meeting and is composed of representatives from the public, private and academic sectors, addressing experimental control and performance evaluation for gene expression analysis. ERCC is considering the utility of universal (platform-independent) spike-in controls, protocols, and informatics tools intended for use across one- and two-channel microarray and quantitative RT-PCR (QRT-PCR). Outcomes of this work will be published and resulting data submitted to a public database.
Regulatory-driven fora
To streamline regulatory electronic submissions, a number of technical issues need to be addressed. These efforts intend to identify the kind of data that should be included in submissions to regulatory bodies and automate the largely paper-based clinical trials and non-clinical research processes.
Clinical Data Interchange Standards Consortium (CDISC)
CDISC (http://www.cdisc.org) is an open, multidisciplinary, non-profit organization committed to the development of worldwide pharmaceutical industry standards and vendor-neutral, platform-independent data models to support the electronic acquisition, exchange, submission and archiving of clinical trials data and metadata.
Standard for Exchange of Non-clinical Data (SEND)
SEND (http://www.cdisc.org/models/send/v1.5) is a consortium formed among the pharmaceutical industry, contract laboratories, software developers and the FDA. The goal of SEND is to develop a common format for the electronic submission of animal toxicity data and study description to a regulatory agency. Once the SEND standard is finalized, it will be merged with CDISC’s model to form the Study Data Tabulation Model (SDTM).
Pharmacogenomics (PGx) Standards Group
The Pharmacogenomics (PGx) Standards Group was formed in November 2003 at a workshop organized by the Drug Information Association (DIA), FDA, Pharmacogenetics Working Group (PWG), Pharmaceutical Research and Manufacturers of America (PhRMA) and Biotechnology Industry Organization (BIO) to review the FDA draft, ‘Guidance for Industry: Pharmacogenomic Data Submissions’. The PGx Standards Group encompasses regulatory bodies, pharma, and industry organizations. The goal of this joint project is to help define the requirements for pharmacogenomics submission to the FDA and define data formats and standards. This project focuses on the use of pharmacogenomics and toxicogenomics data to support pharmacological and toxicological conclusions. There is a consensus within this group to use existing standards (e.g. MIAME, MAGE, SEND, CDISC) if available, and to extend them if needed.
Domain-driven fora
These toxicoinformatics- and ecoinformatics-specific initiatives are an example of international coordination for the development and adoption of controlled vocabularies and formats for exchanging chemical toxicity, and ecological and environmental data.
The Distributed Structure-Searchable Toxicity (DSSTox)
DSSTox (http://www.epa.gov/nheerl/dsstox) is a network project by the US EPA, providing a community forum for publishing standard format, structure-annotated chemical toxicity data files for open public access. Although the primary focus of this effort is the inclusion of chemical structures and standardized chemical fields, DSSTox will also promote the use of a controlled vocabulary, i.e. common data field names and entry formats for the same types of toxicity data across databases. It will link to such public toxicity data by incorporating DSSTox Standard Fields and Indices in the custom databases, making common queries possible using a standard DSSTox identifier. DSSTox is collaborating with, or using standards from, several other efforts, including the LeadScope In Silico Tox (LIST) Focus Group, the National Cancer Institute (NCI), NIEHS’s National Center for Toxicogenomics and the National Toxicology Program, the National Library of Medicine (NLM) TOXNET, the International Union of Pure and Applied Chemistry (IUPAC), the National Institute of Standards and Technology (NIST), the ILSI HESI SAR Toxicity Database Project and MGED’s MIAME/Tox, as well as numerous vendors and consortia (http://www.epa.gov/nheerl/dsstox/CoordinatingPublicEfforts.html).
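The benefit of common field names and a standard identifier across databases can be sketched as follows. This is a hedged illustration only: the database names, column names, the `std_id` field and all values are hypothetical and do not reproduce the actual DSSTox Standard Fields or identifiers.

```python
# Hypothetical mapping from each database's own column names to shared names.
FIELD_MAPS = {
    "db_a": {"id": "std_id", "name": "chemical_name", "tox": "toxicity_value"},
    "db_b": {"ID_No": "std_id", "Substance": "chemical_name", "Result": "toxicity_value"},
}


def standardize(record, source):
    """Rename a record's keys to the shared field names for its source database."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def query_by_identifier(databases, std_id):
    """Run one query across all databases using the shared identifier field."""
    hits = []
    for source, records in databases.items():
        for record in records:
            row = standardize(record, source)
            if row.get("std_id") == std_id:
                hits.append((source, row))
    return hits


# Placeholder records; the numbers are not real toxicity measurements.
databases = {
    "db_a": [{"id": "X-0001", "name": "compound_X", "tox": 1.0}],
    "db_b": [
        {"ID_No": "X-0001", "Substance": "Compound X", "Result": 2.0},
        {"ID_No": "X-0002", "Substance": "Compound Y", "Result": 3.0},
    ],
}
results = query_by_identifier(databases, "X-0001")
```

Once each custom database carries the shared fields, a single query function is enough; without them, every database would need its own lookup logic.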
The Science Environment for Ecological Knowledge (SEEK)
SEEK (http://seek.ecoinformatics.org) is a multidisciplinary initiative designed to create cyber-infrastructure for ecological, environmental and biodiversity research and to educate the ecological community about eco-informatics. SEEK participants are building an integrated data grid (EcoGrid) for accessing a wide variety of ecological and biodiversity data and analytical tools (Kepler; http://kepler-project.org). Ecological Metadata Language (EML) is a metadata specification developed in association with SEEK and the Knowledge Network for Biocomplexity (KNB; http://knb.ecoinformatics.org) that can be used in a modular and extensible manner to document ecological data.
World-wide organizations
Global organizations have initiated a dialogue between technological experts, regulators and the principal validation bodies to draw road maps for development, validation and regulatory use of omics-based technologies in chemical assessment. Others are liaising with different life sciences disciplines, offering support, mediation and consultancy to speed up the standards development process.
Organization for Economic Co-operation and Development (OECD) and the International Program on Chemical Safety (IPCS)
IPCS (http://www.who.int/ipcs/en/) is a joint program of three cooperating organizations (the International Labour Organization, the United Nations Environment Programme and the World Health Organization) implementing activities related to chemical safety. In collaboration with the Organization for Economic Cooperation and Development (OECD; http://www.oecd.org), the IPCS has organized a series of workshops to identify the possible application of methods based on (eco)toxicogenomics in regulatory hazard assessment; to determine the current limitations to the use of (eco)toxicogenomics in regulatory assessment and develop a plan to overcome such limitations; and to identify the need for future activities with regard to the use of these methods in test guidelines, new and existing chemicals, pesticides and biocides programs. At present, recommendations are being prepared and will be published. In view of these recommendations, the development of a coordinated international research program on (eco)toxicogenomics will be initiated, aiming to optimize the integration of genomic techniques into (eco)toxicology and their use in ecological and human health risk assessment.
The National Academy of Sciences (NAS)
The NAS Committee on Emerging Issues and Data on Environmental Contaminants (http://dels.nas.edu/emergingissues) is a public forum for communication among government, industry, environmental groups and the academic community about emerging evidence and issues in toxicogenomics, environmental toxicology, risk assessment and exposure assessment. The Committee will develop a framework for how the emerging field of genomics will be incorporated into risk assessment.
Institute of Electrical and Electronics Engineers (IEEE) Computer Society
The Bioinformatics Standards Committee (BSC; http://www.csbcon.org) has a mission to act as a liaison between the IEEE Standards Association and groups in the bioscience community that are developing standards for biological objects in the life sciences disciplines. BSC will provide a neutral forum for the global bioinformatics community to work towards common agreements on standards in new areas and integration between established standards.
Standard(s)-compliant infrastructure
This section provides a short review of public infrastructure currently available for toxicogenomics and environmental genomics data. These efforts are in different stages of development, serving specific needs of their user community and relying on diverse types of funding support. Nevertheless, these are examples of institutions working together, sharing expertise and moving towards an internationally compatible informatics platform for data exchange, interacting closely with standardization initiatives listed here.
ArrayExpress and Tox-MIAMExpress
ArrayExpress (http://www.ebi.ac.uk/arrayexpress) (Brazma et al., 2003) is an MGED standards-compliant, public infrastructure for microarray-based gene expression data at the EBI. The infrastructure has been extended to link biological endpoint values with gene expression data as a result of a collaborative undertaking with the ILSI HESI Committee on the Application of Toxicogenomics Data to Mechanism-based Risk Assessment (http://www.ebi.ac.uk/microarray/Projects/tox-nutri). Their toxicogenomics datasets (Pennie et al., 2004) have been submitted to ArrayExpress using Tox-MIAMExpress, the online MIAME/Tox-compliant data input tool (Mattes et al., 2004) (http://www.ebi.ac.uk/tox-miamexpress). The ILSI HESI Committee research programme has provided the first large array-based toxicogenomics dataset in the public domain annotated according to the MGED standards.
Chemical Effects in Biological Systems (CEBS) Knowledgebase
CEBS (http://cebs.niehs.nih.gov) (Waters et al., 2003) is a public toxicogenomics knowledgebase in year two of its 10-year development at the NIEHS’s NCT. CEBS aims to integrate omics datasets in the context of toxicology to advance knowledge discovery about toxicity (Waters et al., 2003; Waters and Fostel, 2004; Mattes et al., 2004). CEBS implements standards developed by the MGED Society and the HUPO PSI in the CEBS SysBio object model (Xirasagar et al., 2004). CEBS is designing an ontological representation of data and terms used by its collaborators, which includes descriptors for different study design types and metadata vocabularies.
maxd
maxd (http://bioinf.man.ac.uk/microarray/maxd) is an open-source data warehouse and visualization environment for genomic expression data employed by the NERC EGTDC. The maxd software suite includes two major components. The first, maxdLoad2, is a database schema and data loading and curation application designed to enable biologists to store expression data, annotate it to MIAME and MIAME/Env standards, and export it in MAGE-ML format to ArrayExpress. The second, maxdView, is a modular analysis and visualization environment for interactive exploration of transcriptomics data and associated metadata.
Toxicoinformatics Integrated System (TIS)
ArrayTrack (http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack; Tong et al., 2003) is an integrated software system for managing, mining and visualizing microarray gene expression data at NCTR-FDA. The system has three integrated components: a MIAME-compliant database storing array-based toxicogenomics data; a set of tools providing data visualization and analysis capability; and a library containing functional information about genes, proteins, pathways and toxicants. ArrayTrack is the first module of TIS, a system to integrate genomic, proteomic and metabonomic data with data from the public repositories, as well as conventional in vitro and in vivo toxicology data. TIS will serve as a general toxicogenomics repository for diverse data sources, supporting broad data mining and meta-analysis activities, as well as the development of robust and validated predictive toxicology systems.
The Comparative Toxicogenomics Database (CTD)
The CTD (http://ctd.mdibl.org) promotes understanding about the effects of environmental chemicals on human health by facilitating cross-species comparative studies of toxicologically important genes and proteins. CTD is now publicly available as a prototype. It provides annotated associations between genes, proteins, sequences, references and chemicals in vertebrates and invertebrates; integrates molecular and toxicology data; implements ontologies; and will describe gene–chemical interactions in diverse organisms. These data provide insight into the genetic basis of variable sensitivity to chemicals and complex interactions between the environment and human health.
Conclusions
Data produced by (eco)toxicogenomics investigations are growing in volume and complexity at a staggering rate. It is not trivial to define precise data content, presentation and exchange formats. However, there is a growing realization within the (eco)toxicogenomics community that, if we are to realize the opportunities offered by omics-based technologies, we will need to change our approach to data handling and work more collaboratively. The authors, also moderators of the RSBI working group, would like to emphasize the need for community participation in the integration of these standardization initiatives. It is hoped that highlighting these different initiatives will help to assess the commonality and optimize harmonization, thus minimizing duplication and incompatibility and achieving cost-effective results in a timely manner.
Acknowledgements
The authors would like to thank the following people for their assistance: Alvis Brazma, Ann Richard, Betty Cheng, Carole Foy, Carolyn Mattingly, Chris Taylor, Craig Zwickl, Dawn Field, Helen Parkinson, Henning Hermjakob, Jason Snape, Jessie Kennedy, John Lindon, Michael Waters, Nancy Doerrer, Peter Lord, Raffaella Corvi, Robert Stevens, Syril Petit, Thomas Papoian, Weida Tong and the SOFG organizers. Susanna-Assunta Sansone is supported by the ILSI-HESI Genomics Committee, Norman Morrison by the NERC Environmental Genomics programme, Philippe Rocca-Serra by the European Commission NuGO project and Jennifer Fostel by the NIEHS NCT.
References
ArrayExpress; http://www.ebi.ac.uk/arrayexpress
ArrayTrack; http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack
Ball CA, Sherlock G, Parkinson H, et al. 2002. An open letter to the scientific journals, published in: Science 298(5593): 539; Bioinformatics 18(11): 1409; Lancet 360: 1019.
Brazma A, Parkinson H, Sarkans U, et al. 2003. ArrayExpress: a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 31(1): 68–71.
Brazma A, Hingamp P, Quackenbush J, et al. 2001. Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nature Genet 29(4): 365–371.
BSC; http://www.csbcon.org
CDISC; http://www.cdisc.org
CEBS; http://cebs.niehs.nih.gov
CTD; http://ctd.mdibl.org
DSSTox Coordinating Public Effort project; http://www.epa.gov/nheerl/dsstox/CoordinatingPublicEfforts.html
DSSTox; http://www.epa.gov/nheerl/dsstox
EBI toxicogenomics; http://www.ebi.ac.uk/microarray/Projects/tox-nutri
EBI; http://www.ebi.ac.uk
ECVAM; http://ecvam.jrc.cec.eu.int
Editorial. 2002. Microarray standards at last. Nature 419: 323.
Editorial. 2003a. EU starts a chemical reaction. Science 300(5618): 405.
Editorial. 2003b. Europe whittles down plans for massive chemical testing program. Science 302(5647): 969.
Edgar R, Domrachev M, Lash AE. 2002. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30(1): 207–210.
EPA DRAFT: Potential implications of genomics for regulatory and risk assessment applications at EPA; http://www.epa.gov/osa/genomics.htm
ERCC; http://www.cstl.nist.gov/biotech/workshops/ERCC2004
FDA draft guidance for industry pharmacogenomic data submissions; http://www.fda.gov/cder/guidance/5900dft.doc
Hermjakob H, Montecchi-Palazzi L, Bader G, et al. 2004. The HUPO PSI molecular interaction format: a community standard for the representation of protein interaction data. Nature Biotechnol 22: 177–183.
ICCVAM; http://iccvam.niehs.nih.gov/home.htm
Ikeo K, Ishi-i J, Tamura T, et al. 2003. CIBEX: center for information biology gene expression database. C R Biol 326(10–11): 1079–1082.
ILSI HESI; http://hesi.ilsi.org/index.cfm?pubentityid=120
IPCS; http://www.who.int/ipcs/en
Kepler; http://kepler-project.org
KNB; http://knb.ecoinformatics.org
LGC; http://www.lgc.co.uk
MARG; http://www.abrf.org/index.cfm/group.show/Microarray.30.htm
Mattes WB, Pettit SD, Sansone A, et al. 2004. Database development in toxicogenomics: issues and efforts. Environ Health Perspect 112: 495–505.
maxd; http://bioinf.man.ac.uk/microarray/maxd
MfB; http://www.mfbprog.org.uk
MGED RSBI Working Groups; http://www.mged.org/Workgroups/rsbi
MGED; http://www.mged.org
NAS Committee on Emerging Issues and Data on Environmental Contaminants; http://dels.nas.edu/emergingissues/index.asp
NCTR-FDA; http://www.fda.gov/nctr
NERC EGTDC; http://envgen.nox.ac.uk
NICEATM; http://iccvam.niehs.nih.gov/home.htm
NIEHS-NCT; http://www.niehs.nih.gov/nct
OECD; http://www.oecd.org
Pennie W, Pettit SD, Lord PG. 2004. Toxicogenomics in risk assessment: an overview of an HESI collaborative research program. Environ Health Perspect 112: 417–419.
PSI; http://psidev.sourceforge.net
Quackenbush J. 2004. Data standards for ‘omic’ science. Nature Biotechnol 22: 613–614.
SEEK; http://seek.ecoinformatics.org
SEND; http://www.cdisc.org/models/send/v1.5
SMRS; http://www.smrsgroup.org
SOFG; http://www.sofg.org
Spellman PT, Miller M, Stewart J, et al. 2002. Design and implementation of microarray gene expression mark-up language (MAGE-ML). Genome Biol 3(9): research0046.
Stoeckert CJ, Parkinson H. 2003. The MGED ontology: a framework for describing functional genomics experiments. Comp Funct Genom 4: 127–132.
Stoeckert CJ, Causton HC, Ball CA. 2002. Microarray databases: standards and ontologies. Nature Genet 32: 469–473.
Tong W, Cao X, Harris S, et al. 2003. ArrayTrack: supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environ Health Perspect 111: 1819–1826.
Tox-MIAMExpress; http://www.ebi.ac.uk/tox-miamexpress
Waters M, Boorman G, Bushel P, et al. 2003. Systems toxicology and the chemical effects in biological systems knowledge base. Environ Health Perspect 111: 811–824.
Waters MD, Fostel JM. 2004. Toxicogenomics and systems toxicology: aims and prospects. Nature Rev Genet 5: 938–948.
Xirasagar S, Gustafson S, Merrick AB, et al. 2004. CEBS object model for systems biology data, CEBS SysBio-OM. Bioinformatics 20(13): 2004–2015.