ArrayExpress
A public database for microarray-based gene expression data
http://www.ebi.ac.uk/microarray/
European Bioinformatics Institute
EMBL-EBI
Alvis Brazma, Helen Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo + team
MGED IV, Boston, February 2002
ArrayExpress
• Standards: MIAME-compliant
• Data model: MAGE-OM
• Data input: MAGE-ML, web
• Data output: HTML, MAGE-ML, TAB-delimited, link to Expression Profiler
• Data curation: Team of curators
• Data sets: Yeast, human
Opened to public: Tuesday, February 12th, 2002
General overview
[Diagram: MIAMExpress, ArrayExpress and Expression Profiler, connected via MAGE-ML and the Internet (www)]
ArrayExpress component architecture
[Diagram with the following components:]
• Main database: SQL, derived from MAGE-OM
• Data warehouse: gene-centred queries
• Application server: Java servlets, MAGE-OM
• Images: file server
• Submission/curation: MAGE-ML
• Access: Internet (www), MAGE-ML
ArrayExpress - features
• MIAME-compliant, MAGE-ML, MAGE-OM
• Can deal with:
  • raw quantitation data
  • processed data
  • data transformations
• Independent of:
  • experimental platforms
  • image analysis methods
  • data normalization methods
ArrayExpress: details
• Database schema derived from MAGE-OM
• Standard SQL, we use Oracle
• Data loader for MAGE-ML - generated
• Web interface (first release 12.2.2002)
• Queries by experiment, array, sample
• Browsing
• Object model-based query mechanism, automatic mapping to SQL
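The object model-based query mechanism can be pictured roughly as follows. This is only an illustrative sketch: the `Query` class, the table names in `TABLES`, the columns `accession` and `species`, the demo rows, and the use of SQLite in place of Oracle are all assumptions made for the example, not the actual ArrayExpress schema or code.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical mapping from object-model classes to SQL tables.
TABLES = {"Experiment": "experiment", "Array": "array_design", "Sample": "sample"}


@dataclass
class Query:
    """A query phrased as an object-model class plus attribute filters."""
    cls: str
    filters: dict = field(default_factory=dict)

    def to_sql(self):
        """Translate the object-level query into parameterised SQL."""
        table = TABLES[self.cls]
        for attr in self.filters:
            # Attribute names are spliced into the SQL text, so reject
            # anything that is not a plain identifier.
            if not attr.isidentifier():
                raise ValueError(f"bad attribute name: {attr!r}")
        sql = f"SELECT * FROM {table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"{a} = ?" for a in self.filters)
        return sql, list(self.filters.values())


def run(conn, query):
    """Execute the generated SQL and return all rows."""
    sql, params = query.to_sql()
    return conn.execute(sql, params).fetchall()


def demo():
    """Build a tiny in-memory database with made-up rows and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE experiment (accession TEXT, species TEXT)")
    conn.executemany(
        "INSERT INTO experiment VALUES (?, ?)",
        [("E-DEMO-1", "yeast"), ("E-DEMO-2", "human")],  # invented demo data
    )
    return run(conn, Query("Experiment", {"species": "yeast"}))
```

The caller only names a class and attribute values; the SQL text is produced by the mapping layer, which is the general idea behind generating queries from an object model.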
Simplified ArrayExpress model
MIAMExpress
• Data annotation and submission tool
• MIAME based web interface
• Experiment, Array, Protocol submissions
• Uses CV/ontology wherever possible
• Creates MAGE-ML files for loading into ArrayExpress
• Based on MySQL, Perl, CGI, Apache
MIAMExpress submission procedure
[Flow diagram; steps and the protocol attached to each:]
• Create account / Login
• Pending/New Experiment
• Samples (Sample 1 … Sample n): sample protocol
• Extracts (E1 … En for each sample): extraction protocol
• Hybridisations: hybridisation protocol
• Arrays (Array 1 … Array n): scanning protocol
• Data (Data 1 … Data n): image analysis protocol
• Combined experiment data: data transformation protocol
• Submit: final free-text comment
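One way to think about the submission procedure is as a record that pairs each step with its protocol and can report what is still missing before submission. The sketch below is hypothetical: the `Submission` class, its field names, and the rule that every step needs both entries and a protocol are assumptions for illustration, not MIAMExpress code (which is Perl/CGI on MySQL).

```python
from dataclasses import dataclass, field

# Steps and the protocol paired with each, in the order of the procedure.
STEPS = [
    ("samples", "sample protocol"),
    ("extracts", "extraction protocol"),
    ("hybridisations", "hybridisation protocol"),
    ("arrays", "scanning protocol"),
    ("data", "image analysis protocol"),
    ("combined data", "data transformation protocol"),
]


@dataclass
class Submission:
    """Hypothetical in-memory record of a pending experiment submission."""
    items: dict = field(default_factory=dict)      # step name -> list of entries
    protocols: dict = field(default_factory=dict)  # step name -> protocol text
    comment: str = ""                              # final free-text comment

    def missing(self):
        """Return the steps that still lack entries or a protocol."""
        return [
            step for step, _ in STEPS
            if not self.items.get(step) or not self.protocols.get(step)
        ]

    def ready_to_submit(self):
        """A submission is complete once no step is missing (assumed rule)."""
        return not self.missing()
```

A pending experiment with only its samples filled in would report the remaining five steps from `missing()`, which is the kind of check a submission-tracking tool needs for long-lived submissions.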
MIAMExpress design and future
• Species and domain specific pages and ontologies, ontology development
• Life-span of data submissions is long
• Curation control, submissions tracking
• Interaction with ArrayExpress
• Full MAGE-OM, data updating
• Usability, flexibility, scalability, platform independence
• User needs, free in-house installation
ArrayExpress curation effort
• User support and help documentation
• Submission support for MIAMExpress
• Support on ontologies and CVs
• Minimize free text, removal of synonyms
• MIAME encouragement
• Help on MAGE-ML
• Goal: to provide high-quality, well-annotated data to allow automated data analysis
Accession numbers
• E-MEXP-234: Experiment 234 via MIAMExpress
• E-SANG-25: Experiment 25 from Sanger Institute
• A-AFFY-1034: Array description 1034 from Affymetrix
• P-LABL-5: Protocol 5 for labeling
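The accession numbers above follow a visible pattern: a type letter, a source code, and a running number. A minimal parsing sketch is shown below. The function name `parse_accession` is hypothetical, and the assumption that the source code is always four uppercase letters is inferred only from the four examples listed.

```python
import re

# Type prefixes as they appear in the examples:
# E = experiment, A = array description, P = protocol.
TYPES = {"E": "experiment", "A": "array description", "P": "protocol"}

# Assumed shape: <type letter>-<four-letter source code>-<number>.
ACCESSION = re.compile(r"^([EAP])-([A-Z]{4})-(\d+)$")


def parse_accession(acc):
    """Split an accession such as 'E-MEXP-234' into (type, source, number)."""
    m = ACCESSION.match(acc)
    if m is None:
        raise ValueError(f"not a recognised accession: {acc!r}")
    kind, source, number = m.groups()
    return TYPES[kind], source, int(number)
```

For example, `parse_accession("A-AFFY-1034")` yields the type `"array description"`, the source code `"AFFY"` and the number `1034`.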
Data in ArrayExpress
Now:
• Human data (ironchip) from EMBL
• Yeast data from EMBL
• S. pombe data from Sanger Institute
Work underway:
• TIGR array descriptions
• Affymetrix chip designs
• Direct pipeline from Sanger (Rob Andrews)
• HGMP mouse
• EMBL mosquito
• (Add your name here!)
Data browsing and queries
Experiment info
Sample info
Expression Profiler: EPCLUST
[Screenshot: EPCLUST workflow (DATA, SELECT, FOLDER, ANALYZE); a "cluster" is passed via URLMAP to GeneOntology, pathways, databases, SPEXS and other tools]
>YAL036C chromo=1 coord=(76154-75048(C)) start=-600 end=+2 seq=(76152-76754)
TGTTCTTTCTTCTTCTGCTTCTCCTTTTCCTTTTTTTCCTTCTCCTTTTCCTTCTTGGACTTTAGTATAGGCTTACCATCCTTCTTCTCTTCAATAACCTTCTTTTCTTGCTTCTTCTTCGATTGCTTCAAAGTAGACATGAAGTCGCCTTCAATGGCCTCAGCACCTTCAGCACTTGCACTTGCTTCTCTGGAAGTGTCATCTGCACCTGCGCTGCTTTCTGGATTTGGAGTTGGCGTGGCACTGATTTCTTCGTTCTGGGCGGCGTCTTCTTCGAATTCCTCATCCCAGTAGTTCTGTTGGTTCTTTTTACTCTTTTTCGCCATCTTTCACTTATCTGATGTTCCTGATTGCCCTTCTTATCCCCTCAAAGTTCACCTTTGCCACTTATTCTAGTGCAAGATCTCTTGCTTTCAATGGGCTTAAAGCTTGAAAAATTTTTTCACATCACAAGCGACGAGGGCCCGTTTTTTTCATCGATGAGCTATAAGAGTTTTCCACTTTTAAGATGGGATATTACGGTGTGATGAGGGCGCAATGATAGGAAGTGTTTGAAGCTAGATGCAGTAGGTGCAAGCGTAGAGTTGTTGATTGAGCAAA_ATG_
>YAL025C chromo=1 coord=(101147-100230(C)) start=-600 end=+2 seq=(101145-101747)
CTTAGAAGATAAAGTAGTGAATTACAATAAATTCGATACGAACGTTCAAATAGTCAAGAATTTCATTCAAAGGGTTCAATGGTCCAAGTTTTACACTTTCAAAGTTAACCACGAATTGCTGAGTAAGTGTGTTTATATTAGCACATTAACACAAGAAGAGATTAATGAACTATCCACATGAGGTATTGTGCCACTTTCCTCCAGTTCCCAAATTCCTCTTGTAAAAAACTTTGCATATAAAATATACAGATGGAGCATATATAGATGGAGCATACATACATGTTTTTTTTTTTTTAAAAACATGGACTCGAACAGAATAAAAGAATTTATAATGATAGATAATGCATACTTCAATAAGAGAGAATACTTGTTTTTAAATGAGAATTGCTTTCATTAGCTCATTATGTTCAGATTATCAAAATGCAGTAGGGTAATAAACCTTTTTTTTTTTTTTTTTTTTTTTTGAAAAATTTTCCGATGAGCTTTTGAAAAAAAATGAAAAAGTGATTGGTATAGAGGCAGATATTGCATTGCTTAGTTCTTTCTTTTGACAGTGTTCTCTTCAGTACATAACTACAACGGTTAGAATACAACGAGGAT_ATG_
...
>YBR084W chromo=2 coord=(411012-413936) start=-600 end=+2 seq=(410412-411014)
CCATGTATCCAAGACCTGCTGAAGATGCTTACAATGCCAATTATATTCAAGGTCTGCCCCAGTACCAAACATCTTATTTTTCGCAGCTGTTATTATCATCACCCCAGCATTACGAACATTCTCCACATCAAAGGAACTTTACGCCATCCAACCAATCGCATGGGAACTTTTATTAAATGTCTACATACATACATACATCTCGTACATAAATACGCATACGTATCTTCGTAGTAAGAACCGTCACAGATATGATTGAGCACGGTACAATTATGTATTAGTCAAACATTACCAGTTCTCGAACAAAACCAAAGCTACTCCTGCAACACTCTTCTATCGCACATGTATGGTTCTTATTGTTTCCCGAGTTCTTTTTTACTGACGCGCCAGAACGAGTAAGAAAGTTCTCTAGCGCCATGCTGAAATTTTTTTCACTTCAACGGACAGCGATTTTTTTTCTTTTTCCTCCGAAATAATGTTGCAGCGGTTCTCGATGCCTCAAGAATTGCAGAAGTAAACCAGCCAATACACATCAAAAAACAACTTTCATTACTGTGATTCTCTCAGTCTGTTCATTTGTCAGATATTTAAGGCTAAAAGGAA_ATG_
101 Sequences relative to ORF start
GATGAG.T     1:52/70  2:453/508  R:7.52345  BP:1.02391e-33
G.GATGAG.T   1:39/49  2:193/222  R:13.244   BP:2.49026e-33
AAAATTTT     1:63/77  2:833/911  R:4.95687  BP:5.02807e-32
TGAAAA.TTT   1:45/53  2:333/350  R:8.85687  BP:1.69905e-31
TG.AAA.TTT   1:53/61  2:538/570  R:6.45662  BP:3.24836e-31
TG.AAA.TTTT  1:40/43  2:254/260  R:10.3214  BP:3.84624e-30
TGAAA..TTT   1:54/65  2:608/645  R:5.82106  BP:1.0887e-29
...
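Pattern lines in the format above are easy to load for further filtering or sorting. The sketch below only splits each line into its fields. The names `PatternHit`, `set1`, `set2`, `ratio` and `bp` are hypothetical labels chosen here; the slide itself does not spell out what the columns mean, so no interpretation is attached to them.

```python
from typing import NamedTuple


class PatternHit(NamedTuple):
    pattern: str   # e.g. "GATGAG.T" ('.' is a wildcard position)
    set1: tuple    # the two integers after "1:"
    set2: tuple    # the two integers after "2:"
    ratio: float   # value after "R:"
    bp: float      # value after "BP:"


def _pair(text, prefix):
    """Parse a field such as '1:52/70' into (52, 70)."""
    if not text.startswith(prefix):
        raise ValueError(f"expected field starting with {prefix!r}: {text!r}")
    a, b = text[len(prefix):].split("/")
    return int(a), int(b)


def _value(text, prefix):
    """Parse a field such as 'R:7.52345' into 7.52345."""
    if not text.startswith(prefix):
        raise ValueError(f"expected field starting with {prefix!r}: {text!r}")
    return float(text[len(prefix):])


def parse_line(line):
    """Split one whitespace-separated pattern line into a PatternHit."""
    pattern, f1, f2, r, bp = line.split()
    return PatternHit(pattern, _pair(f1, "1:"), _pair(f2, "2:"),
                      _value(r, "R:"), _value(bp, "BP:"))
```

With the lines parsed, sorting by `bp` or filtering on `ratio` becomes a one-liner over a list of `PatternHit` records.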
[Figure: occurrences of the patterns GATGAG.T and TGAAA..TTT in the upstream sequences (600 bp) of YGR128C + 100; combined query GATGAG.T W/30 TGAAA..TTT, 1 mismatch]
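Matching patterns such as GATGAG.T against upstream sequences, with '.' as a wildcard and a small number of mismatches allowed, can be sketched as below. This is a plain brute-force illustration, not the SPEXS or PATMATCH implementation. The helper `pairs_within` reflects one assumed reading of "W/30" (the second pattern starts at most 30 positions after the first starts); the slide does not define that operator.

```python
def find_matches(seq, pattern, max_mismatches=0):
    """Return start positions where pattern matches seq.

    '.' in the pattern matches any base; up to max_mismatches of the
    remaining positions may differ.
    """
    hits = []
    width = len(pattern)
    for i in range(len(seq) - width + 1):
        mismatches = sum(
            1 for p, s in zip(pattern, seq[i:i + width])
            if p != "." and p != s
        )
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def pairs_within(seq, first, second, window=30, max_mismatches=0):
    """Return (i, j) start positions with 'second' starting after 'first',
    no more than 'window' positions later (assumed meaning of W/30)."""
    a = find_matches(seq, first, max_mismatches)
    b = find_matches(seq, second, max_mismatches)
    return [(i, j) for i in a for j in b if 0 < j - i <= window]
```

A call such as `pairs_within(upstream, "GATGAG.T", "TGAAA..TTT", window=30, max_mismatches=1)` would then list candidate co-occurrences in one upstream sequence, where `upstream` is any 600 bp string supplied by the caller.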
Components of Expression Profiler
http://ep.ebi.ac.uk/
• EPCLUST: expression data
• GENOMES: sequence, function, annotation
• SPEXS: discover patterns
• URLMAP: provide links
• PATMATCH: visualise patterns
• EP:GO: GeneOntology
• EP:PPI: protein-protein interactions
• SEQLOGO
• Inputs: expression data; external data and tools (pathways, function, etc.)
Acknowledgments: the team (3)
Alvis Brazma, Alan Robinson, Jaak Vilo
November 1999, MGED 1 in Hinxton, EBI
Acknowledgments: the team (5)
Alvis Brazma, Alan Robinson
• Database: Ugis Sarkans
• Expression Profiler: Jaak Vilo
• Research, students: Thomas Schlitt
August 2000
Acknowledgments: the team (9)
Alvis Brazma
• Database: Ugis Sarkans
• Curation: Helen Parkinson
• MIAMExpress: Mohammadreza Shojatalab
• Expression Profiler: Jaak Vilo
• Research, students: Thomas Schlitt, Katja Kivinen, Johan Rung, Patrick Kemmeren
June 2001
Acknowledgments: the team (19)
Alvis Brazma
• Database: Ugis Sarkans
• Curation: Helen Parkinson
• MIAMExpress: Mohammadreza Shojatalab
• Expression Profiler: Jaak Vilo
• Research, students: Thomas Schlitt, Katja Kivinen, Johan Rung, Patrick Kemmeren
• Further team members: Gonzalo Garcia, Misha Kapushesky, Lev Soinov, Koichi Tazaki, Anastasia Samsonova, Susanna Sansone, Philippe Rocca-Serra, Ele Holloway, Niran Abeygunawardena, Ahmet Oezcimen
February 2002