Marie-Adèle Rajandream
The Pathogen Sequencing Unit
The Sanger Institute
The Wellcome Trust Genome Campus
Hinxton
Cambridge
United Kingdom
The Sanger Institute
• Principally funded by the Wellcome Trust (about 96%)
• 60,000,000 bases of raw data per day
• 600 employees
• Sequencing of human, mouse, zebrafish and pathogen genomes
• Manual and automatic genome annotation (Ensembl, Artemis)
• Identification of cancer-causing mutations (recently the BRAF gene mutation)
• Sequence variation and disease association
Sequencing
• Small genomes (bacterial and model organisms)
• 60-70 projects
• Current capacity: 4 M reads p/a, sufficient for 100 Mb of finished sequence
• Mainly whole genome/chromosome shotguns, including finishing
• Many are international collaborations
• Larger, more complex genomes (35-100 Mb) on the horizon
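The capacity figures above imply a fixed ratio of reads to finished sequence. The short Python sketch below works that ratio out using only the two numbers quoted on the slide (4 M reads per annum, 100 Mb finished). The function name `share_of_annual_capacity` is introduced here for illustration only, and the result is a rough planning estimate, not a statement about any particular project.

```python
# Figures taken from the slide; no read length or coverage is assumed.
READS_PER_YEAR = 4_000_000       # "4 M reads p/a"
FINISHED_MB_PER_YEAR = 100       # "sufficient for 100 Mb of finished sequence"

# Implied number of reads needed per finished megabase.
reads_per_finished_mb = READS_PER_YEAR / FINISHED_MB_PER_YEAR


def share_of_annual_capacity(genome_mb: float) -> float:
    """Fraction of one year's read capacity a genome of this size would use.

    Hypothetical helper: assumes the slide's reads-per-Mb ratio also holds
    for larger genomes, which may not be true for more complex ones.
    """
    return genome_mb * reads_per_finished_mb / READS_PER_YEAR


if __name__ == "__main__":
    print(f"{reads_per_finished_mb:,.0f} reads per finished Mb")
    for size_mb in (35, 100):  # the "35-100 Mb" range mentioned on the slide
        print(f"{size_mb} Mb genome: {share_of_annual_capacity(size_mb):.0%} of annual capacity")
```

Under that assumption, a single genome at the upper end of the 35-100 Mb range would occupy the whole of the current annual capacity.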
Informatics
• Automatic analysis
• Manual annotation by expert biologists
• Tools: finishing (Cyclops), annotation (Artemis), comparative analysis (ACT)
• Data dissemination
• Database resources
Functional Genomics
• S. pombe
• Bacterial genomes
• D. discoideum
The Pathogen Sequencing Unit
GeneDB
http://www.genedb.org
[Diagram labels: project pages, annotation, sequences, analysis, FTP site, BLAST, curation]
What is GeneDB?
• a generic organism database
• annotated sequences as well as functional data
• visualisation in user-friendly environment
• annotation and analysis of data by biologists
• flexible enough to incorporate new data types
• linked to external databases
• fully curated
The GeneDB project
• Started in 2001
• Funded by the Wellcome Trust for a period of 5 years
• Initially for 3 organisms: S. pombe, Leishmania & Trypanosome
• 2 full-time programmers, 1 part-time programmer
• One curator for each organism
• One helpdesk person / programmer
• Prototype now done and in use
Technical Outline: Prototype
“Java”
biojava
data
gui
minelet
mining
test
utils
web
Web
jsp cgi
blast
ominblast
asp common
cerevisiae
pombe
malaria
leish
tryp
Data
asp images serialise indices
cerevisiae images serialise indices
pombe
malaria
tryp
leish
EMBL
Broad specifications for production version
• Relational database
• Curator / annotator interface incorporating functionality of Artemis (MESS)
• Facility for doing more complex queries
For comprehensive, detailed specs see our Functional Specifications document
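To make the "relational database" and "more complex queries" points concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names and gene identifiers are all hypothetical placeholders invented for this example; they are not the GeneDB schema, which is described in the Functional Specifications document.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE organism (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE gene (
            id            INTEGER PRIMARY KEY,
            organism_id   INTEGER NOT NULL REFERENCES organism(id),
            systematic_id TEXT NOT NULL,   -- placeholder identifiers only
            chromosome    TEXT,
            product       TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO organism VALUES (?, ?)",
        [(1, "S. pombe"), (2, "P. falciparum")],
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "demo_gene_1", "I", "hypothetical protein"),
            (2, 2, "demo_gene_2", "14", "biotin carboxylase"),
            (3, 2, "demo_gene_3", "14", "hypothetical protein"),
        ],
    )
    return conn


def genes_by_product(conn, organism_name, product_pattern):
    """Join gene to organism and filter on organism name and product text."""
    return conn.execute(
        """
        SELECT g.systematic_id, g.chromosome, g.product
        FROM gene AS g
        JOIN organism AS o ON o.id = g.organism_id
        WHERE o.name = ? AND g.product LIKE ?
        ORDER BY g.systematic_id
        """,
        (organism_name, product_pattern),
    ).fetchall()


conn = build_demo_db()
rows = genes_by_product(conn, "P. falciparum", "%carboxylase%")
print(rows)
```

A query that combines organism, chromosome and product in this way is awkward to express against flat files, which is one plausible reason for listing a relational backend among the production requirements.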
P. falciparum chr. 14
“biotin carboxylase”: Inferred by Sequence Similarity with a yeast sequence, SGD:S0005299
(which was originally annotated based on a published mutant phenotype)
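The example above pairs a product name with the evidence for it and a cross-reference to the sequence the annotation was transferred from. One way to hold such a record is sketched below in Python. The class name `Annotation` and its field names are assumptions made for illustration, not the data model used by GeneDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one evidence-backed annotation."""
    organism: str
    chromosome: str
    product: str
    evidence: str   # how the annotation was inferred
    with_ref: str   # cross-reference in "DATABASE:accession" form

    def with_parts(self):
        """Split the cross-reference into (database, accession)."""
        database, _, accession = self.with_ref.partition(":")
        return database, accession

    def is_transferred(self) -> bool:
        """True when the annotation rests on similarity to another sequence."""
        return self.evidence == "Inferred by Sequence Similarity"


# The values below are the ones shown on the slide.
example = Annotation(
    organism="P. falciparum",
    chromosome="14",
    product="biotin carboxylase",
    evidence="Inferred by Sequence Similarity",
    with_ref="SGD:S0005299",
)

print(example.with_parts())
print(example.is_transferred())
```

Keeping the evidence and the source cross-reference alongside the product makes it possible to trace a transferred annotation back to the experimental result it ultimately depends on, here a published mutant phenotype in yeast.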
Pathogen Sequencing Unit
Analysis: Martin Aslett, Steven Bentley, Matthew Berriman, Ana Cerdeno, Christiane Hertz-Fowler, Matthew Holden, Keith James, Rachel Lyne, Arnab Pain, Chris Peacock, Mohammed Sebaihia, Nick Thomson, Valerie Wood
Project Management: Bart Barrell, Julian Parkhill, Marie-Adèle Rajandream, Al Ivens, Neil Hall
Programming: Rob Davies, David Harper, Arnaud Kerhornou, Paul Mooney, Kim Rutherford, Adrian Tivey, Ed Zuiderwijk
Karen Mungall, Theresa Feltwell, Ian Goodhead, Zahra Hance, Heidi Hauser, Mandy Sanders, Mark Simmonds, Danielle Walker
Barbara Harris, Becky Atkin, Andrew Barron, Carol Chillingworth, Louise Clarke, Craig Corton, Jonathan Doggett, Nicola Lennard, Alexandra Line, Doug Ormand
David Harris, Matthew Collins, Nigel Fosker, Arlette Goble, Lee Murphy, Susan O’Neil, Simon Rutter, David Saunders, Kathy Seeger, Robert Squares, Steven Squares
Carol Churcher, Karen Brooks, Inna Cherevach, Tracey Chillingworth, Kay Clarke, Paul Davies, Nancy Hamlin, Kay Jagels, Sharon Moule, Brian White, Sally Whitehead
Subcloning
Ann Cronin, Audrey Fraser, David Johnson, Mike Quail, Claire Price, Ester Rabbinowitsch, Sarah Sharp
Mapping: Maria Fookes, John Woodward
Sequencing
Wellcome Trust Sanger Institute
Administration: Yvonne Shaw