Nucleic Acids Research, 2014, 1. doi: 10.1093/nar/gku1164
GenoBase: comprehensive resource database of Escherichia coli K-12

Yuta Otsuka1,†, Ai Muto1,†, Rikiya Takeuchi1,†, Chihiro Okada2,†, Motokazu Ishikawa2, Koichiro Nakamura2, Natsuko Yamamoto1, Hitomi Dose1, Kenji Nakahigashi3, Shigeki Tanishima2, Sivasundaram Suharnan4, Wataru Nomura1, Toru Nakayashiki1, Walid G. Aref5, Barry R. Bochner6, Tyrrell Conway7, Michael Gribskov8, Daisuke Kihara5,8, Kenneth E. Rudd9, Yukako Tohsato10, Barry L. Wanner11 and Hirotada Mori1
1 Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara 630–0101, Japan
2 Mitsubishi Space Software Co., LTD., 5–4–36 Tsukaguchihonnmachi, Amagasaki, Hyougo 661–0001, Japan
3 Institute of Advanced Biosciences, Keio University, Tsuruoka 997–0017, Japan
4 Axiohelix, Okinawa Sangyo Shien Center, 502, 1831–1 Oroku, Naha-shi, Okinawa 901–0152, Japan
5 Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907–2107, USA
6 Biolog, Inc., Hayward, CA 94545, USA
7 Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019–0245, USA
8 Department of Biological Sciences, Purdue University, 915 W. State Street, West Lafayette, IN 47907–2054, USA
9 Department of Biochemistry and Molecular Biology, University of Miami, PO Box 016129, Miami, FL 33101–6129, USA
10 Department of Bioinformatics, Ritsumeikan University, 1–1–1 Nojihigashi, Kusatsu, Shiga 525–8577, Japan
11 Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
Received September 17, 2014; Revised October 27, 2014; Accepted October 30, 2014
ABSTRACT
Comprehensive experimental resources, such as ORFeome clone libraries and deletion mutant collections, are fundamental tools for elucidation of gene function. Data sets obtained by omics analysis using these resources provide key information for functional analysis, modeling and simulation in both individual and systematic approaches. With the long-term goal of complete understanding of a cell, we have over the past decade created a variety of clone and mutant sets for functional genomics studies of Escherichia coli K-12. We have made these experimental resources freely available to the academic community worldwide. Accordingly, these resources have now been used in numerous investigations of a multitude of cell processes. Quality control is extremely important for evaluating results generated by these resources. Because the annotation that we originally used for the construction has changed since 2005, we have updated these genomic resources accordingly. Here, we describe GenoBase (http://ecoli.naist.jp/GB/), which contains key information about comprehensive experimental resources of E. coli K-12, their quality control and several omics data sets generated using these resources.
INTRODUCTION
Escherichia coli K-12 is clearly one of the best studied organisms (1) and has contributed enormously to the concept of the gene over the past half century (2). It is, however, still far from completely understood at the systems level, although the complete genome structure has been established and numerous functional analyses have been performed. Since the genome structure was determined, new streams of biology have emerged. Omics approaches, such as cytomics (3), interactome (4), phenomics (5), proteomics (6), transcriptomics (7), etc., have been developed following innovations in high-throughput (HT) technologies. Simultaneously, quantitative single-cell and single-molecule analyses have emerged and become increasingly
To whom correspondence should be addressed. Tel: +81 743 72 5660; Fax: +81 743 72 5669; Email: hmori@gtc.naist.jp
Correspondence may also be addressed to Barry L. Wanner. Tel: +1 617 432 5093; Fax: +1 617 738 7664; Email: blwanner@genetics.med.harvard.edu
†The authors wish it to be known that, in their opinion, the first four authors should be regarded as Joint First Authors.
Present addresses:
Natsuko Yamamoto, Department of Biomedical Ethics and Public Policy, Graduate School of Medicine, Osaka University, Suita, Osaka 565–0871, Japan.
Yukako Tohsato, Riken Quantitative Biology Center, Kobe 650–0047, Japan.
Former address: Barry L. Wanner, Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Nucleic Acids Research Advance Access published November 15, 2014
[Figure 1: entity-relationship diagram. Tables and columns recovered from the figure:]
db: db_id (PK), name, description, urlprefix, url
organism: organism_id (PK), abbreviation, genus, species, common_name, comment
cv: cv_id (PK), name, definition
cvterm: cvterm_id (PK), cv_id (FK), name, definition, dbxref_id (FK), is_obsolete, is_relationshiptype
dbxref: dbxref_id (PK), db_id (FK), accession, version, description
feature: feature_id (PK), dbxref_id (FK), organism_id (FK), name, uniquename, residues, seqlen, md5checksum, type_id (FK), is_analysis, is_obsolete, timeaccessed, timelastmodified
featureprop: featureprop_id (PK), feature_id (FK), type_id (FK), value, rank
pub: pub_id (PK), title, volumetitle, volume, series_name, issue, pyear, pages, miniref, uniquename, type_id (FK), is_obsolete, publisher, pubplace
pubauthor: pubauthor_id (PK), pub_id (FK), rank, editor, surname, givennames, suffix

Figure 1. Chado schema in GenoBase. The portion of the Chado schema that was used to realize the abstract GenoBase schema is shown. Arrows show relationships between a Foreign Key (FK) and the Primary Key (PK) of the same name in the database tables, except that FK type_id in the feature and featureprop tables is connected to PK cvterm_id in the cvterm table.
important in the field of systems biology. More in-depth studies of E. coli, as one of the model organisms, will no doubt continue to provide exciting new insights toward a complete understanding of the cell at the systems level.
To accelerate progress in this direction, the development of biological resources for systematic studies of E. coli K-12, like the Keio single-gene deletion collection (8) and the ASKA ORFeome clone libraries (9), has proven especially valuable worldwide. The dramatic advent of technologies for acquisition of HT data types (e.g. the network of protein–protein interactions, transcriptional and translational regulation, genetic interaction, etc.) has created a requirement to maintain and share such resources.
The original purpose of GenoBase was to support the E. coli K-12 genome project launched in 1989 in Japan (10). The original data were E. coli K-12 sequence entries in GenBank and their mapping onto the chromosome (11) in printed format. GenoBase was designed to classify sequenced and yet-to-be-sequenced chromosomal regions so that the sequencing project, which followed the conventional approach using Kohara-ordered phage clones (12), could be managed efficiently. Following the completion of the genome project, GenoBase was enhanced to facilitate genome annotation. GenoBase originally displayed information for the W3110 strain of E. coli K-12 that was sequenced in the Japanese E. coli genome project (10,13–17), whereas the MG1655 strain, whose complete genome was reported first (18), has been more widely used. The single-gene Keio collection is derived from E. coli K-12 BW25113 (8), and the bar-coded deletion collection is derived from E. coli K-12 BW38028. Like MG1655, BW25113 and BW38028 are descendants of E. coli K-12 W1485. The construction of BW25113 has been reported (8).
Here, we describe key information resources available at http://ecoli.naist.jp/GB/ for storing, sharing and retrieving information on experimental resources: the Keio single-gene deletion collection (8), the ASKA ORFeome clone library (9) and their quality control to check for duplications (19). HT screening data on protein–protein interaction (20), protein localization and phenotype analysis data obtained using BIOLOG technology (21,22) are also stored. We also briefly summarize features of the latest version of GenoBase.
[Figure 2: diagram of SQL Views, grouped as below. Column lists are given as recovered from the figure.]
gene
  gene: id, type, ncrna_class, product, function, gene, synonym, experiment, jw, b, gi, geneid, ecogene, asap, swiss-prot, protein_id, ec_number, go_component, go_function, go_process, codon_start, ribosomal_slippage, transl_except, pec
genomes
  mg1655_genome, w3110_genome, bw25113_genome, bw38029_genome: id, eck_id, jw_id, b_number, loc_start, loc_end, loc_strand, pseudo, note
primers
  primer: id, target_resource, sequence, target_template, seq_from_genome, modification, plate, row, column
  aska_gfp(+), aska_gfp(-), gateway_entry, transbac: id, n_primer, c_primer, dn, dc, comment, update_date
  keio_collection: id, n_primer, c_primer, dn, dc, ul, dl, overlap, orientation, comment, update_date
  barcode collection: id, n_primer, c_primer, dn, dc, ul, dl, pd, cd, ul_term, dl_term, comment, update_date
resources (stock_information)
  aska_gfp_plus, aska_gfp_minus, gateway_entry, transbac: id, version, eck_number, jw_number, b_number, plate, row, column, pcr_eval_up, pcr_eval_down, resistance_eval, sensitivity_eval, coloniy_num, success_rate, comment, flag, update_date
  keio_collection: as above, plus pcr_eval_dup
  barcode_collection: as above, plus pcr_eval_dup, barcode, up_seq, down_seq

Figure 2. SQL Views in GenoBase. The GenoBase schema was built via SQL Views using the base relations in the Chado schema shown in Figure 1. Views and scripts are downloadable from the GenoBase description page.
DEVELOPMENT OF THE GenoBase SYSTEM
Standardization is an important issue when maintaining a database for the long term. To achieve this, we used the PostgreSQL database management system and the Chado schema (23,24), which accommodates numerous different types of biological data. PostgreSQL is open source software and one of the most widely used relational database management systems. Chado is a relational database schema designed to manage biological knowledge for a wide variety of organisms. Our database started as an organism-specific genome database and was previously implemented in the same relational database management system, PostgreSQL, with a specific schema. Here, we switched to the Chado schema for interoperability between databases. Figure 1 shows the Chado schema used for GenoBase. The Chado schema and its associated tools were downloaded from the GMOD web site (http://www.gmod.org). Because our target organism, E. coli K-12, is a eubacterial model species, some Chado tables designed for eukaryotes were not required. Therefore, the tables in Figure 1 are a subset of those in Chado. All the data in the previous version of our database were converted and stored in the PostgreSQL database using the Chado schema and View tables (Figure 2) on the Linux operating system.
Once all the data were converted into the new database, the views (virtual data tables) were defined by SQL on top of the Chado schema. All of the SQL scripts are downloadable from the top page links of the GenoBase website (http://ecoli.naist.jp/GB/).
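The idea of defining a view over Chado base tables can be illustrated with a minimal sketch. It is not the GenoBase SQL, which is distributed from the website: it uses Python's built-in sqlite3 module with an in-memory database instead of PostgreSQL, keeps only a few columns of the feature, featureprop and cvterm tables from Figure 1, and the view name gene_view, its column choice and the three sample rows are assumptions made for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with three Chado-like tables and one view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            name       TEXT,
            uniquename TEXT NOT NULL,
            type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
        );
        CREATE TABLE featureprop (
            featureprop_id INTEGER PRIMARY KEY,
            feature_id     INTEGER NOT NULL REFERENCES feature (feature_id),
            type_id        INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
            value          TEXT
        );

        -- Hypothetical view: one row per feature whose type is gene, with the
        -- value of its product property (NULL when no such property exists).
        CREATE VIEW gene_view AS
        SELECT f.feature_id AS id,
               f.uniquename AS b_number,
               f.name       AS gene,
               (SELECT fp.value
                  FROM featureprop AS fp
                  JOIN cvterm AS pt ON pt.cvterm_id = fp.type_id
                 WHERE fp.feature_id = f.feature_id
                   AND pt.name = 'product'
                 LIMIT 1) AS product
          FROM feature AS f
          JOIN cvterm AS ft ON ft.cvterm_id = f.type_id
         WHERE ft.name = 'gene';
        """
    )
    # Illustrative rows only; the non-gene feature must not appear in the view.
    conn.executemany(
        "INSERT INTO cvterm VALUES (?, ?)",
        [(1, "gene"), (2, "product"), (3, "CDS")],
    )
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?)",
        [(1, "thrA", "b0002", 1), (2, "lacZ", "b0344", 1), (3, "lacZ_cds", "b0344_cds", 3)],
    )
    conn.executemany(
        "INSERT INTO featureprop VALUES (?, ?, ?, ?)",
        [(1, 2, 2, "beta-galactosidase")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    for row in db.execute("SELECT b_number, gene, product FROM gene_view ORDER BY id"):
        print(row)
```

Because the view is computed from the base tables at query time, a flat, resource-oriented table can be offered to users while the stored data stay in the generic schema.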
[Figure 3 panels; labels and sequences as recovered from the figure. Legend: ATG, 2nd aa, target ORF, last, TGA.]
(A) ASKA GFP(+): NotI1, SfiI1, N-ORF-C, SfiI2, GFP, NotI2
    CACCATACGGATCCGGCCCTGAGGGCC (2nd aa) (LAST aa) GGCCTATGCGGCCGCAAATAAGCGGCCGCTAA
    6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg GFP
(B) ASKA GFP(-): NotI, SfiI1, N-ORF-C, SfiI2
    CACCATACGGATCCGGCCCTGAGGGCC (2nd aa) (LAST aa) GGCCTATGCGGCCGCTAA
    6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg
(C) Gateway entry: attL1, N-ORF-C, attL2
    AAGCAGGCTTGGCCCTGAGGGCC (2nd aa) (LAST aa) GGCCTATGCGGCCGCCACCCAGC
    GlyLeuAlaLeuArgAla XXXXXX GlyLeuCysGlyArgHis
(D) TransBac: PT5, SD, BlnI, N-ORF-C, SfiI2
    CAGAATTCATTAAAGAGGAGAAACCTAGG (1st Met) (LAST aa) TGAGGCCTATGCGGCCGCTAA
    Met Ter

Figure 3. Structure of the cloned region of the ORF clone libraries. Red, green and blue boxes represent the target ORF, eGFP and attL regions, respectively. All ORF cloning target regions are designed to span the second to the last amino acid coding region, except in the TransBac plasmid clone library.
MAJOR PURPOSE OF GenoBase
Our database is focused on information about our comprehensively constructed experimental resources (libraries of plasmid clones, deletion mutants, etc.) and HT experimental data from a large E. coli functional genomics project that far exceeds all other resources combined. All the information in GenoBase is publicly available. Current resources include four types of annotated Open Reading Frame (ORF) plasmid clone libraries and two types of deletion collections. The plasmid clone libraries include the ASKA ORFeome libraries with (A) and without (B) a C-terminal GFP fusion (9,25), the Gateway entry clone library (C) (25) and the latest TransBac library (H. Dose, unpublished), as shown in Figure 3. The single-gene deletion collections include the Keio collection (A) (8) and a recently constructed barcode deletion collection (B), as shown in Figure 4. The Barcode deletion collection, whose manuscript is now in preparation, was originally constructed for the systematic analysis of synthetic lethal/sickness genetic interactions: double-gene knockouts are made by combining single-gene deletions from the Keio and the barcode collections by conjugation (26,27; R. Takeuchi, unpublished). We added a further valuable feature, a 20-nt molecular barcode, which makes population analysis and multiplex parallel screening of mixed cultures feasible (Y. Otsuka et al., unpublished).
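To make concrete how a 20-nt barcode could support population analysis of a mixed culture, the following sketch counts exact barcode matches in sequencing reads and converts the counts to relative abundances. It is only an illustration under stated assumptions: the barcode sequences and strain names are invented, exact matching is a simplification, and nothing here describes the unpublished analysis pipeline.

```python
from collections import Counter
from typing import Dict, Iterable

BARCODE_LENGTH = 20  # length of the molecular barcode stated in the text


def count_barcodes(reads: Iterable[str], barcode_to_strain: Dict[str, str]) -> Counter:
    """Count reads per strain by exact match of a known 20-nt barcode.

    Each read is scanned left to right and contributes at most one count.
    """
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - BARCODE_LENGTH + 1):
            strain = barcode_to_strain.get(read[i:i + BARCODE_LENGTH])
            if strain is not None:
                counts[strain] += 1
                break
    return counts


def relative_abundance(counts: Counter) -> Dict[str, float]:
    """Convert per-strain read counts to fractions of all assigned reads."""
    total = sum(counts.values())
    return {strain: n / total for strain, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical barcodes and strain labels, for illustration only.
    barcodes = {
        "ACGTACGTACGTACGTACGT": "mutant_A",
        "TTGACCGATAGCTAGGCTAA": "mutant_B",
    }
    reads = [
        "GGACGTACGTACGTACGTACGTCC",
        "TTGACCGATAGCTAGGCTAA",
        "AAAA",
        "acgtacgtacgtacgtacgt",
    ]
    print(relative_abundance(count_barcodes(reads, barcodes)))
```

A real workflow would also need to tolerate sequencing errors in the barcode, which this sketch deliberately ignores.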
[Figure 4 panels: (A) Keio collection, with the kan cassette flanked by FRT sites; (B) Barcode deletion collection, with the cat cassette flanked by FRT1 sites and a barcode. Other labels: up, down, target gene, ATG, SD, 6 aa + Ter.]

Figure 4. Genomic structure of the target region of the deletion collections. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same in the Keio and Barcode collections. FRT in the Keio collection and FRT1 in the Barcode collection are not site-specifically recombined by FLP recombinase.
INFORMATION ON NEW AND FUTURE RESOURCES
Information on new resources to be constructed in the future and the data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection with or without a barcode, chromosomal Venus-GFP fusion strains, and antibodies against purified E. coli proteins are being prepared for dissemination.
INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES
GenoBase provides not only information about resources constructed in our lab and HT experimental data obtained using these resources, but also information on quality control of these resources. Both the Keio and barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of novel expected junction fragments but also for absence of the targeted gene (19). By testing for absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31) as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations that arise from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with more than 100 to 200 base pairs of flanking homology into a new strain harboring the Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).
Our initial resource included the two types of ASKA plasmid clone libraries, whose construction was based on the genome annotation of E. coli K-12 from 1997. Annotation policy at that time was not well established, and the annotation sometimes included the wrong initiation codon. After completing the construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110 based on updated and highly accurate genome sequences of both strains (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.
Histories and precise information about the methods for construction of each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB/). Notably, the original sources of genomic DNA differ for each of the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the chromosomal positions of the target genes as defined by the primer set and as given in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as given in Figure 6 (A) and (B). For the deletion collections, in addition to DN and DC, UL (Upstream Length) and DL (Downstream Length) were determined. Also, in the case of complete overlap with the neighbor genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated, as illustrated in Figure 6 (C).
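The DN and DC comparison can be sketched as simple coordinate arithmetic. The code below is a hypothetical illustration rather than the calculation used for GenoBase: the Region type, the 1-based inclusive coordinates and, in particular, the sign convention (positive when the designed end lies downstream of the annotated end in the direction of transcription) are assumptions, since the text does not define the signs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """A gene or designed target region on the chromosome (hypothetical type)."""
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, 1-based inclusive
    strand: int  # +1 for the forward strand, -1 for the reverse strand


def n_terminus(region: Region) -> int:
    """Coordinate of the first base in the direction of transcription."""
    return region.start if region.strand == 1 else region.end


def c_terminus(region: Region) -> int:
    """Coordinate of the last base in the direction of transcription."""
    return region.end if region.strand == 1 else region.start


def dn_dc(designed: Region, annotated: Region) -> Tuple[int, int]:
    """Return (DN, DC) in bases; 0 means design and annotation agree.

    Assumed convention: a positive value means the designed end is displaced
    downstream (in the direction of transcription) of the annotated end.
    """
    if designed.strand != annotated.strand:
        raise ValueError("designed and annotated regions must be on the same strand")
    sign = annotated.strand
    dn = (n_terminus(designed) - n_terminus(annotated)) * sign
    dc = (c_terminus(designed) - c_terminus(annotated)) * sign
    return dn, dc


if __name__ == "__main__":
    # Forward-strand example: the design starts 3 bases inside the annotated gene.
    print(dn_dc(Region(103, 500, 1), Region(100, 500, 1)))
    # Reverse-strand example with the same 3-base displacement at the N-terminus.
    print(dn_dc(Region(200, 897, -1), Region(200, 900, -1)))
```

With this convention, a resource whose DN and DC are both 0 matches the current annotation, and any non-zero value marks a clone or deletion that should be inspected.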
Confirmation of the E. coli K-12 MG1655 and W3110 genome sequences has been previously performed (15), yet some sequencing errors may exist. Confirmation of the host strains of the Keio and barcode single-deletion collections, E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate), by deep sequencing has now been completed (Y. Otsuka et al., unpublished). Currently, we are completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.
[Figure 5: flowchart. Inputs: primers and GenBank annotation, giving the primer position on the chromosome. Branch for ASKA GFP(+), (-), Gateway entry & TransBac: consistency check (difference with annotation). Branch for Keio & Barcode collection (validation for gene deletion): overlap genes [N, C] length and consistency check (difference with annotation). Decision labels: Yes, No. Outcomes: OK, reconstruction required.]

Figure 5. Flowchart of evaluation by the latest GenBank annotation. All resources were generated with DNA primers designed according to the latest annotation at the time of construction.
Changes in annotation, especially in gene location, affect library construction. Databases, e.g. EcoGene (41), EcoCyc (42) and PEC (43), are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information. However, in practice, this is not easy to achieve. At the least, we can calculate the inconsistency between the design and the latest annotation.
Finally, when constructing target resources, unpredicted biological events sometimes happen, such as transposition, duplication, integration, introduction of mutations, etc. Other possibilities result from the primer sequences. For the Keio collection, we confirmed the genomic structure by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream genes. Even when the PCR evaluation showed the predicted genomic structure, we found that another wild-type copy of the target gene could exist somewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test such possibilities experimentally, it is not technically practical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.
As mentioned above, all of our resources depend on primers designed according to the latest annotation at the time. We therefore calculated the differences between the design and the current annotation. The evaluation scheme is shown in Figure 5.
[Figure 6 panels; labels as recovered from the figure.]
(A) ASKA GFP(+), (-) clone and Gateway entry clone libraries: target region (ATG to Stop), cloned fragment, N primer, C primer, DN and DC, each on a +/0/- scale.
(B) TransBac library: target region (ATG to Stop), cloned fragment, N primer, C primer, DN and DC, each on a +/0/- scale.
(C) Keio and Barcode deletion collection:
    1) partial overlap case: target region (ATG to Stop), deleted region, 21bp, N primer, C primer, DN, DC, UL, DL, upstream gene, downstream gene.
    2) complete overlap case: target region (ATG to Stop), deleted region, 21bp, N primer, C primer, DN, DC, PD, CD, overlap gene.

Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if it exists. (A) ASKA GFP(+), ASKA GFP(-) and Gateway entry clone validation. (B) TransBac library. (C) Keio and Barcode deletion collections.
Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of inconsistency and of the cloned region of target genes at the nucleotide level. (B) Schematic view of the deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene and plate level views.
[Figure 8: diagram. Labels: upstream gene, target gene, ATGA (4 bp overlap), start codon, stop codon, deleted region, 21bp, resistance cassette (AbR) flanked by FRT sites. Junction sequence shown: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]

Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted to systems biology of E. coli. GenoBase is queried from the top page. Any word, such as an id, gene name, product name or sequence, can be used as a query, and a direct search system using SQL is also provided. Search results are shown in table format with links to resource pages.
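A keyword search of this kind can be sketched as one parameterized query that tests the same word against several columns. The example is an assumption-laden illustration, not the GenoBase implementation: it uses an in-memory SQLite table through Python's sqlite3 module, and the table name gene, its three columns and the sample rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Columns tested against the search word (hypothetical subset).
SEARCH_COLUMNS = ("id", "gene", "product")


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory table with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (id TEXT PRIMARY KEY, gene TEXT, product TEXT)")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?)",
        [
            ("b0002", "thrA", "aspartokinase I / homoserine dehydrogenase I"),
            ("b0343", "lacY", "lactose permease"),
            ("b0344", "lacZ", "beta-galactosidase"),
        ],
    )
    conn.commit()
    return conn


def search_genes(conn: sqlite3.Connection, word: str) -> List[Tuple[str, str, str]]:
    """Return rows in which any searched column contains the word.

    The word is passed as a bound parameter, so it is never spliced into the
    SQL text. SQLite's LIKE is case-insensitive for ASCII letters, and any
    % or _ typed by the user still acts as a wildcard.
    """
    pattern = f"%{word}%"
    where = " OR ".join(f"{column} LIKE ?" for column in SEARCH_COLUMNS)
    sql = f"SELECT id, gene, product FROM gene WHERE {where} ORDER BY id"
    return conn.execute(sql, [pattern] * len(SEARCH_COLUMNS)).fetchall()


if __name__ == "__main__":
    connection = demo_connection()
    for row in search_genes(connection, "lac"):
        print(row)
```

Only the fixed column names are interpolated into the SQL string; the user-supplied word travels separately, which is the usual way to offer free-text search without exposing the database to injected SQL.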
The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, taken basically from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for six different resources.
For the Keio and the Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
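The four-base case can be checked with a few lines of string handling. The sketch below assumes that the start codon of the target gene is retained in the deletion strain and is followed directly by the cassette-side sequence shown in the Figure 8 junction (ATTCCGGGGATCCGTCGACC); the function names and the plus-strand, 1-based inclusive coordinates are hypothetical.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})

# First bases that follow the retained start codon in the Figure 8 junction.
CASSETTE_5PRIME = "ATTCCGGGGATCCGTCGACC"


def overlaps_by_four(upstream_end: int, target_start: int) -> bool:
    """True when the upstream gene ends 3 bases after the target gene starts.

    Coordinates are 1-based and inclusive, and both genes are assumed to lie
    on the plus strand, so the shared stretch is exactly 4 bases long.
    """
    return upstream_end - target_start + 1 == 4


def upstream_stop_regenerated(start_codon: str, cassette: str = CASSETTE_5PRIME) -> bool:
    """Check whether the deletion junction still ends the upstream gene.

    In a 4-nt overlap the upstream stop codon occupies bases 2 to 4 of the
    overlap, i.e. the last two bases of the start codon plus the next base.
    """
    junction = (start_codon + cassette).upper()
    return junction[1:4] in STOP_CODONS


if __name__ == "__main__":
    print(overlaps_by_four(upstream_end=1003, target_start=1000))
    print(upstream_stop_regenerated("ATG"))
```

With an ATG start codon, the junction reads ATGA, so bases 2 to 4 spell TGA and the upstream reading frame still terminates at the same position as in the wild type.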
Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates as prey proteins are available from each target protein as bait.
DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
Images of protein localization analyzed with the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown without isopropyl-beta-D-thiogalactopyranoside (IPTG) in Luria-Bertani broth (LB), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.
Phenotype microarray analysis results obtained with BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced from 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).
Other omics-type results will be added and made downloadable from the GenoBase system when we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled to be released.
RESOURCE DISTRIBUTION
The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain/), supported by the National Bioresource Project (40).
Currently, we have four types of predicted ORF plasmid clone libraries, and two of them have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources as promptly as possible.
Of the deletion collections, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase
is to provide information related to resources constructed by our group for the community to investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.
ACKNOWLEDGEMENTS
We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.
Authors' Contributions: C.O. designed and implemented the Chado schema. A.M., R.T. and Y.O. performed re-calculation of the comparison between the design and current annotation. M.I. and K.N. (Mitsubishi Space Software Inc.) prepared web pages. W.G.A., T.C., M.R.G. and D.K. set up GenoBase when hosted at Purdue. K.E.R. provided expertise in annotation and set-up of EcoGene when hosted at Purdue. K.K.N. (Keio Univ.), N.Y., H.D. and Y.O. prepared the Barcode deletion collection data set. W.N. and T.N. prepared the primer data set. S.S. and S.T. prepared the BIOLOG pages, and Y.T. and B.R.B. conducted BIOLOG analysis. B.L.W. and H.M. oversaw all aspects of designing GenoBase and preparation of the manuscript.
FUNDING
National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to B.L.W.; RO1 GM075004 to D.K.]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to H.M.]; NSF [106394 to B.L.W.]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
Conflict of interest statement. None declared.
REFERENCES
1. Thomason MK, Bischler T, Eisenbart SK, Forstner KU, Zhang A, Herbig A, Nieselt K, Sharma CM and Storz G (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J Bacteriol, pii: JB.02096-14.
2. Brenner S, Jacob F and Meselson M (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.
3. Jehmlich N, Hubschmann T, Gesell Salazar M, Volker U, Benndorf D, Muller S, von Bergen M and Schmidt F (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl Microbiol Biotechnol, 88, 575–584.
4. Rajagopala SV, Sikorski P, Kumar A, Mosca R, Vlasblom J, Arnold R, Franca-Koh J, Pakala SB, Phanse S, Ceol A et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat Biotechnol, 32, 285–290.
5. Nakahigashi K, Toya Y, Ishii N, Soga T, Hasegawa M, Watanabe H, Takai Y, Honma M, Mori H and Tomita M (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol Syst Biol, 5, 306.
6. Martorana AM, Motta S, Di Silvestre D, Falchi F, Deho G, Mauri P, Sperandeo P and Polissi A (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One, 9, e100941.
7. Conway T, Creecy JP, Maddox SM, Grissom JE, Conkle TL, Shadid TM, Teramoto J, San Miguel P, Shimada T, Ishihama A et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.
8. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL and Mori H (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol, 2, 2006.0008.
9. Kitagawa M, Ara T, Arifuzzaman M, Ioka-Nakamichi T, Inamoto E, Toyonaga H and Mori H (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): unique resources for biological research. DNA Res, 12, 291–299.
10. Yura T, Mori H, Nagai H, Nagata T, Ishihama A, Fujita N, Isono K, Mizobuchi K and Nakata A (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res, 20, 3305–3308.
11. Rudd KE (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol Mol Biol Rev, 62, 985–1019.
12. Kohara Y, Akiyama K and Isono K (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.
13. Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kasai H, Kashimoto K, Kimura S, Kitakawa M et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res, 3, 363–377.
14. Fujita N, Mori H, Yura T and Ishihama A (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res, 22, 1637–1639.
15. Hayashi K, Morooka N, Yamamoto Y, Fujita K, Isono K, Choi S, Ohtsubo E, Baba T, Wanner BL, Mori H et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol Syst Biol, 2, 2006.0007.
16. Itoh T, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Kasai H, Kimura S, Kitakawa M, Kitagawa M et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res, 3, 379–392.
17. Yamamoto Y, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kimura S, Kitagawa M, Makino K et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res, 4, 91–113.
18. Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.
19. Yamamoto N, Nakahigashi K, Nakamichi T, Yoshino M, Takai Y, Touda Y, Furubayashi A, Kinjyo S, Dose H, Hasegawa M et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol Syst Biol, 5, 335.
20. Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, Saito R, Ara T, Nakahigashi K, Huang HC, Hirai A et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res, 16, 686–691.
21. Tohsato Y, Baba T, Mazaki Y, Ito M, Wanner BL and Mori H (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J Bioinform Comput Biol, 8(Suppl 1), 83–99.
22. Tohsato Y and Mori H (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform, 21, 42–52.
23. Zhou P, Emmert D and Zhang P (2006) Using Chado to store genome annotation data. Curr Protoc Bioinformatics, doi:10.1002/0471250953.bi0906s12.
24. Mungall CJ and Emmert DB (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.
25. Rajagopala SV, Yamamoto N, Zweifel AE, Nakamichi T, Huang HK, Mendez-Rios JD, Franca-Koh J, Boorgula MP, Fujita K, Suzuki K et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom, 11, 470.
26. Butland G, Babu M, Diaz-Mejia JJ, Bohdana F, Phanse S, Gold B, Yang W, Li J, Gagarinova AG, Pogoutse O et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat Methods, 5, 789–795.
27. Typas A, Nichols RJ, Siegele DA, Shales M, Collins SR, Lim B, Braberg H, Yamamoto N, Takeuchi R, Wanner BL et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat Methods, 5, 781–787.
28. Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.
29. Hill CW, Schiffer D and Berg P (1969) Transduction of merodiploidy: induced duplication of recipient genes. J Bacteriol, 99, 274–278.
30. Hill CW, Foulds J, Soll L and Berg P (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J Mol Biol, 39, 563–581.
31. Anderson RP, Miller CG and Roth JR (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J Mol Biol, 105, 201–218.
32. Anderson RP and Roth JR (1977) Tandem genetic duplications in phage and bacteria. Ann Rev Microbiol, 31, 473–505.
33. Anderson RP and Roth JR (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J Mol Biol, 119, 147–166.
34. Anderson RP and Roth JR (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J Mol Biol, 126, 53–71.
35. Anderson RP and Roth JR (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb Symp Quant Biol, 43, 1083–1087.
36. Anderson P and Roth J (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc Natl Acad Sci USA, 78, 3113–3117.
37. Poon A, Davis BH and Chao L (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.
38. Datsenko KA and Wanner BL (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA, 97, 6640–6645.
39. Zhou L, Lei XH, Bochner BR and Wanner BL (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J Bacteriol, 185, 4956–4972.
40. Riley M, Abe T, Arnaud MB, Berlyn MK, Blattner FR, Chaudhuri RR, Glasner JD, Horiuchi T, Keseler IM, Kosuge T et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res, 34, 1–9.
41. Zhou J and Rudd KE (2013) EcoGene 3.0. Nucleic Acids Res, 41, D613–D624.
42. Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, Bonavides-Martinez C, Fulcher C, Huerta AM, Kothari A, Krummenacker M et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res, 41, D605–D612.
43. Yamazaki Y, Niki H and Kato J (2008) Profiling of Escherichia coli chromosome database. Methods Mol Biol, 416, 385–389.
44. Hu JC, Sherlock G, Siegele DA, Aleksander SA, Ball CA, Demeter J, Gouni S, Holland TA, Karp PD, Lewis JE et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res, 42, D677–D684.
45. Glasner JD, Rusch M, Liss P, Plunkett G 3rd, Cabot EL, Darling A, Anderson BD, Infield-Harm P, Gilson MC and Perna NT (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res, 34, D41–D45.
46. Glasner JD, Liss P, Plunkett G 3rd, Darling A, Prasad T, Rusch M, Byrnes A, Gilson M, Biehl B, Blattner FR et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res, 31, 147–151.
47. Bochner BR (2003) New technologies to assess genotype-phenotype relationships. Nat Rev Genet, 4, 309–314.
48. Bochner BR, Gadzinski P and Panomitros E (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res, 11, 1246–1255.
49. Takeuchi R, Tamura T, Nakayashiki T, Tanaka Y, Muto A, Wanner BL and Mori H (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol, 14, 171.
[Figure 1: entity-relationship diagram, not reproduced. Tables and columns shown:]
db: db_id (PK), name, description, urlprefix, url
organism: organism_id (PK), abbreviation, genus, species, common_name, comment
cv: cv_id (PK), name, definition
cvterm: cvterm_id (PK), cv_id (FK), name, definition, dbxref_id (FK), is_obsolete, is_relationshiptype
dbxref: dbxref_id (PK), db_id (FK), accession, version, description
feature: feature_id (PK), dbxref_id (FK), organism_id (FK), name, uniquename, residues, seqlen, md5checksum, type_id (FK), is_analysis, is_obsolete, timeaccessed, timelastmodified
featureprop: featureprop_id (PK), feature_id (FK), type_id (FK), value, rank
pub: pub_id (PK), title, volumetitle, volume, series_name, issue, pyear, pages, miniref, uniquename, type_id (FK), is_obsolete, publisher, pubplace
pubauthor: pubauthor_id (PK), pub_id (FK), rank, editor, surname, givennames, suffix
Figure 1. Chado schema in GenoBase. The portion of the Chado schema that was used to realize the abstract GenoBase schema is shown. Arrows show relationships between a Foreign Key (FK) and the Primary Key (PK) of the same name in the database tables, except that FK type_id in the feature and featureprop tables is connected to PK cvterm_id in the cvterm table.
important in the field of systems biology. More and more in-depth studies of E. coli, as one of the model organisms, will no doubt continue to provide exciting new insights toward a complete understanding of the cell at the systems level.
To accelerate this direction, the development of biological resources for systematic studies of E. coli K-12, like the Keio single-gene deletion collection (8) and the ASKA ORFeome clone libraries (9), has proven especially valuable worldwide. The dramatic advent of technologies for acquisition of HT data types (e.g. the network of protein–protein interactions, transcriptional and translational regulation, genetic interaction, etc.) has created a requirement to maintain and share such resources.
The original purpose of GenoBase was to support the E. coli K-12 genome project launched in 1989 in Japan (10). The original data were E. coli K-12 sequence entries in GenBank and their mapping onto the chromosome (11) in printed format. GenoBase was designed to facilitate classification of sequenced and yet-to-be-sequenced chromosomal regions, for efficient management of a sequencing project that followed the conventional approach of sequencing Kohara-ordered phage clones (12). Following the completion of the genome project, GenoBase was enhanced to facilitate genome annotation.
GenoBase originally displayed information for the W3110 strain of E. coli K-12, which was sequenced in the Japanese E. coli genome project (10,13–17), whereas the MG1655 strain, whose complete genome was reported first (18), has been more widely used. The single-gene Keio collection is derived from E. coli K-12 BW25113 (8), and the bar-coded deletion collection is derived from E. coli K-12 BW38028. Like MG1655, BW25113 and BW38028 are descendants of E. coli K-12 W1485. The construction of BW25113 has been reported (8).
Here, we describe key information resources available at http://ecoli.naist.jp/GB/ for storing, sharing and retrieving information on experimental resources: the Keio single-gene deletion collection (8), the ASKA ORFeome clone library (9) and their quality control to check duplications (19). HT screening data on protein–protein interaction (20), protein localization and phenotype analysis data obtained using BIOLOG technology (21,22) are also stored. We also briefly summarize features of the latest version of GenoBase.
[Figure 2: diagram of SQL views, not reproduced. View groups and columns shown:]
gene: id, type, ncrna_class, product, function, gene, synonym, experiment, jw, b, gi, geneid, ecogene, asap, swiss-prot, protein_id, ec_number, go_component, go_function, go_process, codon_start, ribosomal_slippage, transl_except, pec
genomes (mg1655_genome, w3110_genome, bw25113_genome, bw38029_genome): id, eck_id, jw_id, b_number, loc_start, loc_end, loc_strand, pseudo, note
primers:
  primer: id, target_resource, sequence, target_template, seq_from_genome, modification, plate, row, column
  aska_gfp(+), aska_gfp(-), gateway_entry, transbac: id, n_primer, c_primer, dn, dc, comment, update_date
  keio_collection: id, n_primer, c_primer, dn, dc, ul, dl, overlap, orientation, comment, update_date
  barcode collection: id, n_primer, c_primer, dn, dc, ul, dl, pd, cd, ul_term, dl_term, comment, update_date
resources:
  aska_gfp_plus, aska_gfp_minus, gateway_entry, transbac: id, version, eck_number, jw_number, b_number, plate, row, column, pcr_eval_up, pcr_eval_down, resistance_eval, sensitivity_eval, coloniy_num, success_rate, comment, flag, update_date
  keio_collection: the same columns plus pcr_eval_dup
  barcode_collection: the same columns plus pcr_eval_dup, barcode, up_seq, down_seq
stock_information
Figure 2. SQL Views in GenoBase. The GenoBase schema was built via SQL Views using the base relations in the Chado schema shown in Figure 1. Views and scripts are downloadable from the GenoBase description page.
DEVELOPMENT OF THE GenoBase SYSTEM
Standardization is an important issue when maintaining a database for the long term. To achieve this, we used the PostgreSQL database management system and the Chado schema (23,24) to hold numerous different types of biological data. PostgreSQL is open source software and one of the most widely used relational database management systems. Chado is a relational database schema designed to manage biological knowledge for a wide variety of organisms. Our database started as an organism-specific genome database and was previously implemented in the same relational database management system, PostgreSQL, with a specific schema. Here, we switched to the Chado schema for interoperability between databases. Figure 1 shows the Chado schema used for GenoBase. The Chado schema and its associated tools were downloaded from the GMOD web site (http://www.gmod.org). Because our target organism, E. coli K-12, is one of the eubacterial model species, some Chado tables designed for eukaryotes were not required. Therefore, the tables in Figure 1 are a subset of those in Chado. All the data in the previous version of our database were converted and stored in the PostgreSQL database using the Chado schema and View tables (Figure 2) on the Linux operating system.
Once all the data were converted into the new database, the views (virtual data tables) were defined in SQL on top of the Chado schema. All of the SQL scripts are downloadable from the top page links of the GenoBase website (http://ecoli.naist.jp/GB/).
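To illustrate how a view can present Chado base relations as a flat, resource-oriented table, the following is a minimal sketch only. It uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs without a server, keeps only a few columns of the feature, featureprop and cvterm tables from Figure 1, and the view name gene_view and the sample row are assumptions for illustration, not the actual GenoBase view definitions.

```python
import sqlite3

# In-memory database with simplified stand-ins for three Chado tables.
# Column subsets, the view name and the sample row are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT,
    name TEXT,
    type_id INTEGER REFERENCES cvterm (cvterm_id)
);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id INTEGER REFERENCES feature (feature_id),
    type_id INTEGER REFERENCES cvterm (cvterm_id),
    value TEXT
);

-- Hypothetical view: one row per gene feature with its 'product' property.
-- Both type_id foreign keys resolve to cvterm.cvterm_id, as in Figure 1.
CREATE VIEW gene_view AS
SELECT f.uniquename AS id, f.name AS gene, p.value AS product
FROM feature AS f
JOIN cvterm AS ft ON ft.cvterm_id = f.type_id
JOIN featureprop AS p ON p.feature_id = f.feature_id
JOIN cvterm AS pt ON pt.cvterm_id = p.type_id
WHERE ft.name = 'gene' AND pt.name = 'product';

INSERT INTO cvterm VALUES (1, 'gene'), (2, 'product');
INSERT INTO feature VALUES (1, 'b0001', 'thrL', 1);
INSERT INTO featureprop VALUES (1, 1, 2, 'thr operon leader peptide');
""")

# Querying the view looks the same as querying an ordinary table.
rows = conn.execute("SELECT id, gene, product FROM gene_view").fetchall()
print(rows)
```

The point of the design is that clients query the view by resource-level column names while the stored data stay in the generic Chado layout.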
[Figure 3: schematic, not reproduced. Top diagram labels: ATG, 2nd aa, target ORF, last, TGA. Panel labels and sequences as printed:]
A) ASKA GFP(+)
   NotI1 SfiI1 N------ORF-------C SfiI2 GFP NotI2
   CACCATACGGATCCGGCCCTGAGGGCC (2nd aa)...(LAST aa) GGCCTATGCGGCCGCAAATAAGCGGCCGCTAA
   6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg GFP
B) ASKA GFP(-)
   NotI SfiI1 N------ORF-------C SfiI2
   CACCATACGGATCCGGCCCTGAGGGCC (2nd aa)...(LAST aa) GGCCTATGCGGCCGCTAA
   6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg
C) Gateway entry
   attL1 N------ORF-------C attL2
   AAGCAGGCTTGGCCCTGAGGGCC (2nd aa)...(LAST aa) GGCCTATGCGGCCGCCACCCAGC
   GlyLeuAlaLeuArgAla XXXXXX GlyLeuCysGlyArgHis
D) TransBac
   PT5 SD BlnI N-------ORF----------C SfiI2
   CAGAATTCATTAAAGAGGAGAAACCTAGG (1st Met)...(LAST aa) TGAGGCCTATGCGGCCGCTAA
   Met ... Ter
Figure 3. Structure of the cloned region of the ORF clone libraries. Colored boxes (red, green and blue) represent the target ORF, eGFP and attL regions, respectively. All ORF cloning target regions are designed from the second to the last amino acid coding region, except in the TransBac plasmid clone library.
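The amino acid labels printed under the linker sequences in Figure 3 can be checked by translating the flanking nucleotides in frame. The sketch below uses the standard genetic code; the helper name translate and the choice of which in-frame slices to translate are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def translate(dna):
    """Translate DNA from its first base; trailing partial codons are ignored."""
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(THREE_LETTER[CODON_TABLE[c]] for c in codons)


# N-terminal linker of the ASKA clones, starting after the two His codons (CACCAT).
n_linker = translate("ACGGATCCGGCCCTGAGGGCC")
# C-terminal linker of ASKA GFP(-), including the stop codon.
c_linker = translate("GGCCTATGCGGCCGCTAA")
# C-terminal linker of the Gateway entry clone (first 18 nt of the printed sequence).
gateway_c = translate("GGCCTATGCGGCCGCCAC")
print(n_linker, c_linker, gateway_c)
```

The three results agree with the labels in panels A to C, which suggests the printed sequences are in the reading frame of the fused ORF.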
MAJOR PURPOSE OF GenoBase
Our database is focused on information about our comprehensively constructed experimental resources (libraries of plasmid clones, deletion mutants, etc.) and HT experimental data from a large E. coli functional genomics project that far exceeds all other resources combined. All the information in GenoBase is publicly available. Current resources include four types of annotated Open Reading Frame (ORF) plasmid clone libraries and two types of deletion collections. The plasmid clone libraries include the ASKA ORFeome libraries with (A) and without (B) a C-terminal GFP fusion (9,25), the Gateway entry clone library (C) (25) and the latest TransBac library (D; H. Dose, unpublished), as shown in Figure 3. The single-gene deletion collections include the Keio collection (A) (8) and a recently constructed barcode deletion collection (B), as shown in Figure 4. The Barcode deletion collection, for which a manuscript is now in preparation, was originally constructed for the systematic analysis of synthetic lethal/sickness genetic interactions, in which double-gene knockouts are made by combining single-gene deletions from the Keio and barcode collections by conjugation (26,27; R. Takeuchi, unpublished). We added a further valuable feature, a 20-nt molecular barcode, which makes population analysis and multiplex parallel screening of mixed cultures feasible (Y. Otsuka et al., unpublished).
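As a rough illustration of why a per-strain 20-nt barcode makes mixed-culture analysis feasible, the sketch below tallies strains from sequencing reads by exact barcode lookup. Everything specific here is an assumption: the barcode sequences, the strain labels, the position of the barcode at the start of each read and the exact-match rule are hypothetical and do not describe the authors' unpublished pipeline.

```python
from collections import Counter

BARCODE_LENGTH = 20  # the barcode length stated in the text


def count_strains(reads, barcode_to_strain, offset=0):
    """Count reads per strain by exact match of the 20-nt barcode at `offset`."""
    counts = Counter()
    for read in reads:
        barcode = read[offset:offset + BARCODE_LENGTH]
        strain = barcode_to_strain.get(barcode)
        if strain is not None:  # reads with unknown barcodes are skipped
            counts[strain] += 1
    return counts


# Hypothetical barcodes and reads, for illustration only.
barcode_to_strain = {
    "ACGTACGTACGTACGTACGT": "mutant_A",
    "TTGCAAGGCCTTAACCGGTA": "mutant_B",
}
reads = [
    "ACGTACGTACGTACGTACGTGGGATCC",
    "TTGCAAGGCCTTAACCGGTAGGGATCC",
    "ACGTACGTACGTACGTACGTGGGATCC",
    "NNNNNNNNNNNNNNNNNNNNGGGATCC",
]
strain_counts = count_strains(reads, barcode_to_strain)
print(dict(strain_counts))
```

Comparing such counts between time points or conditions gives the relative abundance of each deletion strain in the pool.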
[Figure 4: schematic, not reproduced. Labels on the target region: up, SD, ATG, target gene, 6 aa + Ter, down. (A) Keio collection: FRT, kan, FRT. (B) Barcode deletion collection: FRT1, cat, FRT1, barcode.]
Figure 4. Genomic structure of the target region of the deletion collections. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same between the Keio and Barcode collections. FRT in the Keio collection and FRT1 in the Barcode collection are not site-specifically recombined with each other by FLP recombinase.
INFORMATION ON NEW AND FUTURE RESOURCES
Information on new resources to be constructed in the future and the data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection with or without a barcode, chromosomal Venus-GFP fusion strains and antibodies against purified E. coli proteins are being prepared for dissemination.
INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES
GenoBase provides not only information about resources constructed in our lab and HT experimental data obtained using these resources, but also information on quality control of these resources. Both the Keio and barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of the novel expected junction fragments but also for the absence of the targeted gene (19). By testing for absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31), as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations that arise from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with greater than 100 to 200 base pairs of flanking homology into a new strain harboring the λ Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).
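The validation rule described above (expected junction fragments present, no remaining copy of the targeted gene, checked across the two independent isolates of each gene) can be written down as a small sketch. The record layout, the field names junctions_ok and wild_type_copy, and the example genes are hypothetical; only the logic follows the text.

```python
def deletion_confirmed(isolate):
    """An isolate passes if both junctions are as expected and no target copy remains."""
    return isolate["junctions_ok"] and not isolate["wild_type_copy"]


def gene_validated(isolates):
    """A gene counts as validated if at least one of its isolates passes."""
    return any(deletion_confirmed(isolate) for isolate in isolates)


# Hypothetical QC records: two independent isolates per gene.
qc_records = {
    "geneA": [
        {"junctions_ok": True, "wild_type_copy": True},   # duplication detected
        {"junctions_ok": True, "wild_type_copy": False},  # clean deletion
    ],
    "geneB": [
        {"junctions_ok": True, "wild_type_copy": True},
        {"junctions_ok": True, "wild_type_copy": True},
    ],
}
validated = {gene: gene_validated(isolates) for gene, isolates in qc_records.items()}
validated_fraction = sum(validated.values()) / len(validated)
print(validated, validated_fraction)
```

A gene such as the hypothetical geneB, where both isolates retain a copy of the target, is the case in which additional mutants would be constructed and tested.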
Our initial resource included two types of ASKA plasmid clone libraries, whose construction was based on the genome annotation of E. coli K-12 from 1997. Annotation policy at that time was lacking, and the annotation sometimes included the wrong initiation codon. After completing the construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110 based on updated and highly accurate genome sequences of both strains (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.
Histories and precise information about the methods for construction of each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB/). Notably, the original sources of genomic DNA differ among the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as the parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the positions of the target genes on the chromosome as derived from the primer set and those in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as given in Figure 6(A) and (B). For the deletion collections, in addition to DN and DC, UL (Upstream Length) and DL (Downstream Length) were determined. Also, in the case of complete overlap with the neighbor genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated, as illustrated in Figure 6(C).
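A minimal sketch of a DN/DC style calculation follows. The exact definitions and sign conventions are given only graphically in Figure 6, so the ones used here are assumptions: coordinates are 1-based and inclusive, and a positive value means the primer-designed region extends beyond the annotated gene at that end. The Region class and the example coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int   # leftmost chromosome coordinate (1-based, inclusive)
    end: int     # rightmost chromosome coordinate (inclusive)
    strand: int  # +1 for the forward strand, -1 for the reverse strand


def dn_dc(designed, annotated):
    """Return (DN, DC) in nucleotides under the assumed sign convention.

    Positive: the designed region extends past the annotated end.
    Negative: the designed region stops short of it. Zero: consistent.
    """
    if designed.strand != annotated.strand:
        raise ValueError("designed and annotated regions are on different strands")
    if annotated.strand == 1:
        dn = annotated.start - designed.start  # N-terminus is the left end
        dc = designed.end - annotated.end      # C-terminus is the right end
    else:
        dn = designed.end - annotated.end      # N-terminus is the right end
        dc = annotated.start - designed.start  # C-terminus is the left end
    return dn, dc


# Hypothetical case: the annotated start codon moved 9 nt downstream after
# the primers were designed, on the forward strand.
forward = dn_dc(Region(91, 400, 1), Region(100, 400, 1))
# Hypothetical reverse-strand gene whose annotated N-terminus was shortened by 30 nt.
reverse = dn_dc(Region(100, 430, -1), Region(100, 400, -1))
print(forward, reverse)
```

Under these assumptions a resource whose DN and DC are both zero is consistent with the current annotation, while a non-zero value flags the clone or deletion for closer inspection.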
Confirmation of the E. coli K-12 MG1655 and W3110 genome sequences has been performed previously (15), yet some sequencing errors may exist. Confirmation of the host strains of the Keio and barcode single-gene deletion collections, E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate), by deep sequencing has now been completed (Y. Otsuka et al., unpublished). Currently, we are completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.
[Figure 5: flowchart, not reproduced. Primers and the GenBank annotation give the primer position on the chromosome. For ASKA GFP(+), GFP(-), Gateway entry and TransBac: a consistency check (difference with annotation) ends in either "OK" or "reconstruction required". For the Keio and Barcode collections (validation for gene deletion): the overlap genes [N, C] length is determined first, followed by the same consistency check, ending in either "OK" or "reconstruction required".]
Figure 5. Flowchart of evaluation against the latest GenBank annotation. All of the resources were generated with DNA primers designed according to the latest annotation at the time of construction.
Changes in annotation, especially changes in gene location, affect library construction. Databases such as EcoGene (41), EcoCyc (42) and PEC (43) are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information. However, in practice this is not easy to achieve. At the least, we can calculate the inconsistency between the design and the latest annotation.
Finally, when constructing target resources, unpredicted biological events sometimes happen, such as transposition, duplication, integration and the introduction of mutations. Other possibilities result from the primer sequences. For the Keio collection, we confirmed the genomic structure by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream genes. Even though the PCR evaluation showed the predicted genomic structure, we realized that another wild-type copy of the target gene can exist elsewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test such possibilities experimentally, it is technically not practical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.
As mentioned above, all of our resources depend on primers designed according to the latest annotation at the time. We therefore calculated the differences between the design and the current annotation. The evaluation scheme is shown in Figure 5.
[Figure 6: schematic, not reproduced. Panels: (A) ASKA GFP(+), GFP(-) clone and Gateway entry clone libraries; (B) TransBac library; (C) Keio and Barcode deletion collection, with 1) partial overlap case and 2) complete overlap case. Each panel aligns the N primer and C primer against the target region (ATG to Stop) and marks DN and DC on +/0/- scales relative to the cloned fragment (A, B) or the deleted region (C). Panel C also marks a 21 bp segment, UL to the upstream gene and DL to the downstream gene, and PD and CD for overlap genes.]
Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if it exists. (A) ASKA GFP(+), ASKA GFP(−) and Gateway entry clone validation. (B) TransBac library. (C) Keio and Barcode deletion collections.
Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of inconsistency and cloned region of target genes at the nucleotide level. (B) Schematic view of deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene and plate level views.
[Figure 8: schematic, not reproduced. Labels: upstream gene, 4 bp overlap (ATGA), start codon, target gene, stop codon, deleted region, 21 bp, FRT, AbR resistance cassette, FRT. Sequence shown: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA]
Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted to the systems biology of E. coli. GenoBase is queried from the top page. Any term, such as an ID, gene name, product name or sequence, can be used, and a direct search system using SQL is also provided. Search results are shown in table format with links to resource pages.
The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, taken basically from the GenBank database entry (see Figure 7). The second and third tabs show sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for the six different resources.
For the Keio and Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
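The special case can be verified with a few lines of string handling. In an ATGA overlap, the upstream gene's stop codon (TGA) is made of the last two bases of the target's start codon plus the next base. The sketch below takes the sequence printed in Figure 8 as the start of the replaced region and checks that the same stop codon is rebuilt; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Sequence printed in Figure 8: retained start codon followed by the cassette end.
FIGURE8_SEQUENCE = "ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA"


def upstream_stop_after_deletion(junction):
    """Codon read by the upstream gene across a 4-nt overlap.

    With a 4-nt overlap, the upstream stop codon occupies positions 2 to 4
    of the junction: the last two bases of the start codon plus one more base.
    """
    return junction[1:4]


start_codon = FIGURE8_SEQUENCE[:3]
rebuilt_stop = upstream_stop_after_deletion(FIGURE8_SEQUENCE)
stop_is_regenerated = rebuilt_stop in STOP_CODONS
print(start_codon, rebuilt_stop, stop_is_regenerated)
```

Because the base following the retained ATG is again an A, the upstream gene still terminates at TGA in the deletion strain.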
Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein (bait).
DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.
Phenotype microarray analysis results from BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts from the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).
Other omics-type results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth measurements from our latest colony quantification system (49), are scheduled for release.
RESOURCE DISTRIBUTION
The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain/), supported by the National Bioresource Project (40).
Currently, we have four types of predicted ORF plasmid clone libraries, two of which have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources in as timely a manner as possible.
Among the deletion collections, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to resources constructed by our group, so that the community can investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.
ACKNOWLEDGEMENTS
We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.
Authors' Contributions: CO designed and implemented the Chado schema. AM, RT and YO performed re-calculation of the comparison between the design and the current annotation. MI and KN (Mitsubishi Space Software Inc) prepared web pages. WGA, TC, MRG and DK set up GenoBase when hosted at Purdue. KER provided expertise in annotation and set-up of EcoGene when hosted at Purdue. KKN (Keio Univ), NY, HD and YO prepared the Barcode deletion collection data set. WN and TN prepared the primer data set. SS and ST prepared the BIOLOG pages, and YT and BRB conducted BIOLOG analysis. BLW and HM oversaw all aspects of designing GenoBase and preparation of the manuscript.
FUNDING
National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to B.L.W.; RO1 GM075004 to D.K.]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to H.M.]; NSF [106394 to B.L.W.]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
Conflict of interest statement. None declared.
REFERENCES
1. Thomason,M.K., Bischler,T., Eisenbart,S.K., Forstner,K.U., Zhang,A., Herbig,A., Nieselt,K., Sharma,C.M. and Storz,G. (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J. Bacteriol., pii: JB.02096-14.
2. Brenner,S., Jacob,F. and Meselson,M. (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.
3. Jehmlich,N., Hubschmann,T., Gesell Salazar,M., Volker,U., Benndorf,D., Muller,S., von Bergen,M. and Schmidt,F. (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl. Microbiol. Biotechnol., 88, 575–584.
4. Rajagopala,S.V., Sikorski,P., Kumar,A., Mosca,R., Vlasblom,J., Arnold,R., Franca-Koh,J., Pakala,S.B., Phanse,S., Ceol,A. et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat. Biotechnol., 32, 285–290.
5. Nakahigashi,K., Toya,Y., Ishii,N., Soga,T., Hasegawa,M., Watanabe,H., Takai,Y., Honma,M., Mori,H. and Tomita,M. (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol. Syst. Biol., 5, 306.
6. Martorana,A.M., Motta,S., Di Silvestre,D., Falchi,F., Deho,G., Mauri,P., Sperandeo,P. and Polissi,A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One, 9, e100941.
7. Conway,T., Creecy,J.P., Maddox,S.M., Grissom,J.E., Conkle,T.L., Shadid,T.M., Teramoto,J., San Miguel,P., Shimada,T., Ishihama,A. et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.
8. Baba,T., Ara,T., Hasegawa,M., Takai,Y., Okumura,Y., Baba,M., Datsenko,K.A., Tomita,M., Wanner,B.L. and Mori,H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008.
9. Kitagawa,M., Ara,T., Arifuzzaman,M., Ioka-Nakamichi,T., Inamoto,E., Toyonaga,H. and Mori,H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): unique resources for biological research. DNA Res., 12, 291–299.
10. Yura,T., Mori,H., Nagai,H., Nagata,T., Ishihama,A., Fujita,N., Isono,K., Mizobuchi,K. and Nakata,A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res., 20, 3305–3308.
11. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol. Mol. Biol. Rev., 62, 985–1019.
12. Kohara,Y., Akiyama,K. and Isono,K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.
13. Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kasai,H., Kashimoto,K., Kimura,S., Kitakawa,M. et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res., 3, 363–377.
14. Fujita,N., Mori,H., Yura,T. and Ishihama,A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res., 22, 1637–1639.
15. Hayashi,K., Morooka,N., Yamamoto,Y., Fujita,K., Isono,K., Choi,S., Ohtsubo,E., Baba,T., Wanner,B.L., Mori,H. et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol., 2, 2006.0007.
16. Itoh,T., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Kasai,H., Kimura,S., Kitakawa,M., Kitagawa,M. et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res., 3, 379–392.
17. Yamamoto,Y., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kimura,S., Kitagawa,M., Makino,K. et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res., 4, 91–113.
18. Blattner,F.R., Plunkett,G. 3rd, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.
19. Yamamoto,N., Nakahigashi,K., Nakamichi,T., Yoshino,M., Takai,Y., Touda,Y., Furubayashi,A., Kinjyo,S., Dose,H., Hasegawa,M. et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol. Syst. Biol., 5, 335.
20. Arifuzzaman,M., Maeda,M., Itoh,A., Nishikata,K., Takita,C., Saito,R., Ara,T., Nakahigashi,K., Huang,H.C., Hirai,A. et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res., 16, 686–691.
21. Tohsato,Y., Baba,T., Mazaki,Y., Ito,M., Wanner,B.L. and Mori,H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J. Bioinform. Comput. Biol., 8(Suppl. 1), 83–99.
22. Tohsato,Y. and Mori,H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform., 21, 42–52.
23. Zhou,P., Emmert,D. and Zhang,P. (2006) Using Chado to store genome annotation data. Curr. Protoc. Bioinformat., doi:10.1002/0471250953.bi0906s12.
24. Mungall,C.J. and Emmert,D.B. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.
25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.
26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.
27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.
28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.
29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.
30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.
31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.
32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.
33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.
34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.
35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.
36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl Acad. Sci. U.S.A., 78, 3113–3117.
37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.
38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl Acad. Sci. U.S.A., 97, 6640–6645.
39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.
40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.
41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.
42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.
43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.
44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.
45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.
46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.
47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.
48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.
49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live - a high-throughput method for measuring microbial colony growth kinetics - reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.
[Figure 2 diagram. View groups and columns recoverable from the figure:
genomes (mg1655_genome, w3110_genome, bw25113_genome, bw38029_genome): id, eck_id, jw_id, b_number, loc_start, loc_end, loc_strand, pseudo, note.
gene: annotation fields including id, type, product, function, gene, synonym, ec_number, go_component, go_function and go_process.
primers (primer): id, target_resource, sequence, target_template, seq_from_genome, modification, plate, row, column. Per-resource primer views for aska_gfp(+), aska_gfp(-), gateway_entry and transbac: id, n_primer, c_primer, dn, dc, comment, update_date; keio_collection additionally has ul, dl, overlap, orientation; barcode collection additionally has ul, dl, pd, cd, ul_term, dl_term.
resources (stock_information) for aska_gfp_plus, aska_gfp_minus, gateway_entry and transbac: id, version, eck_number, jw_number, b_number, plate, row, column, pcr_eval_up, pcr_eval_down, resistance_eval, sensitivity_eval, coloniy_num, success_rate, comment, flag, update_date; keio_collection additionally has pcr_eval_dup; barcode_collection additionally has pcr_eval_dup, barcode, up_seq, down_seq.]
Figure 2. SQL Views in GenoBase. The GenoBase schema was built via SQL Views using the base relations in the Chado schema shown in Figure 1. Views and scripts are downloadable from the GenoBase description page.
DEVELOPMENT OF THE GenoBase SYSTEM
Standardization is an important issue when maintaining a database for the long term. To achieve it, we used the PostgreSQL database management system and the Chado schema (23,24), which accommodates numerous different types of biological data. PostgreSQL is open source software and one of the most widely used relational database management systems. Chado is a relational database schema designed to manage biological knowledge for a wide variety of organisms. Our database originally started as an organism-specific genome database and was previously implemented in the same relational database management system, PostgreSQL, with a custom schema. Here, we switched to the Chado schema for interoperability between databases. Figure 1 shows the Chado schema used for GenoBase. The Chado schema and its associated tools were downloaded from the GMOD web site (http://www.gmod.org). Because our target organism, E. coli K-12, is a eubacterial model species, some Chado tables designed for eukaryotes were not required. Therefore, the tables in Figure 1 are a subset of those in Chado. All the data in the previous version of our database were converted and stored in the PostgreSQL database using the Chado schema and View tables (Figure 2) on the Linux operating system.
Once all the data were converted into the new database, the views (virtual data tables) were defined by SQL on top of the Chado schema. All of the SQL scripts are downloadable from the links on the top page of the GenoBase website (http://ecoli.naist.jp/GB/).
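As a rough illustration of how a resource view can be layered over base relations, the sketch below uses Python's built-in sqlite3 module in place of PostgreSQL. The table names `feature` and `featureprop` loosely follow Chado naming, but their columns, the view name `keio_collection_view` and the sample rows are simplified assumptions made for this example; they are not the actual GenoBase definitions, which are available in the downloadable SQL scripts.

```python
import sqlite3


def build_demo_db():
    """Create a toy in-memory database with one view over two base tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Simplified stand-ins for Chado base relations (assumed columns).
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL,
            name       TEXT
        );
        CREATE TABLE featureprop (
            feature_id INTEGER REFERENCES feature(feature_id),
            type       TEXT NOT NULL,
            value      TEXT
        );
        -- Hypothetical resource view: one row per gene with its stock position.
        CREATE VIEW keio_collection_view AS
        SELECT f.uniquename AS jw_number,
               f.name       AS gene,
               MAX(CASE WHEN p.type = 'plate'     THEN p.value END) AS plate,
               MAX(CASE WHEN p.type = 'plate_row' THEN p.value END) AS plate_row
        FROM feature f
        JOIN featureprop p ON p.feature_id = f.feature_id
        GROUP BY f.feature_id, f.uniquename, f.name;
        """
    )
    # Made-up rows, for illustration only.
    conn.executemany(
        "INSERT INTO feature (feature_id, uniquename, name) VALUES (?, ?, ?)",
        [(1, "JW_demo1", "geneA"), (2, "JW_demo2", "geneB")],
    )
    conn.executemany(
        "INSERT INTO featureprop (feature_id, type, value) VALUES (?, ?, ?)",
        [(1, "plate", "1"), (1, "plate_row", "A"),
         (2, "plate", "1"), (2, "plate_row", "B")],
    )
    conn.commit()
    return conn


def lookup(conn, gene):
    """Query the view by gene name, as a user-facing search might do."""
    cur = conn.execute(
        "SELECT jw_number, gene, plate, plate_row "
        "FROM keio_collection_view WHERE gene = ?",
        (gene,),
    )
    return cur.fetchone()
```

Keeping the resource tables as views means that the underlying rows stay in the generic schema, while queries can still address resource-specific column names.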
[Figure 3 diagram. Cloned regions as recoverable from the figure; the target ORF runs from ATG and the 2nd aa to the last aa and TGA:
A) ASKA GFP(+): NotI1, SfiI1, N-ORF-C, SfiI2, GFP, NotI2. CACCATACGGATCCGGCCCTGAGGGCC-(2nd aa)...(last aa)-GGCCTATGCGGCCGCAAATAAGCGGCCGCTAA; 6xHis-Thr-Asp-Pro-Ala-Leu-Arg-Ala-(ORF)-Gly-Leu-Cys-Gly-Arg-GFP.
B) ASKA GFP(-): NotI, SfiI1, N-ORF-C, SfiI2. CACCATACGGATCCGGCCCTGAGGGCC-(2nd aa)...(last aa)-GGCCTATGCGGCCGCTAA; 6xHis-Thr-Asp-Pro-Ala-Leu-Arg-Ala-(ORF)-Gly-Leu-Cys-Gly-Arg.
C) Gateway entry: attL1, N-ORF-C, attL2. AAGCAGGCTTGGCCCTGAGGGCC-(2nd aa)...(last aa)-GGCCTATGCGGCCGCCACCCAGC; Gly-Leu-Ala-Leu-Arg-Ala-(ORF)-Gly-Leu-Cys-Gly-Arg-His.
D) TransBac: PT5, SD, BlnI, N-ORF-C, SfiI2. CAGAATTCATTAAAGAGGAGAAACCTAGG-(1st Met)...(last aa)-TGAGGCCTATGCGGCCGCTAA; Met-(ORF)-Ter.]
Figure 3. Structure of the cloned region of the ORF clone libraries. Red, green and blue boxes represent the target ORF, eGFP and attL regions, respectively. All ORF cloning target regions are designed from the second to the last amino acid coding region, except in the TransBac plasmid clone library.
MAJOR PURPOSE OF GenoBase
Our database is focused on information about our comprehensively constructed experimental resources (libraries of plasmid clones, deletion mutants, etc.) and HT experimental data from a large E. coli functional genomics project that far exceeds all other resources combined. All the information in GenoBase is publicly available. Current resources include four types of annotated Open Reading Frame (ORF) plasmid clone libraries and two types of deletion collections. The plasmid clone libraries include the ASKA ORFeome libraries with (A) and without (B) a C-terminal GFP fusion (9,25), the Gateway entry clone library (C; (25)) and the latest TransBac library (D; H. Dose, unpublished), as shown in Figure 3. The single-gene deletion collections include the Keio collection (A; (8)) and a recently constructed barcode deletion collection (B), as shown in Figure 4. The Barcode deletion collection, for which a manuscript is now in preparation, was originally constructed for the systematic analysis of synthetic lethal/sickness genetic interactions: double-gene knockouts are made by combining single-gene deletions from the Keio and Barcode collections by conjugation (26,27; R. Takeuchi, unpublished). We added a further valuable feature, a 20-nt molecular barcode, which makes population analysis and multiplex parallel screening of mixed cultures feasible (Y. Otsuka et al., unpublished).
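To make the idea of barcode-based population analysis concrete, the following minimal sketch counts how many sequencing reads from a mixed culture carry each strain's 20-nt barcode. Only the barcode length comes from the text; the exact-match strategy, the function name `count_barcodes` and the example barcodes are assumptions for illustration and do not describe the authors' unpublished pipeline.

```python
from collections import Counter

BARCODE_LENGTH = 20  # barcode length stated in the text


def count_barcodes(reads, barcode_to_strain):
    """Count reads per strain by exact match of a 20-nt window to a known barcode.

    reads: iterable of read sequences (strings).
    barcode_to_strain: dict mapping 20-nt barcode -> strain identifier.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a 20-nt window along the read and stop at the first known barcode.
        for i in range(len(read) - BARCODE_LENGTH + 1):
            strain = barcode_to_strain.get(read[i:i + BARCODE_LENGTH])
            if strain is not None:
                counts[strain] += 1
                break  # count each read at most once
    return counts
```

Comparing such counts between time points or conditions gives the relative abundance of each deletion strain in the pool. A practical pipeline would also need to tolerate sequencing errors, which this sketch does not attempt.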
[Figure 4 diagram. Labels: up, target gene, down, SD, ATG, 6 aa + Ter. A) Keio collection: kan flanked by FRT sites. B) Barcode deletion collection: cat flanked by FRT1 sites, with the barcode.]
Figure 4. Genomic structure of the target region of the deletion collections. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same between the Keio and Barcode collections. FRT in the Keio collection and FRT1 in the Barcode collection are not site-specifically recombined with each other by FLP recombinase.
INFORMATION ON NEW AND FUTURE RESOURCES
Information on new resources to be constructed in the future and data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection (with or without a barcode), chromosomal Venus-GFP fusion strains and antibodies against purified E. coli proteins are being prepared for dissemination.
INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES
GenoBase provides not only information about resources constructed in our lab and HT experimental data obtained with these resources, but also information on quality control of these resources. Both the Keio and barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of the novel expected junction fragments but also for the absence of the targeted gene (19). By testing for the absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31), as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations that arise from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with greater than 100 to 200 base pairs of flanking homology into a new strain harboring the Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).
Our initial resource included two types of ASKA plasmid clone libraries, whose construction was based on the genome annotation of E. coli K-12 from 1997. Annotation policy at that time was lacking, and the annotation sometimes included the wrong initiation codon. After completing the construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110 based on updated and highly accurate genome sequences of both strains (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.
Histories and precise information about the methods for construction of each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB/). Notably, the original sources of genomic DNA differ among the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as the parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the chromosomal positions of the target genes implied by the primer set and those in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as shown in Figure 6 (A) and (B). For the deletion collections, in addition to DN and DC, UL (Upstream Length) and DL (Downstream Length) were determined. Also, in the case of complete overlap with neighboring genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated, as illustrated in Figure 6 (C).
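The DN and DC comparison can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the GenoBase implementation: the `Interval` type, the function names and, in particular, the sign convention (positive when the designed region extends beyond the annotated gene) are choices made here, and the coordinates are assumed to be already normalized so that both intervals describe the full coding region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed, 1-based chromosome interval with strand '+' or '-'."""
    start: int
    end: int
    strand: str


def terminus_differences(designed, annotated):
    """Return (DN, DC) in base pairs.

    Assumed sign convention (not taken from the paper): positive means the
    designed region extends beyond the annotated gene at that terminus,
    negative means it falls short, and 0 means the termini agree.
    """
    if designed.strand != annotated.strand:
        raise ValueError("designed and annotated regions are on different strands")
    if designed.strand == "+":
        dn = annotated.start - designed.start  # N-terminus is the left end
        dc = designed.end - annotated.end      # C-terminus is the right end
    else:
        dn = designed.end - annotated.end      # N-terminus is the right end
        dc = annotated.start - designed.start  # C-terminus is the left end
    return dn, dc


def is_consistent(designed, annotated):
    """A resource is consistent with the annotation when DN and DC are both 0."""
    return terminus_differences(designed, annotated) == (0, 0)
```

For example, a clone designed for positions 100 to 400 on the plus strand, compared with an annotation that now starts at 130, gives DN = 30 and DC = 0 under this convention, flagging an initiation codon that was moved downstream.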
The E. coli K-12 MG1655 and W3110 genome sequences have been confirmed previously (15), yet some sequencing errors may remain. Confirmation of the host strains of the Keio and barcode single-gene deletion collections, E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate), by deep sequencing has now been completed (Y. Otsuka et al., unpublished). We are currently completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.
[Figure 5 flowchart. Inputs: primers and GenBank annotation, giving the primer position on the chromosome. Branches: ASKA GFP(+)/(-), Gateway entry and TransBac, checked for consistency (difference with annotation); Keio and Barcode collections, validated for gene deletion and for overlapping genes [N, C] and length, then checked for consistency (difference with annotation). Outcomes: OK, or reconstruction required.]
Figure 5. Flowchart of evaluation against the latest GenBank annotation. All resources were generated with DNA primers designed according to the latest annotation at the time of construction.
Changes in annotation, especially changes in gene location, affect library construction. Databases such as EcoGene (41), EcoCyc (42) and PEC (43) are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information; in practice, however, this is not easy to achieve. At the least, we can calculate the inconsistencies between the original design and the latest annotation.
Finally, unpredicted biological events such as transposition, duplication, integration or the introduction of mutations sometimes occur when constructing target resources. Other problems can result from the primer sequences. For the Keio collection, we confirmed the genomic structure by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream genes. Even when the PCR evaluation showed the predicted genomic structure, we found that another wild-type copy of the target gene could exist somewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test for such events experimentally, it is not technically practical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.
As mentioned above, all of our resources depend on primers designed according to the latest annotation at the time of construction. We therefore calculated the differences between the original design and the current annotation. The evaluation scheme is shown in Figure 5.
[Figure 6 diagram. Panels: (A) ASKA GFP(+), ASKA GFP(-) and Gateway entry clone libraries; (B) TransBac library; (C) Keio and Barcode deletion collections, with 1) partial overlap case and 2) complete overlap case. Each panel shows the target region from ATG to Stop with the N primer and C primer, and DN and DC marked at the two ends on a +/0/- scale. Panels (A) and (B) show the cloned fragment; panel (C) shows the deleted region (with a 21 bp label), UL and DL relative to the upstream and downstream genes, and PD and CD relative to an overlapping gene.]
Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if one exists. (A) ASKA GFP(+), ASKA GFP(−) and Gateway entry clone validation. (B) TransBac library. (C) Keio and Barcode deletion collections.
Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of inconsistency and cloned region of target genes at the nucleotide level. (B) Schematic view of a deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene and plate level views.
[Figure 8 diagram. Labels: upstream gene, 4 bp overlap (ATGA), start codon, target gene, deleted region, stop codon, resistance cassette (AbR) flanked by FRT sites, 21 bp. Junction sequence in the deletion strain: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]
Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted to the systems biology of E. coli. GenoBase is queried from the top page. Any term, such as an ID, gene name, product name or sequence, can be used, and direct searching by SQL is also provided. Search results are shown in table format with links to the resource pages.
The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, basically from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for the six different resources.
For the Keio and Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
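The four-base overlap case can be checked directly from the junction sequence given in Figure 8. In the short sketch below, the sequence following the retained start codon is copied from that figure; the function name and the framing as a stop-codon lookup are our own illustrative choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Sequence immediately following the retained start codon in the deletion
# strain, copied from the junction sequence shown in Figure 8.
CASSETTE_START = "ATTCCGGGGATCCGTCGACC"


def upstream_stop_preserved(start_codon="ATG", cassette_start=CASSETTE_START):
    """Check the 4-nt overlap case.

    The upstream stop codon spans the last two bases of the target start
    codon plus the first base after it, so it survives the deletion only
    if that first base still completes a stop codon.
    """
    junction = start_codon + cassette_start
    upstream_stop = junction[1:4]
    return upstream_stop in STOP_CODONS, upstream_stop
```

Because the sequence after the start codon begins with A, the junction reads ATGA and the upstream TGA stop codon is regenerated, which is why this overlap has no influence on termination of the upstream gene.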
Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) can be retrieved for each target protein (bait).
DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
Images of protein localization analyzed with the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.
Phenotype microarray analyses with BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts from the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).
Other omics results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained with the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled to be released.
RESOURCE DISTRIBUTION
Current official distribution site is NIG (the NationalInstitute of Genetics Mishima httpwwwshigennigacjpecolistrain) supported by the National BioresourceProject (40)
Currently we have four types of predicted ORF plasmidclone libraries and two of them have been distributed fromNIG The Gateway entry clone library will start soon to dis-tribute to the academy side We also have efforts to open ournew resources timely as much as possible
For deletion construct only Keio collection (8) is nowavailable from NIG and we would like to make the Barcodedeletion collection publicly available as soon as possible
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our maintask The annotation information on genes depends onGenBank database entry The major purpose of GenoBase
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
Nucleic Acids Research 2014 11
is to provide information related to resources constructedby our group for the community to investigate E coli K-12as a model cell system We are also producing omics datato understand what a cell system is We hope experimentalresources their information and omics results from these re-sources may contribute in the community using E coli as amodel system
ACKNOWLEDGEMENTS
We especially thank Mohamed Yassin Eltabakh Joe Gris-som Thomas McGrew Matt Traxler and Yifeng Yang forcontributions to development of the GenoBase informationresources when hosted at Purdue UniversityAuthorsrsquo Contributions CO designed and implementedChado schema AM RT and YO performed re-calculation of the comparison between the design and cur-rent annotation MI and KN (Mitsubishi Space SoftwareInc) prepared web pages WGA TC MRG and DKset-up of GenoBase when hosted at Purdue KER pro-vided expertise in annotation and set-up of EcoGene whenhosted at Purdue KKN (Keio Univ) NY HD and YOprepared the Barcode deletion collection data set WN andTN prepared primer data set SS and ST prepared theBIOLOG pages and YT and BRB conducted BIOLOGanalysis BLW and HM oversaw all aspects of designingGenoBase and preparation of the manuscript
FUNDING
National Institutes of Health (NIH) [U24 GM077905 andRC1 GM092047 to BLW RO1 GM075004 to DK]JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716Grant-in-Aid for Scientific Research from the Ministryof Education Culture Sports Science and Technology ofJapan Bio Innovation funding program in Okinawa Japan[to HM] NSF [106394 to BLW] Funding for openaccess charge JSPS KAKENHI 25250028 Grant-in-Aidfor Scientific Research from the Ministry of Educa-tion Culture Sports Science and Technology of Japan
Conflict of interest statement None declared
REFERENCES
1. Thomason,M.K., Bischler,T., Eisenbart,S.K., Forstner,K.U., Zhang,A., Herbig,A., Nieselt,K., Sharma,C.M. and Storz,G. (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J. Bacteriol., pii: JB.02096-14.
2. Brenner,S., Jacob,F. and Meselson,M. (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.
3. Jehmlich,N., Hubschmann,T., Gesell Salazar,M., Volker,U., Benndorf,D., Muller,S., von Bergen,M. and Schmidt,F. (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl. Microbiol. Biotechnol., 88, 575–584.
4. Rajagopala,S.V., Sikorski,P., Kumar,A., Mosca,R., Vlasblom,J., Arnold,R., Franca-Koh,J., Pakala,S.B., Phanse,S., Ceol,A. et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat. Biotechnol., 32, 285–290.
5. Nakahigashi,K., Toya,Y., Ishii,N., Soga,T., Hasegawa,M., Watanabe,H., Takai,Y., Honma,M., Mori,H. and Tomita,M. (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol. Syst. Biol., 5, 306.
6. Martorana,A.M., Motta,S., Di Silvestre,D., Falchi,F., Deho,G., Mauri,P., Sperandeo,P. and Polissi,A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One, 9, e100941.
7. Conway,T., Creecy,J.P., Maddox,S.M., Grissom,J.E., Conkle,T.L., Shadid,T.M., Teramoto,J., San Miguel,P., Shimada,T., Ishihama,A. et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.
8. Baba,T., Ara,T., Hasegawa,M., Takai,Y., Okumura,Y., Baba,M., Datsenko,K.A., Tomita,M., Wanner,B.L. and Mori,H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008.
9. Kitagawa,M., Ara,T., Arifuzzaman,M., Ioka-Nakamichi,T., Inamoto,E., Toyonaga,H. and Mori,H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): unique resources for biological research. DNA Res., 12, 291–299.
10. Yura,T., Mori,H., Nagai,H., Nagata,T., Ishihama,A., Fujita,N., Isono,K., Mizobuchi,K. and Nakata,A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res., 20, 3305–3308.
11. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol. Mol. Biol. Rev., 62, 985–1019.
12. Kohara,Y., Akiyama,K. and Isono,K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.
13. Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kasai,H., Kashimoto,K., Kimura,S., Kitakawa,M. et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res., 3, 363–377.
14. Fujita,N., Mori,H., Yura,T. and Ishihama,A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res., 22, 1637–1639.
15. Hayashi,K., Morooka,N., Yamamoto,Y., Fujita,K., Isono,K., Choi,S., Ohtsubo,E., Baba,T., Wanner,B.L., Mori,H. et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol., 2, 2006.0007.
16. Itoh,T., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Kasai,H., Kimura,S., Kitakawa,M., Kitagawa,M. et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res., 3, 379–392.
17. Yamamoto,Y., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kimura,S., Kitagawa,M., Makino,K. et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res., 4, 91–113.
18. Blattner,F.R., Plunkett,G. 3rd, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.
19. Yamamoto,N., Nakahigashi,K., Nakamichi,T., Yoshino,M., Takai,Y., Touda,Y., Furubayashi,A., Kinjyo,S., Dose,H., Hasegawa,M. et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol. Syst. Biol., 5, 335.
20. Arifuzzaman,M., Maeda,M., Itoh,A., Nishikata,K., Takita,C., Saito,R., Ara,T., Nakahigashi,K., Huang,H.C., Hirai,A. et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res., 16, 686–691.
21. Tohsato,Y., Baba,T., Mazaki,Y., Ito,M., Wanner,B.L. and Mori,H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J. Bioinform. Comput. Biol., 8(Suppl. 1), 83–99.
22. Tohsato,Y. and Mori,H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform., 21, 42–52.
23. Zhou,P., Emmert,D. and Zhang,P. (2006) Using Chado to store genome annotation data. Curr. Protoc. Bioinformat., doi:10.1002/0471250953.bi0906s12.
24. Mungall,C.J. and Emmert,D.B. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.
25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.
26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.
27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.
28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.
29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.
30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.
31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.
32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.
33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.
34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.
35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.
36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl. Acad. Sci. U.S.A., 78, 3113–3117.
37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.
38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. U.S.A., 97, 6640–6645.
39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.
40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.
41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.
42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.
43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.
44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.
45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.
46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.
47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.
48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.
49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.
[Figure 3 panel contents, as recovered from the figure]
Target ORF: ATG, 2nd aa, target ORF, last aa, TGA
A) ASKA GFP(+): NotI1, SfiI1, N-ORF-C, SfiI2, GFP, NotI2
   DNA: CACCATACGGATCCGGCCCTGAGGGCC (2nd aa) ... (LAST aa) GGCCTATGCGGCCGCAAATAAGCGGCCGCTAA
   Protein: 6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg GFP
B) ASKA GFP(-): NotI, SfiI1, N-ORF-C, SfiI2
   DNA: CACCATACGGATCCGGCCCTGAGGGCC (2nd aa) ... (LAST aa) GGCCTATGCGGCCGCTAA
   Protein: 6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg
C) Gateway entry: attL1, N-ORF-C, attL2
   DNA: AAGCAGGCTTGGCCCTGAGGGCC (2nd aa) ... (LAST aa) GGCCTATGCGGCCGCCACCCAGC
   Protein: GlyLeuAlaLeuArgAla XXXXXX GlyLeuCysGlyArgHis
D) TransBac: PT5, SD, BlnI, N-ORF-C, SfiI2
   DNA: CAGAATTCATTAAAGAGGAGAAACCTAGG (1st Met) ... (LAST aa) TGAGGCCTATGCGGCCGCTAA
   Protein: Met ... Ter
Figure 3. Structure of the cloned region of the ORF clone libraries. Red, green and blue boxes represent the target ORF, eGFP and attL regions, respectively. All ORF cloning target regions are designed from the second to the last amino acid coding region, except in the TransBac plasmid clone library.
MAJOR PURPOSE OF GenoBase
Our database is focused on information about our comprehensively constructed experimental resources (libraries of plasmid clones, deletion mutants, etc.) and HT experimental data from a large E. coli functional genomics project that far exceeds all other resources combined. All the information in GenoBase is publicly available. Current resources include four types of annotated Open Reading Frame (ORF) plasmid clone libraries and two types of deletion collections. The plasmid clone libraries include the ASKA ORFeome libraries with (A) and without (B) a C-terminal GFP fusion (9,25), the Gateway entry clone library (C) (25) and the latest TransBac library (D) (H. Dose, unpublished), as shown in Figure 3. The single-gene deletion collections include the Keio collection (A) (8) and a recently constructed barcode deletion collection (B), as shown in Figure 4. The Barcode deletion collection, whose manuscript is now in preparation, was originally constructed for the systematic analysis of synthetic lethal/sickness genetic interactions, in which double-gene knockouts are made by combining single-gene deletions from the Keio and the Barcode collections by conjugation (26,27; R. Takeuchi, unpublished). We added a further valuable feature, a 20-nt molecular barcode, which makes population analysis and multiplex parallel screening of mixed cultures feasible (Y. Otsuka et al., unpublished).
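As an illustration of why a per-strain barcode makes population analysis of mixed cultures feasible, the following minimal Python sketch counts sequencing reads per strain by exact match of a 20-nt barcode. It is not the authors' pipeline: the barcode sequences, strain names, read layout (`offset`) and function names are all hypothetical, and a real analysis would also need to handle sequencing errors.

```python
from collections import Counter

BARCODE_LENGTH = 20  # the text describes a 20-nt molecular barcode


def count_barcodes(reads, barcode_to_strain, offset=0):
    """Count reads per strain by exact match of the barcode.

    `offset` is the assumed (hypothetical) position of the barcode in each read.
    Reads whose barcode is not in the lookup table are ignored.
    """
    counts = Counter()
    for read in reads:
        barcode = read[offset:offset + BARCODE_LENGTH]
        strain = barcode_to_strain.get(barcode)
        if strain is not None:
            counts[strain] += 1
    return counts


def relative_abundance(counts):
    """Convert raw counts to fractions of all assigned reads."""
    total = sum(counts.values())
    return {strain: n / total for strain, n in counts.items()} if total else {}


# Hypothetical barcodes and reads, for illustration only.
barcodes = {
    "ACGTACGTACGTACGTACGT": "strain_1",
    "TTGACCAGTTGACCAGTTGA": "strain_2",
}
reads = [
    "ACGTACGTACGTACGTACGTGGG",
    "ACGTACGTACGTACGTACGTGGG",
    "ACGTACGTACGTACGTACGTGGG",
    "TTGACCAGTTGACCAGTTGAGGG",
    "NNNNNNNNNNNNNNNNNNNNGGG",  # unassigned read
]
counts = count_barcodes(reads, barcodes)
print(relative_abundance(counts))
```

Comparing such fractions between time points or conditions is one simple way to follow the fate of each deletion strain in a pooled culture.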
[Figure 4 schematic. Labels recovered from the figure: up, SD, ATG, target gene, down; (A) Keio collection: kan, FRT, FRT; (B) Barcode deletion collection: cat, FRT1, FRT1, barcode; 6 aa + Ter.]
Figure 4. Genomic structure of the target region of the deletion collections. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same in the Keio and Barcode collections. FRT of the Keio collection and FRT1 of the Barcode collection are not site-specifically recombined with each other by FLP recombinase.
INFORMATION ON NEW AND FUTURE RESOURCES
Information on new resources to be constructed in the future and the data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection with or without a barcode, chromosomal Venus-GFP fusion strains and antibodies against purified E. coli proteins are being prepared for dissemination.
INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES
GenoBase provides not only information about resources constructed in our lab and HT experimental data obtained using these resources, but also information on quality control of these resources. Both the Keio and barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of the novel expected junction fragments but also for absence of the targeted gene (19). By testing for absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31) as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations that arise from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with more than 100 to 200 base pairs of flanking homology into a new strain harboring the Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).
Our initial resource included two types of ASKA plasmid clone libraries, whose construction was based on the genome annotation of E. coli K-12 from 1997. A consistent annotation policy was lacking at that time, and annotations sometimes included the wrong initiation codon. After completing the construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110, which were based on updated and highly accurate genome sequences of E. coli K-12 MG1655 and W3110 (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.
Histories and precise information about the methods for construction of each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB/). Notably, the original sources of genomic DNAs differ for each of the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as the parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the positions of the target genes on the chromosome derived from the primer set and those in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as given in Figure 6 (A) and (B). For the deletion collections, in addition to DN and DC, UL (Upstream Length) and DL (Downstream Length) were determined. Also, in the case of complete overlap with the neighbor genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated, as illustrated in Figure 6 (C).
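The DN and DC comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the GenoBase implementation: the `Region` type, the function name and the example coordinates are hypothetical, and the sign convention (positive when the designed boundary lies downstream of the annotated boundary in the direction of the gene) is an assumption, since the convention used in Figure 6 cannot be recovered from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int   # 1-based chromosome coordinate of the leftmost base
    end: int     # 1-based chromosome coordinate of the rightmost base
    strand: str  # "+" or "-"


def terminus_differences(designed, annotated):
    """Return (DN, DC) in base pairs between a primer-derived region and the annotation.

    Assumed convention: a positive value means the designed boundary lies
    downstream (in the direction of the gene) of the annotated boundary.
    """
    if designed.strand != annotated.strand:
        raise ValueError("designed and annotated regions are on different strands")
    if designed.strand == "+":
        dn = designed.start - annotated.start
        dc = designed.end - annotated.end
    else:
        # On the minus strand the N-terminal end is at the larger coordinate.
        dn = annotated.end - designed.end
        dc = annotated.start - designed.start
    return dn, dc


# Hypothetical example: the annotated start codon moved 30 bp upstream
# after the clone was designed, while the C-terminus is unchanged.
plus_result = terminus_differences(Region(130, 400, "+"), Region(100, 400, "+"))
minus_result = terminus_differences(Region(100, 370, "-"), Region(100, 400, "-"))
print(plus_result, minus_result)
```

A resource whose DN and DC are both zero is consistent with the current annotation; any non-zero value flags a clone or deletion whose boundaries no longer match the annotated gene.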
Confirmation of the E. coli K-12 MG1655 and W3110 genome sequences has been previously performed (15), yet some sequencing errors may exist. Confirmation of the host strains E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate) of the Keio and barcode single-deletion collections by deep sequencing has now been completed (Y. Otsuka et al., unpublished). Currently, we are completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.
[Figure 5 flowchart. Labels recovered from the figure: Primers and GenBank annotation; primer position on the chromosome; ASKA GFP(+)/(-), Gateway entry and TransBac: consistency, difference with annotation; Keio and Barcode collections, validation for gene deletion: overlap genes [N, C] length, consistency, difference with annotation; decision outcomes (Yes/No): OK or reconstruction required.]
Figure 5. Flowchart of evaluation by the latest GenBank annotation. All of the resources were generated with DNA primers designed according to the latest annotation at the time of construction.
Changes in the annotation, especially changes in gene location, affect library construction. Databases, e.g. EcoGene (41), EcoCyc (42) and PEC (43), are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information. However, in practice, this is not easy to achieve. At the least, we can calculate the inconsistency between the design and the latest annotation.
Finally, when constructing target resources, unpredicted biological events such as transposition, duplication, integration and the introduction of mutations may sometimes happen. Other possibilities result from the primer sequences. For the Keio collection, we performed genome confirmation by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream genes. Even though the PCR evaluation showed the predicted genomic structure, we found that another wild-type copy of the target gene could exist somewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test such possibilities experimentally, it is technically not practical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.
As mentioned above, all of our resources depend on primers designed according to the latest annotation at the time of construction. We therefore calculated the differences between the design and the current annotation. The evaluation scheme is shown in Figure 5.
[Figure 6 schematic. (A) ASKA GFP(+)/(-) clone and Gateway entry clone libraries and (B) TransBac library: cloned fragment shown relative to the target region (ATG to Stop) with the N primer and C primer, and DN and DC marked with +, 0 and - directions. (C) Keio and Barcode deletion collections: deleted region shown relative to the target region (ATG to Stop) with the N primer and C primer and a 21-bp segment; sub-panels 1) partial overlap case and 2) complete overlap case; labels DN, DC, UL, DL, PD, CD, upstream gene, downstream gene and overlap gene.]
Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if it exists. (A) ASKA GFP(+), ASKA GFP(−) and Gateway entry clone validation. (B) TransBac library. (C) Keio and Barcode deletion collections.
[Figure 7: screen images, panels (A) to (D).]
Figure 7. Sample screen images. (A) Sample pages of the plasmid clone libraries: schematic view of inconsistency and of the cloned region of target genes at the nucleotide level. (B) Schematic view of the deletion collections. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene and plate level views.
[Figure 8 schematic. Labels recovered from the figure: upstream gene, ATGA (4 bp overlap), start codon, stop codon, target gene, deleted region, resistance cassette (AbR) flanked by FRT sites, 21bp; sequence shown: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]
Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted to systems biology of E. coli. GenoBase is queried from the top page. Any term, such as an ID, gene name, product name or sequence, can be used as a query, and a direct search system using SQL is also provided. Search results are shown in table format with links to resource pages.
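To make the idea of a keyword search backed by SQL concrete, here is a minimal, self-contained Python sketch using an in-memory SQLite database. The table layout, column names, gene records and the `search_features` function are all hypothetical stand-ins; the actual GenoBase schema and its SQL interface are not described in the text and may differ substantially.

```python
import sqlite3

# Toy stand-in for a gene table; GenoBase's real schema is not described here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature ("
    "uniquename TEXT PRIMARY KEY, name TEXT, product TEXT, residues TEXT)"
)
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?, ?)",
    [
        ("g0001", "geneA", "hypothetical kinase", "ATGAAACGCATTAGC"),
        ("g0002", "geneB", "hypothetical transporter", "ATGGCTAGCTAA"),
    ],
)


def search_features(connection, term):
    """Match `term` against id, gene name, product name or sequence.

    Uses a parameterized substring match (SQLite LIKE is case-insensitive
    for ASCII text by default).
    """
    pattern = f"%{term}%"
    rows = connection.execute(
        "SELECT uniquename, name, product FROM feature "
        "WHERE uniquename LIKE ? OR name LIKE ? OR product LIKE ? OR residues LIKE ? "
        "ORDER BY uniquename",
        (pattern, pattern, pattern, pattern),
    )
    return rows.fetchall()


print(search_features(conn, "kinase"))
print(search_features(conn, "ATGGCT"))
```

Each returned row would correspond to one line of the result table, from which a link to the resource page of that gene could be built.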
The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows the information on the latest annotation, basically from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for six different resources.
For the Keio and the Barcode deletion collections, the special case of a four-base overlap with the upstream gene has no influence on the correct termination of the upstream gene, as shown in Figure 8.
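The four-base overlap case can be checked with a short Python sketch. With an ATGA overlap, the upstream stop codon (TGA) begins at the second base of the target gene's start codon, so whether it survives the deletion depends on the first base that follows the retained ATG. The junction string below is the beginning of the sequence shown in Figure 8; the function names and the counter-example sequence are hypothetical and serve only as illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# First bases of the junction shown in Figure 8: the retained start codon (ATG)
# followed by the beginning of the resistance cassette sequence.
FIGURE8_JUNCTION = "ATGATTCCGGGGATCCGTCGACC"


def upstream_codon_at_junction(junction):
    """Codon read in the upstream gene's frame across a 4-nt ATGA-type overlap.

    The upstream stop codon starts at the second base of the target gene's
    start codon, so it spans junction[1:4].
    """
    return junction[1:4].upper()


def upstream_stop_regenerated(junction):
    """True if the upstream gene still ends in a stop codon after the deletion."""
    return upstream_codon_at_junction(junction) in STOP_CODONS


print(upstream_codon_at_junction(FIGURE8_JUNCTION))
print(upstream_stop_regenerated(FIGURE8_JUNCTION))
# Hypothetical counter-example: a cassette beginning with G would destroy the stop.
print(upstream_stop_regenerated("ATGGTTCC"))
```

Because the cassette sequence in Figure 8 begins with A, the junction reads ATGA and the upstream TGA stop codon is preserved.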
Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein (bait).
DNA microarray data for about 150 single-gene deletion mutants, mostly mutants lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.
Results of phenotype microarray analysis using BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets obtained using single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).
Other omics type results will be added and made downloadable from the GenoBase system when we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled to be released.
RESOURCE DISTRIBUTION
The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain/), supported by the National BioResource Project (40).
Currently, we have four types of predicted ORF plasmid clone libraries, and two of them have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources in as timely a manner as possible.
For deletion constructs, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to resources constructed by our group for the community to investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their associated information and the omics results obtained from these resources will contribute to the community using E. coli as a model system.
39 ZhouL LeiXH BochnerBR and WannerBL (2003)Phenotype microarray analysis of Escherichia coli K-12 mutants withdeletions of all two-component systems J Bacteriol 1854956ndash4972
40 RileyM AbeT ArnaudMB BerlynMK BlattnerFRChaudhuriRR GlasnerJD HoriuchiT KeselerIM KosugeTet al (2006) Escherichia coli K-12 a cooperatively developedannotation snapshotndash2005 Nucleic Acids Res 34 1ndash9
41 ZhouJ and RuddKE (2013) EcoGene 30 Nucleic Acids Res 41D613ndashD624
42 KeselerIM MackieA Peralta-GilM Santos-ZavaletaAGama-CastroS Bonavides-MartinezC FulcherC HuertaAMKothariA KrummenackerM et al (2013) EcoCyc fusing modelorganism databases with systems biology Nucleic Acids Res 41D605ndashD612
43 YamazakiY NikiH and KatoJ (2008) Profiling of Escherichia colichromosome database Methods Mol Biol 416 385ndash389
44 HuJC SherlockG SiegeleDA AleksanderSA BallCADemeterJ GouniS HollandTA KarpPD LewisJE et al(2014) PortEco a resource for exploring bacterial biology throughhigh-throughput data and analysis tools Nucleic Acids Res 42D677ndashD684
45 GlasnerJD RuschM LissP PlunkettG 3rd CabotELDarlingA AndersonBD Infield-HarmP GilsonMC andPernaNT (2006) ASAP a resource for annotating curatingcomparing and disseminating genomic data Nucleic Acids Res 34D41ndashD45
46 GlasnerJD LissP PlunkettG 3rd DarlingA PrasadTRuschM ByrnesA GilsonM BiehlB BlattnerFR et al (2003)ASAP a systematic annotation package for community analysis ofgenomes Nucleic Acids Res 31 147ndash151
47 BochnerBR (2003) New technologies to assess genotype-phenotyperelationships Nat Rev Genet 4 309ndash314
48 BochnerBR GadzinskiP and PanomitrosE (2001) Phenotypemicroarrays for high-throughput phenotypic testing and assay ofgene function Genome Res 11 1246ndash1255
49 TakeuchiR TamuraT NakayashikiT TanakaY MutoAWannerBL and MoriH (2014) Colony-live -a high-throughputmethod for measuring microbial colony growth kinetics- revealsdiverse growth effects of gene knockouts in Escherichia coli BMCMicrobiol 14 171
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
Nucleic Acids Research 2014 5
Figure 4. Genomic structure of the target region of the deletion collections. (A) Keio collection. (B) Barcode deletion collection. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same between the Keio and Barcode collections. FRT in the Keio collection and FRT1 in the Barcode collection are not site-specifically recombined by FLP recombinase. [Schematic labels: up, down, target gene, ATG, SD, kan, cat, FRT, FRT1, barcode, 6 aa + Ter.]
INFORMATION ON NEW AND FUTURE RESOURCES
Information on new resources to be constructed in the future and the data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection with or without a barcode, chromosomal Venus-GFP fusion strains, and antibodies against purified E. coli proteins are being prepared for dissemination.
INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES
GenoBase provides not only information about resources constructed in our lab and HT experimental data using these resources, but also information on quality control of these resources. Both the Keio and barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of novel expected junction fragments but also for absence of the targeted gene (19). By testing for absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31) as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations that arise from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with greater than 100 to 200 base pairs of flanking homology into a new strain harboring the Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).
Our initial resource included types of ASKA plasmid clone libraries whose construction was based on the genome annotation of E. coli K-12 from 1997. Annotation policy at that time was lacking, and the annotation sometimes included the wrong initiation codon. After completing the construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110 based on updated and highly accurate genome sequences of both strains (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.
Histories and precise information about the methods for construction of each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB). Notably, the original sources of genomic DNA differ for each of the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the positions of the target genes on the chromosome as defined by the primer set and as given in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as given in Figure 6 (A) and (B). For the deletion collections, UL (Upstream Length) and DL (Downstream Length) were determined in addition to DN and DC. Also, in the case of complete overlap with the neighbor genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated as illustrated in Figure 6 (C).
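The DN and DC comparison described above can be illustrated with a minimal sketch. It is not the GenoBase implementation: the `Region` class, the function name and, in particular, the sign convention (positive when the designed region extends beyond the annotated gene at that terminus, negative when it falls short) are assumptions made for illustration only. UL, DL, PD and CD are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A chromosomal interval; coordinates are 1-based and inclusive."""
    left: int    # smaller chromosome coordinate
    right: int   # larger chromosome coordinate
    strand: int  # +1 for the forward strand, -1 for the reverse strand


def terminus_differences(designed: Region, annotated: Region) -> tuple[int, int]:
    """Return (DN, DC) in base pairs for a primer-defined region.

    Assumed convention (hypothetical, not taken from the paper): a value is
    positive when the designed region extends past the annotated gene at
    that terminus, negative when it stops short, and 0 when they coincide.
    """
    if designed.strand != annotated.strand:
        raise ValueError("designed and annotated regions are on different strands")
    if designed.strand == 1:
        # Forward strand: the N-terminus is at the left coordinate.
        dn = annotated.left - designed.left
        dc = designed.right - annotated.right
    else:
        # Reverse strand: the N-terminus is at the right coordinate.
        dn = designed.right - annotated.right
        dc = annotated.left - designed.left
    return dn, dc


# Example: the annotated start codon moved 30 bp downstream of the design.
print(terminus_differences(Region(100, 400, 1), Region(130, 400, 1)))  # (30, 0)
```

Under this assumed convention, a nonzero DN or DC marks a resource whose design no longer matches the current annotation at that terminus.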
Confirmation of the E. coli K-12 MG1655 and W3110 genome sequences has been performed previously (15), yet some sequencing errors may exist. Confirmation of the host strains of the Keio and barcode single-deletion collections, E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate), by deep sequencing has now been completed (Y. Otsuka et al., unpublished). Currently, we are completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.
Figure 5. Flowchart of evaluation by the latest GenBank annotation. All of the resources are generated with DNA primers, which are designed according to the latest annotation at the time of construction. [Flowchart labels: Primers and GenBank annotation; primer position on the chromosome; ASKA GFP(+)/(−), Gateway entry and TransBac: consistency, difference with annotation; Keio and Barcode collections: validation for gene deletion, overlap genes [N, C] length, consistency, difference with annotation; outcomes: OK or reconstruction required.]
Changing the annotation, especially changing gene locations, affects the construction of libraries. Databases such as EcoGene (41), EcoCyc (42) and PEC (43) are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information; in practice, however, this is not easy to achieve. At the least, we can calculate the inconsistency between the design and the latest annotation.
Finally, when constructing target resources, unpredicted biological events sometimes happen, such as transposition, duplication, integration and the introduction of mutations. Other possibilities result from the primer sequences. For the Keio collection, we confirmed the genome structure by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream genes. Even though the PCR evaluation showed the predicted genomic structure, we realized that another wild-type copy of the target gene can exist somewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test such possibilities experimentally, it is technically not practical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.
As mentioned above, all of our resources depend on primers designed according to the latest annotation at the time of construction. We therefore calculated the differences between the design and the current annotation. The evaluation scheme is shown in Figure 5.
Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if one exists. (A) ASKA GFP(+), ASKA GFP(−) and Gateway entry clone validation. (B) TransBac library. (C) Keio and Barcode deletion collections: 1) partial overlap case, 2) complete overlap case. [Schematic labels: N primer, C primer, ATG, Stop, cloned fragment, target region, 21 bp, deleted region, upstream gene, downstream gene, overlap gene; DN, DC, UL, DL, PD and CD, each drawn with a +/0/− scale.]
Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of inconsistency and cloned region of target genes at the nucleotide level. (B) Schematic view of the deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene and plate level views.
Figure 8. Overlap with no influence. In the case of a 4 nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is re-generated in the deletion strain. [Schematic labels: upstream gene, target gene, start codon, stop codon, 4 bp overlap (ATGA), deleted region, FRT, AbR resistance cassette, 21 bp; sequence as printed: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted to systems biology of E. coli. Querying GenoBase is done from the top page. Any word, such as an ID, gene name, product name or sequence, can be used as a query, and a direct search system using SQL is also provided. Search results are shown in table format with links to resource pages.
The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, basically from the GenBank database entry (see Figure 7). The second and third tabs show sequences of the target genes and bioinformatics analysis of them. Currently, six tabs are available for six different resources.
For the Keio and Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
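The four-base overlap case can be checked with a small sketch. It assumes that the sequence printed in Figure 8 reads 5' to 3' from the retained start codon of the target gene; the function name is hypothetical, and only the first four bases are inspected because the printed string may join separate labels from the figure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Sequence as printed in Figure 8 (assumed to begin at the retained ATG).
FIGURE_8_JUNCTION = "ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA"


def upstream_stop_regenerated(junction: str) -> bool:
    """Check the 4 nt overlap case at a deletion junction.

    The junction must start with the retained start codon (ATG). In a 4 nt
    overlap, the upstream gene's stop codon occupies bases 2 to 4, so it is
    intact only if the base that follows ATG completes a stop codon.
    """
    junction = junction.upper()
    if len(junction) < 4:
        return False
    return junction.startswith("ATG") and junction[1:4] in STOP_CODONS


print(upstream_stop_regenerated(FIGURE_8_JUNCTION))  # True
```

Here the base following the retained ATG is A, so bases 2 to 4 read TGA and the upstream gene still terminates at its original position.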
Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates as prey proteins are available from each target protein as bait.
DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
Images of protein localization analyzed with the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.
Phenotype microarray analyses using BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20, for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).
Other omics-type results will be added and made downloadable from the GenoBase system when we obtain them. Currently, comprehensive genetic interaction data and population dynamics data using the Barcode deletion collection, in addition to growth under simple conditions measured by our latest colony quantification system (49), are scheduled to be released.
RESOURCE DISTRIBUTION
The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain), supported by the National Bioresource Project (40).
Currently, we have four types of predicted ORF plasmid clone libraries, and two of them have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources as promptly as possible.
For deletion constructs, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to resources constructed by our group for the community to investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.
ACKNOWLEDGEMENTS
We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.
Authors' Contributions: CO designed and implemented the Chado schema. AM, RT and YO performed re-calculation of the comparison between the design and the current annotation. MI and KN (Mitsubishi Space Software Inc) prepared web pages. WGA, TC, MRG and DK set up GenoBase when hosted at Purdue. KER provided expertise in annotation and set-up of EcoGene when hosted at Purdue. KKN (Keio Univ), NY, HD and YO prepared the Barcode deletion collection data set. WN and TN prepared the primer data set. SS and ST prepared the BIOLOG pages, and YT and BRB conducted BIOLOG analysis. BLW and HM oversaw all aspects of designing GenoBase and preparation of the manuscript.
FUNDING
National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to BLW, RO1 GM075004 to DK]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to HM]; NSF [106394 to BLW]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
Conflict of interest statement. None declared.
REFERENCES
1. Thomason MK, Bischler T, Eisenbart SK, Forstner KU, Zhang A, Herbig A, Nieselt K, Sharma CM and Storz G (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J Bacteriol, pii: JB.02096-14.
2. Brenner S, Jacob F and Meselson M (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.
3. Jehmlich N, Hubschmann T, Gesell Salazar M, Volker U, Benndorf D, Muller S, von Bergen M and Schmidt F (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl Microbiol Biotechnol, 88, 575–584.
4. Rajagopala SV, Sikorski P, Kumar A, Mosca R, Vlasblom J, Arnold R, Franca-Koh J, Pakala SB, Phanse S, Ceol A et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat Biotechnol, 32, 285–290.
5. Nakahigashi K, Toya Y, Ishii N, Soga T, Hasegawa M, Watanabe H, Takai Y, Honma M, Mori H and Tomita M (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol Syst Biol, 5, 306.
6. Martorana AM, Motta S, Di Silvestre D, Falchi F, Deho G, Mauri P, Sperandeo P and Polissi A (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One, 9, e100941.
7. Conway T, Creecy JP, Maddox SM, Grissom JE, Conkle TL, Shadid TM, Teramoto J, San Miguel P, Shimada T, Ishihama A et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.
8. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL and Mori H (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol, 2, 2006.0008.
9. Kitagawa M, Ara T, Arifuzzaman M, Ioka-Nakamichi T, Inamoto E, Toyonaga H and Mori H (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): unique resources for biological research. DNA Res, 12, 291–299.
10. Yura T, Mori H, Nagai H, Nagata T, Ishihama A, Fujita N, Isono K, Mizobuchi K and Nakata A (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res, 20, 3305–3308.
11. Rudd KE (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol Mol Biol Rev, 62, 985–1019.
12. Kohara Y, Akiyama K and Isono K (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.
13. Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kasai H, Kashimoto K, Kimura S, Kitakawa M et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res, 3, 363–377.
14. Fujita N, Mori H, Yura T and Ishihama A (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res, 22, 1637–1639.
15. Hayashi K, Morooka N, Yamamoto Y, Fujita K, Isono K, Choi S, Ohtsubo E, Baba T, Wanner BL, Mori H et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol Syst Biol, 2, 2006.0007.
16. Itoh T, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Kasai H, Kimura S, Kitakawa M, Kitagawa M et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res, 3, 379–392.
17. Yamamoto Y, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kimura S, Kitagawa M, Makino K et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res, 4, 91–113.
18. Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.
19. Yamamoto N, Nakahigashi K, Nakamichi T, Yoshino M, Takai Y, Touda Y, Furubayashi A, Kinjyo S, Dose H, Hasegawa M et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol Syst Biol, 5, 335.
20. Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, Saito R, Ara T, Nakahigashi K, Huang HC, Hirai A et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res, 16, 686–691.
21. Tohsato Y, Baba T, Mazaki Y, Ito M, Wanner BL and Mori H (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J Bioinform Comput Biol, 8(Suppl 1), 83–99.
22. Tohsato Y and Mori H (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform, 21, 42–52.
23. Zhou P, Emmert D and Zhang P (2006) Using Chado to store genome annotation data. Curr Protoc Bioinformat, doi:10.1002/0471250953.bi0906s12.
24. Mungall CJ and Emmert DB (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.
25. Rajagopala SV, Yamamoto N, Zweifel AE, Nakamichi T, Huang HK, Mendez-Rios JD, Franca-Koh J, Boorgula MP, Fujita K, Suzuki K et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom, 11, 470.
26. Butland G, Babu M, Diaz-Mejia JJ, Bohdana F, Phanse S, Gold B, Yang W, Li J, Gagarinova AG, Pogoutse O et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat Methods, 5, 789–795.
27. Typas A, Nichols RJ, Siegele DA, Shales M, Collins SR, Lim B, Braberg H, Yamamoto N, Takeuchi R, Wanner BL et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat Methods, 5, 781–787.
28. Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.
29. Hill CW, Schiffer D and Berg P (1969) Transduction of merodiploidy: induced duplication of recipient genes. J Bacteriol, 99, 274–278.
30. Hill CW, Foulds J, Soll L and Berg P (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J Mol Biol, 39, 563–581.
31. Anderson RP, Miller CG and Roth JR (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J Mol Biol, 105, 201–218.
32. Anderson RP and Roth JR (1977) Tandem genetic duplications in phage and bacteria. Ann Rev Microbiol, 31, 473–505.
33. Anderson RP and Roth JR (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J Mol Biol, 119, 147–166.
34. Anderson RP and Roth JR (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J Mol Biol, 126, 53–71.
35. Anderson RP and Roth JR (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb Symp Quant Biol, 43, 1083–1087.
36. Anderson P and Roth J (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc Natl Acad Sci USA, 78, 3113–3117.
37. Poon A, Davis BH and Chao L (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.
38. Datsenko KA and Wanner BL (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA, 97, 6640–6645.
39. Zhou L, Lei XH, Bochner BR and Wanner BL (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J Bacteriol, 185, 4956–4972.
40. Riley M, Abe T, Arnaud MB, Berlyn MK, Blattner FR, Chaudhuri RR, Glasner JD, Horiuchi T, Keseler IM, Kosuge T et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res, 34, 1–9.
41. Zhou J and Rudd KE (2013) EcoGene 3.0. Nucleic Acids Res, 41, D613–D624.
42. Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, Bonavides-Martinez C, Fulcher C, Huerta AM, Kothari A, Krummenacker M et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res, 41, D605–D612.
43. Yamazaki Y, Niki H and Kato J (2008) Profiling of Escherichia coli chromosome database. Methods Mol Biol, 416, 385–389.
44. Hu JC, Sherlock G, Siegele DA, Aleksander SA, Ball CA, Demeter J, Gouni S, Holland TA, Karp PD, Lewis JE et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res, 42, D677–D684.
45. Glasner JD, Rusch M, Liss P, Plunkett G 3rd, Cabot EL, Darling A, Anderson BD, Infield-Harm P, Gilson MC and Perna NT (2006) ASAP: a resource for annotating, curating, comparing and disseminating genomic data. Nucleic Acids Res, 34, D41–D45.
46. Glasner JD, Liss P, Plunkett G 3rd, Darling A, Prasad T, Rusch M, Byrnes A, Gilson M, Biehl B, Blattner FR et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res, 31, 147–151.
47. Bochner BR (2003) New technologies to assess genotype-phenotype relationships. Nat Rev Genet, 4, 309–314.
48. Bochner BR, Gadzinski P and Panomitros E (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res, 11, 1246–1255.
49. Takeuchi R, Tamura T, Nakayashiki T, Tanaka Y, Muto A, Wanner BL and Mori H (2014) Colony-live, a high-throughput method for measuring microbial colony growth kinetics, reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol, 14, 171.
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
6 Nucleic Acids Research 2014
Primers
GenBankannotation
primer positionon the chromosome
ASKA GFP(+) (-)Gateway entry amp TransBac
Keio amp Barcode collectionvalidation for gene deletion
overlap genes[N C] length
consistency
dif with annotation
OKconsistency
dif with annotation
No
YesYes
No
reconstruction required
OK
Figure 5 Flowchart of evaluation by the latest GenBank annotation All of resources are generated by DNA primers which are designed according to thelatest annotation at the time of construction
Changing the annotation especially changing the genelocation is influential for making libraries Databases egEcoGene (41) EcoCyc (42) and PEC (43) are frequentlyupdated We would like to keep the GenoBase experimen-tal resources updated according to such latest informationHowever in practice it is not easy to achieve At the leastwe can calculate their inconsistency between the design andthe latest annotation
Finally when constructing target resources sometimesunpredicted biological events may happen such as transpo-sition duplication integration introducing mutations etcOther possibilities result from the primer sequences Forthe Keio collection we performed genome confirmationby PCR amplification between antibiotic resistance frag-ments replaced with the target gene and upstream or down-stream genes Even though the PCR evaluation showed pre-dicted genomic structure we realized the existence of an-
other wild-type copy of the target gene somewhere on thechromosome So we tested all of the Keio collection strainsto detect such duplication as reported elsewhere (19) Insome cases suppressor mutations may occur elsewhere onthe chromosome during or after deletion or cloning Whileit is possible to test such possibilities experimentally it istechnically not practical Whether this can be achieved viacommunity-level activities with PortEco (44) has yet to beshown Community annotation has had success for somebacteria genera in the ASAP information resource (4546)(httpasapahabswisceduasaphomephp) but it has hadlimited success in the E coli community
As mentioned above all of our resources depend on theprimers designed according to the latest annotation at thetime So we performed calculation of differences betweenthe design and the annotation inconsistency Evaluationscheme is shown in Figure 5
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
Nucleic Acids Research 2014 7
Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if one exists. (A) ASKA GFP(+), ASKA GFP(-) and Gateway entry clone libraries. (B) TransBac library. (C) Keio and Barcode deletion collections: (1) partial overlap case; (2) complete overlap case.
[Schematic not reproduced. In each panel the N primer and C primer are drawn against the target region from the ATG to the stop codon, and the offsets DN and DC are marked as +, 0 or -. Panels A and B show the cloned fragment. Panel C shows the deleted region with the 21-bp mark, the upstream and downstream genes with the labels UL and DL in case 1, and the overlap gene with the labels PD and CD in case 2.]
Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of the inconsistency and the cloned region of the target gene at the nucleotide level. (B) Schematic view of the deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene-level and plate-level views.
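Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.
[Schematic not reproduced. The upstream gene ends in the 4-bp overlap ATGA, which holds the start codon of the target gene and the stop codon of the upstream gene. The deleted region of the target gene is replaced by the antibiotic resistance (AbR) cassette flanked by FRT sites, next to the 21-bp mark. Sequence shown at the junction: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]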
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted to the systems biology of E. coli. GenoBase is queried from the top page. Any term, such as an ID, gene name, product name or sequence, can be used, and a direct search system using SQL is also provided. Search results are shown in table format with links to the resource pages.
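The kind of keyword lookup described above can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module and an in-memory table; the table name `gene`, its columns and the three illustrative rows are hypothetical and do not reflect the actual GenoBase schema or its SQL interface.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `gene` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (locus_id TEXT PRIMARY KEY, name TEXT, product TEXT)"
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?)",
        [
            ("b0343", "lacY", "lactose permease"),
            ("b0344", "lacZ", "beta-galactosidase"),
            ("b2699", "recA", "recombinase A"),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, word: str) -> list[tuple[str, str, str]]:
    """Return rows whose ID, gene name or product name contains `word`."""
    pattern = f"%{word}%"
    cursor = conn.execute(
        "SELECT locus_id, name, product FROM gene "
        "WHERE locus_id LIKE ? OR name LIKE ? OR product LIKE ? "
        "ORDER BY locus_id",
        (pattern, pattern, pattern),  # bound parameters, never string-built SQL
    )
    return cursor.fetchall()
```

Calling `search(build_demo_db(), "lac")` returns the lacY and lacZ rows as a list of tuples, which corresponds to the tabular result view described in the text.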
The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, taken mainly from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for six different resources.
For the Keio and Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
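Why a 4-nt ATGA overlap is harmless can be checked directly on the junction sequence drawn in Figure 8. The sketch below is an illustration only: the function name and the `overlap` argument are assumptions, and the sequence is the one shown in the figure. It assumes that the start codon of the target gene is retained at the junction, so the last three bases of the overlap must still read as a stop codon for the upstream gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Junction sequence as drawn in Figure 8: the retained start codon of the
# target gene followed by the sequence left at the deletion site.
FIGURE8_JUNCTION = "ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA"


def upstream_stop_preserved(junction: str, overlap: int = 4) -> bool:
    """Check that the upstream gene still ends in a stop codon.

    Assumes junction[0] is the first base of the target gene's start codon and
    that the upstream gene overlaps the target gene by `overlap` nucleotides,
    so the upstream stop codon occupies junction[overlap - 3 : overlap].
    """
    if overlap < 3:
        raise ValueError("the overlap must contain the whole stop codon")
    codon = junction[overlap - 3:overlap].upper()
    return codon in STOP_CODONS
```

With the Figure 8 sequence, the bases at positions 2 to 4 are TGA, so the check passes. If the base following the retained ATG were, for example, C instead of A, the upstream reading frame would read TGC and the stop codon would be lost.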
Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein (bait).
DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
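A tab-delimited file of this kind can be read with the Python standard library alone. The sketch below is hedged: the column names `gene`, `mutant_signal` and `wildtype_signal` and the sample values are invented for illustration, since the layout of the downloadable files is not described here, and the log2 ratio is just one common way to summarize such data.

```python
import csv
import io
import math

# Made-up sample in an assumed layout; real files may use different columns.
SAMPLE = (
    "gene\tmutant_signal\twildtype_signal\n"
    "lacZ\t200\t100\n"
    "recA\t50\t100\n"
)


def log2_ratios(handle) -> dict[str, float]:
    """Map each gene to log2(mutant signal / wild-type signal)."""
    reader = csv.DictReader(handle, delimiter="\t")
    ratios = {}
    for row in reader:
        mutant = float(row["mutant_signal"])
        wild_type = float(row["wildtype_signal"])
        # Skip spots with non-positive intensity, where the ratio is undefined.
        if mutant > 0 and wild_type > 0:
            ratios[row["gene"]] = math.log2(mutant / wild_type)
    return ratios
```

Passing `io.StringIO(SAMPLE)` stands in for an opened download; with a real file one would pass `open(path, newline="")` instead.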
Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.
Phenotype microarray analyses using BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts from the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced from 10 replicate tests of the wild-type strain as a control and duplicate tests of each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).
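One simple way to use a design with 10 wild-type replicates and duplicate mutant tests is to express the mutant mean as a z-score against the wild-type distribution for the same well. This is only a sketch under that assumption: the function name and the readings in the usage note are invented, and the scoring actually used for the GenoBase displays is not specified here.

```python
from statistics import mean, stdev


def mutant_z_score(wild_type: list[float], mutant: list[float]) -> float:
    """Distance of the mutant mean from the wild-type mean, in wild-type SDs.

    `wild_type` holds the replicate control readings for one well and
    `mutant` the replicate readings for one deletion strain in that well.
    """
    if len(wild_type) < 2 or not mutant:
        raise ValueError("need at least two control readings and one mutant reading")
    spread = stdev(wild_type)  # sample standard deviation of the controls
    if spread == 0:
        raise ValueError("control readings show no variation")
    return (mean(mutant) - mean(wild_type)) / spread
```

For instance, with ten control readings split evenly between 10 and 12 and mutant duplicates of 5 and 7, the score is strongly negative, flagging reduced signal in that well for the deletion strain.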
Other omics-type results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled for release.
RESOURCE DISTRIBUTION
The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain), supported by the National BioResource Project (40).
Currently, we have four types of predicted ORF plasmid clone libraries, two of which have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources in as timely a manner as possible.
Among the deletion constructs, only the Keio collection (8) is now available from NIG; we would like to make the Barcode deletion collection publicly available as soon as possible.
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information on the resources constructed by our group so that the community can investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their associated information and the omics results obtained from them will contribute to the community using E. coli as a model system.
ACKNOWLEDGEMENTS
We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to the development of the GenoBase information resources when hosted at Purdue University.
Authors' Contributions: CO designed and implemented the Chado schema. AM, RT and YO performed the re-calculation of the comparison between the design and the current annotation. MI and KN (Mitsubishi Space Software Inc.) prepared the web pages. WGA, TC, MRG and DK set up GenoBase when hosted at Purdue. KER provided expertise in annotation and set up EcoGene when hosted at Purdue. KKN (Keio Univ.), NY, HD and YO prepared the Barcode deletion collection data set. WN and TN prepared the primer data set. SS and ST prepared the BIOLOG pages, and YT and BRB conducted the BIOLOG analysis. BLW and HM oversaw all aspects of designing GenoBase and preparation of the manuscript.
FUNDING
National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to BLW, RO1 GM075004 to DK]; JSPS KAKENHI 25250028 and MEXT KAKENHI 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to HM]; NSF [106394 to BLW]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
Conflict of interest statement. None declared.
REFERENCES
1. Thomason,M.K., Bischler,T., Eisenbart,S.K., Forstner,K.U., Zhang,A., Herbig,A., Nieselt,K., Sharma,C.M. and Storz,G. (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J. Bacteriol., pii: JB.02096-14.
2. Brenner,S., Jacob,F. and Meselson,M. (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.
3. Jehmlich,N., Hubschmann,T., Gesell Salazar,M., Volker,U., Benndorf,D., Muller,S., von Bergen,M. and Schmidt,F. (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl. Microbiol. Biotechnol., 88, 575–584.
4. Rajagopala,S.V., Sikorski,P., Kumar,A., Mosca,R., Vlasblom,J., Arnold,R., Franca-Koh,J., Pakala,S.B., Phanse,S., Ceol,A. et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat. Biotechnol., 32, 285–290.
5. Nakahigashi,K., Toya,Y., Ishii,N., Soga,T., Hasegawa,M., Watanabe,H., Takai,Y., Honma,M., Mori,H. and Tomita,M. (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol. Syst. Biol., 5, 306.
6. Martorana,A.M., Motta,S., Di Silvestre,D., Falchi,F., Deho,G., Mauri,P., Sperandeo,P. and Polissi,A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One, 9, e100941.
7. Conway,T., Creecy,J.P., Maddox,S.M., Grissom,J.E., Conkle,T.L., Shadid,T.M., Teramoto,J., San Miguel,P., Shimada,T., Ishihama,A. et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.
8. Baba,T., Ara,T., Hasegawa,M., Takai,Y., Okumura,Y., Baba,M., Datsenko,K.A., Tomita,M., Wanner,B.L. and Mori,H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008.
9. Kitagawa,M., Ara,T., Arifuzzaman,M., Ioka-Nakamichi,T., Inamoto,E., Toyonaga,H. and Mori,H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): unique resources for biological research. DNA Res., 12, 291–299.
10. Yura,T., Mori,H., Nagai,H., Nagata,T., Ishihama,A., Fujita,N., Isono,K., Mizobuchi,K. and Nakata,A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res., 20, 3305–3308.
11. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol. Mol. Biol. Rev., 62, 985–1019.
12. Kohara,Y., Akiyama,K. and Isono,K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.
13. Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kasai,H., Kashimoto,K., Kimura,S., Kitakawa,M. et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res., 3, 363–377.
14. Fujita,N., Mori,H., Yura,T. and Ishihama,A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res., 22, 1637–1639.
15. Hayashi,K., Morooka,N., Yamamoto,Y., Fujita,K., Isono,K., Choi,S., Ohtsubo,E., Baba,T., Wanner,B.L., Mori,H. et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol., 2, 2006.0007.
16. Itoh,T., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Kasai,H., Kimura,S., Kitakawa,M., Kitagawa,M. et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res., 3, 379–392.
17. Yamamoto,Y., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kimura,S., Kitagawa,M., Makino,K. et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res., 4, 91–113.
18. Blattner,F.R., Plunkett,G. 3rd, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.
19. Yamamoto,N., Nakahigashi,K., Nakamichi,T., Yoshino,M., Takai,Y., Touda,Y., Furubayashi,A., Kinjyo,S., Dose,H., Hasegawa,M. et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol. Syst. Biol., 5, 335.
20. Arifuzzaman,M., Maeda,M., Itoh,A., Nishikata,K., Takita,C., Saito,R., Ara,T., Nakahigashi,K., Huang,H.C., Hirai,A. et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res., 16, 686–691.
21. Tohsato,Y., Baba,T., Mazaki,Y., Ito,M., Wanner,B.L. and Mori,H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J. Bioinform. Comput. Biol., 8(Suppl. 1), 83–99.
22. Tohsato,Y. and Mori,H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform., 21, 42–52.
23. Zhou,P., Emmert,D. and Zhang,P. (2006) Using Chado to store genome annotation data. Curr. Protoc. Bioinformat., doi:10.1002/0471250953.bi0906s12.
24. Mungall,C.J. and Emmert,D.B. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.
25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.
26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.
27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.
28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.
29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.
30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.
31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.
32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.
33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.
34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.
35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.
36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl Acad. Sci. U.S.A., 78, 3113–3117.
37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.
38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl Acad. Sci. U.S.A., 97, 6640–6645.
39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.
40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.
41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.
42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.
43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.
44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.
45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.
46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.
47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.
48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.
49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
Nucleic Acids Research 2014 7
(A) ASKA GFP(+) (-) clone and Gateway entry clone libraries
ATG cloned fragment
DN DC
+ -0 +- 0
Sto
p
(B) TransBac library
cloned fragment
DN DC
+ -0 +- 0
Sto
p
ATG
target region
target region
N primer C primer
N primer C primer
(C) Keio and Barcode deletion collection
21bp
deleted region
DN DC
UL
upstream gene
DL
N primer C primer
target region
downstream gene
+ -0 +- 0
Sto
p
ATG
21bp
deleted region
DN DC
PD
overlap gene
CD
N primer C primer
target region
overlap gene
+ -0 +- 0
Sto
p
ATG
1) partial overlap case
2) complete overlap case
Figure 6 Differences with annotation Each of resource pages for target gene shows inconsistency if it exists (A) ASKA GFP(+) ASKA GFP(minus) andGateway entry clone validation (B) TransBac library (C) Keio and Barcode deletion collections
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
8 Nucleic Acids Research 2014
(A)
(B) by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
Nucleic Acids Research 2014 9
(C)
(D)
Figure 7 Sample screen images (A) Sample pages of plasmid clone libraries Schematic view of inconsistency and cloned region of target genes in nucleotidelevel (B) Schematic view of deletion collection (C) Omics results of each of the target gene (D) BIOLOG phenotype analysis results in global gene andplate level view
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
10 Nucleic Acids Research 2014
upsteam gene ATGA
4 bp overlap
deleted region
stop codon
FRT FRT
21bpAbR
start codon
ATGA
ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA
resistance cassette
target gene
Figure 8 Overlap with no influence In the case of 4 nt (ATGA) overlap between the target and the upstream genes the termination codon of the upstreamgene is re-generated in deletion strain
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted tosystems biology of E coli Querying GenoBase is done fromthe top page Any word such as an id gene name productname and sequence are available and the direct searchingsystem by SQL is also provided Searching results are shownas a table format with links to resource pages
The methods used for quality control of our resourcesare shown in Figure 6 On each web page the left panelshows the information on the latest annotation basicallyfrom GenBank database entry (see Figure 7) The secondand third tabs show sequences of the target genes and bioin-formatics analysis of them Currently six tabs are availablefor six different resources
For the Keio and the Barcode deletion collection in thecase of four base overlap with the upstream gene specialcase has no influence for the correct termination of the up-stream gene shown in Figure 8
Another tab shows omics output using our resourcesSystematic experimental data currently includes three datasets Proteinndashprotein interaction data are based on usingHis-tagged ASKA ORF clone library without GFP (20)All of the interaction data including data produced by Timeof Flight Mass Spectrometry (TOF-MASS) analysis arestored and specific partner candidates as prey proteins areavailable from each target protein as bait
DNA microarray analysis of about 150 single-gene dele-tion mutants mostly for ones lacking transcription factorsquantified by ImaGene for deletion mutants are also storedThese data are downloadable from the link as tab-limitedtext format files
Images of protein localization analyzed by the GFP-tagged ASKA ORFeome clones are available E coli cellswithout isopropyl-beta-D-thiogalactopyranoside (IPTG)in Luria-Bertani broth (LB) to avoid misfolding were an-alyzed by fluorescent microscopy Images captured with acharge-coupled device (CCD) camera are also available
Phenotype microarray analysis by BIOLOG plates(4748) are shown graphically Currently about 300 datasets using single-gene knockout of the Keio collection areavailable (2122) Our BIOLOG phenotype screening datahave been produced by 10 times wild-type strain tests asa control and duplicate tests for each of target gene dele-tion strains The screenings were performed using PM1to PM20 both for metabolic and chemical sensitivity tests(httpwwwbiologcompmMicrobialCellsshtml)
Other omics type results will be added and made down-loadable from the GenoBase system when we obtain themCurrently comprehensive genetic interaction data popula-tion dynamics using the Barcode deletion collection in ad-dition to the simple growth condition measured by our lat-est colony quantification system (49) are scheduled to beopened
RESOURCE DISTRIBUTION
Current official distribution site is NIG (the NationalInstitute of Genetics Mishima httpwwwshigennigacjpecolistrain) supported by the National BioresourceProject (40)
Currently we have four types of predicted ORF plasmidclone libraries and two of them have been distributed fromNIG The Gateway entry clone library will start soon to dis-tribute to the academy side We also have efforts to open ournew resources timely as much as possible
For deletion construct only Keio collection (8) is nowavailable from NIG and we would like to make the Barcodedeletion collection publicly available as soon as possible
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our maintask The annotation information on genes depends onGenBank database entry The major purpose of GenoBase
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
Nucleic Acids Research 2014 11
is to provide information related to resources constructedby our group for the community to investigate E coli K-12as a model cell system We are also producing omics datato understand what a cell system is We hope experimentalresources their information and omics results from these re-sources may contribute in the community using E coli as amodel system
ACKNOWLEDGEMENTS
We especially thank Mohamed Yassin Eltabakh Joe Gris-som Thomas McGrew Matt Traxler and Yifeng Yang forcontributions to development of the GenoBase informationresources when hosted at Purdue UniversityAuthorsrsquo Contributions CO designed and implementedChado schema AM RT and YO performed re-calculation of the comparison between the design and cur-rent annotation MI and KN (Mitsubishi Space SoftwareInc) prepared web pages WGA TC MRG and DKset-up of GenoBase when hosted at Purdue KER pro-vided expertise in annotation and set-up of EcoGene whenhosted at Purdue KKN (Keio Univ) NY HD and YOprepared the Barcode deletion collection data set WN andTN prepared primer data set SS and ST prepared theBIOLOG pages and YT and BRB conducted BIOLOGanalysis BLW and HM oversaw all aspects of designingGenoBase and preparation of the manuscript
FUNDING
National Institutes of Health (NIH) [U24 GM077905 andRC1 GM092047 to BLW RO1 GM075004 to DK]JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716Grant-in-Aid for Scientific Research from the Ministryof Education Culture Sports Science and Technology ofJapan Bio Innovation funding program in Okinawa Japan[to HM] NSF [106394 to BLW] Funding for openaccess charge JSPS KAKENHI 25250028 Grant-in-Aidfor Scientific Research from the Ministry of Educa-tion Culture Sports Science and Technology of Japan
Conflict of interest statement None declared
REFERENCES1 ThomasonMK BischlerT EisenbartSK ForstnerKU
ZhangA HerbigA NieseltK SharmaCM and StorzG (2014)Global transcriptional start site mapping using dRNA-seq revealsnovel antisense RNAs in Escherichia coli J BacteriolpiiJB02096-14
2 BrennerS JacobF and MeselsonM (1961) An unstableintermediate carrying information from genes to ribosomes forprotein synthesis Nature 190 576ndash581
3 JehmlichN HubschmannT Gesell SalazarM VolkerUBenndorfD MullerS von BergenM and SchmidtF (2010)Advanced tool for characterization of microbial cultures bycombining cytomics and proteomics Appl Microbiol Biotechnol88 575ndash584
4 RajagopalaSV SikorskiP KumarA MoscaR VlasblomJArnoldR Franca-KohJ PakalaSB PhanseS CeolA et al(2014) The binary protein-protein interaction landscape ofEscherichia coli Nat Biotechnol 32 285ndash290
5 NakahigashiK ToyaY IshiiN SogaT HasegawaMWatanabeH TakaiY HonmaM MoriH and TomitaM (2009)Systematic phenome analysis of Escherichia coli multiple-knockout
mutants reveals hidden reactions in central carbon metabolism MolSyst Biol 5 306
6 MartoranaAM MottaS Di SilvestreD FalchiF DehoGMauriP SperandeoP and PolissiA (2014) Dissecting Escherichiacoli outer membrane biogenesis using differential proteomics PloSOne 9 e100941
7 ConwayT CreecyJP MaddoxSM GrissomJE ConkleTLShadidTM TeramotoJ San MiguelP ShimadaT IshihamaAet al (2014) Unprecedented high-resolution view of bacterial operonarchitecture revealed by RNA sequencing mBio 5 e01442ndashe01414
8 BabaT AraT HasegawaM TakaiY OkumuraY BabaMDatsenkoKA TomitaM WannerBL and MoriH (2006)Construction of Escherichia coli K-12 in-frame single-gene knockoutmutants the Keio collection Mol Syst Biol 2 20060008
9 KitagawaM AraT ArifuzzamanM Ioka-NakamichiTInamotoE ToyonagaH and MoriH (2005) Complete set of ORFclones of Escherichia coli ASKA library (A Complete Set of E coliK-12 ORF Archive) Unique Resources for Biological ResearchDNA Res 12 291ndash299
10 YuraT MoriH NagaiH NagataT IshihamaA FujitaNIsonoK MizobuchiK and NakataA (1992) Systematicsequencing of the Escherichia coli genome analysis of the 0ndash24 minregion Nucleic Acids Res 20 3305ndash3308
11 RuddKE (1998) Linkage map of Escherichia coli K-12 edition 10the physical map Microbiol Mol Biol Rev 62 985ndash1019
12 KoharaY AkiyamaK and IsonoK (1987) The physical map ofthe whole E coli chromosome application of a new strategy for rapidanalysis and sorting of a large genomic library Cell 50 495ndash508
13 AibaH BabaT HayashiK InadaT IsonoK ItohT KasaiHKashimotoK KimuraS KitakawaM et al (1996) A 570-kb DNAsequence of the Escherichia coli K-12 genome corresponding to the280ndash401 min region on the linkage map DNA Res 3 363ndash377
14 FujitaN MoriH YuraT and IshihamaA (1994) Systematicsequencing of the Escherichia coli genome analysis of the 24ndash41min (110917ndash193643 bp) region Nucleic Acids Res 22 1637ndash1639
15 HayashiK MorookaN YamamotoY FujitaK IsonoKChoiS OhtsuboE BabaT WannerBL MoriH et al (2006)Highly accurate genome sequences of Escherichia coli K-12 strainsMG1655 and W3110 Mol Syst Biol 2 20060007
16 ItohT AibaH BabaT HayashiK InadaT IsonoK KasaiHKimuraS KitakawaM KitagawaM et al (1996) A 460-kb DNAsequence of the Escherichia coli K-12 genome corresponding to the401ndash500 min region on the linkage map DNA Res 3 379ndash392
17 YamamotoY AibaH BabaT HayashiK InadaT IsonoKItohT KimuraS KitagawaM MakinoK et al (1997)Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 500ndash688 min on the linkage mapand analysis of its sequence features DNA Res 4 91ndash113
18 BlattnerFR PlunkettG 3rd BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK MayhewGFet al (1997) The complete genome sequence of Escherichia coli K-12Science 277 1453ndash1474
19 YamamotoN NakahigashiK NakamichiT YoshinoMTakaiY ToudaY FurubayashiA KinjyoS DoseHHasegawaM et al (2009) Update on the Keio collection ofEscherichia coli single-gene deletion mutants Mol Syst Biol 5 335
20 ArifuzzamanM MaedaM ItohA NishikataK TakitaCSaitoR AraT NakahigashiK HuangHC HiraiA et al (2006)Large-scale identification of protein-protein interaction ofEscherichia coli K-12 Genome Res 16 686ndash691
21 TohsatoY BabaT MazakiY ItoM WannerBL and MoriH(2010) Environmental dependency of gene knockouts on phenotypemicroarray analysis in Escherichia coli J Bioinform Comput Biol8(Suppl 1) 83ndash99
22 TohsatoY and MoriH (2008) Phenotype profiling of single genedeletion mutants of E coli using Biolog technology Genome Inform21 42ndash52
23 ZhouP EmmertD and ZhangP (2006) Using Chado to storegenome annotation data Curr Protoc Bioinformatdoi1010020471250953bi0906s12
24 MungallCJ and EmmertDB (2007) A Chado case study anontology-based modular schema for representing genome-associatedbiological information Bioinformatics 23 i337ndashi346
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
12 Nucleic Acids Research 2014
25 RajagopalaSV YamamotoN ZweifelAE NakamichiTHuangHK Mendez-RiosJD Franca-KohJ BoorgulaMPFujitaK SuzukiK et al (2010) The Escherichia coli K-12ORFeome a resource for comparative molecular microbiologyBMC Genom 11 470
26 ButlandG BabuM Diaz-MejiaJJ BohdanaF PhanseSGoldB YangW LiJ GagarinovaAG PogoutseO et al (2008)eSGA E coli synthetic genetic array analysis Nat Methods 5789ndash795
27 TypasA NicholsRJ SiegeleDA ShalesM CollinsSRLimB BrabergH YamamotoN TakeuchiR WannerBL et al(2008) High-throughput quantitative analyses of genetic interactionsin E coli Nat Methods 5 781ndash787
28 GiaeverG ChuAM NiL ConnellyC RilesL VeronneauSDowS Lucau-DanilaA AndersonK AndreB et al (2002)Functional profiling of the Saccharomyces cerevisiae genome Nature418 387ndash391
29 HillCW SchifferD and BergP (1969) Transduction ofmerodiploidy induced duplication of recipient genes J Bacteriol99 274ndash278
30 HillCW FouldsJ SollL and BergP (1969) Instability of amissense suppressor resulting from a duplication of genetic materialJ Mol Biol 39 563ndash581
31 AndersonRP MillerCG and RothJR (1976) Tandemduplications of the histidine operon observed following generalizedtransduction in Salmonella typhimurium J Mol Biol 105 201ndash218
32 AndersonRP and RothJR (1977) Tandem genetic duplications inphage and bacteria Ann Rev Microbiol 31 473ndash505
33 AndersonRP and RothJR (1978) Tandem chromosomalduplications in Salmonella typhimurium fusion of histidine genes tonovel promoters J Mol Biol 119 147ndash166
34 AndersonRP and RothJR (1978) Tandem genetic duplications inSalmonella typhimurium amplification of the histidine operon JMol Biol 126 53ndash71
Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of inconsistencies and of the cloned region of target genes at the nucleotide level. (B) Schematic view of the deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene-level and plate-level views.
Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain. [Diagram labels: upstream gene, 4 bp overlap (ATGA), start codon, stop codon, target gene, deleted region, resistance cassette (AbR) between two FRT sites, 21 bp; sequence shown: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA]
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted to systems biology of E. coli. GenoBase is queried from the top page. Any term, such as an ID, gene name, product name or sequence, can be used as a query, and a direct search system using SQL is also provided. Search results are shown in table format with links to resource pages.
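As an illustration of the kind of keyword query described above, the following minimal sketch uses Python's built-in sqlite3 module with a small in-memory table. The table name `gene`, its columns and the sample rows are hypothetical stand-ins chosen for this example; they are not the actual GenoBase schema or its SQL interface.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a hypothetical gene table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (gene_id TEXT PRIMARY KEY, name TEXT, product TEXT)"
    )
    # Illustrative rows only; not taken from GenoBase.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?)",
        [
            ("b0001", "thrL", "thr operon leader peptide"),
            ("b0002", "thrA", "aspartokinase I / homoserine dehydrogenase I"),
            ("b0014", "dnaK", "chaperone Hsp70"),
        ],
    )
    return conn


def search_genes(conn, keyword):
    """Return rows whose id, name or product contains the keyword.

    SQLite's LIKE is case-insensitive for ASCII text by default, and the
    keyword is passed as a bound parameter rather than pasted into the SQL.
    """
    pattern = f"%{keyword}%"
    cur = conn.execute(
        "SELECT gene_id, name, product FROM gene "
        "WHERE gene_id LIKE ? OR name LIKE ? OR product LIKE ? "
        "ORDER BY gene_id",
        (pattern, pattern, pattern),
    )
    return cur.fetchall()


conn = build_demo_db()
hits = search_genes(conn, "thr")
```

Each returned tuple corresponds to one row of a result table; in a web interface of the kind described, every row would additionally link to its resource page.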
The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, basically from the GenBank database entry (see Figure 7). The second and third tabs show sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for six different resources.
For the Keio and the Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
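The four-base overlap case can be checked with a few lines of code. The sketch below is only an illustration: it assumes that the sequence printed in Figure 8 is the retained start codon followed by the beginning of the inserted cassette, and the function names are hypothetical. With a 4-nt overlap, the upstream gene's last codon occupies the second to fourth bases counted from the target gene's start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Sequence shown in Figure 8, read here (as an assumption) as the retained
# start codon ATG followed by the start of the inserted cassette sequence.
DELETION_JUNCTION = "ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA"


def upstream_terminal_codon(junction, overlap=4):
    """Return the last codon of the upstream gene.

    `junction` must begin at the first base of the target gene's start codon.
    With an overlap of `overlap` nt, the upstream gene ends at index
    overlap - 1, so its final codon spans indices overlap - 3 .. overlap - 1.
    """
    if overlap < 3:
        raise ValueError("overlap must cover a whole codon")
    return junction[overlap - 3:overlap]


def upstream_stop_preserved(junction, overlap=4):
    """True if the upstream gene still ends in a stop codon at this junction."""
    return upstream_terminal_codon(junction, overlap) in STOP_CODONS


codon = upstream_terminal_codon(DELETION_JUNCTION)
```

Because the cassette sequence begins with A, the junction still reads ATGA, so the upstream TGA is intact; a junction such as ATGCTT would not preserve it.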
Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein (bait).
DNA microarray analyses of about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified with ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
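A downloaded tab-delimited file of this kind can be read with Python's standard csv module. The sketch below is a minimal example under stated assumptions: the column names (`gene`, `strain`, `signal`) and the sample values are hypothetical, since the layout of the actual files is not described here.

```python
import csv
import io


def read_expression_table(handle):
    """Parse tab-delimited text with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        # Convert the hypothetical numeric column from text to float.
        row["signal"] = float(row["signal"])
        rows.append(row)
    return rows


# Made-up sample content standing in for a downloaded file.
SAMPLE = (
    "gene\tstrain\tsignal\n"
    "thrA\tdeletion_1\t1520.5\n"
    "dnaK\tdeletion_1\t310.0\n"
)

rows = read_expression_table(io.StringIO(SAMPLE))
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` wrapper, and the column names would need to match the header of that file.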
Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.
Phenotype microarray analyses using BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).
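One simple way to compare duplicate mutant measurements against the 10 wild-type controls for a single well is a z-score. The sketch below is only an assumption about how such a comparison could be made; the text does not state which scoring method was used, and the function name and sample values are hypothetical.

```python
from statistics import mean, stdev


def zscore_vs_wildtype(wild_type_values, mutant_values):
    """Score the mean mutant signal against the wild-type distribution.

    Returns (mutant mean - wild-type mean) / wild-type sample standard
    deviation for one well of one plate.
    """
    if len(wild_type_values) < 2:
        raise ValueError("need at least two wild-type replicates")
    wt_mean = mean(wild_type_values)
    wt_sd = stdev(wild_type_values)
    if wt_sd == 0:
        raise ValueError("wild-type replicates show no variation")
    return (mean(mutant_values) - wt_mean) / wt_sd


# Made-up signals: 10 wild-type replicates and duplicate mutant tests.
wild_type = [90.0, 110.0] * 5
mutant = [50.0, 60.0]
z = zscore_vs_wildtype(wild_type, mutant)
```

A strongly negative score would suggest reduced signal in the deletion strain for that well; any threshold for calling a phenotype would have to be chosen separately.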
Other omics-type results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled to be released.
RESOURCE DISTRIBUTION
The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain), supported by the National Bioresource Project (40).
Currently, we have four types of predicted ORF plasmid clone libraries, and two of them have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources as promptly as possible.
For deletion constructs, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to resources constructed by our group for the community to investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.
ACKNOWLEDGEMENTS
We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.
Authors' Contributions: C.O. designed and implemented the Chado schema. A.M., R.T. and Y.O. performed re-calculation of the comparison between the design and current annotation. M.I. and K.N. (Mitsubishi Space Software Inc.) prepared web pages. W.G.A., T.C., M.R.G. and D.K. set up GenoBase when hosted at Purdue. K.E.R. provided expertise in annotation and set-up of EcoGene when hosted at Purdue. K.K.N. (Keio Univ.), N.Y., H.D. and Y.O. prepared the Barcode deletion collection data set. W.N. and T.N. prepared the primer data set. S.S. and S.T. prepared the BIOLOG pages, and Y.T. and B.R.B. conducted BIOLOG analysis. B.L.W. and H.M. oversaw all aspects of designing GenoBase and preparation of the manuscript.
FUNDING
National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to B.L.W., RO1 GM075004 to D.K.]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to H.M.]; NSF [106394 to B.L.W.]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
Conflict of interest statement. None declared.
REFERENCES
1. Thomason MK, Bischler T, Eisenbart SK, Forstner KU, Zhang A, Herbig A, Nieselt K, Sharma CM and Storz G (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J Bacteriol, pii: JB.02096-14.
2. Brenner S, Jacob F and Meselson M (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature 190, 576–581.
3. Jehmlich N, Hubschmann T, Gesell Salazar M, Volker U, Benndorf D, Muller S, von Bergen M and Schmidt F (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl Microbiol Biotechnol 88, 575–584.
4. Rajagopala SV, Sikorski P, Kumar A, Mosca R, Vlasblom J, Arnold R, Franca-Koh J, Pakala SB, Phanse S, Ceol A et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat Biotechnol 32, 285–290.
5. Nakahigashi K, Toya Y, Ishii N, Soga T, Hasegawa M, Watanabe H, Takai Y, Honma M, Mori H and Tomita M (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol Syst Biol 5, 306.
6. Martorana AM, Motta S, Di Silvestre D, Falchi F, Deho G, Mauri P, Sperandeo P and Polissi A (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One 9, e100941.
7. Conway T, Creecy JP, Maddox SM, Grissom JE, Conkle TL, Shadid TM, Teramoto J, San Miguel P, Shimada T, Ishihama A et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio 5, e01442–e01414.
8. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL and Mori H (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2, 2006.0008.
9. Kitagawa M, Ara T, Arifuzzaman M, Ioka-Nakamichi T, Inamoto E, Toyonaga H and Mori H (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): unique resources for biological research. DNA Res 12, 291–299.
10. Yura T, Mori H, Nagai H, Nagata T, Ishihama A, Fujita N, Isono K, Mizobuchi K and Nakata A (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res 20, 3305–3308.
11. Rudd KE (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol Mol Biol Rev 62, 985–1019.
12. Kohara Y, Akiyama K and Isono K (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell 50, 495–508.
13. Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kasai H, Kashimoto K, Kimura S, Kitakawa M et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res 3, 363–377.
14. Fujita N, Mori H, Yura T and Ishihama A (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res 22, 1637–1639.
15. Hayashi K, Morooka N, Yamamoto Y, Fujita K, Isono K, Choi S, Ohtsubo E, Baba T, Wanner BL, Mori H et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol Syst Biol 2, 2006.0007.
16. Itoh T, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Kasai H, Kimura S, Kitakawa M, Kitagawa M et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res 3, 379–392.
17. Yamamoto Y, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kimura S, Kitagawa M, Makino K et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res 4, 91–113.
18. Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF et al. (1997) The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474.
19. Yamamoto N, Nakahigashi K, Nakamichi T, Yoshino M, Takai Y, Touda Y, Furubayashi A, Kinjyo S, Dose H, Hasegawa M et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol Syst Biol 5, 335.
20. Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, Saito R, Ara T, Nakahigashi K, Huang HC, Hirai A et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res 16, 686–691.
21. Tohsato Y, Baba T, Mazaki Y, Ito M, Wanner BL and Mori H (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J Bioinform Comput Biol 8(Suppl 1), 83–99.
22. Tohsato Y and Mori H (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform 21, 42–52.
23. Zhou P, Emmert D and Zhang P (2006) Using Chado to store genome annotation data. Curr Protoc Bioinformat, doi:10.1002/0471250953.bi0906s12.
24. Mungall CJ and Emmert DB (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics 23, i337–i346.
25. Rajagopala SV, Yamamoto N, Zweifel AE, Nakamichi T, Huang HK, Mendez-Rios JD, Franca-Koh J, Boorgula MP, Fujita K, Suzuki K et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom 11, 470.
26. Butland G, Babu M, Diaz-Mejia JJ, Bohdana F, Phanse S, Gold B, Yang W, Li J, Gagarinova AG, Pogoutse O et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat Methods 5, 789–795.
27. Typas A, Nichols RJ, Siegele DA, Shales M, Collins SR, Lim B, Braberg H, Yamamoto N, Takeuchi R, Wanner BL et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat Methods 5, 781–787.
28. Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391.
29. Hill CW, Schiffer D and Berg P (1969) Transduction of merodiploidy: induced duplication of recipient genes. J Bacteriol 99, 274–278.
30. Hill CW, Foulds J, Soll L and Berg P (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J Mol Biol 39, 563–581.
31. Anderson RP, Miller CG and Roth JR (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J Mol Biol 105, 201–218.
32. Anderson RP and Roth JR (1977) Tandem genetic duplications in phage and bacteria. Ann Rev Microbiol 31, 473–505.
33. Anderson RP and Roth JR (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J Mol Biol 119, 147–166.
34. Anderson RP and Roth JR (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J Mol Biol 126, 53–71.
35. Anderson RP and Roth JR (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb Symp Quant Biol 43, 1083–1087.
36. Anderson P and Roth J (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc Natl Acad Sci USA 78, 3113–3117.
37. Poon A, Davis BH and Chao L (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics 170, 1323–1332.
38. Datsenko KA and Wanner BL (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA 97, 6640–6645.
39. Zhou L, Lei XH, Bochner BR and Wanner BL (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J Bacteriol 185, 4956–4972.
40. Riley M, Abe T, Arnaud MB, Berlyn MK, Blattner FR, Chaudhuri RR, Glasner JD, Horiuchi T, Keseler IM, Kosuge T et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res 34, 1–9.
41. Zhou J and Rudd KE (2013) EcoGene 3.0. Nucleic Acids Res 41, D613–D624.
42. Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, Bonavides-Martinez C, Fulcher C, Huerta AM, Kothari A, Krummenacker M et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res 41, D605–D612.
43. Yamazaki Y, Niki H and Kato J (2008) Profiling of Escherichia coli chromosome database. Methods Mol Biol 416, 385–389.
44. Hu JC, Sherlock G, Siegele DA, Aleksander SA, Ball CA, Demeter J, Gouni S, Holland TA, Karp PD, Lewis JE et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res 42, D677–D684.
45. Glasner JD, Rusch M, Liss P, Plunkett G 3rd, Cabot EL, Darling A, Anderson BD, Infield-Harm P, Gilson MC and Perna NT (2006) ASAP: a resource for annotating, curating, comparing and disseminating genomic data. Nucleic Acids Res 34, D41–D45.
46. Glasner JD, Liss P, Plunkett G 3rd, Darling A, Prasad T, Rusch M, Byrnes A, Gilson M, Biehl B, Blattner FR et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res 31, 147–151.
47. Bochner BR (2003) New technologies to assess genotype-phenotype relationships. Nat Rev Genet 4, 309–314.
48. Bochner BR, Gadzinski P and Panomitros E (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res 11, 1246–1255.
49. Takeuchi R, Tamura T, Nakayashiki T, Tanaka Y, Muto A, Wanner BL and Mori H (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol 14, 171.
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
Nucleic Acids Research 2014 9
(C)
(D)
Figure 7 Sample screen images (A) Sample pages of plasmid clone libraries Schematic view of inconsistency and cloned region of target genes in nucleotidelevel (B) Schematic view of deletion collection (C) Omics results of each of the target gene (D) BIOLOG phenotype analysis results in global gene andplate level view
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
10 Nucleic Acids Research 2014
upsteam gene ATGA
4 bp overlap
deleted region
stop codon
FRT FRT
21bpAbR
start codon
ATGA
ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA
resistance cassette
target gene
Figure 8 Overlap with no influence In the case of 4 nt (ATGA) overlap between the target and the upstream genes the termination codon of the upstreamgene is re-generated in deletion strain
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted tosystems biology of E coli Querying GenoBase is done fromthe top page Any word such as an id gene name productname and sequence are available and the direct searchingsystem by SQL is also provided Searching results are shownas a table format with links to resource pages
The methods used for quality control of our resourcesare shown in Figure 6 On each web page the left panelshows the information on the latest annotation basicallyfrom GenBank database entry (see Figure 7) The secondand third tabs show sequences of the target genes and bioin-formatics analysis of them Currently six tabs are availablefor six different resources
For the Keio and the Barcode deletion collection in thecase of four base overlap with the upstream gene specialcase has no influence for the correct termination of the up-stream gene shown in Figure 8
Another tab shows omics output using our resourcesSystematic experimental data currently includes three datasets Proteinndashprotein interaction data are based on usingHis-tagged ASKA ORF clone library without GFP (20)All of the interaction data including data produced by Timeof Flight Mass Spectrometry (TOF-MASS) analysis arestored and specific partner candidates as prey proteins areavailable from each target protein as bait
DNA microarray analysis of about 150 single-gene dele-tion mutants mostly for ones lacking transcription factorsquantified by ImaGene for deletion mutants are also storedThese data are downloadable from the link as tab-limitedtext format files
Images of protein localization analyzed by the GFP-tagged ASKA ORFeome clones are available E coli cellswithout isopropyl-beta-D-thiogalactopyranoside (IPTG)in Luria-Bertani broth (LB) to avoid misfolding were an-alyzed by fluorescent microscopy Images captured with acharge-coupled device (CCD) camera are also available
Phenotype microarray analysis by BIOLOG plates(4748) are shown graphically Currently about 300 datasets using single-gene knockout of the Keio collection areavailable (2122) Our BIOLOG phenotype screening datahave been produced by 10 times wild-type strain tests asa control and duplicate tests for each of target gene dele-tion strains The screenings were performed using PM1to PM20 both for metabolic and chemical sensitivity tests(httpwwwbiologcompmMicrobialCellsshtml)
Other omics type results will be added and made down-loadable from the GenoBase system when we obtain themCurrently comprehensive genetic interaction data popula-tion dynamics using the Barcode deletion collection in ad-dition to the simple growth condition measured by our lat-est colony quantification system (49) are scheduled to beopened
RESOURCE DISTRIBUTION
Current official distribution site is NIG (the NationalInstitute of Genetics Mishima httpwwwshigennigacjpecolistrain) supported by the National BioresourceProject (40)
Currently we have four types of predicted ORF plasmidclone libraries and two of them have been distributed fromNIG The Gateway entry clone library will start soon to dis-tribute to the academy side We also have efforts to open ournew resources timely as much as possible
For deletion construct only Keio collection (8) is nowavailable from NIG and we would like to make the Barcodedeletion collection publicly available as soon as possible
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our maintask The annotation information on genes depends onGenBank database entry The major purpose of GenoBase
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
Nucleic Acids Research 2014 11
is to provide information related to resources constructedby our group for the community to investigate E coli K-12as a model cell system We are also producing omics datato understand what a cell system is We hope experimentalresources their information and omics results from these re-sources may contribute in the community using E coli as amodel system
ACKNOWLEDGEMENTS
We especially thank Mohamed Yassin Eltabakh Joe Gris-som Thomas McGrew Matt Traxler and Yifeng Yang forcontributions to development of the GenoBase informationresources when hosted at Purdue UniversityAuthorsrsquo Contributions CO designed and implementedChado schema AM RT and YO performed re-calculation of the comparison between the design and cur-rent annotation MI and KN (Mitsubishi Space SoftwareInc) prepared web pages WGA TC MRG and DKset-up of GenoBase when hosted at Purdue KER pro-vided expertise in annotation and set-up of EcoGene whenhosted at Purdue KKN (Keio Univ) NY HD and YOprepared the Barcode deletion collection data set WN andTN prepared primer data set SS and ST prepared theBIOLOG pages and YT and BRB conducted BIOLOGanalysis BLW and HM oversaw all aspects of designingGenoBase and preparation of the manuscript
FUNDING
National Institutes of Health (NIH) [U24 GM077905 andRC1 GM092047 to BLW RO1 GM075004 to DK]JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716Grant-in-Aid for Scientific Research from the Ministryof Education Culture Sports Science and Technology ofJapan Bio Innovation funding program in Okinawa Japan[to HM] NSF [106394 to BLW] Funding for openaccess charge JSPS KAKENHI 25250028 Grant-in-Aidfor Scientific Research from the Ministry of Educa-tion Culture Sports Science and Technology of Japan
Conflict of interest statement None declared
REFERENCES1 ThomasonMK BischlerT EisenbartSK ForstnerKU
ZhangA HerbigA NieseltK SharmaCM and StorzG (2014)Global transcriptional start site mapping using dRNA-seq revealsnovel antisense RNAs in Escherichia coli J BacteriolpiiJB02096-14
2 BrennerS JacobF and MeselsonM (1961) An unstableintermediate carrying information from genes to ribosomes forprotein synthesis Nature 190 576ndash581
3 JehmlichN HubschmannT Gesell SalazarM VolkerUBenndorfD MullerS von BergenM and SchmidtF (2010)Advanced tool for characterization of microbial cultures bycombining cytomics and proteomics Appl Microbiol Biotechnol88 575ndash584
4 RajagopalaSV SikorskiP KumarA MoscaR VlasblomJArnoldR Franca-KohJ PakalaSB PhanseS CeolA et al(2014) The binary protein-protein interaction landscape ofEscherichia coli Nat Biotechnol 32 285ndash290
5 NakahigashiK ToyaY IshiiN SogaT HasegawaMWatanabeH TakaiY HonmaM MoriH and TomitaM (2009)Systematic phenome analysis of Escherichia coli multiple-knockout
mutants reveals hidden reactions in central carbon metabolism MolSyst Biol 5 306
6 MartoranaAM MottaS Di SilvestreD FalchiF DehoGMauriP SperandeoP and PolissiA (2014) Dissecting Escherichiacoli outer membrane biogenesis using differential proteomics PloSOne 9 e100941
7 ConwayT CreecyJP MaddoxSM GrissomJE ConkleTLShadidTM TeramotoJ San MiguelP ShimadaT IshihamaAet al (2014) Unprecedented high-resolution view of bacterial operonarchitecture revealed by RNA sequencing mBio 5 e01442ndashe01414
8 BabaT AraT HasegawaM TakaiY OkumuraY BabaMDatsenkoKA TomitaM WannerBL and MoriH (2006)Construction of Escherichia coli K-12 in-frame single-gene knockoutmutants the Keio collection Mol Syst Biol 2 20060008
9 KitagawaM AraT ArifuzzamanM Ioka-NakamichiTInamotoE ToyonagaH and MoriH (2005) Complete set of ORFclones of Escherichia coli ASKA library (A Complete Set of E coliK-12 ORF Archive) Unique Resources for Biological ResearchDNA Res 12 291ndash299
10 YuraT MoriH NagaiH NagataT IshihamaA FujitaNIsonoK MizobuchiK and NakataA (1992) Systematicsequencing of the Escherichia coli genome analysis of the 0ndash24 minregion Nucleic Acids Res 20 3305ndash3308
11 RuddKE (1998) Linkage map of Escherichia coli K-12 edition 10the physical map Microbiol Mol Biol Rev 62 985ndash1019
12 KoharaY AkiyamaK and IsonoK (1987) The physical map ofthe whole E coli chromosome application of a new strategy for rapidanalysis and sorting of a large genomic library Cell 50 495ndash508
13 AibaH BabaT HayashiK InadaT IsonoK ItohT KasaiHKashimotoK KimuraS KitakawaM et al (1996) A 570-kb DNAsequence of the Escherichia coli K-12 genome corresponding to the280ndash401 min region on the linkage map DNA Res 3 363ndash377
14 FujitaN MoriH YuraT and IshihamaA (1994) Systematicsequencing of the Escherichia coli genome analysis of the 24ndash41min (110917ndash193643 bp) region Nucleic Acids Res 22 1637ndash1639
15 HayashiK MorookaN YamamotoY FujitaK IsonoKChoiS OhtsuboE BabaT WannerBL MoriH et al (2006)Highly accurate genome sequences of Escherichia coli K-12 strainsMG1655 and W3110 Mol Syst Biol 2 20060007
16 ItohT AibaH BabaT HayashiK InadaT IsonoK KasaiHKimuraS KitakawaM KitagawaM et al (1996) A 460-kb DNAsequence of the Escherichia coli K-12 genome corresponding to the401ndash500 min region on the linkage map DNA Res 3 379ndash392
17 YamamotoY AibaH BabaT HayashiK InadaT IsonoKItohT KimuraS KitagawaM MakinoK et al (1997)Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 500ndash688 min on the linkage mapand analysis of its sequence features DNA Res 4 91ndash113
18 BlattnerFR PlunkettG 3rd BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK MayhewGFet al (1997) The complete genome sequence of Escherichia coli K-12Science 277 1453ndash1474
19 YamamotoN NakahigashiK NakamichiT YoshinoMTakaiY ToudaY FurubayashiA KinjyoS DoseHHasegawaM et al (2009) Update on the Keio collection ofEscherichia coli single-gene deletion mutants Mol Syst Biol 5 335
20 ArifuzzamanM MaedaM ItohA NishikataK TakitaCSaitoR AraT NakahigashiK HuangHC HiraiA et al (2006)Large-scale identification of protein-protein interaction ofEscherichia coli K-12 Genome Res 16 686ndash691
21 TohsatoY BabaT MazakiY ItoM WannerBL and MoriH(2010) Environmental dependency of gene knockouts on phenotypemicroarray analysis in Escherichia coli J Bioinform Comput Biol8(Suppl 1) 83ndash99
22 TohsatoY and MoriH (2008) Phenotype profiling of single genedeletion mutants of E coli using Biolog technology Genome Inform21 42ndash52
23 ZhouP EmmertD and ZhangP (2006) Using Chado to storegenome annotation data Curr Protoc Bioinformatdoi1010020471250953bi0906s12
24 MungallCJ and EmmertDB (2007) A Chado case study anontology-based modular schema for representing genome-associatedbiological information Bioinformatics 23 i337ndashi346
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
12 Nucleic Acids Research 2014
25 RajagopalaSV YamamotoN ZweifelAE NakamichiTHuangHK Mendez-RiosJD Franca-KohJ BoorgulaMPFujitaK SuzukiK et al (2010) The Escherichia coli K-12ORFeome a resource for comparative molecular microbiologyBMC Genom 11 470
26 ButlandG BabuM Diaz-MejiaJJ BohdanaF PhanseSGoldB YangW LiJ GagarinovaAG PogoutseO et al (2008)eSGA E coli synthetic genetic array analysis Nat Methods 5789ndash795
27 TypasA NicholsRJ SiegeleDA ShalesM CollinsSRLimB BrabergH YamamotoN TakeuchiR WannerBL et al(2008) High-throughput quantitative analyses of genetic interactionsin E coli Nat Methods 5 781ndash787
28 GiaeverG ChuAM NiL ConnellyC RilesL VeronneauSDowS Lucau-DanilaA AndersonK AndreB et al (2002)Functional profiling of the Saccharomyces cerevisiae genome Nature418 387ndash391
29 HillCW SchifferD and BergP (1969) Transduction ofmerodiploidy induced duplication of recipient genes J Bacteriol99 274ndash278
30 HillCW FouldsJ SollL and BergP (1969) Instability of amissense suppressor resulting from a duplication of genetic materialJ Mol Biol 39 563ndash581
31 AndersonRP MillerCG and RothJR (1976) Tandemduplications of the histidine operon observed following generalizedtransduction in Salmonella typhimurium J Mol Biol 105 201ndash218
32 AndersonRP and RothJR (1977) Tandem genetic duplications inphage and bacteria Ann Rev Microbiol 31 473ndash505
33 AndersonRP and RothJR (1978) Tandem chromosomalduplications in Salmonella typhimurium fusion of histidine genes tonovel promoters J Mol Biol 119 147ndash166
34 AndersonRP and RothJR (1978) Tandem genetic duplications inSalmonella typhimurium amplification of the histidine operon JMol Biol 126 53ndash71
35 AndersonRP and RothJR (1979) Gene duplication in bacteriaalteration of gene dosage by sister-chromosome exchanges ColdSpring Harb Symp Quant Biol 43 1083ndash1087
36 AndersonP and RothJ (1981) Spontaneous tandem geneticduplications in Salmonella typhimurium arise by unequalrecombination between rRNA (rrn) cistrons Proc Natl Acad SciUSA 78 3113ndash3117
37 PoonA DavisBH and ChaoL (2005) The coupon collector andthe suppressor mutation estimating the number of compensatorymutations by maximum likelihood Genetics 170 1323ndash1332
38 DatsenkoKA and WannerBL (2000) One-step inactivation ofchromosomal genes in Escherichia coli K-12 using PCR productsProc Natl Acad Sci USA 97 6640ndash6645
39 ZhouL LeiXH BochnerBR and WannerBL (2003)Phenotype microarray analysis of Escherichia coli K-12 mutants withdeletions of all two-component systems J Bacteriol 1854956ndash4972
40 RileyM AbeT ArnaudMB BerlynMK BlattnerFRChaudhuriRR GlasnerJD HoriuchiT KeselerIM KosugeTet al (2006) Escherichia coli K-12 a cooperatively developedannotation snapshotndash2005 Nucleic Acids Res 34 1ndash9
41 ZhouJ and RuddKE (2013) EcoGene 30 Nucleic Acids Res 41D613ndashD624
42 KeselerIM MackieA Peralta-GilM Santos-ZavaletaAGama-CastroS Bonavides-MartinezC FulcherC HuertaAMKothariA KrummenackerM et al (2013) EcoCyc fusing modelorganism databases with systems biology Nucleic Acids Res 41D605ndashD612
43 YamazakiY NikiH and KatoJ (2008) Profiling of Escherichia colichromosome database Methods Mol Biol 416 385ndash389
44 HuJC SherlockG SiegeleDA AleksanderSA BallCADemeterJ GouniS HollandTA KarpPD LewisJE et al(2014) PortEco a resource for exploring bacterial biology throughhigh-throughput data and analysis tools Nucleic Acids Res 42D677ndashD684
45 GlasnerJD RuschM LissP PlunkettG 3rd CabotELDarlingA AndersonBD Infield-HarmP GilsonMC andPernaNT (2006) ASAP a resource for annotating curatingcomparing and disseminating genomic data Nucleic Acids Res 34D41ndashD45
46 GlasnerJD LissP PlunkettG 3rd DarlingA PrasadTRuschM ByrnesA GilsonM BiehlB BlattnerFR et al (2003)ASAP a systematic annotation package for community analysis ofgenomes Nucleic Acids Res 31 147ndash151
47 BochnerBR (2003) New technologies to assess genotype-phenotyperelationships Nat Rev Genet 4 309ndash314
48 BochnerBR GadzinskiP and PanomitrosE (2001) Phenotypemicroarrays for high-throughput phenotypic testing and assay ofgene function Genome Res 11 1246ndash1255
49 TakeuchiR TamuraT NakayashikiT TanakaY MutoAWannerBL and MoriH (2014) Colony-live -a high-throughputmethod for measuring microbial colony growth kinetics- revealsdiverse growth effects of gene knockouts in Escherichia coli BMCMicrobiol 14 171
by Daisuke K
ihara on Novem
ber 21 2014httpnaroxfordjournalsorg
Dow
nloaded from
10 Nucleic Acids Research 2014
[Figure 8 diagram. Labels: upstream gene; ATGA (4 bp overlap); stop codon; start codon; target gene; deleted region; resistance cassette (AbR) flanked by FRT sites; 21 bp; sequence ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA]
Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.
DATA RETRIEVAL AND CONTENTS IN GenoBase
GenoBase is a searchable web database system devoted to the systems biology of E. coli. GenoBase is queried from the top page. Any term, such as an ID, gene name, product name or sequence, can be used as a query, and direct searching by SQL is also provided. Search results are shown in table format with links to resource pages.
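As a rough illustration of a free-word search of this kind, the following Python sketch matches one query word against several columns with a parameterized SQL query. It is only a sketch under stated assumptions: the table name `gene`, its columns and the two placeholder rows are hypothetical and do not reflect the actual GenoBase schema or its SQL interface.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are placeholders, not GenoBase data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id TEXT PRIMARY KEY, name TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [
        ("id001", "geneA", "placeholder kinase"),
        ("id002", "geneB", "placeholder transporter"),
    ],
)

def search(conn, word):
    """Return rows whose id, name or product contains `word` (substring match)."""
    pattern = f"%{word}%"
    cur = conn.execute(
        "SELECT id, name, product FROM gene "
        "WHERE id LIKE ? OR name LIKE ? OR product LIKE ? ORDER BY id",
        (pattern, pattern, pattern),
    )
    return cur.fetchall()

hits = search(conn, "kinase")
for row in hits:
    print("\t".join(row))
```

Passing the query word as a bound parameter, rather than pasting it into the SQL string, keeps the sketch safe against malformed input.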
The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, taken basically from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for six different resources.
For the Keio and the Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
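The reasoning behind this special case can be checked with a short Python sketch. The upstream and target sequences are invented toy sequences, and the deletion is simplified to "keep the start codon, then append the scar"; only the first 20 nt of the scar are taken from the sequence shown in Figure 8. Because that scar begins with A, the junction still reads ATGA, so the TGA stop codon of the upstream gene is preserved.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# First 20 nt that follow the start codon in the Figure 8 sequence.
SCAR_PREFIX = "ATTCCGGGGATCCGTCGACC"

def orf_until_stop(seq, start=0):
    """Return the in-frame ORF from `start` through the first stop codon, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None

def delete_after_start_codon(genome, target_start, scar):
    """Simplified deletion: keep the target start codon, replace the rest with the scar."""
    return genome[:target_start + 3] + scar

# Toy sequences (assumptions): the upstream gene ends in A + TGA,
# and the target gene starts with ATG + A, giving a 4-nt ATGA overlap.
upstream = "ATGAAAGCATGA"
target = "ATGACCGGTAAATAA"
genome = upstream[:-4] + target
target_start = len(upstream) - 4

mutant = delete_after_start_codon(genome, target_start, SCAR_PREFIX)
wild_type_orf = orf_until_stop(genome)
mutant_orf = orf_until_stop(mutant)

print("junction in mutant:", mutant[target_start:target_start + 4])
print("upstream ORF unchanged:", wild_type_orf == mutant_orf)
```

In this toy model, a scar starting with any base other than A would change the junction and the upstream reading frame would run on past its original stop codon.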
Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein (bait).
DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
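A tab-delimited file of this kind can be read with the Python standard library. The sketch below is illustrative only: the column names (`gene`, `mutant_signal`, `control_signal`) and the two rows are invented placeholders, since the actual layout of the downloadable files is not described here.

```python
import csv
import io

# Placeholder content standing in for a downloaded file; columns and values are invented.
SAMPLE = (
    "gene\tmutant_signal\tcontrol_signal\n"
    "geneA\t1200\t600\n"
    "geneB\t300\t600\n"
)

def read_ratios(handle):
    """Map each gene to its mutant/control signal ratio from a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["gene"]: float(row["mutant_signal"]) / float(row["control_signal"])
        for row in reader
    }

ratios = read_ratios(io.StringIO(SAMPLE))
for gene, ratio in ratios.items():
    print(gene, ratio)
```

For a real download, `io.StringIO(SAMPLE)` would be replaced by an opened file handle, and the column names adjusted to match the file header.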
Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.
Phenotype microarray analyses using BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced from 10 tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (httpwwwbiologcompmMicrobialCellsshtml).
Other types of omics results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled to be released.
RESOURCE DISTRIBUTION
The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain), supported by the National Bioresource Project (40).
Currently, we have four types of predicted ORF plasmid clone libraries, and two of them have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources as promptly as possible.
For deletion constructs, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.
POLICY OF THE MANAGEMENT OF GenoBase
Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to the resources constructed by our group, so that the community can investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their associated information and the omics results from these resources will contribute to the community using E. coli as a model system.
ACKNOWLEDGEMENTS
We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.
Authors' Contributions: CO designed and implemented the Chado schema. AM, RT and YO performed re-calculation of the comparison between the design and current annotation. MI and KN (Mitsubishi Space Software Inc) prepared web pages. WGA, TC, MRG and DK set up GenoBase when hosted at Purdue. KER provided expertise in annotation and in the set-up of EcoGene when hosted at Purdue. KKN (Keio Univ), NY, HD and YO prepared the Barcode deletion collection data set. WN and TN prepared the primer data set. SS and ST prepared the BIOLOG pages, and YT and BRB conducted BIOLOG analysis. BLW and HM oversaw all aspects of designing GenoBase and preparation of the manuscript.
FUNDING
National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to BLW, RO1 GM075004 to DK]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to HM]; NSF [106394 to BLW]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
Conflict of interest statement. None declared.
REFERENCES
1. Thomason MK, Bischler T, Eisenbart SK, Forstner KU, Zhang A, Herbig A, Nieselt K, Sharma CM and Storz G (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J Bacteriol, pii: JB.02096-14.
2. Brenner S, Jacob F and Meselson M (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.
3. Jehmlich N, Hubschmann T, Gesell Salazar M, Volker U, Benndorf D, Muller S, von Bergen M and Schmidt F (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl Microbiol Biotechnol, 88, 575–584.
4. Rajagopala SV, Sikorski P, Kumar A, Mosca R, Vlasblom J, Arnold R, Franca-Koh J, Pakala SB, Phanse S, Ceol A et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat Biotechnol, 32, 285–290.
5. Nakahigashi K, Toya Y, Ishii N, Soga T, Hasegawa M, Watanabe H, Takai Y, Honma M, Mori H and Tomita M (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol Syst Biol, 5, 306.
6. Martorana AM, Motta S, Di Silvestre D, Falchi F, Deho G, Mauri P, Sperandeo P and Polissi A (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One, 9, e100941.
7. Conway T, Creecy JP, Maddox SM, Grissom JE, Conkle TL, Shadid TM, Teramoto J, San Miguel P, Shimada T, Ishihama A et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.
8. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL and Mori H (2006) Construction of Escherichia coli K-12 in-frame single-gene knockout mutants: the Keio collection. Mol Syst Biol, 2, 2006.0008.
9. Kitagawa M, Ara T, Arifuzzaman M, Ioka-Nakamichi T, Inamoto E, Toyonaga H and Mori H (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): Unique Resources for Biological Research. DNA Res, 12, 291–299.
10. Yura T, Mori H, Nagai H, Nagata T, Ishihama A, Fujita N, Isono K, Mizobuchi K and Nakata A (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res, 20, 3305–3308.
11. Rudd KE (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol Mol Biol Rev, 62, 985–1019.
12. Kohara Y, Akiyama K and Isono K (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.
13. Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kasai H, Kashimoto K, Kimura S, Kitakawa M et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res, 3, 363–377.
14. Fujita N, Mori H, Yura T and Ishihama A (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res, 22, 1637–1639.
15. Hayashi K, Morooka N, Yamamoto Y, Fujita K, Isono K, Choi S, Ohtsubo E, Baba T, Wanner BL, Mori H et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol Syst Biol, 2, 2006.0007.
16. Itoh T, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Kasai H, Kimura S, Kitakawa M, Kitagawa M et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res, 3, 379–392.
17. Yamamoto Y, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kimura S, Kitagawa M, Makino K et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res, 4, 91–113.
18. Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.
19. Yamamoto N, Nakahigashi K, Nakamichi T, Yoshino M, Takai Y, Touda Y, Furubayashi A, Kinjyo S, Dose H, Hasegawa M et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol Syst Biol, 5, 335.
20. Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, Saito R, Ara T, Nakahigashi K, Huang HC, Hirai A et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res, 16, 686–691.
21. Tohsato Y, Baba T, Mazaki Y, Ito M, Wanner BL and Mori H (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J Bioinform Comput Biol, 8(Suppl 1), 83–99.
22. Tohsato Y and Mori H (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform, 21, 42–52.
23. Zhou P, Emmert D and Zhang P (2006) Using Chado to store genome annotation data. Curr Protoc Bioinformat, doi:10.1002/0471250953.bi0906s12.
24. Mungall CJ and Emmert DB (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.
25. Rajagopala SV, Yamamoto N, Zweifel AE, Nakamichi T, Huang HK, Mendez-Rios JD, Franca-Koh J, Boorgula MP, Fujita K, Suzuki K et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom, 11, 470.
26. Butland G, Babu M, Diaz-Mejia JJ, Bohdana F, Phanse S, Gold B, Yang W, Li J, Gagarinova AG, Pogoutse O et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat Methods, 5, 789–795.
27. Typas A, Nichols RJ, Siegele DA, Shales M, Collins SR, Lim B, Braberg H, Yamamoto N, Takeuchi R, Wanner BL et al. (2008) High-throughput quantitative analyses of genetic interactions in E. coli. Nat Methods, 5, 781–787.
28. Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.
29. Hill CW, Schiffer D and Berg P (1969) Transduction of merodiploidy: induced duplication of recipient genes. J Bacteriol, 99, 274–278.
30. Hill CW, Foulds J, Soll L and Berg P (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J Mol Biol, 39, 563–581.
31. Anderson RP, Miller CG and Roth JR (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J Mol Biol, 105, 201–218.
32. Anderson RP and Roth JR (1977) Tandem genetic duplications in phage and bacteria. Ann Rev Microbiol, 31, 473–505.
33. Anderson RP and Roth JR (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J Mol Biol, 119, 147–166.
34. Anderson RP and Roth JR (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J Mol Biol, 126, 53–71.
35. Anderson RP and Roth JR (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb Symp Quant Biol, 43, 1083–1087.
36. Anderson P and Roth J (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc Natl Acad Sci USA, 78, 3113–3117.
37. Poon A, Davis BH and Chao L (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.
38. Datsenko KA and Wanner BL (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA, 97, 6640–6645.
39. Zhou L, Lei XH, Bochner BR and Wanner BL (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J Bacteriol, 185, 4956–4972.
40. Riley M, Abe T, Arnaud MB, Berlyn MK, Blattner FR, Chaudhuri RR, Glasner JD, Horiuchi T, Keseler IM, Kosuge T et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res, 34, 1–9.
41. Zhou J and Rudd KE (2013) EcoGene 3.0. Nucleic Acids Res, 41, D613–D624.
42. Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, Bonavides-Martinez C, Fulcher C, Huerta AM, Kothari A, Krummenacker M et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res, 41, D605–D612.
43. Yamazaki Y, Niki H and Kato J (2008) Profiling of Escherichia coli chromosome database. Methods Mol Biol, 416, 385–389.
44. Hu JC, Sherlock G, Siegele DA, Aleksander SA, Ball CA, Demeter J, Gouni S, Holland TA, Karp PD, Lewis JE et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res, 42, D677–D684.
45. Glasner JD, Rusch M, Liss P, Plunkett G 3rd, Cabot EL, Darling A, Anderson BD, Infield-Harm P, Gilson MC and Perna NT (2006) ASAP: a resource for annotating, curating, comparing and disseminating genomic data. Nucleic Acids Res, 34, D41–D45.
46. Glasner JD, Liss P, Plunkett G 3rd, Darling A, Prasad T, Rusch M, Byrnes A, Gilson M, Biehl B, Blattner FR et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res, 31, 147–151.
47. Bochner BR (2003) New technologies to assess genotype-phenotype relationships. Nat Rev Genet, 4, 309–314.
48. Bochner BR, Gadzinski P and Panomitros E (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res, 11, 1246–1255.
49. Takeuchi R, Tamura T, Nakayashiki T, Tanaka Y, Muto A, Wanner BL and Mori H (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol, 14, 171.