Nucleic Acids Research, 2014
doi: 10.1093/nar/gku1164
Nucleic Acids Research Advance Access published November 15, 2014

GenoBase: comprehensive resource database of Escherichia coli K-12

Yuta Otsuka 1,†, Ai Muto 1,†, Rikiya Takeuchi 1,†, Chihiro Okada 2,†, Motokazu Ishikawa 2, Koichiro Nakamura 2, Natsuko Yamamoto 1, Hitomi Dose 1, Kenji Nakahigashi 3, Shigeki Tanishima 2, Sivasundaram Suharnan 4, Wataru Nomura 1, Toru Nakayashiki 1, Walid G. Aref 5, Barry R. Bochner 6, Tyrrell Conway 7, Michael Gribskov 8, Daisuke Kihara 5,8, Kenneth E. Rudd 9, Yukako Tohsato 10, Barry L. Wanner 11,* and Hirotada Mori 1,*

1 Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara 630–0101, Japan
2 Mitsubishi Space Software Co., LTD., 5–4–36 Tsukaguchihonnmachi, Amagasaki, Hyougo 661–0001, Japan
3 Institute of Advanced Biosciences, Keio University, Tsuruoka 997–0017, Japan
4 Axiohelix, Okinawa Sangyo Shien Center, 502, 1831–1, Oroku, Naha-shi, Okinawa 901–0152, Japan
5 Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907–2107, USA
6 Biolog, Inc., Hayward, CA 94545, USA
7 Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019–0245, USA
8 Department of Biological Sciences, Purdue University, 915 W. State Street, West Lafayette, IN 47907–2054, USA
9 Department of Biochemistry and Molecular Biology, University of Miami, P.O. Box 016129, Miami, FL 33101–6129, USA
10 Department of Bioinformatics, Ritsumeikan University, 1–1–1 Nojihigashi, Kusatsu, Shiga 525–8577, Japan
11 Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA

Received September 17, 2014; Revised October 27, 2014; Accepted October 30, 2014

* To whom correspondence should be addressed. Tel: +81 743 72 5660; Fax: +81 743 72 5669; Email: hmori@gtc.naist.jp
Correspondence may also be addressed to Barry L. Wanner. Tel: +1 617 432 5093; Fax: +1 617 738 7664; Email: blwanner@genetics.med.harvard.edu
† The authors wish it to be known that, in their opinion, the first four authors should be regarded as Joint First Authors.
Present addresses:
Natsuko Yamamoto, Department of Biomedical Ethics and Public Policy, Graduate School of Medicine, Osaka University, Suita, Osaka 565–0871, Japan.
Yukako Tohsato, Riken Quantitative Biology Center, Kobe 650–0047, Japan.
Former address: Barry L. Wanner, Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA.

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Comprehensive experimental resources, such as ORFeome clone libraries and deletion mutant collections, are fundamental tools for elucidation of gene function. Data sets generated by omics analyses using these resources provide key information for functional analysis, modeling and simulation in both individual and systematic approaches. With the long-term goal of complete understanding of a cell, we have over the past decade created a variety of clone and mutant sets for functional genomics studies of Escherichia coli K-12. We have made these experimental resources freely available to the academic community worldwide. Accordingly, these resources have now been used in numerous investigations of a multitude of cell processes. Quality control is extremely important for evaluating results generated by these resources. Because the annotation that we originally used for construction has changed since 2005, we have updated these genomic resources accordingly. Here, we describe GenoBase (http://ecoli.naist.jp/GB/), which contains key information about comprehensive experimental resources of E. coli K-12, their quality control and several omics data sets generated using these resources.

INTRODUCTION

Escherichia coli K-12 is clearly one of the best studied organisms (1) and has contributed enormously to the construction of the concept of the gene over the past half century (2). It is, however, still far from completely understood at the systems level, although the complete genome structure has been established and numerous functional analyses have been performed. Since the genome structure was determined, new streams of biology have emerged. Omics approaches, such as cytomics (3), interactome (4), phenomics (5), proteomics (6), transcriptomics (7), etc., have developed along with innovations in high-throughput (HT) technologies. Simultaneously, quantitative, single-cell and single-molecule analyses have emerged and become increasingly


Figure 1. Chado schema in GenoBase. The portion of the Chado schema that was used to realize the abstract GenoBase schema is shown. Arrows show relationships between a Foreign Key (FK) and a Primary Key (PK) of the same name in the database tables, except that FK type_id in the feature and featureprop tables is connected to PK cvterm_id in the cvterm table.

Tables shown (primary key listed first):
  db: db_id, name, description, urlprefix, url
  organism: organism_id, abbreviation, genus, species, common_name, comment
  cv: cv_id, name, definition
  cvterm: cvterm_id, cv_id, name, definition, dbxref_id, is_obsolete, is_relationshiptype
  dbxref: dbxref_id, db_id, accession, version, description
  feature: feature_id, dbxref_id, organism_id, name, uniquename, residues, seqlen, md5checksum, type_id, is_analysis, is_obsolete, timeaccessed, timelastmodified
  featureprop: featureprop_id, feature_id, type_id, value, rank
  pub: pub_id, title, volumetitle, volume, series_name, issue, pyear, pages, miniref, uniquename, type_id, is_obsolete, publisher, pubplace
  pubauthor: pubauthor_id, pub_id, rank, editor, surname, givennames, suffix

important in the field of systems biology. Increasingly in-depth studies of E. coli, as one of the model organisms, will no doubt continue to provide exciting new insights toward a complete understanding of the cell at the systems level.

To accelerate progress in this direction, the development of biological resources for systematic studies of E. coli K-12, such as the Keio single-gene deletion collection (8) and the ASKA ORFeome clone libraries (9), has proven especially valuable worldwide. The dramatic advent of technologies for acquiring HT data types (e.g. networks of protein–protein interactions, transcriptional and translational regulation, genetic interactions, etc.) has created a requirement to maintain and share such resources.

The original purpose of GenoBase was to support the E. coli K-12 genome project launched in 1989 in Japan (10). The original data were E. coli K-12 sequence entries in GenBank and their mapping onto the chromosome (11) in printed format. GenoBase was designed to facilitate classification of sequenced and yet-to-be-sequenced chromosomal regions for efficient management of the sequencing project, which used the conventional approach of sequencing Kohara-ordered phage clones (12). Following the completion of the genome project, GenoBase was enhanced to facilitate genome annotation. GenoBase originally displayed information for the W3110 strain of E. coli K-12, which was sequenced in the Japanese E. coli genome project (10,13–17), whereas the MG1655 strain, whose complete genome was reported first (18), has been more widely used. The single-gene Keio collection is derived from E. coli K-12 BW25113 (8) and the bar-coded deletion collection is derived from E. coli K-12 BW38028. Like MG1655, BW25113 and BW38028 are descendants of E. coli K-12 W1485. The construction of BW25113 has been reported (8).

Here, we describe key information resources available at http://ecoli.naist.jp/GB/ for storing, sharing and retrieving information on our experimental resources: the Keio single-gene deletion collection (8), the ASKA ORFeome clone library (9) and their quality control to check for duplications (19). HT screening data on protein–protein interactions (20), protein localization and phenotype analysis data obtained using BIOLOG technology (21,22) are also stored. We also briefly summarize features of the latest version of GenoBase.


Figure 2. SQL Views in GenoBase. The GenoBase schema was built via SQL Views using the base relations in the Chado schema shown in Figure 1. Views and scripts are downloadable from the GenoBase description page.

Views shown, by group:
  gene
    gene: id, type, ncrna_class, product, function, gene, synonym, experiment, jw, b, gi, geneid, ecogene, asap, swiss-prot, protein_id, ec_number, go_component, go_function, go_process, codon_start, ribosomal_slippage, transl_except, pec
  genomes
    mg1655_genome, w3110_genome, bw25113_genome, bw38029_genome (each): id, eck_id, jw_id, b_number, loc_start, loc_end, loc_strand, pseudo, note
  primers
    primer: id, target_resource, sequence, target_template, seq_from_genome, modification, plate, row, column
  resources
    aska_gfp(+), aska_gfp(-), gateway_entry, transbac (each): id, n_primer, c_primer, dn, dc, comment, update_date
    keio_collection: id, n_primer, c_primer, dn, dc, ul, dl, overlap, orientation, comment, update_date
    barcode collection: id, n_primer, c_primer, dn, dc, ul, dl, pd, cd, ul_term, dl_term, comment, update_date
  stock_information
    aska_gfp_plus, aska_gfp_minus, gateway_entry, transbac (each): id, version, eck_number, jw_number, b_number, plate, row, column, pcr_eval_up, pcr_eval_down, resistance_eval, sensitivity_eval, coloniy_num, success_rate, comment, flag, update_date
    keio_collection: as above, plus pcr_eval_dup
    barcode_collection: as above, plus pcr_eval_dup, barcode, up_seq, down_seq

DEVELOPMENT OF THE GenoBase SYSTEM

Standardization is an important issue when maintaining a database for the long term. To achieve this, we used the PostgreSQL database management system and the Chado schema (23,24), which accommodates numerous different types of biological data. PostgreSQL is open source software and one of the most widely used relational database management systems. Chado is a relational database schema designed to manage biological knowledge for a wide variety of organisms. Our database started as an organism-specific genome database and was previously implemented in the same relational database management system, PostgreSQL, with a specific schema. Here, we switched to the Chado schema for interoperability between databases. Figure 1 shows the Chado schema used for GenoBase. The Chado schema and its associated tools were downloaded from the GMOD web site (http://www.gmod.org). Because our target organism, E. coli K-12, is a eubacterial model species, some Chado tables designed for eukaryotes were not required. Therefore, the tables in Figure 1 are a subset of those in Chado. All the data in the previous version of our database were converted and stored in the PostgreSQL database using the Chado schema and View tables (Figure 2) on the Linux operating system.

Once all the data had been converted into the new database, the views (virtual data tables) were defined by SQL over the Chado schema. All of the SQL scripts are downloadable from the top page links of the GenoBase website (http://ecoli.naist.jp/GB/).
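As an illustration of how a view can be defined over Chado base tables, the following minimal sketch builds a simplified gene view. It is not the GenoBase SQL: the view name gene_view, the reduced column sets and the example row are assumptions made for illustration, and SQLite (through Python's sqlite3 module) stands in for PostgreSQL so that the sketch is self-contained.

```python
import sqlite3

# Reduced stand-ins for three Chado tables; only the columns used below are kept.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT,
    name       TEXT,
    type_id    INTEGER REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id     INTEGER REFERENCES feature(feature_id),
    type_id        INTEGER REFERENCES cvterm(cvterm_id),
    value          TEXT,
    rank           INTEGER DEFAULT 0
);
-- Hypothetical view: one row per gene feature with its 'product' property.
CREATE VIEW gene_view AS
SELECT f.uniquename AS id, f.name AS gene, p.value AS product
FROM feature AS f
JOIN cvterm AS ft
  ON ft.cvterm_id = f.type_id AND ft.name = 'gene'
LEFT JOIN featureprop AS p
  ON p.feature_id = f.feature_id
 AND p.type_id = (SELECT cvterm_id FROM cvterm WHERE name = 'product');
"""


def build_demo_db():
    """Create an in-memory database holding one illustrative gene record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cvterm (cvterm_id, name) VALUES (?, ?)",
        [(1, "gene"), (2, "product")],
    )
    conn.execute(
        "INSERT INTO feature (feature_id, uniquename, name, type_id) "
        "VALUES (1, 'b0001', 'thrL', 1)"
    )
    conn.execute(
        "INSERT INTO featureprop (featureprop_id, feature_id, type_id, value, rank) "
        "VALUES (1, 1, 2, 'thr operon leader peptide', 0)"
    )
    conn.commit()
    return conn


def query_gene_view(conn):
    """Read the view exactly as if it were an ordinary table."""
    return conn.execute(
        "SELECT id, gene, product FROM gene_view ORDER BY id"
    ).fetchall()
```

The point of the design is that client code queries gene_view without knowing how properties are spread across feature, featureprop and cvterm, which is what allows the underlying storage to follow Chado while the application-facing schema stays stable.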


Figure 3. Structure of the cloned region of the ORF clone libraries. Red, green and blue boxes represent the target ORF, eGFP and attL regions, respectively. All ORF cloning target regions are designed from the second to the last amino acid coding region, except in the TransBac plasmid clone library.

Genomic target: ATG, 2nd aa, target ORF, last aa, TGA

(A) ASKA GFP(+). Labels: NotI1, SfiI1, N-ORF-C, SfiI2, GFP, NotI2
    CACCATACGGATCCGGCCCTGAGGGCC (2nd aa)...(last aa) GGCCTATGCGGCCGCAAATAAGCGGCCGCTAA
    6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg-GFP

(B) ASKA GFP(-). Labels: NotI, SfiI1, N-ORF-C, SfiI2
    CACCATACGGATCCGGCCCTGAGGGCC (2nd aa)...(last aa) GGCCTATGCGGCCGCTAA
    6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg

(C) Gateway entry. Labels: attL1, N-ORF-C, attL2
    AAGCAGGCTTGGCCCTGAGGGCC (2nd aa)...(last aa) GGCCTATGCGGCCGCCACCCAGC
    GlyLeuAlaLeuArgAla XXXXXX GlyLeuCysGlyArgHis

(D) TransBac. Labels: PT5, SD, BlnI, N-ORF-C, SfiI2
    CAGAATTCATTAAAGAGGAGAAACCTAGG (1st Met)...(last aa) TGAGGCCTATGCGGCCGCTAA
    Met ... Ter

MAJOR PURPOSE OF GenoBase

Our database is focused on information about our comprehensively constructed experimental resources (libraries of plasmid clones, deletion mutants, etc.) and HT experimental data from a large E. coli functional genomics project that far exceeds all other resources combined. All the information in GenoBase is publicly available. Current resources include four types of annotated Open Reading Frame (ORF) plasmid clone libraries and two types of deletion collections. The plasmid clone libraries include the ASKA ORFeome libraries with (A) and without (B) a C-terminal GFP fusion (9,25), the Gateway entry clone library (C; (25)) and the latest TransBac library (D; H. Dose, unpublished), as shown in Figure 3. The single-gene deletion collections include the Keio collection (A; (8)) and a recently constructed barcode deletion collection (B), as shown in Figure 4. The Barcode deletion collection, for which a manuscript is now in preparation, was originally constructed for the systematic analysis of synthetic lethal/sickness genetic interactions, in which double-gene knockouts are made by combining single-gene deletions from the Keio and Barcode collections by conjugation (26,27; R. Takeuchi, unpublished). We added a further valuable feature, a 20-nt molecular barcode, which makes population analysis and multiplex parallel screening of mixed cultures feasible (Y. Otsuka et al., unpublished).
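A minimal sketch of how 20-nt barcodes could support population analysis of a mixed culture is given below. The barcode sequences, strain names and read format are hypothetical, and exact string matching is an assumption made for simplicity; the actual analysis pipeline is unpublished and is not described here.

```python
from collections import Counter

BARCODE_LEN = 20  # barcode length stated in the text


def count_barcodes(reads, barcode_to_strain, barcode_len=BARCODE_LEN):
    """Count reads per strain by exact match of a known barcode.

    Each read contributes at most one count: the first window of
    `barcode_len` bases that equals a known barcode decides the strain.
    Reads with no known barcode are ignored.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - barcode_len + 1):
            strain = barcode_to_strain.get(read[i:i + barcode_len])
            if strain is not None:
                counts[strain] += 1
                break
    return counts


def relative_abundance(counts):
    """Convert raw counts to fractions of all assigned reads."""
    total = sum(counts.values())
    return {strain: n / total for strain, n in counts.items()} if total else {}


# Hypothetical barcodes and strain labels, for illustration only.
EXAMPLE_BARCODES = {
    "ACGTACGTACGTACGTACGT": "strain_A",
    "TTGCAATTGCAATTGCAATT": "strain_B",
}
EXAMPLE_READS = [
    "GG" + "ACGTACGTACGTACGTACGT" + "CC",  # barcode embedded in flanking bases
    "TTGCAATTGCAATTGCAATT",                # barcode only
    "ACGT",                                # too short, ignored
    "ACGTACGTACGTACGTACGT",
]
```

Calling count_barcodes(EXAMPLE_READS, EXAMPLE_BARCODES) tallies reads per strain, and comparing relative_abundance between time points would give a simple view of population dynamics. A real pipeline would also need to tolerate sequencing errors, which this sketch does not attempt.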


Figure 4. Genomic structure of the target region of the deletion collections. (A) Keio collection: kan cassette flanked by FRT sites. (B) Barcode deletion collection: cat cassette flanked by FRT1 sites, with a barcode. Other labels: up, down, target gene, ATG, SD, 6 aa + Ter. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same between the Keio and Barcode collections. FRT in the Keio and FRT1 of the Barcode collection are not site-specifically recombined by FLP recombinase.

INFORMATION ON NEW AND FUTURE RESOURCES

Information on new resources to be constructed in the future and the data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection with or without a barcode, chromosomal Venus-GFP fusion strains and antibodies against purified E. coli proteins are being prepared for dissemination.

INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES

GenoBase provides not only information about resources constructed in our lab and HT experimental data obtained using these resources, but also information on quality control of these resources. Both the Keio and barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of the novel expected junction fragments but also for the absence of the targeted gene (19). By testing for absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31), as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations that arise from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with greater than 100 to 200 base pairs of flanking homology into a new strain harboring the Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).

Our initial resource included two types of ASKA plasmid clone libraries, whose construction was based on the genome annotation of E. coli K-12 from 1997. A consistent annotation policy was lacking at that time, and the annotation sometimes included the wrong initiation codon. After completing the construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110 based on updated and highly accurate genome sequences of both strains (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.

Histories and precise information about the methods for construction of each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB/). Notably, the original sources of genomic DNA differ for each of the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as the parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the positions of the target genes on the chromosome derived from the primer set and those in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as shown in Figure 6 (A) and (B). For the deletion collections, UL (Upstream Length) and DL (Downstream Length) were determined in addition to DN and DC. Also, in the case of overlap with the neighboring genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated as illustrated in Figure 6 (C).
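The DN and DC values can be sketched as simple coordinate differences. The following is an illustration only: the Interval type, the 1-based inclusive coordinates and the sign convention (positive when the designed end lies downstream of the annotated end, measured in the direction of transcription) are assumptions, because the exact definitions are given graphically in Figure 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A gene or cloned region on the chromosome (hypothetical helper type)."""
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, 1-based inclusive
    strand: str  # '+' or '-'


def dn_dc(designed: Interval, annotated: Interval) -> tuple[int, int]:
    """Return (DN, DC) in base pairs under the assumed sign convention.

    DN compares the N-terminal (5') ends and DC the C-terminal (3') ends.
    On the '-' strand the 5' end is the rightmost coordinate, so the
    differences are taken in the opposite direction.
    """
    if designed.strand != annotated.strand:
        raise ValueError("designed and annotated regions are on different strands")
    if annotated.strand == "+":
        dn = designed.start - annotated.start
        dc = designed.end - annotated.end
    elif annotated.strand == "-":
        dn = annotated.end - designed.end
        dc = annotated.start - designed.start
    else:
        raise ValueError("strand must be '+' or '-'")
    return dn, dc


def is_consistent(designed: Interval, annotated: Interval) -> bool:
    """A design agrees with the annotation when both differences are zero."""
    return dn_dc(designed, annotated) == (0, 0)
```

For example, a design on the '+' strand that starts three bases downstream of the annotated start codon and ends at the annotated end gives DN = 3 and DC = 0 under this convention, and would be flagged as inconsistent with the latest annotation.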

Confirmation of the E. coli K-12 MG1655 and W3110 genome sequences has been previously performed (15), yet some sequencing errors may exist. Confirmation of the host strains of the Keio and barcode single-deletion collections, E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate), by deep sequencing has now been completed (Y. Otsuka et al., unpublished). Currently, we are completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.


Figure 5. Flowchart of evaluation by the latest GenBank annotation. All of the resources are generated with DNA primers, which are designed according to the latest annotation at the time of construction. Flowchart labels: Primers; GenBank annotation; primer position on the chromosome; ASKA GFP(+), (-), Gateway entry & TransBac; Keio & Barcode collection; validation for gene deletion; overlap genes [N, C] length; consistency (Yes: OK; No: dif. with annotation); reconstruction required.

Changes in annotation, especially changes in gene location, affect library construction. Databases such as EcoGene (41), EcoCyc (42) and PEC (43) are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information. In practice, however, this is not easy to achieve. At the least, we can calculate the inconsistency between the design and the latest annotation.

Finally, when target resources are constructed, unpredicted biological events such as transposition, duplication, integration or the introduction of mutations may sometimes occur. Other problems may result from the primer sequences. For the Keio collection, we confirmed the genome structure by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream genes. Even when the PCR evaluation showed the predicted genomic structure, we found that another wild-type copy of the target gene could exist somewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test for such events experimentally, it is technically not practical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.

As mentioned above, all of our resources depend on primers designed according to the latest annotation at the time of construction. We therefore calculated the differences between the design and the current annotation. The evaluation scheme is shown in Figure 5.


Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if it exists. (A) ASKA GFP(+), ASKA GFP(−) and Gateway entry clone validation. (B) TransBac library. (C) Keio and Barcode deletion collections: 1) partial overlap case; 2) complete overlap case.

Panel labels: ATG, Stop, target region, cloned fragment, N primer, C primer, DN and DC (each with a +, 0, − scale); in (C) additionally 21 bp, deleted region, upstream gene, downstream gene, UL, DL, overlap gene, PD, CD.

Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of inconsistency and cloned region of target genes at the nucleotide level. (B) Schematic view of a deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene and plate level views.

Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.

Labels: upstream gene, ATGA (4 bp overlap), start codon, stop codon, target gene, deleted region, FRT, resistance cassette (AbR), 21 bp.
Sequence shown: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA

DATA RETRIEVAL AND CONTENTS IN GenoBase

GenoBase is a searchable web database system devoted to the systems biology of E. coli. GenoBase is queried from the top page. Any search term, such as an id, gene name, product name or sequence, can be used, and a direct search system using SQL is also provided. Search results are shown in table format with links to resource pages.

The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, taken basically from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for the six different resources.

For the Keio and Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
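The four-base overlap case can be checked with a few lines of code. The sketch below uses the first bases of the junction sequence printed in Figure 8; the function names are hypothetical, and the reasoning (the retained ATG followed by a cassette sequence that begins with A restores ATGA, so the upstream TGA stop codon is intact) is an interpretation of that figure rather than a published procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Start of the deletion junction as printed in Figure 8: the retained ATG of
# the target gene followed by the beginning of the inserted cassette sequence.
FIG8_JUNCTION = "ATGATTCCGGGGATCCGTCGACC"


def upstream_stop_in_overlap(overlap: str) -> str:
    """Return the upstream gene's stop codon contained in a 4-nt overlap.

    With an ATGA overlap, the target gene reads ATG from the first base,
    and the upstream gene, shifted by one base, ends with TGA.
    """
    if len(overlap) != 4:
        raise ValueError("expected a 4-nt overlap")
    return overlap[1:4]


def upstream_stop_regenerated(junction: str, overlap: str = "ATGA") -> bool:
    """True if the junction still starts with the overlap and it holds a stop codon."""
    return (
        junction.startswith(overlap)
        and upstream_stop_in_overlap(overlap) in STOP_CODONS
    )
```

With the Figure 8 sequence, upstream_stop_regenerated(FIG8_JUNCTION) is true because the junction begins with ATGA. A junction whose fourth base were not A would fail the check, which is why this overlap is singled out as a special case with no influence.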

Another tab shows omics output using our resourcesSystematic experimental data currently includes three datasets Proteinndashprotein interaction data are based on usingHis-tagged ASKA ORF clone library without GFP (20)All of the interaction data including data produced by Timeof Flight Mass Spectrometry (TOF-MASS) analysis arestored and specific partner candidates as prey proteins areavailable from each target protein as bait

DNA microarray analysis of about 150 single-gene dele-tion mutants mostly for ones lacking transcription factorsquantified by ImaGene for deletion mutants are also storedThese data are downloadable from the link as tab-limitedtext format files

Images of protein localization analyzed by the GFP-tagged ASKA ORFeome clones are available E coli cellswithout isopropyl-beta-D-thiogalactopyranoside (IPTG)in Luria-Bertani broth (LB) to avoid misfolding were an-alyzed by fluorescent microscopy Images captured with acharge-coupled device (CCD) camera are also available

Phenotype microarray analysis results obtained with BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).

Other omics-type results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled to be released.

RESOURCE DISTRIBUTION

The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain/), supported by the National BioResource Project (40).

Currently, we have four types of predicted ORF plasmid clone libraries, and two of them have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources in as timely a manner as possible.

For deletion constructs, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.

POLICY OF THE MANAGEMENT OF GenoBase

Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to resources constructed by our group for the community to investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.

Authors' Contributions: C.O. designed and implemented the Chado schema. A.M., R.T. and Y.O. performed the re-calculation of the comparison between the design and the current annotation. M.I. and K.N. (Mitsubishi Space Software Inc.) prepared the web pages. W.G.A., T.C., M.R.G. and D.K. set up GenoBase when hosted at Purdue. K.E.R. provided expertise in annotation and set up EcoGene when hosted at Purdue. K.K.N. (Keio Univ.), N.Y., H.D. and Y.O. prepared the Barcode deletion collection data set. W.N. and T.N. prepared the primer data set. S.S. and S.T. prepared the BIOLOG pages, and Y.T. and B.R.B. conducted the BIOLOG analysis. B.L.W. and H.M. oversaw all aspects of designing GenoBase and preparation of the manuscript.

FUNDING

National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to B.L.W.; RO1 GM075004 to D.K.]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to H.M.]; NSF [106394 to B.L.W.]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Conflict of interest statement. None declared.

REFERENCES

1. Thomason,M.K., Bischler,T., Eisenbart,S.K., Forstner,K.U., Zhang,A., Herbig,A., Nieselt,K., Sharma,C.M. and Storz,G. (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J. Bacteriol., pii: JB.02096-14.
2. Brenner,S., Jacob,F. and Meselson,M. (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.
3. Jehmlich,N., Hubschmann,T., Gesell Salazar,M., Volker,U., Benndorf,D., Muller,S., von Bergen,M. and Schmidt,F. (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl. Microbiol. Biotechnol., 88, 575–584.
4. Rajagopala,S.V., Sikorski,P., Kumar,A., Mosca,R., Vlasblom,J., Arnold,R., Franca-Koh,J., Pakala,S.B., Phanse,S., Ceol,A. et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat. Biotechnol., 32, 285–290.
5. Nakahigashi,K., Toya,Y., Ishii,N., Soga,T., Hasegawa,M., Watanabe,H., Takai,Y., Honma,M., Mori,H. and Tomita,M. (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol. Syst. Biol., 5, 306.
6. Martorana,A.M., Motta,S., Di Silvestre,D., Falchi,F., Deho,G., Mauri,P., Sperandeo,P. and Polissi,A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PloS One, 9, e100941.
7. Conway,T., Creecy,J.P., Maddox,S.M., Grissom,J.E., Conkle,T.L., Shadid,T.M., Teramoto,J., San Miguel,P., Shimada,T., Ishihama,A. et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.
8. Baba,T., Ara,T., Hasegawa,M., Takai,Y., Okumura,Y., Baba,M., Datsenko,K.A., Tomita,M., Wanner,B.L. and Mori,H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008.
9. Kitagawa,M., Ara,T., Arifuzzaman,M., Ioka-Nakamichi,T., Inamoto,E., Toyonaga,H. and Mori,H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): Unique Resources for Biological Research. DNA Res., 12, 291–299.
10. Yura,T., Mori,H., Nagai,H., Nagata,T., Ishihama,A., Fujita,N., Isono,K., Mizobuchi,K. and Nakata,A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res., 20, 3305–3308.
11. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol. Mol. Biol. Rev., 62, 985–1019.
12. Kohara,Y., Akiyama,K. and Isono,K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.
13. Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kasai,H., Kashimoto,K., Kimura,S., Kitakawa,M. et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res., 3, 363–377.
14. Fujita,N., Mori,H., Yura,T. and Ishihama,A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res., 22, 1637–1639.
15. Hayashi,K., Morooka,N., Yamamoto,Y., Fujita,K., Isono,K., Choi,S., Ohtsubo,E., Baba,T., Wanner,B.L., Mori,H. et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol., 2, 2006.0007.
16. Itoh,T., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Kasai,H., Kimura,S., Kitakawa,M., Kitagawa,M. et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res., 3, 379–392.
17. Yamamoto,Y., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kimura,S., Kitagawa,M., Makino,K. et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res., 4, 91–113.
18. Blattner,F.R., Plunkett,G. 3rd, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.
19. Yamamoto,N., Nakahigashi,K., Nakamichi,T., Yoshino,M., Takai,Y., Touda,Y., Furubayashi,A., Kinjyo,S., Dose,H., Hasegawa,M. et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol. Syst. Biol., 5, 335.
20. Arifuzzaman,M., Maeda,M., Itoh,A., Nishikata,K., Takita,C., Saito,R., Ara,T., Nakahigashi,K., Huang,H.C., Hirai,A. et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res., 16, 686–691.
21. Tohsato,Y., Baba,T., Mazaki,Y., Ito,M., Wanner,B.L. and Mori,H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J. Bioinform. Comput. Biol., 8(Suppl. 1), 83–99.
22. Tohsato,Y. and Mori,H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform., 21, 42–52.
23. Zhou,P., Emmert,D. and Zhang,P. (2006) Using Chado to store genome annotation data. Curr. Protoc. Bioinformat., doi:10.1002/0471250953.bi0906s12.
24. Mungall,C.J. and Emmert,D.B. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.


25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.
26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.
27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.
28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.
29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.
30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.
31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.
32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.
33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.
34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.
35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.
36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl. Acad. Sci. U.S.A., 78, 3113–3117.
37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.
38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. U.S.A., 97, 6640–6645.
39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.
40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.
41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.
42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.
43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.
44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.
45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.
46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.
47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.
48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.
49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.


[Figure 1 diagram: database tables with their columns, marked with PK and FK labels in the original figure]
db: db_id, name, description, urlprefix, url
organism: organism_id, abbreviation, genus, species, common_name, comment
cv: cv_id, name, definition
pubauthor: pubauthor_id, pub_id, rank, editor, surname, givennames, suffix
pub: pub_id, title, volumetitle, volume, series_name, issue, pyear, pages, miniref, uniquename, type_id, is_obsolete, publisher, pubplace
featureprop: featureprop_id, type_id, value, rank, feature_id
cvterm: cvterm_id, cv_id, name, definition, dbxref_id, is_obsolete, is_relationshiptype
feature: feature_id, dbxref_id, organism_id, name, uniquename, residues, seqlen, md5checksum, type_id, is_analysis, is_obsolete, timeaccessed, timelastmodified
dbxref: dbxref_id, db_id, accession, version, description

Figure 1. Chado schema in GenoBase. The portion of the Chado schema that was used to realize the abstract GenoBase schema is shown. Arrows show relationships between a Foreign Key (FK) and Primary Key (PK) of the same name in the database tables, except that the FK type_id in the feature and featureprop tables is connected to the PK cvterm_id in the cvterm table.

important in the field of systems biology. More and more in-depth studies of E. coli will no doubt continue to provide new and exciting insights toward a complete understanding of the cell at the systems level, as one of the model organisms.

To accelerate this direction, the development of biological resources for systematic studies of E. coli K-12, like the Keio single-gene deletion collection (8) and the ASKA ORFeome clone libraries (9), has proven especially valuable worldwide. The dramatic advent of technologies for acquisition of HT data types (e.g. the network of protein–protein interactions, transcriptional and translational regulation, genetic interaction, etc.) has created a requirement to maintain and share such resources.

The original purpose of GenoBase was to support the E. coli K-12 genome project launched in 1989 in Japan (10). The original data were E. coli K-12 sequence entries in GenBank and their mapping onto the chromosome (11) in printed format. GenoBase was designed to facilitate classification of sequenced and yet-to-be-sequenced chromosomal regions for efficient management of the sequencing project, which used the conventional approach of sequencing Kohara-ordered phage clones (12). Following the completion of the genome project, GenoBase was enhanced to facilitate genome annotation. GenoBase originally displayed information for the W3110 strain of E. coli K-12, which was sequenced in the Japanese E. coli genome project (10,13–17), whereas the MG1655 strain, whose complete genome was first reported (18), has been more widely used. The single-gene Keio collection is derived from E. coli K-12 BW25113 (8), and the bar-coded deletion collection is derived from E. coli K-12 BW38028. Like MG1655, BW25113 and BW38028 are descendants of E. coli K-12 W1485. The construction of BW25113 has been reported (8).

Here, we describe key information resources available at http://ecoli.naist.jp/GB/ for storing, sharing and retrieving information on the experimental resources of the Keio single-gene deletion collection (8) and the ASKA ORFeome clone library (9), and on their quality control to check duplications (19). HT screening data of protein–protein interactions (20), protein localization and phenotype analysis data using BIOLOG technology (21,22) are also stored. We also briefly summarize features of the latest version of GenoBase, which is at http://ecoli.naist.jp/GB/.


[Figure 2 diagram: SQL views grouped as gene, genomes, primers, resources and stock_information, with their columns]
gene: id, type, ncrna_class, product, function, gene, synonym, experiment, jw, b, gi, geneid, ecogene, asap, swiss-prot, protein_id, ec_number, go_component, go_function, go_process, codon_start, ribosomal_slippage, transl_except, pec
genomes (mg1655_genome, w3110_genome, bw25113_genome, bw38029_genome): id, eck_id, jw_id, b_number, loc_start, loc_end, loc_strand, pseudo, note
primer: id, target_resource, sequence, target_template, seq_from_genome, modification, plate, row, column
primers for aska_gfp(+), aska_gfp(-), gateway_entry and transbac: id, n_primer, c_primer, dn, dc, comment, update_date
primers for keio_collection: id, n_primer, c_primer, dn, dc, ul, dl, overlap, orientation, comment, update_date
primers for barcode collection: id, n_primer, c_primer, dn, dc, ul, dl, pd, cd, ul_term, dl_term, comment, update_date
aska_gfp_plus, aska_gfp_minus, gateway_entry and transbac: id, version, eck_number, jw_number, b_number, plate, row, column, pcr_eval_up, pcr_eval_down, resistance_eval, sensitivity_eval, coloniy_num, success_rate, comment, flag, update_date
keio_collection: as above, plus pcr_eval_dup
barcode_collection: as above, plus pcr_eval_dup, barcode, up_seq, down_seq

Figure 2. SQL Views in GenoBase. The GenoBase schema was built via SQL Views using the base relations in the Chado schema shown in Figure 1. Views and scripts are downloadable from the GenoBase description page.

DEVELOPMENT OF THE GenoBase SYSTEM

Standardization is an important issue when maintaining a database for the long term. To achieve this, we used the PostgreSQL database management system and the Chado schema (23,24) to handle numerous different types of biological data. PostgreSQL is open source software and one of the most widely used relational database management systems. Chado is a relational database schema designed to manage biological knowledge for a wide variety of organisms. Our database originally started as an organism-specific genome database and was previously implemented in the same relational database management system, PostgreSQL, with a specific schema. Here, we switched to the Chado schema for interoperability between databases. Figure 1 shows the Chado schema used for GenoBase. The Chado schema and its associated tools were downloaded from the GMOD web site (http://www.gmod.org). Because our target organism, E. coli K-12, is one of the eubacterial model species, some Chado tables designed for eukaryotes were not required. Therefore, the tables in Figure 1 show a subset of those in Chado. All the data in the previous version of our database were converted and stored in the PostgreSQL database using the Chado schema and View tables (Figure 2) on the Linux operating system.

Once all the data were converted into the new database, the views of the virtual data tables were defined by SQL from the Chado schema. All of the SQL scripts are downloadable from the top page links of the GenoBase website (http://ecoli.naist.jp/GB/).
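The idea of defining a flat view on top of the Chado base tables can be illustrated with a small, self-contained sketch. It is not the GenoBase script: it uses SQLite from the Python standard library instead of PostgreSQL so that it runs without a server, keeps only a few columns of the feature, featureprop and cvterm tables named in Figure 1, and uses a hypothetical view name (gene_product) and hypothetical demo rows.

```python
# Minimal sketch: a flat "gene" view over a reduced Chado-like schema.
# SQLite stands in for PostgreSQL; table/column names follow Figure 1,
# while the view name and the inserted rows are hypothetical.
import sqlite3


def build_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL,
            name       TEXT,
            type_id    INTEGER REFERENCES cvterm (cvterm_id)
        );
        CREATE TABLE featureprop (
            featureprop_id INTEGER PRIMARY KEY,
            feature_id     INTEGER REFERENCES feature (feature_id),
            type_id        INTEGER REFERENCES cvterm (cvterm_id),
            value          TEXT
        );
        -- The view resolves both type_id foreign keys through cvterm.
        CREATE VIEW gene_product AS
            SELECT f.uniquename AS id, f.name AS gene, fp.value AS product
            FROM feature AS f
            JOIN cvterm AS ft
              ON ft.cvterm_id = f.type_id AND ft.name = 'gene'
            LEFT JOIN featureprop AS fp
              ON fp.feature_id = f.feature_id
             AND fp.type_id = (SELECT cvterm_id FROM cvterm
                               WHERE name = 'product');
        """
    )
    con.executemany("INSERT INTO cvterm VALUES (?, ?)",
                    [(1, "gene"), (2, "product")])
    con.execute("INSERT INTO feature VALUES (1, 'demo0001', 'demA', 1)")
    con.execute(
        "INSERT INTO featureprop VALUES (1, 1, 2, 'hypothetical demo protein')"
    )
    return con


if __name__ == "__main__":
    for row in build_demo().execute("SELECT id, gene, product FROM gene_product"):
        print(row)
```

A view of this kind lets queries use resource-oriented column names while the data stay in the generic Chado tables; the actual GenoBase views are those distributed from the website.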


[Figure 3 diagram: target ORF shown as ATG, 2nd aa, target ORF, last aa, TGA]
A) ASKA GFP(+): NotI1, SfiI1, N-ORF-C, SfiI2, GFP, NotI2
   CACCATACGGATCCGGCCCTGAGGGCC (2nd aa)...(LAST aa) GGCCTATGCGGCCGCAAATAAGCGGCCGCTAA
   6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg GFP
B) ASKA GFP(-): NotI, SfiI1, N-ORF-C, SfiI2
   CACCATACGGATCCGGCCCTGAGGGCC (2nd aa)...(LAST aa) GGCCTATGCGGCCGCTAA
   6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg
C) Gateway entry: attL1, N-ORF-C, attL2
   AAGCAGGCTTGGCCCTGAGGGCC (2nd aa)...(LAST aa) GGCCTATGCGGCCGCCACCCAGC
   GlyLeuAlaLeuArgAla XXXXXX GlyLeuCysGlyArgHis
D) TransBac: PT5, SD, BlnI, N-ORF-C, SfiI2
   CAGAATTCATTAAAGAGGAGAAACCTAGG (1st Met)...(LAST aa) TGAGGCCTATGCGGCCGCTAA
   Met ... Ter

Figure 3. Structure of cloned region of ORF clone libraries. Red, green and blue boxes represent the target ORF, eGFP and attL regions, respectively. All ORF cloning target regions are designed from the second to the last amino acid coding region, except in the TransBac plasmid clone library.

MAJOR PURPOSE OF GenoBase

Our database is focused on information about our comprehensively constructed experimental resources (libraries of plasmid clones, deletion mutants, etc.) and HT experimental data from a large E. coli functional genomics project that far exceeds all other resources combined. All the information in GenoBase is publicly available. Current resources include four types of annotated Open Reading Frame (ORF) plasmid clone libraries and two types of deletion collections. The plasmid clone libraries include the ASKA ORFeome libraries with (A) and without (B) a C-terminal GFP fusion (9,25), the Gateway entry clone library (C; (25)) and the latest TransBac library (H. Dose, unpublished), as shown in Figure 3. The single-gene deletion collections include the Keio collection (A; (8)) and a recently constructed barcode deletion collection (B), as shown in Figure 4. The Barcode deletion collection, whose manuscript is now in preparation, was originally constructed for the systematic analysis of synthetic lethal/sickness genetic interactions, in which double-gene knockouts are made by combining single-gene deletions from the Keio and barcode collections by conjugation (26,27; R. Takeuchi, unpublished). We added a further valuable feature, a 20-nt molecular barcode, which makes population analysis and multiplex parallel screening of mixed cultures feasible (Y. Otsuka et al., unpublished).


[Figure 4 diagram: A) Keio collection: upstream region, SD, ATG, FRT-kan-FRT, 6 aa + Ter, downstream region. B) Barcode deletion collection: the same target region with FRT1-cat-FRT1 and a barcode.]

Figure 4. Genomic structure of target region of deletion collections. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same between the Keio and Barcode collections. FRT in the Keio and FRT1 of the Barcode collection are not site-specifically recombined by FLP recombinase.

INFORMATION ON NEW AND FUTURE RESOURCES

Information on new resources to be constructed in the future and the data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection with or without a barcode, chromosomal Venus-GFP fusion strains and antibodies against purified E. coli proteins are being prepared for dissemination.

INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES

GenoBase provides not only information about resources constructed in our lab and HT experimental data using these resources, but also information on quality control of these resources. Both the Keio and barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of the novel, expected junction fragments but also for the absence of the targeted gene (19). By testing for absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31), as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations that arise from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with greater than 100 to 200 base pairs of flanking homology into a new strain harboring the Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).

Our initial resource included two types of ASKA plasmid clone libraries, whose construction was based on the genome annotation of E. coli K-12 from 1997. Annotation policy at that time was lacking, and the annotation sometimes included the wrong initiation codon. After completing the construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110, which were based on updated and highly accurate genome sequences of E. coli K-12 MG1655 and W3110 (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.

Histories and precise information about the methods for construction of each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB/). Notably, the original sources of genomic DNA differ for each of the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as the parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the positions of the target genes on the chromosome derived from the primer set and those in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as given in Figure 6 (A) and (B). For the deletion collections, in addition to DN and DC, UL (Upstream Length) and DL (Downstream Length) were determined. Also, in the case of complete overlap with the neighboring genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated as illustrated in Figure 6 (C).
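The comparison between the designed target region and the current annotation can be sketched as a simple coordinate difference. The example below is an illustration under stated assumptions, not the GenoBase calculation: coordinates are 1-based and inclusive, the sign convention (positive means shifted downstream in the direction of transcription) is chosen here for the sketch, and the names Region and dn_dc are hypothetical.

```python
# Minimal sketch of a DN/DC calculation (sign convention is an assumption).
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int   # leftmost chromosome coordinate, 1-based inclusive
    end: int     # rightmost chromosome coordinate, inclusive
    strand: int  # +1 for the forward strand, -1 for the reverse strand


def dn_dc(designed: Region, annotated: Region) -> tuple[int, int]:
    """Return (DN, DC): shifts of the designed N- and C-terminal ends
    relative to the annotated ends, measured along the gene's orientation."""
    if designed.strand != annotated.strand:
        raise ValueError("designed and annotated regions are on different strands")
    if annotated.strand == 1:
        dn = designed.start - annotated.start
        dc = designed.end - annotated.end
    else:
        # On the reverse strand the N-terminus lies at the larger coordinate.
        dn = annotated.end - designed.end
        dc = annotated.start - designed.start
    return dn, dc


if __name__ == "__main__":
    # Hypothetical coordinates: the annotation moved the start codon 30 nt upstream.
    print(dn_dc(Region(130, 1000, 1), Region(100, 1000, 1)))
```

A result of (0, 0) would mean that the designed region agrees with the current annotation; any non-zero value flags a resource whose design no longer matches the annotated gene boundaries.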

Confirmation of the E. coli K-12 MG1655 and W3110 genome sequences has been performed previously (15), yet some sequencing errors may exist. Confirmation of the host strains of the Keio and barcode single-deletion collections, E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate), by deep sequencing has now been completed (Y. Otsuka et al., unpublished). Currently, we are completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.


[Figure 5 flowchart labels: Primers; GenBank annotation; primer position on the chromosome; ASKA GFP(+), (-), Gateway entry & TransBac; Keio & Barcode collection, validation for gene deletion; overlap genes [N, C], length; consistency (Yes/No); difference with annotation; OK; reconstruction required]

Figure 5. Flowchart of evaluation by the latest GenBank annotation. All of the resources are generated with DNA primers, which were designed according to the latest annotation at the time of construction.

Changes in the annotation, especially changes in gene location, affect the construction of libraries. Databases, e.g. EcoGene (41), EcoCyc (42) and PEC (43), are frequently updated. We would like to keep the GenoBase experimental resources updated according to such latest information. However, in practice, this is not easy to achieve. At the least, we can calculate the inconsistency between the design and the latest annotation.

Finally, when constructing target resources, unpredicted biological events sometimes happen, such as transposition, duplication, integration, introduction of mutations, etc. Other possibilities result from the primer sequences. For the Keio collection, we performed genome confirmation by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream genes. Even though the PCR evaluation showed the predicted genomic structure, we realized that another wild-type copy of the target gene could exist somewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test such possibilities experimentally, it is technically not practical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.

As mentioned above, all of our resources depend on primers designed according to the latest annotation at the time. We therefore calculated the differences between the design and the current annotation. The evaluation scheme is shown in Figure 5.


[Figure 6 diagram labels:
(A) ASKA GFP(+), (-) clone and Gateway entry clone libraries: target region (ATG to Stop), cloned fragment, N primer, C primer, DN and DC with +/0/- scales.
(B) TransBac library: target region (ATG to Stop), cloned fragment, N primer, C primer, DN and DC with +/0/- scales.
(C) Keio and Barcode deletion collection: 1) partial overlap case: target region (ATG to Stop), deleted region, 21 bp, N primer, C primer, upstream gene, downstream gene, DN, DC, UL, DL; 2) complete overlap case: target region, deleted region, 21 bp, N primer, C primer, overlap gene, DN, DC, PD, CD.]

Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if it exists. (A) ASKA GFP(+), ASKA GFP(−) and Gateway entry clone validation. (B) TransBac library. (C) Keio and Barcode deletion collections.

Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of inconsistency and cloned region of target genes at the nucleotide level. (B) Schematic view of the deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene and plate level views.

Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is re-generated in the deletion strain. [Diagram labels: upstream gene, 4 bp overlap (ATGA), start codon, stop codon, target gene, deleted region, FRT, AbR resistance cassette, 21 bp; sequence shown: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]

DATA RETRIEVAL AND CONTENTS IN GenoBase

GenoBase is a searchable web database system devoted to the systems biology of E. coli. GenoBase is queried from the top page. Any term, such as an ID, gene name, product name or sequence, can be used as a query, and a direct search interface using SQL is also provided. Search results are shown as a table with links to the resource pages.

The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, taken basically from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target gene and bioinformatics analyses of them. Currently, six tabs are available for the six different resources.

For the Keio and Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
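
The special case in Figure 8 can be checked with a few lines of Python. This is only a minimal sketch: the junction string is the sequence printed in Figure 8, whereas the function name and the assumption that the final codon of the upstream gene is the last three nucleotides of the overlap are introduced here for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Sequence printed in Figure 8: the retained start codon (ATG)
# followed by the beginning of the inserted cassette sequence.
FIGURE8_JUNCTION = "ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA"


def upstream_stop_regenerated(junction: str, overlap: int = 4) -> bool:
    """Check whether an upstream gene that overlaps the target gene by
    `overlap` nucleotides still ends in a stop codon after the deletion.

    Assumption of this sketch: the upstream gene's final codon is the
    last three nucleotides of the overlapping stretch.
    """
    if overlap < 3 or len(junction) < overlap:
        raise ValueError("overlap must be >= 3 and fit inside the junction")
    last_codon = junction[overlap - 3:overlap].upper()
    return last_codon in STOP_CODONS


if __name__ == "__main__":
    # ATGA: positions 2-4 read TGA, so the upstream stop codon is intact.
    print(upstream_stop_regenerated(FIGURE8_JUNCTION))
```

With the Figure 8 sequence, the first four nucleotides read ATGA, so the check reports that a TGA stop codon is still present for the upstream gene.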

Another tab shows omics output obtained using our resources. The systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein used as bait.

DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified with ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.

Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.

Phenotype microarray analyses using BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts from the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using plates PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).

Other types of omics results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled to be released.

RESOURCE DISTRIBUTION

The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain/), supported by the National Bioresource Project (40).

Currently, we have four types of predicted ORF plasmid clone libraries, two of which have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources in as timely a manner as possible.

Of the deletion collections, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.

POLICY OF THE MANAGEMENT OF GenoBase

Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to the resources constructed by our group for the community investigating E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.

Authors' Contributions: C.O. designed and implemented the Chado schema. A.M., R.T. and Y.O. performed re-calculation of the comparison between the design and current annotation. M.I. and K.N. (Mitsubishi Space Software Inc.) prepared web pages. W.G.A., T.C., M.R.G. and D.K. set up GenoBase when hosted at Purdue. K.E.R. provided expertise in annotation and set up EcoGene when hosted at Purdue. K.K.N. (Keio Univ.), N.Y., H.D. and Y.O. prepared the Barcode deletion collection data set. W.N. and T.N. prepared the primer data set. S.S. and S.T. prepared the BIOLOG pages, and Y.T. and B.R.B. conducted BIOLOG analysis. B.L.W. and H.M. oversaw all aspects of designing GenoBase and preparation of the manuscript.

FUNDING

National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to B.L.W.; RO1 GM075004 to D.K.]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to H.M.]; NSF [106394 to B.L.W.]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Conflict of interest statement. None declared.

REFERENCES

1. Thomason,M.K., Bischler,T., Eisenbart,S.K., Forstner,K.U., Zhang,A., Herbig,A., Nieselt,K., Sharma,C.M. and Storz,G. (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J. Bacteriol., pii: JB.02096-14.

2. Brenner,S., Jacob,F. and Meselson,M. (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.

3. Jehmlich,N., Hubschmann,T., Gesell Salazar,M., Volker,U., Benndorf,D., Muller,S., von Bergen,M. and Schmidt,F. (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl. Microbiol. Biotechnol., 88, 575–584.

4. Rajagopala,S.V., Sikorski,P., Kumar,A., Mosca,R., Vlasblom,J., Arnold,R., Franca-Koh,J., Pakala,S.B., Phanse,S., Ceol,A. et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat. Biotechnol., 32, 285–290.

5. Nakahigashi,K., Toya,Y., Ishii,N., Soga,T., Hasegawa,M., Watanabe,H., Takai,Y., Honma,M., Mori,H. and Tomita,M. (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol. Syst. Biol., 5, 306.

6. Martorana,A.M., Motta,S., Di Silvestre,D., Falchi,F., Deho,G., Mauri,P., Sperandeo,P. and Polissi,A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One, 9, e100941.

7. Conway,T., Creecy,J.P., Maddox,S.M., Grissom,J.E., Conkle,T.L., Shadid,T.M., Teramoto,J., San Miguel,P., Shimada,T., Ishihama,A. et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.

8. Baba,T., Ara,T., Hasegawa,M., Takai,Y., Okumura,Y., Baba,M., Datsenko,K.A., Tomita,M., Wanner,B.L. and Mori,H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008.

9. Kitagawa,M., Ara,T., Arifuzzaman,M., Ioka-Nakamichi,T., Inamoto,E., Toyonaga,H. and Mori,H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): Unique Resources for Biological Research. DNA Res., 12, 291–299.

10. Yura,T., Mori,H., Nagai,H., Nagata,T., Ishihama,A., Fujita,N., Isono,K., Mizobuchi,K. and Nakata,A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res., 20, 3305–3308.

11. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol. Mol. Biol. Rev., 62, 985–1019.

12. Kohara,Y., Akiyama,K. and Isono,K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.

13. Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kasai,H., Kashimoto,K., Kimura,S., Kitakawa,M. et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res., 3, 363–377.

14. Fujita,N., Mori,H., Yura,T. and Ishihama,A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res., 22, 1637–1639.

15. Hayashi,K., Morooka,N., Yamamoto,Y., Fujita,K., Isono,K., Choi,S., Ohtsubo,E., Baba,T., Wanner,B.L., Mori,H. et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol., 2, 2006.0007.

16. Itoh,T., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Kasai,H., Kimura,S., Kitakawa,M., Kitagawa,M. et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res., 3, 379–392.

17. Yamamoto,Y., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kimura,S., Kitagawa,M., Makino,K. et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res., 4, 91–113.

18. Blattner,F.R., Plunkett,G. 3rd, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.

19. Yamamoto,N., Nakahigashi,K., Nakamichi,T., Yoshino,M., Takai,Y., Touda,Y., Furubayashi,A., Kinjyo,S., Dose,H., Hasegawa,M. et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol. Syst. Biol., 5, 335.

20. Arifuzzaman,M., Maeda,M., Itoh,A., Nishikata,K., Takita,C., Saito,R., Ara,T., Nakahigashi,K., Huang,H.C., Hirai,A. et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res., 16, 686–691.

21. Tohsato,Y., Baba,T., Mazaki,Y., Ito,M., Wanner,B.L. and Mori,H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J. Bioinform. Comput. Biol., 8(Suppl. 1), 83–99.

22. Tohsato,Y. and Mori,H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform., 21, 42–52.

23. Zhou,P., Emmert,D. and Zhang,P. (2006) Using Chado to store genome annotation data. Curr. Protoc. Bioinformat., doi:10.1002/0471250953.bi0906s12.

24. Mungall,C.J. and Emmert,D.B. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.

25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.

26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.

27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.

28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.

29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.

30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.

31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.

32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.

33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.

34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.

35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.

36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl. Acad. Sci. U.S.A., 78, 3113–3117.

37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.

38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. U.S.A., 97, 6640–6645.

39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.

40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.

41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.

43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.

44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.

45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.

46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.

47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.

48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.

49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.

Figure 2. SQL Views in GenoBase. The GenoBase schema was built via SQL Views using the base relations in the Chado schema shown in Figure 1. Views and scripts are downloadable from the GenoBase description page.
[Diagram: views and their columns, grouped as follows.
genomes (mg1655_genome, w3110_genome, bw25113_genome, bw38029_genome): id, eck_id, jw_id, b_number, loc_start, loc_end, loc_strand, pseudo, note.
gene: id, type, ncrna_class, product, function, gene, synonym, experiment, jw, b, gi, geneid, ecogene, asap, swiss-prot, protein_id, ec_number, go_component, go_function, go_process, codon_start, ribosomal_slippage, transl_except, pec.
primers (primer): id, target_resource, sequence, target_template, seq_from_genome, modification, plate, row, column.
resources (aska_gfp(+), aska_gfp(-), gateway_entry, transbac): id, n_primer, c_primer, dn, dc, comment, update_date; keio_collection additionally has ul, dl, overlap, orientation; barcode collection additionally has ul, dl, pd, cd, ul_term, dl_term.
stock_information (aska_gfp_plus, aska_gfp_minus, gateway_entry, transbac): id, version, eck_number, jw_number, b_number, plate, row, column, pcr_eval_up, pcr_eval_down, resistance_eval, sensitivity_eval, coloniy_num, success_rate, comment, flag, update_date; keio_collection additionally has pcr_eval_dup; barcode_collection additionally has pcr_eval_dup, barcode, up_seq, down_seq.]

DEVELOPMENT OF THE GenoBase SYSTEM

Standardization is an important issue when maintaining a database for the long term. To achieve this, we used the PostgreSQL database management system and the Chado schema (23,24), which accommodates numerous different types of biological data. PostgreSQL is open source software and one of the most widely used relational database management systems. Chado is a relational database schema designed to manage biological knowledge for a wide variety of organisms. Our database originally started as an organism-specific genome database and was previously implemented in the same relational database management system, PostgreSQL, with a specific schema. Here, we switched to the Chado schema for interoperability between databases. Figure 1 shows the Chado schema used for GenoBase. The Chado schema and its associated tools were downloaded from the GMOD web site (http://www.gmod.org). Because our target organism, E. coli K-12, is a eubacterial model species, some Chado tables designed for eukaryotes were not required. Therefore, the tables in Figure 1 are a subset of those in Chado. All the data in the previous version of our database were converted and stored in the PostgreSQL database using the Chado schema and View tables (Figure 2) on the Linux operating system.

Once all the data had been converted into the new database, the views (virtual data tables) were defined by SQL from the Chado schema. All of the SQL scripts are downloadable from the top page links of the GenoBase website (http://ecoli.naist.jp/GB/).
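
To make the idea of views over Chado base relations concrete, here is a small, self-contained Python sketch. It uses SQLite from the standard library rather than PostgreSQL so that it runs without a server, and it uses a drastically reduced stand-in for the Chado feature and featureloc tables. The view name mg1655_genome_demo, the choice of columns and the example row are assumptions made for illustration; the actual GenoBase view definitions are those in the downloadable SQL scripts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Reduced stand-ins for two Chado base relations (most columns omitted).
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        uniquename TEXT NOT NULL,   -- used here to hold a b-number
        name       TEXT             -- gene name
    );
    CREATE TABLE featureloc (
        featureloc_id INTEGER PRIMARY KEY,
        feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
        fmin          INTEGER NOT NULL,  -- interbase (0-based) start
        fmax          INTEGER NOT NULL,  -- interbase end
        strand        INTEGER NOT NULL   -- 1 or -1
    );

    -- Hypothetical view that flattens the base relations into one table.
    CREATE VIEW mg1655_genome_demo AS
    SELECT f.feature_id AS id,
           f.uniquename AS b_number,
           f.name       AS gene,
           l.fmin + 1   AS loc_start,
           l.fmax       AS loc_end,
           CASE l.strand WHEN 1 THEN '+' ELSE '-' END AS loc_strand
    FROM feature AS f
    JOIN featureloc AS l ON l.feature_id = f.feature_id;
    """
)

# Made-up example row, not real annotation data.
conn.execute("INSERT INTO feature VALUES (1, 'bDEMO1', 'demA')")
conn.execute("INSERT INTO featureloc VALUES (1, 1, 99, 400, 1)")

rows = conn.execute(
    "SELECT b_number, gene, loc_start, loc_end, loc_strand "
    "FROM mg1655_genome_demo"
).fetchall()
print(rows)
```

A view of this kind stores no data of its own, so the flat, resource-oriented tables seen by users stay in step with the underlying base relations.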

Figure 3. Structure of the cloned region of the ORF clone libraries. Red, green and blue boxes represent the target ORF, eGFP and attL regions, respectively. All ORF cloning target regions are designed from the second to the last amino acid coding region, except in the TransBac plasmid clone library.
[Diagram: target ORF from ATG, 2nd aa, to last aa and TGA; junction sequences for each library:
A) ASKA GFP(+): NotI1, SfiI1, N-ORF-C, SfiI2, GFP, NotI2
   CACCATACGGATCCGGCCCTGAGGGCC (2nd aa)...(last aa) GGCCTATGCGGCCGCAAATAAGCGGCCGCTAA
   6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg-GFP
B) ASKA GFP(-): NotI, SfiI1, N-ORF-C, SfiI2
   CACCATACGGATCCGGCCCTGAGGGCC (2nd aa)...(last aa) GGCCTATGCGGCCGCTAA
   6xHis-ThrAspProAlaLeuArgAla XXXXXX GlyLeuCysGlyArg
C) Gateway entry: attL1, N-ORF-C, attL2
   AAGCAGGCTTGGCCCTGAGGGCC (2nd aa)...(last aa) GGCCTATGCGGCCGCCACCCAGC
   GlyLeuAlaLeuArgAla XXXXXX GlyLeuCysGlyArgHis
D) TransBac: PT5, SD, BlnI, N-ORF-C, SfiI2
   CAGAATTCATTAAAGAGGAGAAACCTAGG (1st Met)...(last aa) TGAGGCCTATGCGGCCGCTAA
   Met ... Ter]

MAJOR PURPOSE OF GenoBase

Our database is focused on information about our comprehensively constructed experimental resources (libraries of plasmid clones, deletion mutants, etc.) and HT experimental data from a large E. coli functional genomics project that far exceeds all other resources combined. All the information in GenoBase is publicly available. Current resources include four types of annotated open reading frame (ORF) plasmid clone libraries and two types of deletion collections. The plasmid clone libraries include the ASKA ORFeome libraries with (A) and without (B) a C-terminal GFP fusion (9,25), the Gateway entry clone library (C) (25) and the latest TransBac library (D) (H. Dose, unpublished), as shown in Figure 3. The single-gene deletion collections include the Keio collection (A) (8) and a recently constructed Barcode deletion collection (B), as shown in Figure 4. The Barcode deletion collection, the manuscript for which is now in preparation, was originally constructed for the systematic analysis of synthetic lethal/sickness genetic interactions, in which double-gene knockouts are made by combining single-gene deletions from the Keio and Barcode collections by conjugation (26,27; R. Takeuchi, unpublished). We added a further valuable feature, a 20-nt molecular barcode, which makes population analysis and multiplex parallel screening of mixed cultures feasible (Y. Otsuka et al., unpublished).

Figure 4. Genomic structure of the target region of the deletion collections. (A) Keio collection. (B) Barcode deletion collection. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same between the Keio and Barcode collections. FRT in the Keio collection and FRT1 in the Barcode collection are not site-specifically recombined by FLP recombinase. [Diagram labels: up, SD, ATG, target gene, 6 aa + Ter, down; kan with FRT sites (Keio); cat and barcode with FRT1 sites (Barcode).]

INFORMATION ON NEW AND FUTURE RESOURCES

Information on new resources to be constructed in the future and the data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection with or without a barcode, chromosomal Venus-GFP fusion strains and antibodies against purified E. coli proteins are being prepared for dissemination.

INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES

GenoBase provides not only information about resources constructed in our lab and HT experimental data obtained using these resources, but also information on quality control of these resources. Both the Keio and Barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of the novel expected junction fragments, but also for the absence of the targeted gene (19). By testing for absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31), as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations that arise from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with greater than 100 to 200 base pairs of flanking homology into a new strain harboring the Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).

Our initial resources were the ASKA plasmid clone libraries, whose construction was based on the genome annotation of E. coli K-12 from 1997. Annotation policy at that time was lacking, and the annotation sometimes included the wrong initiation codon. After completing construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110, based on updated and highly accurate genome sequences of both strains (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.

Histories and precise information about the methods used for construction of each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB/). Notably, the original sources of genomic DNA differ among the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as the parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the positions of the target genes on the chromosome derived from the primer sets and those in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as given in Figure 6 (A) and (B). For the deletion collections, in addition to DN and DC, UL (Upstream Length) and DL (Downstream Length) were determined. Also, in the case of complete overlap with the neighboring genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated, as illustrated in Figure 6 (C).
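
As a rough illustration of this comparison, the following Python sketch computes DN and DC for a single gene from a designed target region and the current annotation. The Region type, the function names, the coordinates in the usage example and, in particular, the sign convention (positive when the designed region extends beyond the annotated ORF at that terminus) are assumptions made for this example; Figure 6 defines the convention actually used in GenoBase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """Closed chromosomal interval (1-based) with strand '+' or '-'."""
    start: int
    end: int
    strand: str


def terminus_differences(designed: Region, annotated: Region) -> Tuple[int, int]:
    """Return (DN, DC) in nucleotides.

    Assumed convention (hypothetical): a positive value means the designed
    region extends beyond the annotated ORF at that terminus, a negative
    value means it falls short, and zero means the two agree.
    """
    if designed.strand != annotated.strand:
        raise ValueError("designed and annotated regions are on different strands")
    if annotated.strand == "+":
        dn = annotated.start - designed.start  # N-terminus at the low coordinate
        dc = designed.end - annotated.end      # C-terminus at the high coordinate
    else:
        dn = designed.end - annotated.end      # N-terminus at the high coordinate
        dc = annotated.start - designed.start  # C-terminus at the low coordinate
    return dn, dc


def is_consistent(designed: Region, annotated: Region) -> bool:
    """True when the design matches the current annotation at both termini."""
    return terminus_differences(designed, annotated) == (0, 0)


if __name__ == "__main__":
    # Made-up coordinates: the start codon was re-annotated 30 nt downstream.
    designed = Region(1000, 1900, "+")
    annotated = Region(1030, 1900, "+")
    print(terminus_differences(designed, annotated))  # (30, 0)
    print(is_consistent(designed, annotated))         # False
```

Handling the strand explicitly matters because, for a gene on the minus strand, the N-terminus lies at the higher chromosomal coordinate.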

Confirmation of the E. coli K-12 MG1655 and W3110 genome sequences has been performed previously (15), yet some sequencing errors may exist. Confirmation of the host strains of the Keio and Barcode single-deletion collections, E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate), by deep sequencing has now been completed (Y. Otsuka et al., unpublished). Currently, we are completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.

Figure 5. Flowchart of evaluation by the latest GenBank annotation. All of the resources were generated using DNA primers designed according to the latest annotation at the time of construction. [Flowchart labels: Primers; GenBank annotation; primer position on the chromosome; ASKA GFP(+), (-), Gateway entry & TransBac; Keio & Barcode collection, validation for gene deletion; overlap genes [N, C] length; consistency (Yes/No); dif with annotation; OK; reconstruction required.]

Changes in annotation, especially changes in gene location, strongly affect library construction. Databases such as EcoGene (41), EcoCyc (42) and PEC (43) are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information. In practice, however, this is not easy to achieve. At the least, we can calculate the inconsistency between the design and the latest annotation.

Finally when constructing target resources sometimesunpredicted biological events may happen such as transpo-sition duplication integration introducing mutations etcOther possibilities result from the primer sequences Forthe Keio collection we performed genome confirmationby PCR amplification between antibiotic resistance frag-ments replaced with the target gene and upstream or down-stream genes Even though the PCR evaluation showed pre-dicted genomic structure we realized the existence of an-

other wild-type copy of the target gene somewhere on thechromosome So we tested all of the Keio collection strainsto detect such duplication as reported elsewhere (19) Insome cases suppressor mutations may occur elsewhere onthe chromosome during or after deletion or cloning Whileit is possible to test such possibilities experimentally it istechnically not practical Whether this can be achieved viacommunity-level activities with PortEco (44) has yet to beshown Community annotation has had success for somebacteria genera in the ASAP information resource (4546)(httpasapahabswisceduasaphomephp) but it has hadlimited success in the E coli community

As mentioned above all of our resources depend on theprimers designed according to the latest annotation at thetime So we performed calculation of differences betweenthe design and the annotation inconsistency Evaluationscheme is shown in Figure 5

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 7

(A) ASKA GFP(+) (-) clone and Gateway entry clone libraries

ATG cloned fragment

DN DC

+ -0 +- 0

Sto

p

(B) TransBac library

cloned fragment

DN DC

+ -0 +- 0

Sto

p

ATG

target region

target region

N primer C primer

N primer C primer

(C) Keio and Barcode deletion collection

21bp

deleted region

DN DC

UL

upstream gene

DL

N primer C primer

target region

downstream gene

+ -0 +- 0

Sto

p

ATG

21bp

deleted region

DN DC

PD

overlap gene

CD

N primer C primer

target region

overlap gene

+ -0 +- 0

Sto

p

ATG

1) partial overlap case

2) complete overlap case

Figure 6 Differences with annotation Each of resource pages for target gene shows inconsistency if it exists (A) ASKA GFP(+) ASKA GFP(minus) andGateway entry clone validation (B) TransBac library (C) Keio and Barcode deletion collections

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

8 Nucleic Acids Research 2014

(A)

(B) by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 9

(C)

(D)

Figure 7 Sample screen images (A) Sample pages of plasmid clone libraries Schematic view of inconsistency and cloned region of target genes in nucleotidelevel (B) Schematic view of deletion collection (C) Omics results of each of the target gene (D) BIOLOG phenotype analysis results in global gene andplate level view


[Figure 8 schematic. Labels: upstream gene, target gene, 4 bp overlap (ATGA), start codon, stop codon, deleted region, 21 bp, resistance cassette (AbR) flanked by FRT sites. Sequence shown at the deletion junction: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]

Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.

DATA RETRIEVAL AND CONTENTS IN GenoBase

GenoBase is a searchable web database system devoted to the systems biology of E. coli. GenoBase is queried from the top page. Any term, such as an ID, gene name, product name or sequence, can be used, and a direct search interface using SQL is also provided. Search results are shown in table format with links to the resource pages.

The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, taken essentially from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for six different resources.

For the Keio and Barcode deletion collections, a four-base overlap with the upstream gene is a special case that does not affect correct termination of the upstream gene, as shown in Figure 8.
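The four-base overlap case can be checked with a few lines of code. The following is a minimal sketch, not part of GenoBase: it takes the first 10 nt of the junction sequence printed in Figure 8 and assumes that the string starts at the retained start codon of the target gene, so that the last codon of the upstream gene occupies positions 2 to 4. The function and constant names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# First 10 nt of the junction sequence printed in Figure 8: the retained
# start codon (ATG) followed by the first bases of the inserted cassette.
JUNCTION_FIGURE_8 = "ATGATTCCGG"


def upstream_last_codon(junction: str) -> str:
    """Return the upstream gene's final codon for a 4-nt (ATGA-type) overlap.

    Assumption: the junction string starts at the target gene's start codon,
    so the upstream gene's last codon occupies positions 2-4 (slice 1:4).
    """
    return junction[1:4].upper()


def upstream_stop_is_regenerated(junction: str) -> bool:
    """True if the upstream gene still ends with a stop codon."""
    return upstream_last_codon(junction) in STOP_CODONS


print(upstream_last_codon(JUNCTION_FIGURE_8))           # TGA
print(upstream_stop_is_regenerated(JUNCTION_FIGURE_8))  # True
```

Because the cassette sequence begins with A, the TG of the start codon and this A again form TGA. A cassette beginning with a different base (for example the made-up junction "ATGCTTCCGG") would not restore the stop codon.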

Another tab shows omics output obtained using our resources. The systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein (bait).

DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
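A downloaded tab-delimited file can be read with the Python standard library. The sketch below is illustrative only: the column names `gene` and `signal` and the sample values are made up, because the actual layout of the GenoBase files is not described here.

```python
import csv
import io


def read_tab_delimited(handle):
    """Yield one dict per data row, keyed by the column names in the header line."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield row


# Hypothetical file content; real GenoBase column names may differ.
SAMPLE = "gene\tsignal\nthrA\t1520.5\nthrB\t980.0\n"

rows = list(read_tab_delimited(io.StringIO(SAMPLE)))
signals = {row["gene"]: float(row["signal"]) for row in rows}
print(signals)
```

For a real download, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`.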

Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.

Phenotype microarray analyses using BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets from single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).

Other omics results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to growth data under simple conditions measured by our latest colony quantification system (49), are scheduled for release.

RESOURCE DISTRIBUTION

The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain/), supported by the National Bioresource Project (40).

Currently, we have four types of predicted ORF plasmid clone libraries, two of which have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources in as timely a manner as possible.

Among the deletion collections, only the Keio collection (8) is currently available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.

POLICY OF THE MANAGEMENT OF GenoBase

Assignment of annotation to the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to resources constructed by our group for the community to investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.

Authors' Contributions: C.O. designed and implemented the Chado schema. A.M., R.T. and Y.O. performed re-calculation of the comparison between the design and current annotation. M.I. and K.N. (Mitsubishi Space Software Inc.) prepared web pages. W.G.A., T.C., M.R.G. and D.K. set up GenoBase when hosted at Purdue. K.E.R. provided expertise in annotation and set-up of EcoGene when hosted at Purdue. K.K.N. (Keio Univ.), N.Y., H.D. and Y.O. prepared the Barcode deletion collection data set. W.N. and T.N. prepared the primer data set. S.S. and S.T. prepared the BIOLOG pages, and Y.T. and B.R.B. conducted BIOLOG analysis. B.L.W. and H.M. oversaw all aspects of designing GenoBase and preparation of the manuscript.

FUNDING

National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to B.L.W., RO1 GM075004 to D.K.]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to H.M.]; NSF [106394 to B.L.W.]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Conflict of interest statement. None declared.

REFERENCES

1. Thomason,M.K., Bischler,T., Eisenbart,S.K., Forstner,K.U., Zhang,A., Herbig,A., Nieselt,K., Sharma,C.M. and Storz,G. (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J. Bacteriol., pii: JB.02096-14.

2. Brenner,S., Jacob,F. and Meselson,M. (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.

3. Jehmlich,N., Hubschmann,T., Gesell Salazar,M., Volker,U., Benndorf,D., Muller,S., von Bergen,M. and Schmidt,F. (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl. Microbiol. Biotechnol., 88, 575–584.

4. Rajagopala,S.V., Sikorski,P., Kumar,A., Mosca,R., Vlasblom,J., Arnold,R., Franca-Koh,J., Pakala,S.B., Phanse,S., Ceol,A. et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat. Biotechnol., 32, 285–290.

5. Nakahigashi,K., Toya,Y., Ishii,N., Soga,T., Hasegawa,M., Watanabe,H., Takai,Y., Honma,M., Mori,H. and Tomita,M. (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol. Syst. Biol., 5, 306.

6. Martorana,A.M., Motta,S., Di Silvestre,D., Falchi,F., Deho,G., Mauri,P., Sperandeo,P. and Polissi,A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PloS One, 9, e100941.

7. Conway,T., Creecy,J.P., Maddox,S.M., Grissom,J.E., Conkle,T.L., Shadid,T.M., Teramoto,J., San Miguel,P., Shimada,T., Ishihama,A. et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.

8. Baba,T., Ara,T., Hasegawa,M., Takai,Y., Okumura,Y., Baba,M., Datsenko,K.A., Tomita,M., Wanner,B.L. and Mori,H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008.

9. Kitagawa,M., Ara,T., Arifuzzaman,M., Ioka-Nakamichi,T., Inamoto,E., Toyonaga,H. and Mori,H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): Unique Resources for Biological Research. DNA Res., 12, 291–299.

10. Yura,T., Mori,H., Nagai,H., Nagata,T., Ishihama,A., Fujita,N., Isono,K., Mizobuchi,K. and Nakata,A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res., 20, 3305–3308.

11. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol. Mol. Biol. Rev., 62, 985–1019.

12. Kohara,Y., Akiyama,K. and Isono,K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.

13. Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kasai,H., Kashimoto,K., Kimura,S., Kitakawa,M. et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res., 3, 363–377.

14. Fujita,N., Mori,H., Yura,T. and Ishihama,A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res., 22, 1637–1639.

15. Hayashi,K., Morooka,N., Yamamoto,Y., Fujita,K., Isono,K., Choi,S., Ohtsubo,E., Baba,T., Wanner,B.L., Mori,H. et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol., 2, 2006.0007.

16. Itoh,T., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Kasai,H., Kimura,S., Kitakawa,M., Kitagawa,M. et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res., 3, 379–392.

17. Yamamoto,Y., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kimura,S., Kitagawa,M., Makino,K. et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res., 4, 91–113.

18. Blattner,F.R., Plunkett,G. 3rd, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.

19. Yamamoto,N., Nakahigashi,K., Nakamichi,T., Yoshino,M., Takai,Y., Touda,Y., Furubayashi,A., Kinjyo,S., Dose,H., Hasegawa,M. et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol. Syst. Biol., 5, 335.

20. Arifuzzaman,M., Maeda,M., Itoh,A., Nishikata,K., Takita,C., Saito,R., Ara,T., Nakahigashi,K., Huang,H.C., Hirai,A. et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res., 16, 686–691.

21. Tohsato,Y., Baba,T., Mazaki,Y., Ito,M., Wanner,B.L. and Mori,H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J. Bioinform. Comput. Biol., 8(Suppl. 1), 83–99.

22. Tohsato,Y. and Mori,H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform., 21, 42–52.

23. Zhou,P., Emmert,D. and Zhang,P. (2006) Using Chado to store genome annotation data. Curr. Protoc. Bioinformat., doi:10.1002/0471250953.bi0906s12.

24. Mungall,C.J. and Emmert,D.B. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.


25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.

26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.

27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.

28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.

29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.

30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.

31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.

32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.

33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.

34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.

35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.

36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl. Acad. Sci. U.S.A., 78, 3113–3117.

37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.

38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. U.S.A., 97, 6640–6645.

39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.

40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.

41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.

43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.

44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.

45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.

46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.

47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.

48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.

49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.


[Figure 3 schematic: cloning junctions of the four ORF clone libraries. (A) ASKA GFP(+): 6xHis tag, ORF between SfiI1 and SfiI2, followed by GFP, with NotI1 and NotI2 sites. (B) ASKA GFP(−): 6xHis tag, NotI site and ORF between SfiI1 and SfiI2. (C) Gateway entry: ORF between attL1 and attL2. (D) TransBac: PT5, SD, BlnI, ORF from the first Met to the TGA stop codon, SfiI2.]

Figure 3. Structure of the cloned region of the ORF clone libraries. Red, green and blue boxes represent the target ORF, eGFP and attL regions, respectively. All ORF cloning target regions are designed from the second to the last amino acid coding region, except in the TransBac plasmid clone library.

MAJOR PURPOSE OF GenoBase

Our database is focused on information about our comprehensively constructed experimental resources (libraries of plasmid clones, deletion mutants, etc.) and HT experimental data from a large E. coli functional genomics project that far exceeds all other resources combined. All the information in GenoBase is publicly available. Current resources include four types of annotated Open Reading Frame (ORF) plasmid clone libraries and two types of deletion collections. The plasmid clone libraries include the ASKA ORFeome libraries with (A) and without (B) a C-terminal GFP fusion (9,25), the Gateway entry clone library (C) (25) and the latest TransBac library (H. Dose, unpublished), as shown in Figure 3. The single-gene deletion collections include the Keio collection (A) (8) and a recently constructed Barcode deletion collection (B), as shown in Figure 4. The Barcode deletion collection, for which a manuscript is now in preparation, was originally constructed for the systematic analysis of synthetic lethal/sickness genetic interactions, in which double-gene knockouts are made by combining single-gene deletions from the Keio and Barcode collections by conjugation (26,27; R. Takeuchi, unpublished). We added a further valuable feature, a 20-nt molecular barcode, which makes population analysis and multiplex parallel screening of mixed cultures feasible (Y. Otsuka et al., unpublished).
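To illustrate how a 20-nt barcode could support population analysis of a mixed culture, the sketch below counts exact barcode matches and converts the counts to relative abundances. It is a hypothetical example, not the authors' pipeline: the barcode sequences, the strain labels and the assumption that each read has already been trimmed to its 20-nt barcode are all made up for illustration.

```python
from collections import Counter

BARCODE_LENGTH = 20

# Hypothetical barcode-to-strain table; real barcodes are not given in the text.
BARCODE_TO_STRAIN = {
    "ACGTACGTACGTACGTACGT": "deletion_A",
    "TTGCATTGCATTGCATTGCA": "deletion_B",
}


def count_barcodes(reads, barcode_to_strain):
    """Count reads per strain, keeping only exact matches to a known barcode."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read.upper())
        if strain is not None:
            counts[strain] += 1
    return counts


def relative_abundance(counts):
    """Convert raw counts to fractions of all assigned reads."""
    total = sum(counts.values())
    return {strain: n / total for strain, n in counts.items()} if total else {}


reads = [
    "ACGTACGTACGTACGTACGT",
    "acgtacgtacgtacgtacgt",   # lower-case read, still matches
    "TTGCATTGCATTGCATTGCA",
    "N" * BARCODE_LENGTH,     # unassignable read, ignored
]
counts = count_barcodes(reads, BARCODE_TO_STRAIN)
abundance = relative_abundance(counts)
print(counts["deletion_A"], counts["deletion_B"])
```

Exact matching is the simplest possible choice; tolerating sequencing errors would need a mismatch-aware lookup, which is omitted here.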


[Figure 4 schematic. Labels: up, target gene, ATG, SD, 6 aa + Ter, down. (A) Keio collection: kan flanked by FRT sites. (B) Barcode deletion collection: cat flanked by FRT1 sites, with the barcode.]

Figure 4. Genomic structure of the target region of the deletion collections. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same in the Keio and Barcode collections. FRT in the Keio collection and FRT1 in the Barcode collection are not site-specifically recombined by FLP recombinase.

INFORMATION ON NEW AND FUTURE RESOURCES

Information on new resources to be constructed in the future and data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection (with or without a barcode), chromosomal Venus-GFP fusion strains and antibodies against purified E. coli proteins are being prepared for dissemination.

INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES

GenoBase provides not only information about resources constructed in our lab and HT experimental data obtained using these resources, but also information on quality control of these resources. Both the Keio and Barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of the novel expected junction fragments, but also for the absence of the targeted gene (19). By testing for the absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31), as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations arising from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with greater than 100 to 200 base pairs of flanking homology into a new strain harboring the Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).

Our initial resources included the ASKA plasmid clone libraries, whose construction was based on the genome annotation of E. coli K-12 from 1997. Annotation policy at that time was lacking, and the annotation sometimes included the wrong initiation codon. After completing the construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110 based on updated and highly accurate genome sequences of both strains (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.

Histories and precise information about the methods for construction of each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB/). Notably, the original sources of genomic DNA differ for each of the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as the parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the positions of the target genes on the chromosome as defined by the primer set and as given in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as shown in Figure 6 (A) and (B). For the deletion collections, UL (Upstream Length) and DL (Downstream Length) were determined in addition to DN and DC. Also, in the case of complete overlap with the neighbor genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated as illustrated in Figure 6 (C).
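The DN and DC comparison can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the GenoBase implementation: coordinates are taken as 1-based and inclusive, the designed interval is the target region implied by the primer set, and the sign convention (positive when the design extends beyond the annotated ORF, negative when it falls short, 0 when they agree) is assumed because the text does not define it. The names `Interval`, `terminus_differences` and `is_consistent` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """1-based inclusive chromosome coordinates of a gene or designed target region."""
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: int  # +1 for the forward strand, -1 for the reverse strand


def terminus_differences(designed: Interval, annotated: Interval) -> tuple[int, int]:
    """Return (DN, DC) in nucleotides.

    Assumed sign convention (not stated in the source): positive means the
    designed region extends beyond the annotated ORF at that terminus,
    negative means it falls short, and 0 means agreement.
    """
    if designed.strand != annotated.strand:
        raise ValueError("designed and annotated regions must be on the same strand")
    left = annotated.start - designed.start   # > 0: design extends further left
    right = designed.end - annotated.end      # > 0: design extends further right
    # The N-terminus is on the left for forward-strand genes and on the
    # right for reverse-strand genes.
    return (left, right) if annotated.strand == 1 else (right, left)


def is_consistent(designed: Interval, annotated: Interval) -> bool:
    """True when the design and the current annotation agree at both termini."""
    return terminus_differences(designed, annotated) == (0, 0)


# Forward-strand gene whose annotated start moved 3 nt upstream of the design.
print(terminus_differences(Interval(193, 255, 1), Interval(190, 255, 1)))       # (-3, 0)
# Reverse-strand gene whose design extends 6 nt beyond the annotated N-terminus.
print(terminus_differences(Interval(1000, 1306, -1), Interval(1000, 1300, -1)))  # (6, 0)
```

UL, DL, PD and CD for the deletion collections would need the coordinates of the neighboring genes as additional inputs and are left out of this sketch.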

Confirmation of the E. coli K-12 MG1655 and W3110 genome sequences has been performed previously (15), yet some sequencing errors may exist. Confirmation of the host strains of the Keio and Barcode single-gene deletion collections, E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate), by deep sequencing has now been completed (Y. Otsuka et al., unpublished). Currently, we are completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.


[Figure 5 flowchart. Inputs: primers and GenBank annotation, which give the primer position on the chromosome. Branch for ASKA GFP(+), GFP(−), Gateway entry and TransBac: consistency check, leading to OK or to a difference with the annotation. Branch for the Keio and Barcode collections (validation for gene deletion): overlap genes [N, C] length, then consistency check, leading to OK, to a difference with the annotation, or to reconstruction required.]

Figure 5. Flowchart of evaluation by the latest GenBank annotation. All of the resources are generated with DNA primers designed according to the latest annotation at the time of construction.

Changes to the annotation, especially changes in gene location, affect library construction. Databases such as EcoGene (41), EcoCyc (42) and PEC (43) are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information. In practice, however, this is not easy to achieve. At the least, we can calculate the inconsistency between the design and the latest annotation.

Finally, unpredicted biological events, such as transposition, duplication, integration and the introduction of mutations, may sometimes happen when constructing target resources. Other possibilities result from the primer sequences. For the Keio collection, we confirmed the genome structure by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream genes. Even though the PCR evaluation showed the predicted genomic structure, we realized the existence of another wild-type copy of the target gene somewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test for such events experimentally, doing so is technically impractical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.

As mentioned above, all of our resources depend on primers designed according to the latest annotation available at the time of construction. We therefore calculated the differences between the design and the current annotation. The evaluation scheme is shown in Figure 5.

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 7

(A) ASKA GFP(+) (-) clone and Gateway entry clone libraries

ATG cloned fragment

DN DC

+ -0 +- 0

Sto

p

(B) TransBac library

cloned fragment

DN DC

+ -0 +- 0

Sto

p

ATG

target region

target region

N primer C primer

N primer C primer

(C) Keio and Barcode deletion collection

21bp

deleted region

DN DC

UL

upstream gene

DL

N primer C primer

target region

downstream gene

+ -0 +- 0

Sto

p

ATG

21bp

deleted region

DN DC

PD

overlap gene

CD

N primer C primer

target region

overlap gene

+ -0 +- 0

Sto

p

ATG

1) partial overlap case

2) complete overlap case

Figure 6 Differences with annotation Each of resource pages for target gene shows inconsistency if it exists (A) ASKA GFP(+) ASKA GFP(minus) andGateway entry clone validation (B) TransBac library (C) Keio and Barcode deletion collections

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

8 Nucleic Acids Research 2014

(A)

(B) by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 9

(C)

(D)

Figure 7 Sample screen images (A) Sample pages of plasmid clone libraries Schematic view of inconsistency and cloned region of target genes in nucleotidelevel (B) Schematic view of deletion collection (C) Omics results of each of the target gene (D) BIOLOG phenotype analysis results in global gene andplate level view

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

10 Nucleic Acids Research 2014

upsteam gene ATGA

4 bp overlap

deleted region

stop codon

FRT FRT

21bpAbR

start codon

ATGA

ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA

resistance cassette

target gene

Figure 8 Overlap with no influence In the case of 4 nt (ATGA) overlap between the target and the upstream genes the termination codon of the upstreamgene is re-generated in deletion strain

DATA RETRIEVAL AND CONTENTS IN GenoBase

GenoBase is a searchable web database system devoted tosystems biology of E coli Querying GenoBase is done fromthe top page Any word such as an id gene name productname and sequence are available and the direct searchingsystem by SQL is also provided Searching results are shownas a table format with links to resource pages

The methods used for quality control of our resourcesare shown in Figure 6 On each web page the left panelshows the information on the latest annotation basicallyfrom GenBank database entry (see Figure 7) The secondand third tabs show sequences of the target genes and bioin-formatics analysis of them Currently six tabs are availablefor six different resources

For the Keio and the Barcode deletion collection in thecase of four base overlap with the upstream gene specialcase has no influence for the correct termination of the up-stream gene shown in Figure 8

Another tab shows omics output using our resourcesSystematic experimental data currently includes three datasets Proteinndashprotein interaction data are based on usingHis-tagged ASKA ORF clone library without GFP (20)All of the interaction data including data produced by Timeof Flight Mass Spectrometry (TOF-MASS) analysis arestored and specific partner candidates as prey proteins areavailable from each target protein as bait

DNA microarray analysis of about 150 single-gene dele-tion mutants mostly for ones lacking transcription factorsquantified by ImaGene for deletion mutants are also storedThese data are downloadable from the link as tab-limitedtext format files

Images of protein localization analyzed by the GFP-tagged ASKA ORFeome clones are available E coli cellswithout isopropyl-beta-D-thiogalactopyranoside (IPTG)in Luria-Bertani broth (LB) to avoid misfolding were an-alyzed by fluorescent microscopy Images captured with acharge-coupled device (CCD) camera are also available

Phenotype microarray analysis by BIOLOG plates(4748) are shown graphically Currently about 300 datasets using single-gene knockout of the Keio collection areavailable (2122) Our BIOLOG phenotype screening datahave been produced by 10 times wild-type strain tests asa control and duplicate tests for each of target gene dele-tion strains The screenings were performed using PM1to PM20 both for metabolic and chemical sensitivity tests(httpwwwbiologcompmMicrobialCellsshtml)

Other omics type results will be added and made down-loadable from the GenoBase system when we obtain themCurrently comprehensive genetic interaction data popula-tion dynamics using the Barcode deletion collection in ad-dition to the simple growth condition measured by our lat-est colony quantification system (49) are scheduled to beopened

RESOURCE DISTRIBUTION

Current official distribution site is NIG (the NationalInstitute of Genetics Mishima httpwwwshigennigacjpecolistrain) supported by the National BioresourceProject (40)

Currently we have four types of predicted ORF plasmidclone libraries and two of them have been distributed fromNIG The Gateway entry clone library will start soon to dis-tribute to the academy side We also have efforts to open ournew resources timely as much as possible

For deletion construct only Keio collection (8) is nowavailable from NIG and we would like to make the Barcodedeletion collection publicly available as soon as possible

POLICY OF THE MANAGEMENT OF GenoBase

Assignment of annotation onto the genome is not our maintask The annotation information on genes depends onGenBank database entry The major purpose of GenoBase

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 11

is to provide information related to resources constructedby our group for the community to investigate E coli K-12as a model cell system We are also producing omics datato understand what a cell system is We hope experimentalresources their information and omics results from these re-sources may contribute in the community using E coli as amodel system

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.

Authors' Contributions: CO designed and implemented the Chado schema. AM, RT and YO performed re-calculation of the comparison between the design and the current annotation. MI and KN (Mitsubishi Space Software Inc.) prepared the web pages. WGA, TC, MRG and DK set up GenoBase when hosted at Purdue. KER provided expertise in annotation and set-up of EcoGene when hosted at Purdue. KKN (Keio Univ.), NY, HD and YO prepared the Barcode deletion collection data set. WN and TN prepared the primer data set. SS and ST prepared the BIOLOG pages, and YT and BRB conducted the BIOLOG analysis. BLW and HM oversaw all aspects of designing GenoBase and preparation of the manuscript.

FUNDING

National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to BLW; RO1 GM075004 to DK]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to HM]; NSF [106394 to BLW]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Conflict of interest statement. None declared.

REFERENCES

1. Thomason MK, Bischler T, Eisenbart SK, Forstner KU, Zhang A, Herbig A, Nieselt K, Sharma CM and Storz G. (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J Bacteriol, pii: JB.02096-14.

2. Brenner S, Jacob F and Meselson M. (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.

3. Jehmlich N, Hubschmann T, Gesell Salazar M, Volker U, Benndorf D, Muller S, von Bergen M and Schmidt F. (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl Microbiol Biotechnol, 88, 575–584.

4. Rajagopala SV, Sikorski P, Kumar A, Mosca R, Vlasblom J, Arnold R, Franca-Koh J, Pakala SB, Phanse S, Ceol A, et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat Biotechnol, 32, 285–290.

5. Nakahigashi K, Toya Y, Ishii N, Soga T, Hasegawa M, Watanabe H, Takai Y, Honma M, Mori H and Tomita M. (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol Syst Biol, 5, 306.

6. Martorana AM, Motta S, Di Silvestre D, Falchi F, Deho G, Mauri P, Sperandeo P and Polissi A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One, 9, e100941.

7. Conway T, Creecy JP, Maddox SM, Grissom JE, Conkle TL, Shadid TM, Teramoto J, San Miguel P, Shimada T, Ishihama A, et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.

8. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL and Mori H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol, 2, 2006.0008.

9. Kitagawa M, Ara T, Arifuzzaman M, Ioka-Nakamichi T, Inamoto E, Toyonaga H and Mori H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): unique resources for biological research. DNA Res, 12, 291–299.

10. Yura T, Mori H, Nagai H, Nagata T, Ishihama A, Fujita N, Isono K, Mizobuchi K and Nakata A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res, 20, 3305–3308.

11. Rudd KE. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol Mol Biol Rev, 62, 985–1019.

12. Kohara Y, Akiyama K and Isono K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.

13. Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kasai H, Kashimoto K, Kimura S, Kitakawa M, et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res, 3, 363–377.

14. Fujita N, Mori H, Yura T and Ishihama A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res, 22, 1637–1639.

15. Hayashi K, Morooka N, Yamamoto Y, Fujita K, Isono K, Choi S, Ohtsubo E, Baba T, Wanner BL, Mori H, et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol Syst Biol, 2, 2006.0007.

16. Itoh T, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Kasai H, Kimura S, Kitakawa M, Kitagawa M, et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res, 3, 379–392.

17. Yamamoto Y, Aiba H, Baba T, Hayashi K, Inada T, Isono K, Itoh T, Kimura S, Kitagawa M, Makino K, et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res, 4, 91–113.

18. Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.

19. Yamamoto N, Nakahigashi K, Nakamichi T, Yoshino M, Takai Y, Touda Y, Furubayashi A, Kinjyo S, Dose H, Hasegawa M, et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol Syst Biol, 5, 335.

20. Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, Saito R, Ara T, Nakahigashi K, Huang HC, Hirai A, et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res, 16, 686–691.

21. Tohsato Y, Baba T, Mazaki Y, Ito M, Wanner BL and Mori H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J Bioinform Comput Biol, 8(Suppl. 1), 83–99.

22. Tohsato Y and Mori H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform, 21, 42–52.

23. Zhou P, Emmert D and Zhang P. (2006) Using Chado to store genome annotation data. Curr Protoc Bioinformat, doi:10.1002/0471250953.bi0906s12.

24. Mungall CJ and Emmert DB. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.


25. Rajagopala SV, Yamamoto N, Zweifel AE, Nakamichi T, Huang HK, Mendez-Rios JD, Franca-Koh J, Boorgula MP, Fujita K, Suzuki K, et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom, 11, 470.

26. Butland G, Babu M, Diaz-Mejia JJ, Bohdana F, Phanse S, Gold B, Yang W, Li J, Gagarinova AG, Pogoutse O, et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat Methods, 5, 789–795.

27. Typas A, Nichols RJ, Siegele DA, Shales M, Collins SR, Lim B, Braberg H, Yamamoto N, Takeuchi R, Wanner BL, et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat Methods, 5, 781–787.

28. Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.

29. Hill CW, Schiffer D and Berg P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J Bacteriol, 99, 274–278.

30. Hill CW, Foulds J, Soll L and Berg P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J Mol Biol, 39, 563–581.

31. Anderson RP, Miller CG and Roth JR. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J Mol Biol, 105, 201–218.

32. Anderson RP and Roth JR. (1977) Tandem genetic duplications in phage and bacteria. Ann Rev Microbiol, 31, 473–505.

33. Anderson RP and Roth JR. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J Mol Biol, 119, 147–166.

34. Anderson RP and Roth JR. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J Mol Biol, 126, 53–71.

35. Anderson RP and Roth JR. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb Symp Quant Biol, 43, 1083–1087.

36. Anderson P and Roth J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc Natl Acad Sci USA, 78, 3113–3117.

37. Poon A, Davis BH and Chao L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.

38. Datsenko KA and Wanner BL. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA, 97, 6640–6645.

39. Zhou L, Lei XH, Bochner BR and Wanner BL. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J Bacteriol, 185, 4956–4972.

40. Riley M, Abe T, Arnaud MB, Berlyn MK, Blattner FR, Chaudhuri RR, Glasner JD, Horiuchi T, Keseler IM, Kosuge T, et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res, 34, 1–9.

41. Zhou J and Rudd KE. (2013) EcoGene 3.0. Nucleic Acids Res, 41, D613–D624.

42. Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, Bonavides-Martinez C, Fulcher C, Huerta AM, Kothari A, Krummenacker M, et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res, 41, D605–D612.

43. Yamazaki Y, Niki H and Kato J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol Biol, 416, 385–389.

44. Hu JC, Sherlock G, Siegele DA, Aleksander SA, Ball CA, Demeter J, Gouni S, Holland TA, Karp PD, Lewis JE, et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res, 42, D677–D684.

45. Glasner JD, Rusch M, Liss P, Plunkett G 3rd, Cabot EL, Darling A, Anderson BD, Infield-Harm P, Gilson MC and Perna NT. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res, 34, D41–D45.

46. Glasner JD, Liss P, Plunkett G 3rd, Darling A, Prasad T, Rusch M, Byrnes A, Gilson M, Biehl B, Blattner FR, et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res, 31, 147–151.

47. Bochner BR. (2003) New technologies to assess genotype-phenotype relationships. Nat Rev Genet, 4, 309–314.

48. Bochner BR, Gadzinski P and Panomitros E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res, 11, 1246–1255.

49. Takeuchi R, Tamura T, Nakayashiki T, Tanaka Y, Muto A, Wanner BL and Mori H. (2014) Colony-live - a high-throughput method for measuring microbial colony growth kinetics - reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol, 14, 171.


Figure 4. Genomic structure of the target region of the deletion collections. (A) Keio collection. (B) Barcode deletion collection. Two sets of deletion collections have been constructed using the same primer set. The deleted regions are the same between the Keio and Barcode collections. FRT in the Keio collection and FRT1 in the Barcode collection are not site-specifically recombined by FLP recombinase. [Diagram labels: up, down, target gene, ATG, SD, kan, cat, FRT, FRT1, barcode, 6 aa + Ter.]

INFORMATION ON NEW AND FUTURE RESOURCES

Information on new resources to be constructed in the future and the data from HT experiments will be disseminated, shared and publicized via GenoBase. Currently, our small RNA deletion collection with or without a barcode, chromosomal Venus-GFP fusion strains and antibodies against purified E. coli proteins are being prepared for dissemination.

INFORMATION ON THE QUALITY CONTROL OF EXPERIMENTAL RESOURCES

GenoBase provides not only information about the resources constructed in our laboratory and HT experimental data obtained using these resources, but also information on the quality control of these resources. Both the Keio and Barcode single-gene deletion collections include two independent mutants of every non-essential gene. As we noted in our original report of the Keio collection (8), gene amplification during construction can result in mutants containing both the correct gene deletion and a copy of the targeted gene elsewhere. It was therefore important to validate the deletion collections not only for the presence of the novel expected junction fragments, but also for the absence of the targeted gene (19). By testing for the absence of a genetic duplication, we verified absence of the targeted gene in at least one of the two mutants for 98.3% of the Keio collection mutants. It is notable that Giaever et al. (28) reported a similar percentage of yeast deletion mutants with partial duplications. Such gene amplifications have been shown to occur frequently following generalized transduction in bacteria (29–31), as well as spontaneously (32–36). Compensatory mutations can also arise during mutant construction, especially among mutants showing a deleterious effect, regardless of whether the organism is a virus, a prokaryote or a eukaryote (37). To circumvent problems and misinterpretations arising from HT screening of collections in which a small number of mutants may have duplications or compensatory mutations, we routinely examine both Keio collection mutants for a particular gene. Whenever we find ambiguous results, we construct and test additional mutants. We have found that the simplest method for construction of additional mutants is to generate polymerase chain reaction (PCR) products using the deletion mutant as template and primers flanking the deleted gene. Importantly, the introduction of PCR products for the gene deletion with greater than 100 to 200 base pairs of flanking homology into a new strain harboring the Red system (38) yields huge numbers of recombinants and is far more efficient and reliable than generalized transduction (39).

Our initial resource included the different types of ASKA plasmid clone libraries, whose construction was based on the genome annotation of E. coli K-12 from 1997. An annotation policy was lacking at that time, and the annotation sometimes included the wrong initiation codon. After completing the construction of the first version of the ASKA clone libraries, we participated in annotation jamborees organized by the late Monica Riley at the Marine Biological Laboratory, Woods Hole, MA, where we completed new annotations of E. coli K-12 MG1655 and W3110 based on updated and highly accurate genome sequences of both strains (40). The resulting annotation corrections led us to re-construct ASKA clones encoding ca. 1000 ORFs. Accordingly, the current ASKA plasmid clone libraries consist of more than 5000 plasmid clones.

Histories and precise information about the methods used to construct each of the resources can be downloaded from the GenoBase website (http://ecoli.naist.jp/GB/). Notably, the original sources of genomic DNA differ for each of the resources. The ASKA plasmid clone libraries were originally constructed using the Kohara clone phages (12) as DNA sources for PCR amplification. The Gateway-fitted entry clone library was based on the ASKA plasmid clones, except for about 100 target genes that failed to be isolated by SfiI cloning from the original clones, as described in detail elsewhere (25). The newly designed TransBac library follows the latest annotation of MG1655 and used E. coli K-12 BW38028 genomic DNA for PCR amplification. The Keio deletion collection mutants were constructed using E. coli K-12 BW25113 as the parent, and the Barcode deletion collection mutants were constructed in E. coli K-12 BW38028. Because different annotations were used, we computed the differences between the positions of the target genes on the chromosome derived from the primer set and those in the latest GenBank annotation. For the plasmid clone libraries, DN (Difference of N-terminus) and DC (Difference of C-terminus) were calculated as shown in Figure 6 (A) and (B). For the deletion collections, UL (Upstream Length) and DL (Downstream Length) were determined in addition to DN and DC. Also, in the case of complete overlap with the neighboring genes, PD (Partial Deletion) and CD (Complete Deletion) were evaluated as illustrated in Figure 6 (C).
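The coordinate comparison described above can be sketched as follows. This is a minimal illustration and not the GenoBase implementation: the names `Region` and `annotation_differences`, the use of 1-based inclusive coordinates on the forward strand, and the sign convention (positive when the designed fragment extends beyond the annotated ORF) are all assumptions made for the example. Figure 6 defines the conventions actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """1-based inclusive coordinates on the forward strand (assumed convention)."""
    start: int
    end: int


def annotation_differences(designed: Region, annotated: Region) -> dict:
    """Return DN and DC for a forward-strand gene.

    Assumed sign convention: a positive value means the designed region
    extends beyond the annotated ORF, a negative value means it falls
    short, and 0 means the two ends agree.
    """
    dn = annotated.start - designed.start  # difference at the N-terminus
    dc = designed.end - annotated.end      # difference at the C-terminus
    return {"DN": dn, "DC": dc, "consistent": dn == 0 and dc == 0}
```

With hypothetical coordinates, `annotation_differences(Region(1000, 1900), Region(1030, 1900))` gives DN = 30 and DC = 0 under these assumptions, so the clone would be flagged as inconsistent with the current annotation.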

Confirmation of the E. coli K-12 MG1655 and W3110 genome sequences has been previously performed (15), yet some sequencing errors may exist. Confirmation of the host strains E. coli K-12 BW25113 and BW38028 (BW38029 is an independent isolate) of the Keio and Barcode single-deletion collections by deep sequencing has now been completed (Y. Otsuka et al., unpublished). Currently, we are completing the annotation of these strains, which will be reported elsewhere and made publicly available through GenoBase.


Figure 5. Flowchart of evaluation by the latest GenBank annotation. All of the resources are generated using DNA primers, which were designed according to the latest annotation at the time of construction. [Flowchart labels: primers; GenBank annotation; primer position on the chromosome; ASKA GFP(+)/(-), Gateway entry and TransBac; Keio and Barcode collection; validation for gene deletion; overlap genes [N, C] length; consistency; difference with annotation; Yes/No; OK; reconstruction required.]

Changes to the annotation, especially changes to gene locations, affect library construction. Databases such as EcoGene (41), EcoCyc (42) and PEC (43) are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information. However, in practice this is not easy to achieve. At the least, we can calculate the inconsistencies between the design and the latest annotation.

Finally, when constructing target resources, unpredicted biological events sometimes happen, such as transposition, duplication, integration and the introduction of mutations. Other possibilities result from the primer sequences. For the Keio collection, we confirmed the genomic structure by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream genes. Even though the PCR evaluation showed the predicted genomic structure, we became aware that another wild-type copy of the target gene can exist somewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test such possibilities experimentally, it is technically not practical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.

As mentioned above, all of our resources depend on primers designed according to the latest annotation at the time of construction. We therefore calculated the differences between the design and the current annotation. The evaluation scheme is shown in Figure 5.


Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if it exists. (A) ASKA GFP(+), ASKA GFP(-) and Gateway entry clone validation. (B) TransBac library. (C) Keio and Barcode deletion collections: (1) partial overlap case; (2) complete overlap case. [Diagram labels: N primer, C primer, ATG, Stop, cloned fragment, target region, deleted region, 21 bp, upstream gene, downstream gene, overlap gene, DN, DC, UL, DL, PD, CD, with +/0/- sign axes at each end.]

Figure 7. Sample screen images. (A) Sample pages of the plasmid clone libraries: schematic view of the inconsistency and the cloned region of target genes at the nucleotide level. (B) Schematic view of the deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene-level and plate-level views.


Figure 8. Overlap with no influence. In the case of a 4 nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain. [Diagram labels: upstream gene; 4 bp overlap (ATGA); start codon; stop codon; deleted region; target gene; resistance cassette (AbR) flanked by FRT sites; 21 bp. Sequence shown: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]

DATA RETRIEVAL AND CONTENTS IN GenoBase

GenoBase is a searchable web database system devoted to the systems biology of E. coli. Queries are entered from the top page. Any term, such as an ID, gene name, product name or sequence, can be used, and a direct search interface using SQL is also provided. Search results are shown in table format with links to the resource pages.
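As an illustration of what a direct SQL search might look like, the sketch below runs a keyword query against a toy table. It assumes a table named `feature` with `name` and `uniquename` columns, loosely modelled on the Chado schema (23,24); the in-memory SQLite database, the helper `search_features` and the three example rows are hypothetical, and the actual GenoBase tables and query interface may differ.

```python
import sqlite3

# Toy in-memory database standing in for the real system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature ("
    "feature_id INTEGER PRIMARY KEY, name TEXT, uniquename TEXT)"
)
conn.executemany(
    "INSERT INTO feature (name, uniquename) VALUES (?, ?)",
    [("thrA", "b0002"), ("thrB", "b0003"), ("thrC", "b0004")],
)


def search_features(connection, keyword):
    """Return (name, uniquename) rows whose name or ID contains the keyword."""
    pattern = f"%{keyword}%"
    cursor = connection.execute(
        "SELECT name, uniquename FROM feature "
        "WHERE name LIKE ? OR uniquename LIKE ? "
        "ORDER BY uniquename",
        (pattern, pattern),
    )
    return cursor.fetchall()
```

The keyword is passed as a bound parameter rather than pasted into the SQL string, which is the usual way to keep free-text searches safe when they are forwarded to a database.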

The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, basically from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and their bioinformatics analysis. Currently, six tabs are available for six different resources.

For the Keio and the Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
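The four-base overlap case can be checked with a small worked example. The sketch below is only illustrative: the upstream and target gene sequences are invented toy sequences, and it assumes that the sequence printed in Figure 8 represents the retained start codon (ATG) followed directly by the first bases of the resistance cassette. The 21 bp retained at the 3' end of the target gene is ignored for simplicity.

```python
# Sequence printed in Figure 8, read here as ATG + start of the cassette
# (this reading is an assumption).
FIGURE8_JUNCTION = "ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA"
CASSETTE_5PRIME = FIGURE8_JUNCTION[3:]  # cassette bases after the ATG
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical upstream gene: ATG AAA CCC GGA TGA (stop codon TGA).
UPSTREAM_CDS = "ATGAAACCCGGATGA"
# Hypothetical target gene whose ATG begins 4 nt before the upstream gene ends.
TARGET_CDS = "ATGACCGGTTAA"
TARGET_START = len(UPSTREAM_CDS) - 4  # index of the shared ATGA

wild_type = UPSTREAM_CDS[:TARGET_START] + TARGET_CDS


def delete_target(genome, target_start, cassette):
    """Keep the target start codon and replace everything after it."""
    return genome[: target_start + 3] + cassette


def upstream_last_codon(genome):
    """Last codon of the upstream reading frame, which starts at index 0."""
    end = len(UPSTREAM_CDS)
    return genome[end - 3 : end]


deleted = delete_target(wild_type, TARGET_START, CASSETTE_5PRIME)
```

In this toy case the upstream gene still ends in TGA after the deletion, because the base that follows the retained ATG is again an A. A cassette beginning with a different base would not regenerate the stop codon in this frame.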

Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein (bait).

DNA microarray data, quantified by ImaGene, for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, are also stored. These data are downloadable from the link as tab-delimited text files.

Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.

Phenotype microarray analysis by BIOLOG plates(4748) are shown graphically Currently about 300 datasets using single-gene knockout of the Keio collection areavailable (2122) Our BIOLOG phenotype screening datahave been produced by 10 times wild-type strain tests asa control and duplicate tests for each of target gene dele-tion strains The screenings were performed using PM1to PM20 both for metabolic and chemical sensitivity tests(httpwwwbiologcompmMicrobialCellsshtml)

Other omics type results will be added and made down-loadable from the GenoBase system when we obtain themCurrently comprehensive genetic interaction data popula-tion dynamics using the Barcode deletion collection in ad-dition to the simple growth condition measured by our lat-est colony quantification system (49) are scheduled to beopened

RESOURCE DISTRIBUTION

Current official distribution site is NIG (the NationalInstitute of Genetics Mishima httpwwwshigennigacjpecolistrain) supported by the National BioresourceProject (40)

Currently we have four types of predicted ORF plasmidclone libraries and two of them have been distributed fromNIG The Gateway entry clone library will start soon to dis-tribute to the academy side We also have efforts to open ournew resources timely as much as possible

For deletion construct only Keio collection (8) is nowavailable from NIG and we would like to make the Barcodedeletion collection publicly available as soon as possible

POLICY OF THE MANAGEMENT OF GenoBase

Assignment of annotation onto the genome is not our maintask The annotation information on genes depends onGenBank database entry The major purpose of GenoBase

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 11

is to provide information related to resources constructedby our group for the community to investigate E coli K-12as a model cell system We are also producing omics datato understand what a cell system is We hope experimentalresources their information and omics results from these re-sources may contribute in the community using E coli as amodel system

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh Joe Gris-som Thomas McGrew Matt Traxler and Yifeng Yang forcontributions to development of the GenoBase informationresources when hosted at Purdue UniversityAuthorsrsquo Contributions CO designed and implementedChado schema AM RT and YO performed re-calculation of the comparison between the design and cur-rent annotation MI and KN (Mitsubishi Space SoftwareInc) prepared web pages WGA TC MRG and DKset-up of GenoBase when hosted at Purdue KER pro-vided expertise in annotation and set-up of EcoGene whenhosted at Purdue KKN (Keio Univ) NY HD and YOprepared the Barcode deletion collection data set WN andTN prepared primer data set SS and ST prepared theBIOLOG pages and YT and BRB conducted BIOLOGanalysis BLW and HM oversaw all aspects of designingGenoBase and preparation of the manuscript

FUNDING

National Institutes of Health (NIH) [U24 GM077905 andRC1 GM092047 to BLW RO1 GM075004 to DK]JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716Grant-in-Aid for Scientific Research from the Ministryof Education Culture Sports Science and Technology ofJapan Bio Innovation funding program in Okinawa Japan[to HM] NSF [106394 to BLW] Funding for openaccess charge JSPS KAKENHI 25250028 Grant-in-Aidfor Scientific Research from the Ministry of Educa-tion Culture Sports Science and Technology of Japan

Conflict of interest statement None declared

REFERENCES1 ThomasonMK BischlerT EisenbartSK ForstnerKU

ZhangA HerbigA NieseltK SharmaCM and StorzG (2014)Global transcriptional start site mapping using dRNA-seq revealsnovel antisense RNAs in Escherichia coli J BacteriolpiiJB02096-14

2 BrennerS JacobF and MeselsonM (1961) An unstableintermediate carrying information from genes to ribosomes forprotein synthesis Nature 190 576ndash581

3 JehmlichN HubschmannT Gesell SalazarM VolkerUBenndorfD MullerS von BergenM and SchmidtF (2010)Advanced tool for characterization of microbial cultures bycombining cytomics and proteomics Appl Microbiol Biotechnol88 575ndash584

4 RajagopalaSV SikorskiP KumarA MoscaR VlasblomJArnoldR Franca-KohJ PakalaSB PhanseS CeolA et al(2014) The binary protein-protein interaction landscape ofEscherichia coli Nat Biotechnol 32 285ndash290

5 NakahigashiK ToyaY IshiiN SogaT HasegawaMWatanabeH TakaiY HonmaM MoriH and TomitaM (2009)Systematic phenome analysis of Escherichia coli multiple-knockout

mutants reveals hidden reactions in central carbon metabolism MolSyst Biol 5 306

6 MartoranaAM MottaS Di SilvestreD FalchiF DehoGMauriP SperandeoP and PolissiA (2014) Dissecting Escherichiacoli outer membrane biogenesis using differential proteomics PloSOne 9 e100941

7 ConwayT CreecyJP MaddoxSM GrissomJE ConkleTLShadidTM TeramotoJ San MiguelP ShimadaT IshihamaAet al (2014) Unprecedented high-resolution view of bacterial operonarchitecture revealed by RNA sequencing mBio 5 e01442ndashe01414

8 BabaT AraT HasegawaM TakaiY OkumuraY BabaMDatsenkoKA TomitaM WannerBL and MoriH (2006)Construction of Escherichia coli K-12 in-frame single-gene knockoutmutants the Keio collection Mol Syst Biol 2 20060008

9 KitagawaM AraT ArifuzzamanM Ioka-NakamichiTInamotoE ToyonagaH and MoriH (2005) Complete set of ORFclones of Escherichia coli ASKA library (A Complete Set of E coliK-12 ORF Archive) Unique Resources for Biological ResearchDNA Res 12 291ndash299

10 YuraT MoriH NagaiH NagataT IshihamaA FujitaNIsonoK MizobuchiK and NakataA (1992) Systematicsequencing of the Escherichia coli genome analysis of the 0ndash24 minregion Nucleic Acids Res 20 3305ndash3308

11 RuddKE (1998) Linkage map of Escherichia coli K-12 edition 10the physical map Microbiol Mol Biol Rev 62 985ndash1019

12 KoharaY AkiyamaK and IsonoK (1987) The physical map ofthe whole E coli chromosome application of a new strategy for rapidanalysis and sorting of a large genomic library Cell 50 495ndash508

13 AibaH BabaT HayashiK InadaT IsonoK ItohT KasaiHKashimotoK KimuraS KitakawaM et al (1996) A 570-kb DNAsequence of the Escherichia coli K-12 genome corresponding to the280ndash401 min region on the linkage map DNA Res 3 363ndash377

14 FujitaN MoriH YuraT and IshihamaA (1994) Systematicsequencing of the Escherichia coli genome analysis of the 24ndash41min (110917ndash193643 bp) region Nucleic Acids Res 22 1637ndash1639

15 HayashiK MorookaN YamamotoY FujitaK IsonoKChoiS OhtsuboE BabaT WannerBL MoriH et al (2006)Highly accurate genome sequences of Escherichia coli K-12 strainsMG1655 and W3110 Mol Syst Biol 2 20060007

16 ItohT AibaH BabaT HayashiK InadaT IsonoK KasaiHKimuraS KitakawaM KitagawaM et al (1996) A 460-kb DNAsequence of the Escherichia coli K-12 genome corresponding to the401ndash500 min region on the linkage map DNA Res 3 379ndash392

17 YamamotoY AibaH BabaT HayashiK InadaT IsonoKItohT KimuraS KitagawaM MakinoK et al (1997)Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 500ndash688 min on the linkage mapand analysis of its sequence features DNA Res 4 91ndash113

18 BlattnerFR PlunkettG 3rd BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK MayhewGFet al (1997) The complete genome sequence of Escherichia coli K-12Science 277 1453ndash1474

19 YamamotoN NakahigashiK NakamichiT YoshinoMTakaiY ToudaY FurubayashiA KinjyoS DoseHHasegawaM et al (2009) Update on the Keio collection ofEscherichia coli single-gene deletion mutants Mol Syst Biol 5 335

20 ArifuzzamanM MaedaM ItohA NishikataK TakitaCSaitoR AraT NakahigashiK HuangHC HiraiA et al (2006)Large-scale identification of protein-protein interaction ofEscherichia coli K-12 Genome Res 16 686ndash691

21 TohsatoY BabaT MazakiY ItoM WannerBL and MoriH(2010) Environmental dependency of gene knockouts on phenotypemicroarray analysis in Escherichia coli J Bioinform Comput Biol8(Suppl 1) 83ndash99

22 TohsatoY and MoriH (2008) Phenotype profiling of single genedeletion mutants of E coli using Biolog technology Genome Inform21 42ndash52

23 ZhouP EmmertD and ZhangP (2006) Using Chado to storegenome annotation data Curr Protoc Bioinformatdoi1010020471250953bi0906s12

24 MungallCJ and EmmertDB (2007) A Chado case study anontology-based modular schema for representing genome-associatedbiological information Bioinformatics 23 i337ndashi346

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

12 Nucleic Acids Research 2014



[Figure 5 flowchart: the primers and the GenBank annotation give the primer position on the chromosome. The ASKA GFP(+)/(-), Gateway entry and TransBac clones, and the Keio and Barcode collections (validation for gene deletion; overlap genes [N, C], length) are each checked for consistency or difference with the annotation. Outcomes are "OK" or "reconstruction required".]

Figure 5. Flowchart of evaluation by the latest GenBank annotation. All of the resources are generated with DNA primers designed according to the latest annotation at the time of construction.

Changes in annotation, especially changes in gene location, directly affect library construction. Databases such as EcoGene (41), EcoCyc (42) and PEC (43) are frequently updated. We would like to keep the GenoBase experimental resources updated according to the latest information; in practice, however, this is not easy to achieve. At the least, we can calculate the inconsistency between the original design and the latest annotation.

Finally, when constructing target resources, unpredicted biological events sometimes occur, such as transposition, duplication, integration or the introduction of mutations. Other problems can result from the primer sequences. For the Keio collection, we confirmed the genome structure by PCR amplification between the antibiotic resistance fragment that replaced the target gene and the upstream or downstream gene. Even when the PCR evaluation showed the predicted genomic structure, we found that another wild-type copy of the target gene could exist elsewhere on the chromosome. We therefore tested all of the Keio collection strains to detect such duplications, as reported elsewhere (19). In some cases, suppressor mutations may occur elsewhere on the chromosome during or after deletion or cloning. While it is possible to test such possibilities experimentally, it is technically not practical. Whether this can be achieved via community-level activities with PortEco (44) has yet to be shown. Community annotation has had success for some bacterial genera in the ASAP information resource (45,46) (http://asap.ahabs.wisc.edu/asap/home.php), but it has had limited success in the E. coli community.

As mentioned above, all of our resources depend on primers designed according to the latest annotation at the time of construction. We therefore calculated the differences between the original design and the current annotation. The evaluation scheme is shown in Figure 5.
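The comparison between design and annotation can be illustrated with a small sketch. It is not the GenoBase implementation: the `Region` type, the function names, the sign convention for DN and DC (the labels used in Figure 6) and the coordinates are all assumptions made for illustration, and only a forward-strand gene is considered.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """Inclusive 1-based chromosome coordinates, with start <= end."""
    start: int
    end: int


def annotation_offsets(designed: Region, annotated: Region) -> Tuple[int, int]:
    """Return (DN, DC) for a forward-strand gene.

    Assumed sign convention (not taken from the paper): a positive value
    means the designed boundary lies downstream of the annotated boundary,
    a negative value means upstream, and 0 means the two agree.
    """
    dn = designed.start - annotated.start
    dc = designed.end - annotated.end
    return dn, dc


def evaluate(designed: Region, annotated: Region) -> str:
    """Classify a resource as consistent with the annotation or not."""
    dn, dc = annotation_offsets(designed, annotated)
    if (dn, dc) == (0, 0):
        return "OK"
    return f"differs from annotation (DN={dn}, DC={dc})"


if __name__ == "__main__":
    # Made-up coordinates: in the second case the annotated start codon
    # was moved 12 nt upstream after the primers had been designed.
    print(evaluate(Region(1000, 1900), Region(1000, 1900)))
    print(evaluate(Region(1000, 1900), Region(988, 1900)))
```

A resource flagged as different would then be a candidate for the "reconstruction required" branch of the evaluation scheme, subject to the overlap checks described for each library.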


[Figure 6 schematic: panels (A) to (C) mark the N primer and C primer, the ATG and stop codons, the cloned fragment or deleted region (21bp) and the target region, with the offsets DN and DC scored as +, 0 or -. Panel (C) additionally labels UL and DL for the upstream and downstream genes, and PD and CD for 1) the partial overlap case and 2) the complete overlap case with an overlap gene.]

Figure 6. Differences with annotation. Each resource page for a target gene shows the inconsistency, if it exists. (A) ASKA GFP(+), ASKA GFP(-) and Gateway entry clone validation. (B) TransBac library. (C) Keio and Barcode deletion collections.


Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of the inconsistency and the cloned region of target genes at the nucleotide level. (B) Schematic view of the deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene and plate level views.


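[Figure 8 schematic: the upstream gene ends in a stop codon that shares a 4 bp overlap (ATGA) with the start codon of the target gene. In the deletion strain the deleted region is replaced by the resistance cassette (AbR flanked by FRT sites, 21bp), and the junction sequence is printed as ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]

Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.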

DATA RETRIEVAL AND CONTENTS IN GenoBase

GenoBase is a searchable web database system devoted to the systems biology of E. coli. GenoBase is queried from the top page. Any word, such as an ID, gene name, product name or sequence, can be used as a query, and a direct search system using SQL is also provided. Search results are shown as a table with links to the resource pages.
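To make the keyword search concrete, the following sketch runs a comparable query against a small in-memory SQLite table. The table and column names are hypothetical (loosely modelled on a Chado-style `feature` table), the three records are illustrative only, and nothing here reflects the actual GenoBase schema or its SQL interface.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature ("
        " uniquename TEXT PRIMARY KEY,"  # an ID, e.g. a locus tag
        " name TEXT,"                    # gene name
        " product TEXT)"                 # product name
    )
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [
            ("b0002", "thrA", "aspartokinase/homoserine dehydrogenase"),
            ("b0014", "dnaK", "chaperone Hsp70"),
            ("b0344", "lacZ", "beta-galactosidase"),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, word: str):
    """Match one word against ID, gene name and product name."""
    pattern = f"%{word}%"
    cursor = conn.execute(
        "SELECT uniquename, name, product FROM feature "
        "WHERE uniquename LIKE ? OR name LIKE ? OR product LIKE ? "
        "ORDER BY uniquename",
        (pattern, pattern, pattern),  # bound parameters, no string pasting
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in search(db, "lacZ"):
        print(row)
```

Each returned row corresponds to one line of the result table, which in GenoBase would link to the resource pages for that gene.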

The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, taken basically from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and their bioinformatics analysis. Currently, six tabs are available for six different resources.

For the Keio and Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
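The four-base case can be checked directly on the junction sequence. The sketch below uses the first 23 nt of the sequence printed in Figure 8; the function name and the restriction to the 4-nt overlap are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# First 23 nt of the junction printed in Figure 8: the retained start codon
# of the target gene followed by the beginning of the resistance cassette.
JUNCTION = "ATGATTCCGGGGATCCGTCGACC"


def upstream_stop_regenerated(junction: str, overlap: int = 4) -> bool:
    """Check whether the upstream stop codon survives in the deletion strain.

    With a 4-nt overlap (ATGA), the stop codon of the upstream gene occupies
    nucleotides 2 to 4 of the junction, i.e. it is shifted by one base
    relative to the retained start codon of the target gene.
    """
    if overlap != 4:
        raise ValueError("this sketch only covers the 4-nt overlap case")
    return junction[1:4].upper() in STOP_CODONS


if __name__ == "__main__":
    print(upstream_stop_regenerated(JUNCTION))
```

Because the cassette sequence after the retained ATG begins with A, the junction again reads ATGA, so the TGA codon of the upstream gene is restored.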

Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein used as bait.

DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified with ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.

Images of protein localization obtained with the GFP-tagged ASKA ORFeome clones are available. E. coli cells in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.

Phenotype microarray analyses using BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets obtained with single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).
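The text does not state how the replicates are combined. Purely as an illustration of how 10 wild-type controls and duplicate mutant measurements for one well could be compared, the sketch below computes a z-score; the scoring method, the function name and the signal values are assumptions, not the analysis used for GenoBase.

```python
from statistics import mean, stdev
from typing import Sequence


def well_z_score(wild_type: Sequence[float], mutant: Sequence[float]) -> float:
    """z-score of the mutant replicate mean against the wild-type controls."""
    if len(wild_type) < 2:
        raise ValueError("need at least two wild-type replicates")
    if not mutant:
        raise ValueError("need at least one mutant replicate")
    spread = stdev(wild_type)  # sample standard deviation of the controls
    if spread == 0:
        raise ValueError("wild-type replicates show no variation")
    return (mean(mutant) - mean(wild_type)) / spread


if __name__ == "__main__":
    # Made-up signal values for a single well of a single plate.
    wild_type = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
    mutant = [60, 64]
    print(round(well_z_score(wild_type, mutant), 2))
```

A strongly negative score would flag a well in which the deletion strain performs worse than the wild type; any threshold for calling a phenotype would have to be chosen separately.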

Other omics results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained with the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled to be released.

RESOURCE DISTRIBUTION

The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain/), supported by the National BioResource Project (40).

Currently, we have four types of predicted ORF plasmid clone libraries, and two of them have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources in as timely a manner as possible.

For deletion constructs, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.

POLICY OF THE MANAGEMENT OF GenoBase

Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to the resources constructed by our group for the community to investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.

Authors' Contributions: C.O. designed and implemented the Chado schema. A.M., R.T. and Y.O. performed re-calculation of the comparison between the design and current annotation. M.I. and K.N. (Mitsubishi Space Software Inc.) prepared web pages. W.G.A., T.C., M.R.G. and D.K. set up GenoBase when hosted at Purdue. K.E.R. provided expertise in annotation and set-up of EcoGene when hosted at Purdue. K.K.N. (Keio Univ.), N.Y., H.D. and Y.O. prepared the Barcode deletion collection data set. W.N. and T.N. prepared the primer data set. S.S. and S.T. prepared the BIOLOG pages, and Y.T. and B.R.B. conducted BIOLOG analysis. B.L.W. and H.M. oversaw all aspects of designing GenoBase and preparation of the manuscript.

FUNDING

National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to B.L.W.; RO1 GM075004 to D.K.]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to H.M.]; NSF [106394 to B.L.W.]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Conflict of interest statement. None declared.

REFERENCES

1. Thomason,M.K., Bischler,T., Eisenbart,S.K., Forstner,K.U., Zhang,A., Herbig,A., Nieselt,K., Sharma,C.M. and Storz,G. (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J. Bacteriol., pii: JB.02096-14.

2. Brenner,S., Jacob,F. and Meselson,M. (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.

3. Jehmlich,N., Hubschmann,T., Gesell Salazar,M., Volker,U., Benndorf,D., Muller,S., von Bergen,M. and Schmidt,F. (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl. Microbiol. Biotechnol., 88, 575–584.

4. Rajagopala,S.V., Sikorski,P., Kumar,A., Mosca,R., Vlasblom,J., Arnold,R., Franca-Koh,J., Pakala,S.B., Phanse,S., Ceol,A. et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat. Biotechnol., 32, 285–290.

5. Nakahigashi,K., Toya,Y., Ishii,N., Soga,T., Hasegawa,M., Watanabe,H., Takai,Y., Honma,M., Mori,H. and Tomita,M. (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol. Syst. Biol., 5, 306.

6. Martorana,A.M., Motta,S., Di Silvestre,D., Falchi,F., Deho,G., Mauri,P., Sperandeo,P. and Polissi,A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PLoS One, 9, e100941.

7. Conway,T., Creecy,J.P., Maddox,S.M., Grissom,J.E., Conkle,T.L., Shadid,T.M., Teramoto,J., San Miguel,P., Shimada,T., Ishihama,A. et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.

8. Baba,T., Ara,T., Hasegawa,M., Takai,Y., Okumura,Y., Baba,M., Datsenko,K.A., Tomita,M., Wanner,B.L. and Mori,H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008.

9. Kitagawa,M., Ara,T., Arifuzzaman,M., Ioka-Nakamichi,T., Inamoto,E., Toyonaga,H. and Mori,H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): Unique Resources for Biological Research. DNA Res., 12, 291–299.

10. Yura,T., Mori,H., Nagai,H., Nagata,T., Ishihama,A., Fujita,N., Isono,K., Mizobuchi,K. and Nakata,A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res., 20, 3305–3308.

11. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol. Mol. Biol. Rev., 62, 985–1019.

12. Kohara,Y., Akiyama,K. and Isono,K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.

13. Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kasai,H., Kashimoto,K., Kimura,S., Kitakawa,M. et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res., 3, 363–377.

14. Fujita,N., Mori,H., Yura,T. and Ishihama,A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res., 22, 1637–1639.

15. Hayashi,K., Morooka,N., Yamamoto,Y., Fujita,K., Isono,K., Choi,S., Ohtsubo,E., Baba,T., Wanner,B.L., Mori,H. et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol., 2, 2006.0007.

16. Itoh,T., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Kasai,H., Kimura,S., Kitakawa,M., Kitagawa,M. et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res., 3, 379–392.

17. Yamamoto,Y., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kimura,S., Kitagawa,M., Makino,K. et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res., 4, 91–113.

18. Blattner,F.R., Plunkett,G. 3rd, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.

19. Yamamoto,N., Nakahigashi,K., Nakamichi,T., Yoshino,M., Takai,Y., Touda,Y., Furubayashi,A., Kinjyo,S., Dose,H., Hasegawa,M. et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol. Syst. Biol., 5, 335.

20. Arifuzzaman,M., Maeda,M., Itoh,A., Nishikata,K., Takita,C., Saito,R., Ara,T., Nakahigashi,K., Huang,H.C., Hirai,A. et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res., 16, 686–691.

21. Tohsato,Y., Baba,T., Mazaki,Y., Ito,M., Wanner,B.L. and Mori,H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J. Bioinform. Comput. Biol., 8(Suppl. 1), 83–99.

22. Tohsato,Y. and Mori,H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform., 21, 42–52.

23. Zhou,P., Emmert,D. and Zhang,P. (2006) Using Chado to store genome annotation data. Curr. Protoc. Bioinformat., doi:10.1002/0471250953.bi0906s12.

24. Mungall,C.J. and Emmert,D.B. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.


25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.

26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.

27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.

28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.

29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.

30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.

31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.

32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.

33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.

34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.

35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.

36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl. Acad. Sci. U.S.A., 78, 3113–3117.

37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.

38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. U.S.A., 97, 6640–6645.

39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.

40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.

41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.

43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.

44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.

45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.

46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.

47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.

48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.

49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 7

(A) ASKA GFP(+) (-) clone and Gateway entry clone libraries

ATG cloned fragment

DN DC

+ -0 +- 0

Sto

p

(B) TransBac library

cloned fragment

DN DC

+ -0 +- 0

Sto

p

ATG

target region

target region

N primer C primer

N primer C primer

(C) Keio and Barcode deletion collection

21bp

deleted region

DN DC

UL

upstream gene

DL

N primer C primer

target region

downstream gene

+ -0 +- 0

Sto

p

ATG

21bp

deleted region

DN DC

PD

overlap gene

CD

N primer C primer

target region

overlap gene

+ -0 +- 0

Sto

p

ATG

1) partial overlap case

2) complete overlap case

Figure 6 Differences with annotation Each of resource pages for target gene shows inconsistency if it exists (A) ASKA GFP(+) ASKA GFP(minus) andGateway entry clone validation (B) TransBac library (C) Keio and Barcode deletion collections

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

8 Nucleic Acids Research 2014

(A)

(B) by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 9

(C)

(D)

Figure 7 Sample screen images (A) Sample pages of plasmid clone libraries Schematic view of inconsistency and cloned region of target genes in nucleotidelevel (B) Schematic view of deletion collection (C) Omics results of each of the target gene (D) BIOLOG phenotype analysis results in global gene andplate level view

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

10 Nucleic Acids Research 2014

upsteam gene ATGA

4 bp overlap

deleted region

stop codon

FRT FRT

21bpAbR

start codon

ATGA

ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA

resistance cassette

target gene

Figure 8 Overlap with no influence In the case of 4 nt (ATGA) overlap between the target and the upstream genes the termination codon of the upstreamgene is re-generated in deletion strain

DATA RETRIEVAL AND CONTENTS IN GenoBase

GenoBase is a searchable web database system devoted tosystems biology of E coli Querying GenoBase is done fromthe top page Any word such as an id gene name productname and sequence are available and the direct searchingsystem by SQL is also provided Searching results are shownas a table format with links to resource pages

The methods used for quality control of our resourcesare shown in Figure 6 On each web page the left panelshows the information on the latest annotation basicallyfrom GenBank database entry (see Figure 7) The secondand third tabs show sequences of the target genes and bioin-formatics analysis of them Currently six tabs are availablefor six different resources

For the Keio and the Barcode deletion collection in thecase of four base overlap with the upstream gene specialcase has no influence for the correct termination of the up-stream gene shown in Figure 8

Another tab shows omics output using our resourcesSystematic experimental data currently includes three datasets Proteinndashprotein interaction data are based on usingHis-tagged ASKA ORF clone library without GFP (20)All of the interaction data including data produced by Timeof Flight Mass Spectrometry (TOF-MASS) analysis arestored and specific partner candidates as prey proteins areavailable from each target protein as bait

DNA microarray analysis of about 150 single-gene dele-tion mutants mostly for ones lacking transcription factorsquantified by ImaGene for deletion mutants are also storedThese data are downloadable from the link as tab-limitedtext format files

Images of protein localization analyzed by the GFP-tagged ASKA ORFeome clones are available E coli cellswithout isopropyl-beta-D-thiogalactopyranoside (IPTG)in Luria-Bertani broth (LB) to avoid misfolding were an-alyzed by fluorescent microscopy Images captured with acharge-coupled device (CCD) camera are also available

Phenotype microarray analysis by BIOLOG plates(4748) are shown graphically Currently about 300 datasets using single-gene knockout of the Keio collection areavailable (2122) Our BIOLOG phenotype screening datahave been produced by 10 times wild-type strain tests asa control and duplicate tests for each of target gene dele-tion strains The screenings were performed using PM1to PM20 both for metabolic and chemical sensitivity tests(httpwwwbiologcompmMicrobialCellsshtml)

Other omics type results will be added and made down-loadable from the GenoBase system when we obtain themCurrently comprehensive genetic interaction data popula-tion dynamics using the Barcode deletion collection in ad-dition to the simple growth condition measured by our lat-est colony quantification system (49) are scheduled to beopened

RESOURCE DISTRIBUTION

Current official distribution site is NIG (the NationalInstitute of Genetics Mishima httpwwwshigennigacjpecolistrain) supported by the National BioresourceProject (40)

Currently we have four types of predicted ORF plasmidclone libraries and two of them have been distributed fromNIG The Gateway entry clone library will start soon to dis-tribute to the academy side We also have efforts to open ournew resources timely as much as possible

For deletion construct only Keio collection (8) is nowavailable from NIG and we would like to make the Barcodedeletion collection publicly available as soon as possible

POLICY OF THE MANAGEMENT OF GenoBase

Assignment of annotation onto the genome is not our maintask The annotation information on genes depends onGenBank database entry The major purpose of GenoBase

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 11

is to provide information related to resources constructedby our group for the community to investigate E coli K-12as a model cell system We are also producing omics datato understand what a cell system is We hope experimentalresources their information and omics results from these re-sources may contribute in the community using E coli as amodel system

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh Joe Gris-som Thomas McGrew Matt Traxler and Yifeng Yang forcontributions to development of the GenoBase informationresources when hosted at Purdue UniversityAuthorsrsquo Contributions CO designed and implementedChado schema AM RT and YO performed re-calculation of the comparison between the design and cur-rent annotation MI and KN (Mitsubishi Space SoftwareInc) prepared web pages WGA TC MRG and DKset-up of GenoBase when hosted at Purdue KER pro-vided expertise in annotation and set-up of EcoGene whenhosted at Purdue KKN (Keio Univ) NY HD and YOprepared the Barcode deletion collection data set WN andTN prepared primer data set SS and ST prepared theBIOLOG pages and YT and BRB conducted BIOLOGanalysis BLW and HM oversaw all aspects of designingGenoBase and preparation of the manuscript

FUNDING

National Institutes of Health (NIH) [U24 GM077905 andRC1 GM092047 to BLW RO1 GM075004 to DK]JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716Grant-in-Aid for Scientific Research from the Ministryof Education Culture Sports Science and Technology ofJapan Bio Innovation funding program in Okinawa Japan[to HM] NSF [106394 to BLW] Funding for openaccess charge JSPS KAKENHI 25250028 Grant-in-Aidfor Scientific Research from the Ministry of Educa-tion Culture Sports Science and Technology of Japan

Conflict of interest statement None declared

REFERENCES1 ThomasonMK BischlerT EisenbartSK ForstnerKU

ZhangA HerbigA NieseltK SharmaCM and StorzG (2014)Global transcriptional start site mapping using dRNA-seq revealsnovel antisense RNAs in Escherichia coli J BacteriolpiiJB02096-14

2 BrennerS JacobF and MeselsonM (1961) An unstableintermediate carrying information from genes to ribosomes forprotein synthesis Nature 190 576ndash581

3 JehmlichN HubschmannT Gesell SalazarM VolkerUBenndorfD MullerS von BergenM and SchmidtF (2010)Advanced tool for characterization of microbial cultures bycombining cytomics and proteomics Appl Microbiol Biotechnol88 575ndash584

4 RajagopalaSV SikorskiP KumarA MoscaR VlasblomJArnoldR Franca-KohJ PakalaSB PhanseS CeolA et al(2014) The binary protein-protein interaction landscape ofEscherichia coli Nat Biotechnol 32 285ndash290

5 NakahigashiK ToyaY IshiiN SogaT HasegawaMWatanabeH TakaiY HonmaM MoriH and TomitaM (2009)Systematic phenome analysis of Escherichia coli multiple-knockout

mutants reveals hidden reactions in central carbon metabolism MolSyst Biol 5 306

6. Martorana,A.M., Motta,S., Di Silvestre,D., Falchi,F., Deho,G., Mauri,P., Sperandeo,P. and Polissi,A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PloS One, 9, e100941.

7. Conway,T., Creecy,J.P., Maddox,S.M., Grissom,J.E., Conkle,T.L., Shadid,T.M., Teramoto,J., San Miguel,P., Shimada,T., Ishihama,A. et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.

8. Baba,T., Ara,T., Hasegawa,M., Takai,Y., Okumura,Y., Baba,M., Datsenko,K.A., Tomita,M., Wanner,B.L. and Mori,H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008.

9. Kitagawa,M., Ara,T., Arifuzzaman,M., Ioka-Nakamichi,T., Inamoto,E., Toyonaga,H. and Mori,H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): Unique Resources for Biological Research. DNA Res., 12, 291–299.

10. Yura,T., Mori,H., Nagai,H., Nagata,T., Ishihama,A., Fujita,N., Isono,K., Mizobuchi,K. and Nakata,A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res., 20, 3305–3308.

11. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol. Mol. Biol. Rev., 62, 985–1019.

12. Kohara,Y., Akiyama,K. and Isono,K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.

13. Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kasai,H., Kashimoto,K., Kimura,S., Kitakawa,M. et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res., 3, 363–377.

14. Fujita,N., Mori,H., Yura,T. and Ishihama,A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res., 22, 1637–1639.

15. Hayashi,K., Morooka,N., Yamamoto,Y., Fujita,K., Isono,K., Choi,S., Ohtsubo,E., Baba,T., Wanner,B.L., Mori,H. et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol., 2, 2006.0007.

16. Itoh,T., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Kasai,H., Kimura,S., Kitakawa,M., Kitagawa,M. et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res., 3, 379–392.

17. Yamamoto,Y., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kimura,S., Kitagawa,M., Makino,K. et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res., 4, 91–113.

18. Blattner,F.R., Plunkett,G. 3rd, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.

19. Yamamoto,N., Nakahigashi,K., Nakamichi,T., Yoshino,M., Takai,Y., Touda,Y., Furubayashi,A., Kinjyo,S., Dose,H., Hasegawa,M. et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol. Syst. Biol., 5, 335.

20. Arifuzzaman,M., Maeda,M., Itoh,A., Nishikata,K., Takita,C., Saito,R., Ara,T., Nakahigashi,K., Huang,H.C., Hirai,A. et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res., 16, 686–691.

21. Tohsato,Y., Baba,T., Mazaki,Y., Ito,M., Wanner,B.L. and Mori,H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J. Bioinform. Comput. Biol., 8(Suppl. 1), 83–99.

22. Tohsato,Y. and Mori,H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform., 21, 42–52.

23. Zhou,P., Emmert,D. and Zhang,P. (2006) Using Chado to store genome annotation data. Curr. Protoc. Bioinformat., doi:10.1002/0471250953.bi0906s12.

24. Mungall,C.J. and Emmert,D.B. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.

25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.

26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.

27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.

28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.

29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.

30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.

31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.

32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.

33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.

34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.

35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.

36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl Acad. Sci. U.S.A., 78, 3113–3117.

37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.

38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl Acad. Sci. U.S.A., 97, 6640–6645.

39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.

40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.

41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.

43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.

44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.

45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.

46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.

47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.

48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.

49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.

Figure 7. Sample screen images. (A) Sample pages of plasmid clone libraries: schematic view of inconsistency and cloned region of target genes at the nucleotide level. (B) Schematic view of the deletion collection. (C) Omics results for each target gene. (D) BIOLOG phenotype analysis results in global, gene and plate level views.

Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain. [Diagram labels: upstream gene, 4 bp overlap (ATGA), start codon, stop codon, target gene, deleted region, resistance cassette (AbR) flanked by FRT sites, 21 bp; sequence shown: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA.]

DATA RETRIEVAL AND CONTENTS IN GenoBase

GenoBase is a searchable web database system devoted to the systems biology of E. coli. Queries are entered from the top page. Any term, such as an ID, gene name, product name or sequence, can be used, and a direct search interface using SQL is also provided. Search results are shown as a table with links to the resource pages.
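The keyword search described above can be pictured with a minimal sketch. It assumes a simplified, hypothetical `gene` table (columns `id`, `name`, `product`, `sequence`) held in an in-memory SQLite database; the table layout, the helper names `build_demo_db` and `search_genes`, and the toy rows are illustrative assumptions and do not reflect the actual GenoBase schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified gene table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (id TEXT PRIMARY KEY, name TEXT, product TEXT, sequence TEXT)"
    )
    # Toy rows for illustration only; identifiers and sequences are placeholders.
    rows = [
        ("g0001", "geneA", "hypothetical kinase", "ATGAAACGCATTAGC"),
        ("g0002", "geneB", "hypothetical transporter", "ATGCGTCAAGGCTAA"),
        ("g0003", "geneC", "kinase regulator", "ATGACCGGTTTATGA"),
    ]
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", rows)
    return conn


def search_genes(conn, term):
    """Return (id, name, product) rows in which any searchable field contains term."""
    pattern = f"%{term}%"
    # Parameter binding keeps the user-supplied term out of the SQL text itself.
    cur = conn.execute(
        "SELECT id, name, product FROM gene "
        "WHERE id LIKE ? OR name LIKE ? OR product LIKE ? OR sequence LIKE ? "
        "ORDER BY id",
        (pattern, pattern, pattern, pattern),
    )
    return cur.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in search_genes(demo, "kinase"):
        print(row)
```

A single free-text term is matched against every searchable column, which mirrors the "any term" behaviour described in the text; a production system would more likely use dedicated sequence search tools for nucleotide queries.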

The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, taken basically from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and their bioinformatics analysis. Currently, six tabs are available for six different resources.

For the Keio and the Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
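The four-base overlap case can be checked with a short sketch. Only the junction sequence is taken from Figure 8 (the retained start codon followed by the sequence printed there); the upstream and target gene sequences, and the helper names, are toy assumptions chosen so that the upstream TGA stop codon overlaps the target ATG as ATGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Sequence printed in Figure 8: retained start codon followed by the scar region.
FIG8_JUNCTION = "ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA"


def first_stop_index(seq, frame_start=0):
    """Return the index of the first in-frame stop codon, or -1 if there is none."""
    for i in range(frame_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


def upstream_stop_preserved(upstream_prefix, target_gene, junction):
    """True if the upstream reading frame stops at the same position after deletion."""
    wild_type = first_stop_index(upstream_prefix + target_gene)
    deletion = first_stop_index(upstream_prefix + junction)
    return wild_type != -1 and wild_type == deletion


# Hypothetical upstream gene up to, but not including, the shared ATGA (toy sequence).
UPSTREAM_PREFIX = "ATGGCTAAACGTCT"
# Hypothetical target gene whose first four bases, ATGA, contain the upstream stop TGA.
TARGET_GENE = "ATGACCGAACTGGGTTAA"

if __name__ == "__main__":
    print(upstream_stop_preserved(UPSTREAM_PREFIX, TARGET_GENE, FIG8_JUNCTION))
    # A hypothetical junction that does not continue with A after ATG loses the stop.
    print(upstream_stop_preserved(UPSTREAM_PREFIX, TARGET_GENE, "ATGCTTCCGGGG"))
```

Because the sequence following the retained ATG begins with A, the junction again reads ATGA, so the upstream frame still ends at TGA in the deletion strain.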

Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by time-of-flight mass spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates (prey proteins) are available for each target protein (bait).

DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.

Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.

Phenotype microarray results obtained with BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets using single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20, for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).

Other omics results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled for release.

RESOURCE DISTRIBUTION

The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain/), supported by the National BioResource Project (40).

Currently, we have four types of predicted ORF plasmid clone libraries, and two of them have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources as promptly as possible.

For deletion constructs, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.

POLICY OF THE MANAGEMENT OF GenoBase

Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to resources constructed by our group for the community to investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.

Authors' Contributions: CO designed and implemented the Chado schema; AM, RT and YO performed re-calculation of the comparison between the design and current annotation; MI and KN (Mitsubishi Space Software Inc.) prepared web pages; WGA, TC, MRG and DK set up GenoBase when hosted at Purdue; KER provided expertise in annotation and set-up of EcoGene when hosted at Purdue; KKN (Keio Univ.), NY, HD and YO prepared the Barcode deletion collection data set; WN and TN prepared the primer data set; SS and ST prepared the BIOLOG pages, and YT and BRB conducted BIOLOG analysis; BLW and HM oversaw all aspects of designing GenoBase and preparation of the manuscript.


22 TohsatoY and MoriH (2008) Phenotype profiling of single genedeletion mutants of E coli using Biolog technology Genome Inform21 42ndash52

23 ZhouP EmmertD and ZhangP (2006) Using Chado to storegenome annotation data Curr Protoc Bioinformatdoi1010020471250953bi0906s12

24 MungallCJ and EmmertDB (2007) A Chado case study anontology-based modular schema for representing genome-associatedbiological information Bioinformatics 23 i337ndashi346

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

12 Nucleic Acids Research 2014

25 RajagopalaSV YamamotoN ZweifelAE NakamichiTHuangHK Mendez-RiosJD Franca-KohJ BoorgulaMPFujitaK SuzukiK et al (2010) The Escherichia coli K-12ORFeome a resource for comparative molecular microbiologyBMC Genom 11 470

26 ButlandG BabuM Diaz-MejiaJJ BohdanaF PhanseSGoldB YangW LiJ GagarinovaAG PogoutseO et al (2008)eSGA E coli synthetic genetic array analysis Nat Methods 5789ndash795

27 TypasA NicholsRJ SiegeleDA ShalesM CollinsSRLimB BrabergH YamamotoN TakeuchiR WannerBL et al(2008) High-throughput quantitative analyses of genetic interactionsin E coli Nat Methods 5 781ndash787

28 GiaeverG ChuAM NiL ConnellyC RilesL VeronneauSDowS Lucau-DanilaA AndersonK AndreB et al (2002)Functional profiling of the Saccharomyces cerevisiae genome Nature418 387ndash391

29 HillCW SchifferD and BergP (1969) Transduction ofmerodiploidy induced duplication of recipient genes J Bacteriol99 274ndash278

30 HillCW FouldsJ SollL and BergP (1969) Instability of amissense suppressor resulting from a duplication of genetic materialJ Mol Biol 39 563ndash581

31 AndersonRP MillerCG and RothJR (1976) Tandemduplications of the histidine operon observed following generalizedtransduction in Salmonella typhimurium J Mol Biol 105 201ndash218

32 AndersonRP and RothJR (1977) Tandem genetic duplications inphage and bacteria Ann Rev Microbiol 31 473ndash505

33 AndersonRP and RothJR (1978) Tandem chromosomalduplications in Salmonella typhimurium fusion of histidine genes tonovel promoters J Mol Biol 119 147ndash166

34 AndersonRP and RothJR (1978) Tandem genetic duplications inSalmonella typhimurium amplification of the histidine operon JMol Biol 126 53ndash71

35 AndersonRP and RothJR (1979) Gene duplication in bacteriaalteration of gene dosage by sister-chromosome exchanges ColdSpring Harb Symp Quant Biol 43 1083ndash1087

36 AndersonP and RothJ (1981) Spontaneous tandem geneticduplications in Salmonella typhimurium arise by unequalrecombination between rRNA (rrn) cistrons Proc Natl Acad SciUSA 78 3113ndash3117

37 PoonA DavisBH and ChaoL (2005) The coupon collector andthe suppressor mutation estimating the number of compensatorymutations by maximum likelihood Genetics 170 1323ndash1332

38 DatsenkoKA and WannerBL (2000) One-step inactivation ofchromosomal genes in Escherichia coli K-12 using PCR productsProc Natl Acad Sci USA 97 6640ndash6645

39 ZhouL LeiXH BochnerBR and WannerBL (2003)Phenotype microarray analysis of Escherichia coli K-12 mutants withdeletions of all two-component systems J Bacteriol 1854956ndash4972

40 RileyM AbeT ArnaudMB BerlynMK BlattnerFRChaudhuriRR GlasnerJD HoriuchiT KeselerIM KosugeTet al (2006) Escherichia coli K-12 a cooperatively developedannotation snapshotndash2005 Nucleic Acids Res 34 1ndash9

41 ZhouJ and RuddKE (2013) EcoGene 30 Nucleic Acids Res 41D613ndashD624

42 KeselerIM MackieA Peralta-GilM Santos-ZavaletaAGama-CastroS Bonavides-MartinezC FulcherC HuertaAMKothariA KrummenackerM et al (2013) EcoCyc fusing modelorganism databases with systems biology Nucleic Acids Res 41D605ndashD612

43 YamazakiY NikiH and KatoJ (2008) Profiling of Escherichia colichromosome database Methods Mol Biol 416 385ndash389

44 HuJC SherlockG SiegeleDA AleksanderSA BallCADemeterJ GouniS HollandTA KarpPD LewisJE et al(2014) PortEco a resource for exploring bacterial biology throughhigh-throughput data and analysis tools Nucleic Acids Res 42D677ndashD684

45 GlasnerJD RuschM LissP PlunkettG 3rd CabotELDarlingA AndersonBD Infield-HarmP GilsonMC andPernaNT (2006) ASAP a resource for annotating curatingcomparing and disseminating genomic data Nucleic Acids Res 34D41ndashD45

46 GlasnerJD LissP PlunkettG 3rd DarlingA PrasadTRuschM ByrnesA GilsonM BiehlB BlattnerFR et al (2003)ASAP a systematic annotation package for community analysis ofgenomes Nucleic Acids Res 31 147ndash151

47 BochnerBR (2003) New technologies to assess genotype-phenotyperelationships Nat Rev Genet 4 309ndash314

48 BochnerBR GadzinskiP and PanomitrosE (2001) Phenotypemicroarrays for high-throughput phenotypic testing and assay ofgene function Genome Res 11 1246ndash1255

49 TakeuchiR TamuraT NakayashikiT TanakaY MutoAWannerBL and MoriH (2014) Colony-live -a high-throughputmethod for measuring microbial colony growth kinetics- revealsdiverse growth effects of gene knockouts in Escherichia coli BMCMicrobiol 14 171

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

10 Nucleic Acids Research 2014

[Figure 8 schematic. Labels: upstream gene, target gene, start codon, stop codon, 4 bp overlap (ATGA), deleted region, resistance cassette (AbR) flanked by FRT sites, 21 bp. Sequence shown: ATGATTCCGGGGATCCGTCGACCCGAAGCAGCTCCAGCCTACA]

Figure 8. Overlap with no influence. In the case of a 4-nt (ATGA) overlap between the target and the upstream genes, the termination codon of the upstream gene is regenerated in the deletion strain.

DATA RETRIEVAL AND CONTENTS IN GenoBase

GenoBase is a searchable web database system devoted to the systems biology of E. coli. GenoBase is queried from the top page. Any term, such as an ID, gene name, product name or sequence, can be used, and a direct search interface using SQL is also provided. Search results are shown in table format with links to the resource pages.
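As a rough illustration of the kind of keyword search over IDs, gene names and product names described above, the sketch below builds a tiny in-memory SQLite table and queries it. The table layout (a `feature` table with `uniquename`, `name` and `product` columns) is a simplified assumption loosely modeled on Chado, not the actual GenoBase schema, and the helper names `build_demo_db` and `search` are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with a few well-known E. coli K-12 genes.

    The schema is an assumption for illustration only; it is not the
    GenoBase schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature ("
        "uniquename TEXT PRIMARY KEY, name TEXT, product TEXT)"
    )
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [
            ("b0002", "thrA",
             "bifunctional aspartokinase/homoserine dehydrogenase 1"),
            ("b0003", "thrB", "homoserine kinase"),
            ("b0004", "thrC", "threonine synthase"),
        ],
    )
    return conn


def search(conn, term):
    """Return rows whose ID, gene name or product name contains the term."""
    pattern = f"%{term}%"
    cursor = conn.execute(
        "SELECT uniquename, name, product FROM feature "
        "WHERE uniquename LIKE ? OR name LIKE ? OR product LIKE ? "
        "ORDER BY uniquename",
        (pattern, pattern, pattern),
    )
    return cursor.fetchall()


conn = build_demo_db()
for row in search(conn, "kinase"):
    print("\t".join(row))
```

Using bound parameters rather than string concatenation keeps the free-text term from being interpreted as SQL, which matters for any search box that forwards user input to a database.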

The methods used for quality control of our resources are shown in Figure 6. On each web page, the left panel shows information on the latest annotation, essentially taken from the GenBank database entry (see Figure 7). The second and third tabs show the sequences of the target genes and bioinformatics analyses of them. Currently, six tabs are available for six different resources.

For the Keio and the Barcode deletion collections, a four-base overlap with the upstream gene is a special case that has no influence on the correct termination of the upstream gene, as shown in Figure 8.
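The four-base overlap case can be checked with a short sketch. It assumes that the start codon of the target gene is retained and is followed directly by cassette-side sequence beginning with A, as the first bases of the sequence in Figure 8 (ATGATTCC...) suggest. The upstream tail and the target gene below are hypothetical sequences chosen only so that the reading frame lines up; `first_stop_index` is an illustrative helper, not part of GenoBase.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(seq, frame=0):
    """Return the 0-based index of the first in-frame stop codon, or -1."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


# Hypothetical 3' end of the upstream gene: its last sense codon (CAA) is
# completed by the A of the ATGA overlap.
UPSTREAM_TAIL = "GCTCA"
# Hypothetical target gene beginning with the ATGA overlap.
TARGET_GENE = "ATGAAACGCTAA"
# First bases of the sequence shown in Figure 8: the retained ATG followed
# by the cassette-side sequence.
DELETION_JUNCTION = "ATGATTCCGGGGATCCGTCGACC"

wild_type = UPSTREAM_TAIL + TARGET_GENE
deletion = UPSTREAM_TAIL + DELETION_JUNCTION

# Read both sequences in the upstream gene's frame.
print(first_stop_index(wild_type), wild_type[6:9])
print(first_stop_index(deletion), deletion[6:9])
```

In both sequences the upstream reading frame reaches TGA at the same position, because the base following the retained ATG is again A, so ATGA is regenerated.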

Another tab shows omics output obtained using our resources. Systematic experimental data currently include three data sets. Protein–protein interaction data are based on the His-tagged ASKA ORF clone library without GFP (20). All of the interaction data, including data produced by Time of Flight Mass Spectrometry (TOF-MASS) analysis, are stored, and specific partner candidates, as prey proteins, are available for each target protein used as bait.

DNA microarray data for about 150 single-gene deletion mutants, mostly ones lacking transcription factors, quantified by ImaGene, are also stored. These data are downloadable from the link as tab-delimited text files.
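Tab-separated text files of this kind can be read with a standard CSV parser. The sketch below is a minimal example using the Python standard library; the column names `probe` and `signal` and the two rows are placeholders invented for illustration and do not reflect the actual layout or values of the GenoBase downloads.

```python
import csv
import io


def read_tab_delimited(handle):
    """Parse a tab-delimited text stream with a header row into dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


# Placeholder content standing in for a downloaded file (assumed layout).
demo_file = io.StringIO("probe\tsignal\nthrA\t1520.5\nthrB\t310.0\n")

rows = read_tab_delimited(demo_file)
signals = {row["probe"]: float(row["signal"]) for row in rows}
print(signals)
```

For a real download, `demo_file` would be replaced by a file opened with `open(path, newline="")`, and the column names adjusted to the header actually present in the file.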

Images of protein localization analyzed using the GFP-tagged ASKA ORFeome clones are available. E. coli cells grown in Luria-Bertani broth (LB) without isopropyl-beta-D-thiogalactopyranoside (IPTG), to avoid misfolding, were analyzed by fluorescence microscopy. Images captured with a charge-coupled device (CCD) camera are also available.

Phenotype microarray analyses using BIOLOG plates (47,48) are shown graphically. Currently, about 300 data sets obtained with single-gene knockouts of the Keio collection are available (21,22). Our BIOLOG phenotype screening data were produced with 10 replicate tests of the wild-type strain as a control and duplicate tests for each target gene deletion strain. The screenings were performed using PM1 to PM20 for both metabolic and chemical sensitivity tests (http://www.biolog.com/pmMicrobialCells.shtml).
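One simple way to use such a design (10 wild-type replicates as control, duplicates per deletion strain) is to express the mutant mean for a well relative to the spread of the wild-type replicates. The sketch below shows this under stated assumptions: it is not the scoring method used for the GenoBase data, the function name is hypothetical, and the numbers are placeholder values invented for illustration.

```python
from statistics import mean, stdev


def compare_to_wild_type(wild_type_runs, mutant_runs):
    """Return (difference of means, z-score against wild-type spread).

    The z-score divides the difference by the sample standard deviation
    of the wild-type replicates.
    """
    wt_mean = mean(wild_type_runs)
    wt_sd = stdev(wild_type_runs)
    difference = mean(mutant_runs) - wt_mean
    return difference, difference / wt_sd


# Placeholder signal values for one well: 10 wild-type runs, 2 mutant runs.
wild_type_runs = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
mutant_runs = [60, 64]

difference, z_score = compare_to_wild_type(wild_type_runs, mutant_runs)
print(difference, round(z_score, 2))
```

With only two mutant replicates, the wild-type replicates carry most of the information about run-to-run variability, which is why the control spread is used as the yardstick here.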

Other omics-type results will be added and made downloadable from the GenoBase system as we obtain them. Currently, comprehensive genetic interaction data and population dynamics data obtained using the Barcode deletion collection, in addition to simple growth data measured by our latest colony quantification system (49), are scheduled to be released.

RESOURCE DISTRIBUTION

The current official distribution site is NIG (the National Institute of Genetics, Mishima; http://www.shigen.nig.ac.jp/ecoli/strain/), supported by the National BioResource Project (40).

Currently, we have four types of predicted ORF plasmid clone libraries, and two of them have been distributed from NIG. Distribution of the Gateway entry clone library to the academic community will start soon. We are also making efforts to release our new resources as promptly as possible.

For deletion constructs, only the Keio collection (8) is now available from NIG, and we would like to make the Barcode deletion collection publicly available as soon as possible.

POLICY OF THE MANAGEMENT OF GenoBase

Assignment of annotation onto the genome is not our main task. The annotation information on genes depends on the GenBank database entry. The major purpose of GenoBase is to provide information related to the resources constructed by our group for the community to investigate E. coli K-12 as a model cell system. We are also producing omics data to understand what a cell system is. We hope that the experimental resources, their information and the omics results from these resources will contribute to the community using E. coli as a model system.

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh, Joe Grissom, Thomas McGrew, Matt Traxler and Yifeng Yang for contributions to development of the GenoBase information resources when hosted at Purdue University.

Authors' Contributions: C.O. designed and implemented the Chado schema. A.M., R.T. and Y.O. performed re-calculation of the comparison between the design and current annotation. M.I. and K.N. (Mitsubishi Space Software Inc.) prepared web pages. W.G.A., T.C., M.R.G. and D.K. set up GenoBase when hosted at Purdue. K.E.R. provided expertise in annotation and set-up of EcoGene when hosted at Purdue. K.K.N. (Keio Univ.), N.Y., H.D. and Y.O. prepared the Barcode deletion collection data set. W.N. and T.N. prepared the primer data set. S.S. and S.T. prepared the BIOLOG pages, and Y.T. and B.R.B. conducted BIOLOG analysis. B.L.W. and H.M. oversaw all aspects of designing GenoBase and preparation of the manuscript.

FUNDING

National Institutes of Health (NIH) [U24 GM077905 and RC1 GM092047 to B.L.W.; RO1 GM075004 to D.K.]; JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan; Bio Innovation funding program in Okinawa, Japan [to H.M.]; NSF [106394 to B.L.W.]. Funding for open access charge: JSPS KAKENHI 25250028, Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Conflict of interest statement. None declared.

REFERENCES

1. Thomason,M.K., Bischler,T., Eisenbart,S.K., Forstner,K.U., Zhang,A., Herbig,A., Nieselt,K., Sharma,C.M. and Storz,G. (2014) Global transcriptional start site mapping using dRNA-seq reveals novel antisense RNAs in Escherichia coli. J. Bacteriol., pii: JB.02096-14.

2. Brenner,S., Jacob,F. and Meselson,M. (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature, 190, 576–581.

3. Jehmlich,N., Hubschmann,T., Gesell Salazar,M., Volker,U., Benndorf,D., Muller,S., von Bergen,M. and Schmidt,F. (2010) Advanced tool for characterization of microbial cultures by combining cytomics and proteomics. Appl. Microbiol. Biotechnol., 88, 575–584.

4. Rajagopala,S.V., Sikorski,P., Kumar,A., Mosca,R., Vlasblom,J., Arnold,R., Franca-Koh,J., Pakala,S.B., Phanse,S., Ceol,A. et al. (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat. Biotechnol., 32, 285–290.

5. Nakahigashi,K., Toya,Y., Ishii,N., Soga,T., Hasegawa,M., Watanabe,H., Takai,Y., Honma,M., Mori,H. and Tomita,M. (2009) Systematic phenome analysis of Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism. Mol. Syst. Biol., 5, 306.

6. Martorana,A.M., Motta,S., Di Silvestre,D., Falchi,F., Deho,G., Mauri,P., Sperandeo,P. and Polissi,A. (2014) Dissecting Escherichia coli outer membrane biogenesis using differential proteomics. PloS One, 9, e100941.

7. Conway,T., Creecy,J.P., Maddox,S.M., Grissom,J.E., Conkle,T.L., Shadid,T.M., Teramoto,J., San Miguel,P., Shimada,T., Ishihama,A. et al. (2014) Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio, 5, e01442–e01414.

8. Baba,T., Ara,T., Hasegawa,M., Takai,Y., Okumura,Y., Baba,M., Datsenko,K.A., Tomita,M., Wanner,B.L. and Mori,H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008.

9. Kitagawa,M., Ara,T., Arifuzzaman,M., Ioka-Nakamichi,T., Inamoto,E., Toyonaga,H. and Mori,H. (2005) Complete set of ORF clones of Escherichia coli ASKA library (A Complete Set of E. coli K-12 ORF Archive): Unique Resources for Biological Research. DNA Res., 12, 291–299.

10. Yura,T., Mori,H., Nagai,H., Nagata,T., Ishihama,A., Fujita,N., Isono,K., Mizobuchi,K. and Nakata,A. (1992) Systematic sequencing of the Escherichia coli genome: analysis of the 0–2.4 min region. Nucleic Acids Res., 20, 3305–3308.

11. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol. Mol. Biol. Rev., 62, 985–1019.

12. Kohara,Y., Akiyama,K. and Isono,K. (1987) The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495–508.

13. Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kasai,H., Kashimoto,K., Kimura,S., Kitakawa,M. et al. (1996) A 570-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 28.0–40.1 min region on the linkage map. DNA Res., 3, 363–377.

14. Fujita,N., Mori,H., Yura,T. and Ishihama,A. (1994) Systematic sequencing of the Escherichia coli genome: analysis of the 2.4–4.1 min (110,917–193,643 bp) region. Nucleic Acids Res., 22, 1637–1639.

15. Hayashi,K., Morooka,N., Yamamoto,Y., Fujita,K., Isono,K., Choi,S., Ohtsubo,E., Baba,T., Wanner,B.L., Mori,H. et al. (2006) Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol., 2, 2006.0007.

16. Itoh,T., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Kasai,H., Kimura,S., Kitakawa,M., Kitagawa,M. et al. (1996) A 460-kb DNA sequence of the Escherichia coli K-12 genome corresponding to the 40.1–50.0 min region on the linkage map. DNA Res., 3, 379–392.

17. Yamamoto,Y., Aiba,H., Baba,T., Hayashi,K., Inada,T., Isono,K., Itoh,T., Kimura,S., Kitagawa,M., Makino,K. et al. (1997) Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 50.0–68.8 min on the linkage map and analysis of its sequence features. DNA Res., 4, 91–113.

18. Blattner,F.R., Plunkett,G. 3rd, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.

19. Yamamoto,N., Nakahigashi,K., Nakamichi,T., Yoshino,M., Takai,Y., Touda,Y., Furubayashi,A., Kinjyo,S., Dose,H., Hasegawa,M. et al. (2009) Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol. Syst. Biol., 5, 335.

20. Arifuzzaman,M., Maeda,M., Itoh,A., Nishikata,K., Takita,C., Saito,R., Ara,T., Nakahigashi,K., Huang,H.C., Hirai,A. et al. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res., 16, 686–691.

21. Tohsato,Y., Baba,T., Mazaki,Y., Ito,M., Wanner,B.L. and Mori,H. (2010) Environmental dependency of gene knockouts on phenotype microarray analysis in Escherichia coli. J. Bioinform. Comput. Biol., 8(Suppl. 1), 83–99.

22. Tohsato,Y. and Mori,H. (2008) Phenotype profiling of single gene deletion mutants of E. coli using Biolog technology. Genome Inform., 21, 42–52.

23. Zhou,P., Emmert,D. and Zhang,P. (2006) Using Chado to store genome annotation data. Curr. Protoc. Bioinformat., doi:10.1002/0471250953.bi0906s12.

24. Mungall,C.J. and Emmert,D.B. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, i337–i346.


25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.

26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.

27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.

28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.

29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.

30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.

31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.

32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.

33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.

34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.

35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.

36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl Acad. Sci. U.S.A., 78, 3113–3117.

37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.

38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl Acad. Sci. U.S.A., 97, 6640–6645.

39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.

40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.

41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.

43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.

44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.

45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.

46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.

47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.

48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.

49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live -a high-throughput method for measuring microbial colony growth kinetics- reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2014 11

is to provide information related to resources constructedby our group for the community to investigate E coli K-12as a model cell system We are also producing omics datato understand what a cell system is We hope experimentalresources their information and omics results from these re-sources may contribute in the community using E coli as amodel system

ACKNOWLEDGEMENTS

We especially thank Mohamed Yassin Eltabakh Joe Gris-som Thomas McGrew Matt Traxler and Yifeng Yang forcontributions to development of the GenoBase informationresources when hosted at Purdue UniversityAuthorsrsquo Contributions CO designed and implementedChado schema AM RT and YO performed re-calculation of the comparison between the design and cur-rent annotation MI and KN (Mitsubishi Space SoftwareInc) prepared web pages WGA TC MRG and DKset-up of GenoBase when hosted at Purdue KER pro-vided expertise in annotation and set-up of EcoGene whenhosted at Purdue KKN (Keio Univ) NY HD and YOprepared the Barcode deletion collection data set WN andTN prepared primer data set SS and ST prepared theBIOLOG pages and YT and BRB conducted BIOLOGanalysis BLW and HM oversaw all aspects of designingGenoBase and preparation of the manuscript

FUNDING

National Institutes of Health (NIH) [U24 GM077905 andRC1 GM092047 to BLW RO1 GM075004 to DK]JSPS Kakenhi 25250028 and MEXT Kakenhi 25108716Grant-in-Aid for Scientific Research from the Ministryof Education Culture Sports Science and Technology ofJapan Bio Innovation funding program in Okinawa Japan[to HM] NSF [106394 to BLW] Funding for openaccess charge JSPS KAKENHI 25250028 Grant-in-Aidfor Scientific Research from the Ministry of Educa-tion Culture Sports Science and Technology of Japan

Conflict of interest statement None declared

REFERENCES1 ThomasonMK BischlerT EisenbartSK ForstnerKU

ZhangA HerbigA NieseltK SharmaCM and StorzG (2014)Global transcriptional start site mapping using dRNA-seq revealsnovel antisense RNAs in Escherichia coli J BacteriolpiiJB02096-14

2 BrennerS JacobF and MeselsonM (1961) An unstableintermediate carrying information from genes to ribosomes forprotein synthesis Nature 190 576ndash581

3 JehmlichN HubschmannT Gesell SalazarM VolkerUBenndorfD MullerS von BergenM and SchmidtF (2010)Advanced tool for characterization of microbial cultures bycombining cytomics and proteomics Appl Microbiol Biotechnol88 575ndash584

4 RajagopalaSV SikorskiP KumarA MoscaR VlasblomJArnoldR Franca-KohJ PakalaSB PhanseS CeolA et al(2014) The binary protein-protein interaction landscape ofEscherichia coli Nat Biotechnol 32 285ndash290

5 NakahigashiK ToyaY IshiiN SogaT HasegawaMWatanabeH TakaiY HonmaM MoriH and TomitaM (2009)Systematic phenome analysis of Escherichia coli multiple-knockout

mutants reveals hidden reactions in central carbon metabolism MolSyst Biol 5 306

6 MartoranaAM MottaS Di SilvestreD FalchiF DehoGMauriP SperandeoP and PolissiA (2014) Dissecting Escherichiacoli outer membrane biogenesis using differential proteomics PloSOne 9 e100941

7 ConwayT CreecyJP MaddoxSM GrissomJE ConkleTLShadidTM TeramotoJ San MiguelP ShimadaT IshihamaAet al (2014) Unprecedented high-resolution view of bacterial operonarchitecture revealed by RNA sequencing mBio 5 e01442ndashe01414

8 BabaT AraT HasegawaM TakaiY OkumuraY BabaMDatsenkoKA TomitaM WannerBL and MoriH (2006)Construction of Escherichia coli K-12 in-frame single-gene knockoutmutants the Keio collection Mol Syst Biol 2 20060008

9 KitagawaM AraT ArifuzzamanM Ioka-NakamichiTInamotoE ToyonagaH and MoriH (2005) Complete set of ORFclones of Escherichia coli ASKA library (A Complete Set of E coliK-12 ORF Archive) Unique Resources for Biological ResearchDNA Res 12 291ndash299

10 YuraT MoriH NagaiH NagataT IshihamaA FujitaNIsonoK MizobuchiK and NakataA (1992) Systematicsequencing of the Escherichia coli genome analysis of the 0ndash24 minregion Nucleic Acids Res 20 3305ndash3308

11 RuddKE (1998) Linkage map of Escherichia coli K-12 edition 10the physical map Microbiol Mol Biol Rev 62 985ndash1019

12 KoharaY AkiyamaK and IsonoK (1987) The physical map ofthe whole E coli chromosome application of a new strategy for rapidanalysis and sorting of a large genomic library Cell 50 495ndash508

13 AibaH BabaT HayashiK InadaT IsonoK ItohT KasaiHKashimotoK KimuraS KitakawaM et al (1996) A 570-kb DNAsequence of the Escherichia coli K-12 genome corresponding to the280ndash401 min region on the linkage map DNA Res 3 363ndash377

14 FujitaN MoriH YuraT and IshihamaA (1994) Systematicsequencing of the Escherichia coli genome analysis of the 24ndash41min (110917ndash193643 bp) region Nucleic Acids Res 22 1637ndash1639

15 HayashiK MorookaN YamamotoY FujitaK IsonoKChoiS OhtsuboE BabaT WannerBL MoriH et al (2006)Highly accurate genome sequences of Escherichia coli K-12 strainsMG1655 and W3110 Mol Syst Biol 2 20060007

16 ItohT AibaH BabaT HayashiK InadaT IsonoK KasaiHKimuraS KitakawaM KitagawaM et al (1996) A 460-kb DNAsequence of the Escherichia coli K-12 genome corresponding to the401ndash500 min region on the linkage map DNA Res 3 379ndash392

17 YamamotoY AibaH BabaT HayashiK InadaT IsonoKItohT KimuraS KitagawaM MakinoK et al (1997)Construction of a contiguous 874-kb sequence of the Escherichia coli-K12 genome corresponding to 500ndash688 min on the linkage mapand analysis of its sequence features DNA Res 4 91ndash113

18 BlattnerFR PlunkettG 3rd BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK MayhewGFet al (1997) The complete genome sequence of Escherichia coli K-12Science 277 1453ndash1474

19 YamamotoN NakahigashiK NakamichiT YoshinoMTakaiY ToudaY FurubayashiA KinjyoS DoseHHasegawaM et al (2009) Update on the Keio collection ofEscherichia coli single-gene deletion mutants Mol Syst Biol 5 335

20 ArifuzzamanM MaedaM ItohA NishikataK TakitaCSaitoR AraT NakahigashiK HuangHC HiraiA et al (2006)Large-scale identification of protein-protein interaction ofEscherichia coli K-12 Genome Res 16 686ndash691

21 TohsatoY BabaT MazakiY ItoM WannerBL and MoriH(2010) Environmental dependency of gene knockouts on phenotypemicroarray analysis in Escherichia coli J Bioinform Comput Biol8(Suppl 1) 83ndash99

22 TohsatoY and MoriH (2008) Phenotype profiling of single genedeletion mutants of E coli using Biolog technology Genome Inform21 42ndash52

23 ZhouP EmmertD and ZhangP (2006) Using Chado to storegenome annotation data Curr Protoc Bioinformatdoi1010020471250953bi0906s12

24 MungallCJ and EmmertDB (2007) A Chado case study anontology-based modular schema for representing genome-associatedbiological information Bioinformatics 23 i337ndashi346

by Daisuke K

ihara on Novem

ber 21 2014httpnaroxfordjournalsorg

Dow

nloaded from

12 Nucleic Acids Research 2014

25 RajagopalaSV YamamotoN ZweifelAE NakamichiTHuangHK Mendez-RiosJD Franca-KohJ BoorgulaMPFujitaK SuzukiK et al (2010) The Escherichia coli K-12ORFeome a resource for comparative molecular microbiologyBMC Genom 11 470

26 ButlandG BabuM Diaz-MejiaJJ BohdanaF PhanseSGoldB YangW LiJ GagarinovaAG PogoutseO et al (2008)eSGA E coli synthetic genetic array analysis Nat Methods 5789ndash795

27 TypasA NicholsRJ SiegeleDA ShalesM CollinsSRLimB BrabergH YamamotoN TakeuchiR WannerBL et al(2008) High-throughput quantitative analyses of genetic interactionsin E coli Nat Methods 5 781ndash787

28 GiaeverG ChuAM NiL ConnellyC RilesL VeronneauSDowS Lucau-DanilaA AndersonK AndreB et al (2002)Functional profiling of the Saccharomyces cerevisiae genome Nature418 387ndash391

29 HillCW SchifferD and BergP (1969) Transduction ofmerodiploidy induced duplication of recipient genes J Bacteriol99 274ndash278

30. Hill,C.W., Foulds,J., Soll,L. and Berg,P. (1969) Instability of a missense suppressor resulting from a duplication of genetic material. J. Mol. Biol., 39, 563–581.

31. Anderson,R.P., Miller,C.G. and Roth,J.R. (1976) Tandem duplications of the histidine operon observed following generalized transduction in Salmonella typhimurium. J. Mol. Biol., 105, 201–218.

32. Anderson,R.P. and Roth,J.R. (1977) Tandem genetic duplications in phage and bacteria. Ann. Rev. Microbiol., 31, 473–505.

33. Anderson,R.P. and Roth,J.R. (1978) Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol., 119, 147–166.

34. Anderson,R.P. and Roth,J.R. (1978) Tandem genetic duplications in Salmonella typhimurium: amplification of the histidine operon. J. Mol. Biol., 126, 53–71.

35. Anderson,R.P. and Roth,J.R. (1979) Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harb. Symp. Quant. Biol., 43, 1083–1087.

36. Anderson,P. and Roth,J. (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl Acad. Sci. USA, 78, 3113–3117.

37. Poon,A., Davis,B.H. and Chao,L. (2005) The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics, 170, 1323–1332.

38. Datsenko,K.A. and Wanner,B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl Acad. Sci. USA, 97, 6640–6645.

39. Zhou,L., Lei,X.H., Bochner,B.R. and Wanner,B.L. (2003) Phenotype microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956–4972.

40. Riley,M., Abe,T., Arnaud,M.B., Berlyn,M.K., Blattner,F.R., Chaudhuri,R.R., Glasner,J.D., Horiuchi,T., Keseler,I.M., Kosuge,T. et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res., 34, 1–9.

41. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

42. Keseler,I.M., Mackie,A., Peralta-Gil,M., Santos-Zavaleta,A., Gama-Castro,S., Bonavides-Martinez,C., Fulcher,C., Huerta,A.M., Kothari,A., Krummenacker,M. et al. (2013) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res., 41, D605–D612.

43. Yamazaki,Y., Niki,H. and Kato,J. (2008) Profiling of Escherichia coli chromosome database. Methods Mol. Biol., 416, 385–389.

44. Hu,J.C., Sherlock,G., Siegele,D.A., Aleksander,S.A., Ball,C.A., Demeter,J., Gouni,S., Holland,T.A., Karp,P.D., Lewis,J.E. et al. (2014) PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res., 42, D677–D684.

45. Glasner,J.D., Rusch,M., Liss,P., Plunkett,G. 3rd, Cabot,E.L., Darling,A., Anderson,B.D., Infield-Harm,P., Gilson,M.C. and Perna,N.T. (2006) ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res., 34, D41–D45.

46. Glasner,J.D., Liss,P., Plunkett,G. 3rd, Darling,A., Prasad,T., Rusch,M., Byrnes,A., Gilson,M., Biehl,B., Blattner,F.R. et al. (2003) ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., 31, 147–151.

47. Bochner,B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat. Rev. Genet., 4, 309–314.

48. Bochner,B.R., Gadzinski,P. and Panomitros,E. (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res., 11, 1246–1255.

49. Takeuchi,R., Tamura,T., Nakayashiki,T., Tanaka,Y., Muto,A., Wanner,B.L. and Mori,H. (2014) Colony-live - a high-throughput method for measuring microbial colony growth kinetics - reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol., 14, 171.

25. Rajagopala,S.V., Yamamoto,N., Zweifel,A.E., Nakamichi,T., Huang,H.K., Mendez-Rios,J.D., Franca-Koh,J., Boorgula,M.P., Fujita,K., Suzuki,K. et al. (2010) The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology. BMC Genom., 11, 470.

26. Butland,G., Babu,M., Diaz-Mejia,J.J., Bohdana,F., Phanse,S., Gold,B., Yang,W., Li,J., Gagarinova,A.G., Pogoutse,O. et al. (2008) eSGA: E. coli synthetic genetic array analysis. Nat. Methods, 5, 789–795.

27. Typas,A., Nichols,R.J., Siegele,D.A., Shales,M., Collins,S.R., Lim,B., Braberg,H., Yamamoto,N., Takeuchi,R., Wanner,B.L. et al. (2008) High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods, 5, 781–787.

28. Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Veronneau,S., Dow,S., Lucau-Danila,A., Anderson,K., Andre,B. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418, 387–391.

29. Hill,C.W., Schiffer,D. and Berg,P. (1969) Transduction of merodiploidy: induced duplication of recipient genes. J. Bacteriol., 99, 274–278.
