
  • 8/10/2019 SnapshotEmergingTomatoGenomeSequence.pdf

    78 THE PLANT GENOME MARCH 2009 VOL. 2, NO. 1

    ORIGINAL RESEARCH

    A Snapshot of the Emerging Tomato Genome Sequence

    Lukas A. Mueller,* René Klein Lankhorst, Steven D. Tanksley, James J. Giovannoni, Ruth White, Julia Vrebalov, Zhangjun Fei, Joyce van Eck, Robert Buels, Adri A. Mills, Naama Menda, Isaak Y. Tecle, Aureliano Bombarely, Stephen Stack, Suzanne M. Royer, Song-Bin Chang, Lindsay A. Shearer, Byung Dong Kim, Sung-Hwan Jo, Cheol-Goo Hur, Doil Choi, Chang-Bao Li, Jiuhai Zhao, Hongling Jiang, Yu Geng, Yuanyuan Dai, Huajie Fan, Jinfeng Chen, Fei Lu, Jinfeng Shi, Shouhong Sun, Jianjun Chen, Xiaohua Yang, Chen Lu, Mingsheng Chen, Zhukuan Cheng, Chuanyou Li, Hongqing Ling, Yongbiao Xue, [continued next page.]

    Abstract

    The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC) approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN). Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ~40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato (Solanum tuberosum L.) sequence data, and the tools available for researchers to exploit this new resource. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.

    THE SOLANACEAE, also called nightshades, is a medium-sized flowering plant family of >9000 species, including economically important species such as tomato (Solanum lycopersicum L.), potato (Solanum tuberosum L.), pepper (Capsicum annuum L.), eggplant (Solanum melongena L.), tobacco (Nicotiana tabacum L.), and petunia (Petunia hybrida Vilm.) (Knapp et al., 2004). Species of Solanaceae occur on all continents except Antarctica and are very diverse in habit (from trees to tiny annuals) and habitat (from deserts to tropical rain forests). Members of the family also serve as scientific model plants for the study of fruit development (Gray et al., 1992; Fray and Grierson, 1993; Brummell and Harpster, 2001; Alexander and Grierson, 2002; Adams-Phillips et al., 2004; Giovannoni, 2004; Tanksley, 2004; Seymour et al., 2008), tuber development (Prat et al., 1990; Bachem et al., 1996; Fernie and Willmitzer, 2001), biosynthesis of anthocyanin and carotenoid pigments (Gerats et al., 1985; Giuliano et al., 1993; Mueller

    Published in The Plant Genome 2:78-92. Published 18 Mar. 2009. doi: 10.3835/plantgenome2008.08.0005. Crop Science Society of America, 677 S. Segoe Rd., Madison, WI 53711 USA. An open-access publication. All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

    Abbreviations: AGP, Accessioned Golden Path; BAC, bacterial artificial chromosome; COS, conserved ortholog set; EST, expressed sequence tag; FISH, fluorescent in situ hybridization; FPC, fingerprinted contig; HMM, hidden Markov model; HTGS, high-throughput genome sequence; ITAG, International Tomato Annotation Group; LRR, leucine-rich repeats; LZ, leucine zippers; NBS, nucleotide binding sites; NCBI, National Center for Biotechnology Information; PlantGDB, Plant Genome Database; PUT, PlantGDB-assembled unique transcripts; R-genes, resistance genes; SGN, SOL Genomics Network; SSR, simple sequence repeat; TF, transcription factor; Tm, trans-membrane; WGS, whole-genome shotgun.

    L.A. Mueller, J.J. Giovannoni, R. White, J. Vrebalov, Z. Fei, J. van Eck, R. Buels, A. Mills, N. Menda, I.Y. Tecle, and A. Bombarely, Boyce Thompson Institute, Ithaca, NY 14853; S.D. Tanksley, Dep. Plant Breeding, Cornell Univ., Ithaca, NY 14853; S. Stack, S.M. Royer, S.-B. Chang, and L.A. Shearer, Dep. of Biology, Colorado State Univ., Fort Collins, CO 80523; S.-H. Jo and C.-G. Hur, Plant Genome Research Center, KRIBB, Taejon 305-600, Korea; B.D. Kim and D. Choi, Seoul National Univ., San 56-1 Shinlim-dong, Gwanak-gu, Seoul 151-742, Korea. Received 21 Aug. 2008. *Corresponding author ([email protected]). [Affiliations continued on next page.]

    Published March, 2009


    MUELLER ET AL.: A SNAPSHOT OF THE TOMATO GENOME 79

    Ying Wang, Graham B. Seymour, Gerard J. Bishop, Glenn Bryan, Jane Rogers, Sarah Sims, Sarah Butcher, Daniel Buchan, James Abbott, Helen Beasley, Christine Nicholson, Clare Riddle, Sean Humphray, Karen McLaren, Saloni Mathur, Shailendra Vyas, Amolkumar U. Solanke, Rahul Kumar, Vikrant Gupta, Arun K. Sharma, Paramjit Khurana, Jitendra P. Khurana, Akhilesh Tyagi, Sarita, Parul Chowdhury, Smriti Shridhar, Debasis Chattopadhyay, Awadhesh Pandit, Pradeep Singh, Ajay Kumar, Rekha Dixit, Archana Singh, Sumera Praveen, Vivek Dalal, Mahavir Yadav, Irfan Ahmad Ghazi, Kishor Gaikwad, Tilak Raj Sharma, Trilochan Mohapatra, Nagendra Kumar Singh, Dóra Szinay, Hans de Jong, Sander Peters, Marjo van Staveren, Erwin Datema, Mark W.E.J. Fiers, Roeland C.H.J. van Ham, P. Lindhout, Murielle Philippot, Pierre Frasse, Farid Regad, Mohamed Zouine, Mondher Bouzayen, Erika Asamizu, Shusei Sato, Hiroyuki Fukuoka, Satoshi Tabata, Daisuke Shibata, Miguel A. Botella, M. Perez-Alonso, V. Fernandez-Pedrosa, Sonia Osorio, Amparo Mico, Antonio Granell, Zhonghua Zhang, Jun He, Sanwen Huang, Yongchen Du, Dongyu Qu, Longfei Liu, Dongyuan Liu, Jun Wang, Zhibiao Ye, Wencai Yang, Guoping Wang, Alessandro Vezzi, Sara Todesco, Giorgio Valle, Giulia Falcone, Marco Pietrella, Giovanni Giuliano, Silvana Grandillo, Alessandra Traini, Nunzio D'Agostino, Maria Luisa Chiusano, Mara Ercolano, Amalia Barone, Luigi Frusciante, Heiko Schoof, Anika Jöcker, Rémy Bruggmann, Manuel Spannagl, Klaus X.F. Mayer, Roderic Guigó, Francisco Camara, Stephane Rombauts, Jeffrey A. Fawcett, Yves Van de Peer, Sandra Knapp, Dani Zamir, and Willem Stiekema

    [Author list continued.]

    C.-B. Li, J. Zhao, H. Jiang, Y. Geng, Y. Dai, H. Fan, J. Chen, F. Lu, J. Shi, S. Sun, J. Chen, X. Yang, C. Lu, M. Chen, Z. Cheng, C. Li, H. Ling, Y. Xue, and Y. Wang, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; G. Seymour, Division of Plant Sciences, Univ. of Nottingham, Sutton Bonington, LE12 5RD, UK; G.J. Bishop, S. Butcher, D. Buchan, and J. Abbott, Imperial College London, London, SW7 2AZ, UK; G. Bryan, SCRI Invergowrie, Dundee, DD2 5DA, UK; S. Mathur, S. Vyas, A.U. Solanke, R. Kumar, V. Gupta, A.K. Sharma, P. Khurana, J.P. Khurana, and A. Tyagi, Univ. of Delhi South Campus, New Delhi, 110 02, India; Sarita, P. Chowdhury, S. Shridhar, and D. Chattopadhyay, National Institute for Plant Genome Research, New Delhi, 110 067, India; A. Pandit, P. Singh, A. Kumar, R. Dixit, A. Singh, S. Praveen, V. Dalal, M. Yadav, I.A. Ghazi, K. Gaikwad, T.R. Sharma, T. Mohapatra, and N.K. Singh, NRC on Plant Biotechnology, Indian Agricultural Research Institute, New Delhi, 110 012, India; R. Klein Lankhorst, R.C.H.J. van Ham, and W. Stiekema, Centre for BioSystems Genomics, P.O. Box 98, 6700 AB, Wageningen, Netherlands; D. Szinay, H. de Jong, S. Peters, and P. Lindhout, Wageningen Univ., Lab. of Genetics, Arboretumlaan 4, 6703 BD, Wageningen, Netherlands; M. van Staveren, E. Datema, M.W.E.J. Fiers, and R.C.H.J. van Ham, Plant Research International, Droevendaalsesteeg 1, 6708 PB, Wageningen, Netherlands; M. Philippot, P. Frasse, F. Regad, M. Zouine, and M. Bouzayen, UMR990, INRA, chemin de Borde Rouge, 31326 Castanet-Tolosane, France; E. Asamizu, S. Sato, S. Tabata, and D. Shibata, Kazusa, Kisarazu, 292-0818, Chiba, Japan; S. Osorio, A. Mico, and A. Granell, Instituto de Biología Molecular y Celular de Plantas, CSIC/Universidad Politécnica de Valencia, Ciudad Politécnica de la Innovación, Edificio 8E, 46 011 Valencia, Spain; M.A. Botella, Univ. of Málaga, Campus de Teatinos, 29071 Málaga, Spain; G. Falcone, M. Pietrella, and G. Giuliano, ENEA, Casaccia Research Center, Via Anguillarese 301, 00123 Rome, Italy; A. Traini, N. D'Agostino, M.L. Chiusano, M. Ercolano, A. Barone, and L. Frusciante, Dep. of Soil, Plant, Environmental and Animal Production Sciences, Univ. of Naples Federico II, Via Università 100, 80055 Portici, Italy; D. Zamir, Hebrew Univ., P.O. Box 12, Rehovot 76100, Israel; H. Fukuoka, National Institute of Vegetable and Tea Science, National Agriculture Research Organization, 360 Kusawa, Acho, Tsu-shi, Mie 514-2392, Japan (NIVTS); J. Rogers, S. Sims, H. Beasley, C. Nicholson, C. Riddle, and K. McLaren, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK; Z. Zhang, J. He, S. Huang, Y. Du, and D. Qu, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China; L. Liu, D. Liu, and J. Wang, Beijing Genomics Institute, Shenzhen 51808, China; Z. Ye, College of Horticulture and Forestry, Huazhong Agricultural Univ., Wuhan, China; W. Yang, College of Agronomy and Biotechnology, China Agricultural Univ., Beijing 100094, China; G. Wang, Dep. of Horticulture, South China Agricultural Univ., Guangzhou, China; H. Schoof and A. Jöcker, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Cologne, Germany; M. Perez-Alonso and V. Fernandez-Pedrosa, Sistemas Genómicos, SL, Avenida Benjamin Franklin, 46980 Paterna, Valencia, Spain; R. Guigó and F. Camara, Centre de Regulació Genòmica, Universitat Pompeu Fabra, Dr. Aiguader 88, 08003 Barcelona, Spain; S. Humphray, Illumina Cambridge Ltd., Chesterford Research Park, Little Chesterford, Saffron Walden, Essex CB10 1XL, UK; S. Rombauts, J.A. Fawcett, and Y. van de Peer, VIB/Ghent Univ., Technologiepark 927, 9052 Ghent, Belgium; S. Knapp, Botany Dep., The Natural History Museum, Cromwell Rd., London, SW7 5B, UK; R. Bruggmann, Rutgers, The State Univ. of New Jersey, Waksman Institute of Microbiology, 190 Frelinghuysen Rd., Piscataway, NJ 0888020; M. Spannagl and K.X.F. Mayer, MIPS/Institute for Bioinformatics and Systems Biology, Helmholtz Zentrum München, Ingolstädter Landstraße, 85764 Neuherberg, Germany; A. Vezzi, S. Todesco, and G. Valle, CR, Univ. of Padua, via U. Bassi 58/b, 35131 Padua, Italy; S. Grandillo, CNR, Institute for Plant Genetics, Portici, Via Università 133, 80055 Portici, Italy; L.A. Mueller, Z. Fei, R. Buels, C.-G. Hur, D. Buchan, S. Mathur, E. Datema, M.W.E.J. Fiers, A. Traini, N. D'Agostino, M.L. Chiusano, H. Schoof, A. Jöcker, R. Bruggmann, M. Spannagl, K.X.F. Mayer, R. Guigó, F. Camara, S. Rombauts, J.A. Fawcett, and Y. van de Peer, International Tomato Genome Annotation Group (ITAG); J.J. Giovannoni and R. White, USDA-ARS, Tower Road, Ithaca, NY 14853, USA.

    [Affiliations continued.]


    et al., 2000; Spelt et al., 2002; De Jong et al., 2004; Quattrocchio et al., 2006), and plant defense (Bogdanove and Martin, 2000; van der Vossen et al., 2000; Gebhardt and Valkonen, 2001; Kessler and Baldwin, 2001; Li et al., 2001; Bai et al., 2003; Hui et al., 2003; Pedley and Martin, 2003; Sacco et al., 2007). The Solanaceae have also attracted interest because they produce a number of specialized metabolites that have medicinal properties (Schijlen et al., 2006; Oksman-Caldentey, 2007). The Solanaceae are remarkable in that the gene content of the different species remains similar despite the highly varied phenotypic outcomes (Tanksley et al., 1992; Knapp et al., 2004). This makes the Solanaceae an excellent model for the study of plant adaptation to natural and agricultural environments (Knapp et al., 2004). Most species of the Solanaceae are diploid and share a basic set of 12 chromosomes (Olmstead et al., 1999); recent polyploidizations during the evolutionary history of the family are limited to a few clades such as the potatoes and tobaccos (Clarkson et al., 2005).

    A Solanaceae reference genome will be an invaluable resource in addressing two fundamental biological questions: first, how genomes code for extensive phenotypic differences using relatively conserved sets of genes; and second, how phenotypic diversity can be harnessed for the improvement of agricultural products. Sequence data from other species, such as expressed sequence tags (ESTs) (Adams et al., 1991), methylation-filtered sequence (Palmer et al., 2003; Whitelaw et al., 2003; Fu et al., 2004), or Cot-filtered sequence (Peterson et al., 2002; Yuan et al., 2003), together with sequencing by novel very high throughput approaches such as 454 sequencing (Margulies et al., 2005) or Solexa sequencing (Shendure et al., 2005), in combination with good comparative maps (Tanksley et al., 1992; Doganlar et al., 2002; Fulton et al., 2002) between many Solanaceae plants (Hoeven et al., 2002; D'Agostino et al., 2007), will enable insights into evolution, domestication, development, response, and signal transduction pathways.

    After the sequencing of a number of dicots from the rosid clade (Angiosperm Phylogeny Group, 2003), namely Arabidopsis thaliana L. (AGI, 2000) and Medicago truncatula Gaertn. (Cannon et al., 2006) using bacterial artificial chromosome (BAC)-by-BAC approaches, and poplar [Populus trichocarpa (Torr. & A. Gray)] (Tuskan et al., 2006), grape (Vitis vinifera L.) (Jaillon et al., 2007), and others using whole-genome shotgun (WGS) techniques, the sequencing of the first genome in the asterids will shed light on this clade, permitting longer-range evolutionary distance comparisons and providing information about the larger picture of angiosperm evolution.

    Ten countries are involved in sequencing the tomato genome, and the 12 chromosomes have been allocated among the countries as depicted in Fig. 1. The chloroplast genome was recently completed by a European consortium (Kahlau et al., 2006), and the mitochondrial genome is being sequenced by the Instituto Nacional de Tecnología Agropecuaria in Argentina within the framework of the EU-SOL project (http://www.eu-sol.net [verified 10 Jan. 2009]).

    The 950-Mb tomato genome is structured into distal, gene-rich euchromatin and gene-poor pericentromeric heterochromatin. The heterochromatic fraction, consisting mostly of repetitive sequences, will be extremely difficult to sequence. Therefore, the strategy is to initially sequence the euchromatic portions of the genome, which are estimated to make up one-quarter (220 Mb) of the tomato genomic sequence (Peterson et al., 1996) and to include >90% of the genes (Wang et al., 2006). As a consequence, the effort to sequence the majority of the gene space is less than twice the effort required to sequence the Arabidopsis genome at 157 Mb (Bennett et al., 2003).
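    The size argument above is simple arithmetic. A minimal sketch, using only the figures quoted in the text and assuming that sequencing effort scales roughly linearly with the length of sequence targeted:

```python
# Figures quoted in the text (Mb)
GENOME_MB = 950.0       # whole tomato genome
EUCHROMATIN_MB = 220.0  # estimated euchromatin (Peterson et al., 1996)
ARABIDOPSIS_MB = 157.0  # Arabidopsis genome (Bennett et al., 2003)


def euchromatin_fraction(euchromatin_mb: float, genome_mb: float) -> float:
    """Share of the genome that is euchromatic."""
    return euchromatin_mb / genome_mb


def relative_effort(target_mb: float, reference_mb: float) -> float:
    """Effort relative to a reference genome, assuming effort ~ sequence length."""
    return target_mb / reference_mb


if __name__ == "__main__":
    print(f"euchromatin fraction: {euchromatin_fraction(EUCHROMATIN_MB, GENOME_MB):.2f}")
    print(f"effort relative to Arabidopsis: {relative_effort(EUCHROMATIN_MB, ARABIDOPSIS_MB):.2f}x")
```

    With these inputs the euchromatic fraction is about 0.23 (roughly one-quarter) and the ratio to Arabidopsis about 1.4, consistent with "less than twice the effort".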

    To render the emerging tomato sequence immediately useful to the community, it is being annotated by the International Tomato Annotation Group (ITAG). Annotations are available on the SOL Genomics Network (SGN) website (http://sgn.cornell.edu/ [verified 10 Jan. 2009]), and a number of Web-based tools have been developed that allow researchers to download and analyze the emerging sequence.

    Here, we provide a summary of the status of the project and relevant insights drawn from the annotation of the tomato genome performed to date.

    Results and Discussion

    To sequence the tomato euchromatin, a BAC-by-BAC approach was chosen in preference to a WGS strategy. This will generate a high-quality "gold standard" sequence, which is essential for use as a reference genome (International Rice Genome Sequencing Project, 2005) and which will serve as the scaffold for the related Solanaceae genomes. In short, the BAC-by-BAC strategy involves anchoring BACs or contigs of BACs to a reference genetic map. These anchored BACs are sequenced, and the sequence information is used to extend these BACs and BAC contigs further (BAC walking). Gaps between BAC contigs are closed by targeting novel markers or BACs to these gaps, followed by successive rounds of BAC walking.
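    The BAC-walking loop can be illustrated with a toy model. In this sketch each clone is reduced to hypothetical (start, end) coordinates on a chromosome, standing in for the overlap evidence (fingerprints or end sequences) used in practice; the greedy choice of the furthest-reaching overlapping clone is an illustrative assumption, not the consortium's selection procedure.

```python
from typing import List, Tuple

# Hypothetical clone record: (name, start, end), half-open coordinates in bp.
Clone = Tuple[str, int, int]


def walk(seed: Clone, library: List[Clone], min_overlap: int = 1) -> List[Clone]:
    """Greedy rightward walk from an anchored seed clone.

    At each step, pick the clone that overlaps the current contig end by at
    least `min_overlap` bp and extends furthest; stop when none extends it.
    """
    contig = [seed]
    end = seed[2]
    while True:
        candidates = [c for c in library if c[1] <= end - min_overlap and c[2] > end]
        if not candidates:
            return contig  # no clone extends the contig: a gap follows
        best = max(candidates, key=lambda c: c[2])
        contig.append(best)
        end = best[2]


def gaps(contigs: List[List[Clone]]) -> List[Tuple[int, int]]:
    """Uncovered intervals between walked contigs, ordered by position."""
    spans = sorted((min(x[1] for x in c), max(x[2] for x in c)) for c in contigs)
    return [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(spans, spans[1:])
            if b_start > a_end]


DEMO_LIBRARY: List[Clone] = [
    ("A", 0, 100_000), ("B", 60_000, 170_000), ("C", 150_000, 260_000),
    ("D", 300_000, 400_000), ("E", 380_000, 470_000),
]

if __name__ == "__main__":
    left = walk(DEMO_LIBRARY[0], DEMO_LIBRARY)   # seed A anchored to the map
    right = walk(DEMO_LIBRARY[3], DEMO_LIBRARY)  # seed D anchored to the map
    print([c[0] for c in left], [c[0] for c in right])  # ['A', 'B', 'C'] ['D', 'E']
    print(gaps([left, right]))                          # [(260000, 300000)]
```

    The remaining gap (260-300 kb in this toy example) is the kind of interval that would be targeted with a new marker or BAC before the next round of walking.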

    The high-density F2.2000 map (Fulton et al., 2002) is used as a reference genetic map for the sequencing project. This map is based on 80 F2 individuals from the cross Solanum lycopersicum LA925 × S. pennellii Correll LA716 and contains a subset of restriction fragment length polymorphism markers from the Tomato-EXPEN 1992 map (Tanksley et al., 1992). Most of the markers are conserved ortholog set (COS) markers (Fulton et al., 2002; Wu et al., 2006) derived from a comparison of Solanaceae ESTs against the entire Arabidopsis genome. The COS markers selected were single/low copy, having a highly significant match with a putative orthologous locus in Arabidopsis. Maps constructed using COS markers can readily be compared and analyzed for chromosome inversions, duplications, and other large-scale genome rearrangements, a characteristic that will


    be useful for transferring knowledge from tomato to other species. In addition to COS markers, the map also contains a significant number of simple sequence repeat (SSR) markers, most of which were identified in ESTs (usually in 5′ or 3′ untranslated regions).
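    The COS selection criterion described above (a single, highly significant, low-copy match in Arabidopsis) can be expressed as a simple filter. The thresholds below (`strong`, `loose`, `max_copies`) are hypothetical placeholders, not the values used by Fulton et al. (2002) or Wu et al. (2006):

```python
from typing import Sequence


def is_cos_candidate(evalues: Sequence[float],
                     strong: float = 1e-15,
                     loose: float = 1e-5,
                     max_copies: int = 2) -> bool:
    """Hypothetical COS-style filter on the BLAST E-values of one EST
    against Arabidopsis loci (one E-value per matching locus).

    Keep the EST only if its best match is highly significant (<= strong)
    and few loci match at a looser threshold (single/low copy).
    """
    if not evalues or min(evalues) > strong:
        return False
    copies = sum(1 for e in evalues if e <= loose)
    return copies <= max_copies


if __name__ == "__main__":
    print(is_cos_candidate([1e-40]))                # True: single strong match
    print(is_cos_candidate([1e-40, 1e-30, 1e-20]))  # False: multicopy
    print(is_cos_candidate([1e-3]))                 # False: no significant match
```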

    The BACs used in the tomato sequencing project are derived from several libraries, all of which were constructed from the Heinz 1706 tomato line. In addition to a HindIII library of 129,024 clones that was available at the outset of the project (Budiman et al., 2000), two additional BAC libraries were generated: an EcoRI library of 72,264 clones and an MboI library of 52,992 clones. Together, these libraries provide more than 25× genome coverage. The BAC libraries have been deep end-sequenced in the United States, with >340,000 high-quality reads equivalent to 20% of the entire genome sequence. The BAC libraries are complemented by a fosmid library. Currently, >180,000 high-quality fosmid end sequences from the Wellcome Trust Sanger Institute and the University of Padua are available, equivalent to 15% of the entire genome sequence. Fosmid libraries are crucial in a genome sequencing project because their narrowly defined insert length can be used as an analytical tool to detect potential misassemblies of BACs, and their generally shorter insert length is ideal for filling smaller gaps and thereby reducing redundant sequence (Kim et al., 1995). The fosmid library DNA is fragmented by shearing rather than with restriction enzymes to obtain clone coverage in regions low in or devoid of the relevant restriction sites.
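    The coverage figures above follow from clone counts and insert sizes. A minimal sketch, assuming an average BAC insert of ~100 kb (the same assumption used later for the transcription factor analysis) and a 950-Mb genome; the mean read lengths are back-calculated from the stated genome equivalents and are not reported values:

```python
GENOME_BP = 950e6
ASSUMED_INSERT_BP = 100_000  # assumption: average BAC insert of ~100 kb

LIBRARIES = {"HindIII": 129_024, "EcoRI": 72_264, "MboI": 52_992}  # clone counts from the text


def clone_coverage(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Genome equivalents represented by a clone library."""
    return n_clones * insert_bp / genome_bp


def implied_mean_read_bp(n_reads: int, genome_fraction: float, genome_bp: float) -> float:
    """Mean read length implied by 'n reads equivalent to X% of the genome'."""
    return genome_fraction * genome_bp / n_reads


if __name__ == "__main__":
    total = sum(LIBRARIES.values())  # 254,280 clones
    print(f"{clone_coverage(total, ASSUMED_INSERT_BP, GENOME_BP):.1f}x")  # 26.8x
    print(round(implied_mean_read_bp(340_000, 0.20, GENOME_BP)))      # BAC ends: 559
    print(round(implied_mean_read_bp(180_000, 0.15, GENOME_BP)))      # fosmid ends: 792
```

    Under the 100-kb assumption the three libraries together give roughly 27 genome equivalents, in line with "more than 25×".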

    All BACs from the HindIII library and from the MboI library were fingerprinted, and contigs of overlapping BACs were generated using the fingerprinted contigs (FPC) tool (Soderlund et al., 2000). First, an analysis of the BAC fingerprint data yielded 6000 contigs, of which >3500 could be anchored to the genetic map. In an effort to globally reduce the number of contigs, the entire FPC data set was reassembled using less stringent assembly criteria (cutoff E-value of 1 × 10⁻¹² and tolerance of 7). This resulted in 4360 contigs representing about 658 Mb of sequence. To increase the contig size and to reduce the contig number further, the contigs were manually edited with anchoring information by contig end-search and merging, resulting in 4156 contigs.
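    The FPC cutoff is a probability of coincidental fingerprint overlap, the Sulston score. The sketch below follows the binomial coincidence model as it is commonly described for FPC; the gel length of 4000 and the band counts are hypothetical illustration values, and FPC's own implementation may differ in detail:

```python
from math import comb


def sulston_score(n1: int, n2: int, shared: int, tolerance: float, gel_length: float) -> float:
    """Probability that two fingerprints share >= `shared` bands by coincidence.

    n1, n2     : band counts of the two clones
    tolerance  : band-matching tolerance (same units as gel_length)
    gel_length : hypothetical gel length
    """
    n_low, n_high = sorted((n1, n2))
    b = 2 * tolerance / gel_length   # chance a random band lies within +/- tolerance of a given band
    p = 1 - (1 - b) ** n_high        # chance a band of the smaller print matches any band of the larger
    return sum(comb(n_low, j) * p ** j * (1 - p) ** (n_low - j)
               for j in range(shared, n_low + 1))


CUTOFF = 1e-12  # cutoff quoted in the text; lower scores mean more confident overlaps

if __name__ == "__main__":
    for shared in (10, 15, 20):
        score = sulston_score(30, 30, shared, tolerance=7, gel_length=4000)
        print(shared, f"{score:.1e}", "overlap" if score <= CUTOFF else "no call")
```

    Relaxing the cutoff (a larger value) accepts overlaps supported by fewer shared bands, which is why the reassembly merged contigs and reduced their number.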

    Finally, a total of 837 markers were used to anchor the contigs to the tomato genetic map. The anchored contigs represent about 187 Mb of genomic DNA and are mainly composed of euchromatic sequences from the tomato genome.

    Figure 1. Status of the tomato euchromatin sequence as of September 2008. For each chromosome, the responsible country is shown. Progress in the sequencing of each chromosome (Chr) is given, as well as the status and the availability of the bacterial artificial chromosomes (BACs). HTGS, high-throughput genome sequence.


    Validation of the physical map was performed using fluorescent in situ hybridization (FISH) on pachytene complements with entire BAC clones as probes (Chang et al., 2007; Szinay et al., 2008) (see also the FISH map on SGN, http://sgn.cornell.edu/cview/map.pl?map_id=13 [verified 10 Jan. 2009]), and by genetic mapping of anchored BACs using panels of tomato introgression line populations (Eshed and Zamir, 1995). The integrated map is available through WebFPC.

    Since the current sequencing effort focuses on the tomato euchromatin, determining the chromosomal borders between euchromatin and heterochromatin is essential. Currently, we use FISH to identify BAC inserts from euchromatin-heterochromatin boundaries, based on linkage map information and on the specific staining by FISH of the repetitive fraction of the tomato genome (Szinay et al., 2008); see Fig. 2.

    In a multinational project, it is important that all participants use the same standards for completing their sequences. The Tomato Genome Project started to develop these standards early on, and they will be maintained and developed further as new issues arise. The full quality standards are described in the Tomato Sequencing Guidelines document available online at http://docs.google.com/View.aspx?docid=dggs4r6k_1dd5p56 (verified 5 Feb. 2009).

    In summary, the BACs are being sequenced to the following quality standards:

    The BAC sequence submitted in high-throughput genome sequence (HTGS) Phase 3 consists of a single contig.

    All bases of the HTGS Phase 3 consensus sequence must have a Phred quality score of at least 30.

    As a result of the shotgun process, the bulk of sequence will be derived from multiple subclones sequenced from both strands. Any regions of unidirectional sequence coverage with a single sequencing chemistry must pass manual inspection for sequence problems but need not be annotated. For regions covered by only a single subclone, additional coverage must be attempted from an alternate subclone, by direct walking on BAC DNA, or by BAC polymerase chain reaction. These regions must concur with a restriction digest analysis of the clone. In addition, these regions must be annotated.

    At least 99% of the sequence must have less than one error in 10,000 bp, as reported by Phrap or other sequence assembly consensus scores. Exceptions must be manually checked and pass inspection for possible problems. Any areas not meeting this standard must be annotated as such.

    To date (September 2008), 689 BACs have been sequenced and reported in the SGN BAC registry database (either HTGS Phase 2 or Phase 3) (Fig. 1), representing 74.8 Mb (including overlaps) (available from SGN and GenBank). Of these, 419 are included in the Accessioned Golden Path (AGP) files, which can be viewed in the SGN AGP map; they represent 44.5 Mb of sequence, or roughly 20% of the tomato euchromatin. These BACs have been placed into 282 contigs and have been annotated using the ITAG annotation pipeline; see below.

    Genome Annotation by ITAG

    To render the sequence immediately useful to the community, ITAG is producing a high-quality automated annotation of the tomato genome in a distributed collaborative effort involving groups from Europe, Asia, and the United States. The centerpiece of the structural annotation is the EuGene gene prediction platform (Foissac et al., 2008), a powerful predictor capable of integrating a diverse array of inputs, such as evidence-based alignments and ab initio predictions. For the functional annotation, InterPro domains are determined using InterProScan, and homology searches are performed. Where possible, other sequence features (e.g., noncoding RNAs) are predicted. An important initial activity of the ITAG group was to generate a training and test set of gene sequences to train gene finders for tomato. Gene finders that are being or have been trained include EuGene (Foissac et al., 2008), GeneMark (Isono et al., 1994), TwinScan (Korf et al., 2001), and Augustus (Stanke et al., 2008). The predicted gene models and their functional annotations are available via the SGN Web site.

    In the first batch of annotations, partially based on as yet untrained gene finders, the ITAG pipeline has

    Figure 2. Labeling of the heterochromatic part of tomato chromosome 6 by fluorescent in situ hybridization (FISH) with the Cot-100 genomic DNA fraction (green signal). The differently labeled bacterial artificial chromosome (BAC) clones resident in the heterochromatin-euchromatin borders of the short arm and of the long arm are pseudocolored in red and magenta. DAPI, 4′,6-diamidino-2-phenylindole, dihydrochloride.


    identified 7464 protein coding genes longer than 180 nucleotides in 44 Mb of nonredundant sequence. This represents a gene density of approximately one gene per 6 kb, slightly lower than the density of one gene per ~4.5 kb in Arabidopsis (AGI, 2000) but higher than the density of one protein coding gene per 9.9 kb in the rice (Oryza sativa L.) genome (International Rice Genome Sequencing Project, 2005). The average coding sequence is 996 bp long and is composed of 3.7 exons. The primary difference between tomato and Arabidopsis genes is that tomato genes, including their introns, are longer. The average gene length from this analysis is ~2 kb, with an average intron length of 485 bp and an average exon length of 268 bp, significantly larger than those in Arabidopsis. While the lower number of exons per gene almost certainly reflects the current lower annotation quality of tomato genes, it is notable that the average intron length is more than twice that in Arabidopsis. Assuming a gene density of one gene per 6 kb in the rest of the tomato euchromatin, we can expect that the euchromatin of the tomato genome contains just over 40,000 genes, close to the estimated number of about 35,000 (Hoeven et al., 2002). Some of these parameters may change with improved tomato genome annotations and further improvement of trained tomato gene finders. Figure 3 shows the number of tomato genes falling into certain annotation categories, and a comparison to the numbers in the same categories in Arabidopsis, rice, and poplar. The numbers in each category are similar between species, indicating that the fraction of the tomato genome sequenced so far is similar in composition to other plant genomes.
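    The density and exon figures can be checked directly from the reported totals. A minimal sketch using only numbers quoted in the text; note that dividing the mean CDS length by the mean exon count (about 269 bp) is a ratio of means, so it need not equal the reported mean exon length of 268 bp exactly:

```python
def bp_per_gene(sequence_bp: float, n_genes: int) -> float:
    """Average spacing of genes (bp per gene)."""
    return sequence_bp / n_genes


def mean_exon_bp(cds_bp: float, exons_per_gene: float) -> float:
    """Average exon length implied by mean CDS length and exon count."""
    return cds_bp / exons_per_gene


if __name__ == "__main__":
    print(f"{bp_per_gene(44e6, 7464) / 1000:.1f} kb per gene")  # 5.9 kb per gene
    print(f"{mean_exon_bp(996, 3.7):.0f} bp per exon")          # 269 bp per exon
```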

    De novo repeat analysis was performed on the available BAC-end sequences, and the resulting repeats were used to analyze both the BAC-end sequences and the complete BAC sequences. The de novo repeat set masked 57% of BAC-ends and 24% of full BAC sequence, indicating that the BACs selected from the euchromatin contain fewer repeats than the genome as a whole. These results support the recently described distribution of tomato repetitive sequences as determined by FISH (Chang et al., 2008). The fraction of long terminal repeat elements was much higher in BAC-ends (30%) than in the full BAC sequences (12.6%), indicating that there are large differences in the nature of repeats occurring in different genome regions.
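    Masked percentages such as these count each base once, even where repeat hits overlap. A minimal sketch of that calculation, with hypothetical hit coordinates on a half-open [start, end) axis; it is not the masking software used in the analysis:

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def masked_fraction(seq_len: int, repeat_hits: Iterable[Interval]) -> float:
    """Fraction of bases covered by at least one repeat hit."""
    covered = sum(e - s for s, e in merge(repeat_hits))
    return covered / seq_len


if __name__ == "__main__":
    hits = [(0, 100), (50, 200), (500, 600)]  # hypothetical repeat hits on a 1-kb read
    print(merge(hits))                  # [(0, 200), (500, 600)]
    print(masked_fraction(1000, hits))  # 0.3
```

    Without the merge step, the overlapping hits at 0-100 and 50-200 would be double counted and inflate the masked fraction.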

    The distribution of repeats and gene content on selected chromosomes, as defined by repeat analysis and EST coverage, is shown in Fig. 4. The information is reported only for those chromosomes for which Tiling Path Format files, which represent the tentative order of the BACs in the chromosome assembly as provided by the sequencing centers, are available at the SGN Web site to date. The following numbers of BACs were analyzed for each chromosome: chromosome 4, 94; chromosome 5, 35; chromosome 6, 100; chromosome 9, 43; and chromosome 12, 34. This analysis includes a number of BACs that were attributed to heterochromatin but nevertheless have been sequenced. The bars in each panel represent the percentage of nucleotides in a BAC that could be aligned to Solanum lycopersicum ESTs (blue bars) and repeat sequences (red bars). Figure 4 shows that repeats are much less abundant in the euchromatic arms and in some cases form a gradient of increasing density into the heterochromatin, whereas on other arms the transition appears less gradual. Also, in general, the gene-rich BACs have lower repeat content, supporting the general assumption that genes are predominantly present in the relatively repeat-poor euchromatin. The tomato heterochromatin harbors the bulk of the repetitive DNA fraction but nevertheless also contains some genes, as has been described by Yasuhara and Wakimoto (2006).

    Transcription factors (TFs) play key roles in the regulation of gene expression in various biological processes. The assembled ESTs (Plant Genome Database [PlantGDB]-assembled unique transcripts [PUTs]) of Solanum lycopersicum from PlantGDB were searched for putative TFs using hidden Markov model (HMM) profiles, which resulted in the identification of 1463 such PUTs that included 66 of the 71 known TF gene families. Considering that 40,000 genes are predicted in the tomato genome (Hoeven et al., 2002), this indicates that ~3.6% of the total genes in the euchromatic region may be TFs. For Arabidopsis, 5.9 to 7% (Riechmann et al., 2000; Riano-Pachon et al., 2007), and for rice, 4% (Goff et al., 2002; Riano-Pachon et al., 2007) of the total genes are TFs. Further, 237 PUTs (16%) encoding putative TFs could be mapped on 559 tomato BACs, representing around 56 Mb of sequenced tomato genome. On average, one TF gene is present in every 200 kb (assuming an average BAC size of 100 kb); see Table 1. Chromosomes 12 and 11 seem to harbor the highest and lowest density of TF genes, respectively. The three major TF gene families in tomato are AP2-EREBP (APETALA2-ethylene responsive element binding protein), MYB, and bHLH (basic helix-loop-helix) (not shown).
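    The proportions in this paragraph follow from the reported counts. A minimal sketch, reusing the text's own assumptions of 40,000 genes and an average BAC size of 100 kb:

```python
TF_PUTS = 1463         # PUTs with TF signatures
GENES = 40_000         # predicted genes, as used in the text
MAPPED_TF_PUTS = 237   # TF PUTs placed on sequenced BACs
BACS = 559             # BACs carrying those PUTs
ASSUMED_BAC_KB = 100   # average BAC size assumed in the text


def share(part: float, whole: float) -> float:
    """Simple proportion."""
    return part / whole


if __name__ == "__main__":
    print(f"TF share of genes: {share(TF_PUTS, GENES):.2%}")              # 3.66%
    print(f"TF PUTs mapped to BACs: {share(MAPPED_TF_PUTS, TF_PUTS):.0%}")  # 16%
    print(f"sequence spanned: {BACS * ASSUMED_BAC_KB / 1000:.1f} Mb")      # 55.9 Mb
```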

    Sequence analysis of cloned plant disease resistance genes (R-genes) conferring resistance to viral, bacterial, and fungal pathogens has shown that the majority of them possess common sequences and structural motifs. These R-genes can be grouped into three major classes (NBS-LRR type, LZ-NBS-LRR type, or LRR-Tm type) on the basis of their encoded protein motifs, such as leucine zippers (LZ), nucleotide binding sites (NBS), leucine-rich repeats (LRR), protein kinase domains, trans-membrane (Tm) domains, and Toll-IL-1R homology regions. We analyzed 48,945 unigene (PUT) sequences of tomato from PlantGDB for the presence of R-gene homologs by a BLASTX analysis against the nonredundant database of the National Center for Biotechnology Information (NCBI) and classified them into the above three categories. PUTs with matches to different putative R-genes, or to LRR motifs only, were grouped into a miscellaneous R-gene category. In addition, defense response genes such as glucanases, chitinases, and thaumatin-like proteins were also included in the analysis.
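    The classification rule described above can be sketched as a function over detected motif labels. The label strings and the order in which the classes are tested are illustrative assumptions; the actual assignments were based on BLASTX matches, not on this code:

```python
from typing import Iterable


def classify_r_gene(motifs: Iterable[str]) -> str:
    """Assign an R-gene homolog to a class from its detected protein motifs.

    Motif labels follow the abbreviations in the text: LZ, NBS, LRR, Tm.
    The most specific class (LZ-NBS-LRR) is tested first.
    """
    m = set(motifs)
    if {"LZ", "NBS", "LRR"} <= m:
        return "LZ-NBS-LRR"
    if {"NBS", "LRR"} <= m:
        return "NBS-LRR"
    if {"LRR", "Tm"} <= m:
        return "LRR-Tm"
    return "miscellaneous"  # e.g., LRR motifs only


if __name__ == "__main__":
    for motifs in (["LZ", "NBS", "LRR"], ["NBS", "LRR"], ["LRR", "Tm", "kinase"], ["LRR"]):
        print(motifs, "->", classify_r_gene(motifs))
```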


    Figure 3. Annotation categories for the annotated tomato genes from the International Tomato Annotation Group annotation pipelineand comparison to categories in Arabidopsis, poplar, and rice. (A) Annotation statistics categorized by higher-level gene ontology(GO) biological process terms. (B) Annotation statistics categorized by GO molecular function terms.


    We found a total of 155 annotations similar to resistance-like genes and 83 annotations with homology to defense-response-like genes (Fig. 5).

    These R-gene and defense-response gene homologs were mapped in silico onto the sequenced BACs of the different chromosomes to find their physical locations, resulting in the localization of 59 R-gene homologs and

    Figure 4. Gene and repeat coverage for selected tomato chromosomes (4, 5, 6, 9, and 12). The bacterial artificial chromosomes (BACs) are arranged in the order in which they appear along the chromosome. For each BAC, the percentages of expressed sequence tag (blue bars) and repeat (red bars) coverage are shown. The gray rectangle marks the pericentromeric heterochromatic region of each chromosome. The data shown in this figure are available for all chromosomes being sequenced through the Genome Overview at http://biosrv.cab.unina.it/GBrowse/ (verified 16 Jan. 2009) and are updated with each new BAC release in GenBank. Updated versions of this figure, including unordered BACs, are available at http://biosrv.cab.unina.it/GBrowse/Graphs/graphall1.html (verified 16 Jan. 2009).


    Tomato and Potato Assembly Assistance System

    The Tomato and Potato Assembly Assistance System (TOPAAS) was developed to automate the assembly and scaffolding of contig sequences for tomato chromosome 6 (Peters et al., 2006).

    Morgan2McClintock

    A tomato-specific data set was added to the Morgan2McClintock tool (Lawrence et al., 2006). This tool was implemented at the MaizeGDB database (http://www.maizegdb.org/) and initially used the maize recombination nodule map (Anderson et al., 2003, 2004) to calculate approximate chromosomal positions for loci given a genetic map for a single chromosome in maize. With the new data set (Chang et al., 2007), the tool can also be used for queries related to tomato.
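    The idea behind Morgan2McClintock can be illustrated with a short interpolation. Because recombination nodules (RNs) mark crossover sites, the cumulative RN distribution along a pachytene chromosome indicates where a given fraction of the genetic map should fall physically. The sketch below is not the MaizeGDB implementation; it assumes a hypothetical chromosome divided into equal-length segments with an RN count per segment, and uniform RN density within each segment.

```python
"""Interpolating genetic-map positions onto a pachytene chromosome (illustrative)."""
from bisect import bisect_left
from itertools import accumulate
from typing import Sequence


def cm_to_relative_position(cm: float, map_length_cm: float,
                            rn_counts: Sequence[int]) -> float:
    """Relative physical position (0 to 1 from one end) of a locus at `cm`.

    rn_counts[i] is the number of RNs observed in the i-th of several
    equal-length segments; RN density is assumed uniform within a segment.
    """
    if not 0 <= cm <= map_length_cm:
        raise ValueError("locus lies outside the genetic map")
    total = sum(rn_counts)
    if total == 0:
        raise ValueError("no recombination nodules recorded")
    target = cm / map_length_cm * total      # RNs expected before the locus
    cumulative = list(accumulate(rn_counts))
    i = min(bisect_left(cumulative, target), len(rn_counts) - 1)
    before = cumulative[i] - rn_counts[i]     # RNs in segments 0..i-1
    within = (target - before) / rn_counts[i] if rn_counts[i] else 0.0
    return (i + within) / len(rn_counts)


if __name__ == "__main__":
    # Hypothetical RN counts for 10 equal segments, sparse near the centromere.
    rns = [12, 10, 6, 2, 1, 1, 3, 7, 11, 13]
    for locus_cm in (0.0, 25.0, 50.0, 75.0, 100.0):
        pos = cm_to_relative_position(locus_cm, 100.0, rns)
        print(f"{locus_cm:5.1f} cM -> {pos:.2f} of chromosome length")
```

    Where RNs are sparse (here the central segments), a small genetic interval stretches over a long physical stretch, which is the effect such a tool is meant to capture.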

    UPadua PABS (Platform Assisted BAC-by-BAC Sequencing)

    The Platform Assisted BAC-by-BAC Sequencing pipeline (Todesco et al., 2008) is an informatics pipeline to optimize BAC-by-BAC sequencing projects.

    ISOL@

    An Italian SOLAnaceae genomics resource, ISOL@ (http://biosrv.cab.unina.it/isola/ [verified 16 Jan. 2009]), was designed to provide full Web access to details of the genome annotation based on experimental evidence derived from EST and full-length cDNA sequences (Chiusano et al., 2008).

    Summary and Outlook

    Recently, the Tomato Genome Sequencing Project has made substantial progress toward its goal of sequencing the 220 Mb of euchromatin in the tomato genome, which is predicted to contain the majority of tomato genes. In total, more than 950 BACs have been sequenced, representing over one-third of the targeted genome space. Sequences are being deposited at GenBank (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genomeprj&cmd=ShowDetailView&TermToSearch=9509 [verified 5 Feb. 2009]) and in the SGN database (http://sgn.cornell.edu/), and are being annotated with a pipeline established by an international group of bioinformatics centers (ITAG). A number of tools have been created that allow both researchers and tomato breeders to work with the emerging sequence. Through the extensive comparative maps that are available, much of the information from the tomato sequence can readily be transferred to other Solanaceae and to related asterids such as coffee (Coffea canephora L.) (Gentianales, Rubiaceae) or mint (Mentha) (Lamiales, Lamiaceae).

    A BAC-by-BAC sequencing approach was chosen to sequence the tomato genome because it provides the highest possible sequence quality. However, since the project was started, novel next-generation sequencing technologies have become available that are now being applied to WGS sequencing of complex genomes. The BAC-by-BAC approach has inherent advantages and yields insights beyond sequence space, because it is based on careful evaluation of BAC positions by genetic mapping and by FISH. For example, several inversions could be identified between cultivated tomato and the wild relative used as a parent of the reference map (Tang et al., 2008). The main drawback of the BAC-by-BAC approach is that it is more expensive and slower than the WGS approach. Recently, the grape genome was sequenced using a shotgun approach, resulting in >2000 unordered contigs; nevertheless, >95% of grape gene sequences were estimated to have been recovered (Velasco et al., 2007). Thus, in the future, a hybrid approach to sequencing the tomato genome will be pursued, using WGS as an additional resource for finishing the euchromatic part of the genome and for obtaining sequence for the heterochromatic part.

    A preliminary annotation of about 11% of the total assembled euchromatic space of tomato gives a gene density of one gene per 6 kb, which corresponds to an extrapolated gene count of just over 40,000 genes for the entire euchromatin, consistent with previous estimates. Notably, several well-known tomato genes have been recovered in the genome sequence, such as R-gene alleles at the Mi resistance locus, the fruit shape locus ovate, and the phytoene synthase 1 gene involved in carotenoid biosynthesis.
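    Extrapolating from gene density to a gene count is a single division, but the result is sensitive to how the density is rounded. The sketch below assumes the 220-Mb euchromatin target quoted in this section as the denominator; the euchromatin size actually used for the extrapolation is not stated here, so treat the numbers as illustrative.

```python
"""Extrapolating a gene count from gene density (illustrative arithmetic)."""

EUCHROMATIN_MB = 220  # targeted euchromatic space quoted in the text


def extrapolated_genes(euchromatin_mb: float, kb_per_gene: float) -> int:
    """Genes expected if gene density is uniform across the euchromatin."""
    return round(euchromatin_mb * 1000 / kb_per_gene)


def implied_kb_per_gene(euchromatin_mb: float, n_genes: int) -> float:
    """Average gene spacing (kb) implied by a given gene count."""
    return euchromatin_mb * 1000 / n_genes


if __name__ == "__main__":
    print(extrapolated_genes(EUCHROMATIN_MB, 6.0))       # 36667
    print(implied_kb_per_gene(EUCHROMATIN_MB, 40_000))   # 5.5
```

    A spacing of exactly 6 kb over 220 Mb gives about 36,700 genes, whereas ~40,000 genes corresponds to one gene per ~5.5 kb; a half-kilobase difference in the rounded density therefore shifts the total by several thousand genes.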

    The tomato genome is repeat-rich. Analyses of BAC-end sequences, which sampled both heterochromatin and euchromatin, revealed that about 70% of the sequence was repeat masked and hence largely represents heterochromatic repeats. In full BAC sequences, which were biased toward euchromatin, only 24% of the sequence was repeat masked, confirming earlier FISH results showing that the repeat content of heterochromatic and euchromatic regions differs significantly.

    Table 2. Disease resistance-like and defense response-like unigenes (Plant Genome Database assembled unique transcripts [PUTs]) mapped on the sequenced bacterial artificial chromosomes (BACs) of the 12 tomato chromosomes.

    Chromosome                                        1     2     3     4     5     6     7     8     9    10    11    12
    Chromosome size (Mbp)                           108  85.6  83.6  82.1    80  53.8  80.3  64.7  81.8  88.5
    No. of BACs sequenced (available at SGN)         19    91    15   105    42   126   100   127    57     4    18
    Disease-resistance-like genes (PUTs) mapped       1     6     0    12     3     9     4    11     7     0
    No. of resistance-like genes per BAC           0.05  0.07  0.00  0.11  0.07  0.07  0.04  0.09  0.12  0.00
    No. of defense-response-like unigenes mapped      0     5     0     0     7     1     2     1     1     0
    No. of defense-response-like genes per BAC        0  0.05     0     0  0.17  0.01  0.02  0.01  0.02     0
    Total mapped resistance-like and defense-response-like genes: 59

    SGN, SOL Genomics Network.

    For some chromosomes whose sequencing is advanced, difficulties were encountered in finding new seed BACs in the gap regions. Several initiatives have been put in place to increase the number of seed BACs, such as additional screening of BAC library filters with markers not used in the overgo process, computational mapping of BAC ends to marker sequences, and mapping of BACs on tomato chromosomes using introgression lines (Eshed and Zamir, 1995). To find novel cleaved amplified polymorphic sequence (CAPS) markers, BACs were selected that contain open reading frames or unique sequences at their ends. In a preliminary screen of a set of 120 such BACs, nearly 41% were successfully mapped to specific tomato chromosomes. The proposed procedure requires minimal cost and effort to generate new CAPS markers, and the identified BACs can be used directly for sequencing. The 200,000 fosmid-end sequences currently available have already proven extremely valuable for extending contigs from other sequenced BACs.
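    The text does not say how open reading frames were detected in BAC-end reads. A minimal sketch, assuming a plain six-frame scan for ATG-initiated ORFs and a hypothetical 300-nt minimum length (both are assumptions, not the project's criteria), could look like this; the alternative criterion, unique sequence at the BAC end, is not covered.

```python
"""Six-frame ORF screen for BAC-end reads (illustrative)."""

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_nt(seq: str) -> int:
    """Length in nt of the longest ATG-initiated ORF on either strand.

    ORFs still open at the end of the read are counted up to the last
    complete codon, since BAC-end reads often truncate genes.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)  # include the stop codon
                    start = None
            if start is not None:
                last_codon_end = frame + (len(strand) - frame) // 3 * 3
                best = max(best, last_codon_end - start)
    return best


def has_candidate_orf(read: str, min_orf_nt: int = 300) -> bool:
    """True if the read carries an ORF long enough to suggest a gene."""
    return longest_orf_nt(read) >= min_orf_nt


if __name__ == "__main__":
    read = "CC" + "ATG" + "GCT" * 110 + "TAA" + "GG"  # synthetic read
    print(longest_orf_nt(read), has_candidate_orf(read))
```

    Reads passing such a screen would then be candidates for CAPS marker design, which this sketch does not attempt.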

    Considerable synergies will be derived from the ongoing potato genome sequencing project. Potato, another important food staple in Solanum, is being sequenced by a separate but similarly structured consortium (http://www.potatogenome.net/ [verified 16 Jan. 2009]). The first sequences should be available this year. Within Solanum, tomato and potato are closely related: both belong to the same phylogenetic group of species, and only five major pericentromeric inversions have been observed between them (Tanksley et al., 1992). Because of this phylogenetic proximity, we expect that it will be possible to close sequence gaps in the tomato genome using potato data and vice versa. The two projects have a good working relationship and meet regularly at the annual SOL genome workshops. All data related to the tomato genome sequencing project can be found on SGN (http://sgn.cornell.edu/), and BAC sequences are deposited in GenBank (http://www.ncbi.nlm.nih.gov/). We expect the euchromatin sequence to be close to finished in 2010.

    Experimental Procedures

    Sequencing

    Data Availability and Sequencing Statistics

    All data, including BAC and BAC-end sequences, chromatograms, assembly files, FISH localizations, overgo results, and mapping data, are available on the SGN Web site (http://sgn.cornell.edu/). Sequence data are also available from GenBank (http://www.ncbi.nlm.nih.gov/).

    To track the progress of the project, a BAC registry database is run as a central resource on the SGN website. The sequencing teams have dedicated log-in accounts that allow them to assign BACs to their projects and then update the status of each BAC in their sequencing pipeline. From this information, summary statistics on project progress are calculated and displayed in real time on the International Tomato Sequencing Project overview page at http://sgn.cornell.edu/about/tomato_sequencing.pl [verified 16 Jan. 2009].
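    The registry workflow (claim a BAC, update its status, summarize progress) maps onto a small data model. The sketch below is a hypothetical in-memory version, not SGN's implementation: the status vocabulary, the BAC and team names, and the ownership rule are all assumptions chosen to mirror the description.

```python
"""Minimal in-memory BAC registry with per-status summaries (illustrative)."""
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    # Hypothetical states; the registry's actual vocabulary is not given.
    AVAILABLE = "available"
    IN_PROGRESS = "in progress"
    FINISHED = "finished"


@dataclass
class BacRecord:
    name: str
    chromosome: int
    project: Optional[str] = None
    status: Status = Status.AVAILABLE


class BacRegistry:
    def __init__(self) -> None:
        self._bacs: Dict[str, BacRecord] = {}

    def register(self, name: str, chromosome: int) -> None:
        self._bacs[name] = BacRecord(name, chromosome)

    def assign(self, name: str, project: str) -> None:
        """Claim a BAC for a sequencing project; refuse double assignment."""
        bac = self._bacs[name]
        if bac.project not in (None, project):
            raise PermissionError(f"{name} already assigned to {bac.project}")
        bac.project = project
        bac.status = Status.IN_PROGRESS

    def set_status(self, name: str, project: str, status: Status) -> None:
        """Only the owning project may update a BAC's status."""
        bac = self._bacs[name]
        if bac.project != project:
            raise PermissionError(f"{project} does not own {name}")
        bac.status = status

    def summary(self, chromosome: Optional[int] = None) -> Counter:
        """Count BACs per status, optionally for a single chromosome."""
        return Counter(
            bac.status for bac in self._bacs.values()
            if chromosome is None or bac.chromosome == chromosome
        )


if __name__ == "__main__":
    registry = BacRegistry()
    registry.register("BAC-001", 6)  # hypothetical BAC names
    registry.register("BAC-002", 6)
    registry.assign("BAC-001", "chr6-team")
    registry.set_status("BAC-001", "chr6-team", Status.FINISHED)
    print(registry.summary(6))
```

    Recomputing the summary from the records on every request, rather than storing counters, keeps the displayed statistics consistent with the per-BAC data, which matches the "calculated in real time" behavior described above.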

    Genome Annotation

    Repeat Database

    A comprehensive tomato-specific repeat database was generated by running RepeatScout (Price et al., 2005) on the BAC-end sequences of each library. The three repeat collections (one per BAC library) were assembled into one library using the cap3 program. The resulting set was assayed for repeat frequency in the entire BAC-end database, and repeats occurring fewer than 30 times were discarded. This set, referred to as the unirepeat set, was annotated using BLAST against different databases (The Institute for Genomic Research repeat set and GenBank Nonredundant) and was used to assess repeat content in BAC ends and in full BAC sequences.
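    Two simple computations underlie this step: discarding repeat families below the 30-occurrence cutoff, and measuring what fraction of a sequence ends up masked (the 70% and 24% figures reported for BAC ends and full BACs). The sketch below assumes the occurrence counts have already been obtained by searching the repeats against the BAC-end set, and that masked bases are written either in lowercase (soft-masked) or as N (hard-masked), the two conventions RepeatMasker can produce; the repeat names and counts are hypothetical.

```python
"""Repeat-library frequency filter and masked-fraction check (illustrative)."""
from typing import Dict, Mapping

MIN_OCCURRENCES = 30  # repeats seen fewer than 30 times were discarded


def filter_repeats(occurrences: Mapping[str, int],
                   min_occurrences: int = MIN_OCCURRENCES) -> Dict[str, int]:
    """Keep repeat families observed at least `min_occurrences` times."""
    return {name: n for name, n in occurrences.items() if n >= min_occurrences}


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked (lowercase) or hard-masked (N/n).

    Assumes the sequence contains no genuine N gaps; in unfinished
    sequence those would inflate the estimate.
    """
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base in "Nn")
    return masked / len(seq)


if __name__ == "__main__":
    counts = {"rep_1": 412, "rep_2": 29, "rep_3": 30}  # hypothetical counts
    print(filter_repeats(counts))  # {'rep_1': 412, 'rep_3': 30}
    print(masked_fraction("ACGTacgtacgtNN"))
```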

    ITAG Genome Annotation Pipeline

    The ITAG annotation pipeline operates on batches of contigs composed of one or more BACs. These contigs are generated at SGN from the AGP files and the BAC sequences. Analyses such as repeat masking, EST alignment, and gene prediction with different gene finders, such as GeneID (Parra et al., 2000), GeneMark (Isono et al., 1994), and Augustus (Stanke et al., 2008), are performed on these BACs. To generate a consensus annotation, these data are combined with homology to protein or genomic sequences from other species (BlastX, TblastX) and fed into the combiner software EuGene (Foissac et al., 2008). The resulting gene models are then functionally annotated based on homology searches (BlastP), protein domain searches (InterPro) (Mulder et al., 2003), and gene ontology assignment (Ashburner et al., 2000). Noncoding RNAs were identified using the Infernal program (Griffiths-Jones et al., 2003).
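    To make "combining" evidence concrete, the sketch below shows a much simpler substitute for what a combiner does: an exact-match majority vote over exon coordinates reported by several sources. EuGene's own evidence-integration model is more elaborate and is not reproduced here; the source labels, coordinates, and `min_support` threshold are hypothetical, and no gene finder is actually invoked.

```python
"""Toy evidence combination by exon-level majority vote (not EuGene)."""
from collections import Counter
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive; one strand assumed


def consensus_exons(predictions: Dict[str, List[Exon]],
                    min_support: int = 2) -> List[Exon]:
    """Return exons reported identically by at least `min_support` sources."""
    support: Counter = Counter()
    for exons in predictions.values():
        support.update(set(exons))  # each source votes once per exon
    return sorted(exon for exon, n in support.items() if n >= min_support)


if __name__ == "__main__":
    evidence = {  # hypothetical coordinates on one contig
        "GeneID":   [(100, 200), (300, 400)],
        "GeneMark": [(100, 200), (310, 400)],
        "Augustus": [(100, 200), (300, 400)],
        "EST":      [(300, 400)],
    }
    print(consensus_exons(evidence))  # [(100, 200), (300, 400)]
```

    Exact coordinate matching is the main simplification: a real combiner has to reconcile exons that differ by a few bases, as the (310, 400) prediction does here.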

    Estimation of Transcription Factors in Tomato Genome Using Expressed Sequence Tags

    To search for putative TFs in the EST data sets of Solanum lycopersicum, the assembled ESTs from PlantGDB (version 161a, September 2007 release; 257,093 ESTs assembled into 48,945 PUTs) were downloaded and translated using ESTScan-3.0.2 (Iseli et al., 1999). These translated PUTs were categorized into TF gene families following the classification process defined by two plant transcription factor databases, PlnTFDB (Riano-Pachon et al., 2007) and PlantTFDB (http://planttfdb.cbi.pku.edu.cn/ [verified 16 Jan. 2009]). A list of the domains necessary for classifying a TF into a particular gene family was prepared, and the available HMM profiles were downloaded from PFAM (v22.0 [Finn et al., 2008]). HMM profiles for the remaining domains were created from the protein alignments available at PlnTFDB. HMMER searches (http://hmmer.janelia.org/ [verified 16 Jan. 2009]) were performed on the translated PUTs using these HMM profiles, and hits with E-values of 10^-2 or lower were selected. These putative TFs were then localized on 559 tomato BACs (finished and unfinished BAC sequences downloaded from SGN [bacs v205]) by performing BLASTN with selection criteria of 90% identity and 80% length coverage.
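    The two numeric filters in this procedure (the HMMER E-value cutoff and the BLASTN identity/coverage criteria) can be sketched as follows. The sketch assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) for the BLASTN step, with PUT lengths supplied separately; it interprets "80% length coverage" as coverage of the PUT (the query) and treats the E-value cutoff as inclusive. HMMER domain hits are assumed to be already parsed into (PUT, domain, E-value) tuples. All of these interpretations are assumptions.

```python
"""Hit filters matching the thresholds quoted in the text (illustrative)."""
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class BlastHit:
    query: str
    subject: str
    pident: float
    length: int
    qstart: int
    qend: int


def parse_outfmt6(lines: Iterable[str]) -> Iterator[BlastHit]:
    """Parse BLAST+ tabular output (-outfmt 6, default 12 columns)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield BlastHit(f[0], f[1], float(f[2]), int(f[3]), int(f[6]), int(f[7]))


def localized(hits: Iterable[BlastHit], query_lengths: Dict[str, int],
              min_identity: float = 90.0,
              min_coverage: float = 0.80) -> List[BlastHit]:
    """Keep BLASTN hits with >=90% identity covering >=80% of the PUT."""
    kept = []
    for hit in hits:
        coverage = (abs(hit.qend - hit.qstart) + 1) / query_lengths[hit.query]
        if hit.pident >= min_identity and coverage >= min_coverage:
            kept.append(hit)
    return kept


def significant_domains(domain_hits: Iterable[Tuple[str, str, float]],
                        max_evalue: float = 1e-2) -> List[Tuple[str, str, float]]:
    """Keep (PUT, domain, E-value) hits at or below the E-value cutoff."""
    return [hit for hit in domain_hits if hit[2] <= max_evalue]


if __name__ == "__main__":
    blastn_lines = [  # hypothetical PUT-versus-BAC hits
        "put1\tbacA\t95.0\t850\t10\t2\t1\t850\t1000\t1849\t0.0\t1500",
        "put2\tbacB\t89.0\t950\t90\t3\t1\t950\t1\t950\t0.0\t900",
    ]
    lengths = {"put1": 1000, "put2": 1000}
    print([h.query for h in localized(parse_outfmt6(blastn_lines), lengths)])
```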

    Analysis of Resistance and Defense-Response-Like Genes

    We analyzed the 48,945 PUT sequences of tomato downloaded from PlantGDB (Duvick et al., 2008). All PUTs were used in a BLASTX search against the NCBI nonredundant database (http://www.ncbi.nlm.nih.gov/), and the top hits for all genes were extracted in tabular form. Genes showing homology to the three major classes of R-genes mentioned above (NBS-LRR type, LZ-NBS-LRR type, and LRR-TM type), to other putative resistance proteins, or to defense-response genes, five categories in total, were tabulated in Microsoft Excel (Microsoft, Redmond, WA). These R-gene and defense-response gene homologs were then mapped in silico onto 754 sequenced BACs of the respective chromosomes to find their physical locations.
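    Extracting the top hit per PUT and sorting it into the five categories can be sketched as below. This assumes BLAST+ tabular output with subject titles, requested as `-outfmt "6 qseqid sseqid bitscore stitle"` (the original 2008 analysis may have used a different BLAST version and format), and a hypothetical keyword table; only the defense-response keywords (glucanase, chitinase, thaumatin) come from the text.

```python
"""Top-hit extraction and keyword categorization (illustrative)."""
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical keyword table, checked in order so that "lz-nbs-lrr" wins
# over its substring "nbs-lrr". Defense-response keywords follow the text.
CATEGORY_KEYWORDS: List[Tuple[str, Tuple[str, ...]]] = [
    ("LZ-NBS-LRR", ("lz-nbs-lrr",)),
    ("NBS-LRR", ("nbs-lrr", "nb-lrr")),
    ("LRR-TM", ("lrr-tm",)),
    ("defense response", ("glucanase", "chitinase", "thaumatin")),
    ("other resistance", ("resistance protein", "disease resistance")),
]


def best_hits(lines: Iterable[str]) -> Dict[str, Tuple[float, str]]:
    """Highest-bitscore (score, title) per query from 4-column tabular lines."""
    best: Dict[str, Tuple[float, str]] = {}
    for line in lines:
        if not line.strip():
            continue
        query, _subject, bitscore, title = line.rstrip("\n").split("\t", 3)
        score = float(bitscore)
        if query not in best or score > best[query][0]:
            best[query] = (score, title)
    return best


def categorize(title: str) -> Optional[str]:
    """First category whose keyword appears in the hit title, if any."""
    lowered = title.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def tally(lines: Iterable[str]) -> Counter:
    """Count PUTs per category, skipping PUTs whose top hit matches none."""
    counts: Counter = Counter()
    for _score, title in best_hits(lines).values():
        category = categorize(title)
        if category is not None:
            counts[category] += 1
    return counts


if __name__ == "__main__":
    blastx_lines = [  # hypothetical top-hit rows
        "put1\tgi|1\t250.0\tputative NBS-LRR resistance protein",
        "put1\tgi|2\t300.0\tLZ-NBS-LRR class RGA",
        "put2\tgi|3\t120.0\tacidic chitinase",
    ]
    print(tally(blastx_lines))
```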

    Acknowledgments

    Financial sources: Sequencing of chromosome 2 in Korea is supported by the Crop Functional Genomic Center, a Frontier 21 Project of the MOES of the Korean government. Chromosome 3 is being sequenced with support of the Chinese Academy of Sciences. Chromosome 4 is being sequenced at the Wellcome Trust Sanger Institute in the United Kingdom with the support of BBSRC/DEFRA and RERAD. The Wellcome Trust Sanger Institute is funded by the Wellcome Trust. Biodiversity work in Solanum at the NHM is supported by the NSF PBI program through award DEB-0316614, PBI Solanum: a worldwide treatment. Chromosome 5 is sequenced by the Indian Initiative on Tomato Genome Sequencing (IITGS), funded by the Department of Biotechnology, Government of India, and supported by the Indian Council of Agricultural Research, New Delhi. Chromosome 6 is sequenced with the support of the European Commission (EU-SOL Project PL 016214) and by the Centre for BioSystems Genomics (CBSG), which is part of the Netherlands Genomics Initiative/Netherlands Organisation for Scientific Research. Chromosome 7 sequencing is funded by the National Institute of Agronomic Research (INRA, France) and the National Research Funding Agency (ANR, France). Chromosome 8 sequencing is supported by the Chiba Prefecture, Japan. Chromosome 9 sequencing is supported by Genoma España. Chromosome 11 is supported by the Chinese Academy of Sciences. Chromosome 12 is sequenced with the support of the Italian Ministry of Agriculture (Agronanotech Project), the Italian Ministry of Research (FIRB Project), and the EU (EU-SOL project). The U.S. group is supported by the National Science Foundation, USA, grants DBI-0421634 and DBI-0606595. We would like to acknowledge the contribution of the following people at the Wellcome Trust Sanger Institute: Matthew Jones (Shotgun Library Construction), Karen Oliver (Fosmid End Sequencing), Sarah Sims (Shotgun Data Production), Stuart McLaren (Automated Sequence Improvement), and Christine Lloyd (Finishing Quality Control).

    References

    Adams, M.D., J.M. Kelley, J.D. Gocayne, M. Dubnick, M.H. Polymeropoulos, H. Xiao, C.R. Merril, A. Wu, B. Olde, and R.F. Moreno. 1991. Complementary DNA sequencing: Expressed sequence tags and human genome project. Science 252:1651–1656.

    Adams-Phillips, L., C. Barry, and J. Giovannoni. 2004. Signal transduction systems regulating fruit ripening. Trends Plant Sci. 9:331–338.

    AGI. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815.

    Alexander, L., and D. Grierson. 2002. Ethylene biosynthesis and action in tomato: A model for climacteric fruit ripening. J. Exp. Bot. 53:2039–2055.

    Anderson, L.K., G.G. Doyle, B. Brigham, J. Carter, K.D. Hooker, A. Lai, M. Rice, and S.M. Stack. 2003. High-resolution crossover maps for each bivalent of Zea mays using recombination nodules. Genetics 165:849–865.

    Anderson, L.K., N. Salameh, H.W. Bass, L.C. Harper, W.Z. Cande, G. Weber, and S.M. Stack. 2004. Integrating genetic linkage maps with pachytene chromosome structure in maize. Genetics 166:1923–1933.

    Angiosperm Phylogeny Group. 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linnean Soc. 141:399–436.

    Ashburner, M., C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and G. Sherlock. 2000. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25:25–29.

    Asp, T., U.K. Frei, T. Didion, K.K. Nielsen, and T. Lubberstedt. 2007. Frequency, type, and distribution of EST-SSRs from three genotypes of Lolium perenne, and their conservation across orthologous sequences of Festuca arundinacea, Brachypodium distachyon, and Oryza sativa. BMC Plant Biol. 7:36.

    Bachem, C.W., R.S. van der Hoeven, S.M. de Bruijn, D. Vreugdenhil, M. Zabeau, and R.G. Visser. 1996. Visualization of differential gene expression using a novel method of RNA fingerprinting based on AFLP: Analysis of gene expression during potato tuber development. Plant J. 9:745–753.

    Bai, Y., C.C. Huang, R. van der Hulst, F. Meijer-Dekens, G. Bonnema, and P. Lindhout. 2003. QTLs for tomato powdery mildew resistance (Oidium lycopersici) in Lycopersicon parviflorum G1.1601 co-localize with two qualitative powdery mildew resistance genes. Mol. Plant Microbe Interact. 16:169–176.

    Bennett, M.D., I.J. Leitch, H.J. Price, and J.S. Johnston. 2003. Comparisons with Caenorhabditis (~100 Mb) and Drosophila (~175 Mb) using flow cytometry show genome size in Arabidopsis to be ~157 Mb and thus ~25% larger than the Arabidopsis Genome Initiative estimate of ~125 Mb. Ann. Bot. (Lond.) 91:547–557.

    Bogdanove, A.J., and G.B. Martin. 2000. AvrPto-dependent Pto-interacting proteins and AvrPto-interacting proteins in tomato. Proc. Natl. Acad. Sci. USA 97:8836–8840.

    Brummell, D.A., and M.H. Harpster. 2001. Cell wall metabolism in fruit softening and quality and its manipulation in transgenic plants. Plant Mol. Biol. 47:311–340.

    Budiman, M.A., L. Mao, T.C. Wood, and R.A. Wing. 2000. A deep-coverage tomato BAC library and prospects toward development of an STC framework for genome sequencing. Genome Res. 10:129–136.

    Cannon, S.B., L. Sterck, S. Rombauts, S. Sato, F. Cheung, J. Gouzy, X. Wang, J. Mudge, J. Vasdewani, T. Schiex, M. Spannagl, E. Monaghan, C. Nicholson, S.J. Humphray, H. Schoof, K.F. Mayer, J. Rogers, F. Quetier, G.E. Oldroyd, F. Debelle, D.R. Cook, E.F. Retzel, B.A. Roe, C.D. Town, S. Tabata, Y. Van de Peer, and N.D. Young. 2006. Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc. Natl. Acad. Sci. USA 103:14959–14964.

    Chang, S.B., L.K. Anderson, J.D. Sherman, S.M. Royer, and S.M. Stack. 2007. Predicting and testing physical locations of genetically mapped loci on tomato pachytene chromosome 1. Genetics 176:2131–2138.

    Chang, S.B., T.J. Yang, E. Datema, J. van Vugt, B. Vosman, A. Kuipers, M. Meznikova, D. Szinay, R. Klein Lankhorst, E. Jacobsen, and H. de Jong. 2008. FISH mapping and molecular organization of the major repetitive sequences of tomato. Chromosome Res. 16:919–933.


    2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–380.

    Mueller, L.A., C.D. Goodman, R.A. Silady, and V. Walbot. 2000. AN9, a petunia glutathione S-transferase required for anthocyanin sequestration, is a flavonoid-binding protein. Plant Physiol. 123:1561–1570.

    Mueller, L.A., A.A. Mills, B. Skwarecki, R.M. Buels, N. Menda, and S.D. Tanksley. 2008. The SGN comparative map viewer. Bioinformatics 24:422–423.

    Mulder, N.J., R. Apweiler, R.K. Attwood, A. Bairoch, and D. Barrell. 2003. The Interpro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 31:315–318.

    Oksman-Caldentey, K.M. 2007. Tropane and nicotine alkaloid biosynthesis: Novel approaches towards biotechnological production of plant-derived pharmaceuticals. Curr. Pharm. Biotechnol. 8:203–210.

    Olmstead, R.G., J.A. Sweere, R.E. Spangler, L. Bohs, and J.D. Palmer. 1999. Phylogeny and provisional classification of the Solanaceae based on chloroplast DNA. p. 111–137. In M. Nee, D.E. Symon, R.N. Lester, and J.P. Jessop (ed.) Solanaceae IV, Advances in biology and utilization. Royal Botanic Gardens, Kew, UK.

    Palmer, L.E., P.D. Rabinowicz, A.L. O'Shaughnessy, V.S. Balija, L.U. Nascimento, S. Dike, M. de la Bastide, R.A. Martienssen, and W.R. McCombie. 2003. Maize genome sequencing by methylation filtration. Science 302:2115–2117.

    Parra, G., E. Blanco, and R. Guigo. 2000. GeneID in Drosophila. Genome Res. 10:511–515.

    Pedley, K.F., and G.B. Martin. 2003. Molecular basis of Pto-mediated resistance to bacterial speck disease in tomato. Annu. Rev. Phytopathol. 41:215–243.

    Peters, S.A., J.C. van Haarst, T.P. Jesse, D. Woltinge, K. Jansen, T. Hesselink, M.J. van Staveren, M.H. Abma-Henkens, and R.M. Klein-Lankhorst. 2006. TOPAAS, a tomato and potato assembly assistance system for selection and finishing of bacterial artificial chromosomes. Plant Physiol. 140:805–817.

    Peterson, D.G., S.R. Schulze, E.B. Sciara, S.A. Lee, J.E. Bowers, A. Nagel, N. Jiang, D.C. Tibbitts, S.R. Wessler, and A.H. Paterson. 2002. Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Genome Res. 12:795–807.

    Peterson, D.G., S.M. Stack, H.J. Price, and J.S. Johnston. 1996. DNA content of heterochromatin and euchromatin in tomato (Lycopersicon esculentum) pachytene chromosomes. Genome 39:77–82.

    Prat, S., W.B. Frommer, R. Hofgen, M. Keil, J. Kossmann, M. Koster-Topfer, X.J. Liu, B. Muller, H. Pena-Cortes, and M. Rocha-Sosa. 1990. Gene expression during tuber development in potato plants. FEBS Lett. 268:334–338.

    Price, A.L., N.C. Jones, and P.A. Pevzner. 2005. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl. 1):i351–i358.

    Quattrocchio, F., W. Verweij, A. Kroon, C. Spelt, J. Mol, and R. Koes. 2006. PH4 of Petunia is an R2R3 MYB protein that activates vacuolar acidification through interactions with basic-helix-loop-helix transcription factors of the anthocyanin pathway. Plant Cell 18:1274–1291.

    Riano-Pachon, D.M., S. Ruzicic, I. Dreyer, and B. Mueller-Roeber. 2007. PlnTFDB: An integrative plant transcription factor database. BMC Bioinformatics 8:42.

    Riechmann, J.L., J. Heard, G. Martin, L. Reuber, C. Jiang, J. Keddie, L. Adam, O. Pineda, O.J. Ratcliffe, R.R. Samaha, R. Creelman, M. Pilgrim, P. Broun, J.Z. Zhang, D. Ghandehari, B.K. Sherman, and G. Yu. 2000. Arabidopsis transcription factors: Genome-wide comparative analysis among eukaryotes. Science 290:2105–2110.

    Sacco, M.A., S. Mansoor, and P. Moffett. 2007. A RanGAP protein physically interacts with the NB-LRR protein Rx, and is required for Rx-mediated viral resistance. Plant J. 52:82–93.

    Schijlen, E., C.H. Ric de Vos, H. Jonker, H. van den Broeck, J. Molthoff, A. van Tunen, S. Martens, and A. Bovy. 2006. Pathway engineering for healthy phytochemicals leading to the production of novel flavonoids in tomato fruit. Plant Biotechnol. J. 4:433–444.

    Seymour, G., M. Poole, K. Manning, and G.J. King. 2008. Genetics and epigenetics of fruit development and ripening. Curr. Opin. Plant Biol. 11:58–63.

    Shendure, J., G.J. Porreca, N.B. Reppas, X. Lin, J.P. McCutcheon, A.M. Rosenbaum, M.D. Wang, K. Zhang, R.D. Mitra, and G.M. Church. 2005. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309:1728–1732.

    Soderlund, C., S. Humphray, A. Dunham, and L. French. 2000. Contigs built with fingerprints, markers, and FPC V4.7. Genome Res. 10:1772–1787.

    Spelt, C., F. Quattrocchio, J. Mol, and R. Koes. 2002. ANTHOCYANIN1 of petunia controls pigment synthesis, vacuolar pH, and seed coat development by genetically distinct mechanisms. Plant Cell 14:2121–2135.

    Stanke, M., M. Diekhans, R. Baertsch, and D. Haussler. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637–644.

    Szinay, D., S.B. Chang, L. Khrustaleva, S. Peters, E. Schijlen, Y. Bai, W.J. Stiekema, R.C. van Ham, H. de Jong, and R.M. Klein Lankhorst. 2008. High-resolution chromosome mapping of BACs using multi-colour FISH and pooled-BAC FISH as a backbone for sequencing tomato chromosome 6. Plant J. 56:627–637.

    Tang, X., D. Szinay, C. Lang, M.S. Ramanna, E.A. van der Vossen, E. Datema, R. Klein Lankhorst, J. de Boer, S.A. Peters, C. Bachem, W. Stiekema, R.G. Visser, H. de Jong, and Y. Bai. 2008. Cross-species BAC-FISH painting of the tomato and potato chromosome 6 reveals undescribed chromosomal rearrangements. Genetics 180:1319–1328.

    Tanksley, S.D. 2004. The genetic, developmental, and molecular bases of fruit size and shape variation in tomato. Plant Cell 16(Suppl.):S181–S189.

    Tanksley, S.D., M.W. Ganal, J.P. Prince, M.C. de Vicente, M.W. Bonierbale, P. Broun, T.M. Fulton, J.J. Giovannoni, S. Grandillo, and G.B. Martin. 1992. High density molecular linkage maps of the tomato and potato genomes. Genetics 132:1141–1160.

    Todesco, S., D. Campagna, F. Levorin, M. D'Angelo, R. Schiavon, G. Valle, and A. Vezzi. 2008. PABS: An online platform to assist BAC-by-BAC sequencing projects. Biotechniques 44:60, 62, 64.

    Tuskan, G.A., S. DiFazio, S. Jansson, J. Bohlmann, I. Grigoriev, U. Hellsten, N. Putnam, S. Ralph, S. Rombauts, A. Salamov, J. Schein, L. Sterck, A. Aerts, R.R. Bhalerao, R.P. Bhalerao, D. Blaudez, W. Boerjan, A. Brun, A. Brunner, V. Busov, M. Campbell, J. Carlson, M. Chalot, J. Chapman, G.L. Chen, D. Cooper, P.M. Coutinho, J. Couturier, S. Covert, Q. Cronk, R. Cunningham, J. Davis, S. Degroeve, A. Dejardin, C. Depamphilis, J. Detter, B. Dirks, I. Dubchak, S. Duplessis, J. Ehlting, B. Ellis, K. Gendler, D. Goodstein, M. Gribskov, J. Grimwood, A. Groover, L. Gunter, B. Hamberger, B. Heinze, Y. Helariutta, B. Henrissat, D. Holligan, R. Holt, W. Huang, N. Islam-Faridi, S. Jones, M. Jones-Rhoades, R. Jorgensen, C. Joshi, J. Kangasjarvi, J. Karlsson, C. Kelleher, R. Kirkpatrick, M. Kirst, A. Kohler, U. Kalluri, F. Larimer, J. Leebens-Mack, J.C. Leple, P. Locascio, Y. Lou, S. Lucas, F. Martin, B. Montanini, C. Napoli, D.R. Nelson, C. Nelson, K. Nieminen, O. Nilsson, V. Pereda, G. Peter, R. Philippe, G. Pilate, A. Poliakov, J. Razumovskaya, P. Richardson, C. Rinaldi, K. Ritland, P. Rouze, D. Ryaboy, J. Schmutz, J. Schrader, B. Segerman, H. Shin, A. Siddiqui, F. Sterky, A. Terry, C.J. Tsai, E. Uberbacher, P. Unneberg, J. Vahala, K. Wall, S. Wessler, G. Yang, T. Yin, C. Douglas, M. Marra, G. Sandberg, Y. Van de Peer, and D. Rokhsar. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596–1604.

    van der Vossen, E.A., J.N. van der Voort, K. Kanyuka, A. Bendahmane, H. Sandbrink, D.C. Baulcombe, J. Bakker, W.J. Stiekema, and R.M. Klein-Lankhorst. 2000. Homologues of a single resistance-gene cluster in potato confer resistance to distinct pathogens: A virus and a nematode. Plant J. 23:567–576.

    van Os, H., S. Andrzejewski, E. Bakker, I. Barrena, G.J. Bryan, B. Caromel, B. Ghareeb, E. Isidore, W. de Jong, P. van Koert, V. Lefebvre, D. Milbourne, E. Ritter, J.N. van der Voort, F. Rousselle-Bourgeois, J. van Vliet, R. Waugh, R.G. Visser, J. Bakker, and H.J. van Eck. 2006. Construction of a 10,000-marker ultradense genetic
