letter
194 nature genetics • volume 29 • october 2001
© 2001 Nature Publishing Group http://genetics.nature.com

A radiation hybrid transcript map of the mouse genome

Philip Avner 1,2, Thomas Bruls 1, Isabelle Poras 1, Lorraine Eley 3, Shahinaz Gas 1, Patricia Ruiz 4, Michael V. Wiles 5,10, Rita Sousa-Nunes 6, Ross Kettleborough 6, Amer Rana 6, Jean Morissette 4, Liz Bentley 3, Michelle Goldsworthy 3, Alison Haynes 3, Eifion Herbert 3, Lorraine Southam 3, Hans Lehrach 5, Jean Weissenbach 1, Giacomo Manenti 7, Patricia Rodriguez-Tome 8,10, Rosa Beddington*, Sally Dunwoodie 6,9 & Roger D. Cox 3

1 Genoscope, Centre National de Sequençage and CNRS UMR 8030, CP 5706, 91057 Evry Cedex, France.
2 Unité de Génétique Moléculaire Murine, Institut Pasteur, Paris, France.
3 Mammalian Genetics Unit, Medical Research Council, Harwell, UK.
4 Centre de Recherche du CHUL, Ste-Foy, Quebec, Canada.
5 Max-Planck Institute of Molecular Genetics, Ihnestrasse, Berlin, Germany.
6 Laboratory of Mammalian Development, National Institute for Medical Research, The Ridgeway, Mill Hill, London, UK.
7 Instituto Nazionale Tumori, Division of Oncology, Milano, Italy.
8 The European Bioinformatics Institute, European Molecular Biology Laboratory Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
9 The Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, Australia.
10 Present addresses: Deltagen Inc., Menlo Park, California, USA (M.V.W.); Geneva Proteomics Inc., Meyringe, Switzerland (P.R.-T.).
*Deceased. Correspondence should be addressed to P.A. (e-mail: [email protected]).

Expressed-sequence tag (EST) maps are an adjunct to sequence-based analytical methods of gene detection and localization for those species for which such data are available, and provide anchors for high-density homology and orthology mapping in species for which large-scale sequencing has yet to be done (ref. 1). Species for which radiation hybrid–based transcript maps have been established include human (ref. 2), rat (refs 3–5), mouse (ref. 6), dog (ref. 7), cat (ref. 8) and zebrafish (refs 9,10). We have established a comprehensive first-generation placement radiation hybrid map of the mouse consisting of 5,904 mapped markers (3,993 ESTs and 1,911 sequence-tagged sites (STSs)). The mapped ESTs, which often originate from small EST clusters, are enriched for genes expressed during early mouse embryogenesis and are probably different from those localized in humans. We have confirmed by in situ hybridization that even singleton ESTs, which are usually not retained for mapping studies, may represent bona fide transcribed sequences. Our studies on mouse chromosomes 12 and 14 orthologous to human chromosome 14 show the power of our radiation hybrid map as a predictive tool for orthology mapping in humans.

To ensure the mapping of novel embryonic transcripts, we sequenced the Beddington endoderm cDNA library derived from a 7.5-days post coitum (dpc) gastrulating embryo. Some 4,000 EST sequences from this library and 200 sequences from a previously analyzed embryonic library (refs 11,12) were examined by clustering homologous sequences into groups corresponding to putative single genes and determining their expression profile by computational analysis. Approximately 18% were novel; after clustering, 108 sequences remained as unique single sequences (singletons). Although singletons are often considered to be DNA contaminants and omitted from EST mapping programs, they may be transcripts expressed at low levels, or transcripts that are poorly represented in the nucleotide database because they are specific to a cell type or are poor substrates for reverse transcriptase. To address the biological significance of these singletons, we generated probes for in situ hybridization from six of the EST-sequence cDNAs and examined their expression in the early embryo. Three of the ESTs (AL023051, AL033345, AL022911) show highly restricted patterns of hybridization, one (AL034928) is relatively widespread in its expression and two (AL023012, AL023075) show ubiquitous hybridization (Fig. 1). These findings suggest that many of the singletons that remain after clustering are likely to be bona fide cDNAs. In situ hybridization of a larger sample of 350 cDNAs from the embryonic libraries shows that 80% have widespread or ubiquitous patterns of expression in the 7.5-dpc embryo.

Fig. 1 Whole-mount in situ hybridization: singletons represent bona fide gene transcripts. Whole-mount RNA in situ hybridization of mouse embryos (lateral view) using probes generated from ESTs (a,b) AL022911, (c) AL023051, (d,e) AL033345 and (f,g) AL034928. Singletons represent bona fide transcripts. a, At 7.5 dpc, AL022911 transcripts are localized to the head process. b, At 9.5 dpc, they are found in the otic vesicle, the branchial arches and isolated cells in the midbrain, ventral to the heart and adjacent to the neural tube. c, At 7.5 dpc (left), AL023051 transcripts are restricted to the visceral endoderm, and at 9.5 dpc (right) to its descendant, the yolk sac. d, At 7.5 dpc, AL033345 transcripts are restricted to the nascent mesoderm and primitive streak. e, At 9.5 dpc, they are localized to the midbrain, dorsal neural tube, pharyngeal pouches and presomitic mesoderm. f, 7–7.5-dpc embryos show widespread but non-ubiquitous localization of AL034928 transcripts. Embryos developed for a short time show expression in the anterior visceral endoderm (arrow), whereas those developed for longer (bottom) show the extent of gene expression. g, At 9.5 dpc, AL034928 transcript localization is widespread but is absent from the heart, yolk sac and neuroepithelium.

For high-throughput radiation hybrid marker typing in a 96-well plate format, we selected 90 radiation hybrids from the original panel of 94 (ref. 13) to permit inclusion of appropriate control DNAs. We selected more than 2,800 SSLP markers for the framework map from the MIT (96-meiosis F2 intercross) and EUCIB (1,000-progeny backcross) mouse genetic maps (refs 14,15), of which 2,230 yielded valid panel typings. The observed average marker retention frequency of 30.5% is consistent with previously published results for the T31 radiation hybrid panel (refs 6,13,16).

The X chromosome shows the lowest retention frequency (expected because the male donor cell line that was irradiated contained a single copy of the X chromosome) and chromosome 19, the smallest chromosome, shows the highest (Table 1). Construction of the framework map resulted in a series of extended framework maps that comprise 1,238 markers.
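The retention frequencies discussed above can be derived directly from the typing data. The following is a minimal sketch, assuming each marker is stored as a string of 0/1/2 vector scores (the encoding described in Methods) and that ambiguous scores (2) are simply excluded from the denominator; the function names and the example vectors are hypothetical and are not taken from the authors' software.

```python
from typing import Iterable


def retention_frequency(vector: str) -> float:
    """Fraction of informative hybrids (scored 0 or 1) that retain the marker."""
    if set(vector) - set("012"):
        raise ValueError("vector scores must be 0, 1 or 2")
    # Assumption: ambiguous typings (2) are ignored rather than counted as absent.
    informative = [score for score in vector if score in "01"]
    if not informative:
        raise ValueError("no informative scores in vector")
    return informative.count("1") / len(informative)


def average_retention(vectors: Iterable[str]) -> float:
    """Mean retention frequency over a set of markers (e.g. one chromosome)."""
    values = [retention_frequency(v) for v in vectors]
    if not values:
        raise ValueError("no vectors supplied")
    return sum(values) / len(values)


# Hypothetical 10-hybrid vectors for illustration; the panel used here has 90.
example_vectors = ["1001000201", "0010010000"]
print(round(average_retention(example_vectors), 3))
```

Grouping the vectors by chromosome before calling `average_retention` would give per-chromosome values comparable in kind to the "Av. ret" column of Table 1.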

We produced comprehensive EST placement maps by mapping chromosome-assigned ESTs and STSs against the framework-map intervals by multipoint maximum likelihood analysis. This resulted in a placement map of 3,446 markers. We also produced a map using the traveling-salesman problem (TSP) approach. Chromosome maps produced in this way generally contained fewer markers, as the reordering algorithm caused the more error-prone markers to be discarded from the map. The maps produced with the two approaches, however, are highly consistent (Fig. 2), indicating that data quality and not algorithmic approach is the critical factor in producing high-quality maps. Of the markers on the maximum likelihood map of chromosome 5 (Fig. 2) with odds higher than 1,000:1 that are also on the TSP map, 87% show the same relative order on both maps. Typically, about 80% of such markers on other chromosomes also show the same relative order. Overall, using our build criteria, we were unable to assign about 7% of validated vector scores to a chromosome, and among the markers assigned, we were unable to precisely localize 15%. Of the assigned markers, we found that 40% were localized to both ends of the chromosomes, and eliminated these as potentially error-prone vector scores. This build contains 1,803 ESTs and 1,643 STSs. Many of the mapped ESTs included in the comprehensive map are derived from small EST sequence clusters, compatible with their being derived from genes that have both a restricted transcriptional profile and low copy number in the cell (data not shown).
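The text does not say how "same relative order" was scored when the maximum likelihood and TSP maps were compared. One plausible way to quantify it, shown here purely as an illustrative sketch, is the fraction of shared markers that lie on the longest run of markers appearing in the same order on both maps (a longest-increasing-subsequence calculation). The function name and marker labels are hypothetical, and both maps are assumed to be listed in the same orientation with each marker appearing once.

```python
from bisect import bisect_left
from typing import Sequence


def order_concordance(map_a: Sequence[str], map_b: Sequence[str]) -> float:
    """Fraction of shared markers that keep the same relative order on both maps."""
    position_in_b = {marker: i for i, marker in enumerate(map_b)}
    # Ranks on map B of the shared markers, listed in map A order.
    ranks = [position_in_b[m] for m in map_a if m in position_in_b]
    if not ranks:
        return 0.0
    # Patience-sorting computation of the longest increasing subsequence length.
    tails = []
    for rank in ranks:
        i = bisect_left(tails, rank)
        if i == len(tails):
            tails.append(rank)
        else:
            tails[i] = rank
    return len(tails) / len(ranks)


# Hypothetical marker orders: m2 and m3 are swapped, m6 is on one map only.
ml_map = ["m1", "m2", "m3", "m4", "m5"]
tsp_map = ["m1", "m3", "m2", "m4", "m6", "m5"]
print(order_concordance(ml_map, tsp_map))
```

Under this measure a single local inversion costs one marker, so closely spaced markers that swap places lower the score only slightly, which seems to fit the observation that the two approaches give highly consistent maps.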

Since the first build of the comprehensive placement maps reported here, a further 2,190 ESTs and 268 STSs have been localized on the placement map using a mapping criterion that requires the two top-ranking intervals to be adjacent if the odds of the top-ranking interval are less than 1,000:1 compared to the second placement interval. If the slightly more stringent requirements used in the first-release build had been used, some 2,384 additional placements would have been obtained. The locations of these new ESTs relative to the framework map are available at http://www.genoscope.cns.fr. In all, 5,904 markers (3,993 ESTs plus 1,911 WI-MIT markers) have been localized as of the 1 February 2001 map build, of which 4,032 are high-confidence markers that allow the precise positioning of any newly typed marker (Table 2).

Table 1 • Summary of radiation hybrid map data

Chr        Mb          cR   kb/cR      Fw     HOM    LOM   cRperMark   Av. ret
1      216.00    5,579.80   38.71      73     148     87       23.74      0.29
2      208.50    6,003.00   34.73      92     208    103       19.30      0.28
3      179.70    5,066.70   35.47      72     154     51       24.72      0.33
4      176.70    5,398.00   32.73      76     151     58       25.83      0.27
5      170.40    4,886.00   34.88      68     132    103       20.79      0.28
6      165.90    5,234.30   31.69      81     117    102       23.90      0.27
7      155.70    4,296.50   36.24      57     129     58       22.98      0.27
8      149.10    4,366.50   34.15      64     113     92       21.30      0.31
9      143.70    4,290.40   33.49      49     109     60       25.39      0.30
10     144.90    2,514.30   57.63      58      83      9       27.33      0.32
11     141.60    5,185.80   27.31      72     190     13       25.55      0.31
12     146.40    2,255.20   64.92      44      75      3       28.91      0.35
13     131.40    3,864.30   34.00      48     108      4       34.50      0.31
14     133.80    3,624.00   36.92      56     110     30       25.89      0.28
15     121.50    2,797.80   43.43      53     109     12       23.12      0.32
16     114.00    3,594.20   31.72      51     137      9       24.62      0.34
17     115.50    5,175.70   22.32      57     139     11       34.50      0.33
18     116.40    2,979.70   39.06      44     100      3       28.93      0.32
19      81.90    2,523.40   32.46      50     113     31       17.52      0.38
X      186.90    3,682.30   50.76      73     119     63       20.23      0.23
tot  3,000.00   83,317.90   36.01   1,238   2,544    902       24.18      0.30

Av. ret, average retention rate; chr, chromosome; cRperMark, average number of centiray per marker; Fw, framework marker; HOM, high-ordered marker (framework markers and markers whose first placement interval is at least 1,000 times more likely than the second placement interval); kb, kilobase; LOM, low-ordered marker (markers whose first placement interval is less than 1,000 times more likely than the second placement interval); Mb, megabase; tot, total.

Table 2 • Chromosome distribution of EST markers in first placement map (Feb. 1, 2001)

Chr     Total ESTs (Feb 2001)   Total markers (Feb 2001)
1                         262                        391
2                         352                        500
3                         228                        332
4                         265                        367
5                         321                        410
6                         202                        336
7                         245                        333
8                         243                        330
9                         268                        357
10                        111                        188
11                        258                        356
12                        102                        161
13                        116                        185
14                        131                        222
15                        166                        257
16                        144                        229
17                        186                        271
18                        106                        172
19                        153                        227
X                         134                        280
total                   3,993                      5,904
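The odds-based placement rules described above (the 1,000:1 threshold that separates high- and low-ordered markers in Table 1, and the requirement that the two top-ranking intervals be adjacent when the odds fall below that threshold) can be expressed as a small decision function. This is a sketch under stated assumptions rather than the authors' build code: the `Placement` record, the integer indexing of framework intervals along a chromosome, and the "unplaced" label are all hypothetical.

```python
from dataclasses import dataclass

ODDS_THRESHOLD = 1000.0  # 1,000:1, as stated in the text and Table 1 footnote


@dataclass(frozen=True)
class Placement:
    marker: str
    best_interval: int      # index of the most likely framework interval (assumed numbering)
    second_interval: int    # index of the second most likely interval
    odds: float             # likelihood of best interval relative to the second
    is_framework: bool = False


def classify(p: Placement) -> str:
    """Return 'HOM', 'LOM' or 'unplaced' for one marker placement."""
    # Framework markers and markers placed at >= 1,000:1 are high-ordered.
    if p.is_framework or p.odds >= ODDS_THRESHOLD:
        return "HOM"
    # Below the threshold, the two top-ranking intervals must be adjacent.
    if abs(p.best_interval - p.second_interval) == 1:
        return "LOM"
    return "unplaced"


# Hypothetical placements for illustration.
for p in (
    Placement("est_a", 3, 7, 5000.0),
    Placement("est_b", 3, 4, 50.0),
    Placement("est_c", 3, 7, 50.0),
):
    print(p.marker, classify(p))
```

The stricter criteria used for the first-release build are not spelled out in the text, so they are not modelled here.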

Figure 3 shows an example of this map built for mouse chromosome 2. In order to validate our EST localizations and estimate the potential error rates, we tested a subset of ESTs localizing to chromosomes 3, 6 and 12 against a corresponding monochromosomal hybrid (ref. 17). Of the 145 ESTs tested, 94% (136) gave unambiguous positive results on the corresponding monochromosomal hybrid. This rate is close to that found for the human Genebridge 4 panel (ref. 18) and confirms that the criteria we have adopted for assignment are appropriate.

Our radiation hybrid framework map was constructed in a different manner from the WI-MIT radiation hybrid map and would appear to be a useful update to the previously published mouse radiation hybrid framework map (ref. 6). The concatenation step in particular should improve the accuracy of the framework map. The 1,066 markers common to both maps will facilitate comparison between them, increasing the value of the overall EST dataset. Assuming an average retention frequency of 30% and a mean size of the radiation-induced fragments of 5–10 Mb, the theoretical resolution limit that can be achieved with our 90 hybrid lines is about 300 kb, corresponding to the mapping of some 10,000 uniformly distributed markers. It is likely that the resolution obtained using the non-redundant set of markers mapped by the joint efforts of the WI-MIT group (ref. 19) and the EU consortium will therefore approach the theoretical limit of the T31 panel.
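The correspondence between a 300-kb resolution and some 10,000 uniformly distributed markers can be checked with simple arithmetic, using the 3,000-Mb genome size and the 83,317.9-cR total map length given in Table 1. The sketch below only verifies that these figures are mutually consistent; it does not reproduce how the 300-kb limit was derived from fragment size and retention frequency, which the text does not detail. The function names are hypothetical.

```python
def mean_marker_spacing_kb(genome_mb: float, n_markers: int) -> float:
    """Average spacing (kb) if n_markers are spread uniformly over the genome."""
    return genome_mb * 1000.0 / n_markers


def kb_per_centiray(physical_mb: float, map_length_cr: float) -> float:
    """Physical distance (kb) corresponding to one centiray."""
    return physical_mb * 1000.0 / map_length_cr


# Figures from the text and Table 1.
print(mean_marker_spacing_kb(3000.0, 10_000))        # spacing for 10,000 markers
print(round(kb_per_centiray(3000.0, 83317.9), 2))    # genome-wide kb/cR
print(round(kb_per_centiray(216.0, 5579.8), 2))      # chromosome 1 kb/cR
```

The same `kb_per_centiray` calculation applied row by row gives the kb/cR column of Table 1 from its Mb and cR columns.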

We initially attempted a global cross-species orthology search based on BLAST analysis between the mapped mouse and human EST-cluster consensus sequences. This approach confirmed 63 known regions of homology and defined 17 putative new regions of homology that were, however, supported only by single clusters (data not shown). We did a pilot comparative human/mouse gene localization study in which EST markers that we had radiation hybrid mapped were compared with sequence data from human chromosome 14. This chromosome shows conserved gene orthology with mouse chromosomes 12 and 14 (ref. 20). Conversely, a 15-cM region between the centromere and the Tpo marker of mouse chromosome 12 is orthologous to human chromosome 2p, whereas a genetic segment extending from 15 cM from the centromere to around 23 cM is orthologous to parts of human chromosome 7. The rest of the chromosome is thought to be orthologous with human 14q. Mouse chromosome 14 has previously been shown to have conserved linkages or conserved orthologies involving human chromosomes 3, 8, 10, 13 and 14 as well as the X and Y chromosomes.

Our approach combined information from the following sources: (i) the genomic sequence corresponding to a clone-tiling path covering 99% of human chromosome 14; (ii) consensus sequences for the EST clusters produced at the EBI and for which a representative had been RH-mapped; (iii) orthologous relationships between human and mouse genes computed at the Jackson Laboratory (http://www.jax.org) and (iv) orthologous relationships predicted at the National Center for Biotechnology Information from sequence alignment using the megablast algorithm between the mouse and human UniGene sets (http://www.ncbi.nlm.nih.gov/Homology/). Because our initial attempts at aligning single short-pass mouse EST sequences against human chromosome 14 genomic sequences using tblastx were considered unreliable, we derived longer transcript sequences for each EST from either the EGI or UniGene EST cluster databases. We then either directly aligned the longest mouse transcript or consensus sequence against the human chromosome 14 genomic sequence using the blastn and tblastx algorithms, or used this transcript to retrieve putative orthologs in the human UniGene set, which we then aligned against the genomic sequence by blastn analysis. To increase the number of potential points of orthology, we integrated previously known orthologs genetically mapped in the mouse but absent from the set of radiation hybrid-mapped ESTs in the first build of our placement radiation hybrid map. We indirectly localized these orthologous relationships on the radiation hybrid map by identifying, from the genetic coordinates on the Mouse Genome Informatics integrated genetic map, the closest marker that was both radiation hybrid mapped and genetically mapped, and assigning the radiation hybrid map position of this marker to the orthologous gene.

Fig. 2 Placement maps for chromosome 5. Comparison of maps established using the maxlik program (ref. 28; right-hand bar) and the traveling-salesman problem (TSP) approach (middle bar), as well as the MIT genetic map (left-hand bar). [Marker labels from the figure are not reproduced here.]

Fig. 3 Comprehensive radiation hybrid map of mouse chromosome 2. Left-hand bar: genetic localizations taken from the MIT genetic map. Right-hand bar: position on the radiation hybrid map. a, Top corresponds to the centromere of chromosome 2. b, Bottom corresponds to the telomere of chromosome 2. [Marker labels from the figure are not reproduced here.]
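The indirect localization step described above (giving a genetically mapped ortholog the radiation hybrid position of the closest marker that appears on both maps) amounts to a nearest-neighbour lookup on the genetic coordinate. The following is a minimal sketch of that idea; the `Anchor` record, the function name and the example coordinates are hypothetical, and all anchors are assumed to lie on the same chromosome as the gene.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Anchor:
    """A marker that is both genetically mapped and radiation hybrid mapped."""
    name: str
    genetic_cm: float   # position on the integrated genetic map (cM)
    rh_cr: float        # position on the radiation hybrid map (cR)


def nearest_anchor(gene_cm: float, anchors: Sequence[Anchor]) -> Anchor:
    """Return the anchor closest to the gene on the genetic map.

    The gene is then assigned this anchor's radiation hybrid position.
    Ties are resolved in favour of the first anchor listed.
    """
    if not anchors:
        raise ValueError("no doubly mapped markers available on this chromosome")
    return min(anchors, key=lambda a: abs(a.genetic_cm - gene_cm))


# Hypothetical anchors and gene position for illustration only.
anchors = [
    Anchor("marker_A", 10.0, 250.0),
    Anchor("marker_B", 20.0, 600.0),
    Anchor("marker_C", 35.0, 1100.0),
]
hit = nearest_anchor(22.5, anchors)
print(hit.name, hit.rh_cr)
```

Because the gene inherits the anchor's position exactly, the precision of such a placement is limited by the local density of doubly mapped markers.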

We used the correspondence between the mouse and human UniGene sets available through NCBI’s HomoloGene project (http://www.ncbi.nlm.nih.gov/Homology/) to provide further cross-species links. Using these combined approaches, we anchored 100 ESTs onto the sequence map of human chromosome 14. The orthology relationships between ESTs mapped on mouse chromosome 12 and human chromosome 14 are shown in Fig. 4. Using this map, we were able to localize ESTs that had been assigned to mouse chromosome 12, but whose localization remained ambiguous (under the criterion that both the highest and next highest allocations must be in neighboring intervals), with greater confidence by reference to the orthology and linkage relationships based on the sequence of human chromosome 14.

We have constructed a radiation hybrid map of the mouse genome that contains some 6,000 markers, of which over 4,000 are ordered at high odds. ESTs extracted from this map should be useful in identifying the many mutants being generated by phenotype-driven ENU mutagenesis. For example, the 500 new ENU-induced mouse mutants reported by Nolan and colleagues (ref. 21), of which 30 were initially mapped, or the 182 mutants identified by de Angelis and colleagues (ref. 22) can be tackled most efficiently at the gene level using a candidate-gene approach. The usefulness of our radiation hybrid map as a predictive tool for candidate-gene cloning is indicated by our studies of the Delta3 gene and pudgy mutations (refs 11,23). Likewise, mapped ESTs will be useful for characterizing the genetic basis underlying quantitative traits once these have been refined by congenic and/or BAC transgenesis studies. This map is an important resource for cross-referencing of mammalian genomes, positional cloning of mouse genes through candidate-gene approaches, and anchoring and orientation of current draft sequencing efforts.

Methods

Radiation hybrids. We obtained DNAs corresponding to the T31 radiation hybrid panel (ref. 13) from Research Genetics. A list of the subset of 90 hybrids used in our experiments is available at www.genoscope.cns.fr.

Sequencing of cDNA clones. We sequenced the 7.5-dpc endoderm library of Beddington and colleagues (ref. 12) using standard methods. We submitted approximately 4,000 clones that yielded high-quality sequence data to the EMBL database and subjected them to cluster analysis (see below); their accession numbers are listed at www.genoscope.cns.fr.

Cluster analysis. To identify and group transcribed sequences derived from a single gene, we processed EST sequences extracted from the EMBL database to remove or mask redundant repetitive sequences, contaminating vector sequences and low-quality sequences that could confound the analysis, then carried out cluster analysis on the basis of sequence homology using the JESAM packages (ref. 24) and tools available at http://corba.ebi.ac.uk/EST/egi.html. This was done at the European Bioinformatics Institute (EBI). Each cluster of homologous and aligned sequences, and the consensus sequence derived from them, corresponds, subject to certain important caveats, to an individual gene (ref. 24).

Fig. 4 Orthologous relationships between mouse ESTs mapping to mouse chromosome 12 and human chromosome 14, showing positions on the radiation hybrid map of mouse chromosome 12 (left-hand bar) and human chromosome 14 (right-hand bars). The numbers on the right identify the relevant human BAC on the minimal-tiling path31.

SSLP marker development. We selected STS markers from the MIT and EUCIB mouse genetic map14,15. We obtained EST markers either from our internal sequencing program or from cDNA sequences obtained from the dbEST EBI site after clustering24. We used endoderm-derived sequences systematically for EST derivation if they corresponded to unallocated sequences; otherwise, we derived ESTs from clusters in the database composed of two or more sequences of which at least 30% were derived from early-embryonic libraries (that is, earlier than 10.5 dpc). The list of retained libraries can be consulted on the Genoscope server. We modified these criteria in the case of the mouse urogenital ridge (NMUR library 144; see the library browser at http://www.ncbi.nlm.nih.gov/UniGene/) and the mouse eight-cell-stage embryo (library 150), in that we sometimes used sequence clusters from these libraries without regard for the overall proportion of embryonic-derived sequences and some singleton sequences were selected. We selected primer sequences using the Primer 3 program (http://www.genome.wi.mit.edu/genome_software/other/primer3.html). We tested both STS and EST markers for their specificity against mouse, Chinese hamster and human DNAs before typing them on the radiation hybrid panel. We carried out all tests in duplicate. All primer sequences have been submitted to RHdb (ref. 25 and http://www.ebi.ac.uk/RHdb) and are available on the RHdb and Genoscope websites.
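The cluster selection rule for EST derivation (two or more sequences, at least 30% of them from libraries earlier than 10.5 dpc) can be written as a small filter. This is a sketch only: the function name, the representation of each sequence by the stage of its source library, and the `exempt_library` flag standing in for the urogenital-ridge and eight-cell-stage exceptions are all assumptions made for illustration.

```python
EARLY_CUTOFF_DPC = 10.5      # "earlier than 10.5 dpc"
MIN_EARLY_FRACTION = 0.30    # "at least 30%"


def cluster_qualifies(member_stages_dpc, exempt_library=False):
    """Return True if a cluster passes the stated selection criteria.

    member_stages_dpc: one entry per sequence in the cluster, giving the
        developmental stage (dpc) of its source library, or None when the
        library is not embryonic (hypothetical encoding).
    exempt_library: True for clusters from libraries that were sometimes used
        regardless of the early-embryonic proportion (assumed flag).
    """
    if exempt_library:
        return True
    if len(member_stages_dpc) < 2:
        return False  # singletons do not meet the "two or more" rule
    early = sum(1 for s in member_stages_dpc
                if s is not None and s < EARLY_CUTOFF_DPC)
    return early / len(member_stages_dpc) >= MIN_EARLY_FRACTION


print(cluster_qualifies([7.5, None, None]))        # one of three is early
print(cluster_qualifies([7.5, None, None, None]))  # one of four is early
```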

Radiation hybrid assays. We carried out polymerase chain reaction (PCR) amplification of EST fragments from radiation hybrid DNAs in 96-well plates containing the 90 radiation hybrid, control mouse and Chinese hamster DNAs in duplicate. PCR conditions and gel electrophoresis were as described at www.genoscope.cns.fr. We analyzed PCR products by agarose gel electrophoresis, detected them by ethidium bromide staining, and used a semi-automated method to score for the presence or absence of PCR products of the expected size by comparison with molecular standards. The data were recorded as a string of 0 (no amplification of a mouse fragment), 1 (amplification of a mouse fragment) or 2 (ambiguous or unknown) vector scores corresponding to the radiation hybrid DNA order. All loci were scored in duplicate.
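To make the 0/1/2 score vectors concrete, the sketch below parses a vector string, merges two duplicate typings and computes a retention rate. The text does not say how duplicates were reconciled, so the merge rule used here (agreeing calls are kept, an ambiguous call is resolved by the other replicate, and a 0/1 conflict becomes 2) is an assumption, as are the function names and the short example vectors.

```python
def parse_vector(text):
    """Convert a string such as '0120' into a list of integer scores."""
    scores = [int(ch) for ch in text]
    if any(v not in (0, 1, 2) for v in scores):
        raise ValueError("scores must be 0, 1 or 2")
    return scores


def merge_duplicates(first, second):
    """Merge duplicate typings of one locus (assumed reconciliation rule)."""
    if len(first) != len(second):
        raise ValueError("duplicate vectors must have the same length")
    merged = []
    for x, y in zip(first, second):
        if x == y:
            merged.append(x)      # replicates agree
        elif x == 2:
            merged.append(y)      # resolve ambiguity with the other replicate
        elif y == 2:
            merged.append(x)
        else:
            merged.append(2)      # 0 versus 1: treat as ambiguous
    return merged


def retention_rate(scores):
    """Fraction of unambiguously typed hybrids that retain the marker."""
    typed = [v for v in scores if v != 2]
    return sum(typed) / len(typed) if typed else 0.0


merged = merge_duplicates(parse_vector("0120"), parse_vector("0111"))
print(merged, retention_rate(merged))
```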

Framework map construction. In constructing the framework map, we started with the 1,066 genetic markers typed in common at the Genoscope and at the Whitehead Institute–Massachusetts Institute of Technology6. Because of differences in the scoring methods used by the two centers, we considered the typing of a radiation hybrid at each center as two independent assays and joined the vectors (radiation hybrid scores) into a single vector string (concatenated Genoscope–WI-MIT) to increase the overall marker information content and map accuracy. Vectors from both centers were concatenated only if the original vectors contained fewer than 5 unresolved positions and if the marker retention rate was between 14% and 47% (with the exception of chromosomes 11 and 19, for which the ranges were 14–60% and 14–55%, respectively). We removed markers that were too close to each other (as measured by the number of obligate chromosomal breaks) and ordered the remaining markers by analogy to the well-studied traveling-salesman problem (TSP), for which powerful computational tools are available26,27. We translated the radiation hybrid problem into 5 slightly differing versions of the TSP that were characterized by minor variations of the objective function. We solved these TSP instances with the Lin–Kernighan (LK) heuristic from the ‘CONCORDE’ package27 (http://www.caam.rice.edu/keck/concorde.html) and compared the resulting marker orders to identify segments conserved in all solutions. We then checked the relative ordering of these conserved segments for consistency against the WI-MIT genetic map15. The initial framework maps resulting from this process contained 814 markers. We used these markers and 1,369 additional genetic markers that had been typed at the Genoscope to produce extended framework maps. We converted typing data corresponding to the pooled 2,183 genetic markers into TSP instances and solved these using the CONCORDE LK heuristic to obtain temporary comprehensive maps. We then determined the most likely framework interval for each of the 1,369 non-framework markers using a maximum likelihood criterion, and recorded the associated lod score. We submitted this information, along with the previously computed framework maps, to a global reordering algorithm26 that reorders and consolidates the comprehensive maps. We discarded markers with order discordancies under different TSP solutions by applying the algorithm of the longest-common-subsequence problem (LCSP)28. This yielded extended framework maps comprising 1,821 genetic markers.
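Three of the steps above lend themselves to a short illustration: the concatenation criteria, the obligate-break count between two markers, and the use of a longest common subsequence to keep only markers ordered consistently in two solutions. The Python sketch below is not the pipeline used for the maps. The function names and toy data are hypothetical, the retention bounds are treated as inclusive, and the two marker orders are assumed to share the same orientation. The TSP solving itself is not reproduced.

```python
def retention_rate(scores):
    """Fraction of unambiguously typed hybrids (0 or 1) that retain the marker."""
    typed = [v for v in scores if v != 2]
    return sum(typed) / len(typed) if typed else 0.0


def can_concatenate(scores, lo=0.14, hi=0.47):
    """Stated criteria: fewer than 5 unresolved positions, retention in [lo, hi]."""
    return scores.count(2) < 5 and lo <= retention_rate(scores) <= hi


def concatenate(genoscope, wi_mit, lo=0.14, hi=0.47):
    """Join the two centers' vectors, or return None if either fails the criteria."""
    if can_concatenate(genoscope, lo, hi) and can_concatenate(wi_mit, lo, hi):
        return genoscope + wi_mit
    return None


def obligate_breaks(a, b):
    """Count hybrids in which exactly one of the two markers is retained."""
    return sum(1 for x, y in zip(a, b) if 2 not in (x, y) and x != y)


def longest_common_subsequence(a, b):
    """Classic dynamic-programming LCS of two marker orders."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:  # trace one optimal subsequence back out
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


typed_ok = [1] * 27 + [0] * 63                  # 90 hybrids, 30% retention
too_ambiguous = [1] * 27 + [0] * 58 + [2] * 5   # 5 unresolved positions
joined = concatenate(typed_ok, typed_ok)

order_1 = ["A", "B", "C", "D", "E"]             # two hypothetical TSP solutions
order_2 = ["A", "C", "B", "D", "E"]
consistent = longest_common_subsequence(order_1, order_2)
discarded = [m for m in order_1 if m not in consistent]
print(len(joined), consistent, discarded)
```

For chromosomes 11 and 19 the upper bound would be passed as `hi=0.60` or `hi=0.55`, following the ranges given in the text.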

Because the TSP transformations are strictly valid only for haploid, error-free data, we re-evaluated the map likelihoods and inter-marker distances with the radiation hybrid maxlik program29 under a diploid model. We then applied a pruning routine to the extended framework maps, both to ensure that adjacent markers were not too close to each other and to favor the inclusion of markers with the highest associated lod scores. We discarded 752 markers in this pruning step.
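The pruning routine itself is not described in detail. One simple way to satisfy both stated goals is a greedy pass that visits markers from highest to lowest lod score and keeps a marker only if it lies far enough from every marker already kept. The sketch below shows that idea under stated assumptions: the minimum spacing, the marker names, positions and lod values are all hypothetical.

```python
def prune(markers, min_gap):
    """Greedy pruning sketch.

    markers: list of (name, position, lod) tuples; position is a map
        coordinate in arbitrary distance units (assumed).
    min_gap: minimum allowed spacing between retained markers (assumed value).
    Returns the retained markers sorted by position.
    """
    kept = []
    # Visit markers from best to worst lod so that high-lod markers win conflicts.
    for name, pos, lod in sorted(markers, key=lambda m: -m[2]):
        if all(abs(pos - kept_pos) >= min_gap for _, kept_pos, _ in kept):
            kept.append((name, pos, lod))
    return sorted(kept, key=lambda m: m[1])


example = [("m1", 0.0, 5.0), ("m2", 3.0, 8.0), ("m3", 12.0, 4.0), ("m4", 14.0, 6.0)]
retained = [name for name, _, _ in prune(example, min_gap=10.0)]
print(retained)
```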

To further extend the new framework maps, we mapped ESTs and non-framework genetic markers typed at Genoscope against the framework maps by multipoint maximum-likelihood analysis. We considered additional markers for inclusion into intervals defined by adjacent framework markers separated by a breakage probability of 0.5 or more. We retained two types of markers for this step: 86 STS markers assigned to such intervals with a lod score of at least 2, and showing a placement consistent with both the genetic map and the WI-MIT radiation hybrid map6; and 79 ESTs that mapped to these intervals and for which all alternative placement intervals were at least 1,000 times less likely. When more than one candidate framework marker was available for a given interval, we applied the following integrating criteria: first, we sorted candidate markers according to the average difference between their retention rate and that of the adjacent framework markers; second, among markers with small retention-rate differences, we selected the marker with the smallest number of ambiguous positions in its typing vector.
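Two quantities in this paragraph can be made concrete. Under the standard radiation hybrid mapping function, a breakage probability θ corresponds to a distance of −100·ln(1 − θ) centiRays, and a likelihood ratio of 1,000 corresponds to a lod difference of 3. The sketch below computes both and illustrates the two-step choice among candidate markers. The cut-off for a "small" retention-rate difference is not given in the text, so `tolerance` is an assumed parameter, and the candidate records are hypothetical.

```python
import math


def breakage_to_centirays(theta):
    """Standard RH mapping function: distance in cR for breakage probability theta."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must be in [0, 1)")
    return -100.0 * math.log(1.0 - theta)


def odds_to_lod(odds):
    """Convert a likelihood ratio to a lod (log10 odds) value."""
    return math.log10(odds)


def retention_gap(rate, flank_rates):
    """Average absolute difference to the retention of the flanking markers."""
    return sum(abs(rate - r) for r in flank_rates) / len(flank_rates)


def pick_candidate(candidates, flank_rates, tolerance=0.05):
    """Rank by retention gap, then prefer the fewest ambiguous positions."""
    ranked = sorted(candidates,
                    key=lambda c: retention_gap(c["retention"], flank_rates))
    close = [c for c in ranked
             if retention_gap(c["retention"], flank_rates) <= tolerance]
    pool = close or ranked[:1]  # fall back to the best-ranked marker
    return min(pool, key=lambda c: c["ambiguous"])["name"]


candidates = [
    {"name": "A", "retention": 0.31, "ambiguous": 3},
    {"name": "B", "retention": 0.35, "ambiguous": 0},
    {"name": "C", "retention": 0.50, "ambiguous": 0},
]
chosen = pick_candidate(candidates, flank_rates=(0.30, 0.34))
print(round(breakage_to_centirays(0.5), 1), odds_to_lod(1000), chosen)
```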

Placement map construction. We used both framework markers and genetic markers discarded in the pruning step as reference markers for the chromosomal assignation of ESTs. We considered an EST assigned if it was linked to at least 2 reference markers on a given chromosome with a two-point lod score of 7 or more and showed no linkage above this threshold with a reference marker for another chromosome. The use of a very dense map of reference markers and the imposition of a positive threshold cut-off to at least two reference markers markedly reduce the chances of detecting such linkage purely by chance. Once ESTs had been assigned to chromosomes, we mapped them against the framework intervals by multipoint maximum likelihood analysis29. We ordered markers binned into the same interval according to the radiation hybrid distance from the upper framework marker. For each marker, we recorded all placement intervals for which the odds were higher than 1:1,000 with the top-ranking placement interval; we considered markers for which these intervals were not adjacent to be unreliable, and removed these from the map. We discarded 640 markers through this process, leaving 3,446 markers on the first build of the placement maps.
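The assignment rule and the adjacency check on placement intervals can be sketched as follows. The code takes two-point lod scores and per-interval log10 likelihoods as given inputs and does not compute them. The function names, reference-marker names and numeric values are hypothetical, and treating "above this threshold" as "7 or more" for the other-chromosome test is an assumption.

```python
from collections import Counter


def assign_chromosome(lods, marker_chrom, threshold=7.0, min_links=2):
    """Return the chromosome an EST is assigned to, or None.

    lods: {reference_marker: two-point lod between the EST and that marker}.
    marker_chrom: {reference_marker: chromosome}.
    """
    linked = Counter(marker_chrom[m] for m, lod in lods.items() if lod >= threshold)
    if len(linked) != 1:
        return None  # no linkage at all, or linkage to more than one chromosome
    chrom, n_links = next(iter(linked.items()))
    return chrom if n_links >= min_links else None


def candidate_intervals(log10_lik, max_lod_drop=3.0):
    """Intervals whose odds against the best placement are better than 1:1,000."""
    best = max(log10_lik.values())
    return sorted(i for i, v in log10_lik.items() if best - v < max_lod_drop)


def is_reliable(intervals):
    """True if the candidate intervals form one run of adjacent intervals."""
    return all(b - a == 1 for a, b in zip(intervals, intervals[1:]))


chrom_of = {"ref_a": "12", "ref_b": "12", "ref_c": "3"}
assigned = assign_chromosome({"ref_a": 9.2, "ref_b": 7.5, "ref_c": 1.0}, chrom_of)
conflict = assign_chromosome({"ref_a": 9.2, "ref_b": 7.5, "ref_c": 8.0}, chrom_of)

scattered = candidate_intervals({4: -10.0, 5: -11.5, 6: -14.0, 9: -12.0})
print(assigned, conflict, scattered, is_reliable(scattered))
```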

Whole-mount RNA in situ hybridization. We collected mouse embryos from 5.5–9.5 dpc and carried out whole-mount RNA in situ hybridization on these embryos12. We linearized pSPORTI-cDNA plasmid DNA with SalI and generated antisense digoxigenin-labeled riboprobes using SP6 RNA polymerase30.

Validation of EST localizations. We used the monochromosomal mouse × human hybrids SN11C5-3/sc1.3, N12C1 and N2C1 to confirm EST localizations to chromosomes 3, 6 and 12, respectively17. We used genomic DNA from the hybrids for PCR screening of ESTs. We systematically included human DNA as a PCR control. The observed confirmation rate is likely to be a minimal estimate, because ESTs that produced a PCR product on the corresponding monochromosomal hybrid that was different in size from that expected were not scored as being positive, even though these size differences could reflect the fact that control DNA of the mouse parent of the hybrids was not available to us and was replaced by 129/Sv DNA.

Acknowledgments
We dedicate this article to our friend and colleague Rosa Beddington (March 23, 1956–May 18, 2001), a scientist of great biological insight. This work was supported by EEC Contract PL 962414. We thank B. Gorick and the Human Genome Mapping Project at Hinxton, UK, for help with the replication of the 7.5-dpc mouse endoderm library; A.M. Mallon and S. Greenaway of the informatics group at the Medical Research Council, Harwell; and V. Taghavi and E. Sartory for technical assistance in marker typing at MRC Harwell.

Received 12 April; accepted 17 July 2001.

1. Yang, Y.P. & Womack, J.E. Parallel radiation hybrid mapping: a powerful tool for high-resolution genomic comparison. Genome Res. 8, 731–736 (1998).

2. Deloukas, P. et al. A physical map of 30,000 human genes. Science 282, 744–746 (1998).

3. Watanabe, T.K. et al. A radiation hybrid map of the rat genome containing 5,255 markers. Nature Genet. 22, 27–36 (1999).

4. Steen, R.G. et al. A high-density integrated genetic linkage and radiation hybrid map of the laboratory rat. Genome Res. 9, AP1–8 (1999).

5. Scheetz, T.E. et al. Generation of a high-density rat EST map. Genome Res. 11, 497–502 (2001).

6. Van Etten, W.J. et al. Radiation hybrid map of the mouse genome. Nature Genet. 22, 384–387 (1999).

7. Mellersh, C.S. et al. An integrated linkage–radiation hybrid map of the canine genome. Mamm. Genome 11, 120–130 (2000).

8. Murphy, W.J. et al. A radiation hybrid map of the cat genome: implications for comparative mapping. Genome Res. 10, 691–702 (2000).

9. Geisler, R. et al. A radiation hybrid map of the zebrafish genome. Nature Genet. 23, 86–89 (1999).

10. Hukriede, N.A. et al. Radiation hybrid mapping of the zebrafish genome. Proc. Natl Acad. Sci. USA 96, 9745–9750 (1999).

11. Dunwoodie, S.L., Henrique, D., Harrison, S.M. & Beddington, R.S.P. Mouse Dll3: a novel divergent Delta gene which may complement the function of other Delta homologues during early pattern formation in the mouse embryo. Development 124, 3065–3076 (1997).

12. Harrison, S.M., Dunwoodie, S.L., Arkell, R.M., Lehrach, H. & Beddington, R.S.P. Isolation of novel tissue-specific genes from cDNA libraries representing the individual tissue constituents of the gastrulating mouse embryo. Development 121, 2479–2489 (1995).

13. McCarthy, L.C. et al. A first generation whole genome-radiation hybrid map spanning the mouse genome. Genome Res. 7, 1153–1161 (1997).

14. Breen, M. et al. Towards high resolution maps of the mouse and human genomes: a facility for ordering markers to 0.1 cM resolution. Hum. Mol. Genet. 3, 621–627 (1994).

15. Dietrich, W.F. et al. A comprehensive genetic map of the mouse genome. Nature 380, 149–152 (1996).

16. Elliott, R.W., Manly, K.F. & Hohman, C. A radiation hybrid map of mouse chromosome 13. Genomics 57, 365–370 (1999).

17. Sabile, A., Poras, I., Cherif, D., Goodfellow, P. & Avner, P. Isolation of monochromosomal hybrids for mouse chromosomes 3, 6, 10, 12, 14 and 18. Mamm. Genome 8, 81–85 (1997).

18. Gyapay, G. et al. A radiation hybrid map of the human genome. Hum. Mol. Genet. 5, 339–46 (1996).

19. Hudson, T.J. et al. A radiation hybrid map of mouse genes. Nature Genet. 27, 201–205.

20. DeBry, R.W. & Seldin, M.F. Human/mouse homology relationships. Genomics 33, 337–351 (1996).

21. Nolan, P.M. et al. A systematic, genome-wide, phenotype-driven mutagenesis programme for gene function studies in the mouse. Nature Genet. 25, 440–443 (2000).

22. Hrabe de Angelis, M.H. et al. Genome-wide, large-scale production of mutant mice by ENU mutagenesis. Nature Genet. 25, 444–447 (2000).

23. Kusumi, K. et al. The mouse pudgy mutation disrupts Delta homologue Dll3 and initiation of early somite boundaries. Nature Genet. 19, 274–278 (1998).

24. Parsons, J.D. & Rodriguez-Tome, P. JESAM: CORBA software components to create and publish EST alignments and clusters. Bioinformatics 16, 313–325 (2000).

25. Rodriguez-Tome, P. & Lijnzaad, P. RHdb: the Radiation Hybrid Database. Nucleic Acids Res. 28, 146–147 (2000).

26. Agarwala, R., Applegate, D.L., Maglott, D., Schuler, G.D. & Schaffer, A.A. A fast and scalable radiation hybrid map construction and integration strategy. Genome Res. 10, 350–364 (2000).

27. Applegate, D., Bixby, R., Chvatal, V. & Cook, W. On the solution of traveling salesman problems. Documenta Mathematica Journal der Deutschen Mathematiker-Vereinigung International Congress of Mathematics III, 645–656 (1998).

28. Gusfield, D. Algorithms on strings, trees, and sequences: computer science & computational biology (Cambridge University Press, Cambridge, UK, 1997).

29. Boehnke, M., Lange, K. & Cox, D.R. Statistical methods for multipoint radiation hybrid mapping. Am. J. Hum. Genet. 49, 1174–1188 (1991).

30. Wilkinson, D.G. Whole-mount in situ hybridisation of vertebrate embryos. in In Situ Hybridisation (ed. Wilkinson, D.G.) 75–83 (Oxford University Press, Oxford, UK, 1992).

31. Bruls, T. et al. A physical map of human chromosome 14. Nature 409, 947–948 (2001).
