
THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1991 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 266, No. 36, Issue of December 25, pp. 24477-24484, 1991 Printed in U.S.A.

Variants of a Leishmania Surface Antigen Derived from a Multigenic Family*

(Received for publication, January 18, 1991)

Peter J. Murray‡ and Terry W. Spithill§ From the Walter and Eliza Hall Institute of Medical Research, Melbourne, Victoria 3050, Australia and the §Victorian Institute of Animal Science, Attwood, Victoria 3049, Australia

The promastigote surface antigen-2 (PSA-2) complex comprises a group of immunogenic surface antigens linked to the surface of the Leishmania major promastigote by glycosylphosphatidylinositol anchors. The L. major genome contains at least 14 PSA-2 genes on a 950-kilobase chromosome, where they make up ~20% of the chromosome's length. The sequences of three independent, but incomplete, PSA-2 cDNAs and of one genomic fragment encoding a complete PSA-2 coding sequence were compared. PSA-2 genes encode polypeptides containing 22-26-amino acid tandem repeat elements, threonine-rich segments that vary between genes, a conserved COOH-terminal cysteine-rich region, and a conserved GPI anchor signal sequence. PSA-2 genes appear to be transcribed in a complex manner, giving rise to multiple RNAs. The complex genomic organization of PSA-2 genes is also present in other members of the genus, suggesting that PSA-2 function is important for the biology of Leishmania.

The life cycle of the parasitic protozoan Leishmania sp. can be divided into the insect or promastigote form and the intracellular or amastigote form. Vertebrate infection is initiated through the bite of an infected sandfly, which inoculates promastigotes into the host's skin. A proportion of the promastigotes are preadapted for survival in the vertebrate host and are able to withstand nonspecific complement attack to initiate macrophage infection (reviewed by Sacks, 1989). Macrophages are the obligatory host cell for Leishmania intracellular infection. Once the promastigotes have successfully invaded macrophages, they differentiate into the amastigote form and multiply within the phagolysosome.

We are interested in understanding the surface molecules of the promastigote that are involved in the steps following inoculation and leading to successful invasion of macrophages. A substantial amount of information gathered over the past 5 years has shown that two major promastigote molecules are involved in these interactions with the host. The lipophosphoglycan family of glycolipids has been shown to be a major parasite receptor for macrophages (Handman and Goding, 1985), while the other major surface molecule, gp63, has been identified as a parasite protease similar to the zinc-dependent thermolysin family of enzymes (Bouvier et al., 1989; Chaudhuri et al., 1989; Ip et al., 1990). Other proposed functions of gp63, such as a role as a complement receptor (Russell, 1987) and an association with parasite virulence (Wilson et al., 1989; Murray et al., 1990), are controversial. Surface labeling data indicate that relatively few polypeptides reside on the surface of L. major promastigotes (Murray et al., 1989a, and references therein). Apart from lipophosphoglycan and gp63, the only other characterized surface molecules include gp46 (Kahl and McMahon-Pratt, 1988), several transporter-like molecules such as an ATPase (Meade et al., 1987), and a newly discovered family of surface glycoproteins termed the promastigote surface antigen-2 (PSA-2)¹ complex (Murray et al., 1989b).

* This work was supported by the Australian National Health and Medical Research Council, the John D. and Catherine T. MacArthur Foundation, and National Institutes of Health Grant AI-19347. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) X57134, X57135, X57009, and X56810.

‡ To whom correspondence should be addressed: The Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142.

Our previous experiments involved isolating genomic clones expressing sequences that encode PSA-2 polypeptides. We also determined that PSA-2 proteins are linked to the parasite surface membrane via a glycosylphosphatidylinositol (GPI) linkage, as are lipophosphoglycan (McConville et al., 1987; Orlandi and Turco, 1987), gp63 (Bordier et al., 1987), and a subset of the glycoinositolphospholipids (McConville and Bacic, 1990). Since apparently few molecules are found on the surface of promastigotes, they must, in toto, be capable of mediating survival of nonspecific complement attack, attachment to macrophages, and survival within the lytic environment of the macrophage phagolysosome. In this paper, we present an analysis of PSA-2 genes and transcripts that identifies this group of molecules as a polymorphic family with unusual repeat structures.

MATERIALS AND METHODS

Promastigotes of the virulent cloned line L. major V121 (Handman et al., 1983) were used for most experiments. This clone was derived from LRC-137 (WHO code, WHOM/IL/67/Jericho II). The parasites from which DNA was isolated, as shown in Figs. 12 and 13, are listed with their WHO codes in Samaras and Spithill (1987).

Genomic DNA from in vitro cultured promastigotes was isolated as described (Spithill and Samaras, 1987; Cowman et al., 1984). Total RNA was isolated by the method of Chirgwin et al. (1979). Cut genomic DNA was electrophoresed and alkali-blotted onto Hybond-N (Amersham Corp.) as described (Church and Gilbert, 1984). RNA was electrophoresed in formaldehyde gels and transferred to nitrocellulose as described (Spithill and Samaras, 1987; Maniatis et al., 1982). All probes were made by random priming (Feinberg and Vogelstein, 1983). Washing conditions are given in the figure legends. Pulse-field gels were electrophoresed and blotted as described (Samaras and Spithill, 1987).

An L. major size-selected (>1 kb) library was constructed in λgt10 using an Amersham Corp. kit. A dedicated genomic library containing the SmaI-digested PSA-2 7.0-kb band was constructed by purifying DNA of the appropriate size with glass milk, ligating EcoRI adaptors to the purified DNA, kinasing, and cloning into EcoRI-cut λgt10. This library yielded ~20,000 independent clones, of which ~3% contained the PSA-2 tandem repeat.

¹ The abbreviations used are: PSA-2, promastigote surface antigen-2; VSG, variable surface glycoprotein; GPI, glycosylphosphatidylinositol.

DNA fragments for sequencing were generally cloned first into M13mp18. The exonuclease III method (Henikoff, 1984) was used to generate nested deletions. Sequencing was performed by the method of Sanger et al. (1977) using Sequenase (U. S. Biochemicals); by double-stranded sequencing (Hattori and Sakaki, 1986) with Sequenase or Taq polymerase (U. S. Biochemicals kit); or by automated sequencing on an Applied Biosystems sequenator, using a protocol developed by V. Marshall (Hall Institute), also with Taq polymerase (Cetus).

RESULTS

Examination of PSA-2 Genes in the L. major Genome. A variety of PSA-2 probes (see Fig. 5 for probe assignment) were used to investigate the arrangement of PSA-2 genes. All the probes identified the PSA-2 locus on a single chromosome in the L. major karyotype, as shown by pulse-field gel electrophoresis in Fig. 1A. This chromosome can be identified as chromosome band 14 in the L. major V121 karyotype (Samaras and Spithill, 1987). The PSA-2 probes hybridized to the chromosome band immediately above chromosome band 13 (Fig. 1A), which is identified by the presence of a dispersed β-tubulin gene (Spithill and Samaras, 1987). None of the smaller L. major chromosomes hybridized to the PSA-2 probes.

FIG. 1. Genomic organization of PSA-2 genes. Panel A, chromosomal location of the PSA-2 locus by hybridization to chromosome bands of L. major separated by pulse-field gel electrophoresis. The left side shows an ethidium bromide-stained pulse-field gel that separated chromosome bands 10-18. The gel was blotted, and the filter was hybridized either to a β-tubulin probe, which identifies chromosome bands 7, 13, and 21 (unresolved (u) on this gel), or to PSA-2, which unequivocally hybridizes to chromosome band 14. The PSA-2 probe was from a genomic clone containing the common COOH-terminal coding region. Panel B, complex organization of PSA-2 genes in L. major V121. DNA (1 μg) was digested with each enzyme as shown, electrophoresed through a 1% agarose gel, blotted onto a nylon filter, and probed with the PSA-2 cDNA 6.4 probe. The filter was washed at 0.2× SSC (standard saline citrate), 65 °C. Panel C, estimation of the PSA-2 gene copy number and locus size. L. major V121 genomic DNA (1 μg) was digested with 1 unit of SalI for the times shown (in minutes), electrophoresed through a 0.4% agarose gel, blotted onto a nylon filter, and probed and washed as in panel B. Panel D, confirmation that the multiple PSA-2 bands are not due to partial digestion products. L. major V121 genomic DNA (1 μg) was digested and blotted as described above and probed with PSA-2 cDNA 6.4. This filter was then stripped and probed with an L. major gp63 probe (Button et al., 1989). All markers are in kb.

When genomic DNA from L. major V121 was hybridized to a 1.9-kb PSA-2 cDNA probe (PSA-2 cDNA 6.4, Fig. 1B), a very complex pattern was observed in most restriction enzyme digests, indicating the presence of multiple PSA-2 gene copies on chromosome band 14. Similar hybridization patterns were observed with all PSA-2 probes, including those derived only from the COOH-terminal coding region of PSA-2 genes. This is shown in Figs. 2 and 3, where a genomic probe and a coding region probe identify common PSA-2 bands in the Southern blots. Note the presence of a band of similar intensity at approximately 7.0 kb in the HindIII, SalI, XhoI, and SmaI digests, which indicated the possibility of tandemly arranged PSA-2 genes within the genome. Tandemly repeated genes are common in Leishmania (and all trypanosomatids) and have been found, for example, for gp63 genes (Button et al., 1989; Murray et al., 1990; Miller et al., 1990), mini-exon genes (Iovannisci and Beverley, 1988), genes encoding a transporter-like protein (Stein et al., 1990), and β-tubulin genes (Landfear et al., 1984; Spithill and Samaras, 1987).

FIG. 2. Southern blot of L. major genomic DNA probed with the original genomic clone isolated from a λgt11 library (Murray et al., 1989b). This probe is 2.2 kb in length (see Fig. 6) and contains only a small amount (0.2 kb) of PSA-2 coding region compared with the other clones, but it still identifies bands common to each probe (compare the pattern in this figure to the Southern blot in Fig. 2).

FIG. 3. Southern blot of L. major genomic DNA probed with a coding region probe derived from the common COOH terminus found in all the PSA-2 cDNA and genomic clones. The probe used was the 200-base pair probe constructed by polymerase chain reaction from the coding region present in the original λ71A clone. Note that this probe also identifies many bands on a Southern blot, in a similar manner to the longer probes.

To estimate the PSA-2 gene copy number, genomic DNA was partially digested with SalI and fractionated in a low percentage agarose gel. A PSA-2 cDNA probe identified multiple bands on a Southern blot of this gel (Fig. 1C). The number of bands in the 2-h digest indicates that there are at least 14-15 PSA-2 genes in the L. major genome. Adding the fragment sizes gives an estimate of around 200 kb occupied by PSA-2 genes. The PSA-2 genes therefore occupy about one-fifth to one-quarter of chromosome band 14, which is approximately 900 ± 50 kb (Samaras and Spithill, 1987). It is, however, impossible to determine from these results whether the genes are clustered at a single locus or dispersed along the chromosome.
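The locus-size estimate is simple arithmetic. A minimal sketch, using only the figures quoted above (about 200 kb of summed SalI fragments and a chromosome band 14 size of 900 ± 50 kb), brackets the fraction of the chromosome that the PSA-2 genes occupy; the function name is a hypothetical choice for illustration.

```python
def locus_fraction(locus_kb: float, chrom_kb: float, chrom_err_kb: float) -> tuple[float, float, float]:
    """Fraction of a chromosome occupied by a locus, as (low, central, high).

    The low bound divides by the largest plausible chromosome size and the
    high bound by the smallest, so the chromosome-size uncertainty is propagated.
    """
    low = locus_kb / (chrom_kb + chrom_err_kb)
    central = locus_kb / chrom_kb
    high = locus_kb / (chrom_kb - chrom_err_kb)
    return low, central, high


# Figures quoted in the text: ~200 kb of PSA-2 fragments, chromosome band 14 = 900 +/- 50 kb.
low, central, high = locus_fraction(200, 900, 50)
print(f"{low:.1%} / {central:.1%} / {high:.1%}")  # 21.1% / 22.2% / 23.5%
```

The resulting range, roughly 21-24%, is consistent with the ~20% figure given in the abstract; it inherits the uncertainty of the summed fragment sizes, which this sketch does not model.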

Finally, we determined whether the multiple bands observed in genomic Southern blots might be due to partial restriction digestion. Fig. 1D shows that when the filter is probed with a gp63 genomic fragment, the pattern expected for the extensively characterized gp63 locus (Button et al., 1989; Murray et al., 1990) is observed. The same filter, when probed with a PSA-2 probe, gives the multiple banding pattern described above. Because the DNA on this filter was completely digested at the gp63 locus, we conclude that the multiple PSA-2 bands are not partial digestion products.

PSA-2 Transcripts. Because the locus containing PSA-2 genes was so complex, we anticipated finding multiple PSA-2 transcripts. Using conventional Northern blotting (Fig. 4), four major PSA-2 transcripts ranging from 2.6 to 5.3 kb were observed. We then looked for larger PSA-2 transcripts by nicking the RNA in formaldehyde gels with NaOH, followed by transfer. Fig. 4, B and C, shows that this treatment allowed the identification of at least four lower abundance PSA-2 transcripts. We are currently investigating whether any of these transcripts are polycistronic. Polycistronic transcripts have been found for several trypanosomatid RNAs, for example, Trypanosoma brucei VSG expression-linked transcripts (Johnson et al., 1987) and T. brucei calmodulin transcripts (Tschudi and Ullu, 1988).

FIG. 4. Analysis of PSA-2 transcripts. Panel A, total RNA (10 μg) from L. major V121 promastigotes was electrophoresed through a formaldehyde agarose gel, blotted onto nitrocellulose, and probed with PSA-2 cDNA 6.4. Panel B, as in panel A, except that the formaldehyde gel was treated with 50 mM NaOH for 15 min prior to blotting. The smaller bands of hybridization are background due to the large amounts of ribosomal RNA. Arrowheads indicate the larger transcripts, whose sizes range from 7 to 10 kb. Panel C, longer exposure of the experiment in panel B.

FIG. 5. Genomic organization of the PSA-2 tandem repeat and derived clones. A partial restriction map of the SmaI-isolated tandem repeat unit (λTR1), which has EcoRI ends owing to the addition of linkers for cloning into λgt10, is shown along with the position of the PSA-2 coding region and of each cDNA clone (cDNAs 6.4, 4.6, and 2.5). λ71A, genomic clone (Murray et al., 1989b). The enzymes are: R, EcoRI; H, HindIII; N, NotI; X, XhoI; S, SalI.

   1 GCTGCTCGCCTCTCTCCCCCGCACGAGGCTACGTACGACGCTGTCGGCCCCCTCGCTCTG
  61 CCTGGTAAGCTCAGCAGACACCGACGCCCGAGCAATCCCGCCCACGGACCGTGTGCGCCC
 121 GCTCTGCTCGTGACCCTGGCTGCGAATGGCGCAGTGCGTGCGTCGGCTGGTGCTCGGCGA
       1 M A Q C V R R L V L G D
 181 CGCTCGCCGCTGCGGTGGCGCTGCTGCTGTGCACGAGCAGGCTCGGGTGGCGCGTGCTGC
      13 A R R C G G A A A V H E Q A R V A R A A
 241 TGGGACGGGCGACTTCACTGCGGCGCAGCGGACGAACACGCTGGCGGTGCTGCAGGCGTT
      33 G T G D F T A A Q R T N T L A V L Q A F
 301 TGGGCGTGCGATCCCTAAGCTTGGGGAGAAGTGGGCGGGCAACGACTTCTGCTCGTGGGA
      53 G R A I P K L G E K W A G N D F C S W E
 361 GGCCGTCTTGTGCAATGCGCCGGACGTGTACGTGTCGGGAATCAGTCCGACGTATGCCGG
      73 A V L C N A P D V Y V S G I S P T Y A G
 421 CACGCTGCCGGAGATGCC4GAGAACGTCGACTACAGGCACGTCGTGATCAGGCGGCTCGA
      93 T L P E M P E N V D Y R H V V I R R L D
 481 CTTTTCCGAAATGGGGCCGGGGCTGAGCGGGACCGTGCCCGCCTCATGGCACTCGATGAC
     113 F S E M G P G L S G T V P A S W H S M T
 541 ATCTTTGGAGTCGTTGTCGATTG~GTGTGAAAGCATCTCCGGCAGTGTGCCCCCCGA
     133 S L E S L S I E K C E S I S G S V P P E
 601 GTGGGGCTCGATGACATCGCTGAGTGTTCTCAATCTGCGGGGCACAGGCATCTCCGGCAC
     153 W G S M T S L S V L N L R G T G I S G T
 661 GCTGCCGCCCCAGTGGAGTGGGATGTCGAAGGCCCGGTCCCTGCAGCTGCAGGACTGCGA
     173 L P P Q W S G M S K A R S L Q L Q D C D
 721 CCTGTCCGGCAGTCTGCCCTCTTCGTGGTCTGCGATACCGATGCTGGCTTCCGTCTCTCT
     193 L S G S L P S S W S A I P M L A S V S L
 781 TAAGGGCAACAAGTTCTGCGGGTGTGTGCCGGACTCGTGGGATCAGAAGGCTGGTCTTGT
     213 K G N K F C G C V P D S W D Q K A G L V
 841 TGTGGACATCGAGGACAAGCACAAGGGCAGCGACTGCTTGGCTGCTAAGGACTGCGCAAC
     233 V D I E D K H K G S D C L A A K D C A T
 901 GACCACCACTAAGCCCTCCGCCACGACAGCGACCACCCCGAACCTCACTAACTTTCCCCC
     253 T T T K P S A T T A T T P N L T N F P P
 961 TACGCCGAGGACCACGACTGAGCCGCTTACCACAACCAGCACTGAGGCACCGGCTG~CC
     273 T P R T T T E P L T T T S T E A P A E P
1021 CACAACCACCACTGAGGCACCGGCTGAACCCACGACCACTGCTACCCCAACAAACACGCC
     293 T T T T E A P A E P T T T A T P T N T P
1081 GACTCCTGCACCAGAGACGGAGTGCGAGGTGGATGGGTGTGAGGTGTGCGAGGGGGACTC
     313 T P A P E T E C E V D G C E V C E G D S
1141 CGCTGCGAGGTGCGCGAGGTGCCGTGAGGACTACTTCCTGACGGACGAGAAGACGTGCCT
     333 A A R C A R C R E D Y F L T D E K T C L
1201 GAAGCACAACGATGGCGGTGTTGCTGCTGTGTCGAGCGGAGTGGCAGCAGCAGCTGTTGT
     353 K H N D G G V A A V S S G V A A A A V V
1261 GTGCGTGGCTGTGCTGTTCAGCGTGGGGCTGGCGGCCTGA 1300
     373 C V A V L F S V G L A A * 384

FIG. 6. Sequence of the PSA-2 coding region of the 7.0-kb clone. A putative hydrophobic leader sequence is underlined, along with the GPI anchor signal sequence at the carboxyl terminus. A single potential N-linked glycosylation site is in bold. The sequence has been submitted to the EMBL database.

Derivation of the PSA-2 Sequence. Initially we tried to clone a full-length cDNA from a λgt10 L. major promastigote cDNA library using a short, 235-base pair polymerase chain reaction-generated probe containing only the coding region of an expressing genomic clone (Fig. 5; Murray et al., 1989b). Seven cDNAs were isolated, the longest three being 2.2, 2.1, and 1.9 kb (cDNAs 2.5, 4.6, and 6.4; Fig. 5). These were sequenced and found to contain very long 3′-untranslated regions preceding the poly(A) tail (see Fig. 5 for the position of these clones and Fig. 9 for the sequences of the coding regions). We were unable to obtain the 5′ end of a PSA-2 cDNA either by the RACE protocol (Frohman et al., 1988) or by polymerase chain reaction using the conserved mini-exon sequence, which was anticipated to be present at the 5′ end of each PSA-2 mRNA.

We then decided to clone the entire 7.0-kb band, which would probably contain the 5′ end of a PSA-2 gene if a tandem repeat unit was present. The band in the SmaI digest (Fig. 1B) was chosen because it is well separated from other high molecular weight PSA-2 genomic fragments, unlike the corresponding bands in the SalI or XhoI digests. The SmaI fragment was cloned into λgt10, and one clone (λTR1) was chosen for further analysis. This clone was subcloned and mapped, and the PSA-2 coding region was sequenced (Fig. 6). All the cDNA probes hybridized exclusively to a 3.8-kb EcoRI/SalI fragment (see Fig. 5). This fragment, when partially sequenced, contained the 3′ end of the PSA-2 coding region. To obtain the remainder of the PSA-2 coding region, the other EcoRI/SalI fragment (3.1 kb) was partially sequenced. When the two fragments were combined and translated, there was one open reading frame (ORF), and it contained the PSA-2 ORF from the original genomic fragment λ71A (Murray et al., 1989b). The ORF consisted of 384 amino acids, from which a putative short leader peptide and the GPI anchor signal sequence (both underlined in Fig. 6) would be removed to produce a mature protein of around 320 amino acids. The molecular weight of this deduced protein after removal of the signal sequences was calculated at approximately 36,000. There was one potential N-linked glycosylation site, at amino acid position 266. The presence of a putative leader sequence and of an initiator methionine preceded by an in-frame stop codon (nucleotides 131-133) strongly suggests that this is the bona fide PSA-2 tandem repeat ORF.
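Three of the sequence checks in the paragraph above can be reproduced from the printed sequence of the 7.0-kb clone (Fig. 6). The sketch below, with hypothetical helper names, (i) scans back codon by codon from the initiator ATG at nucleotides 146-148 to the nearest in-frame stop, using nucleotides 121-180 of Fig. 6 with position 146 read as A (as the Met codon requires); (ii) converts the ATG start (146) and the last base of the TGA stop (1300) into an ORF length; and (iii) looks for N-X-S/T glycosylation sequons (X not Pro) in residues 253-272 as translated in Fig. 6. It assumes the standard genetic code's stop codons and is an illustration, not the authors' analysis procedure.

```python
import re
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (an assumption for this sketch)


def nearest_upstream_stop(seq: str, atg: int) -> Optional[int]:
    """Scan back in frame from an ATG (0-based index) to the nearest stop codon.

    Returns the 0-based start of that stop, or None if an in-frame ATG or the
    start of the sequence is reached first (the candidate is then not shown to
    be the first Met after a stop).
    """
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    i = atg - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return i
        if codon == "ATG":
            return None
        i -= 3
    return None


def orf_length_aa(atg_start: int, stop_end: int) -> int:
    """Residues encoded between a 1-based ATG start and the last base of the stop codon."""
    span = stop_end - atg_start + 1
    if span % 3:
        raise ValueError("start and stop are not in the same frame")
    return span // 3 - 1  # the stop codon encodes no residue


def sequon_starts(protein: str) -> list[int]:
    """0-based indices of Asn residues in N-X-S/T motifs with X != Pro."""
    return [m.start() for m in re.finditer(r"N(?=[^P][ST])", protein)]


# Nucleotides 121-180 of the Fig. 6 sequence (position 146 taken as A).
SEG_START = 121
segment = "GCTCTGCTCGTGACCCTGGCTGCGAATGGCGCAGTGCGTGCGTCGGCTGGTGCTCGGCGA"
stop = nearest_upstream_stop(segment, 146 - SEG_START)
print("in-frame stop at nt", SEG_START + stop)       # in-frame stop at nt 131
print("ORF length (aa):", orf_length_aa(146, 1300))  # ORF length (aa): 384

# Residues 253-272 as translated in Fig. 6.
RES_START = 253
residues = "TTTKPSATTATTPNLTNFPP"
print("sequons at residue", [RES_START + i for i in sequon_starts(residues)])  # [266]
```

The sequon scan covers only the 20-residue window shown; checking the whole protein would require the complete translated ORF.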

The calculated molecular weight of the protein encoded by the 7.0-kb clone, in the absence of post-translational modifications such as glycosylation and the GPI anchor, is inconsistent with the previously reported molecular weight of PSA-2 surface glycoproteins, Mr 80,000-94,000 (Murray et al., 1989a, 1989b). Recently, using a rabbit antiserum directed against a PSA-2 fusion protein, we have detected an abundant Mr 46,000-50,000 protein in TX-114 extracts of L. major promastigotes that we believe may correspond to the product of the smaller PSA-2 genes.² Further, the larger PSA-2 proteins (Mr 80,000-94,000) may represent the products of PSA-2 genes with larger ORFs (e.g. cDNAs 2.5 and 4.6, which contain 67- and 77-amino acid insertions, respectively, relative to the other sequences). These insertions would give the proteins encoded by cDNAs 2.5 and 4.6 a significantly larger Mr. This, together with the addition of GPI anchors and carbohydrate, could easily account for the larger PSA-2 forms previously described (Murray et al., 1989b). Our current aim is to establish precursor-product relationships between PSA-2 proteins and their genes.
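The size argument can be made roughly quantitative. The sketch below assumes a rule-of-thumb average residue mass of 110 Da (an assumption, not a value from this paper) and, purely for illustration, that the longer genes otherwise match the ~320-residue mature protein of the 7.0-kb clone; both assumptions are marked in the code. It estimates polypeptide mass only and ignores carbohydrate and the GPI anchor.

```python
AVG_RESIDUE_DA = 110.0  # assumed rule-of-thumb mean residue mass (Da), not from the paper
MATURE_RESIDUES = 320   # approximate mature length of the 7.0-kb clone product (from the text)


def approx_polypeptide_da(n_residues: int) -> float:
    """Crude polypeptide mass estimate: residue count x average residue mass."""
    return n_residues * AVG_RESIDUE_DA


print(round(approx_polypeptide_da(MATURE_RESIDUES)))  # 35200, close to the ~36,000 quoted

# Hypothetical: insertions of 67 (cDNA 2.5) and 77 (cDNA 4.6) residues added to a 320-residue core.
for clone, extra in (("2.5", 67), ("4.6", 77)):
    added = approx_polypeptide_da(extra)
    total = approx_polypeptide_da(MATURE_RESIDUES + extra)
    print(clone, round(added), round(total))
# 2.5 7370 42570
# 4.6 8470 43670
```

On these assumptions the insertions add only about 7-8.5 kDa of polypeptide, so most of the difference from Mr 80,000-94,000 would have to come from the post-translational additions invoked in the text, or from length differences elsewhere in the molecules that this sketch does not model.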

² P. J. Murray, unpublished data.

PSA-2 cDNAs Exhibit Sequence Variability and Contain Repeats in Their Coding Regions. The presence of multiple PSA-2 genes in the L. major genome and of multiple PSA-2 transcripts and proteins (Murray et al., 1989b) suggested that there may be some degree of structural diversity within the gene family. To address this question, the three longest cDNAs were sequenced. The sequences of the coding regions are presented in Figs. 7-9. The common 3′ end encoding the GPI anchor signal sequence was identified by comparison with the genomic sequence (this is underlined in Fig. 10). The complete coding regions of the three cDNAs are compared with the λTR1 coding sequence in Fig. 10. Each cDNA represents a distinct PSA-2 gene, and none was identical to the λTR1 coding sequence. Two cDNAs, 4.6 and 2.5, contained a repeat unit consisting of five or seven copies of the sequence TTTTTTTKPP. In cDNA 4.6, the repeat region was followed by a stretch of 17 consecutive threonine residues. The TTTTTTTKPP repeat was most similar to the threonine-rich region in the Sgs-3 salivary glue glycoprotein of Drosophila, a heavily glycosylated mucin-like molecule (Garfinkel et al., 1983). cDNA 6.4 and the λTR1 clone contained two copies of the sequence TEAPAEPTTT, whereas cDNAs 4.6 and 2.5 contained only one copy. The carboxyl-terminal region following the threonine-rich domain was highly conserved in all cases, with cDNAs 4.6 and 2.5 having five and four amino acid differences, respectively, relative to the other sequences. The remaining, more 5′ portions of the cDNAs differed from one another to various degrees. Of note were the extreme sequence differences between the tandem repeat clone and the 5′ end of cDNA 2.5. We carefully examined alternate reading frames but could find no sequence similarities; in addition, the nucleotide sequences were highly dissimilar in this region. This result indicates that each cDNA was transcribed from a different PSA-2 gene.
Thus, there is a large degree of variability within the PSA-2 gene family, with λTR1 and cDNA 6.4 being closely related, and cDNAs 2.5 and 4.6 being related to each other. We do not believe these diverse sequences are due to heterogeneity that may have arisen with time in culture of L. major V121, as this parasite is clonal and has shown no variation in a variety of genetic markers over time. Further, fresh parasite stocks originating from the first clone are regularly returned to culture. Finally, at least three PSA-2 genes appear to be transcribed, but whether all the genes are translated remains to be determined.
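Repeat copy numbers, such as the two TEAPAEPTTT copies reported for the 7.0-kb clone, can be counted directly from a translated sequence. A minimal sketch, applied to residues 273-312 as translated in Fig. 6 (the helper names are hypothetical), reports the residue number at which each non-overlapping copy starts and whether the copies are tandem, i.e. directly adjacent.

```python
import re


def motif_starts(protein: str, motif: str) -> list[int]:
    """0-based start indices of non-overlapping, left-to-right matches of a literal motif."""
    return [m.start() for m in re.finditer(re.escape(motif), protein)]


def is_tandem(starts: list[int], motif_len: int) -> bool:
    """True if every copy begins exactly where the previous copy ends."""
    return all(b - a == motif_len for a, b in zip(starts, starts[1:]))


RES_START = 273  # residue number of the first letter below (Fig. 6)
segment = "TPRTTTEPLTTTSTEAPAEPTTTTEAPAEPTTTATPTNTP"
motif = "TEAPAEPTTT"
starts = motif_starts(segment, motif)
print(len(starts), [RES_START + s for s in starts], is_tandem(starts, len(motif)))
# 2 [286, 296] True
```

The same helpers could be applied to the TTTTTTTKPP repeats of cDNAs 2.5 and 4.6 given their translated sequences. Because `re.finditer` returns non-overlapping matches, this counting suits tandem copies but would undercount a motif that overlaps itself.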

Repetitive Elements within PSA-2 Sequences. The structure of the coding regions of PSA-2 genes is shown diagrammatically in Fig. 11A. The sequences can be subdivided into blocks consisting of a putative leader sequence; an approximately 25-amino acid repeat present in three copies in the λTR1 clone (see Fig. 11B); a threonine-rich region, which is greatly expanded into repeat units in cDNAs 4.6 and 2.5; a cysteine-rich region; and the GPI anchor signal sequence. The amino acid structure of the 25-amino acid repeats is shown in Fig. 11B. Each of the cDNAs terminated at the 5′ end in this region. None of the sequences exhibited significant similarity to any proteins present in the EMBL/GenBank sequence libraries, except for the aforementioned similarity of the threonine-rich region to the mucin-like Sgs-3 gene product. None of the repeats were similar to those found in the repetitive Leishmania antigens described by Wallis and McMaster (1987), which are the only other examples of repeats found in Leishmania antigens to date. The only other notable similarity was the arrangement of cysteine residues in the COOH-terminal domain, which was related to that found in epidermal growth factor repeat-containing proteins such as the Notch gene product (Wharton et al., 1985; Kidd et al., 1986). The relevance of this finding is uncertain but may simply be

FIG. 7. Complete sequence of PSA-2 cDNA 6.4. This was the first cDNA clone sequenced. The clone was sequenced completely, including the long 3′-untranslated region leading to a poly(A) tail. EcoRI ends are shown at each end of the clone. The stop codon was identified by comparison with the original genomic sequence.

FIG. 8. Partial sequence of PSA-2 cDNA 2.5. This clone was initially sequenced with oligonucleotides derived from the cDNA 6.4 sequence to ascertain the extent of the coding region. Approximately 50% of this clone comprised 3'-untranslated region and poly(A) tail. The untranslated region was not sequenced in depth, so only the coding region is presented.

[Figs. 7 and 8, sequence panels: nucleotide sequence of PSA-2 cDNA 6.4 (positions 1-1921) and of the cDNA 2.5 coding region (positions 1-1117), each with the deduced amino acid sequence.]

concerned with the way PSA-2 proteins are folded in the COOH-terminal domain.

The Structure of the PSA-2 Locus Is Similar in Other Species of Leishmania. It was necessary to determine whether the complex genomic organization of PSA-2 genes was unique to L. major or also present in other species of this genus. At the chromosome level, the COOH-terminal coding region probe detected PSA-2 loci on chromosomes of similar size to chromosome band 14 of L. major V121 (Fig. 12). Interestingly, two PSA-2 loci were identified in the L. tropica (Fig. 12, lanes 3 and 5) and L. mexicana (Fig. 12, lane 9) strains analyzed in this experiment. One locus was present on a chromosome of ~950 kb, as in L. major V121, and the other on a larger, ~1.4-Mb chromosome. No homologous chromosomal loci were identified in L. enrietti (lane 6) or L. enrietti L144 (lane 7) at this wash stringency. The PSA-2 locus in L. donovani L52 (lane 8) appeared to reside on a smaller chromosome than in the other species.

Genomic DNA from a variety of Leishmania digested with PvuII hybridized to the PSA-2 cDNA 6.4 probe with patterns of similar complexity to that of L. major V121. DNA from other L. major strains gave a pattern similar to that of L. major V121 (Fig. 13, compare lane 1 with lanes 2-6), while other strains showed that, although PSA-2 genes were present in their genomes, the


FIG. 9. Partial sequence of PSA-2 cDNA 4.6. This clone was derived essentially as described for cDNA 2.5 above. Again, approximately 50% of the cDNA clone was 3'-untranslated region, which, once the coding region had been identified, was not sequenced further.

[Fig. 9, sequence panel: nucleotide sequence of PSA-2 cDNA 4.6 (positions 1-984) with the deduced amino acid sequence.]

[Fig. 10, alignment panel: deduced amino acid sequences of the tandem repeat unit (TAN. REP.) and of cDNAs 6.4, 4.6, and 2.5, aligned in blocks of 50 residues.]

FIG. 10. Deduced amino acid sequences of each PSA-2 cDNA compared with the tandem repeat unit coding region. The hydrophobic COOH-terminal GPI anchor signal sequence is underlined. Amino acid differences between the cDNAs and the tandem repeat are in bold. Numbering follows the tandem repeat sequence shown in Fig. 4. All cDNA sequences have been submitted to the EMBL database.

overall organization differed from that of L. major. DNA from the lizard parasite L. tarentolae (Fig. 13, lane 11) also hybridized strongly to PSA-2 probes. No evidence for homologous PSA-2 genes was found in T. brucei, Leptomonas, or Crithidia, even at low stringency (data not shown).


FIG. 11. Repeat units and overall structural domains of PSA-2 proteins. In panel A, a diagram of each cDNA and the tandem repeat unit is shown. In panel B, an alignment of the repeat units in the tandem repeat clone is shown. The amino acid sequence numbering is the same as in Figs. 4 and 5.


FIG. 12. Pulsed-field gel electrophoresis of Leishmania PSA-2 loci. Chromosomes from a variety of Leishmania species were separated by pulsed-field gel electrophoresis, blotted onto Hybond, and probed with a PSA-2 COOH-terminal coding region probe. The filter was washed in 2× SSC at 65 °C. Panel A is the ethidium-stained gel and panel B the corresponding autoradiograph of the Southern blot. Lanes 1 and 10, L. major V121; lane 2, L. major Friedlin strain; lane 3, L. tropica L32; lane 4, L. major L38; lane 5, L. tropica L39; lane 6, L. enrietti; lane 7, L. enrietti strain L144; lane 8, L. donovani L52; lane 9, L. mexicana L94. An artifact in lane 3 obscures the higher molecular weight PSA-2 locus.


[Fig. 13, autoradiograph: lanes 1-11; size markers at 23.1, 9.4, 6.6, 4.3, 2.3, and 2.0 kb.]

FIG. 13. Genomic Southern blots of PSA-2 genes in Leishmania. Genomic DNA (~1-2 µg) was digested with PvuII, fractionated in a 1% agarose gel, transferred to Hybond N, and probed with PSA-2 cDNA 6.4. The filter was washed in 2× SSC at 65 °C. Lane 1, L. major V121; lane 2, L. major L119; lane 3, L. major NIH-S; lane 4, L. major Moshkovsky strain; lane 5, L. major L251; lane 6, L. major L287; lane 7, L. tropica L32; lane 8, L. tropica L39; lane 9, L. major (undesignated strain); lane 10, L. donovani L52; and lane 11, L. tarentolae. Markers are in kb.

DISCUSSION

Multigene families encoding surface antigens are common in trypanosomatids and in protozoa generally. The most notable case is the variant surface glycoprotein of Trypanosoma sp., which is encoded by hundreds of distinct genes, only one of which is expressed at a time (reviewed by Donelson, 1988). Multigene families also encode polymorphic surface antigens in T. cruzi (Beard et al., 1987; Kahn et al., 1990; Petersen et al., 1986, 1989; Takle et al., 1989; Ibanez et al., 1988) and Babesia bovis (Cowman et al., 1984). In Leishmania, several surface proteins are likewise encoded by multigene families, such as gp63 (Button et al., 1989; Miller et al., 1990; Murray et al., 1990), which is the major surface antigen of promastigotes (Bordier, 1987) and is also expressed in amastigotes (Button et al., 1989; Frommel et al., 1990; Medina-Acosta et al., 1989). In addition, an ATPase (Meade et al., 1988) and a glucose transporter-like molecule (Stein et al., 1990) are encoded by tandemly repeated multigene families. The results described in this paper show that the PSA-2 proteins are also the products of a multigene family. The gene family was located on a ~950-kb chromosome and occupied approximately one-fifth of this chromosome. Southern analysis suggested that several PSA-2 genes may be linked in a tandem repeat array (the 7.0-kb band), along with at least 10 other dispersed genes. The organization of the locus was complex, unlike the simpler arrangement of tandemly repeated genes encoding gp63 (Button et al., 1989; Miller et al., 1989), the glucose transporter-like molecule (Stein et al., 1990), or the ATPase (Meade et al., 1987, 1989). In addition, at least four major transcripts, as well as some large minor transcripts, hybridized to PSA-2 probes, raising the possibility that polycistronic transcripts are derived from the locus.

The sequences of three cDNAs and the genomic clone showed that each encodes a distinct PSA-2 isogene. Although the overall structure of the 3'-coding region of each cDNA was similar to that of the genomic clone, two cDNAs (2.5 and 4.6) contained a region of direct repeats not found in cDNA 6.4 or the genomic clone. This result suggests that individual PSA-2 genes may vary considerably in sequence and, significantly, that many of these genes may be transcribed. The 5'-most region of one cDNA, 2.5, was completely distinct in nucleotide and amino acid sequence from the comparable region of the repeat unit gene, showing that the PSA-2 locus encodes a family of distinct but related polypeptides.

It is important to note that the cDNA library from which the PSA-2 clones were isolated was made from mRNA of the cloned parasite L. major V121. Each cDNA therefore represents a transcript of a distinct PSA-2 gene within the parasite population. An essential question is whether individual parasites within the population express a single PSA-2 gene or whether an individual can express more than one, or all, PSA-2 genes. Resolving this point will require PSA-2 reagents that can be used at the single-cell level, for example, monoclonal antibodies specific for each gene family member. A further question is whether expression of several or all PSA-2 genes varies through the Leishmania life cycle. For example, are some PSA-2 genes preferentially expressed during metacyclogenesis or in the amastigote stage? Studies with the ATPase gene indicate specific expression of one gene during transformation. We have found that some PSA-2 genes are expressed in amastigotes.

Because of the diversity within the PSA-2 cDNA and genomic sequences, it is difficult to assign even a tentative functional role to PSA-2 proteins on the basis of sequence similarity to known proteins. The conservation of the hydrophobic anchor sequence suggests that all of the sequenced members of the gene family are GPI-anchored. Beyond this, little can be discerned from the sequences. Of interest is the glycosylation of PSA-2 proteins. The lectin-binding data presented in Murray et al. (1989a, 1989b) suggest that PSA-2 proteins are glycosylated with mannose and N-acetylglucosamine. Binding of PSA-2 proteins to concanavalin A suggests the presence of N-linked mannose-containing oligosaccharides, such as those found in L. mexicana gp63 (Olafson et al., 1989), while the N-acetylglucosamine could be present in N- or O-linked oligosaccharides (Ferguson et al., 1983). The high proportion of threonine residues in all the inferred PSA-2 sequences suggests that these may serve as sites for O-linked glycosylation. In trypanosomatids, O-linked sugars have been described only on gp72, an epimastigote-specific antigen of T. cruzi (Ferguson et al., 1983). No potential N-linked glycosylation sites (i.e. the N-X-T/S consensus sequence) were found in any of the sequences except for a single site in the XTRI clone; however, because all the cDNAs were incomplete, some gene family members may contain several such sites.
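Both sequence features discussed above, the N-X-T/S consensus for N-linked glycosylation and the high threonine content, can be screened for mechanically. The following is a minimal Python sketch under stated assumptions: the input is an ungapped one-letter protein sequence, the function names `find_sequons` and `threonine_fraction` are hypothetical, and excluding proline at the X position is a commonly used refinement that goes beyond the consensus as written in the text. A sequon or a threonine-rich stretch marks only a candidate site; neither shows that glycosylation actually occurs.

```python
import re

# N-X-S/T sequon; the lookahead lets overlapping sites (e.g. "NNST") both match.
# Assumption: X may be any residue except proline, a common refinement of the
# N-X-T/S consensus as stated in the text.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of the N, motif) for each candidate N-linked site.

    Expects an ungapped one-letter amino acid string.
    """
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


def threonine_fraction(protein: str) -> float:
    """Fraction of residues that are threonine; non-letters (gaps, '*') are ignored."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        return 0.0
    return residues.count("T") / len(residues)


if __name__ == "__main__":
    fragment = "TTPNLTNFPP"  # tandem-repeat fragment as read from Fig. 10
    print(find_sequons(fragment))
    print(round(threonine_fraction("TTTTTKPPT"), 2))
```

On the tandem-repeat fragment TTPNLTNFPP read from Fig. 10, the scan reports a single candidate, NLT, at position 4 of the fragment; the following NFP is rejected because its third residue is not S or T.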

Our current investigations focus on the expression patterns of PSA-2 genes and on the relationships among the individual proteins and their post-translational modifications.

Since the work in this paper was completed, Lohman et al. (1990) have described the sequence of a gp46 gene of L. amazonensis. This molecule has been characterized as a surface antigen (Kahl and McMahon-Pratt, 1987) capable of inducing partial protection against L. amazonensis infection in certain mouse strains (Champsi and McMahon-Pratt, 1988). Comparison of the gp46 sequence (Lohman et al., 1990) with the PSA-2 XTRI sequence revealed regions of extremely high conservation at both the nucleotide and amino acid levels. Fig. 14 shows an amino acid comparison between XTRI and gp46, while Fig. 11A shows the overall structural features


[Fig. 14, alignment panel: deduced amino acid sequence of XTRI (top line) aligned with gp46/M-2, with identical residues marked by asterisks.]

FIG. 14. Comparison of the deduced amino acid sequences of the XTRI coding region (top line) to the gp46/M-2 sequence derived by Lohman et al. (1990). Identical residues are shown with asterisks and hydrophobic regions are underlined. Numbering is according to the XTRI amino acid sequence.

of gp46 in relation to the PSA-2 sequences. This result indicates that gp46 belongs to the PSA-2 gene family. A particularly interesting feature of the gp46 sequence lies in the COOH-terminal region. As shown in Fig. 14, the PSA-2 cysteine-rich COOH-terminal region is almost identical to the gp46 sequence over 33 residues. Beyond this region the sequences diverge abruptly at both the nucleotide and amino acid levels. Because all the L. major PSA-2 clones have nearly identical COOH termini, it is plausible that the gp46 gene sequenced by Lohman et al. (1990) has undergone a gene conversion event that generated a novel COOH terminus, including a novel GPI anchor signal sequence. Because PSA-2-like sequences are found as multigene families in all Leishmania species examined to date, such diversity may have arisen through unequal crossing-over. These data suggest a further level of complexity and diversity in PSA-2 biology.
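Fig. 14 marks identical residues with asterisks, and the text describes the cysteine-rich COOH-terminal region as almost identical over 33 residues. The bookkeeping behind such a comparison can be sketched in Python as follows, assuming the two sequences are already aligned (equal length, with '.' or '-' as gap characters). The sketch does not perform the alignment itself, and `identity_line` and `percent_identity` are hypothetical helper names.

```python
GAPS = ".-"  # assumed gap characters in the aligned strings


def identity_line(a: str, b: str) -> str:
    """Mark identical aligned residues with '*'; gap columns never count as identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if x == y and x not in GAPS else " "
        for x, y in zip(a.upper(), b.upper())
    )


def percent_identity(a: str, b: str) -> float:
    """Identities as a percentage of columns where neither sequence has a gap."""
    marks = identity_line(a, b)
    compared = sum(1 for x, y in zip(a, b) if x not in GAPS and y not in GAPS)
    return 100.0 * marks.count("*") / compared if compared else 0.0


if __name__ == "__main__":
    # Cysteine-rich stretch as read from Figs. 10 and 14 (transcription assumed).
    xtri = "CEVDGCEVCEGDSAARCARCREDYFLTDEKTCL"
    gp46 = "CEVDGCEVCEGDSAARCARCREGYSLTDEKTCL"
    print(xtri)
    print(identity_line(xtri, gp46))
    print(gp46)
    print(f"{percent_identity(xtri, gp46):.1f}% identical")
```

With the 33-residue stretch as transcribed here from the figures, the sketch marks 31 of the 33 positions as identical (about 93.9%); the two mismatches are D/G and F/S.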

Acknowledgments. We thank Emanuela Handman for constant encouragement and the cDNA library that made our findings possible; Vikki Marshall, Anabel Silva, and Russell Lane for dedicated and patient assistance with the automated sequencing; Simon Foote and Alan Cowman for many helpful discussions; Dr. W. R. McMaster for the NIH-S genomic DNA; Nick Samaras for some Leishmania chromosome blocks; Jason Smythe and Ross Coppel for assistance with computer analysis; Graham Mitchell, Martin Elhay, Jason Smythe, Dave Kemp, and Richard Harvey for criticism and encouragement; and Roberto Cappai for providing some much-needed assistance with this paper.

REFERENCES

Beard, C. A., Wrightsman, R. A., and Manning, J. E. (1988) Mol. Biochem. Parasitol. 28, 227-234

Bordier, C., Etges, R. J., Ward, J., Turner, M. J., and Cardoso de Almeida, M. L. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 5988-5991

Bouvier, J., Bordier, C., Vogel, H., Reichelt, R., and Etges, R. J. (1989) Mol. Biochem. Parasitol. 37, 235-246

Button, L. L., Russell, D. G., Klein, H. L., Medina-Acosta, E., Karess, R. E., and McMaster, W. R. (1989) Mol. Biochem. Parasitol. 32, 271-284

Champsi, J., and McMahon-Pratt, D. (1988) Infect. Immun. 56, 3272-3279

Chaudhuri, G., Chaudhuri, M., Pan, A., and Chang, K.-P. (1989) J. Biol. Chem. 264, 1483-1489

Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J., and Rutter, W. J. (1979) Biochemistry 18, 5294-5299

Church, G. M., and Gilbert, W. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 1991-1995

Cowman, A. F., Bernard, O., Stewart, N., and Kemp, D. J. (1984) Cell 37, 653-660

Donelson, J. E. (1988) in The Biology of Parasitism (Englund, P., and Sher, A., eds) Alan R. Liss, New York

Feinberg, A. P., and Vogelstein, B. (1983) Anal. Biochem. 132, 6-13

Ferguson, M. A. J., Allen, A. K., and Snary, D. (1983) Biochem. J. 213, 313-319

Frohman, M. A., Dush, M. K., and Martin, G. R. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 8998-9002

Frommel, T. O., Button, L. L., Fujikura, Y., and McMaster, W. R. (1990) Mol. Biochem. Parasitol. 38, 25-32

Garfinkel, M. D., Pruitt, R. E., and Meyerowitz, E. M. (1983) J. Mol. Biol. 168, 765-789

Handman, E., and Goding, J. W. (1985) EMBO J. 4, 637-643

Handman, E., Hocking, R. E., Mitchell, G. F., and Spithill, T. W. (1983) Mol. Biochem. Parasitol. 7, 111-126

Hattori, M., and Sakaki, Y. (1986) Anal. Biochem. 152, 232-237

Henikoff, S. (1984) Gene (Amst.) 28, 351-359

Ibanez, C. F., Affranchino, J. L., Macina, R. A., Reyes, M. B., Leguizamon, A., Camargo, M. E., Aslund, L., Pettersson, U., and Frasch, A. C. C. (1988) Mol. Biochem. Parasitol. 30, 27-34

Iovannisci, D. M., and Beverley, S. M. (1989) Mol. Biochem. Parasitol. 34, 177-188

Ip, H. S., Orn, A. M., Russell, D. G., and Cross, G. A. M. (1990) Mol. Biochem. Parasitol. 40, 163-172

Johnson, P. J., Kooter, J. M., and Borst, P. (1987) Cell 51, 273-281

Kahl, L. P., and McMahon-Pratt, D. (1987) J. Immunol. 138, 1587-1595

Kahn, S., Van Voorhis, W. C., and Eisen, H. (1990) J. Exp. Med. 172, 589-597

Kidd, S., Kelley, M. R., and Young, M. W. (1986) Mol. Cell. Biol. 6, 3094-3108

Landfear, S. M., and Wirth, D. F. (1984) Nature 309, 716-717

Lohman, K. L., Langer, P. J., and McMahon-Pratt, D. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 8393-8397

Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY

McConville, M. J., and Bacic, A. (1990) Mol. Biochem. Parasitol. 38, 57-68

McConville, M. J., Bacic, A., Mitchell, G. F., and Handman, E. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 8941-8945

Meade, J. C., Shaw, J., Lemaster, S., Gallagher, G., and Stringer, J. R. (1987) Mol. Cell. Biol. 7, 3937-3946

Medina-Acosta, E., Karess, R. E., Schwartz, H., and Russell, D. G. (1989) Mol. Biochem. Parasitol. 37, 263-274

Miller, R. A., Reed, S. G., and Parsons, M. (1990) Mol. Biochem. Parasitol. 39, 267-274

Murray, P. J., Spithill, T. W., and Handman, E. (1989a) Infect. Immun. 57, 2203-2209

Murray, P. J., Spithill, T. W., and Handman, E. (1989b) J. Immunol. 143, 4221-4226

Murray, P. J., Handman, E., Glaser, T. A., and Spithill, T. W. (1990) Exp. Parasitol. 71, 294-304

Olafson, R. W., Thomas, J. R., Ferguson, M. A. J., Dwek, R. A., Chaudhuri, M., Chang, K.-P., and Rademacher, T. W. (1990) J. Biol. Chem. 265, 12240-12247

Orlandi, P. A., and Turco, S. J. (1987) J. Biol. Chem. 262, 10384-10391

Petersen, D. S., Wrightsman, R. A., and Manning, J. E. (1986) Nature 322, 566-568

Petersen, D. S., Fouts, D. L., and Manning, J. E. (1989) EMBO J. 12, 3911-

Russell, D. G. (1987) Eur. J. Biochem. 164, 213-221

Sacks, D. L. (1989) Exp. Parasitol. 69, 100-103

Samaras, N., and Spithill, T. W. (1987) Mol. Biochem. Parasitol. 26, 279-291

Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467

Spithill, T. W., and Samaras, N. (1987) Mol. Biochem. Parasitol. 24, 23-38

Stein, D. A., Cairns, B. R., and Landfear, S. M. (1990) Nucleic Acids Res. 18, 1549-1557

Takle, G. B., Young, A., Snary, D., Hudson, L., and Nicholls, S. C. (1989) Mol. Biochem. Parasitol. 37, 57-64

Tschudi, C., and Ullu, E. (1988) EMBO J. 7, 455-463

Wallis, A. E., and McMaster, W. R. (1987) J. Exp. Med. 166, 1814-1824

Wharton, K. A., Johansen, K. M., Xu, T., and Artavanis-Tsakonas, S. (1985) Cell 43, 567-581

Wilson, M. E., Hardin, K. K., and Donelson, J. E. (1989) J. Immunol. 143, 678-684