A Precursor-product Relationship in Molluscan Sperm Proteins from Ensis minor

Eur. J. Biochem. 233, 744-749 (1995) © FEBS 1995

A precursor-product relationship in molluscan sperm proteins from Ensis minor
Antonella BANDIERA 1, Umesh A. PATEL 2, Guidalberto MANFIOLETTI 1, Alessandra RUSTIGHI 1, Vincenzo GIANCOTTI 1 and Colyn CRANE-ROBINSON 2
1 Dipartimento di Biochimica, Biofisica e Chimica delle Macromolecole, Università di Trieste, I-34127, Italy
2 Biophysics Laboratories, University of Portsmouth, England

(Received 7 July/21 August 1995) - EJB 95 1107/3

A cDNA library prepared from mRNA extracted from immature male gonads of the bivalve mollusc Ensis minor (razor shell) was probed with a 133-bp reverse-transcriptase PCR product corresponding to a segment of the sperm protein EM6 [Giancotti, V., Russo, E., Gasparini, M., Serrano, D., Del Piero, D., Thorne, A. W., Cary, P. D. & Crane-Robinson, C. (1983) Eur. J. Biochem. 136, 509-516]. A single 1.5-kb clone was found to encode both sperm proteins EM1 and EM6. Mass spectrometry was used to define the C-terminus of EM1, and since the N-terminus of EM6 is known from Edman degradation, this showed that the pentapeptide NTNNS must be lost on proteolytic processing. Both EM1 and EM6 contain highly repeated amino acid sequences, suggestive of extended structures. EM1 contains seven tandem repeats of the dipeptide S(K/R), followed by six potential cdc2 phosphorylation sites and seven repeats of the octapeptide KRSASKKR, with occasional K/R substitutions. EM6 contains a globular domain preceded by 17 almost identical uninterrupted tandem repeats of the motif KKRSXSRKRSAS, where X is charged. Its C-terminus contains 15 short basic clusters. Assignment of EM1 and EM6 to the established categories of molluscan sperm proteins [PLI, PLII, PLIII, PLIV: Ausio, J. (1992) Mol. Cell. Biochem. 115, 163-172] is discussed.

Keywords: protamine; protein processing.

The basic sperm proteins of bivalve molluscs have been subdivided into four categories (PLI, PLII, PLIII, PLIV; PL, protamine-like) on the basis of their mobilities in polyacrylamide gels (Ausio, 1992). PLI proteins, containing about 300 amino acids, have a central globular domain (≈80 amino acids) homologous to that of somatic linker histones and very basic N-terminal and C-terminal domains. The PLI protein from Spisula solidissima has been the best characterised (Ausio et al., 1987; Ausio and Van Holde, 1988). PLII proteins consist of about 150 amino acids and, likewise, comprise a globular domain flanked by (shorter) basic terminal domains. Complete PLII sequences are available for both Mytilus californianus and Mytilus trossulus (Carlos et al., 1993a and b). PLIII proteins, to date, comprise about 100 amino acids, have no globular domain and are rich in both lysine and arginine residues. Well characterised examples from the Mytilus family (mussels) have been described (Rocchini et al., 1995; Ruiz-Lara et al., 1993; Carlos et al., 1993a). PLIV proteins consist of about 50-60 amino acids and are highly basic, as are those of the PLIII category. The best characterised examples (from M. trossulus and M. californianus) have a high content of lysine residues (≈50%) but contain very few arginine residues. To date, PLIV proteins have only been analysed in the genus Mytilus. These small, basic proteins must be contrasted with the 'true' protamines from fish such as salmine or truteine, which consist of ≈30 amino acids but have a very high proportion of arginine as opposed to lysine residues.

Correspondence to V. Giancotti, Dipartimento di Biochimica, Biofisica e Chimica delle Macromolecole, Università di Trieste, I-34127, Italy

Fax: +39 40 6763694.

Abbreviations. M-MLV, Moloney leukemia virus reverse transcriptase; RACE, rapid amplification of cDNA ends; PL, protamine like.

Note. The novel nucleotide sequence published here has been submitted to the GenBank/EMBL data bank and is available under accession number L41832.

Ensis minor (razor shell) exhibits three sperm-specific proteins, EM6, EM5 and EM1, in addition to small amounts of somatic core histones (Giancotti et al., 1983; Giancotti et al., 1992). On the basis of their mobility, EM6 is a PLI protein, whereas both EM5 and EM1 fall into the PLII category. Partial protein sequence data showed that both EM6 and EM5 contain globular domains whilst amino acid composition data showed that EM1 does not (Giancotti et al., 1992). The present work was undertaken to increase our understanding of the categories of E. minor sperm proteins, which seem at variance with their apparent molecular masses. Since these sperm proteins have strongly repetitive and monotonous sequences, protein sequencing is particularly difficult and we, therefore, resorted to cDNA sequencing. Clones were selected using a probe corresponding to a segment of the globular domain from EM6 and a library constructed from male gonads of E. minor. Surprisingly, a single clone was found that encompassed both EM1 and EM6. Since it lacked the most 5' bases, a rapid amplification of cDNA ends (5'-RACE) procedure was used to establish the missing segment. We conclude that proteolytic processing occurs to generate the observed EM1 and EM6 proteins. Proteolytic processing has also been found in an M. trossulus sperm protein (Carlos et al., 1993b).

MATERIALS AND METHODS

Isolation and fractionation of mRNA. Total RNA was prepared from immature male gonads of E. minor immediately after collection, as previously described (Manfioletti et al., 1991). Polyadenylated [poly(A)-rich] mRNA was isolated using a commercial packed oligo(dT)-cellulose spin column (mRNA purification kit, Pharmacia).

cDNA library construction and screening. An oligo(dT) and random-primed cDNA library was made from a mixture of E. minor poly(A)-rich and poly(A)-depleted mRNA using a cDNA synthesis kit (Pharmacia). cDNAs were size selected, purified, then inserted into a λMax1 vector as previously described (Patel et al., 1994). 300000 plaques of the cDNA library were screened using the 133-bp insert of clone KA-24 as a specific probe, labelled by random priming to a specific activity of 3×10^8 cpm/µg, as previously described (Manfioletti et al., 1991). 15 positive clones were rescreened. One of them, clone 3.la, having an insert of about 1.5 kb, was subcloned in Bluescript KS(+).

Preparation of a specific probe by PCR. Two degenerate oligonucleotides were prepared on the basis of the known protein sequence of the C-terminal part of the globular domain of EM6. 5'-GGATCCNGCNGGNATGAA(A/G)AA(C/T)CA-3' (EM6-5', corresponding to the peptide AGMKNH) and 5'-AAGCTTNGGNGGNGC(C/T)TT(C/T)TTNAC-3' (EM6-3', corresponding to the peptide VKKAPPK) were used to amplify an estimated 126-bp region. First-strand synthesis was at 42°C for 1 h in 20 µl containing 1 µg total RNA from E. minor gonads, 30 pmol EM6-3', 60 mM Tris/HCl, pH 8.3, 7.5 mM KCl, 3 mM MgCl2, 1 mM dNTPs and 200 units Moloney leukemia virus reverse transcriptase (M-MLV; BRL). The reaction mixture was then freeze inactivated and made up to 50 µl containing 50 pmol EM6-5', 50 pmol EM6-3', 60 mM Tris/HCl, pH 8.3, 50 mM KCl, 3 mM MgCl2 and 2.5 units Taq polymerase (Promega). The amplification consisted of 30 cycles of 1 min at 94°C, 1.5 min at 50°C and 1 min at 72°C, followed by one final extension for 3 min at 72°C. The PCR products were analysed by gel electrophoresis on 2% agarose. A band of approximately the expected size was recovered from the gel, purified and subcloned in a Bluescript KS(+) plasmid vector (Stratagene). The resulting clone, KA-24, was sequenced and shown to have a 133-bp insert coding for the final part of the globular domain of EM6.
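As an illustration of what "degenerate" means for the two primers above, the following Python sketch counts how many distinct sequences each primer pool contains. It is not part of the published method: the function names are hypothetical, the primer strings are copied as printed, N is assumed to mean any of the four bases, and a parenthesised pair such as (A/G) is assumed to mean either base.

```python
import re

N_BASES = "ACGT"  # assumption: N stands for any of the four bases


def parse_primer(primer: str) -> list[str]:
    """Split a primer such as 'AA(A/G)N' into the base choices at each position."""
    tokens = re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGTN])", primer)
    positions = []
    for alternatives, single in tokens:
        if alternatives:                 # e.g. 'A/G' -> 'AG'
            positions.append(alternatives.replace("/", ""))
        elif single == "N":              # fully degenerate position
            positions.append(N_BASES)
        else:                            # fixed base
            positions.append(single)
    return positions


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    count = 1
    for choices in parse_primer(primer):
        count *= len(choices)
    return count


# Primer strings as printed in the text (5' to 3').
EM6_5 = "GGATCCNGCNGGNATGAA(A/G)AA(C/T)CA"
EM6_3 = "AAGCTTNGGNGGNGC(C/T)TT(C/T)TTNAC"

for name, primer in (("EM6-5'", EM6_5), ("EM6-3'", EM6_3)):
    print(name, len(parse_primer(primer)), "nt,", degeneracy(primer), "variants")
```

Both strings parse to 24 positions, in line with the "24-residue oligonucleotides" mentioned in Results. The first six bases of each (GGATCC and AAGCTT) are the recognition sequences of BamHI and HindIII, which would fit the "attached restriction sites", although the text does not name the enzymes.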

5' RACE. The reverse-transcriptase reaction was performed using 10 µg total RNA from male immature gonads of E. minor as template and random hexamers as primers under the same conditions as described above (45 min at 42°C and 15 min at 45°C). A band of approximately 100 bp was obtained after the second round of PCR and was purified from the reaction using a Microspin S-300 column (Pharmacia). The ends of the duplex PCR product were filled in using the Klenow fragment of DNA polymerase I (Promega), phosphorylated using polynucleotide kinase (PNK, Promega) and subcloned in Bluescript KS(+).

DNA sequencing and sequence analysis. Clone 3.la, bearing the 1.5-kb insert, subcloned in Bluescript KS(+), was digested with XbaI, TaqI and PvuII. The fragments were purified and subcloned. The selected clones IIIXn, IIIXt, IIXv, IXa, IPc, IIPc and ITa were fully sequenced. DNA sequencing was carried out by the dideoxynucleotide method using T7 Sequenase (Pharmacia) and with a Sequencing PRO kit (Cambridge BioScience).

Mass spectrometry. HPLC-purified fractions of EM1 and EM6 were lyophilised and redissolved in 10 µl 80% acetonitrile, 20% water plus 0.1% formic acid (200-400 pmol). An API/Perkin Elmer SCIEX mass spectrometer was used. Samples were injected at a flow rate of 2 µl/min and scanned at m/z = 600-1500. Mass-scale calibration was carried out by means of the multiple-charged ions of a separate introduction of myoglobin. All data are shown as average masses.
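To make the relationship between the scan window and the reported protein masses concrete, the sketch below applies the standard electrospray relation m/z = (M + z·m_H)/z. It is an illustrative calculation only, not the instrument software: positive-ion mode with protonation is assumed, the function names are hypothetical, and the masses 16854 Da and 40415 Da are the values reported later in Results.

```python
import math

PROTON_MASS = 1.00728  # Da; assumes positive-ion mode with protonated ions


def mz(mass: float, z: int) -> float:
    """m/z of a protein of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON_MASS) / z


def charge_states_in_window(mass: float, mz_min: float = 600.0,
                            mz_max: float = 1500.0) -> list[int]:
    """Charge states whose m/z falls inside the scanned window."""
    z_low = math.ceil(mass / (mz_max - PROTON_MASS))
    z_high = math.floor(mass / (mz_min - PROTON_MASS))
    return list(range(z_low, z_high + 1))


def mass_from_adjacent_peaks(mz_high: float, mz_low: float) -> tuple[float, int]:
    """Neutral mass from two adjacent peaks; mz_high carries z, mz_low carries z + 1."""
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z * (mz_high - PROTON_MASS), z


for label, mass in (("EM1", 16854.0), ("EM6", 40415.0)):
    states = charge_states_in_window(mass)
    print(f"{label}: charge states {states[0]}+ to {states[-1]}+ lie within m/z 600-1500")

# Round trip: two adjacent charge states recover the neutral mass.
print(mass_from_adjacent_peaks(mz(16854.0, 15), mz(16854.0, 16)))
```

Under these assumptions a protein of 16854 Da would appear as a ladder of peaks from 12+ to 28+ inside the 600-1500 window, and one of 40415 Da from 27+ to 67+. Any pair of neighbouring peaks in such a ladder is enough to recover the neutral mass.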

RESULTS

cDNA cloning. Two degenerate 24-residue oligonucleotides were synthesised, corresponding to two separate segments (AGMKNH and VKKAPPK) of the EM6 sequence (spaced by 31 amino acids) established by Edman degradation (Giancotti et al., 1992), with attached restriction sites. These were used in a reverse-transcriptase PCR reaction using total RNA extracted from male gonads of E. minor as template. This yielded a 133-bp product that was subcloned into a plasmid vector and sequenced, demonstrating it to represent the amino acid sequence from the expected segment of EM6. A cDNA library was prepared by subjecting total RNA from male gonads to oligo(dT) fractionation, then mixing the retained poly(A)-rich fraction with an aliquot of the poly(A)-depleted run-through fraction. This was done since there was uncertainty as to whether the mRNAs sought were poly(A) rich or not (Carlos et al., 1993b). This library was probed with the 133-bp reverse-transcriptase PCR product and 15 positive clones were isolated on primary screening. After secondary plating and screening, a clone (3.la) having the largest insert (≈1.5 kb) was sequenced. Hypothetical translation revealed the presence of the amino acid segment of EM6 corresponding to the probe and also revealed the presence of EM1-like sequences 5' to those of EM6. The N-terminal sequence of EM1 protein is known from Edman degradation (Giancotti et al., 1992) and its comparison with the amino acid sequence deduced from the cDNA showed essential identity. However, the clone lacked the first base of the N-terminal alanine residue and the presumed ATG start codon. To establish more 5' sequence, 5'-RACE was employed using random hexamers as primers with total RNA as template for first-strand synthesis. This established 23 more bases 5' to the end of clone 3.la (Fig. 1).

Confirmation that the 150 N-terminal amino acids correspond to EM1 follows from the almost complete identity to the protein sequence of the first 34 amino acids (Giancotti et al., 1992) and correspondence to the sequence of several peptides previously obtained from EM1 by proteolysis (McKay, D. and Dixon, G. H., unpublished data). In particular, the thermolysin peptide ASKKRRSRSPKKRSKSKK and the LysC peptide RSASRNK occupy residues 62-79 and residues 141-147, respectively. The N-terminal 31 residues of EM6 determined by Edman degradation (AKKRSRSRK, etc.; Giancotti et al., 1992) locate shortly after peptide RSASRNK, and thereby define the N-terminal start of EM6 in the cDNA sequence. However, the C-terminus of EM1 is not precisely defined and, for this, we used electrospray mass spectrometry with purified EM1 protein (Fig. 2). The principal component peak at 16 854 Da corresponds precisely to the amino acids at A1-N150. The smaller peak of 16 953 Da is presumed to correspond to the minor EM1 component defined previously by reverse-phase chromatography (Giancotti et al., 1992). Three differences, at positions 26, 31 and 34 (S to R, R to K, R to K, respectively) are noted between the present sequence of EM1 and that of the N-terminal peptide previously reported (Giancotti et al., 1992). These changes cannot be explained as due to a peptide from the higher mass protein species (greater by 99 Da; Fig. 2) having been sequenced; the explanation is probably sequencing errors at the protein level.

Fig. 1. Nucleotide sequence and presumed translation of cDNA isolated from the library made from immature gonads of male E. minor. Protein EM1 extends over A1-N150. Alternating S(R/K) dipeptide motifs are underlined in purple. Potential cdc2 phosphorylation sites are underlined in amber and the octapeptide repeat is underlined in green. The excised pentapeptide, NTNNS, is boxed. EM6 extends over A1-H349. The 17 tandem 12-residue repeats in the N-terminal domain are underlined in blue and the globular domain is boxed in red. Clusters of basic residues in the C-terminal domain are underlined in yellow. A poly(A) tail at the 3' end is preceded by a poly(A) addition sequence (underlined), indicating that this clone (3.la) came from the poly(A)-rich fraction of the cDNA library.

Definition of the C-terminus as N150 implies that five amino acids (NTNNS) are lost during processing of the precursor protein to yield EM1 and EM6. Mass spectrometry was also used with purified EM6, which is observed from electrophoresis to be made up of at least three components (Fig. 2). The main component of 40415 Da corresponds precisely to the 349 amino acids of EM6 at A1-H349, preceding the TAA stop codon. We presume that the peaks of masses 40302 Da and 40520 Da represent EM6 components coded for by other genes, since the mass differences do not correspond to multiples of the mass of a phosphate group, the other possible explanation for the presence of multiple species. Four differences are noted between the present sequence and that of EM6 peptides published previously (Giancotti et al., 1992), at positions 14 (R to K), 28 (S to R), 30 (K to H), 31 (K to S), 295 (P to K) and 298-302 (KKTKK to RRRVR). Since EM6 consists of at least three components (Fig. 2), these differences are probably due to primary structure heterogeneity. However, since several differences are located at positions towards the ends of relatively long peptide sequencing runs, mis-identification of amino acids cannot be totally discounted.
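The argument that the EM6 mass differences are not multiples of a phosphate group can be checked with simple arithmetic. The sketch below is an illustration only: it assumes that each phosphorylation adds HPO3 (about 79.98 Da average mass), uses the three EM6 peak masses quoted above, and applies an arbitrary 5 Da tolerance that is not taken from the paper.

```python
PHOSPHATE_SHIFT = 79.98  # Da; average mass of HPO3 added per phosphorylation


def phosphate_fit(delta_mass: float, tolerance: float = 5.0):
    """Nearest whole number of phosphates, the leftover mass, and whether it fits.

    `tolerance` (Da) is an illustrative assumption, not a value from the paper.
    """
    n = round(abs(delta_mass) / PHOSPHATE_SHIFT)
    residual = abs(delta_mass) - n * PHOSPHATE_SHIFT
    return n, residual, (n > 0 and abs(residual) <= tolerance)


MAIN_PEAK = 40415             # Da, principal EM6 component
OTHER_PEAKS = (40302, 40520)  # Da, the two other EM6 components

results = {peak: phosphate_fit(peak - MAIN_PEAK) for peak in OTHER_PEAKS}
for peak, (n, residual, fits) in results.items():
    print(f"{peak} Da: difference {peak - MAIN_PEAK:+d} Da, nearest multiple {n}, "
          f"residual {residual:.2f} Da, consistent with phosphorylation: {fits}")
```

The differences of -113 Da and +105 Da each lie more than 25 Da away from one phosphate and more than 45 Da away from two, which agrees with the reasoning that phosphorylation alone does not account for the three species.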

Fig. 2. Mass spectrometric and gel electrophoretic analysis of microheterogeneity in proteins EM1 and EM6. (A) Mass spectrum of HPLC-purified EM1. (B) Mass spectrum of HPLC-purified EM6. (C) Acetic acid/urea polyacrylamide gel of: lane 1, HPLC-purified EM1; lane 2, total protein extracted from mature sperm of E. minor; lane 3, HPLC-purified EM6. The three components a-c are presumed to correspond to the principal peaks a-c in the mass spectrum (B).

Protein EM1. This protein shows several primary structure features (Fig. 1). The N-terminus has seven tandem repeats of the dipeptide S(K/R). This motif is also observed in the PLIII proteins from Mytilus sperm (Carlos et al., 1993a; Ruiz-Lara et al., 1993; Rocchini et al., 1995), as well as in the N-terminus of the PLII proteins from M. trossulus and M. californianus (Carlos et al., 1993b). The motif has also recently been discovered in the N-terminal portion of a protamine from the archaeogastropod Monodonta turbinata (Daban et al., 1995). This protein, like EM1, has seven tandem repeats of a S(R/K) dipeptide. This is followed in EM1 by a very basic sequence that includes six potential cdc2 phosphorylation sites (SPAK, SPKWK). Such potential modification sites are also observed in Mytilus PLIII proteins and can extend over almost 50% of the molecule (Rocchini et al., 1995). The remainder of the EM1 chain consists of seven repeats of the octapeptide KRSASKKR with occasional substitution of K for R and vice versa. The spacing of these repeats is irregular and there are two overlaps. This repeat is not observed in the Mytilus PLIII or PLII proteins, but is present in the M. turbinata protein (Daban et al., 1995) as a single copy of the sequence KRSASRRR, immediately followed by SRSAGRRR, a closely related sequence. The C-terminus of EM1 is characterised by several asparagine residues, a feature not observed in other PLIII proteins but found in the PLII proteins of Mytilus (Carlos et al., 1993b). Since both EM1 and PLII protein from M. trossulus are N-terminal processing products of larger precursors, we assume that the asparagine residues are concerned with the need for processing. Nowhere in EM1 are there clusters of arginine residues, as typically found in many other types of protamine. Several specific features of protamine sequences have been discussed (Daban et al., 1995).

Although EM1 has a sequence resembling the N-terminus of PLII proteins, it does not possess a globular domain, nor does it have a very lysine-rich segment equivalent to the C-terminal domain of PLII proteins; we, therefore, assigned EM1 to the PLIII category. EM1 is the largest PLIII so far defined, being ≈50% longer than those from M. trossulus, M. edulis and M. turbinata.

Protein EM6. The N-terminal 205 amino acids consist of 17 almost identical tandem repeats of the 12-residue motif KKRSXSRKRSAS, where X is charged, either K, R, H or D (once) (Fig. 1). Preliminary protein sequence data for the PLI protein from Spisula solidissima also indicate a highly repetitive N-terminal sequence, including the motif KRSASK (Ausio, 1992). This hexapeptide corresponds to the last five and the first residue of the 12-residue repeat defined in EM6. The N-terminal sequence of EM6 bears no obvious resemblance to that of PLII from M. trossulus (Carlos et al., 1993b). The N-terminal repeats of EM6 are immediately followed by a globular domain which shows strong similarity to that from S. solidissima PLI and that from M. trossulus PLII (Fig. 3). The C-terminal segment (66 residues) is much shorter than the N-terminus, unlike that of most well-known mammalian or avian linker histones; its many basic residues are present largely as pairs or triplets, interspersed with valine, alanine and serine residues. Since there are few proline residues, the potential for α-helix formation in the presence of DNA is significant. This feature is not obvious in the C-terminus of PLII from M. trossulus (Carlos et al., 1993b) nor does it correspond to the AXPK repeat reported for PLI of S. solidissima (Ausio, 1992). We note that EM6 has no SPX(K/R) cdc2 consensus sites. The sequence of EM6 is thus most closely akin to that of PLI from S. solidissima and EM6 is best assigned to that category.
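A repeat description such as "17 tandem copies of KKRSXSRKRSAS, X = K, R, H or D" translates directly into a pattern search. The sketch below shows one way to locate the longest uninterrupted run of such a motif in a translated sequence. It is a hypothetical illustration: the test sequence is synthetic (17 copies with arbitrarily chosen X residues plus made-up flanking residues), not the EM6 sequence of Fig. 1, and the function name is an assumption.

```python
import re

# The 12-residue motif from the text, with X restricted to K, R, H or D.
MOTIF = re.compile(r"KKRS[KRHD]SRKRSAS")


def longest_tandem_run(sequence: str) -> tuple[int, int]:
    """Return (start index, repeat count) of the longest uninterrupted motif run."""
    best_start, best_count = -1, 0
    run_start, run_count, expected_start = -1, 0, -1
    for match in MOTIF.finditer(sequence):
        if match.start() == expected_start:   # directly abuts the previous copy
            run_count += 1
        else:                                 # a gap: start a new run
            run_start, run_count = match.start(), 1
        expected_start = match.end()
        if run_count > best_count:
            best_start, best_count = run_start, run_count
    return best_start, best_count


# Synthetic test sequence: NOT the real EM6 sequence. The X residues are chosen
# arbitrarily (D used once) and the flanking residues are made up.
x_residues = "KRHKRKKRHKDRKRHKR"
synthetic = "A" + "".join(f"KKRS{x}SRKRSAS" for x in x_residues) + "RASRSG"

start, count = longest_tandem_run(synthetic)
print(start, count, count * 12)
```

Seventeen 12-residue copies span 17 × 12 = 204 residues, the figure used for the repeat region in the discussion of secondary structure.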

Secondary and tertiary structures. Regular sequences imply ordered secondary and tertiary structures. The 17 repeats in the N-terminus of EM6 are particularly striking. When viewed in tandem, this repeat, totalling 204 residues, is (SXSB_3)_34, where B is a basic residue and X is alanine or a charged, but variable, residue, as indicated above. In an extended conformation, this chain would have runs of (X-B-B)_n as a continuous series of triplets on one side and a series of (S-S-B)_n triplets on the opposite side. Whilst there are, as yet, no data on the conformation of such a sequence when bound to DNA, an extended form seems more likely than an α helix. CD data from EM6 in high salt (Giancotti et al., 1983) indicate only a modest helicity (-4000° at 222 nm), consistent with the three helices of the globular domain by homology with established structures of chicken H5 (Ramakrishnan et al., 1993) and its similar CD spectrum (Aviles et al., 1978). This indicates there is no helix in free EM6 other than that in the globular domain.
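The re-reading of the 12-residue repeat as a six-residue SXSBBB unit, and the two faces it would present in an extended chain, can be verified mechanically. The sketch below is illustrative only: it uses an idealised tandem array in which X is always lysine (the real repeats vary at this position), it shifts the reading frame by a cyclic rotation of three residues, and it assumes that in a fully extended chain successive side chains point to alternate sides.

```python
import re

MOTIF = "KKRSKSRKRSAS"        # 12-residue repeat with X = K (idealised)
tandem = MOTIF * 17           # 204 residues

# Shift the reading frame by three residues so that each unit starts at serine.
rotated = tandem[3:] + tandem[:3]

# Split into six-residue units and check each against S-X-S-B-B-B,
# where X is alanine or a charged residue and B is K or R.
units = [rotated[i:i + 6] for i in range(0, len(rotated), 6)]
unit_pattern = re.compile(r"S[AKRHD]S[KR]{3}")
all_match = all(unit_pattern.fullmatch(unit) for unit in units)

# Assumption: alternate residues lie on opposite sides of an extended chain.
face_a = rotated[0::2]        # every second residue, starting at the first S
face_b = rotated[1::2]        # the residues in between

print(len(tandem), len(units), all_match)
print("face A:", face_a[:12])   # repeating S-S-B triplets
print("face B:", face_b[:12])   # repeating X-B-B triplets
```

With these assumptions the array splits into 34 units of the form SXSBBB, one face reads SSK SSK ..., and the other reads KRR AKR ..., i.e. (S-S-B) triplets on one side and (X-B-B) triplets on the other, as described above.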

The seven-times repeated sequence in EM1, being irregularly spaced, does not present a very long range motif. However, in an extended form, the alternating serine residues would be on the same side of the chain, flanked by basic side chains (particularly for the repeat that overlaps with a KR pair) as for the EM6 repeat. Again, this regularity suggests an extended conformation rather than a helix.

Fig. 3 compares the globular domain of EM6 with that of S. solidissima PLI and PLII of M. trossulus, as well as with several other linker histone globular domains. Three helices and the β-sheet region have been drawn at positions corresponding to those found in the crystal structure of chicken GH5 (Ramakrishnan et al., 1993). It is reasonable to assume the conservation of these helices in EM6 from the spacing of hydrophobic residues within them. After helix 3, there is a small β sheet in GH5 that encompasses the sequence KGXGASGSF(K/R), which is very highly conserved among mammalian and avian linker-histone globular domains. In EM6, however, this region diverges significantly from the consensus in a manner which parallels that of PLI from S. solidissima, but not that in PLII from M. trossulus. The conformation of the globular domain of EM6 may thus differ from that of the better known GH5/GH1 domains in this region.

H5 chicken               SASHPTYSEMTAAATRAEKSRGGSSRQSIQKYIKSHY-KVGHN-ADLQIKLSIRRLLAAGVL-KQTKGVGASGSFRLAKSDK
H1 sea urchin (somatic)  PAAHPPAAEMVATAITELKDRNGSSLQAIKKYIATNF-DVQMDRQLLFIKRALKSGVEKGKL-VQTKGKGASGSFKVNVQAA
H1 sea urchin (sperm)    ASTHPPVLEMVQAAITAMKERKGSSAAKIKSYMAANY-RVDMNVLAPHVRRALRNGVASGAL-KQVTGTGASGRFRVGAVAK
EM5 E. minor             YITAAVGALKERGGSSRQAILKYIQANF-KVQABPAA
PLII M. trossulus (v.1)  VAKKPSTLSMIVAAITAMKNRKGSSVQAIRKYILANNKGINTSHLGSAMKLAFAKGLKSGVLVRPKTSAGASGATGSFRVGK
PLI S. solidissima       KGSSGMMSMVAAAIAANRTKKGASAQAIRKYVAAHS-SLKGAVLNFRLRRALAAGLKSGALAHPKGSAGWVLVPKK
EM6 E. minor             RASRSGVMTKVMNAIAHCKSSKGCSAQAIRKYLAAHS-KLTGVFLNFHVRKALAAGMKNHLLAHPKGSNNFILAKKKAPRRR

Fig. 3. Globular domains from several 'linker' histones. EM6 from the present work is shown on the bottom line. The remaining sequences are taken from the following references: H5 chicken, Briand et al. (1980); H1 sea urchin (somatic), Levy et al. (1982); H1 sea urchin (sperm), Strickland et al. (1980); EM5, Giancotti et al. (1992); PLII, Carlos et al. (1993b); PLI, Ausio et al. (1987). The sequences from EM5 and PLI are partial. The secondary-structure elements [three helices (boxed) and one β-sheet (zig zag)] of chicken GH5 (Ramakrishnan et al., 1993) are shown below.

Protein processing. The processing site of the EM1/EM6 precursor shows a strong resemblance to that of the PLII/PLIV precursor of M. trossulus (Fig. 4) in that the five C-terminal residues of the N-terminal protein are NKSNN in both cases. A striking difference, however, is shown in that no residues are lost from the M. trossulus precursor (Carlos et al., 1993b), whilst the pentapeptide NTNNS is lost from the E. minor precursor, as shown by comparing the mass spectrometry results (Fig. 2) with the sequence data (Fig. 1). It should be emphasised that neither in E. minor nor in M. trossulus has the precursor protein been observed in extracts from sperm. As noted previously (Carlos et al., 1993b), asparagine is an atypical residue in histones and protamines and its presence is probably related to the processing event. The most likely explanation is a recognition site for a protease. However, an autocatalytic action cannot be ruled out, bearing in mind cases in which the residue crucial to chain scission is an asparagine that forms a succinimide intermediate in the process of becoming the C-terminal residue of the cleaved chain (Geiger and Clark, 1987; Cooper et al., 1993; Xu et al., 1994). Since, in the case of EM1/EM6, the sequences adjacent to the asparagine-rich segment are probably non-folding in free solution, the autocatalytic explanation seems less likely, unless one supposes a conformation imposed on the precursor protein by DNA binding.

Fig. 4. The presumed precursor protein of EM1 and EM6 of E. minor, compared to the presumed precursor protein of PLII and PLIV of M. trossulus (Carlos et al., 1993b) and the PLI protein of S. solidissima. Globular domains are represented as spheres. Residues close to the proteolytic cleavage points of the E. minor protein (150/151) and the M. trossulus protein (148) are highlighted. The peptide, NTNNS, released from the E. minor precursor, is shown as an open box. No amino acid residues are lost from the M. trossulus protein.

DISCUSSION

What do the present data on EM1 and EM6 say regarding the classification into the four categories of sperm proteins from bivalve molluscs? EM6 resembles the PLI protein of S. solidissima but these two differ from the PLII proteins of Mytilus in the terminal domains, despite having related globular domains. The distinction between the PLI and PLII categories remains clear. EM1 is as large as the PLII proteins (≈150 amino acids) and, in Ensis ensis, was ascribed to the PLII category on the basis of size (Ausio, 1992). However, the present data, in particular the absence of a globular domain, make it clear that EM1 should be assigned to the PLIII group instead. A feature of the PLIII group, borne out by the EM1 sequence, is that it contains both (SR)_n repeats and SPXK consensus sites, as also found in the N-terminal domains of PLII proteins. On the basis of the N-terminal sequence, it is thus difficult to distinguish between proteins belonging to the PLII and PLIII categories. Caution must therefore be exercised in assigning molluscan sperm proteins to the four categories on the basis of molecular masses (electrophoretic mobility) alone.

This work was supported in part by an EC Twinning grant (SCIENCE CT91-0619) awarded to V. Giancotti and C. Crane-Robinson and by grants from the Consiglio Nazionale delle Ricerche (CNR), Roma, Italy, the Ministero dell'Università e della Ricerca Scientifica e Tecnologica, Roma, Italy, the Università degli Studi di Trieste and the Associazione Italiana per la Ricerca sul Cancro (AIRC), Milano, Italy. We thank Dr L. Ciani for technical assistance.

REFERENCES

Ausio, J. & McParland, R. (1989) Eur. J. Biochem. 182, 569-576.

Ausio, J., Toumadje, A., McParland, R., Becker, R. R., Johnson, W. C. & Van Holde, K. E. (1987) Biochemistry 26, 975-982.

Ausio, J. & Van Holde, K. E. (1988) Cell Differ. 23, 175-190.

Ausio, J. (1992) Mol. Cell. Biochem. 115, 163-172.

Aviles, F. J., Chapman, G. E., Kneale, G. G., Crane-Robinson, C. & Bradbury, E. M. (1978) Eur. J. Biochem. 88, 363-371.

Briand, G., Kmiecik, D., Sautière, P., Wouters, D., Borie-Lay, O., Biserte, G., Mazen, A. & Champagne, M. (1980) FEBS Lett. 112, 147-151.

Carlos, S., Jutglar, L., Borrell, I., Hunt, D. F. & Ausio, J. (1993a) J. Biol. Chem. 268, 185-194.

Carlos, S., Hunt, D. F., Rocchini, C., Arnott, D. P. & Ausio, J. (1993b) J. Biol. Chem. 268, 195-199.

Cooper, A. A., Chen, Y.-J., Lindorfer, M. A. & Stevens, T. H. (1993) EMBO J. 12, 2575-2583.

Daban, M., Martinage, A., Kouach, M., Chiva, M., Subirana, J. A. & Sautière, P. (1995) J. Mol. Evol. 40, 663-670.

Geiger, T. & Clark, S. (1987) J. Biol. Chem. 262, 785-794.

Giancotti, V., Russo, E., Gasparini, M., Serrano, D., Del Piero, D., Thorne, A. W., Cary, P. D. & Crane-Robinson, C. (1983) Eur. J. Biochem. 136, 509-516.

Giancotti, V., Buratti, E., Santucci, A., Neri, P. & Crane-Robinson, C. (1992) Biochim. Biophys. Acta 1119, 296-302.

Levy, S., Sures, I. & Kedes, L. (1982) J. Biol. Chem. 16, 9438-9443.

Manfioletti, G., Giancotti, V., Bandiera, A., Buratti, E., Sautière, P., Cary, P. D., Crane-Robinson, C., Coles, B. & Goodwin, G. H. (1991) Nucleic Acids Res. 19, 6793-6797.

Patel, U. A., Bandiera, A., Manfioletti, G., Giancotti, V., Chau, K.-Y. & Crane-Robinson, C. (1994) Biochem. Biophys. Res. Commun. 201, 63-70.

Ramakrishnan, V., Finch, J. T., Graziano, V., Lee, P. L. & Sweet, R. M. (1993) Nature 362, 219-223.

Rocchini, C., Rice, P. & Ausio, J. (1995) FEBS Lett. 363, 37-40.

Ruiz-Lara, S., Prats, E., Casa, M. T. & Cornudella, L. (1993) Nucleic Acids Res. 21, 2774.

Strickland, W. N., Strickland, M., Brandt, W. E., von Holt, C., Lehman, A. & Wittmann-Liebold, B. (1980) Eur. J. Biochem. 104, 567-578.

Xu, M.-Q., Comb, D. G., Paulus, H., Noren, C. J., Shao, Y. & Perler, F. B. (1994) EMBO J. 13, 5517-5522.