

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1987 by The American Society of Biological Chemists, Inc.

Vol. 262, No. 17, Issue of June 15, pp. 8027-8034, 1987. Printed in U.S.A.

Mouse Glandular Kallikrein Genes STRUCTURE AND PARTIAL SEQUENCE ANALYSIS OF THE KALLIKREIN GENE LOCUS*

(Received for publication, November 20, 1986)

Bronwyn A. Evans, Catherine C. Drinkwater, and Robert I. Richards

From the Howard Florey Institute of Experimental Physiology and Medicine, University of Melbourne, Parkville, Victoria 3052, Australia

Mouse glandular kallikreins are encoded by a family of closely linked genes which are located on chromosome 7 at a site corresponding to the genetically defined Tam-1, Prt-4, and Prt-6 loci. We have characterized 24 kallikrein genes by genomic cloning and restriction mapping of 310 kilobase pairs of BALB/c mouse DNA. Most of these genes are highly homologous, have the same exon/intron organization, and are linked in clusters of up to 11 genes. Partial sequence analysis of the kallikrein genes has facilitated identification of those members of the family for which protein sequence data exist and assignment of those which are pseudogenes or encode proteins of unknown function. We find that a maximum of 14 mouse kallikrein genes have the potential to encode functional proteins.


The enzymes which bring about maturation of growth factors and polypeptide hormones are of interest because they represent potential regulatory steps in the conversion of inactive precursors to biologically active peptides. Maturation of polypeptide hormones has been shown to involve proteolytic cleavage of a protein precursor to give one or a number of active products. For example, cleavage of the neural peptide precursor pro-opiomelanocortin gives rise to the hormones ACTH, β-LPH, α- and β-MSH, β-endorphin, and met-enkephalin (Nakanishi et al., 1979). Hormones such as insulin (Ullrich et al., 1977) and relaxin (Hudson et al., 1981) are activated by the removal of a single internal pro-peptide segment from a higher molecular weight precursor. Similarly, an essential step in the maturation of nerve growth factor (NGF)¹ and epidermal growth factor is the proteolytic release of the carboxyl-terminal segment of a larger polypeptide (Scott et al., 1983; Gray et al., 1983).

There are only a few cases in which a particular enzyme has been shown directly to cleave a specific hormone or growth factor precursor. One example is the release of angiotensin I from angiotensinogen by the action of renin (Skeggs et al., 1980). It is interesting that renin itself exists largely in a precursor form which must in turn be activated by an unknown protease (Fritz et al., 1986).

Further examples of processing involve the class of glandular kallikreins. These are glycoproteins of molecular weight 25,000-40,000, in which the apparent heterogeneity is due to varying levels of glycosylation and different patterns of internal cleavage. They are related to trypsin and other serine proteases, although, unlike trypsin, kallikreins show a high degree of substrate specificity. The major glandular kallikrein, found in kidney, pancreas, and salivary gland, cleaves the precursor kininogen to release bradykinin, a vasoactive peptide which may be important in regulating local blood flow (Schachter, 1980). Other characterized members of the kallikrein family include the α- and γ-subunits of NGF, epidermal growth factor-binding protein, and γ-renin, which mimics the action of renin in vitro (Poe et al., 1983).

* This work was supported in part by a National Health and Medical Research Council Postdoctoral Fellowship (to B. A. E.), a Commonwealth Postgraduate Research Award (to C. C. D.), and grants to the Howard Florey Institute from the National Health and Medical Research Council of Australia, the Ian Potter Foundation, and the Myer Family Trusts. Earlier stages of the study were carried out in the Department of Genetics, Research School of Biological Sciences, Australian National University. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) 502754.

¹ The abbreviations used are: NGF, nerve growth factor; kb, kilobase pair; HEPES, 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid; bp, base pairs.

To date, we have isolated genes encoding mouse renal kallikrein (van Leeuwen et al., 1986) and the α- and γ-subunits of NGF (Evans and Richards, 1985). Another gene, mGK-1, was shown to encode a functional protein and to be expressed in male mouse salivary gland, although no physiological role could be ascribed to the product (Mason et al., 1983). Comparison of the amino acid sequences of different kallikreins indicated that although the proteins are highly homologous overall, regions of greater diversity are correlated with residues thought to be important in determining substrate specificity. This observation and the finding that kallikreins are encoded by a large multigene family led us to propose that these proteins may play a role in the processing of a wide variety of hormone and growth factor precursors (Mason et al., 1983).

In order to determine a potential limit for the kallikrein system in the maturation of biologically active peptides, we have characterized the mouse glandular kallikrein gene family both in terms of the total number of genes and in their ability to encode active serine proteases. We find that 14 kallikrein genes have the potential to encode functional proteins while a further 10 are pseudogenes. These results are discussed in relation to the observed expression of members of the kallikrein gene family and the characteristics of this family in other mammalian species.

EXPERIMENTAL PROCEDURES

Isolation of Genomic DNA. A crude nuclear fraction was prepared from 5 g of BALB/c mouse liver by homogenizing the minced tissue in 20 ml of NKM solution containing 0.15 M NaCl, 5 mM KCl, and 2 mM MgCl2, followed by centrifugation at 2000 rpm for 5 min in a bench centrifuge. The nuclei were resuspended in 20 ml of NKM solution and the suspension was added dropwise to 40 ml of a buffer containing 10 mM Tris-HCl (pH 8), 10 mM NaCl, 10 mM EDTA, 0.5% sodium dodecyl sulfate, and 100 µg/ml pronase. This addition was carried out over several hours, with the solution kept at 37 °C, and incubation was then continued overnight at 37 °C. The DNA suspension was allowed to cool to room temperature and was gently extracted with 30 ml of phenol and 30 ml of chloroform. The aqueous phase was dialyzed overnight at 4 °C against several changes of TE (10 mM Tris-HCl, pH 8, 1 mM EDTA), and the solution was then incubated at 37 °C for 2 h with 20 µg/ml ribonuclease A. The phenol/chloroform extraction and dialysis were repeated as before, and the DNA was stored at 4 °C over 10 ml of chloroform. We find that this procedure, involving minimal handling, consistently gives high molecular weight DNA at a concentration of 0.3-0.5 mg/ml which can be cleaved readily with restriction enzymes.

Genomic Library Construction. Procedures for construction of the EMBL 3A library were essentially as described by Kaiser and Murray (1985). BALB/c genomic DNA was partially digested with Sau3A (Amersham Corp.) under conditions giving a maximum number of molecules in the size range 10-25 kb. Trial digestions were carried out using 35 µg of DNA in a total volume of 500 µl, and the digestion was scaled up by setting up 10 such reactions with slightly varying amounts of Sau3A (e.g. 3-5 units). Following incubation at 37 °C for 45 min, 12.5 µl of 1 M Tris-HCl (pH 8) and 2 units of bacterial alkaline phosphatase (Sigma type IIIS, activated by dialysis against 10 mM Tris-HCl (pH 8), 1 mM MgCl2, 0.1 mM ZnCl2) were added to each tube. The solutions were incubated at 65 °C for 15 min, cooled briefly on ice, and extracted with phenol/chloroform, and the DNA was then ethanol-precipitated overnight at -20 °C. To check the efficiency of dephosphorylation, an additional 5 µg of DNA was digested to completion with Sau3A. This solution was split and one half was treated with alkaline phosphatase under the same conditions as above. Following phenol/chloroform extraction and ethanol precipitation, 1 µg of each DNA sample was treated with T4 DNA ligase, and these were compared with unligated samples by electrophoresis on a 0.8% agarose gel. DNA treated with alkaline phosphatase showed no increase in size following the ligation reaction, whereas the untreated DNA self-ligated to give molecules longer than 5 kb.
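The aim of a partial Sau3A digest can be pictured with a simple random-sequence model. The sketch below assumes (this is not stated in the text) that the DNA behaves like random sequence of equal base composition, so the 4-bp site GATC occurs on average once every 4^4 = 256 bp, and that each site is cut independently with probability p. Under that assumption the mean fragment length is about 256/p bp. It also reproduces the scale-up arithmetic given above (10 reactions of 35 µg each).

```python
"""Back-of-envelope model of a Sau3A partial digest (illustrative assumptions only)."""

# Assumption: random, equal-composition DNA, so GATC appears once per 4**4 bp on average.
SITE_SPACING_BP = 4 ** 4


def mean_fragment_bp(p_cut: float) -> float:
    """Mean fragment length if every GATC site is cut independently with probability p_cut."""
    if not 0 < p_cut <= 1:
        raise ValueError("p_cut must be in (0, 1]")
    # The number of site intervals spanned by a fragment is geometric with mean 1/p_cut.
    return SITE_SPACING_BP / p_cut


def cut_probability_for(target_mean_bp: float) -> float:
    """Inverse of mean_fragment_bp: cut probability giving the desired mean length."""
    return SITE_SPACING_BP / target_mean_bp


# Scale-up described in the text: ten trial-sized reactions, 35 ug DNA in 500 ul each.
reactions, ug_per_reaction, ul_per_reaction = 10, 35, 500
total_ug = reactions * ug_per_reaction                        # total DNA digested
conc_ug_per_ml = ug_per_reaction / (ul_per_reaction / 1000)   # DNA concentration per tube

if __name__ == "__main__":
    p = cut_probability_for(17_500)  # middle of the 10-25 kb target window
    print(f"cut probability ~ {p:.4f}, mean fragment {mean_fragment_bp(p):.0f} bp")
    print(f"total DNA {total_ug} ug at {conc_ug_per_ml} ug/ml per reaction")
```

On this model only about 1.5% of GATC sites need to be cut to center the distribution in the 10-25 kb window, which is why small differences in enzyme amount (3-5 units) matter. The 350 µg total matches the amount later pooled for gradient fractionation.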

The 350 µg of Sau3A-digested, dephosphorylated genomic DNA was pooled in 400 µl of TE buffer and fractionated by centrifugation through a 5-24% NaCl gradient (37,000 rpm for 4.5 h at 25 °C in an SW 41 rotor). Fractions were analyzed by electrophoresis on a 0.5% agarose gel, and DNA fragments of 15-23 kb were inserted into the EMBL 3A vector (Frischauf et al., 1983) and then packaged in vitro (Maniatis et al., 1982). Using the host LE392 (Maniatis et al., 1982), recombinant phage were plated out at a density of 15,000 per 9-cm plate and screened with kallikrein probes.

Hybridization Analysis of Genomic and Cloned DNA. Genomic DNA or kallikrein-positive clones were digested with appropriate restriction enzymes (see "Results" section; enzymes from P-L Biochemicals, Pharmacia) and then electrophoresed overnight (1.5 V/cm) on 0.9% agarose gels in TBE buffer (Maniatis et al., 1982). The DNA was transferred to 0.45-µm nitrocellulose filters by the method of Southern (1975) and hybridized to probes prepared either by Klenow extension of random-primed DNA fragments (van Leeuwen et al., 1986) in the presence of [α-³²P]dCTP (from Bresa, Adelaide, Australia), or by 5'-labeling of oligodeoxyribonucleotides (Maniatis et al., 1982) using T4 polynucleotide kinase and [γ-³²P]ATP (Amersham Corp.). Nitrocellulose filters were prehybridized for 2-4 h at 65 °C in a buffer containing 50 mM HEPES (pH 7), 3× SSC, 0.1% sodium dodecyl sulfate, 1 mM EDTA, 0.2% Ficoll 400, 0.2% polyvinylpyrrolidone, 0.2% bovine serum albumin, and 50 µg/ml sonicated herring sperm DNA. Following addition of the probe, hybridization was continued in the same buffer overnight at 65 °C for random-primed probes or at room temperature for oligodeoxyribonucleotides. Filters were washed in 2× SSC at room temperature for λ clone blots or at 65 °C for genomic DNA blots hybridized with random-primed probes, and at varying temperatures as required for blots probed with oligodeoxyribonucleotides (van Leeuwen et al., 1986). The filters were then exposed at -80 °C to Kodak XAR-5 film with intensifying screens.
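The hybridization buffer (3× SSC) and washes (2× SSC) are specified only as multiples of SSC. Using the standard definition of 1× SSC as 150 mM NaCl plus 15 mM trisodium citrate, the short helper below converts those multiples into salt concentrations. Making them up from a 20× stock is an assumption for illustration; the text does not say how the solutions were prepared.

```python
from dataclasses import dataclass

# Standard definition: 1x SSC = 150 mM NaCl + 15 mM trisodium citrate.
NACL_MM_PER_X = 150.0
CITRATE_MM_PER_X = 15.0


@dataclass(frozen=True)
class SSC:
    strength: float  # e.g. 3.0 for "3x SSC"

    @property
    def nacl_mm(self) -> float:
        return NACL_MM_PER_X * self.strength

    @property
    def citrate_mm(self) -> float:
        return CITRATE_MM_PER_X * self.strength

    @property
    def sodium_mm(self) -> float:
        # Trisodium citrate contributes three Na+ ions per molecule.
        return self.nacl_mm + 3 * self.citrate_mm


def stock_volume_ml(final_x: float, final_ml: float, stock_x: float = 20.0) -> float:
    """Volume of SSC stock (assumed 20x) needed to make final_ml of final_x SSC."""
    return final_x * final_ml / stock_x


if __name__ == "__main__":
    for x in (3.0, 2.0):  # hybridization and wash strengths used in the text
        s = SSC(x)
        print(f"{x}x SSC: {s.nacl_mm:.0f} mM NaCl, {s.citrate_mm:.0f} mM citrate, "
              f"{s.sodium_mm:.0f} mM Na+")
    print(f"{stock_volume_ml(3.0, 100.0)} ml of 20x stock per 100 ml of 3x SSC")
```

The total Na+ figure, rather than NaCl alone, is the quantity that enters salt-dependent melting-temperature estimates, which is one reason the same 2× SSC wash behaves so differently at room temperature and at 65 °C.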

Partial Sequencing of Kallikrein Genes. In most cases, clones containing 1 or 2 kallikrein genes were digested with Sau3A and subcloned into dephosphorylated, BamHI-cut M13 mp18 or mp19 (Messing, 1983). The efficiency of this procedure was increased by using a molar excess of the insert DNA, favoring multiple inserts. In those cases where a Sau3A site was found to occur in the coding region, subcloning was carried out on the basis of defined restriction sites (Fig. 2). Because our sequencing strategy relied on the use of kallikrein primers rather than the usual 17-mer complementary to M13, the presence of very long or multiple inserts was unimportant. Two oligodeoxyribonucleotides were chemically synthesized (van Leeuwen et al., 1986): UKP-2 (Fig. 1B), a 27-mer complementary to the conserved region in exon 2 corresponding to nucleotides 1556-1582 of mGK-6 (van Leeuwen et al., 1986), and UKP-3, a 30-mer complementary to a conserved region in exon 3 (nucleotides 1934-1963 of mGK-6). These were initially used as probes for hybridization analysis of M13 subclones. Positive recombinants were then sequenced by the chain termination method (Sanger et al., 1977), using [α-³⁵S]dATP (Amersham Corp.) with either UKP-2 or UKP-3 as the primer.
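The primer descriptions above tie each oligonucleotide length to an inclusive coordinate range on mGK-6, and a primer "complementary to" coding-strand sequence is its reverse complement when written 5' to 3'. The sketch below shows both relations. Since the mGK-6 sequence is not reproduced here, the demonstration uses a short hypothetical sequence rather than real kallikrein DNA.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based nucleotide interval."""
    return end - start + 1


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse so the result reads 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_primer(coding: str, start: int, end: int) -> str:
    """Primer complementary to coding[start..end] (1-based, inclusive), written 5'->3'."""
    if not 1 <= start <= end <= len(coding):
        raise ValueError("interval lies outside the sequence")
    return reverse_complement(coding[start - 1:end])


# Coordinates quoted in the text for mGK-6 agree with the stated oligo lengths.
assert span_length(1556, 1582) == 27   # UKP-2, a 27-mer
assert span_length(1934, 1963) == 30   # UKP-3, a 30-mer

if __name__ == "__main__":
    toy_coding = "ATGGCTAGCTTGACCGGTAAC"  # hypothetical sequence, not mGK-6
    print(antisense_primer(toy_coding, 4, 12))
```

Because such a primer anneals within the kallikrein exon itself, sequencing starts inside the insert regardless of how far the exon lies from the M13 cloning site, which is why long or multiple inserts did not matter.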

RESULTS

Structure of the Mouse Glandular Kallikrein Gene Family. As shown in Fig. 1A, digestion of mouse genomic DNA with EcoRI or with BamHI results in a complex array of bands hybridizing to the pMK-1 probe (Richards et al., 1982). In approaching the isolation of the complete mouse kallikrein locus, our aim was to assign all of these bands to particular genes. To this end, we screened two different bacteriophage libraries and a cosmid library, each containing enough clones to represent at least two haploid genomes. Restriction mapping of the kallikrein clones from these libraries is shown in Fig. 2.
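What "enough clones to represent at least two haploid genomes" buys can be estimated with the standard Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where f is insert size divided by genome size, together with its Poisson approximation P ≈ 1 - e^(-c) for c genome equivalents. The genome size (3 × 10^9 bp) and mean insert (19 kb, the middle of the 15-23 kb fraction cloned into EMBL 3A) are assumptions for this sketch; the plating density of 15,000 per plate is taken from the text.

```python
import math

GENOME_BP = 3.0e9            # assumed haploid mouse genome size (approximation)
INSERT_BP = 19_000           # assumed mean insert, middle of the 15-23 kb fraction
PLAQUES_PER_PLATE = 15_000   # plating density given in the text


def genome_equivalents(n_clones: int, insert_bp: float = INSERT_BP,
                       genome_bp: float = GENOME_BP) -> float:
    """Total cloned DNA expressed as multiples of the haploid genome."""
    return n_clones * insert_bp / genome_bp


def p_locus_present(equivalents: float) -> float:
    """Poisson approximation: chance that a given short locus lies in at least one clone."""
    return 1.0 - math.exp(-equivalents)


def clones_needed(p: float, insert_bp: float = INSERT_BP,
                  genome_bp: float = GENOME_BP) -> int:
    """Clarke-Carbon estimate of clones needed to include a locus with probability p."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - p) / math.log1p(-f))


if __name__ == "__main__":
    n_two_genomes = math.ceil(2 * GENOME_BP / INSERT_BP)
    plates = math.ceil(n_two_genomes / PLAQUES_PER_PLATE)
    print(f"~{n_two_genomes} clones ({plates} plates) for two genome equivalents")
    print(f"P(locus present) at 2 equivalents: {p_locus_present(2.0):.3f}")
    print(f"clones for 99% probability: {clones_needed(0.99)}")
```

Under these assumptions two genome equivalents give only about an 86% chance that any particular region is present in a given library, which helps explain why several independent libraries were screened.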

We had previously isolated 2 kallikrein-positive clones (λMSP-1 and 2) from a Quackenbush genomic library constructed in the Charon 28 vector (Mason et al., 1983). One of these clones contained a complete gene (mGK-1) of 4.5 kb, which was fully sequenced and found to consist of 5 exons, with lengths of 82, 160, 287, 137, and 181 bp, separated by introns of 2418, 821, 96, and 372 bp (Fig. 1B). In constructing maps of the bacteriophage and cosmid clones shown in Fig. 2, we routinely used the pMK-1 probe and a HindIII/SmaI fragment from mGK-1 covering exon 1 (fragment IV; Mason et al., 1983). In cases where there was any ambiguity in the map, probes specific for exons 2 and 5 were also used. The position of exon 1 relative to the remainder of each gene was determined by its presence or absence in overlapping clones and by hybridization of the exon 1 probe to one or more restriction fragments. Each clone was digested singly and pairwise with EcoRI, HindIII, BamHI, SacI, and SalI. It is striking that in the 310 kb of DNA covered by the overlapping clones, not a single SalI site was observed.
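The exon and intron lengths given for mGK-1 can be laid end to end to check them against the stated 4.5-kb gene size and to give each exon a coordinate. The sketch assumes the gene is measured from the first base of exon 1 to the last base of exon 5; the lengths themselves come from the text.

```python
from typing import List, Tuple

EXONS_BP = [82, 160, 287, 137, 181]   # mGK-1 exons 1-5, lengths from the text
INTRONS_BP = [2418, 821, 96, 372]     # mGK-1 introns A-D, lengths from the text


def exon_coordinates(exons: List[int], introns: List[int]) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) of each exon, counting from the first base of exon 1."""
    if len(introns) != len(exons) - 1:
        raise ValueError("expected exactly one intron between each pair of exons")
    coords, pos = [], 1
    for i, exon in enumerate(exons):
        coords.append((pos, pos + exon - 1))
        # Step over this exon and, if one follows, the next intron.
        pos += exon + (introns[i] if i < len(introns) else 0)
    return coords


if __name__ == "__main__":
    coords = exon_coordinates(EXONS_BP, INTRONS_BP)
    print("exonic bp:", sum(EXONS_BP), "intronic bp:", sum(INTRONS_BP))
    print("gene span:", coords[-1][1], "bp")
    for n, (start, end) in enumerate(coords, 1):
        print(f"exon {n}: {start}-{end}")
```

The exons total 847 bp and the introns 3707 bp, giving a 4554-bp span consistent with the 4.5-kb figure; intron A alone accounts for about half of the gene, which fits the observation that intron A is where lengths vary most between genes.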

Variation between the genes occurs with respect to the length of introns, particularly intron A and to a lesser extent introns B and D (Fig. 1B). Sequence data for the genes mGK-1 (Mason et al., 1983), mGK-3 and 4 (Evans and Richards, 1985), mGK-5 and 9,² mGK-6 (van Leeuwen et al., 1986), and mGK-13 and 22³ show, however, that exon/intron boundaries occur at identical positions of the coding region for all exons. This observation is confirmed for exons 2 and 3 of other kallikrein genes by the data shown in Fig. 4. With the exception of mGK-7 and 17, restriction mapping and partial sequence analysis of the other 11 complete kallikrein genes (Fig. 2) give results in agreement with the exon/intron organization shown for mGK-1 (Fig. 1B). It is interesting that intron positions in the mouse coding regions are the same as those of a rat kallikrein gene (rGK-1),⁴ whereas the number and position of introns in other serine protease genes, even within a single species, vary considerably (Rogers, 1985).

Except for mGK-17, all genes within the kallikrein locus were found to be linked to at least one other gene. The largest cluster contains 11 genes (Fig. 2A), while the cluster shown in Fig. 2B has five closely linked genes. All of the genes in these two clusters are transcribed in the same direction relative to each other. In fact, the only exception to this pattern is the pair mGK-14 and 15, which are transcribed in opposite directions and separated by 21 kb of spacer DNA. In general, intergenic spacer regions vary between 3.3 and 7 kb, the only other long spacer (24 kb) being between mGK-25 and mGK-7.

² C. Drinkwater, unpublished result.
³ C. Drinkwater, B. Evans, and R. Richards, manuscript in preparation.
⁴ B. Evans, unpublished data.

[Figure 1 image not reproduced. Panels: A, genomic blot (EcoRI and BamHI lanes); B, physical map of mGK-1 (exons 1-5, introns A-D) with the exon 2 amino acid sequence and the UKP-2 oligodeoxyribonucleotide; C, hybridizing bands aligned with fragments from cloned genes.]

FIG. 1. Genomic blot analysis of mouse glandular kallikrein genes. A, BALB/c mouse genomic DNA was digested with the restriction enzymes EcoRI and BamHI and analyzed by Southern blotting using the general kallikrein cDNA probe, pMK-1. This probe hybridizes to most of exon 3 and to exons 4 and 5. Numbers to the left of the blot indicate the size in kilobase pairs of markers derived from HindIII digestion of λcI857 DNA. B, a physical map of mGK-1 (Mason et al., 1983), showing the exon 2 amino acid sequence and the oligodeoxyribonucleotide (UKP-2) used for rapid partial sequencing of this exon from other kallikrein genes. Also shown is the position of the UKP-3 oligodeoxyribonucleotide (see "Experimental Procedures"). C, hybridizing bands observed following EcoRI and BamHI digestion of mouse DNA are aligned with fragments isolated from the genomic clones shown in Fig. 2. Dashes indicate fragments for which no gene assignment can be made. Numbers in brackets indicate genes in which exon 5 is separated from the other exons by a BamHI site in intron D, and which therefore generate separate weakly hybridizing fragments. There are additional weak signals corresponding to a 0.6-kb EcoRI fragment from mGK-12 and 0.5-kb BamHI fragments from mGK-10 and 24.

Our attempt to isolate the complete kallikrein locus was made more difficult by the very biased distribution of bacteriophage clones (Fig. 2). For example, the mGK-4 gene is present in 11 λ clones, whereas mGK-17 and 19 were found only once. In general, genes within the large clusters A and B are significantly overrepresented. This cannot be explained by variability of hybridization signals during screening of the libraries, since we were careful to include even weakly hybridizing plaques in our selection. One possible explanation is that there is a greater abundance of Sau3A sites within the gene clusters than in the flanking regions, although this is not reflected by any bias in the distribution of BamHI sites (every BamHI site, GGATCC, contains a Sau3A site, GATC). A second possibility is that certain regions either cannot be propagated in bacteriophage vectors or give rise to such small plaques that no hybridization signal can be detected. This suggestion agrees with the observations of Wyman et al. (1985), who showed that 8.9% of a random selection of human bacteriophage clones were unable to grow on standard rec+ hosts (such as LE392). A third explanation is that regions in the vicinity of the kallikrein gene clusters could have a higher density of EcoK sites and therefore might be lost during in vitro packaging due to EcoK activity (Rosenberg, 1985).
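Why 11 clones of one gene but single clones of others points to bias rather than chance can be illustrated with a Poisson model of clone counts. The sketch assumes each gene would, under unbiased sampling, be picked up an average of λ = 4 times; that value is a guess (roughly two genome equivalents in each of two phage libraries), not a figure from the text. It shows how unlikely 11 or more hits would be under that model.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k)."""
    return 1.0 - poisson_cdf(k - 1, lam) if k > 0 else 1.0


if __name__ == "__main__":
    lam = 4.0  # assumed expected clones per gene under unbiased sampling
    print(f"P(>= 11 clones) = {poisson_sf(11, lam):.4f}")
    print(f"P(<= 1 clone)  = {poisson_cdf(1, lam):.4f}")
```

Under this assumed λ, 11 or more clones has a probability below 1%, whereas a single clone is not especially surprising (about 9%). The model only shows that the excess is unlikely to be sampling noise; it cannot distinguish between the three explanations offered above.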

We attempted to obtain additional data on the kallikrein locus by screening a BALB/c cosmid library (Cory et al., 1985; provided by M. Graham of the Walter and Eliza Hall Institute, Melbourne). With the exception of cMSP-18 and 20, all 11 positive clones covered the same regions that were highly represented in the bacteriophage libraries. In addition, the clones cMSP-8 and 21 contained more than one insert (Fig. 2A). It is interesting that no single bacteriophage or cosmid clone contained kallikrein genes in opposite orientations with respect to each other. This is at least consistent with the conclusion of Wyman et al. (1985) that failure to propagate in a rec+ host is associated with the presence of inverted repeats.

Genomic Blot Analysis of the Kallikrein Locus. Comparison of the restriction fragments generated by each kallikrein gene with the genomic blot shown in Fig. 1A indicates that we can account for all of the EcoRI bands and most of the BamHI bands hybridizing to pMK-1 (Fig. 1C). The sizes of the EcoRI fragments generated by the genes mGK-15 and 17 cannot be determined from the available mapping data. It is possible that mGK-15 represents an eighth gene containing the 2.4-kb EcoRI fragment or else an additional gene carrying a 4.2-kb fragment. The EcoRI fragment from mGK-17 must be at least 5.7 kb (Fig. 2E) and may be a component of the relatively strong band at 6.6 kb. The results of BamHI digestion are complicated somewhat by the fact that 12 different kallikrein genes have a BamHI site within the fourth intron (Fig. 2). Despite this, there are only two bands, of 4.7 and 4.1 kb, for which there is no defined gene fragment. Again, these bands could be accounted for by the genes mGK-15 and mGK-21.
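The band accounting described above amounts to matching each observed genomic band against the fragment sizes predicted from the clone maps, allowing for gel-sizing error, and listing whatever is left over. In the minimal sketch below, the gene names and sizes are illustrative, not a transcription of Fig. 1C, and the 5% relative tolerance is an assumed sizing error. Genes with a BamHI site in intron D are modeled as contributing two fragments.

```python
from typing import Dict, List

# Illustrative values only (hypothetical genes, not transcribed from Fig. 1C).
OBSERVED_BAMHI_KB = [10.0, 8.0, 6.5, 4.7, 4.1, 3.3, 1.0]
PREDICTED_BY_GENE: Dict[str, List[float]] = {
    "gene-a": [10.0],
    "gene-b": [8.0, 1.0],   # BamHI site in intron D: exon 5 on a separate, weak fragment
    "gene-c": [6.4],
    "gene-d": [3.3, 1.0],
}


def unassigned_bands(observed: List[float], predicted: Dict[str, List[float]],
                     rel_tol: float = 0.05) -> List[float]:
    """Observed bands with no predicted fragment within rel_tol (relative) of their size."""
    fragments = [kb for frags in predicted.values() for kb in frags]
    return [band for band in observed
            if not any(abs(band - kb) <= rel_tol * band for kb in fragments)]


if __name__ == "__main__":
    print(unassigned_bands(OBSERVED_BAMHI_KB, PREDICTED_BY_GENE))
```

A relative rather than absolute tolerance reflects the fact that sizing error on agarose gels grows with fragment size; bands left unmatched are the candidates for genes that have not been fully cloned.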

We have denoted the exon 1 which is downstream from mGK-19 as a separate gene, mGK-20, although it is likely that this simply represents the beginning of one of mGK-15, 21, or 23, and that we have therefore isolated 24 rather than 25 distinct genes. The present gaps in our structural data are that neither of the strongly hybridizing genes mGK-15 or 21 has been isolated in full and that we have been unable to demonstrate linkage of the different gene clusters. We are confident, however, that we have isolated at least portions of all the homologous mouse kallikrein genes and thus have a reasonable basis on which to draw conclusions concerning their possible range of functions.

Characterization of Mouse Kallikrein Genes-We had pre- viously sequenced the complete coding regions of the genes mGK-1 (Mason et al., 1983), mGK-3 and 4 (which encode the y- and a-subunits of NGF, Evans and Richards, 1985), and the renal kallikrein gene, mGK-6 (van Leeuwen et al., 1986), as well as considerable regions of flanking and intervening sequence. In order to determine which of the remaining 20 genes have the potential to encode active kallikreins, we

19

Page 4: Mouse Glandular Kallikrein Genes

8030 Mouse Glandular Kallikrein Genes mGK-25 . . . . . . . . . . . . . - mGK - 7 mGK-8 mGU-2 mGK- 1 mGK-9 - "- -

ECO R 1 I I I 1 I I I I I I I I I I1 I

HIND Ill I 11 1 I I I1 I I I I I I I I I I I

BAM H I I I I I I I I l l I I 1 1 1 I I I I

CUSP-18 1 v---- cMSP-8 - cMSP-20

+ I CMSP-9

AMSP-67 I I

AMSP-3 I I , I A MSP-35

AMSP-62 1 4 A MSP- 1

AMSP-25 I I I I A MSP-36

A M P - 1 4 t I k -I AMSP-32

A MSP-27 I I I I

MSP-44 I , x MSP-3 1 I A MSP-61 7

> MSP-38 I A MSP-39 - AMSP-34 I I

A MSP-23 ' 4

BAM H

SAC

I I I I I I II 1 I 1 I I I l l 1 I l l 11 I

I 1 I1 1 1 I I 1 1 I I I I

I I CUSP-16

8 I cMSP-2

k a cMSP-7

I I XMSP-33 ~ CUSP-3

Aw-eo - t 4 b MSP- 17 AMSP-64

AMP-51 E d # I MSP-8

AMSP-22 I + AMSP-29

AMSP-191 $ I A MSP-43

A MSP-26 1 4 MSP- 15

A MSP-28 I I I A MSP-54

AMSP-53 I I -- XMSP-2

A MSP-B I

A MSP-50 c , A M P - 1 2 I

AMSP-18 t 4

FIG. 2. Restriction map of the mouse glandular kallikrein locus. The organization of mouse kallikrein genes is demonstrated by the isolation of overlapping genomic clones in bacteriophage (XMSP) and cosmid (cMSP) vectors. Clones AMSP-3-26 were isolated from an embryonic BALB/c library provided by Dr. P. Leder, while remaining A clones were from the EMBL 3A library described under "Experimental Procedures." Vertical burs denote exons and arrows above each gene show the direction of transcription. Dotted lines above the genes mGK- 17, 18, 19, 23, and 25 indicate an organization or pattern of hybridization different from those of functional kallikrein genes (see text). Gene cluster A extends continuously from mGK-25 to mGK-18. The relative positions of regions A to G within the kallikrein locus have not been determined.

devised a strategy for the rapid sequencing of exons 2 and 3 active peptide. Given that this region is usually the most from each gene. Both of these exons contain regions which accessible to amino acid sequence analysis, we maximize the are variable between different kallikreins, as well as regions likelihood of being able to match a particular gene with a which are highly conserved. Exon 2 was chosen in particular protein of known function. since this gives the sequence of the NH2-terminus of the Fig. 3 shows the amino acid sequences predicted from the

Page 5: Mouse Glandular Kallikrein Genes

Mouse Glandular Kallikrein Genes 803 1 mGK-10 mGK-I1 mGK-12 mGK-13 mGK-18 " __ - -..-

1 1 n n r n n n n m n I I I n II I U u u MU 11

I I I II I I I I I II I I I I l l

?--- cMSP-21 - cMSP- I

+ cMSP-22 x MSP-BO I I

I I AMSP-41

I h MSP-52

k I X MSP-56

I I A MSP-48

A MSP-40 I I A MSP-45

AMSP-66 I I A MSP-65

I A MSP-20 I 4

I A MSP-24 1 I

AMSP-I I 1 i MSP-68 1 I

AMSP-21 I AMSP-70 I I

...... - - mGK-19 mGK-20

EGO R I

HIND^

BAM H I

S A C I

mGK-14 -

- 10 kb

mQK-21 "22 - - . . . . . . . . . . . . . . . . . . . . , , . . - mGK- 17

E C O R I J I I I I I I l l ECO RI-

HIND Ill - HIND m BAMHI- BAM HI

SAC I SAC I L

mGK- 15 c

F. I n nn I I] NU %a

I I A MSP-59

A MSP-46 I I

A MSP-37 1 I

FIG. 2"Continued

ECO R I I I I1 I I I I

HINO 111 I I I I I I I 1 I IL

BAM H I I I I I I 1

SAC I I I I I

Page 6: Mouse Glandular Kallikrein Genes

8032

m Mouse Glandular Kallikrein Genes

[FIG. 3, image: aligned exon 2 and exon 3 amino acid sequences of the mouse kallikrein genes, numbered by residue position; the alignment is not recoverable from the extracted text.]

FIG. 3. Predicted amino acid sequence from exons 2 and 3 of the mouse glandular kallikrein genes. Assignment of the reading frame for each exon is analogous to that of previously sequenced kallikrein genes (Mason et al., 1983; Evans and Richards, 1985; van Leeuwen et al., 1986). Boxed areas represent amino acid residues which are identical for at least 75% of the sequences. Numbers refer to the position relative to the NH2-terminal residue of mature mouse renal kallikrein. The arrow denotes the first residue of the zymogen peptide.

exon 2 and 3 nucleotide sequences of mouse kallikrein genes (Fig. 4). It was not possible to obtain exon 2 sequences from some genes; in two cases (mGK-15 and 23), this was because no genomic clones carried the complete gene. The genes mGK-7, 17, and 25, although isolated in full, showed no homology with either UKP-2 or another exon 2-specific oligodeoxyribonucleotide, UKP-2R (data not shown). As well, mGK-17, 18, and 23 did not hybridize to the exon 3-specific 30-mer, UKP-3 (Fig. 1B). We investigated this lack of hybridization by more detailed sequence analysis of these genes and found in each case that gross rearrangements have occurred. For example, in the case of mGK-7, continuous sequence from intron A through to exon 3 was obtained, showing the absence of exon 2 to be caused by a deletion of 831 bp relative to mGK-1 (Mason et al., 1983), beginning 19 bp upstream from the normal intron/exon junction and ending in intron B 175 bp upstream from the start of exon 3. We can exclude the possibility of cloning artifacts both for this gene and for mGK-18, since multiple λ clones gave identical restriction maps and showed the lack of hybridization to appropriate exon-specific probes. mGK-7 was also sequenced from two different λ clones (λMSP-25 and 67). In the case of mGK-23, hybridization to the entire pMK-1 probe was extremely weak compared to that of the neighboring mGK-16. In conjunction with the rearrangement observed in λMSP-55, this provides evidence that mGK-23 is not a functional kallikrein gene. We believe that the rearrangement found in mGK-17 is unlikely to represent a cloning artifact, since the 5.6-kb BamHI fragment from λMSP-11 is present in mouse genomic DNA (Fig. 1C). Thus, we conclude that mGK-7, 17, 18, 23, and 25 are pseudogenes. From the predicted amino acid sequences, mGK-2, 10, 12, 15, 19, and 25 are also pseudogenes due to the presence of in-phase termination codons.
Hence a total of 10 mouse glandular kallikrein genes are unable to encode functional proteins.
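The pseudogene assignments above rest on reading each exon in its assigned frame (assigned by analogy with previously sequenced kallikrein genes, as stated in the Fig. 3 legend) and looking for in-phase termination codons. A minimal Python sketch of such a scan follows; the function name `in_phase_stops`, the toy sequence, and passing the frame as an offset of 0, 1, or 2 are illustrative assumptions, not the authors' software.

```python
# Standard stop codons of the universal genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_phase_stops(exon_seq: str, frame: int = 0) -> list[int]:
    """Return 0-based offsets of stop codons read in the given frame.

    Hypothetical helper: `frame` (0, 1, or 2) is the offset of the first
    complete codon and must be known in advance (here it would be assigned
    by analogy with previously sequenced kallikrein genes).
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = exon_seq.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


# Toy sequence (not taken from Fig. 4): codons ATG GCC TGA AAA TAG in frame 0.
toy = "ATGGCCTGAAAATAG"
print(in_phase_stops(toy, frame=0))  # -> [6, 12]
print(in_phase_stops(toy, frame=1))  # -> []
```

In this sketch, a gene would be flagged if any sequenced exon returns a non-empty list; as the text notes, such a scan says nothing about exons that were not sequenced.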

From the amino acid sequences in Fig. 3, we have been able to show that mGK-13 and 22 encode EGF-binding proteins (Anundi et al., 1982) and that mGK-16 encodes γ-renin (Poe et al., 1983). The complete sequences of these genes will be reported elsewhere.2 mGK-5 and 9 have been sequenced and were found to be expressed in mouse salivary gland, while mGK-8 corresponds to a cDNA clone isolated from a salivary gland library (Fahnestock et al., 1986). The four remaining unidentified genes, mGK-11, 14, 21, and 24, have no apparent rearrangements or in-phase termination codons which would indicate that they were nonfunctional. From our present data, we cannot rule out the possibility of mutations in exons 1, 4, or 5. In general, however, it is likely that pseudogenes under no selective pressure would carry at least one in-phase termination codon within the 261-nucleotide region for which sequence was determined (Fig. 4). Using gene-specific oligodeoxyribonucleotides derived from our sequence data, we are currently investigating whether mGK-11, 14, 21, and 24 are expressed in salivary gland or any other mouse tissues.
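The argument that an unconstrained pseudogene would probably carry a stop codon somewhere in the 261 sequenced nucleotides can be given a rough number. The sketch below assumes, purely for illustration, that each of the 87 codons is an independent, uniformly random codon (3 of 64 being stops). This is a crude limiting case, since a recently inactivated pseudogene still closely resembles its functional relatives, and it is not a calculation made in this study; the function name is hypothetical.

```python
def prob_no_stop(n_nucleotides: int = 261, stop_fraction: float = 3 / 64) -> float:
    """Probability that n_nucleotides of random in-frame sequence contain no stop.

    Assumption (illustrative only): codons are independent and uniformly
    random, so each one is a stop codon with probability 3/64.
    """
    n_codons = n_nucleotides // 3  # 261 nt -> 87 codons
    return (1 - stop_fraction) ** n_codons


print(round(prob_no_stop(), 3))  # -> 0.015
```

Under this fully randomized model only about 1.5% of 87-codon stretches would lack a stop codon; for pseudogenes that have diverged only slightly from functional genes, the chance of escaping a stop codon would be considerably higher.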

DISCUSSION

To define the genetic limits of functional diversity for the glandular kallikreins, we set out to isolate the complete multigene family from mouse. We have identified 24 genes, most of which show a high degree of homology. Although we have not been able to demonstrate linkage of all these genes, we believe that they form a single genetic locus on chromosome 7. This proposal is supported by the work of Howles et al. (1984), who used a mouse γ-NGF cDNA to probe genomic DNA from recombinant inbred mouse lines. Like pMK-1, the γ-NGF probe detects all mouse kallikrein genes. The results of this analysis indicated that five observed EcoRI fragment polymorphisms between the strains C57BL/6J and DBA/2J are genetically inseparable from each other and from the Tam-1 locus. If we compare the restriction fragments from these strains with those for BALB/c mice (Fig. 1C), we find that fragments of 7.4 and 3.4 kb are absent in DBA/2J mice, which instead have unique EcoRI fragments of 2.7, 2.1, and 1.9 kb. Although we cannot determine the exact correspondence of polymorphic fragments from the three different strains, we can state that the genes mGK-1 and mGK-19 are subject to polymorphism. This means that mGK-19 is genetically linked to the 11-gene cluster containing mGK-1. In addition, EcoRI fragments carrying the genes mGK-15 and 17 are likely to be polymorphic and thus linked to the remaining kallikrein genes.

2 C. Drinkwater, unpublished results.

[FIG. 4, image: aligned exon 2 and exon 3 nucleotide sequences of the mouse kallikrein genes; the alignment is not recoverable from the extracted text.]

FIG. 4. Nucleotide sequences of the mouse glandular kallikrein genes aligned to maximize homology. Splice acceptor sites at the 5' exon/intron junctions are boxed.

We propose that the mouse kallikrein gene locus encompasses the genetically defined loci Tam-1 (Skow, 1978), Prt-4, and Prt-5 (Otto and von Deimling, 1981). These loci are linked on chromosome 7, and their gene products share with characterized members of the kallikrein family the properties of cleavage of synthetic esters of arginine, expression in the male mouse salivary gland, and induction by testosterone. It is possible that each of the Tam-1, Prt-4, and Prt-5 loci represents a particular kallikrein gene, since the products demonstrate a number of biochemical differences (Otto and von Deimling, 1981). Alternatively, given the occurrence of polymorphisms at all three loci, it may be that each mouse strain expresses a distinct subset of kallikreins in the salivary gland which is detectable by the TAMase assay (Skow, 1978). For example, in BALB/c mice, the mGK-10 gene is unable to encode a functional protein due to the presence of point mutations resulting in termination codons (Fig. 3). However, other strains may carry an mGK-10 gene which encodes a product with TAMase activity.

To further define the mouse kallikrein family, we obtained sequences from exons 2 and 3 of the genes. The data presented in Fig. 3 indicate that in BALB/c mice, 10 of the kallikrein genes are pseudogenes, while 14 have the potential to encode functional proteins. Of the latter, 10 are known to be expressed in salivary gland (Table I). If we compare the coding regions of functional genes with those of strongly hybridizing pseudogenes such as mGK-2, 10, and 12, the nucleotide sequence homologies vary between 82 and 93%. This degree of similarity indicates that gene conversion and unequal crossing-over events may play a role in maintaining the mouse kallikrein gene family. As previously noted, however, different kallikreins display localized regions of divergence which often coincide with residues important in determining substrate specificity (Mason et al., 1983). We suggest that gene conversion and unequal crossing-over are counterbalanced by selection toward functional diversity.
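Percentage figures such as the 82 to 93% nucleotide homologies quoted above correspond, in current usage, to percent identities over an alignment. The sketch below computes one from two sequences that are already aligned (as in Fig. 4). Skipping columns gapped in both sequences and counting a gap opposite a residue as a mismatch is one common convention among several; it is an assumption here, not necessarily the method behind the published values, and the function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Convention assumed here: columns gapped in both sequences are skipped;
    a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1  # x == y here implies neither symbol is a gap
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned pair (hypothetical): one mismatch in ten columns.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # -> 90.0
```

The same function applies to aligned amino acid sequences; because the choice of denominator changes the result, identities from different studies are strictly comparable only when computed the same way.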

A corollary of this proposal is that between 10 and 14 functional mouse kallikrein genes, if not essential for survival, at least confer some selective advantage. In this context, we should consider the number of kallikrein genes found in other mammalian species. Using hybridization kinetics, Ashley and MacDonald (1985) estimated that there are at least eight rat kallikrein genes. Genomic blot analysis of DNA cleaved with EcoRI, BamHI, or HindIII was found to give 7-10 cross-hybridizing bands. We have obtained a similar result using a rat probe derived from exon 4 of rGK-1, although HindIII digestion of rat genomic DNA gave 15 bands of varying intensity (data not shown). As in mouse, rat kallikreins appear to be encoded by a large multigene family. In contrast, Howles et al. (1984) suggested that another rodent, the Chinese hamster, has as few as two kallikrein genes. Southern blots of genomic DNA were probed with the mouse γ-NGF cDNA, which, like pMK-1 (Mason et al., 1983), may not cross-hybridize with most hamster genes. In our experience, mouse probes hybridize rather weakly to rat kallikrein genes, giving signals 10- to 20-fold less intense than the rGK-1 probe, even though the overall sequence homology is at least 75%. This appears to reflect the absence of the regions of very high homology which are seen between different kallikrein genes within a given species. Thus, the lack of hybridization between mouse γ-NGF and hamster genes does not necessarily argue against the latter species possessing functionally equivalent genes.

TABLE I
Characterization of mouse glandular kallikrein genes

Known function: mGK-3 (γ-NGF), mGK-4 (α-NGF), mGK-6 (renal kallikrein), mGK-13 (EGF-binding protein), mGK-16 (γ-renin), mGK-22 (EGF-binding protein)
Potentially functional, expressed: mGK-1, mGK-5, mGK-8, mGK-9
Unidentified, no expression data: mGK-11, mGK-14, mGK-21, mGK-24
Pseudogenes: mGK-2, mGK-7, mGK-10, mGK-12, mGK-15, mGK-17, mGK-18, mGK-19, mGK-23, mGK-25

EGF, epidermal growth factor.

Two different groups have used human kallikrein cDNA clones to probe genomic blots of human DNA (Baker and Shine, 1985; Fukushima et al., 1985). In both cases, only three bands were detected following digestion of the DNA with EcoRI, BamHI, and HindIII. Although this result might appear clear-cut, since a homologous probe was used, two lines of evidence indicate that the human genome may also contain a much larger kallikrein gene family. First, when Fukushima et al. (1985) screened a human genomic library using a pancreatic cDNA probe, they obtained 54 positive clones. Given that the library contained 300,000 clones, sufficient to represent 1.5 genomes, this result is comparable to those obtained by us in screening mouse libraries. Second, sequencing of one of these human genomic clones (hKK-3) by Fukushima and co-workers showed that it shares only 68% amino acid sequence homology with the pancreatic cDNA. It is interesting that a human prostate-specific antigen (Watt et al., 1986), which is a member of the same family, shows only 60% homology with the renal/pancreatic kallikrein, although it shares 87% homology with the hKK-3 gene. Thus, in the human genome, there is greater divergence between kallikrein genes. It is important to note that whenever we found a weakly hybridizing mouse kallikrein gene, it was always a pseudogene. In contrast, there are at least two human kallikrein genes which encode functional proteins but which would hybridize relatively weakly to the cDNA probes used. It is thus possible that, under the more stringent conditions required to obtain a clear genomic blot, not all human kallikrein genes would be detected.
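The library-screening comparison can be made explicit. Dividing the 54 positives by the 1.5 genome equivalents gives the number of positive clones per haploid genome screened; the sketch also applies the standard Clarke-Carbon approximation, P = 1 - exp(-c) for c genome equivalents, to estimate how likely any single locus is to be represented at all. The Clarke-Carbon step and the function names are our illustrative additions, not part of the cited study, and the positives-per-genome figure is not a gene count, because overlapping clones can carry the same gene and one clone can carry several linked genes.

```python
import math


def positives_per_genome(positives: int, genome_equivalents: float) -> float:
    """Positive clones per haploid genome equivalent screened."""
    return positives / genome_equivalents


def locus_coverage(genome_equivalents: float) -> float:
    """Clarke-Carbon approximation: probability that a given locus is in the library.

    Assumes clones are random and each is a small fraction of the genome.
    """
    return 1.0 - math.exp(-genome_equivalents)


# Figures from Fukushima et al. (1985) as quoted in the text.
print(positives_per_genome(54, 1.5))  # -> 36.0
print(round(locus_coverage(1.5), 2))  # -> 0.78
```

At 1.5 genome equivalents, roughly one locus in five would be expected to be absent from the library, which would lower rather than raise the number of positives observed.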

In our view, the mouse and possibly rat kallikrein gene families are unusual, in that all members retain a very high degree of homology. Comparison of equivalent proteins from different species, such as the mouse and human renal kallikreins, which show only 61% amino acid sequence homology, suggests that the homologies of 73 to 83% seen between mouse kallikreins are not essential for enzyme function. Nucleotide sequence homology, particularly as detected by hybridization experiments, may in reality represent a poor definition of a multigene family.

We have proposed previously that the kallikrein gene family may encode as many as 30 proteases involved in a wide variety of peptide processing pathways (Mason et al., 1983). The present study, however, limits the functional diversity to at most 14 genes. It will be important to establish physiological roles for the eight apparently functional genes of unknown function in the mouse (Table I), and also to determine the number of functional genes present in other mammalian species.

Acknowledgments: We would like to thank Dr. P. Leder and Michael Graham for providing us with mouse genomic libraries, Drs. Geoff Tregear and Jim Haralambidis for synthesis of oligodeoxyribonucleotides, and members of the Molecular Biology Laboratory for valuable discussions and criticism of the manuscript.

REFERENCES

Anundi, H., Ronne, H., Peterson, P. A., and Rask, L. (1982) Eur. J. Biochem. 129, 365-371
Ashley, P. L., and MacDonald, R. J. (1985) Biochemistry 24, 4520-4527
Baker, A. R., and Shine, J. (1985) DNA 4, 445-450
Cory, S., Graham, M., Webb, E., Corcoran, L., and Adams, J. M. (1985) EMBO J. 4, 675-681
Evans, B. A., and Richards, R. I. (1985) EMBO J. 4, 133-138
Fahnestock, M., Brundage, S., and Shooter, E. M. (1986) Nucleic Acids Res. 14, 4823-4835
Frischauf, A.-M., Lehrach, H., Poustka, A., and Murray, N. (1983) J. Mol. Biol. 170, 827-842
Fritz, L. C., Arfsten, A. E., Dzau, V. J., Atlas, S. A., Baxter, J. D., Fiddes, J. C., Shine, J., Cofer, C. L., Kushner, P., and Ponte, P. A. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 4114-4118
Fukushima, D., Kitamura, N., and Nakanishi, S. (1985) Biochemistry 24, 8037-8043
Gray, A., Dull, T. J., and Ullrich, A. (1983) Nature 303, 722-725
Howles, P. N., Dickinson, D. P., DiCaprio, L. L., Woodworth-Gutai, M., and Gross, K. W. (1984) Nucleic Acids Res. 12, 2791-2805
Hudson, P., Haley, J., Cronk, M., Shine, J., and Niall, H. (1981) Nature 291, 127-131
Kaiser, K., and Murray, N. (1985) in DNA Cloning, a Practical Approach (Glover, D. M., ed) Vol. I, pp. 1-47, IRL Press, Oxford
Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N. Y.
Mason, A. J., Evans, B. A., Cox, D. R., Shine, J., and Richards, R. I. (1983) Nature 303, 300-307
Messing, J. (1983) Methods Enzymol. 101, 20-78
Nakanishi, S., Inoue, A., Kita, T., Nakamura, M., Chang, A. C. Y., Cohen, S. N., and Numa, S. (1979) Nature 278, 423-427
Otto, J., and von Deimling, O. (1981) Biochem. Genet. 19, 431-444
Poe, M., Wu, J. K., Florance, J. R., Rodkey, J. A., Bennett, C. D., and Hoogsteen, K. (1983) J. Biol. Chem. 258, 2209-2216
Richards, R. I., Catanzaro, D. F., Mason, A. J., Morris, B. J., Baxter, J. D., and Shine, J. (1982) J. Biol. Chem. 257, 2758-2761
Rogers, J. (1985) Nature 316, 458-459
Rosenberg, S. (1985) Gene (Amst.) 39, 313-319
Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
Schachter, M. (1980) Pharmacol. Rev. 31, 1-17
Scott, J., Selby, M., Urdea, M., Quiroga, M., Bell, G. I., and Rutter, W. J. (1983) Nature 302, 538-540
Skeggs, L. T., Dorer, F. E., Levine, M., Lentz, K. E., and Kahn, J. R. (1980) Adv. Exp. Med. Biol. 130, 1-23
Skow, L. (1978) Genetics 90, 713-724
Southern, E. M. (1975) J. Mol. Biol. 98, 503-517
Ullrich, A., Shine, J., Chirgwin, R., Pictet, R., Tischer, E., Rutter, W. J., and Goodman, H. M. (1977) Science 196, 1313-1319
van Leeuwen, B. H., Evans, B. A., Tregear, G. W., and Richards, R. I. (1986) J. Biol. Chem. 261, 5529-5535
Watt, K. W. K., Lee, P.-J., M'Timkulu, T., Chan, W.-P., and Loor, R. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 3166-3170
Wyman, A. R., Wolfe, L. B., and Botstein, D. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 2880-2884