Proc. Natl. Acad. Sci. USA
Vol. 79, pp. 4575-4579, August 1982
Biochemistry

Human somatostatin I: Sequence of the cDNA
(recombinant DNA/polypeptide hormone structure/protein processing/homology)

LU-PING SHEN*, RAYMOND L. PICTET†, AND WILLIAM J. RUTTER
Department of Biochemistry and Biophysics, University of California, San Francisco, California 94143

Communicated by I. S. Edelman, April 28, 1982

ABSTRACT RNA has been isolated from a human pancreatic somatostatinoma and used to prepare a cDNA library. After prescreening, clones containing somatostatin I sequences were identified by hybridization with a cloned anglerfish somatostatin I cDNA probe. From the nucleotide sequence of two of these clones, we have deduced an essentially full-length mRNA sequence, including the preprosomatostatin coding region, 105 nucleotides from the 5' untranslated region, and the complete 150-nucleotide 3' untranslated region. The coding region predicts a 116-amino acid precursor protein (Mr, 12,727) that contains somatostatin-14 and -28 at its COOH terminus. The predicted amino acid sequence of human somatostatin-28 is identical to that of somatostatin-28 isolated from the porcine and ovine species. A comparison of the amino acid sequences of human and anglerfish preprosomatostatin I indicated that the COOH-terminal region encoding somatostatin-14 and the adjacent 6 amino acids are highly conserved, whereas the remainder of the molecule, including the signal peptide region, is more divergent. However, many of the amino acid differences found in the pro region of the human and anglerfish proteins are conservative changes. This suggests that the propeptides have a similar secondary structure, which in turn may imply a biological function for this region of the molecule.

Somatostatin is a 14-amino acid polypeptide that inhibits the secretion of other polypeptides and hormones, including growth hormone, insulin, glucagon, and gastrin (reviewed in ref. 1). It occurs in the pancreas, stomach, and small intestine, as well as in the central nervous system. Its presence in neuronal tissue (2) and its apparent neurophysiological action (3) have led to the postulate that somatostatin is also a neurotransmitter.

Higher Mr forms of somatostatin immunoreactivity have been detected (4). Studies (5) of the biosynthesis of somatostatin in cultured rat pancreatic islets have suggested that the larger immunoreactive species are somatostatin precursors. A 28-amino acid peptide containing the somatostatin moiety has been isolated from porcine intestine (6) and hypothalamus (7) and more recently from ovine hypothalamus (8).

We recently have cloned and sequenced somatostatin cDNA from the endocrine pancreas of anglerfish (9). The nucleotide sequence of the coding region predicted a 121-amino acid polypeptide containing somatostatin at its COOH terminus. This polypeptide is presumed to be the first somatostatin precursor. We also discovered another cDNA from anglerfish that encoded a 125-amino acid polypeptide that contained a somatostatin-like moiety at its COOH terminus. This somatostatin differs from the former somatostatin in two of the 14 amino acids (Tyr in place of Phe-7 and Gly in place of Thr-10). Therefore, we termed the classical somatostatin sequence somatostatin I and the novel sequence somatostatin II. Somatostatin II has been chemically synthesized and shown to selectively inhibit insulin release with no detectable effect on glucagon release. Thus, the activity is qualitatively different from that of somatostatin I. This justifies the assertion that these represent two functional forms of the hormone (unpublished observations). Subsequently, others have demonstrated the presence of two somatostatin forms in catfish (10-12).
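The relationship between the two peptides can be sketched in a few lines of Python. This is an illustration only: the one-letter string for somatostatin I is taken from residues 1 to 14 of the predicted human sequence reported in this paper, and the helper name `substitute` is hypothetical.

```python
# Somatostatin I (somatostatin-14) in one-letter code, residues 1 to 14.
SOMATOSTATIN_I = "AGCKNFFWKTFTSC"


def substitute(peptide, changes):
    """Apply {position: (expected, replacement)} changes; positions are 1-based."""
    residues = list(peptide)
    for position, (expected, replacement) in changes.items():
        found = residues[position - 1]
        if found != expected:
            raise ValueError(f"expected {expected} at {position}, found {found}")
        residues[position - 1] = replacement
    return "".join(residues)


# Tyr in place of Phe-7 and Gly in place of Thr-10, as stated in the text.
SOMATOSTATIN_II = substitute(SOMATOSTATIN_I, {7: ("F", "Y"), 10: ("T", "G")})

# 1-based positions at which the two peptides differ.
differences = [
    i + 1
    for i, (a, b) in enumerate(zip(SOMATOSTATIN_I, SOMATOSTATIN_II))
    if a != b
]

print(SOMATOSTATIN_II, differences)
```

The `expected` check guards against applying a substitution at the wrong position, which is an easy mistake when converting between residue numbers and string indices.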

This report describes the cloning and sequence determination of a cDNA coding for human preprosomatostatin I. As yet we have been unable to detect cDNA sequences coding for a somatostatin II. The predicted amino acid sequence of the putative somatostatin precursor contains segments identical to those of somatostatin-14 and somatostatin-28, which have been isolated from pig and sheep, suggesting that these molecules are derived from a common precursor. We also have compared the amino acid sequences derived from human and anglerfish preprosomatostatin I. An analysis of the conserved structures provides evidence of functional regions in the molecule.

MATERIALS AND METHODS

cDNA Synthesis and Construction of Recombinant Plasmids. Thirty milligrams of human pancreatic somatostatinoma tissue embedded for freeze sectioning, provided by D. M. McCarthy, was cleaned of its embedding compound, lyophilized, and then homogenized in 6 ml of 4 M guanidine thiocyanate (13). The RNA was sedimented through a CsCl cushion (14), extracted with phenol/chloroform, and precipitated with ethanol. The total yield of RNA was 480 µg. The first-strand cDNA was synthesized in a reaction mixture (150 µl) containing 25 µg of RNA, 10 mM Tris chloride (pH 8.3), 70 mM KCl, 8 mM MgCl2, 0.05% 2-mercaptoethanol, 1 mM dGTP, dCTP, and dTTP, 0.5 mM dATP, 40 units of reverse transcriptase (RNA-dependent DNA nucleotidyltransferase; J. Beard, National Cancer Institute), and 100 µCi (1 Ci = 3.7 × 10¹⁰ becquerels) of [α-³²P]dATP. After 30 min at 43°C, the RNA was base-hydrolyzed, and the first-strand cDNA molecules were tailed with 30 dC residues by using terminal deoxynucleotidyltransferase. The second strand was synthesized as described by Cooke et al. (15) except that DNA polymerase I replaced reverse transcriptase for extension of the oligo(dG) primer (Collaborative Research). Approximately 70 ng of dC-tailed double-stranded cDNA was hybridized with an equimolar portion of pBR322 that had been tailed with dG in the Pst I site. The resulting hybrid plasmids were used to transform Escherichia coli strain HB101 in compliance with the National Institutes of Health guidelines (P3/HV1).

* Present address: Shanghai Institute of Biochemistry, Chinese Academy of Sciences, Shanghai, People's Republic of China.

† Present address: Institute for Research in Molecular Biology, University of Paris 7, Tour 43, 2 place Jussieu, 75251 Paris, France.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

Library Screening. Tetracycline-resistant transformants grown on Whatman 541 paper were screened by using probes comprising either the first-strand cDNA prepared from tumor RNA (specific activity, 5 × 10⁸ cpm/liter per mg) or a nick-translated insert derived from cloned anglerfish or human preprosomatostatin cDNA. The hybridizations were carried out at 68°C for 20 hr in 0.75 M NaCl/0.075 M Na citrate, pH 7/0.1 M sodium phosphate, pH 7/0.1% polyvinylpyrrolidone/0.1% Ficoll/1% bovine serum albumin in the presence of sonicated, denatured salmon sperm DNA (100 µg/ml) and labeled DNA (10⁵ cpm/ml). The filters were washed twice with 0.3 M NaCl/0.03 M Na citrate, pH 7/0.1% NaDodSO4 at room temperature for 30 min with continual agitation, followed by two 15-min washes in 0.015 M NaCl/0.0015 M Na citrate, pH 7/0.1% NaDodSO4 at 50°C. For conditions of lower stringency, the latter washing step was eliminated. Hybridizing colonies were identified by autoradiography for 20 hr at -70°C with a single Dupont Lightning Plus intensifying screen.
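The NaCl/Na citrate concentrations in the hybridization and wash buffers can be read as multiples of standard saline citrate. The sketch below assumes the conventional definition of 1x SSC (0.15 M NaCl, 0.015 M sodium citrate), which the paper does not state explicitly; the names `ssc_fold` and `BUFFERS` are hypothetical.

```python
from fractions import Fraction

# Assumption: 1x SSC is 0.15 M NaCl / 0.015 M sodium citrate.
SSC_1X_NACL = Fraction("0.15")
SSC_1X_CITRATE = Fraction("0.015")


def ssc_fold(nacl_molar, citrate_molar):
    """Return the buffer strength as a multiple of 1x SSC."""
    nacl_fold = Fraction(nacl_molar) / SSC_1X_NACL
    citrate_fold = Fraction(citrate_molar) / SSC_1X_CITRATE
    if nacl_fold != citrate_fold:
        raise ValueError("NaCl and citrate are not in the 10:1 SSC ratio")
    return nacl_fold


# Molarities (NaCl, Na citrate) quoted in the Library Screening paragraph.
BUFFERS = {
    "hybridization": ("0.75", "0.075"),
    "room-temperature washes": ("0.3", "0.03"),
    "50 C washes": ("0.015", "0.0015"),
}

ssc_strengths = {name: ssc_fold(*molar) for name, molar in BUFFERS.items()}

for name, fold in ssc_strengths.items():
    print(f"{name}: {float(fold):g}x SSC")
```

Exact fractions are used instead of floats so that the 10:1 ratio check cannot fail through rounding. On this reading, the low-stringency protocol simply omits the final, most dilute wash.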

Characterization of Recombinants. The size of the recombinant plasmid was estimated as follows: a portion of the colony was removed, and the cells were lysed; the DNA was electrophoresed in 1% agarose gel and transferred to nitrocellulose filters. The hybridization and washing conditions used were as described above.

The sequence of the cDNA inserts was determined by the procedure of Maxam and Gilbert (16) with the G+A modification of Cooke et al. (17).

RESULTS

Cloning and Identification of Somatostatin cDNA. Total RNA was isolated from a portion of a human pancreatic somatostatinoma that contained 100 times as much somatostatin immunoreactivity as a normal pancreas contains (18). A cDNA library was constructed in the Pst I site of pBR322, and 933 tetracycline-resistant, ampicillin-sensitive colonies were obtained. Assuming that somatostatin mRNA was an abundant species in the somatostatinoma, we screened for cDNA clones derived from frequent mRNAs by in situ hybridization with [³²P]cDNA prepared from the tumor RNA; 71 such colonies were identified, but these did not give detectable cross-hybridization with anglerfish somatostatin I or II [³²P]cDNA probes. To provide a more sensitive test for hybridization, DNAs prepared from the 71 recombinants were electrophoresed and bound to nitrocellulose filters, which were incubated with the two anglerfish somatostatin cDNA probes and then washed under conditions of low hybridization stringency. Seventeen recombinants now hybridized with the anglerfish somatostatin I cDNA probe; none hybridized with the somatostatin II probe. Two recombinant plasmids, pHS1-68 (283 base pairs) and pHS8-90 (382 base pairs), were shown by DNA sequence analysis to contain a segment encoding somatostatin.

The cDNA library was rescreened with probes prepared from pHS1-68 and pHS8-90 insert DNA, yielding 73 positives, 59 of which were found among the 71 "abundant" clones. The large number of positive clones (8% of the total) is in agreement with the high level of somatostatin found in the tumor. The restriction maps of 18 of the larger recombinants were compared with pHS1-68 and pHS8-90, allowing the selection of two plasmids with inserts extending furthest into the 5' (pHS8-86, 581 base pairs) and 3' (pHS3-16, 340 base pairs) regions, respectively. These plasmids were used to derive the essentially full-length somatostatin mRNA sequence.
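The screening numbers quoted above can be checked with simple arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative.

```python
# Counts reported for the library screen.
TOTAL_COLONIES = 933            # tetracycline-resistant, ampicillin-sensitive
ABUNDANT_CLONES = 71            # detected with tumor [32P]cDNA
RESCREEN_POSITIVES = 73         # detected with pHS1-68 / pHS8-90 inserts
POSITIVES_AMONG_ABUNDANT = 59   # rescreen positives that were "abundant" clones


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


positive_fraction = percent(RESCREEN_POSITIVES, TOTAL_COLONIES)
abundant_fraction = percent(ABUNDANT_CLONES, TOTAL_COLONIES)
overlap_fraction = percent(POSITIVES_AMONG_ABUNDANT, ABUNDANT_CLONES)
positives_outside_abundant = RESCREEN_POSITIVES - POSITIVES_AMONG_ABUNDANT

print(f"somatostatin-positive clones: {positive_fraction:.1f}% of the library")
print(f"abundant clones: {abundant_fraction:.1f}% of the library")
print(f"abundant clones that are somatostatin-positive: {overlap_fraction:.1f}%")
print(f"positives missed by the abundance prescreen: {positives_outside_abundant}")
```

The first value rounds to the 8% quoted in the text. The last value shows that a small number of somatostatin clones were not picked up by the abundance prescreen.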

Sequence of Human Somatostatin I cDNA. The general strategy of sequence determination used for pHS8-86 and pHS3-16 is indicated in Fig. 1. All restriction enzyme sites used to initiate the determination were confirmed by assaying the sequence through them from adjacent sites. In this instance, it was important to determine the sequence of both strands because several sequence compression artifacts were observed. The sequence was checked against portions of sequence determined from four additional independently isolated clones. By this method, the entire sequence was confirmed except for the first two 5' and the last 11 3' nucleotides, which are absent from the four extra clones. Further, comparison of the partial restriction maps of 17 of the larger recombinants revealed no sequence heterogeneity.

The 603-nucleotide human preprosomatostatin I mRNA sequence derived from the two clones is shown in Fig. 2, along with the predicted amino acid sequence of the somatostatin precursor; the size is shown in Fig. 3. The mRNA comprises 105 bases of 5' untranslated sequence, 348 bases of coding sequence, and the entire 150 bases of the 3' untranslated region, which is rich in A and U and contains the classical A-A-U-A-A-A sequence terminating 17 bases before the site of poly(A) addition. The first AUG in the sequence (nucleotides 106-108) is likely to be the initiation codon because the region upstream from it contains two translation termination codons in phase with the proposed reading frame. The only region of the mRNA that may be incomplete is the 5' untranslated region. However, preliminary data obtained from the DNA sequence of a genomic clone encoding human somatostatin I, in combination with in vitro transcription experiments, indicate that the mRNA initiation site is located no more than five bases beyond the end of the sequence presented here (unpublished results).
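The bookkeeping in this paragraph can be checked directly. The sketch below is a minimal illustration, not the authors' analysis: it uses the region lengths from the text and the 5' untranslated sequence as transcribed from Fig. 2 (any transcription error would be ours), and it scans that region in the reading frame of the proposed initiation codon.

```python
# Region lengths quoted in the text (nucleotides).
REGION_LENGTHS = {"5' untranslated": 105, "coding": 348, "3' untranslated": 150}

total_length = sum(REGION_LENGTHS.values())
codon_count, leftover = divmod(REGION_LENGTHS["coding"], 3)

# 5' untranslated sequence as transcribed from Fig. 2.
FIVE_PRIME_UTR = (
    "ACACAAGCCGCUUUAGGAGCGAGGUUCGGAGCCAUCGCUGCUGCCUGCUGAUCCGCGCC"
    "UAGAGUUUGACCAGCCACUCUCCAGCUCGGCUUUCGCGGCGCCGAG"
)
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Position (1-based) of the first AUG when the initiation codon is appended.
first_aug_position = (FIVE_PRIME_UTR + "AUG").find("AUG") + 1

# The untranslated region is 105 = 35 x 3 bases long, so reading it in steps
# of three from its first base keeps it in phase with the AUG that follows.
in_frame_stops = [
    (i + 1, FIVE_PRIME_UTR[i:i + 3])
    for i in range(0, len(FIVE_PRIME_UTR), 3)
    if FIVE_PRIME_UTR[i:i + 3] in STOP_CODONS
]

print(total_length, codon_count, first_aug_position, in_frame_stops)
```

Under these assumptions the three regions sum to 603 nucleotides, the coding region holds 116 codons, and the scan finds two in-phase termination codons upstream of the AUG at nucleotide 106.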

Structure of Human Preprosomatostatin I. The nucleotide sequence predicts that human preprosomatostatin I is a protein of 116 amino acid residues with Mr 12,727. The code for the somatostatin tetradecapeptide is located between nucleotides 411 and 453, followed immediately by the termination codon UAG. The signal peptide, which is part of the primary translation product of all secreted hormones, can be recognized at the NH2 terminus by typical structural features, including a positively charged residue (arginine at -98) four residues from the NH2 terminus and an internal hydrophobic core. The length of the signal peptide varies among secreted proteins so that it is not possible to define the start of the prosomatostatin moiety precisely. The clipping of the prepeptide sequence in other proteins usually occurs immediately after a small neutral amino acid (glycine, serine, cysteine, or alanine). Because alanine is most frequently found at the clipping site and because proline is found occasionally in the second position of mature secreted proteins (see compilations in refs. 21 and 22), the cleavage could occur at position -78, resulting in a 92-amino acid prosomatostatin molecule (mass, 10,348 daltons). However, cleavage at aspartic acid at position -75 or at another site also seems possible. Assuming one of these is correct, then three cysteines would exist in the signal peptide (at positions -99, -95, and -82) and two cysteines in the somatostatin moiety, but no cysteine residues would be present in the pro region. This agrees with the observation of Patzelt et al. (5) that the pro region of the rat precursor contains no cysteine residues. The predicted amino acid sequence from residue 1 to 14 is identical to that of somatostatin-14, whereas that from residue -14 to 14 is identical to the somatostatin-28 sequence determined for other mammalian species (6-8). Presumably both peptides are released from a prohormone by a trypsin-like peptidase reacting with basic residues at the cleavage site. Somatostatin-14 is preceded by the basic dipeptide arginine-lysine whereas somatostatin-28 is preceded by a single basic residue, arginine at position -15. The above structures would account for the Mr 12,000, 3,000, and 1,600 proteins having somatostatin-like immunoreactivity detected in the human pancreatic somatostatinoma (18).

FIG. 1. Organization of human preprosomatostatin I cDNA. (A) Structure of preprosomatostatin and its mRNA. The rectangle represents the translated portion of the mRNA. The regions coding for somatostatin-14, somatostatin-28 (SS-28), and the signal peptide (pre-) are indicated. The sizes of the pre-, pro-, and SS-14 portions of preprosomatostatin are indicated in terms of the numbers of amino acids and nucleotides. The sizes of the 5' and 3' untranslated regions deduced from these cDNA clones are also indicated. (B) Structures of pHS8-86 and pHS3-16 and the strategy for determining their sequences. The restriction sites at which sequence determinations were initiated and the direction and extent of the sequence determinations are shown.

5' untranslated region:
ACACAAGCCGCUUUAGGAGCGAGGUUCGGAGCCAUCGCUGCUGCCUGCUGAUCCGCGCC
UAGAGUUUGACCAGCCACUCUCCAGCUCGGCUUUCGCGGCGCCGAG

Coding region (the residue numbers of the first and last amino acid in each row are given at left):
-102 to -100  met leu ser
              AUG CUG UCC
 -99 to  -85  cys arg leu gln cys ala leu ala ala leu ser ile val leu ala
              UGC CGC CUC CAG UGC GCG CUG GCU GCG CUG UCC AUC GUC CUG GCC
 -84 to  -70  leu gly cys val thr gly ala pro ser asp pro arg leu arg gln
              CUG GGC UGU GUC ACC GGC GCU CCC UCG GAC CCC AGA CUC CGU CAG
 -69 to  -55  phe leu gln lys ser leu ala ala ala ala gly lys gln glu leu
              UUU CUG CAG AAG UCC CUG GCU GCU GCC GCG GGG AAG CAG GAA CUG
 -54 to  -40  ala lys tyr phe leu ala glu leu leu ser glu pro asn gln thr
              GCC AAG UAC UUC UUG GCA GAG CUG CUG UCU GAA CCC AAC CAG ACG
 -39 to  -25  glu asn asp ala leu glu pro glu asp leu ser gln ala ala glu
              GAG AAU GAU GCC CUG GAA CCU GAA GAU CUG UCC CAG GCU GCU GAG
 -24 to  -10  gln asp glu met arg leu glu leu gln arg ser ala asn ser asn
              CAG GAU GAA AUG AGG CUU GAG CUG CAG AGA UCU GCU AAC UCA AAC
  -9 to    6  pro ala met ala pro arg glu arg lys ala gly cys lys asn phe
              CCG GCU AUG GCA CCC CGA GAA CGC AAA GCU GGC UGC AAG AAU UUC
   7 to   14  phe trp lys thr phe thr ser cys AM
              UUC UGG AAG ACU UUC ACA UCC UGU UAG

3' untranslated region (following the UAG codon):
CUUUCUUAACUAGUAUUGUCCAUA
UCAGACCUCUGAUCCCUCGCCCCCACACCCCAUCUCUCUUCCCUAAUCCUCCAAGUCUUC
AGCGAGACCCUUGCAUUAGAAACUGAAAACUGUAAAUACAAAAUAAAAUUAUGGUGAAAU
UAU(A)n

FIG. 2. The sequence of human preprosomatostatin mRNA deduced from that of the composite sequences of the cDNA clones. The predicted amino acid sequence is indicated and numbered by designating the first residue of somatostatin-14 as 1. The amino acids towards the NH2 terminus and COOH terminus of the peptide are numbered negatively and positively, respectively. A possible signal peptide extends from Met at position -102 to Gly at -79; the propeptide extends from Ala at -78 to Lys at -1; and somatostatin-28 extends from Ser at -14 to Cys at 14.

FIG. 3. The size of the human preprosomatostatin mRNA. Human pancreatic somatostatinoma RNA was electrophoresed in a 1.5% methylmercury(II) hydroxide agarose gel (19), transferred to nitrocellulose paper (20), and hybridized with ³²P-labeled pHS1-68. The hybridizing mRNA molecules were detected by autoradiography. The size of the mRNA, in bases, is indicated (750). The molecular length standards were HindIII-digested bacteriophage λ DNA and HaeIII-digested φX174.
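The residue numbering and the proposed processing sites can be made concrete with a short sketch. The one-letter sequence below is transcribed from the three-letter sequence of Fig. 2; the signal peptide boundary at -79/-78 is the tentative assignment discussed in the text, not an established cleavage site, and the helper names `to_index` and `segment` are hypothetical.

```python
# Predicted human preprosomatostatin I (116 residues), one-letter code.
PREPROSOMATOSTATIN = (
    "MLSCRLQCALAALSIVLALGCVTG"                 # putative signal peptide, -102 to -79
    "APSDPRLRQFLQKSLAAAAGKQELAKYFLAELLSEPNQT"  # pro region, -78 to -40
    "ENDALEPEDLSQAAEQDEMRLELQRSANSNPAMAPRERK"  # pro region, -39 to -1
    "AGCKNFFWKTFTSC"                           # somatostatin-14, 1 to 14
)
N_TERMINAL_RESIDUES = len(PREPROSOMATOSTATIN) - 14  # residues numbered negatively


def to_index(number):
    """Convert a Fig. 2 residue number (there is no residue 0) to a 0-based index."""
    if number == 0:
        raise ValueError("the numbering scheme has no residue 0")
    offset = number if number < 0 else number - 1
    return N_TERMINAL_RESIDUES + offset


def segment(first, last):
    """Return residues first..last inclusive, using Fig. 2 numbering."""
    return PREPROSOMATOSTATIN[to_index(first):to_index(last) + 1]


signal_peptide = segment(-102, -79)
pro_region = segment(-78, -1)
prosomatostatin = segment(-78, 14)
somatostatin_28 = segment(-14, 14)
somatostatin_14 = segment(1, 14)

# Basic residues immediately preceding each proposed cleavage site.
before_ss14 = segment(-2, -1)
before_ss28 = segment(-15, -15)

cysteines = {
    "signal peptide": signal_peptide.count("C"),
    "pro region": pro_region.count("C"),
    "somatostatin-14": somatostatin_14.count("C"),
}

print(len(signal_peptide), len(pro_region), len(prosomatostatin))
print(somatostatin_28, before_ss14, before_ss28, cysteines)
```

Keeping the conversion between residue numbers and string indices in one function avoids off-by-one errors around the -1/1 boundary, where the numbering skips zero.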

DISCUSSION

We isolated a somatostatin I cDNA from a human somatostatinoma. The DNA sequence predicts the amino acid sequence of the primary translation product of the mRNA for preprosomatostatin I. This 116-amino acid molecule (Mr 12,727) contains the sequence of somatostatin-14 at its COOH terminus. The predicted sequence agrees precisely with the known sequence of somatostatin-14 isolated from other mammals. Human preprosomatostatin I, like other secretory proteins, presumably contains a signal (pre-) peptide that targets the molecule for secretion. We have tentatively identified residues -102 to -79 as the signal peptide on the basis of structural similarities with other signal peptides. This leaves a pro region of 92 amino acids beginning with alanine -78. The general organization of the human preprosomatostatin I molecule (summarized in Fig. 2) is essentially the same as that previously described for the precursors for somatostatin I and II from anglerfish (9).

Somatostatin-28 (which contains somatostatin-14 at its COOHterminus) has been described in porcine (6, 7) and ovine (8) spe-cies. This identical sequence is present in the last 28 amino acidsof human preprosomatostatin I. Therefore, we envisage that,after secretion of the preprosomatostatin molecule and removal

GCG CTG GCT GCG CTG TCC ATC** * *** *

CTC CTC GTG CTC CTG CTG TCC-90

Ala Leu Ala Ala Leu Ser IleI

Leu Leu Val Leu Leu Leu Ser-90

GTC CTG GCC ::: CTG*T ** *

CTG ACC GCC TOO ATC

Val Leu Ala

Leu Thr Ala

CAG* *

CTG-70

Leu

GGC TGT GTC ACC GGC GCT** ** * * ** *

AGC TGC TCC TTC GCC GGA-80

Gly Cys Val Thr Gly Ala' -t ' -''

Ser Ile Ser Cys Ser Phe Ala Gly-80

TTT CTG CAG AAG TCC ::: CTG GCT GCT GCC* *** ** * * .* ***

CTG CTG CAC CGG TAO COG CTG:: : CAG

Gln Phe Leu Gln Lys Ser

Leu Leu Leu His Arg Tyr -Pro-70

AAG TAC TTC TTG GCA* * * *** **

CGC TCC GCC TTG GCC-50

Lys Tyr Phe Leu Ala

Arg Ser Ala Leu Ala

GCC CTG GAA CCT GAA** *** ** G**GOT CTG GAG:: GAG

Leu Ala

"'eu

CCC TCG GAC CCC AGA CTC CGT* **.* ** * * *** **

CAG AGA GAC TOO AAA CTC CGC

Pro Ser Asp Pro Arg Leu Arg

GlnArgAspSe r Lys LeuIGin Arg Asp Ser Lys Leu Arg

GCG GGG*

GGC TCC-60

Ala Ala Ala Gly

Gln Gly Ser

GAG CTG CTG ::: TCT GAA*** *** ** ** **

GAG CTG CTC CTG TCG GAC

Glu Leu

Glu Leu-50

GAT CTG* *

GAG AAC-30

Ala Leu Glu Pro Glu AspI

Ala Leu Glu Glu .Glu

Leu Se r Glu

Leu Leu Ser Asp

TCC CAG GCT GCT GAG* *C*C** * *

TTC OCT CTG GCC GAA

Leu Ser Gln Ala Ala Glu

Asn Phe Pro Leu Ala Glu-30

AGG CTT GAG CTG CAG AGA TCT GCT AAC TCA AAC CCG** ** ** * * ** * * **

CAC GCC GAC CTA GAG CGG GCC GCC AGC GGG GGG CCT-10

Arg Leu Glu Leu Gin Arg Ser Ala Asn Ser Asn Pro

His Ala Asp Leu Glu Arg Ala Ala Ser Gly Gly Pro-20 -10

CGC AAA'GCT GGC TGC AAG AAT TTC TTC TGG AAG ACT* ** ** .*** *** *** ** *** .*** *** ** **

AGA AAG GCC GGC TGC AAG AAC TTC TTC TGG AAA ACC1 10

Arq Lys Ala Gly COs Lys Asn Phe Phe Trp Lys Thr

Arg Lys Ala Gly Cys Lys Asn Phe Phe Trp Lys Thr

1 10

AAG CAG GAA CTG GCC** *** ** ** *

AAA CAG GAC ATG ACT

Lys Gln Glu Leu Ala

Lys Gln Asp Met Thr-60

CCC AAC CAG ACG* * *** *

CTC CTG CAG GGG-40

Pro Asn Gln ThrI

Leu Leu Gln 'Gly

GAG AAT GAT**-* ** **

GAG AAC GAG

Glu Asn Asp

Glu Asn Glu-40

::: ::: CAG GAT 'GAA ATG** **

GGA GGA CCC GAG GAC GCC-20

Gln Asp Glu Met

Gly Gly Pro Glu Asp Ala

GCT ATG GCA CCC CGA GAA.* -** *** ** **

CTG CTC GCC COO OGG GAG

Ala Met Ala Pro Arg Glu

Leu Leu Ala Pro Arg Glu

TTC ACA TCC TGT*** ** *** **

TTC ACC TCC TGC14

Phe Thr Se CsI I TS

Phe Thr Ser Cys14

FIG. 4. Comparison of the humanand anglerfish preprosomatostatin Iproteins and mRNAs. Homology be-tween the sequences was maximizedby inserting gaps. In nucleotide se-quences of human preprosomatostatinI mRNA (line a) and anglerfish pre-prosomatostatin I mRNA (line b), as-terisks indicate homologous nucleo-tides. In amino acid sequences ofhumanpreprosomatostatin I protein (line c)and anglerfish preprosomatostatin Iprotein (line d), vertical bars indicatehomologous amino acids. The aminoacid sequence is numbered by desig-nating the first residue of somatostat-in-14 as 1.

I

I

Proc. Natl. Acad. Sci. USA 79 (1982) 4579

of the signal peptide, somatostatin-28 and -14 may be generated by processing of the prohormone. Indeed, a basic dipeptide (arginine-lysine) typical of structures found at cleavage sites of other peptide hormones exists appropriately in positions -1 and -2 just upstream from the somatostatin-14 sequence. There is only a single basic residue (arginine) at position -15, directly prior to the start of somatostatin-28. We assume, therefore, that the processing enzymes involved require basic residues and may be related to the trypsin family of proteases. It is not yet clear whether somatostatin-28 is an obligatory precursor to somatostatin-14 or whether the cleavages are independent. The human preprosomatostatin I sequence, together with the proposed processing steps, accounts satisfactorily for the various forms of somatostatin immunoreactivity that have been detected. A species of Mr 12,000-12,500, detected in rat pancreatic islets (5) or in human somatostatinoma (18), seems larger than expected for the Mr 10,348 prohormone. This difference might arise because of glycosylation of the prohormone. Inspection of the sequence indicates a possible glycosylation site at asparagine-glutamine-threonine (positions -42 to -40) in the prosomatostatin moiety (23), although there is no experimental evidence to support this idea. Somatostatin-related species of Mr 3,000 and 1,600 presumably represent somatostatin-28 and -14, respectively.

The major question raised by the characterization of the preprosomatostatin sequence involves the function of the NH2-terminal region of the prosomatostatin moiety. Is this simply a "connecting peptide" linking the functional somatostatin moiety with the signal peptide? Or do these amino acids serve another biological function, such as is found in the multifunctional pro-opiocortin (24)? A comparison of the structures of the human and anglerfish somatostatin I may provide insight with respect to the functional regions of the molecule, because such regions should be conserved during evolution. This comparison is presented in Fig. 4; appropriate insertions have been made to maximize homology. The amino acids display 45% homology overall, compared to 53% homology at the nucleotide level. Inspection of the two structures shows that somatostatin-14 is conserved precisely. The somatostatin-28 sequences are 79% homologous between the two species, with most of the additional homology coming from a block of six residues immediately preceding somatostatin-14. In addition, there is another block of conserved amino acids surrounding the cleavage site for somatostatin-28 (positions -15, -17, and -18). These conserved regions could provide specified sites for enzymatic cleavage. More persuasive evidence in favor of a function for somatostatin-28 is provided by the 100% conservation of this peptide among mammals. Further, recent experiments show that somatostatin-28 and somatostatin-14 bind with different affinities to somatostatin receptors in various cells (25). The reason for the divergence between anglerfish and mammalian somatostatin-28 at the amino terminus is not clear.

The remainder of the protein sequence shows greater divergence. There is clearly no region conserved to the degree of the somatostatin-14 and -28 sequences. The signal peptides of the two species are probably of somewhat different lengths and are about 38% homologous in amino acid sequence. Both putative prepeptides display the type of sequence features (e.g., a hydrophobic core) thought to be important for the secretion function. The pro regions of the prosomatostatin moiety are also of different lengths in the two species but still show 38% sequence homology overall; interspersed within this area are regions of high homology (e.g., amino acids -32 to -41 and -45 to -60). Further, there is persuasive conservation of acidic, basic, hydrophobic, and hydrophilic amino acids typical of related functional structures. Charged residues occur at 22 different positions within the pro regions: 9 of these positions are homologous, whereas a further 11 positions represent conservative changes (see Fig. 4). The degree of conservation of structure seen in this region of prosomatostatin is sufficiently strong to suggest that this region may have a biological role beyond simply connecting the signal peptide with the functional somatostatin moiety. This region could provide the necessary protein conformation to facilitate processing of somatostatin-28 or somatostatin-14, or both. Alternatively, the prosomatostatin moiety may possess somatostatin-like activity different from that of either somatostatin-28 or -14. Finally, a distinct biological function for a portion of prosomatostatin is not ruled out.
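
The distinction drawn above between homologous (identical) positions and conservative changes can be illustrated with a minimal sketch. The grouping of residues into acidic, basic, hydrophobic, and hydrophilic classes below is an assumption made for illustration (the text does not list the grouping used for Fig. 4), and the function names are hypothetical. Because the full aligned precursor sequences are not reproduced here, the example compares somatostatin-14 of type I with the anglerfish type II variant, which differs only by Tyr in place of Phe-7 and Gly in place of Thr-10.

```python
from collections import Counter

# Assumed property classes; the paper does not state the grouping it used.
PROPERTY_CLASSES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "hydrophobic": set("AVLIMFWYPGC"),
    "hydrophilic": set("STNQ"),
}


def residue_class(residue: str) -> str:
    """Return the assumed property class of a one-letter amino acid code."""
    for name, members in PROPERTY_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def classify_alignment(seq_a: str, seq_b: str) -> Counter:
    """Count identical, conservative, nonconservative, and gap columns.

    Both sequences must already be aligned ('-' marks an inserted gap).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(identical=0, conservative=0, nonconservative=0, gap=0)
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            counts["gap"] += 1
        elif a == b:
            counts["identical"] += 1
        elif residue_class(a) == residue_class(b):
            counts["conservative"] += 1
        else:
            counts["nonconservative"] += 1
    return counts


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Identical columns as a percentage of all aligned columns (assumed convention)."""
    counts = classify_alignment(seq_a, seq_b)
    return 100.0 * counts["identical"] / len(seq_a)


# Somatostatin-14 (type I) and the type II variant (Phe-7 -> Tyr, Thr-10 -> Gly).
SOMATOSTATIN_I = "AGCKNFFWKTFTSC"
SOMATOSTATIN_II = "AGCKNFYWKGFTSC"

counts = classify_alignment(SOMATOSTATIN_I, SOMATOSTATIN_II)
identity = percent_identity(SOMATOSTATIN_I, SOMATOSTATIN_II)
print(dict(counts), round(identity, 1))
```

Under these assumed classes, the Phe-to-Tyr change counts as conservative and the Thr-to-Gly change does not. Applied to the charged positions of the pro regions, the same bookkeeping gives the figure quoted above: 9 identical plus 11 conservative positions, or 20 of 22.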

By using the somatostatin I cDNA sequences, it should be possible to produce in an alternate host sufficient quantities of prosomatostatin to test its biological activity and prepare antibodies against this region of the molecule. This should allow decisive tests on its fate and function.
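
The processing scheme proposed in the Discussion can be checked with a short sketch. It numbers the COOH-terminal residues in the convention used in the text (negative positions upstream of somatostatin-14, no position 0), confirms the basic residues at positions -15, -2, and -1, and estimates the Mr of the two products from average residue masses. The sequence entered below is the known somatostatin-28 sequence preceded by the arginine at position -15; it is typed in here as an assumption, not copied from Fig. 3. The helper names are hypothetical, and the masses are for linear peptides (a disulfide bond would lower each value by about 2).

```python
# Average residue masses (Da) for the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def peptide_mass(sequence: str) -> float:
    """Average mass of a linear peptide: residue masses plus one water."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER


def label_positions(sequence: str, first_position: int) -> dict:
    """Map positions to residues, skipping position 0 as in the text."""
    labels = {}
    position = first_position
    for residue in sequence:
        labels[position] = residue
        position += 1
        if position == 0:
            position = 1
    return labels


# Assumed input: Arg(-15), then somatostatin-28 (positions -14 to 14).
C_TERMINAL_REGION = "R" + "SANSNPAMAPRERK" + "AGCKNFFWKTFTSC"
residues = label_positions(C_TERMINAL_REGION, first_position=-15)

# Basic residues flanking the two proposed cleavage sites.
single_basic_site = residues[-15]             # precedes somatostatin-28
dibasic_site = residues[-2] + residues[-1]    # precedes somatostatin-14

somatostatin_28 = "".join(residues[p] for p in sorted(residues) if p >= -14)
somatostatin_14 = "".join(residues[p] for p in sorted(residues) if p >= 1)

mass_28 = peptide_mass(somatostatin_28)
mass_14 = peptide_mass(somatostatin_14)
print(single_basic_site, dibasic_site, round(mass_28), round(mass_14))
```

The computed values, roughly 3,150 and 1,640, are in line with the somatostatin-related species of Mr 3,000 and 1,600 mentioned above.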

We thank Dr. Peter Hobart for helpful advice during the course of the experiments. We also are indebted to Dr. Graeme Bell, who provided important technical advice during the research, and to Dr. Graeme Bell and Dr. David Standring for valuable advice during the preparation of the manuscript. We acknowledge the assistance of Leslie Spector in typing the manuscript and Sonja Bock in computer analyses. The human somatostatinoma tissue was provided by Dr. Denis McCarthy. This research was supported by a grant from the National Institutes of Health (AM21344).

1. Arimura, A. (1981) Biomed. Res. 2, 233-257.
2. Rorstad, O. P., Epelbaum, J., Brazeau, P. & Martin, J. B. (1979) Endocrinology 105, 1083-1092.
3. Dodd, J. & Kelly, J. S. (1978) Nature (London) 273, 674-675.
4. Noe, B. D., Fletcher, D. J. & Spiess, J. (1979) Diabetes 28, 724-730.
5. Patzelt, C., Tager, H. S., Carroll, R. J. & Steiner, D. F. (1980) Proc. Natl. Acad. Sci. USA 77, 2410-2414.
6. Pradayrol, L., Jornavall, H., Mutt, V. & Ribert, A. (1980) FEBS Lett. 109, 55-58.
7. Schally, A. V., Huang, W. Y., Chang, R. C. C., Arimura, A., Redding, T. W., Millar, R. P., Hunkapiller, M. W. & Hood, L. E. (1980) Proc. Natl. Acad. Sci. USA 77, 4489-4493.
8. Esch, F., Bohlen, P., Ling, N., Benoit, R., Brazeau, P. & Guillemin, R. (1980) Proc. Natl. Acad. Sci. USA 77, 6827-6831.
9. Hobart, P., Crawford, R., Shen, L.-P., Pictet, R. & Rutter, W. J. (1980) Nature (London) 288, 137-141.
10. Oyama, H., Bradshaw, R. A., Bates, O. J. & Permutt, A. (1980) J. Biol. Chem. 255, 2251-2254.
11. Andrews, P. C. & Dixon, J. E. (1981) J. Biol. Chem. 256, 8267-8270.
12. Taylor, W. L., Collier, K. J., Deschenes, R. J., Weith, H. L. & Dixon, J. E. (1981) Proc. Natl. Acad. Sci. USA 78, 6694-6698.
13. Chirgwin, J., Przybyla, A., MacDonald, R. J. & Rutter, W. J. (1979) Biochemistry 19, 5294-5299.
14. Glisin, V., Crkvenzakov, R. & Byus, C. (1974) Biochemistry 13, 2633-2637.
15. Cooke, N. E., Coit, D., Weiner, R. I., Baxter, J. D. & Martial, J. A. (1980) J. Biol. Chem. 255, 6502-6510.
16. Maxam, A. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
17. Cooke, N. E., Coit, D., Shine, J., Baxter, J. D. & Martial, J. A. (1981) J. Biol. Chem. 256, 4007-4016.
18. Krejs, G. J., Orci, L., Conlon, M., Ravazzola, M., Davis, G. R., Raskin, P., Collins, S. M., McCarthy, D. M., Baetens, D., Rubenstein, A., Aldor, T. A. M. & Unger, R. H. (1979) N. Engl. J. Med. 301, 285-292.
19. Bailey, J. & Davidson, N. (1976) Anal. Biochem. 70, 75-85.
20. Thomas, P. S. (1980) Proc. Natl. Acad. Sci. USA 77, 5201-5205.
21. Austen, B. M. (1974) FEBS Lett. 103, 308-313.
22. Standring, D. N. (1980) Dissertation (Harvard University, Cambridge, MA).
23. Hubbard, S. C. & Ivatt, R. J. (1981) Annu. Rev. Biochem. 50, 555-583.
24. Nakanishi, S., Inoue, A., Kita, T., Nakamura, M., Chang, A. C. Y., Cohen, S. N. & Numa, S. (1979) Nature (London) 278, 423-427.
25. Strikant, C. B. & Patel, Y. C. (1981) Nature (London) 294, 259-260.
