

THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 260, No. 20, Issue of September 15, pp. 11216-11222, 1985 Printed in U.S.A.

Complete Primary Structure of the Human α2 Type V Procollagen COOH-terminal Propeptide*

(Received for publication, March 29, 1985)

Jeanne C. Myers‡§, Helen R. Loidl‡, Jerome M. Seyer¶, and Arnold S. Dion‖

From the ‡Connective Tissue Research Institute and the Departments of Medicine and Human Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, the ¶Veterans Administration Medical Center and the Departments of Medicine and Biochemistry, University of Tennessee, Memphis, Tennessee 38104, and the ‖Department of Molecular Biology, Institute for Medical Research, Camden, New Jersey 08103

Recently we presented the partial covalent structure of a type V collagen chain. Analysis of amino acids 796-1020 in the human α2(V) Gly-X-Y region showed strong conservation of charged positions with the interstitial collagens but also revealed substitutions unique to type V. To gain more information about this procollagen and primarily to resolve the ambiguous nature of the 3' noncollagenous propeptide, we sequenced several cDNA clones coding for amino acids adjacent to the carboxyl end of the chain. Here we report the complete primary structure of the α2(V) COOH-terminal propeptide. In general, the latter sequence (270 residues) bears a greater degree of similarity to those of the interstitial rather than the basement membrane procollagens. Compared to the interstitial procollagens, however, more divergence has occurred in α2(V) surrounding the conserved N-asparaginyl-linked carbohydrate attachment site at residues 171-173, and α2(V) possesses an additional potential glycosylation site (Asn-Lys-Thr) located in a hypervariable region near the NH2 terminus.

Although certainly premature to form any rigid hypothesis, a pattern emerges that may be characteristic of α2 versus α1 chains. Both the α2(I) and α2(V) telopeptides are devoid of a lysine, which in α1 chains forms an interchain cross-link with residue 87 of the collagenous region. Also in contrast to the interstitial α1 carboxyl propeptides is the absence in α2(I) and α2(V) of a cysteine that probably participates in an interchain disulfide bond. Therefore, one can speculate that those α2 chains, represented only once in procollagen trimers, may not be under the same selective pressure as α1 chains to maintain certain residues responsible for stabilizing the triple helical molecules.

* These studies were supported by Grants AM33348 and AM20553 from the National Institutes of Health, United States Public Health Service Grants AM16505 and AA03732, and grants from the Veterans' Administration and New Jersey State Commission on Cancer Research. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§ To whom correspondence should be addressed: Connective Tissue Research Institute, University of Pennsylvania, 3624 Market Street, Philadelphia, PA 19104.

The type V procollagen chains (α1, α2, α3) constitute an unusual division of this large subfamily of structurally related proteins (1-4). Although widely distributed in many tissues, type V is the least abundant quantitatively as compared to types I-IV (4). These homo- or heterotrimer triple helical molecules (α1(V)₃, α1(V)₂α2(V), α1(V)α2(V)α3(V)) are considered to be pericellular components and distinct from the interstitial types I, II, and III and the basement membrane type IV (5-8). Like the former group, the type V chains have an uninterrupted collagenous domain of about 1000 amino acids; but, like type IV, type V is resistant to vertebrate collagenase that cleaves types I-III (9), has a low level of alanine, more hydroxylysine, and is more highly glycosylated (4). Recently we presented the first amino acid sequences of a type V collagen chain (10). These were derived from the nucleotide sequence of human cDNA clones coding for the large COOH-terminal α2(V)CB9¹ peptide. Analysis of residues 796-1020 in the Gly-X-Y region revealed a long proline-free stretch between 918-944 and generally strong conservation of charged positions (Glu, Asp, Lys, Arg) with α1(III), α1(I), and α2(I). Substitutions of certain polar amino acids, however, are unique to α2(V), indicating more divergence than among the interstitial collagens. Nevertheless, all four of these chains are much more related to each other than to human α1(IV) collagen (11, 12).

Left unresolved, and posing perhaps the major question concerning the type V structure, is the nature of the noncollagenous propeptides and their possible similarity to those of type IV or the interstitial procollagens (12-19). Whereas the conserved carboxyl propeptides of these latter chains are uniformly cleaved shortly after secretion, the shorter amino-terminal fragment of type III is sometimes retained (20). In contrast, both the type IV carboxyl and amino extensions are processed to a minimal extent, if at all (21, 22), indicating that they may play a specialized role in basement membranes. Seemingly intermediate between these two patterns is the situation of type V, since the final α1(V) and α2(V) chains are larger than those obtained by pepsin digestion (4). The complex sequence of events culminating, therefore, in the partial removal of the type V terminal propeptides has been extremely difficult to clarify. Fessler, Fessler, and co-workers have extensively studied this process in hamster type V procollagen having a molecular composition of pro-α1(V)₃, and the chick type V heterotrimer, pro-α1(V)₂pro-α2(V) (21, 23-26). Interchain disulfide bridges are not found in the homotrimer but usually are present in the heterotrimer, indicating that the pro-α2(V) propeptide may be responsible for the linkage. Only one α2(V) propeptide with Mᵣ of 40,000 has been observed, but two propeptides, having Mᵣ of 85,000 and 35,000, are associated with the α1(V) chain. Processing of the precursor molecules is slow, stepwise, and apparently incomplete with a propeptide of 40,000 remaining with α1(V) and a 20,000 collagenase-resistant fragment remaining with α2(V). While their orientations have not been confirmed, the 40,000 α1(V) propeptide has been tentatively assigned to the amino terminus. Broek et al. (27) basically concur with the value for the α2(V) fragment (29,000 versus 20,000) but find only an 18,000 globular region attached to the processed α1(V) chain. It is possible that the differences in these results reflect the source of the material, since in the former case type V was isolated from fibroblast cultures, chick embryo crop, and blood vessels (21, 23-26), whereas in the latter study the procollagen was prepared from lathyritic chick bone (27).

¹ The abbreviations used are: CB, cyanogen bromide fragments; CR, conserved region; kb, kilobase(s).

Isolation of the type V cleavage products in amounts necessary for further characterization has been greatly hindered by the low representation of this procollagen in cell culture and tissue (4). Therefore, we chose the alternate approach of deriving the amino acids directly from the nucleotide sequence of cDNA clones encoding one of these regions. Here we report the complete protein sequence of the α2 type V COOH-terminal propeptide and its relationship to those of other procollagen chains.

MATERIALS AND METHODS

Isolation of cDNA Clones-Isolation of the clones NH20, N7KK, and N6JJ from a normal fibroblast cDNA library has been described previously (10). The clone N3-6 was identified from the same library using a 5' restriction fragment of NH20. All clones were inserted into the PstI site of pBR322.

DNA Sequencing-HindIII/PstI, HindIII/EcoRI, EcoRI/PstI, and PstI restriction fragments of the clones shown in Fig. 1 were electroeluted from 1% agarose gels or directly ligated to appropriately cleaved M13mp18,19 vectors (Bethesda Research Laboratories). The universal primer of 17 nucleotides (P-L Biochemicals) was used for Sanger sequencing reactions plus one primer of 17 nucleotides which was synthesized using sequences from the clone NH20 (Department of Chemistry, University of Pennsylvania). Transformation, dideoxy concentrations, and transcription conditions were essentially as described by Messing (29). Multiple reactions and gels were run using the fragments indicated. All of the coding and most of the 3' untranslated regions were determined from sequencing both DNA strands.

Northern Blot Hybridization-Normal fibroblast poly(A+) RNA was electrophoresed for 22 h at 30 V in a 1.0% agarose/2 M formaldehyde gel, transferred to nitrocellulose paper at 4 °C for 18 h in 10× SSC (1× SSC = 0.15 M NaCl, 0.015 M Na citrate, pH 6.8). Filters were baked for 2 h at 76 °C and prehybridized overnight in a solution containing 50% formamide, 4× SSC (30). Filter-bound RNA was hybridized to ³²P nick-translated probes (specific activity, 0.9-1.1 × 10⁹ cpm/μg) at a concentration of 4-6 ng/ml for 18-20 h at 40 °C in a 50% formamide, 4× SSC solution. Filters were washed at a final concentration of 0.2× SSC at 65 °C and exposed to x-ray film for the times reported in the legend to Fig. 4.
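
The salt concentrations at each step follow directly from the 1× SSC composition stated above. As a minimal sketch (the function name `ssc_molarity` is hypothetical, and only the two salts named in the text are modeled), the molarities at the 10×, 4×, and 0.2× strengths used in the protocol can be tabulated:

```python
# Composition of 1x SSC as given in the Methods text (mol/L).
ONE_X_SSC = {"NaCl": 0.15, "Na citrate": 0.015}


def ssc_molarity(fold: float) -> dict[str, float]:
    """Return the molarity of each SSC component at the given fold strength."""
    # Rounding only removes floating-point noise from the multiplication.
    return {salt: round(conc * fold, 6) for salt, conc in ONE_X_SSC.items()}


if __name__ == "__main__":
    # Transfer buffer, hybridization buffer, and final wash, respectively.
    for fold in (10, 4, 0.2):
        print(f"{fold}x SSC: {ssc_molarity(fold)}")
```

The final wash at 0.2× SSC therefore corresponds to 0.03 M NaCl and 0.003 M Na citrate, a much lower ionic strength than the 4× hybridization buffer.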

RESULTS AND DISCUSSION

DNA Sequencing of Human α2(V) cDNA Clones Coding for the COOH-terminal Propeptide-Previously, we described the isolation from a normal fibroblast cDNA library (19) of clones coding for the COOH-terminal CB9 peptide of the human α2(V) collagen chain (10). The identity of these recombinant molecules was established by comparing the derived amino acids with 31 residues determined by Edman degradation. DNA sequencing of the 3' part of the clones, adjacent to the final bases designating a Gly-X-Y triplet (Gly-His-Leu), revealed an open reading frame of 810 nucleotides coding for 270 noncollagenous residues. Since four overlapping cDNA clones obtained from the fibroblast library contained all or part of this region, we were able to conclusively identify the amino acids by completely sequencing both DNA strands. The endonuclease restriction maps and sequencing strategy using the M13 dideoxy procedure (29) are shown in Fig. 1. On the basis of the enzyme recognition sites, the noncollagenous domain is roughly delineated at the 5' end by the 5' EcoRV sequence, GATATC, coding for the 6th and 7th residues Asp-Ile, and at the 3' end by the unique EcoRI site, GAATTC, coding for Glu-Phe that precedes the TAA termination codon by 10 amino acids.
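
The codon assignments and reading-frame arithmetic in this paragraph can be checked with a short sketch. It assumes the standard genetic code and that both restriction sites lie in frame, as the text states; the helper names and the deliberately partial codon table are illustrative and not part of the original methods.

```python
# Partial standard genetic code: only the codons needed for this check.
CODON_TABLE = {
    "GAT": "Asp", "ATC": "Ile",  # EcoRV recognition site GATATC
    "GAA": "Glu", "TTC": "Phe",  # EcoRI recognition site GAATTC
    "TAA": "Stop",               # termination codon reported in the text
}


def translate(dna: str) -> list[str]:
    """Translate an in-frame DNA string codon by codon."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def residues_encoded(orf_nucleotides: int) -> int:
    """Number of amino acids encoded by an open reading frame of this length."""
    if orf_nucleotides % 3 != 0:
        raise ValueError("open reading frame must be a multiple of 3")
    return orf_nucleotides // 3


if __name__ == "__main__":
    print("EcoRV site:", translate("GATATC"))
    print("EcoRI site:", translate("GAATTC"))
    print("Residues in 810-nt frame:", residues_encoded(810))
```

Under these assumptions the EcoRV site reads Asp-Ile, the EcoRI site reads Glu-Phe, and 810 nucleotides correspond to 270 residues, in agreement with the text.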

FIG. 1. Restriction analysis and DNA sequencing strategy. [Figure: restriction maps of the four cDNA clones; panel labels: Triple Helical Region, α2(V) C-Terminal Propeptide, 3' Untranslated Region.] The four cDNA clones were isolated from a normal fibroblast library (10, 19). Asterisks indicate insertion into the PstI site of pBR322. NH20, NGJJ, and N7KK are present in the p orientation and N3-6 in the q orientation. Thin lines below the maps show the regions of the inserts that have been sequenced multiple times to derive the nucleotides and amino acids reported. All of the COOH-terminal propeptide coding region was determined from both DNA strands. Triangle indicates the location of the specific type V 17-nucleotide primer. Procedures for dideoxy DNA sequencing are stated under "Materials and Methods."

FIG. 2. Sequence of the α2(V) COOH-terminal propeptide and part of the 3' untranslated region. [Figure: nucleotide and derived amino acid sequence of α2(V), residues 1-270 in rows of 25, aligned with human α1(III), α1(I), and α2(I), followed by the 3' untranslated nucleotides; the sequence rows are not legible in this copy.] Top two lines are the nucleotide and derived amino acid sequences of the α2(V) carboxyl propeptide.

The α2(V) COOH-terminal Propeptide Displays High Homology with Those of the Interstitial Chains-Upon initial examination of the α2(V) COOH-terminal propeptide we found striking similarities both in size and primary structure to the analogous region of other procollagens, namely those of the interstitial chains (13-19), as opposed to the smaller and highly divergent basement membrane α1(IV) noncollagenous peptide (12). Amino acid sequences for human α1(III), α1(I), and α2(I) were aligned with α2(V) and are shown in Fig. 2. The first 24 residues of α2(V) appear to fulfill the criteria of a procollagen telopeptide since the conserved residue Asp, designating the C-protease cleavage site in the interstitial procollagens, lies at position 25. Whether cleavage may be influenced by the 5' residue being negatively charged, as only seen in α2(V) (where a Glu is present), is to our knowledge unknown. Divergence at this position from the conserved Ala in avian and human α1(I), α2(I), and avian α1(II) is also seen in α1(III) where a Gly is found in the human protein and an Arg in the chick (13-19). The size of the α2(V) segment is more comparable to the human α1(III) and α1(I) telopeptides containing 25 and 26 amino acids, respectively, versus human α2(I) with 15 amino acids. However, unlike human and avian α1(III) and α1(I) and avian α1(II), but consistent with α2(I) of both species, is the absence of a lysine residue which in the α1 chains participates in a stabilizing interchain cross-link with residue 87 of the triple helical region (4). In fact, no positively charged amino acids are included in this region of α2(V), which instead displays an unusually high overall negative charge with 6 of 24 positions filled by Glu or Asp.
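
The charge count quoted for the telopeptide can be reproduced with a few lines. This is only a sketch under stated assumptions: the 24-residue string below was transcribed from the first row of Fig. 2 as extracted, with residue 24 taken to be Glu as the text indicates, so it should be treated as an assumption rather than an authoritative sequence. The names `TELOPEPTIDE` and `count_residues` are illustrative.

```python
# Assumed one-letter transcription of alpha2(V) residues 1-24 (Fig. 2, row 1);
# residue 24 is taken as Glu per the text. Not an authoritative sequence.
TELOPEPTIDE = "TAALGDIMGHYDESMPDPLPEFTE"

ACIDIC = frozenset("DE")  # Asp, Glu
BASIC = frozenset("KR")   # Lys, Arg: the "highly charged" basic residues in Fig. 2


def count_residues(sequence: str, residues: frozenset) -> int:
    """Count how many positions in the sequence belong to the given residue set."""
    return sum(1 for aa in sequence if aa in residues)


if __name__ == "__main__":
    print("length:", len(TELOPEPTIDE))
    print("Asp/Glu:", count_residues(TELOPEPTIDE, ACIDIC))
    print("Lys/Arg:", count_residues(TELOPEPTIDE, BASIC))
```

With this transcription the segment has 6 acidic positions out of 24 and no Lys or Arg, matching the statement above. His at position 10 is not counted, on the assumption that the authors follow the Fig. 2 legend in treating only Asp, Glu, Lys, and Arg as highly charged.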

Assuming that Asp is the first amino acid of the carboxyl propeptide, the human α2(V) structure contains 246 residues, identical to the number in α1(I) and only slightly different from the 245 in α1(III) and 247 in α2(I) from the same species (14, 15, 19). Since two outstanding features of the conserved interstitial carboxyl propeptides are the number and location of the cysteines and the presence of a unique N-linked carbohydrate attachment site (31), we first analyzed whether these homologies also pertained to α2(V). Both the human and avian α1(III) and α1(I) carboxyl propeptides contain eight cysteines (13, 15, 19) while the α2(I) propeptides (13, 14) are minus cysteine 2 (numbering is from 5'→3'). Interestingly, now another α2 chain is also minus one cysteine, that being number 3. Both substitutions are to serine residues, encoded by TCT in α2(V) and AGC in α2(I), and are, therefore, single nucleotide changes from the cysteine codons TGT and TGC (Fig. 2 and Ref. 14). Although the human α1(II) sequences (32) are as yet unreported, one can extrapolate from the avian α1(II) chain where again eight cysteines are present (17, 18). If the same cysteines are involved in inter- and intrachain disulfide bonds as reported for avian α1(I), cysteines 1, 2, 3, and 4 form interchain linkages and 5, 6, 7, and 8 are involved in intrachain linkages where, in the latter, they occur between 5 and 8, and 6 and 7 (28). Therefore, distinctive to the α2(I) and α2(V) chains, represented only once in their respective triple helices (4), is the absence of a potential interchain disulfide linkage.
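
The statement that each serine codon is a single nucleotide change from a cysteine codon can be verified by comparing the codons position by position. The following is a minimal sketch; `point_differences` is a hypothetical helper name, and the codon pairs are exactly those quoted in the paragraph above.

```python
def point_differences(codon_a: str, codon_b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, base in codon_a, base in codon_b) for each mismatch."""
    if len(codon_a) != len(codon_b):
        raise ValueError("codons must have the same length")
    return [
        (index + 1, base_a, base_b)
        for index, (base_a, base_b) in enumerate(zip(codon_a, codon_b))
        if base_a != base_b
    ]


if __name__ == "__main__":
    # Cys codon -> Ser codon pairs quoted in the text.
    print("alpha2(V): TGT -> TCT", point_differences("TGT", "TCT"))
    print("alpha2(I): TGC -> AGC", point_differences("TGC", "AGC"))
```

Each comparison yields exactly one mismatch: the second base in the α2(V) pair and the first base in the α2(I) pair.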

Within the carboxyl propeptide of human and avian α1(I), α2(I), and α1(III) and avian α1(II), requisite sequences for N-asparaginyl-linked carbohydrate attachment, Asn-X-Ser/Thr (13-19, 31, 33-36) are found. In almost all instances, this sequence was observed to be Asn-Ile-Thr (Asn-Val-Thr in avian α1(I)) and located at almost identical positions in each domain (171-173 in aligned sequences in Fig. 2). In contrast, the COOH-terminal noncollagenous region of α1(IV) does not possess any corresponding N-linked carbohydrate attachment site (12), whereas two potential sites occur in the carboxyl-terminal propeptide of α2(V), i.e. Asn-Lys-Thr and Asn-Ile-Thr at positions 33 to 35 and 171 to 173, respectively. The triplet Asn-Pro-Ser at positions 111 to 113 in α2(V) has not been considered because of the occurrence of Pro in the X-position, which has been demonstrated to block in vitro glycosylation acceptor activity for model oligopeptides (36). These results most likely also pertain to in vivo glycosylation since relatively extensive computer searches of 105 proteins revealed that Asn-Pro-Ser/Thr sequences were invariably unglycosylated (37). A second potential glycosylation site (Asn-Leu-Ser) was also noted in the avian α1(II) carboxyl propeptide (17, 18) in a hypervariable region around position 150. Although from in vitro studies this sequence appeared to be a nonacceptor due to a Pro on the COOH-terminal side of the triplet (36), this concept was not verified in vivo (37). However, since a high number of asparagines are not glycosylated when Glu precedes Asn (37), as seen in type II (17, 18), this site probably possesses low potential for the attachment of an oligosaccharide moiety, unlike the additional one in α2(V).
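
The Asn-X-Ser/Thr rule, with Pro excluded from the X position, lends itself to a simple pattern search. The sketch below is illustrative only: `find_sequons` is a hypothetical helper, and the two test segments are one-letter transcriptions of α2(V) residues 26-50 and 101-125 read from Fig. 2 as extracted, so they should be treated as assumptions. It does not model the additional context effects discussed above (a flanking Pro or a preceding Glu).

```python
import re


def find_sequons(sequence: str, first_position: int = 1,
                 allow_pro: bool = False) -> list[tuple[int, str]]:
    """Return (position of Asn, triplet) for every Asn-X-Ser/Thr match.

    By default X may be any residue except Pro. A lookahead is used so that
    overlapping candidate sites would all be reported.
    """
    middle = "." if allow_pro else "[^P]"
    pattern = re.compile(f"(?=(N{middle}[ST]))")
    return [(m.start() + first_position, m.group(1))
            for m in pattern.finditer(sequence)]


# Assumed transcriptions from Fig. 2 (not authoritative).
SEGMENT_26_50 = "QAAPDDKNKTDPGVHATLKSLSSQI"
SEGMENT_101_125 = "METGETCISANPSSVPRKTWWASKS"

if __name__ == "__main__":
    print(find_sequons(SEGMENT_26_50, first_position=26))
    print(find_sequons(SEGMENT_101_125, first_position=101))
    print(find_sequons(SEGMENT_101_125, first_position=101, allow_pro=True))
```

Under these assumptions the first segment yields Asn-Lys-Thr at position 33, while Asn-Pro-Ser at position 111 is reported only when Pro is allowed in the X position, which mirrors the reasoning used in the text to discount that triplet.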

Divergence of the α2(V) Sequence-Residues 55-107 and 157-270 (termination) delineate the two major stretches of similarity among all four chains. The amino-conserved region (NH2-CR) includes most of the cysteines while the carboxyl-conserved region (COOH-CR) includes the common glycosylation attachment site and three remaining cysteines. In these areas of α2(V) there are 19 substitutions of amino acids which are conserved in the three interstitial chains (Fig. 3, top). This value just exceeds 15 in α2(I) but is much greater than the five in each of α1(III) and α1(I). One from His to Ile at position 175 in α2(V) occurs within a 12-residue uninterrupted region in α1(III), α1(I), and α2(I). In this area, α1(I) and α2(I) also show nucleotide homology with 30 identical bases coding for amino acids 170-179 (inclusive of the Asn-Ile-Thr triplet), whereas six are changed in α2(V) and four in α1(III) (Ref. 19 and Fig. 2). Since the genes encoding these latter chains are cytologically linked on chromosome 2 (38), in contrast to the dispersed α1(I), α2(I), and α1(II) genes (39, 40), we searched for similarities selectively linking α2(V) with α1(III). These proteins do in fact share 14 residues between 99-112, while five changes are present in each of α1(I) and α2(I). Notably, this region in α2(V) coincides with the one exhibiting the highest degree of homology with any of the three chains when based upon contiguous amino acids.

When we analyzed the two conserved areas of the four carboxyl propeptides for changes of charged and hydrophobic residues that may influence the conformation of the structure, a2(V) as before showed the most variation. The list of 14 amino acids shown in Fig. 3 (bottom) greatly exceeds five in a1(III) and two in a1(I) when evaluated by the same criteria. These include the only reversal of a charged residue in three of the four proteins, Asp in a2(V) at position 68 versus Arg in the others. The considerable divergence seen also in a2(I) (8 changes, Fig. 3) adds further weight to the speculation that a2 chains may not be under the same selective pressure as a1 chains and/or have adapted to meet specific requirements.

Coding Capacity of the a2(V) mRNA. To estimate a maximum value for the size of the a2(V) polypeptide chain from the coding region of the transcript, we hybridized the cDNA clone to normal fibroblast mRNA (Fig. 4). The sizes of the two a2(V) species are 5.0 and 6.3 kb when the a1(I) mRNAs of 4.8 and 5.8 kb (41) are used as standards. Polymorphic mRNAs also hybridize to human procollagen a2(I) (42), a1(IV) (12), and a1(III) cDNA clones (Ref. 19 and Fig. 4), and in a1(I) and a2(I) have been shown to differ in the length of their 3' untranslated regions (41, 42). From the a2(V) clone

from the DNA sequence of the four clones shown in Fig. 1. Residue 1, Thr, follows the last triplet of the a2(V) collagenous region, Gly-His-Leu (10). Lines 3, 4, and 5 show the analogous amino acids in three human interstitial procollagen chains: line 3, a1(III) COOH propeptide; line 4, a1(I) COOH propeptide; and line 5, a2(I) COOH propeptide obtained from the nucleotide sequence (14, 15, 19). Highly charged residues in a2(V) (Asp, Glu, Arg, and Lys) are indicated by asterisks, cysteines are within boxes and those in the carboxyl propeptide are numbered 1-8 (5' to 3'). The two potential N-linked glycosylation attachment sites in a2(V) are underlined, and an arrow designates the presumed C-proteinase cleavage site. Dashes show no change from the a2(V) amino acid, and open triangles show no corresponding amino acid at that position. Alignment of the four peptides has been determined from that giving the greatest homology of cysteines and highly charged residues (Asp, Glu, Lys, and Arg). The termination codon TAA in a2(V) is circled and is followed by 504 3' noncoding nucleotides. The two overlapping potential poly(A) attachment signals for the smaller a2(V) RNA are within horizontal brackets.


[Fig. 3: for each of a2(V), a2(I), a1(III), and a1(I), lists of residue number (AA), the amino acid in that chain, and the amino acid or residue class in the other chains (OTHERS), for the NH2-CR and COOH-CR.]

FIG. 3. Divergence of amino acids in conserved regions of human procollagen carboxyl propeptides. The two conserved regions (CR) of the human a2(V), a2(I), a1(III), and a1(I) carboxyl propeptides were evaluated for differences unique to only one of the four chains. NH2-CR designates residues 55-107 (Fig. 2) inclusive of five (four in a2 chains) cysteines, and COOH-CR designates the region from 157-270 inclusive of the last three cysteines and conserved glycosylation attachment site. The list at the top (NH2-CR and COOH-CR) shows a change in one chain from an amino acid common to the other three. Bottom part (NH2-CR and COOH-CR) shows a difference in one chain of charged or hydrophobic residues (but not necessarily identical amino acid) common to the other three chains. Asterisks indicate representation of a residue in both top and bottom categories.

FIG. 4. Hybridization of human procollagen cDNA clones to fibroblast poly(A+) RNAs. Normal fibroblast poly(A+) RNA, 0.7 µg/lane, was electrophoresed in a 1% agarose/2 M formaldehyde gel and transferred to nitrocellulose filters (see "Materials and Methods"). Three filters, taken from the same gel, were hybridized to the 1.35-kb a2(V) cDNA probe, NH20 (Fig. 1), specific activity 0.96 × 10^9 cpm/µg, the 2.4-kb a1(III) cDNA probe, E6 (19), specific activity 1.1 × 10^9 cpm/µg, and the 2.4-kb a1(I) cDNA probe, n12 (19), specific activity 1.0 × 10^9 cpm/µg. Autoradiography was for 40, 20, and 3 h for the a2(V), a1(III), and a1(I) filters, respectively. The sizes of the two a2(V) transcripts, 6.3 and 5.0 kb, and a1(III) transcripts, 5.5 and 4.9 kb, were determined by using the a1(I) mRNA of 5.8 and 4.8 kb as standards. These latter values, reported by Chu et al. (41), were determined by DNA sequencing of the a1(I) gene.

extending most 3' (N3-6 in Fig. 1), we obtained the sequence of 507 3' noncoding nucleotides inclusive of two overlapping Brownlee-Proudfoot sequences, AATAAA (43). These most likely designate the poly(A) attachment signal for the smaller 5.0-kb a2(V) mRNA since a 32P-labeled XmnI fragment of the N3-6 plasmid (Fig. 1), containing sequences 3' to this site, hybridizes only to the 6.3-kb RNA (data not shown). Therefore, the 6.3-kb RNA appears to have a long untranslated region of 1600 bases and the smaller RNA one of 300 bases. Since 810 nucleotides code for the COOH-terminal propeptide (Fig. 2) and presumably 3060 for the collagenous domain (10), the 830 remaining bases account for the NH2-terminal peptide and 5' untranslated region which in other procollagen genes has a midpoint value of about 130 nucleotides (41, 44). If a2(V) follows this pattern, the NH2-terminal propeptide would then be just smaller or almost equal to the COOH-terminal propeptide and thus larger than those of the interstitial procollagens (44).
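The coding-capacity estimate for the a2(V) transcripts amounts to simple subtraction, which the following sketch reproduces from the figures quoted in the text. The function name `remaining_bases` is an assumption, the transcript sizes are the rounded 5.0- and 6.3-kb values, and the final conversion to codons is our own illustrative step rather than a number stated in the paper.

```python
def remaining_bases(transcript_nt, utr3_nt, cooh_nt=810, collagenous_nt=3060):
    """Bases left for the NH2-terminal peptide plus 5' untranslated region."""
    return transcript_nt - utr3_nt - cooh_nt - collagenous_nt

# Smaller transcript: 5.0 kb with a 3' untranslated region of about 300 bases.
small = remaining_bases(5000, 300)
# Larger transcript: 6.3 kb with a 3' untranslated region of about 1600 bases.
large = remaining_bases(6300, 1600)

# Assumption from the text: a 5' untranslated region of about 130 nucleotides.
nh2_coding_nt = small - 130
nh2_codons = nh2_coding_nt // 3  # whole codons available to the NH2-terminal peptide

print(small, large, nh2_coding_nt, nh2_codons)
```

Both transcripts leave the same 830 bases, consistent with the two mRNAs differing only in their 3' untranslated regions. Under these rounded inputs the NH2-terminal peptide would be somewhat shorter than the 270-residue COOH-terminal propeptide, in line with the statement that it is "just smaller or almost equal".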

SUMMARY

The a2(V) COOH-terminal propeptide conforms to the structure expected of procollagens related to the interstitial rather than basement membrane chains. Comparison with human a1(III), a1(I), and a2(I) corroborates previous observations (18, 19) of this domain having a divergent NH2 terminus inclusive of the telo- and first 30 residues of the carboxyl propeptide, a homologous internal region from 55-107 encompassing five cysteines (four in a2 chains), a hypervariable area from 108-156, and the final highly conserved one-third region (residues 157-270) containing the common glycosylation attachment site and last three cysteines. Therefore, in the conserved regions it is easiest to note changes distinguishing one chain from the others. Finding increased amino acid substitutions unique to a2(V) was in accord with our recent analysis of the COOH terminus of the a chain, strongly favoring the speculation that the a2(V) gene preceded those encoding the interstitial procollagens (10). Of the latter, human a1(III) is more divergent from its avian counterpart and appears to have evolved earlier (13-19, 32). This hypothesis is probably not overly simplistic even in light of complications presented by the interesting possibility, as also entertained by Miller (4), that a2 chains may exhibit a faster rate of divergence due to fewer constraints than are placed upon a1 chains.

Nevertheless, of paramount importance is to ascertain whether these evolutionary changes are functionally silent or are manifested in the secondary structure. Since the carboxyl propeptide is responsible for selection and assembly of the polypeptide chains into a triple helix (4, 28, 45), we are currently analyzing this region by employing Chou-Fasman


28. Olsen, B. R. (1982) in New Trends in Basement Membrane Research (Kuhn, K., Schoene, H., and Timpl, R., eds) pp. 225-236, Raven Press, New York

29. Messing, J. (1983) Methods Enzymol. 101, 20-78

30. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) in Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY

31. Clark, C. C. (1979) J. Biol. Chem. 254, 10798-10802

32. Strom, C. M., and Upholt, W. B. (1984) Nucleic Acids Res. 12, 1025-1038

33. Aubert, J. P., Biserte, G., and Loucheux-Lefebvre, M. (1976) Arch. Biochem. Biophys. 175, 410-418

34. Bause, E., and Hettkamp, H. (1979) FEBS Lett. 108, 341-344

35. Struck, D. K., and Lennarz, W. J. (1980) in Biochemistry of Glycoproteins and Proteoglycans (Lennarz, W. J., ed) pp. 35-83, Plenum Press, New York

36. Bause, E. (1983) Biochem. J. 209, 331-336

37. Mononen, I., and Karjalainen, E. (1984) Biochim. Biophys. Acta 788, 364-367

38. Emanuel, B. S., Cannizzaro, L. A., Seyer, J. M., and Myers, J. C. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 3385-3389

39. Huerre, C., Junien, C., Weil, D., Chu, M-L., Morabito, M., Van Cong, N., Myers, J. C., Foubert, C., Gross, M-S., Prockop, D. J., Boue, J., Kaplan, J. C., de la Chapelle, A., and Ramirez, F. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 6627-6630

40. Strom, C. M., Eddy, R. L., and Shows, T. B. (1984) Somatic Cell Mol. Genet. 10, 651-655

41. Chu, M-L., deWet, W., Bernard, M., and Ramirez, F. (1985) J. Biol. Chem. 260, 2315-2320

42. Myers, J. C., Dickson, L. A., deWet, W. J., Bernard, M. P., Chu, M-L., DiLiberto, M., Pepe, G., Sangiorgi, F. O., and Ramirez, F. (1983) J. Biol. Chem. 258, 10128-10135

43. Proudfoot, N. J., and Brownlee, G. G. (1975) Nature 263, 211-214

44. Kohno, K., Martin, G. R., and Yamada, Y. (1984) J. Biol. Chem. 259, 13668-13673

45. Pihlajaniemi, T., Dickson, L. A., Pope, F. M., Korhonen, V. R., Nicholls, A., Prockop, D. J., and Myers, J. C. (1984) J. Biol. Chem. 259, 12941-12944

46. Chou, P. Y., and Fasman, G. D. (1978) Adv. Enzymol. 47, 45-148