

Candidate Cell and Matrix Interaction Domains on the Collagen Fibril, the Predominant Protein of Vertebrates*□S

Received for publication, November 13, 2007, and in revised form, April 11, 2008 Published, JBC Papers in Press, May 15, 2008, DOI 10.1074/jbc.M709319200

Shawn M. Sweeney,a Joseph P. Orgel,b Andrzej Fertala,c Jon D. McAuliffe,d Kevin R. Turner,e Gloria A. Di Lullo,e Steven Chen,f Olga Antipova,b Shiamalee Perumal,b Leena Ala-Kokko,g,h Antonella Forlino,i,j Wayne A. Cabral,j Aileen M. Barnes,j Joan C. Marini,j and James D. San Antonio e,1

From the a Cardiovascular Institute, University of Pennsylvania, Philadelphia, Pennsylvania 19104, the b Center for Synchrotron Radiation Research and Instrumentation, Department of Biological, Chemical, and Physical Sciences, Illinois Institute of Technology, Chicago, Illinois 60616, the c Department of Dermatology and Cutaneous Biology, Thomas Jefferson University, Philadelphia, Pennsylvania 19107, the d Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, the f Chicago Medical School, North Chicago, Illinois 60064, the g Collagen Research Unit, Biocenter and Department of Medical Biochemistry and Molecular Biology, University of Oulu, Oulu, Finland, h Connective Tissue Gene Tests, Allentown, Pennsylvania 18103, the i Department of Biochemistry A. Castellani, University of Pavia, Pavia, Italy, the j Bone and Extracellular Matrix Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892, and the e Cardeza Foundation for Hematologic Research and Department of Medicine, Thomas Jefferson University, Philadelphia, Pennsylvania 19107

Type I collagen, the predominant protein of vertebrates, polymerizes with type III and V collagens and non-collagenous molecules into large cable-like fibrils, yet how the fibril interacts with cells and other binding partners remains poorly understood. To help reveal insights into the collagen structure-function relationship, a data base was assembled including hundreds of type I collagen ligand binding sites and mutations on a two-dimensional model of the fibril. Visual examination of the distribution of functional sites, and statistical analysis of mutation distributions on the fibril suggest it is organized into two domains. The “cell interaction domain” is proposed to regulate dynamic aspects of collagen biology, including integrin-mediated cell interactions and fibril remodeling. The “matrix interaction domain” may assume a structural role, mediating collagen cross-linking, proteoglycan interactions, and tissue mineralization. Molecular modeling was used to superimpose the positions of functional sites and mutations from the two-dimensional fibril map onto a three-dimensional x-ray diffraction structure of the collagen microfibril in situ, indicating the existence of domains in the native fibril. Sequence searches revealed that major fibril domain elements are conserved in type I collagens through evolution and in the type II/XI collagen fibril predominant in cartilage. Moreover, the fibril domain model provides potential insights into the genotype-phenotype relationship for several classes of human connective tissue diseases, mechanisms of integrin clustering by fibrils, the polarity of fibril assembly, heterotypic fibril function, and connective tissue pathology in diabetes and aging.

* This work was supported, in whole or in part, by National Institutes of Health Grants AR048544 (to A. F.), AHA 0435339Z, NIH RR08630, and NSF 0644015 (to J. P. R. O. O.), and NIH AR049604 and HL053590 (to J. S. A.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

□S The on-line version of this article (available at http://www.jbc.org) contains supplemental text, equations, and Fig. S1.

1 To whom correspondence should be addressed: Thomas Jefferson University, Cardeza Foundation for Hematologic Research and Dept. of Medicine, Rm. 809 Curtis, 1015 Walnut St., Philadelphia, PA 19107-5099. Tel.: 215-955-6121; Fax: 215-923-3836; E-mail: [email protected].

2 The abbreviations used are: OI, osteogenesis imperfecta; PG, proteoglycan; THP, triple helical peptide; MLBR, major ligand binding region; MMP, matrix metalloproteinase; SPARC, secreted protein, acidic, and rich in cysteine; FACIT, fibril-associated collagens with interrupted triple helices.

THE JOURNAL OF BIOLOGICAL CHEMISTRY VOL. 283, NO. 30, pp. 21187–21197, July 25, 2008. Printed in the U.S.A.

Type I collagen is the most abundant protein in humans and other vertebrates, comprising much of the fibrous extracellular matrix scaffold of bones, tendons, skin, and many other tissues (1–4). In general, type I collagen and its binding partners are proposed to provide mechanical strength and form to tissues. Collagenous scaffolds are laid down and remodeled by cells and are also a predominant substrate for cell interactions, migration, and differentiation. Consequently, various debilitating human diseases are associated with type I collagen mutations, including osteogenesis imperfecta (OI,2 brittle bone disease), Ehlers-Danlos syndrome, vascular disorders, and others (3, 5). Type I collagen is also employed in human medicine as hemostatic sponges and implants to repair wounds and in tissue engineering applications as scaffolds (6).

Type I collagen is synthesized in the endoplasmic reticulum as α1 and α2 procollagen chains, each encoded by separate genes that are translated into proteins somewhat longer than 1000 amino acid residues (3, 7). Nucleation domains on the C-terminal propeptide promote the polymerization of two α1 and one α2 chains into the procollagen triple helical monomer (Fig. 1, A and B). The triple helical domain of procollagen is composed of contiguous glycine-X-Y tripeptide repeats, with the obligate glycine in the first position because its side chain is the only one small enough to fit within the coiled-coil of the triple helix. Extracellularly, N- and C-proteinases remove the globular termini of procollagen, and every ~67 nm along the fiber axis, five monomers assemble in a quarter-staggered fashion to form part of the supramolecular “helix,” the microfibril. Each microfibril, the proposed subunit of the collagen fibril (Fig. 1, C–E), and its immediate microfibrillar neighbors are connected by N- and C-terminal intermolecular cross-links. The basic repeating morphological structure of the fibril is the D-period, ~67 nm long, and composed of one overlap and one gap zone. Each D-period contains the complete monomer sequence derived from overlapping consecutive elements of five monomers (Fig. 1C, box). Other collagens, proteoglycans (PGs), and matrix macromolecules may further assemble with the fibril to impart tissue-specific properties to the heterotypic polymer (2, 4, 8). However, it is unclear how cells, matrix molecules, and other factors interact with the heterotypic collagen fibril. Therefore, a comprehensive structure-function model of the protein has been lacking.

To help elucidate type I collagen structure-function relationships, a data base of functional domains, ligand binding sites, and human disease-associated mutations mapping to type I collagen was previously assembled and presented as a two-dimensional map of the collagen fibril (9). It was reported that ligand-binding hot spots appear on collagen, and the positions of certain classes of mutations may correlate with disease phenotype. Subsequently the molecular, microfibrillar, and fibrillar structures of type I collagen have been determined through x-ray diffraction (10).

This study is a theoretical analysis of the interrelationships between hundreds of new functional domains, ligand-binding sites, and mutations on the collagen map. Moreover, the map is analyzed alongside a three-dimensional model of the collagen microfibrillar structure that was recently determined (11). This analysis reveals novel insights into how the type I collagen fibril, and perhaps collagen fibrils in general, are organized to fulfill their crucial role as scaffolds for cell interactions and as the predominant structural elements of vertebrate tissues.
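The D-periodic stagger described in the introduction can be illustrated with a little arithmetic. The sketch below is not part of the original analysis: it assumes the rounded lengths quoted in this article (a monomer of about 300 nm and a D-period of about 67 nm), and the constant and function names are hypothetical. It shows why one D-period holds segments of five monomers, and why a stretch of the period (the overlap zone) is crossed by five monomers while the remainder (the gap zone) is crossed by only four.

```python
# Illustrative sketch only. The lengths are the rounded values quoted in the
# text (~300 nm monomer, ~67 nm D-period); all names here are hypothetical.
MONOMER_NM = 300.0
D_PERIOD_NM = 67.0

# A monomer spans four full D-periods plus a shorter fifth segment.
FULL_PERIODS = int(MONOMER_NM // D_PERIOD_NM)
# The short fifth segment sets the length of the overlap zone.
OVERLAP_NM = MONOMER_NM - FULL_PERIODS * D_PERIOD_NM
# The rest of the D-period is the gap zone.
GAP_NM = D_PERIOD_NM - OVERLAP_NM


def segment_and_offset(position_nm):
    """Map a distance from the monomer N terminus (nm) to the monomer
    segment (1 to 5) and the offset within the D-period (nm)."""
    if not 0.0 <= position_nm <= MONOMER_NM:
        raise ValueError("position lies outside the monomer")
    segment = min(int(position_nm // D_PERIOD_NM), FULL_PERIODS) + 1
    offset = position_nm - (segment - 1) * D_PERIOD_NM
    return segment, offset


def monomers_in_cross_section(offset_nm):
    """Number of monomer segments present at a given offset in the D-period."""
    if not 0.0 <= offset_nm < D_PERIOD_NM:
        raise ValueError("offset lies outside the D-period")
    return FULL_PERIODS + 1 if offset_nm < OVERLAP_NM else FULL_PERIODS


if __name__ == "__main__":
    print(OVERLAP_NM, GAP_NM)
    print(segment_and_offset(100.0))
```

With these rounded inputs the overlap and gap zones come out at roughly 32 and 35 nm. These figures are only as good as the approximate lengths fed in and should not be read as measured values.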

EXPERIMENTAL PROCEDURES

Ligand Binding Sites and Functional Domains—Positions of binding sites and functional domains were obtained from the literature and are indicated by labeled boxes placed next to the relevant sequences. Primary literature references for sites indicated on an earlier version of the map appear in the supplemental materials. Recently reported sites are referenced in the legend to Fig. 2. Zones of interactions between ligands and broad regions of collagen fibrils, as observed by electron microscopy, are indicated by colored overlays. Many binding site locations were approximated based on low resolution approaches, such as electron microscopy of macromolecular complexes. On the other hand, the use of collagen model triple helical peptides (THPs) (12, 13), in some cases including novel peptide Toolkits composed of overlapping, 27-residue segments of collagen sequences (14), has allowed high resolution mapping of the positions of various ligand binding sites and functional domains on the collagen monomer.

In vivo, type I collagen-ligand interactions may depend upon the tissue source. Moreover, collagen fibrils may be heterotypic, i.e. contain other collagens such as types III and V (15), whose arrangements in the fibril are incompletely understood, and that could impact fibril-ligand binding. However, despite the degree of uncertainty implicit in the map, and that some discrepancies exist in the data base (9), the approach followed here is to identify fundamental features of collagen structure and function based on numerous lines of evidence and by focusing on the best characterized functional elements of collagen.

Identifying Interrelationships between Sites—Relevant relationships between binding sites and functional domains were identified in four ways. First, as sequences on the collagen monomer to which more than one ligand has been shown to bind, or that are near neighbors; second, as sequences that fall within the borders of a fibril region shown to bind a particular ligand; and third, as neighboring binding sites on adjacent monomers within the D-periodic packing scheme. In this last case it is proposed that interactions between ligands on adjacent monomers may be possible if their binding sites align vertically within the D-period, and if their two-dimensional structure allows them to simultaneously bind more than one triple helix and reach another ligand or ligand-binding site on neighboring monomer(s). Here, sites on two or more monomers are considered as aligned vertically if they overlap with or fall close to any axis drawn perpendicular to the long axes of the monomers, and joining monomers 1 and 5. Fourth, select data from the two-dimensional map were correlated with the known three-dimensional packing structure of collagen within the fibrillar context (10, 16).

FIGURE 1. Assembly and structure of the type I collagen fibril. A fragment of a single type I collagen triple helix (monomer) is depicted (A). Type I collagen is secreted as a procollagen monomer ~300 nm long; extracellularly, the propeptides (N and C) are cleaved by proteinases (dashed vertical lines, B) and the tropocollagen monomers assemble in a staggered fashion into collagen fibrils (C) where one D-period repeat (expanded two-dimensional view of 67 nm segment of microfibril, box) contains the complete collagen sequence from elements of the five monomers and includes an overlap and gap zone. The subunit structure of the fibril can be considered to be the D-periodic 5-mer referred to as the microfibril (D). Collagen fibrils appear as periodic banded structures by electron microscopy; arrow, left border of overlap zone; particles are heparin-albumin gold conjugates used to map the heparin-binding site (E). The collagen map in Fig. 2 represents an expanded view of one D-period. Modified from a figure reproduced from J. Cell Biol. by copyright permission of The Rockefeller University Press (53).

Correlating Two-dimensional Collagen Map Data with a Three-dimensional Collagen Microfibril Model—The collagen microfibril model used in this study was composed from the packing structure of rat tendon type I collagen molecules observed in situ as described (10). Fiber diffraction data collected from native and derivatized (heavy atom labeled) rat tail tendons were used to solve the in situ structure of the collagen molecules and microfibril using multiple isomorphous replacement (16). A molecular model was constructed based on the primary sequences of the α1 and α2 chains of rat type I collagen, and the superhelical parameters were determined from collagen-like peptide crystallographic structure determinations. The electron density map representing the experimentally determined microfibril has a resolution of 0.516 nm in the direction of the fiber (collagen molecule) axis and 1.11 nm in the direction perpendicular to this. To determine the position of functional sites from the two-dimensional collagen map on the three-dimensional microfibril model, solvent-accessible surface calculation and rendering was performed using SPOCK (17) with the default probe size of 0.14 nm. Distances between surface sites were determined by line-of-sight measurement of atoms central to each site and located at the microfibril surface, assembled by using the instructions and coordinates provided by RCSB structure file 1Y0F (10). To construct images from these analyses, the Cα “worm” traces of relevant portions of individual triple helices were marked using a semi-transparent surface rendering. The semi-transparent surface was then rendered in the appropriate colors to represent the positions of relevant functional sequences and binding sites. That rat and human collagen protein sequences are highly homologous justifies the approach of localizing functional domains of human type I collagen on the rat type I collagen microfibril.

Three-dimensional Modeling of Integrin-Collagen Interactions—Molecular modeling of α2β1 integrin I-domain binding to the collagen triple helix was performed on a Silicon Graphics (Octane) computer system using the SYBYL software package, version 7.2 (Tripos Inc.). A model of the collagen I triple helix was created (18) with a 36-amino acid peptide spanning the α1-glycine 475 to α1-asparagine 510 region and a corresponding α2-glycine 475 to α2-alanine 510 region of human collagen IA, in which a fructopyranose residue was linked to α2-lysine 479. To minimize end group effects, the amino terminus was substituted with an N-acetyl group and the C terminus with NHCH3. The model was energy-minimized using a conjugate gradient method and subjected to repeating cycles of molecular dynamics using Kollman force fields and united atoms (19). To illustrate the interaction of the α1(I) GFPGER502–507 sequence with the integrin I-domain, a crystal structure of a complex between the I-domain and a triple helical collagen peptide was used (Protein Data Bank, ID code 1dzi). The intermolecular energy of the interaction was analyzed with the SYBYL/Dock module to identify possible binding conformations. Surface calculations of the molecules were analyzed using the SYBYL/Molcad module.

Human Mutations—Mutations were obtained from the Database of Human Type I and Type III Collagen Mutations (www.le.ac.uk/genetics/collagen/), the OI consortium mutation data base (5, 20), or were published elsewhere (20–27). Unpublished mutations were detected by DNA sequencing (28) and were identified in patients referred to a clinical DNA diagnostics laboratory (Reading, PA) for mutation analysis of the COL1A1 and COL1A2 genes. Clinical diagnosis of OI was made by the referring medical personnel.

Collagen Sequence Analysis—Sequence homologies between human fibrillar collagens were calculated using the mature collagen primary sequences and the ClustalW function of MacVector 9.0 (Accelrys). In homology determinations between two proteins the number of homologous residues divided by the number of residues in the larger protein yielded the percent homology value. The find feature of the program was used to search for functional domain sequences of interest. Accession numbers for collagen sequences examined in this study include: Homo sapiens COL1A1, NM_000088; H. sapiens COL1A2, NM_000089; H. sapiens COL2A1, BC116449; H. sapiens COL3A1, NM_000090; H. sapiens COL5A1, NM_000093; H. sapiens COL5A2, NM_0000393; H. sapiens COL5A3, NM_015719; H. sapiens COL11A1, J04177, J05407, NM_080630, NM_08062, NM_001854; H. sapiens COL11A2, NM_080681, NM_080680, NM_080679; Bos taurus COL1A1, BC105184; B. taurus COL1A2, NM_174520; Canis familiaris COL1A1, NM_001003090; C. familiaris COL1A2, NM_001003187; Xenopus laevis COL1A1, BC049829; X. laevis COL1A2, BC049287; Mus musculus COL1A1, NM_007742; M. musculus COL1A2, NM_007743; Rattus norvegicus COL1A1, XM_001081230; R. norvegicus COL1A2, NM_053356; Danio rerio COL1A1, NM_199214, and D. rerio COL1A2, NM_182968.

Statistics—Statistical analyses of mutation distributions on the collagen fibril appear under “Results” and in the supplemental materials.
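The percent homology rule and the motif search described under “Collagen Sequence Analysis” can be sketched in a few lines. This is a minimal illustration, not the MacVector or ClustalW implementation: it assumes the two sequences have already been aligned with `-` as the gap character, and it treats “homologous residues” as identical residues at aligned positions, which is an assumption because the text does not define the term. The function names and the short example sequences are hypothetical.

```python
# Minimal sketch under stated assumptions; not the MacVector/ClustalW code.
# "Homologous" is taken here to mean identical residues at aligned positions.


def percent_homology(aligned_a, aligned_b, gap="-"):
    """Identical aligned residues divided by the length of the larger
    (ungapped) protein, expressed as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    larger = max(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if larger == 0:
        raise ValueError("sequences contain no residues")
    return 100.0 * matches / larger


def find_motif(sequence, motif):
    """Return the 1-based start positions of every occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical toy sequences, used only to exercise the functions.
    print(percent_homology("GFPGERGVV", "GFPGEKG-V"))
    print(find_motif("GPAGFPGERGVVGFPGER", "GFPGER"))
```

Dividing by the length of the larger protein, as the text specifies, penalizes length differences: a short sequence that matches perfectly against part of a long one still scores well below 100%.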

RESULTS AND DISCUSSION

The collagen map (Fig. 2) consists of the protein sequences of the human α1 and α2 chains of type I collagen, arranged as they are proposed to occur in the fibril, upon which positions of structural landmarks, ligand binding sites, and disease-associated mutations are superimposed.

Ligand Binding Site Distribution—When the distribution of ligand-binding sites was examined on the map, more were found on the C-terminal half of the collagen monomer (Fig. 2), as noted previously (9). Moreover, three concentrations or “hot spots” of ligand binding are again evident, designated as major ligand binding regions (MLBRs) (dashed boxes 1–3, Fig. 2). The potential functional implications of some of the overlapping ligand binding sites were discussed before (9). A previously unobserved and less conspicuous hot spot appears for the first time on the revised map on and around sequence glycine-phenylalanine-hydroxyproline-glycine-glutamic acid-arginine (GFPGER)502–507 (Fig. 2, orange box), previously discovered as the predominant α1β1/α2β1/α11β1 integrin and cell binding


[Fig. 2, the collagen map; image not reproduced. Labels recoverable from the figure: the “Cell Interaction Domain” and “Matrix Interaction Domain”; the gap zone; the α1 and α2 chain sequences of the monomer segments against a residue scale running from 0 to 1000; ligand binding sites and landmarks including the α1β1, α2β1, and α11β1 integrins, HSP47, SPARC, MMP-1, decorin, COMP, fibronectin, interleukin 2, phosphophoryn, amyloid precursor protein, glycation sites, and keratan sulfate, dermatan/chondroitin sulfate, and heparan sulfate PGs; and amino acid substitutions, mostly of glycine (for example G-S, G-C, G-R), annotated with the associated phenotype, chiefly OI type.]

G-D,G-S,G-R,G-V OI3/4,1,4

G-D 0I2

G-A 0I2

G-C 0I2

OI3,3/4,4,DI

G-S,G-V OI4,2

G-S OI2

G-C,G-S OI2,4

G-V OI3/4

G-D,G-S OI3,4

G-A,G-C OI1,4

G-A,G-D,G-V OI1?,3/4,4

G-S,G-A OI2,4

G-C,G-D,G-S OI3/4,4B,2/3,2,DI

G-A,G-R OI3,1,4,3/4

Q L S Y G Y D E K S T G G I S V P G P M G P S G P R G L P G P P G AQ L S Y G Y D E K S T G G I S V P G P M G P S G P R G L P G P P G A

Q Y D G K G V G L G P G P M G L M G P R G P P G A A G A

R G L P G T A G L P G M K G H R G F S G L D G A K G D A G P A G P K G E P G S P G E NR G L P G T A G L P G M K G H R G F S G L D G A K G D A G P A G P K G E P G S P G E NR G F P G T P G L P G F K G I R G H N G L D G L K G Q P G A P G V K G E P G A P G E N

E R G R P G A P G P A G A R G N D G A T G A A G P P G P T G P A G P P G F P G A V G A K G E A G P Q G P R G S E G P Q G V R G E P G P P G P A G A A G P A G N P G AE R G R P G A P G P A G A R G N D G A T G A A G P P G P T G P A G P P G F P G A V G A K G E A G P Q G P R G S E G P Q G V R G E P G P P G P A G A A G P A G N P G AE R G R V G A P G P A G A R G S D G S V G P V G P A G P I G S A G P P G F P G A P G P K G E I G A V G N A G P A G P A G P R G E V G L P G L S G P V G P P G N P G A

P S G A S G E R G P P G P M G P P G L A G P P G E S G R E G A P A A E G S P G R D G S P G A K G D R G E T G P A G P P G A P G A P G A P G P V G P A G K S G D R G E T G P A G P A G P V G P V G A R G P A G P Q G PP S G A S G E R G P P G P M G P P G L A G P P G E S G R E G A P A A E G S P G R D G S P G A K G D R G E T G P A G P P G A P G A P G A P G P V G P A G K S G D R G E T G P A G P A G P V G P V G A R G P A G P Q G PI A G P P G A R G P P G A V G S P G V N G A P G E A G R D G N P G N D G P P G R D G Q P G H K G E R G Y P G N I G P V G A A G A P G P H G P V G P A G K H G N R G E T G P S G P V G P A G A V G P R G P S G P Q G I

G-R

OI1

X-LINK

E-K OI1

G-V OI1

P-R OI1

R-C AD,OPA

G-R OI2

G-R OI1

G-S OI1

G-V OI2

G-C OI1

G-S OI3

G-D OI3

G-D OI2

R-C AD,OPA

P3H OI2,3

G-D OI3

X-LINK

G-A,OI3

G-S OI4

VWF

MMP INTERACTION DOMAIN

MINERALIZATION

DDR2

DDR2

DDR2

G P A G S R G D G G P P G M T G F P G A A G R T G P P G P S G I S G P P G P P G P A G K E G L R G P R G D Q G P V G R T G E V G A V G P P G F A G E K G P S G E A G T A G P P G T P G P Q G L L G A P G I L G L P G S R G E R G L P G V A G A V G E P G P L G

G A K G A R G S A G P P G A T G F P G A A G R V G P P G P S G N A G P P G P P G P A G K E G G K G P R G E T G P A G R P G E V G P P G P P G P A G E K G S P G A D G P A G A P G T P G P Q G I A G Q R G V V G L P G Q R G E R G F P G L P G P S G E P G K Q GG A K G A R G S A G P P G A T G F P G A A G R V G P P G P S G N A G P P G P P G P A G K E G G K G P R G E T G P A G R P G E V G P P G P P G P A G E K G S P G A D G P A G A P G T P G P Q G I A G Q R G V V G L P G Q R G E R G F P G L P G P S G E P G K Q G

P Q G V Q G G K G E Q G P A G P P G F Q G L P G P S G P A G E V G K P G E R G L H G E F G L P G P A G P R G E R G P P G E S G A A G P T G P I G S R G P S G P P G P D G N K G E P G V V G A V G T A G P S G P S G L P G E R G A A G I P G G K G E K G E P G L

P A G P A G E R G E Q G P A G S P G F Q G L P G P A G P P G E A G K P G E Q G V P G D L G A P G P S G A R G E R G F P G E R G V Q G P P G P A G P R G A N G A P G N D G A K G D A G A P G A P G S Q G A P G L Q G M P G E R G A A G L P G P K G D R G D A G PP A G P A G E R G E Q G P A G S P G F Q G L P G P A G P P G E A G K P G E Q G V P G D L G A P G P S G A R G E R G F P G E R G V Q G P P G P A G P R G A N G A P G N D G A K G D A G A P G A P G S Q G A P G L Q G M P G E R G A A G L P G P K G D R G D A G P

N G L T G A K G A A G L P G V A G A P G L P G P R G I P G P V G A A G A T G A R G L V G E P G P A G S K G E S G N K G E P G S A G P Q G P P G P S G E E G K R G P N G E A G S A G P P G P P G L R G S P G S R G L P G A D G R A G V M G P P G S R G A S G P A

D G Q P G A K G A N G A P G I A G A P G F P G A R G P S G P Q G P G G P P G P K G N S G E P G A P G S K G D T G A K G E P G P V G V Q G P P G P A G E E G K R G A R G E P G P T G L P G P P G E R G G P G S R G F P G A D G V A G P K G P A G E R G S P G P AD G Q P G A K G A N G A P G I A G A P G F P G A R G P S G P Q G P G G P P G P K G N S G E P G A P G S K G D T G A K G E P G P V G V Q G P P G P A G E E G K R G A R G E P G P T G L P G P P G E R G G P G S R G F P G A D G V A G P K G P A G E R G S P G P A

R G D K G E T G E Q G D R G I K G H R G F S G L Q G P P G P P G S P G E Q G P S G A S G P A G P R G P P G S A G A P G K D G L N G L P G P I G P P G P R G R T G D A G P V G P P G P P G P P G P P G P P S A G F D F S F L P Q P P Q E K A H D G G R Y Y R AR G D K G E T G E Q G D R G I K G H R G F S G L Q G P P G P P G S P G E Q G P S G A S G P A G P R G P P G S A G A P G K D G L N G L P G P I G P P G P R G R T G D A G P V G P P G P P G P P G P P G P P S A G F D F S F L P Q P P Q E K A H D G G R Y Y R AR G D K G E P G E K G P R G L P G L K G H N G L Q G L P G I A G H H G D Q G A P G S V G P A G P R G P A G P S G P A G K D G R T G H P G T V G P A G I R G P Q G H Q G P A G P P G P P G P P G P P G V S G G G Y D F G Y D G D F Y R A D Q P R S A P S L R P

M1

M2

M3

M4

M5

4/4

0

PP

AA

L

O4-G44

1

OGGG-SVS-4

PPA

4V2222,2,222OO

2

O0GOOOOIOO0N00

α

3

FIGURE 2. Ligand binding site and mutation map of the human type I collagen fibril. Protein sequences of the triple helix are shown (GenBank™, α1(I) accession #NP_000079.2 and α2(I) NP_000080.2); proline and hydroxyproline are designated as P, the latter being the third position in glycine-X-Y sequences. Note that Ala-459, α2(I) results from a single nucleotide polymorphism observed in some patient samples in the non-redundant protein data base, whereas α2(I) Pro-459 appears in the NP_000080.2 sequence. Yet, the association of Ala to Pro-459 substitution mutations with intracranial aneurysms (26) justifies its inclusion here. Ligand binding regions are indicated by rectangular boxes adjacent to relevant sequences; ligand binding to type I procollagen or tropocollagen (gray), or to either α1(I) or α2(I) chains (unshaded). Ligand binding hot spots 1, 2, and 3 are enclosed by dashed-lined black boxes. Highlighted sequences include: the α1β1/α2β1 integrin binding site GFPGER502–507 (maroon rectangle); MMP-1 cleavage site (green arrow); N- and C-terminal intermolecular cross-links (purple rectangles). Studies responsible for mapping many of the functional domains on collagen are cited in the first map publication (9) and in the supplemental materials. New sites on the revised map include endo180 (57), HSP47 (58), micro-unfolding and N-anchor domains (59, 60), MMP-interaction domain (11), prolyl-3-hydroxylase substrate P986 (61), SPARC (62), phosphophoryn (63, 64), von Willebrand Factor (65), discoidin domain receptor 2 (66), and activities of THPs on angiogenesis (13), endothelial cell activation (67), and osteoblast differentiation (68). Mutations were from sources cited under the “Experimental Procedures.” This figure was modified from a previously published illustration (9).

α1β1 integrin: α1β1 integrin binding sites
α2β1 integrin: α2β1 integrin binding sites
α11β1 integrin: α11β1 integrin binding sites
AD: Mutation associated with arterial dissections
ANE: Mutation associated with aneurysm
CAD: Mutation associated with cervical artery dissections
CAF: Mutation associated with Caffey Disease (infantile cortical hyperostosis)
Col V X-link: Collagen V cross-link site
COMP: Cartilage oligomeric matrix protein binding sites
CPC: C-Proteinase cleavage site
CTW: Mutation associated with connective tissue weakness
DC integrin: Integrin binding site on denatured collagen
DDR2: Proposed site of discoidin domain receptor 2 binding
Decorin core: Decorin core protein binding site
DI: Mutations associated with Dentinogenesis Imperfecta
[PS] DS PG: Proposed site of dermatan sulfate proteoglycan binding
EDS: Mutations associated with Ehlers-Danlos Syndrome
EDS7L: Mutations associated with Ehlers-Danlos VII-Like Syndrome
ENDO180: Collagen binding factor
Fibril (+): C-telopeptide fibrillogenesis nucleation domain
Fibril (-): Fibrillogenesis inhibition site
Fibril (-) 50%, 100%: 50 or 100% inhibition of fibrillogenesis by peptides
G-X: Substitution mutation
GE-decorin: Guanidine-extracted decorin binding sites
Glycation: Glycation sites
HEP: Heparin, heparan sulfate proteoglycan binding site
HSP47: Potential heat shock protein 47 (chaperone) binding sites
ICA: Mutation associated with intracranial aneurysms
[PS] KS PG: Proposed sites of keratan sulfate proteoglycan binding
MFS: Mutation associated with Marfan Syndrome
MMP-1: Matrix Metalloproteinase-1 cleavage site
MMP 1, 2, 13: Matrix metalloproteinases 1, 2, 13
MMP Interaction Domain: Matrix Metalloproteinase Interaction Domain
N-ANCHOR DOMAIN: Proposed N-anchor Domain
NPC: N-Proteinase cleavage site
OI: Mutations associated with Osteogenesis imperfecta types 1, 2, 3, 4; ?/US denotes difficult diagnosis/unusual phenotype
OPA: Mutation associated with Osteopaenia; ? denotes difficult diagnosis/unusual phenotype
OPO: Mutations associated with Osteoporosis; ? denotes difficult diagnosis/unusual phenotype
P-3-H: Site of Prolyl-3-hydroxylation
PHOSPH, PHOSPHOPHORYN: Phosphophoryn binding site
SPARC: Secreted protein acidic and rich in cysteine binding site
THP/α1, 2, 11-β1 integrin: Triple helical peptide; binds integrin receptors
THP/ANG-: Triple helical peptide; inhibits angiogenesis
THP/ECA+: Triple helical peptide; promotes endothelial cell activation
THP/FA+: Triple helical peptide; promotes fibroblast adhesion, substrate for MMPs
THP/HB; ANG-: Triple helical peptide; binds heparin, inhibits angiogenesis
THP/OBD: Triple helical peptide; supports osteoblastic differentiation
µ Unfolding: Proposed Microunfolding domain
VWF: Proposed Site of von Willebrand Factor binding
X-LINK: Intermolecular cross-link site
Orange: Cell interaction domain
Yellow: Keratan sulfate proteoglycans binding region on the fibril
Blue: Glycation region on the fibril
Green: Phosphophoryn binding region on the fibril
Pink: Dermatan/chondroitin sulfate proteoglycans/decorin binding region on the fibril
Black Brackets: Matrix interaction domain
Red Brackets: Delineate clusters of lethal OI mutations on the α2(I) chain
Blue Boxes: Regions containing 3 or more first position glycines on both chains where no human mutations have been reported.

Domain Model of the Collagen Fibril

21190 JOURNAL OF BIOLOGICAL CHEMISTRY VOLUME 283 • NUMBER 30 • JULY 25, 2008

sequence of type I collagen (29–31). Integrins are heterodimeric cell surface receptors that bind extracellular matrix molecules and are proposed to play roles in tissue morphogenesis, matrix assembly, and cell signaling (32, 33). Studies using THPs and molecular modeling approaches indicate that regions of the triple helix with high frequencies of ligand binding sites and functions do not have obvious correlations with local (at the scale of a few triplets) characteristics in the conformation or stabilities of the triple helix (34). On the other hand, analysis of regional variations in the triple helix stability based on collagens with OI mutations identified two large flexible triple helical regions that aligned with zones proposed to be important for fibrillogenesis and ligand binding, which overlap with MLBR2 and perhaps MLBR1 (35). Putative binding sites for many of the major ligands exist on multiple monomers across the two-dimensional fibril map (Fig. 2); however, for the native collagen fibril it is not yet clear which of these sites are available for cell and ligand interactions.

Mutation Distribution—The phenotypic consequences of the most prevalent class of mutations on the map, those associated with OI, are in general proposed to arise from their disruption of collagen monomer folding or stability, not ligand-fibril interactions. OI is generally divided into four clinical types: type I (mild), II (lethal), III (severe), and IV (moderately severe) (5). The gradient model suggests that, because the helix folds in the C- to N-terminal direction, mutations near the C terminus affect collagen modification and assembly more significantly, overall resulting in a more severe phenotype (36). However, several exceptions to this pattern were observed (5); most relevant to the present study was the discovery that high concentrations of lethal OI mutations co-localize with cell and ligand-binding sites. For example, on the α2(I) chain clusters of lethal OI mutations tend to coincide with putative zones of PG-fibril interactions (Fig. 2, red brackets) (5). New observations from the collagen map extend these findings. Thus, the N-terminal site for intermolecular cross-linking is associated with several lethal OI mutations but is bracketed by mild mutations, and sites for α1β1/α2β1 ligation at GFPGER502–507 and MMP-1 cleavage co-localize with mutation “silent zones” that may reflect embryonic lethality (see below) (Fig. 2). Finally, non-OI mutations also fail to follow an N- to C-terminal gradient of severity, yet cluster to several zones of the fibril D-period (supplemental Fig. S1).

Domain Model of the Collagen Fibril—Gross examination of the collagen map brings home a point previously reported, that much of the fibril is covered by PGs (2, 8), structural macromolecules considered to be full-time binding partners of collagen fibrils in many tissues (Fig. 2, yellow, pink, and purple overlays). PGs are composed of core proteins to which are covalently linked one or more high molecular weight, anionic glycosaminoglycan chains. PGs bound to the fibril may thus be expected to strongly influence fibril ligation of other factors competing for overlapping or neighboring binding sites. On the other hand, GFPGER502–507, potentially the most critical cell interaction sequence of the fibril (see below), along with the matrix metalloproteinase (MMP) 1, 2, and 13 cleavage sequence and several other key ligand binding sites, localizes to the “b1/b2” bands region where PGs are absent (Fig. 2, wide vertical non-shaded region). These initial observations suggested the collagen fibril to be organized into two domains, one mediating dynamic cell-collagen interactions, and the other carrying out structural duties. Analysis of the map and microfibril structure model provided further support for this hypothesis, as detailed below.

Cell Interaction Domain—Cell-associated molecules proposed to bind type I collagen include the integrins, PGs, discoidin domain receptors, and molecules bridging cell surfaces and the extracellular matrix, such as fibronectin and SPARC (secreted protein, acidic, and rich in cysteine) (Fig. 2). However, evidence suggests integrin binding to be the most crucial determinant in cell-collagen interactions. Integrins that bind type I collagen include the α1β1, α2β1, α11β1, and αVβ3 receptors. Because the αVβ3 integrin is thought to only interact with degraded collagen, it will not be further considered. Candidate α1β1/α2β1/α11β1 integrin binding sequences on type I collagen have been mapped to residues 127–132, 502–507, and 811–816 (Fig. 2).

GFPGER502–507: Ground Zero of Type I Collagen—Analysis

of published data in the context of the map suggests a critical role for the central integrin binding site, GFPGER502–507 (Figs. 2 (orange box) and 3A), according to four lines of evidence. First, this sequence binds α1β1/α2β1/α11β1 integrins and supports angiogenesis, endothelial activation, osteoblast differentiation, and cell adhesion. Second, it resides in the center of the largest “PG-clear zone” (Figs. 2 and 3A, wide vertical non-shaded region), ensuring its availability, and consistent with its dedicated function. Third, GFPGER502–507 lacks reported mutations, and also coincides with zones containing contiguous stretches of three or more first position glycines that are silent for mutations on the α1 and α2 chains (Figs. 2 and 3A, blue boxes), and in the fibril on M1 and M5 (Fig. 3B, zones 2 and 7). Such zones may be so critical to collagen function that mutations occurring there are embryonically lethal (37). These data imply that, during embryogenesis, cell-collagen interactions using GFPGER502–507 are crucial for fibril assembly, remodeling, or morphogenic processes such as angiogenesis. Moreover, considering the mutation density on the map, statistical analyses suggest that, although there is a high probability that the seven mutation silent zones may exist by random chance, it is very improbable they would align within such a narrow region of the D-period (see “Statistics” below), implying its special functional significance. Fourth, GFPGER502–507 is also a near neighbor of MLBR2, the highest concentration of ligand-binding sites of the fibril, including the site for cleavage by MMP-1, -2, and -13 required for collagen remodeling (Fig. 4A) (10). This constellation of sites also falls largely within the PG-clear zone. Analysis of the x-ray diffraction structure of the microfibril (16) confirmed these key sequences to be near neighbors, only ~3–11 nm apart on the 67 nm wide D-period (Fig. 4B). Thus, on the collagen microfibril, elements of M3 and M4 within the b1/b2 bands region, including GFPGER502–507, the MMP cleavage site, and MLBR2 are proposed to comprise a cell interaction domain where cell-collagen interactions and fibril remodeling are controlled by cell surfaces (Figs. 2 (orange box) and 4).

Two assumptions regarding the importance of GFPGER502–507 to fibril function deserve further comment. First, GLPGER127–132 and GASGER811–816 in type I procollagen and in THP form also bind α1β1/α2β1 integrins (38). At the level of the fibril, these sequences may overlap with PG binding zones, and at the monomer and fibril levels do not coincide with large mutation silent zones on both the α1 and α2 chains, distinguishing them from GFPGER502–507. Yet, the relative functions of GFPGER502–507, GLPGER127–132, and GASGER811–816 in cell-collagen interactions must be resolved through further experimental investigation. A key consideration is whether these sites prove to be available for cell interactions in the native fibril. Second, a murine knockout of the α2β1 integrin receptor yielded mice with a largely normal phenotype (39), which seems at odds with the proposal made here that GFPGER502–507 is crucial for cell-fibril interactions and embryogenesis. However, in the integrin null mouse, compensation for α2β1 function could potentially arise from the expression of other collagen-interactive receptors, including the α1β1 or α11β1 integrins.

Statistics—Statistics was used to determine whether the

localization of the seven mutation silent zones to a relatively narrow fibril region (Figs. 2, 3A, and 3B) is due to random chance. Thus, several properties related to the spatial distribution of first-position glycine (G1 site) mutations in collagen were investigated. Computing the p values of interest required the derivation of some novel combinatorial quantities and algorithms. A complete presentation of these analyses appears in the supplemental materials; the following is a summary of the findings. G1 site mutations do not occur in a spatially uniform way along the collagen monomer (p values 3.6 × 10^-27 and 4.3 × 10^-17) under two reasonable models of spatial homogeneity. The adjacency patterns of mutation-free G1 sites, however, are quite consistent with the hypothesis that they have no tendency to cluster together or remain apart (p values all >0.5). On the other hand, several long sequences of adjacent mutation-free G1 sites are arranged within a narrow vertical region of the D-period. The probability of their falling in this region, under a uniform random placement model, is 4.6 × 10^-4. The probability of their falling in any such region throughout the entire D-period is no more than 0.0025. Thus, the apparent clustering of mutation-free zones on the fibril likely did not arise randomly.

Collagen Glycation and Ligand-Collagen Interactions—Glycation is the non-enzymatic addition of sugar to protein that occurs during aging and in an accelerated fashion in diabetes (40, 41). Thus, glucose reacts with the ε-amino group of a free lysine residue to generate a Schiff-base intermediate, followed by its rearrangement to the more stable Amadori product. Through subsequent reactions and modifications, fructosyl-lysines may then form cross-links with other proteins, forming advanced glycation end products. Collagen thus modified may become less flexible and exhibit altered cell- and ligand-collagen interactions and fibrillar structure (41–45). Glycation occurs on numerous collagen residues but preferentially on hydroxylysines α1(I) 434, and α2(I) 453, 479, and 924 (Fig. 2), and the “c1” band fibril region (Figs. 2 and 3A, blue stripe). Because glycated collagen is a poor substrate for cell interactions, it is notable that lysine 479 and GFPGER502–507 are near neighbors on the two-dimensional fibril map (Fig. 2). To examine this relationship, a three-dimensional model was built of the triple helix spanning residues 475–510, with GFPGER502–507 ligated to the α2β1 integrin-I-domain (Fig. 3C). The lysine 479 fructosyl adduct was found to project outwards from the triple helix, poised to affect collagen-ligand interactions. Further, whereas glycation would likely not directly interfere with integrin-I-domain binding, interactions of the integrin heterodimer may potentially be affected. Examining cell and integrin interactions with native and glycated THPs, including sequences spanning the integrin binding site and lysine 479, could test such hypotheses. Lastly, it was reported that the fibril binding zone for keratan sulfate PGs, including the “c” bands, overlaps with all of the predominant glycation sites, in contrast

FIGURE 3. Integrin binding site of type I collagen: function and fibrillar environment. A, the α1β1/α2β1 integrin binding site. GFPGER502–507 functions in angiogenesis, endothelial cell activation, and osteoblast differentiation, occupies a PG-clear zone, and coincides with sequences silent for mutations on the α1 and α2 chains of M3 and on other monomers in the fibril (blue boxes, Figs. 2 and 3B). GFPGER502–507 neighbors the major region of non-enzymatic glycation of the fibril (blue stripe) and the glycation substrate, lysine 479 (Fig. 3C). B, GFPGER502–507 and mutation silent zones localize to a narrow region of the fibril surface. GFPGER502–507 (maroon); mutation silent zones numbered sequentially from N to C termini (blue). C, glycation may influence integrin-collagen interactions. Model of α2 integrin subunit I-domain (purple) binding to GFPGER502–507 adjacent to α2(I) fructosyl-lysine 479 (sugar). Note that glycation may potentially affect collagen ligation by the integrin heterodimer (gray arrow), but likely not the I-domain. On collagen: α1 GFPGER502–507 (yellow); mutation silent zone (blue); glycine residues associated with lethal OI mutations (red). Gray arrow, 10 nm is the maximal estimated reach of the integrin heterodimer.
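The alignment of silent zones shown in panel B can be explored numerically. What follows is a minimal sketch and not the combinatorial analysis described in the supplemental materials. It assumes that each silent zone is a single point placed uniformly and independently on a periodic D-period of unit length, and the zone count and window width in the example are hypothetical illustration values. Under those assumptions it computes the chance that all zones land in one pre-specified window, and estimates by simulation the chance that they fit inside a window of that width anywhere on the period.

```python
import random


def prob_all_in_fixed_window(n_zones: int, window_frac: float) -> float:
    """Chance that n independent uniform points all land in one pre-specified
    window covering `window_frac` of the D-period (zones assumed point-like)."""
    return window_frac ** n_zones


def prob_all_in_any_window(n_zones: int, window_frac: float,
                           trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the chance that n uniform points on a periodic
    D-period fit inside some window of width `window_frac`, wherever it lies."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = sorted(rng.random() for _ in range(n_zones))
        # Gaps between neighbouring points, including the wrap-around gap.
        gaps = [b - a for a, b in zip(pts, pts[1:])]
        gaps.append(1.0 - pts[-1] + pts[0])
        # All points fit in an arc of width w exactly when the largest
        # empty gap is at least 1 - w.
        if max(gaps) >= 1.0 - window_frac:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, w = 7, 0.2  # hypothetical: seven zones, window spanning 20% of D
    print(prob_all_in_fixed_window(n, w))
    print(prob_all_in_any_window(n, w))
```

For window widths up to half the period, the simulated value should approach the closed form n·w^(n−1), which gives a quick check on the simulation. Real silent zones have finite length and sit on a discrete lattice of glycine positions, so the numbers from this sketch are not expected to match the reported p values.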


to that of dermatan sulfate PGs that localize to the “d” and “e” bands (9). Consistent with this observation, it was reported that keratan sulfate PGs, but not dermatan sulfate PGs, exhibit reduced affinity for glycated collagen (46). Together these observations justify further investigations into the consequences of glycation to cell- and ligand-collagen interactions.

Matrix Interaction Domain—The map suggests that, aside from the cell interaction domain, the remainder of the D-period comprises a substrate for the binding of structural molecules (Figs. 2 and 4), which is supported by various lines of evidence. Thus, dermatan sulfate PGs occupy the e and d bands regions, keratan sulfate PGs fall within the a and c bands regions, and heparin binds the a bands region. Moreover, intermolecular cross-links in type I collagen exist between residues 87 and 930, and between type I collagen at residues 1023–1036 and type V collagen in the a and c band regions. Notably, the N- and C-terminal cross-links closely bracket the cell interaction domain, potentially stabilizing its complement of key sites, or otherwise influencing its function or availability. Next, the gap or “hole zone” and its associated proteins are proposed to contain the nucleation sites upon which hydroxyapatite crystal growth during bone mineralization occurs. Last, several domains important for collagen fibrillogenesis map to the a bands region. Thus, the overlap zone and lateral borders of the gap zone may comprise a “matrix interaction domain” where organic and inorganic matrix molecules bind to connect the fibril with other extracellular matrix elements, and to impart distinct structural properties to collagen fibrils and the tissue stroma in which they reside (Figs. 2 (black brackets) and 4).

Non-overlapping Fibril Domains—Integrins and MMPs (~5 nm wide) may be envisioned to interact with their binding sites in the cell interaction domain, which is ~20 nm wide, and thus mainly influence ligands binding adjacent sequences. However, the impact of fibril-bound PGs on cell-collagen interactions may be more considerable. For example, the decorin core protein of ~40 kDa may contain a glycosaminoglycan chain averaging 40 kDa (47). The core protein is proposed to bind collagen monomer(s) in the d/e bands region. However, if its glycosaminoglycan chain is unrestrained, it could reach 80 nm or more, potentially disrupting cell- and ligand-collagen interactions at distant sites. Instead, it is proposed that the anionic glycosaminoglycans are mostly constrained within the matrix interaction domain via their binding to the transverse electropositive bands on the fibril surface, adjacent to where the PG core protein binds (Figs. 1, 2, and 5). Thus, the fibril is viewed to consist of non-overlapping cell and matrix interaction domains (Figs. 4 and 5). Nonetheless, some cross-talk between domains may be likely, e.g. if glycation affects integrin-GFPGER502–507 ligation, or matricellular proteins like SPARC with broad fibril-binding footprints influence the function of both domains simultaneously (Figs. 2 and 4).

That numerous functional sites of the type I collagen molecule fall into either of two fibril domains according to their function and that a preferred glycation substrate, lysine 479, coincides with the major fibril glycation zone support the accuracy of the overlap model of collagen fibril structure (48) upon which the map is based. Lastly, FACIT (fibril-associated collagens with interrupted triple helices) collagens (49) are proposed to interact with type I or II fibrils to influence inter-monomer

FIGURE 4. Domain model of the collagen fibril. A, schematic of domains on the collagen fibril D-period. The putative cell interaction domain, including the α1β1/α2β1 integrin binding site GFPGER502–507 (maroon horseshoe); MMP cleavage site (green arrow); and elements of MLBR2 (geometric figures); occupy the fibril overlap zone. The putative matrix interaction domain, including sites for intermolecular cross-linking (X); PG-binding (yellow and pink); and mineralization occupy the remainder of the fibril. B, physical mapping of the cell and matrix interaction domains on the collagen microfibril. Selected elements of the putative cell and matrix interaction domains were mapped, including GFPGER502–507 (maroon); MMP cleavage site (green arrowhead), MMP interaction domain (cyan); lateral edges of the fibronectin binding site (blue); keratan sulfate PG binding sites (yellow); dermatan sulfate PG binding sites (pink); N- and C-terminal intermolecular cross-links (black and blue arrows), respectively. The SPARC binding site (black) was rendered transparent where it overlaps the fibronectin and MMP binding/cleavage sites. The three-dimensional image was derived from an x-ray diffraction structure of the collagen microfibril in situ. The microfibrillar aspect that faces toward the outside of the fibril is at the top of the picture (the C terminus points outside of the fibril, while the N terminus faces inside the fibril). Distances between GFPGER502–507 and domains for MMP interaction, or the N-terminal end of the fibronectin-binding domain were 11.2 and 5.0 nm, respectively. A view of microfibril function in the context of the collagen fibril appears in Fig. 5.
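The D-period coordinates used in panel A can be made concrete with a short calculation. This is a minimal sketch under stated assumptions: a D-period of 234 residues (an assumed textbook value that is not given in the text) spanning the 67 nm quoted above, helix residue numbering that starts at the beginning of the first D-segment, and a uniform axial rise per residue. The function names are illustrative only. The sketch shows how a residue number maps to a monomer segment (M1 to M5) and to a position within the D-period, and how far apart two sites sit along the fibril axis once the stagger is folded away.

```python
D_RESIDUES = 234   # assumed residues per D-period (not stated in the text)
D_NM = 67.0        # D-period length in nm, as quoted in the text
RISE_NM = D_NM / D_RESIDUES  # assumed uniform axial rise per residue


def segment(residue: int) -> int:
    """Monomer segment (1 = M1 ... 5 = M5) containing a helix residue."""
    return residue // D_RESIDUES + 1


def offset_in_period(residue: int) -> int:
    """Residue position folded into a single D-period."""
    return residue % D_RESIDUES


def axial_separation_nm(res_a: int, res_b: int) -> float:
    """Shortest axial distance between two sites within the periodic D-period."""
    diff = abs(offset_in_period(res_a) - offset_in_period(res_b))
    return min(diff, D_RESIDUES - diff) * RISE_NM


if __name__ == "__main__":
    # Start residues of the three candidate integrin binding sequences.
    for start in (127, 502, 811):
        print(start, f"M{segment(start)}", offset_in_period(start))
    # Lysine 479 versus the start of GFPGER502-507.
    print(round(axial_separation_nm(479, 502), 1))
```

These are one-dimensional axial estimates only. The distances reported in the caption were measured on the three-dimensional structure and need not agree with them.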


associations within the fibril or inter-fibril dynamics (50). Whether FACIT collagens modulate the function of the cell and matrix interaction domains proposed here must await further understanding of the physical nature of FACIT-fibril interactions.

Domains in Other Fibrillar Collagens—The best characterized elements of the cell and matrix interaction domains of the human fibril as proposed here include sites for integrin binding, GFPGER502–507; MMP-1 cleavage, GPQGIA; and N- and C-terminal intermolecular cross-linking, GMKGHR and GIKGHR, respectively (Fig. 2). These elements were searched for in other fibrillar collagens (data not shown). The sequences, and their positions in the triple helices relative to the cross-link sites, are identical or highly homologous in type I collagens of other vertebrates and in the human type II/XI collagen heterotypic fibril predominant in cartilage. Thus, it may be speculated that vertebrate fibrillar collagens share the same domain structure.

Domains in Heterotypic Fibrillar Collagens—The type I collagen heterotypic fibril often contains type III and V collagens, in which some of the domain elements are altered in ways predicted to be functionally significant (51). It thus is proposed that the physical and cell-interactive properties of heterotypic fibrils may be modulated according to their contents of minor collagens carrying variants of one or more key sequences. For example, type V collagen, which lacks both an active integrin binding site at the appropriate position and an MMP-1 cleavage site, when incorporated in the type I fibril, may hinder cell interactions and fibril remodeling in tissues where low rates of collagen metabolism are desirable, such as the cornea.

Domains and the Polarity of Fibril Assembly—That type I and type II collagens and their fibrillar collagen-binding partners may share the same domain structure implies they assemble in parallel into the heterotypic fibril, where key functional domains fall into register. That atypical mutations in type I and III collagens appear at disparate regions of their triple helical domains, yet cluster to common zones of the fibril supports this contention (supplemental Fig. S1). However, although collagen monomer assembly into fibrils in vivo occurs most commonly in a head-to-tail orientation, head-to-head, or tail-to-tail assembly is also seen and plays a role in tissue morphogenesis and repair (52). It is thus notable that GFPGER502–507, proposed here to be the most critical sequence of type I collagen, is located midway between the N- and C-terminal cross-links. Therefore, if monomers assemble in a unipolar or bipolar direction into fibrils, GFPGER502–507 remains accessible for cell interactions.

Regulation of Cell-Fibril Interactions—The domain structure of the collagen microfibril was next considered in the context of the multivalent collagen fibril (Fig. 1). Although the fine structure of the collagen fibril surface remains incompletely understood, our following speculations assume that, at least in some cases, ligand-binding sites may align between microfibrils, as has been observed for some collagen-binding ligands (8, 53), although there are exceptions (54). Thus, in the fibril it is proposed that cell interaction domains appear every 5–6 nm across

FIGURE 5. Domain regulation of collagen fibril function. The collagen fibril is proposed to be composed of cell interaction domains (unshaded) and matrix interaction domains (yellow and pink shading). In the cell interaction domain integrin binding sites promote cell adhesion and integrin receptor ligation, clustering, and activation. The confluence of crucial cell and ligand-binding sites on the fibril surface implies their coordinate regulation. Thus, fibril-bound integrins may modulate other ligand-collagen interactions, and MMP cleavage of one monomer may enhance or disrupt ligand-fibril interactions elsewhere. In the matrix interaction domain, PGs and other matrix molecules bind to impart key structural properties to the fibril. Note: the fibril shown in this schematic includes ten microfibrils and spans two and a half, 67 nm wide collagen D-periods; however, the average collagen fibril in vivo contains many more microfibrils than depicted here and may reach hundreds of microns in length (52).


the fibril's width, and at 67 nm intervals along its length (Fig. 5). Assuming integrin heterodimers to be ~10 nm in diameter (55, 56) and collagen microfibril diameters ~5–6 nm (10), GFPGER502–507-bound integrins are proposed to cluster in stripes perpendicular to the long axis of the fibril, with one integrin bound per one or two microfibrils. According to this model, the fibril is an ideal substrate for integrin clustering, considered a key component of receptor activation and signaling (51, 56). Other collagen-binding ligands of ~5–10 nm in diameter may potentially interact with one or more cell interaction domains, each occupying areas of ~5 × 20 nm on the fibril. It follows that the cell interaction domain may accommodate one, or at most several ligands coincidently, implying that cell- and ligand-binding interactions may be coordinately regulated. Therefore, it is speculated that integrins may positively or negatively modulate the interactions of other ligands with the fibril, or vice versa. Furthermore, MMP cleavage of one monomer may destabilize the fibril, displacing ligands, including integrins from neighboring monomers, or, conversely, could facilitate cell-fibril interactions (11) (not shown).

In summary, analysis of the distribution of functional sites and mutations on a two-dimensional model of the type I collagen fibril, and on an x-ray diffraction structure of the microfibril in situ, has identified candidate cell and matrix interaction domains. Moreover, such fibril domains may exist in heterotypic type I fibrils and other vertebrate fibrillar collagens. Defining the fine structure of the fibril domains and their functional inter-relationships may open new avenues for the therapeutic modulation of collagen function and metabolism in diseases, including fibrosis, atherosclerosis, and OI, where the protein plays a prominent role and for the engineering of fibrillar collagens and synthetic polymers for numerous applications in human medicine.
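The spacing argument in the discussion of cell-fibril interactions can be checked with a few lines of arithmetic. The following Python sketch is only illustrative: it uses the approximate dimensions quoted in the text (67 nm D-period, microfibril diameter of about 5–6 nm, integrin diameter of about 10 nm) and the ten-microfibril, two-and-a-half D-period schematic of Fig. 5. The function names, the rigid edge-to-edge packing of ligands, and the placement of the first stripe at 0 nm are assumptions made for this example and are not part of the published analysis.

```python
import math

# Approximate values quoted in the text.
D_PERIOD_NM = 67.0                    # axial repeat of the cell interaction domain
MICROFIBRIL_DIAMETER_NM = (5.0, 6.0)  # lower and upper estimates
INTEGRIN_DIAMETER_NM = 10.0


def microfibrils_spanned(ligand_nm: float, microfibril_nm: float) -> float:
    """Lateral footprint of one bound ligand, expressed in microfibril widths."""
    return ligand_nm / microfibril_nm


def max_ligands_per_stripe(n_microfibrils: int, ligand_nm: float,
                           microfibril_nm: float) -> int:
    """Upper bound on ligands sitting side by side in one stripe.

    Assumption (hypothetical, not from the paper): ligands behave as rigid
    discs packed edge to edge across the fibril width, with no gaps.
    """
    stripe_width_nm = n_microfibrils * microfibril_nm
    return math.floor(stripe_width_nm / ligand_nm)


def stripe_positions_nm(fibril_length_nm: float,
                        d_period_nm: float = D_PERIOD_NM) -> list[float]:
    """Axial positions of stripes, assuming the first stripe sits at 0 nm."""
    n_full_periods = math.floor(fibril_length_nm / d_period_nm)
    return [i * d_period_nm for i in range(n_full_periods + 1)]


if __name__ == "__main__":
    for d in MICROFIBRIL_DIAMETER_NM:
        span = microfibrils_spanned(INTEGRIN_DIAMETER_NM, d)
        packed = max_ligands_per_stripe(10, INTEGRIN_DIAMETER_NM, d)
        print(f"microfibril {d} nm: integrin spans {span:.2f} microfibrils, "
              f"at most {packed} integrins across a ten-microfibril stripe")
    # Schematic of Fig. 5: two and a half D-periods.
    print(stripe_positions_nm(2.5 * D_PERIOD_NM))
```

Under these assumptions a bound integrin covers roughly 1.7 to 2 microfibril widths, in line with the estimate of one integrin per one or two microfibrils, and stripes recur once per D-period along the fibril axis.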

Acknowledgments—We thank Laila Huq for advice on phosphophoryn; Francisca Malfait for input regarding atypical mutations; Gregg Fields for discussion on MMPs; Richard Farndale for comments on von Willebrand Factor and discoidin domain receptor 2-collagen interactions; Sergey Leikin, Barbara Brodsky, Anton Persikov, Dale Bodian, Warren Ewens, Olena Jacenko, Donald Gullberg, and Morris Karnovsky for helpful comments; and Andrew Likens and Nanita Barchi for artwork. Please note that an interactive data base containing a library of collagen ligand-binding sites and mutations integrated with physicochemical properties of the collagen proteins is under construction (see Footnote 3).

REFERENCES

1. Piez, K. A., and Reddi, A. H. (1984) Extracellular Matrix Biochemistry, Elsevier, New York
2. Ayad, S., Boot-Handford, R. P., Humphries, M. J., Kadler, K. E., and Shuttleworth, C. A. (1998) The Extracellular Matrix Facts Book, 2nd Ed., Academic Press, San Diego
3. Prockop, D. J., and Kivirikko, K. I. (1995) Annu. Rev. Biochem. 64, 403–434
4. Kadler, K. E., Baldock, C., Bella, J., and Boot-Handford, R. P. (2007) J. Cell Sci. 120, 1955–1958
5. Marini, J. C., Forlino, A., Cabral, W. A., Barnes, A. M., San Antonio, J. D., Milgrom, S., Hyland, J. C., Korkko, J., Prockop, D. J., De Paepe, A., Coucke, P., Symoens, S., Glorieux, F. H., Roughley, P. J., Lund, A. M., Kuurila-Svahn, K., Hartikka, H., Cohn, D. H., Krakow, D., Mottes, M., Schwarze, U., Chen, D., Yang, K., Kuslich, C., Troendle, J., Dalgleish, R., and Byers, P. H. (2007) Hum. Mutat. 28, 209–221
6. Yang, C., Hillas, P. J., Baez, J. A., Nokelainen, M., Balan, J., Tang, J., Spiro, R., and Polarek, J. W. (2004) BioDrugs 18, 103–119
7. Prockop, D. J., and Kivirikko, K. I. (1984) N. Engl. J. Med. 311, 376–386
8. Scott, J. E. (1988) Biochem. J. 252, 313–323
9. Di Lullo, G. A., Sweeney, S. M., Korkko, J., Ala-Kokko, L., and San Antonio, J. D. (2002) J. Biol. Chem. 277, 4223–4231
10. Orgel, J. P., Irving, T. C., Miller, A., and Wess, T. J. (2006) Proc. Natl. Acad. Sci. U. S. A. 103, 9001–9005
11. Perumal, S., Antipova, O., and Orgel, J. P. (2008) Proc. Natl. Acad. Sci. U. S. A. 105, 2824–2829
12. Lauer-Fields, J. L., Tuzinski, K. A., Shimokawa, K., Nagase, H., and Fields, G. B. (2000) J. Biol. Chem. 275, 13282–13290
13. Sweeney, S. M., DiLullo, G., Slater, S. J., Martinez, J., Iozzo, R. V., Lauer-Fields, J. L., Fields, G. B., and San Antonio, J. D. (2003) J. Biol. Chem. 278, 30516–30524
14. Farndale, R. W., Lisman, T., Bihan, D., Hamaia, S., Smerling, C. S., Pugh, N., Konitsiotis, A., Leitinger, B., de Groot, P. G., Jarvis, G. E., and Raynal, N. (2008) Biochem. Soc. Trans. 36, 241–250
15. Linsenmayer, T. F. (1991) in Cell Biology of Extracellular Matrix (Hay, E. D., ed) pp. 7–44, Plenum Press, New York
16. Orgel, J. P., Wess, T. J., and Miller, A. (2000) Structure 8, 137–142
17. Christopher, J. A., Swanson, R., and Baldwin, T. O. (1996) Comput. Chem. 20, 339–345
18. Chen, J. M., Sheldon, A., and Pincus, M. R. (1995) J. Biomol. Struct. Dyn. 12, 1129–1159
19. Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C., Alagona, G., Profetta, S., and Weiner, P. (1984) J. Am. Chem. Soc. 106, 765–784
20. Lee, K. S., Song, H. R., Cho, T. J., Kim, H. J., Lee, T. M., Jin, H. S., Park, H. Y., Kang, S., Jung, S. C., and Koo, S. K. (2006) Hum. Mutat. 27, 599
21. Ward, L. M., Lalic, L., Roughley, P. J., and Glorieux, F. H. (2001) Hum. Mutat. 17, 434
22. Pallos, D., Hart, P. S., Cortelli, J. R., Vian, S., Wright, J. T., Korkko, J., Brunoni, D., and Hart, T. C. (2001) Arch. Oral Biol. 46, 459–470
23. Malfait, F., Symoens, S., De Backer, J., Hermanns-Le, T., Sakalihasan, N., Lapiere, C. M., Coucke, P., and De Paepe, A. (2007) Hum. Mutat. 28, 387–395
24. Gensure, R. C., Makitie, O., Barclay, C., Chan, C., Depalma, S. R., Bastepe, M., Abuzahra, H., Couper, R., Mundlos, S., Sillence, D., Ala-Kokko, L., Seidman, J. G., Cole, W. G., and Juppner, H. (2005) J. Clin. Invest. 115, 1250–1257
25. Pollitt, R., McMahon, R., Nunn, J., Bamford, R., Afifi, A., Bishop, N., and Dalton, A. (2006) Hum. Mutat. 27, 716
26. Yoneyama, T., Kasuya, H., Onda, H., Akagawa, H., Hashiguchi, K., Nakajima, T., Hori, T., and Inoue, I. (2004) Stroke 35, 443–448
27. Cabral, W. A., Makareeva, E., Letocha, A. D., Scribanu, N., Fertala, A., Steplewski, A., Keene, D. R., Persikov, A. V., Leikin, S., and Marini, J. C. (2007) Hum. Mutat. 28, 396–405
28. Korkko, J., Ala-Kokko, L., De Paepe, A., Nuytinck, L., Earley, J., and Prockop, D. J. (1998) Am. J. Hum. Genet. 62, 98–110
29. Knight, C. G., Morton, L. F., Onley, D. J., Peachey, A. R., Messent, A. J., Smethurst, P. A., Tuckwell, D. S., Farndale, R. W., and Barnes, M. J. (1998) J. Biol. Chem. 273, 33287–33294
30. Knight, C. G., Morton, L. F., Peachey, A. R., Tuckwell, D. S., Farndale, R. W., and Barnes, M. J. (2000) J. Biol. Chem. 275, 35–40
31. Zhang, W. M., Kapyla, J., Puranen, J. S., Knight, C. G., Tiger, C. F., Pentikainen, O. T., Johnson, M. S., Farndale, R. W., Heino, J., and Gullberg, D. (2003) J. Biol. Chem. 278, 7270–7277
32. Giancotti, F. G., and Ruoslahti, E. (1999) Science 285, 1028–1032
33. Hay, E. D. (ed) (1991) Cell Biology of Extracellular Matrix, Plenum Press, New York and London
34. Persikov, A. V., Ramshaw, J. A., and Brodsky, B. (2005) J. Biol. Chem. 280, 19343–19349
35. Makareeva, E., Mertz, E. L., Kuznetsova, N. V., Sutter, M. B., DeRidder, A. M., Cabral, W. A., Barnes, A. M., McBride, D. J., Marini, J. C., and Leikin, S. (2008) J. Biol. Chem. 283, 4787–4798

Footnote 3: D. L. Bodian and T. E. Klein, manuscript in preparation.


36. Byers, P. H., Wallis, G. A., and Willing, M. C. (1991) J. Med. Genet. 28, 433–442
37. Scott, J. E., and Tenni, R. (1997) Cell Biochem. Funct. 15, 283–286
38. Xu, Y., Gurusiddappa, S., Rich, R. L., Owens, R. T., Keene, D. R., Mayne, R., Hook, A., and Hook, M. (2000) J. Biol. Chem. 275, 38981–38989
39. Holtkotter, O., Nieswandt, B., Smyth, N., Muller, W., Hafner, M., Schulte, V., Krieg, T., and Eckes, B. (2002) J. Biol. Chem. 277, 10789–10794
40. Tsilibary, E. C. (2003) J. Pathol. 200, 537–546
41. Paul, R. G., and Bailey, A. J. (1996) Int. J. Biochem. Cell Biol. 28, 1297–1310
42. Brennan, M. (1989) J. Biol. Chem. 264, 20947–20952
43. McCarthy, A. D., Etcheverry, S. B., Bruzzone, L., Lettieri, G., Barrio, D. A., and Cortizo, A. M. (2001) BMC Cell Biol. 2, 16
44. Paul, R. G., and Bailey, A. J. (1999) Int. J. Biochem. Cell Biol. 31, 653–660
45. Chen, J., Brodsky, S., Li, H., Hampel, D. J., Miyata, T., Weinstein, T., Gafter, U., Norman, J. T., Fine, L. G., and Goligorsky, M. S. (2001) Am. J. Physiol. 281, F71–F80
46. Reigle, K. L., Di Lullo, G., Turner, K. R., Last, J. A., Chervoneva, I., Birk, D. E., Funderburgh, J. L., Elrod, E., Germann, M. W., Surber, C., Sanderson, R. D., and San Antonio, J. D. (2008) J. Cell. Biochem., in press
47. Vogel, K. G., and Heinegard, D. (1985) J. Biol. Chem. 260, 9298–9306
48. Chapman, J. A. (1974) Connect. Tiss. Res. 2, 137–150
49. Shaw, L. M., and Olsen, B. R. (1991) Trends Biochem. Sci. 16, 191–194
50. Eyre, D. R., Pietka, T., Weis, M. A., and Wu, J. J. (2004) J. Biol. Chem. 279, 2568–2574
51. Siljander, P. R., Hamaia, S., Peachey, A. R., Slatter, D. A., Smethurst, P. A., Ouwehand, W. H., Knight, C. G., and Farndale, R. W. (2004) J. Biol. Chem. 279, 47763–47772
52. Kadler, K. E., Holmes, D. F., Trotter, J. A., and Chapman, J. A. (1996) Biochem. J. 316, 1–11
53. San Antonio, J. D., Lander, A. D., Karnovsky, M. J., and Slayter, H. S. (1994) J. Cell Biol. 125, 1179–1188
54. Holmes, D. F., Gilpin, C. J., Baldock, C., Ziese, U., Koster, A. J., and Kadler, K. E. (2001) Proc. Natl. Acad. Sci. U. S. A. 98, 7307–7312
55. Nermut, M. V., Green, N. M., Eason, P., Yamada, S. S., and Yamada, K. M. (1988) EMBO J. 7, 4093–4099
56. Emsley, J., Knight, C. G., Farndale, R. W., Barnes, M. J., and Liddington, R. C. (2000) Cell 101, 47–56
57. Thomas, E. K., Nakamura, M., Wienke, D., Isacke, C. M., Pozzi, A., and Liang, P. (2005) J. Biol. Chem. 280, 22596–22605
58. Koide, T., Takahara, Y., Asada, S., and Nagata, K. (2002) J. Biol. Chem. 277, 6178–6182
59. Cabral, W. A., Makareeva, E., Colige, A., Letocha, A. D., Ty, J. M., Yeowell, H. N., Pals, G., Leikin, S., and Marini, J. C. (2005) J. Biol. Chem. 280, 19259–19269
60. Makareeva, E., Cabral, W. A., Marini, J. C., and Leikin, S. (2006) J. Biol. Chem. 281, 6463–6470
61. Morello, R., Bertin, T. K., Chen, Y., Hicks, J., Tonachini, L., Monticone, M., Castagnola, P., Rauch, F., Glorieux, F. H., Vranka, J., Bachinger, H. P., Pace, J. M., Schwarze, U., Byers, P. H., Weis, M., Fernandes, R. J., Eyre, D. R., Yao, Z., Boyce, B. F., and Lee, B. (2006) Cell 127, 291–304
62. Wang, H., Fertala, A., Ratner, B. D., Sage, E. H., and Jiang, S. (2005) Anal. Chem. 77, 6765–6771
63. Fujisawa, R., Zhou, H., and Kuboki, Y. (1994) Connect. Tiss. Res. 31, 1–10
64. Huq, N. L., Loganathan, A., Cross, K. J., Chen, Y. Y., Johnson, N. I., Willetts, M., Veith, P. D., and Reynolds, E. C. (2005) Arch. Oral Biol. 50, 807–819
65. Lisman, T., Raynal, N., Groeneveld, D., Maddox, B., Peachey, A. R., Huizinga, E. G., de Groot, P. G., and Farndale, R. W. (2006) Blood 108, 3753–3756
66. Konitsiotis, A. D., Raynal, N., Bihan, D., Hohenester, E., Farndale, R. W., and Leitinger, B. (2008) J. Biol. Chem. 283, 6861–6868
67. Baronas-Lowell, D., Lauer-Fields, J. L., and Fields, G. B. (2004) J. Biol. Chem. 279, 952–962
68. Reyes, C. D., and Garcia, A. J. (2004) J. Biomed. Mater. Res. A 69, 591–600


Candidate Cell and Matrix Interaction Domains on the Collagen Fibril, the Predominant Protein of Vertebrates

Shawn M. Sweeney, Joseph P. Orgel, Andrzej Fertala, Jon D. McAuliffe, Kevin R. Turner, Gloria A. Di Lullo, Steven Chen, Olga Antipova, Shiamalee Perumal, Leena Ala-Kokko, Antonella Forlino, Wayne A. Cabral, Aileen M. Barnes, Joan C. Marini and James D. San Antonio

J. Biol. Chem. 2008, 283:21187-21197.

doi: 10.1074/jbc.M709319200 originally published online May 15, 2008

Access the most updated version of this article at doi: 10.1074/jbc.M709319200


Supplemental material:

  http://www.jbc.org/content/suppl/2008/05/27/M709319200.DC1

This article cites 63 references, 31 of which can be accessed free at http://www.jbc.org/content/283/30/21187.full.html#ref-list-1
