Proc. Natl. Acad. Sci. USA
Vol. 88, pp. 84-88, January 1991
Biochemistry

Structural model of the nucleotide-binding conserved component of periplasmic permeases

(molecular modeling/sequence alignment/membrane transport/traffic ATPases)

CAROL S. MIMURA*, STEPHEN R. HOLBROOK†, AND GIOVANNA FERRO-LUZZI AMES*‡

*Division of Biochemistry and Molecular Biology, and †Chemical Biodynamics Division, University of California, Berkeley, CA 94720

Communicated by Howard K. Schachman, September 26, 1990 (received for review July 24, 1990)

‡To whom reprint requests should be addressed.
Abbreviation: EF, elongation factor.

ABSTRACT The amino acid sequences of 17 bacterial membrane proteins that are components of periplasmic permeases and function in the uptake of a variety of small molecules and ions are highly homologous to each other and contain sequence motifs characteristic of nucleotide-binding proteins. These proteins are known to bind ATP and are postulated to be the energy-coupling components of the permeases. Several medically important eukaryotic proteins, including the multidrug-resistance transporters and the protein encoded by the cystic fibrosis gene, are also homologous to this family. By multiple sequence alignment of these 17 proteins, the consensus sequence, secondary structure, and surface exposure were predicted. The secondary structural motifs that are conserved among nucleotide-binding proteins were identified in adenylate kinase, p21ras, and elongation factor Tu by superposition of their known tertiary structures. The equivalent secondary structural elements in the predicted conserved component were located. These, together with sequence information, served as guides for alignment with adenylate kinase. A model for the structure of the ATP-binding domain of the permease proteins is proposed by analogy to the adenylate kinase structure. The characteristics of several permease mutations and biochemical data lend support to the model.

Periplasmic active transport systems (permeases) in Gram-negative bacteria transport a wide variety of substrates, including amino acids, peptides, ions, carbohydrates, and vitamins. These permeases share a common organization consisting of a substrate-binding protein that is located in the periplasm and imparts substrate specificity and a membrane-bound complex consisting of two hydrophobic membrane-spanning proteins plus a third membrane protein with a hydrophilic sequence (1). When different permeases are compared, the hydrophilic membrane protein (hereafter referred to as the conserved component) invariably displays large stretches of sequence similarity, two of which are homologous to a previously defined ATP-binding consensus (1, 2), such as found in the α and β subunits of the F0F1 ATPase, myosin, adenylate kinase, and others (3). This finding suggested that the function of the conserved component is to couple the energy of ATP hydrolysis to active transport. Indeed, recent experiments showed that ATP (and GTP) bind to the conserved components (4, 5) and that ATP hydrolysis is coupled to active transport (6-9). The conserved component has also been implicated in an interaction with the substrate-binding protein (10). Thus, each conserved component must include unique domains reflecting its interaction with the individual hydrophobic membrane proteins and the specific periplasmic protein.

Besides periplasmic permeases, other transport-related proteins are also homologous to the conserved components, indicating that these proteins constitute a superfamily with a common evolutionary origin and/or biochemical function. It has been proposed that members of this family be called "traffic ATPases" because they transport a variety of substrates at the expense of ATP and they translocate in both directions (11). Among these are several eukaryotic proteins such as the family of multidrug-resistance transporters of tumor cells (Mdr), the yeast protein STE6 responsible for secretion of the mating a-factor, the cystic fibrosis gene product (CFTR), and others.

To understand the molecular mechanism of action of the bacterial permeases and of the related eukaryotic proteins, it is necessary to know the structure of the conserved components. However, none of these proteins has been crystallized. Therefore, we have initiated a structure-function analysis by predicting the three-dimensional structure of the conserved components. Since proteins that share a common function are expected to conserve the three-dimensional folding pattern required for that function even though little sequence similarity may exist, we have determined the structural and functional constraints in common between the conserved components and proteins of known structure. A sequence alignment of the conserved components was performed, from which the consensus sequence, secondary structure, and surface exposure pattern were predicted. Then, based on a comparison of the primary and secondary structure with the known structures of the nucleotide-binding proteins adenylate kinase, elongation factor Tu (EF-Tu), and p21ras, a three-dimensional model was inferred for the conserved components and in particular for the histidine permease conserved component, HisP.

METHODS

Sequences were initially compared with the program GENALIGN (12), with the nucleotide-binding motifs constrained to align. Multiple sequence alignment utilized the method of Vingron and Argos (13). Secondary structure prediction for each protein was made using both the Chou-Fasman approach (14) and a modification of the neural network method of Qian and Sejnowski (15). The surface accessibility was predicted by the method of Holbrook et al. (16). Computer graphics visualizations of protein models as well as molecular superpositions and model building were done with the program INSIGHT.

RESULTS AND DISCUSSION

Sequence Alignment of the Conserved Components. The sequences of 17 conserved components (for review, see refs. 1 and 11; two of the proteins, RbsA and AraG, have been divided into amino and carboxyl halves because each half contains a nucleotide-binding consensus) were compared to one another through the repeated use of GENALIGN and were grouped in clusters as follows: (1) HisP and GlnQ; (2) AraG(C), RbsA(C), and CysA; (3) PstB, SfuC, UgpC, ProV,


FhuC, AraG(N), and RbsA(N); (4) ChlD, OppD, and OppF; (5) MalK; (6) BtuD. After a prealignment of the sequences within clusters 1-4, the multiple sequence alignment shown in Fig. 1 was built up by consecutive additions of the subgroups to the growing list. The consensus sequence is shown at the bottom. Nine gaps in the consensus sequence are due to insertion of extra residues in individual proteins, presumably corresponding to segments related to special functions. Regions of similarity presumably reflect segments with common structure and function. Two regions of particularly high similarity (residues 52-63 and residues 183-221) clearly stand out and correspond to the well-conserved nucleotide-binding motif (3). A particularly variable region occurs around residue 100 and will be discussed below.
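The pairwise comparisons used to group the sequences into clusters can be illustrated with a standard Needleman-Wunsch global alignment score. The sketch below is only an illustration of that general technique: the function name and the match, mismatch, and gap values are assumptions chosen for readability, not the parameters of GENALIGN or of the Vingron and Argos method.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions (identity scoring with a
    linear gap penalty), not the scheme used in the paper.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two residues, gap in b, gap in a.
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_score_matrix(sequences):
    """Symmetric matrix of global alignment scores for a list of sequences."""
    n = len(sequences)
    return [[needleman_wunsch_score(sequences[i], sequences[j]) for j in range(n)]
            for i in range(n)]
```

Sequences whose mutual scores are highest would be grouped first; the clustering criterion itself is not specified in the text and is left out of the sketch.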

Prediction of Consensus Structural Features. The consensus prediction of secondary structure is illustrated schematically in Fig. 1. The conserved regions correspond to conserved structural motifs discussed below. A striking feature is the large helical domain consisting of four α-helices (H1 to H4) totaling 99 residues that are predicted to occur consecutively between strands B2 and B3 and are located in a region of poor conservation. The longest of these helices, H3, spans 31 residues, which is greater than any α-helix documented in a resolved crystal structure. The possible function of the helical domain is discussed below. The consensus prediction of surface exposure is also indicated in Fig. 1. In general, regions corresponding to β-strands are predicted to be completely buried and regions corresponding to α-helices are more amphipathic, except for helix H2 (see later).

[Fig. 1 graphic: multiple sequence alignment of the 17 conserved-component sequences (HisP, GlnQ, CysA, FhuC, SfuC, ProV, UgpC, PstB, OppD, OppF, ChlD, MalK, BtuD, and the two halves each of RbsA and AraG), followed by the Consensus, Conserved, Exposure, and Secondary rows, numbered by HisP and consensus position and annotated with the elements B1, T1, A1, B2, T2, H1-H4, B3, T3, A3, B4, T4, A4, B5, T5, and A5.]

FIG. 1. Sequence alignment, consensus sequence, secondary structure, and surface accessibility. Sequences aligned with GENALIGN yielded a pairwise comparison matrix that served as a measure of similarity and indicated that the sequences could be grouped into seven clusters or subgroups, which were used in the subsequent alignment. The multiple sequence alignment (13) prealigned the most closely related sequence subgroups first (those with the greatest internal homology) by finding matches of one or more amino acids in each sequence of the subgroup (k-tuple matching) and used these sites as anchors, or constraints, in the subsequent step. Then an intergroup alignment between these prealigned sequences was performed by applying a generalization of the standard Needleman-Wunsch algorithm to the short segments between the anchoring sites. This generalization, instead of working with only two sequences at a time, uses matrices describing the amino acid frequencies at every alignment position to compare and align the sequences. This procedure (profile alignment) considers all sequences at once in making the alignment rather than any specific pairs or subsets and, being iterative, builds up the multiple sequence alignment by combining prealigned subgroups. Minimal manual intervention and no arbitrary constraints were necessary to reach a satisfactory alignment of the sequences. A consensus sequence was computed from this alignment by finding for each sequential position the amino acid that occurred most frequently. If two or more amino acids occurred with the same frequency then, among these, the amino acid with the highest similarity (as judged by a Dayhoff odds matrix) to all other amino acids at that position was chosen. Residues marked with o and O are >50% (at least 9 out of 17) and 75% (at least 13 out of 17) identical among all sequences, respectively. Due to differences in length and lack of sequence similarity at the extreme N and C termini, the alignment and the consensus sequence are presented only from the beginning to the end of HisP; the symbols < and > indicate that the sequence continues at the N and at the C terminus, respectively. Numbers refer to the consensus (cons #) and HisP (HisP #) sequences. The consensus secondary structure prediction is shown below the consensus sequence. +, α-helix; ^, β-strand; -, turn (indicated in the regions modeled as A1 to A5, B1 to B5, and T1 to T5, and H1 to H4 in the helical domain); all blank regions correspond to coils or turns. A consensus secondary structure prediction was made by comparison of the predictions for each protein according to the multiple sequence alignment. For the Chou-Fasman predictions an average, maximum, and minimum value for the probability of helix, strand, and turn was computed over all the aligned sequences at each position of the alignment. The consensus prediction was then taken from the average values (considering also the minimum and maximum). For the neural network predictions, the secondary structure predicted for the majority of the proteins was taken as the consensus. Where the Chou-Fasman and neural network consensus predictions differed, the secondary structure assignment was based on the strength of the predictions. This type of consensus secondary structure prediction is significantly more accurate than predictions for individual proteins (S.R.H., unpublished results). Surface accessibility is shown below the consensus; residues exposed are denoted by t. The surface accessibility calculation (16) employs neural networks "trained" to recognize the relationship between sequence and the accessibilities observed in known protein structures. A binary model of exposure was used in which a residue is considered buried if <20% of its intrinsic surface area is exposed. A consensus prediction of surface exposure was obtained in a manner analogous to that used for consensus secondary structure prediction. The activities (strength of prediction) from the neural network toward both burial and exposure were averaged over all proteins at each amino acid position of the multiple sequence alignment, with the consensus prediction being the greater of the two averages.
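The consensus rule given in the Fig. 1 legend (most frequent residue per column, with o and O marking residues present in at least 9 and at least 13 of 17 sequences) can be sketched as follows. This is a hedged illustration rather than the authors' program: the function names are hypothetical, gap characters are assumed to be written as "-" and ignored when counting, and the Dayhoff-based tie-break is represented by an optional caller-supplied `similarity` function, with an alphabetical fallback that is purely an assumption.

```python
from collections import Counter


def consensus_column(column, similarity=None):
    """Return (residue, count) for the most frequent residue in one column."""
    residues = [r for r in column if r != "-"]  # assumption: gaps are ignored
    counts = Counter(residues)
    if not counts:
        return "-", 0
    top = max(counts.values())
    tied = sorted(r for r, c in counts.items() if c == top)
    if len(tied) > 1 and similarity is not None:
        # Tie-break: pick the residue with the highest summed similarity
        # to all residues observed at this position.
        best = max(tied, key=lambda r: sum(similarity(r, x) for x in residues))
    else:
        best = tied[0]  # assumption: alphabetical fallback when no matrix is given
    return best, top


def conservation_mark(count, n_sequences):
    """'O' for at least 75% identity, 'o' for more than 50%, else a space."""
    if count * 4 >= n_sequences * 3:
        return "O"
    if count * 2 > n_sequences:
        return "o"
    return " "


def consensus(alignment, similarity=None):
    """Consensus string and conservation marks for equal-length aligned rows."""
    residues, marks = [], []
    for column in zip(*alignment):
        residue, count = consensus_column(column, similarity)
        residues.append(residue)
        marks.append(conservation_mark(count, len(alignment)))
    return "".join(residues), "".join(marks)
```

With 17 sequences, these thresholds reproduce the cutoffs stated in the legend: 9 identical residues give o and 13 give O.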

Conserved Structural Motifs of Other Nucleotide-Binding Proteins. To predict a tertiary structure for the conserved component, it was necessary to identify clearly the structural motifs conserved among mononucleotide-binding proteins and then find the corresponding motifs in the conserved components. The crystal structures of several GTP- and ATP-binding proteins have been determined and observed to contain a very similar fundamental architecture or "fold" around which variable regions are inserted to perform the unique functions of the particular proteins. These structures include the p21ras protein in both the GDP and GTP (analog) bound forms (17), a fragment of EF-Tu (18), guanylate kinase (19), and the yeast (20) and porcine (21) adenylate kinases. The common folding pattern includes a phosphate-binding domain composed of an N-terminal strand-turn-helix (glycine-rich flexible loop) followed by two β-strands, and a second domain containing consecutive helix-turn-strand motifs comprising a typical Rossmann or dinucleotide-binding fold. Altogether the five β-strands with intervening α-helices constitute a highly conserved structural motif. Differences in topology exist between the GTPase and adenylate kinase structures; in particular the second β-strand in the phosphate-binding domain is parallel in adenylate kinases and antiparallel in the GTPases (22). Superposition of adenylate kinase with p21ras and EF-Tu by structural alignment of the α-carbons of only 10 residues within the first β-turn-helix structural motif results in excellent overlap of all five β-strands that form the protein core, several of the helices, and the glycine-rich flexible loop, as shown in Fig. 2. The position and orientation of Asp-93 in adenylate kinase (21), Asp-57 in p21ras (17), and Asp-80 in EF-Tu (18) occurring near the end of the third β-strand are nearly identical. In all cases, this aspartic acid residue is oriented toward and probably interacting with the Mg2+ bound to the nucleotide (Fig. 2); its role in Mg2+ binding and charge neutralization may be responsible for the complete conservation of this residue (23) among all members of this family. The sequence and structure conservation of this aspartate make it likely that Asp-93 is the conserved aspartate in the adenine nucleotide-binding fold, rather than Asp-119, as proposed (3); in agreement with this, Asp-119 is not conserved in all adenylate kinases (24). In conclusion, the striking similarity between all these structural and sequence features lends confidence that this folding pattern can be used as a template for the three-dimensional structure of the conserved component.

FIG. 2. Superposition of adenylate kinase and p21ras. The Cα backbones of adenylate kinase (blue), p21ras (pink), and EF-Tu (cyan) were superimposed as described in the text. Residues shown in red correspond to the sequences Gly-Xaa-Xaa-Gly-Xaa-Gly-Lys and the conserved β-strands are yellow; GTP (in p21ras) is green with the van der Waals radii depicted in blue; the conserved aspartates (see text) are shown in red, green, and white for adenylate kinase, p21ras, and EF-Tu, respectively; the Mg2+ is depicted as a red star. The residues chosen for the structural alignment are the last two of B1, the first two of the first turn, and the first six of A1 (see Table 1).

Sequence and Structure Alignment of the Conserved Component with Adenylate Kinase. Due to the low level of sequence similarity between the family of the conserved components and adenylate kinase, alignment of their sequences for purposes of homology modeling is difficult. Our approach was first to identify in the conserved components those features that correspond to the structures that characterize ATP-binding proteins, the conserved aspartic acid and the glycine-rich loop; then we identified within the predicted consensus secondary structure the remaining structural features that correspond to the conserved mononucleotide-binding fold. The glycine-rich loop, characterized by the sequence Gly-(Xaa)4-Gly-Lys, is an obvious feature. Asp-214 is completely conserved in the transport proteins and is predicted to be near the terminus of a β-strand composed of several hydrophobic residues. This sequence and structure conservation makes it likely that Asp-214 corresponds to the conserved aspartate of the other mononucleotide-binding proteins (i.e., Asp-93 of adenylate kinase). Given the location of the glycine-rich loop and the conserved aspartate, we then used the consensus secondary structure predictions and sequence homology to assign the remaining conserved secondary structure elements of the mononucleotide-binding fold. Table 1 shows the alignment of the secondary structures predicted for the consensus sequence with the five motifs of adenylate kinase, p21ras, and EF-Tu. The alignment of motifs 1 and 3 is unambiguous; the alignment of the other three motifs is less certain but consistent with currently available data. The good match of these structural features implies that adenylate kinase and the conserved components have very similar topology, thus forming the basis for a topological model of the ATP-binding domain of the conserved components and specifically of HisP (Fig. 3).
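The glycine-rich loop pattern Gly-(Xaa)4-Gly-Lys used as an alignment anchor can be located mechanically with a regular expression. The sketch below is illustrative only: the function name is hypothetical, and the HisP fragment and its starting residue number (38) in the usage note are assumptions transcribed from the HisP row of Fig. 1 and the residue numbers quoted in the text, not an authoritative sequence record.

```python
import re

# Gly, any four residues, Gly, Lys (one-letter amino acid codes).
GLYCINE_RICH_LOOP = re.compile(r"G.{4}GK")


def find_glycine_rich_loops(sequence, first_residue_number=1):
    """Return (residue_number, matched_text) for each non-overlapping match.

    first_residue_number is the sequence number of sequence[0], so that
    fragments can be reported in the numbering of the full protein.
    """
    return [(m.start() + first_residue_number, m.group())
            for m in GLYCINE_RICH_LOOP.finditer(sequence)]
```

For example, `find_glycine_rich_loops("IGSSGSGKSTFLRCINFL", 38)` reports a single loop, GSSGSGK, starting at residue 39, which places its second glycine and its lysine at positions 44 and 45. The same pattern matches the p21ras loop GAGGVGK beginning at Gly-10.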

Str al Model of HisP. To initiate a structure-functionanalysis, a three-dimensional representation of HisP, asshown in Fig. 4, was inferred from its alignment with adenylatekinase. An ATP has been placed by analogy to the knownlocation of GTP in p21t (17). This structural model can betested and improved by mutational and biochemical analyses.We have initial evidence supporting the model from thelocation of a number of hisP mutations investigated in ourlaboratory. The following replacements (Fig. 3) located ininvariant residues contributing to what is modeled to be partof the ATP-binding site result in a defect in ATP binding, thusclearly supporting the predicted structure: Gly-39 to Asp, Lys,or Arg; Gly-44 to Ser; and Lys-45 to Asn, Pro, or Val (Fig. 4).Some hisP mutations located near the ATP-binding pocket donot affect ATP binding, such as: Ser-46 to Asn, Thr-47 to Ala,and Glu-179 to Asp (V. Shyamala, V. Baichwal, and G.F.-


Table 1. Alignment of sequences within secondary structure motifs

Motif Protein

1 con, adk, ras, EF-Tu

2 con, adk, ras, EF-Tu

3 con, adk, ras, EF-Tu

4 con, adk, ras, EF-Tu

5 con, adk, ras, EF-Tu

Beta

4EVLALV9KIIFVV4YKLVVV12VNVGTI79ÆDILLD34YT1LSI41YSDEI6EVHST210VLLLDU

53LDILDT76YAHVDC242TIVYVT

114LLLXVD77GFLCVF'O0GAILVV260VVVMR169IVRKVNA111HVLVGNK129YIIVFLN

Turn

'52SGSi'0GAGGV'8GHVDH85GQEIE40f27HFVDEYD P71TPT2'6PT S AL95YPR59AGQEEYSA82PG

248jj

120-14585NN

265CGD A176EG18CDLAARTV

138D M

Alpha

57GRSTLLRIIAGL2O0j[QCEKIVQKY15GKSALTIQLIQN23GKTTLTAAITTV

104-203, hel dom4IDLLRAEVSS

NoneNone

221DVALRAELLRLLQQLHR98EVKQGEEFERKI(67HRDQYMRTGNHADYVKNMI249DLGEALRIADR156A1EBVI AFXEE87TKSFEDIHQYREQIKRVK113PQTREHILLGR267RIV

178SVDD YFSQVCTHLD126ESRQAQDLARSY145LLELVEMEVRELL

Residues are from the five β-turn-α secondary structure motifs shown in Fig. 3. adk, porcine adenylate kinase; ras, p21ras; con, consensus sequence; hel dom, helical domain. Underlined residues occur in at least 13 out of 17 conserved components or in at least 4 out of 5 sequences in the adenylate kinase family (23). Residues in boldface are conserved in nucleotide-binding proteins. Numbers refer to the first residue in each motif. The second β-strands in ras and EF-Tu are 41-37 and 68-64, respectively. Residues 120-145 of adenylate kinase contain both a helix and a turn. The equivalent of helix A2 in adenylate kinase is included within the helical domain of the conserved components.

L.A., unpublished data). These residues are predicted to be exposed to solvent (Fig. 1); thus, they are likely to face away from the pocket. Two HisP residues have been shown to be covalently modified by the photoaffinity labeling compound 8-azidoadenosine 5'-triphosphate: His-19 and Ser-41 (25). His-19 could be modeled to be located close to the adenine ring. However, Ser-41 should be the equivalent of Gly-12 in p21ras (17), which is adjacent to the terminal phosphate of GTP and on the opposite side of the phosphate-binding loop from the purine ring and, therefore, distant from position 8 of the purine. A reasonable explanation is that the large conformational change that occurs in the glycine-rich flexible loop (26) upon ATP binding brings Ser-41 into close proximity to the reactive nitrene. Further experimentation with other affinity labeling compounds and additional mutants is needed to clarify the nature of the ATP-binding site.

Implications of the Model for the Structure of the Membrane-Bound Permease Complex. HisP has been shown recently to be present within the membrane-bound complex as a dimer and to cross the membrane, presumably between the two hydrophobic components (R. Kerppola and G.F.-L.A., unpublished data). Our model is compatible with this architectural organization. The long helical domain between B2 and B3, which includes the buried helix H2, is moderately hydrophobic and may well interact with the hydrophobic components of the membrane complex. Such interaction may be responsible for transmitting the conformational changes necessary for transport (11). In agreement with this hypothesis is the fact that, since the hydrophobic components usually are not conserved, this domain is also not well conserved.

One of the characteristics to be understood in the mechanism of action of the family of traffic ATPases (11) is the fact that some members of the family function in extrusion whereas others function in uptake. Since the eukaryotic members of this transport family have domains very similar to the prokaryotic conserved components, they may be organized in the membrane in a similar fashion, utilizing their hydrophobic domains as intramolecular anchors to the membrane. In this scheme translocation in either direction may

FIG. 3. Topological diagrams of adenylate kinase and of the models of the conserved component and of HisP. Cylinders, α-helices; arrows, β-strands; lines, coils and turns. The first and last residues of the helices and strands are denoted with numbers corresponding to each protein sequence. Underlined residues correspond to the bold characters in Table 1. Starred residues are the sites of mutations discussed in the text. The helices and strands that can be modeled are labeled A1 to A4 and B1 to B5, respectively. Helices H1-H4 constitute the helical domain. Helix H2 in the conserved component and in HisP is predicted to be buried. Arrows mark the sites of 8-azidoadenosine 5'-triphosphate modification in HisP.


FIG. 4. Stereoview of a model of the ATP-binding domain of HisP. The Cα atoms of the ATP-binding domain in the model of HisP are shown in stereo. The arrows labeled N and C are positioned at HisP residues 33 and 228, respectively. The first 32 residues from the N terminus and the last 30 residues prior to the C terminus are not shown, as no analogous regions exist in adenylate kinase. The helical domain (107 residues) is illustrated schematically, starting at residue 66, with entry and exit arrows indicated. Loop regions differing from those of adenylate kinase were modeled from loops occurring in a data base of 15 proteins. The side chains of the conserved residues Lys-45 at the end of the glycine-rich loop and Asp-178 are shown. A bound ATP is shown along with its associated Mg2+ (as a cross surrounded by a dotted surface) in a position corresponding to the GTP found in p21ras. The γ-phosphate of ATP is labeled as PC. The positions of residues 55 (end of A1), 66 (end of B2), and 204 (start of B4) are also indicated. The Cα coordinates of this model are available on request.

occur by passage through a substrate-specific pore, with or without the requirement for a substrate-binding protein.

While this paper was under review, a paper describing a structural model of the conserved components was published (27). A major difference between that model and our model centers around our assignment of Asp-93 as the conserved aspartate in adenylate kinase and its alignment with Asp-214 in the conserved component, while Hyde et al. (27) utilize for their analysis Asp-119, previously assigned as the relevant aspartate residue in the adenine nucleotide-binding site (3). We have discussed above why we believe that the latter assignment is incorrect. In addition and in contrast to Hyde et al. (27), we align Asp-93 with an aspartate that is 100% conserved in all the proteins we analyzed (Fig. 1 and Table 1), thus lending additional support to our assignment. Our model results in a much larger domain looping out between B2 and B3 (i.e., the entire helical domain) and in an alignment of the adenylate kinase secondary structure motifs from B3 to the end of the protein that is entirely different from that depicted by Hyde et al. (27) (Fig. 1 and Table 1).

Among the consequences of our model that are relevant to eukaryotic transporter members of this family is the fact that the hydrophilic portions of these molecules are likely to span the membrane, by analogy with the prokaryotic counterparts. Therefore, the helical domain may be accessible at the outer surface of the membrane and thus place the site of the most common defect in cystic fibrosis (ΔPhe508) in contact with the outer solvent; this makes it possible that this region is involved in substrate binding, whatever the substrate may be. On the other hand and also in contrast with the suggestion by Hyde et al. (27), it is possible that cystic fibrosis mutations affect ATP binding or hydrolysis, by analogy with several mutant HisPs whose defects are located within this large loop and that are unable to bind ATP (V. Shyamala, V. Baichwal, and G.F.-L.A., unpublished data), though this effect may be indirect. The structural model we present is a useful first step toward structure-function experiments that will provide valuable insights into the mechanism of action of all these transporters.

We thank S. Muskal for help in secondary structure predictions and V. Shyamala for discussions on the HisP mutants. This work was supported by National Institutes of Health Grant DK12121 to G.F.-L.A. and by National Science Foundation Grant BBS-8720134 to the Berkeley Structural Biology Facility.

1. Ames, G. F.-L. (1986) Annu. Rev. Biochem. 55, 397-425.
2. Higgins, C. F., Hiles, I. D., Salmond, G. P. C., Gill, D. R., Downie, J. A., Evans, I. J., Holland, I. B., Gray, L., Bukel, S. D., Bell, A. W. & Hermodson, M. A. (1986) Nature (London) 332, 543-546.
3. Walker, J. E., Saraste, M. J., Runswick, M. J. & Gay, N. J. (1982) EMBO J. 1, 945-951.
4. Hobson, A., Weatherwax, R. & Ames, G. F.-L. (1984) Proc. Natl. Acad. Sci. USA 81, 7333-7337.
5. Higgins, C. F., Hiles, I. B., Whalley, K. & Jamieson, D. K. (1985) EMBO J. 4, 1033-1040.
6. Prossnitz, E., Gee, A. & Ames, G. F.-L. (1989) J. Biol. Chem. 264, 5006-5014.
7. Bishop, L., Agbayani, R., Jr., Ambudkar, S. V., Maloney, P. C. & Ames, G. F.-L. (1989) Proc. Natl. Acad. Sci. USA 86, 6953-6957.
8. Dean, D. A., Davidson, A. H. & Nikaido, H. (1989) Proc. Natl. Acad. Sci. USA 86, 9134-9138.
9. Ames, G. F.-L. & Joshi, A. K. (1990) J. Bacteriol. 172, 4133-4137.
10. Ames, G. F.-L. & Spudich, E. N. (1976) Proc. Natl. Acad. Sci. USA 73, 1877-1881.
11. Ames, G. F.-L., Mimura, C. S. & Shyamala, V. (1990) FEMS Microbiol. Rev. 429-446.
12. Martinez, H. M. (1988) Nucleic Acids Res. 16, 1683-1691.
13. Vingron, M. & Argos, P. (1989) Comput. Appl. Biosci. 5, 115-121.
14. Argos, P. A., Hanei, M. & Garavito, R. M. (1978) FEBS Lett. 93, 19-24.
15. Qian, N. & Sejnowski, T. J. (1988) J. Mol. Biol. 202, 865-884.
16. Holbrook, S. R., Muskal, S. M. & Kim, S.-H. (1990) Protein Eng. 3, 659-665.
17. Milburn, M. V., Tong, L., deVos, A. M., Brunger, A., Yamaizumi, Z., Nishimura, S. & Kim, S.-H. (1990) Science 247, 939-945.
18. Woolley, P. & Clark, B. F. C. (1989) Biotechnology 7, 913-920.
19. Stehle, T. & Schulz, G. E. (1990) J. Mol. Biol. 211, 249-254.
20. Egner, U., Tomasselli, A. G. & Schulz, G. E. (1987) J. Mol. Biol. 195, 649-658.
21. Dreusicke, D., Karplus, A. & Schulz, G. E. (1988) J. Mol. Biol. 199, 359-371.
22. Holbrook, S. R. & Kim, S.-H. (1989) Proc. Natl. Acad. Sci. USA 86, 1751-1755, and correction (1989) 86, 7415.
23. Schlichting, I., Almo, S. C., Rapp, G., Wilson, K., Petratos, K., Lentfer, A., Wittinghofer, A., Kabsch, W., Pai, E. F., Petsko, G. A. & Goody, R. S. (1990) Nature (London) 345, 309-315.
24. Schulz, G. E., Schilts, E., Tomasselli, A. G., Rainer, F., Brune, M., Wittinghofer, A. & Schirmer, R. H. (1986) Eur. J. Biochem. 161, 127-132.
25. Mimura, C., Admon, A. & Ames, G. F.-L. (1990) J. Biol. Chem., in press.
26. Fry, D., Kuby, S. A. & Mildvan, A. S. (1986) Proc. Natl. Acad. Sci. USA 83, 907-911.
27. Hyde, S. C., Emsley, P., Hartshorn, M. J., Mimmack, M. M., Gileadi, U., Pearce, S. R., Gallagher, M. P., Gill, D. R., Hubbard, R. E. & Higgins, C. F. (1990) Nature (London) 346, 362-365.
