5
Proc. Natl. Acad. Sci. USA Vol. 85, pp. 7872-7876, November 1988 Biochemistry Viral cysteine proteases are homologous to the trypsin-like family of serine proteases: Structural and functional implications (picornavirus/protein structure prediction/sequence alignment) J. FERNANDO BAZAN*t AND ROBERT J. FLETTERICKt *Department of Biophysics, University of California, Berkeley, CA 94720; and tDepartment of Biochemistry and Biophysics, University of California, San Francisco, CA 94143 Communicated by William J. Rutter, June 28, 1988 (received for review April 8, 1988) ABSTRACT Proteases that are encoded by animal picor- naviruses and plant como- and potyviruses form a related group of cysteine-active-center enzymes that are essential for virus maturation. We show that these proteins are homologous to the family of trypsin-like serine proteases. In our model, the active-site nucleophile of the trypsin catalytic triad, Ser-195, is changed to a Cys residue in these viral proteases. The other two residues of the triad, His-57 and Asp-102, are otherwise absolutely conserved in all the viral protease sequences. Sec- ondary structure analysis of aligned sequences suggests the location of the component strands of the twin fl-barrel trypsin fold in the viral proteases. Unexpectedly, the 2a and 3c subclasses of viral cysteine proteases are, respectively, homol- ogous to the small and large structural subclasses of trypsin-like serine proteases. This classification allows the molecular map- ping of residues from viral sequences onto related tertiary structures; we precisely identify amino acids that are strong determinants of specificity for both small and large viral cysteine proteases. The naturally occurring proteases can be grouped into four classes according to the prominent functional groups at their active sites: Ser, Cys, Asp, and Zn (1). The Ser protease trypsin and subtilisin subclasses are distinguished by an identical spatial arrangement of catalytic His, Asp, and Ser residues in radically different ,8/,3 (trypsin) and a//3 (subtil- isin) protein scaffolds, an example of locally convergent evolution (1, 2). Within the trypsin subclass of molecules, divergent proteins exhibit widely varying degrees of se- quence similarity; analysis of tertiary structures, however, reveals a high level of structural conservation (2, 3). This precedence of tertiary form over primary sequence is ob- served in other structurally characterized families of proteins such as the immunoglobulins (4) and globins (5). Cys-active-center viral proteases have been identified in the sequenced genomes of four genera of the picornavirus family: the rhinoviruses [human rhinovirus strains 2-14 (HRV2-14)] (6, 7), the enteroviruses [human poliovirus (HPV1), echovirus strain 9 (EV9), coxsackievirus (CXV), bovine enterovirus (BEV), and hepatitis A virus (HAV)] (8- 12), cardioviruses [encephalomyocarditis virus (EMCV) and Theiler's murine encephalomyelitis virus (TMEV)] (13, 14), and aphthoviruses [foot-and-mouth disease virus (FMDV)] (15). Two different classes of plant viruses also encode Cys proteases that are homologous to the picornaviral proteases (16-18): a comovirus with a bipartite genome, cowpea mosaic virus (CPMV) (19), and two potyviruses with monopartite genomes, tobacco-etch virus (TEV) (20) and tobacco-vein- mottling virus (TVMV) (21). The animal and plant viruses discussed above have in common a positive-strand RNA genome that is translated into a single large polyprotein (two in the case of the segmented CPMV). The precursor is proteolytically pro- cessed at preferred Gln-Gly and Tyr-Gly sites to release a number of mature proteins needed for virus replication, structure, and assembly (22). The 3c gene segment in picor- naviruses encodes a Cys protease (on average 185 amino acids) that is specific for the Gln-Gly cleavages (22, 23). A second, N-terminal, 2a gene segment present only in rhino- and enterovirus members of the picornavirus family [exclud- ing HAV (12)] encodes a smaller (average 150 amino acids) Cys protease that is similar in sequence to the 3c proteases in the vicinity of the putative active site Cys, but is specific for the Tyr-Gly sites (22, 24). The three plant viruses carry gene regions (termed 24K in CPMV and NIa in TEV-TVMV) (19-22, 25, 26) that encode Cys proteases similar to the picornavirus 3c class. The catalytic role of a C-terminal Cys in both 2a and 3c proteases has been inferred by the high degree of local primary sequence conservation (16-18, 22) and by site- specific mutagenesis experiments, which showed that replac- ing the Cys inactivates the enzyme (27). Inactivation by the classical Cys protease inhibitors iodoacetamide, N-ethylma- leimide, and para-chloromercuribenzoate (28) and the pro- tein cystatin (29) confirms this assignment. These results have prompted classification of these enzymes as another Cys protease structural class, distinct from the papain family (with which there is no discernable sequence similarity) (16- 18, 22). Our studies demonstrate that the tertiary fold of these viral Cys proteases is similar to the bilobal /-barrel motif of the trypsin family of Ser proteases (2), a structure that is different from that of papain (1). Conserved His, Asp, and Cys residues equivalent to the trypsin catalytic triad (2) are found at structurally similar positions in the picorna- and plant viral proteases. Other conserved residues in an align- ment of viral and cellular proteases fulfill essential structural roles or contribute directly to the homologous catalytic and substrate-binding properties of the Ser proteases. METHODS The viral protease sequences were selected from the pub- lished literature and the National Biomedical Research Foun- dation (NBRF) data base (ref. 30, release 15.0). The cellular Abbreviations: HRV2-14, human rhinovirus, strains 2-14; HPV1, human poliovirus, strain Sabin vaccine P3/Leon/37; EV9, echovirus strain 9; CXV, coxsackievirus, B3 strain Nancy; BEV, bovine enterovirus; HAV, hepatitis-A virus; EMCV, encephalomyocarditis virus; TMEV, Theiler's murine encephalomyelitis virus; FMDV, foot-and-mouth disease virus, strain 0[1]K; TVMV, tobacco-vein- mottling virus; TEV, tobacco etch virus; CPMV, cowpea mosaic virus, B genome segment; SAP, Staphylococcus aureus (strain V8) protease); SGT, Streptomyces griseus trypsin; TRP, trypsin; CHT, chymotrypsin; ELA, elastase; SGPA and SGPB, Streptomyces griseus protease A and B; ALP, Lysobacter enzymogenes a-lytic protease; NBRF, National Biomedical Research Foundation. 7872 The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Viral cysteine proteases are homologous to the trypsin-like family of

Embed Size (px)

Citation preview

Page 1: Viral cysteine proteases are homologous to the trypsin-like family of

Proc. Natl. Acad. Sci. USAVol. 85, pp. 7872-7876, November 1988Biochemistry

Viral cysteine proteases are homologous to the trypsin-like familyof serine proteases: Structural and functional implications

(picornavirus/protein structure prediction/sequence alignment)

J. FERNANDO BAZAN*t AND ROBERT J. FLETTERICKt*Department of Biophysics, University of California, Berkeley, CA 94720; and tDepartment of Biochemistry and Biophysics, University of California, SanFrancisco, CA 94143

Communicated by William J. Rutter, June 28, 1988 (received for review April 8, 1988)

ABSTRACT Proteases that are encoded by animal picor-naviruses and plant como- and potyviruses form a relatedgroup of cysteine-active-center enzymes that are essential forvirus maturation. We show that these proteins are homologousto the family of trypsin-like serine proteases. In our model, theactive-site nucleophile of the trypsin catalytic triad, Ser-195, ischanged to a Cys residue in these viral proteases. The other tworesidues of the triad, His-57 and Asp-102, are otherwiseabsolutely conserved in all the viral protease sequences. Sec-ondary structure analysis of aligned sequences suggests thelocation of the component strands of the twin fl-barrel trypsinfold in the viral proteases. Unexpectedly, the 2a and 3csubclasses of viral cysteine proteases are, respectively, homol-ogous to the small and large structural subclasses oftrypsin-likeserine proteases. This classification allows the molecular map-ping of residues from viral sequences onto related tertiarystructures; we precisely identify amino acids that are strongdeterminants of specificity for both small and large viralcysteine proteases.

The naturally occurring proteases can be grouped into fourclasses according to the prominent functional groups at theiractive sites: Ser, Cys, Asp, and Zn (1). The Ser proteasetrypsin and subtilisin subclasses are distinguished by anidentical spatial arrangement of catalytic His, Asp, and Serresidues in radically different ,8/,3 (trypsin) and a//3 (subtil-isin) protein scaffolds, an example of locally convergentevolution (1, 2). Within the trypsin subclass of molecules,divergent proteins exhibit widely varying degrees of se-quence similarity; analysis of tertiary structures, however,reveals a high level of structural conservation (2, 3). Thisprecedence of tertiary form over primary sequence is ob-served in other structurally characterized families of proteinssuch as the immunoglobulins (4) and globins (5).

Cys-active-center viral proteases have been identified inthe sequenced genomes of four genera of the picornavirusfamily: the rhinoviruses [human rhinovirus strains 2-14(HRV2-14)] (6, 7), the enteroviruses [human poliovirus(HPV1), echovirus strain 9 (EV9), coxsackievirus (CXV),bovine enterovirus (BEV), and hepatitis A virus (HAV)] (8-12), cardioviruses [encephalomyocarditis virus (EMCV) andTheiler's murine encephalomyelitis virus (TMEV)] (13, 14),and aphthoviruses [foot-and-mouth disease virus (FMDV)](15). Two different classes of plant viruses also encode Cysproteases that are homologous to the picornaviral proteases(16-18): a comovirus with a bipartite genome, cowpea mosaicvirus (CPMV) (19), and two potyviruses with monopartitegenomes, tobacco-etch virus (TEV) (20) and tobacco-vein-mottling virus (TVMV) (21).The animal and plant viruses discussed above have in

common a positive-strand RNA genome that is translated

into a single large polyprotein (two in the case of thesegmented CPMV). The precursor is proteolytically pro-cessed at preferred Gln-Gly and Tyr-Gly sites to release anumber of mature proteins needed for virus replication,structure, and assembly (22). The 3c gene segment in picor-naviruses encodes a Cys protease (on average 185 aminoacids) that is specific for the Gln-Gly cleavages (22, 23). Asecond, N-terminal, 2a gene segment present only in rhino-and enterovirus members of the picornavirus family [exclud-ing HAV (12)] encodes a smaller (average 150 amino acids)Cys protease that is similar in sequence to the 3c proteasesin the vicinity of the putative active site Cys, but is specificfor the Tyr-Gly sites (22, 24). The three plant viruses carrygene regions (termed 24K in CPMV and NIa in TEV-TVMV)(19-22, 25, 26) that encode Cys proteases similar to thepicornavirus 3c class.The catalytic role of a C-terminal Cys in both 2a and 3c

proteases has been inferred by the high degree of localprimary sequence conservation (16-18, 22) and by site-specific mutagenesis experiments, which showed that replac-ing the Cys inactivates the enzyme (27). Inactivation by theclassical Cys protease inhibitors iodoacetamide, N-ethylma-leimide, and para-chloromercuribenzoate (28) and the pro-tein cystatin (29) confirms this assignment. These resultshave prompted classification of these enzymes as anotherCys protease structural class, distinct from the papain family(with which there is no discernable sequence similarity) (16-18, 22). Our studies demonstrate that the tertiary fold oftheseviral Cys proteases is similar to the bilobal /-barrel motif ofthe trypsin family of Ser proteases (2), a structure that isdifferent from that of papain (1). Conserved His, Asp, andCys residues equivalent to the trypsin catalytic triad (2) arefound at structurally similar positions in the picorna- andplant viral proteases. Other conserved residues in an align-ment of viral and cellular proteases fulfill essential structuralroles or contribute directly to the homologous catalytic andsubstrate-binding properties of the Ser proteases.

METHODSThe viral protease sequences were selected from the pub-lished literature and the National Biomedical Research Foun-dation (NBRF) data base (ref. 30, release 15.0). The cellular

Abbreviations: HRV2-14, human rhinovirus, strains 2-14; HPV1,human poliovirus, strain Sabin vaccine P3/Leon/37; EV9, echovirusstrain 9; CXV, coxsackievirus, B3 strain Nancy; BEV, bovineenterovirus; HAV, hepatitis-A virus; EMCV, encephalomyocarditisvirus; TMEV, Theiler's murine encephalomyelitis virus; FMDV,foot-and-mouth disease virus, strain 0[1]K; TVMV, tobacco-vein-mottling virus; TEV, tobacco etch virus; CPMV, cowpea mosaicvirus, B genome segment; SAP, Staphylococcus aureus (strain V8)protease); SGT, Streptomyces griseus trypsin; TRP, trypsin; CHT,chymotrypsin; ELA, elastase; SGPA and SGPB, Streptomycesgriseus protease A and B; ALP, Lysobacter enzymogenes a-lyticprotease; NBRF, National Biomedical Research Foundation.

7872

The publication costs of this article were defrayed in part by page chargepayment. This article must therefore be hereby marked "advertisement"in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Page 2: Viral cysteine proteases are homologous to the trypsin-like family of

Proc. Natl. Acad. Sci. USA 85 (1988) 7873

Ser protease sequences [trypsin (TRP), chymotrypsin(CHT), and elastase (ELA)] are found in the alignments ofCraik et al. (3). Several bacterial enzymes are considered:Streptomyces griseus trypsin (SGT) (31), Staphylococcusaureus protease (SAP) (32), Streptomyces griseus proteasesA and B (SGPA and SGPB) (33), and Lysobacter enzymo-genes a-lytic protease (ALP) (33). Crystallographic coordi-nates for representative members of the various cellularprotease families are available from the Protein Data Bank(34). Tertiary structure modeling of the 2a-3c viral proteasefolds based on the homologous known structures of small-large trypsin-like Ser proteases was carried out on an Evansand Sutherland PS330 graphics work station (linked to a VAX8650 computer) with the Biosym INSIGHT software pack-age.The University of Wisconsin Genetic Computer Group

(ref. 35; release 5.2) program PROFILE (36) was used incompiling the alignments of distantly related proteins. Se-quence templates that incorporated the disparate and con-served amino acid patterns of both viral and cellular prote-ases were used to search the (approximately) 6800 sequencesof the NBRF data base release 15.0 (30). The discriminatorycapability of a template was refined until it could consistentlyretrieve all known viral Cys protease and/or trypsin-like Serprotease sequences. GENALIGN (37) produced statisticalalignment scores of both pairwise and multiple matchingsbased on the method of Needleman and Wunsch (38).Structural constraints were incorporated into the alignmentsby allowing gaps to be located only between elements ofknown or predicted secondary structure. The latter predic-tions combined the Garnier-Robson algorithm (39) withamphipathic a-strand search techniques (40) to best locatethe twelve 8-strands of the trypsin-like Ser protease fold inthe viral proteases and the bacterial SAP. Accurate locationof turns was facilitated by the MATCH algorithm (41).

Box 1 Box 2

-10 12 2 9 3 6 48

....- .....-

Tyr Gly 2 0 3 8

Viral 2aCys

Protease

SmallSer

Protease

Viral 3cCys

Protease

G1..nGGly

LargeSer

Protease

57

H40

D85

Box 1 Box 2

16 49 69 100 110I I

D102

RESULTS

The viral Cys protease sequences were aligned in an effort toextend the similarities that were noted in the C-terminal thirdof the proteins (16-18). The picorna- and plant virus se-quences were aligned in order of their closer similarity untilall sequences had been incorporated into the profile (36). Twoabsolutely conserved residues in the N-terminal half of theviral 3c proteases, His-40 and Asp-85, accompanied theconserved C-terminal residues Cys-147 and His-161 (Fig. 1).A parallel profile of the smaller 2a proteases showed absoluteconservation of His-20, Asp-38, and Cys-109 residues (Fig.1). Merging ofthe 3c and 2a alignments (introducing a numberof gaps in the shorter 2a sequences) showed that the threeconserved 2a residues were equivalent to the conserved His-40, Asp-85, and Cys-147 of the 3c proteases.

Profile searches of the NBRF data bank (ref. 30; release15.0) utilizing the 2a, 3c, and merged profiles directedattention to the trypsin-like Ser protease family. Notably, the2a profile retrieved small bacterial protease sequences, whilethe 3c and merged profiles matched several large trypsin-likeSer proteases, the bacterial SAP (32) scoring highest. Five ofthe 3c proteases (HRV2, HPV1, EV9, CXV, and BEV) (6-11) and the five available 2a sequences (6-8, 10, 11) were thenused to generate pairwise alignment statistics with GENA-LIGN (37) in matches between themselves, a test group oftrypsin-like Ser proteases (SAP, SGT, TRP, CHT, and ELA)(3, 31, 32) and respective randomized sequences. The 3c/2amatches produced comparable identity scores (13 on aver-age) to the large/small Ser protease pairings (average of 14)(33). Viral-2a/small Ser protease alignments scored an aver-age of 13 identities, while the typical viral-3c/large Serprotease alignments were segregated into matches with SAP(scoring x15 identities) and all others (4 identities). This lowoverall identity between viral and cellular proteases must beBox 3 Box 4

87 117 118 129 150

11

I T~

C109 T124Gln Gly

I I j iC V/S12147 Qin iy

Box 3 Box 4

177 203 208 218I I l

245

S S214195

FIG. 1. Schematic comparison of the polypeptide chains of typical viral Cys and cellular Ser proteases that are aligned in Fig. 2. Boxes 1-4 correspond to sequence-similar regions of the chains that contribute to formation of the catalytic site or substrate binding pocket in determinedtertiary structures. The different box-shading schemes distinguish sequence patterns of the two known structural subclasses of cellulartrypsin-like Ser proteases (labeled small and large) homologous to the viral 2a and 3c Cys protease subclasses. Proteolytic processing sites areidentified at the N and C termini of the viral proteases by vertical arrows. Cellular proteases have analogous N-terminal sites, as the enzymesare processed from longer, inactive precursors. The broken-outline box (numbered -10 to 1) at the N terminus of the 2a protease representssequences N-terminal to the presumed Tyr-Gly cleavage site that display significant similarity to mature protein sequences in 3c and cellularproteases. The numbering of the aligned viral 2a-3c and small Ser protease sequences is absolute; the large Ser protease sequence is numberedaccording to the chymotrypsinogen scheme (3). H20, His-20; D38, Asp-38, etc.

Box I Box 2 Box 3 Box 4

ix2644 62 74 124 15415x6 167 93

H3 4 D6 4 s1 4 6 S162

Box I Box 2 Box 3 Box 41 32 51 83 93 130 155 156 166 183

HAL' At~~~~~~~~~~~~~~~~~~~~~~J

L%*JXXISmm m. M

t--717-771 Ir" bmw II

Biochemistry: Bazan and Fletterick

I9 MN,1 I

Page 3: Viral cysteine proteases are homologous to the trypsin-like family of

7874 Biochemistry: Bazan and Fletterick

qualified by the realization that the few conserved residuesare known to play essential structural roles in catalysis andsubstrate binding for trypsin-like Ser proteases. In particular,the three absolutely conserved residues in the viral 2a/3cmerged alignment (His-20/40, Asp-38/85, Cys-109/147; seeFig. 1) superimposed on His-57, Asp-102, Ser-195 of thetrypsin-like Ser protease catalytic triad (Fig. 1) (2, 3). Nosignificant alignment was possible with the active site triad(Asp-32, His-64, Ser-220) of subtilisin-like Ser proteases (1).The lead of Taylor (4) and Bashford et al. (5) is followed in

Proc. Natl. Acad. Sci. USA 85 (1988)

considering sequence/structure relationships in widely di-vergent families of proteins. We sought to distinguishwhether the low similarity that is observed in viral/cellularsequence comparisons was indeed indicative of structuralsimilarity and evolutionary relationship or suggestive merelyof analogous structure. To this effect, available structuralinformation [from Ser protease crystal structures (2) and viralprotease secondary structure predictions (39-41)] was incor-porated into the viral/cellular protease alignments in an effortto refine the profiles and increase their ability to retrieve

I 0 2 0 30

01SS VAIMSKCRANLOVFGT NLOIVMVPGRR.TI GMAVPV AYDOL PPKNEDLTFEGESL F. KGPRDYNPI SSTIT/GPTETL...PFDALPPEKOEVAFESKALL KGVRDFNPISACVE/ISAPPTDLOKMVMGNTKPVELILDGKT VAICCATGVF...O AGNPVMUD FELFCAKNI VAPI TFYYP.....DKAEVTOSCLLL.O/GPNPVMD..FEKrVAKHVTAPIGFVYP ...... TGVSTOTCLLV..O/ S T L . E GALVRKNLVMOAFGVGEKNGC .. VRWVMN L GV K ..

O/ GPL ...... FDF VSLLKKNI RTVKTG ...... AGEFTALGVY01/GP FEFAVAMMKRNSSTVKTE YGEFTMLGI Y.

GPA FEFAVAMMKRNASTVKTE YGEFTLGIYO/ GPO F DYAVAMAKRNI VTATTOS KGEFTML GOVH0/ GPN ..... TEFALSLLRKNI MTI TTS ....... K EFTGLI0/ GPE ...... EEFGMSLI KHNSCVI TTE ...... NGKFT LGVY ...

p .I a v a zkzr n p v & .1k g g y..VI LPNNDRHOI TDTTNGHYAPVTYV OVEAP. TGTFI ASGVVV..VVGG. TRAAOGEFPFMVRLSM GCGGALYA.

..I.VGG YTCGANTVPYOVSLNS GYHFCGGSLIN..I VN G EEAVPGSWPWOVSLODKT...GFHFCGGSLIN.N

.. VGG TEAORNSWPSOI SLOYRSGSSWANTCGGTLI R.b bbbbbbbb bbbbbb bbbbbbbbbb

A B

2 0 3 0

IPDSELVLYSHPSLEDVSHS....SL HGVF KVKNTT TL OOHL DTM 4HGEFKVKNSTOLOMKPVE .....

KVKGODMLSDAALMVLHRGNNR ...

NRSGAKTDLT F KVTKGPL .....

AKAGKETDVSFI RLSSGPL .....

SL DVGFODVVLMKVPT PK.....DKTDTSLELTIVKLKMNEK ......

DK DGT NL ELT L LELNRNEK ......DK DO NL E L T L L L ONOR E K.DKDSINLELTLLKLNRNEK....DOAGTNLEITIITLKRNEK ......

DPENINLELTVLTLDRNEK......SKNOI KLEI TVL KL DRNEK ......

d ks d g q * kx It n r n * It . .. .. .

YPNGGFTAEOI TKYSS.........AVKVRSTKVLOAPGYNGT

.GNOOFISASKSIVHPSYNSLTI...KIOKLKI AKVFKNSKYNSLTI..

.GT EOY VG VOKI V VHPY WNT DDVAAbbbbb bb bb bbb

E

SG NYVDTSC...TFPS

ESSH OVH

D.......T.......TYOHEKK. NDSTT

TOCEK8. NDOTIAl CSSSSTNOSTS

9 0

C 'A O L F CWDP DKGR O 11IRPKG RDOI: IVIK K.MA TKV T

AFNDT

FE NV F CSNFN NTSKFOKAF D, TOHFI KK.FN 'AM P DF tDIGGFVAKEFR NGFLAREFNRDI ROH PTF RGFI SED.F A ODI R RYI N

r D Ir k .kqEK LALVKFSP.G KO WA LAIKOI00N D L I K L KSN N O T L L K L T .

_ IOALLNRLAbb b b b

I 0 0 t 0 2 1

IF, LACKHFFT IK.. TKL RVEI VMDtRHLTNESD TT. SLYGI GFGPFIILLLENSSD SE.. RLFGI GFGPYIIGTAYLVP LF. AEKY KI MVD FGAHLFVVN VA. ETDWTAFKLKDVON:TLVONK MA...ESDNTSI VVNSOGD DWLLVPS AYKF. EKDYEMMEFYFNDTVVVLPR AM...PGK TIEMNOKDRWAVLPR A K .....PGPT L NDCDRWAVLPR AK.....PGPSI LMNDIDNVAI L PT AS . PGESI VI D KDRVCVI PT AO. PGDDVLVNGOD RF VV PT AD. POK E OV DD IdVP r a.. p g . s v v .g q

KDTLLTNK VVD. .ATHGDPHALKAFODI VLTAA CVSGSGNNTSI TAT GSGSOWVVSAA CYKS .....I OVRLGCENWVVTAA CGVT .. .. TSDVVVAGEONWVMTAA CVDRE. LTFRVVVGEbbbbbbb bbbbbbbb

C D

RRYYHOFDPANI Y. D

ITNKHL FRRNNOTL LVOIANOHL FRRNNGEL TIKRA MT DS DY RV FE F. E

VRHERHTVALN .... SVVTHARSTVKI L .. AlNRGGTYYSI SAGNVVIK DI E V L D A Y D L NOEVGVLDAKEL...VOEVGVLDAKEL ..... VO

EVE L DA KEA L E

OK IRVK DKYK L ........... VITT K VI DS Y D L V..Yq n v d I.

FPSAISDIN.O.DON..V VDL OSGS .........

ODNI NV V E........EF DOGSSSE.......EHNL NO. NN........

00 1 10 2 0oo 1

ELPSV FGADFLSCKYOEFSSFYEAOYN ADIKVRTKXECLTIODFPPF OKLKFREPPOEEI CLVTT FOTKSMSS S

DFPPFP...OKLKFROPTIKDNVCMVST NFOOKSVSSLVS..........ANMIA TPVVGVINNADVGRLIFS .. ..... G EALTYKDI VVCM

KDDFP ARNDTVTGI MNTGLAFVYS GNFLI GNOPVNTTGAAPVTGI NTDI PMMYT GTFLKAGVSVPVE

R. ALNRLATLVTTVNGTPML ........I SEGPLKMEEKATDYN EAVVVVNTSYYPOLFTCVG RVK DYGFLNLAG

......... EVEVN. EAVLAI NTSKFPNMYI PVG ....... OVTEYGFLNLGGEVEVN EAVLAI NTSKFPNMYI PVG OVTDYGFLNLGG

.........I TETN .. DGVLI VNTSKYPNMYVPVG ....... AVTEOOYLNLGG

......... LEGVD. ATLVVHSNNFTNTI LEVG .. P. T A LINPLSSEDDYP NCNLALLANOPEPTI INVG DVVSYGNI LLSG

... p v a v a 9 wg p v g n v 9p nI s 9NEONKH GEVVKPATMSNNAETOVNO TVTGYPGDKP

PIN.... OPTLKI A.. TTTAYNOSTFTVAGWGANRE. GGSOORYLLKANVPFVSDAASLNSNVASISLP..TSCASAGTOCLISGNSNTKSSGTSYPDVLKCLKAPILSNA A S F SOT V S A V C L P S A S D D F A A GT T CV T T NWO L T RY T N A N T P D R L OO A S L P L L S NSVTL NSYVOL GVL PRAGT LANNSPCYI TGWGL TNTN. GOLAOTLOOAYL PT VDY

bbbbb bbbb bbbbbbb b

a H

1 0 1 20 130 1 40 1 50 1 60

D

DG0G A D TN G K KG G S K0S ON

g..

N D E V

NG. AVN G. ON

6O

0 00C IVGVHVAGIO4I Lt NHSL THT

IV HTSAGGN

Lt 1HSAGSSLt |IHVAGG0

It VIVHVGGNCKVLSI HVGGNCSVLGtIH0000

KV I St H NOGSGKV FG11HVGGNGOVL HVG1NO

1 s N9 g nEVGI11H WGGVPOV|I VSWGYG

TV |IVTSFVSRb b b b b b *b

K

I 7 O 1 a O

GKI GCASLLPPLEPI AOAOI 0........

TT NNY F T SVP KN F MEL L TNO E A OO WVTNGSNYFVEFPEK FVATYL DAADGWCK ......

GVGYCSCVSRSMLLKMKAHIDPEPHHEIGGL AAATt TKELI EAAEKSMLALEPO/ GGI. AAASI VSOEMI NAVVNAFEPOI G.SI. LVAKLVTOEMFONI DKKI El S.

50 FAASLLNNRYFTAEO/H 0 GFSAALLKLNFL DEOK ..

4HGF.SFSAALLRHYFNEEOISH ...... GFAAALKRSYFTOSOG.O...G..RO GFSAOLKKOYFVEKOI O.l.RD..GFSAMLLRSYFTDVOI9q s a*v r v y a qN E FNOAVF NENVRNFLKONI EDI HF..CARP. GYPOVYTEVSTFASAI ASAARTL....CAOK. NKPOVYTKVCNYVSWI KOTI ASN..TCSTS. TPOVYARVTALVNWVOOTLAAN.

LOCNVT. RKPTVFTRVSAYI SNI IASN..b b bIbb b a a a a a a a a a a a a aL1

2 30 24 0

10 1 10 20 Box 1 30 40 Box 2 50

CSNRASLTSYYGPFGOOOGAAYV ...... GSYK IL L. TYADWEN ...... EVWOSY ..... ORDLLVTRVDAHC . DI RCNTTTROSITTMTNT.GAIWTTIRGSV ... CGDYRVVN SO TSADNON. CVWESTY ..... N LLVSTTTAHIC ............. DIIARCOLTPLSTKDLTTTOFGHONKAVYTT....AGYKtICOT LA. TODDLON....AVNVMN ...ISRI0LLVTESRAOT SI ARNNVILPSKGDI LGTTYGGHYT.GYKICNYLM.TPODDLN..VNVPLLVSFO|T0 PRCNT A VT RPI T T A. .GPSDMYVHV .....P.GSNMLI YRNL LF. NSEMHE .. .. .. S L OS Lt T N0T0V D .. .. ... .. D Y P S C D

9 9 t .C aV y k . 9 s t a .d n v 9 y . t t a h d a ,

..I AGGEAI TTG GSRCSLGFNVSVNVAHALTA CTS SSASW GTRTGTSFPNN oI NGRHSNPAAARVYTLYNGSYODI TTAGN

..I SGGDAI YSS ... TGRCSLGF VRS G STYYFLTA CTDGATGTW NSARTTVLGTTSGSSFPNN TD IVRTYT TTI PKDGTV.G ...0. TSAANAN IVGGI EYSI NN.. ASLCSVGFSVTRGATKGFVTAI CG. TVNATAR. GGAVVGTFAARVFPGN RAWVSLTSA TLLPRVAN. GSSFVT V RGSTE

b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b b bA a C O E F

5 12 27 34 37 47 57 62 64 72 a310 30 40 50 S7 60 00 90 100 102 110 120

s0 70 00 90 Box 100 110 120 ox 4 130 140*I 01_I

CRSGIYYCKSTAKHYPIVVTPPSITYKIEANDYYPERMOTHILLGIGFAEPODCOOLLRCEH. V ILTOGGO HV..GFADVDLLWIE DDAME OIG.

CNAGVYYCESRRKYYPVSFVGPTFOYMEANNYYPVRYOSMLI GHGFESPODCG LRCHH. GVIIITA GEGLV. AFSDI RDLYAYEEAMEOI G.

CTOATTTCKHEONRFPITVTSHDWYEIOESEYYPKHIOYNLLIGEGPCEPODOCOOLLCKH.GVtOIOTAGGDNHV..AFIDLRHFHC.AEEOiGc t V V y,y c 9 r k y P a 9 t P n e n Y y P q y g m 9 n 9 e p DN1]G GiIj c h V G t 19g 9 gfad m a q

AFVGOAVORSGSTTGLRSGSVTGLNATV..N0GSSGIVTGMIOTNVCAGP00DSSCSLFAGS.TALGLTS GSSSCRT.GGTTFYOPVTEALSATGATVLATVGMAVTRRGSTTGTHSGSVTALNAT V. NYGGGDVVTYGM IRTNVCAEPODSOGLLYST. RIGLLTSIGSGNCSS. GGTTFFOPVTEAL SVYGASVYAAVGAAVC NSGRTTG TOCGTITAKNVT0A.. S.O ST O0SWIS OO VMSGONVOSN.SORSSLFERLOPILSOTGLSLV__U~G .A .. .. iL

bIbb bb b b b bb b bbbb bbbb b bbIb b b bIb b bb b b b b b bIbb b bbb bb b a a a a a a aa H I J K L I

9 6 10 1 14 1 24 1 34 t 3 9 1 46 t S I 50a 9 I100 19 11 30 1 40 1 60 1 70 10a0 1 90 1 95 2 00 2 10 2 20 2 3 24 0

FIG. 2. Alignment of the viral Cys protease sequences with representative trypsin-like Ser protease sequences from mammals and bacteria.The numbering schemes of Fig. 1 are followed. (a) Viral 3c/large Ser protease alignment. (b) Picornavirus 2a/small bacterial Ser proteasealignment. Consensus (Cons) sequences highlight identical (uppercase) and most frequently occurring (lowercase) residues at any given position.The 12 core 3-strands (labeled A-L) and short C-terminal a-helix (labeled 1) characteristic of all trypsin-like Ser proteases are located underthe profiles. Gaps in the alignments map to length- and sequence-variable surface loops. The similarity boxes (labeled 1-4) of Fig. 1 encase the"conserved" catalytic triads of His, Asp, and Ser/Cys residues (#) as well as the few other identical matches (boxed and in bold). Otherchemically conserved residues are just in bold lettering. Notably, residues 189-190, 213-216, and 226 (labeled by @) are important in formingthe substrate binding pocket in large Ser proteases. Residue 213 in the 2a proteases is not a conserved His as was the case in the 3c and SAPenzymes. The * in the ALP sequence marks a small insertion (3).

a Vla, a

CPMV O24KT EV NIII a

TVMV NI aF MiDV_T ttE VE MCV 3oHAVB E V 3oCX V 3_EVe

H P V I 3oHRVt 4 3_H RV 2 _o

CoS A PS GTT RPC H TE L A

Soo StI,St r *nd

Chy a

T E V III aT V Lt V II aF MDV 3IeT M E V 3oE MlC V _3H A V Itse E V ItsCX V _3E V S So

H PV I 3eHRVI 4_ 3H RV 2 Its

ConSAPS0P

S CTT RPC H TE LA

S5. StS ta StndrSItraods

Cb y m a

C PMWV 24 KT EVNI a

T V tV- III aF MDV3aST M E V 3cE MCV 3H A V 3BE V o3CXV

H P V I _ 3oHRVI 4530H RV 2 _3 e

Co "S AS GTT RPC H TE LA

See-St r

Ch Yon aVI r al IF

by0 m

0EVV 2a

H PV I 2a|HRV1 40 2aHRV2 _ 2aC00 n

COOP A

aEa0.2

A L P

5t ra| tr

Som 2InC h y m

Vl r el

B E V_ 2a|CXV 2az

HRV1 42HR V 2 2a|C00 n

00000P,LS ISSm

IChy .

1 3 0 1 4 0 1 5 0

fNKVSRYL EY ... EAPTI PEDC SLVI AH GS DG0I FWKHW.I.. OTKD CSPLVSTNCE DT S F WOHW. T T K D CSPLV S I CDTMPGLFAY RAAT KA Y CGGAVL AK G.ACFNCLH. .... NAOTN COSAO CNVNOTFNHCI HY.. KANTRK CGCSALLADLGrvDLTVDOAW..RGKOGEGLP MCGIGALVSSNCTPTKVL0 MY.M EFPTKA CO4CVIGVI SMTPTK 'IML . FPTFtAG C450LMST 0TPTKNLMT. .... NFPTRA OCOSVLMSTOROTARI LMY .. . N F P T R A C IGVI T C TTPTNRMI RY . DYATKT CGSVLCATGTNOTARMLKY . SY PT K S YCIOVLYKIG.1. I.rcl0. ...... pka9q Ggpls9lg.rTLKGEAMOY. ... DLSTTG NSGSPVFNEKNANEE A D S G P FtRKiANEKICAGTPDTGGVDTCO DSSPVMFCKDIITSNMFCAGYLO. GGKDSCM D S PVVCS.KDAMI CAGA. SGVSSCM DSCOPLVCKKNiKNSMVCAGGD. SVNG C DC OPLHCLVI

bbbbb ** bb bbbbI J

4 0 5 0 so0

I 6 4 0o 7 0

2 2 0

Page 4: Viral cysteine proteases are homologous to the trypsin-like family of

Proc. Natl. Acad. Sci. USA 85 (1988) 7875

homologous sequences from the data bank. The chymotryp-sinogen numbering scheme (3) was adopted for all cellularand viral sequence referrals (Figs. 1 and 2).The alignment in Fig. 2a illustrates the evolutionary rela-

tionship between the 3c viral proteases and the large-trypsinfamily of proteins. While the overall degree of amino acididentity is poor, blocks of significant similarity surround thethree active site residues (boxes 1-4). These regions formeither the hydrophobic core of the 8-barrels or the loops thatdefine the active site and binding pocket of the enzyme. Gapsthat must be introduced to align the shorter 3c proteases mapto sequence and length-variable surface loops (3) connectingthe 12 p-strands of trypsin. One loop stands out regardingboth conservation in length and acidic amino acid composi-tion in both viral and cellular enzymes: residues 70-79 havebeen shown to bind calcium in eukaryotic trypsin molecules(2). The proposed D-E loop in viral proteases may also bindc2 +.Ca2On the basis of known three-dimensional structures, we

can identify the residues that will form the specificity (Si)pocket in the 3c proteases. In particular, residues 189-190 ofthe large-trypsin class are located at the bottom of the S1pocket and are a major determinant of specificity (2,3). Thesepositions are typically Ala/Pro-Thr in the viral 3c proteases,similar to the Ser-Thr pairing of SAP (32). Residues 216 and226 (which lie on opposite sides of the pocket) are wellconserved in character between the viral and cellular en-zymes. The striking differences between the viral 3c andtrypsin-like enzymes (besides the active site nucleophile)map to residues 213 and 215, respectively positioned at theside and top of the specificity pocket (2). Residue 213 istypically a small hydrophobic amino acid and residue 215 alarge aromatic amino acid in the sequenced trypsin homologs(3). The 3c proteases have instead a conserved His-213 andAla/Gly at position 215. Molecular modeling of this pair ofresidues in the pocket of a trypsin-inhibitor complex struc-ture (2) reveals possible hydrogen-bonding interactions be-tween viral His-213, Thr-190, and the S1-bound Gln sidechain (not shown). Accurate positioning of the His-213 ringmay be made possible by the Tyr-228 -* Leu viral mutation.We postulate that residues in this collection are the primarydeterminants of the Gln-Gly cleavage specificity of the 3cproteases. These amino acids are different in all knowntrypsin-like enzymes (3) with the exception of SAP (32),which has a homologous Thr-190/His-213/Gly-215/Val-228complement of residues. This bacterial Ser protease recog-nizes a Glu in the S1 pocket (32).The 2a picornaviral proteases are aligned with three small

trypsin-like Ser proteases of bacterial origin in Fig. 2b.Tertiary structures have been determined by x-ray crystal-lography for SGPA-B and ALP (33). These bacterial enzymesshow clear core homology with the large trypsin-like class,differing significantly only in the economy and surfacepacking of the loops connecting the p-strands (42). The poorsequence identity (typically -15%) compounded with thedifferences in chain length made alignments between bacte-rial and pancreatic enzymes difficult prior to structuraldeterminations (33, 42). Fig. 2b shows that there is asignificant degree of sequence identity in the active siteresidue boxes between bacterial and viral 2a proteases (10identities in the alignment of 8 sequences). Identical residuesmap to the interface between p-barrel domains in a smallbacterial structure and are catalytically important (such asHis-57 and Asp-102 accompanying Cys-195), structurallyvital glycines (such as the tightly buried Gly-211, or Gly-216with stressed dihedral angles), or particularly functional, asin the case of the buried Tyr-171 that hydrogen bonds toresidue 214 (a bacterial Ser, viral Thr, that interacts withAsp-102) (2). These residues are highlighted on the a-carbonframe of the SGPA crystal structure (43) in Fig. 3. Gaps in the

2a protease alignment can be accommodated by shorteningloops in the three bacterial structures. We can identifyspecificity-determining residues in the 2a proteases by com-parison to the homologous SGPA, SGPB, and ALP struc-tures. The substrate-binding pocket in these small bacterialproteases is not well developed and can be better describedas a groove on the surface of the enzyme (42). SGPA andSGPB are characterized as having a chymotryptic specificityfor aromatic residues (42). The 2a proteases are similar toSGPA and B in their specificity for Tyr at the S1 site (22).

DISCUSSIONInspection of the blocks of sequence similarity in Figs. 1 and2 suggests that the 2a and 3c viral proteases are structurallysimilar, from which we infer that they diverged from acommon ancestor. Their separate alignment with small bac-terial and large trypsin-like proteases, respectively, serves toemphasize the parallel evolutionary relationship with the twohomologous classes of trypsin-like molecules. A correctalignment (as regards structure) between the 2a and 3cclasses will be best attained by structural superposition ofspatially equivalent residues that will require large gaps in thesmaller 2a sequences. In lieu of the actual crystal structures,this structural alignment is made possible by the separategrouping of the 3c and 2a classes of viral proteases with thelarge and small trypsin-like molecules (Figs. 1 and 2). Weobserve that the 2a/3c viral proteases are both 35-45 aminoacids shorter than their corresponding small/large cellularhomologs (Fig. 1). By conventional reasoning (42), the viralmolecules could be judged to antedate the cellular moleculesin evolutionary history. However, the greater economy of

FIG. 3. Illustration of important residues in an alignment of smallviral/cellular proteases mapped onto the a-carbon framework ofSGPA (43). Residues of the catalytic triad (residues 57, 102, and 195),and conserved amino acids at positions 211, 214, 216, and 171 are farapart in sequence but contiguous in space. Residue 142 is a conservedLys/Arg in the 2a proteases, and is located in a position to interactwith the conserved Asp-102 in a manner analogous to the bacterialion pair of Arg-138-Asp-102 (42). Loop residue 66 in SGPA is the siteof a seven-amino acid insertion in SGPB. The corresponding D-Eloop in viral 2a proteases is similar to that in SGPA. The E-F loopis abbreviated in the viral proteases in a manner not seen in anybacterial protein and requires the joining of marked residues 91-94.Residues 122-144 denote an abbreviated viral F-G loop (the hingebetween p-barrel domains), hypervariable in sequence and length inknown Ser enzymes. Both of these loops are spatially distant fromthe conserved active site in the viral 2a structural model.

Biochemistry: Bazan and Fletterick

Page 5: Viral cysteine proteases are homologous to the trypsin-like family of

7876 Biochemistry: Bazan and Fletterick

viral proteases mayjust be a reflection of the tightly budgetedviral genome.

Mechanistically the His-57/Asp-102/Cys-195 active sitegeometry of the viral proteases may function homologouslyto that of trypsin and analogously to that ofpapain. The activesite geometry of the catalytic His-159/Asn-175/Cys-25 triadof papain is surprisingly similar to that of the trypsin/subtilisin groups (44) but displays no other structural simi-larity in the rest of the protein (1, 44). We emphasize that thetrypsin, subtilisin, and papain families of enzymes haveindependently evolved a similar active-site constellation offunctional residues but share no common ancestor (1, 2). Theviral 3c protease model is further inconsistent with a subtil-isin or papain secondary and tertiary structure.

Experimental simulations of the papain triad in a trypsin-like Ser protease have used direct chemical modification (45)or mutagenic (46) means to change the active site Ser to aCys. Higaki et al. (46) expressed a mutant trypsin with aSer-195 -- Cys change, producing an enzyme less active thannative trypsin by a factor of 105. This mutant theoreticallymimics the 3c viral Cys protease active site. However, theCys-195 trypsin mutant may be a poorer enzyme because theelectronic charge distribution of the native active site isevolutionarily designed for Ser-195 as nucleophile. Theobserved viral modification of the nucleophilic Ser -+ Cysmay be accompanied by other changes that contribute to anew active site environment.

This study proposes a compelling model for the tertiaryfold and active-site geometry of the viral Cys proteases thatclearly differentiates them from the papain-like class ofcellular Cys proteases. Previous analyses (16-19, 22, 24)have focused on two other conserved viral residues (besidesthe putative active-site Cys) as important to catalysis. Wehave argued that His-213 plays an important specificity-determining role in the 3c viral proteases and bacterial SAP.His-203 (conserved in the 2a class) is predicted to be in asurface loop connecting strands J and K (Fig. 3). Others (47)have postulated that (p/,a-type trypsin) Ser enzymes and(a + ,-type papain) Cys enzymes are ancestrally relatedbecause of a limited sequence similarity in the vicinity ofbothCys-25 (papain) and Ser-195 (trypsin) to some members ofthe3c viral class of proteases. A comparison of the determinedcrystal structures of papain and trypsin enzymes (1, 44) doesnot support this proposal.Asp active-center proteases from retroviruses have re-

cently been shown to be evolutionarily related to the cellularAsp protease family (48). Molecular modeling of the humanimmunodeficiency virus protease by using the x-ray coordi-nates of bacterial pepsin structures has resulted in a usefulstructure for the design of therapeutic inhibitors (48). Anal-ogously, the twin "small" and "large" trypsin-based struc-tural models proposed for the 2a and 3c classes of proteasesare sufficiently detailed to allow testing prior to actualstructural determination. We expect that analysis of theseviral "Cys trypsins" will provide valuable information on theevolution of catalytic function and protein structure in thetrypsin superfamily.

We thank Drs. C. Craik, T. Jukes, J. Higaki, and C.-B. Stewart forstimulating discussions and critical readings of this manuscript.J.F.B. acknowledges the lead of L. Evnin in setting in motionexperiments to test the viral Cys protease structural model. Thesupport of Tina S. Bazan (to J.F.B.) is gratefully recognized.

1. Neurath, H. (1984) Science 224, 350-357.2. Kraut, J. (1977) Annu. Rev. Biochem. 46, 331-358.3. Craik, C. S., Rutter, W. J. & Fletterick, R. J. (1983) Science 220, 1125-

1129.4. Taylor, W. R. (1986) J. Mol. Biol. 188, 233-258.

5. Bashford, D., Chothia, C. & Lesk, A. M. (1987) J. Mol. Biol. 196, 199-216.

6. Skern, T., Sommergruber, W., Blaas, D., Gruendler, P., Fraundorfer, F.,Pieler, C., Fogy, I. & Kuechler, E. (1985) Nucleic Acids Res. 13, 2111-2126.

7. Stanway, G., Hughes, P. J., Mountford, R. C., Minor, P. D. & Almond,J. W. (1984) Nucleic Acids Res. 12, 7859-7875.

8. Stanway, G., Hughes, P. J., Mountford, R. C., Reeve, P., Minor, P. D.,Schild, G. C. & Almond, J. W. (1984) Proc. Nadl. Acad. Sci. USA 81,1539-1543.

9. Werner, G., Rosenwirth, B., Bauer, E., Seifert, J.-M., Werner, F.-J. &Besemer, J. (1986) J. Virol. 57, 1084-1093.

10. Lindberg, A. M., Stalhandske, P. 0. K. & Pettersson, U. (1987) Virol-ogy 156, 50-63.

11. Earle, J. A. P., Skuce, R. A., Fleming, C. S., Hoey, E. M. & Martin,S. J. (1988) J. Gen. Virol. 69, 253-263.

12. Najarian, R., Caput, D., Gee, W., Potter, S. J., Renard, A., Merrywea-ther, J., Van Nest, G. & Dina, D. (1985) Proc. Nail. Acad. Sci. USA 82,2627-2631.

13. Palmenberg, A. C., Kirby, E. M., Janda, M. R., Drake, N. I., Potratz,K. F. & Collett, M. C. (1984) Nucleic Acids Res. 12, 2969-2985.

14. Pevear, D. C., Calenoff, M., Rohzon, E. & Lipton, H. L. (1987)J. Virol.61, 1507-1516.

15. Forss, S., Strebel, K., Beck, E. & Schaller, H. (1984) Nucleic Acids Res.12, 6587-6601.

16. Argos, P., Kamer, P., Nicklin, M. J. H. & Wimmer, E. (1984) NucleicAcids Res. 12, 7251-7267.

17. Franssen, H., Leunissen, J., Goldbach, R., Lomonossoff, G. & Zimm-ern, D. (1984) EMBO J. 3, 855-861.

18. Domier, L. L., Shaw, J. G. & Rhoads, R. E. (1987) Virology 158,20-27.19. Lomonossoff, G. P. & Shanks, M. (1983) EMBO J. 2, 2253-2258.20. Domier, L. L., Franklin, K. M., Shahabuddin, M., Hellmann, G. M.,

Overmeyer, J. H., Hiremath, S. T., Siaw, M. F. E., Lomonossoff,G. P., Shaw, J. G. & Rhoads, R. E. (1986) Nucleic Acids Res. 14, 5417-5430.

21. Allison, R., Johnston, R. E. & Dougherty, W. G. (1986) Virology 154,9-20.

22. Wellink, J. & van Kammen, A. (1988) Arch. Virol. 98, 1-26.23. Hanecak, R., Semler, B. L., Ariga, H., Anderson, C. W. & Wimmer, E.

(1984) Cell 37, 1063-1073.24. Toyoda, H., Nicklin, M. J. H., Murray, M. G., Anderson, C. W., Dunn,

J. J., Studier, F. W. & Wimmer, E. (1986) Cell 45, 761-770.25. Garcia, J. A., Schrijvers, L., Tan, A., Vos, P., Wellink, J. & Goldbach,

R. W. (1987) Virology 159, 67-75.26. Carrington, J. C. & Dougherty, W. C. (1987) J. Virol. 61, 2540-2548.27. Ivanoff, L. A., Towatari, T., Ray, J., Korant, B. D. & Petteway, S. R.

(1986) Proc. Nail. Acad. Sci. USA 83, 5392-5396.28. Konig, H. & Rosenwirth, B. (1988) J. Virol. 62, 1243-1250.29. Korant, B. D., Brzin, J. & Turk, V. (1985) Biochem. Biophys. Res.

Commun. 127, 1072-1076.30. Sidman, K. E., George, D. G., Barker, W. C. & Hunt, L. T. (1988)

Nucleic Acids Res. 16, 1869-1871.31. Read, R. J., Brayer, G. D., Jurasek, L. & James, M. N. G. (1984)

Biochemistry 23, 6570-6575.32. Drapeau, G. R. (1978) Can. J. Biochem. 56, 534-544.33. Delbaere, L. T. J., Brayer, G. D. & James, M. N. G. (1979) Can. J.

Biochem. 57, 135-144.34. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Jr.,

Brice, M. D., Rogers, J. R., Kennard, O., Shimanouchi, T. & Tasumi,M. (1977) J. Mol. Biol. 112, 535-542.

35. Devereaux, J., Haeberli, P. & Smithies, 0. (1984) Nucleic Acids Res. 12,387-395.

36. Gribskov, M., Homyak, M., Edenfield, J. & Eisenberg, D. (1988) Comp.Appl. Biol. Sci. 4, 61-66.

37. Martinez, H. (1988) Nucleic Acids Res. 16, 1683-1691.38. Needleman, S. & Wunsch, C. (1970) J. Mol. Biol. 48, 443-453.39. Garnier, J., Osguthorpe, J. D. & Robson, B. (1978) J. Mol. Biol. 120, 97-

120.40. Finer-Moore, J. & Stroud, R. M. (1984) Proc. Natl. Acad. Sci. USA 81,

155-159.41. Cohen, F. E., Abarbanel, R. M., Kuntz, I. D. & Fletterick, R. J. (1986)

Biochemistry 25, 266-275.42. James, M. N. G. (1976) in Proteolysis and PhysiologicalRegulation, eds.

Ribbons, D. W. & Brew, K. (Academic, New York), pp. 125-142.43. Sielecki, A. R., Hendrickson, W. A., Broughton, C. G., Delbaere,

L. T. J., Brayer, G. D. & James, M. N. G. (1979) J. Mol. Biol. 134, 781-793.

44. Argos, P., Garavito, R. M., Eventoff, W., Rossmann, M. G. & Branden,C. I. (1978) J. Mol. Biol. 126, 141-158.

45. Yokosawa, H., Ojima, S. & Ishii, S. (1977) J. Biochem. 82, 869-876.46. Higaki, J. N., Gibson, B. W. & Craik, C. S. (1987) Cold Spring Harbor

Symp. Quant. Biol. 52, 615-621.47. Gorbalenya, A. E., Blinov, V. M. & Donchenko, A. P. (1986) FEBS

Lett. 194, 253-257.48. Pearl, L. H. & Taylor, W. R. (1987) Nature (London) 329, 351-354.

Proc. Natl. Acad. Sci. USA 85 (1988)