
Journal of Biomolecular Structure & Dynamics, ISSN 0739-1102
Volume 19, Issue Number 2, (2001)
©Adenine Press (2001)

Cytochrome P450 Pattern Revision

www.adeninepress.com

Alexander G. Buchatskii*, Konstantin Yu. Kazachenko and Alexander A. Alexandrov
Institute of Molecular Genetics, Russian Academy of Science, Kurchatov sq., Moscow 123182, Russia

*Author to whom correspondence should be addressed.
Phone: (095) 196-02-01; Fax: (095) 196-02-21; E-mail: [email protected]

Abstract

The pattern proposed for the structure-function superfamily of cytochromes P450 is constructed by combining conserved amino acid motifs. The sizes of cytochromes P450 were estimated from their length, and empirical coefficients reflecting peculiarities of the primary structure of these enzymes were calculated. On the basis of this combination of parameters, we propose an approach for assigning novel protein sequences to the superfamily. Using our pattern, a number of hypothetical proteins from the international databases were assigned to the cytochromes P450.

Introduction

Cytochromes P450 form a very diverse group of haemoproteins, which differ in the chemical reactions they catalyse and in the number of substrates involved. Members of the superfamily are found in most prokaryotic and eukaryotic organisms and in almost every cell type. They are involved in the metabolism of endogenous compounds such as steroids, retinoids, fatty acids, eicosanoids, prostaglandins, thromboxanes, leukotrienes and other lipid metabolites, as well as of many xenobiotics, including drugs, procarcinogens and pesticides (1-3). The regulation of their biological activity has attracted investigators from a diverse set of disciplines, including biochemistry, endocrinology, oncology, toxicology and pharmacology.

Cytochromes P450 are among the most intensively studied enzymes, not only because of their physiological importance but also because of their peculiar structure and mechanism of action. All P450s appear to adopt a similar structural fold, yet different isoforms frequently share only 12-20% sequence identity (4, 5). This hampers their analysis and the identification of sequence homology (6), thus disabling one of the most powerful research tools (7). Under such circumstances it is convenient to have a pattern of conserved amino acids that occupy specific relative positions in the polypeptide chain. Such an approach could also contribute to a deeper understanding of P450 sequence-function relationships in general, as well as of the relationship between amino acid composition and protein structure (8).

The aim of this work is to establish a pattern that automatically identifies proteins to be considered members of the P450 superfamily. Information on superfamily membership will help in interpreting the likely functional properties and structure of new sequences and in proposing how mutations of particular residues may have affected function.

Materials and Methods

The sequences known as cytochromes P450 were extracted from the primary structure databases SWISS-PROT (version of August 2000) and PIR (version of June 2000) (9, 10). The initial pattern was taken from the PROSITE database of specific sites (version of September 2000) (11). The pattern search was carried out on the basis of a multiple sequence alignment procedure (12). Because of their low identity, the sequences of different isoforms were aligned separately until they overlapped with the rest of the alignments; in some cases manual fitting was used. The highly conserved amino acids were then selected, and the gaps, permissible only within specific ranges, were determined. The combination of these elements that could be extracted automatically with suitable sensitivity and high power to discriminate P450s from non-P450s was then taken as the pattern. To achieve perfect selectivity, the P450 sizes and the specific coefficients calculated here were added to the pattern. As recommended in (11), the sequences were used in their amino acid (rather than nucleotide) form. Hypothetical proteins were found in the TREMBL databank (version of July 2000) (9).

The division into N- and C-termini is made relative to the haem-binding Cys: the segment running from this Cys toward the C-terminus is designated the C-terminus, and the segment running from the Cys toward the N-terminus is designated the N-terminus. Positions are likewise numbered from the invariant Cys, with "+" toward the C-terminus and "-" toward the N-terminus.
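The numbering convention can be illustrated with a short Python sketch. It assumes 0-based string indices and that the Cys itself belongs to neither terminus (the text does not say on which side, if any, it is counted); the helper names are hypothetical. The test sequence is the fragment of the hypothetical Arabidopsis thaliana protein listed in the Results, whose haem-binding Cys sits at index 70 of the printed fragment.

```python
from typing import Tuple

# Fragment of the hypothetical 43.9 kD Arabidopsis thaliana protein, as printed in the Results.
FRAGMENT = "EALRCGNIVKTVHRKATHDIKFNEYVIPKGWKVFPIFTAVHLDPSLHENPFEFNPMRWTKTTAFGGGVRVCPGGELGKLQIAFF"
CYS_INDEX = 70  # 0-based index of the haem-binding Cys (the Cys at index 4 is not the ligand)


def relative_position(index: int, cys_index: int) -> int:
    """Signed position of a residue: '+' toward the C-terminus, '-' toward the N-terminus."""
    return index - cys_index


def residue_at(seq: str, cys_index: int, offset: int) -> str:
    """Return the residue at a signed offset from the haem-binding Cys."""
    i = cys_index + offset
    if not 0 <= i < len(seq):
        raise IndexError(f"position {offset:+d} lies outside the sequence")
    return seq[i]


def split_termini(seq: str, cys_index: int) -> Tuple[str, str]:
    """Split into (N-terminus, C-terminus) around the Cys; the Cys is excluded (assumption)."""
    return seq[:cys_index], seq[cys_index + 1:]


if __name__ == "__main__":
    for offset in (-7, -2, +2):
        print(f"{offset:+d}: {residue_at(FRAGMENT, CYS_INDEX, offset)}")
    n_term, c_term = split_termini(FRAGMENT, CYS_INDEX)
    print(len(n_term), len(c_term))
```

Run as a script, it prints F, R and G for positions -7, -2 and +2 (the Phe at -7 and Gly at +2 discussed in the Results), followed by 70 and 13, the lengths of the N- and C-terminal parts of this fragment.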

Results and Discussion

Information on segments of amino acid similarity located in functionally important parts of proteins is always essential, and when proteins with a low level of similarity are compared, it is of utmost importance. If the functionally important amino acids are similar, the proteins can be considered genuinely similar; if there is no such similarity, one can assume that the resemblance between the proteins is insignificant or that they perform quite different functions. As a rule, such data are hard to formalize, and recording them requires the investigator's participation.

For P450s, a typical motif in the haem-binding segment was previously identified as a signature (11) (the cysteine is the fifth ligand of the haem iron):

[FWK]-[GSNH]-x-[GD]-x-[RHKTP]-x-C-[LIVMFAP]-[GAD],

where "x" is any amino acid and square brackets enclose the set of amino acids allowed at that position. However, this pattern needs to be supplemented and modified because the superfamily is growing like an avalanche. Taking into account the amino acid sequences now available in the databases, some additional amino acids must be added to the pattern:

[FWKYPL]-x(0,1)-[GSNHTAQL]-x-[GDAER]-x-[RHKTPSN]-x-C-[LIVMFAP]-[GAD]

Note also that the amino acid at the beginning of the pattern may be shifted by one position toward the N-terminus, which is what the x(0,1) element allows. The pattern thus becomes "blurred" and is therefore not useful as a recognition method on its own; on the other hand, these ambiguities indicate that the amino acids forming the micro-surroundings of the haem in this segment are interchangeable.
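The bracket-and-x notation used above is PROSITE pattern syntax, which maps directly onto regular expressions. The following Python sketch translates the original and the extended signatures into regular expressions and checks them against the haem-binding segment of human prostacyclin synthase taken from the sequence listed in the Results. The function name `prosite_to_regex` is ours, and only a subset of the syntax is handled: the elements used in this paper plus {..} exclusions and the < and > terminal anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supported subset: x, single residues, [..] (allowed set), {..} (forbidden set),
    repeats (n) and (n,m), a leading '<' and a trailing '>' anchor.
    En-dashes, spaces and a terminal '.' are tolerated.
    """
    body = pattern.replace("\u2013", "-").replace(" ", "").rstrip(".")
    element_re = re.compile(
        r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
    )
    pieces = []
    for element in filter(None, body.split("-")):  # skips empty items from a doubled '--'
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_anchor, core, low, high, c_anchor = m.groups()
        if core == "x":
            atom = "."
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"
        else:
            atom = core  # single residues and [..] sets read the same in regex syntax
        if low is not None:
            atom += "{" + (low if high is None else low + "," + high) + "}"
        pieces.append(("^" if n_anchor else "") + atom + ("$" if c_anchor else ""))
    return "".join(pieces)


OLD = "[FWK]-[GSNH]-x-[GD]-x-[RHKTP]-x-C-[LIVMFAP]-[GAD]"
NEW = "[FWKYPL]-x(0,1)-[GSNHTAQL]-x-[GDAER]-x-[RHKTPSN]-x-C-[LIVMFAP]-[GAD]"

if __name__ == "__main__":
    old_re, new_re = prosite_to_regex(OLD), prosite_to_regex(NEW)
    print(old_re)
    print(new_re)
    # Haem-binding segment of prostacyclin synthase (Homo sapiens), from the Results.
    segment = "NMPWGAGHNHCLGRSY"
    print(bool(re.search(old_re, segment)), bool(re.search(new_re, segment)))
```

The original signature rejects this segment because its residue at position -2 is Asn, whereas the extended set [RHKTPSN] accepts it. The x(0,1) element becomes `.{0,1}`, which is what lets the leading residue sit one position further toward the N-terminus.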

Most of these variants occur rarely, sometimes in only a single member of the superfamily, and their functional meaning is not quite clear. Up to 3.5% of such singular cases (13) might be due to sequencing errors or, in some cases, translation errors. It is noteworthy that the factors determining residue conservation in the P450 superfamily are mainly structural. The amino acids flanking the haem-binding motif are involved in forming β-turns (14), which make up the so-called Cys-pocket. The amino acid at position +2 from the Cys toward the C-terminus (as a rule, Gly) allows a sharp turn from the Cys-pocket into the L-helix. The Phe that dominates at position -7 completes the hydrophobic shielding of the Cys-iron bond, similar to the tryptophan adjacent to the disulfide bond in the immunoglobulin superfamily (15). In addition, the hydrophobic envelopment of the haem has been shown to be important in establishing the redox potential of the haem iron (16). In structurally characterized P450s, the residue at position -2 is likely to participate in coordinating the negatively charged propionate group of the haem. However, this position is not always occupied by a positively charged or polar amino acid from the set [RHKTPSN]; the CYP7 family is one example. In such cases other types of bonding are probably used (e.g., a hydrogen bond to the propionate mediated by water molecules).

We have also identified a stretch of relatively conserved amino acids in the L-helix region adjoining the haem-binding motif on its C-terminal side (14):

…-C-[LIVFMAP]-[GAD]-x-x-[LIVMFYAP]-[AGSTPVD],

where the first of the two added positions holds a hydrophobic residue and the second a small one. However, we were unable to include in this site one more amino acid that is usually considered conserved (Glu) (17): its variability at this position proved too pronounced for use in the pattern matching procedure.

Another motif that attracted our attention lies 58-112 residues from the haem-binding Cys, on its N-terminal side. It is more strongly conserved than the other E/D-x-x-R blocks often dispersed through the N-terminal part of P450s. The region contains the Glu and Arg of the K-helix, which form a set of salt-bridge interactions and hydrogen bonds with the meander region on the proximal face of the enzyme. Disruption of this folding motif leads to loss of haem binding and of P450 enzymatic activity (16).

Thus, the revised cytochrome P450 pattern finally takes the following form:

[ED]-x-x-R-x(50,105)-[FWKYPL]-x(0,1)-[GSNHTAQL]-x-[GDAER]-x-[RHKTPSN]-x-C-[LIVFMAP]-[GAD]-x-x-[LIVMFYAP]-[AGSTPVD].
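As a check on the revised pattern, the Python sketch below hand-translates it into a regular expression and applies it to the four sequence fragments listed at the end of this section, after removing alignment gaps. The translation, the capture groups and the helper `clean` are ours. Because the fragments are partial sequences, only the pattern is tested here, not the length or the coefficients.

```python
import re

# Revised pattern, hand-translated into Python regex syntax:
# x -> ".", x(50,105) -> ".{50,105}", x(0,1) -> ".?"; groups capture the Arg and the Cys.
REVISED = re.compile(
    r"[ED]..(R).{50,105}"
    r"[FWKYPL].?[GSNHTAQL].[GDAER].[RHKTPSN].(C)"
    r"[LIVFMAP][GAD]..[LIVMFYAP][AGSTPVD]"
)
# Original PROSITE haem-binding signature as quoted in the text.
PROSITE_SIGNATURE = re.compile(r"[FWK][GSNH].[GD].[RHKTP].C[LIVMFAP][GAD]")

# Fragments exactly as printed in the Results ('-' marks an alignment gap).
FRAGMENTS = {
    "allene oxide synthase":
        "ETLRIEPPVALQYGKAKKDFILESHEAAYQVKEGEMLFGYQPFATKDP-KIFD "
        "RPEEFVADRFVGEGVKLMEYVMWSNGPETETPSVANKQCAGKD-FVVMAA",
    "prostacyclin synthase":
        "ESLRLTAAPFITREVVVDLAMPMADGREFNLRRGDRLLLFPFLSPQRDPEIYD "
        "PEVFKYNRFLNPDGSEKKDFYKDGKRLKNYNMPWGAGHNHCLGRSYAVNS",
    "hypothetical 43.9 kD protein":
        "EALRCGNIVKTVHRKATHDIKFNEYVIPKGWKVFPIFTAVHLDPSLHENPFEFNPMRWTKTTAFGGGVRVCPGGELGKLQIAFF",
    "tRNA-specific adenosine deaminase 2":
        "DTCTLVPKNNSAAGYESIPGILRKEAIMLLRYFYVRQNERAPKPRSKSDRVLDKNTFPPMEWSKYLNEEAFIETFGDDYRTCFANKVDLSSNSVD",
}


def clean(seq: str) -> str:
    """Remove alignment gaps and whitespace so that only residues remain."""
    return re.sub(r"[\s-]", "", seq)


if __name__ == "__main__":
    for name, raw in FRAGMENTS.items():
        seq = clean(raw)
        m = REVISED.search(seq)
        old = PROSITE_SIGNATURE.search(seq) is not None
        if m:
            distance = m.start(2) - m.start(1)
            print(f"{name}: revised pattern matches, Arg-to-Cys distance {distance}, PROSITE signature {old}")
        else:
            print(f"{name}: no match to revised pattern, PROSITE signature {old}")
```

Run as a script, it reports matches for the three P450-related fragments, with Arg-to-Cys distances of 87, 90 and 67 residues (inside the 58-112 interval stated above), and no match for the adenosine deaminase fragment, which the quoted PROSITE signature does accept. Conversely, the quoted signature misses the allene oxide synthase and prostacyclin synthase fragments.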

The length of P450s stated in PROSITE (400 to 530 amino acids) also requires correction, since some full-length members of the superfamily fall outside this range. The smallest (338 amino acids) is the cytochrome P450 homolog F6G17.50 from Arabidopsis thaliana; the largest (591 amino acids) is a fatty acid ω-hydroxylase from Homo sapiens (EC 1.14.15.-).

The critical parameters take into account not only the amino acids and their arrangement relative to each other, but also their relative location in the polypeptide chain. The coefficient Kc is defined as the ratio of the length of the C-terminus to that of the N-terminus (Kc = ΔC/ΔN) and can take values from 0.06 to 0.3. Although all single-domain full-length proteins (and often rather extended fragments) fall within this range, they are not distributed evenly: in most cases Kc lies between 0.1 and 0.2. The Arg of the E/D-x-x-R motif described above often lies at approximately two-thirds of the length of the P450 molecule. The coefficient KR, calculated as the ratio of the position of this Arg in a given enzyme to the length of that enzyme in amino acids, lies in the interval 0.56 to 0.78.
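Both coefficients reduce to simple ratios, as in the Python sketch below. It assumes 1-based residue positions and, as above, that the haem-binding Cys is counted in neither terminus; the function names and the example positions are hypothetical and do not describe any particular enzyme.

```python
def kc(length: int, cys_pos: int) -> float:
    """Kc = dC/dN around the haem-binding Cys at 1-based position cys_pos.

    dN counts residues before the Cys and dC residues after it; the Cys itself
    is assigned to neither terminus (an assumption; the text does not specify).
    """
    delta_n = cys_pos - 1
    delta_c = length - cys_pos
    return delta_c / delta_n


def kr(length: int, arg_pos: int) -> float:
    """KR = 1-based position of the E/D-x-x-R Arg divided by the protein length."""
    return arg_pos / length


def in_range(value: float, low: float, high: float) -> bool:
    """Inclusive range check used for the thresholds quoted in the text."""
    return low <= value <= high


if __name__ == "__main__":
    # Illustrative numbers only: a 500-residue protein, Cys at 450, Arg at 350.
    k_c, k_r = kc(500, 450), kr(500, 350)
    print(f"Kc = {k_c:.3f}, KR = {k_r:.2f}")
    print(in_range(k_c, 0.06, 0.3), in_range(k_r, 0.56, 0.78))
```

For these numbers ΔN = 449 and ΔC = 50, so Kc is about 0.111 and KR is 0.70, both inside the stated intervals. Counting the Cys on either side instead would shift Kc only in the third decimal place for a protein of this size.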

Thus, a protein of unknown structure and function is classified as a member of the P450 superfamily purely on the basis of its amino acid sequence, taking into account the whole set of parameters stated above: the conserved amino acids of the cytochrome motif, the length of the sequence (330 to 600 amino acids) and the coefficients Kc and KR. Among all 88,166 proteins in the SWISS-PROT databank, the P450 enzymes are the only ones that satisfy this complete set of parameters.
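The combined decision rule can be sketched in Python as follows. The thresholds are those stated in the text. Using only the leftmost pattern match, excluding the Cys from both termini and the function name `looks_like_p450` are our assumptions, so this is not the authors' program. The test sequence is synthetic: the hypothetical Arabidopsis fragment from the Results padded with alanines to 400 residues.

```python
import re

# Revised pattern, hand-translated into a Python regex; groups capture the Arg and the Cys.
P450_PATTERN = re.compile(
    r"[ED]..(R).{50,105}"
    r"[FWKYPL].?[GSNHTAQL].[GDAER].[RHKTPSN].(C)"
    r"[LIVFMAP][GAD]..[LIVMFYAP][AGSTPVD]"
)
LENGTH_RANGE = (330, 600)  # amino acids
KC_RANGE = (0.06, 0.3)     # Kc = dC/dN
KR_RANGE = (0.56, 0.78)    # KR = Arg position / length


def looks_like_p450(seq: str) -> bool:
    """Apply all criteria in turn: length, revised pattern, Kc and KR."""
    seq = seq.upper()
    length = len(seq)
    if not LENGTH_RANGE[0] <= length <= LENGTH_RANGE[1]:
        return False
    m = P450_PATTERN.search(seq)  # only the leftmost match is examined
    if m is None:
        return False
    arg_pos = m.start(1) + 1  # 1-based position of the E/D-x-x-R Arg
    cys_pos = m.start(2) + 1  # 1-based position of the haem-binding Cys
    kc = (length - cys_pos) / (cys_pos - 1)  # Cys excluded from both termini (assumption)
    kr = arg_pos / length
    return KC_RANGE[0] <= kc <= KC_RANGE[1] and KR_RANGE[0] <= kr <= KR_RANGE[1]


# Hypothetical Arabidopsis fragment from the Results; the alanine padding is artificial.
FRAGMENT = "EALRCGNIVKTVHRKATHDIKFNEYVIPKGWKVFPIFTAVHLDPSLHENPFEFNPMRWTKTTAFGGGVRVCPGGELGKLQIAFF"
SYNTHETIC = "A" * 260 + FRAGMENT + "A" * 56  # 400 residues in total

if __name__ == "__main__":
    print(looks_like_p450(SYNTHETIC))  # True
    print(looks_like_p450(FRAGMENT))   # False: 84 residues is below the length range
```

In the synthetic sequence the Arg falls at position 264 and the Cys at 331, giving KR = 0.66 and Kc of about 0.21. A fuller implementation would examine every candidate E/D-x-x-R block rather than only the leftmost successful match, because the N-terminal parts of P450s often contain several such blocks.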

Applying the elaborated pattern in practice, six hypothetical proteins with a low level of mutual identity were assigned to the superfamily, namely: CAB87779 Arabidopsis thaliana; AAF65589 Brevibacterium linens; Q9SCN2 Arabidopsis thaliana; O24727 Nocardioides sp.; Q9ZHK0 Rhodococcus sp.; O86330 Mycobacterium tuberculosis. Here we give the amino acid sequences of two representatives of different P450 families and of one hypothetical protein, with the main amino acids of the pattern marked. For comparison, we also present adenosine deaminase, which shows some similarity in primary structure in the region under analysis and matches the PROSITE pattern, but falls outside our pattern because of the modifications described above.

Allene oxide synthase, Linum usitatissimum (EC 4.2.1.92); 536 amino acids; Kc = 0.096, KR = 0.75
ETLRIEPPVALQYGKAKKDFILESHEAAYQVKEGEMLFGYQPFATKDP-KIFD RPEEFVADRFVGEGVKLMEYVMWSNGPETETPSVANKQCAGKD-FVVMAA

Prostacyclin synthase, Homo sapiens (EC 5.3.99.4); 500 amino acids; Kc = 0.13, KR = 0.7
ESLRLTAAPFITREVVVDLAMPMADGREFNLRRGDRLLLFPFLSPQRDPEIYD PEVFKYNRFLNPDGSEKKDFYKDGKRLKNYNMPWGAGHNHCLGRSYAVNS

Hypothetical 43.9 kD protein, Arabidopsis thaliana; 382 amino acids; Kc = 0.17, KR = 0.68
EALRCGNIVKTVHRKATHDIKFNEYVIPKGWKVFPIFTAVHLDPSLHENPFEFNPMRWTKTTAFGGGVRVCPGGELGKLQIAFF

tRNA-specific adenosine deaminase 2, Saccharomyces cerevisiae (EC 3.5.4.-); 250 amino acids; Kc = 0.22
DTCTLVPKNNSAAGYESIPGILRKEAIMLLRYFYVRQNERAPKPRSKSDRVLDKNTFPPMEWSKYLNEEAFIETFGDDYRTCFANKVDLSSNSVD

Over the last decade, comparative sequence analysis has become an effective and essential instrument for most theorists and innovators. The number of decoded primary structures of biopolymers is increasing rapidly, so skillful orientation in this stream of information can be considered an important condition for successful work in molecular biology.

Acknowledgements

The authors thank Dr. N. Vtyurin for valuable consultations and the materials provided. This work was partly supported by RBFR grant 98-07-91121.

References and footnotes

1. Nelson, D.R., Koymans, L., Kamataki, T., Stegeman, J.J., Feyereisen, R., Waxman, D.J., Waterman, M.R., Gotoh, O., Coon, M.J., Estabrook, R.W., Gunsalus, I.C. and Nebert, D.W., Pharmacogenetics 6, 1-42 (1996).

2. Guengerich, F.P., American Scientist 81, 440-447 (1993).

3. Coon, M.J., Ding, X.X., Pernecky, S.J. and Vaz, A.D., FASEB J. 6, 669-673 (1992).

4. Poulos, T.L., Curr. Opin. Struct. Biol. 5, 767-774 (1995).

5. Graham, S.E. and Peterson, J.A., Arch. Biochem. Biophys. 369, 24-29 (1999).

6. Degtyarenko, K.N. and Archakov, A.I., FEBS Lett. 332, 1-8 (1993).

7. Sverdlov, E.D., Molek. Biol. 33, 917-940 (1999).

8. Kasuya, A. and Thornton, J.M., J. Mol. Biol. 286, 1673-1691 (1999).

9. Bairoch, A. and Apweiler, R., Nucleic Acids Res. 28, 45-48 (2000).

10. McGarvey, P.B., Huang, H., Barker, W.C., Orcutt, B.C., Garavelli, J.S., Srinivasarao, G.Y., Yeh, L.S., Xiao, C. and Wu, C.H., Bioinformatics 16, 290-291 (2000).

11. Hofmann, K., Bucher, P., Falquet, L. and Bairoch, A., Nucleic Acids Res. 27, 215-219 (1999).

12. Barker, W.C., Garavelli, J.S., Huang, H., McGarvey, P.B., Orcutt, B.C., Srinivasarao, G.Y., Xiao, C., Yeh, L.S., Ledley, R.S., Janda, J.F., Pfeiffer, F., Mewes, H.W., Tsugita, A. and Wu, C., Nucleic Acids Res. 28, 41-44 (2000).

13. Kristensen, T., Lopez, R. and Pridz, H., DNA Seq. 2, 343-346 (1992).

14. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E., Nucleic Acids Res. 28, 235-242 (2000).

15. Ioerger, T.R., Du, C. and Linthicum, D.S., Mol. Immunol. 36, 373-386 (1999).

16. Hasemann, C.A., Kurumbail, R.G., Boddupalli, S.S., Peterson, J.A. and Deisenhofer, J., Structure 3, 41-62 (1995).

17. Poulos, T.L., Methods Enzymol. 206, 11-30 (1991).

Date Received: May 24, 2001

Communicated by the Editor Valery Ivanov
