JOURNAL OF FERMENTATION AND BIOENGINEERING vol. 76, No. 4, 257-264. 1993

Nucleotide Sequence Analysis of a Region Upstream of the Cholesterol Oxidase-Cytochrome P450 Operon of Streptomyces sp. SA-COO

Revealing Repeating Units Coding for Putative Transmembrane and DNA-Binding Proteins

ISTVAN MOLNAR AND YOSHIKATSU MUROOKA*

Department of Fermentation Technology, Faculty of Engineering, Hiroshima University, Kagamiyama 1, Higashi-Hiroshima 724, Japan

Received 19 October 1992/Accepted 2 August 1993

A 5.8-kb segment from the upstream region of the cholesterol oxidase (choA)-cytochrome P450 (choP) operon of Streptomyces sp. SA-COO was sequenced. Computer-assisted analysis of the sequence revealed four open reading frames (ORFs), whose deduced gene products could be classified into two groups. Cho-Orf1, the C-terminal segment of Cho-Orf3, and Cho-Orf4 were homologous to each other and showed similarities to the DNA-binding domains of bacterial response regulators of the UhpA subfamily, while a putative transmembrane protein, Cho-Orf2, and the N-terminal segment of Cho-Orf3 were homologous to each other but no homologies to known proteins were found. The genes coding for these putative proteins appeared to be organized as repeating units. Structural features of the nucleotide sequence and the homologies of the predicted gene products are discussed.

Computer-assisted analysis of the nucleotide sequences of large DNA regions has proved to be an extremely useful tool for the characterization of gene clusters of antibiotic biosynthesis in Streptomyces (1-4). Since the biological functions of the individual genes in these clusters are often very difficult to test, comparisons of their putative gene products with protein sequence databanks have often provided the sole clue to their potential roles in complicated and as yet only partially characterized biochemical pathways (4).

Besides being producers of more than 60% of the naturally occurring antibiotics, several strains of the saprophytic Gram-positive bacteria Streptomyces have also been shown to decompose different steroids, including cholesterol (5). The cholesterol catabolic pathway of these strains is usually initiated by the oxidation of the 3β-hydroxyl group of cholesterol (6), and proceeds through the simultaneous degradation of the 17-alkyl side chain and the steroid ring nucleus (6-8).

Recently, we have cloned and sequenced an operon (cho) containing the genes for extracellular cholesterol oxidase (choA) (9, 10) and a cytochrome P450-like protein (choP) (11) from Streptomyces sp. SA-COO. We noticed that the upstream 1.2-kb region of the cho operon influences the copy numbers of the expression vector and, consequently, the overexpression levels of cholesterol oxidase in a Streptomyces lividans host-vector system (12). Since biosynthetic and regulatory genes for the production of secondary metabolites are clustered in the Streptomyces genome, we presumed the biodegradative and regulatory genes for cholesterol decomposition would exhibit a similar organization.

In this report, we describe the nucleotide sequence analysis of a 5.8-kb segment of the Streptomyces sp. SA-COO chromosome adjoining the cho operon upstream. Unexpectedly, we found that this region consists of reiterated units that may code for proteins with putative transmembrane domains or homologies to DNA-binding proteins with helix-turn-helix motifs. Structural features of this region as well as its similarities to previously described reiterated regions of the Streptomyces genome are also discussed.

* Corresponding author.

MATERIALS AND METHODS

Microorganisms and cloning vectors Streptomyces sp. SA-COO, a producer of extracellular cholesterol oxidase, was provided by Toyobo Co., Ltd. (Tsuruga, Fukui). Escherichia coli XL1 Blue, plasmid pUC19, and phages M13mp18 and M13mp19 were from Takara Shuzo Co., Ltd. (Kyoto).

DNA manipulation Enzymes for DNA manipulation were purchased from Toyobo Co., Ltd., or Takara Shuzo Co., Ltd. Microbiological and recombinant DNA techniques for E. coli and Streptomyces were carried out according to Sambrook et al. (13) and Hopwood et al. (14), respectively. M13 clones for sequencing were created by subcloning appropriate restriction fragments, or by sequential deletions with Exonuclease III and S1 Nuclease (13). The nucleotides were sequenced manually by using the Tth DNA Sequencing Kit of Toyobo Co., Ltd., or automatically on an A.L.F. DNA Sequencer with an AutoRead™ Sequencing Kit (both from Pharmacia LKB Biotechnology AB, Bromma, Sweden). DNA and protein sequences were analyzed using the GENETYX program, version 18 (SDC Software Development Co., Ltd., Tokyo).

Genomic walking Genomic walking was accomplished by a method similar to that of Nicholls et al. (15). In brief, short SmaI fragments from the distal regions of the cloned fragment of pCO1 (9) were used as probes in genomic Southern hybridizations to determine the restriction map of the chromosome of strain SA-COO around the insert of pCO1. BglII restriction fragments 14.3 and 16.8 kb in size, extending considerable distances from the insert of pCO1 in the “upstream” and the “downstream” directions on the chromosome, were directly cloned by constructing genomic sublibraries from size-fractionated restriction digests of chromosomal DNA and screening these sublibraries with the SmaI probes.

RESULTS

Genomic walking To analyze the presumed gene cluster for the biodegradation of cholesterol in Streptomyces sp. SA-COO, we cloned a DNA region of about 31 kb centered on the previously isolated cho operon of Streptomyces sp. SA-COO (9-11) by genomic walking. Figure 1 shows the restriction map of the cloned genomic region of 31.1 kb.

Nucleotide sequence determination To characterize the newly cloned DNA segment, we sequenced about 5.8 kb of the upstream region adjoining the previously reported cho operon. The sequence is presented in Fig. 2.

Features of the nucleotide sequence Four open reading frames (ORFs) were predicted using the FRAME computer analysis based on the G+C content of the three triplet positions of the Streptomyces coding sequences (Fig. 3) (16). The ATG initiation codons of cho-orf2 and cho-orf1 were found to overlap with the TGA termination codons of cho-orf3 and cho-orf2, respectively (Fig. 2). Moreover, the initiation codon of cho-orf2 was preceded by a reasonable ribosome binding site, GGAG, inside the cho-orf3 coding sequence. This arrangement suggests that cho-orf3, cho-orf2, and cho-orf1 are translationally coupled. The most likely start codon of cho-orf3 (ATG, position 310) is preceded by a potential ribosome binding site, GGACG, and is separated from the termination codon of the incompletely sequenced cho-orf4 by a short intergenic region. No extensive inverted repeats, which are characteristic of Streptomyces terminators, were found downstream of cho-orf1, although its stop codon is followed by a complicated array of direct repeats (Fig. 2). No sequences with homology to the Streptomyces E. coli-type promoters (17) were found in the sequenced region.

The G+C contents of the cho-orf1, cho-orf2, cho-orf3, and cho-orf4 genes were 0.758, 0.775, 0.781, and 0.751, respectively, with proportions of G+C at third codon positions (excluding Met and Trp) of 0.930, 0.925, 0.928, and 0.987, typical of Streptomyces genes (18). None of the four ORFs contained a TTA codon (19).

Putative gene products of the cho-orfs The translated product of cho-orf1 (280 amino acids, Mr = 30514) contained 31.8% hydrophilic amino acids and carried a net positive charge, while no long (>20) hydrophobic stretches were found. The protein specified by cho-orf2 (643 amino acids, Mr = 67295) also showed a net positive charge, with 24.9% hydrophilic amino acids. The C-terminus of the deduced protein is especially hydrophilic and positively charged. Two long hydrophobic stretches that might traverse membranes were also detected. Cho-Orf3 (884 amino acids, Mr = 93660) might have one transmembrane domain, and a net positive charge with 26.2% hydrophilic amino acids.

Homologies amongst the putative Cho-Orf proteins The deduced gene products of the four ORFs can be divided into two groups based on the similarities of their amino acid sequences (Fig. 4). Cho-Orf1, the C-terminal ~250 amino acids of Cho-Orf3, and the sequenced region of Cho-Orf4 form the first group with identities exceeding 30%, while Cho-Orf2 and the N-terminal 630 amino acids of Cho-Orf3 belong to a second group with an identity of about 30%. Thus, Cho-Orf3 appears as the composite polypeptide of a Cho-Orf2- and a Cho-Orf1-like domain. Although we have resequenced the DNA fragment coding for the “fusion region” between the Cho-Orf2- and the Cho-Orf1-like segments of Cho-Orf3, and the corresponding DNA fragment at the border of cho-orf2 and cho-orf1, no sequencing errors were evident. Thus, the apparent “fusion” between a Cho-Orf2- and a Cho-Orf1-like peptide in the composite Cho-Orf3 polypeptide or, conversely, the “division” of a Cho-Orf3-like polypeptide to yield the individual Cho-Orf2 and Cho-Orf1 proteins, seems to reflect the actual organization of this region.
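The overall and codon-position G+C fractions reported for the cho-orfs, and the check for TTA codons, can be reproduced from any coding sequence with a few lines of code. The following is a minimal Python sketch, not the FRAME or GENETYX implementation: the function names and the toy sequence are hypothetical, the input is assumed to be an in-frame coding sequence, and the terminal stop codon is counted like any other codon.

```python
from typing import List, Tuple


def split_codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into complete codons (trailing bases are dropped)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the whole sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """G+C fraction at codon positions 1, 2 and 3.

    For position 3 the single-codon amino acids Met (ATG) and Trp (TGG)
    are excluded, as in the figures quoted in the text.
    """
    codons = split_codons(cds)
    fractions = []
    for pos in range(3):
        pool = codons
        if pos == 2:
            pool = [c for c in codons if c not in ("ATG", "TGG")]
        fractions.append(sum(c[pos] in "GC" for c in pool) / len(pool))
    return fractions[0], fractions[1], fractions[2]


def contains_codon(cds: str, codon: str) -> bool:
    """True if the codon occurs in frame (a plain substring match would also hit other frames)."""
    return codon.upper() in split_codons(cds)


if __name__ == "__main__":
    toy_cds = "ATGGCCGACCTGTGGCGCTGA"  # hypothetical sequence, not from the paper
    print(gc_fraction(toy_cds))
    print(gc_by_codon_position(toy_cds))
    print(contains_codon(toy_cds, "TTA"))
```

Applied to a real Streptomyces gene, the third-position value is expected to be much higher than the overall value, which is the bias that frame analysis exploits.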

FIG. 1. Restriction map of the cloned genomic region centered on the cho operon (9-11) of Streptomyces sp. SA-COO. Ba, BamHI; Bg, BglII; K, KpnI; Sa, SacI. Labels recoverable from the map: scale 0 to 31.1 kb; chromosomal insert of pCO1, 9.2 to 15.0 kb; 14.3-kb “upstream” fragment; “downstream” fragment, 14.3 to 31.1 kb.

FIG. 2. Nucleotide sequence of the DNA region adjoined upstream to the cho operon on the Streptomyces sp. SA-COO chromosome. The deduced amino acid sequences of gene products described in the text are given in the single letter code. Putative ribosome binding sites and translational start codons are underlined, or doubly underlined, respectively. Short direct repeat sequences are labeled by numbered arrows. Putative HTH motifs are boxed. These sequence data will appear in the DDBJ, EMBL and GenBank Nucleotide Sequence Databases under the accession number D13457. The sequence downstream of the KpnI site at nucleotide position 5737 was already published in (11).

FIG. 3. Frame analysis (16) of the DNA region shown in Fig. 2. Below: the G+C content in each of three triplet positions over a 50 bp window. Dots are used to distinguish the line of least density. Above: the extents and directions of cho-orfs 1-4 marked by heavy lines (arrowheads are ATG codons, vertical lines are stop codons). Horizontal axis: base number.

Comparisons of the putative Cho-Orf proteins with protein sequence databanks When the first group of peptides were compared with the protein sequence databases of the Protein Identification Resource (PIR, National Biomedical Research Foundation, USA) and the SWISS-PROT databank (University of Geneva, Switzerland), the best matches were to the response regulator components of bacterial signal transduction systems (Fig. 5). These systems often involve two components, a histidine protein kinase with a conserved transmitter, and a response regulator with a conserved receiver domain (20). Cho-Orf1 and the Cho-Orf1-like segment of Cho-Orf3 appeared to lack the conserved receiver domain, since the N-termini of these peptides were different from those of the response regulators. The C-termini of the Cho-Orf1-like peptides, however, aligned with the C-terminal domains of the UhpA subfamily of transcriptional activators, with identities of 28-41% on stretches of 51-137 amino acids. These domains include helix-turn-helix (HTH) motifs often found in DNA-binding regions of proteins recognizing specific DNA sequences (21).
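Identity values such as the 28-41% quoted above are computed over an aligned stretch, so they depend on how gapped columns are counted. The sketch below is one plausible convention, offered as an assumption and not as the procedure used in the paper: it takes two already aligned sequences of equal length, skips every column that contains a gap, and reports matches as a percentage of the remaining columns. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are left out of the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment: 4 identical residues in 6 ungapped columns.
    print(percent_identity("MKT-LLA", "MRTALIA"))
```

Counting gapped columns in the denominator instead would lower the reported value, which is why the length of the compared stretch is stated alongside each identity.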

We compared the Cho-Orf1-like peptides with a subset of recently described Streptomyces proteins with proposed HTH motifs and/or regulatory functions. Only the positive regulator of the biosynthetic genes for the antibiotic bialaphos, BrpA, showed a significant homology of about 34% in a stretch of 54 amino acids (Fig. 5), which, in its turn, was previously shown to align with the UhpA subfamily of HTH-containing response regulators (22). Interestingly, BrpA was also reported to lack the conserved receiver domain (22).

To assess the significance of the similarities of the Cho-Orf1-like proteins with the consensus HTH DNA-binding motif, we calculated the “standard deviation (SD) scores” and corresponding probabilities of the implicated segments to form HTH motifs, according to the scoring method of Dodd and Egan (23) (Fig. 5). The putative HTH motifs were also evaluated against the stereochemical criteria of Shestopalov (24) (Fig. 5). The motif of Cho-Orf4 scored high (3.45), with a corresponding probability to form a HTH motif of 50%. Although this motif contains a phenylalanine at position 9 in violation of the Shestopalov rules (24), the recent work of Baumeister et al. (25) indicates that a HTH motif (that of the Tet repressor from Tn10) can accommodate this residue in this position without a significant disturbance of the function. The proposed HTH motifs of Cho-Orf3 and Cho-Orf1 scored slightly low in the Dodd-Egan calculation (23), but conformed with the Shestopalov rules (24).
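The general mechanism behind an SD score can be illustrated in a few lines: each window of the protein is scored against a position-specific weight matrix, and the best raw score is expressed in standard deviations from a reference distribution of scores. The sketch below shows only that mechanism. The weight matrix, the reference mean, and the reference SD are hypothetical placeholders supplied by the caller; they are not the published values of Dodd and Egan (23), and the function names are likewise illustrative.

```python
from typing import Dict, List, Tuple

WeightMatrix = List[Dict[str, float]]  # one residue-to-weight mapping per motif position


def best_window(protein: str, matrix: WeightMatrix, default: float = 0.0) -> Tuple[int, float]:
    """Return (0-based start, raw score) of the highest-scoring window."""
    width = len(matrix)
    if len(protein) < width:
        raise ValueError("protein is shorter than the motif")
    protein = protein.upper()
    best_start, best_score = 0, float("-inf")
    for start in range(len(protein) - width + 1):
        score = sum(
            matrix[offset].get(protein[start + offset], default)
            for offset in range(width)
        )
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


def sd_score(raw_score: float, reference_mean: float, reference_sd: float) -> float:
    """Express a raw score in standard deviations above the reference mean."""
    return (raw_score - reference_mean) / reference_sd


if __name__ == "__main__":
    # Hypothetical 3-position matrix; a real HTH matrix is longer and empirically derived.
    toy_matrix = [{"G": 2.0, "A": 1.0}, {"L": 3.0}, {"K": 1.5, "R": 1.5}]
    start, raw = best_window("MAGLKT", toy_matrix)
    print(start, raw, sd_score(raw, reference_mean=0.5, reference_sd=2.0))
```

With a real matrix and reference distribution, the resulting SD score would be compared with the cutoff given in the legend of Fig. 5.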

FIG. 4. Harrplot comparisons of Cho-Orf sequences using a window of 70 and a weighted score minimum of 0.5. (A) Cho-Orf1 (horizontal axis) versus the C-terminal 280 amino acids of Cho-Orf3; 32.5% identity. (B) Cho-Orf1 (horizontal axis) versus the sequenced part of Cho-Orf4; 50.0% identity. (C) The C-terminal 280 amino acids of Cho-Orf3 (horizontal axis) versus the sequenced part of Cho-Orf4; 50.7% identity. (D) Cho-Orf3 (horizontal axis) versus Cho-Orf2; 29.7% identity over 603 amino acids.

FIG. 5. Homology amongst the deduced amino acid sequences of Cho-Orf1, Cho-Orf3, Cho-Orf4, and response regulators of the UhpA subfamily (sequences were taken from the SWISS-PROT databank), and the transcriptional regulator BrpA of Streptomyces hygroscopicus (22). Dots symbolize amino acids identical with those of Cho-Orf1. Shaded regions indicate positions having similar amino acids (A=G, D=E, F=W=Y, I=L=M=V, K=R, N=Q, S=T) in at least 6 of the 10 proteins listed. The region containing the proposed helix-turn-helix motifs is enclosed in a box. SD scores and the probabilities that the region would form a HTH motif (HTH%) are calculated as described by Dodd and Egan (23). Protein segments with “SD scores” of ≥2.50 are good candidates to form a HTH motif (23). + and - refer to the conformity of the HTH motifs with the Shestopalov rules (24). SD scores listed for Cho-Orf1, Cho-Orf3, and Cho-Orf4: 1.40, 2.24, and 3.45, respectively.

We have compared the putative transmembrane protein Cho-Orf2 and its homologous counterpart segment from Cho-Orf3 with the PIR and the SWISS-PROT data banks, and a subset of recently described hypothetical sensors of Streptomyces two-component systems [DnrJ, EryC1, StrS (26 and op. cit.) and CutS (27)]. No significant homologies were found in either case.

DISCUSSION

By sequencing a 5.8-kb segment of the Streptomyces sp. SA-COO chromosome adjoining the cho operon upstream, we have identified ORFs organized into characteristic repeating units (Fig. 6). In units 1 and 2, a gene for a putative transmembrane protein (cho-orf2 and the N-terminal segment of cho-orf3, respectively) is translationally coupled (cho-orf2), or fused (cho-orf3) to a gene (cho-orf1 and the C-terminal segment of cho-orf3, respectively) that may code for a protein with a putative DNA-binding domain and homologies to prokaryotic response regulators. The presence of the partially sequenced cho-orf4 gene, which also seems to code for a DNA-binding protein with a HTH motif, raises the possibility of the existence of a third repeat of this structure. The reiterated nature of this region is also perceivable at the DNA level: the nucleotide sequences of the three repeating units show homologies of 55.7-64.7% along their entire lengths. The conserved organization of the cho-orfs into repeating units, and the homologies of their nucleotide sequences and deduced protein products imply that this region had arisen by repeated gene duplications followed by limited divergence.

FIG. 6. Schematic representation of repeating units located upstream of the cho operon. TM: putative transmembrane domain; HTH: postulated helix-turn-helix DNA-binding motif. Ba, BamHI; K, KpnI; Sa, SacI. The scheme shows cho-orf4, cho-orf3, cho-orf2, and cho-orf1 on a scale running to 5.8 kb.

Gene amplification is a widespread phenomenon amongst bacteria (28), and has been suggested to be the first step in divergent evolution (29). Gene duplication was seen in recent DNA sequencing studies to be a characteristic feature of the gene clusters of such secondary metabolic pathways as the biosynthesis of polyketide antibiotics actinorhodin (30), granaticin (31), tetracenomycin (32), curamycin (3), and daunorubicin (26), the production of nonribosomal peptide antibiotics (33), or clavulanic acid formation (34). Some of the repeating units feature genes whose translational stop and start codons overlap (30-32), as is also seen with cho-orfs 1, 2 & 3, with occasional gene fusions that would generate bifunctional proteins (e.g. the bifunctional cyclase/O-methyltransferases of the act and gra ORF4 genes, 35, 31), a property that is also proposed here for cho-orf3.
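An overlap between a TGA stop codon and the ATG start codon of the next gene can take two forms: ATGA, where the two codons share TG, or TGATG, where they share a single A. Which of the two occurs at each cho-orf junction is not stated in the text, so the sketch below checks for both; this, the function name, and the toy sequences are assumptions made for illustration. Starting from the first codon of an upstream ORF, it walks the reading frame to the first stop codon and reports an overlapping ATG if one is present.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def overlapping_start(seq: str, orf_start: int) -> Optional[Tuple[int, str]]:
    """Find an ATG that overlaps the TGA stop codon of the ORF beginning at orf_start.

    Returns (0-based index of the ATG, arrangement) or None when the ORF ends
    with another stop codon, has no overlapping ATG, or has no in-frame stop.
    """
    seq = seq.upper()
    for i in range(orf_start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon not in STOP_CODONS:
            continue
        if codon == "TGA":
            if i >= 1 and seq[i - 1:i + 2] == "ATG":
                return i - 1, "ATGA"   # start codon begins one base before the stop
            if seq[i + 2:i + 5] == "ATG":
                return i + 2, "TGATG"  # start codon begins on the last base of the stop
        return None
    return None


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the cho-orf region.
    print(overlapping_start("ATGGCCCGATGACCGGCTAA", 0))
    print(overlapping_start("ATGGCCCGCTGATGCCGGCTAA", 0))
```

A hit only shows that the codons overlap; whether the downstream gene is in fact translationally coupled also depends on features such as the ribosome binding site discussed in the Results.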

Gene duplications might arise from homologous recombination between extensive direct repeats of 1.0-2.2 kb (36), or from illegitimate recombination mediated by short imperfect direct repeats of 5-12 bases flanking the sequence in question (37, 38). It is interesting to note that the high G+C content of Streptomyces DNA might substantially increase the occurrence of small repeat sequences which could serve as substrates for illegitimate recombination (39). Conspicuously, the cho-proximal structural unit 1 of the cho-orf region ends with a complicated array of short tandem repeats, amongst which the ACCCCCGG octamer is also reiterated, albeit imperfectly, in similar positions concluding structural units 2 and 3, and at the “fusion region” of the Cho-Orf1-, and Cho-Orf2-like domains of Cho-Orf3 (Fig. 2).
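Short direct repeats of the kind described above, exact or imperfect, can be located by simple string scanning. The following Python sketch is an illustration under stated assumptions and not the analysis performed in the paper: the function names and the toy sequences are hypothetical, and imperfect copies are defined here by a maximum number of mismatches, with no allowance for insertions or deletions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def repeated_kmers(seq: str, k: int, min_count: int = 2) -> Dict[str, List[int]]:
    """Map each k-mer that occurs at least min_count times to its 0-based positions."""
    seq = seq.upper()
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_count}


def approximate_occurrences(seq: str, motif: str, max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """List (position, mismatches) for every window within max_mismatches of the motif."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences containing the octamer named in the text.
    print(repeated_kmers("TTACCCCCGGAAGTACCCCCGGTT", 8))
    print(approximate_occurrences("GGACCCCCGGTTACCCACGGAA", "ACCCCCGG", max_mismatches=1))
```

In G+C-rich DNA many short k-mers recur by chance, so hits from such a scan are candidates for inspection and carry no statistical weight on their own.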

Amplification of specific DNA sequences is often associated with increased resistance and overproduction of a particular product (40, 41), or prevention of the production of other substances (42). Bacteria selected in a chemostat for the utilization of a poorly metabolized carbon source often harbor duplications of catabolic genes (29, 43). In many cases, however, amplification takes place without any obvious selective pressure (36, 37), and is not associated with detectable phenotypes (44). The homologies of the reiterated Cho-Orf1-like peptides to the UhpA subfamily of response regulators, and the presence of their proposed helix-turn-helix DNA-binding motifs substantiated by a statistical comparison (23), or stereochemical considerations (24), suggest that this region might be involved in the transcriptional regulation of a hitherto unknown target gene(s). Recently, a gene coding for a putative DNA-binding protein with unknown target genes was also described from the amplifiable genomic region AUD6 of Streptomyces ambofaciens (42). Although the proposed HTH motifs of the Cho-Orf proteins harbor slightly “unorthodox” amino acid makeups as exemplified by the appearance of a phenylalanine at motif position 9 in Cho-Orf4, or the slightly low Dodd-Egan scores (23) of the motifs of Cho-Orf3 and Cho-Orf1, most of the proposed HTH DNA-binding motifs of Streptomyces proteins described in the literature score well below the cutoff value in the Dodd-Egan method (23), and/or break the proposed stereochemical rules (24, 45). It is possible that at least some of the Streptomyces DNA-binding domains might employ unorthodox HTH motifs as an adaptation to the high G+C content of the Streptomyces DNA that may influence the sequence and the local structure of their cognate operators, or rely on a different mode of interaction with the DNA.

In a previous study (12), the presence of a 1.2-kb region adjoined upstream of the cho operon in an expression vector was shown to reduce the rate of cholesterol oxidase overproduction in a Streptomyces host-vector system by a factor of 2.3. The effect appeared to be exerted through a 2.6-fold decrease of the copy number of the expression vector (12) amidst unchanged growth characteristics of the host strain. Since the sequences causing this copy number decrease coincide with the coding sequences of Cho-Orf1, we suppose that the incidental expression of cho-orf1 from a promoter of the cloning vector would provide a gene product that might interfere with the replication and/or partition functions of the plasmid, or reduce the viability of cells containing the vector in excessively high copy numbers. The putative DNA-binding nature of Cho-Orf1 is in accord with such a mechanism.

Further studies on the transcription and translation of these ORFs, involving their disruption on the chromosome of the donor organism, are required to shed light on the functional importance of this reiterated region under physiological conditions.

ACKNOWLEDGMENT

This work was supported by grant No. 01480036 to Y. M. from the Ministry of Education, Science, and Culture of Japan. I.M. was supported by a Monbusho Scholarship.

REFERENCES

I. Cartes, J., Haydock, S. F., Roberts, G. A., Bevitt. D. J., and Leadlay, P. F.: An unusually large multifunctional polypeptide in the erythromycin-producing polyketide synthase of SUCC/W- ropolysporu ery/hrea. Nature, 348, 176-178 (1990).

2. Donadio, S., Staver, M. J., McAlpine, J. B.. Swanson, S. J., and Katz, L.: Modular organization of genes required for complex polyketide biosynthesis. Science, 252, 675-679 (1991).

3. Bergh, S. and Uhlbn, M.: Analysis of a polyketide synthesis- encoding gene cluster of Streptornyces curucoi. Gene, 117, 131- 136 (1992).

4. Bevitt, D. J., Cartes, J., Haydock, S. F., and Leadlay, P. F.: 6-Deoxyerythronolide-B synthase 2 from Succhuropolysporu erythrueu. Cloning of the structural gene, sequence analysis and inferred domain structure of the multifunctional enzyme. Eur. J. Biochem., 204, 39-49 (1992).

5. Arima, K., Nagasawa, M., Bae, M., and Tamura, G.: Microbial transformation of sterols. I. Decomposition of cholesterol by microorganisms. Agric. Biol. Chem., 33, 1636-1643 (1969).

6. Nagasawa, M., Bae, M., Tamura, G., and Arima, K.: Microbial transformation of sterols. II. Cleavage of sterol side chains by microorganisms. Agric. Biol. Chem., 33, 1644-1650 (1969).

7. Sih, C. J., Wang, K. C., and Tai, H. H.: C22 acid intermediates in the microbiological cleavage of the cholesterol side chain. J. Am. Chem. Soc., 89, 1956-1957 (1967).

8. Sih, C. J., Tai, H. H., and Tsong, J. J.: The mechanism of microbial conversion of cholesterol into 17-keto steroids. J. Am. Chem. Soc., 89, 1957-1958 (1967).

9. Murooka, Y., Ishizaki, T., Nimi, O., and Maekawa, N.: Cloning and expression of a Streptomyces cholesterol oxidase gene in Streptomyces lividans with plasmid pIJ702. Appl. Environ. Microbiol., 52, 1382-1385 (1986).

10. Ishizaki, T., Hirayama, N., Shinkawa, H., Nimi, O., and Murooka, Y.: Nucleotide sequence of the gene for cholesterol oxidase from a Streptomyces sp. J. Bacteriol., 171, 596-601 (1989).

11. Horii, M., Ishizaki, T., Paik, S.-Y., Manome, T., and Murooka, Y.: An operon containing the genes for cholesterol oxidase and a cytochrome P450-like protein from a Streptomyces sp. J. Bacteriol., 172, 3644-3653 (1990).

12. Molnár, I., Choi, K.-P., Hayashi, N., and Murooka, Y.: Secretory overproduction of Streptomyces cholesterol oxidase by Streptomyces lividans with a multi-copy shuttle vector. J. Ferment. Bioeng., 72, 368-372 (1991).

13. Sambrook, J., Fritsch, E. F., and Maniatis, T.: Molecular cloning, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, USA (1989).

14. Hopwood, D. A., Bibb, M. J., Chater, K. F., Kieser, T., Thompson, C. J., Kieser, H. M., Lydiate, D. J., and Schrempf, H.: Genetic manipulation of Streptomyces: a laboratory manual. John Innes Foundation, Norwich, UK (1985).

15. Nicholls, R. D., Hill, A. V. S., Clegg, J. B., and Higgs, D. R.: Direct cloning of specific DNA sequences in plasmid libraries following fragment enrichment. Nucl. Acids Res., 13, 7569-7578 (1985).

16. Bibb, M. J., Findlay, P. R., and Johnson, M. W.: The relationship between base composition and codon usage in bacterial genes and its use for the simple and reliable identification of protein-coding sequences. Gene, 30, 157-166 (1984).

17. Hopwood, D. A., Bibb, M. J., Chater, K. F., Janssen, G. R., Malpartida, F., and Smith, C. P.: Regulation of gene expression in antibiotic-producing Streptomyces, p. 251-276. In Booth, J. R. and Higgins, C. F. (ed.), Regulation of gene expression - 25 years on. Cambridge University Press, Cambridge, UK (1986).

18. Wright, F. and Bibb, M. J.: Codon usage in the G+C-rich Streptomyces genome. Gene, 113, 55-65 (1992).

19. Leskiw, B. K., Lawlor, E. J., Fernandez-Abalos, J. M., and Chater, K. F.: TTA codons in some genes prevent their expression in a class of developmental, antibiotic-negative, Streptomyces mutants. Proc. Natl. Acad. Sci. USA, 88, 2461-2465 (1991).

20. Stock, J. B., Ninfa, A. J., and Stock, A. M.: Protein phosphorylation and regulation of adaptive responses in bacteria. Microbiol. Rev., 53, 450-490 (1989).

21. Harrison, S. C. and Aggarwal, A. K.: DNA recognition by proteins with the helix-turn-helix motif. Annu. Rev. Biochem., 59, 933-969 (1990).

22. Raibaud, A., Zalacain, M., Holt, T. G., Tizard, R., and Thompson, C. J.: Nucleotide sequence analysis reveals linked N-acetyl hydrolase, thioesterase, transport, and regulatory genes encoded by the bialaphos biosynthetic gene cluster of Streptomyces hygroscopicus. J. Bacteriol., 173, 4454-4463 (1991).

23. Dodd, I. B. and Egan, J. B.: Improved detection of helix-turn-helix DNA-binding motifs in protein sequences. Nucl. Acids Res., 18, 5019-5026 (1990).

24. Shestopalov, B. V.: Amino acid sequence template useful for α-helix-turn-α-helix prediction. FEBS Lett., 233, 105-108 (1988).

25. Baumeister, R., Müller, G., Hecht, B., and Hillen, W.: Functional roles of amino acid residues involved in forming the α-helix-turn-α-helix operator DNA binding motif of Tet repressor from Tn10. Proteins: Structure, Function, and Genet., 14, 168-177 (1992).

26. Stutzman-Engwall, K. J., Otten, S. L., and Hutchinson, C. R.: Regulation of secondary metabolism in Streptomyces spp. and overproduction of daunorubicin in Streptomyces peucetius. J. Bacteriol., 174, 144-154 (1992).

27. Tseng, H. C. and Chen, C. W.: A cloned ompR-like gene of Streptomyces lividans 66 suppresses melC1, a putative copper-transfer gene. Mol. Microbiol., 5, 1187-1196 (1991).

28. Anderson, R. P. and Roth, J. R.: Tandem genetic duplications in phage and bacteria. Annu. Rev. Microbiol., 31, 473-505 (1977).

29. Rigby, P. W. J., Burleigh, B. D., and Hartley, B. S.: Gene duplication in experimental enzyme evolution. Nature, 251, 200-204 (1974).

30. Fernandez-Moreno, M. A., Martinez, E., Boto, L., Hopwood, D. A., and Malpartida, F.: Nucleotide sequence and deduced functions of a set of cotranscribed genes of Streptomyces coelicolor A3(2) including the polyketide synthase for the antibiotic actinorhodin. J. Biol. Chem., 267, 19278-19290 (1992).

31. Sherman, D. H., Malpartida, F., Bibb, M. J., Kieser, H. M., Bibb, M. J., and Hopwood, D. A.: Structure and deduced function of the granaticin-producing polyketide synthase gene cluster of Streptomyces violaceoruber Tü22. EMBO J., 8, 2717-2725 (1989).

32. Bibb, M. J., Biró, S., Motamedi, H., Collins, J. F., and Hutchinson, C. R.: Analysis of the nucleotide sequence of the Streptomyces glaucescens tcmI genes provides key information about the enzymology of polyketide antibiotic biosynthesis. EMBO J., 8, 2727-2736 (1989).

33. Kleinkauf, H. and von Döhren, H.: Nonribosomal biosynthesis of peptide antibiotics. Eur. J. Biochem., 192, 1-15 (1990).

34. Marsh, E. N., Chang, M. D.-T., and Townsend, C. A.: Two isozymes of clavaminate synthase central to clavulanic acid formation: cloning and sequencing of both genes from Streptomyces clavuligerus. Biochemistry, 31, 12648-12657 (1992).

35. Hopwood, D. A. and Sherman, D. H.: Molecular genetics of polyketides and its comparison to fatty acid biosynthesis. Annu. Rev. Genet., 24, 37-66 (1990).

36. Fishman, S. E., Rosteck, P. R., and Hershberger, C. L.: A 2.2 kb repeated DNA segment is associated with DNA amplification in Streptomyces fradiae. J. Bacteriol., 161, 199-206 (1985).

37. Häusler, A., Birch, A., Krek, W., Piret, J., and Hütter, R.: Heterogeneous genomic amplification in Streptomyces glaucescens: structure, location, and DNA sequence analysis. Mol. Gen. Genet., 217, 437-446 (1989).

38. Nakano, M. M., Ogawara, H., and Sekiya, T.: Recombination between short direct repeats in Streptomyces lavendulae plasmid DNA. J. Bacteriol., 157, 658-660 (1984).

39. Birch, A., Häusler, A., and Hütter, R.: Genome rearrangement and genetic instability in Streptomyces spp. J. Bacteriol., 172, 4138-4142 (1990).

40. Sedlmeier, R. and Altenbuchner, J.: Cloning and DNA sequence analysis of the mercury resistance genes of Streptomyces lividans. Mol. Gen. Genet., 236, 76-85 (1992).

41. Orlova, V. A. and Danilenko, V. N.: Multiplication of a DNA fragment in Streptomyces antibioticus producing oleandomycin. Antibiotiki, 28, 163-167 (1983).

42. Simonet, J.-M., Schneider, D., Volff, J.-N., Dary, A., and Decaris, B.: Genetic instability in Streptomyces ambofaciens: inducibility and associated genome plasticity. Gene, 115, 49-54 (1992).

43. Mortlock, R. P.: Metabolic acquisitions through laboratory selection. Annu. Rev. Microbiol., 36, 259-284 (1982).

44. Pristaš, P. and Godany, A.: Cloning and characterization of an amplified DNA sequence in chromosomal DNA of Streptomyces aureofaciens 2201. FEMS Microbiol. Lett., 96, 167-172 (1992).

45. Molnár, I. and Murooka, Y.: Helix-turn-helix DNA-binding motifs of Streptomyces - a cautionary note. Molec. Microbiol., 8, 783-784 (1993).