
JOURNAL OF BACTERIOLOGY, May 1987, p. 2142-2149, Vol. 169, No. 5
0021-9193/87/052142-08$02.00/0
Copyright © 1987, American Society for Microbiology

Gene Fusion Is a Possible Mechanism Underlying the Evolution of STA1

ICHIRO YAMASHITA,* MOTONAO NAKAMURA, AND SAKUZO FUKUI

Department of Fermentation Technology, Faculty of Engineering, Hiroshima University, Shitami, Higashi-Hiroshima 724, Japan

Received 3 September 1986/Accepted 19 January 1987

DNA from the STA1 (extracellular glucoamylase) gene of Saccharomyces diastaticus was used as a probe to enable the cloning by colony hybridization of three DNA fragments from Saccharomyces cerevisiae; these were designated S1, S2, and SGA (intracellular, sporulation-specific glucoamylase gene). To examine the evolutionary relationship among these sequences at the nucleotide level, we sequenced S2, S1, and SGA and compared them with STA1. These data and RNA blot analysis revealed that the following regions of STA1 were highly conserved in S2, S1, and SGA: upstream regulatory sequences responsible for transcription, a signal sequence for protein secretion, a threonine- and serine-rich domain, and a catalytic domain for glucoamylase activity. These results suggest that an ancestral STA gene was generated relatively recently on an evolutionary time scale by the sequential fusions of S2, S1, and SGA, with S1 functioning as a connector for S2 and SGA. We describe a model for the involvement of short nucleotide sequences flanking the junctions in the gene fusions.

Procaryotic and eucaryotic cells have many complex regulatory systems to maintain homeostasis and to adapt to environmental shifts. In these systems, considerable portions of biochemical reactions may be catalyzed by multifunctional or allosteric proteins which have a regulatory (or effector-binding) domain and a catalytic domain. It has been convincingly shown that such complex proteins can mediate complicated regulation more effectively than simple proteins. The recent accumulation of gene sequence data has revealed that some multifunctional proteins have similar catalytic or regulatory domains which might have been derived from common ancestral genes (12-15, 21, 25, 27). Furthermore, it is known that most secretory or organelle-translocating proteins have specific, but structurally and functionally related, signals at their amino-terminal regions (5, 6, 26) and that genes under concerted regulation have similar regulatory sequences in their 5' upstream regions (8, 10, 16).

Evolution of such complex genes is one of the central issues in biology. The presence of homologous domains in otherwise unrelated proteins may be due to gene fusion rather than independent evolution, because such homologous sequences are usually found to be multiply dispersed in modern genes. Similarly, one might anticipate that an ancestral regulatory sequence could have been dispersed by gene fusion to generate a family of genes subject to concerted regulation. However, we know of no gene fusion event that has been clearly demonstrated at the molecular level, nor do we know the mechanism for such a fusion. This may be because the modern genes sequenced so far have completely lost traces of fusion between ancestral genes at the junctions, while maintaining limited homologies in the functionally essential regions during evolution. Therefore, to investigate the mechanism and the role of gene fusion in the evolution of genes, we must search for the most recent gene fusion and must clone and sequence both a newly fused gene and its ancestral genes. Recently we found that a gene fusion which might have occurred very recently in the yeast genus Saccharomyces was highly attractive for such investigations.

Among the Saccharomyces species, S. diastaticus is notable for its ability to secrete glucoamylase, which is encoded by STA1, extracellularly and to ferment starch (33, 34). Genetic studies suggest a significant role for AMY2 (30) and an inhibitory effect of heterozygosity at MAT (29, 36) on the expression of STA1. In contrast, S. cerevisiae, a starch-nonfermenting species, lacks functional STA genes (31, 33) but carries SGA, which codes for intracellular glucoamylase and is expressed specifically in meiosis and sporulation (32). S. diastaticus is closely related to S. cerevisiae, since haploid cells of these species are able to mate, and they are genetically similar to each other (23). Thus, S. diastaticus might be considered to be derived from S. cerevisiae by the acquisition of the gene for extracellular glucoamylase. The genetic separation between the two species, however, is not simple, since only S. cerevisiae carries a gene, INH1, which inhibits the expression of STA1 (31).

By Southern blotting with fragments of the cloned STA1 DNA as probes, we observed that SGA is homologous to the 3' region of STA1 and that two DNA fragments (S2 and S1), which are linked to each other in the genome of S. cerevisiae, are homologous to the 5' region of STA1 (33). In this paper, we cloned and sequenced S2, S1, and SGA to clarify the evolutionary relationship among these sequences and STA1 at the molecular level. We propose that the fusion of the resident genes S2, S1, and SGA in S. cerevisiae is the most likely mechanism underlying the evolution of an ancestral STA gene, and we discuss the possible role of gene fusion in the evolutionary histories of modern genes.

* Corresponding author.

MATERIALS AND METHODS

Construction of genomic library of S. cerevisiae. We constructed a recombinant plasmid library that was representative of the genome of S. cerevisiae AH22 (MATa sta0 AMY2 INH1 SGA), which also carries the S2 and S1 regions. Total genomic DNA from the strain was partially digested with restriction endonuclease Sau3A, and fragments larger than 5 kilobases were recovered by sucrose gradient centrifugation. The Sau3A-digested DNA fragments were inserted by ligation into the unique BamHI site of cloning vector pYI1 (28), which carries ampicillin resistance (Apr) and tetracycline resistance genes for Escherichia coli and also the yeast LEU2 and URA3 genes. The resulting recombinant DNA molecules were used to transform E. coli to Apr. The library contained more than 26,000 Apr clones, about 80% of which were tetracycline susceptible.

FIG. 1. Physical maps of S2, S1, SGA, and STA1 and their putative products. Physical maps of nine positive clones (pScM1, pScM6, pScM10, pScM11, pScM13, pScM15, pScM26, pScY2, and pScY3) are drawn to scale. Unmapped regions in the clones pScM10 and pScM15 are drawn with broken lines. The sites recognized by restriction enzymes PstI (Pt), BamHI (B), HpaI (Hp), BstEII (Bt), KpnI (K), PvuII (Pv), HindIII (H), EcoRV (V), StuI (St), SalI (S), EcoRI (E), BanIII (III), and XhoI (X) are indicated. Regions showing sequence homologies by both restriction mapping and Southern blotting are marked by striped, dotted, and open boxes. Nucleotide sequences were determined in the clones pScM6 and pScM13 in the regions marked by thick solid lines. Labeled restriction fragments (A, B, and C of STA1 and D of pScM6) were used as probes. Putative gene products deduced from the nucleotide sequences (see Fig. 2) are shown schematically. Amino acids (indicated by three-letter symbols) are numbered in agreement with the STA1 product. The amino-terminal amino acid of the STA1 product was expected to be the second methionine in the open reading frame (34), because S1 mapping analysis (11) revealed that the major STA1 transcripts started at and 3 base pairs downstream from the first ATG codon (data not shown). The amino-terminal hydrophobic peptide from Met-1 to Gly-21 of the STA1 precursor is cleaved off during the secretion of the precursor (35).

Colony hybridization. The Apr colonies were transferred onto nylon membranes (Pall Biodyne; Pall Ultrafine Filtration Corp., Glen Cove, N.Y.) and processed for hybridization as recommended by the supplier. Restriction fragments (A, B, and C) of the cloned STA1 DNA were labeled with [α-32P]dCTP by nick translation (17) and used as probes. The membranes were hybridized with the radioactive probes at 42°C for 20 h in 50% (vol/vol) deionized formamide-5× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate) (20)-50 mM sodium phosphate (pH 6.5)-sonicated and heat-denatured cod sperm DNA (50 µg/ml)-0.02% bovine serum albumin-0.02% Ficoll 400-0.02% polyvinylpyrrolidone K-90. The membranes were then washed successively in 2× SSC containing 0.1% sodium dodecyl sulfate at room temperature and in 0.1× SSC containing sodium dodecyl sulfate at 42°C, dried, and exposed to Fuji X-ray films at -70°C overnight with an intensifying screen. Positive colonies were purified, and then colony hybridization was repeated.

RNA blotting. Cells were cultured in YPGL medium (1% yeast extract, 2% polypeptone, 2% [wt/vol] glycerol, 2% [wt/vol] lactic acid; pH 6.2, adjusted with 10 N NaOH). RNAs were isolated (7), fractionated by electrophoresis on 1% agarose gels (9), and transferred to nitrocellulose papers (24). The papers were hybridized as described above with the nick-translated probes D and C (Fig. 1) and YIp5 (22), which carries the yeast URA3 gene.
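As a rough check on the statement that the library was representative of the genome, the standard formula of Clarke and Carbon relates the number of independent clones N to the probability P that any given sequence is present at least once: P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one insert. The sketch below applies it to the figures given above (more than 26,000 Apr clones, about 80% tetracycline susceptible, inserts larger than 5 kb). Two inputs are assumptions not stated in the text: that the tetracycline-susceptible clones are the insert-bearing ones (insertional inactivation at the BamHI site), and a haploid genome size of about 1.2 × 10^7 bp. Because 5 kb is the minimum insert size, the result should be read as a lower bound under those assumptions.

```python
import math


def coverage_probability(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Clarke and Carbon: probability that a given sequence is present at least
    once in a library of n_clones independent random inserts."""
    f = insert_bp / genome_bp  # fraction of the genome carried by one insert
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(target_p: float, insert_bp: float, genome_bp: float) -> int:
    """Smallest number of clones giving at least target_p: N = ln(1 - P) / ln(1 - f)."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - target_p) / math.log(1.0 - f))


# From the text: >26,000 Apr clones, ~80% tetracycline susceptible, inserts >5 kb.
# ASSUMPTION: the tetracycline-susceptible clones are the insert-bearing ones.
recombinants = 26_000 * 80 // 100  # integer arithmetic avoids float rounding
insert_bp = 5_000                  # minimum insert size, so this is a lower bound
# ASSUMPTION: haploid S. cerevisiae genome size (not given in the text).
genome_bp = 1.2e7

p = coverage_probability(recombinants, insert_bp, genome_bp)
n99 = clones_needed(0.99, insert_bp, genome_bp)
print(f"insert-bearing clones: {recombinants}")
print(f"probability a given locus is represented: {p:.4f}")
print(f"clones needed for 99% probability: {n99}")
```

Under these assumptions the probability exceeds 0.999, and roughly 11,000 insert-bearing clones would already give 99% coverage; a larger assumed genome size or smaller effective insert lowers both margins.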

[FIG. 2: sequence panels (a) S2, (b) S1, and (c) SGA; not reproduced]

FIG. 2. Nucleotide sequences of S2 (a), S1 (b), and SGA (c). The nucleotide sequences in the clones pScM6 and pScM13 of the regions marked by thick solid lines (Fig. 1) were determined by the method of Sanger et al. (19). The numbers above the sequences indicate the number of nucleotides in each direction. Restriction sites are also indicated. The numbers below the deduced protein sequences denote the amino acid number. The peptides marked by arrows with a hook show extensive sequence homologies to the STA1 glucoamylase. Compared with the nucleotide sequence of STA1 (34), substitution, deletion, and insertion of nucleotides are indicated by asterisks, solid triangles, and a box, respectively.

RESULTS

Cloning of segments (S2, S1, and SGA) homologous to STA1 from S. cerevisiae. To obtain sequences homologous to STA1 (S2, S1, and SGA), we screened a genomic library of S. cerevisiae with three restriction fragments of the cloned STA1 DNA as probes (denoted by A, B, and C in Fig. 1). Fragment A includes the 5'-flanking region and encodes an amino-terminal peptide which can serve as a signal sequence for protein secretion. Fragment B encodes the threonine- and serine-rich domain. Fragment C encodes the catalytic domain for glucoamylase activity and includes the 3'-flanking region. Nine positive clones were obtained and subjected to restriction mapping (Fig. 1). The physical maps were verified by Southern blot analysis of genomic DNA (data not shown). S2, S1, and SGA showed striking sequence homologies to fragments A, B, and C, respectively, by both physical mapping (Fig. 1) and Southern blotting (data not shown). Linkage analysis between SGA and the S2-S1 region was performed after marking the loci with LEU2 and URA3. The tetrad data (parental ditype:nonparental ditype:tetratype = 11:7:51) indicated no linkage between them, since unlinked loci yield parental and nonparental ditype tetrads in approximately equal numbers.

Comparison of S2, S1, and SGA with STA1 at the nucleotide level. To examine sequence homologies at the nucleotide level, we determined the nucleotide sequences of S2, S1, and SGA (Fig. 2) and compared them with the previously determined STA1 sequence (Fig. 1 and 2).

S2 (Fig. 2a) contained an open reading frame of 242 amino acids, of which the amino-terminal peptide of 32 amino acids (marked by arrows with a hook) was identical to the corresponding region of the STA1 glucoamylase (Fig. 1). In this region, there were two silent substitutions (marked by asterisks). In the 5'-flanking region compared (up to -191), 11

(1-45)(46-90)(91-135)(136-180)(181-225)(226-270)(271-306)

Sla (307-351)(352-396)

GCT CCA 6TA CCA ACT CCA TCC AGC TCT ACT ACT GAA AGC TCT TCT--- --- ------- --- --- --- --- --- --C --- --- --- ---

--- --- --- --- --C --- --A --- --- --- --- --- --- --- GTA--A --- --- --- --C- --T TC- -6C -AC ATC -CT --C --C--- --- TCT T-- --C--T- --- - C --- --- --- --- ---

-T- --- --- --- --C--- --A --- --- --- --- --- --- --- ---

--- --- --- --- --- --C --C --- --- --- --- 6TAHR2

--A --- -- --T----- -GC -AC ATC -CT --C --C--- --- TCT T-- -T- -T- --- -T- --- --- --- --- -T- ---

(484-522) 6TT CCA ACT AM ACT ACG ACT CT 6TC ACT ACA CCA TCA(616-654) --- --- --- -C- --- --- --- --- --- --- --- T-- ---

(523-615)

(655-747)

(976-1068)

(1282-1374)

ACA ACC ACT AU ACC ACT AC6 6TT T6C TCT ACA 66A ACA MC TCT 6CC GGT 0A8 ACT ACCTCT 66A TGC TCT CCA AM ACC 6TT ACA ACT ACT

T --- --A --- --_ --_ __ _ _ __ --- --- A-- --- --- ---

C-- --- --- --- --T --- --- --- --- --- ----T -- --- --- --- --- --- --- ------ --- --- --- --- --- --T --C --- --C ---

C-- --T --A --A --- --- --T -M --T --- G-T -CT --- --- 6 --- --- --- --- --A--- -T- --- --- 6-T --- --T A-C 6T- -6- T--

(748-828) 6TT CCA T6T TCA ACC A6T CCA A6C 6AA ACC 6CC TC6 6A TCA ACA ACC ACT TCA CCT ACCACA CCT GTA ACT ACA 6TT 6TC

(1069-1149) --- --T --- -- --T 6-- A-T 6- --- T- -TA-T --- 6-T --C --- CT- GTT A-A --A

6-T 6TC AC- --C --C --- --T

(883-942)

Si b (1189-1248)(1486-1545)

ATT ACA ACT ACA m 6TC ACC AMA MC ATT CCA ACC ACT TAC CTA ACC ACA ATT 6CT CCA

-C6 --- --- 66T -AC ACA--A --TCT 6-A-- --- --C-T 6--|--- --T T-6 --- ---

-CA --- --- 6G6T -AC ACA--- -6 TC- --C --- --- --- --- A- --- --T T-6 AT- ---

(1150-1185) ACC ACT 688 TCC TCT 8C 661T ACT MC TCC GCT 6MT(1447-1482)---- - A ---TT --- --- --- --- --- -C

FIG. 3. Sequence diversity of the repeated units in Si. The sixspecies of direct repeats in Si are aligned. The bases identical to thetop sequence are marked by bars. The numbers in parenthesesindicate the nucleotide numbers (Fig. 2b). The regions which werefused with S2 and SGA are designated as Sla and Slb, respectively.The homologous blocks (HB2 and HB3) are boxed, and thenonanucleotide sequences are underlined (see Discussion).

1 2 3 4

S2-_

URA3-*i *

5 6 7 8

STAl 9

URA3- 4a **

FIG. 4. Genetic controls of transcript levels from S2 and STAI.Transcript levels from S2 (left) and STAI (right) were investigatedby Northern blotting. RNAs were isolated from the followingstrains: YIY342 (MATa stal AMY2 inh°; lane 1), YIY440(MATalMATa sta0/stal AMY2/AMY2 inh°linh°; lane 2), N361-9A(MATot stal AMY2 INHI; lane 3), YIY379 (MATTa stal amy2 inh°;lane 4), 5106-9A (MATa STA1 AMY2 inh°; lane 5), YMI115(MATa/MATTa STAI/STAI AMY2/AMY2 inh°/inh°; lane 6), YIY2-12B (MATa STA1 AMY2 INHI; lane 7), YKF12 (MATaSTAI amy2 inh°; lane 8). All strains contain the S2 and Si regionsand SGA.

nucleotides were exchanged (marked by asterisks), and 2nucleotides were deleted (marked by solid triangles).
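The distinction drawn here between silent substitutions (no change in the encoded amino acid) and replacement substitutions can be made mechanical. The Python sketch below is illustrative only and is not the procedure used for Fig. 2: it assumes gap-free, in-frame coding sequences of equal length read with the standard genetic code, and the function name `classify_substitutions` and the toy sequences are hypothetical.

```python
# Standard genetic code: codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_substitutions(reference: str, variant: str) -> dict[str, int]:
    """Count silent and replacement nucleotide substitutions between two
    equal-length, in-frame, gap-free coding sequences (A, C, G, T only).

    Each differing site is classified by introducing that single change
    into the reference codon; this is a simplification when one codon
    differs at more than one site.
    """
    reference, variant = reference.upper(), variant.upper()
    if len(reference) != len(variant) or len(reference) % 3 != 0:
        raise ValueError("sequences must have equal length divisible by 3")
    counts = {"silent": 0, "replacement": 0}
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3]
        var_codon = variant[start:start + 3]
        for offset in range(3):
            if ref_codon[offset] == var_codon[offset]:
                continue
            # Mutate only this site of the reference codon.
            mutant = ref_codon[:offset] + var_codon[offset] + ref_codon[offset + 1:]
            same_residue = CODON_TABLE[mutant] == CODON_TABLE[ref_codon]
            counts["silent" if same_residue else "replacement"] += 1
    return counts


# Hypothetical toy sequences (not from Fig. 2): GCT->GCC keeps Ala, AAA->AGA gives Arg.
print(classify_substitutions("ATGGCTAAA", "ATGGCCAGA"))
```

Under these assumptions the toy pair yields one silent and one replacement substitution.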

S1 (Fig. 2b) encoded a protein of 570 amino acids, of which the peptide from Ser-109 to Val-411 (marked by arrows with a hook) matched the peptide from Ser-31 to Val-289 of the STA1 glucoamylase (Fig. 1), except that the peptide from Val-84 to Thr-127 of the STA1 glucoamylase was duplicated in S1 (the duplicated sequences are underlined in Fig. 2b and also presented schematically in Fig. 1). In S1, there were 12 silent substitutions and 14 replacement substitutions (marked by asterisks in Fig. 2b). We found that S1 was composed of six species of unit sequences, each repeated two to nine times. The nucleotide sequences of the units were aligned to examine their diversity (Fig. 3), and the alignment showed frequent base substitutions among the repeats in S1.
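Fig. 3 uses a compact notation in which bases identical to the top sequence are marked by bars (rendered as dashes in plain text). A minimal Python sketch, assuming gap-free rows of equal length, expands such rows back into full sequences and summarizes their diversity as the mean pairwise proportion of differing sites (p-distance). The rows below are hypothetical stand-ins, not the published repeat units, and the p-distance is our choice of summary rather than a statistic reported in the paper.

```python
from itertools import combinations


def expand_row(top: str, row: str, match_mark: str = "-") -> str:
    """Rebuild a full sequence from a row written in the Fig. 3 convention,
    where a match mark means 'same base as the top sequence'.
    Whitespace (codon spacing) is ignored; gaps are assumed absent."""
    top = "".join(top.split())
    row = "".join(row.split())
    if len(top) != len(row):
        raise ValueError("row and top sequence differ in length")
    return "".join(t if r == match_mark else r for t, r in zip(top, row))


def mean_pairwise_p_distance(units: list[str]) -> float:
    """Average fraction of differing sites over all pairs of aligned units."""
    pairs = list(combinations(units, 2))
    if not pairs:
        raise ValueError("need at least two units")
    total = sum(sum(x != y for x, y in zip(a, b)) / len(a) for a, b in pairs)
    return total / len(pairs)


# Hypothetical rows in the figure's notation (not the published alignment).
top = "GCT CCA GTA CCA"
rows = ["--- --C --- ---", "--A --- --- --T"]
units = ["".join(top.split())] + [expand_row(top, r) for r in rows]
print(units)
print(round(mean_pairwise_p_distance(units), 4))
```

For these toy rows the pairwise differences are 1, 2, and 3 sites out of 12, giving a mean p-distance of about 0.1667.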

Figure 2c shows the nucleotide sequence of SGA, coding for the intracellular glucoamylase, in which a unique open reading frame of 510 amino acids was identified. The sequence from Phe-33 to Asn-510 (marked by arrows with a hook) of the putative SGA glucoamylase was almost identical to that from Phe-290 to Asn-767 of the STA1 glucoamylase (Fig. 1); in this region there were 2 silent substitutions and 14 replacement substitutions (marked by asterisks). In the 3'-flanking region (+1 to +327), substitutions of two nucleotides (marked by asterisks), an insertion of an AA dinucleotide (boxed), and a deletion of two nucleotides (marked by a solid triangle) were observed.
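The residue coordinates reported for S2, S1, and SGA can be checked for internal consistency. In this Python sketch (the `Segment` record and the helper names are hypothetical), the 44-residue surplus of the matched S1 span over its STA1 counterpart equals the length of the duplicated Val-84 to Thr-127 peptide, and the three STA1 spans together cover residues 1 to 767 without a gap, overlapping by two residues (31 and 32) at the S2-S1 junction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A stretch of one gene product matched to STA1 glucoamylase residues."""
    gene: str
    start: int       # first matching residue, in the gene's own numbering
    end: int         # last matching residue, in the gene's own numbering
    sta1_start: int  # corresponding first residue of the STA1 glucoamylase
    sta1_end: int    # corresponding last residue of the STA1 glucoamylase


def length(first: int, last: int) -> int:
    """Number of residues in an inclusive range."""
    return last - first + 1


# Coordinates as reported in the text for S2, S1, and SGA.
SEGMENTS = [
    Segment("S2", 1, 32, 1, 32),
    Segment("S1", 109, 411, 31, 289),
    Segment("SGA", 33, 510, 290, 767),
]
DUPLICATED_REPEAT = (84, 127)  # STA1 Val-84 to Thr-127, present twice in S1


def surplus(segment: Segment) -> int:
    """Residues in the gene's matched span beyond the matching STA1 span."""
    return length(segment.start, segment.end) - length(segment.sta1_start, segment.sta1_end)


def covers_without_gaps(segments: list[Segment]) -> bool:
    """True if consecutive STA1 spans leave no uncovered residue between them."""
    ordered = sorted(segments, key=lambda s: s.sta1_start)
    return all(nxt.sta1_start <= cur.sta1_end + 1 for cur, nxt in zip(ordered, ordered[1:]))


for seg in SEGMENTS:
    print(seg.gene, surplus(seg))
print(length(*DUPLICATED_REPEAT), covers_without_gaps(SEGMENTS))
```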

[Fig. 5 alignment image: S2 and S1a aligned with STA1 around amino acids 21 to 42 (homologous blocks HB1 and HB2); S1b and SGA aligned with STA1 around the second junction (homologous blocks HB3 to HB5)]

FIG. 5. Nucleotide sequences around the junctions. The sequences of S2, S1 (S1a and S1b), and SGA around the junctions are aligned. The corresponding sequences of STA1 are also shown for clarity. The numbers above and below the sequences indicate the amino acid numbers according to the numbering of the STA1 glucoamylase (Fig. 1). The homologous blocks (HBs 1 to 5) are boxed, and identical bases within them are indicated by vertical lines. The nonanucleotide sequences commonly present on the 5' side of the junctions are underlined.

Control of gene expression at the transcriptional level. Genetic controls of transcript levels from S2 and STA1 were investigated by RNA blotting, using restriction fragments D and C (Fig. 1), respectively, as probes. To examine the S2 transcript, RNA was extracted from cells carrying no STA1. Although fragment C also hybridizes to SGA, only the STA1 transcript and not the SGA transcript was detected in cells cultured in YPGL medium, because SGA is expressed only during sporulation. The data are shown in Fig. 4. From S2, a 4.2-kilobase transcript was observed in the control haploid cells (MATa sta° AMY2 inh°; lane 1). The transcript was greatly reduced or absent in diploid cells heterozygous at MAT (MATa/MATα sta°/sta° AMY2/AMY2 inh°/inh°; lane 2) and in haploid cells carrying either INH1 (MATα sta° AMY2 INH1; lane 3) or the amy2 mutation (MATα sta° amy2 inh°; lane 4). Transcription of STA1 was under identical genetic regulation by MAT, INH1, and AMY2 (lanes 5 to 8).
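The RNA blot results in Fig. 4 can be summarized as a single Boolean rule, sketched here in Python. The rule, the `Lane` record, and the encoding are our hypothetical summary, not a regulatory model proposed by the authors. Genotypes follow the Fig. 4 legend, and the STA1 lanes (5 to 8) are scored from the statement that STA1 transcription is under the same control as S2 transcription.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    lane: int
    strain: str
    mat_heterozygous: bool  # MATa/MATα diploid
    has_inh1: bool          # INH1 rather than inh°
    has_amy2: bool          # functional AMY2 rather than amy2
    transcript_seen: bool   # clear transcript band scored from Fig. 4


def predicted_transcript(lane: Lane) -> bool:
    """Hypothetical rule: a transcript appears only in cells that are not
    heterozygous at MAT, lack INH1, and carry functional AMY2."""
    return not lane.mat_heterozygous and not lane.has_inh1 and lane.has_amy2


LANES = [
    Lane(1, "YIY342", False, False, True, True),
    Lane(2, "YIY440", True, False, True, False),
    Lane(3, "N361-9A", False, True, True, False),
    Lane(4, "YIY379", False, False, False, False),
    Lane(5, "5106-9A", False, False, True, True),
    Lane(6, "YMI115", True, False, True, False),
    Lane(7, "YIY2-12B", False, True, True, False),
    Lane(8, "YKF12", False, False, False, False),
]

mismatches = [l.lane for l in LANES if predicted_transcript(l) != l.transcript_seen]
print("lanes disagreeing with the rule:", mismatches)
```

Under this encoding every lane agrees with the rule, and only lanes 1 and 5 are predicted to show a transcript.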

DISCUSSION

Recent evolution of an ancestral STA gene by fusion of S2, S1, and SGA. The above results strongly suggest that an ancestral STA gene was generated essentially by two fusion events (S2-S1 and S1-SGA) and a deletion of one copy of the direct repeat Val-84 to Thr-127 in S1 (Fig. 1). It is equally possible that S1 contained a single copy of the sequence Val-84 to Thr-127 when the fusion of S2, S1, and SGA occurred and that the duplication of this sequence occurred recently in S. cerevisiae. In this scheme, S1 functioned to connect S2 and SGA, which encode the signal peptide for protein secretion and the intracellular glucoamylase, respectively. These events probably occurred very recently, because the STA1 sequence is highly conserved in S2, S1, and SGA. Alternatively, one could speculate that STA1 was divided into S2, S1, and SGA. However, the fusion model is more probable for the following reasons. (i) If STA1 were the ancestral gene for S2, S1, and SGA, several Saccharomyces species would be expected to contain STA1, since the DNA rearrangement occurred very recently; conversely, if S2, S1, and SGA were the origins of STA1, it is reasonable that only S. diastaticus contains STA1. We observed that 10 species (S. cerevisiae, S. chevalieri, S. willianus, S. uvarum, S. cordubensis, S. coreanus, S. oleaginosus, S. prostoserdovii, S. heterogenicus, and S. inusitatus) of the 33 Saccharomyces species examined contained DNA sequences homologous with SGA which could be candidates for STA1 (data not shown); however, as described above, only S. diastaticus secretes glucoamylase and can ferment starch. (ii) Gene disruption experiments in which foreign DNA fragments (LEU2 and URA3) were integrated indicated that S2, S1, and SGA are not essential for vegetative growth, sexual mating, or meiosis and sporulation (32; unpublished data). As described above, the remaining 23 Saccharomyces species contained no sequences homologous to SGA. It is therefore more reasonable to assume that STA1 was generated from S2, S1, and SGA, which had become nonessential, than that these nonessential genes evolved recently by disruption of STA1. The acquisition of STA1 (the ability to degrade starch into glucose) may have been a driving force in this gene evolution.

Sequences around the junctions. Because the STA1 sequence was highly conserved in S2, S1, and SGA, the junctions can be determined readily. Figure 5 shows the nucleotide sequences around both junctions (S2-S1a and S1b-SGA), together with the STA1 sequence for comparison. Short homologous blocks (HBs 1 to 5, boxed in Fig. 5) are found around the junctions, and these homologous sequences may have played a role in the mechanism of gene fusion. Comparable events have been reported in yeast (3) and in a cyanobacterium (4), where rearrangements between short direct repeats may be involved in the generation of spontaneous petite mitochondrial genomes and in the rearrangement of nitrogen fixation genes during heterocyst differentiation, respectively. Another structural feature is the presence of the sequence GTACCAACC (underlined in Fig. 3 and 5) at both junctions. Although S1 is composed of many direct repeats, the two nonanucleotide sequences may have been generated independently from different units (Fig. 3). Notably, both sequences are located nine nucleotides upstream of the putative junctions (Fig. 5). We propose that this common sequence played a role in the gene fusion together with the homologous blocks (HBs 1 to 5).
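The two junction features discussed above, short homologous blocks and the GTACCAACC nonanucleotide located nine nucleotides upstream of each junction, can be screened for with simple string searches. The Python sketch below uses hypothetical toy sequences, and it assumes that "nine nucleotides upstream" means that nine bases separate the last base of the motif from the first base of the downstream partner. Shared k-mers are only a crude proxy for the boxed homologous blocks of Fig. 5.

```python
def find_motif(seq: str, motif: str = "GTACCAACC") -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def spacing_to_junction(motif_start: int, motif_len: int, junction: int) -> int:
    """Bases between the motif's 3' end and a junction given as the 0-based
    index of the first base contributed by the downstream partner."""
    return junction - (motif_start + motif_len)


def shared_kmers(a: str, b: str, k: int) -> list[str]:
    """k-mers present in both sequences, as a rough screen for short homology."""
    def kmers(s: str) -> set[str]:
        return {s[i:i + k] for i in range(len(s) - k + 1)}
    return sorted(kmers(a) & kmers(b))


# Hypothetical fused sequence: motif, a 9-base spacer, then the downstream partner.
upstream = "AA" + "GTACCAACC" + "CATCACTTC"
fused = upstream + "GGTCCATCT"
junction = len(upstream)
hits = find_motif(fused)
print(hits, [spacing_to_junction(h, 9, junction) for h in hits])
print(shared_kmers("GCACTAGTTCC", "CCACTAGTTGG", 6))
```

In the toy example the motif starts at index 2 and ends nine bases before the junction; the two short sequences share three 6-mers.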

It is not clear whether these sequences are indeed recognized by the enzymes required for gene fusion. As we learn more about recent gene fusion events, we may find universal features of fusion junctions, although each fusion event will have its own characteristics.

Distribution of functional domains by gene fusion. Sequencing analysis revealed that both the 5' upstream sequence and the signal sequence for protein secretion of STA1 were derived from S2 (Fig. 1 and 2a). RNA blotting showed that transcription of S2 and of STA1 was under identical genetic regulation (Fig. 4), suggesting that the homologous 5'-flanking sequences of S2 and STA1 are involved in transcriptional regulation. Transcriptional regulation has been shown to be exerted at the 5' upstream region of many procaryotic and eucaryotic genes. Experimental support for this proposal comes from the observation that expression of a hybrid E. coli β-lactamase gene fused to the 5'-flanking region of STA1 (up to the HpaI site) was repressed by INH1 (data not shown). The main conclusion of the present work is that, in the evolutionary history of STA1, gene fusion is the most likely mechanism by which SGA acquired both the secretory signal sequence and the upstream regulatory sequence for transcription. These results imply that not only these sequences but also the signals for protein transfer into organelles and the structurally and functionally similar domains of multifunctional proteins may have been dispersed among modern genes from a limited number of origins by gene fusion, although their ancestral sequences can no longer be recognized because the domains of modern genes show only limited homologies, confined to the sequences strictly required for their functions (1, 2, 6, 12, 18).

ACKNOWLEDGMENTS

We thank A. Toh-e for helpful discussions and advice. We also thank T. Morinaga for providing yeast strains.

This work was supported in part by grants from the Ministry of Education, Science and Culture of Japan.

LITERATURE CITED

1. Bedouelle, H., P. J. Bassford, A. V. Fowler, I. Zabin, J. Beckwith, and M. Hofnung. 1980. Mutations which alter the function of the signal sequence of the maltose binding protein of Escherichia coli. Nature (London) 285:78-81.

2. Emr, S. D., and T. J. Silhavy. 1983. Importance of secondary structure in the signal sequence for protein secretion. Proc. Natl. Acad. Sci. USA 80:4599-4603.

3. Gaillard, C., F. Strauss, and G. Bernardi. 1980. Excision sequences in the mitochondrial genome of yeast. Nature (London) 283:218-220.

4. Golden, J. W., S. J. Robinson, and R. Haselkorn. 1985. Rearrangement of nitrogen fixation genes during heterocyst differentiation in the cyanobacterium Anabaena. Nature (London) 314:419-423.

5. Hall, M. N., L. Hereford, and I. Herskowitz. 1984. Targeting of E. coli β-galactosidase to the nucleus in yeast. Cell 36:1057-1065.

6. Horwich, A. L., F. Kalousek, W. A. Fenton, R. A. Pollock, and L. E. Rosenberg. 1986. Targeting of pre-ornithine transcarbamylase to mitochondria: definition of critical regions and residues in the leader peptide. Cell 44:451-459.

7. Jensen, R., G. F. Sprague, and I. Herskowitz. 1983. Regulation of yeast mating-type interconversion: feedback control of HO gene expression by the mating-type locus. Proc. Natl. Acad. Sci. USA 80:3035-3039.

8. Johnson, A. D., and I. Herskowitz. 1985. A repressor (MATα2 product) and its operator control expression of a set of cell type specific genes in yeast. Cell 42:237-247.

9. McMaster, G. K., and G. G. Carmichael. 1977. Analysis of single- and double-stranded nucleic acids on polyacrylamide and agarose gels by using glyoxal and acridine orange. Proc. Natl. Acad. Sci. USA 74:4835-4838.

10. Miller, A. M., V. L. MacKay, and K. A. Nasmyth. 1985. Identification and comparison of two sequence elements that confer cell-type specific transcription in yeast. Nature (London) 314:598-603.

11. Nasmyth, K. 1983. Molecular analysis of a cell lineage. Nature (London) 302:670-676.

12. Naumovski, L., and E. C. Friedberg. 1986. Analysis of the essential and excision repair functions of the RAD3 gene of Saccharomyces cerevisiae by mutagenesis. Mol. Cell. Biol. 6:1218-1227.

13. Ohno, S., Y. Emori, S. Imajoh, H. Kawasaki, M. Kisaragi, and K. Suzuki. 1984. Evolutionary origin of a calcium-dependent protease by fusion of genes for a thiol protease and a calcium-binding protein? Nature (London) 312:566-570.

14. Palm, D., R. Goerl, and K. J. Burger. 1985. Evolution of catalytic and regulatory sites in phosphorylases. Nature (London) 313:500-502.

15. Patthy, L. 1985. Evolution of the proteases of blood coagulation and fibrinolysis by assembly from modules. Cell 41:657-663.

16. Pelham, H. R. B. 1982. A regulatory upstream promoter element in the Drosophila hsp 70 heat-shock gene. Cell 30:517-528.

17. Rigby, P. W. J., M. Dieckmann, C. Rhodes, and P. Berg. 1977. Labelling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113:237-251.

18. Rosenblatt, M., N. V. Beaudette, and G. D. Fasman. 1980. Conformational studies of the synthetic precursor-specific region of preproparathyroid hormone. Proc. Natl. Acad. Sci. USA 77:3983-3987.

19. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.

20. Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98:503-517.

21. Stone, E. M., K. N. Rothblum, and R. J. Schwartz. 1985. Intron-dependent evolution of chicken glyceraldehyde phosphate dehydrogenase gene. Nature (London) 313:498-500.

22. Struhl, K., D. T. Stinchcomb, S. Scherer, and R. W. Davis. 1979. High-frequency transformation of yeast: autonomous replication of hybrid DNA molecules. Proc. Natl. Acad. Sci. USA 76:1035-1039.

23. Tamaki, H. 1978. Genetic studies of ability to ferment starch in Saccharomyces: gene polymorphism. Mol. Gen. Genet. 164:205-209.

24. Thomas, P. S. 1980. Hybridization of denatured RNA and small DNA fragments transferred to nitrocellulose. Proc. Natl. Acad. Sci. USA 77:5201-5205.

25. Ullrich, A., L. Coussens, J. S. Hayflick, T. J. Dull, A. Gray, A. W. Tam, J. Lee, Y. Yarden, T. A. Libermann, J. Schlessinger, J. Downward, E. L. V. Mayes, N. Whittle, M. D. Waterfield, and P. H. Seeburg. 1984. Human epidermal growth factor receptor cDNA sequence and aberrant expression of the amplified gene in A431 epidermoid carcinoma cells. Nature (London) 309:418-425.

26. Vlasuk, G. P., S. Inouye, H. Ito, K. Itakura, and M. Inouye. 1983. Effect of the complete removal of basic amino acid residues from the signal peptide on secretion of lipoprotein in Escherichia coli. J. Biol. Chem. 258:7141-7148.

27. Weber, I. T., K. Takio, K. Titani, and T. A. Steitz. 1982. The cAMP-binding domains of the regulatory subunit of cAMP-dependent protein kinase and the catabolite gene activator protein are homologous. Proc. Natl. Acad. Sci. USA 79:7679-7683.

28. Yamashita, I., and S. Fukui. 1983. Molecular cloning of a glucoamylase-producing gene in the yeast Saccharomyces. Agric. Biol. Chem. 47:2689-2692.

29. Yamashita, I., and S. Fukui. 1983. Mating signals control expression of both starch fermentation genes and a novel flocculation gene FLO8 in the yeast Saccharomyces. Agric. Biol. Chem. 47:2889-2896.


30. Yamashita, I., and S. Fukui. 1984. Isolation of glucoamylase-non-producing mutants in the yeast Saccharomyces diastaticus. Agric. Biol. Chem. 48:131-135.

31. Yamashita, I., and S. Fukui. 1984. Genetic background of glucoamylase production in the yeast Saccharomyces. Agric. Biol. Chem. 48:137-141.

32. Yamashita, I., and S. Fukui. 1985. Transcriptional control of the sporulation-specific glucoamylase gene in the yeast Saccharomyces cerevisiae. Mol. Cell. Biol. 5:3069-3073.

33. Yamashita, I., T. Maemura, T. Hatano, and S. Fukui. 1985. Polymorphic extracellular glucoamylase genes and their evolutionary origin in the yeast Saccharomyces diastaticus. J. Bacteriol. 161:574-582.

34. Yamashita, I., K. Suzuki, and S. Fukui. 1985. Nucleotide sequence of the extracellular glucoamylase gene STA1 in the yeast Saccharomyces diastaticus. J. Bacteriol. 161:567-573.

35. Yamashita, I., K. Suzuki, and S. Fukui. 1986. Proteolytic processing of glucoamylase in the yeast Saccharomyces diastaticus. Agric. Biol. Chem. 50:475-482.

36. Yamashita, I., Y. Takano, and S. Fukui. 1985. Control of STA1 gene expression by the mating-type locus in yeasts. J. Bacteriol. 164:769-773.
