THE JOURNAL OF EXPERIMENTAL ZOOLOGY 275:355-364 (1996)

Descriptive and Functional Characterization of Variation in the Fundulus heteroclitus Ldh-B Proximal Promoter

JEFF A. SEGAL, PATRICIA M. SCHULTE, DENNIS A. POWERS, AND DOUGLAS L. CRAWFORD

Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois 60637 (J.A.S., D.L.C.); Hopkins Marine Station, Stanford University, Pacific Grove, California 93950 (P.M.S., D.A.P.)

ABSTRACT Variation in enzyme expression may be an important mechanism for physiological and evolutionary adaptation. The Ldh-B locus in the teleost fish Fundulus heteroclitus is one of a very few loci for which an evolutionary difference in transcription rate between populations has been demonstrated. To begin to understand the molecular modifications that are responsible for altering transcription, we have characterized the Ldh-B proximal promoter using a combination of sequence analysis, transient transfection, and in vivo footprinting. The Ldh-B gene has several transcription start sites and a TATA-less, Inr (initiator of transcription motif) containing promoter with multiple Sp1-like motifs. Transfection experiments reveal that Sp1 sites, TCC repeats, and Inrs are functional components of the proximal promoter. We find substantial sequence variation between populations within the proximal promoter (250 bp from the transcription start sites), and footprinting analysis indicates that some of this sequence variation is associated with differential protein binding to the apparent TFIID binding site and Sp1 sites. Together, these data suggest that variation in the Ldh-B proximal promoter may play a role in the observed difference in transcription rates between northern and southern populations of F. heteroclitus. © 1996 Wiley-Liss, Inc.

Populations of the teleost, Fundulus heteroclitus, are nearly continuously distributed along the Atlantic coast of North America where there is a 1°C change in annual mean water temperature per degree change in latitude (Powers et al., '93). For example, northern populations at 44.23°N in Maine are consistently subjected to environmental temperatures approximately 12°C colder than their southern counterparts at 32.02°N in Georgia. Fish at the southern extreme of the species range can experience summer temperatures in excess of 40°C, while those at the northern extreme are seldom exposed to such warm temperatures. Conversely, winter temperatures in northern regions result in extensive ice formation, while southern marshes seldom approach freezing temperatures. For ectothermic organisms, these colder environmental temperatures could result in significant decreases in enzyme reaction rates if there were no compensatory changes (Schmidt-Neilsen, '90). In the liver of F. heteroclitus the glycolytic enzyme lactate dehydrogenase (LDH-B4) provides an example of compensatory changes in reaction rate resulting from fixed differences between populations in enzyme concentration (Crawford and Powers, '89) due to a twofold difference in the transcription rate of the Ldh-B gene (as measured by nuclear run-on assays, Crawford and Powers, '92). This supports the view that the mechanisms that regulate enzyme expression may be an important component of evolutionary adaptation (Crawford and Powers, '92; Laurie-Ahlberg, '85; Wilson, '76).

Transcription rates are typically governed by proteins (transcription factors) binding to specific DNA sequences in the 5' regulatory region (reviewed in Mitchell and Tjian, '89; Sawadogo and Sentenac, '90). Variation in the interactions between these proteins and their binding sites is required to modify gene expression within or between species. This variation could arise as a result of sequence changes in the binding sites or polymorphisms in the transcription factors. As little is known about the extent and consequences of variation in these interactions in natural populations, we have begun to study the regulation of Ldh-B transcription rate in populations of F. heteroclitus as a model to provide insight concerning this important evolutionary phenomenon.

Received July 13, 1995; revision accepted March 15, 1996. Address reprint requests to Douglas L. Crawford, Department of Molecular Biology and Biochemistry, 5100 Rockhill Road, University of Missouri-Kansas City, Kansas City, MO 64110-2499.

As a first step toward elucidating the molecular mechanisms that underlie the natural variation in Ldh-B transcription rate, we have defined transcriptional start sites, compared Ldh-B regulatory sequences between populations, and defined functionally important regions by (1) an analysis of promoter strength in a heterologous transfection system and (2) in vivo footprinting. The transfection of different promoter regions into heterologous cells provides data regarding the functional roles of sequence motifs in the Ldh-B proximal promoter, but may not reveal population differences because of tissue and species specific effects (Clayton et al., '85). Conversely, the in vivo footprinting experiments allowed us to locate regions within the 5' promoter sequence that bind F. heteroclitus hepatocyte nuclear proteins and examine whether the variation in binding of these proteins is associated with sequence variation (Mirkovitch and Darnell, '91; Mueller and Wold, '89; Reddy et al., '94). Together, these assays should define the functional cis-acting elements within the Ldh-B proximal promoter and determine whether sequence variation between populations affects processes involved in transcription. The combination of sequence and functional analyses can provide insight into the evolution of gene regulation that either approach alone cannot.

METHODS

Genomic clones

Fundulus heteroclitus homozygous for northern and southern Ldh-B alleles were collected from Wiscasset, Maine (44.2°N latitude) and Whitney Marine Laboratory, Florida (29.9°N latitude), respectively. Two genomic libraries were constructed from high molecular weight testicular DNA from a single individual from each of these populations. These DNAs were partially digested with Sau3A and ligated in lambda FIX II arms (Stratagene, La Jolla, CA). The 2.5 kb of 5' Ldh-B flanking regions were isolated from these libraries and subcloned into pBSKSII (Stratagene, La Jolla, CA). These two clones were sequenced in both directions with single stranded dideoxy reactions using Sequenase (U.S. Biochemical, Cleveland, OH). Sequences were aligned and the binding motifs of transcription factors were identified using the MacVector 4.0 (Eastman Chemical Co., Rochester, NY) and Signal Scan 3.3 (Prestridge and Stormo, '93) software packages.

Primer extension

Transcriptional start sites were identified by primer extension on total RNA isolated from single individual F. heteroclitus livers. Primer extension analysis used a 32P end labeled primer, 30 nucleotides long, located at +100 to +130, and followed standard procedures (Treizenberg, '92). To determine product sizes, sequence reactions were electrophoresed simultaneously and used as molecular weight standards. These experiments were repeated on at least three individuals from each population.

Plasmid DNAs

Plasmid DNAs for transfection experiments were generated by subcloning portions of north or south Ldh-B promoter region (derived from the genomic clones) into the luciferase vector pGL2-Basic (Promega, Madison, WI). By linking these promoter DNA segments to a luciferase reporter gene, their relative promoter strength could be monitored by measuring the amount of luciferase protein produced in transfected cells. All constructs were confirmed by sequence analysis.

Cell culture and transfection

A salmon embryonic cell line (CHSE-214; ATCC, Rockville, MD) was used for transfection experiments. This cell line was maintained at 21°C in minimum essential medium (MEM; Gibco, Grand Island, NY) supplemented with 25 mM HEPES buffer and 10% fetal bovine serum (FBS; Hyclone, Logan, UT). The cells were transfected with plasmid DNA by calcium phosphate co-precipitation followed by a glycerol shock (Friedenreich and Schartl, '90). To correct for transfection and harvesting efficiency, every Ldh-B promoter-luciferase construct was co-transfected with an internal control construct (a CMV promoter linked to the β-galactosidase gene; supplied by Nikolsky and Casadaban, University of Chicago, unpublished results). All constructs were transfected in triplicate (three separate plates) per experiment, in a minimum of three replicate experiments. Negative control transfections (transfection of the luciferase gene with no attached promoter) accompanied every experiment. Protein from each transfected plate of cells was harvested using Reporter Lysis Buffer (Promega, Madison, WI) and these protein extracts were assayed in a luminometer (Optocomp 1; MGM Instruments, Hamden, CT) for both luciferase activity (Promega) and β-gal activity (Tropix, Bedford, MA).

Promoter activity is expressed as net relative light units (luciferase activity minus the negative control) normalized for transfection efficiency (net β-galactosidase activity, i.e., β-galactosidase activity minus the negative control). This normalization corrects for variation in transfection efficiency. Net β-galactosidase activity from transfected cells yielded more than 10^5 U. Net luciferase activity for the most active promoter construct (pL-LPr5S) also yielded more than 10^5 U.
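The normalization described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (net luciferase signal divided by net β-galactosidase signal, averaged over triplicate plates); the function names and all readings below are hypothetical and are not data from this study.

```python
from statistics import mean


def normalized_activity(luc_raw, luc_neg, bgal_raw, bgal_neg):
    """Net luciferase signal divided by net beta-galactosidase signal.

    luc_neg and bgal_neg are the negative-control readings that are
    subtracted from the raw luminometer readings.
    """
    net_luc = luc_raw - luc_neg
    net_bgal = bgal_raw - bgal_neg
    if net_bgal <= 0:
        raise ValueError("net beta-galactosidase activity must be positive")
    return net_luc / net_bgal


def mean_normalized_activity(plates, luc_neg, bgal_neg):
    """Average the normalized activity over replicate plates.

    plates is a list of (raw luciferase, raw beta-gal) readings.
    """
    return mean(
        normalized_activity(luc, luc_neg, bgal, bgal_neg) for luc, bgal in plates
    )


# Hypothetical triplicate readings for one construct (illustrative only).
plates = [(201_000, 402_000), (151_000, 302_000), (251_000, 502_000)]
result = mean_normalized_activity(plates, luc_neg=1_000, bgal_neg=2_000)
```

Dividing by the co-transfected control means that a plate which took up less DNA overall is not mistaken for a weak promoter.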

In vivo footprinting

A complete description of the in vivo footprinting methodology (Mueller and Wold, '89, with modifications) is published elsewhere (Schulte et al., '95). Briefly, nuclei were isolated from individuals from each population and subjected to mild DNase I digestion. The banding patterns from condensed DNA in nuclei and DNA from the same individual stripped of its proteins prior to DNase digestion were compared. The sequence of the Ldh-B promoter was determined for all individuals used for footprinting assays.

RESULTS

Multiple transcriptional start sites

Ldh-B mRNA from three individuals from both Maine and Georgia populations had seven different transcriptional start sites (Fig. 1). The seven start sites fall into two groups, each associated with sequence motifs similar to the terminal deoxynucleotidyltransferase initiator sequence (Inr, Roeder, '91; Smale and Baltimore, '89). Two transcriptional start sites at +8 and +13 predominate based on the density of the primer extension products (bands 118 and 122; Fig. 1). Examination of the minor start sites suggests there may be quantitative variation between populations. For clarity in numbering, we have retained the assignment of the +1 nucleotide as defined previously based on cDNA analysis (Lauerman, '90). RNase protection assays yielded results similar to those found by primer extension (J.S., unpublished data).

Functional cis-acting elements

The F. heteroclitus Ldh-B proximal promoter consists of three regions which are of potential functional importance based on sequence similarity to known motifs and location relative to the major transcriptional start sites (Figs. 2 and 3C).

[Figure 1: primer extension gel, image not reproduced. Lanes: negative control, two positive controls (South and North), North RNA, and South RNA. Product lengths marked: 180, 151, 147, 141, 139, 134, 125, 122, 118.]

Fig. 1. Transcription start sites of the Ldh-B locus in liver. Two predominant transcription start sites (118 and 122) and five minor sites were identified using primer extension. Liver RNA was reverse transcribed using a primer located at +100 to +130. Yeast RNA was used as a negative control. Positive control cRNAs (bands 141 and 180) were synthesized from the two LDH-B cDNA alleles with the use of Sp6 polymerase. The size difference of the two controls reflects both variation in the cDNA length and changes in the lengths of the polylinker. Product lengths are indicated on the right.

These regions are the Inr motif, apparent TFIID binding site at -30 bp (TCC repeat) and Sp1 binding sequences. These regions are found in all individuals sequenced (see Fig. 3C, and unpublished data). The Inr motifs, as mentioned above, are associated with the start of transcription. At -30 bp, the Ldh-B promoter lacks a TATA box which is the typical binding site for TFIID [a complex of protein factors which are required for the initiation of transcription (Roeder, '91)] and is completely devoid of adenosines which might form a weak TATA consensus sequence. Instead, this region contains a TCC repeat. The two Inrs (associated with the two clusters of start sites as previously discussed) do not themselves bind the TFIID complex (Roeder, '91; Smale and Baltimore, '89) but may direct its binding to sequences at -30 from the start of transcription, eliminating the need for a TATA sequence (Smale et al., '90; Zenzie-Gregory et al., '93). Upstream of the TCC repeats, between -49 and -80, there are several Sp1 motifs that have 5 of 6 nt that are identical to the consensus Sp1 binding site, GGGCGG (Dynan and Tjian, '83), and these motifs are in the reverse complement. Imperfect Sp1 motifs (5 of 6 bp) promote transcription in other TATA-less promoters (Ciudad et al., '92; Jeang et al., '93) even when these motifs occur on the opposite strand (i.e., the reverse complement; O'Shea-Greenfield and Smale, '92; Smale and Baltimore, '89).
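The "5 of 6" criterion for Sp1-like motifs can be illustrated with a short scan of both strands. The sketch below is not the MacVector or Signal Scan procedure used in this study; it is a minimal stand-in that slides a 6 nt window along a sequence and reports windows matching GGGCGG, or its reverse complement CCGCCC, in at least five positions. The function names and the example sequence are hypothetical.

```python
CONSENSUS = "GGGCGG"  # consensus Sp1 binding site
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_matches(window, motif):
    """Number of positions at which window and motif agree."""
    return sum(a == b for a, b in zip(window, motif))


def find_sp1_like(seq, min_matches=5):
    """List (start, strand, window, matches) for Sp1-like 6-mers.

    strand "+" means the window resembles GGGCGG as written;
    strand "-" means it resembles the reverse complement, CCGCCC.
    """
    seq = seq.upper()
    motifs = (("+", CONSENSUS), ("-", reverse_complement(CONSENSUS)))
    hits = []
    for start in range(len(seq) - len(CONSENSUS) + 1):
        window = seq[start:start + len(CONSENSUS)]
        for strand, motif in motifs:
            n = count_matches(window, motif)
            if n >= min_matches:
                hits.append((start, strand, window, n))
    return hits


# Hypothetical sequence containing one reverse-complement site.
hits = find_sp1_like("TTCCGCCCTT")
```

A scan of this kind only nominates candidate sites; as the footprinting results show, not every matching motif is occupied in vivo.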

To test the functional significance of these motifs in the F. heteroclitus Ldh-B promoter, different portions of this proximal promoter were transfected into salmon embryonic cells (Fig. 2). Deletion of a region that included the Sp1 sites results in an approximate 5-fold decrease in promoter activity for both north and south (compare pL-LPr5 and pL-LPr7, Fig. 2). Since other TATA-less promoters depend on Sp1 sites (Azizkhan et al., '93), these sites are the most likely candidate for the demonstrated effect on activity, despite the presence of additional sequence in the region that was deleted. Plasmids pL-LPr8 consisted of further deletions which replaced the TCC repeats (putatively assigned as the TFIID binding site based on distance from the major start sites) with plasmid sequences. This deletion resulted in an additional 5-fold reduction in promoter activity and thus a 25-fold decrease in activity compared to the intact proximal promoter (compare plasmids pL-LPr5, 7, and 8, Fig. 2). This suggests that the TCC repeats constitute a relatively sequence specific activating region, because the plasmid sequence is incapable of functionally replacing it. Finally, deletion to +20, removing Sp1 sites, TCC repeat, and Inrs, resulted in a complete loss of promoter activity (see pL-LPr9, Fig. 2). In summary then, the Inr-start site region is capable of generating a low level of expression that is increased 5-fold by the addition of the TCC repeat (putative TFIID binding site) region and 5-fold more with the further addition of the Sp1 site cluster region. The difference between north and south pL-LPr7 constructs (promoters containing the TCC repeats and Inrs) is statistically significant (P < 0.001) but of unknown biological significance.

[Figure 2: schematic of constructs, image not reproduced. Regions drawn: Sp1 site cluster, TCC repeat, Inrs and start site clusters. Values recovered from the figure:]

Construct   Region (where legible)   Promoter Activity (%), North   Promoter Activity (%), South
pL-LPr5     not legible              0.877 ± 0.044                  0.936 ± 0.065
pL-LPr7     -45 to +75               0.147 ± 0.014                  0.210 ± 0.022
pL-LPr8     not legible              0.039 ± 0.005                  0.033 ± 0.002
pL-LPr9     +20 to +75               0                              0

The figure marks an approximately 5-fold reduction between pL-LPr5 and pL-LPr7 and again between pL-LPr7 and pL-LPr8.

Fig. 2. Transfection of proximal promoter constructs. Schematic shows plasmids used for transfection into salmon embryonic cells (CHSE-214). The proximal promoter region is defined as the region between -125 and +75 and includes the Sp1 site cluster, TCC repeat, and Inr elements. The results indicate that all three regions of cis-acting elements contribute to proximal promoter activity. Promoter activity is luciferase light units normalized for transfection efficiency (β-gal activity) and expressed relative to pL-LPr5S (south). Values given are means of three separate experiments (each construct is measured in triplicate per experiment) plus or minus one standard deviation. Symbols in the schematic (not reproduced here) mark the Sp1 site cluster region, the TCC repeat region, the Inr, the cluster of minor transcriptional start sites, and the cluster of major start sites.
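The fold changes quoted in the text can be checked directly against the mean activities given in Fig. 2. The short sketch below simply divides the mean of one construct by the mean of the next deletion; it ignores the reported standard deviations, and the dictionary layout and function name are illustrative choices rather than anything used in the original analysis.

```python
# Mean normalized promoter activities as given in Fig. 2.
activity = {
    "pL-LPr5": {"north": 0.877, "south": 0.936},
    "pL-LPr7": {"north": 0.147, "south": 0.210},
    "pL-LPr8": {"north": 0.039, "south": 0.033},
}


def fold_reduction(before, after):
    """Ratio of activity before a deletion to activity after it."""
    return before / after


def stepwise_reductions(population):
    """Fold reductions for each successive deletion and overall."""
    a5 = activity["pL-LPr5"][population]
    a7 = activity["pL-LPr7"][population]
    a8 = activity["pL-LPr8"][population]
    return {
        "Sp1 region deleted (5 -> 7)": fold_reduction(a5, a7),
        "TCC repeat replaced (7 -> 8)": fold_reduction(a7, a8),
        "overall (5 -> 8)": fold_reduction(a5, a8),
    }


for population in ("north", "south"):
    for step, fold in stepwise_reductions(population).items():
        print(f"{population:5s} {step}: {fold:.1f}-fold")
```

The individual steps range from roughly 4-fold to 6-fold and the overall reductions from roughly 22-fold to 28-fold, which is consistent with the rounded "5-fold" and "25-fold" figures in the text.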

Substantial sequence variation

There is substantial sequence variation within the 2.5 kb of 5' flanking region between the two individuals used to construct the genomic libraries. The percent difference between sequences is 4.9% and is greater than the 1.5% found in the fourfold degenerate sites of the cDNA (Bernardi et al., '93) but similar to that found in the first intron (5.2%; D.A.P., unpublished data). Promoter regions from six individuals from each population were compared and determined to have an average pairwise difference of 5.5% between populations, 1.5% difference within the Maine population and a 2.9% difference within the Georgia population. These sequences reveal that there are more differences between populations than within a population. Importantly, all northern sequences have a C to T transition within the TCC repeat at -27 and this is not found in any southern sequence. Additionally, all southern individuals have a 6 bp insertion before the first protected Sp1 motif that is not present in any northern sequence.
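Average pairwise differences of the kind reported above can be computed from aligned sequences as follows. This is a sketch under stated assumptions, not the procedure used in the study: the text does not say how alignment gaps were scored, so here each column in which exactly one sequence has a gap is counted as one difference, and columns gapped in both sequences are skipped. The function names are hypothetical.

```python
from itertools import combinations, product
from statistics import mean


def percent_difference(a, b):
    """Percent of compared alignment columns at which a and b differ.

    Assumption: a column gapped in only one sequence counts as one
    difference; a column gapped in both is not compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * differences / compared


def mean_within(seqs):
    """Average pairwise percent difference within one population."""
    return mean(percent_difference(a, b) for a, b in combinations(seqs, 2))


def mean_between(group1, group2):
    """Average pairwise percent difference between two populations."""
    return mean(percent_difference(a, b) for a, b in product(group1, group2))
```

With six individuals per population, `mean_within` averages 15 comparisons per population and `mean_between` averages 36, so a few divergent individuals can shift the within-population figure more than the between-population one.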

Differential protein binding

In vivo footprinting analyses were performed on several individuals from both populations to address the significance of sequence variation within the proximal promoter. Consistent differences in the banding patterns are indicative of the presence of transcription factors bound to the DNA in intact nuclei. These protected regions were mapped onto the sequence of the same footprinted individual (Fig. 3C). A region containing numerous protected sites occurs at the TCC repeat (putative TFIID binding site) between -25 and -45 (1a-f, Fig. 3A,C). In the southern population the T in each of the TCC repeats is protected (for one individual this protection extends beyond the 5' most TCC). In contrast, in the Maine individuals the TCC at -26 to -29 has a C to T transition (TCC to TTC) eliminating this recurrent sequence pattern and, importantly, the pattern of protected sites as well (Fig. 3).
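How a single C to T transition interrupts a run of TCC repeats can be shown with a small pattern search. The sequences below are invented for illustration and are not the Ldh-B promoter sequences; only the idea (TCC changed to TTC breaks the tandem repeat) comes from the text.

```python
import re


def tcc_repeat_runs(seq, min_repeats=2):
    """Return (start, number of repeats) for each tandem run of TCC."""
    pattern = re.compile(r"(?:TCC){%d,}" % min_repeats)
    return [
        (m.start(), len(m.group(0)) // 3) for m in pattern.finditer(seq.upper())
    ]


# Hypothetical "southern-like" sequence with four tandem TCC repeats.
south_like = "GGTCCTCCTCCTCCAG"

# Hypothetical "northern-like" variant: the first C of the third repeat
# (index 9) is changed to T, turning that TCC into TTC.
north_like = south_like[:9] + "T" + south_like[10:]

south_runs = tcc_repeat_runs(south_like)
north_runs = tcc_repeat_runs(north_like)
```

In this toy case the unbroken run of four repeats is reduced to a run of two, with a single isolated TCC left on the other side of the substitution.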

There are several protected sites within the Sp1 site region between -49 and -80 (2, 3, 4, and 5; Fig. 3A,C). Two of the protected sites, 2 and 3, were described as single footprints because only a single base pair intervened between the protected regions. The protected Sp1 sites vary in their distance from the major transcriptional start sites between the two populations. This can be seen most clearly for footprints 3a and 3b (Fig. 3A) where the protected sites are at different distances from the Inrs between populations. However, when these protected sites are mapped on the aligned sequences they occur at the same sequence motif (Fig. 3C), indicating that these protected sites are sequence specific. Not all Sp1 sequence motifs are protected, however, and some of the footprints which vary between populations occur at sequences that are not different (2 and 4, Fig. 3C). This suggests that other protein factors or sequences may contribute to binding or lack of binding in this region.
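The observation that a footprint can sit at the same motif in an alignment yet at different distances from the start sites follows from insertions and deletions between the two sequences. The sketch below uses two invented aligned sequences, one with a 6 bp gap, to show the bookkeeping; the sequences, the function name, and the choice to measure distance to the 3' end of the row are all assumptions made for illustration.

```python
def residues_to_3prime_end(aligned, column):
    """Count non-gap residues from an alignment column to the row end."""
    return sum(1 for ch in aligned[column:] if ch != "-")


# Hypothetical aligned rows; "-" marks a 6 bp deletion in the north row
# relative to the south row, located downstream of the shared motif.
north_row = "ACGTCCGCCCTT------AGCTCCTCC"
south_row = "ACGTCCGCCCTTGGATCAAGCTCCTCC"

motif = "CCGCCC"
north_col = north_row.find(motif)  # alignment column of the motif
south_col = south_row.find(motif)

north_distance = residues_to_3prime_end(north_row, north_col)
south_distance = residues_to_3prime_end(south_row, south_col)
```

Both rows place the motif in the same alignment column, but the physical distance to the downstream end differs by the length of the insertion.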

The final footprinted region starts at -139 (6, Fig. 3B,C). In all individuals the control DNA had three predominant bands (6a,b,c; Fig. 3B). These bands were missing in all experimental samples (DNA with bound proteins). However, this region is unlike other footprints in that there is less consistency between individuals within a population: the banding pattern was sensitive to the amount of DNase I and, instead of missing bands, there is variation in the intensity and location of bands. This variable banding pattern occurs from -198 to -248 in the northern population and from -221 to -248 in the southern population (6, Fig. 3B). Below and above these regions the banding patterns are the same between the control and experimental samples. The significance of the longer region in individuals from the northern population than in those from the southern population is unknown. A possible explanation for the variation in the footprinting pattern in this region is a change in the secondary structure.

DISCUSSION

The F. heteroclitus Ldh-B promoter lacks a TATA box around -30 and in this region there is no apparent sequence that might act as a weak TATA consensus. Instead of a TATA box, the promoter contains a series of TCC repeats and 3' to this region, at the two clusters of transcriptional start sites, are sequences similar to Inr motifs (Roeder, '91; Smale and Baltimore, '89). In the absence of any other elements, these Inr motifs are able to promote transcription (Fig. 2) and thus are functionally important. The multiple start sites we observe are not unusual in TATA-less promoters (Dynan and Tjian, '83) and are present in Ldh-B in ducks (Kraft et al., '93). In fact, the general structure of the proximal promoter of F. heteroclitus Ldh-B is strikingly similar to that of the human epidermal growth factor receptor (Ishii et al., '85; Johnson et al., '88) and the insulin receptor (Ishii et al., '85; Johnson et al., '88; McKeon et al., '90). These TATA-less promoters with multiple start sites also contain consensus Sp1 sites (in the same orientation as seen here) and TCC repeats of various lengths. In F. heteroclitus, promoter activity is dependent on the Sp1 motifs and TCC repeats (Fig. 2). Taken together, these data suggest that the F. heteroclitus Ldh-B promoter is typical of a poorly understood class of promoters which initiate transcription without the use of a TATA box. Minor structural variations in these TATA-less promoters may allow for subtle changes in transcription rate (O'Shea-Greenfield and Smale, '92; Zenzie-Gregory et al., '93).

Although sequence comparisons of regulatory regions between species (especially from distantly related taxa) are relatively common (Fang and Brennan, '92; Gumucio et al., '93; Magoulas et al., '93; McKenzie et al., '94; Vincent and Wilson, '89), there has been little attention paid to intra-specific variation. The few other studies comparing sequence variation in regulatory regions within a species have shown that variation is often very limited (less than 1%; Kreitman and Hudson, '91). In contrast, we observe very high levels of variation (5.5%) in the regulatory region of the F. heteroclitus Ldh-B gene. Surprisingly, the high level of sequence variation persists within 250 bp of the transcription start site, a region which we have shown to contain elements essential for promoter activity in salmon embryonic cells (Fig. 2). In the context of these transfection assays the sequence variation in the functional elements did not affect transcription. Transfections of cell cultures are informative when used to define functionally important sequences. However, they may not accurately reproduce subtle but important in vivo conditions. For example, qualitative and quantitative differences in tissue specific gene expression are due to the absence or presence of one or more transcription factors which vary between cell types (Alexander-Bridges et al., '92; Costa et al., '89, '90; Gregori et al., '93; Herbst et al., '91; Kuo et al., '92; Nishiyori et al., '94; Platt et al., '94). Thus, it is not unusual for cell cultures to fail to reproduce species specific, tissue specific, or even intact organ specific gene expression when transfected with proximal promoters (Babiss et al., '87; Bancroft et al., '92; Clayton et al., '85; Friedman et al., '87; Grayson et al., '88; Gregori et al., '93; Herbst et al., '91; Kuo and Darnell, '91; Li and Liao, '91). It could be that the sequence variation in the Ldh-B proximal promoter has no effect on transcription rate, as suggested by transfection studies. However, taking into account (1) that in the proximal promoter even a few nucleotide differences at the binding site of a transcription factor can result in large changes in transcription (Cooper, '92; Fang and Brennan, '92) and (2) the importance of tissue specific trans-acting factors, it is unclear whether the transfection assays accurately reflect the effect of Ldh-B sequence variation on transcription rates.

Fig. 3. In vivo footprinting of the Ldh-B regulatory region. A: The first 120 bp of the 5' regulatory region. One representative individual from each population is shown. Controls (C) are nuclear DNA stripped of proteins prior to DNase digestion. Experimental (E) samples are subjected to DNase as condensed DNA in isolated nuclei. Bars and brackets indicate protected sites and are numbered based on the distance from the transcription start site. Protected sites were similar for all individuals within a population. B: Starts at -139 bp. Boxes indicate a region containing substantial variation in the banding pattern between the control and experimental samples. Letters indicate three prominent bands seen in all controls. C: Aligned promoter sequences. (-) indicates deletions. Bent arrows indicate transcription start sites, with heavier arrows at the two predominant sites. Shaded ovals are the Inr motifs. Boxes enclose protected sites. Numbers and letters refer to sites shown in A and B above.

An alternative interpretation is that the sequence variation in the Ldh-B proximal promoter affects transcription in the natural context of F. heteroclitus hepatocytes by altered protein-DNA interactions. These interactions can be directly and more appropriately assayed by in vivo footprinting (Babiss et al., '87; Bancroft et al., '92; Mirkovitch and Darnell, '91; Reddy et al., '94). Sequence variability between north and south Ldh-B proximal promoters is correlated with differential in vivo protein binding in two functionally important regions: the Sp1 site region and the putative TFIID binding site (TCC repeat, Fig. 3). This type of variation can affect transcription. For example, in synthetic promoters containing Inr and Sp1 sites, variable sequences between -14 and -33 (TFIID binding site) resulted in altered transcription rates (Zenzie-Gregory et al., '93). These researchers suggested that the presence of an Inr allows for subtle variation in the TFIID binding site that could modulate transcription. The F. heteroclitus Ldh-B promoter contains a functional Inr motif region and thus the variation at the TCC repeat at -35 may affect transcription rate. This supposition is supported by the observation that the sequence variation at the Ldh-B putative TFIID binding site is associated with variation in protein-DNA interactions.

The results of the footprinting assays suggest that binding at the TCC sequences is more extensive in individuals from the south, yet in this population transcription rates are twofold lower than in individuals from Maine (Crawford and Powers, '92). The more extensive binding in southern individuals could be due to the interaction of other factors that inhibit transcription. Inhibitors that bind to proximal promoter elements have been characterized in other systems. For example, the transcription factor GATA-1 inhibits transcription in a cell specific manner by competing for the TATA-less TFIID binding site (Aird et al., '94). Similarly, in mammalian kidney the protein WT1 binds to the proximal consensus element TCCTCCTCC and represses transcription. A single substitution of T in the TCC sequence (as occurs exclusively in all northern Ldh-B sequences) has been shown to markedly reduce WT1 inhibition (Wang et al., '93). The existence of such an inhibitor requires that there is a hepatocyte or Fundulus specific factor that is not present in salmon embryonic cells. This is not unprecedented, as tissue specific protein-DNA interactions that are absent in cell cultures have been demonstrated at several proximal promoters (Aird et al., '94; Alexander-Bridges et al., '92; Babiss et al., '87; Costa et al., '89; Gregori et al., '93; Takiguchi and Mori, '91). Regardless of the actual mechanism, in F. heteroclitus the sequence variation at approximately -30 is correlated with differential footprinting that might alter transcription rate.

The Sp1 region also exhibits differential protein binding in vivo. The total number of protected Sp1 motifs is the same in each F. heteroclitus population, but their exact sequence and distance from the major transcription start sites vary (Fig. 3). In all southern individuals there is a greater distance between the first protected Sp1 motif and the TCC repeats in comparison to all northern individuals. Small variations in the distance between Sp1 sites and the initiator region, or nucleotide substitutions among these sites, can have modulatory effects on transcription (Ciudad et al., '92; O'Shea-Greenfield and Smale, '92). Notice that not all the Sp1 sequence motifs are protected and some of the footprints that vary between populations occur at sequences that are not different (2 and 4, Fig. 3C); thus other protein factors or sequences may contribute to binding or lack of binding in this region. Similarly, in the TGF-β3 promoter only a few of the many Sp1 sites contribute to activation of transcription (Geiser et al., '93). Additionally, for the liver-specific expression of the transthyretin gene, not all binding sites identified by transfection assays are utilized in vivo (Mirkovitch and Darnell, '91). It remains to be determined if variation in the utilization of Sp1 sites in F. heteroclitus could affect transcription.

Many studies have investigated the importance of sequence motifs for promoting transcription and from these studies we have begun to understand the processes responsible for the regulation of gene expression. Exploring the natural variation in these promoter elements provides insight concerning the evolution of gene regulation. While it remains to be seen which of the variants we have identified affect transcription rate in vivo, the combination of evidence presented above suggests that the Ldh-B proximal promoter region may differentially bind transcription factors (e.g., TFIID complex or Sp1) and thus play a role in the modulation of transcription rate. Sequence variation in this region is one of the few examples of a naturally occurring sequence polymorphism that alters the binding of transcription factors at the proximal promoter that is not associated with a disease state.

ACKNOWLEDGMENTS

The contributions by J.A.S. and P.M.S. were equal; order of authorship was randomly determined. The use of Dr. Radovan Zak's (University of Chicago) cell culture facilities and assistance is greatly appreciated. We thank Mark Martindale, Valerie Pierce, and Joe Quattro for their helpful comments and suggestions. Technical assistance was provided by Yeon Kim (Univ. Chicago). Crawford Norman's assistance in field collections is greatly appreciated. Support was provided by NSF grants OCE-9116016 and IBN-9419781 to D.L.C. and BSR-9022648 to D.A.P. J.A.S. was supported by NIH Pre-doctoral training grant GM 07183. P.M.S. was supported by a Natural Sciences and Engineering Research Council (Canada) postgraduate fellowship and a departmental fellowship from the Department of Biological Sciences, Stanford University.

LITERATURE CITED

Aird, W.C., J.D. Parvin, P.A. Sharp, and R.D. Rosenberg (1994) The interaction of GATA-binding proteins and basal transcription factors with GATA box-containing core promoters. A model of tissue-specific gene expression. J. Biol. Chem., 269:883-9.

Alexander-Bridges, M., L. Ercolani, X.F. Kong, and N. Nasrin (1992) Identification of a core motif that is recognized by three members of the HMG class of transcriptional regulators: IRE-ABP, SRY, and TCF-1 alpha. J. Cell. Biochem., 48:129-35.

Azizkhan, J.C., D.E. Jensen, A.J. Pierce, and M. Wade (1993) Transcription from TATA-less promoters: Dihydrofolate reductase as a model. Crit. Rev. Euk. Gene Exp., 3:229-54.

Babiss, L.E., R.S. Herbst, A.L. Bennett, and J.E. Darnell Jr. (1987) Factors that interact with the rat albumin promoter are present both in hepatocytes and other cell types. Genes Dev., 1:256-67.

Bancroft, J.D., S.A. McDowell, and S.J. Degen (1992) The human prothrombin gene: Transcriptional regulation in HepG2 cells. Biochemistry, 31:12469-76.

Bernardi, G., P. Sordino, and D.A. Powers (1993) Concordant mitochondrial and nuclear DNA phylogenies for populations of the teleost fish Fundulus heteroclitus. Proc. Natl. Acad. Sci. U.S.A., 90:9271-4.

Ciudad, C.J., A.E. Morris, C. Jeng, and L.A. Chasin (1992) Point mutational analysis of the hamster dihydrofolate reductase minimum promoter. J. Biol. Chem., 267:3650-6.

Clayton, D.F., A.L. Harrelson, and J.E. Darnell Jr. (1985) Dependence of liver-specific transcription on tissue organization. Mol. Cell. Biol., 5:2623-32.

Cooper, D.N. (1992) Regulatory mutations and human genetic disease. Ann. Med., 24:427-37.

Costa, R.H., D.R. Grayson, and J.E. Darnell Jr. (1989) Multiple hepatocyte-enriched nuclear factors function in the regulation of transthyretin and alpha 1-antitrypsin genes. Mol. Cell. Biol., 9:1415-25.

Crawford, D.L., and D.A. Powers (1989) Molecular basis of evolutionary adaptation at the lactate dehydrogenase-B locus in the fish Fundulus heteroclitus. Proc. Natl. Acad. Sci. U.S.A., 86:9365-9.

Crawford, D.L., and D.A. Powers (1992) Evolutionary adaptation to different thermal environments via transcriptional regulation. Mol. Biol. Evol., 9:806-13.

Dynan, W.S., and R. Tjian (1983) The promoter-specific transcription factor Sp1 binds to upstream sequences in the SV40 early promoter. Cell, 35:79-87.

Fang, X.M., and M.D. Brennan (1992) Multiple cis-acting sequences contribute to evolved regulatory variation for Drosophila Adh genes. Genetics, 131:333-43.

Friedenreich, H., and M. Schartl (1990) Transient expression directed by homologous and heterologous promoter and enhancer sequences in fish cells. Nucleic Acids Res., 18:3299-305.

Friedman, J.M., L.E. Babiss, M. Weiss, and J.E. Darnell Jr. (1987) Hepatoma variants (C2) are defective for transcriptional and post-transcriptional actions from both endogenous and viral genomes. EMBO J., 6:1727-31.

Geiser, A.G., K.J. Busam, S.J. Kim, R. Lafyatis, M.A. O'Reilly, R. Webbink, A.B. Roberts, and M.B. Sporn (1993) Regulation of the transforming growth factor-beta 1 and -beta 3 promoters by transcription factor Sp1. Gene, 129:223-8.

Grayson, D.R., R.H. Costa, K.G. Xanthopoulos, and J.E. Darnell Jr. (1988) A cell-specific enhancer of the mouse alpha 1-antitrypsin gene has multiple functional regions and corresponding protein-binding sites. Mol. Cell. Biol., 8:1055-66.

Gregori, C., A. Kahn, and A.L. Pichard (1993) Competition between transcription factors HNF1 and HNF3, and alternative cell-specific activation by DBP and C/EBP contribute to the regulation of the liver-specific aldolase B promoter. Nucleic Acids Res., 21:897-903.

Gumucio, D.L., D.A. Shelton, W.J. Bailey, J.L. Slightom, and M. Goodman (1993) Phylogenetic footprinting reveals unexpected complexity in trans factor binding upstream from the epsilon-globin gene. Proc. Natl. Acad. Sci. U.S.A., 90:6018-22.

Herbst, R.S., U. Nielsch, F. Sladek, E. Lai, L.E. Babiss, and J.E. Darnell Jr. (1991) Differential regulation of hepatocyte-enriched transcription factors explains changes in albumin and transthyretin gene expression among hepatoma cells. New Biol., 3:289-96.

Ishii, S., Y.H. Xu, R.H. Stratton, B.A. Roe, G.T. Merlino, and I. Pastan (1985) Characterization and sequence of the promoter region of the human epidermal growth factor receptor gene. Proc. Natl. Acad. Sci. U.S.A., 82:4920-4.

Jeang, K.T., R. Chun, N.H. Lin, A. Gatignol, C.G. Glabe, and H. Fan (1993) In vitro and in vivo binding of human immunodeficiency virus type 1 Tat protein and Sp1 transcription factor. J. Virol., 67:6224-33.

Johnson, A.C., S. Ishii, Y. Jinno, I. Pastan, and G.T. Merlino (1988) Epidermal growth factor receptor gene promoter. Deletion analysis and identification of nuclear protein binding sites. J. Biol. Chem., 263:5693-9.

Kraft, H.J., W. Hendriks, W.W. de Jong, N.H. Lubsen, and J.G. Schoenmakers (1993) Duck lactate dehydrogenase B/epsilon-crystallin gene. Lens recruitment of a GC-promoter. J. Mol. Biol., 229:849-59.

Kreitman, M., and R.R. Hudson (1991) Inferring the evolutionary histories of the Adh and Adh-dup loci in Drosophila melanogaster from patterns of polymorphism and divergence. Genetics, 127:565-82.

Kuo, C.J., P.B. Conley, L. Chen, F.M. Sladek, J.E. Darnell Jr., and G.R. Crabtree (1992) A transcriptional hierarchy involved in mammalian cell-type specification. Nature, 355:457-61.

Kuo, F.C., and J.E. Darnell Jr. (1991) Evidence that interaction of hepatocytes with the collecting (hepatic) veins triggers position-specific transcription of the glutamine synthetase and ornithine aminotransferase genes in the mouse liver. Mol. Cell. Biol., 11:6050-8.

Lauerman, T. (1990) The Functional Significance of the Amino Acid Differences Between Allelic Isozymes of Heart Type Lactate Dehydrogenase in Fundulus heteroclitus. Ph.D. Thesis. Johns Hopkins University, Baltimore, MD.

Laurie-Ahlberg, C.C. (1985) Genetic variation affecting the expression of enzyme-coding genes in Drosophila: An evolutionary perspective. Isozymes. Curr. Top. Biol. Med. Res., 12:33-88.

Li, X.X., and W.S. Liao (1991) Expression of rat serum amyloid A1 gene involves both C/EBP-like and NF kappa B-like transcription factors. J. Biol. Chem., 266:15192-201.

Magoulas, C., A. Loverre-Chyurlia, S. Abukashawa, L. Bally-Cuif, and D.A. Hickey (1993) Functional conservation of a glucose-repressible amylase gene promoter from Drosophila virilis in Drosophila melanogaster. J. Mol. Evol., 36:234-42.

McKenzie, R.W., J. Hu, and M.D. Brennan (1994) Redundant cis-acting elements control expression of the Drosophila affinidisjuncta Adh gene in the larval fat body. Nucleic Acids Res., 22:1257-64.

McKeon, C., V. Moncada, T. Pham, P. Salvatore, T. Kadowaki, D. Accili, and S.I. Taylor (1990) Structural and functional analysis of the insulin receptor promoter. Mol. Endocrinol., 4:647-56.

Mirkovitch, J., and J.E. Darnell Jr. (1991) Rapid in vivo footprinting technique identifies proteins bound to the TTR gene in the mouse liver. Genes Dev., 5:83-93.

Mitchell, P.J., and R. Tjian (1989) Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science, 245:371-8.

Mueller, P.R., and B. Wold (1989) In vivo footprinting of a muscle specific enhancer by ligation mediated PCR [published erratum appears in Science, May 18, 1990; 248:802]. Science, 246:780-6.

Nishiyori, A., H. Tashiro, A. Kimura, K. Akagi, K. Yamamura, M. Mori, and M. Takiguchi (1994) Determination of tissue specificity of the enhancer by combinatorial operation of tissue-enriched transcription factors. Both HNF-4 and C/EBP beta are required for liver-specific activity of the ornithine transcarbamylase enhancer. J. Biol. Chem., 269:1323-31.

O'Shea-Greenfield, A., and S.T. Smale (1992) Roles of TATA and initiator elements in determining the start site location and direction of RNA polymerase II transcription. J. Biol. Chem., 267:6450.

Platt, K.A., K.P. Claffey, W.O. Wilkison, B.M. Spiegelman, and S.R. Ross (1994) Independent regulation of adipose tissue-specificity and obesity response of the adipsin promoter in transgenic mice. J. Biol. Chem., 269:28558-62.

Powers, D.A., M. Smith, I. Gonzalez-Villasenor, L. DiMichelle, D.L. Crawford, G. Bernardi, and T. Lauerman (1993) A multidisciplinary approach to the selectionist/neutralist controversy using the model teleost, Fundulus heteroclitus. In: Oxford Surveys in Evolutionary Biology. D. Futuyma and J. Antonovics, eds. Oxford University Press, New York, vol. 9, pp. 43-108.

Prestridge, D.S., and G. Stormo (1993) SIGNAL SCAN 3.0: New database and program features. Comput. Appl. Biosci., 9:113-5.

Reddy, P.M., G. Stamatoyannopoulos, T. Papayannopoulou, and C.K. Shen (1994) Genomic footprinting and sequencing of human beta-globin locus. Tissue specificity and cell line artifact. J. Biol. Chem., 26923287-95.

Roeder, R.G. (1991) The complexities of eukaryotic transcription initiation: Regulation of preinitiation complex assembly. Trends Biochem. Sci., 16:402-8.

Sawadogo, M., and A. Sentenac (1990) RNA polymerase B (II) and general transcription factors. Annu. Rev. Biochem., 59:711-754.

Schmidt-Neilsen, K. (1990) Animal Physiology: Adaptation and Environment. Cambridge University Press, New York, pp. 218-222.

Schulte, P.M., J.A. Segal, D.L. Crawford, and D.A. Powers (1995) A rapid in vivo footprinting method for the detection of DNA-protein interactions in isolated nuclei. Mol. Marine Biol. Biotechnol., 4:200-205.

Smale, S.T., and D. Baltimore (1989) The "initiator" as a transcription control element. Cell, 57:103-13.

Smale, S.T., M.C. Schmidt, A.J. Berk, and D. Baltimore (1990) Transcriptional activation by Sp1 as directed through TATA or initiator: Specific requirement for mammalian transcription factor IID. Proc. Natl. Acad. Sci. U.S.A., 87:4509-13.

Takiguchi, M., and M. Mori (1991) In vitro analysis of the rat liver-type arginase promoter. J. Biol. Chem., 266:9186-93.

Treizenberg, S.J. (1992) Primer Extension. In: Current Protocols in Molecular Biology. F.M. Ausebel, ed. John Wiley and Sons, Inc., New York, vol. 1, pp. 4.8.1-4.8.5.

Vincent, K.A., and A.C. Wilson (1989) Evolution and transcription of old world monkey globin genes. J. Mol. Biol., 207:465-80.

Wang, Z.Y., Q.Q. Qiu, K.T. Enger, and T.F. Deuel (1993) A second transcriptionally active DNA-binding site for the Wilms tumor gene product, WT1. Proc. Natl. Acad. Sci. U.S.A., 90:8896-900.

Wilson, A.C. (1976) Molecular Evolution. F.J. Ayala, ed. Sinauer, Sunderland, MA, pp. 225-234.

Zenzie-Gregory, B., A. Khachi, I.P. Garraway, and S.T. Smale (1993) Mechanism of initiator-mediated transcription: Evidence for a functional interaction between the TATA-binding protein and DNA in the absence of a specific recognition sequence. Mol. Cell. Biol., 13:3841-9.