
Biotech. Adv. Vol. 5, pp. 29-45, 1987 0734-9750/87 $0.00 + .50 Printed in Great Britain. All Rights Reserved. Copyright © Pergamon Journals Ltd

ADVANCES IN THE MOLECULAR BIOLOGY OF PLANT SEED STORAGE PROTEINS

JERRY L. SLIGHTOM and PAULA P. CHEE

Division of Molecular Biology, The Upjohn Company, Kalamazoo, Michigan 49007, USA

ABSTRACT

Plant seed storage proteins were among the first proteins to be isolated (20); however, only recently, as a result of using molecular biology techniques, have the amino acid sequences of many of these proteins been determined. With the accumulation of amino acid sequence data for many vicilin-type storage proteins, much has been learned concerning the location of conserved amino acid regions and of other regions which can tolerate amino acid sequence variation. Combining this knowledge with recent advances in plant gene transfer technologies will allow molecular biologists to correct (by using amino acid replacement mutations) the sulfur amino acid deficiency inherent to bean seed storage proteins. The development of more nutritious soybean and common bean seeds will be of benefit to programs involving human and animal nutrition.

KEY WORDS

Amino acid sequence, Agrobacterium tumefaciens, evolution, Glycine max, nucleotide sequence, nutritional improvement, Phaseolus vulgaris, recombinant DNA.

INTRODUCTION

Molecular investigations of plant seed storage proteins and the genes which encode them are important because these proteins are a major source of nutrition for people in many parts of the world. For the plant molecular biologist these genes are also scientifically interesting, as they provide an excellent plant gene system which shows both tissue-specific and developmental regulation. The overall goal of the molecular biologist is to understand these genes with respect to their structural organization, their number and chromosomal relationship to each other, the mechanism which regulates their expression, and their diversity among plant species. With this type of information, and with the development of transformation and regeneration schemes for crop species, storage protein genes can be engineered to overcome their inherent deficiencies as a source of protein, such as the lack of sulfur-containing amino acids in bean seeds, by adding codons for the sulfur-containing amino acids cysteine and methionine.

Because the plant kingdom is very large and quite diverse, we have limited this report to a discussion of recent advances in research on the storage protein genes isolated from two economically important legumes, soybean (Glycine max) and common bean (Phaseolus vulgaris). The major seed storage proteins of these legumes are globulins (packaged into cotyledonary protein storage bodies), which are represented in most legumes by two different types of polypeptides: the nonglycosylated 11S fraction (called legumins) and the glycosylated 7S proteins (called vicilins) (7). In these two plant species our major research focus has been the investigation of the vicilin-type storage proteins: the soybean "β-conglycinin" and the common bean "phaseolin" polypeptides. These 7S proteins are multi-subunit complexes of approximately 150-220 kilodaltons (kDa) (2,7). The individual protein subunits vary in size both between and within these species. The principal polypeptide subunits of phaseolin are referred to as α- (51-53 kDa), β- (47-48 kDa), and γ- (43-46 kDa) (4). These three polypeptide types make up about 50% of the protein in P. vulgaris seeds (15). However, molecular analysis of individual members of the phaseolin gene family shows that there are only two unique types of phaseolin genes (with 3-4 copies of each per haploid genome (28)), which encode only α- and β-type polypeptide subunits (25-27; J.L. Slightom, D.V. Thompson, and R.F. Drong, manuscript in preparation). The γ subunit appears to be the result of incomplete addition of N-linked oligosaccharide side chains to the β-type polypeptide, or of their partial degradation (3,21).

The soybean β-conglycinin polypeptide subunits consist of α'- (76 kDa), α- (72 kDa), and β- (53 kDa) subunits (7,16). The α-subunits appear to be encoded by a relatively small number of genes (1-2 copies per haploid genome), while the β-subunit is encoded by a larger number of genes (11).


Amino acid analyses of both phaseolin and β-conglycinin-type storage proteins show that they all contain less than 1% sulfur-containing amino acids. To be nutritionally balanced, these proteins should contain between 3 and 6% sulfur-containing amino acids. Thus a goal of molecular biology research is to introduce, by DNA mutation, the number of amino acid replacements necessary to obtain high sulfur storage protein (HSSP) genes. However, before genetic engineering techniques can be successfully used to obtain HSSP genes, a large amount of basic molecular biology information concerning these seed storage protein genes needs to be collected. In addition, plant transformation and regeneration schemes need to be developed for both soybean and common bean so that the HSSP genes can be placed into the proper plant environment. In this report we summarize the progress which has been made over the last few years toward these goals.
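As a rough illustration of the arithmetic behind this goal, the Python sketch below counts cysteine and methionine residues in a one-letter protein sequence and estimates how many residue-for-residue replacements would raise the sulfur amino acid content to a target fraction. It assumes the percentages above refer to residue counts rather than weight; the function names are hypothetical, and the 400-residue sequence in the example is a made-up placeholder, not a real phaseolin or β-conglycinin sequence.

```python
import math

SULFUR_RESIDUES = frozenset("CM")  # cysteine (C) and methionine (M)


def sulfur_fraction(protein: str) -> float:
    """Fraction of residues in a one-letter protein sequence that are Cys or Met."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in SULFUR_RESIDUES for aa in seq) / len(seq)


def replacements_needed(protein: str, target: float) -> int:
    """Fewest non-sulfur residues that must be swapped for Cys/Met so that the
    sulfur fraction reaches `target` (sequence length is kept unchanged)."""
    seq = protein.upper()
    current = sum(aa in SULFUR_RESIDUES for aa in seq)
    # Small tolerance so float error in target * len(seq) does not add a residue.
    required = math.ceil(target * len(seq) - 1e-9)
    return max(required - current, 0)


# Hypothetical 400-residue protein with 3 methionines (0.75% sulfur residues).
demo = "M" * 3 + "A" * 397
print(f"{sulfur_fraction(demo):.2%}")    # 0.75%
print(replacements_needed(demo, 0.03))   # 9
print(replacements_needed(demo, 0.06))   # 21
```

Counting replacements rather than insertions keeps the polypeptide length fixed, which matches the "amino acid replacement mutations" strategy described in the text; an insertion strategy would also change the denominator.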

COMPARATIVE ANALYSES OF SOYBEAN AND COMMON BEAN VICILIN-TYPE GENES

Phaseolin and the α'-subunit of β-conglycinin polypeptides differ considerably in both size (see below) and antigenic identity (8). However, a comparison of partial nucleotide sequences of these genes shows that they do share both structural and sequence identity (23). Doyle et al. (9) recently reported the complete nucleotide sequence of the α'-subunit gene (Gma-α'), a total of 3636 base pairs (bp), and compared its nucleotide and amino acid sequences with those of the β-phaseolin gene (Pvu-β), about 2800 bp (25). The alignments of these two nucleotide sequences are shown in Figs. 1 through 3, beginning about 900 bp 5' of the shared transcriptional cap site (the nucleotide at position 1 in Fig. 1) and extending to the end of their 3'-untranslated regions. Aligning these nucleotide sequences, which involves placing gaps (shown as asterisks) to maximize identities, is for the most part straightforward because of the similar structural organization of the genes: both genes have six exons and five introns and share common nucleotide sequence elements in their 5'- and 3'-untranslated regions. The nucleotide and amino acid sequence comparisons presented in Figs. 1-3 show that these genes share a high degree of identity. Using both natural boundaries (introns, exons, and untranslated regions) and arbitrary ones, the degree of apparent nucleotide sequence divergence has been determined for 18 DNA regions (Table 1). The overall corrected divergence (14) between these genes is about 41%, with exons and introns showing similar degrees of overall divergence, 43 and 37%, respectively.

Table 1. Divergence in and around 7S storage protein genes from G. max and P. vulgaris

Structural region compared  Nucleotide number  Gap positions  bp match/bp compared  Corrected % divergence
 1. 5' flanking             -958 to -358       28             304/549               67.8
 2. 5' flanking             -357 to -248       4              75/101                31.5
 3. 5' flanking             -247 to -150       7              46/96                 87.3
 4. 5' flanking             -149 to -52        2              73/94                 26.5
 5. 5' flanking             -51 to -1          3              26/39                 44.1
 6. 5' untranslated         1 to 77            6              39/64                 55.2
 7. Exon 1                  78 to 911          3              237/311               28.6
 8. Intron 1                912 to 1115        5              52/77                 42.5
 9. Exon 2                  1116 to 1348       2              123/193               49.6
10. Intron 2                1349 to 1442       3              60/84                 36.0
11. Exon 3                  1443 to 1523       --             67/81                 19.6
12. Intron 3                1524 to 1655       10             61/87                 38.1
13. Exon 4                  1656 to 1938       4              143/229               52.1
14. Intron 4                1939 to 2082       7              91/116                25.4
15. Exon 5                  2083 to 2372       1              198/260               28.7
16. Intron 5                2373 to 2508       7              74/106                38.6
17. Exon 6                  2509 to 2747       4              115/176               46.5
18. 3' untranslated         2748 to 2886       6              92/129                36.1
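Reference (14) is not identified in this section, so the sketch below assumes the Jukes-Cantor one-parameter correction, d = -(3/4) ln(1 - (4/3)p), where p is the apparent divergence (1 - bp match/bp compared). Applied to the bp match/bp compared column, this formula reproduces most of the tabulated values (for example 304/549 gives 67.8% and 67/81 gives 19.6%), but row 3 (46/96) comes out at 88.9% rather than 87.3%, so treat the attribution as an inference rather than the authors' stated method. The function name is hypothetical.

```python
import math


def jc_corrected_percent(bp_match: int, bp_compared: int) -> float:
    """Jukes-Cantor corrected divergence, in percent:
    d = -(3/4) * ln(1 - (4/3) * p), where p = 1 - bp_match / bp_compared."""
    if bp_compared <= 0 or not 0 <= bp_match <= bp_compared:
        raise ValueError("need bp_compared > 0 and 0 <= bp_match <= bp_compared")
    p = 1 - bp_match / bp_compared          # apparent (uncorrected) divergence
    arg = 1 - 4 * p / 3
    if arg <= 0:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(arg) * 100


# (region, bp match, bp compared), copied from Table 1
rows = [
    ("1. 5' flanking", 304, 549),
    ("11. Exon 3", 67, 81),
    ("18. 3' untranslated", 92, 129),
]
for region, match, compared in rows:
    print(f"{region}: {jc_corrected_percent(match, compared):.1f}%")
```

These three rows print 67.8%, 19.6% and 36.1%, the values listed in Table 1. Gapped positions are excluded from bp compared, which is why the correction uses that column rather than the full length of each region.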

Identification of cis-acting regulatory elements--The cis-acting regulatory elements common to most eukaryotic genes, the CCAAT- and TATA-elements (10), are also present at similar locations (-77 and -31 bp 5' of the cap site, respectively) in these plant genes. The alignment in Fig. 1 shows that both genes share identical sequences in and around these regulatory elements. The multiple CCAAT- and TATA-elements found in the Pvu-β sequence are believed to be responsible for the numerous cap sites found in phaseolin mRNAs; at least 12 different capped mRNA species have been identified by S1 nuclease analysis (26). The nucleotide sequences surrounding the TATA-elements of both genes match the "plant consensus", while their CCAAT-elements are more similar to the mammalian consensus sequence (17). In addition to these regulatory elements, these 5'-flanking DNAs contain sequence regions which share identity with another mammalian cis-acting regulatory element known as an enhancer element, which has a consensus of GTGGAAAG (13; see Fig. 1). However, because these seed storage protein genes are subject to similar tissue-specific and developmental regulation, it is conceivable that they share other cis-acting regulatory signals unique to their seed tissue environment. Experiments involving the transfer of the Pvu-β gene into the genome of tobacco via Agrobacterium tumefaciens vector systems (see below) suggest that tissue-specific and developmental regulatory DNA elements should exist within about 800 bp 5' of the cap sites (24). The alignment in Fig. 1 shows that within about 350 bp 5' of the cap site there are several DNA regions which share a high degree of identity; see positions -357 to -338, -330 to -293, -149 to -127, -119 to -98, -89 to -61, and -33 to -13. The latter two sequence regions contain the known cis-acting regulatory elements CCAAT and TATA, respectively. Functional analysis of these specific DNA elements, to determine whether they have a role in signaling tissue-specific and developmental expression of these genes, is now feasible (see below).

Fig. 1. Comparison of nucleotide sequences from the 5'-flanking and untranslated regions of the phaseolin (Pvu β) and soybean α'-subunit of β-conglycinin (Gma α') genes. The nucleotide sequence numbering system was set by the overall alignment, with nucleotide 1 being the shared cap-site A nucleotide. This numbering system is continued in the positive direction in Figs. 2 and 3. Asterisks mark the gaps introduced to maximize identities. Nucleotide sequences which may be important to the expression of these genes are indicated: TATAA-elements are overlined or underlined once, and CCAAT-elements are twice overlined or underlined. Possible enhancer elements are indicated by arrows.
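The cap-relative numbering of Fig. 1 and the element positions quoted above can be illustrated with a short Python sketch. It assumes exact string matching of the CCAAT, TATA and GTGGAAAG sequences named in the text, whereas real promoter elements are degenerate and are normally scored against a consensus. The absence of a position 0 is inferred from Table 1 (region 5 ends at -1 and region 6 starts at 1), and the demo sequence is synthetic, built only so that the elements fall at -77 and -31; the function names are hypothetical.

```python
def cap_relative(index: int, cap_index: int) -> int:
    """Map a 0-based string index to the Fig. 1 numbering, in which the cap-site
    nucleotide is +1, the base immediately upstream is -1, and there is no 0."""
    offset = index - cap_index
    return offset + 1 if offset >= 0 else offset


def find_motif(seq: str, motif: str, cap_index: int) -> list[int]:
    """Cap-relative start positions of every exact (possibly overlapping) match."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(cap_relative(start, cap_index))
        start = seq.find(motif, start + 1)  # step one base so overlaps are found
    return hits


# Synthetic 5'-flanking region: 100 filler bases, then the cap-site A at index 100.
upstream = ["G"] * 100
upstream[23:28] = "CCAAT"  # index 23 -> position -77
upstream[69:73] = "TATA"   # index 69 -> position -31
demo = "".join(upstream) + "ACTG"

for name, motif in [("CCAAT", "CCAAT"), ("TATA", "TATA"), ("enhancer core", "GTGGAAAG")]:
    print(name, find_motif(demo, motif, cap_index=100))
```

This prints `[-77]` for CCAAT, `[-31]` for TATA, and an empty list for the enhancer core, since none was planted in the demo sequence.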

One additional cis-acting regulatory element is located in the 3'-untranslated DNA region; it signals mRNA 3'-end processing and polyadenylation. The 3'-untranslated regions are both 135 bp in length and appear to have diverged to about the same degree (36%) as the coding regions. Nucleotide sequence identity near the poly(A) signal is somewhat lower (Table 1 and Fig. 3), possibly because the Gma-α' gene contains overlapping poly(A) signals (9).
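Overlapping signals are easy to miss in a naive search, because an ordinary non-overlapping scan resumes after the end of each hit. The Python sketch below assumes the canonical AATAAA hexamer as the poly(A) signal (the text does not give the Gma-α' signal sequence) and uses a zero-width lookahead so that a new match may begin inside the previous one. The 3'-UTR fragment is synthetic and the function name is hypothetical.

```python
import re

POLY_A_SIGNAL = "AATAAA"  # canonical poly(A) signal hexamer (assumed here)


def poly_a_signals(seq: str, signal: str = POLY_A_SIGNAL) -> list[int]:
    """0-based start positions of every occurrence of `signal`, overlaps included.
    The zero-width lookahead lets a new match begin inside the previous one."""
    pattern = re.compile(f"(?={re.escape(signal)})")
    return [m.start() for m in pattern.finditer(seq.upper())]


demo = "GCAATAAATAAAGC"                      # synthetic 3'-UTR fragment
print(poly_a_signals(demo))                  # [2, 6]
print(len(re.findall(POLY_A_SIGNAL, demo)))  # 1: non-overlapping search misses one
```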

Evolutionary relationships of exon regions--The degree of identity found between the orthologous exons, and even within each exon comparison, varies considerably. Most of the exons contain regions of highly matching nucleotide and amino acid sequences interspersed among other regions which show either complete divergence or loss of genetic information due to deletions or insertions. This observation suggests that the evolutionary constraints on these seed storage proteins, at both the nucleotide and amino acid sequence levels, may act on units smaller than individual exons. Exon 3, the smallest exon (81 bp), is the exception to this observation, as it shows the least divergence (19.6%). Divergence in the other exons ranges from 28 to 52% (Table 1), with exon 4 showing the most divergence. However, the most noticeable difference found by this exon comparison occurs in exon 1, where Gma-α' contains 522 bp (encoding an additional 174 amino acids) which is not present in Pvu-β (Fig. 2). The nucleotide sequence of this 522 bp region is unique, as it shares no nucleotide or amino acid sequence identity with any other 7S storage protein gene (9). Thus it appears to have been derived from an event other than the duplication of surrounding DNA. DNA duplications have been found in many storage protein genes (22,26). Doyle et al. (9) suggested that this 522 bp region is the result of a transposon insertion event; however, no evidence of transposon-like sequence elements exists near its boundaries. If this sequence is the result of a transposition event, then the directly repeated DNA elements typical of a transposon insertion have since been lost through evolutionary mutation. This insertion event has retained the codon integrity of the exon 1 coding region, which results in the addition of about 20 kDa to the molecular weight of the Gma-α' polypeptide, accounting for most of the molecular weight difference between the Pvu-β and Gma-α' polypeptides. The amino acid composition encoded by the insert is quite different from that of the remaining Gma-α' polypeptide, as it is rich in glutamate residues (Fig. 2); however, despite being quite different in both nucleotide and amino acid sequence from the surrounding proto-Gma-α' gene, this insert has achieved both functional and evolutionary stability.

Fig. 2. Nucleotide sequence alignment of the Pvu β and Gma α' genes covering most of exon 1 and intron 1. Possible locations for signal peptide cleavage and for a secondary cleavage of the Gma α' polypeptide are indicated. The N-glycosylation site present in Gma α' exon 1 is shown (see the amino acid sequence line) within the 522 bp region not present in Pvu β. Exon and intron junctions are shown by vertical arrows which extend through both nucleotide sequences (also see Fig. 3).
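The figures in this paragraph can be checked with simple arithmetic: 522 bp is a whole number of codons (522 / 3 = 174), so the insertion preserves the reading frame, and 174 residues at a rule-of-thumb average of about 110 Da each come to roughly 19 kDa, consistent with the approximately 20 kDa quoted. The 110 Da average and the glutamate residue mass (about 129 Da) are general assumptions rather than values from the source, the function name is hypothetical, and the sketch ignores the oligosaccharide attached at the insert's N-glycosylation site.

```python
AVG_RESIDUE_MASS_DA = 110.0   # assumed rule-of-thumb mass per amino acid residue
GLU_RESIDUE_MASS_DA = 129.1   # approximate average mass of a glutamate residue


def in_frame_insertion(insert_bp: int, residue_mass_da: float = AVG_RESIDUE_MASS_DA):
    """Return (keeps reading frame?, codons added, approximate kDa added),
    ignoring any glycans attached to the inserted segment."""
    keeps_frame = insert_bp % 3 == 0
    codons = insert_bp // 3
    return keeps_frame, codons, codons * residue_mass_da / 1000


print(in_frame_insertion(522))                       # (True, 174, 19.14)
print(in_frame_insertion(522, GLU_RESIDUE_MASS_DA))  # upper bound if every residue were Glu
```

Because the insert is glutamate-rich, the true peptide mass sits somewhere between these two estimates, before any glycan is added.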

The evolutionary relationships among the amino acid sequences of these seed storage proteins were further refined by adding the amino acid sequences of the vicilin-type seed storage proteins from pea (Pisum sativum) to the comparison (9). Information concerning the location of conserved amino acid sequence regions is important because such regions would be poor choices for the placement of amino acid modifications. Amino acid replacement mutations designed to increase the number of sulfur-containing amino acids should be placed in regions which show little or no conservation of amino acid sequence. The presence of the large insertion in exon 1 of the soybean Gma-α' gene shows that the exon 1 region of the Gma-α' gene, and possibly that of the Pvu-β gene, can tolerate a considerable amount of nucleotide and amino acid sequence change. Thus exon 1 may provide an excellent target for the insertion of a DNA sequence which encodes a large number of sulfur-containing amino acids. Needless to say, whether such modifications can be tolerated will need to be tested in the seeds of transgenic plants (see below).
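The site-selection rule described here, placing replacements where the aligned vicilin sequences are least conserved, can be sketched as a simple column-identity filter over a multiple alignment. The threshold (`max_identity`), the treatment of gaps, the function name, and the toy sequences are assumptions for illustration; this is not the comparison method of reference (9), and a real design would also have to weigh structural and processing constraints that a per-column count cannot see.

```python
from collections import Counter


def variable_columns(aligned: list[str], max_identity: float = 0.5) -> list[int]:
    """0-based columns of an alignment where the most common residue occurs in at
    most `max_identity` of the sequences. Gaps ('-') never count toward identity,
    so indel-containing columns are treated as less conserved."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    n = len(aligned)
    hits = []
    for col in range(len(aligned[0])):
        counts = Counter(s[col] for s in aligned if s[col] != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        if top / n <= max_identity:
            hits.append(col)
    return hits


# Four hypothetical aligned fragments (not real vicilin sequences).
aligned = ["ARNDE", "ARQDK", "AKSD-", "ARTDQ"]
print(variable_columns(aligned))  # [2, 4]
```

Columns 0 and 3 are invariant and column 1 is mostly R, so only columns 2 and 4 would be flagged as candidate positions for Cys or Met substitutions under this heuristic.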

EXPRESSION OF BEAN SEED STORAGE PROTEIN GENES IN TRANSGENIC PLANTS

Analysis of gene transfers--Intensive study to elucidate the mechanism by which Agrobacterium tumefaciens can transform dicotyledonous plant species [the transfer of part of its large plasmid DNA (the T-DNA region) into a host plant genome] has resulted in the development of T-DNA-derived vector systems for the transfer of foreign genes into plants (30). The transfer and integration of foreign DNA into the genome of various dicotyledonous plant species is now routine. The first plant-derived gene to be


' 3 ' - u t ~ , , ~te lSa ~r) Pvu ~ 5xActaAAxT~CxT5T~55t51AA5x5CTcAT5~5A5C~T56AATATT5~ATCC~CCa~5~AACA5TaT~ATAACT5A~CT¢CATcTCACTT(TTCtAT~AArA~A(~A5~ATG.T!

I I / / I I / / l i I I I / /H l i I / | 1 / 1 / / / 1 | / / / i i / 11 / / / l i |11/ / I I / / I f i / / / l / i / / t i / I I I i / I I / 1 / / H

......... + ......... + ......... + ......... + ......... + ......... + ......... + ......... + .............. -----+-- ........ - ........ + 2Ssl $' -U, t ,ANSLAte~ (15S ~vl

Pve e AT~AT - - - eOL~(A) I I I

GMA m' TTGTTTr6TEIAC POLY(A)

m,,

Fig. 3. Nucleotide and amino acid alignments of Pvu fl and Gma ~ ' - sequences .

The alignments include exons 2-6, introns 2-5 and the 3' -untranslated

r e g i o n . A s s i g n m e n t o f i n t r o n j u n c t i o n s a n d N - g l y c o s l a t i o n s i t e s a r e

indicated as described in Fig. 2. The termination codon is designated TER.

In addition, the nucleotide and amino acid sequences corresponding to the

d u p l i c a t e d r e g i o n o f P v u o ( 2 6 ) a r e s h o w n .


transferred into a foreign plant environment was the phaseolin storage

protein gene, Pvu-β,

Fig. 4. Placement of phaseolin native and minigene constructions into pTi 15955 BamHI fragment 17a. (A) Physical map of the T-DNA region of pTi 15955. (B) The BamHI fragment 17a contains a single SmaI site, which was converted to a HindIII site (18). (C) The native phaseolin gene linked to the neomycin phosphotransferase II gene (NPTII) was cloned into this HindIII site. (D) Similarly, the phaseolin minigene linked to the NPTII gene was cloned into the same shuttle vector (5). Restriction enzyme sites are: B, BamHI; R, EcoRI; S, SmaI; and H, HindIII. The symbol G denotes the fusion of BamHI and BglII sites. Phaseolin genes were transferred into various A. tumefaciens strains as described by Murai et al. (18).

and it was transferred into the sunflower genome (18). Fig. 4C shows the

phaseolin structural gene region and part of the T-DNA used as the vector

for this transfer. Confirmation that the phaseolin gene was integrated into

the sunflower genome was shown by assaying for the presence of phaseolin

polypeptides in the transformed sunflower callus tissues. Surprisingly,

this phaseolin gene was found to be expressed in the transformed callus

tissues, and the total poly(A)+ mRNA fraction isolated from transformed calli contains mRNAs which encode phaseolin. The Pvu-β-gene probe hybridizes

to a 1700 nucleotide mRNA which is identical in size to that found in the

developing bean cotyledons (5,18). The size of the phaseolin mRNA

hybridizing signal indicates that the five intron sequences of this Pvu-β-gene (a total of 515 nucleotides) have been removed to obtain a mature

phaseolin mRNA. This result suggests that plant mRNA splicing mechanisms

are conserved between widely divergent plant species. Phaseolin mRNAs of similar size have been isolated from tobacco calli which contain either a native Pvu-β-gene or a mutated Pvu-β-gene lacking all five introns, a phaseolin "minigene" (5) (see Fig. 4D). Expression of this phaseolin

minigene construction indicates that intron splicing is not necessary for

biogenesis of a stable mRNA molecule (5).
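The size argument above is simple bookkeeping: a mature mRNA is the primary transcript minus its introns. The Python sketch below illustrates this under stated assumptions. The toy gene, its exon and intron sequences, and the helper names `splice` and `expected_unspliced_length` are hypothetical; the only values taken from the text are the 1700-nucleotide mRNA and the 515 nucleotides of intron sequence, and the calculation assumes the spliced and unspliced forms differ only by those introns (same cap site and poly(A) tail).

```python
from typing import List, Tuple


def splice(pre_mrna: str, introns: List[Tuple[int, int]]) -> str:
    """Remove intron intervals from a pre-mRNA string (hypothetical helper).

    Introns are half-open (start, end) intervals in 0-based coordinates
    on pre_mrna; they must not overlap.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(introns):
        if start < cursor or end <= start:
            raise ValueError(f"bad or overlapping intron interval: {(start, end)}")
        pieces.append(pre_mrna[cursor:start])  # exon sequence before this intron
        cursor = end                            # skip over the intron
    pieces.append(pre_mrna[cursor:])            # final exon (and any 3' tail)
    return "".join(pieces)


def expected_unspliced_length(mature_length: int, intron_lengths: List[int]) -> int:
    """Length a transcript would have if none of the introns were removed."""
    return mature_length + sum(intron_lengths)


# Toy gene: three exons separated by two GT...AG introns (hypothetical sequence).
exons = ["ATGGCT", "CCAGAA", "TAA"]
introns = ["GTAAGTATTTAG", "GTTTGCAG"]
pre = exons[0] + introns[0] + exons[1] + introns[1] + exons[2]
i1 = (len(exons[0]), len(exons[0]) + len(introns[0]))
start2 = i1[1] + len(exons[1])
i2 = (start2, start2 + len(introns[1]))
mature = splice(pre, [i1, i2])
print(mature)  # ATGGCTCCAGAATAA

# Values from the text: a 1700 nt mature mRNA and 515 nt of intron sequence.
print(expected_unspliced_length(1700, [515]))  # 2215
```

On these assumptions an unspliced Pvu-β transcript would be expected near 2215 nucleotides, so a single hybridizing band at 1700 nucleotides is consistent with removal of all five introns.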


Tobacco calli containing either the native or minigene phaseolin

constructions were regenerated, set flowers (which were self-pollinated),

and grown to maturity. During the course of this regeneration and

maturation process, different tobacco plant tissues were assayed for phaseolin expression. These tests showed that in most cases phaseolin expression ceased soon after the regenerated plantlets reached a two-leaf stage; phaseolin polypeptides were not found in tobacco stems, leaves or flowers.

However, high levels of phaseolin polypeptides were found in the embryonic

tissues of the developing tobacco seeds (24). The level of expression in

these tobacco seeds was about 1000-fold higher than that found in the callus

tissues, suggesting that this bean-derived gene can respond to tobacco

developmental regulation factors in a manner similar to that found in the

developing bean seed. That is, the interaction of tobacco seed-specific developmental regulatory factors with developmental enhancer DNA elements in the transferred bean DNA is adequate to obtain developmental expression of this bean gene. A similar set of experiments involving the phaseolin minigene shows an identical pattern of expression in tobacco seeds, indicating that the DNA sequences responsible for tissue-specific and developmental expression of the Pvu-β-gene are not located within intron sequences (P. P. Chee and J. L. Slightom, manuscript in preparation).

Again, the transfer of this bean gene shows that there is conservation in

the mechanisms which regulate the developmental expression of seed storage

protein genes in taxonomically distinct plant families. If this is

generally true, the transfer of many plant genes among the different plant

genera should not be limited by differences in gene regulatory mechanisms.

Analysis of foreign proteins in transformed seeds--One-dimensional

polyacrylamide gel analysis of phaseolin polypeptides isolated from tobacco

seeds shows the presence of authentic 46 kDa phaseolin polypeptides along

with smaller polypeptides which react to the anti-phaseolin rabbit

polyclonal antibody (24). These results suggest that full-length β-phaseolin is produced in transformed tobacco seeds and that these

polypeptides are correctly processed for removal of the phaseolin signal

peptide and the addition of N-linked oligosaccharide sidechains. However,

at some point in the development of the tobacco seed cotyledons some of the

full-length phaseolin polypeptides are degraded by a set of distinct

proteolytic cleavages. This degradation appears to follow the pattern found

in germinating bean seeds (24). The reason for the instability of phaseolin polypeptides in these transformed tobacco seeds is not clear; it could be due to the accumulation of more protein than the tobacco seed storage protein bodies can protect from proteolytic enzymes. Analysis of


tobacco seed storage protein bodies shows that the expressed phaseolin

polypeptides are correctly targeted as they accumulate in the amorphous

matrix of the protein bodies (12).
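The processing steps discussed above include the addition of N-linked oligosaccharide side chains, which are attached to asparagine residues in the consensus sequon Asn-X-Ser/Thr, where X is any residue except proline. As a hedged illustration, the Python sketch below lists candidate sequons in a protein sequence. The function name and the test peptides are hypothetical, and a sequon marks only a potential site: not every sequon is glycosylated in vivo.

```python
import re
from typing import List

# N-X-S/T with X != P. The zero-width lookahead lets overlapping
# sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that start a candidate
    N-glycosylation sequon (hypothetical helper; sequon != glycosylated site)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(find_sequons("MNGSANPTQNVT"))  # [2, 10]; N6 is skipped because it is followed by P
```

Positions are reported 1-based to match the usual numbering of amino acid sequences.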

A similar set of experiments has been done with the soybean Gma-α'-gene, except that it was transferred into the genome of petunia plants (1). Analysis of the petunia seed proteins showed that Gma-α'-polypeptides are produced at a level similar to that found for phaseolin, and that expression is limited to the developing petunia seed embryos. The first Gma-α'-related protein produced is a 55 kDa polypeptide, which is followed by later accumulation of larger polypeptides of 76, 68, and 64 kDa. These larger Gma-α'-related polypeptides appear to accumulate following the development of the seed storage protein bodies (1). In contrast to the phaseolin polypeptides produced in tobacco seeds, these Gma-α'-related polypeptides do not appear to be subjected to proteolytic cleavage (1). This observation suggests either that petunia storage protein bodies protect foreign storage proteins better than tobacco protein bodies do, or that Gma-α'-related polypeptides are less susceptible to proteolytic degradation than Pvu-β-polypeptides. Transfer of the phaseolin gene into petunia should identify the reason for this difference in stability. Nevertheless, the developmental expression of both the Pvu-β and Gma-α' gene products in tobacco and petunia seeds clearly shows that these plant species provide excellent heterologous whole-plant systems for testing mutated Pvu-β and Gma-α' genes engineered either to investigate regulatory mechanisms or to improve their nutritional quality.

IDENTIFICATION OF SEED EMBRYO-SPECIFIC GENE REGULATORY ELEMENTS

Having demonstrated the expression of the Gma-α'-gene in transgenic petunia seed embryo tissues, Beachy and his co-workers used this model system to examine the 5'-flanking DNA region of the Gma-α'-gene for DNA sequence elements which interact with petunia seed developmental regulatory factors. As mentioned above, transcriptional control of most eukaryotic genes requires the presence of a TATA-element for proper initiation of transcripts and, to a lesser extent, a CCAAT-element, which appears to modulate the level of expression (19,29). In some cases, enhancer elements have been found which greatly affect the level of gene expression in a tissue-specific manner (13). Both the Pvu-β- and Gma-α'-storage protein genes contain sequences which match these identified regulatory elements (9; see Fig. 1). Using the

nucleotide sequence comparison of the two seed storage protein genes as a

guide, Chen et al. (6) constructed a series of deletion mutants of the Gma-α'-gene to test the effect that each of these known regulatory elements has on expression and to locate other regulatory elements. Each mutated Gma-α'-gene was transferred and integrated into the genome of petunia. Fig. 5 shows a regional comparison of the Gma-α'- and Pvu-β-genes along with the location of several informative deletions. The levels of Gma-α'-protein observed by Chen et al. (6) are also presented in Fig. 5 as a percentage of the Gma-α'-protein produced by the native construction in transgenic petunia seeds (1).
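A first-pass search of a 5'-flanking region for the TATA- and CCAAT-elements discussed above can be sketched as a simple pattern scan. The sketch assumes simplified consensus patterns (TATAWAW, where W is A or T, for the TATA-element and the literal pentamer CCAAT), reads only the strand it is given, and numbers positions relative to the cap site (+1) with no position 0, matching the negative coordinates used for the deletion endpoints. The sequence and function names are hypothetical; this is not the Gma-α' or Pvu-β promoter.

```python
import re
from typing import Dict, List

# Simplified consensus patterns (an assumption for illustration):
# TATA-element as TATAWAW (W = A or T); CCAAT-element as the literal pentamer.
MOTIFS = {
    "TATA": re.compile(r"(?=TATA[AT]A[AT])"),
    "CCAAT": re.compile(r"(?=CCAAT)"),
}


def relative_position(index: int, cap_index: int) -> int:
    """Convert a 0-based string index to promoter numbering
    (cap site = +1, first upstream base = -1, no position 0)."""
    return index - cap_index + 1 if index >= cap_index else index - cap_index


def scan_promoter(seq: str, cap_index: int) -> Dict[str, List[int]]:
    """Report the first base of each motif hit on the given strand only,
    relative to the cap site."""
    seq = seq.upper()
    return {
        name: [relative_position(m.start(), cap_index) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical 5'-flanking region; the cap site is the first base of "ACATG".
upstream = "GCCAATCC" + "G" * 10 + "TATAAAT" + "G" * 8
seq = upstream + "ACATG"
print(scan_promoter(seq, cap_index=len(upstream)))  # {'TATA': [-15], 'CCAAT': [-32]}
```

Real regulatory elements are often degenerate, so a literal scan like this one only nominates candidates; their function still has to be tested experimentally, for example by deletion analysis in transgenic plants.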

Fig. 5. Comparison of Pvu β- and Gma α' nucleotide sequences immediately 5' of the shared capped nucleotide. The deletion points in the Gma α' sequence used by Chen et al. (6) are shown by vertical arrows. The large horizontal arrows show the position of the large imperfect direct repeat, and the smaller horizontal arrows show the location of the short direct repeats in both the Pvu β and Gma α' sequences. The relative amount of Gma α' polypeptide found for each deletion mutant is shown below each deletion-point arrow; ND indicates that no protein was detected.

Deletion of the enhancer-type sequence at position -560 (see Fig. 1) has little effect on the level of expression, indicating that this sequence element does not play a role in regulating the expression of this gene in petunia seeds (6). Little effect on the level of expression is also observed for the deletions at positions -457 and -257 (6). The deletion at position -208 shows some loss in the level of expression; however, the most substantial loss in expression corresponds with the deletion of nucleotide sequences between positions -208 and -159, as the Gma-α'-gene becomes virtually inactive (showing no response to developmental regulatory factors) in transgenic petunia seeds. Further deletions at positions -69, -42 and +14, which were designed to test the roles of the CCAAT- and TATA-elements, show no detectable protein expression and thus yield no information concerning their role in regulating this gene. This series of experiments clearly shows that the genetic information between positions -257 and -159 of the Gma-α'-gene is required for the recognition of petunia (embryo-specific) developmental regulatory factors. A more detailed analysis of the nucleotide sequence between these positions reveals the presence of an imperfect direct repeat of 28 nucleotides and five smaller (G+C)-rich repeats (AGCCCA), four of which are located within the larger repeat (Fig. 5). Chen et al. (6) suggest that the smaller direct repeat elements may

provide the genetic information responsible for regulating the level of

expression of the Gma-α'-gene in developing seeds of transgenic petunia

plants, and, if true, these sequences may also be important in regulating

the expression of this gene in developing soybean seeds. This hypothesis is

supported by finding sequences which match the short repeats in similar

locations of the Pvu-β-gene, positions -218 (CACCCA) and -208 (AACCCA), and of the soybean gene encoding the β-subunit of β-conglycinin (S. J. Barker, J. J. Harada and R. B. Goldberg, personal communication). An analysis of other non-allelic phaseolin gene sequences, including a gene which encodes the Pvu-α-subunit of phaseolin, shows conservation of these short direct repeats

(J.L. Slightom, D.V. Thompson and R.F. Drong, manuscript in preparation).

Further delineation of these putative embryo-specific regulatory sequences, which function like a tissue-specific enhancer, will be very interesting and important if we hope to understand how the expression of specific plant genes is regulated.
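The comparison of short repeats above (AGCCCA in the Gma-α'-gene against CACCCA and AACCCA at positions -218 and -208 of the Pvu-β-gene) amounts to counting mismatches between hexamers. The Python sketch below does this with a Hamming distance and a sliding window. The function names, the default threshold of two mismatches and the test sequences are assumptions for illustration, and only the strand given is searched.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_near_repeats(seq: str, core: str = "AGCCCA",
                      max_mismatches: int = 2) -> List[Tuple[int, str, int]]:
    """Slide a window the length of `core` along one strand of `seq` and
    return (0-based index, window, mismatches) for every window within
    `max_mismatches` of `core` (hypothetical helper)."""
    seq, core = seq.upper(), core.upper()
    k = len(core)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, core)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


# The Pvu-beta hexamers cited in the text, compared with the Gma-alpha' repeat.
for variant in ("CACCCA", "AACCCA"):
    print(variant, hamming(variant, "AGCCCA"))  # prints "CACCCA 2", then "AACCCA 1"

# Exact-match search in a made-up sequence.
print(find_near_repeats("TTAGCCCATTAGCCCA", max_mismatches=0))
# [(2, 'AGCCCA', 0), (10, 'AGCCCA', 0)]
```

A two-mismatch threshold is permissive for a six-base motif: with equal base frequencies, (1 + 18 + 135)/4096, or about one window in 27 of random sequence, falls within two mismatches of any given hexamer. Positional context, such as the similar locations noted in the Gma-α' and Pvu-β genes, therefore carries as much weight as the sequence match itself.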

CONCLUDING REMARKS

This is indeed an interesting time to be involved in plant molecular biology research, as in just the last three years new techniques for plant gene transfer have greatly enhanced our ability to learn how plant

genes function. Much has recently been learned about the structure of many

plant seed storage protein genes, and with the use of Agrobacterium-derived

vector systems we are now learning much about the regulatory signals which

control their expression in developing plant tissues. We have already

learned that many of the mechanisms and DNA-related signals which regulate

plant gene expression transcend the boundaries of taxonomically distinct

plants. Thus genes isolated from one particular plant species can be

expected to function in another species even if they are not closely

related. The use of transgenic petunia plants as an in vivo whole-plant

model system has already proven itself useful for locating DNA sequence

elements which may be responsible for embryo-specific expression of a

soybean seed storage protein gene.

At the present time, the amount of information concerning the function and evolutionary constraints of bean seed storage proteins and their genes is sufficient to guide the placement of the amino acid replacement substitutions necessary to improve their nutritional quality. Such nutritionally balanced seed storage protein genes are presently being constructed, and with the development of transformation and regeneration schemes for soybean and common bean this goal can be achieved. The necessary components are almost in place, and we believe that more nutritious soybean and common bean cultivars should be developed within the next few years.
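The replacement strategy described above can be illustrated with a simple bookkeeping calculation of sulfur amino acid (methionine plus cysteine) content before and after substitutions. The Python sketch is hypothetical throughout: the 10-residue segment, the positions changed and the helper names are invented for illustration, and the sketch does not model the structural and evolutionary constraints that, as noted above, must guide where replacements are placed.

```python
from typing import Dict

SULFUR_RESIDUES = frozenset("MC")  # methionine and cysteine (one-letter codes)


def sulfur_fraction(protein: str) -> float:
    """Fraction of residues that are Met or Cys."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    return sum(aa in SULFUR_RESIDUES for aa in protein) / len(protein)


def substitute(protein: str, changes: Dict[int, str]) -> str:
    """Apply single-residue replacements given as {1-based position: new residue}."""
    residues = list(protein.upper())
    for pos, new in changes.items():
        if not 1 <= pos <= len(residues):
            raise IndexError(f"position {pos} is outside a protein of length {len(residues)}")
        residues[pos - 1] = new.upper()
    return "".join(residues)


toy = "AKLVEMGSTL"                           # hypothetical segment with one Met
improved = substitute(toy, {3: "M", 8: "C"})  # hypothetical Leu->Met and Ser->Cys changes
print(improved, sulfur_fraction(toy), sulfur_fraction(improved))
# AKMVEMGCTL 0.1 0.3
```

In practice the candidate positions would come from the regions shown to tolerate amino acid sequence variation, not from an arbitrary choice as in this toy example.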

ACKNOWLEDGEMENTS

We thank Mr. Roger Drong for his helpful comments and help in proofreading

the manuscript. We also thank Dr. Roger Beachy and Mr. Z.L. Chen for

communicating their results prior to publication.

REFERENCES

1. R. N. Beachy, Z.-L. Chen, R. B. Horsch, S. G. Rogers, N. J. Hoffmann

and R. T. Fraley, Accumulation and assembly of soybean β-conglycinin

in seeds of transformed petunia plants. EMBO J., 4, 3047-3053

(1985).

2. R. J. Blagrove, G. G. Lilley, A. Van Donkelaar, S. M. Sun and T. C.

Hall, Structural studies of a French bean storage protein:

phaseolin. Int. J. Biol. Macromol., 6, 137-141 (1984).

3. R. Bollini, A. Vitale and M. J. Chrispeels, In vivo and in vitro

processing of seed reserve protein in the endoplasmic reticulum:

evidence for two glycosylation steps. J. Cell Biol., 96, 999-1007

(1983).

4. J. W. S. Brown, F. A. Bliss and T. C. Hall, Linkage relationships

between genes controlling seed proteins in French bean. Theor.

Appl. Genet., 60, 251-258 (1981).

5. P. P. Chee, R. C. Klassy and J. L. Slightom, Expression of a bean

storage protein 'phaseolin minigene' in foreign plant tissues.

Gene, 41, 47-57 (1986).

6. Z.-L. Chen, M. A. Schuler and R. N. Beachy, Functional analysis of

regulatory elements in a plant embryo-specific gene. Proc. Natl.

Acad. Sci. USA, 83, 8560-8564 (1986).


7. E. Derbyshire, D. J. Wright and D. Boulter, Legumin and vicilin,

storage proteins of legume seeds. Phytochem., 15, 3-24 (1976).

8. J. J. Doyle, B. F. Ladin and R. N. Beachy, Antigenic relationship of

legume seed proteins to the 7S seed storage protein of soybean.

Biochem. Syst. Ecol., 13, 123-132 (1985).

9. J. J. Doyle, M. A. Schuler, W. D. Godette, V. Zenger, R. N. Beachy

and J. L. Slightom, The glycosylated seed storage proteins of

Glycine max and Phaseolus vulgaris. J. Biol. Chem., 261, 9228-9238

(1986).

10. A. Efstratiadis, J. W. Posakony, T. Maniatis, R. M. Lawn, C.

O'Connell, R.A. Spritz, J. K. DeRiel, B. G. Forget, S. M. Weissman,

J. L. Slightom, A. E. Blechl, O. Smithies, F. E. Baralle, C. C.

Shoulders, and N. J. Proudfoot, The structure and evolution of the

human β-globin gene family. Cell, 21, 653-668 (1980).

11. R. B. Goldberg, G. Hoschek, G. S. Ditta and R. W. Breidenbach,

Developmental regulation of cloned superabundant embryo mRNAs in

soybean. Dev. Biol., 83, 218-231 (1981).

12. J. S. Greenwood and M. J. Chrispeels, Correct targeting of the bean

storage protein phaseolin in the seeds of transformed tobacco.

Plant Physiol., 79, 65-71 (1985).

13. P. Gruss, Magic enhancers? DNA, 3, 1-5 (1984).

14. H. Hayashida and T. Miyata, Unusual evolutionary conservation and

frequent DNA segment exchange in class I genes of the major

histocompatibility complex. Proc. Natl. Acad. Sci. USA, 80, 2671-

2675 (1983).

15. Y. Ma and F. A. Bliss, Seed proteins of common bean. Crop Sci., 17,

431-437 (1978).

16. D. W. Meinke, J. Chen and R. N. Beachy, Expression of storage-

protein genes during soybean seed development. Planta, 153, 130-139

(1981).


17. J. Messing, D. Geraghty, G. Heidecker, N.-T. Hu, J. Kridl and I.

Rubenstein, Plant gene structure, in Genetic Engineering of Plants

(T. Kosuge, C. P. Meredith, and A. Hollaender, eds.). Plenum Press,

New York, 211-227 (1983).

18. N. Murai, D. W. Sutton, M. G. Murray, J. L. Slightom, D. J. Merlo,

N. A. Reichert, C. Sengupta-Gopalan, C. A. Stock, R. F. Barker,

J. D. Kemp and T. C. Hall, Phaseolin gene from bean is expressed after

transfer to sunflower via tumor-inducing plasmid vectors. Science,

222, 476-482 (1983).

19. R. M. Myers, K. Tilly and T. Maniatis, Fine structure genetic analysis

of a β-globin promoter. Science, 232, 613-618 (1986).

20. T. B. Osborne, The proteids of the kidney bean. J. Amer. Chem.

Soc., 16, 633-764 (1894).

21. H. E. Paaren, J. L. Slightom, T. C. Hall, A. S. Inglis and R. J.

Blagrove, Purification of a seed glycoprotein: N-terminal and

deglycosylation analysis of phaseolin. Phytochem., in press (1987).

22. K. Pedersen, J. Devereux, D. R. Wilson, E. Sheldon and B. A.

Larkins, Cloning and sequence analysis reveal structural variation

among related zein genes in maize. Cell, 29, 1015-1026 (1982).

23. M. A. Schuler, J. J. Doyle and R. N. Beachy, Nucleotide homologies

between the glycosylated seed storage proteins of Glycine max and

Phaseolus vulgaris. Plant Mol. Biol., 1, 119-127 (1983).

24. C. Sengupta-Gopalan, N. A. Reichert, R. F. Barker, T. C. Hall and J. D.

Kemp, Developmentally regulated expression of the bean β-phaseolin

gene in tobacco seed. Proc. Natl. Acad. Sci. USA, 82, 3320-3324

(1985).

25. J. L. Slightom, S. M. Sun and T. C. Hall, Complete nucleotide

sequence of a French bean storage protein gene: phaseolin. Proc.

Natl. Acad. Sci. USA, 80, 1897-1901 (1983).

26. J. L. Slightom, R. F. Drong, R. C. Klassy and L. M. Hoffman,

Nucleotide sequences from phaseolin cDNA clones: the major storage

proteins from Phaseolus vulgaris are encoded by two unique gene

families. Nucl. Acids Res., 13, 6483-6498 (1985).

27. S. M. Sun, J. L. Slightom and T. C. Hall, Intervening sequences in a

plant gene - comparison of the partial sequence of cDNA and genomic

DNA of French bean phaseolin. Nature, 289, 37-41 (1981).

28. D. R. Talbot, M. J. Adang, J. L. Slightom and T. C. Hall, Size and

organization of a multigene family encoding phaseolin, the major

seed storage protein of Phaseolus vulgaris L. Mol. Gen. Genet.,

198, 42-49 (1984).

29. B. Wasylyk, C. Wasylyk, P. Augereau and P. Chambon, The SV40 72 bp

repeat preferentially potentiates transcription starting from

proximal natural or substitute promoter elements. Cell, 32, 503-514

(1983).

30. P. Zambryski, H. Joos, C. Genetello, J. Leemans, M. Van Montagu, and J.

Schell, Ti plasmid vector for the introduction of DNA into plant cells

without alteration of their normal regeneration capacity. EMBO J., 2,

2143-2150 (1983).

