Biotech. Adv. Vol. 5, pp. 29-45, 1987 0734-9750/87 $0.00 + .50 Printed in Great Britain. All Rights Reserved. Copyright © Pergamon Journals Ltd
ADVANCES IN THE MOLECULAR BIOLOGY OF PLANT SEED STORAGE PROTEINS
JERRY L. SLIGHTOM and PAULA P. CHEE
Division of Molecular Biology, The Upjohn Company, Kalamazoo, Michigan 49007, USA
ABSTRACT
Plant seed storage proteins were among the first proteins to be isolated
(20); however, only recently, as a result of using molecular biology
techniques, have the amino acid sequences of many of these proteins been
determined. With the accumulation of amino acid sequence data for many
vicilin-type storage proteins much has been learned concerning the location
of conserved amino acid regions and other regions which can tolerate amino
acid sequence variation. Combining this knowledge with recent advances in
plant gene transfer technologies will allow molecular biologists to correct
(by using amino acid replacement mutations) the sulfur amino acid deficiency
inherent to bean seed storage proteins. The development of more nutritious
soybean and common bean seeds will be of benefit to programs involving human
and animal nutrition.
KEY WORDS
Amino acid sequence, Agrobacterium tumefaciens, evolution, Glycine max,
nucleotide sequence, nutritional improvement, Phaseolus vulgaris,
recombinant DNA.
INTRODUCTION
Molecular investigations of plant seed storage proteins and the genes which
encode them are important because these proteins are a major source of
nutrition for people in many parts of the world. For the plant molecular
biologist these genes are also scientifically interesting as they provide an
excellent plant gene system which shows both tissue-specific and
developmental regulation. The overall goal of the molecular biologist is to
understand these genes with respect to their structural organization, their
number and chromosomal relationship to each other, the mechanisms which
regulate their expression, and their diversity among plant species. With
this type of information, and with the development of transformation and
regeneration schemes for crop species, storage protein genes can be
engineered to overcome their inherent deficiencies as a source of protein.
For example, the lack of sulfur-containing amino acids in bean seeds could
be corrected by adding codons for the sulfur-containing amino acids
cysteine and methionine.
Because the plant kingdom is very large and quite diverse, we have limited
this report to a discussion of recent advances in research on the storage
protein genes isolated from two economically important legumes, soybean
(Glycine max) and common bean (Phaseolus vulgaris). The major seed storage
proteins of these legumes are globulins (packaged into cotyledonary protein
storage bodies), which are represented in most legumes by two different
types of polypeptides: the nonglycosylated 11S fraction (called legumins)
and the glycosylated 7S proteins (called vicilins) (7). In these two plant
species our major research focus has been the investigation of the
vicilin-type storage proteins: the soybean "β-conglycinin" and the common
bean "phaseolin" polypeptides. These 7S proteins consist of multi-subunit
combinations of approximately 150-220 kilodaltons (kDa) (2,7). The
individual protein subunits vary in size between and within these species.
The principal polypeptide subunits of phaseolin are referred to as α-(51-53
kDa), β-(47-48 kDa), and γ-(43-46 kDa) (4). These three polypeptide types
make up about 50% of the protein in P. vulgaris seeds (15). However,
molecular analysis of individual members of the phaseolin gene family finds
that there are only two unique types of phaseolin genes [with 3-4 gene
copies each per haploid genome (28)], which encode only α- and β-type
polypeptide subunits (25-27; J.L. Slightom, D.V. Thompson, and R.F. Drong,
manuscript in preparation). The γ subunit appears to be the result of
incomplete addition of N-linked oligosaccharide sidechains on the β-type
polypeptide, or their partial degradation (3,21).
The soybean β-conglycinin polypeptide subunits consist of α'-(76 kDa), α-(72
kDa) and β-(53 kDa) (7,16). The α-subunits appear to be encoded by a
relatively small number of genes, 1-2 copies per haploid genome, while the
β-subunit is encoded by a larger number of genes (11).
Amino acid analyses of both phaseolin and β-conglycinin type storage
proteins find that they all contain less than 1% sulfur-containing amino
acids. For these proteins to be nutritionally balanced they should contain
between 3 and 6% sulfur-containing amino acids. Thus a goal of molecular
biology research is to incorporate, by DNA mutation, the number of amino
acid replacements necessary to obtain high sulfur storage protein (HSSP)
genes. However, before genetic engineering techniques can be successfully
used to obtain HSSP-genes a large amount of basic molecular biology
information concerning these seed storage protein genes needs to be
collected. In addition, plant transformation and regeneration schemes need
to be developed for both soybean and common bean species so that the
HSSP-genes can be placed into the proper plant environment. In this report
we have summarized the progress which has been made over the last few years
to achieve these goals.
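
The nutritional target stated above can be made concrete with a short calculation. The Python sketch below is illustrative only: the function names and the toy sequence are hypothetical and are not phaseolin or β-conglycinin data, and it assumes the percentages refer to residue counts rather than weight. It reports the percentage of methionine plus cysteine residues in a sequence and the minimum number of replacement mutations needed to reach a target percentage.

```python
import math

SULFUR_RESIDUES = {"M", "C"}  # methionine and cysteine, one-letter codes


def sulfur_percent(protein: str) -> float:
    """Percentage of residues that are methionine or cysteine."""
    count = sum(1 for aa in protein.upper() if aa in SULFUR_RESIDUES)
    return 100.0 * count / len(protein)


def replacements_needed(protein: str, target_percent: float) -> int:
    """Minimum number of non-sulfur residues to replace with Met or Cys.

    Replacement mutations leave the sequence length unchanged, so the
    required count is simply ceil(target fraction * length) minus the
    number of sulfur residues already present.
    """
    length = len(protein)
    current = sum(1 for aa in protein.upper() if aa in SULFUR_RESIDUES)
    # round() guards against floating-point noise before taking the ceiling
    required = math.ceil(round(target_percent * length / 100.0, 9))
    return max(0, required - current)


# Hypothetical 200-residue sequence containing a single methionine.
toy = "M" + "A" * 199
print(sulfur_percent(toy))            # 0.5
print(replacements_needed(toy, 3.0))  # 5
```

For this toy sequence, five replacements would raise the sulfur amino acid content from 0.5% to the lower end of the 3 to 6% range; where such replacements can be tolerated is the subject of the comparisons that follow.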
COMPARATIVE ANALYSES OF SOYBEAN AND COMMON BEAN VICILIN-TYPE GENES
Phaseolin and the α'-subunit of β-conglycinin differ considerably in both
size (see below) and antigenic identity (8). However, a comparison of the
partial nucleotide sequences of these genes shows that they do share both
structural and sequence identity (23). Doyle et al. (9) recently reported
the complete nucleotide sequence for the α'-subunit gene (Gma-α'), a total
of 3636 base pairs (bp), and compared its nucleotide and amino acid
sequences with those obtained from the β-phaseolin gene (Pvu-β), about 2800
bp (25). The alignments of these two nucleotide sequences are shown in
Figs. 1 through 3, beginning about 900 bp 5' of the common transcriptional
capped nucleotide (the nucleotide at position 1 in Fig. 1) and extending to
the end of their 3'-untranslated regions. Alignment of these nucleotide
sequences, which involves the placement of gaps (shown as asterisks) to
maximize identities, is for the most part straightforward due to the
similar structural organization of the genes: both genes have six exons and
five introns and share common nucleotide sequence elements in their 5'- and
3'-untranslated regions. The nucleotide and amino acid sequence comparisons
presented in Figs. 1-3 show that these genes share a high degree of
identity. Using both natural (introns, exons, and untranslated regions) and
arbitrary boundaries, the degree of apparent nucleotide sequence divergence
has been determined for 18 DNA regions (Table 1). The overall corrected
divergence (14) between these genes is about 41%, with exons and introns
showing similar degrees of overall divergence, 43 and 37%, respectively.
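
The bookkeeping behind such a comparison can be sketched in a few lines of Python. This is a minimal illustration with made-up sequences, not the published alignment. It assumes that columns containing a gap symbol are excluded from the comparison and that a run of consecutive gap columns counts as one gap; neither convention is stated in the text, and the function name `compare_aligned` is hypothetical.

```python
def compare_aligned(seq_a: str, seq_b: str, gap: str = "*"):
    """Return (matches, compared, gaps) for two aligned sequences.

    Columns where either sequence carries the gap symbol are skipped.
    A run of consecutive gap columns is counted as a single gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = gaps = 0
    in_gap = False
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            if not in_gap:
                gaps += 1      # first column of a new gap run
            in_gap = True
            continue
        in_gap = False
        compared += 1
        if a == b:
            matches += 1
    return matches, compared, gaps


# Toy alignment: two gap runs, eight compared columns, seven matches.
print(compare_aligned("ATGC**TTAGC", "ATGAGGTTA*C"))  # (7, 8, 2)
```

The ratio of matches to compared positions gives the uncorrected identity for a region, which is the quantity that a divergence correction then operates on.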
Table 1
Divergence in and around 7S storage protein genes from G. max and P. vulgaris

Structural Region       Nucleotide       Number    bp match/     Corrected
Compared                Positions        of Gaps   bp compared   % divergence
 1. 5' Flanking         -958 to -358       28       304/549        67.8
 2. 5' Flanking         -357 to -248        4        75/101        31.5
 3. 5' Flanking         -247 to -150        7        46/96         87.3
 4. 5' Flanking         -149 to -52         2        73/94         26.5
 5. 5' Flanking          -51 to -1          3        26/39         44.1
 6. 5' Untranslated        1 to 77          6        39/64         55.2
 7. Exon 1                78 to 911         3       237/311        28.6
 8. Intron 1             912 to 1115        5        52/77         42.5
 9. Exon 2              1116 to 1348        2       123/193        49.6
10. Intron 2            1349 to 1442        3        60/84         36.0
11. Exon 3              1443 to 1523       --        67/81         19.6
12. Intron 3            1524 to 1655       10        61/87         38.1
13. Exon 4              1656 to 1938        4       143/229        52.1
14. Intron 4            1939 to 2082        7        91/116        25.4
15. Exon 5              2083 to 2372        1       198/260        28.7
16. Intron 5            2373 to 2508        7        74/106        38.6
17. Exon 6              2509 to 2747        4       115/176        46.5
18. 3' Untranslated     2748 to 2886        6        92/129        36.1
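
The corrected values in Table 1 can be approximated from the match counts alone. The sketch below assumes that the correction of ref. 14 is the one-parameter (Jukes-Cantor) formula, d = -(3/4) ln(1 - 4p/3), where p is the observed fraction of mismatched positions. This is an inference from the fact that the formula reproduces several tabulated entries; the text does not name the method. The function name is hypothetical.

```python
import math


def corrected_divergence(matches: int, compared: int) -> float:
    """One-parameter (Jukes-Cantor) corrected divergence, in percent.

    p is the observed fraction of mismatches. The formula is undefined
    for p >= 0.75 (saturation), which is reported as an error here.
    """
    p = 1.0 - matches / compared
    if p >= 0.75:
        raise ValueError("observed divergence too high to correct")
    return -75.0 * math.log(1.0 - 4.0 * p / 3.0)


# (region, bp matched, bp compared) taken from Table 1
rows = [
    ("5' Flanking (-958 to -358)", 304, 549),
    ("Exon 1", 237, 311),
    ("Exon 3", 67, 81),
]
for name, matched, compared in rows:
    print(f"{name}: {corrected_divergence(matched, compared):.1f}%")
```

Under this assumption the three regions come out at 67.8%, 28.6% and 19.6%, in agreement with the corresponding Table 1 entries. The correction grows quickly as p increases, which is why regions with only about half of their positions matching show corrected divergences well above 50%.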
Identification of cis-acting regulatory elements--The cis-acting regulatory
elements common to most eukaryotic genes, CCAAT- and TATA-elements (10), are
also present in similar locations (-77 and -31 bp 5' of the cap site,
respectively) in these plant genes. The alignment in Fig. 1 shows that both
genes share identical sequences in and around these regulatory elements.
The multiple CCAAT- and TATA-elements found in the Pvu-β sequence are
believed to be responsible for the numerous cap sites found in phaseolin
mRNAs; at least 12 different capped mRNA species have been identified by S1
nuclease analysis (26). The nucleotide sequences surrounding the
TATA-elements of both genes match the "plant consensus" while their
CCAAT-elements are more similar to the mammalian consensus sequence (17).
In addition to these regulatory elements, these 5'-flanking DNAs contain
sequence regions which share identity with another mammalian cis-acting
regulatory element known as an enhancer element, which has a consensus of
GTGG(AAA/TTT)G (13); see Fig. 1. However, because these seed storage protein
genes are subject to similar tissue-specific and developmental regulation,
it is conceivable that they could share other cis-acting regulatory signals
unique to their seed tissue environment. Experiments involving the transfer
of the Pvu-β gene into the genome of tobacco via Agrobacterium tumefaciens
vector systems (see below) suggest that tissue-specific and developmental
regulatory DNA elements
should exist within about 800 bp 5' of the cap sites (24). The alignment in
Fig. 1 shows that within about 350 bp 5' of the cap site there are several
DNA regions which share a high degree of identity; see positions -357 to
-338, -330 to -293, -149 to -127, -119 to -98, -89 to -61, and -33 to -13.
The latter two sequence regions contain the known cis-acting regulatory
elements CCAAT and TATA, respectively. Functional analysis of these
specific DNA elements to determine if they have a role in signaling
tissue-specific and developmental expression of these genes is now
approachable (see below).
Fig. 1. Comparison of nucleotide sequences from the 5'-flanking and
untranslated regions of the phaseolin (Pvu β) and soybean α'-subunit of
β-conglycinin (Gma α') genes. The nucleotide sequence numbering system was
set by the overall alignment, with nucleotide 1 being the shared capped
site A-nucleotide. This numbering system is continued in the positive
direction in Figs. 2 and 3. Asterisks were used to maximize identities.
Nucleotide sequences which may be important to the expression of these
genes are indicated; TATAA-elements are overlined or underlined once and
CCAAT-elements are twice overlined or underlined. Possible enhancer
elements are indicated by arrows.
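
Locating such elements relative to the cap site is a simple pattern search. The following Python sketch is a hedged illustration and not the procedure used in the work reviewed here. The flanking sequence is synthetic, the motif patterns are bare core sequences (real elements vary around the consensus), and the enhancer core is written so that either AAA or TTT is accepted in the middle, which is one reading of the printed consensus. Positions are reported for the first base of each match, with the base immediately 5' of the cap site numbered -1; that convention is also an assumption.

```python
import re

# Core patterns for the elements named in the text (regular expressions).
MOTIFS = {
    "CCAAT": "CCAAT",
    "TATA": "TATA",
    "enhancer core": "GTGG(?:AAA|TTT)G",
}


def find_elements(flank: str, motifs=MOTIFS):
    """Return a list of (name, position) for every motif match.

    `flank` is the 5'-flanking sequence ending immediately before the
    cap site, so its last base is -1 and its first base is -len(flank).
    """
    flank = flank.upper()
    hits = []
    for name, pattern in motifs.items():
        # A lookahead lets overlapping occurrences be reported too.
        for m in re.finditer(f"(?=(?:{pattern}))", flank):
            hits.append((name, m.start() - len(flank)))
    return hits


# Synthetic 80-base flank built so the motifs fall at -77 and -31.
toy_flank = "GGG" + "CCAAT" + "G" * 41 + "TATA" + "G" * 27
print(find_elements(toy_flank))  # [('CCAAT', -77), ('TATA', -31)]
```

Applied to both genes, a scan of this kind would list candidate elements whose positions can then be compared across the alignment; functional significance still has to be established experimentally.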
One additional cis-acting regulatory element is located in the
3'-untranslated DNA region, where it is used to signal mRNA processing and
polyadenylation. The 3'-untranslated regions are both 135 bp in length and
appear to have diverged to about the same degree (36%) as the coding
regions. Nucleotide sequence homology near the poly(A) signal is somewhat
less (Table 1 and Fig. 3), possibly because the Gma-α' gene contains
overlapping poly(A) signals (9).
Fig. 2. Nucleotide sequence alignment of the Pvu β and Gma α' genes covering
most of exon 1 and intron 1. Possible locations for signal peptide cleavage
and secondary amino acid cleavage of Gma α' are indicated. The
N-glycosylation site present in Gma α' exon 1 is shown (see amino acid
sequence line) in the 522 bp region not present in Pvu β. Exon and intron
junctions are shown by vertical arrows which extend through both nucleotide
sequences (also see Fig. 3).
Evolutionary relationships of exon regions--The degree of identity found
between the orthologous exons, and even within each exon comparison, varies
considerably. Most of the exons contain regions of highly matching
nucleotide and amino acid sequences interspersed among other regions
which show either complete divergence or loss of genetic information due to
deletions or insertions. This observation suggests that evolutionary
constraints both at the nucleotide and amino acid sequence levels for these
seed storage proteins may consist of units which are smaller than the
individual exons. Exon 3, the smallest exon (81 bp), is the exception to
this observation as it shows the least amount of divergence (19.6%).
Divergence in the other exons ranges from 28 to 52% (Table 1), with exon 4
showing the most divergence. However, the most noticeable difference found
by this exon comparison occurs in exon 1, where Gma-α' contains 522 bp
(encoding an additional 174 amino acids) which are not present in Pvu-β (Fig.
2). The nucleotide sequence of this 522 bp region is unique as it does not
share nucleotide or amino acid sequence identities with any 7S storage
protein gene (9). Thus it appears to have been derived from an event other
than the duplication of surrounding DNAs. DNA duplications have been found
in many storage protein genes (22,26). Doyle et al. (9) suggested that this
522 bp region is the result of a transposon insertion event; however, no
evidence of transposon-like sequence elements exists near its boundaries. If
this sequence is the result of a transposition event, then the directly
repeated DNA elements common to a transposon have been lost as a result of
evolutionary mutations. This insertion event has retained the codon
integrity of the exon 1 coding region, which results in the addition of 20
kDa to the molecular weight of the Gma-α' polypeptide, accounting for most of
the molecular weight difference between the Pvu-β and Gma-α' polypeptides.
The amino acid composition encoded by the insert is quite different from
that found in the remainder of the Gma-α' polypeptide, as it is rich in
glutamate residues (Fig. 2); however, despite being quite different in both
nucleotide and amino acid sequences from the surrounding proto-Gma-α' gene,
this insert has been able to achieve both functional and evolutionary
stability.
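
The figures quoted for this insert are mutually consistent, as a quick calculation shows. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from the text; the function name is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an assumption


def insert_summary(insert_bp: int):
    """Return (codons, approximate mass in kDa) for an in-frame insert."""
    if insert_bp % 3 != 0:
        raise ValueError("insert is not a whole number of codons")
    codons = insert_bp // 3
    mass_kda = codons * AVERAGE_RESIDUE_MASS_DA / 1000.0
    return codons, mass_kda


codons, mass = insert_summary(522)
print(codons, round(mass, 1))  # 174 19.1
```

A length of 522 bp is divisible by three, so the reading frame downstream of the insert is preserved, and 174 residues at the assumed average mass come to about 19 kDa. Because the insert is rich in glutamate, whose residue mass (about 129 Da) is above this average, the true value would be somewhat higher, in line with the 20 kDa quoted.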
The evolutionary relationships among the amino acid sequences of these seed
storage proteins were further refined by adding the amino acid sequences of
the vicilin-type seed storage proteins from pea (Pisum sativum) to the
comparison (9). Information concerning the location of conserved amino acid
sequence regions is important because such regions would be poor choices for
the placement of amino acid modifications. Amino acid replacement mutations
designed to increase the number of sulfur-containing amino acids should be
placed in regions which show little or no conservation of amino acid
sequence. The presence of the large insertion in exon 1 of the soybean
Gma-α' gene shows that the exon 1 region of Gma-α', and possibly that of the
Pvu-β gene, can tolerate a considerable amount of nucleotide and amino acid
sequence change. Thus exon 1 may provide an excellent target for the
insertion of a DNA sequence which encodes a large number of
sulfur-containing amino acids. Needless to say, whether such modifications
can be tolerated will need to be tested in the seeds of transgenic plants
(see below).
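
The selection rule described above (avoid conserved positions, prefer variable ones) can be expressed as a simple scan over a multiple alignment. The sketch below is a minimal illustration under stated assumptions: the three aligned sequences are invented, the gap symbol follows the asterisk convention of the figures, and a position is called conserved only if every sequence carries the identical residue. A real analysis would likely use a graded similarity measure.

```python
def variable_columns(aligned, gap="*"):
    """Return indices of alignment columns that are not fully conserved.

    `aligned` is a list of equal-length aligned amino acid sequences.
    A column is treated as conserved only if every sequence carries the
    same residue there and none carries a gap.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for i in range(length):
        column = {s[i] for s in aligned}
        if len(column) > 1 or gap in column:
            result.append(i)
    return result


# Invented three-way alignment; columns 1 and 3 vary.
print(variable_columns(["MKLSA", "MRLSA", "MKL*A"]))  # [1, 3]
```

Columns returned by such a scan are candidates for replacement mutations; columns that are absent from the list are the conserved positions that the argument above suggests leaving untouched.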
EXPRESSION OF BEAN SEED STORAGE PROTEIN GENES IN TRANSGENIC PLANTS
Analysis of gene transfers--Intensive study to elucidate the mechanism by
which Agrobacterium tumefaciens can transform dicotyledonous plant species
[the transfer of part of its large plasmid DNA (T-DNA region) into a host
plant genome] has resulted in the development of T-DNA-derived vector
systems for the transfer of foreign genes into plants (30). The transfer
and integration of foreign DNAs into the genome of various dicotyledonous
plant species is now routine. The first plant-derived gene to be
......... + ......... + ......... + ......... + ......... + ......... + ......... + ......... + .............. -----+-- ........ - ........ + 2Ssl $' -U, t ,ANSLAte~ (15S ~vl
Pve e AT~AT - - - eOL~(A) I I I
GMA m' TTGTTTr6TEIAC POLY(A)
m,,
Fig. 3. Nucleotide and amino acid alignments of Pvu fl and Gma ~ ' - sequences .
The alignments include exons 2-6, introns 2-5 and the 3' -untranslated
r e g i o n . A s s i g n m e n t o f i n t r o n j u n c t i o n s a n d N - g l y c o s l a t i o n s i t e s a r e
indicated as described in Fig. 2. The termination codon is designated TER.
In addition, the nucleotide and amino acid sequences corresponding to the
d u p l i c a t e d r e g i o n o f P v u o ( 2 6 ) a r e s h o w n .
Fig. 4. Placement of phaseolin native and minigene constructions into pTi 15955 BamHI fragment 17a. (A) Physical map of the T-DNA region of pTi 15955. (B) The BamHI fragment 17a contains a single SmaI site which was converted to a HindIII site (18). (C) The native phaseolin gene linked to the neomycin phosphotransferase II gene (NPTII) was cloned into this HindIII site. (D) Similarly, the phaseolin minigene linked to the NPTII gene was cloned into the same shuttle vector (5). Restriction enzyme sites are: B, BamHI; R, EcoRI; S, SmaI; and H, HindIII. The symbol G denotes the fusion of BamHI and BglII sites. Phaseolin genes were transferred into various A. tumefaciens strains as described by Murai et al. (18).
transferred into a foreign plant environment was the phaseolin storage
protein gene, Pvu-β, and it was transferred into the sunflower genome (18).
Fig. 4C shows the
phaseolin structural gene region and part of the T-DNA used as the vector
for this transfer. Integration of the phaseolin gene into the sunflower
genome was confirmed by assaying for the presence of phaseolin
polypeptides in the transformed sunflower callus tissues. Surprisingly,
this phaseolin gene was found to be expressed in the transformed callus
tissues, and the total poly(A)+ mRNA fraction isolated from transformed calli
contains mRNAs which encode phaseolin. The Pvu-β-gene probe hybridizes
to a 1700 nucleotide mRNA which is identical in size to that found in
developing bean cotyledons (5,18). The size of the hybridizing phaseolin
mRNA indicates that the five intron sequences of this Pvu-β-gene
(a total of 515 nucleotides) have been removed to obtain a mature
phaseolin mRNA. This result suggests that plant mRNA splicing mechanisms
are conserved between widely divergent plant species. Phaseolin mRNAs of
similar size have been isolated from tobacco calli which contain either a
native Pvu-β-gene or a mutated Pvu-β-gene which lacks the 5 introns, a
phaseolin "minigene" (5) (see Fig. 4D). Expression of this phaseolin
minigene construction indicates that intron splicing is not necessary for
biogenesis of a stable mRNA molecule (5).
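The size argument above can be illustrated with a short sketch: removing a set of intron intervals from a gene sequence shortens the transcript by exactly the summed intron length. The function name `splice_introns`, the toy sequence and the intron coordinates are hypothetical and are not taken from the Pvu-β sequence; only the idea that excising the introns accounts for the size difference comes from the text.

```python
def splice_introns(sequence, introns):
    """Return the sequence with the given introns removed.

    `introns` is a list of (start, end) pairs, 0-based and end-exclusive.
    The intervals must lie inside the sequence and must not overlap.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end > len(sequence) or start >= end:
            raise ValueError("introns must be valid, non-overlapping intervals")
        exons.append(sequence[position:start])  # exon preceding this intron
        position = end                          # skip over the intron
    exons.append(sequence[position:])           # final exon
    return "".join(exons)


# Toy example (hypothetical sequence; lower case marks the two introns).
gene = "ATGGCC" + "gtaagtttag" + "GATTAC" + "gtatgcag" + "TTCTGA"
introns = [(6, 16), (22, 30)]

mature = splice_introns(gene, introns)
removed = sum(end - start for start, end in introns)

print(mature)                   # exons joined in their original order
print(len(gene) - len(mature))  # equals the summed intron length
```

A minigene of the kind described here corresponds to supplying the already spliced sequence, so that no intervals remain to be removed.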
Tobacco calli containing either the native or minigene phaseolin
constructions were regenerated into plants that were grown to maturity
and set flowers, which were self-pollinated. During the course of this
regeneration and maturation process different tobacco plant tissues were
assayed for phaseolin expression. These tests showed that in most cases
phaseolin expression ceased soon after the regenerated plantlets reached
the two-leaf stage; phaseolin polypeptides were not found in tobacco stems,
leaves or flowers.
However, high levels of phaseolin polypeptides were found in the embryonic
tissues of the developing tobacco seeds (24). The level of expression in
these tobacco seeds was about 1000-fold higher than that found in the callus
tissues, suggesting that this bean-derived gene can respond to tobacco
developmental regulation factors in a manner similar to that found in the
developing bean seed. That is, the interaction of tobacco seed-specific
developmental regulating factors and developmental enhancer DNA elements in
the transferred bean DNA is adequate to obtain developmental expression of
this bean gene. A similar set of experiments involving the phaseolin
minigene shows an identical pattern of expression in tobacco seeds,
indicating that the DNA sequences responsible for tissue-specific and
developmental expression of the Pvu-β-gene are not located within intron
sequences (P.P. Chee and J.L. Slightom, manuscript in preparation).
Again, the transfer of this bean gene shows that there is conservation in
the mechanisms which regulate the developmental expression of seed storage
protein genes in taxonomically distinct plant families. If this is
generally true, the transfer of many plant genes among the different plant
genera should not be limited by differences in gene regulatory mechanisms.
Analysis of foreign proteins in transformed seeds--One-dimensional
polyacrylamide gel analysis of phaseolin polypeptides isolated from tobacco
seeds shows the presence of authentic 46 kDa phaseolin polypeptides along
with smaller polypeptides which react with the anti-phaseolin rabbit
polyclonal antibody (24). These results suggest that full-length
β-phaseolin is produced in transformed tobacco seeds and that these
polypeptides are correctly processed for removal of the phaseolin signal
peptide and the addition of N-linked oligosaccharide side chains. However,
at some point in the development of the tobacco seed cotyledons some of the
full-length phaseolin polypeptides are degraded by a set of distinct
proteolytic cleavages. This degradation appears to follow the pattern found
in germinating bean seeds (24). The reason for the instability of phaseolin
polypeptides in these transformed tobacco seeds is not clear; it could
possibly be due to the accumulation of more protein than the tobacco seed
storage protein bodies can protect from proteolytic enzymes. Analysis of
tobacco seed storage protein bodies shows that the expressed phaseolin
polypeptides are correctly targeted, as they accumulate in the amorphous
matrix of the protein bodies (12).
A similar set of experiments has been done with the soybean Gma-α'-gene,
except that it was transferred into the genome of petunia plants (1). Analysis
of the petunia seed proteins showed that Gma-α'-polypeptides are produced at
a level similar to that found for phaseolin, and that expression is limited
to the developing petunia seed embryos. The first Gma-α'-related protein
produced is a 55 kDa polypeptide, which is followed by later accumulation of
larger polypeptides of 76, 68, and 64 kDa. These larger Gma-α'-related
polypeptides appear to accumulate following the development of the seed
storage protein bodies (1). In contrast to the production of phaseolin
polypeptides in tobacco seeds, these Gma-α'-related polypeptides appear not
to be subjected to proteolytic cleavage (1). This observation suggests that
either the petunia storage protein bodies can protect foreign storage
proteins better than tobacco protein bodies or that Gma-α'-related
polypeptides are less susceptible to proteolytic degradation than
Pvu-β-polypeptides. Transfer of the phaseolin gene into petunia should
identify the reason for this difference in stability. Nevertheless, the
developmental expression of both the Pvu-β and Gma-α' gene products in
tobacco and petunia seeds clearly shows that these plant species do provide
excellent heterologous whole plant systems for the testing of mutated Pvu-β
and Gma-α' genes engineered either to investigate regulatory mechanisms or to
improve their nutritional quality.
IDENTIFICATION OF SEED EMBRYO-SPECIFIC GENE REGULATORY ELEMENTS
Having demonstrated the expression of the Gma-α'-gene in transgenic petunia
seed embryo tissues, Beachy and his co-workers used this model system to
examine the 5'-flanking DNA region of the Gma-α'-gene for DNA sequence
elements which interact with petunia seed developmental regulatory factors.
As mentioned above, transcriptional control of most eukaryotic genes requires
the presence of a TATA-element for proper initiation of transcripts and, to a
lesser extent, a CCAAT-element which appears to modulate the level of
expression (19,29). In some cases, enhancer elements have been found which
greatly affect the level of gene expression in a tissue-specific manner
(13). Both the Pvu-β- and Gma-α'-storage protein genes contain sequences which
match these identified regulatory elements (9); see Fig. 1. Using the
nucleotide sequence comparison of the two seed storage protein genes as a
guide, Chen et al. (6) constructed a series of deletion mutants of the
Gma-α'-gene to test the effect that each of these known regulatory elements has
on expression and to locate other regulatory elements. Each mutated
Gma-α'-gene was transferred and integrated into the genome of petunia. Fig. 5
shows a regional comparison of the Gma-α'- and Pvu-β-genes along with the
location of several informative deletions. The levels of Gma-α'-protein
observed by Chen et al. (6) are also presented in Fig. 5 as a percentage of
the Gma-α'-protein produced by the native construction in transgenic petunia
seeds (1).
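The way such a 5'-deletion series is read can be sketched in a few lines: each construct's protein level is expressed as a percentage of the native construction, and the largest fall between neighbouring deletion endpoints points to the interval most likely to contain a required element. The function names and all measurement values below are hypothetical placeholders chosen only for illustration; they are not the data of Chen et al. (6).

```python
def percent_of_native(levels, native_level):
    """Express each construct's protein level as a percentage of the native construct."""
    if native_level <= 0:
        raise ValueError("native_level must be positive")
    return {endpoint: 100.0 * value / native_level
            for endpoint, value in levels.items()}


def largest_drop(percentages):
    """Return the adjacent pair of deletion endpoints with the largest fall in expression.

    Endpoints are positions relative to the cap site (negative = upstream) and
    are compared in order from the most distal deletion to the most proximal.
    """
    ordered = sorted(percentages)  # e.g. -400, -300, -200, -100
    best_pair, best_fall = None, float("-inf")
    for distal, proximal in zip(ordered, ordered[1:]):
        fall = percentages[distal] - percentages[proximal]
        if fall > best_fall:
            best_pair, best_fall = (distal, proximal), fall
    return best_pair, best_fall


# Hypothetical measurements in arbitrary units (NOT data from the paper).
native = 50.0
raw_levels = {-400: 48.0, -300: 45.0, -200: 5.0, -100: 0.0}

relative = percent_of_native(raw_levels, native)
interval, fall = largest_drop(relative)
print(relative)
print(interval, fall)
```

With these placeholder numbers the steepest fall lies between the second and third endpoints, which is the kind of pattern that directs attention to a specific stretch of 5'-flanking sequence.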
Fig. 5. Comparison of Pvu β and Gma α' nucleotide sequences immediately 5' of the shared capped nucleotide. The deletion points in the Gma α' sequence used by Chen et al. (6) are shown by vertical arrows. The large horizontal arrows show the position of the large imperfect direct repeat and the smaller horizontal arrows show the location of the short direct repeats on both Pvu β and Gma α' sequences. The relative amount of Gma α' polypeptides found for each deletion mutant is shown below each deletion point arrow; ND indicates that no protein was detected.
Deletion of the enhancer-type sequence at position -560 (see Fig. 1) has
little effect on the level of expression, indicating that this sequence
element does not play a role in regulating the expression of this gene in
petunia seeds (6). Little effect on the level of expression is also
observed for the deletions at positions -457 and -257 (6). The deletion at
position -208 shows some loss in the level of expression; however, the most
substantial loss in expression corresponds with the deletion of nucleotide
sequences between positions -208 and -159, as the Gma-α'-gene becomes
virtually inactive (showing no response to developmental regulation
factors) in transgenic petunia seeds. Further deletions at positions -69,
-42 and +14, which were designed to test the role of the CCAAT- and TATA-
elements, show no detectable protein expression and thus yield no information
concerning their role in regulating this gene. This series of experiments
clearly shows that the genetic information between positions -257 and -159
of the Gma-α'-gene is required for the recognition of petunia
(embryo-specific) developmental regulatory factors. A more detailed analysis of the
nucleotide sequence between these positions reveals the presence of an
imperfect direct repeat of 28 nucleotides and five smaller (G+C)-rich
repeats (AGCCCA), four of which are located within the larger repeat (Fig.
5). Chen et al. (6) suggest that the smaller direct repeat elements may
provide the genetic information responsible for regulating the level of
expression of the Gma-α'-gene in developing seeds of transgenic petunia
plants, and, if true, these sequences may also be important in regulating
the expression of this gene in developing soybean seeds. This hypothesis is
supported by finding sequences which match the short repeats in similar
locations of the Pvu-β-gene, at positions -218 (CACCCA) and -208 (AACCCA), and
of the soybean gene encoding the β-subunit of β-conglycinin (S.J. Barker, J.J.
Harada, and R.B. Goldberg, personal communication). An analysis of other
non-allelic phaseolin gene sequences, including a gene which encodes the
Pvu-α-subunit of phaseolin, shows conservation of these short direct repeats
(J.L. Slightom, D.V. Thompson and R.F. Drong, manuscript in preparation).
Further delineation of these putative embryo-specific regulatory sequences,
which function like a tissue-specific enhancer, will be very interesting and
important if we hope to understand how the expression of specific plant
genes is regulated.
CONCLUDING REMARKS
This is indeed an interesting time to be involved in plant molecular biology
research, as in just the last three years the addition of new techniques for
plant gene transfer has greatly enhanced our ability to learn how plant
genes function. Much has recently been learned about the structure of many
plant seed storage protein genes, and with the use of Agrobacterium-derived
vector systems we are now learning much about the regulatory signals which
control their expression in developing plant tissues. We have already
learned that many of the mechanisms and DNA-related signals which regulate
plant gene expression transcend the boundaries of taxonomically distinct
plants. Thus genes isolated from one particular plant species can be
expected to function in another species even if they are not closely
related. The use of transgenic petunia plants as an in vivo whole plant
model system has already proven useful for locating DNA sequence
elements which may be responsible for embryo-specific expression of a
soybean seed storage protein gene.
At the present time the amount of information concerning the function and
evolutionary constraints of bean seed storage proteins and their genes is
sufficient to guide the placement of the amino acid replacement substitutions
necessary to improve their nutritional quality. Such nutritionally balanced
seed storage protein genes are presently being constructed, and with the
development of transformation and regeneration schemes for soybean and
common bean species this goal can be achieved. The necessary components
are almost in place, and we believe that the development of more nutritious
soybean and common bean cultivars should be accomplished within the next few
years.
ACKNOWLEDGEMENTS
We thank Mr. Roger Drong for his helpful comments and help in proof reading
the manuscript. We also thank Dr. Roger Beachy and Mr. Z.L. Chen for
communicating their results prior to publication.
REFERENCES
1. R. N. Beachy, Z.-L. Chen, R. B. Horsch, S. G. Rogers, N. J. Hoffmann
and R. T. Fraley, Accumulation and assembly of soybean β-conglycinin
in seeds of transformed petunia plants. EMBO J., 4, 3047-3053
(1985).
2. R. J. Blagrove, G. G. Lilley, A. Van Donkelaar, S. M. Sun and T. C.
Hall, Structural studies of a French bean storage protein:
phaseolin. Int. J. Biol. Macromol., 6, 137-141 (1984).
3. R. Bollini, A. Vitale and M. J. Chrispeels, In vivo and in vitro
processing of seed reserve protein in the endoplasmic reticulum:
evidence for two glycosylation steps. J. Cell Biol., 96, 999-1007
(1983).
4. J. W. S. Brown, F. A. Bliss and T. C. Hall, Linkage relationships
between genes controlling seed proteins in French bean. Theor.
Appl. Genet., 60, 251-258 (1981).
5. P. P. Chee, R. C. Klassy and J. L. Slightom, Expression of a bean
storage protein 'phaseolin minigene' in foreign plant tissues.
Gene, 41, 47-57 (1986).
6. Z.-L. Chen, M. A. Schuler and R. N. Beachy, Functional analysis of
regulatory elements in a plant embryo-specific gene. Proc. Natl.
Acad. Sci. USA, 83, 8560-8564 (1986).
7. E. Derbyshire, D. J. Wright and D. Boulter, Legumin and vicilin,
storage proteins of legume seeds. Phytochem., 15, 3-24 (1976).
8. J. J. Doyle, B. F. Ladin and R. N. Beachy, Antigenic relationship of
legume seed proteins to the 7S seed storage protein of soybean.
Biochem. Syst. Ecol., 13, 123-132 (1985).
9. J. J. Doyle, M. A. Schuler, W. D. Godette, V. Zenger, R. N. Beachy
and J. L. Slightom, The glycosylated seed storage proteins of
Glycine max and Phaseolus vulgaris. J. Biol. Chem., 261, 9228-9238
(1986).
10. A. Efstratiadis, J. W. Posakony, T. Maniatis, R. M. Lawn, C.
O'Connell, R. A. Spritz, J. K. DeRiel, B. G. Forget, S. M. Weissman,
J. L. Slightom, A. E. Blechl, O. Smithies, F. E. Baralle, C. C.
Shoulders, and N. J. Proudfoot, The structure and evolution of the
human β-globin gene family. Cell, 21, 653-668 (1980).
11. R. B. Goldberg, G. Hoschek, G. S. Ditta and R. W. Breidenbach,
Developmental regulation of cloned superabundant embryo mRNAs in
soybean. Dev. Biol., 83, 218-231 (1981).
12. J. S. Greenwood and M. J. Chrispeels, Correct targeting of the bean
storage protein phaseolin in the seeds of transformed tobacco.
Plant Physiol., 79, 65-71 (1985).
13. P. Gruss, Magic enhancers? DNA, 3, 1-5 (1984).
14. H. Hayashida and T. Miyata, Unusual evolutionary conservation and
frequent DNA segment exchange in class I genes of the major
histocompatibility complex. Proc. Natl. Acad. Sci. USA, 80, 2671-
2675 (1983).
15. Y. Ma and F. A. Bliss, Seed proteins of common bean. Crop Sci., 17,
431-437 (1978).
16. D. W. Meinke, J. Chen and R. N. Beachy, Expression of storage-
protein genes during soybean seed development. Planta, 153, 130-139
(1981).
17. J. Messing, D. Geraghty, G. Heidecker, N.-T. Hu, J. Kridl, I.
Rubenstein, Plant gene structure, in Genetic Engineering of Plants
(T. Kosuge, C. P. Meredith, and A. Hollaender, eds.). Plenum Press,
New York, 211-227 (1983).
18. N. Murai, D. W. Sutton, M. G. Murray, J. L. Slightom, D. J. Merlo,
N. A. Reichert, C. Sengupta-Gopalan, C. A. Stock, R. F. Barker, J.
D. Kemp and T. C. Hall, Phaseolin gene from bean is expressed after
transfer to sunflower via tumor-inducing plasmid vectors. Science,
222, 476-482 (1983).
19. R. M. Myers, K. Tilly, T. Maniatis, Fine structure genetic analysis
of a β-globin promoter. Science, 232, 613-618 (1986).
20. T. B. Osborne, The proteids of the kidney bean. J. Amer. Chem.
Soc., 16, 633-764 (1894).
21. H. E. Paaren, J. L. Slightom, T. C. Hall, A. S. Inglis and R. J.
Blagrove, Purification of a seed glycoprotein: N-terminal and
deglycosylation analysis of phaseolin. Phytochem., in press (1987).
22. K. Pedersen, J. Devereux, D. R. Wilson, E. Sheldon and B. A.
Larkins, Cloning and sequence analysis reveal structural variation
among related zein genes in maize. Cell, 29, 1015-1026 (1982).
23. M. A. Schuler, J. J. Doyle and R. N. Beachy, Nucleotide homologies
between the glycosylated seed storage proteins of Glycine max and
Phaseolus vulgaris. Plant Mol. Biol., 2, 119-127 (1983).
24. C. Sengupta-Gopalan, N. A. Reichert, R. F. Barker, T. C. Hall, J. D.
Kemp, Developmentally regulated expression of the bean β-phaseolin
gene in tobacco seed. Proc. Natl. Acad. Sci. USA, 82, 3320-3324
(1985).
25. J. L. Slightom, S. M. Sun and T. C. Hall, Complete nucleotide
sequence of a French bean storage protein gene: phaseolin. Proc.
Natl. Acad. Sci. USA, 80, 1897-1901 (1983).
26. J. L. Slightom, R. F. Drong, R. C. Klassy and L. M. Hoffman,
Nucleotide sequences from phaseolin cDNA clones: the major storage
proteins from Phaseolus vulgaris are encoded by two unique gene
families. Nucl. Acids. Res., 13, 6483-6498 (1985).
27. S. M. Sun, J. L. Slightom and T. C. Hall, Intervening sequences in a
plant gene - comparison of the partial sequence of cDNA and genomic
DNA of French bean phaseolin. Nature, 289, 37-41 (1981).
28. D. R. Talbot, M. J. Adang, J. L. Slightom and T. C. Hall, Size and
organization of a multigene family encoding phaseolin, the major
seed storage protein of Phaseolus vulgaris L. Mol. Gen. Genet.,
198, 42-49 (1984).
29. B. Wasylyk, C. Wasylyk, P. Augereau and P. Chambon, The SV40 72 bp
repeat preferentially potentiates transcription starting from
proximal natural or substitute promoter elements. Cell, 32, 503-514
(1983).
30. P. Zambryski, H. Joos, C. Genetello, J. Leemans, M. Van Montagu, and J.
Schell, Ti plasmid vector for the introduction of DNA into plant cells
without alteration of their normal regeneration capacity. EMBO J., 2,
2143-2150 (1983).