THE MOST POPULAR AMONG BACTERIAL HOSTS IS ESCHERICHIA COLI

It was first identified (1885) by Theodor Escherich as a commensal of the human gut

It is now the most widely employed microorganism worldwide, in both research and applied labs

Most of the knowledge on transcription, translation, DNA replication, the genetic code, transducing phages and so on has been acquired by studying E. coli

Every single gene of the E. coli K-12 type strain (MG1655) has been silenced to determine which genes are essential and which are not

It is a Gram-negative bacterium belonging to the Proteobacteria (class γ-proteobacteria, family Enterobacteriaceae)

Didermal structure (from the outside to the inside):

•OUTER MEMBRANE (OM): where surface epitopes can be expressed

•PERIPLASM, containing the CELL WALL (peptidoglycan): an oxidizing environment, ~4% of total proteins

•CELLULAR (inner) MEMBRANE (IM)

•CYTOPLASM: a reducing compartment

Straight rod, ~2 µm x 0.5 µm

4 main phylogenetic groups (A, B1, B2, D)

Plenty of serovars

O (>170) = polysaccharide chains of LPS

H (Hauch, German for «mist, whiff») (>50) = flagellar proteins

K (Kapsel, German for «capsule») (>100) = capsular antigens (or microcapsular equivalent)

Antigens: O: … H… K…

Escherichia coli K-12

Model microorganism

A very useful biotechnological tool

ATP (ENERGY) IS PRODUCED BY RESPIRATION (AEROBIC OR ANAEROBIC) OR BY FERMENTATION

IT GROWS WELL ON RICH MEDIA (37-42 °C), WITH A MEAN GENERATION TIME OF 20 min, BUT CAN GROW IN THE RANGE 10-45 °C

IT IS ABLE TO ADAPT TO SEVERAL CONDITIONS
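To give a feeling for what a 20-minute generation time means, the sketch below applies the standard exponential growth relation N = N0 · 2^(t/g). It assumes unrestricted growth (no lag or stationary phase); the function names are illustrative only.

```python
import math


def cells_after(n0: float, minutes: float, generation_time_min: float = 20.0) -> float:
    """Cell number after unrestricted exponential growth: N = N0 * 2^(t/g)."""
    return n0 * 2 ** (minutes / generation_time_min)


def time_to_reach(n0: float, n_target: float, generation_time_min: float = 20.0) -> float:
    """Minutes needed to grow from n0 to n_target cells at a fixed generation time."""
    return generation_time_min * math.log2(n_target / n0)


if __name__ == "__main__":
    # One hour at g = 20 min corresponds to three doublings.
    print(cells_after(1, 60))
    # Ten doublings (a 1024-fold increase) take 200 minutes.
    print(time_to_reach(1, 1024))
```

In a real culture the generation time depends on medium and temperature, so 20 min should be read as a best case on rich media.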

E. coli is the most efficient and widely used host for recombinant protein production and for the production of plasmid DNA to be used to transform other hosts

Well-known genetics

Plenty of mutants are available

Well-known fermentation pathways

These features make it easier to set up and control industrial processes

Although not naturally competent, it can reach a high transformation efficiency

Fast growth in many media

E. coli provides large biomasses, and its cells are very easy to lyse

Pathogenicity class = 1: safe to use

The most common origin in the plasmid vectors employed for E. coli is the ColE1/pMB1 one

ColE1 vs pMB1: 1 base of difference

ColE1/pMB1-harbouring plasmids have a relaxed control

When protein synthesis is halted, the plasmid still replicates, as the polymerases are very stable enzymes. So, the ratio of plasmid to cellular debris can be modified

The ColE1 family encompasses multicopy plasmids, ranging from 15-20 to ~500 copies/cell

A LOT OF PLASMID VECTORS ARE AVAILABLE FOR E. COLI

Further replicons (different incompatibility groups)

P15 (from a natural E. coli plasmid): relaxed replicon, 20-40 copies, e.g. pACYC184

The plasmids harbouring this replication origin can coexist with ColE1/pMB1 in the same cell

pSC101 (from a natural Salmonella plasmid): stringent replicon, very low copy number (1-5)

F (natural plasmid, Fertility factor) and φ (bacteriophage) P1 origins: 1-2 copies/cell, used for artificial chromosomes (BACs & PACs)

Compatible with both P15 and ColE1/pMB1
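The compatibility rule behind these replicons can be sketched as a simple lookup. The copy-number ranges are those quoted in these notes; the dictionary layout, the key names and the function name are illustrative assumptions, and the rule is deliberately simplified (two plasmids are taken to coexist stably only if they use different replicons).

```python
# Copy-number ranges (copies per cell) as quoted in the notes.
REPLICONS = {
    "ColE1/pMB1": (15, 500),
    "P15": (20, 40),
    "pSC101": (1, 5),
    "F/P1": (1, 2),
}


def can_coexist(*origins: str) -> bool:
    """Simplified rule: plasmids coexist only if no replicon is used twice."""
    for origin in origins:
        if origin not in REPLICONS:
            raise KeyError(f"unknown replicon: {origin}")
    return len(set(origins)) == len(origins)


if __name__ == "__main__":
    print(can_coexist("ColE1/pMB1", "P15"))   # different replicons
    print(can_coexist("P15", "P15"))          # same replicon used twice
```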

Conditional replication systems (suicide vectors)

e.g. R6K: replication occurs at three sites on the plasmid, called the alpha, beta and gamma origins

It requires the “pi protein”, encoded by the pir gene, to function

Plasmids endowed with oriR6K will not be able to replicate in the absence of the pir gene product

In E. coli the pi protein can be supplied in trans by a prophage (lambda pir) that carries a cloned copy of the pir gene: the plasmid will replicate in a “lambda pir” host, but will be unable to do so in other E. coli strains

Conditional (temperature-sensitive) ORIs are also widely employed: they are based on mutated replication enzymes which are inactivated at >30 °C

Conditional replication systems are usually employed to obtain the chromosomal integration of heterologous fragments, or allele exchange

SELECTION MARKERS

Only a few cells take up DNA: we need selectable markers to detect them

A selective agent kills or prevents the growth of those cells that do not contain the foreign DNA, leaving only the desired ones

In bacteria, antibiotics are used almost exclusively

Escherichia coli is sensitive to:

AMPICILLIN

CHLORAMPHENICOL

TETRACYCLINE

KANAMYCIN

AMPICILLIN

It is a β-lactam antibiotic, which interferes with peptidoglycan synthesis

Its target is the transpeptidation reaction: PBPs (Penicillin Binding Proteins) act by removing the terminal D-Ala residue and creating the trans-peptide bond

β-lactam antibiotics act as pseudosubstrates for the PBPs by stably acylating their active site

[Figure: PBP transpeptidase (TPase) and transglycosylase (TGase) activities cross-linking the D-Ala (D-A) and DAP residues of peptidoglycan; the β-lactam acylates the active-site serine; BLA_TEM converts the β-lactam into inactive penicilloic acid]

The resistance determinant used on plasmids (blaTEM) encodes a periplasmic class A β-lactamase

It acts mainly on penicillins, by hydrolysing the β-lactam ring and preventing them from binding to PBPs

AMPICILLIN

ADVANTAGES

Easy to use

Low costs

Fast growth of the resistant colonies

Excellent choice for the laboratory

DRAWBACKS, DISCOURAGING FOR LARGE-SCALE PRODUCTION

The Ampr (R) cells secrete the enzyme: the antibiotic concentration decreases and plasmidless, sensitive (S) cells can grow

On solid media, this is evident from the appearance of satellite colonies

In liquid media the loss of plasmid is high (up to 80%)

If the selective agent is ampicillin, the starter cultures should not grow for more than 8-10 h

-Ampicillin degrades spontaneously at acidic pH

-If not correctly stored, its activity decreases quickly

A possible alternative is carbenicillin, which is more expensive but more stable

CHLORAMPHENICOL

It blocks protein synthesis by binding to the 50S ribosomal subunit

The determinant used as a marker (Chloramphenicol Acetyl Transferase, CAT) derives from the Tn9 transposon

CAT is a cytosolic, trimeric protein

CAT + CM + acetyl-CoA → acetylated derivatives (modified on the hydroxyl groups), unable to bind to the target

DRAWBACKS:

Chloramphenicol slows the bacterial growth rate

It is toxic and potentially carcinogenic

It is not allowed for productions intended for human use

It has to be dissolved in ethanol, and particular attention must be paid when mixing the antibiotic into the medium, as microclumps could occur

TETRACYCLINES

[Figure: chemical structure of tetracycline]

They are a large group of antibiotics with a molecular structure containing four rings

They are either naturally produced (by Streptomyces aureofaciens or S. rimosus) or semisynthetic

Tetracyclines act by forming a stable bond with the 30S ribosomal subunit, thus deforming the "A" site

For selection, just a small amount of the hydrochloride (HCl) salt, usually 5 μg/ml, is sufficient

The gene used for selection in E. coli vectors is tetC from pSC101, which encodes a tetracycline efflux system regulated by the repressor TetR

DRAWBACKS: limited solubility (0.4 mg/mL in water; 20 mg/mL in alcohol) and photolability

[Figure: the P, A and E sites of the ribosome, with tetracycline bound to the 30S subunit]

KANAMYCIN

Produced by Streptomyces kanamyceticus, it is the most frequently employed among the aminoglycosides

It acts by interacting with at least three ribosomal proteins, thus inhibiting protein synthesis and increasing translation errors

The genetic determinant used for plasmid selection is derived from the Tn903 transposon and confers resistance to both kanamycin and neomycin

Its product is APH(3')-I, an aminoglycoside 3'-phosphotransferase

For selection, kanamycin sulfate is used, generally at a concentration of 30-50 μg/mL
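Working concentrations such as 5 μg/mL (tetracycline) or 30-50 μg/mL (kanamycin) are usually reached by diluting a concentrated stock. A minimal sketch of the C1·V1 = C2·V2 arithmetic follows; the stock concentrations in the example are hypothetical, not recommendations from these notes, and the small volume contributed by the stock itself is neglected.

```python
def stock_volume_ul(final_ug_per_ml: float, medium_ml: float, stock_mg_per_ml: float) -> float:
    """Volume of stock solution (in µL) needed to reach the final concentration.

    1 mg/mL equals 1 µg/µL, so total µg divided by the stock in mg/mL gives µL.
    """
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    total_ug = final_ug_per_ml * medium_ml
    return total_ug / stock_mg_per_ml


if __name__ == "__main__":
    # Kanamycin at 50 µg/mL in 500 mL of medium, hypothetical 50 mg/mL stock.
    print(stock_volume_ul(50, 500, 50))
    # Tetracycline at 5 µg/mL in 1 L of medium, hypothetical 5 mg/mL stock.
    print(stock_volume_ul(5, 1000, 5))
```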

Antibiotic resistance genes can also be used for SCREENING (older plasmids), but this implies a longer handling time

E.g. cloning within the tet gene, which is therefore disrupted

Day 1: after selection on AMP, the transformant colonies are streaked or replicated on tetracycline (TET) plates

Day 2: the clones unable to grow on tetracycline harbour a recombinant plasmid (carrying the insert) and must be picked from the first (AMP) plate
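The logic of this two-day screening is a set difference: the clones of interest grow on AMP but not on TET. A minimal sketch, with hypothetical colony labels:

```python
def recombinant_clones(grows_on_amp: set[str], grows_on_tet: set[str]) -> set[str]:
    """Colonies that grow on AMP but not on TET carry an insert disrupting tet."""
    return grows_on_amp - grows_on_tet


if __name__ == "__main__":
    amp_plate = {"c1", "c2", "c3", "c4"}   # all transformants (Day 1)
    tet_plate = {"c1", "c4"}               # clones still tetracycline resistant (Day 2)
    print(sorted(recombinant_clones(amp_plate, tet_plate)))
```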

The best suited screening markers are those conferring features that can be detected by histochemical methods

α-peptide of the β-galactosidase (blue/white)

It requires a defective host strain, deleted in the α-peptide encoding region, so that it can be complemented by the plasmid

Reagent: X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside)

The insertional inactivation is obtained by cloning the DNA fragment within the marker encoding gene

Some other screening systems can be used with every E. coli strain (i.e. they do not depend upon mutations in the host genome)

phoC (Morganella morganii): a non-specific phosphatase, which can be detected by adding BCIP (5-bromo-4-chloro-3-indolyl-phosphate) to the agar medium

Melanin-like pigment of Streptomyces: can be detected by adding tyrosine to the agar medium

The insertional inactivation is obtained by cloning the DNA fragment between the marker encoding gene and the promoter

It has, however, a rather low sensitivity, and the browning is not clear-cut on isolated colonies

PROMOTERS

A good promoter must:

be strong enough, that is, be as consistent as possible with the E. coli consensus:

-35: T(78%) T(82%) G(68%) A(58%) C(52%) A(54%)

spacer: 16 bp (21%), 17 bp (52%), 18 bp (19%)

Pribnow box (-10): T(82%) A(89%) T(52%) A(59%) A(49%) T(89%)

function in many (most) E. coli strains

be inducible in a manner that is:

inexpensive

simple

independent from the normal components of culture media

A tight control is needed because:

A strict control allows a good biomass yield to be obtained before starting to express the product

The presence of heterologous proteins can induce/activate the host proteases

The product could be toxic to the cell

Or could interact non-specifically with some cellular components:

By binding or damaging DNA

By sequestering essential proteins

Through an improper interaction

By causing oxidative stresses or inducing response systems

“UP” SEQUENCES are «optional» promoter features

A+T RICH SEQUENCES LOCATED UPSTREAM OF THE -35 REGION

They act by binding to the alpha subunit of the RNA polymerase

They are less conserved than the -35 and -10 regions. Their consensus is [-59 nnAAA(A/T)(A/T)T(A/T)TTTTnnAAAAnnn -38]

Whenever present, they increase the expression level (30-70%)

UP sequences have been observed in:

Monoderms (Bacillus, Clostridium)

Diderms (E. coli)

Bacteriophages (λ and Mu)
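The UP consensus quoted above translates directly into a regular expression (n = any base, (A/T) = A or T). The sketch below scans a DNA string for exact matches; the test sequence is made up for illustration, and since real UP elements are poorly conserved an exact-match search is only a first approximation.

```python
import re

# nnAAA(A/T)(A/T)T(A/T)TTTTnnAAAAnnn, i.e. the 22 positions from -59 to -38.
UP_CONSENSUS = re.compile(r"[ACGT]{2}AAA[AT][AT]T[AT]TTTT[ACGT]{2}AAAA[ACGT]{3}")


def find_up_elements(upstream: str) -> list[tuple[int, str]]:
    """Return (start index, matched sequence) for each exact consensus match."""
    seq = upstream.upper()
    return [(m.start(), m.group()) for m in UP_CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical upstream region built to satisfy the consensus.
    print(find_up_elements("GCAAAATTTTTTTCCAAAAGCG"))
    # A G+C-only region cannot match.
    print(find_up_elements("GCGCGCGCGCGCGCGCGCGCGCGC"))
```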

Shine-Dalgarno (SD) region and start of the ORF

A good SD region in E. coli is about 6-8 bp long

consensus 5’-UAAGGAGG-3’

The start codon is usually AUG. Some microbial species with a high G+C content can use alternative codons (the most frequent one is GUG). To express these ORFs in E. coli, it is expedient to modify the start codon

In the region spanning from the SD to the start codon, it is very important to check:

The number of bp:

4-14 possible

7-9 most frequent

8 OPTIMAL

UAAGGAGG-xxxxxxxx-AUG

The quality of the bases:

A+T OK

G+C NO!

C and G negatively bias the translation efficiency

It is also necessary to avoid palindromic sequences involving the ORF start or masking the SD region:

ccugaauUAAGGAGGnnnnnnAUGauucagg

UAAGGAGGacucgagaAUGnnnnnnnnucucgagu

The best thing would be to calculate the translation rate for each individual CDS, in terms of thermodynamics

And, if needed, to modify the SD sequence accordingly (the translation rate should not exceed the folding and secretion ones)

This is almost impossible to do experimentally, but there are programs that can help, such as the RBS Calculator at http://salis.psu.edu/software
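Short of a thermodynamic model, the two simple checks on the SD-to-AUG region (number of bases and G+C content) can be scripted. This is only a rough sketch and not a substitute for the RBS Calculator: it assumes an exact match to the UAAGGAGG consensus and takes the first AUG after it as the start codon. The function and key names are illustrative.

```python
SD_CONSENSUS = "UAAGGAGG"


def rbs_report(mrna: str) -> dict:
    """Report spacer length and G+C fraction between the SD and the first AUG."""
    seq = mrna.upper().replace("T", "U")
    sd = seq.find(SD_CONSENSUS)
    if sd < 0:
        raise ValueError("no exact SD consensus found")
    sd_end = sd + len(SD_CONSENSUS)
    start = seq.find("AUG", sd_end)
    if start < 0:
        raise ValueError("no AUG found downstream of the SD")
    spacer = seq[sd_end:start]
    gc = sum(base in "GC" for base in spacer)
    return {
        "spacer": spacer,
        "length": len(spacer),
        "in_possible_range": 4 <= len(spacer) <= 14,
        "in_frequent_range": 7 <= len(spacer) <= 9,
        "gc_fraction": gc / len(spacer) if spacer else 0.0,
    }


if __name__ == "__main__":
    # Spacer taken from the second palindrome example: 8 nt, half of them G or C.
    print(rbs_report("UAAGGAGGACUCGAGAAUG"))
    # Hypothetical A+U-only spacer of the optimal length.
    print(rbs_report("UAAGGAGGAAUUAAUUAUG"))
```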

The effects of the overexpression are depicted by the growth curve of the host

Favourable curves:

The growth does not change after the induction

The growth rate decreases

Negative curves:

The growth is inhibited / the host cells lyse

Constitutive expression of a toxic product

Check for lysis

If the host cells lyse, consider the amount of lysis and the related loss of product

Decide whether or not to try other conditions and/or induction times

Promoters frequently used in E. coli

lac promoter

INDUCTION: ALLOLACTOSE (lactose byproduct) / IPTG

YIELD: MEDIUM to LOW

REGULATION: NEGATIVE (Repressor: LacI)

CONTROL: INADEQUATE

[Figure: the lac operon: I (repressor), P (promoter), O (operator), Z (β-galactosidase), Y (permease), A; LacZ hydrolyses lactose to glucose and galactose, and converts it to allolactose]

Negatively regulated genes are expressed at a low (basal) level in the bacterial cell

Due to this basal expression, a few molecules of LacZ and LacY are present when lactose is made available, so that lactose is converted to allolactose and the repression is relieved (induction)

Plac is a rather leaky promoter: its basal expression can hardly be limited, unless the repressor is located on the same plasmid or the host is a LacI-overexpressing (LacIq) one

In the lab, induction is often not actually necessary

IPTG is a gratuitous inducer:

it binds to the repressor, which is no longer able to bind the operator

but it is not metabolized

so its concentration remains constant and can be controlled

However, it is relatively expensive (lactose can be used instead)

IPTG is not allowed in GMP for human use, further limiting the usefulness of Plac for industrial productions

Although widely used for lab applications, Plac is not a strong promoter:

TTTACA (-35) TATGTT (-10) TGGAATTGTGAGCGGATAACAATT (operator region) Plac

it differs from the consensus in 3 nucleotides

the spacer length is sub-optimal (18 nt)

The derivative PlacUV5 is much stronger, due to a 2 bp mutation in the -10 hexamer that enhances the recruitment of the RNA polymerase σ70 subunit:

TTTACA (-35) TATAAT (-10) TGGAATTGTGAGCGGATAACAATT (operator region) lacUV5

the spacer length is still not ideal
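The distance of a promoter from the σ70 consensus (TTGACA at -35, TATAAT at -10) can be counted position by position. The sketch below does this for the Plac and lacUV5 hexamers quoted in these notes; the function names are illustrative, and the count ignores spacer length and the unequal weight of individual positions.

```python
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def promoter_distance(box35: str, box10: str) -> int:
    """Total mismatches of the -35 and -10 hexamers against the consensus."""
    return mismatches(box35, CONSENSUS_35) + mismatches(box10, CONSENSUS_10)


if __name__ == "__main__":
    print(promoter_distance("TTTACA", "TATGTT"))  # Plac: 1 + 2 mismatches
    print(promoter_distance("TTTACA", "TATAAT"))  # lacUV5: only the -35 mismatch is left
```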

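Alignment of the wild-type Plac and lacUV5 sequences (dots = identical bases)

First stretch, from -65 to -35:

WILD TGTGAGTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCT
UV5  ...A...................................................

Second stretch, including the -10 hexamer, +1, the operator and the RBS:

WILD CGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATG
UV5  .....AA................................................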

Moreover, a third point mutation, located in the CAP/cAMP binding site, decreases the affinity of PlacUV5 for CAP/cAMP and its sensitivity to catabolite repression

The good recruitment of σ70 bypasses the need for the positive activation (CAP) typical of the wild-type Plac

The net effect of the three point mutations is the creation of a stronger promoter that is less sensitive to the glucose effect

PtacI

It is a synthetic hybrid promoter derived from the E. coli trp and lacUV5 promoters

INDUCTION: IPTG

YIELD: MEDIUM-HIGH (about 5x higher than the parental promoters)

REGULATION: LacI

BASAL EXPRESSION: HIGH

Promoter: -35, -10, downstream elements

trp: TTGACA, TTAACT

lacUV5: TTTACA, TATAAT, lac operator (AATTGTGAGCGGATAACAATT)

tac I: TTGACA, TATAAT, lac operator (AATTGTGAGCGGATAACAATT), RBS

Ptac ensures a good yield: higher than Plac but not as high as the T7 promoter

Its strength could be a drawback for toxic products and membrane proteins

tetA

INDUCTION: ANHYDROTETRACYCLINE

YIELD: MEDIUM TO HIGH

REGULATION: NEGATIVE

CONTROL: TIGHT

BASAL EXPRESSION: LOW

[Figure: TetR repressing the PtetA promoter upstream of the heterologous gene]

The repressor encoding gene is placed on the plasmid: that is why the system does not depend on the background of the host strain

NOT DEPENDENT UPON THE STRAIN AND/OR THE METABOLIC STATE OF THE HOST

THE INDUCER MOLECULE IS NOT EXPENSIVE

The induction is performed with ANHYDROTETRACYCLINE at low concentrations

A full induction (~50 ng/ml) does not affect the growth rate of E. coli at all

[Figure: chemical structure of anhydrotetracycline]

A-TET vs TET, binding efficiency to TetR: 35 : 1

A-TET vs TET, bactericidal action: 1 : 100

This promoter has been successfully used to produce many heterologous proteins (Fabs, toxins...)

Non-induced cells : induced cells = 1 : 100

araBAD

The promoter of this tightly controlled operon is a suitable tool for heterologous expression in E. coli

INDUCTION: ARABINOSE, DOSE DEPENDENT

YIELD: TUNABLE

REGULATION: POSITIVE

CONTROL: TIGHT

BASAL EXPRESSION: VERY LOW

araBAD: CTGACG -- 18 -- TACTGT

consensus: TTGACA -- 17 -- TATAAT

Inducer = arabinose: very cheap and suitable for Good Manufacturing Practices

The almost total lack of basal expression, in arabinose-free media supplemented with glucose, balances the structural weakness of this promoter

[Figure: ParaBAD and AraC upstream of the heterologous gene; expression increases with the arabinose (ARA) dose, from 0.001% to 1%]

The amount of arabinose needed to induce expression depends upon the genetic background of the host strain

rhaPBAD

INDUCTION: RHAMNOSE

YIELD: MEDIUM

REGULATION: TIGHT AND ELABORATE

CONTROL: TIGHT

BASAL EXPRESSION: LOW

RhaR: transcriptional activator

RhaS: positively regulates the entire regulon (rhaBAD: catabolism; rhaT: transport)

[Figure: regulatory cascade triggered by L-rhamnose: RhaR and RhaS control the expression of RhaB, RhaA, RhaD and RhaT]

The regulon is, however, also subject to catabolite repression

The optimal time to induce and to collect the product must be determined experimentally and carefully

Phage promoters (Lambda and T7)

Bacteriophages offer a possible alternative to the regulated promoters of metabolic genes

λ phages can enter a lytic (right arm) or a lysogenic (left arm) cycle

The choice between the lysogenic and the lytic cycle depends upon the balance between CI and CRO

The CI product is the Lambda repressor

If CI is already expressed in a bacterial cell (that is, if there is a prophage), no other lambda phage is able to infect the same cell

[Figure: map of the λ genome: cos, head and tail genes (A W B C D E F Z U V G T H M L K I J), att, int, xis, cIII, N, PL, cI, PR, cro, cII, O, P, Q, S, R]

The PL and PR promoters are directly recognized by the bacterial RNA polymerase and are very efficiently regulated by the Lambda repressor

λPL

The Lambda PL/OL promoter-operator ensures medium to high expression levels

To further improve the control on the promoter, mutated forms of the CI repressor are usually employed

To easily switch the expression on/off, a temperature-sensitive mutant of the lambda repressor (cI857; C→T at position 37742) is the most frequently used

29-30 °C: the temperature-sensitive cI857 repressor is ON (active form); λPL = repressed, the heterologous gene is not expressed

42 °C: the temperature-sensitive cI857 repressor is OFF (inactive form); λPL = constitutive, the heterologous gene is expressed

It is very advantageous for the overexpression of proteins susceptible to proteolysis which would be degraded at more growth-friendly temperatures

An alternative to the use of temperature-sensitive mutants is to put the CI repressor under the control of Ptrp

An economical culture medium, practically devoid of tryptophan, contains molasses and acid casein hydrolysate: Ptrp is active, the cI repressor is produced and λPL (heterologous gene) is repressed

By adding tryptone (a tryptic digest of casein, rich in tryptophan) to the medium, the repressor expression halts and the foreign gene expression starts

Another interesting application of the CI repressor was proposed in 2016 by G. Durante-Rodríguez et al.

In natural conditions the lytic cycle is triggered by the SOS response, through the action of RecA

[Figure: wild-type CI (λNTCI, N-terminal domain; λCCI, C-terminal domain): PR promoter repressed; after stress: PR promoter derepressed]

These authors engineered an artificial chimeric regulator

by fusing the DNA binding domain of CI and the BzdR repressor of the bacterium Azoarcus, which responds to benzoyl-CoA

[Figure: chimeric regulator (λNTCI fused to LCBzdR) on the PR promoter; with benzoyl-CoA: PR promoter derepressed]

Unlike the Lambda PL promoter, the T7 promoter is not recognised by the E. coli RNA polymerase, as it drives the class 2 (late) genes

Transcription from the T7 promoter depends upon the T7 RNA polymerase, which is produced among the class 1 (early) genes transcribed by the host RNA polymerase

So, when choosing to use a T7 promoter, we need a bacterial host expressing the T7 RNA polymerase

The T7 RNA polymerase is faster than the E. coli one, and its expression has to be tightly controlled

BL21 (DE3)

The most popular one is the commercially available E. coli BL21 (DE3)

A lysogenic E. coli «B» strain harbouring the DE3 Lambda phage

In the λDE3 phage the T7 polymerase is expressed from the lacUV5 promoter, so that it can be successfully produced even in the presence of glucose

The λDE3 mutant has been obtained by inserting the PlacUV5-T7 fragment within the integrase encoding gene

The disruption of the int gene prevents the excision of the DE3 prophage

Of course, it is possible to construct one’s own (DE3) lysogen strain, with the same technique used to construct BL21(DE3) or other commercially available (DE3) strains

To do so, 3 (4) different bacteriophages are needed:

λDE3 (int-): it is not able to integrate into (or excise from) the bacterial chromosome by itself

A Helper Phage (B10), which provides the int function to λDE3 but cannot form a lysogen by itself because it is cI- (has no repressor)

Neither of these phages can propagate, due to another mutation affecting the ability to lyse the bacteria

By means of the integrase provided in trans by the Helper Phage, DE3 will be able to integrate in some bacterial cells

However, the colonies of the lysogenic cells would grow among an overwhelming number of WT ones, so we need a selection tool

The Selection Phage (B482):

-cannot kill λDE3 lysogens, because it has the same immunity as λDE3

-cannot integrate in susceptible cells (cI-)

-kills the mutated bacterial cells resistant to λDE3, which would otherwise survive and hamper the screening

Lysogens are prepared by co-infection with the three phages (λDE3, B10, B482):

B10 + λDE3 integration: cells unaffected by B482

No DE3 integration: cells killed by B482

Most of the growing colonies should be (DE3)

To check the lysogenic state, a 4th phage is employed

The Tester Phage is a T7 mutant, deleted in the T7 RNA polymerase region

So it is unable to form large plaques on WT E. coli cells

But it will succeed in killing the λDE3 lysogen cells, as they produce T7 polymerase once induced with IPTG

A further control may be exerted with the T7 lysozyme

Lysozyme binds directly to the T7 RNA polymerase, hampering its activity

But it has the drawback of slowing the host growth rate, as it cuts the cell wall, weakening the cell structure

The lysozyme (LSZM) encoding gene has been cloned in a p15 plasmid carrying Cmr, obtaining pLysE and pLysS

Notwithstanding the tight control, some basal activity still occurs; it is not usually a problem, but it could be if the product is toxic

pLysE and pLysS differ only in the orientation of the T7 lysozyme encoding gene

But it is a very important difference

In pLysE the gene is transcribed from the very strong Tet promoter

Actually, this was the intended construct

However, the amount of lysozyme produced is too high

The cells become too frail

Unexpectedly, the opposite orientation (pLysS) proved to be functional

as the gene is transcribed from the weak T7 Φ3.8 promoter

T7 Φ3.8 is located DOWNSTREAM of the gene but transcribes it together with the entire plasmid

In this case the lysozyme produced is sufficient to hamper the T7 polymerase activity without halting the growth of the culture

A T7-driven expression can sometimes cause some aggregation of the produced proteins

A possible solution is to slow the production rate:

By dosing the IPTG (25-100 µM instead of 1 mM)

By decreasing the incubation temperature

INDUCTION

After the induction, the product output steps up abruptly, but the growth of hosts harbouring a T7-driven plasmid could stop

It is essential to carefully determine the right timing in order to get the maximal yield

CE6 BACTERIOPHAGE

The basal expression of T7 RNA polymerase prevents the use of lysogenic hosts with particularly toxic genes transcribed from the T7 promoter

In such cases the λCE6 bacteriophage can be used to provide a source of T7 RNA polymerase to susceptible hosts

The polymerase gene is cloned into the int gene, so as to be transcribed from the λPL and λPI promoters during the infection, under the control of cI857

[Figure: map of CE6: immunity region (cI857, cro, PR, PL, N), cII, cIII, O, P, Q, xis, PI, and the T7 RNA polymerase gene inserted into int]

λCE6 is not able to enter the lytic cycle because of the “Sam” mutation (A/G substitution at position 45352), which inactivates the lysis protein gpS

CE6 can be propagated in the E. coli LE392 (supF) strain, which suppresses the Sam7 mutation, allowing the phage to enter the lytic cycle

When CE6 infects the cell, the T7 RNA polymerase synthesized de novo starts to copy the target DNA with a very high efficiency

The system is not as efficient as the use of lysogenic (DE3) strains

But until the infection is performed there are no polymerases able to recognize the T7 promoter and to transcribe the target gene!

Another possible drawback of the T7 promoter in lysogenic strains is the uneven expression among the individual transformant clones

so that it is necessary to examine several colonies to look for those with the highest production level

Another variant is the T7/lac hybrid promoter

[Figure: PT7, with a lac operator between +1 and the heterologous gene, bound by LacI; IPTG releases LacI]

The basal transcription is blocked by LacI

The IPTG used to induce the T7 RNA polymerase expression in BL21(DE3) relieves, at the same time, the repression of the target gene

Very often the lacI gene is included in the vectors with this system, to guarantee a tight control

T7/lac can also be combined with the use of pLysS

Another important feature is the TERMINATOR

On the expression vectors, a strong rho-independent terminator is often located downstream of the MCS to ensure the correct release of the mRNA from the RNA polymerase

Terminators are palindromic sequences followed by a run of A residues on the template strand (a run of U at the 3’ end of the mRNA)

T7 terminator (< and > mark the paired bases of the stem):
<<<<<<<:::<:<<:-:--:-:>>:>:::>>>>>>>
AACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

rrnD terminator:
<<<<<<<<<<<<<<<<<<---->>>>>>>>>>>>>>>>>>
AAAACAAAAGGCTCAGTCGGAAGACTGGGCCTTTTGTTTT
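The palindromic nature of a terminator can be checked by pairing the two arms of the stem base by base. The sketch below takes the rrnD sequence and the stem/loop lengths suggested by its bracket annotation (18-nt arms, 4-nt loop) and counts Watson-Crick pairs, G-U wobble pairs and mismatches in the transcript. The function name is illustrative, and this is plain pairing arithmetic, not a folding-energy calculation.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_pairs(dna: str, stem_len: int, loop_len: int) -> tuple[int, int, int]:
    """Count (Watson-Crick, wobble, mismatch) pairs in a hairpin at the start of dna."""
    rna = dna.upper().replace("T", "U")
    five_arm = rna[:stem_len]
    three_arm = rna[stem_len + loop_len: 2 * stem_len + loop_len]
    if len(three_arm) != stem_len:
        raise ValueError("sequence too short for the requested stem and loop")
    wc = wobble = mismatch = 0
    # The first base of the 5' arm pairs with the last base of the 3' arm, and so on.
    for a, b in zip(five_arm, reversed(three_arm)):
        if (a, b) in WATSON_CRICK:
            wc += 1
        elif (a, b) in WOBBLE:
            wobble += 1
        else:
            mismatch += 1
    return wc, wobble, mismatch


if __name__ == "__main__":
    rrnd = "AAAACAAAAGGCTCAGTCGGAAGACTGGGCCTTTTGTTTT"
    print(stem_pairs(rrnd, stem_len=18, loop_len=4))
```

For rrnD the arms pair along their whole length, with a single G-U wobble pair.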

The terminator prevents the genes located downstream of the target one, and in the same orientation, from being co-transcribed from the promoter

On some plasmids, for example, the β-lactamase encoding gene (BLA) can be transcribed from the T7 promoter

The degradation rate of ampicillin increases

Plasmidless (S) cells can overgrow

If there are other promoters, even far from the target ORF but oriented in the same direction, it may be expedient to put a strong terminator also UPSTREAM of the expression cassette, so as to prevent undue transcripts involving the gene of interest

WHERE IS IT BETTER TO DIRECT THE RECOMBINANT PRODUCT IN THE CELL?

CYTOPLASM
Advantages: inclusion bodies (easy purification, protection from proteases, inactive and therefore non-toxic proteins); higher yield; simpler plasmid design
Disadvantages: inclusion bodies (protein folding, denaturation/refolding); N-terminal extension

PERIPLASM
Advantages: simple purification; low proteolysis level; improved folding; N-terminus authenticity
Disadvantages: the signal does not always facilitate the export; inclusion bodies may form

OUTSIDE THE CELL
Advantages: the least extensive proteolysis; simple purification; improved folding; N-terminal authenticity
Disadvantages: usually no secretion; cell lysis

CYTOPLASM

The E. coli cytoplasm is a very crowded environment

The macromolecule concentration can reach 300–400 mg/ml

On average, a ribosome releases one protein chain every 35 seconds

In such conditions the first challenge for a protein is to fold correctly

Small (<100 residues) single-domain host proteins efficiently reach a native conformation owing to their fast folding kinetics

Large multidomain and overexpressed recombinant proteins often require the assistance of folding modulators

IN VIVO THE PHYSIOLOGICAL PROTEIN FOLDING IS ASSISTED BY CHAPERONES

The nascent polypeptides first interact with the ribosome-bound Trigger Factor (TF)

TF transiently associates with the ribosome to form a protected folding space, where nascent polypeptides are shielded from both proteases and aggregation

Once released from TF, the peptides can either:

fold spontaneously (roughly 70% of cytosolic proteins under normal growth conditions)

or require further folding assistance by downstream chaperones:

KJE system (5-18%): DnaK (Hsp70) with its co-chaperone DnaJ and the nucleotide exchange factor GrpE

ELS system (10-15%): GroEL (Hsp60) and its co-chaperone GroES

DnaK & DnaJ stabilize the nascent chain in a folding-competent state already during translation, by binding hydrophobic segments that are exposed in the extended chain but will later be buried within the folded protein

The substrate proteins ejected from DnaK can:

fold into a proper conformation

be recaptured by DnaK/DnaJ for additional cycles of binding/release

be transferred in a semi-folded form to GroEL/GroES

Proteins requiring folding assistance can enter GroEL

GroEL is made of two rings, and the protein binds inside the open one

The GroEL ring binds ATP and rapidly recruits GroES, which caps the cavity

The open ring becomes an enlarged folding chamber with a hydrophilic lining, where the substrate folding (~10 s) is timed by ATP hydrolysis

In the meantime, GroES, the bound ADP and the folded peptide from the previous cycle are ejected from the opposite ring

Once folded, the peptide will be released when ATP binds again and a new peptide enters on the opposite side

If the ejected substrate still exhibits significant surface hydrophobicity, it will be recaptured, entering a new cycle

To facilitate the correct folding of recombinant proteins, some vectors include the DnaK/DnaJ or GroEL/GroES encoding genes

The chance to succeed, however, critically depends on the structure of the heterologous protein

As the cytoplasm is a REDUCING compartment, proteins that depend upon disulphide bond formation are easily misfolded

In such cases it is often more expedient to manipulate the host strain rather than rely on plasmid features

OR

Direct the peptide to the PERIPLASM

In eukaryotic cells, disulphide bonds are preferentially formed in the endoplasmic reticulum (ER)

The BACTERIAL PERIPLASM is an oxidizing compartment

It can therefore act as a surrogate for the ER, although translocating nascent polypeptides through the inner membrane introduces a delicate step

It hosts enzymes catalyzing both disulphide bond formation and isomerization, as well as specific chaperones and foldases

The number of available gates to the periplasm is limited, so that metastable precursors may accumulate in the cytoplasm

The fusion to suitable leader peptides allows the unfolded precursors to be translocated into the periplasm by either the Sec (post-translational) or the SRP (co-translational) system

SEC-dependent translocation (General Secretory Pathway, GSP)

The SecB chaperone binds the pre-protein and brings it to the general secretion apparatus

The Sec(YEG) translocases form a transmembrane channel

The energy is provided by the ATPase SecA

The co-translational SRP system utilizes the same channel as the GSP but, instead of SecB, a small ribonucleoprotein, the Signal Recognition Particle (SRP, containing Ffh), acts as the chaperone

SRP binds to the signal peptide of the nascent polypeptide, forming an SRP–ribosome nascent chain complex

It then binds to its own receptor (FtsY) on the cytoplasmic side of the inner membrane, pulling the complex to the membrane

The SRP pathway is an alternative to the post-translational “SEC” secretion, and it is required to avoid premature folding of the proteins in the cytoplasm

Another translocation system is TAT (Twin Arginine Translocase)

The Tat apparatus is energised exclusively by the transmembrane proton electrochemical gradient (Δp)

It is made of three essential transmembrane proteins (TatA, B, C) and two accessory ones (D, E)

The signal peptides recognized by TAT are characterized by a pair of arginine residues in the N-terminal region

Unlike Sec, TAT translocates only correctly folded peptides

It is about 25 AA long (17-18 25-27)

The typical signal sequence of Gram-negative (didermal) bacteria

positive hydrophobic polar Mature Peptide

THE LENGTH IS CRITICAL, AND SO IS THE STRUCTURE

At least one positively charged residue (Lys, Arg) within the first 7

A central hydrophobic region (Leu, Val, Ile)

A Ser- or Ala-rich C-terminus

Frequently Pro at (-6)

Typically an Ala at (-3) and (-1) = cut site for the signal peptidase (Ala-X-Ala); less frequently Ala is replaced by Ser or Gly
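The features listed above can be turned into a quick sanity check for a candidate leader. This is only a minimal sketch, not a predictor: the function name `check_sec_leader` is hypothetical, Ala is counted together with Leu/Val/Ile in the hydrophobic core as an assumption, and the leader is assumed to end exactly at the cleavage site.

```python
def check_sec_leader(leader: str) -> dict:
    """Report simple Sec-leader features for a signal peptide (one-letter code).

    The cleavage site is assumed to lie right after the last residue.
    """
    leader = leader.upper()

    # N-region: at least one Lys/Arg within the first 7 residues
    positive_n_region = any(aa in "KR" for aa in leader[:7])

    # H-region: longest uninterrupted run of hydrophobic residues
    # (L, V, I come from the slide; A is added here as an assumption)
    longest = run = 0
    for aa in leader:
        run = run + 1 if aa in "LVIA" else 0
        longest = max(longest, run)

    # C-region: Ala (less often Ser or Gly) at positions -3 and -1
    small = "ASG"
    axa_site = len(leader) >= 3 and leader[-3] in small and leader[-1] in small

    return {
        "positive_n_region": positive_n_region,
        "hydrophobic_core": longest,
        "axa_site": axa_site,
    }


# Illustrative 19-residue leader (one-letter code)
report = check_sec_leader("MRTLTTLGLALLLAQPAVA")
print(report)
```

A leader that fails one of these checks is not necessarily non-functional; the check only flags departures from the typical pattern.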

P S S A H A L V L F L A L L Y M K K F

cotranslational translocation by SRP needs the presence of highly hydrophobic leader sequences

N-region H-region C-region

MNNNDLFQASRRRFLAQLGGLTVAGMLGPSLLTPRRATAAQA

S/T-R-R-X-F-L-K

To be targeted to the Tat pathway, N-terminal signal peptides must harbour at least two consecutive arginine residues within an S/T–R–R–x–F–L–K consensus

The H-region of Tat signal peptides is less hydrophobic than that of Sec-specific signal peptides, owing to the presence of more glycine and threonine residues

Tat signal sequences are longer than their Sec counterparts (38 amino acids, on average), mainly due to the extended N-region
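The twin-arginine consensus lends itself to a simple pattern search. The sketch below uses Python's standard `re` module; the function name `find_tat_motif` is hypothetical, and the pattern is deliberately relaxed (the final Lys is not required), which is an assumption made because the example signal peptide shown above carries Ala at that position.

```python
import re

# Relaxed form of the S/T-R-R-x-F-L-K consensus: the last Lys is not
# required here (assumption), since the example peptide has Ala instead.
TAT_MOTIF = re.compile(r"[ST]RR.FL")


def find_tat_motif(signal_peptide: str):
    """Return the 0-based start of the first twin-arginine motif, or None."""
    match = TAT_MOTIF.search(signal_peptide.upper())
    return match.start() if match else None


tat_like = "MNNNDLFQASRRRFLAQLGGLTVAGMLGPSLLTPRRATAAQA"
sec_like = "MRTLTTLGLALLLAQPAVAAQA"

print(find_tat_motif(tat_like))  # start index of the S-R-R-x-F-L stretch
print(find_tat_motif(sec_like))  # no twin-arginine motif expected
```

A match only indicates that the motif is present; whether the peptide is actually routed to Tat still has to be verified experimentally.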

Some of the most popular:

PelB (pectate lyase), from PEC

ST-II (E. coli thermostable toxin), from ETEC

DsbA (oxidoreductase), from K12

OmpT (membrane protease), from K12

PhoA (alkaline phosphatase), from K12

A Sec signal sequence is present in almost all the expression vectors

Any ambiguity in the leader peptide should be avoided

This signal peptide, from the chromosomal β-lactamase B3 of Pseudomonas otitidis, showed two possible cutting points:

MRTLTTLGLALLLAQPAVA | AQAVLPQLQPYTAPAAWLTPVAPLRIADN

MRTLTTLGLALLLAQPAVAAQA | VLPQLQPYTAPAAWLTPVAPLRIADN

Some online services analyze putative signal sequences and try to predict their efficiency and the most probable cutting site, e.g. http://www.cbs.dtu.dk/services/SignalP/
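The ambiguity of the P. otitidis leader can be reproduced with a very crude scan for Ala at positions -3 and -1. This is a sketch of the Ala-X-Ala rule only, not of what prediction servers actually compute; the function name `axa_cleavage_candidates` and the 15-30 residue window for the leader length are assumptions.

```python
def axa_cleavage_candidates(precursor: str, min_len: int = 15, max_len: int = 30):
    """List leader lengths whose last three residues fit Ala-X-Ala.

    A returned value n means: cut between residue n and residue n + 1.
    The length window (min_len..max_len) is an illustrative assumption.
    """
    precursor = precursor.upper()
    sites = []
    for cut in range(min_len, min(max_len, len(precursor)) + 1):
        # positions -3 and -1 relative to the putative cut
        if precursor[cut - 3] == "A" and precursor[cut - 1] == "A":
            sites.append(cut)
    return sites


precursor = "MRTLTTLGLALLLAQPAVAAQAVLPQLQPYTAPAAWLTPVAPLRIADN"
for cut in axa_cleavage_candidates(precursor):
    print(precursor[:cut], "|", precursor[cut:])
```

Two candidates come out of the scan, after AVA and after AQA, which is exactly the kind of ambiguity to avoid when choosing or designing a leader.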

Artificial leader peptides can be designed

MIA-1/MIA-2: the same peptide encoded by different nucleotide sequences (same CAI value), designed to co-express two different peptides from the same vector while avoiding homologous recombination

MIAmax: the best one..

MIAperC: introduces a cloning NcoI site (CCATGG) without altering the AMA motif for the signal peptidase II
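The idea behind two leaders with the same peptide but different nucleotide sequences can be illustrated with synonymous codons. The sequences below are hypothetical four-codon examples, not the MIA leaders, and the codon table is intentionally partial; the CAI is not computed here.

```python
# Partial codon table: only the codons used in this example (assumption)
CODONS = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "CTG": "L", "TTA": "L",
    "GCG": "A", "GCC": "A",
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon using CODONS."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def nucleotide_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


leader_1 = "ATGAAACTGGCG"  # hypothetical: M-K-L-A
leader_2 = "ATGAAGTTAGCC"  # hypothetical: same peptide, synonymous codons

print(translate(leader_1), translate(leader_2))
print(round(nucleotide_identity(leader_1, leader_2), 2))
```

The lower the nucleotide identity between the two copies, the fewer stretches of homology are available for recombination, while the encoded leader stays the same.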

Even when fused to a suitable signal sequence, some large cytoplasmic proteins, or some mutants obtained by a combinatorial approach, may fail to be translocated

The bias could arise from:

Secondary/tertiary structure

Chaperone recognition

The problem can be tackled by:

Overexpressing the DsbC disulphide isomerase

Overexpressing the broad-range chaperone Skp/OmpH

Testing different leader/chaperone combinations

To avoid possible congestion of the translocation systems, it is possible to:

Decrease the expression rate

Increase some limiting components of the translocation system, e.g. introduce plasmid copies of the prlA4 and secE genes, encoding the main transport proteins

Once in the periplasm, folding is enzymatically catalyzed by:

the Dsb oxidases/isomerases

chaperones such as Skp, DegP and FkpA

peptidyl-prolyl isomerases such as SurA, PpiA and PpiB

The Dsb protein system (DsbA, B, C, D, G) mediates disulphide bond formation and rearrangement

With the exception of DsbB, these proteins belong to the thioredoxin protein superfamily

The formation of disulphide bonds in a protein is made possible by two related pathways: an oxidative pathway, which is responsible for the formation of the disulphides, and an isomerization pathway, which shuffles incorrectly formed disulphides

THE OXIDATIVE PATHWAY

Disulphides are introduced into the substrate proteins through exchange with the periplasmic protein DsbA, which is, in turn, reoxidized by the inner membrane protein DsbB

THE ISOMERASE PATHWAY

Substrates with more than two cysteines may form incorrect disulphides, causing them to misfold

Non-native disulphides are corrected by DsbC and DsbG, which are maintained in their active reduced state by the inner membrane protein DsbD
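Why more than two cysteines are a problem can be seen by counting the possible pairings. The sketch below assumes, for simplicity, that every cysteine ends up in a disulphide and that any two cysteines can pair; the function name `disulphide_pairings` is hypothetical.

```python
def disulphide_pairings(n_cys: int) -> int:
    """Number of ways to pair all cysteines into disulphides.

    Simplifying assumptions: n_cys is even, every Cys is bonded,
    and any Cys can pair with any other. The result is (n_cys - 1)!!
    """
    if n_cys < 0 or n_cys % 2:
        raise ValueError("a non-negative, even number of cysteines is assumed")
    count = 1
    for k in range(n_cys - 1, 0, -2):
        count *= k
    return count


for n in (2, 4, 6, 8):
    print(n, "Cys ->", disulphide_pairings(n), "possible pairings")
```

With two cysteines there is a single option, while with four there are already three and only one of them is native, which is where the isomerase pathway becomes necessary.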

To facilitate the purification of heterologous proteins, they can be fused to other proteins (fusion partners) or to short amino acid stretches (peptide TAGS)

The MCS allows the TAG to be fused in frame with the coding sequence of the recombinant peptide

MCS TAG

5’ FUSION

MCS TAG

3’ FUSION

Once expressed, the recombinant protein has to be purified

Vectors are available that allow the tag to be positioned at the N- or the C-terminal end

Whenever the tertiary structure of the peptide is available, the TAG is placed at the solvent-accessible end

If a signal peptide for secretion has to be added, the TAG is placed at the C-terminal end

TAG DRAWBACKS

Need for expensive proteases

The cutting efficiency never reaches 100%, which limits the yield

Frequent need for further treatment (e.g. the formation and isomerization of disulphide bridges) to obtain an active product

Solubilization cannot be predicted in advance: it can only be determined experimentally

POPULAR PLASMID VECTORS FOR E. COLI

pBR322 (4361 bp)

Map: TetR, AmpR, pMB1 ori, rop, bom/nic

Designed in 1977 by Bolivar and Rodriguez, pBR322 was one of the first artificial plasmid vectors

ori: pMB1 (belongs to the ColE1 family of plasmids); selection/screening: blaTEM (Tn3) and tet (pSC101)

The mob gene is absent, but the nic/bom sites could allow its mobilization by a helper plasmid

This plasmid has a low copy number (~20 copies per cell) due to the action of the Rop protein and of RNAI and RNAII

Plasmid replication is initiated by RNAII, which hybridizes strongly to the plasmid DNA

The formation of this hybrid at the origin is critical for plasmid replication

RNase H digests RNAII, yielding a 550 nt molecule that acts as a primer for DNA polymerase I and initiates the replication of the entire plasmid

RNAII → RNase H → primer → replication

RNAI is a non-coding RNA that acts as an antisense repressor of plasmid replication in ColE1 plasmids

RNAI concentration increases together with the copy number

RNAI anneals to RNAII by complementary base pairing and blocks RNase H access, thus preventing RNAII from acting as a primer

This results in a negative feedback loop for replication, which sets the average number of plasmids per cell

The plasmid also encodes the Rop (repressor of primer) protein

Rop further enhances and stabilizes the interaction between RNAI and RNAII

At a low plasmid concentration, RNAII is transcribed from the plasmid: the plasmid is replicated and the copy number increases

But when the plasmid concentration exceeds a certain threshold, RNAI accumulates and hybridizes with RNAII: replication stops and the plasmid copy number reaches a steady state

The RNAI-Rop regulation prevents the copy number from increasing further
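The negative feedback described above can be pictured with a toy model in which replication per plasmid is inhibited as the copy number (and therefore RNAI) rises, while cell growth dilutes the plasmids. Everything here is an assumption made for illustration: the equation dn/dt = n * (k_rep / (1 + n / K) - mu), the parameter names and their values are not measured quantities.

```python
def steady_copy_number(k_rep=2.0, mu=1.0, K=20.0, n0=1.0, dt=0.01, steps=5000):
    """Integrate a toy copy-number model with the explicit Euler method.

    k_rep : maximal replication rate per plasmid (hypothetical units)
    mu    : dilution rate due to cell growth (hypothetical units)
    K     : copy number at which inhibition halves the replication rate;
            a weaker RNAI/RNAII interaction corresponds to a larger K
    """
    n = n0
    for _ in range(steps):
        replication = k_rep / (1.0 + n / K)  # inhibition grows with n
        n += dt * n * (replication - mu)     # replication minus dilution
    return n


# In this model the steady state is K * (k_rep / mu - 1)
print(round(steady_copy_number(), 2))        # baseline inhibition
print(round(steady_copy_number(K=60.0), 2))  # weaker inhibition, higher copy number
```

In this picture, anything that weakens the inhibition shifts the plateau upwards without removing it.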

pUC18/19

Small plasmids with a very high copy number, up to 500-700/cell

The pMB1 ori and blaTEM selection are derived from pBR322

The deletion of the rop gene, coupled with a point mutation that reduces the formation of the RNAI/RNAII duplex, led to the higher copy number of pUC plasmids (derivatives of pMB1 plasmids)

Most probably the point mutation hampers the interaction between RNAI and RNAII by producing a temperature-dependent alteration of the RNAII conformation

Plac - MCS - 5’-lacZ

The screening marker is lacZ; lacI is not on the vector, so basal activity is high and can be limited in lacIq strains

The absence of both mob and bom/nic ensures that pUC18/19 cannot be mobilized by helper plasmids

pUC18 and pUC19 differ only in the MCS orientation

pBluescript II SK/KS (+)/(-)

ori: ColE1; selection: blaTEM; screening: lacZ

They include phage elements: PHAGEMIDS

T3 and T7 promoters flank the MCS, allowing in vitro transcription by phage polymerases

MCS = KpnI → SacI (KS) or SacI → KpnI (SK), within the β-galactosidase α-peptide

Filamentous phage f1 origin, (+) or (-): it allows the phagemid to be rescued as sense or antisense single-stranded (ss) DNA by a helper phage

The bacterial cells are infected with an M13 mutant phage (M13K07)

M13K07 copies the ssDNA according to the (+) or (-) orientation

Once packed in the viral particles, the ssDNA can be isolated and used as a probe or for site-specific mutagenesis

M13 biological cycle

Infectious form: (+) ssDNA

Host enzymes convert it into the replicative form (dsDNA)

Bidirectional replication

pII inserts a nick in the (+) strand and rolling circle replication starts

Once completed, the (+) strand is cut by pII, released and circularized

M13K07 + phagemid

M13K07 is an M13 phage mutated in pII; the original replicon of the phage has been modified by several lacZ insertions

The mutant pII recognizes the engineered origin only poorly, so the phagemid is preferentially replicated and packed into the phage

The pET series

Based on pBR322

The series includes several vectors: expression vectors and transcription vectors

Map: T7 promoter - SD - S10 (gene «10», derived from ΦT7) - MCS - T7-T

The cloned gene is fused to the N-terminal end of S10; T7 is induced according to the features of the host strain

Over time, other optional features were added to this popular series: different selection markers, TAGs such as histidine stretches and/or phage elements

The lowercase letter (a, b, c or d) denotes the reading frame of S10 relative to the BamHI cloning site, placed downstream of the signal sequence

a) GGT CGC GGA TCC

b) GGT CGG GAT CCG

c) GGT CGG ATC CGG

d) The same frame as “c”, but with NcoI (CCATGG) instead of NdeI (CATATG) upstream of the signal sequence
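The effect of the a/b/c suffix can be checked by locating the BamHI site (GGATCC) in each of the three sequences and looking at its offset within the codon grid. This is a minimal sketch; the function name `bamhi_frame` is hypothetical and the sequences are the ones listed above, with spaces marking the codons.

```python
BAMHI = "GGATCC"

# Sequences as listed for the a, b and c vectors (spaces separate codons)
junctions = {
    "a": "GGT CGC GGA TCC",
    "b": "GGT CGG GAT CCG",
    "c": "GGT CGG ATC CGG",
}


def bamhi_frame(spaced_seq: str):
    """Return (position, position % 3) of the first BamHI site."""
    dna = spaced_seq.replace(" ", "").upper()
    pos = dna.find(BAMHI)
    if pos < 0:
        raise ValueError("no BamHI site found")
    return pos, pos % 3


frames = {name: bamhi_frame(seq) for name, seq in junctions.items()}
for name, (pos, offset) in frames.items():
    print(f"pET-{name}: BamHI starts at nt {pos}, codon offset {offset}")
```

Each vector places the site at a different offset relative to the S10 codons, so an insert cloned in BamHI can be brought in frame by choosing the appropriate suffix.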

Selection: amp, except in the «9» series, where it is kan

11: T7/lac + lacI are on the plasmid

12: downstream of S10 there is the OmpT signal sequence, suitable for secretion

5: no terminator has been inserted

Recommended