Molecular Basis of Bacterial Formaldehyde Sensing
This thesis is submitted to the University of Manchester for the degree of PhD in the Faculty of Life Sciences
2012
JAMES ROSS LAW
Declaration
Abstract
Copyright statement
Acknowledgements
Abbreviations
1 Introduction
  1.1 Transcription in bacteria
  1.2 Transcription Regulation
    1.2.1 Transcription Factors
    1.2.2 Other mechanisms of transcription regulation
  1.3 Formaldehyde – Toxicity, origins, and detoxification mechanisms
  1.4 Regulation of Formaldehyde detoxification in bacteria
  1.5 Overall Aims and Objectives
2 Materials and Methods
  2.1 Materials
    2.1.1 Chemicals and Reagents
    2.1.2 Enzymes and other proteins
    2.1.3 Oligonucleotides
    2.1.4 Bacterial strains
    2.1.5 Plasmid Vectors
    2.1.6 Growth Media
  2.2 Molecular Biology Methods
    2.2.1 Isolation of E. coli genomic DNA
    2.2.2 Isolation of Bacillus subtilis DNA
    2.2.3 Isolation of the hxlR2 gene and its promoter region from Bacillus cereus AH818
    2.2.4 Polymerase Chain Reaction
    2.2.5 DNA purification
    2.2.6 Restriction endonuclease digestions
    2.2.7 Gel extraction of DNA
    2.2.8 Agarose Gel Electrophoresis
    2.2.9 Ligation cloning
    2.2.10 “Non-ligation dependent” cloning
    2.2.11 Preparation of competent cells
    2.2.12 Transformation of E. coli with plasmids
    2.2.13 Plasmid Extraction
    2.2.14 Protein Expression Trials
    2.2.15 SDS-PAGE Analysis
    2.2.16 Site-Directed Mutagenesis
    2.2.17 Deletion of the KanR cassette from E. coli ∆frmR
    2.2.18 Lysogenisation of E. coli ∆frmR∆KanR
  2.3 Protein Production and Purification
    2.3.1 Large Scale Growth for protein production
    2.3.2 Cell Lysis and extraction
    2.3.3 Nickel Affinity Purification
    2.3.4 Purification of FrmR and FrmRC36S
    2.3.5 Protein Concentration Estimation
  2.4 In vitro biochemical and biophysical characterisation methods
    2.4.1 Mass-Spectrometry
    2.4.2 Multi-Angle Light Scattering
    2.4.3 Circular Dichroism (CD)
    2.4.4 Electrophoretic Mobility Shift Assays (EMSAs)
    2.4.5 Fluorescence Spectroscopy
    2.4.6 In vivo experiments using the pGFPR plasmid
    2.4.7 In vivo experiments using the pKanRR plasmid
  2.5 Bioinformatic analysis
    2.5.1 General Use of Databases
    2.5.2 BLAST searches
    2.5.3 Sequence alignments
    2.5.4 Secondary structure prediction
    2.5.5 DNA binding residue prediction
  2.6 X-Ray Crystallography
    2.6.1 Background
    2.6.2 X-Ray Crystallisation Trials
    2.6.3 Data Collection
    2.6.4 Data Processing
    2.6.5 Molecular Replacement
    2.6.6 Model building, Refinement and validation
    2.6.7 Analysis of the dimer interface
3 Cloning, Purification and Biophysical Characterisation of Bacterial Transcription Factors Implicated in Formaldehyde Sensing
  3.1 Introduction
  3.2 Aims and Objectives
  3.3 Phylogenetic distribution of known TFs of FDP
    3.3.1 Distribution of the two component systems from Paracoccus denitrificans and Rhodobacter sphaeroides
    3.3.2 Phylogenetic distribution of HxlR and HxlR-pCER270
    3.3.3 Phylogenetic distribution of AdhR
    3.3.4 Phylogenetic distribution of FrmR
    3.3.5 Summary
  3.4 Molecular cloning
    3.4.1 Molecular cloning of the frmR gene from E. coli
    3.4.2 Molecular cloning of the hxlR1 gene from Bacillus subtilis
    3.4.3 Molecular cloning of the hxlR2 gene from Bacillus cereus AH818
  3.5 Protein Expression Trials
    3.5.1 Expression trials using pET24b-frmR-His, pET15b-His-frmR and pET15b-frmR
    3.5.2 hxlR Expression using pET24b-hxlR1-His
    3.5.3 Expression of HxlR2-His from pET24b-hxlR2-His
  3.6 Protein Purification
    3.6.1 Purification of FrmR-His
    3.6.2 Purification of FrmR
    3.6.3 Purification of HxlR1-His
    3.6.4 Purification of HxlR2-His
  3.7 Protein Size Determination Using Mass Spectrometry
    3.7.1 Mass spectrometry of FrmR-His
    3.7.2 Mass spectrometry of FrmR
    3.7.3 Mass spectrometry of HxlR1-His
    3.7.4 Mass spectrometry of HxlR2-His
  3.8 Protein Size Determination Using Multi-Angle Light Scattering (MALS)
    3.8.1 MALS analysis of FrmR-His
    3.8.2 MALS and Size Exclusion Chromatography analysis of FrmR
    3.8.3 MALS analysis of HxlR1
    3.8.4 MALS analysis of HxlR2-His
  3.9 Secondary structure determination
    3.9.1 Secondary Structure prediction of FrmR-His
    3.9.2 Secondary structure prediction of FrmR
    3.9.3 Secondary Structure prediction of HxlR1-His
  3.10 Summary and Discussion
4 Crystal Structure Determination of FrmR and HxlR
  4.1 Introduction
  4.2 Aims and Objectives
  4.3 Crystallization
    4.3.1 Crystallization of FrmR-His and FrmR
    4.3.2 Crystallization of FrmRC36S
    4.3.3 Crystallisation of HxlR1-His
    4.3.4 Crystallisation of HxlR2-His
  4.4 Diffraction Data Collection
    4.4.1 Data Collection on FrmRC36S crystals
    4.4.2 Data Collection of HxlR2-His crystals
  4.5 Data Processing
    4.5.1 FrmRC36S
    4.5.2 HxlR2-His
  4.6 Phase determination by Molecular Replacement (MR)
    4.6.1 Molecular replacement for FrmRC36S
    4.6.2 Molecular replacement of HxlR2-His
  4.7 Model building and refinement
    4.7.1 Model improvement and refinement of FrmRC36S
    4.7.2 Model improvement and refinement of HxlR2-His
  4.8 Validation of model structures
    4.8.1 Crystal structure of HxlR2-His
  4.9 Comparison of both HxlR2-His structures
  4.10 Comparison with other structures
  4.11 A comparison between chain A and chain B in HxlR2-His
  4.12 Secondary structure and domain organisation
  4.13 B-factor analysis of HxlR2-His
  4.14 Analysis of the HxlR2-His dimer interface
  4.15 Analysis of the DNA-binding domain
  4.16 Discussion of formaldehyde sensing by HxlR2
  4.17 Discussion
5 In vitro and in vivo functional characterisation of FrmR and HxlR
  5.1 Introduction
  5.2 Aims and objectives
  5.3 In vitro analysis of the FrmR:frmRAB promoter interaction
    5.3.1 A Non-Competitive Electrophoretic Mobility Shift Assay (EMSA) reveals that FrmR-His does not bind the frmRAB operator
    5.3.2 A Non-Competitive EMSA shows that FrmR binds to the frmRAB operator
    5.3.3 The effect of formaldehyde on formation of the FrmR:frmRAB promoter complex
    5.3.4 Analysis of the specificity of FrmR:frmRAB promoter interaction and its dependence on formaldehyde using EMSA
  5.4 Construction of an in vivo FrmR-reporter system
    5.4.1 Construction of the frmRAB-KanR and the frmRAB-GFP inserts
    5.4.2 Construction of an E. coli ∆frmR strain
    5.4.3 Construction of the E. coli ∆frmR (DE3) strain
  5.5 In vivo studies of FrmR function
    5.5.1 Initial characterisation of the pGFPR reporter system
    5.5.2 Initial characterisation of the pKanRR reporter system
  5.6 In vivo analysis of the properties of selected FrmR mutants
    5.6.1 Prediction of the FrmR DNA-binding residues
    5.6.2 Experimental analysis of putative FrmR DNA-binding mutants
    5.6.3 Summary of FrmR alanine mutants
  5.7 Probing the FrmR formaldehyde sensing mechanism
  5.8 In vitro analysis of FrmRC36S
    5.8.1 EMSA experiments with FrmRC36S and the frmRAB promoter
  5.9 Analysis of the DNA binding properties of HxlR2-His
  5.10 Assessing the effect of formaldehyde on HxlR1
    5.10.1 Fluorescence Spectroscopy
  5.11 Discussion
6 Discussion, Conclusions and Future work
Appendix
  A1: Cloning strategies
    A1.1 Cloning strategy for the construction of pET15b-His-frmR, pET24b-frmR-His and pET15b-frmR
    A1.2 Cloning Strategy for the construction of pET24b-hxlR1-His
    A1.3 Cloning Strategy for the construction of pET24b-hxlR2-His
    A1.4 Cloning Strategy for the construction of the pKanRR and pGFPR reporter system
    A1.5 Cloning Strategy for the construction of the E. coli K12∆frmR∆KanR (DE3) strain
References
Declaration
No portion of the work referred to in the thesis has been submitted in support of an application
for another degree or qualification of this or any other university or other institute of learning.
Abstract
Formaldehyde is a highly toxic molecule; despite this, it is produced in the cells of all living
organisms as a by-product of metabolic pathways. Consequently, several pathways have
evolved across living organisms to detoxify cellular formaldehyde. These pathways need to be
regulated within the cell, and this study sets out to determine how they are regulated
in particular bacteria. Several approaches are taken to achieve this. Known or predicted
transcription factors that regulate formaldehyde detoxification pathways in particular
organisms are considered. These proteins are FrmR (E. coli), HxlR1 (Bacillus subtilis), and
HxlR2 (Bacillus cereus).
The genes encoding these transcription factors are cloned, and the proteins are purified, using
molecular biology techniques. The proteins are then subjected to biophysical characterisation,
i.e. determination of their size and secondary structure composition. Additionally, the X-ray
crystal structure of HxlR2 is determined and significant progress is made towards determining
the structure of FrmR. Interactions of these transcription factors with their target DNA
sequences are studied, along with the effect that formaldehyde has on these interactions.
A reporter system is constructed that enables the behaviour of FrmR to be studied in vivo.
Residues that are likely to play important roles in DNA recognition by this regulator are
identified. Additionally, this reporter system identifies a residue that is essential for
formaldehyde sensing by this protein. Overall, some significant insights into how these
transcription factors carry out their biological function are established.
Copyright statement
i. The author of this thesis (including any appendices and/or schedules to this thesis) owns
certain copyright or related rights in it (the “Copyright”) and s/he has given The University of
Manchester certain rights to use such Copyright, including for administrative purposes.
ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be
made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and
regulations issued under it or, where appropriate, in accordance with licensing agreements
which the University has from time to time. This page must form part of any such copies made.
iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual
property (the “Intellectual Property”) and any reproductions of copyright works in the thesis,
for example graphs and tables (“Reproductions”), which may be described in this thesis, may
not be owned by the author and may be owned by third parties. Such Intellectual Property and
Reproductions cannot and must not be made available for use without the prior written
permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.
Acknowledgements
I would like to thank the BBSRC for funding this project. I would also like to thank Professor
David Leys for giving me the opportunity to undertake this research and for his continued help
and advice throughout. Thank you to members of the research group who have offered their
help during my time here. In particular, thanks to Dr Mark Dunstan for the valued time and help
offered to me; this was much appreciated.
I would like to thank all my chums from the enzymology group for making this a very enjoyable
time. Thanks to my parents, Bobby and Norman Law, for their continued input... of pounds into
my bank account! I could not have done it without you. Thank you to Emma Cartwright for her
sustained efforts to provide me with other things to stress about. This project may have ruined
me otherwise! I would like to thank anyone who is a friend of mine.
Abbreviations
AdhR             HTH-type transcriptional regulator AdhR
APS              Ammonium Persulphate
ASA              Accessible Surface Area
ATP              Adenosine triphosphate
AU               Asymmetric Unit
BLAST            Basic Local Alignment Search Tool
bp               Base Pair
C1-FFL           Coherent type 1 feed forward loop
CbnR             LysR-type regulatory protein CbnR
CD               Circular Dichroism
ChIP             Chromatin immunoprecipitation
CsoR             Copper-sensing transcriptional repressor CsoR
CueR             HTH-type transcriptional regulator CueR
DMGO             Dimethylglycine oxidase
DMSO             Dimethyl Sulphoxide
DNA              Deoxyribonucleic Acid
DOR              Dense overlapping regulon
DksA             DnaK suppressor protein DksA
E. coli          Escherichia coli
EC               Elongation complex
EDTA             Ethylenediaminetetraacetic acid
EMSA             Electrophoretic Mobility Shift Assay
FA               Formaldehyde
FAD              Flavin adenine dinucleotide
Fae              Formaldehyde activating enzyme
FDH              Formaldehyde dehydrogenase
FDP              Formaldehyde detoxification pathway
FFL              Feed forward loop
FGH              S-formylglutathione hydrolase
FIS              Factor for inversion stimulation
FLP              Flippase recombinase
FrmR             Transcriptional repressor FrmR
FRT              Flippase recognition target
GalR             HTH-type transcriptional regulator GalR
GFP              Green Fluorescent Protein
GSH              Glutathione
GSH-FDH          Glutathione dependent formaldehyde dehydrogenase
GSH-FDP          Glutathione dependent formaldehyde detoxification pathway
H4MPT            Tetrahydromethanopterin
HGT              Horizontal Gene Transfer
HK               Histidine Kinase
HMGSH            S-hydroxymethylglutathione
HPS              3-hexulose-6-phosphate synthase
HTH              Helix-turn-Helix
HU               Histone-like bacterial DNA-binding protein HU
HxlR             HTH-type transcriptional activator HxlR
HypR             Transcriptional regulator HypR
I1-FFL           Incoherent type-1 feed forward loop
IPTG             Isopropyl β-D-1-thiogalactopyranoside
ITC              Initial transcribing complex
KanR             Kanamycin resistance gene
LB               Luria-Bertani
LS               Light scattered
LTTR             LysR-type transcription regulator
MAD              Multi-wavelength anomalous dispersion
MALS             Multi-Angle Light Scattering
MIR              Multiple isomorphous replacement
MR               Molecular replacement
MSA              Multiple sequence alignment
MSH              Mycothiol
NAD/NADH         Nicotinamide adenine dinucleotide
NAP              Nucleoid Associated Protein
NAR              Negative auto regulation
NEB              New England Biolabs
NIG              National BioResource Project
OD               Optical Density
OMPDC            Orotidine 5’-monophosphate decarboxylase
OmpR             Transcriptional regulatory protein OmpR
PAR              Positive auto regulation
PCR              Polymerase chain reaction
PDB              Protein Data Bank
PEG              Polyethylene glycol
PHI              6-phospho-3-hexuloisomerase
Poly(I)•Poly(C)  Poly(deoxyinosinic-deoxycytidylic) acid sodium salt
ppb              Parts per billion
ppGpp/pppGpp     Guanosine tetraphosphate/Guanosine pentaphosphate
PSI-BLAST        Position-Specific Iterative BLAST
RCF              Relative centrifugal force
RcnR             Transcriptional repressor RcnR
RI               Refractive index signal
RMSD             Root mean square deviation
RNA              Ribonucleic Acid
RNAP             RNA polymerase
ROS              Reactive oxygen species
RR               Response regulator
RuMP             Ribulose monophosphate pathway
SAD              Single-wavelength anomalous dispersion
SDS              Sodium dodecyl sulfate
SDS-PAGE         SDS-Polyacrylamide gel electrophoresis
SEC              Size exclusion chromatography
SECC             SEC column
SIM              Single input module
SmtB             Transcriptional repressor SmtB
SOM              Self-organising Map
TEMED            Tetramethylethylenediamine
TF               Transcription Factor
TFBS             Transcription Factor Binding Site
THF              Tetrahydrofolate
TOF              Time of flight
TRN              Transcriptional regulatory network
UAS              Upstream activating sequence
UV               Ultraviolet
wHTH             Winged Helix-turn-Helix
WT               Wild type
YodB             HTH-type transcriptional regulator YodB
1 Introduction
This introduction first discusses transcription regulation in bacteria. This is followed by a
description of the biological context of formaldehyde detoxification. Then, the details that are
currently known with regard to the regulation of formaldehyde detoxification in bacteria are
discussed.
1.1 Transcription in bacteria
Transcription is the process that copies the genetic code from a DNA sequence into a
corresponding molecule of RNA. The resulting RNA molecule can subsequently be translated to
produce a corresponding protein. The concentration of these RNA transcripts within the cell
often governs how much of the corresponding protein will be produced. Control over the
transcription rate of particular genes therefore relates to a control of concentration of the
corresponding protein. This provides a means for organisms to control metabolic pathways by
regulating gene transcription via responses to particular stimuli.1 Although transcription in
bacteria and eukaryotes shares significant similarities, the two processes are also distinctly
different.2 Here we will only consider transcription in bacteria.
DNA transcription is initiated at a promoter site, which is a stretch of DNA in the chromosome
containing defined structural elements. For example, a standard promoter in E. coli (i.e.
transcribed by σ70; see below) contains two conserved sequences: 5’-TTGACA-3’ at
approximately 35 base pairs upstream of the transcription start site and 5’-TATAAT-3’ at
approximately 10 base pairs upstream of the transcription start site; these are known as the -35
region and -10 region respectively. In between the -10 and -35 regions is a spacer region of
DNA of 17±1 base pairs. Typically promoters will deviate slightly from these conserved
sequences; in the absence of external factors the amount of transcription from a promoter
(termed promoter strength) depends largely on how similar it is to the consensus. In addition to
the -10, -35 and spacer region a promoter may contain other structural elements such as an UP
element which is usually an A/T rich DNA sequence from approximately -40 to -65. Another
promoter element that is sometimes found is an ‘Upstream activating sequence’ (UAS) from
approximately -40 to -150. UASs can contain sequences that induce a degree of bending in the
promoter DNA as well as protein binding sites. A schematic of a standard E. coli promoter is
shown in Figure 1-1. Structural elements of the promoter are known as cis-regulatory elements,
whereas external factors that affect transcription but are not part of the promoter are termed
trans-regulatory elements. In bacteria, one promoter typically controls the transcription of
several adjacent genes; collectively known as an operon. Operons are thus transcribed in one
transcriptional unit and tend to contain genes that perform a related function.3,4,5
Figure 1-1 - The consensus promoter for standard housekeeping genes in E. coli. The -10 and -35
regions are separated by a 17±1 bp spacer. The A/T rich UP region is located at ~ -40 to -65 and
the UAS can be located at ~ -40 to -150.
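The relationship between promoter sequence and similarity to the consensus can be illustrated with a short calculation. The following Python sketch is not part of the original work; it simply counts matches to the two consensus hexamers quoted above while allowing a spacer of 17±1 bp. The function names and the simple match-count score are illustrative assumptions, and real promoter strength depends on many further factors.

```python
# Illustrative sketch only: score a candidate promoter against the
# consensus hexamers quoted in the text (names below are hypothetical).
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"
SPACER_LENGTHS = (16, 17, 18)  # 17 +/- 1 bp, as stated in the text


def hexamer_matches(hexamer, consensus):
    """Count positions at which a hexamer agrees with the consensus."""
    return sum(1 for a, b in zip(hexamer, consensus) if a == b)


def best_promoter_match(seq):
    """Return (score, start, spacer) for the best -35/-10 pairing in seq.

    score is the number of matching bases out of 12, start is the 0-based
    index of the -35 hexamer, and spacer is the spacer length used.
    Returns None if seq is too short to hold both hexamers.
    """
    seq = seq.upper()
    best = None
    for start in range(len(seq) - 6 + 1):
        box35 = seq[start:start + 6]
        for spacer in SPACER_LENGTHS:
            start10 = start + 6 + spacer
            box10 = seq[start10:start10 + 6]
            if len(box10) < 6:
                continue  # the -10 hexamer would run off the end
            score = (hexamer_matches(box35, CONSENSUS_35)
                     + hexamer_matches(box10, CONSENSUS_10))
            if best is None or score > best[0]:
                best = (score, start, spacer)
    return best
```

For a made-up sequence such as "GG" + "TTGACA" + 17 × "C" + "TATAAT" + "GG", the function reports a perfect score of 12 with a 17 bp spacer; changing a single base in either hexamer lowers the score by one.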
The process of transcription is catalysed by an enzyme called RNA polymerase (RNAP). RNA
polymerase is made up of four different subunits: α, β, β’ and a σ-factor. Two α subunits, a β
subunit, and a β’ subunit make up the ‘core enzyme’ which catalyses the reaction of adding
nucleotides onto a growing chain of RNA, using the genetic DNA as a template. When the
σ-factor subunit is attached to the core enzyme, the complex is known as the holoenzyme. The
holoenzyme is necessary to initiate transcription because the σ-factor is responsible for
promoter recognition and binds directly to the -10 and -35 regions. The UP element is believed
to further enhance promoter recognition by binding to the C-terminal domain of the α-subunit.
Organisms have different types of σ-factors that recognize different types of promoters. For
example, in E. coli the σ-factor σ70 recognises promoters for the standard ‘housekeeping’ genes
at exponential growth, whereas σ32 recognizes promoters during heat shock. The overall
structure of RNA polymerase is highly complex and is said to resemble a crab claw in that it has
two ‘pincers’ comprised of the β and β’ subunits. A channel exists between these two subunits
with the active site located at the base. The active site contains magnesium ions that are
essential for catalysis.6,7,8
Transcription involves three stages: chain initiation, chain elongation, and termination.
Initiation requires several things to happen; first, the holoenzyme has to recognize and bind to
the promoter elements. It is thought that the holoenzyme initially binds DNA non-specifically
and translocates along the DNA chain until it forms a so-called recognition complex at a
promoter. Once a recognition complex between holoenzyme and promoter is formed, the
complex is capable of initiating transcription.9, 10 At this point the DNA at the promoter and
transcription start site is still in the double helical form, and the holoenzyme-promoter complex
is known as a closed complex. The next step in initiation is an isomerisation from the closed
complex to an open complex which is facilitated by the σ subunit. The open complex contains
12±2 bp of DNA that has been separated or “melted” and occurs between +3 and -13 of the
transcription start site.11 Once the two DNA strands have melted, the template strand is
threaded into the active site channel of RNAP.12 Once created, the open complex is quite stable
and the next step of chain initiation can take place. The next step is transcription initiation in
which the first phosphodiester bond of the RNA chain is formed; the 5’ nucleotide is usually a
purine and normally adenine rather than guanine. RNAP binds two nucleotides which are
complementary to that of the DNA template at the transcription start site and forms a
phosphodiester bond between them. Only nucleotides that are complementary to the DNA
template can be added because the reaction is catalysed via base pairing. 13
Once the first phosphodiester bond is formed, RNAP continues joining successive nucleotides
using the genetic DNA as a template. Usually this process carries on until about 10 nucleotides
have been transcribed; these transcripts are then released from the enzyme and this procedure
is normally repeated many times. This is known as abortive initiation and at this point, the
holoenzyme is still intact and bound to the promoter elements; the complex at this stage is
called an “initial transcribing complex” (ITC). In order for the ITC to be converted into a stable
“elongating complex” (EC), RNAP needs to escape from the promoter. Eventually this will
happen in a process known as promoter clearance. Here, it is thought the σ-factor dissociates
from RNAP and the EC continues transcribing the template DNA.14
The transcription process is now at the elongation stage and the EC is stable and can continue
transcribing for many thousands of bps. The EC contains a heteroduplex of approximately 9bp
between the RNA strand and the template DNA. Transcription elongation is not continuous but
is characterized by pauses, which play an important role in transcription regulation. Several
factors can cause pausing such as the interaction of RNAP with secondary structures formed in
the RNA transcript. Regulator proteins called “transcription elongation factors” can also
influence pausing as well as particular DNA sequences that make pausing more likely.15
The last stage of transcription is chain termination which involves the stopping of RNA
synthesis, release of the RNA transcript and detachment of RNAP from the DNA strand. One
mechanism of termination that has been proposed is that RNAP is pushed forward in the 5’-3’
direction by an external force without the addition of nucleotides. The heteroduplex is therefore
shortened at both ends, which is thought to destabilize the EC leading to termination of
transcription.16 In E. coli this external force mostly comes from two sources: the first occurs at
particular “terminator sites” where there is a palindromic G/C rich region; when this region is
transcribed, the RNA has the capacity to form hairpin structures which induce the forward
translocation of the EC. The other source of external force comes from the helicase protein Rho.
Rho binds to RNA and translocates along it in the 5’-3’ direction using energy from ATP
hydrolysis. This process is thought to provide the force to push the EC forward.17,18 The
sequences and mechanism of how RNAP dissociates from template DNA and how the RNA
transcript is released remain unclear. The RNA transcript can then be translated at a ribosome
to produce the encoded proteins.
1.2 Transcription Regulation
As one of the early key steps towards protein synthesis, transcription is extensively regulated
within the cell. Transcription regulation refers to how a cell controls which genes are
transcribed and to what extent they are transcribed. Despite some level of regulation occurring
at the elongation stage, most appears to be conducted at the initiation stage. Regulation at the
initiation stage is provided via several factors.13
1.2.1 Transcription Factors
In order to regulate gene transcription in a very specific manner, bacteria use proteins known
as transcription factors (TFs). TF based regulation is the main way in which organisms maintain
control over transcription. However, the overall level of transcription from a promoter may be
influenced by a combination of factors. TFs can either decrease or increase transcription from a
promoter; a TF that decreases transcription is known as a repressor and a TF that increases
transcription is known as an activator. Some TFs can alter between repressor and activator
function depending on circumstances. Transcription from each operon within the genome is
usually regulated by one or more TFs and TFs often regulate their own transcription (known as
autoregulation). TFs are usually DNA binding proteins that bind DNA at specific locations
usually at or near their target promoters; these binding sites are termed “Transcription factor
binding sites” (TFBSs). Once a TF is bound to its TFBS there are several ways in which it can
control transcription. 19
The binding of TFs to TFBSs is specific, which means that the TF has a higher binding affinity
towards its TFBS than the genomic DNA proximal to it. TFs contain a structural motif that
enables them to recognize and bind to specific DNA sequences. In bacteria, the DNA-binding
motif is usually a helix-turn-helix motif, which is ubiquitous in bacterial proteomes and plays a
key role in transcription regulation. The HTH motif is also present, though to a lesser extent, in
eukaryotes. It has been speculated that the HTH motif is one of the oldest structural motifs in
life and that all HTH containing proteins evolved from a common ancestor.20 A representative
structure of the HTH motif is shown in Figure 1-2; it is composed of a three α-helix bundle (α1,
α2, α3) with a conserved turn between α2 and α3. α3 is known as the recognition helix and makes
contacts to base pairs in the major groove of DNA, with α2 stabilizing the interaction. Hydrogen
bonds are made between residues of the recognition helix and functional groups of exposed
bases in the major groove. (Figure 1-2) 21
Figure 1-2- X-ray determined structure of the Helix-turn-Helix motif from the lac repressor in
complex with its operator DNA. DNA is shown in purple. α1, α2 and α3 are shown in orange, red
and blue respectively. Hydrogen bonds are shown in green and side chains that make contacts to
the base pairs in the major groove are coloured brown.
A common variation on the HTH motif is the winged helix-turn-helix (wHTH) motif. The wHTH
motif contains an antiparallel β-sheet packed against the three helix bundle. The hairpin loop
between the two strands forms the “wing” of this motif; the wing often binds directly to target
DNA by making contacts with the minor groove. wHTH motifs can also have a second wing
caused by the presence of another hairpin loop. A schematic of the general structure of a winged
helix is shown in Figure 1-3. OhrR is a TF from Bacillus subtilis that contains a wHTH motif in
which the wing makes contacts with the minor groove. Figure 1-4 shows the crystal structure of
the B. subtilis OhrR bound to its TFBS. Most bacterial TFs exist as homodimers and therefore the
corresponding TFBS contains two (near) identical DNA binding motifs (usually between 12 and
30bp) termed an inverted repeat or pseudo-inverted repeat sequence. This organization leads
to two near identical binding sites in sequential major grooves that can each accommodate one
of the DNA-binding motifs from the homodimer.22 This is also exemplified in Figure 1-4.
Figure 1-3 - Schematic of the wHTH motif. Arrows represent β-strands, cylinders represent
α-helices and lines represent loop regions. The 3 helices of the HTH motif are packed against an
antiparallel β-sheet. The loops between β-strands make up the wings of the motif.
Figure 1-4 - Crystal structure of OhrR from Bacillus subtilis bound to the corresponding TFBS. DNA
is coloured purple. Chain A and chain B of the dimeric OhrR are coloured blue and yellow
respectively. The structure reveals that the wing of the wHTH motif interacts with the minor
groove of DNA. The dimerisation of OhrR (as observed for many TFs) arranges the DNA binding
motifs of each subunit to interact with successive major grooves.
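The inverted repeat organisation of a dimeric TFBS can be checked computationally by comparing one half-site with the reverse complement of the other. The Python sketch below is only an illustration of that idea; the function names, the `gap` parameter (central bases between the half-sites) and the example sequences are hypothetical and are not taken from any TFBS discussed here.

```python
# Illustrative sketch only: test whether a candidate binding site is an
# inverted repeat (0 mismatches) or a pseudo-inverted repeat (a few).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def inverted_repeat_mismatches(site, gap=0):
    """Count mismatches between the left half-site and the reverse
    complement of the right half-site.

    gap is the assumed number of central bases separating the half-sites.
    If len(site) - gap is odd, one extra central base is ignored.
    """
    site = site.upper()
    half = (len(site) - gap) // 2
    left = site[:half]
    right = site[len(site) - half:]
    return sum(1 for a, b in zip(left, reverse_complement(right)) if a != b)
```

With the made-up site TGTGACCGGTCACA and `gap=4`, the two half-sites TGTGA and TCACA are perfect reverse complements, giving zero mismatches; a single base change in one half-site (for example TCATA) gives one mismatch, which corresponds to a pseudo-inverted repeat.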
In order to understand how TFs function, it is necessary to understand the mechanism of how
they can recognize and bind to their TFBS. The process of a protein recognizing a specific DNA
sequence is called readout; readout can broadly be divided into two types:
base readout and shape readout.23 Base readout (or direct readout) generally refers to
the network of hydrogen bonds that are formed between the protein’s DNA binding motif and
the bases in the major and minor grooves. Proteins will form different hydrogen bonding
networks with different DNA sequences; some of these arrangements are more stable than
others which gives rise to this base specificity.24 Sequence specific hydrophobic interactions can
also play an important role in base readout. An example of this can be found in the TF called P22
c2 from the lambdoid bacteriophage P22 and its operator sequence. The crystal structure of
P22 c2 in complex with its operator shows that it binds using a HTH motif and that four
successive 5-methyl groups from thymines create a binding cleft in the major groove that
specifically accommodates a valine side chain from P22 c2.25
Shape readout (sometimes termed indirect readout) refers to how a protein recognizes the
overall shape of the DNA sequence at its binding site. The overall shape of a DNA molecule can
vary depending on the sequence. Particular sequences of bases can cause the molecule to
become more flexible which can result in bends, kinks and other deformations.26 In most
environments DNA exists in B-form, however most specific DNA-protein complexes require the
B-DNA to be distorted into a ‘non-ideal B-DNA shape’. It is therefore often the case that
specificity results from the ability of a DNA binding site to assume a conformation different to
its native form. This ability stems from particular DNA sequences that usually do not contact the
protein. The binding of a protein at these TFBSs can result in the stabilization of a non-native
DNA conformation. Specificity in shape readout therefore results from the ability of the DNA
sequence to distort from its native conformation and the ability for the protein-DNA complex to
stabilize this deformation.27 The result of base and shape readout is a highly complicated
recognition mechanism in which either readout mechanism can play the most significant role.
This makes predicting specific protein-DNA complexes very difficult indeed despite on-going
research into this area.28,29
For a protein to recognize a specific DNA sequence, it needs to be able to find its recognition
sequence amongst the vast amount of DNA within the cell. It has been calculated that DNA
binding proteins bind their target DNA far quicker than can be accounted for by three
dimensional diffusion within the cell; a phenomenon known as facilitated diffusion.30 Due to the
complexity of the problem, the mechanism of facilitated diffusion in this context is largely
unresolved.31 Experiments do however suggest the possibility that in addition to performing a
three dimensional search, the protein also performs a one dimensional search by sliding along
the DNA.32 Here the protein binds non-specifically to the DNA and does a one dimensional
search along the DNA for a short length (believed to be <150bp). The protein then dissociates
and either carries on with a three dimensional search or moves a short distance relative to its
dissociation point and rebinds the DNA at another location to perform another one dimensional
search. In addition, proteins are thought to be capable of transferring from one DNA section to
another in a process called intersegmental transfer.33,34 A schematic of how a DNA binding
protein searches for its particular sequence via this proposed mechanism is shown in Figure
1-5.
Figure 1-5 - Adapted from 35. Schematic of proposed mechanisms used for a TF to search for its
TFBS. DNA is represented by the red line, TFs are represented by yellow circles and the TF’s
trajectory is represented by black arrows. 1 - Three dimensional diffusion. 2 - One dimensional
sliding along DNA. 3 - Dissociation from DNA followed by a short translocation before
reassociation. 4 - Intersegmental transfer.
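The benefit of combining three dimensional diffusion with one dimensional sliding can be illustrated with a toy simulation. The Python sketch below is a deliberately simplified model and not a description of any published method: the DNA is a line of discrete sites, each binding event lands at a uniformly random site, and sliding is an unbiased random walk of a fixed number of steps. The function name and all parameter values are assumptions chosen for illustration, and the model counts binding events rather than elapsed time.

```python
import random


def search_rounds(genome_len, target, slide_steps, rng):
    """Count the binding events a toy TF needs before it reaches `target`.

    Each round models one 3D diffusion step (landing at a random site)
    followed by up to `slide_steps` unbiased +/-1 steps along the DNA
    (1D sliding). Positions are clamped to the ends of the DNA.
    """
    rounds = 0
    while True:
        rounds += 1
        pos = rng.randrange(genome_len)  # non-specific binding at a random site
        if pos == target:
            return rounds
        for _ in range(slide_steps):
            step = rng.choice((-1, 1))
            pos = min(max(pos + step, 0), genome_len - 1)
            if pos == target:
                return rounds
        # target not found: dissociate and rebind elsewhere


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    trials = 200
    for slide in (0, 20):
        mean = sum(search_rounds(200, 100, slide, rng) for _ in range(trials)) / trials
        print(f"slide_steps={slide}: mean binding events = {mean:.1f}")
```

In this toy model, allowing even a short slide after each binding event reduces the average number of binding events needed to find the target, because each event samples several neighbouring sites instead of one.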
Non-specific protein-DNA interactions have long been thought to be dominated by electrostatic
interactions between DNA-binding domain residues in the proteins and the negatively charged
phosphate backbone in the DNA.36,37 An NMR study on the lac repressor from E. coli shows its
structure bound to the corresponding TFBS, as well as to a non-specific DNA sequence; the
structures are shown in Figure 1-6. In both cases the protein uses the same structural motifs to
bind to each sequence. However, the type of interaction is very different. In the non-specific
complex, side chains from the HTH motif are contacting the phosphate backbone of the DNA
instead of the bases in the major groove. In the specific complex contacts are made from the
HTH motif to the bases in the major groove. In the non-specific complex, the DNA retains the
ideal B-form conformation whereas in the specific complex, the DNA structure is significantly
deformed from the ideal B-form conformation with a distinct kink in the centre of the DNA
strand. Furthermore, the non-specific complex contains water molecules at the protein-DNA
interface which are not present in the specific complex. There are some electrostatic
interactions between protein side chains and the phosphate DNA backbone which are mediated
through these water molecules. This is not usually seen in specific protein-DNA complexes but
is thought to be typical of non-specific complexes.38 Non-specific complexes are more flexible
than specific complexes as the interaction energy of non-specific protein-DNA complexes is
much smaller than in specific complexes. It is for these reasons that a protein can rapidly slide
along the DNA molecule non-specifically but when it comes into contact with its recognition
sequence, it becomes tightly bound. 39
Figure 1-6 - Solution 1H NMR structures of the N-terminal domains of lac repressor from E. coli
bound to a non-specific DNA fragment (A) and its TFBS (B). DNA is shown in purple and chain A
and chain B of the lac repressor are coloured blue and yellow respectively. In the non-specific
complex the DNA retains the ideal B-form structure, while the hydrogen bonding network shown
in Figure 1-2 (HTH) is not formed. Instead, contacts are made from the HTH motif to the
sugar-phosphate backbone. In the specific complex the structure of the DNA is deformed significantly
and a specific hydrogen-bonding network is formed between the HTH motif and functional groups
of the base pairs in the major groove.
Once bound to the corresponding TFBS, a TF can then perform its regulatory function.
Repressors can function via several mechanisms; the most common and obvious is called steric
blocking, where the TF binding results in blocking access to one of the core promoter elements.
This prevents the RNAP holoenzyme from binding to the promoter thereby preventing
transcription initiation. Usually the TFBS overlaps with either or both of the -35 or -10
promoter elements, thereby inhibiting binding of the σ-factor. Some TFs have also been found to
prevent the C-terminal domain of the α-subunit in RNAP from contacting the UP element.40
Repression is stopped when the TF binds a specific small molecule known as an effector.
Binding of an effector molecule usually induces a conformational change in the repressor
structure, preventing it from specifically binding to the TFBS. Transcription from the operon
then takes place for as long as the signal from the effector molecule prevents the TF from
binding to its TFBS. 41
Other less common mechanisms that are used to repress transcription exist. Some promoters
have TFBSs upstream and downstream of the transcription start site. This is observed in E. coli
with the galETK operon that encodes genes for enzymes which metabolize galactose. Two
TFBSs are located 114bp apart at -60 and +54 of the transcription start site. The TF GalR exists
as a dimer; one dimer binds to each of these TFBS; these dimers can then associate to form a
tetramer. This causes the formation of a loop that contains the transcription start site, resulting
in promoter occlusion and hence repression. Interestingly, full repression is only observed in
the presence of the nucleoid associated protein (NAP) called HU (section 1.2.2).42,43 Also some
repressors bind to DNA upstream of the transcription start site preventing elongation; others
can act by inactivating activator TFs. 44
Activators can also act via several mechanisms, which all function to improve the affinity of
RNAP towards the promoter. The first of these is by binding to the promoter upstream of the
-35 region and recruiting RNAP to the promoter by making contacts with the C-terminal domain
of the α-subunit. The TFBS for this mechanism can vary considerably because of the presence of
a flexible linker in the α-subunit.45 The second mechanism of activation involves the TF binding
directly adjacent to the -35 region where it again recruits RNAP but this time through
recruitment of the σ-factor.45 The third mechanism by which a TF can activate a promoter is by
binding near or at the -10 and -35 regions and altering the DNA conformation. This alters the
promoter conformation to allow RNAP to bind and initiate transcription.45 A fourth mechanism
has also been discovered, whereby an activator can cause the inactivation of a repressor
resulting in transcription of the promoter.46 Like repressors, activators often only become active
after interacting with a particular effector molecule. Again this allows transcription to be
induced in response to particular signals (i.e. presence of effector molecule).
In addition to activator and repressor TFs, occasionally transcription from a promoter is
regulated by a pair of cognate TFs known as a two component system. Approximately 10% of
the E. coli transcription factors belong to this family. 47 Two component systems consist of a
histidine kinase (HK) protein along with a response regulator (RR) protein. Most HKs are
membrane bound and contain an extracellular “sensing” domain at their N-terminus which can
detect a particular stimulus in the environment. In response to the specific signal, a
conformational change is induced in the HK causing it to become phosphorylated at a conserved
histidine residue.48 This reaction is catalysed by an ATP-binding domain within the HK and the
process is called autophosphorylation. The cognate RR then catalyses the transfer of this
phosphate group to a conserved aspartate residue in its N-terminal “receiver domain”.
Phosphorylation causes a conformational change in the RR that allows it to either activate or
repress transcription from its target promoter. This is achieved by modulating the RR’s
C-terminal “output domain” which is usually a DNA binding domain.49,50 As with other TF systems,
two component systems can vary a lot in terms of types of HKs, RRs and the stimuli detected;
however they all function via the same sequence of phosphorylation.51
Based on sequence similarity, TFs are grouped into different protein families.52 By far the most
widely distributed and well-studied family of bacterial TFs is the LysR-type transcriptional
regulator (LTTR) family. The abundance of LTTRs is exemplified by E. coli in which ~15% of its
total 314 TFs belong to this family.52 They can be involved in the transcriptional regulation of a
wide variety of different genes and can respond to many different stimuli.53 LTTRs can act as
activators or repressors and usually regulate a single operon although some have been found to
regulate several genes at several locations within the chromosome.54 LTTRs are often
divergently transcribed from the operons which they regulate, allowing them to negatively regulate their
own transcription.55 LTTR proteins contain a conserved N-terminal wHTH domain. A linker
helix connects the N-terminal domain with a “regulatory” C-terminal domain which consists of two α/β
domains (RD1 and RD2) connected by a hinge region. Effector molecules bind to LTTRs at this
hinge region, which is thought to induce conformational changes in the protein.53,55 A schematic
of the LTTR CbnR from Ralstonia eutropha which is thought to represent the general structure
of LTTRs is shown in Figure 1-7. Active LTTRs bind to target DNA as tetramers; usually at two
distinct TFBSs covering approximately 60bp at the promoter.56 Activator LTTRs that are not
bound to an effector molecule, bind to target DNA as two separate dimers. On binding an
effector molecule, the LTTR is subject to a conformational change causing the two dimers to
oligomerize to form a tetramer. This oligomerization requires significant bending of the DNA
indicating that DNA conformation plays an important role in the function of LTTRs.57,58
Figure 1-7 - Schematic of the structure of CbnR. The C-terminal consists of two domains, RD1 and
RD2, which both contain a core five stranded β-sheet flanked by three α-helices. The effector
molecule (chlorocatechol) is thought to bind at a linker that connects these two domains. RD2 is
connected to the HTH N-terminal DNA binding domain by a linker helix.
Although many LTTRs have been reported and characterized, to date only 5 full length crystal
structures have been determined.59 The lack of crystal structures is due to the general
insolubility of LTTRs. LTTRs have several oligomerization interfaces that can result in
precipitation at high concentrations. This is not a problem in the cell because the LTTR
concentration is kept at a low enough value by negative autoregulation.60
The first full length crystal structure of an LTTR reported was that of CbnR from Ralstonia
eutropha. CbnR regulates, and is divergently transcribed from, the CbnABCD operon which
encodes genes for the degradation of chlorocatechol. CbnR activates the CbnABCD operon in
response to chlorocatechol and binds to the CbnABCD promoter from -20 to -80 of the
transcription start site; this binding also overlaps with the CbnR -35 and -10 regions.61 In
solution, CbnR exists as a tetramer consisting of a dimer of dimers, which is believed to be the
biologically active form. CbnR is composed of a DNA binding N-terminal wHTH domain from
residues 1-58, a linker helix from residues 59-87 and two regulatory domains (RD1 and RD2)
from residues 88-294. RD1 and RD2 both have similar structures that contain a core five
stranded β-sheet flanked by three α-helices. A hinge region located between RD1 and RD2 is
where the effector molecule (chlorocatechol) is postulated to bind. The two subunits that make
up the individual dimers are different in conformation. One subunit displays an “extended”
conformation while the other adopts an “open” conformation. Here, extended and open refers to
the angle made between the regulatory domain and the linker helix. (Figure 1-8).
Figure 1-8- Structures of the open (right) and extended (left) conformations of the CbnR
monomer. The angle between the regulatory domain and the linker helix is ~130° in the open
form and ~50° in the closed form.
The two subunits making up each dimer are associated by anti-parallel helix-helix contacts
between their linker helices, comprising a “coiled coil” linker. In the overall tetrameric
structure, the two coiled coil linkers and the regulatory domains make up the main body of the
protein. The DNA binding domains are located on one face of the main body which enables all
four wHTH motifs to bind to the same strand of DNA with each dimer bound to a separate TFBS.
The binding of all four wHTH motifs to the same strand of DNA can only occur if the DNA
conformation is significantly bent. (Figure 1-9) As CbnR is typical of the LTTR family, its
structure is thought to be representative for this family of TFs.62
Figure 1-9 - Overall structure of the CbnR tetramer. The main body of the tetramer is coloured
according to chain. The four HTH DNA binding domains are coloured red. The black line
represents the CbnR promoter. The HTH motifs are located on the same face of the main body
where a pair of HTH motifs can bind to a TFBS. The promoter DNA must be significantly distorted in
order for both TFBSs to be occupied.
Another widely distributed family of bacterial TFs is the ArsR–SmtB family. These TFs are among
the main metal-sensing TFs in bacteria and function by repressing the transcription of genes
involved in metal homeostasis. ArsR–SmtB proteins are homodimers that bind to TFBSs located
at the promoter of target genes via a wHTH; this results in repression through steric blocking.
On binding of particular metals to the TF, this protein-DNA interaction is disrupted allowing
transcription from the promoter.63 SmtB from Synechococcus regulates the expression of the
smtA gene that encodes a metallothionein protein – SmtA that sequesters metals. SmtB is
divergently transcribed from smtA and in the absence of zinc SmtB specifically binds to the smtA
operator region. In the presence of zinc, SmtB dissociates from the promoter as a result of
conformational changes caused by binding to zinc ions. Each SmtB monomer binds two zinc
ions at two distinct sites, with only one of these sites having an effect on DNA-binding.64,65,66 The
crystal structure of SmtB was determined in the apo form and in the zinc-bound form, with zinc
being bound to the binding site that influences regulation. This selectivity was achieved by
mutating the cysteine residues to serine residues in the other zinc binding site. Comparison of
the two structures shows significant differences at the DNA binding motif; an overlay of the two
structures is shown in Figure 1-10. These differences in structure arise from the formation of a
new hydrogen-bonding network on Zn-binding. Zinc becomes coordinated to two histidine
residues, a cysteine residue, and a glutamate residue. This has an allosteric effect on DNA-
binding; for example, in the zinc bound form the alpha carbon of Ser-72 in the recognition helix
deviates from the apo form by 4.8 Å. The result of this conformational change is the inability of
SmtB to form the specific hydrogen bonding network with its TFBS.67
Figure 1-10 - Overlay of main chain atoms of apo and zinc bound forms of SmtB. Apo form: brown and
blue. Zn-bound form: red and yellow. Zinc is coloured grey. Wings and recognition helices of the wHTH
are labelled. Alpha carbon atoms of Ser72 move 4.8 Å relative to each other. The N-terminus of the chain in the
zinc bound form was not resolved.
Many TFs regulate transcription from more than one promoter. Also, the result of a TF initiating
transcription from one promoter can often result in the alteration of transcription from other
promoters. This results in a complicated hierarchy of interactions between TFs and target genes
known as a ‘transcriptional regulatory network’ (TRN). The TRN of an organism describes how
every TF regulates the expression of its target genes in response to stimuli. The TRN is
represented by a directed graph; the TFs and target genes make up the nodes of the network,
which are connected to each other via interactions that comprise the edges (Figure 1-11).68 A
TRN is a dynamic structure meaning that different stimuli alter expression patterns of
particular parts of the TRN. This allows the organism to alter expression patterns according to
changes in the external environment. These changes are driven by the mechanisms of
transcription activation and repression discussed above. TRNs evolve as genomes incorporate
newly acquired/duplicated genes. These new interactions are added to the TRN and similarly
genes that are deleted/silenced will be removed.69 Bacteria contain global TFs that regulate the
expression of many different genes. Several ‘global’ TFs dominate the TRN with most genes
being under control of a global transcription factor as well as a ‘local transcription factor’. A
local transcription factor only regulates genes in close proximity to its own locus. E. coli contains
seven main global TFs of which four also function as NAPs (including Fis and H-NS – section
1.2.2).70,71
Figure 1-11 - Left: how TRNs are represented by directed graphs. TFs and target genes are nodes
and the edges are interactions between them. Right: taken directly from reference 69. Representation of a
TRN built from interactions between genes and TFs.
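The directed-graph representation described above can be sketched in a few lines of Python. This is only an illustration: the node names are hypothetical placeholders rather than genes from any real organism, and the out-degree cut-off used to label a TF as 'global' is an arbitrary assumption.

```python
# Toy TRN as a directed graph: keys are TFs, values map each target gene
# to the sign of the interaction (+1 activation, -1 repression).
# All node names are hypothetical placeholders.
trn = {
    "tfG": {"tfL": +1, "gene1": +1, "gene2": -1, "gene3": +1},
    "tfL": {"gene3": -1},
}

def out_degree(network, tf):
    """Number of edges leaving a node, i.e. how many genes it regulates."""
    return len(network.get(tf, {}))

def regulators_of(network, gene):
    """All TFs that have an edge pointing at the given gene."""
    return sorted(tf for tf, targets in network.items() if gene in targets)

def global_tfs(network, min_targets=3):
    """TFs whose out-degree meets an assumed cut-off for being 'global'."""
    return sorted(tf for tf in network if out_degree(network, tf) >= min_targets)
```

In this toy network gene3 is controlled by both a high out-degree TF and a low out-degree TF, mirroring the arrangement in which a gene is under the control of a global TF as well as a local one.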
The TRN is built from smaller regulatory systems called ‘network motifs’.72 The most basic
network motif is called a “simple regulation”, where in response to a signal, one TF regulates the
transcription of a gene, with no other influencing elements. While the signal persists, there is an
increase in transcript concentration that reaches a steady state level in the cell, equal to the
ratio of production rate to degradation rate. On loss of the signal, transcript concentration
decays exponentially.73 Two variants of the simple regulation motif are ‘negative autoregulation’
(NAR) and ‘positive autoregulation’ (PAR), where the TF regulates its own transcription, either
by repression or activation respectively. NAR is employed frequently in bacterial repressor TFs
with a strong promoter. In this case, there is a rapid response that reaches a steady state
transcript concentration sooner than for simple regulation.74 Additionally, NAR can increase
cellular stability as it provides a mechanism to reduce cell to cell transcript variation in growing
bacterial cultures.75 Conversely, PAR network motifs lengthen the time taken for transcript
concentrations to reach the steady state concentration and cause increased cell to cell variation
in transcript levels, both of which can be beneficial to an organism.76
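The behaviour of simple regulation can be illustrated numerically. The sketch below assumes the standard first-order model dX/dt = beta - alpha*X, with a constant production rate beta and a degradation rate constant alpha; the symbol names and function names are assumptions introduced here for illustration and do not come from the cited work.

```python
import math

def simple_regulation(beta, alpha, t, x0=0.0):
    """Transcript level at time t for dX/dt = beta - alpha*X, starting at x0.

    The steady state is beta/alpha, the ratio of production to degradation.
    """
    x_ss = beta / alpha
    return x_ss + (x0 - x_ss) * math.exp(-alpha * t)

def decay_after_signal_loss(x0, alpha, t):
    """Exponential decay once production stops (beta = 0)."""
    return x0 * math.exp(-alpha * t)

def half_rise_time(alpha):
    """Time to reach half of the steady state; depends only on alpha."""
    return math.log(2) / alpha
```

Under this model the time taken to reach half of the steady state depends only on the degradation rate constant, which gives a convenient reference against which the faster response of NAR and the slower response of PAR can be compared.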
More complicated network motifs abundant in bacterial TRNs are systems known as ‘feed
forward loops’ (FFLs) which are composed of three genes (A,B,C). The gene product of A is a
global TF and regulates the transcription of B and C; the gene product of B is a TF and also
regulates the transcription of C. As A can activate or repress both B and C, and B can activate or
repress C, there are eight possible combinations of a FFL.77 Of the eight possible FFLs, two are
most commonly found in bacterial TRNs and are shown in Figure 1-12.78 The first of these is
known as a coherent type-1 FFL (C1-FFL), in which both A and B are activators. The
behaviour of the C1-FFL depends on whether both A and B (AND logic) or just one of A or B (OR
logic) is required for activation of C. In response to a signal, the use of AND logic at a C1-FFL
causes an initial delay in C production. However, when the signal is lost there is a rapid decrease
in C levels. The initial delay therefore allows for small fluctuations in signal strength without
inducing C transcription. This can be useful if the signal is something that naturally fluctuates
within the cell, so that only strong sustained signals result in induction of C.79 A C1-FFL that
employs OR logic creates an immediate increase in C, but with a delayed decrease. This type of
FFL can enable a prolonged signal response after the signal has been lost. 80
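The contrast between AND and OR logic at a C1-FFL can be illustrated with a deliberately crude discrete-time model. Everything in this sketch is an assumption made for illustration: B is treated as a counter that rises by one unit per time step while A is active and falls by one unit per step otherwise, and B is considered active once it reaches an arbitrary threshold. It is not a kinetic model of any real promoter.

```python
def simulate_c1_ffl(signal, logic="AND", b_threshold=3):
    """Return the on/off state of C at each time step.

    signal      -- list of 0/1 values giving the activity of A per step
    logic       -- "AND" (C needs A and B) or "OR" (C needs A or B)
    b_threshold -- level B must reach before it counts as active (assumed)
    """
    b_level = 0
    b_max = 2 * b_threshold  # assumed cap on B accumulation
    c_states = []
    for a_on in signal:
        b_on = b_level >= b_threshold
        if logic == "AND":
            c_states.append(int(a_on and b_on))
        else:
            c_states.append(int(a_on or b_on))
        # B accumulates while A is active and decays once A is lost.
        b_level = min(b_level + 1, b_max) if a_on else max(b_level - 1, 0)
    return c_states

sustained = [1] * 8 + [0] * 6   # long signal followed by signal loss
brief = [1, 1, 0, 0]            # short fluctuation in signal
```

With the sustained signal, AND logic delays the appearance of C but switches it off as soon as A is lost, whereas OR logic switches C on immediately and keeps it on for several steps after the signal disappears. With the brief signal, AND logic never activates C, which is the filtering of short fluctuations described above.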
The second most common type of FFL in bacterial TRNs is known as the incoherent type 1 FFL
(I1-FFL). Here, A activates B and C, but B is a repressor of C. An I1-FFL can have different outputs
depending on how strong a repressor B is. Initially there is a rapid increase in C; if B is a strong
repressor, a rapid decrease in C will follow, which leads to pulses in C concentration. If B does
not fully repress C, then C will reach a steady state concentration more quickly than if C were
controlled by simple regulation, therefore providing a rapid response to the stimulus.
Figure 1-12 - Representation of the C1-FFL (left) and the I1-FFL (right). In both cases A is a global
TF and is an activator of B and C. In the C1-FFL B is a TF that activates C whereas in the I1-FFL B is a
repressor of C.
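The eight possible FFL configurations can be enumerated directly. The sketch below assumes the usual convention that a loop is 'coherent' when the sign of the direct path (A to C) matches the overall sign of the indirect path (A to B to C), and 'incoherent' otherwise; the function and variable names are illustrative only.

```python
from itertools import product

def classify_ffl(a_to_b, a_to_c, b_to_c):
    """Classify a FFL from its three edge signs (+1 activation, -1 repression)."""
    indirect_sign = a_to_b * b_to_c  # net effect of A on C via B
    return "coherent" if indirect_sign == a_to_c else "incoherent"

# Every combination of signs on the edges A->B, A->C and B->C.
all_ffls = {
    signs: classify_ffl(*signs) for signs in product((+1, -1), repeat=3)
}

c1_ffl = (+1, +1, +1)  # A activates B and C; B activates C
i1_ffl = (+1, +1, -1)  # A activates B and C; B represses C
```

Under this convention the eight combinations divide into four coherent and four incoherent loops, with the C1-FFL and I1-FFL described above falling into the first and second group respectively.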
Another common type of network motif is a ‘single input module’ (SIM) (Figure 1-13). Here TF A
regulates the transcription of several genes involved in a similar or related function. The TF acts
either exclusively as a repressor or exclusively as an activator at all target genes in the SIM.78
SIMs can exist in biosynthetic pathways where the transcription of individual enzymes can be
initiated at different times subsequent to A being activated. This prevents the cell from wasting
energy unnecessarily on the production of enzymes that are not required.81
Figure 1-13 - Representation of the single input module. A is a global TF that either represses all of
the target genes in the motif or activates all of the target genes in the motif. The target genes tend
to be activated/repressed at different A concentrations.
The last, and most complicated, type of network motif is the ‘dense overlapping regulon’ (DOR),
which consists of a set of related target genes regulated by a number of TFs (Figure 1-14). A DOR
is composed of two layers of genes; a top layer of TFs and a bottom layer of target genes. Most
TFs and target genes in an organism can be clustered into, or are connected to a DOR. The
existence of DORs is inferred by the fact that the number of operons regulated by the same two
TFs is far more than would be expected to occur by chance. Organisms contain several DORs, for
example E. coli contains five that are each related to a particular component of metabolism.
Depending on the signal input, TFs of the top layer act in different combinations to affect
transcription from target gene promoters. A DOR can be responsive to many signals and so the
resulting output is an integration of all input signals. DORs are on a single layer meaning that
they operate in isolation from any other DOR; the other network motifs that have been
discussed are usually located at the output level of a DOR.78,73 Deducing how these network
motifs interact along with the overall behaviour of the TRN is highly complicated.
Figure 1-14 - Representation of a dense overlapping regulon network motif. The top layer of TFs
responds to many input signals. Target genes are regulated by several TFs of the top layer with
different combinations leading to different outputs. Many input signals are integrated by the
network motif to produce many different output signals.
1.2.2 Other mechanisms of transcription regulation
Although organisms control transcription mainly through the use of transcription factors, other
mechanisms have been documented. These tend to be less specific and will alter transcription
from many promoters. One alternative mechanism by which cells control which promoters are
actively transcribed is through the use of ‘alternative σ-factors’. These σ-factors recognize
promoter elements different from those recognized by the σ-factor for standard promoters. In response to particular
stimuli, cells can alter the relative concentrations of different σ-factors. When RNAP binds
to these alternative σ-factors, the holoenzyme will recognize promoter elements specific to the
alternative σ-factor. Transcription initiation is therefore induced at promoters that contain
these specific promoter elements and decreased at those that do not.82 As mentioned in section
1.1, this is what occurs in E. coli when subjected to heat shock. An increase in temperature
causes an increase in the production of the σ32 factor, which binds to RNAP. These holoenzymes
recognize promoters with a consensus sequence of 5’-TCTCNCCCTTTGAA-3’ at the -35 region
and 5’-CCCCATNTA-3’ at the -10 region with a 13-17 bp spacer.83 Many of the genes that are
under control of these promoters encode molecular chaperones that can help refold partly unfolded
proteins. Other genes encode proteases that can remove misfolded and aggregated proteins
from the cell.84
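As a rough illustration of how a candidate promoter could be compared with the heat shock consensus elements quoted above, the following sketch counts mismatches against each element, treating N as matching any base, and checks the spacer length. The function names and the mismatch tolerance are assumptions made for illustration; real promoter prediction would normally rely on position weight matrices rather than a simple mismatch count.

```python
# Consensus elements as quoted in the text (N matches any base).
SIGMA32_MINUS35 = "TCTCNCCCTTTGAA"
SIGMA32_MINUS10 = "CCCCATNTA"

def mismatches(site, consensus):
    """Count positions where site differs from consensus, ignoring N."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return sum(
        1 for s, c in zip(site.upper(), consensus) if c != "N" and s != c
    )

def resembles_heat_shock_promoter(minus35, spacer_len, minus10, max_mismatches=2):
    """Crude check of both elements and the 13-17 bp spacer.

    max_mismatches is an arbitrary tolerance chosen for this example.
    """
    return (
        13 <= spacer_len <= 17
        and mismatches(minus35, SIGMA32_MINUS35) <= max_mismatches
        and mismatches(minus10, SIGMA32_MINUS10) <= max_mismatches
    )
```

A mismatch count of this kind also gives a simple, if coarse, measure of how closely a promoter resembles its consensus sequence.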
Often the production of an alternative σ-factor is not regulated by the cell and its
concentration remains at a constant level. In this case, the σ-factors are usually present in
an inactive state and only become active in response to particular stimuli. This is achieved
through the use of proteins known as anti-σ-factors. Anti-σ-factors bind to the
alternative σ-factors, preventing them from either associating with RNAP or binding to
promoter elements. Each anti-σ-factor has a specific σ-factor to which it binds; this complex
dissociates when the anti-σ-factor interacts with a specific signal in the cell.85,86 Anti-σ-factors
can also act upon the standard promoter σ-factor; the E. coli σ70 is inhibited by an anti-σ-factor
called Rsd. When E. coli is reaching the stationary phase of growth, there is an increase in
the cellular concentration of Rsd and an increase in the production of the stationary phase
σ-factor (σS). Rsd associates with σ70, preventing it from efficiently binding to RNAP and
from binding to the -35 region at promoters.87 Rsd therefore sequesters σ70 as the cell enters the
stationary phase and allows σS to bind to RNAP and initiate transcription from stationary phase
promoters.88
Transcription initiation can also be regulated by small molecules. This is what occurs when E.
coli is starved of nutrients. E. coli responds to starvation by conserving energy in what is known
as the stringent response. This response is characterized by a surge in the concentration of the
molecules guanosine tetraphosphate (ppGpp) and guanosine pentaphosphate (pppGpp), known
collectively as (p)ppGpp. (p)ppGpp causes a global change in transcription, translation,
replication and transport in order for the bacteria to conserve energy.89 (p)ppGpp is
synthesized in bacteria by enzymes of the RSH family; in E. coli this enzyme is called RelA. RelA
is thought to be activated by sensing a high concentration of unacylated tRNA at the ribosome.90
(p)ppGpp binds near the active site of RNAP, an interaction that is facilitated by the protein DksA.91
This destabilizes the open complexes formed by RNAP at all promoters. Promoters that are involved
in the production of translational machinery contain a G/C rich discriminator sequence that
makes the open complex intrinsically unstable, and even less stable in the presence of ppGpp/DksA. This
causes cellular processes that are involved in protein production to be slowed down.
Conversely, the promoters of biosynthetic pathways, such as those that synthesize amino acids,
contain an A/T rich discriminator sequence that stabilizes the open complex. These A/T rich
open complexes are able to withstand the destabilizing effect of ppGpp/DksA and so the genes
are preferentially transcribed.92 The stringent response therefore allows E. coli to use available
energy for the production of essential metabolites rather than cell growth and replication when
nutrients are limited. This gives the organism an increased chance of survival. Other stress
factors can induce (p)ppGpp production in E. coli and it has been suggested that these molecules
are mostly responsible for controlling the growth rate of this organism.93 In E. coli and other
bacteria, (p)ppGpp can control other metabolic processes as well as those discussed above.
These include the inhibition of DNA replication and inhibition of lipid synthesis amongst
others.94
The topology of a bacterial chromosome can influence which genes are transcribed and which
genes are not. Bacterial chromosomes are condensed into a structure known as a nucleoid; the
structure of the nucleoid can have important regulatory effects on transcription initiation. The
nucleoid structure is the result of several interactions such as DNA supercoiling (the result of
circular DNA twisting around itself) which is mainly caused by topoisomerases. Non-specific
DNA binding proteins known as “nucleoid associated proteins” (NAPs) also play a large role in
the formation of the nucleoid structure. NAPs affect the DNA structure by inducing bending and
wrapping and by forming DNA-protein-DNA bridges. As well as contributing to the organization of
the nucleoid structure, NAPs can regulate gene transcription as the alterations in DNA structure
will affect how RNAP interacts with promoters.95
An example of such a protein in E. coli is the “factor for inversion stimulation” (Fis); during rapid
growth, Fis is one of the most abundant DNA-binding proteins in E. coli.96 Fis is known to bind to
the E. coli genome at many sites and has a preference to bind to A/T rich sequences located at
non-coding parts of the genome. These sequences are often located directly upstream of
promoters. The crystal structure of Fis from E. coli bound to DNA is shown in Figure 1-15 which
shows that Fis is a dimeric HTH containing protein. The binding of Fis to DNA is driven by a
form of shape readout. The recognition helices of the two Fis HTH motifs are much closer together than
is required for each HTH to fit into successive major grooves in ideal B-form DNA. A/T rich
DNA sequences however can induce a narrowing of a minor groove. When such a narrow minor
groove exists, the DNA conformation is such that Fis recognition helices can fit into the major
grooves that flank the compressed minor groove. Binding of Fis induces a large bend into the
DNA structure of approximately 65°. Fis interacts with DNA through hydrogen bonds between
residues in the recognition helix and bases in the major groove. These interactions however are
not thought to contribute to specificity, which is determined through A/T induced minor groove
narrowing.97 When Fis binds near promoters, it is capable of either decreasing or increasing the
level of transcription from them. One of the main regulatory functions of Fis is thought to be to
decrease the transcription of non-essential genes during rapid growth.98
Figure 1-15 - Fis binding to a DNA sequence. The central region of the DNA contains an A/T rich
sequence which causes narrowing of the minor groove. The DNA is bent by approximately 65°.
Another one of the most abundant NAPs in E. coli is called H-NS and is known to play a key role
in regulating transcription in E. coli. H-NS also has a preference to bind A/T rich sequences and
form bridges between sections of DNA.99 H-NS is thought to act exclusively as an inhibitor of
transcription and it has been observed that H-NS seems to preferentially inhibit transcription of
genes acquired by horizontal gene transfer (HGT).100 This provides the cell with a ‘gene
silencing’ mechanism that may allow HGT acquired genes to initially exist in an inert state so
that any negative effects associated with the gene are minimized until the gene is integrated into
the cell’s transcriptional regulatory network (section 1.2.1).101
Signal responsive transcription termination is also a widely used mechanism to control gene
expression in bacteria.102 This can be achieved in a process called attenuation, in which a signal
causes transcription termination in an elongation complex that would otherwise continue
transcribing. This is generally achieved by particular signals inducing secondary structures in
the RNA transcript that cause transcription termination (as discussed in section 1.1). In the
absence of the signal, the RNA forms a secondary structure known as an anti-terminator which
allows for the continuation of transcription.103 This is seen in E. coli with the transcription of the
operon that encodes the biosynthesis of tryptophan. When tryptophan concentrations are low
in the cell, an anti-terminator structure is formed in the RNA transcript allowing transcription of
the operon. When tryptophan concentrations reach a particular level, a terminator secondary
structure is formed preventing transcription of the operon. This mechanism therefore allows
the bacterium to produce tryptophan synthesizing enzymes in response to a decrease in Trp
concentration. 104
The preceding discussion on network motifs indicated that promoter strength, i.e. how closely
the promoter resembles the consensus sequence, plays a role in the TRN. A promoter more like
the consensus will have more transcription from it than one that deviates from the consensus.
Promoter strength therefore plays a role in controlling transcription levels and its influence is
dependent on what other trans-regulatory elements are acting on the promoter. 105
The signals to which TFs and other bacterial regulatory mechanisms respond vary extensively. In
the examples described above, the sensing of small molecules and ions has been described. Most
local TFs are sensitive to these types of signals. Additionally, the regulatory response to
heat shock in E. coli, where an increase in temperature results in a change in σ-factor
activity, has also been described. In addition to small molecules/ions and temperature,
several other environmental stimuli can influence the regulation of gene expression. Stress
factors such as pH, osmolarity and low oxygen levels can cause changes in gene expression that
help the cell adapt to the current environment. Often these responses are driven by global TFs;
however, the transcriptional response can be driven by any of the mechanisms discussed
above.106
1.3 Formaldehyde – Toxicity, origins, and detoxification mechanisms
Formaldehyde is the simplest of all carbonyl compounds. The carbonyl group polarizes the
molecule, making the carbon centre act as an electrophile. In the case of formaldehyde, the lack of
α-carbons means there is little steric hindrance to nucleophilic attack, making it a particularly
reactive carbonyl compound.107 As such, formaldehyde has long been considered a highly toxic
substance. The World Health Organization considers there to be sufficient evidence to classify
formaldehyde as “Carcinogenic to Humans”, with formaldehyde induced nasal cancer being
directly observed in rats.108,109 It is generally accepted that formaldehyde’s carcinogenic
properties result from the ability to form cross-links between DNA and proteins within the
cell.108 If these cross-links are not repaired properly, mutagenesis and the onset of cancer may
result.110,111
This cross-linking property of formaldehyde has been widely exploited as a means to analyse
protein-DNA interactions.112 In this type of experiment (called chromatin immunoprecipitation,
or ChIP), formaldehyde is added to live cells, causing the formation of protein-DNA cross-links.
Cells are then analysed to determine which DNA sequences interact with a particular protein.
The mechanism by which formaldehyde induces cross-links between proteins and DNA is
shown in Figure 1-16.113
Figure 1-16 - Reaction mechanism of formaldehyde induced protein-DNA cross link formation.
Nucleophilic attack from the amine group of cytosine on formaldehyde results in the formation of
an imine intermediate. Nucleophilic attack from the amine group of lysine results in a covalent
one carbon cross-link between the two amine groups.
Despite the high toxicity of formaldehyde, it is present within all organisms.114 The ubiquity of
formaldehyde is in part due to the fact that it is produced in vivo via many metabolic pathways,
notably through the oxidative demethylation of biomolecules. For example, the oxidative
demethylation of the metabolite sarcosine, or DNA, produces formaldehyde as a by-
product.115,116 Formaldehyde is also formed naturally in the atmosphere due to the
photooxidation of hydrocarbons and exists in unpolluted air at a concentration of
approximately 1ppb. This value can be increased several times in urban areas and reach a value
of 80ppb during heavy traffic.117,118,108
The damaging effects of formaldehyde, along with its production within the cell, suggest that life
must have found an efficient way to cope with formaldehyde. In fact, it has been discovered that
several formaldehyde detoxification pathways have evolved.119 Not only are organisms capable
of detoxifying formaldehyde, some methylotrophic bacteria can use it as their sole source of
carbon.120 Formaldehyde is an essential intermediate for growth and energy metabolism in
methylotrophic bacteria, and keeping intracellular formaldehyde concentrations at a high though
non-toxic level is an essential component of their metabolism. This balance is normally achieved
by utilizing a combination of several formaldehyde oxidation pathways.121,119
Many formaldehyde detoxification pathways require a cofactor to react with formaldehyde
prior to oxidation; several cofactors perform this role, depending on the
detoxification pathway. All these cofactors are shown in Figure 1-17 and their roles will be
discussed in this review.
Figure 1-17 - Structures of cofactors involved in formaldehyde detoxification. A - Glutathione
(GSH), B - Mycothiol, C - Tetrahydrofolate (THF), D - Ribulose monophosphate, E -
Tetrahydromethanopterin (H4MPT). Nucleophilic atoms that react directly with formaldehyde are
highlighted in green.
The most widespread formaldehyde detoxification pathway is a glutathione (GSH) dependent
mechanism which is found in most prokaryotes and all eukaryotes. This pathway is considered
to be the main formaldehyde detoxification pathway in the majority of living organisms.122 A
notable exception appears to be the archaea, where genes for this pathway have not been
observed.123,124 The first step in this process involves the spontaneous reaction of formaldehyde
with GSH to form S-hydroxymethylglutathione (HMGSH). This adduct is then oxidised to S-
formylglutathione by the enzyme formaldehyde dehydrogenase (FDH) (a type 3 alcohol
dehydrogenase) using NAD.125,126,127 S-formylglutathione is then hydrolysed by the enzyme S-
formylglutathione hydrolase to give GSH and formate.128 Formate can then be further oxidized
to carbon dioxide by formate dehydrogenase.129 The mechanism of formaldehyde detoxification
via a GSH-FDH pathway is shown in Figure 1-18. An NADH molecule is produced in the reaction,
suggesting the pathway serves an energy generation role as well as a detoxification role.
Figure 1-18 – The glutathione dependent formaldehyde detoxification pathway (GSH-FDP).
Glutathione spontaneously reacts with formaldehyde. The adduct is then oxidized by
formaldehyde dehydrogenase to form S-formylglutathione which is then hydrolysed by S-
formylglutathione hydrolase to produce GSH and formate.
GSH-FDH enzymes have been found to be highly conserved throughout life with sequences of
mammalian and bacterial homologues displaying approximately 60% identity. This very high
level of conservation implies that GSH-FDHs play an important role in life; they are also thought
to be the progenitor of all alcohol dehydrogenases.130,131
The GSH-FDH enzymes have been studied in detail and a number of substrates have been
identified along with HMGSH. These substrates include long-chain aliphatic alcohols and S-
nitrosoglutathione; the latter is thought to play an important role in nitric oxide
biochemistry.132,133 GSH-FDH shows little reactivity towards short chain alcohols.127 The crystal
structure of human GSH-FDH reveals it exists as a homodimer with each subunit composed of
two domains: a coenzyme binding domain, and a substrate binding domain; the enzyme’s active
site is located in a cleft between them. Each subunit contains two covalently bound zinc atoms;
one is required for structural stability and the other is required for catalysis. As is the case for
other alcohol dehydrogenases, this catalytic zinc atom acts as a Lewis acid and polarizes the
alcohol’s hydroxyl bond to facilitate hydride transfer.134,135 (Figure 1-19).
Figure 1-19 - Overall structure of human GSH-FDH. The structure is a dimer (chains A and B). The
catalytic domains are coloured yellow and light blue and the substrate binding domains are
coloured brown and green. Residues that form the entrance into the active site are coloured red.
The catalytic zinc atoms are coloured blue and the structural zinc atoms are coloured orange.
The structure of human GSH-FDH with bound HMGSH and reduced NAD (Figure 1-20) shows
that the substrate binding site is large and hydrophobic except for a polar binding pocket. This
polar binding pocket contains an aspartic acid, a glutamic acid, and an arginine side chain which
bind HMGSH via hydrogen bonding. The large hydrophobic binding site is likely to be the reason
why this enzyme is inactive towards short-chain alcohols. This structure also shows how the
HMGSH hydroxyl group directly co-ordinates to the catalytic zinc atom.136 The high homology
observed for all GSH-FDHs implies these enzymes should possess very similar structure and
function in other organisms.
Figure 1-20 - Right- Structure of chain A of Human GSH-FDH bound to HMGSH and reduced NAD.
The ribbon is coloured according to residue type. Blue is positively charged, red is negatively
charged, yellow is neutral and white is hydrophobic. GSH is coloured orange and NADH is coloured
green. Zincs are coloured orange. Left - Close up of the GSH-FDH active site. Hydrogen bonds
between the enzyme and GSH are displayed as dotted green lines.
The S-formylglutathione hydrolase (FGH) enzymes are also conserved throughout life, with prokaryotic and eukaryotic forms
displaying between 40-80% identity.124 FGHs are esterases and have been shown to be capable
of hydrolyzing C-S bonds, as in the case of S-formylglutathione, as well as C-O bonds, as in
α-naphthyl acetate and p-nitrophenyl acetate.137,138 The crystal structure of human FGH reveals a
dimer of approximately 62 kDa (Figure 1-21). Other FGHs exist in a tetrameric form as a dimer
of dimers. The overall structure is similar to that of a typical α/β hydrolase fold; each monomer
contains a 9 stranded β-sheet with 3 α-helices on one side and 10 α-helices on the other.
Figure 1-21 - Overall structure of dimeric human FGH. Chain A is coloured blue and chain B is
coloured yellow. Side chains from the three residues that make up the active site are shown in red
and are labelled.
The enzyme contains acyl and thiol/alcohol binding pockets located near the active site, which
allows for the accommodation of substrate molecules such as S-formylglutathione. The active
site of the protein contains a catalytic triad consisting of a nucleophilic serine residue along with
an aspartic acid and histidine residue which act as a general acid and general base, respectively
(Figure 1-22). Mutation of any of these three residues abolishes the enzyme’s hydrolytic
activity. The active site also contains an oxyanion hole, which is thought to hydrogen bond to
the oxygen atom of the tetrahedral intermediate, thereby stabilizing the negative charge (Figure
1-22). This stabilization of the intermediate lowers the activation energy of the reaction,
thereby increasing the rate of hydrolysis.139,140 The structure of human FGH is very similar to
that of FGHs from other organisms that have been determined (all of which are prokaryotic)
with the catalytic triad and substrate binding pockets fully conserved.141
Figure 1-22 - Mechanism of action of the catalytic triad and intermediate stabilization by the
oxyanion hole in FGH. Asp226 and His260 act as a general acid and base, respectively. Ser149 acts
as a nucleophile, adding to the carbonyl carbon of S-formylglutathione. Amine groups from
residues that make up the surface of the oxyanion hole stabilize the tetrahedral intermediate by
hydrogen bonding to the negatively charged oxygen.
In some Gram-positive bacteria, mycothiol (MSH) is used as the cofactor in this pathway rather
than glutathione. These MSH dependent FDHs are related to GSH-FDHs (approximately 35%
identity) and are thought to perform catalysis in a slightly different way. The details of how
MSH-FDHs function are not fully understood.142
Probably the next most abundant formaldehyde detoxification pathway is the ribulose
monophosphate (RuMP) pathway, which was initially thought to be found only in methylotrophic
bacteria, but has been shown to exist in many non-methylotrophic bacteria and archaea.143,144 As
well as its role in detoxification, the RuMP pathway is one of the most important mechanisms of
formaldehyde fixation in methylotrophic bacteria.145 The first step in this process is an aldol
condensation between formaldehyde and ribulose 5-phosphate, which is catalysed by 3-
hexulose-6-phosphate synthase (HPS). The product, D-arabino-3-hexulose-6-phosphate, is then isomerised by
the enzyme 6-phospho-3-hexuloisomerase (PHI) to give fructose 6-phosphate. The product
fructose 6-phosphate can then be phosphorylated to fructose 1,6-bisphosphate within the cell,
which is further metabolized to pyruvate by glycolysis.146,147 The mechanism for this process is
shown in Figure 1-23.
Figure 1-23 - Mechanism of the RuMP pathway. A - The first step is an aldol condensation
between ribulose 5-phosphate and formaldehyde to form D-arabino-3-hexulose-6-phosphate,
catalysed by HPS. HPS promotes formation of the enediolate intermediate through stabilization of the
intermediate’s negative charge by the positively charged magnesium ion. B - The second step is
an isomerization of D-arabino-3-hexulose-6-phosphate to form fructose 6-phosphate, catalysed by
PHI. C - Fructose 6-phosphate can be further phosphorylated within the cell to fructose 1,6-
bisphosphate, which can be further metabolized via glycolysis.
HPS is a member of the orotidine 5′-monophosphate decarboxylase (OMPDC) superfamily of
enzymes. These enzymes are usually dimeric and contain a (β/α)8-barrel fold with two identical
active sites at the dimer interface. Despite these structural similarities, the OMPDC type
enzymes appear to catalyse unrelated metabolic reactions.148,149 The crystal structure of HPS
from Mycobacterium gastri displays this consensus structure (Figure 1-24); the protein is
approximately 25kDa. The active site is positioned at the end of the third strand and is made up
of a conserved Asp-X-Lys-X-X-Asp motif along with 4 other polar residues. The final Asp of the
conserved motif is part of a different active site from the other Asp and Lys. The active site also
contains a Mg2+ ion which is essential for catalytic activity. It is thought that the positive charge
on the metal ion acts to stabilize the enediolate intermediate and helps shift the equilibrium in
Figure 1-23 towards the enediolate intermediate.150,151
Figure 1-24 - Overall structure of HPS from Mycobacterium gastri. Ribbon is coloured according to
chain. Side chains of active site residues are shown: those from the conserved
Asp-X-Lys-X-X-Asp motif are coloured red and the 4 other polar residues are coloured orange. The
magnesium ions are shown in green.
The crystal structure of a PHI enzyme from the archaeon Methanococcus jannaschii has been
determined, which shows the protein to exist in a tetrameric form of approximately 80kDa. Each
monomer consists of a 5 stranded parallel β-sheet with 2 α-helices on one side and 4 α-helices
on the other; the enzyme is predicted to have 4 identical active sites thought to be located in the
position indicated in Figure 1-25.152 The mechanism of catalysis has not been studied in detail
due to the instability of the substrate, but kinetic experiments using PHI coupled to HPS
suggest that the HPS catalysed aldol reaction is the rate determining step in the RuMP
pathway.153
Figure 1-25 - Left - Overall structure of PHI from Methanococcus jannaschii. Ribbon is coloured
according to chain. Chain A - Red, Chain B - Blue, Chain C - Green, Chain D - Yellow.
Right - Structure of monomeric PHI showing the predicted position of the enzyme's active site.
Structure is coloured according to secondary structure.
Alongside the two formaldehyde detoxification pathways discussed above, two others are
known. The first of these utilizes a glutathione independent formaldehyde dehydrogenase
enzyme, which does not require formaldehyde to add to a cofactor before being oxidized. These
enzymes are found in bacteria but are far less common than GSH-FDHs. GSH-independent
FDH enzymes are homologous to GSH-FDHs but only distantly related (approximately 24%
identity over the full length). The crystal structure of the GSH-independent FDH from
Pseudomonas putida shows that these enzymes have a similar general structure to GSH-FDHs but
with significant differences. Their
method of formaldehyde oxidation is also postulated to be very different. In fact, the overall
reaction involves a dismutation of formaldehyde where one molecule of formaldehyde is
oxidised to formate and another is reduced to methanol. The structural basis for how this
reaction is catalysed is currently poorly understood.154,155
The final formaldehyde detoxification pathway known is a tetrahydromethanopterin (H4MPT)
dependent pathway, which is found in all methylotrophic organisms. Genes encoding this
pathway have also been found to exist in methanogenic archaea and some bacteria of the
Burkholderia genus.119,156 The first step in this process involves the condensation of
formaldehyde with the cofactor H4MPT; this reaction is catalysed by the enzyme “Formaldehyde
activating enzyme” (Fae).157 This adduct is then further metabolised to formate and H4MPT by a
series of dehydrogenases, hydrolases and transferases.119
Although many organisms appear to possess one formaldehyde detoxification pathway, some
organisms rely on more than one of the four discussed above (some methylotrophic bacteria
possess all four). Burkholderia fungorum possesses genes for GSH dependent FDH, GSH
independent FDH and H4MPT dependent dehydrogenase enzymes. All three contribute to
formaldehyde detoxification in this organism with the H4MPT dependent dehydrogenase
pathway contributing least.156 Organisms may also possess several copies of equivalent genes
encoding the same formaldehyde detoxification pathway.158
Aside from detoxification, life has evolved other means to avoid the damaging effects of
formaldehyde. Some enzymes, which catalyse reactions that should produce formaldehyde,
avoid doing so by coupling the reaction to the synthesis of a methylated cofactor. This has been
shown to be the case with several enzymes that catalyse the oxidation of secondary amines at
the same time as producing methylated tetrahydrofolate (5,10-methylene-THF).159
Dimethylglycine oxidase (DMGO) is one of these enzymes. The crystal structure of DMGO from
Arthrobacter globiformis shows that the enzyme possesses two distinct active sites located on
the same polypeptide chain. The N-terminal domain contains an active site that catalyses amine
oxidation using an FAD cofactor whereas the C-terminal domain is bound to a THF molecule and
catalyses the formation of 5,10-methylene-THF. The position of these cofactors in the enzyme is
shown in Figure 1-26.
Figure 1-26 - Structure of DMGO from Arthrobacter globiformis. The N-terminal domain is
coloured blue with FAD shown in yellow and the C-terminal domain is coloured with THF shown in
green.
The reaction scheme in Figure 1-27 shows how the reaction proceeds in the absence of THF.
Initially dimethylglycine is oxidized by the FAD cofactor to form an iminium intermediate;
subsequent hydrolysis of this intermediate results in the production of sarcosine and
formaldehyde. Figure 1-27 also shows how the reaction proceeds in the presence of THF; the
iminium intermediate is demethylated by THF to produce sarcosine and 5,10-methylene-THF.
The latter can be used by the cell in essential “one-carbon metabolism” such as the biosynthesis
of purines. Given the large distance between the two active sites (approximately 42Å) it has been
proposed that the unstable iminium intermediate is channelled from one active site to the other.
This channelling is believed to occur through an internal cavity between the two domains and is
predicted to be fast enough to avoid hydrolysis of the iminium intermediate and release of
formaldehyde.160,161,162,163
Figure 1-27 A/B - Reaction scheme of dimethylglycine oxidation by DMGO in the absence (A) and
presence (B) of THF. Dimethylglycine is oxidised by DMGO using an FAD cofactor to produce
an iminium intermediate. Without THF this intermediate is hydrolysed to produce sarcosine and
formaldehyde. With THF the intermediate is demethylated by the THF cofactor to produce
5,10-methylene-THF and sarcosine.
1.4 Regulation of Formaldehyde detoxification in bacteria
Transcription of genes that encode formaldehyde detoxification pathways has to be regulated
in such a way that their transcripts are at high concentration in the presence of formaldehyde,
and at low concentration when they are not required. In most organisms this appears to be
controlled by TFs (section 1.2.1). Different organisms contain different TFs that regulate their
formaldehyde detoxification pathways. While the enzymes that perform these reactions are well
conserved throughout life, there seems to be greater variation in the types of TF that regulate
these operons. However, organisms that are closely related do tend to have the same or similar
TFs regulating the expression of these pathways. The regulation of formaldehyde detoxification
pathways has only been studied in a few organisms, and the last part of this review will discuss
what has already been studied in this area with regard to individual species.
The first regulatory system involving a formaldehyde detoxification pathway to be studied was
a two-component system from Paracoccus denitrificans. Paracoccus denitrificans detoxifies
formaldehyde using a glutathione-dependent pathway. The FDH gene (flhA) and the FGH gene
(fghA) are transcribed from the same gene cluster and their transcription is thought to be
induced by formaldehyde.122,164 A deletion of two genes showing homology to an HK (flhS) and an
RR (flhR) abolishes formaldehyde induced expression of the glutathione-dependent
formaldehyde detoxification pathway. Sequence analysis suggests that FlhS is not a membrane
bound HK; rather, it is thought to be cytoplasmic. The RR FlhR is predicted to contain an HTH
binding domain. It remains unclear how this system senses formaldehyde.165
A similar system was observed in Rhodobacter sphaeroides, which also uses a GSH dependent
formaldehyde detoxification pathway that is induced in the presence of formaldehyde.166 A two
component system composed of the HK AfdS (50% identity to FlhS over full length) and the
RR AfdR (56% identity to FlhR over full length) is thought to play a similar role to the
FlhSR system in Paracoccus denitrificans. As with FlhSR, the details of how formaldehyde levels
are detected by AfdS are unknown. In addition to the AfdSR system, this organism contains
another two component system that contributes to the regulation of GSH-FDH and FGH gene
transcription. This system consists of the HK RfdS and the RR RfdR, which show weak homology
to the other two systems (RfdS displays 25% identity to AfdS over full length and RfdR shows
40% identity to AfdR). The RfdSR system has been shown to repress the transcription of the
GSH-FDH gene; however, this repression appears to be independent of formaldehyde levels. RfdS
may therefore respond to a different signal from that of AfdS.167
Bacillus subtilis has been shown to detoxify formaldehyde via both a GSH dependent and a RuMP
pathway. In both pathways, transcription of the genes encoding the enzymes involved is increased
by formaldehyde.168,169 The genes encoding HPS (hxlA) and PHI (hxlB) from the RuMP pathway are
transcribed from the hxlAB operon. Divergently transcribed from the hxlAB operon is hxlR, a gene
encoding a TF. Deletion of hxlR abolishes formaldehyde induced expression of hxlAB, indicating
that the encoded TF HxlR is an activator of the operon and that formaldehyde somehow induces
activation.168 HxlR was the first member of its family to be characterized, and the family is
therefore named the HxlR family of TFs. The HxlR family is part of the GntR superfamily of
proteins, which are dimeric and contain an N-terminal HTH DNA binding domain and a C-terminal
effector binding/oligomerisation domain.170 Purified HxlR protein was found to bind specifically
to the hxlAB promoter directly upstream of the -35 region at two 25bp TFBSs designated BRH1 and
BRH2 (Figure 1-28).
Figure 1-28 - Genetic organization of the hxlAB operon in Bacillus subtilis. hxlR is divergently
transcribed from hxlAB with BRH1 and BRH2 located in the intergenic region.
An HxlR dimer binds to each TFBS, with each site containing the consensus sequence
5′-(A/C)AAGT(A/G)(A/C)CT(A/T)-3′. HxlR has a higher affinity for BRH1 than BRH2 and binding
was found to be independent of formaldehyde. It has been postulated that HxlR is always
bound to the corresponding TFBSs and that, following complex formation with the effector
molecule (likely to be formaldehyde but not necessarily), a conformational change allows
transcription of the hxlAB operon.171 The details of how formaldehyde causes increased
transcription of the hxlAB operon remain unclear. Interestingly, methylglyoxal also causes a
significant increase in expression of the hxlAB operon, indicating that HxlR may be capable of
sensing other aldehydes.169
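The degenerate consensus quoted above can be expressed as a regular expression, which gives a simple way of locating candidate HxlR sites in a DNA sequence. The following Python sketch is illustrative only: the function name is an assumption, only the given strand is scanned, and the test sequence is the unlabelled BRH1F oligonucleotide listed in Table 2-2.

```python
import re

# Consensus 5'-(A/C)AAGT(A/G)(A/C)CT(A/T)-3' written as a character-class pattern.
# The lookahead wrapper lets overlapping candidate sites be reported as well.
CONSENSUS = re.compile(r"(?=([AC]AAGT[AG][AC]CT[AT]))")


def find_consensus_sites(seq):
    """Return (0-based start, matched 10-mer) for each consensus hit on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(seq)]


# BRH1F from Table 2-2, without its Alexa555 label.
brh1f = "CTCTCCTCACAGTATCCTCCAAGTAACTTGTTG"
print(find_consensus_sites(brh1f))  # [(19, 'CAAGTAACTT')]
```

A match on this probe is expected because the oligonucleotide was designed around BRH1; scanning the complementary strand would require reverse-complementing the input first.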
Unlike most GSH dependent pathways, in Bacillus subtilis there appears to be no FGH enzyme
suggesting that S-formylglutathione is metabolized by a different pathway. The gene encoding
the GSH-FDH (adhA) is located in the adhA–yraA operon (Figure 1-29), which also contains one
other gene, yraA, which encodes a cysteine proteinase. Upstream of this operon is a gene called
yraC, which encodes a carboxymuconolactone decarboxylase.
Figure 1-29 - Genetic organization of the adhA-yraA operon in Bacillus subtilis. adhR is divergently
transcribed from the adhA-yraA operon. TFBSs are located at the adhA promoter in the intergenic
region between the adhA-yraA operon and at the yraC promoter and the adhR promoter in the
intergenic region between adhR and yraC.
Transcription of the adhA–yraA operon and yraC is induced by formaldehyde and requires the
presence of adhR, a gene encoding the activator TF AdhR. AdhR is also autoregulated. AdhR is a
member of the MerR family of TFs which are largely involved in metal sensing but also sense
other signals such as reactive oxygen species (ROS) (including formaldehyde). The MerR family
of TFs are homodimers that contain an N-terminal HTH motif and function as transcriptional
activators.172 AdhR specifically binds to an 18bp consensus inverted repeat region just
overlapping the -35 region of its target promoters; this interaction is not affected by the
presence of formaldehyde. A conserved Cys52 residue was found to be essential for
formaldehyde induced activation of adhA-yraA and yraC by AdhR. It has been speculated that
this residue becomes methylated in the presence of formaldehyde; this is thought to induce a
conformational change in AdhR resulting in transcription activation. This is plausible,
although there is no direct evidence that this is the case.169 The idea that thiol modification is
the sensing mechanism is nevertheless consistent with the fact that cysteine modification often
plays a role in the detection of ROS. TFs can be modified by various mechanisms such as disulphide bond
formation or irreversible oxidation causing a response in the transcription of genes involved in
ROS metabolism.173 Addition of formaldehyde to cultures of Bacillus subtilis results in a global
response, inducing transcription of many other genes that do not detoxify formaldehyde but
help the cell repair from the damage incurred. These include: activation of TFs that cause a
restoration of cellular cysteine levels, induction of genes regulated by the global TF LexA which
controls genes for repairing DNA damage, induction of genes that encode proteins to repair and
degrade cross-linked proteins (e.g. yraA) and, interestingly, induction of genes
controlled by metal sensing TFs such as ArsR (section 1.2.1).169
A homologue of AdhR that is required for formaldehyde and methylglyoxal induced GSH-FDH
expression has been documented in Streptococcus pneumoniae. This protein, NmlR, has 46%
identity to AdhR over the first 112 residues (AdhR contains a 28 amino acid sequence at the
C-terminus that is not present in NmlR), with Cys52 being conserved, indicating that this residue
may have an important function.172
An interesting operon encoding a GSH-FDP was found to be conserved on plasmids of
periodontal and emetic strains of Bacillus cereus. Plasmids of strains B. cereus AH818 and AH280
(periodontal) and B. cereus AH187 (emetic) were sequenced. Each strain was shown to contain a
~270kb plasmid, called pPER272 in the periodontal strains and pCER270 in the
emetic strain.174 Directly upstream of the GSH-FDP is a gene encoding a HxlR family protein. The
genetic organization of this operon is depicted in Figure 1-30, which shows that the gene is
divergently transcribed from the GSH-FDP. This is therefore a similar arrangement to that found
in the hxlAB operon in Bacillus subtilis but with an encoded GSH-FDP rather than a RuMP
pathway. The gene product of BcAH187_pCER270_0216 (HxlR-pCER270) is 39% identical to the
HxlR TF from Bacillus subtilis. This similarity indicates that the encoded TF may function in a
similar way to HxlR.
Figure 1-30 - Genetic organization of a formaldehyde detoxification pathway located on
pCER270_0216 from Bacillus cereus AH187. BcAH187_pCER270_0216 is divergently transcribed
from frmA and fgh.
E. coli also utilizes a GSH dependent pathway which is induced in response to formaldehyde.175
The GSH dependent pathway is encoded by a three gene operon known as the frmRAB operon,
which encodes a GSH-FDH (frmA), a FGH (frmB) and a TF (frmR) (Figure 1-31).
Figure 1-31 - Genetic organization of the frmRAB operon in E. coli. All three genes are transcribed
in the same direction as part of one transcriptional unit.
If frmR is inactivated then transcripts of frmA and frmB are significantly increased; if E. coli is
treated with formaldehyde then levels of all three genes of the operon are increased.124,176 These
findings imply that the TF FrmR is a repressor of all three genes of the frmRAB operon and that
formaldehyde causes derepression. It is not understood how FrmR represses the frmRAB
operon or how formaldehyde causes derepression. FrmR is a member of the largely
uncharacterized yet widespread DUF156 family of TFs. Only two other types of proteins from
this family have been characterized, both of which are involved in metal transport. These are
RcnR from E. coli, which regulates genes involved in nickel transport, and the CsoR proteins
from Mycobacterium tuberculosis, Thermus thermophilus, and Bacillus subtilis, which regulate
genes involved in copper transport. All these proteins have been shown to be repressors that
bind to an inverted repeat region overlapping the -35 and -10 sequences. In all cases
derepression is caused by the TF binding to a metal ion (Ni in the case of RcnR and Cu in the
case of CsoR). The X-ray crystal structures of CsoR from Mycobacterium tuberculosis and
Thermus thermophilus have shown that these proteins are α-helical without any β-sheet
contribution. Interestingly, despite being DNA-binding proteins they lack any known DNA
binding motif and how DUF156 TFs bind to their TFBS remains unknown.177,178,179,180 A diagram
summarising the relationships between the TF families that are researched in this study is
shown in Figure 1-32.
Many details of how FrmR, the most widespread TF of FDPs, functions remain unknown. In
contrast, HxlR has been characterised to some degree, providing a platform to study the
molecular mechanism of formaldehyde sensing. This thesis aims to further characterise FrmR
from E. coli, as well as the HxlR from Bacillus subtilis and HxlR-pCER270 from Bacillus cereus
AH187.
Figure 1-32 - Diagram showing relationships between the families of TFs researched and
discussed in this study. Particular protein families are boxed in blue and particular TFs are boxed
in red
1.5 Overall Aims and Objectives
This research sets out to obtain a further understanding of how bacteria sense formaldehyde
and how this relates to the regulation of formaldehyde detoxification pathways. In order to
achieve this aim, several approaches will be taken. As the regulation of formaldehyde
detoxification is known to be controlled by TFs in several organisms, this research will focus on
some of these regulator proteins. This will include an in vitro analysis of their biophysical
properties and investigation of their interactions with other species such as target promoters
and formaldehyde. In order to do this it is necessary to acquire the TFs in a pure form in
solution which will be attempted through molecular biology techniques. Understanding protein
function can often be facilitated by detailed knowledge of their structures. This study will
therefore attempt to obtain high resolution structures of these TFs by using X-ray
crystallography. Additionally, it is hoped to study FrmR, the TF from E. coli, in vivo by
constructing a plasmid based reporter system. Such a reporter system will allow FrmR activity
to be monitored in response to different stimuli and point mutations. The overall strategy is
thus to acquire information regarding how these TFs might function and how this relates to
the cell's metabolic response to formaldehyde.
2 Materials and Methods
2.1 Materials
2.1.1 Chemicals and Reagents
All chemicals and reagents used in this study were purchased from Sigma Aldrich Company Ltd
or BDH unless otherwise stated. All solutions were aqueous in deionised, distilled water unless
otherwise stated.
2.1.2 Enzymes and other proteins
A list of the proteins obtained commercially that are used in this study is shown in Table 2-1.
Protein                                Supplier   Application
BamHI                                  NEB        Endonuclease, digests GGATCC
HindIII                                NEB        Endonuclease, digests AAGCTT
NdeI                                   NEB        Endonuclease, digests CATATG
DpnI                                   NEB        Endonuclease, targets methylated DNA
Phusion High Fidelity Polymerase       NEB        DNA polymerase used in PCR
Calf intestinal alkaline phosphatase   NEB        Hydrolyses 5′ and 3′ phosphate groups of DNA
T4 DNA ligase                          NEB        Catalyses formation of phosphodiester bonds between DNA molecules
“In-Fusion” enzyme                     Clontech   Catalyses formation of phosphodiester bonds between DNA molecules
RNAase                                 QIAGEN     Used in plasmid preparation to digest RNA molecules
RNAase                                 Sigma      Used as standard in protein molecular weight determination
BSA                                    Sigma      Used as standard for protein concentration estimation and molecular weight estimation, added to some restriction digests and used as a control in fluorescence spectroscopy
Carbonic anhydrase                     Sigma      Used as standard in protein molecular weight determination
Ovalbumin                              Sigma      Used as standard in protein molecular weight determination
Lysozyme                               Sigma      Used in cell lysis procedure (catalyses hydrolysis of the cell wall)
DNAase                                 Sigma      Used in cell lysis procedure (hydrolyses phosphodiester bonds of DNA molecules)
Table 2-1- Proteins used in this study that were obtained commercially
2.1.3 Oligonucleotides
All oligonucleotides were purchased from Eurofins Scientific. A list of the oligonucleotides used
in this study is shown in Table 2-2.
Name            Sequence (5′-3′)
frmR_F          GCTGACTGAGCAACTTAATCTCGG
frmR_R          GGAATACACCTTCCGGGTCATCGC
frmR_Nde1       GATGAGGTGCCATATGCCCAGTACTC
frmR_BamH1      GTTTACCGGGATCCAATGCAACGGCA
frmR_Hind111    GTAATAGATTAAGCTTTTTAAGATAGGC
frmRmutnde1F    TTTTGTTTAACTTTAAGAAGGAGATATACCATATGCAGCAGCCATCATCAT
fmrRmutnde1R    ATGATGATGGCTGCTGCATATGGTATATCTCCTTCTTAAAGTTAAACAAAA
hxIR_F          GCTCTTAGGCCTTCATTGATGACG
hxIR_R          GCCGCAATCATTTCCACTAAACAT
hxIR_Nde1       AAGGGGGGATTCCATATGAGCCGGAT
hxIR_BamH1      TGCTGCGTTCGATCCTTTTTTATTGC
hxIR_Hind111    TTGCGAAGAGCAAGCTTCAACGATTC
cer24b1F        AAGGAGATATACATATGGTGATTCATTATAAAGATAAAG
cer24b1R        GTCATGCTAGCCATATGGGACAAGGAAGGTTCAATTGCGC
frmRABF         CCGTTGCATTGGATCCCGTCTGAATGACCCGCGCGGCACTGG
frmRABR         CCGGAGTACTGGGCATATGGCACCTC
kanRf           CAGTAATACAAGGGGTCATATG
kanRR           GTTAGCAGCCGGATCCCTTAGAAAAACTCATCGAGCATC
frmRK10f        GGAAGAGAAGAAAGCGGTCCTTACTCG
frmRK10r        CGAGTAAGGACCGCTTTCTTCTCTTCC
frmRT13f        GGTCCTTGCTCGAGTTCGTCGTATTCG
frmRT13r        CGAATACGACGAACTCGAGCAAGGACC
frmRR14f        GGTCCTTACTGCAGTTCGTCGTATTCG
frmrR14r        CGAATACGACGAACTGCAGTAAGGACC
frmRR16f        CTCGAGTTCGTGCTATTCGGGGGCAG
frmRR16r        CTGCCCCCGAATAGCACGAACTCGAG
fmrRR17f        CTCGAGTTCGTGCTATTCGGGGGC
frmRR17r        GCCCCCGAATAGCACGAACTCGAG
frmrR19f        CGAGTTCGTCGTATTGCGGGGCAGATTGATGC
frmRR19r        GCATCAATCTGCCCCGCAATACGACGAACTCG
frmRR46f        CCATTAGCCGCGCCCGCAACGGCAGCGATC
frmRR46r        GATCGCTGCCGTTGCGGGCGCGGCTAATGG
frmRG47f        GCCGTTCGGGCCGCGGCTAATGG
frmRG47r        CCATTAGCCGCGGCCCGAACGGC
frmRK91f        CGTGCCTATCTTGCATAGCTGAATCTATTACC
frmRK91r        GGTAATAGATTCAGCTATGCAAGATAGGCACG
fmrRAB150F      ATTAGCCCCCCCCCCTTTCCT
fmrRAB150R      GGCATTTCGCACCTCATCATCTGC
cerBiotin       Biotin-CCTTGTCCTTATAATGAATAACC
fmrRABBiotin    Biotin-GGTCTGCAACTTGCAGCCCGTCTGACC
cerigR          GGCTTAAATGCAACAGCAGCTCTAC
DehaloF         TAATAATCTCCTTTACATTAGGC
DehaloR         TTAATCTGCGGAATTTATC-Biotin
BRH1F           Alexa555-CTCTCCTCACAGTATCCTCCAAGTAACTTGTTG
BRH1R           CAACAAGTTACTTGGAGGATACTGTGAGGAGAG
GFPF            GGAGAAATTACATATGAGAGGATCGGG
PGFPR           ATGGGGTTCCAAGGTTAACCCAAAATGGG
frmRC36AF       GCTGGAGGGTGATGCCGAAGCCCGTGCCATACTCCAACAGATCG
frmRC36AR       CGATCTGTTGGAGTATGGCACGGGCTTCGGCATCACCCTCCAGC
frmRC72AF       GGGAAACGTTTGACCGAAATGAACGCCTACAGCCGCGAAGTCAGCCAATCCG
frmRC72AR       CGGATTGGCTGACTTCGCGGCTGTAGGCGTTCATTTCGGTCAAACGTTTCCC
frmRC36SF       GCTGGAGGGTGATGCCGAAAGCCGTGCCATACTCCAACAGATCG
frmRC36SR       CGATCTGTTGGAGTATGGCACGGCTTTCGGCATCACCCTCCAGC
Table 2-2 - List of oligonucleotides used in this study. Labels are indicated in bold.
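Primer pairs used for site-directed mutagenesis are normally fully complementary to one another, so a listed pair can be sanity-checked by comparing the reverse primer with the reverse complement of the forward primer. The short Python sketch below does this for three of the pairs in Table 2-2; the pair labels and function names are illustrative assumptions, and the check is offered as a transcription aid rather than as part of the original method.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unlabelled DNA sequence (5'-3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Forward/reverse sequences copied from Table 2-2 (labels here are shorthand).
PRIMER_PAIRS = {
    "frmRK10": ("GGAAGAGAAGAAAGCGGTCCTTACTCG", "CGAGTAAGGACCGCTTTCTTCTCTTCC"),
    "frmRT13": ("GGTCCTTGCTCGAGTTCGTCGTATTCG", "CGAATACGACGAACTCGAGCAAGGACC"),
    "frmRC36S": (
        "GCTGGAGGGTGATGCCGAAAGCCGTGCCATACTCCAACAGATCG",
        "CGATCTGTTGGAGTATGGCACGGCTTTCGGCATCACCCTCCAGC",
    ),
}


def check_pairs(pairs):
    """Map each pair label to True if the reverse primer complements the forward primer."""
    return {name: reverse_complement(fwd) == rev for name, (fwd, rev) in pairs.items()}


print(check_pairs(PRIMER_PAIRS))
```

Pairs that fail such a check are not necessarily wrong (cloning primers, for example, are not complementary to each other), so the test is only meaningful for mutagenesis pairs.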
2.1.4 Bacterial strains
E. coli DH5α (Novagen) was used as a host for cloning and propagation of plasmids. Arctic
Express (Stratagene) was used for all expression trials and large scale growth of recombinant
protein. E. coli K12∆frmR was obtained from the National BioResource Project (NIG, Japan).
Table 2-3 shows the details of each bacterial strain used in this study.
E. coli Strain         Genotype
DH5α                   F- φ80lacZΔM15 Δ(lacZYA-argF)U169 recA1 endA1 hsdR17(rK-, mK+) phoA supE44 thi-1 gyrA96 relA1 λ-
Arctic Express         B F- ompT hsdS(rB- mB-) dcm+ Tetr gal λ(DE3) endA Hte [cpn10 cpn60 Gentr]
K12∆frmR               F- Δ(araD-araB)567, ΔlacZ4787(::rrnB-3), λ-, rph-1, Δ(rhaD-rhaB)568, hsdR514, ΔfrmR::kan
K12∆frmR∆KanR          F- Δ(araD-araB)567, ΔlacZ4787(::rrnB-3), λ-, rph-1, Δ(rhaD-rhaB)568, hsdR514, ΔfrmR
K12∆frmR∆KanR(DE3)     F- Δ(araD-araB)567, ΔlacZ4787(::rrnB-3), λ-, rph-1, Δ(rhaD-rhaB)568, hsdR514, ΔfrmR, (DE3)
Table 2-3 - Genotypes of the E. coli strains used in this study.
2.1.5 Plasmid Vectors
pET-24b and pET-15b were obtained from Novagen. pft-A was obtained from the National
BioResource Project (NIG, Japan).
2.1.6 Growth Media
All bacterial cultures were grown in sterile Luria-Bertani (LB) media from Formedium. One litre
of LB media is composed of 10g tryptone, 5g yeast extract and 10g NaCl. Agar plates for
bacterial growth were comprised of Luria-Bertani (LB) agar obtained from Formedium.
Relevant antibiotics were added to the growth media and agar plates at the concentrations
shown in Table 2-4.
Antibiotic Concentration
Ampicillin (Formedium) 100 µg/mL
Kanamycin (Formedium) 25 µg/mL
Gentamycin (Sigma) 20 µg/mL
Table 2-4 – Concentrations of the antibiotics used in this study.
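As a worked illustration of the quantities above, the following Python sketch scales the LB recipe and the antibiotic working concentrations in Table 2-4 to an arbitrary culture volume. The function names and the 500 mL example volume are illustrative assumptions and are not part of the original protocol.

```python
# LB composition per litre, as stated in section 2.1.6 (grams per litre).
LB_G_PER_L = {"tryptone": 10.0, "yeast extract": 5.0, "NaCl": 10.0}

# Working concentrations from Table 2-4 (micrograms per millilitre).
ANTIBIOTIC_UG_PER_ML = {"ampicillin": 100.0, "kanamycin": 25.0, "gentamycin": 20.0}


def lb_masses_g(volume_ml):
    """Mass of each LB component (g) needed for the given volume of medium."""
    return {name: g_per_l * volume_ml / 1000.0 for name, g_per_l in LB_G_PER_L.items()}


def antibiotic_mass_mg(name, volume_ml):
    """Mass of antibiotic (mg) giving the Table 2-4 concentration in volume_ml."""
    return ANTIBIOTIC_UG_PER_ML[name] * volume_ml / 1000.0


# Example: a hypothetical 500 mL culture supplemented with kanamycin.
print(lb_masses_g(500))
print(antibiotic_mass_mg("kanamycin", 500))
```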
2.2 Molecular Biology Methods
2.2.1 Isolation of E. coli genomic DNA
Genomic DNA was obtained using the ‘PowerSoil™ DNA Isolation Kit’ (MO BIO Laboratories, Inc).
This procedure involves lysis of bacterial cells using both mechanical and chemical methods.
DNA is then bound to a silica membrane, washed and eluted. The procedure used
was as follows: 5mL cultures of E. coli DH5α were grown in LB media at 37°C for 16h. 1.5mL of
this culture was then centrifuged at 10000 rcf for 6 minutes. 60µL of C1 (an SDS containing
buffer) and was added to a PowerSoil tube that contains buffer to facilitate lysis. The tube was
then shaken vigorously for 10 minutes before being centrifuged at 10000 rcf for 10 minutes.
The supernatant (approximately 500µL) was retained and mixed with 250µL of C2 (a buffer that
causes precipitation of non-DNA material) and left for 5 minutes. The solution was then
centrifuged at 10000 rcf for 1 minute and the supernatant retained (approximately 600µL), and
mixed with 200 µL of C3 (an additional buffer to help precipitate any remaining non-DNA
material) and left for 5 minutes. This solution was then centrifuged at 10000 rcf for 1 minute
and the supernatant (approximately 750µL) was taken and mixed with 1.2mL of C4 (a high salt
buffer). 675µL of this solution was then transferred to a Spin Filter and centrifuged at 10000 rcf
for one minute. The flow through was discarded and 500 µL of C5 (an ethanol based wash
buffer) was added to the Spin Filter. The Spin Filter was then centrifuged at 10000 rcf for 1
minute and the flow through discarded; the Spin Filter was then centrifuged again at 10000 rcf
for a further 2 minutes to remove any remaining C5. 100 µL of C6 (a low salt buffer) was then
added to the Spin Filter which was left for 5 minutes and then centrifuged at 10000 rcf for 1
minute. The flow through contained the genomic DNA and was stored at -20°C. This DNA was
used as a template for the amplification of frmR and the frmRAB promoter by PCR (see section
2.2.4).
2.2.2 Isolation of Bacillus subtilis DNA
Bacillus subtilis genomic DNA was purchased from ATCC™. This DNA was used as a template for
amplification of the hxlR1 gene using PCR (see section 2.2.4).
2.2.3 Isolation of the hxlR2 gene and its promoter region from Bacillus cereus AH818
The DNA used for amplification of the hxlR2 gene and its promoter was synthesised by Eurofins;
genomic DNA was therefore not required for amplification of these sequences.
2.2.4 Polymerase Chain Reaction
PCR was carried out using a T1 Plus Thermocycler from Biometra. The polymerase enzyme used
was “Phusion high fidelity polymerase” obtained from NEB. Reactions were performed in a 30µL
solution containing the enzyme and the recommended buffer from the manufacturer, 0.4µM of
each primer, 1µL DMSO (NEB), between 1 and 10ng of template DNA and 0.3mM of each DNA
nucleotide (NEB). The cycle program is shown in Table 2-5. When a reaction was unsuccessful,
the procedure was modified by varying the annealing temperature and extension times.
Product DNA was purified using a “PCR purification kit” from QIAGEN.
Temperature                                                    Time       Number of cycles
94°C                                                           240s       1
94°C                                                           30s        25-30
3°C above Tm of lowest Tm primer, or 72°C if the Tm is >72°C   30s        25-30
72°C                                                           20s/kbp    25-30
72°C                                                           360s       1
Table 2-5 - Procedure for PCR used in this study.
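The cycling conditions in Table 2-5 depend on two inputs, the primer melting temperatures and the product length, so they can be written as a small calculation. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the primer Tm values must be supplied by the user (no Tm formula is assumed here), and the annealing rule is implemented literally as written in the table.

```python
def annealing_temp(primer_tms_c):
    """Annealing temperature: 3 degC above the lowest primer Tm, or 72 degC if that Tm exceeds 72 degC."""
    lowest = min(primer_tms_c)
    return 72.0 if lowest > 72.0 else lowest + 3.0


def extension_seconds(product_bp):
    """Extension time at 20 s per kilobase pair of expected product."""
    return 20.0 * product_bp / 1000.0


def pcr_program(primer_tms_c, product_bp, cycles=30):
    """Return the Table 2-5 program as (step, temperature degC, time s, repeats) tuples."""
    if not 25 <= cycles <= 30:
        raise ValueError("Table 2-5 specifies 25-30 cycles")
    return [
        ("initial denaturation", 94.0, 240.0, 1),
        ("denaturation", 94.0, 30.0, cycles),
        ("annealing", annealing_temp(primer_tms_c), 30.0, cycles),
        ("extension", 72.0, extension_seconds(product_bp), cycles),
        ("final extension", 72.0, 360.0, 1),
    ]


# Example with hypothetical primer Tm values (58.0 and 61.5 degC) and a 1.5 kbp product.
for step in pcr_program([58.0, 61.5], 1500, cycles=25):
    print(step)
```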
2.2.5 DNA purification
DNA fragments >100 bp were purified using the “PCR purification kit” from QIAGEN. The
procedure used was that provided with the kit. The DNA containing solution was diluted five-fold
with a high salt buffer and applied to a small spin column with a silica membrane for two
minutes; the tube was then centrifuged at ~10000 rcf. The bound material was then washed
with 750 µL of a high salt buffer containing ethanol and centrifuged again at ~10000 rcf. The
DNA was then eluted with 50µL of a low salt buffer and centrifuged at ~10000 rcf to obtain the
purified DNA.
2.2.6 Restriction endonuclease digestions
Restriction digest reactions were performed in 10-100µL solutions containing the necessary
concentrations of enzyme and buffer provided by NEB along with the DNA to be digested.
Digestions were carried out at 37°C for the length of time specified by the manufacturer.
Reaction products were purified using the “PCR purification kit” (QIAGEN).
2.2.7 Gel extraction of DNA
DNA was extracted from agarose gels by visualising the gel under UV light and picking the
desired DNA band with a pipette tip. The pipette tip was then flushed in a relevant PCR mix and
the DNA was amplified using PCR.
2.2.8 Agarose Gel Electrophoresis
For the electrophoresis of DNA, typically a 0.8% agarose gel was cast in TAE buffer (40mM Tris,
1.3mM EDTA, 20mM glacial acetic acid, pH 8.5). DNA samples to be loaded onto the gel
contained 1×loading dye (NEB). A 1kb DNA marker (NEB) was also loaded onto the gel. The gel
tank (Geneflow) was run in TAE buffer at 90V for approximately 45 minutes. DNA was then
visualised by exposure to UV-light.
2.2.9 Ligation cloning
Cloning reactions that used “sticky ends” and ligase to join two DNA molecules
first required dephosphorylation of the digested plasmid to avoid self-ligation. A 50µL
aqueous solution of the plasmid DNA was incubated with calf intestinal alkaline phosphatase
and its corresponding buffer (NEB) at the concentrations specified by the manufacturer for 16
hours at 37°C. The dephosphorylated plasmid was then purified with the “PCR purification kit”
(QIAGEN).
Ligations were then carried out in 20 µL reactions with T4 DNA ligase and its corresponding
buffer (NEB). A 150-fold molar excess of insert DNA fragment relative to plasmid was used in the
reaction. 6µL of the reaction product was then used to transform 50 µL of competent E. coli cells.
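Because the number of moles of a double-stranded DNA fragment is proportional to its mass divided by its length, the mass of insert needed for a given molar excess follows directly from the plasmid mass and the two lengths. The Python sketch below illustrates this arithmetic; the function name and the example plasmid and insert sizes are hypothetical and are not taken from this study.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=150.0):
    """Mass of insert (ng) giving `molar_ratio` moles of insert per mole of vector.

    Assumes both molecules are double-stranded DNA, so the mass per mole
    scales linearly with length in base pairs.
    """
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Hypothetical example: 10 ng of a 5000 bp plasmid and a 300 bp insert.
print(insert_mass_ng(10.0, 5000, 300))  # 90.0
```

In this example, 90 ng of insert is needed even though the insert is much shorter than the plasmid, which shows how quickly the required mass grows at a 150-fold molar excess.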
2.2.10 “Non-ligation dependent” cloning
For non-ligation cloning reactions, the “In-Fusion” cloning system (Clontech Laboratories, Inc)
was used. In a 10µL reaction, approximately 200ng of the digested plasmid was incubated with
approximately 200ng of the insert along with the “In-Fusion” enzyme and buffer for 15 minutes
at 37°C, then at 50°C for 15 minutes. The reaction was then cooled to approximately 0°C and
diluted by a factor of 50 in TE buffer (10mM Tris/HCl, 1mM EDTA, pH 8.0). 2 µL of this solution
was then used to transform 50µL of competent E. coli cells.
2.2.11 Preparation of competent cells
To obtain E. coli cells capable of being transformed with a plasmid, they needed to be made
“competent”. A 100mL culture of the E. coli strain was grown to an OD600 of 0.5 and then cooled
at approximately 0ᵒC for 10 minutes. The culture was then centrifuged at 2000rcf for 7 minutes
and the resultant pellet was resuspended in 10mL of 100 mM CaCl2 (at 4ᵒC). The cells were
then kept at ~0ᵒC for 90 minutes and then centrifuged at 2000rcf for 7 minutes and then
resuspended in 2mL of a sterile solution of 100mM CaCl2, 20% glycerol. The cells were then split
into 150 µL aliquots and stored at -80ᵒC.
2.2.12 Transformation of E. coli with plasmids
Stored competent cells were thawed at ~0ᵒC and plasmid DNA, from a cloning
reaction, site-directed mutagenesis or a purified plasmid preparation, was added to the cells. Typically,
1-6 µL of plasmid-containing solution was added to 50µL of competent cells. Cells were then incubated
at ~0ᵒC for 20 minutes, “heat shocked” at 42ᵒC for 1 minute and then cooled at
~0ᵒC for 5 minutes. 450µL of SOC media (20g tryptone, 5g Yeast extract, 10mM NaCl, 2.5mM
KCl, 10mM MgCl2 and 20mM glucose) was then added to the cells and the culture was incubated
for 1 hour. ~150µL of the culture was then spread over an agar plate containing LB media with
the relevant antibiotic. The plate was then incubated at 37ᵒC for ~16 hours.
2.2.13 Plasmid Extraction
Plasmids were extracted from E. coli using a “Miniprep” kit obtained from QIAGEN. The method
is based on that described in 181 and the manufacturer's instructions were followed. 5mL
cultures containing the desired plasmid were grown in LB media containing the relevant
antibiotics for ~16 hours. 3mL of the culture was then centrifuged at ~10000 rcf and the
resulting pellet was resuspended in 250µL of a buffer containing RNase. 250µL of a highly
basic buffer was then added to the suspension and the sample was mixed thoroughly. As soon as
the suspension was fully mixed, 350 µL of an acidic buffer was added. The suspension was then
centrifuged at ~10000 rcf for 10 minutes to give a pellet. The supernatant was removed and
added to a small spin column containing a silica membrane and left to incubate for 2 minutes.
The column was centrifuged at ~10000 rcf for 1 minute and the bound material was washed
with a high salt buffer containing ethanol and again centrifuged at ~10000 rcf for 1 minute. The
plasmid DNA was then eluted with 50µL of a low salt buffer and the purified solution was
obtained by centrifugation at ~10000 rcf for 1 minute.
2.2.14 Protein Expression Trials
5mL cultures of Arctic express cells (Stratagene) containing the plasmid carrying the gene to be
expressed were grown at 37ᵒC to an OD600 of 0.5 in LB media containing the appropriate
antibiotic (as defined by the plasmid encoded resistance gene) as well as gentamycin. Cultures
were then split in half into two 2.5mL cultures. One of these was induced with 1mM IPTG (the
other culture being kept as a negative control). Both cultures were then incubated at 15ᵒC for
16 hours before being split into two 1.25mL samples. Each sample was then centrifuged at 6000
rcf so as to obtain a pellet and the supernatant was discarded. One of the pellets from both the
induced and control samples was resuspended in water resulting in the total cell extract
fraction. The other pellets were resuspended in BugBuster (Novagen) and shaken for 1 hour,
which lyses the cells. This suspension was then centrifuged at 13000 rcf
and the supernatant was taken as the soluble fraction. 3µL of both the
total cell extract and soluble fractions were then subjected to SDS-PAGE analysis.
2.2.15 SDS-PAGE Analysis
SDS-PAGE was performed using pre-cast Run-Blue 12-20% gels (Expedeon) in an XCell SureLock™
tank from Invitrogen. Samples contained Run-Blue loading dye and 500mM β-mercaptoethanol
and were heated at 110ᵒC for 10 minutes prior to loading. Gels were run in Run-Blue running
buffer (Expedeon) at 160V for 45 minutes. Bands were visualised by staining the gel with “instant
blue” (Novexin) for 30 minutes.
2.2.16 Site-Directed Mutagenesis
The first part of the procedure for site directed mutagenesis was identical to that described for
PCR (section 2.2.4). However, the extension times at 72ᵒC were increased to 60s/kbp and the
final step was increased to 720s. Once the PCR was finished, the solution was digested with
DpnI at 37ᵒC for 1h. 6µL of the resulting solution was then used to transform 50 µL of
competent E. coli cells.
2.2.17 Deletion of the KanR cassette from E. coli ∆frmR
pFT-A was transformed into E. coli ∆frmR and 5mL cultures were grown in LB media with
ampicillin at 30°C for ~16 hours. The medium also contained chlortetracycline (20µg/mL) that
had been autoclaved in LB media in order to induce the gene encoding flippase recombinase
(FLP) from Saccharomyces cerevisiae, which is expressed from the pFT-A plasmid.
Cultures were then grown at 40°C in order to remove the heat sensitive pFT-A plasmid. Colonies
were then screened for loss of resistance to both ampicillin and kanamycin. A
colony that was found to be sensitive to both antibiotics was verified by extracting the genomic
DNA and PCR amplifying the region of interest. DNA fragments of the
correct size were sequenced by Eurofins. Colonies containing the desired genotype were
designated E. coli ∆frmR∆KanR.
2.2.18 Lysogenisation of E. coli ∆frmR∆KanR
E. coli ∆frmR∆KanR was infected with λDE3 Phage using a DE3 lysogenisation kit (Novagen) to
create an E. coli ∆frmR∆KanR(DE3) strain. The starting strain was grown in LB media (5mL)
containing MgSO4 (10mM) and maltose (2%) at 37°C to an OD600 of approximately 0.5. 10µL of
this culture was then mixed with 1µL (10⁸ pfu, plaque forming units) of each of the three
provided phage solutions: λDE3, Helper Phage and Selection Phage. The resulting mixture
was incubated for 20 minutes at 37°C before being spread onto an LB-agar plate which was
incubated at 37°C for 16 hours. Six of the resulting colonies were tested for their ability to induce
expression from a T7 promoter using the procedure described in section 2.2.14.
2.3 Protein Production and Purification
Note: All buffers and samples used in this section were kept between 4 and 10ᵒC unless otherwise
stated.
2.3.1 Large Scale Growth for protein production
A 5mL culture of the Arctic Express strain containing the appropriate plasmid was grown in LB
media containing the appropriate antibiotic for 16 hours. This culture was used to inoculate a
further twenty 10mL solutions of the same media which were grown overnight at 37ᵒC. Each of
these 10mL solutions was used to inoculate a 2L flask containing the same media. The 2L
cultures were then grown at 37ᵒC to an OD600 of ~0.65 and then cooled to 15ᵒC for 45 minutes.
Cultures were then supplemented with 1mM IPTG and incubated at 15ᵒC for a further 16
hours. Cells were then harvested by centrifugation at 8000rcf with a Beckman JLA 8.1 rotor for
13 minutes. From a 20L growth, the harvested cells were split into 5 separate samples and
stored at -20ᵒC.
2.3.2 Cell Lysis and extraction
A sample of the harvested cells was thawed at 20ᵒC for 40 minutes and resuspended in 100mL
of buffer A (20mM Tris, 200mM NaCl, pH 7.5). For extraction of FrmR and FrmRC36S, the NaCl
concentration was reduced to 50mM due to the nature of the later purification steps. A
“protease inhibitor tablet, complete, EDTA free” from Roche was dissolved in the suspension
and lysozyme and DNase were added, each at 100µg/mL. DNase was not added when purifying
FrmR and FrmRC36S as this would interfere with downstream heparin purification steps. The
suspension was then placed in a sonicator (Bandelin) at ~0ᵒC and sonicated at an intensity of
20% for 30 minutes with 13 second pulses. The soluble crude extract was then obtained by
centrifugation at 30000rcf in a Beckman 25.10 rotor for 1 hour at 4ᵒC.
2.3.3 Nickel Affinity Purification
The His-tagged proteins FrmR-His, HxlR1-His and HxlR2-His were purified by a nickel affinity
method. Buffered imidazole (pH 7.5) was added to the soluble extract to 20mM and the extract was then incubated
with 1.5mL of nickel-NTA agarose (QIAGEN) at 4ᵒC for 90 minutes. The solution was then
centrifuged at 2000rcf for 5 minutes so as to separate the nickel agarose from the extract
solution. The unbound solution was removed and kept at 4ᵒC for later use. The nickel-agarose
was then washed by re-suspending in 6mL of 20mM imidazole in 20mM Tris, 200mM NaCl, pH
7.5 buffer. The suspension was then centrifuged for 5 minutes at 2000rcf and the supernatant
was removed and kept at 4ᵒC for analysis. This washing procedure was repeated with the same
buffer containing increasing imidazole concentrations at 40mM, 60mM, and 80mM. The bound
“His-Tagged” proteins were eluted by repeating the wash procedure at 300mM imidazole. The
different fractions were then analysed by SDS-PAGE. Pure protein samples were dialysed in
3mL 3.5kDa cut-off GeBAflex™ dialysis tubes (GeBa) against at least a 1000-fold excess in
volume of buffer A (20mM Tris, 200mM NaCl, pH 7.5). If samples were not to be used
immediately, glycerol (20% v/v) was added and samples were stored at -80ᵒC.
2.3.4 Purification of FrmR and FrmRC36S
The non-tagged FrmR and FrmRC36S proteins were purified from cellular extract in two stages.
The first stage used a heparin affinity method. A 5mL Heparin “HiTrap™” column (GE
Healthcare) was fitted onto an “AKTA” purification system (GE Healthcare) and the column was
equilibrated with >25mL of loading buffer A (20mM Tris, 50mM NaCl, pH 7.5). The bound
protein was eluted by a linear gradient with buffer B (1M NaCl, 20mM Tris, pH 7.5) and 4mL
fractions were collected. The amount of protein eluted in each fraction was estimated by
monitoring the absorbance at 280nm. Fractions were then examined by SDS-PAGE and those
containing FrmR/FrmRC36S were pooled and concentrated using a 20mL “Centricon®” protein
concentrator from Sartorius (100kDa molecular weight cut-off).
Size exclusion chromatography (SEC) was used in the next stage of the purification of FrmR and
FrmRC36S. A Superdex 200 column (GE Healthcare) was fitted onto an AKTA purification system
(GE Healthcare). The column was equilibrated with >30mL of buffer A (20mM Tris, 200mM
NaCl, pH 7.5) prior to injection of 0.5mL of the FrmR/FrmRC36S sample. Elution of proteins was
monitored spectroscopically at 280nm, with the elution volume allowing an estimation of molecular
weight by comparison against a series of standards. The procedure for SEC for molecular
weight estimation used BSA, Ovalbumin, Carbonic Anhydrase and RNase (each at 1mg/mL,
0.5mL) as standards. A plot of log10(MW) against elution volume was fitted to a straight line by
linear least squares. This was then used to estimate the MW of FrmR/FrmRC36S proteins. 15µL of
each fraction was analysed by SDS-PAGE and those that contained FrmR/FrmRC36S were
pooled. Samples that were not to be used immediately were made to 20%v/v glycerol and
stored at -80ᵒC.
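The calibration described above amounts to fitting log10(MW) against elution volume for the standards and inverting the fitted line for the unknown. The following is a minimal sketch of that calculation. The molecular weights are approximate literature values and the elution volumes are hypothetical placeholders, not measurements from this study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(elution_volume, slope, intercept):
    """Invert the calibration line: MW = 10 ** (slope * volume + intercept)."""
    return 10 ** (slope * elution_volume + intercept)


# Placeholder calibration data: (approximate MW in Da, hypothetical elution volume in mL).
standards = {
    "BSA": (66000, 13.5),
    "Ovalbumin": (44000, 14.6),
    "Carbonic anhydrase": (29000, 15.8),
    "RNase": (13700, 17.7),
}
volumes = [v for _, v in standards.values()]
log_mws = [math.log10(mw) for mw, _ in standards.values()]
slope, intercept = fit_line(volumes, log_mws)

# Hypothetical elution volume for an unknown sample.
print(f"Estimated MW: {estimate_mw(15.0, slope, intercept):.0f} Da")
```

The slope is negative because larger proteins elute earlier, and the estimate is only meaningful within the range spanned by the standards.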
2.3.5 Protein Concentration Estimation
Protein concentrations were estimated using a “Bio-rad Protein Assay” kit (Bio-Rad) which is
based on the methods described in 182 (commonly termed the Bradford assay). Five solutions
containing BSA ranging from 0.2-1.0mg/mL were prepared. 20µL of each solution was then
added to 980µL of the provided dye. Each solution was mixed and left for 3 minutes prior to
recording the absorbance of the solution at 595nm. A straight line was fitted to a plot of A595
against BSA concentration and used as a calibration curve to estimate the concentration of
unknown protein samples.
2.4 In vitro biochemical and biophysical characterisation methods
2.4.1 Mass-Spectrometry
Mass spectrometry measures the mass to charge ratio of ionized molecules, and thus allows the
molecular mass of the molecule to be determined when the ionization state is known. Mass
spectrometry in this section was performed using an electrospray time of flight system. This
means that ionization of the protein is performed by electrospray in which the protein solution
is pumped through a needle and a high voltage is used to disperse (electrospray) the liquid into
small charged droplets. These droplets then evaporate and transfer some of their charge to the
protein molecules which can be detected by the mass spectrometer 183. Electrospray ionization
produces multiply charged species of the protein, which give rise to many peaks, though these
can be deconvoluted to obtain the molecular mass.184 Prior to electrospray ionization, the
protein sample is run through a reverse phase monolithic column in order to remove all salts
from the protein sample. This is because salt ions “stick” to the protein during ionization
therefore altering its molecular mass 185. Time of flight refers to how the mass to charge ratio is
measured. Ions are accelerated by an electric field to a fixed kinetic energy, and the time
taken for the ions to reach a detector is recorded. From the time recorded, the mass to charge ratio
of the ion can be calculated.186 On measuring the m/z of a protein, signals will be detected that
contain many different combinations of isotopes throughout the polypeptide. As such, a
distribution of signals is recorded and the major peak in the final spectrum represents the
average molecular mass. Peaks of smaller intensity are observed at higher masses which are
commonly due to molecular species that have remained attached to the protein such as sodium
ion(s).185 The technique used is accurate to within 1 Da.
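The deconvolution of multiply charged peaks mentioned above can be illustrated with two adjacent peaks in the charge-state series. The sketch below assumes positive-mode ionization in which each charge is carried by one added proton; the function names and the 10000 Da example protein are hypothetical, and the instrument software used in this study may work differently.

```python
PROTON_MASS = 1.00728  # Da, mass of the proton carried by each positive charge


def mz_of(mass, z):
    """m/z of a species of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON_MASS) / z


def charge_from_adjacent(mz_low, mz_high):
    """Charge z of the peak at mz_high, given the next peak down (charge z + 1)."""
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Neutral molecular mass recovered from one peak of known charge."""
    return z * (mz - PROTON_MASS)


# Hypothetical protein of 10000 Da observed in the 10+ and 11+ states.
peak_10, peak_11 = mz_of(10000.0, 10), mz_of(10000.0, 11)
z = charge_from_adjacent(peak_11, peak_10)
print(z, round(neutral_mass(peak_10, z), 2))
```

In practice every pair of adjacent peaks gives an independent estimate of the mass, and averaging these estimates is one simple way to arrive at a single deconvoluted value.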
Mass spectrometry was carried out by the University of Manchester’s Biomolecular Analysis
Facility. 50 µL of approximately 100µg/mL purified protein in 20mM Tris, 200mM NaCl, pH
7.5, was initially separated from its buffer using a reverse phase monolith column (Dionex) as
part of an “Ultimate chromatography” system (Dionex) which is linked to an “LCT” electrospray
TOF mass spectrometer (Waters). The separated sample was then analysed in the mass
spectrometer. Software associated with the instrument was used to convert the mass spectrometry data
into an accurate molecular mass.
2.4.2 Multi-Angle Light Scattering
MALS can be used to estimate the molecular weights of protein molecules in solution. In these
experiments a protein sample is run down a size exclusion column with online light scattering
and refractive index detectors. An estimate of the molecular weight of a protein can be
calculated by measuring the intensity of light scattered and the change in refractive index when
the protein is eluted. For example, it can be shown that the relationship between molecular
weight (Mw), intensity of light scattered (LS), and refractive index signal (RI) can be
approximated to:
$$M_w = K' \, \frac{LS}{RI} \tag{2.1}$$
Where K’ is an instrument specific calibration constant. This equation shows that the molecular
weight is proportional to LS and inversely proportional to RI. Alternatively, as in this study,
the Mw can be calculated by more accurate and complicated methods using computer software.
Multi-Angle Light Scattering experiments were carried out by the University of Manchester
Biomolecular Analysis Facility. Purified protein samples of approximately 500µg/mL in 20mM
Tris, 200mM NaCl, pH 7.5, were run through an SEC column (Superdex 20,200, GE Healthcare) with an
online DAWN EOS photometer (Wyatt) located at the elution point. A laser light source at
690nm was used and MALS was measured at 18 different scattering angles. Dynamic
light scattering was measured concomitantly with MALS using a QELS instrument
(Wyatt), and the refractive index was measured with an Optilab rEX
refractometer (Wyatt). These data were analysed with “Astra” software, which
used Zimm fitting to fit all the data to a model that is used to estimate the protein's molecular
weight. 187,188
2.4.3 Circular Dichroism (CD)
CD was used to estimate the secondary structure of purified proteins. CD is a measurement of
the differential absorption of left and right circularly polarized light by chiral molecules. CD at a
particular wavelength, ∆A(λ), is therefore calculated as in equation 2.2:
$$\Delta A(\lambda) = A_L(\lambda) - A_R(\lambda) \tag{2.2}$$
Where AL(λ) and AR(λ) represent the absorption of left and right handed circularly
polarized light at wavelength λ, respectively. As proteins consist of chiral amino acids, they
display circular dichroism. The asymmetry of the two main secondary structures that can form within
proteins (α-helix and β-sheet) causes them to display different circular dichroism at particular
wavelengths. This fact can be used to estimate the relative proportions of each secondary
structure within a given protein. α-helices characteristically give negative ∆A at 208nm and
222nm and positive ∆A at 193nm. β-sheets characteristically give negative ∆A at 215nm and positive
∆A at 198nm.189
In CD experiments ΔA is usually measured as ellipticity (θ) in units of millidegrees. ΔA and θ are
interconvertible and are related by:
$$\theta\ (\text{millidegrees}) = \Delta A \times 32982 \tag{2.3}$$
Spectra, however, are then usually presented in units of molar circular dichroism (∆ε), which is defined by
equation 2.4:
$$\Delta\varepsilon = \frac{\Delta A}{c \, l} = \frac{\theta}{32982 \, c \, l} \tag{2.4}$$
Where c and l represent the sample concentration and cell path length respectively. Circular
dichroism experiments were performed on a J-810 CD spectrometer (Jasco). Spectra were
recorded from 260nm to 200nm with a 1nm step. Protein samples were approximately 60µM in
30mM sodium phosphate, 250mM NaCl (pH7.2) and were placed in a 0.5mm pathlength cell. Four
spectra were recorded for each sample; these were averaged to give the overall sample
spectrum.
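The unit conversion and scan averaging described in this section can be sketched as follows. The conversion assumes the factor of 32982 from equation 2.3 together with the Beer-Lambert relation ∆ε = ∆A/(c·l); whether c is the protein or the mean-residue concentration is not stated in the text, so the protein concentration is used here as an assumption. The function names and the millidegree readings are hypothetical.

```python
MDEG_PER_ABSORBANCE_UNIT = 32982  # conversion factor from equation 2.3


def delta_epsilon(theta_mdeg, conc_molar, path_cm):
    """Convert ellipticity (millidegrees) to delta-epsilon (M^-1 cm^-1).

    Assumes delta_A = theta / 32982 (equation 2.3) and the Beer-Lambert
    relation delta_epsilon = delta_A / (c * l).
    """
    delta_a = theta_mdeg / MDEG_PER_ABSORBANCE_UNIT
    return delta_a / (conc_molar * path_cm)


def average_spectra(spectra):
    """Point-by-point mean of repeated scans recorded on the same wavelength grid."""
    return [sum(values) / len(values) for values in zip(*spectra)]


# Hypothetical readings (millidegrees) at three wavelengths from four scans.
scans = [[-10.2, -9.8, 4.1], [-10.0, -9.9, 4.0], [-9.9, -10.1, 3.9], [-9.9, -10.2, 4.0]]
mean_scan = average_spectra(scans)
# 60 uM protein in a 0.5 mm (0.05 cm) cell, as in the conditions described above.
print([round(delta_epsilon(t, 60e-6, 0.05), 1) for t in mean_scan])
```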
The recorded spectra were used for secondary structure analysis. The software used to
interpret CD spectra in terms of secondary structure composition was K2D2. K2D2 uses a type
of artificial neural network called a ‘self-organizing map’ (SOM) that is trained with proteins of
known tertiary structure. This SOM is used to create two secondary structure maps, one for α-
helices and one for β-sheets. K2D2 uses these maps to make a prediction on the relative α-helix
and β-sheet contribution of a protein from its CD spectrum.190
2.4.4 Electrophoretic Mobility Shift Assays (EMSAs)
Non-competitive EMSAs were performed to determine whether purified proteins bind non-
specifically to DNA. The experiment relies on the fact that DNA:protein complexes will run at a
different speed from the free DNA molecule when subject to electrophoresis.191 DNA fragments
were obtained using PCR. A 10% polyacrylamide gel was cast from a 20mL
polymerisation solution. The polymerisation solution
contained 2 mL 0.5×TBE (45mM tris, 45mM boric acid, 1mM EDTA, pH8.0), 2mL 37%
acrylamide (Protogel), 13.59mL ddH2O and 0.4 mL of 10% w/v ammonium persulfate (APS).
10µL of tetramethylethylenediamine (TEMED) was then added to initiate the polymerisation
and the solution was poured into 1.5mm cassettes (Invitrogen) and left for 30 minutes.
Samples (20µL) contained either 60 or 90 ng of DNA and 0-500ng of the protein of interest in binding
buffer (100mM NaCl, 5mM Tris/HCl (pH 7.5), 5mM MgCl2). Additionally for FrmR-His, FrmR and
FrmRC36S, 100mM β-mercaptoethanol was also present. For binding reactions in which the
dependence of DNA:protein complex formation on formaldehyde was tested, the protein was pre-
treated with 5mM formaldehyde prior to mixing with the DNA. DNA:protein samples were
incubated at room temperature for 30 minutes. Prior to loading the gel, 3.5µL of EMSA loading
dye (15% ficoll, 0.02% bromophenol blue and 0.02% xylene-cyanol) was added to each sample. Gels were run in 0.5×TBE
at 90V for approximately 2 hours. Gels were stained in a 20mg/mL Ethidium Bromide solution
for 10 minutes and DNA was visualised under UV-light.
Competitive EMSAs assess the specificity of the protein-DNA interaction. The casting and
running of the gel were identical to those described above for non-competitive EMSAs. Biotin-
labelled DNA fragments were obtained using biotin labelled primers. Samples contained
between 4 and 6ng of the biotin labelled DNA fragment, between 0.3 and 2.0µg of the protein of
interest and 0.5µg Poly(I)·Poly(C). The reaction was performed in binding buffer (see above) for
30 minutes. Again for the FrmR-His, FrmR and FrmRC36S proteins, this was supplemented with
100mM β-mercaptoethanol. For reactions where the effect of formaldehyde is analysed, the
protein was pre-treated with 5mM formaldehyde. Immediately before running of the gel, 3.5 µL
of EMSA loading dye was added to the binding reaction.
The Biotin-labelled DNA was visualized using the LightShift Chemiluminescent EMSA Kit (Thermo
Scientific). The DNA was electro-blotted onto a nitrocellulose membrane in 0.5×TBE buffer at 380V
for 15 minutes in an Invitrogen blotting tank. The membrane was then wrapped in transparent
film and placed under UV light for 15 minutes in order to cross link the DNA to the membrane.
The membrane was then incubated in 25mL of the provided “blocking buffer” (100mM Tris/HCl,
0.5% SDS, 10g/L BSA, pH8.0) for 15 minutes. The membrane was then transferred to 25mL
“blocking buffer” with 50 µL Streptavidin-horseradish peroxidase conjugate and incubated for a
further 15 minutes. The Streptavidin-horseradish peroxidase conjugate binds to the biotin
labelled DNA. The membrane was then incubated in 25mL of the provided “wash buffer” (100mM
Tris/HCl, 0.5% SDS, pH8.0) for 15 minutes. This wash step was repeated another two times. The
membrane was then incubated in 1mL of the provided luminol/enhancer solution for 5 minutes.
The horseradish peroxidase in the bound Biotin-Streptavidin-horseradish peroxidase complex acts on this solution to produce
chemiluminescence. The membrane was then exposed to photographic film in the dark, which was
subsequently developed for 10-120 minutes.
2.4.5 Fluorescence Spectroscopy
Fluorescence of an Alexa Fluor-555 labelled DNA molecule (Eurofins) was monitored in a Cary
Eclipse Fluorescence Spectrophotometer (Varian) in a 1mm Quartz cuvette with excitation at
555nm and emission measured at 565nm. The excitation and emission slits were both set at
5mm. 10µM labelled DNA was titrated with increasing concentrations of protein up to
a molar excess of 1.5 (protein:DNA). The reaction was performed in 30mM Phosphate buffer,
50mM NaCl, pH7.2. Protein was taken from a highly concentrated stock solution (>200 µM). A
small correction was made to the fluorescence value to account for the dilution. This correction
assumes linearity in fluorescence signal with regard to the fluorophore concentration. Five
repeats were carried out for each sample with the average value taken for each point along with
its associated standard deviation.
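The dilution correction and the averaging of repeats can be sketched as below. The correction multiplies each reading by the dilution factor, which follows from the stated assumption that the signal is linear in fluorophore concentration. The function name, volumes and readings are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def correct_for_dilution(fluorescence, initial_volume_ul, added_volume_ul):
    """Scale an observed reading back to the undiluted fluorophore concentration.

    Assumes signal is linear in fluorophore concentration, so the reading is
    multiplied by the dilution factor (V0 + v) / V0.
    """
    return fluorescence * (initial_volume_ul + added_volume_ul) / initial_volume_ul


# Hypothetical repeat readings at one titration point: 200 uL sample, 10 uL protein added.
readings = [412.0, 405.5, 409.8, 415.1, 407.6]
corrected = [correct_for_dilution(f, 200.0, 10.0) for f in readings]
print(f"{mean(corrected):.1f} +/- {stdev(corrected):.1f}")
```

Using a concentrated protein stock keeps the added volume, and therefore the size of this correction, small.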
2.4.6 In vivo experiments using the PGFPR plasmid
All experiments were performed using a Synergy HT plate reader from Biotek. Excitation was at
395nm and fluorescence was measured at 509nm. Cell growth was monitored by measuring
OD600. Colonies were picked from an agar plate and used to inoculate 5mL of minimal media
with ampicillin and cultures were grown at 37ᵒC overnight. The overnight culture was then
diluted 50× into either: minimal media with ampicillin; minimal media with ampicillin and IPTG
(75µM); or minimal media with ampicillin, IPTG (75µM) and formaldehyde (0.3mM). For each
test sample, 150µL of the inoculated media was added to a 96 well plate. After 14 hours, the
relative fluorescence of each well was calculated by dividing the absolute fluorescence value by the
OD600 value. Only wells whose OD600 values were within 0.05 of each other were
used in the analysis. Five independent repeats of each condition were performed and averaged
to give the reported values.
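The normalisation of fluorescence to cell density can be sketched as follows. The text states only that the wells used had OD600 values within 0.05 of each other; filtering against the median OD600, as done here, is an assumption, and the function name and readings are hypothetical.

```python
from statistics import mean, median


def relative_fluorescence(wells, od_tolerance=0.05):
    """Fluorescence / OD600 for wells whose OD600 is close to the median OD600.

    `wells` is a list of (fluorescence, od600) pairs. Filtering against the
    median is an assumption; the text only states that the wells used had
    OD600 values within 0.05 of each other.
    """
    centre = median(od for _, od in wells)
    return [f / od for f, od in wells if abs(od - centre) <= od_tolerance]


# Hypothetical readings for one condition: (fluorescence, OD600) per well.
wells = [(1200.0, 0.60), (1180.0, 0.62), (900.0, 0.40)]
values = relative_fluorescence(wells)
print(len(values), round(mean(values), 1))
```

Dividing by OD600 removes differences that arise simply from there being more cells in a well, so that the remaining signal reflects expression per unit of culture density.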
2.4.7 In vivo experiments using the pKanRR plasmid
All in vivo experiments were performed using a Synergy HT plate reader from Biotek. Cell
growth was monitored by measuring OD600 as a function of time. All experiments were
performed in one of three different LB solutions. These were named Media A, Media B and
Media C as defined in Table 2-6.
Table 2-6 - Media used for in vivo experiments with the pKanRR plasmid
| Media | Contains |
|---|---|
| A | Kanamycin (50µg/mL) |
| B | IPTG (75µM) |
| C | Kanamycin (50µg/mL), IPTG (75µM) |
Colonies were picked from an agar plate and used to inoculate 5mL of LB with ampicillin and
cultures were grown at 37ᵒC overnight prior to inoculation of the test media. The overnight
culture was then diluted 50× into its appropriate media solution. For each test sample, 150µL of
the inoculated media was added to four wells of a 96 well plate. The plate was then inserted into
a plate reader kept at 25ᵒC with continuous shaking and the absorption at 600nm was
recorded every 14 minutes for approximately 18 hours. The absorption values for each of the
four wells were averaged to give the value taken for 1 replicate. At least three replicates were
recorded for each sample in which the initial starting culture came from a separate colony from
a separate transformation. For in vivo experiments that were testing the effect of formaldehyde,
0.3mM formaldehyde was added to the media.
2.5 Bioinformatic analysis
2.5.1 General Use of Databases
Sequence databases were accessed through the Entrez server (www.ncbi.nlm.nih.gov/Entrez).
Genomes were viewed and DNA and protein sequences were retrieved
through this server.
2.5.2 BLAST searches
BLAST is a tool that retrieves sequences from databases that display similarity to a particular
input sequence. Sequence matches or ‘Hits’ are ranked according to a statistical parameter
called the ‘Expect value’ (E-value). The E-value is the number of hits with an equal or better
score that would be expected to occur by chance in a database of the same size.192 BLAST was accessed from the Entrez server
(www.ncbi.nlm.nih.gov/Entrez). The parameters used for the BLAST searches performed in this
study are given in Table 2-7.
| Variable | Setting |
|---|---|
| Max target sequences | 1000 |
| Expect threshold | 10 |
| Word size | 3 |
| Max matches in a query range | 0 |
| Matrix | BLOSUM62 |
| Gap costs | Existence: 11, Extension: 1 |
| Compositional adjustments | Conditional compositional score matrix adjustment |
| Database | Non-redundant protein sequences |
Table 2-7: Parameters used in BLAST searches
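The E-value and the probability of a chance match are related but not identical. If the number of chance hits is assumed to follow a Poisson distribution, as in the standard BLAST statistics, the probability of finding at least one such hit is P = 1 − exp(−E). The short sketch below (the function name is hypothetical) shows that the two quantities are almost equal for small E and diverge as E approaches the expect threshold of 10.

```python
import math


def p_value_from_e(e_value):
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    # expm1 keeps precision for very small E-values.
    return -math.expm1(-e_value)


for e in (1e-10, 0.01, 1.0, 10.0):
    print(f"E = {e:g}  ->  P = {p_value_from_e(e):.3g}")
```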
2.5.3 Sequence alignments
All sequence alignments were performed using the program ClustalW2. Given multiple
sequences, the algorithm uses progressive pairwise alignment to generate a multiple sequence
alignment (MSA). Initially,
pairwise alignments are constructed between all sequences to create a distance matrix which
records the similarity between each pair of sequences. This matrix is then used to construct a guide tree
that is used as an aid to perform the multiple sequence alignment. The algorithm starts by
aligning sequences at the tips of the guide tree, therefore aligning the most similar sequences
first. Successive sequences are added to the alignment based on the branching of the tree so that
the least similar sequences are added to the MSA last. 193 ClustalW2 was accessed using a server
from EMBL and the parameters used are displayed in Table 2-8.
| Variable | Setting |
|---|---|
| Protein weight matrix | Gonnet |
| Gap opening penalty | 10 |
| Gap extension penalty | 0.20 |
| Gap distance penalty | 5 |
| No end gaps | False |
| Number of iterations | 1 |
| Clustering type | Neighbour Joining |
Table 2-8: Parameters used with ClustalW2 to generate multiple sequence alignments
Sequence alignments were edited for viewing using the software Jalview.194
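The order in which a progressive aligner adds sequences can be illustrated with a toy example. The sketch below is not ClustalW2 itself: it assumes the sequences are already of equal length so that a simple mismatch fraction can stand in for pairwise alignment scores, and it uses average-linkage clustering in place of the Neighbour Joining method listed in Table 2-8. The sequences and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guide_order(seqs):
    """Return the order in which clusters are merged (average linkage)."""
    clusters = {name: [name] for name in seqs}
    merges = []

    def avg_dist(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(p_distance(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of clusters; these would be aligned next.
        c1, c2 = min(combinations(sorted(clusters), 2), key=lambda p: avg_dist(*p))
        merges.append((c1, c2))
        clusters[c1 + "+" + c2] = clusters.pop(c1) + clusters.pop(c2)
    return merges


# Hypothetical toy sequences; A and B are most similar, D is the most divergent.
seqs = {"A": "MKTAYIAK", "B": "MKTAYIAR", "C": "MKSAYLAR", "D": "GQSPWLVH"}
for step in guide_order(seqs):
    print(step)
```

The most similar pair is merged first and the most divergent sequence last, which mirrors the tips-to-root order described above.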
2.5.4 Secondary structure prediction
Jpred 3 was used to predict the secondary structure of proteins from their sequence. Jpred 3
initially uses the PSI-BLAST algorithm to generate a multiple sequence alignment (MSA); this is
then used as input into an algorithm called Jnet. Jnet uses an artificial neural network that has
been trained with MSAs containing a known protein structure to make a prediction on whether
an individual residue will be part of an α-helix or β-sheet or random coil. The use of MSAs
improves the accuracy of the secondary structure prediction because it contains evolutionary
information that is not present in a single sequence. Jpred 3 has been tested to be 81% accurate
(average accuracy of predicting α-helix, β-sheet and random coil) and each residue prediction
comes with an associated confidence score from 0 (low) to 9 (high).195,196
2.5.5 DNA binding residue prediction
Five different programs were used for predicting potential DNA-binding residues. These were:
DBindR 197, BindN 198, DNAbindR 199, DP-Bind 200 and ProteDNA 201. See the associated references
for a detailed description of each program. Table 2-9 lists the settings used for each
program.
| Program | Settings Used |
|---|---|
| DBindR | Prediction method: Random Forest |
| BindN | Specificity: 80% |
| DNAbindR | No editable variables for this program |
| DP-Bind | Encoding Method: PSSM-based |
| Prote DNA | No editable variables for this program |
Table 2-9: Settings used in DNA binding residue prediction algorithms
2.6 X-Ray Crystallography
2.6.1 Background
X-Ray crystallography was used to determine/attempt to determine the high resolution
structure of proteins in this study. When X-rays encounter a material, they are scattered by the
electrons within it and the extent of this scattering is proportional to the electron density. In
non-ordered materials, this scattering will interfere predominantly in a destructive way.
However, in crystalline materials, the regular spacing between the atoms sets up a condition for
the scattered X-rays to occasionally interfere in a constructive way and produce diffraction.
A crystal is built up from a unit cell; the unit cell is the smallest group of atoms that can generate
the entire crystal lattice when repeated in three dimensions. The atoms in a crystal can be
regarded as lying on sets of parallel, regularly spaced planes. A simple model to demonstrate the
principle of diffraction between two such planes is shown in Figure 2-1. The model is simplified
because it considers only two dimensions, with the electron density sitting directly on these
parallel planes (red lines in Figure 2-1).
Figure 2-1 - Schematic showing the condition for diffraction between two parallel planes. The red
lines represent planes of atoms. The blue arrows represent X-rays. The beams are scattered by
the atoms at the same angle at which they are incident to the planes i.e. θ. It is evident that A-B = d
and therefore B-C = d sin θ.
Figure 2-1 shows that:
$$BC = d \sin\theta \tag{2.5}$$
Where BC is the distance between points B and C in Figure 2-1 and d is the spacing between the
planes and θ is the angle of incidence and reflection of the X-ray beam. Constructive interference
occurs when two waves are in phase and this occurs when the path difference between them
(here 2 × BC = 2d sin θ) is
equal to an integer multiple of λ (wavelength). Therefore, if two X-ray beams are impinging on a
crystal with two planes separated by distance, d, at the same angle θ, the scattered waves can
only interfere constructively provided the following equation is satisfied:
$$n\lambda = 2 d \sin\theta \tag{2.6}$$
Where n is an integer and λ is the wavelength of the radiation. This is called Bragg’s law and if
this condition is not met, destructive interference will cancel out any scattered radiation. For a
3D crystal, the diffraction pattern is a pattern of spots scattered around the main incident beam.
Each spot (or reflection) represents a point where constructive interference has occurred
between particular planes in the crystal lattice. The total scattering of X-rays from lattice planes
in a crystal is a result of the interaction between the radiation and all the electron density in the
unit cell. How much each atom contributes to the overall scattering of the X-ray depends on two
factors: the identity of the atom (atoms with higher electron density scatter X-rays more than
those with low electron density) and where the atom is located relative to the diffracting plane.
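Bragg's law, nλ = 2d sin θ, can be explored numerically to show how the diffraction angle grows as the plane spacing shrinks. The sketch below uses a hypothetical wavelength of 1.0 Å, and the function names are illustrative only.

```python
import math


def bragg_angle_deg(d_spacing, wavelength, n=1):
    """Angle theta (degrees) satisfying n * wavelength = 2 * d * sin(theta)."""
    s = n * wavelength / (2 * d_spacing)
    if s > 1:
        raise ValueError("no diffraction: n * wavelength exceeds 2d")
    return math.degrees(math.asin(s))


def d_spacing_from_angle(theta_deg, wavelength, n=1):
    """Plane spacing d that diffracts at angle theta for the given wavelength."""
    return n * wavelength / (2 * math.sin(math.radians(theta_deg)))


WAVELENGTH = 1.0  # angstrom; hypothetical X-ray wavelength
for d in (10.0, 3.0, 1.5):
    print(f"d = {d} A  ->  theta = {bragg_angle_deg(d, WAVELENGTH):.2f} degrees")
```

Smaller spacings give larger angles, which is why the reflections carrying the finest structural detail lie furthest from the incident beam.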
Each diffracted wave (i.e. reflection in the diffraction pattern) is described by a mathematical
function known as a structure factor, Fhkl (h, k and l are the indices of the lattice plane from
which the X-ray is diffracted). The structure factor is related to the electron density within the
unit cell by the following equation:
$$F_{hkl} = \int_V \rho(x,y,z)\, e^{2\pi i(hx+ky+lz)}\, dV \qquad (2.7)$$
Where ρ is the electron density at coordinates x, y and z within the unit cell, and the integral is
taken over the volume V of the unit cell. The equation shows that if Fhkl is known for all reflections
(h,k,l), then the electron density at a position (x,y,z) can be determined by a mathematical
operation called a Fourier transform. To obtain a high resolution map of the electron density
within the unit cell, it is necessary to have structure factors for h,k,l planes whose spacing d is
as small as the intended resolution. Equation 2.6 indicates that reflections corresponding to
these small values of d are diffracted at high angles. As a result, the diffracted
waves from closely spaced planes are observed in the outermost region of the recorded
diffraction pattern.
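The relationship between plane spacing and diffraction angle can be checked numerically. The following is a minimal sketch of equation 2.6 solved for θ; the function name and the list of spacings are illustrative assumptions, and the wavelength is the approximate value quoted for data collection in section 2.6.3.

```python
import math

def bragg_angle_deg(d_spacing, wavelength, order=1):
    """Return the angle theta (degrees) that satisfies 2 d sin(theta) = n lambda."""
    s = order * wavelength / (2.0 * d_spacing)
    if s > 1.0:
        raise ValueError("no diffraction possible: n * lambda exceeds 2d")
    return math.degrees(math.asin(s))

wavelength = 0.94  # Angstrom, approximate value quoted in section 2.6.3
for d in (10.0, 5.0, 2.0, 1.5):  # illustrative plane spacings in Angstrom
    theta = bragg_angle_deg(d, wavelength)
    print(f"d = {d:4.1f} A -> theta = {theta:5.2f} degrees")
```

The printed angles increase as d decreases, which is why high resolution reflections are found furthest from the incident beam.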
The structure factor is a complex number and can be represented as a vector in the complex
plane. The length of the vector is the amplitude of the reflection and is represented as
the absolute value of the structure factor |F|. |F| can be determined from the intensity, Ih,k,l, of the
reflection which is recorded in the diffraction experiment, because the intensity is proportional
to |F|². The angle that the vector F makes with the real axis is the phase angle, αh,k,l.
In these terms, the electron density at a point, ρ(x,y,z), is given by the following equation:
$$\rho(x,y,z) = \frac{1}{V} \sum_{h,k,l} |F_{hkl}|\, e^{i\alpha_{hkl}}\, e^{-2\pi i(hx+ky+lz)} \qquad (2.8)$$
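The role of the phases in this summation can be illustrated with a toy calculation. The sketch below is a discrete, one-dimensional analogue of equations 2.7 and 2.8, assuming a hypothetical cell of unit length containing two point atoms; the positions, weights and index cut-off are invented for illustration and are not data from this work.

```python
import cmath
import math

# Hypothetical 1D "unit cell": (fractional position, scattering weight) per atom.
atoms = [(0.20, 6.0), (0.65, 8.0)]
H_MAX = 15  # highest index h included in the sum (sets the resolution)

def structure_factor(h):
    """Discrete analogue of equation 2.7 for point atoms."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

def density(x, use_phases=True):
    """Discrete 1D analogue of equation 2.8 (cell length taken as 1)."""
    total = 0.0
    for h in range(-H_MAX, H_MAX + 1):
        F = structure_factor(h)
        amplitude, phase = abs(F), cmath.phase(F)
        if not use_phases:
            phase = 0.0  # amplitudes only, as measured in the experiment
        term = amplitude * cmath.exp(1j * phase) * cmath.exp(-2j * math.pi * h * x)
        total += term.real
    return total

grid = [i / 100 for i in range(100)]
peak_with_phases = max(grid, key=density)
peak_without = max(grid, key=lambda x: density(x, use_phases=False))
print(peak_with_phases, peak_without)
```

With the correct phases the highest peak coincides with the heavier atom, whereas setting every phase to zero moves the highest peak to the origin. This is a simple way of seeing why amplitudes alone are not sufficient to locate the atoms.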
It is entirely possible to determine the indices of the plane (h,k,l) of each reflection as well
as the absolute value of the structure factor |Fh,k,l| from a diffraction experiment. The only other
parameter that is needed in order to solve equation 2.8 for the electron density is the
phase angle αh,k,l of each reflection. However, it is practically impossible to directly measure the
phase angle of a reflection during a protein X-ray diffraction experiment. This fact results in the
so-called phase problem. Fortunately, there are a series of solutions to this problem that either
allow an indirect measurement of the phases using anomalous scattering techniques and
isomorphous substitution, or use molecular replacement (MR), which allows the phases to be
estimated to good approximation using an available model for the molecule that is being
studied. Once a good estimate of αh,k,l is obtained for all reflections, it is possible to construct an
initial model from the corresponding electron density maps. This initial model can be improved
by an iterative process called refinement, in which updated phases are derived directly from the
model. The refinement combines X-ray data with stereochemical restraints to create a more
accurate model. Each refinement outputs a new model with new calculated structure factors
Fcalc for each reflection. A statistic used to assess the refinement is called the R-factor, R, which is
a measure of the difference between the observed structure factor amplitudes |F| and the
amplitudes calculated from the refined model |Fcalc|, i.e. Equation 2.9:
$$R = \frac{\sum_{hkl} \bigl|\, |F| - |F_{calc}| \,\bigr|}{\sum_{hkl} |F|} \qquad (2.9)$$
A more robust statistic is the free R-factor, Rfree, for which ~5% of the diffraction data are
omitted from the refinement and used only as |F| in the calculation of the statistic. This reduces
bias because the omitted data have not been used to determine |Fcalc|. As a rule of thumb, for a
completed model the R and Rfree values should be approximately equal to 0.1 times the resolution
of the diffraction data (in Å). When all aspects of the refinement are completed and the model is as
complete and as accurate as possible, the model is generally submitted to the Protein Data Bank,
which is a freely available database. 202,203,204
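The R-factor and Rfree calculations can be sketched in a few lines. The example below is illustrative only: the amplitudes and the choice of free reflections are made up, and the function is a direct transcription of equation 2.9 rather than the routine used by any refinement program.

```python
def r_factor(f_obs, f_calc):
    """Return sum(||F| - |Fcalc||) / sum(|F|) over the supplied reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Made-up amplitudes for six reflections (illustration only, not thesis data).
f_obs = [120.0, 85.0, 40.0, 66.0, 150.0, 30.0]
f_calc = [110.0, 90.0, 35.0, 70.0, 140.0, 36.0]
is_free = [False, False, True, False, False, True]  # hypothetical free set

# R uses the reflections included in refinement; Rfree uses the omitted ones.
r_work = r_factor([o for o, free in zip(f_obs, is_free) if not free],
                  [c for c, free in zip(f_calc, is_free) if not free])
r_free = r_factor([o for o, free in zip(f_obs, is_free) if free],
                  [c for c, free in zip(f_calc, is_free) if free])
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")
```

In a real data set the free set contains hundreds or thousands of reflections; two are used here only to keep the arithmetic easy to follow.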
2.6.2 X-Ray Crystallisation Trials
For a substance to crystallise, the molecules have to self-arrange in an ordered way and under
particular conditions the ordered structure may be extended in three dimensions to form a
crystal. A proven technique to promote the crystallization of globular proteins is called vapor
diffusion. This technique relies on the fact that crystallization is more likely to occur at a
concentration range below the precipitation point, but high enough for significant crystal
growth. In the vapor diffusion method there is a sealed vessel that contains a protein sample
segregated from a reservoir. The reservoir solution is of a lower vapor pressure than the
protein solution leading to the evaporation of water from the protein sample to the reservoir.
This gradually increases the concentration of the protein sample which can bring about its
crystallization.205 This process is summarised in Figure 2-2.
Figure 2-2- Schematic showing the vapor diffusion crystallisation method.
Pure samples of FrmR, FrmR-His, FrmRC36S, HxlR1-His and HxlR2-His were screened against 4
standard screens provided by Molecular Dimensions. These were: PACT-Premier, JCSG plus,
Clear strategy 1 and Clear strategy 2. The screens were set up as follows: 100 µL of the relevant
screening solution was pipetted into the reservoir wells of a 96 well crystallization tray. 200 nL
of protein was pipetted into each crystallization well of the tray using the automated pipetting
robot Mosquito® (TTP Labtech). Mosquito® was then used to pipette 200 nL of the reservoir
solution into the same crystallization well as the protein. Trays were sealed and kept at 4°C. This
process was carried out at two concentrations for each protein: 5 mg/mL and 10 mg/mL.
Additionally, a larger crystallization plate was set up to scale up the growth of FrmRC36S crystals.
Each drop was composed of 2 µL of reservoir solution and 2 µL of protein sample, which were
manually pipetted into the crystallization well. The tray was sealed and stored at 4°C.
2.6.3 Data Collection
Crystals were either taken directly from the screening tray or the scaled up tray. Those of
FrmRC36S were cryo-protected with a 1:1 mixture of glycerol and PEG 200, placed into a loop
and then frozen in liquid nitrogen. HxlR2-His crystals were cryo-protected with mother liquor
supplemented with 8% PEG 200, placed into a loop and then frozen in liquid nitrogen.
Diffraction experiments were performed at the European Synchrotron Radiation Facility (ESRF)
on beamline ID14-4 (HxlR2-His) or beamline ID23-1 (FrmRC36S), at wavelengths of
approximately 0.94 Å. Images were recorded through 180° with an individual oscillation angle
of 0.5°.
2.6.4 Data Processing
Indexing and integration of images was performed using the program “iMosflm”.206 Scaling and
merging was undertaken by “SCALA”207. Space group determination was done using
“Pointless”.207 The conversion of intensities to structure factors was performed with
“Truncate”; Truncate was also used to generate a 5% R-free dataset.208
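The idea behind a 5% R-free dataset can be sketched as a random flagging of reflections. This is a generic illustration and is not the algorithm implemented in Truncate; the function name, the seed and the number of reflections are assumptions made for the example.

```python
import random

def assign_free_flags(n_reflections, fraction=0.05, seed=0):
    """Flag a random subset of reflections to be excluded from refinement."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    n_free = round(n_reflections * fraction)
    free = set(rng.sample(range(n_reflections), n_free))
    return [i in free for i in range(n_reflections)]

flags = assign_free_flags(2000)  # hypothetical data set of 2000 reflections
print(sum(flags), "of", len(flags), "reflections flagged as free")
```

The same flags must be kept throughout refinement, otherwise reflections that have already influenced the model would leak into the free set.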
2.6.5 Molecular Replacement
The MR procedure was performed using the program Phaser.209 The phasing model tried for
both HxlR2 data sets was that of the protein YtdC (PDB ID 2HHZ) from Bacillus subtilis. For
FrmRC36S data sets, two models were trialled as phasing models. These were CsoR from
Mycobacterium tuberculosis (PDB ID 2HH7) and CsoR from Bacillus subtilis (PDB ID 3AAI).
Additionally, molecular models were generated using the programs CaspR and Balbes.210,233
Each MR procedure with Phaser performed searches in the crystal space group as well as
related space groups in case the diffraction data had not been assigned to the correct group. The
output from Phaser with the highest likelihood score was carried forward for further model
building and refinement.
2.6.6 Model building, Refinement and validation
The initial model obtained from Phaser was rebuilt automatically by using the Autobuild
program as part of the Phenix software suite. Autobuild uses the program RESOLVE to perform
“Density modification” that iteratively improves the electron density maps (phases) by applying
known properties regarding protein electron density.211 Once the density modification has been
done, RESOLVE builds a model based on the sequence of the protein by fitting the sequence to
the electron density map.212 Autobuild then uses the program “phenix.refine” to refine the
current structure and minimize the R and R-free statistics. In addition to using the R factor as a
target score, the refinement uses known parameters such as legitimate bond and torsion angles,
bond lengths and Van der Waals contacts to score model coordinates. This is called a restrained
refinement. This routine also adds water molecules to the model where electron density that is
likely to correspond to water is observed. After each refinement round the model is used in
another round of RESOLVE, so as to iteratively improve the model.213,214 The output electron
density map from Autobuild was then viewed in the software Coot as a 2Fo-Fcalc map as well as
an Fo-Fcalc map along with the model. The model was adjusted manually in Coot and then
refined further using the program phenix.refine as part of the Phenix software
suite.214 The refined model was then adjusted again in Coot and this iterative cycle was continued
until the R-free factor reached a minimum value. Validation of the final refined structure was
done using the program MolProbity.215
2.6.7 Analysis of the dimer interface
The dimer interface of the final model HxlR2-His in the P43212 space group was analysed using
Protorp.216 Default settings were used for the analysis.
3 Cloning, Purification and Biophysical Characterisation of
Bacterial Transcription Factors Implicated in Formaldehyde
Sensing
3.1 Introduction
This chapter first examines the phylogenetic distribution of the known transcription factors of
formaldehyde detoxification pathways, in order to place the regulators researched in this study
in a broader context. Pure samples of some of these proteins were required, so the chapter then
covers the molecular biology that was performed in order to obtain them, followed by the
purification of the recombinant proteins. The work in the latter part of the chapter attempts to
obtain information regarding the physical properties of the purified proteins, namely their size
and their secondary structure composition.
3.2 Aims and Objectives
Phylogenetic analysis will be carried out using bioinformatics methodology. Genome sequencing
has created tremendously useful databases of gene sequences from which open reading frames
may be inferred. It is therefore possible to search for related proteins in different bacterial
species. Therefore, to determine which TFs are conserved, the databases of known and putative
protein sequences will be searched using a BLAST algorithm and those sequences which appear
as part of a FDP will be collected for further analysis. A multiple sequence alignment (MSA) will
be created for each TF type so that the level of conservation throughout the sequences can be
evaluated. When necessary the MSA can be used to create a phylogenetic tree which graphically
represents the sequence relationships between particular organisms.
In order to obtain pure samples of the TFs, their corresponding genes will be amplified by PCR
from either genomic or synthesised DNA templates. The corresponding genes will be inserted
into an expression vector using molecular cloning techniques which will allow the genes to be
overexpressed in E. coli. Purification of the overexpressed proteins will be attempted using a
number of chromatography based methods.
Once the TFs have been purified, their size will be determined. The monomeric intact mass
of the proteins will be measured accurately using Mass Spectrometry, whereas Light Scattering
and Gel Filtration Chromatography will be used to evaluate their multimeric state. Finally, an
estimation of the secondary structure content of the TFs will be performed using a combination
of Circular Dichroism and computational prediction.
3.3 Phylogenetic distribution of known TFs of FDP
As discussed in section 1.4, there are several types of TF that have been shown to influence the
transcription of FDPs. In this section, the distribution of these TFs throughout prokaryotes is
analysed.
3.3.1 Distribution of the two component systems from Paracoccus denitrificans and
Rhodobacter sphaeroides
As the genes encoding the TFs in these systems are located at a distance from their target genes,
it is difficult to determine whether homologues from other organisms are likely to regulate
FDPs.
3.3.2 Phylogenetic distribution of HxlR and HxlR-pCER270
Here we define a gene as encoding a putative HxlR if it is located directly upstream of RuMP or
GSH-dependent FDP genes. Only 14 such genes could be found, all belonging to species in the
Firmicutes phylum of bacteria, with seven of the species being from the genus Bacillus. A
multiple sequence alignment (MSA) of these sequences is shown in Figure 3-1.
Figure 3-1 - MSA of HxlR family proteins whose genes are located directly upstream
of a FDP. The alignment was performed using ClustalW. Residues are coloured according to
conservation with weakly conserved residues coloured light blue and strongly conserved residues
coloured dark blue.
A phylogenetic tree was constructed using these sequences and is shown in Figure 3-2. The tree
shows that at the first branch point these HxlR proteins can be grouped into two types; type 1
(blue) and type 2 (green). Type 1 proteins contain sequences from the Bacillus genus as well as
Oceanobacillus and Exiguobacterium, whereas type 2 proteins contain sequences present in
Staphylococcus and Macrococcus. HxlR-pCER270 (boxed in red) also belongs to the type 2 form of
HxlR proteins. The difference between these two groups is quite pronounced, as each
subclass contains members that are >70% identical to each other. The most closely related
sequences between the two groups are hxlR from Bacillus atrophaeus and from Staphylococcus
carnosus, which show 48% identity.
Interestingly, the most similar protein sequences in the databases to HxlR-pCER270 are those of
the other type 2 HxlR proteins; for example HxlR-pCER270 is >60% identical to all members of
this group. The most similar type 1 HxlR protein to HxlR-pCER270 is that from Oceanobacillus
iheyensis, with which it shares 45% identity. No other orthologs of HxlR-pCER270 which are part
of a GSH-FDP operon were found. For clarity and simplicity, HxlR and HxlR-pCER270 will now be
referred to as HxlR1 and HxlR2 respectively.
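The percentage identities quoted above are derived from pairwise comparisons within an alignment. A minimal sketch of such a calculation is given below; it assumes that identity is counted over aligned columns in which neither sequence has a gap, which is one of several conventions in use and may differ from the definition applied by the alignment software. The two sequence fragments are hypothetical and are not taken from Figure 3-1.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Hypothetical aligned fragments (illustration only).
seq_1 = "MKLE-DCPVEAT"
seq_2 = "MRLEQDCPIE-T"
print(f"{percent_identity(seq_1, seq_2):.1f}% identity")
```

For these fragments, 8 of the 10 gap-free columns match, giving 80.0% identity.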
Figure 3-2 - Phylogenetic tree of HxlR proteins created by Neighbour Joining with % identities
from the alignment in Figure 3-1. Type one HxlR proteins are coloured blue and type 2 HxlR
proteins are coloured green. HxlR-pCER270 is boxed in red.
3.3.3 Phylogenetic distribution of AdhR
If we define an adhR gene as a MerR family member located directly upstream of a GSH-FDH,
only three organisms can be found that contain an adhR gene, all of which belong to the Bacillus
genus (Figure 3-3).
Figure 3-3 - MSA of AdhR proteins.
3.3.4 Phylogenetic distribution of FrmR
Orthologs of FrmR were searched for using the BLAST tool; any gene encoding a DUF156 family
protein upstream of a GSH-FDP was considered to encode a putative FrmR protein. It was
found that 42 bacterial species contain FrmR, making it by far the most widespread
known TF of FDPs. All species belong to the Proteobacteria. A MSA of FrmR from E. coli
and six other randomly selected organisms shows that the first ~60 residues show a high level
of conservation, together with the three C-terminal residues (Figure 3-4). This is also illustrated
by the logo diagram in Figure 3-5, which is constructed from a MSA of all 42 sequences.
Figure 3-4- MSA between FrmR sequences. Alignment includes FrmR from E. coli along with six
randomly selected sequences.
Figure 3-5 – Logo diagram from a MSA of the 42 FrmR sequences. Size of each bar is proportional
to conservation level of the residue. Consensus sequence is shown below.
3.3.5 Summary
The preceding phylogenetic data show that the HxlR proteins appear to be grouped into two
forms, types 1 and 2. Therefore, this study will further research one of each type: HxlR1 from
Bacillus subtilis and HxlR2 from Bacillus cereus AH818. It is clear that FrmR is the most widely
distributed TF of FDPs in sequenced bacterial organisms. Therefore, FrmR from E. coli will also
be further researched here. The following sections describe the strategy used to obtain purified
samples of the respective proteins.
3.4 Molecular cloning
In order to study the biochemical properties of the TFs, and their response to formaldehyde, it is
necessary to purify significant amounts of these proteins, and hence to express their genes at a
high level in a suitable host organism. To do this it was decided to use
the pET cloning system from Novagen. In this system, genes of interest are inserted into
plasmids (vectors) which have been designed specifically for the purpose of high level
expression. Genes cloned into these vectors are under the control of the T7 promoter, which is
a promoter specific for the enzyme T7 RNA polymerase.217,218 The vectors used in this study are
named pET-15b and pET-24b. All molecular biology techniques used in this section were
carried out as described in 3.4. Acknowledgements are given to Dr Tewes Tralau for the help
and assistance given in the experiments described in this section.
3.4.1 Molecular cloning of the frmR gene from E.coli
A summary of the strategy used to obtain the frmR expression constructs is given in section A1.2.
Two inserts containing the frmR gene were amplified by PCR so that two forms of the protein,
with a ‘His tag’ at either the N-terminus or the C-terminus, could be expressed. Initially a 778bp
fragment containing the frmR gene was amplified from genomic DNA of E. coli DH5α (Primers:
frmR_F, frmR_R). A subsequent PCR was used to select the frmR gene and add appropriate
restriction sites at either end. A 363bp insert containing a 5’ Nde1 site and a 3’ BamH1 site was
amplified (Primers: frmR_Nde1, frmR_BamH1), as was a 302bp insert containing a 5’ Nde1 site and
a 3’ HindIII site (Primers: fmrR_Nde1, frmR_HindIII).
Figure 3-6 shows the purified PCR products on an agarose gel. Each band is at its predicted
position relative to the marker. The 363bp insert was cloned into pET15b to give a
pET15b-His-frmR construct and the 302bp insert was cloned into pET24b to give a pET24b-frmR-His
construct.
Figure 3-6- Agarose gel showing PCR amplified region and inserts. Lane1 – Marker, Lane2- 778bp
fragment, Lane 3- 363bp insert, Lane 4 – 302bp insert
The pET24b-frmR-His construct adds an extra 11 amino acids onto the C-terminus of WT-frmR
while the pET15b-His-frmR construct adds an extra 20 amino acids to the N-terminus. In order to
ligate the inserts into the corresponding plasmids, both inserts were cut with their respective
restriction enzymes (NdeI and either BamHI or HindIII). The plasmids were also cut with the
corresponding restriction enzymes (pET-15b: NdeI and BamHI, pET24b: NdeI and HindIII) and
dephosphorylated using calf intestinal alkaline phosphatase. Figure 3-7 shows the undigested
and digested forms of pET15b and pET24b on an agarose gel. If cut correctly, pET15b should be
5697bp and pET24b should be 5246bp. Figure 3-7 shows that the cut plasmids are running at
the expected size, indicating that the digestion was successful. Inserts were ligated
into their corresponding plasmids using the enzyme T4 ligase. The ligation was performed using
a 150:1 insert to vector molar ratio at 15°C overnight.
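Because the ratio is molar, the mass of insert required depends on the relative lengths of insert and vector. The sketch below shows the standard conversion for double-stranded DNA; the amount of vector (50 ng) is a hypothetical value chosen for illustration, as it is not stated in the text, and the insert lengths are taken as the uncut PCR product sizes.

```python
def insert_mass_ng(vector_mass_ng, vector_bp, insert_bp, molar_ratio):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio."""
    return vector_mass_ng * (insert_bp / vector_bp) * molar_ratio

VECTOR_MASS_NG = 50.0  # hypothetical amount of cut vector, not stated in the text

# 363bp insert into cut pET15b (5697bp) and 302bp insert into cut pET24b (5246bp)
for name, vector_bp, insert_bp in (("pET15b", 5697, 363), ("pET24b", 5246, 302)):
    mass = insert_mass_ng(VECTOR_MASS_NG, vector_bp, insert_bp, molar_ratio=150)
    print(f"{name}: {mass:.0f} ng of {insert_bp}bp insert per {VECTOR_MASS_NG:.0f} ng vector")
```

The calculation relies only on the fact that, for double-stranded DNA, mass per mole scales with length in base pairs.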
Figure 3-7 - Agarose gel showing circular and digested plasmids. A- Lane 1- Marker, Lane 2 –
Circular pET15b, Lane 3 – pET15b cut with Nde1 and BamH1, expected size - 5697bp. B-Lane 1 –
Marker, Lane 2- pET24b, Lane 3 – pET24b cut with Nde1 and HindIII, expected size - 5246bp
The ligation product was transformed into E. coli DH5α and cells were selected for using
ampicillin (pET15b) and kanamycin (pET24b). The potential colonies were screened by colony
PCR using T7 primers (Primers: T7F, T7R) resulting in a PCR product of 552bp for the 363bp
insert in pET15b, and a 471bp product for the 302bp insert in pET24b. Figure 3-8 shows the
PCR products at the sizes expected relative to the marker, indicating successful ligations; these
plasmids were verified following plasmid preparation by sequencing, indicating that both
pET24b-frmR-His and pET15b-His-frmR constructs contained the desired sequence.
Figure 3-8– Agarose gel showing PCR products from colony PCR screening. Lane 1- Marker, Lane 2
– 552bp fragment from pET15b-His-frmR, Lane 3 – 471bp fragment from pET24b-frmR-His
Given the small size of FrmR (91 aa), the addition of a His-tag is a relatively significant
perturbation. In order to allow comparison of His-tagged variants with WT-frmR, a non-His-tagged
construct was made from pET15b-His-frmR. A second Nde1 restriction site was
introduced at the start codon of the frmR gene via site directed mutagenesis (Primers:
frmR_mutNde1F and fmrR_mutNde1R). This plasmid was then cut with Nde1 and religated to
remove the His-tag sequence and give the pET15b-frmR construct. Subsequent to
transformation into E. coli DH5α, potential colonies were screened via colony PCR. T7
primers were again used, which would give a 492bp product if the procedure was successful.
Figure 3-9 shows the PCR product alongside the 552bp product of the same PCR reaction using
pET15b-His-frmR as a template. The product in lane 2 is running lower than that of lane 3,
indicating that the N-terminal His-tag sequence is likely to have been removed. The
corresponding plasmid was purified and verified by sequencing.
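The expected size of the colony PCR product after removal of the tag follows from the number of codons excised. The short sketch below reproduces this arithmetic, assuming that the excised fragment corresponds exactly to the 20 additional N-terminal residues encoded by pET15b-His-frmR; the function name is illustrative.

```python
CODON_LENGTH = 3  # nucleotides per amino acid

def product_after_tag_removal(tagged_product_bp, tag_residues):
    """Expected colony PCR product once the tag-encoding region is excised."""
    return tagged_product_bp - tag_residues * CODON_LENGTH

expected = product_after_tag_removal(tagged_product_bp=552, tag_residues=20)
print(expected)  # 492
```

The 60bp difference between the two products is small, which is why both are run side by side on the same gel.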
Figure 3-9– Agarose gel showing PCR products from colony PCR screening. Lane 1- Marker, Lane 2-
492bp fragment from pET15b-frmR, Lane 3 – 552bp fragment from pET15b-His-frmR.
The translated sequences for each construct are shown below:
Figure 3-10- Translated sequences from each construct encoding a form of FrmR. Parts of the wt
sequence are in black. Amino acids that are not part of the FrmR sequence are in blue. ‘His-Tags’
are coloured red and the trypsin cleavage sites are in green
3.4.2 Molecular cloning of the hxlR1 gene from Bacillus subtilis
As it has already been documented that HxlR1-His is soluble and active with a C-terminal
‘His-tag’,171 a similar construct was made: pET24b-hxlR1-His. A summary of the strategy used to
obtain the hxlR1 expression construct is given in section A1.3. Initially an 894bp fragment
containing the hxlR gene was amplified from genomic DNA (Bacillus subtilis sp128, ATCC)
(Primers: hxlR_F, hxlR_R). A 395bp insert containing a 5’ Nde1 site and a 3’ HindIII site was then
amplified from this 894bp fragment (Primers: hxlR_Nde1, hxlR_HindIII). These PCR products
are shown on an agarose gel in Figure 3-11. The insert was digested with Nde1 and HindIII and
then ligated into a previously cut and dephosphorylated pET-24b vector (Figure 3-7). The
ligation product was transformed into E. coli DH5α and cells were selected on kanamycin.
Putative colonies were screened by colony PCR using T7 primers which would give a PCR
product of 571bp if the corresponding pET-24b contained the insert (Figure 3-11). Plasmids
were purified from a colony that gave rise to the correct PCR product and verified through
sequencing.
Figure 3-11 – Left – Agarose gel showing PCR products from the amplification of hxlR1. Lane 1-
Marker, Lane 2 – 894bp fragment, Lane 3- 395bp fragment. Right - Agarose gel showing PCR
products from colony PCR screening. Lane 1- Marker, Lane 2 – 571bp fragment from
pET24b-hxlR1-His
The translated product from pET24b-hxlR1-His is shown in Figure 3-12.
Figure 3-12- Translated sequence from the pET24b-hxlR1-His construct. Wt sequence is in black.
The ‘His-tag’ is in red and linker amino acids that are not part of the HxlR1 sequence are in blue.
3.4.3 Molecular cloning of the hxlR2 gene from Bacillus cereus AH818
A summary of the strategy used to obtain the hxlR2 expression constructs is given in section
A1.4. A codon-optimised version of the hxlR2 gene was obtained through gene synthesis by
Eurofins. Given the sequence similarity of the gene product to HxlR1, it was decided to
synthesize the hxlR2 gene with a C-terminal ‘His tag’ as there was a good chance that it would
also be soluble and active. In order to generate a pET24b-hxlR2-His construct, pET24b was cut with
Nde1.
Primers that introduced ends complementary to the cut pET24b plasmid were used to amplify a
403bp insert that contained the hxlR2-His gene (Primers: cer24bF, cer24bR). This insert is
shown on an agarose gel in Figure 3-13. This insert was introduced into the cut plasmid using
the In-Fusion cloning reaction.219 The reaction product was transformed into E. coli DH5α and
cells were selected for using kanamycin. Potential colonies were screened by colony PCR using
T7 primers (Figure 3-13). Plasmids from colonies that gave the correct 699 bp
PCR product were purified and verified by sequencing.
Figure 3-13- A – Agarose gel showing PCR amplified hxlR2-His insert. Lane 1 – Marker, Lane 2- 403bp
insert. B – Agarose gel showing PCR product from colony PCR screening. Lane 1 – Marker, Lane 2-
699bp fragment from pET24b-hxlR2-His.
The translated product from the pET24b-hxlR2-His construct (HxlR2-His) is shown in Figure
3-14.
Figure 3-14- Translated sequence from the pET24b-HxlR2-His construct. Wt sequence is in black.
The ‘His-tag’ is in red and linker amino acids in blue.
3.5 Protein Expression Trials
In order to obtain cell cultures expressing high levels of the target protein, it is necessary to
ensure high expression from the T7 promoter. This is achieved by transforming the plasmid into
an E. coli DE3 lysogen and inducing expression by addition of IPTG.220 Each plasmid was
transformed into the E. coli DE3 lysogen ArcticExpress™ (Agilent). 5mL cultures were
grown at 37°C to an OD600 of 0.5 in LB media, induced with 1mM IPTG and grown at 15°C for an
additional 10 hours. A separate control set of cultures was treated in the same way, but
without the addition of IPTG. The induced cell cultures were then split in half, with one half being
subjected to lysis in order to obtain the soluble fraction. All expression trials were conducted as
described in section 2.2.14.
In addition to the results presented in this thesis, several other expression strains were tested.
These strains were BL21 (DE3) from New England Biolabs® Inc. and HMS174 (DE3) from
Novagen. For each of the genes expressed in this section, the ArcticExpress™ strain appeared to
give the highest expression levels (as judged by eye using SDS-PAGE), which is why it was
chosen for large scale growth (see section 3.6). Additionally, expression times were varied
between 3 and 16 hours, which appeared to have little effect on the final protein content. The
concentration of IPTG added to the cultures was also varied because high levels can sometimes
affect cell growth if the expressed gene product is toxic. However, it was found that high IPTG
concentrations (1 mM) did not substantially alter growth of the sample cultures.
3.5.1 Expression trials using pET24b-frmR-His, pET15b-His-frmR and pET15b-frmR
The various fractions from this experiment were subjected to SDS-PAGE. Figure 3-15 shows the
results from cells containing the pET24b-frmR-His construct (A), the pET15b-His-frmR construct
(B) and the pET15b-frmR construct (C). All three constructs express the corresponding
FrmR variant, as indicated by the prominent band at the expected molecular weight (between 10
and 15 kDa). The results also show that both FrmR-His and FrmR are soluble. However,
His-FrmR appears as an insoluble protein under the conditions tested. It is sometimes
possible to refold an aggregated protein. This can be done by using denaturants to attempt to
solubilise the aggregated protein. The soluble protein can then sometimes be refolded by the
removal of denaturant and variation of conditions such as salt concentration, temperature and
pH.
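The expected position of each FrmR variant on the gel can be estimated from its length. The sketch below uses the common approximation of 110 Da per amino acid residue, which is an assumption and not a value taken from this work; the residue counts come from the 91 aa length of FrmR and the 11 and 20 additional residues contributed by the pET24b and pET15b constructs respectively.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)

def approx_mass_kda(n_residues):
    """Rough molecular weight estimate from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0

constructs = {
    "FrmR": 91,
    "FrmR-His": 91 + 11,  # C-terminal extension from pET24b
    "His-FrmR": 91 + 20,  # N-terminal extension from pET15b
}
for name, n_residues in constructs.items():
    print(f"{name}: {n_residues} aa, ~{approx_mass_kda(n_residues):.1f} kDa")
```

All three estimates fall between 10 and 15 kDa, in line with the band positions described above. An accurate mass would require the actual sequence of each construct.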
Figure 3-15– SDS-PAGE analysis of expression trial experiments from pET24b-frmR-His (A),
pET15b-His-frmR (B) and pET15b-frmR (C). Lane 1 shows the control i.e. no IPTG added, Lane 2-
Marker, Lane 3- Induced cells, Lane 4- Soluble fraction of induced cells.
3.5.2 hxlR Expression using pET24b-hxlR1-His
Fractions from this experiment were subjected to SDS-PAGE, which is shown in Figure 3-16. The
results indicate that the hxlR1 gene is expressed as a soluble protein as there is a prominent
band at the expected molecular weight (~15kDa) that is not present in the control sample.
Figure 3-16– SDS-PAGE analysis of expression trial experiments from pET24b-HxlR1. Lane 1-
Marker, Lane 2- Control i.e. no IPTG added, Lane 3- Induced cells, Lane 4- Soluble fraction of
induced cells.
3.5.3 Expression of HxlR2-His from pET24b-hxlR2-His
SDS-PAGE was used to analyse the fractions from this experiment. The results (Figure 3-17)
indicate that the hxlR2-His gene is expressed as a soluble protein, as there is a prominent band at
the expected molecular weight (~15kDa) that is not present in the control sample.
Figure 3-17– SDS-PAGE analysis of expression trial experiments from pET24b-hxlR2-His. Lane 1-
Marker, Lane 2- Control i.e. no IPTG added, Lane 3- Induced cells, Lane 4- Soluble fraction of
induced cells.
3.6 Protein Purification
In order to obtain a pure protein sample, the protein of interest has to be separated from all
other soluble proteins in the cell extract. As mentioned above, most of the proteins that have
been over expressed contain a ‘His-Tag’ at the C-terminus. The ‘His-tag’ has an affinity for nickel
ions through direct coordination of the metal ion via the imidazole side chains; therefore, a
protein containing a ‘His-tag’ should bind to nickel ions with a greater affinity than that of other
proteins. Imidazole can be used to displace the bound proteins. The method used to purify His-
tagged proteins is to incubate the soluble cellular extract with an agarose resin containing
immobilized nickel. Following a washing step which elutes all non-bound proteins, the resin is
then treated with increasing concentrations of imidazole which should elute the His-tagged
protein in a relatively pure state.221 For proteins that are not His-tagged, a combination of
standard chromatography steps is necessary in order to purify the protein. For all the
purifications discussed in this section, the expression protocol described in section 3.5 was
carried out on a litre scale.
3.6.1 Purification of FrmR-His
FrmR-His was purified using the nickel based method described in 2.3.3. Figure 3-18 shows the
eluted fraction containing FrmR-His subjected to SDS-PAGE. The figure shows that FrmR-His is
eluted in a pure state as no other protein bands are visible on the gel. Approximately 1.4mg of
FrmR-His was obtained from 1L of cell growth.
Figure 3-18- SDS-PAGE analysis of fractions from the purification of FrmR-His. Lane 1 – Marker,
Lane 2- 20mM imidazole, Lane 3- 40mM imidazole, Lane 4 – 60mM imidazole, Lane 5- 80mM
imidazole, Lane 6 – Pure FrmR-His eluted at 300mM imidazole. The molecular weight of an FrmR-
His monomer is expected to be approximately 12kDa at which the band in lane 6 is running.
3.6.2 Purification of FrmR
Section 5.3.1 describes a series of experiments that lead us to conclude that FrmR-His is inactive.
Once this had been established, it was decided to attempt to purify the ‘untagged’ form of FrmR to
determine whether the His-Tag was the cause of inactivity. Untagged FrmR was purified using a
combination of two chromatography techniques, the details of which are described in 2.3.4.
Initially, cellular extract was bound to a column consisting of agarose resin linked to a molecule
called heparin. Heparin is a polysaccharide that contains a high proportion of negatively
charged sulphate groups. These properties make it structurally similar to DNA; as such, DNA
binding proteins are likely to bind to it with a significantly higher affinity than other proteins,
which provides a method to isolate DNA binding proteins.222 The force of interaction between
charged species is
decreased as ionic strength is increased; therefore, the binding of a protein to heparin will be
salt dependent. Weaker interactions will be disrupted at lower ionic strengths and stronger
interactions will be disrupted at higher ionic strengths.223 Therefore, if the cell extract is first
incubated with the heparin resin, and the resin is treated with buffer of increasing salt
concentrations, different proteins will dissociate from the heparin at different times. This
provides a means of separation based on affinity for heparin. FrmR was partially purified using
this method. Figure 3-19 shows the UV trace of the material eluted from the column as the salt
concentration (i.e. conductivity) is increased. Figure 3-20 shows an SDS-PAGE gel of the eluted
fractions, and shows that although the FrmR protein is by far the most abundant protein in the
B13 and B12 samples, there is still significant contamination. Fractions B12 and B13 were
therefore pooled to be used in subsequent purification steps.
Figure 3-19- Traces of elution off a heparin column loaded with increasing NaCl. Blue line shows
trace of UV absorbance at 280nm. Brown line shows conductivity as a percentage of the final level.
The bracketed red numbers refer to fractions of eluate.
Figure 3-20- SDS-PAGE showing eluted fractions from Figure 3-19. Lane 1 – A17, Lane 2- B16, Lane
3- B15, Lane 4- B14, Lane 5 – B13, Lane 6 Marker, Lane 7- B12, Lane 8- B11, Lane 9-B10, Lane 10-
B9, Lane 11- B8, Lane 12 – B7. Fractions from Lane 5 and Lane 7 were taken as crude FrmR
samples to be used in the next purification step.
The final method used to purify FrmR is one that separates proteins according to their size,
using a technique called ‘size exclusion chromatography’ (SEC). Here a protein sample is run
through a ‘size exclusion chromatography column’ (SECC) consisting of a matrix of cross linked
dextran. Proteins small enough to enter the pores of the matrix are then separated according to
size. Smaller proteins take a longer path through the matrix whereas larger proteins take a
shorter path. This results in a separation of protein samples according to size with the largest
proteins eluting from the column first and the smaller proteins eluting last.224 This procedure
was performed on the sample obtained from the heparin column from the previous step (Figure
3-21). Figure 3-22 shows an SDS-PAGE of the various fractions containing FrmR, which reveals
that FrmR is eluted in a relatively pure state. Fractions C8 and C7 were taken as pure samples.
Approximately 0.25 mg of FrmR was obtained per 1L of cell culture.
Figure 3-21- Trace of UV absorbance at 280nm as solution elutes from a SECC used to purify FrmR
from the crude FrmR sample. The bracketed red numbers refer to fractions of eluate.
Figure 3-22 - SDS-PAGE showing eluted fractions of the FrmR mixture from the SEC. Lane 9 –
C12/C11, Lane 8- C10/C9, Lane 7- C8/C7, Lane 6- C6/C5, Lane 5- C4/C3, Lane 4- Marker, Lane 3-
C2/C1, Lane 2-B1/B2, Lane 1- B3/B4. Fraction from lane 7 was taken as pure FrmR.
3.6.3 Purification of HxlR1-His
HxlR1-His was purified using the nickel-based method described in 2.3.3. Figure 3-23 shows an
SDS-PAGE of the eluted fractions, and reveals that HxlR1-His is eluted in a pure state as no other
protein bands are visible on the gel. Approximately 3.2mg of HxlR1-His were obtained per 1L of
cell culture.
Figure 3-23 - SDS-PAGE analysis of fractions from the purification of HxlR1-His. Lane 1 – Marker,
Lane 2- 20mM imidazole, Lane 3- 40mM imidazole, Lane 4 – 60mM imidazole, Lane 5- 80mM
imidazole, Lane 6 – Pure HxlR1-His eluted at 300mM imidazole. The molecular weight of a HxlR1-
His monomer is expected to be approximately 15kDa.
3.6.4 Purification of HxlR2-His
HxlR2-His was also purified using the nickel method described in 2.3.3. Eluted fractions were
analysed using SDS-PAGE and the results are shown in Figure 3-24 which reveals that HxlR2-His
is eluted in a pure state. Approximately 4.3mg of HxlR2-His were obtained per 1L of cells.
Figure 3-24 – SDS-PAGE analysis of fractions from the purification of HxlR2-His. Lane 1 – Marker,
Lane 2- 20mM imidazole, Lane 3- 40mM imidazole, Lane 4 – 60mM imidazole, Lane 5- 80mM
imidazole, Lane 6 – Pure HxlR2-His eluted at 300mM imidazole. The molecular weight of a HxlR2-
His monomer is expected to be approximately 14kDa.
3.7 Protein Size Determination Using Mass Spectroscopy
The sizes of the purified TFs were determined using two techniques. Mass Spectrometry was
used to accurately determine molecular weights of individual subunits and Multi Angle Light
Scattering was used to determine the oligomeric state of the protein in solution. Mass
spectrometry and light scattering were conducted as described in sections 2.4.1 and 2.4.2
respectively and were performed by the Biomolecular Interactions facility at The University of
Manchester.
3.7.1 Mass spectrometry of FrmR-His
The electrospray time of flight spectrum of FrmR-His is shown in Figure 3-25. The main peak is
at 11705Da indicating that this is the monomeric mass of FrmR-His. The predicted average
molecular weight of FrmR-His is 11837Da. Therefore FrmR-His is 132Da lighter than predicted.
This anomaly can be explained by the cleavage of an N-terminal methionine residue that would
result in the loss of 131Da. The cleavage of this residue is common for proteins grown in E. coli
and is due to the action of the enzyme methionine aminopeptidase. Cleavage is dependent on
the identity of the residue adjacent to the N-terminal methionine with smaller residues
facilitating cleavage.225
Figure 3-25 - Electrospray time of flight mass spectrum of FrmR-His. Main peak is at 11705Da
which is 132Da less than the expected average mass which can be attributed to a cleaved N-
terminal methionine residue.
3.7.2 Mass spectrometry of FrmR
The electrospray time of flight spectrum of FrmR is shown in Figure 3-26. The main peak is at
10186Da indicating that this is the monomeric mass of FrmR. The predicted average molecular
mass is 10318Da. Again there is a 132Da difference between the expected and observed mass
indicating that the N-terminal methionine has been cleaved.
Figure 3-26 - Electrospray time of flight mass spectrum of FrmR. Main peak is at 10186Da which is
132Da less than the expected average mass which can be attributed to a cleaved N-terminal
methionine residue.
3.7.3 Mass spectrometry of HxlR1-His
The electrospray time of flight spectrum of HxlR1-His is shown in Figure 3-27. The main peak is
at 15492Da indicating that this is the monomeric mass of HxlR1-His. The expected average
molecular mass of HxlR1-His is 15623 Da. Again there is a 131Da difference between observed
and expected indicating cleavage of the N-terminal methionine.
Figure 3-27 - Electrospray time of flight mass spectrum of HxlR1-His. Main peak is at 15492Da
which is 131Da less than the expected average mass which can be attributed to a cleaved N-
terminal methionine residue.
3.7.4 Mass spectrometry of HxlR2-His
The electrospray time of flight spectrum of HxlR2-His is shown in Figure 3-28. The main peak is
at 14554Da indicating that this is the monomeric mass of HxlR2-His. The expected average
molecular mass of HxlR2-His is 14686Da. There is a 132Da difference between the expected and
observed mass indicating cleavage of the N-terminal methionine. However, in this case there is
also a second peak occurring at 14685Da. This is likely to be protein containing uncleaved
methionine at the N-terminus. Therefore in this sample, HxlR2-His exists as a mixture of the two
polypeptides.
Figure 3-28 - Electrospray time of flight mass spectrum of HxlR2-His. Main peak is at 14554Da and
another high intensity peak is at 14685Da. This 131Da difference can be attributed to peptides
with or without a cleaved N-terminal methionine.
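The mass differences reported in sections 3.7.1 to 3.7.4 can be checked against the average mass of a methionine residue with a short calculation. The following is a minimal sketch only: the residue mass of 131.19 Da is the standard average value for methionine within a polypeptide, while the 1.5 Da tolerance and the function name are assumptions introduced for illustration and are not taken from the experimental protocol.

```python
# Average mass (Da) of a methionine residue within a polypeptide chain (C5H9NOS).
MET_RESIDUE_MASS = 131.19

# (predicted, observed) average masses in Da, as reported in sections 3.7.1 to 3.7.4.
MASSES = {
    "FrmR-His": (11837, 11705),
    "FrmR": (10318, 10186),
    "HxlR1-His": (15623, 15492),
    "HxlR2-His": (14686, 14554),
}


def consistent_with_met_loss(predicted, observed, tolerance=1.5):
    """Return True if the mass deficit matches the loss of one methionine residue.

    The tolerance (Da) is a hypothetical value chosen for this illustration.
    """
    deficit = predicted - observed
    return abs(deficit - MET_RESIDUE_MASS) <= tolerance


for name, (predicted, observed) in MASSES.items():
    deficit = predicted - observed
    verdict = consistent_with_met_loss(predicted, observed)
    print(f"{name}: deficit {deficit} Da, consistent with Met loss: {verdict}")
```

All four deficits (131 or 132 Da) fall within the chosen tolerance of a single methionine residue, in line with the interpretation given in the text.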
3.8 Protein Size Determination Using Multi Angle Light Scattering (MALS)
3.8.1 MALS analysis of FrmR-His
FrmR-His was subjected to MALS analysis. Figure 3-29 shows the light scattering (LS) and refractive
index (RI) signals plotted against elution volume from the SECC. LS and RI are on a relative scale
in line with the instrument's calibration. The elution volumes shown are subsequent to the ‘void
volume’, which contains molecular species that did not enter the dextran matrix. This is generally
particulate contamination and not protein, so it produces a large LS with little change in RI. The
FrmR-His molecular weight was calculated using data from 10.5mL-11.2mL and the average
molecular weight was measured to be 52kDa. As the monomeric mass of FrmR-His is 11.7kDa, it is
most likely that FrmR-His exists as a tetramer of 46.8kDa. The presence of only one peak indicates
that the protein in solution is monodisperse.
Figure 3-29 - Traces from a MALS experiment performed on a sample of purified FrmR-His. The
red trace shows intensity of light scattered (red y-axis) against elution volume from the SEC. The
blue trace shows the refractive index signal (blue axis) against elution volume from the SEC. The
average molecular weight was calculated from 10.5mL to 11.2mL which was calculated to be
52kDa.
3.8.2 MALS and Size Exclusion Chromatography analysis of FrmR
MALS was performed on a sample of FrmR and Figure 3-30 shows LS and RI plotted against
elution volume from the SECC. Volumes shown are subsequent to the void volume of the SECC.
The molecular weight was calculated using data from 15mL-15.8mL and the average molecular
weight was measured to be 53kDa. As the monomeric mass of FrmR is 10.2kDa, it would be
expected that FrmR exists as a pentamer. However, proteins rarely exist in an odd-numbered
multimeric state, and a tetrameric or hexameric form might be equally likely. A common source of
error in these types of experiments is the measurement of RI. RI can be expressed as:
$$RI = K_{ri}\, c\, \frac{dn}{dc} \qquad (3.1)$$
where Kri is an instrument calibration constant, n is the refractive index of the solution and c is
the protein concentration. In these experiments dn/dc is assumed to have a fixed value of 1.8mL
g-1. It is possible that for a solution of FrmR, dn/dc deviates from 1.8mL g-1, which could have led
to this error in molecular weight.226
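The sensitivity of the MALS result to the assumed dn/dc can be illustrated numerically. The sketch below rearranges equation 3.1 for the concentration and then rescales a reported mass for a different dn/dc. It rests on an assumption not stated in the text: that the calculated mass is inversely proportional to dn/dc, as follows from standard light scattering theory when the optical constant scales with (dn/dc)² and the concentration is taken from the RI signal. The function names are hypothetical.

```python
def concentration_from_ri(ri_signal, k_ri, dn_dc):
    """Rearranged equation 3.1: c = RI / (K_ri * dn/dc)."""
    return ri_signal / (k_ri * dn_dc)


def rescaled_mass(reported_mass, assumed_dn_dc, actual_dn_dc):
    """Rescale a MALS mass for a different dn/dc.

    Assumes (for this sketch) that the calculated mass is inversely
    proportional to dn/dc when the concentration comes from the RI signal.
    """
    return reported_mass * assumed_dn_dc / actual_dn_dc


def dn_dc_ratio_needed(reported_mass, target_mass):
    """Ratio actual/assumed dn/dc that would bring reported_mass to target_mass."""
    return reported_mass / target_mass


# Reported MALS mass of FrmR (kDa) and the mass of a tetramer of 10.186 kDa subunits.
reported = 53.0
tetramer = 4 * 10.186

ratio = dn_dc_ratio_needed(reported, tetramer)
print(f"dn/dc would need to be {ratio:.2f} times the assumed value")
```

Under this assumption, the true dn/dc would have to be roughly 30% higher than the assumed value for the MALS measurement to correspond to a tetramer, which gives a sense of the size of the deviation being proposed.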
Figure 3-30 - Traces from a MALS experiment performed on a sample of purified FrmR. The red
trace shows intensity of light scattered (red y-axis) against elution volume from the SECC. The blue
trace shows the refractive index signal (blue axis) against elution volume from the SECC. The
average molecular weight was calculated from 15.0mL to 15.8mL which was calculated to be
53kDa.
Due to the ambiguity of the MALS result with FrmR, another experiment was conducted using
SEC to determine its likely multimeric state. As mentioned in section 3.6.2, SEC can separate
macromolecules according to size. It has been shown that, to a reasonable approximation, the
elution volume of a protein varies linearly with the logarithm of its molecular weight. Therefore, if
proteins of known molecular weight are run through the SECC, a calibration curve can be
constructed that can be used to estimate a protein's molecular weight based on its elution
volume.224,227 This experiment was used to estimate the molecular weight of FrmR; protein samples
were run as described in 2.3.4. The column was calibrated using four standards: Bovine Serum
Albumin (Sigma) - 67kDa, Ovalbumin (Sigma) - 44kDa, Carbonic anhydrase (Sigma) - 29kDa and
RNAse (Sigma) - 15kDa. The elution volumes of these proteins are shown in Table 3-1 along with
that of FrmR. Figure 3-31 shows a plot of the log10 molecular weight against elution volume for
the protein standards. A least squares fit was performed on these data points to obtain a linear
equation to relate protein molecular weight to elution volume. The equation is shown on the
graph in Figure 3-31 where y = Log10 Mw and x= Elution volume. The estimated molecular
weight of FrmR using the linear equation is 36.8kDa. These experiments are predicted to have
an average uncertainty of 10% between predicted and actual molecular weights.228 The
tetrameric form of FrmR would be 40.7kDa. The percentage difference between this value and
that calculated in this experiment is 10.5%. This is close to the average uncertainty which
implies that this value is likely to be reliable. Along with the MALS result, these data imply that
FrmR is a tetramer in solution.
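The calibration procedure described above can be sketched as an ordinary least squares fit of log10 molecular weight against elution volume. The elution volumes below are the values listed in Table 3-1; the function names are introduced for illustration only. Because the tabulated values are rounded, this sketch is not expected to reproduce the fitted equation in Figure 3-31 or the reported estimate of 36.8kDa exactly.

```python
import math

# (elution volume, molecular weight in kDa) for the four standards in Table 3-1.
STANDARDS = [(14.4, 67.0), (15.4, 44.0), (16.5, 29.0), (17.8, 15.0)]
FRMR_ELUTION_VOLUME = 15.7
FRMR_MONOMER_KDA = 10.186


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Fit log10(Mw) against elution volume, as in Figure 3-31.
slope, intercept = fit_line([(v, math.log10(mw)) for v, mw in STANDARDS])


def estimate_mw(elution_volume):
    """Estimate molecular weight (kDa) from the calibration line."""
    return 10 ** (slope * elution_volume + intercept)


frmr_mw = estimate_mw(FRMR_ELUTION_VOLUME)
print(f"log10(Mw) = {slope:.4f} * V + {intercept:.4f}")
print(f"Estimated FrmR Mw: {frmr_mw:.1f} kDa "
      f"({frmr_mw / FRMR_MONOMER_KDA:.1f} subunits)")
```

With these rounded inputs the estimate still corresponds most closely to four subunits, which is consistent with the conclusion drawn in the text.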
Table 3-1- Elution volumes of protein standards and FrmR when run through the SECC.
Protein                        Elution volume
BSA (67kDa)                    14.4
Ovalbumin (44kDa)              15.4
Carbonic Anhydrase (29kDa)     16.5
RNAse (15kDa)                  17.8
FrmR                           15.7
Figure 3-31- Graph showing log10 of the molecular weights of the known standards against elution
volume. A fit of least squares was used to obtain a linear equation that can be used to calculate the
molecular weight of unknown samples.
3.8.3 MALS analysis of HxlR1
In previous studies, it has been claimed that HxlR exists as a dimer.171 This claim was confirmed
by subjecting HxlR1 to a MALS analysis. Figure 3-32 shows LS and RI plotted against elution
volume from the SECC. Volumes shown are subsequent to elution of the void volume of the
SECC. The molecular weight was calculated using data from 11.0mL-11.6mL and the average
molecular weight was measured to be 30kDa. The monomeric mass of HxlR1 is 15.5kDa; this
experiment therefore suggests that HxlR1 exists in solution as a dimer.
Figure 3-32- Traces from a MALS experiment performed on a sample of purified HxlR1. The red
trace shows intensity of light scattered (red y-axis) against elution volume from the SECC. The blue
trace shows the refractive index signal (blue axis) against elution volume from the SECC. The
average molecular weight was calculated from 11.0mL to 11.6mL which was calculated to be
30kDa.
3.8.4 MALS analysis of HxlR2-His
A sample of HxlR2-His was also used in a MALS analysis and Figure 3-33 shows LS and RI
plotted against elution volume from the SECC. Volumes shown are subsequent to elution of the
void volume of the SECC. The molecular weight was calculated using data from 15.3mL-16.5mL
and the average molecular weight was measured to be 29kDa. As the monomeric mass of HxlR2-
His is 14.7kDa, the experiment shows that the protein exists in solution as a dimer.
Figure 3-33- Traces from a MALS experiment performed on a sample of purified HxlR2-His. The
red trace shows intensity of light scattered (red y-axis) against elution volume from the SECC. The
blue trace shows the refractive index signal (blue axis) against elution volume from the SECC. The
average molecular weight was calculated from 15.3mL to 16.5mL which was calculated to be
29kDa.
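The oligomeric states inferred in sections 3.8.1 to 3.8.4 follow from dividing each MALS mass by the corresponding monomer mass. The sketch below repeats this arithmetic using the values quoted in the text; the function name is an assumption made for illustration.

```python
# (MALS average mass, monomer mass) in kDa, as quoted in sections 3.8.1 to 3.8.4.
MEASUREMENTS = {
    "FrmR-His": (52.0, 11.7),
    "FrmR": (53.0, 10.2),
    "HxlR1": (30.0, 15.5),
    "HxlR2-His": (29.0, 14.7),
}


def subunit_ratio(mals_mass, monomer_mass):
    """Apparent number of subunits: MALS mass divided by monomer mass."""
    return mals_mass / monomer_mass


for name, (mals_mass, monomer_mass) in MEASUREMENTS.items():
    ratio = subunit_ratio(mals_mass, monomer_mass)
    print(f"{name}: {ratio:.2f} subunits (nearest integer {round(ratio)})")
```

The ratios for the two HxlR proteins lie close to two, whereas that for untagged FrmR lies closest to five, which is the ambiguity that motivated the additional SEC calibration experiment.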
3.9 Secondary structure determination
In this section, the secondary structure composition of each TF is investigated. Without a high
resolution structure of a protein, its secondary structure can be inferred by using biophysical as
well as computational techniques. Circular dichroism (CD) spectra were recorded as described in
2.4.3 and computational predictions used the software Jpred3 (a brief explanation of which is
given in 2.5.4). Chapter 4 describes the high resolution structure of HxlR2-His and, as such, it is
not necessary to apply the techniques used in this section to this protein.
3.9.1 Secondary Structure prediction of FrmR-His
The CD spectrum of FrmR-His is shown in Figure 3-34 along with its predicted spectrum from
K2D2. K2D2 predicted that FrmR-His is 85% alpha helical and 1.2% beta sheet in secondary
structure composition. As this protein is only 104 residues, it would imply that one residue per
monomer is involved in a beta sheet which is unlikely. As this percentage is so low and is
physically unlikely, CD indicates that FrmR-His is mostly alpha helical with no beta sheet
contribution.
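The argument that a 1.2% beta sheet content is physically unlikely can be made explicit by converting the K2D2 percentages into residue counts for the 104-residue chain. This is a simple illustrative calculation; the function name is hypothetical.

```python
CHAIN_LENGTH = 104  # residues in the FrmR-His monomer, as stated in the text


def residues_in_state(fraction, chain_length=CHAIN_LENGTH):
    """Number of residues implied by a fractional secondary structure content."""
    return fraction * chain_length


beta_residues = residues_in_state(0.012)   # 1.2% beta sheet from K2D2
helix_residues = residues_in_state(0.85)   # 85% alpha helix from K2D2

print(f"Beta sheet: {beta_residues:.1f} residues")
print(f"Alpha helix: {helix_residues:.1f} residues")
```

A single residue cannot form a beta sheet on its own, so a value this small is more reasonably treated as noise in the fit than as genuine beta sheet content.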
Figure 3-34- Left- CD spectrum of FrmR-His from 200nm to 240nm. This spectrum was then
submitted to the program K2D2. Right- Predicted spectrum from K2D2.
The output from the Jpred3 prediction is shown in Figure 3-35, which shows that an FrmR-His
monomer is predicted to consist of three helices connected by random coils. As there are high
confidence scores within each helix and coil, it is likely that this is a true reflection of FrmR-His’
secondary structure. As 79% of residues are predicted to be part of an alpha helix and there is
no predicted beta sheet contribution, these results are consistent with those found in the CD
study.
Figure 3-35- Secondary structure prediction of FrmR-His using Jpred3. FrmR-His sequence is
shown at the top with the predictions underneath. (A red H indicates a predicted alpha helix
whereas a – indicates a predicted random coil). At the bottom is the confidence score for each
residue with scores above six being shown in green.
3.9.2 Secondary structure prediction of FrmR
Figure 3-36 shows the CD spectrum of FrmR. The spectrum was submitted to K2D2 which
predicted that FrmR consists of 75% alpha helix and 1.7% beta sheet. The beta sheet
contribution can therefore be ignored again with this experiment indicating that FrmR is mostly
helical.
Figure 3-36 - Left- CD spectrum of FrmR from 200-240nm. Right- Predicted spectrum from
K2D2.
The Jpred output is shown in Figure 3-37 which as expected gives a similar prediction to that of
FrmR-His i.e. three helices separated by random coils.
Figure 3-37 - Secondary structure prediction of FrmR using Jpred3. (A red H indicates a predicted
alpha helix whereas a – indicates a predicted random coil). At the bottom is the confidence score
for each residue with scores above six being shown in green.
3.9.3 Secondary Structure prediction of HxlR1-His
HxlR1-His’s CD spectrum is shown in Figure 3-38. K2D2 predicted that the protein contains
56% alpha helices and 7% beta sheets. In this case the beta sheet contribution could therefore
be significant.
Figure 3-38- Left- CD spectrum of HxlR1-His from 200-240nm. This spectrum was then submitted
to the program K2D2. Right- Predicted spectrum from K2D2.
The Jpred3 output is shown in Figure 3-39. The first 4 helices (from the N-terminus) interrupted
with random coils are predicted with high confidence as are the two beta sheets. The final
predicted helix could however be split into two because residues 107 to 111 are predicted to be
helical but with a confidence score of zero. Here the beta sheet contribution is predicted to be
9% and the helical content is predicted to be 63%. These results are thus closely aligned with
those from the CD experiment indicating that this could be an accurate representation of HxlR.
Figure 3-39 - Secondary structure prediction of HxlR1-His. (A red H indicates a predicted alpha
helix, – indicates a predicted random coil and a yellow E indicates a beta sheet). At the bottom is
the confidence score for each residue with scores above six being shown in green.
3.10 Summary and Discussion
Phylogenetic studies show that regulators of FDPs are much less conserved than the pathways
themselves. Relatively few organisms seem to share the same TF for a given pathway and this is
a general result with regard to TFs and the pathways which they regulate. 229 Of the known TFs
of FDPs, FrmR is the most widely distributed with a copy of it appearing in 42 genome
sequences. However given that ~1800 bacteria have been sequenced (at the time of writing),
this number is still rather small. One consequence of this poor conservation of FDP TFs is that there
are inevitably other TFs of FDPs that have yet to be discovered. The hxlR gene does not appear to
be particularly abundant in sequenced organisms as only 13 copies of an hxlR-like gene were
found upstream of a FDP. Interestingly these 13 genes can be grouped into two distinct groups
based on sequence identity.
Several constructs of the frmR gene have been cloned in order to create pure samples of the
FrmR protein. While both the native FrmR and a C-terminally tagged version, FrmR-His, could be
purified, the N-terminally tagged version, His-FrmR, proved insoluble and was not studied
further. Mass spectrometry of the purified proteins shows that both have a methionine residue
cleaved from their N-terminus. Light scattering and gel filtration have shown that they are both
likely to be tetrameric proteins. Each has also been found to be a largely alpha helical protein
containing no beta sheets. Secondary structure predictions suggest that the FrmR chain is
comprised of three helices. Interestingly the MSA in Figure 3-4 suggests that the third helix in
the chain represents a poorly conserved region (ignoring the terminal three residues).
A pure sample of HxlR1-His was obtained through a heterologous host (E. coli). Mass
spectrometry results show that an N-terminal methionine residue is cleaved off the protein.
Multiangle light scattering has confirmed that HxlR1-His exists in solution as a dimer. CD and
computerised secondary structure predictions have shown that HxlR1-His is likely to be mostly
helical though containing some beta sheet structure. HxlR2-His, a member of the type two form
of HxlR proteins was also purified from E. coli. As with other members of this protein family,
HxlR2-His is dimeric and seems to exist as a mixture of peptides either with a cleaved or
uncleaved N-terminal methionine.
The results in this chapter mostly serve as a platform to undertake more insightful experiments
in the subsequent chapters. However, some important biophysical properties of these TFs have
been elucidated. As with all in vitro biochemical experiments, results have to be treated with
some caution because these macromolecules have been taken out of their natural environment,
i.e. the cell, and therefore any results obtained may not be a true reflection of the
behaviour or structure of the TFs in their biological context. Indeed for this project, several of the
proteins studied have had extra amino acids placed onto their C-terminus, which may also affect
their properties.
4 Crystal Structure Determination of FrmR and HxlR
4.1 Introduction
Knowing the detailed 3D-structure of biomolecules is essential to understand their function.
The unique structure of a biomolecule dictates how it can interact and/or catalyse a reaction
and therefore how it fulfils its biological role. It is therefore desirable to obtain detailed
structures of the TFs that are being studied in this project and this chapter describes
experiments aimed at obtaining these structures. The most common method of obtaining an
accurate 3D-model is through the use of macromolecular X-ray crystallography. X-ray
crystallography is the method of determining how the atomic structure of a crystal is arranged
by illuminating the crystal with X-rays and analysing the resulting diffraction patterns. This
chapter attempts to obtain high resolution structures of the purified proteins from Chapter 3 in
order to increase our understanding of their biochemical properties. Acknowledgements are
given to Dr Mark Dunstan and Professor David Leys for their help and assistance with the
experiments described in this section.
4.2 Aims and Objectives
The first step in macromolecular structure determination by X-ray crystallography is to obtain
crystals of the macromolecule. There are several known methods that can achieve this with the
‘vapour diffusion’ method proving particularly successful; this technique will therefore be
employed in this study. By screening many known crystallisation conditions it is hoped to
obtain crystals of the TFs that were purified in chapter 3. If this first step is successful, then the
diffraction patterns of the crystals will be determined using high intensity X-ray radiation from
a synchrotron source. Once the data are collected, they need to be processed so that all the
recorded reflections are stored in one file with their intensities scaled relative to each other.
Then, the recorded reflections need reasonably accurate phase estimates so that a starting
model for refinement can be created. In this study, molecular replacement (MR) was used for
this procedure. If a realistic starting model is obtained then model refinement can be performed
which will hopefully lead to a final structure. With a refined structure in hand, it is then possible
to evaluate the protein structure and assess its biological context.
4.3 Crystallization
Macromolecular crystallization usually only occurs in some conditions and all too often, not at
all.205 To try and find optimal crystallization conditions, each protein was screened under 384
different conditions using commercially available screens obtained from MOLECULAR
DIMENSIONS. These screens each contain 96 different reservoirs and are designed to cover the
majority of known conditions that support crystallization of globular proteins. These screens
are called PACT-Premier, JCSG plus, Clear strategy 1 and Clear strategy 2. In order to screen
through these conditions, the protein sample is first mixed in a 1:1 ratio with solution
from the reservoir before being placed in the crystallization well. Details of the crystallization
procedure are described in 2.6.2.
4.3.1 Crystallization of FrmR-His and FrmR
Crystallization of FrmR and FrmR-His was attempted using these four screens. Two starting
concentrations were used for each protein: 10 mg/mL and 5 mg/mL. Unfortunately, no
crystalline form of either of these two proteins was observed (despite screening various
conditions and ligands such as formaldehyde). These results meant that crystallographic
structure determination could not be performed on these two proteins.
4.3.2 Crystallization of FrmRC36S
As explained in chapter 5, a mutant form of FrmR, FrmRC36S, deficient in formaldehyde sensing but
not in DNA-binding, has been obtained. In order to test whether this mutation alters FrmR’s
ability to crystallize, this protein was also tested against the four screens. It was found that, in
contrast to the WT, FrmRC36S readily formed crystals in several conditions. Two of these
conditions produced crystals that were of an appreciable size, but appeared to belong to
different crystal forms. One type of crystal formed hexagonal rods while the other can be
described as plates. The plate like crystals were formed in condition G2 from PACT premier
with the reservoir consisting of 0.2M Sodium bromide, 0.1M Bis Tris propane, 20% w/v PEG
3350 at pH 7.5. The conditions that formed the rods were from well A4 of the Clear strategy 2
screen with the reservoir consisting of 0.5M potassium dihydrogen phosphate and 0.1M sodium
acetate (pH 5.5). Both of these crystal forms were on the scale of approximately 50-200
micrometres in each dimension. In order to obtain more and potentially improved crystals, both
these conditions were optimized by scaling up the volumes used as well as testing small
deviations from the original well conditions. This yielded crystals approximately four times
larger than those from the screen, and these were used in the diffraction
experiments.
Figure 4-1- Picture taken of the rods formed in well A4 of clear strategy 2.
4.3.3 Crystallisation of HxlR1-His
Purified HxlR1-His was tested for crystallisation against the four screens. This was performed
with HxlR1-His at two concentrations: 5 mg/mL and 10 mg/mL. Unfortunately, none of the
conditions exhibited crystal growth. This meant that the crystallography procedure for HxlR1-
His was stopped at this point.
4.3.4 Crystallisation of HxlR2-His
Purified HxlR2-His was found to form crystals in two conditions. These were either rod shaped
or plate-like. The more rod-like crystals came from well H5 in the Clear strategy screen 2
where the reservoir contains 0.15 M Potassium thiocyanate, 0.1M Tris pH 8.5 and 20% v/v PEG
600. The plate-like crystals were from well H6 in the Clear strategy 1 screen in which the
reservoir consists of 0.8 M Sodium formate, 0.1 M Tris pH 8.5, 10% w/v PEG 8K and 10% w/v
PEG1K. Both crystal forms had dimensions on the order of 10-100 micrometres.
Figure 4-2- Left- Rod-like crystals of HxlR2-His from well H5 from the Clear strategy 2 screen.
Right- Plate-like crystals of HxlR2-His from well H6 from the Clear strategy 1 screen.
4.4 Diffraction Data Collection
To obtain diffraction patterns of the crystals grown in section 4.3, several steps need to be
undertaken. First of all the crystals need to be taken from the crystallisation well and be flash-
cooled in liquid nitrogen. This allows X-ray diffraction experiments to be conducted at low
temperatures therefore improving crystal life-time, the order of the crystal and thus improving
the quality of data collected. To prevent the formation of ice crystals, protein crystals are mixed
with what is known as a cryo-protectant prior to flash-cooling. A cryo-protectant is usually a
viscous substance such as glycerol and prevents ice crystal formation.230 Once mixed with a
cryo-protectant, protein crystals are placed into a small nylon loop (usually around 10µm in
diameter) and the crystal is quickly (flash) cooled in liquid nitrogen.
Next, high intensity monochromatic X-rays are used to illuminate the crystal at multiple
angles and the resulting diffraction patterns need to be recorded. High intensity monochromatic
X-rays can be obtained from synchrotron light sources. Therefore, all diffraction experiments
described in this section were performed at the European Synchrotron Radiation Facility
(ESRF). The setup of these diffraction experiments is shown in Figure 4-3. The crystal is
mounted onto a goniometer which is a device that rotates at precise angles perpendicular to the
incident X-ray beam (rotation about the angle, φ). The crystal is constantly cooled by a stream
of nitrogen gas and the images are recorded on a charge-coupled device (CCD) detector.
Further technical details of these diffraction experiments are described in 2.6.3.
Figure 4-3- Diagram showing the setup of the X-ray diffraction experiments in this section. The
loop containing the crystal is mounted onto the head of the goniometer that rotates about angle φ.
The incident X-ray beam goes through the center of the crystal which is in the plane of the CCD
detector that records the images.
4.4.1 Data Collection on FrmRC36S crystals
These experiments were carried out with the explicit aim to obtain the highest resolution data
possible. Resolution of the data can be estimated by examining the location of the outermost
reflections of the images collected. These experiments were performed using beamline ID14-4
at the ESRF (Grenoble) at a wavelength of approximately 0.94Å. A total of 360 images were
recorded through 180° with an individual oscillation angle of 0.5°. Figure 4-4 shows an
example of the images from the highest resolution data set obtained for FrmRC36S. The rod like
crystals diffracted to a resolution of approximately 3Å whereas the plate like crystals did not
diffract X-rays.
Figure 4-4- Example of a diffraction image obtained from the highest diffracting rod like
FrmRC36S crystal. The blue line shows the approximate diffraction limit which corresponds to a
plane spacing of 3 Å.
4.4.2 Data Collection of HxlR2-His crystals
These experiments were performed at beamline ID23-1 at the ESRF (Grenoble) at a
wavelength of approximately 0.94Å. For each crystal data set, 360 images were recorded through
180° with an individual oscillation angle of 0.5°. The rod like HxlR2-His crystals diffracted to a
resolution of approximately 2.3Å (Figure 4-5) whereas the plate like crystals diffracted to a
resolution of 1.95Å (Figure 4-6).
Figure 4-5- Example of a diffraction image obtained from the highest diffracting rod like HxlR2-His
crystal. The blue line shows the approximate diffraction limit which corresponds to a plane
spacing of around 2.3 Å.
Figure 4-6- Example of a diffraction image obtained from the highest diffracting plate like HxlR2-
His crystal. The blue line shows the approximate diffraction limit which corresponds to a plane
spacing of around 1.95 Å.
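The relationship between the position of the outermost reflections and the quoted resolution limits can be illustrated with Bragg's law, λ = 2d sin θ. The sketch below calculates the scattering angle 2θ that corresponds to each reported plane spacing at a wavelength of 0.94Å, and checks the total rotation range covered by the images. The function names are introduced for illustration, and the calculation is not part of the data processing described in this chapter.

```python
import math

WAVELENGTH = 0.94  # Å, approximate wavelength quoted for both beamlines


def scattering_angle_deg(d_spacing, wavelength=WAVELENGTH):
    """Scattering angle 2θ (degrees) from Bragg's law, λ = 2 d sin θ."""
    theta = math.asin(wavelength / (2.0 * d_spacing))
    return 2.0 * math.degrees(theta)


def total_rotation_deg(n_images, oscillation_deg):
    """Total crystal rotation covered by a data set."""
    return n_images * oscillation_deg


# Plane spacings (Å) at the diffraction limits reported in sections 4.4.1 and 4.4.2.
LIMITS = {
    "FrmRC36S rods": 3.0,
    "HxlR2-His rods": 2.3,
    "HxlR2-His plates": 1.95,
}

for crystal, d_spacing in LIMITS.items():
    print(f"{crystal}: d = {d_spacing} Å, 2θ = {scattering_angle_deg(d_spacing):.1f}°")

print(f"Rotation range: {total_rotation_deg(360, 0.5)}°")
```

Smaller plane spacings correspond to larger scattering angles, which is why higher resolution reflections appear further from the centre of the diffraction image.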
4.5 Data Processing
Following data collection, the images need to be processed. Data processing goes through
a number of stages, the first of which is determining the crystal orientation, the unit-cell
parameters (indexing) and the potential space group of the crystal. These parameters are used to
predict a diffraction pattern, and the intensity is recorded at each predicted reflection, a process
called integration. The program iMosflm was used to carry out this initial data processing.206,231
Due to the imperfect nature of real protein crystals, a particular reflection occurs over a certain
oscillation width (rather than being a discrete point in space), so a reflection can be spread
across several images. This results in partial reflections being recorded, as the reflection is split
between separate images. The images must therefore be placed on a common scale (scaling), and the
partial and repeated measurements of each reflection must then be combined (merging). Both
scaling and merging were performed using the program SCALA.207 The most likely space group was
then determined for each crystal form using the program POINTLESS.207 In order to continue with
structure elucidation, the list of intensities needs to be converted into structure-factor
amplitudes, |F|; this operation was conducted using the program Truncate. If the data were from a
perfect lattice, |F| would be equal to √I; however, as the data are imperfect, Truncate makes best
estimates of |F|. Additionally, the “Free R flag” is set; this marks the 5% of the data that is not
used for determining the model and is kept aside for generating the Rfree statistic.208 The output
from POINTLESS and SCALA gives a number of data processing statistics and contains much of the
information obtained from data collection and data processing. These statistics are generally
presented in a table such as Table 4-1.
It contains:
Space group - The space group of the crystal.
Unit-cell parameters (a, b, c) - The dimensions of the unit cell of the crystal in Å.
Unit-cell parameters (α, β, γ) - The angles (°) between the three unit-cell axes: α between b
and c, β between a and c, γ between a and b.
Resolution range - The range in Å over which the data were processed. This corresponds to the
angle of the reflections seen in the diffraction pattern.
Number of observed reflections - The total number of reflection measurements used in the data
processing.
Number of unique reflections - The number of symmetry-independent reflections remaining after
merging.
Completeness - The percentage of the reflections expected for this space group at the stated
resolution that were actually measured.
<I/σI> - The average ratio of the measured intensity of a reflection, I, to its standard
deviation, σI.
Rmerge - An R factor similar to that described in equation 2.9. It measures the agreement between
the averaged scaled intensity of a reflection from all images and the scaled intensity of the same
reflection from individual images.
Multiplicity - The average number of individual measurements of a given reflection (number of
observed reflections divided by number of unique reflections).
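As an illustration of how two of these statistics are derived, the sketch below computes the multiplicity and Rmerge for a handful of made-up measurements. It assumes the conventional definition of Rmerge (the sum of absolute deviations of individual intensities from their mean, divided by the sum of the intensities), which may differ in detail from equation 2.9, and it assumes each measurement has already been mapped to its symmetry-unique index. The toy intensities are hypothetical. The final line checks the multiplicity definition against the observed and unique reflection counts reported in Table 4-2.

```python
from collections import defaultdict

def merging_statistics(observations):
    """Return (n_observed, n_unique, multiplicity, r_merge).

    observations: iterable of (hkl, intensity) pairs for scaled measurements,
    where hkl is assumed to be already reduced to its symmetry-unique index.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    n_observed = sum(len(values) for values in groups.values())
    n_unique = len(groups)

    deviation_sum = 0.0
    intensity_sum = 0.0
    for values in groups.values():
        mean_intensity = sum(values) / len(values)
        deviation_sum += sum(abs(v - mean_intensity) for v in values)
        intensity_sum += sum(values)

    multiplicity = n_observed / n_unique
    r_merge = deviation_sum / intensity_sum
    return n_observed, n_unique, multiplicity, r_merge

# Hypothetical toy measurements: two unique reflections measured 2 and 3 times.
toy = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((0, 1, 0), 50.0), ((0, 1, 0), 50.0), ((0, 1, 0), 56.0),
]
n_obs, n_uniq, multiplicity, r_merge = merging_statistics(toy)
print(n_obs, n_uniq, multiplicity, round(r_merge, 4))

# Multiplicity from the reflection counts reported in Table 4-2
# (rod-like and plate-like HxlR2-His crystals).
print(round(211497 / 15380, 1), round(85151 / 22450, 1))
```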
4.5.1 FrmRC36S
Table 4-1 shows the data processing statistics for the FrmRC36S crystal:
Statistic | Value
Space group | P3₁12
Unit-cell parameters a, b, c (Å) | 143.58, 143.58, 55.35
Unit-cell parameters α, β, γ (°) | 90.0, 90.0, 120.0
Resolution range (Å) | 62.17 - 3.00 (3.2 - 3.00)
No. of observed reflections | 95051
No. of unique reflections | 4233
Completeness (%) | 95.6 (86.9)
<I/σI> | 45.9 (6.7)
Rmerge | 0.037 (0.247)
Multiplicity | 7.8 (10.2)
Table 4-1 - Data processing statistics from the FrmRC36S crystal. Numbers for completeness,
<I/σI>, Rmerge and multiplicity are given for the overall data and for the high-resolution data
(bracketed).
4.5.2 HxlR2-His
Table 4-2 shows the data processing statistics for both crystal forms of HxlR2-His.
Crystal | Rod-like | Plate-like
Space group | P4₃2₁2 | P2₁2₁2
Unit-cell parameters a, b, c (Å) | 74.21, 74.21, 118.58 | 78.65, 124.21, 30.69
Unit-cell parameters α, β, γ (°) | 90.0, 90.0, 90.0 | 90.0, 90.0, 90.0
Resolution range (Å) | 52.47 - 2.3 (2.5 - 2.3) | 48.74 - 1.95 (2.15 - 1.95)
No. of observed reflections | 211497 | 85151
No. of unique reflections | 15380 | 22450
Completeness (%) | 99.9 (100) | 98.7 (96.7)
<I/σI> | 13.9 (5.0) | 6.8 (2.6)
Rmerge | 0.147 (0.816) | 0.109 (0.375)
Multiplicity | 13.8 (14.2) | 3.8 (3.7)
Table 4-2 - Data processing statistics from both crystal forms of HxlR2-His. Numbers for
completeness, <I/σI>, Rmerge and multiplicity are for the overall data, while those for the
high-resolution data are bracketed.
4.6 Phase determination by Molecular Replacement (MR)
At this stage, the phase angles of the recorded reflections are unknown. In order to obtain
interpretable electron density maps, the phases must be estimated. As the proteins
crystallised in this work have homologues for which the structure has been determined,
molecular replacement (MR) was the preferred method. The homologous structures are
referred to as the phasing model. The general procedure involves accurately placing the phasing
model into the unit cell of the new structure so as to provide a suitable starting model for the
latter. This six-dimensional search (three rotation angles and three translations along x, y and z)
is usually split into two searches, a rotational search followed by a translational search. The
procedure relies on the construction of a Patterson map from the data for the target protein. This
is constructed by performing a Fourier transform on the squared structure-factor amplitudes, |F|²
(i.e. with all phase angles set to 0), which gives a map of the vectors between atoms within the
unit cell.
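The property that a Patterson map contains peaks at interatomic vectors can be demonstrated with a one-dimensional toy model. The sketch below is illustrative only: the "unit cell" is a hypothetical grid of 64 points containing two point atoms, and the Patterson function is computed in the standard way as the Fourier transform of the squared amplitudes with no phase information.

```python
import numpy as np

n = 64  # grid points along a hypothetical one-dimensional unit cell

# Two point "atoms" of different weight at arbitrary positions.
density = np.zeros(n)
density[10] = 1.0
density[25] = 2.0

# Structure factors are the Fourier transform of the density.
structure_factors = np.fft.fft(density)

# The Patterson function uses only the squared amplitudes (phases discarded).
patterson = np.fft.ifft(np.abs(structure_factors) ** 2).real

# Peaks appear at the origin and at plus/minus the interatomic vector (25 - 10 = 15).
peak_positions = np.flatnonzero(patterson > 1e-9)
for u in peak_positions:
    print(f"u = {u:2d}  P(u) = {patterson[u]:.2f}")
```

The peak at the origin corresponds to every atom paired with itself, while the two remaining peaks correspond to the vector between the atoms and its inverse (which wraps around to n - 15 on the periodic grid). Their height is the product of the two atomic weights.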
A Patterson map is also constructed from the phasing model, and if the phasing model is similar
enough to the target protein, the two Patterson maps should overlap well at a particular
orientation. The MR procedure is split into two separate functions. In the first, the Patterson map
of the phasing model is superimposed onto the Patterson map of the target protein, and the phasing
model is systematically rotated about three angles in order to find the orientation that gives the
best correlation with the target protein (through Patterson correlation). The second search
function translates the optimally orientated model throughout the unit cell so as to find the
position at which |F| of each reflection from the target protein data set is closest to the
calculated |Fcalc| derived from the position of the phasing model. In this work the MR procedure
was performed with the program Phaser. Phaser uses a method broadly based on the procedure
described above, with additional scoring algorithms based on maximum-likelihood probability
theory.209 Each solution from Phaser comes with associated scores that indicate whether the
procedure is likely to have been successful. The Z-score is given for both the rotation and
translation functions and represents the signal-to-noise ratio of each search. The translational
Z-score gives the strongest indication as to whether a solution has been found: translational
Z-scores above 8 indicate that a solution has almost definitely been found, whereas Z-scores below 6
are unlikely to be solutions. For potential solutions, Phaser outputs a file containing the
reflections with their new phase-angle estimates, along with a model in the correct orientation and
position. Details of the settings used and the variables changed to obtain the best solutions from
Phaser are described in section 2.6.5.
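The rule of thumb for translational Z-scores can be written as a small helper. This is only a sketch of the thresholds quoted above and is not part of Phaser; the label used for scores between 6 and 8 is an assumption, as the text does not classify that range. The example values are the translational Z-scores reported for the tetragonal HxlR2-His data set in Table 4-4.

```python
def interpret_translational_z(z_score):
    """Classify a translational Z-score using the thresholds quoted in the text."""
    if z_score > 8.0:
        return "solution almost definitely found"
    if z_score < 6.0:
        return "unlikely to be a solution"
    # Assumed label: the text does not say how to treat scores between 6 and 8.
    return "ambiguous"

# Translational Z-scores from Table 4-4 (tetragonal HxlR2-His data set).
for model, z in [("YtdC (monomer)", 8.9), ("YtdC (dimer)", 8.8), ("CaspR output model (dimer)", 25.1)]:
    print(f"{model}: {z} -> {interpret_translational_z(z)}")
```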
4.6.1 Molecular replacement for FrmRC36S
On searching the PDB, the only two protein structures showing significant homology to FrmR
were the CsoR proteins from Mycobacterium tuberculosis (PDB code: 2HH7) and Thermus
thermophilus (PDB code: 3AAI).177,232 Both structures display only modest sequence identity
(<40%) and both were used as phasing models. Additionally, to increase the chances
of obtaining a solution, a different phasing model was constructed using the program CaspR.
CaspR uses an MSA containing the target protein along with several related protein structures to
generate many potential phasing models. The resulting models also come with indications of
which are more likely to be useful as phasing models.210 The best model from CaspR was also
used in the MR procedure. These results are shown in Table 4-3. As the above methods did not
generate a successful model, the software Balbes was also used in a further attempt to find a
successful solution. Balbes is a fully automated program that performs a model-searching
procedure to generate phasing models as well as carrying out the MR procedure.233
Phasing model | Translational Z-score | Rotational Z-score
CsoR (TB) (monomer) | 6.0 (partial solution) | 3.6 (partial solution)
CsoR (TB) (dimer) | No solution | No solution
CsoR (TB) (tetramer) | No solution | No solution
CsoR (TT) (monomer) | 6.0 (partial solution) | 3.2 (partial solution)
CsoR (TT) (dimer) | No solution | No solution
CsoR (TT) (tetramer) | No solution | No solution
CaspR model | No solution | No solution
Table 4-3 - Phasing statistics from Phaser. TB = Mycobacterium tuberculosis, TT = Thermus
thermophilus.
Despite several attempts, no solutions could be found with Phaser (or related MR programmes).
In contrast, using Balbes gave a solution that the software predicted had a 95% probability of
being the correct model. This model was therefore used for further model building and
refinement.
4.6.2 Molecular replacement of HxlR2-His
On searching the PDB, the most similar protein structure to HxlR2 is that of a putative TF YtdC
from Bacillus subtilis (41% identity over full length).234 This structure was therefore used as a
phasing model. Additionally, to increase the chances of obtaining a solution, a different phasing
model was constructed using the program CaspR.210 The best model from CaspR was also used
in the MR procedure. Phaser MR was performed using both the monomeric and dimeric forms
of YtdC, as well as the best CaspR output as a phasing model for the P43212 data set. Table 4-4
shows the best Z-scores obtained with each phasing model.
Phasing model | Translational Z-score | Rotational Z-score
YtdC (monomer) | 8.9 and 4.7 | 4.2 and 3.9
YtdC (dimer) | 8.8 | 5.7
CaspR output model (dimer) | 25.1 | 9.2
Table 4-4 - Best obtained Z-score statistics from Phaser using dimeric and monomeric YtdC as well
as the best CaspR output model.
Table 4-4 shows that plausible solutions are obtained by using YtdC as a phasing model, because
translational Z-scores above 8 are obtained. However, the phasing model obtained from CaspR
gives much higher Z-scores for both the translation and rotation functions. Therefore the
subsequent model building and refinement was carried out using the phases and coordinates
resulting from this run of Phaser.
Table 4-5 shows the best output Z-scores from Phaser using the orthorhombic data set. The
Z-scores obtained using the above models are not nearly as convincing as those for the tetragonal
data set. Therefore, Phaser was re-run using the fully refined structure of the tetragonal form of
HxlR2-His, which produced, as expected, high Z-scores.
Phasing model | Translational Z-score | Rotational Z-score
YtdC (monomer) | 3.6 (partial solution) | 4.1 (partial solution)
YtdC (dimer) | 4.6 | 6.1
CaspR output model (dimer) | 9.1 | 5.3
Tetragonal HxlR2-His (dimer) | 30.2 | 17.7
Table 4-5 - Best obtained Z-score statistics from Phaser using dimeric and monomeric YtdC as
well as the best CaspR output model. Additionally, HxlR2-His (tetragonal) was also used as a
phasing model.
4.7 Model building and refinement
Details of the model building and refinement methods used for all data sets are described in
section 2.6.6.
4.7.1 Model improvement and refinement of FrmRC36S
Attempted refinement of the Balbes model obtained in section 4.6.1 against the diffraction data of
FrmRC36S failed; i.e. the R and Rfree statistics would not decrease to acceptable values. The
lowest R-values attained are shown in Table 4-6.
Statistic | Balbes output model (best)
R-factor | 0.44
Rfree-factor | 0.46
Table 4-6 - Best refinement statistics obtained for the model of FrmRC36S.
The output from Balbes is shown in Figure 4-7. Despite a monomer being used as the search model,
the structure clearly has the overall tetrameric arrangement that is expected (on the basis of our
own solution data for the size of FrmR in chapter 3 and given the tetrameric structure of CsoR),
rendering it a plausible solution. Despite this, the model does not refine, and the electron
density maps give no clear hints as to which areas require extensive rebuilding.
Figure 4-7- Best model that was obtained from Balbes. Each chain is coloured differently.
There could be several reasons why it is not possible to refine the model against these data.
Because the data are of relatively low resolution (3 Å), there is less tolerance for widely
divergent models. Hence, a likely problem is that the differences between the FrmRC36S structure
and the CsoR molecules are too large for refinement to succeed. Additionally, several
programs indicate that all FrmRC36S data sets collected display characteristics of merohedral
twinning.235 This further complicates structure elucidation and again severely limits the extent
to which the phasing model can differ from the FrmR structure. Potential solutions to this
problem are discussed in chapter 6.
4.7.2 Model improvement and refinement of HxlR2-His
The final statistics obtained from the model building and refinement of both crystal forms of
HxlR2-His are shown in Table 4-7. These statistics include the R and Rfree factors as well as the
root mean square deviations (rmsd) of the model’s bond angles and bond lengths from their
ideal geometries.
Statistic | P4₃2₁2 | P2₁2₁2
R-factor | 0.2211 | 0.2864
Rfree-factor | 0.2537 | 0.3287
RMSD bond lengths (Å) | 0.009 | 0.010
RMSD bond angles (°) | 1.103 | 1.346
Number of protein atoms | 1877 | 1835
Number of solvent atoms | 95 | 86
Table 4-7 - Refinement statistics for both crystal forms of HxlR2-His.
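For reference, the R and Rfree statistics can be sketched as follows. This is a simplified illustration rather than the calculation performed by the refinement program: it assumes the observed and calculated amplitudes are already on the same scale, uses the conventional definition R = Σ||Fobs| - |Fcalc|| / Σ|Fobs|, and selects the 5% test set at random. The amplitudes are synthetic and the function names are hypothetical.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor for amplitudes on a common scale."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    return np.abs(f_obs - f_calc).sum() / f_obs.sum()

def free_flags(n_reflections, free_fraction=0.05, seed=0):
    """Boolean mask marking a random test set that is excluded from refinement."""
    rng = np.random.default_rng(seed)
    flags = np.zeros(n_reflections, dtype=bool)
    n_free = int(round(free_fraction * n_reflections))
    flags[rng.choice(n_reflections, size=n_free, replace=False)] = True
    return flags

# Synthetic amplitudes: "calculated" values are the "observed" ones plus noise.
rng = np.random.default_rng(1)
f_obs = rng.uniform(10.0, 100.0, size=1000)
f_calc = f_obs + rng.normal(0.0, 5.0, size=1000)

free = free_flags(len(f_obs))
print("R     =", round(r_factor(f_obs[~free], f_calc[~free]), 3))
print("Rfree =", round(r_factor(f_obs[free], f_calc[free]), 3))
```

In a real refinement the model is fitted only against the working set, so Rfree is normally somewhat higher than R; in this synthetic example nothing is fitted, so the two values differ only by sampling noise.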
For the tetragonal space group, all the values in Table 4-7 are considered to be acceptable for a
completed structure at the given resolution. However, despite all efforts, the refinement
statistics for the orthorhombic form were not acceptable for the resolution to which the crystals
diffract. It is not clear why the model will not refine fully. A detailed inspection of the data
reveals that the quality of the images varies throughout the data set. Figure 4-8 shows a close-up
view of two sample images from this data set. The reflections are obviously smeared in one
orientation and not in the other. This indicates poor-quality data due to anisotropic behaviour of
the crystal.
Figure 4-8 - Close-up of images from different orientations of the crystal. Reflections in the
right image are smeared, which could indicate poor-quality data.
4.8 Validation of model structures
The validity of the final models was assessed using the program MolProbity.215
MolProbity assesses the geometry of the protein structure by comparing it to high-quality,
high-resolution structures, and identifies residues that have backbone dihedral angles
in unfavourable conformations (a Ramachandran analysis). MolProbity also adds hydrogens to the
model and assesses whether there are any serious clashes between atoms (non-hydrogen-bonding
atoms having van der Waals surfaces overlapping by more than 0.4 Å). The statistics obtained from
MolProbity are shown in Table 4-8.
Table 4-8 - Validation statistics as obtained using MolProbity for both HxlR2-His models.
The Ramachandran statistics from the tetragonal model indicate that most of the backbone
residues are in favourable conformations. 7.3% of rotamers were flagged as “poor rotamers”.
This is based on whether the rotamer deviates from those of a reference library of high-
quality protein structures (i.e. resolution ≤1.7 Å).236 For the resolution of this structure
(2.3 Å), the number of poor rotamers here is acceptable.237 All bond lengths and angles were of
acceptable values and all Cβ atoms are within 0.25 Å of their ideal positions. The number of
serious clashes is also acceptable. Given that the orthorhombic data set has refined poorly, it is
not surprising that the validity of the structure is not as good as that for the tetragonal model.
4.8.1 Crystal structure of HxlR2-His
The refined tetragonal structure is composed of two subunits: chain A and chain B. Chain A
includes residues 3-110 and chain B includes residues 5-112 (HxlR2-His is 124 residues long
including the C-terminal His-tag). Electron density was not observed for a small part of the
N-terminus and a more significant part of the C-terminus. This is often observed in crystal
structures because the termini tend to be more flexible than the rest of the protein. The refined
orthorhombic structure showed density for residues 4-107 in chain A and residues 5-112 in chain B.
This structure therefore resolved slightly fewer residues than the tetragonal form. The overall
structure of the tetragonal form is shown in Figure 4-9 and confirms the dimeric nature of the
protein, consistent with the MALS results (chapter 3). Example electron density (2Fo-Fc) for
helix 5 (see section 4.12), contoured at 2.0 σ, is shown in Figure 4-10.
Statistic | P4₃2₁2 | P2₁2₁2
Ramachandran outliers | 0% | 1.4%
Ramachandran favoured | 98.1% | 95.7%
Poor rotamers | 7.3% | 8.7%
Residues with bad bonds | 0% | 0%
Residues with bad angles | 0% | 1.89%
Cβ deviations >0.25 Å | 0 | 4
Serious clashes | 1.17% | 1.67%
Figure 4-9- Overall structure of tetragonal HxlR2-His colored according to chain. Chain A is
colored red and chain B is colored blue.
Figure 4-10 - Helix 4 (left) and helix 5 (right) of tetragonal HxlR2 displayed in 2Fo-Fc electron
density contoured at 2.0 σ.
4.9 Comparison of both HxlR2-His structures
To compare the two space group forms of HxlR2-His, an overlay of all atoms of the two
structures was performed. The two structures overlay very well (RMSD of 0.68 Å for 215 amino
acids). Figure 4-11 shows the backbone atoms (for clarity) of this overlay. The only regions that
display significant variation are the “wings” of the wHTH motif. This is not surprising, as these
are loop regions on the surface of the protein and are therefore mobile and more likely to be
influenced by crystal packing.
Figure 4-11 - Overlay of the tetragonal (blue) and orthorhombic (red) crystal forms of HxlR2-His
proteins.
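The RMSD values quoted in this chapter come from the superposition software, and the algorithm it uses is not stated here. As a hedged illustration of the general idea, the sketch below computes the RMSD between two equally sized coordinate sets after optimal rigid-body superposition using the Kabsch method, which is one common choice. The coordinates are random points generated for the example, not atoms from the HxlR2-His structures.

```python
import numpy as np

def superposed_rmsd(coords_a, coords_b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition (Kabsch)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)

    # Remove the translational component by centring both sets on the origin.
    a_centred = a - a.mean(axis=0)
    b_centred = b - b.mean(axis=0)

    # Optimal rotation from the singular value decomposition of the covariance matrix.
    covariance = a_centred.T @ b_centred
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (reflection).
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T

    a_rotated = a_centred @ rotation.T
    return float(np.sqrt(((a_rotated - b_centred) ** 2).sum(axis=1).mean()))

# Example: a rotated and translated copy superposes back with an RMSD of ~0.
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 3))
angle = np.radians(30.0)
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moved = points @ rot_z.T + np.array([1.0, 2.0, 3.0])
print(superposed_rmsd(points, moved))
```

In practice the two coordinate sets must first be matched atom for atom (for example, equivalent alpha carbons), which is why each RMSD is quoted together with the number of atoms or residues compared.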
It is interesting to note that in the orthorhombic structure, Cys-72 (of chain A) is linked by a
disulphide bond to Cys-72 (of chain A) of a symmetry-related molecule. This is shown below in
Figure 4-12. Given the similarity between these two structures, and that the orthorhombic
structure was poorly refined, only the tetragonal structure will be discussed further.
Figure 4-12: Disulphide bond between chain A Cys-72 of adjacently packed HxlR2-His dimers.
Electron density contoured at 2σ.
4.10 Comparison with other structures
Figure 4-13 shows an overlay of the backbone of both chains A and B of HxlR2-His with the two
most similar structures in the PDB: YtdC from Bacillus subtilis (PDB ID 2HZT) and the
oxidised and reduced forms of HypR from Bacillus subtilis (PDB IDs 4A5M and 4A5N,
respectively). The structures of HypR were published after the refinement of HxlR2-His, which is
why they were not used as phasing models.238
Figure 4-13 - Overlays of the P 43 21 2 structure with YtdC (top left), oxidized HypR (top right),
reduced HypR (bottom).
The HxlR2 structure overlaps well with all of these proteins, with no large difference in
conformation between any of the structures. The RMSD between HxlR2 and YtdC is 1.49 Å (for 196
alpha carbon atoms), that with oxidised HypR is 1.22 Å (for 191 alpha carbon atoms), and that
with reduced HypR is 1.60 Å (for 205 alpha carbon atoms). It is interesting that HxlR2 is
more similar to the oxidised form of HypR than to the reduced form. The oxidised form of HypR is
believed to be an active form of the protein that binds DNA, whereas the reduced form is
believed to be inactive and not DNA binding.238
4.11 A comparison between chain A and chain B in HxlR2-His
As the environments of chains A and B within the asymmetric unit (AU) are different, the
individual structures may not be identical. Figure 4-14 shows the overlay of the backbone of the
two chains. While most of the structure matches closely, there are some significant differences.
At positions G22, G23 and R24 the two chains noticeably diverge, and there is also some
discrepancy between the two chains at the two termini, which again reflects the mobility of these
areas.
Figure 4-14 - Overlay of chain A (red) and chain B (blue). The labelled divergence is part of
a loop region and contains residues G22, G23 and R24.
4.12 Secondary structure and domain organisation
Figure 4-15 shows chain A coloured with respect to its secondary structure and Figure 4-16
shows a schematic of this organisation. Each subunit appears to consist of two domains: a wHTH
domain and a dimerisation domain. The wHTH domain is composed of α-helices 2, 3 and 4 as well as
β-strands 3, 4 and 5. The loop between β-strands 4 and 5 comprises the “wing” of the motif and
helix 4 corresponds to the “recognition helix”.
Figure 4-15 - Chain A of HxlR2-His. The chain is coloured according to secondary structure:
α-helices are coloured pink, β-strands are coloured blue and loop regions are coloured grey. Each
secondary structure element is labelled according to its type and position in the chain starting at
the N-terminus.
Figure 4-16 - Schematic of the secondary structure elements of HxlR2-His. Each subunit can
broadly be divided into two separate domains: a wHTH domain and a dimerisation domain.
4.13 B-factor analysis of HxlR2-His
Some atoms within a protein will move about their mean position more than others. This motion
affects the electron density of that particular atom. The resulting electron density map is
therefore better defined at atoms with little motion and less well defined at atoms that exhibit
more motion. The extent of this motion is quantified by a number called the B-factor. Each
non-hydrogen atom is assigned a B-factor during the refinement process, which can be used as an
indication of the relative mobility of each atom.203 Figure 4-17 shows the backbone of HxlR2-His
coloured according to the B-factors of individual residues. The colouring is relative to every
atom of the protein and goes from dark blue for the lowest B-factor values (and hence the most
well defined atoms) to red for the highest B-factor values (and therefore the least well defined
atoms); the scale is from 13.70 to 74.57. Intermediate B-factors are coloured according to the
visible spectrum from dark blue to red.
Figure 4-17 - Backbone structure of HxlR2-His coloured according to backbone residue B-factors.
Colours range from dark blue through the visible spectrum to red although only blue to orange is
observed in this figure as only backbone atoms are shown.
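To give a feel for what these B-factor values mean physically, the sketch below converts a B-factor into a root-mean-square atomic displacement using the standard isotropic relation B = 8π²⟨u²⟩, and maps a B-factor onto a 0 to 1 position on the colour scale quoted above. The linear mapping is an assumption about how the graphics software assigns colours, and the function names are illustrative.

```python
import math

def rms_displacement(b_factor):
    """Root-mean-square displacement (Å) from an isotropic B-factor (Å^2): B = 8*pi^2*<u^2>."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

def colour_fraction(b_factor, b_min=13.70, b_max=74.57):
    """Position on the dark-blue (0.0) to red (1.0) scale, assuming a linear mapping."""
    return (b_factor - b_min) / (b_max - b_min)

# The two ends of the B-factor scale quoted in the text.
for b in (13.70, 74.57):
    print(f"B = {b:5.2f} Å^2: rms displacement = {rms_displacement(b):.2f} Å, "
          f"colour fraction = {colour_fraction(b):.2f}")
```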
Figure 4-17 reveals that most of the main chain is of a similar flexibility, as most of it is
coloured either light or dark blue. There are, however, regions that are coloured green, yellow
and orange. These are all loop regions between α-helices or β-strands, or are at the chain termini.
The peptide chain is therefore more variable in these regions than in parts that constitute
ordered secondary structure elements. This is to be expected because these regions are not held
in a specific position by a non-covalent interaction network, whereas α-helices and β-strands are.
The flexibility of loop regions is often essential for a protein to carry out its biological
function.
Figure 4-18 shows the ribbon structure of HxlR2-His coloured according to its side chain
B-factors. Here the colour represents the atom possessing the highest B-factor in that
particular residue's side chain (scale is from 13.70 to 74.57). There is noticeably more variation
in side chain flexibility than for the backbone atoms. Noticeably high values occur within
β-strand 2, which could perhaps indicate that this N-terminal region is not an ordered β-strand
as depicted in Figure 4-15 but is in fact a disordered region. Better-quality diffraction data
would be required in order to determine whether this is so. It is also evident that surface
residues in contact with the solvent tend to have higher B-factors than those making
contacts solely with other protein atoms. This is apparent when looking, as an example, at α-helix
4 from chain A (Figure 4-19). Side chains that point out into the solvent have relatively high
B-factors (Asn-52, Gln-53, Arg-54, Met-55, Arg-58, Gln-59, Arg-61, Glu-64 and Asp-65).
Conversely, side chains that point towards other parts of the protein and are therefore not fully
immersed in solvent (Leu-56, Ile-57, Leu-60, Leu-63 and Asp-66) have lower B-factors. This
result is expected because surface residues are not held in place by intramolecular interactions
to the same extent as interior residues and therefore have more freedom to move.
Figure 4-18 - Ribbon structure of HxlR2-His coloured according to side chain residue B-factor.
Colours range from dark blue through the visible spectrum to red.
Figure 4-19 - α-helix 4 from chain A of HxlR2-His coloured according to side chain B-factors.
Colours range from dark blue through to red. The side chains of this helix can broadly be divided
into those that interact with other protein side chains in the interior of the protein and those that
are immersed in solvent.
4.14 Analysis of the HxlR2-His dimer interface
The interface between chains A and B was analysed in order to establish the basis for the
dimeric quaternary structure observed in HxlR2-His. This was performed using the program
Protorp.239 Protorp uses several programs that have been written to calculate particular
physical properties of a protein given its atomic coordinates. Outputs from these programs are
amalgamated to provide a robust description of the structure of a protein-protein dimer
interface.240 The data obtained from Protorp are displayed in Table 4-9.
Table 4-9 - Output data from Protorp. All numerical values are per chain.
Protorp calculates the Accessible Surface Area (ASA) of each residue. ASA is the surface area of
the protein that is accessible to the solvent. Protorp defines a residue as interfacing if its
ASA decreases by more than 1 Å² when calculations for the isolated chain are compared with
those for the dimeric structure. This analysis (Table 4-9) suggests that over a third of residues
in HxlR2-His contribute to the formation of the dimer interface. Most of the atoms involved in
the interface are non-polar, which suggests hydrophobic forces play a large role in its formation.
Property | Value
ASA (Accessible Surface Area) of entire interface (Å²) | 1550.10
% Interface ASA | 19.74
No. of atoms in interface | 139
% polar atoms | 21.24
% neutral atoms | 7.00
% non-polar atoms | 71.05
No. of interfacing residues | 43
% polar residues | 25.58
% non-polar residues | 53.49
% charged residues | 20.93
% α-helix contribution | 67.44
% β-sheet contribution | 6.98
No. of atoms forming hydrogen bonds | 2
No. of atoms forming salt bridges | 5
No. of atoms forming disulphide bonds | 0
No. of atoms bridged by water molecules | 0
Furthermore, the number of hydrogen bonds in this interface is rather low as is the number of
salt bridges. It is therefore likely that the free-energy of formation of this dimer interface is
largely driven by hydrophobic interactions. This is not an uncommon result for the formation of
quaternary protein structures.241
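The interface criterion described earlier in this section (a decrease in ASA of more than 1 Å² on going from the isolated chain to the dimer) is straightforward to express in code. The sketch below is not Protorp itself; it simply applies that threshold to per-residue ASA values, and the residue numbers and areas used are hypothetical.

```python
def interfacing_residues(asa_isolated, asa_in_dimer, threshold=1.0):
    """Residues whose ASA (Å^2) drops by more than `threshold` on dimer formation."""
    return sorted(
        residue
        for residue, isolated in asa_isolated.items()
        if isolated - asa_in_dimer[residue] > threshold
    )

# Hypothetical per-residue ASA values (Å^2), keyed by (chain, residue number).
asa_isolated = {("A", 30): 15.2, ("A", 45): 80.0, ("A", 88): 42.0, ("A", 102): 60.5}
asa_in_dimer = {("A", 30): 15.2, ("A", 45): 79.4, ("A", 88): 5.0, ("A", 102): 12.5}

print(interfacing_residues(asa_isolated, asa_in_dimer))
```

Residues whose exposure is unchanged, or changes by less than the threshold, are treated as non-interfacing.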
Protorp also gives an assignment to each residue as to whether it is involved in the formation of
the dimer interface. Figure 4-20 shows the sequence of HxlR2-His coloured according to
whether the residue is involved in the formation of the interface (red for interfacing residues
and black for non-interfacing residues). As expected a large portion of interfacing residues
(>85%) come from what has been depicted as the dimerization domain in section 4.12.
Figure 4-20 - Sequence of HxlR2 with residues colored according to whether Protorp assigned
them as interface forming residues (red) or non-interfacing residues (black). Residues colored
blue were not crystallographically resolved. Parts of the peptide sequence that were depicted as
being part of the dimerisation domain in section 4.12 are shown in bold.
Figure 4-21 shows the ribbon structure of HxlR2-His with only the side chains of interfacing
residues displayed, and Figure 4-22 shows the same structure but displays only the interfacing
residues. Chains A and B are coloured differently for clarity and side chains are coloured
according to their type. From these images alone it is immediately clear that the main driving
force behind this dimerisation is likely to be hydrophobic interaction. The largest proportion of
the dimer interface is made by the interaction between α-helix 5 of both subunits.
Figure 4-22 shows that the length of this part of the interface is composed solely of residues
with non-polar side chains. This indicates that the hydrophobic interaction between these two
helices must contribute significantly to the overall stability of the dimer structure.
Figure 4-21 - Overall structure of HxlR2-His showing side chains assigned by protorp as being
interfacing residues. Side chains are colored brown (non-polar), purple (charged, acidic), blue
(charged, basic) and yellow (polar). Ribbon structures of chains A and B are colored red and blue
respectively.
Figure 4-22 – Left- Interfacing residues as assigned by protorp. Side chains are colored brown
(non-polar), purple (charged, acidic), blue (charged, basic) and yellow (polar). Ribbon structures
of chains A and B are colored red and blue respectively. Right HxlR2 showing the hydrogen
bonding between the backbone oxygen atom of Ile88 and the nitrogen atom of the indole side
chain of Trp102. Hydrogen bonding shown as red lines.
The two predicted hydrogen bonds that occur at the interface are between the backbone oxygen
atom of Ile88 and the nitrogen atom of the indole side chain of Trp102.
4.15 Analysis of the DNA-binding domain
Figure 4-23 shows the ribbon structure of the wHTH domain from chain A of HxlR2-His.
Figure 4-23 - Ribbon structure of the wHTH domain of HxlR2-His. Structure is colored according to
secondary structure with helices red, strands purple and loop regions grey.
The recognition helix from this motif is expected to make contacts with the major groove of its
TFBS.242 This interaction will play a large role in the specificity of the protein-DNA interaction
due to the specific network of hydrogen bonds between side chains of the recognition helix and
the bases in the major groove (Chapter 1). Analysis of the recognition helix shows that there are
several side chains on its surface that are capable of making hydrogen bonds to the DNA. These
are shown in Figure 4-24 and are Arg-54, Arg-58, Arg-61, Glu-62, Glu-64, Asp-65 and Asp-66.
These residues are likely to form the basis for the direct base read-out of this interaction.
Figure 4-24 - End view of the recognition helix of HxlR2-His. Side chains of the helix are displayed
and colored according to type: brown (non-polar), purple (charged, acidic), blue (charged, basic)
and yellow (polar). Residues that are on the surface of the helix and are therefore likely DNA-
binding residues are labeled.
As mentioned in chapter 1, the wing of a wHTH motif can also contribute to base readout by
making contacts with the bases in the minor groove. Interestingly, the wing of HxlR2-His is
composed entirely of non-polar residues: three prolines and two valines. Proline residues are
often found in β-turns as their ability to adopt unusual conformations can induce sharp turns in
the loop. It is highly unusual to have valines on the surface of the protein as their side chains are
hydrophobic and interact unfavourably with water. The fact that these valines are on the
protein surface suggests that they may be involved in a hydrophobic interaction with the DNA. If
a base within the minor groove is methylated, the valine residues could interact with this
methyl group resulting in stabilisation of the hydrophobic atoms. This would be another
contribution to base readout as the interaction would be specific to methylated bases in the
minor groove. For this hypothesis to be confirmed, the structure of a Type 2 HxlR-TFBS would
need to be observed experimentally.
Clearly some of the residues that contribute to HxlR2’s interaction with its TFBS will be
common to many related homologs. It is therefore possible to gain an insight into which
residues are likely to play a common role in the protein-DNA interaction in this family of
transcription factors. This was done by examining the conservation of residues in this domain.
Conversely, residues that are not well conserved amongst many homologs may contribute to the
specificity of this interaction as these residues can give rise to the base specific interaction.
Equally, poorly conserved residues can contribute little or nothing to the protein-DNA
interaction. In order to determine which residues are likely to play a key-role in DNA binding in
this protein, though not necessarily add to the sequence specificity, a BLAST search was
performed on the domain’s sequence and the top 1000 hits were used to build an MSA. The MSA
was constructed using CLUSTAL W.243 Figure 4-25 shows a logo diagram displaying the level of
conservation for each residue in the MSA. The consensus sequence from this MSA is also shown
along with the wHTH domain sequence from HxlR2. Columns above the red line represent
points where >80% of the 1000 different sequences in the MSA have the consensus residue.
Columns above the green line represent points where >95% of sequences in the MSA have the
consensus residue.
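The conservation measure plotted in Figure 4-25 can be illustrated with a short sketch. The following Python code is not the script used in this work; it is a minimal example that assumes the MSA is available as a list of equal-length aligned strings with gaps written as '-'. The function names and the toy alignment are hypothetical. For each column it reports the most common residue and the fraction of sequences carrying it, and then applies the 80% and 95% thresholds described above.

```python
from collections import Counter

def column_conservation(alignment):
    """Return (consensus_residue, fraction) for each column of an MSA.

    `alignment` is a list of equal-length aligned sequences. Gaps ('-')
    count towards the total number of sequences but are never chosen as
    the consensus residue.
    """
    n_seqs = len(alignment)
    results = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        if not counts:
            results.append(("-", 0.0))
            continue
        residue, count = counts.most_common(1)[0]
        results.append((residue, count / n_seqs))
    return results

def classify(fraction, mid=0.80, high=0.95):
    """Bin a conservation fraction using the >80% and >95% thresholds."""
    if fraction > high:
        return ">95%"
    if fraction > mid:
        return ">80%"
    return "below 80%"

# Hypothetical toy alignment (not sequences from the BLAST search)
toy_msa = ["RLEL", "RLEL", "KLEL", "RLDL", "RLEL"]

for position, (residue, fraction) in enumerate(column_conservation(toy_msa), start=1):
    print(position, residue, f"{fraction:.0%}", classify(fraction))
```

Because the thresholds are strict inequalities, a column in which exactly 80% of sequences share the consensus residue falls below the red line in this sketch.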
Figure 4-25 - Logo diagram of MSA containing the top 1000 hits from a BLAST search using the
wHTH domain of HxlR2 as a query. Consensus sequence is written in black under each column and
the sequence of HxlR2 at a particular position is written in black. Residues with column above red
and green line are conserved in >80% and >95% of sequences in the MSA respectively.
There are clearly some residues that are highly conserved throughout the DNA-binding domain
as well as some that are very poorly conserved. Of the residues in the recognition helix
highlighted in Figure 4-24, only Glu-64 is >95% conserved which could indicate that this residue
plays a general, though essential role for DNA-binding in many homologs of HxlR2. Again of
these highlighted residues, Asp-66 and Arg-61 are both >80% conserved suggesting that they
could play a more general role in DNA-binding. Glu-62 is 72% conserved and so a similar
conclusion can be made about that residue. However, Asp-65, Arg-54 and Arg-58 exhibit very
poor (<10%) conservation. It may well be the case that it is these residues that contribute to the
base specificity in the HxlR2-TFBS interaction.
Hydrophobic residues are unlikely to be directly DNA-binding, however they will carry out
important structural roles in the protein. Some hydrophobic residues show very high
conservation; the three leucines of the recognition helix are all >95% conserved throughout the
1000 sequences. These leucine residues may help to stabilize both DNA-bound and apo forms of
HxlR2 and related proteins. Val-77 that sits on the surface of the wing is also >80% conserved
indicating that it is likely to have some functional role in HxlR2 as well as other homologs. Pro-
78 and Pro-79 are both >95% conserved and are therefore likely to play an important role with
regard to this loop structure in many homologs of HxlR2. A high resolution structure of a HxlR
like protein in complex with its TFBS may give some insight into what the functional role of
these residues are; to date however, there is no such structure available.
4.16 Discussion of formaldehyde sensing by HxlR2
As HxlR2 is thought to be a sensor of cellular formaldehyde, we would expect it to either bind
directly to formaldehyde, or to a reaction product of formaldehyde, i.e. the adducts formed
from the reaction of formaldehyde with GSH or ribulose-monophosphate. In chapter 5, it is
shown that a cysteine residue is essential for the sensing of formaldehyde in FrmR, which is also
the case with AdhR.169 Therefore, formaldehyde sensing can be driven by cysteine residues,
however it remains to be established whether these two TFs sense formaldehyde directly or
sense an adduct/reaction product. HxlR2 possesses three cysteine residues (Cys-32, Cys-72,
Cys-100). If we were to expect any of these residues to react directly with formaldehyde, we
would observe a nucleophilic attack on the carbonyl group leading to the formation of a
tetrahedral species. This would be expected to induce a conformational change in the protein
structure. It is therefore unlikely that Cys-72 can perform this task because this residue sits on
the surface of the protein and therefore a nucleophilic attack would be unlikely to lead to a large
conformational change. This however would not rule out a change in the protein-TFBS complex
or an induction of RNAP recruitment caused by modification of Cys-72. Both Cys-32 and Cys-
100 are located in proximity to other residues that could stabilize a tetrahedral intermediate
and thus lead to a conformational change. For example, Figure 4-26 shows that Lys-47 (Chain B)
is in close proximity to Cys-32 (Chain B) (5.77Å) and could readily move to act as a hydrogen
bond donor to a tetrahedral intermediate. This interaction could bring about a conformational
change in the protein that results in changes in its transcription activating properties. Indeed,
an equivalent arrangement is also seen in chain A.
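Distances such as the one quoted above are Euclidean distances between atomic coordinates in the refined model. As a minimal sketch, assuming coordinates in Å for the lysine side-chain nitrogen and the cysteine sulphur, the calculation can be written as follows. The coordinates below are invented for illustration and are not taken from the HxlR2-His structure, and the 3.5 Å hydrogen bond cutoff is a commonly used rule of thumb rather than a value from this work.

```python
import math

# Assumed upper limit for a donor-acceptor hydrogen bond distance (in Å);
# a rule of thumb, not a value taken from this work.
HBOND_CUTOFF = 3.5

def atom_distance(a, b):
    """Euclidean distance between two (x, y, z) coordinates in Å."""
    return math.dist(a, b)

def shortfall_to_hbond(a, b, cutoff=HBOND_CUTOFF):
    """Distance (in Å) by which two atoms must approach to reach the cutoff.

    Returns 0.0 if the atoms are already within the cutoff.
    """
    return max(0.0, atom_distance(a, b) - cutoff)

# Hypothetical coordinates, chosen only to give round numbers
lys_nz = (12.0, 4.0, 7.0)
cys_sg = (15.0, 8.0, 7.0)

print(f"{atom_distance(lys_nz, cys_sg):.2f} Å apart")
print(f"{shortfall_to_hbond(lys_nz, cys_sg):.2f} Å closer needed to reach the cutoff")
```

A calculation of this kind only indicates how far a side chain would need to move; whether such a movement is feasible depends on the flexibility of the surrounding structure.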
Figure 4-26 – Top left: Detail of HxlR2. Chain A and B are colored yellow and blue respectively.
Lys-47 (chain B) and Cys-32 (Chain B) are labeled and the distance between the functional groups
of these residues is shown. Top right: Close up of Cys-32 and Lys-47 showing electron density
contoured at 2σ. Bottom: Stereo pair of the Cys-100 and Lys-47 residues with a rotation angle of
five degrees.
Equally, Figure 4-27 shows that a similar arrangement is seen at Cys-100 where the amino group of Lys-14
(Chain B) is located 5.57Å from Cys-100’s thiol group (Chain A). Again, a very similar geometry
is observed with the equivalent groups on the other chains.
Figure 4-27 – Top left: Detail of HxlR2. Chain A and B are coloured yellow and blue respectively.
Lys-14 (Chain B) and Cys-100 (Chain A) are labeled and the distance between the functional
groups of these residues is shown. Top right: Close up of Cys-100 and Lys-14 showing electron
density contoured at 2σ. Bottom: Stereo pair of the Cys-100 and Lys-14 residues with a rotation
angle of five degrees.
It is also possible that both Cys-32 and Cys-100 are involved in formaldehyde sensing; however,
none of these suggestions can be proven without conducting further experimental research. In
HypR, it is the formation of a disulphide bond between Cys-14 and Cys-46 that is believed to
convert a non-active reduced protein to an active oxidized protein. These cysteine residues are
not conserved in HxlR2 suggesting this is not a viable mechanism of regulation in this protein.
This is unusual given their sequence similarity; however, the overlay in Figure 4-13 shows that
HxlR2-His overlaps much better with the oxidized form of HypR suggesting HxlR2 is already in
this “active” conformation. This further suggests a similar mechanism is unlikely. The
comparison between these two proteins is discussed further in Chapter 6.
4.17 Discussion
Despite screening through an extensive range of conditions, several of the proteins in this study
failed to crystallise; these were FrmR, FrmR-His, and HxlR1. It was therefore not possible to
continue with crystallographic methods on these proteins. However, diffraction quality crystals
of FrmRC36S and HxlR2 were obtained. A full data set of an FrmRC36S crystal that diffracted to
a resolution of 3.0Å was recorded. Data processing showed that these crystals were of the space
group P3212. Unfortunately, although a plausible molecular replacement solution could be
obtained, this could not be refined. This could be due to the lower resolution obtained, in
combination with the large degree of error in the model used. Furthermore, signs of merohedral
twinning were detected in all FrmRC36S datasets. This is likely to result in the sort of problems
observed here, i.e. a potential solution is obtained from the MR software however the data are
too different to result in a refined structure.
Two crystal forms of HxlR2-His were observed. One of these was space group P43212 which
diffracted to 2.3Å and was solved using MR and fully refined. The other crystal was of P21212 and diffracted
to 1.95Å. Although a solution to this data set was found, the data refined poorly. Again, the cause
of this problem is unknown; however, the diffraction data were clearly not of the same quality as
the tetragonal data. The final models obtained of both forms overlap very well and are
effectively the same structure given the error associated with crystal structures of this
resolution. One noticeable difference in the crystal packing of these two forms is the
formation of a disulphide bond between C72 of neighbouring molecules.
The overall dimeric structure of HxlR2-His is similar to that of other proteins of this family. Each
subunit contains a wHTH DNA binding domain that is linked to a helical dimerization domain.
The wHTH of each subunit are located at opposite ends of the protein which is likely to define
the protein-DNA interaction. The formation of the dimer interface appears to be driven by
hydrophobic interactions between the dimerization domains of each subunit. As would be
expected in any accurate protein model, the B-factors of loop regions tend to be higher than
those of secondary structure regions. Side chains that are in contact with the solvent also have
higher B-factors than those that are buried within the protein interior, reflecting the higher
mobilities of these side-chains.
One very interesting feature of HxlR2-His is at the loop region comprising the “wing” of the
wHTH motif. There are several valine residues that are on the surface of the protein in contact
with the solvent. This is highly unusual as these residues will interact unfavorably with water
and hence destabilise the protein. It is also interesting that one of these valine residues shows a
high level of sequence conservation. These observations suggest that these valine residues may
play an important functional role, for example interacting with methylated DNA. Several
residues were noted as being potential DNA-binding residues and these were Arg-54, Arg-58,
Arg-61, Glu-62, Glu-64, Asp-65 and Asp-66. Some of these are well conserved and may therefore
play a similar role in related proteins. It is worth noting that many of the leucine residues in the
DNA binding domain are well conserved amongst other homologs. This suggests that the
structural role that these hydrophobic residues carry out is highly important in HxlR-like
proteins. How this protein senses formaldehyde remains unknown and will require further
experiments.
5 In vitro and in vivo functional characterisation of FrmR and HxlR
5.1 Introduction
This chapter intends to further study the functional properties of the TF FrmR, which is a
Duf-156 family protein; this family is shown to the right of the flow diagram in Figure 1-32. As
described in chapter 1, previous experiments have suggested that FrmR is a repressor of the
frmRAB operon and that this repression is stopped in the presence of formaldehyde.176 This
chapter tests this hypothesis both in vivo and in vitro and provides further evidence that
supports this mechanism of regulation. These experiments also provide further insights into the
interactions of FrmR with its TFBS as well as formaldehyde.
FrmR is a member of the Duf-156 family of TFs; the only other members of this family to be
characterised are TFs that sense heavy metals and regulate TGs involved in the transport of
these metals. These are the copper sensor CsoR from several organisms and RcnR from E. coli.
These TFs coordinate to metal ions which is thought to induce a conformational change in the
protein resulting in a dissociation from their TFBS and hence de-repression. This coordination
involves several protein residues including a conserved cysteine residue that corresponds to
Cys-36 of FrmR. 179,232,244,245 A similar link between formaldehyde and metal sensing appears to
exist in the MerR family, of which AdhR (section 1.4) is a member. CueR is a well-studied
member of this protein family and is involved in the regulation of copper transport proteins.
CueR specifically binds to copper which, as in CsoR/RcnR, induces a conformational change
causing an alteration in the expression of its TGs.246 Interestingly, this coordination also involves a
conserved cysteine residue that corresponds to Cys-52 in AdhR. This residue was found to be
essential for AdhR’s ability to sense formaldehyde and induce the expression of the
formaldehyde detoxification pathway that it had been shown to regulate.169 Hence, in the MerR
family there appears to be a link between sensors of heavy metal ions and sensors of
formaldehyde, both using a conserved cysteine residue. A similar situation could exist for Duf-
156 proteins, a hypothesis we will test in vitro and in vivo using a C36S FrmR mutant.
Both CsoR and RcnR have been shown to specifically bind to their promoters at TFBSs which are
located upstream of the transcription start site and overlap with the -10 and -35 structure
elements. These TFBSs possess AT containing inverted repeats which are interrupted by runs of
guanine and cytosine bases. In the RcnR promoter, these G and C tracts have been shown to induce the
formation of A-form DNA at these inverted repeats. Mutational studies suggest that the
formation of A-form DNA greatly enhances the affinity of RcnR for its TFBS.247 The FrmR
promoter also contains inverted repeat regions interrupted by G and C tracts that overlap with
the consensus -10 and -35 elements (Figure 5-1). There are two inverted repeats which may
indicate the presence of two different binding sites.
Figure 5-1 - Schematic of the frmRAB promoter. Consensus -10 and -35 elements are coloured
blue, the AT inverted repeats are in bold, C and G tracts are coloured red and the translational
start site is coloured green.
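The promoter architecture described above, an AT containing arm followed by a G/C tract and then the reverse complement of the first arm, can be searched for computationally. The sketch below is a minimal illustration of such a search and is not the method used to annotate Figure 5-1. The function names, the arm length, the permitted gap range and the example sequence are all hypothetical; the example sequence is not the frmRAB promoter.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_inverted_repeats(seq, arm=5, min_gap=0, max_gap=10):
    """Find inverted repeats with arms of fixed length.

    Returns a list of (left_start, right_start, left_arm, gap) tuples, where
    the right arm is the reverse complement of the left arm and `gap` is the
    number of bases separating the two arms.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for gap in range(min_gap, max_gap + 1):
            j = i + arm + gap
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, j, left, gap))
    return hits

# Hypothetical sequence: an AT containing arm, a run of G bases, and the
# reverse complement of the arm, flanked by two C bases on each side.
example = "CCATAGTGGGGGGACTATCC"

for left_start, right_start, left_arm, gap in find_inverted_repeats(example):
    print(left_start, right_start, left_arm, gap)
```

In practice the arm length and gap range would need to be varied, since short arms produce many chance matches in AT-rich promoter DNA.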
This promoter arrangement suggests that FrmR acts in a similar way to CsoR and RcnR and may
specifically bind to the inverted repeat regions. The G and C tracts may also be important
structural elements by enhancing this specificity. In this chapter, it is tested whether FrmR
specifically binds to its TFBS in vitro. However, the role of the C and G tracts was not examined.
FrmR residues that are likely to play an important role in the protein-DNA interaction are also
identified.
This chapter also aims to acquire a better understanding of how the HxlR proteins carry out
their biological function. In vitro experiments are carried out to try and obtain information to
help in this understanding. Figure 1-32 shows that the HxlR protein family is a sub family of the
GntR superfamily of wHTH TF proteins. Two previously studied HxlR family TFs that show
strong homology to both HxlR1 and HxlR2 have been characterized. These are YodB and HypR,
both from Bacillus subtilis. HxlR1 is 35.0% and 46.0% identical to YodB and HypR respectively
and HxlR2 is 32.2% and 37% identical to YodB and HypR respectively. Both these proteins
sense reactive electrophiles; YodB has been shown to sense quinones and HypR senses diamide.
Given the sequence similarity of HxlR1 and HxlR2 to these TFs, as well as the fact that
formaldehyde is a reactive electrophile, it is likely that these proteins function in a similar way.
Both HypR and YodB have been shown to regulate their TGs by what is known as a “two-Cys
type redox-sensing mechanism”. This mechanism involves the electrophile oxidizing a pair of
cysteine residues resulting in the formation of an intermolecular disulphide bridge between two
subunits. This is believed to alter the conformation of the protein, thus changing its
transcriptional regulatory properties. 238,248,249
For both YodB and HypR, one of the essential cysteine residues for the sensing mechanisms
corresponds to Cys-11 in HxlR1. The conservation of this residue could indicate that it plays an
important role in the formaldehyde sensing mechanism of HxlR1. However, this cysteine
residue is the only one of its kind in the HxlR1 chain, which rules out the potential for HxlR1
to function via the two-Cys type redox-sensing mechanism. If HxlR1 senses formaldehyde
through a cysteine residue then it will have to be through this Cys-11 residue and cannot
involve the formation of intramolecular disulphide linkages. Therefore, despite the high
homology between HxlR1 and HypR, they must sense their effector molecules by different
mechanisms. Interestingly, HxlR2 and the other type 2 HxlR TFs (see Figure 3-1) do not possess
a corresponding Cys-11 residue. However, they do possess other cysteine residues that may
support formaldehyde sensing (section 4.16). If HxlR2 is in fact a sensor of formaldehyde and
this is done through a cysteine residue, then the mechanism of action will have to be different
from that of HxlR1.
Although previous biochemical work would suggest that formaldehyde sensing, like other oxidant
sensing, would be driven by a nucleophilic thiolate, it is entirely plausible for sensing to be
driven by a non-cysteine residue such as lysine.250
5.2 Aims and objectives
Based on the sequence similarities between FrmR and the metal sensors CsoR and RcnR, it can
be postulated that FrmR might behave in a similar way with regard to its interactions with DNA
and effector molecules. Initially, this chapter will try to determine whether FrmR interacts
with its promoter in vitro. This will be done by performing a series of electrophoretic mobility
shift assays. The effect of formaldehyde on this interaction will also be studied. So that FrmR can
be studied in vivo, this section describes the construction of two plasmid based reporter
systems as well as the steps taken to acquire an E. coli DE3 lysogen that lacks the genomic frmR
gene. One of these reporter systems is shown to work as expected with respect to repression
and formaldehyde induced derepression. The reporter plasmid is then used to study which
residues are likely to be fundamental to DNA binding by creating mutations in the FrmR
sequence and analysing their effect on repression. Similarly, the in vivo reporter system is used
to identify a cysteine residue that is essential to formaldehyde induced derepression. This
particular mutant is purified in order to assess its behaviour in vitro with respect to how
formaldehyde affects its DNA binding properties.
Experiments in this section also set out to determine whether HxlR2 specifically interacts with its supposed
TFBS. This is attempted using EMSAs and the effect of formaldehyde on this interaction is
studied. Little is understood in terms of how, if at all, formaldehyde affects HxlR2 and its
interaction with its TFBSs. Therefore, an experiment in this chapter sets out to observe if
formaldehyde affects the environment of an HxlR2-TFBS complex by monitoring the
fluorescence of a labelled TFBS whose emission intensity is sensitive to complex formation.
5.3 In vitro analysis of the FrmR:frmRAB promoter interaction
5.3.1 A Non-Competitive Electrophoretic Mobility Shift Assay (EMSA) reveals that
FrmR-His does not bind the frmRAB operator
Given the marked increase in efficiency of protein purification and detection for his-tagged
proteins (as compared to most non-tagged counterpart proteins), initial experiments were
performed using FrmR-His. To investigate the FrmR-His interaction with the frmRAB promoter,
a 150bp DNA-fragment that includes the entire inverted repeat region of the promoter was
amplified using PCR (Primers: frmRAB150F, frmRAB150R). A simple but definitive experiment
was done to determine whether FrmR-His binds to the frmRAB promoter. An EMSA experiment
relies on the fact that a DNA-protein complex will migrate more slowly, i.e. at a higher apparent molecular weight, than the
corresponding ligand-free DNA molecule when subjected to gel electrophoresis.191 If an
interaction is observed, one can verify whether the interaction observed is specific by including
a competing, random DNA-sequence with the DNA-fragment of interest. EMSA experiments
were carried out as described in section 2.4.4. Figure 5-2 shows the results of an EMSA
experiment using a 150bp frmRAB fragment incubated with increasing concentrations of FrmR-
His. It reveals that there is no change in the apparent molecular weight of the DNA, and
establishes FrmR-His does not exhibit any significant affinity for the frmRAB promoter. This
experiment was repeated using a range of buffer conditions, all of which gave similar results,
which verifies the lack of interaction.
Figure 5-2- Acrylamide gel showing the 150bp fragment after being incubated with increasing
FrmR-His concentrations. Molar ratio of protein to DNA is shown above each well. All lanes
contain 60ng of DNA. Lane 1- 0g FrmR, Lane 2 – 110ng FrmR-His, Lane 3 – 430ng FrmR-His, Lane 4-
2.1μg FrmR-His. All wells contained 5mM 2-mercaptoethanol.
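The molar ratios of protein to DNA quoted for gels of this kind follow from the loaded masses and the molecular weights of the two components. The sketch below shows the arithmetic under stated assumptions: an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation) and a placeholder monomer mass for the protein. The placeholder is not the measured mass of FrmR-His, the function names are hypothetical, and the output is not intended to reproduce the ratios shown above the wells.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of double-stranded DNA (approximation)

def dsdna_pmol(mass_ng, length_bp, bp_mass=AVG_BP_MASS):
    """Convert a mass of double-stranded DNA (ng) to picomoles of fragment."""
    return mass_ng * 1000.0 / (length_bp * bp_mass)

def protein_pmol(mass_ng, mw_da):
    """Convert a mass of protein (ng) to picomoles, given its mass in Da."""
    return mass_ng * 1000.0 / mw_da

def molar_ratio(protein_ng, protein_mw_da, dna_ng, dna_bp):
    """Molar ratio of protein (as monomer) to DNA fragment."""
    return protein_pmol(protein_ng, protein_mw_da) / dsdna_pmol(dna_ng, dna_bp)

# Placeholder monomer mass in Da; an assumption made for this example only.
ASSUMED_MONOMER_MW = 11000.0

# Protein masses as listed for lanes 2 to 4; 60 ng of the 150 bp fragment per lane.
for protein_ng in (110, 430, 2100):
    ratio = molar_ratio(protein_ng, ASSUMED_MONOMER_MW, dna_ng=60, dna_bp=150)
    print(f"{protein_ng} ng protein : 60 ng DNA -> {ratio:.1f} monomers per fragment")
```

If the protein binds as an oligomer, the ratio of binding units to DNA is obtained by dividing the monomer ratio by the number of subunits.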
5.3.2 A Non-Competitive EMSA shows that FrmR binds to the frmRAB operator
As FrmR is anticipated to be a repressor that binds DNA in the absence of ligand, the lack of DNA
binding by FrmR-His is a surprising result. However, DNA-binding could be disrupted by the
‘His-Tag’ at the C-terminus of the protein. We therefore decided to clone and purify a non-
tagged version of the protein (the N-terminally His-tagged protein is not soluble; see Chapter 3) in order to
verify the effect of the His-tag on DNA binding. The results of this experiment are shown in
Figure 5-3.
Figure 5-3 - Acrylamide gel showing the 150bp fragment after being incubated with increasing
FrmR concentrations. Molar ratio of protein to DNA is shown above each well. All lanes contain
60ng of DNA. Lane 1- 0g FrmR, Lane 2 – 92ng FrmR, Lane 3- 184ng FrmR, Lane 4 -460ng FrmR. All
wells contained 5mM 2-mercaptoethanol.
A clear shift is observed in the apparent molecular weight of the frmRAB DNA fragment in the
presence of FrmR. These results indicate that FrmR binds to the frmRAB promoter. It is therefore
likely that the FrmR-His non DNA-binding behaviour is due to the ‘His-tag’. Interestingly, the
FrmR:frmRAB interaction was only observed in a reducing environment i.e. in the presence of
5mM 2-mercaptoethanol.
5.3.3 The effect of formaldehyde on formation of the FrmR:frmRAB promoter
complex
As FrmR is thought to be a repressor of the frmRAB operon, it is likely to have a large decrease
in affinity for the corresponding TFBS when bound to the effector molecule. In this case, the
effector is postulated to be formaldehyde (or a metabolite:formaldehyde adduct). An FrmR
sample was treated with formaldehyde and used in a repeat of the non-competitive EMSA
experiment. Previous studies of distinct FDP regulators have shown that 10mM formaldehyde
has no effect on the results obtained from EMSA experiments. These studies have been with
AdhR and HxlR1 from Bacillus subtilis.171,169 For this experiment, FrmR was pre-treated with
10mM formaldehyde for 10 minutes prior to incubation with DNA. Figure 5-4 shows the result
of this experiment.
Figure 5-4 - Acrylamide gel showing 60ng 150bp fragment after being incubated with lane 1- No
FrmR, Lane 2- 360ng FrmR that had been incubated in 10mM formaldehyde. All wells contained
5mM 2-mercaptoethanol.
Figure 5-4 reveals that in the presence of formaldehyde, FrmR’s affinity for the frmRAB
promoter is greatly diminished. This suggests that formaldehyde could indeed be the natural
effector molecule for FrmR.
5.3.4 Analysis of the specificity of FrmR:frmRAB promoter interaction and its
dependence on formaldehyde using EMSA
In order to determine whether the observed FrmR:frmRAB protein-DNA interaction is indeed
specific for the frmRAB sequence, an EMSA was carried out in the presence of a competing DNA
sequence (poly I – poly C), as well as repeating the same experiment with a different strand of
DNA. These EMSA experiments require the use of labelled DNA to distinguish it from the
competitor DNA. For this purpose one of the primers was labelled with biotin. Once the
acrylamide gel has been run, the DNA can be blotted to a positive nitrocellulose membrane and
cross-linked to it using UV light. The biotin labelled DNA can then be selected for with a
streptavidin-horseradish peroxidase conjugate, which binds to the biotin. This complex can then
be detected using a luminol/enhancer solution that exhibits chemiluminescence.251 In these
experiments there is a vast excess of competitor DNA compared to the specific DNA sequence
(480ng competitor to approximately 5ng specific). An excess of TF is also therefore required in
the experiment. This EMSA experiment was performed with FrmR and a 231bp amplified DNA
sequence with biotin linked to its 5’ end (Primers: frmRABBiotin, frmRAB150R). Figure 5-5
shows the result of this EMSA.
Figure 5-5 - Competitive EMSA experiment using the biotin labeled 230bp frmRAB fragment. All
lanes contained 5.8ng DNA. Lane1- No FrmR, Lane 2- 100ng FrmR, Lane 3- 200ng FrmR, Lane 4-
300ng FrmR, Lane 5- 400ng FrmR, Lane 6- 500ng FrmR, Lane 7- 600ng FrmR, Lane 8 700ng FrmR,
Lane 9-800ng FrmR. All wells contained 5mM 2-mercaptoethanol.
The effect of formaldehyde on this interaction was determined by pre-treating FrmR with
10mM formaldehyde. The result of this experiment is shown in Figure 5-6. The result shows
that this protein-DNA interaction does not occur when FrmR has been treated with
formaldehyde.
Figure 5-6 - Competitive EMSA experiment using the biotin labeled 230bp frmRAB fragment. All
lanes contained 5.8ng DNA. Lane 1 – No FrmR. Lane 2- After incubation with 600ng FrmR . Lane 3-
As lane two with FrmR being subject to incubation in 10mM formaldehyde prior to incubation
with DNA. All wells contained 5mM 2-mercaptoethanol.
The experiment was repeated using a different DNA molecule that FrmR should not specifically
bind to. This 301bp DNA fragment contains a known TFBS from Dehalococcoides sp. (genome
position 1184670-1184971) (Primers: DehaloF, DehaloR). Figure 5-7 shows the result of this
experiment.
Figure 5-7 - Competitive EMSA using the biotin labeled fragment from Dehalococcoides sp. All lanes
contained 5ng DNA. Lane 1 – No FrmR added. Lane 2- After incubation with 900ng FrmR . Lane 3-
As lane two with FrmR being subject to incubation in 10mM formaldehyde prior to being
incubated with DNA. All wells contained 5mM 2-mercaptoethanol.
The results show that under these conditions, FrmR induces a decrease in mobility of frmRAB
but not of the fragment from Dehalococcoides sp. This indicates that this interaction is likely to
be a specific one. As with the non-competitive EMSA, 10mM formaldehyde causes FrmR’s
affinity for the frmRAB fragment to be reduced as no shift is observed for these samples.
5.4 Construction of an in vivo FrmR-reporter system
In order to study the in vivo properties of the FrmR TF, we set out to construct a suitable in vivo
reporter system. This reporter system is plasmid based with frmR expression being under the
control of an IPTG inducible T7 promoter. The frmRAB promoter is contained downstream on
the same plasmid but with a reporter gene in place of the frmRAB operon (Figure 5-8). There is
a wide range of reporter genes that are frequently used in the literature, and we screened 2
distinct reporter constructs: The green fluorescent protein (GFP) and the kanamycin resistance
gene (KanR). The measurable quantity when using the GFP gene is the increased fluorescence of
the cell culture that is observed when the protein is expressed; with the KanR gene as a
reporter, it is cell growth in the presence of kanamycin that is measured.
Figure 5-8 – Organization of the reporter systems (B) with respect to the chromosomal
organization of the frmRAB operon (A)
5.4.1 Construction of the frmRAB-KanR and the frmRAB-GFP inserts
The procedure used to construct the reporter system was to take pET15b-frmR, and to insert
frmRAB-KanR and frmRAB-GFP downstream at the BamH1 restriction site. The overall strategy
used to achieve this is schematically summarised in section A.1.5. It was therefore necessary to
create both an frmRAB-KanR insert and an frmRAB-GFP insert capable of being inserted at this
particular restriction site. All of the molecular biology techniques utilised in this section were
carried out as described in section 2.2. A 453 bp frmRAB insert was created using PCR that
contained the entire frmRAB promoter upstream from the start codon. Primers were designed
such that the frmR start codon and the three bases upstream of it at the 3’-end (GAAATG) were
converted to an Nde1 restriction site (CATATG). At the 5’-end, the primers were designed to
introduce a complementary sequence to that of pET15b-frmR at the BamH1 restriction site.
(Primers: frmRABF, frmRABR). The amplified fragment, run on an agarose gel, is shown in Figure
5-9. A 768bp GFP insert and an 836bp KanR insert were amplified using primers that on the 5’-
end introduced an Nde1 restriction site at the start codon of the GFP/KanR genes. At the 3’-end
of each fragment, downstream of the GFP/KanR stop codon, a sequence complementary to that
of pET15b-frmR at the BamH1 restriction site was introduced (Primers: KanF, KanR / GFPF,
PGFPR). The amplified fragments are shown on an agarose gel in Figure 5-9.
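The primer design step described above, in which the three bases immediately upstream of the start codon are changed so that GAAATG becomes the Nde1 site CATATG, can be checked in silico before ordering primers. The following is a minimal sketch of that check. Only the GAAATG to CATATG conversion is taken from the text; the flanking bases in the example template and the function name are hypothetical.

```python
NDE1_SITE = "CATATG"

def engineer_nde1_at_start_codon(seq, atg_index):
    """Replace the three bases upstream of a start codon with CAT.

    `atg_index` is the 0-based position of the A of the ATG start codon.
    The returned sequence has the same length and contains CATATG, with
    the ATG of the site being the original start codon.
    """
    seq = seq.upper()
    if atg_index < 3:
        raise ValueError("need three bases upstream of the start codon")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    return seq[:atg_index - 3] + "CAT" + seq[atg_index:]

# Hypothetical flanking sequence; only GAAATG is taken from the text.
template = "TTGCAGGAAATGCCGTCT"
atg_index = template.index("GAAATG") + 3
mutated = engineer_nde1_at_start_codon(template, atg_index)

print(template)
print(mutated)
print("Nde1 site present:", NDE1_SITE in mutated)
```

Because the start codon itself is preserved within the new site, the reporter gene joined at this position is placed directly at the position of the original frmR start codon.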
Figure 5-9 - Agarose gel showing amplified PCR fragments. Lane1 – Marker, Lane 2- 453bp frmRAB
insert, Lane 3- 863bp KanR insert, Lane 4 – 768bp GFP insert
The frmRAB and GFP/KanR fragments were digested with Nde1 and purified. The fragments
were ligated and the resulting ligation mix loaded onto an agarose gel. Following
electrophoresis, the fragments at the appropriate size were excised and PCR amplified.
(Primers: FrmRAB, KanRR/PGFPR). The ligation products and the PCR amplified inserts run on
an agarose gel are shown in Figure 5-10.
Figure 5-10 - Agarose gels showing ligation products and purified inserts for the KanR (right) and
GFP reporter genes (left) . Lane 1- Marker, Lane 2- Ligation product between Nde1 digested
frmRAB and GFP/KanR the band at the expected size of the frmRAB-GFP/KanR insert is circled in
red, Lane 3- PCR amplified frmRAB-GFP/KanR.
pET15b-frmR was subjected to a BamH1 restriction digest and the frmRAB-GFP/KanR fragments
were inserted using the in-fusion cloning reaction.219 The reaction product was transformed
into E. coli DH5α and cells were screened for ampicillin resistance. Candidate colonies were
used for plasmid preparation and resulting plasmids were sequenced. Sequencing indicated that
the desired sequences were in place in the plasmids confirming a successful formation of the
reporter system constructs. These plasmids are termed pGFPR and pKanRR for the GFP
and KanR reporter systems respectively.
5.4.2 Construction of an E. coli ∆frmR strain
In addition to creating suitable reporter systems for the in vivo experiments in E. coli, it was
necessary to create an E. coli strain lacking an endogenous copy of frmR on the chromosome.
This ensures that experimental results can be attributed to the recombinant protein. A
collection of “knock out” strains of E. coli K-12 BW25113 of all non-essential genes of that
organism has been created and is distributed by the National BioResource Project (NIG, Japan).
We obtained a strain from this collection lacking the frmR gene.252 This strain contained a
kanamycin cassette in place of the frmR gene. As we intended to use kanamycin resistance as a
reporter gene, it was necessary to remove this cassette. The method used to create the frmR
knock-out uses a DNA fragment with the kanamycin resistance gene in the centre flanked by
two identical sequences known as FRT sequences. At each end of the FRT sequence is a 36bp
extension which is complementary to one of the ends of the part of chromosome to be removed.
The insertion of this DNA fragment is catalysed by the enzyme λ red recombinase for which the
gene is located on a helper plasmid, which is removed after a successful insertion.253 The
kanamycin cassette can be removed by transforming the knock out strain with a plasmid
carrying a gene for the enzyme FLP. When FLP is expressed, it targets the FRT sequences,
removing the kanamycin resistance gene as well as one of the FRT sequences.254 A plasmid
called pFT-A, which contains a chlortetracycline-inducible FLP gene and an ampicillin resistance
gene, was obtained from the National BioResource Project (NIG, Japan).
Furthermore, pFT-A contains a temperature-sensitive replication origin that allows it to be
removed from the strain when grown at high temperatures. pFT-A was transformed into E. coli
∆frmR and cells were grown at 30°C in the presence of chlortetracycline in order to induce
FLP. Cultures were then grown at 40°C in order to remove pFT-A, and colonies were screened for
loss of resistance to both ampicillin and kanamycin. Further details of this procedure
can be found in section 2.2.17. A candidate colony that was found to lack both Amp and Kan
resistance was tested for the presence of any residual KanR gene by PCR of extracted
chromosomal DNA. DNA was also extracted from the initial Keio strain as well as from E.
coli K12. The PCR reaction in section 3.4.1 was repeated (Primers frmRF and frmRR) on each
chromosomal DNA sample. Figure 5-11 shows the products from this experiment run on an agarose gel. The
amplified fragment from E. coli K12 is the 778bp fragment containing the frmR gene. The
amplified product from E. coli ∆frmR contains the kanamycin cassette and is approximately
1.7kbp while that from the same strain with the cassette removed is approximately 600bp.
Figure 5-11 - Agarose gel showing PCR products from different strains of E. coli using the same
primer pair. Lane 1- Marker, Lane 2- fragment from E. coli K12, Lane 3- fragment from E. coli
∆frmR with kanamycin cassette, Lane 4- fragment from E. coli ∆frmR with removed kanamycin
cassette.
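The expected amplicon sizes quoted above lend themselves to a simple lookup when scoring gel lanes. The following is a minimal sketch, assuming band sizes have been estimated from the marker lane; the function name and the 15% tolerance are illustrative assumptions and are not part of the procedure used in this work.

```python
# Expected frmRF/frmRR amplicon sizes (bp), taken from the text above.
EXPECTED_BP = {
    "wild type (frmR present)": 778,
    "deletion with kanamycin cassette": 1700,  # approximate
    "deletion with cassette removed": 600,     # approximate
}


def classify_amplicon(observed_bp, tolerance=0.15):
    """Return the genotype whose expected size is closest to observed_bp.

    Returns None if no expected size lies within the fractional tolerance
    (hypothetical default of 15%).
    """
    best, best_err = None, None
    for genotype, expected in EXPECTED_BP.items():
        err = abs(observed_bp - expected) / expected
        if err <= tolerance and (best_err is None or err < best_err):
            best, best_err = genotype, err
    return best
```

A band estimated at roughly 620bp would be assigned to the cassette-free deletion, whereas a band at 1000bp would match none of the expected products and return None.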
5.4.3 Construction of the E. coli ∆frmR (DE3) strain
As the reporter plasmids were to express frmR from a T7 promoter, the knock out strain
required a source of T7 polymerase. This was achieved using the procedure described in section
2.2.18 by infecting the cells with λDE3 phage from Novagen. After plating the infected cells
onto an agar plate, several colonies were picked and tested for their capacity to induce
expression of frmR. The candidate E. coli ∆frmR (DE3) was transformed with pKanRR and grown
in LB media at 37°C to an OD600 of 0.5. Cultures were then treated with 1mM IPTG and grown at
15°C for 10 hours. A separate set of control cultures was treated in the same way but
without the addition of IPTG. The induced cell cultures were lysed in order to obtain the soluble
fraction. Fractions from this experiment were subjected to SDS-PAGE analysis and the result is
shown in Figure 5-12. The frmR gene has been overexpressed, giving a
prominent band at the expected molecular weight (~10kDa) that is not present in the control
sample.
Figure 5-12 - SDS-PAGE analysis from expression trial with E. coli ∆frmR (DE3). Lane 1-Marker,
Lane 2- control sample, Lane 3- Induced cells, Lane 4- soluble fraction of the induced cells.
5.5 In vivo studies of FrmR function
Previously published work has indicated that FrmR represses transcription from the frmRAB
operon.176 On this basis, the reporter system that has been created should result in down
regulation of the reporter gene when grown in the presence of IPTG. It is also known that FrmR
repression is reduced in the presence of formaldehyde; the presence of formaldehyde should
therefore cause an increase in the transcription level of the reporter genes. A schematic diagram
of how these reporter systems should work is shown in Figure 5-13.
Figure 5-13 - Supposed mechanism of how both reporter systems will function
5.5.1 Initial characterisation of the pGFPR reporter system
These experiments with pGFPR were performed as detailed in section 2.4.6. Figure 5-14 shows
the relative fluorescence of E. coli ∆frmR (DE3) containing pGFPR grown in minimal media for
14h, with and without 75µM IPTG. To determine whether the reporter
system was sensitive to formaldehyde, the effect of 0.3mM formaldehyde
was also assessed. The results are on a relative “percentage fluorescence” scale, with
the uninduced culture described above being 100% and a culture of E. coli ∆frmR (DE3) lacking
the pGFPR plasmid being treated as 0%. The results are average values from 5 independent
repeats.
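The two-point “percentage fluorescence” scale described above can be written as a short calculation. This is a minimal sketch, assuming raw fluorescence readings in arbitrary units; the function and argument names are hypothetical and the numbers in the usage note are illustrative, not measured data.

```python
def percent_fluorescence(sample, uninduced, no_plasmid):
    """Rescale a raw reading so that the uninduced culture is 100%
    and the plasmid-free culture is 0%."""
    span = uninduced - no_plasmid
    if span == 0:
        raise ValueError("uninduced and plasmid-free readings are identical")
    return 100.0 * (sample - no_plasmid) / span
```

For example, with an uninduced reading of 1000 and a plasmid-free reading of 200 (arbitrary units), a sample reading of 600 lies halfway along the scale.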
Figure 5-14- Relative fluorescence levels of cultures of E. coli ∆frmR (DE3) containing the pGFPR
reporter plasmid. Cells were grown at 25˚C in minimal media for 14 hours. Error bars represent 2
times the standard deviation from five independent repeats.
There is a higher level of fluorescence in the uninduced control sample than in the induced sample. This
indicates that the reporter system is acting as we would expect, i.e. the induction of FrmR causes
a decrease in the amount of reporter gene expressed from the pGFPR plasmid. However, if the
system were sensitive to formaldehyde, an increase in fluorescence would be expected in the
culture containing formaldehyde. Figure 5-14 shows no significant increase in fluorescence
levels in this sample. Although the reasons for this are unclear, the pGFPR system is not
suitable for further experiments that aim to assess formaldehyde binding.
5.5.2 Initial characterisation of the pKanRR reporter system
All experiments using the pKanRR reporter plasmid were carried out as described in section
2.4.7. Figure 5-15 shows a representative graph of cell density measured as OD600 against time
for the three conditions listed in Table 2-6. These are Media A (LB, Kanamycin), Media B (LB,
IPTG), and Media C (LB, Kanamycin, IPTG). Also shown is a bar chart of the mean OD600 after
13h of growth, averaged over three independent experiments.
Figure 5-15 reveals that both controls grow at a similar rate throughout the experiment. However,
the culture in media C displays significant growth inhibition. This is expected, as FrmR induced
by IPTG represses the frmRAB promoter that drives KanR expression, leaving the cells sensitive
to kanamycin.
Figure 5-15-Left- Representative graph showing growth of cultures of E. coli ∆frmR (DE3)
containing the pKanRR plasmid in media solutions A,B and C.
Right- Bar chart showing mean OD600 values of E. coli ∆frmR (DE3) cultures containing the pKanRR
reporter plasmid after growth for 13 hours. Data are shown for cells grown in media solutions A, B
and C with the error bars representing the standard deviations from three independent
experiments.
In order to see how the pKanRR system responds to formaldehyde, cultures were grown in
media C with different amounts of formaldehyde. Figure 5-16 shows a representative graph of
cell density measured as OD600 against time for these samples as well as a bar chart of data of
the OD600 measurement after 13 hours of growth. The results show that formaldehyde increases
the growth rate of these cell cultures, indicating an increase in reporter gene expression. This
can be explained by formaldehyde relieving FrmR-mediated repression of the frmRAB promoter.
Figure 5-16 - Left- Representative graph showing growth of cultures of E. coli ∆frmR (DE3)
containing the pKanRR reporter plasmid. The cells represented by black squares were grown in
media C. The cells represented by red circles were grown in media C also containing 0.3 mM
formaldehyde.
Right- Bar chart showing mean OD600 values of cultures of E. coli ∆frmR (DE3) containing the
pKanRR reporter plasmid after growth for 13 hours. Data are shown for samples grown in media C
with and without 0.3 mM formaldehyde. Error bars represent the standard deviations from three
independent experiments.
The pKanRR reporter appears to be more sensitive to IPTG as well as responding to
formaldehyde in the way that we would expect. Therefore, the remaining in vivo experiments in
this chapter were conducted using the pKanRR reporter system.
5.6 In vivo analysis of the properties of selected FrmR mutants
5.6.1 Prediction of the FrmR DNA-binding residues
As FrmR is a DNA-binding protein, particular residues will play a part in the specificity of this
interaction. Knowing which of these residues are necessary for FrmR to bind to its TFBS would
give insight into how these proteins carry out their regulatory function, especially in the absence of
any FrmR-DNA structural model. To try to determine which residues are
involved in this specific protein-DNA interaction, a series of computer programs was used to
identify potential candidates. Any residue that was predicted as “binding” by three or
more of the five algorithms was chosen, and the effect of mutating it on repression by FrmR was
analysed using the pKanRR reporter system.
The five programs used were: DBindR197, BindN198, DNAbindR199, DP-Bind200 and ProteDNA201.
All these programs are based on “classification” algorithms and use knowledge of known
DNA-binding TFs together with some of the estimated biophysical properties of the protein, i.e. likely
secondary structure at particular points and residue hydrophobicity. However, they each use
significantly different methods, so they may generate different results. Table 5-1 shows the
residues that were identified as DNA-binding residues by three or more of the programs used.
Table 5-1 - FrmR residues predicted to be DNA-binding residues by various computer programs.
Residue   Programs
Lys-10    DNAbindR, BindN, ProteDNA
Tyr-13    DBindR, DP-Bind, DNAbindR, BindN, ProteDNA
Arg-14    DBindR, DP-Bind, DNAbindR, BindN, ProteDNA
Arg-16    DBindR, DP-Bind, DNAbindR, BindN
Arg-17    DP-Bind, DNAbindR, BindN, ProteDNA
Arg-19    DBindR, DP-Bind, BindN
Arg-46    DBindR, DP-Bind, DNAbindR, BindN
Gly-47    DBindR, DP-Bind, DNAbindR
Lys-91    DP-Bind, DNAbindR, BindN
These predictions seem reasonably intuitive, with most being polar or charged residues. Oddly,
Gly-47 has also been selected, which would not ordinarily be expected to be a DNA-binding
residue. Nevertheless, alanine mutants of each of the nine residues in Table 5-1 were created
using site-directed mutagenesis in the pKanRR construct (Primers: frmRK10f/frmRK10r,
frmRT13f/frmRT13r, frmRR14f/frmrR14r, frmRR16f/frmRR16r, fmrRR17f/frmRR17r,
frmrR19f/frmRR19r, frmRR46f/frmRR46r, frmRG47f/frmRG47r, frmRK91f/frmRK91r). Each
mutant was then tested for its ability to repress transcription from the frmRAB operon. The
experiments were performed as described in 2.4.7.
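The three-out-of-five selection rule used in this section can be sketched as a simple vote count. The per-residue program lists below are transcribed from Table 5-1; the residue labelled Xaa-00 is a hypothetical entry added only to show a candidate being rejected, and the function name is an illustrative assumption.

```python
# Programs predicting each residue as DNA-binding (from Table 5-1).
PREDICTIONS = {
    "Lys-10": {"DNAbindR", "BindN", "ProteDNA"},
    "Tyr-13": {"DBindR", "DP-Bind", "DNAbindR", "BindN", "ProteDNA"},
    "Arg-14": {"DBindR", "DP-Bind", "DNAbindR", "BindN", "ProteDNA"},
    "Arg-16": {"DBindR", "DP-Bind", "DNAbindR", "BindN"},
    "Arg-17": {"DP-Bind", "DNAbindR", "BindN", "ProteDNA"},
    "Arg-19": {"DBindR", "DP-Bind", "BindN"},
    "Arg-46": {"DBindR", "DP-Bind", "DNAbindR", "BindN"},
    "Gly-47": {"DBindR", "DP-Bind", "DNAbindR"},
    "Lys-91": {"DP-Bind", "DNAbindR", "BindN"},
    # Hypothetical residue with only two votes, for illustration.
    "Xaa-00": {"BindN", "DP-Bind"},
}


def consensus_residues(predictions, min_votes=3):
    """Return residues predicted as binding by at least min_votes programs."""
    return [res for res, programs in predictions.items()
            if len(programs) >= min_votes]


selected = consensus_residues(PREDICTIONS)
```

Here `selected` contains the nine residues of Table 5-1 and excludes the hypothetical two-vote entry.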
5.6.2 Experimental analysis of putative FrmR DNA-binding mutants
Each mutant was transformed into E. coli ∆frmR (DE3) and a culture of the strain was grown
overnight in LB media with ampicillin. These cultures were used to inoculate the three different
LB media solutions from Table 2-6 (A, B, C). The growth of E. coli ∆frmR (DE3) containing each
FrmR mutant on the reporter plasmid in each of the three media solutions was then measured.
Figure 5-17 shows representative growth curves of each mutant in media A, B and C.
Figure 5-17 - Representative growth curves of each mutant in each of the three media
conditions: A (black squares), B (red circles) and C (blue triangles).
5.6.3 Summary of FrmR alanine mutants
Figure 5-18 shows a bar chart of the OD600 after 13 hours for each mutant in media C, expressed
as a percentage of growth in media B. The chart shows that three mutants (Arginine-14,
Arginine-46 and Lysine-91) grow to a significantly higher average OD600 value than the WT or other
potential DNA-binding mutants. This suggests these residues are essential for FrmR to bind to
its TFBS. Indeed, all of these residues are highly conserved (as shown in Figure 3-4), which can
often indicate an important functional role. Furthermore, in CsoR from M. tuberculosis, the
residue corresponding to R-14 in FrmR was found to be essential for DNA binding, as
established by use of an EMSA.177
Figure 5-18 - Bar chart showing mean value of OD600 of each of the cultures in media C, expressed
as a percentage of growth that occurred in media B after growth for 13 hours. Error bars
correspond to the standard deviation from three independent experiments.
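The normalisation plotted in Figure 5-18 can be sketched as follows. This is a minimal illustration only: it assumes that the percentage is calculated separately for each independent experiment before averaging, which is one plausible reading of the caption rather than a documented detail, and the function names are hypothetical.

```python
from statistics import mean, stdev


def percent_growth(od_media_c, od_media_b):
    """OD600 in media C expressed as a percentage of OD600 in media B."""
    return 100.0 * od_media_c / od_media_b


def summarise(paired_readings):
    """Mean and sample standard deviation of percentage growth.

    paired_readings: list of (OD600 in media C, OD600 in media B) tuples,
    one tuple per independent experiment (assumed pairing).
    """
    values = [percent_growth(c, b) for c, b in paired_readings]
    return mean(values), stdev(values)
```

A mutant that has lost repression would be expected to give a value approaching 100%, whereas the WT protein should give a much lower percentage.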
The apparent derepression displayed by these mutants could of course be due to reasons other
than destabilising the FrmR:frmRAB interaction. It could be the case that the mutation
causes a distinct change in the physical properties of the protein, e.g. causing insolubility or a
change in oligomeric state. Furthermore, the particular mutant could have a lower transcription
or translation rate compared to the WT, resulting in less of the protein in the cell, which could
give rise to the above observations. In order to test whether these particular mutations cause a
change in solubility and/or quantity of FrmR, an expression trial was carried out. Cultures of E.
coli ∆frmR (DE3) containing the WT and respective mutant reporter plasmids were grown to an
OD600 of 0.5 and were then treated with 0.5 mM IPTG and left to grow for a further 6 hours.
Cells were then harvested and lysed, and the soluble fraction was taken. Control samples that
were not treated with IPTG were also taken from each culture. The results of this experiment
are shown in Figure 5-19, which shows that each mutant is expressed and soluble, as indicated by the
prominent band at the expected molecular weight for each mutant.
Figure 5-19 - Left- SDS-PAGE analysis of the soluble fractions of E. coli ∆frmR (DE3) after
induction with IPTG. Cells containing the reporter plasmids that were shown to significantly
increase growth in media C are shown as well as that of the wild type FrmR. Lane 1- Marker, Lane
2- wild type, Lane 3- R14A, Lane 4- R46A, Lane 5- K91A. Right- The same cultures as on the left
without induction with IPTG.
5.7 Probing the FrmR formaldehyde sensing mechanism
As explained in the introduction to this chapter, a number of observations suggest that
formaldehyde sensing by FrmR could involve a cysteine residue. In order to test this hypothesis,
it was decided to create mutants of both FrmR cysteine residues and test what effect these
mutations had on the formaldehyde response in vivo. FrmR contains two cysteine residues at
positions 36 and 72. Site-directed mutagenesis was used to create alanine mutants of both
residues in the pKanRR plasmid (Primers: frmRC36AF/frmRC36AR, frmRC72AF/frmRC72AF).
Once the mutants were verified by DNA sequencing, they were transformed into E. coli ∆frmR
(DE3). The experiment described in section 5.5.2 was repeated with both Cys mutants; Figure
5-20 and Figure 5-21 show representative graphs of OD600 against time for the Cys36Ala and
Cys72Ala mutants respectively. Also shown are bar charts containing data of OD600 measured
after 13 hours for each sample. Figure 5-20 and Figure 5-21 show that both Cys mutants repress
expression from the frmRAB operon, as observed with the WT protein. This indicates that
both mutants remain able to bind the DNA and cause repression.
Figure 5-20-Left- Representative graph showing growth of cultures of E. coli ∆frmR (DE3)
containing the pKanRR-C36A reporter plasmid in media solutions A,B and C.
Right- Bar chart showing mean value of OD600 after growth for 13 hours. Error bars correspond to
the standard deviation from three independent experiments.
Figure 5-21-Left- Representative graph showing growth of cultures of E. coli ∆frmR (DE3)
containing pKanRR-C72A reporter plasmid in media solutions A,B and C.
Right- Bar chart showing mean value of OD600 after growth for 13 hours. Error bars correspond to
the standard deviation from three independent experiments.
To then test whether this repression remains formaldehyde dependent with both Cys mutants,
the experiment was repeated with 0.3 mM formaldehyde being present in media C (as described
for wild type FrmR in section 5.5.2). Figure 5-22 and Figure 5-23 show representative graphs
of OD600 against time for the Cys36Ala and Cys72Ala mutants respectively along with bar charts
of equivalent samples containing data of OD600 measured after 13 hours.
Figure 5-22- Left- Representative graph showing growth of cultures of E. coli ∆frmR (DE3)
containing the pKanRR-C36A reporter plasmid. The cells represented by black squares were
grown in media C. The cells represented by red circles were grown in media C also containing 0.3
mM formaldehyde.
Right- Bar chart showing mean OD600 values of cultures of E. coli ∆frmR (DE3) containing the
pKanRR-C36A reporter plasmid after growth for 13 hours. Data are shown for samples grown in
media C with and without 0.3 mM formaldehyde. Error bars correspond to the standard deviation
from three independent experiments.
Figure 5-23- Left- Representative graph showing growth of cultures of E. coli ∆frmR (DE3)
containing the pKanRR-C72A reporter plasmid. The cells represented by black squares were grown
in media C. The cells represented by red circles were grown in media C also containing 0.3mM
formaldehyde.
Right- Bar chart showing mean OD600 values of cultures of E. coli ∆frmR (DE3) containing the
pKanRR-C72A reporter plasmid after growth for 13 hours. Data are shown for samples grown in
media C with and without 0.3 mM formaldehyde. Error bars correspond to the standard deviation
from three independent experiments.
In the case of the Cys72Ala mutant, growth rates in the presence or absence of formaldehyde are
similar to those of WT FrmR, suggesting Cys72 is not involved in formaldehyde sensing. However, no
cell growth for the Cys36Ala mutant can be observed in the presence of formaldehyde, which is
in contrast to the significant increase in growth that is observed in the WT. These results imply
that the Cys36 residue plays an important role in the sensing of formaldehyde in FrmR. The results
also support the hypothesis that the sensing mechanism of FrmR is based on a covalent adduct
formed with the cysteine residue.
To establish whether the exact nature of the residue at position 36 is crucial to formaldehyde
sensing, it was decided to also mutate Cys36 to a serine residue and see how this changes
the protein's behaviour (Primers: frmRC36SF/frmRC36SR). The same primers were used to
mutate the frmR gene in the pET-15b-frmR construct so that the effect of this mutation could be
analysed in vitro. Serine and cysteine are fairly similar with regard to size and hydrogen
bonding properties but serine lacks the inherent nucleophilic capabilities of cysteine.241 This
mutation should therefore bring about the least alteration in structure at the Cys36
environment and, if similar results are obtained, further support the hypothesis that
Cys36 acts as a nucleophile towards formaldehyde. Figure 5-24 and Figure 5-25 show the result
of repeating the previous experiment with the serine mutation in place.
Figure 5-24-Left- Representative graph showing growth of cultures of E. coli ∆frmR (DE3)
containing the pKanRR-C36S reporter plasmid in media solutions A,B and C.
Right- Bar chart showing mean value of OD600 after growth for 13 hours. Error bars correspond to
the standard deviation from three independent experiments.
Figure 5-25 - Left- Representative graph showing growth of cultures of E. coli ∆frmR (DE3)
containing the pKanRR-C36S reporter plasmid. The cells represented by black squares were
grown in media C. The cells represented by red circles were grown in media C also containing 0.3
mM formaldehyde.
Right- Bar chart showing mean OD600 values of cultures of E. coli ∆frmR (DE3) containing the
pKanRR-C36S reporter plasmid after growth for 13 hours. Data are shown for samples grown in
media C with and without 0.3 mM formaldehyde. Error bars correspond to the standard deviation
from three independent experiments.
The results show that there is no significant difference between the serine and alanine mutants
of Cys36.
5.8 In vitro analysis of FrmRC36S
In order to test what effect this mutation of Cys36 has on the properties of FrmR in vitro, a
pET-15b-frmRC36S construct was transformed into ArcticExpress™ (Agilent) cells. These cells were
then grown exactly as described for the WT protein in section 3.5.2. The protein was then
purified as described for the WT protein in section 3.6.2. Figure 5-26 shows the purified protein
(FrmRC36S) subjected to SDS-PAGE analysis. This protocol requires the sample to be separated
using the same SEC column described in section 3.6.2 that was used to purify and determine the
apparent molecular weight of FrmR. As expected, FrmRC36S elutes from the SEC column at the same
volume as WT FrmR, indicating that FrmRC36S retains the WT oligomeric structure.
Figure 5-26 – SDS-PAGE of samples that have been eluted from a SEC during the purification of
FrmRC36S. Lanes going from left to right correspond to increasing elution volume with lanes 10
and 11 being taken as pure FrmRC36S.
5.8.1 EMSA experiments with FrmRC36S and the frmRAB promoter
The competitive EMSA experiment described in section 5.3.4 was repeated using purified
FrmRC36S. The result is shown in Figure 5-27.
Figure 5-27 - Competitive EMSA experiment using the biotin labeled 230bp frmRAB fragment. All
lanes contained 5.8ng DNA. Lane 1 – No FrmRC36S, Lane 2- After incubation with 800ng
FrmRC36S. Lane 3- As lane two with FrmRC36S being subject to incubation in 10mM
Formaldehyde prior to incubation with DNA.
Figure 5-27 shows that after incubation with formaldehyde, FrmRC36S is still capable of binding
to the frmRAB promoter, in contrast to the WT protein. This result further suggests that Cys36
plays a key role in the formaldehyde-dependent regulation of transcription from the frmRAB promoter.
5.9 Analysis of the DNA binding properties of HxlR2-His
Chapter 4 shows that HxlR2-His has a wHTH motif in each subunit of its structure. To determine
whether HxlR2-His binds to the frmA promoter region, a non-competitive EMSA was carried
out using a 181bp fragment of DNA consisting of the intergenic region between the
BcAH187_pCER270_0216 gene and the frmA gene. This is where the TFBS for HxlR2 would be
expected to be located. The fragment was obtained via gene synthesis (MWG) and amplified
using PCR (Primers: CeringF/R). These EMSA experiments were undertaken using the
procedure described in 2.4.4. The EMSA was carried out with and without formaldehyde and the
results show that formaldehyde does not prevent HxlR2-His from binding to DNA (Figure 5-28).
This was expected, given that the HxlR family of TFs are thought to be activators for
which binding an effector does not cause dissociation from the DNA. Alternatively,
formaldehyde itself may not be the effector for HxlR2 and therefore does not alter its DNA
binding affinity.
Figure 5-28 - Non-competitive EMSAs using the 181bp intergenic fragment and HxlR2-His. All
lanes contain 100ng of DNA. Left- Increasing amounts of HxlR2-His being added to the binding
reaction. Lane 1 – 0ng HxlR2-His, Lane 2 – 100ng HxlR2-His, Lane 3 – 200ng HxlR2-His, Lane 4 –
500ng HxlR2-His. Right – EMSA subsequent to treating HxlR2-His with 10mM formaldehyde prior
to incubation with DNA. Lane 1- 0ng HxlR2-His, Lane 2- 500ng HxlR2-His.
To test whether the HxlR2-His:DNA interaction is specific, a competitive EMSA was conducted.
The forward primer was labelled with biotin (Primer: cerBiotin) and a labelled 181bp fragment
of the intergenic region was amplified. An EMSA was conducted using the same method as
described in section 5.3.4. The binding reactions were incubated with increasing amounts of
HxlR2-His and a shift in mobility of the DNA fragment was observed (Figure 5-29).
Figure 5-29- Competitive EMSA using the biotin labeled 181bp fragment of DNA. All lanes
contained 4.5ng of DNA. Lane 1- 0ng of HxlR2-His added. Lanes 2, 3, 4, 5, 6, 7 contained 300ng,
600ng, 900ng, 1200ng, 1500ng and 1800ng of HxlR2-His respectively.
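Because the EMSA lanes are specified by mass, it can be useful to convert them to molar amounts when comparing fragments of different length. The following is a minimal sketch, assuming the commonly used approximation of about 650 g/mol per base pair of double-stranded DNA; the function names are hypothetical, and the protein molecular weight must be supplied by the user as it is not stated in this section.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; a common approximation for dsDNA


def dsdna_fmol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to femtomoles."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * AVG_BP_MASS)
    return moles * 1e15


def protein_to_dna_molar_ratio(protein_ng, protein_mw_da, dna_ng, dna_bp):
    """Molar ratio of protein to DNA in a binding reaction.

    protein_mw_da is the protein molecular weight in Da (user supplied).
    """
    protein_fmol = protein_ng * 1e-9 / protein_mw_da * 1e15
    return protein_fmol / dsdna_fmol(dna_ng, dna_bp)
```

Under this approximation, 4.5ng of the 181bp fragment corresponds to roughly 38 fmol of DNA.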
This experiment was repeated with the biotin labeled 200bp fragment from Dehalococcoides sp.
The results from this experiment are shown in Figure 5-30.
Figure 5-30 - Competitive EMSA using the biotin labeled 200bp fragment of DNA from
Dehalococcoides sp. All lanes contained 5.0ng of DNA. Lane 1- 0ng of HxlR2-His added. Lanes 2, 3, 4,
5, 6, 7 contained 300ng, 600ng, 900ng, 1200ng, 1500ng and 1800ng of HxlR2-His respectively.
This experiment shows that HxlR2-His does not bind to the foreign fragment of DNA under
conditions in which binding is observed with the intergenic fragment. This suggests that HxlR2-His
binds specifically to its intergenic region. The effect of formaldehyde on this specific binding
was examined by pre-incubating HxlR2-His with 10mM formaldehyde prior to the binding
reaction. The result of this is shown in Figure 5-31. As expected, formaldehyde does not prevent
the specific binding between HxlR2 and its intergenic region of DNA.
Figure 5-31 - EMSA using the biotin labeled 181bp intergenic fragment of DNA. Both lanes contain
4.5ng DNA. Lane 1- 0ng of HxlR2-His. Lane 2- 1500ng of HxlR2-His that had been incubated with
10mM formaldehyde.
5.10 Assessing the effect of formaldehyde on HxlR1
In contrast to our results for the FrmR regulator, it has been shown that formaldehyde has little
effect on the DNA binding affinity of HxlR1.171 We set out to perform a fluorescence spectroscopy
measurement to obtain information regarding the nature of the
formaldehyde:HxlR1 interaction. The procedure and details of this experiment are described in
2.4.5.
5.10.1 Fluorescence Spectroscopy
In the absence of a direct influence on the DNA binding affinity of HxlR1, the binding of
formaldehyde to the HxlR1-DNA complex could induce a conformational change that affects
transcription. Fluorescence spectroscopy can be used to monitor protein-DNA interactions by
labelling a DNA molecule with a fluorescent molecule and binding protein to the labelled DNA. If
the protein is close enough to the fluorescent molecule, the environment of the fluorescent
molecule is altered, causing a change in fluorescence emission intensity which can be measured
directly.255 We set out to label a HxlR1 TFBS with a fluorophore, and measure whether any
difference can be observed in emission intensity from the HxlR1-TFBS complex when treated
with formaldehyde. Such a difference would indicate that formaldehyde is changing the environment
of the fluorophore, presumably through alteration in the HxlR1-His protein structure.
In order to conduct this experiment, two 33bp oligonucleotides were synthesized that would
create the BRH1 binding site with a few nucleotides on either side (Primers: BRH1F/BRH1R).
Additionally, the forward oligonucleotide was labeled at its 5’ end with the fluorophore
Alexa Fluor 555 (Invitrogen), which absorbs light at 555nm and fluoresces at
565nm. Figure 5-32 shows the increase in intensity of fluorescence when a 10μM solution of
labeled BRH1 is treated with increasing concentrations of HxlR1, demonstrating that protein
binding affects the fluorophore environment and can be easily monitored. This graph also
shows that when the same titration is performed in the presence of 15mM formaldehyde, the
observed fluorescence is not significantly different from the non-formaldehyde incubated
sample. A control using BSA rather than HxlR1 is also shown in the graph, which displays no
significant change in fluorescence intensity. These data imply that there is no change in the
environment of the fluorophore when formaldehyde is added to the HxlR1-DNA complex,
suggesting there is no large conformational change in the protein-DNA interaction when
formaldehyde is present. Indeed, it may well be the case that formaldehyde is not the natural
effector molecule for HxlR1. It is possible that HxlR1 and HxlR2 sense a formaldehyde adduct
such as that formed with glutathione or ribulose-5-phosphate.
Figure 5-32 - Graph showing change in intensity in fluorescence of labeled BRH1 at 565nm against
molar ratio of protein. Black squares show HxlR1-His, red circles show HxlR1-His in the
presence of 15mM formaldehyde, and blue triangles show BSA. Error bars represent standard
deviations from five independent experiments.
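The quantities plotted in Figure 5-32 can be summarised from replicate readings as sketched below. This is an illustration only: it assumes that the change in intensity is taken relative to the mean of the DNA-only readings (molar ratio 0), which is an assumption about the analysis rather than a documented step, and the example numbers are invented for demonstration and are not measured data.

```python
from statistics import mean, stdev


def titration_summary(intensities_by_ratio):
    """Map each protein:DNA molar ratio to (mean change in intensity, SD).

    intensities_by_ratio: dict of molar ratio -> list of replicate
    emission intensities at 565 nm. The entry for ratio 0.0 is taken
    as the DNA-only baseline (assumed).
    """
    baseline = mean(intensities_by_ratio[0.0])
    summary = {}
    for ratio in sorted(intensities_by_ratio):
        values = intensities_by_ratio[ratio]
        summary[ratio] = (mean(values) - baseline, stdev(values))
    return summary


# Illustrative numbers only, not measured data.
example = {0.0: [100.0, 102.0, 98.0], 1.0: [150.0, 152.0, 148.0]}
summary = titration_summary(example)
```

Running the same summary on titrations with and without formaldehyde allows the two curves to be compared point by point against their standard deviations.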
5.11 Discussion
Given the fact that previously studied Duf-156 family TFs have been shown to function as
repressors, it is thought that FrmR behaves in a similar way. An FrmR-DNA interaction would
therefore be expected to be observed through a simple in vitro EMSA experiment. The result of
this experiment with FrmR-His and the frmRAB operon showed no such interaction. This result
suggests that either there is no interaction in vitro between FrmR and DNA, or that the non-
physiological His-Tag perturbs the DNA-binding. The His-Tag was therefore removed from the
protein and the experiment was repeated using the WT protein. The FrmR-DNA interaction was
then observed using the EMSA experiment and this interaction was shown to most likely be
sequence specific.
Other Duf-156 proteins have been shown to lose their affinity for their TFBS in the presence of the
respective effector molecules. In the case of FrmR, pre-incubation with 10mM formaldehyde
indeed leads to a complete loss of affinity for the frmRAB operator. This is the first example of a
formaldehyde sensing TF that appears to interact directly with formaldehyde. This is possibly a
reflection of the fact that FrmR acts as a repressor, whereas previous in vitro studies have been
carried out with activator TFs (HxlR, AdhR).
In vivo reporter systems were constructed, and the experiments presented in this study provide
further evidence for FrmR functioning as a repressor that is alleviated by formaldehyde.
Induction of the frmR gene apparently results in repression of the KanR and GFP genes which
were placed under control of the frmRAB promoter. In the case of KanR, the addition of 0.3mM
formaldehyde to the cell culture caused significant derepression.
Unlike the more traditional HTH-type TFs, it is still not understood how FrmR and other Duf-
156 TFs bind to DNA. For the details of this process to be established, it will be necessary to
obtain a high resolution structure of a Duf-156 TF bound to its TFBS. Computational algorithms
identified residues likely to be involved in the FrmR-DNA interaction, and the effect of mutating
these residues on the activity of FrmR was tested in vivo. Three particular residues were noted
to cause a significant decrease in repression activity. These were R-14, R-46 and K-91;
interestingly, each of these residues is highly conserved. The R-14 residue corresponds to the R-
15 residue in CsoR from M. tuberculosis that was found to be essential for DNA binding in vitro.
In CsoR this residue makes up a positively charged patch on the protein surface likely to form at
least part of the DNA binding region of the protein. Results from this chapter therefore suggest
this residue plays a similar role in the DNA binding functionality of FrmR.
In vivo studies have shown that Cys-36 plays an essential role in FrmR’s sensing of
formaldehyde as mutation of this residue abolishes derepression. Mutation of the other FrmR
cysteine residue (Cys-72) does not result in a change of any of the observable properties.
Interestingly, residues homologous to Cys-36 are known to be essential for heavy metal sensing
in FrmR homologs by coordinating to the metal center. This further suggests a mechanistic link
between metal sensing and formaldehyde sensing in certain TFs, as can be observed in the case
of MerR regulators. In both CsoR and RcnR, a histidine residue (corresponding to His-60 of
FrmR) is also shown to co-ordinate to the metal centre. The fact that His-60 is conserved in
FrmR may indicate that this residue also plays a role in effector molecule binding in FrmR. The
histidine side chain can act as a hydrogen bond donor, which may first contribute to the
formation of a thiolate ion at Cys-36. After nucleophilic attack on formaldehyde by the
thiolate ion, the tetrahedral intermediate could be stabilized by His-60.
The lack of a high resolution structure of FrmR (Chapter 4) means we are unable to correlate
the results of this chapter with the structure of the protein. For example, a high resolution
structure would elucidate the environment of Cys-36 and suggest a molecular interpretation for
the observed formaldehyde effects. For this reason, work is continuing towards a high
resolution structure of FrmR, and some progress has been made (Chapter 4).
Although no structure exists for a representative HxlR-DNA complex, the presence of HTH
motifs allows for some limited understanding of the DNA binding mode. However, in the case of HxlR
family members, neither direct binding of formaldehyde nor any allosteric effect on DNA binding has
been observed. We have studied HxlR2 and shown it binds in a sequence specific manner to
its promoter region. This binding appears to be unaffected by 10 mM formaldehyde. These
results are similar to those described previously for HxlR1-His. The HxlR1:DNA binding
interaction has been studied using fluorescence spectroscopy in which the labelled DNA
molecule shows a marked increase in fluorescence when treated with HxlR1-His. This is
indicative of a change in environment of the fluorophore when the DNA molecule is bound to
the protein. It was postulated that formaldehyde could induce a large conformational change in
this interaction possibly with a corresponding change in fluorescence intensity. There was no
such change observed when the binding interaction was studied in the presence of
formaldehyde. These results indicate that there is no major conformational change in the
interaction caused by formaldehyde.
6 Discussion, Conclusions and Future work
This study set out to provide a molecular understanding of how some bacteria “sense”
formaldehyde, i.e. the mechanisms by which transcriptional regulators allosterically
couple ligand binding (presumably of formaldehyde or formaldehyde adducts) to DNA and/or
RNA polymerase binding. There is a wide range of distinct bacterial transcriptional
regulators, and representatives of several transcriptional regulator families have been
implicated in formaldehyde metabolism. This study looked in detail at two distinct regulators:
FrmR and HxlR.
We have been able to further our understanding of the FrmR protein and its basis for regulation
of the associated GSH-FDP genes in E. coli. We have established that it exists in a helical
tetrameric state as observed for the homologous CsoR and RcnR proteins (Sections 3.8.2 and 3.9.2).177,179,245
We reveal that FrmR binds specifically to the frmRAB promoter in vitro, and that this
interaction is severely weakened when FrmR is pre-treated with formaldehyde (Section 5.2).
This suggests that repression by FrmR may be inactivated by a direct interaction with
formaldehyde. In vivo studies in E. coli have confirmed that FrmR is indeed a repressor of the
frmRAB operon, and that addition of formaldehyde to the medium causes derepression (Section
5.5.2). These results indicate that FrmR exhibits negative auto-regulation and, based on the
organisation of the operon, it is most likely that FrmR is a local TF solely regulating the frmRAB
operon.
In vivo studies have shown that FrmR’s functional repression is weakened/abolished by
mutating several predicted DNA-binding residues (Arg-14, Arg-46 and Lys-91), delineating a
possible DNA binding site on the protein. Furthermore, the C-terminally His-tagged FrmR
protein does not bind DNA (Section 5.3.1). An arginine residue corresponding to FrmR Arg-14 is
known to be essential for DNA binding in CsoR, and the results in this study suggest that Arg-14
plays a similar role in FrmR.177 It is difficult to establish the exact role of R-46 and K-91 without
a detailed FrmR structure. However, it is interesting to note that K-91 plays such an important
role because it is the C-terminal residue. Ordinarily, residues towards both termini tend not to
have important functional roles.241 However, as shown in Figure 3-4, K-91 shows a high degree
of conservation in terms of amino acid nature and protein chain length, suggesting this result is
significant. Indeed, this may go some way to explain why the C-terminal “His-Tagged” FrmR
protein appears to be inactive in vitro (Section 5.3.1). It will be interesting to establish whether
similar effects are observed in vitro. It would also be interesting to determine exactly which part
of the frmRAB promoter FrmR binds; this could be established using DNAse footprinting or
similar methods. Additionally, it was mentioned in the introduction to Chapter 5 that previous
work on the FrmR homolog, RcnR, suggested that G/C tracts play a role in protein-DNA
specificity.247 Figure 5-1 showed that the frmRAB promoter also contains such G/C tracts, which
may therefore play a similar structural role.
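As an illustration of how candidate G/C tracts might be located in a promoter sequence before footprinting, the following is a minimal Python sketch. It assumes that a "tract" means an uninterrupted run of G or of C of at least a chosen length; the function name, the length threshold and the example sequence are hypothetical and are not taken from this study (the sequence shown is not the frmRAB promoter).

```python
import re


def find_gc_tracts(seq, min_len=3):
    """Return (start, tract) for each run of only G or only C of at least min_len.

    Positions are 0-based offsets into the supplied sequence.
    """
    pattern = re.compile(r"G{%d,}|C{%d,}" % (min_len, min_len))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence for illustration only; NOT the frmRAB promoter.
example = "ATACCCCTATAGGGGTATTTGGCAT"
tracts = find_gc_tracts(example)
for start, tract in tracts:
    print(start, tract)
```

Tracts found in this way would only be candidates; whether they contribute to FrmR-DNA specificity would still need to be tested experimentally.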
In vivo experiments have shown that formaldehyde-induced derepression of FrmR is critically
dependent on Cys-36 (Section 5.7). It has previously been established that thiols readily react
with formaldehyde169; this fact, in combination with these results, implies that a nucleophilic
attack from Cys-36 to form a covalent adduct is likely to be the basis of formaldehyde sensing in
FrmR.169 Indeed, the equivalent residue in CsoR from M. tuberculosis is essential for copper
sensing in which the residue coordinates to a copper ion.177 It is therefore likely that the
mechanism of regulation in these two proteins is largely similar. Figure 6-1 shows one of the
dimeric subunits of CsoR displaying the coordination sphere of the copper ion.
Figure 6-1 – Structure of a CsoR dimer subunit from Mycobacterium tuberculosis. Segments
coloured red and purple are helical parts of chain A and B respectively. The associated copper ion
is coloured blue and residues constituting the coordination sphere are shown as atom coloured
sticks.
The location of this copper binding site seems ideally placed to induce large conformational
changes as it is situated at a loop region in the middle of the peptide chain. A change in the
conformation of this loop could be coupled to significant changes in the overall protein
structure, moving the alpha-helices relative to each other. Such a movement could result in a
conformation in which the DNA-binding residues have moved to such an extent that the protein
shows little affinity for DNA. A summary of the likely mechanism of transcription regulation by
FrmR is shown in Figure 6-2.
Figure 6-2 – Proposed mechanism of regulation by FrmR. When Cys-36 is reduced, FrmR is in a
DNA-binding state and thus binds to its promoter. This blocks the -10 and/or -35 regions
preventing the σ factor of RNAP from binding to them. Transcription is therefore repressed. In the
presence of high cellular formaldehyde concentrations, Cys-36 of FrmR becomes oxidised. This
converts FrmR to a non-DNA binding state. The promoter region is now clear for the σ-factor of
RNAP causing the operon to be transcribed.
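The qualitative scheme in Figure 6-2 can be sketched numerically with a simple equilibrium occupancy model. This is only an illustrative sketch under stated assumptions: a single operator site, a simple hyperbolic binding isotherm, and complete loss of DNA binding by formaldehyde-modified FrmR. The function names and all numerical values are hypothetical and were not measured in this study.

```python
def promoter_occupancy(active_repressor_nM, kd_nM):
    """Fraction of promoters bound by repressor (simple one-site binding isotherm)."""
    return active_repressor_nM / (active_repressor_nM + kd_nM)


def relative_expression(total_repressor_nM, kd_nM, fraction_modified):
    """Relative transcription, assuming modified repressor cannot bind DNA.

    fraction_modified is the fraction of FrmR carrying a modified Cys-36 (0 to 1).
    """
    active = total_repressor_nM * (1.0 - fraction_modified)
    return 1.0 - promoter_occupancy(active, kd_nM)


# Hypothetical parameters: 100 nM repressor, 10 nM dissociation constant.
for f in (0.0, 0.5, 0.9, 1.0):
    print(f, round(relative_expression(100.0, 10.0, f), 3))
```

In this sketch, expression rises monotonically as a larger fraction of the repressor is modified, which is the behaviour the proposed mechanism predicts for increasing formaldehyde concentrations.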
The secondary structure prediction of FrmR in Figure 3-37 implies that Cys-36 is also located at
such a loop region indicating that the above hypothesis could also apply to FrmR. Indeed, if this
is the case, there may well be other residues in the postulated ligand binding-region that act to
stabilize/sense any Cys-36 adduct formed. It is likely that these residues will be located in
similar positions to the equivalent copper coordination sphere in CsoR. Future work could
involve mutational studies of residues in this region to establish which residues are
important for this function.
Of course, any of the future experiments above would be greatly facilitated by a high resolution
structure of FrmR. This would give us a better insight as to where the potential DNA-binding
residues are located on the protein. Also, the position of Cys-36 could be determined and the
possible environment of any adduct could be assessed. Although we obtained diffracting
crystals of FrmRC36S, as well as a plausible-looking phasing model, a refinable model was not
obtained (Section 4.7.1). This could be due to merohedral twinning of the crystals, combined
with low accuracy of the starting models. In future, it might be possible to use anomalous
scattering and/or multiple isomorphous replacement techniques in order to obtain
experimental phases. Alternatively, different crystal forms could be sought. It is interesting to
note that while FrmRC36S formed crystals under several conditions, the WT protein did not
crystallise at all (section 4.3). This, along with the fact that FrmR will only bind DNA under
reducing conditions, further suggests how essential this cysteine residue appears to be for the
activity of FrmR. It is therefore likely that under oxidising conditions, this residue is modified in
a way that causes the protein to become inactive, possibly leading to disorder affecting the
crystallisation likelihood.
It is still largely unknown how FrmR and other Duf-156 proteins bind DNA. In the future, it would be
insightful to obtain a high resolution structure of one of these proteins in complex with its TFBS.
As far as this study is concerned, this would ideally be of FrmR:frmRAB; however, a complex of
any homolog would also provide significant insight into FrmR function through homology. The
structure of CsoR from M. tuberculosis is only known when bound to its effector (copper), thus
only the inactive structure of this protein is known. It would be interesting to obtain high
resolution structures of both an active (DNA-binding), and an inactive (effector bound) form of a
Duf-156 protein. Differences between the two forms of the protein may elucidate the structural
basis for allosteric regulation in these proteins.
This project also aimed to gain a further understanding of the HxlR proteins. We studied HxlR1
from Bacillus subtilis and HxlR2 from Bacillus cereus AH818. These proteins are fairly similar
(39% identity over full length) and represent two distinct types of HxlR proteins that appear to
regulate formaldehyde detoxification pathways (HxlR1 is Type 1 and HxlR2 is Type 2). Both
Type 1 and Type 2 regulators are linked to genes for the RuMP pathway; however, the Type 2
protein HxlR2 appears to regulate genes for a glutathione-dependent pathway.
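For readers unfamiliar with how a figure such as "39% identity over full length" is derived, the following minimal Python sketch computes percent identity from a pre-computed pairwise alignment. It assumes gaps are written as "-" and that the denominator is the full alignment length; other conventions (for example, dividing by the length of the shorter sequence) give different values, and the convention used for the figure quoted above is not restated here. The function name and example sequences are hypothetical and are not HxlR sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    Gap characters ('-') never count as matches; the denominator is the
    total number of alignment columns, including gapped columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned fragments for illustration only; NOT HxlR1/HxlR2.
identity = percent_identity("MKT-LLV", "MRTALLI")
print(round(identity, 1))
```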
This study, along with previously published work, indicates that HxlR1 shows little
difference in its DNA binding properties in the absence or presence of formaldehyde171
(Section 5.9). This suggests that if a direct interaction between HxlR1 and formaldehyde exists, it
affects HxlR1:RNA-polymerase interactions rather than HxlR1-DNA interactions. On the other
hand, it may well be the case that neither HxlR1 nor the related HxlR2 actually interacts with
formaldehyde, but that they respond to a distinct ligand/chemical stress that arises as a direct
consequence of an increase in cellular formaldehyde. At the outset of this project it was
intended to obtain a high resolution structure of HxlR1 to assist further experiments and
understanding of the formaldehyde sensing mechanism in this protein. Unfortunately no
crystals of this protein could be obtained (Section 4.3.3). In contrast, crystals were obtained for
the related HxlR2, and based on the level of conservation between HxlR1 and HxlR2 (39%
identity) we can assume that their overall structures will be largely similar. It is not yet known
whether HxlR2 is in fact a regulator of the FDP directly upstream; however, the arrangement of
this operon suggests that it is (Figure 1-30). Additionally, as shown in Figure 3-2, there are
genes encoding proteins >65% identical to HxlR2, located upstream of FDPs that are conserved
in other organisms. This further suggests a formaldehyde responsive role of this protein. The
HxlR2 structure is similar to previously solved members of this family: the protein is dimeric
and contains a wHTH DNA-binding domain and a dimerisation domain (Section 4.8.1). The
dimerisation between the two subunits appears to be driven by hydrophobic interactions at the
interface (Section 4.14).
It has been shown using EMSA that HxlR2 is capable of binding to DNA (Section 5.9). Residues
that are likely involved in DNA-binding in the HxlR-family have been identified by analysis of
the recognition helix (Section 4.15). One noticeable feature of the wHTH is that the loop
contains hydrophobic valine residues on its surface. This is unusual and may therefore imply
that there is a hydrophobic interaction between the wing of the wHTH and the HxlR2 TFBS. Future
work could look to test the hypotheses regarding the protein-DNA interaction by attempting to
obtain a crystal structure of the complex.
HypR is the most similar protein to HxlR1/2 to be characterised both structurally and
functionally. HypR is a TF that senses sodium hypochlorite and diamide in Bacillus subtilis.238
The protein senses its effector molecules through their effect on key cysteine residues. Rather
than covalent adduct formation, the effectors lead to inter-molecular disulphide bond
formation. As explained in the introduction to chapter 5, this kind of inter-molecular disulphide
bond formation is not possible in HxlR1. Also, Type 2 HxlR proteins do not possess either of the
cysteine residues implicated in disulphide bond formation.
Comparing the structure of HxlR2-His with those of oxidised and reduced HypR (Section 4.10)
shows that it is more similar to the oxidised form. The most significant difference is the location
of the recognition helix: whereas the oxidized HypR form overlays almost perfectly with HxlR2-His,
there is a significant difference when comparing the reduced HypR form (Figure 6-3). This
suggests that the recognition helix of HxlR2-His is in the same “DNA-binding” conformation as
the oxidized form of HypR. It is the difference in position of the recognition helix that is thought
to explain the difference in DNA-binding properties of the HypR conformers (oxidized
HypR appears to bind DNA more strongly than the reduced form).
Figure 6-3- Relative positions of reduced and oxidized forms of the backbone of HypR’s
recognition helix when their structures are overlaid onto the structure of HxlR2-His. HxlR2-His is
colored red, reduced HypR is colored green and oxidized HypR is colored red.
One hypothesis that has been put forward is that the conformation of oxidised HypR promotes
recruitment of RNAP.238 This does not appear to be an appropriate mechanism for HxlR2
because it would imply that the TF is in an active conformation (i.e. inducing gene expression) in
the absence of any effector molecule.
Clearly, more work will need to be conducted before a detailed mechanism of HxlR2 function
can be established. This should include in vivo and in vitro work to establish whether the protein
responds to formaldehyde and how this signal is coupled to increased transcription. Mutational
studies could be used to test the molecular detail of any hypotheses regarding function. For
example, we might expect cysteine residues to play an important role as with FrmR; this could
be tested if a working reporter system were constructed for HxlR2.
Finally, it is worth noting that we have incidentally constructed a potential bio-sensor of
formaldehyde. The in vivo reporter system described in chapter 5 responds to environmental
formaldehyde. This may therefore be of use in applications that require the sensing of
formaldehyde. For example, it may be necessary to obtain an enzyme mutant that can
demethylate a particular molecule. Demethylation will produce formaldehyde as a by-product. If
this reaction is performed in vivo alongside the reporter system, it may be possible to effectively
observe this reaction as a response from the reporter system. Whether the reporter system will
be of any use entirely depends on whether it is sensitive enough to detect the levels of
formaldehyde produced. This will therefore need to be tested in future.
Appendix
A1: Cloning strategies
All plasmid constructs that were prepared in this study are derived from either pET-15b or pET-24b
from Novagen; vector maps of these plasmids are shown below in Figures A1 and A2
respectively.
Figure A1: Vector map of the pET-15b plasmid. bla encodes a β-lactamase that confers
resistance of the host bacterium to ampicillin. lacI encodes the lac repressor. ori represents the
origin of replication of the plasmid. The T7 promoter is the promoter site for T7 polymerase and
the region that encodes the ‘His-tag’ is also labeled. The locations of the restriction sites that were used
in this study (NdeI and BamHI) are also labeled.
Figure A2: Vector map of the pET-24b plasmid. KanR encodes an aminoglycoside 3'-phosphotransferase
that confers resistance of the host bacterium to kanamycin. lacI encodes the
lac repressor. ori represents the origin of replication of the plasmid. The T7 promoter is the
promoter site for T7 polymerase and the region that encodes the ‘His-tag’ is also labeled. The
locations of the restriction sites that were used in this study (NdeI and HindIII) are also labeled.
A1.1 Cloning strategy for the construction of pET15b-His-frmR, pET24b-frmR-His and pET15b-frmR
This section schematically describes the strategy used to construct the FrmR expression vectors
and is intended to provide clarity of the procedure. Full details are described in section 3.4.1
and the sequences of the primers mentioned are given in Table 2-2. Figure A3 shows how the
two ‘His-tagged’ constructs were made.
Figure A3: Schematic representation of the cloning strategy used to construct pET15b-His-frmR
and pET24b-frmR-His
Primers: frmR_F, frmR_R
Primers: frmR_NdeI, frmR_HindIII
Primers: frmR_NdeI, frmR_BamHI
Implementation of the above strategy led to the construction of pET15b-His-frmR and pET24b-
frmR-His; vector maps are shown in Figures A4 and A5 respectively.
Figure A4: Vector map of pET15b-His-frmR
Figure A5: Vector map of pET24b-frmR-His
The frmR construct that did not contain a ‘His-tag’ was constructed by removing the tag-encoding
DNA from pET15b-His-frmR so that only wild type FrmR is expressed. Figure A6 (left) shows a
schematic description of how this was performed. Full details are described in section 3.4.1 and the
primer sequences are given in Table 2-2. A vector map of pET15b-frmR is included in Figure A6 (right).
Figure A6- Left: Schematic representation of how pET15b-frmR was constructed from pET15b-His-
frmR. Right: Vector map of pET15b-frmR.
Primers: frmRmutnde1F, frmRmutnde1R
A1.2 Cloning Strategy for the construction of pET24b-hxlR1-His
pET24b-hxlR1-His was constructed using standard ligation cloning. Full details of the procedure
are described in section 3.4.2 and the sequences of the primers used are given in Table 2-2. The
procedure used is summarized by the schematic in Figure A7.
Figure A7: Schematic representation of the cloning strategy used to construct pET24b-hxlR1-His
Primers: hxlR_Nde1, hxlR_BamH1
Primers: hxlR_F, hxlR_R
The vector map of pET24b-hxlR1-His is shown in Figure A8.
Figure A8: Vector map of pET24b-hxlR1-His
A1.3 Cloning Strategy for the construction of pET24b-hxlR2-His
The procedure used to construct pET24b-hxlR2-His used a slightly different approach to those
described above. Here, the ‘in fusion’ reaction (see section 2.2.10) was used rather than the
standard ligation method. Full details of the procedure are described in section 3.4.3 and
sequences of the primers used are given in Table 2-2. The strategy applied is shown
schematically in Figure A9.
Figure A9: Schematic representation of the cloning strategy used to construct pET24b-hxlR2-His
The vector map of pET24b-hxlR2-His is shown in Figure A10.
Primers: cer24b1F, cer24b1R
A1.4 Cloning Strategy for the construction of the pKanRR and pGFPR reporter system
The procedure used to generate both the pKanR and pGFP based reporter plasmids is outlined
in Figure A11. Full details are described in 5.4 and primer sequences are given in Table 2-2.
Figure A11- Outline of the procedure used to construct the pKanR and pGFP reporter systems
The vector maps of the pKanRR and pGFPR reporter plasmids are shown in Figures A12 and
A13 respectively.
Figure A12: Vector map of pKanRR
Figure A13: Vector map of pGFPR
A1.5 Cloning Strategy for the construction of the E. coli K12∆frmR∆KanR (DE3) strain
The methodology used to acquire a K12∆frmR∆KanR (DE3) strain of E. coli for use in the in vivo
experiments described in chapter 5 is summarised by the schematic in Figure A14.
Figure A14 – Overview of the strategy used to construct the E. coli K12∆frmR∆KanR (DE3) strain.
References
1. van Hijum, S.A.F.T., Medema, M.H. & Kuipers, O.P. Mechanisms and evolution of control logic in
prokaryotic transcriptional regulation. Microbiology and molecular biology reviews 73, 481-509 (2009).
2. Struhl, K. Fundamentally different logic of gene regulation in eukaryotes and prokaryotes. Cell 98, 1-4
(1999).
3. Fassler, J. & Gussin, G. Promoters and basal transcription machinery in eubacteria and eukaryotes:
Concepts, definitions, and analogies. 273, 367-375 (1996).
4. Harley, C.B. & Reynolds, R.P. Analysis of E. coli promoter sequences. Nucleic acids research 15,
2343-61 (1987).
5. Hertz, G.Z. & Stormo, G.D. Escherichia coli promoter sequences: analysis and prediction. Methods in
enzymology 273, 30-42 (1996).
6. Murakami, K.S. & Darst, S.A. Bacterial RNA polymerases: the wholo story. Current opinion in
structural biology 13, 31-39 (2003).
7. Vassylyev, D.G. et al. Crystal structure of a bacterial RNA polymerase holoenzyme at 2.6 Å resolution.
Nature 417, 712-9 (2002).
8. Zhang, G. et al. Crystal Structure of Core RNA Polymerase at 3.3 Å Resolution. Cell 98, 811-824
(1999).
9. Guthold, M. et al. Direct observation of one-dimensional diffusion and transcription by Escherichia coli
RNA polymerase. Biophysical journal 77, 2284-94 (1999).
10. Borukhov, S. & Nudler, E. RNA polymerase: the vehicle of transcription. Trends in microbiology 16,
126-34 (2008).
11. Craig, M.L. et al. DNA footprints of the two kinetically significant intermediates in formation of an
RNA polymerase-promoter open complex: evidence that interactions with start site and downstream
DNA induce sequential conformational changes in polymerase and DNA. Journal of molecular biology
283, 741-56 (1998).
12. Haugen, S.P., Ross, W. & Gourse, R.L. Advances in bacterial promoter recognition and its control by
factors that do not bind DNA. Nature reviews. Microbiology 6, 507-19 (2008).
13. Helmann, J.D. RNA polymerase: a nexus of gene regulation. Methods 47, 1-5 (2009).
14. Hsu, L. Promoter clearance and escape in prokaryotes. Biochimica et biophysica acta - gene structure
and expression 1577, 191-207 (2002).
15. Landick, R. The regulatory roles and mechanism of transcriptional pausing. Biochemical society
transactions 34, 1062-6 (2006).
16. Park, J.-S. & Roberts, J.W. Role of DNA bubble rewinding in enzymatic transcription termination.
Proceedings of the National Academy of Sciences 103, 4870-5 (2006).
17. Skordalakes, E. & Berger, J.M. Structure of the Rho Transcription Terminator. Cell 114, 135-146
(2003).
18. Richardson, J.P. Loading Rho to Terminate Transcription. Cell 114, 157-159 (2003).
19. Minchin, S.D. & Busby, S.J.W. Analysis of mechanisms of activation and repression at bacterial
promoters. Methods 47, 6-12 (2009).
20. Ohlendorf, D.H., Anderson, W.F. & Matthews, B.W. Many gene-regulatory proteins appear to have a
similar α-helical fold that binds DNA and evolved from a common precursor. Journal of molecular
evolution 19, 109-114 (1983).
21. Brennan, R.G. & Matthews, B.W. The helix-turn-helix DNA binding motif. The journal of biological
chemistry 264, 1903-6 (1989).
22. Huffman, J.L. & Brennan, R.G. Prokaryotic transcription regulators: more than just the helix-turn-helix
motif. Current opinion in structural biology 12, 98-106 (2002).
23. Rohs, R. et al. Origins of specificity in protein-DNA recognition. Annual review of biochemistry 79,
233-69 (2010).
24. Seeman, N.C. Sequence-Specific Recognition of Double Helical Nucleic Acids by Proteins.
Proceedings of the National Academy of Sciences 73, 804-808 (1976).
25. Watkins, D., Hsiao, C., Woods, K.K., Koudelka, G.B. & Williams, L.D. P22 c2 repressor-operator
complex: mechanisms of direct and indirect readout. Biochemistry 47, 2325-38 (2008).
26. Koo, H.S., Wu, H.M. & Crothers, D.M. DNA bending at adenine·thymine tracts. Nature 320, 501-6
(1986).
27. Koudelka, G.B., Mauro, S.A. & Ciubotaru, M. Indirect readout of DNA sequence by proteins: the roles
of DNA sequence-dependent intrinsic and extrinsic forces. Progress in nucleic acid research and
molecular biology 81, 143-77 (2006).
28. Rohs, R., West, S.M., Liu, P. & Honig, B. Nuance in the double-helix and its role in protein-DNA
recognition. Current opinion in structural biology 19, 171-7 (2009).
29. Zhang, Y., Xi, Z., Hegde, R.S., Shakked, Z. & Crothers, D.M. Predicting indirect readout effects in
protein-DNA interactions. Proceedings of the National Academy of Sciences 101, 8337-41 (2004).
30. Riggs, A., Bourgeois, S. & Cohn, M. The lac repressor-operator interaction. III. Kinetic studies.
Journal of molecular biology 53, 401-417 (1970).
31. Kolomeisky, A.B. Physics of protein-DNA interactions: mechanisms of facilitated target search.
Physical chemistry chemical physics 13, 2088-95 (2011).
32. Gorman, J. & Greene, E.C. Visualizing one-dimensional diffusion of proteins along DNA. Nature
structural & molecular biology 15, 768-74 (2008).
33. Sheinman, M. & Kafri, Y. The effects of intersegmental transfers on target location by proteins.
Physical biology 6, 016-030 (2009).
34. Mirny, L. et al. How a protein searches for its site on DNA: the mechanism of facilitated diffusion.
Journal of physics A: Mathematical and theoretical 42, 434013 (2009).
35. Givaty, O. & Levy, Y. Protein sliding along DNA: dynamics and structural characterization. Journal of
molecular biology 385, 1087-97 (2009).
36. Hagmar, P. Unspecific DNA binding of the DNA binding domain of the glucocorticoid receptor studied
with flow linear dichroism. FEBS letters 253, 28-32 (1989).
37. Mossing, M. & Record, M. Thermodynamic origins of specificity in the lac repressor-operator
interaction. Adaptability in the recognition of mutant operator sites. Journal of molecular biology 186,
295-305 (1985).
38. Kalodimos, C.G. et al. Structure and flexibility adaptation in nonspecific and specific protein-DNA
complexes. Science 305, 386-9 (2004).
39. Dahirel, V., Paillusson, F., Jardat, M., Barbi, M. & Victor, J.-M. Nonspecific DNA-Protein Interaction:
Why Proteins Can Diffuse along DNA. Physical review letters 102, (2009).
40. Quinones, M., Kimsey, H.H., Ross, W., Gourse, R.L. & Waldor, M.K. LexA represses CTXphi
transcription by blocking access of the alpha C-terminal domain of RNA polymerase to promoter DNA.
The journal of biological chemistry 281, 39407-12 (2006).
41. Rojo, F. Repression of transcription initiation in bacteria. Journal of bacteriology 181, 2987-91 (1999).
42. Semsey, S., Geanacopoulos, M., Lewis, D.E.A. & Adhya, S. Operator-bound GalR dimers close DNA
loops by direct interaction: tetramerization and inducer binding. The EMBO journal 21, 4349-56 (2002).
43. Aki, T. & Adhya, S. Repressor induced site-specific binding of HU for transcriptional regulation. The
EMBO journal 16, 3666-74 (1997).
44. Rodionov, D.A. Comparative genomic reconstruction of transcriptional regulatory networks in bacteria.
Chemical reviews 107, 3467-97 (2007).
45. Rhodius, V. Positive activation of gene expression. Current opinion in microbiology 1, 152-159 (1998).
46. Smits, W.K., Hoa, T.T., Hamoen, L.W., Kuipers, O.P. & Dubnau, D. Antirepression as a second
mechanism of transcriptional activation by a minor groove binding protein. Molecular microbiology 64,
368-81 (2007).
47. Yamamoto, K. et al. Functional characterization in vitro of all two-component signal transduction
systems from Escherichia coli. The journal of biological chemistry 280, 1448-56 (2005).
48. Mascher, T., Helmann, J.D. & Unden, G. Stimulus perception in bacterial signal-transducing histidine
kinases. Microbiology and molecular biology reviews 70, 910-38 (2006).
49. Galperin, M.Y. Diversity of structure and function of response regulator output domains. Current
opinion in microbiology 13, 150-9 (2010).
50. Bourret, R.B. Receiver domain structure and function in response regulator proteins. Current opinion in
microbiology 13, 142-9 (2010).
51. Casino, P., Rubio, V. & Marina, A. The mechanism of signal transduction by two-component systems.
Current opinion in structural biology 20, 763-71 (2010).
52. Perez-Rueda, E. The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12.
Nucleic acids research 28, 1838-1847 (2000).
53. Maddocks, S.E. & Oyston, P.C. Structure and function of the LysR-type transcriptional regulator
(LTTR) family proteins. Microbiology 154, 3609-3623 (2008).
54. Hernández-Lucas, I. et al. The LysR-type transcriptional regulator LeuO controls expression of several
genes in Salmonella enterica serovar Typhi. Journal of bacteriology 190, 1658-70 (2008).
55. Schell, M.A. Molecular-Biology of the Lysr Family of Transcriptional Regulators. Annual review of
microbiology 47, 597-626 (1993).
56. Ogawa, N., McFall, S.M., Klem, T.J., Miyashita, K. & Chakrabarty, A.M. Transcriptional Activation of
the Chlorocatechol Degradative Genes of Ralstonia eutropha NH9. Journal of bacteriology 181, 6697-
6705 (1999).
57. van Keulen, G., Ridder, A.N.J.A., Dijkhuizen, L. & Meijer, W.G. Analysis of DNA Binding and
Transcriptional Activation by the LysR-Type Transcriptional Regulator CbbR of Xanthobacter flavus.
Journal of bacteriology 185, 1245-1252 (2003).
58. Maddocks, S.E. & Oyston, P.C.F. Structure and function of the LysR-type transcriptional regulator
(LTTR) family proteins. Microbiology 154, 3609-23 (2008).
59. Zhou, X. et al. Crystal structure of ArgP from Mycobacterium tuberculosis confirms two distinct
conformations of full-length LysR transcriptional regulators and reveals its function in DNA binding
and transcriptional regulation. Journal of molecular biology 396, 1012-24 (2010).
60. Ezezika, O.C., Haddad, S., Neidle, E.L. & Momany, C. Oligomerization of BenM, a LysR-type
transcriptional regulator: structural basis for the aggregation of proteins in this family. Acta
crystallographica. Section F, Structural biology and crystallization communications 63, 361-8 (2007).
61. Ogawa, N., McFall, S.M., Klem, T.J., Miyashita, K. & Chakrabarty, A.M. Transcriptional activation of
the chlorocatechol degradative genes of Ralstonia eutropha NH9. Journal of bacteriology 181, 6697-705
(1999).
62. Muraoka, S. et al. Crystal structure of a full-length LysR-type transcriptional regulator, CbnR: unusual
combination of two subunit forms and molecular bases for causing and changing DNA bend. Journal of
molecular biology 328, 555-566 (2003).
63. Busenlehner, L.S., Pennella, M.A. & Giedroc, D.P. The SmtB/ArsR family of metalloregulatory
transcriptional repressors: structural insights into prokaryotic metal resistance. FEMS microbiology
Reviews 27, 131-143 (2003).
64. Turner, J. Zinc sensing by the cyanobacterial metallothionein repressor SmtB: different motifs mediate
metal-induced protein-DNA dissociation. Nucleic acids research 24, 3714-3721 (1996).
65. Morby, A.P., Turner, J.S., Huckle, J.W. & Robinson, N.J. SmtB is a metal-dependent repressor of the
cyanobacterial metallothionein gene smtA: identification of a Zn inhibited DNA-protein complex.
Nucleic acids research 21, 921-5 (1993).
66. VanZile, M.L., Chen, X. & Giedroc, D.P. Allosteric Negative Regulation of smt O/P Binding of the
Zinc Sensor, SmtB, by Metal Ions: A Coupled Equilibrium Analysis. Biochemistry 41, 9776-9786
(2002).
67. Eicken, C. et al. A Metal–Ligand-mediated Intersubunit Allosteric Switch in Related SmtB/ArsR Zinc
Sensor Proteins. Journal of molecular biology 333, 683-695 (2003).
68. Seshasayee, A.S.N., Bertone, P., Fraser, G.M. & Luscombe, N.M. Transcriptional regulatory networks
in bacteria: from input signals to output responses. Current opinion in microbiology 9, 511-9 (2006).
69. Babu, M.M. Early Career Research Award Lecture. Structure, evolution and dynamics of transcriptional
regulatory networks. Biochemical society transactions 38, 1155-78 (2010).
70. Gama-Castro, S. et al. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12
beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic
acids research 36, D120-4 (2008).
71. Martínez-Antonio, A. Identifying global regulators in transcriptional regulatory networks in bacteria.
Current opinion in microbiology 6, 482-489 (2003).
72. Milo, R. et al. Network motifs: simple building blocks of complex networks. Science 298, 824-7 (2002).
73. Alon, U. Network motifs: theory and experimental approaches. Nature reviews. Genetics 8, 450-61
(2007).
74. Rosenfeld, N., Elowitz, M.B. & Alon, U. Negative Autoregulation Speeds the Response Times of
Transcription Networks. Journal of molecular biology 323, 785-793 (2002).
75. Becskei, A. & Serrano, L. Engineering stability in gene networks by autoregulation. Nature 405, 590-3
(2000).
76. Maeda, Y.T. & Sano, M. Regulatory dynamics of synthetic gene networks with positive feedback.
Journal of molecular biology 359, 1107-24 (2006).
77. Mangan, S. & Alon, U. Structure and function of the feed-forward loop network motif. Proceedings of
the National Academy of Sciences 100, 11980-5 (2003).
78. Shen-Orr, S.S., Milo, R., Mangan, S. & Alon, U. Network motifs in the transcriptional regulation
network of Escherichia coli. Nature genetics 31, 64-8 (2002).
79. Mangan, S., Zaslaver, A. & Alon, U. The Coherent Feedforward Loop Serves as a Sign-sensitive Delay
Element in Transcription Networks. Journal of molecular biology 334, 197-204 (2003).
80. Kalir, S., Mangan, S. & Alon, U. A coherent feed-forward loop with a SUM input function prolongs
flagella expression in Escherichia coli. Molecular systems biology 1, 2005.0006 (2005).
81. Zaslaver, A. et al. Just-in-time transcription program in metabolic pathways. Nature genetics 36, 486-91
(2004).
82. Ishihama, A. Functional modulation of Escherichia coli RNA polymerase. Annual review of
microbiology 54, 499-518 (2000).
83. Koo, B.-M., Rhodius, V.A., Campbell, E.A. & Gross, C.A. Dissection of recognition determinants of
Escherichia coli sigma32 suggests a composite -10 region with an “extended -10” motif and a core -10
element. Molecular microbiology 72, 815-29 (2009).
84. Arsène, F., Tomoyasu, T. & Bukau, B. The heat shock response of Escherichia coli. International
journal of food microbiology 55, 3-9 (2000).
85. Hughes, K.T. & Mathee, K. The anti-sigma factors. Annual review of microbiology 52, 231-86 (1998).
86. Helmann, J.D. Anti-sigma factors. Current opinion in microbiology 2, 135-41 (1999).
87. Jishage, M. & Ishihama, A. A stationary phase protein in Escherichia coli with binding activity to the
major sigma subunit of RNA polymerase. Proceedings of the National Academy of Sciences 95, 4953-8
(1998).
88. Piper, S.E., Mitchell, J.E., Lee, D.J. & Busby, S.J.W. A global view of Escherichia coli Rsd protein and
its interactions. Molecular biosystems 5, 1943-7 (2009).
89. Srivatsan, A. & Wang, J.D. Control of bacterial transcription, translation and replication by (p)ppGpp.
Current opinion in microbiology 11, 100-5 (2008).
90. Wendrich, T.M., Blaha, G., Wilson, D.N., Marahiel, M.A. & Nierhaus, K.H. Dissection of the
Mechanism for the Stringent Factor RelA. Molecular Cell 10, 779-788 (2002).
91. Paul, B.J. et al. DksA: a critical component of the transcription initiation machinery that potentiates the
regulation of rRNA promoters by ppGpp and the initiating NTP. Cell 118, 311-22 (2004).
92. Haugen, S.P. et al. rRNA promoter regulation by nonoptimal binding of sigma region 1.2: an additional
recognition element for RNA polymerase. Cell 125, 1069-82 (2006).
93. Potrykus, K., Murphy, H., Philippe, N. & Cashel, M. ppGpp is the major source of growth rate control
in E. coli. Environmental microbiology 13, 563-75 (2010).
94. Potrykus, K. & Cashel, M. (p)ppGpp: still magical? Annual review of microbiology 62, 35-51 (2008).
95. Dillon, S.C. & Dorman, C.J. Bacterial nucleoid-associated proteins, nucleoid structure and gene
expression. Nature reviews. Microbiology 8, 185-95 (2010).
96. Ball, C.A., Osuna, R., Ferguson, K.C. & Johnson, R.C. Dramatic changes in Fis levels upon nutrient
upshift in Escherichia coli. Journal of bacteriology 174, 8043-8056 (1992).
97. Stella, S., Cascio, D. & Johnson, R.C. The shape of the DNA minor groove directs binding by the DNA-
bending protein Fis. Genes & development 24, 814-26 (2010).
98. Squire, D.J.P., Xu, M., Cole, J.A., Busby, S.J.W. & Browning, D.F. Competition between NarL-
dependent activation and Fis-dependent repression controls expression from the Escherichia coli yeaR
and ogt promoters. The biochemical journal 420, 249-57 (2009).
99. Dorman, C.J. H-NS: a universal regulator for a dynamic genome. Nature reviews. Microbiology 2, 391-
400 (2004).
100. Lucchini, S. et al. H-NS mediates the silencing of laterally acquired genes in bacteria. PLoS pathogens
2, e81 (2006).
101. Stoebel, D.M., Free, A. & Dorman, C.J. Anti-silencing: overcoming H-NS-mediated repression of
transcription in Gram-negative enteric bacteria. Microbiology 154, 2533-45 (2008).
102. Merino, E. & Yanofsky, C. Transcription attenuation: a highly conserved regulatory strategy used by
bacteria. Trends in genetics 21, 260-4 (2005).
103. Naville, M. & Gautheret, D. Transcription attenuation in bacteria: theme and variations. Briefings in
functional genomics & proteomics 8, 482-92 (2009).
104. Yanofsky, C. Attenuation in the control of expression of bacterial operons. Nature 289, 751-758 (1981).
105. Browning, D.F. & Busby, S.J. The regulation of bacterial transcription initiation. Nature reviews.
Microbiology 2, 57-65 (2004).
106. Wang, S. et al. Transcriptomic response of Escherichia coli O157:H7 to oxidative stress. Applied and
environmental microbiology 75, 6110-23 (2009).
107. Clayden, J. Organic Chemistry. (Oxford University Press: Oxford, 2001).
108. WHO Formaldehyde. (2006).
109. Kerns, W.D., Pavkov, K.L., Donofrio, D.J., Gralla, E.J. & Swenberg, J.A. Carcinogenicity of
Formaldehyde in Rats and Mice after Long-Term Inhalation Exposure. Cancer Research 43, 4382-4392
(1983).
110. Schmid, O. & Speit, G. Genotoxic effects induced by formaldehyde in human blood and implications
for the interpretation of biomonitoring studies. Mutagenesis 22, 69-74 (2007).
111. Barker, S., Weinfeld, M. & Murray, D. DNA-protein crosslinks: their induction, repair, and biological
consequences. Mutation research 589, 111-35 (2005).
112. Aparicio, O., Geisberg, J.V. & Struhl, K. Chromatin immunoprecipitation for determining the
association of proteins with specific genomic sequences in vivo. Current protocols in cell biology
Chapter 17, Unit 17.7 (2004).
113. Collado-Vides, J. et al. Bioinformatics resources for the study of gene regulation in bacteria. Journal of
bacteriology 191, 23-31 (2009).
114. Heck, H. et al. Formaldehyde (CH2O) Concentrations in the Blood of Humans and Fischer-344 Rats
Exposed to CH2O Under Controlled Conditions. AIHA Journal 46, 1-3 (1985).
115. Handler, P., Bernheim, M. & R, K. The Oxidative Demethylation of Sarcosine. Journal of biological
chemistry 138, 211-218 (1941).
116. Trewick, S.C., Henshaw, T.F., Hausinger, R.P., Lindahl, T. & Sedgwick, B. Oxidative demethylation by
Escherichia coli AlkB directly reverts DNA base damage. Nature 419, 174-8 (2002).
117. Carlier, P., Hannichi, H. & Mouvier, G. The chemistry of carbonyl compounds in the atmosphere—A
review. Atmospheric environment 20, 2079-2099 (1986).
118. Granby, K. Urban and semi-rural observations of carboxylic acids and carbonyls. Atmospheric
environment 31, 1403-1415 (1997).
119. Vorholt, J.A. Cofactor-dependent pathways of formaldehyde oxidation in methylotrophic bacteria.
Archives of microbiology 178, 239-49 (2002).
120. Mitsui, R., Omori, M., Kitazawa, H. & Tanaka, M. Formaldehyde-limited cultivation of a newly
isolated methylotrophic bacterium, Methylobacterium sp. MF1: enzymatic analysis related to C1
metabolism. Journal of bioscience and bioengineering 99, 18-22 (2005).
121. Chistoserdova, L., Kalyuzhnaya, M.G. & Lidstrom, M.E. The expanding world of methylotrophic
metabolism. Annual review of microbiology 63, 477-99 (2009).
122. Harms, N., Ras, J., Reijnders, W.N., van Spanning, R.J. & Stouthamer, A.H. S-formylglutathione
hydrolase of Paracoccus denitrificans is homologous to human esterase D: a universal pathway for
formaldehyde detoxification? Journal of bacteriology 178, 6296-6299 (1996).
123. Interproscan. at <http://www.ebi.ac.uk/Tools/pfa/iprscan/>
124. Gonzalez, C.F. et al. Molecular basis of formaldehyde detoxification. Characterization of two S-
formylglutathione hydrolases from Escherichia coli, FrmB and YeiG. Journal of biological chemistry
281, 14514-14522 (2006).
125. Strittmatter, P. & Ball, E.G. Formaldehyde Dehydrogenase, a Glutathione-dependent enzyme system.
Journal of biological chemistry 213, 445-461 (1955).
126. Uotila, L. & Koivusalo, M. Formaldehyde Dehydrogenase from Human Liver. Purification, properties,
and evidence for formation of glutathione thiol esters by the enzyme. Journal of biological chemistry
249, 7653-7663 (1974).
127. Koivusalo, M., Baumann, M. & Uotila, L. Evidence for the identity of glutathione-dependent formaldehyde
dehydrogenase and class III alcohol dehydrogenase. FEBS letters 257, 105-109 (1989).
128. Uotila, L. & Koivusalo, M. Purification and Properties of S-Formylglutathione Hydrolase from Human
Liver. Journal of biological chemistry 249, 7664-7672 (1974).
129. Min, H., Shane, B. & Stokstad, E.L. Identification of 10-formyltetrahydrofolate dehydrogenase-
hydrolase as a major folate binding protein in liver cytosol. Biochimica et biophysica acta 967, 348-53
(1988).
130. Danielsson, O. “Enzymogenesis”: Classical Liver Alcohol Dehydrogenase Origin from the Glutathione-
Dependent Formaldehyde Dehydrogenase Line. Proceedings of the National Academy of Sciences 89,
9247-9251 (1992).
131. Kaiser, R. Origin of the Human Alcohol Dehydrogenase System: Implications from the Structure and
Properties of the Octopus Protein. Proceedings of the National Academy of Sciences 90, 11222-11226
(1993).
132. Jensen, D.E., Belka, G.K. & Du Bois, G.C. S-Nitrosoglutathione is a substrate for rat alcohol
dehydrogenase class III isoenzyme. The biochemical journal 331 (Pt 2), 659-68 (1998).
133. Liu, L. et al. A metabolic enzyme for S-nitrosothiol conserved from bacteria to humans. Nature 410,
490-4 (2001).
134. Yang, Z.N., Bosron, W.F. & Hurley, T.D. Structure of human chi chi alcohol dehydrogenase: a
glutathione-dependent formaldehyde dehydrogenase. Journal of molecular biology 265, 330-43 (1997).
135. Plapp, B.V. Conformational changes and catalysis by alcohol dehydrogenase. Archives of biochemistry
and biophysics 493, 3-12 (2010).
136. Sanghani, P.C., Bosron, W.F. & Hurley, T.D. Human Glutathione-Dependent Formaldehyde
Dehydrogenase. Structural Changes Associated with Ternary Complex Formation. Biochemistry 41,
15189-15194 (2002).
137. Degrassi, G., Uotila, L., Klima, R. & Venturi, V. Purification and properties of an esterase from the
yeast Saccharomyces cerevisiae and identification of the encoding gene. Applied and environmental
microbiology 65, 3470-2 (1999).
138. Kordic, S., Cummins, I. & Edwards, R. Cloning and characterization of an S-formylglutathione
hydrolase from Arabidopsis thaliana. Archives of biochemistry and biophysics 399, 232-8 (2002).
139. Wu, D. et al. Crystal structure of human esterase D: a potential genetic marker of retinoblastoma. The
FASEB journal : official publication of the Federation of American Societies for Experimental Biology
23, 1441-6 (2009).
140. Johnson, Curtis, W. Principles Of Physical Biochemistry. (Pearson: 2006).
141. Alterio, V. et al. Crystal structure of an S-formylglutathione hydrolase from Pseudoalteromonas
haloplanktis TAC125. Biopolymers 93, 669-77 (2010).
142. Newton, G.L. & Fahey, R.C. Mycothiol biochemistry. Archives of microbiology 178, 388-94 (2002).
143. Reizer, J., Reizer, A. & Saier, M.H. Is the ribulose monophosphate pathway widely distributed in
bacteria? Microbiology 143 (Pt 8), 2519-20 (1997).
144. Orita, I. et al. The archaeon Pyrococcus horikoshii possesses a bifunctional enzyme for formaldehyde
fixation via the ribulose monophosphate pathway. Journal of bacteriology 187, 3636-42 (2005).
145. Yurimoto, H., Kato, N. & Sakai, Y. Assimilation, dissimilation, and detoxification of formaldehyde, a
central metabolic intermediate of methylotrophic metabolism. Chemical record 5, 367-75 (2005).
146. Quayle, J.R. & Ferenci, T. Evolutionary aspects of autotrophy. Microbiological reviews 42, 251-73
(1978).
147. Kato, N., Yurimoto, H. & Thauer, R.K. The physiological role of the ribulose monophosphate pathway
in bacteria and archaea. Bioscience biotechnology and biochemistry 70, 10-21 (2006).
148. Wise, E., Yew, W.S., Babbitt, P.C., Gerlt, J.A. & Rayment, I. Homologous (β/α)8-Barrel Enzymes That
Catalyze Unrelated Reactions: Orotidine 5′-Monophosphate Decarboxylase and 3-Keto-L-Gulonate 6-
Phosphate Decarboxylase. Biochemistry 41, 3861-3869 (2002).
149. Gerlt, J. Evolution of function in (β/α)8-barrel enzymes. Current opinion in chemical biology 7, 252-264
(2003).
150. Yew, W.S., Wise, E.L., Rayment, I. & Gerlt, J.A. Evolution of enzymatic activities in the orotidine 5’-
monophosphate decarboxylase suprafamily: mechanistic evidence for a proton relay system in the active
site of 3-keto-L-gulonate 6-phosphate decarboxylase. Biochemistry 43, 6427-37 (2004).
151. Orita, I. et al. Crystal structure of 3-hexulose-6-phosphate synthase, a member of the orotidine 5’-
monophosphate decarboxylase suprafamily. Proteins 78, 3488-92 (2010).
152. Martinez-Cruz, L.A. et al. Crystal Structure of MJ1247 Protein from M. jannaschii at 2.0 Å
Resolution Infers a Molecular Function of 3-Hexulose-6-Phosphate Isomerase. Structure 10, 195-204
(2002).
153. Ferenci, T., Strom, T. & Quayle, J.R. Purification and properties of 3-hexulose phosphate synthase and
phospho-3-hexuloisomerase from Methylococcus capsulatus. The biochemical journal 144, 477-86
(1974).
154. Tanaka, N., Kusakabe, Y., Ito, K., Yoshimoto, T. & Nakamura, K.T. Crystal Structure of Formaldehyde
Dehydrogenase from Pseudomonas putida: the Structural Origin of the Tightly Bound Cofactor in
Nicotinoprotein Dehydrogenases. Journal of molecular biology 324, 519-533 (2002).
155. Kato, N., Yamagami, T. & Shimao, M. Formaldehyde dismutase, a novel NAD-binding oxidoreductase
from Pseudomonas putida F61. European journal of biochemistry 156, 59-64 (1986).
156. Marx, C.J., Miller, J.A., Chistoserdova, L. & Lidstrom, M.E. Multiple Formaldehyde
Oxidation/Detoxification Pathways in Burkholderia fungorum LB400. Journal of bacteriology 186,
2173-2178 (2004).
157. Vorholt, J.A., Marx, C.J., Lidstrom, M.E. & Thauer, R.K. Novel formaldehyde-activating enzyme in
Methylobacterium extorquens AM1 required for growth on methanol. Journal of bacteriology 182,
6645-50 (2000).
158. Roca, A., Rodríguez-Herva, J.J. & Ramos, J.L. Redundancy of enzymes for formaldehyde detoxification
in Pseudomonas putida. Journal of bacteriology 191, 3367-74 (2009).
159. Wittwer, A. & Wagner, C. Identification of the folate-binding proteins of rat liver mitochondria as
dimethylglycine dehydrogenase and sarcosine dehydrogenase. Flavoprotein nature and enzymatic
properties of the purified proteins. Journal of biological chemistry 256, 4109-4115 (1981).
160. Mackenzie, C.G. & Frisell, W. The metabolism of dimethylglycine by liver mitochondria. The journal
of biological chemistry 232, 417-27 (1958).
161. Leys, D., Basran, J. & Scrutton, N.S. Channelling and formation of “active” formaldehyde in
dimethylglycine oxidase. The EMBO journal 22, 4038-48 (2003).
162. Tralau, T. et al. An internal reaction chamber in dimethylglycine oxidase provides efficient protection
from exposure to toxic formaldehyde. The journal of biological chemistry 284, 17826-34 (2009).
163. Fox, J.T. & Stover, P.J. Folate-mediated one-carbon metabolism. Vitamins and hormones 79, 1-44
(2008).
164. de Vries, G.E., Harms, N., Maurer, K., Papendrecht, A. & Stouthamer, A.H. Physiological regulation of
Paracoccus denitrificans methanol dehydrogenase synthesis and activity. Journal of bacteriology 170,
3731-7 (1988).
165. Harms, N., Reijnders, W.N., Koning, S. & van Spanning, R.J. Two-component system that regulates
methanol and formaldehyde oxidation in Paracoccus denitrificans. Journal of bacteriology 183, 664-70
(2001).
166. Barber, R.D. & Donohue, T.J. Pathways for transcriptional activation of a glutathione-dependent
formaldehyde dehydrogenase gene. Journal of molecular biology 280, 775-84 (1998).
167. Hickman, J.W., Witthuhn, V.C., Dominguez, M. & Donohue, T.J. Positive and negative transcriptional
regulators of glutathione-dependent formaldehyde metabolism. Journal of bacteriology 186, 7914-25
(2004).
168. Yasueda, H., Kawahara, Y. & Sugimoto, S. Bacillus subtilis yckG and yckF encode two key enzymes of
the ribulose monophosphate pathway used by methylotrophs, and yckH is required for their expression.
Journal of bacteriology 181, 7154-7160 (1999).
169. Nguyen, T.T.H. et al. Genome-wide responses to carbonyl electrophiles in Bacillus subtilis: control of
the thiol-dependent formaldehyde dehydrogenase AdhA and cysteine proteinase YraA by the MerR-
family regulator YraB (AdhR). Molecular microbiology 71, 876-94 (2009).
170. Hoskisson, P.A. & Rigali, S. Advances in Applied Microbiology Volume 69. Advances in applied
microbiology 69, 1-22 (Elsevier: 2009).
171. Yurimoto, H. et al. HxlR, a member of the DUF24 protein family, is a DNA-binding protein that acts as
a positive regulator of the formaldehyde-inducible hxlAB operon in Bacillus subtilis. Molecular
microbiology 57, 511-519 (2005).
172. Potter, A.J., Kidd, S.P., McEwan, A.G. & Paton, J.C. The MerR/NmlR family transcription factor of
Streptococcus pneumoniae responds to carbonyl stress and modulates hydrogen peroxide production.
Journal of bacteriology 192, 4063-6 (2010).
173. Barford, D. The role of cysteine residues as redox-sensitive regulatory switches. Current opinion in
structural biology 14, 679-86 (2004).
174. Rasko, D.A. et al. Complete sequence analysis of novel plasmids from emetic and periodontal Bacillus
cereus isolates reveals a common evolutionary history among the B. cereus-group plasmids, including
Bacillus anthracis pXO1. Journal of bacteriology 189, 52-64 (2007).
175. Gutheil, W.G., Kasimoglu, E. & Nicholson, P.C. Induction of glutathione-dependent formaldehyde
dehydrogenase activity in Escherichia coli and Hemophilus influenza. Biochemical and biophysical
research communications 238, 693-696 (1997).
176. Herring, C.D. & Blattner, F.R. Global transcriptional effects of a suppressor tRNA and the inactivation
of the regulator frmR. Journal of bacteriology 186, 6714-6720 (2004).
177. Liu, T. et al. CsoR is a novel Mycobacterium tuberculosis copper-sensing transcriptional regulator.
Nature chemical biology 3, 60-68 (2007).
178. Ma, Z., Cowart, D.M., Scott, R.A. & Giedroc, D.P. Molecular Insights into the Metal Selectivity of the
Copper(I)-Sensing Repressor CsoR from Bacillus subtilis. Biochemistry 48, 3325-3334 (2009).
179. Sakamoto, K., Agari, Y., Agari, K., Kuramitsu, S. & Shinkai, A. Structural and functional
characterization of the transcriptional repressor CsoR from Thermus thermophilus HB8. Microbiology
156, 1993-2005 (2010).
180. Iwig, J.S., Rowe, J.L. & Chivers, P.T. Nickel homeostasis in Escherichia coli - the rcnR-rcnA efflux
pathway and its linkage to NikR function. Molecular microbiology 62, 252-262 (2006).
181. Birnboim, H.C. & Doly, J. A rapid alkaline extraction procedure for screening recombinant plasmid
DNA. Nucleic acids research 7, 1513-23 (1979).
182. Bradford, M.M. A rapid and sensitive method for the quantitation of microgram quantities of protein
utilizing the principle of protein-dye binding. Analytical biochemistry 72, 248-54 (1976).
183. Fenn, J., Mann, M., Meng, C., Wong, S. & Whitehouse, C. Electrospray ionization for mass
spectrometry of large biomolecules. Science 246, 64-71 (1989).
184. Nolting, B. Methods in Modern Biophysics. (Springer: 2010).
185. Strupat, K. Molecular weight determination of peptides and proteins by ESI and MALDI. Methods in
enzymology 405, 1-36 (2005).
186. Wiley, W.C. & McLaren, I.H. Time-of-Flight Mass Spectrometer with Improved Resolution. Review of
Scientific Instruments 26, 1150 (1955).
187. Wen, J., Arakawa, T. & Philo, J.S. Size-Exclusion Chromatography with On-Line Light-Scattering,
Absorbance, and Refractive Index Detectors for Studying Proteins and Their Interactions. Analytical
biochemistry 240, 12 (1996).
188. Zimm, B.H. The Scattering of Light and the Radial Distribution Function of High Polymer Solutions.
The journal of chemical physics 16, 1093 (1948).
189. Whitmore, L. & Wallace, B.A. Protein secondary structure analyses from circular dichroism
spectroscopy: methods and reference databases. Biopolymers 89, 392-400 (2008).
190. Perez-Iratxeta, C. & Andrade-Navarro, M.A. K2D2: estimation of protein secondary structure from
circular dichroism spectra. BMC structural biology 8, 25 (2008).
191. Fried, M.G. Measurement of protein-DNA interaction parameters by electrophoresis mobility shift
assay. Electrophoresis 10, 366-76 (1989).
192. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs. Nucleic acids research 25, 3389-3402 (1997).
193. Larkin, M.A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947-8 (2007).
194. Clamp, M., Cuff, J., Searle, S.M. & Barton, G.J. The Jalview Java alignment editor. Bioinformatics 20,
426-7 (2004).
195. Cole, C., Barber, J.D. & Barton, G.J. The Jpred 3 secondary structure prediction server. Nucleic acids
research 36, W197-201 (2008).
196. Simossis, V.A. & Heringa, J. Integrating protein secondary structure prediction and multiple sequence
alignment. Current protein & peptide science 5, 249-66 (2004).
197. Wu, J. et al. Prediction of DNA-binding residues in proteins from amino acid sequences using a random
forest model with a hybrid feature. Bioinformatics 25, 30-5 (2009).
198. Wang, L. & Brown, S.J. BindN: a web-based tool for efficient prediction of DNA and RNA binding
sites in amino acid sequences. Nucleic acids research 34, W243-8 (2006).
199. Yan, C. et al. Predicting DNA-binding sites of proteins from amino acid sequence. BMC bioinformatics
7, 262 (2006).
200. Hwang, S., Gou, Z. & Kuznetsov, I.B. DP-Bind: a web server for sequence-based prediction of DNA-
binding residues in DNA-binding proteins. Bioinformatics 23, 634-6 (2007).
201. Chu, W.-Y. et al. ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in
transcription factors. Nucleic acids research 37, W396-401 (2009).
202. Rhodes, G. Crystallography Made Crystal Clear. (Elsevier: Oxford, 2006).
203. Drenth, J. Principles of Protein X-Ray Crystallography. (Springer: New York, 2007).
204. Woolfson, M. An Introduction to X-Ray Crystallography. (Cambridge University Press: 1997).
205. Durbin, S.D. & Feher, G. Protein crystallization. Annual review of physical chemistry 47, 171-204
(1996).
206. Battye, T.G.G., Kontogiannis, L., Johnson, O., Powell, H.R. & Leslie, A.G.W. iMOSFLM: a new
graphical interface for diffraction-image processing with MOSFLM. Acta crystallographica. Section D,
Biological crystallography 67, 271-81 (2011).
207. Evans, P. Scaling and assessment of data quality. Acta crystallographica. Section D, Biological
crystallography 62, 72-82 (2006).
208. French, S. & Wilson, K. On the treatment of negative intensity observations. Acta crystallographica
Section A 34, 517-525 (1978).
209. McCoy, A.J. et al. Phaser crystallographic software. Journal of applied crystallography 40, 658-674
(2007).
210. Claude, J.-B., Suhre, K., Notredame, C., Claverie, J.-M. & Abergel, C. CaspR: a web server for
automated molecular replacement using homology modelling. Nucleic acids research 32, W606-9
(2004).
211. Terwilliger, T.C. Maximum-likelihood density modification using pattern recognition of structural
motifs. Acta crystallographica Section D Biological crystallography 57, 1755-1762 (2001).
212. Terwilliger, T.C. Automated main-chain model building by template matching and iterative fragment
extension. Acta crystallographica Section D Biological crystallography 59, 38-44 (2002).
213. Terwilliger, T.C. et al. Iterative model building, structure refinement and density modification with the
PHENIX AutoBuild wizard. Acta crystallographica. Section D, Biological crystallography 64, 61-9
(2008).
214. Afonine, P.V., Grosse-Kunstleve, R.W. & Adams, P.D. The Phenix refinement framework. CCP4
newsletter 42, (2005).
215. Chen, V.B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta
crystallographica. Section D, Biological crystallography 66, 12-21 (2010).
216. Reynolds, C., Damerell, D. & Jones, S. ProtorP: a protein-protein interaction analysis server.
Bioinformatics 25, 413-4 (2009).
217. Studier, F.W. & Moffatt, B.A. Use of bacteriophage T7 RNA polymerase to direct selective high-level
expression of cloned genes. Journal of molecular biology 189, 113-30 (1986).
218. Rosenberg, A. et al. Vectors for selective expression of cloned DNAs by T7 RNA polymerase. Gene 56,
125-135 (1987).
219. Zhu, B., Cai, G., Hall, E.O. & Freeman, G.J. In-fusion assembly: seamless engineering of multidomain
fusion proteins, modular vectors, and mutations. BioTechniques 43, 354-9 (2007).
220. Studier, F.W., Rosenberg, A.H., Dunn, J.J. & Dubendorff, J.W. Use of T7 RNA polymerase to direct
expression of cloned genes. Methods in enzymology 185, 60-89 (1990).
221. Hochuli, E., Döbeli, H. & Schacher, A. New metal chelate adsorbent selective for proteins and peptides
containing neighbouring histidine residues. Journal of chromatography 411, 177-84 (1987).
222. Sternbach, H., Englehardt, R. & Lezius, A.G. Rapid Isolation of Highly Active RNA Polymerase from
Escherichia coli and Its Subunits by Matrix-Bound Heparin. European journal of biochemistry 60, 51-
55 (1975).
223. Atkins, P. & De Paula, J. Physical Chemistry. 162-163 (Oxford University Press: New York, 2006).
224. Winzor, D.J. Analytical exclusion chromatography. Journal of biochemical and biophysical methods 56,
15-52 (2003).
225. Ben-Bassat, A. et al. Processing of the initiation methionine from proteins: properties of the Escherichia
coli methionine aminopeptidase and its gene structure. Journal of bacteriology 169, 751-757 (1987).
226. Oliva, A., Llabrés, M. & Fariña, J.B. Applications of multi-angle laser light-scattering detection in the
analysis of peptides and proteins. Current drug discovery technologies 1, 229-42 (2004).
227. Whitaker, J.R. Determination of Molecular Weights of Proteins by Gel Filtration of Sephadex.
Analytical chemistry 35, 1950-1953 (1963).
228. Andrews, P. Estimation of the molecular weights of proteins by Sephadex gel-filtration. The
biochemical journal 91, 222-33 (1964).
229. Perez, J.C. & Groisman, E.A. Evolution of transcriptional regulatory circuits in bacteria. Cell 138, 233-
44 (2009).
230. Petsko, G. Protein crystallography at sub-zero temperatures: Cryo-protective mother liquors for protein
crystals. Journal of molecular biology 96, 381-388 (1975).
231. Leslie, A.G.W. The integration of macromolecular diffraction data. Acta crystallographica. Section D,
Biological crystallography 62, 48-57 (2006).
232. Ma, Z., Cowart, D.M., Scott, R.A. & Giedroc, D.P. Molecular insights into the metal selectivity of the
copper(I)-sensing repressor CsoR from Bacillus subtilis. Biochemistry 48, 3325-34 (2009).
233. Long, F., Vagin, A.A., Young, P. & Murshudov, G.N. BALBES: a molecular-replacement pipeline.
Acta crystallographica. Section D, Biological crystallography 64, 125-32 (2008).
234. Berman, H.M. The Protein Data Bank. Nucleic acids research 28, 235-242 (2000).
235. Yeates, T.O. Detecting and overcoming crystal twinning. Methods in enzymology 276, 344-58 (1997).
236. Lovell, S.C., Word, J.M., Richardson, J.S. & Richardson, D.C. The penultimate rotamer library.
Proteins 40, 389-408 (2000).
237. Arendall, W.B. et al. A test of enhancing model accuracy in high-throughput crystallography. Journal of
structural and functional genomics 6, 1-11 (2005).
238. Palm, G.J. et al. Structural insights into the redox-switch mechanism of the MarR/DUF24-type regulator
HypR. Nucleic acids research (2012). doi:10.1093/nar/gkr1316
239. Reynolds, C., Damerell, D. & Jones, S. ProtorP: a protein-protein interaction analysis server.
Bioinformatics 25, 413-4 (2009).
240. Bahadur, R.P. & Zacharias, M. The interface of protein-protein complexes: analysis of contacts and
prediction of interactions. Cellular and molecular life sciences 65, 1059-72 (2008).
241. Voet, D. & Voet, J. Biochemistry. (J. Wiley & Sons: 2004).
242. Aravind, L., Anantharaman, V., Balaji, S., Babu, M.M. & Iyer, L.M. The many faces of the helix-turn-
helix domain: Transcription regulation and beyond. FEMS Microbiology Reviews 29, 231-262 (2005).
243. Thompson, J.D., Gibson, T.J. & Higgins, D.G. Multiple sequence alignment using ClustalW and
ClustalX. Current protocols in bioinformatics Chapter 2, Unit 2.3 (2002).
244. Smaldone, G.T. & Helmann, J.D. CsoR regulates the copper efflux operon copZA in Bacillus subtilis.
Microbiology 153, 4123-8 (2007).
245. Iwig, J.S., Leitch, S., Herbst, R.W., Maroney, M.J. & Chivers, P.T. Ni(II) and Co(II) sensing by
Escherichia coli RcnR. Journal of the American Chemical Society 130, 7592-7606 (2008).
246. Changela, A. et al. Molecular basis of metal-ion selectivity and zeptomolar sensitivity by CueR. Science
301, 1383-7 (2003).
247. Iwig, J.S. & Chivers, P.T. DNA Recognition and Wrapping by Escherichia coli RcnR. Journal of
molecular biology 393, 514-526 (2009).
248. Chi, B.K. et al. The redox-sensing regulator YodB senses quinones and diamide via a thiol-disulfide
switch in Bacillus subtilis. Proteomics 10, 3155-64 (2010).
249. Antelmann, H. & Helmann, J.D. Thiol-based redox switches and gene regulation. Antioxidants & redox
signaling 14, 1049-63 (2011).
250. Barford, D. The role of cysteine residues as redox-sensitive regulatory switches. Current opinion in
structural biology 14, 679-86 (2004).
251. Matthews, J., Batki, A., Hynds, C. & Kricka, L. Enhanced chemiluminescent method for the detection
of DNA dot-hybridization assays. Analytical biochemistry 151, 205-209 (1985).
252. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio
collection. Molecular systems biology 2, 2006.0008 (2006).
253. Datsenko, K.A. & Wanner, B.L. One-step inactivation of chromosomal genes in Escherichia coli K-12
using PCR products. Proceedings of the National Academy of Sciences 97, 6640-6645 (2000).
254. Martinez-Morales, F., Borges, A.C., Martinez, A., Shanmugam, K.T. & Ingram, L.O. Chromosomal
integration of heterologous DNA in Escherichia coli with precise removal of markers and replicons used
during construction. Journal of bacteriology 181, 7143-7148 (1999).
255. Anderson, B.J. et al. Using Fluorophore-Labeled Oligonucleotides to Measure Affinities of Protein-
DNA Interactions. Methods in enzymology 450, 253-272 (2008).