View
219
Download
1
Category
Preview:
Citation preview
on January 31, 2018http://rstb.royalsocietypublishing.org/Downloaded from
rstb.royalsocietypublishing.org
ResearchCite this article: Kim JD, Senn S, Harel A,
Jelen BI, Falkowski PG. 2013 Discovering the
electronic circuit diagram of life: structural
relationships among transition metal binding
sites in oxidoreductases. Phil Trans R Soc B
368: 20120257.
http://dx.doi.org/10.1098/rstb.2012.0257
One contribution of 14 to a Discussion Meeting
Issue ‘Energy transduction and genome
function: an evolutionary synthesis’.
Subject Areas:biochemistry, evolution
Keywords:evolution, oxidoreductase, bioenergetics,
metabolic pathways, biogeochemical cycles,
protein structure
Author for correspondence:Paul G. Falkowski
e-mail: falko@marine.rutgers.edu
& 2013 The Author(s) Published by the Royal Society. All rights reserved.
Electronic supplementary material is available
at http://dx.doi.org/10.1098/rstb.2012.0257 or
via http://rstb.royalsocietypublishing.org.
Discovering the electronic circuit diagramof life: structural relationships amongtransition metal binding sites inoxidoreductases
J. Dongun Kim1,3, Stefan Senn3, Arye Harel3, Benjamin I. Jelen3,4
and Paul G. Falkowski1,2,3,4
1Department of Chemistry and Chemical Biology, and 2Department of Earth and Planetary Sciences,Rutgers University, Piscataway, NJ 08854, USA3Environmental Biophysics and Molecular Ecology Program, Institute of Marine and Coastal Sciences,Rutgers University, New Brunswick, NJ 08901, USA4Department of Environmental Sciences, Rutgers University, New Brunswick, NJ 08901, USA
Oxidoreductases play a central role in catalysing enzymatic electron-transfer
reactions across the tree of life. To first order, the equilibrium thermodyn-
amic properties of these proteins are governed by protein folds associated
with specific transition metals and ligands at the active site. A global analy-
sis of holoenzyme structures and functions suggests that there are fewer than
approximately 500 fundamental oxidoreductases, which can be further clus-
tered into 35 unique groups. These catalysts evolved in prokaryotes early in
the Earth’s history and are largely responsible for the emergence of non-
equilibrium biogeochemical cycles on the planet’s surface. Although the
evolutionary history of the amino acid sequences in the oxidoreductases is
very difficult to reconstruct due to gene duplication and horizontal gene
transfer, the evolution of the folds in the catalytic sites can potentially be
used to infer the history of these enzymes. Using a novel, yet simple analysis
of the secondary structures associated with the ligands in oxidoreductases,
we developed a structural phylogeny of these enzymes. The results of this
‘composome’ analysis suggest an early split from a basal set of a small
group of proteins dominated by loop structures into two families of oxido-
reductases, one dominated by a-helices and the second by b-sheets. The
structural evolutionary patterns in both clades trace redox gradients and
increased hydrogen bond energy in the active sites. The overall pattern
suggests that the evolution of the oxidoreductases led to decreased entropy
in the transition metal folds over approximately 2.5 billion years, allowing
the enzymes to use increasingly oxidized substrates with high specificity.
1. IntroductionBiologically driven electron-transfer reactions are the primary energy-transduction
processes across the tree of life. These reactions depend upon energy sources exter-
nal to the system. The two external energy sources on the Earth are solar radiation
and geothermally derived heat. These create chemical redox gradients, which are
coupled to biological redox reactions and ultimately drive the non-equilibrium ther-
modynamic reactions that make life possible. Indeed, the origin of life almost
certainly began with the evolution of a small set of metabolic processes coupled
to redox chemistry.
Oxidoreductases (enzyme commission 1, EC1) are a class of enzymes that
facilitate these proton-coupled electron-transfer reactions. All of the core metabolic
processes mediated by EC1 proteins evolved in prokaryotes and ultimately
became coupled on local and planetary scales to facilitate an electron ‘market’
between the major light elements. This electron market ultimately led to a
closed cycle between respiratory reactions and their biological oxidative analogues,
and photosynthesis and their biological reductive analogues. These reactions, which
rstb.royalsocietypublishing.orgPhilTransR
SocB368:20120257
2
on January 31, 2018http://rstb.royalsocietypublishing.org/Downloaded from
are far from thermodynamic equilibrium, allow gas exchanges
across the tree of life and transformed the Earth’s biogeochemical
cycles (figure 1a) [1].
Although the exact number of core EC1 proteins, including
orthologues, paralogues and analogues is unknown, their
functions appear to be encoded by fewer than approximately
500 unique genes (i.e. genes that encode for unique functions
including paralogues and analogues; figure 1a). One reason
that such a small set of proteins plays such a large role in the
Earth’s elemental cycles is that the different electron-transport
chains found ubiquitously throughout life share common
components. For example, chemoautotrophic, anaerobic and
aerobic respiration, as well as anoxygenic and oxygenic photo-
synthesis all use similar proton-coupled electron-transport
schemes. The basis of the scheme is separation of protons
from electrons across a membrane. The electrons are inevita-
bly ferried via carriers, across a set of membrane-bound
proteins, ultimately arriving at a transient sink that allows a
negatively charged carrier to be neutralized by a proton. The
protons are initially segregated from the electrons by the mem-
brane, thereby forming an asymmetric distribution of charge.
The return flow of the protons (i.e. the proton motive force)
is coupled to nanomachines, especially the ‘coupling factor,’
ATP synthase, which conserves the electrochemical energy as
chemical bond energy.
Although peripheral components of these proton-coupled
electron-transport reactions have been selected for specific
reaction substrates and products, the basic architecture of all
the core pathways shares similar protein structures and
ligands, including iron–sulfur clusters, pterins, haems and
quinones. These interchangeable structures and ligands have
evolved into a metabolic network with overlapping functions
across the tree of life (figure 1b). Additionally, many biological
energy-transduction systems share a small subset of metabo-
lic pathways such as glycolysis (the Embden–Meyerhof, the
Entner–Doudoroff, including Archaean modifications of
these pathways) or the reverse TCA cycle [2]. Thus, metab-
olism, using a variety of electron donors and acceptors,
draws catalysts from a core set of similar components and path-
ways to enable a flow of electrons and protons. This modular
approach to metabolism has provided great flexibility on a
relatively small number of EC1 genes. Indeed, prokaryotes
are often able to regulate major components of metabolism
in accordance with environmental conditions [3], often at
suboptimal efficiency.
Regardless of efficiency, the flux of electrons through the
metabolic network is particularly dependent on, and sensi-
tive to the availability of specific transition metals, especially
iron (figure 2). The bioavailability of transition metals is, in
turn, highly dependent on the redox state of the environment.
A recent whole-genome analysis of phylogenetically diverse
micro-organisms suggests that the earliest proteins incorporated
metals, and that metal usage over time evolved in accordance
with environmental availability [4]. The metals are invariably
coordinated to the protein scaffolds via a small set of specific
protein folds [1]. Identifying members and the evolutionary pat-
tern of this set of folds is critical to understanding the evolution
of metabolism across the tree of life, as well as the emergence of
biogeochemical cycles, far from equilibrium.
In this paper, we present an analysis of the evolutionary
history of metal usage, the structures of the protein folds
and redox state of the oxidoreductases across the tree of
life, which ultimately formed an electronic circuit on a
planetary scale. Our results suggest that the redox processes
connecting metabolism across the Earth’s surface underwent
a secular trend in evolutionary transitions that led to succes-
sively greater complexity and thermodynamic efficiency in
these critical enzymes over the first approximately 2.5 billion
years of the Earth’s history.
2. Transition metals in oxidoreductasesOxidoreductases catalyse electron-transfer reactions via pros-
thetic groups that usually contain transition metals whose
ions have incompletely filled d or f orbitals and which can
accept a electrons from protein side-chains [5,6]. Multiple oxi-
dation states, especially of molybdenum and manganese,
allow transition metals to be coupled with a wide range of
the reduction–oxidation reactions. Further, a specific metal–
ligand structure is ‘tuned’, or poised for specific redox reactions
by the protein–metal microenvironment [7].
Among transition metals, specific elements were naturally
selected for their physical–chemical properties, abundance,
coordination bond strength, atomic radii, solubility and polariz-
ability [8]. Iron is, by far, the most abundant transition metal on
the Earth [9]. The abundance of this element, especially as a
ferrous ion in the Archaean and early Proterozoic oceans, is
reflected in its wide use as a catalysing cofactor for oxidoreduc-
tases (figure 2). In Swiss–Prot [10] and the protein data bank
(PDB) [11], iron is identified as a catalytic component in more
that 67 per cent of the EC1 proteins. Iron-containing oxidoreduc-
tases can use the metal in a mineral form of a sulfide, or
coordinated to imidazole nitrogens in porphyrins, forming
haems. Following iron, oxidoreductases contain metals in
the following order of relative abundance: Cu . Mn . Ni .
Mo . Co . V . W (figure 2). It should be noted that under
anoxic and/or euxenic conditions, Cu and Mo are highly
insoluble unless the ions are oxidized.
3. Biopolymer – metal interactions in the‘ancient ocean’
Assuming that the conversion of energy via dissipation of redox
gradients was an early bioinorganic reaction essential for the
origin of life, it logically follows that transition metals played
a key role. Transition metals can undergo stoichiometric reac-
tions via photochemical processes or in solution phase with
other redox couples [12–15]. Transition metals are also capable
of carrying out catalytic reactions when surrounded by a
biopolymer that provides a specific structural framework
that allows reversible population of the metal–ligands with
electrons [16].
Lewis basic peptide side-chains, such as thiolates (cysteine),
imidazole nitrogens (e.g. histidines) and carboxylates (aspartic
and glutamic acids), can form coordination bonds via d-orbitals
in the transition metals. Metals bound by multiple side-chains
are locked within the peptide/protein matrix. These interac-
tions influence multiple physical properties of the holoprotein,
including solvent accessibility, tuned redox potential, optimiz-
ation of Gibbs free energy and enhanced substrate specificity.
Understanding how the earliest biopolymer–metal interactions
evolved is critical to understanding the origins of non-equilibrium
bioenergetic reactions, and hence the origins of life.
VI:oxygenic
photosynthesis
O2
Mn4+
Mn2+
(a)
(b)
NH4+ HS– CH4
O2-mediated
Mn-mediatedFe-mediatedS-mediated
N-mediatedH2- or C-mediated
respiration
photosynthesis
+CO2
H2Fe2+
Fe3+
NO–3
SO42–
CO2
H2O
N2
N2 fixation
VI
VI
IVI
III
IIV
photosystem I
and II,
cytochrome
b6f complex
cytochrome c
reductive Acetyl C
oA
pathway
I: aerobic sugarmetabolism
IV:hydrogenoxidation III:
denitrification
II: sulfatereduction
V: methanogenesis
NAD(P)H quinoneoxidoreductase
intermediate cytochromes,
iron sulfur proteins,
quinones
TCA
cycle
glycolysis, pyruvate
oxidation
APS reductasesulfite reductasesulfate reductase
nitrate reductasenitrite reductase
nitric oxide reductasenitrous oxide reductase
hydrogenase
coenzyme F420
[CH2O]
Figure 1. Network of life’s biologically mediated cycles modified from [1], and a Venn diagram of the basic metabolic pathways. (a) The interconnected ‘circuit board’ oflife’s electron-transfer reactions (for C, H, O, N, S and Fe) is labelled with pathway groups I to VI, corresponding to metabolic networks. (b) The metabolic pathway groups Ito VI (large font) are shown to have redox components of electron-transport chains as well as auxiliary pathways in common (small font). Pathway group I: ‘aerobic sugarmetabolism’ has components such as NAD(P)H reductases, as well as membrane-localized cytochromes and quinones, in common with pathway group VI: ‘oxygenic photo-synthesis’. Pathway group II: ‘sulfate reduction’ utilizes oxidized sulfur compounds as terminal electron acceptors and carry out carbon metabolism via reductive acetyl CoApathway or the TCA cycle. Pathway group III: ‘denitrification’ uses unique oxidoreductases to reduce nitrates and other oxidized nitrogen compounds but still requires thecommon components of the electron-transport chain as well as glycolysis and the tricarboxylic acid (TCA) cycle as a source of reductant. Pathway group IV: ‘hydrogenoxidation’ uses an electron-transport chain but is not dependent on glycolysis or the TCA cycle for producing reductant. Pathway group V: ‘methanogenesis’ does notuse an electron-transport chain and is the only pathway using coenzyme F420. Size of circles is irrelevant.
rstb.royalsocietypublishing.orgPhilTransR
SocB368:20120257
3
on January 31, 2018http://rstb.royalsocietypublishing.org/Downloaded from
Co2%
Ni4%
Mo4%
Mn6%
Fe26%
Cu14%
Fe (haem)43%
V1%
W0%
Figure 2. Transition metals in EC1 proteins. The relative contribution of themajor transition metals found in protein data bank (PDB) structures anno-tated as oxidoreductases.
rstb.royalsocietypublishing.orgPhilTransR
SocB368:20120257
4
on January 31, 2018http://rstb.royalsocietypublishing.org/Downloaded from
In the early Archaean [17], metals in minerals may have
played a significant role in adsorbing and concentrating
organic molecules and catalysing various chemical reactions
implicated in the origin of non-equilibrium redox reactions.
Provided with the building blocks of life, metals bound to
short peptides could have functioned as protoenzymes, as
is proposed by models of early protein evolution [18,19].
An early protoenzyme would have had to originate and
evolve under a strict set of rules:
(1) The electronic structure of transition metals must match
geometrical requirements for metal–ligand coordination
number and geometry. As a result, the emergent entactic
states constrain the subsequent evolution of the structures,
and topologies of the coordinately bound polypeptides or
other molecules required for catalysis.
(2) The mildly reducing environment of the ancient oceans
[20] required a relatively low midpoint potential of the
early redox organocatalysts, especially compared with
more recently evolved oxidoreductases. The ligand
environment was modulated to tune the midpoint poten-
tial to meet the functional requirements.
To meet these constraints, random polypeptides almost cer-
tainly evolved in association with the available metals and
continuously selected for sequences and folds that satisfied
both structural and functional constraints. Such a combinator-
ial search to local minima would have persisted until the
polypeptide–metal complex acquired locally optimized cataly-
tic functions. The specific folds almost certainly coevolved over
geological time with an increasing larger set of coupled biogeo-
chemical cycles. The ancient folds were spread genetically
across the nascent tree of life primarily via horizontal gene
transfer, but ultimately diverged into several motifs. The
subsequent structural innovations were accelerated by various
modes of evolution such as gene insertion, duplication
and partial loss. Evolved core protein folds became
molecular ‘modules’ from which a variety of biomachines
could ultimately be built via a ‘mix and match’ set of motifs.
4. Evolution of sequences and foldsBecause of both their modularity and early spread across the tree
of life, it is extremely difficult to determine the evolutionary heri-
tage of folds in the oxidoreductases solely based on analysis of
organisms, sequences or synteny within highly conserved oper-
ons. While the core oxidoreductases in oxygenic photosynthesis
are extremely highly conserved, inspection reveals major
sequence degeneracy in closely related structures. For example,
transmembrane-spanning helices of photosystem I and II have
highly divergent sequences, yet their structures are almost
identical [21]. This basic phenomenon was noted early on by pio-
neers in the field of bioinformatics. Indeed, Eck & Dayhoff [22, p.
363], noted ‘the processes of natural selection severely inhibit
change to a well-adapted system on which several other essential
components depend’. While their comments were based on the
highly conserved structure of ferredoxin, they apply to many
ancient proteins, including enzymes that do not catalyse redox
reactions. For example, ribulose-1,5-bisphosphate carboxylase
oxygenase (EC 4.1.1.39) is a carboxylyase. The enzyme is respon-
sible for the fixation of CO2 in many photosynthetic and
chemoautotrophic organisms. This crucial enzyme cannot
easily distinguish between its ‘true’ substrate, CO2 and O2. The
result is that at present atmospheric levels of O2, the enzyme is
often remarkably inefficient. Moreover, the catalytic turnover
of the reaction, even under optimal conditions, is much slower
than reactions feeding the substrate (CO2) or removing the pro-
duct (3-phosphoglycerate). Regardless, the catalytic site of the
enzyme is highly conserved and the biological result of this con-
servation is that organisms often synthesize the enzyme in
excess to achieve maximum overall growth efficiency [23].
There are many other, similar examples.
The fundamental physico-chemical properties that govern
the major protein fold conformations have remained unchan-
ged. Reinvention of metalloenzyme folds is highly restricted,
given that geometrical and energetic selection processes
limit structural solutions. Let us examine a novel approach to
identifying and ordering the structural solutions found in
extant oxidoreductases.
5. The ‘composome’ approachWe hypothesize that the ensemble of secondary structures in
the region surrounding the catalytically active metals has
been selected to facilitate catalysis of the holoenzyme. We
further hypothesize that these secondary structures are the out-
come of selection and provide a window into the processes in
which protein folds evolved. We assume that the composition
of the secondary structural motifs reflects the evolutionary his-
tory of the protoenzyme from which the extant motif is
descended, and was inherited with modifications through a
myriad of organisms to form the observed protein fold. We
further assume that the folds must obey the rules set by the
d-block metal coordination chemistry [5]. These underlying
hypotheses are extensions from our previous work where we
proposed that secondary structures around the metal or
metal–ligand in the active site would be more conserved
than elsewhere in the protein [24]. We call this quantitative
analysis of the secondary structure of the folds in active sites
formate dehydrogenase (PDB: 2iv2)
cytochrome c oxidase (PDB: 2gsm) oxalate oxidase (PDB: 2et1)
toluene 2,3-dioxygenase (PDB: 3en1)
sheet (%)
helix
(%) loop (%
)
0 10 20 30 40 50 60 70 80 9010
0%
0
10
100%
90
80
70
60
50
40
20
30
10
0
20
30
40
50
60
70
80
90
100%
MnFe4S4Mo–Fe–SdiCuMO-PterinCuFe2S2FeFe-haem
Figure 3. Ternary vector space used for the secondary structure composition (i.e. the ‘composome’) and representative structures. Representative structures areshown for each of the major folds in the ternary space.
structural similarity (%)
com
poso
me
dist
ance
100 80 60 40 20 0
1.2
1.0
0.8
0.6
0.4
0.2
0
Figure 4. Comparison of the composome approach with a tertiary structuralanalysis. The structural similarity and composome distance (described in text)are compared in a dot plot of 3240 alignments of 82 metal catalytic sites. The447 alignments between the same metal cofactors are coloured red.Alignments that are less than 40 per cent structurally similar (grey-shadedarea) can be considered inconclusive in terms of structural similarity. Theinset shows an alignment between the manganese-binding motif in oxalateoxidase (2et1, EC 1.2.3.4; orange for matched, blue for unmatched) and theiron-binding motif of clavaminate synthase (1ds1, EC 1.14.11.21; red formatched, green for unmatched). The folds are very similar in secondary struc-ture content and overall fold (52.64% structurally similar, r.m.s.d. 1.86 A).The sequence identity derived from the 50 residues’ alignment is 6%, orthree residues, two of which are the metal-coordinating histidines (illustratedin stick representation). This analysis strongly suggests that the ‘composome’approach based solely on secondary structures can differentiate betweenclosely related folds with a high degree of accuracy.
rstb.royalsocietypublishing.orgPhilTransR
SocB368:20120257
5
on January 31, 2018http://rstb.royalsocietypublishing.org/Downloaded from
the elucidation of a ‘composome’. To the best of our knowl-
edge, this is the first attempt to infer quantitative distances
between distinctly different protein folds based solely on
secondary structural composition. The resulting ‘phylogeny’
represents a linkage of fold relationships in structural space,
and obviously is not intended to imply linear, monophyletic
history of the evolution of EC1 proteins.
The approach uses PDB files that possess previously deter-
mined ‘gold standard’ domains [25]. From this dataset, we
extracted a subset of representatives using the best resolu-
tion structure for each organism per ‘gold standard’ domain
(see the electronic supplementary material). We included
all structures of orthologous proteins with high resolution
for every gold standard domain. For every domain, the
corresponding metal–ligand was treated as the catalytic site.
For each catalytic site, we first collected a list of amino acid
residues that are within 15 A from a catalytic metal, based on
carbon alpha (Ca) coordinates. Secondary structures of the
residues were assigned using the Define Secondary Structure
of Proteins (DSSP) database [26]. The overall secondary structure
composition (i.e. the ‘composome’) around each metal-bearing
catalytic site was determined and further adjusted based on a
residue-metal distance by a factor of 1/r2. These secondary struc-
tural compositions, from each catalytic site, were plotted in a
ternary vector space (figure 3). Structures that contained both
identical metals and a Euclidean distance in composome space
less than 0.02 A were collapsed into a single representative struc-
ture, resulting in 82 final representative structures. Each data
point in this ternary space diagram represents an individual
oxidoreductase. The data points largely cluster into two
groups with helix-rich and sheet-rich clades. For each metal or
metal–ligand, data points tend to aggregate, suggesting that
the physico-chemical properties of metals constrain the entactic
evolution towards specific compositions of secondary structure
around the catalytic site, regardless of catalytic function.
The compositional finger printing method described
for calculating relations in folds across all known EC1
structures collapses three-dimensional information into a one-
dimensional matrix. To assess the effects of this mathematical
simplification and compression, the Ca backbone environments
for the 82 structurally different catalytic sites were superimposed
in a pairwise, all versus all fashion. We used a structural align-
ment protocol that calculates a novel similarity score, which
combines alignment length and spatial deviation into one
measurement [27]. The similarity values of highly similar struc-
ture pairs (more than 40% structurally similar) correlate with low
compositional profile distances (figure 4). This result strongly
not found in extant folds
–2.0
–3.0
–4.0
primordial polypeptide simple foldlow H-bond energy
complex foldhigh H-bond energy
protein evolvability(kcal m
ol –1 per residue)
beta barrel
100% sheet
80
60
40
20
0100% helix
100% loop
mixed helix and sheet
iron(haems)
Figure 5. Average hydrogen bond energy per amino acid residue associated with the metal/metal – ligand centre across the composome surface. Hydrogen bondenergy is a proxy for fold ‘evolvability’. The evolvability contour plot is generated based on average hydrogen bond energy per residue from 82 data points andoverlayed on the ternary plot ( figure 3). As protein folds acquire more helix or sheet secondary structures, they become less flexible and their folds are increasinglyfixed. The evolution of fold occurred starting from simple disordered state (red) to complex organized state (blue).
rstb.royalsocietypublishing.orgPhilTransR
SocB368:20120257
6
on January 31, 2018http://rstb.royalsocietypublishing.org/Downloaded from
suggests that, in principle, secondary structures retain sufficient
topological information such that a quantitative analysis of the
fold can retrieve conformational relationships with sufficient res-
olution and confidence to derive the structural history of the
folds in catalytic sites.
Indeed, for dissimilar structural pairs, secondary structure
composition maintains a high degree of predictive performance.
We find significantly lower secondary structure composition
distances for environments of same ligands compared with
different ligands (figure 4), especially for pairs where the calcu-
lated Ca backbone structure similarity is less than 40 per cent
and therefore inconclusive. Very high composome distances
are generally not observed for same ligand pairs, suggesting
a limit of the differences between same ligand environment sec-
ondary structures. This phenomenon appears to be related to
hydrogen bonding patterns that are required by certain elec-
tronic structures around metal or metal–ligand catalytic sites,
leading to similar secondary structure composition signatures.
We kept orthologous structures with detectable sequence
similarity in our dataset for two reasons: (i) to show the
structural variability of a motif and get the full spectrum of sec-
ondary structure composition and (ii) to highlight the sensitivity
of secondary structure composition analysis. This redundancy
also includes phylogenetic information and represents a positive
control for our analysis. One example is the molybdenum cluster
(figure 3) where related structure domains (SCOP family:
molybdenum cofactor-binding domain) are sampled over a
wide phylogenetic space as well as different functions according
to EC classifications.
Let us now examine the fold ‘phylogeny’ derived from
the composome analysis.
6. Fold phylogeny based on composomeA polypeptide chain has an intrinsic property of having
dynamic conformational variability, but specific structures
with relatively rigid folds are often conferred for specific bio-
logical functions. Fold evolution is achieved when a protein
structure with a functional promiscuity has a flexible chain,
which is ‘evolvable’ [28].
To check the evolvability of each protein fold, we calculated
the average hydrogen bond energy per residue around each
metal site. We postulated that hydrogen bond energy is inver-
sely proportional to the evolvability of the protein fold. The
analysis (figure 5) suggests that ‘loop-rich’ folds have low
hydrogen bond energy per residue, making the fold more ‘evol-
vable’; loops and coils tend to be more flexible. By contrast,
a-helices or b-sheets form more extended hydrogen-bonding
networks, making the fold far more rigid. Also the extensive
hydrogen-bonding network confers a high contact order,
which corresponds to a complex and slow folding kinetics
HEM
LA
NO
STER
OL1
4-A
LPH
A-D
EMET
HY
LASE
HE
M C
YTO
CH
RO
ME
P450
2C8
HE
M C
YT
OC
HR
OM
EP4
502C
9H
EM
CY
TO
CH
RO
ME
P450
2C5
HE
M P
450M
ON
OO
XY
GE
NA
SE
HE
M C
YT
OC
HR
OM
EP450130
HE
M PE
RO
XID
ASE
N
HE
M C
ATA
LA
SE-PE
RO
XID
ASE
CFN
NIT
RO
GE
NA
SEM
OLY
BD
EN
UM
-IRO
NPR
OT
EIN
AL
PHA
CH
AIN
HE
M C
ATA
LA
SE-PE
RO
XID
ASE
CFM
NITR
OG
ENA
DEM
OLY
BD
ENU
M-IR
ON
PRO
TEINA
LPHA
CH
AIN
HEM
CY
TOC
HRO
MEP4507A
1
HEM
CYTO
CHRO
MEP450CY
P119
FE 4 5-DIOXYGENASEALPHACHAIN
SF4 IRONHYDROGENASE1
HEM ENDOTHELIALNITRIC-OXIDESYNTHASE
HEM BOVINEENDOTHELIALNITRICOXIDESYNTHASEHEMEDOMAIN
HEM NITRIC-OXIDESYNTHASEHOMOLOG
HEM NITRICOOXIDESYNTHASE
HEM NITRICOXIDESYNTHASE INDUCIBLE
CFM NITROGENASEMOLYBDENUMIRONPROTEINALPHACHAIN
CFM NITROGENASEMOLYBDENUM-IRONPROTEINALPHACHAIN
CFM NITROGENASEMOLYBDENUMIRONPROTEINALPHACHAIN
SF4 PERIPLASMIC[NIFE] HYDROGENASESMALLSUBUNIT
SF4 RESPIRATORYNITRATEREDUCTASE1ALPAHACHAIN
SF4 FORMATEDEHYDROGENASEH
SF4 FORMATEDEHYDROGENASE[LARGESUBUNIT]SF4 FORMATEDEHYDROGENASE NITRATE-INDUCIBLE MAJORSUBUNIT
FES APOCYTOCHROMEFMN OXALATEOXIDASE1CU PHENYLETHYLAMINEOXIDASE
CU COPPERAMINEOXIDASE
CU AMILORIDE-SENSITIVEAMINEOXIDASE
CU COPPERAMINEOXID
ASE LIVERISOZYME
FE2 ISOPENIC
ILLIN
NSYNTHASE
CU PEPTIDYL-G
LYCIN
EALPHA-AM
IDATIN
GMONOOXYGENASE
CU SUPEROXID
EDISM
UTASE[C
U-ZN]
CU C
OPP
ER-ZIN
CSUPE
ROXID
EDIS
MU
TASE
FE H
YD
ROX
YQ
UIN
OL1
2-D
IOX
YG
ENA
SE
CU
SU
PERO
XID
EDIS
MU
TASE
FE C
ATEC
HO
L1 2
-DIO
XY
GEN
ASE
CU
EX
TR
AC
EL
LU
LA
RSU
PER
OX
IDE
DIS
MU
TASE
[CU
-ZN
]
FE2
CL
AV
AM
INA
TE
SYN
TH
ASE
1
FE2
AL
PHA
-KE
TO
GL
UTA
RA
TE
-DE
PEN
DE
NT
TAU
RIN
ED
IOX
YG
EN
ASE
CU
CO
PPE
R-Z
INC
SUPE
RO
XID
ED
ISM
UTA
SEC
U C
OPP
ER
-ZIN
CSU
PER
OX
IDE
DIS
MU
TASE
FE C
HL
OR
OC
AT
EC
HO
L1
2-D
IOX
YG
EN
ASE
FE2 A
SPAR
AG
INE
OX
YG
EN
ASE
FE PR
OT
OC
AT
EC
HU
AT
E3 4-D
IOX
YG
EN
ASE
AL
PHA
CH
AIN
FE PR
OT
OC
AT
EC
HU
AT
E3 4-D
IOX
YG
EN
ASE
CFM
NIT
RO
GE
NA
SEM
OLY
BD
NU
MIR
ON
PRO
TE
INA
LPH
AC
HA
IN
HE
M P
UTA
TIV
EC
YT
OC
HR
OM
EP4
5012
4
FE D
ESU
LFO
FER
RO
DO
XIN
FE2 D
ESU
LFO
FER
RO
DO
XIN
FE PUTATIV
ESUPER
OX
IDER
EDU
CTA
SE
CU
A BA
3-TYPEC
YTO
CH
ROM
E-CO
XID
ASE
FE SUPERO
XID
EREDU
CTASE
SF4 NITRO
GENA
SEIRON
PROTEIN
FES CYTOCHROMEB
CUB CARBONMONOXIDEDEHYDROGENASESM
ALLCHAIN
PCD CUTS IRON-SULFURPROTEINOFCARBONMONOXIDE
PCD 4-HYDROXYBENZOYL-COAREDUCTASEALPHASUBUNIT
MOS XANTHINEDEHYDROGENASE
MOM XANTHINEDEHYDROGENASE and OXIDASE
PCD ALDEHYDEOXIDOREDUCTASEFES CYTOCHROMEB6
FES RIESKEIRON-SULFURPROTEINFES RIESKEPROTEINFES NAPHTHALENE1 2-DIOXYGENASEALPHASUBUNIT
FES BENZENE1 2-DIOXYGENASESUBUNITALPHA
CUA BA3-TYPECYTOCHROME-COXIDASE
CU PHENOLOXIDASESUBUNIT2
SF4 DIFYDROPYRIMIDINEDEHYDROGENASE
HEM MYELOPEROXIDASE
HEM LACTOPEROXIDASE
HEM PEROXIDASEMANGANESE-D
EPENDENTI
HEM PUTATIV
ECYTOCHROMEP450120
HEM V
ERSATILEPEROXID
ASEVPL2
HEM C
YTOSOLIC
ASCORBATEPE
ROXIDASE
C20 P
OLY
PHEN
OLO
XID
ASE
CH
LORO
PLA
ST
HEM
CY
TOCH
ROM
EP45
0-SU
1
C20
CAT
ECH
OLO
XID
ASE
(a)
(d )
(b)
(c)
Figure 6. A tree of protein folds based on evolvability of metal binding sites (average hydrogen bond energy per residue, figure 5) and a Euclidean distance matrix based onthe composome analysis (i.e. figure 3). Full names of proteins that include metal binding folds are provided and the metal types are labelled with different colours (light blue:Fe2S2 cluster; purple: molybdopterin; pink: Fe4S4 cluster; red: iron; yellow: haem; cyan: copper; green: molybdenum – iron – sulfur). A complete list of examined proteinstructures is provided in the electronic supplementary data. During early evolution, polypeptide chains gained catalytic ability through random mutations. (a) The earliestprotein folds would have resembled extant Rieske or molybdopterin folds, which have high evolvability. (b) Conformational searches yields folds with both a-helix and b-sheet structures, such as ferredoxins. (c) Further conformational searches result in the divergence of protein structure, giving rise to a-helix-rich and (d) b-sheet folds.
rstb.royalsocietypublishing.orgPhilTransR
SocB368:20120257
7
on January 31, 2018http://rstb.royalsocietypublishing.org/Downloaded from
[29]. While evolution can obviously still occur in these folds, the
resulting structures are highly conserved. Mixed usage of both
helices and sheets requires adjoining loop regions. The result is
a variable degree of hydrogen bonding across the peptide back-
bone. Protein folds with both helices and sheets are expected be
relatively highly evolvable. In our analysis, we observe two
‘evolvability hot spots’ (figure 5). One such area is occupied
by a ferredoxin fold, which might be one of the ancient extant
oxidoreductase structures [22], whereas other obvious hotspots
of origins do not exist among folds in our dataset (see the elec-
tronic supplementary material).
Based on the evolvability, we hypothesize a parsimonious
condition (analogous to a Bayesian prior) for fold evolution:
that is, folds evolved from conditions with fewer to more
hydrogen bond energies. Hence, loop- and coil-rich protein
structures, lacking secondary structure, would be located
close to the root of the phylogenetic tree of electron-transfer
folds. Using the secondary structure compositional vectors
for each protein fold, a matrix of Euclidian distances was gen-
erated, and a phylogenetic tree of protein folds (figure 6) was
calculated using the Fitch–Margoliash algorithm and global
tree optimization, as provided by the PHYLIP package [30].
The Rieske fold was chosen as a root for building a monophy-
letic tree; however, it is impossible to prove that the actual
evolution of protein folds occurred in a monophyletic
fashion, and it most likely did not. For example, the higher
midpoint potential of Rieske proteins suggests that these
structures evolved after ferredoxin. Clearly, simple folds
could have been recruited from other functions to become a
catalytic part of oxidoreductases. However, the basic
r
8
on January 31, 2018http://rstb.royalsocietypublishing.org/Downloaded from
topology of the fold tree is potentially informative about the
evolutionary history of electron-transfer reactions.
stb.royalsocietypublishing.orgPhilTransR
SocB368:20120257
7. Iron – sulfur proteinsIron–sulfur-containing ferredoxin folds are located near the
centre of the ternary plot, indicating relatively equal amounts
of helix, sheet and loop composition. The location of iron–
sulfur proteins on a ternary plot coincides with a high
evolvability hotspot, suggesting ferredoxin folds might be
the common ancestor of many extant protein folds for a
number of reasons.
First, iron–sulfur minerals were thought to be relatively
abundant in the early Archaean ocean and it has been specu-
lated that iron–sulfur clusters played an important role in the
evolution of bioenergetic redox transduction systems
[14]. These hypotheses postulate that the earliest biologically
relevant redox reactions occurred on iron–sulfur mineral depos-
its associated with hydrothermal vents. Second, ferredoxin is
found across the tree of life, and is found alone or as a domain
in larger proteins, many of which are encoded by the core
redox genes of life. Third, sequence and structural symmetry of
ferredoxins suggests that they may have evolved from a gene
duplication event of a 28–30 amino acid sequence, each capable
of binding one iron–sulfur cluster. Sequence analyses by Eck–
Dayhoff revealed even shorter repeats of four amino acids,
suggesting a prebiotic ‘protoferredoxin’ that was potentially
composed of a primaeval subset of the 20 amino acids [22].
Fourth, all ferredoxins have a simple, conserved fold that binds
two Fe4S4 clusters and is composed of fifty to sixty amino acids.
8. Molybdopterin and Rieske proteinsThe lack of a helix/sheet is characteristic of Rieske iron–sulfur-
containing proteins and molybdopterin proteins. Although
pterins may have been formed very early in the Earth’s history
(as they are derived from GTP), it is unclear how these proteins
carry out specific biological functions without a fixed helix or
sheet secondary structure. Regardless, our composome analysis
suggests that these folds have the highest potential to evolve
(figure 5). Unlike nitrogenases, molybdopterin proteins, such as
dimethyl sulfoxide (DMSO) reductase or nitrate reductase,
have a molybdenum atom chelated by pterins, which is
surrounded by a protein environment that lack either a-helix or
b-sheet. In the absence of Mo, tungsten can serve as a replacement
owing to its similar chemical properties, and it is possible
that tungsten-containing proteins gave rise to molybdenum-con-
taining protein folds in a prebiotic world [31].
9. Molybdenum – iron – sulfur proteinsReduction of N2 to NH3 is catalysed by nitrogenases, the
modern forms of which contain molybdenum–iron–sulfur
clusters. Molybdenum-containing nitrogenase folds are closely
located near ferredoxin folds in a ternary composome space,
indicating their similarity in the secondary structure com-
position with mixed a-helix and b-sheet. Our composome
analysis suggests that the Mo–Fe–S-cluster-containing folds
may have evolved from Fe–S folds. Both iron–sulfur folds
and molybdenum–iron–sulfur folds have a-helix and b-
sheet elements with a significant loop content, adding flexi-
bility to the core structure. As a result, these folds may have
diversified by exploring different conformations and
compositions, mainly diverging into two distinct clades: an
a-helix-rich clade and a b-sheet-rich clade.
10. Rubredoxin, Mn and Cu proteinsThe Fe atom is found alone or as a ligand complex with a por-
phyrin ring in most extant folds. Rubredoxin is a single iron-
containing fold with high b-sheet content. Manganese and
copper folds also appear at the far edges of the compo-
some-space in the b-sheet clade. These folds form extensive
hydrogen bond networks across the backbone amide and car-
bonyl, making the fold less evolvable. In return, these folds
gained high catalytic specificity and accuracy with rigid
structure, but sacrificed evolvability.
11. Haem proteinsHaem-containing proteins are a hallmark of many electron-
transfer reactions. Unlike rubredoxins, haem folds have an
iron atom residing in a porphyrin ring surrounded by
a-helices. The iron atom has octahedral coordination geome-
try, where axial positions are ligated to histidine side-chains.
A surrounding porphyrin ring allows iron atoms to be
incorporated into a protein scaffold with just one amino
acid residue binding the iron directly. The usage of porphyrin
rings may have relieved the stringent geometric requirement
of the iron coordination (i.e. rubredoxin), allowing an explo-
sive diversification of haem folds. Haems are among the most
abundant protein cofactors in oxidoreductases (figure 2) and
widely covered in composome ternary space (figure 3).
12. Concluding remarksLife is dependent on the catalytic function of a relatively small
set of core proteins. Within this core set, oxidoreductases play
an outsized role, but their evolutionary history is poorly under-
stood. Our brief analysis conforms to the hypothesis that the
evolution of the oxidoreductases is, like for all proteins, a
trade-off between enzyme specificity and evolvability. Assum-
ing that the evolutionary trajectory followed from a disordered
state to an ordered state, proteins appear to have evolved from
loop-rich into either a-helix-rich or b-sheet-rich structures,
becoming more specific but less evolvable. The early split
between the two major clades of oxidoreductases continued
to follow a decrease in the free energy within the active site—
a long-term displacement of internal entropy that coincides
with an increased contact order of the core structure. The selec-
tion pressure that led to the increased hydrogen bonding
energy concordant with increased redox potential in oxido-
reductases may be an outcome of evolution, or a result of a
feedback between the initial ignition of a non-equilibrium ther-
modynamic system that is self-sustaining with a positive
feedback. Regardless, the redox space explored by biology
over the past two billion years appears to be extremely limited.
We conclude that life has evolved a very small set of key
core catalysts, the genes of which are transmitted across vast
expanses of geological time by microbes, allowing a fundamen-
tal set of electron-transfer reactions to permit energy extraction
from an open system. The microbes themselves are temporary,
disposable carriers of the core genes. They go extinct but
rstb.royalso
9
on January 31, 2018http://rstb.royalsocietypublishing.org/Downloaded from
transfer the functions onwards. How these catalysts came to be
coupled in specific sequences and with other chemical reactions
to obviate non-equilibrium systems remains a major challenge
to understanding the origins and continuity of life on the Earth.
This research is funded by the Gordon and Betty Moore Foundationthrough Grant GBMF2807 to Paul Falkowski. We thank DoronLancet for suggesting the term ‘composome’. We thank Yana Brom-berg, Vikas Nanda, David A. Case and two anonymous reviewersfor constructive comments.
cietypublishin
References
g.orgPhilTransR
SocB368:20120257
1. Falkowski PG, Fenchel T, Delong EF. 2008 Themicrobial engines that drive Earth’s biogeochemicalcycles. Science 320, 1034 – 1039. (doi:10.1126/science.1153213)
2. Braakman R, Smith E. 2012 The emergence andearly evolution of biological carbon-fixation. PLoSComput. Biol. 8. e1002455. (doi:10.1371/journal.pcbi.1002455)
3. Unden G, Bongaerts J. 1997 Alternative respiratorypathways of Escherichia coli: energetics andtranscriptional regulation in response to electronacceptors. Biochim. Biophys. Acta 1320, 217 – 234.(doi:10.1016/S0005-2728(97)00034-0)
4. Dupont CL, Yang S, Palenik B, Bourne PE. 2006Modern proteomes contain putative imprints ofancient shifts in trace metal geochemistry. Proc. NatlAcad. Sci. USA 103, 17 822 – 17 827. (doi:10.1073/pnas.0605798103)
5. Holm RH, Kennepohl P, Solomon EI. 1996 Structuraland functional aspects of metal sites in biology. Chem.Rev. 96, 2239 – 2314. (doi:10.1021/cr9500390)
6. Metz S, Thiel W. 2011 Theoretical studies on thereactivity of molybdenum enzymes. Coord. Chem. Rev.255, 1085 – 1103. (doi:10.1016/j.ccr.2011.01.027)
7. Dey A, Jenney Jr FE, Adams MWW, Babini E, TakahashiY, Fukuyama K, Hodgson KO, Hedman B, Solomon EI.2007 Solvent tuning of electrochemical potentials inthe active sites of HiPIP versus ferredoxin. Science 318,1464 – 1468. (doi:10.1126/science.1147753)
8. Williams RJP. 1990 Overview of biological electron-transfer. Adv. Chem. Ser. 226, 3 – 23. (doi:10.1021/ba-1990-0226.ch001)
9. Schlesinger KJ et al. 2012 The metallicitydistribution functions of segue g and k dwarfs:constraints for disk chemical evolution andformation. Astrophys. J. 761, 160. (doi:10.1088/0004-637X/761/2/160)
10. The UniProt Consortium 1. 2012 Reorganizing theprotein space at the universal protein resource(UniProt). Nucleic Acids Res. 40, D71 – D75. (doi:10.1093/nar/gkr981)
11. Bernstein FC, Koetzle TF, Williams GJ, Meyer Jr EF,Brice MD, Rodgers JR, Kennard O, Shimanouchi T,
Tasumi M. 1997 The protein data bank. Acomputer-based archival file for macromolecularstructures. Eur. J. Biochem. 80, 319 – 324. (doi:10.1111/j.1432-1033.1977.tb11885.x).
12. Mauzerall DC. 1990 The photochemical origins oflife and photoreaction of ferrous ion in the archaeanoceans. Origins Life Evol. Biosph. 20, 293 – 302.(doi:10.1007/BF01808111)
13. Braterman PS, Cairns-Smith AG, Sloper RW. 1983Photo-oxidation of hydrated Fe2þ significance forbanded iron formations. Nature 303, 163 – 164.(doi:10.1038/303163a0)
14. Wachtershauser G. 1998 Pyrite formation, the 1stenergy-source for life: a hypothesis. Syst. Appl.Microbiol. 10, 207 – 210. (doi:10.1016/S0723-2020(88)80001-8)
15. Martin W, Russell MJ. 2003 On the origins of cells: ahypothesis for the evolutionary transitions fromabiotic geochemistry to chemoautotrophicprokaryotes, and from prokaryotes to nucleatedcells. Phil. Trans. R. Soc. Lond. B 358, 59 – 85.(doi:10.1098/rstb.2002.1183)
16. Cody GD. 2004 Transition metal sulfides and theorigins of metabolism. Annu. Rev. Earth Planet. Sci.32, 569 – 599. (doi:10.1146/annurev.earth.32.101802.120225)
17. Oparin AI. 2003 Origin of life. New York, NY: DoverPublications.
18. Hazen RM, Sverjensky DA. 2010 Mineral surfaces,geochemical complexities, and the origins of life.Csh Perspect. Biol. 2, a002162. (doi:10.1101/cshperspect.a002162)
19. Wachtershauser G. 1988 Before enzymes andtemplates: theory of surface metabolism. Microbiol.Rev. 52, 452 – 484.
20. Holland HD. 1984 The chemical evolution of theatmosphere and oceans. Princeton, NJ: PrincetonUniversity Press.
21. Sadekar S, Raymond J, Blankenship RE. 2006Conservation of distantly related membraneproteins: photosynthetic reaction centers share acommon structural core. Mol. Biol. Evol. 23,2001 – 2007. (doi:10.1093/molbev/msl079)
22. Eck RV, Dayhoff MO. 1966 Evolution of the structureof ferredoxin based on living relics of primitiveamino acid sequences. Science 152, 363 – 366.(doi:10.1126/science.152.3720.363)
23. Lorimer GH. 1981 The carboxylation andoxygenation of ribulose 1, 5-bisphosphate: theprimary events in photosynthesis andphotorespiration. Annu. Rev. Plant Physiol. 32,349 – 382. (doi:10.1146/annurev.pp.32.060181.002025)
24. Kim JD, Rodriguez-Granillo A, Case DA, Nanda V,Falkowski PG. 2012 Energetic selection of topologyin ferredoxins. PLoS Comput. Biol. 8, e1002463.(doi:10.1371/journal.pcbi.1002463)
25. Harel A, Falkowski P, Bromberg Y. 2012 TrAnsFuSErefines the search for protein function:oxidoreductases. Integr. Biol. 4, 765 – 777. (doi:10.1039/c2ib00131d)
26. Kabsch W, Sander C. 1983 Dictionary of proteinsecondary structure: pattern recognition ofhydrogen-bonded and geometrical features.Biopolymers 22, 2577 – 2637. (doi:10.1002/bip.360221211)
27. Sippl MJ, Wiederstein M. 2012 Detection of spatialcorrelations in protein structures and molecularcomplexes. Structure 20, 718 – 728. (doi:10.1016/j.str.2012.01.024)
28. Tokuriki N, Tawfik DS. 2009 Protein dynamism andevolvability. Science 324, 203 – 237. (doi:10.1126/science.1169375)
29. Plaxco KW, Simons KT, Baker D. 1998 Contact order,transition state placement and the refolding rates ofsingle domain proteins. J. Mol. Biol. 277, 985 – 994.(doi:10.1006/jmbi.1998.1645)
30. Felsenstein J. 1985 Phylogenies and thecomparative method. Am. Nat. 125, 1 – 15. (doi:10.1086/284325)
31. Schoepp-Cothenet B, van Lis R, Philippot P,Magalon A, Russell MJ, Nitschke W. 2012 Theineluctable requirement for the trans-ironelements molybdenum and/or tungsten in theorigin of life. Sci. Rep. 2, 263. (doi:10.1038/srep00263)
Recommended