Natural history of sensor domains in bacterial signaling systems
L. Aravind*, Lakshminarayan M. Iyer and Vivek Anantharaman
NCBI, NLM, NIH, Bethesda, MD 20894, USA.
*Address for correspondence: [email protected]
Telephone: (301) 594-2445; Fax: (301) 480-9241.
Abstract
Organisms sense stimuli at the molecular level using a relatively small set of protein domains. Computational
analysis of protein sequences along with directed experimental studies have played a major role in the
characterization of these protein domains. These sensor domains directly or indirectly detect a vast array of sensory
inputs such as solutes, gases, redox potential and light. Here, we systematically survey the types of sensor domains
found in bacterial signaling proteins. We summarize the key aspects of their structure that are central to their
functions and their associations with other signaling domains. Despite these advances, several of these domains remain
poorly understood in terms of their structure, ligands and functional significance. We accordingly try to highlight the
significance of some of the under-appreciated sensor domains. Genomic analysis reveals that the architectural
complexity of sensory domains increases with the number of sensor proteins in a genome, with a gradual plateau
towards a point where newer combinations of domains do not provide major selective advantage. Syntactical
analysis of domain architectures shows several discernable patterns that have functional relevance, especially in
terms of the constraints introduced by signal transmission domains such as HAMP and S-helix modules. Across
bacteria, the number of signaling proteins shows a positive correlation with respect to proteome size. However, there
is a clear distinction in the trends between bacteria that react directly and rapidly to a large number of small
molecule signals vis-à-vis those that possess distinct signaling systems related to developmental complexity.
Analysis of scaling trends for individual sensor domains shows that lifestyle strategies play a major role in the
selection of the type and number of these domains in an organism. From an evolutionary viewpoint, the vast
majority of sensory domains appear to have their origins in the bacteria and have been widely transferred to other
superkingdoms of life. In particular, most major eukaryotic sensor domains appear to have their antecedents in
bacteria.
An overview of advances in research on domains found in signaling proteins
The term signaling can be inclusively defined as the set of mechanisms by which cells detect and relay
environmental and intracellular stimuli to activate the appropriate responses to them. Dissection of the molecular
mechanisms of signaling has been a major facet of modern biochemistry and molecular biology. Some of the earliest
studies on signaling emerged from the discovery of kinases that phosphorylate proteins in eukaryotes (e.g. the
pioneering work of Krebs on the glycogen phosphorylase kinase (Fischer and Krebs, 1955; Krebs, 1998)). Forays
into the signaling mechanisms of bacteria began with the operon hypothesis of Jacob and Monod (Jacob and Monod,
1959). Divergent choices of model systems resulted in the development of very different images of the signaling
processes in bacteria and eukaryotes in the early days of molecular biology (Krebs, 1998; Springer et al., 1979;
Stock et al., 1989). In eukaryotes the role of kinases in phosphorylating the hydroxyl groups of serine and threonine
and later tyrosine residues emerged as a major paradigm (Krebs, 1998). In tandem, other signaling mechanisms such
as the pathways dependent on cyclic nucleotide second messengers and GTP hydrolysis either coupled to seven
transmembrane receptors or independently of them became well known in eukaryotes (Gilman, 1987; Neves et al.,
2002). In bacteria, the operon hypothesis pointed to a simple regulatory mechanism, namely activation of a
transcription factor via direct sensing of a diffusible environmental ligand. Subsequently, concerted studies on
bacterial behavior and physiology by Adler, Koshland, Boyer, the Stocks, Ninfa, Magasanik, Hess and Simon,
among several others, uncovered new and distinctive signaling systems (Baker et al., 2006; Black et al., 1980; Hess
et al., 1988a; Hess et al., 1988b; Kreil and Boyer, 1964; Ninfa and Magasanik, 1986; Springer et al., 1979; Stock et
al., 1989; Wang et al., 1981; Wang and Koshland, 1981). The main systems that emerged from these studies were
the chemotaxis receptors, the bacterial cyclic nucleotide network and the phosphorelay two-component systems
comprised of receiver domain and histidine kinase pairs. Despite certain prophetic insights of Koshland (Koshland,
1980), by the early 1990s the mainstream view was that eukaryotic and bacterial signaling systems were quite
different from each other. While both systems were seen to depend heavily on protein phosphorylation cascades, the
former were seen as predominantly possessing serine, threonine or tyrosine phosphorylation (Hanks and Hunter,
1995); whereas the latter were seen as utilizing the histidine-aspartate phosphorelay after phosphorylation of the
histidine (Stock et al., 1990). Similarly, though both superkingdoms were seen as possessing cyclic nucleotide
signaling there was no evidence at that time that the machinery involved in generation or degradation of these
nucleotides was conserved between bacteria and eukaryotes (Selinger, 2008; Wang and Koshland, 1981). Sensory
systems such as the chemotaxis receptors of bacteria and the 7TM receptors of eukaryotes were also seen as having
no counterparts in the other superkingdom (Selinger, 2008; Zhulin, 2001).
By the second half of the 1990s this picture was to undergo a major modification, thanks to the rise of genomics and
the expansion of studies on signaling in new prokaryotic models (Bult et al., 1996; Fleischmann et al., 1995). From
the earliest days, computational analysis of protein sequences played a major role in the dissection of signaling
systems at the molecular level. This was first demonstrated by Stock and co-workers in their analysis of the CheY
protein, which resulted in identification of the receiver domain of the two-component system (Stock et al., 1985).
More generally, it led to the important realization that signaling proteins typically displayed a modular architecture
and that each conserved domain performed a distinct specific biochemical function within the polypeptide. This
provided a means of predicting a protein’s function based on the conserved domains found in it. In the second half
of the 1990s key computational tools for protein sequence analysis, such as PSI-BLAST and HMMER were
developed. These programs represent the information in a protein sequence alignment as a position-specific score
matrix (PSSM) or a hidden Markov model (HMM) and search the protein sequence database for statistically
significant matches to the PSSM or HMM (Altschul et al., 1997; Durbin, 1998). As a consequence distant sequence
similarities that had previously eluded workers could be reliably identified. This development combined with the
proliferation of protein sequences from the efflorescence of genome sequencing efforts provided a powerful means
to explore signaling proteins with the objective of characterizing new protein domains. Concomitant advances in
protein structure determination through NMR and crystallography allowed domain discovery through computational
methods to be taken to the next level, i.e. visualization and analysis of the 3-dimensional structures of domains.
These efforts culminated in a multifaceted image of many of the signaling systems ranging from domain
architectures and interactions of the relevant polypeptides, through the individual domains, all the way down to an
atomic resolution view of domain structure (Gao and Stock, 2009).
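To make the PSSM idea described above concrete, the following is a minimal sketch in Python. It is not the PSI-BLAST or HMMER implementation: it assumes a toy gapless alignment, a uniform background frequency over the 20 amino acids and a pseudocount of 1, and the function names `build_pssm` and `best_window` as well as the example sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds position-specific score matrix from a gapless alignment.

    Assumes all sequences have equal length and a uniform background
    frequency (1/20); real tools use weighted counts and empirical backgrounds.
    """
    length = len(alignment[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        column = [seq[pos] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def best_window(pssm, sequence):
    """Slide the PSSM along a sequence; return (best score, start index)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        best = max(best, (score, start))
    return best

# Hypothetical toy alignment of a short conserved motif.
toy_alignment = ["DLHMP", "DIHMP", "DLHLP", "ELHMP"]
pssm = build_pssm(toy_alignment)
score, start = best_window(pssm, "MKTAYDLHMPGSW")
print(f"best match starts at {start} with score {score:.2f}")
```

Iterative tools refine such a matrix by adding newly detected matches to the alignment and rebuilding the scores, which is what allows progressively more distant relatives to be recovered; that iteration and the statistical significance estimates are omitted here.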
Of the several major advances brought about by these developments, especially important were the discoveries
emphasizing the previously under-appreciated similarities between bacterial and eukaryotic sensory and signal
transmission systems (Ponting et al., 1999). For example, it became increasingly clear that certain bacterial lineages
such as cyanobacteria, actinobacteria and myxobacteria have an abundance of eukaryote-type protein kinases (S/T/Y
kinases) and associated phosphorylation cascades (Munoz-Dorado et al., 1991; Potts et al., 1993; Urabe and
Ogawara, 1995). On the other hand it also became apparent that the histidine kinase and receiver domains might
have an important role in transmitting signals sensed by diverse eukaryotic receptors in lineages such as plants,
fungi, slime molds and heterolobosean amoebae (Chang et al., 1993; Schuster et al., 1996). Genomics also gave us
the first glimpses of the signaling systems of the 3rd superkingdom of life, the archaea, many of which were
experimentally intractable organisms (Klenk et al., 1997). Comparisons of the signaling systems between the three
superkingdoms of life suggested that many of the systems that were previously believed to be eukaryote-specific
were probably ultimately of bacterial provenance (Ponting et al., 1999). Sequence analysis of signaling proteins led
to the discovery of several new domains belonging to different functional categories. These included: 1) the sensor
domains which recognize and respond to diverse signals – e.g. the PAS, GAF, CACHE and CHASE domains
(Anantharaman and Aravind, 2000; Anantharaman and Aravind, 2001; Aravind and Ponting, 1997; Mougel and
Zhulin, 2001; Ponting and Aravind, 1997; Taylor and Zhulin, 1999; Zhulin et al., 2003; Zhulin et al., 1997); 2)
Novel signaling receptors such as the prokaryotic 7TM, 8TM and 5TM receptors (Anantharaman and Aravind,
2003); 3) intra-molecular signal transmitter domains e.g. the HAMP domain and the S-helix module (Anantharaman
et al., 2006; Aravind and Ponting, 1999; Williams and Stewart, 1999); 4) novel enzymatic domains such as bacterial
caspase-type proteases and the STAND class of ATPases (Aravind and Koonin, 2002; Leipe et al., 2004); 5)
bacterial peptide tagging systems, including homologs of the eukaryotic ubiquitin tagging systems (Iyer et al., 2006;
Iyer et al., 2008; Pearce et al., 2008). These discoveries provided new mechanistic insights into how specific
domains sensed stimuli and transmitted them to downstream effectors. In particular, they clarified how both one-component
systems (i.e. transcription/RNA-binding regulators that directly respond to a stimulus) and two-component
systems use comparable sensor domains to recognize their activating stimuli and further transmit the impulse to
other effector domains within the same or in a different polypeptide. In other cases the discovery of these domains
allowed the identification of previously unexpected signaling mechanisms (Gao and Stock, 2009; Stock, 2006;
Swain and Falke, 2007; Taylor, 2007). On the whole these developments affirmed the importance of the modular
architecture of proteins in signaling systems. The great diversity of signaling systems seen across life appears to
have been generated by combinations of a relatively small set of ancient protein domains performing specific types
of functions. In the course of evolution, these domains were mixed and matched according to a syntax, which we are just
beginning to understand, to spawn all manner of signaling systems from the simplest single component systems to
complex multi-component cascades and phospho-relay networks.
With more than a decade having passed since the “genomic revolution”, we are now in possession of genome
sequences of more than one representative of practically every known branch of the bacterial tree. We have also
identified nearly all of the widespread conserved domains found in bacterial signaling proteins and possess structural
or mechanistic information on more than half of them (Finn et al., 2008; Gao and Stock, 2009; Manson, 2009;
Moglich et al., 2009b; Ulrich and Zhulin, 2007). This places us in the position to objectively evaluate several
questions concerning the natural history of conserved sensor domains found in bacterial signaling systems. We can
now more conclusively address the similarities and differences between bacterial and eukaryotic sensory systems.
We can also investigate the interplay between life-style and the distribution of sensor domains in bacteria. Finally,
we can also look at the distribution of different sensory systems across the bacterial tree and evaluate the validity of
generalizations based on model systems. In this chapter, we address some of these issues using the most up-to-date
collection of genome sequences across the bacterial tree and a comprehensive collection of HMMs and PSSMs for
signaling domains. We present the discussion from a “domain-centric” viewpoint: first we categorize all the
different types of signaling domains into a few handy mechanistic categories for facilitating the core discussion on the
sensor domains. We then discuss examples of widely distributed domains central to sensory mechanisms using their
structural features as the primary handle. We then summarize key syntactical elements of domain architectures of
sensory proteins and try to relate them to their mechanistic features. Finally, we review the phyletic distributions of
these domains in bacteria and their significance for the life-style strategies adopted by bacteria. To avoid breaking
the flow of the text, for the most part we do not expand the domain names each time they are
encountered; instead, we provide the expansions together in Appendix-I.
Functional classes of domains found in bacterial signaling systems
Globular sensor domains: This category includes globular domains that directly bind ligands and thereby sense
their presence in extracellular media or within the cell. A classical example of such a sensor domain is the sugar-
binding domain of the one-component signaling protein, the transcription factor AraC (Aravind et al., 2005; Soisson
et al., 1997). Other domains in this category bind ligands either covalently (e.g. the GAF domain binding a
tetrapyrrole ligand in phytochromes) or non-covalently (e.g. PAS and BLUF domains binding flavin nucleotides)
and utilize the redox and photosensitive properties of these ligands to act as sensors (Aravind and Ponting, 1997;
Gomelsky and Klug, 2002; Moglich et al., 2009b; Ponting and Aravind, 1997; Zhulin et al., 1997). Yet others use
these prosthetic groups or ions to sense gaseous ligands such as nitric oxide (e.g. the HNOB and HNOBA domains)
(Iyer et al., 2003; Tucker et al., 2008). These domains are the primary recipients of the sensory stimulus and transmit
it in the form of conformational changes induced either by ligand-binding or changes in ligand conformation. While
domains that sense intracellular second messengers are also sensor domains, we consider them separately.
Multi-transmembrane receptor domains: These include a variety of receptors with multiple TM domains. Some of
the multi-TM domains might directly sense stimuli such as light via prosthetic groups such as retinal (the sensory
rhodopsins) (Jung et al., 2003; Mukohata et al., 1999). Some other bacterial counterparts of eukaryotic 7TM
receptors might transduce signals recognized by extracellular sensor domains to intracellular effector domains
(Anantharaman and Aravind, 2003). Some other bacterial multi-TM receptor domains appear to be membrane-
embedded direct sensors of stimuli (Anantharaman and Aravind, 2003).
Signal transmitter domains and modules: This category includes distinct globular domains as well as simple helical
modules that typically connect different signaling domains. The prevalence of such domains is a hallmark of
signaling systems of bacterial origin. They typically function in transmitting a signal in the form of a conformational
change from one domain to another or as switches that prevent inappropriate propagation of a signal in the absence
of a stimulus. Examples of such signal transmitters are the HAMP domain and the S-helix module that function in
conjunction with a whole range of sensor and effector domains (Anantharaman et al., 2006; Aravind and Ponting,
1999; Williams and Stewart, 1999).
Domains in two-component-type phosphorelay systems: The primary members of this category are the effector
domains of the two component systems, the histidine kinase and the receiver domain. They function in conjunction
with two helical partners that are the recipients of the initial histidine phosphorylation, namely the dimerization and
histidine phosphorylation module (DHp) and the His-containing phosphotransfer (HPt) domain (Gao and
Stock, 2009; Stock et al., 1990). A small minority of histidine kinase domains have been shown to phosphorylate
serine/threonine and are thus closer to the S/T/Y kinases in their action (Koretke et al., 2000).
Domains in S/T/Y phosphorylation cascades: Unlike the two component phosphorelay systems, the
phosphorylation of serine or threonine (and in some cases tyrosine) is a single step process catalyzed by the S/T/Y
kinase domain that is structurally distinct from the histidine kinase domains (Hanks and Hunter, 1995; Leonard et
al., 1998). These phosphopeptides might be sensed by specific domains such as the FHA domain (Mohammad and
Yaffe, 2009). In eukaryotes the BRCT domain has been claimed to bind such phosphopeptides, but there is no
evidence for such a function among the bacterial BRCT domains (Mohammad and Yaffe, 2009). The phosphate tags
on proteins are removed by phosphatase domains, of which the PP2A domains are among the most common in
bacteria (Ponting et al., 1999).
Domains in second messenger signaling: These signaling systems generate a second messenger in response to a
primary stimulus and this second messenger is further recognized by specific sensors that further relay the signal to
targets. The most prevalent second messengers in bacterial systems are the cyclic nucleotides cAMP and cGMP,
cyclic diguanylate and the alarmone (p)ppGpp. This category includes the various cyclase domains which generate
the cyclic nucleotides, the cyclic nucleotide phosphodiesterases and the binding domains that recognize cNMP or
diguanylate. All cyclase domains appear to have been derived from different families of nucleic acid polymerases.
The cNMP cyclase and GGDEF (diguanylate cyclase) domains are related to the classical palm-domain
polymerases, whereas the Escherichia coli CyaA-like cyclases and the (p)ppGpp-generating enzyme of the
SpoT/RelA family are related to the polymerase β superfamily (Aravind and Koonin, 1999a; Hogg et al., 2004;
Makarova et al., 2002). The phosphodiesterases belong to at least 4 major families: 1) the HD superfamily (HD-GYP,
the cyclic diguanylate phosphodiesterase, and cNMP phosphodiesterases), 2) the calcineurin-like superfamily,
3) the metallo-β-lactamase superfamily and 4) the EAL superfamily (Aravind and Koonin, 1998; Galperin et al.,
1999).
Domains and modules in the core chemotaxis transducers: Bacterial chemotaxis transducers (also called methyl
accepting chemotaxis proteins) are proteins that combine N-terminal sensor and signal transmitter domains with a C-
terminal coiled coil domain (Alexander and Zhulin, 2007; Baker et al., 2006; Zhulin, 2001). This coiled coil domain
is the primary domain in this category and contains two distinct sub-structures, which respectively act as targets for
the other types of domains/modules in this category. These other domains are the methyltransferases and
methylesterases that modify glutamates or glutamines in one of the substructures of the coiled coil and the CheW
modules that interact with the other substructure in the coiled coil.
GTP-dependent signaling domains: Unlike in eukaryotes, much less is known of GTPase signaling in bacteria.
There is no evidence for classical G proteins in bacteria (Ponting et al., 1999). Though several homologs of the
small Ras-like GTPase are observed in bacteria their roles in sensory signaling remain poorly understood, and these
do not possess the classical exchange factors (GEFs) and activating proteins (GAPs) typical of eukaryotic GTPase
signaling systems. A number of pathogenic bacteria possess effectors that interfere with the signaling of their host
cells into which they are secreted (Aepfelbacher et al., 2005).
Domains of “Apoptotic-type” signaling systems: A number of distinct domains were initially characterized to
interact together in signaling networks regulating apoptosis in animals. These include the distinctive ATPases of the
STAND class such as the AP- and NACHT- ATPases, proteases such as the caspases, and the enigmatic TIR
domain (Anantharaman and Aravind, 2001; Leipe et al., 2004). Subsequently, it became apparent that along with
S/T/Y kinases these domains are frequently found in a subset of bacteria and even in those organisms they might
interact with each other to constitute poorly understood signaling systems (Anantharaman and Aravind, 2001;
Aravind and Koonin, 2002). Some of them like the MalT ATPase might play an important role in transmitting
signals upon recognition of solutes such as sugars (Marquenet and Richet, 2007).
Ion channels: Prokaryotes possess several ion channels that have roles in responding to osmotic imbalances
(Vasquez and Perozo, 2009) as well as signaling in response to sensing of specific ligands (Lee et al., 2008a). In this
chapter we restrict ourselves to only discussing ligand-gated ion-channels that have a direct sensory role.
Phosphotransfer systems (PTS): This complex system includes a series of domains that relay a high energy
phosphate moiety from phosphoenolpyruvate via serial transesterifications to sugars being transported via
transmembrane permeases. However, a variant of this phospho-relay system is involved in the regulation of nitrogen
metabolism in certain bacteria. While the PTS could be seen as a solute sensory mechanism, it appears to be
primarily a solute uptake system. Hence, we do not discuss it in any detail here. We refer readers to a detailed
comparative genomics analysis of these proteins by Saier and co-workers (Barabote and Saier, 2005; Saier et al.,
2005).
Peptide-tagging systems: In eukaryotes conjugation of proteins with peptide tags in the form of ubiquitin (or a
related polypeptide) or homopolymeric peptides such as polyglycine, polyglutamate or even a single amino acid
such as tyrosine is a major regulatory mechanism in several signaling processes (Hochstrasser, 2009). In bacteria a
well-known mechanism of peptide tagging via the tmRNA system plays a role in degradation of defective proteins
(Wower et al., 2008). However, only recently has it become apparent that some bacteria possess domains that
constitute counterparts of the eukaryotic ubiquitin system or analogous regulatory peptide-tagging systems such as
pupylation (Iyer et al., 2006; Iyer et al., 2008; Pearce et al., 2008).
It should be stressed that the above categorization of signaling domains is not rigid but is intended as a useful
context for the further detailed discussion on domains involved in sensory processes. In functional terms there is
much cross-talk between some of these categories of domains (e.g. the well-known case of the close interaction
between chemotaxis transducers and certain two component systems) (Baker et al., 2006).
General mechanisms through which sensory inputs are channelized
In any organism there has been selection for several different types of responses to sensory stimuli, which
might be broadly characterized as: 1) direct rapid responses- These are mediated by one-component systems, where
sensory domains are usually fused to DNA-binding domains such as the helix-turn-helix (HTH) domain, to direct
transcription of particular target genes and rapidly alter the transcriptional state of the cell (Aravind et al., 2005). 2)
Filtered responses- In these cases a sensor domain might be combined with a catalytic effector domain, typically a
histidine kinase, via signal transmitter elements such as the HAMP domain and the S-helix (Gao and Stock, 2009).
These elements act as a preliminary filter for propagation of the signal received by the sensor domain and thus
ensure its controlled transmission. Additional control steps are present downstream in the form of the DHp, HPt (if
present) and receiver domains before the signal is converted to a transcriptional response. These systems allow a
controlled response that allows sensing of thresholds and may be contrasted with the more continuous and rapid
responses afforded by the one-component systems. 3) Signal amplification- Amplification of the initial stimulus
sensed by a sensor domain can be useful to direct global state changes in the cell. These are usually mediated by
cNMP/cyclic diguanylate- generating enzymes that sense the signal via a sensory domain and amplify it by
generating a second messenger that is further sensed by specific domains (Benach et al., 2007; Linder and Schultz,
2008). An alternative amplificatory mechanism is via S/T/Y kinases that phosphorylate target peptides, which might
be sensed by FHA domains that further recruit other effectors to the phosphorylated proteins. Multiple kinases could
also be linked in an amplificatory phosphorylation cascade (Hanks and Hunter, 1995; Jagadeesan et al., 2009). 4)
Altering the shape of a response and negative regulation- Certain enzymes such as cyclic nucleotide
phosphodiesterases and phosphatases could function downstream of sensor domains to reverse the action of cyclases
and kinases to create specifically shaped responses such as a sharp peak or shut down of an ongoing response
(Soderling and Beavo, 2000). 5) Allosteric and feedback regulation- Sensor domains combined with catalytic
domains can often serve to regulate the action of an enzyme in an allosteric manner or sense a feedback from a
downstream process. For example GAF domains fused to the phosphodiesterase domain regulate its activity by
sensing cyclic nucleotides (Martinez et al., 2002). 6) Memory-The alteration of responses to stimuli subsequent to
the initial stimulus might be termed memory in a signaling system. This is seen in the form of covalent modification
of signaling domains. This is exemplified by the methylation, demethylation and deamidation of glutamates and
glutamines in chemotaxis transducers by the methylesterases and methyltransferases (Baker et al., 2006).
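As a rough illustration of how some of these response categories relate to domain composition, the sketch below assigns a domain architecture to one of the categories listed above. It is a deliberately simplified heuristic and not a method used in this chapter: the domain groupings, the labels `HisKA` and `cNMP_cyclase`, the function name `classify_response` and the order in which the rules are applied are all assumptions made for the example. Categories that depend on context rather than composition (allosteric regulation, memory) are not modeled.

```python
# Illustrative domain groupings (assumed labels, not an authoritative list).
SENSORS = {"PAS", "GAF", "CACHE", "CHASE", "BLUF", "HNOB"}
DNA_BINDING = {"HTH"}
HIS_KINASE = {"HisKA"}
SECOND_MESSENGER_SYNTHESIS = {"GGDEF", "cNMP_cyclase"}
SECOND_MESSENGER_BREAKDOWN = {"EAL", "HD-GYP"}

def classify_response(architecture):
    """Map a list of domain names to a coarse response category.

    Rules are checked in a fixed, arbitrary order; real proteins can
    belong to more than one category.
    """
    domains = set(architecture)
    if not domains & SENSORS:
        return "no recognized sensor domain"
    if domains & DNA_BINDING:
        return "direct rapid response (one-component)"
    if domains & HIS_KINASE:
        # Transmitter elements such as HAMP are not required by this rule.
        return "filtered response (two-component)"
    if domains & SECOND_MESSENGER_SYNTHESIS:
        return "signal amplification (second messenger)"
    if domains & SECOND_MESSENGER_BREAKDOWN:
        return "response shaping / negative regulation"
    return "sensor with unclassified effector"

# Hypothetical example architectures, listed N- to C-terminus.
examples = {
    "sensor-HTH": ["GAF", "HTH"],
    "sensor-HAMP-kinase": ["CACHE", "HAMP", "HisKA"],
    "sensor-cyclase": ["PAS", "GGDEF"],
    "sensor-phosphodiesterase": ["PAS", "EAL"],
}
for name, arch in examples.items():
    print(name, "->", classify_response(arch))
```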
The way these different types of responses have been achieved in evolution is primarily via the combination of the
above-outlined categories of domains within a single polypeptide or through interactions between them on separate
polypeptides (Fig. 1). In the next section we consider the major sensor domains that are central to bacterial
signaling from a structural view point. We do not consider signal transmission domains and effector domains in
detail and discuss them only in the context of particular signaling mechanisms downstream of the sensor domains
(for domain name expansions see Appendix-I).
Globular sensor domains
The PAS-like fold
A large number of the sensor domains in bacterial signaling proteins can be unified into a single structural fold
termed the PAS-like fold (also termed the profilin-like fold in the SCOP database (Andreeva et al., 2008)). The
prototypical superfamily contained in this fold is the classical PAS domain, which is conserved in a vast number of
bacterial signaling proteins. Other major superfamilies of bacterial sensor domains possessing the PAS-like fold are
GAF, CACHE, PocR, HNOBA and GSU0582-like heme-binding domain (Anantharaman and Aravind, 2000;
Anantharaman and Aravind, 2005; Aravind and Ponting, 1997; Moglich et al., 2009b; Pokkuluri et al., 2008;
Ponting and Aravind, 1997; Zhulin et al., 1997). Domains previously named LOV and MEKHLA are simply
versions of the originally defined classical PAS domain (Moglich et al., 2009b; Mukherjee and Burglin, 2006).
Similarly, the so-called CHASE4 domain (Zhulin et al., 2003), which is found as the extracellular domain of
bacterial sensory receptors, is merely a divergent member of the CACHE superfamily. The classical PAS-like fold is
characterized by a core sheet comprised of 5 β-strands arranged in a 3-4-5-1-2 order which form the base of a pocket
(Moglich et al., 2009b). Strands 2 and 3 are on opposite sides of the sheet and are connected by a “flange” that might
contain two or more secondary structure elements in a helical conformation (Fig. 2). The loop between strands 3 and
4 might also display extension or insertion of additional elements. Together, these two features form the sides of a
pocket that can accommodate a ligand (Moglich et al., 2009b). This ligand-binding mode is largely conserved
throughout the fold as evidenced by the multiple ligand-bound structures available for representatives of the PAS,
GAF and CACHE superfamilies (Moglich et al., 2009b) (Fig. 2). This suggests that the ancestral member of the
PAS-like fold was a ligand-binding domain and that the extant diversity of ligands bound by this fold is a
consequence of structural divergence of the ancestral ligand-binding pocket resulting in new specificities (Aravind
and Koonin, 2002). The emergence of new ligand-contacting sites appears to have chiefly occurred as a result of the
diversification of the flange and the loop between strand 3 and 4. One striking example of this is the emergence of a
cysteine in the first GAF domain of the cyanobacterial phytochromes (subsequently transferred to plants) in the
insert between strand 3 and 4 to covalently bind the phycocyanobilin ligand. As a result this version of the GAF
domain covalently binds the chromophore in contrast to the other bacterial phytochromes where the chromophore is
attached to a distinct N-terminal region (Lamparter et al., 2004). In the case of the PAS domain of the photoactive
yellow protein another such cysteine has emerged in the flange, which is covalently linked to 4-hydroxycinnamic
acid (Moglich et al., 2009b). Another variation is the conserved histidine in the FixL PAS domains which helps in
coordinating a heme ligand bound in the pocket (Ayers and Moffat, 2008). An exception to this picture is the
GSU0582-like heme-binding domain from Geobacter sulfurreducens which has evolved a distinct heme-
coordinating site on the face opposite to the conventional ligand-binding pocket (Pokkuluri et al., 2008).
Most superfamilies of this fold that have been characterized to date bind a wide range of ligands, mostly
non-covalently and occasionally covalently. In the PAS superfamily itself we encounter versions showing specificity to a
diverse set of ligands such as 4-hydroxycinnamic acid, FMN, FAD and 1H-indole-3-carbaldehyde (Moglich et al.,
2009b). Certain PAS domain proteins might also be promiscuous in binding ligands and serve as receptors of a
range of xenobiotic compounds. In the CACHE superfamily there is evidence for binding of ligands such as
glycerol, citrate, sugars and metal ions (Moglich et al., 2009b; Reinelt et al., 2003), whereas in the case of the GAF
domain there is evidence for binding of cyclic nucleotides, tetrapyrrole ligands such as phycocyanobilin, formate
and metal ions (Aravind and Ponting, 1997; Lamparter et al., 2004; Martinez et al., 2002; Tucker et al., 2008). The
divergent GAF domains found in the sensor domain of the transcription factors of the IclR family also bind a range
of small solutes such as glyoxylate (Walker et al., 2006). Another divergent version of the GAF superfamily, the
autoinducer-binding domain, is found fused to a HTH domain in the LuxR-like transcription factors and specializes
in binding the quorum-sensing N-acyl homoserine lactones (Chai et al., 2001). Much less is known of the ligands
of the PocR superfamily, but the available evidence suggests that it binds small hydrocarbon derivatives such as 1,3-
propanediol (Anantharaman and Aravind, 2005). Members of the PAS-like fold function in both intracellular and
extracellular contexts – the PAS, GAF, HNOBA and PocR superfamilies are typically intracellular in their
localization, whereas members of the CACHE superfamily are usually extracellular ligand-binding domains
(Anantharaman and Aravind, 2000). The ligand binding ability of members of the PAS-like fold might be used in
sensory perception in two distinct modes. The bound ligands like flavin nucleotides, 4-hydroxycinnamic acid,
phycocyanobilin and heme could act as redox sensors or photosensors, or both (Lamparter et al., 2004; Moglich et
al., 2009a; Moglich et al., 2009b; Taylor and Zhulin, 1999). In other cases, like the binding of cNMP by GAF
domains or the binding of autoinducers and diverse ligands of one-component systems, ligand binding by
the domain itself acts as the sensory input (Chai et al., 2001; Martinez et al., 2002). There are several other
members of the PAS-like fold, especially classical PAS domains which are not known to bind any small molecule
ligands (Moglich et al., 2009a). Instead they have been found to dimerize through homophilic interactions or contact
other proteins using the same ligand-binding pockets (Gao and Stock, 2009; Moglich et al., 2009b). However, it is
possible that the natural ligands remain unknown in a number of these cases. Irrespective of whether they bind small
molecule or peptide ligands, the majority of members of this fold appear to be able to undergo conformational
changes, most probably on account of the peculiar flange which bridges the two ends of the core β-sheet (Fig. 2).
The presence of multiple tandem repeats of PAS and, in some cases, GAF domains in several signaling proteins
differentiates them from other members of the fold such as the CACHE and HNOBA domains that typically occur
as a single copy in the protein (Fig. 1). It is likely that not all of the copies in these tandem arrays of PAS domains
bind small molecule ligands; rather they might bind peptides (including via dimerization). It is conceivable that
these tandem arrays utilize their ability to undergo conformational changes to transmit the stimulus sensed by the
single sensory domain to the downstream effector domains. Thereby tandem arrays of PAS domains in signaling
proteins could provide a powerful means to set response thresholds for stimuli.
The CHASE domain
The CHASE domain is an extracellular sensor domain that appears to be critical for recognition of secreted ligands
used in cell-cell communication (Anantharaman and Aravind, 2001; Mougel and Zhulin, 2001). No structure is as
yet available for any representative of this family, but our profile-profile comparisons with the HHpred program
(Soding et al., 2005) indicate that it contains a core domain of the PAS-like fold that is most closely related to the
CACHE domain (see above). The domain is typically combined with a variety of intracellular catalytic domains of
the two component system and cyclic nucleotide/cyclic diguanylate-dependent second messenger systems and S/T/Y
kinases (Anantharaman and Aravind, 2001). In eukaryotes, the characterized ligands include adenine derivatives
such as cytokinin, and the related slime mold signaling molecule discadenine, and a small peptide SDF-2 secreted by
the pre-stalk cells of Dictyostelium (Anantharaman and Aravind, 2001; Mougel and Zhulin, 2001). None of the ligands
of the prokaryotic versions has been characterized, but given their close relationship to the eukaryotic
versions it is possible that they are receptors of a previously unexplored bacterial cell-cell communication system
that might utilize cytokinin-like ligands.
The HNOB-like fold
This fold includes two distinct superfamilies of sensor domains namely the HNOB and V4R (Anantharaman et al.,
2001; Iyer et al., 2003), both of which are always found in intracellular proteins. This is an α+β fold with a 4-helical
bundle stacked against a 4 β-stranded unit with two additional helices associated with it (Fig. 2)(Pellicena et al.,
2004). A ligand-binding cleft is formed between these two sub-structures. The majority of members of the HNOB
superfamily contain a conserved histidine in the first helix associated with the 4-stranded substructure with which it
coordinates a heme molecule (Iyer et al., 2003). The HNOB domains use the heme ligand to sense gaseous ligands
such as nitric oxide and carbon monoxide that diffuse through the cell membrane. In bacteria the HNOB domain
appears to occur as a standalone protein or in combination with the coiled-coil domain of chemotaxis transducers, while in
eukaryotes it forms the NO- or CO-sensing module of the soluble guanylyl cyclases (Iyer et al., 2003). Most
versions of the HNOB domain are either coupled with a HNOBA domain (see above) in the same polypeptide or
physically associate with one (in bacteria usually encoded by another signaling protein in the same operon) (Iyer et
al., 2003; Ma et al., 2008). Thus it appears that coordination between the HNOB and HNOBA domains is critical for
transmission of the signal upon sensing the gaseous ligand (Iyer et al., 2003). The V4R domain is found in archaeal
and bacterial single component systems, where it is fused to HTH domains (e.g. MTH1349) or the NtrC-like
AAA+ domains (e.g. XylR and DmpR) that recruit σ54 to regulate transcription (Anantharaman et al., 2001). The
V4R domains of XylR and DmpR appear to be required to sense aromatic compounds such as phenols (O'Neill et
al., 1998). Certain versions of the V4R domain such as that in MTH1349 contain 3 conserved cysteines suggesting
that these versions might coordinate a metal, which might play a further role in ligand recognition (Anantharaman et
al., 2001).
β-grasp fold sensory domains
This fold is an α+β fold typically containing 5 core β-strands and one helical segment (Fig. 2). In several versions
the β-strands are considerably curved to assume a barrel-like structure (Burroughs et al., 2007b). It is a versatile and
commonly found fold that includes a vast variety of representatives including ubiquitin, the 2Fe-2S ferredoxins, and
the Nudix phosphoesterases, of which a subset are dedicated small molecule-binding domains with a potential
sensor function. The chief among these is the soluble ligand-binding beta-grasp (SLBB) domain (Burroughs et al.,
2007a) (Fig. 2). One type of SLBB domain specializes in binding of cobalamin and its precursors and is often
present in bacterial cell surface proteins that might act as cobalamin scavengers. Bacterial and archaeal intracellular
versions are fused to ArsR- and AraC-type HTH domains, and might act as one-component system regulators that
sense cobalamin or its precursors (Burroughs et al., 2007a). Other superfamilies of the β-grasp fold, such as the 2Fe-2S
ferredoxin domain (Burroughs et al., 2007b), typically function as iron-binding electron transport facilitators in
metabolic reactions; however, some versions of this ferredoxin domain have also been recruited as potential sensors.
It is conceivable that these versions utilize the bound metal ions to sense redox potential. Some examples include
CyaC from Rhizobium meliloti (gi: 227823018) and cyaA6 from Leptospira interrogans (gi: 24213884) where the
2Fe-2S ferredoxin shows independent fusions to cNMP cyclase domains (Fig. 1). The Nudix superfamily is an
enzymatic version of the β-grasp fold, members of which hydrolyze nucleoside diphosphate derivatives (Fig.
2). However, some catalytically compromised and low-activity versions function as sensor domains in both
eukaryotes and bacteria. The best known example in bacteria is the NAD metabolism repressor NrtR, whose
repressor action is shut off by the binding of ADP-ribose to its sensor Nudix domain. Gradual hydrolysis of the
ADP-ribose restores the repressor activity of NrtR; thereby the Nudix domain functions as a sensory switch (Huang et
al., 2009).
The ferredoxin-like fold domains
The ferredoxin-like (also called RRM-like) fold is a very widespread α+β fold that is present in at least 3 distinct
sensor domains found in bacterial signaling proteins (Andreeva et al., 2008). The fold is characterized by four core
β-strands that are stacked against two helices (Fig. 2). The surface of the 4-stranded β-sheet, and the cleft
between the helices and β-sheet, form distinct, versatile ligand-binding regions in different versions of this fold (Fig.
2). The classical 4Fe-4S ferredoxins are major players in diverse metabolic electron transport chains. The 4Fe-4S
ferredoxin domain has also been recruited as a sensor domain in certain one-component systems, where it is combined
with HTH domains (e.g. HM1_2484, gi: 167630545 from Heliobacterium modesticaldum; Fig. 1) and might
function as a redox sensor. The heavy metal binding HMA domain is another superfamily with the ferredoxin-like
fold that possesses two conserved cysteines with which it binds heavy metals (Jordan et al., 2001). These domains
are typically found fused to heavy metal efflux ATPases of detoxification systems (Jordan et al., 2001)(Fig. 2);
however, a subset of these domains found fused to HTH domains are likely to function as heavy metal sensing
transcription regulators (e.g. CHU_1500, gi: 110637904 from Cytophaga) (Aravind et al., 2005). The ACT domain
is the third sensor domain with the ferredoxin-like fold (Fig. 2). It is frequently fused to the catalytic domains of
various metabolic enzymes, whose activity it regulates allosterically or by a feedback process by binding amino
acids and their derivatives (Aravind and Koonin, 1999b). It is also used as a sensor in different signaling proteins.
One classical example of this is the TyrR transcription factor that regulates the expression of genes dependent on
aromatic amino acids Tyr, Trp and Phe (Pittard et al., 2005). This protein senses the aromatic amino acids using the
N-terminal ACT domain and transmits this signal via a PAS domain to downstream NtrC-like σ54-binding AAA+
ATPase and HTH domains. Another remarkable bacterial signaling system depending on the ACT domain is
the stringent response system that is triggered by starvation for amino acids and other nutrients. The key proteins in this
system are SpoT and RelA, which generate and degrade the second messenger or “alarmone” (p)ppGpp using their
nucleotidyltransferase and HD phosphoesterase domains, respectively (Hogg et al., 2004). These proteins additionally contain an
ACT domain which might sense the status of nutrient availability by binding amino acids (Aravind and Koonin,
1999b). A divergent version of the ACT domain also serves as a sensor for ligands in the Lrp family of one-
component transcription factors (Ettema et al., 2002).
The gyrase inhibitor (GyrI)-like fold
This fold is made up of a dimer of strand-helix-strand-strand (SHS2) modules that together constitute a ligand-
binding pocket (Anantharaman and Aravind, 2004) (Fig. 2). The sensory members of this fold are typified by the
ligand-binding domains of a number of one component transcription regulators prototyped by Bacillus subtilis
BmrR and Escherichia coli SbmC and Rob. In BmrR, the GyrI domain functions as a promiscuous xenobiotic sensor
and transmits the signal via a conformational change to the fused HTH domain to regulate transcription of multidrug
resistance genes (Newberry and Brennan, 2004). The diversity in the ligand-binding pocket of the GyrI domain of
the bacterial one-component transcription factors suggests that some of these might bind other types of ligands,
though most members might show a preference for positively charged ligands (Anantharaman and Aravind, 2004).
One specific family of the GyrI fold termed the SOUL family is found predominantly in secreted proteins in both
bacteria and eukaryotes and specifically binds a heme ligand (Dias et al., 2006). Members of the SOUL family from
Francisella tularensis (FTM_0021) and Mycobacterium (Mmcs_2896) might function on the cell surface as light or
redox sensors by means of the heme ligand.
CBS domains
A single CBS domain is an α+β unit with a 4-stranded β-sheet at its core. The CBS domains always occur as
obligate dimers with the surfaces of the β-sheets of the monomers stacked against each other to form a compact
globular unit (Bateman, 1997) (Fig. 3). CBS domains have thus far been shown to bind a variety of adenosine
ligands such as cAMP, AMP, ATP and S-AdoMet and regulate the activity of fused domains both in metabolic
enzymes such as cystathionine β-synthase and inosine-5'-monophosphate dehydrogenase and in signaling
proteins such as chloride channels in eukaryotes (Bateman, 1997; Ignoul and Eggermont, 2005). In bacteria, the
CBS domains (along with cNMP-binding domains) are found fused to a polymerase β-like nucleotidyltransferase
domain in a remarkable group of signaling enzymes prototyped by VIH_000647 (gi: 262033842; Fig. 1) from Vibrio
cholerae (Aravind and Koonin, 1999a). This nucleotidyltransferase domain is likely to function in generating a cyclic
nucleotide or a related second messenger and the CBS domains might function as allosteric regulators that sense this
second messenger.
The DICT domain
This is a previously uncharacterized intracellular sensor domain of about 120 amino acids that is found associated
with the catalytic domains of diguanylate cyclases and phosphodiesterases (GGDEF, EAL and HD-GYP domains)
and two-component systems (histidine kinase). Hence, it was called the DICT domain (for diguanylate cyclases and
two-component systems; it overlaps with the domain of unknown function DUF2308 defined in the PFAM database
(Finn et al., 2008)). This domain is also fused to the STAS domain (see below) in certain proteins. It is predicted to
function as a sensor domain primarily based on its domain architectures, in which it occupies a position comparable
to other sensor domains such as the BLUF (see below), PAS, GAF, HNOB and HNOBA domains (Fig. 1). Its
structure is currently not available, but secondary structure prediction (supplementary material) indicates that it
assumes an α+β fold with a 4-stranded β-sheet. The domain is prevalent in photosynthetic bacteria and halophilic
archaea suggesting that it might have a role in light response. A distinct subfamily of DICT domains has conserved
signatures in the form of a N-terminal HxxED motif and a C-terminal cysteine which might form a metal or
prosthetic group coordinating site (Supplementary material).
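As a rough illustration of how the sequence signature described above could be screened for, the following sketch scans a protein sequence for an HxxED motif in its N-terminal part and a cysteine in its C-terminal part. It is not the method used in this survey, which relies on profile-based comparisons; the function names, the half-way split between "N-terminal" and "C-terminal", and the toy sequence are all assumptions made for the example.

```python
import re

# H, any two residues, then E and D. The zero-width lookahead lets
# overlapping occurrences be reported as separate hits.
HXXED = re.compile(r"(?=(H..ED))")


def find_hxxed(seq):
    """Return the 1-based start positions of every HxxED match in seq."""
    return [m.start() + 1 for m in HXXED.finditer(seq.upper())]


def has_dict_like_signature(seq, n_fraction=0.5):
    """Crude check: HxxED starting in the N-terminal part, Cys in the rest.

    n_fraction is an arbitrary cut-off chosen for this sketch; it is not
    derived from any alignment of real DICT domains.
    """
    seq = seq.upper()
    split = int(len(seq) * n_fraction)
    n_motif = any(pos <= split for pos in find_hxxed(seq))
    c_cysteine = "C" in seq[split:]
    return n_motif and c_cysteine


# Toy sequence invented for illustration, not a real DICT domain.
toy = "MKHAAEDLLSGGTTVLLAAGCKK"
print(find_hxxed(toy), has_dict_like_signature(toy))
```

A pattern match of this kind can only flag candidates; whether a hit belongs to the subfamily would still need to be judged from an alignment.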
The STAS domain
This domain shows a distinctive α/β fold with a core sheet of 4-5 strands (Fig. 3) and was first discovered in the
intracellular domains of sulfate transporters and the regulator anti-sigma factor antagonists (Anantharaman and
Aravind, 2000). The characterized versions of the STAS domain bind GTP or ATP and might even hydrolyze it at a
low rate (Avila-Perez et al., 2009; Kovacs et al., 1998). Studies on the Bacillus YtvA protein, which combines light-
sensing PAS and GTP-binding STAS domains in a single polypeptide, indicate that the molecule is likely to
function as a dual sensory switch that requires both light and NTP to activate transcription via σB (Avila-Perez et al.,
2009). The STAS domain is also combined with GAF, S/T/Y-kinase, PP2C phosphatase, receptors with extracellular
CACHE and HAMP domains and Zinc-ribbon domains, suggesting that it might function as a potential nucleotide
binding sensory switch in a number of distinct signaling contexts (Aravind and Koonin, 2000).
Domains with the periplasmic-binding protein-type folds
Two topologically distinct, but evolutionarily distantly related folds, PBP1 and PBP2, are prototyped by the
periplasmic solute binding proteins of bacteria that function as partners of the ABC transporters (Tam and Saier,
1993). Both these domains have two α/β sub-domains that are topologically intertwined and constitute a ligand-
binding pocket at the interface between the two subdomains (Fig. 3). The PBP1 and PBP2 domains are extremely
versatile ligand-binding domains that recognize a wide range of substrates including sugars, amino acids, the
bacterial autoinducer-2 (AI-2; furanosyl borate diester), amides, peptides, metal ions, inorganic ions such as
phosphate and sulfate, purines and polyamines (Stock, 2006; Tam and Saier, 1993). Not surprisingly, several
extracellular and intracellular versions of the PBP1 and PBP2 domains have been recruited as sensory domains of
a number of distinct bacterial and eukaryotic signaling systems. Examples of the extracellular versions include the
LuxP protein, a PBP1 domain protein, which binds AI-2 and transmits the signal of its presence to the histidine
kinase LuxQ (Stock, 2006). PBP1 domains are also found in the extracellular ligand-sensing modules of certain
bacterial acetylcholine receptor type ligand-gated ion channels (ART-LGIC; also known as Cys-loop ion channels)
and chemotaxis signal transducers (Tasneem et al., 2005). Intracellular versions of the PBP1 domain constitute the
sensor domain of the LacI family of single component transcription regulators, such as the Lac repressor
(Anantharaman et al., 2001; Daber et al., 2007). The intracellular versions of the PBP2 superfamily constitute the
ligand binding domains of the LysR family transcription factors, which are one of the major bacterial one-
component systems (Anantharaman et al., 2001; Ezezika et al., 2007; Sainsbury et al., 2009). Such PBP2 domains
are also predicted to associate with bacterial ART-LGIC ion channels to regulate their activity (Tasneem et al., 2005).
In eukaryotes the PBP1 domains constitute the ligand-sensing domains of cNMP generating receptors such as the
atrial natriuretic factor receptor, the metabotropic glutamate receptors, vertebrate taste receptors and one of the
domains of the NMDA-type ligand-gated ion channels (Lee et al., 2008b; Tasneem et al., 2005). The PBP2 domain
is also found in NMDA-type ligand-gated ion channels and the membrane spanning segments of the channel are
inserted into it (Fig. 1).
The Anabaena sensory rhodopsin-associated homology (ASRAH) domain
This recently recognized superfamily has an 8-stranded β-sandwich fold and is predicted to bind sugars or their
derivatives in a cleft formed at the dimer interface (De Souza et al., 2009). The intracellular versions of this domain
occur as small proteins that form tetramers typified by the Anabaena sensory rhodopsin signal transducer
(ASRT)(Vogeley et al., 2007). These proteins are likely to regulate signaling via the Anabaena rhodopsin by sensing
sugars or their derivatives. Most ASRT homologs are found in organisms lacking a sensory rhodopsin, and are likely
to regulate the activity of sugar metabolism enzymes such as sugar isomerases, aldolases and sugar kinases by
directly binding sugar (De Souza et al., 2009). This domain differs from most other small molecule sensor domains
in functioning as a standalone module that is not fused to any other signaling domains.
The transporter-OB (T-OB) domain
This domain contains a specialized version of the β-barrel OB-fold which occurs as an obligate dimer on account of
swapping of the C-terminal strands of the fold between monomers (Aravind and Koonin, 2000) (Fig. 3). The
prototypical T-OB domain is found in the ModE single-component transcription factor that combines two T-OB
domains with a N-terminal DNA-binding winged HTH domain. This version of the T-OB domain senses molybdate
and regulates the expression of molybdenum metabolism genes (Schuttelkopf et al., 2003). Other versions of the T-OB
domain are also found independently fused to the C-termini of a diverse range of ABC transporters such as the
sulfate transporter CysA, sugar transporters such as MalK and MsmX, glycerol, putrescine, molybdate and iron
transporters (Koonin et al., 2000). In these cases the T-OB domain is predicted to function as an inbuilt sensor which
detects the concentrations of the respective solutes translocated by these transporters and thereby regulates their
activity (Koonin et al., 2000).
The 4-helical up-and-down bundle fold
The 4-helical up-and-down bundle is a common structural motif that is found in at least three distinct superfamilies
of sensor domains, namely the 4HB or the aspartate-receptor type ligand-binding domain, the hemerythrin and the 4-
helical cytochromes (Ulrich and Zhulin, 2005). The 4HB domain is one of the most prevalent extracellular sensory
domains of the chemotaxis transducers, but it also occurs as a standalone cell-surface receptor (e.g. cg3044 from
Corynebacterium; gi: 41326927) or combined in the same polypeptide with effector domains of the two component
system and the cyclic nucleotide/diguanylate second messenger system (Ulrich and Zhulin, 2005). Though
topologically simple (Fig. 3), the 4HB domain has been extensively used in bacterial signaling systems for the
recognition of a diverse range of ligands such as different amino acids, sugars, metals and peptides (Ulrich and
Zhulin, 2005). Profile-profile comparisons with the HHpred program suggest that the so-called “CHASE3” domain,
which is found in several signaling proteins (Zhulin et al., 2003), is likely to be a version of the 4HB superfamily.
The NarX/NarQ domain, which is the nitrate and nitrite sensor module of the NarX kinase (Cheung and
Hendrickson, 2009) and several other bacterial signaling receptors with other intracellular effector domains is
another widespread divergent clade of the 4HB superfamily. The hemerythrin superfamily chelates two iron ions by
means of multiple conserved histidine and carboxylate ligands (Fig. 3). A version of this domain has been found to
function as an iron-dependent O2 sensor in the Desulfovibrio vulgaris chemotaxis transducer DcrH (Xiong et al.,
2000). Other members of this superfamily are found combined to the GGDEF domain (e.g. CPS_3631, gi: 71279208
in Colwellia), suggesting a cyclic diguanylate-dependent signal transduction of the ambient O2 (Fig. 1). The 4-
helical cytochrome domains bind heme and are widely used in different electron transfer chains, e.g. cytochrome C’
and cytochrome b562 (Fig. 3). Members of this superfamily appear to have been adapted as heme-dependent sensor
domains in certain bacterial lineages. One previously under-appreciated example is a membrane-associated signaling
protein found in planctomycetes, which combines an N-terminal PTPase-type phosphatase domain with a C-terminal
4-helical cytochrome domain (e.g. RB3293, gi: 32472429 from Rhodopirellula; Fig. 1).
The NIT domain
This is a large all α-helical fold with 8-10 helices, which is found either as the extracellular sensor domain of
bacterial two-component systems, cyclic diguanylate-dependent systems and chemotaxis signal transducers or as
intracellular sensor domains of the AmiR/ANTAR-type transcription antitermination regulators (Shu et al., 2003). In
eukaryotes it appears as the extracellular domain of a group of animal receptor guanylyl cyclases (Anantharaman et
al., 2006). Currently no structures are available for the NIT domain, but based on the predicted secondary structure
it appears likely to adopt a toroidal or multihelical bundle configuration. A functionally characterized protein with
this domain, NasR, combines the NIT domain to the RNA-binding AmiR/ANTAR domain and mediates transcription
anti-termination to allow expression of the nitrate utilization genes (Shu et al., 2003). This process is dependent on
the sensing of nitrate and nitrite ions by the NIT domain (Wu et al., 1999). However, it remains unclear if any of the
extracellular versions similarly sense nitrate or nitrite. It is also not certain if the NIT domain directly binds these
ions or senses them via a bound prosthetic group.
Miscellaneous domains functioning as the sensory units of one-component systems
By definition, one-component systems contain “inbuilt” sensory domains. Several of these domains have a limited
distribution outside of one component systems (Fig. 1); hence, we consider some prominent examples of such
domains together in this subsection even though they possess very distinct structures. The ATP-cone domain is a
cone-shaped 4-helical ATP/ADP binding domain (Fig. 4). It is found in the allosteric regulatory domains of the
type-I and type-III ribonucleotide reductase and the archaeal/bacterial 2-phosphoglycerate kinase which is involved in
the synthesis of 2-3 phosphoglycerate, a thermoprotective compound (Anantharaman and Aravind, 2000). The ATP-
cone domain is also fused to a DNA-binding Zn-ribbon domain in the one-component transcription factor
NrdR/YbaD (Anantharaman and Aravind, 2000), where it functions to regulate the transcription of the nucleotide
synthesis genes such as ribonucleotide reductase by sensing nucleotide concentrations (Torrents et al., 2007). The
CCTBP domain is a recently described domain with the same fold as the TATA-box binding protein but containing
a conserved cysteine. The CCTBP domain is part of a number of sulfur-transfer systems in conjunction with the E1-
like adenylation domain (Burroughs et al., 2009). Some archaeal one-component transcription factors contain a
CCTBP protein suggesting that it might function as either a redox sensor by means of the thiol group of the
standalone cysteine or else detect elemental sulfur. The FadR domain is a ligand-binding domain of a sub-family of
GntR transcription factors typified by the bacterial fatty acid-sensing regulator FadR (van Aalten et al., 2001). It is a
6-helical domain and the characterized members are known to bind fatty acyl-CoAs such as myristoyl-CoA. It is
possible that members of this fold show greater ligand diversity than is currently known. The AlcR-N domain is
typified by the sensor domain that recognizes the siderophore alcaligin in the Bordetella transcription factor AlcR
(Aravind et al., 2005). This domain is always fused to a C-terminal AraC HTH domain and shows lineage-specific
expansions in certain cyanobacteria. It appears to be a composite domain comprised of a N-terminal all β-strand sub-
domain with the double-stranded β-helix fold and a C-terminal all α-helical sub-domain. The majority of versions of
this domain contain a conserved cysteine that could be a site for covalent attachment of a prosthetic group (Fig. 1).
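The one-component architectures described in this survey can be written down as simple domain lists, which makes the idea of an “inbuilt” sensor concrete. The sketch below is only illustrative: the domain labels, the two lookup sets and the rule "at least one sensor plus at least one DNA-binding domain in the same polypeptide" are simplifying assumptions, and the N- to C-terminal order is only meaningful where the text states it (TyrR, ModE and AlcR); the order given for NrdR is assumed.

```python
from collections import Counter

# Architectures transcribed from the descriptions in the text.
ARCHITECTURES = {
    "TyrR": ["ACT", "PAS", "AAA+", "HTH"],
    "ModE": ["wHTH", "T-OB", "T-OB"],
    "AlcR": ["AlcR-N", "AraC-HTH"],
    "NrdR": ["Zn-ribbon", "ATP-cone"],  # domain order assumed for this sketch
}

# Hypothetical label sets used only by this example.
DNA_BINDING = {"HTH", "wHTH", "AraC-HTH", "Zn-ribbon"}
SENSOR = {"ACT", "PAS", "T-OB", "AlcR-N", "ATP-cone"}


def is_one_component(domains):
    """True when a sensor and a DNA-binding domain share one polypeptide.

    This ignores one-component systems whose output domain is not
    DNA-binding, so it is a simplification.
    """
    present = set(domains)
    return bool(present & SENSOR) and bool(present & DNA_BINDING)


def sensor_counts(domains):
    """Count how many copies of each sensor domain an architecture has."""
    return Counter(d for d in domains if d in SENSOR)


for name, domains in ARCHITECTURES.items():
    print(name, "-".join(domains), is_one_component(domains),
          dict(sensor_counts(domains)))
```

Keeping the architecture as an ordered list rather than a set preserves duplicated domains, such as the two T-OB domains of ModE.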
Globular sensor domains with relationships to catalytic domains
A number of sensor domains in bacterial proteins show relationships to catalytic domains. In particular, several one-
component system transcription factors combine catalytic domains of the enzymes of the pathways which they
transcriptionally regulate with a HTH domain. A classical example of this is the biotin repressor BirA, which
contains a biotin ligase domain involved in biotin adenylation and a DNA-binding winged HTH domain in the same
polypeptide (Hopwood, 2007). Here the enzymatic domain senses biotin to regulate transcription via the wHTH
domain. Other examples of such combinations of enzymatic domains to the HTH domains include the NadR
regulator of NAD metabolism genes (combined to a class-I tRNA synthetase-like nucleotidyltransferase domain),
sugar isomerase (SIS) domains, sugar kinase domains, pyridoxal phosphate-dependent aminotransferase domains
and serine protease domains in the DNA damage response regulator LexA (Aravind et al., 2005). In other instances
a sensor domain of a signaling system is related to an enzymatic domain, but is catalytically inactive. Thus, these
domains are bona fide small molecule binding domains that might have diverged from enzymatic domains. Such
domains are also most commonly encountered in one-component systems, though some versions have been widely
used in other signaling contexts. Below we discuss a few prime examples of such sensor domains.
The double-stranded β-helix
This domain (also called the cupin domain) is a distinctive all-β-strand fold that is found in an ancient superfamily
of enzymes that includes sugar isomerases, diverse dioxygenases, histone demethylases of the jumonji family,
isopenicillin synthase, the DNA-repair enzyme AlkB, and protein hydroxylases (Dunwell et al., 2004; Iyer et al.,
2009) (Fig. 4). The small-molecule sensor versions of this domain are prototyped by the sugar-binding domain of
the arabinose operon regulator AraC (Soisson et al., 1997). All members of this fold are characterized by a run of at
least 8 antiparallel strands, with pairs of them linked by short turns to constitute a single coil of the β-helix
(Dunwell et al., 2004) (Fig. 4). The β-helix contains an internal cavity which accommodates a diverse set of ligands.
The catalytically active members of the family have two conserved residues from the N-terminus of the fold (at least
one of which is a histidine) and one from the C-terminus (usually a histidine) that together constitute the substrate-
binding site. In several sensor versions of the fold the N-terminal conserved residues might be lost but they bind the
substrate in a manner identical to the catalytic versions (Anantharaman et al., 2001; Dunwell et al., 2004; Soisson et
al., 1997). Like AraC, most sensor versions of this fold are fused to a DNA-binding HTH domain, and based on the
predicted operons to which they belong a large fraction of these are predicted to recognize sugars (Aravind et al.,
2005). It is quite possible that some of the versions fused to HTH, as well as several of the standalone versions of
this fold that are not fused to HTH domains, recognize other types of ligands. The DSBH domain fused to the anti-
sigma factor domain in the regulator ChrR from Rhodobacter sphaeroides uses the same active site to bind a Zn2+
ion with which it senses singlet oxygen to control transcription (Campbell et al., 2007).
The UTRA domain
The archetypal version of this domain is the catalytic domain of the bacterial chorismate lyases (UbiC) and is
comprised of a duplication of a core 3-stranded β-sheet unit with a bihelical “clasp” (Aravind and Anantharaman,
2003) (Fig. 4). The sheet forms the floor while bihelical clasps form the walls of the ligand-binding pocket. The
non-catalytic versions of the fold are the ligand-binding domains of a large subfamily of GntR-type transcription
factors prototyped by the HutC/FarR-like transcription factors. The UTRA domain might be critical for the sensing
of a diverse array of small molecule stimuli by these transcription factors, such as histidine (HutC), fatty acids
(FarR), sugars (TreR) and alkylphosphonate (PhnF) (Aravind and Anantharaman, 2003; Gebhard and Cook, 2008;
Gorelik et al., 2006).
Sensor domains with the ISOCOT fold
This fold is a complex α/β fold that is found in catalytic domains with diverse catalytic activities such as sugar
isomerases, sugar phosphate lactonases, aminosugar deaminases, acetyl-CoA transferases and
methenyltetrahydrofolate synthetase (Fig. 4). The sensor domains of the DeoR and the SorC families of transcription
factors are catalytically inactive versions of this fold (Anantharaman and Aravind, 2006). While catalytically
inactive, they retain the substrate-binding site typical of the catalytic members of the fold. The ISOCOT fold domain
of the DeoR transcription factors senses a diverse set of sugar derivatives such as deoxyribose nucleoside (DeoR), tagatose
phosphate (LacR), galactosamine (AgaR), myo-inositol (Bacillus IolR) and L-ascorbate (UlaR) (Anantharaman and
Aravind, 2006; Garces et al., 2008). Similarly, the ISOCOT domains of the SorC family also specialize in sugar-
sensing, e.g. L-sorbose (SorC), erythritol (EryD), D-arabitol (DalR) and fructose 1,6 bis-phosphate (CggR).
However, evolutionary analysis indicates that the ISOCOT sensor domains of these two distinct families of
transcription factors were independently derived from distinct enzymatic versions (Anantharaman and Aravind,
2006).
The NTF2 fold
This domain is found in a variety of enzymes such as scytalone dehydratase, limonene-1,2-epoxide hydrolase,
polyketide cyclases and ketosteroid isomerase (Andreeva et al., 2008). A bacterial non-enzymatic version of the
NTF2 fold is found in the orange carotenoid-binding protein, which is deployed by cyanobacteria to protect
themselves against the excess energy absorbed during photosynthesis (Wilson et al., 2007). Here the NTF2 fold
domain binds a carotenoid moiety (Fig. 4). A related version of the NTF2 fold is found fused to the C-termini of a
widespread subfamily of ECF (extracellular function) sigma factors prototyped by the Mycobacterium tuberculosis
stress response transcription factor SigG (Lee et al., 2008a). ECF sigma factors are known to regulate the expression
of genes associated with extracellular or cell surface processes in response to diverse stimuli (Staron et al., 2009). In
light of this, it is conceivable that the NTF2 domains of the SigG-type ECF sigma factors are sensor domains that
help these transcription factors respond to stimuli by binding small molecules.
The MEDS domain
This domain is prototyped by the dichloromethane-sensing domain of the one component transcription factor DcmR
from Methylobacterium (La Roche and Leisinger, 1991). It also occurs fused to histidine kinase domains (e.g. in the
anti-sigma factor PrsR/RsbA from actinobacteria that regulates stress response and aerial mycelia growth (Lee et al.,
2004)) and as a standalone protein (Anantharaman and Aravind, 2005). The available evidence suggests that it is
likely to functionally collaborate with the PocR domain (see above) in certain organisms in binding hydrocarbon
derivatives (Anantharaman and Aravind, 2005; La Roche and Leisinger, 1991). Sequence profile analysis indicates
that the MEDS domain is a catalytically inactive version of the P-loop NTPase domain of the RecA (DNA
recombinase) superfamily. The MEDS domain does not retain the nucleotide-binding active site of the P-loop
NTPase fold. Instead it has a set of polar conserved residues in place of the P-loop motif which potentially constitute
a ligand-binding site. However, the characteristic feature of the MEDS domains is two highly conserved histidines, a
glutamate and an arginine that are located opposite to the conventional P-loop NTPase active site. These might
constitute a second ligand-binding site that potentially coordinates a metal or a prosthetic group such as heme
(Anantharaman and Aravind, 2005).
The BLUF domain
This domain is a catalytically inactive version of the acylphosphatase domain that binds FAD (Gomelsky and Klug,
2002). It adopts an α+β fold similar to the ferredoxin domains and binds its FAD ligand between the two helices of
this fold (Fig. 4). The BLUF domain was first characterized in the AppA protein from Rhodobacter sphaeroides
which is required for sensing blue light and regulating photosynthesis-related genes (Braatsch et al., 2002). It is
likely that the domain senses both light and redox by means of its FAD ligand. The domain has been found
combined with GGDEF, EAL, cNMP cyclase and AraC-like HTH domains suggesting that it transmits sensory
inputs through both second-messenger and one-component systems. It has also been found in photosynthetic
eukaryotes such as Euglena where it might mediate phototactic responses via associated cNMP cyclase domains
(Han et al., 2004).
The cryptochrome/photolyase-like photosensors
The photolyase-like domains contain a special version of the α/β Rossmannoid fold known as the HUP (HIGH
nucleotidyl transferase, UspA, Photolyase) fold (Aravind et al., 2002). Members of the photolyase-like superfamily bind
FAD, and the classical versions function as DNA repair enzymes in the light-dependent repair of cyclobutane
pyrimidine dimers (Lin and Todo, 2005). Members of a clade of this superfamily in plants and animals function as blue light
sensors and are known as cryptochromes (Lin and Todo, 2005). Several bacteria (e.g. cyanobacteria and Vibrio) too
contain representatives of the cryptochrome domain known as CRY-DASH proteins. Studies in cyanobacteria
suggest that they might function as light-dependent transcription regulators (Brudler et al., 2003). However, their
functions in other bacteria remain poorly understood and they could also function as redox sensors by means of their
FAD cofactor.
Transmembrane sensor domains
7TM receptors
While 7TM receptors were initially characterized as the mainstay of sensory signaling in animals and fungi, their
role in bacteria has only recently become clear. The first prokaryotic 7TM proteins to be characterized were the
sensory rhodopsins of halophilic archaea, which are paralogs of their ion transporting rhodopsins (Purcell and
Crosson, 2008; Spudich and Luecke, 2002). Like all other rhodopsins these proteins contain a retinylidene prosthetic
group combined via a Schiff’s base to a lysine within one of the TM segments of the protein (Fig. 4). These archaeal
sensory rhodopsins, SRI and SRII, have been shown to function respectively as attractant and repellent phototaxis
receptors for orange and blue-green light. They are believed to transmit the conformational change in the
retinylidene upon photoreception through other membrane proteins to the motility complex of the organism (Jung et
al., 2003; Mukohata et al., 1999). A sensory rhodopsin was also found in the cyanobacterium Anabaena (Jung et al.,
2003), which shows a regulatory interaction with the potential sugar-binding ASRAH domain protein ASRT (see
above). Subsequently, other sensory rhodopsins have been found in bacteria and they are likely to function
independently of ASRT (De Souza et al., 2009). One of these found in Salinibacter ruber of the bacteroidetes clade
is encoded in an operon with a chemotaxis signal transducer (SRU_2579, gi: 83815419 and SRU_2580, gi:
83815551), whereas another found in the cyanobacterium Cyanothece sp. PCC 7425 is encoded in an operon with a
photosystem-I reaction center subunit PsaK (Cyan7425_0874, gi: 220906310 and Cyan7425_0875, gi: 220906311).
The former might signal via the chemotaxis signal transducer, whereas the latter might directly signal to the
photosystem-I complex.
Another major family of bacterial 7TM proteins is the 7TMR-DISM family (7TM receptors with diverse
intracellular signaling modules), which is found across the bacterial tree, with expansions in certain lineages such as
Leptospira and Cytophaga (Anantharaman and Aravind, 2003). Representatives of these 7TM receptors are
combined with diverse intracellular signaling domains belonging to the two component system, the GGDEF and
EAL, cNMP cyclases, chemotaxis transducers, HTH and Zn-ribbon domains (Fig. 1). The combination with the HTH
and ZnR domains suggests that these receptors could potentially constitute unusual membrane-associated one-
component systems. Their extracellular regions typically contain one of two all-β domains, 7TMR-DISMED1 and
7TMR-DISMED2 (7TMR-DISM extracellular domains 1 and 2). These domains are structurally related to the
sugar-binding domains of glucosidases and glucuronidases, and are thus likely to function as sugar sensors in
conjunction with the 7TM domain (Anantharaman and Aravind, 2003). Consistent with this they are expanded in
certain lineages that are known to have an active carbohydrate metabolism like Cytophaga (Anantharaman and
Aravind, 2003). Architecturally these 7TMR-DISMs resemble the animal taste and glutamate receptors in having a
distinct extracellular ligand-binding domain as opposed to the classical 7TM domains, like rhodopsin, which bind a
ligand directly. Another family of 7TM receptors in bacteria, which has lower architectural diversity than the
7TMR-DISMs, is the 7TMR-HD (7TM receptors with intracellular HD domains) family. The intracellular HD
domain of these proteins is likely to function as a phosphodiesterase. The extracellular domain of the 7TMR-HD is a
domain enriched in polar residues with several α-helices which might function as a sensor along with the 7TM
domain. The 7TMR-HD are nearly always encoded in a distinctive operon, which contains genes for a
diacylglycerol kinase, a PhoH-like P-loop NTPase and a YbeY-like metal-dependent hydrolase that might function
as a phospholipase C (Anantharaman and Aravind, 2003). Based on this it appears likely that the 7TMR-HD
proteins regulate a previously unknown signaling pathway dependent on lipid modification. Evidence from E. coli
suggests that this function might be required in heat shock response (Rasouly et al., 2009).
Other multi-TM receptor domains
Bacteria also possess several unique multi-TM receptor domains that are not found in the other superkingdoms of
life. A widespread multi-TM receptor domain is the 5TMR-LYT (5 transmembrane receptors of the LytS-YhcK
type), which is combined with several intracellular signaling domains such as the histidine kinase, GGDEF and EAL, the
chemotaxis signal transducer, and also intracellular sensory domains such as PAS and GAF (Anantharaman and
Aravind, 2003). The prototypical member of this family is the receptor domain of the LytS histidine kinase
from Gram-positive bacteria such as Staphylococcus aureus (Patton et al., 2006). This protein is believed to sense
changes in the cell-surface to regulate murein hydrolases and autolysis. Based on some recent studies on LytS it
appears possible that the 5TMR-LYT might be required to sense membrane potential (Patton et al., 2006). The
related 5TMR-LYT receptor with an intracellular GGDEF domain, namely GdpS from S. aureus, might also have a
comparable sensory role (Holland et al., 2008). The other widespread multi-TM receptor is the 8TMR-UT domain
(8-transmembrane receptor UhpB type) and it displays a similar set of intracellular domain architectures as the
5TMR-LYT domain (Anantharaman and Aravind, 2003). The archetypal member of this superfamily is the receptor
domain of the E. coli histidine kinase UhpB. Studies on this protein indicate that the 8TMR-UT is an intramembrane
sensor that functions in conjunction with other membrane-embedded proteins, such as UhpC, to recognize
extracellular solutes. In the case of the UhpB system, the UhpC protein initially binds glucose 6-phosphate and upon
doing so associates with the 8TMR-UT protein which transmits the signal to its histidine kinase domain (Verhamme
et al., 2001). Thus, the 8TMR-UT might be conceived as a sensor of protein-protein interactions in the membrane.
The MHYT motif is comprised of two TM domains, with one of them bearing the characteristic signature of the 4
amino acids after which it is named (Galperin et al., 2001; Santiago et al., 1999). The MHYT motifs typically occur
in three copies per protein constituting a 6-TM domain. In addition to combinations with two-component and cyclic
diguanylate signaling domains, it is also combined directly with DNA-binding HTH domains suggesting that it
might also be part of a membrane-associated one-component system. The presence of the conserved histidines in
this domain indicates that it might function in coordinating a metal or heme prosthetic group. Consistent with the
latter suggestion, the MHYT domain has been found in the carbon monoxide-sensing protein from Oligotropha
carboxidovorans (Santiago et al., 1999). A MHYT-containing receptor with intracellular GGDEF-EAL domains in
Pseudomonas aeruginosa might activate alginate formation by similarly sensing gaseous ligands (Hay et al., 2009).
Ligand-gated ion channel domains
The ligand-gated ion channels were originally characterized in animal neural signaling pathways as the receptors of
a wide range of neurotransmitters such as acetylcholine, glutamate, serotonin, glycine, gamma-aminobutyric acid,
histamine and also of Zn2+ ions. Of these one of the major families is the ART-LGIC (also called Cys-loop channels
in animals) family that serves as receptor for all of these ligands. The discovery of bacterial homologs of these ion
channels indicated that such a signaling mechanism might be prevalent across diverse bacteria (Tasneem et al.,
2005). The ligands of the bacterial ART-LGICs remain unclear, though they could potentially function as amino
acid sensors. One version from the cyanobacterium Gloeobacter has been proposed to be a sensor of pH (Bocquet et al.,
2009; Hilf and Dutzler, 2009) although its actual ligand might still be unknown. The prokaryotic versions show
greater architectural diversity than the eukaryotic versions in showing fusions to the PBP1 and CACHE domains.
This suggests that they might be potentially responsive to multiple ligands (Tasneem et al., 2005). The ART-LGIC
channel module is comprised of two domains – 1) the N-terminal β-sandwich domain which pentamerizes and
contains a ligand-binding pocket at the dimer interface and 2) the C-terminal 4-TM domain that forms the pore of
the ion-channel (Fig. 4). Residues in the 4-TM domain determine the ion selectivity of the channel and based on this
it is likely that different bacteria contain either anionic or cationic channels (Bocquet et al., 2009; Hilf and Dutzler,
2009; Tasneem et al., 2005). Unlike their eukaryotic counterparts, the bacterial ART-LGICs lack the conserved
cysteines that form disulfide bonds in their ligand-binding domain. Diverse bacteria also possess homologs of the
second type of eukaryotic ligand-gated ion-channel, the ionotropic glutamate receptors (NMDA-type ligand-gated
channels) (Chen et al., 1999; Kuner et al., 2003). The bacterial versions differ from their animal counterparts in
having a compact architecture with just the core ligand-binding and ion-channel domains. This core shows an
unusual architecture with a ligand-binding PBP2 domain into which are inserted two TM segments. C-terminal to
the PBP2 domain there is one further transmembrane segment (Fig. 1). The channel is formed by a tetramer of such
subunits. The characterized bacterial ionotropic glutamate receptors appear to have the same ligand specificity as
their eukaryotic counterparts and sense L-glutamate via their PBP2 domain (Lee et al., 2008a). Both ART-LGIC and
the ionotropic glutamate receptors are found in cyanobacteria along with a variety of other ion channels suggesting
that they might possess a rather well-developed electrochemical signaling system.
The activity of diverse ion channels might also be regulated by a variety of other intracellular sensor domains such
as the TrkA domain which senses redox, the cyclic nucleotide-binding domain, which gates members of the
mechanosensory ion channel family by binding cNMP, and the CBS domain, which is also likely to bind cNMP or
other adenosine derivatives (Anantharaman et al., 2001; Bateman, 1997; Ignoul and Eggermont, 2005).
Sensor domains of second messenger systems
These domains are not biochemically different in any fundamental sense from the other above-described sensory
modules. However, they are mechanistically distinct in that they sense a second messenger generated by a signaling
system that senses the primary signal. Certain versions of the well-known sensor domains such as the PAS, GAF
and CBS domains act as sensors of second messengers such as cNMP (see above) (Anantharaman et al., 2001;
Martinez et al., 2002). Additionally, cNMPs are recognized by dedicated cyclic nucleotide-binding domains
(cNMPBD or CAP). Thus far only one cyclic diguanylate-binding domain, namely the PilZ domain, has been
characterized (Amikam and Galperin, 2006). Currently, there are no known dedicated families of (p)ppGpp binding
domains. In this section we only consider the dedicated secondary messenger-binding domains.
The cNMPBD or CAP domain
The cNMPBD, which binds cyclic nucleotides, is prototyped by the ligand-binding domain of the catabolite
activator transcription factor Crp and related proteins (Kannan et al., 2007; Spiro, 1994). The domain adopts a
double-stranded β-helix fold but shows no detectable sequence similarity with the above-described enzymatic and
sugar-binding versions of the DSBH fold (Andreeva et al., 2008). However, like those versions, it binds the
cNMP ligand within the cavity formed by the β-strands of the DSBH (Fig. 4). The cNMP specificity is conferred by
a distinctive motif present in a subset of the cNMPBDs which contains a conserved arginine that recognizes the
phosphate of the cNMP moiety (Kannan et al., 2007). The cNMPBD is combined with a wide range of other
signaling domains belonging to the one-component, two-component, cyclic diguanylate, ion channel, S/T/Y
phosphorylation systems. Additionally, it is also fused to domains not directly related to any signaling system such
as ABC-transporters (Kannan et al., 2007). These architectures indicate that the cNMPBD confers allosteric
regulatory control to a wide range of processes by sensing cNMP. Versions of the cNMPBD that lack the cNMP
recognition motif found in the canonical versions have been shown to bind ligands unrelated to the second
messenger sensing system. These include the cNMPBDs in the transcription factors like Rhodospirillum rubrum
CooA which contain a conserved histidine in place of the conserved phosphate-binding arginine. These versions
instead coordinate a heme moiety by means of which these one component systems sense carbon monoxide
(Lanzilotta et al., 2000). The cNMPBD of Desulfitobacterium dehalogenans CrpK contains an asparagine in the
same position, by means of which it senses its halogenated growth substrate chlorophenolacetic acid (Joyce et al.,
2006).
The PilZ domain and cyclic diguanylate signaling
The PilZ domain is the only currently known cyclic diguanylate-binding domain (Amikam and Galperin, 2006). The
PilZ domain displays the all-β split-barrel fold with 6 β-strands (Fig. 4). The cyclic diguanylate is bound on a cleft
formed by two extended loops on one side of the barrel (Benach et al., 2007). The PilZ domain is best characterized
in the bacterial cellulose synthases (e.g. from Gluconacetobacter xylinus) where it acts as a binding site for the
allosteric activator of these enzymes, cyclic diguanylate (Weinhouse et al., 1997). This is consistent with the view
that cyclic diguanylate signaling is a major player in the transition from the single-celled motile state to the
multi-cellular sessile state wherein cells are linked by extracellular matrices such as cellulose. The PilZ domain is
also combined with an NtrC-like AAA+ ATPase, chemotaxis signal transducer, histidine kinase, S/T/Y kinase and
condensation domains of non-ribosomal peptide and polyketide synthesis systems. These architectures indicate that
the PilZ domain might play a role in the cross talk between diguanylate signaling and other signaling systems and
also regulation of the production of secondary metabolites such as peptides or polyketides. The PilZ domain is not
very rich in terms of architectural diversity and is relatively under-represented in numerical terms (Supplementary
material). This is somewhat surprising given the number and diversity of predicted cyclic diguanylate cyclases
encoded by bacterial genomes and processes they regulate (Benach et al., 2007; Ryjenkov et al., 2006). It is hence
possible that some of the other sensor domains such as PAS or GAF might also bind cyclic diguanylate. Certain
GGDEF domains in Gram-positive bacteria, e.g. GdpS of S. aureus, do not appear to have any diguanylate cyclase
activity (Holland et al., 2008). PilZ domains are also not known in S. aureus, suggesting that the GGDEF domains
could potentially generate as yet unknown second messengers. Alternatively, as shown in E. coli, certain GGDEF-
EAL domain proteins might function in sequestering small RNAs and regulating their stability (Suzuki et al., 2006).
Given the relationship between the GGDEF domains and the DNA and RNA polymerases (Makarova et al., 2002) it
would be of interest to see if these bacterial RNA-binding GGDEF domains might act as RNA polymerases or
nucleotidyltransferases in a distinct regulatory pathway.
Domain-architectural syntax of signaling proteins
Proteins with sensor domains show some of the greatest diversity in domain architectures among all bacterial
proteins. Signaling proteins with sensor domains are particularly prone to lineage-specific architectural diversity,
suggesting that sensor domains show a greater than average tendency for duplication, loss or evolutionary mobility
(Anantharaman et al., 2001). This tendency might be the result of multiple selective forces: 1) At a basic level there
might not be a strong constraint on the specific number of copies of a certain sensor domain in a given polypeptide.
Thus, the specific number of PAS domains might vary widely even between functionally equivalent or orthologous
proteins in different organisms. 2) Secondly, a number of sensor domains might be functionally equivalent, e.g.
completely different domains can bind ligands such as heme (Dias et al., 2006; Iyer et al., 2003; Moglich et al.,
2009b). Thus, non-homologous but functionally analogous domains could replace each other in comparable
signaling proteins without being selectively disadvantageous (Anantharaman et al., 2001). 3) The same sensory
input might be channeled through widely different downstream effectors (Fig. 1, 5). The choice of different
downstream effectors might be selected by the need for different strengths and types of signals depending on an
organism’s lifestyle. The availability of a large number of complete bacterial genomes allows us to explore this
phenomenon quantitatively. We have formerly devised and utilized a measure to describe the domain architectural
complexity of a class of proteins – the complexity quotient (Aravind et al., 2001; Burroughs et al., 2007b). It
captures two elements of domain architectural complexity, namely the number of domains and the number of
different types of domains in a polypeptide as a single number. The average complexity quotient can then be
calculated for a class of proteins in a given organism. When we plot the average complexity quotient of sensor
proteins for over 200 diverse prokaryotic species we observe a positive correlation with the number of sensor
proteins encoded by the genome (Fig. 6). Thus, the architectural complexity of sensor proteins scales as a logarithmic
function (r² = 0.78) with an increase in the number of different sensor proteins. The rise in architectural complexity is
rapid till around a count of 500 sensor proteins per genome is reached, followed by a more gradual increase
thereafter. This trend suggests that with increasing number of sensor proteins there might be greater opportunity for
recombination between them resulting in highly increased architectural complexity. However, after around 500
proteins there is a gradual saturation of the number of selectively advantageous combinations that can be achieved
(Fig. 6).
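As an illustration of how such a measure can be computed, the following minimal Python sketch assumes that the complexity quotient of a protein is the product of its number of domains and its number of distinct domain types. The text above states only that the quotient combines these two quantities into a single number, so the product form is an assumption made here; the domain labels and architectures are likewise hypothetical examples, not data from the study.

```python
from statistics import mean

def complexity_quotient(architecture):
    """Assumed form: (number of domains) x (number of distinct domain types).

    `architecture` is a list of domain names in N- to C-terminal order.
    """
    return len(architecture) * len(set(architecture))

def average_complexity_quotient(proteins):
    """Average the per-protein quotient over a class of proteins in one organism."""
    return mean(complexity_quotient(arch) for arch in proteins)

# Hypothetical sensor proteins from a single genome (illustrative labels only).
sensor_proteins = [
    ["PAS", "PAS", "HisKA", "HATPase"],  # 4 domains, 3 types
    ["GAF", "GGDEF"],                    # 2 domains, 2 types
    ["CACHE", "HAMP", "MCP"],            # 3 domains, 3 types
]

print(average_complexity_quotient(sensor_proteins))
```

Computing this average for each genome and plotting it against the number of sensor proteins per genome would give the kind of trend summarized in Fig. 6.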
Despite their bewildering architectural diversity a number of syntactical features can be discerned in the domain
architectures of sensor proteins. This suggests that there are strong evolutionary constraints on certain aspects of
their architecture. The most important of these constraints might be described as those specifying 1) the number, 2) the
position within a polypeptide and 3) the association with other domains in a polypeptide. In terms of number a
striking example is seen in the form of the differences between structurally related domains such as PAS, GAF,
CACHE and CHASE. The PAS domain is frequently found as multiple tandem repeats, the GAF might occur singly
or as dyads but rarely as multiple tandem repeats, the CACHE might occur singly or as dyads and the CHASE
always singly (Anantharaman and Aravind, 2000; Anantharaman and Aravind, 2001; Anantharaman et al., 2001;
Moglich et al., 2009b; Mougel and Zhulin, 2001). This suggests that their specific functional differences might have
an influence on the domain architectures in which they occur. Firstly, the CACHE and CHASE domains are
predominantly extracellular small molecule sensor domains – this implies that there may be no selective advantage
in them occurring in multiple copies. In contrast, PAS domains with a major role in dimerization-linked signal
transmission might be selected to occur as tandem repeats that regulate the threshold of stimulus for signal
transmission (Anantharaman and Aravind, 2000; Moglich et al., 2009b; Ponting and Aravind, 1997). Thus, while the
primordial PAS domain might have emerged as a small molecule sensor, it appears to have been subsequently
adapted to function in addition as a signal transmitter domain. In terms of position in a polypeptide, receptor
proteins are under a selective pressure to maintain a certain polarity of the sensor domain with respect to the
downstream signaling domains. This constraint of polarity is particularly noticeable in the case of membrane-
associated receptors that need to have an extracellular sensor domain (Anantharaman and Aravind, 2000;
Anantharaman and Aravind, 2001) (Fig. 1, 5). In terms of intracellular receptors we observe an interesting
difference: One-component systems show lesser constraint on the position of the sensor domain with respect to the
associated regulatory domains (Aravind et al., 2005) (Fig. 1, 5). This is probably because they are typically made up
of just two domains and the conformational change upon receiving the sensory input can be transmitted directly to
the regulatory domain. In contrast, intracellular receptors with effector domains such as those of the two component,
second messenger and chemotaxis signal transducer systems show a strong preference for the primary stimulus-
sensing domain to occur at the N-terminus (Anantharaman and Aravind, 2003; Anantharaman et al., 2006; Aravind
and Ponting, 1999; Galperin et al., 2001; Ulrich and Zhulin, 2005; Williams and Stewart, 1999). However, they
might be additionally regulated allosterically by sensor domains at the C-terminus, especially those that bind second
messengers (Fig. 1, 5) (Martinez et al., 2002). The primary reason for the polarity appears to be the signal transmitter
domains such as the S-helix and the HAMP that couple the sensor domain to the effector domains. These signal
transmitter domains appear to transmit the conformational change in a polarized fashion upon signal reception, thus
selecting for a strict order between the sensor and effector domains (Anantharaman et al., 2006; Manson, 2009;
Moglich et al., 2009b; Williams and Stewart, 1999).
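The constraints on number and position described above can be checked on a set of domain architectures with very little code. The sketch below is only illustrative: it assumes each protein is represented as a list of domain names in N- to C-terminal order, and the function names, domain labels and the sets SENSORS and EFFECTORS are hypothetical choices made for this example.

```python
from itertools import groupby

# Hypothetical classification of domain labels (illustrative only).
SENSORS = {"PAS", "GAF", "CACHE", "CHASE"}
EFFECTORS = {"HisKA", "GGDEF", "EAL", "MCP"}

def longest_tandem_run(architecture, domain):
    """Length of the longest stretch of consecutive copies of `domain`."""
    runs = [sum(1 for _ in group)
            for name, group in groupby(architecture) if name == domain]
    return max(runs, default=0)

def sensor_precedes_effector(architecture, sensors=SENSORS, effectors=EFFECTORS):
    """True if the first sensor domain is N-terminal to the first effector domain.

    Returns None when the protein lacks either a sensor or an effector domain.
    """
    sensor_pos = [i for i, d in enumerate(architecture) if d in sensors]
    effector_pos = [i for i, d in enumerate(architecture) if d in effectors]
    if not sensor_pos or not effector_pos:
        return None
    return min(sensor_pos) < min(effector_pos)

# Hypothetical architecture: three tandem PAS domains upstream of a histidine kinase.
example = ["PAS", "PAS", "PAS", "HisKA", "HATPase"]
print(longest_tandem_run(example, "PAS"), sensor_precedes_effector(example))
```

Only the first sensor is compared with the first effector, so architectures that carry an additional allosteric sensor domain at the C-terminus still count as obeying the polarity.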
The association between different domains across a whole functional category of proteins can be represented as a
network or an ordered graph (Fig. 5). In this graph the domains are the nodes and their co-occurrence in a
polypeptide is an edge joining the two domains, with the direction of the edge showing the order in which they
occur. Examination of such graphs helps in visualizing many of the key trends in the co-occurrence of domains
(Anantharaman et al., 2006). One feature that becomes immediately apparent is the relative promiscuity of most
sensor domains with respect to their co-occurrence with other types of downstream effector domains. Thus, in
different signaling proteins the same sensor domains might be combined with catalytic or effector domains of every
major signaling system (Fig. 5). This flexibility has enabled organisms to channel the sensing of the same stimulus
to different kinds of responses. However, the signal transmitter domains show much stronger constraints in their co-
occurrence trends with respect to sensor and downstream effector domains. Almost always, the HAMP co-occurs
with both a sensor domain and a downstream effector domain, and never occurs as a solo domain. In a few cases (e.g. the
aq_1825, a receptor with a PAS-fold heme-binding extracellular domain) the HAMP domain might occur with only
a sensor domain but no downstream effector domain (Aravind and Ponting, 1999). The S-helix also almost always
co-occurs with two other domains on either side of it in a polypeptide; however in this case the flanking domains
might be both sensor domains, both effector domains, or one of each. Interestingly, S/T/Y kinases are rarely
combined downstream of HAMP or S-helix modules (Anantharaman et al., 2006; Aravind and Ponting, 1999).
This suggests that there is a fundamental difference in signal transmission in the S/T/Y kinase-based signaling systems
with respect to all other systems.
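The graph representation described above can be sketched as follows. This is a minimal illustration rather than the procedure used to generate Fig. 5: it assumes that an edge is drawn only between domains that are immediate neighbors in a polypeptide, pointing from the N-terminal to the C-terminal domain, and the function names and example architectures are hypothetical.

```python
from collections import Counter

def architecture_graph(proteins):
    """Build a directed, weighted domain graph.

    Each key is an (upstream, downstream) pair of adjacent domains; the value
    counts how many times that ordered pair occurs across all proteins.
    """
    edges = Counter()
    for architecture in proteins:
        for upstream, downstream in zip(architecture, architecture[1:]):
            edges[(upstream, downstream)] += 1
    return edges

def downstream_partners(edges, domain):
    """Sorted list of domains found immediately C-terminal to `domain`."""
    return sorted({b for (a, b) in edges if a == domain})

# Hypothetical architectures in N- to C-terminal order (illustrative labels only).
proteins = [
    ["PAS", "HisKA", "HATPase"],
    ["PAS", "GGDEF"],
    ["CACHE", "HAMP", "MCP"],
    ["PAS", "HAMP", "HisKA", "HATPase"],
]

edges = architecture_graph(proteins)
print(downstream_partners(edges, "PAS"))
```

In such a graph a promiscuous sensor domain appears as a node with many distinct outgoing edges, whereas a signal transmitter domain such as HAMP would be expected to have both incoming and outgoing edges in nearly every protein in which it occurs.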
Phyletic patterns and demographics of sensor domains in bacteria
The availability of a large number of completed genomes allows us to examine the numerical trends in the
distribution of proteins with sensor domains across the bacterial superkingdom. The number of signaling proteins
encoded by an organism shows a positive correlation with proteome size (Fig. 6). However, within bacteria two
distinct trends are recognizable. In a subset of bacteria the number of proteins with sensor domains appears to scale
linearly with proteome size (Fig. 6). This linear trend is dominated at the high end by organizationally complex
myxobacteria such as Sorangium and Myxococcus, cyanobacteria like Anabaena, Nostoc and Acaryochloris, and
actinobacteria such as Streptomyces, Frankia and Rhodococcus. In another subset of bacteria the number of proteins
with sensor domains scales as a power law with proteome size (Fig. 6). At the high end of this trend are
alphaproteobacteria of the Rhizobiaceae clade such as Agrobacterium, Bradyrhizobium, Rhizobium,
gammaproteobacteria such as Pseudomonas, and betaproteobacteria such as Cupriavidus (Ralstonia eutropha) and
Burkholderia xenovorans. Finally, planctomycetes like Rhodopirellula and cyanobacteria like Microcystis show an
unusually low count of proteins with sensor domains for their proteome size under either trend. The
difference in the linear and power-law trends is marked at proteome sizes greater than 4500 proteins; thus, for the
same number of encoded proteins different types of bacteria can show markedly different complements of proteins
with sensor domains (Fig. 6).
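The two scaling regimes can be distinguished by fitting both models to the counts for a given set of genomes. The sketch below uses ordinary least squares for the linear model and least squares on log-transformed values for the power law y = c·x^k; this is one common way to fit such trends and is not necessarily the procedure used for Fig. 6. The numbers are synthetic values generated from an exact power law purely to exercise the code and are not genome data.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def power_law_fit(xs, ys):
    """Fit y = c * x**k by linear regression on log(x) and log(y)."""
    k, log_c = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_c), k

def r_squared(ys, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic example: sensor counts generated as 1e-5 * (proteome size)**2.
proteome_sizes = [1000, 2000, 4000, 8000]
sensor_counts = [1e-5 * p ** 2 for p in proteome_sizes]

c, k = power_law_fit(proteome_sizes, sensor_counts)
slope, intercept = linear_fit(proteome_sizes, sensor_counts)
r2_power = r_squared(sensor_counts, [c * p ** k for p in proteome_sizes])
r2_linear = r_squared(sensor_counts, [slope * p + intercept for p in proteome_sizes])
print(k, r2_power, r2_linear)
```

Comparing the two r² values for each group of organisms gives a simple indication of which trend describes that group better, particularly at the larger proteome sizes where the two trends diverge.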
A systematic analysis of each of the families of signaling domains across different bacterial lineages revealed
that the proteobacteria on the power-law trend, in particular the alphaproteobacteria with large
genomes, have an over-representation of one-component system transcription factors relative to other lineages with
large genomes (supplementary material). On the other hand, most other major families of sensor proteins in these
proteobacteria following the power-law trend show comparable numbers to those observed in other lineages with
large genomes. This additional set of sensor domains from the one-component systems, over and above the linear
baseline for the given proteome size, appears to be a major contributor to the power-law trend, especially in
proteobacteria with large genomes. In contrast to the proteobacteria on the power-law trend, the bacteria at the
high end of the linear trend (i.e. the myxobacteria, cyanobacteria and actinobacteria) have an over-representation of a
distinct subset of signaling proteins, namely those with domains found in eukaryote-type apoptosis-related signaling
(e.g. STAND ATPases, Caspases, TIR, S/T/Y kinases and FHA domains (Leipe et al., 2004); supplementary
material). This subset of signaling proteins has been, in particular, associated with developmental complexity and
differentiated multicellularity (Aravind et al., 2003; Koonin and Aravind, 2002). This observation suggests that the
two distinct trends in the number of proteins with sensor domains reflect a broad difference in the life-style
strategies of the organisms lying on them. Those on the power-law trend have optimized their sensory systems to
directly and rapidly respond to a large number of small molecule signals in their environment. Those on the linear
trend do not seem to require that many direct responses to environmental small molecules, and instead invest in
distinct signaling systems related to developmental complexity. This difference might in part arise due to the fact that
cyanobacteria produce their own nutrients; hence, they are less dependent than the alphaproteobacteria on responding
rapidly to diffusible solutes in the environment. The actinobacteria and myxobacteria are typically slow growers,
with a saprophytic life-style, which compete strongly for resources by producing antibiotics (Hopwood, 2007;
Whitworth and American Society for Microbiology., 2008). Hence, they are also likely to have a lower dependence
on rapid responses to ambient solutes. Another bacterium that lies on the linear trend line is the poorly understood
Solibacter usitatus of the acidobacteria/fibrobacteria clade. By analogy to the other bacteria showing this trend it is
conceivable that this uncultivated bacterium follows a lifestyle strategy similar to that of the actinobacteria and
myxobacteria. These bacteria also show a well-developed cyclic diguanylate signaling apparatus suggesting that
they might form multicellular differentiated colonies connected by an extracellular matrix (Ward et al., 2009).
Counts of proteins with particular types of sensor domains show very different scaling trends from the overall trend
observed for sensor domain proteins (Figs. 6, 7). Some of these shed important light on the lifestyle strategies
adopted by particular bacterial lineages. PAS and the GAF domains show a general power-law scaling with
proteome size (Fig. 6). However, some organisms markedly deviate from the power-law trend to show much greater
than expected numbers of both PAS and GAF domains. The majority of organisms which show simultaneous over-
representation of both these domains belong to photosynthetic lineages, cyanobacteria or chloroflexi, which are not
phylogenetically close. This striking pattern suggests that the PAS and GAF domains have been independently
expanded as potential photo-sensors in both these lineages. Another organism showing a simultaneous expansion of
both these domains is Kineococcus radiotolerans, the unusual gamma-irradiation resistant desert-living
actinobacterium (Fig. 6) (Phillips et al., 2002). This unusual bacterium produces a bright orange carotenoid and the
expansion of both GAF and PAS domains might point to a previously unknown photosensitive behavior in this
organism. This is also supported by the presence of a phytochrome-like protein in this organism (Karniol et al.,
2005). PAS domains by themselves are also greatly expanded in several phylogenetically distant motile aquatic
mesophilic bacteria and archaea, suggesting that they might have an important role in motility-associated
chemosensory signal transduction in these organisms (Fig. 6). In contrast, actinobacteria (excluding Kineococcus) and
rhizobia show relatively few PAS or GAF domains for their proteome size, which probably correlates with their
comparatively lower dependence on chemosensory motility. Likewise, a dramatic over-representation of
chemotaxis signal transducers is observed in phylogenetically diverse marine and freshwater bacteria (e.g. Vibrio,
Aeromonas, Marinomonas and Magnetospirillum, Fig. 7) suggesting that a motile lifestyle has strongly selected for
a well-developed chemosensory response that controls the flagellar apparatus in these organisms (Alexander and
Zhulin, 2007; Zhulin, 2001). Chemotaxis transducers are vastly under-represented, when normalized by proteome
size, in actinobacteria, cyanobacteria and developmentally complex myxobacteria, which is consistent with the
lower importance of chemotactic motility in most of these organisms (Fig. 7). The cNMP-sensing domains are one
example in which no apparent relationship between their over-representation and life-style or phylogeny is observed.
They are highly expanded in bacteria with different types of motility, e.g. Magnetospirillum with flagellar motility
and Flavobacterium johnsoniae with gliding motility and also cyanobacteria and rhizobia (Fig. 7).
Among the effector domains downstream of sensor domains, the scaling of histidine kinases with proteome size also
follows a power law function (Fig. 7). Nevertheless, a few organisms, especially certain myxobacteria,
cyanobacteria and motile aquatic bacteria, show a strong over-representation for their respective proteome sizes.
Curiously, the majority of actinobacteria show an under-representation of histidine kinases for their proteome size. In
this regard they depart from the classical pattern of developmentally complex bacteria, though the possible basis for
this under-representation remains unclear. Irrespective of their relative representation in proteomes, histidine
kinases show a strong linear 1:1 correlation (r² = 0.91) with the number of receiver domains present in the same
proteome (Fig. 7). This suggests that the functional correspondence between the histidine kinase and the respective
receiver domain is a strong evolutionary constraint and they have been accordingly duplicated or lost as pairs.
Histidine kinases are interestingly absent or greatly underrepresented in the genomes of hyperthermophiles,
especially archaea (Koretke et al., 2000) (Supplementary material). As previously proposed, this might be
related to the relative instability of exposed acyl phosphates at high temperatures (Burroughs et al., 2006).
Furthermore, while histidine kinases are present in several eukaryotic lineages, there is no evidence that they were
present in the ancestral eukaryote. Instead, they appear to be lateral transfers that have expanded in particular
lineages such as plants, amoebozoan and heterolobosean amoebae (e.g. Naegleria) (Koretke et al., 2000; Ponting et
al., 1999). This suggests that the relative under-representation of histidine kinases in most eukaryotic lineages
might have been a historical accident related to a part of their ancestry being from a thermophilic archaeal lineage.
The majority of archaea on the other hand have at least a few S/T/Y kinases, the phosphoesters generated by which
are potentially more stable than the exposed acyl phosphates at higher temperatures (Leonard et al., 1998). It is
plausible that this might have tipped the balance in favor of the proliferation of the S/T/Y kinases during the
emergence of eukaryotes. The GGDEF domain and the associated cyclic diguanylate phosphodiesterases, HD-GYP
and EAL are entirely restricted to the bacteria with not even a single instance of lateral transfer to any of the other
superkingdoms of life (Galperin et al., 1999; Holland et al., 2008; Makarova et al., 2002). The archaeal homologs of
the GGDEF domain are the nucleic acid polymerases of the CRISPR system rather than signaling enzymes (Haft et al.,
2005; Makarova et al., 2002). This strict exclusion of cyclic diguanylate might suggest that the compound is toxic
in the two other superkingdoms. Given the relationship of the GGDEF domain to the primary DNA polymerases of
the archaeal and eukaryotic superkingdoms (Makarova et al., 2002) it is possible that cyclic diguanylate might affect
polymerase activity in them.
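The reported correspondence between histidine kinase and receiver domain counts (r² = 0.91, Fig. 7) is the kind of statistic that can be recomputed from per-genome counts. The following is a minimal sketch under stated assumptions: the five pairs of counts are hypothetical and not drawn from the supplementary material, and `pearson_r_squared` and `slope_through_origin` are illustrative helper names.

```python
from statistics import mean


def pearson_r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def slope_through_origin(xs, ys):
    """Least-squares slope of y = m*x; a value near 1 indicates a 1:1 relationship."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical per-genome counts (illustration only).
histidine_kinases = [10, 25, 40, 80, 130]
receiver_domains = [12, 24, 43, 78, 135]

r2 = pearson_r_squared(histidine_kinases, receiver_domains)
m = slope_through_origin(histidine_kinases, receiver_domains)
print("r squared:", round(r2, 3))
print("slope:", round(m, 3))
```

A high r² together with a slope close to 1 is what would be expected if kinases and receivers are gained and lost as pairs; a high r² with a slope far from 1 would instead indicate a proportional but not one-to-one relationship.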
The majority of sensor domains have been freely exchanged between bacteria and the other superkingdoms. In
particular, several major sensory systems of various eukaryotic lineages appear to be of ultimately bacterial origin.
Examples of these include phytochromes and cytokinin receptors of plants, BLUF domain light receptors of
chlorophyte and euglenozoan algae, ionotropic glutamate receptors, the nitric oxide receptor seen across diverse
eukaryotes, the sensory PBP1 domains of the taste, metabotropic glutamate, atrial natriuretic peptide and speract-
type peptide receptors, ART-LGIC receptors, HD domain cNMP phosphodiesterases, sensory CACHE domains of
the α-2 subunits of the calcium channel, the PAS domains of the circadian regulators and Eag family of potassium
channels in animals and the GAF domains of a novel class of channel-like proteins found in animals and
chromoalveolates (e.g. PFB0510w of Plasmodium falciparum) (Anantharaman and Aravind, 2001; Aravind et al.,
2003; Aravind and Ponting, 1997; Iyer et al., 2003; Lee et al., 2008b; Mougel and Zhulin, 2001; Ponting et al., 1999;
Tasneem et al., 2005). This suggests that the primary diversification of sensor domains and sensory mechanisms
occurred in the bacteria, which had probably “experimented” and “perfected” an enormous diversity of signaling
mechanisms even before the origin of the eukaryotes. Subsequently, the eukaryotes merely acquired these through
lateral transfer and reused them within the context of their own signaling networks that included domains acquired
from the archaeal lineage as well as those innovated within eukaryotes. In certain cases the eukaryotes reused the
bacterial signaling proteins “as is” (e.g. the phytochromes, HD domain cNMP phosphodiesterases and ligand-gated
ion channels). In other cases only certain domains were used and combined with the rest of the eukaryotic signaling
apparatus (e.g. the combination of the PAS domain with the bHLH DNA-binding domain that is unique to
eukaryotes in the circadian regulators) (Aravind and Ponting, 1997; Ponting and Aravind, 1997; Zhulin et al., 1997).
This flow of regulatory systems from bacteria to eukaryotes has also been encountered in other functional systems
such as the protein domains involved in extracellular adhesion, apoptosis-related signaling mechanisms, the
ubiquitin system and chromatin dynamics (Aravind et al., 2003; Aravind and Koonin, 2002; Iyer et al., 2006;
Koonin and Aravind, 2002; Ponting and Aravind, 1997). Hence, the acquisition by eukaryotes of sensor domains
from bacteria might be part of a more general process of the re-use of pre-existing adaptations of the bacterial world
via lateral transfer.
Concluding remarks
The past 15 years have seen an extraordinary rise in our knowledge of sensory signal transduction in bacteria. As
noted earlier, sequence analysis has been at the forefront of identifying and characterizing these domains. Yet, this
information has not been used as efficiently as it could have been. There are several instances of “rediscovery”
of previously described domains under different names (e.g. the re-description of the CACHE domain in CitA and
the HNOB domain (Pellicena et al., 2004; Reinelt et al., 2003) well after the original reports). In addition to
illustrating the lack of propagation of information, these instances have also resulted in some confusion in terms of
nomenclature. This has in part been mitigated by domain repositories like the PFAM and MiST databases (Finn et
al., 2008; Ulrich and Zhulin, 2007). However, much systematization remains to be done so that researchers
have a uniformly agreed framework to use in the future. We hope the survey presented here can serve as a
preliminary step towards the development of such a systematic framework. In practical terms several domains have
been poorly explored in terms of the diversity of their cognate ligands. In this regard we hope that the high-
throughput methods of biochemical genomics might have a major impact. Systematic surveys, where extensive
chemical libraries are matched against comprehensive arrays of sensor domains, might help in filling in the lacunae
in our understanding of ligand specificities. We also hope the successful implementation of such technology might
provide a biotechnological byproduct, i.e. the development of novel biosensors. Indeed, we appear poised for an even
greater expansion in our understanding and utilization of bacterial sensory systems.
Appendix-I
Expansions of domain names
Several domains are named based on certain members of the superfamily that were initially discovered or
characterized. It should be noted that the names are typically chosen for being euphonic and easy to use, and in most
cases the expansions do not have great semantic value.
PAS: Period, Arnt and Single minded domain
GAF: cGMP phosphodiesterase, Adenylate cyclase, FhlA domain
MEKHLA:
HNOB: Heme NO binding domain
HNOBA: HNOB-associated domain
PocR: domain found in PocR
CACHE: Calcium channels and Chemotaxis receptor domain
CHASE: Cyclases/Histidine kinases Associated Sensory Extracellular domain
V4R: Vinyl-4 reductase domain
SLBB: soluble ligand-binding beta-grasp
HMA: Heavy metal associated domain
ACT: Aspartokinase, chorismate mutase, TyrA domain
SOUL: named after the ckSoul protein from Gallus gallus.
CBS: Cystathionine beta-synthase domain
DICT: Diguanylate cyclases and two-component systems domain
STAS: sulfate transporter, anti-sigma factor-binding domain
ASRAH: Anabaena sensory rhodopsin-associated homology domain
T-OB: transporter-Oligomer-binding domain
4HB: 4-helical up-and-down bundle domain
NIT: Nitrate/Nitrite sensor domain
CCTBP: Cysteine-containing TATA-box binding protein-like domain
FadR: domain found in FadR protein
AlcR-N: N-terminal domain of AlcR protein
UTRA: UbiC transcription regulator-associated domain
ISOCOT: for isomerase, CoA transferase and translation initiation factor domain
NTF2: domain found in NTF2
MEDS: Methanogen/methylotroph, DcmR sensory domain
BLUF: blue-light Sensors using FAD
7TMR-DISM family: 7TM receptors with diverse intracellular signaling modules
7TMR-HD: 7TM receptors with intracellular HD domains
5TMR-LYT: 5 transmembrane receptors of the LytS-YhcK type
8TMR-UT: 8-transmembrane receptor UhpB type
MHYT: motif containing methionine, histidine, tyrosine and threonine.
ART-LGIC: Acetylcholine-type ligand-gated ion channels
cNMPBD/CAP: cyclic nucleotide monophosphate-binding domain/ domain found in catabolite activator protein
PilZ: Domain found in PilZ.
STAND: Signal transduction ATPases with numerous domains
Appendix-II
Abbreviations of organism names shown in the figures:
Aave: Acidovorax avenae, Abac: Acidobacteria bacterium, Aful: Archaeoglobus fulgidus, Ahyd: Aeromonas
hydrophila, Amar: Acaryochloris marina, Amet: Alkaliphilus metalliredigens, Ana: Nostoc sp., Anae:
Anaeromyxobacter sp., Atum: Agrobacterium tumefaciens, Avar: Anabaena variabilis, Bjap: Bradyrhizobium
japonicum, Bpet: Bordetella petrii, Bsub: Bacillus subtilis, Bxen: Burkholderia xenovorans, Caur: Chloroflexus
aurantiacus, Cbei: Clostridium beijerinckii, Cpsy: Colwellia psychrerythraea, Ctha: Chloroherpeton thalassium,
Cvio: Chromobacterium violaceum, Daci: Delftia acidovorans, Daro: Dechloromonas aromatica, Ddes:
Desulfovibrio desulfuricans, Dhaf: Desulfitobacterium hafniense, Ecol: Escherichia coli, Fjoh: Flavobacterium
johnsoniae, Fran: Frankia sp., Gura: Geobacter uraniireducens, Haur: Herpetosiphon aurantiacus, Hche: Hahella
chejuensis, Hmar: Haloarcula marismortui, Hmod: Heliobacterium modesticaldum, Krad: Kineococcus
radiotolerans, Lcho: Leptothrix cholodnii, Lsph: Lysinibacillus sphaericus, Mace: Methanosarcina acetivorans,
Maer: Microcystis aeruginosa, Magn: Magnetococcus sp., Mari: Marinomonas sp., Meth: Methylobacterium sp.,
Mhun: Methanospirillum hungatei, Mlot: Mesorhizobium loti, Mmag: Magnetospirillum magneticum, Msme:
Mycobacterium smegmatis, Mxan: Myxococcus xanthus, Nfar: Nocardia farcinica, Npun: Nostoc punctiforme, Oter:
Opitutus terrae, Paer: Pseudomonas aeruginosa, Ping: Psychromonas ingrahamii, Pola: Polaromonas sp., Rbal:
Rhodopirellula baltica, Reut: Ralstonia eutropha, Rhod: Rhodococcus jostii, Rleg: Rhizobium leguminosarum,
Rpal: Rhodopseudomonas palustris, Rpic: Ralstonia pickettii, Rrub: Rhodospirillum rubrum, Scel: Sorangium
cellulosum, Scoe: Streptomyces coelicolor, Sdeg: Saccharophagus degradans, Sery: Saccharopolyspora erythraea,
Sfum: Syntrophobacter fumaroxidans, Smel: Sinorhizobium meliloti, Ssp: Synechocystis sp., Susi: Solibacter
usitatus, Swoo: Shewanella woodyi, Tcru: Thiomicrospira crunogena, Tden: Treponema denticola, Tten:
Thermoanaerobacter tengcongensis, Umet: uncultured methanogenic, Vcho: Vibrio cholerae, Vhar: Vibrio harveyi,
Wsuc: Wolinella succinogenes, Xaut: Xanthobacter autotrophicus
Acknowledgements
Research by the authors is funded by the Intramural Research Program of the National Institutes of Health, United
States of America.
Supplementary material
Supplementary material is available at: ftp://ftp.ncbi.nih.gov/pub/aravind/signaling/
Figure legends
Figure 1. Domain architectures of proteins with sensor domains. Architectures are denoted by their gene name
and species abbreviations, separated by underscores. In the case of the bacterial HD-phosphodiesterase fused to
GAF domains the gene is from an environmental delta proteobacterium genome sequence. For species abbreviations
see Appendix-II. Domains are denoted by their standard names (Appendix-I) or abbreviations provided in the key.
For a comprehensive list of domain names refer to the supplementary materials.
Figure 2. Cartoon representation of diverse sensor domains. Structures are labeled by their type, the protein
name and the PDB id from which the domains were derived. Ligands for most structures are either shown as ball
and stick representations or as space-fill models. The figure shows sensor domains of the PAS-like, HNOB, β-
grasp, ferredoxin-like and the GyrI-like folds.
Figure 3. Cartoon representation of diverse sensor domains. Structures are annotated as in Fig. 2. Illustrated in
the figure are sensor domains belonging to the CBS, STAS, periplasmic-binding, T-OB, 4-helical up and down
bundle, and the ATP-cone folds.
Figure 4. Cartoon representation of diverse sensor domains. Structures are annotated as in Fig. 2. Displayed are
structures of members of the DSBH, UTRA, ISOCOT, NTF2, BLUF, ART-LGIC, cNMPBD and PilZ
superfamilies. Also illustrated is the 7-TM receptor seen in sensory rhodopsins.
Figure 5. Domain architecture networks of proteins involved in signaling.
The network diagrams show connections of domains with each other and all other domains occurring in their
respective polypeptides. A hypothetical example showing how domain architecture networks are
constructed is shown in the yellow box. A, B, C and D are globular domains that occur in a range of combinations.
These are combined into an architectural network where the globular domains are nodes and the edges reflect their
physical connectivity. The central figure shows a network of functions, where the domains have been collapsed into
the labeled functional categories. The key functions involved in signaling have been placed in the center of the
network. The thickness of the edges is approximately proportional to the relative frequency with which linkages
between two domains re-occur in distinct polypeptides of signaling domain containing prokaryotic proteins from the
220 organisms (supplementary material). Starting from the right hand corner and going clockwise, the networks
represent interaction of signaling domains with 1) two-component, 2) apoptosis-related signaling domains, 3)
transmembrane receptors and 4) cyclic nucleotide and cyclic diguanylate binding domains. Domains that belong to
the respective functions are placed in the middle of the network and labeled in brown. Sensory domains and other
notable signaling domains have been labeled in each network. The graphs were rendered using yEd
(http://www.yworks.com/en/products_yed_about.html)
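The construction of a domain architecture network described in this legend can be sketched in a few lines. This is an illustrative reading, not the procedure actually used for Fig. 5: it assumes that "physical connectivity" means adjacency of two domains in a polypeptide, that edges are undirected, and that each polypeptide contributes at most once to the weight of a given edge. The architectures built from domains A, B, C and D are hypothetical, as is the function name `build_architecture_network`.

```python
from collections import Counter


def build_architecture_network(architectures):
    """Return (nodes, edge_weights) for a list of domain architectures.

    Each architecture is a list of domain names in N- to C-terminal order.
    An edge joins two different domains that are adjacent in a polypeptide;
    its weight is the number of polypeptides in which that linkage occurs.
    """
    nodes = set()
    edge_weights = Counter()
    for arch in architectures:
        nodes.update(arch)
        # Collect each undirected adjacent pair once per polypeptide.
        links = {
            tuple(sorted((left, right)))
            for left, right in zip(arch, arch[1:])
            if left != right
        }
        edge_weights.update(links)
    return nodes, edge_weights


# Hypothetical architectures built from the globular domains A, B, C and D.
example_architectures = [
    ["A", "B"],
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["A", "D"],
]

nodes, edge_weights = build_architecture_network(example_architectures)
for (left, right), weight in sorted(edge_weights.items()):
    print(left, "-", right, "weight", weight)
```

The resulting weights correspond to the edge thickness in the figure: linkages that recur in more polypeptides receive larger weights.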
Figure 6. Complexity quotient plots and scaling of proteins containing sensor, PAS and GAF domains. a)
Complexity quotient plot for signaling proteins. The “complexity quotient” for an organism is defined as the product
of two values: the number of different types of domains which co-occur in signaling proteins, and the average
number of domains detected in these proteins. The complexity quotient is plotted against the total number of
signaling proteins in a given organism. A log curve fitting the general trend of the majority of organisms is shown.
Organisms with anomalous numbers and those at the higher end are labeled. Each protein has at least a single
known or predicted domain with a signaling-related function. A total of 166 signaling and DNA-binding HTH
domains were collected from 220 representative prokaryotic genomes (see supplementary material for a complete
list). b) Scaling of sensor domain containing proteins with proteome size. A total of 55 sensor domains in the 220
genomes were analyzed for this graph. A linear equation fitting the general trend of the majority of organisms is
shown. c) Scaling of PAS domain containing proteins with proteome size. A power law curve fitting the general
trend of the majority of organisms is shown. d) Scaling of GAF domain containing proteins with proteome size. An
exponential curve fitting the general trend of the majority of organisms is shown
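The complexity quotient defined in panel a) can be made concrete with a short sketch. It rests on two assumptions that the legend does not spell out: the "number of different types of domains" is taken as the count of distinct domain names across all signaling proteins of an organism, and repeated copies of a domain within one protein each count towards that protein's domain number. The three architectures and the function name `complexity_quotient` are hypothetical.

```python
def complexity_quotient(architectures):
    """Product of distinct domain types and mean domains per signaling protein.

    `architectures` is a list with one entry per signaling protein; each entry
    is the list of domain names detected in that protein.
    """
    if not architectures:
        return 0.0
    distinct_types = {domain for arch in architectures for domain in arch}
    mean_domains = sum(len(arch) for arch in architectures) / len(architectures)
    return len(distinct_types) * mean_domains


# Hypothetical signaling proteins from a single organism (illustration only).
organism_architectures = [
    ["PAS", "HisKA", "HATPase_c"],
    ["GAF", "GGDEF"],
    ["PAS", "PAS", "GGDEF", "EAL"],
]

quotient = complexity_quotient(organism_architectures)
print("complexity quotient:", quotient)
```

Here six distinct domain types and a mean of three domains per protein give a quotient of 18; plotting this value against the number of signaling proteins per organism yields the kind of curve shown in panel a).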
Figure 7. Scaling of cNMPBD, chemotaxis receptors and histidine kinases with proteome size, and plot of
receiver domains against histidine kinases. a) Scaling of cNMPBD domain containing proteins with proteome
size. b) Scaling of chemotaxis transducer-domain containing proteins with proteome size. c) Scaling of histidine
kinase domain containing proteins with proteome size. d) Plot of receiver domains vs histidine kinases. In all the
graphs organisms with anomalous numbers and those at the high end are labeled. The organisms analyzed and the
abbreviations of the labeled organisms are as in Appendix-II.
References
Aepfelbacher, M., Zumbihl, R., and Heesemann, J. (2005). Modulation of Rho GTPases and the actin cytoskeleton by YopT of Yersinia. Curr Top Microbiol Immunol 291, 167‐175. Alexander, R. P., and Zhulin, I. B. (2007). Evolutionary genomics reveals conserved structural determinants of signaling and adaptation in microbial chemoreceptors. Proc Natl Acad Sci U S A 104, 2885‐2890. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997). Gapped BLAST and PSI‐BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389‐3402. Amikam, D., and Galperin, M. Y. (2006). PilZ domain is part of the bacterial c‐di‐GMP binding protein. Bioinformatics 22, 3‐6. Anantharaman, V., and Aravind, L. (2000). Cache ‐ a signaling domain common to animal Ca(2+)‐channel subunits and a class of prokaryotic chemotaxis receptors. Trends Biochem Sci 25, 535‐537. Anantharaman, V., and Aravind, L. (2001). The CHASE domain: a predicted ligand‐binding module in plant cytokinin receptors and other eukaryotic and bacterial receptors. Trends Biochem Sci 26, 579‐582. Anantharaman, V., and Aravind, L. (2003). Application of comparative genomics in the identification and analysis of novel families of membrane‐associated receptors in bacteria. BMC Genomics 4, 34. Anantharaman, V., and Aravind, L. (2004). The SHS2 module is a common structural theme in functionally diverse protein groups, like Rpb7p, FtsA, GyrI, and MTH1598/TM1083 superfamilies. Proteins 56, 795‐807. Anantharaman, V., and Aravind, L. (2005). MEDS and PocR are novel domains with a predicted role in sensing simple hydrocarbon derivatives in prokaryotic signal transduction systems. Bioinformatics 21, 2805‐2811. Anantharaman, V., and Aravind, L. (2006). Diversification of catalytic activities and ligand interactions in the protein fold shared by the sugar isomerases, eIF2B, DeoR transcription factors, acyl‐CoA transferases and methenyltetrahydrofolate synthetase. 
J Mol Biol 356, 823‐842. Anantharaman, V., Balaji, S., and Aravind, L. (2006). The signaling helix: a common functional theme in diverse signaling proteins. Biol Direct 1, 25. Anantharaman, V., Koonin, E. V., and Aravind, L. (2001). Regulatory potential, phyletic distribution and evolution of ancient, intracellular small‐molecule‐binding domains. J Mol Biol 307, 1271‐1292. Andreeva, A., Howorth, D., Chandonia, J. M., Brenner, S. E., Hubbard, T. J., Chothia, C., and Murzin, A. G. (2008). Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res 36, D419‐425. Aravind, L., and Anantharaman, V. (2003). HutC/FarR‐like bacterial transcription factors of the GntR family contain a small molecule‐binding domain of the chorismate lyase fold. FEMS Microbiol Lett 222, 17‐23. Aravind, L., Anantharaman, V., Balaji, S., Babu, M. M., and Iyer, L. M. (2005). The many faces of the helix‐turn‐helix domain: transcription regulation and beyond. FEMS Microbiol Rev 29, 231‐262. Aravind, L., Anantharaman, V., and Iyer, L. M. (2003). Evolutionary connections between bacterial and eukaryotic signaling systems: a genomic perspective. Curr Opin Microbiol 6, 490‐497. Aravind, L., Dixit, V. M., and Koonin, E. V. (2001). Apoptotic molecular machinery: vastly increased complexity in vertebrates revealed by genome comparisons. Science 291, 1279‐1284. Aravind, L., and Koonin, E. V. (1998). The HD domain defines a new superfamily of metal‐dependent phosphohydrolases. Trends Biochem Sci 23, 469‐472. Aravind, L., and Koonin, E. V. (1999a). DNA polymerase beta‐like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history. Nucleic Acids Res 27, 1609‐1618.
Aravind, L., and Koonin, E. V. (1999b). Gleaning non‐trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol 287, 1023‐1040. Aravind, L., and Koonin, E. V. (2000). The STAS domain ‐ a link between anion transporters and antisigma‐factor antagonists. Curr Biol 10, R53‐55. Aravind, L., and Koonin, E. V. (2002). Classification of the caspase‐hemoglobinase fold: detection of new families and implications for the origin of the eukaryotic separins. Proteins 46, 355‐367. Aravind, L., Mazumder, R., Vasudevan, S., and Koonin, E. V. (2002). Trends in protein evolution inferred from sequence and structure analysis. Curr Opin Struct Biol 12, 392‐399. Aravind, L., and Ponting, C. P. (1997). The GAF domain: an evolutionary link between diverse phototransducing proteins. Trends Biochem Sci 22, 458‐459. Aravind, L., and Ponting, C. P. (1999). The cytoplasmic helical linker domain of receptor histidine kinase and methyl‐accepting proteins is common to many prokaryotic signalling proteins. FEMS Microbiol Lett 176, 111‐116. Avila‐Perez, M., Vreede, J., Tang, Y., Bende, O., Losi, A., Gartner, W., and Hellingwerf, K. (2009). In vivo mutational analysis of YtvA from Bacillus subtilis: mechanism of light activation of the general stress response. J Biol Chem 284, 24958‐24964. Ayers, R. A., and Moffat, K. (2008). Changes in quaternary structure in the signaling mechanisms of PAS domains. Biochemistry 47, 12078‐12086. Baker, M. D., Wolanin, P. M., and Stock, J. B. (2006). Signal transduction in bacterial chemotaxis. Bioessays 28, 9‐22. Barabote, R. D., and Saier, M. H., Jr. (2005). Comparative genomic analyses of the bacterial phosphotransferase system. Microbiol Mol Biol Rev 69, 608‐634. Bateman, A. (1997). The structure of a domain common to archaebacteria and the homocystinuria disease protein. Trends Biochem Sci 22, 12‐13. Benach, J., Swaminathan, S. S., Tamayo, R., Handelman, S. K., Folta‐Stogniew, E., Ramos, J. 
E., Forouhar, F., Neely, H., Seetharaman, J., Camilli, A., and Hunt, J. F. (2007). The structural basis of cyclic diguanylate signal transduction by PilZ domains. Embo J 26, 5153‐5166. Black, R. A., Hobson, A. C., and Adler, J. (1980). Involvement of cyclic GMP in intracellular signaling in the chemotactic response of Escherichia coli. Proc Natl Acad Sci U S A 77, 3879‐3883. Bocquet, N., Nury, H., Baaden, M., Le Poupon, C., Changeux, J. P., Delarue, M., and Corringer, P. J. (2009). X‐ray structure of a pentameric ligand‐gated ion channel in an apparently open conformation. Nature 457, 111‐114. Braatsch, S., Gomelsky, M., Kuphal, S., and Klug, G. (2002). A single flavoprotein, AppA, integrates both redox and light signals in Rhodobacter sphaeroides. Mol Microbiol 45, 827‐836. Brudler, R., Hitomi, K., Daiyasu, H., Toh, H., Kucho, K., Ishiura, M., Kanehisa, M., Roberts, V. A., Todo, T., Tainer, J. A., and Getzoff, E. D. (2003). Identification of a new cryptochrome class. Structure, function, and evolution. Mol Cell 11, 59‐67. Bult, C. J., White, O., Olsen, G. J., Zhou, L., Fleischmann, R. D., Sutton, G. G., Blake, J. A., FitzGerald, L. M., Clayton, R. A., Gocayne, J. D., et al. (1996). Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058‐1073. Burroughs, A. M., Allen, K. N., Dunaway‐Mariano, D., and Aravind, L. (2006). Evolutionary genomics of the HAD superfamily: understanding the structural adaptations and catalytic diversity in a superfamily of phosphoesterases and allied enzymes. J Mol Biol 361, 1003‐1034. Burroughs, A. M., Balaji, S., Iyer, L. M., and Aravind, L. (2007a). A novel superfamily containing the beta‐grasp fold involved in binding diverse soluble ligands. Biol Direct 2, 4. Burroughs, A. M., Balaji, S., Iyer, L. M., and Aravind, L. (2007b). Small but versatile: the extraordinary functional and structural diversity of the beta‐grasp fold. Biol Direct 2, 18.
Burroughs, A. M., Iyer, L. M., and Aravind, L. (2009). Natural history of the E1‐like superfamily: implication for adenylation, sulfur transfer, and ubiquitin conjugation. Proteins 75, 895‐910. Campbell, E. A., Greenwell, R., Anthony, J. R., Wang, S., Lim, L., Das, K., Sofia, H. J., Donohue, T. J., and Darst, S. A. (2007). A conserved structural module regulates transcriptional responses to diverse stress signals in bacteria. Mol Cell 27, 793‐805. Chai, Y., Zhu, J., and Winans, S. C. (2001). TrlR, a defective TraR‐like protein of Agrobacterium tumefaciens, blocks TraR function in vitro by forming inactive TrlR:TraR dimers. Mol Microbiol 40, 414‐421. Chang, C., Kwok, S. F., Bleecker, A. B., and Meyerowitz, E. M. (1993). Arabidopsis ethylene‐response gene ETR1: similarity of product to two‐component regulators. Science 262, 539‐544. Chen, G. Q., Cui, C., Mayer, M. L., and Gouaux, E. (1999). Functional characterization of a potassium‐selective prokaryotic glutamate receptor. Nature 402, 817‐821. Cheung, J., and Hendrickson, W. A. (2009). Structural analysis of ligand stimulation of the histidine kinase NarX. Structure 17, 190‐201. Daber, R., Stayrook, S., Rosenberg, A., and Lewis, M. (2007). Structural analysis of lac repressor bound to allosteric effectors. J Mol Biol 370, 609‐619. De Souza, R. F., Iyer, L. M., and Aravind, L. (2009). The Anabaena sensory rhodopsin transducer defines a novel superfamily of prokaryotic small‐molecule binding domains. Biol Direct 4, 25; discussion 25. Dias, J. S., Macedo, A. L., Ferreira, G. C., Peterson, F. C., Volkman, B. F., and Goodfellow, B. J. (2006). The first structure from the SOUL/HBP family of heme‐binding proteins, murine P22HBP. J Biol Chem 281, 31553‐31561. Dunwell, J. M., Purvis, A., and Khuri, S. (2004). Cupins: the most functionally diverse protein superfamily? Phytochemistry 65, 7‐17. Durbin, R. (1998). 
Biological sequence analysis: probabilistic models of proteins and nucleic acids (Cambridge, UK; New York: Cambridge University Press). Ettema, T. J., Brinkman, A. B., Tani, T. H., Rafferty, J. B., and Van Der Oost, J. (2002). A novel ligand‐binding domain involved in regulation of amino acid metabolism in prokaryotes. J Biol Chem 277, 37464‐37468. Ezezika, O. C., Haddad, S., Clark, T. J., Neidle, E. L., and Momany, C. (2007). Distinct effector‐binding sites enable synergistic transcriptional activation by BenM, a LysR‐type regulator. J Mol Biol 367, 616‐629. Finn, R. D., Tate, J., Mistry, J., Coggill, P. C., Sammut, S. J., Hotz, H. R., Ceric, G., Forslund, K., Eddy, S. R., Sonnhammer, E. L., and Bateman, A. (2008). The Pfam protein families database. Nucleic Acids Res 36, D281‐288. Fischer, E. H., and Krebs, E. G. (1955). Conversion of phosphorylase b to phosphorylase a in muscle extracts. J Biol Chem 216, 121‐132. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M., et al. (1995). Whole‐genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496‐512. Galperin, M. Y., Gaidenko, T. A., Mulkidjanian, A. Y., Nakano, M., and Price, C. W. (2001). MHYT, a new integral membrane sensor domain. FEMS Microbiol Lett 205, 17‐23. Galperin, M. Y., Natale, D. A., Aravind, L., and Koonin, E. V. (1999). A specialized version of the HD hydrolase domain implicated in signal transduction. J Mol Microbiol Biotechnol 1, 303‐305. Gao, R., and Stock, A. M. (2009). Biological insights from structures of two‐component proteins. Annu Rev Microbiol 63, 133‐154. Garces, F., Fernandez, F. J., Gomez, A. M., Perez‐Luque, R., Campos, E., Prohens, R., Aguilar, J., Baldoma, L., Coll, M., Badia, J., and Vega, M. C. (2008). 
Quaternary structural transitions in the DeoR‐type repressor UlaR control transcriptional readout from the L‐ascorbate utilization regulon in Escherichia coli. Biochemistry 47, 11424‐11433.
Gebhard, S., and Cook, G. M. (2008). Differential regulation of high‐affinity phosphate transport systems of Mycobacterium smegmatis: identification of PhnF, a repressor of the phnDCE operon. J Bacteriol 190, 1335‐1343. Gilman, A. G. (1987). G proteins: transducers of receptor‐generated signals. Annu Rev Biochem 56, 615‐649. Gomelsky, M., and Klug, G. (2002). BLUF: a novel FAD‐binding domain involved in sensory transduction in microorganisms. Trends Biochem Sci 27, 497‐500. Gorelik, M., Lunin, V. V., Skarina, T., and Savchenko, A. (2006). Structural characterization of GntR/HutC family signaling domain. Protein Sci 15, 1506‐1511. Haft, D. H., Selengut, J., Mongodin, E. F., and Nelson, K. E. (2005). A guild of 45 CRISPR‐associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol 1, e60. Han, Y., Braatsch, S., Osterloh, L., and Klug, G. (2004). A eukaryotic BLUF domain mediates light‐dependent gene expression in the purple bacterium Rhodobacter sphaeroides 2.4.1. Proc Natl Acad Sci U S A 101, 12306‐12311. Hanks, S. K., and Hunter, T. (1995). Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. Faseb J 9, 576‐596. Hay, I. D., Remminghorst, U., and Rehm, B. H. (2009). MucR, a novel membrane‐associated regulator of alginate biosynthesis in Pseudomonas aeruginosa. Appl Environ Microbiol 75, 1110‐1120. Hess, J. F., Bourret, R. B., and Simon, M. I. (1988a). Histidine phosphorylation and phosphoryl group transfer in bacterial chemotaxis. Nature 336, 139‐143. Hess, J. F., Oosawa, K., Kaplan, N., and Simon, M. I. (1988b). Phosphorylation of three proteins in the signaling pathway of bacterial chemotaxis. Cell 53, 79‐87. Hilf, R. J., and Dutzler, R. (2009). Structure of a potentially open state of a proton‐activated pentameric ligand‐gated ion channel. Nature 457, 115‐118. Hochstrasser, M. (2009). Origin and function of ubiquitin‐like proteins. 
Nature 458, 422‐429. Hogg, T., Mechold, U., Malke, H., Cashel, M., and Hilgenfeld, R. (2004). Conformational antagonism between opposing active sites in a bifunctional RelA/SpoT homolog modulates (p)ppGpp metabolism during the stringent response [corrected]. Cell 117, 57‐68. Holland, L. M., O'Donnell, S. T., Ryjenkov, D. A., Gomelsky, L., Slater, S. R., Fey, P. D., Gomelsky, M., and O'Gara, J. P. (2008). A staphylococcal GGDEF domain protein regulates biofilm formation independently of cyclic dimeric GMP. J Bacteriol 190, 5178‐5189. Hopwood, D. A. (2007). Streptomyces in nature and medicine : the antibiotic maker (Oxford ; New York: Oxford University Press). Huang, N., De Ingeniis, J., Galeazzi, L., Mancini, C., Korostelev, Y. D., Rakhmaninova, A. B., Gelfand, M. S., Rodionov, D. A., Raffaelli, N., and Zhang, H. (2009). Structure and function of an ADP‐ribose‐dependent transcriptional regulator of NAD metabolism. Structure 17, 939‐951. Ignoul, S., and Eggermont, J. (2005). CBS domains: structure, function, and pathology in human proteins. Am J Physiol Cell Physiol 289, C1369‐1378. Iyer, L. M., Anantharaman, V., and Aravind, L. (2003). Ancient conserved domains shared by animal soluble guanylyl cyclases and bacterial signaling proteins. BMC Genomics 4, 5. Iyer, L. M., Burroughs, A. M., and Aravind, L. (2006). The prokaryotic antecedents of the ubiquitin‐signaling system and the early evolution of ubiquitin‐like beta‐grasp domains. Genome Biol 7, R60. Iyer, L. M., Burroughs, A. M., and Aravind, L. (2008). Unraveling the biochemistry and provenance of pupylation: a prokaryotic analog of ubiquitination. Biol Direct 3, 45. Iyer, L. M., Tahiliani, M., Rao, A., and Aravind, L. (2009). Prediction of novel families of enzymes involved in oxidative and other complex modifications of bases in nucleic acids. Cell Cycle 8, 1698‐1710.
Jacob, F., and Monod, J. (1959). [Genes of structure and genes of regulation in the biosynthesis of proteins.]. C R Hebd Seances Acad Sci 249, 1282‐1284. Jagadeesan, S., Mann, P., Schink, C. W., and Higgs, P. I. (2009). A novel "four‐component" two‐component signal transduction mechanism regulates developmental progression in Myxococcus xanthus. J Biol Chem 284, 21435‐21445. Jordan, I. K., Natale, D. A., Koonin, E. V., and Galperin, M. Y. (2001). Independent evolution of heavy metal‐associated domains in copper chaperones and copper‐transporting atpases. J Mol Evol 53, 622‐633. Joyce, M. G., Levy, C., Gabor, K., Pop, S. M., Biehl, B. D., Doukov, T. I., Ryter, J. M., Mazon, H., Smidt, H., van den Heuvel, R. H., et al. (2006). CprK crystal structures reveal mechanism for transcriptional control of halorespiration. J Biol Chem 281, 28318‐28325. Jung, K. H., Trivedi, V. D., and Spudich, J. L. (2003). Demonstration of a sensory rhodopsin in eubacteria. Mol Microbiol 47, 1513‐1522. Kannan, N., Wu, J., Anand, G. S., Yooseph, S., Neuwald, A. F., Venter, J. C., and Taylor, S. S. (2007). Evolution of allostery in the cyclic nucleotide binding module. Genome Biol 8, R264. Karniol, B., Wagner, J. R., Walker, J. M., and Vierstra, R. D. (2005). Phylogenetic analysis of the phytochrome superfamily reveals distinct microbial subfamilies of photoreceptors. Biochem J 392, 103‐116. Klenk, H. P., Clayton, R. A., Tomb, J. F., White, O., Nelson, K. E., Ketchum, K. A., Dodson, R. J., Gwinn, M., Hickey, E. K., Peterson, J. D., et al. (1997). The complete genome sequence of the hyperthermophilic, sulphate‐reducing archaeon Archaeoglobus fulgidus. Nature 390, 364‐370. Koonin, E. V., and Aravind, L. (2002). Origin and evolution of eukaryotic apoptosis: the bacterial connection. Cell Death Differ 9, 394‐404. Koonin, E. V., Wolf, Y. I., and Aravind, L. (2000). Protein fold recognition using sequence profiles and its application in structural genomics. Adv Protein Chem 54, 245‐275. Koretke, K. 
K., Lupas, A. N., Warren, P. V., Rosenberg, M., and Brown, J. R. (2000). Evolution of two‐component signal transduction. Mol Biol Evol 17, 1956‐1970. Koshland, D. E., Jr. (1980). Bacterial chemotaxis in relation to neurobiology. Annu Rev Neurosci 3, 43‐75. Kovacs, H., Comfort, D., Lord, M., Campbell, I. D., and Yudkin, M. D. (1998). Solution structure of SpoIIAA, a phosphorylatable component of the system that regulates transcription factor sigmaF of Bacillus subtilis. Proc Natl Acad Sci U S A 95, 5067‐5071. Krebs, E. G. (1998). An accidental biochemist. Annu Rev Biochem 67, xii‐xxxii. Kreil, G., and Boyer, P. D. (1964). Detection of bound phosphohistidine in E. coli succinate thiokinase. Biochem Biophys Res Commun 16, 551‐555. Kuner, T., Seeburg, P. H., and Guy, H. R. (2003). A common architecture for K+ channels and ionotropic glutamate receptors? Trends Neurosci 26, 27‐32. La Roche, S. D., and Leisinger, T. (1991). Identification of dcmR, the regulatory gene governing expression of dichloromethane dehalogenase in Methylobacterium sp. strain DM4. J Bacteriol 173, 6714‐6721. Lamparter, T., Carrascal, M., Michael, N., Martinez, E., Rottwinkel, G., and Abian, J. (2004). The biliverdin chromophore binds covalently to a conserved cysteine residue in the N‐terminus of Agrobacterium phytochrome Agp1. Biochemistry 43, 3659‐3669. Lanzilotta, W. N., Schuller, D. J., Thorsteinsson, M. V., Kerby, R. L., Roberts, G. P., and Poulos, T. L. (2000). Structure of the CO sensing transcription activator CooA. Nat Struct Biol 7, 876‐880. Lee, E. J., Cho, Y. H., Kim, H. S., Ahn, B. E., and Roe, J. H. (2004). Regulation of sigmaB by an anti‐ and an anti‐anti‐sigma factor in Streptomyces coelicolor in response to osmotic stress. J Bacteriol 186, 8490‐8498.
Lee, J. H., Geiman, D. E., and Bishai, W. R. (2008a). Role of stress response sigma factor SigG in Mycobacterium tuberculosis. J Bacteriol 190, 1128‐1133. Lee, J. H., Kang, G. B., Lim, H. H., Jin, K. S., Kim, S. H., Ree, M., Park, C. S., Kim, S. J., and Eom, S. H. (2008b). Crystal structure of the GluR0 ligand‐binding core from Nostoc punctiforme in complex with L‐glutamate: structural dissection of the ligand interaction and subunit interface. J Mol Biol 376, 308‐316. Leipe, D. D., Koonin, E. V., and Aravind, L. (2004). STAND, a class of P‐loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer. J Mol Biol 343, 1‐28. Leonard, C. J., Aravind, L., and Koonin, E. V. (1998). Novel families of putative protein kinases in bacteria and archaea: evolution of the "eukaryotic" protein kinase superfamily. Genome Res 8, 1038‐1047. Lin, C., and Todo, T. (2005). The cryptochromes. Genome Biol 6, 220. Linder, J. U., and Schultz, J. E. (2008). Versatility of signal transduction encoded in dimeric adenylyl cyclases. Curr Opin Struct Biol 18, 667‐672. Ma, X., Sayed, N., Baskaran, P., Beuve, A., and van den Akker, F. (2008). PAS‐mediated dimerization of soluble guanylyl cyclase revealed by signal transduction histidine kinase domain crystal structure. J Biol Chem 283, 1167‐1178. Makarova, K. S., Aravind, L., Grishin, N. V., Rogozin, I. B., and Koonin, E. V. (2002). A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 30, 482‐496. Manson, M. D. (2009). A mutational wrench in the HAMP gearbox. Mol Microbiol 73, 742‐746. Marquenet, E., and Richet, E. (2007). How integration of positive and negative regulatory signals by a STAND signaling protein depends on ATP hydrolysis. Mol Cell 28, 187‐199. Martinez, S. E., Beavo, J. A., and Hol, W. G. (2002). 
GAF domains: two‐billion‐year‐old molecular switches that bind cyclic nucleotides. Mol Interv 2, 317‐323. Moglich, A., Ayers, R. A., and Moffat, K. (2009a). Design and signaling mechanism of light‐regulated histidine kinases. J Mol Biol 385, 1433‐1444. Moglich, A., Ayers, R. A., and Moffat, K. (2009b). Structure and signaling mechanism of Per‐ARNT‐Sim domains. Structure 17, 1282‐1294. Mohammad, D. H., and Yaffe, M. B. (2009). 14‐3‐3 proteins, FHA domains and BRCT domains in the DNA damage response. DNA Repair (Amst) 8, 1009‐1017. Mougel, C., and Zhulin, I. B. (2001). CHASE: an extracellular sensing domain common to transmembrane receptors from prokaryotes, lower eukaryotes and plants. Trends Biochem Sci 26, 582‐584. Mukherjee, K., and Burglin, T. R. (2006). MEKHLA, a novel domain with similarity to PAS domains, is fused to plant homeodomain‐leucine zipper III proteins. Plant Physiol 140, 1142‐1150. Mukohata, Y., Ihara, K., Tamura, T., and Sugiyama, Y. (1999). Halobacterial rhodopsins. J Biochem 125, 649‐657. Munoz‐Dorado, J., Inouye, S., and Inouye, M. (1991). A gene encoding a protein serine/threonine kinase is required for normal development of M. xanthus, a gram‐negative bacterium. Cell 67, 995‐1006. Neves, S. R., Ram, P. T., and Iyengar, R. (2002). G protein pathways. Science 296, 1636‐1639. Newberry, K. J., and Brennan, R. G. (2004). The structural mechanism for transcription activation by MerR family member multidrug transporter activation, N terminus. J Biol Chem 279, 20356‐20362. Ninfa, A. J., and Magasanik, B. (1986). Covalent modification of the glnG product, NRI, by the glnL product, NRII, regulates the transcription of the glnALG operon in Escherichia coli. Proc Natl Acad Sci U S A 83, 5909‐5913. O'Neill, E., Ng, L. C., Sze, C. C., and Shingler, V. (1998). Aromatic ligand binding and intramolecular signalling of the phenol‐responsive sigma54‐dependent regulator DmpR. Mol Microbiol 28, 131‐141. Patton, T. G., Yang, S. J., and Bayles, K. W. (2006). 
The role of proton motive force in expression of the Staphylococcus aureus cid and lrg operons. Mol Microbiol 59, 1395‐1404.
Pearce, M. J., Mintseris, J., Ferreyra, J., Gygi, S. P., and Darwin, K. H. (2008). Ubiquitin‐like protein involved in the proteasome pathway of Mycobacterium tuberculosis. Science 322, 1104‐1107. Pellicena, P., Karow, D. S., Boon, E. M., Marletta, M. A., and Kuriyan, J. (2004). Crystal structure of an oxygen‐binding heme domain related to soluble guanylate cyclases. Proc Natl Acad Sci U S A 101, 12854‐12859. Phillips, R. W., Wiegel, J., Berry, C. J., Fliermans, C., Peacock, A. D., White, D. C., and Shimkets, L. J. (2002). Kineococcus radiotolerans sp. nov., a radiation‐resistant, gram‐positive bacterium. Int J Syst Evol Microbiol 52, 933‐938. Pittard, J., Camakaris, H., and Yang, J. (2005). The TyrR regulon. Mol Microbiol 55, 16‐26. Pokkuluri, P. R., Pessanha, M., Londer, Y. Y., Wood, S. J., Duke, N. E., Wilton, R., Catarino, T., Salgueiro, C. A., and Schiffer, M. (2008). Structures and solution properties of two novel periplasmic sensor domains with c‐type heme from chemotaxis proteins of Geobacter sulfurreducens: implications for signal transduction. J Mol Biol 377, 1498‐1517. Ponting, C. P., and Aravind, L. (1997). PAS: a multifunctional domain family comes to light. Curr Biol 7, R674‐677. Ponting, C. P., Aravind, L., Schultz, J., Bork, P., and Koonin, E. V. (1999). Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J Mol Biol 289, 729‐745. Potts, M., Sun, H., Mockaitis, K., Kennelly, P. J., Reed, D., and Tonks, N. K. (1993). A protein‐tyrosine/serine phosphatase encoded by the genome of the cyanobacterium Nostoc commune UTEX 584. J Biol Chem 268, 7632‐7635. Purcell, E. B., and Crosson, S. (2008). Photoregulation in prokaryotes. Curr Opin Microbiol 11, 168‐178. Rasouly, A., Schonbrun, M., Shenhar, Y., and Ron, E. Z. (2009). YbeY, a heat shock protein involved in translation in Escherichia coli. J Bacteriol 191, 2649‐2655. Reinelt, S., Hofmann, E., Gerharz, T., Bott, M., and Madden, D. R. (2003). 
The structure of the periplasmic ligand‐binding domain of the sensor kinase CitA reveals the first extracellular PAS domain. J Biol Chem 278, 39189‐39196. Ryjenkov, D. A., Simm, R., Romling, U., and Gomelsky, M. (2006). The PilZ domain is a receptor for the second messenger c‐di‐GMP: the PilZ domain protein YcgR controls motility in enterobacteria. J Biol Chem 281, 30310‐30314. Saier, M. H., Hvorup, R. N., and Barabote, R. D. (2005). Evolution of the bacterial phosphotransferase system: from carriers and enzymes to group translocators. Biochem Soc Trans 33, 220‐224. Sainsbury, S., Lane, L. A., Ren, J., Gilbert, R. J., Saunders, N. J., Robinson, C. V., Stuart, D. I., and Owens, R. J. (2009). The structure of CrgA from Neisseria meningitidis reveals a new octameric assembly state for LysR transcriptional regulators. Nucleic Acids Res 37, 4545‐4558. Santiago, B., Schubel, U., Egelseer, C., and Meyer, O. (1999). Sequence analysis, characterization and CO‐specific transcription of the cox gene cluster on the megaplasmid pHCG3 of Oligotropha carboxidovorans. Gene 236, 115‐124. Schuster, S. C., Noegel, A. A., Oehme, F., Gerisch, G., and Simon, M. I. (1996). The hybrid histidine kinase DokA is part of the osmotic response system of Dictyostelium. Embo J 15, 3880‐3889. Schuttelkopf, A. W., Boxer, D. H., and Hunter, W. N. (2003). Crystal structure of activated ModE reveals conformational changes involving both oxyanion and DNA‐binding domains. J Mol Biol 326, 761‐767. Selinger, Z. (2008). Discovery of G protein signaling. Annu Rev Biochem 77, 1‐13. Shu, C. J., Ulrich, L. E., and Zhulin, I. B. (2003). The NIT domain: a predicted nitrate‐responsive module in bacterial sensory receptors. Trends Biochem Sci 28, 121‐124. Soderling, S. H., and Beavo, J. A. (2000). Regulation of cAMP and cGMP signaling: new phosphodiesterases and new functions. Curr Opin Cell Biol 12, 174‐179.
Soding, J., Biegert, A., and Lupas, A. N. (2005). The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 33, W244‐248. Soisson, S. M., MacDougall‐Shackleton, B., Schleif, R., and Wolberger, C. (1997). The 1.6 A crystal structure of the AraC sugar‐binding and dimerization domain complexed with D‐fucose. J Mol Biol 273, 226‐237. Spiro, S. (1994). The FNR family of transcriptional regulators. Antonie Van Leeuwenhoek 66, 23‐36. Springer, M. S., Goy, M. F., and Adler, J. (1979). Protein methylation in behavioural control mechanisms and in signal transduction. Nature 280, 279‐284. Spudich, J. L., and Luecke, H. (2002). Sensory rhodopsin II: functional insights from structure. Curr Opin Struct Biol 12, 540‐546. Staron, A., Sofia, H. J., Dietrich, S., Ulrich, L. E., Liesegang, H., and Mascher, T. (2009). The third pillar of bacterial signal transduction: classification of the extracytoplasmic function (ECF) sigma factor protein family. Mol Microbiol. Stock, A., Koshland, D. E., Jr., and Stock, J. (1985). Homologies between the Salmonella typhimurium CheY protein and proteins involved in the regulation of chemotaxis, membrane protein synthesis, and sporulation. Proc Natl Acad Sci U S A 82, 7989‐7993. Stock, A. M. (2006). Transmembrane signaling by asymmetry. Nat Struct Mol Biol 13, 862‐863. Stock, J. B., Ninfa, A. J., and Stock, A. M. (1989). Protein phosphorylation and regulation of adaptive responses in bacteria. Microbiol Rev 53, 450‐490. Stock, J. B., Stock, A. M., and Mottonen, J. M. (1990). Signal transduction in bacteria. Nature 344, 395‐400. Suzuki, K., Babitzke, P., Kushner, S. R., and Romeo, T. (2006). Identification of a novel regulatory protein (CsrD) that targets the global regulatory RNAs CsrB and CsrC for degradation by RNase E. Genes Dev 20, 2605‐2617. Swain, K. E., and Falke, J. J. (2007). Structure of the conserved HAMP domain in an intact, membrane‐bound chemoreceptor: a disulfide mapping study. 
Biochemistry 46, 13684‐13695. Tam, R., and Saier, M. H., Jr. (1993). Structural, functional, and evolutionary relationships among extracellular solute‐binding receptors of bacteria. Microbiol Rev 57, 320‐346. Tasneem, A., Iyer, L. M., Jakobsson, E., and Aravind, L. (2005). Identification of the prokaryotic ligand‐gated ion channels and their implications for the mechanisms and origins of animal Cys‐loop ion channels. Genome Biol 6, R4. Taylor, B. L. (2007). Aer on the inside looking out: paradigm for a PAS‐HAMP role in sensing oxygen, redox and energy. Mol Microbiol 65, 1415‐1424. Taylor, B. L., and Zhulin, I. B. (1999). PAS domains: internal sensors of oxygen, redox potential, and light. Microbiol Mol Biol Rev 63, 479‐506. Torrents, E., Grinberg, I., Gorovitz‐Harris, B., Lundstrom, H., Borovok, I., Aharonowitz, Y., Sjoberg, B. M., and Cohen, G. (2007). NrdR controls differential expression of the Escherichia coli ribonucleotide reductase genes. J Bacteriol 189, 5012‐5021. Tucker, N. P., D'Autreaux, B., Yousafzai, F. K., Fairhurst, S. A., Spiro, S., and Dixon, R. (2008). Analysis of the nitric oxide‐sensing non‐heme iron center in the NorR regulatory protein. J Biol Chem 283, 908‐918. Ulrich, L. E., and Zhulin, I. B. (2005). Four‐helix bundle: a ubiquitous sensory module in prokaryotic signal transduction. Bioinformatics 21 Suppl 3, iii45‐48. Ulrich, L. E., and Zhulin, I. B. (2007). MiST: a microbial signal transduction database. Nucleic Acids Res 35, D386‐390. Urabe, H., and Ogawara, H. (1995). Cloning, sequencing and expression of serine/threonine kinase‐encoding genes from Streptomyces coelicolor A3(2). Gene 153, 99‐104. van Aalten, D. M., DiRusso, C. C., and Knudsen, J. (2001). The structural basis of acyl coenzyme A‐dependent regulation of the transcription factor FadR. Embo J 20, 2041‐2050.
Vasquez, V., and Perozo, E. (2009). Structural biology: A channel with a twist. Nature 461, 47‐49. Verhamme, D. T., Arents, J. C., Postma, P. W., Crielaard, W., and Hellingwerf, K. J. (2001). Glucose‐6‐phosphate‐dependent phosphoryl flow through the Uhp two‐component regulatory system. Microbiology 147, 3345‐3352. Vogeley, L., Trivedi, V. D., Sineshchekov, O. A., Spudich, E. N., Spudich, J. L., and Luecke, H. (2007). Crystal structure of the Anabaena sensory rhodopsin transducer. J Mol Biol 367, 741‐751. Walker, J. R., Altamentova, S., Ezersky, A., Lorca, G., Skarina, T., Kudritska, M., Ball, L. J., Bochkarev, A., and Savchenko, A. (2006). Structural and biochemical study of effector molecule recognition by the E.coli glyoxylate and allantoin utilization regulatory protein AllR. J Mol Biol 358, 810‐828. Wang, J. Y., Clegg, D. O., and Koshland, D. E., Jr. (1981). Molecular cloning and amplification of the adenylate cyclase gene. Proc Natl Acad Sci U S A 78, 4684‐4688. Wang, J. Y., and Koshland, D. E., Jr. (1981). The identification of distinct protein kinases and phosphatases in the prokaryote Salmonella typhimurium. J Biol Chem 256, 4640‐4648. Ward, N. L., Challacombe, J. F., Janssen, P. H., Henrissat, B., Coutinho, P. M., Wu, M., Xie, G., Haft, D. H., Sait, M., Badger, J., et al. (2009). Three genomes from the phylum Acidobacteria provide insight into the lifestyles of these microorganisms in soils. Appl Environ Microbiol 75, 2046‐2056. Weinhouse, H., Sapir, S., Amikam, D., Shilo, Y., Volman, G., Ohana, P., and Benziman, M. (1997). c‐di‐GMP‐binding protein, a new factor regulating cellulose synthesis in Acetobacter xylinum. FEBS Lett 416, 207‐211. Whitworth, D. E., and American Society for Microbiology. (2008). Myxobacteria : multicellularity and differentiation (Washington, DC: ASM Press). Williams, S. B., and Stewart, V. (1999). 
Functional similarities among two‐component sensors and methyl‐accepting chemotaxis proteins suggest a role for linker region amphipathic helices in transmembrane signal transduction. Mol Microbiol 33, 1093‐1102. Wilson, A., Boulay, C., Wilde, A., Kerfeld, C. A., and Kirilovsky, D. (2007). Light‐induced energy dissipation in iron‐starved cyanobacteria: roles of OCP and IsiA proteins. Plant Cell 19, 656‐672. Wower, J., Wower, I. K., and Zwieb, C. (2008). Making the jump: new insights into the mechanism of trans‐translation. J Biol 7, 17. Wu, S. Q., Chai, W., Lin, J. T., and Stewart, V. (1999). General nitrogen regulation of nitrate assimilation regulatory gene nasR expression in Klebsiella oxytoca M5al. J Bacteriol 181, 7274‐7284. Xiong, J., Kurtz, D. M., Jr., Ai, J., and Sanders‐Loehr, J. (2000). A hemerythrin‐like domain in a bacterial chemotaxis protein. Biochemistry 39, 5117‐5125. Zhulin, I. B. (2001). The superfamily of chemotaxis transducers: from physiology to genomics and back. Adv Microb Physiol 45, 157‐198. Zhulin, I. B., Nikolskaya, A. N., and Galperin, M. Y. (2003). Common extracellular sensory domains in transmembrane receptors for diverse signal transduction pathways in bacteria and archaea. J Bacteriol 185, 285‐294. Zhulin, I. B., Taylor, B. L., and Dixon, R. (1997). PAS domain S‐boxes in Archaea, Bacteria and sensors for oxygen and redox. Trends Biochem Sci 22, 331‐333.
[Figure: structures of sensor domains, with N- and C-termini marked (N, C) and bound metal ions labeled where present (Cu in HMA, Fe in hemerythrin). Panels: PAS (FixL, PDB:1xj4); GAF (3'-5' cyclic PDE2a, PDB:1mc0); HNOB (TTE0680, PDB:1u56); SLBB (transcobalamin, PDB:2bbc); HMA (Bacillus CopZ, PDB:1k0v); ACT (3-phosphoglycerate dehydrogenase, PDB:1psd); GyrI (BmrR, PDB:1exi); CACHE (CitA, PDB:1p0z); NUDIX (NrtR, PDB:3gz8); PBP2 (CatM, PDB:2f7c); T-OB (ModE, PDB:1o7l); CBS (Pyrobaculum PAE2072, PDB:2rif); PBP1 (LuxP, PDB:2hj9A); STAS (SPOIIAA, PDB:1buz); 4-helical bundle, 4HB (aspartate receptor, PDB:1vlt); 4-helical cytochromes (cytochrome b562, PDB:1qpu); Hemerythrin (DcrH, PDB:2awc); ATP-cone (ribonucleotide reductase, PDB:3r1r); DSBH (AraC, PDB:2arc); ISOCOT (ribose-5-phosphate isomerase, PDB:1lk7); 7 TM receptor (sensory rhodopsin, PDB:1ap9); PilZ (Vibrio VCA0042, PDB:2rde); ART-LGIC (acetylcholine-binding protein, PDB:1uv6, chains C and D); UTRA (chorismate lyase, PDB:1fw9); cNMPBD (CAP, PDB:1cgp); NTF2 (orange carotenoid protein, PDB:1m98); BLUF (photoreceptor, PDB:2hfo).]
[Figure: domain architecture network graphs. Nodes are labeled with domain names (for example SHELIX, REC, PAS, HAMP, GAF, HISKIN, CBS, GGDEF, EAL, HDGYP, CHASE, CACHE, 4HB, PilZ, STAS, HTH). Legend categories: CN Signaling; two component; Extracellular Sensory Domains; Signal Transmitter Domains; Apoptosis related; Misc Receptors; RNA Metabolism; Lipid signaling; Cal Signaling; Misc; GTPase Signaling; Intracellular Sensory Domains; HTH; Adhesion; MA chemotaxis; ubiquitin; PTS-Accessory; Protein-protein interaction; proteases; Heme and iron sensors; PTS; Ion Transport. An accompanying schematic shows Proteins 1 to 4, built from domains A to D, represented as an Architecture Network and as a Functional Network. Labeled groups: Two Component systems; Apoptosis-related systems; Transmembrane Receptors; Cyclic Nucleotide and Diguanylate Binding; CHEM TRANS.]
[Figure: scatter plots; organism abbreviations label selected points. Panel a, Complexity quotient (y) against Number of Proteins (x), with fit y = 60.604 ln(x) - 210.56, R² = 0.785. Panel b, Sensory Domain Containing Proteins: Protein Counts (y) against Proteome Size (x), with fits y = 0.0449x + 19.949, R² = 0.7681, and y = 0.0002x^1.6602, R² = 0.8383. Panels c and d, PAS and GAF: Protein Counts (y) against Proteome Size (x); fits as extracted, in order of appearance: y = 8E-06x^1.7626, R² = 0.4622, and y = 1.5575e^(0.0004x), R² = 0.3768.]
[Figure: scatter plots for individual signaling domains; organism abbreviations label selected points. Panels a and b, cNMPBD and Chemotaxis Transducers: Protein Counts (y) against Proteome Size (x); one fit is given, y = 1.3208e^(0.0003x), R² = 0.3626. Panel c, Histidine Kinases: Protein Counts (y) against Proteome Size (x), with fit y = 1E-05x^1.7846, R² = 0.6636. Panel d, Receivers vs Histidine Kinases: Receivers (y) against Histidine Kinases (x), with fit y = 0.7428x + 1.3445, R² = 0.9145.]