Natural history of sensor domains in bacterial signaling systems

L. Aravind*, Lakshminarayan M. Iyer and Vivek Anantharaman

NCBI, NLM, NIH, Bethesda, MD 20894, USA.

*Address for correspondence: [email protected]

Telephone: (301) 594-2445; Fax: (301) 480-9241.

Bacterial Signaling Chapter

Abstract

Organisms sense stimuli at the molecular level using a relatively small set of protein domains. Computational analysis of protein sequences, along with directed experimental studies, has played a major role in the characterization of these protein domains. These sensor domains directly or indirectly detect a vast array of sensory inputs such as solutes, gases, redox potential and light. Here, we systematically survey the types of sensor domains found in bacterial signaling proteins. We summarize the key aspects of their structure that are central to their functions and their associations with other signaling domains. Despite these advances, several of these domains remain poorly understood in terms of their structure, ligands and functional significance. We accordingly try to highlight the significance of some of the under-appreciated sensor domains. Genomic analysis reveals that the architectural complexity of sensory domains increases with the number of sensor proteins in a genome, with a gradual plateau towards a point where newer combinations of domains do not provide major selective advantage. Syntactical analysis of domain architectures shows several discernible patterns that have functional relevance, especially in terms of the constraints introduced by signal transmission domains such as HAMP and S-helix modules. Across bacteria, the number of signaling proteins shows a positive correlation with proteome size. However, there is a clear distinction in the trends between bacteria that react directly and rapidly to a large number of small molecule signals vis-à-vis those that possess distinct signaling systems related to developmental complexity. Analysis of scaling trends for individual sensor domains shows that lifestyle strategies play a major role in the selection of the type and number of these domains in an organism. From an evolutionary viewpoint, the vast majority of sensory domains appear to have their origins in the bacteria and have been widely transferred to other superkingdoms of life. In particular, most major eukaryotic sensor domains appear to have their antecedents in bacteria.

An overview of advances in research on domains found in signaling proteins

The term signaling can be inclusively defined as the set of mechanisms by which cells detect and relay

environmental and intracellular stimuli to activate the appropriate responses to them. Dissection of the molecular

mechanisms of signaling has been a major facet of modern biochemistry and molecular biology. Some of the earliest

studies on signaling emerged from the discovery of kinases that phosphorylate proteins in eukaryotes (e.g. the

pioneering work of Krebs on the glycogen phosphorylase kinase (Fischer and Krebs, 1955; Krebs, 1998)). Forays

into the signaling mechanisms of bacteria began with the operon hypothesis of Jacob and Monod (Jacob and Monod,

1959). Divergent choices of model systems resulted in the development of very different images of the signaling

processes in bacteria and eukaryotes in the early days of molecular biology (Krebs, 1998; Springer et al., 1979;

Stock et al., 1989). In eukaryotes the role of kinases in phosphorylating the hydroxyl groups of serine and threonine

and later tyrosine residues emerged as a major paradigm (Krebs, 1998). In tandem, other signaling mechanisms such

as the pathways dependent on cyclic nucleotide second messengers and GTP hydrolysis either coupled to seven

transmembrane receptors or independently of them became well known in eukaryotes (Gilman, 1987; Neves et al.,

2002). In bacteria, the operon hypothesis pointed to a simple regulatory mechanism, namely activation of a

transcription factor via direct sensing of a diffusible environmental ligand. Subsequently, concerted studies on

bacterial behavior and physiology by Adler, Koshland, Boyer, the Stocks, Ninfa, Magasanik, Hess and Simon,

among several others, uncovered new and distinctive signaling systems (Baker et al., 2006; Black et al., 1980; Hess

et al., 1988a; Hess et al., 1988b; Kreil and Boyer, 1964; Ninfa and Magasanik, 1986; Springer et al., 1979; Stock et

al., 1989; Wang et al., 1981; Wang and Koshland, 1981). The main systems that emerged from these studies were

the chemotaxis receptors, the bacterial cyclic nucleotide network and the phosphorelay two-component systems

comprised of receiver domain and histidine kinase pairs. Despite certain prophetic insights of Koshland (Koshland,

1980), by the early 1990s the mainstream view was that eukaryotic and bacterial signaling systems were quite

different from each other. While both systems were seen to depend heavily on protein phosphorylation cascades, the

former were seen as predominantly possessing serine, threonine or tyrosine phosphorylation (Hanks and Hunter,

1995); whereas the latter were seen as utilizing the histidine-aspartate phosphorelay after phosphorylation of the

histidine (Stock et al., 1990). Similarly, though both superkingdoms were seen as possessing cyclic nucleotide

signaling there was no evidence at that time that the machinery involved in generation or degradation of these

nucleotides was conserved between bacteria and eukaryotes (Selinger, 2008; Wang and Koshland, 1981). Sensory

systems such as the chemotaxis receptors of bacteria and the 7TM receptors of eukaryotes were also seen as having

no counterparts in the other superkingdom (Selinger, 2008; Zhulin, 2001).

By the second half of the 1990s this picture was to undergo a major modification, thanks to the rise of genomics and

the expansion of studies on signaling in new prokaryotic models (Bult et al., 1996; Fleischmann et al., 1995). From

the earliest days, computational analysis of protein sequences played a major role in the dissection of signaling

systems at the molecular level. This was first demonstrated by Stock and co-workers in their analysis of the CheY

protein, which resulted in identification of the receiver domain of the two component system (Stock et al., 1985).

More generally, it led to the important realization that signaling proteins typically displayed a modular architecture

and that each conserved domain performed a distinct specific biochemical function within the polypeptide. This

provided a means of predicting a protein’s function based on the conserved domains found in it. In the second half

of the 1990s key computational tools for protein sequence analysis, such as PSI-BLAST and HMMER, were

developed. These programs represent the information in a protein sequence alignment as a position-specific score

matrix (PSSM) or a hidden Markov model (HMM) and search the protein sequence database for statistically

significant matches to the PSSM or HMM (Altschul et al., 1997; Durbin, 1998). As a consequence distant sequence

similarities that had previously eluded workers could be reliably identified. This development combined with the

proliferation of protein sequences from the efflorescence of genome sequencing efforts provided a powerful means

to explore signaling proteins with the objective of characterizing new protein domains. Concomitant advances in

protein structure determination through NMR and crystallography allowed domain discovery through computational

methods to be taken to the next level, i.e. visualization and analysis of the 3-dimensional structures of domains.

These efforts culminated in a multifaceted image of many of the signaling systems ranging from domain

architectures and interactions of the relevant polypeptides, through the individual domains, all the way down to an

atomic resolution view of domain structure (Gao and Stock, 2009).
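
The position-specific score matrix idea described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the PSI-BLAST or HMMER algorithm: the four-residue alignment in the usage note is invented for illustration, background frequencies are assumed uniform, a flat pseudocount is used, and no sequence weighting or significance statistics are computed. The function names (build_pssm, score_window, best_hit) are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, window):
    """Sum the column scores for a window as long as the matrix."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))


def best_hit(pssm, sequence):
    """Slide the matrix along a sequence; return (best score, start offset)."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

As a usage example, building the matrix from the invented alignment ["DLKG", "DLRG", "DVKG", "ELKG"] and calling best_hit on the invented sequence "MMTDLKGAAW" places the best-scoring window at the offset where "DLKG" occurs. Real tools add sequence weighting, data-derived pseudocounts and statistical significance estimates on top of this basic scoring scheme.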

Of the several major advances brought about by these developments, especially important were the discoveries

emphasizing the previously under-appreciated similarities between bacterial and eukaryotic sensory and signal

transmission systems (Ponting et al., 1999). For example, it became increasingly clear that certain bacterial lineages

such as cyanobacteria, actinobacteria and myxobacteria have an abundance of eukaryote-type protein kinases (S/T/Y

kinases) and associated phosphorylation cascades (Munoz-Dorado et al., 1991; Potts et al., 1993; Urabe and

Ogawara, 1995). On the other hand it also became apparent that the histidine kinase and receiver domains might

have an important role in transmitting signals sensed by diverse eukaryotic receptors in lineages such as plants,

fungi, slime molds and heterolobosean amoebae (Chang et al., 1993; Schuster et al., 1996). Genomics also gave us

the first glimpses of the signaling systems of the third superkingdom of life, the archaea, many of which were

experimentally intractable organisms (Klenk et al., 1997). Comparisons of the signaling systems between the three

superkingdoms of life suggested that many of the systems that were previously believed to be eukaryote-specific

were probably ultimately of bacterial provenance (Ponting et al., 1999). Sequence analysis of signaling proteins led

to the discovery of several new domains belonging to different functional categories. These included: 1) the sensor

domains which recognize and respond to diverse signals – e.g. the PAS, GAF, CACHE and CHASE domains

(Anantharaman and Aravind, 2000; Anantharaman and Aravind, 2001; Aravind and Ponting, 1997; Mougel and

Zhulin, 2001; Ponting and Aravind, 1997; Taylor and Zhulin, 1999; Zhulin et al., 2003; Zhulin et al., 1997); 2)

novel signaling receptors such as the prokaryotic 7TM, 8TM and 5TM receptors (Anantharaman and Aravind,

2003); 3) intra-molecular signal transmitter domains e.g. the HAMP domain and the S-helix module (Anantharaman

et al., 2006; Aravind and Ponting, 1999; Williams and Stewart, 1999); 4) novel enzymatic domains such as bacterial

caspase-type proteases and the STAND class of ATPases (Aravind and Koonin, 2002; Leipe et al., 2004); 5)

bacterial peptide tagging systems, including homologs of the eukaryotic ubiquitin tagging systems (Iyer et al., 2006;

Iyer et al., 2008; Pearce et al., 2008). These discoveries provided new mechanistic insights into how specific

domains sensed stimuli and transmitted them to downstream effectors. In particular, they clarified how both one-

component systems (i.e. transcription/RNA-binding regulators that directly respond to a stimulus) and two component

systems use comparable sensor domains to recognize their activating stimuli and further transmit the impulse to

other effector domains within the same or in a different polypeptide. In other cases the discovery of these domains

allowed the identification of previously unexpected signaling mechanisms (Gao and Stock, 2009; Stock, 2006;

Swain and Falke, 2007; Taylor, 2007). On the whole these developments affirmed the importance of the modular

architecture of proteins in signaling systems. The great diversity of signaling systems seen across life appears to

have been generated by combinations of a relatively small set of ancient protein domains performing specific types

of functions. In the course of evolution, these domains were mixed and matched according to a syntax, which we are just

beginning to understand, to spawn all manner of signaling systems from the simplest single component systems to

complex multi-component cascades and phospho-relay networks.

With more than a decade having passed since the “genomic revolution”, we are now in possession of genome

sequences of more than one representative of practically every known branch of the bacterial tree. We have also

identified nearly all of the widespread conserved domains found in bacterial signaling proteins and possess structural

or mechanistic information on more than half of them (Finn et al., 2008; Gao and Stock, 2009; Manson, 2009;

Moglich et al., 2009b; Ulrich and Zhulin, 2007). This places us in the position to objectively evaluate several

questions concerning the natural history of conserved sensor domains found in bacterial signaling systems. We can

now more conclusively address the similarities and differences between bacterial and eukaryotic sensory systems.

We can also investigate the interplay between life-style and the distribution of sensor domains in bacteria. Finally,

we can also look at the distribution of different sensory systems across the bacterial tree and evaluate the validity of

generalizations based on model systems. In this chapter, we address some of these issues using the most up to date

collection of genome sequences across the bacterial tree and a comprehensive collection of HMMs and PSSMs for

signaling domains. We present the discussion from a “domain-centric” viewpoint: first we categorize all the

different types of signaling domains into a few handy mechanistic categories to facilitate the core discussion on the

sensor domains. We then discuss examples of widely distributed domains central to sensory mechanisms using their

structural features as the primary handle. We then summarize key syntactical elements of domain architectures of

sensory proteins and try to relate them to their mechanistic features. Finally, we review the phyletic distributions of

these domains in bacteria and their significance for the life-style strategies adopted by bacteria. To avoid breaking

the flow of the text, for the most part we do not provide the expansions for the domain names each time they are

encountered; instead we provide them all together in Appendix-I.

Functional classes of domains found in bacterial signaling systems

Globular sensor domains: This category includes globular domains that directly bind ligands and thereby sense

their presence in extracellular media or within the cell. A classical example of such a sensor domain is the sugar-

binding domain of the one-component signaling protein, the transcription factor AraC (Aravind et al., 2005; Soisson

et al., 1997). Other domains in this category bind ligands either covalently (e.g. the GAF domain binding a

tetrapyrrole ligand in phytochromes) or non-covalently (e.g. PAS and BLUF domains binding flavin nucleotides)

and utilize the redox and photosensitive properties of these ligands to act as sensors (Aravind and Ponting, 1997;

Gomelsky and Klug, 2002; Moglich et al., 2009b; Ponting and Aravind, 1997; Zhulin et al., 1997). Yet others use

these prosthetic groups or ions to sense gaseous ligands such as nitric oxide (e.g. the HNOB and HNOBA domains)

(Iyer et al., 2003; Tucker et al., 2008). These domains are the primary recipients of the sensory stimulus and transmit

it in the form of conformational changes induced either by ligand-binding or changes in ligand conformation. While

domains that sense intracellular second messengers are also sensor domains, we consider them separately.

Multi-transmembrane receptor domains: These include a variety of receptors with multiple TM domains. Some of

the multi-TM domains might directly sense stimuli such as light via prosthetic groups such as retinal (the sensory

rhodopsins) (Jung et al., 2003; Mukohata et al., 1999). Some other bacterial counterparts of eukaryotic 7TM

receptors might transduce signals recognized by extracellular sensor domains to intracellular effector domains

(Anantharaman and Aravind, 2003). Some other bacterial multi-TM receptor domains appear to be membrane-

embedded direct sensors of stimuli (Anantharaman and Aravind, 2003).

Signal transmitter domains and modules: This category includes distinct globular domains as well as simple helical

modules that typically connect different signaling domains. The prevalence of such domains is a hallmark of

signaling systems of bacterial origin. They typically function in transmitting a signal in the form of a conformational

change from one domain to another or as switches that prevent inappropriate propagation of a signal in the absence

of a stimulus. Examples of such signal transmitters are the HAMP domain and the S-helix module that function in

conjunction with a whole range of sensor and effector domains (Anantharaman et al., 2006; Aravind and Ponting,

1999; Williams and Stewart, 1999).

Domains in two-component-type phosphorelay systems: The primary members of this category are the effector

domains of the two component systems, the histidine kinase and the receiver domain. They function in conjunction

with two helical partners that are the recipients of the initial histidine phosphorylation, namely the dimerization and

histidine phosphorylation module (DHp) and the His-containing phosphotransfer (HPt) domain (Gao and

Stock, 2009; Stock et al., 1990). A small minority of histidine kinase domains have been shown to phosphorylate

serine/threonine and are thus closer to the S/T/Y kinases in their action (Koretke et al., 2000).

Domains in S/T/Y phosphorylation cascades: Unlike the two component phosphorelay systems, the

phosphorylation of serine or threonine (and in some cases tyrosine) is a single step process catalyzed by the S/T/Y

kinase domain that is structurally distinct from the histidine kinase domains (Hanks and Hunter, 1995; Leonard et

al., 1998). These phosphopeptides might be sensed by specific domains such as the FHA domain (Mohammad and

Yaffe, 2009). In eukaryotes the BRCT domain has been claimed to bind such phosphopeptides, but there is no

evidence for such a function among the bacterial BRCT domains (Mohammad and Yaffe, 2009). The phosphate tags

on proteins are removed by phosphatase domains, of which the PP2A domains are among the most common in

bacteria (Ponting et al., 1999).

Domains in second messenger signaling: These signaling systems generate a second messenger in response to a

primary stimulus and this second messenger is further recognized by specific sensors that further relay the signal to

targets. The most prevalent second messengers in bacterial systems are the cyclic nucleotides cAMP and cGMP,

cyclic diguanylate and the alarmone (p)ppGpp. This category includes the various cyclase domains which generate

the cyclic nucleotides, the cyclic nucleotide phosphodiesterases and the binding domains that recognize cNMP or

diguanylate. All cyclase domains appear to have been derived from different families of nucleic acid polymerases.

The cNMP cyclase and GGDEF (diguanylate cyclase) domains are related to the classical palm-domain

polymerases, whereas the Escherichia coli CyaA-like cyclases and the (p)ppGpp-generating enzyme of the

SpoT/RelA family are related to the polymerase β superfamily (Aravind and Koonin, 1999a; Hogg et al., 2004;

Makarova et al., 2002). The phosphodiesterases belong to at least 4 major families: 1) the HD superfamily (HD-

GYP the cyclic diguanylate phosphodiesterase and cNMP phosphodiesterases), 2) the calcineurin-like superfamily,

3) the metallo-β-lactamase superfamily and 4) the EAL superfamily (Aravind and Koonin, 1998; Galperin et al.,

1999).

Domains and modules in the core chemotaxis transducers: Bacterial chemotaxis transducers (also called methyl

accepting chemotaxis proteins) are proteins that combine N-terminal sensor and signal transmitter domains with a C-

terminal coiled coil domain (Alexander and Zhulin, 2007; Baker et al., 2006; Zhulin, 2001). This coiled coil domain

is the primary domain in this category and contains two distinct sub-structures, which respectively act as targets for

the other types of domains/modules in this category. These other domains are the methyltransferases and

methylesterases that modify glutamates or glutamines in one of the substructures in the coiled coil and the CheW

modules that interact with the other substructure in the coiled coil.

GTP-dependent signaling domains: In contrast to eukaryotes, much less is known about GTPase signaling in bacteria.

There is no evidence for classical G proteins in bacteria (Ponting et al., 1999). Though several homologs of the

small Ras-like GTPase are observed in bacteria their roles in sensory signaling remain poorly understood, and these

do not possess the classical exchange factors (GEFs) and activating proteins (GAPs) typical of eukaryotic GTPase

signaling systems. A number of pathogenic bacteria possess effectors that interfere with the signaling of their host

cells into which they are secreted (Aepfelbacher et al., 2005).

Domains of “Apoptotic-type” signaling systems: A number of distinct domains were initially characterized to

interact together in signaling networks regulating apoptosis in animals. These include the distinctive ATPases of the

STAND class such as the AP- and NACHT- ATPases, proteases such as the caspases, and the enigmatic TIR

domain (Anantharaman and Aravind, 2001; Leipe et al., 2004). Subsequently, it became apparent that along with

S/T/Y kinases these domains are frequently found in a subset of bacteria and even in those organisms they might

interact with each other to constitute poorly understood signaling systems (Anantharaman and Aravind, 2001;

Aravind and Koonin, 2002). Some of them like the MalT ATPase might play an important role in transmitting

signals upon recognition of solutes such as sugars (Marquenet and Richet, 2007).

Ion channels: Prokaryotes possess several ion channels that have roles in responding to osmotic imbalances

(Vasquez and Perozo, 2009) as well as signaling in response to sensing of specific ligands (Lee et al., 2008a). In this

chapter we restrict ourselves to only discussing ligand-gated ion-channels that have a direct sensory role.

Phosphotransfer systems (PTS): This complex system includes a series of domains that relay a high energy

phosphate moiety from phosphoenolpyruvate via serial transesterifications to sugars being transported via

transmembrane permeases. However, a variant of this phospho-relay system is involved in the regulation of nitrogen

metabolism in certain bacteria. While the PTS could be seen as a solute sensory mechanism, it appears to be

primarily a solute uptake system. Hence, we do not discuss it in any detail here. We refer readers to a detailed

comparative genomics analysis of these proteins by Saier and co-workers (Barabote and Saier, 2005; Saier et al.,

2005).

Peptide-tagging systems: In eukaryotes conjugation of proteins with peptide tags in the form of ubiquitin (or a

related polypeptide) or homopolymeric peptides such as polyglycine, polyglutamate or even a single amino acid

such as tyrosine is a major regulatory mechanism in several signaling processes (Hochstrasser, 2009). In bacteria a

well-known mechanism of peptide tagging via the tmRNA system plays a role in degradation of defective proteins

(Wower et al., 2008). However, only recently has it become apparent that some bacteria possess domains that

constitute counterparts of the eukaryotic ubiquitin system or analogous regulatory peptide-tagging systems such as

pupylation (Iyer et al., 2006; Iyer et al., 2008; Pearce et al., 2008).

It should be stressed that the above categorization of signaling domains is not rigid but is intended as a useful

context for the further detailed discussion on domains involved in sensory processes. In functional terms there is

much cross-talk between some of these categories of domains (e.g. the well-known case of the close interaction

between chemotaxis transducers and certain two component systems) (Baker et al., 2006).

General mechanisms through which sensory inputs are channelized

In any organism there has been selection for several different types of responses to sensory stimuli, which

might be broadly characterized as: 1) direct rapid responses- These are mediated by one-component systems, where

sensory domains are usually fused to DNA-binding domains such as the helix-turn-helix (HTH) domain, to direct

transcription of particular target genes and rapidly alter the transcriptional state of the cell (Aravind et al., 2005). 2)

Filtered responses- In these cases a sensor domain might be combined with a catalytic effector domain, typically a

histidine kinase, via signal transmitter elements such as the HAMP domain and the S-helix (Gao and Stock, 2009).

These elements act as a preliminary filter for propagation of the signal received by the sensor domain and thus

ensure its controlled transmission. Additional control steps are present downstream in the form of the DHp, HPt (if

present) and receiver domains before the signal is converted to a transcriptional response. These systems allow a

controlled response that allows sensing of thresholds and may be contrasted with the more continuous and rapid

responses afforded by the one-component systems. 3) Signal amplification- Amplification of the initial stimulus

sensed by a sensor domain can be useful to direct global state changes in the cell. These are usually mediated by

cNMP/cyclic diguanylate-generating enzymes that sense the signal via a sensory domain and amplify it by

generating a second messenger that is further sensed by specific domains (Benach et al., 2007; Linder and Schultz,

2008). An alternative amplificatory mechanism is via S/T/Y kinases that phosphorylate target peptides, which might

be sensed by FHA domains that further recruit other effectors to the phosphorylated proteins. Multiple kinases could

also be linked in an amplificatory phosphorylation cascade (Hanks and Hunter, 1995; Jagadeesan et al., 2009). 4)

Altering the shape of a response and negative regulation- Certain enzymes such as cyclic nucleotide

phosphodiesterases and phosphatases could function downstream of sensor domains to reverse the action of cyclases

and kinases to create specifically shaped responses such as a sharp peak or shut down of an ongoing response

(Soderling and Beavo, 2000). 5) Allosteric and feedback regulation- Sensor domains combined with catalytic

domains can often serve to regulate the action of an enzyme in an allosteric manner or sense a feedback from a

downstream process. For example GAF domains fused to the phosphodiesterase domain regulate its activity by

sensing cyclic nucleotides (Martinez et al., 2002). 6) Memory- The alteration of responses to stimuli subsequent to

the initial stimulus might be termed memory in a signaling system. This is seen in the form of covalent modification

of signaling domains. This is exemplified by the methylation, demethylation and deamidation of glutamates and

glutamines in chemotaxis transducers by the methylesterases and methyltransferases (Baker et al., 2006).

The way these different types of responses have been achieved in evolution is primarily via the combination of the

above-outlined categories of domains within a single polypeptide or through interactions between them on separate

polypeptides (Fig. 1). In the next section we consider the major sensor domains that are central to bacterial

signaling from a structural viewpoint. We do not consider signal transmission domains and effector domains in

detail and discuss them only in the context of particular signaling mechanisms downstream of the sensor domains

(for domain name expansions see Appendix-I).
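
The way domain combinations map to the response types outlined above can be sketched as a small Python example. This is a deliberately simplified illustration, not a method used in this chapter: it treats a domain architecture as an ordered list of domain names and assigns one coarse category based on which effector domain accompanies a sensor domain. The labels "HisKinase" and "cNMP_cyclase", the function name classify_architecture and the rule order are assumptions introduced only for this sketch; real proteins can combine several effectors and would need a richer scheme.

```python
# Domain groupings follow the functional classes named in the text.
SENSOR_DOMAINS = {"PAS", "GAF", "CACHE", "HNOB", "HNOBA", "BLUF", "PocR"}
DNA_BINDING = {"HTH"}
HISTIDINE_KINASE = {"HisKinase"}           # hypothetical label
MESSENGER_CYCLASES = {"GGDEF", "cNMP_cyclase"}  # "cNMP_cyclase" is a hypothetical label
PHOSPHODIESTERASES = {"EAL", "HD-GYP"}


def classify_architecture(architecture):
    """Assign a coarse response category to an ordered list of domain names."""
    domains = set(architecture)
    if not domains & SENSOR_DOMAINS:
        return "no recognized sensor domain"
    if domains & DNA_BINDING:
        return "one-component system (direct rapid response)"
    if domains & HISTIDINE_KINASE:
        return "two-component sensor kinase (filtered response)"
    if domains & MESSENGER_CYCLASES:
        return "second messenger generation (signal amplification)"
    if domains & PHOSPHODIESTERASES:
        return "second messenger degradation (response shaping)"
    return "sensor with uncategorized effector"


if __name__ == "__main__":
    examples = [
        ["PAS", "HTH"],
        ["CACHE", "HAMP", "HisKinase"],
        ["GAF", "GGDEF"],
        ["PAS", "EAL"],
    ]
    for arch in examples:
        print("+".join(arch), "->", classify_architecture(arch))
```

The checks are applied in a fixed order, so an architecture carrying both a cyclase and a phosphodiesterase would be reported under the first matching category; that ordering is a simplification of this sketch rather than a biological claim.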

Globular sensor domains

The PAS-like fold

A large number of the sensor domains in bacterial signaling proteins can be unified into a single structural fold

termed the PAS-like fold (also termed the profilin-like fold in the SCOP database (Andreeva et al., 2008)). The

prototypical superfamily contained in this fold is the classical PAS domain, which is conserved in a vast number of

bacterial signaling proteins. Other major superfamilies of bacterial sensor domains possessing the PAS-like fold are

GAF, CACHE, PocR, HNOBA and GSU0582-like heme-binding domain (Anantharaman and Aravind, 2000;

Anantharaman and Aravind, 2005; Aravind and Ponting, 1997; Moglich et al., 2009b; Pokkuluri et al., 2008;

Ponting and Aravind, 1997; Zhulin et al., 1997). Domains previously named LOV and MEKHLA are simply

versions of the originally defined classical PAS domain (Moglich et al., 2009b; Mukherjee and Burglin, 2006).

Similarly, the so-called CHASE4 domain (Zhulin et al., 2003), which is found as the extracellular domain of

bacterial sensory receptors, is merely a divergent member of the CACHE superfamily. The classical PAS-like fold is

characterized by a core sheet comprised of 5 β-strands arranged in a 3-4-5-1-2 order which form the base of a pocket

(Moglich et al., 2009b). Strands 2 and 3 are on opposite sides of the sheet and are connected by a “flange” that might

contain two or more secondary structure elements in a helical conformation (Fig. 2). The loop between strands 3 and

4 might also display extension or insertion of additional elements. Together, these two features form the sides of a

pocket that can accommodate a ligand (Moglich et al., 2009b). This ligand-binding mode is largely conserved

throughout the fold as evidenced by the multiple ligand-bound structures available for representatives of the PAS,

GAF and CACHE superfamilies (Moglich et al., 2009b) (Fig. 2). This suggests that the ancestral member of the

PAS-like fold was a ligand-binding domain and that the extant diversity of ligands bound by this fold is a

consequence of structural divergence of the ancestral ligand-binding pocket resulting in new specificities (Aravind

and Koonin, 2002). The emergence of new ligand-contacting sites appears to have chiefly occurred as a result of the

diversification of the flange and the loop between strand 3 and 4. One striking example of this is the emergence of a

cysteine in the first GAF domain of the cyanobacterial phytochromes (subsequently transferred to plants) in the

insert between strand 3 and 4 to covalently bind the phycocyanobilin ligand. As a result this version of the GAF

domain covalently binds the chromophore in contrast to the other bacterial phytochromes where the chromophore is

attached to a distinct N-terminal region (Lamparter et al., 2004). In the case of the PAS domain of the photoactive

yellow protein another such cysteine has emerged in the flange, which is covalently linked to 4-hydroxycinnamic

acid (Moglich et al., 2009b). Another variation is the conserved histidine in the FixL PAS domains which helps in

coordinating a heme ligand bound in the pocket (Ayers and Moffat, 2008). An exception to this picture is the

GSU0582-like heme-binding domain from Geobacter sulfurreducens which has evolved a distinct heme-

coordinating site on the face opposite to the conventional ligand-binding pocket (Pokkuluri et al., 2008).
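
The strand arrangement described above can be made concrete with a small sketch. It encodes the 3-4-5-1-2 spatial order given in the text and measures how far apart sequence-consecutive strands lie in the sheet. The function name and the notion of "separation" are illustrative assumptions, not terminology from the cited structural studies.

```python
# Spatial order of the five core strands across the sheet, as given in the text.
SHEET_ORDER = [3, 4, 5, 1, 2]

def sheet_separation(a, b, order=SHEET_ORDER):
    """Number of positions separating strands a and b within the sheet."""
    return abs(order.index(a) - order.index(b))

# Connections between strands that are consecutive in the sequence (1->2, 2->3, ...).
connections = {(n, n + 1): sheet_separation(n, n + 1) for n in range(1, 5)}

for (a, b), gap in connections.items():
    if gap == len(SHEET_ORDER) - 1:
        note = "spans the whole sheet (the flange connection)"
    else:
        note = "neighbouring strands"
    print(f"strand {a} -> strand {b}: separation {gap}, {note}")
```

Under this encoding only the connection between strands 2 and 3 has to cross the full width of the sheet, which is consistent with the flange being a long, helix-containing element that bridges the two ends of the core β-sheet.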

Most superfamilies of this fold that have been characterized to date bind a wide range of ligands mostly non-

covalently and occasionally covalently. In the PAS superfamily itself we encounter versions showing specificity to

a diverse set of ligands such as 4-hydroxycinnamic acid, FMN, FAD and 1H-indole-3-carbaldehyde (Moglich et al.,

2009b). Certain PAS domain proteins might also be promiscuous in binding ligands and serve as receptors of a

range of xenobiotic compounds. In the CACHE superfamily there is evidence for binding of ligands such as

glycerol, citrate, sugars and metal ions (Moglich et al., 2009b; Reinelt et al., 2003), whereas in the case of the GAF

domain there is evidence for binding of cyclic nucleotides, the tetrapyrrole ligands such as phycocyanobilin, formate

and metal ions (Aravind and Ponting, 1997; Lamparter et al., 2004; Martinez et al., 2002; Tucker et al., 2008). The

divergent GAF domains found in the sensor domain of the transcription factors of the IclR family also bind a range

of small solutes such as glyoxylate (Walker et al., 2006). Another divergent version of the GAF superfamily, the

autoinducer-binding domain, is found fused to a HTH domain in the LuxR-like transcription factors and specializes

in binding the quorum-sensing N-acyl homoserine lactones (Chai et al., 2001). Much less is known of the ligands

of the PocR superfamily, but the available evidence suggests that it binds small hydrocarbon derivatives such as 1,3-

propanediol (Anantharaman and Aravind, 2005). Members of the PAS-like fold function in both intracellular and

extracellular contexts – the PAS, GAF, HNOBA and PocR superfamilies are typically intracellular in their

localization, whereas members of the CACHE superfamily are usually extracellular ligand-binding domains

(Anantharaman and Aravind, 2000). The ligand binding ability of members of the PAS-like fold might be used in

sensory perception in two distinct modes. The bound ligands like flavin nucleotides, 4-hydroxycinnamic acid,

phycocyanobilin and heme could act as redox sensors or photosensors, or both (Lamparter et al., 2004; Moglich et

al., 2009a; Moglich et al., 2009b; Taylor and Zhulin, 1999). In other cases like the binding of cNMP by GAF

domains, or the binding of autoinducers and diverse ligands of one-component systems, the binding of the ligand by

the domain itself acts as the sensory input (Chai et al., 2001; Martinez et al., 2002). There are several other

members of the PAS-like fold, especially classical PAS domains which are not known to bind any small molecule

ligands (Moglich et al., 2009a). Instead they have been found to dimerize through homophilic interactions or contact

other proteins using the same ligand-binding pockets (Gao and Stock, 2009; Moglich et al., 2009b). However, it is

possible that the natural ligands remain unknown in a number of these cases. Irrespective of whether they bind small

molecule or peptide ligands, the majority of members of this fold appear to be able to undergo conformational

changes, most probably on account of the peculiar flange which bridges the two ends of the core β-sheet (Fig. 2).

The presence of multiple tandem repeats of PAS and, in some cases, GAF domains in several signaling proteins

differentiates them from other members of the fold such as the CACHE and HNOBA domains that typically occur

as a single copy in the protein (Fig. 1). It is likely that not all of the copies in these tandem arrays of PAS domains

bind small molecule ligands; rather they might bind peptides (including via dimerization). It is conceivable that

these tandem arrays utilize their ability to undergo conformational changes to transmit the stimulus sensed by the

single sensory domain to the downstream effector domains. Thereby tandem arrays of PAS domains in signaling

proteins could provide a powerful means to set response thresholds for stimuli.

The CHASE domain

The CHASE domain is an extracellular sensor domain that appears to be critical for recognition of secreted ligands

used in cell-cell communication (Anantharaman and Aravind, 2001; Mougel and Zhulin, 2001). No structure is as

yet available for any representative of this family, but our profile-profile comparisons with the HHpred program

(Soding et al., 2005) indicate that it contains a core domain of the PAS-like fold that is most closely related to the

CACHE domain (see above). The domain is typically combined with a variety of intracellular catalytic domains of

the two component system and cyclic nucleotide/cyclic diguanylate-dependent second messenger systems and S/T/Y

kinases (Anantharaman and Aravind, 2001). In eukaryotes, the characterized ligands include adenine derivatives

such as cytokinin, and the related slime mold signaling molecule discadenine, and a small peptide SDF-2 secreted by

the pre-stalk cells of Dictyostelium (Anantharaman and Aravind, 2001; Mougel and Zhulin, 2001). The ligands

of none of the prokaryotic versions have been characterized, but given their close relationship to the eukaryotic

versions it is possible that they are receptors of a previously unexplored bacterial cell-cell communication system

that might utilize cytokinin-like ligands.

The HNOB-like fold

This fold includes two distinct superfamilies of sensor domains namely the HNOB and V4R (Anantharaman et al.,

2001; Iyer et al., 2003), both of which are always found in intracellular proteins. This is an α+β fold with a 4-helical

bundle stacked against a 4 β-stranded unit with two additional helices associated with it (Fig. 2)(Pellicena et al.,

2004). A ligand-binding cleft is formed between these two sub-structures. The majority of members of the HNOB

domain contain a conserved histidine in the first helix associated with the 4-stranded substructure with which it

coordinates a heme molecule (Iyer et al., 2003). The HNOB domains use the heme ligand to sense gaseous ligands

such as nitric oxide and carbon monoxide that diffuse through the cell membrane. In bacteria the HNOB domain

appears to occur as a standalone or in combination with the coiled-coil domain of chemotaxis transducers, while in

eukaryotes they form the NO or CO sensing module of the soluble guanylyl cyclases (Iyer et al., 2003). Most

versions of the HNOB domain are either coupled with a HNOBA domain (see above) in the same polypeptide or

physically associate with one (in bacteria usually encoded by another signaling protein in the same operon) (Iyer et

al., 2003; Ma et al., 2008). Thus it appears that coordination between the HNOB and HNOBA domains is critical for

transmission of the signal upon sensing the gaseous ligand (Iyer et al., 2003). The V4R domain is found in archaeal

and bacterial single component systems, where it is fused to HTH domains (e.g. MTH1349) or the NtrC-like

AAA+ domains (e.g. XylR and DmpR) that recruit σ54 to regulate transcription (Anantharaman et al., 2001). The

V4R domains of XylR and DmpR appear to be required to sense aromatic compounds such as phenols (O'Neill et

al., 1998). Certain versions of the V4R domain such as that in MTH1349 contain 3 conserved cysteines suggesting

that these versions might coordinate a metal, which might play a further role in ligand recognition (Anantharaman et

al., 2001).

β-grasp fold sensory domains

This fold is an α+β fold typically containing 5 core β-strands and one helical segment (Fig. 2). In several versions

the β-strands are considerably curved to assume a barrel-like structure (Burroughs et al., 2007b). It is a versatile and

commonly found fold that includes a vast variety of representatives including ubiquitin, the 2Fe-2S ferredoxins, and

the Nudix phosphoesterases, of which a subset are dedicated small molecule-binding domains with a potential

sensor function. The chief among these is the soluble ligand-binding beta-grasp (SLBB) domain (Burroughs et al.,

2007a) (Fig. 2). One type of SLBB domain specializes in binding of cobalamin and its precursors and is often

present in bacterial cell surface proteins that might act as cobalamin scavengers. Bacterial and archaeal intracellular

versions are fused to ArsR- and AraC-type HTH domains, and might act as one-component system regulators that

sense cobalamin or its precursors (Burroughs et al., 2007a). Other superfamilies of the β-grasp fold, such as the 2Fe-

2S ferredoxin (Burroughs et al., 2007b) domain typically function as iron-binding electron transport facilitators in

metabolic reactions; however, some versions of this ferredoxin domain have also been recruited as potential sensors.

It is conceivable that these versions utilize the bound metal ions to sense redox potential. Some examples include

CyaC from Rhizobium meliloti (gi: 227823018) and cyaA6 from Leptospira interrogans (gi: 24213884) where the

2Fe-2S ferredoxin shows independent fusions to cNMP cyclase domains (Fig. 1). The Nudix superfamily is an

enzymatic version of the β-grasp fold, members of which hydrolyze nucleotide diphosphate derivatives (Fig.

2). However, some catalytically compromised and low-activity versions function as sensor domains in both

eukaryotes and bacteria. The best known example in bacteria is the NAD metabolism repressor NrtR, whose

repressor action is shut off by the binding of ADP-ribose to its sensor Nudix domain. Gradual hydrolysis of the

ADP-ribose restores the repressor activity of NrtR; thereby the Nudix domain functions as a sensory switch (Huang et

al., 2009).

The ferredoxin-like fold domains

The ferredoxin-like (also called RRM-like) fold is a very widespread α+β fold that is present in at least 3 distinct

sensor domains found in bacterial signaling proteins (Andreeva et al., 2008). The fold is characterized by four core

β-strands that are stacked against two helices (Fig. 2). The surface of the 4-stranded β-sheet, and the cleft

between the helices and β-sheet, form distinct, versatile ligand-binding regions in different versions of this fold (Fig.

2). The classical 4Fe-4S ferredoxins are major players in diverse metabolic electron transport chains. The 4Fe-4S

ferredoxin domain has also been recruited as a sensor domain in certain one-component systems, where it is combined

with HTH domains (e.g. HM1_2484, gi: 167630545 from Heliobacterium modesticaldum; Fig. 1) and might

function as redox sensors. The heavy metal binding HMA domain is another superfamily with the ferredoxin-like

fold that possesses two conserved cysteines with which it binds heavy metals (Jordan et al., 2001). These domains

are typically found fused to heavy metal efflux ATPases of detoxification systems (Jordan et al., 2001)(Fig. 2);

however, a subset of these domains found fused to HTH domains are likely to function as heavy metal sensing

transcription regulators (e.g. CHU_1500, gi: 110637904 from Cytophaga) (Aravind et al., 2005). The ACT domain

is the third sensor domain with the ferredoxin-like fold (Fig. 2). It is frequently fused to the catalytic domains of

various metabolic enzymes, whose activity it regulates allosterically or by a feedback process by binding amino

acids and their derivatives (Aravind and Koonin, 1999b). It is also used as a sensor in different signaling proteins.

One classical example of this is the TyrR transcription factor that regulates the expression of genes dependent on

aromatic amino acids Tyr, Trp and Phe (Pittard et al., 2005). This protein senses the aromatic amino acids using the

N-terminal ACT domain and transmits this signal via a PAS domain to downstream NtrC-like σ54-binding AAA+

ATPase and HTH domains. Another remarkable bacterial signaling system depending on the ACT domain is

the stringent response system that is triggered by starvation of amino acids and other nutrients. The key proteins in this

system are SpoT and RelA that generate and degrade the second messenger or “alarmone” (p)ppGpp using their

nucleotidyltransferase and HD phosphoesterase domains (Hogg et al., 2004). These proteins additionally contain an

ACT domain which might sense the status of nutrient availability by binding amino acids (Aravind and Koonin,

1999b). A divergent version of the ACT domain also serves as a sensor for ligands in the Lrp family of one-

component transcription factors (Ettema et al., 2002).
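
The TyrR example above illustrates how a domain architecture can be read as an ordered list of domains from the N- to the C-terminus. The sketch below assumes a deliberately simplified rule: a protein is treated as a one-component regulator if it carries both a sensor domain and a DNA-binding domain. The category sets, the function name and the `toy_sensor_kinase` entry are hypothetical illustrations, not a published classification scheme.

```python
# Illustrative category sets; the grouping is an assumption made for this sketch.
SENSOR_DOMAINS = {"ACT", "PAS", "GAF", "V4R"}
DNA_BINDING_DOMAINS = {"HTH"}

# Architectures listed from N- to C-terminus.
ARCHITECTURES = {
    # As described in the text: ACT sensor, PAS, sigma54-binding AAA+ ATPase, HTH.
    "TyrR": ["ACT", "PAS", "AAA+", "HTH"],
    # Hypothetical membrane sensor kinase, included only for contrast.
    "toy_sensor_kinase": ["PAS", "PAS", "HisKA"],
}

def looks_like_one_component(architecture):
    """True if a sensor domain and a DNA-binding domain share one polypeptide."""
    domains = set(architecture)
    return bool(domains & SENSOR_DOMAINS) and bool(domains & DNA_BINDING_DOMAINS)

for name, architecture in ARCHITECTURES.items():
    print(name, "-".join(architecture), looks_like_one_component(architecture))
```

A real survey would need curated domain models and a broader list of output domains; the point here is only that the sensor and the transcriptional output sit in the same polypeptide.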

The gyrase inhibitor (GyrI)-like fold

This fold is made up of a dimer of strand-helix-strand-strand (SHS2) modules that together constitute a ligand-

binding pocket (Anantharaman and Aravind, 2004) (Fig. 2). The sensory members of this fold are typified by the

ligand-binding domains of a number of one component transcription regulators prototyped by Bacillus subtilis

BmrR and Escherichia coli SbmC and Rob. In BmrR, the GyrI domain functions as a promiscuous xenobiotic sensor

and transmits the signal via a conformational change to the fused HTH domain to regulate transcription of multidrug

resistance genes (Newberry and Brennan, 2004). The diversity in the ligand-binding pocket of the GyrI domain of

the bacterial one-component transcription factors suggests that some of these might bind other types of ligands,

though most members might show a preference for positively charged ligands (Anantharaman and Aravind, 2004).

One specific family of the GyrI fold termed the SOUL family is found predominantly in secreted proteins in both

bacteria and eukaryotes and specifically binds a heme ligand (Dias et al., 2006). Members of the SOUL family from

Francisella tularensis (FTM_0021) and Mycobacterium (Mmcs_2896) might function on the cell surface as light or

redox sensors by means of the heme ligand.

CBS domains

A single CBS domain is an α+β unit with a 4-stranded β-sheet at its core. The CBS domains always occur as

obligate dimers with the surfaces of the β-sheets of the monomers stacked against each other to form a compact

globular unit (Bateman, 1997) (Fig. 3). CBS domains have thus far been shown to bind a variety of adenosine

ligands such as cAMP, AMP, ATP and S-AdoMet and regulate the activity of fused domains both in metabolic

enzymes such as Cystathionine-beta synthase and inosine-5'-monophosphate dehydrogenase and in signaling

proteins such as chloride channels in eukaryotes (Bateman, 1997; Ignoul and Eggermont, 2005). In bacteria, the

CBS domains (along with cNMP-binding domains) are found fused to a polymerase β-like nucleotidyltransferase

domain in a remarkable group of signaling enzymes prototyped by VIH_000647 (gi: 262033842; Fig. 1) from Vibrio

cholerae (Aravind and Koonin, 1999a). This nucleotidyltransferase domain is likely to function in generating a cyclic

nucleotide or a related second messenger and the CBS domains might function as allosteric regulators that sense this

second messenger.

The DICT domain

This is a previously uncharacterized intracellular sensor domain of about 120 amino acids that is found associated

with the catalytic domains of diguanylate cyclases and phosphodiesterases (GGDEF, EAL and HD-GYP domains)

and two-component systems (histidine kinase). Hence, it was called the DICT domain (for diguanylate cyclases and

two-component systems; it overlaps with the domain of unknown function DUF2308 defined in the PFAM database

(Finn et al., 2008)). This domain is also fused to the STAS domain (see below) in certain proteins. It is predicted to

function as a sensor domain primarily based on its domain architectures in which it occupies a position comparable

to other sensor domains such as the BLUF (see below), PAS, GAF, HNOB and HNOBA domains (Fig. 1). Its

structure is currently not available, but secondary structure prediction (supplementary material) indicates that it

assumes an α+β fold with a 4-stranded β-sheet. The domain is prevalent in photosynthetic bacteria and halophilic

archaea suggesting that it might have a role in light response. A distinct subfamily of DICT domains has conserved

signatures in the form of an N-terminal HxxED motif and a C-terminal cysteine which might form a metal or

prosthetic group coordinating site (Supplementary material).
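
The sequence signature mentioned above (an N-terminal HxxED motif together with a C-terminal cysteine) can be screened for with a simple pattern match. The sketch below assumes that "x" stands for any residue and that "N-terminal" and "C-terminal" can be approximated by the two halves of the domain sequence; both are simplifying assumptions, and the toy sequence is invented and is not a real DICT domain.

```python
import re

# Assumption: HxxED is read as His, any two residues, then Glu-Asp.
HXXED = re.compile(r"H..ED")

def find_candidate_site(seq):
    """Return (motif_start, cysteine_positions) or None; positions are 0-based.

    Searches the first half of the sequence for HxxED and the second half
    for cysteines. A motif that straddles the midpoint is not reported.
    """
    half = len(seq) // 2
    motif = HXXED.search(seq[:half])
    if motif is None:
        return None
    cysteines = [i for i in range(half, len(seq)) if seq[i] == "C"]
    if not cysteines:
        return None
    return motif.start(), cysteines

# Invented toy sequence used only to exercise the function.
toy = "MSTHAVEDLLKAGGSTPLNQRIVKAGCLE"
print(find_candidate_site(toy))
```

A hit from such a scan is only a starting point; deciding whether the residues are conserved requires a multiple alignment of the subfamily.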

The STAS domain

This domain shows a distinctive α/β fold with a core sheet of 4-5 strands (Fig. 3) and was first discovered in the

intracellular domains of sulfate transporters and the regulator anti-sigma factor antagonists (Anantharaman and

Aravind, 2000). The characterized versions of the STAS domain bind GTP or ATP and might even hydrolyze it at a

low rate (Avila-Perez et al., 2009; Kovacs et al., 1998). Studies on the Bacillus YtvA protein, which combines light-

sensing PAS and GTP-binding STAS domains in a single polypeptide, indicate that the molecule is likely to

function as a dual sensory switch that requires both light and NTP to activate transcription via σB (Avila-Perez et al.,

2009). The STAS domain is also combined with GAF, S/T/Y-kinase, PP2C phosphatase, receptors with extracellular

CACHE and HAMP domains and Zinc-ribbon domains, suggesting that it might function as a potential nucleotide

binding sensory switch in a number of distinct signaling contexts (Aravind and Koonin, 2000).

Domains with the periplasmic-binding protein-type folds

Two topologically distinct, but evolutionarily distantly related folds, PBP1 and PBP2, are prototyped by the

periplasmic solute binding proteins of bacteria that function as partners of the ABC transporters (Tam and Saier,

1993). Both these domains have two α/β sub-domains that are topologically intertwined and constitute a ligand-

binding pocket at the interface between the two subdomains (Fig. 3). The PBP1 and PBP2 domains are extremely

versatile ligand-binding domains that recognize a wide range of substrates including sugars, amino acids, the

bacterial autoinducer-2 (AI-2; furanosyl borate diester), amides, peptides, metal ions, inorganic ions such as

phosphate and sulfate, purines and polyamines (Stock, 2006; Tam and Saier, 1993). Not surprisingly, several

extracellular and intracellular versions of the PBP1 and PBP2 domains have been recruited as sensory domains of

a number of distinct bacterial and eukaryotic signaling systems. Examples of the extracellular versions include the

LuxP protein, a PBP1 domain protein, which binds AI-2 and transmits the signal of its presence to the histidine

kinase LuxQ (Stock, 2006). PBP1 domains are also found in the extracellular ligand-sensing modules of certain

bacterial acetylcholine receptor type ligand-gated ion channels (ART-LGIC; also known as Cys-loop ion channels)

and chemotaxis signal transducers (Tasneem et al., 2005). Intracellular versions of the PBP1 domain constitute the

sensor domain of the LacI family of single component transcription regulators, such as the Lac repressor

(Anantharaman et al., 2001; Daber et al., 2007). The intracellular versions of the PBP2 superfamily constitute the

ligand binding domains of the LysR family transcription factors, which are one of the major bacterial one-

component systems (Anantharaman et al., 2001; Ezezika et al., 2007; Sainsbury et al., 2009). Such PBP2 domains

are also predicted to associate with bacterial ART-LGIC ion channels to regulate their activity (Tasneem et al., 2005).

In eukaryotes the PBP1 domains constitute the ligand-sensing domains of cNMP generating receptors such as the

atrial natriuretic factor receptor, the metabotropic glutamate receptors, vertebrate taste receptors and one of the

domains of the NMDA-type ligand-gated ion channels (Lee et al., 2008b; Tasneem et al., 2005). The PBP2 domain

is also found in NMDA-type ligand-gated ion channels and the membrane spanning segments of the channel are

inserted into it (Fig. 1).

The Anabaena sensory rhodopsin-associated homology (ASRAH) domain

This recently recognized superfamily has an 8-stranded β-sandwich fold and is predicted to bind sugars or their

derivatives in a cleft formed at the dimer interface (De Souza et al., 2009). The intracellular versions of this domain

occur as small proteins that form tetramers typified by the Anabaena sensory rhodopsin signal transducer

(ASRT)(Vogeley et al., 2007). These proteins are likely to regulate signaling via the Anabaena rhodopsin by sensing

sugars or their derivatives. Most ASRT homologs are found in organisms lacking a sensory rhodopsin, and are likely

to regulate the activity of sugar metabolism enzymes such as sugar isomerases, aldolases and sugar kinases by

directly binding sugar (De Souza et al., 2009). This domain differs from most other small molecule sensor domains

in functioning as a standalone module that is not fused to any other signaling domains.

The transporter-OB (T-OB) domain

This domain contains a specialized version of the β-barrel OB-fold which occurs as an obligate dimer on account of

swapping of the C-terminal strands of the fold between monomers (Aravind and Koonin, 2000) (Fig. 3). The

prototypical T-OB domain is found in the ModE single-component transcription factor that combines two T-OB

domains with a N-terminal DNA-binding winged HTH domain. This version of the T-OB domain senses molybdate

and regulates the expression of molybdenum metabolism genes (Schuttelkopf et al., 2003). Other versions of T-OB

domain are also found independently fused to the C-termini of a diverse range of ABC transporters such as the

sulfate transporter CysA, sugar transporters such as MalK and MsmX, glycerol, putrescine, molybdate and iron

transporters (Koonin et al., 2000). In these cases the T-OB domain is predicted to function as an inbuilt sensor which

detects the concentrations of the respective solutes translocated by these transporters and thereby regulates their

activity (Koonin et al., 2000).

The 4-helical up-and-down bundle fold

The 4-helical up-and-down bundle is a common structural motif that is found in at least three distinct superfamilies

of sensor domains, namely the 4HB or the aspartate-receptor type ligand-binding domain, the hemerythrin and the 4-

helical cytochromes (Ulrich and Zhulin, 2005). The 4HB domain is one of the most prevalent extracellular sensory

domains of the chemotaxis transducers, but it also occurs as a standalone cell-surface receptor (e.g. cg3044 from

Corynebacterium; gi: 41326927) or combined in the same polypeptide with effector domains of the two component

system and the cyclic nucleotide/diguanylate second messenger system (Ulrich and Zhulin, 2005). Though

topologically simple (Fig. 3), the 4HB domain has been extensively used in bacterial signaling systems for the

recognition of a diverse range of ligands such as different amino acids, sugars, metals and peptides (Ulrich and

Zhulin, 2005). Profile-profile comparisons with the HHpred program suggest that the so-called “CHASE3” domain,

which is found in several signaling proteins (Zhulin et al., 2003), is likely to be a version of the 4HB superfamily.

The NarX/NarQ domain, which is the nitrate and nitrite sensor module of the NarX kinase (Cheung and

Hendrickson, 2009) and of several other bacterial signaling receptors with other intracellular effector domains, is

another widespread divergent clade of the 4HB superfamily. The hemerythrin superfamily chelates two iron ions by

means of multiple conserved histidine and carboxylate ligands (Fig. 3). A version of this domain has been found to

function as an iron-dependent O2 sensor in the Desulfovibrio vulgaris chemotaxis transducer DcrH (Xiong et al.,

2000). Other members of this superfamily are found combined to the GGDEF domain (e.g. CPS_3631, gi: 71279208

in Colwellia), suggesting a cyclic diguanylate-dependent signal transduction of the ambient O2 (Fig. 1). The 4-

helical cytochrome domains bind heme and are widely used in different electron transfer chains, e.g. cytochrome C’

and cytochrome b562 (Fig. 3). Members of this superfamily appear to have been adapted as heme-dependent sensor

domains in certain bacterial lineages. One previously under-appreciated example is a membrane-associated signaling

protein found in planctomycetes, which combines an N-terminal PTPase-type phosphatase domain with a C-terminal

4-helical cytochrome domain (e.g. RB3293, gi: 32472429 from Rhodopirellula; Fig. 1).

The NIT domain

This is a large all α-helical fold with 8-10 helices, which is found either as the extracellular sensor domain of

bacterial two-component systems, cyclic diguanylate-dependent systems and chemotaxis signal transducers or as

intracellular sensor domains of the AmiR/ANTAR-type transcription antitermination regulators (Shu et al., 2003). In

eukaryotes it appears as the extracellular domain of a group of animal receptor guanylyl cyclases (Anantharaman et

al., 2006). Currently no structures are available for the NIT domain, but based on the predicted secondary structure

it appears likely to adopt a toroidal or multihelical bundle configuration. A functionally characterized protein with

this domain, NasR, combines the NIT domain with the RNA-binding AmiR/ANTAR domain and mediates transcription

anti-termination to allow expression of the nitrate utilization genes (Shu et al., 2003). This process is dependent on

the sensing of nitrate and nitrite ions by the NIT domain (Wu et al., 1999). However, it remains unclear if any of the

extracellular versions similarly sense nitrate or nitrite. It is also not certain if the NIT domain directly binds these

ions or senses them via a bound prosthetic group.

Miscellaneous domains functioning as the sensory units of one-component systems

By definition, one-component systems contain “inbuilt” sensory domains. Several of these domains have a limited

distribution outside of one component systems (Fig. 1); hence, we consider some prominent examples of such

domains together in this subsection even though they possess very distinct structures. The ATP-cone domain is a

cone-shaped 4-helical ATP/ADP binding domain (Fig. 4). It is found in the allosteric regulatory domains of the

type-I and type-III ribonucleotide reductases and the archaeal/bacterial 2-phosphoglycerate kinase, which is involved in

the synthesis of 2-3 phosphoglycerate, a thermoprotective compound (Anantharaman and Aravind, 2000). The ATP-

cone domain is also fused to a DNA-binding Zn-ribbon domain in the one-component transcription factor

NrdR/YbaD (Anantharaman and Aravind, 2000), where it functions to regulate the transcription of the nucleotide

synthesis genes such as ribonucleotide reductase by sensing nucleotide concentrations (Torrents et al., 2007). The

CCTBP domain is a recently described domain with the same fold as the TATA-box binding protein but containing

a conserved cysteine. The CCTBP domain is part of a number of sulfur-transfer systems in conjunction with the E1-

like adenylation domain (Burroughs et al., 2009). Some archaeal one-component transcription factors contain a

CCTBP protein suggesting that it might function as either a redox sensor by means of the thiol group of the

standalone cysteine or else detect elemental sulfur. The FadR domain is a ligand-binding domain of a sub-family of

GntR transcription factors typified by the bacterial fatty acid-sensing regulator FadR (van Aalten et al., 2001). It is a

6-helical domain and the characterized members are known to bind fatty acyl-CoAs such as myristoyl-CoA. It is

possible that members of this fold show greater ligand diversity than is currently known. The AlcR-N domain is

typified by the sensor domain that recognizes the siderophore alcaligin in the Bordetella transcription factor AlcR

(Aravind et al., 2005). This domain is always fused to a C-terminal AraC HTH domain and shows lineage-specific

expansions in certain cyanobacteria. It appears to be a composite domain comprised of a N-terminal all β-strand sub-

domain with the double-stranded β-helix fold and a C-terminal all α-helical sub-domain. The majority of versions of

this domain contain a conserved cysteine that could be a site for covalent attachment of a prosthetic group (Fig. 1).

Globular sensor domains with relationships to catalytic domains

A number of sensor domains in bacterial proteins show relationships to catalytic domains. In particular, several one-

component system transcription factors combine catalytic domains of the enzymes of the pathways which they

transcriptionally regulate with a HTH domain. A classical example of this is the biotin repressor BirA, which

contains a biotin ligase domain involved in biotin adenylation and a DNA-binding winged HTH domain in the same

polypeptide (Hopwood, 2007). Here the enzymatic domain senses biotin to regulate transcription via the wHTH

domain. Other examples of such combinations of enzymatic domains to the HTH domains include the NadR

regulator of NAD metabolism genes (combined to a class-I tRNA synthetase-like nucleotidyltransferase domain),

sugar isomerase (SIS) domains, sugar kinase domains, pyridoxal phosphate-dependent aminotransferase domains

and serine protease domains in the DNA damage response regulator LexA (Aravind et al., 2005). In other instances

a sensor domain of a signaling system is related to an enzymatic domain, but is catalytically inactive. Thus, these

domains are bona fide small molecule binding domains that might have diverged from enzymatic domains. Such

domains are also most commonly encountered in one-component systems, though some versions have been widely

used in other signaling contexts. Below we discuss a few prime examples of such sensor domains.

The double-stranded β-helix

This domain (also called the cupin domain) is a distinctive all-β-strand fold that is found in an ancient superfamily

of enzymes that includes sugar isomerases, diverse dioxygenases, histone demethylases of the jumonji family,

isopenicillin synthase, the DNA-repair enzyme AlkB, and protein hydroxylases (Dunwell et al., 2004; Iyer et al.,

2009) (Fig. 4). The small-molecule sensor versions of this domain are prototyped by the sugar-binding domain of

the arabinose operon regulator AraC (Soisson et al., 1997). All members of this fold are characterized by a run of at

least 8 antiparallel strands, with pairs of them linked by short turns to constitute a single coil of the β-helix

(Dunwell et al., 2004) (Fig. 4). The β-helix contains an internal cavity which accommodates a diverse set of ligands.

The catalytically active members of the family have two conserved residues from the N-terminus of the fold (at least

one of which is a histidine) and one from the C-terminus (usually a histidine) that together constitute the substrate-

binding site. In several sensor versions of the fold the N-terminal conserved residues might be lost but they bind the

substrate in a manner identical to the catalytic versions (Anantharaman et al., 2001; Dunwell et al., 2004; Soisson et

al., 1997). Like AraC, most sensor versions of this fold are fused to a DNA-binding HTH domain, and based on the

predicted operons to which they belong a large fraction of these are predicted to recognize sugars (Aravind et al.,

2005). It is quite possible that some of the versions fused to HTH, as well as several of the standalone versions of

this fold that are not fused to HTH domains, recognize other types of ligands. The DSBH domain fused to the anti-

sigma factor domain in the regulator ChrR from Rhodobacter sphaeroides uses the same active site to bind a Zn2+

ion with which it senses singlet oxygen to control transcription (Campbell et al., 2007).

The UTRA domain

The archetypal version of this domain is the catalytic domain of the bacterial chorismate lyases (UbiC) and is

comprised of a duplication of a core 3-stranded β-sheet unit with a bihelical “clasp” (Aravind and Anantharaman,

2003) (Fig. 4). The sheet forms the floor while bihelical clasps form the walls of the ligand-binding pocket. The

non-catalytic versions of the fold are the ligand-binding domains of a large subfamily of GntR-type transcription

factors prototyped by the HutC/FarR-like transcription factors. The UTRA domain might be critical for the sensing

of a diverse array of small molecule stimuli by these transcription factors, such as histidine (HutC), fatty acids

(FarR), sugars (TreR) and alkylphosphonate (PhnF) (Aravind and Anantharaman, 2003; Gebhard and Cook, 2008;

Gorelik et al., 2006).

Sensor domains with the ISOCOT fold

This fold is a complex α/β fold that is found in catalytic domains with diverse catalytic activities such as sugar

isomerases, sugar phosphate lactonases, aminosugar deaminases, acetyl-CoA transferases and

methenyltetrahydrofolate synthetase (Fig. 4). The sensor domains of the DeoR and the SorC families of transcription

factors are catalytically inactive versions of this fold (Anantharaman and Aravind, 2006). While being catalytically

inactive, they retain the substrate-binding site typical of the catalytic members of the fold. The ISOCOT fold domain

of the DeoR transcription factors senses a diverse set of sugar derivatives such as deoxyribose nucleoside (DeoR), tagatose

phosphate (LacR), galactosamine (AgaR), myo-inositol (Bacillus IolR) and L-ascorbate (UlaR) (Anantharaman and

Aravind, 2006; Garces et al., 2008). Similarly, the ISOCOT domains of the SorC family also specialize in sugar-

sensing, e.g. L-sorbose (SorC), erythritol (EryD), D-arabitol (DalR) and fructose 1,6-bisphosphate (CggR).

However, evolutionary analysis indicates that the ISOCOT sensor domains of these two distinct families of

transcription factors were independently derived from distinct enzymatic versions (Anantharaman and Aravind,

2006).

The NTF2 fold

This domain is found in a variety of enzymes such as scytalone dehydratase, limonene-1,2-epoxide hydrolase,

polyketide cyclases and ketosteroid isomerase (Andreeva et al., 2008). A bacterial non-enzymatic version of the

NTF2 fold is found in the orange carotenoid-binding protein, which is deployed by cyanobacteria to protect

themselves against the excess energy absorbed during photosynthesis (Wilson et al., 2007). Here the NTF2 fold

domain binds a carotenoid moiety (Fig. 4). A related version of the NTF2 fold is found fused to the C-termini of a

widespread subfamily of ECF (extracellular function) sigma factors prototyped by the Mycobacterium tuberculosis

stress response transcription factor SigG (Lee et al., 2008a). ECF sigma factors are known to regulate the expression

of genes associated with extracellular or cell surface processes in response to diverse stimuli (Staron et al., 2009). In

light of this, it is conceivable that the NTF2 domains of the SigG-type ECF sigma factors are sensor domains that

help these transcription factors respond to stimuli by binding small molecules.

The MEDS domain

This domain is prototyped by the dichloromethane-sensing domain of the one-component transcription factor DcmR

from Methylobacterium (La Roche and Leisinger, 1991). It also occurs fused to histidine kinase domains (e.g. in the

anti-sigma factor PrsR/RsbA from actinobacteria that regulates stress response and aerial mycelia growth (Lee et al.,

2004)) and as a standalone protein (Anantharaman and Aravind, 2005). The available evidence suggests that it is

likely to functionally collaborate with the PocR domain (see above) in certain organisms in binding hydrocarbon

derivatives (Anantharaman and Aravind, 2005; La Roche and Leisinger, 1991). Sequence profile analysis indicates

that the MEDS domain is a catalytically inactive version of the P-loop NTPase domain of the RecA (DNA

recombinase) superfamily. The MEDS domain does not retain the nucleotide-binding active site of the P-loop

NTPase fold. Instead it has a set of polar conserved residues in place of the P-loop motif which potentially constitute

a ligand-binding site. However, the characteristic feature of the MEDS domains is two highly conserved histidines, a

glutamate and an arginine that are located opposite to the conventional P-loop NTPase active site. These might

constitute a second ligand-binding site that potentially coordinates a metal or a prosthetic group such as heme

(Anantharaman and Aravind, 2005).

The BLUF domain

This domain is a catalytically inactive version of the acylphosphatase domain that binds FAD (Gomelsky and Klug,

2002). It adopts an α+β fold similar to the ferredoxin domains and binds its FAD ligand between the two helices of

this fold (Fig. 4). The BLUF domain was first characterized in the AppA protein from Rhodobacter sphaeroides

which is required for sensing blue light and regulating photosynthesis-related genes (Braatsch et al., 2002). It is

likely that the domain senses both light and redox by means of its FAD ligand. The domain has been found

combined with GGDEF, EAL, cNMP cyclase and AraC-like HTH domains suggesting that it transmits sensory

inputs through both second-messenger and one-component systems. It has also been found in photosynthetic

eukaryotes such as Euglena where it might mediate phototactic responses via associated cNMP cyclase domains

(Han et al., 2004).

The cryptochrome/photolyase-like photosensors

The photolyase-like domains contain a special version of the α/β Rossmannoid fold known as the HUP (HIGH

nucleotidyl transferase, UspA, Photolyase) fold (Aravind et al., 2002). Members of the photolyase-like superfamily bind

FAD, and the classical versions function as DNA repair enzymes in the light-dependent repair of cyclobutane

pyrimidine dimers (Lin and Todo, 2005). Members of a clade of this superfamily in plants and animals function as blue light

sensors and are known as cryptochromes (Lin and Todo, 2005). Several bacteria (e.g. cyanobacteria and Vibrio) too

contain representatives of the cryptochrome domain known as CRY-DASH proteins. Studies in cyanobacteria

suggest that they might function as light-dependent transcription regulators (Brudler et al., 2003). However, their

functions in other bacteria remain poorly understood and they could also function as redox sensors by means of their

FAD cofactor.

Transmembrane sensor domains

7TM receptors

While 7TM receptors were initially characterized as the mainstay of sensory signaling in animals and fungi, their

role in bacteria has only recently become clear. The first prokaryotic 7TM proteins to be characterized were the

sensory rhodopsins of halophilic archaea, which are paralogs of their ion transporting rhodopsins (Purcell and

Crosson, 2008; Spudich and Luecke, 2002). Like all other rhodopsins these proteins contain a retinylidene prosthetic

group linked via a Schiff base to a lysine within one of the TM segments of the protein (Fig. 4). These archaeal

sensory rhodopsins, SRI and SRII, have been shown to function respectively as attractant and repellent phototaxis

receptors for orange and blue-green light. They are believed to transmit the conformational change in the

retinylidene upon photoreception through other membrane proteins to the motility complex of the organism (Jung et

al., 2003; Mukohata et al., 1999). A sensory rhodopsin was also found in the cyanobacterium Anabaena (Jung et al.,

2003), which shows a regulatory interaction with the potential sugar-binding ASRAH domain protein ASRT (see

above). Subsequently, other sensory rhodopsins have been found in bacteria and they are likely to function

independently of ASRT (De Souza et al., 2009). One of these found in Salinibacter ruber of the bacteroidetes clade

is encoded in an operon with a chemotaxis signal transducer (SRU_2579, gi: 83815419 and SRU_2580, gi:

83815551), whereas another found in the cyanobacterium Cyanothece sp. PCC 7425 is encoded in an operon with a

photosystem-I reaction center subunit PsaK (Cyan7425_0874, gi: 220906310 and Cyan7425_0875, gi: 220906311).

The former might signal via the chemotaxis signal transducer, whereas the latter might directly signal to the

photosystem-I complex.

Another major family of bacterial 7TM proteins is the 7TMR-DISM family (7TM receptors with diverse

intracellular signaling modules), which is found across the bacterial tree, with expansions in certain lineages such as

Leptospira and Cytophaga (Anantharaman and Aravind, 2003). Representatives of these 7TM receptors are

combined with diverse intracellular signaling domains belonging to the two-component system, the GGDEF and

EAL, cNMP cyclases, chemotaxis transducers, HTH and Zn-ribbon domains (Fig. 1). The combination with the HTH

and ZnR domains suggests that these receptors could potentially constitute unusual membrane-associated one-

component systems. Their extracellular regions typically contain one of two all-β domains, 7TMR-DISMED1 and

7TMR-DISMED2 (7TMR-DISM extracellular domains 1 and 2). These domains are structurally related to the

sugar-binding domains of glucosidases and glucuronidases, and are thus likely to function as sugar sensors in

conjunction with the 7TM domain (Anantharaman and Aravind, 2003). Consistent with this they are expanded in

certain lineages that are known to have an active carbohydrate metabolism like Cytophaga (Anantharaman and

Aravind, 2003). Architecturally these 7TMR-DISMs resemble the animal taste and glutamate receptors in having a

distinct extracellular ligand-binding domain as opposed to the classical 7TM domains, like rhodopsin, which bind a

ligand directly. Another family of 7TM receptors in bacteria, which has lower architectural diversity than the

7TMR-DISMs, is the 7TMR-HD (7TM receptors with intracellular HD domains) family. The intracellular HD

domain of these proteins is likely to function as a phosphodiesterase. The extracellular domain of the 7TMR-HD is a

domain enriched in polar residues with several α-helices which might function as a sensor along with the 7TM

domain. The 7TMR-HD are nearly always encoded in a distinctive operon, which contains genes for a

diacylglycerol kinase, a PhoH-like P-loop NTPase and a YbeY-like metal-dependent hydrolase that might function

as a phospholipase C (Anantharaman and Aravind, 2003). Based on this it appears likely that the 7TMR-HD

proteins regulate a previously unknown signaling pathway dependent on lipid modification. Evidence from E. coli

suggests that this function might be required in heat shock response (Rasouly et al., 2009).

Other multi-TM receptor domains

Bacteria also possess several unique multi-TM receptor domains that are not found in the other superkingdoms of

life. A widespread multi-TM receptor domain is the 5TMR-LYT (5 transmembrane receptors of the LytS-YhcK

type), which is combined with several intracellular signaling domains such as the histidine kinase, GGDEF and EAL, the

chemotaxis signal transducer, and also intracellular sensory domains such as PAS and GAF (Anantharaman and

Aravind, 2003). The prototypical member of this family is the receptor domain of the histidine kinase LytS

from Gram-positive bacteria such as Staphylococcus aureus (Patton et al., 2006). This protein is believed to sense

changes in the cell-surface to regulate murein hydrolases and autolysis. Based on some recent studies on LytS it

appears possible that the 5TMR-LYT might be required to sense membrane potential (Patton et al., 2006). The

related 5TMR-LYT receptor with an intracellular GGDEF domain, namely GdpS from S. aureus, might also have a

comparable sensory role (Holland et al., 2008). The other widespread multi-TM receptor is the 8TMR-UT domain

(8-transmembrane receptor UhpB type) and it displays a similar set of intracellular domain architectures as the

5TMR-LYT domain (Anantharaman and Aravind, 2003). The archetypal member of this superfamily is the receptor

domain of the E. coli histidine kinase UhpB. Studies on this protein indicate that the 8TMR-UT is an intramembrane

sensor that functions in conjunction with other membrane-embedded proteins, such as UhpC, to recognize

extracellular solutes. In the case of the UhpB system, the UhpC protein initially binds glucose 6-phosphate and upon

doing so associates with the 8TMR-UT protein which transmits the signal to its histidine kinase domain (Verhamme

et al., 2001). Thus, the 8TMR-UT might be conceived as a sensor of protein-protein interactions in the membrane.

The MHYT motif is comprised of two TM domains, with one of them bearing the characteristic signature of the 4

amino acids after which it is named (Galperin et al., 2001; Santiago et al., 1999). The MHYT motifs typically occur

in three copies per protein constituting a 6-TM domain. In addition to combinations with two-component and cyclic

diguanylate signaling domains, it is also combined directly with DNA-binding HTH domains suggesting that it

might also be part of a membrane-associated one-component system. The presence of the conserved histidines in

this domain indicates that it might function in coordinating a metal or heme prosthetic group. Consistent with the

latter suggestion, the MHYT domain has been found in the carbon monoxide-sensing protein from Oligotropha

carboxidovorans (Santiago et al., 1999). A MHYT-containing receptor with intracellular GGDEF-EAL domains in

Pseudomonas aeruginosa might activate alginate formation by similarly sensing gaseous ligands (Hay et al., 2009).

Ligand-gated ion channel domains

The ligand-gated ion channels were originally characterized in animal neural signaling pathways as the receptors of

a wide range of neurotransmitters such as acetylcholine, glutamate, serotonin, glycine, gamma-aminobutyric acid,

histamine and also of Zn2+ ions. Of these one of the major families is the ART-LGIC (also called Cys-loop channels

in animals) family that serves as receptor for all of these ligands. The discovery of bacterial homologs of these ion

channels indicated that such a signaling mechanism might be prevalent across diverse bacteria (Tasneem et al.,

2005). The ligands of the bacterial ART-LGICs remain unclear, though they could potentially function as amino

acid sensors. One version from the cyanobacterium Gloeobacter has been proposed to be a sensor of pH (Bocquet et al.,

2009; Hilf and Dutzler, 2009) although its actual ligand might still be unknown. The prokaryotic versions show

greater architectural diversity than the eukaryotic versions in showing fusions to the PBP1 and CACHE domains.

This suggests that they might be potentially responsive to multiple ligands (Tasneem et al., 2005). The ART-LGIC

channel module is composed of two domains: 1) the N-terminal β-sandwich domain which pentamerizes and

contains a ligand-binding pocket at the dimer interface and 2) the C-terminal 4-TM domain that forms the pore of

the ion-channel (Fig. 4). Residues in the 4-TM domain determine the ion selectivity of the channel and based on this

it is likely that different bacteria contain either anionic or cationic channels (Bocquet et al., 2009; Hilf and Dutzler,

2009; Tasneem et al., 2005). Unlike their eukaryotic counterparts the bacterial ART-LGICs lack the conserved

cysteines that form disulfide bonds in their ligand-binding domain. Diverse bacteria also possess homologs of the

second type of eukaryotic ligand-gated ion-channel, the ionotropic glutamate receptors (NMDA-type ligand-gated

channels) (Chen et al., 1999; Kuner et al., 2003). The bacterial versions differ from their animal counterparts in

having a compact architecture with just the core ligand-binding and ion-channel domains. This core shows an

unusual architecture with a ligand-binding PBP2 domain into which are inserted two TM segments. C-terminal to

the PBP2 domain there is one further transmembrane segment (Fig. 1). The channel is formed by a tetramer of such

subunits. The characterized bacterial ionotropic glutamate receptors appear to have the same ligand specificity as

their eukaryotic counterparts and sense L-glutamate via their PBP2 domain (Lee et al., 2008a). Both ART-LGIC and

the ionotropic glutamate receptors are found in cyanobacteria along with a variety of other ion channels suggesting

that they might possess a rather well-developed electrochemical signaling system.

The activity of diverse ion channels might also be regulated by a variety of other intracellular sensor domains such

as the TrkA domain which senses redox, the cyclic nucleotide-binding domain, which gates members of the

mechanosensory ion channel family by binding cNMP, and the CBS domain, which is also likely to bind cNMP or

other adenosine derivatives (Anantharaman et al., 2001; Bateman, 1997; Ignoul and Eggermont, 2005).

Sensor domains of second messenger systems

These domains are not biochemically different in any fundamental sense from the other above-described sensory

modules. However, they are mechanistically distinct in that they sense a second messenger generated by a signaling

system that senses the primary signal. Certain versions of the well-known sensor domains such as the PAS, GAF

and CBS domains act as sensors of second messengers such as cNMP (see above) (Anantharaman et al., 2001;

Martinez et al., 2002). Additionally, cNMPs are recognized by dedicated cyclic nucleotide-binding domains

(cNMPBD or CAP). Thus far only one cyclic diguanylate-binding domain, namely the PilZ domain, has been

characterized (Amikam and Galperin, 2006). Currently, there are no known dedicated families of (p)ppGpp binding

domains. In this section we only consider the dedicated secondary messenger-binding domains.

The cNMPBD or CAP domain

The cNMPBD, which binds cyclic nucleotides, is prototyped by the ligand-binding domain of the catabolite

activator transcription factor Crp and related proteins (Kannan et al., 2007; Spiro, 1994). The domain adopts a

double-stranded β-helix fold but shows no detectable sequence similarity with the above-described enzymatic and

sugar-binding versions of the DSBH fold (Andreeva et al., 2008). However, like those domains it also binds the

cNMP ligand within the cavity formed by the β-strands of the DSBH (Fig. 4). The cNMP specificity is conferred by

a distinctive motif present in a subset of the cNMPBDs which contains a conserved arginine that recognizes the

phosphate of the cNMP moiety (Kannan et al., 2007). The cNMPBD is combined with a wide range of other

signaling domains belonging to the one-component, two-component, cyclic diguanylate, ion channel and S/T/Y

phosphorylation systems. Additionally, it is also fused to domains not directly related to any signaling system such

as ABC-transporters (Kannan et al., 2007). These architectures indicate that the cNMPBD confers allosteric

regulatory control to a wide range of processes by sensing cNMP. Versions of the cNMPBD that lack the cNMP

recognition motif found in the canonical versions have been shown to bind ligands unrelated to the second

messenger sensing system. These include the cNMPBDs in the transcription factors like Rhodospirillum rubrum

CooA which contain a conserved histidine in place of the conserved phosphate-binding arginine. These versions

instead coordinate a heme moiety by means of which these one-component systems sense carbon monoxide

(Lanzilotta et al., 2000). The cNMPBD of Desulfitobacterium dehalogenans CprK contains an asparagine in the

same position, by means of which it senses its halogenated growth substrate chlorophenolacetic acid (Joyce et al.,

2006).

The PilZ domain and cyclic diguanylate signaling

The PilZ domain is the only currently known cyclic diguanylate-binding domain (Amikam and Galperin, 2006). The

PilZ domain displays the all-β split-barrel fold with 6 β-strands (Fig. 4). The cyclic diguanylate is bound on a cleft

formed by two extended loops on one side of the barrel (Benach et al., 2007). The PilZ domain is best characterized

in the bacterial cellulose synthases (e.g. from Gluconacetobacter xylinus) where it acts as a binding site for the

allosteric activator of these enzymes, cyclic diguanylate (Weinhouse et al., 1997). This is consistent with the view

that cyclic diguanylate signaling is a major player in the transition from the single-celled motile state to the

multi-cellular sessile state wherein cells are linked by extracellular matrices such as cellulose. The PilZ domain is

also combined with a NtrC-like AAA+ ATPase, chemotaxis signal transducer, histidine kinase, S/T/Y kinase and

condensation domains of non-ribosomal peptide and polyketide synthesis systems. These architectures indicate that

the PilZ domain might play a role in the cross talk between diguanylate signaling and other signaling systems and

also regulation of the production of secondary metabolites such as peptides or polyketides. The PilZ domain is not

very rich in terms of architectural diversity and is relatively under-represented in numerical terms (Supplementary

material). This is somewhat surprising given the number and diversity of predicted cyclic diguanylate cyclases

encoded by bacterial genomes and processes they regulate (Benach et al., 2007; Ryjenkov et al., 2006). It is hence

possible that some of the other sensor domains such as PAS or GAF might also bind cyclic diguanylate. Certain

GGDEF domains in Gram-positive bacteria, e.g. GdpS of S. aureus, do not appear to have any diguanylate cyclase

activity (Holland et al., 2008). PilZ domains are also not known in S. aureus suggesting that the GGDEF domains

could potentially generate as yet unknown second messengers. Alternatively, as shown in E. coli, certain GGDEF-

EAL domain proteins might function in sequestering small RNAs and regulating their stability (Suzuki et al., 2006).

Given the relationship between the GGDEF domains and the DNA and RNA polymerases (Makarova et al., 2002) it

would be of interest to see if these bacterial RNA-binding GGDEF domains might act as RNA polymerases or

nucleotidyltransferases in a distinct regulatory pathway.

Domain-architectural syntax of signaling proteins

Proteins with sensor domains show some of the greatest diversity in domain architectures among all bacterial

proteins. Signaling proteins with sensor domains are particularly prone to lineage-specific architectural diversity

suggesting that sensor domains show a greater than average tendency for duplication, loss or evolutionary mobility

(Anantharaman et al., 2001). This tendency might be the result of multiple selective forces: 1) At a basic level there

might not be a strong constraint on the specific number of copies of a certain sensor domain in a given polypeptide.

Thus, the specific number of PAS domains might vary widely even between functionally equivalent or orthologous

proteins in different organisms. 2) Secondly, a number of sensor domains might be functionally equivalent, e.g.

completely different domains can bind ligands such as heme (Dias et al., 2006; Iyer et al., 2003; Moglich et al.,

2009b). Thus, non-homologous but functionally analogous domains could replace each other in comparable

signaling proteins without being selectively disadvantageous (Anantharaman et al., 2001). 3) The same sensory

input might be channeled through widely different downstream effectors (Fig. 1, 5). The choice of different

downstream effectors might be selected by the need for different strengths and types of signals depending on an

organism’s lifestyle. The availability of a large number of complete bacterial genomes allows us to explore this

phenomenon quantitatively. We have formerly devised and utilized a measure to describe the domain architectural

complexity of a class of proteins – the complexity quotient (Aravind et al., 2001; Burroughs et al., 2007b). It

captures two elements of domain architectural complexity, namely the number of domains and the number of

different types of domains in a polypeptide as a single number. The average complexity quotient can then be

calculated for a class of proteins in a given organism. When we plot the average complexity quotient of sensor

proteins for over 200 diverse prokaryotic species we observe a positive correlation with the number of sensor

proteins encoded by the genome (Fig. 6). Thus, the architectural complexity of sensor proteins scales as a logarithmic

function (r² = 0.78) with increase in the number of different sensor proteins. The rise in architectural complexity is

rapid till around a count of 500 sensor proteins per genome is reached, followed by a more gradual increase

thereafter. This trend suggests that with increasing number of sensor proteins there might be greater opportunity for

recombination between them resulting in highly increased architectural complexity. However, after around 500

proteins there is a gradual saturation of the number of selectively advantageous combinations that can be achieved

(Fig. 6).
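
As a rough illustration of the measure described above, the following Python sketch computes a per-protein complexity score and its per-genome average. The exact formula of the complexity quotient is not given in this section, so the product of the number of domains and the number of distinct domain types used here is an assumption made purely for illustration. The function names and the toy architectures are likewise hypothetical.

```python
from statistics import mean


def complexity_score(architecture):
    """Combine domain count and number of distinct domain types into one number.

    ASSUMPTION: the product used here is illustrative only; the published
    complexity quotient may be defined differently.
    """
    return len(architecture) * len(set(architecture))


def average_complexity(proteins):
    """Average the per-protein score over all sensor proteins of one genome."""
    return mean(complexity_score(arch) for arch in proteins)


# Hypothetical domain architectures, listed from N- to C-terminus.
toy_genome = [
    ["PAS", "PAS", "HisKA", "HATPase"],
    ["GAF", "GGDEF", "EAL"],
    ["DSBH", "HTH"],
]

print(average_complexity(toy_genome))
```

Plotting such a per-genome average against the number of sensor proteins in each genome would give a curve of the kind discussed for Fig. 6, although the real analysis may differ in its details.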

Despite their bewildering architectural diversity a number of syntactical features can be discerned in the domain

architectures of sensor proteins. This suggests that there are strong evolutionary constraints on certain aspects of

their architecture. The most important constraints might be described as those specifying 1) the number, 2) the

position within a polypeptide and 3) the association with other domains in a polypeptide. In terms of number a

striking example is seen in the form of the differences between structurally related domains such as PAS, GAF,

CACHE and CHASE. The PAS domain is frequently found as multiple tandem repeats, the GAF might occur singly

or as dyads but rarely as multiple tandem repeats, the CACHE might occur singly or as dyads and the CHASE

always singly (Anantharaman and Aravind, 2000; Anantharaman and Aravind, 2001; Anantharaman et al., 2001;

Moglich et al., 2009b; Mougel and Zhulin, 2001). This suggests that their specific functional differences might have

an influence on the domain architectures in which they occur. Firstly, the CACHE and CHASE domains are

predominantly extracellular small molecule sensor domains – this implies that there may be no selective advantage

in them occurring in multiple copies. In contrast, PAS domains with a major role in dimerization-linked signal

transmission might be selected to occur as tandem repeats that regulate the threshold of stimulus for signal

transmission (Anantharaman and Aravind, 2000; Moglich et al., 2009b; Ponting and Aravind, 1997). Thus, while the

primordial PAS domain might have emerged as a small molecule sensor, it appears to have been subsequently

adapted to function in addition as a signal transmitter domain. In terms of position in a polypeptide, receptor

proteins are under a selective pressure to maintain a certain polarity of the sensor domain with respect to the

downstream signaling domains. This constraint of polarity is particularly noticeable in the case of membrane-

associated receptors that need to have an extracellular sensor domain (Anantharaman and Aravind, 2000;

Anantharaman and Aravind, 2001) (Fig. 1, 5). In terms of intracellular receptors we observe an interesting

difference: One-component systems show lesser constraint on the position of the sensor domain with respect to the

associated regulatory domains (Aravind et al., 2005) (Fig. 1, 5). This is probably because they are typically made up

of just two domains and the conformational change upon receiving the sensory input can be transmitted directly to

the regulatory domain. In contrast, intracellular receptors with effector domains such as those of the two component,

second messenger and chemotaxis signal transducer systems show a strong preference for the primary stimulus-

sensing domain to occur at the N-terminus (Anantharaman and Aravind, 2003; Anantharaman et al., 2006; Aravind

and Ponting, 1999; Galperin et al., 2001; Ulrich and Zhulin, 2005; Williams and Stewart, 1999). However, they

might be additionally regulated allosterically by sensor domains at the C-terminus, especially those that bind second

messengers (Fig. 1, 5) (Martinez et al., 2002). The primary reason for the polarity appears to be the signal transmitter

domains such as the S-helix and the HAMP that couple the sensor domain to the effector domains. These signal

transmitter domains appear to transmit the conformational change in a polarized fashion upon signal reception, thus

selecting for a strict order between the sensor and effector domains (Anantharaman et al., 2006; Manson, 2009;

Moglich et al., 2009b; Williams and Stewart, 1999).
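
The two syntactical constraints discussed above, copy number and polarity, can be checked on annotated architectures with a few lines of Python. This is only a minimal sketch: the function names, the sets of sensor and effector domains, and the example architecture are hypothetical and are not taken from the analyses cited here.

```python
from itertools import groupby


def longest_tandem_runs(architecture):
    """Return the longest run of consecutive copies for each domain type."""
    runs = {}
    for domain, group in groupby(architecture):
        length = sum(1 for _ in group)
        runs[domain] = max(runs.get(domain, 0), length)
    return runs


def sensor_precedes_effector(architecture, sensors, effectors):
    """Check that the first sensor domain lies N-terminal to the first effector."""
    first_sensor = next(
        (i for i, d in enumerate(architecture) if d in sensors), None
    )
    first_effector = next(
        (i for i, d in enumerate(architecture) if d in effectors), None
    )
    if first_sensor is None or first_effector is None:
        return None  # polarity is undefined without both domain classes
    return first_sensor < first_effector


# Hypothetical receptor, listed from N- to C-terminus.
receptor = ["PAS", "PAS", "PAS", "HAMP", "GGDEF"]
print(longest_tandem_runs(receptor))
print(sensor_precedes_effector(receptor, {"PAS", "GAF"}, {"GGDEF", "HisKA"}))
```

Applied to a whole proteome, the first function would show whether a domain such as PAS tends to occur in tandem repeats, and the second would flag proteins that violate the N-terminal sensor preference.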

The association between different domains across a whole functional category of proteins can be represented as a

network or an ordered graph (Fig. 5). In this graph the domains are the nodes and their co-occurrence in a

polypeptide is an edge joining the two domains, with the direction of the edge showing the order in which they

occur. Examination of such graphs helps in visualizing many of the key trends in the co-occurrence of domains

(Anantharaman et al., 2006). One feature that becomes immediately apparent is the relative promiscuity of most

sensor domains with respect to their co-occurrence with other types of downstream effector domains. Thus, in

different signaling proteins the same sensor domains might be combined with catalytic or effector domains of every

major signaling system (Fig. 5). This flexibility has enabled organisms to channel the sensing of the same stimulus

to different kinds of responses. However, the signal transmitter domains show much stronger constraints in their co-

occurrence trends with respect to sensor and downstream effector domains. Almost always, the HAMP co-occurs

with both a sensor domain and a downstream effector domain and never occurs as a solo domain. In a few cases (e.g. the

aq_1825, a receptor with a PAS-fold heme-binding extracellular domain) the HAMP domain might occur with only

a sensor domain but no downstream effector domain (Aravind and Ponting, 1999). The S-helix also almost always

co-occurs with two other domains on either side of it in a polypeptide; however in this case the flanking domains

might be both sensor domains or both effector domains or either of them. Interestingly, S/T/Y kinases are rarely

combined downstream of HAMP or S-helix modules (Anantharaman et al., 2006; Aravind and Ponting, 1999).

This suggests that there is a fundamental difference in signal transmission in the S/T/Y kinase-based signaling systems

with respect to all other systems.
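
The construction of such an ordered graph can be sketched in a few lines of Python. This is only a minimal illustration and not the procedure used to generate Fig. 5: the architectures below are hypothetical examples written in N- to C-terminal order, "HisK" is our shorthand for a histidine kinase domain, the function names are our own, and we assume that an edge joins only domains that are immediately adjacent in a polypeptide.

```python
from collections import Counter


def build_architecture_graph(architectures):
    """Count directed edges between adjacent domains.

    Each architecture is a list of domain names in N- to C-terminal order.
    An edge (a, b) means that domain a lies immediately upstream of domain b.
    The count of an edge is the number of times that linkage is observed.
    """
    edges = Counter()
    for domains in architectures:
        for upstream, downstream in zip(domains, domains[1:]):
            edges[(upstream, downstream)] += 1
    return edges


def solo_fraction(architectures, domain):
    """Fraction of proteins containing `domain` in which it is the only domain."""
    containing = [a for a in architectures if domain in a]
    if not containing:
        return 0.0
    return sum(1 for a in containing if len(a) == 1) / len(containing)


# Hypothetical architectures, for illustration only (not taken from the survey).
example_architectures = [
    ["PAS", "HAMP", "HisK"],
    ["CACHE", "HAMP", "GGDEF"],
    ["PAS", "GGDEF"],
    ["PAS", "HAMP", "HisK"],
]

graph = build_architecture_graph(example_architectures)
for (upstream, downstream), count in sorted(graph.items()):
    print(f"{upstream} -> {downstream}: {count}")
print("HAMP solo fraction:", solo_fraction(example_architectures, "HAMP"))
```

In this toy set the same sensor (PAS) feeds two different effectors, while HAMP never occurs alone, which mirrors the kind of pattern that can be read off the real network.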

Phyletic patterns and demographics of sensor domains in bacteria

The availability of a large number of completed genomes allows us to examine the numerical trends in the

distribution of proteins with sensor domains across the bacterial superkingdom. The number of signaling proteins

encoded by an organism shows a positive correlation with proteome size (Fig. 6). However, within bacteria two

distinct trends are recognizable. In a subset of bacteria the number of proteins with sensor domains appears to scale

linearly with proteome size (Fig. 6). This linear trend is dominated at the high end by organizationally complex

myxobacteria such as Sorangium and Myxococcus, cyanobacteria like Anabaena, Nostoc and Acaryochloris, and

actinobacteria such as Streptomyces, Frankia and Rhodococcus. In another subset of bacteria the number of proteins

with sensor domains scales as a power law with proteome size (Fig. 6). At the high end of this trend are

alphaproteobacteria of the Rhizobiaceae clade such as Agrobacterium, Bradyrhizobium, Rhizobium,

gammaproteobacteria such as Pseudomonas, and betaproteobacteria such as Cupriavidus (Ralstonia eutropha) and

Burkholderia xenovorans. Finally, planctomycetes like Rhodopirellula and cyanobacteria like Microcystis show an

unusually low count of proteins with sensor domains for their proteome size under either trend. The

difference in the linear and power-law trends is marked at proteome sizes greater than 4500 proteins; thus, for the

same number of encoded proteins different types of bacteria can show markedly different complements of proteins

with sensor domains (Fig. 6).
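
One simple way to tell the two kinds of trend apart is to fit a straight line to the logarithms of the counts and the proteome sizes: the slope is the scaling exponent. The sketch below is a hedged illustration only. The data are synthetic, the coefficients (0.02 and an exponent of 2) are arbitrary choices of ours, and we assume that "power law" here means an exponent clearly greater than 1, whereas an exponent close to 1 corresponds to linear scaling.

```python
import math


def least_squares(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power_law(xs, ys):
    """Fit ys = c * xs**k by linear regression in log-log space; returns (c, k)."""
    log_c, k = least_squares([math.log(x) for x in xs],
                             [math.log(y) for y in ys])
    return math.exp(log_c), k


# Synthetic (made-up) proteome sizes and sensor-protein counts.
proteome_sizes = [1000, 2000, 4000, 8000]
linear_counts = [0.02 * n for n in proteome_sizes]      # y = 0.02 * x
power_counts = [1e-5 * n ** 2 for n in proteome_sizes]  # y = 1e-5 * x**2

_, k_linear = fit_power_law(proteome_sizes, linear_counts)
c_power, k_power = fit_power_law(proteome_sizes, power_counts)
print(f"exponent for the linear-trend data: {k_linear:.2f}")
print(f"exponent for the power-law data:    {k_power:.2f}")
```

With real counts the two groups of organisms would have to be fitted separately, since a single fit over all genomes would blur the distinction described above.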

A systematic analysis of each of the families of signaling domains across different bacterial lineages (supplementary

material) revealed that the proteobacteria on the power-law trend, in particular the alphaproteobacteria with large

genomes, have an over-representation of one-component system transcription factors relative to other lineages with

large genomes. On the other hand, most other major families of sensor proteins in these

proteobacteria following the power-law trend show comparable numbers to those observed in other lineages with

large genomes. This additional set of sensor domains from the one-component systems over and beyond the linear

baseline for the given proteome size appears to be a major contributor to the power-law trend especially in

proteobacteria with large genomes. In contrast to the proteobacteria on the power-law trend, the bacteria at the high

end of the linear trend (i.e. the myxobacteria, cyanobacteria and actinobacteria) have an over-representation of a

distinct subset of signaling proteins, namely those with domains found in eukaryote-type apoptosis-related signaling

(e.g. STAND ATPases, Caspases, TIR, S/T/Y kinases and FHA domains (Leipe et al., 2004); supplementary

material). This subset of signaling proteins has been, in particular, associated with developmental complexity and

differentiated multicellularity (Aravind et al., 2003; Koonin and Aravind, 2002). This observation suggests that the

two distinct trends in the number of proteins with sensor domains reflect a broad difference in the life-style

strategies of the organisms lying on them. Those on the power-law trend have optimized their sensory systems to

directly and rapidly respond to a large number of small molecule signals in their environment. Those on the linear

trend do not seem to require that many direct responses to environmental small molecules, and instead invest in

distinct signaling systems related to developmental complexity. This difference might arise in part from the fact that

cyanobacteria produce their own nutrients; hence, they are less dependent than the alphaproteobacteria on responding

rapidly to diffusible solutes in the environment. The actinobacteria and myxobacteria are typically slow growers,

with a saprophytic life-style, which compete strongly for resources by producing antibiotics (Hopwood, 2007;

Whitworth and American Society for Microbiology, 2008). Hence, they are also likely to have a lower dependence

on rapid responses to ambient solutes. Solibacter usitatus, a poorly understood bacterium of the

acidobacteria/fibrobacteria clade, also lies on the linear trend line. By analogy to the other bacteria showing this trend it is

conceivable that this uncultivated bacterium follows a similar lifestyle strategy as the actinobacteria and

myxobacteria. These bacteria also show a well-developed cyclic diguanylate signaling apparatus suggesting that

they might form multicellular differentiated colonies connected by an extracellular matrix (Ward et al., 2009).

Counts of proteins with particular types of sensor domains show very different scaling trends from the overall trend

observed for sensor domain proteins (Figs. 6, 7). Some of these shed important light on the lifestyle strategies

adopted by particular bacterial lineages. PAS and the GAF domains show a general power-law scaling with

proteome size (Fig. 6). However, some organisms markedly deviate from the power-law trend to show much greater

than expected numbers of both PAS and GAF domains. The majority of organisms which show simultaneous over-

representation of both these domains belong to photosynthetic lineages, cyanobacteria or chloroflexi, which are not

phylogenetically close. This striking pattern suggests that the PAS and GAF domains have been independently

expanded as potential photo-sensors in both these lineages. Another organism showing a simultaneous expansion of

both these domains is Kineococcus radiotolerans, the unusual gamma-irradiation resistant desert-living

actinobacterium (Fig. 6) (Phillips et al., 2002). This unusual bacterium produces a bright orange carotenoid and the

expansion of both GAF and PAS domains might point to a previously unknown photosensitive behavior in this

organism. This is also supported by the presence of a phytochrome-like protein in this organism (Karniol et al.,

2005). PAS domains by themselves are also greatly expanded in several phylogenetically distant motile aquatic

mesophilic bacteria and archaea, suggesting that they might have an important role in motility-associated

chemosensory signal transduction in these organisms (Fig. 6). In contrast, actinobacteria (excluding Kineococcus) and

rhizobia show relatively fewer PAS or GAF domains for their proteome size, which probably correlates with their

comparatively lower dependence on chemosensory motility. Likewise, a dramatic over-representation of

chemotaxis signal transducers is observed in phylogenetically diverse marine and freshwater bacteria (e.g. Vibrio,

Aeromonas, Marinomonas and Magnetospirillum, Fig. 7) suggesting that a motile lifestyle has strongly selected for

a well-developed chemosensory response that controls the flagellar apparatus in these organisms (Alexander and

Zhulin, 2007; Zhulin, 2001). Chemotaxis transducers are vastly under-represented, when normalized by proteome

size, in actinobacteria, cyanobacteria and developmentally complex myxobacteria, which is consistent with the

lower importance of chemotactic motility in most of these organisms (Fig. 7). The cNMP-sensing domains are one

example, where no apparent relationship between their over-representation and life-style or phylogeny is observed.

They are highly expanded in bacteria with different types of motility, e.g. Magnetospirillum with flagellar motility

and Flavobacterium johnsoniae with gliding motility and also cyanobacteria and rhizobia (Fig. 7).
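
Over- and under-representation of a domain relative to a scaling trend can be expressed as the ratio of the observed count to the count expected for that proteome size. The following sketch illustrates the idea under stated assumptions: the trend is taken to be a power law y = c * x**k whose parameters are already known, the parameter values, the two-fold threshold, the organism labels and the counts are all hypothetical, and the function names are ours.

```python
def fold_deviation(count, proteome_size, c, k):
    """Observed count divided by the count expected from y = c * x**k."""
    return count / (c * proteome_size ** k)


def flag_outliers(records, c, k, fold=2.0):
    """Split organisms into over- and under-represented lists.

    `records` holds (name, proteome_size, domain_count) tuples. An organism is
    flagged when its count differs from the trend by at least `fold`-fold.
    """
    over, under = [], []
    for name, size, count in records:
        ratio = fold_deviation(count, size, c, k)
        if ratio >= fold:
            over.append(name)
        elif ratio <= 1.0 / fold:
            under.append(name)
    return over, under


# Hypothetical trend parameters and counts, for illustration only.
C_ASSUMED, K_ASSUMED = 1e-4, 1.5
example_records = [
    ("org_A", 2500, 13),  # close to the expected count of about 12.5
    ("org_B", 4900, 90),  # well above the expected count of about 34
    ("org_C", 6400, 20),  # well below the expected count of about 51
]

over, under = flag_outliers(example_records, C_ASSUMED, K_ASSUMED)
print("over-represented:", over)
print("under-represented:", under)
```

Normalizing by the fitted trend rather than by raw proteome size keeps large genomes from being flagged merely because the expected count itself grows faster than linearly.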

Among the effector domains downstream of sensor domains, the scaling of histidine kinases with proteome size also

follows a power law function (Fig. 7). Nevertheless, a few organisms, especially certain myxobacteria,

cyanobacteria and motile aquatic bacteria, show a strong over-representation for their respective proteome sizes.

Curiously, the majority of actinobacteria show an under-representation of histidine kinases for their proteome size. In

this regard they depart from the classical pattern of developmentally complex bacteria, though the possible basis for

this under-representation remains unclear. Irrespective of their relative representation in proteomes, histidine

kinases show a strong linear 1:1 correlation (r² = 0.91) with the number of receiver domains present in the same

proteome (Fig. 7). This suggests that the functional correspondence between the histidine kinase and the respective

receiver domain is a strong evolutionary constraint and they have been accordingly duplicated or lost as pairs.

Histidine kinases are interestingly absent or greatly underrepresented in the genomes of hyperthermophiles,

especially archaea (Koretke et al., 2000) (Supplementary material). As previously proposed, this might be

related to the relative instability of exposed acyl phosphates at high temperatures (Burroughs et al., 2006).

Furthermore, while histidine kinases are present in several eukaryotic lineages, there is no evidence that they were

present in the ancestral eukaryote. Instead, they appear to be lateral transfers that have expanded in particular

lineages such as plants, amoebozoan and heterolobosean amoebae (e.g. Naegleria) (Koretke et al., 2000; Ponting et

al., 1999). This suggests that the relative under-representation of histidine kinases in most eukaryotic lineages

might have been a historical accident related to a part of their ancestry being from a thermophilic archaeal lineage.

The majority of archaea on the other hand have at least a few S/T/Y kinases, the phosphoesters generated by which

are potentially more stable than the exposed acyl phosphates at higher temperatures (Leonard et al., 1998). It is

plausible that this might have tipped the balance in favor of the proliferation of the S/T/Y kinases during the

emergence of eukaryotes. The GGDEF domain and the associated cyclic diguanylate phosphodiesterases, HD-GYP

and EAL are entirely restricted to the bacteria with not even a single occasion of lateral transfer to any of the other

superkingdoms of life (Galperin et al., 1999; Holland et al., 2008; Makarova et al., 2002). The archaeal homologs of

the GGDEF domain are the nucleic acid polymerases of the CRISPR system rather than signaling enzymes (Haft et al.,

2005; Makarova et al., 2002). This strict exclusion of cyclic diguanylate might suggest that the compound is toxic

in the two other superkingdoms. Given the relationship of the GGDEF domain to the primary DNA polymerases of

the archaeal and eukaryotic superkingdoms (Makarova et al., 2002) it is possible that cyclic diguanylate might affect

polymerase activity in them.
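
The 1:1 correspondence between histidine kinases and receiver domains noted earlier in this section can be checked for any set of proteomes by computing the squared Pearson correlation and the slope of a line through the origin. The sketch below is illustrative only: the counts are invented and do not reproduce the r² of 0.91 reported for Fig. 7, and the function names are our own.

```python
def pearson_r_squared(xs, ys):
    """Squared Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def slope_through_origin(xs, ys):
    """Least-squares slope of ys = slope * xs (no intercept term)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Invented per-proteome counts, for illustration only.
histidine_kinases = [10, 20, 40, 80]
receiver_domains = [11, 19, 42, 78]

r_squared = pearson_r_squared(histidine_kinases, receiver_domains)
slope = slope_through_origin(histidine_kinases, receiver_domains)
print(f"r^2 = {r_squared:.3f}, slope = {slope:.3f}")
```

A slope near 1 together with a high r² is what would be expected if kinases and their cognate receivers are gained and lost as pairs.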

The majority of sensor domains have been freely exchanged between bacteria and the other superkingdoms. In

particular, several major sensory systems of various eukaryotic lineages appear to be of ultimately bacterial origin.

Examples of these include phytochromes and cytokinin receptors of plants, BLUF domain light receptors of

chlorophyte and euglenozoan algae, ionotropic glutamate receptors, the nitric oxide receptor seen across diverse

eukaryotes, the sensory PBP1 domains of the taste, metabotropic glutamate, atrial natriuretic peptide and speract-

type peptide receptors, ART-LGIC receptors, HD domain cNMP phosphodiesterases, sensory CACHE domains of

the α-2 subunits of the calcium channel, the PAS domains of the circadian regulators and Eag family of potassium

channels in animals and the GAF domains of a novel class of channel-like proteins found in animals and

chromoalveolates (e.g. PFB0510w of Plasmodium falciparum) (Anantharaman and Aravind, 2001; Aravind et al.,

2003; Aravind and Ponting, 1997; Iyer et al., 2003; Lee et al., 2008b; Mougel and Zhulin, 2001; Ponting et al., 1999;

Tasneem et al., 2005). This suggests that the primary diversification of sensor domains and sensory mechanisms

occurred in the bacteria, which had probably “experimented” and “perfected” an enormous diversity of signaling

mechanisms even before the origin of the eukaryotes. Subsequently, the eukaryotes merely acquired these through

lateral transfer and reused them within the context of their own signaling networks that included domains acquired

from the archaeal lineage as well as those innovated within eukaryotes. In certain cases the eukaryotes reused the

bacterial signaling proteins “as is” (e.g. the phytochromes, HD domain cNMP phosphodiesterases and ligand-gated

ion channels). In other cases only certain domains were used and combined with the rest of the eukaryotic signaling

apparatus (e.g. the combination of the PAS domain with the bHLH DNA-binding domain that is unique to

eukaryotes in the circadian regulators) (Aravind and Ponting, 1997; Ponting and Aravind, 1997; Zhulin et al., 1997).

This flow of regulatory systems from bacteria to eukaryotes has also been encountered in other functional systems

such as the protein domains involved in extracellular adhesion, apoptosis-related signaling mechanisms, the

ubiquitin system and chromatin dynamics (Aravind et al., 2003; Aravind and Koonin, 2002; Iyer et al., 2006;

Koonin and Aravind, 2002; Ponting and Aravind, 1997). Hence, the acquisition by eukaryotes of sensor domains

from bacteria might be part of a more general process of the re-use of pre-existing adaptations of the bacterial world

via lateral transfer.

Concluding remarks

The past 15 years have seen an extraordinary rise in our knowledge of sensory signal transduction in bacteria. As

noted earlier, sequence analysis has been at the forefront of identifying and characterizing these domains. Yet, this

information has not been used as efficiently as it could have been. There are several instances of “rediscovery”

of previously described domains under different names (e.g. the re-description of the CACHE domain in CitA and

the HNOB domain (Pellicena et al., 2004; Reinelt et al., 2003) well after the original reports). In addition to

illustrating the lack of propagation of information, these instances have also resulted in some confusion in terms of

nomenclature. This has in part been mitigated by domain repositories like the PFAM and MiST databases (Finn et

al., 2008; Ulrich and Zhulin, 2007). However, much systematization remains to be done so that the researchers

might have a uniformly agreed framework to use in the future. We hope the survey presented here can serve as a

preliminary step towards the development of such a systematic framework. In practical terms, several domains have

been poorly explored in terms of the diversity of their cognate ligands. In this regard we hope that the high-

throughput methods of biochemical genomics might have a major impact. Systematic surveys, where extensive

chemical libraries are matched against comprehensive arrays of sensor domains, might help in filling in the lacunae

in our understanding of ligand specificities. We also hope the successful implementation of such technology might

provide a biotechnological byproduct, i.e. the development of novel biosensors. Indeed, we appear poised for an even

greater expansion in our understanding and utilization of bacterial sensory systems.

Appendix-I

Expansions of domain names

Several domains are named based on certain members of the superfamily that were initially discovered or

characterized. It should be noted that the names are typically chosen for being euphonic and easy to use, and in most

cases the expansions do not have great semantic value.

PAS: Period, Arnt and Single minded domain

GAF: cGMP phosphodiesterase, Adenylate cyclase, FhlA domain

MEKHLA:

HNOB: Heme NO binding domain

HNOBA: HNOB-associated domain

PocR: domain found in PocR

CACHE: Calcium channels and Chemotaxis receptor domain

CHASE: Cyclases/Histidine kinases Associated Sensory Extracellular domain

V4R: Vinyl-4 reductase domain

SLBB: soluble ligand-binding beta-grasp

HMA: Heavy metal associated domain

ACT: Aspartokinase, chorismate mutase, TyrA domain

SOUL: named after the ckSoul protein from Gallus gallus.

CBS: Cystathionine beta-synthase domain

DICT: Diguanylate cyclases and two-component systems domain

STAS: sulfate transporter, anti-sigma factor-binding domain

ASRAH: Anabaena sensory rhodopsin-associated homology domain

T-OB: transporter-Oligomer-binding domain

4HB: 4-helical up-and-down bundle domain

NIT: Nitrate/Nitrite sensor domain

CCTBP: Cysteine-containing TATA-box binding protein-like domain

FadR: domain found in FadR protein

AlcR-N: N-terminal domain of AlcR protein

UTRA: UbiC transcription regulator-associated domain

ISOCOT: Isomerase, CoA transferase and translation initiation factor domain

NTF2: domain found in NTF2

MEDS: Methanogen/methylotroph, DcmR sensory domain

BLUF: blue-light Sensors using FAD

7TMR-DISM family: 7TM receptors with diverse intracellular signaling modules

7TMR-HD: 7TM receptors with intracellular HD domains

5TMR-LYT: 5 transmembrane receptors of the LytS-YhcK type

8TMR-UT: 8-transmembrane receptor UhpB type

MHYT: motif containing methionine, histidine, tyrosine and threonine.

ART-LGIC: Acetylcholine-type ligand-gated ion channels

cNMPBD/CAP: cyclic nucleotide monophosphate-binding domain/ domain found in catabolite activator protein

PilZ: Domain found in PilZ.

STAND: Signal transduction ATPases with numerous domains

Appendix-II

Abbreviations of organism names shown in the figures:

Aave: Acidovorax avenae, Abac: Acidobacteria bacterium, Aful: Archaeoglobus fulgidus, Ahyd: Aeromonas

hydrophila, Amar: Acaryochloris marina, Amet: Alkaliphilus metalliredigens, Ana: Nostoc sp., Anae:

Anaeromyxobacter sp., Atum: Agrobacterium tumefaciens, Avar: Anabaena variabilis, Bjap: Bradyrhizobium

japonicum, Bpet: Bordetella petrii, Bsub: Bacillus subtilis, Bxen: Burkholderia xenovorans, Caur: Chloroflexus

aurantiacus, Cbei: Clostridium beijerinckii, Cpsy: Colwellia psychrerythraea, Ctha: Chloroherpeton thalassium,

Cvio: Chromobacterium violaceum, Daci: Delftia acidovorans, Daro: Dechloromonas aromatica, Ddes:

Desulfovibrio desulfuricans, Dhaf: Desulfitobacterium hafniense, Ecol: Escherichia coli, Fjoh: Flavobacterium

johnsoniae, Fran: Frankia sp., Gura: Geobacter uraniireducens, Haur: Herpetosiphon aurantiacus, Hche: Hahella

chejuensis, Hmar: Haloarcula marismortui, Hmod: Heliobacterium modesticaldum, Krad: Kineococcus

radiotolerans, Lcho: Leptothrix cholodnii, Lsph: Lysinibacillus sphaericus, Mace: Methanosarcina acetivorans,

Maer: Microcystis aeruginosa, Magn: Magnetococcus sp., Mari: Marinomonas sp., Meth: Methylobacterium sp.,

Mhun: Methanospirillum hungatei, Mlot: Mesorhizobium loti, Mmag: Magnetospirillum magneticum, Msme:

Mycobacterium smegmatis, Mxan: Myxococcus xanthus, Nfar: Nocardia farcinica, Npun: Nostoc punctiforme, Oter:

Opitutus terrae, Paer: Pseudomonas aeruginosa, Ping: Psychromonas ingrahamii, Pola: Polaromonas sp., Rbal:

Rhodopirellula baltica, Reut: Ralstonia eutropha, Rhod: Rhodococcus jostii, Rleg: Rhizobium leguminosarum,

Rpal: Rhodopseudomonas palustris, Rpic: Ralstonia pickettii, Rrub: Rhodospirillum rubrum, Scel: Sorangium

cellulosum, Scoe: Streptomyces coelicolor, Sdeg: Saccharophagus degradans, Sery: Saccharopolyspora erythraea,

Sfum: Syntrophobacter fumaroxidans, Smel: Sinorhizobium meliloti, Ssp: Synechocystis sp., Susi: Solibacter

usitatus, Swoo: Shewanella woodyi, Tcru: Thiomicrospira crunogena, Tden: Treponema denticola, Tten:

Thermoanaerobacter tengcongensis, Umet: uncultured methanogenic, Vcho: Vibrio cholerae, Vhar: Vibrio harveyi,

Wsuc: Wolinella succinogenes, Xaut: Xanthobacter autotrophicus

Acknowledgements

Research by the authors is funded by the Intramural Research Program of the National Institutes of Health, United

States of America.

Supplementary material

Supplementary material is available at: ftp://ftp.ncbi.nih.gov/pub/aravind/signaling/

Figure legends

Figure 1. Domain architectures of proteins with sensor domains. Architectures are denoted by their gene name

and species abbreviations, separated by underscores. In the case of the bacterial HD-phosphodiesterase fused to

GAF domains the gene is from an environmental delta proteobacterium genome sequence. For species abbreviations

see Appendix-II. Domains are denoted by their standard names (Appendix-I) or abbreviations provided in the key.

For a comprehensive list of domain names refer to the supplementary materials.

Figure 2. Cartoon representation of diverse sensor domains. Structures are labeled by their type, the protein

name and the PDB id from which the domains were derived. Ligands for most structures are either shown as ball

and stick representations or as space-fill models. The figure shows sensor domains of the PAS-like, HNOB, β-

grasp, ferredoxin-like and the GyrI-like folds.

Figure 3. Cartoon representation of diverse sensor domains. Structures are annotated as in Fig. 2. Illustrated in

the figure are sensor domains belonging to the CBS, STAS, periplasmic-binding, T-OB, 4-helical up and down

bundle, and the ATP-cone folds.

Figure 4. Cartoon representation of diverse sensor domains. Structures are annotated as in Fig. 2. Displayed are

structures of members of the DSBH, UTRA, ISOCOT, NTF2, BLUF, ART-LGIC, cNMPBD and PilZ

superfamilies. Also illustrated is the 7-TM receptor seen in sensory rhodopsins.

Figure 5. Domain architecture networks of proteins involved in signaling.

The network diagrams show connections of domains with each other and all other domains occurring in their

respective polypeptides. A hypothetical example showing how domain architecture networks are

constructed is shown in the yellow box. A, B, C and D are globular domains that occur in a range of combinations.

These are combined into an architectural network where the globular domains are nodes and the edges reflect their

physical connectivity. The central figure shows a network of functions, where the domains have been collapsed into

the labeled functional categories. The key functions involved in signaling have been placed in the center of the

network. The thickness of the edges is approximately proportional to the relative frequency with which linkages

between two domains re-occur in distinct polypeptides of signaling domain containing prokaryotic proteins from the

220 organisms (supplementary material). Starting from the right hand corner going clockwise, the networks

represent interactions of signaling domains with 1) two-component, 2) apoptosis-related signaling domains, 3)

transmembrane receptors and 4) cyclic nucleotide and cyclic diguanylate binding domains. Domains that belong to

the respective functions are placed in the middle of the network and labeled in brown. Sensory domains and other

notable signaling domains have been labeled in each network. The graphs were rendered using yEd

(http://www.yworks.com/en/products_yed_about.html).

Figure 6. Complexity quotient plots and scaling of proteins containing sensor, PAS and GAF domains. a)

Complexity quotient plot for signaling proteins. The “complexity quotient” for an organism is defined as the product

of two values: the number of different types of domains that co-occur in signaling proteins, and the average

number of domains detected in these proteins. The complexity quotient is plotted against the total number of

signaling proteins in a given organism. A log curve fitting the general trend of the majority of organisms is shown.

Organisms with anomalous numbers and those at the higher end are labeled. Each protein has at least a single

known or predicted domain with a signaling-related function. A total of 166 signaling and DNA-binding HTH

domains were collected from 220 representative prokaryotic genomes (see supplementary material for a complete

list). b) Scaling of sensor domain containing proteins with proteome size. A total of 55 sensor domains in the 220

genomes were analyzed for this graph. A linear equation fitting the general trend of the majority of organisms is

shown. c) Scaling of PAS domain containing proteins with proteome size. A power law curve fitting the general

trend of the majority of organisms is shown. d) Scaling of GAF domain containing proteins with proteome size. An

exponential curve fitting the general trend of the majority of organisms is shown.
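
The complexity quotient defined in this legend can be computed directly from a list of domain architectures. The sketch below is a minimal illustration with hypothetical architectures; "HisK" is our shorthand for a histidine kinase domain, the function name is our own, and we assume that the average counts every domain instance in a protein, including repeated copies of the same domain.

```python
def complexity_quotient(architectures):
    """Product of the number of distinct domain types and the mean domain count.

    `architectures` is a list of signaling proteins, each given as a list of
    domain names. Returns 0.0 for an empty list.
    """
    if not architectures:
        return 0.0
    distinct_types = {domain for arch in architectures for domain in arch}
    mean_domains = sum(len(arch) for arch in architectures) / len(architectures)
    return len(distinct_types) * mean_domains


# Hypothetical signaling proteins of one organism, for illustration only.
example_proteins = [
    ["PAS", "HAMP", "HisK"],
    ["PAS", "GGDEF"],
    ["GAF"],
]

# Five distinct domain types and a mean of two domains per protein.
quotient = complexity_quotient(example_proteins)
print("complexity quotient:", quotient)
```

Plotting this value against the number of signaling proteins per organism gives the kind of plot described in panel a).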

Figure 7. Scaling of cNMPBD, chemotaxis receptors and histidine kinases with proteome size, and plot of

receiver domains against histidine kinases. a) Scaling of cNMPBD domain containing proteins with proteome

size. b) Scaling of chemotaxis transducer-domain containing proteins with proteome size. c) Scaling of histidine

kinase domain containing proteins with proteome size. d) Plot of receiver domains vs histidine kinases. In all the

graphs organisms with anomalous numbers and those at the high end are labeled. The organisms analyzed and the

abbreviations of the labeled organisms are as in Appendix-II.

References

Aepfelbacher, M., Zumbihl, R., and Heesemann, J. (2005). Modulation of Rho GTPases and the actin cytoskeleton by YopT of Yersinia. Curr Top Microbiol Immunol 291, 167‐175. Alexander, R. P., and Zhulin, I. B. (2007). Evolutionary genomics reveals conserved structural determinants of signaling and adaptation in microbial chemoreceptors. Proc Natl Acad Sci U S A 104, 2885‐2890. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997). Gapped BLAST and PSI‐BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389‐3402. Amikam, D., and Galperin, M. Y. (2006). PilZ domain is part of the bacterial c‐di‐GMP binding protein. Bioinformatics 22, 3‐6. Anantharaman, V., and Aravind, L. (2000). Cache ‐ a signaling domain common to animal Ca(2+)‐channel subunits and a class of prokaryotic chemotaxis receptors. Trends Biochem Sci 25, 535‐537. Anantharaman, V., and Aravind, L. (2001). The CHASE domain: a predicted ligand‐binding module in plant cytokinin receptors and other eukaryotic and bacterial receptors. Trends Biochem Sci 26, 579‐582. Anantharaman, V., and Aravind, L. (2003). Application of comparative genomics in the identification and analysis of novel families of membrane‐associated receptors in bacteria. BMC Genomics 4, 34. Anantharaman, V., and Aravind, L. (2004). The SHS2 module is a common structural theme in functionally diverse protein groups, like Rpb7p, FtsA, GyrI, and MTH1598/TM1083 superfamilies. Proteins 56, 795‐807. Anantharaman, V., and Aravind, L. (2005). MEDS and PocR are novel domains with a predicted role in sensing simple hydrocarbon derivatives in prokaryotic signal transduction systems. Bioinformatics 21, 2805‐2811. Anantharaman, V., and Aravind, L. (2006). Diversification of catalytic activities and ligand interactions in the protein fold shared by the sugar isomerases, eIF2B, DeoR transcription factors, acyl‐CoA transferases and methenyltetrahydrofolate synthetase. 
J Mol Biol 356, 823‐842. Anantharaman, V., Balaji, S., and Aravind, L. (2006). The signaling helix: a common functional theme in diverse signaling proteins. Biol Direct 1, 25. Anantharaman, V., Koonin, E. V., and Aravind, L. (2001). Regulatory potential, phyletic distribution and evolution of ancient, intracellular small‐molecule‐binding domains. J Mol Biol 307, 1271‐1292. Andreeva, A., Howorth, D., Chandonia, J. M., Brenner, S. E., Hubbard, T. J., Chothia, C., and Murzin, A. G. (2008). Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res 36, D419‐425. Aravind, L., and Anantharaman, V. (2003). HutC/FarR‐like bacterial transcription factors of the GntR family contain a small molecule‐binding domain of the chorismate lyase fold. FEMS Microbiol Lett 222, 17‐23. Aravind, L., Anantharaman, V., Balaji, S., Babu, M. M., and Iyer, L. M. (2005). The many faces of the helix‐turn‐helix domain: transcription regulation and beyond. FEMS Microbiol Rev 29, 231‐262. Aravind, L., Anantharaman, V., and Iyer, L. M. (2003). Evolutionary connections between bacterial and eukaryotic signaling systems: a genomic perspective. Curr Opin Microbiol 6, 490‐497. Aravind, L., Dixit, V. M., and Koonin, E. V. (2001). Apoptotic molecular machinery: vastly increased complexity in vertebrates revealed by genome comparisons. Science 291, 1279‐1284. Aravind, L., and Koonin, E. V. (1998). The HD domain defines a new superfamily of metal‐dependent phosphohydrolases. Trends Biochem Sci 23, 469‐472. Aravind, L., and Koonin, E. V. (1999a). DNA polymerase beta‐like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history. Nucleic Acids Res 27, 1609‐1618. 

Aravind, L., and Koonin, E. V. (1999b). Gleaning non‐trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol 287, 1023‐1040. Aravind, L., and Koonin, E. V. (2000). The STAS domain ‐ a link between anion transporters and antisigma‐factor antagonists. Curr Biol 10, R53‐55. Aravind, L., and Koonin, E. V. (2002). Classification of the caspase‐hemoglobinase fold: detection of new families and implications for the origin of the eukaryotic separins. Proteins 46, 355‐367. Aravind, L., Mazumder, R., Vasudevan, S., and Koonin, E. V. (2002). Trends in protein evolution inferred from sequence and structure analysis. Curr Opin Struct Biol 12, 392‐399. Aravind, L., and Ponting, C. P. (1997). The GAF domain: an evolutionary link between diverse phototransducing proteins. Trends Biochem Sci 22, 458‐459. Aravind, L., and Ponting, C. P. (1999). The cytoplasmic helical linker domain of receptor histidine kinase and methyl‐accepting proteins is common to many prokaryotic signalling proteins. FEMS Microbiol Lett 176, 111‐116. Avila‐Perez, M., Vreede, J., Tang, Y., Bende, O., Losi, A., Gartner, W., and Hellingwerf, K. (2009). In vivo mutational analysis of YtvA from Bacillus subtilis: mechanism of light activation of the general stress response. J Biol Chem 284, 24958‐24964. Ayers, R. A., and Moffat, K. (2008). Changes in quaternary structure in the signaling mechanisms of PAS domains. Biochemistry 47, 12078‐12086. Baker, M. D., Wolanin, P. M., and Stock, J. B. (2006). Signal transduction in bacterial chemotaxis. Bioessays 28, 9‐22. Barabote, R. D., and Saier, M. H., Jr. (2005). Comparative genomic analyses of the bacterial phosphotransferase system. Microbiol Mol Biol Rev 69, 608‐634. Bateman, A. (1997). The structure of a domain common to archaebacteria and the homocystinuria disease protein. Trends Biochem Sci 22, 12‐13. Benach, J., Swaminathan, S. S., Tamayo, R., Handelman, S. K., Folta‐Stogniew, E., Ramos, J. 
E., Forouhar, F., Neely, H., Seetharaman, J., Camilli, A., and Hunt, J. F. (2007). The structural basis of cyclic diguanylate signal transduction by PilZ domains. Embo J 26, 5153‐5166. Black, R. A., Hobson, A. C., and Adler, J. (1980). Involvement of cyclic GMP in intracellular signaling in the chemotactic response of Escherichia coli. Proc Natl Acad Sci U S A 77, 3879‐3883. Bocquet, N., Nury, H., Baaden, M., Le Poupon, C., Changeux, J. P., Delarue, M., and Corringer, P. J. (2009). X‐ray structure of a pentameric ligand‐gated ion channel in an apparently open conformation. Nature 457, 111‐114. Braatsch, S., Gomelsky, M., Kuphal, S., and Klug, G. (2002). A single flavoprotein, AppA, integrates both redox and light signals in Rhodobacter sphaeroides. Mol Microbiol 45, 827‐836. Brudler, R., Hitomi, K., Daiyasu, H., Toh, H., Kucho, K., Ishiura, M., Kanehisa, M., Roberts, V. A., Todo, T., Tainer, J. A., and Getzoff, E. D. (2003). Identification of a new cryptochrome class. Structure, function, and evolution. Mol Cell 11, 59‐67. Bult, C. J., White, O., Olsen, G. J., Zhou, L., Fleischmann, R. D., Sutton, G. G., Blake, J. A., FitzGerald, L. M., Clayton, R. A., Gocayne, J. D., et al. (1996). Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058‐1073. Burroughs, A. M., Allen, K. N., Dunaway‐Mariano, D., and Aravind, L. (2006). Evolutionary genomics of the HAD superfamily: understanding the structural adaptations and catalytic diversity in a superfamily of phosphoesterases and allied enzymes. J Mol Biol 361, 1003‐1034. Burroughs, A. M., Balaji, S., Iyer, L. M., and Aravind, L. (2007a). A novel superfamily containing the beta‐grasp fold involved in binding diverse soluble ligands. Biol Direct 2, 4. Burroughs, A. M., Balaji, S., Iyer, L. M., and Aravind, L. (2007b). Small but versatile: the extraordinary functional and structural diversity of the beta‐grasp fold. Biol Direct 2, 18. 

Page 38: Bacterial Signaling Chapter

Burroughs, A. M., Iyer, L. M., and Aravind, L. (2009). Natural history of the E1‐like superfamily: implication for adenylation, sulfur transfer, and ubiquitin conjugation. Proteins 75, 895‐910. Campbell, E. A., Greenwell, R., Anthony, J. R., Wang, S., Lim, L., Das, K., Sofia, H. J., Donohue, T. J., and Darst, S. A. (2007). A conserved structural module regulates transcriptional responses to diverse stress signals in bacteria. Mol Cell 27, 793‐805. Chai, Y., Zhu, J., and Winans, S. C. (2001). TrlR, a defective TraR‐like protein of Agrobacterium tumefaciens, blocks TraR function in vitro by forming inactive TrlR:TraR dimers. Mol Microbiol 40, 414‐421. Chang, C., Kwok, S. F., Bleecker, A. B., and Meyerowitz, E. M. (1993). Arabidopsis ethylene‐response gene ETR1: similarity of product to two‐component regulators. Science 262, 539‐544. Chen, G. Q., Cui, C., Mayer, M. L., and Gouaux, E. (1999). Functional characterization of a potassium‐selective prokaryotic glutamate receptor. Nature 402, 817‐821. Cheung, J., and Hendrickson, W. A. (2009). Structural analysis of ligand stimulation of the histidine kinase NarX. Structure 17, 190‐201. Daber, R., Stayrook, S., Rosenberg, A., and Lewis, M. (2007). Structural analysis of lac repressor bound to allosteric effectors. J Mol Biol 370, 609‐619. De Souza, R. F., Iyer, L. M., and Aravind, L. (2009). The Anabaena sensory rhodopsin transducer defines a novel superfamily of prokaryotic small‐molecule binding domains. Biol Direct 4, 25; discussion 25. Dias, J. S., Macedo, A. L., Ferreira, G. C., Peterson, F. C., Volkman, B. F., and Goodfellow, B. J. (2006). The first structure from the SOUL/HBP family of heme‐binding proteins, murine P22HBP. J Biol Chem 281, 31553‐31561. Dunwell, J. M., Purvis, A., and Khuri, S. (2004). Cupins: the most functionally diverse protein superfamily? Phytochemistry 65, 7‐17. Durbin, R. (1998). 
Biological sequence analysis : probabilistic models of proteins and nucleic acids (Cambridge, UK New York: Cambridge University Press). Ettema, T. J., Brinkman, A. B., Tani, T. H., Rafferty, J. B., and Van Der Oost, J. (2002). A novel ligand‐binding domain involved in regulation of amino acid metabolism in prokaryotes. J Biol Chem 277, 37464‐37468. Ezezika, O. C., Haddad, S., Clark, T. J., Neidle, E. L., and Momany, C. (2007). Distinct effector‐binding sites enable synergistic transcriptional activation by BenM, a LysR‐type regulator. J Mol Biol 367, 616‐629. Finn, R. D., Tate, J., Mistry, J., Coggill, P. C., Sammut, S. J., Hotz, H. R., Ceric, G., Forslund, K., Eddy, S. R., Sonnhammer, E. L., and Bateman, A. (2008). The Pfam protein families database. Nucleic Acids Res 36, D281‐288. Fischer, E. H., and Krebs, E. G. (1955). Conversion of phosphorylase b to phosphorylase a in muscle extracts. J Biol Chem 216, 121‐132. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M., and et al. (1995). Whole‐genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496‐512. Galperin, M. Y., Gaidenko, T. A., Mulkidjanian, A. Y., Nakano, M., and Price, C. W. (2001). MHYT, a new integral membrane sensor domain. FEMS Microbiol Lett 205, 17‐23. Galperin, M. Y., Natale, D. A., Aravind, L., and Koonin, E. V. (1999). A specialized version of the HD hydrolase domain implicated in signal transduction. J Mol Microbiol Biotechnol 1, 303‐305. Gao, R., and Stock, A. M. (2009). Biological insights from structures of two‐component proteins. Annu Rev Microbiol 63, 133‐154. Garces, F., Fernandez, F. J., Gomez, A. M., Perez‐Luque, R., Campos, E., Prohens, R., Aguilar, J., Baldoma, L., Coll, M., Badia, J., and Vega, M. C. (2008). 
Quaternary structural transitions in the DeoR‐type repressor UlaR control transcriptional readout from the L‐ascorbate utilization regulon in Escherichia coli. Biochemistry 47, 11424‐11433. 

Page 39: Bacterial Signaling Chapter

Gebhard, S., and Cook, G. M. (2008). Differential regulation of high‐affinity phosphate transport systems of Mycobacterium smegmatis: identification of PhnF, a repressor of the phnDCE operon. J Bacteriol 190, 1335‐1343. Gilman, A. G. (1987). G proteins: transducers of receptor‐generated signals. Annu Rev Biochem 56, 615‐649. Gomelsky, M., and Klug, G. (2002). BLUF: a novel FAD‐binding domain involved in sensory transduction in microorganisms. Trends Biochem Sci 27, 497‐500. Gorelik, M., Lunin, V. V., Skarina, T., and Savchenko, A. (2006). Structural characterization of GntR/HutC family signaling domain. Protein Sci 15, 1506‐1511. Haft, D. H., Selengut, J., Mongodin, E. F., and Nelson, K. E. (2005). A guild of 45 CRISPR‐associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol 1, e60. Han, Y., Braatsch, S., Osterloh, L., and Klug, G. (2004). A eukaryotic BLUF domain mediates light‐dependent gene expression in the purple bacterium Rhodobacter sphaeroides 2.4.1. Proc Natl Acad Sci U S A 101, 12306‐12311. Hanks, S. K., and Hunter, T. (1995). Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. Faseb J 9, 576‐596. Hay, I. D., Remminghorst, U., and Rehm, B. H. (2009). MucR, a novel membrane‐associated regulator of alginate biosynthesis in Pseudomonas aeruginosa. Appl Environ Microbiol 75, 1110‐1120. Hess, J. F., Bourret, R. B., and Simon, M. I. (1988a). Histidine phosphorylation and phosphoryl group transfer in bacterial chemotaxis. Nature 336, 139‐143. Hess, J. F., Oosawa, K., Kaplan, N., and Simon, M. I. (1988b). Phosphorylation of three proteins in the signaling pathway of bacterial chemotaxis. Cell 53, 79‐87. Hilf, R. J., and Dutzler, R. (2009). Structure of a potentially open state of a proton‐activated pentameric ligand‐gated ion channel. Nature 457, 115‐118. Hochstrasser, M. (2009). Origin and function of ubiquitin‐like proteins. 
Nature 458, 422‐429. Hogg, T., Mechold, U., Malke, H., Cashel, M., and Hilgenfeld, R. (2004). Conformational antagonism between opposing active sites in a bifunctional RelA/SpoT homolog modulates (p)ppGpp metabolism during the stringent response [corrected]. Cell 117, 57‐68. Holland, L. M., O'Donnell, S. T., Ryjenkov, D. A., Gomelsky, L., Slater, S. R., Fey, P. D., Gomelsky, M., and O'Gara, J. P. (2008). A staphylococcal GGDEF domain protein regulates biofilm formation independently of cyclic dimeric GMP. J Bacteriol 190, 5178‐5189. Hopwood, D. A. (2007). Streptomyces in nature and medicine : the antibiotic maker (Oxford ; New York: Oxford University Press). Huang, N., De Ingeniis, J., Galeazzi, L., Mancini, C., Korostelev, Y. D., Rakhmaninova, A. B., Gelfand, M. S., Rodionov, D. A., Raffaelli, N., and Zhang, H. (2009). Structure and function of an ADP‐ribose‐dependent transcriptional regulator of NAD metabolism. Structure 17, 939‐951. Ignoul, S., and Eggermont, J. (2005). CBS domains: structure, function, and pathology in human proteins. Am J Physiol Cell Physiol 289, C1369‐1378. Iyer, L. M., Anantharaman, V., and Aravind, L. (2003). Ancient conserved domains shared by animal soluble guanylyl cyclases and bacterial signaling proteins. BMC Genomics 4, 5. Iyer, L. M., Burroughs, A. M., and Aravind, L. (2006). The prokaryotic antecedents of the ubiquitin‐signaling system and the early evolution of ubiquitin‐like beta‐grasp domains. Genome Biol 7, R60. Iyer, L. M., Burroughs, A. M., and Aravind, L. (2008). Unraveling the biochemistry and provenance of pupylation: a prokaryotic analog of ubiquitination. Biol Direct 3, 45. Iyer, L. M., Tahiliani, M., Rao, A., and Aravind, L. (2009). Prediction of novel families of enzymes involved in oxidative and other complex modifications of bases in nucleic acids. Cell Cycle 8, 1698‐1710. 

Page 40: Bacterial Signaling Chapter

Jacob, F., and Monod, J. (1959). [Genes of structure and genes of regulation in the biosynthesis of proteins.]. C R Hebd Seances Acad Sci 249, 1282‐1284. Jagadeesan, S., Mann, P., Schink, C. W., and Higgs, P. I. (2009). A novel "four‐component" two‐component signal transduction mechanism regulates developmental progression in Myxococcus xanthus. J Biol Chem 284, 21435‐21445. Jordan, I. K., Natale, D. A., Koonin, E. V., and Galperin, M. Y. (2001). Independent evolution of heavy metal‐associated domains in copper chaperones and copper‐transporting atpases. J Mol Evol 53, 622‐633. Joyce, M. G., Levy, C., Gabor, K., Pop, S. M., Biehl, B. D., Doukov, T. I., Ryter, J. M., Mazon, H., Smidt, H., van den Heuvel, R. H., et al. (2006). CprK crystal structures reveal mechanism for transcriptional control of halorespiration. J Biol Chem 281, 28318‐28325. Jung, K. H., Trivedi, V. D., and Spudich, J. L. (2003). Demonstration of a sensory rhodopsin in eubacteria. Mol Microbiol 47, 1513‐1522. Kannan, N., Wu, J., Anand, G. S., Yooseph, S., Neuwald, A. F., Venter, J. C., and Taylor, S. S. (2007). Evolution of allostery in the cyclic nucleotide binding module. Genome Biol 8, R264. Karniol, B., Wagner, J. R., Walker, J. M., and Vierstra, R. D. (2005). Phylogenetic analysis of the phytochrome superfamily reveals distinct microbial subfamilies of photoreceptors. Biochem J 392, 103‐116. Klenk, H. P., Clayton, R. A., Tomb, J. F., White, O., Nelson, K. E., Ketchum, K. A., Dodson, R. J., Gwinn, M., Hickey, E. K., Peterson, J. D., et al. (1997). The complete genome sequence of the hyperthermophilic, sulphate‐reducing archaeon Archaeoglobus fulgidus. Nature 390, 364‐370. Koonin, E. V., and Aravind, L. (2002). Origin and evolution of eukaryotic apoptosis: the bacterial connection. Cell Death Differ 9, 394‐404. Koonin, E. V., Wolf, Y. I., and Aravind, L. (2000). Protein fold recognition using sequence profiles and its application in structural genomics. Adv Protein Chem 54, 245‐275. Koretke, K. 
K., Lupas, A. N., Warren, P. V., Rosenberg, M., and Brown, J. R. (2000). Evolution of two‐component signal transduction. Mol Biol Evol 17, 1956‐1970. Koshland, D. E., Jr. (1980). Bacterial chemotaxis in relation to neurobiology. Annu Rev Neurosci 3, 43‐75. Kovacs, H., Comfort, D., Lord, M., Campbell, I. D., and Yudkin, M. D. (1998). Solution structure of SpoIIAA, a phosphorylatable component of the system that regulates transcription factor sigmaF of Bacillus subtilis. Proc Natl Acad Sci U S A 95, 5067‐5071. Krebs, E. G. (1998). An accidental biochemist. Annu Rev Biochem 67, xii‐xxxii. Kreil, G., and Boyer, P. D. (1964). Detection of bound phosphohistidine in E. coli succinate thiokinase. Biochem Biophys Res Commun 16, 551‐555. Kuner, T., Seeburg, P. H., and Guy, H. R. (2003). A common architecture for K+ channels and ionotropic glutamate receptors? Trends Neurosci 26, 27‐32. La Roche, S. D., and Leisinger, T. (1991). Identification of dcmR, the regulatory gene governing expression of dichloromethane dehalogenase in Methylobacterium sp. strain DM4. J Bacteriol 173, 6714‐6721. Lamparter, T., Carrascal, M., Michael, N., Martinez, E., Rottwinkel, G., and Abian, J. (2004). The biliverdin chromophore binds covalently to a conserved cysteine residue in the N‐terminus of Agrobacterium phytochrome Agp1. Biochemistry 43, 3659‐3669. Lanzilotta, W. N., Schuller, D. J., Thorsteinsson, M. V., Kerby, R. L., Roberts, G. P., and Poulos, T. L. (2000). Structure of the CO sensing transcription activator CooA. Nat Struct Biol 7, 876‐880. Lee, E. J., Cho, Y. H., Kim, H. S., Ahn, B. E., and Roe, J. H. (2004). Regulation of sigmaB by an anti‐ and an anti‐anti‐sigma factor in Streptomyces coelicolor in response to osmotic stress. J Bacteriol 186, 8490‐8498. 

Page 41: Bacterial Signaling Chapter

Lee, J. H., Geiman, D. E., and Bishai, W. R. (2008a). Role of stress response sigma factor SigG in Mycobacterium tuberculosis. J Bacteriol 190, 1128‐1133. Lee, J. H., Kang, G. B., Lim, H. H., Jin, K. S., Kim, S. H., Ree, M., Park, C. S., Kim, S. J., and Eom, S. H. (2008b). Crystal structure of the GluR0 ligand‐binding core from Nostoc punctiforme in complex with L‐glutamate: structural dissection of the ligand interaction and subunit interface. J Mol Biol 376, 308‐316. Leipe, D. D., Koonin, E. V., and Aravind, L. (2004). STAND, a class of P‐loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer. J Mol Biol 343, 1‐28. Leonard, C. J., Aravind, L., and Koonin, E. V. (1998). Novel families of putative protein kinases in bacteria and archaea: evolution of the "eukaryotic" protein kinase superfamily. Genome Res 8, 1038‐1047. Lin, C., and Todo, T. (2005). The cryptochromes. Genome Biol 6, 220. Linder, J. U., and Schultz, J. E. (2008). Versatility of signal transduction encoded in dimeric adenylyl cyclases. Curr Opin Struct Biol 18, 667‐672. Ma, X., Sayed, N., Baskaran, P., Beuve, A., and van den Akker, F. (2008). PAS‐mediated dimerization of soluble guanylyl cyclase revealed by signal transduction histidine kinase domain crystal structure. J Biol Chem 283, 1167‐1178. Makarova, K. S., Aravind, L., Grishin, N. V., Rogozin, I. B., and Koonin, E. V. (2002). A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 30, 482‐496. Manson, M. D. (2009). A mutational wrench in the HAMP gearbox. Mol Microbiol 73, 742‐746. Marquenet, E., and Richet, E. (2007). How integration of positive and negative regulatory signals by a STAND signaling protein depends on ATP hydrolysis. Mol Cell 28, 187‐199. Martinez, S. E., Beavo, J. A., and Hol, W. G. (2002). 
GAF domains: two‐billion‐year‐old molecular switches that bind cyclic nucleotides. Mol Interv 2, 317‐323. Moglich, A., Ayers, R. A., and Moffat, K. (2009a). Design and signaling mechanism of light‐regulated histidine kinases. J Mol Biol 385, 1433‐1444. Moglich, A., Ayers, R. A., and Moffat, K. (2009b). Structure and signaling mechanism of Per‐ARNT‐Sim domains. Structure 17, 1282‐1294. Mohammad, D. H., and Yaffe, M. B. (2009). 14‐3‐3 proteins, FHA domains and BRCT domains in the DNA damage response. DNA Repair (Amst) 8, 1009‐1017. Mougel, C., and Zhulin, I. B. (2001). CHASE: an extracellular sensing domain common to transmembrane receptors from prokaryotes, lower eukaryotes and plants. Trends Biochem Sci 26, 582‐584. Mukherjee, K., and Burglin, T. R. (2006). MEKHLA, a novel domain with similarity to PAS domains, is fused to plant homeodomain‐leucine zipper III proteins. Plant Physiol 140, 1142‐1150. Mukohata, Y., Ihara, K., Tamura, T., and Sugiyama, Y. (1999). Halobacterial rhodopsins. J Biochem 125, 649‐657. Munoz‐Dorado, J., Inouye, S., and Inouye, M. (1991). A gene encoding a protein serine/threonine kinase is required for normal development of M. xanthus, a gram‐negative bacterium. Cell 67, 995‐1006. Neves, S. R., Ram, P. T., and Iyengar, R. (2002). G protein pathways. Science 296, 1636‐1639. Newberry, K. J., and Brennan, R. G. (2004). The structural mechanism for transcription activation by MerR family member multidrug transporter activation, N terminus. J Biol Chem 279, 20356‐20362. Ninfa, A. J., and Magasanik, B. (1986). Covalent modification of the glnG product, NRI, by the glnL product, NRII, regulates the transcription of the glnALG operon in Escherichia coli. Proc Natl Acad Sci U S A 83, 5909‐5913. O'Neill, E., Ng, L. C., Sze, C. C., and Shingler, V. (1998). Aromatic ligand binding and intramolecular signalling of the phenol‐responsive sigma54‐dependent regulator DmpR. Mol Microbiol 28, 131‐141. Patton, T. G., Yang, S. J., and Bayles, K. W. (2006). 
The role of proton motive force in expression of the Staphylococcus aureus cid and lrg operons. Mol Microbiol 59, 1395‐1404. 

Page 42: Bacterial Signaling Chapter

Pearce, M. J., Mintseris, J., Ferreyra, J., Gygi, S. P., and Darwin, K. H. (2008). Ubiquitin‐like protein involved in the proteasome pathway of Mycobacterium tuberculosis. Science 322, 1104‐1107. Pellicena, P., Karow, D. S., Boon, E. M., Marletta, M. A., and Kuriyan, J. (2004). Crystal structure of an oxygen‐binding heme domain related to soluble guanylate cyclases. Proc Natl Acad Sci U S A 101, 12854‐12859. Phillips, R. W., Wiegel, J., Berry, C. J., Fliermans, C., Peacock, A. D., White, D. C., and Shimkets, L. J. (2002). Kineococcus radiotolerans sp. nov., a radiation‐resistant, gram‐positive bacterium. Int J Syst Evol Microbiol 52, 933‐938. Pittard, J., Camakaris, H., and Yang, J. (2005). The TyrR regulon. Mol Microbiol 55, 16‐26. Pokkuluri, P. R., Pessanha, M., Londer, Y. Y., Wood, S. J., Duke, N. E., Wilton, R., Catarino, T., Salgueiro, C. A., and Schiffer, M. (2008). Structures and solution properties of two novel periplasmic sensor domains with c‐type heme from chemotaxis proteins of Geobacter sulfurreducens: implications for signal transduction. J Mol Biol 377, 1498‐1517. Ponting, C. P., and Aravind, L. (1997). PAS: a multifunctional domain family comes to light. Curr Biol 7, R674‐677. Ponting, C. P., Aravind, L., Schultz, J., Bork, P., and Koonin, E. V. (1999). Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J Mol Biol 289, 729‐745. Potts, M., Sun, H., Mockaitis, K., Kennelly, P. J., Reed, D., and Tonks, N. K. (1993). A protein‐tyrosine/serine phosphatase encoded by the genome of the cyanobacterium Nostoc commune UTEX 584. J Biol Chem 268, 7632‐7635. Purcell, E. B., and Crosson, S. (2008). Photoregulation in prokaryotes. Curr Opin Microbiol 11, 168‐178. Rasouly, A., Schonbrun, M., Shenhar, Y., and Ron, E. Z. (2009). YbeY, a heat shock protein involved in translation in Escherichia coli. J Bacteriol 191, 2649‐2655. Reinelt, S., Hofmann, E., Gerharz, T., Bott, M., and Madden, D. R. (2003). 
The structure of the periplasmic ligand‐binding domain of the sensor kinase CitA reveals the first extracellular PAS domain. J Biol Chem 278, 39189‐39196. Ryjenkov, D. A., Simm, R., Romling, U., and Gomelsky, M. (2006). The PilZ domain is a receptor for the second messenger c‐di‐GMP: the PilZ domain protein YcgR controls motility in enterobacteria. J Biol Chem 281, 30310‐30314. Saier, M. H., Hvorup, R. N., and Barabote, R. D. (2005). Evolution of the bacterial phosphotransferase system: from carriers and enzymes to group translocators. Biochem Soc Trans 33, 220‐224. Sainsbury, S., Lane, L. A., Ren, J., Gilbert, R. J., Saunders, N. J., Robinson, C. V., Stuart, D. I., and Owens, R. J. (2009). The structure of CrgA from Neisseria meningitidis reveals a new octameric assembly state for LysR transcriptional regulators. Nucleic Acids Res 37, 4545‐4558. Santiago, B., Schubel, U., Egelseer, C., and Meyer, O. (1999). Sequence analysis, characterization and CO‐specific transcription of the cox gene cluster on the megaplasmid pHCG3 of Oligotropha carboxidovorans. Gene 236, 115‐124. Schuster, S. C., Noegel, A. A., Oehme, F., Gerisch, G., and Simon, M. I. (1996). The hybrid histidine kinase DokA is part of the osmotic response system of Dictyostelium. Embo J 15, 3880‐3889. Schuttelkopf, A. W., Boxer, D. H., and Hunter, W. N. (2003). Crystal structure of activated ModE reveals conformational changes involving both oxyanion and DNA‐binding domains. J Mol Biol 326, 761‐767. Selinger, Z. (2008). Discovery of G protein signaling. Annu Rev Biochem 77, 1‐13. Shu, C. J., Ulrich, L. E., and Zhulin, I. B. (2003). The NIT domain: a predicted nitrate‐responsive module in bacterial sensory receptors. Trends Biochem Sci 28, 121‐124. Soderling, S. H., and Beavo, J. A. (2000). Regulation of cAMP and cGMP signaling: new phosphodiesterases and new functions. Curr Opin Cell Biol 12, 174‐179. 

Page 43: Bacterial Signaling Chapter

Soding, J., Biegert, A., and Lupas, A. N. (2005). The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 33, W244‐248. Soisson, S. M., MacDougall‐Shackleton, B., Schleif, R., and Wolberger, C. (1997). The 1.6 A crystal structure of the AraC sugar‐binding and dimerization domain complexed with D‐fucose. J Mol Biol 273, 226‐237. Spiro, S. (1994). The FNR family of transcriptional regulators. Antonie Van Leeuwenhoek 66, 23‐36. Springer, M. S., Goy, M. F., and Adler, J. (1979). Protein methylation in behavioural control mechanisms and in signal transduction. Nature 280, 279‐284. Spudich, J. L., and Luecke, H. (2002). Sensory rhodopsin II: functional insights from structure. Curr Opin Struct Biol 12, 540‐546. Staron, A., Sofia, H. J., Dietrich, S., Ulrich, L. E., Liesegang, H., and Mascher, T. (2009). The third pillar of bacterial signal transduction: classification of the extracytoplasmic function (ECF) sigma factor protein family. Mol Microbiol. Stock, A., Koshland, D. E., Jr., and Stock, J. (1985). Homologies between the Salmonella typhimurium CheY protein and proteins involved in the regulation of chemotaxis, membrane protein synthesis, and sporulation. Proc Natl Acad Sci U S A 82, 7989‐7993. Stock, A. M. (2006). Transmembrane signaling by asymmetry. Nat Struct Mol Biol 13, 862‐863. Stock, J. B., Ninfa, A. J., and Stock, A. M. (1989). Protein phosphorylation and regulation of adaptive responses in bacteria. Microbiol Rev 53, 450‐490. Stock, J. B., Stock, A. M., and Mottonen, J. M. (1990). Signal transduction in bacteria. Nature 344, 395‐400. Suzuki, K., Babitzke, P., Kushner, S. R., and Romeo, T. (2006). Identification of a novel regulatory protein (CsrD) that targets the global regulatory RNAs CsrB and CsrC for degradation by RNase E. Genes Dev 20, 2605‐2617. Swain, K. E., and Falke, J. J. (2007). Structure of the conserved HAMP domain in an intact, membrane‐bound chemoreceptor: a disulfide mapping study. 
Biochemistry 46, 13684‐13695. Tam, R., and Saier, M. H., Jr. (1993). Structural, functional, and evolutionary relationships among extracellular solute‐binding receptors of bacteria. Microbiol Rev 57, 320‐346. Tasneem, A., Iyer, L. M., Jakobsson, E., and Aravind, L. (2005). Identification of the prokaryotic ligand‐gated ion channels and their implications for the mechanisms and origins of animal Cys‐loop ion channels. Genome Biol 6, R4. Taylor, B. L. (2007). Aer on the inside looking out: paradigm for a PAS‐HAMP role in sensing oxygen, redox and energy. Mol Microbiol 65, 1415‐1424. Taylor, B. L., and Zhulin, I. B. (1999). PAS domains: internal sensors of oxygen, redox potential, and light. Microbiol Mol Biol Rev 63, 479‐506. Torrents, E., Grinberg, I., Gorovitz‐Harris, B., Lundstrom, H., Borovok, I., Aharonowitz, Y., Sjoberg, B. M., and Cohen, G. (2007). NrdR controls differential expression of the Escherichia coli ribonucleotide reductase genes. J Bacteriol 189, 5012‐5021. Tucker, N. P., D'Autreaux, B., Yousafzai, F. K., Fairhurst, S. A., Spiro, S., and Dixon, R. (2008). Analysis of the nitric oxide‐sensing non‐heme iron center in the NorR regulatory protein. J Biol Chem 283, 908‐918. Ulrich, L. E., and Zhulin, I. B. (2005). Four‐helix bundle: a ubiquitous sensory module in prokaryotic signal transduction. Bioinformatics 21 Suppl 3, iii45‐48. Ulrich, L. E., and Zhulin, I. B. (2007). MiST: a microbial signal transduction database. Nucleic Acids Res 35, D386‐390. Urabe, H., and Ogawara, H. (1995). Cloning, sequencing and expression of serine/threonine kinase‐encoding genes from Streptomyces coelicolor A3(2). Gene 153, 99‐104. van Aalten, D. M., DiRusso, C. C., and Knudsen, J. (2001). The structural basis of acyl coenzyme A‐dependent regulation of the transcription factor FadR. Embo J 20, 2041‐2050. 

Page 44: Bacterial Signaling Chapter

Vasquez, V., and Perozo, E. (2009). Structural biology: A channel with a twist. Nature 461, 47‐49. Verhamme, D. T., Arents, J. C., Postma, P. W., Crielaard, W., and Hellingwerf, K. J. (2001). Glucose‐6‐phosphate‐dependent phosphoryl flow through the Uhp two‐component regulatory system. Microbiology 147, 3345‐3352. Vogeley, L., Trivedi, V. D., Sineshchekov, O. A., Spudich, E. N., Spudich, J. L., and Luecke, H. (2007). Crystal structure of the Anabaena sensory rhodopsin transducer. J Mol Biol 367, 741‐751. Walker, J. R., Altamentova, S., Ezersky, A., Lorca, G., Skarina, T., Kudritska, M., Ball, L. J., Bochkarev, A., and Savchenko, A. (2006). Structural and biochemical study of effector molecule recognition by the E.coli glyoxylate and allantoin utilization regulatory protein AllR. J Mol Biol 358, 810‐828. Wang, J. Y., Clegg, D. O., and Koshland, D. E., Jr. (1981). Molecular cloning and amplification of the adenylate cyclase gene. Proc Natl Acad Sci U S A 78, 4684‐4688. Wang, J. Y., and Koshland, D. E., Jr. (1981). The identification of distinct protein kinases and phosphatases in the prokaryote Salmonella typhimurium. J Biol Chem 256, 4640‐4648. Ward, N. L., Challacombe, J. F., Janssen, P. H., Henrissat, B., Coutinho, P. M., Wu, M., Xie, G., Haft, D. H., Sait, M., Badger, J., et al. (2009). Three genomes from the phylum Acidobacteria provide insight into the lifestyles of these microorganisms in soils. Appl Environ Microbiol 75, 2046‐2056. Weinhouse, H., Sapir, S., Amikam, D., Shilo, Y., Volman, G., Ohana, P., and Benziman, M. (1997). c‐di‐GMP‐binding protein, a new factor regulating cellulose synthesis in Acetobacter xylinum. FEBS Lett 416, 207‐211. Whitworth, D. E., and American Society for Microbiology. (2008). Myxobacteria : multicellularity and differentiation (Washington, DC: ASM Press). Williams, S. B., and Stewart, V. (1999). 
Functional similarities among two‐component sensors and methyl‐accepting chemotaxis proteins suggest a role for linker region amphipathic helices in transmembrane signal transduction. Mol Microbiol 33, 1093‐1102. Wilson, A., Boulay, C., Wilde, A., Kerfeld, C. A., and Kirilovsky, D. (2007). Light‐induced energy dissipation in iron‐starved cyanobacteria: roles of OCP and IsiA proteins. Plant Cell 19, 656‐672. Wower, J., Wower, I. K., and Zwieb, C. (2008). Making the jump: new insights into the mechanism of trans‐translation. J Biol 7, 17. Wu, S. Q., Chai, W., Lin, J. T., and Stewart, V. (1999). General nitrogen regulation of nitrate assimilation regulatory gene nasR expression in Klebsiella oxytoca M5al. J Bacteriol 181, 7274‐7284. Xiong, J., Kurtz, D. M., Jr., Ai, J., and Sanders‐Loehr, J. (2000). A hemerythrin‐like domain in a bacterial chemotaxis protein. Biochemistry 39, 5117‐5125. Zhulin, I. B. (2001). The superfamily of chemotaxis transducers: from physiology to genomics and back. Adv Microb Physiol 45, 157‐198. Zhulin, I. B., Nikolskaya, A. N., and Galperin, M. Y. (2003). Common extracellular sensory domains in transmembrane receptors for diverse signal transduction pathways in bacteria and archaea. J Bacteriol 185, 285‐294. Zhulin, I. B., Taylor, B. L., and Dixon, R. (1997). PAS domain S‐boxes in Archaea, Bacteria and sensors for oxygen and redox. Trends Biochem Sci 22, 331‐333.  


[Figure: representative structures of sensor domains, with N and C termini labeled. Panels: PAS (FixL, PDB:1xj4); GAF (3'-5' cyclic PDE2a, PDB:1mc0); HNOB (TTE0680, PDB:1u56); SLBB (transcobalamin, PDB:2bbc); HMA (Bacillus CopZ, PDB:1k0v; Cu labeled); ACT (3-phosphoglycerate dehydrogenase, PDB:1psd); GyrI (BmrR, PDB:1exi); CACHE (CitA, PDB:1p0z); NUDIX (NrtR, PDB:3gz8).]


[Figure: representative structures of sensor domains, with N and C termini labeled. Panels: PBP2 (CatM, PDB:2f7c); T-OB (ModE, PDB:1o7l); CBS (Pyrobaculum PAE2072, PDB:2rif); PBP1 (LuxP, PDB:2hj9A); STAS (SPOIIAA, PDB:1buz); 4-helical bundle, 4HB (aspartate receptor, PDB:1vlt); 4-helical cytochromes (cytochrome b562, PDB:1qpu); Hemerythrin (DcrH, PDB:2awc; Fe labeled); ATP-cone (ribonucleotide reductase, PDB:3r1r).]


[Figure: representative structures of sensor domains, with N and C termini labeled. Panels: DSBH (AraC, PDB:2arc); ISOCOT (ribose-5-phosphate isomerase, PDB:1lk7); 7 TM receptor (sensory rhodopsin, PDB:1ap9); PilZ (Vibrio VCA0042, PDB:2rde); ART-LGIC (acetylcholine-binding protein, PDB:1uv6, chains C and D); UTRA (chorismate lyase, PDB:1fw9); cNMPBD (CAP, PDB:1cgp); NTF2 (orange carotenoid protein, PDB:1m98); BLUF (photoreceptor, PDB:2hfo).]


[Figure: domain networks of bacterial signaling proteins. Nodes are domains (for example PAS, GAF, HAMP, SHELIX, REC, HISKIN, GGDEF, EAL, HDGYP, CACHE, CHASE, STAS, HTH, PilZ, CNMP and ACYC); the individual node labels are omitted here.
Schematic: Proteins 1 to 4, built from domains A, B, C and D, are converted into an Architecture Network and a Functional Network.
Labeled sub-networks: Two Component systems; Apoptosis-related systems; Transmembrane Receptors; Cyclic Nucleotide and Diguanylate Binding.
Legend categories: CN Signaling; two component; Extracellular Sensory Domains; Intracellular Sensory Domains; Signal Transmitter Domains; Apoptosis related; Misc Receptors; RNA Metabolism; Lipid signaling; Cal Signaling; Misc; GTPase Signaling; HTH; Adhesion; MA chemotaxis; ubiquitin; PTS; PTS-Accessory; Protein-protein interaction; proteases; Heme and iron sensors; Ion Transport.]


[Figure: scaling plots across genomes. Data points and species labels (for example Scoe, Mxan, Scel, Paer, Npun, Ana, Bjap) are omitted here.
(a) Complexity quotient: complexity quotient versus number of proteins; fitted curve y = 60.604 ln(x) - 210.56, R² = 0.785.
(b) Sensory Domain Containing Proteins: protein counts versus proteome size; fitted curves y = 0.0449x + 19.949 (R² = 0.7681) and y = 0.0002x^1.6602 (R² = 0.8383).
(c, d) PAS and GAF: protein counts versus proteome size; fitted curves, in the order they appear in the extracted figure, y = 8E-06x^1.7626 (R² = 0.4622) and y = 1.5575e^(0.0004x) (R² = 0.3768).]


[Figure: scaling of individual signaling domains across genomes. Each point is a genome; selected genomes are labeled by abbreviation (e.g. Scoe, Bxen, Scel).
(a) cNMPBD: protein counts vs. proteome size. Fit: y = 1.3208e^(0.0003x), R² = 0.3626.
(b) Chemotaxis transducers: protein counts vs. proteome size.
(c) Histidine kinases: protein counts vs. proteome size. Fit: y = 1E-05x^1.7846, R² = 0.6636.
(d) Receivers vs. histidine kinases. Fit: y = 0.7428x + 1.3445, R² = 0.9145.]