Evolution of bacterial regulatory systems Mikhail Gelfand Research and Training Center “Bioinformatics” Institute for Information Transmission Problems Moscow, Russia January 2008


Plan

• Individual sites

• Transcription factors and their binding signals

• Regulatory systems and regulons

Birth and death of sites is a very dynamic process

NadR-binding sites upstream of pnuB seem absent in Klebsiella pneumoniae and Serratia marcescens

… but there are candidate sites further upstream …

… and they are clearly different (not simply misaligned).

Cryptic sites and loss of regulators

Loss of RbsR in Y. pestis (the ABC transporter is also lost)

Start codon of rbsD

RbsR binding site

Unexpected conservation of non-consensus positions in orthologous sites

Regulatory site of LexA upstream of lexA (consensus nucleotides are in caps)

Escherichia coli TgCTGTATATActcACAGcA

Salmonella typhi aACTGTATATActcACAGcA

Yersinia pestis agCTGTATATActcACAGcA

Haemophilus influenzae atCTGTATAcAatacCAGTt

Pasteurella multocida TtCTGTATATAataACAGTt

Vibrio cholerae cACTGgATATActcACAGTc

wrong consensus?

TF PurR, gene purL

Escherichia coli ACGCAAACGgTTtCGT

Salmonella typhi ACGCAAACGgTTtCGT

Yersinia pestis ACGCAAACGgTTtCGT

Haemophilus influenzae AtGCAAACGTTTGCtT

Pasteurella multocida ACGCAAACGTTTtCGT

Vibrio cholerae ACGCAAACGgTTGCtT

TF PurR, gene purM

Escherichia coli tCGCAAACGTTTGCtT

Salmonella typhi tCGCAAACGTTTGCtT

Yersinia pestis tCGCAAACGTTTGCcT

Haemophilus influenzae tCGCAAACGTTTGCtT

Pasteurella multocida tCGCAAACGTTTGCtT

Vibrio cholerae ACGCAAACGTTTtCcT

Non-consensus positions are more conserved than synonymous codon positions
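The observation can be checked directly on the LexA sites listed above. The following is a minimal sketch, not the analysis behind the slide: it simply counts, for each column, how often the same lower-case (non-consensus) nucleotide recurs across the six orthologous sites. The function name and the `min_count` threshold are illustrative assumptions.

```python
from collections import Counter

# LexA sites upstream of lexA, copied from the slide
# (upper case = consensus nucleotide, lower case = non-consensus).
SITES = {
    "Escherichia coli":       "TgCTGTATATActcACAGcA",
    "Salmonella typhi":       "aACTGTATATActcACAGcA",
    "Yersinia pestis":        "agCTGTATATActcACAGcA",
    "Haemophilus influenzae": "atCTGTATAcAatacCAGTt",
    "Pasteurella multocida":  "TtCTGTATATAataACAGTt",
    "Vibrio cholerae":        "cACTGgATATActcACAGTc",
}


def conserved_nonconsensus(sites, min_count):
    """Map column index -> (nucleotide, count) for columns where one
    non-consensus nucleotide occurs in at least min_count sites."""
    seqs = list(sites.values())
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sites must be aligned"
    result = {}
    for col in range(length):
        lower = Counter(s[col] for s in seqs if s[col].islower())
        if lower:
            nuc, count = lower.most_common(1)[0]
            if count >= min_count:
                result[col] = (nuc, count)
    return result


# min_count=4 is an arbitrary illustrative threshold.
hits = conserved_nonconsensus(SITES, min_count=4)
for col, (nuc, count) in sorted(hits.items()):
    print(f"column {col + 1}: '{nuc}' in {count} of {len(SITES)} sites")
```

On these six sites the non-consensus "ctc" stretch in the middle of the site stands out: its central "t" is shared by all species.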

Regulators and their motifs

• Cases of motif conservation at surprisingly large distances

• Subtle changes at close evolutionary distances

• Correlation between contacting nucleotides and amino acid residues

• Changes in symmetry patterns

NrdR (regulator of ribonucleotide reductases and some other replication-related genes): conservation at large distances

DNA motifs and protein-DNA interactions

CRP PurR

IHF TrpR

Entropy at aligned sites and the number of contacts (heavy atoms in a base pair at a distance <cutoff from a protein atom)
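As a rough illustration of the "entropy at aligned sites" half of this comparison, the sketch below computes the Shannon entropy of each column for the PurR sites upstream of purM shown earlier in the talk. The contact counts require a protein-DNA structure and are not reproduced here; the function name is an assumption.

```python
from collections import Counter
from math import log2

# PurR sites upstream of purM, copied from the slide.
PURM_SITES = [
    "tCGCAAACGTTTGCtT",  # Escherichia coli
    "tCGCAAACGTTTGCtT",  # Salmonella typhi
    "tCGCAAACGTTTGCcT",  # Yersinia pestis
    "tCGCAAACGTTTGCtT",  # Haemophilus influenzae
    "tCGCAAACGTTTGCtT",  # Pasteurella multocida
    "ACGCAAACGTTTtCcT",  # Vibrio cholerae
]


def column_entropy(sites):
    """Shannon entropy (bits) of every column of a gap-free site alignment."""
    seqs = [s.upper() for s in sites]  # case only marks consensus status
    n = len(seqs)
    entropies = []
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs)
        entropies.append(sum((c / n) * log2(n / c) for c in counts.values()))
    return entropies


entropies = column_entropy(PURM_SITES)
for col, h in enumerate(entropies, start=1):
    print(f"position {col:2d}: {h:.3f} bits")
```

Invariant columns give zero entropy; with only six sites the estimates are coarse, so a real analysis would use a larger set of sites.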

The LacI family: subtle changes in motifs at close distances

[Figure: motif fragments G, An, CGGn, GC]

Specificity-determining positions in the LacI family

Training set: 459 sequences, average length: 338 amino acids, 85 specificity groups

10 residues contact NPF (analog of the effector)

6 residues in the intersubunit contacts

7 residues contact the operator sequence

7 residues in the effector contact zone (5 Å < dmin < 10 Å)

5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)

6 residues in the operator contact zone (5 Å < dmin < 10 Å)

– 44 SDPs
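One common way to score a candidate specificity-determining position is the mutual information between an alignment column and the specificity-group labels. The sketch below shows only that scoring step, without any significance test, and is not claimed to be the procedure used for the LacI family. The four-sequence alignment and the group names are hypothetical toy data.

```python
from collections import Counter
from math import log2


def mutual_information(column, groups):
    """Mutual information (bits) between residues in one alignment column
    and the specificity-group labels of the same sequences."""
    n = len(column)
    joint = Counter(zip(column, groups))
    residues = Counter(column)
    labels = Counter(groups)
    return sum(
        (c / n) * log2((c / n) / ((residues[a] / n) * (labels[g] / n)))
        for (a, g), c in joint.items()
    )


# HYPOTHETICAL toy alignment: four sequences, two specificity groups.
ALIGNMENT = ["ARKL", "ARKL", "AQKM", "AQKL"]
GROUPS = ["g1", "g1", "g2", "g2"]

scores = [
    mutual_information([seq[col] for seq in ALIGNMENT], GROUPS)
    for col in range(len(ALIGNMENT[0]))
]
best = max(range(len(scores)), key=lambda i: scores[i])
print("scores:", [round(s, 3) for s in scores], "best column:", best + 1)
```

In the toy data, column 2 separates the two groups perfectly (1 bit), invariant columns score 0, and the partly informative last column falls in between.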

LacI from E.coli

The CRP/FNR family of regulators

FNR

HcpR

CooA

Gamma

Desulfovibrio

Desulfovibrio

TGTCGGCnnGCCGACA

TTGTgAnnnnnnTcACAA

TTGTGAnnnnnnTCACAA

TTGATnnnnATCAA

Correlation between contacting nucleotides and amino acid residues

• CooA in Desulfovibrio spp.
• CRP in Gamma-proteobacteria
• HcpR in Desulfovibrio spp.
• FNR in Gamma-proteobacteria

DD COOA ALTTEQLSLHMGATRQTVSTLLNNLVR
DV COOA ELTMEQLAGLVGTTRQTASTLLNDMIR
EC CRP  KITRQEIGQIVGCSRETVGRILKMLED
YP CRP  KXTRQEIGQIVGCSRETVGRILKMLED
VC CRP  KITRQEIGQIVGCSRETVGRILKMLEE
DD HCPR DVSKSLLAGVLGTARETLSRALAKLVE
DV HCPR DVTKGLLAGLLGTARETLSRCLSRMVE
EC FNR  TMTRGDIGNYLGLTVETISRLLGRFQK
YP FNR  TMTRGDIGNYLGLTVETISRLLGRFQK
VC FNR  TMTRGDIGNYLGLTVETISRLLGRFQK

TGTCGGCnnGCCGACA

TTGTgAnnnnnnTcACAA

TTGTGAnnnnnnTCACAA

TTGATnnnnATCAA

Contacting residues: REnnnR

TG: 1st arginine

GA: glutamate and 2nd arginine
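The three contacting positions can be read off the aligned protein fragments shown on this slide. The sketch below assumes that each row of the alignment starts with a two-letter genome code and a factor name, and that every fragment is 27 residues long; this row splitting is a reconstruction, not something the slide states. It locates the REnnnR pattern in one CRP sequence and reports the residues in the same columns for every factor.

```python
import re

# (genome code, factor, aligned fragment), as reconstructed from the slide.
# ASSUMPTION: rows are split at two-letter genome codes; fragments are 27 aa.
HELIX = [
    ("DD", "COOA", "ALTTEQLSLHMGATRQTVSTLLNNLVR"),
    ("DV", "COOA", "ELTMEQLAGLVGTTRQTASTLLNDMIR"),
    ("EC", "CRP",  "KITRQEIGQIVGCSRETVGRILKMLED"),
    ("YP", "CRP",  "KXTRQEIGQIVGCSRETVGRILKMLED"),
    ("VC", "CRP",  "KITRQEIGQIVGCSRETVGRILKMLEE"),
    ("DD", "HCPR", "DVSKSLLAGVLGTARETLSRALAKLVE"),
    ("DV", "HCPR", "DVTKGLLAGLLGTARETLSRCLSRMVE"),
    ("EC", "FNR",  "TMTRGDIGNYLGLTVETISRLLGRFQK"),
    ("YP", "FNR",  "TMTRGDIGNYLGLTVETISRLLGRFQK"),
    ("VC", "FNR",  "TMTRGDIGNYLGLTVETISRLLGRFQK"),
]

# Find the R-E-n-n-n-R pattern in the first CRP row and reuse its columns.
reference = next(seq for _, factor, seq in HELIX if factor == "CRP")
start = re.search(r"RE...R", reference).start()
COLUMNS = (start, start + 1, start + 5)  # 1st Arg, Glu, 2nd Arg


def contact_residues(seq):
    """Residues found at the three REnnnR columns of an aligned fragment."""
    return "".join(seq[c] for c in COLUMNS)


by_factor = {}
for genome, factor, seq in HELIX:
    by_factor.setdefault(factor, set()).add(contact_residues(seq))

for factor, triples in by_factor.items():
    print(factor, sorted(triples))
```

Within each factor the residues at these columns are identical across genomes, while they differ between factors that recognize different motifs.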

The correlation holds for other factors in the family

NrtR (regulator of NAD metabolism): systematic search for correlated positions

• analysis of correlated positions in proteins and sites
• analysis of specificity determining positions
• the same positions in one alpha-helix identified
• plans for experimental verification

NiaR: changed dimer structure?

The GalR family and C-proteins of RM-systems: direct and inverted repeats
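Whether a motif is a direct or an inverted repeat, and with what spacing, can be tested mechanically when the motif is written as two arms around a run of n. The sketch below is a minimal illustration using the CRP/FNR-family consensus motifs shown earlier in the talk; the function names and the direct-repeat example are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def classify_repeat(motif):
    """Return (kind, spacer_length) for a motif written as arm + n-run + arm.

    kind is "inverted" if the right arm is the reverse complement of the
    left arm, "direct" if the arms are identical, otherwise "neither".
    """
    match = re.fullmatch(r"([ACGT]+)(N+)([ACGT]+)", motif.upper())
    if match is None:
        raise ValueError("expected two arms separated by a run of n")
    left, spacer, right = match.groups()
    if right == reverse_complement(left):
        kind = "inverted"
    elif right == left:
        kind = "direct"
    else:
        kind = "neither"
    return kind, len(spacer)


# Consensus motifs copied from the CRP/FNR family slides.
for motif in ("TGTCGGCnnGCCGACA", "TTGTgAnnnnnnTcACAA", "TTGATnnnnATCAA"):
    print(motif, classify_repeat(motif))

# HYPOTHETICAL direct repeat, for contrast only.
print("ACGTTnnACGTT", classify_repeat("ACGTTnnACGTT"))
```

The spacer length is returned together with the repeat type, so a change in spacing between otherwise similar motifs shows up in the same call.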

BirA: changed spacing

What are the events leading to the present-day state?

• Expansion and contraction of regulons
• New regulators (where from?)
• Duplications of regulators with or without regulated loci
• Loss of regulators with or without regulated loci
• Re-assortment of regulators and structural genes
• … especially in complex systems
• Horizontal transfer

Trehalose/maltose catabolism in alpha-proteobacteria

Duplicated LacI-family regulators: lineage-specific post-duplication loss

The binding motifs are very similar (the blue branch is somewhat different: to avoid cross-recognition?)

Utilization of an unknown galactoside in gamma-proteobacteria

Loss of regulator and merger of regulons: it seems that LacI-X was present in the common ancestor (Klebsiella is an outgroup)

Yersinia and Klebsiella: two regulons, GalR and LacI-X

Erwinia: one regulon, GalR

Utilization of maltose/maltodextrin in Firmicutes

Displacement: invasion of a regulator from a different subfamily (horizontal transfer from a related species?) – blue sites

Orthologous TFs with completely different regulons (alpha-proteobacteria and Xanthomonadales)

Catabolism of gluconate in proteobacteria

Extreme variability of the regulation of “marginal” regulon members

[Figure labels: γ; Pseudomonas spp.; β]

Regulation of amino acid biosynthesis in Firmicutes

• Interplay between regulatory RNA elements and transcription factors

• Expansion of T-box systems (normally – RNA structures regulating aminoacyl-tRNA-synthetases)

Three regulatory systems for methionine biosynthesis

A. SAM-dependent riboswitch
B. Met-T-box
C. MtaR: repressor of transcription

MtaR

Methionine regulatory systems: loss of S-box regulons

• S-boxes (SAM-1 riboswitch)
– Bacillales
– Clostridiales
– the Zoo:
   • Petrotoga
   • actinobacteria (Streptomyces, Thermobifida)
   • Chlorobium, Chloroflexus, Cytophaga
   • Fusobacterium
   • Deinococcus
   • proteobacteria (Xanthomonas, Geobacter)
• Met-T-boxes (Met-tRNA-dependent attenuator) + SAM-2 riboswitch for metK
– Lactobacillales
• MET-boxes (candidate transcription signal)
– Streptococcales

Lact. Strep. Bac. Clostr.

ZOO

Recent duplications and bursts: Arg-T-box in Clostridium difficile

[Figure labels: LJ_ARGS, LME_ARGS, LR_ARGS, LP_ARGS, CBE_ARGS, CPE_ARGS, CB_ARGS, CTC_ARGS, CAC_ARGS, CDF_YQIXYZ, RDF02391, CDF_ARGC, CDF_ARGH, BC_ARGS2, EF_ARGS, BH_ARGS, LSA_ARGS, PPE_ARGS, LGA_ARGS]

[Figure. Taxa: Bacillales, Lactobacillales, Clostridiales, Clostridium difficile. Loci: argS, yqiXYZ, RDF02391, argCJBDF, argG, argH, others. Annotations: predicted amino acid transporters; amino acid biosynthetic genes; NEW. Legend: ARG-specific T-box regulatory site; aminoacyl-tRNA synthetase; biosynthetic genes; amino acid transporters.]

… following transcription factor loss

Expansion of T-box regulon

regulation of expression of arginine biosynthetic and transport genes by T-box antitermination

: ARG-specific T-box regulatory site

Binding to 5’ UTR gene region → regulation of gene expression

Other clostridia spp. (CA, CTC, CTH, CPE, CB, CPE)

yqiXYZ

argC

argH

yqiXYZ

argC

argG

argH

AhrC regulatory protein (negative regulation of arginine metabolism, positive regulation of arginine catabolism)

...AhrC site

: AhrC binding site

Gram+ bacteria: Clostridium difficile:

AhrC is lost

5’

Regulon expansion, or how FruR has become CRA

• CRA (a.k.a. FruR) in Escherichia coli:
– global regulator
– well-studied in experiment (many regulated genes known)
• Going back in time: looking for candidate CRA/FruR sites upstream of (orthologs of) genes known to be regulated in E. coli
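Looking for candidate sites upstream of orthologous genes is commonly done by scoring every window of the upstream region with a positional weight matrix. The sketch below is a generic illustration of that step and not the procedure or software used in this work. The talk lists no CRA/FruR sites, so the matrix is trained on the PurR sites shown earlier as a stand-in; the upstream sequence, the pseudocount and the uniform background are all assumptions.

```python
from math import log2

# Stand-in training set: PurR sites (purL and purM) copied from the slides.
TRAINING_SITES = [s.upper() for s in (
    "ACGCAAACGgTTtCGT", "ACGCAAACGgTTtCGT", "ACGCAAACGgTTtCGT",
    "AtGCAAACGTTTGCtT", "ACGCAAACGTTTtCGT", "ACGCAAACGgTTGCtT",
    "tCGCAAACGTTTGCtT", "tCGCAAACGTTTGCtT", "tCGCAAACGTTTGCcT",
    "tCGCAAACGTTTGCtT", "tCGCAAACGTTTGCtT", "ACGCAAACGTTTtCcT",
)]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix: one dict {nucleotide: score} per column."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for col in range(len(sites[0])):
        column = [s[col] for s in sites]
        pwm.append({
            nuc: log2(((column.count(nuc) + pseudocount) / total) / background)
            for nuc in "ACGT"
        })
    return pwm


def scan(pwm, sequence):
    """Score every window of the sequence; returns a list of (position, score)."""
    width = len(pwm)
    return [
        (pos, sum(pwm[i][sequence[pos + i]] for i in range(width)))
        for pos in range(len(sequence) - width + 1)
    ]


# HYPOTHETICAL upstream region with one training-like site embedded at offset 16.
UPSTREAM = "TTGACCTATAGGCTAA" + "ACGCAAACGTTTTCGT" + "CATTAGGAGGTCATG"

pwm = build_pwm(TRAINING_SITES)
best_pos, best_score = max(scan(pwm, UPSTREAM), key=lambda hit: hit[1])
print(best_pos, UPSTREAM[best_pos:best_pos + len(pwm)], round(best_score, 2))
```

In a comparative setting the same scan would be repeated upstream of each ortholog, and a candidate site would be trusted more when it is found in several genomes.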

Common ancestor of gamma-proteobacteria

[Figure. Gene labels: icdA, aceA, aceB, aceEF, pckA, ppsA, pykF, adhE, gpmA, pgk, tpiA, gapA, pfkA, fbp, fruK, fruBA, eda, edd, epd, ptsHI-crr, manXYZ, mtlD, mtlA. Substrates: fructose, glucose, mannose, mannitol. Legend: Gamma-proteobacteria.]

Common ancestor of the Enterobacteriales

[Figure. Gene labels: icdA, aceA, aceB, aceEF, pckA, ppsA, pykF, adhE, gpmA, pgk, tpiA, gapA, pfkA, fbp, fruK, fruBA, eda, edd, epd, ptsHI-crr, manXYZ, mtlD, mtlA. Substrates: fructose, glucose, mannose, mannitol. Legend: Gamma-proteobacteria; Enterobacteriales.]

Common ancestor of Escherichia and Salmonella

[Figure. Gene labels: icdA, aceA, aceB, aceEF, pckA, ppsA, pykF, adhE, gpmA, pgk, tpiA, gapA, pfkA, fbp, fruK, fruBA, eda, edd, epd, ptsHI-crr, manXYZ, mtlD, mtlA. Substrates: fructose, glucose, mannose, mannitol. Legend: Gamma-proteobacteria; Enterobacteriales; E. coli and Salmonella spp.]

Life without Fur

Regulation of iron homeostasis (the Escherichia coli paradigm)

Iron:
• essential cofactor (limiting in many environments)
• dangerous at large concentrations

FUR (responds to iron):
• synthesis of siderophores
• transport (siderophores, heme, Fe2+, Fe3+)
• storage
• iron-dependent enzymes
• synthesis of heme
• synthesis of Fe-S clusters

Similar in Bacillus subtilis

Regulation of iron homeostasis in α-proteobacteria

Experimental studies:
• FUR/MUR: Bradyrhizobium, Rhizobium and Sinorhizobium
• RirA (Rrf2 family): Rhizobium and Sinorhizobium
• Irr (FUR family): Bradyrhizobium, Rhizobium and Brucella

[Figure: scheme of iron-responsive regulation. Transcription factors: RirA, Irr, Fur, IscR. Signals: FeS, heme, Fe, FeS status of cell. Regulated functions: iron uptake systems (siderophore uptake, Fe2+/Fe3+ uptake), iron storage ferritins, FeS synthesis, heme synthesis, iron-requiring enzymes [iron cofactor]. Condition labels: [-Fe], [+Fe]. Other labels: degraded.]

Distribution of transcription factors in genomes

Search for candidate motifs and binding sites using standard comparative genomic techniques

Regulation of genes in functional subsystems

Rhizobiales

Bradyrhizobiaceae

Rhodobacteriales

The Zoo (likely ancestral state)

Reconstruction of history

Appearance of the iron-Rhodo motif

Frequent co-regulation with Irr

Strict division of function with Irr

All logos and Some Very Tempting Hypotheses:

1. Cross-recognition of FUR and IscR motifs in the ancestor.

2. When FUR had become MUR, and IscR had been lost in Rhizobiales, emerging RirA (from the Rrf2 family, with a rather different general consensus) took over their sites.

3. Iron-Rhodo boxes are recognized by IscR: directly testable


Summary and open problems

• Regulatory systems are very flexible
– easily lost
– easily expanded (in particular, by duplication)
– may change specificity
– rapid turnover of regulatory sites
• With more stories like these, we can start thinking about a general theory
– catalog of elementary events; how frequent?
– mechanisms (duplication, birth e.g. from enzymes, horizontal transfer)
– conserved (regulon cores) and non-conserved (marginal regulon members) genes in relation to metabolic and functional subsystems/roles
– (TF family-specific) protein-DNA recognition code
– distribution of TF families in genomes; distribution of regulon sizes; etc.

People

• Andrei A. Mironov – software, algorithms
• Alexandra Rakhmaninova – SDP, protein-DNA correlations
• Anna Gerasimova (now at U. Michigan) – NadR
• Olga Kalinina (on loan to EMBL) – SDP
• Yuri Korostelev – protein-DNA correlations
• Ekaterina Kotelnikova (now at Ariadne Genomics) – evolution of sites
• Olga Laikova – LacI
• Dmitry Ravcheev – CRA/FruR
• Dmitry Rodionov (on loan to Burnham Institute) – iron etc.
• Alexei Vitreschak – T-boxes and riboswitches

• Andy Johnston (U. of East Anglia) – experimental validation (iron)
• Leonid Mirny (MIT) – protein-DNA, SDP
• Andrei Osterman (Burnham Institute) – experimental validation

• Howard Hughes Medical Institute
• Russian Foundation of Basic Research
• Russian Academy of Sciences, program “Molecular and Cellular Biology”
• INTAS