Evolution of Plant Stress Responsiveness: Genome-wide and Gene Family Level Analysis
Shin-Han Shiu, Department of Plant Biology
KBS, 1/18, 2008
Outline
Major interests and why
Gene families and stress responsiveness:
  The interplay between gene family expansion, duplication mechanism, and the elusive selection pressure
The Receptor Kinase family as an example
  One of the biggest plant gene families and their involvement in plant biotic interactions
If there is enough time, the short story on plant pseudogenes
  When can you call a gene a pseudogene?
Major interests
Molecular evolutionary patterns
Genetic basis of adaptation
Source of selection pressure: abiotic and biotic stress conditions
Target of selection: duplicate genes
Where do all these duplicates come from?
Whole genome duplication
Tandem duplication
Segmental duplication
Replicative transposition
Plasticity of plant gene contents
3 whole genome duplications in the Arabidopsis thaliana lineage over the past ~150 million years
More “recent” retentions in plants
[Figure: Arabidopsis gene content, expected vs. observed. Values shown: 15,000*, 30,000, 60,000, 120,000, and 21,000**.]
*: Number of orthologous groups in shared families between Arabidopsis and rice.
**: Number of genes in shared families.
Shiu et al. (2006) PNAS
Plant Gene Family Evolution: Major questions
What is the rate of gene gains in plants?
Do certain types of genes have higher gain rate?
What is the influence of duplication mechanisms?
Finally, how do genes that are responsive to stresses behave?
AtGenExpress microarray dataset: 22 stress conditions
Measuring Lineage-specific Gain
Orthologous group and lineage-specific gain
  Reconcile species and gene trees
Retention rate along the A. thaliana lineage
Diminishing rate of retention over time
[Figure: species tree of Moss (M), Rice (R), Poplar (P), and Arabidopsis (A), with branches 1 to 3 along the A. thaliana lineage.]

Number of gene gains
OG type   Split time    Tree-based      Similarity-based   Gain rate (gain/MY)
A-M       700 MYA       8,029.4±133.8   10,050             11.3 - 14.4
A-R       200 MYA       6,508.9±109.7   8,474              32.0 - 42.4
A-P       100-120 MYA   5,774.5±72.8    6,995              47.5 - 70.0

Branch   Retained (R)   Rate (R/MY)
1        1521-1576      3.0-3.2
2        734-1479       9.2-18.5
3        5774-6995      48.1-58.3
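The per-branch retention rates above can be checked with a few lines of arithmetic. This is a minimal sketch, not the author's pipeline: the branch durations (500, 80, and 120 MY) are assumptions derived from the 700, 200, and 120 MYA split times, and the names `branches` and `retention_rate` are hypothetical.

```python
# Retained-gene ranges (tree-based to similarity-based) per branch, taken from the slide.
# Branch durations in MY are ASSUMED from the 700, 200, and 120 MYA split times.
branches = [
    ("branch 1 (moss split to rice split)", (1521, 1576), 700 - 200),
    ("branch 2 (rice split to poplar split)", (734, 1479), 200 - 120),
    ("branch 3 (poplar split to present)", (5774, 6995), 120),
]


def retention_rate(retained, duration_my):
    """Return (low, high) retained genes per million years for one branch."""
    low, high = retained
    return low / duration_my, high / duration_my


for name, retained, duration in branches:
    low, high = retention_rate(retained, duration)
    print(f"{name}: {low:.1f}-{high:.1f} retained genes per MY")
```

Under these assumed durations the computed ranges agree with the Rate (R/MY) column, which illustrates the diminishing rate of retention with branch age.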
Expansion at the gene family level
Lineage-specific gains per family in one plant lineage are moderately correlated with gains in the other lineage.
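y = 0.32x, r² = 0.33
[Figure: scatter plot of A. thaliana lineage-specific gains (x-axis, 0 to 300) against P. patens lineage-specific gains (y-axis, 0 to 150) per family. Labeled families: LRR, Protein kinase, PPR, NB-ARC, C1, P450, UDPGT, Kinesin, AP2.]
[Figure: gene tree of one family with per-branch gene counts for A. thaliana (At) and Moss. Labels shown: At 9, Moss 5.]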
Rensing et al. (2008) Science
E.g. a family with 3 OGs
Expansion at the orthologous group level
Enrichment: log2(Obs/Exp), color scale from −0.7 to 0.7
[Figure: example orthologous group with species-specific branch sizes (Moss: 2, A. t.: 3), and a plot of log(freq) against log(OG size).]
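The enrichment score log2(Obs/Exp) can be sketched as follows. The slides do not state how the expected counts were derived, so this example assumes that branch sizes in the two lineages are independent, which gives an expected count of row total × column total / grand total. The function name `log2_enrichment` and the toy counts are hypothetical.

```python
import math


def log2_enrichment(counts):
    """counts[i][j]: number of OGs with branch size i in one lineage and j in the other.

    Expected counts ASSUME independence between the two lineages.
    Cells with zero observed or expected count are returned as None.
    """
    total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    scores = []
    for i, row in enumerate(counts):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0 and expected > 0:
                out.append(math.log2(observed / expected))
            else:
                out.append(None)
        scores.append(out)
    return scores


# Toy 2x2 example (made-up numbers): the diagonal is over-represented.
toy_counts = [[40, 10], [10, 40]]
for row in log2_enrichment(toy_counts):
    print([round(score, 2) for score in row])
```

Positive scores mark cells seen more often than expected and negative scores mark depleted cells, which is how the −0.7 to 0.7 color scale reads.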
Two major patterns in OG expansion
Convergent expansion
Single lineage expansion
[Figure: three heat maps of lineage-specific branch size (0, 1, 2, 3, 4, 5, > 6) in Arabidopsis thaliana (x-axis) against Moss, Rice, and Poplar (y-axis). Color shows enrichment, log2(Obs/Exp), from −0.7 to 0.7.]
Expansion patterns and duplication mechanisms
Comparison of ratios between tandem and non-tandem genes

OG type   Method for defining OG   Convergent expansion   Single-lineage expansion   P value
A-M       Similarity               0.17 (756/4500)        0.30 (848/2918)            2.2×10^-23
A-M       Tree                     0.16 (831/5297)        0.40 (1443/3566)           9.4×10^-88
A-R       Similarity               0.31 (959/3115)        0.47 (644/1375)            3.1×10^-12
A-R       Tree                     0.27 (844/3073)        0.50 (1631/3294)           2.3×10^-33
A-P       Similarity               0.29 (1141/3944)       0.60 (741/1234)            7.2×10^-38
A-P       Tree                     0.26 (1014/3930)       0.64 (1578/2452)           1.0×10^-83

E.g. for A-M orthology (similarity-based OGs):
             Convergent   Single-lineage
Tandem       756          848
Non-tandem   4500         2918
Ratio        0.17         0.30
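The A-M example can be turned into a small test of association between expansion pattern and duplication mechanism. The slides do not name the statistical test behind the reported P values, so the Pearson chi-square test below (without continuity correction) is an assumption chosen for illustration. The function names are hypothetical.

```python
import math


def tandem_ratio(tandem, non_tandem):
    """Ratio of tandem to non-tandem genes, as in the 'Ratio' row."""
    return tandem / non_tandem


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value). For 1 degree of freedom the upper-tail
    probability of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: tandem, non-tandem. Columns: convergent, single-lineage (A-M, similarity-based).
chi2, p_value = chi_square_2x2(756, 848, 4500, 2918)
print(f"convergent ratio:     {tandem_ratio(756, 4500):.2f}")
print(f"single-lineage ratio: {tandem_ratio(848, 2918):.2f}")
print(f"chi-square = {chi2:.1f}, p = {p_value:.1e}")
```

Whatever test was actually used, the point of the comparison is the same: tandem genes make up a larger share of single-lineage expansions than of convergent ones.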
Summary I
Duplicate gene turnover
  Even though some duplicates are retained for millions of years, the majority of them will be lost over a time scale of hundreds of millions of years.
The degree of lineage-specific expansion is similar at the family level but with substantial variation
Expansion patterns fall into two major categories
  Convergent expansion
  Single lineage expansion
Orthologous groups with single lineage expansion
  Tend to be enriched in tandemly repeated genes
What's so special about tandem genes?
Duplication rate (event per unit time):
Whole genome duplication: 1 event / ~50 million years
Tandem duplication: multiple events / generation
Rate of recombination
  Recombination rate: Pathogen attack > control (Lucht et al., 2002, Nature)
  Recombination rate: Tandem > non-tandem (Zhang & Gaut, 2003, Genome Res.)
Gene family expansion and functional bias
Question: What types of genes tend to experience expansion? What is the influence of duplication mechanism?
Classification of genes:
  In OG without expansion
  In OG with expansion
Gene Ontology, a controlled vocabulary describing:
  Gene functions, e.g. protein kinase, involved in attaching phosphates onto itself or other proteins, serving as a molecular switch.
  Biological processes involved, e.g. serine/threonine phosphorylation, the process of attaching phosphate onto the amino acids Ser or Thr.
  Location within the cell, e.g. plasma membrane
Functional bias of gene retention
[Figure: over-represented GO categories mapped onto the Moss, Rice, Poplar, Arabidopsis tree for A-M, A-R, and A-P orthologous groups. Category groups shown, in extraction order:
  None
  Plasma membrane, Lipase activity, Copper ion-binding, Responses to ABA, Hypersensitive response, Cell wall organization
  Translation
  Ribosome, Apoplast, Membrane, Cell wall, Cytosol, UDP-glycosyl transferase, Receptor activity, Ca2+ homeostasis, Enzyme inhibitor, Electron transport, Phosphorylation, Stress response]
Stress response categories over-represented in the vascular plant lineage
Cellular component categories: T vs. NT
Tandem: Extracellular region, cell surface, endomembrane
Non-tandem: cytosol, cytoskeleton, nucleus
Biological process categories: T vs. NT
Tandem: kinases, glucosinolate transferase, toxin responses
Non-tandem: regulation & hormone metabolism
[Figure: GO biological process hierarchy. Node labels include: response to stimulus, response to stress, defense response, response to biotic stimulus, response to abiotic stimulus, response to chemical stimulus, response to endogenous stimulus, response to toxin, response to drug, response to other organism, response to bacterium, response to osmotic stress, response to hormone stimulus, response to ABA stimulus; cellular process, physiological process, cellular physiological process; metabolism, cellular metabolism, secondary metabolism, glucosinolate metabolism, toxin metabolism, phosphorus metabolism, hormone metabolism, generation of precursor metabolites and energy; transport, peptide transport, drug transport, lipid transport, localization, establishment of localization; cell communication, signal; regulation of biological, cellular, physiological, and metabolic processes.]
Biological process categories: T vs. NT (contd.)
[Figure: the same GO biological process hierarchy as on the previous slide.]
Tandem: response to stimuli, various transport functions
Non-tandem: cell-cell communication and hormone response
Stress responsiveness
Expression data set: Arabidopsis thaliana
  Under 22 abiotic and biotic stress conditions
Definition: stress responsiveness
  For a given gene
  ET: Expression level under stress condition
  Ec: Expression level under mock treatment control
  If ET >> Ec: Significant UP regulation
  If ET << Ec: Significant DOWN regulation
Question: Do stress-responsive genes tend to be those that are gained throughout plant evolution?
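The ET versus Ec definition can be illustrated with a simple fold-change rule. This is only a sketch: the slides say "significant" without giving the criterion, so the two-fold cutoff (`min_log2_fold=1.0`) is an assumption, and a real analysis would use a statistical test on replicated arrays. The function name `classify_response` is hypothetical.

```python
import math


def classify_response(e_t, e_c, min_log2_fold=1.0):
    """Classify one gene under one condition from its expression levels.

    e_t: expression level under the stress condition (ET)
    e_c: expression level under the mock treatment control (Ec)
    min_log2_fold: ASSUMED cutoff on |log2(ET / Ec)|; not taken from the slides.
    """
    log2_ratio = math.log2(e_t / e_c)
    if log2_ratio >= min_log2_fold:
        return "up"
    if log2_ratio <= -min_log2_fold:
        return "down"
    return "unchanged"


# Made-up expression values for three genes.
for gene, e_t, e_c in [("geneA", 400.0, 100.0), ("geneB", 50.0, 200.0), ("geneC", 110.0, 100.0)]:
    print(gene, classify_response(e_t, e_c))
```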
Expansion of responsive genes and conditions
Response Up regulation Down regulation
OG type A-M A-R A-P A-M A-R A-P
Statistical test Exp1 T/N2 Exp T/N Exp T/N Exp T/N Exp T/N Exp T/N
Abiotic stress conditions3
UV-B + T + T + T + N
Wounding + T + + + +
Cold4C + N + N + N
Heat + N + N
Drought + + T + + +
Salt + + +
Osmotic + + +
Biotic stress conditions3
AvrRpm1 + + +
DC3000 + + +
Flg22 + T + T + + +
GST-NPP1 + T + T + T
HrcC- + T + T + T
HrpZ + + +
P. infestans + T + T + T
Psph + T + T + T
Genes in expanded OGs tend to be enriched in stress-responsive genes
+: significant at the 5% level
Stress responsiveness and duplication mechanisms
Enrichment of tandemly over non-tandemly expanded genes under biotic conditions
Significant at the 5% level
T: tandem >> non-tandem
N: non-tandem >> tandem
Tandem genes tend to be “bioticly” responsive
This does not mean that biotic-responsive genes tend to be tandem
Among GO molecular function categories that are enriched in genes responding to biotic stresses:
  Tandem >> non-tandem
  Non-tandem >> tandem
[Figure: heat map of GO molecular function categories against biotic stress conditions (AvrRpm1, DC3000, Flg22, GST-NPP1, HrpZ, P. infestans, Psph). Category labels shown: DNA binding, transferase, glycosy, UDP-glycosy, nucleic ac, transcription factor, transcription regulator, kinase_activity, binding, ion binding, metal ion binding, transition metal ion binding, carbohydrate binding, oxidoreductase.]
Summary II
Over the course of plant evolution, retention rate: Stress response genes >> genome average
  True for genes up-regulated in both biotic and abiotic stress conditions
Influence of duplication mechanism, particularly for biotic stress conditions, retention rate: Tandem >> non-tandem
However, genes responsive to biotic stimuli are not necessarily tandem
  Depends on their location in the signaling network
  e.g. Plant receptor kinase: biotic -> tandem
  e.g. Transcription factors -> non-tandem, presumably WGD
Receptor Kinase
Shiu & Bleecker (2001) Science’s STKE
Arabidopsis Transmembrane Kinase 1
Functional bias: the Receptor-Like Kinase family
Shiu & Bleecker (2001) PNAS
The Kinase superfamily
Family size differences imply differential expansion
  Kinase: >1000 in A. thaliana, >1600 in Oryza sativa
  RLK/Pelle: ~600 in At, ~1200 in Os
Animal homolog:
  Drosophila: Pelle
  Mammalian: IRAKs
Shiu et al. (2004) Plant Cell
Receptor kinase configuration
[Figure: numbers of RLKs, RLCKs, and other kinases in Chlamydomonas reinhardtii, Ostreococcus tauri, Physcomitrella patens (M), Oryza sativa (O), Arabidopsis thaliana (A), and Populus trichocarpa (P). Count sets shown, in extraction order (species assignment unclear): 356, 73, 256; 388, 148, 462; 376, 159, 911; 453, 187, 1003; 424, 2, 93.]
[Figure: kinase configurations: extracellular domain (ECD) plus kinase domain, versus kinase domain alone.]
Innovation
[Figure: extracellular domains found in receptor kinase configurations. Domain labels shown: LysM, GDPD, CHASE, LRR, Thaumatin, GH18, DUF26.]
Functional bias: motivated by RLK studies
Shiu et al., 2004 Plant Cell
Stress responsiveness of RLKs
RLKs are more responsive to stress than genome average
Response Up regulation Down regulation
Statistical test RLK T/N RLK T/N
Abiotic stress conditions
  UV-B O T O N
  Wounding O O N
  Drought U na
  Cold4C O N
  Heat U
  Salt
  Osmotic O O N
Biotic stress conditions
  Flg22 O T N
  GST-NPP1 O T
  HrcC- O T O
  P. infestans O T O
  Psph O T O N
  HrpZ O T N
  AvrRpm1 O O N
  DC3000 O
Stress responsiveness of RLKs
Tandem RLKs are more responsive to biotic stress than non-tandem RLKs
Stress responsiveness and tandem RLKs
Responsiveness (R) of an RLK subfamily
  For subfamilies with ≥ 10 genes
  i: subfamily
  j: condition
  UP: # of up-regulated genes
  DN: # of down-regulated genes
  $R_i^j = \dfrac{UP_i^j}{N_i}$ or $R_i^j = \dfrac{DN_i^j}{N_i}$
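The responsiveness measure can be sketched in a few lines. The slide does not define N, so this example assumes N_i is the number of genes in subfamily i. The function name, the input layout, and the subfamily names are hypothetical, and the per-gene calls are made up.

```python
def subfamily_responsiveness(calls_by_subfamily, min_size=10):
    """Compute (UP/N, DN/N) per subfamily for one stress condition.

    calls_by_subfamily: dict mapping subfamily name -> list of per-gene calls,
        each 'up', 'down', or 'unchanged' under the condition of interest.
    min_size: subfamilies with fewer genes are skipped (the slide uses >= 10 genes).
    N is ASSUMED to be the number of genes in the subfamily.
    """
    result = {}
    for subfamily, calls in calls_by_subfamily.items():
        n = len(calls)
        if n < min_size:
            continue
        result[subfamily] = (calls.count("up") / n, calls.count("down") / n)
    return result


# Made-up calls for two hypothetical subfamilies under one condition.
example = {
    "subfamily_A": ["up"] * 4 + ["down"] * 1 + ["unchanged"] * 5,
    "subfamily_B": ["up", "up", "down"],  # fewer than 10 genes, so it is skipped
}
print(subfamily_responsiveness(example))
```

Normalizing by subfamily size keeps large subfamilies from looking more responsive simply because they contain more genes.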
In the context of biotic stress signaling networks
[Figure: signaling network diagram with components labeled T > NT or NT > T.]
Summary III
Innovation in the RLK/Pelle family
  Most RK configurations were established > 700 million years ago.
  Plenty of evidence of domain shuffling, but the rate is not high.
  Shuffled domains suggest involvement in biotic stress perception.
History of expansion
  4 major turnover patterns
  Substantially more recent gains in poplar and rice
  Mostly involved subfamilies with lots of tandem repeats
Stress responsiveness
  RLK > genome average
  Tandem > non-tandem
  Biotic > abiotic
  Stress responsive genes are not necessarily tandem
Plant pseudogenes
Pseudogenes are:
  Genomic DNA sequences similar to normal genes but non-functional
For protein coding genes, non-functional to many means:
  They have frameshift mutations or premature stop codons
  They are not transcribed into mRNA
  They exhibit signatures of neutral selection
Pseudogene numbers and family size
Gene family size is generally correlated with the number of pseudogenes in the family in question.
[Figure: scatter plot of domain family size (x-axis, 0 to 800) against number of pseudogenes (y-axis, 0 to 120). Labeled families: F-box, LRR_1, Pkinase_tyr, Pkinase, LRRNT-2, RRM1, Myb_DNA_binding, P450, PPR, NB-ARC.]
Family size (S)   Slope    Spearman's rank (ρ)   p-value
Overall           0.1247   0.5484                <2.2e-16
S < 10            0.0967   0.3000                <2.2e-16
10 ≤ S < 25       0.1259   0.2291                3.94e-5
25 ≤ S < 50       0.1950   0.2307                0.0209
50 ≤ S < 100      0.1650   0.3317                0.0152
S > 100           0.1177   0.6042                3.20e-4
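Spearman's rank correlation, as reported in the table, is the Pearson correlation of the ranks of the two variables. The sketch below implements it with the standard library only. The family sizes and pseudogene counts are made-up illustration data, not values from the study, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up family sizes and pseudogene counts, for illustration only.
family_sizes = [5, 12, 30, 75, 150, 400]
pseudogene_counts = [0, 2, 1, 9, 14, 60]
print(f"rho = {spearman_rho(family_sizes, pseudogene_counts):.3f}")
```

Because it works on ranks, the statistic is not dominated by the few very large families, which is one reason to prefer it over a raw Pearson correlation for skewed family-size data.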
Selection pressure on pseudogenes
Pseudogenes still show signatures of purifying selection
Determine pseudogene expression
Tiling microarray
  Covers the whole genome, regardless of the annotation
  Can distinguish sense and antisense transcripts
[Figure: genomic features covered by a transcript array versus a tiling array: exon, UTR, intron, cis-regulatory elements, novel genes, MAR (matrix attachment regions).]
Selection pressure on pseudogenes
Pseudogenes still show signatures of purifying selection
[Figure panels: A. Arabidopsis; B. Rice]
Summary IV
Relationships between gene family sizes and the numbers of pseudogenes
  Positively correlated
  Larger gene families tend to lose genes more frequently than smaller families
Pseudogenes still show signatures of purifying selection
  Most likely because the pseudogenization events occurred relatively recently
Pseudogenes are still expressed
  Significantly higher than intron antisense expression
  In rice, pseudogene expression is even as high as that among presumably functional genes
Acknowledgement
Lab members
  Melissa Lehti-Shiu
  Gaurav Moghe
  Cheng Zou
Past member
  Kosuke Hanada, RIKEN
Collaborators
  Jeff Conner
  Gregg Howe, PRL
  Rong Jin, CSE
  Doug Schemske
  Mike Thomashow, PRL
Funding: