Questions
• Are we ‘just’ E. coli, except more so?
• Where do new genes come from?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents, or us and chimps?
• What specifies the elevated complexity of us versus other animals?
• Can we understand sequence variation among humans?
• How can gene function contribute to behaviour?
23 of 94 InterPro families: Defense and immunity (e.g. IL-1, interferons, defensins)
17 of 94 InterPro families: Peripheral nervous system (e.g. leptin, prion, ependymin)
4 of 94 InterPro families: Bone and cartilage (GLA, LINK, calcitonin, osteopontin)
3 of 94 InterPro families: Lactation (caseins, somatotropin)
2 of 94 InterPro families: Vascular homeostasis (natriuretic peptide, endothelin)
5 of 94 InterPro families: Dietary homeostasis (glucagon, bombesin, colipase, gastrin, IGF-BP)
18 of 94 InterPro families: Other plasma factors (uteroglobin, FN2, RNase A, GM-CSF etc.)
‘New Domains’
Where do new genes come from?
Structure & Sequence
Sequence
Stepping through structure and sequence space: the FGF / IL-1 beta-trefoil story
J Mol Biol. 2000 Oct 6;302(5):1041-7.
beta-trefoils: FGFs, interleukin-1s
EXTRACELLULAR (CELL-CELL SIGNALLING): FGF (vertebrates, invertebrates); IL-1 (vertebrates)
INTRACELLULAR (ACTIN-BINDING PROTEINS): Fascin (vertebrates, invertebrates, fungi); Hisactophilin (Dictyostelium)
Gene Genesis
• Positive selection often leads to the erosion of sequence similarity.
• If this erosion is extensive, homology cannot be inferred from database search strategies.
• If, concomitantly, there is positive selection for duplication of these genes, this gives the appearance of a new gene/domain family that lacks antecedents.
Copley, Goodstadt, Ponting. Current Opinion in Genetics & Development, Volume 13, December 2003, Pages 623-628
Conservation and Selection over Time
[Figure: conservation (% identity, 50% to 100%) and % of orthologs found in fugu, plotted against time of divergence (0 to 450 Myr) for mouse-rat, human-mouse and human-fugu comparisons; data points labelled a to j.]
[Figure: percentage of sequences (0% to 100%) against KA/KS (0.00 to 0.40) for cytoplasmic domains, nuclear domains and secreted domains.]
Do all tissues & organs evolve at the same rate?
PNAS | April 2, 2002 | vol. 99 | no. 7 | 4465-4470 Genetics
Large-scale analysis of the human and mouse transcriptomes Andrew I. Su et al.
http://expression.gnf.org
Need to investigate expression of tissue-specific genes.
• Tissue Specificity of a Gene: TS
• A gene's fractional expression in a tissue relative to the sum of its expression in all tissues
• maxTS: the largest TS value of a gene across all tissues, used as an indicator of tissue specificity.
• Divide data into 5 sets:
• (1) maxTS ≤ 0.1;
• (2) 0.1 < maxTS ≤0.2;
• (3) 0.2 < maxTS ≤ 0.3;
• (4) 0.3 < maxTS ≤ 0.4;
• (5) maxTS > 0.4
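The TS definition and the five maxTS sets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy expression values are hypothetical, and the slides do not say how the expression data were normalised before TS was calculated.

```python
from typing import Dict


def tissue_specificity(expression: Dict[str, float]) -> Dict[str, float]:
    """TS per tissue: expression in that tissue divided by the sum over all tissues."""
    total = sum(expression.values())
    if total <= 0:
        raise ValueError("gene has no detectable expression")
    return {tissue: level / total for tissue, level in expression.items()}


def max_ts(expression: Dict[str, float]) -> float:
    """maxTS: the largest TS value of the gene across all tissues."""
    return max(tissue_specificity(expression).values())


def max_ts_set(value: float) -> int:
    """Assign a maxTS value to one of the five sets listed on the slide."""
    upper_bounds = [0.1, 0.2, 0.3, 0.4]  # sets (1) to (4); anything above 0.4 is set (5)
    for set_number, bound in enumerate(upper_bounds, start=1):
        if value <= bound:
            return set_number
    return 5


# Toy values (hypothetical): one gene expressed mostly in brain,
# one expressed evenly across 20 tissues.
brain_gene = {"brain": 80.0, "liver": 10.0, "kidney": 10.0}
even_gene = {f"tissue_{i}": 5.0 for i in range(20)}

brain_set = max_ts_set(max_ts(brain_gene))
even_set = max_ts_set(max_ts(even_gene))
```

With these toy values the brain-enriched gene falls in set (5) and the evenly expressed gene in set (1), which is the sense in which low maxTS marks broadly expressed genes.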
[Figure legend: All; Non-secreted; Secreted; Non-disease; Disease]
Protein secretion accounts for much of the elevation in KA/KS for tissue-specific genes.
Eitan Winter
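[Figure: Evolutionary rates, on a scale from slow (KA/KS = 0.04) to fast (KA/KS = 0.13). Tissue labels: brain, blood, kidney, thymus, liver.]
[Figure: Protein secretion (%), on a scale from low (12.2%) to high (50%). Tissue labels: brain, blood, kidney, trachea, liver, testis.]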
[Figure legend: All; Non-secreted; Non-disease; Disease; Secreted]
Housekeeping genes are under-represented among disease genes.
Eitan Winter
[Figure: Human disease (%), on a scale from low (5.0%) to high (39%). Tissue labels: brain, blood, kidney, trachea, liver, testis.]
Tissue-specific genes’ Ks
[Figure: median Ks value (axis from 0 to 0.8) for tissue-specific genes.]
Winter et al. Genome Research 14:54-61, 2004
Tissue/Organ Evolution
• Mammalian tissues & organs are evolving at different rates, according to the genes that are specifically expressed in them.
• Perhaps this is not too surprising since there are mammalian-specific tissues & organs!
• Tissue-specific genes are ‘mutating’ at different rates, possibly due to transcription-coupled repair in the germline.
• Mendelian disease acts non-uniformly among genes and tissues.
Human-Mouse Orthologues’ Expression Profile Correlations
[Figure: distribution of Pearson correlation (-1 to 1) between expression profiles, as % of pairs (0 to 18), for orthologue pairs and for random pairs.]
Eitan Winter
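The comparison of orthologue pairs with random pairs can be sketched as follows. This is a minimal illustration, not the analysis behind the slide: it assumes each gene has an expression profile measured over the same ordered list of tissues in both species, the function names are hypothetical, and random pairs are formed here simply by shuffling the mouse partners.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def pair_correlations(
    human: Dict[str, Sequence[float]],
    mouse: Dict[str, Sequence[float]],
    pairs: List[Tuple[str, str]],
) -> List[float]:
    """Correlation of expression profiles for each (human gene, mouse gene) pair."""
    return [pearson(human[h], mouse[m]) for h, m in pairs]


def shuffled_pairs(pairs: List[Tuple[str, str]], seed: int = 0) -> List[Tuple[str, str]]:
    """Random pairs: keep the human genes, shuffle which mouse gene each is paired with."""
    rng = random.Random(seed)
    mouse_genes = [m for _, m in pairs]
    rng.shuffle(mouse_genes)
    return [(h, m) for (h, _), m in zip(pairs, mouse_genes)]
```

Calling `pair_correlations` once on the true orthologue pairs and once on `shuffled_pairs(pairs)` gives the two sets of values that a histogram of this kind compares. A shuffle can by chance leave some true pairs intact, which a real analysis would probably want to exclude.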
Pan troglodytes genome
• 4X coverage
• average nucleotide divergence of just 1.2%
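As a rough illustration of what a nucleotide divergence figure measures, the sketch below computes the uncorrected proportion of mismatched sites between two aligned sequences. The slide does not say how the 1.2% value was derived, so this simple measure, the function name and the toy sequences are assumptions for illustration only.

```python
def nucleotide_divergence(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned A/C/G/T sites at which two sequences differ.

    Columns containing a gap or an ambiguous base in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


# Toy alignment (hypothetical): one difference in ten comparable sites.
toy_divergence = nucleotide_divergence("ACGTACGTAC", "ACGTACGTAT")
```

On this scale a divergence of 1.2% corresponds to roughly 12 differing sites per 1,000 aligned bases.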
How do the 2 gene complements differ?
• Gene duplications observed in the human genome.
• Lack of N-glycolylneuraminic acid (Neu5Gc) in humans due to mutation in CMP-sialic acid hydroxylase (Chou et al. PNAS 95(20):11751-6.)
• Mutation in a Siglec (sialic acid receptor) (Angata et al. JBC 276:40282-7)
How do the Great Apes differ from us?
• Rare HIV progression to AIDS
• Resistant to malarial infection
• Menopause rare
• Coronary atherosclerosis rare
• Epithelial cancers rare
• Alzheimer’s disease pathology incomplete
FOXP2
• A point mutation in FOXP2 co-segregates with a disorder in a family in which half of the members have impaired linguistic and grammatical abilities.
• Human FOXP2 contains missense mutations and a pattern of nucleotide polymorphism which strongly suggest that this gene has been the target of selection during recent human evolution.
Enard et al. Nature 418, 869-872
Figure 2 Silent and replacement nucleotide substitutions mapped on a phylogeny of primates. Bars represent nucleotide changes. Grey bars indicate amino acid changes. P < 0.001
Loss of Olfactory Receptor Genes Coincides with the Acquisition of Full Trichromatic Vision in Primates.
PLoS Biol. 2004 Jan;2(1):E5. Epub 2004 Jan 20 Gilad et al.
Figure 2. The Proportion of OR Pseudogenes in 20 Species
Table 1. Biological processes showing the strongest evidence for positive selection. The top panel includes the categories showing the greatest acceleration in human lineage, and the bottom panel includes categories with the greatest acceleration in the chimp lineage.
Biological process                                  Number of genes*  PMW (human/Model 2)*  PMW (chimp/Model 2)*
Categories showing the greatest acceleration in human lineage
Olfaction                                           48                0                     0.9184
Sensory perception                                  146 (98)          0 (0.026)             0.9691 (0.9079)
Cell surface receptor-mediated signal transduction  505 (464)         0 (0.0386)            0.199 (0.0864)
Chemosensory perception                             54 (6)            0 (0.1157)            0.9365 (0.7289)
Nuclear transport                                   26                0.0003                0.2001
G protein-mediated signaling                        252 (211)         0.0003 (0.1205)       0.2526 (0.0773)
Signal transduction                                 1030 (989)        0.0004 (0.0255)       0.0276 (0.0092)
Cell adhesion                                       132               0.0136                0.3718
Ion transport                                       237               0.0247                0.8025
Intracellular protein traffic                       278               0.0257                0.8099
Transport                                           391               0.0326                0.7199
Metabolism of cyclic nucleotides                    20                0.0408                0.1324
Amino acid metabolism                               78                0.0454                0.0075
Cation transport                                    179               0.0458                0.8486
Developmental processes                             542               0.0493                0.2322
Hearing                                             21                0.0494                0.9634
Categories with the greatest acceleration in the chimp lineage
Signal transduction                                 1030 (989)        0.0004 (0.0255)       0.0276 (0.0092)
Amino acid metabolism                               78                0.0454                0.0075
Amino acid transport                                23                0.1015                0.0102
Cell proliferation and differentiation              82                0.3116                0.0182
Cell structure                                      174               0.2633                0.0233
Oncogenesis                                         201               0.3132                0.0267
Cell structure and motility                         239               0.2208                0.0299
Purine metabolism                                   35                0.9127                0.0423
Skeletal development                                44                0.2876                0.0438
Mesoderm development                                168               0.5813                0.0439
Other oncogenesis                                   39                0.2777                0.0469
DNA repair                                          49                0.9363                0.0477
* The number of genes and the PMW values excluding olfactory receptor genes are shown in parentheses.
Clark et al. Inferring Nonneutral Evolution from Human-Chimp-Mouse Orthologous Gene Trios. Science (2003) 302: 1960-1963
Table 2. Molecular functions showing the strongest evidence for positive selection. The table includes only human-accelerated categories, because the only categories accelerated in the chimp lineage are chaperones (P = 0.0124), cell adhesion molecules (P = 0.0220), and extracellular matrix (P = 0.0333).
Molecular function                                  Number of genes*  PMW (human/Model 2)*  PMW (chimp/Model 2)*
G protein coupled receptor                          199 (153)         0 (0.2533)            0.8689 (0.6776)
G protein modulator                                 62                0.0008                0.3776
Receptor                                            448               0.0030                0.9798
Ion channel                                         134               0.0043                0.8993
Extracellular matrix                                97 (95)           0.0120 (0.0178)       0.1482 (0.1593)
Other G protein modulator                           32                0.0149                0.4441
Extracellular matrix glycoprotein                   44 (42)           0.0178 (0.0269)       0.1579 (0.1765)
Voltage-gated ion channel                           62                0.0219                0.6692
Other hydrolase                                     95                0.0260                0.4823
Oxygenase                                           46                0.0303                0.4792
Protein kinase receptor                             37                0.0314                0.6911
Transporter                                         214               0.0338                0.1836
Ligand-gated ion channel                            45                0.0405                0.9503
Microtubule binding motor protein                   22                0.0421                0.6385
Microtubule family cytoskeletal protein             54                0.0467                0.2815
* The number of genes and the PMW values excluding olfactory receptor genes are shown in parentheses.
• “Smell, Hearing Genes Differ between Chimps and Humans” Genome News Network January 9 2004
• “The 2.5Gb mouse genome sequence reveals about 30,000 genes, with 99% having direct counterparts in humans.”
Nature editorial 5 December 2002.
Questions
• Are we ‘just’ E. coli, except more so? Not at all.
• Where do new genes come from? Old genes!
• Do all genes evolve at the same rate? No.
• Do all tissues & organs evolve at the same rate? No.
• Where do we fit in the tree of life? Primates!
• What specifies the differences between us and rodents, or us and chimps? Jury is out. Duplicates?
• What specifies the elevated complexity of us versus other animals? Jury is out.
• Can we understand sequence variation among humans? Not yet – Lon’s lecture?
• How can gene function contribute to behaviour? Seen in rodents, but not yet in primates.
Genome Sequencing Capacity (NHGRI)
YEAR  7X genome (3 Gb)  1X genome (3 Gb)
2003  2.5 genomes       18 genomes
2004  4.9 genomes       34 genomes
2005  6.2 genomes       43 genomes
2006  8.4 genomes       59 genomes
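The two capacity columns appear to be related by coverage depth: sequencing one 3 Gb genome at 7X takes about as much raw sequence as seven genomes at 1X. The short check below works through that arithmetic. The reading of the columns as genomes per year at the stated coverage is an assumption, and the variable and function names are illustrative.

```python
GENOME_SIZE_GB = 3.0  # genome size quoted in the column headings

# Values copied from the table: genomes at 7X and at 1X coverage, by year.
capacity_7x = {2003: 2.5, 2004: 4.9, 2005: 6.2, 2006: 8.4}
capacity_1x = {2003: 18, 2004: 34, 2005: 43, 2006: 59}


def equivalent_1x_genomes(genomes_at_7x: float, depth: float = 7.0) -> float:
    """Number of 1X genomes requiring the same raw sequence as the given 7X genomes."""
    return genomes_at_7x * depth


def raw_sequence_gb(genomes: float, depth: float, genome_size_gb: float = GENOME_SIZE_GB) -> float:
    """Raw sequence (Gb) needed for a number of genomes at a given coverage depth."""
    return genomes * depth * genome_size_gb


for year in sorted(capacity_7x):
    predicted = equivalent_1x_genomes(capacity_7x[year])
    print(year, round(predicted, 1), capacity_1x[year])
```

Under this reading, each 1X entry in the table is the 7X entry multiplied by seven and rounded to a whole number of genomes, and the 2003 capacity corresponds to about 52.5 Gb of raw sequence.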
Sampling the placental mammal phylogeny
(Murphy et al. Science 2001 294: 2348-51)
MRC Functional Genetics Unit, Oxford
Leo Goodstadt, Richard Emes, Eitan Winter, Steve Rice, Scott Beatson, Nick Dickens, Caleb Webber, Michael Elkaim, Jose Duarte, Zoe Birtle, Tania Oh
Ensembl (Ewan Birney, Michele Clamp, Abel Ureta-Vidal); Richard Copley (WTCHG, Oxford); Ziheng Yang (UCL);
The Human, Mouse and Rat Genome Sequencing Consortia; UCSC
Bibliography
Human Genome Papers:
Lander et al. Nature (2001) 409, 860-921
Venter et al. Science (2001) 291, 1304-1351.
Mouse Genome Paper:
Waterston et al. Nature (2002) 420, 520-62.
Rat Genome Paper: submitted.
Comparative genomics & evolutionary rates:
Hardison et al. Genome Res. (2003) 13, 13-26.
Adaptive evolution of genomes:
Emes et al. Hum Mol Genet. (2003) 12, 701-9
Wolfe & Li Nat Genet. (2003) 33 Suppl: 255-65