Gencode Mar '10 Meeting: Pseudogene Project Update Mark Gerstein. Illustration from Gerstein & Zheng (2006). Sci Am. Overall Approach Overall Pipeline runs at Yale and UCSC, yielding raw pseudogenes Extraction of coherent subsets for further analysis and annotation - PowerPoint PPT Presentation
Do not reproduce without permission - Lectures.GersteinLab.org (c) '09
Gencode Mar '10 Meeting:
Pseudogene Project Update
Mark Gerstein
Illustration from Gerstein & Zheng (2006). Sci Am.
Overall Flow: Pipeline Runs, Coherent Sets, Annotation, Transfer to Sanger
• Overall Approach
1. Overall Pipeline runs at Yale and UCSC, yielding raw pseudogenes
2. Extraction of coherent subsets for further analysis and annotation
3. Passing to Sanger for detailed manual analysis and curation
4. Incorporation into final GENCODE annotation
5. Pipeline modification
• Chronology of Sets
1. Encode Pilot 1%
2. Ribosomal Protein pseudogenes
3. Unitary pseudogenes (Hard)
4. Glycolytic Pseudogenes
5. Polymorphic Pseudogenes
6. Pseudogenes Associated with SDs
Specific Pseudogene Assignments: Glycolytic Pseudogenes (completed)
Number of pseudogenes for each glycolytic enzyme (Processed/Duplicated)
[Liu et al. BMC Genomics ('09)]
GAPDH: Large numbers of processed GAPDH pseudogenes in mammals make this one of the biggest pseudogene families, but the numbers are not obviously correlated with mRNA abundance.
GAPDH: 60 processed / 2 duplicated
Distribution of human GAPDH pseudogenes
[Liu et al. BMC Genomics ('09, in press)]
Approximate Age of GAPDH pseudogenes
Age calculated based on the Kimura 2-parameter model of nucleotide substitution
[Liu et al. BMC Genomics ('09)]
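The Kimura 2-parameter distance behind these age estimates can be sketched as follows. This is a minimal illustration, assuming two pre-aligned sequences of equal length; the function name `k2p_distance` and the handling of gaps are assumptions made here, not details taken from Liu et al.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance (substitutions per site) for two aligned sequences.

    Columns containing anything other than A, C, G or T (gaps, N) are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites    # proportion of transitional differences
    q = transversions / sites  # proportion of transversional differences
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")

    # K = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

Converting such a distance into an age additionally requires a substitution rate, which the slide does not state, so that step is left out of the sketch.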
Synteny of GAPDH pseudogenes
Synteny derived from local gene orthology
[Liu et al. BMC Genomics ('09)]
Specific Pseudogene Assignments: Unitary Pseudogenes (completed)
zdz © mmix
Pseudogenes
▪ Pseudogenes: nongenic DNA segments with high sequence similarity to functional genes
▪ Unitary pseudogenes: unprocessed pseudogenes with no functional counterparts
[Diagram: routes to pseudogene formation]
▪ Processed pseudogenes: transcription, then transposition
▪ Duplicated pseudogenes: duplication
▪ Unitary pseudogenes: in situ pseudogenization
Identification pipeline
~23k mouse proteins
~16k human-mouse orthologs
~6k mouse proteins without human orthologs
~600 candidate human unitary pseudogene loci
76 human unitary pseudogenes
[Zhang et al. Genome Biology (in press, '10)]
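The first narrowing step of this funnel (mouse proteins that lack a human ortholog) amounts to a set difference. The sketch below shows only that step, with hypothetical identifiers and a hypothetical function name; the later steps, locating candidate human loci and confirming disablements, are not described on the slide and are not modelled here.

```python
from typing import Iterable, List, Tuple


def mouse_proteins_without_human_ortholog(
    mouse_proteins: Iterable[str],
    ortholog_pairs: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return mouse protein IDs that appear in no (mouse, human) ortholog pair."""
    # Collect every mouse ID that already has a human partner.
    with_ortholog = {mouse_id for mouse_id, _human_id in ortholog_pairs}
    # Whatever remains is a starting point for the unitary-pseudogene search.
    return sorted(set(mouse_proteins) - with_ortholog)


# Toy example with made-up identifiers.
mouse = ["mProtA", "mProtB", "mProtC", "mProtD"]
pairs = [("mProtA", "hProtA"), ("mProtC", "hProtC")]
candidates = mouse_proteins_without_human_ortholog(mouse, pairs)
```

In this toy input, `candidates` holds the two mouse proteins with no human partner; in the slide's numbers, the analogous set is the ~6k proteins carried forward.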
Relativity of unitary pseudogenes
[Zhang et al. Genome Biology (in press, '10)]
Unitary Pseudogene Families
Dating the pseudogenization events
Specific Pseudogene Assignments: Polymorphic Pseudogenes (in process)
11 Polymorphic Pseudogenes
Polymorphic pseudogenes (3 with allele frequency data)
[Zhang et al. Genome Biology (in press, '10)]
3 SNPs not found to be under recent positive selection...
Fst hierarchical clustering for rs4940595 in SERPINB11
...but the population structure at rs4940595 (the difference in allele frequencies between populations) could be the result of different selective regimes acting on the same allele in different population subdivisions.
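For readers unfamiliar with Fst, the quantity being clustered can be illustrated with Wright's heterozygosity-based definition for a single biallelic SNP. This is a sketch under stated assumptions: the slide does not say which Fst estimator was used, and the function name and example frequencies below are hypothetical, not data for rs4940595.

```python
from typing import Optional, Sequence


def fst_biallelic(freqs: Sequence[float], sizes: Optional[Sequence[float]] = None) -> float:
    """Wright's Fst = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of the same allele in each population.
    sizes: optional population weights (equal weights if omitted).
    """
    if not freqs:
        raise ValueError("need at least one population")
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = float(sum(sizes))
    weights = [s / total for s in sizes]

    # Pooled allele frequency and total expected heterozygosity.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_total = 2 * p_bar * (1 - p_bar)

    # Mean expected heterozygosity within populations.
    h_within = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))

    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: no differentiation measurable
    return (h_total - h_within) / h_total
```

Computing this value for every pair of populations gives a distance-like matrix that a hierarchical clustering routine could take as input; that clustering step is not shown here.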
Specific Pseudogene Assignments: SD-associated Pseudogenes (in process)
Segmental duplications (SDs)
• Regions of the genome with ≥90% sequence identity and ≥1 kb in length
• Based on neutral divergence, correspond to the last ~40 million years of human evolution
• Comprise ~5-6% of the human genome
• Enriched with genes (~18%) and pseudogenes (duplicated ~45%, processed ~22%)
Bailey et al, Science, 2002
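The SD criteria above can be written as a simple filter over pairwise genomic alignments. The sketch assumes thresholds of at least 90% identity and at least 1 kb, as in the Bailey et al. definition; the `Alignment` record and its field names are hypothetical and stand in for whatever alignment output is actually used.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one pairwise genomic self-alignment."""
    identity: float  # fraction of identical aligned bases, 0.0 to 1.0
    length_bp: int   # aligned length in base pairs


def is_segmental_duplication(
    aln: Alignment,
    min_identity: float = 0.90,
    min_length_bp: int = 1000,
) -> bool:
    """True if the alignment meets both the identity and the length threshold."""
    return aln.identity >= min_identity and aln.length_bp >= min_length_bp


# Toy alignments: only the first passes both thresholds.
alignments = [
    Alignment(identity=0.95, length_bp=12_000),
    Alignment(identity=0.88, length_bp=50_000),  # too divergent
    Alignment(identity=0.99, length_bp=600),     # too short
]
sds = [a for a in alignments if is_segmental_duplication(a)]
```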
Can the study of ψgenes in SDs provide information not obvious from the individual datasets?
Nucleotide substitutions in ψgenes and SDs containing them
Most ψgenes show the same number of substitutions as the larger SD region containing them
- Duplication accompanied by disablement
- Followed by neutral rate of evolution
K2m: nucleotide substitutions per site, computed using Kimura's two-parameter model
[Figure labels: parent gene, duplicated ψgene]
Acknowledgements
Z Zhang
E Khurana
Y J Liu
YK Lam
S Balasubramanian
G Fang
N Carriero
R Robilotto
P Cayting
M Wilson
A Frankish
M Diekhans
R Harte
T Hubbard
J Harrow
Pseudogene.org