Major insights from the HGP on
Nature (2001) 15th Feb Vol 409 special issue; pgs 814 & 875-914.
1) Gene content
2) Proteome content
3) SNP identification
4) Distribution of GC content
5) CpG islands
6) Recombination rates
7) Repeat content
1) Gene content
30,000-40,000 protein-coding genes, estimated from known genes and predictions
                  IHGSC     Celera
definite genes    24,500    26,383
possible genes    5,000     12,000
Genes encode either protein or noncoding RNAs (rRNA, tRNA, snRNA, snoRNA)
Nature (2001) 15th Feb Vol 409 special issue; pg 814-816 and 860-914.
More genes: twice as many as Drosophila / C. elegans
Uneven gene distribution: Gene-rich and gene-poor regions
More paralogs: some gene families have expanded their number of paralogs, e.g. the olfactory gene family has 1000 genes
More alternative transcripts: more RNA splice variants are produced, expanding the set of primary proteins 5-fold (e.g. neurexin genes)
Nature (2001) 409: pp 892
Gene content: uneven gene distribution
Gene-rich regions: e.g. MHC on chromosome 6 has 60 genes with a GC content of 54%
Gene-poor regions: 82 gene deserts identified (large or unidentified genes?)
What is the functional significance of these variations?
Genetics by Hartwell: pp 341-347
2) Proteome content
Proteome more complex than invertebrates
Nature (2001) 15th Feb Vol 409 special issue; pg 847
Protein Domains (sections with identifiable shape/function)
Domain arrangements in humans
largest total number of domains is 130
largest number of domain types per protein is 9
Mostly identical arrangement of domains
[Figure: domain arrangement of an example Protein X, built from repeated domains A, B and C]
Nature (2001) 15th Feb Vol 409 special issue; pg 847
No huge difference in domain number in humans
BUT frequency of domain sharing is very high in human proteins (structural proteins and proteins involved in signal transduction and immune function)
However, there are only 3 cases where a combination of 3 domain types is shared by human & yeast proteins.
e.g. carbamoyl-phosphate synthase (involved in the first 3 steps of de novo pyrimidine biosynthesis) has 7 domain types; this arrangement occurs once in human and yeast but twice in Drosophila
3) SNPs (single nucleotide polymorphisms)
More than 1.4 million SNPs identified
One every 1.9 kb on average
Densities vary over regions and chromosomes
e.g. HLA region has a high SNP density, reflecting maintenance of diverse haplotypes over many millions of years
Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
How does one distinguish sequence errors from polymorphisms?
Sequence errors
Each piece of genome sequenced at least 10 times to reduce error rate (0.01%)
Polymorphisms
Sequence variation between individuals is 0.1%
To be defined as a polymorphism, the altered sequence must be present in a significant fraction of the population
Rate of polymorphism in diploid human genome is about 1 in 500 bp
Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
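The figures quoted above can be checked with a little arithmetic. The sketch below is only an illustration: the constant names and the 1 Mb example region are assumptions, and the rates are the rounded values from the slides (0.01% is 1 in 10,000 bp; 0.1% is 1 in 1,000 bp).

```python
# Rounded figures taken from the slides; names are illustrative only.
SNP_COUNT = 1_400_000        # "more than 1.4 million SNPs"
BP_PER_SNP = 1_900           # "one every 1.9 kb on average"
ERROR_ONE_IN_BP = 10_000     # sequencing error rate of 0.01%
VARIANT_ONE_IN_BP = 1_000    # variation between individuals of 0.1%

# Sequence length implied by the SNP count and the average spacing.
implied_span_bp = SNP_COUNT * BP_PER_SNP

# Expected counts in a hypothetical 1 Mb region (assumed example size).
region_bp = 1_000_000
expected_errors = region_bp // ERROR_ONE_IN_BP
expected_variants = region_bp // VARIANT_ONE_IN_BP

print(f"implied span: {implied_span_bp / 1e9:.2f} Gb")
print(f"per Mb: {expected_errors} expected errors, {expected_variants} expected variant sites")
```

At these rates true variant sites outnumber residual sequencing errors about tenfold, which is why deep coverage plus a population-frequency requirement is enough to separate the two.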
3) SNPs……
Sites that result from point mutations in individual base pairs
Biallelic
~60,000 SNPs lie within exons and untranslated regions (85% of exons lie within 5 kb of a SNP)
May or may not affect the ORF
Most SNPs may be regulatory
Nature (2001) 15th Feb Vol 409 special issue; pg 821 & 928
http://www.genetics.gsk.com/kids/medicine01.htm
3) SNPs and disease
3) SNPs and risk of disease
3) SNPs and drug prescription
4) Distribution of GC content
Genome-wide average of 41%
Huge regional variations exist
E.g. distal 48 Mb of chromosome 1p has 47% but chromosome 13 has only 36%
Confirms cytogenetic staining with G-bands (Giemsa)
dark G-bands: low GC content (37%)
light G-bands: high GC content (45%)
Nature (2001) 15th Feb Vol 409 special issue; pg 876-877
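Regional GC content of the kind summarised above is usually computed over fixed-size windows. The following is a minimal sketch, not the method used in the Nature paper: the function names, the non-overlapping window scheme and the toy sequence are all assumptions made for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC fraction of consecutive, non-overlapping windows."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


# Toy sequence (not real genomic data): an AT-rich half followed by a GC-rich half.
toy = "ATATATATGC" + "GCGCGCATGC"
print(gc_fraction(toy))      # whole-sequence GC fraction
print(windowed_gc(toy, 10))  # per-window GC fractions
```

The whole-sequence value hides the contrast between the two halves, which is the same effect that makes a genome-wide average of 41% compatible with regions at 36% and 47%.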
5) CpG islands
Significance of CpG islands
1) Non-methylated CpG islands are associated with the 5’ ends of genes
2) Aberrant methylation of CpG islands is one mechanism of inactivating tumor suppressor genes (TSGs) in neoplasia
http://www.sanger.ac.uk/HGP/cgi.shtml
CpG -> methyl-CpG (methylated at C) -> TpG (by deamination)
CpG islands show no methylation
CpG islands
CpG dinucleotides are greatly under-represented in the human genome
• ~28,890 CpG islands in number
• Variable density
e.g. Y has 2.9/Mb but 16, 17 & 22 have 19-22/Mb
Average is 10.5/Mb
Nature (2001) 15th Feb Vol 409 special issue; pg 877-888
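One common way to quantify CpG under-representation is the observed/expected CpG ratio: the number of CG dinucleotides times the sequence length, divided by the product of the C and G counts. This measure is not given in the slides; it is introduced here as an assumption, and the function name and toy sequences are illustrative only.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0
    return cpg * len(seq) / (c * g)


# Toy sequences, not real genomic data.
island_like = "CGCGCGCGAT"   # CpG-dense stretch
depleted = "CAGTCAGTCA"      # contains C and G but no CG dinucleotide
print(cpg_obs_exp(island_like))
print(cpg_obs_exp(depleted))
```

A ratio well below 1 is what methylation followed by deamination of CpG to TpG would be expected to leave behind in bulk DNA, while unmethylated islands keep a ratio closer to what base composition predicts.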
6) Recombination rates
2 main observations
• Recombination rate increases with decreasing arm length
• Recombination rate is suppressed near the centromeres and increases towards the distal 20-35 Mb
7) Repeat content
a) Age distribution
b) Comparison with other genomes
c) Variation in distribution of repeats
d) Distribution by GC content
e) Y chromosome
Nature (2001) 409: pp 881-891