The Human Genome
HAGenetics.org
Dr. Hasan Alhaddad Guest lecturer: Molecular Basis of Human Diseases
October 12th, 14th, 16th 2014 Room 244 (1 PM)
Lecture structure
• Part I (Sunday Oct 12th):
  • The book of life (Matt Ridley’s analogy with modifications).
  • Introduction to the technologies at the time.
• Part II (Tuesday Oct 14th):
  • Why sequencing genomes/the human genome?
  • Genome war (public and private projects).
  • Sequencing the genome.
• Part III (Thursday Oct 16th):
  • Genome assembly revisited.
  • Genome annotation.
  • Genome outcome.
  • The Genomic era.
AIMS (part III)
• Learn the basic principles and terminology of genome assembly.
• Understand the importance of genome annotation.
• Become familiar with the outcomes of the human genome.
• Understand the technologies and applications that were developed due to the human genome project.
• Become familiar with the OMICS.
Genome Assembly Revisited
DNA sequence: the sequence reads produced by the sequencing machine.
This can be considered the primary sequence of the genome.
Sequence alignment: order and connect overlapping sequence reads to form a Contig.
This is something you are likely to do when you sequence a gene.
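To make the idea of connecting overlapping reads concrete, here is a minimal sketch of a greedy overlap merge. The reads, the `min_len` threshold, and the function names are illustrative assumptions; real assemblers also handle sequencing errors, reverse complements, and repeats, which this toy ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            # No remaining pair overlaps by at least min_len bases.
            return contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]


# Three hypothetical reads that tile a short region.
reads = ["ATGGCGT", "GCGTACC", "TACCGGA"]
contigs = assemble(reads)
print(contigs)
```

With these three toy reads the merge ends in a single Contig; reads that share no sufficient overlap would stay as separate Contigs.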
We can consider Contigs the secondary level of genome assembly.
Scaffolds are the tertiary level of genome assembly.
Scaffolds are also referred to as Super Contigs.
Scaffolds are formed by connecting ordered Contigs.
Scaffolds are formed by connecting ordered and oriented Contigs. How?
Genome assembly quality is measured by Contig/scaffold N50 or similar measures.
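As a small worked example of the N50 measure: sort the Contig (or scaffold) lengths from longest to shortest and add them up until at least half of the total assembly length is covered; the length of the last Contig added is the N50. The lengths below are made-up numbers for illustration only.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings us to half the assembly.
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (total = 300).
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(contig_lengths))
```

Here the two longest Contigs (80 + 70 = 150) already cover half of the 300 bases, so the N50 is 70. A higher N50 generally indicates a more contiguous assembly, although it says nothing about correctness.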
What affects the quality of genome assembly?
1. Repeat elements.
2. Variations between the individuals sequenced (segmental duplications).
Genome Annotation
Genome annotation is very important to study the biology of an organism.
Without a proper annotation, the sequence is useless.
Remember!
A book that cannot be read and understood is useless knowledge
Genome
• Coding
  • Genes (proteins or RNA)
• Non-coding
  • Introns, regulators, etc.
  • Repetitive DNA
    • Interspersed: SINE, LINE, LTR, transposons
    • Tandem: satellite, minisatellite, microsatellite
The genome sequence can be classified into different groups based on the overall sequence composition and structure.
Genome annotation can be divided into two approaches:
1. Structural annotation:
   1. Largely in silico.
   2. Utilizing the accumulated knowledge of genes and genomes to identify sequence signatures.
2. Functional annotation:
   1. Requires a lot of work and time.
   2. Studying the function of the book/code.
   3. Involves biochemical analyses of the genome.
   4. Gene expression and regulation.
Structural annotation
[Figure: structure of a gene, labeling the regulation sequence, promoter sequence, 5’ UTR (untranslated region), start, exons, introns, end, and 3’ UTR.]
Hidden Markov Models are used for bioinformatic annotation
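A toy illustration of how a Hidden Markov Model can label a sequence: the sketch below uses two hidden states, "N" (non-coding, AT-rich) and "C" (coding, GC-rich), and the Viterbi algorithm to find the most probable state path. All probabilities are invented for illustration, and the two-state design is an assumption; real gene finders use many more states (exons, introns, splice sites) with parameters trained on known genes.

```python
import math

STATES = ("N", "C")
# Made-up parameters, for illustration only.
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq: str) -> str:
    """Return the most probable state path for seq, one label per base."""
    if not seq:
        return ""
    # Log-probabilities avoid numerical underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATATGCGCGCGCGCATATAT"))
```

The printed string has one label per base, so a GC-rich stretch in the middle of an AT-rich sequence tends to be labeled "C" when it is long enough to outweigh the cost of switching states.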
Genome Outcome
A timeline of developments in genomics
The human genome contains ~22K protein-coding genes, whose coding sequence constitutes ~1.5% of the genome
Genes categorized
Disease genes
Potential Drug targets
RNA genes are present in multiple copies in the human genome. Why?
Exon and intron size compared to other taxa
Overall GC content of the human genome
GC is correlated with genes
CpG islands in the promoter region can regulate gene expression
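GC content and CpG enrichment are both simple to compute from a sequence. The sketch below calculates the GC fraction and the observed/expected CpG ratio, (number of CG dinucleotides × length) / (number of C × number of G). A commonly cited rule of thumb calls a region a CpG island when it is at least 200 bp long, has GC content above 50%, and has an observed/expected ratio above 0.6. The example sequence is made up and far shorter than a real island.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


# Hypothetical 10-base sequence: 4 C, 4 G, and 4 CG dinucleotides.
seq = "CGCGCGATCG"
print(gc_content(seq), cpg_obs_exp(seq))
```

For this toy sequence the GC content is 8/10 = 0.8 and the ratio is (4 × 10) / (4 × 4) = 2.5. In practice these values are computed in sliding windows along the genome.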
We are repeat elements with some genes :-)
Tandem Repeat elements
Minisatellite: Variable Number Tandem Repeats (VNTR)
Repeat unit size = tens of base pairs (about 10 - 100)
[Figure: alleles in which the unit is repeated 4, 8, and 20 times]
Microsatellite: Short Tandem Repeats (STR) – Simple Sequence Repeats (SSR)
Repeat unit size = 2 - 6 base pairs
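Using the 2 - 6 base pair unit size given above, a microsatellite can be located with a short regular expression. This is a minimal sketch: the threshold of at least four tandem copies and the function name are assumptions chosen for illustration, and dedicated tools handle imperfect repeats that this pattern misses.

```python
import re

# A unit of 2-6 bases (shortest unit tried first), followed by at least
# three more copies of the same unit, i.e. four or more copies in total.
STR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_strs(seq: str):
    """Return (start, unit, copies) for each perfect tandem repeat found."""
    hits = []
    for m in STR_PATTERN.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            # Skip runs of a single base such as AAAAAAAA.
            continue
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits


# Hypothetical sequence containing (AC) repeated 5 times.
print(find_strs("TTGACACACACACAGG"))
```

The number of copies at a given locus is what varies between individuals, which is why the same search on two people's DNA can return different counts.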
Genome Outcome
• Evolutionary relationship.
• Syntenic regions are regions whose gene order is conserved across taxa.
A summary of each chromosome
SNP as a marker
Single Nucleotide Polymorphism
1. Many are found throughout the genome.
2. Found in nuclear and mitochondrial DNA.
3. No need for a lot of DNA.
4. Can be used on degraded DNA.
5. Easy to detect – many platforms.
6. Polymorphism lower than microsatellites.
The SNPs identified by the human genome project allowed the development of SNP arrays (SNP chip).
SNP arrays allow surveying the genome for variation between individuals easily and at a low price.
Commercial uses of SNP markers to learn about ancestry and health
Genome-wide association studies (GWAS)
Beyond the genome
The ENCyclopedia Of DNA Elements (ENCODE)
1. Transcripts
2. Regulatory elements
3. Enhancers
4. Silencers
5. Origins of replication
6. CpG islands
7. Histone modification sites
8. Open chromatin sites
Genome papers are no longer news
The OME Era