The Human Genome HAGenetics.org Dr. Hasan Alhaddad Guest lecturer: Molecular Basis of Human Diseases October 12th, 14th, 16th 2014 Room 244 (1 PM)

The Human Genome Project - Part III

Page 1: The Human Genome Project - Part III

The Human Genome

HAGenetics.org

Dr. Hasan Alhaddad Guest lecturer: Molecular Basis of Human Diseases

October 12th, 14th, 16th 2014 Room 244 (1 PM)

Page 2: The Human Genome Project - Part III

Lectures structure

• Part I (Sunday, Oct 12th):
  • The book of life (Matt Ridley’s analogy, with modifications).
  • Introduction to the technologies at the time.

• Part II (Tuesday, Oct 14th):
  • Why sequence genomes/the human genome?
  • Genome war (public and private projects).
  • Sequencing the genome.

• Part III (Thursday, Oct 16th):
  • Genome assembly revisited.
  • Genome annotation.
  • Genome outcome.
  • The genomic era.

Page 3: The Human Genome Project - Part III

AIMS (part III)

• Learn the basic principles and terminology of genome assembly.

• Understand the importance of genome annotation.

• Become familiar with the outcomes of the human genome.

• Understand the technologies and applications that were developed due to the human genome project.

• Become familiar with the OMICS.

Page 4: The Human Genome Project - Part III

Genome Assembly Revisited

Page 5: The Human Genome Project - Part III

Genome Assembly Revisited

DNA sequence: the sequence reads produced by the sequencing machine.

This can be considered the primary sequence of the genome.

Page 6: The Human Genome Project - Part III

Genome Assembly Revisited

Sequence alignment: order and connect overlapping sequence reads to form a Contig.

This is something you are likely to do when you sequence a gene.
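As an illustration of how overlapping reads can be joined into a Contig, here is a minimal sketch in Python. It assumes error-free reads from a single strand and uses a simple greedy merge; the function names `overlap` and `greedy_assemble` and the three toy reads are made up for this example and are not taken from the lecture. Real assemblers are far more elaborate.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the remaining reads stay separate contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads


# Hypothetical toy reads (assumption: error-free, same strand).
contigs = greedy_assemble(["ATGGCGT", "GCGTACG", "ACGTTAG"])
print(contigs)  # ['ATGGCGTACGTTAG']
```

The three reads collapse into one Contig because each read shares at least three bases with the next one.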

Page 7: The Human Genome Project - Part III

Genome Assembly Revisited

We can consider Contigs the secondary level of genome assembly.

Page 8: The Human Genome Project - Part III

Genome Assembly Revisited

Scaffolds are the tertiary level of genome assembly.

Scaffolds are also referred to as Super Contigs.

Scaffolds are formed by connecting ordered Contigs.

Page 9: The Human Genome Project - Part III

Genome Assembly Revisited

Scaffolds are formed by connecting ordered Contigs. How?

Page 10: The Human Genome Project - Part III

Genome Assembly Revisited

Page 11: The Human Genome Project - Part III

Genome Assembly Revisited

Genome assembly quality is measured by Contig/scaffold N50 or similar measures.
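A short sketch of how N50 can be computed, assuming the usual definition: the length of the Contig (or scaffold) at which the running total, counted from the longest downwards, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached at least half of the assembly
            return length
    return 0  # empty input


# Hypothetical contig lengths (in kb): total = 290, half = 145.
print(n50([80, 70, 50, 40, 30, 20]))  # 70, because 80 + 70 = 150 >= 145
```

A larger N50 means that more of the assembly sits in long pieces, which is why it is used as a rough measure of contiguity.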

Page 12: The Human Genome Project - Part III

Genome Assembly Revisited

What affects the quality of genome assembly?

1. Repeat elements.

2. Variations between the individuals sequenced (segmental duplications).

Page 13: The Human Genome Project - Part III

Genome Annotation

Genome annotation is essential for studying the biology of an organism.

Without proper annotation, the sequence is useless.

Remember!

A book that cannot be read and understood is useless knowledge.

Page 14: The Human Genome Project - Part III

Genome Annotation

• Genome
  • Coding
    • Genes (proteins or RNA)
  • Non-coding
    • Introns
    • Regulators
    • Etc.
    • Repetitive DNA
      • Interspersed: SINE, LINE, LTR, transposons
      • Tandem: satellite, minisatellite, microsatellite

The genome sequence can be classified into different groups based on the overall sequence composition and structure.

Page 15: The Human Genome Project - Part III

Genome Annotation

Genome annotation can be divided into two approaches:

1. Structural annotation:
   1. Largely in silico.
   2. Utilizes the accumulated knowledge of genes and genomes to identify sequence signatures.

2. Functional annotation:
   1. Requires a lot of work and time.
   2. Studies the function of the book/code.
   3. Involves biochemical analyses of the genome.
   4. Covers gene expression and regulation.
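To give a feel for what "identifying sequence signatures" in silico can mean, here is a deliberately simple sketch that scans for open reading frames (a start codon followed in frame by a stop codon). It is an illustration only: it looks at the forward strand, ignores introns, and the function name `find_orfs`, the `min_codons` parameter, and the test sequence are assumptions made for this example. Real gene finders for eukaryotic genomes use statistical models instead.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Return (start, end) index pairs of forward-strand ORFs.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical sequence: ATG AAA CCC GGG TAG embedded in flanking bases.
seq = "CCATGAAACCCGGGTAGTT"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])  # 2 17 ATGAAACCCGGGTAG
```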

Page 16: The Human Genome Project - Part III

Structural annotation

[Figure: structure of a gene, labeling the regulation sequence, promoter sequence, 5’ UTR (Un-Translated Region), start, exons and introns, end, and 3’ UTR.]

Page 17: The Human Genome Project - Part III

Structural annotation

Hidden Markov Models are used for bioinformatic annotation
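The idea behind a Hidden Markov Model can be sketched with a toy example: two hidden states emit bases with different probabilities, and the Viterbi algorithm recovers the most probable state path for a sequence. Everything below is an assumption made for illustration: the two state names, all the probabilities, and the premise that "coding" sequence is GC-rich. Real annotation models have many more states (exons, introns, splice sites, and so on) and are trained on known genes.

```python
import math

# Toy model: all names and probabilities are illustrative assumptions.
STATES = ("noncoding", "coding")
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq: str):
    """Return the most probable state path for a non-empty DNA sequence."""
    # Log-probabilities avoid numerical underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            pointers[s] = best  # remember which previous state led here
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


path = viterbi("ATATATATGCGCGCGC")
print(path)  # eight 'noncoding' labels followed by eight 'coding' labels
```

The model switches state exactly where the base composition changes, because one costly transition is cheaper than emitting eight poorly matching bases.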

Page 18: The Human Genome Project - Part III

Genome Outcome

Page 19: The Human Genome Project - Part III

Genome Outcome

A timeline of the developments in genomics

Page 20: The Human Genome Project - Part III

Genome Outcome

The human genome contains ~22K genes; their protein-coding sequences constitute ~1.5% of the genome.

Page 21: The Human Genome Project - Part III

Genome Outcome

Genes categorized

Page 22: The Human Genome Project - Part III

Genome Outcome

Genes categorized

Page 23: The Human Genome Project - Part III

Genome Outcome

Disease genes

Page 24: The Human Genome Project - Part III

Genome Outcome

Potential Drug targets

Page 25: The Human Genome Project - Part III

Genome Outcome

RNA genes are present in multiple copies in the human genome. Why?

Page 26: The Human Genome Project - Part III

Genome Outcome

Exon and intron size compared to other taxa

Page 27: The Human Genome Project - Part III

Genome Outcome

Overall GC content of the human genome
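GC content is the fraction of bases that are G or C. A minimal sketch of how it can be computed, for a whole sequence and in sliding windows, is given below; the function names and the toy sequence are assumptions for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in a non-empty sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int):
    """GC content of each full window of the given size along the sequence."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


print(gc_content("ATGC"))              # 0.5
print(windowed_gc("AAAAGGGG", 4, 4))   # [0.0, 1.0]
```

Computing the value per window rather than once for the whole sequence is what makes regional differences in GC content visible.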

Page 28: The Human Genome Project - Part III

Genome Outcome

GC is correlated with genes

Page 29: The Human Genome Project - Part III

Genome Outcome

GC is correlated with genes

CpG islands in the promoter region can regulate gene expression
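One common way to flag candidate CpG islands is the observed/expected CpG ratio: the number of CG dinucleotides observed, divided by the number expected from the C and G counts alone. The sketch below assumes that definition; the function name and test sequences are illustrative. A frequently cited rule of thumb (not stated in the lecture) treats a region of at least 200 bp with GC content above 50% and a ratio above 0.6 as a CpG island.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both C and G; report 0
    return seq.count("CG") * len(seq) / (c * g)


print(cpg_obs_exp("CGCGCGCG"))  # 2.0 (CpG strongly over-represented)
print(cpg_obs_exp("CCCCGGGG"))  # 0.5 (same base counts, only one CpG)
```

The two test sequences have identical GC content, which shows that the ratio captures dinucleotide arrangement and not just base composition.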

Page 30: The Human Genome Project - Part III

Genome Outcome

We are repeat elements with some genes :-)

Page 31: The Human Genome Project - Part III

Tandem Repeat elements

Minisatellite: Variable Number Tandem Repeats (VNTR)

Repeat unit size = tens of base pairs (roughly 10 - 100)

[Figure: examples with the unit repeated 4 times, 8 times, and 20 times]

Microsatellite: Short Tandem Repeats (STR) – Simple Sequence Repeats (SSR)

Repeat unit size = 2 - 6 base pairs
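A minimal sketch of how microsatellites could be located in a sequence with a regular expression: it looks for a unit of 2 to 6 bases that is immediately repeated. The threshold of at least four copies, the name `find_strs`, and the test sequence are assumptions for illustration; dedicated repeat-finding tools handle imperfect repeats, which this sketch does not.

```python
import re

# A unit of 2-6 bases (shortest match preferred) followed by
# at least 3 more copies, i.e. 4 or more copies in total.
STR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_strs(seq: str):
    """Return (start index, repeat unit, copy number) for each perfect STR."""
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in STR_PATTERN.finditer(seq.upper())
    ]


# Hypothetical sequence with a (CA)6 microsatellite.
print(find_strs("TTAGG" + "CA" * 6 + "GGTCA"))  # [(5, 'CA', 6)]
```

Because the copy number differs between individuals, the third value in each tuple is the part that makes such a locus useful as a marker.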

Page 32: The Human Genome Project - Part III

Tandem Repeat elements

Page 33: The Human Genome Project - Part III

Genome Outcome

• Evolutionary relationship.

• Syntenic regions are the regions conserved across taxa.

Page 34: The Human Genome Project - Part III

Genome Outcome

A summary of each chromosome

Page 35: The Human Genome Project - Part III

Genome Outcome

Page 36: The Human Genome Project - Part III

SNP as a marker

Single Nucleotide Polymorphism

1. Abundant throughout the genome.

2. Found in nuclear and mitochondrial DNA.

3. No need for a lot of DNA.

4. Can be used on degraded DNA.

5. Easy to detect (many platforms).

6. Less polymorphic than microsatellites.
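At its simplest, a SNP is a position where two aligned sequences differ by one base. The sketch below assumes the two sequences are already aligned and of equal length (no insertions or deletions); the function name `find_snps` and the example sequences are illustrative.

```python
def find_snps(reference: str, sample: str):
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


print(find_snps("ATGCCGTA", "ATGTCGTA"))  # [(4, 'C', 'T')]
```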

Page 37: The Human Genome Project - Part III

SNP as a marker

The SNPs identified by the human genome project allowed the development of SNP arrays (SNP chip).

SNP arrays allow the genome to be surveyed for variation between individuals easily and at low cost.

Page 38: The Human Genome Project - Part III

SNP as a marker

Page 39: The Human Genome Project - Part III

SNP as a marker

Commercial uses of SNP markers to learn about ancestry and health

Page 40: The Human Genome Project - Part III

SNP as a marker

Genome-wide association studies (GWAS)
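The core calculation in a GWAS can be sketched as a test of association at a single SNP: compare allele counts in cases and controls with a chi-square test on a 2 x 2 table, then repeat for every SNP on the array. The sketch below assumes this simple allelic test; the function name, the counts, and the absence of any correction for population structure are simplifications for illustration. Because so many SNPs are tested, a very strict significance threshold (commonly 5 x 10^-8) is normally applied.

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Chi-square statistic (1 d.f.), p-value, and odds ratio for a 2 x 2
    table of allele counts: alleles A and B in cases and in controls.

    Assumes every row and column total is non-zero.
    """
    n = case_a + case_b + control_a + control_b
    diff = case_a * control_b - case_b * control_a
    chi2 = n * diff ** 2 / (
        (case_a + case_b) * (control_a + control_b)
        * (case_a + control_a) * (case_b + control_b)
    )
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * control_b) / (case_b * control_a)
    return chi2, p_value, odds_ratio


# Hypothetical counts: allele A seen 60 times in cases, 40 times in controls.
chi2, p, odds = allelic_chi_square(60, 40, 40, 60)
print(round(chi2, 2), round(p, 4), odds)  # 8.0 0.0047 2.25
```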

Page 41: The Human Genome Project - Part III

Beyond the genome

The ENCyclopedia Of DNA Elements (ENCODE)

1. Transcripts

2. Regulatory elements

3. Enhancers

4. Silencers

5. Origins of replication

6. CpG islands

7. Histone modification sites

8. Open chromatin sites

Page 42: The Human Genome Project - Part III

Beyond the genome

Genome papers are no longer news

Page 43: The Human Genome Project - Part III

The OME Era

Page 44: The Human Genome Project - Part III

The OME Era
