
Functional genomics approaches to disease genomics


Page 1: Functional genomics approaches to disease genomics

Functional genomics approaches to disease genomics

• Biological information and organisation

• Genomics approaches to identifying disease-relevant enrichment

• Candidate gene approaches

Page 2: Functional genomics approaches to disease genomics

Biological information increases rapidly

• Every day hundreds of articles are published
  – We can’t read them all
  – We can’t remember them all
  – Our memories are subjective anyway

• To make use of this incredible research output, we need some ways to bring this information together and summarise it

• If we could make it readable by a computer then our power to use it increases hugely

Page 3: Functional genomics approaches to disease genomics

OMIM Home Page
http://www.ncbi.nlm.nih.gov/omim/

Page 4: Functional genomics approaches to disease genomics

OMIM

• Online Mendelian Inheritance in Man (OMIM) is a catalog of human genes and genetic disorders, with links to literature references, sequence records, maps, and related databases

• Annotates 325 genes associated with human disease

• 2,710 disorders with a known molecular basis

• 1,634 genetic disorders with an unknown basis

• The OMIM entries are made by experienced annotators
  – Even the best annotators are not wholly consistent

Page 5: Functional genomics approaches to disease genomics

What is Ontology?

• Dictionary: A branch of metaphysics concerned with the nature and relations of being.

• Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.


Slide from the GO website www.geneontology.org

Page 6: Functional genomics approaches to disease genomics

Ontologies

• Formalising our knowledge into a structured and defined vocabulary is essential for genomics approaches

• The benefits from an agreed language enable rapid progress (e.g. species classification)

• Recently, biological research communities have been defining a common language for describing everything from protein function through to phenotype

Page 7: Functional genomics approaches to disease genomics

From a practical view, ontology is the representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things.

Slide taken from GO (www.geneontology.org)

Page 8: Functional genomics approaches to disease genomics

Gene Ontology (GO)

• The Gene Ontology project was set up to provide a controlled vocabulary that describes a gene and its products (principally its product)

• GO describes genes in 3 separate ontologies
  – Molecular function, biological process and cellular component (location)
  – Genes can be annotated with many terms in each category

Page 9: Functional genomics approaches to disease genomics

Biological Process
GO term: tricarboxylic acid cycle
Synonym: Krebs cycle
Synonym: citric acid cycle
GO id: GO:0006099

Cellular Component
GO term: mitochondrion
GO id: GO:0005739
Definition: A semiautonomous, self replicating organelle that occurs in varying numbers, shapes, and sizes in the cytoplasm of virtually all eukaryotic cells. It is notably the site of tissue respiration.

Molecular Function
GO term: Malate dehydrogenase
GO id: GO:0030060
(S)-malate + NAD(+) = oxaloacetate + NADH.

[Figure: chemical structure diagram of the malate dehydrogenase reaction, labelled NAD+ and NADH + H+]

Page 10: Functional genomics approaches to disease genomics

[Figure: part of the GO Biological Process hierarchy. Terms shown: Biological Process, Physiological Process, Metabolism, Primary Metabolism, Protein Metabolism, Protein Biosynthesis, Biosynthesis. The terms are linked by seven Is_a relationships.]

• Directed Acyclic Graph (DAG)

• Allows a child node to have more than one parent
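The DAG idea can be sketched in a few lines of Python. The `IS_A` edges below are an assumption read off the diagram's term names (seven Is_a links), not an export from GO, and the gene name in the usage note is hypothetical. The sketch shows why multiple parents matter: a gene annotated to one term is implicitly annotated to every ancestor on every path.

```python
# Assumed is_a edges (child -> parents), reconstructed from the slide's diagram.
IS_A = {
    "protein biosynthesis": ["protein metabolism", "biosynthesis"],
    "protein metabolism": ["primary metabolism"],
    "primary metabolism": ["metabolism"],
    "biosynthesis": ["metabolism"],
    "metabolism": ["physiological process"],
    "physiological process": ["biological process"],
    "biological process": [],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def propagate(annotations, is_a=IS_A):
    """Expand each gene's direct annotations to include all ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t, is_a) for t in terms))
        for gene, terms in annotations.items()
    }
```

For example, `propagate({"GENE_X": ["protein biosynthesis"]})` would list `GENE_X` under both the metabolism and biosynthesis branches, which is what enrichment tests against higher-level terms rely on.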

Page 11: Functional genomics approaches to disease genomics

Mammalian Phenotype Ontology

• Really the mouse phenotype ontology

• Annotators take each published mouse gene knock-out experiment and annotate the phenotype with the MPO

Page 12: Functional genomics approaches to disease genomics

Human Medical Ontologies

• Human Phenotype Ontology
  www.human-phenotype-ontology.org

• The HPO provides a standardized vocabulary of phenotypic abnormalities encountered in human genetic syndromes

• London Dysmorphology Database
  www.human-phenotype-ontology.org

[Figure: example HPO hierarchy. Terms shown: Organ abnormality, Cardiovascular abnormality, Cardiac abnormality, Cardiac malformation, Abn. of the cardiac septa, Abn. of the cardiac atria]

Page 13: Functional genomics approaches to disease genomics

Model Organisms

• Excellent functional genomics resources
  – The comparison between a human phenotype and a mouse phenotype is often very readily interpretable.
  – Other useful organisms include the fly, the worm and even yeast

• Useful as they have well-curated data for many genes

Page 14: Functional genomics approaches to disease genomics

Kyoto Encyclopedia of Genes and Genomes (KEGG)

• Pathway database

• Manually curated information from the literature

Page 15: Functional genomics approaches to disease genomics

High-throughput functional resources

• Tissue expression
  – Where and when genes are expressed may be relevant to the disease

• Interactions
  – Genes that interact may be involved in the same biological process
  – E.g. protein-protein interactions or genetic interactions (coordinated regulation)

• Sequence patterns (coding or regulatory)
  – Similar sequence can imply common functionality

Page 16: Functional genomics approaches to disease genomics

Different data sources have different types of error

• Literature sources (GO, model organism data, etc.) have poor coverage and a lack of true negatives
  – We publish “A is an X” more than “A is not a Y”
  – Not all genes have been subject to the same studies

• High-throughput sources often have high error rates
  – False positives are particularly a problem for gene/protein interactions when you’re considering all pairs

Page 17: Functional genomics approaches to disease genomics

Ability to predict Human Phenotype Ontology terms

The value of mouse phenotypic data

Page 18: Functional genomics approaches to disease genomics

Forming interesting gene sets

• If you can’t identify a single gene/locus, maybe you can form a subset of genes likely to contain the gene(s) of interest
  – Genes in large intervals identified by linkage studies
  – Genes near SNPs with low, but not genome-wide significant, p-values from GWAS
  – Genes in de novo or rare CNVs seen in cases

• Power is important
  – Bringing together many similar cases enriches for genes associated with that disease

Page 19: Functional genomics approaches to disease genomics

Testing for enrichments

• Compare to the genome
  – Pulling balls (genes) from a bag (genome) is sampling without replacement: hypergeometric distribution

• Compare to controls
  – If chosen well, may account for biases
  – Contingency tables, χ² tests
  – If controls are unavailable, you can randomise to help address potential biases like gene length and function
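The "balls from a bag" comparison to the genome can be written out as a short Python sketch. The function names and the toy numbers in the usage note are illustrative assumptions, not values from the slides. The first function gives the hypergeometric upper-tail probability; the second gives the "% change over expected" measure used on the later figure axes.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from a genome
    of N genes, K of which carry the annotation of interest."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def percent_change_over_expected(k, N, K, n):
    """Observed annotated genes relative to the expectation n * K / N."""
    expected = n * K / N
    return 100.0 * (k - expected) / expected
```

With a toy genome of 10 genes, 4 of them annotated, and a sample of 3 genes containing 2 annotated ones, `hypergeom_upper_tail(2, 10, 4, 3)` is 40/120, i.e. one third, and the sample is about 67% over expectation. Real analyses would also need a multiple-testing correction across the many annotation terms tested.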

Page 20: Functional genomics approaches to disease genomics

Rare de novo copy number variant (CNV) associated with learning disability

[Figure: a 2.8 Mb CNV]

How does this CNV relate to the etiology of the disease?

Which gene(s) underlie the phenotype?

Page 21: Functional genomics approaches to disease genomics

Rare de novo CNVs are frequent in learning disability

• Rare de novo CNVs > 100 kb present in ~10% of LD cases

• Occur all over the genome

• 80% unique, non-recurrent

Collect a list of 148 rare de novo CNVs

Page 22: Functional genomics approaches to disease genomics

CNVs are common in all people

• Apparently benign, mostly inherited CNVs occur all over genome

Redon et al. Nature 2006

Collect a list of 26,472 benign CNVs

Page 23: Functional genomics approaches to disease genomics

Mutations at different loci can give a similar phenotype

SYMPTOM/PHENOTYPE

Page 24: Functional genomics approaches to disease genomics

Method

[Figure: method flowchart. Boxes: Interesting intervals in patients; Human genes; Orthology; Mouse genes; Available mouse KO phenotypes; Significantly over-represented phenotype; Mouse models relevant to the human disorder; Disease phenotype]

Page 25: Functional genomics approaches to disease genomics

Significant enrichments of genes associated with particular mouse phenotypes within de novo CNVs identified in patients with intellectual disability

[Figure: bar charts of % change over expected. Panels: Nervous System category; Abnormal dopaminergic neuron morphology; Abnormal axon morphology. Series: Benign CNVs; All LD CNVs; LD CNVs - benign CNVs; Loss LD CNVs; Loss LD CNVs - benign CNVs]

* FDR < 5%
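The figure marks enrichments that pass an FDR threshold of 5%. The slides do not say which FDR procedure was used; the sketch below assumes the Benjamini-Hochberg procedure as one common choice, and the p-values in the usage note are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For hypothetical p-values `[0.01, 0.04, 0.03, 0.20]` the adjusted values are roughly `[0.04, 0.053, 0.053, 0.20]`, so only the first test would be starred at FDR < 5%.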

Page 26: Functional genomics approaches to disease genomics

Human brain-specific genes corroborate mouse findings

“Brain-specific” genes are defined as those whose expression in human whole brain is > 4 x median expression across all other tissues

This classifies ~3.75% of human genes as “brain-specific”
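The brain-specificity rule is simple enough to state as code. This is a minimal sketch of the stated threshold only; the input layout, the tissue label `"whole brain"`, and the gene names in the usage note are assumptions, since the slides do not describe the expression data set.

```python
from statistics import median


def brain_specific_genes(expression, brain="whole brain", fold=4.0):
    """expression: {gene: {tissue: level}}.

    Returns the genes whose level in `brain` is greater than `fold` times
    the median level across all other tissues.
    """
    selected = []
    for gene, by_tissue in expression.items():
        others = [level for tissue, level in by_tissue.items() if tissue != brain]
        if brain in by_tissue and others and by_tissue[brain] > fold * median(others):
            selected.append(gene)
    return selected
```

With a hypothetical gene expressed at 50 in whole brain and at 10, 8 and 12 elsewhere, the median of the other tissues is 10 and 50 > 40, so the gene is called brain-specific; at 30 in brain it would not be.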

[Figure: bar chart of % change over expected for brain-specific genes. Series: Benign CNVs; All LD CNVs; All LD CNVs minus benign CNVs; Loss LD CNVs; Loss LD CNVs minus benign CNVs. Significant bars are marked with asterisks]

Page 27: Functional genomics approaches to disease genomics

Autism Spectrum Disorders – the ‘triad’ of symptoms

• Impaired social interaction

• Impaired communication

• Restrictive, repetitive behaviours and interests

Autism.org.uk

Page 28: Functional genomics approaches to disease genomics

Behavioural model phenotypes associated with Autism Spectrum Disorder (ASD) de novo CNVs

“Difficulty processing and retaining verbal information”
“Difficulty understanding social language”
“Difficulty coping with changes in routine”

Page 29: Functional genomics approaches to disease genomics

“Difficulty understanding social language”
“Difficulty with empathy and friendships”

Behavioural model phenotypes associated with Autism Spectrum Disorder (ASD) de novo CNVs

Page 30: Functional genomics approaches to disease genomics

“Restricted and Repetitive Behaviours and Interests”
60-80% of individuals with ASD exhibit poor motor planning and coordination

Behavioural model phenotypes associated with ASD de novo CNVs

Page 31: Functional genomics approaches to disease genomics

Candidate genes

• The genes that constitute significant enrichments become candidate disease genes

• While the enrichment is significantly associated with the intervals, the individual genes are not, and each requires further proof individually

• Experimental follow-up is costly and thus the genes taken forward need to be considered carefully

Page 32: Functional genomics approaches to disease genomics

Annotations vary in coverage and specificity

[Figure: bar charts comparing annotation sources: GO Transcription; Brain-Specific; KEGG Neuro; KEGG Parkinson’s; Mouse phenotypes Abnormal Axon/Neuron. Measures shown: % change over expected; Number of candidate genes; % of CNVs with a candidate gene]

Page 33: Functional genomics approaches to disease genomics

6 of 148 LD patients have a cleft palate

The better the patients are classified the more power we have to identify enrichments

[Figure: bar charts of % change over expected. Panels: enrichment for KO phenotype cleft palate (LD CNVs in 6 patients with cleft palate; 142 without cleft palate; benign CNVs); tremor phenotype (patients +/- seizures); abnormal myelination phenotype (patients +/- brain abnormality)]

Page 34: Functional genomics approaches to disease genomics

Some associations found for the main cohort may be more relevant to associated, or co-occurring, symptoms – ASD

Page 35: Functional genomics approaches to disease genomics

Mutation databases are a rich source of discovery: DECIPHER

[Figure: Proband 1, Proband 2, Proband 3. Labels: Single gene; Very similar phenotype]

• DECIPHER is a database that holds genetic information about patients who present with congenital abnormalities

Page 36: Functional genomics approaches to disease genomics

DECIPHER patients are annotated with London Medical Database terms

Level 1, Level 2, Level 3

Page 37: Functional genomics approaches to disease genomics

18 CNVs 121 CNVs

7 CNVs

ENSEMBL genes assigned to CNVs

692 genes

Remove copy number variable genes observed in healthy individuals

633 genes

3320 genes

2767 genes

132 CNVs

3030 genes

3036 genes

Cranium, General abnormalities

Formed groups of CNVs associated with each human phenotype

Page 38: Functional genomics approaches to disease genomics

Many enrichments are readily interpretable

[Figure: four bar charts of % enrichment for All, Gain and Loss CNVs. Human symptoms shown: Cupid bow shape of mouth; Short Stature, Prenatal Onset; Syndactyly of toes; Malocclusion. Mouse phenotypes shown: Abnormal Palate Development; Decreased Fetal Size; Syndactyly; Malocclusion]

* Statistically significant, FDR < 0.05

Page 39: Functional genomics approaches to disease genomics

Others identify less obvious relationships

[Figure: two bar charts of % enrichment for All, Gain and Loss CNVs. Human symptoms shown: Psychotic Behaviour; Complex Partial Seizures. Mouse phenotypes shown: Abnormal pre-pulse inhibition; Abnormal circadian rhythm]

* Statistically significant, FDR < 0.05

Page 40: Functional genomics approaches to disease genomics

Mutations can be dissected to identify the contributions of individual genes

Patient id: 248772
• ATG7, OXTR, ATP2B2: Intellectual disability/developmental delay candidate genes
• FANCD2: Short stature, prenatal onset candidate gene

Patient id: 785
• SNX2: Mental retardation/developmental delay candidate gene
• FBN2: Camptodactyly candidate gene

Page 41: Functional genomics approaches to disease genomics

Gene set enrichment analysis
Aravind Subramanian et al., 2005

• Start with some list of ranked genes
  – Genes ranked by expression in cases vs controls (microarrays)
  – Genes ranked by nearby SNP p-values

• Score genes + or – according to some property

• Ask: are genes with this property more concentrated towards the top of this list than I would expect by chance?
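The ranked-list question above can be sketched as a running-sum statistic with a permutation test. This is a simplified, unweighted (Kolmogorov-Smirnov-like) version written for illustration; the published method weights each step by the gene's ranking score, and the function names and toy gene labels here are assumptions.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Walk down the ranked list, stepping up on genes in the set and down
    otherwise; return the largest deviation of the running sum from zero."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """One-sided p-value: how often a shuffled ranking scores at least as
    high as the observed ranking."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if enrichment_score(shuffled, gene_set) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

A set whose members all sit at the top of the list scores close to +1, and one whose members all sit at the bottom scores close to -1. Shuffling the ranking is only one possible null model; permuting case/control labels is an alternative when the raw data are available.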

Page 42: Functional genomics approaches to disease genomics

Gene Prioritisation for disease

• Given a list of genes, which are most likely to be involved in this disease?

• We just want a ranking, not a significant association

• Commonly employed approaches involve supervised learning methodologies
  – Collect data points from one or more sources
  – Take a “Gold Standard” set of genes for this disease
  – Train a method using known true positives (and true negatives if known)
  – Given a list of genes, which ones “look” most similar to the known disease genes?
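The "which ones look most similar" step can be illustrated with a deliberately simple stand-in for a trained model: rank candidates by cosine similarity to the average feature profile of the gold-standard genes. The feature vectors, gene names and the choice of cosine similarity are all assumptions made for this sketch; real prioritisation tools use richer features and learners.

```python
from math import sqrt


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prioritise(candidates, features, gold_standard):
    """Rank candidates from most to least similar to the gold-standard profile."""
    profile = centroid([features[gene] for gene in gold_standard])
    scored = [(cosine(features[gene], profile), gene) for gene in candidates]
    return [gene for score, gene in sorted(scored, reverse=True)]
```

The output is only a ranking, matching the slide's point that prioritisation does not by itself establish a significant association.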

Page 43: Functional genomics approaches to disease genomics
Page 44: Functional genomics approaches to disease genomics

Linkage networks can infer missing values – “guilt by association”
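A minimal sketch of guilt by association is neighbour voting: an unannotated gene is scored by the fraction of its network neighbours that already carry the annotation. The network layout, gene names and scoring rule are assumptions for illustration, not the method behind the slide's figure.

```python
def neighbour_vote(network, annotated):
    """network: {gene: set of linked genes}; annotated: set of genes known
    to have the property.

    Returns {gene: fraction of its neighbours that are annotated} for every
    gene that is not itself annotated and has at least one neighbour.
    """
    scores = {}
    for gene, neighbours in network.items():
        if gene in annotated or not neighbours:
            continue
        scores[gene] = len(neighbours & annotated) / len(neighbours)
    return scores
```

In a toy network where gene X is linked to annotated genes A and B plus one unannotated gene Y, X scores 2/3 while Y, whose only neighbour is X, scores 0.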

Page 45: Functional genomics approaches to disease genomics

From pubmed ID: 19728866

Page 47: Functional genomics approaches to disease genomics

Conserved co-expression of disease genes (Ala et al., PLoS Genetics 2008)

• 850 OMIM entries where a phenotype was mapped to a locus but the specific genes were unknown

• Used conserved human-mouse co-expression data, as other interaction or pathway data can bias towards well-studied genes

• Generated single-species gene co-expression networks
  – Calculated Pearson’s correlation coefficient between all pairs of gene expression profiles. Formed a network edge if two genes’ expression correlation was in the top 1% for either gene.

• Clustered OMIM phenotypes using MimMiner
  – A text-mining tool
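The single-species network step can be sketched as follows: compute Pearson correlations between expression profiles and link two genes when one is among the other's top-correlated partners. This is a sketch under assumptions, not the authors' code: the data layout, the function names, and the rule of keeping at least one partner per gene in small toy inputs are mine, and the human-mouse conservation step is not shown.

```python
from math import ceil, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coexpression_edges(profiles, top_fraction=0.01):
    """profiles: {gene: [expression values]}.

    Adds an undirected edge g-h when h is within the top `top_fraction` of
    g's most correlated partners, so being in either gene's top list suffices.
    """
    genes = sorted(profiles)
    edges = set()
    for g in genes:
        partners = [h for h in genes if h != g]
        partners.sort(key=lambda h: pearson(profiles[g], profiles[h]), reverse=True)
        keep = max(1, ceil(top_fraction * len(partners)))
        for h in partners[:keep]:
            edges.add(frozenset((g, h)))
    return edges
```

Because edges are stored as unordered pairs, a pair is linked as soon as it appears in the top list of either gene, which mirrors the "top 1% for either gene" rule on the slide.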

Page 48: Functional genomics approaches to disease genomics

Using this methodology, they were able to predict 321 candidates across 81 disease-associated loci at an FDR of <10%

Page 49: Functional genomics approaches to disease genomics

Human phenome-interactome network for predicting disease candidate genes

(Lage et al., Nature Biotech. 2007)

• 2 data networks
  – Phenotypic similarity, based on detecting words that are common to two phenotype descriptions and do not occur frequently among all phenotype descriptions.
  – Human interactome, consisting of several large human sets and sets transferred from model organisms, weighted according to observation frequency.
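The phenotypic similarity idea (shared words that are rare across all descriptions) resembles inverse-document-frequency weighting, which the following sketch uses. This is an assumed stand-in to illustrate the principle, not the scoring scheme of Lage et al.; the tokenizer and the example descriptions in the usage note are hypothetical.

```python
from math import log


def tokenize(text):
    """Lower-cased set of words with surrounding punctuation stripped."""
    return {word.strip(".,;:()").lower() for word in text.split()} - {""}


def phenotype_similarity(desc_a, desc_b, all_descriptions):
    """Sum, over words shared by the two descriptions, of log(N / df), where
    df is the number of descriptions containing the word. Words present in
    every description contribute nothing."""
    docs = [tokenize(d) for d in all_descriptions]
    n = len(docs)
    shared = tokenize(desc_a) & tokenize(desc_b)
    score = 0.0
    for word in shared:
        df = sum(word in doc for doc in docs)
        if df:
            score += log(n / df)
    return score
```

Given four toy descriptions in which "cleft" and "palate" each appear in two and "and" appears in all four, two descriptions sharing "cleft palate and" score 2·ln 2, while two descriptions sharing only "and" score 0.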

Page 50: Functional genomics approaches to disease genomics

(1) A given positional candidate is queried for high-scoring interaction partners (“virtual pull-down”). These are interaction partners for the candidate complex.

(2) Proteins known to be involved in disease are identified in the candidate complex, and pairwise scores of the phenotypic overlap between the diseases of these proteins and the candidate phenotype are assigned.

(3) Based on the phenotypes represented in the candidate complex, a Bayesian predictor awards a probability to the candidate in the complex. The score is used to form the ranking.