Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Christine Bird [email protected]


Page 1: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation.

Christine Bird

[email protected]

Page 2: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Hypothesis: Conserved non-coding DNA has a function in the human genome

Does human variation data suggest selection is acting on noncoding DNA?

Are conserved non-coding sequences selectively constrained?

Detection of fast evolving conserved non-coding sequence.

Exploring the properties and genomic context of human fast evolving non-coding regions.

Page 3: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

The Human Genome:

~25,000 genes

1 to 1.5% of human DNA is coding

Is the remaining 98.5% “junk”?

Page 4: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Selective constraint in mammalian genomes

Waterston et al. Nature 2002

Neutral

Constrained 5%

Page 5: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Proportions of Lineage Specific Conserved non-coding (CNC) sequences

Margulies et al. PNAS 2005

418 MCSs (Multiple vertebrate Conserved Sequences) in 571 Kb: 58 coding, 46 UTRs and 314 non-coding. ~27 species

Page 6: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

CNCs are evenly distributed in the human genome

Dermitzakis et al. Nat Rev Genet 2005

Page 7: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

The density of CNCs and exons is negatively correlated

Dermitzakis et al. Nat Rev Genet 2005

Page 8: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Why study conserved non-coding DNA?

Abundance beyond that expected under neutral evolution.

If function is gene regulation, understanding is limited.

Gene regulation is considered a crucial contributor to evolutionary change (King and Wilson, 1975).

Conserved non-coding sequences (CNCs) may well harbour critical regulatory changes that have driven recent human evolution.

Page 9: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Conserved non-coding sequences

Top conserved 5% of the human genome as detected with a phylogenetic hidden Markov model (phylo-HMM) (Siepel, 2005).

Best-in-genome pairwise alignments by blastz, followed by chaining.

A multiple alignment constructed by MULTIZ.

PhastCons constructs a two-state phylo-HMM for conserved and non-conserved regions.

Remove overlap with Ensembl gene annotation.

http://genome.ucsc.edu/

Page 10: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Are conserved non-coding sequences selectively constrained?

Conservation of non-coding sequence due to forces acting on the human genome.

CNC SNP density is only 82% of that in non-coding non-conserved sequence: 3.9 x 10^-4 vs. 4.8 x 10^-4; χ² = 686, 1 df; p < 10^-99

Is this just due to low local mutation rates?

Or are new alleles deleterious, and therefore less likely to be fixed in the population?

Address this by looking at the derived allele frequency (DAF) spectrum, which is unaffected by local mutation rates.

Drake et al. Nat Genet 2006

Page 11: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Derived Allele Frequency

Selective constraint shifts the distribution of constrained alleles toward rarer frequencies (Fay & Wu, 2000).

Allele frequencies in 4 populations from 210 unrelated individuals in the HapMap project:

CEU - American of European ancestry (60)
YRI - Yoruba from Nigeria (60)
JPT - Japanese from Tokyo (45)
CHB - Han Chinese from Beijing (45)

Derived Allele Frequency (DAF) was generated for 1 million Phase I HapMap SNPs & 4 million Phase II.

The ancestral allele was inferred by comparison to chimp and/or macaque.

SNPs were assigned to defined genomic features to allow comparison.

Drake et al. Nat Genet 2006
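The DAF calculation described on this slide can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function names, the input layout (a dictionary of allele counts per SNP) and the ten equal-width bins are assumptions made for the example. The ancestral allele is taken as given, standing in for the chimp and/or macaque comparison.

```python
def derived_allele_frequency(allele_counts, ancestral):
    """Return the derived allele frequency of a biallelic SNP.

    allele_counts: dict mapping allele -> number of sampled chromosomes
                   (hypothetical input layout).
    ancestral:     allele inferred from the outgroup comparison.
    Returns None when the ancestral allele is not one of the observed
    human alleles, because the derived state is then ambiguous.
    """
    total = sum(allele_counts.values())
    if total == 0 or ancestral not in allele_counts:
        return None
    derived = total - allele_counts[ancestral]
    return derived / total


def daf_bin(daf, n_bins=10):
    """Map a DAF strictly between 0 and 1 to a bin index 0..n_bins-1."""
    if not 0.0 < daf < 1.0:
        raise ValueError("only segregating sites (0 < DAF < 1) are binned")
    return min(int(daf * n_bins), n_bins - 1)


def binned_spectrum(dafs, n_bins=10):
    """Fraction of SNPs falling in each DAF bin."""
    counts = [0] * n_bins
    for daf in dafs:
        counts[daf_bin(daf, n_bins)] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Toy example: 120 chromosomes, ancestral allele C, derived allele T.
example_daf = derived_allele_frequency({"C": 90, "T": 30}, ancestral="C")
example_spectrum = binned_spectrum([0.05, 0.25, 0.25, 0.95])
```

Comparing the binned spectra of two SNP classes (for example CNC versus non-conserved) is then a comparison of two such lists of fractions.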

Page 12: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

CNCs are selectively constrained

Drake et al. Nat Genet 2006

[Figure: fraction of SNPs (y-axis, 0 to 0.25) against binned derived allele frequency (x-axis, low to high) for conserved and non-conserved sequence. Annotation: "Selective constraint".]

Mann-Whitney U test; P << 10^-4

Page 13: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

CNCs have an excess of low frequency derived alleles compared to Introns

[Figure: fraction of SNPs (y-axis, 0 to 0.35) against binned derived allele frequency (x-axis, low to high) for CNCs, exons, introns and the rest of the genome.]

Mann-Whitney U test; CNC vs introns, P << 10^-16

Page 14: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

CNC sequences are selectively constrained and not mutation cold spots

Nucleotide variation revealed strong selective constraints upon CNCs in human populations.

SNP density in CNCs is only 82% of that in non-conserved non-coding sequence (18% lower).

CNCs have an excess of low frequency derived alleles.

CNCs subject to purifying selection in humans, likely to harbour functionally important variants.

Drake et al. Nat Genet 2006

Page 15: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Why are they conserved?

Regions of the genome are therefore selectively constrained despite being non-coding.

But what is the reason for this conservation…?

What is novel about their biology? How can we tackle this question for so many elements? What are the most interesting regions?

A subset of CNCs undergoing rapid change with potential common properties or roles.

Page 16: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Why study fast-evolving non-coding?

If CNCs are part of chimpanzee-human lineage differentiation by changes in gene regulation then changes in their nucleotide sequence should be expected despite their overall conservation.

Following gene duplication, subfunctionalization can occur by the partitioning of gene regulation among descendant copies (Force, 1999).

Older models of gene duplication proposed an important role for positive selection after duplication (Bridges 1935, Ohno 1970, Ohta, 1987).

Page 17: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Subfunctionalization

[Diagram: duplicated gene with separated tissue-specific regulation; tissue labels: heart, brain.]

Duplicated genes preserved through subfunctionalization by the Duplication-Degeneration-Complementation model.

If CNCs are regulatory elements involved in this process they would have changed rapidly since duplication.

Lynch and Force, Genetics 2000

Page 18: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Detecting fast-evolving non-coding sequences

MULTIZ alignments (Webb Miller).

Example alignment with lineage-specific substitution counts:

Human    GACTACGTTTGGTTTAGAGAT   5
Chimp    GACTGGCTTTACTTTTGAGAT   1
Macaque  GTCTGGGTTTACTTTTCAGAT   2

Tajima's relative rate test, with S1 and S2 the lineage-specific substitution counts on the human and chimp branches and macaque as the outgroup:

$\chi^2 = \frac{(S_1 - S_2)^2}{S_1 + S_2}$

Tajima, Genetics 1993
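A minimal sketch of the relative rate test on the example alignment from this slide. The function names are illustrative, and the p-value uses the standard chi-square distribution with 1 degree of freedom. It is an assumption of this sketch that S1 is the human count and S2 the chimp count.

```python
import math


def lineage_specific_counts(human, chimp, outgroup):
    """Count substitutions specific to each ingroup lineage.

    A site counts for human when human differs and chimp matches the
    outgroup, and for chimp in the mirrored case. Sites where all three
    differ, or where only the outgroup differs, are ignored.
    """
    s1 = s2 = 0
    for h, c, o in zip(human, chimp, outgroup):
        if h != c:
            if c == o:
                s1 += 1  # human-specific change
            elif h == o:
                s2 += 1  # chimp-specific change
    return s1, s2


def tajima_chi2(s1, s2):
    """Tajima's relative rate statistic: (S1 - S2)^2 / (S1 + S2)."""
    if s1 + s2 == 0:
        raise ValueError("no lineage-specific substitutions to test")
    return (s1 - s2) ** 2 / (s1 + s2)


def chi2_pvalue_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


human = "GACTACGTTTGGTTTAGAGAT"
chimp = "GACTGGCTTTACTTTTGAGAT"
macaque = "GTCTGGGTTTACTTTTCAGAT"

s1, s2 = lineage_specific_counts(human, chimp, macaque)
chi2 = tajima_chi2(s1, s2)
p_value = chi2_pvalue_1df(chi2)
```

With this statistic, four substitutions all on one lineage give a chi-square of exactly 4 (p just below 0.05), while three give 3 (p above 0.05). That is consistent with a minimum of four substitutions being needed for an alignment to have power to detect acceleration.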

Page 19: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

χ² test of base substitutions.

Alignments = 304,291
Power to detect acceleration = 26,477
Accelerated (P < 0.05) = 2,794 (11%)

Accelerated in chimp = 1,438

Accelerated in human = 1,356

ANC (Accelerated Non-Coding)

Page 20: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Are Accelerated Non-Coding (ANCs) sequences functional?

Compare to 3 sets of control sequences:

Power CNCs (not lineage specific): CNCs with >= 4 substitutions = 23,683

Non-accelerated CNCs: CNCs with < 4 substitutions = 277,814

DAF controls 1 & 2: 1,356 x 20 Kb windows, 500 Kb from the 5' and 3' ends of ANCs.

Repeat analyses excluding potential confounders: Segmental Duplications (SD), Copy Number Variants (CNV), pseudogenes and retroposed genes.

Page 21: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Are ANC sequences functional?

Does nucleotide variation data indicate particular modes of selection implying function? (Is acceleration recent or ancient?)
Derived allele frequency spectrum comparisons
Population differentiation, FST

Are ANCs involved in subfunctionalization? Is there enrichment in recently duplicated sequences?

What function do these rapidly evolving sequences have?
Association of ANC variation with expression levels of nearby genes

Page 22: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Excess of high frequency derived alleles in ANCs

[Figure: fraction of SNPs (y-axis, 0 to 0.35) against binned derived allele frequency (x-axis, ten bins from >0-0.1 to 0.9-<1) for non-accelerated CNCs, controls and ANCs. Annotations: "Selective constraint"; "Loss of constraint & Directional Selection?"]

Mann-Whitney U test; non-accelerated CNC vs ANCs, P = 1.63 x 10^-6

Page 23: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Power CNCs are neutral

[Figure: fraction of SNPs (y-axis, 0 to 0.35) against binned derived allele frequency for non-accelerated CNCs, ANCs, controls and power CNCs.]

Mann-Whitney U test; power CNC vs control, P = 0.15

Page 24: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Excess of rare alleles in ANCs excluding confounding elements

[Figure: fraction of SNPs (y-axis, 0 to 0.35) against binned derived allele frequency (x-axis, ten bins from >0-0.1 to 0.9-<1) for non-accelerated CNCs, controls, power CNCs, ANCs and ANCs excluding confounders. Annotation: "Loss of constraint & Directional Selection?"]

Mann-Whitney U test; ANCs vs ANCs excluding confounders, P = 0.48

Page 25: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Detecting recent evolution and population-specific selection

A measure of population structure, Wright's FST.

Compares the mean amount of genetic diversity found within subpopulations to that of the meta-population.

Sampling from 2 diverged subpopulations as if they were a single panmictic population gives an excess of homozygotes and a deficiency of heterozygotes.

FST can be defined as:

$F_{ST} = \frac{H_T - H_S}{H_T}$

where H_S is the mean diversity within subpopulations and H_T is the diversity of the meta-population.

Calculated for ANCs, using the terms:
MSG - mean square error within populations
MSP - mean square error between populations
nc - variance-corrected average sample size

Weir and Cockerham, Evolution 1984
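A minimal sketch of the heterozygosity form of FST given on this slide, for a single biallelic SNP. It assumes equally weighted subpopulations and known allele frequencies, so it is simpler than the sample-size-corrected Weir and Cockerham estimator cited above. The function name and inputs are illustrative.

```python
def fst_from_frequencies(subpop_freqs):
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    subpop_freqs: allele frequency of the same allele in each
                  subpopulation (equal weights assumed).
    H_S is the mean expected heterozygosity within subpopulations;
    H_T is the expected heterozygosity of the pooled population.
    """
    if not subpop_freqs:
        raise ValueError("at least one subpopulation is required")
    k = len(subpop_freqs)
    h_s = sum(2 * p * (1 - p) for p in subpop_freqs) / k
    p_bar = sum(subpop_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("SNP is monomorphic in the pooled sample")
    return (h_t - h_s) / h_t


# Toy values: identical frequencies give no differentiation,
# fixed differences give complete differentiation.
fst_same = fst_from_frequencies([0.5, 0.5])
fst_fixed = fst_from_frequencies([0.0, 1.0])
fst_mid = fst_from_frequencies([0.2, 0.8])
```

The deficit of heterozygotes described above is the numerator: pooling two diverged subpopulations raises H_T above H_S, so FST grows with the frequency difference.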

Page 26: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

ANC FST values higher than non-accelerated CNCs

[Figure: frequency (y-axis, 0 to 0.35) against FST bins (x-axis, bins of width 0.05 from -0.05 to 1) for ANCs excluding confounders, ANCs, power CNCs and non-accelerated CNCs.]

Mann-Whitney U test; non-accelerated CNCs vs ANCs, P = 0.0504; non-accelerated CNCs vs ANCs excluding confounders, P = 0.0363

Page 27: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Enrichment in Segmental Duplications

Approximately 5-6% of the human genome in SDs (Bailey et al, Science 2002)

ANCs 8%
Power CNCs 10%
Non-accelerated CNCs 5%

Excess of ANCs and power CNCs in SDs (chi-square; P < 10^-4).

The general enrichment in SDs is not surprising, as it has been observed that sequence divergence is elevated in duplicated sequences (Hurles et al. GenBio. 2004; She et al. GenRes. 2006).

Page 28: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Excess of recent segmental duplications associated with ANCs

[Figure: fraction of category overlapping SDs (y-axis, 0 to 0.2) against % identity of SDs (x-axis, 90 to 100) for non-accelerated CNCs, power CNCs and ANCs. Annotation: "Human Specific".]

Mann-Whitney U test; P << 10^-4

Page 29: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Testing for evidence of involvement in Gene Regulation

[Diagram: a SNP within an ANC near a gene, tested for association with mRNA levels.]

Page 30: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

ANC SNP- Expression Association

What is the functional impact of ANC variation on gene expression phenotypes?

47,294 transcripts probed in lymphoblastoid cell lines of 210 unrelated HapMap individuals.

SNP genotypes within ANCs are associated with transcript expression levels by linear regression.

Statistical significance adjusted following 10,000 permutations per gene.

Additive association model: linear regression, e.g. CC = 0, CT = 1, TT = 2.

[Figure: expression level (y-axis, 8.0 to 9.5) against genotype (x-axis: CC, CT, TT, coded 0, 1, 2).]
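The additive model and permutation adjustment described on this slide can be sketched as follows. This is a toy illustration with made-up expression values, not the study's analysis: the function names are hypothetical, the choice of T as the counted allele is arbitrary, and the permutation scheme here shuffles expression values for a single SNP, whereas the slide describes 10,000 permutations per gene.

```python
import random


def additive_code(genotype, counted_allele="T"):
    """Code a genotype by copies of one allele: CC -> 0, CT -> 1, TT -> 2."""
    return sum(1 for allele in genotype if allele == counted_allele)


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("genotype has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the regression slope."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(ols_slope(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Made-up data for illustration only.
genotypes = ["CC", "CC", "CT", "CT", "TT", "TT"]
expression = [8.0, 8.1, 8.5, 8.6, 9.0, 9.1]

coded = [additive_code(g) for g in genotypes]
slope = ols_slope(coded, expression)
p_perm = permutation_pvalue(coded, expression, n_perm=1000)
```

The slope is the estimated change in expression per additional copy of the counted allele. The permutation p-value asks how often a slope at least that large arises when genotype and expression are unlinked.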

Page 31: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

SNPs within ANCs are significantly associated with gene expression phenotypes.

Significant SNPs at the 0.01 permutation threshold:

68% of ANC SNPs tested (496 out of 729)

9% of power CNC SNPs tested (1,047 out of 11,468)

A SNP within an ANC is 7 times more likely to be associated with gene expression levels than a SNP within a power CNC.

Significant at the 0.01 permutation threshold:

16% of ANCs tested (59 out of 366)

3% of power CNCs tested (165 out of 5,968)

Nucleotide variation within ANCs is 5 times more likely to be associated with gene expression levels than variation in a power CNC.

Tendency for derived alleles within ANCs to be associated with lower expression levels.
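The fold differences quoted on this slide follow from the counts given. A short worked check, using only the numbers on the slide (the variable names are illustrative):

```python
def fold_enrichment(hits_a, tested_a, hits_b, tested_b):
    """Ratio of the significant fraction in set A to that in set B."""
    return (hits_a / tested_a) / (hits_b / tested_b)


# SNP level: ANC SNPs vs power CNC SNPs.
snp_fold = fold_enrichment(496, 729, 1047, 11468)

# Element level: ANCs vs power CNCs.
element_fold = fold_enrichment(59, 366, 165, 5968)
```

From the unrounded counts the ratios come to about 7.5 at the SNP level and about 5.8 at the element level. These match the "7 times" and "5 times" on the slide if those figures are rounded down.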

Page 32: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Summary

CNCs are not mutation cold spots but selectively constrained.

Fast evolving noncoding sequences in the human lineage have lost this constraint and some are potentially undergoing positive selection.

This may have contributed to some recent differentiation in human populations.

ANCs are enriched in the most recent segmental duplications.

SNPs in ANCs are associated with significant change in gene expression phenotypes.

Page 33: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Acknowledgements

Thanks to my joint supervisors Emmanouil Dermitzakis and Matthew Hurles and the members of their teams: Barbara Stranger, Dan Jeffares, Catherine Ingle, Julian Huppert, Antigone Dimas, Sarah Lindsay, Dan Andrews, Dan Turner, Chris Barnes.

Particular thanks to my other co-authors:
Webb Miller - human-chimpanzee-macaque alignments
Daryl Thomas - DAF for both phase I and II SNPs
Maureen Liu - quantifying gene density

The Rhesus Macaque Genome Sequencing Consortium (RMGSC) and the HapMap consortium for making data available, and the Wellcome Trust and MRC for funding.

Page 34: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation.

By Christine Bird

[email protected]

Page 35: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Margulies et al. PNAS 2005

Fig. 3. Phylogenetic tree of vertebrate species. By using the generated 27-species multisequence alignment, branch lengths were calculated based on analysis of synonymous coding positions. The branch lengths (as substitutions per synonymous site) between human and each species are listed (with additional pair-wise branch lengths provided in the supporting information). The last common ancestor among the catarrhine primates (A) is estimated at 25 mya (36, 37), between the rodents and primates (B) at 75 mya (5, 6), between eutherians and metatherians (C) at 185 mya (14), between monotremes and other therians (D) at 200 mya (14), and between mammals and birds (E) at 310 mya (13).

Page 36: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Proportions of Lineage Specific Conserved non-coding sequences

Margulies et al. PNAS 2005

Fig. 4. Lineage specificity of MCSs. The proportion of nonexonic MCSs found in the sequences of species in each category is indicated. Note that virtually all MCSs overlapping known exonic sequences are present in all mammals (data not shown). All Mammals: cat, dog, cow, pig, rat, mouse, N.A. opossum, wallaby, and platypus; Eutherian: cat, dog, cow, pig, rat, and mouse; Marsupials: N.A. opossum and wallaby; and Other: species combinations containing <2% of the analyzed MCSs (see the supporting information for the complete data set). Hashed areas of "All Mammals" reflect portions lacking one or both rodents, and hashed portions of "Eutherian Marsupials" reflect portions lacking both rodents.

Page 37: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Distribution of large and small CNCs (Conserved Non-Coding sequences) and exons on Hsa21

[Figure: frequency (y-axis, 0 to 400) of exons, big CNCs and small CNCs along the long arm of Hsa21 (x-axis, 0 to 30 Mb).]

Big CNCs: 70% ID, 100 bps ungapped
Small CNCs: 85% ID, 35-99 bps ungapped

Dermitzakis et al. Nature 2002

Page 38: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Conservation of CNCs in multiple species

[Figure: number of conserved sequences (x-axis, 0 to 220) per species (y-axis): Human, Mouse, Green Monkey, Lemur, Rabbit, Pig, Bat, Cat, Shrew, Elephant, Platypus, Wallaby. Inset: conserved block between mouse and human.]

Dermitzakis et al. 2003 Science

Page 39: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation
Page 40: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Drake et al. Nat Genet 2006

Page 41: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Testing DAF spectrum distributions

Non-parametric tests for distributions of unequal sample size.

Mann-Whitney U-test: compares the medians of two populations, using the rank order of values in the two samples.

Kolmogorov-Smirnov (KS) test: measures differences in the entire distributions of two samples, in both shape and location, at the cost of being less sensitive to differences in location only.

KS is therefore less powerful than the Mann-Whitney U-test against the alternative hypothesis of a difference in location.
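To make the rank-based idea concrete, here is a minimal sketch of the Mann-Whitney U statistic with average ranks for ties and a normal-approximation p-value. The function names are illustrative. The variance used here has no tie correction, which is a simplification: binned DAF data contain many ties, so a production analysis would use a tie-corrected variance or a statistics library.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) computed from the pooled rank order."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


def normal_approx_pvalue(u, n_a, n_b):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    mean = n_a * n_b / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Toy samples of unequal size.
u_a, u_b = mann_whitney_u([1, 2, 3], [4, 5, 6, 7])
u_tied = mann_whitney_u([1, 2], [2, 3])
```

Because only the ranks enter the statistic, the two samples may differ in size and need not follow any particular distribution.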

Page 42: Exploring the Role of Non-Coding DNA in the Function of the Human Genome through Variation

Associations between CNCs and gene expression by population. "Sig. associations" is the number of significant CNC to gene associations and "Sig. CNCs" is the number of significant CNCs of those tested (with percentage), each at thresholds 0.01 / 0.001 / 0.0001.

Population  Set    Tested CNCs  SNPs  Probes tested  Associations  Sig. associations  Sig. CNCs
CEPH        ANC    387          555   8673           23330         77 / 9 / 0         59 (15%) / 9 (2%) / 0 (0%)
CEPH        Power  6232         8388  14906          350309        181 / 36 / 18      149 (2%) / 33 (1%) / 17 (0%)
CHB         ANC    356          499   8092           21291         83 / 13 / 0        56 (16%) / 11 (3%) / 0 (0%)
CHB         Power  5737         7579  14893          317518        202 / 41 / 15      159 (3%) / 39 (1%) / 15 (0%)
CHB & JPT   ANC    342          466   7919           20163         109 / 11 / 1       59 (17%) / 9 (3%) / 1 (0%)
CHB & JPT   Power  5474         7162  14852          301636        203 / 12 / 1       149 (3%) / 12 (0%) / 1 (0%)
JPT         ANC    355          490   8197           21166         88 / 12 / 0        59 (17%) / 11 (3%) / 0 (0%)
JPT         Power  5674         7531  14852          315476        241 / 48 / 20      194 (3%) / 42 (1%) / 19 (0%)
YRI         ANC    391          583   9118           24310         113 / 15 / 2       64 (16%) / 15 (4%) / 2 (1%)
YRI         Power  6724         9218  14908          381407        196 / 32 / 15      173 (3%) / 30 (0%) / 14 (0%)