Meiotic gene conversion in humans: rate, sex ratio, and GC bias

Amy L. Williams

June 19, 2013

University of Chicago

Gene conversion defined

• Meiosis: produces haploid germ cells with recombinations

• Gene conversion: short segment copied into given chromosome from other homolog

[Figure: meiosis, showing the two types of recombination: crossover and gene conversion]

Study question 1: gene conversion rate?

• Number of gene conversions per meiosis?
  – 4-15× # crossovers? Jeffreys and May (2004)

• Length of gene conversion tracts?
  – 55-290 bp? Jeffreys and May (2004)

• Per base-pair rate? Fraction of genome affected
  – R = (number × tract length) / genome length
  – 2.2×10⁻⁶ to 4.4×10⁻⁵? Jeffreys and May (2004)
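The range for R can be reproduced from the formula above. This is a minimal sketch that assumes about 30 crossovers per meiosis and a genome length of 3×10⁹ bp. Neither value is stated on the slide; both are illustrative assumptions.

```python
# Assumed inputs (not from the slide): ~30 crossovers per meiosis, 3e9 bp genome.
CROSSOVERS_PER_MEIOSIS = 30
GENOME_LENGTH_BP = 3.0e9


def per_bp_rate(n_conversions, tract_length_bp, genome_length_bp=GENOME_LENGTH_BP):
    """R = (number x tract length) / genome length."""
    return n_conversions * tract_length_bp / genome_length_bp


# Lower bound: 4x crossovers, 55 bp tracts. Upper bound: 15x crossovers, 290 bp tracts.
low = per_bp_rate(4 * CROSSOVERS_PER_MEIOSIS, 55)
high = per_bp_rate(15 * CROSSOVERS_PER_MEIOSIS, 290)
print(f"{low:.2e} to {high:.2e}")
```

Under these assumptions the bounds come out at 2.2×10⁻⁶ and about 4.35×10⁻⁵, which rounds to the 4.4×10⁻⁵ quoted from Jeffreys and May (2004).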

Study question 2: male vs. female rate?

• Gender differences in rate?
  – Crossovers: female rate 1.78× male (deCODE)

Study questions 3 & 4: GC bias? Localization?

• GC bias observed in allelic transmissions?

• Crossover hot spots influence location?

• Locations of gene conversions independent in a given meiosis?

Myers et al., Science 2005

Summary: study questions

1. Genome-wide de novo gene conversion rate?

2. Different rate between males/females?

3. Extent of GC bias in tracts?

4. Localization: Hotspots? Tracts independent?

Outline

• Background / study questions

• Study design and methods

• Results
  – SNP chip data
  – Sequence data

Approaches to identify gene conversions

• Linkage disequilibrium based
  – Can give rate estimate
  – Averaged over human history, both genders

• Sperm-based
  – Many meiotic products: per-individual estimates
  – Single molecule: genome-wide assays difficult

• Pedigree-based
  – De novo, per-gender events observable
  – Data for many samples required

Study design: SNP chip data for pedigrees

• Primary analysis: pedigree SNP chip data

• Challenge: small tracts
  – Tracts covered by ≤ 1 SNP
  – Not all tracts covered, but still obtain overall rate

• Chip data give per base-pair rate
  – R = # gene conversions / # informative sites

Datasets for analysis

• Mexican American pedigrees

• Data source 1: San Antonio Family Studies
  – 2,490 genotyped samples, 80 pedigrees
  – SNP chip genotypes (Illumina 1M, 660k)
  – Can estimate de novo gene conversion rate

• Data source 2: T2D-GENES Consortium
  – 607 sequenced samples, 20 pedigrees
  – Whole genome sequence (Complete Genomics)
  – Can examine tract length, distribution, etc.
    • Though need deep data on single family to do so

Study design: SNP chip data for pedigrees

• Pedigree-based haplotypes/phase reveal recombinations
  – Heterozygous sites: informative for recombination

• Phasing method: Hapi
  – Phases nuclear families
  – Williams et al., Genome Biol. 2010

Family-based phase reveals recombinations

• Hapi output: paternal haplotype transmissions

[Figure: transmitted paternal haplotypes (Haplotype 1, Haplotype 2), with one panel showing a crossover and one showing a gene conversion]

Other pedigree phasing methods

• Most pedigree phasing methods slow
  – Runtime complexity for phasing ~O(m·2^(2n))
    • n = # non-founders
    • m = # markers
  – Example: nuclear family with 11 children
    • 4,194,304 states per marker

• Can merge exponential class of states

• Many states extremely unlikely to be optimal
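The state count in the 11-child example follows from the 2^(2n) term in the complexity bound. A small sketch; the function name is illustrative and not taken from any phasing tool:

```python
def states_per_marker(n_non_founders):
    """Number of inheritance states at one marker: 2^(2n).

    Each non-founder receives one paternal and one maternal haplotype,
    and each of those has two possible grandparental origins.
    """
    return 2 ** (2 * n_non_founders)


# Nuclear family with 11 children: the 11 children are the non-founders.
print(states_per_marker(11))
```

For n = 11 this gives 4,194,304, the per-marker figure quoted above.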

Hapi: efficient phasing of nuclear families

• Hapi: state space reduction improves efficiency
  – Merges exponential class of states
  – Omits states that cannot yield optimal solution

• Applied to family with 11 children
  – Average per marker states: 4.2, maximum 48

Program      All families (N=103)     ≤ 3 children (N=86)
             Runtime     Speedup      Runtime    Speedup
Hapi         3.1 s       -            2.2 s      -
Merlin       1,005 s     323×         8.7 s      3.8×
Allegro v2   7,661 s     2,462×       14.5 s     6.4×
Superlink    1,393 s*    448×         38.8 s     17.2×

* Superlink failed to analyze 11 child family; 8/11 children used

Applying Hapi to multi-generational pedigrees

• Hapi currently applies to nuclear families
  – For 3-generation pedigrees analyzed for gene conversions, omit sites with phase conflicts
    • Will not bias results, but data are reduced

• Extension to Hapi possible to efficiently analyze arbitrarily large pedigrees
  – Most San Antonio Family Studies pedigrees too large to be phased in practical time

Approach to identifying gene conversions

1. Perform QC, phase 3-generation pedigrees

2. Find gene conversions in 2nd generation: single SNP double crossovers

3. Confirm:
  – Gene converted allele in 3rd generation
  – Other allele in 2nd generation sibling(s)

• False positive only if ≥ 2 genotyping errors
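Steps 2 and 3 can be sketched as follows. This is an illustrative outline, not Hapi's implementation. It assumes the phased output has been reduced to a list of haplotype labels (1 or 2), one per informative site, for a single transmitted parental chromosome. All function and parameter names are hypothetical.

```python
def single_snp_double_crossovers(origins):
    """Step 2: indices where the transmitted haplotype switches for exactly
    one informative site and then switches back (candidate gene conversion)."""
    hits = []
    for i in range(1, len(origins) - 1):
        # Flanking sites agree with each other but differ from site i.
        if origins[i - 1] == origins[i + 1] != origins[i]:
            hits.append(i)
    return hits


def confirmed(converted_allele, grandchild_alleles, sibling_alleles):
    """Step 3: the converted allele is seen in a 3rd-generation grandchild,
    and at least one 2nd-generation sibling received the other allele."""
    in_grandchild = converted_allele in grandchild_alleles
    other_in_sibling = any(a != converted_allele for a in sibling_alleles)
    return in_grandchild and other_in_sibling


# Site 3 is a one-site switch; the change at site 6 is an ordinary crossover.
origins = [1, 1, 1, 2, 1, 1, 2, 2, 2]
candidates = single_snp_double_crossovers(origins)
print(candidates)
print(confirmed("G", ["G"], ["A", "A"]))
```

Requiring both confirmations is what makes a false positive need at least two genotyping errors: one error in the recipient alone would not also appear in the grandchild.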

Current analysis dataset

• Analyzed SNP chip data for 16 pedigrees
  – Data for both parents, 3+ children, 1+ grandchild
  – 190 samples
  – 42 meioses (21 paternal, 21 maternal)

• 4.15×10⁶ informative sites

Result 1: 33 putative gene conversions, rate

• Rate: 7.95×10⁻⁶/bp/generation
  – Within range of Jeffreys and May (2004)
  – Close to LD-based estimates

[Figure: putative gene conversions, male vs. female]
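The reported rate is consistent with the chip-based formula R = # gene conversions / # informative sites, using the 33 putative events and the 4.15×10⁶ informative sites from the dataset slide. A quick check:

```python
n_gene_conversions = 33        # putative events (Result 1)
n_informative_sites = 4.15e6   # informative sites (dataset slide)

# R = # gene conversions / # informative sites
rate = n_gene_conversions / n_informative_sites
print(f"{rate:.2e} per bp per generation")
```

This evaluates to about 7.95×10⁻⁶.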

Are these real gene conversions?

T2D-GENES sequence confirms events

• 19 sites sequenced by T2D-GENES Consortium
  – 18/19 gene conversion genotypes verified

• Differing site looks like sequencing artifact
  – 2nd generation recipient has genotype mismatch
  – 3rd generation grandchild shows same genotype
  – If sequence data correct, gene conversion in grandchild

Result 2: gene conversion rates by gender

• More female gene conversions than male
  – Females transmit 1.54× males
  – Difference not (yet) significant; larger sample coming

• Different rates expected based on crossovers
  – Female crossover rate 1.78× male (deCODE)
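The slides do not give the per-sex event counts. A split of 20 maternal and 13 paternal events fits both the total of 33 and the 1.54× ratio, so the sketch below assumes it. It also assumes the two sexes are equally likely to contribute an event under the null (21 meioses each), and uses an exact two-sided binomial test, which may not be the test the author used.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials under p = 0.5
    (twice the smaller tail, capped at 1)."""
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * min(upper, lower))


female, male = 20, 13  # ASSUMED split of the 33 events; not stated in the slides
ratio = female / male
p_value = binomial_two_sided_p(female, female + male)
print(f"ratio = {ratio:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p-value is well above 0.05, in line with the statement that the difference is not yet significant.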

Result 3: gene conversions localize in hotspots

2.71% of genome in ≥10 cM/Mb hotspots

10/33 gene conversions in ≥10 cM/Mb hotspots: P=1.1×10⁻⁸
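One plausible way to obtain a P-value of this size is a binomial upper tail. If events fell uniformly across the genome, each would land in a hotspot with probability 0.0271. The slides do not say which test was used, so the choice of test here is an assumption.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 10 of 33 events in hotspots that cover 2.71% of the genome.
p_value = binomial_upper_tail(10, 33, 0.0271)
print(f"P = {p_value:.1e}")
```

Fewer than one of the 33 events (33 × 0.0271 ≈ 0.9) would be expected in hotspots by chance, and the tail probability comes out near 1.1×10⁻⁸.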

Result 4: observe extreme GC bias

• 31 GC informative sites
  – A/C, A/G, T/C, T/G

• GC transmission in 74% of cases (95% CI 59% – 90%)
  – GC bias likely (P=5.3×10⁻³)
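The figures on this slide are consistent with 23 GC transmissions out of 31 sites. The count of 23 is inferred from the reported 74% and is not stated directly. The sketch below pairs a one-sided exact binomial test against 50% with a normal-approximation (Wald) interval. Both method choices are assumptions.

```python
from math import comb, sqrt

n_sites = 31
n_gc = round(0.74 * n_sites)  # 23; inferred from the reported 74%

p_hat = n_gc / n_sites

# One-sided exact binomial test: P(X >= n_gc) under equal transmission (p = 0.5).
p_value = sum(comb(n_sites, i) for i in range(n_gc, n_sites + 1)) / 2 ** n_sites

# Wald 95% confidence interval for the GC transmission fraction.
half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n_sites)
ci = (p_hat - half_width, p_hat + half_width)

print(f"GC fraction = {p_hat:.2f}, P = {p_value:.1e}, CI = {ci[0]:.2f} to {ci[1]:.2f}")
```

A 23:8 split is also close to the roughly 3:1 GC-to-AT transmission ratio mentioned in the conclusions.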

Sequence near chip-identified gene conversions

• Sequence available for 11/33 putative sites

• Shortest resolution for tract length ≤ 143 bp

• Clustered gene conversions in 4 sequences

[Figure: sequence near putative sites; boxed regions confirmed by Sanger sequencing]

Relationship to complex crossover?

[Figure legend: Haplotype 1, Haplotype 2]

Conclusions

• Estimate of de novo gene conversion rate
  – 7.95×10⁻⁶/bp/generation
  – Females: 1.54× gene conversions vs. males

• Enriched in hotspots: similar mechanism to crossover

• GC vs AT allele transmitted ~3:1 – GC bias

• Complex/clustered gene conversions observed in sequence data
  – Suggests unique correlation within short region

Acknowledgements

Nick Patterson, David Reich, John Blangero, Giulio Genovese, Tom Dyer, Kati Truax

The T2D-GENES Consortium (NIDDK)

San Antonio Family Studies (NIDDK, NIMH)

NHGRI NRSA Fellowship