INVESTIGATION

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller,*,†,¹ Peter M. Visscher,‡ and Michael E. Goddard§,**

*Department of Psychology and Neuroscience and †Institute for Behavioral Genetics, University of Colorado, Boulder, Colorado 80309, ‡Queensland Statistical Genetics Laboratory, Queensland Institute for Medical Research, Brisbane 4029, Queensland, Australia, §Faculty of Land and Food Resources, University of Melbourne, Melbourne 3001, Victoria, Australia, and **Department of Primary Industries, Melbourne 3001, Victoria, Australia

Copyright © 2011 by the Genetics Society of America. doi: 10.1534/genetics.111.130922. Manuscript received May 22, 2011; accepted for publication June 14, 2011. Available freely online through the author-supported open access option. Supporting information is available online at http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1.
¹Corresponding author: Department of Psychology and Neuroscience, Muenzinger Hall, 345 UCB, Boulder, CO 80309. E-mail: matthew.c.keller@gmail.com

Genetics, Vol. 189, 237–249, September 2011

ABSTRACT Inbreeding depression, which refers to reduced fitness among offspring of related parents, has traditionally been studied using pedigrees. In practice, pedigree information is difficult to obtain, potentially unreliable, and rarely assessed for inbreeding arising from common ancestors who lived more than a few generations ago. Recently, there has been excitement about using SNP data to estimate inbreeding (F) arising from distant common ancestors in apparently "outbred" populations. Statistical power to detect inbreeding depression using SNP data depends on the actual variation in inbreeding in a population, the accuracy of detecting that variation with marker data, the effect size, and the sample size. No one has yet investigated what variation in F is expected in SNP data as a function of population size, and it is unclear which estimate of F is optimal for detecting inbreeding depression. In the present study, we use theory, simulated genetic data, and real genetic data to find the optimal estimate of F, to quantify the likely variation in F in populations of various sizes, and to estimate the power to detect inbreeding depression. We find that F estimated from runs of homozygosity (Froh), which reflects shared ancestry of genetic haplotypes, retains variation in even large populations (e.g., SD = 0.5% when Ne = 10,000) and is likely to be the most powerful method of detecting inbreeding effects from among several alternative estimates of F. However, large samples (e.g., 12,000–65,000) will be required to detect inbreeding depression for likely effect sizes, and so studies using Froh to date have probably been underpowered.

INBREEDING occurs when mates are more closely related than expected if chosen at random in the population. Most cultures have taboos against close inbreeding (Brown 1991), and most nonhuman animals appear to avoid it, ostensibly as an adaptation to prevent its deleterious effects (Pusey and Wolf 1996). Inbred offspring tend to have higher rates of congenital disorders and lower survival rates and fertility. This phenomenon, called inbreeding depression, has been confirmed by hundreds of scientific investigations on plants and animals (Roff 1997). The magnitude of the effect appears to be related to the strength of directional selection on the trait: fitness traits such as survival, reproduction, resistance to disease, and predator avoidance are more affected by inbreeding than are traits likely to be under less intense directional selection (DeRose and Roff 1999). Interestingly, there are numerous reports of inbreeding effects on human complex traits such as heart disease (Shami et al. 1991), hypertension (Rudan et al. 2003b), osteoporosis (Rudan et al. 2004), cancer (Lebel and Gallagher 1989), IQ (Morton 1979; Afzal 1988), and psychiatric disorders (Abaskuliev and Skoblo 1975; Gindilis et al. 1989; Bulayeva et al. 2005), suggesting that these traits or the genetic variants underlying them have been subject to natural selection ancestrally.

Two major hypotheses have been put forward to explain why inbreeding depression occurs (Wright 1977). The partial dominance hypothesis focuses on the role of homozygosity of rare, recessive/partially recessive deleterious mutations. Deleterious mutations constantly arise in the population, and selection quickly purges the most additive and dominant ones, leaving the segregating pool of deleterious mutations enriched with partially recessive ones because selection against
recessive mutations is inefficient. When such mutations meet in homozygous form, such as following inbreeding, their full deleterious effects are exposed. Alternatively, the overdominance hypothesis posits that inbreeding depression is caused by a reduction in heterozygosity of common alleles maintained at equilibrium at loci governed by heterozygote advantage. Both mechanisms may contribute to inbreeding depression, but the partial dominance hypothesis enjoys the strongest empirical support to date (Charlesworth and Charlesworth 1999; Charlesworth and Willis 2009).

Estimation of Inbreeding

The inbreeding coefficient of an individual, F, is one of the central parameters in population genetics theory. It is defined as the probability that two randomly chosen alleles at a homologous locus within an individual are identical by descent (IBD) with respect to a base (reference) population in which all alleles are independent; that is, the alleles are identical because they are passed down from a common ancestor (Wright 1922). Homozygosity caused by two IBD genomic segments is termed autozygosity, as opposed to allozygosity, which is homozygosity produced by alleles that are identical by state. F is therefore an estimate of genome-wide autozygosity.

Traditionally, F has been estimated using known pedigrees (Fped), typically using a path coefficient method developed by Wright (1922). In practice, pedigree information is difficult and costly to obtain, potentially unreliable (e.g., due to problems with accurate recording of ancestry), and rarely assessed for inbreeding arising from common ancestors who lived more than three or four generations in the past. Although autozygosity caused by common ancestors living more than three generations ago contributes very little variation to Fped, it can contribute substantially to variation in the segments of the genome that are autozygous (Stam 1980). Moreover, Fped is an expectation of the proportion of the genome that is autozygous, but there is much variation around this expectation due to the stochastic nature of recombination. For example, the percentage of the genome that is autozygous among progeny of first cousins averages 6.25%, but its standard deviation is 2.4% (Franklin 1977; Hill and Weir 2011).

For these reasons, there has recently been excitement about using dense marker data to estimate F arising from even very ancient inbreeding (Leutenegger et al. 2003; Carothers et al. 2006; Gibson et al. 2006; Li et al. 2006; Woods et al. 2006; McQuillan et al. 2008). Such genomic estimates of F potentially have several advantages over Fped. First, whereas Fped is an expectation of genome-wide autozygosity, genomic estimates of F, by directly measuring homozygosity, can potentially estimate the actual percentage of the genome that is autozygous more accurately. Second, genomic estimates of F incorporate autozygosity arising from very distant common ancestors (e.g., 50+ generations ago). Third, genomic estimates of F can be obtained in any sample that has marker data collected on it, including samples for which pedigree information is difficult or impossible to collect. Fourth, although all estimates of F are genome-wide measures of autozygosity, genomic estimates of F can be modified to test whether an effect of F is distributed evenly across the genome or whether the signal comes from specific genomic locations (e.g., by obtaining separate F estimates for different chromosomes). Fifth, in certain species (e.g., humans), individuals who inbreed may not be a representative sample of the population, and thus putative inbreeding effects may be due to "third variable," nongenetic reasons. Such third-variable explanations are less likely in populations where inbreeding is likely to be distant and unintended. Finally, given the decreasing price of genome-wide SNP data, genomic estimates of F are likely to be less expensive to collect than Fped, which requires either intensive observation in the field or (in humans) extensive interviews to obtain pedigrees from both parents.

A potential drawback to using genomic estimates of F is that their behavior in populations with different levels of inbreeding is not well characterized, and it is therefore unclear which estimates should be preferred in which situations. Furthermore, there may not be enough variation in genomic estimates of F in unselected ("outbred") samples to detect inbreeding effects with statistical significance.

Current Study

The current study has three main goals. First, there are several potential ways to estimate F: from pedigrees, on a marker-by-marker basis, and from runs of homozygosity. Using simulated data sets that have realistic patterns of molecular variation and linkage disequilibrium, we seek to understand which of these estimates of F are optimal for detecting inbreeding depression and whether this answer depends on the level of inbreeding in the population (assessed using different population sizes). Second, we are interested in comparing the variance, error, and intercorrelations of the various estimates of F arising from populations of different sizes and from populations that followed expansions similar to those that occurred in recent human history. Third, on the basis of the two results above, we estimate the likelihood of detecting inbreeding depression given the predicted variances in the optimal estimate(s) of F. In this report we use the human genome and population history to guide our simulations, because much is known about these parameters in humans and because there has been much interest in detecting the effects of autozygosity on human traits. Nevertheless, as discussed below, the findings of this report readily extend to nonhuman animal populations as well.

Methods

Simulation of sequence and SNP data

We simulated populations of different sizes using the Fregene forward-time simulation program (Chadeau-Hyam et al. 2008). Fregene simulates the evolutionary process of genetic sequence data in a population following the Fisher–Wright model (a monoecious, diploid, randomly mating population that evolves across nonoverlapping generations). Mating was random except that selfing was not allowed. Note that the variation of all estimates of F will differ if mating is not random: nonrandom mating typically increases variation in F, as does a reduction in Ne. Therefore, smaller effective population sizes can serve as a proxy for studying variation in estimates of F in the presence of nonrandom mating.
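To make the mating scheme concrete, the sketch below draws, for each offspring in a constant-size monoecious population, two distinct parents uniformly at random from the previous generation, which is random mating with selfing excluded. It is a minimal illustration under that assumption, not Fregene's implementation; the function name `draw_parents` is hypothetical.

```python
import random

def draw_parents(pop_size: int, rng: random.Random) -> list[tuple[int, int]]:
    """Return one (parent1, parent2) index pair per offspring.

    Both parents are drawn uniformly at random from the previous generation
    (constant size pop_size). random.sample draws without replacement, so the
    two parents always differ and selfing cannot occur.
    """
    return [tuple(rng.sample(range(pop_size), 2)) for _ in range(pop_size)]

rng = random.Random(2011)  # seeded for reproducibility
pairs = draw_parents(100, rng)
print(len(pairs), any(a == b for a, b in pairs))  # 100 False
```

Nonrandom mating could be mimicked by replacing the uniform draw with a weighted one, which is one way to see why it changes the variance of F.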

The effective population size (Ne) of humans has been estimated at 10,000 on the basis of molecular variation (Takahata et al. 1995), and on the basis of linkage disequilibrium patterns it is estimated to have passed through a bottleneck reducing the population size to 3000 for Caucasians and 8000 for Africans (Tenesa et al. 2007). In the present study, we simulated effective population sizes of Ne = 100, 1000, and 10,000; simulating larger population sizes was not computationally feasible. Each individual's genome was composed of two homologous chromosomes of length 100 Mb. We chose 100-Mb genomes for reasons of computational feasibility and because 100 Mb is roughly the size of a typical mammalian chromosome. Fregene's recombination model allows for crossover rates that vary along the chromosome at both broad scales (regions several megabases in length that differ in background recombination rates) and fine scales (corresponding to recombination hotspots of 2 kb in length). Recombination rates for the present set of simulations averaged ν = 1.1 × 10⁻⁸ per site per generation; 80% of recombination events occurred in hotspot regions, and the average distance between hotspots was 8500 bp. Mutations arose at rate μ = 2.3 × 10⁻⁸ per site per generation, and all mutations were neutral with respect to fitness. New mutations occurred uniformly and at random across the genome; thus, mutations could arise at an already polymorphic site, allowing both "double hit" mutations (if occurring at an ancestral allele) and "back" mutations (if occurring at a derived allele), although these types of mutations were extremely rare. These parameters, the defaults in Fregene, were based on estimated per-site recombination and mutation rates in human populations (Chadeau-Hyam et al. 2008).

To reduce the computational time taken by our simulations, we used Fregene's scale_exp option with the scaling parameter λ = 20. This reduced the population size and number of generations 20-fold but increased ν and μ 20-fold, which kept the population mutation parameter θ = 4Neμ and the population linkage parameter ρ = 4Neν constant, thereby mimicking the degree of variation and linkage disequilibrium in nonscaled populations while decreasing the computational time by over an order of magnitude. Populations evolved for 100,000 generations (i.e., across 5000 loops once scaled), ensuring that they reached drift–mutation–recombination equilibrium (Chadeau-Hyam et al. 2008).
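The invariance of θ and ρ under scaling is simple arithmetic: both depend on the product of Ne and a per-site rate, so dividing Ne by λ while multiplying μ and ν by λ leaves them unchanged. The short Python sketch below checks this with the parameter values above (Ne = 10,000 is chosen for illustration); it mirrors the idea behind the scale_exp option, not Fregene's code, and the helper name `scale` is hypothetical.

```python
def scale(ne, mu, nu, generations, lam):
    """Divide Ne and the number of generations by lam; multiply the per-site
    mutation (mu) and recombination (nu) rates by lam."""
    return ne / lam, mu * lam, nu * lam, generations / lam

ne, mu, nu, gens = 10_000, 2.3e-8, 1.1e-8, 100_000
ne_s, mu_s, nu_s, gens_s = scale(ne, mu, nu, gens, lam=20)

theta, rho = 4 * ne * mu, 4 * ne * nu              # unscaled population parameters
theta_s, rho_s = 4 * ne_s * mu_s, 4 * ne_s * nu_s  # same parameters after scaling
print(ne_s, gens_s)                                # 500.0 5000.0
print(abs(theta - theta_s) < 1e-15, abs(rho - rho_s) < 1e-15)  # True True
```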

One drawback of scaling in Fregene is that the mixture of rare to common variants is slightly inaccurate. For example, with λ = 20 and Ne = 1000, no variant can have a minor allele frequency < 0.01 (rather than 0.0005, as it would be if unscaled). To minimize this effect, at the end of the 100,000 generations (5000 loops) we ran Fregene with no scaling for an additional 1000 generations (1000 loops) for each population. The additional unscaled generations allowed polymorphism levels (as judged by heterozygosity) to build up to expected levels. The observed levels of heterozygosity in the three populations at the end of the simulation (9.3 × 10⁻⁴, 9.3 × 10⁻⁵, and 9 × 10⁻⁶ for Ne = 10,000, 1000, and 100, respectively) conformed closely to the values of 4Neμ/(4Neμ + 1) predicted by theory (9.2 × 10⁻⁴, 9.2 × 10⁻⁵, and 9 × 10⁻⁶).
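The theoretical values are easy to recompute. Assuming μ = 2.3 × 10⁻⁸ as stated above, the expression 4Neμ/(4Neμ + 1) gives roughly 9.2 × 10⁻⁴, 9.2 × 10⁻⁵, and 9.2 × 10⁻⁶ for Ne = 10,000, 1000, and 100. A minimal Python sketch (the function name is hypothetical):

```python
def expected_heterozygosity(ne: float, mu: float) -> float:
    """Equilibrium per-site heterozygosity, theta / (theta + 1) with theta = 4*Ne*mu."""
    theta = 4 * ne * mu
    return theta / (theta + 1)

for ne in (10_000, 1_000, 100):
    print(f"Ne = {ne:>6}: H = {expected_heterozygosity(ne, 2.3e-8):.2e}")
```

Because θ is much smaller than 1 here, the denominator is close to 1 and H is essentially θ itself.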

To investigate the effects of population size on various estimates of F, we needed to (a) obtain information on pedigrees going back several generations, (b) derive SNP data from the sequences in the samples, and (c) create samples of equal numbers of individuals despite the varying population sizes. To accomplish the first of these goals, we modified the Fregene program to track and write out pedigree information for each individual going back five generations (as we show, going back farther than this was unnecessary). To accomplish the second goal of creating SNP data, we selected all variants with minor allele frequency (MAF) > 0.05 from the sequence data that Fregene outputs at the final generation. This resulted in 2700 SNPs (1 SNP per 36 kb) in samples from the Ne = 100 population, 27,000 SNPs (1 SNP per 3.7 kb) in samples from the Ne = 1000 population, and 274,000 SNPs (1 SNP per 360 bp) in samples from the Ne = 10,000 population. These differences in SNP densities reflect the larger numbers of variants that naturally occur in larger populations (Crow and Kimura 1970).
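The SNP-selection step and the resulting marker spacing can be sketched as follows. This is a minimal NumPy illustration, assuming the sequence variants are available as an individuals × variants array of 0/1/2 allele counts; the function names are hypothetical and this is not the pipeline used in the study.

```python
import numpy as np

GENOME_BP = 100_000_000  # 100-Mb simulated genome

def common_snp_mask(genotypes: np.ndarray, maf_min: float = 0.05) -> np.ndarray:
    """True for variant columns whose minor allele frequency exceeds maf_min.

    genotypes: (individuals, variants) array of 0/1/2 allele counts.
    """
    freq = genotypes.mean(axis=0) / 2.0      # frequency of the counted allele
    maf = np.minimum(freq, 1.0 - freq)       # fold to the minor allele
    return maf > maf_min

def spacing_kb(n_snps: int, genome_bp: int = GENOME_BP) -> float:
    """Average distance between adjacent SNPs, in kb."""
    return genome_bp / n_snps / 1_000

for n in (27_000, 274_000):
    print(n, round(spacing_kb(n), 2))
```

For 27,000 and 274,000 SNPs this gives about 3.7 kb and 0.36 kb (roughly 365 bp) per SNP, matching the densities quoted above.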

The third goal, creating equal experimental sample sizes for each population size, was motivated by the fact that we did not want our results across different population sizes to be confounded by differing sample sizes. To analyze adequately large samples of equal size across the three population sizes, we proceeded as follows. For the Ne = 100 population, 100 subpopulations (each of Ne = 100) split off from the original Ne = 100 population and evolved independently for 50 generations. Groups of 10 of these subpopulations were randomly placed together (without replacement) to create 10 samples of size n = 1000 each. Because all 100 subpopulations evolved from the same progenitor population, the common variants were generally the same between subpopulations, allowing SNP sets to be created from the n = 1000 samples. Thus, despite having the same levels of inbreeding as a population of size Ne = 100, each of the 10 replicate samples consisted of 1000 individuals. Similarly, for the Ne = 1000 population, 10 subpopulations of size Ne = 1000 split off from the original population (of size Ne = 1000) and evolved independently for 50 generations; at the end of the 50 generations, each of these subpopulations was converted to a sample of size n = 1000, yielding 10 samples. Finally, the Ne = 10,000 population was split into 10 replicate samples of n = 1000 individuals each and evolved for 50 generations. In this way, 10 samples of size n = 1000 were derived from each of the different population sizes (see Figure 1). By evolving independently for 50 generations, each subpopulation (and therefore each sample) had an independent breeding (and inbreeding) history within a 50-generation time frame.

To study the effects of population expansion on estimates of F, we allowed a population of size Ne = 100 that had reached drift–mutation equilibrium to expand instantaneously to Ne = 10,000 and then to evolve at this size for 400 generations [11,400 years for humans (Fenner 2005)]. Computational limitations precluded investigating larger and more realistic expansions (e.g., Ne = 10,000–1,000,000), but conclusions from studying smaller-scale expansions should apply to larger expansions as well. We wrote out pedigree, sequence, and SNP data for 10 samples of size n = 1000, as described above, at generations 0 (immediately before expansion and therefore identical to the Ne = 100 data), 50, 100, 200, and 400. Results from generation 100,000 come from the final generation of the constant Ne = 10,000 population.

A proxy for the homozygous mutation load

Given the evidence that inbreeding depression is caused by homozygosity at numerous partially recessive deleterious mutations (Charlesworth and Willis 2009), and because deleterious mutations rarely reach frequencies > 0.05 in the population (Pritchard 2001), we approximated an individual's overall load of homozygous recessive/partially recessive deleterious mutations by defining mutations as the set of all variants not included in the SNP data (i.e., all variants with MAF < 0.05 in the original sequence data) and summing all such mutations that were homozygous in each individual. We call this measure the "homozygous mutation load." We computed the homozygous mutation load using rare neutral alleles for reasons of computational efficiency, but rare neutral alleles closely approximated the behavior of mildly deleterious, partially recessive alleles. For all population sizes, the observed distribution of allele frequencies of simulated neutral alleles with MAF < 0.05 was very similar to the theoretically expected frequency distributions of either partially recessive (h = 0.25), very mildly deleterious mutations (s ≈ 1/Ne) or fully recessive mutations (h = 0) of much larger effect (s ≈ 50/Ne), where h is the dominance coefficient and s the selection coefficient (see supporting information, Figure S1) (Crow and Kimura 1970). Thus, the homozygous mutation load investigated here simulates what would be observed if inbreeding depression is due to the aggregated effect of homozygosity at a large number of recessive to partially recessive, mildly deleterious alleles.
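A minimal NumPy sketch of this measure follows. It assumes 0/1/2 allele counts for all sequence variants, treats every polymorphic variant outside the SNP set (MAF ≤ 0.05, the complement of the SNP filter) as a "mutation," and counts an individual as homozygous for a mutation when it carries two copies of the minor allele. These conventions and the function name are assumptions made for illustration.

```python
import numpy as np

def homozygous_mutation_load(genotypes: np.ndarray, maf_max: float = 0.05) -> np.ndarray:
    """Per-individual count of rare variants homozygous for the minor allele.

    genotypes: (individuals, variants) array of 0/1/2 counts of one allele.
    Rare = polymorphic and not in the SNP set (MAF <= maf_max).
    """
    freq = genotypes.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    rare = (maf > 0) & (maf <= maf_max)
    # Recode each column as minor-allele counts (flip where the counted allele is the major one).
    minor = np.where(freq <= 0.5, genotypes, 2 - genotypes)
    return (minor[:, rare] == 2).sum(axis=1)
```

Recoding to minor-allele counts first means the result does not depend on which allele the input happens to count.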

Real SNP data

To compare results from simulated SNP data with results from real human SNP data, we used 1000 randomly selected control individuals of Caucasian descent from the publicly available Molecular Genetics of Schizophrenia–GAIN (Genetic Association Information Network) sample (O'Donovan et al. 2008), genotyped on the Affymetrix 6.0 platform. All SNPs passed rigorous quality control metrics (MAF > 0.05, missingness < 0.02, Hardy–Weinberg equilibrium P-values > 0.0001), and individuals who had a missingness rate > 0.02 or who were outliers on the first two principal component dimensions derived from an identical-by-state matrix were dropped. Genome-wide, 546,882 SNPs (1 SNP per 5.1 kb) of 906,600 passed these quality control thresholds. We then selected the first 100 Mb of SNP data from chromosomes 1–10 (after removing the centromere and surrounding heterochromatin, if applicable) to create 10 different SNP samples. We derived 10 replicates of the three genomic estimates of F (defined below) from these samples.
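The SNP and individual filters can be expressed as boolean masks. The sketch below is illustrative only: it assumes genotypes are stored as a float array with `np.nan` for missing calls, takes per-SNP Hardy–Weinberg P-values as a precomputed input (the exact test itself is not implemented), applies the individual filter before the SNP filters, and omits the principal-component outlier step. The inclusive/exclusive threshold boundaries are also assumptions, as is the function name.

```python
import numpy as np

def qc_masks(genotypes: np.ndarray, hwe_p: np.ndarray, maf_min: float = 0.05,
             miss_max: float = 0.02, hwe_min: float = 1e-4):
    """Return (keep_individuals, keep_snps) boolean masks.

    genotypes: (individuals, SNPs) float array of 0/1/2 calls, np.nan = missing.
    hwe_p: per-SNP Hardy-Weinberg P-values, assumed computed beforehand.
    """
    missing = np.isnan(genotypes)
    keep_ind = missing.mean(axis=1) <= miss_max              # drop individuals with > 2% missing
    kept = genotypes[keep_ind]
    freq = np.nanmean(kept, axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    keep_snp = ((maf > maf_min)
                & (np.isnan(kept).mean(axis=0) < miss_max)   # per-SNP missingness
                & (hwe_p > hwe_min))
    return keep_ind, keep_snp
```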

Estimates of F

In some contexts, F can be conceptualized as a parameter rather than as an estimate. For example, Fped is a known property given a pedigree and a base population and can be called a parameter of an individual in this context. In the present context, however, Fped and all other F statistics investigated here are conceptualized as imperfect estimates of the actual level of autozygosity in an individual's genome. The average F estimate in a population depends on the base population (defined as the ancestral population when F = 0) and increases as one considers older base populations. However, the average F is not very important in the context considered here. Variation in F among the individuals in the population is required to detect an inbreeding effect, so it is variation in F that we focus on. We use a recent base population (five generations ago) for pedigree inbreeding and the current population for genomic estimates of F, as presented in Powell et al. (2010).

Figure 1 Procedure for deriving 10 samples from each of three effective population sizes. Circles represent populations, arrows represent evolution and splitting/combining of each population, and squares represent samples of size 1000 derived from each population. The sizes of the shapes correspond to population (circles and arrows) or sample (squares) sizes. See text for details.

In each simulated sample, we calculated four alternative estimates of F, as well as two additional estimates of Froh based on shorter and longer megabase thresholds. In the real SNP data sample, we calculated the three genomic estimates of F. To compare the spread of the estimates of F, the variance of each estimate was computed across the 1000 individuals in a sample, and the log (base 10) of these variances was taken to aid interpretability. As there were 10 replicate samples derived from each population size (see above), we then took the mean of the 10 log (base 10) variances and computed the standard error of each of these means. Each estimate of F is described below.
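The summary statistic described above can be sketched in a few lines of NumPy. The text does not state which variance or standard-error formulas were used, so the sample variance (ddof = 1) and SE = SD/√(number of replicates) below are assumptions, as is the function name. The synthetic input is random noise standing in for real F estimates.

```python
import numpy as np

def log10_variance_summary(estimates: np.ndarray) -> tuple[float, float]:
    """Mean and standard error, across replicate samples, of log10 Var(F).

    estimates: (replicates, individuals) array holding one F estimate; the
    variance is taken across individuals within each replicate.
    """
    log_var = np.log10(estimates.var(axis=1, ddof=1))
    se = log_var.std(ddof=1) / np.sqrt(log_var.size)
    return float(log_var.mean()), float(se)

rng = np.random.default_rng(0)
fake_f = rng.normal(0.0, 0.005, size=(10, 1000))  # synthetic stand-in: SD = 0.5%
print(log10_variance_summary(fake_f))            # mean close to log10(0.005**2), about -4.6
```

Working on the log scale keeps estimates whose variances differ by orders of magnitude comparable on one axis.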

1. Fped is F from pedigree inbreeding going back five generations, as figured from Wright's path formula:

$$F_{\mathrm{ped},j} = \sum_{i=1}^{c} \left(\frac{1}{2}\right)^{n_{ij} + m_{ij} + 1}$$

where $m_{ij}$ and $n_{ij}$ are the numbers of generations in the paternal and maternal paths from the ith common ancestor, and c is the number of common ancestors of the parents of individual j. For example, $n_{ij} = m_{ij} = 2$ for cousin–cousin inbreeding, with c = 2 when the cousins share two grandparents (Fped = 0.0625) and c = 1 when the cousins share one grandparent (Fped = 0.03125). Virtually all variation in Fped is captured within the most recent five generations (see Results), and so paths $m_{ij}$ and $n_{ij}$ were not extended beyond this.

2. Fh is the canonical estimate of genomic F based on excess SNP homozygosity:

$$F_{h,j} = \frac{O(H_j) - E(H)}{m - E(H)}$$

where $O(H_j)$ is the observed homozygosity (the number of homozygous SNPs) across all SNPs for person j, $E(H) = \sum_i [1 - 2p_i(1 - p_i)]$ is the expected homozygosity for all people in the sample, and $p_i$ is the MAF for SNPs $i = 1, \dots, m$. This estimate can be obtained from PLINK (Purcell et al. 2007) using the --het command.

3. Falt is an alternative estimate of F predicted to have lower error (Yang et al. 2010a):

$$F_{\mathrm{alt},j} = \frac{\sum_{i} d_i}{m} - 1$$

where $d_i = 1/p_i$ or $d_i = 1/q_i$ for a homozygote for the minor or the major allele, respectively, and $d_i = 0$ for a heterozygote at SNP i; $q_i$ is the frequency of the major allele at the ith SNP and $p_i = 1 - q_i$.

4. Froh refers to the proportion of the genome (0–1) that is in runs of homozygosity (ROHs):

$$F_{\mathrm{roh},j} = \frac{\sum_{k} \mathrm{length}(\mathrm{ROH}_k)}{L}$$

where $\mathrm{ROH}_k$ is the kth ROH in individual j's genome and $L = 10^8$ bp (100 Mb) is the length of the genome in both the simulated and the real SNP data sets. ROHs were found using PLINK and defined as stretches of continuously homozygous SNPs spanning at least 1.5 Mb (1.65 cM). The lengths of ROH segments generated by a single path should follow an exponential distribution with mean $1/(2g)$ Morgans, where g is the number of generations since the last common ancestor (Fisher 1954). Thus, the expected length of an autozygous ROH segment caused by a common ancestor g = 30 generations in the past is about 1.65 cM (1.5 Mb).
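The four estimators in the list above can be sketched compactly with NumPy. This is a minimal illustration, not the code used in this study. It assumes complete genotype data coded as 0/1/2 allele counts in an individuals × SNPs array, and it assumes ROHs have already been called (e.g., by PLINK) and are supplied as non-overlapping (start, end) coordinates. All function names are hypothetical. PLINK's --het output may treat allele frequencies slightly differently from this plain version of Fh. For Falt, d_i − 1 equals [x² − (1 + 2p)x + 2p²]/[2p(1 − p)] for a genotype with x copies of an allele of frequency p, which gives a convenient numerical check.

```python
import numpy as np

def f_ped(paths):
    """Wright's path formula: sum over common ancestors of (1/2)^(n + m + 1).

    paths: one (n, m) pair per common ancestor, n and m being the numbers of
    generations from the mother and the father to that ancestor. Common
    ancestors are assumed non-inbred relative to the base population.
    """
    return sum(0.5 ** (n + m + 1) for n, m in paths)

def f_h(geno):
    """Excess-homozygosity estimate: (O(H_j) - E(H)) / (m - E(H)).

    geno: (individuals, SNPs) array of 0/1/2 allele counts, no missing calls.
    """
    m = geno.shape[1]
    p = geno.mean(axis=0) / 2.0                       # sample allele frequency
    expected_hom = np.sum(1.0 - 2.0 * p * (1.0 - p))  # E(H), shared by everyone
    observed_hom = (geno != 1).sum(axis=1)            # O(H_j): homozygous SNPs
    return (observed_hom - expected_hom) / (m - expected_hom)

def f_alt(geno):
    """F_alt = mean(d_i) - 1, with d_i = 1/p_i (minor homozygote),
    1/q_i (major homozygote), 0 (heterozygote).

    geno: (individuals, SNPs) allele counts; no monomorphic SNPs. Counting
    either allele gives the same result, because d is symmetric in p and q.
    """
    p = geno.mean(axis=0) / 2.0                       # frequency of the counted allele
    q = 1.0 - p
    d = np.where(geno == 2, 1.0 / p, np.where(geno == 0, 1.0 / q, 0.0))
    return d.mean(axis=1) - 1.0

def f_roh(runs, genome_length=100_000_000, min_length=1_500_000):
    """Fraction of the genome in ROHs of at least min_length bp.

    runs: (start, end) bp coordinates of non-overlapping candidate runs.
    """
    return sum(e - s for s, e in runs if e - s >= min_length) / genome_length

print(f_ped([(2, 2), (2, 2)]))  # 0.0625: parents are full first cousins
print(f_ped([(2, 2)]))          # 0.03125: parents are half first cousins
print(f_roh([(0, 2_000_000), (10_000_000, 10_800_000)]))  # 0.02: the 0.8-Mb run is too short
```

Raising or lowering `min_length` gives the shorter- and longer-threshold variants of Froh.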

To compare alternative Froh thresholds, we defined Froh-short using runs of homozygous SNPs > 0.5 Mb and Froh-long using runs of homozygous SNPs > 5 Mb. These three threshold values (0.5 Mb, 1.5 Mb, and 5 Mb) are the same as those used by McQuillan et al. (2008) in their investigations of Froh in European populations. Froh-long is likely to miss many autozygous stretches of ancient origin (e.g., from more than 10 generations back, which have expected sizes of less than 5 Mb) but is also likely to include very few ROHs that are not truly autozygous. On the other hand, Froh-short can detect more ancient autozygous segments (e.g., from 100 generations back) at the expense of a higher false detection rate, i.e., detecting a higher proportion of ROHs that are allozygous rather than autozygous. Such allozygous haplotypes may be superficially similar, creating runs of homozygosity at measured SNPs, yet be heterozygous at unmeasured rare mutations. Using the expected exponential distribution of ROH segments, our thresholds of 0.5 Mb, 1.5 Mb, and 5 Mb should capture 58%, 20%, and 0.5% (respectively) of all ROHs created from common ancestors 50 generations in the past.
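The exponential model behind these thresholds can be evaluated directly. The short Python sketch below assumes 1.1 cM per Mb, the ratio implied by 1.5 Mb ≈ 1.65 cM and by the simulated average recombination rate. Under that assumption it gives roughly 58%, 19%, and 0.4% for g = 50, close to the percentages quoted above. The function and constant names are hypothetical.

```python
import math

CM_PER_MB = 1.1  # assumed conversion, matching 1.5 Mb ~ 1.65 cM above

def mean_segment_cm(g: int) -> float:
    """Mean autozygous segment length (cM) for a common ancestor g generations
    back: exponential with mean 1/(2g) Morgans."""
    return 100.0 / (2 * g)

def fraction_captured(threshold_mb: float, g: int) -> float:
    """Share of segments longer than threshold_mb under the exponential model."""
    threshold_cm = threshold_mb * CM_PER_MB
    return math.exp(-threshold_cm / mean_segment_cm(g))

print(round(mean_segment_cm(30), 2))  # 1.67 (cM) for g = 30
for t in (0.5, 1.5, 5.0):
    print(f"> {t} Mb: {fraction_captured(t, 50):.1%}")
```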

Results

Time to most recent common ancestor between spouses

Given that ancient inbreeding is central to the present investigation, and that the relationship between mates determines the inbreeding coefficient of their progeny, it is useful to have an intuition of how long ago the most recent common ancestor of two mates lived in a randomly mating population. A similar issue has been investigated with respect to the amount of overlap in pedigrees between two individuals (Ohno 1996; Derrida et al. 2000), as we discuss below. Similarly, Chang (1999) assessed the number of generations one must travel back before reaching the most recent common ancestor of every human alive today, and it is surprisingly recent: about 110 generations (3100 years) ago (see also Rohde et al. 2004; Lachance 2009). Clearly, the most recent common ancestor of just two randomly chosen individuals (e.g., spouses) must be much more recent still. This can be quantified simply in a Fisher–Wright model.

For a population of constant size, consider two randomly chosen individuals (spouses), X and Y. Take a single ancestor of spouse X who lived t generations ago. Given nonoverlapping generations, the probability that this ancestor is the same person as any of spouse Y's ancestors who lived t generations ago is $A_t/N_t$. Here, $A_t$ is the total number of ancestors of Y at generation t, and $N_t$ is the total number of all ancestors in the population at generation t. This follows from the fact that the probabilities that the ancestor of X is related to each of Y's ancestors are independent of each other (i.e., multiple ancestors of Y can be the same person). The probability that the ancestor of X is unrelated to any ancestor of Y is therefore $1 - A_t/N_t$. Assuming that X also has $A_t$ distinct ancestors at time t, the probability that no ancestor of X is related to any ancestor of Y at time t is $(1 - A_t/N_t)^{A_t}$. Finally, the probability $c_g$ that a randomly chosen spousal pair share no common ancestors (no pedigree inbreeding) up to and including generation g is

$$c_g = \prod_{t=1}^{g} \left(1 - \frac{A_t}{N_t}\right)^{A_t}$$

The above calculation for $c_g$ relies on estimates of $A_t$ and $N_t$ that are not straightforward to derive. As an approximation, $A_t \approx 2^t$ and $N_t \approx N_e$. The latter assumption is probably close to the truth on average. The former ($A_t \approx 2^t$), however, must overestimate the number of ancestors at time t, because $A_t \le N_t$, whereas $2^t$ grows exponentially while $N_t$ remains constant. Put another way, as one travels back in a pedigree, duplicate ancestors grow increasingly common because of ancient inbreeding. This biases $c_g$ to be lower than it should be. However, further investigation (not shown) indicates that this bias is minimal: it changes how far one must go back to find a common ancestor by no more than a single generation.
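The calculation above can be sketched in a few lines of Python. The sketch uses the same approximations ($A_t \approx 2^t$, $N_t \approx N_e$). The cutoff for treating a shared ancestor as "almost certain" (c_g below 0.01) is our illustrative choice, not one stated by the authors, and the function names are hypothetical.

```python
import math

def prob_no_common_ancestor(g: int, ne: float) -> float:
    """c_g = prod_{t=1}^{g} (1 - A_t/N_t)^{A_t}, with A_t ~ 2**t and N_t ~ ne."""
    log_c = 0.0
    for t in range(1, g + 1):
        a_t = 2.0 ** t                         # approximate ancestors per spouse at generation t
        if a_t >= ne:                          # more ancestor slots than people: overlap is certain
            return 0.0
        log_c += a_t * math.log1p(-a_t / ne)   # log of (1 - A_t/N_t)^{A_t}
    return math.exp(log_c)

def generations_until_shared(ne: float, tol: float = 0.01) -> int:
    """Smallest g for which c_g < tol, i.e. a shared ancestor is nearly certain."""
    g = 1
    while prob_no_common_ancestor(g, ne) >= tol:
        g += 1
    return g

for ne in (100, 1_000, 10_000, 1_000_000, 100_000_000):
    print(f"Ne = {ne:>11,}: g = {generations_until_shared(ne)}")
```

With the 0.01 cutoff, this prints 5, 6, 8, 11, and 15 generations for Ne = 100, 1000, 10,000, 1 million, and 100 million, respectively. This matches the pattern of roughly two extra generations per 10-fold increase in population size.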

Figure 2 shows expected values of $c_g$ for four different population sizes: the three investigated in the present simulation, as well as Ne = 1 million, which is probably a larger mating pool than most modern humans belong to. We checked our theoretical predictions against simulation results for the three population sizes (Ne = 100, 1000, and 10,000) and the five generations of ancestry we recorded. The expected probabilities (solid dots) agree well with the observed proportions from the simulated data (colored x's).

Figure 2 shows that mates are likely to share a common ancestor in their recent ancestral past. For populations of size Ne = 1000, almost all spousal pairs have at least one common ancestor within the past 6 generations. For Ne = 10,000, spousal pairs share a common ancestor within about 8 generations. Indeed, for every 10-fold increase in population size, the number of generations one must go back before a common ancestor becomes certain increases by roughly 2. Thus, at Ne = 1,000,000, spouses are almost certain to share a common ancestor within the last 11 generations. Increasing this population size to 100 million pushes this estimate back only to 15 generations. ROHs caused by inbreeding from common ancestors 15 generations in the past are about 3 Mb long on average. However, because of the finite number of recombination events per generation, inbreeding events from 15 generations in the past often cause no ROHs. These results are consonant with those of Derrida et al. (2000), who found substantial overlap between the pedigree trees of two randomly chosen individuals in a population of size 10,000 within 10 generations. They also found that this overlap is complete (both individuals share exactly the same ancestors) within 20 generations.

Variance of Fped

Given that the contribution to Fped at generation t is independent of the contribution to Fped from previous generations, the variation of Fped for a given population size can be apportioned into the amount contributed by each ancestral generation. Figure 3 shows that the contribution to the variation of Fped decreases log-linearly as a function of the contributing ancestral generation. The bend (nonlinearity) at generation 1 for the samples drawn from the Ne = 10,000 population occurs because several samples had no sib–sib inbreeding, making log10(var(Fped due to sib–sib inbreeding)) = −∞. Such values were set to "missing," biasing the remaining nonmissing values upward.

Figure 2 Shown is the probability that mates share no common ancestors in the most recent g generations as a function of population size (see text). The x's are the same values derived empirically from simulations for up to 5 generations in the past and show good agreement with the expected probabilities. Even in large randomly breeding populations (e.g., 1 million), it is almost certain that at least one ancestor exists in common between two pedigrees within 11 generations.

Our results show that very little variation in Fped was missed by ignoring inbreeding resulting from common ancestors more than five generations back (see Methods). This is despite the fact that quite long (e.g., 10 Mb) ROHs can be created from such inbreeding. We had no way of measuring the proportion of variation in genomic estimates of F that was due to each ancestral generation. Nevertheless, our results below demonstrate that genomic estimates of F retain considerable variation in large populations after variation in Fped has diminished to nearly 0.

Variation and error in genomic and pedigree estimates of F

One important aim of the current project was to investigate the effects of population size on the variance of the four estimates of F. Variances were taken across estimates of F for each individual in samples of size 1000 drawn from the three population sizes (Ne = 100, 1000, and 10,000). Figure 4 shows the mean (±1 SE) of the log10 variances of F as a function of Ne. The variance decreases for all estimates of F as Ne increases, owing to the lower levels of ancient and recent inbreeding in larger populations. As expected (Figure 3), Fped shows the least variation at every population size because it does not capture the variation in F caused by the random nature of recombination and segregation. The two estimates of F based on SNP-by-SNP homozygosity (Fh and Falt) show the highest variation, while Froh is intermediate. Figure 4 also shows the variance values for the three genomic estimates of F in 10 replicates of 100 Mb of real SNP data from an unselected (outbred) Caucasian control sample. The variances of Fh and Falt are slightly higher, and the variance of Froh slightly lower, in the real data than in the Ne = 10,000 simulation.

The prediction error variance (PEV) of the three genomic estimates of F is shown in Figure 5. PEV = var(F1 − F2)/2, where F1 is the estimate of F from a random half of SNPs and F2 is the estimate of F from the other half of SNPs (Powell et al. 2010; Yang et al. 2010a). PEV provides insight into the amount of error in the genomic estimates of F that exists due to the finite sampling of SNPs. As SNPs become more dense and linkage disequilibrium between markers increases (as occurs with larger Ne), the PEV should decrease. The PEV for all estimates of F is small (≤5%) compared to the variances of F shown in Figure 4. As expected, the PEV of every estimate of F also decreases with population size, but the PEV for Froh decreases most rapidly. Error in Froh decreases more rapidly as a function of Ne because, with dense SNP data (such as exist when Ne = 10,000), ROHs are made up of a large number of homozygous SNPs in a row (e.g., a ROH of 1.5 Mb contains about 2000 SNPs in the Ne = 10,000 data). Such long sets of markers pick up the same sets of ROHs with very high precision.
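The split-half definition of PEV can be illustrated with a short Python/NumPy sketch. Everything here is a toy assumption rather than the authors' pipeline:

- Genotypes are simulated for unlinked SNPs.
- Each individual's true inbreeding is drawn from a hypothetical Beta(1, 30) distribution.
- The per-individual estimate is simply the proportion of homozygous SNPs, a simplified stand-in for Fh, Falt, or Froh.

```python
import numpy as np

def homozygosity(genotypes: np.ndarray) -> np.ndarray:
    """Per-individual proportion of homozygous SNPs (genotypes coded 0/1/2; 1 = heterozygous)."""
    return (genotypes != 1).mean(axis=1)

def split_half_pev(genotypes: np.ndarray, estimator, rng: np.random.Generator) -> float:
    """PEV = var(F1 - F2) / 2, where F1 and F2 use two random, disjoint halves of the SNPs."""
    perm = rng.permutation(genotypes.shape[1])
    half = len(perm) // 2
    f1 = estimator(genotypes[:, perm[:half]])
    f2 = estimator(genotypes[:, perm[half:2 * half]])
    return float(np.var(f1 - f2, ddof=1) / 2)

def simulate_genotypes(n_ind: int, n_snp: int, rng: np.random.Generator) -> np.ndarray:
    """Toy data: individual i is IBD at each (unlinked) SNP with probability f_true[i]."""
    freqs = rng.uniform(0.05, 0.5, size=n_snp)
    f_true = rng.beta(1, 30, size=n_ind)               # hypothetical spread of true inbreeding
    ibd = rng.random((n_ind, n_snp)) < f_true[:, None]
    allele1 = rng.random((n_ind, n_snp)) < freqs
    allele2 = np.where(ibd, allele1, rng.random((n_ind, n_snp)) < freqs)
    return allele1.astype(np.int8) + allele2.astype(np.int8)

rng = np.random.default_rng(42)
geno = simulate_genotypes(500, 8000, rng)
f_hat = homozygosity(geno)
pev = split_half_pev(geno, homozygosity, rng)
print(f"var(F) = {f_hat.var(ddof=1):.2e}  PEV = {pev:.2e}")
```

Each half uses only half of the SNPs, so the PEV computed this way reflects the sampling error of a half-panel estimate. Adding SNPs shrinks it, which is the behavior the text describes for denser data.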

Relationships between genomic estimates of F and Fped

Figure 6 shows the correlation coefficients between the three genomic estimates of F and Fped. In all cases, Froh correlates most highly with Fped, reflecting the fact that Froh tends to capture more recent inbreeding than the other two genomic estimates of F. The increasingly large standard errors of the correlations are due to the higher standard errors and lower variances of Fped at larger population sizes.

Finding the optimal measure for detecting inbreeding depression: Relationships between estimates of F and the homozygous mutation load

To understand which estimate of F is likely to be optimal for detecting inbreeding depression effects, we correlated each of the four estimates of F with the homozygous mutation load. As explained above, homozygosity at recessive mutations is the leading explanation for why inbreeding depression occurs. Figure 7 shows these results. Fped is the worst predictor of the homozygous mutation load, and its disadvantage grows as the population size increases. This result is due to the low variance in Fped in randomly breeding populations. It suggests that Fped is likely to be a useful estimate only in samples selected on the basis of recent consanguinity. Falt outperformed Fh because Falt gives more weight to rare variants and because Falt has lower error variance (Figure 5). As predicted, Froh was the estimate most associated with the recessive mutation load at every population size, and this advantage grows as the population size increases (Figure 7).

Figure 3 Contribution to var(Fped) from previous generations. Most of the variance in Fped at all population sizes is attributable to recent inbreeding. For example, the variance in Fped due to spouses sharing common ancestors five generations in the past makes up only 0.2% of the total variance in Fped at all population sizes.

The superiority of Froh in detecting the homozygous mutation load is even more stark if one considers the unique effects of each estimate of F after controlling for their correlations with Froh. After controlling for Froh, the mean partial correlations (across Ne) between the homozygous mutation load and Fh (rp = 0.07), Falt (rp = 0.15), and Fped (rp = 0.09) are all low. The reverse is not true: partial correlations between the homozygous mutation load and Froh remain high after controlling for Fh (rp = 0.57), Falt (rp = 0.50), and Fped (rp = 0.68). Thus, Froh contains much additional information on the homozygous mutation load that is not captured by the other three estimates of F. In contrast, the other three estimates of F contain little additional information on the homozygous mutation load beyond what already exists in Froh. The superiority of Froh occurs because a given rare variant is typically homozygous only when the haplotypic segment on which it exists meets another IBD segment within an individual, creating a run of homozygosity. If inbreeding depression is caused by homozygosity at rare mutations, as recent evidence indicates, these results suggest that Froh is likely to be the optimal estimate for detecting it regardless of the level of inbreeding in the population.
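The asymmetry in these partial correlations follows from the standard first-order partial correlation formula, r_xy·z = (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The Python sketch below applies it to made-up correlations (not values from this study). The inputs are chosen so that, as in the simulations, Froh tracks the load more closely than Fh does.

```python
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical correlations, for illustration only.
r_load_froh, r_load_fh, r_froh_fh = 0.80, 0.50, 0.60

# Association of Fh with the load after controlling for Froh
rp_fh = partial_corr(r_load_fh, r_load_froh, r_froh_fh)
# Association of Froh with the load after controlling for Fh
rp_froh = partial_corr(r_load_froh, r_load_fh, r_froh_fh)
print(f"rp(load, Fh | Froh) = {rp_fh:.3f}")
print(f"rp(load, Froh | Fh) = {rp_froh:.3f}")
```

With these illustrative inputs, controlling for Froh leaves Fh with almost no association with the load (about 0.04). Controlling for Fh, by contrast, leaves Froh strongly associated with the load (about 0.72).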

Figure 8 compares the correlation coefficients between the homozygous mutation load and Froh calculated from short (≥0.5 Mb), moderate (≥1.5 Mb), and long (≥5 Mb) ROH thresholds as a function of Ne. Froh_short correlates poorly with the homozygous mutation load when Ne is small. In large populations, where inbreeding is more ancient, the situation reverses: Froh defined by short ROHs outperforms the other two Froh estimates. The improvement of Froh_short as a function of population size is due to two factors: (a) denser SNP data in the larger populations allow even short (0.5 Mb) ROHs to discriminate autozygosity accurately, and (b) the ratio of short to long ROHs is higher in large populations. Froh_long performs poorly when Ne = 10,000 due to low variance. Froh defined by the moderate (1.5 Mb) threshold performed intermediately at each population size. This justifies using this threshold in the current study, as it allowed more meaningful comparisons across the three population sizes.

Changes in F following population expansion

We wanted to understand how recent population expansion, such as occurred in human lineages over the last roughly 10,000 years (about 350 generations), might alter our conclusions. Figure 9 shows the change in the mean log10(variance) of the four estimates of F as a function of generations since a 100-fold expansion in population size, from Ne = 100 to Ne = 10,000. The variance in Fped decreases immediately (a 97% drop) and remains unchanged thereafter, whereas the three genomic estimates of F change much more gradually. The initial declines in the variances of the three genomic estimates of F are caused by the drop in recent inbreeding in larger populations. However, the declines in variances become much slower thereafter. This reflects the slow decay of linkage disequilibrium and the gradual increase in the number of new segregating variants following a population expansion.

For example, after expanding from Ne = 100 to Ne = 10,000, the standard deviation of Froh decreases by 70%, from 0.115 to 0.035, in the first 100 generations. It then decreases only an additional 10% (to 0.021) over the next 300 generations and requires several thousand generations to reach its equilibrium value of 0.015. Similarly, starting at 0.142, the standard deviation of Falt decreases to 0.069 and 0.055 in generations 100 and 400, respectively. Our results should extrapolate to larger starting and ending population sizes (e.g., Ne = 10,000 to Ne = 1,000,000). They indicate that genomic estimates of F require considerable time following expansion to reach the low variance values that would be predicted from the larger population sizes.

Figure 4 Variance of F (±1 SE) as a function of Ne. The variance of Fped is the lowest at all population sizes, and the variance of Froh is intermediate. Comparing the variance of genomic measures of F in simulated data to the equivalent variances in real SNP data (from an outbred Caucasian sample) suggests that the effective population size of Caucasians is on the order of 10,000 with respect to these statistics.

Figure 5 Prediction error variance (PEV) of genomic estimates of F as a function of Ne. PEV decreases as a function of population size for all genomic estimates of F but does so most rapidly for Froh.

Figure 10 shows the correlations between estimates of F and the homozygous mutation load following population expansion. This correlation drops considerably for all estimates of F following population expansion. It remains low for Fped, Fh, and Falt in the 400 generations following the expansion, whereas the correlation with Froh begins to increase after 50 generations. When Ne is small (i.e., 100), rare mutant alleles are tagged by long haplotypes as a result of relatively recent inbreeding. When the population size is increased, these long haplotypes are destroyed by recombination, and new ones are not created quickly because new inbreeding occurs slowly. Consequently, the correlation between F and homozygous mutation load declines, as did the variance of F (Figure 9). However, as the population evolves at size Ne = 10,000 for an increasing number of generations, new mutations arise that exist on only a single haplotype. The pairings of these haplotypes, which create homozygosity at rare mutations, are increasingly detected by Froh, and eventually by Fh and Falt, as a new equilibrium is reached. These results provide further evidence that Froh is likely to be the optimal estimate for detecting inbreeding depression in populations, such as humans, that have expanded rapidly in size.

Power to detect inbreeding effects

The power to detect inbreeding depression from marker data depends in part on the accuracy of predicting autozygosity with markers, and our results suggest that Froh is optimal for detecting autozygosity arising from both ancient and recent inbreeding. However, power also depends on the variation in F, as well as on the effect size and the sample size. When the level of inbreeding is low, such as in large randomly mating populations, the variance of Froh is commensurately low (e.g., the standard deviation of Froh = 0.015 when Ne = 10,000). The standard deviation of Froh in observed human SNP data is smaller yet: 0.012. Given such low variance in the predictor, we asked whether inbreeding depression is detectable using Froh in human samples. If it is, we also wanted to know what sample sizes would be required to achieve adequate power, as a function of different levels of inbreeding (assayed using different effective population sizes). To do this, we needed an estimate of the likely effect size of inbreeding depression on a complex trait in humans and an estimate of the variance of Froh scaled up to genome-wide levels.

To our knowledge, the effects of inbreeding on IQ have been investigated more than those on any other complex human trait. We therefore used a best estimate of the inbreeding effect on IQ as a guide to the likely inbreeding effect sizes among other human complex traits. Morton (1979) reviewed four large studies of cousin–cousin inbreeding and found that IQ decreased by 0.73 (Schull and Neel 1965) to 0.39 (Kudo et al. 1972) points for every 0.01 increase in Fped. Given the standard deviation of IQ (15 points), this translates to a decrease of 0.025–0.05 standard deviations per percentage point of inbreeding.
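In symbols, the conversion divides each per-percentage-point IQ decrement by the IQ standard deviation:

$$\frac{0.39\ \text{IQ points}}{15\ \text{IQ points}} = 0.026, \qquad \frac{0.73\ \text{IQ points}}{15\ \text{IQ points}} \approx 0.049 \qquad \text{(SD per 0.01 increase in } F_{ped}\text{)}$$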

Figure 6 Correlations between Fped and genomic estimates of F as a function of Ne. All correlations between genomic estimates of F and Fped decrease as a function of Ne, but Froh is consistently correlated most highly with Fped.

Figure 7 Correlations between F and homozygous mutation load as a function of Ne. Froh correlates most highly with the homozygous mutation load at all population sizes, and this advantage increases at larger population sizes (where inbreeding becomes more ancient).

To predict the statistical power of using Froh to detect inbreeding depression in human studies, we also needed to estimate what the variance of Froh would be if derived genome-wide rather than across 100 Mb, as simulated in the current study. To do this, we compared the average variance of Froh across 10 replicates of 100 Mb of real SNP data (1.5 × 10^−4) to the genome-wide variance of Froh in the real SNP data (1.79 × 10^−5), an 8.45-fold drop in variance. We thus scaled the variances of Froh down 8.45-fold to account for the decreased variance that would be observed in Froh genome-wide. This led to the following predicted genome-wide variances (standard deviations) of Froh:

- 1.4 × 10^−3 (0.037) for Ne = 100
- 2.5 × 10^−4 (0.016) for Ne = 1000
- 2.63 × 10^−5 (0.005) for Ne = 10,000
- 1.79 × 10^−5 (0.004) for real SNP data
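The standard deviations in parentheses are the square roots of the predicted genome-wide variances. Each simulated variance is first divided by the 8.45 scaling factor:

$$\sigma^2_{\text{genome}} = \frac{\sigma^2_{100\,\text{Mb}}}{8.45}; \qquad \sqrt{1.4\times10^{-3}} \approx 0.037,\quad \sqrt{2.5\times10^{-4}} \approx 0.016,\quad \sqrt{2.63\times10^{-5}} \approx 0.005,\quad \sqrt{1.79\times10^{-5}} \approx 0.004$$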

Figure 11 shows the statistical power, as a function of sample size, of a regression of an outcome variable on Froh. It assumes a slope of −0.73 (solid lines) and −0.39 (dashed lines), a standard deviation of the outcome variable of 15, and standard deviations of Froh that depend on the population size, as detailed in the preceding paragraph. We assume that b ∼ t(n − 2), with mean −0.73 or −0.39, and

$$\mathrm{var}(b) = \frac{\mathrm{var}(IQ) - b^2\,\mathrm{var}(F_{roh})}{\mathrm{var}(F_{roh})\,(n-2)}$$

Our results suggest that sample sizes of ≤700 are sufficient to achieve adequate (80%) statistical power when the rate of inbreeding is high (similar to that found in a randomly breeding population of size Ne = 100). Such rates could occur in samples selected on the basis of likely recent inbreeding or in small isolated populations. Our results also suggest that inbreeding depression effects can be detected using Froh in large, ostensibly outbred populations, such as those found in modern industrialized societies. However, large samples (on the order of 12,000–65,000, depending on effect size) are probably necessary to detect them reliably. Phenotypes more closely related to fitness than IQ may show a larger inbreeding depression effect and would thus require smaller sample sizes than those predicted here to achieve adequate power.
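The power calculation described above can be sketched in Python. The assumptions are ours rather than the authors':

- a two-sided test at α = 0.05;
- a normal approximation to the t(n − 2) reference distribution;
- slopes read as IQ points per 0.01 (one percentage point) of F, matching the Morton (1979) figures, so the genome-wide var(Froh) values are rescaled to percentage-point units.

The function names are hypothetical.

```python
import math
from statistics import NormalDist

VAR_IQ = 15.0 ** 2  # outcome variance (SD of 15 points)

def se_slope(b: float, var_f: float, n: int, var_y: float = VAR_IQ) -> float:
    """sqrt(var(b)), with var(b) = (var(y) - b^2 var(F)) / (var(F) (n - 2))."""
    return math.sqrt((var_y - b**2 * var_f) / (var_f * (n - 2)))

def power(b: float, var_f: float, n: int, alpha: float = 0.05, var_y: float = VAR_IQ) -> float:
    """Approximate two-sided power, using a normal approximation to t(n - 2)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(b) / se_slope(b, var_f, n, var_y)
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

def n_for_power(b: float, var_f: float, target: float = 0.80,
                alpha: float = 0.05, var_y: float = VAR_IQ) -> int:
    """Smallest n whose approximate power reaches the target (closed form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(target)
    return math.ceil(2 + z**2 * (var_y - b**2 * var_f) / (b**2 * var_f))

# Genome-wide var(Froh) from the text (proportion^2 units), converted to
# percentage-point^2 units so that slopes are in IQ points per 0.01 of F.
VAR_FROH = {"Ne=100": 1.4e-3, "Ne=1000": 2.5e-4, "Ne=10000": 2.63e-5, "real data": 1.79e-5}
for label, var_prop in VAR_FROH.items():
    var_pct = var_prop * 100**2
    sizes = [n_for_power(b, var_pct) for b in (-0.73, -0.39)]
    print(f"{label:>10}: n for 80% power (large, small effect) = {sizes}")
```

Under these assumptions, the required sample sizes for the outbred scenarios range from roughly 12,600 (Ne = 10,000, larger effect) to about 65,000 (real SNP data, smaller effect), in line with the range quoted above. Values for the inbred scenarios need not match Figure 11 exactly, because the authors' precise procedure is not specified here.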

Discussion

Hundreds of scientific investigations have been conducted on the effects of inbreeding since Darwin (1868, 1876) first studied the topic. The inbreeding coefficient F has traditionally been defined in terms of pedigree inbreeding (Fped). However, low variation in Fped seriously hampers the ability to detect inbreeding effects in samples that are not selected on the basis of known recent inbreeding. In essence, Fped is a genome-wide expectation for the percentage of the genome that is autozygous. There is, however, a great deal of variation in autozygosity around this expected percentage, caused by the finite number of recombination events per generation. The resulting identical haplotypes can persist in the population for many generations, coming together in offspring of distantly related individuals to create increased levels of homozygosity. The availability of genomic marker panels has made possible new genomic alternatives to Fped that can better detect such identical haplotypes and, therefore, even quite ancient inbreeding.

In the last 10 years, a large number of human (reviewed in Ku et al. 2010) and animal (reviewed in Chapman et al. 2009) studies have investigated the relationship between genomic estimates of F and disease- or fitness-related traits. In general, animal studies have focused on excess homozygosity on a marker-by-marker basis (Fh). Human studies, which have the opportunity to use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).

Figure 8 Correlations between alternative thresholds of Froh and the homozygous mutation load as a function of Ne. Runs of homozygosity (ROHs) are defined as stretches of ≥0.5-Mb, ≥1.5-Mb, or ≥5-Mb homozygous SNPs. Long thresholds are optimal for detecting autozygosity in highly inbred populations, whereas shorter thresholds are optimal for detecting autozygosity in outbred populations.

Figure 9 Variance of F (±1 SE) as a function of generations since population expansion. Whereas Fped drops immediately following a population expansion, the variance in genomic measures of F requires hundreds of generations to reach equilibrium levels.

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even in cases where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression. This advantage is especially pronounced following a population expansion such as has occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).

Nevertheless, the variance in Froh in large simulated (Ne = 10,000) and observed Caucasian SNP data sets is low. Because of this, there is likely to be little power to detect inbreeding effects in unselected (with respect to recent inbreeding) samples unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ–inbreeding effects using Froh in unselected samples. Thus, current studies of the effects of Froh on human complex traits that have sample sizes <3000 and that have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.

Our findings suggest two strategies for achieving sufficient power in studying inbreeding depression of complex traits using a genomic estimate of F. The first is to conduct analyses on very large (e.g., n > 10,000) samples, such as those being assembled by multisite consortia (International Schizophrenia Consortium 2009). The second is to conduct analyses on smaller samples (perhaps 1000–3000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to be a more powerful approach for detecting inbreeding depression than any other alternative.

While much of the focus in this article has been on understanding the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will manifest lower levels of variation in F than those reported here. They will therefore require commensurately larger sample sizes to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller sample sizes.

Figure 10 Correlations between F and homozygous mutation load as a function of generations since population expansion. The relationship between the homozygous mutation load and Froh increases most quickly following a population expansion.

Figure 11 Estimated power to detect inbreeding effects on a human complex trait using Froh. Higher levels of real inbreeding (smaller Ne) lead to higher variance in Froh and thus greater statistical power to detect an inbreeding effect. Large (solid lines) and small (dashed lines) inbreeding effect sizes were derived from a review on the effects of consanguinity on IQ (see text). Arrows show predicted sample sizes required to achieve 80% power. When inbreeding is high (Ne = 100), sample sizes of roughly 400 are adequate. In outbred populations (Ne = 10,000 or real SNP data), however, sample sizes of 20,000 or more may be required.

The simulation parameters explored in this article were chosen to mimic the human genome. Recombination rates and recombination hotspot rates can vary at least threefold across mammals (Jensen-Seaman et al. 2004; Kauppi et al. 2004). Nevertheless, the basic conclusions made here are not overly dependent on the specific parameters chosen and can readily be applied to animals with different recombination parameters. When distance is conceptualized in centimorgans rather than base-pair units, different recombination rates lengthen or shorten the genome vis-à-vis our simulation. This would slightly lower or raise (respectively) the variance of Froh, as shown above when variation in Froh was extrapolated to its expected genome-wide value. Similarly, when genomic distance is measured in centimorgans, different recombination hotspot parameters change the variation in SNP density vis-à-vis our simulation. This should have minimal effect on the ROHs detected, and therefore on Froh, because even the shortest ROH encompasses 175 hotspots on average. Thus, the specific values in our results depend on the specific simulation parameters used and are therefore most relevant to human populations. The qualitative conclusions regarding the advantages of Froh over the alternative estimates of F, however, apply across a wide range of parameters and extend to nonhuman animals as well.

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives not only from its relevance to many theoretical concepts in population genetics but also from its evolutionary and practical implications. Evolutionists have used inbreeding to gauge which traits are likely to have been under ancestral selection, given that the traits most affected by it tend to be those most related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Bittles and Black 2010b; Rudan et al. 2003a). Rates of recent inbreeding are not low across the world. For example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have detectable effects, and we have demonstrated the optimal way of identifying such ancient inbreeding. Investigations into ancient inbreeding effects should help investigators understand the evolutionary forces acting on the genes underlying trait variation, and whether inbreeding represents an important risk factor in disease.
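The Fped ≥ 0.016 cutoff is the inbreeding coefficient of a child of second cousins. A small Python sketch using Wright's path-counting formula reproduces it. The formula is F = Σ (1/2)^(n1 + n2 + 1)(1 + F_A), summed over the paths through each common ancestor A, where n1 and n2 are the numbers of generations from each parent up to A. The function name is hypothetical. The sketch assumes full cousins whose two shared ancestors are themselves non-inbred (F_A = 0 by default).

```python
from fractions import Fraction

def cousin_inbreeding(degree: int, f_ancestor: Fraction = Fraction(0)) -> Fraction:
    """Wright's path coefficient for the child of two full cousins of the given degree.

    Cousins of degree k share two common ancestors (a couple) who sit k + 1
    generations above each parent, so each ancestor contributes
    (1/2)^(2(k + 1) + 1) * (1 + F_A).
    """
    n = degree + 1                                   # generations from each parent to the shared couple
    per_ancestor = Fraction(1, 2) ** (2 * n + 1) * (1 + f_ancestor)
    return 2 * per_ancestor                          # two common ancestors

for k in (1, 2, 3):
    f = cousin_inbreeding(k)
    print(f"cousins of degree {k}: F = {f} = {float(f):.4f}")
```

Exact fractions make the pattern easy to see: 1/16 for first cousins, 1/64 ≈ 0.016 for second cousins, and 1/256 for third cousins. Each additional degree of cousinship divides F by 4.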

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting and finding optimal thresholds for the simulation, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of Genotypes and Phenotypes (dbGaP), found at http://www.ncbi.nlm.nih.gov/gap, through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study. M.C.K. conducted the analyses, and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinform. 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity–fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.


Enciso-Mora, V., F. J. Hosking and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur J Hum Genet 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am J Phys Anthropol 128: 415–423.

Fisher, R. A., 1954 A fuller theory of "junctions" in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor Popul Biol 11: 60–80.

Gibson, J., N. E. Morton and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum Mol Genet 15: 789–795.

Gindilis, V. M., R. G. Gainullin and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet Res (Camb) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res 14: 528–538.

Kauppi, L., A. J. Jeffreys and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat Rev Genet 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum Genet 129: 1–15.

Kudo, A., K. Ito and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances and school attendance in Shizuoka school-children. Jpn J Hum Genet 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J Theor Biol 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II. Familial adenocarcinomatosis. Am J Med Genet 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc Natl Acad Sci USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am J Hum Genet 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum Mutat 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am J Hum Genet 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc Natl Acad Sci USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat Genet 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: what prevents the exponential increase of one's ancestors? Proc Natl Acad Sci USA 93: 15276–15278.

Powell, J. E., P. M. Visscher and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat Rev Genet 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol Evol 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J Med Genet 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll Anthropol 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10001 Dalmatians": Croatia launches its national biobank. Croat Med J 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet Res 35: 131–155.

Takahata, N., Y. Satta and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor Popul Biol 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr Genet 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am J Hum Genet 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am Nat 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J Clin Endocrinol Metab 95: 3777–3782.

Communicating editor: N. A. Rosenberg


GENETICS
Supporting Information

http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130922


recessive mutations is inefficient. When such mutations meet in homozygous form, such as following inbreeding, their full deleterious effects are exposed. Alternatively, the overdominance hypothesis posits that inbreeding depression is caused by a reduction in heterozygosity of common alleles maintained at equilibrium at loci governed by heterozygote advantage. Both mechanisms may play a role in inbreeding depression, but the partial dominance hypothesis enjoys the strongest empirical support to date (Charlesworth and Charlesworth 1999; Charlesworth and Willis 2009).

Estimation of Inbreeding

The inbreeding coefficient of an individual, F, is one of the central parameters in population genetics theory. It is defined as the probability that the two alleles at a homologous locus within an individual are identical by descent (IBD) with respect to a base (reference) population in which all alleles are independent; that is, the alleles are identical because they are passed down from a common ancestor (Wright 1922). Homozygosity caused by two IBD genomic segments is termed autozygosity, as opposed to allozygosity, which is homozygosity produced by alleles that are identical by state but not by descent. F is therefore an estimate of genome-wide autozygosity.

Traditionally, F has been estimated using known pedigrees (Fped), typically with a path coefficient method developed by Wright (1922). In practice, pedigree information is difficult and costly to obtain, potentially unreliable (e.g., due to problems with accurate recording of ancestry), and rarely assessed for inbreeding arising from common ancestors who lived more than three or four generations in the past. Although autozygosity caused by common ancestors living more than three generations ago contributes very little variation to Fped, it can contribute substantially to variation in the proportion of the genome that is autozygous (Stam 1980). Moreover, Fped is an expectation of the proportion of the genome that is autozygous, but there is much variation around this expectation due to the stochastic nature of recombination. For example, the percentage of the genome that is autozygous among progeny of first cousins averages 6.25%, but the standard deviation of this is about 2.4% (Franklin 1977; Hill and Weir 2011).

For these reasons, there has recently been excitement about using dense marker data to estimate F arising from even very ancient inbreeding (Leutenegger et al. 2003; Carothers et al. 2006; Gibson et al. 2006; Li et al. 2006; Woods et al. 2006; McQuillan et al. 2008). Such genomic estimates of F potentially have several advantages over Fped. First, whereas Fped is an expectation of genome-wide autozygosity, genomic estimates of F directly measure homozygosity and can therefore potentially estimate the actual percentage of the genome that is autozygous more accurately. Second, genomic estimates of F incorporate autozygosity arising from very distant common ancestors (e.g., 50+ generations ago). Third, genomic estimates of F can be obtained in any sample that has marker data collected on it, including samples for which pedigree information is difficult or impossible to collect. Fourth, whereas all estimates of F are genome-wide estimates of autozygosity, genomic estimates of F can be altered to test whether an effect of F is distributed evenly across the genome or whether the signal comes from specific genomic locations (e.g., by obtaining separate F estimates for different chromosomes). Fifth, in certain species (e.g., humans), individuals who inbreed may not be a representative sample of the population, and thus putative inbreeding effects may be due to "third variable," nongenetic reasons. Such third-variable explanations are less likely in populations where inbreeding is likely to be distant and unintended. Finally, given the decreasing price of genome-wide SNP data, genomic estimates of F are likely to be less expensive to collect than Fped, which requires either intensive observation in the field or (in humans) extensive interviews to obtain pedigrees from both parents.

A potential drawback to using genomic estimates of F is that their behavior in populations with different levels of inbreeding is not well characterized, and it is therefore unclear which estimates should be preferred in which situations. Furthermore, there may not be enough variation in genomic estimates of F in unselected ("outbred") samples to detect inbreeding effects with statistical significance.

Current Study

The current study has three main goals. First, F can be estimated in several ways: from pedigrees, on a marker-by-marker basis, and from runs of homozygosity. Using simulated data sets that have realistic patterns of molecular variation and linkage disequilibrium, we seek to understand which of these estimates of F are optimal for detecting inbreeding depression and whether this answer depends on the level of inbreeding in the population (assessed using different population sizes). Second, we compare the variance, error, and intercorrelations of the various estimates of F arising from populations of different sizes and from populations that followed expansions similar to those that occurred in recent human history. Third, on the basis of these two results, we estimate the likelihood of detecting inbreeding depression given the predicted variances in the optimal estimate(s) of F. In this report, we use the human genome and population history to guide our simulation because much is known about these parameters in humans and because there has been much interest in detecting the effects of autozygosity on human traits. Nevertheless, as discussed below, the findings of this report readily extend to nonhuman animal populations as well.

Methods

Simulation of sequence and SNP data

We simulated populations of different sizes using the Fregene forward-time simulation program (Chadeau-Hyam et al. 2008). Fregene simulates the evolution of genetic sequence data in a population following the Fisher–Wright model (a monoecious, diploid, randomly mating population that evolves across nonoverlapping generations). Mating was random except that selfing was not allowed. The variation of all estimates of F will differ if mating is not random: nonrandom mating typically increases variation in F, as does a reduction in Ne. Therefore, smaller effective population sizes can be used as a proxy for studying the variation of estimates of F in the presence of nonrandom mating.

The effective population size (Ne) of humans has been estimated at 10,000 on the basis of molecular variation (Takahata et al. 1995), with a bottleneck that reduced the population size to 3000 for Caucasians and 8000 for Africans on the basis of linkage disequilibrium patterns (Tenesa et al. 2007). In the present study, we simulated effective population sizes of Ne = 100, 1000, and 10,000; simulating larger population sizes was not computationally feasible. Each individual's genome was composed of two homologous chromosomes of length 100 Mb. We chose 100-Mb genomes for reasons of computational feasibility and because 100 Mb is roughly the size of a typical mammalian chromosome. Fregene's recombination model allows crossover rates to vary along the chromosome at both broad scales (regions several megabases in length that differ in background recombination rates) and fine scales (recombination hotspots 2 kb in length). Recombination rates in the present simulations averaged ν = 1.1 × 10⁻⁸ per site per generation; 80% of recombination events occurred in hotspot regions, and the average distance between hotspots was 8500 bp. Mutations arose at rate μ = 2.3 × 10⁻⁸ per site per generation, and all mutations were neutral with respect to fitness. New mutations occurred uniformly and at random across the genome; thus, mutations could arise at an already polymorphic site, allowing both "double hit" mutations (if occurring at an ancestral allele) and "back" mutations (if occurring at a derived allele), although these types of mutations were extremely rare. These parameters, the defaults in Fregene, were based on estimated per-site recombination and mutation rates in human populations (Chadeau-Hyam et al. 2008).

To reduce the computational time taken by our simulations, we used Fregene's scale_exp option with the scaling parameter λ = 20. This reduced the population size and number of generations 20-fold but increased ν and μ 20-fold, which kept the population mutation parameter θ = 4Neμ and the population linkage parameter ρ = 4Neν constant, thereby mimicking the degree of variation and linkage disequilibrium in nonscaled populations while decreasing the computational time by over an order of magnitude. Populations evolved for 100,000 generations (i.e., across 5000 loops once scaled), ensuring that they reached drift–mutation–recombination equilibrium (Chadeau-Hyam et al. 2008).
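As a quick check of the scaling argument, the sketch below divides Ne and the number of generations by λ, multiplies μ and ν by λ, and confirms that θ = 4Neμ and ρ = 4Neν are unchanged. It is plain arithmetic with hypothetical function names, not Fregene's own code; the parameter values are the ones given in the text.

```python
# Sketch only: the arithmetic behind scaling by a factor lam (not Fregene's implementation).

def scale(ne, mu, nu, generations, lam):
    """Shrink Ne and the number of generations by lam; inflate mu and nu by lam."""
    return ne / lam, mu * lam, nu * lam, generations / lam


def theta_rho(ne, mu, nu):
    """Population mutation parameter theta = 4*Ne*mu and linkage parameter rho = 4*Ne*nu."""
    return 4 * ne * mu, 4 * ne * nu


ne, mu, nu = 1000, 2.3e-8, 1.1e-8            # values taken from the text
ne_s, mu_s, nu_s, loops = scale(ne, mu, nu, generations=100_000, lam=20)

print("loops:", loops)                        # number of scaled generations
print("unscaled theta, rho:", theta_rho(ne, mu, nu))
print("scaled   theta, rho:", theta_rho(ne_s, mu_s, nu_s))
```

The two printed (θ, ρ) pairs agree up to floating-point rounding, which is why scaling preserves levels of variation and linkage disequilibrium while cutting run time.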

One drawback of scaling in Fregene is that the mixture of rare to common variants is slightly inaccurate. For example, with λ = 20 and Ne = 1000, no variant can have a minor allele frequency below 0.01 (rather than 0.0005, as it would be if unscaled), because the scaled population has only 50 diploid individuals. To minimize this effect, at the end of the 100,000 generations (5000 loops), we ran Fregene with no scaling for an additional 1000 generations (1000 loops) for each population. The additional unscaled generations allowed polymorphism levels (as judged by heterozygosity) to build up to expected levels. The observed levels of heterozygosity in the three populations (Ne = 10,000, 1000, and 100, respectively) at the end of the simulation (9.3 × 10⁻⁴, 9.3 × 10⁻⁵, 9 × 10⁻⁶) conformed closely to the values predicted by theory, 4Neμ/(4Neμ + 1) (9.2 × 10⁻⁴, 9.2 × 10⁻⁵, 9 × 10⁻⁶).
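The theoretical values quoted above follow from θ = 4Neμ with μ = 2.3 × 10⁻⁸. The short sketch below, using a hypothetical helper name, evaluates θ/(θ + 1) for the three population sizes so the predicted heterozygosities can be checked directly.

```python
MU = 2.3e-8  # per-site mutation rate used in the simulations


def expected_heterozygosity(ne, mu=MU):
    """Expected per-site heterozygosity theta / (theta + 1), with theta = 4 * Ne * mu."""
    theta = 4 * ne * mu
    return theta / (theta + 1)


for ne in (10_000, 1_000, 100):
    print(f"Ne = {ne:>6}: expected heterozygosity = {expected_heterozygosity(ne):.2e}")
```

Because θ is small in all three cases, θ/(θ + 1) is almost exactly θ, so expected heterozygosity scales linearly with Ne.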

To investigate the effects of population size on the various estimates of F, we needed to (a) obtain information on pedigrees going back several generations, (b) derive SNP data from the sequences in the samples, and (c) create samples of equal numbers of individuals despite the varying population sizes. To accomplish the first goal, we modified the Fregene program to track and write out pedigree information for each individual going back five generations (as we show, going back farther than this was unnecessary). To accomplish the second goal, we selected all variants with minor allele frequency (MAF) above 0.05 from the sequence data that Fregene outputs at the final generation. This resulted in 2700 SNPs (1 SNP per 36 kb) in samples from the Ne = 100 population, 27,000 SNPs (1 SNP per 3.7 kb) in samples from the Ne = 1000 population, and 274,000 SNPs (1 SNP per 360 bp) in samples from the Ne = 10,000 population. These differences in SNP density reflect the larger numbers of variants that naturally occur in larger populations (Crow and Kimura 1970).
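The SNP-selection step can be sketched as a MAF filter over a genotype matrix. This is an illustration under assumptions, not Fregene output handling: genotypes are coded as 0/1/2 copies of one allele, the function names are hypothetical, and the cut-off is treated as inclusive.

```python
GENOME_BP = 100_000_000  # 100-Mb simulated genome


def minor_allele_frequency(column):
    """column: 0/1/2 allele counts for one variant across individuals."""
    p = sum(column) / (2 * len(column))
    return min(p, 1 - p)


def select_snps(genotypes, threshold=0.05):
    """genotypes: list of individuals, each a list of 0/1/2 counts per variant.
    Returns the indices of variants whose MAF is at least `threshold`."""
    return [i for i, column in enumerate(zip(*genotypes))
            if minor_allele_frequency(column) >= threshold]


def bp_per_snp(n_snps, genome_bp=GENOME_BP):
    """Average spacing between SNPs, in bp."""
    return genome_bp / n_snps


print(round(bp_per_snp(27_000)))   # average spacing in the Ne = 1000 samples, in bp
```

The spacing function simply divides the 100-Mb genome by the SNP count, which is how the per-SNP densities in the text relate to the SNP totals.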

The third goal, creating equal sample sizes for each population size, was motivated by the fact that we did not want results across different population sizes to be confounded by differing sample sizes. To analyze adequately large samples of equal size across the three population sizes, we proceeded as follows. For the Ne = 100 population, 100 subpopulations (each of Ne = 100) split off from the original Ne = 100 population and evolved independently for 50 generations. Groups of 10 of these subpopulations were randomly placed together (without replacement) to create 10 samples of size n = 1000 each. Because all 100 subpopulations evolved from the same progenitor population, the common variants were generally the same across subpopulations, allowing SNP sets to be created from the n = 1000 samples. Thus, despite having the same level of inbreeding as a population of size Ne = 100, each of the 10 replicate samples consisted of 1000 individuals. Similarly, for the Ne = 1000 population, 10 subpopulations of size Ne = 1000 split off from the original population (of size Ne = 1000) and evolved independently for 50 generations; at the end of the 50 generations, these 10 subpopulations provided the 10 samples of size n = 1000. Finally, the Ne = 10,000 population was split into 10 replicate samples of n = 1000 individuals each, which evolved for 50 generations. In this way, 10 samples of size n = 1000 were derived from each of the three population sizes (see Figure 1). By evolving independently for 50 generations, each subpopulation (and therefore each sample) had an independent breeding (and inbreeding) history within a 50-generation time frame.
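For the Ne = 100 case, the random grouping of subpopulations into samples can be sketched as a partition of subpopulation IDs drawn without replacement. The function name is hypothetical, and the sketch assumes each subpopulation contributes all of its 100 individuals to its sample.

```python
import random


def group_subpopulations(n_subpops=100, group_size=10, seed=None):
    """Randomly partition subpopulation IDs into disjoint groups (sampling without replacement)."""
    rng = random.Random(seed)
    ids = list(range(n_subpops))
    rng.shuffle(ids)
    return [ids[k:k + group_size] for k in range(0, n_subpops, group_size)]


samples = group_subpopulations(seed=1)
# With 100 individuals per subpopulation, each group of 10 gives one sample of n = 1000.
for s in samples:
    print(sorted(s))
```

Shuffling once and slicing guarantees that every subpopulation lands in exactly one sample.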

To study the effects of population expansion on estimates of F, we allowed a population of size Ne = 100 that had reached drift–mutation equilibrium to expand instantaneously to Ne = 10,000 and then to evolve at this size for 400 generations [roughly 11,400 years for humans (Fenner 2005)]. Computational limitations prevented us from investigating larger and more realistic expansions (e.g., Ne = 10,000–1,000,000), but conclusions from smaller-scale expansions should apply to larger expansions as well. We wrote out pedigree, sequence, and SNP data for 10 samples of size n = 1000, as described above, at generations 0 (immediately before expansion and therefore identical to the Ne = 100 data), 50, 100, 200, and 400. Results from generation 100,000 come from the final generation of the constant Ne = 10,000 population.

A proxy for the homozygous mutation load

Given the evidence that inbreeding depression is caused by homozygosity at numerous partially recessive deleterious mutations (Charlesworth and Willis 2009), and because deleterious mutations rarely reach frequencies above 0.05 in the population (Pritchard 2001), we approximated an individual's overall load of homozygous recessive or partially recessive deleterious mutations as follows. We defined mutations as the set of all variants not included in the SNP data (i.e., all variants with MAF below 0.05 in the original sequence data) and, for each individual, summed all such mutations that were homozygous. We call this measure the "homozygous mutation load." We computed the homozygous mutation load using rare neutral alleles for reasons of computational efficiency, but rare neutral alleles approximated well the behavior of mildly deleterious, partially recessive alleles. For all population sizes, the observed allele frequency distribution of simulated neutral alleles with MAF below 0.05 was very similar to the theoretically expected frequency distributions of either partially recessive (h = 0.25), very mildly deleterious mutations (s ≈ 1/Ne) or fully recessive mutations (h = 0) of much larger effect (s ≈ 50/Ne), where h is the dominance coefficient and s the selection coefficient (see Supporting Information, Figure S1) (Crow and Kimura 1970). Thus, the homozygous mutation load investigated here simulates what would be observed if inbreeding depression is due to the aggregated effect of homozygosity at a large number of recessive to partially recessive, mildly deleterious alleles.
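A minimal sketch of the homozygous mutation load, under stated assumptions: genotypes are 0/1/2 counts of one allele, "homozygous" is read as carrying two copies of the minor allele, variants monomorphic in the sample are ignored, and the function name is hypothetical rather than the authors' code.

```python
def homozygous_mutation_load(genotypes, maf_cutoff=0.05):
    """genotypes: list of individuals, each a list of 0/1/2 counts of one allele per variant.
    Returns, per individual, the number of variants with 0 < MAF < maf_cutoff at which
    the individual carries two copies of the minor allele."""
    n = len(genotypes)
    load = [0] * n
    for column in zip(*genotypes):
        p = sum(column) / (2 * n)          # frequency of the counted allele
        maf = min(p, 1 - p)
        if not 0 < maf < maf_cutoff:       # keep only rare, polymorphic variants
            continue
        minor_hom = 2 if p <= 0.5 else 0   # genotype code for a minor-allele homozygote
        for j, g in enumerate(column):
            if g == minor_hom:
                load[j] += 1
    return load


geno = [[0, 0] for _ in range(25)]
geno[3][0] = 2          # one minor-allele homozygote at variant 0: MAF = 2/50 = 0.04
print(homozygous_mutation_load(geno)[3])   # 1
```

The minor allele is identified per variant, so the count does not depend on which allele the genotype codes happen to track.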

Real SNP data

To compare results from simulated SNP data with results from real human SNP data, we used 1000 randomly selected control individuals of Caucasian descent from the publicly available Molecular Genetics of Schizophrenia–GAIN (Genetic Association Information Network) sample (O'Donovan et al. 2008), genotyped on the Affymetrix 6.0 platform. All SNPs passed rigorous quality control metrics (MAF above 0.05, missingness below 0.02, Hardy–Weinberg equilibrium P-value above 0.0001), and individuals who had a missingness rate above 0.02 or who were outliers on the first two principal component dimensions derived from an identical-by-state matrix were dropped. Genome-wide, 546,882 of 906,600 SNPs (1 SNP per 5.1 kb) passed these quality control thresholds. We then selected the first 100 Mb of SNP data from chromosomes 1–10 (after removing the centromere and surrounding heterochromatin, if applicable) to create 10 different SNP samples, from which we derived 10 replicates of the three genomic estimates of F (defined below).
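The missingness and MAF filters can be sketched as below. In practice such filters are usually applied with PLINK's --maf, --geno, --mind, and --hwe options; this plain-Python version is only an illustration and assumes a particular order (individuals first, then SNPs), missing calls coded as None, and a hypothetical function name. The Hardy–Weinberg and principal-component filters are not implemented.

```python
def qc_filter(genotypes, max_missing=0.02, min_maf=0.05):
    """genotypes: list of individuals, each a list of 0/1/2 counts or None (missing call).
    Drops individuals, then SNPs, with missingness above max_missing, and SNPs with MAF
    not above min_maf. Returns (kept individual indices, kept SNP indices).
    Hardy-Weinberg and principal-component filters are not implemented here."""
    m = len(genotypes[0])
    # 1. Individual missingness.
    keep_ind = [j for j, ind in enumerate(genotypes)
                if sum(g is None for g in ind) / m <= max_missing]
    # 2. SNP missingness and MAF, computed on the retained individuals.
    keep_snp = []
    for i in range(m):
        calls = [genotypes[j][i] for j in keep_ind]
        observed = [g for g in calls if g is not None]
        if not observed or 1 - len(observed) / len(calls) > max_missing:
            continue
        p = sum(observed) / (2 * len(observed))
        if min(p, 1 - p) > min_maf:
            keep_snp.append(i)
    return keep_ind, keep_snp
```

Filtering individuals before SNPs means SNP-level statistics are computed only on samples that will actually be analyzed.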

Estimates of F

In some contexts, F can be conceptualized as a parameter rather than as an estimate. For example, Fped is a known property given a pedigree and a base population and can be called a parameter of an individual in this context. In the present context, however, Fped and all other F statistics investigated here are conceptualized as imperfect estimates of the actual level of autozygosity in an individual's genome. The average F estimate in a population depends on the base population (defined as the ancestral population in which F = 0) and increases as one considers older base populations. However, the average F is not very important in the context considered here. Detecting an inbreeding effect requires variation in F among individuals in the population, so it is variation in F that we focus on. We use a recent base population (five generations ago) for pedigree inbreeding and the current population for genomic estimates of F, as presented in Powell et al. (2010).

Figure 1 Procedure for deriving 10 samples from each of three effective population sizes. Circles represent populations, arrows represent evolution and splitting/combining of each population, and squares represent samples of size 1000 derived from each population. The sizes of the shapes correspond to population (circles and arrows) or sample (squares) sizes. See text for details.

In each simulated sample, we calculated four alternative estimates of F, as well as two additional estimates of Froh based on shorter and longer megabase thresholds. In the real SNP data, we calculated the three genomic estimates of F. To compare the spread of the estimates of F, the variance of each estimate was computed across the 1000 individuals in a sample, and the log (base 10) of this variance was taken to aid interpretability. As there were 10 replicate samples derived from each population size (see above), we then took the mean of the 10 log10 variances and found the standard error of each mean. Each estimate of F is described below.

1. Fped: F from pedigree inbreeding going back five generations, as figured from Wright's path formula:

$$F_{ped,j} = \sum_{i=1}^{c} \left(\frac{1}{2}\right)^{n_{ij} + m_{ij} + 1}$$

where $m_{ij}$ and $n_{ij}$ are the numbers of generations in the paternal and maternal paths, respectively, linking individual j's parents to the ith common ancestor, and c is the number of common ancestors of individual j. This form omits Wright's $(1 + F_A)$ factor for inbreeding of the common ancestors themselves. For example, $n_{ij} = m_{ij} = 2$ for cousin–cousin inbreeding; c = 2 when the cousins share two grandparents (Fped = 0.0625), and c = 1 when the cousins share one grandparent (Fped = 0.03125). Virtually all variation in Fped is captured within the most recent five generations (see Results), so paths $m_{ij}$ and $n_{ij}$ were not extended beyond this.

2. Fh is the canonical estimate of genomic F based on excess SNP homozygosity:

$$F_{h,j} = \frac{O(H_j) - E(H)}{m - E(H)}$$

where $O(H_j)$ is the observed homozygosity (the number of homozygous SNPs) across all SNPs for person j, $E(H) = \sum_{i=1}^{m} \left[1 - 2p_i(1 - p_i)\right]$ is the expected homozygosity for all people in the sample, and $p_i$ is the MAF of SNP i, for i = 1, …, m. This estimate can be obtained from PLINK (Purcell et al. 2007) using the --het command.

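3. Falt is an alternative estimate of F predicted to have lower error (Yang et al. 2010a):

$$F_{alt,j} = \frac{1}{m}\sum_{i=1}^{m} d_i - 1$$

where $d_i = 1/p_i$ or $1/q_i$ for a homozygote for the minor or the major allele, respectively, and $d_i = 0$ for a heterozygote at SNP i; $q_i$ is the frequency of the major allele at the ith SNP, and $p_i = 1 - q_i$. With genotype frequencies $p_i^2 + Fp_iq_i$, $2p_iq_i(1 - F)$, and $q_i^2 + Fp_iq_i$, the expectation of $d_i$ is $1 + F$, so Falt has expectation F.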

4. Froh refers to the proportion of the genome (0–1) that is in runs of homozygosity (ROHs):

$$F_{roh,j} = \frac{\sum_k \mathrm{length}(ROH_k)}{L}$$

where $ROH_k$ is the kth ROH in individual j's genome and L = 10⁸ bp (100 Mb), the length of the genome in both the simulated and the real SNP data sets. ROHs were found using PLINK and defined as stretches of continuously homozygous SNPs spanning at least 1.5 Mb (1.65 cM). The lengths of ROH segments generated by a single path should follow an exponential distribution with mean 1/(2g) M, where g is the number of generations since the last common ancestor (Fisher 1954). Thus, the expected length of an autozygous ROH segment caused by a common ancestor g = 30 generations in the past is about 1.65 cM (1.5 Mb).
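The four estimators above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code or PLINK's implementation: genotypes are coded as 0/1/2 copies of one allele with no missing calls, allele frequencies are estimated from the sample itself, Falt is implemented as the mean of the $d_i$ minus 1, the ROH finder is a simplified stand-in for PLINK's --homozyg (no heterozygous calls allowed inside a run, no SNP-count or density requirements), and all function names are hypothetical.

```python
from fractions import Fraction


def f_ped(paths):
    """Wright's path formula, Fped = sum over common ancestors of (1/2)^(m + n + 1).
    paths: one (m_ij, n_ij) pair per common ancestor (generations in the paternal and
    maternal paths); the common ancestors are assumed not to be inbred themselves."""
    return sum(Fraction(1, 2) ** (m + n + 1) for m, n in paths)


def allele_freqs(genotypes):
    """Sample frequency of the counted allele at each SNP."""
    n = len(genotypes)
    return [sum(col) / (2 * n) for col in zip(*genotypes)]


def f_h(genotypes):
    """Fh = (O(H_j) - E(H)) / (m - E(H)), with E(H) = sum_i [1 - 2 p_i (1 - p_i)]."""
    p = allele_freqs(genotypes)
    m = len(p)
    e_h = sum(1 - 2 * pi * (1 - pi) for pi in p)
    return [(sum(g != 1 for g in ind) - e_h) / (m - e_h) for ind in genotypes]


def f_alt(genotypes):
    """Falt = mean(d_i) - 1, with d_i = 1/p_i or 1/q_i for the two homozygotes, 0 for a heterozygote."""
    p = allele_freqs(genotypes)
    out = []
    for ind in genotypes:
        d = [1 / pi if g == 2 else 1 / (1 - pi) if g == 0 else 0.0
             for g, pi in zip(ind, p)]
        out.append(sum(d) / len(d) - 1)
    return out


def find_rohs(positions, genotypes, min_length_bp=1_500_000):
    """Simplified ROH finder: maximal stretches of consecutive homozygous SNPs whose span
    (first to last SNP) is at least min_length_bp. Returns (start, end) positions in bp."""
    runs, start, last = [], None, None
    # A trailing heterozygous sentinel closes a run that reaches the last SNP.
    for pos, g in zip(list(positions) + [None], list(genotypes) + [1]):
        if g != 1:                        # homozygous call: open or extend a run
            if start is None:
                start = pos
            last = pos
        else:                             # heterozygous call: close any open run
            if start is not None and last - start >= min_length_bp:
                runs.append((start, last))
            start = None
    return runs


def f_roh(runs, genome_length_bp=100_000_000):
    """Froh = total ROH length / L, with L = 1e8 bp (the 100-Mb genome)."""
    return sum(end - start for start, end in runs) / genome_length_bp


print(f_ped([(2, 2), (2, 2)]))   # offspring of first cousins sharing two grandparents: 1/16
```

Exact fractions keep the pedigree example free of rounding, while the genomic estimators use floating point because they depend on estimated allele frequencies.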

To compare alternative Froh thresholds, we defined Frohshort using runs of homozygous SNPs of at least 0.5 Mb and Frohlong using runs of at least 5 Mb. These three threshold values (0.5 Mb, 1.5 Mb, and 5 Mb) are the same as those used by McQuillan et al. (2008) in their investigations of Froh in European populations. Frohlong is likely to miss many autozygous stretches of ancient origin (e.g., those from 10 or more generations back, which have expected sizes of about 5 Mb or less) but is also likely to include very few ROHs that are not truly autozygous. Frohshort, on the other hand, can detect more ancient autozygous segments (e.g., from 100 generations back) at the expense of a higher false detection rate, i.e., detecting a higher proportion of ROHs that are allozygous rather than autozygous. Such allozygous haplotypes may be superficially similar, creating runs of homozygosity at measured SNPs, while being heterozygous at unmeasured rare mutations. Given the expected exponential distribution of ROH lengths, our thresholds of 0.5 Mb, 1.5 Mb, and 5 Mb should capture 58%, 20%, and 0.5% (respectively) of all ROHs created from common ancestors 50 generations in the past.
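The capture fractions above follow from the exponential length distribution with mean 1/(2g) M. The sketch below assumes a uniform 1.1 cM/Mb (the average simulated recombination rate; real genetic maps vary) and uses hypothetical function names. For g = 50, it gives about 58%, 19%, and 0.4%, close to the rounded figures in the text. It also gives mean lengths of about 1.5 Mb for g = 30 and about 3 Mb for g = 15.

```python
import math

CM_PER_MB = 1.1  # nu = 1.1e-8 per bp per generation, i.e., about 1.1 cM per Mb


def mean_roh_length_mb(g):
    """Mean ROH length 1/(2g) Morgans (Fisher 1954), converted to Mb."""
    return (100 / (2 * g)) / CM_PER_MB


def fraction_at_least(threshold_mb, g):
    """Share of ROHs from an ancestor g generations back with length >= threshold,
    assuming exponentially distributed lengths."""
    return math.exp(-threshold_mb / mean_roh_length_mb(g))


for t in (0.5, 1.5, 5.0):
    print(f"{t} Mb threshold: {100 * fraction_at_least(t, g=50):.1f}% of ROHs from g = 50")
```

Because the tail of an exponential distribution falls off as exp(-x/mean), raising the threshold from 1.5 Mb to 5 Mb discards almost all segments of this age.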

Results

Time to most recent common ancestor between spouses

Given that ancient inbreeding is central to the present investigation, and that the relationship between mates determines the inbreeding coefficient of their progeny, it is useful to have an intuition of how long ago the most recent common ancestor of two mates lived in a randomly mating population. A similar issue has been investigated with respect to the amount of overlap in pedigrees between two individuals (Ohno 1996; Derrida et al. 2000), as we discuss below. Similarly, Chang (1999) assessed the number of generations one must travel back before reaching the most recent common ancestor of every human alive today, and it is surprisingly recent: about 110 generations (3100 years) ago (see also Rohde et al. 2004; Lachance 2009). Clearly, the most recent common ancestor of just two randomly chosen individuals (e.g., spouses) must be much more recent still. This can be quantified simply in a Fisher–Wright model.

For a population of constant size, consider two randomly chosen individuals (spouses), X and Y. Take a single ancestor of spouse X who lived t generations ago. Given nonoverlapping generations, the probability that this ancestor is the same person as any of spouse Y's ancestors who lived t generations ago is $A_t/N_t$, where $A_t$ is the total number of ancestors of Y at generation t and $N_t$ is the total number of individuals in the population at generation t. This follows from the fact that the probabilities that the ancestor of X is related to each of Y's ancestors are independent of each other (i.e., multiple ancestors of Y can be the same person). The probability that the ancestor of X is unrelated to any ancestor of Y is therefore $1 - A_t/N_t$. Assuming that X also has $A_t$ distinct ancestors at time t, the probability that no ancestor of X is related to any ancestor of Y at time t is $(1 - A_t/N_t)^{A_t}$. Finally, the probability $c_g$ that a randomly chosen spousal pair share no common ancestors (no pedigree inbreeding) up to and including generation g is

$$c_g = \prod_{t=1}^{g} \left(1 - \frac{A_t}{N_t}\right)^{A_t}$$

The above calculation for $c_g$ relies on values of $A_t$ and $N_t$ that are not straightforward to derive. As an approximation, $A_t \approx 2^t$ and $N_t \approx N_e$. While the latter assumption is probably close to the truth on average, the former must overestimate the number of ancestors at time t, because the true $A_t$ cannot exceed $N_t$, whereas $2^t$ grows exponentially while $N_t$ remains constant. Put another way, as one travels back in a pedigree, duplicate ancestors grow increasingly common due to ancient inbreeding. This biases $c_g$ to be lower than it should be, but further investigation (not shown) indicates that the bias is minimal, changing how far one must go back to find a common ancestor by no more than a single generation.
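The approximation above can be evaluated directly. The sketch below computes $c_g$ with $A_t = 2^t$ and $N_t = N_e$. It clamps each factor at zero once $2^t$ reaches Ne, because the approximation breaks down there; this clamp is an assumption of the sketch. It then reports the first generation at which $c_g$ drops below 0.01, a hypothetical cut-off standing in for "almost certain to share a common ancestor". The function names are illustrative. With this cut-off, the results match the generation counts discussed in the text (6, 8, 11, and 15 generations for Ne = 1000, 10,000, 1 million, and 100 million).

```python
def prob_no_common_ancestor(g, ne):
    """c_g = prod_{t=1..g} (1 - A_t/N_t)^A_t, approximating A_t by 2**t and N_t by Ne.
    Each factor is clamped at 0 once 2**t >= Ne, where the approximation breaks down."""
    c = 1.0
    for t in range(1, g + 1):
        a_t = 2 ** t
        c *= max(0.0, 1 - a_t / ne) ** a_t
    return c


def generations_until(ne, threshold=0.01, max_g=64):
    """Smallest g with c_g below threshold, i.e., the depth by which nearly all spouses share an ancestor."""
    for g in range(1, max_g + 1):
        if prob_no_common_ancestor(g, ne) < threshold:
            return g
    return None


for ne in (100, 1_000, 10_000, 1_000_000, 100_000_000):
    print(f"Ne = {ne:>11,}: common ancestor almost certain within {generations_until(ne)} generations")
```

Since log c_g is roughly -(4/3)·4^g / Ne, each 10-fold increase in Ne adds only about log4(10) ≈ 1.7 generations, which is why the required depth grows so slowly with population size.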

Figure 2 shows expected values of $c_g$ for four population sizes: the three investigated in the present simulation, as well as Ne = 1 million, which is probably a larger mating pool than most modern humans belong to. We checked our theoretical predictions against simulation results for the three population sizes (Ne = 100, 1000, and 10,000) and the five generations of ancestry we recorded. The expected probabilities (solid dots) agree closely with the observed proportions from the simulated data (colored x's).

Figure 2 shows that mates are likely to share a common ancestor in their recent ancestral past. For populations of size Ne = 1000, almost all spousal pairs have at least one common ancestor within the past 6 generations. For Ne = 10,000, spousal pairs share a common ancestor within 8 generations. Indeed, for every 10-fold increase in population size, the number of generations one must go back before a common ancestor becomes certain increases by roughly 2. Thus, at Ne = 1,000,000, spouses are almost certain to share a common ancestor within the last 11 generations, and increasing the population size to 100 million pushes this estimate back only to 15 generations. ROHs caused by inbreeding from common ancestors 15 generations in the past are about 3 Mb long on average, although, because of the finite number of recombination events per generation, inbreeding events from 15 generations in the past often cause no ROHs. These results are consonant with those of Derrida et al. (2000), who found substantial overlap between the pedigree trees of two randomly chosen individuals in a population of size 10,000 within 10 generations and complete overlap (both individuals share exactly the same ancestors) within 20 generations.

Variance of Fped

Given that the contribution to Fped at generation t is independent of the contribution to Fped from previous generations, the variation of Fped for a given population size can be apportioned into the amount contributed by each ancestral generation. Figure 3 shows that the contribution to the variation of Fped decreases log-linearly as a function of the contributing ancestral generation. The bend (nonlinearity) at generation 1 for the samples drawn from the Ne = 10,000 population occurs because several samples had no sib–sib inbreeding, making log10[var(Fped due to sib–sib inbreeding)] = −∞. Such values were set to "missing," biasing the remaining nonmissing values upward.

Figure 2 Shown is the probability that mates share no common ancestors in the most recent g generations as a function of population size (see text). The x's are the same values derived empirically from simulations for up to 5 generations in the past and show good agreement with the expected probabilities. Even in large randomly breeding populations (e.g., 1 million), it is almost certain that at least one ancestor exists in common between two pedigrees within 11 generations.

Our results show that very little variation in Fped was missed by ignoring inbreeding resulting from common ancestors more than five generations back (see Methods). This is despite the fact that quite long (e.g., 10 Mb) ROHs can be created from such inbreeding. Although we had no way of measuring the proportion of variation in genomic estimates of F that was due to each ancestral generation, our results below demonstrate that genomic estimates of F retain considerable variation in large populations after variation in Fped has diminished to 0.

Variation and error in genomic and pedigree estimates of F

One important aim of the current project was to investigate the effects of population size on the variance of the four estimates of F. Variances were taken across estimates of F for each individual in samples of size 1000 drawn from the three population sizes (Ne = 100, 1000, and 10,000). Figure 4 shows the mean (±1 SE) of the log10 variances of F as a function of Ne. The variance decreases for all estimates of F as Ne increases, owing to the lower levels of ancient and recent inbreeding in larger populations. As expected (Figure 3), Fped shows the least variation at every population size because it does not capture the variation in F caused by the random nature of recombination and segregation. The two estimates of F based on SNP-by-SNP homozygosity (Fh and Falt) show the highest variation, while Froh is intermediate. Figure 4 also shows the variance values for the three genomic estimates of F in 10 replicates of 100 Mb of real SNP data from an unselected (outbred) Caucasian control sample. The variances of Fh and Falt are slightly higher, and the variance of Froh is slightly lower, in the real data than in the Ne = 10,000 simulation.

The prediction error variance (PEV) of the three genomic estimates of F is shown in Figure 5. $\mathrm{PEV} = \mathrm{var}(F_1 - F_2)/2$, where $F_1$ is the estimate of F from a random half of SNPs and $F_2$ is the estimate of F from the other half of SNPs (Powell et al. 2010; Yang et al. 2010a). PEV provides insight into the amount of error in the genomic estimates of F that exists due to the finite sampling of SNPs; as SNPs become more dense and linkage disequilibrium between markers increases (as occurs with larger Ne), the PEV should decrease. The PEV for all estimates of F is small (5%) compared to the variances of F shown in Figure 4. As expected, the PEV of every estimate of F also decreases with population size, but the PEV for Froh decreases the most rapidly. Error in Froh decreases more rapidly as a function of Ne because, with dense SNP data (such as exist when Ne = 10,000), ROHs are made up of a large number of homozygous SNPs in a row (e.g., a ROH of 1.5 Mb contains 2000 SNPs in the Ne = 10,000 data), and such long sets of markers pick up the same sets of ROHs with very high precision.
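The split-half definition of PEV translates directly into code. In the sketch below, `split_snps` and `prediction_error_variance` are hypothetical helper names; how F is estimated from each half (Fh, Falt, or Froh) is left to the caller, and we assume the sample (n − 1) variance across individuals.

```python
import random
from statistics import variance


def split_snps(snp_ids, seed=0):
    """Randomly partition SNP identifiers into two disjoint halves."""
    ids = list(snp_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def prediction_error_variance(f_half1, f_half2):
    """PEV = var(F1 - F2) / 2 across individuals.

    f_half1[i] and f_half2[i] are estimates of F for individual i computed from
    the first and the second half of the SNPs, respectively.
    """
    if len(f_half1) != len(f_half2):
        raise ValueError("both halves must cover the same individuals")
    diffs = [a - b for a, b in zip(f_half1, f_half2)]
    return variance(diffs) / 2


half_a, half_b = split_snps(range(10))
print(sorted(half_a + half_b) == list(range(10)))  # True: the halves partition the SNPs
print(prediction_error_variance([0.01, 0.02, 0.03], [0.01, 0.01, 0.04]))
```

Because the two halves are disjoint, any disagreement between F1 and F2 reflects SNP sampling rather than true differences in autozygosity, which is why the variance of their difference serves as an error measure.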

Relationships between genomic estimates of F and Fped

Figure 6 shows the correlation coefficients between the three genomic estimates of F and Fped. In all cases, Froh correlates most highly with Fped, reflecting the fact that Froh tends to capture more recent inbreeding than the other two genomic estimates of F. The increasingly large standard errors of the correlations are due to the higher standard errors and lower variances of Fped at larger population sizes.

Finding the optimal measure for detecting inbreeding depression: Relationships between estimates of F and the homozygous mutation load

To understand which estimate of F is likely to be optimal for detecting inbreeding depression effects, we correlated each of the four estimates of F with the homozygous mutation load, which, as explained above, is the leading contender for why inbreeding depression occurs. Figure 7 shows these results. Fped is the worst predictor of the homozygous mutation load, and its disadvantage grows as the population size increases. This result is due to the low variance in Fped in randomly breeding populations and suggests that Fped is likely to be a useful estimate only in samples selected on the basis of recent consanguinity. Falt outperformed Fh because Falt gives more weight to rare variants and because Falt has lower error variance (Figure 5). As predicted, Froh was the estimate most strongly associated with the recessive mutation load at every population size, and this advantage grows as the population size increases (Figure 7).

Figure 3 Contribution to var(Fped) from previous generations. Most of the variance in Fped at all population sizes is attributable to recent inbreeding. For example, the variance in Fped due to spouses sharing common ancestors five generations in the past makes up only 0.2% of the total variance in Fped at all population sizes.

The superiority of Froh in detecting the homozygous mutation load is even more stark if one considers the unique effects of each estimate of F after controlling for their correlations with Froh. After controlling for Froh, the mean partial correlations (across Ne) between the homozygous mutation load and Fh (rp = 0.07), Falt (rp = 0.15), and Fped (rp = 0.09) are all low, but the reverse is not true: partial correlations between the homozygous mutation load and Froh remain high after controlling for Fh (rp = 0.57), Falt (rp = 0.50), and Fped (rp = 0.68). This shows that Froh contains much additional information on the homozygous mutation load that is not captured by the other three estimates of F, whereas the other three estimates of F contain little additional information on the homozygous mutation load beyond what already exists in Froh. The superiority of Froh occurs because a given rare variant is typically homozygous only when the haplotypic segment on which it exists meets another IBD segment within an individual, creating a run of homozygosity. If inbreeding depression is caused by homozygosity at rare mutations, as recent evidence indicates, these results suggest that Froh is likely to be the optimal estimate for detecting it regardless of the level of inbreeding in the population.
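The comparisons above rest on first-order partial correlations, which can be computed from the three pairwise correlations. The sketch below implements that standard formula; the function name is ours, and the numbers in the usage line are arbitrary illustrative inputs, not values from our simulations.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after controlling for z (first-order partial correlation)."""
    denom = math.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical inputs: x = homozygous mutation load, y = one estimate of F, z = another.
print(round(partial_correlation(r_xy=0.5, r_xz=0.5, r_yz=0.5), 3))  # 0.333
```

When z already accounts for most of the x–y association (r_xy close to r_xz · r_yz), the numerator, and hence the partial correlation, approaches zero; this is the pattern seen for Fh, Falt, and Fped once Froh is controlled for.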

Figure 8 shows a comparison of the correlation coefficients between the homozygous mutation load and Froh calculated from short (0.5 Mb), moderate (1.5 Mb), and long (5 Mb) ROH thresholds as a function of Ne. Froh(short) correlates poorly with the homozygous mutation load when Ne is small. In large populations, where inbreeding is more ancient, the situation reverses: Froh defined by short ROHs outperforms the other two Froh estimates. The improvement of Froh(short) as a function of population size is due to two factors: (a) denser SNP data in the larger populations allow even short (0.5 Mb) ROHs to discriminate autozygosity accurately, and (b) large populations have a higher ratio of short to long ROHs. Froh(long) performs poorly when Ne = 10,000 due to low variance. Froh defined by the moderate (1.5 Mb) threshold performed intermediately at each population size, providing justification for using this threshold in the current study, as it allowed more meaningful comparisons across the three population sizes.
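To illustrate how the minimum ROH length changes Froh, the sketch below calls runs as maximal stretches of consecutive homozygous SNPs and defines Froh as the total length of runs at or above a threshold divided by the length of the region scanned. This is a deliberately simplified caller, assumed for illustration: it tolerates no heterozygous or missing calls and imposes no minimum SNP count, so it is neither the procedure used in this study nor PLINK's `--homozyg` algorithm. Positions are in Mb, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def call_rohs(pos_mb: Sequence[float], is_hom: Sequence[bool],
              min_len_mb: float) -> List[Tuple[float, float]]:
    """(start, end) of maximal runs of consecutive homozygous SNPs spanning >= min_len_mb."""
    if len(pos_mb) != len(is_hom):
        raise ValueError("positions and genotypes must have equal length")
    runs = []
    start = end = None
    for p, hom in zip(pos_mb, is_hom):
        if hom:
            if start is None:
                start = p  # open a new run
            end = p        # extend the current run
        else:
            if start is not None and end - start >= min_len_mb:
                runs.append((start, end))
            start = None   # a heterozygous SNP breaks the run
    if start is not None and end - start >= min_len_mb:
        runs.append((start, end))
    return runs


def f_roh(pos_mb, is_hom, min_len_mb, region_len_mb):
    """Total length of qualifying runs divided by the length of the scanned region."""
    return sum(e - s for s, e in call_rohs(pos_mb, is_hom, min_len_mb)) / region_len_mb


positions = [0.0, 0.5, 1.0, 2.0, 3.0, 3.5, 4.0]
homozygous = [True, True, True, True, False, True, True]
for threshold in (0.5, 1.5, 5.0):
    print(threshold, f_roh(positions, homozygous, threshold, region_len_mb=4.0))
```

On this toy input the three thresholds give 0.625, 0.5, and 0.0: lowering the threshold admits the short run, and raising it past the longest run drives Froh to zero, mirroring the trade-off between short and long thresholds described above.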

Changes in F following population expansion

We wanted to understand how recent population expansion, such as occurred in human lineages over roughly the last 10,000 years (~350 generations), might alter our conclusions. Figure 9 shows the change in the mean log10(variance) of the four estimates of F as a function of generations since a 100-fold expansion in population size from Ne = 100 to Ne = 10,000. While the variance in Fped decreases immediately (a 97% drop) and remains unchanged thereafter, the three genomic estimates of F change much more gradually. The initial declines in the variances of the three genomic estimates of F are caused by the drop in recent inbreeding in larger populations. However, the declines in variances become much slower thereafter, reflecting the slow decay of linkage disequilibrium and the gradual increase in the number of new segregating variants following a population expansion. For example, after expanding from Ne = 100 to Ne = 10,000, the standard deviation of Froh decreases by 70%, from 0.115 to 0.035, in the first 100 generations but decreases only an additional 10% (to 0.021) over the next 300 generations and requires several thousand generations to reach its equilibrium value of 0.015. Similarly, starting at 0.142, the standard deviation of Falt decreases to 0.069 and 0.055 in generations 100 and 400, respectively. Our results should extrapolate to larger starting and ending population sizes (e.g., Ne = 10,000 to Ne = 1,000,000) and indicate that genomic estimates of F require considerable time following expansion to reach the low variance values that would be predicted from the larger population sizes.

Figure 5 Prediction error variance (PEV) of genomic estimates of F as a function of Ne. PEV decreases as a function of population size for all genomic estimates of F but does so most rapidly for Froh.

Figure 4 Variance of F (±1 SE) as a function of Ne. The variance of Fped is the lowest at all population sizes, and the variance of Froh is intermediate. Comparing the variance of genomic measures of F in simulated data to the equivalent variances in real SNP data (from an outbred Caucasian sample) suggests that the effective population size of Caucasians is roughly 10,000 with respect to these statistics.

Figure 10 shows the correlations between estimates of F and the homozygous mutation load following population expansion. This correlation drops considerably for all estimates of F following population expansion; it remains low for Fped, Fh, and Falt in the 400 generations following the expansion, whereas the correlation with Froh begins to increase after 50 generations. When Ne is small (i.e., 100), rare mutant alleles are tagged by long haplotypes as a result of relatively recent inbreeding. When the population size is increased, these long haplotypes are destroyed by recombination, and new ones are not created quickly because new inbreeding occurs slowly. Consequently, the correlation between F and homozygous mutation load declines, as did the variance of F (Figure 9). However, as the population evolves at size Ne = 10,000 for an increasing number of generations, new mutations arise that exist on only a single haplotype, and the pairings of these, which create homozygosity at rare mutations, are increasingly detected by Froh (and eventually by Fh and Falt, as a new equilibrium is reached). These results provide further evidence that Froh is likely to be the optimal estimate for detecting inbreeding depression in populations, such as humans, that have expanded rapidly in population size.

Power to detect inbreeding effects

The power of detecting inbreeding depression from marker data depends in part on the accuracy of predicting autozygosity with markers, and our results suggest that Froh is optimal for detecting autozygosity arising from both ancient and recent inbreeding. However, power also depends on the variation in F, as well as on the effect size and the sample size. When the level of inbreeding is low, such as in large randomly mating populations, the variance of Froh is commensurately low (e.g., the standard deviation of Froh = 0.015 when Ne = 10,000). The standard deviation of Froh in observed human SNP data is smaller still: 0.012. Given such low variance in the predictor, we were interested in whether inbreeding depression is detectable using Froh in human samples and, if so, what sample sizes would be required to achieve adequate power as a function of different levels of inbreeding (assayed using different effective population sizes). To do this, we needed an estimate of the likely effect size of inbreeding depression on a complex trait in humans and an estimate of the variance of Froh scaled up to genome-wide levels.

Because, to our knowledge, the effects of inbreeding on IQ have been investigated more than those on any other complex human trait, we used a best estimate of the inbreeding effect on IQ as a guide to the likely inbreeding effect sizes among other human complex traits. Morton (1979) reviewed four large studies of cousin–cousin inbreeding and found that IQ decreased by 0.73 (Schull and Neel 1965) to 0.39 (Kudo et al. 1972) points for every 0.01 increase in Fped. Given the standard deviation of IQ (15 points), this translates to a decrease of 0.025–0.05 standard deviations per percentage of inbreeding.
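The conversion from IQ points to standard-deviation units, and to a regression slope per unit of F, is simple arithmetic, spelled out below for the two endpoint estimates. Expressing the slope per unit F (F on a 0–1 scale) rather than per 0.01 is our presentational choice; it yields slopes of −73 and −39 IQ points per unit F, the values used in the power analysis. The function name is hypothetical.

```python
IQ_SD = 15.0  # standard deviation of IQ, in points


def iq_effect(points_per_0_01_f: float, iq_sd: float = IQ_SD):
    """Return (SD units lost per 1% of inbreeding, regression slope of IQ on F)."""
    sd_per_percent = points_per_0_01_f / iq_sd
    slope_per_unit_f = -points_per_0_01_f * 100  # 0.01 of F -> 1 unit of F
    return sd_per_percent, slope_per_unit_f


# IQ points lost per 0.01 increase in Fped, endpoints reviewed by Morton (1979)
for study, points in {"Schull and Neel (1965)": 0.73, "Kudo et al. (1972)": 0.39}.items():
    sd_units, slope = iq_effect(points)
    print(f"{study}: {sd_units:.3f} SD per 1% F; slope = {slope:.0f} per unit F")
```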

Figure 6 Correlations between Fped and genomic estimates of F as a function of Ne. All correlations between genomic estimates of F and Fped decrease as a function of Ne, but Froh is consistently correlated most highly with Fped.

Figure 7 Correlations between F and homozygous mutation load as a function of Ne. Froh correlates most highly with the homozygous mutation load at all population sizes, and this advantage increases at larger population sizes (where inbreeding becomes more ancient).

To predict the statistical power of using Froh to detect inbreeding depression in human studies, we also needed to estimate what the variance of Froh would be if derived genome-wide rather than across 100 Mb, as simulated in the current study. To do this, we compared the average variance of Froh across 10 replicates of 100 Mb of real SNP data ($1.5 \times 10^{-4}$) to the genome-wide variance of Froh in the real SNP data ($1.79 \times 10^{-5}$), an 8.45-fold drop in variance. We thus scaled the variances of Froh down 8.45-fold to account for the decreased variance that would be observed in Froh genome-wide. This led to predicted genome-wide variances (standard deviations) of Froh of $1.4 \times 10^{-3}$ (0.037) for Ne = 100, $2.5 \times 10^{-4}$ (0.016) for Ne = 1000, $2.63 \times 10^{-5}$ (0.005) for Ne = 10,000, and $1.79 \times 10^{-5}$ (0.004) for real SNP data.
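The rescaling step amounts to dividing each 100-Mb variance by a common factor and taking square roots. The sketch below applies the 8.45-fold factor to a 100-Mb standard deviation of 0.015 (the Ne = 10,000 value reported earlier) and to the real-data variance of $1.5 \times 10^{-4}$. Small discrepancies from the reported values reflect rounding of the inputs; for example, the rounded variances $1.5 \times 10^{-4}$ and $1.79 \times 10^{-5}$ themselves give a ratio of about 8.4. The function name is hypothetical.

```python
import math

FOLD_DROP = 8.45  # 100-Mb variance of Froh divided by its genome-wide variance (real data)


def genome_wide_froh(var_100mb: float, fold: float = FOLD_DROP) -> tuple[float, float]:
    """Return (variance, standard deviation) of Froh rescaled from 100 Mb to the whole genome."""
    var_gw = var_100mb / fold
    return var_gw, math.sqrt(var_gw)


for label, var_100mb in [("Ne = 10,000 (simulated)", 0.015**2), ("real SNP data", 1.5e-4)]:
    var_gw, sd_gw = genome_wide_froh(var_100mb)
    print(f"{label}: variance {var_gw:.2e}, SD {sd_gw:.4f}")
```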

Figure 11 shows the statistical power, as a function of sample size, of a regression of an outcome variable on Froh, assuming a slope of −73 (solid lines) and −39 (dashed lines), a standard deviation of the outcome variable of 15, and standard deviations of Froh depending on the population size, as detailed in the preceding paragraph. We assume that $b \sim t_{n-2}$ with mean −73 or −39 and $\mathrm{var}(b) = \frac{\mathrm{var}(\mathrm{IQ}) - b^2\,\mathrm{var}(F_{\mathrm{roh}})}{\mathrm{var}(F_{\mathrm{roh}})\,(n-2)}$. Our results suggest that sample sizes of 700 are sufficient for achieving adequate (80%) statistical power when the rate of inbreeding is high (similar to that found in a randomly breeding population of size Ne = 100), such as could occur in samples selected on the basis of likely recent inbreeding or in small isolated populations. Our results also suggest that inbreeding depression effects can be detected using Froh in large, ostensibly outbred populations, such as those found in modern industrialized societies, but that large samples (on the order of 12,000–65,000, depending on effect size) are probably necessary to detect them reliably. It is possible that phenotypes more related to fitness than IQ would show a larger inbreeding depression effect and thus would require smaller sample sizes than those predicted here to achieve adequate power.
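A simplified version of this power calculation is sketched below. It uses the variance of b given above but, for simplicity, replaces the t distribution with a normal approximation (reasonable at the sample sizes of interest) and assumes a two-sided α = 0.05. `regression_power` and `required_n` are hypothetical names, and the closed form in `required_n` ignores the negligible probability of rejecting in the wrong direction.

```python
import math
from statistics import NormalDist

IQ_VAR = 15.0**2  # variance of the outcome (IQ)


def regression_power(n, slope, var_froh, var_y=IQ_VAR, alpha=0.05):
    """Approximate power of a two-sided test of the slope of an outcome regressed on Froh."""
    resid_var = var_y - slope**2 * var_froh           # var(IQ) - b^2 var(Froh)
    se = math.sqrt(resid_var / (var_froh * (n - 2)))  # square root of var(b)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ncp = abs(slope) / se
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


def required_n(slope, var_froh, power=0.80, var_y=IQ_VAR, alpha=0.05):
    """Smallest n reaching the target power (normal approximation, one rejection tail)."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    resid_var = var_y - slope**2 * var_froh
    return math.ceil(2 + z_sum**2 * resid_var / (var_froh * slope**2))


# Genome-wide variances of Froh from the preceding paragraph
print(required_n(-73, 2.63e-5))  # large effect, Ne = 10,000
print(required_n(-39, 1.79e-5))  # small effect, real SNP data
```

With these inputs the sketch gives roughly 12,600 and 65,000, bracketing the 12,000–65,000 range quoted above; because of the approximations, it should not be expected to reproduce every point in Figure 11 exactly.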

Discussion

Hundreds of scientific investigations have been conducted on the effects of inbreeding since Darwin (1868, 1876) first studied the topic. The inbreeding coefficient F has traditionally been defined in terms of pedigree inbreeding (Fped). However, low variation in Fped seriously hampers the ability to detect inbreeding effects in samples that are not selected on the basis of known recent inbreeding. In essence, Fped is a genome-wide expectation for the percentage of the genome that is autozygous, but there is a great deal of variation in autozygosity around this expected percentage caused by the finite number of recombination events per generation. The resulting identical haplotypes can persist in the population for many generations, coming together in offspring of distantly related individuals to create increased levels of homozygosity. The availability of genomic marker panels has made possible new genomic alternatives to Fped that can better detect such identical haplotypes and, therefore, even quite ancient inbreeding.
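The gap between Fped as an expectation and realized autozygosity can be illustrated with a small gene-dropping simulation. The sketch below is not the Fregene-based simulation used in this study: it assumes a single chromosome of 1 Morgan, crossovers placed by a Haldane (no-interference) model, and a first-cousin mating, whose pedigree expectation is F = 1/16. All function names are hypothetical. Because a real genome comprises many chromosomes that assort independently, one short chromosome exaggerates the spread; the point is only that realized autozygosity scatters around its pedigree expectation.

```python
import random
from bisect import bisect_right
from statistics import mean, pstdev

# A haplotype is a sorted list of (start position in Morgans, founder label) covering [0, length).


def label_at(hap, x):
    """Founder label carried by haplotype hap at position x."""
    starts = [s for s, _ in hap]
    return hap[bisect_right(starts, x) - 1][1]


def meiosis(parent, length, rng):
    """One gamete from a parent (a pair of haplotypes); crossovers at rate 1 per Morgan."""
    crossovers = []
    pos = rng.expovariate(1.0)
    while pos < length:
        crossovers.append(pos)
        pos += rng.expovariate(1.0)
    bounds = [0.0] + crossovers + [length]
    current = rng.randrange(2)  # parental haplotype the gamete starts on
    gamete = []
    for a, b in zip(bounds, bounds[1:]):
        hap = parent[current]
        for p in [a] + [s for s, _ in hap if a < s < b]:
            gamete.append((p, label_at(hap, p)))
        current = 1 - current  # switch haplotypes at each crossover
    return gamete


def autozygous_fraction(h1, h2, length):
    """Fraction of the chromosome where both haplotypes carry the same founder segment."""
    cuts = sorted({s for s, _ in h1} | {s for s, _ in h2}) + [length]
    shared = 0.0
    for a, b in zip(cuts, cuts[1:]):
        mid = (a + b) / 2
        if label_at(h1, mid) == label_at(h2, mid):
            shared += b - a
    return shared / length


def founder(name):
    """A non-inbred founder whose two haplotypes carry unique labels."""
    return ([(0.0, name + "a")], [(0.0, name + "b")])


def first_cousin_offspring(length, rng):
    """Realized autozygosity of one child of a first-cousin mating (pedigree F = 1/16)."""
    g1, g2 = founder("G1"), founder("G2")  # the cousins' shared grandparents
    sib1 = (meiosis(g1, length, rng), meiosis(g2, length, rng))
    sib2 = (meiosis(g1, length, rng), meiosis(g2, length, rng))
    cousin1 = (meiosis(sib1, length, rng), meiosis(founder("O1"), length, rng))
    cousin2 = (meiosis(sib2, length, rng), meiosis(founder("O2"), length, rng))
    child = (meiosis(cousin1, length, rng), meiosis(cousin2, length, rng))
    return autozygous_fraction(child[0], child[1], length)


rng = random.Random(2011)
realized = [first_cousin_offspring(length=1.0, rng=rng) for _ in range(2000)]
print(f"mean realized F = {mean(realized):.3f} (pedigree expectation 0.0625)")
print(f"SD of realized F = {pstdev(realized):.3f}")
```

The mean should sit close to 1/16, while individual values range from zero to large fractions of the chromosome, which is the variation that Fped cannot see.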

In the last 10 years, a large number of human (reviewed in Ku et al. 2010) and animal (reviewed in Chapman et al. 2009) studies have investigated the relationship between genomic estimates of F and disease- or fitness-related traits. In general, animal studies have focused on excess homozygosity on a marker-by-marker basis (Fh), whereas human studies, which have the opportunity to use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).

Figure 9 Variance of F (±1 SE) as a function of generations since population expansion. Whereas Fped drops immediately following a population expansion, the variance in genomic measures of F requires hundreds of generations to reach equilibrium levels.

Figure 8 Correlations between alternative thresholds of Froh and the homozygous mutation load as a function of Ne. Runs of homozygosity (ROHs) are defined as stretches of 0.5-Mb, 1.5-Mb, or 5-Mb homozygous SNPs. Long thresholds are optimal for detecting autozygosity in highly inbred populations, whereas shorter thresholds are optimal for detecting autozygosity in outbred populations.

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even in cases where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression, and this advantage is especially pronounced following a population expansion such as has occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).

Nevertheless, the variance in Froh in large simulated (Ne = 10,000) and observed Caucasian SNP data sets is low, and because of this there is likely to be little power to detect inbreeding effects in unselected (with respect to recent inbreeding) samples unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ–inbreeding effects using Froh in unselected samples. Thus, current studies investigating the effects of Froh on human complex traits that have sample sizes <3000 and that have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.

Our findings suggest two strategies for achieving sufficient power in studying inbreeding depression of complex traits using a genomic estimate of F. The first is to conduct analyses on very large (e.g., n > 10,000) samples, such as those being put together by multisite consortia (International Schizophrenia Consortium 2009). The second is to conduct analyses on smaller samples (perhaps 1000–3000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to be a more powerful approach for detecting inbreeding depression than any other alternative.

While much of the focus in this article has been on understanding the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will manifest lower levels of variation in F than those reported here and will require commensurately larger sample sizes to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller sample sizes.

Figure 11 Estimated power to detect inbreeding effects on a human complex trait using Froh. Higher levels of real inbreeding (smaller Ne) lead to higher variance in Froh and thus greater statistical power to detect an inbreeding effect. Large (solid lines) and small (dashed lines) inbreeding effect sizes were derived from a review on the effects of consanguinity on IQ (see text). Arrows show predicted sample sizes required to achieve 80% power. When inbreeding is high (Ne = 100), sample sizes of 400 are adequate, but in outbred populations (Ne = 10,000 or real SNP data), sample sizes of 20,000 may be required.

Figure 10 Correlations between F and homozygous mutation load as a function of generations since population expansion. The relationship between the homozygous mutation load and Froh increases the most quickly following a population expansion.


The simulation parameters explored in this article were chosen to mimic the human genome. Recombination rates and recombination hotspot rates can vary at least threefold across mammals (Jensen-Seaman et al. 2004; Kauppi et al. 2004). Nevertheless, the basic conclusions made here are not overly dependent on the specific parameters chosen and can readily be applied to animals with different recombination parameters. When distance is conceptualized in centimorgans rather than base-pair units, different recombination rates lengthen or shorten the genome vis-à-vis our simulation. This would slightly lower or raise (respectively) the variance of Froh, as shown above when variation in Froh was extrapolated to its expected genome-wide value. Similarly, when genomic distance is measured in centimorgans, different recombination hotspot parameters change the variation in SNP density vis-à-vis our simulation. This should have minimal effect on the ROHs detected, and therefore on Froh, because even the shortest ROH encompasses 175 hotspots on average. Thus, while the specific values in our results depend on the specific simulation parameters used and are therefore most relevant to human populations, the qualitative conclusions regarding the advantages of Froh over the alternative estimates of F apply across a wide range of parameters and extend to nonhuman animals as well.

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives partly from its relevance to many theoretical concepts in population genetics. Evolutionists have also used inbreeding as a way to gauge which traits are likely to have been under ancestral selection, given that the traits most affected by it tend to be most related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Bittles and Black 2010b; Rudan et al. 2003a). Rates of recent inbreeding are not low across the world. For example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have detectable effects, and we have demonstrated the optimal way of identifying such ancient inbreeding. Investigations into ancient inbreeding effects should help investigators understand the evolutionary forces acting on the genes underlying trait variation and whether inbreeding represents an important risk factor in disease.
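The Fped threshold for second cousins follows from Wright's path-counting formula, $F = \sum_A (1/2)^{n_1 + n_2 + 1}(1 + F_A)$, summed over paths through each common ancestor A, where $n_1$ and $n_2$ are the numbers of generations separating each parent from A and $F_A$ is A's own inbreeding coefficient. The sketch below evaluates it with exact fractions for a few familiar matings, assuming non-inbred common ancestors; `f_ped` is a hypothetical name.

```python
from fractions import Fraction


def f_ped(paths):
    """Wright's path-counting inbreeding coefficient.

    paths: iterable of (n1, n2, f_ancestor), one entry per common ancestor A, where
    n1 and n2 count the generations from each parent of the individual up to A.
    """
    return sum(
        (Fraction(1, 2) ** (n1 + n2 + 1) * (1 + Fraction(f_a)) for n1, n2, f_a in paths),
        Fraction(0),
    )


# Each mating below has two common ancestors (a couple), assumed non-inbred.
matings = {
    "full sibs": [(1, 1, 0), (1, 1, 0)],
    "first cousins": [(2, 2, 0), (2, 2, 0)],
    "second cousins": [(3, 3, 0), (3, 3, 0)],
}
for name, paths in matings.items():
    f = f_ped(paths)
    print(f"{name}: F = {f} = {float(f):.4f}")
```

Offspring of second cousins get F = 1/64 ≈ 0.016, the lower bound quoted above; closer matings give larger values (1/16 for first cousins, 1/4 for full sibs).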

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting and finding optimal thresholds for the simulation, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of Genotypes and Phenotypes (dbGaP), found at http://www.ncbi.nlm.nih.gov/gap, through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study; M.C.K. conducted analyses; and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinformatics 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity–fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.


Enciso-Mora V F J Hosking and R S Houlston 2010 Risk ofbreast and prostate cancer is not associated with increased ho-mozygosity in outbred populations Eur J Hum Genet 18909ndash914

Fenner J N 2005 Cross-cultural estimation of the human gen-eration interval for use in genetics-based population divergencestudies Am J Phys Anthropol 128 415ndash423

Fisher R A 1954 A fuller theory of ldquojunctionsrdquo in inbreedingHeredity 8 187ndash197

Franklin I R 1977 The distribution of the proportion of thegenome which is homozygous by descent in inbred individualsTheor Popul Biol 11 60ndash80

Gibson J N E Morton and A Collins 2006 Extended tracts ofhomozygosity in outbred human populations Hum Mol Genet15 789ndash795

Gindilis V M R G Gainullin and L M Shmaonova1989 Genetico-demographic patterns of the prevalence of var-ious forms of endogenous psychoses Genetika 25 734ndash743

Hill W G and B S Weir 2011 Variation in actual relationshipas a consequence of Mendelian sampling and linkage GenetRes (Camb) 93 47ndash64

Hosking F J E Papaemmanuil E Sheridan S E Kinsey T Lightfootet al 2010 Genome-wide homozygosity signatures and child-hood acute lymphoblastic leukemia risk Blood 115 4472ndash4477

International Schizophrenia Consortium 2009 A framework forinterpreting genome-wide association studies of psychiatric dis-orders Mol Psychiatry 14 10ndash17

Jensen-Seaman M I T S Furey B A Payseur Y Lu K M Roskinet al 2004 Comparative recombination rates in the ratmouse and human genomes Genome Res 14 528ndash538

Kauppi L A J Jeffreys and S Keeney 2004 Where the cross-overs are recombination distributions in mammals Nat RevGenet 5 413ndash424

Ku C S N Naidoo S M Teo and Y Pawitan 2010 Regions ofhomozygosity and their impact on complex diseases and traitsHum Genet 129 1ndash15

Kudo A K Ito and K Tanaka 1972 Genetic studies on inbreed-ing in some Japanese populations X The effects of parentalconsanguinity on psychometric measurements school perform-ances and school attendance in Shizuoka school-children JpnJ Hum Genet 17 231ndash248

Lachance J 2009 Inbreeding pedigree size and the most recentcommon ancestor of humanity J Theor Biol 261 238ndash247

Lebel R R and W B Gallagher 1989 Wisconsin consanguinity stud-ies II familial adenocarcinomatosis Am J Med Genet 33 1ndash6

Lencz T C Lambert P DeRosse K E Burdick T V Morgan et al2007 Runs of homozygosity reveal highly penetrant recessive lociin schizophrenia Proc Natl Acad Sci USA 104 19942ndash19947

Leutenegger A L B Prum E Genin C Verny A Lemainque et al2003 Estimation of the inbreeding coefficient through use ofgenomic data Am J Hum Genet 73 516ndash523

Li L H S F Ho C H Chen C Y Wei W C Wong et al2006 Long contiguous stretches of homozygosity in the humangenome Hum Mutat 27 1115ndash1121

McQuillan R A L Leutenegger R Abdel-Rahman C S FranklinM Pericic et al 2008 Runs of homozygosity in European pop-ulations Am J Hum Genet 83 359ndash372

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: What prevents the exponential increase of one's ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10001 Dalmatians": Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor: N. A. Rosenberg

Detecting Ancient Inbreeding 249

GENETICS Supporting Information

http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130922


et al. 2008). Fregene simulates the evolutionary process of genetic sequence data in a population following the Fisher–Wright model (a monoecious, diploid, randomly mating population that evolves across nonoverlapping generations). Mating was random except that selfing was not allowed. It should be noted that the variation of all estimates of F will be different if mating is not random: nonrandom mating typically increases variation in F, as does a reduction in Ne. Therefore, smaller effective population sizes could be used as a proxy for studying variation of estimates of F in the presence of nonrandom mating.

The effective population size (Ne) of humans has been estimated at 10,000 on the basis of molecular variation (Takahata et al. 1995) and to have passed through a bottleneck reducing the population size to 3000 for Caucasians and 8000 for Africans on the basis of linkage disequilibrium patterns (Tenesa et al. 2007). In the present study, we simulated effective population sizes of Ne = 100, 1000, and 10,000. Simulating larger population sizes was not computationally feasible. Each individual's genome was composed of two homologous chromosomes of length 100 Mb. We chose 100-Mb genomes for reasons of computational feasibility and because 100 Mb is roughly the size of a typical mammalian chromosome. Fregene's recombination model allows for crossover rates that vary along the chromosome at both broad scales (regions several megabases in length that differ in background recombination rates) and fine scales (corresponding to recombination hotspots 2 kb in length). Recombination rates for the present set of simulations averaged $\nu = 1.1 \times 10^{-8}$ per site per generation; 80% of recombination events occurred in hotspot regions, and the average distance between hotspots was 8500 bp. Mutations arose at rate $\mu = 2.3 \times 10^{-8}$ per site per generation, and all mutations were neutral with respect to fitness. New mutations occurred uniformly and at random across the genome; thus, mutations could arise at an already polymorphic site, allowing both "double hit" mutations (if occurring at an ancestral allele) and "back" mutations (if occurring at a derived allele), although these types of mutations were extremely rare. These parameters, the defaults in Fregene, were based on estimated per-site recombination and mutation rates in human populations (Chadeau-Hyam et al. 2008).

To reduce the computational time taken by our simulations, we used Fregene's scale_exp option with the scaling parameter $\lambda = 20$. This reduced the population size and number of generations 20-fold but increased $\nu$ and $\mu$ 20-fold, which kept the population mutation parameter $\theta = 4N_e\mu$ and the population linkage parameter $\rho = 4N_e\nu$ constant, thereby mimicking the degree of variation and linkage disequilibrium in nonscaled populations while decreasing the computational time by over an order of magnitude. Populations evolved for 100,000 generations (i.e., across 5000 loops once scaled), ensuring that they reached drift–mutation–recombination equilibrium (Chadeau-Hyam et al. 2008).

One drawback of scaling in Fregene is that the mixture of rare to common variants is slightly inaccurate. For example, with $\lambda = 20$ and Ne = 1000, no variant can have a minor allele frequency below 0.01 (rather than 0.0005, which it would be if unscaled). To minimize this effect, at the end of the 100,000 generations (5000 loops) we ran Fregene with no scaling for an additional 1000 generations (1000 loops) for each population. The additional unscaled generations allowed polymorphism levels (as judged by heterozygosity) to build up to expected levels. The observed levels of heterozygosity in the three populations at the end of the simulation ($9.3 \times 10^{-4}$, $9.3 \times 10^{-5}$, and $9 \times 10^{-6}$ for Ne = 10,000, 1000, and 100, respectively) conformed closely to those predicted by theory, $4N_e\mu/(4N_e\mu + 1)$ ($9.2 \times 10^{-4}$, $9.2 \times 10^{-5}$, and $9 \times 10^{-6}$).
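As a quick check on the theory quoted above, the Python sketch below evaluates $4N_e\mu/(4N_e\mu + 1)$ for the three simulated population sizes and confirms that dividing Ne and the run length by $\lambda$ while multiplying the per-site rates by $\lambda$ leaves $\theta$ and $\rho$ unchanged. The function names are illustrative and are not part of Fregene; only the parameter values come from the text.

```python
import math

MU = 2.3e-8   # per-site mutation rate used in the simulations
NU = 1.1e-8   # mean per-site recombination rate used in the simulations


def expected_heterozygosity(ne: float, mu: float) -> float:
    """Neutral expectation 4*Ne*mu / (4*Ne*mu + 1)."""
    theta = 4.0 * ne * mu
    return theta / (theta + 1.0)


def scale(ne: float, generations: int, mu: float, nu: float, lam: int = 20):
    """Illustrative scaling: divide Ne and the number of generations by lam,
    multiply the per-site mutation and recombination rates by lam."""
    return ne / lam, generations // lam, mu * lam, nu * lam


for ne in (10_000, 1_000, 100):
    ne_s, gen_s, mu_s, nu_s = scale(ne, 100_000, MU, NU)
    # theta = 4 Ne mu and rho = 4 Ne nu are invariant under the scaling
    assert math.isclose(4 * ne_s * mu_s, 4 * ne * MU, rel_tol=1e-12)
    assert math.isclose(4 * ne_s * nu_s, 4 * ne * NU, rel_tol=1e-12)
    print(f"Ne={ne}: expected H = {expected_heterozygosity(ne, MU):.2e}, "
          f"scaled generations = {gen_s}")
```

The printed expected heterozygosities agree with the theoretical values given above to within rounding, and the 100,000 generations shrink to 5000 loops.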

To investigate the effects of population size on various estimates of F, we needed to (a) obtain information on pedigrees going back several generations, (b) derive SNP data from the sequences in the samples, and (c) create samples of equal numbers of individuals despite the varying population sizes. To accomplish the first of these goals, we modified the Fregene program to track and write out pedigree information for each individual going back five generations (as we show, going back farther than this was unnecessary). To accomplish the second goal of creating SNP data, we selected all variants with minor allele frequency (MAF) > 0.05 from the sequence data that Fregene outputs at the final generation. This resulted in 2700 SNPs (1 SNP per 36 kb) in samples from the Ne = 100 population, 27,000 SNPs (1 SNP per 3.7 kb) in samples from the Ne = 1000 population, and 274,000 SNPs (1 SNP per 360 bp) in samples from the Ne = 10,000 population. These differences in SNP densities reflect the larger numbers of variants that naturally occur in larger populations (Crow and Kimura 1970).
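The SNP versus rare-variant split and the resulting SNP spacing can be sketched as follows, assuming genotypes are stored as an individuals × variants array of allele counts (0, 1, 2). Treating a MAF exactly equal to 0.05 as rare is an assumption, since the text does not say which side of the cutoff such sites fall on; the function names and the toy matrix are hypothetical.

```python
import numpy as np


def minor_allele_frequency(geno: np.ndarray) -> np.ndarray:
    """geno: individuals x variants array of allele counts (0, 1, 2)."""
    freq = geno.mean(axis=0) / 2.0
    return np.minimum(freq, 1.0 - freq)


def split_snps_and_rare(geno: np.ndarray, threshold: float = 0.05):
    """Boolean masks for SNPs (MAF > threshold) and rare variants
    (0 < MAF <= threshold). Monomorphic sites fall in neither set."""
    maf = minor_allele_frequency(geno)
    snp = maf > threshold
    rare = (maf > 0) & ~snp
    return snp, rare


def bp_per_snp(n_snps: int, genome_bp: float = 100e6) -> float:
    """Average SNP spacing across a 100-Mb genome."""
    return genome_bp / n_snps


geno = np.zeros((20, 3), dtype=int)   # site 0 stays monomorphic
geno[0, 1] = 1                        # one heterozygote: MAF = 1/40 = 0.025 (rare)
geno[:10, 2] = 1                      # ten heterozygotes: MAF = 10/40 = 0.25 (SNP)
snp, rare = split_snps_and_rare(geno)
print(snp, rare)
for n in (2_700, 27_000, 274_000):
    print(n, round(bp_per_snp(n)))
```

For 2700, 27,000, and 274,000 SNPs on a 100-Mb genome the spacing comes out at roughly 37 kb, 3.7 kb, and 365 bp, in line with the densities reported above.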

The third goal, to create equal experimental sample sizes for each population size, was motivated by the fact that we did not want our results across different population sizes to be confounded by differing sample sizes. To analyze adequately large samples of equal size across the three population sizes, we proceeded as follows. For the Ne = 100 population, 100 subpopulations (each of Ne = 100) split off from the original Ne = 100 population and evolved independently for 50 generations. Groups of 10 of these subpopulations were randomly placed together (without replacement) to create 10 samples of size n = 1000 each. Because all 100 subpopulations evolved from the same progenitor population, the common variants were generally the same between subpopulations, allowing SNP sets to be created from the n = 1000 samples. Thus, despite having the same levels of inbreeding as a population of size Ne = 100, each of the 10 replicate samples consisted of 1000 individuals. Similarly, for the Ne = 1000 population, 10 subpopulations of size Ne = 1000 split off from the original population (of size Ne = 1000) and evolved independently for 50 generations; at the end of the 50 generations, these subpopulations were converted into 10 samples of size n = 1000 each. Finally, the Ne = 10,000 population was split into 10 replicate samples of n = 1000 individuals each, which evolved for 50 generations. In this way, 10 samples of size n = 1000 were derived from each of the different population sizes (see Figure 1). By evolving independently for 50 generations, each subpopulation (and therefore each sample) had independent breeding (and inbreeding) histories within a 50-generation time frame.

To study the effects of population expansion on estimates of F, we allowed a population of size Ne = 100 that had reached drift–mutation equilibrium to expand instantaneously to Ne = 10,000 and then to evolve at this size for 400 generations [11,400 years for humans (Fenner 2005)]. Computational limitations prevented us from investigating larger and more realistic expansions (e.g., Ne = 10,000 to 1,000,000), but conclusions from studying smaller-scale expansions should apply to larger expansions as well. We wrote out pedigree, sequencing, and SNP data for 10 samples of size n = 1000, as described above, at generations 0 (immediately before expansion and therefore identical to the Ne = 100 data), 50, 100, 200, and 400. Results from generation 100,000 come from the final generation of the constant Ne = 10,000 population.

A proxy for the homozygous mutation load

Given the evidence that inbreeding depression is caused by homozygosity at numerous partially recessive deleterious mutations (Charlesworth and Willis 2009), and because deleterious mutations rarely reach frequencies > 0.05 in the population (Pritchard 2001), we approximated an individual's overall load of homozygous recessive/partially recessive deleterious mutations by defining mutations as the set of all variants not included in the SNP data (i.e., all variants with MAF < 0.05 in the original sequence data) and counting, for each individual, the number of such mutations at which the individual was homozygous for the rare allele. We call this measure the "homozygous mutation load." We computed the homozygous mutation load using rare neutral alleles for reasons of computational efficiency, but rare neutral alleles approximated well the behavior of mildly deleterious, partially recessive alleles. For all population sizes, the observed distribution of allele frequencies of simulated neutral alleles with MAF < 0.05 was very similar to the theoretically expected frequency distributions of either partially recessive (h = 0.25), very mildly deleterious mutations (s = 1/Ne) or fully recessive mutations (h = 0) of much larger effect (s = 50/Ne), where h is the dominance coefficient and s the selection coefficient (see supporting information, Figure S1) (Crow and Kimura 1970). Thus, the homozygous mutation load investigated here simulates what would be observed if inbreeding depression is due to the aggregated effect of homozygosity at a large number of recessive to partially recessive, mildly deleterious alleles.
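A minimal sketch of the homozygous mutation load as defined above: for each individual, count the rare variants (MAF < 0.05) at which it carries two copies of the minor allele. It assumes an individuals × variants matrix of allele counts in which the counted allele may be either the minor or the major one, so each site is first re-coded to minor-allele copies. The function name and the toy data are hypothetical.

```python
import numpy as np


def homozygous_mutation_load(geno, maf_threshold: float = 0.05) -> np.ndarray:
    """Per-individual count of rare variants (0 < MAF < maf_threshold) at which
    the individual carries two copies of the minor allele.

    geno: individuals x variants array counting copies of an arbitrary allele.
    """
    geno = np.asarray(geno)
    freq = geno.mean(axis=0) / 2.0
    counted_is_minor = freq <= 0.5
    maf = np.where(counted_is_minor, freq, 1.0 - freq)
    rare = (maf > 0) & (maf < maf_threshold)
    # Re-express every site as copies of its minor allele
    minor_copies = np.where(counted_is_minor, geno, 2 - geno)
    return (minor_copies[:, rare] == 2).sum(axis=1)


geno = np.zeros((40, 2), dtype=int)
geno[0, 0] = 2     # site 0: counted allele is rare; individual 0 is homozygous for it
geno[:, 1] = 2     # site 1: counted allele is common ...
geno[1, 1] = 0     # ... and individual 1 is homozygous for the rare allele
load = homozygous_mutation_load(geno)
print(load[:3])
```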

Real SNP data

To compare results from simulated SNP data with results from real human SNP data, we used 1000 randomly selected control individuals of Caucasian descent from the publicly available Molecular Genetics of Schizophrenia–GAIN (Genetic Association Information Network) sample (O'Donovan et al. 2008), genotyped on the Affymetrix 6.0 platform. All SNPs passed rigorous quality control metrics (MAF > 0.05, missingness < 0.02, Hardy–Weinberg equilibrium P-values > 0.0001), and individuals who had a missingness rate > 0.02 or who were outliers on the first two principal component dimensions derived from an identical-by-state matrix were dropped. Genome-wide, 546,882 SNPs (1 SNP per 5.1 kb) of 906,600 passed these quality control thresholds. We then selected the first 100 Mb of SNP data from chromosomes 1–10 (after removing the centromere and surrounding heterochromatin, if applicable) to create 10 different SNP samples. We derived 10 replicates of the three genomic estimates of F (defined below) from these samples.
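The SNP and individual filters listed above can be sketched in Python as below, using the same thresholds. This is not PLINK's implementation: the Hardy–Weinberg test is a 1-df chi-square approximation swapped in for illustration (the text does not say which test was used), missing genotypes are assumed to be coded as −1, and the principal-component outlier step is not shown. The function names are hypothetical.

```python
import math
import numpy as np


def hwe_chisq_p(n_hom2: int, n_het: int, n_hom0: int) -> float:
    """Approximate Hardy-Weinberg P-value from a 1-df chi-square test."""
    n = n_hom2 + n_het + n_hom0
    p = (2 * n_hom2 + n_het) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom2, n_het, n_hom0)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # P(chi-square with 1 df > chi2)


def qc_filter(geno, maf_min=0.05, miss_max=0.02, hwe_min=1e-4):
    """geno: individuals x SNPs allele counts (0/1/2), -1 = missing.
    Returns boolean masks (snp_keep, individual_keep)."""
    geno = np.asarray(geno)
    called = geno >= 0
    snp_keep = np.zeros(geno.shape[1], dtype=bool)
    for j in range(geno.shape[1]):
        g = geno[called[:, j], j]
        if g.size == 0:
            continue
        freq = g.mean() / 2.0
        maf = min(freq, 1.0 - freq)
        missing = 1.0 - g.size / geno.shape[0]
        p_hwe = hwe_chisq_p(int((g == 2).sum()), int((g == 1).sum()),
                            int((g == 0).sum()))
        snp_keep[j] = maf > maf_min and missing < miss_max and p_hwe > hwe_min
    if not snp_keep.any():
        return snp_keep, np.zeros(geno.shape[0], dtype=bool)
    # individual missingness is measured over the SNPs that passed
    ind_missing = 1.0 - called[:, snp_keep].mean(axis=1)
    return snp_keep, ind_missing < miss_max


geno = np.zeros((200, 3), dtype=int)
geno[:50, 0] = 2
geno[50:150, 0] = 1      # SNP 0: 50/100/50 genotype counts, in HWE
geno[:, 1] = 1           # SNP 1: every genotype heterozygous, fails HWE
geno[:2, 2] = 1          # SNP 2: MAF = 0.005, fails the MAF filter
geno[0, 0] = -1          # individual 0 is missing at the only passing SNP
snp_keep, ind_keep = qc_filter(geno)
print(snp_keep, ind_keep[:3])
```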

Estimates of F

In some contexts, F can be conceptualized as a parameter rather than as an estimate. For example, Fped is a known property given a pedigree and a base population and can be called a parameter of an individual in this context. In the present context, however, Fped and all other F statistics investigated here are conceptualized as imperfect estimates of the actual level of autozygosity in an individual's genome. The average F estimate in a population depends on the base population (defined as the ancestral population when F = 0) and increases as one considers older base populations. However, the average F is not very important in the context considered here: variation in F among the individuals in the population is required to detect an inbreeding effect, so it is variation in F that we focus on. We use a recent base population (five generations ago) for pedigree inbreeding and the current population for genomic estimates of F, as presented in Powell et al. (2010).

Figure 1 Procedure for deriving 10 samples from each of three effective population sizes. Circles represent populations, arrows represent evolution and splitting/combining of each population, and squares represent samples of size 1000 derived from each population. The sizes of the shapes correspond to population (circles and arrows) or to sample (squares) sizes. See text for details.

240 M C Keller P M Visscher and M E Goddard

In each simulated sample, we calculated four alternative estimates of F, as well as two additional estimates of Froh based on shorter and longer megabase thresholds. In the real SNP data sample, we calculated the three genomic estimates of F. To compare the spread of the estimates of F, the variances of F were derived across the 1000 individuals in a sample, and the log (base 10) of these variances was taken to aid interpretability. As there were 10 replicate samples derived from each population size (see above), we then took the mean of the 10 log (base 10) variances and found the standard error around each of these means. Each estimate of F is described below.
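The summary statistic described above (the mean and standard error, across the 10 replicate samples, of log10 of the across-individual variance of F) can be sketched as follows. Using the sample variance with ddof=1 is an assumption, and the toy normal F values are for illustration only.

```python
import numpy as np


def mean_and_se_of_log10_variance(replicates):
    """replicates: sequence of 1-D arrays, each holding one F estimate per
    individual in one replicate sample. Returns (mean, standard error) of
    log10(variance) across the replicates."""
    log_vars = np.array([np.log10(np.var(f, ddof=1)) for f in replicates])
    se = log_vars.std(ddof=1) / np.sqrt(len(log_vars))
    return log_vars.mean(), se


rng = np.random.default_rng(0)
reps = [rng.normal(0.0, 0.01, size=1000) for _ in range(10)]   # toy F values
print(mean_and_se_of_log10_variance(reps))   # mean near log10(1e-4) = -4
```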

1. Fped: F from pedigree inbreeding going back five generations, as figured from Wright's path formula:

$$F_{\mathrm{ped},j} = \sum_{i=1}^{c} \left(\frac{1}{2}\right)^{n_{ij} + m_{ij} + 1}$$

where $m_{ij}$ and $n_{ij}$ refer to the lengths (in generations) of the paternal and maternal paths, respectively, to the ith common ancestor, and c refers to the number of common ancestors of individual j. For example, $n_{ij} = m_{ij} = 2$ for cousin–cousin inbreeding, and c = 2 when the cousins share two grandparents (Fped = 0.0625) and c = 1 when the cousins share one grandparent (Fped = 0.03125). Virtually all variation in Fped is captured within the most recent five generations (see Results), so paths $m_{ij}$ and $n_{ij}$ were not extended beyond this.

2. Fh is the canonical estimate of genomic F based on excess SNP homozygosity:

$$F_{h,j} = \frac{O(H_j) - E(H)}{m - E(H)}$$

where $O(H_j)$ is the observed number of homozygous SNPs for person j, $E(H) = \sum_{i=1}^{m} \left[1 - 2p_i(1 - p_i)\right]$ is the expected number of homozygous SNPs for all people in the sample, m is the number of SNPs, and $p_i$ is the MAF of SNP i (i = 1, …, m). This estimate can be obtained from PLINK (Purcell et al. 2007) using the --het command.

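3. Falt is an alternative estimate of F predicted to have lower error (Yang et al. 2010a):

$$F_{\mathrm{alt},j} = \frac{1}{m}\sum_{i=1}^{m} d_i - 1$$

where $d_i = 1/p_i$ and $1/q_i$ for a homozygote for the minor and the major allele, respectively, and 0 for a heterozygote at SNP i; $q_i$ is the frequency of the major allele at the ith SNP, and $p_i = 1 - q_i$. Under genotype frequencies $p_i^2 + Fp_iq_i$, $2p_iq_i(1 - F)$, and $q_i^2 + Fp_iq_i$, the expectation of $d_i$ is $1 + F$, which is why 1 is subtracted.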

4. Froh refers to the proportion of the genome (0–1) that is in runs of homozygosity (ROHs):

$$F_{\mathrm{roh},j} = \frac{\sum_k \mathrm{length}(\mathrm{ROH}_k)}{L}$$

where $\mathrm{ROH}_k$ is the kth ROH in individual j's genome and $L = 10^8$ bp, the length of the genome in both the simulated and the real SNP data sets. ROHs were found using PLINK and defined as stretches of continuously homozygous SNPs spanning at least 1.5 Mb (1.65 cM). The lengths of ROH segments generated by a single path should follow an exponential distribution with mean $1/(2g)$ M, where g is the number of generations since the last common ancestor (Fisher 1954). Thus, the expected length of an autozygous ROH segment caused by a common ancestor g = 30 generations in the past is approximately 1.65 cM (1.5 Mb).
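The four estimators can be sketched in Python as below, following the formulas above. This is a minimal illustration, not the authors' code: genotypes are assumed to be an individuals × SNPs array of allele counts, `p` is the frequency of the counted allele (the Fh and Falt formulas are symmetric in which allele is counted), ROH calling itself (done with PLINK in the text) is not reproduced, and `f_roh` simply takes a list of already-called ROH lengths. The check at the end uses simulated random-mating genotypes, for which both SNP-by-SNP estimates should average close to 0.

```python
import numpy as np


def f_ped(paths):
    """Wright's path formula as written above. `paths` holds one (m, n) pair per
    common ancestor: generations on the paternal and maternal sides. Ancestors
    are assumed non-inbred (base population), so no (1 + F_A) factor."""
    return sum(0.5 ** (m + n + 1) for m, n in paths)


def f_h(geno, p):
    """Excess-homozygosity estimate Fh. geno: individuals x SNPs allele counts
    (0/1/2); p: frequency of the counted allele at each SNP."""
    m = geno.shape[1]
    observed = (geno != 1).sum(axis=1)             # O(H_j)
    expected = np.sum(1.0 - 2.0 * p * (1.0 - p))   # E(H), same for everyone
    return (observed - expected) / (m - expected)


def f_alt(geno, p):
    """Falt = mean(d_i) - 1, with d_i = 1/p_i for a homozygote of the counted
    allele, 1/q_i for the other homozygote, and 0 for a heterozygote."""
    q = 1.0 - p
    d = np.where(geno == 2, 1.0 / p, np.where(geno == 0, 1.0 / q, 0.0))
    return d.mean(axis=1) - 1.0


def f_roh(roh_lengths_bp, genome_bp=1e8):
    """Proportion of a 100-Mb genome covered by already-called ROHs."""
    return float(np.sum(roh_lengths_bp)) / genome_bp


print(f_ped([(2, 2), (2, 2)]), f_ped([(2, 2)]))   # first cousins: 0.0625 0.03125
print(f_roh([1.5e6, 3.5e6]))                       # 0.05

rng = np.random.default_rng(1)
p_true = rng.uniform(0.05, 0.5, size=2000)
geno = rng.binomial(2, p_true, size=(1000, 2000))  # random-mating genotypes, F = 0
p_hat = geno.mean(axis=0) / 2.0
print(f_h(geno, p_hat).mean(), f_alt(geno, p_hat).mean())  # both close to 0
```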

To compare alternative Froh thresholds, we defined Froh-short using runs of homozygous SNPs of at least 0.5 Mb and Froh-long using runs of homozygous SNPs of at least 5 Mb. These three threshold values (0.5 Mb, 1.5 Mb, and 5 Mb) are the same as those used by McQuillan et al. (2008) in their investigations of Froh in European populations. Froh-long is likely to miss many autozygous stretches of ancient origin (e.g., from 10 generations back, which have expected sizes of 5 Mb) but is also likely to include very few ROHs that are not truly autozygous. On the other hand, Froh-short can detect more ancient autozygous segments (e.g., from 100 generations back) at the expense of a higher false detection rate, i.e., detecting a higher proportion of ROHs that are allozygous rather than autozygous. Such allozygous haplotypes may be superficially similar, creating runs of homozygosity at measured SNPs, yet be heterozygous at unmeasured rare mutations. Using the expected exponential distribution of ROH segments, our thresholds of 0.5 Mb, 1.5 Mb, and 5 Mb should capture 58%, 20%, and 0.5% (respectively) of all ROHs created from common ancestors 50 generations in the past.
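The exponential model for ROH lengths quoted above can be evaluated directly. This sketch assumes a uniform conversion of 1.1 cM per Mb, inferred from the text's pairing of 1.5 Mb with 1.65 cM; the real genetic map varies along the genome, and the function names are hypothetical.

```python
import math

CM_PER_MB = 1.1   # assumption: uniform map, 1.5 Mb ~ 1.65 cM as in the text


def mean_roh_cm(g: int) -> float:
    """Mean ROH length in cM for a single path to a common ancestor g
    generations back: 1/(2g) Morgans."""
    return 100.0 / (2.0 * g)


def fraction_at_least(threshold_mb: float, g: int) -> float:
    """Exponential-model probability that such a ROH is at least threshold_mb long."""
    return math.exp(-threshold_mb * CM_PER_MB / mean_roh_cm(g))


print(round(mean_roh_cm(30), 2))   # about 1.67 cM
for mb in (0.5, 1.5, 5.0):
    print(mb, round(fraction_at_least(mb, 50), 3))
```

For ancestors 50 generations back it gives about 58%, 19%, and 0.4% for the 0.5-, 1.5-, and 5-Mb thresholds, close to the percentages quoted above; the small differences reflect the assumed cM-per-Mb conversion.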

Results

Time to most recent common ancestor between spouses

Given that ancient inbreeding is central to the present investigation and that the relationship between mates determines the inbreeding coefficient of their progeny, it is useful to have an intuition of how long ago the most recent common ancestor of two mates lived in a randomly mating population. A similar issue has been investigated with respect to the amount of overlap in pedigrees between two individuals (Ohno 1996; Derrida et al. 2000), as we discuss below. Similarly, Chang (1999) assessed the number of generations one must travel back before reaching the most recent common ancestor of every human alive today, and it is surprisingly recent: about 110 generations (3100 years) ago (see also Rohde et al. 2004; Lachance 2009). Clearly, the most recent common ancestor of just two randomly chosen individuals (e.g., spouses) must be much more recent still. This can be quantified simply in a Fisher–Wright model.

For a population of constant size, consider two randomly chosen individuals (spouses), X and Y. Take a single ancestor of spouse X who lived t generations ago. Given nonoverlapping generations, the probability that this ancestor is the same person as any of spouse Y's ancestors who lived t generations ago is $A_t/N_t$, where $A_t$ is the total number of ancestors of Y at generation t and $N_t$ is the total number of individuals (all potential ancestors) in the population at generation t. This follows from the fact that the probabilities that the ancestor of X is related to each of Y's ancestors are independent of each other (i.e., multiple ancestors of Y can be the same person). The probability that the ancestor of X is unrelated to any ancestor of Y is therefore $1 - A_t/N_t$. Assuming that X also has $A_t$ distinct ancestors at time t, the probability that no ancestor of X is related to any ancestor of Y at time t is $(1 - A_t/N_t)^{A_t}$. Finally, the probability $c_g$ that a randomly chosen spousal pair share no common ancestors (no pedigree inbreeding) up to and including generation g is

$$c_g = \prod_{t=1}^{g} \left(1 - \frac{A_t}{N_t}\right)^{A_t}$$

The above calculation for $c_g$ relies on values of $A_t$ and $N_t$ that are not straightforward to derive. As an approximation, $A_t \approx 2^t$ and $N_t \approx N_e$. While the latter assumption is probably close to the truth on average, the former ($A_t \approx 2^t$) must overestimate the number of ancestors at time t, because $A_t$ cannot exceed $N_t$, whereas $2^t$ grows exponentially while $N_t$ remains constant. Put another way, as one travels back in a pedigree, duplicate ancestors grow increasingly common due to ancient inbreeding. This biases $c_g$ to be lower than it should be, but further investigation (not shown) indicates that this bias is minimal, not changing how far one must go back to find a common ancestor by more than a single generation.
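Under the approximations $A_t \approx 2^t$ and $N_t \approx N_e$, the product for $c_g$ is easy to evaluate numerically. In this sketch, capping $A_t$ at Ne is an added assumption (it makes the factor 0 once $2^t$ reaches Ne rather than letting $1 - A_t/N_t$ go negative), and treating $c_g < 0.01$ as "almost certain to share an ancestor" is an arbitrary illustrative cutoff.

```python
def prob_no_common_ancestor(ne: int, g: int) -> float:
    """c_g = prod_{t=1..g} (1 - A_t/N_t)^A_t with A_t ~ 2^t and N_t ~ Ne.
    A_t is capped at Ne (added assumption), so a factor becomes 0 once
    2^t >= Ne instead of going negative."""
    c = 1.0
    for t in range(1, g + 1):
        a = min(2 ** t, ne)
        c *= (1.0 - a / ne) ** a
    return c


# First generation at which c_g falls below the illustrative 0.01 cutoff
for ne in (100, 1_000, 10_000, 1_000_000):
    first = next(g for g in range(1, 40) if prob_no_common_ancestor(ne, g) < 0.01)
    print(ne, first)
```

With that cutoff it returns 6, 8, and 11 generations for Ne = 1000, 10,000, and 1,000,000, consistent with the values discussed in the Results.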

Figure 2 shows expected values of $c_g$ for four different population sizes: the three investigated in the present simulation, as well as Ne = 1 M, which is probably a larger mating pool than most modern humans belong to. We checked our theoretical predictions against simulation results for the three population sizes (Ne = 100, 1000, and 10,000) and the five generations of ancestry we recorded. The expected probabilities (solid dots) agree well with the observed proportions from the simulated data (colored x's).

Figure 2 shows that mates are likely to share a common ancestor in their recent ancestral past. For populations of size Ne = 1000, almost all spousal pairs have at least one ancestor in common from 6 generations in the past. For Ne = 10,000, spousal pairs share a common ancestor within 8 generations. Indeed, for every 10-fold increase in population size, the number of generations one must go back before a common ancestor becomes certain increases by about 2 generations. Thus, at Ne = 1,000,000, spouses are almost certain to share a common ancestor within the last 11 generations. Increasing this population size to 100 million pushes this estimate back only to 15 generations. ROHs caused by inbreeding from common ancestors 15 generations in the past are about 3 Mb in length on average, although, due to the finite number of recombination events per generation, inbreeding events from 15 generations in the past often cause no ROHs. These results are consonant with those of Derrida et al. (2000), who found that there is substantial overlap between the pedigree trees of two randomly chosen individuals in a population of size 10,000 within 10 generations and that this overlap is complete (both individuals share the exact same ancestors) within 20 generations.

Variance of Fped

Given that the contribution to Fped at generation t is independent of the contribution to Fped from previous generations, the variation of Fped for a given population size can be apportioned into the amount contributed by each ancestral generation. Figure 3 shows that the contribution to the variation of Fped decreases log-linearly as a function of the contributing ancestral generation. The bend (nonlinearity) at generation 1 for the samples drawn from the Ne = 10,000 population occurs because several samples had no sib–sib inbreeding, making log10(var(Fped due to sib–sib inbreeding)) = −∞. Such values were set to "missing," biasing the remaining nonmissing values upward.

Figure 2 Shown is the probability that mates share no common ancestors in the most recent g generations as a function of population size (see text). The x's are the same values derived empirically from simulations for up to 5 generations in the past and show good agreement with the expected probabilities. Even in large randomly breeding populations (e.g., 1 million), it is almost certain that at least one ancestor exists in common between two pedigrees within 11 generations.

Our results show that very little variation in Fped was missed by ignoring inbreeding resulting from common ancestors more than five generations back (see Methods). This is despite the fact that quite long (e.g., 10 Mb) ROHs can be created from such inbreeding. Although we had no way of measuring the proportion of variation in genomic estimates of F that was due to each ancestral generation, our results below demonstrate that genomic estimates of F retain considerable variation in large populations after variation in Fped has diminished to nearly 0.

Variation and error in genomic and pedigree estimates of F

One important aim of the current project was to investigate the effects of population size on the variance of the four estimates of F. Variances were taken across estimates of F for each individual in samples of size 1000 drawn from the three population sizes (Ne = 100, 1000, and 10,000). Figure 4 shows the mean (±1 SE) of the log10 variances of F as a function of Ne. The variance decreases for all estimates of F as Ne increases, because of the lower levels of ancient and recent inbreeding in larger populations. As expected (Figure 3), Fped shows the least variation at every population size because it does not capture the variation in F caused by the random nature of recombination and segregation. The two estimates of F based on SNP-by-SNP homozygosity (Fh and Falt) show the highest variation, while Froh is intermediate. Figure 4 also shows the variance values for the three genomic estimates of F in 10 replicates of 100 Mb of real SNP data from an unselected (outbred) Caucasian control sample. The variances of Fh and Falt are slightly higher, and the variance of Froh is slightly lower, in the real data than in the Ne = 10,000 simulation.

The prediction error variance (PEV) of the three genomic estimates of F is shown in Figure 5, where $\mathrm{PEV} = \mathrm{var}(F_1 - F_2)/2$, $F_1$ is the estimate of F from a random half of SNPs, and $F_2$ is the estimate of F from the other half of SNPs (Powell et al. 2010; Yang et al. 2010a). PEV provides insight into the amount of error in the genomic estimates of F that exists due to the finite sampling of SNPs: as SNPs become more dense and linkage disequilibrium increases between markers (as occurs with larger Ne), the PEV should decrease. The PEV for all estimates of F is small (<5%) compared to the variances of F shown in Figure 4. As expected, the PEV of every estimate of F also decreases with population size, but the PEV for Froh decreases the most rapidly. Error in Froh decreases more rapidly as a function of Ne because with dense SNP data (such as exist when Ne = 10,000), ROHs are made up of a large number of homozygous SNPs in a row (e.g., a ROH of 1.5 Mb contains over 2000 SNPs in the Ne = 10,000 data), and such long sets of markers pick up the same sets of ROHs with very high precision.
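The split-half PEV defined above can be sketched as follows. In practice, `f1` and `f2` would be obtained by applying the same estimator to the two SNP halves returned by `split_snps_in_half` (a hypothetical helper); here they are toy values built with a known per-half error variance of $0.01^2$, so the PEV should come out near $10^{-4}$.

```python
import numpy as np


def split_snps_in_half(n_snps: int, rng: np.random.Generator):
    """Random, disjoint halves of the SNP column indices."""
    order = rng.permutation(n_snps)
    return order[: n_snps // 2], order[n_snps // 2:]


def prediction_error_variance(f1, f2) -> float:
    """PEV = var(F1 - F2) / 2 for estimates from two disjoint halves of SNPs."""
    diff = np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float)
    return float(np.var(diff, ddof=1)) / 2.0


rng = np.random.default_rng(2)
true_f = rng.normal(0.0, 0.05, size=20_000)        # toy individual-level F
f1 = true_f + rng.normal(0.0, 0.01, size=20_000)   # half-1 estimate with error
f2 = true_f + rng.normal(0.0, 0.01, size=20_000)   # half-2 estimate with error
pev = prediction_error_variance(f1, f2)
print(pev)                                         # close to 0.01**2 = 1e-4
half1, half2 = split_snps_in_half(10, rng)
```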

Relationships between genomic estimates of F and Fped

Figure 6 shows the correlation coefficients between the three genomic estimates of F and Fped. In all cases, Froh correlates most highly with Fped, reflecting the fact that Froh tends to capture more recent inbreeding than the other two genomic estimates of F. The increasingly large standard errors of the correlations are due to the higher standard errors and lower variances of Fped at larger population sizes.

Finding the optimal measure for detecting inbreeding depression: Relationships between estimates of F and the homozygous mutation load

To understand which estimate of F is likely to be optimal for detecting inbreeding depression effects, we correlated each of the four estimates of F with the homozygous mutation load, which, as explained above, reflects the leading explanation for why inbreeding depression occurs. Figure 7 shows these results. Fped is the worst predictor of the homozygous mutation load, and its disadvantage grows as the population size increases. This result is due to the low variance in Fped in randomly breeding populations and suggests that Fped is likely to be a useful estimate only in samples selected on the basis of recent consanguinity. Falt outperformed Fh because Falt gives more weight to rare variants and because Falt has lower error variance (Figure 5). As predicted, Froh was the estimate most strongly associated with the homozygous mutation load at every population size, and this advantage grows as the population size increases (Figure 7).

The superiority of Froh in detecting the homozygous mutation load is even more stark if one considers the unique effects of each estimate of F after controlling for their correlations with Froh. After controlling for Froh, the mean partial correlations (across Ne) between the homozygous mutation load and Fh (rp = 0.07), Falt (rp = 0.15), and Fped (rp = 0.09) are all low, but the reverse is not true: partial correlations between the homozygous mutation load and Froh remain high after controlling for Fh (rp = 0.57), Falt (rp = 0.50), and Fped (rp = 0.68). This shows that Froh contains much additional information on the homozygous mutation load that is not captured by the other three estimates of F, whereas the other three estimates of F contain little additional information on the homozygous mutation load beyond what already exists in Froh. The superiority of Froh occurs because a given rare variant is typically homozygous only when the haplotypic segment on which it exists meets another IBD segment within an individual, creating a run of homozygosity. If inbreeding depression is caused by homozygosity at rare mutations, as recent evidence indicates, these results suggest that Froh is likely to be the optimal estimate for detecting it, regardless of the level of inbreeding in the population.

Figure 3 Contribution to var(Fped) from previous generations. Most of the variance in Fped at all population sizes is attributable to recent inbreeding. For example, the variance in Fped due to spouses sharing common ancestors five generations in the past makes up only 0.2% of the total variance in Fped at all population sizes.
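The partial correlations reported above are first-order partial correlations. The text does not state how they were computed; the sketch below uses the standard formula $r_{xy\cdot z} = (r_{xy} - r_{xz}r_{yz})/\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$, which is equivalent to correlating the residuals of x and y after regressing each on z. The toy variables are invented for illustration and are not the paper's data.

```python
import math
import numpy as np


def partial_corr(x, y, z) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    r = np.corrcoef(np.vstack([x, y, z]))
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return (rxy - rxz * ryz) / math.sqrt((1.0 - rxz**2) * (1.0 - ryz**2))


rng = np.random.default_rng(3)
z = rng.normal(size=5000)        # stands in for Froh
x = z + rng.normal(size=5000)    # another estimate that shares only z
y = z + rng.normal(size=5000)    # stands in for the homozygous mutation load
pc = partial_corr(x, y, z)
print(np.corrcoef(x, y)[0, 1], pc)   # about 0.5, then near 0
```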

Figure 8 shows a comparison of the correlation coefficients between the homozygous mutation load and Froh calculated from short (≥0.5 Mb), moderate (≥1.5 Mb), and long (≥5 Mb) ROH thresholds as a function of Ne. Froh-short correlates poorly with the homozygous mutation load when Ne is small. In large populations, where inbreeding is more ancient, the situation reverses: Froh defined by short ROHs outperforms the other two Froh estimates. The improvement of Froh-short as a function of population size is due to two factors: (a) denser SNP data in the larger populations allow even short (0.5 Mb) ROHs to discriminate autozygosity accurately, and (b) a higher ratio of short to long ROHs in large populations. Froh-long performs poorly when Ne = 10,000 due to low variance. Froh defined by the moderate (1.5 Mb) threshold performed intermediately at each population size, providing justification for using this threshold in the current study, as it allowed more meaningful comparisons across the three population sizes.

Changes in F following population expansion

We wanted to understand how recent population expansionsuch as what occurred in human lineages over the last10000 years (350 generations) might alter our conclu-sions Figure 9 shows the change in the mean log10(vari-ance) of the four estimates of F as a function of generationssince a 100-fold expansion in population size from Ne frac14 100to N = 10000 While the variance in Fped decreases imme-diately (a 97 drop) and remains unchanged thereafter thethree genomic estimates of F change much more graduallyThe initial declines in the variances of the three genomicestimates of F are caused by the drop in recent inbreedingin larger populations However the declines in variancesbecome much slower thereafter reflecting the slow decayof linkage disequilibrium and the gradual increase in thenumber of new segregating variants following a populationexpansion For example after expanding from Ne frac14 100 toNe frac14 10000 the standard deviation of Froh decreases by70 from 0115 to 0035 in the first 100 generationsbut decreases only an additional 10 (to 0021) over thenext 300 generations and requires several thousands of gen-erations to reach its equilibrium value of 0015 Similarlystarting at 0142 the standard deviation of Falt decreasesto 0069 and 0055 in generations 100 and 400 respec-tively Our results should extrapolate to larger starting and

Figure 5 Prediction error variance (PEV) of genomic estimates of F as a function of Ne. PEV decreases as a function of population size for all genomic estimates of F but does so most rapidly for Froh.

Figure 4 Variance of F (±1 SE) as a function of Ne. The variance of Fped is the lowest at all population sizes, and the variance of Froh is intermediate. Comparing the variance of genomic measures of F in simulated data to the equivalent variances in real SNP data (from an outbred Caucasian sample) suggests that the effective population size of Caucasians is ~10,000 with respect to these statistics.

244 M C Keller P M Visscher and M E Goddard

ending population sizes (e.g., Ne = 10,000 to Ne = 1,000,000) and indicate that genomic estimates of F require considerable time following an expansion to reach the low variance values that would be predicted from the larger population sizes.

Figure 10 shows the correlations between estimates of F and the homozygous mutation load following population expansion. This correlation drops considerably for all estimates of F following population expansion. It remains low for Fped, Fh, and Falt in the 400 generations following the expansion, whereas the correlation with Froh begins to increase after 50 generations. When Ne is small (i.e., 100), rare mutant alleles are tagged by long haplotypes as a result of relatively recent inbreeding. When the population size is increased, these long haplotypes are destroyed by recombination, and new ones are not created quickly because new inbreeding occurs slowly. Consequently, the correlation between F and homozygous mutation load declines, as did the variance of F (Figure 9). However, as the population evolves at size Ne = 10,000 for an increasing number of generations, new mutations arise that exist on only a single haplotype, and the pairings of these, which create homozygosity at rare mutations, are increasingly detected by Froh and eventually by Fh and Falt as a new equilibrium is reached. These results provide further evidence that Froh is likely to be the optimal estimate for detecting inbreeding depression in populations, such as humans, that have expanded rapidly in size.

Power to detect inbreeding effects

The power of detecting inbreeding depression from marker data depends in part on the accuracy of predicting autozygosity with markers, and our results suggest that Froh is optimal for detecting autozygosity arising from both ancient and recent inbreeding. However, power also depends on the variation in F, as well as on the effect size and the sample size. When the level of inbreeding is low, such as in large randomly mating populations, the variance of Froh is commensurately low (e.g., the standard deviation of Froh ≈ 0.015 when Ne = 10,000). The standard deviation of Froh in observed human SNP data is smaller still: 0.012. Given such low variance in the predictor, we asked whether inbreeding depression is detectable using Froh in human samples and, if so, what sample sizes would be required to achieve adequate power as a function of different levels of inbreeding (assayed using different effective population sizes). To do this, we needed an estimate of the likely effect size of inbreeding depression on a complex trait in humans and an estimate of the variance of Froh scaled up to genome-wide levels.

Because, to our knowledge, the effects of inbreeding on IQ have been investigated more than those on any other complex human trait, we used a best estimate of the inbreeding effect on IQ as a guide to the likely inbreeding effect sizes for other human complex traits. Morton (1979) reviewed four large studies of cousin–cousin inbreeding and found that IQ decreased by 0.73 (Schull and Neel 1965) to 0.39 (Kudo et al. 1972) points for every 0.01 increase in Fped. Given the standard deviation of IQ (15 points), this translates to a decrease of 0.025–0.05 standard deviations per percentage of inbreeding.
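The arithmetic linking these reported effects to standard-deviation units, and to the slopes per unit of F that a regression of IQ on F would estimate, can be checked in a few lines. This is a minimal sketch with illustrative names; it uses only the numbers quoted above.

```python
IQ_SD = 15.0    # standard deviation of IQ, in points
DELTA_F = 0.01  # the reported effects are per 0.01 increase in Fped

# IQ points lost per 0.01 increase in Fped, as summarized by Morton (1979).
reported = {"Schull and Neel 1965": 0.73, "Kudo et al. 1972": 0.39}


def sd_units_per_percent_f(points, iq_sd=IQ_SD):
    """Decrease in IQ standard deviations per 1% (0.01) of inbreeding."""
    return points / iq_sd


def slope_per_unit_f(points, delta_f=DELTA_F):
    """Regression slope of IQ on F, in points per unit F (negative = depression)."""
    return -points / delta_f


for study, points in reported.items():
    print(f"{study}: {sd_units_per_percent_f(points):.3f} SD per 1% F, "
          f"slope {slope_per_unit_f(points):.0f} points per unit F")
```

Expressed per unit of F, the two effects correspond to slopes of −73 and −39 IQ points.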

To predict the statistical power of using Froh to detect inbreeding depression in human studies, we also needed

Figure 6 Correlations between Fped and genomic estimates of F as a function of Ne. All correlations between genomic estimates of F and Fped decrease as a function of Ne, but Froh is consistently the most highly correlated with Fped.

Figure 7 Correlations between F and homozygous mutation load as a function of Ne. Froh correlates most highly with the homozygous mutation load at all population sizes, and this advantage increases at larger population sizes (where inbreeding becomes more ancient).

Detecting Ancient Inbreeding 245

to estimate what the variance of Froh would be if derived genome-wide rather than across 100 Mb, as simulated in the current study. To do this, we compared the average variance of Froh across 10 replicates of 100 Mb of real SNP data (1.5 × 10⁻⁴) to the genome-wide variance of Froh in the real SNP data (1.79 × 10⁻⁵), an 8.45-fold drop in variance. We thus scaled the variances of Froh down 8.45-fold to account for the decreased variance that would be observed in Froh genome-wide. This led to predicted genome-wide variances (standard deviations) of Froh of 1.4 × 10⁻³ (0.037) for Ne = 100, 2.5 × 10⁻⁴ (0.016) for Ne = 1000, 2.63 × 10⁻⁵ (0.005) for Ne = 10,000, and 1.79 × 10⁻⁵ (0.004) for the real SNP data.
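The scaling step is simple arithmetic. The sketch below (illustrative names, fixed factor of 8.45 as in the text) shows that the square roots of the reported genome-wide variances reproduce the standard deviations given in parentheses. As a consistency check, applying the factor to the Ne = 10,000 equilibrium SD of 0.015 quoted earlier in this section gives about 2.66 × 10⁻⁵, close to the reported 2.63 × 10⁻⁵. Recomputing the ratio from the rounded 100-Mb variance (1.5 × 10⁻⁴) gives about 8.4 rather than 8.45. Treating the factor as constant across population sizes is the assumption made in the text.

```python
import math

# Drop in the variance of Froh from 100 Mb to the whole genome, observed in the
# real SNP data and applied as a fixed scaling factor.
SCALE_FACTOR = 8.45


def genome_wide_variance(var_100mb, scale=SCALE_FACTOR):
    """Approximate genome-wide variance of Froh from a 100-Mb variance."""
    return var_100mb / scale


# Predicted genome-wide variances quoted in the text.
REPORTED_GW_VAR = {"Ne = 100": 1.4e-3, "Ne = 1000": 2.5e-4,
                   "Ne = 10,000": 2.63e-5, "real SNP data": 1.79e-5}

for label, var in REPORTED_GW_VAR.items():
    print(f"{label}: variance {var:.3g}, SD {math.sqrt(var):.3f}")

# A 100-Mb SD of 0.015 (the Ne = 10,000 equilibrium value) scales to about 2.66e-5.
print(f"{genome_wide_variance(0.015 ** 2):.3g}")
```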

Figure 11 shows the statistical power, as a function of sample size, of a regression of an outcome variable on Froh. It assumes a slope of −73 (solid lines) or −39 (dashed lines), a standard deviation of the outcome variable of 15, and standard deviations of Froh that depend on the population size, as detailed in the preceding paragraph. We assume that $\hat{b} \sim t_{n-2}$ with mean $b = -73$ or $b = -39$ and

$$\operatorname{var}(\hat{b}) = \frac{\operatorname{var}(\mathrm{IQ}) - b^{2}\,\operatorname{var}(F_{roh})}{\operatorname{var}(F_{roh})\,(n-2)}.$$

Our results suggest that sample sizes of 700 are sufficient for achieving adequate (80%) statistical power when the rate of inbreeding is high (similar to that found in a randomly breeding population of size Ne = 100), as could occur in samples selected on the basis of likely recent inbreeding or in small isolated populations. Our results also suggest that inbreeding depression effects can be detected using Froh in large, ostensibly outbred populations, such as those found in modern industrialized societies. However, large samples (on the order of 12,000–65,000, depending on effect size) are probably necessary to detect them reliably. Phenotypes more closely related to fitness than IQ may show a larger inbreeding depression effect and thus would require smaller sample sizes than those predicted here to achieve adequate power.
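The power calculation can be sketched in a few lines of standard-library Python. The sketch adds assumptions that the text does not state: a two-sided test at α = 0.05 and a normal approximation to the t distribution with n − 2 degrees of freedom, which is negligible at these sample sizes. The function names are hypothetical. Under these assumptions the sketch gives roughly 12,600 individuals for Ne = 10,000 with a slope of −73 and roughly 65,000 for the real SNP data with a slope of −39, consistent with the 12,000–65,000 range above. It need not reproduce every arrow in Figure 11 exactly.

```python
from statistics import NormalDist

VAR_IQ = 15.0 ** 2  # variance of the outcome (IQ), in squared points

# Genome-wide variances of Froh from the preceding paragraph.
VAR_FROH = {"Ne = 100": 1.4e-3, "Ne = 1000": 2.5e-4,
            "Ne = 10,000": 2.63e-5, "real SNP data": 1.79e-5}


def slope_se(n, b, var_f, var_y=VAR_IQ):
    """SE of the slope from var(b) = (var(y) - b^2 var(F)) / (var(F) (n - 2))."""
    return ((var_y - b ** 2 * var_f) / (var_f * (n - 2))) ** 0.5


def power(n, b, var_f, alpha=0.05):
    """Two-sided power at level alpha, approximating t(n - 2) by a normal."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(b) / slope_se(n, b, var_f)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def required_n(b, var_f, target=0.80, alpha=0.05):
    """Smallest n with power >= target, found by doubling and then bisection."""
    lo, hi = 3, 3
    while power(hi, b, var_f, alpha) < target:
        lo, hi = hi + 1, hi * 2
    while lo < hi:
        mid = (lo + hi) // 2
        if power(mid, b, var_f, alpha) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


for label, var_f in VAR_FROH.items():
    n_large, n_small = (required_n(b, var_f) for b in (-73.0, -39.0))
    print(f"{label}: n = {n_large:,} (slope -73), {n_small:,} (slope -39)")
```

Bisection works here because power increases monotonically with n for a fixed slope and variance of Froh.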

Discussion

Hundreds of scientific investigations have been conducted on the effects of inbreeding since Darwin (1868, 1876) first studied the topic. The inbreeding coefficient, F, has traditionally been defined in terms of pedigree inbreeding (Fped). However, low variation in Fped seriously hampers the ability to detect inbreeding effects in samples that are not selected on the basis of known recent inbreeding. In essence, Fped is a genome-wide expectation for the percentage of the genome that is autozygous, but the finite number of recombination events per generation causes a great deal of variation in autozygosity around this expected percentage. The resulting identical haplotypes can persist in the population for many generations, coming together in offspring of distantly related individuals to create increased levels of homozygosity. The availability of genomic marker panels has made possible new genomic alternatives to Fped that can better detect such identical haplotypes and, therefore, even quite ancient inbreeding.

In the last 10 years, a large number of human (reviewed in Ku et al. 2010) and animal (reviewed in Chapman et al. 2009) studies have investigated the relationship between genomic estimates of F and disease- or fitness-related traits. In general, animal studies have focused on excess homozygosity on a marker-by-marker basis (Fh), whereas human

Figure 9 Variance of F (±1 SE) as a function of generations since population expansion. Whereas the variance of Fped drops immediately following a population expansion, the variance in genomic measures of F requires hundreds of generations to reach equilibrium levels.

Figure 8 Correlations between alternative thresholds of Froh and the homozygous mutation load as a function of Ne. Runs of homozygosity (ROHs) are defined as stretches of 0.5-Mb, 1.5-Mb, or 5-Mb homozygous SNPs. Long thresholds are optimal for detecting autozygosity in highly inbred populations, whereas shorter thresholds are optimal for detecting autozygosity in outbred populations.


studies, which have the opportunity to use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even in cases where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression, and this advantage is especially pronounced following a population expansion such as has occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).

Nevertheless, the variance in Froh in large simulated (Ne = 10,000) and observed Caucasian SNP data sets is low. As a result, there is likely to be little power to detect inbreeding effects in unselected (with respect to recent inbreeding) samples unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ-inbreeding effects using Froh in unselected samples. Thus, current studies of the effects of Froh on human complex traits that have sample sizes <3000 and have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.

Our findings suggest two strategies for achieving sufficient power to study inbreeding depression of complex traits using a genomic estimate of F. The first is to conduct analyses on very large (e.g., n > 10,000) samples, such as those being assembled by multisite consortia (International Schizophrenia Consortium 2009). The second is to conduct analyses on smaller samples (perhaps 1000–3000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to be a more powerful approach for detecting inbreeding depression than any of the alternatives.

While much of the focus in this article has been on understanding the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will manifest lower levels of variation in F than those reported here and will require commensurately larger sample sizes to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller sample sizes.

Figure 11 Estimated power to detect inbreeding effects on a human complex trait using Froh. Higher levels of real inbreeding (smaller Ne) lead to higher variance in Froh and thus greater statistical power to detect an inbreeding effect. Large (solid lines) and small (dashed lines) inbreeding effect sizes were derived from a review on the effects of consanguinity on IQ (see text). Arrows show predicted sample sizes required to achieve 80% power. When inbreeding is high (Ne = 100), sample sizes of ~400 are adequate, but in outbred populations (Ne = 10,000 or real SNP data), sample sizes on the order of 20,000 may be required.

Figure 10 Correlations between F and homozygous mutation load as a function of generations since population expansion. The relationship between the homozygous mutation load and Froh increases most quickly following a population expansion.


The simulation parameters explored in this article were chosen to mimic the human genome. Recombination rates and recombination hotspot rates can vary at least threefold across mammals (Jensen-Seaman et al. 2004; Kauppi et al. 2004). Nevertheless, the basic conclusions made here are not overly dependent on the specific parameters chosen and can readily be applied to animals with different recombination parameters. When distance is conceptualized in centimorgans rather than base-pair units, different recombination rates lengthen or shorten the genome vis-à-vis our simulation. This would slightly lower or raise (respectively) the variance of Froh, as shown above when variation in Froh was extrapolated to its expected genome-wide value. Similarly, when genomic distance is measured in centimorgans, different recombination hotspot parameters change the variation in SNP density vis-à-vis our simulation. This should have minimal effect on the ROHs detected, and therefore on Froh, because even the shortest ROH encompasses 175 hotspots on average. Thus, while the specific values in our results depend on the simulation parameters used and are therefore most relevant to human populations, the qualitative conclusions regarding the advantages of Froh over the alternative estimates of F apply across a wide range of parameters and extend to nonhuman animals as well.

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives not only from its relevance to many theoretical concepts in population genetics but also from its empirical uses: evolutionists have used inbreeding to gauge which traits are likely to have been under ancestral selection, given that traits most affected by it tend to be those most related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Bittles and Black 2010b; Rudan et al. 2003a). Rates of recent inbreeding are not low across the world. For example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have detectable effects, and we have demonstrated the optimal way of identifying such ancient inbreeding. Investigations into ancient inbreeding effects should help investigators understand the evolutionary forces acting on the genes underlying trait variation and whether inbreeding represents an important risk factor in disease.

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting the simulation and finding optimal thresholds for it, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of Genotypes and Phenotypes (dbGaP), found at http://www.ncbi.nlm.nih.gov/gap, through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study. M.C.K. conducted analyses, and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinform. 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity-fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.


Enciso-Mora, V., F. J. Hosking, and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur. J. Hum. Genet. 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423.

Fisher, R. A., 1954 A fuller theory of "junctions" in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

Gibson, J., N. E. Morton, and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15: 789–795.

Gindilis, V. M., R. G. Gainullin, and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. (Camb.) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol. Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14: 528–538.

Kauppi, L., A. J. Jeffreys, and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo, and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum. Genet. 129: 1–15.

Kudo, A., K. Ito, and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances and school attendance in Shizuoka school-children. Jpn. J. Hum. Genet. 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J. Theor. Biol. 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II: Familial adenocarcinomatosis. Am. J. Med. Genet. 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum. Mutat. 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am. J. Hum. Genet. 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: what prevents the exponential increase of one's ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson, and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10001 Dalmatians": Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar, and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona, and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta, and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor: N. A. Rosenberg


GENETICS Supporting Information

http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data
Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130922


derived from the different population sizes (see Figure 1). By evolving independently for 50 generations, each subpopulation (and therefore each sample) had an independent breeding (and inbreeding) history within a 50-generation time frame.

To study the effects of population expansion on estimates of F, we allowed a population of size Ne = 100 that had reached drift–mutation equilibrium to expand instantaneously to Ne = 10,000 and then to evolve at this size for 400 generations [~11,400 years for humans (Fenner 2005)]. Computational limitations prevented us from investigating larger and more realistic expansions (e.g., Ne = 10,000–1,000,000), but conclusions from studying smaller-scale expansions should apply to larger expansions as well. We wrote out pedigree, sequencing, and SNP data for 10 samples of size n = 1000, as described above, at generations 0 (immediately before expansion and therefore identical to the Ne = 100 data), 50, 100, 200, and 400. Results from generation 100,000 come from the final generation of the constant Ne = 10,000 population.

A proxy for the homozygous mutation load

Given the evidence that inbreeding depression is caused by homozygosity at numerous partially recessive deleterious mutations (Charlesworth and Willis 2009), and because deleterious mutations rarely reach frequencies >0.05 in the population (Pritchard 2001), we approximated an individual's overall load of homozygous recessive/partially recessive deleterious mutations as follows. We defined mutations as the set of all variants not included in the SNP data (i.e., all variants with MAF < 0.05 in the original sequence data) and summed all such mutations that were homozygous in each individual. We call this measure the "homozygous mutation load." For computational efficiency, we computed the homozygous mutation load using rare neutral alleles, which approximated well the behavior of mildly deleterious, partially recessive alleles. For all population sizes, the observed allele-frequency distribution of simulated neutral alleles with MAF < 0.05 was very similar to the theoretically expected frequency distribution of either partially recessive (h = 0.25), very mildly deleterious mutations (s ≈ 1/Ne) or fully recessive mutations (h = 0) of much larger effect (s ≈ 50/Ne), where h is the dominance coefficient and s is the selection coefficient (see supporting information, Figure S1) (Crow and Kimura 1970). Thus, the homozygous mutation load investigated here simulates what would be observed if inbreeding depression is due to the aggregated effect of homozygosity at a large number of recessive to partially recessive, mildly deleterious alleles.
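The counting rule behind the homozygous mutation load is straightforward to express. The sketch below assumes a hypothetical individuals × variants matrix of allele counts covering every sequence variant. It estimates each variant's MAF from that matrix, treats variants with 0 < MAF < 0.05 as mutations, and counts, for each individual, how many of these are homozygous for the minor allele. The function name, data layout, and toy data are illustrative and are not the simulation pipeline used in the study.

```python
def homozygous_mutation_load(genotypes, maf_cutoff=0.05):
    """Per-individual count of rare variants (0 < MAF < maf_cutoff) that are
    homozygous for the minor allele.

    genotypes[j][i] is the count (0, 1, 2) of a fixed reference allele for
    individual j at variant i; which allele is minor is decided per variant.
    """
    n_ind = len(genotypes)
    rare = []  # (variant index, genotype code meaning homozygous-minor)
    for i in range(len(genotypes[0])):
        freq = sum(row[i] for row in genotypes) / (2 * n_ind)
        if 0 < min(freq, 1 - freq) < maf_cutoff:
            rare.append((i, 2 if freq < 0.5 else 0))
    return [sum(1 for i, hom in rare if row[i] == hom) for row in genotypes]


# 30 toy individuals: variant 0 is rare (frequency 2/60), variant 1 is common (0.25).
toy = [[2, 1]] + [[0, 1]] * 14 + [[0, 0]] * 15
print(homozygous_mutation_load(toy))
```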

Real SNP data

To compare results from simulated SNP data with results from real human SNP data, we used 1000 randomly selected control individuals of Caucasian descent from the publicly available Molecular Genetics of Schizophrenia–GAIN (Genetic Association Information Network) sample (O'Donovan et al. 2008), genotyped on the Affymetrix 6.0 platform. All SNPs passed rigorous quality control metrics (MAF > 0.05, missingness < 0.02, Hardy–Weinberg equilibrium P-values > 0.0001). Individuals who had a missingness rate > 0.02 or who were outliers on the first two principal component dimensions derived from an identical-by-state matrix were dropped. Genome-wide, 546,882 of 906,600 SNPs (1 SNP per 5.1 kb) passed these quality control thresholds. We then selected the first 100 Mb of SNP data from each of chromosomes 1–10 (after removing the centromere and surrounding heterochromatin, if applicable) to create 10 different SNP samples. From these samples, we derived 10 replicates of the three genomic estimates of F (defined below).
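The SNP-level filters listed above can be sketched as a single predicate. This sketch rests on several assumptions. Genotypes are minor-allele counts, with None marking missing calls. Hardy–Weinberg equilibrium is checked with a 1-df chi-square test for brevity, whereas PLINK's --hwe filter is based on an exact test. The function names are hypothetical. The individual-level filters (missingness > 0.02 and principal-component outliers) are not shown.

```python
import math


def hwe_chisq_p(n_hom1, n_het, n_hom2):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chisq / 2))  # upper tail of chi-square with 1 df


def passes_snp_qc(genotypes, maf_min=0.05, miss_max=0.02, hwe_min=1e-4):
    """SNP filters: MAF > 0.05, missingness < 0.02, HWE P > 0.0001.

    genotypes: minor-allele counts (0, 1, 2) per individual, None if missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called or 1 - len(called) / len(genotypes) >= miss_max:
        return False
    freq = sum(called) / (2 * len(called))
    if min(freq, 1 - freq) <= maf_min:
        return False
    return hwe_chisq_p(called.count(0), called.count(1), called.count(2)) > hwe_min


print(passes_snp_qc([0] * 49 + [1] * 42 + [2] * 9))  # MAF 0.3, exactly in HWE
print(passes_snp_qc([1] * 100))                      # all heterozygous: fails HWE
```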

Estimates of F

In some contexts, F can be conceptualized as a parameter rather than as an estimate. For example, Fped is a known property given a pedigree and a base population and can be called a parameter of an individual in this context. In the present context, however, Fped and all other F statistics

Figure 1 Procedure for deriving 10 samples from each of three effective population sizes. Circles represent populations, arrows represent evolution and splitting/combining of each population, and squares represent samples of size 1000 derived from each population. The sizes of the shapes correspond to population (circles and arrows) or sample (squares) sizes. See text for details.


investigated here are conceptualized as imperfect estimates of the actual level of autozygosity in an individual's genome. The average F estimate in a population depends on the base population (defined as the ancestral population in which F = 0) and increases as one considers older base populations. However, the average F is not very important in the context considered here. Detecting an inbreeding effect requires variation in F among the individuals in the population, so it is variation in F that we focus on. As presented in Powell et al. (2010), we use a recent base population (five generations ago) for pedigree inbreeding and the current population for genomic estimates of F.

In each simulated sample, we calculated four alternative estimates of F, as well as two additional estimates of Froh based on shorter and longer megabase thresholds. In the real SNP data sample, we calculated the three genomic estimates of F. To compare the spread of the estimates of F, the variances of F were derived across the 1000 individuals in a sample, and the log (base 10) of these variances was taken to aid interpretability. As there were 10 replicate samples derived from each population size (see above), we then took the mean of the 10 log (base 10) variances and found the standard error around each of these means. Each estimate of F is described below.
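The summary statistic used to compare estimators can be written out directly. In this minimal sketch, each replicate sample's F values yield a variance across individuals, that variance is converted to log10, and the mean and standard error of these values are reported across replicates. Two details are assumptions for illustration: the use of the n − 1 (sample) variance and the toy, normally distributed F values. Negative F values are allowed because Fh and Falt can be negative.

```python
import math
import random
from statistics import mean, stdev, variance


def log10_variance_summary(replicates):
    """Mean and standard error of log10(variance of F) across replicate samples.

    Each replicate is a list of per-individual F estimates from one sample.
    statistics.variance uses the n - 1 denominator.
    """
    logs = [math.log10(variance(sample)) for sample in replicates]
    return mean(logs), stdev(logs) / math.sqrt(len(logs))


# Toy data: 10 replicate samples of 1000 individuals with F ~ Normal(0, 0.01),
# so each sample variance should be close to 1e-4 (log10 close to -4).
rng = random.Random(2011)
toy = [[rng.gauss(0.0, 0.01) for _ in range(1000)] for _ in range(10)]
print(log10_variance_summary(toy))
```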

1. Fped: F from pedigree inbreeding going back five generations, as computed from Wright's path formula:

$$F_{ped_j} = \sum_{i=1}^{c} \left(\frac{1}{2}\right)^{n_{ij} + m_{ij} + 1}$$

where mii and nii refer to the number of paternal and ma-ternal paths from the ith common ancestor and c refers tothe number of common ancestors of individual j For exam-ple nij frac14 mij frac14 2 for cousinndashcousin inbreeding and c frac14 2when the cousins share two grandparents (Fped frac14 00625)and c frac14 1 when the cousins share one grandparent(Fped frac14 003125) Virtually all variation in Fped is capturedwithin the most recent five generations (see Results) and sopaths mij and nij were not extended beyond this

2. Fh is the canonical estimate of genomic F based on excess SNP homozygosity:

$$F_{h,j} = \frac{O(H_j) - E(H)}{m - E(H)}$$

where O(H_j) is the observed number of homozygous genotypes across all m SNPs for person j, E(H) = Σ_i [1 − 2p_i(1 − p_i)] is the expected number of homozygous genotypes for all people in the sample, and p_i is the MAF for SNPs i = 1, ..., m. This estimate can be obtained from PLINK (Purcell et al. 2007) using the --het command.

3. Falt is an alternative estimate of F predicted to have lower error (Yang et al. 2010a):

$$F_{\mathrm{alt},j} = \frac{1}{m}\sum_{i=1}^{m} d_i - 1$$

where d_i = 1/p_i or 1/q_i for a homozygote for the minor or the major allele, respectively, and 0 for a heterozygote at SNP i; q_i is the frequency of the major allele at the ith SNP, and p_i = 1 − q_i.

4. Froh refers to the proportion of the genome (0–1) that is in runs of homozygosity (ROHs):

$$F_{\mathrm{roh},j} = \frac{\sum_k \mathrm{length}(\mathrm{ROH}_k)}{L}$$

where ROH_k is the kth ROH in individual j's genome and L = 10^8 bp (100 Mb) is the length of the genome in both the simulated and the real SNP data sets. ROHs were found using PLINK and defined as stretches of continuously homozygous SNPs spanning at least 1.5 Mb (1.65 cM). The lengths of ROH segments generated by a single path should follow an exponential distribution with mean 1/(2g) M, where g is the number of generations since the last common ancestor (Fisher 1954). Thus, the expected length of an autozygous ROH segment caused by a common ancestor g = 30 generations in the past is approximately 1.65 cM (1.5 Mb).
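The four estimators above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: genotypes are coded 0/1/2 as copies of one allele, allele frequencies are estimated from the same sample, ROH calling (done with PLINK in the text) is not reimplemented, and Falt is written as (1/m) Σ_i d_i − 1, the form under which it is centered on 0 for a non-inbred individual at Hardy–Weinberg equilibrium (there, E[d_i] = p_i²/p_i + q_i²/q_i = 1). All function names are hypothetical.

```python
# Minimal sketches of the four F estimates (hypothetical helper names).
# Genotypes are coded 0/1/2 as copies of one allele; allele frequencies
# are estimated from the same sample.


def f_ped(paths):
    """Wright's path formula: sum over common ancestors of (1/2)**(m + n + 1).

    `paths` holds one (m, n) pair per common ancestor: the paternal and
    maternal path lengths in generations. The (1 + F_A) term for an
    inbred ancestor is omitted, as in the formula in the text.
    """
    return sum(0.5 ** (m + n + 1) for m, n in paths)


def allele_freqs(genotypes):
    """Frequency of the counted allele at each SNP."""
    n_ind = len(genotypes)
    return [sum(ind[i] for ind in genotypes) / (2 * n_ind)
            for i in range(len(genotypes[0]))]


def f_h(genotypes):
    """Excess homozygosity: (O(H_j) - E(H)) / (m - E(H)) for each person."""
    freqs = allele_freqs(genotypes)
    m = len(freqs)
    # 1 - 2p(1 - p) is symmetric in p and q, so either allele may be counted
    expected_hom = sum(1 - 2 * p * (1 - p) for p in freqs)
    return [(sum(1 for g in ind if g != 1) - expected_hom) / (m - expected_hom)
            for ind in genotypes]


def f_alt(genotypes):
    """Assumed form: Falt = mean(d_i) - 1, with d_i = 1/p or 1/q for
    homozygotes and 0 for heterozygotes."""
    freqs = allele_freqs(genotypes)
    m = len(freqs)
    result = []
    for ind in genotypes:
        total = 0.0
        for g, p in zip(ind, freqs):
            if g == 2:      # homozygous for the counted allele
                total += 1 / p
            elif g == 0:    # homozygous for the other allele
                total += 1 / (1 - p)
        result.append(total / m - 1)
    return result


def f_roh(roh_lengths_bp, genome_length_bp):
    """Proportion of the genome lying in runs of homozygosity."""
    return sum(roh_lengths_bp) / genome_length_bp


# First cousins sharing two grandparents, then one grandparent
print(f_ped([(2, 2), (2, 2)]), f_ped([(2, 2)]))   # 0.0625 0.03125
toy = [[0, 2], [1, 1]]                             # two people, two SNPs
print(f_h(toy), f_alt(toy))
print(f_roh([2_000_000, 3_000_000], 100_000_000))  # 0.05
```

The cousin examples reproduce the Fped values given in item 1. In the two-person toy data, the fully homozygous person gets Fh = 1 and the fully heterozygous person a negative value, as expected for an estimator of excess homozygosity relative to the sample.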

To compare alternative Froh thresholds, we defined Froh-short as a run of homozygous SNPs ≥0.5 Mb and Froh-long as a run of homozygous SNPs ≥5 Mb. These three threshold values (0.5 Mb, 1.5 Mb, and 5 Mb) are the same as those used by McQuillan et al. (2008) in their investigations into Froh in European populations. Froh-long is likely to miss many autozygous stretches of ancient origin (e.g., from 10 or more generations back, which have expected sizes of 5 Mb or less) but is also likely to include very few ROHs that are not truly autozygous. On the other hand, Froh-short can detect more ancient autozygous segments (e.g., from 100 generations back) at the expense of a higher false detection rate, i.e., detecting a higher proportion of ROHs that are allozygous rather than autozygous. Such allozygous haplotypes may be superficially similar, creating runs of homozygosity at measured SNPs, while being heterozygous at unmeasured rare mutations. Using the expected exponential distribution of ROH segment lengths, our thresholds of 0.5 Mb, 1.5 Mb, and 5 Mb should capture 58%, 20%, and 0.5% (respectively) of all ROHs created from common ancestors 50 generations in the past.
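The expected-length and capture figures follow from the exponential model of segment lengths with mean 1/(2g) Morgans. The sketch below assumes about 1.1 cM per Mb (the ratio implied by "1.5 Mb (1.65 cM)") and uses hypothetical function names.

```python
import math

# Assumption: ~1.1 cM per Mb, the ratio implied by "1.5 Mb (1.65 cM)"
CM_PER_MB = 1.1


def mean_roh_length_cm(g):
    """Mean autozygous segment length, 1/(2g) Morgans, in centimorgans."""
    return 100 / (2 * g)


def fraction_longer_than(threshold_mb, g, cm_per_mb=CM_PER_MB):
    """P(length > threshold) under an exponential with mean 1/(2g) M."""
    threshold_morgans = threshold_mb * cm_per_mb / 100
    return math.exp(-2 * g * threshold_morgans)


print(round(mean_roh_length_cm(30), 2))  # 1.67
for mb in (0.5, 1.5, 5.0):
    print(mb, round(100 * fraction_longer_than(mb, g=50), 1))
```

The mean length for g = 30 comes out at 1/(2 × 30) M ≈ 1.67 cM, and for g = 50 the three thresholds retain roughly 58%, 19%, and 0.4% of segments, close to the rounded figures quoted in the text; the small differences depend on the assumed cM-to-Mb conversion.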

Results

Time to most recent common ancestor between spouses

Given that ancient inbreeding is central to the present investigation, and that the relationship between mates determines the inbreeding coefficient of their progeny, it is useful to have an intuition of how long ago the most recent common ancestor of two mates lived in a randomly mating population. A similar issue has been investigated with respect to the amount of overlap in pedigrees between two individuals (Ohno 1996; Derrida et al. 2000), as we discuss below. Similarly, Chang (1999) assessed the number of generations one must travel back before reaching the most recent common ancestor of every human alive today, and it is surprisingly recent: about 110 generations (about 3100 years) ago (see also Rohde et al. 2004; Lachance 2009). Clearly, the most recent common ancestor of just two randomly chosen individuals (e.g., spouses) must be much more recent still. This can be quantified simply in a Fisher–Wright model.

For a population of constant size, consider two randomly chosen individuals (spouses), X and Y. Take a single ancestor of spouse X who lived t generations ago. Given nonoverlapping generations, the probability that this ancestor is the same person as any of spouse Y's ancestors who lived t generations ago is A_t/N_t, where A_t is the total number of ancestors of Y at generation t and N_t is the total number of all ancestors in the population at generation t. This follows from the fact that the probabilities that the ancestor of X is related to each of Y's ancestors are independent of each other (i.e., multiple ancestors of Y can be the same person). The probability that the ancestor of X is unrelated to any ancestor of Y is therefore 1 − A_t/N_t. Assuming that X also has A_t distinct ancestors at time t, the probability that no ancestor of X is related to any ancestor of Y at time t is (1 − A_t/N_t)^(A_t). Finally, the probability c_g that a randomly chosen spousal pair shares no common ancestors (no pedigree inbreeding) up to and including generation g is

$$c_g = \prod_{t=1}^{g} \left(1 - \frac{A_t}{N_t}\right)^{A_t}$$

The above calculation for c_g relies on estimates of A_t and N_t that are not straightforward to derive. As an approximation, A_t ≈ 2^t and N_t ≈ Ne. While the latter assumption is probably close to the truth on average, the former (A_t ≈ 2^t) must overestimate the number of ancestors at time t, because A_t cannot exceed N_t, whereas 2^t grows exponentially while N_t remains constant. Put another way, as one travels back in a pedigree, duplicate ancestors grow increasingly common due to ancient inbreeding. This biases c_g to be lower than it should be, but further investigation (not shown) indicates that this bias is minimal, not changing how far one must go back to find a common ancestor by more than a single generation.
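Under these approximations (A_t ≈ 2^t, N_t ≈ Ne), c_g can be computed directly. In this sketch, the base 1 − A_t/N_t is clamped at 0 once 2^t reaches Ne, and the 1% cutoff for a common ancestor being "almost certain" is an arbitrary choice; the function names are hypothetical.

```python
def prob_no_common_ancestor(g, ne):
    """c_g = product over t = 1..g of (1 - A_t/N_t) ** A_t.

    Assumes A_t = 2**t and N_t = ne. The base is clamped at 0 once 2**t
    reaches ne, i.e., a shared ancestor is then treated as certain.
    """
    c = 1.0
    for t in range(1, g + 1):
        a_t = 2 ** t
        c *= max(0.0, 1 - a_t / ne) ** a_t
    return c


def generations_until_shared(ne, cutoff=0.01):
    """Smallest g with c_g < cutoff (the cutoff is an arbitrary choice)."""
    g = 1
    while prob_no_common_ancestor(g, ne) >= cutoff:
        g += 1
    return g


for ne in (100, 1_000, 10_000, 1_000_000):
    print(ne, generations_until_shared(ne))
```

With this cutoff, the sketch returns 6, 8, and 11 generations for Ne = 1000, 10,000, and 1,000,000, in line with the values read from Figure 2 in the text.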

Figure 2 shows expected values of c_g for four different population sizes: the three investigated in the present simulation, as well as Ne = 1 M (1 million), which is probably a larger mating pool than most modern humans belong to. We checked our theoretical predictions against simulation results for the three population sizes (Ne = 100, 1000, and 10,000) and the five generations of ancestry we recorded. The expected probabilities (solid dots) agree nicely with the observed proportions from the simulated data (colored x's).

Figure 2 shows that mates are likely to share a common ancestor in their recent ancestral past. For populations of size Ne = 1000, almost all spousal pairs have at least one common ancestor within the past 6 generations. For Ne = 10,000, spousal pairs share a common ancestor within 8 generations. Indeed, for every 10-fold increase in population size, the number of generations one must go back before a common ancestor becomes certain increases by about 2 generations. Thus, at Ne = 1,000,000, spouses are almost certain to share a common ancestor within the last 11 generations. Increasing this population size to 100 million pushes this estimate back only to 15 generations. ROHs caused by inbreeding from common ancestors 15 generations in the past are about 3 Mb in length on average, although, due to the finite number of recombination events per generation, inbreeding events from 15 generations in the past often cause no ROHs. These results are consonant with those of Derrida et al. (2000), who found that there is substantial overlap between the pedigree trees of two randomly chosen individuals in a population of size 10,000 within 10 generations and that this overlap is complete (both individuals share the exact same ancestors) within 20 generations.

Variance of Fped

Given that the contribution to Fped at generation t is independent of the contribution to Fped from previous generations, the variation of Fped for a given population size can be apportioned into the amount contributed by each ancestral generation. Figure 3 shows that the contribution to the variation of Fped decreases log-linearly as a function of the contributing ancestral generation. The bend (nonlinearity) at generation 1 for the samples drawn from the Ne = 10,000 population occurs because several samples had no sib–sib inbreeding, making log10(var(Fped due to sib–sib inbreeding)) = −∞. Such values were set to "missing," biasing the remaining nonmissing values upward.

Figure 2 Shown is the probability that mates share no common ancestors in the most recent g generations, as a function of population size (see text). The x's are the same values derived empirically from simulations for up to 5 generations in the past and show good agreement with the expected probabilities. Even in large randomly breeding populations (e.g., 1 million), it is almost certain that at least one ancestor exists in common between two pedigrees within 11 generations.

Our results show that very little variation in Fped was missed by ignoring inbreeding resulting from common ancestors more than five generations back (see Methods). This is despite the fact that quite long (e.g., 10 Mb) ROHs can be created from such inbreeding. Although we had no way of measuring the proportion of variation in genomic estimates of F that was due to each ancestral generation, our results below demonstrate that genomic estimates of F retain considerable variation in large populations after variation in Fped has diminished to close to 0.

Variation and error in genomic and pedigree estimates of F

One important aim of the current project was to investigate the effects of population size on the variance of the four estimates of F. Variances were taken across estimates of F for each individual in samples of size 1000 drawn from the three population sizes (Ne = 100, 1000, and 10,000). Figure 4 shows the mean (±1 SE) of the log10 variances of F as a function of Ne. The variance decreases for all estimates of F as Ne increases, because of the lower levels of ancient and recent inbreeding in larger populations. As expected (Figure 3), Fped shows the least variation at every population size, because it does not capture the variation in F caused by the random nature of recombination and segregation. The two estimates of F based on SNP-by-SNP homozygosity (Fh and Falt) show the highest variation, while Froh is intermediate. Figure 4 also shows the variance values for the three genomic estimates of F in 10 replicates of 100 Mb of real SNP data from an unselected (outbred) Caucasian control sample. The variances of Fh and Falt are slightly higher, and the variance of Froh is slightly lower, in the real data than in the Ne = 10,000 simulation.

The prediction error variance (PEV) of the three genomic estimates of F is shown in Figure 5. PEV = var(F1 − F2)/2, where F1 is the estimate of F from a random half of SNPs and F2 is the estimate of F from the other half of SNPs (Powell et al. 2010; Yang et al. 2010a). PEV provides insight into the amount of error in the genomic estimates of F that exists due to the finite sampling of SNPs; as SNPs become more dense and linkage disequilibrium between markers increases (as occurs with larger Ne), the PEV should decrease. The PEV for all estimates of F is small (<5%) compared to the variances of F shown in Figure 4. As expected, the PEV of every estimate of F also decreases with population size, but the PEV for Froh decreases the most rapidly. Error in Froh decreases more rapidly as a function of Ne because, with dense SNP data (such as exist when Ne = 10,000), ROHs are made up of a large number of homozygous SNPs in a row (e.g., a ROH of 1.5 Mb contains ~2000 SNPs in the Ne = 10,000 data), and such long sets of markers pick up the same sets of ROHs with very high precision.
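A minimal sketch of the split-half PEV, assuming each person's F has already been estimated twice (once from each random half of the SNPs) and using the sample variance (n − 1 denominator); the helper names are hypothetical.

```python
import random
import statistics


def split_snp_indices(m, seed=0):
    """Randomly split SNP indices 0..m-1 into two disjoint halves."""
    rng = random.Random(seed)
    idx = list(range(m))
    rng.shuffle(idx)
    half = m // 2
    return sorted(idx[:half]), sorted(idx[half:])


def prediction_error_variance(f_half1, f_half2):
    """PEV = var(F1 - F2) / 2 across individuals (sample variance, n - 1)."""
    diffs = [a - b for a, b in zip(f_half1, f_half2)]
    return statistics.variance(diffs) / 2


# Hypothetical F estimates for four people from each half of the SNPs
f1 = [0.00, 0.10, 0.20, 0.30]
f2 = [0.02, 0.08, 0.22, 0.28]
print(prediction_error_variance(f1, f2))
```

In practice, the two index lists from `split_snp_indices` would select the SNP columns used to compute F1 and F2 for every individual.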

Relationships between genomic estimates of F and Fped

Figure 6 shows the correlation coefficients between the three genomic estimates of F and Fped. In all cases, Froh correlates most highly with Fped, reflecting the fact that Froh tends to capture more recent inbreeding than the other two genomic estimates of F. The increasingly large standard errors of the correlations are due to the higher standard errors and lower variances of Fped at larger population sizes.

Finding the optimal measure for detecting inbreeding depression: Relationships between estimates of F and the homozygous mutation load

To understand which estimate of F is likely to be optimal for detecting inbreeding depression effects, we correlated each of the four estimates of F with the homozygous mutation load, which, as explained above, is the leading contender for why inbreeding depression occurs. Figure 7 shows these results. Fped is the worst predictor of the homozygous mutation load, and its disadvantage grows as the population size increases. This result is due to the low variance in Fped in randomly breeding populations and suggests that Fped is likely to be a useful estimate only in samples selected on the basis of recent consanguinity. Falt outperformed Fh because Falt gives more weight to rare variants and has lower error variance (Figure 5). As predicted, Froh was the estimate most strongly associated with the homozygous mutation load at every population size, and this advantage grows as the population size increases (Figure 7).

Figure 3 Contribution to var(Fped) from previous generations. Most of the variance in Fped at all population sizes is attributable to recent inbreeding. For example, the variance in Fped due to spouses sharing common ancestors five generations in the past makes up only 0.2% of the total variance in Fped at all population sizes.

The superiority of Froh in detecting the homozygous mutation load is even more stark if one considers the unique effects of each estimate of F after controlling for their correlations with Froh. After controlling for Froh, the mean partial correlations (across Ne) between the homozygous mutation load and Fh (r_p = 0.07), Falt (r_p = 0.15), and Fped (r_p = 0.09) are all low, but the reverse is not true: partial correlations between the homozygous mutation load and Froh remain high after controlling for Fh (r_p = 0.57), Falt (r_p = 0.50), and Fped (r_p = 0.68). This shows that Froh contains much additional information on the homozygous mutation load that is not captured by the other three estimates of F, whereas the other three estimates of F contain little additional information on the homozygous mutation load beyond what already exists in Froh. The superiority of Froh occurs because a given rare variant is typically homozygous only when the haplotypic segment on which it exists meets another IBD segment within an individual, creating a run of homozygosity. If inbreeding depression is caused by homozygosity at rare mutations, as recent evidence indicates, these results suggest that Froh is likely to be the optimal estimate for detecting it regardless of the level of inbreeding in the population.
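The text does not say how the partial correlations were computed. One standard way, sketched here as an assumption, is the first-order formula based on the three pairwise Pearson correlations (equivalent to correlating the residuals of x and y after regressing each on z); the function name is hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations (not values from this study): x = homozygous
# mutation load, y = another F estimate, z = Froh.
print(partial_correlation(0.5, 0.7, 0.8))
```

When the x–y correlation is fully accounted for by z (r_xy = r_xz × r_yz), the partial correlation is 0, which is the pattern the text reports, approximately, for Fh, Falt, and Fped once Froh is controlled.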

Figure 8 compares the correlation coefficients between the homozygous mutation load and Froh calculated from short (≥0.5 Mb), moderate (≥1.5 Mb), and long (≥5 Mb) ROH thresholds as a function of Ne. Froh-short correlates poorly with the homozygous mutation load when Ne is small. In large populations, where inbreeding is more ancient, the situation reverses: Froh defined by short ROHs outperforms the other two Froh estimates. The improvement of Froh-short as a function of population size is due to two factors: (a) denser SNP data in the larger populations allow even short (0.5 Mb) ROHs to discriminate autozygosity accurately, and (b) a higher ratio of short to long ROHs in large populations. Froh-long performs poorly when Ne = 10,000 because of its low variance. Froh defined by the moderate (1.5 Mb) threshold performed intermediately at each population size, providing justification for using this threshold in the current study, as it allowed more meaningful comparisons across the three population sizes.

Changes in F following population expansion

We wanted to understand how recent population expansion, such as occurred in human lineages over the last ~10,000 years (~350 generations), might alter our conclusions. Figure 9 shows the change in the mean log10(variance) of the four estimates of F as a function of generations since a 100-fold expansion in population size from Ne = 100 to Ne = 10,000. While the variance in Fped decreases immediately (a 97% drop) and remains unchanged thereafter, the three genomic estimates of F change much more gradually. The initial declines in the variances of the three genomic estimates of F are caused by the drop in recent inbreeding in larger populations. However, the declines in variances become much slower thereafter, reflecting the slow decay of linkage disequilibrium and the gradual increase in the number of new segregating variants following a population expansion. For example, after expanding from Ne = 100 to Ne = 10,000, the standard deviation of Froh decreases by 70%, from 0.115 to 0.035, in the first 100 generations, but decreases only an additional 10% (to 0.021) over the next 300 generations and requires several thousand generations to reach its equilibrium value of 0.015. Similarly, starting at 0.142, the standard deviation of Falt decreases to 0.069 and 0.055 in generations 100 and 400, respectively. Our results should extrapolate to larger starting and ending population sizes (e.g., Ne = 10,000 to Ne = 1,000,000) and indicate that genomic estimates of F require considerable time following expansion to reach the low variance values that would be predicted from the larger population sizes.

Figure 4 Variance of F (±1 SE) as a function of Ne. The variance of Fped is the lowest at all population sizes, and the variance of Froh is intermediate. Comparing the variance of genomic measures of F in simulated data to the equivalent variances in real SNP data (from an outbred Caucasian sample) suggests that the effective population size of Caucasians is ~10,000 with respect to these statistics.

Figure 5 Prediction error variance (PEV) of genomic estimates of F as a function of Ne. PEV decreases as a function of population size for all genomic estimates of F but does so most rapidly for Froh.

Figure 10 shows the correlations between estimates of F and the homozygous mutation load following population expansion. This correlation drops considerably for all estimates of F after the expansion; it remains low for Fped, Fh, and Falt throughout the 400 generations that follow, whereas the correlation with Froh begins to increase after ~50 generations. When Ne is small (i.e., 100), rare mutant alleles are tagged by long haplotypes as a result of relatively recent inbreeding. When the population size is increased, these long haplotypes are broken up by recombination, and new ones are not created quickly because new inbreeding occurs slowly. Consequently, the correlation between F and homozygous mutation load declines, as did the variance of F (Figure 9). However, as the population evolves at size Ne = 10,000 for an increasing number of generations, new mutations arise that exist on only a single haplotype, and the pairings of these haplotypes, which create homozygosity at rare mutations, are increasingly detected by Froh, and eventually by Fh and Falt as a new equilibrium is reached. These results provide further evidence that Froh is likely to be the optimal estimate for detecting inbreeding depression in populations, such as humans, that have expanded rapidly in size.

Power to detect inbreeding effects

The power of detecting inbreeding depression from marker data depends in part on the accuracy of predicting autozygosity with markers, and our results suggest that Froh is optimal for detecting autozygosity arising from both ancient and recent inbreeding. However, power also depends on the variation in F, as well as on the effect size and the sample size. When the level of inbreeding is low, such as in large randomly mating populations, the variance of Froh is commensurately low (e.g., the standard deviation of Froh = 0.015 when Ne = 10,000). The standard deviation of Froh in observed human SNP data is smaller yet: 0.012. Given such low variance in the predictor, we were interested in whether inbreeding depression is detectable using Froh in human samples and, if so, what sample sizes would be required to achieve adequate power as a function of different levels of inbreeding (assayed using different effective population sizes). To do this, we needed an estimate of the likely effect size of inbreeding depression on a complex trait in humans and an estimate of the variance of Froh scaled up to genome-wide levels.

Because, to our knowledge, the effects of inbreeding on IQ have been investigated more than those on any other complex human trait, we used a best estimate of the inbreeding effect on IQ as a guide to the likely inbreeding effect sizes among other human complex traits. Morton (1979) reviewed four large studies of cousin–cousin inbreeding and found that IQ decreased by 0.73 (Schull and Neel 1965) to 0.39 (Kudo et al. 1972) points for every 0.01 increase in Fped. Given the standard deviation of IQ (15 points), this translates to a decrease of 0.025–0.05 standard deviations per percentage point of inbreeding.

Figure 6 Correlations between Fped and genomic estimates of F as a function of Ne. All correlations between genomic estimates of F and Fped decrease as a function of Ne, but Froh is consistently correlated most highly with Fped.

Figure 7 Correlations between F and homozygous mutation load as a function of Ne. Froh correlates most highly with the homozygous mutation load at all population sizes, and this advantage increases at larger population sizes (where inbreeding becomes more ancient).

To predict the statistical power of using Froh to detect inbreeding depression in human studies, we also needed to estimate what the variance of Froh would be if derived genome-wide rather than across 100 Mb as simulated in the current study. To do this, we compared the average variance of Froh across 10 replicates of 100 Mb of real SNP data (1.5 × 10^−4) to the genome-wide variance of Froh in the real SNP data (1.79 × 10^−5), an 8.45-fold drop in variance. We thus scaled the variances of Froh down 8.45-fold to account for the decreased variance that would be observed in Froh genome-wide. This led to predicted genome-wide variances (standard deviations) of Froh of 1.4 × 10^−3 (0.037) for Ne = 100, 2.5 × 10^−4 (0.016) for Ne = 1000, 2.63 × 10^−5 (0.005) for Ne = 10,000, and 1.79 × 10^−5 (0.004) for real SNP data.
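The scaling step is simple arithmetic; the sketch below (hypothetical function names) recomputes the fold drop from the rounded values and the standard deviations quoted above. The ratio of the rounded variances, 1.5 × 10^−4 / 1.79 × 10^−5, is about 8.38 rather than 8.45, presumably because the text's fold change was computed from unrounded values.

```python
import math


def fold_drop(var_region, var_genome_wide):
    """Factor by which var(Froh) shrinks from a genomic region to the whole genome."""
    return var_region / var_genome_wide


def to_genome_wide(var_region, fold=8.45):
    """Scale a region-level variance of Froh down to its genome-wide value."""
    return var_region / fold


# Rounded values reported for the real SNP data
print(round(fold_drop(1.5e-4, 1.79e-5), 2))

# Genome-wide variances reported in the text and their standard deviations
for label, v in (("Ne = 100", 1.4e-3), ("Ne = 1000", 2.5e-4),
                 ("Ne = 10,000", 2.63e-5), ("real SNP data", 1.79e-5)):
    print(label, round(math.sqrt(v), 3))
```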

Figure 11 shows the statistical power, as a function of sample size, of a regression of an outcome variable on Froh, assuming a slope of −0.73 (solid lines) or −0.39 (dashed lines) points per 0.01 increase in Froh (i.e., b = −73 or −39 per unit of Froh), a standard deviation of the outcome variable of 15, and standard deviations of Froh depending on the population size as detailed in the preceding paragraph. We assume that b̂ ~ t(n − 2), with mean equal to the assumed slope b and

$$\mathrm{var}(\hat{b}) = \frac{\mathrm{var}(IQ) - b^2\,\mathrm{var}(F_{\mathrm{roh}})}{\mathrm{var}(F_{\mathrm{roh}})\,(n-2)}$$

Our results suggest that sample sizes of 700 are sufficient for achieving adequate (80%) statistical power when the rate of inbreeding is high (similar to that found in a randomly breeding population of size Ne = 100), such as could occur in samples selected on the basis of likely recent inbreeding or in small isolated populations. Our results also suggest that inbreeding depression effects can be detected using Froh in large, ostensibly outbred populations, such as those found in modern industrialized societies, but that large samples (on the order of 12,000–65,000, depending on effect size) are probably necessary to detect them reliably. It is possible that phenotypes more closely related to fitness than IQ would show a larger inbreeding depression effect and thus would require smaller sample sizes than those predicted here to achieve adequate power.
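A sketch of this power calculation, under assumptions that differ slightly from the text: a normal approximation replaces the t(n − 2) distribution, the test is two-sided with α = 0.05, and the slope b is expressed per unit of Froh, so the −0.73 and −0.39 points per 0.01 of F from Morton's review become b = −73 and −39. The function names are hypothetical, and the output is not expected to match Figure 11 exactly.

```python
import math
from statistics import NormalDist


def power(n, b, var_f, var_y=15.0 ** 2, alpha=0.05):
    """Approximate two-sided power to detect a slope b of an outcome on Froh.

    Uses var(b_hat) = (var_y - b**2 * var_f) / (var_f * (n - 2)) and a
    normal approximation in place of the t(n - 2) distribution.
    """
    se = math.sqrt((var_y - b ** 2 * var_f) / (var_f * (n - 2)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(b) / se
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


def required_n(b, var_f, target=0.8, **kwargs):
    """Smallest n with power >= target, found by bisection (power rises with n)."""
    lo, hi = 3, 10_000_000
    while lo < hi:
        mid = (lo + hi) // 2
        if power(mid, b, var_f, **kwargs) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Slopes per unit of Froh (IQ points); genome-wide variances of Froh from the text
for b in (-73.0, -39.0):
    for label, var_f in (("Ne = 10,000", 2.63e-5), ("real SNP data", 1.79e-5)):
        print(b, label, required_n(b, var_f))
```

With the genome-wide variances given in the text, this yields about 12,600 (b = −73, Ne = 10,000) up to about 65,000 (b = −39, real SNP data), consistent with the 12,000–65,000 range quoted above.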

Discussion

Hundreds of scientific investigations have been conducted on the effects of inbreeding since Darwin (1868, 1876) first studied the topic. The inbreeding coefficient F has traditionally been defined in terms of pedigree inbreeding (Fped). However, low variation in Fped seriously hampers the ability to detect inbreeding effects in samples that are not selected on the basis of known recent inbreeding. In essence, Fped is a genome-wide expectation for the percentage of the genome that is autozygous, but there is a great deal of variation in autozygosity around this expected percentage, caused by the finite number of recombination events per generation. The resulting identical haplotypes can persist in the population for many generations, coming together in offspring of distantly related individuals to create increased levels of homozygosity. The availability of genomic marker panels has made possible new genomic alternatives to Fped that can better detect such identical haplotypes and, therefore, even quite ancient inbreeding.

In the last 10 years, a large number of human (reviewed in Ku et al. 2010) and animal (reviewed in Chapman et al. 2009) studies have investigated the relationship between genomic estimates of F and disease- or fitness-related traits. In general, animal studies have focused on excess homozygosity on a marker-by-marker basis (Fh), whereas human studies, which have the opportunity to use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).

Figure 8 Correlations between alternative thresholds of Froh and the homozygous mutation load as a function of Ne. Runs of homozygosity (ROHs) are defined as stretches of ≥0.5-Mb, ≥1.5-Mb, or ≥5-Mb homozygous SNPs. Long thresholds are optimal for detecting autozygosity in highly inbred populations, whereas shorter thresholds are optimal for detecting autozygosity in outbred populations.

Figure 9 Variance of F (±1 SE) as a function of generations since population expansion. Whereas Fped drops immediately following a population expansion, the variance in genomic measures of F requires hundreds of generations to reach equilibrium levels.

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even in cases where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression, and this advantage is especially pronounced following a population expansion, such as has occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).

Nevertheless, the variance in Froh in large simulated (Ne = 10,000) and observed Caucasian SNP data sets is low, and because of this there is likely to be little power to detect inbreeding effects in unselected (with respect to recent inbreeding) samples unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ–inbreeding effects using Froh in unselected samples. Thus, current studies investigating the effects of Froh on human complex traits that have sample sizes <3000 and that have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.

Our findings suggest two strategies for achieving sufficient power in studying inbreeding depression of complex traits using a genomic estimate of F. The first is to conduct analyses on very large (e.g., n > 10,000) samples, such as those being assembled by multisite consortia (International Schizophrenia Consortium 2009). The second is to conduct analyses on smaller samples (perhaps 1000–3000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to be a more powerful approach for detecting inbreeding depression than any of the alternatives.

While much of the focus in this article has been on understanding the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will manifest lower levels of variation in F than those reported here and will require commensurately larger sample sizes to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller sample sizes.

Figure 11 Estimated power to detect inbreeding effects on a human complex trait using Froh. Higher levels of real inbreeding (smaller Ne) lead to higher variance in Froh and thus greater statistical power to detect an inbreeding effect. Large (solid lines) and small (dashed lines) inbreeding effect sizes were derived from a review of the effects of consanguinity on IQ (see text). Arrows show predicted sample sizes required to achieve 80% power. When inbreeding is high (Ne = 100), sample sizes of 400 are adequate, but in outbred populations (Ne = 10,000 or real SNP data), sample sizes >20,000 may be required.

Figure 10 Correlations between F and homozygous mutation load as a function of generations since population expansion. The relationship between the homozygous mutation load and Froh increases the most quickly following a population expansion.


The simulation parameters explored in this article were chosen to mimic the human genome. Recombination rates and recombination hotspot rates can vary at least threefold across mammals (Jensen-Seaman et al. 2004; Kauppi et al. 2004). Nevertheless, the basic conclusions made here are not overly dependent on the specific parameters chosen and can readily be applied to animals with different recombination parameters. When distance is conceptualized in centimorgans rather than base-pair units, different recombination rates lengthen or shorten the genome vis-à-vis our simulation. This would slightly lower or raise (respectively) the variance of Froh, as shown above when variation in Froh was extrapolated to its expected genome-wide value. Similarly, when genomic distance is measured in centimorgans, different recombination hotspot parameters change the variation in SNP density vis-à-vis our simulation. This should have minimal effect on the ROHs detected, and therefore on Froh, because even the shortest ROH encompasses 175 hotspots on average. Thus, while the specific values in our results depend on the simulation parameters used and are therefore most relevant to human populations, the qualitative conclusions regarding the advantages of Froh over the alternative estimates of F apply across a wide range of parameters and extend to nonhuman animals as well.

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives not only from its relevance to many theoretical concepts in population genetics but also from its empirical uses: evolutionists have used inbreeding as a way to gauge which traits are likely to have been under ancestral selection, given that the traits most affected by it tend to be those most related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Bittles and Black 2010b; Rudan et al. 2003a). Rates of recent inbreeding are not low across the world; for example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have detectable effects, and we have demonstrated the optimal way of identifying such ancient inbreeding. Investigations into ancient inbreeding effects should help investigators understand the evolutionary forces acting on the genes underlying trait variation and whether inbreeding represents an important risk factor in disease.

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting and finding optimal thresholds for the simulation, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study. M.C.K. conducted analyses, and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: Consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinformatics 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity-fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.


Enciso-Mora, V., F. J. Hosking, and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur. J. Hum. Genet. 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423.

Fisher, R. A., 1954 A fuller theory of "junctions" in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

Gibson, J., N. E. Morton, and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15: 789–795.

Gindilis, V. M., R. G. Gainullin, and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. (Camb.) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol. Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14: 528–538.

Kauppi, L., A. J. Jeffreys, and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo, and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum. Genet. 129: 1–15.

Kudo, A., K. Ito, and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances and school attendance in Shizuoka school-children. Jpn. J. Hum. Genet. 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J. Theor. Biol. 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II: Familial adenocarcinomatosis. Am. J. Med. Genet. 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum. Mutat. 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am. J. Hum. Genet. 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: What prevents the exponential increase of one's ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson, and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10001 Dalmatians": Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar, and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona, and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta, and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor: N. A. Rosenberg


GENETICS Supporting Information

http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130922


investigated here are conceptualized as imperfect estimates of the actual level of autozygosity in an individual's genome. The average F estimate in a population depends on the base population (defined as the ancestral population when F = 0) and increases as one considers older base populations. However, the average F is not very important in the context considered here. Variation in F among the individuals in the population is required to detect an inbreeding effect, so it is variation in F that we focus on. We use a recent base population (five generations ago) for pedigree inbreeding and the current population for genomic estimates of F, as presented in Powell et al. (2010).

In each simulated sample, we calculated four alternative estimates of F, as well as two additional estimates of Froh based on shorter and longer megabase thresholds. In the real SNP data sample, we calculated the three genomic estimates of F. To compare the spread of the estimates of F, the variances of F were derived across the 1000 individuals in a sample, and the log (base 10) of these variances was taken to aid interpretability. As there were 10 replicate samples derived from each population size (see above), we then took the mean of the 10 log (base 10) variances and found the standard error around each of these means. Each estimate of F is described below.
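The summary statistic just described (the mean and standard error of the log10 variances of F across replicate samples) can be sketched as follows. This is our own illustration, not the study's code: we assume the unbiased sample variance (n − 1 denominator) within each sample and the usual SD/√k standard error across the k replicates, neither of which the text specifies, and the function name is hypothetical.

```python
import math
import statistics


def log10_variance_summary(replicate_samples):
    """Mean and standard error of log10(var(F)) across replicate samples.

    `replicate_samples` is a list of samples; each sample is a list of
    F estimates, one per individual (1000 per sample in the study).
    """
    # log10 of the within-sample variance of F, one value per replicate
    log_vars = [math.log10(statistics.variance(sample)) for sample in replicate_samples]
    mean = statistics.mean(log_vars)
    # standard error of the mean across the k replicates
    se = statistics.stdev(log_vars) / math.sqrt(len(log_vars))
    return mean, se
```

Taking logs before averaging keeps replicates with very different variance magnitudes on a comparable scale, which is the interpretability benefit the text mentions.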

1. Fped: F from pedigree inbreeding going back five generations, as figured from Wright's path formula:

$$F_{ped,j} = \sum_{i=1}^{c} \left(\frac{1}{2}\right)^{n_{ij} + m_{ij} + 1}$$

where $m_{ij}$ and $n_{ij}$ refer to the lengths (in generations) of the paternal and maternal paths from the ith common ancestor, and c refers to the number of common ancestors of individual j. For example, $n_{ij} = m_{ij} = 2$ for cousin–cousin inbreeding, with c = 2 when the cousins share two grandparents (Fped = 0.0625) and c = 1 when the cousins share one grandparent (Fped = 0.03125). Virtually all variation in Fped is captured within the most recent five generations (see Results), and so paths $m_{ij}$ and $n_{ij}$ were not extended beyond this.

2. Fh is the canonical estimate of genomic F based on excess SNP homozygosity:

$$F_{h,j} = \frac{O(H_j) - E(H)}{m - E(H)}$$

where $O(H_j)$ is the observed homozygosity (number of homozygous SNPs) across all SNPs for person j, $E(H) = \sum_{i} \left[1 - 2p_i(1 - p_i)\right]$ is the expected homozygosity for all people in the sample, and $p_i$ is the MAF for SNPs $i = 1, \ldots, m$. This estimate can be obtained from PLINK (Purcell et al. 2007) using the --het command.

3. Falt is an alternative estimate of F predicted to have lower error (Yang et al. 2010a):

$$F_{alt,j} = \frac{1}{m}\sum_{i=1}^{m} d_i - 1$$

where $d_i = 1/p_i$ and $1/q_i$ for a homozygote for the minor and the major allele, respectively, and $d_i = 0$ for a heterozygote at SNP i; $q_i$ is the frequency of the major allele at the ith SNP, and $p_i = 1 - q_i$.

4. Froh refers to the proportion of the genome (0–1) that is in runs of homozygosity (ROHs):

$$F_{roh,j} = \frac{\sum_k \mathrm{length}(\mathrm{ROH}_k)}{L}$$

where $\mathrm{ROH}_k$ is the kth ROH in individual j's genome and $L = 10^8$ bp (100 Mb) is the length of the genome in both the simulated and the real SNP data sets. ROHs were found using PLINK and defined as stretches of continuously homozygous SNPs spanning at least 1.5 Mb (1.65 cM). The lengths of ROH segments generated by a single path should follow an exponential distribution with mean 1/(2g) M, where g is the number of generations since the last common ancestor (Fisher 1954). Thus, the expected length of an autozygous ROH segment caused by a common ancestor g = 30 generations in the past is 1.65 cM (1.5 Mb).
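The four estimators above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the code used in the study: genotypes are coded as minor-allele counts (0, 1, 2) with no missing data, allele frequencies $p_i$ are supplied, ROHs are assumed to have been called already (the study used PLINK), and common ancestors in Wright's formula are taken to be non-inbred, as the formula above implies. All function names are hypothetical.

```python
from fractions import Fraction


def f_ped(paths):
    """Wright's path formula: sum over common ancestors of (1/2)^(n_ij + m_ij + 1).

    `paths` holds one (n_ij, m_ij) pair of path lengths, in generations,
    per common ancestor; ancestors are assumed to be non-inbred.
    """
    return sum(Fraction(1, 2) ** (n + m + 1) for n, m in paths)


def f_h(genotypes, maf):
    """Excess SNP homozygosity: (O(H_j) - E(H)) / (m - E(H))."""
    m = len(genotypes)
    observed = sum(1 for x in genotypes if x != 1)    # homozygous SNPs
    expected = sum(1 - 2 * p * (1 - p) for p in maf)  # E(H) under HWE
    return (observed - expected) / (m - expected)


def f_alt(genotypes, maf):
    """Mean of d_i minus 1, with d_i = 1/p_i, 1/q_i, or 0."""
    total = 0.0
    for x, p in zip(genotypes, maf):
        if x == 2:        # homozygous for the minor allele
            total += 1 / p
        elif x == 0:      # homozygous for the major allele
            total += 1 / (1 - p)
        # heterozygotes contribute d_i = 0
    return total / len(genotypes) - 1


def f_roh(roh_lengths_bp, genome_length_bp=10**8, min_length_bp=1_500_000):
    """Share of the genome (default 100 Mb) in ROHs of at least `min_length_bp`."""
    kept = sum(length for length in roh_lengths_bp if length >= min_length_bp)
    return kept / genome_length_bp


def expected_roh_length_cm(g):
    """Mean ROH length, 1/(2g) Morgans, for a common ancestor g generations back."""
    return 100 / (2 * g)


if __name__ == "__main__":
    # First cousins sharing two grandparents (n = m = 2 for each path).
    print(float(f_ped([(2, 2), (2, 2)])))                 # 0.0625
    maf = [0.5, 0.5, 0.5, 0.5]
    print(f_h([0, 2, 0, 2], maf), f_alt([0, 2, 0, 2], maf))  # 1.0 1.0
    print(f_roh([2_000_000, 1_000_000, 3_000_000]))         # 0.05
    print(expected_roh_length_cm(30))                        # about 1.67 cM
```

Because each $d_i$ has expectation $p_i + q_i = 1$ under Hardy–Weinberg equilibrium, Falt is centered on 0 in a noninbred sample, just as Fh is; the two differ in how heavily homozygotes for rare alleles are weighted.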

To compare alternative Froh thresholds, we defined Frohshort as a run of homozygous SNPs of at least 0.5 Mb and Frohlong as a run of homozygous SNPs of at least 5 Mb. These three threshold values (0.5 Mb, 1.5 Mb, and 5 Mb) are the same as those used by McQuillan et al. (2008) in their investigations into Froh in European populations. Frohlong is likely to miss many autozygous stretches of ancient origin (e.g., from 10 generations back, which have expected sizes of 5 Mb) but is also likely to include very few ROHs that are not truly autozygous. On the other hand, Frohshort can detect more ancient autozygous segments (e.g., from 100 generations back) at the expense of a higher false detection rate, that is, detecting a higher proportion of ROHs that are allozygous rather than autozygous. Such allozygous haplotypes may be superficially similar, creating runs of homozygosity at measured SNPs, but heterozygous at unmeasured rare mutations. Using the expected exponential distribution of ROH segment lengths, our thresholds of 0.5 Mb, 1.5 Mb, and 5 Mb should capture approximately 58%, 20%, and 0.5% (respectively) of all ROHs created from common ancestors 50 generations in the past.
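The capture percentages above follow from the tail of the exponential length distribution: a segment from an ancestor g generations back exceeds a threshold x with probability exp(−x / mean), where the mean is 1/(2g) Morgans. The sketch below is ours; it assumes the megabase-to-centimorgan conversion of 1.1 cM/Mb implied by "1.5 Mb (1.65 cM)", and the function name is hypothetical. With these inputs it gives roughly 58%, 19%, and 0.4% for g = 50, in line with the approximate figures quoted.

```python
import math

CM_PER_MB = 1.65 / 1.5  # 1.1 cM per Mb, as implied by "1.5 Mb (1.65 cM)"


def fraction_roh_at_least(threshold_mb, g, cm_per_mb=CM_PER_MB):
    """P(segment length >= threshold) for an exponential with mean 1/(2g) Morgans."""
    mean_cm = 100 / (2 * g)  # 1/(2g) Morgans expressed in centimorgans
    return math.exp(-threshold_mb * cm_per_mb / mean_cm)


for threshold in (0.5, 1.5, 5.0):
    share = fraction_roh_at_least(threshold, g=50)
    print(f"{threshold} Mb threshold captures {share:.1%} of segments")
```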

Results

Time to most recent common ancestor between spouses

Given that ancient inbreeding is central to the present investigation and that the relationship between mates determines the inbreeding coefficient of their progeny, it is useful to have an intuition of how long ago the most recent common ancestor existed between two mates in a randomly mating population. A similar issue has been investigated with respect to the amount of overlap in pedigrees between two individuals (Ohno 1996; Derrida et al. 2000), as we discuss below. Relatedly, Chang (1999) assessed the number of generations one must travel back before reaching the most recent common ancestor of every human alive today, and it is surprisingly recent: about 110 generations (3100 years) ago (see also Rohde et al. 2004; Lachance 2009). Clearly, the most recent common ancestor between just two randomly chosen individuals (e.g., spouses) must be much more recent still. This can be quantified simply in a Fisher–Wright model.

For a population of constant size, consider two randomly chosen individuals (spouses), X and Y. Take a single ancestor of spouse X who lived t generations ago. Given nonoverlapping generations, the probability that this ancestor is the same person as any of spouse Y's ancestors who lived t generations ago is $A_t/N_t$, where $A_t$ is the total number of ancestors of Y at generation t and $N_t$ is the total number of all ancestors in the population at generation t. This follows from the fact that the probabilities that the ancestor of X is related to each of Y's ancestors are independent of each other (i.e., multiple ancestors of Y can be the same person). The probability that the ancestor of X is unrelated to any ancestor of Y is therefore $1 - A_t/N_t$. Assuming that X also has $A_t$ distinct ancestors at time t, the probability that no ancestor of X is related to any ancestor of Y at time t is $(1 - A_t/N_t)^{A_t}$. Finally, the probability $c_g$ that a randomly chosen spousal pair share no common ancestors (no pedigree inbreeding) up to and including generation g is

$$c_g = \prod_{t=1}^{g} \left(1 - \frac{A_t}{N_t}\right)^{A_t}$$

The above calculation for $c_g$ relies on estimations of $A_t$ and $N_t$ that are not straightforward to derive. As an approximation, $A_t \approx 2^t$ and $N_t \approx N_e$. While the latter assumption is probably close to the truth on average, the former ($A_t \approx 2^t$) must overestimate the number of ancestors at time t, because $A_t$ cannot exceed $N_t$, whereas $2^t$ grows exponentially while $N_t$ remains constant. Put another way, as one travels back in a pedigree, duplicate ancestors grow increasingly common due to ancient inbreeding. This biases $c_g$ to be lower than it should be, but further investigation (not shown) indicates that this bias is minimal, not changing how far one must go back to find a common ancestor by more than a single generation.
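The probability $c_g$ under the approximations $A_t \approx 2^t$ and $N_t \approx N_e$ can be evaluated directly. In this sketch (ours, with hypothetical function names) the product is accumulated on the log scale for numerical stability, and, as an added assumption, $c_g$ is treated as 0 once $2^t \ge N_e$, where the approximation breaks down. Reading "almost certain" as $c_g < 0.01$ (also our assumption), the sketch reproduces the 6, 8, 11, and 15 generations given in the Results for Ne = 1000, 10,000, 1,000,000, and 100 million.

```python
import math


def prob_no_common_ancestor(g, ne):
    """c_g = prod_{t=1..g} (1 - A_t/N_t)^A_t with A_t = 2**t and N_t = ne."""
    log_c = 0.0
    for t in range(1, g + 1):
        a_t = 2 ** t
        if a_t >= ne:
            # 1 - A_t/N_t <= 0: the 2**t approximation has broken down,
            # so overlap between the two pedigrees is treated as certain.
            return 0.0
        log_c += a_t * math.log1p(-a_t / ne)
    return math.exp(log_c)


def generations_until_shared(ne, cutoff=0.01):
    """Smallest g with c_g below `cutoff` (our reading of 'almost certain')."""
    g = 1
    while prob_no_common_ancestor(g, ne) >= cutoff:
        g += 1
    return g


for ne in (1_000, 10_000, 1_000_000, 100_000_000):
    print(ne, generations_until_shared(ne))
```

Because the exponent and the per-ancestor probability both grow with $2^t$, the log of $c_g$ scales roughly with $4^t/N_e$, which is why each 10-fold increase in population size only adds a generation or two.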

Figure 2 shows expected values for $c_g$ for four different population sizes: the three investigated in the present simulation, as well as Ne = 1 million, which is probably a larger mating pool than most modern humans belong to. We checked our theoretical predictions against simulation results for the three population sizes (Ne = 100, 1000, and 10,000) and the five generations of ancestry we recorded. The expected probabilities (solid dots) agree nicely with the observed proportions from the simulated data (colored x's).

Figure 2 shows that mates are likely to share a common ancestor in their recent ancestral past. For populations of size Ne = 1000, almost all spousal pairs share at least one common ancestor within the last 6 generations. For Ne = 10,000, spousal pairs share a common ancestor within 8 generations. Indeed, for every 10-fold increase in population size, the number of generations one must go back before a common ancestor becomes certain increases by roughly 2 generations. Thus, at Ne = 1,000,000, spouses are almost certain to share a common ancestor within the last 11 generations. Increasing this population size to 100 million pushes this estimate back only to 15 generations. ROHs caused by inbreeding from common ancestors 15 generations in the past are 3 Mb in length on average, although due to the finite number of recombination events per generation, inbreeding events from 15 generations in the past often cause no ROHs. These results are consonant with the results of Derrida et al. (2000), who found that there is substantial overlap between the pedigree trees of two randomly chosen individuals in a population of size 10,000 within 10 generations and that this overlap is complete (both individuals share the exact same ancestors) within 20 generations.

Variance of Fped

Given that the contribution to Fped at generation t is independent of the contribution to Fped from previous generations, the variation of Fped for a given population size can be apportioned into the amount contributed by each ancestral generation. Figure 3 shows that the contribution to the variation of Fped decreases log-linearly as a function of the contributing ancestral generation. The bend (nonlinearity) at generation 1 for the samples drawn from the Ne = 10,000 population occurs because several samples had no sib–sib inbreeding, making log10(var(Fped due to sib–sib inbreeding)) = −∞. Such values were set to "missing," biasing the remaining nonmissing values upward.

Figure 2 Shown is the probability that mates share no common ancestors in the most recent g generations as a function of population size (see text). The x's are the same values derived empirically from simulations for up to 5 generations in the past and show good agreement with the expected probabilities. Even in large, randomly breeding populations (e.g., 1 million), it is almost certain that at least one ancestor exists in common between two pedigrees within 11 generations.

Our results show that very little variation in Fped was missed by ignoring inbreeding resulting from common ancestors more than five generations back (see Methods). This is despite the fact that quite long (e.g., 10 Mb) ROHs can be created from such inbreeding. Although we had no way of measuring the proportion of variation in genomic estimates of F that was due to each ancestral generation, our results below demonstrate that genomic estimates of F retain considerable variation in large populations after variation in Fped has diminished to nearly zero.

Variation and error in genomic and pedigree estimates of F

One important aim of the current project was to investigate the effects of population size on the variance of the four estimates of F. Variances were taken across estimates of F for each individual in samples of size 1000 drawn from the three population sizes (Ne = 100, 1000, and 10,000). Figure 4 shows the mean (±1 SE) of the log10 variances of F as a function of Ne. The variance decreases for all estimates of F as Ne increases, because of the lower levels of ancient and recent inbreeding in larger populations. As expected (Figure 3), Fped shows the least variation at every population size because it does not capture the variation in F caused by the random nature of recombination and segregation. The two estimates of F based on SNP-by-SNP homozygosity (Fh and Falt) show the highest variation, while Froh is intermediate. Figure 4 also shows the variance values for the three genomic estimates of F in 10 replicates of 100 Mb of real SNP data from an unselected (outbred) Caucasian control sample. The variances of Fh and Falt are slightly higher, and the variance of Froh is slightly lower, in the real data than in the Ne = 10,000 simulation.

The prediction error variance (PEV) of the three genomic estimates of F is shown in Figure 5. $\mathrm{PEV} = \mathrm{var}(F_1 - F_2)/2$, where $F_1$ is the estimate of F from a random half of SNPs and $F_2$ is the estimate of F from the other half of SNPs (Powell et al. 2010; Yang et al. 2010a). PEV provides insight into the amount of error in the genomic estimates of F that exists due to the finite sampling of SNPs; as SNPs become more dense and linkage disequilibrium between markers increases (as occurs with larger Ne), the PEV should decrease. The PEV for all estimates of F is small (roughly 5% or less) compared to the variances of F shown in Figure 4. As expected, the PEV of every estimate of F also decreases with population size, but the PEV for Froh decreases the most rapidly. Error in Froh decreases more rapidly as a function of Ne because, with dense SNP data (such as exist when Ne = 10,000), ROHs are made up of a large number of homozygous SNPs in a row (e.g., a ROH of 1.5 Mb contains 2000 SNPs in the Ne = 10,000 data), and such long sets of markers pick up the same sets of ROHs with very high precision.
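The split-half PEV can be sketched as follows. This is our illustration with hypothetical function names: SNP indices are randomly partitioned into two halves, any of the estimators would be applied separately to each half to give $F_1$ and $F_2$ per individual, and the PEV is half the variance (across individuals) of their difference. We assume the sample variance with an n − 1 denominator, which the text does not specify.

```python
import random
import statistics


def split_snps(m, seed=None):
    """Randomly partition SNP indices 0..m-1 into two halves."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    return sorted(idx[: m // 2]), sorted(idx[m // 2:])


def pev(f_half1, f_half2):
    """PEV = var(F1 - F2) / 2, taken across individuals."""
    diffs = [a - b for a, b in zip(f_half1, f_half2)]
    return statistics.variance(diffs) / 2
```

Dividing by 2 converts the variance of a difference of two independent errors back to the error variance of a single half-sample estimate.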

Relationships between genomic estimates of F and Fped

Figure 6 shows the correlation coefficients between the three genomic estimates of F and Fped. In all cases, Froh correlates most highly with Fped, reflecting the fact that Froh tends to capture more recent inbreeding than the other two genomic estimates of F. The increasingly large standard errors of the correlations are due to the higher standard errors and lower variances of Fped at larger population sizes.

Finding the optimal measure for detecting inbreeding depression: Relationships between estimates of F and the homozygous mutation load

To understand which estimate of F is likely to be optimal for detecting inbreeding depression effects, we correlated each of the four estimates of F with the homozygous mutation load, which, as explained above, is the leading contender for why inbreeding depression occurs. Figure 7 shows these results. Fped is the worst predictor of the homozygous mutation load, and its disadvantage grows as the population size increases. This result is due to the low variance in Fped in randomly breeding populations and suggests that Fped is likely to be a useful estimate only in samples selected on the basis of recent consanguinity. Falt outperformed Fh because Falt gives more weight to rare variants and because Falt has lower error variance (Figure 5). As predicted, Froh was the estimate most strongly associated with the recessive mutation load at every population size, and this advantage grows as the population size increases (Figure 7).

Figure 3 Contribution to var(Fped) from previous generations. Most of the variance in Fped at all population sizes is attributable to recent inbreeding. For example, the variance in Fped due to spouses sharing common ancestors five generations in the past makes up only 0.2% of the total variance in Fped at all population sizes.

The superiority of Froh in detecting the homozygous mutation load is even more stark if one considers the unique effects of each estimate of F after controlling for their correlations with Froh. After controlling for Froh, the mean partial correlations (across Ne) between the homozygous mutation load and Fh (rp = 0.07), Falt (rp = 0.15), and Fped (rp = 0.09) are all low, but the reverse is not true: partial correlations between the homozygous mutation load and Froh remain high after controlling for Fh (rp = 0.57), Falt (rp = 0.50), and Fped (rp = 0.68). This shows that Froh contains much additional information on the homozygous mutation load that is not captured by the other three estimates of F, whereas the other three estimates of F contain little additional information on the homozygous mutation load beyond what already exists in Froh. The superiority of Froh occurs because a given rare variant is typically homozygous only when the haplotypic segment on which it exists meets another IBD segment within an individual, creating a run of homozygosity. If inbreeding depression is caused by homozygosity at rare mutations, as recent evidence indicates, these results suggest that Froh is likely to be the optimal estimate for detecting it regardless of the level of inbreeding in the population.
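Partial correlations like those quoted above can be computed with the standard first-order partial correlation formula. The text does not say which implementation was used, so this is only a sketch with hypothetical names: x would be one estimate of F (e.g., Fh), y the homozygous mutation load, and z the estimate being controlled for (e.g., Froh).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for z (first-order partial r)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

This is equivalent to correlating the residuals of x and y after each has been linearly regressed on z, which is the sense in which a partial correlation measures information "not captured" by the controlled-for estimate.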

Figure 8 shows a comparison of the correlation coefficients between the homozygous mutation load and Froh calculated from short (0.5 Mb), moderate (1.5 Mb), and long (5 Mb) ROH thresholds as a function of Ne. Frohshort correlates poorly with the homozygous mutation load when Ne is small. In large populations, where inbreeding is more ancient, the situation reverses: Froh defined by short ROHs outperforms the other two Froh estimates. The improvement of Frohshort as a function of population size is due to two factors: (a) denser SNP data in the larger populations allow even short (0.5 Mb) ROHs to discriminate autozygosity accurately, and (b) there is a higher ratio of short to long ROHs in large populations. Frohlong performs poorly when Ne = 10,000 due to low variance. Froh defined by the moderate (1.5 Mb) threshold performed intermediately at each population size, providing justification for using this threshold in the current study, as it allowed more meaningful comparisons across the three population sizes.

Changes in F following population expansion

We wanted to understand how recent population expansion, such as what occurred in human lineages over the last 10,000 years (350 generations), might alter our conclusions. Figure 9 shows the change in the mean log10(variance) of the four estimates of F as a function of generations since a 100-fold expansion in population size from Ne = 100 to Ne = 10,000. While the variance in Fped decreases immediately (a 97% drop) and remains unchanged thereafter, the three genomic estimates of F change much more gradually. The initial declines in the variances of the three genomic estimates of F are caused by the drop in recent inbreeding in larger populations. However, the declines in variances become much slower thereafter, reflecting the slow decay of linkage disequilibrium and the gradual increase in the number of new segregating variants following a population expansion. For example, after expanding from Ne = 100 to Ne = 10,000, the standard deviation of Froh decreases by 70%, from 0.115 to 0.035, in the first 100 generations but decreases only an additional 10% (to 0.021) over the next 300 generations and requires several thousand generations to reach its equilibrium value of 0.015. Similarly, starting at 0.142, the standard deviation of Falt decreases to 0.069 and 0.055 in generations 100 and 400, respectively. Our results should extrapolate to larger starting and ending population sizes (e.g., Ne = 10,000 to Ne = 1,000,000) and indicate that genomic estimates of F require a considerable time following expansion to reach the low variance values that would be predicted from the larger population sizes.

Figure 4 Variance of F (±1 SE) as a function of Ne. The variance of Fped is the lowest at all population sizes, and the variance of Froh is intermediate. Comparing the variance of genomic measures of F in simulated data to the equivalent variances in real SNP data (from an outbred Caucasian sample) suggests that the effective population size of Caucasians is roughly 10,000 with respect to these statistics.

Figure 5 Prediction error variance (PEV) of genomic estimates of F as a function of Ne. PEV decreases as a function of population size for all genomic estimates of F but does so most rapidly for Froh.

Figure 10 shows the correlations between estimates of F and the homozygous mutation load following population expansion. This correlation drops considerably for all estimates of F following population expansion. It remains low for Fped, Fh, and Falt in the 400 generations following the expansion, whereas the correlation with Froh begins to increase after 50 generations. When Ne is small (i.e., 100), rare mutant alleles are tagged by long haplotypes as a result of relatively recent inbreeding. When the population size is increased, these long haplotypes are destroyed by recombination, and new ones are not created quickly because new inbreeding occurs slowly. Consequently, the correlation between F and homozygous mutation load declines, as did the variance of F (Figure 9). However, as the population evolves at size Ne = 10,000 for an increasing number of generations, new mutations arise that exist on only a single haplotype, and the pairings of these haplotypes, which create homozygosity at rare mutations, are increasingly detected by Froh (and eventually by Fh and Falt) as a new equilibrium is reached. These results provide further evidence that Froh is likely to be the optimal estimate for detecting inbreeding depression in populations, such as humans, that have expanded rapidly in population size.

Power to detect inbreeding effects

The power of detecting inbreeding depression from marker data depends in part on the accuracy of predicting autozygosity with markers, and our results suggest that Froh is optimal for detecting autozygosity arising from both ancient and recent inbreeding. However, power also depends on the variation in F as well as on the effect size and the sample size. When the level of inbreeding is low, such as in large randomly mating populations, the variance of Froh is commensurately low (e.g., the standard deviation of Froh across the 100-Mb segments analyzed here is 0.015 when Ne = 10,000). The standard deviation of Froh in observed human SNP data is smaller yet: 0.012. Given such low variance in the predictor, we were interested in whether inbreeding depression is detectable using Froh in human samples and, if so, what sample sizes would be required, as a function of different levels of inbreeding (assayed using different effective population sizes), to achieve adequate power. To do this, we needed an estimate of the likely effect size of inbreeding depression on a complex trait in humans and an estimate of the variance of Froh scaled up to genome-wide levels.

Because, to our knowledge, the effects of inbreeding on IQ have been investigated more than those on any other complex human trait, we used a best estimate of the inbreeding effect on IQ as a guide to the likely inbreeding effect sizes among other human complex traits. Morton (1979) reviewed four large studies of cousin–cousin inbreeding and found that IQ decreased by 0.73 (Schull and Neel 1965) to 0.39 (Kudo et al. 1972) points for every 0.01 increase in Fped. Given the standard deviation of IQ (15 points), this translates to a decrease of 0.025–0.05 standard deviations per percentage point of inbreeding.
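The conversion to standardized units is simply each per-percentage-point IQ decrement divided by the IQ standard deviation of 15 points:

$$
\frac{0.39}{15} \approx 0.026, \qquad \frac{0.73}{15} \approx 0.049 \quad \text{SD per percentage point of } F_{ped}
$$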

Figure 6 Correlations between Fped and genomic estimates of F as a function of Ne. All correlations between genomic estimates of F and Fped decrease as a function of Ne, but Froh is consistently correlated most highly with Fped.

Figure 7 Correlations between F and homozygous mutation load as a function of Ne. Froh correlates most highly with the homozygous mutation load at all population sizes, and this advantage increases at larger population sizes (where inbreeding becomes more ancient).

To predict the statistical power of using Froh to detect inbreeding depression in human studies, we also needed to estimate what the variance of Froh would be if derived genome-wide rather than across 100 Mb, as simulated in the current study. To do this, we compared the average variance of Froh across 10 replicates of 100 Mb of real SNP data ($1.5 \times 10^{-4}$) to the genome-wide variance of Froh in the real SNP data ($1.79 \times 10^{-5}$), an 8.45-fold drop in variance. We thus scaled the variances of Froh down 8.45-fold to account for the decreased variance that would be observed in Froh genome-wide. This led to predicted genome-wide variances (standard deviations) of Froh of $1.4 \times 10^{-3}$ (0.037) for Ne = 100, $2.5 \times 10^{-4}$ (0.016) for Ne = 1000, $2.63 \times 10^{-5}$ (0.005) for Ne = 10,000, and $1.79 \times 10^{-5}$ (0.004) for the real SNP data.
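A minimal Python sketch of this rescaling, assuming only the rounded variances quoted in the paragraph above; the helper name `genome_wide_var` is hypothetical. It confirms that the reported standard deviations are the square roots of the reported variances and that dividing a 100-Mb variance by 8.45 yields the genome-wide prediction.

```python
import math

FOLD_DROP = 8.45  # 100-Mb to genome-wide drop in var(Froh) reported in the text


def genome_wide_var(var_100mb: float, fold_drop: float = FOLD_DROP) -> float:
    """Scale a variance of Froh measured over 100 Mb down to a genome-wide value."""
    return var_100mb / fold_drop


# Fold drop implied by the rounded real-data variances (1.5e-4 vs 1.79e-5);
# it comes out near 8.4 because 1.5e-4 is itself a rounded value.
implied_fold = 1.5e-4 / 1.79e-5

# Predicted genome-wide variances of Froh quoted in the text, and their SDs.
predicted_var = {
    "Ne=100": 1.4e-3,
    "Ne=1000": 2.5e-4,
    "Ne=10000": 2.63e-5,
    "real": 1.79e-5,
}
predicted_sd = {label: math.sqrt(v) for label, v in predicted_var.items()}

for label, sd in predicted_sd.items():
    print(f"{label}: SD = {sd:.3f}")
```

For example, the 100-Mb standard deviation of 0.015 quoted for Ne = 10,000 gives `genome_wide_var(0.015 ** 2)` of about $2.66 \times 10^{-5}$, close to the $2.63 \times 10^{-5}$ above; the gap reflects rounding of the SD.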

Figure 11 shows the statistical power, as a function of sample size, of a regression of an outcome variable on Froh, assuming a slope of −0.73 (solid lines) and −0.39 (dashed lines), a standard deviation of the outcome variable of 15, and standard deviations of Froh that depend on the population size, as detailed in the preceding paragraph. We assume that $b \sim t_{(n-2)}$ with mean −0.73 or −0.39 and

$$\operatorname{var}(b) = \frac{\operatorname{var}(IQ) - b^2 \operatorname{var}(F_{roh})}{\operatorname{var}(F_{roh})\,(n-2)}.$$

Our results suggest that sample sizes of 700 are sufficient for achieving adequate (80%) statistical power when the rate of inbreeding is high (similar to that found in a randomly breeding population of size Ne = 100), such as could occur in samples selected on the basis of likely recent inbreeding or in small isolated populations. Our results also suggest that inbreeding depression effects can be detected using Froh in large, ostensibly outbred populations, such as those found in modern industrialized societies, but that large samples (on the order of 12,000–65,000, depending on effect size) are probably necessary to detect them reliably. It is possible that phenotypes more closely related to fitness than IQ would show larger inbreeding depression effects and thus would require smaller sample sizes than those predicted here to achieve adequate power.
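The power calculation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: it uses the variance formula for b given in the text, a two-sided test at α = 0.05, and a normal approximation to the t(n − 2) distribution (adequate at the sample sizes of interest). F is expressed in percentage points, so the slope is in IQ points per percentage point and each genome-wide variance above is multiplied by $10^4$. The function names are hypothetical.

```python
from statistics import NormalDist

VAR_IQ = 15.0 ** 2  # IQ standard deviation of 15 points
STD_NORMAL = NormalDist()


def slope_variance(n: int, b: float, var_f: float, var_y: float = VAR_IQ) -> float:
    """var(b) = (var(y) - b^2 var(F)) / (var(F) (n - 2)), as given in the text."""
    return (var_y - b * b * var_f) / (var_f * (n - 2))


def power(n: int, b: float, var_f: float, alpha: float = 0.05) -> float:
    """Two-sided power of the slope test, normal approximation to t(n - 2)."""
    z = abs(b) / slope_variance(n, b, var_f) ** 0.5
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(z - crit) + STD_NORMAL.cdf(-z - crit)


def min_sample_size(b: float, var_f: float, target: float = 0.80,
                    alpha: float = 0.05) -> int:
    """Smallest n reaching the target power (power rises monotonically with n)."""
    hi = 4
    while power(hi, b, var_f, alpha) < target:
        hi *= 2  # bracket the answer
    lo = 3
    while lo < hi:  # binary search on n
        mid = (lo + hi) // 2
        if power(mid, b, var_f, alpha) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Slopes in IQ points per percentage point of F; variances converted from
# proportion units to percentage points squared (multiply by 10^4).
n_large = min_sample_size(-0.73, 2.63e-5 * 1e4)  # Ne = 10,000, larger effect
n_small = min_sample_size(-0.39, 1.79e-5 * 1e4)  # real SNP data, smaller effect
print(n_large, n_small)
```

Under these assumptions the sketch gives roughly 12,600 for Ne = 10,000 with the larger effect and roughly 65,000 for the real-data variance with the smaller effect, bracketing the 12,000–65,000 range quoted above. Because the exact test behind Figure 11 is not specified here, other settings need not match the figure exactly.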

Discussion

Hundreds of scientific investigations have been conducted on the effects of inbreeding since Darwin (1868, 1876) first studied the topic. The inbreeding coefficient F has traditionally been defined in terms of pedigree inbreeding (Fped). However, low variation in Fped seriously hampers the ability to detect inbreeding effects in samples that are not selected on the basis of known recent inbreeding. In essence, Fped is a genome-wide expectation for the percentage of the genome that is autozygous, but there is a great deal of variation in autozygosity around this expected percentage, caused by the finite number of recombination events per generation. The resulting identical haplotypes can persist in the population for many generations, coming together in offspring of distantly related individuals to create increased levels of homozygosity. The availability of genomic marker panels has made possible new genomic alternatives to Fped that can better detect such identical haplotypes, and therefore even quite ancient inbreeding.

Figure 9 Variance of F (±1 SE) as a function of generations since population expansion. Whereas the variance of Fped drops immediately following a population expansion, the variance in genomic measures of F requires hundreds of generations to reach equilibrium levels.

Figure 8 Correlations between alternative thresholds of Froh and the homozygous mutation load as a function of Ne. Runs of homozygosity (ROHs) are defined as stretches of homozygous SNPs at least 0.5 Mb, 1.5 Mb, or 5 Mb long. Long thresholds are optimal for detecting autozygosity in highly inbred populations, whereas shorter thresholds are optimal for detecting autozygosity in outbred populations.

In the last 10 years, a large number of human (reviewed in Ku et al. 2010) and animal (reviewed in Chapman et al. 2009) studies have investigated the relationship between genomic estimates of F and disease- or fitness-related traits. In general, animal studies have focused on excess homozygosity on a marker-by-marker basis (Fh), whereas human studies, which have the opportunity to use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).
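To make the ROH-based estimate concrete, here is a deliberately simplified Python sketch of Froh as the fraction of the genome covered by runs of consecutive homozygous SNPs that span at least a length threshold (such as the 0.5-, 1.5-, or 5-Mb thresholds of Figure 8). The function name and inputs are hypothetical. It handles a single sorted chromosome and omits the tolerance for occasional heterozygous or missing calls, minimum SNP counts, and density rules that practical ROH callers such as PLINK apply; it is not the exact procedure used in this study.

```python
from typing import Sequence


def froh(positions_bp: Sequence[int], homozygous: Sequence[bool],
         min_roh_bp: int, genome_length_bp: int) -> float:
    """Fraction of the genome lying in runs of consecutive homozygous SNPs
    whose span (first to last SNP of the run) is at least min_roh_bp.

    Simplified sketch: positions must be sorted and on one chromosome, and a
    single heterozygous SNP ends a run.
    """
    total = 0
    start = last = None
    for pos, hom in zip(positions_bp, homozygous):
        if hom:
            if start is None:
                start = pos  # a new run begins
            last = pos
        else:
            if start is not None and last - start >= min_roh_bp:
                total += last - start
            start = last = None  # a heterozygous SNP breaks the run
    if start is not None and last - start >= min_roh_bp:
        total += last - start  # run extending to the final SNP
    return total / genome_length_bp


# Hypothetical toy data: SNPs every 0.5 Mb, one heterozygous call at 2.0 Mb.
pos = [0, 500_000, 1_000_000, 1_500_000, 2_000_000, 2_500_000]
hom = [True, True, True, True, False, True]
print(froh(pos, hom, 1_000_000, 10_000_000))  # 1.5 Mb of 10 Mb -> 0.15
```

Raising the threshold to 2 Mb in this toy example drops the single 1.5-Mb run and returns 0.0, which illustrates why the choice of ROH threshold matters.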

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even in cases where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression, and this advantage is especially pronounced following a population expansion such as has occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).

Nevertheless, the variance in Froh in large simulated (Ne = 10,000) and observed Caucasian SNP data sets is low, and because of this there is likely to be little power to detect inbreeding effects in unselected (with respect to recent inbreeding) samples unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ–inbreeding effects using Froh in unselected samples. Thus, current studies investigating the effects of Froh on human complex traits that have sample sizes <3000 and that have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.

Our findings suggest two strategies for achieving sufficient power in studying inbreeding depression of complex traits using a genomic estimate of F. The first is to conduct analyses on very large (e.g., n > 10,000) samples, such as those being assembled by multisite consortia (International Schizophrenia Consortium 2009). The second is to conduct analyses on smaller samples (perhaps 1000–3000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to be a more powerful approach for detecting inbreeding depression than any of the alternatives.

While much of the focus in this article has been on understanding the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will manifest lower levels of variation in F than those reported here and will require commensurately larger sample sizes to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans and so should show greater variation in F. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller sample sizes.

Figure 11 Estimated power to detect inbreeding effects on a human complex trait using Froh. Higher levels of real inbreeding (smaller Ne) lead to higher variance in Froh and thus greater statistical power to detect an inbreeding effect. Large (solid lines) and small (dashed lines) inbreeding effect sizes were derived from a review of the effects of consanguinity on IQ (see text). Arrows show predicted sample sizes required to achieve 80% power. When inbreeding is high (Ne = 100), sample sizes of 400 are adequate, but in outbred populations (Ne = 10,000 or real SNP data) sample sizes >20,000 may be required.

Figure 10 Correlations between F and homozygous mutation load as a function of generations since population expansion. The relationship between the homozygous mutation load and Froh increases most quickly following a population expansion.


The simulation parameters explored in this article were chosen to mimic the human genome. Recombination rates and recombination hotspot rates can vary at least threefold across mammals (Jensen-Seaman et al. 2004; Kauppi et al. 2004). Nevertheless, the basic conclusions made here are not overly dependent on the specific parameters chosen and can readily be applied to animals with different recombination parameters. When distance is conceptualized in centimorgans rather than base-pair units, different recombination rates lengthen or shorten the genome vis-à-vis our simulation. This would slightly lower or raise (respectively) the variance of Froh, as shown above when variation in Froh was extrapolated to its expected genome-wide value. Similarly, when genomic distance is measured in centimorgans, different recombination hotspot parameters change the variation in SNP density vis-à-vis our simulation. This should have minimal effect on the ROHs detected, and therefore on Froh, because even the shortest ROH encompasses 175 hotspots on average. Thus, while the specific values in our results depend on the specific simulation parameters used and are therefore most relevant to human populations, the qualitative conclusions regarding the advantages of Froh over the alternative estimates of F apply across a wide range of parameters and extend to nonhuman animals as well.

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives not only from its relevance to many theoretical concepts in population genetics but also from its empirical uses: evolutionists have used inbreeding as a way to gauge which traits are likely to have been under ancestral selection, given that the traits most affected by it tend to be those most related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Bittles and Black 2010b; Rudan et al. 2003a). Worldwide, rates of recent inbreeding are not low. For example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have detectable effects, and we have demonstrated the optimal way of identifying such ancient inbreeding. Investigations into ancient inbreeding effects should help investigators understand the evolutionary forces acting on the genes underlying trait variation and whether inbreeding represents an important risk factor in disease.

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting and finding optimal thresholds for the simulation, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of Genotypes and Phenotypes (dbGaP), found at http://www.ncbi.nlm.nih.gov/gap, through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study; M.C.K. conducted analyses; and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: Consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinformatics 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity-fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.


Enciso-Mora, V., F. J. Hosking, and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur. J. Hum. Genet. 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423.

Fisher, R. A., 1954 A fuller theory of "junctions" in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

Gibson, J., N. E. Morton, and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15: 789–795.

Gindilis, V. M., R. G. Gainullin, and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. (Camb.) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol. Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14: 528–538.

Kauppi, L., A. J. Jeffreys, and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo, and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum. Genet. 129: 1–15.

Kudo, A., K. Ito, and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances and school attendance in Shizuoka school-children. Jpn. J. Hum. Genet. 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J. Theor. Biol. 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II: Familial adenocarcinomatosis. Am. J. Med. Genet. 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum. Mutat. 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am. J. Hum. Genet. 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: What prevents the exponential increase of one's ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson, and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10,001 Dalmatians": Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar, and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona, and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta, and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationship. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor: N. A. Rosenberg


GENETICS
Supporting Information

http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130922


mating population. A similar issue has been investigated with respect to the amount of overlap in pedigrees between two individuals (Ohno 1996; Derrida et al. 2000), as we discuss below. Similarly, Chang (1999) assessed the number of generations one must travel back before reaching the most recent common ancestor of every human alive today, and it is surprisingly recent: about 110 generations (3100 years) ago (see also Rohde et al. 2004; Lachance 2009). Clearly, the most recent common ancestor of just two randomly chosen individuals (e.g., spouses) must be much more recent still. This can be quantified simply in a Fisher–Wright model.

For a population of constant size, consider two randomly chosen individuals (spouses), X and Y. Take a single ancestor of spouse X who lived t generations ago. Given nonoverlapping generations, the probability that this ancestor is the same person as any of spouse Y's ancestors who lived t generations ago is $A_t/N_t$, where $A_t$ is the total number of ancestors of Y at generation t and $N_t$ is the total number of individuals in the population at generation t. This follows from the fact that the probabilities that the ancestor of X is related to each of Y's ancestors are independent of each other (i.e., multiple ancestors of Y can be the same person). The probability that the ancestor of X is unrelated to any ancestor of Y is therefore $1 - A_t/N_t$. Assuming that X also has $A_t$ distinct ancestors at time t, the probability that no ancestor of X is related to any ancestor of Y at time t is $(1 - A_t/N_t)^{A_t}$. Finally, the probability $c_g$ that a randomly chosen spousal pair share no common ancestors (no pedigree inbreeding) up to and including generation g is

$$c_g = \prod_{t=1}^{g} \left(1 - \frac{A_t}{N_t}\right)^{A_t}$$

The above calculation for $c_g$ relies on estimates of $A_t$ and $N_t$ that are not straightforward to derive. As an approximation, $A_t \approx 2^t$ and $N_t \approx N_e$. While the latter assumption is probably close to the truth on average, the former ($A_t \approx 2^t$) must overestimate the number of ancestors at time t, because $A_t \le N_t$ even though $2^t$ grows exponentially while $N_t$ remains constant. Put another way, as one travels back in a pedigree, duplicate ancestors grow increasingly common due to ancient inbreeding. This biases $c_g$ to be lower than it should be, but further investigation (not shown) indicates that this bias is minimal, changing how far one must go back to find a common ancestor by no more than a single generation.
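The approximation above can be evaluated directly. This minimal Python sketch assumes $A_t = 2^t$ and $N_t = N_e$ as in the text, plus one added assumption: once $2^t$ reaches $N_e$, sharing is treated as certain ($c_g = 0$), since $A_t$ cannot exceed $N_t$. "Almost certain" is operationalized as $c_g < 0.01$, a threshold chosen here for illustration only, and the function names are hypothetical. Products are accumulated on the log scale with `math.log1p` to avoid underflow and rounding error.

```python
import math


def prob_no_common_ancestor(g: int, ne: int) -> float:
    """c_g = prod_{t=1}^{g} (1 - A_t/N_t)^{A_t}, with A_t = 2^t and N_t = Ne."""
    log_c = 0.0
    for t in range(1, g + 1):
        a_t = 2 ** t
        if a_t >= ne:
            # Added assumption: A_t cannot exceed N_t, so sharing is certain.
            return 0.0
        log_c += a_t * math.log1p(-a_t / ne)  # log of (1 - A_t/N_t)^{A_t}
    return math.exp(log_c)


def generations_to_common_ancestor(ne: int, threshold: float = 0.01) -> int:
    """Smallest g with c_g below the threshold ("almost certain" sharing)."""
    g = 1
    while prob_no_common_ancestor(g, ne) >= threshold:
        g += 1
    return g


for ne in (1_000, 10_000, 1_000_000, 100_000_000):
    print(f"Ne = {ne:>11,}: {generations_to_common_ancestor(ne)} generations")
```

With this 1% threshold the sketch returns 6, 8, 11, and 15 generations for Ne = 1000, 10,000, 1 million, and 100 million, in line with the values read from Figure 2 in the text.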

Figure 2 shows expected values of $c_g$ for four different population sizes: the three investigated in the present simulation, as well as Ne = 1 M (1 million), which is probably a larger mating pool than most modern humans belong to. We checked our theoretical predictions against simulation results for the three population sizes (Ne = 100, 1000, and 10,000) and the five generations of ancestry we recorded. The expected probabilities (solid dots) agree well with the observed proportions from the simulated data (colored x's).

Figure 2 shows that mates are likely to share a common ancestor in their recent ancestral past. For populations of size Ne = 1000, almost all spousal pairs have at least one common ancestor within the past 6 generations. For Ne = 10,000, spousal pairs share a common ancestor within 8 generations. Indeed, for every 10-fold increase in population size, the number of generations one must go back before a common ancestor becomes certain increases by about 2. Thus, at Ne = 1,000,000, spouses are almost certain to share a common ancestor within the last 11 generations. Increasing this population size to 100 million pushes this estimate back only to 15 generations. ROHs caused by inbreeding from common ancestors 15 generations in the past are 3 Mb in length on average, although, because of the finite number of recombination events per generation, inbreeding events from 15 generations in the past often cause no ROHs. These results are consonant with those of Derrida et al. (2000), who found substantial overlap between the pedigree trees of two randomly chosen individuals in a population of size 10,000 within 10 generations and complete overlap (both individuals share exactly the same ancestors) within 20 generations.

Variance of Fped

Given that the contribution to Fped at generation t is independent of the contribution to Fped from previous generations, the variation of Fped for a given population size can be apportioned into the amount contributed by each ancestral generation. Figure 3 shows that the contribution to the variation of Fped decreases log-linearly as a function of the contributing ancestral generation. The bend (nonlinearity) at generation 1 for the samples drawn from the Ne = 10,000 population occurs because several samples had no sib–sib inbreeding, making $\log_{10}(\operatorname{var}(F_{ped}\ \text{due to sib-sib inbreeding})) = -\infty$. Such values were set to "missing," biasing the remaining nonmissing values upward.

Figure 2 Shown is the probability that mates share no common ancestors in the most recent g generations as a function of population size (see text). The x's are the same values derived empirically from simulations for up to 5 generations in the past and show good agreement with the expected probabilities. Even in large randomly breeding populations (e.g., 1 million), it is almost certain that at least one ancestor is shared between two pedigrees within 11 generations.

Our results show that very little variation in Fped was missed by ignoring inbreeding resulting from common ancestors more than five generations back (see Methods). This is despite the fact that quite long (e.g., 10 Mb) ROHs can be created from such inbreeding. Although we had no way of measuring the proportion of variation in genomic estimates of F that was due to each ancestral generation, our results below demonstrate that genomic estimates of F retain considerable variation in large populations after variation in Fped has diminished to nearly 0.

Variation and error in genomic and pedigree estimates of F

One important aim of the current project was to investigate the effects of population size on the variance of the four estimates of F. Variances were taken across estimates of F for each individual in samples of size 1000 drawn from the three population sizes (Ne = 100, 1000, and 10,000). Figure 4 shows the mean (±1 SE) of the log10 variances of F as a function of Ne. The variance decreases for all estimates of F as Ne increases, because of the lower levels of ancient and recent inbreeding in larger populations. As expected (Figure 3), Fped shows the least variation at every population size because it does not capture the variation in F caused by the random nature of recombination and segregation. The two estimates of F based on SNP-by-SNP homozygosity (Fh and Falt) show the highest variation, while Froh is intermediate. Figure 4 also shows the variance values for the three genomic estimates of F in 10 replicates of 100 Mb of real SNP data from an unselected (outbred) Caucasian control sample. The variances of Fh and Falt are slightly higher, and the variance of Froh is slightly lower, in the real data than in the Ne = 10,000 simulation.

The prediction error variance (PEV) of the three genomic estimates of F is shown in Figure 5. $\mathrm{PEV} = \operatorname{var}(F_1 - F_2)/2$, where $F_1$ is the estimate of F from a random half of SNPs and $F_2$ is the estimate of F from the other half of SNPs (Powell et al. 2010; Yang et al. 2010a). PEV provides insight into the amount of error in the genomic estimates of F that exists due to the finite sampling of SNPs; as SNPs become more dense and linkage disequilibrium between markers increases (as occurs with larger Ne), the PEV should decrease. The PEV for all estimates of F is small (<5%) compared to the variances of F shown in Figure 4. As expected, the PEV of every estimate of F also decreases with population size, but the PEV for Froh decreases the most rapidly. Error in Froh decreases more rapidly as a function of Ne because, with dense SNP data (such as exist when Ne = 10,000), ROHs are made up of a large number of homozygous SNPs in a row (e.g., a ROH of 1.5 Mb contains 2000 SNPs in the Ne = 10,000 data), and such long sets of markers pick up the same ROHs with very high precision.
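The split-half PEV can be computed as follows. This Python sketch assumes that two estimates of F per individual, `f1` and `f2`, have already been computed from complementary random halves of the SNPs; the function name and toy data are hypothetical. Because independent errors in the two halves add in the difference, $\operatorname{var}(F_1 - F_2)/2$ recovers the error variance of each half-SNP estimate, which the toy check illustrates.

```python
import random
from statistics import variance


def prediction_error_variance(f1, f2):
    """PEV = var(F1 - F2) / 2, computed across individuals."""
    if len(f1) != len(f2):
        raise ValueError("f1 and f2 must describe the same individuals")
    return variance([a - b for a, b in zip(f1, f2)]) / 2


# Toy check (hypothetical data): each half gives the true F plus independent
# sampling error with SD 0.002, so the PEV should come out near 0.002**2.
rng = random.Random(1)
true_f = [rng.uniform(0.0, 0.05) for _ in range(20_000)]
f1 = [f + rng.gauss(0, 0.002) for f in true_f]
f2 = [f + rng.gauss(0, 0.002) for f in true_f]
pev = prediction_error_variance(f1, f2)
print(f"PEV = {pev:.2e}")
```

Note that the true F cancels in the difference, so the PEV measures only marker-sampling error, not the biological variation in F.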

Relationships between genomic estimates of F and Fped

Figure 6 shows the correlation coefficients between the three genomic estimates of F and Fped. In all cases, Froh correlates most highly with Fped, reflecting the fact that Froh tends to capture more recent inbreeding than the other two genomic estimates of F. The increasingly large standard errors of the correlations are due to the higher standard errors and lower variances of Fped at larger population sizes.

Finding the optimal measure for detecting inbreeding depression: Relationships between estimates of F and the homozygous mutation load

To understand which estimate of F is likely to be optimal for detecting inbreeding depression effects, we correlated each of the four estimates of F with the homozygous mutation load, which, as explained above, is the leading contender for why inbreeding depression occurs. Figure 7 shows these results. Fped is the worst predictor of the homozygous mutation load, and its disadvantage grows as the population size increases. This result is due to the low variance in Fped in randomly breeding populations and suggests that Fped is likely to be a useful estimate only in samples selected on the basis of recent consanguinity. Falt outperformed Fh because Falt gives more weight to rare variants and because Falt has lower error variance (Figure 5). As predicted, Froh was the estimate most associated with the recessive mutation load at every population size, and this advantage grows as the population size increases (Figure 7).

Figure 3 Contribution to var(Fped) from previous generations. Most of the variance in Fped at all population sizes is attributable to recent inbreeding. For example, the variance in Fped due to spouses sharing common ancestors five generations in the past makes up only 0.2% of the total variance in Fped at all population sizes.

Detecting Ancient Inbreeding 243

The superiority of Froh in detecting the homozygous mutation load is even starker if one considers the unique effects of each estimate of F after controlling for their correlations with Froh. After controlling for Froh, the mean partial correlations (across Ne) between the homozygous mutation load and Fh (rp = 0.07), Falt (rp = 0.15), and Fped (rp = 0.09) are all low, but the reverse is not true: partial correlations between the homozygous mutation load and Froh remain high after controlling for Fh (rp = 0.57), Falt (rp = 0.50), and Fped (rp = 0.68). This shows that Froh contains much additional information on the homozygous mutation load that is not captured by the other three estimates of F, whereas the other three estimates of F contain little additional information on the homozygous mutation load beyond what already exists in Froh. The superiority of Froh occurs because a given rare variant is typically homozygous only when the haplotypic segment on which it exists meets another IBD segment within an individual, creating a run of homozygosity. If inbreeding depression is caused by homozygosity at rare mutations, as recent evidence indicates, these results suggest that Froh is likely to be the optimal estimate for detecting it regardless of the level of inbreeding in the population.
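The partial correlations above measure what one estimate of F says about the homozygous mutation load once another estimate is held fixed. The text does not state how they were computed; a minimal sketch, assuming the standard first-order partial correlation built from the three pairwise Pearson correlations, is shown below. The helper names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z: x with y after controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up values: mutation load and two estimates of F for five individuals
load = [0.0, 1.0, 2.0, 3.0, 5.0]
f_roh = [0.01, 0.03, 0.02, 0.05, 0.06]
f_h = [0.02, 0.01, 0.04, 0.02, 0.05]
print(partial_corr(load, f_h, f_roh))  # information in f_h beyond f_roh
print(partial_corr(load, f_roh, f_h))  # information in f_roh beyond f_h
```

Comparing the two printed values mirrors the comparison in the text: an estimate that adds little once Froh is known has a partial correlation near zero.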

Figure 8 compares the correlation coefficients between the homozygous mutation load and Froh calculated from short (0.5 Mb), moderate (1.5 Mb), and long (5 Mb) ROH thresholds as a function of Ne. Froh-short correlates poorly with the homozygous mutation load when Ne is small. In large populations, where inbreeding is more ancient, the situation reverses: Froh defined by short ROHs outperforms the other two Froh estimates. The improvement of Froh-short as a function of population size is due to two factors: (a) denser SNP data in the larger populations allow even short (0.5 Mb) ROHs to discriminate autozygosity accurately, and (b) a higher ratio of short to long ROHs in large populations. Froh-long performs poorly when Ne = 10,000 due to low variance. Froh defined by the moderate (1.5 Mb) threshold performed intermediately at each population size, providing justification for using this threshold in the current study, as it allowed more meaningful comparisons across the three population sizes.
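To make the thresholds concrete, the sketch below calls ROHs as uninterrupted runs of homozygous SNPs whose physical length (first to last SNP) reaches a threshold, and computes Froh as the total ROH length divided by the length of the region analyzed. This is a deliberately simplified, hypothetical caller: it tolerates no heterozygous or missing genotypes within a run and imposes no minimum SNP count or density, whereas practical tools (e.g., the `--homozyg` options in PLINK; Purcell et al. 2007) expose such settings. The ROH-calling settings used in the simulations are not specified in this section.

```python
def call_rohs(positions, is_hom, min_len_bp):
    """Return (start, end) positions of runs of consecutive homozygous SNPs
    spanning at least min_len_bp, measured from the first to the last SNP."""
    runs = []
    start = prev = None
    for pos, hom in zip(positions, is_hom):
        if hom:
            if start is None:
                start = pos  # a new run begins
            prev = pos
        else:
            # a heterozygous SNP closes any open run
            if start is not None and prev - start >= min_len_bp:
                runs.append((start, prev))
            start = None
    if start is not None and prev - start >= min_len_bp:
        runs.append((start, prev))  # run that reaches the last SNP
    return runs


def f_roh(positions, is_hom, min_len_bp, region_len_bp):
    """Proportion of the analyzed region covered by ROHs of at least min_len_bp."""
    total = sum(end - start for start, end in call_rohs(positions, is_hom, min_len_bp))
    return total / region_len_bp


# Toy data: 10 SNPs spaced 0.5 Mb apart in a 5-Mb region (far sparser than real data)
pos = [i * 500_000 for i in range(10)]
hom = [True, True, True, True, False, True, True, False, True, True]
for threshold in (500_000, 1_500_000, 5_000_000):
    print(threshold, f_roh(pos, hom, threshold, 5_000_000))
```

With the toy data, the 0.5-, 1.5-, and 5-Mb thresholds give Froh = 0.5, 0.3, and 0.0. The 0.5-Mb runs here consist of only two SNPs each, which is why real callers also require a minimum number of SNPs per run.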

Changes in F following population expansion

We wanted to understand how recent population expansion, such as what occurred in human lineages over the last ~10,000 years (~350 generations), might alter our conclusions. Figure 9 shows the change in the mean log10(variance) of the four estimates of F as a function of generations since a 100-fold expansion in population size from Ne = 100 to Ne = 10,000. While the variance in Fped decreases immediately (a 97% drop) and remains unchanged thereafter, the three genomic estimates of F change much more gradually. The initial declines in the variances of the three genomic estimates of F are caused by the drop in recent inbreeding in larger populations. However, the declines in variances become much slower thereafter, reflecting the slow decay of linkage disequilibrium and the gradual increase in the number of new segregating variants following a population expansion. For example, after expanding from Ne = 100 to Ne = 10,000, the standard deviation of Froh decreases by 70%, from 0.115 to 0.035, in the first 100 generations but decreases only an additional 10% (to 0.021) over the next 300 generations and requires several thousand generations to reach its equilibrium value of 0.015. Similarly, starting at 0.142, the standard deviation of Falt decreases to 0.069 and 0.055 in generations 100 and 400, respectively. Our results should extrapolate to larger starting and ending population sizes (e.g., Ne = 10,000 to Ne = 1,000,000) and indicate that genomic estimates of F require a considerable time following expansion to reach the low variance values that would be predicted from the larger population sizes.

Figure 5 Prediction error variance (PEV) of genomic estimates of F as a function of Ne. PEV decreases as a function of population size for all genomic estimates of F but does so most rapidly for Froh.

Figure 4 Variance of F (±1 SE) as a function of Ne. The variance of Fped is the lowest at all population sizes, and the variance of Froh is intermediate. Comparing the variance of genomic measures of F in simulated data to the equivalent variances in real SNP data (from an outbred Caucasian sample) suggests that the effective population size of Caucasians is ~10,000 with respect to these statistics.

244 M C Keller P M Visscher and M E Goddard

Figure 10 shows the correlations between estimates of F and the homozygous mutation load following population expansion. This correlation drops considerably for all estimates of F following population expansion; it remains low for Fped, Fh, and Falt in the 400 generations following the expansion, whereas the correlation with Froh begins to increase after 50 generations. When Ne is small (i.e., 100), rare mutant alleles are tagged by long haplotypes as a result of relatively recent inbreeding. When the population size is increased, these long haplotypes are destroyed by recombination, and new ones are not created quickly because new inbreeding occurs slowly. Consequently, the correlation between F and homozygous mutation load declines, as did the variance of F (Figure 9). However, as the population evolves at size Ne = 10,000 for an increasing number of generations, new mutations arise that exist on only a single haplotype, and the pairings of these, creating homozygosity at rare mutations, are increasingly detected by Froh and eventually by Fh and Falt as a new equilibrium is reached. These results provide further evidence that Froh is likely to be the optimal estimate for detecting inbreeding depression in populations, such as humans, that have expanded rapidly in population size.

Power to detect inbreeding effects

The power of detecting inbreeding depression from marker data depends in part on the accuracy of predicting autozygosity with markers, and our results suggest that Froh is optimal for detecting autozygosity arising from both ancient and recent inbreeding. However, power also depends on the variation in F, as well as on the effect size and the sample size. When the level of inbreeding is low, such as in large randomly mating populations, the variance of Froh is commensurately low (e.g., the standard deviation of Froh = 0.015 when Ne = 10,000). The standard deviation of Froh in observed human SNP data is smaller yet: 0.012. Given such low variance in the predictor, we were interested in whether inbreeding depression is detectable using Froh in human samples and, if so, what sample sizes would be required, as a function of different levels of inbreeding (assayed using different effective population sizes), to achieve adequate power. To do this, we needed an estimate of the likely effect size of inbreeding depression on a complex trait in humans and an estimate of the variance of Froh scaled up to genome-wide levels.

Because, to our knowledge, the effects of inbreeding on IQ have been investigated more than those on any other complex human trait, we used a best estimate of the inbreeding effect on IQ as a guide to the likely inbreeding effect sizes among other human complex traits. Morton (1979) reviewed four large studies of cousin–cousin inbreeding and found that IQ decreased by 0.73 (Schull and Neel 1965) to 0.39 (Kudo et al. 1972) points for every 0.01 increase in Fped. Given the standard deviation of IQ (15 points), this translates to a decrease of 0.025–0.05 standard deviations per percentage of inbreeding.
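The conversion in the last sentence is a division by the IQ standard deviation. The short sketch below reproduces it and also expresses each effect as a slope per unit of F (F as a proportion rather than a percentage), the scale on which the variances of Froh in this article are expressed. The per-unit-F slope and the function names are our own illustrative rescaling, not quantities reported in the text.

```python
IQ_SD = 15.0  # standard deviation of IQ, in points


def effect_in_sd_units(points_per_percent_f, trait_sd=IQ_SD):
    """Trait change in SD units per 0.01 (one percentage point) increase in F."""
    return points_per_percent_f / trait_sd


def slope_per_unit_f(points_per_percent_f):
    """Rescale 'points per 0.01 increase in F' to 'points per unit increase in F'."""
    return points_per_percent_f / 0.01


for study, points in (("Schull and Neel 1965", 0.73), ("Kudo et al. 1972", 0.39)):
    print(study, round(effect_in_sd_units(points), 3), round(slope_per_unit_f(points), 1))
```

This gives about 0.049 and 0.026 SD per percentage point of inbreeding, i.e., the 0.025–0.05 range quoted above, and slopes of 73 and 39 IQ points per unit of F.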

Figure 6 Correlations between Fped and genomic estimates of F as a function of Ne. All correlations between genomic estimates of F and Fped decrease as a function of Ne, but Froh is consistently correlated most highly with Fped.

Figure 7 Correlations between F and homozygous mutation load as a function of Ne. Froh correlates most highly with the homozygous mutation load at all population sizes, and this advantage increases at larger population sizes (where inbreeding becomes more ancient).

To predict the statistical power of using Froh to detect inbreeding depression in human studies, we also needed to estimate what the variance of Froh would be if derived genome-wide rather than across 100 Mb, as simulated in the current study. To do this, we compared the average variance of Froh across 10 replicates of 100 Mb of real SNP data (1.5 × 10⁻⁴) to the genome-wide variance of Froh in the real SNP data (1.79 × 10⁻⁵), an 8.45-fold drop in variance. We thus scaled the variances of Froh down 8.45-fold to account for the decreased variance that would be observed in Froh genome-wide. This led to predicted genome-wide variances (standard deviations) of Froh of 1.4 × 10⁻³ (0.037) for Ne = 100, 2.5 × 10⁻⁴ (0.016) for Ne = 1000, 2.63 × 10⁻⁵ (0.005) for Ne = 10,000, and 1.79 × 10⁻⁵ (0.004) for real SNP data.
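The scaling step is simple arithmetic, sketched below under the assumption, as the text describes, that the same 8.45-fold ratio applies at every population size. Taking square roots of the predicted genome-wide variances recovers the standard deviations in parentheses. Dividing the rounded 100-Mb real-data variance (1.5 × 10⁻⁴) by 8.45 gives about 1.78 × 10⁻⁵ rather than 1.79 × 10⁻⁵; the small gap presumably reflects rounding of the printed values. The names `FOLD` and `genome_wide_var` are hypothetical.

```python
from math import sqrt

FOLD = 8.45  # reported ratio of 100-Mb to genome-wide variance of Froh (real SNP data)


def genome_wide_var(var_100mb, fold=FOLD):
    """Scale a variance of Froh computed over 100 Mb down to a genome-wide value."""
    return var_100mb / fold


# Predicted genome-wide variances of Froh as reported in the text
predicted = {"Ne = 100": 1.4e-3, "Ne = 1000": 2.5e-4, "Ne = 10,000": 2.63e-5, "real SNP data": 1.79e-5}
for label, var in predicted.items():
    print(f"{label}: var = {var:.3g}, SD = {sqrt(var):.3f}")

print(genome_wide_var(1.5e-4))  # about 1.78e-05
```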

Figure 11 shows the statistical power, as a function of sample size, of a regression of an outcome variable on Froh, assuming a slope of −0.73 (solid lines) and −0.39 (dashed lines), a standard deviation of the outcome variable of 15, and standard deviations of Froh depending on the population size as detailed in the preceding paragraph. We assume that $b \sim t(n-2)$ with mean −0.73 or −0.39 and $\mathrm{var}(b) = \left[\mathrm{var}(\mathrm{IQ}) - b^2\,\mathrm{var}(F_{\mathrm{roh}})\right] / \left[\mathrm{var}(F_{\mathrm{roh}})\,(n-2)\right]$. Our results suggest that sample sizes of 700 are sufficient for achieving adequate (80%) statistical power when the rate of inbreeding is high (similar to that found in a randomly breeding population of size Ne = 100), such as could occur in samples selected on the basis of likely recent inbreeding or in small isolated populations. Our results also suggest that inbreeding depression effects can be detected using Froh in large, ostensibly outbred populations, such as those found in modern industrialized societies, but that large samples (on the order of 12,000–65,000, depending on effect size) are probably necessary to detect them reliably. It is possible that phenotypes more related to fitness than is IQ would show a larger inbreeding depression effect and thus would require smaller sample sizes than those predicted here to achieve adequate power.
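The power calculation can be sketched from the assumptions stated above: a slope b and var(b) = (var(IQ) − b² var(Froh))/(var(Froh)(n − 2)). The Python sketch below adds three assumptions of our own, not the text's: a two-sided test at α = 0.05, a normal approximation to the t distribution (adequate for the large n of interest), and F measured as a proportion, so that a loss of 0.73 IQ points per 0.01 increase in F enters as b = −73 per unit of F. It is an illustration of the calculation, not the authors' code, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def slope_se(b, var_f, var_y, n):
    """SE of the slope: sqrt((var(y) - b^2 var(F)) / (var(F) (n - 2)))."""
    return sqrt((var_y - b * b * var_f) / (var_f * (n - 2)))


def power(b, var_f, n, var_y=225.0, alpha=0.05):
    """Approximate two-sided power, treating b / SE(b) as normal with mean |b| / SE(b)."""
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = abs(b) / slope_se(b, var_f, var_y, n)
    return STD_NORMAL.cdf(delta - crit) + STD_NORMAL.cdf(-delta - crit)


def min_sample_size(b, var_f, target=0.8, **kw):
    """Smallest n (> 2) whose approximate power reaches the target."""
    lo = hi = 3
    while power(b, var_f, hi, **kw) < target:  # grow an upper bound
        hi *= 2
    while lo < hi:  # binary search; power increases with n
        mid = (lo + hi) // 2
        if power(b, var_f, mid, **kw) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Genome-wide variances of Froh from the text; slopes of -73 and -39 IQ points per unit F
for label, var_f in (("Ne = 10,000", 2.63e-5), ("real SNP data", 1.79e-5)):
    print(label, [min_sample_size(b, var_f) for b in (-73.0, -39.0)])
```

With these inputs the sketch returns roughly 12,600 and 44,000 for Ne = 10,000 and roughly 18,500 and 65,000 for the real-data variance, in line with the 12,000–65,000 range quoted above. The normal approximation slightly understates the required n at small sample sizes, where an exact noncentral t calculation would be preferable.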

Discussion

Hundreds of scientific investigations have been conducted on the effects of inbreeding since Darwin (1868, 1876) first studied the topic. The inbreeding coefficient F has traditionally been defined in terms of pedigree inbreeding (Fped). However, low variation in Fped seriously hampers the ability to detect inbreeding effects in samples that are not selected on the basis of known recent inbreeding. In essence, Fped is a genome-wide expectation for the percentage of the genome that is autozygous, but there is a great deal of variation in autozygosity around this expected percentage, caused by the finite number of recombination events per generation. The resulting identical haplotypes can persist in the population for many generations, coming together in offspring of distantly related individuals to create increased levels of homozygosity. The availability of genomic marker panels has made possible new genomic alternatives to Fped that can better detect such identical haplotypes and, therefore, even quite ancient inbreeding.

Figure 9 Variance of F (±1 SE) as a function of generations since population expansion. Whereas the variance in Fped drops immediately following a population expansion, the variance in genomic measures of F requires hundreds of generations to reach equilibrium levels.

Figure 8 Correlations between alternative thresholds of Froh and the homozygous mutation load as a function of Ne. Runs of homozygosity (ROHs) are defined as stretches of 0.5-Mb, 1.5-Mb, or 5-Mb homozygous SNPs. Long thresholds are optimal for detecting autozygosity in highly inbred populations, whereas shorter thresholds are optimal for detecting autozygosity in outbred populations.

In the last 10 years, a large number of human (reviewed in Ku et al. 2010) and animal (reviewed in Chapman et al. 2009) studies have investigated the relationship between genomic estimates of F and disease- or fitness-related traits. In general, animal studies have focused on excess homozygosity on a marker-by-marker basis (Fh), whereas human studies, which have the opportunity to use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even in cases where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression, and this advantage is especially pronounced following a population expansion such as has occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).

Nevertheless, the variance in Froh in large simulated (Ne = 10,000) and observed Caucasian SNP data sets is low, and because of this there is likely to be little power to detect inbreeding effects in unselected (with respect to recent inbreeding) samples unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ–inbreeding effects using Froh in unselected samples. Thus, current studies investigating the effects of Froh on human complex traits that have sample sizes <3000 and that have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.

Our findings suggest two strategies for achieving sufficient power in studying inbreeding depression of complex traits using a genomic estimate of F. The first is to conduct analyses on very large (e.g., n > 10,000) samples, such as those being put together by multisite consortia (International Schizophrenia Consortium 2009). The second is to conduct analyses on smaller samples (perhaps 1000–3000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to be a more powerful approach for detecting inbreeding depression than any other alternative.

While much of the focus in this article has been on understanding the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will manifest lower levels of variation in F than those reported here and will require commensurately larger sample sizes to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller sample sizes.

Figure 11 Estimated power to detect inbreeding effects on a human complex trait using Froh. Higher levels of real inbreeding (smaller Ne) lead to higher variance in Froh and thus greater statistical power to detect an inbreeding effect. Large (solid lines) and small (dashed lines) inbreeding effect sizes were derived from a review on the effects of consanguinity on IQ (see text). Arrows show predicted sample sizes required to achieve 80% power. When inbreeding is high (Ne = 100), sample sizes of 400 are adequate, but in outbred populations (Ne = 10,000 or real SNP data), sample sizes >20,000 may be required.

Figure 10 Correlations between F and homozygous mutation load as a function of generations since population expansion. The relationship between the homozygous mutation load and Froh increases the most quickly following a population expansion.


The simulation parameters explored in this article were chosen to mimic the human genome. Recombination rates and recombination hotspot rates can vary at least threefold across mammals (Jensen-Seaman et al. 2004; Kauppi et al. 2004). Nevertheless, the basic conclusions made here are not overly dependent on the specific parameters chosen and can readily be applied to animals with different recombination parameters. When distance is conceptualized in centimorgans rather than base pair units, different recombination rates lengthen or shorten the genome vis-à-vis our simulation. This would slightly lower or raise (respectively) the variance of Froh, as shown above when variation in Froh was extrapolated to its expected genome-wide value. Similarly, when genomic distance is measured in centimorgans, different recombination hotspot parameters change the variation in SNP density vis-à-vis our simulation. This should have minimal effect on the ROHs detected, and therefore on Froh, because even the shortest ROH encompasses 175 hotspots on average. Thus, while the specific values in our results depend on the specific simulation parameters used and are therefore most relevant to human populations, the qualitative conclusions regarding the advantages of Froh over the alternative estimates of F apply across a wide range of parameters and extend to nonhuman animals as well.

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives not only from its relevance to many theoretical concepts in population genetics but also from its evolutionary and practical applications. Evolutionists have used inbreeding as a way to gauge which traits are likely to have been under ancestral selection, given that traits most affected by it tend to be most related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Bittles and Black 2010b; Rudan et al. 2003a). Rates of recent inbreeding are not low across the world: for example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have detectable effects, and we have demonstrated the optimal way of identifying such ancient inbreeding. Investigations into ancient inbreeding effects should help investigators understand the evolutionary forces acting on the genes underlying trait variation and whether inbreeding represents an important risk factor in disease.
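The threshold Fped ≥ 0.016 for second-cousin or closer unions follows from Wright's (1922) path method, F = Σ (1/2)^(n1 + n2 + 1)(1 + F_A), summed over the common ancestors A of the two parents, where n1 and n2 count the generations from each parent back to A. A minimal sketch with exact fractions is shown below, assuming non-inbred common ancestors (F_A = 0) unless values are supplied; the function name is hypothetical.

```python
from fractions import Fraction


def wright_f(paths, f_ancestors=None):
    """Wright's path coefficient of inbreeding for the offspring of two parents.

    paths: one (n1, n2) pair per common ancestor, giving the number of
    generations from each parent back to that ancestor.
    f_ancestors: inbreeding coefficient of each common ancestor (default 0).
    """
    if f_ancestors is None:
        f_ancestors = [0] * len(paths)
    return sum(
        Fraction(1, 2) ** (n1 + n2 + 1) * (1 + Fraction(fa))
        for (n1, n2), fa in zip(paths, f_ancestors)
    )


first_cousins = wright_f([(2, 2), (2, 2)])   # two shared grandparents
second_cousins = wright_f([(3, 3), (3, 3)])  # two shared great-grandparents
print(first_cousins, float(first_cousins))    # 1/16 0.0625
print(second_cousins, float(second_cousins))  # 1/64 0.015625
```

Offspring of second cousins therefore have Fped = 1/64 ≈ 0.016, and any closer union yields a larger value.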

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting and finding optimal thresholds for the simulation, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of Genotypes and Phenotypes (dbGaP), found at http://www.ncbi.nlm.nih.gov/gap, through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study; M.C.K. conducted analyses; and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinformatics 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity–fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.


Enciso-Mora, V., F. J. Hosking, and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur. J. Hum. Genet. 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423.

Fisher, R. A., 1954 A fuller theory of "junctions" in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

Gibson, J., N. E. Morton, and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15: 789–795.

Gindilis, V. M., R. G. Gainullin, and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. (Camb.) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol. Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14: 528–538.

Kauppi, L., A. J. Jeffreys, and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo, and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum. Genet. 129: 1–15.

Kudo, A., K. Ito, and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances and school attendance in Shizuoka school-children. Jpn. J. Hum. Genet. 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J. Theor. Biol. 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II: Familial adenocarcinomatosis. Am. J. Med. Genet. 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum. Mutat. 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am. J. Hum. Genet. 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: What prevents the exponential increase of one's ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson, and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10001 Dalmatians": Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar, and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona, and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta, and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor: N. A. Rosenberg


GENETICS
Supporting Information
http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130922


population occurs because several samples had no sib–sib inbreeding, making log10(var(Fped due to sib–sib inbreeding)) = −∞. Such values were set to “missing,” biasing the remaining nonmissing values upward.

Our results show that very little variation in Fped was missed by ignoring inbreeding resulting from common ancestors more than five generations back (see Methods). This is despite the fact that quite long (e.g., 10 Mb) ROHs can be created from such inbreeding. Although we had no way of measuring the proportion of variation in genomic estimates of F that was due to each ancestral generation, our results below demonstrate that genomic estimates of F retain considerable variation in large populations after variation in Fped has diminished to ~0.

Variation and error in genomic and pedigree estimates of F

One important aim of the current project was to investigate the effects of population size on the variance of the four estimates of F. Variances were taken across estimates of F for each individual in samples of size 1000 drawn from the three population sizes (Ne = 100, 1000, and 10,000). Figure 4 shows the mean (±1 SE) of the log10 variances of F as a function of Ne. The variance decreases for all estimates of F as Ne increases, caused by the lower levels of ancient and recent inbreeding in larger populations. As expected (Figure 3), Fped shows the least variation at every population size because it does not capture the variation in F caused by the random nature of recombination and segregation. The two estimates of F based on SNP-by-SNP homozygosity (Fh and Falt) show the highest variation, while Froh is intermediate. Figure 4 also shows the variance values for the three genomic estimates of F in 10 replicates of 100 Mb of real SNP data from an unselected (outbred) Caucasian control sample. The variances of Fh and Falt are slightly higher, and the variance of Froh is slightly lower, in the real data than in the Ne = 10,000 simulation.

The prediction error variance (PEV) of the three genomic estimates of F is shown in Figure 5. PEV = var(F1 − F2)/2, where F1 is the estimate of F from a random half of SNPs and F2 is the estimate of F from the other half of SNPs (Powell et al. 2010; Yang et al. 2010a). PEV provides insight into the amount of error in the genomic estimates of F that exists due to the finite sampling of SNPs; as SNPs become more dense and linkage disequilibrium between markers increases (as occurs with larger Ne), the PEV should decrease. The PEV for all estimates of F is small (roughly 5% or less) compared to the variances of F shown in Figure 4. As expected, the PEV of every estimate of F also decreases with population size, but the PEV for Froh decreases the most rapidly. Error in Froh decreases more rapidly as a function of Ne because with dense SNP data (such as exist when Ne = 10,000), ROHs are made up of a large number of homozygous SNPs in a row (e.g., an ROH of 1.5 Mb contains 2000 SNPs in the Ne = 10,000 data), and such long sets of markers pick up the same sets of ROHs with very high precision.
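The split-half definition of PEV translates directly into code. In the sketch below, `estimate_f` is any function that turns one individual's genotypes into an estimate of F. `prop_homozygous` is a hypothetical placeholder (the plain fraction of homozygous SNPs), not the article's Fh, Falt, or Froh, whose definitions are given in Methods; the seeded random split of SNPs is also an assumption.

```python
import random
import statistics


def split_half_pev(genotypes, estimate_f, seed=0):
    """PEV = var(F1 - F2) / 2 from two random, disjoint halves of the SNPs.

    genotypes: list of per-individual genotype lists (allele counts 0/1/2),
               all over the same SNPs in the same order; needs >= 2 people.
    estimate_f: function mapping one genotype list to an F estimate.
    """
    n_snps = len(genotypes[0])
    idx = list(range(n_snps))
    random.Random(seed).shuffle(idx)          # random split of SNP indices
    half_a, half_b = idx[: n_snps // 2], idx[n_snps // 2:]
    diffs = []
    for g in genotypes:
        f1 = estimate_f([g[i] for i in half_a])
        f2 = estimate_f([g[i] for i in half_b])
        diffs.append(f1 - f2)
    return statistics.variance(diffs) / 2


def prop_homozygous(g):
    """Hypothetical stand-in estimator: fraction of homozygous SNPs."""
    return sum(1 for x in g if x != 1) / len(g)
```

Because the two halves share no SNPs, any disagreement between F1 and F2 reflects finite marker sampling rather than true differences in autozygosity, which is why denser SNP data drive the PEV down.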

Relationships between genomic estimates of F and Fped

Figure 6 shows the correlation coefficients between the three genomic estimates of F and Fped. In all cases, Froh correlates most highly with Fped, reflecting the fact that Froh tends to capture more recent inbreeding than the other two genomic estimates of F. The increasingly large standard errors of the correlations are due to the higher standard errors and lower variances of Fped at larger population sizes.

Finding the optimal measure for detecting inbreeding depression: Relationships between estimates of F and the homozygous mutation load

To understand which estimate of F is likely to be optimal for detecting inbreeding depression effects, we correlated each of the four estimates of F with the homozygous mutation load, which, as explained above, is the leading contender for why inbreeding depression occurs. Figure 7 shows these results. Fped is the worst predictor of the homozygous mutation load, and its disadvantage grows as the population size increases. This result is due to the low variance in Fped in randomly breeding populations and suggests that Fped is likely to be a useful estimate only in samples selected on the basis of recent consanguinity. Falt outperformed Fh because Falt gives more weight to rare variants and because Falt has lower error variance (Figure 5). As predicted, Froh was the estimate most strongly associated with the recessive mutation load at every population size, and this advantage grows as the population size increases (Figure 7).

Figure 3 Contribution to var(Fped) from previous generations. Most of the variance in Fped at all population sizes is attributable to recent inbreeding. For example, the variance in Fped due to spouses sharing common ancestors five generations in the past makes up only 0.2% of the total variance in Fped at all population sizes.

Detecting Ancient Inbreeding 243

The superiority of Froh in detecting the homozygous mutation load is even more stark if one considers the unique effects of each estimate of F after controlling for their correlations with Froh. After controlling for Froh, the mean partial correlations (across Ne) between the homozygous mutation load and Fh (rp = 0.07), Falt (rp = 0.15), and Fped (rp = 0.09) are all low, but the reverse is not true: partial correlations between the homozygous mutation load and Froh remain high after controlling for Fh (rp = 0.57), Falt (rp = 0.50), and Fped (rp = 0.68). This shows that Froh contains much additional information on the homozygous mutation load that is not captured by the other three estimates of F, whereas the other three estimates of F contain little additional information on the homozygous mutation load beyond what already exists in Froh. The superiority of Froh occurs because a given rare variant is typically homozygous only when the haplotypic segment on which it exists meets another IBD segment within an individual, creating a run of homozygosity. If inbreeding depression is caused by homozygosity at rare mutations, as recent evidence indicates, these results suggest that Froh is likely to be the optimal estimate for detecting it regardless of the level of inbreeding in the population.
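The article does not state how its partial correlations were computed. One standard way, sketched here as an assumption, is the first-order formula built from the three pairwise correlations. The correlations in the usage lines are hypothetical values chosen only to illustrate the asymmetry described above; they are not results from the article.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical correlations (illustration only, not from the article):
r_load_froh, r_load_fh, r_fh_froh = 0.8, 0.6, 0.75
print(partial_corr(r_load_fh, r_load_froh, r_fh_froh))  # load vs. Fh, given Froh
print(partial_corr(r_load_froh, r_load_fh, r_fh_froh))  # load vs. Froh, given Fh
```

With these made-up inputs the first partial correlation is essentially 0 and the second is about 0.66: when everything Fh knows about the load is already carried by Froh, controlling for Froh removes Fh's signal, but not the reverse.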

Figure 8 shows a comparison of the correlation coefficients between the homozygous mutation load and Froh calculated from short (0.5 Mb), moderate (1.5 Mb), and long (5 Mb) ROH thresholds as a function of Ne. Froh,short correlates poorly with the homozygous mutation load when Ne is small. In large populations, where inbreeding is more ancient, the situation reverses: Froh defined by short ROHs outperforms the other two Froh estimates. The improvement of Froh,short as a function of population size is due to two factors: (a) denser SNP data in the larger populations allow even short (0.5 Mb) ROHs to discriminate autozygosity accurately and (b) a higher ratio of short to long ROHs in large populations. Froh,long performs poorly when Ne = 10,000 due to low variance. Froh defined by the moderate (1.5 Mb) threshold performed intermediately at each population size, providing justification for using this threshold in the current study, as it allowed more meaningful comparisons across the three population sizes.
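To make the threshold comparison concrete, the sketch below computes Froh as the fraction of a chromosome covered by runs of homozygosity at least `min_len_bp` long. The ROH-calling rules used in the article are not given in this section, so the rules here are simplifying assumptions: a run is any unbroken stretch of homozygous SNP calls, its length is the distance from its first to its last SNP, and no tolerance for heterozygous or missing calls is applied. The function name is hypothetical.

```python
def froh(positions, is_hom, min_len_bp, genome_len_bp):
    """Fraction of the genome in ROHs of length >= min_len_bp (simplified).

    positions: sorted SNP base-pair positions on one chromosome.
    is_hom: parallel list of booleans, True where the individual is homozygous.
    """
    total = 0
    start = None
    for i, hom in enumerate(is_hom):
        if hom and start is None:
            start = i                        # a new run begins
        if start is not None and (not hom or i == len(is_hom) - 1):
            end = i if hom else i - 1        # last homozygous SNP of the run
            length = positions[end] - positions[start]
            if length >= min_len_bp:
                total += length
            start = None
    return total / genome_len_bp


# The three thresholds compared in Figure 8 (0.5, 1.5, and 5 Mb):
pos = [0, 1_000_000, 2_000_000, 3_000_000, 4_000_000]
hom = [True, True, True, False, True]
for threshold in (500_000, 1_500_000, 5_000_000):
    print(threshold, froh(pos, hom, threshold, 10_000_000))
```

For a whole genome, the covered lengths would be summed over chromosomes and divided by the total genome length. Lowering the threshold admits shorter, older shared segments, which is why the short threshold gains ground in larger populations.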

Changes in F following population expansion

We wanted to understand how recent population expansion, such as what occurred in human lineages over the last 10,000 years (350 generations), might alter our conclusions. Figure 9 shows the change in the mean log10(variance) of the four estimates of F as a function of generations since a 100-fold expansion in population size from Ne = 100 to Ne = 10,000. While the variance in Fped decreases immediately (a 97% drop) and remains unchanged thereafter, the three genomic estimates of F change much more gradually. The initial declines in the variances of the three genomic estimates of F are caused by the drop in recent inbreeding in larger populations. However, the declines in variances become much slower thereafter, reflecting the slow decay of linkage disequilibrium and the gradual increase in the number of new segregating variants following a population expansion. For example, after expanding from Ne = 100 to Ne = 10,000, the standard deviation of Froh decreases by 70%, from 0.115 to 0.035, in the first 100 generations but decreases only an additional 10% (to 0.021) over the next 300 generations and requires several thousand generations to reach its equilibrium value of 0.015. Similarly, starting at 0.142, the standard deviation of Falt decreases to 0.069 and 0.055 in generations 100 and 400, respectively. Our results should extrapolate to larger starting and ending population sizes (e.g., Ne = 10,000 to Ne = 1,000,000) and indicate that genomic estimates of F require a considerable time following expansion to reach the low variance values that would be predicted from the larger population sizes.

Figure 5 Prediction error variance (PEV) of genomic estimates of F as a function of Ne. PEV decreases as a function of population size for all genomic estimates of F but does so most rapidly for Froh.

Figure 4 Variance of F (±1 SE) as a function of Ne. The variance of Fped is the lowest at all population sizes, and the variance of Froh is intermediate. Comparing the variance of genomic measures of F in simulated data to the equivalent variances in real SNP data (from an outbred Caucasian sample) suggests that the effective population size of Caucasians is approximately 10,000 with respect to these statistics.

244 M C Keller P M Visscher and M E Goddard

Figure 10 shows the correlations between estimates of F and the homozygous mutation load following population expansion. This correlation drops considerably for all estimates of F following population expansion; it remains low for Fped, Fh, and Falt in the 400 generations following the expansion, whereas the correlation with Froh begins to increase after 50 generations. When Ne is small (i.e., 100), rare mutant alleles are tagged by long haplotypes as a result of relatively recent inbreeding. When the population size is increased, these long haplotypes are destroyed by recombination, and new ones are not created quickly because new inbreeding occurs slowly. Consequently, the correlation between F and homozygous mutation load declines, as did the variance of F (Figure 9). However, as the population evolves at size Ne = 10,000 for an increasing number of generations, new mutations arise that exist on only a single haplotype, and the pairings of these haplotypes, which create homozygosity at rare mutations, are increasingly detected by Froh and eventually by Fh and Falt as a new equilibrium is reached. These results provide further evidence that Froh is likely to be the optimal estimate for detecting inbreeding depression in populations, such as humans, that have expanded rapidly in population size.

Power to detect inbreeding effects

The power of detecting inbreeding depression from marker data depends in part on the accuracy of predicting autozygosity with markers, and our results suggest that Froh is optimal for detecting autozygosity arising from both ancient and recent inbreeding. However, power also depends on the variation in F as well as on the effect size and the sample size. When the level of inbreeding is low, such as in large randomly mating populations, the variance of Froh is commensurately low (e.g., the standard deviation of Froh = 0.015 when Ne = 10,000). The standard deviation of Froh in observed human SNP data is smaller yet: 0.012. Given such low variance in the predictor, we were interested in whether inbreeding depression is detectable using Froh in human samples and, if so, what sample sizes would be required, as a function of different levels of inbreeding (assayed using different effective population sizes), to achieve adequate power. To do this, we needed an estimate of the likely effect size of inbreeding depression on a complex trait in humans and an estimate of the variance of Froh scaled up to genome-wide levels.

Because the effects of inbreeding on IQ have been investigated more than on any other complex human trait, to our knowledge, we used a best estimate of the inbreeding effect on IQ as a guide to the likely inbreeding effect sizes among other human complex traits. Morton (1979) reviewed four large studies of cousin–cousin inbreeding and found that IQ decreased by 0.73 (Schull and Neel 1965) to 0.39 (Kudo et al. 1972) points for every 0.01 increase in Fped. Given the standard deviation of IQ (15 points), this translates to a decrease of 0.025–0.05 standard deviations per percentage of inbreeding.

Figure 6 Correlations between Fped and genomic estimates of F as a function of Ne. All correlations between genomic estimates of F and Fped decrease as a function of Ne, but Froh is consistently correlated most highly with Fped.

Figure 7 Correlations between F and homozygous mutation load as a function of Ne. Froh correlates most highly with the homozygous mutation load at all population sizes, and this advantage increases at larger population sizes (where inbreeding becomes more ancient).

To predict the statistical power of using Froh to detect inbreeding depression in human studies, we also needed to estimate what the variance of Froh would be if derived genome-wide rather than across 100 Mb as simulated in the current study. To do this, we compared the average variance of Froh across 10 replicates of 100 Mb of real SNP data (1.5 × 10^−4) to the genome-wide variance of Froh in the real SNP data (1.79 × 10^−5), an 8.45-fold drop in variance. We thus scaled the variances of Froh down 8.45-fold to account for the decreased variance that would be observed in Froh genome-wide. This led to predicted genome-wide variances (standard deviations) of Froh of 1.4 × 10^−3 (0.037) for Ne = 100, 2.5 × 10^−4 (0.016) for Ne = 1000, 2.63 × 10^−5 (0.005) for Ne = 10,000, and 1.79 × 10^−5 (0.004) for real SNP data.

Figure 11 shows the statistical power, as a function of sample size, of a regression of an outcome variable on Froh, assuming a slope of −0.73 (solid lines) and −0.39 (dashed lines), a standard deviation of the outcome variable of 15, and standard deviations of Froh depending on the population size as detailed in the preceding paragraph. We assume that b ∼ t(n − 2) with mean −0.73 or −0.39 and var(b) = [var(IQ) − b² var(Froh)] / [var(Froh)(n − 2)]. Our results suggest that sample sizes of 700 are sufficient for achieving adequate (80%) statistical power when the rate of inbreeding is high (similar to that found in a randomly breeding population of size Ne = 100), such as could occur in samples selected on the basis of likely recent inbreeding or in small isolated populations. Our results also suggest that inbreeding depression effects can be detected using Froh in large, ostensibly outbred populations, such as those found in modern industrialized societies, but that large samples (on the order of 12,000–65,000, depending on effect size) are probably necessary to detect them reliably. It is possible that phenotypes more related to fitness than is IQ would show a larger inbreeding depression effect and thus would require smaller sample sizes than those predicted here to achieve adequate power.
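The power calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the test is two-sided at α = 0.05, the t(n − 2) distribution of b/SE(b) is replaced by a normal approximation, and the function names are hypothetical. Because the quoted slopes derive from IQ changes of 0.73 and 0.39 points per 0.01 increase in F, the sketch expresses them per unit of F (−73 and −39 IQ points) so that they match variances of Froh stated as proportions, such as 1.79 × 10^−5; slopes of −0.73 and −0.39 per percentage point of F are equivalent.

```python
import math
from statistics import NormalDist


def power(n, b, var_froh, var_y=15.0 ** 2, alpha=0.05):
    """Approximate power to detect slope b of y on Froh with n individuals.

    var(b) = (var_y - b^2 var_froh) / (var_froh (n - 2)), as in the text;
    b / SE(b) is treated as normal (an approximation to t(n - 2)).
    """
    se = math.sqrt((var_y - b ** 2 * var_froh) / (var_froh * (n - 2)))
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    delta = abs(b) / se
    return std.cdf(delta - z_crit) + std.cdf(-delta - z_crit)


def n_for_power(b, var_froh, target=0.80, **kw):
    """Smallest n >= 3 with power >= target (doubling, then bisection)."""
    lo, hi = 3, 3
    while power(hi, b, var_froh, **kw) < target:
        hi *= 2
    while lo < hi:
        mid = (lo + hi) // 2
        if power(mid, b, var_froh, **kw) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Genome-wide variances of Froh from the text; slopes per unit of F.
for label, v in [("Ne = 10,000", 2.63e-5), ("real SNP data", 1.79e-5)]:
    for b in (-73.0, -39.0):
        print(label, b, n_for_power(b, v))
```

Under these assumptions the required sample sizes come out at roughly 12,600 and 44,000 for the Ne = 10,000 variance and roughly 18,500 and 65,000 for the real-data variance, bracketing the 12,000–65,000 range quoted above. Exact values from a noncentral t calculation would differ slightly.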

Discussion

Hundreds of scientific investigations have been conducted on the effects of inbreeding since Darwin (1868, 1876) first studied the topic. The inbreeding coefficient F has traditionally been defined in terms of pedigree inbreeding (Fped). However, low variation in Fped seriously hampers the ability to detect inbreeding effects in samples that are not selected on the basis of known recent inbreeding. In essence, Fped is a genome-wide expectation for the percentage of the genome that is autozygous, but there is a great deal of variation in autozygosity around this expected percentage, caused by the finite number of recombination events per generation. The resulting identical haplotypes can persist in the population for many generations, coming together in offspring of distantly related individuals to create increased levels of homozygosity. The availability of genomic marker panels has made possible new genomic alternatives to Fped that can better detect such identical haplotypes and therefore even quite ancient inbreeding.

In the last 10 years, a large number of human (reviewed in Ku et al. 2010) and animal (reviewed in Chapman et al. 2009) studies have investigated the relationship between genomic estimates of F and disease- or fitness-related traits. In general, animal studies have focused on excess homozygosity on a marker-by-marker basis (Fh), whereas human studies, which have the opportunity to use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).

Figure 9 Variance of F (±1 SE) as a function of generations since population expansion. Whereas Fped drops immediately following a population expansion, the variance in genomic measures of F requires hundreds of generations to reach equilibrium levels.

Figure 8 Correlations between alternative thresholds of Froh and the homozygous mutation load as a function of Ne. Runs of homozygosity (ROHs) are defined as stretches of 0.5-Mb, 1.5-Mb, or 5-Mb homozygous SNPs. Long thresholds are optimal for detecting autozygosity in highly inbred populations, whereas shorter thresholds are optimal for detecting autozygosity in outbred populations.

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even in cases where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression, and this advantage is especially pronounced following a population expansion such as has occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).

Nevertheless, the variance in Froh in large simulated (Ne = 10,000) and observed Caucasian SNP data sets is low, and because of this there is likely to be little power to detect inbreeding effects in unselected (with respect to recent inbreeding) samples unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ-inbreeding effects using Froh in unselected samples. Thus, current studies investigating the effects of Froh on human complex traits that have sample sizes of roughly 3000 or fewer and that have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.

Our findings suggest two strategies for achieving sufficient power in studying inbreeding depression of complex traits using a genomic estimate of F. The first is to conduct analyses on very large (e.g., n > 10,000) samples, such as those being put together by multisite consortia (International Schizophrenia Consortium 2009). The second is to conduct analyses on smaller samples (perhaps 1000–3000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to be a more powerful approach for detecting inbreeding depression than any other alternative.

While much of the focus in this article has been on understanding the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will manifest lower levels of variation in F than those reported here and will require commensurately larger sample sizes to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans, and these should show correspondingly more variation in F. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller sample sizes.

Figure 11 Estimated power to detect inbreeding effects on a human complex trait using Froh. Higher levels of real inbreeding (smaller Ne) lead to higher variance in Froh and thus greater statistical power to detect an inbreeding effect. Large (solid lines) and small (dashed lines) inbreeding effect sizes were derived from a review on the effects of consanguinity on IQ (see text). Arrows show predicted sample sizes required to achieve 80% power. When inbreeding is high (Ne = 100), sample sizes of 400 are adequate, but in outbred populations (Ne = 10,000 or real SNP data) sample sizes on the order of 20,000 may be required.

Figure 10 Correlations between F and homozygous mutation load as a function of generations since population expansion. The relationship between the homozygous mutation load and Froh increases the most quickly following a population expansion.


The simulation parameters explored in this article were chosen to mimic the human genome. Recombination rates and recombination hotspot rates can vary at least threefold across mammals (Jensen-Seaman et al. 2004; Kauppi et al. 2004). Nevertheless, the basic conclusions made here are not overly dependent on the specific parameters chosen and can readily be applied to animals with different recombination parameters. When distance is conceptualized in centimorgans rather than base-pair units, different recombination rates lengthen or shorten the genome vis-à-vis our simulation. This would slightly lower or raise (respectively) the variance of Froh, as shown above when variation in Froh was extrapolated to its expected genome-wide value. Similarly, when genomic distance is measured in centimorgans, different recombination hotspot parameters change the variation in SNP density vis-à-vis our simulation. This should have minimal effect on the ROHs detected, and therefore on Froh, because even the shortest ROH encompasses 175 hotspots on average. Thus, while the specific values in our results depend on the specific simulation parameters used and are therefore most relevant to human populations, the qualitative conclusions regarding the advantages of Froh over the alternative estimates of F apply across a wide range of parameters and extend to nonhuman animals as well.

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives in part from its relevance to many theoretical concepts in population genetics. Evolutionists have also used inbreeding as a way to gauge which traits are likely to have been under ancestral selection, given that traits most affected by it tend to be most related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Bittles and Black 2010b; Rudan et al. 2003a). Rates of recent inbreeding are not low across the world. For example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have detectable effects, and we have demonstrated the optimal way of identifying such ancient inbreeding. Investigations into ancient inbreeding effects should help investigators understand the evolutionary forces acting on the genes underlying trait variation and whether inbreeding represents an important risk factor in disease.

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting and finding optimal thresholds for the simulation, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of genotype and phenotype (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study. M.C.K. conducted analyses, and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinform. 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity-fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.


Enciso-Mora, V., F. J. Hosking, and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur. J. Hum. Genet. 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423.

Fisher, R. A., 1954 A fuller theory of "junctions" in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

Gibson, J., N. E. Morton, and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15: 789–795.

Gindilis, V. M., R. G. Gainullin, and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. (Camb.) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol. Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14: 528–538.

Kauppi, L., A. J. Jeffreys, and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo, and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum. Genet. 129: 1–15.

Kudo, A., K. Ito, and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances and school attendance in Shizuoka school-children. Jpn. J. Hum. Genet. 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J. Theor. Biol. 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II. Familial adenocarcinomatosis. Am. J. Med. Genet. 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum. Mutat. 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am. J. Hum. Genet. 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: What prevents the exponential increase of one's ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson, and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10001 Dalmatians": Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar, and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona, and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta, and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor: N. A. Rosenberg

Detecting Ancient Inbreeding 249

GENETICS Supporting Information

http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America. DOI: 10.1534/genetics.111.130922

correlations with Froh. After controlling for Froh, the mean partial correlations (across Ne) between the homozygous mutation load and Fh (rp = 0.07), Falt (rp = 0.15), and Fped (rp = 0.09) are all low. The reverse is not true: partial correlations between the homozygous mutation load and Froh remain high after controlling for Fh (rp = 0.57), Falt (rp = 0.50), and Fped (rp = 0.68). This shows that Froh contains much additional information on the homozygous mutation load that is not captured by the other three estimates of F, whereas the other three estimates of F contain little information on the homozygous mutation load beyond what already exists in Froh. Froh is superior because a given rare variant is typically homozygous only when the haplotypic segment on which it exists meets another IBD segment within an individual, creating a run of homozygosity. If inbreeding depression is caused by homozygosity at rare mutations, as recent evidence indicates, these results suggest that Froh is likely to be the optimal estimate for detecting it, regardless of the level of inbreeding in the population.
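The asymmetry described above can be illustrated with the standard first-order partial correlation formula. The article does not state how its partial correlations were computed. The sketch below is a minimal illustration that assumes this textbook formula, and the pairwise correlations it uses are hypothetical, not values from the simulations.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical pairwise correlations (NOT values reported in the article).
r_load_froh, r_load_fh, r_froh_fh = 0.80, 0.50, 0.60

# Load vs. Fh, controlling for Froh: small, because Froh already carries Fh's signal.
print(round(partial_corr(r_load_fh, r_load_froh, r_froh_fh), 3))  # 0.042
# Load vs. Froh, controlling for Fh: remains large.
print(round(partial_corr(r_load_froh, r_load_fh, r_froh_fh), 3))  # 0.722
```

With these illustrative numbers, conditioning on Froh removes almost all of Fh's association with the load. Conditioning on Fh leaves most of Froh's association intact, which is the same pattern the text reports.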

Figure 8 compares the correlation coefficients between the homozygous mutation load and Froh calculated from short (0.5 Mb), moderate (1.5 Mb), and long (5 Mb) ROH thresholds as a function of Ne. Frohshort correlates poorly with the homozygous mutation load when Ne is small. In large populations, where inbreeding is more ancient, the situation reverses: Froh defined by short ROHs outperforms the other two Froh estimates. The improvement of Frohshort as a function of population size is due to two factors: (a) denser SNP data in the larger populations allow even short (0.5 Mb) ROHs to discriminate autozygosity accurately, and (b) large populations have a higher ratio of short to long ROHs. Frohlong performs poorly when Ne = 10,000 because of its low variance. Froh defined by the moderate (1.5 Mb) threshold performed intermediately at each population size. This justifies using that threshold in the current study, because it allowed more meaningful comparisons across the three population sizes.
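The following sketch shows how the three thresholds change Froh for the same individual. It assumes the common definition of Froh (e.g., McQuillan et al. 2008): the summed length of ROHs at or above a minimum length, divided by the length of the region scanned (here a 100-Mb region, as in the simulations). The ROH lengths are hypothetical. Calling ROHs from SNP genotypes, for example with PLINK's `--homozyg` option, is not shown.

```python
def f_roh(roh_lengths_mb, region_length_mb, min_length_mb):
    """Fraction of the scanned region covered by ROHs at least min_length_mb long."""
    return sum(x for x in roh_lengths_mb if x >= min_length_mb) / region_length_mb


# Hypothetical ROH calls (Mb) for one individual over a 100-Mb region.
rohs = [0.6, 0.8, 1.2, 1.7, 2.4, 6.0]
for label, threshold in [("short", 0.5), ("moderate", 1.5), ("long", 5.0)]:
    print(label, round(f_roh(rohs, 100.0, threshold), 4))
# short 0.127, moderate 0.101, long 0.06
```

Longer thresholds discard the short segments that, according to the text, carry most of the signal of ancient inbreeding in large populations.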

Changes in F following population expansion

We wanted to understand how a recent population expansion, such as occurred in human lineages over the last 10,000 years (approximately 350 generations), might alter our conclusions. Figure 9 shows the change in the mean log10(variance) of the four estimates of F as a function of generations since a 100-fold expansion in population size from Ne = 100 to Ne = 10,000. The variance in Fped decreases immediately (a 97% drop) and remains unchanged thereafter. By contrast, the three genomic estimates of F change much more gradually. Their initial declines in variance are caused by the drop in recent inbreeding in larger populations. However, the declines become much slower thereafter, reflecting the slow decay of linkage disequilibrium and the gradual increase in the number of new segregating variants following a population expansion. For example, after expanding from Ne = 100 to Ne = 10,000, the standard deviation of Froh decreases by 70%, from 0.115 to 0.035, in the first 100 generations. It then decreases only an additional 10% (to 0.021) over the next 300 generations and requires several thousand generations to reach its equilibrium value of 0.015. Similarly, starting at 0.142, the standard deviation of Falt decreases to 0.069 and 0.055 in generations 100 and 400, respectively. Our results should extrapolate to larger starting and ending population sizes (e.g., Ne = 10,000 to Ne = 1,000,000). They indicate that genomic estimates of F require considerable time following expansion to reach the low variance values that would be predicted from the larger population sizes.

Figure 5 Prediction error variance (PEV) of genomic estimates of F as a function of Ne. PEV decreases as a function of population size for all genomic estimates of F but does so most rapidly for Froh.

Figure 4 Variance of F (±1 SE) as a function of Ne. The variance of Fped is the lowest at all population sizes, and the variance of Froh is intermediate. Comparing the variance of genomic measures of F in simulated data to the equivalent variances in real SNP data (from an outbred Caucasian sample) suggests that the effective population size of Caucasians is approximately 10,000 with respect to these statistics.

Figure 10 shows the correlations between estimates of F and the homozygous mutation load following population expansion. This correlation drops considerably for all estimates of F after the expansion. It remains low for Fped, Fh, and Falt throughout the 400 generations following the expansion, whereas the correlation with Froh begins to increase after 50 generations. When Ne is small (i.e., 100), rare mutant alleles are tagged by long haplotypes as a result of relatively recent inbreeding. When the population size increases, recombination destroys these long haplotypes, and new ones are not created quickly because new inbreeding occurs slowly. Consequently, the correlation between F and homozygous mutation load declines, as did the variance of F (Figure 9). However, as the population evolves at size Ne = 10,000 for an increasing number of generations, new mutations arise that exist on only a single haplotype. The pairings of these haplotypes create homozygosity at rare mutations, and these pairings are increasingly detected by Froh, and eventually by Fh and Falt, as a new equilibrium is reached. These results provide further evidence that Froh is likely to be the optimal estimate for detecting inbreeding depression in populations, such as humans, that have expanded rapidly in size.

Power to detect inbreeding effects

The power to detect inbreeding depression from marker data depends in part on how accurately markers predict autozygosity. Our results suggest that Froh is optimal for detecting autozygosity arising from both ancient and recent inbreeding. However, power also depends on the variation in F, as well as on the effect size and the sample size. When the level of inbreeding is low, as in large randomly mating populations, the variance of Froh is commensurately low (e.g., the standard deviation of Froh = 0.015 when Ne = 10,000). The standard deviation of Froh in observed human SNP data is smaller still, at 0.012. Given such low variance in the predictor, we wanted to know whether inbreeding depression is detectable using Froh in human samples. If so, we wanted to know what sample sizes would be required to achieve adequate power at different levels of inbreeding (assayed using different effective population sizes). To do this, we needed an estimate of the likely effect size of inbreeding depression on a complex trait in humans and an estimate of the variance of Froh scaled up to genome-wide levels.

To our knowledge, the effects of inbreeding on IQ have been investigated more than those on any other complex human trait. We therefore used a best estimate of the inbreeding effect on IQ as a guide to the likely inbreeding effect sizes for other human complex traits. Morton (1979) reviewed four large studies of cousin–cousin inbreeding and found that IQ decreased by 0.73 (Schull and Neel 1965) to 0.39 (Kudo et al. 1972) points for every 0.01 increase in Fped. Given the standard deviation of IQ (15 points), this translates to a decrease of 0.025–0.05 standard deviations per percentage of inbreeding.
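The unit conversions in the preceding paragraph can be written out explicitly. The sketch below converts an effect per 0.01 increase in F into an effect per unit of F and into trait standard deviations. The per-unit form is the scale on which a regression slope on F (a proportion from 0 to 1) would be reported. The helper names are illustrative only.

```python
IQ_SD = 15.0  # standard deviation of IQ, in points


def per_unit_f(points_per_percent: float) -> float:
    """Convert an effect per 0.01 increase in F to an effect per unit (0 to 1) of F."""
    return points_per_percent / 0.01


def sd_units(points_per_percent: float, trait_sd: float = IQ_SD) -> float:
    """Express an effect per 0.01 increase in F in trait standard deviations."""
    return points_per_percent / trait_sd


# IQ declines per 0.01 increase in Fped, as quoted from Morton (1979) in the text.
for source, points in [("Schull and Neel 1965", 0.73), ("Kudo et al. 1972", 0.39)]:
    print(f"{source}: {per_unit_f(points):.0f} points per unit F, "
          f"{sd_units(points):.3f} SD per 0.01 F")
```

The two outputs (0.049 and 0.026 SD per 0.01 F) reproduce the 0.025–0.05 range given above.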

Figure 6 Correlations between Fped and genomic estimates of F as a function of Ne. All correlations between genomic estimates of F and Fped decrease as a function of Ne, but Froh is consistently the estimate most highly correlated with Fped.

Figure 7 Correlations between F and homozygous mutation load as a function of Ne. Froh correlates most highly with the homozygous mutation load at all population sizes, and this advantage increases at larger population sizes (where inbreeding becomes more ancient).

To predict the statistical power of using Froh to detect inbreeding depression in human studies, we also needed to estimate the variance of Froh derived genome-wide rather than across the 100 Mb simulated in the current study. To do this, we compared the average variance of Froh across 10 replicates of 100 Mb of real SNP data (1.5 × 10⁻⁴) to the genome-wide variance of Froh in the real SNP data (1.79 × 10⁻⁵), an 8.45-fold drop in variance. We therefore scaled the variances of Froh down 8.45-fold to account for the lower variance that would be observed in Froh genome-wide. This led to predicted genome-wide variances (standard deviations) of Froh of 1.4 × 10⁻³ (0.037) for Ne = 100, 2.5 × 10⁻⁴ (0.016) for Ne = 1,000, 2.63 × 10⁻⁵ (0.005) for Ne = 10,000, and 1.79 × 10⁻⁵ (0.004) for real SNP data.
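The 8.45-fold rescaling described above amounts to a single division. The minimal sketch below applies it to two inputs quoted in the text: the 100-Mb SD of 0.015 for Ne = 10,000 and the 100-Mb variance of 1.5 × 10⁻⁴ in real SNP data. The function name is hypothetical.

```python
import math

FOLD_DROP = 8.45  # 100-Mb Froh variance / genome-wide Froh variance (real SNP data, from the text)


def genome_wide_variance(var_100mb: float, fold_drop: float = FOLD_DROP) -> float:
    """Scale a Froh variance measured over 100 Mb down to a predicted genome-wide value."""
    return var_100mb / fold_drop


# Inputs quoted in the text: SD(Froh) = 0.015 over 100 Mb when Ne = 10,000,
# and a 100-Mb variance of 1.5e-4 in the real SNP data.
inputs = {"Ne = 10,000": 0.015 ** 2, "real SNP data": 1.5e-4}
for label, var_100mb in inputs.items():
    v = genome_wide_variance(var_100mb)
    print(f"{label}: genome-wide variance {v:.2e}, SD {math.sqrt(v):.4f}")
```

The results (about 2.66 × 10⁻⁵ and 1.78 × 10⁻⁵) are close to, but not exactly equal to, the reported 2.63 × 10⁻⁵ and 1.79 × 10⁻⁵. The gap presumably reflects rounding of the published inputs.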

Figure 11 shows the statistical power, as a function of sample size, of a regression of an outcome variable on Froh. It assumes a slope of −0.73 (solid lines) or −0.39 (dashed lines) points per 0.01 increase in Froh (equivalently, −73 or −39 per unit of Froh), a standard deviation of the outcome variable of 15, and standard deviations of Froh that depend on population size, as detailed in the preceding paragraph. We assume that $\hat{b} \sim t(n-2)$, with mean equal to the assumed slope and $\mathrm{var}(\hat{b}) = [\mathrm{var}(\mathrm{IQ}) - b^{2}\,\mathrm{var}(F_{roh})] / [\mathrm{var}(F_{roh})\,(n-2)]$, where b and Froh are expressed in the same units. Our results suggest that sample sizes of 700 are sufficient to achieve adequate (80%) statistical power when the rate of inbreeding is high (similar to that in a randomly breeding population of size Ne = 100). Such rates could occur in samples selected on the basis of likely recent inbreeding or in small isolated populations. Our results also suggest that inbreeding depression effects can be detected using Froh in large, ostensibly outbred populations, such as those of modern industrialized societies. However, large samples (on the order of 12,000–65,000, depending on effect size) are probably necessary to detect them reliably. Phenotypes more closely related to fitness than IQ may show a larger inbreeding depression effect and thus would require smaller samples than those predicted here to achieve adequate power.
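The power calculation above can be sketched from the slope-variance formula. This is a minimal version that makes two assumptions not stated in this section. First, it uses a normal approximation to the t(n − 2) distribution, which is accurate for the large n of interest. Second, it uses a two-sided test at α = 0.05. Slopes are expressed per unit of Froh (−73 and −39), and the function names are illustrative.

```python
from statistics import NormalDist

VAR_IQ = 15.0 ** 2  # outcome variance (IQ standard deviation of 15 points)


def slope_se(b: float, var_froh: float, n: int, var_y: float = VAR_IQ) -> float:
    """Standard error of the slope, from var(b) = (var(y) - b^2 var(F)) / (var(F)(n - 2))."""
    return ((var_y - b ** 2 * var_froh) / (var_froh * (n - 2))) ** 0.5


def power(b: float, var_froh: float, n: int, alpha: float = 0.05) -> float:
    """Two-sided power, using a normal approximation to the t(n - 2) distribution."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(b) / slope_se(b, var_froh, n)
    # Probability that |b_hat| / se exceeds z_crit, summing both tails.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def required_n(b: float, var_froh: float, target: float = 0.80) -> int:
    """Smallest n reaching the target power: doubling search, then bisection."""
    lo, hi = 3, 4
    while power(b, var_froh, hi) < target:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power(b, var_froh, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Genome-wide Froh variances quoted in the text; slopes in IQ points per unit Froh.
scenarios = {"Ne = 100": 1.4e-3, "Ne = 1,000": 2.5e-4,
             "Ne = 10,000": 2.63e-5, "real SNP data": 1.79e-5}
for label, var_froh in scenarios.items():
    for b in (-73.0, -39.0):
        print(f"{label}, slope {b:.0f}: n = {required_n(b, var_froh)}")
```

Under these assumptions the sketch gives about 12,600 for the larger effect at the Ne = 10,000 variance and about 65,000 for the smaller effect at the real-data variance, matching the 12,000–65,000 range in the text. It is not guaranteed to reproduce every point in Figure 11.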

Discussion

Hundreds of scientific investigations have been conducted on the effects of inbreeding since Darwin (1868, 1876) first studied the topic. The inbreeding coefficient F has traditionally been defined in terms of pedigree inbreeding (Fped). However, low variation in Fped seriously hampers the ability to detect inbreeding effects in samples that are not selected on the basis of known recent inbreeding. In essence, Fped is a genome-wide expectation for the percentage of the genome that is autozygous. Because the number of recombination events per generation is finite, autozygosity varies considerably around this expected percentage. The resulting identical haplotypes can persist in the population for many generations and come together in the offspring of distantly related individuals, increasing their levels of homozygosity. Genomic marker panels have made possible new alternatives to Fped that can better detect such identical haplotypes and, therefore, even quite ancient inbreeding.

In the last 10 years, a large number of human (reviewed in Ku et al. 2010) and animal (reviewed in Chapman et al. 2009) studies have investigated the relationship between genomic estimates of F and disease- or fitness-related traits. In general, animal studies have focused on excess homozygosity on a marker-by-marker basis (Fh). Human studies, which can use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).

Figure 9 Variance of F (±1 SE) as a function of generations since population expansion. Fped drops immediately following a population expansion, whereas the variance in genomic measures of F requires hundreds of generations to reach equilibrium levels.

Figure 8 Correlations between alternative thresholds of Froh and the homozygous mutation load as a function of Ne. Runs of homozygosity (ROHs) are defined as stretches of 0.5-Mb, 1.5-Mb, or 5-Mb homozygous SNPs. Long thresholds are optimal for detecting autozygosity in highly inbred populations, whereas shorter thresholds are optimal for detecting autozygosity in outbred populations.

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression. This advantage is especially pronounced following a population expansion such as occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).

Nevertheless, the variance in Froh is low in both large simulated (Ne = 10,000) and observed Caucasian SNP data sets. As a result, there is likely to be little power to detect inbreeding effects in unselected (with respect to recent inbreeding) samples unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ–inbreeding effects using Froh in unselected samples. Thus, current studies of the effects of Froh on human complex traits that have sample sizes <3,000 and that have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1,000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.
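Why a significant result from a small study tends to overstate the effect can be shown by simulation. This is the generic "winner's curse" argument, not an analysis performed in the article. The sketch assumes normally distributed slope estimates, a true slope of −73 IQ points per unit Froh, the genome-wide real-data variance of Froh quoted in the text, and a hypothetical sample size of 1,000.

```python
import random
import statistics

B_TRUE = -73.0       # assumed true slope, IQ points per unit Froh
VAR_FROH = 1.79e-5   # genome-wide Froh variance in real SNP data (from the text)
VAR_IQ = 15.0 ** 2   # outcome variance
N = 1_000            # hypothetical small-study sample size
N_SIMS = 100_000     # number of simulated studies


def slope_se(b, var_froh, n, var_y=VAR_IQ):
    """Standard error of the slope, using the variance formula given in the text."""
    return ((var_y - b ** 2 * var_froh) / (var_froh * (n - 2))) ** 0.5


def significant_estimates(b, se, n_sims=N_SIMS, z_crit=1.96, seed=1):
    """Draw b_hat ~ Normal(b, se) for many studies; keep those with |b_hat / se| > z_crit."""
    rng = random.Random(seed)
    draws = [rng.gauss(b, se) for _ in range(n_sims)]
    return [x for x in draws if abs(x) / se > z_crit]


se = slope_se(B_TRUE, VAR_FROH, N)
sig = significant_estimates(B_TRUE, se)
same_sign = [x for x in sig if x < 0]  # significant and in the true (negative) direction
print(f"simulated power: {len(sig) / N_SIMS:.2f}")
print(f"mean significant estimate: {statistics.mean(same_sign):.0f} vs true {B_TRUE:.0f}")
```

Only about 1 study in 10 reaches significance under these assumptions. Among those that do, the average estimated slope is several times the true value.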

Our findings suggest two strategies for achieving sufficient power when studying inbreeding depression of complex traits with a genomic estimate of F. The first is to analyze very large samples (e.g., n > 10,000), such as those being assembled by multisite consortia (International Schizophrenia Consortium 2009). The second is to analyze smaller samples (perhaps 1,000–3,000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to detect inbreeding depression more powerfully than any alternative.

Although this article has focused mainly on the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will show less variation in F than reported here and will require commensurately larger samples to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller samples.

Figure 11 Estimated power to detect inbreeding effects on a human complex trait using Froh. Higher levels of real inbreeding (smaller Ne) lead to higher variance in Froh and thus greater statistical power to detect an inbreeding effect. Large (solid lines) and small (dashed lines) inbreeding effect sizes were derived from a review of the effects of consanguinity on IQ (see text). Arrows show the predicted sample sizes required to achieve 80% power. When inbreeding is high (Ne = 100), sample sizes of 400 are adequate, but in outbred populations (Ne = 10,000 or real SNP data), sample sizes of 20,000 may be required.

Figure 10 Correlations between F and homozygous mutation load as a function of generations since population expansion. The correlation between the homozygous mutation load and Froh increases most quickly following a population expansion.

The simulation parameters explored in this article were chosen to mimic the human genome. Recombination rates and recombination hotspot rates can vary at least threefold across mammals (Jensen-Seaman et al. 2004; Kauppi et al. 2004). Nevertheless, our basic conclusions do not depend heavily on the specific parameters chosen and can readily be applied to animals with different recombination parameters. When distance is measured in centimorgans rather than base pairs, different recombination rates lengthen or shorten the genome relative to our simulation. A longer or shorter genome would slightly lower or raise (respectively) the variance of Froh, as shown above when variation in Froh was extrapolated to its expected genome-wide value. Similarly, when genomic distance is measured in centimorgans, different recombination hotspot parameters change the variation in SNP density relative to our simulation. This should have minimal effect on the ROHs detected, and therefore on Froh, because even the shortest ROH encompasses 175 hotspots on average. Thus, the specific values in our results depend on the simulation parameters used and are most relevant to human populations. The qualitative conclusions about the advantages of Froh over the alternative estimates of F, however, hold across a wide range of parameters and extend to nonhuman animals as well.

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives partly from its relevance to many theoretical concepts in population genetics. In addition, evolutionists have used inbreeding to gauge which traits are likely to have been under ancestral selection, because the traits most affected by it tend to be those most related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Rudan et al. 2003a; Bittles and Black 2010b). Rates of recent inbreeding are not low worldwide. For example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have detectable effects, and we have demonstrated the optimal way of identifying such ancient inbreeding. Investigations into ancient inbreeding effects should help clarify the evolutionary forces acting on the genes underlying trait variation and whether inbreeding is an important risk factor for disease.
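The Fped ≥ 0.016 cutoff for second-cousin or closer marriages follows from Wright's (1922) path formula. The sketch below applies the formula in the form F = Σ (1/2)^(n1 + n2 + 1) (1 + F_A). Here n1 and n2 are the numbers of generations from each parent back to a shared ancestor A. It assumes simple cousin pedigrees with two shared ancestors who are not themselves inbred (F_A = 0).

```python
from fractions import Fraction


def wright_f(paths):
    """Pedigree inbreeding coefficient from Wright's path formula.

    Each path is (n1, n2, f_a): generations from each parent back to a shared
    ancestor A, and A's own inbreeding coefficient.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) * (1 + f_a) for n1, n2, f_a in paths)


# Offspring of cousin marriages: two shared, non-inbred ancestors in each case.
first_cousins = [(2, 2, 0), (2, 2, 0)]
second_cousins = [(3, 3, 0), (3, 3, 0)]
print(wright_f(first_cousins), float(wright_f(first_cousins)))    # 1/16 0.0625
print(wright_f(second_cousins), float(wright_f(second_cousins)))  # 1/64 0.015625
```

Second cousins give F = 1/64 ≈ 0.016, and every closer relationship gives a larger value. That is why "second cousin or closer" corresponds to Fped ≥ 0.016.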

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting and finding optimal thresholds for the simulation, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of Genotypes and Phenotypes (dbGaP), found at http://www.ncbi.nlm.nih.gov/gap, through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study; M.C.K. conducted analyses; and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinform. 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity-fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.

Enciso-Mora, V., F. J. Hosking, and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur. J. Hum. Genet. 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423.

Fisher, R. A., 1954 A fuller theory of "junctions" in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

Gibson, J., N. E. Morton, and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15: 789–795.

Gindilis, V. M., R. G. Gainullin, and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. (Camb.) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol. Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14: 528–538.

Kauppi, L., A. J. Jeffreys, and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo, and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum. Genet. 129: 1–15.

Kudo, A., K. Ito, and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances and school attendance in Shizuoka school-children. Jpn. J. Hum. Genet. 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J. Theor. Biol. 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II. Familial adenocarcinomatosis. Am. J. Med. Genet. 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum. Mutat. 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am. J. Hum. Genet. 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: what prevents the exponential increase of one's ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson, and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10001 Dalmatians": Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar, and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona, and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta, and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor N A Rosenberg

Detecting Ancient Inbreeding 249

GENETICSSupporting Information

httpwwwgeneticsorgcontentsuppl20110625genetics111130922DC1

Quantification of Inbreeding Due to DistantAncestors and Its Detection Using Dense Single

Nucleotide Polymorphism DataMatthew C Keller Peter M Visscher and Michael E Goddard

Copyright copy 2011 by the Genetics Society of AmericaDOI 101534genetics111130922

Page 9: Home | Genetics - Quantification of Inbreeding Due …INVESTIGATION Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism

ending population sizes (eg Ne frac14 10000 to Ne frac141000000) and indicate that genomic estimates of F requirea considerable time following expansion to reach the lowvariance values that would be predicted from the largerpopulation sizes

Figure 10 shows the correlations between estimates of Fand the homozygous mutation load following populationexpansion This correlation drops considerably for all esti-mates of F following population expansion although itremains low for Fped Fh and Falt in the 400 generationsfollowing the expansion whereas the correlation with Frohbegins to increase after 50 generations When Ne is small(ie 100) rare mutant alleles are tagged by long haplotypesas a result of relatively recent inbreeding When the popu-lation size is increased these long haplotypes are destroyedby recombination and new ones are not created quickly be-cause new inbreeding occurs slowly Consequently the cor-relation between F and homozygous mutation load declinesas did the variance of F (Figure 9) However as the popu-lation evolves at size Ne frac14 10000 for an increasing numberof generations new mutations arise that exist on only a sin-gle haplotype and the pairings of these creating homozy-gosity at rare mutations are increasingly detected by Frohand eventually by Fh and Falt as a new equilibrium isreached These results provide further evidence that Froh islikely to be the optimal estimate for detecting inbreedingdepression in populations such as humans that have ex-panded rapidly in population size

Power to detect inbreeding effects

The power of detecting inbreeding depression from markerdata depends in part on the accuracy of predicting auto-

zygosity with markers and our results suggest that Froh isoptimal for detecting autozygosity arising from both ancientand recent inbreeding However power also depends on thevariation in F as well as on the effect size and the samplesize When the level of inbreeding is low such as in largerandomly mating populations the variance of Froh is com-mensurately low (eg the standard deviation of Froh =0015 when Ne frac14 10000) The standard deviation of Frohin observed human SNP data is smaller yet 0012 Givensuch low variance in the predictor we were interested inwhether inbreeding depression is detectable using Froh inhuman samples and if so what sample sizes would be re-quired as a function of different levels of inbreeding(assayed using different effective population sizes) toachieve adequate power To do this we needed an estimateof the likely effect size of inbreeding depression on a com-plex trait in humans and an estimate of the variance of Frohscaled up to genome-wide levels

Because, to our knowledge, the effects of inbreeding on IQ have been investigated more than those on any other complex human trait, we used a best estimate of the inbreeding effect on IQ as a guide to the likely size of inbreeding effects on other human complex traits. Morton (1979) reviewed four large studies of cousin–cousin inbreeding and found that IQ decreased by 0.73 (Schull and Neel 1965) to 0.39 (Kudo et al. 1972) points for every 0.01 increase in Fped. Given the standard deviation of IQ (15 points), this translates to a decrease of 0.025–0.05 standard deviations per percentage point of inbreeding.
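The conversion from "IQ points lost per 0.01 of Fped" into a regression slope per unit of F and into standard-deviation units is simple arithmetic. The short sketch below makes it explicit; the function and variable names are illustrative only, and the only inputs are the two published losses and the IQ standard deviation of 15 points.

```python
SD_IQ = 15.0  # standard deviation of IQ, in points

def inbreeding_effect(points_per_001):
    """Convert an IQ loss per 0.01 increase in Fped into
    (slope in IQ points per unit of F, loss in IQ SDs per 0.01 of F)."""
    slope = -points_per_001 / 0.01      # e.g. 0.73 points per 0.01 -> -73 per unit F
    sd_units = points_per_001 / SD_IQ   # standardized loss per percentage point of F
    return slope, sd_units

for label, loss in [("Schull and Neel 1965", 0.73), ("Kudo et al. 1972", 0.39)]:
    slope, sd_units = inbreeding_effect(loss)
    print(f"{label}: slope {slope:.0f} points per unit F, {sd_units:.3f} SD per 0.01 F")
```

The slopes obtained this way, −73 and −39 IQ points per unit of F, are the values used in the power analysis of this section.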

Figure 6 Correlations between Fped and genomic estimates of F as a function of Ne. All correlations between genomic estimates of F and Fped decrease as a function of Ne, but Froh is consistently correlated most highly with Fped.

Figure 7 Correlations between F and homozygous mutation load as a function of Ne. Froh correlates most highly with the homozygous mutation load at all population sizes, and this advantage increases at larger population sizes (where inbreeding becomes more ancient).

Detecting Ancient Inbreeding 245

To predict the statistical power of using Froh to detect inbreeding depression in human studies, we also needed to estimate what the variance of Froh would be if derived genome-wide rather than across the 100 Mb simulated in the current study. To do this, we compared the average variance of Froh across 10 replicates of 100 Mb of real SNP data (1.5 × 10⁻⁴) to the genome-wide variance of Froh in the real SNP data (1.79 × 10⁻⁵), an 8.45-fold drop in variance. We thus scaled the variances of Froh down 8.45-fold to account for the decreased variance that would be observed in Froh genome-wide. This led to predicted genome-wide variances (standard deviations) of Froh of 1.4 × 10⁻³ (0.037) for Ne = 100, 2.5 × 10⁻⁴ (0.016) for Ne = 1000, 2.63 × 10⁻⁵ (0.005) for Ne = 10,000, and 1.79 × 10⁻⁵ (0.004) for the real SNP data.
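The scaling step can be sketched as follows. This is a minimal illustration, assuming only the reported 8.45-fold factor; the 100-Mb variance for Ne = 10,000 is back-computed from the rounded standard deviation of 0.015 quoted earlier in this section, so the result matches the reported 2.63 × 10⁻⁵ only up to rounding. (Likewise, the ratio of the two rounded variances quoted, 1.5 × 10⁻⁴ / 1.79 × 10⁻⁵ ≈ 8.4, is slightly below 8.45, presumably because 1.5 × 10⁻⁴ is itself rounded.) The function name is hypothetical.

```python
from math import sqrt

FOLD_DROP = 8.45  # reported ratio: 100-Mb variance of Froh / genome-wide variance (real SNP data)

def genome_wide_froh(var_100mb, fold_drop=FOLD_DROP):
    """Scale a 100-Mb variance of Froh to a predicted genome-wide (variance, SD)."""
    var_gw = var_100mb / fold_drop
    return var_gw, sqrt(var_gw)

# Ne = 10,000: the SD of Froh over 100 Mb is quoted as 0.015
var_gw, sd_gw = genome_wide_froh(0.015 ** 2)
print(f"{var_gw:.2e} {sd_gw:.4f}")  # prints 2.66e-05 0.0052

# The SDs given in parentheses are the square roots of the predicted variances
for label, v in [("Ne = 100", 1.4e-3), ("Ne = 1000", 2.5e-4),
                 ("Ne = 10,000", 2.63e-5), ("real SNP data", 1.79e-5)]:
    print(f"{label}: variance {v:.2e}, SD {sqrt(v):.3f}")
```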

Figure 11 shows the statistical power, as a function of sample size, of a regression of an outcome variable on Froh, assuming a slope of −73 (solid lines) or −39 (dashed lines) IQ points per unit of F (i.e., 0.73 or 0.39 points per 0.01 increase in F), a standard deviation of the outcome variable of 15, and standard deviations of Froh depending on the population size, as detailed in the preceding paragraph. We assume that the estimated slope $\hat b$ follows a $t$ distribution with $n-2$ degrees of freedom, with mean $b = -73$ or $-39$ and variance

$$\operatorname{var}(\hat b) = \frac{\operatorname{var}(\mathrm{IQ}) - b^{2}\,\operatorname{var}(F_{roh})}{\operatorname{var}(F_{roh})\,(n-2)}.$$

Our results suggest that sample sizes of 700 are sufficient for achieving adequate (80%) statistical power when the rate of inbreeding is high (similar to that found in a randomly breeding population of size Ne = 100), such as could occur in samples selected on the basis of likely recent inbreeding or in small isolated populations. Our results also suggest that inbreeding depression effects can be detected using Froh in large, ostensibly outbred populations, such as those found in modern industrialized societies, but that large samples (on the order of 12,000–65,000, depending on effect size) are probably necessary to detect them reliably. It is possible that phenotypes more closely related to fitness than IQ would show larger inbreeding depression effects and thus would require smaller sample sizes than those predicted here to achieve adequate power.
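A power calculation of this kind can be sketched with the slope-variance formula above. This is a minimal sketch, not the code behind Figure 11: it replaces the t distribution with a standard normal (reasonable for the large n of interest), assumes a two-sided test at α = 0.05, and uses hypothetical function names. With the genome-wide variances of Froh given earlier, it returns roughly 12,600 for Ne = 10,000 with a slope of −73 and roughly 65,000 for the real SNP data with a slope of −39, in line with the 12,000–65,000 range above; it is not guaranteed to reproduce every point of the figure.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal, used here in place of the t(n - 2) distribution

def slope_se(b, var_f, var_y, n):
    """SE of the slope from var(b_hat) = (var(y) - b^2 var(F)) / (var(F) (n - 2))."""
    return sqrt((var_y - b * b * var_f) / (var_f * (n - 2)))

def power(b, var_f, var_y, n, alpha=0.05):
    """Approximate two-sided power of the test of zero slope at sample size n."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = abs(b) / slope_se(b, var_f, var_y, n)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

def required_n(b, var_f, var_y, target=0.80, alpha=0.05):
    """Smallest n >= 3 with power >= target; power rises with n, so bisect."""
    lo, hi = 3, 4
    while power(b, var_f, var_y, hi, alpha) < target:
        hi *= 2  # grow the bracket until the target power is reached
    while lo < hi:
        mid = (lo + hi) // 2
        if power(b, var_f, var_y, mid, alpha) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

VAR_IQ = 15.0 ** 2
print(required_n(-73, 2.63e-5, VAR_IQ))  # Ne = 10,000, larger effect
print(required_n(-39, 1.79e-5, VAR_IQ))  # real SNP data, smaller effect
```

Because var(Froh) enters the denominator of the slope variance, a smaller genome-wide variance of Froh (larger Ne) translates almost proportionally into a larger required sample size.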

Discussion

Hundreds of scientific investigations have been conducted on the effects of inbreeding since Darwin (1868, 1876) first studied the topic. The inbreeding coefficient F has traditionally been defined in terms of pedigree inbreeding (Fped). However, low variation in Fped seriously hampers the ability to detect inbreeding effects in samples that are not selected on the basis of known recent inbreeding. In essence, Fped is a genome-wide expectation for the percentage of the genome that is autozygous, but there is a great deal of variation in autozygosity around this expected percentage, caused by the finite number of recombination events per generation. The resulting identical haplotypes can persist in the population for many generations, coming together in offspring of distantly related individuals to create increased levels of homozygosity. The availability of genomic marker panels has made possible new genomic alternatives to Fped that can better detect such identical haplotypes and, therefore, even quite ancient inbreeding.
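The point that realized autozygosity scatters around its pedigree expectation can be illustrated with a small Monte Carlo sketch. It is not the Fregene-based simulation used in this article; it assumes a single hypothetical chromosome of 3.5 Morgans, crossovers that follow a Poisson process with no interference, non-inbred founders, and offspring of a first-cousin mating (Fped = 1/16). Each founder haplotype gets a unique label, segments are tracked through every meiosis, and a stretch of the child's chromosome counts as autozygous when both of its haplotypes carry the same founder label.

```python
import random
from statistics import mean, stdev

def founder(label, length):
    """A non-inbred founder haplotype: one segment carrying a unique label."""
    return [(0.0, length, label)]

def meiosis(hap_a, hap_b, length, rng):
    """One gamete from a parent with haplotypes hap_a and hap_b.
    Crossovers follow a Poisson process, rate 1 per Morgan (no interference)."""
    points = []
    x = rng.expovariate(1.0)
    while x < length:
        points.append(x)
        x += rng.expovariate(1.0)
    bounds = [0.0] + points + [length]
    order = (hap_a, hap_b) if rng.random() < 0.5 else (hap_b, hap_a)
    gamete = []
    for k in range(len(bounds) - 1):
        lo, hi = bounds[k], bounds[k + 1]
        for start, end, label in order[k % 2]:  # alternate parental haplotypes
            if start < hi and end > lo:         # segment overlaps this interval
                gamete.append((max(start, lo), min(end, hi), label))
    return gamete

def autozygous_fraction(hap_a, hap_b, length):
    """Fraction of the chromosome where both haplotypes carry the same founder label."""
    shared = 0.0
    for s1, e1, l1 in hap_a:
        for s2, e2, l2 in hap_b:
            if l1 == l2:
                shared += max(0.0, min(e1, e2) - max(s1, s2))
    return shared / length

def first_cousin_offspring_f(length, rng):
    """Realized autozygosity of one child of first cousins (expectation 1/16)."""
    def child(parent1, parent2):
        return (meiosis(*parent1, length, rng), meiosis(*parent2, length, rng))
    g1 = (founder(0, length), founder(1, length))  # shared grandparents
    g2 = (founder(2, length), founder(3, length))
    c1, c2 = child(g1, g2), child(g1, g2)          # full siblings
    u1 = (founder(4, length), founder(5, length))  # unrelated spouses
    u2 = (founder(6, length), founder(7, length))
    p1, p2 = child(c1, u1), child(c2, u2)          # first cousins
    x = child(p1, p2)
    return autozygous_fraction(x[0], x[1], length)

rng = random.Random(2011)
fs = [first_cousin_offspring_f(3.5, rng) for _ in range(3000)]
print(f"Fped = {1/16:.4f}; mean realized F = {mean(fs):.4f}; SD = {stdev(fs):.4f}")
```

Every simulated child has the same Fped, yet realized autozygosity varies widely from child to child; lengthening the chromosome (more recombination) shrinks that spread.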

Figure 9 Variance of F (±1 SE) as a function of generations since population expansion. Whereas the variance of Fped drops immediately following a population expansion, the variance in genomic measures of F requires hundreds of generations to reach equilibrium levels.

Figure 8 Correlations between alternative thresholds of Froh and the homozygous mutation load as a function of Ne. Runs of homozygosity (ROHs) are defined as stretches of 0.5-Mb, 1.5-Mb, or 5-Mb homozygous SNPs. Long thresholds are optimal for detecting autozygosity in highly inbred populations, whereas shorter thresholds are optimal for detecting autozygosity in outbred populations.

246 M. C. Keller, P. M. Visscher, and M. E. Goddard

In the last 10 years, a large number of human (reviewed in Ku et al. 2010) and animal (reviewed in Chapman et al. 2009) studies have investigated the relationship between genomic estimates of F and disease- or fitness-related traits. In general, animal studies have focused on excess homozygosity on a marker-by-marker basis (Fh), whereas human studies, which have the opportunity to use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).
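A minimal sketch of how Froh can be computed from homozygous SNP calls: Froh is taken here as the fraction of the genome covered by runs of consecutive homozygous SNPs whose length reaches a threshold (such as the 0.5-, 1.5-, or 5-Mb thresholds of Figure 8). The sketch makes simplifying assumptions of its own: no heterozygous or missing calls are tolerated within a run, and run length is measured from the first to the last SNP of the run. Real ROH-calling software typically applies additional rules. The function `froh` and its arguments are hypothetical.

```python
from typing import Optional, Sequence

def froh(positions_bp: Sequence[int], is_hom: Sequence[bool],
         min_len_bp: int, genome_len_bp: int) -> float:
    """Fraction of the genome covered by runs of consecutive homozygous SNPs
    that span at least min_len_bp (simplified: no het or missing calls in a run)."""
    total = 0
    start: Optional[int] = None
    last = len(is_hom) - 1
    for i, hom in enumerate(is_hom):
        if hom and start is None:
            start = i                      # a run of homozygous SNPs begins
        if start is not None and (not hom or i == last):
            end = i if hom else i - 1      # index of the last homozygous SNP in the run
            span = positions_bp[end] - positions_bp[start]
            if span >= min_len_bp:         # keep only runs reaching the threshold
                total += span
            start = None
    return total / genome_len_bp

# Toy example: 20 SNPs spaced 100 kb apart, one heterozygous call at the 9th SNP
positions = [i * 100_000 for i in range(20)]
is_hom = [True] * 8 + [False] + [True] * 11
print(froh(positions, is_hom, 500_000, 2_000_000))    # both runs count
print(froh(positions, is_hom, 1_500_000, 2_000_000))  # neither run is long enough
```

By contrast, a marker-by-marker estimate such as Fh uses each SNP's homozygosity on its own, without regard to whether neighboring SNPs form a long homozygous haplotype.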

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even in cases where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression, and this advantage is especially pronounced following a population expansion such as has occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).

Nevertheless, the variance in Froh in large simulated (Ne = 10,000) and observed Caucasian SNP data sets is low, and because of this there is likely to be little power to detect inbreeding effects in samples that are unselected with respect to recent inbreeding unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ–inbreeding effects using Froh in unselected samples. Thus, current studies investigating the effects of Froh on human complex traits that have sample sizes < 3000 and that have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.
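Why significant results from small samples tend to overstate the effect can be made concrete with the same slope-variance formula used for the power analysis. This sketch assumes the genome-wide variance of Froh in the real SNP data (1.79 × 10⁻⁵), an outcome SD of 15, n = 1000, a two-sided test at α = 0.05, and a normal approximation to the t distribution; the function name is hypothetical. Under these assumptions an estimated slope must exceed roughly 220 IQ points per unit of F to reach significance, about 3 to 6 times the assumed true slopes of −73 and −39, so any estimate that clears the significance bar is necessarily inflated.

```python
from math import sqrt
from statistics import NormalDist

def min_significant_slope(var_f, var_y, n, alpha=0.05):
    """Smallest |b_hat| that is significant (two-sided) when the true slope is 0.

    Uses var(b_hat) = (var(y) - b^2 var(F)) / (var(F) (n - 2)) with b = 0,
    and a normal approximation to t(n - 2).
    """
    se = sqrt(var_y / (var_f * (n - 2)))
    return NormalDist().inv_cdf(1 - alpha / 2) * se

threshold = min_significant_slope(1.79e-5, 15.0 ** 2, 1000)
print(f"|b_hat| must exceed about {threshold:.0f} IQ points per unit F")
print(f"that is {threshold / 73:.1f}x to {threshold / 39:.1f}x the assumed slopes")
```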

Our findings suggest two strategies for achieving sufficient power in studying inbreeding depression of complex traits using a genomic estimate of F. The first is to conduct analyses on very large (e.g., n > 10,000) samples, such as those being assembled by multisite consortia (International Schizophrenia Consortium 2009). The second is to conduct analyses on smaller samples (perhaps 1000–3000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to be a more powerful approach for detecting inbreeding depression than the alternative estimates of F considered here.

While much of the focus in this article has been on understanding the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will manifest lower levels of variation in F than those reported here and will require commensurately larger sample sizes to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans and so should show more variation in F. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller sample sizes.

Figure 11 Estimated power to detect inbreeding effects on a human complex trait using Froh. Higher levels of real inbreeding (smaller Ne) lead to higher variance in Froh and thus greater statistical power to detect an inbreeding effect. Large (solid lines) and small (dashed lines) inbreeding effect sizes were derived from a review of the effects of consanguinity on IQ (see text). Arrows show predicted sample sizes required to achieve 80% power. When inbreeding is high (Ne = 100), sample sizes of 400 are adequate, but in outbred populations (Ne = 10,000 or real SNP data), sample sizes of 20,000 may be required.

Figure 10 Correlations between F and homozygous mutation load as a function of generations since population expansion. The relationship between the homozygous mutation load and Froh increases the most quickly following a population expansion.


The simulation parameters explored in this article were chosen to mimic the human genome. Recombination rates and recombination hotspot rates can vary at least threefold across mammals (Jensen-Seaman et al. 2004; Kauppi et al. 2004). Nevertheless, the basic conclusions made here are not overly dependent on the specific parameters chosen and can readily be applied to animals with different recombination parameters. When distance is conceptualized in centimorgans rather than base-pair units, different recombination rates lengthen or shorten the genome vis-à-vis our simulation. This would slightly lower or raise (respectively) the variance of Froh, as shown above when variation in Froh was extrapolated to its expected genome-wide value. Similarly, when genomic distance is measured in centimorgans, different recombination hotspot parameters change the variation in SNP density vis-à-vis our simulation. This should have minimal effect on the ROHs detected, and therefore on Froh, because even the shortest ROH encompasses 175 hotspots on average. Thus, while the specific values in our results depend on the specific simulation parameters used and are therefore most relevant to human populations, the qualitative conclusions regarding the advantages of Froh over the alternative estimates of F apply across a wide range of parameters and extend to nonhuman animals as well.

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives in part from its relevance to many theoretical concepts in population genetics. In addition, evolutionists have used inbreeding as a way to gauge which traits are likely to have been under ancestral selection, given that the traits most affected by it tend to be those most related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Bittles and Black 2010b; Rudan et al. 2003a). Nor are rates of recent inbreeding low across the world. For example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have effects detectable in sufficiently large samples, and we have demonstrated the optimal way of identifying such ancient inbreeding. Investigations into ancient inbreeding effects should help investigators understand the evolutionary forces acting on the genes underlying trait variation and whether inbreeding represents an important risk factor in disease.
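The threshold of Fped ≥ 0.016 for second-cousin or closer marriages follows from the standard path-counting form of Wright's (1922) coefficient, F = Σ (1/2)^(n1 + n2 + 1) (1 + F_A), summed over common ancestors, where n1 and n2 count generations from each parent up to the common ancestor A. The sketch below evaluates it with exact fractions, assuming the common ancestors are themselves non-inbred (F_A = 0); the function name is illustrative.

```python
from fractions import Fraction

def wright_f(paths):
    """Path-counting inbreeding coefficient of an individual.

    paths: one (n1, n2, f_ancestor) tuple per common ancestor, where n1 and n2
    count generations from each parent up to that ancestor.
    F = sum over ancestors of (1/2)^(n1 + n2 + 1) * (1 + F_A)
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) * (1 + f_a) for n1, n2, f_a in paths)

# Parents share one non-inbred ancestral couple (two common ancestors)
first_cousins = wright_f([(2, 2, 0), (2, 2, 0)])   # shared grandparents
second_cousins = wright_f([(3, 3, 0), (3, 3, 0)])  # shared great-grandparents
print(first_cousins, second_cousins, float(second_cousins))  # 1/16 1/64 0.015625
```

The offspring of second cousins thus have Fped = 1/64 ≈ 0.016, and any closer relationship gives a larger value.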

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting and finding optimal thresholds for the simulation, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study. M.C.K. conducted analyses, and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinform. 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity-fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.


Enciso-Mora, V., F. J. Hosking, and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur. J. Hum. Genet. 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423.

Fisher, R. A., 1954 A fuller theory of "junctions" in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

Gibson, J., N. E. Morton, and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15: 789–795.

Gindilis, V. M., R. G. Gainullin, and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. (Camb.) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol. Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14: 528–538.

Kauppi, L., A. J. Jeffreys, and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo, and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum. Genet. 129: 1–15.

Kudo, A., K. Ito, and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances and school attendance in Shizuoka school-children. Jpn. J. Hum. Genet. 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J. Theor. Biol. 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II. Familial adenocarcinomatosis. Am. J. Med. Genet. 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum. Mutat. 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am. J. Hum. Genet. 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: What prevents the exponential increase of one's ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson, and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10001 Dalmatians": Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar, and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona, and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta, and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor: N. A. Rosenberg


GENETICS
Supporting Information

http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130922

Page 10: Home | Genetics - Quantification of Inbreeding Due …INVESTIGATION Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism

to estimate what the variance of Froh would be if derivedgenome-wide rather than across 100 Mb as simulated in thecurrent study To do this we compared the average varianceof Froh across 10 replicates of 100 Mb of real SNP data(15 middot 1024) to the genome-wide variance of Froh in the realSNP data (179 middot 1025) an 845-fold drop in variance Wethus scaled the variances of Froh down 845-fold to accountfor the decreased variance that would be observed in Frohgenome-wide This led to predicted genome-wide variances(standard deviations) of Froh of 14 middot 1023 (0037) for Ne frac14100 25 middot 1024 (0016) for Ne frac14 1000 263 middot 1025 (0005)for Ne frac14 10000 and 179 middot 1025 (0004) for real SNP data

Figure 11 shows the statistical power as a function ofsample size of a regression of Froh on an outcome variableassuming a slope of 2073 (solid lines) and 2039 (dashedlines) a standard deviation of the outcome variable of 15and standard deviations of Froh depending on the popula-tion size as detailed in the preceding paragraph Weassume that be tethn22THORN with mean 2073 or 2039 andvarethbTHORN frac14 ethvarethIQTHORN2b2ethvarethFrohTHORNTHORNTHORN=varethFrohTHORNethn22THORN Our resultssuggest that sample sizes of 700 are sufficient for achiev-ing adequate (80) statistical power when the rate of in-breeding is high (similar to that found in a randomlybreeding population of size Ne frac14 100) such as could occurin samples selected on the basis of likely recent inbreedingor in small isolated populations Our results also suggestthat inbreeding depression effects can be detected using Frohin large ostensibly outbred populations such as those foundin modern industrialized societies but that large samples(on the order of 12000ndash65000 depending on effect size)

are probably necessary to detect them reliably It is possiblethat phenotypes more related to fitness than is IQ wouldshow a larger inbreeding depression effect and thus wouldrequire smaller sample sizes than those predicted here toachieve adequate power

Discussion

Hundreds of scientific investigations have been conductedon the effects of inbreeding since Darwin (1868 1876) firststudied the topic The inbreeding coefficient F has tradi-tionally been defined in terms of pedigree inbreeding (Fped)However low variation in Fped seriously hampers the abilityto detect inbreeding effects in samples that are not selectedon the basis of known recent inbreeding In essence Fped isa genome-wide expectation for the percentage of the ge-nome that is autozygous but there is a great deal of varia-tion in autozygosity around this expected percentage causedby the finite number of recombination events per genera-tion The resulting identical haplotypes can persist in thepopulation for many generations coming together in off-spring of distantly related individuals to create increasedlevels of homozygosity The availability of genomic markerpanels has made possible new genomic alternatives to Fpedthat can better detect such identical haplotypes and there-fore even quite ancient inbreeding

In the last 10 years a large number of human (reviewedin Ku et al 2010) and animal (reviewed in Chapman et al2009) studies have investigated the relationship betweengenomic estimates of F and disease- or fitness-related traitsIn general animal studies have focused on excess homozy-gosity on a marker-by-marker basis (Fh) whereas human

Figure 9 Variance of F (61 SE) as a function of generations since pop-ulation expansion Whereas Fped drops immediately following a popula-tion expansion the variance in genomic measures of F requires hundredsof generations to reach equilibrium levels

Figure 8 Correlations between alternative thresholds of Froh and thehomozygous mutation load as a function of Ne Runs of homozygosity(ROHs) are defined as stretches of 05-Mb 15-Mb or 5-Mb homozygousSNPs Long thresholds are optimal for detecting autozygosity in highlyinbred populations whereas shorter thresholds are optimal for detectingautozygosity in outbred populations

246 M C Keller P M Visscher and M E Goddard

studies which have the opportunity to use denser SNP plat-forms have focused increasingly but not exclusively on runsof homozygosity (Froh)

In this article we show that Froh is preferable to Fped andto marker-by-marker estimates of F (Falt and Fh) for detect-ing both recent and ancient inbreeding even in cases wherethe level of inbreeding is likely to be high Froh correlatesmost highly with the homozygous mutation load the puta-tive causal mechanism underlying inbreeding depressionand this advantage is especially pronounced following a pop-ulation expansion such as has occurred in recent humanhistory Moreover Froh has low prediction error varianceespecially when SNP density is high Our findings provideempirical justification to the growing literature using Froh tostudy complex traits in humans (Lencz et al 2007 Nallset al 2009 Spain et al 2009 Vine et al 2009 Enciso-Moraet al 2010 Hosking et al 2010 Yang et al 2010b)

Nevertheless the variance in Froh in large simulated(Ne frac14 10000) and observed Caucasian SNP data sets islow and because of this there is likely to be little powerto detect inbreeding effects in unselected (with respect torecent inbreeding) samples unless samples sizes are largeWe estimate that sample sizes between 12000 and 65000would be required to regularly detect previously reportedIQ-inbreeding effects using Froh in unselected samples Thuscurrent studies investigating the effects of Froh on humancomplex traits that have samples sizes 3000 and that havefailed to find significant inbreeding effects (Nalls et al 2009Spain et al 2009 Vine et al 2009 Enciso-Mora et al 2010Hosking et al 2010) are likely to be underpowered Further-more small studies (eg n 1000) that do find significant

inbreeding depression effects using Froh (eg Lencz et al2007) may greatly overestimate the size of the effects

Our findings suggest two strategies for achieving suffi-cient power in studying inbreeding depression of complextraits using a genomic estimate of F The first is to conductanalyses on very large (eg n 10000) samples such asthose being put together by multisite consortia (Interna-tional Schizophrenia Consortium 2009) The second is toconduct analyses on smaller samples (perhaps 1000ndash3000) from populations where the variation in inbreedingis likely to be high such as in population isolates (Rudanet al 2009) or in cultures where close inbreeding is common(Bittles and Black 2010a) In either case Froh is likely to bea more powerful approach for detecting inbreeding depres-sion than any other alternative

While much of the focus in this article has been onunderstanding the behavior of estimates of F in human pop-ulations our general results should apply to nonhuman an-imal populations as well Animals with larger effectivepopulation sizes than humans will manifest lower levels ofvariation in F than those reported here and will requirecommensurately larger sample sizes to detect inbreedingeffects in nonselected samples On the other hand manyspecies have smaller effective population sizes than humansMoreover fitness traits in other species may show largerinbreeding depression effects than the human example ex-plored here and may be detectable with smaller samplesizes

Figure 11 Estimated power to detect inbreeding effects on a humancomplex trait using Froh Higher levels of real inbreeding (smaller Ne) leadto higher variance in Froh and thus greater statistical power to detect aninbreeding effect Large (solid lines) and small (dashed lines) inbreedingeffect sizes were derived from a review on the effects of consanguinity onIQ (see text) Arrows show predicted sample sizes required to achieve80 power When inbreeding is high (Ne frac14 100) sample sizes of400 are adequate but in outbred populations (Ne frac14 10000 or realSNP data) samples sizes 20000 may be required

Figure 10 Correlations between F and homozygous mutation load asa function of generations since population expansion The relationshipbetween the homozygous mutation load and Froh increases the mostquickly following a population expansion

Detecting Ancient Inbreeding 247

The simulation parameters explored in this article werechosen to mimic the human genome Recombination ratesand recombination hotspot rates can vary at least threefoldacross mammals (Jensen-Seaman et al 2004 Kauppi et al2004) Nevertheless the basic conclusions made here arenot overly dependent on the specific parameters chosenand can readily be applied to animals with different recom-bination parameters When distance is conceptualized incentimorgans rather than base pair units different recombi-nation rates lengthen or shorten the genome vis-agrave-vis oursimulation This would slightly lower or raise (respectively)the variance of Froh as shown above when variation in Frohwas extrapolated to its expected genome-wide value Simi-larly when genomic distance is measured in centimorgansdifferent recombination hotspot parameters change the var-iation in SNP density vis-agrave-vis our simulation This shouldhave minimal effect on the ROHs detected and therefore onFroh because even the shortest ROH encompasses 175hotspots on average Thus while the specific values in ourresults depend on the specific simulation parameters usedand are therefore most relevant to human populations thequalitative conclusions regarding the advantages of Froh overthe alternative estimates of F apply across a wide range ofparameters and extend to nonhuman animals as well

Inbreeding has had a central place in the field of population genetics since its inception. Its importance derives not only from its relevance to many theoretical concepts in population genetics but also from its evolutionary and practical implications. Evolutionists have used inbreeding as a way to gauge which traits are likely to have been under ancestral selection, given that the traits most affected by it tend to be the most closely related to fitness (Roff 1997). At a more practical level, inbreeding can have important health consequences in human populations (Bittles and Black 2010b; Rudan et al. 2003a). Recent inbreeding is also far from rare worldwide: for example, progeny from second-cousin or closer marriages (Fped ≥ 0.016) are estimated to account for 10.4% of the global human population. In this article, we have shown that rates of inbreeding due to distant common ancestors in large outbred populations are high enough to have detectable effects, and we have identified Froh as the most effective of the examined estimators for detecting such ancient inbreeding. Investigations into ancient inbreeding effects should help investigators understand the evolutionary forces acting on the genes underlying trait variation and whether inbreeding represents an important risk factor in disease.
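The Fped ≥ 0.016 threshold corresponds to the offspring of second cousins. As a worked example using Wright's (1922) path-counting formula, each path through a common ancestor A contributes (1/2)^(n1 + n2 + 1)(1 + F_A), where n1 and n2 are the numbers of generations from each parent up to A. Assuming the common ancestors are not themselves inbred (F_A = 0), second cousins share two great-grandparents, each three generations above each parent; first cousins, shown for comparison, share two grandparents.

```latex
F = \sum_{A} \left(\tfrac{1}{2}\right)^{n_1 + n_2 + 1} (1 + F_A)

\text{Second cousins } (n_1 = n_2 = 3):\quad
F = 2 \times \left(\tfrac{1}{2}\right)^{7} = \tfrac{2}{128} = \tfrac{1}{64} \approx 0.0156

\text{First cousins } (n_1 = n_2 = 2):\quad
F = 2 \times \left(\tfrac{1}{2}\right)^{5} = \tfrac{2}{32} = \tfrac{1}{16} = 0.0625
```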

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting and finding optimal thresholds for the simulation, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study. M.C.K. conducted analyses, and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O'Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinformatics 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity-fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.


Enciso-Mora, V., F. J. Hosking, and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur. J. Hum. Genet. 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423.

Fisher, R. A., 1954 A fuller theory of "junctions" in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

Gibson, J., N. E. Morton, and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15: 789–795.

Gindilis, V. M., R. G. Gainullin, and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. (Camb.) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol. Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14: 528–538.

Kauppi, L., A. J. Jeffreys, and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo, and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum. Genet. 129: 1–15.

Kudo, A., K. Ito, and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances, and school attendance in Shizuoka school-children. Jpn. J. Hum. Genet. 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J. Theor. Biol. 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II. Familial adenocarcinomatosis. Am. J. Med. Genet. 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum. Mutat. 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am. J. Hum. Genet. 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

O'Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: what prevents the exponential increase of one's ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson, and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 "10001 Dalmatians": Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar, and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona, and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta, and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor: N. A. Rosenberg


GENETICS Supporting Information

http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130922


studies, which have the opportunity to use denser SNP platforms, have focused increasingly, but not exclusively, on runs of homozygosity (Froh).

In this article, we show that Froh is preferable to Fped and to marker-by-marker estimates of F (Falt and Fh) for detecting both recent and ancient inbreeding, even in cases where the level of inbreeding is likely to be high. Froh correlates most highly with the homozygous mutation load, the putative causal mechanism underlying inbreeding depression, and this advantage is especially pronounced following a population expansion such as has occurred in recent human history. Moreover, Froh has low prediction error variance, especially when SNP density is high. Our findings provide empirical justification for the growing literature using Froh to study complex traits in humans (Lencz et al. 2007; Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010; Yang et al. 2010b).
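Froh is conventionally computed as the proportion of the autosomal genome lying in ROHs (McQuillan et al. 2008). The sketch below illustrates that definition in its simplest form; it is a toy, not the ROH-calling procedure or thresholds used in this study. It treats an ROH as an unbroken stretch of at least `min_snps` consecutive homozygous calls (genotypes coded 0/1/2 as allele counts) and divides the summed ROH length by a user-supplied genome length. The function names, the `min_snps` default, and the example data are hypothetical. Dedicated tools such as PLINK (Purcell et al. 2007) instead scan sliding windows that tolerate occasional heterozygous or missing calls and apply minimum length and density criteria.

```python
from typing import List, Optional, Sequence, Tuple


def find_rohs(positions: Sequence[int], genotypes: Sequence[int],
              min_snps: int = 5) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for each maximal run of >= min_snps consecutive
    homozygous calls (0 or 2). Hypothetical helper: any other value (heterozygous
    or missing) breaks the run, unlike real windowed ROH callers."""
    runs: List[Tuple[int, int]] = []
    run_start: Optional[int] = None  # index of the first SNP in the current run
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if run_start is None:
                run_start = i
        else:
            # A non-homozygous call ends the current run; keep it if long enough.
            if run_start is not None and i - run_start >= min_snps:
                runs.append((positions[run_start], positions[i - 1]))
            run_start = None
    # Close a run that extends to the last SNP.
    n = len(genotypes)
    if run_start is not None and n - run_start >= min_snps:
        runs.append((positions[run_start], positions[n - 1]))
    return runs


def f_roh(runs: List[Tuple[int, int]], genome_length_bp: int) -> float:
    """Froh = summed ROH length / length of the genome covered by SNPs."""
    return sum(end - start for start, end in runs) / genome_length_bp


# Toy example: 20 SNPs spaced 100 bp apart (hypothetical data).
positions = list(range(0, 2000, 100))
genotypes = [0, 2, 2, 0, 0, 2, 1, 0, 0, 2, 1, 2, 2, 0, 0, 2, 0, 2, 2, 0]
rohs = find_rohs(positions, genotypes, min_snps=5)
print(rohs, f_roh(rohs, genome_length_bp=2000))
```

Here the three-SNP homozygous stretch in the middle falls below `min_snps` and is ignored, which is the basic reason ROH-based estimates emphasize long, shared-ancestry haplotypes over isolated homozygous markers.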

Nevertheless, the variance in Froh in large simulated (Ne = 10,000) and observed Caucasian SNP data sets is low, and because of this there is likely to be little power to detect inbreeding effects in unselected (with respect to recent inbreeding) samples unless sample sizes are large. We estimate that sample sizes between 12,000 and 65,000 would be required to regularly detect previously reported IQ–inbreeding effects using Froh in unselected samples. Thus, current studies investigating the effects of Froh on human complex traits that have sample sizes <3000 and that have failed to find significant inbreeding effects (Nalls et al. 2009; Spain et al. 2009; Vine et al. 2009; Enciso-Mora et al. 2010; Hosking et al. 2010) are likely to be underpowered. Furthermore, small studies (e.g., n < 1000) that do find significant inbreeding depression effects using Froh (e.g., Lencz et al. 2007) may greatly overestimate the size of the effects.
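The dependence of sample size on the spread of Froh can be made concrete with a standard normal-approximation power calculation for a regression of a standardized trait on Froh. This is a sketch under stated assumptions, not the power model behind Figure 11: it assumes a linear effect `beta` (trait SDs per unit of F), a two-sided test at level `alpha`, and that Froh explains a negligible share of trait variance, so the required sample size is approximately n ≈ (z_(1−α/2) + z_power)² / (β² σ_F²). The effect size in the usage lines is hypothetical; only the Froh SD of 0.5% at Ne = 10,000 comes from this article.

```python
import math
from statistics import NormalDist


def required_n(beta: float, sd_f: float, alpha: float = 0.05,
               power: float = 0.80) -> int:
    """Approximate sample size to detect a linear effect of F on a standardized
    trait. Assumes residual SD ~ 1 because F explains almost no trait variance."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 / (beta ** 2 * sd_f ** 2))


def power_at(n: int, beta: float, sd_f: float, alpha: float = 0.05) -> float:
    """Approximate power of the two-sided test, ignoring the negligible far tail."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    expected_z = math.sqrt(n) * abs(beta) * sd_f  # mean of the test statistic
    return z.cdf(expected_z - z_alpha)


# Hypothetical effect: trait mean 0.3 SD lower at F = 1/16 (first-cousin offspring).
beta = -0.3 / 0.0625
n_outbred = required_n(beta, sd_f=0.005)  # SD of Froh ~0.5% (Ne = 10,000)
n_isolate = required_n(beta, sd_f=0.02)   # hypothetical isolate with 4x the SD
print(n_outbred, n_isolate)
```

With these inputs the sketch returns roughly 13,600 for the outbred case and roughly 850 for the hypothetical isolate. Because n scales with 1/σ_F², quadrupling the SD of Froh cuts the required sample by a factor of 16, which is why populations with high variation in inbreeding are attractive for such studies.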

Our findings suggest two strategies for achieving sufficient power in studying inbreeding depression of complex traits using a genomic estimate of F. The first is to conduct analyses on very large (e.g., n > 10,000) samples, such as those being assembled by multisite consortia (International Schizophrenia Consortium 2009). The second is to conduct analyses on smaller samples (perhaps 1000–3000) from populations where the variation in inbreeding is likely to be high, such as population isolates (Rudan et al. 2009) or cultures where close inbreeding is common (Bittles and Black 2010a). In either case, Froh is likely to be a more powerful approach for detecting inbreeding depression than any of the alternatives.

While much of the focus in this article has been on understanding the behavior of estimates of F in human populations, our general results should apply to nonhuman animal populations as well. Animals with larger effective population sizes than humans will manifest lower levels of variation in F than those reported here and will require commensurately larger sample sizes to detect inbreeding effects in nonselected samples. On the other hand, many species have smaller effective population sizes than humans and should therefore show greater variation in F, making inbreeding effects easier to detect. Moreover, fitness traits in other species may show larger inbreeding depression effects than the human example explored here and may be detectable with smaller sample sizes.

Figure 11 Estimated power to detect inbreeding effects on a humancomplex trait using Froh Higher levels of real inbreeding (smaller Ne) leadto higher variance in Froh and thus greater statistical power to detect aninbreeding effect Large (solid lines) and small (dashed lines) inbreedingeffect sizes were derived from a review on the effects of consanguinity onIQ (see text) Arrows show predicted sample sizes required to achieve80 power When inbreeding is high (Ne frac14 100) sample sizes of400 are adequate but in outbred populations (Ne frac14 10000 or realSNP data) samples sizes 20000 may be required

Figure 10 Correlations between F and homozygous mutation load asa function of generations since population expansion The relationshipbetween the homozygous mutation load and Froh increases the mostquickly following a population expansion

Detecting Ancient Inbreeding 247

The simulation parameters explored in this article werechosen to mimic the human genome Recombination ratesand recombination hotspot rates can vary at least threefoldacross mammals (Jensen-Seaman et al 2004 Kauppi et al2004) Nevertheless the basic conclusions made here arenot overly dependent on the specific parameters chosenand can readily be applied to animals with different recom-bination parameters When distance is conceptualized incentimorgans rather than base pair units different recombi-nation rates lengthen or shorten the genome vis-agrave-vis oursimulation This would slightly lower or raise (respectively)the variance of Froh as shown above when variation in Frohwas extrapolated to its expected genome-wide value Simi-larly when genomic distance is measured in centimorgansdifferent recombination hotspot parameters change the var-iation in SNP density vis-agrave-vis our simulation This shouldhave minimal effect on the ROHs detected and therefore onFroh because even the shortest ROH encompasses 175hotspots on average Thus while the specific values in ourresults depend on the specific simulation parameters usedand are therefore most relevant to human populations thequalitative conclusions regarding the advantages of Froh overthe alternative estimates of F apply across a wide range ofparameters and extend to nonhuman animals as well

Inbreeding has had a central place in the field of populationgenetics since its inception Its importance derives not onlyfrom its relevance to many theoretical concepts in populationgenetics Evolutionists have used inbreeding as a way togauge which traits are likely to have been under ancestralselection given that traits most affected by it tend to be mostrelated to fitness (Roff 1997) At a more practical level in-breeding can have important health consequences in humanpopulations (Bittles and Black 2010b Rudan et al 2003a)Rates of recent inbreeding are not low across the world Forexample progeny from second-cousin or closer marriages(Fped 0016) are estimated to account for 104 of theglobal human population In this article we have shownthat rates of inbreeding due to distant common ancestorsin large outbred populations are high enough to have de-tectable effects and we have demonstrated the optimal wayof identifying such ancient inbreeding Investigations intoancient inbreeding effects should help investigators under-stand the evolutionary forces acting on the genes underlyingtrait variation and whether inbreeding represents an impor-tant risk factor in disease

Acknowledgments

The authors thank Matthew C Jones for his statisticaladvice Daniel P Howrigan for help in conducting andfinding optimal thresholds for the simulation and Tony Sunfor modifying the Fregene program The data sets used forthe analyses described in this article were obtained from thedatabase of genotype and phenotype (dbGaP) found athttpwwwncbinlmnihgovgap through dbGaP accessionnos phs000021v3p2 and phs000167v1p1 Samples and

associated phenotype data for the genome-wide associationof schizophrenia study were provided by the Molecular Ge-netics of Schizophrenia Collaboration [Principal Investiga-tor P V Gejman Evanston Northwestern Healthcare andNorthwestern University Evanston IL] MCK PMV andMEG conceived and designed this study MCK conductedanalyses and MCK PMV and MEG wrote the articleThe authors declare that no competing interests exist Thisstudy was supported by a grant from the National Institutesof Health and the National Institutes of Mental Health(MH085812) (to MCK) PMV and MEG acknowledgefunding from the Australian National Health and MedicalResearch Council (grants 613672 and 613601) and the Aus-tralian Research Council (grants DP0770096 and DP1093900)The funders had no role in study design data collection andanalysis decision to publish or preparation of the manuscript

Literature Cited

Abaskuliev A A and G V Skoblo 1975 Inbreeding endogamyand exogamy among relatives of schizophrenia patients Genetika11 145ndash148

Afzal M 1988 Consequences of consanguinity on cognitive be-havior Behav Genet 18 583ndash594

Bittles A H and M L Black 2010a Evolution in health andmedicine Sackler colloquium consanguinity human evolutionand complex diseases Proc Natl Acad Sci USA 107(Suppl 1)1779ndash1786

Bittles A H and M L Black 2010b The impact of consanguinityon neonatal and infant health Early Hum Dev 86 1779ndash1786

Brown D E 1991 Human Universals McGraw-Hill New YorkBulayeva K B S M Leal T A Pavlova R M Kurbanov S J Glatt

et al 2005 Mapping genes of complex psychiatric diseases inDaghestan genetic isolates Am J Med Genet Part B Neuro-psych Genet 132 76ndash84

Carothers A D I Rudan I Kolcic O Polasek C Hayward et al2006 Estimating human inbreeding coefficients comparisonof genealogical and marker heterozygosity approaches AnnHum Genet 70 666ndash676

Chadeau-Hyam M C J Hoggart P F OrsquoReilly J C Whittaker MDe Iorio et al 2008 Fregene simulation of realistic sequence-level data in populations and ascertained samples BMC Bioin-form 9 364

Chang J T 1999 Recent common ancestors of all present-dayindividuals Adv Appl Probab 4 1002ndash1026

Chapman J R S Nakagawa D W Coltman J Slate and B CSheldon 2009 A quantitative review of heterozygosity-fitnesscorrelations in animal populations Mol Ecol 18 2746ndash2765

Charlesworth B and D Charlesworth 1999 The genetic basis ofinbreeding depression Genet Res 74 329ndash340

Charlesworth D and J H Willis 2009 The genetics of inbreed-ing depression Nat Rev Genet 10 783ndash796

Crow J F and M Kimura 1970 An Introduction to PopulationGenetics Theory Harper amp Row New York

Darwin C 1868 The Variation of Animals and Plants Under Do-mestication Appleton New York

Darwin C 1876 The Effects of Crossing and Self Fertilization in theVegetable Kingdom John Murray London

DeRose M A and D A Roff 1999 A comparison of inbreedingdepression in life-history and morphological traits in animalsEvolution 53 1288ndash1292

Derrida B S C Manrubia and D H Zanette 2000 On thegenealogy of a population of biparental individuals J TheorBiol 203 303ndash315

248 M C Keller P M Visscher and M E Goddard

Enciso-Mora V F J Hosking and R S Houlston 2010 Risk ofbreast and prostate cancer is not associated with increased ho-mozygosity in outbred populations Eur J Hum Genet 18909ndash914

Fenner J N 2005 Cross-cultural estimation of the human gen-eration interval for use in genetics-based population divergencestudies Am J Phys Anthropol 128 415ndash423

Fisher R A 1954 A fuller theory of ldquojunctionsrdquo in inbreedingHeredity 8 187ndash197

Franklin I R 1977 The distribution of the proportion of thegenome which is homozygous by descent in inbred individualsTheor Popul Biol 11 60ndash80

Gibson J N E Morton and A Collins 2006 Extended tracts ofhomozygosity in outbred human populations Hum Mol Genet15 789ndash795

Gindilis V M R G Gainullin and L M Shmaonova1989 Genetico-demographic patterns of the prevalence of var-ious forms of endogenous psychoses Genetika 25 734ndash743

Hill W G and B S Weir 2011 Variation in actual relationshipas a consequence of Mendelian sampling and linkage GenetRes (Camb) 93 47ndash64

Hosking F J E Papaemmanuil E Sheridan S E Kinsey T Lightfootet al 2010 Genome-wide homozygosity signatures and child-hood acute lymphoblastic leukemia risk Blood 115 4472ndash4477

International Schizophrenia Consortium 2009 A framework forinterpreting genome-wide association studies of psychiatric dis-orders Mol Psychiatry 14 10ndash17

Jensen-Seaman M I T S Furey B A Payseur Y Lu K M Roskinet al 2004 Comparative recombination rates in the ratmouse and human genomes Genome Res 14 528ndash538

Kauppi L A J Jeffreys and S Keeney 2004 Where the cross-overs are recombination distributions in mammals Nat RevGenet 5 413ndash424

Ku C S N Naidoo S M Teo and Y Pawitan 2010 Regions ofhomozygosity and their impact on complex diseases and traitsHum Genet 129 1ndash15

Kudo A K Ito and K Tanaka 1972 Genetic studies on inbreed-ing in some Japanese populations X The effects of parentalconsanguinity on psychometric measurements school perform-ances and school attendance in Shizuoka school-children JpnJ Hum Genet 17 231ndash248

Lachance J 2009 Inbreeding pedigree size and the most recentcommon ancestor of humanity J Theor Biol 261 238ndash247

Lebel R R and W B Gallagher 1989 Wisconsin consanguinity stud-ies II familial adenocarcinomatosis Am J Med Genet 33 1ndash6

Lencz T C Lambert P DeRosse K E Burdick T V Morgan et al2007 Runs of homozygosity reveal highly penetrant recessive lociin schizophrenia Proc Natl Acad Sci USA 104 19942ndash19947

Leutenegger A L B Prum E Genin C Verny A Lemainque et al2003 Estimation of the inbreeding coefficient through use ofgenomic data Am J Hum Genet 73 516ndash523

Li L H S F Ho C H Chen C Y Wei W C Wong et al2006 Long contiguous stretches of homozygosity in the humangenome Hum Mutat 27 1115ndash1121

McQuillan R A L Leutenegger R Abdel-Rahman C S FranklinM Pericic et al 2008 Runs of homozygosity in European pop-ulations Am J Hum Genet 83 359ndash372

Morton N E 1979 Effect of inbreeding on IQ and mental re-tardation Proc Natl Acad Sci USA 75 3906ndash3908

Nalls M A R J Guerreiro J Simon-Sanchez J T Bras B JTraynor et al 2009 Extended tracts of homozygosity identifynovel candidate genes associated with late-onset Alzheimerrsquosdisease Neurogenetics 10 183ndash190

OrsquoDonovan M C N Craddock N Norton H Williams T Peirceet al 2008 Identification of loci associated with schizophreniaby genome-wide association and follow-up Nat Genet 401053ndash1055

Ohno S 1996 The Malthusian parameter of ascents What pre-vents the exponential increase of onersquos ancestors Proc NatlAcad Sci USA 93 15276ndash15278

Powell J E P M Visscher and M E Goddard 2010 Reconcilingthe analysis of IBD and IBS in complex trait studies Nat RevGenet 11 800ndash805

Pritchard J K 2001 Are rare variants responsible for suscepti-bility to complex diseases Am J Hum Genet 69 124ndash137

Purcell S B Neale K Todd-Brown L Thomas M A Ferreiraet al 2007 PLINK a tool set for whole-genome associationand population-based linkage analyses Am J Hum Genet81 559ndash575

Pusey A and M Wolf 1996 Inbreeding avoidance in animalsTrends Ecol Evol 11 201ndash206

Roff D A 1997 Evolutionary Quantitative Genetics Chapman ampHall New York

Rohde D L S Olson and J T Chang 2004 Modelling the recentcommon ancestry of all living humans Nature 431 562ndash566

Rudan I D Rudan H Campbell A Carothers A Wright et al2003a Inbreeding and risk of late onset complex disease JMed Genet 40 925ndash932

Rudan I N Smolej-Narancic H Campbell A Carothers A Wrightet al 2003b Inbreeding and the genetic complexity of humanhypertension Genetics 163 1011ndash1021

Rudan I T Skaric-Juric N Smolej-Narancic B Janicijevic DRudan et al 2004 Inbreeding and susceptibility to osteoporo-sis in Croatian island isolates Coll Anthropol 28 585ndash601

Rudan I A Marusic S Jankovic K Rotim M Boban et al2009 ldquo10001 Dalmatiansrdquo Croatia launches its national bio-bank Croat Med J 50 4ndash6

Schull W J and J V Neel 1965 The Effects of Inbreeding onJapanese Children Harper amp Row New York

Shami S A R Qaisar and A H Bittles 1991 Consanguinity andadult morbidity in Pakistan Lancet 338 954

Spain S L J B Cazier R Houlston L Carvajal-Carmona and ITomlinson 2009 Colorectal cancer risk is not associated withincreased levels of homozygosity in a population from theUnited Kingdom Cancer Res 69 7422ndash7429

Stam P 1980 The distribution of the fraction of the genomeidentical by descent in finite random mating populations GenetRes 35 131ndash155

Takahata N Y Satta and J Klein 1995 Divergence time andpopulation size in the lineage leading to modern humans TheorPopul Biol 48 198ndash221

Tenesa A P Navarro B J Hayes D L Duffy G M Clarke et al2007 Recent human effective population size estimated fromlinkage disequilibrium Genome Res 17 520ndash526

Vine A E A McQuillin N J Bass A Pereira R Kandaswamyet al 2009 No evidence for excess runs of homozygosity inbipolar disorder Psychiatr Genet 19 165ndash170

Woods C G J Cox K Springell D J Hampshire M D Mohamedet al 2006 Quantification of homozygosity in consanguineousindividuals with autosomal recessive disease Am J HumGenet 78 889ndash896

Wright S 1922 Coefficients of inbreeding and relationships AmNat 56 330ndash339

Wright S 1977 Evolution and the Genetics of Populations Vol 3Experimental Results and Evolutionary Deductions University ofChicago Press Chicago

Yang J B Benyamin B P McEvoy S Gordon A K Henders et al2010a Common SNPs explain a large proportion of the heri-tability for human height Nat Genet 42 565ndash569

Yang T L Y Guo L S Zhang Q Tian H Yan et al 2010b Runsof homozygosity identify a recessive locus 12q2131 for humanadult height J Clin Endocrinol Metab 95 3777ndash3782

Communicating editor N A Rosenberg

Detecting Ancient Inbreeding 249

GENETICSSupporting Information

httpwwwgeneticsorgcontentsuppl20110625genetics111130922DC1

Quantification of Inbreeding Due to DistantAncestors and Its Detection Using Dense Single

Nucleotide Polymorphism DataMatthew C Keller Peter M Visscher and Michael E Goddard

Copyright copy 2011 by the Genetics Society of AmericaDOI 101534genetics111130922

Page 12: Home | Genetics - Quantification of Inbreeding Due …INVESTIGATION Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism

The simulation parameters explored in this article werechosen to mimic the human genome Recombination ratesand recombination hotspot rates can vary at least threefoldacross mammals (Jensen-Seaman et al 2004 Kauppi et al2004) Nevertheless the basic conclusions made here arenot overly dependent on the specific parameters chosenand can readily be applied to animals with different recom-bination parameters When distance is conceptualized incentimorgans rather than base pair units different recombi-nation rates lengthen or shorten the genome vis-agrave-vis oursimulation This would slightly lower or raise (respectively)the variance of Froh as shown above when variation in Frohwas extrapolated to its expected genome-wide value Simi-larly when genomic distance is measured in centimorgansdifferent recombination hotspot parameters change the var-iation in SNP density vis-agrave-vis our simulation This shouldhave minimal effect on the ROHs detected and therefore onFroh because even the shortest ROH encompasses 175hotspots on average Thus while the specific values in ourresults depend on the specific simulation parameters usedand are therefore most relevant to human populations thequalitative conclusions regarding the advantages of Froh overthe alternative estimates of F apply across a wide range ofparameters and extend to nonhuman animals as well

Inbreeding has had a central place in the field of populationgenetics since its inception Its importance derives not onlyfrom its relevance to many theoretical concepts in populationgenetics Evolutionists have used inbreeding as a way togauge which traits are likely to have been under ancestralselection given that traits most affected by it tend to be mostrelated to fitness (Roff 1997) At a more practical level in-breeding can have important health consequences in humanpopulations (Bittles and Black 2010b Rudan et al 2003a)Rates of recent inbreeding are not low across the world Forexample progeny from second-cousin or closer marriages(Fped 0016) are estimated to account for 104 of theglobal human population In this article we have shownthat rates of inbreeding due to distant common ancestorsin large outbred populations are high enough to have de-tectable effects and we have demonstrated the optimal wayof identifying such ancient inbreeding Investigations intoancient inbreeding effects should help investigators under-stand the evolutionary forces acting on the genes underlyingtrait variation and whether inbreeding represents an impor-tant risk factor in disease

Acknowledgments

The authors thank Matthew C. Jones for his statistical advice, Daniel P. Howrigan for help in conducting the simulation and finding optimal thresholds for it, and Tony Sun for modifying the Fregene program. The data sets used for the analyses described in this article were obtained from the Database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession nos. phs000021.v3.p2 and phs000167.v1.p1. Samples and associated phenotype data for the genome-wide association of schizophrenia study were provided by the Molecular Genetics of Schizophrenia Collaboration [Principal Investigator: P. V. Gejman, Evanston Northwestern Healthcare and Northwestern University, Evanston, IL]. M.C.K., P.M.V., and M.E.G. conceived and designed this study. M.C.K. conducted analyses, and M.C.K., P.M.V., and M.E.G. wrote the article. The authors declare that no competing interests exist. This study was supported by a grant from the National Institutes of Health and the National Institute of Mental Health (MH085812) (to M.C.K.). P.M.V. and M.E.G. acknowledge funding from the Australian National Health and Medical Research Council (grants 613672 and 613601) and the Australian Research Council (grants DP0770096 and DP1093900). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature Cited

Abaskuliev, A. A., and G. V. Skoblo, 1975 Inbreeding, endogamy and exogamy among relatives of schizophrenia patients. Genetika 11: 145–148.

Afzal, M., 1988 Consequences of consanguinity on cognitive behavior. Behav. Genet. 18: 583–594.

Bittles, A. H., and M. L. Black, 2010a Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. USA 107(Suppl. 1): 1779–1786.

Bittles, A. H., and M. L. Black, 2010b The impact of consanguinity on neonatal and infant health. Early Hum. Dev. 86: 1779–1786.

Brown, D. E., 1991 Human Universals. McGraw-Hill, New York.

Bulayeva, K. B., S. M. Leal, T. A. Pavlova, R. M. Kurbanov, S. J. Glatt et al., 2005 Mapping genes of complex psychiatric diseases in Daghestan genetic isolates. Am. J. Med. Genet. Part B Neuropsych. Genet. 132: 76–84.

Carothers, A. D., I. Rudan, I. Kolcic, O. Polasek, C. Hayward et al., 2006 Estimating human inbreeding coefficients: comparison of genealogical and marker heterozygosity approaches. Ann. Hum. Genet. 70: 666–676.

Chadeau-Hyam, M., C. J. Hoggart, P. F. O’Reilly, J. C. Whittaker, M. De Iorio et al., 2008 Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinformatics 9: 364.

Chang, J. T., 1999 Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 4: 1002–1026.

Chapman, J. R., S. Nakagawa, D. W. Coltman, J. Slate, and B. C. Sheldon, 2009 A quantitative review of heterozygosity-fitness correlations in animal populations. Mol. Ecol. 18: 2746–2765.

Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.

Charlesworth, D., and J. H. Willis, 2009 The genetics of inbreeding depression. Nat. Rev. Genet. 10: 783–796.

Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

Darwin, C., 1868 The Variation of Animals and Plants Under Domestication. Appleton, New York.

Darwin, C., 1876 The Effects of Crossing and Self Fertilization in the Vegetable Kingdom. John Murray, London.

DeRose, M. A., and D. A. Roff, 1999 A comparison of inbreeding depression in life-history and morphological traits in animals. Evolution 53: 1288–1292.

Derrida, B., S. C. Manrubia, and D. H. Zanette, 2000 On the genealogy of a population of biparental individuals. J. Theor. Biol. 203: 303–315.

248 M C Keller P M Visscher and M E Goddard

Enciso-Mora, V., F. J. Hosking, and R. S. Houlston, 2010 Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. Eur. J. Hum. Genet. 18: 909–914.

Fenner, J. N., 2005 Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423.

Fisher, R. A., 1954 A fuller theory of “junctions” in inbreeding. Heredity 8: 187–197.

Franklin, I. R., 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.

Gibson, J., N. E. Morton, and A. Collins, 2006 Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15: 789–795.

Gindilis, V. M., R. G. Gainullin, and L. M. Shmaonova, 1989 Genetico-demographic patterns of the prevalence of various forms of endogenous psychoses. Genetika 25: 734–743.

Hill, W. G., and B. S. Weir, 2011 Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. (Camb.) 93: 47–64.

Hosking, F. J., E. Papaemmanuil, E. Sheridan, S. E. Kinsey, T. Lightfoot et al., 2010 Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk. Blood 115: 4472–4477.

International Schizophrenia Consortium, 2009 A framework for interpreting genome-wide association studies of psychiatric disorders. Mol. Psychiatry 14: 10–17.

Jensen-Seaman, M. I., T. S. Furey, B. A. Payseur, Y. Lu, K. M. Roskin et al., 2004 Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14: 528–538.

Kauppi, L., A. J. Jeffreys, and S. Keeney, 2004 Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.

Ku, C. S., N. Naidoo, S. M. Teo, and Y. Pawitan, 2010 Regions of homozygosity and their impact on complex diseases and traits. Hum. Genet. 129: 1–15.

Kudo, A., K. Ito, and K. Tanaka, 1972 Genetic studies on inbreeding in some Japanese populations. X. The effects of parental consanguinity on psychometric measurements, school performances, and school attendance in Shizuoka school-children. Jpn. J. Hum. Genet. 17: 231–248.

Lachance, J., 2009 Inbreeding, pedigree size, and the most recent common ancestor of humanity. J. Theor. Biol. 261: 238–247.

Lebel, R. R., and W. B. Gallagher, 1989 Wisconsin consanguinity studies. II. Familial adenocarcinomatosis. Am. J. Med. Genet. 33: 1–6.

Lencz, T., C. Lambert, P. DeRosse, K. E. Burdick, T. V. Morgan et al., 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.

Leutenegger, A. L., B. Prum, E. Genin, C. Verny, A. Lemainque et al., 2003 Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73: 516–523.

Li, L. H., S. F. Ho, C. H. Chen, C. Y. Wei, W. C. Wong et al., 2006 Long contiguous stretches of homozygosity in the human genome. Hum. Mutat. 27: 1115–1121.

McQuillan, R., A. L. Leutenegger, R. Abdel-Rahman, C. S. Franklin, M. Pericic et al., 2008 Runs of homozygosity in European populations. Am. J. Hum. Genet. 83: 359–372.

Morton, N. E., 1979 Effect of inbreeding on IQ and mental retardation. Proc. Natl. Acad. Sci. USA 75: 3906–3908.

Nalls, M. A., R. J. Guerreiro, J. Simon-Sanchez, J. T. Bras, B. J. Traynor et al., 2009 Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer’s disease. Neurogenetics 10: 183–190.

O’Donovan, M. C., N. Craddock, N. Norton, H. Williams, T. Peirce et al., 2008 Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40: 1053–1055.

Ohno, S., 1996 The Malthusian parameter of ascents: What prevents the exponential increase of one’s ancestors? Proc. Natl. Acad. Sci. USA 93: 15276–15278.

Powell, J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11: 800–805.

Pritchard, J. K., 2001 Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69: 124–137.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al., 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.

Pusey, A., and M. Wolf, 1996 Inbreeding avoidance in animals. Trends Ecol. Evol. 11: 201–206.

Roff, D. A., 1997 Evolutionary Quantitative Genetics. Chapman & Hall, New York.

Rohde, D. L., S. Olson, and J. T. Chang, 2004 Modelling the recent common ancestry of all living humans. Nature 431: 562–566.

Rudan, I., D. Rudan, H. Campbell, A. Carothers, A. Wright et al., 2003a Inbreeding and risk of late onset complex disease. J. Med. Genet. 40: 925–932.

Rudan, I., N. Smolej-Narancic, H. Campbell, A. Carothers, A. Wright et al., 2003b Inbreeding and the genetic complexity of human hypertension. Genetics 163: 1011–1021.

Rudan, I., T. Skaric-Juric, N. Smolej-Narancic, B. Janicijevic, D. Rudan et al., 2004 Inbreeding and susceptibility to osteoporosis in Croatian island isolates. Coll. Anthropol. 28: 585–601.

Rudan, I., A. Marusic, S. Jankovic, K. Rotim, M. Boban et al., 2009 “10001 Dalmatians”: Croatia launches its national biobank. Croat. Med. J. 50: 4–6.

Schull, W. J., and J. V. Neel, 1965 The Effects of Inbreeding on Japanese Children. Harper & Row, New York.

Shami, S. A., R. Qaisar, and A. H. Bittles, 1991 Consanguinity and adult morbidity in Pakistan. Lancet 338: 954.

Spain, S. L., J. B. Cazier, R. Houlston, L. Carvajal-Carmona, and I. Tomlinson, 2009 Colorectal cancer risk is not associated with increased levels of homozygosity in a population from the United Kingdom. Cancer Res. 69: 7422–7429.

Stam, P., 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.

Takahata, N., Y. Satta, and J. Klein, 1995 Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48: 198–221.

Tenesa, A., P. Navarro, B. J. Hayes, D. L. Duffy, G. M. Clarke et al., 2007 Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526.

Vine, A. E., A. McQuillin, N. J. Bass, A. Pereira, R. Kandaswamy et al., 2009 No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 19: 165–170.

Woods, C. G., J. Cox, K. Springell, D. J. Hampshire, M. D. Mohamed et al., 2006 Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am. J. Hum. Genet. 78: 889–896.

Wright, S., 1922 Coefficients of inbreeding and relationships. Am. Nat. 56: 330–339.

Wright, S., 1977 Evolution and the Genetics of Populations, Vol. 3: Experimental Results and Evolutionary Deductions. University of Chicago Press, Chicago.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010a Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Yang, T. L., Y. Guo, L. S. Zhang, Q. Tian, H. Yan et al., 2010b Runs of homozygosity identify a recessive locus 12q21.31 for human adult height. J. Clin. Endocrinol. Metab. 95: 3777–3782.

Communicating editor: N. A. Rosenberg

Detecting Ancient Inbreeding 249

GENETICS Supporting Information

http://www.genetics.org/content/suppl/2011/06/25/genetics.111.130922.DC1

Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data

Matthew C. Keller, Peter M. Visscher, and Michael E. Goddard

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130922
