Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations

Juan L. Rodriguez-Flores,1,8 Khalid Fakhro,2,3,8 Francisco Agosto-Perez,1,4 Monica D. Ramstetter,4 Leonardo Arbiza,4 Thomas L. Vincent,1 Amal Robay,3 Joel A. Malek,3 Karsten Suhre,5 Lotfi Chouchane,3 Ramin Badii,6 Ajayeb Al-Nabet Al-Marri,6 Charbel Abi Khalil,3 Mahmoud Zirie,7 Amin Jayyousi,7 Jacqueline Salit,1 Alon Keinan,4 Andrew G. Clark,4 Ronald G. Crystal,1,9 and Jason G. Mezey1,4,9

1Department of Genetic Medicine, Weill Cornell Medical College, New York, New York 10065, USA; 2Sidra Medical and Research Center, Doha, Qatar; 3Department of Genetic Medicine, Weill Cornell Medical College–Qatar, Doha, Qatar; 4Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA; 5Bioinformatics Core, Weill Cornell Medical College–Qatar, Doha, Qatar; 6Laboratory Medicine and Pathology, Hamad Medical Corporation, Doha, Qatar; 7Department of Medicine, Hamad Medical Corporation, Doha, Qatar

An open question in the history of human migration is the identity of the earliest Eurasian populations that have left contemporary descendants. The Arabian Peninsula was the initial site of the out-of-Africa migrations that occurred between 125,000 and 60,000 yr ago, leading to the hypothesis that the first Eurasian populations were established on the Peninsula and that contemporary indigenous Arabs are direct descendants of these ancient peoples. To assess this hypothesis, we sequenced the entire genomes of 104 unrelated natives of the Arabian Peninsula at high coverage, including 56 of indigenous Arab ancestry. The indigenous Arab genomes defined a cluster distinct from other ancestral groups, and these genomes showed clear hallmarks of an ancient out-of-Africa bottleneck. Similar to other Middle Eastern populations, the indigenous Arabs had higher levels of Neanderthal admixture compared to Africans but had lower levels than Europeans and Asians. These levels of Neanderthal admixture are consistent with an early divergence of Arab ancestors after the out-of-Africa bottleneck but before the major Neanderthal admixture events in Europe and other regions of Eurasia. When compared to worldwide populations sampled in the 1000 Genomes Project, although the indigenous Arabs had a signal of admixture with Europeans, they clustered in a basal, outgroup position to all 1000 Genomes non-Africans when considering pairwise similarity across the entire genome. These results place indigenous Arabs as the most distant relatives of all other contemporary non-Africans and identify these people as direct descendants of the first Eurasian populations established by the out-of-Africa migrations.

[Supplemental material is available for this article.]

All humans can trace their ancestry back to Africa (Cann et al. 1987), where the ancestors of anatomically modern humans first diverged from primates (Patterson et al. 2006), and then from archaic humans (Prüfer et al. 2014). Humans began leaving Africa through a number of coastal routes, where estimates suggest these “out-of-Africa” migrations reached the Arabian Peninsula as early as 125,000 yr ago (Armitage et al. 2011) and as late as 60,000 yr ago (Henn et al. 2012). After entering the Arabian Peninsula, human ancestors entered South Asia and spread to Australia (Rasmussen et al. 2011), Europe, and eventually, the Americas. The individuals in these migrations were the most direct ancestors of ancient non-African peoples, and they established the contemporary non-African populations recognized today (Cavalli-Sforza and Feldman 2003).

The relationship between contemporary Arab populations and these ancient human migrations is an open question (Lazaridis et al. 2014; Shriner et al. 2014). Given that the Arabian Peninsula was an initial site of egress from Africa, one hypothesis is that the original out-of-Africa migrations established ancient populations on the peninsula that were direct ancestors of contemporary Arab populations (Lazaridis et al. 2014). These people would therefore be direct descendants of the earliest split in the lineages that established Eurasian and other contemporary non-African populations (Armitage et al. 2011; Rasmussen et al. 2011; Henn et al. 2012; Lazaridis et al. 2014; Shriner et al. 2014). If this hypothesis is correct, we would expect that there are contemporary, indigenous Arabs who are the most distant relatives of other Eurasians. To assess this hypothesis, we carried out deep-coverage genome sequencing of 104 unrelated natives of the Arabian Peninsula who are citizens of the nation of Qatar (Supplemental Fig. 1), including 56 of indigenous Bedouin ancestry who are the best representatives of autochthonous Arabs, and compared these genomes to contemporary genomes of Africa, Asia, Europe, and the Americas (The 1000 Genomes Project Consortium 2012; Lazaridis et al. 2014).

8These authors contributed equally to this study.
9These authors contributed equally as senior investigators for this study.
Corresponding author: [email protected]
Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.191478.115. Freely available online through the Genome Research Open Access option.

© 2016 Rodriguez-Flores et al. This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

Genome Research 26:151–162. Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/16; www.genome.org

Downloaded from genome.cshlp.org on March 16, 2016. Published by Cold Spring Harbor Laboratory Press.

Results

Population structure of the Arabian Peninsula

Previous analyses of the populations of the Arabian Peninsula (Hunter-Zinck et al. 2010; Alsmadi et al. 2013) have found three distinct clusters that reflect primary ancestry: Q1 (Bedouin); Q2 (Persian-South Asian); and Q3 (African) (Omberg et al. 2012). By assessment of medical records and ancestry-informative SNP genotyping (Supplemental Fig. 2), a sample of 108 purportedly unrelated individuals was selected for sequencing, including 60 Q1 (Bedouin), 20 Q2 (Persian-South Asian), and 20 Q3 (African), as well as 8 Q0 (Subpopulation Unassigned) that could not be cleanly placed in one of these three groups (Supplemental Table I). Each of these genomes was sequenced to a median depth of 37× (minimum 30×) by Illumina technology, identifying a total of 23,784,210 SNPs (see Methods, Supplemental Table II).

To confirm that none of the 108 individuals were closely related, we used KING-robust (Manichaikul et al. 2010) and PREST-plus (McPeek and Sun 2000) to estimate family relationships based on a set of 1,407,483 SNPs after pruning of the full set of 22,958,844 autosomal SNPs in Qatar (see Methods). Both analyses identified five pairs of related individuals greater than third-degree that were subsequently confirmed by investigative reassessment of medical records (Supplemental Table III; Supplemental Fig. 3). Three of the pairs form a trio; hence, two individuals from the trio were removed, and one individual from each of the two remaining pairs was removed, such that the remaining 104 individuals analyzed further included 8 Q0 (Subpopulation Unassigned) and 96 Q1, Q2, or Q3 Qatari: 56 Q1 (Bedouin), 20 Q2 (Persian-South Asian), and 20 Q3 (African).
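As an illustration of the relatedness filtering described above, the following sketch classifies a pairwise kinship coefficient with the inference cutoffs documented for KING and then greedily removes individuals until no related pair remains. The greedy rule, the function names, and the individual labels are assumptions made for this example; the text does not state how the individuals to remove were chosen.

```python
from collections import Counter

# Kinship cutoffs as documented for KING: each boundary is the geometric
# midpoint between the expected kinship of adjacent relationship degrees
# (1/2 for duplicates, 1/4 first-degree, 1/8 second-degree, 1/16 third-degree).
KING_CUTOFFS = [
    (2 ** -1.5, "duplicate/MZ twin"),  # kinship > 0.354
    (2 ** -2.5, "first-degree"),       # 0.177 to 0.354
    (2 ** -3.5, "second-degree"),      # 0.0884 to 0.177
    (2 ** -4.5, "third-degree"),       # 0.0442 to 0.0884
]


def relationship_degree(kinship):
    """Map an estimated kinship coefficient to a relationship label."""
    for cutoff, label in KING_CUTOFFS:
        if kinship > cutoff:
            return label
    return "unrelated"


def prune_related(pairs):
    """Greedily drop individuals until no related pair is left (hypothetical rule)."""
    remaining = [tuple(p) for p in pairs]
    removed = set()
    while remaining:
        counts = Counter(ind for pair in remaining for ind in pair)
        # Drop the individual involved in the most remaining pairs; ties by name.
        worst = min(counts, key=lambda ind: (-counts[ind], ind))
        removed.add(worst)
        remaining = [p for p in remaining if worst not in p]
    return removed


# Hypothetical labels: three pairs forming a trio plus two independent pairs.
example_pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("F", "G")]
removed = prune_related(example_pairs)
```

With five related pairs arranged as one trio and two independent pairs, four individuals are removed, which matches the reduction from 108 to 104 genomes.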

An analysis of inbreeding for these remaining individuals showed the Q1 (Bedouin) to have a more positive inbreeding coefficient than most of the non-admixed 1000 Genomes (The 1000 Genomes Project Consortium 2012) populations (Supplemental Table IV; Supplemental Fig. 4), consistent with the known inbreeding of this group (Hunter-Zinck et al. 2010; Omberg et al. 2012); although we also found the Q1 (Bedouin) to be less inbred than many small and/or isolated populations worldwide represented in the Human Origins samples (Lazaridis et al. 2014) (Supplemental Table V; Supplemental Table VI; Supplemental Fig. 4). The Q2 (Persian-South Asian) had a positive, but slightly lower, inbreeding coefficient than the Q1 (Bedouin). In contrast, the Q3 (African) had a negative coefficient that reflects known admixture with African populations (Hunter-Zinck et al. 2010; Omberg et al. 2012).
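To make the sign of the inbreeding coefficient concrete, here is a minimal sketch of a common method-of-moments estimator, F = 1 − (observed heterozygous sites) / (expected heterozygous sites under Hardy-Weinberg proportions). This is an assumption for illustration; the estimator actually used in the study is described in its Methods and may differ. The function name and inputs are hypothetical.

```python
import numpy as np


def inbreeding_coefficient(genotypes, alt_freq):
    """Method-of-moments F for one individual.

    genotypes: alternate-allele counts (0, 1, or 2) at each SNP.
    alt_freq:  population alternate-allele frequency at each SNP.
    """
    genotypes = np.asarray(genotypes)
    alt_freq = np.asarray(alt_freq, dtype=float)
    observed_het = np.sum(genotypes == 1)
    # Expected number of heterozygous sites under Hardy-Weinberg proportions.
    expected_het = np.sum(2.0 * alt_freq * (1.0 - alt_freq))
    return 1.0 - observed_het / expected_het
```

A positive value indicates fewer heterozygous sites than expected, as in inbreeding; a value below zero indicates excess heterozygosity, which can follow recent admixture between differentiated populations.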

We confirmed the primary ancestry classifications of the 104 Qataris by principal component analysis (Price et al. 2006). We combined the 104 Qataris, the Human Origins populations (Lazaridis et al. 2014), and 1000 Genomes populations (The 1000 Genomes Project Consortium 2012) (excluding individuals already in Human Origins), and performed principal component analysis on a set of 197,714 linkage disequilibrium pruned autosomal SNPs (Fig. 1A; Supplemental Fig. 5A). We also confirmed these clusterings just with the 104 Qataris and 1000 Genomes samples based on the same set of autosomal SNPs (Supplemental Fig. 5B). These analyses reproduced the population clustering observed previously (Hunter-Zinck et al. 2010; Omberg et al. 2012), with the Q1 (Bedouin) closest to Europeans, the Q2 (Persian-South Asian) between Q1 (Bedouin) and Asians, and the Q3 (African) closest to African populations. A plot of just the Middle Eastern populations on the principal components also showed clustering as expected, with the Q1 (Bedouin) clustering with previously sampled Bedouins and Arabs, Q2 (Persian-South Asians) with Iranians, and Q3 (African) outside of the Middle Eastern cluster (data not shown) (Fig. 1B).
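The principal component analysis above can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: it assumes a complete genotype matrix with no missing data and applies the per-SNP normalization commonly used with this approach (center by twice the allele frequency, scale by the square root of p(1 − p)) before a singular value decomposition. The function name is hypothetical.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (n_individuals, n_snps) array of 0/1/2 alternate-allele counts.
    Returns an (n_individuals, n_components) array of PC scores.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0          # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)      # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Individuals from populations with different allele frequencies separate along the leading components, which is the basis for the clustering shown in Figure 1.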

Y Chromosome and mitochondrial DNA haplogroups

We next analyzed the Y Chromosome (Chr Y) and mitochondrial DNA (mtDNA) to assess the degree to which the Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qatari ancestry groups represent distinct subpopulations (Fig. 2). The Chr Y haplogroups showed almost no overlap between the Q1 (Bedouin) Qataris and Q2 (Persian-South Asian) Qataris, in which an Analysis of Molecular Variance (AMOVA) was highly significant (P < 0.018) (Supplemental Table VII). The Arab haplogroup J1 was the dominant haplogroup in the Q1 (Bedouin) Qataris, but this haplogroup was not represented at all among the Q2 (Persian-South Asian) Qataris (Fig. 2A). This confirmed that these are genetically well-defined subpopulations that are relatively isolated from one another (Omberg et al. 2012). There was also a strong partitioning of the Chr Y haplogroups when considering the Q3 (African) Qataris, both when considering Q1 (Bedouin) versus Q3 (African) (AMOVA P < 1 × 10⁻⁵) and Q2 (Persian-South Asian) versus Q3 (African) (AMOVA P < 0.028). The Q3 (African) had largely African haplogroups, a result consistent with the known recent African admixture of this subpopulation (Omberg et al. 2012).

The mtDNA haplogroups were less partitioned among the Qataris, although they still showed significant partitioning between each pair of subpopulations (AMOVA Q1 versus Q2 P < 0.035, Q1 versus Q3 P < 1 × 10⁻⁵, Q2 versus Q3 P < 0.017) and among all three considered simultaneously (AMOVA P < 1 × 10⁻⁵) (Supplemental Table VII). The mtDNA haplogroups also included more worldwide geographic diversity overall, indicating a different male versus female pattern of intermarriage among these subpopulations (Sandridge et al. 2010). Together the Chr Y and mtDNA haplogroups indicate that the Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) ancestry groups represent genetic subpopulations that not only reflect known migration history (Hunter-Zinck et al. 2010; Omberg et al. 2012) but that also represent units defined by a patrilocal society with strong historical barriers to intermarriage (Esposito 2001; Cavalli-Sforza and Feldman 2003), in which gene flow has been dominated by female movement (i.e., admixture occurring through females marrying into the relatively isolated subpopulations), as well as female influxes from other geographic areas.

X-linked and autosomal diversity

To further analyze the relative male and female contributions to the genetics of the Qatari Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) subpopulations, we analyzed genome-wide ratios of X-linked and autosomal (X/A) diversity and X/A diversity ratios for genome intervals >0.18 cM from genes (Supplemental Table VIII; Supplemental Fig. 6). For both of these ratios, the Q1 (Bedouin) and Q2 (Persian-South Asian) were lower than for African populations but were higher than for Europeans and Asians. This points to a higher effective population size of females in the Q1 (Bedouin) and Q2 (Persian-South Asian), possibly a consequence of the out-of-Africa migrations, which were believed to be biased toward migration of males over females (Gottipati et al. 2011; Arbiza et al. 2014). The Q3 (African) Qataris had X/A diversity ratios that were higher, even when compared to African populations. This may be driven by a smaller male effective population size, a possible consequence of a polygamous culture and the ancestry of the Q3 (African) subpopulation that was a result of the historical slave trade into the region from Africa (Omberg et al. 2012).

We also analyzed the relative ratios of X-linked and autosomal (X/A) diversity in nongenic regions of the female Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) genomes compared to females in African populations of the 1000 Genomes Project (Supplemental Table IX). The relative X/A ratios of both the Q1 (Bedouin) and Q2 (Persian-South Asian) to African populations were slightly higher than when comparing European to African populations (Gottipati et al. 2011; Arbiza et al. 2014). This could indicate a slightly less extreme set of bottleneck events encountered since the out-of-Africa migrations by the direct ancestors of the Q1 (Bedouin) and Q2 (Persian-South Asian) compared to the bottlenecks encountered by the direct ancestors of Europeans. The relative X/A diversity ratios of Q3 (African) to African populations were closer to one, consistent with the known African admixture of this subpopulation (Omberg et al. 2012).
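The X/A comparisons in this section reduce to simple ratios, sketched below. The expected-ratio function uses the standard neutral formulas for effective population size on the X Chromosome and the autosomes given numbers of breeding females and males; treating diversity ratios as effective-size ratios additionally assumes equal mutation rates on both chromosome classes. All function names are hypothetical and the sketch is not the authors' pipeline.

```python
def xa_ratio(pi_x, pi_a):
    """Ratio of X-linked to autosomal nucleotide diversity."""
    return pi_x / pi_a


def relative_xa_ratio(pi_x, pi_a, pi_x_ref, pi_a_ref):
    """X/A ratio of a population normalized by a reference (e.g., African) X/A ratio."""
    return xa_ratio(pi_x, pi_a) / xa_ratio(pi_x_ref, pi_a_ref)


def expected_xa_ratio(n_female, n_male):
    """Neutral expectation of Ne(X) / Ne(A).

    Ne(A) = 4 * Nm * Nf / (Nm + Nf)
    Ne(X) = 9 * Nm * Nf / (4 * Nm + 2 * Nf)
    """
    return 9.0 * (n_male + n_female) / (8.0 * (2.0 * n_male + n_female))
```

With equal numbers of breeding males and females the expectation is 0.75; a smaller male effective population size raises it toward 9/8, which is the direction of the effect proposed above for the Q3 (African) subpopulation.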

Pairwise sequential Markov coalescent analysis

We next analyzed the full complement of autosomal polymorphisms for signals of ancient bottlenecks by applying the pairwise sequential Markov coalescent (PSMC) (Fig. 3; Li and Durbin 2011). This analysis showed that the Q1 (Bedouin) and Q2 (Persian-South Asian) had clear hallmarks of a bottleneck event, with effective population size hitting a trough in the range of 100,000 to 30,000 yr ago with a minimum at ∼60,000 yr ago. This same pattern is observed for a European individual from the 1000 Genomes Project and is consistent with what has been observed in other non-African human genomes using the pairwise sequential Markov coalescent, as well as related methods (Gronau et al. 2011; Fu et al. 2014; Schiffels and Durbin 2014). These data, therefore, point to the ancestors of Q1 (Bedouin) and Q2 (Persian-South Asian) as having migrated out of Africa at the same time as the ancestors of other non-African populations (Henn et al. 2012). Although PSMC estimates in the more recent past tend to have larger confidence intervals (Li and Durbin 2011), the Q1 (Bedouin) do appear to have a lower population size than the Q2 (Persian-South Asian) in the region <30,000 yr ago, consistent with high levels of inbreeding in the Q1 (Bedouin) (Hunter-Zinck et al. 2010; Sandridge et al. 2010; Mezzavilla et al. 2015). For the Q3 (African), the median effective population size was more similar to an African individual from the 1000 Genomes Project in the range 100,000 to 30,000 yr ago, consistent with Sub-Saharan African ancestry that is relatively recent (Omberg et al. 2012).
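PSMC reports time and population size in scaled units, so placing a trough at a date in years requires a mutation rate and a generation time. The sketch below follows the scaling described in the PSMC documentation (N0 = θ0 / (4μ) / s, time in generations = 2·N0·t, population size = N0·λ). The mutation rate, generation time, and bin size used as defaults here are placeholder assumptions, not values taken from this study, and the function name is hypothetical.

```python
def scale_psmc(theta0, t_k, lambda_k, mu=1.25e-8, bin_size=100, gen_time=25.0):
    """Convert scaled PSMC output to years and effective population sizes.

    theta0:   scaled mutation rate per bin reported by PSMC.
    t_k:      scaled interval boundaries (units of 2*N0 generations).
    lambda_k: relative population sizes per interval.
    mu:       per-site, per-generation mutation rate (assumed value).
    bin_size: number of base pairs per PSMC bin (assumed value).
    gen_time: years per generation (assumed value).
    """
    n0 = theta0 / (4.0 * mu) / bin_size
    years = [2.0 * n0 * t * gen_time for t in t_k]
    sizes = [n0 * lam for lam in lambda_k]
    return years, sizes
```

Because both axes scale with the assumed mutation rate, and the time axis also with the assumed generation time, the absolute dates of a trough shift with these choices while its shape does not.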

Admixture analysis

The signal of an ancient bottleneck in the Q1 (Bedouin) is not unexpected given previous analyses of genomic admixture that found <1% African ancestry in this subpopulation (Omberg et al. 2012) and studies of worldwide population structure, which have inferred that the Q1 (Bedouin) genomes have the greatest proportion of Arab genetic ancestry, even when compared to Bedouins from outside Qatar and to Arabs in surrounding countries, including Yemen and Saudi Arabia (Hodgson et al. 2014; Shriner et al. 2014). To confirm a similarly minute amount of African admixture for the Q1 (Bedouin) in our sample, we applied three methodologies: (1) an ADMIXTURE (Alexander et al. 2009) analysis of the genome-wide ancestry proportions in the 104 Qataris, the 1000 Genomes Project (The 1000 Genomes Project Consortium 2012), and Human Origins samples (Lazaridis et al. 2014); (2) an ALDER (Loh et al. 2013) analysis of the proportion and timing of African ancestry in these same populations; and (3) a SupportMix (Omberg et al. 2012) analysis of the population assignments of local genomic segments of the 96 Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qatari genomes.

[Figure 1 image: panels A and B, each a scatter plot with axes PC1 and PC2. Populations in the panel B legend: Bedouin A, Bedouin B, Druze, Egyptian Comas, Egyptian Metspalu, Iranian, Jordanian, Lebanese, Palestinian, Q0 (Unassigned), Q1 (Bedouin), Q2 (Persian-South Asian), Q3 (African), Saudi, Syrian, Turkish Adana, Turkish Istanbul, Turkish Trabzon, Yemen.]

Figure 1. Principal component analysis (PCA) (Price et al. 2006) of the 104 Qatari genomes (circle), 1000 Genomes (triangle), and Human Origins (square) study samples. Shown are individuals plotted on principal components PC1 and PC2, with genomes color-coded by study and population, with the Q0 (Subpopulation Unassigned) in gray, Q1 (Bedouin) in red, Q2 (Persian-South Asian) in azure, and Q3 (African) in black. (A) Plot of all populations, defined by study and by population, in which all populations from the same region and study are grouped and color-coded together (1000 Genomes: Africa, America, East Asia, and Europe; Human Origins: Africa, America, Central Asia/Siberia, East Asia, Middle East, Oceania, South Asia, and West Eurasia). (B) Plot of Middle Eastern subpopulations from Human Origins that cluster near Q1 (Bedouin) and Q2 (Persian-South Asian).

The ADMIXTURE analysis identified K = 12 ancestral populations as having the lowest cross-validation error (Supplemental Fig. 7A). At this level of resolution, the Q1 (Bedouin) had a high average (84%) proportion of ancestry that was also present in the Human Origins Bedouin B population at a high average proportion (93%) (Supplemental Fig. 7B,C), in which this same ancestry was also shared with Saudis, and at lower levels among other Middle Eastern populations. This ancestry therefore appears to be the signal of an indigenous Arab ancestral population. The Bedouin A population also shared this ancestry but at a lower average proportion (45%) and appeared to be more admixed overall. The Q2 (Persian-South Asian) shared a large proportion (45% on average) of ancestry that dominates in Iranians (46% on average), consistent with a Persian ancestral population (Omberg et al. 2012). The Q3 (African) shared the majority of ancestry with African populations as expected and were considerably admixed overall, again consistent with the known history of this subpopulation (Supplemental Fig. 7A; Omberg et al. 2012).

The ALDER analysis determined the relative percentage of African (Yoruba) ancestry in the Q1 (Bedouin) (2.6% ± 1.37) and Q2 (Persian-South Asian) (5.0% ± 1.41) at levels on par with estimates for other populations sampled in the region (Supplemental Fig. 8; Supplemental Table X), including Human Origins Bedouin and Saudi. This confirmed that recent African admixture is limited to the Q3 (African) subpopulation (37.6% ± 0.9), in which this estimate is on par with African American populations. An estimate of the timing of African admixture placed the number of generations for Q1 (Bedouin) (15.2) and Q2 (Persian-South Asian) (14.0) slightly higher than Q3 (African) (9.3), consistent with the Q1 (Bedouin) and Q2 (Persian-South Asian) reflecting more distant African admixture events and with the Q3 (African) reflecting the historical timing of the African slave trade in the region (Omberg et al. 2012).

The SupportMix analysis used six of the 1000 Genomes populations (two European, two Asian, and two African) (see Supplemental Methods for details) as ancestral proxy reference panels and produced a set of “best guess” admixture assignments based on highest similarity to these genomes. Although these 1000 Genomes populations do not include appropriate local populations most closely related to the Qataris needed for assessment of the true admixture composition of the genomes, the ancestry track length distribution of haplotypes assigned to African populations (Yoruba or Luhya) provides a qualitative indicator of whether the subpopulations experienced recent admixture with African populations. As expected, the track lengths of the Q1 (Bedouin) and Q2 (Persian-South Asian) assigned to African 1000 Genomes populations were far shorter than those for Q3 (African) (Supplemental Fig. 9), again confirming that recent African admixture is limited to the Q3 (African) subpopulation.
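The admixture dates reported above as numbers of generations come from the rate at which admixture-induced linkage disequilibrium decays with genetic distance, approximately in proportion to exp(−n·d) for n generations and distance d in Morgans. The sketch below recovers n from such a curve with a log-linear least-squares fit. It is a deliberately simplified assumption for illustration: ALDER itself fits a weighted LD curve with an additional affine term and jackknife standard errors, and the function name here is hypothetical.

```python
import numpy as np


def fit_admixture_generations(distance_morgans, weighted_ld):
    """Estimate generations since admixture from an exponential LD decay curve.

    Fits log(weighted_ld) = log(amplitude) - n * distance by least squares.
    Requires strictly positive weighted_ld values.
    Returns (n_generations, amplitude).
    """
    d = np.asarray(distance_morgans, dtype=float)
    y = np.asarray(weighted_ld, dtype=float)
    slope, intercept = np.polyfit(d, np.log(y), 1)
    return -slope, np.exp(intercept)
```

Older admixture produces faster decay (larger n), so ancestry tracts are shorter. That is the same qualitative pattern as the shorter African-assigned track lengths seen in the Q1 (Bedouin) and Q2 (Persian-South Asian) relative to the Q3 (African).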

Neanderthal ancestry

We next analyzed Neanderthal admixture contributions to the ancestry of Q1 (Bedouin) compared to the Q2 (Persian-South Asian) and Q3 (African) Qataris, the 1000 Genomes populations, and the populations of the Human Origins samples using the F4 ratio and Patterson’s D-statistic (Fig. 4; Supplemental Fig. 10, Supplemental Table XI; Patterson et al. 2012). The results for both methods were highly correlated (Supplemental Fig. 10A). The Q1 (Bedouin; F4 ratio = 0.026, D-statistic = 0.000) had more Neanderthal admixture than all African populations, including Q3 (African; F4 ratio range = −0.017 to 0.024, D-statistic range = −0.031 to −0.003). The Q1 (Bedouin) also had Neanderthal admixture at levels comparable to Q2 (Persian-South Asian; F4 ratio = 0.024, D-statistic = −0.003) and to other Middle Eastern populations, including other Bedouin populations (Human Origins Bedouin A F4 ratio = 0.022, D-statistic = −0.003 and Bedouin B F4 ratio = 0.024, D-statistic = −0.003) and Saudi (F4 ratio = 0.026, D-statistic = −0.001). Interestingly, the Q1 (Bedouin) did not tend to have higher Neanderthal admixture levels when considering populations outside of the Middle East, where the bulk of European populations had higher Neanderthal admixture (F4 ratio range = 0.018 to 0.041, D-statistic range = 0.003 to 0.010). Yet, the percentage of Neanderthal admixture in the Q1 (Bedouin) was higher than expected if it could be entirely explained by later admixture events between the Q1 (Bedouin) and Europeans (observed F4 ratio = 0.026 versus expected F4 ratio = 0.00247).

[Figure 2 image: pie charts for Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) in two panels, (A) Y chromosome haplogroups and (B) Mitochondrial DNA haplogroups, with each haplogroup labeled by its region of origin.]

Figure 2. Y Chromosome (Chr Y) and mitochondrial DNA (mtDNA) haplogroup assignments. The Chr Y and mtDNA haplogroups were determined for Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African). (A) Pie charts of the haplogroup frequencies for Chr Y. (B) Pie charts of the haplogroup frequencies for mtDNA.

The higher Neanderthal ancestry in the Q1 (Bedouin) Qatari compared to African populations places the divergence of ancestral Arabs after the out-of-Africa bottleneck. Given the current evidence of the geographic range of Neanderthal populations stretching from Europe and the Mediterranean through Northern and Central Asia (Fu et al. 2014; Hershkovitz et al. 2015), the lower Neanderthal ancestry in the Q1 (Bedouin) Qatari compared to populations within the ancestral Neanderthal range is also consistent with an early divergence of the ancestors of indigenous Arabs from other lineages that populated Asia and Europe. Yet, since the Neanderthal admixture in the Q1 (Bedouin) cannot be entirely explained by admixture with Europeans, this indicates there was some admixture between Neanderthals and ancestors of the Q1 (Bedouin) in the region of the Arabian Peninsula.
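The two statistics used in this section can be written compactly in terms of per-SNP allele frequencies. The sketch below follows the definitions in Patterson et al. (2012): f4(A, B; C, D) averages (a − b)(c − d) over SNPs, the D-statistic normalizes the same numerator, and the F4 ratio estimates an admixture proportion as a quotient of two f4 statistics. The population roles in the argument names are generic placeholders; which populations filled each role in this study is not stated in this section, and standard errors (typically from a block jackknife) are omitted.

```python
import numpy as np


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    return np.mean((a - b) * (c - d))


def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D for populations (P1, P2; P3, Outgroup) from allele frequencies."""
    p1, p2, p3, o = (np.asarray(x, dtype=float) for x in (p1, p2, p3, outgroup))
    numerator = np.sum((p1 - p2) * (p3 - o))
    denominator = np.sum((p1 + p2 - 2.0 * p1 * p2) * (p3 + o - 2.0 * p3 * o))
    return numerator / denominator


def f4_ratio(a, o, x, c, b):
    """Admixture proportion alpha = f4(A, O; X, C) / f4(A, O; B, C).

    X is modeled as a mixture of a population related to B (fraction alpha)
    and a population related to C (fraction 1 - alpha).
    """
    return f4(a, o, x, c) / f4(a, o, b, c)
```

If the test population's frequencies are exactly a mixture alpha·B + (1 − alpha)·C, the ratio returns alpha, which is why the statistic is read as an ancestry proportion.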

TreeMix analysis

We also analyzed the autosomes of the combined 96 Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qataris, and non-admixed populations of the 1000 Genomes Project using the population split and mixture inference method TreeMix (Pickrell and Pritchard 2012) to assess the relative genetic similarity of populations based on high-density, genome-wide allele frequencies. The analysis returned an overall tree for the 1000 Genomes populations that mirrored those found previously (Shriner et al. 2014) with the addition of the Q1 (Bedouin) and Q2 (Persian-South Asian) clustering on the branch that includes Europeans (Pérez-Miranda et al. 2006) and the Q3 (African) clustering with African populations (Fig. 5). When migrations were allowed in the analysis, no migration events were observed between the Q1 (Bedouin) and African populations, even when allowing as many as five migration events (Supplemental Fig. 11). These results are also consistent with what is known of the migration history of the Arabian Peninsula, including migration both to and from Europe during ancient and more recent eras of civilization, where this resulted in detectable admixture from European populations in both the Q1 (Bedouin) and Q2 (Persian-South Asian) (Omberg et al. 2012).
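Because TreeMix works from per-population allele counts rather than individual genotypes, the data preparation step can be sketched as follows. This is a minimal illustration that assumes genotypes are coded as alternate-allele counts (0, 1, 2) with no missing calls, and that the input is a gzipped text file with a header of population names followed by one line per SNP of comma-separated allele-count pairs. The function names and the toy data are hypothetical, and the study's actual preprocessing is not described in this section.

```python
import gzip

import numpy as np


def allele_counts(genotypes):
    """Return (ref_counts, alt_counts) per SNP for one population.

    genotypes: (n_individuals, n_snps) array of alt-allele counts 0/1/2.
    """
    g = np.asarray(genotypes)
    alt = g.sum(axis=0)
    ref = 2 * g.shape[0] - alt  # diploid: two alleles per individual
    return ref, alt


def treemix_lines(pop_genotypes):
    """Build the text lines: a header, then one 'ref,alt' pair per population."""
    names = list(pop_genotypes)
    counts = [allele_counts(pop_genotypes[name]) for name in names]
    n_snps = len(counts[0][0])
    lines = [" ".join(names)]
    for s in range(n_snps):
        lines.append(" ".join(f"{ref[s]},{alt[s]}" for ref, alt in counts))
    return lines


def write_treemix(path, pop_genotypes):
    """Write the lines to a gzipped file (path is chosen by the caller)."""
    with gzip.open(path, "wt") as handle:
        handle.write("\n".join(treemix_lines(pop_genotypes)) + "\n")
```

Working from counts rather than frequencies keeps the sample size of each population available to the method, which matters when population sizes differ as much as they do here (56 Q1 versus 20 each of Q2 and Q3).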

Proportion of shared alleles neighbor-joining analysis

As the principal component analysis and the TreeMix population-level clusterings depend on allele frequencies, the clustering of the Q1 (Bedouin) on a common branch with European populations could be driven by the haplotypes introduced by migrants, which would be expected to shift the allele frequencies of these populations toward each other. As such, these clusterings based on allele frequencies do not necessarily argue against significant and deep ancestry of the Q1 (Bedouin) on the Arabian Peninsula,

Figure 3. Ancient bottlenecks in the 96 Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qatari genomes (56 Q1, 20 Q2, 20 Q3) determined by pairwise sequential Markov coalescent analysis (Li and Durbin 2011). Shown is the plot of the median effective population size (y-axis) across individuals in a subpopulation versus years in the past (log scale x-axis) for the samples in the three major Qatari subpopulations: Q1 (Bedouin) in red, Q2 (Persian-South Asian) in azure, Q3 (African) in black. A single individual of European ancestry (NA12879, violet) and a single individual of African ancestry (NA19239, orange) from the 1000 Genomes Project deep-coverage pilot (The 1000 Genomes Project Consortium 2010) are shown for comparison.

as indicated by the levels of Neanderthal admixture in this subpopulation. Additionally, these population-level clusterings are disproportionately influenced by common segregating alleles (Pickrell and Pritchard 2012), while rare alleles can be more informative about deeper shared ancestry (Mathieson and McVean 2014) as the identity by state of a rare variant can more accurately reflect identity by descent (Hochreiter 2013).

In contrast to population-level clustering, a pairwise clustering of individual genomes based on shared variants provides a relative measure for comparing total shared ancestry between individuals. Also, when applied to a common set of genome-wide, high-density markers that include the low-minor allele frequency alleles of the 1000 Genomes Project, such pairwise clustering also provides an appropriate weight to rare alleles. We therefore performed a proportion of shared alleles (Mountain and Cavalli-Sforza 1997) analysis on the combined samples in the 104 Qatari and the 1000 Genomes samples, in which pairwise proportion of shared alleles was calculated for the 11,711,386 autosomal, biallelic SNPs segregating in both the 104 Qatari and the 1000 Genomes samples. A robust version of the neighbor-joining algorithm was used to perform a pairwise clustering of the samples (Fig. 6A–F; Criscuolo and Gascuel 2008), in which bootstrap support values were calculated for the observed trees using 100 random samplings of the SNPs.
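The pairwise measure and the SNP resampling described above can be sketched in a few lines. This minimal example assumes biallelic genotypes coded as alternate-allele counts (0, 1, 2) with no missing calls, and counts the alleles shared at a SNP as 2 − |g_i − g_j|, so that two identical heterozygotes share two alleles. That convention, the function names, and the toy sizes are assumptions for illustration; the exact weighting used in the study may differ. The resulting distance matrices would then be passed to a neighbor-joining implementation, which is not shown here.

```python
import numpy as np


def proportion_shared_alleles(genotypes):
    """Pairwise proportion of shared alleles between individuals.

    genotypes: (n_individuals, n_snps) array of alt-allele counts 0/1/2.
    Returns an (n_individuals, n_individuals) matrix with ones on the diagonal.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    psa = np.ones((n, n))
    for i in range(n):
        # Alleles shared with individual i at each SNP: 2 - |g_i - g_j|.
        shared = 2.0 - np.abs(g - g[i])
        psa[i] = shared.mean(axis=1) / 2.0
    return psa


def psa_distance(genotypes):
    """Distance used for clustering: one minus the proportion shared."""
    return 1.0 - proportion_shared_alleles(genotypes)


def bootstrap_distances(genotypes, n_replicates=100, seed=0):
    """Yield one distance matrix per bootstrap resampling of the SNPs."""
    rng = np.random.default_rng(seed)
    g = np.asarray(genotypes)
    n_snps = g.shape[1]
    for _ in range(n_replicates):
        columns = rng.integers(0, n_snps, size=n_snps)  # sample with replacement
        yield psa_distance(g[:, columns])
```

The loop over individuals keeps memory use proportional to one row of comparisons at a time. For millions of SNPs and more than a thousand samples, a chunked or compiled implementation would be the practical choice.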

The neighbor-joining analysis revealed that 50 of the 56 Q1 (Bedouin), along with three Q2 (Persian-South Asian), one Q3 (African), and two Q0 (Subpopulation Unassigned) Qataris, clustered outside African lineages and also formed the most extreme outgroup, basal to all non-African populations lacking recent African admixture (Fig. 6D). Strong bootstrap support was observed for this cluster (70 of 100 iterations), and for its presence as an outgroup to the Eurasian cluster (68 of 100 iterations), comparable to the support for the Japanese cluster (60 of 100 iterations) and for the East Asians as an outgroup to Europeans and Americans (81 of 100 iterations). The Q1 (Bedouin) therefore fit the criteria of having ancient migration from Africa and being most distantly related to all other non-Africans in total ancestry.
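A support value such as "70 of 100 iterations" is a count of bootstrap trees in which the same set of samples appears as a clade. A minimal sketch of that tally follows, assuming each replicate tree has already been reduced to the set of tip-label groups it contains. The function name, the data representation, and the toy labels are hypothetical.

```python
def clade_support(reference_clade, replicate_clades):
    """Count bootstrap replicates that contain the reference clade.

    reference_clade: iterable of tip labels defining the cluster of interest.
    replicate_clades: list with one entry per replicate tree, each a set of
        frozensets of tip labels (the clades present in that tree).
    Returns (number of replicates containing the clade, total replicates).
    """
    target = frozenset(reference_clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return hits, len(replicate_clades)


# Toy usage with three replicate trees over tips a, b, c.
example_replicates = [
    {frozenset({"a", "b"}), frozenset({"a", "b", "c"})},
    {frozenset({"a", "c"}), frozenset({"a", "b", "c"})},
    {frozenset({"a", "b"}), frozenset({"a", "b", "c"})},
]
example_support = clade_support(["a", "b"], example_replicates)
```

Requiring an exact match of tip sets is the strictest reading of support. With more than a thousand tips, a single sample moving in or out of a large cluster breaks the match, which is one reason large clusters rarely reach 100 of 100.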

A total of 11 Q2 (Persian-South Asian), three Q1 (Bedouin), one Q3 (African), and one Q0 (Subpopulation Unassigned) defined an Asian outgroup more closely related to Asians than the main Q1 (Bedouin) outgroup (Fig. 6C), likely driven by the ancestry of the Q2 (Persian-South Asian) subpopulation traceable to Persia and South Asia (Omberg et al. 2012) and indicating these individuals are most distantly related to other Asians present in this cluster. A total of 12 Q3 (African), three Q1 (Bedouin), three Q2 (Persian-South Asian), and four Q0 (Subpopulation Unassigned) cluster as long individual branches or small clusters between the major Q1 (Bedouin) cluster and the admixed individuals of African ancestry

[Figure 4 plot. Axes: population, in order of increasing Neanderthal ancestry proportion (F4 ratio); Neanderthal ancestry proportion, with ticks from −0.020 to 0.110. Legend (study, population): 1000 Genomes (Africa, America, East Asia, Europe); Human Origins (Africa, America, Central Asia/Siberia, East Asia, Middle East, Oceania, South Asia, West Eurasia); Qatari Genomes (Q1 [Bedouin], Q2 [Persian-South Asian], Q3 [African]).]

Figure 4. Neanderthal ancestry in world populations. F4 ratio estimation as implemented in ADMIXTOOLS 3.0 (Patterson et al. 2012) was used to calculate the Neanderthal ancestry proportion for each population in the combined data set of Qatari genomes, the 1000 Genomes Project, and Human Origins. The F4 ratio estimates α, the proportion of Neanderthal ancestry in a population. Shown are the results for populations of interest, including highest and lowest scoring populations from each region (the 1000 Genomes Project, Africa; the 1000 Genomes Project, America; the 1000 Genomes Project, East Asia; the 1000 Genomes Project, Europe; Human Origins, Africa; Human Origins, America; Human Origins, Central Asia/Siberia; Human Origins, East Asia; Human Origins, Oceania; Human Origins, South Asia; Human Origins, West Eurasia), Middle Eastern populations (Human Origins), Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African). Populations are color-coded by region, and a distinct color is used for each Qatari population. A full set of results is presented in Supplemental Figure 10 and Supplemental Table XI. The population codes are as in the 1000 Genomes Project (The 1000 Genomes Project Consortium 2012).

from Southwest US (ASW), potentially representing individuals with a higher proportion of African admixture. As expected from the analyses of population genetic similarity and prior neighbor-joining analysis of admixed populations (Kopelman et al. 2013), the Q3 (African) and African Americans do not form large clusters, but rather appear as multiple individual branches close to the indigenous African populations, most similar to their African admixture source (Fig. 6E,F). A set of three Q2 (Persian-South Asian) clustered as an outgroup to the Tuscan Southern European (TSI) branch (Fig. 6B), which is not unexpected given admixture with European populations (Omberg et al. 2012; Pickrell et al. 2014).

Discussion

The hypothesis that the first Eurasian populations were established on the Arabian Peninsula and that contemporary indigenous Arabs are direct descendants of this ancient population is supported by two major conclusions derived from the combined evidence of this study. First, the analysis results for X/A diversity, the pairwise sequential Markov coalescent, genome-wide admixture, timing of African admixture, local admixture deconvolution, Neanderthal admixture, and application of TreeMix support the inference that the Q1 (Bedouin) can trace the bulk of their ancestry back to the out-of-Africa migration events. Second, the combination of lower levels of Neanderthal admixture in the Q1 (Bedouin) than in European/Asian populations and the outgroup position of the Q1 (Bedouin) compared to non-Africans in the pairwise similarity clustering of high-density variants measured genome-wide places the Q1 (Bedouin) as the most distant relatives of other contemporary non-Africans. Given that the Q1 (Bedouin) have the greatest proportion of Arab genetic ancestry measured in contemporary populations (Hodgson et al. 2014; Shriner et al. 2014) and are among the best genetic representatives of the autochthonous population on the Arabian Peninsula, these two conclusions therefore point to the Bedouins being direct descendants of the earliest split after the out-of-Africa migration events that established a basal Eurasian population (Lazaridis et al. 2014). This is also consistent with the majority of Q1 (Bedouin) being able to trace a significant portion of their autosomal ancestry through lineages that never left the peninsula after the out-of-Africa migration events, since such deep ancestry would not be expected if the entire Arabian Peninsula population had been reestablished from Africa or a non-African population at a later point.

Given the complex history of migration patterns to and from European populations, and the complicated patterns of isolation and intra- and inter-marriage of the indigenous Bedouin populations (Hunter-Zinck et al. 2010; Sandridge et al. 2010), it is not surprising that among the Q1 (Bedouin) are individuals who retain an autosomal signal of being the most distant relatives of non-Africans, while population-level clustering based on migration-shifted allele frequencies places the Q1 (Bedouin) closer to Europeans. The basal position of the Q1 (Bedouin) also has interesting implications for theories about the frequency, timing, and path of major migration waves that established populations in Asia and Europe (Shi et al. 2008; Lazaridis et al. 2014; Shriner et al. 2014). A few isolated Asian populations were previously suspected to be descendants of a separate out-of-Africa migration wave based on Y Chromosome data (Hammer et al. 1998; Shi et al. 2008). Yet, distinct out-of-Africa migration events or separate migration waves emanating from the Arabian Peninsula into Europe and West Asia would be expected to place Bedouins/Europeans and Asians on separate branches of a pairwise clustering tree, distinct from our finding that places the Q1 (Bedouin) as direct descendants of the earliest lineage that split from the ancient non-African population.

A demographic scenario consistent with the evidence presented here is that the population ancestral to the Q1 (Bedouin) migrated out of Africa, and a subset of this population remained in the peninsula until the present day, while a second subset of this population migrated onward and colonized Eurasia. This migration scenario implies the signal of the same bottleneck would be present in all non-African populations, which has been observed thus far in coalescent analysis of contemporary non-African populations (Gronau et al. 2011; Fu et al. 2014; Schiffels and Durbin 2014) and for an anatomically modern human who lived 45,000 yr ago (Fu et al. 2014). This is also consistent with the recent discovery of another anatomically modern human who lived 55,000 yr ago just northeast of the Arabian Peninsula that had morphological features similar to European peoples (Hershkovitz et al. 2015), where this individual could have been a descendant of the basal Eurasian population that remained on the peninsula. Under this migration scenario, although other waves of migration may have occurred, the descendants of these alternative waves either left no descendants or were integrated into the dominant populations.

Beyond the importance for disentangling human migration history, an early split of Eurasian lineages in the Arabian Peninsula has implications for the study of disease genetics for indigenous people in the region. For example, for a disease such as type 2 diabetes that has a prevalence of >18% in the Qatari population, associated genetic variants would not a priori be expected to be the same as those discovered in Europeans, when considering that indigenous Arabs are able to trace a significant portion of their ancestry back to ancient lineages on the Arabian Peninsula. More

[Figure 5 tree. Tip labels: Bedouin (Q1), Persian-South Asian (Q2), African (Q3), Tuscan Italian (TSI), Iberian Spanish (IBS), Finnish (FIN), Utahns (CEU), British (GBR), Japanese (JPT), Han Chinese Beijing (CHB), Han Chinese South (CHS), Luhya Kenyan (LWK), Yoruba Nigerian (YRI). Scale bar: 0.002.]

Figure 5. TreeMix (Pickrell and Pritchard 2012) hierarchical clustering analysis of the Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) and the 1000 Genomes Project samples. Shown is a maximum-likelihood tree of population splits inferred without subsequent migration events, in which branch lengths estimate divergence between populations (Europeans in shades of purple: CEU, FIN, GBR, IBS, TSI; East Asians in shades of brown: CHB, CHS, JPT; Africans in shades of orange: LWK, YRI, with the Q1 [Bedouin] in red, Q2 [Persian-South Asian] in azure, and Q3 [African] in black). When allowing from one to five migration events in separate TreeMix analyses, none of the admixture loops connected the Q1 (Bedouin) with any African populations (Supplemental Fig. 10), consistent with the Q1 (Bedouin) having no recent African admixture.

generally, this suggests that for any genome-wide association study (GWAS) or rare variant association study (RVAS) of diabetes or other complex diseases in Qatar, inference of deep ancestry in the Arabian Peninsula, using rare variation sampled by genome or exome sequencing, is critical for identifying new disease risk genes. Given the dearth of next generation sequencing studies conducted in Middle Eastern and Arab populations, these results indicate that a considerable number of variants that make important contributions to disease risk in these populations are yet to be discovered.

This study is the first analysis of Arabian Peninsula migration making use of deeply sequenced genomes from a sample of unrelated inhabitants of the peninsula. Although there have been many analyses of Chr Y and mtDNA sampled from Arab individuals (Abu-Amero et al. 2007, 2008, 2009; Rowold et al. 2007), and there have been previous surveys of genetic variation of people within the peninsula and immediately surrounding regions conducted with genotyping arrays (Behar et al. 2010; Hunter-Zinck et al. 2010; Alsmadi et al. 2013; Markus et al. 2014; Shriner et al. 2014) and deep exome sequencing (Rodriguez-Flores et al. 2012, 2014; Alsmadi et al. 2014), and by individual high-coverage genomes (Alsmadi et al. 2014; John et al. 2015), the sample of rare and common genetic variation throughout the genome in our sample provides a far more complete picture of how both ancient and recent migration events have contributed to the genetics of the modern peoples of the Arabian Peninsula. For understanding how human migration history has determined the structure of modern genomes, our identification of a cluster of Q1 (Bedouin) as the most distant ancestors of non-Africans is of considerable interest, particularly given the suspected route of migration out of Africa and into the surrounding continents. The possibility that the Q1 (Bedouin) are descendants of the first Eurasians provides an additional piece of the puzzle concerning ancient migration routes and the establishment of ancient non-African populations.

Methods

Ethics statement

Human subjects were recruited, and written informed consent was obtained at Hamad Medical Corporation (HMC) and HMC Primary Health Care Centers, Doha, Qatar, under protocols approved by the Institutional Review Boards of Hamad Medical Corporation and Weill Cornell Medical College in Qatar.

Inclusion criteria

Qatar is a peninsula nation on the eastern edge of the Arabian Peninsula (Supplemental Fig. 1). The population of Qatar includes more than 2 million inhabitants, comprised of ∼300,000 nationals with roots in Qatar predating the discovery of oil and gas and establishment of an independent nation in 1970 and the more than 1.7 million immigrants who mostly arrived in the past

[Figure 6 tree, panels A–F. Legend: British (GBR), Utahn (CEU), Finn (FIN), Colombian (CLM), Mexican (MXL), Puerto Rican (PUR), Iberian Spanish (IBS), Tuscan Italian (TSI), Persian-South Asian (Q2), Han Chinese South (CHS), Han Chinese Beijing (CHB), Japanese (JPT), Bedouin (Q1), African American (ASW), Subpopulation Unassigned (Q0), African (Q3), Luhya Kenyan (LWK), Yoruba Nigerian (YRI). Tips are labeled with individual sample identifiers.]

Figure 6. Neighbor-joining tree hierarchical clustering analysis of the combined Qatari genomes and the 1000 Genomes Project Phase 1 samples based on pairwise proportion of shared alleles calculated across the entire autosome. (A) The entire neighbor-joining tree with each of the branches leading to individuals in the 1000 Genomes samples color-coded by continent (Europeans in shades of purple: CEU, FIN, GBR, IBS, TSI; Asians in shades of brown: CHB, CHS, JPT; Africans in shades of orange: LWK, YRI, ASW; Americans in shades of green: CLM, MXL, PUR) and with the Q1 (Bedouin) in red, Q2 (Persian-South Asian) in azure, Q3 (African) in black, and Q0 (Subpopulation Unassigned) in gray. (B) Detail of the three (15%) Q2 (Persian-South Asian) that cluster with Europeans. (C) Detail of the 11 (55%) Q2 (Persian-South Asian) individuals, with three (5%) Q1 (Bedouin), one (5%) Q3 (African), and one (13%) Q0 (Subpopulation Unassigned) that cluster as an outgroup to Asians. (D) Detail of the 50 (89%) Q1 individuals, with three (15%) Q2 (Persian-South Asian), one (5%) Q3 (African), and two (25%) Q0 (Subpopulation Unassigned), that cluster outside the Africans and African Ancestry in Southwest US and that also cluster as an outgroup to all other non-African populations, indicating that they are the most distant ancestors of all non-Africans. (E) Detail showing the three (15%) Q1 (Bedouin), three (15%) Q2 (Persian-South Asian), 12 (60%) Q3 (African), and four (50%) Q0 (Subpopulation Unassigned) that do not form large clusters but are all located within the admixed cluster. (F) Detail of the one (5%) Q3 (African) that clusters between Yoruba (YRI) and Luhya (LWK).


decade (Qatar Statistics Authority 2013, http://www.qsa.gov.qa/QatarCensus/Pdf/Population above 15 by educational attainment,nationality, age, sex and marital status.pdf). As selection criteria, we required that subjects be third-generation Qataris whose ancestors were all Qatari citizens born in Qatar, as assessed by questionnaires. Recent immigrants or residents of Qatar who traced their recent ancestry to other geographic regions were excluded.

Natives of the Arabian Peninsula can be divided into at least three genetic subpopulations that reflect the historical migration patterns in the region: Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) (Hunter-Zinck et al. 2010; Omberg et al. 2012; Rodriguez-Flores et al. 2012). A panel of 48 SNPs, sufficient for classification into one of the three subpopulations, was genotyped by TaqMan (Life Technologies). Individuals were placed unambiguously in one of these three groups if they had >70% ancestry in one cluster in a STRUCTURE analysis with k = 3 (Supplemental Fig. 2; Pritchard et al. 2000; Rodriguez-Flores et al. 2012). Our primary focus was the Q1 (Bedouin) genetic subpopulation because it has the deepest ancestry in Arabia (Ferdinand et al. 1993), so we selected 60 Q1 (Bedouin) individuals to include in the sample. We additionally selected 20 Q2 (Persian-South Asian) and 20 Q3 (African) individuals to use as controls in the analysis, and an additional eight Q0 (Subpopulation Unassigned) individuals that could not be confidently placed in one of these subpopulations, defined as not having >70% ancestry in any of the three groups as determined by STRUCTURE. The total sample therefore included 108 individuals with an even distribution of males and females (see Supplemental Methods; Supplemental Table I).
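
The >70% assignment rule described above can be illustrated with a minimal Python sketch. It assumes the three STRUCTURE ancestry proportions for an individual are already available; the column order, the function name, and the example values are hypothetical and not taken from the study.

```python
from typing import Sequence

# Cluster labels as used in the text; the column order is an assumption.
LABELS = ("Q1 (Bedouin)", "Q2 (Persian-South Asian)", "Q3 (African)")
UNASSIGNED = "Q0 (Subpopulation Unassigned)"


def assign_subpopulation(proportions: Sequence[float], threshold: float = 0.70) -> str:
    """Return the subpopulation whose ancestry proportion exceeds the threshold.

    `proportions` holds the k = 3 ancestry fractions for one individual
    (hypothetical input; STRUCTURE itself is not run here).
    """
    if len(proportions) != len(LABELS):
        raise ValueError("expected one proportion per cluster")
    best = max(range(len(LABELS)), key=lambda i: proportions[i])
    # Strictly greater than the threshold, matching ">70% ancestry" in the text.
    return LABELS[best] if proportions[best] > threshold else UNASSIGNED


# Invented example individuals.
print(assign_subpopulation([0.85, 0.10, 0.05]))
print(assign_subpopulation([0.50, 0.30, 0.20]))
```

An individual with no cluster above the threshold falls into Q0, which mirrors how the eight unassigned individuals are defined in the text.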

Illumina deep sequencing of the genomes

In order to characterize the spectrum of genetic variation, each of the 108 Qatari genomes was sequenced to a median depth of 37× (minimum 30×) through the Illumina Genome Network (see Supplemental Methods for details).

Relatedness among Qataris

Given the high rate of consanguineous marriage previously reported in the Qatari population (Hunter-Zinck et al. 2010; Mezzavilla et al. 2015), we sought to quantify the relatedness between individuals in our sample and to exclude closely related individuals, who could confound population genetics analysis methods that assume the input samples are unrelated. To conduct the relatedness analysis, autosomal SNPs in the 108 Qatari genomes (described above) were filtered using PLINK 1.9 (Chang et al. 2015), and relatedness between the genomes was assessed using kinship coefficients estimated by KING-robust (Manichaikul et al. 2010) and PREST-plus (McPeek and Sun 2000) (see Supplemental Methods). Both methods found the same five first-degree and second-degree relationships, which were then confirmed by reassessment of medical records. Individuals were then excluded so that no pair of relatives remained in the study. Three of the five pairs of relatives formed a trio; hence, two individuals were excluded from the trio, and one individual was excluded from each of the other two pairs, resulting in exclusion of four relatives in total.
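
The exclusion arithmetic above (five related pairs, four exclusions) can be reproduced with a short Python sketch. The text does not say how individuals were chosen for exclusion, so the greedy rule below is an assumption, and the sample IDs are invented.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def exclude_relatives(pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """Greedily pick individuals to drop until no related pair remains."""
    remaining = {tuple(sorted(p)) for p in pairs}
    excluded: Set[str] = set()
    while remaining:
        degree = defaultdict(int)
        for a, b in remaining:
            degree[a] += 1
            degree[b] += 1
        # Drop the individual involved in the most remaining pairs
        # (ties broken alphabetically so the result is deterministic).
        worst = min(degree, key=lambda s: (-degree[s], s))
        excluded.add(worst)
        remaining = {p for p in remaining if worst not in p}
    return excluded


# Hypothetical IDs: one trio (three mutually related individuals)
# plus two independent pairs, i.e. five related pairs in total.
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("F", "G")]
dropped = exclude_relatives(pairs)
print(sorted(dropped))
```

With this input, two members of the trio and one member of each remaining pair are dropped, which gives the four exclusions reported in the text.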

Integration with the 1000 Genomes Project Phase 1

An integrated SNP call set was produced for ancestry analysis for a total of 1200 genomes, combining the 108 Qatari genomes with the 1092 genomes from the 1000 Genomes Project Phase 1 (1000 Genomes) (The 1000 Genomes Project Consortium 2012) (see Supplemental Methods). The integrated call set included 11,711,411 autosomal biallelic SNPs. The transition:transversion ratio of this final set was 2.2, close to values previously observed in the 1000 Genomes Project (The 1000 Genomes Project Consortium 2012). Based on the concordance and quality measures, the calls generated from our pipeline were considered to be high quality, and these were used for all further aspects of this study. After exclusion of four related Qataris (Supplemental Table III), the final integrated call set included 11,711,386 autosomal biallelic SNPs in 1196 genomes.
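
As an illustration of the transition:transversion quality measure, the following Python sketch counts both classes of substitution from reference/alternate allele pairs. It is a generic sketch, not the pipeline used in the study; the function names and the example alleles are invented.

```python
from typing import Iterable, Tuple

BASES = frozenset("ACGT")
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Transition:transversion ratio over biallelic SNPs given as (ref, alt)."""
    transitions = transversions = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a biallelic SNP: {ref}>{alt}")
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("ratio undefined without transversions")
    return transitions / transversions


# Invented example: three transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(example))
```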

Integration with Human Origins data set

The 1000 Genomes Project Phase 1 is an excellent resource for rare variant discovery; however, it is limited in terms of the breadth of global populations sampled. Unfortunately, at the time of writing, no global resource of sequenced genomes existed; hence, the next best alternative for comparison of the Qataris to populations around the world is the “Human Origins Fully Public Dataset” (referred to here as “Human Origins” [HO]), which includes genotype data for 1917 individuals from Africa, West Eurasia (including Middle East), South Asia, East Asia, Central Asia/Siberia, and America. In particular, the West Eurasian, African, and South Asian data sets include populations sampled in countries close to Qatar, where detection of shared ancestry is of interest in this study. The data set also includes data from archaic genomes, such as Altai Neanderthal, Denisova, and chimpanzee, which are of interest in this study for quantification of Neanderthal ancestry. The Human Origins data set includes a number of samples also present in the 1000 Genomes Project (Supplemental Table IV), and for these samples, the Human Origins data were kept.

In order to conduct population genetic analysis on a combined data set of the 104 Qatari genomes (QG, n = 104), the 1000 Genomes Project Phase 1 (1000G-HO, n = 1028 after exclusion of duplicates), and Human Origins Fully Public Dataset (HO, n = 1862 after exclusion of archaic genomes, ancient genomes, and other genomes not relevant to this study) (Supplemental Table V), a set of sites overlapping between the integrated Qatari genomes plus the 1000 Genomes Project minus Human Origins, and the Human Origins data set was identified. Of 600,841 SNPs in the Human Origins data set and 11,711,386 SNPs in the Qatari genomes plus the 1000 Genomes Project data set, 388,805 SNPs overlapped. Further filtering was conducted on the data set, pruning SNPs based on linkage disequilibrium using PLINK (Purcell et al. 2007), “--indep-pairwise 200 25 0.4,” matching parameters used previously (Lazaridis et al. 2014). After linkage disequilibrium-pruning, the final data set for analysis included 197,714 SNPs segregating in the three data sets (QG, 1000G-HO, and HO).
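
The three PLINK parameters above are a window size in SNPs (200), a step size (25), and an r² threshold (0.4). The following Python sketch shows the general idea of window-based pairwise pruning under those parameters. It is a simplified greedy scheme written for illustration and does not reproduce PLINK's exact algorithm, for example its rule for choosing which SNP of a correlated pair to drop; the genotype vectors are invented.

```python
from typing import List, Sequence


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic site: treated as uncorrelated in this sketch
    return sxy * sxy / (sxx * syy)


def ld_prune(snps: List[Sequence[int]], window: int = 200, step: int = 25,
             r2_max: float = 0.4) -> List[int]:
    """Return indices of SNPs kept after greedy pairwise pruning.

    `snps` holds one genotype vector per SNP, in genomic order.
    """
    keep = [True] * len(snps)
    for start in range(0, len(snps), step):
        idx = [i for i in range(start, min(start + window, len(snps))) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            for j in idx[pos + 1:]:
                # Assumption: drop the later SNP of any pair above the threshold.
                if keep[j] and r_squared(snps[i], snps[j]) > r2_max:
                    keep[j] = False
    return [i for i, kept in enumerate(keep) if kept]


# Invented genotypes for six samples: SNP 1 duplicates SNP 0, SNP 2 is uncorrelated.
toy = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], [0, 0, 1, 2, 2, 1]]
print(ld_prune(toy))
```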

Inbreeding coefficient

In order to place the high reported consanguinity in Qatar in a global context, the inbreeding coefficient was calculated using PLINK 1.9 (Chang et al. 2015) for Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) Qataris, the 1000 Genomes Project minus Human Origins overlap, and Human Origins populations (see Supplemental Methods).
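
For readers unfamiliar with the statistic, a common method-of-moments inbreeding coefficient is F = (O − E) / (N − E), where O is the observed number of homozygous genotypes, E the number expected under Hardy-Weinberg proportions, and N the number of genotyped sites. The Python sketch below implements that formula under simplifying assumptions (known allele frequencies, no small-sample correction). Whether the study used exactly this estimator is described only in its Supplemental Methods, so this is an illustration rather than the authors' procedure.

```python
from typing import Optional, Sequence


def inbreeding_coefficient(genotypes: Sequence[Optional[int]],
                           alt_freqs: Sequence[float]) -> float:
    """Method-of-moments F for one individual.

    genotypes: alternate-allele counts (0, 1, 2) per SNP, None if missing.
    alt_freqs: population alternate-allele frequency per SNP (assumed known).
    """
    observed_hom = 0
    expected_hom = 0.0
    n_sites = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # skip missing genotypes
        n_sites += 1
        if g in (0, 2):
            observed_hom += 1
        # Probability of a homozygous genotype under Hardy-Weinberg proportions.
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n_sites == expected_hom:
        raise ValueError("no informative (polymorphic, non-missing) sites")
    return (observed_hom - expected_hom) / (n_sites - expected_hom)


# Invented examples at four SNPs with allele frequency 0.5.
print(inbreeding_coefficient([0, 2, 0, 2], [0.5] * 4))  # fully homozygous
print(inbreeding_coefficient([0, 1, 2, 1], [0.5] * 4))  # matches expectation
```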

Principal component analysis

A PCA (Price et al. 2006) was carried out for the combined 104 Qatari genomes, the 1000 Genomes Project minus Human Origins overlap, and Human Origins samples using the 197,714 SNPs in the integrated data set (filtering criteria described above). Using the results of this large-scale analysis, visual assessment of clustering and population overlap was used to confirm expected relationships between the analyzed populations. Four distinct plots of a single PCA run were constructed: one comparing the Qatari genomes to the 1000 Genomes populations (Supplemental Fig. 5A); two visualizations of the full data set comparing Qataris to the 1000 Genomes and Human Origins samples (Fig. 1A, color-coded by regional meta-populations; Supplemental Fig. 5B, color-coded by detailed population); and one comparing Qataris to Middle Eastern populations from the Human Origins data set (Fig. 1B). For the latter, in order to compare Qataris to Middle Eastern populations sampled by the Human Origins data set that have potential for recent shared Bedouin ancestry with Qataris, populations from the Middle East previously labeled in Lazaridis et al. (2014) as “West Eurasia” were relabeled as “Middle East,” including BedouinA, BedouinB, Druze, Egyptian Comas, Egyptian Metspalu, Iranian, Jordanian, Lebanese, Palestinian, Saudi, Syrian, Turkish, Turkish Adana, Turkish Aydin, Turkish Balikesir, Turkish Istanbul, Turkish Kayseri, Turkish Trabzon, and Yemen.

Y and mitochondria haplogroup assignment

In order to determine the prevalence of known Chr Y and mtDNA haplogroups in Qatar, SNP genotypes were generated simultaneously for the 108 Qatari genomes using an updated version of GATK (v3.1.1) (DePristo et al. 2011) that supports haploid chromosome calling (n = 53 Chr Y, n = 108 mtDNA). One sample originally thought to be male is most likely female, given its low call rates on Chr Y. This sample was excluded from Chr Y analysis and X/A diversity analysis, but was included in autosomal and mtDNA analysis. Mean coverage of mapped reads was 11× in Chr Y and 3892× in mtDNA. After exclusion of related and Q0 (Subpopulation Unassigned) (admixed) Qataris, the remaining samples included 47 Chr Y and 96 mtDNA.

Haplogroup assignments for the Chr Y and mtDNA were made using previously characterized variants. For Chr Y, these assignments were made using YFitter (Jostins et al. 2014) by using variants limited to known SNPs cataloged by the International Society of Genetic Genealogy (Jobling and Tyler-Smith 2003) within a 10-Mb interval of the Y Chromosome that is known to be amenable to analysis based on short read sequencing (Skaletsky et al. 2003; Poznik et al. 2013). For mtDNA, these assignments were made using HaploGrep (Kloss-Brandstätter et al. 2011) by using the set of known haplogroup-specific variants in the PhyloTree (van Oven and Kayser 2009) database.

In order to quantify the differences between mtDNA and Chr Y in terms of diversity of the haplogroups identified, the proportion of variance among and within populations was quantified for Chr Y and mtDNA using the AMOVA function in Arlequin (Supplemental Methods; Excoffier et al. 1992; Excoffier and Lischer 2010). The analysis was repeated eight times, including separate analysis of Chr Y and mtDNA, for three-way comparison of the populations, as well as all possible two-way comparisons (Q1/Q2, Q1/Q3, Q2/Q3). The proportion of variance among and within populations was tabulated, as well as the estimated Fst and P-value for both.

Comparison of X Chromosome to autosomal diversity

The ratio of X-linked to autosomal nucleotide diversity (X/A) for different populations was computed following the approach in Gottipati et al. (2011) and Arbiza et al. (2014) (Supplemental Methods).

Coalescent analysis

To infer the extent and timing of bottlenecks, the pairwise sequential Markov coalescent (PSMC) (Li and Durbin 2011) was applied to the 96 Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qatari genomes. A plot of effective population size versus years in the past was generated for each genome using instructions from the PSMC manual (Li and Durbin 2011; see Supplemental Methods). For comparison, the same PSMC pipeline was run on BAM files of Illumina deep sequencing reads mapped to the GRCh37 human reference genome for an individual of European ancestry (NA12878, Utah resident with Northern and Western European ancestry, CEU) and an individual of African ancestry (NA19239, Yoruba in Ibadan, Nigeria, YRI) sequenced as part of the 1000 Genomes Pilot (The 1000 Genomes Project Consortium 2010). The resulting PSMC plots for these two individuals were shifted slightly, such that they align with Qatari PSMC plots at distant (>200,000 yr ago) timescales (Fu et al. 2014).

Genome-wide admixture analysis

In order to learn more about the ancestry of the sampled Qataris, a genome-wide admixture analysis was conducted on the combined data set of 104 Qatari genomes, the 1000 Genomes Project minus Human Origins overlap, and Human Origins using ADMIXTURE (Supplemental Methods; Alexander et al. 2009). The cross-validation error was calculated for a range of expected number of ancestral populations (K), and the K with the lowest cross-validation error was used to quantify ancestry, in this case K = 12.

African admixture proportion and timing

In order to estimate the proportion and timing of African admixture in Qatari populations, the genomes of Qataris and world populations were analyzed using ALDER 1.2 (Supplemental Methods; Loh et al. 2013).

Local admixture analysis

An admixture deconvolution analysis was performed on the 96 Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qatari genomes using the 11,711,386 autosomal SNPs segregating in both the 1000 Genomes Project and Qatari genomes using SupportMix (Supplemental Fig. 9; Supplemental Methods; Omberg et al. 2012).

Neanderthal ancestry

In order to compare the proportion of Neanderthal admixture in Q1 (Bedouin) Qataris with that of other populations in the 1000 Genomes Project (The 1000 Genomes Project Consortium 2012) and Human Origins (Lazaridis et al. 2014), the F4 ratio (Patterson et al. 2012) and Patterson’s D-statistic (Patterson et al. 2012) were estimated using the qpF4ratio and qpDstat programs, respectively, from the ADMIXTOOLS 3.0 package (Supplemental Methods; Patterson et al. 2012).
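
To make the two statistics concrete, the following Python sketch computes an f4 statistic, Patterson's D, and an F4 ratio directly from per-SNP population allele frequencies. It is a toy re-implementation of the textbook definitions, not the qpF4ratio or qpDstat programs, and it omits the block-jackknife standard errors those programs report. The population roles (A, O, B, C, X) and all frequencies are invented for illustration.

```python
from typing import Sequence

Freqs = Sequence[float]  # one allele frequency per SNP, for a single population


def f4(a: Freqs, b: Freqs, c: Freqs, d: Freqs) -> float:
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return sum((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)) / len(a)


def d_statistic(w: Freqs, x: Freqs, y: Freqs, z: Freqs) -> float:
    """Patterson's D(W, X; Y, Z) computed from population allele frequencies."""
    num = sum((pw - px) * (py - pz) for pw, px, py, pz in zip(w, x, y, z))
    den = sum((pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
              for pw, px, py, pz in zip(w, x, y, z))
    return num / den


def f4_ratio(a: Freqs, o: Freqs, x: Freqs, b: Freqs, c: Freqs) -> float:
    """Estimate the share of B-related ancestry in X as f4(A,O;X,C) / f4(A,O;B,C)."""
    return f4(a, o, x, c) / f4(a, o, b, c)


# Toy frequencies (invented): X is constructed as 25% B plus 75% C.
a = [0.9, 0.1, 0.5]
o = [0.1, 0.2, 0.5]
b = [0.8, 0.3, 0.4]
c = [0.2, 0.6, 0.4]
x = [0.25 * pb + 0.75 * pc for pb, pc in zip(b, c)]
alpha = f4_ratio(a, o, x, b, c)
print(round(alpha, 6))
```

Because X is built as an exact mixture, the ratio recovers the mixing proportion up to floating-point error, which is the property the F4 ratio relies on.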

We additionally considered the expected F4 ratio for the Q1 (Bedouin) under the scenario of no admixture between Neanderthal and direct ancestors of Q1 (Bedouin), such that observed Neanderthal ancestry in Q1 (Bedouin) would be entirely due to European admixture. From the estimated components of the ADMIXTURE analysis with K = 12, the Southern European ancestry in the Q1 (Bedouin) is 8.2% on average, and the Northern European ancestry in Q1 (Bedouin) is 1.3% on average, totaling 9.5% of the genome. If the Q1 (Bedouin) had never mixed with Neanderthal prior to introduction of European admixture, assuming no selection against introgressed genomic intervals, we would therefore expect an F4 ratio in Q1 (Bedouin) to be on the order of 1/10 of those observed in European populations.
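
The arithmetic behind this expectation can be written out explicitly. In the sketch below, the symbol α denotes an F4 ratio; the subscripts are notation introduced here for illustration and do not appear in the original text.

```latex
\[
\underbrace{0.082}_{\text{Southern European}}
+ \underbrace{0.013}_{\text{Northern European}}
= 0.095,
\qquad
\alpha_{\mathrm{Q1}}^{\text{expected}}
\approx 0.095 \, \alpha_{\mathrm{EUR}}
\approx \tfrac{1}{10} \, \alpha_{\mathrm{EUR}} .
\]
```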


TreeMix analysis

We performed a TreeMix analysis (Pickrell and Pritchard 2012) of the 96 Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qatari genomes and the 1000 Genomes Project excluding admixed populations (Puerto Rican, Mexican, Colombian, and African Ancestry in Southwest US) (Supplemental Methods).

Neighbor-joining tree clustering

In order to determine if any of the Qatari genomes were the most distant ancestors of all non-African populations, neighbor-joining trees were constructed for the 104 Qatari genomes and the 1000 Genomes Project using the 11,711,386 autosomal SNPs segregating in both data sets. For each pair of genomes, a distance based on the proportion of shared alleles (PSA) (Mountain and Cavalli-Sforza 1997), computed as 1 minus the proportion of the genome identical by state (IBS), was calculated using the “--distance square 1-ibs” function in PLINK 1.9 (Purcell et al. 2007; Chang et al. 2015), which outputs a 1196 × 1196 matrix of distances. A neighbor-joining (NJ) tree was constructed using NJS (Criscuolo and Gascuel 2008), a recently updated version of the original NJ algorithm (Saitou and Nei 1987) that is better at handling missing values, as implemented in the APE package in R (Paradis et al. 2004; R Core Team 2014). Overall, this approach is computationally tractable for millions of markers genotyped in thousands of genomes and produces similar topologies to maximum-likelihood clustering methods but requires only a fraction of the compute time; the trade-off is a sacrifice in the accuracy of branch lengths (Tateno et al. 1994). The algorithm takes the distance matrix as input and outputs a tree. In order to confirm the robustness to sample ordering, the order of samples in the matrix was shuffled and reclustered 100 times; all reclusterings recovered the same tree. In order to produce bootstrap support values for the tree, 100 reclusterings of the tree were generated based on random sampling of SNPs. For each bootstrap iteration, 11,711,386 SNPs were sampled at random with replacement using a Python script (www.python.org), and then the PSA distance matrix and NJ tree were recalculated using these SNPs. Bootstrap support was calculated using the Python package SumTrees (Sukumaran and Holder 2010).
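
The distance and bootstrap steps can be sketched in a few lines of Python. This is not the script used in the study, which is not reproduced in the text; the function names and toy genotypes are hypothetical, and the tree-building step (NJS in the APE package) is deliberately left out.

```python
import random
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0, 1, or 2 copies of the alternate allele; None = missing


def one_minus_ibs(g1: Sequence[Genotype], g2: Sequence[Genotype],
                  sites: Optional[Sequence[int]] = None) -> float:
    """1 minus the proportion of alleles shared identical by state."""
    if sites is None:
        sites = range(len(g1))
    diff = n = 0
    for s in sites:
        a, b = g1[s], g2[s]
        if a is None or b is None:
            continue  # ignore sites missing in either genome
        diff += abs(a - b)  # number of alleles not shared at this site (0, 1, or 2)
        n += 1
    return diff / (2 * n) if n else float("nan")


def distance_matrix(samples: List[Sequence[Genotype]],
                    sites: Optional[Sequence[int]] = None) -> List[List[float]]:
    """Symmetric matrix of pairwise 1-IBS distances."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = one_minus_ibs(samples[i], samples[j], sites)
    return d


def bootstrap_sites(n_sites: int, rng: random.Random) -> List[int]:
    """Sample as many SNP indices as there are SNPs, with replacement."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]


# Invented genotypes for three genomes at four SNPs.
genomes = [[0, 1, 2, 0], [0, 1, 2, 0], [2, 1, 0, None]]
full = distance_matrix(genomes)
resampled = bootstrap_sites(4, random.Random(0))
replicate = distance_matrix(genomes, resampled)
```

Each bootstrap replicate would then be passed to the tree-building step, and support values summarized across the replicate trees.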

For visualization, the tree was rooted at the most recent common ancestor (MRCA) node of the largest cluster of the 1000 Genomes Yoruba (YRI) genomes in the tree. A color version of the tree was produced using TreeGraph 2 (Stöver and Müller 2010) by manually coloring the branches leading to each node. A single color is assigned to each population, with populations from the same continent having similar colors: Europeans in shades of purple, Asians in shades of brown, Americans in shades of green, Africans in shades of orange, Q1 (Bedouin) in red, Q2 (Persian-South Asian) in blue, Q3 (Sub-Saharan African) in black, and Q0 (Subpopulation Unassigned) in gray. When a cluster of nodes includes different populations, the terminal branches were given population-specific colors, whereas the shared higher-order branches for the cluster were given the color of the population in majority. For example, if 10 Q1 (Bedouin) and 1 Q0 (Subpopulation Unassigned) were in a cluster, the branches above where the nodes come together were colored red.
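
The majority rule for shared branches was applied manually in TreeGraph 2. Purely as an illustration of the rule, a minimal Python sketch might look as follows; the color map covers only the four Qatari groups, and the handling of ties is an assumption, since the text does not address them.

```python
from collections import Counter
from typing import Sequence

# Colors for the Qatari groups as listed in the text.
POPULATION_COLORS = {"Q1": "red", "Q2": "blue", "Q3": "black", "Q0": "gray"}


def shared_branch_color(leaf_populations: Sequence[str]) -> str:
    """Color for the shared higher-order branches of a cluster: majority population."""
    counts = Counter(leaf_populations)
    # Assumption: on a tie, the population seen first among the leaves wins.
    majority, _ = counts.most_common(1)[0]
    return POPULATION_COLORS[majority]


# The example from the text: 10 Q1 (Bedouin) and 1 Q0 (Subpopulation Unassigned).
cluster = ["Q1"] * 10 + ["Q0"]
print(shared_branch_color(cluster))
```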

Data access

The sequence data generated for this study in BAM format, as well as SNP genotypes in VCF format, have been submitted to the NCBI Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/) under accession number SRP060765. Allele frequencies for known and novel genomic SNPs have been submitted to NCBI dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) under submitter batch ID QG108_GENOMIC_SNPS_20151008 (http://www.ncbi.nlm.nih.gov/SNP/snp_viewBatch.cgi?sbid=1062298) and submitter handle WEILL_CORNELL_DGM. PLINK and VCF files of genotypes for variants analyzed in this study, both before and after integration with 1000 Genomes and Human Origins, are available on our website http://geneticmedicine.weill.cornell.edu/genome.html.

Acknowledgments

We thank the three reviewers of this manuscript for helpful comments and suggestions; the 1000 Genomes Project Consortium for helpful advice on analysis methods; M.R. Staudt, Y. Strulovici-Barel, A. Al Shakaki, O.M. Chidiac, R. Mathew, and the WCMC-Q Genomics Core for help with the study; and Mezey laboratory students and N. Mohamed for help in preparing this manuscript. These studies were supported, in part, by the Qatar Foundation and Weill Cornell Medical College in Qatar. J.L.R.F. was supported, in part, by the National Heart, Lung, and Blood Institute of the National Institutes of Health (NIH) T32 HL09428.

References

The 1000 Genomes Project Consortium. 2010. A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073.

The 1000 Genomes Project Consortium. 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56–65.

Abu-Amero KK, González AM, Larruga JM, Bosley TM, Cabrera VM. 2007. Eurasian and African mitochondrial DNA influences in the Saudi Arabian population. BMC Evol Biol 7: 32.

Abu-Amero KK, Larruga JM, Cabrera VM, González AM. 2008. Mitochondrial DNA structure in the Arabian Peninsula. BMC Evol Biol 8: 45.

Abu-Amero KK, Hellani A, González AM, Larruga JM, Cabrera VM, Underhill PA. 2009. Saudi Arabian Y-Chromosome diversity and its relationship with nearby regions. BMC Genet 10: 59.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19: 1655–1664.

Alsmadi O, Thareja G, Alkayal F, Rajagopalan R, John SE, Hebbar P, Behbehani K, Thanaraj TA. 2013. Genetic substructure of Kuwaiti population reveals migration history. PLoS One 8: e74913.

Alsmadi O, John SE, Thareja G, Hebbar P, Antony D, Behbehani K, Thanaraj TA. 2014. Genome at juncture of early human migration: a systematic analysis of two whole genomes and thirteen exomes from Kuwaiti population subgroup of inferred Saudi Arabian tribe ancestry. PLoS One 9: e99069.

Arbiza L, Gottipati S, Siepel A, Keinan A. 2014. Contrasting X-linked and autosomal diversity across 14 human populations. Am J Hum Genet 94: 827–844.

Armitage SJ, Jasim SA, Marks AE, Parker AG, Usik VI, Uerpmann HP. 2011. The southern route “out of Africa”: evidence for an early expansion of modern humans into Arabia. Science 331: 453–456.

Behar DM, Yunusbayev B, Metspalu M, Metspalu E, Rosset S, Parik J, Rootsi S, Chaubey G, Kutuev I, Yudkovsky G, et al. 2010. The genome-wide structure of the Jewish people. Nature 466: 238–242.

Cann RL, Stoneking M, Wilson AC. 1987. Mitochondrial DNA and human evolution. Nature 325: 31–36.

Cavalli-Sforza LL, Feldman MW. 2003. The application of molecular genetic approaches to the study of human evolution. Nat Genet 33(Suppl): 266–275.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4: 7.

Criscuolo A, Gascuel O. 2008. Fast NJ-like algorithms to deal with incomplete distance matrices. BMC Bioinformatics 9: 166.

DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, et al. 2011. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43: 491–498.

Esposito JL. 2001. Women in Muslim family law, 2nd ed. Syracuse University Press, Syracuse, NY.

Excoffier L, Lischer HE. 2010. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10: 564–567.


Excoffier L, Smouse PE, Quattro JM. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131: 479–491.

Ferdinand K, Nicolaisen I, Carlsberg Foundation’s Nomad Research Project. 1993. Bedouins of Qatar. Thames and Hudson, New York.

Fu Q, Li H, Moorjani P, Jay F, Slepchenko SM, Bondarev AA, Johnson PL, Aximu-Petri A, Prüfer K, de FC, et al. 2014. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514: 445–449.

Gottipati S, Arbiza L, Siepel A, Clark AG, Keinan A. 2011. Analyses of X-linked and autosomal genetic variation in population-scale whole genome sequencing. Nat Genet 43: 741–743.

Gronau I, Hubisz MJ, Gulko B, Danko CG, Siepel A. 2011. Bayesian inference of ancient human demography from individual genome sequences. Nat Genet 43: 1031–1034.

Hammer MF, Karafet T, Rasanayagam A, Wood ET, Altheide TK, Jenkins T, Griffiths RC, Templeton AR, Zegura SL. 1998. Out of Africa and back again: nested cladistic analysis of human Y chromosome variation. Mol Biol Evol 15: 427–441.

Henn BM, Cavalli-Sforza LL, Feldman MW. 2012. The great human expansion. Proc Natl Acad Sci 109: 17758–17764.

Hershkovitz I, Marder O, Ayalon A, Bar-Matthews M, Yasur G, Boaretto E, Caracuta V, Alex B, Frumkin A, Goder-Goldberger M, et al. 2015. Levantine cranium from Manot Cave (Israel) foreshadows the first European modern humans. Nature 520: 216–219.

Hochreiter S. 2013. HapFABIA: identification of very short segments of identity by descent characterized by rare variants in large sequencing data. Nucleic Acids Res 41: e202.

Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. 2014. Early back-to-Africa migration into the Horn of Africa. PLoS Genet 10: e1004393.

Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, et al. 2010. Population genetic structure of the people of Qatar. Am J Hum Genet 87: 17–25.

Jobling MA, Tyler-Smith C. 2003. The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4: 598–612.

John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. 2015. Kuwaiti population subgroup of nomadic Bedouin ancestry—whole genome sequence and analysis. Genom Data 3: 116–127.

Jostins L, Xu Y, McCarthy S, Ayub Q, Durbin R, Barrett J, Tyler-Smith C. 2014. YFitter: maximum likelihood assignment of Y chromosome haplogroups from low-coverage sequence data. arXiv: 1407.7988.

Kloss-Brandstätter A, Pacher D, Schönherr S, Weissensteiner H, Binna R, Specht G, Kronenberg F. 2011. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum Mutat 32: 25–32.

Kopelman NM, Stone L, Gascuel O, Rosenberg NA. 2013. The behavior of admixed populations in neighbor-joining inference of population trees. Pac Symp Biocomput 273–284.

Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, Sudmant PH, Schraiber JG, Castellano S, Lipson M, et al. 2014. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513: 409–413.

Li H, Durbin R. 2011. Inference of human population history from individual whole-genome sequences. Nature 475: 493–496.

Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193: 1233–1254.

Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. 2010. Robust relationship inference in genome-wide association studies. Bioinformatics 26: 2867–2873.

Markus B, Alshafee I, Birk OS. 2014. Deciphering the fine-structure of tribal admixture in the Bedouin population using genomic data. Heredity (Edinb.) 112: 182–189.

Mathieson I, McVean G. 2014. Demography and the age of rare variants. PLoS Genet 10: e1004528.

McPeek MS, Sun L. 2000. Statistical tests for detection of misspecified relationships by use of genome-screen data. Am J Hum Genet 66: 1076–1094.

Mezzavilla M, Vozzi D, Badii R, Alkowari MK, Abdulhadi K, Girotto G, Gasparini P. 2015. Increased rate of deleterious variants in long runs of homozygosity of an inbred population from Qatar. Hum Hered 79: 14–19.

Mountain JL, Cavalli-Sforza LL. 1997. Multilocus genotypes, a tree of individuals, and human evolutionary history. Am J Hum Genet 61: 705–718.

Omberg L, Salit J, Hackett N, Fuller J, Matthew R, Chouchane L, Rodriguez-Flores JL, Bustamante C, Crystal RG, Mezey JG. 2012. Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral popula-tions. BMC Genet 13: 49.

Paradis E, Claude J, Strimmer K. 2004. APE: analyses of phylogenetics andevolution in R language. Bioinformatics 20: 289–290.

Patterson N, Richter DJ, Gnerre S, Lander ES, Reich D. 2006. Genetic evi-dence for complex speciation of humans and chimpanzees. Nature441: 1103–1108.

Patterson N,Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, GenschoreckT, Webster T, Reich D. 2012. Ancient admixture in human history.Genetics 192: 1065–1093.

Pérez-Miranda AM, Alfonso-Sánchez MA, Peña JA, Herrera RJ. 2006. QatariDNA variation at a crossroad of human migrations. Hum Hered 61:67–79.

Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixturesfrom genome-wide allele frequency data. PLoS Genet 8: e1002967.

Pickrell JK, Patterson N, Loh PR, Lipson M, Berger B, Stoneking M,Pakendorf B, Reich D. 2014. Ancient west Eurasian ancestry in southernand eastern Africa. Proc Natl Acad Sci 111: 2632–2637.

Poznik GD, Henn BM, YeeMC, Sliwerska E, Euskirchen GM, Lin AA, SnyderM, Quintana-Murci L, Kidd JM, Underhill PA, et al. 2013. Sequencing Ychromosomes resolves discrepancy in time to common ancestor ofmales versus females. Science 341: 562–565.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D.2006. Principal components analysis corrects for stratification in ge-nome-wide association studies. Nat Genet 38: 904–909.

Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population struc-ture using multilocus genotype data. Genetics 155: 945–959.

Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A,RenaudG, Sudmant PH, de Filippo C, et al. 2014. The complete genomesequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.

Purcell S, Neale B, Todd-BrownK, Thomas L, FerreiraMA, Bender D,Maller J,Sklar P, de Bakker PI, DalyMJ, et al. 2007. PLINK: a tool set for whole-ge-nome association and population-based linkage analyses. Am J HumGenet 81: 559–575.

R Core Team. 2014. R: a language and environment for statistical computing.R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.

RasmussenM,GuoX,WangY, Lohmueller KE, Rasmussen S, Albrechtsen A,Skotte L, Lindgreen S, MetspaluM, Jombart T, et al. 2011. An AboriginalAustralian genome reveals separate human dispersals into Asia. Science334: 94–98.

Rodriguez-Flores JL, Fuller J, Hackett NR, Salit J, Malek JA, Al-Dous E,Chouchane L, Zirie M, Jayoussi A, MahmoudMA, et al. 2012. Exome se-quencing of only seven Qataris identifies potentially deleterious vari-ants in the Qatari population. PLoS One 7: e47614.

Rodriguez-Flores JL, Fakhro K, Hackett NR, Salit J, Fuller J, Agosto-Perez F,Gharbiah M, Malek JA, Zirie M, Jayyousi A, et al. 2014. Exome sequenc-ing identifies potential risk variants for Mendelian disorders at highprevalence in Qatar. Hum Mutat 35: 105–116.

Rowold DJ, Luis JR, Terreros MC, Herrera RJ. 2007. Mitochondrial DNA geneflow indicates preferred usage of the Levant Corridor over the Horn of Africa passageway. J Hum Genet 52: 436–447.

Saitou N, Nei M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406–425.

Sandridge AL, Takeddin J, Al-Kaabi E, Frances Y. 2010. Consanguinity in Qatar: knowledge, attitude and practice in a population born between 1946 and 1991. J Biosoc Sci 42: 59–82.

Schiffels S, Durbin R. 2014. Inferring human population size and separation history from multiple genome sequences. Nat Genet 46: 919–925.

Shi H, Zhong H, Peng Y, Dong YL, Qi XB, Zhang F, Liu LF, Tan SJ, Ma RZ, Xiao CJ, et al. 2008. Y chromosome evidence of earliest modern human settlement in East Asia and multiple origins of Tibetan and Japanese populations. BMC Biol 6: 45.

Shriner D, Tekola-Ayele F, Adeyemo A, Rotimi CN. 2014. Genome-wide genotype and sequence-based reconstruction of the 140,000 year history of modern human ancestry. Sci Rep 4: 6055.

Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, Brown LG, Repping S, Pyntikova T, Ali J, Bieri T, et al. 2003. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423: 825–837.

Stöver BC, Müller KF. 2010. TreeGraph 2: combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics 11: 7.

Sukumaran J, Holder MT. 2010. DendroPy: a Python library for phylogenetic computing. Bioinformatics 26: 1569–1571.

Tateno Y, Takezaki N, Nei M. 1994. Relative efficiencies of the maximum-likelihood, neighbor-joining, and maximum-parsimony methods when substitution rate varies with site. Mol Biol Evol 11: 261–277.

van Oven M, Kayser M. 2009. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum Mutat 30: E386–E394.

Received March 13, 2015; accepted in revised form December 15, 2015.


Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations
Juan L. Rodriguez-Flores, Khalid Fakhro, Francisco Agosto-Perez, et al.
Genome Res. 2016 26: 151-162, originally published online January 4, 2016
Access the most recent version at doi:10.1101/gr.191478.115

Supplemental Material: http://genome.cshlp.org/content/suppl/2015/12/21/gr.191478.115.DC1.html

References: This article cites 65 articles, 17 of which can be accessed free at: http://genome.cshlp.org/content/26/2/151.full.html#ref-list-1

Open Access: Freely available online through the Genome Research Open Access option.

Creative Commons License: This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.


© 2016 Rodriguez-Flores et al.; Published by Cold Spring Harbor Laboratory Press
