
Ancient human genomes suggest three ancestral populations for present-day Europeans.

Iosif Lazaridis1, Nick Patterson2, Alissa Mittnik3, Gabriel Renaud4, Swapan Mallick1, Karola Kirsanow5, Peter H Sudmant6, Joshua G Schraiber7, Sergi Castellano4, Mark Lipson8, Bonnie Berger9, Christos Economou10, Ruth Bollongino5, Qiaomei Fu11, Kirsten I Bos3, Susanne Nordenfelt1, Heng Li1, Cesare de Filippo4, Kay Prüfer4, Susanna Sawyer4, Cosimo Posth3, Wolfgang Haak12, Fredrik Hallgren13, Elin Fornander13, Nadin Rohland1, Dominique Delsate14, Michael Francken15, Jean-Michel Guinet16, Joachim Wahl17, George Ayodo18, Hamza A Babiker19, Graciela Bailliet20, Elena Balanovska21, Oleg Balanovsky22, Ramiro Barrantes23, Gabriel Bedoya24, Haim Ben-Ami25, Judit Bene26, Fouad Berrada27, Claudio M Bravi20, Francesca Brisighelli28, George B J Busby29, Francesco Cali30, Mikhail Churnosov31, David E C Cole32, Daniel Corach33, Larissa Damba34, George van Driem35, Stanislav Dryomov36, Jean-Michel Dugoujon37, Sardana A Fedorova38, Irene Gallego Romero39, Marina Gubina34, Michael Hammer40, Brenna M Henn41, Tor Hervig42, Ugur Hodoglugil43, Aashish R Jha39, Sena Karachanak-Yankova44, Rita Khusainova45, Elza Khusnutdinova45, Rick Kittles46, Toomas Kivisild47, William Klitz48, Vaidutis Kučinskas49, Alena Kushniarevich50, Leila Laredj51, Sergey Litvinov52, Theologos Loukidis53, Robert W Mahley54, Béla Melegh26, Ene Metspalu55, Julio Molina56, Joanna Mountain57, Klemetti Näkkäläjärvi58, Desislava Nesheva44, Thomas Nyambo59, Ludmila Osipova34, Jüri Parik55, Fedor Platonov60, Olga Posukh34, Valentino Romano61, Francisco Rothhammer62, Igor Rudan63, Ruslan Ruizbakiev64, Hovhannes Sahakyan65, Antti Sajantila66, Antonio Salas67, Elena B Starikovskaya36, Ayele Tarekegn68, Draga Toncheva44, Shahlo Turdikulova69, Ingrida Uktveryte49, Olga Utevska70, René Vasquez71, Mercedes Villena71, Mikhail Voevoda72, Cheryl A Winkler73, Levon Yepiskoposyan74, Pierre Zalloua75, Tatijana Zemunik76, Alan Cooper12, Cristian Capelli77, Mark G Thomas78, Andres Ruiz-Linares78, Sarah A Tishkoff79, Lalji Singh80, 
Kumarasamy Thangaraj81, Richard Villems82, David Comas83, Rem Sukernik36, Mait Metspalu50, Matthias Meyer4, Evan E Eichler84, Joachim Burger5, Montgomery Slatkin48, Svante Pääbo4, Janet Kelso4, David Reich85, Johannes Krause86.   

Abstract

We sequenced the genomes of a ∼7,000-year-old farmer from Germany and eight ∼8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations' deep relationships and show that early European farmers had ∼44% ancestry from a 'basal Eurasian' population that split before the diversification of other non-African lineages.

Year:  2014        PMID: 25230663      PMCID: PMC4170574          DOI: 10.1038/nature13673

Source DB:  PubMed          Journal:  Nature        ISSN: 0028-0836            Impact factor:   49.962


Near Eastern migrants played a major role in the introduction of agriculture to Europe, as ancient DNA indicates that early European farmers were distinct from European hunter-gatherers[4,5] and close to present-day Near Easterners[4,6]. However, modelling present-day Europeans as a mixture of these two ancestral populations[4] does not account for the fact that they are also admixed with a population related to Native Americans[7,8]. To clarify the prehistory of Europe, we sequenced nine ancient genomes (Fig. 1A; Extended Data Fig. 1): “Stuttgart” (19-fold coverage), a ~7,000 year old skeleton found in Germany in the context of artifacts from the first widespread farming culture of central Europe, the Linearbandkeramik; “Loschbour” (22-fold), an ~8,000 year old skeleton from the Loschbour rock shelter in Luxembourg, discovered in the context of hunter-gatherer artifacts (SI1; SI2); and seven ~8,000 year old samples (0.01–2.4-fold) from a hunter-gatherer burial in Motala, Sweden (the highest coverage individual was “Motala12”).
Figure 1

Map of West Eurasian populations and Principal Component Analysis

(a) Geographical locations of analyzed samples, with color coding matching the PCA. We show all sampling locations for each population, which results in multiple points for some (e.g., Spain). (b) PCA on all present-day West Eurasians, with ancient and selected eastern non-African samples projected. European hunter-gatherers fall beyond present-day Europeans in the direction of European differentiation from the Near East. Stuttgart clusters with other Neolithic Europeans and present-day Sardinians. MA1 falls outside the variation of present-day West Eurasians in the direction of southern-northern differentiation along dimension 2.

Extended Data Figure 1

Photographs of analyzed ancient samples.

(A) Loschbour skull; (B) Stuttgart skull, missing the lower right M2 we sampled; (C) excavation at Kanaljorden in Motala, Sweden; (D) Motala 1 in situ.

Sequence reads from all samples revealed >20% C→T and G→A deamination-derived mismatches at the ends of the molecules that are characteristic of ancient DNA[9,10] (SI3). We estimate nuclear contamination rates to be 0.3% for Stuttgart and 0.4% for Loschbour (SI3), and mitochondrial (mtDNA) contamination rates to be 0.3% for Stuttgart, 0.4% for Loschbour, and 0.01–5% for the Motala individuals (SI3). Stuttgart has mtDNA haplogroup T2, typical of Neolithic Europeans[11], and Loschbour and all Motala individuals have the U5 or U2 haplogroups, typical of hunter-gatherers[5,9] (SI4). Stuttgart is female, while Loschbour and five Motala individuals are male (SI5) and belong to Y-chromosome haplogroup I, suggesting that this was common in pre-agricultural Europeans (SI5). We carried out large-scale sequencing of libraries prepared with uracil DNA glycosylase (UDG), which removes deaminated cytosines, thus reducing errors arising from ancient DNA damage (SI3). The ancient individuals had indistinguishable levels of Neanderthal ancestry when compared to each other (~2%) and to present-day Eurasians (SI6). The heterozygosity of Stuttgart (0.00074) is at the high end of present-day Europeans, while that of Loschbour (0.00048) is lower than in any present humans (SI2), reflecting a strong bottleneck in Loschbour’s ancestors as the genetic data show that he was not recently inbred (Extended Data Fig. 2). High copy numbers for the salivary amylase gene (AMY1) have been associated with a high starch diet[12]; our data are consistent with this finding in that the ancient hunter gatherers La Braña (from Iberia)[2], Motala12, and Loschbour had 5, 6 and 13 copies respectively, whereas the Stuttgart farmer had 16 (SI7). Both Loschbour and Stuttgart had dark hair (>99% probability); and Loschbour, like La Braña and Motala12, likely had blue or intermediate-colored eyes (>75%) while Stuttgart likely had brown eyes (>99%) (SI8). 
Neither Loschbour nor La Braña carries the skin-lightening allele in SLC24A5 that is homozygous in Stuttgart and nearly fixed in Europeans today[2], but Motala12 carries at least one copy of the derived allele, showing that this allele was present in Europe prior to the advent of agriculture.
Extended Data Figure 2

Pairwise Sequential Markovian Coalescent (PSMC) analysis.

(A) Inference of population size as a function of time, showing a very small recent population size over the most recent period in the ancestry of Loschbour (at least the last 5–10 thousand years). (B) Inferred time since the most recent common ancestor from the PSMC for chromosomes 20, 21, 22 (top to bottom); Stuttgart is plotted on top and Loschbour at bottom.

We compared the ancient genomes to 2,345 present-day humans from 203 populations genotyped at 594,924 autosomal single nucleotide polymorphisms (SNPs) with the Human Origins array[8] (SI9) (Extended Data Table 1). We used ADMIXTURE[13] to identify 59 “West Eurasian” populations that cluster with Europe and the Near East (SI9 and Extended Data Fig. 3). Principal component analysis (PCA)[14] (SI10) (Fig. 1B) indicates a discontinuity between the Near East and Europe, with each showing north-south clines bridged only by a few populations of mainly Mediterranean origin. We projected[15] the newly sequenced and previously published[1-4] ancient genomes onto the first two principal components (PCs) (Fig. 1B). Upper Paleolithic hunter-gatherers[3] from Siberia like the MA1 (Mal’ta) individual project at the northern end of the PCA, suggesting an “Ancient North Eurasian” meta-population (ANE). European hunter-gatherers from Spain[2], Luxembourg, and Sweden[4] fall beyond present-day Europeans in the direction of European differentiation from the Near East, and form a “West European Hunter-Gatherer” (WHG) cluster including Loschbour and La Braña[2], and a “Scandinavian Hunter-Gatherer” (SHG) cluster including the Motala individuals and ~5,000 year old hunter-gatherers from the Pitted Ware Culture[4]. An “Early European Farmer” (EEF) cluster includes Stuttgart, the ~5,300 year old Tyrolean Iceman[1] and a ~5,000 year old Swedish farmer[4].
Extended Data Table 1

West Eurasians genotyped on the Human Origins array and key f-statistics.

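Column groups: (1) lowest f3(X; Ref1, Ref2) over all reference pairs; (2) lowest f3(X; EEF, WHG); (3) lowest f3(X; Near East, WHG); (4) lowest f3(X; EEF, ANE); (5) f4(Stuttgart, X; Loschbour, Chimp); (6) f4(Stuttgart, X; MA1, Chimp). Groups 2 to 4 are reported only when Z<0 and Zdiff<3; blank cells have no reported value. Lat. and Long. give the sampling location.

X | N | Lat. | Long. | (1) Ref1 | (1) Ref2 | (1) statistic | (1) Z | (2) Ref1 | (2) Ref2 | (2) statistic | (2) Z | (2) Zdiff | (3) Ref1 | (3) Ref2 | (3) statistic | (3) Z | (3) Zdiff | (4) Ref1 | (4) Ref2 | (4) statistic | (4) Z | (4) Zdiff | (5) statistic | (5) Z | (6) statistic | (6) Z
Abkhasian | 9 | 43 | 41.02 | Stu | MA1 | −0.0053 | −2.9 | | | | | | Georgian | LaB | −0.0004 | −0.5 | 2.6 | Stu | MA1 | −0.0053 | −2.9 | 0.0 | 0.0020 | 4.2 | −0.0023 | −4.7
Adygei | 17 | 44 | 39 | Piapoco | Stu | −0.0073 | −5.9 | | | | | | | | | | | Stu | MA1 | −0.0067 | −4.1 | 0.3 | 0.0013 | 2.6 | −0.0029 | −6.0
Albanian | 6 | 41.33 | 19.83 | Stu | MA1 | −0.0121 | −7.0 | | | | | | Iraqi_Jew | Los | −0.0090 | −9.1 | 1.7 | Stu | MA1 | −0.0121 | −7.0 | 0.0 | −0.0009 | −1.8 | −0.0027 | −5.4
Armenian | 10 | 40.19 | 44.55 | GujaratiC | Stu | −0.0070 | −8.2 | | | | | | | | | | | Stu | MA1 | −0.0068 | −4.1 | 0.1 | 0.0022 | 4.5 | −0.0016 | −3.3
Ashkenazi_Jew | 7 | 52.23 | 21.02 | Stu | MA1 | −0.0057 | −3.4 | | | | | | Iraqi_Jew | Los | −0.0042 | −4.7 | 1.0 | Stu | MA1 | −0.0057 | −3.4 | 0.0 | 0.0008 | 1.7 | −0.0010 | −2.0
Balkar | 10 | 43.48 | 43.62 | Piapoco | Stu | −0.0113 | −8.9 | | | | | | | | | | | Stu | MA1 | −0.0092 | −5.5 | 1.1 | 0.0014 | 2.9 | −0.0027 | −5.6
Basque | 29 | 43.04 | −0.65 | Iraqi_Jew | Los | −0.0083 | −10.3 | Stu | Los | −0.0061 | −3.8 | 1.3 | Iraqi_Jew | Los | −0.0083 | −10.3 | 0.0 | Stu | MA1 | −0.0041 | −2.4 | 2.2 | −0.0034 | −7.2 | −0.0032 | −6.7
BedouinA | 25 | 31 | 35 | Esan | Stu | −0.0162 | −18.2 | | | | | | | | | | | | | | | | 0.0062 | 13.0 | 0.0026 | 5.4
BedouinB | 19 | 31 | 35 | Esan | Stu | 0.0089 | 7.8 | | | | | | | | | | | | | | | | 0.0046 | 9.3 | 0.0019 | 3.9
Belarusian | 10 | 53.92 | 28.01 | Georgian | Los | −0.0133 | −17.6 | | | | | | Georgian | Los | −0.0133 | −17.6 | 0.0 | Stu | MA1 | −0.0102 | −6.1 | 1.9 | −0.0035 | −6.9 | −0.0042 | −8.6
Bergamo | 12 | 46 | 10 | Stu | MA1 | −0.0106 | −6.2 | Stu | Los | −0.0068 | −4.2 | 1.7 | Iraqi_Jew | Los | −0.0100 | −11.9 | 0.3 | Stu | MA1 | −0.0106 | −6.2 | 0.0 | −0.0018 | −3.9 | −0.0028 | −5.8
Bulgarian | 10 | 42.16 | 24.74 | Stu | MA1 | −0.0130 | −8.2 | Stu | LaB | −0.0074 | −4.5 | 2.8 | Iraqi_Jew | Los | −0.0106 | −12.4 | 1.5 | Stu | MA1 | −0.0130 | −8.2 | 0.0 | −0.0012 | −2.5 | −0.0028 | −5.9
Chechen | 9 | 43.33 | 45.65 | Stu | MA1 | −0.0056 | −3.2 | | | | | | Georgian | Los | −0.0002 | −0.3 | 2.8 | Stu | MA1 | −0.0056 | −3.2 | 0.0 | 0.0011 | 2.3 | −0.0031 | −6.2
Croatian | 10 | 43.51 | 16.45 | Stu | MA1 | −0.0114 | −6.7 | Stu | Los | −0.0065 | −3.8 | 2.1 | Iraqi_Jew | Los | −0.0112 | −13.0 | 0.2 | Stu | MA1 | −0.0114 | −6.7 | 0.0 | −0.0023 | −4.7 | −0.0035 | −7.4
Cypriot | 8 | 35.13 | 33.43 | Stu | MA1 | −0.0057 | −3.2 | | | | | | Yemenite_Jew | Los | −0.0013 | −1.5 | 2.5 | Stu | MA1 | −0.0057 | −3.2 | 0.0 | 0.0019 | 3.9 | −0.0012 | −2.5
Czech | 10 | 50.1 | 14.4 | Georgian | Los | −0.0137 | −17.9 | Stu | Los | −0.0088 | −5.3 | 3.0 | Georgian | Los | −0.0137 | −17.9 | 0.0 | Stu | MA1 | −0.0121 | −7.2 | 0.9 | −0.0032 | −6.6 | −0.0040 | −8.2
Druze | 39 | 32 | 35 | Stu | MA1 | −0.0024 | −1.5 | | | | | | | | | | | Stu | MA1 | −0.0024 | −1.5 | 0.0 | 0.0028 | 5.9 | −0.0006 | −1.3
English | 10 | 50.75 | −2.09 | Iraqi_Jew | Los | −0.0129 | −14.8 | Stu | Los | −0.0090 | −5.5 | 2.2 | Iraqi_Jew | Los | −0.0129 | −14.8 | 0.0 | Stu | MA1 | −0.0125 | −7.4 | 0.1 | −0.0032 | −6.5 | −0.0041 | −8.5
Estonian | 10 | 58.54 | 24.89 | Abkhasian | Los | −0.0124 | −15.1 | | | | | | Abkhasian | Los | −0.0124 | −15.1 | 0.0 | Stu | MA1 | −0.0094 | −5.6 | 1.9 | −0.0043 | −8.5 | −0.0051 | −10.1
Finnish | 7 | 60.2 | 24.9 | Abkhasian | Los | −0.0102 | −11.3 | | | | | | Abkhasian | Los | −0.0102 | −11.3 | 0.0 | Stu | MA1 | −0.0078 | −4.4 | 1.4 | −0.0035 | −6.9 | −0.0045 | −9.1
French | 25 | 46 | 2 | Stu | MA1 | −0.0131 | −8.4 | Stu | Los | −0.0098 | −6.3 | 1.5 | Iraqi_Jew | Los | −0.0129 | −16.8 | 0.2 | Stu | MA1 | −0.0131 | −8.4 | 0.0 | −0.0027 | −5.6 | −0.0036 | −7.7
French_South | 7 | 43.44 | −0.62 | Iraqi_Jew | Los | −0.0095 | −9.5 | Stu | LaB | −0.0089 | −5.0 | 0.3 | Iraqi_Jew | Los | −0.0095 | −9.5 | 0.0 | Stu | MA1 | −0.0086 | −4.8 | 0.4 | −0.0030 | −6.2 | −0.0031 | −6.2
Georgian | 10 | 42.5 | 41.85 | GujaratiC | Stu | −0.0036 | −4.0 | | | | | | | | | | | Stu | MA1 | −0.0036 | −2.1 | −0.2 | 0.0020 | 4.2 | −0.0019 | −3.9
Georgian_Jew | 7 | 41.72 | 44.78 | GujaratiC | Stu | −0.0009 | −0.9 | | | | | | | | | | | Stu | MA1 | −0.0002 | −0.1 | 0.3 | 0.0022 | 4.3 | −0.0017 | −3.4
Greek | 20 | 39.84 | 23.17 | Stu | MA1 | −0.0118 | −7.4 | | | | | | Iraqi_Jew | Los | −0.0080 | −11.1 | 2.3 | Stu | MA1 | −0.0118 | −7.4 | 0.0 | −0.0004 | −0.9 | −0.0026 | −5.6
Hungarian | 20 | 47.49 | 19.08 | Stu | MA1 | −0.0133 | −8.4 | Stu | Los | −0.0087 | −5.6 | 2.2 | Iraqi_Jew | Los | −0.0127 | −15.9 | 0.4 | Stu | MA1 | −0.0133 | −8.4 | 0.0 | −0.0025 | −5.3 | −0.0037 | −7.8
Icelandic | 12 | 64.13 | −21.93 | Abkhasian | Los | −0.0121 | −15.6 | Stu | Los | −0.0078 | −4.8 | 2.7 | Abkhasian | Los | −0.0121 | −15.6 | 0.0 | Stu | MA1 | −0.0097 | −5.9 | 1.5 | −0.0038 | −7.7 | −0.0043 | −8.9
Iranian | 8 | 35.59 | 51.46 | Piapoco | Stu | −0.0094 | −7.2 | | | | | | | | | | | Stu | MA1 | −0.0087 | −5.2 | 0.4 | 0.0031 | 6.3 | −0.0016 | −3.2
Iranian_Jew | 9 | 35.7 | 51.42 | GujaratiC | Stu | −0.0018 | −2.0 | | | | | | | | | | | Stu | MA1 | −0.0012 | −0.6 | 0.2 | 0.0028 | 5.7 | −0.0011 | −2.2
Iraqi_Jew | 6 | 33.33 | 44.42 | Vishwabrahmin | Stu | −0.0026 | −2.6 | | | | | | | | | | | Stu | MA1 | −0.0009 | −0.5 | 0.9 | 0.0030 | 6.1 | −0.0005 | −1.0
Jordanian | 9 | 32.05 | 35.91 | Esan | Stu | −0.0145 | −14.3 | | | | | | | | | | | | | | | | 0.0048 | 9.6 | 0.0014 | 2.8
Kumyk | 8 | 43.25 | 46.58 | Piapoco | Stu | −0.0111 | −8.2 | | | | | | | | | | | Stu | MA1 | −0.0109 | −6.5 | 0.1 | 0.0015 | 3.1 | −0.0028 | −5.7
Lebanese | 8 | 33.82 | 35.57 | Esan | Stu | −0.0105 | −9.4 | | | | | | | | | | | Stu | MA1 | −0.0068 | −3.9 | 1.9 | 0.0038 | 7.7 | 0.0002 | 0.4
Lezgin | 9 | 42.12 | 48.18 | Stu | MA1 | −0.0100 | −6.0 | | | | | | | | | | | Stu | MA1 | −0.0100 | −6.0 | 0.0 | 0.0013 | 2.7 | −0.0037 | −7.5
Libyan_Jew | 9 | 32.92 | 13.18 | Esan | Stu | −0.0051 | −4.4 | | | | | | | | | | | Stu | MA1 | 0.0000 | 0.0 | 2.7 | 0.0030 | 6.2 | 0.0004 | 0.9
Lithuanian | 10 | 54.9 | 23.92 | Abkhasian | Los | −0.0119 | −14.9 | | | | | | Abkhasian | Los | −0.0119 | −14.9 | 0.0 | Stu | MA1 | −0.0069 | −3.9 | 2.8 | −0.0045 | −9.0 | −0.0048 | −9.9
Maltese | 8 | 35.94 | 14.38 | Stu | MA1 | −0.0086 | −4.9 | | | | | | Yemenite_Jew | Los | −0.0051 | −6.0 | 2.0 | Stu | MA1 | −0.0086 | −4.9 | 0.0 | 0.0013 | 2.7 | −0.0011 | −2.3
Mordovian | 10 | 54.18 | 45.18 | Abkhasian | Los | −0.0115 | −14.4 | | | | | | Abkhasian | Los | −0.0115 | −14.4 | 0.0 | Stu | MA1 | −0.0113 | −6.6 | 0.3 | −0.0028 | −5.5 | −0.0044 | −9.0
Moroccan_Jew | 6 | 34.02 | −6.84 | Esan | Stu | −0.0062 | −5.2 | | | | | | Yemenite_Jew | Los | −0.0021 | −2.2 | 2.9 | Stu | MA1 | −0.0032 | −1.7 | 1.4 | 0.0021 | 4.3 | −0.0001 | −0.1
North_Ossetian | 10 | 43.02 | 44.65 | Piapoco | Stu | −0.0093 | −7.2 | | | | | | | | | | | Stu | MA1 | −0.0076 | −4.4 | 1.0 | 0.0014 | 2.9 | −0.0028 | −5.6
Norwegian | 11 | 60.36 | 5.36 | Georgian | Los | −0.0120 | −14.8 | | | | | | Georgian | Los | −0.0120 | −14.8 | 0.0 | Stu | MA1 | −0.0093 | −5.4 | 1.4 | −0.0035 | −7.3 | −0.0042 | −8.7
Orcadian | 13 | 59 | −3 | Armenian | Los | −0.0102 | −13.4 | Stu | Los | −0.0059 | −3.6 | 2.5 | Armenian | Los | −0.0102 | −13.4 | 0.0 | Stu | MA1 | −0.0098 | −5.9 | 0.5 | −0.0032 | −6.7 | −0.0042 | −8.6
Palestinian | 38 | 32 | 35 | Esan | Stu | −0.0120 | −13.2 | | | | | | | | | | | | | | | | 0.0047 | 10.2 | 0.0014 | 3.1
Russian | 22 | 61 | 40 | Chukchi | Los | −0.0119 | −11.3 | | | | | | Abkhasian | Los | −0.0119 | −17.1 | 0.0 | Stu | MA1 | −0.0106 | −6.6 | 0.8 | −0.0030 | −6.2 | −0.0046 | −9.4
Sardinian | 27 | 40 | 9 | Stu | LaB | −0.0044 | −2.6 | Stu | LaB | −0.0044 | −2.6 | 0.0 | Iraqi_Jew | Los | −0.0033 | −4.2 | 0.0 | Stu | MA1 | −0.0035 | −2.1 | 0.3 | −0.0016 | −3.4 | −0.0015 | −3.3
Saudi | 8 | 18.49 | 42.52 | Kgalagadi | Stu | −0.0042 | −3.6 | | | | | | | | | | | | | | | | 0.0042 | 8.6 | 0.0015 | 3.1
Scottish | 4 | 56.04 | −3.94 | Iraqi_Jew | Los | −0.0103 | −8.3 | | | | | | Iraqi_Jew | Los | −0.0103 | −8.3 | 0.0 | Stu | MA1 | −0.0090 | −4.7 | 0.7 | −0.0034 | −6.4 | −0.0045 | −8.7
Sicilian | 11 | 37.59 | 13.77 | Stu | MA1 | −0.0108 | −6.5 | | | | | | Yemenite_Jew | Los | −0.0066 | −8.1 | 2.4 | Stu | MA1 | −0.0108 | −6.5 | 0.0 | 0.0006 | 1.3 | −0.0015 | −3.2
Spanish | 53 | 40.43 | −2.83 | Iraqi_Jew | Los | −0.0126 | −17.8 | Stu | Los | −0.0104 | −6.8 | 1.4 | Iraqi_Jew | Los | −0.0126 | −17.8 | 0.0 | Stu | MA1 | −0.0120 | −7.6 | 0.3 | −0.0019 | −4.2 | −0.0024 | −5.2
Spanish_North | 5 | 42.8 | −2.7 | Iraqi_Jew | Los | −0.0112 | −9.9 | Stu | Los | −0.0102 | −5.4 | 0.5 | Iraqi_Jew | Los | −0.0112 | −9.9 | 0.0 | Stu | MA1 | −0.0082 | −4.4 | 1.3 | −0.0035 | −6.9 | −0.0032 | −6.4
Syrian | 8 | 35.13 | 36.87 | Esan | Stu | −0.0101 | −8.7 | | | | | | | | | | | | | | | | 0.0044 | 8.6 | 0.0012 | 2.4
Tunisian_Jew | 7 | 36.8 | 10.18 | Gambian | Stu | −0.0026 | −2.0 | | | | | | | | | | | | | | | | 0.0026 | 5.2 | 0.0002 | 0.5
Turkish | 56 | 39.22 | 32.66 | Piapoco | Stu | −0.0129 | −11.3 | | | | | | | | | | | Stu | MA1 | −0.0106 | −6.9 | 1.3 | 0.0018 | 3.8 | −0.0019 | −4.0
Turkish_Jew | 8 | 41.02 | 28.95 | Stu | MA1 | −0.0075 | −4.3 | | | | | | Yemenite_Jew | Los | −0.0049 | −5.8 | 1.4 | Stu | MA1 | −0.0075 | −4.3 | 0.0 | 0.0017 | 3.6 | −0.0006 | −1.3
Tuscan | 8 | 43 | 11 | Stu | MA1 | −0.0109 | −6.4 | Stu | Los | −0.0055 | −3.2 | 2.3 | Iraqi_Jew | Los | −0.0092 | −10.1 | 0.9 | Stu | MA1 | −0.0109 | −6.4 | 0.0 | −0.0011 | −2.2 | −0.0024 | −5.0
Ukrainian | 9 | 50.29 | 31.56 | Georgian | Los | −0.0134 | −16.7 | | | | | | Georgian | Los | −0.0134 | −16.7 | 0.0 | Stu | MA1 | −0.0114 | −6.6 | 1.3 | −0.0032 | −6.4 | −0.0041 | −8.5
Yemenite_Jew | 8 | 15.35 | 44.2 | Esan | Stu | −0.0027 | −2.4 | | | | | | | | | | | | | | | | 0.0046 | 9.1 | 0.0013 | 2.6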

Note: Zdiff is the number of standard errors of the difference between the lowest f-statistic over all reference pairs and the lowest f-statistic for a subset of reference pairs.

Abbreviations used: Stu: Stuttgart; Los: Loschbour; LaB: LaBrana.

Extended Data Figure 3

ADMIXTURE analysis (K=2 to K=20).

Ancient samples (Loschbour, Stuttgart, Motala_merge, Motala12, MA1, and LaBrana) are at left.

Patterns observed in PCA may be affected by sample composition (SI10) and their interpretation in terms of admixture events is not straightforward, so we rely on formal analysis of f-statistics[8] to document mixture of at least three source populations in the ancestry of present Europeans. We began by computing all possible statistics of the form f3(Test; Ref1, Ref2) (SI11), which if significantly negative show unambiguously[8] that Test is admixed between populations anciently related to Ref1 and Ref2 (we choose Ref1 and Ref2 from 5 ancient and 192 present populations). The lowest f3-statistics for Europeans are negative (93% are >4 standard errors below 0), with most showing strong support for at least one ancient individual being one of the references (SI11). Europeans almost always have their lowest f3 with either (EEF, ANE) or (WHG, Near East) (SI11, Table 1, Extended Data Table 1), which would not be expected if there were just two ancient sources of ancestry (in which case the best references for all Europeans would be similar). The lowest f3-statistic for Near Easterners always takes Stuttgart as one of the reference populations, consistent with a Near Eastern origin for Stuttgart’s ancestors (Table 1). We also computed the statistic f4(Test, Stuttgart; MA1, Chimp), which measures whether MA1 shares more alleles with a Test population or with Stuttgart. This statistic is significantly positive (Extended Data Fig. 4, Extended Data Table 1) if Test is nearly any present-day West Eurasian population, showing that MA1-related ancestry has increased since the time of early farmers like Stuttgart (the analogous statistic using Native Americans instead of MA1 is correlated but smaller in magnitude (Extended Data Fig. 5), indicating that MA1 is a better surrogate than the Native Americans who were first used to document ANE ancestry in Europe[7,8]). The analogous statistic f4(Test, Stuttgart; Loschbour, Chimp) is nearly always positive in Europeans and negative in Near Easterners, indicating that Europeans have more ancestry from populations related to Loschbour than do Near Easterners (Extended Data Fig. 4, Extended Data Table 1). Extended Data Table 2 documents the robustness of key f-statistics by recomputing them using transversion polymorphisms not affected by ancient DNA damage, and also using whole-genome sequencing data not affected by SNP ascertainment bias. Extended Data Fig. 6 shows the geographic gradients in the degree of allele sharing of present-day West Eurasians (as measured by f4-statistics) with Stuttgart (EEF), Loschbour (WHG) and MA1 (ANE).
Table 1

Lowest f-statistics for each West Eurasian population

Ref1 | Ref2 | Target for which these two references give the lowest f3(X; Ref1, Ref2)
WHG | EEF | Sardinian***
WHG | Near East | Basque, Belarusian, Czech, English, Estonian, Finnish, French_South, Icelandic, Lithuanian, Mordovian, Norwegian, Orcadian, Scottish, Spanish, Spanish_North, Ukrainian
WHG | Siberian | Russian
EEF | ANE | Abkhasian***, Albanian, Ashkenazi_Jew****, Bergamo, Bulgarian, Chechen****, Croatian, Cypriot****, Druze**, French, Greek, Hungarian, Lezgin, Maltese, Sicilian, Turkish_Jew, Tuscan
EEF | Native American | Adygei, Balkar, Iranian, Kumyk, North_Ossetian, Turkish
EEF | African | BedouinA, BedouinB, Jordanian, Lebanese, Libyan_Jew, Moroccan_Jew, Palestinian, Saudi****, Syrian, Tunisian_Jew***, Yemenite_Jew***
EEF | South Asian | Armenian, Georgian****, Georgian_Jew*, Iranian_Jew***, Iraqi_Jew***

Note: WHG = Loschbour or LaBraña; EEF=Stuttgart; ANE=MA1; Native American=Piapoco; African=Esan, Gambian, or Kgalagadi; South Asian=GujaratiC or Vishwabrahmin. Statistics are negative with Z<-4 unless otherwise noted: † (positive) or *, **, ***, ****, to indicate Z less than 0, −1, −2, and −3 respectively. The complete list of statistics can be found in Extended Data Table 1.

Extended Data Figure 4

ANE ancestry is present in both Europe and the Near East but WHG ancestry is restricted to Europe, which cannot be due to a single admixture event.

(x-axis) We computed the statistic f4(Test, Stuttgart; MA1, Chimp), which measures whether MA1 shares more alleles with a test population than with Stuttgart. It is positive for most European and Near Eastern populations, consistent with ANE (MA1-related) gene flow into both regions. (y-axis) We computed the statistic f4(Test, Stuttgart; Loschbour, Chimp), which measures whether Loschbour shares more alleles with a test sample than with Stuttgart. Only European populations show positive values of this statistic, providing evidence of WHG (Loschbour-related) admixture only in Europeans.

Extended Data Figure 5

MA1 is the best surrogate for ANE for which we have data.

Europeans share more alleles with MA1 than with Karitiana, as we see from the fact that in a plot of f4(Test, Stuttgart; MA1, Chimp) and f4(Test, Stuttgart; Karitiana, Chimp), the European cline deviates in the direction of MA1, rather than Karitiana (the slope is >1 and European populations are above the line indicating equality of these two statistics).

Extended Data Table 2

Confirmation of key findings on transversions and on whole genome sequence data.

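Column groups: HO = D(A, B; C, D) on Human Origins genotype data, computed on all 594,924 SNPs and on the 110,817 transversions; WGS = D(A, B; C, D) on whole genome sequence data transversions. An interpretation applies to its own row and to the rows below it until the next interpretation; blank cells have no reported value.

Interpretation | HO A | HO B | HO C | HO D | HO statistic (594,924 SNPs) | HO Z (594,924 SNPs) | HO statistic (110,817 transversions) | HO Z (110,817 transversions) | WGS A | WGS B | WGS C | WGS D | WGS statistic | WGS Z
Stuttgart has Near Eastern ancestry | Stuttgart | Armenian | Loschbour | Chimp | 0.0219 | 4.5 | 0.0189 | 2.9 | | | | | |
Europeans have more WHG-related ancestry than Stuttgart | Stuttgart | French | Loschbour | Chimp | −0.0266 | −5.7 | −0.031 | −5.0 | Stuttgart | French2 | Loschbour | Chimp | −0.03 | −4.7
 | Lithuanian | Stuttgart | Loschbour | Chimp | 0.0446 | 9.1 | 0.0477 | 7.2 | | | | | |
West Eurasians have more ANE-related ancestry than Stuttgart | French | Stuttgart | MA1 | Chimp | 0.0367 | 7.7 | 0.0386 | 5.5 | French2 | Stuttgart | MA1 | Chimp | 0.037 | 6.4
 | Lezgin | Stuttgart | MA1 | Chimp | 0.0372 | 7.6 | 0.0409 | 5.6 | | | | | |
MA1 is a better surrogate of ANE ancestry than Karitiana | French | Chimp | MA1 | Karitiana | 0.0207 | 4.5 | 0.0214 | 2.8 | French2 | Chimp | MA1 | Karitiana2 | 0.026 | 3.8
Eastern non-Africans closer to WHG/ANE/SHG than to EEF | Loschbour | Stuttgart | Onge | Chimp | 0.0196 | 3.5 | 0.0202 | 2.5 | | | | | |
 | Loschbour | Stuttgart | Papuan | Chimp | 0.0142 | 2.6 | 0.0127 | 1.5 | Loschbour | Stuttgart | Papuan2 | Chimp | 0.017 | 2.7
 | Loschbour | Stuttgart | Dai | Chimp | 0.0164 | 3.2 | 0.021 | 2.8 | Loschbour | Stuttgart | Dai2 | Chimp | 0.018 | 2.9
 | MA1 | Stuttgart | Papuan | Chimp | 0.0139 | 2.2 | 0.0103 | 1.0 | MA1 | Stuttgart | Papuan2 | Chimp | 0.018 | 2.8
 | MA1 | Stuttgart | Dai | Chimp | 0.0174 | 3.0 | 0.016 | 1.7 | MA1 | Stuttgart | Dai2 | Chimp | 0.028 | 4.3
 | Motala12 | Stuttgart | Papuan | Chimp | 0.0182 | 3.2 | 0.011 | 1.1 | Motala12 | Stuttgart | Papuan2 | Chimp | 0.023 | 3.7
 | Motala12 | Stuttgart | Dai | Chimp | 0.0156 | 2.8 | 0.0149 | 1.6 | Motala12 | Stuttgart | Dai2 | Chimp | 0.02 | 3.2
 | LaBrana | Stuttgart | Papuan | Chimp | 0.0123 | 2.3 | 0.0101 | 1.1 | LaBrana | Stuttgart | Papuan2 | Chimp | 0.02 | 3.2
 | LaBrana | Stuttgart | Dai | Chimp | 0.0149 | 2.9 | 0.0228 | 2.5 | LaBrana | Stuttgart | Dai2 | Chimp | 0.024 | 3.7
Native Americans closer to ANE than to WHG | Karitiana | Chimp | MA1 | Loschbour | 0.0467 | 7.1 | 0.0467 | 4.4 | Karitiana2 | Chimp | MA1 | Loschbour | 0.052 | 7.1
West Eurasians closer to Native Americans than to other Eastern non-Africans | Stuttgart | Chimp | Karitiana | Papuan | 0.0559 | 10.9 | 0.0474 | 6.6 | Stuttgart | Chimp | Karitiana2 | Papuan2 | 0.052 | 7.6
 | Stuttgart | Chimp | Karitiana | Onge | 0.0237 | 5.1 | 0.0179 | 2.6 | | | | | |
Ancient Eurasian hunter-gatherers equally related to Eastern non-Africans other than Native Americans | Loschbour | MA1 | Dai | Chimp | −0.0015 | −0.2 | 0.0016 | 0.2 | Loschbour | MA1 | Dai2 | Chimp | −0.013 | −1.9
 | Loschbour | MA1 | Papuan | Chimp | 0.0002 | 0.0 | 0.0012 | 0.1 | Loschbour | MA1 | Papuan2 | Chimp | −0.003 | −0.4
 | Loschbour | Motala12 | Dai | Chimp | 0.0024 | 0.4 | 0.009 | 0.9 | Loschbour | Motala12 | Dai2 | Chimp | −0.002 | −0.3
 | Loschbour | Motala12 | Papuan | Chimp | −0.0028 | −0.4 | 0.0046 | 0.5 | Loschbour | Motala12 | Papuan2 | Chimp | −0.004 | −0.6
 | MA1 | Motala12 | Dai | Chimp | 0.0026 | 0.4 | 0.0047 | 0.4 | MA1 | Motala12 | Dai2 | Chimp | 0.01 | 1.5
 | MA1 | Motala12 | Papuan | Chimp | −0.0047 | −0.7 | −0.001 | −0.1 | MA1 | Motala12 | Papuan2 | Chimp | −0.004 | −0.5
LaBrana and Loschbour are a clade | LaBrana | Loschbour | Dai | Chimp | −0.0028 | −0.5 | 0.0024 | 0.3 | LaBrana | Loschbour | Dai2 | Chimp | 0.007 | 1.1
 | LaBrana | Loschbour | Papuan | Chimp | −0.0031 | −0.5 | −0.0012 | −0.1 | LaBrana | Loschbour | Papuan2 | Chimp | 0.002 | 0.3
 | LaBrana | Loschbour | MA1 | Chimp | −0.006 | −0.8 | 0.0101 | 0.7 | LaBrana | Loschbour | MA1 | Chimp | 0.005 | 0.7
SHG closer to ANE than to WHG | Motala12 | Loschbour | MA1 | Chimp | 0.0425 | 5.3 | 0.0353 | 2.6 | Motala12 | Loschbour | MA1 | Chimp | 0.042 | 5.9
 | Motala12 | LaBrana | MA1 | Chimp | 0.0465 | 5.8 | 0.0347 | 2.4 | Motala12 | LaBrana | MA1 | Chimp | 0.038 | 5.4
LaBrana and Loschbour equally related to Stuttgart | LaBrana | Loschbour | Stuttgart | Chimp | −0.0176 | −2.6 | −0.0106 | −1.0 | LaBrana | Loschbour | Stuttgart | Chimp | −0.012 | −1.8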
Extended Data Figure 6

The differential relatedness of West Eurasians to Stuttgart (EEF), Loschbour (WHG), and MA1 (ANE) cannot be explained by two-way mixture.

We plot on a West Eurasian map the statistic f4(Test, Chimp; A1, A2), where A1 and A2 are a pair of the three ancient samples representing the three ancestral populations of Europe. (A) In both Europe and the Near East/Caucasus, populations from the south have more relatedness to Stuttgart than those from the north where ANE influence is also important. (B) Northern European populations share more alleles with Loschbour than with Stuttgart, as they have additional WHG ancestry beyond what was already present in EEF. (C) We observe a striking contrast between Europe west of the Caucasus and the Near East in degree of relatedness to WHG. In Europe, there is a much higher degree of allele sharing with Loschbour than with MA1, which we ascribe to the 60–80% WHG/(WHG+ANE) ratio in most Europeans that we report in SI14. In contrast, the Near East has no appreciable WHG ancestry but some ANE ancestry, especially in the northern Caucasus. (Jewish populations are marked with a square in this figure to assist in interpretation as their ancestry is often anomalous for their geographic regions.)

To determine the minimum number of source populations needed to explain the data for many European populations taken together, we studied the matrix of all possible statistics of the form f4(Test_base, Test_i; O_base, O_j) (SI12). Test_base is a reference European population, Test_i is the set of all other European Test populations, O_base is a reference outgroup, and O_j is the set of other outgroups (ancient DNA samples, Onge, Karitiana, and Mbuti). The rank of the (i, j) matrix reflects the minimum number of sources that contributed to the Test populations[16,17]. For a pool of individuals from 23 Test populations representing most present-day European groups, this analysis rejects descent from just two sources (P < 10^−12 by a Hotelling T²-test[17]). However, three source populations are consistent with the data after excluding the Spanish who have evidence for African admixture[18-20] (P=0.019, not significant after multiple-hypothesis correction), consistent with the results from ADMIXTURE (SI9), PCA (Fig. 1B, SI10) and f-statistics (Extended Data Table 1, Extended Data Fig. 6, SI11, SI12). We caution that the finding of three sources could be consistent with a larger number of mixture events. Moreover, the source populations may themselves have been mixed. Indeed, the positive f4(Stuttgart, Test; Loschbour, Chimp) statistics obtained when Test is Near Eastern (Extended Data Table 1) imply that the EEF had some WHG-related ancestry, which was greater than 0% and as high as 45% (SI13).

We used the ADMIXTUREGRAPH software[8,15] to fit a model (a tree structure augmented by admixture events) to the data, exploring models relating the three ancient populations (Stuttgart, Loschbour, and MA1) to two eastern non-Africans (Onge and Karitiana) and sub-Saharan Africans (Mbuti). We found no models that fit the data with 0 or 1 admixture events, but did find a model that fit with 2 admixture events (SI14). The successful model (Fig. 2A) confirms the existence of MA1-related admixture in Native Americans[3], but includes the novel inference that Stuttgart is partially (44 ± 10%) derived from a lineage that split prior to the separation of eastern non-Africans from the common ancestor of WHG and ANE. The existence of such “Basal Eurasian” admixture into Stuttgart provides a simple explanation for our finding that diverse eastern non-African populations share significantly more alleles with ancient European and Upper Paleolithic Siberian hunter-gatherers than with Stuttgart (that is, f4(Eastern non-African, Chimp; Hunter-gatherer, Stuttgart) is significantly positive), but that hunter-gatherers appear to be equally related to most eastern groups (SI14). We verified the robustness of the model by reanalyzing the data using the unsupervised MixMapper[7] (SI15) and TreeMix[21] software (SI16), which both identified the same admixture events. The ANE/WHG split must have occurred >24,000 years ago (as it must predate the age of MA1[3]), and the WHG/Eastern non-African split must have occurred >40,000 years ago (as it must predate the Tianyuan[22] individual from China which clusters with Asians to the exclusion of Europeans). The Basal Eurasian split must be even older, and might be related to early settlement of the Levant[23] or Arabia[24,25] prior to the diversification of most Eurasians, or more recent gene flow from Africa[26]. However, the Basal Eurasian population shares much of the genetic drift common to non-African populations after their separation from Africans, and thus does not appear to represent gene flow between sub-Saharan Africans and the ancestors of non-Africans after the out-of-Africa bottleneck (SI14).
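The rank argument used to count source populations can be illustrated with a toy simulation. In the sketch below, test populations are built as exact mixtures of k hypothetical sources, and the matrix of naive f4 statistics between test populations and outgroups then has rank k − 1. All allele frequencies are random numbers and the function names are illustrative assumptions; real data are noisy, which is why the analysis above relies on a significance test rather than an exact rank.

```python
import random


def f4(a, b, c, d):
    """Naive f4(A, B; C, D): mean of (a - b) * (c - d) over SNPs."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def mix(sources, weights):
    """Allele frequencies of a population that is a weighted mixture of sources."""
    n = len(sources[0])
    return [sum(w * s[i] for w, s in zip(weights, sources)) for i in range(n)]


def f4_matrix(tests, outgroups):
    """Matrix of f4(Test_base, Test_i; O_base, O_j) with the first entries as bases."""
    base_t, base_o = tests[0], outgroups[0]
    return [[f4(base_t, t, base_o, o) for o in outgroups[1:]] for t in tests[1:]]


def matrix_rank(rows, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def toy_rank(weight_rows, n_snps=2000, seed=0):
    """Rank of the f4 matrix for test populations mixed from random sources."""
    rng = random.Random(seed)
    n_sources = len(weight_rows[0])
    sources = [[rng.random() for _ in range(n_snps)] for _ in range(n_sources)]
    outgroups = [[rng.random() for _ in range(n_snps)] for _ in range(4)]
    tests = [mix(sources, w) for w in weight_rows]
    return matrix_rank(f4_matrix(tests, outgroups))


# Hypothetical mixture weights for five test populations.
three_way = [(0.6, 0.3, 0.1), (0.5, 0.2, 0.3), (0.3, 0.5, 0.2),
             (0.8, 0.1, 0.1), (0.4, 0.4, 0.2)]
two_way = [(0.6, 0.4), (0.5, 0.5), (0.3, 0.7), (0.8, 0.2), (0.4, 0.6)]
```

Calling `toy_rank(three_way)` and `toy_rank(two_way)` contrasts the two scenarios: a rank above 1 is incompatible with all test populations being mixtures of only two sources.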
Figure 2

Modeling of West Eurasian population history

(a) A three-way mixture model that is a fit to the data for many populations. Present-day samples are colored in blue, ancient in red, and reconstructed ancestral populations in green. Solid lines represent descent without mixture, and dashed lines represent admixture. We print mixture proportions and one standard error for the two mixtures relating the highly divergent ancestral populations. (We do not print the estimate for the “European” population as it varies depending on the population). (b) We plot the proportions of ancestry from each of three inferred ancestral populations (EEF, ANE and WHG).

Fitting present-day Europeans into the model, we find that few populations can be fit as 2-way mixtures, but nearly all are compatible with 3-way mixtures of ANE/EEF/WHG (SI14). The mixture proportions from the fitted model (Fig. 2B; Extended Data Table 3) are encouragingly consistent with those obtained from a separate method that relates European populations to diverse outgroups using f-statistics, assuming only that MA1 is an unmixed descendant of ANE, Loschbour of WHG, and Stuttgart of EEF (SI17). We infer that EEF ancestry in Europe today ranges from ~30% in the Baltic region to ~90% in the Mediterranean, consistent with patterns of identity-by-descent (IBD) sharing[27,28] (SI18) and shared haplotype analysis (chromosome painting)[29] (SI19) in which Loschbour shares more segments with northern Europeans and Stuttgart with southern Europeans. Southern Europeans inherited their European hunter-gatherer ancestry mostly via EEF ancestors (Extended Data Fig. 6), while Northern Europeans acquired up to 50% of WHG ancestry above and beyond the WHG-related ancestry which they received through their EEF ancestors. Europeans have a larger proportion of WHG than ANE ancestry in general. By contrast, in the Near East there is no detectable WHG ancestry, but up to ~29% ANE in the Caucasus (SI14). A striking feature of these findings is that ANE ancestry is inferred to be present in nearly all Europeans today (with a maximum of ~20%), but was absent in both farmers and hunter-gatherers from central/western Europe during the Neolithic transition. At the same time, we infer that ANE ancestry was not completely absent from the larger European region at that time: we find that it was present in ~8,000-year-old Scandinavian hunter-gatherers, since MA1 shares more alleles with Motala12 (SHG) than with Loschbour, and Motala12 fits as a mixture of 81% WHG and 19% ANE (SI14).
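To give a feel for how mixture proportions can be related to outgroups through f4-statistics, the sketch below uses the linearity of f4 in its first argument: if a test population is an exact mixture of three sources, then each f4(Test, O1; O2, O3) is the same weighted combination of the corresponding source statistics, and the weights can be recovered by least squares. This is a toy illustration of the general idea under strong assumptions (random hypothetical allele frequencies, no noise, unmixed source samples); it is not a reproduction of the procedure in SI17, and all names are illustrative.

```python
import random
from itertools import combinations


def f4(a, b, c, d):
    """Naive f4(A, B; C, D): mean of (a - b) * (c - d) over SNPs."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def estimate_mixture(test, eef, whg, ane, outgroups):
    """Least-squares (alpha, beta, gamma) with alpha + beta + gamma = 1."""
    base = outgroups[0]
    sxx = sxy = syy = sxr = syr = 0.0
    for o2, o3 in combinations(outgroups[1:], 2):
        c = f4(ane, base, o2, o3)
        x = f4(eef, base, o2, o3) - c   # multiplies alpha (EEF share)
        y = f4(whg, base, o2, o3) - c   # multiplies beta (WHG share)
        r = f4(test, base, o2, o3) - c  # observed value minus the ANE term
        sxx += x * x
        sxy += x * y
        syy += y * y
        sxr += x * r
        syr += y * r
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * syy - sxy * sxy
    alpha = (sxr * syy - sxy * syr) / det
    beta = (sxx * syr - sxy * sxr) / det
    return alpha, beta, 1.0 - alpha - beta


def toy_estimate(weights=(0.5, 0.3, 0.2), n_snps=5000, seed=1):
    """Recover hypothetical mixture weights from simulated allele frequencies."""
    rng = random.Random(seed)
    eef, whg, ane = ([rng.random() for _ in range(n_snps)] for _ in range(3))
    outgroups = [[rng.random() for _ in range(n_snps)] for _ in range(5)]
    test = [weights[0] * e + weights[1] * w + weights[2] * a
            for e, w, a in zip(eef, whg, ane)]
    return estimate_mixture(test, eef, whg, ane, outgroups)
```

Because the simulated test population is an exact mixture, `toy_estimate()` returns the input weights up to floating-point error; with real genotype data the estimates carry standard errors such as those listed in Extended Data Table 3.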
Extended Data Table 3

Admixture proportions for European populations. The estimates from the model with minimal assumptions are from SI17. The estimates from the full modeling are from SI14 either by single population analysis or co-fitting population pairs and averaging over fits (these averages are the results plotted in Fig. 2B). Populations that do not fit the models are not reported.

Column groups: (1) Full modeling of population relationships (individual fits); (2) Full modeling of population relationships (averaged fits), given as mean (range); (3) Modeling of population relationships with minimal assumptions, given as estimate ± standard error; (4) Model-based (averaged) − Model with minimal assumptions (Z-score).

Population | (1) EEF | (1) WHG | (1) ANE | (2) EEF | (2) WHG | (2) ANE | (3) EEF | (3) WHG | (3) ANE | (4) EEF | (4) WHG | (4) ANE
Albanian | 0.781 | 0.092 | 0.127 | 0.781 (0.772–0.819) | 0.082 (0.032–0.098) | 0.137 (0.129–0.158) | 0.595 ± 0.112 | 0.353 ± 0.150 | 0.052 ± 0.049 | 1.658 | −1.807 | 1.741
Ashkenazi_Jew | 0.931 | 0 | 0.069 | | | | 0.938 ± 0.146 | −0.021 ± 0.185 | 0.083 ± 0.049 | | |
Basque | 0.593 | 0.293 | 0.114 | 0.569 (0.527–0.616) | 0.335 (0.255–0.392) | 0.096 (0.076–0.129) | 0.569 ± 0.091 | 0.315 ± 0.124 | 0.115 ± 0.041 | −0.001 | 0.165 | −0.472
Belarusian | 0.418 | 0.431 | 0.151 | 0.426 (0.397–0.464) | 0.408 (0.338–0.443) | 0.167 (0.150–0.199) | 0.272 ± 0.094 | 0.554 ± 0.131 | 0.174 ± 0.047 | 1.637 | −1.118 | −0.158
Bergamo | 0.715 | 0.177 | 0.108 | 0.721 (0.704–0.793) | 0.163 (0.061–0.189) | 0.117 (0.104–0.147) | 0.644 ± 0.125 | 0.248 ± 0.170 | 0.108 ± 0.053 | 0.615 | −0.503 | 0.162
Bulgarian | 0.712 | 0.147 | 0.141 | 0.718 (0.707–0.778) | 0.132 (0.047–0.151) | 0.151 (0.138–0.175) | 0.556 ± 0.110 | 0.328 ± 0.143 | 0.116 ± 0.043 | 1.469 | −1.372 | 0.804
Croatian | 0.561 | 0.293 | 0.145 | 0.564 (0.548–0.586) | 0.285 (0.242–0.310) | 0.151 (0.137–0.172) | 0.453 ± 0.122 | 0.407 ± 0.159 | 0.140 ± 0.046 | 0.911 | −0.768 | 0.238
Czech | 0.495 | 0.338 | 0.167 | 0.489 (0.460–0.531) | 0.348 (0.273–0.382) | 0.163 (0.145–0.196) | 0.402 ± 0.117 | 0.400 ± 0.162 | 0.198 ± 0.050 | 0.744 | −0.322 | −0.698
English | 0.495 | 0.364 | 0.141 | 0.503 (0.476–0.536) | 0.353 (0.296–0.382) | 0.144 (0.130–0.169) | 0.475 ± 0.091 | 0.357 ± 0.125 | 0.168 ± 0.043 | 0.304 | −0.028 | −0.561
Estonian | 0.322 | 0.495 | 0.183 | 0.323 (0.293–0.345) | 0.49 (0.451–0.520) | 0.187 (0.172–0.205) | 0.072 ± 0.121 | 0.778 ± 0.176 | 0.150 ± 0.064 | 2.070 | −1.636 | 0.584
French | 0.554 | 0.311 | 0.135 | 0.563 (0.537–0.601) | 0.297 (0.230–0.328) | 0.14 (0.126–0.169) | 0.498 ± 0.097 | 0.359 ± 0.127 | 0.142 ± 0.039 | 0.672 | −0.487 | −0.060
French_South | 0.675 | 0.195 | 0.13 | 0.636 (0.589–0.738) | 0.256 (0.111–0.323) | 0.108 (0.088–0.151) | 0.636 ± 0.116 | 0.225 ± 0.165 | 0.140 ± 0.057 | −0.003 | 0.189 | −0.558
Greek | 0.792 | 0.058 | 0.151 | 0.791 (0.780–0.816) | 0.048 (0.019–0.060) | 0.161 (0.150–0.171) | 0.658 ± 0.098 | 0.255 ± 0.127 | 0.086 ± 0.039 | 1.357 | −1.627 | 1.915
Hungarian | 0.558 | 0.264 | 0.179 | 0.548 (0.520–0.590) | 0.279 (0.199–0.313) | 0.174 (0.156–0.210) | 0.391 ± 0.109 | 0.454 ± 0.153 | 0.155 ± 0.050 | 1.437 | −1.145 | 0.371
Icelandic | 0.394 | 0.456 | 0.15 | 0.409 (0.386–0.424) | 0.448 (0.409–0.473) | 0.143 (0.126–0.170) | 0.342 ± 0.102 | 0.476 ± 0.137 | 0.182 ± 0.045 | 0.654 | −0.204 | −0.861
Lithuanian | 0.364 | 0.464 | 0.172 | 0.352 (0.327–0.384) | 0.488 (0.433–0.527) | 0.16 (0.135–0.184) | 0.248 ± 0.117 | 0.548 ± 0.163 | 0.205 ± 0.052 | 0.886 | −0.367 | −0.864
Maltese | 0.932 | 0 | 0.068 | | | | 1.298 ± 0.185 | −0.509 ± 0.248 | 0.211 ± 0.079 | | |
Norwegian | 0.411 | 0.428 | 0.161 | 0.417 (0.388–0.438) | 0.423 (0.383–0.450) | 0.16 (0.140–0.181) | 0.273 ± 0.115 | 0.557 ± 0.161 | 0.170 ± 0.055 | 1.252 | −0.831 | −0.185
Orcadian | 0.457 | 0.385 | 0.158 | 0.465 (0.439–0.493) | 0.378 (0.329–0.403) | 0.157 (0.140–0.179) | 0.395 ± 0.088 | 0.437 ± 0.122 | 0.168 ± 0.041 | 0.798 | −0.487 | −0.264
Sardinian | 0.817 | 0.175 | 0.008 | 0.818 (0.791–0.874) | 0.141 (0.058–0.182) | 0.041 (0.026–0.068) | 0.883 ± 0.128 | 0.075 ± 0.166 | 0.042 ± 0.048 | −0.510 | 0.400 | −0.024
Scottish | 0.39 | 0.428 | 0.182 | 0.408 (0.387–0.424) | 0.421 (0.384–0.448) | 0.171 (0.149–0.201) | 0.286 ± 0.112 | 0.532 ± 0.156 | 0.182 ± 0.053 | 1.091 | −0.712 | −0.210
Sicilian | 0.903 | 0 | 0.097 | | | | 1.012 ± 0.149 | −0.131 ± 0.199 | 0.119 ± 0.060 | | |
Spanish | 0.809 | 0.068 | 0.123 | 0.759 (0.736–0.804) | 0.126 (0.066–0.170) | 0.115 (0.091–0.151) | 0.856 ± 0.126 | −0.015 ± 0.165 | 0.160 ± 0.049 | −0.769 | 0.855 | −0.922
Spanish_North | 0.713 | 0.125 | 0.163 | 0.612 (0.561–0.660) | 0.292 (0.214–0.365) | 0.096 (0.072–0.126) | 0.581 ± 0.120 | 0.298 ± 0.158 | 0.121 ± 0.046 | 0.254 | −0.038 | −0.533
Tuscan | 0.746 | 0.136 | 0.118 | 0.751 (0.737–0.806) | 0.123 (0.047–0.145) | 0.126 (0.114–0.150) | 0.734 ± 0.118 | 0.153 ± 0.160 | 0.113 ± 0.054 | 0.141 | −0.188 | 0.249
Ukrainian | 0.462 | 0.387 | 0.151 | 0.463 (0.445–0.491) | 0.376 (0.322–0.399) | 0.16 (0.148–0.187) | 0.259 ± 0.123 | 0.596 ± 0.173 | 0.145 ± 0.057 | 1.661 | −1.269 | 0.269
Finnish | | | | | | | −0.299 ± 0.204 | 1.194 ± 0.296 | 0.105 ± 0.105 | | |
Mordovian | | | | | | | −0.255 ± 0.173 | 1.151 ± 0.246 | 0.104 ± 0.090 | | |
Russian | | | | | | | −0.303 ± 0.211 | 1.230 ± 0.301 | 0.072 ± 0.106 | | |

Two sets of European populations are poor fits for the model. Sicilians, Maltese, and Ashkenazi Jews have EEF estimates of >100%, consistent with their having more Near Eastern ancestry than can be explained via EEF admixture (SI17). They also cannot be jointly fit with other Europeans (SI14), and they fall in the gap between Europeans and Near Easterners (Fig. 1B). Finns, Mordovians and Russians (from the northwest of Russia) also do not fit (SI14; Extended Data Table 3) due to East Eurasian gene flow into the ancestors of these northeastern European populations. These populations (and Chuvash and Saami) are more related to East Asians than can be explained by ANE admixture (Extended Data Fig. 7), likely reflecting a separate stream of Siberian gene flow into northeastern Europe (SI14).
Extended Data Figure 7

Evidence for Siberian gene flow into far northeastern Europe.

Some northeastern European populations (Chuvash, Finnish, Russian, Mordovian, Saami) share more alleles with Han Chinese than do other Europeans, who are arrayed in a cline from Stuttgart to Lithuanians/Estonians in a plot of two f4-statistics measuring relatedness to Han and to MA1, respectively.

Several questions will be important to address in future ancient DNA work. Where and when did the Near Eastern farmers admix with European hunter-gatherers to produce the EEF? How did the ancestors of present-day Europeans first acquire their ANE ancestry? Discontinuity in central Europe during the late Neolithic (~4,500 years ago) associated with the appearance of mtDNA types absent in earlier farmers and hunter-gatherers[30] raises the possibility that ANE ancestry may have also appeared at this time. Finally, it is important to study ancient genome sequences from the Near East to provide insights into the history of the Basal Eurasians.

Online Methods

Archeological context, sampling and DNA extraction

The Loschbour sample stems from a male skeleton excavated in 1935 at the Loschbour rock shelter in Heffingen, Luxembourg. The skeleton was AMS radiocarbon dated to 7,205 ± 50 years before present (OxA-7738; 6,220-5,990 cal BC)[31]. At the Palaeogenetics Laboratory in Mainz, material for DNA extraction was sampled from tooth 16 (an upper right M1 molar) after irradiation with UV-light, surface removal, and pulverization in a mixer mill. DNA extraction took place in the palaeogenetics facilities in the Institute for Archaeological Sciences at the University of Tübingen. Three extracts were made in total, one from 80 mg of powder using an established silica-based protocol[32] and two additional extracts from 90 mg of powder each with a protocol optimized for the recovery of short DNA molecules[33].

The Stuttgart sample was taken from a female skeleton excavated in 1982 at the site Viesenhäuser Hof, Stuttgart-Mühlhausen, Germany. It was attributed to the Linearbandkeramik (5,500-4,800 BC) through associated pottery artifacts and the chronology was corroborated by radiocarbon dating of the stratigraphy[34]. Both sampling and DNA extraction took place in the Institute for Archaeological Sciences at the University of Tübingen. Tooth 47 (a lower right M2 molar) was removed and material from the inner part was sampled with a sterile dentistry drill. An extract was made using 40 mg of bone powder[33].

The Motala individuals were recovered from the site of Kanaljorden in the town of Motala, Östergötland, Sweden, excavated between 2009 and 2013. The human remains at this site are represented by several adult skulls and one infant skeleton. All individuals are part of a ritual deposition at the bottom of a small lake. Direct radiocarbon dates on the remains range between 7,013 ± 76 and 6,701 ± 64 BP (6,361-5,516 cal BC), corresponding to the late Middle Mesolithic of Scandinavia. Samples were taken from the teeth of the nine best-preserved skulls, as well as a femur and tibia. Bone powder was removed from the inner parts of the teeth or bones with a sterile dentistry drill. DNA from 100 mg of bone powder was extracted[35] in the ancient DNA laboratory of the Archaeological Research Laboratory, Stockholm.

Library preparation

Illumina sequencing libraries were prepared using either double- or single-stranded library preparation protocols[36,37] (SI1). For high-coverage shotgun sequencing libraries, a DNA repair step with Uracil-DNA-glycosylase (UDG) and endonuclease VIII (endo VIII) treatment was included in order to remove uracil residues[38]. Size fractionation on a PAGE gel was also performed in order to remove longer DNA molecules that are more likely to be contaminants[37]. Positive and blank controls were carried along during every step of library preparation.

Shotgun sequencing and read processing

All non-UDG-treated libraries were sequenced either on an Illumina Genome Analyzer IIx with 2×76 + 7 cycles for the Loschbour and Motala libraries, or on an Illumina MiSeq with 2×150 + 8 + 8 cycles for the Stuttgart library. We followed the manufacturer’s protocol for multiplex sequencing. Raw overlapping forward and reverse reads were merged and filtered for quality[39] and mapped to the human reference genome (hg19/GRCh37/1000Genomes) using the Burrows-Wheeler Aligner (BWA)[40] (SI2). For deeper sequencing, UDG-treated libraries of Loschbour were sequenced on 3 Illumina HiSeq 2000 lanes with 50-bp single-end reads, 8 Illumina HiSeq 2000 lanes of 100-bp paired-end reads and 8 Illumina HiSeq 2500 lanes of 101-bp paired-end reads. The UDG-treated library for Stuttgart was sequenced on 8 HiSeq 2000 lanes of 101-bp paired-end reads. The UDG-treated libraries for Motala were sequenced on 8 HiSeq 2000 lanes of 100-bp paired-end reads, with 4 lanes each for two pools (one of 3 individuals and one of 4 individuals). We also sequenced an additional 8 HiSeq 2000 lanes for Motala12, the Motala sample with the highest percentage of endogenous human DNA. For the Loschbour and Stuttgart high-coverage individuals, diploid genotype calls were obtained using the Genome Analysis Toolkit (GATK)[41].

Enrichment of mitochondrial DNA and sequencing

To test for DNA preservation and mtDNA contamination, non-UDG-treated libraries of Loschbour and all Motala samples were enriched for human mitochondrial DNA using a bead-based capture approach with present-day human DNA as bait[42]. UDG-treatment was omitted in order to allow characterization of damage patterns typical for ancient DNA[10]. The captured libraries were sequenced on an Illumina Genome Analyzer IIx platform with 2 × 76 + 7 cycles and the resulting reads were merged and quality filtered[39]. The sequences were mapped to the Reconstructed Sapiens Reference Sequence, RSRS[43], using a custom iterative mapping assembler, MIA[44] (SI4).

Contamination estimates

We assessed whether the sequences had the characteristics of authentic ancient DNA using four approaches. First, we searched for evidence of contamination by determining whether the sequences mapping to the mitochondrial genome were consistent with deriving from more than one individual[44,45]. Second, for the high-coverage Loschbour and Stuttgart genomes, we used a maximum-likelihood-based estimate of autosomal contamination that uses variation at sites that are fixed in the 1000 Genomes data to estimate error, heterozygosity and contamination[46] simultaneously. Third, we estimated contamination based on the rate of polymorphic sites on the X chromosome of the male Loschbour individual[47] (SI3). Fourth, we analyzed non-UDG-treated reads mapping to the RSRS to search for aDNA-typical damage patterns resulting in C→T changes at the 5′-end of the molecule[10] (SI3).

Phylogenetic analysis of the mitochondrial genomes

All nine complete mitochondrial genomes that fulfilled the criteria of authenticity were assigned to haplogroups using Haplofind[48]. A Maximum Parsimony tree including present day humans and previously published ancient mtDNA sequences was generated with MEGA[49]. The effect of branch shortening due to a lower number of substitutions in ancient lineages was studied by calculating the nucleotide edit distance to the root for all haplogroup R sequences (SI4).

Sex determination and Y-chromosome analysis

We assessed the sex of all sequenced individuals by using the ratio of (chrY) to (chrY+chrX) aligned reads[50]. We downloaded a list of Y-chromosome SNPs curated by the International Society of Genetic Genealogy (ISOGG, http://www.isogg.org) v. 9.22 (accessed Feb. 18, 2014) and determined the state of the ancient individuals at positions where a single allele was observed and MAPQ≥30. We excluded C/G or A/T SNPs due to uncertainty about the polarity of the mutation in the database. The ancient individuals were assigned haplogroups based on their derived state (SI5). We also used BEAST v1.7.5[51] to assess the phylogenetic position of Loschbour using 623 males from around the world with 2,799 variant sites across 500 kb of non-recombining Y-chromosome sequence[52] (SI5).
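The read-ratio criterion above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and the two cut-off values (`female_max`, `male_min`) are assumptions chosen for the example, and the read counts would in practice come from an alignment summary.

```python
def y_read_fraction(chr_x_reads: int, chr_y_reads: int) -> float:
    """Fraction of sex-chromosome reads that align to chrY: Y / (Y + X)."""
    total = chr_x_reads + chr_y_reads
    if total == 0:
        raise ValueError("no reads aligned to chrX or chrY")
    return chr_y_reads / total


def call_sex(chr_x_reads: int, chr_y_reads: int,
             female_max: float = 0.016, male_min: float = 0.075) -> str:
    """Classify an individual from the chrY read fraction.

    The thresholds are illustrative assumptions, not values from this paper.
    Fractions between the two thresholds are left undetermined.
    """
    ratio = y_read_fraction(chr_x_reads, chr_y_reads)
    if ratio <= female_max:
        return "female"
    if ratio >= male_min:
        return "male"
    return "undetermined"


# Hypothetical read counts for two individuals.
print(call_sex(chr_x_reads=100_000, chr_y_reads=10_000))
print(call_sex(chr_x_reads=100_000, chr_y_reads=300))
```

Leaving an explicit "undetermined" band avoids forcing a call when coverage is low or contamination shifts the ratio.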

Estimation of Neanderthal admixture

We estimate Neanderthal admixture in ancient individuals with the f4-ratio or S-statistic[8,53,54] $\hat{\alpha} = f_4(\mathrm{Altai}, \mathrm{Denisova}; \mathrm{Test}, \mathrm{Yoruba}) / f_4(\mathrm{Altai}, \mathrm{Denisova}; \mathrm{Vindija}, \mathrm{Yoruba})$, which uses whole genome data from Altai, a high coverage (52×) Neanderthal genome sequence[55], Denisova, a high coverage sequence[37] from another archaic human population (31×), and Vindija, a low coverage (1.3×) Neanderthal genome from a mixture of three Neanderthal individuals from Vindija Cave in Croatia[53].
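To make the ratio concrete, the following sketch computes it from per-SNP allele frequencies. It assumes the standard definition of the four-population statistic as the mean over SNPs of (a − b)(c − d); the function names and the toy frequencies are hypothetical, and real analyses would add standard errors from a block jackknife.

```python
def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a_i - b_i) * (c_i - d_i)."""
    if not (len(a) == len(b) == len(c) == len(d)) or not a:
        raise ValueError("need four non-empty frequency lists of equal length")
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def neanderthal_ratio(altai, denisova, test, yoruba, vindija):
    """Ratio f4(Altai, Denisova; Test, Yoruba) / f4(Altai, Denisova; Vindija, Yoruba)."""
    return f4(altai, denisova, test, yoruba) / f4(altai, denisova, vindija, yoruba)


# Toy allele frequencies at four SNPs (hypothetical values).
altai = [1.0, 0.0, 1.0, 0.0]
denisova = [0.0, 0.0, 1.0, 1.0]
vindija = [1.0, 0.0, 1.0, 0.0]
yoruba = [0.1, 0.2, 0.9, 0.3]
# Build a test population that is Yoruba-like with 2% Vindija-like ancestry.
test = [y + 0.02 * (v - y) for y, v in zip(yoruba, vindija)]

print(neanderthal_ratio(altai, denisova, test, yoruba, vindija))
```

Because f4 is linear in each population's frequencies, the toy test population recovers its constructed 2% proportion up to floating-point error.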

Inference of demographic history and inbreeding

We used the Pairwise Sequentially Markovian Coalescent (PSMC)[56] to infer the size of the ancestral population of Stuttgart and Loschbour. This analysis requires high quality diploid genotype calls and cannot be performed in the low-coverage Motala samples. To determine whether the low effective population size inferred for Loschbour is due to recent inbreeding, we plotted the time-to-most-recent common ancestor (TMRCA) along each of chr1-22 to detect runs of low TMRCA.

Analysis of segmental duplications and copy number variants

We built read-depth based copy number maps for the Loschbour, Stuttgart and Motala12 genomes in addition to the Denisova and Altai Neanderthal genomes and 25 deeply sequenced modern genomes[55] (SI7). We built these maps by aligning reads, subdivided into their non-overlapping 36-bp constituents, against the reference genome using the mrsFAST aligner[57], and renormalizing read-depth for local GC content. We estimated copy numbers in windows of 500 unmasked base pairs slid at 100 bp intervals across the genome. We called copy number variants using a scale space filter algorithm. We genotyped variants of interest and compared the genotypes to those from individuals sequenced as part of the 1000 Genomes Project[58].

Phenotypic inference

We inferred likely phenotypes (SI8) by analyzing DNA polymorphism data in the VCF format[59] using VCFtools (http://vcftools.sourceforge.net/). For the Loschbour and Stuttgart individuals, we included data from sites not flagged as LowQuality, with genotype quality (GQ) of ≥30, and SNP quality (QUAL) of ≥50. For Motala12, which is of lower coverage, we included sites having at least 2× coverage and that passed visual inspection of the local alignment using samtools tview (http://samtools.sourceforge.net)[60].
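The site filters for the high-coverage genomes can be illustrated with a small stand-alone check on single-sample VCF records. This sketch uses plain Python rather than VCFtools; the function name is hypothetical, and the flag string `LowQual` is an assumption about how the low-quality flag is spelled in the FILTER column.

```python
def passes_filters(vcf_line: str, min_gq: float = 30, min_qual: float = 50,
                   low_quality_flag: str = "LowQual") -> bool:
    """Return True if a single-sample VCF data line passes the three filters.

    Columns follow the VCF layout: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual, filt, fmt, sample = fields[5], fields[6], fields[8], fields[9]
    # Drop sites carrying the low-quality flag in the FILTER column.
    if low_quality_flag in filt.split(";"):
        return False
    # Drop sites with missing or insufficient SNP quality.
    if qual == "." or float(qual) < min_qual:
        return False
    # Look up GQ by its position in the FORMAT column.
    sample_values = dict(zip(fmt.split(":"), sample.split(":")))
    gq = sample_values.get("GQ", ".")
    return gq != "." and float(gq) >= min_gq


# Hypothetical records.
good = "1\t1000\t.\tA\tG\t75.3\tPASS\tDP=20\tGT:GQ\t0/1:45"
flagged = "1\t2000\t.\tC\tT\t75.3\tLowQual\tDP=20\tGT:GQ\t0/1:45"
print(passes_filters(good), passes_filters(flagged))
```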

Human Origins dataset curation

The Human Origins array consists of 14 panels of SNPs for which the ascertainment is well known[8,61]. All population genetics analyses were carried out on a set of 594,924 autosomal SNPs, after restricting to sites that had >90% completeness across 7 different batches of sequencing, and that had >97.5% concordance with at least one of two subsets of samples for which whole genome sequencing data was also available. The total dataset consists of 2,722 individuals, which we filtered to 2,345 individuals (203 populations) after removing outlier individuals or relatives based on visual inspection of PCA plots[14,62] or model-based clustering analysis[13]. Whole genome amplified (WGA) individuals were not used in analysis, except for a Saami individual whom we included because of the special interest of this population for Northeastern European population history (Extended Data Fig. 7).

ADMIXTURE analysis

We merged all Human Origins genotype data with whole genome sequencing data from Loschbour, Stuttgart, MA1, Motala12, Motala_merge, and LaBrana. We then thinned the resulting dataset to remove SNPs in linkage-disequilibrium with PLINK 1.07[63], using a window size of 200 SNPs advanced by 25 SNPs and an r² threshold of 0.4. We ran ADMIXTURE 1.23[13,64] for 100 replicates with different starting random seeds, default 5-fold cross-validation, and varying the number of ancestral populations K between 2 and 20. We assessed clustering quality using CLUMPP[65]. We used the ADMIXTURE results to identify a set of 59 “West Eurasian” (European/Near Eastern) populations based on values of a “West Eurasian” ancestral population at K=3 (SI9). We also identified 15 populations for use as “non-West Eurasian outgroups” based on their having at least 10 individuals and no evidence of European or Near Eastern admixture at K=11, the lowest K for which Near Eastern/European-maximized ancestral populations appeared consistently across all 100 replicates.

Principal Components Analysis

We used smartpca[14] (version: 10210) from EIGENSOFT[62,66] 5.0.1 to carry out Principal Components Analysis (PCA) (SI10). We performed PCA on a subset of individuals and then projected others using the lsqproject: YES option, which gives an unbiased inference of the position of samples even in the presence of missing data (especially important for ancient DNA).

f-statistics

We use the f3-statistic[8] $f_3(\mathrm{Test}; \mathrm{Ref}_1, \mathrm{Ref}_2) = \frac{1}{N}\sum_{i=1}^{N}(t_i - r_{1,i})(t_i - r_{2,i})$, where $t_i$, $r_{1,i}$ and $r_{2,i}$ are the allele frequencies for the ith of N SNPs in populations Test, Ref1, Ref2, respectively, to determine if there is evidence that the Test population is derived from admixture of populations related to Ref1 and Ref2 (SI11). A significantly negative statistic provides unambiguous evidence of mixture in the Test population[8]. We allow Ref1 and Ref2 to be any Human Origins population with 4 or more individuals, or Loschbour, Stuttgart, MA1, Motala12, LaBrana. We assess significance of the f3-statistics using a block jackknife[67] and a block size of 5 cM. We report significance as the number of standard errors by which the statistic differs from zero (Z-score). We also perform an analysis in which we constrain the reference populations to be (i) EEF (Stuttgart) and WHG (Loschbour or LaBrana), (ii) EEF and a Near Eastern population, (iii) EEF and ANE (MA1), or (iv) any two present-day populations, and compute a Zdiff score between the lowest f3-statistic observed in the dataset and the f3-statistic observed for the specified pair.

We analyze f4-statistics[8] of the form $f_4(A, B; C, D) = \frac{1}{N}\sum_{i=1}^{N}(a_i - b_i)(c_i - d_i)$ to assess if populations A, B are consistent with forming a clade in an unrooted tree with respect to C, D. If they form a clade, the allele frequency differences between the two pairs should be uncorrelated and the statistic has an expected value of 0. We set the outgroup D to be a sub-Saharan African population or Chimpanzee. We systematically tried all possible combinations of the ancient samples or 15 “non-West Eurasian outgroups” identified by ADMIXTURE analysis as A, B, C to determine their genetic affinities (SI14). Setting A as a present-day test population and B as either Stuttgart or BedouinB, we documented relatedness to C=(Loschbour or MA1) or C=(MA1 and Karitiana) or C=(MA1 or Han) (Extended Data Figs. 4, 5, 7). Setting C as a test population and (A, B) a pair from (Loschbour, Stuttgart, MA1) we documented differential relatedness to ancient populations (Extended Data Fig. 6). We computed D-statistics[53] using transversion polymorphisms in whole genome sequence data[55] to confirm robustness to ascertainment and ancient DNA damage (Extended Data Table 2).
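A compact sketch of the admixture test and its significance assessment is given below. It assumes the basic three-population statistic as a mean of per-SNP products and a simple unweighted delete-one-block jackknife; it omits the finite-sample bias correction used in practice, and the block assignment (5 cM in the text) is taken as given. All names and frequencies are hypothetical.

```python
import math


def f3_block_jackknife(blocks):
    """Return (f3, standard error, Z-score) for f3(Test; Ref1, Ref2).

    `blocks` is a list of genomic blocks; each block is a list of
    (t, r1, r2) allele-frequency triples for the SNPs it contains.
    """
    if len(blocks) < 2:
        raise ValueError("the jackknife needs at least two blocks")
    # Per-block sums of (t - r1) * (t - r2) and per-block SNP counts.
    sums = [sum((t - r1) * (t - r2) for t, r1, r2 in block) for block in blocks]
    counts = [len(block) for block in blocks]
    total_sum, total_n = sum(sums), sum(counts)
    f3 = total_sum / total_n
    # Recompute the statistic with each block left out in turn.
    leave_one_out = [(total_sum - s) / (total_n - n) for s, n in zip(sums, counts)]
    g = len(leave_one_out)
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((x - mean_loo) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    z = f3 / se if se > 0 else float("nan")
    return f3, se, z


# Toy data: the test population is intermediate between the two references,
# so every per-SNP product is negative.
toy_blocks = [
    [(0.5, 0.2, 0.8), (0.4, 0.1, 0.9)],
    [(0.6, 0.3, 0.7), (0.5, 0.1, 0.8)],
    [(0.3, 0.1, 0.6), (0.7, 0.4, 0.9)],
]
f3_value, f3_se, f3_z = f3_block_jackknife(toy_blocks)
print(f3_value, f3_se, f3_z)
```

A strongly negative Z-score from such a computation is what the text treats as evidence of mixture; with real data the number of blocks would be in the hundreds rather than three.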

Minimum number of source populations for Europeans

We used qpWave[16,17] to study the minimum number of source populations for a designated set of Europeans (SI12). We use f4-statistics of the form $X(l, r) = f_4(l_0, l; r_0, r)$ where $l_0$, $r_0$ are arbitrarily chosen “base” populations, and l, r are other populations from two sets L and R respectively. If the matrix X(l, r) has rank r and there were n waves of immigration into R with no back-migration from R to L, then r+1 ≤ n. We set L to include Stuttgart, Loschbour, MA1, Onge, Karitiana, Mbuti and R to include 23 modern European populations who fit the model of SI14 and had admixture proportions within the interval [0,1] for the method with minimal modeling assumptions (SI17).

Admixture proportions for Stuttgart in the absence of a Near Eastern ancient genome

We used Loschbour and BedouinB as surrogates for “Unknown hunter-gatherer” and Near Eastern (NE) farmer populations that contributed to Stuttgart (SI13). Ancient Near Eastern ancestry in Stuttgart is estimated by an f4-ratio[8,15]. A complication is that BedouinB is a mixture of NE and African ancestry. We therefore subtracted[17] the effects of African ancestry using estimates of the BedouinB African admixture proportion from ADMIXTURE (SI9) or ALDER[68].

Admixture graph modeling

We used ADMIXTUREGRAPH[8] (version 3110) to model population relationships between Loschbour, Stuttgart, Onge, and Karitiana using Mbuti as an African outgroup. We assessed model fit using a block jackknife of differences between estimated and fitted f-statistics for the set of included populations (we expressed the fit as a Z score). We determined that a model failed if |Z|>3 for at least one f-statistic. A basic tree model failed and we manually amended the model to test all possible models with a single admixture event, which also failed. Further manual amendment to include 2 admixture events resulted in 8 successful models, only one of which could be amended to also fit MA1 as an additional constraint. We successfully fit both the Iceman and LaBrana into this model as simple clades and Motala12 as a 2-way mixture. We also fit present-day West Eurasians as clades, 2-way mixtures, or 3-way mixtures in this basic model, achieving a successful fit for a larger number of European populations (n=26) as 3-way mixtures. We estimated the individual admixture proportions from the fitted model parameters. To test if fitted parameters for different populations are consistent with each other, we jointly fit all pairs of populations A and B by modifying ADMIXTUREGRAPH to add a large constant (10,000) to the variance term of the f-statistic that includes both A and B. By doing this, we can safely ignore recent gene flow within Europe that affects statistics that include both A and B.
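The rejection rule described here is simple enough to state as code. The sketch below only illustrates the |Z|>3 criterion applied to a table of residual Z-scores; it is not part of ADMIXTUREGRAPH, and the statistic labels and values are hypothetical.

```python
def worst_statistic(z_scores: dict) -> tuple:
    """Return the (label, Z) pair with the largest absolute Z-score."""
    label = max(z_scores, key=lambda name: abs(z_scores[name]))
    return label, z_scores[label]


def model_fails(z_scores: dict, threshold: float = 3.0) -> bool:
    """A model fails if |Z| exceeds the threshold for at least one f-statistic."""
    return any(abs(z) > threshold for z in z_scores.values())


# Hypothetical residual Z-scores (estimated minus fitted f-statistics).
residuals = {
    "f4(Mbuti, Onge; Loschbour, Stuttgart)": 1.2,
    "f4(Mbuti, Karitiana; Loschbour, Stuttgart)": -3.4,
    "f4(Onge, Karitiana; Loschbour, Stuttgart)": 0.7,
}
print(model_fails(residuals), worst_statistic(residuals))
```

Reporting the worst-fitting statistic alongside the pass/fail decision indicates where a failed graph would need an additional admixture edge.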

Ancestry estimates from f-ratios

We estimate EEF ancestry using an f4-ratio[8,15], which produces results consistent with ADMIXTUREGRAPH (SI14). We use separate f4-ratios to estimate Basal Eurasian admixture into Stuttgart, to estimate ANE mixture in Karitiana (Fig. 2B), and to place a lower bound on ANE mixture into North Caucasian populations.

MixMapper analysis

We carried out MixMapper 2.0[7] analysis, a semi-supervised admixture graph fitting technique. First, we inferred a scaffold tree of populations without strong evidence of mixture relative to each other (Mbuti, Onge, Loschbour and MA1). We did not include European populations in the scaffold as all had significantly negative f3-statistics indicating admixture. We then ran MixMapper to infer the relatedness of the other ancient and present-day samples, fitting them onto the scaffold as 2- or 3-way mixtures. The uncertainty in all parameter estimates was measured by block bootstrap resampling of the SNP set (100 replicates with 50 blocks).
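The resampling scheme mentioned at the end of this paragraph can be sketched as follows. This is a generic block bootstrap of a mean over per-SNP values, assuming contiguous, roughly equal-sized blocks; it is not MixMapper's implementation, and the function name, seed handling and toy values are assumptions.

```python
import random
import statistics


def block_bootstrap_se(values, n_blocks=50, n_replicates=100, seed=0):
    """Standard error of the mean of `values` from a block bootstrap.

    The SNP-ordered values are cut into `n_blocks` contiguous blocks, and each
    replicate resamples whole blocks with replacement before recomputing the mean.
    """
    rng = random.Random(seed)
    n = len(values)
    blocks = [values[i * n // n_blocks:(i + 1) * n // n_blocks] for i in range(n_blocks)]
    blocks = [b for b in blocks if b]  # drop empty blocks when n < n_blocks
    estimates = []
    for _ in range(n_replicates):
        resampled = [x for block in rng.choices(blocks, k=len(blocks)) for x in block]
        estimates.append(sum(resampled) / len(resampled))
    return statistics.stdev(estimates)


# Hypothetical per-SNP statistic values in genomic order.
toy_values = [((i * 37) % 101) / 101 for i in range(1000)]
print(block_bootstrap_se(toy_values))
```

Resampling whole blocks rather than single SNPs preserves the correlation between neighbouring sites, which is the reason blocks are used at all.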

TreeMix analysis

We applied TreeMix[21] to Loschbour, Stuttgart, Motala12, and MA1[3], LaBrana[2] and the Iceman[1], along with the present-day samples of Karitiana, Onge and Mbuti. We restricted the analysis to 265,521 Human Origins array sites after excluding any SNPs where there were no-calls in any of the studied individuals. The tree was rooted with Mbuti and standard errors were estimated using blocks of 500 SNPs. We repeated the analysis on whole-genome sequence data, rooting with Chimp and replacing Onge with Dai since we did not have Onge whole genome sequence data[55]. We varied the number of migration events (m) between 0 and 5.

Inferring admixture proportions with minimal modeling assumptions

We devised a method to infer ancestry proportions from three ancestral populations (EEF, WHG, and ANE) without strong phylogenetic assumptions (SI17). We rely on 15 “non-West Eurasian” outgroups and, for pairs of outgroups $O_1$, $O_2$, study $f_4(\mathrm{European}, \mathrm{Stuttgart}; O_1, O_2)$, which equals $\alpha\beta\, f_4(\mathrm{Loschbour}, \mathrm{Stuttgart}; O_1, O_2) + \alpha(1-\beta)\, f_4(\mathrm{MA1}, \mathrm{Stuttgart}; O_1, O_2)$ if European has 1−α ancestry from EEF and the remaining α is split into fractions β and 1−β from WHG and ANE, respectively. This defines a system of equations with unknowns αβ, α(1−β), which we solve with least squares implemented in the function lsfit in R to obtain estimates of α and β. We repeated this computation 22 times dropping one chromosome at a time[20] to obtain block jackknife[67] estimates of the ancestry proportions and standard errors, with block size equal to the number of SNPs per chromosome. We assessed consistency of the inferred admixture proportions with those derived from the ADMIXTUREGRAPH model based on the number of standard errors between the two (Extended Data Table 3).
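The least-squares step can be illustrated with a small two-predictor solver. This sketch assumes a fit without an intercept and solves the normal equations directly in pure Python rather than calling R's lsfit; the function names and the toy f4 values are hypothetical. The last helper expresses the consistency check as the difference between two estimates in units of the standard error, shown with the Albanian EEF values from Extended Data Table 3.

```python
def solve_two_predictor_least_squares(y, x1, x2):
    """Least-squares solution of y ~ c1*x1 + c2*x2 (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * v for a, v in zip(x1, y))
    s2y = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear; no unique solution")
    c1 = (s1y * s22 - s2y * s12) / det
    c2 = (s2y * s11 - s1y * s12) / det
    return c1, c2


def ancestry_proportions(f4_european, f4_loschbour, f4_ma1):
    """Estimate EEF/WHG/ANE proportions from f4 values over outgroup pairs.

    Each list holds one value per outgroup pair, all relative to Stuttgart.
    The coefficients are alpha*beta (WHG) and alpha*(1 - beta) (ANE).
    """
    whg, ane = solve_two_predictor_least_squares(f4_european, f4_loschbour, f4_ma1)
    return {"EEF": 1.0 - (whg + ane), "WHG": whg, "ANE": ane}


def z_score(model_estimate, minimal_estimate, minimal_se):
    """Number of standard errors between two estimates of the same proportion."""
    return (model_estimate - minimal_estimate) / minimal_se


# Toy f4 values over four outgroup pairs (hypothetical), built so that the
# European population has 35% WHG and 15% ANE ancestry.
x_whg = [0.010, -0.004, 0.006, 0.002]
x_ane = [0.003, 0.008, -0.005, 0.001]
y_eur = [0.35 * a + 0.15 * b for a, b in zip(x_whg, x_ane)]
print(ancestry_proportions(y_eur, x_whg, x_ane))
print(z_score(0.781, 0.595, 0.112))  # Albanian EEF, averaged fit vs minimal model
```

Dropping one chromosome at a time and re-running `ancestry_proportions` would give the jackknife standard errors described in the text.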

Haplotype-based analyses

We used RefinedIBD from BEAGLE 4[27] with the settings ibdtrim=20 and ibdwindow=25 to study IBD sharing between Loschbour and Stuttgart and populations from the POPRES dataset[69]. We kept all IBD tracts spanning at least 0.5 centimorgans (cM) and with a LOD score >3 (SI18). We also used ChromoPainter[29] to study haplotype sharing between Loschbour and Stuttgart and present-day West Eurasian populations (SI19). We identified 495,357 SNPs that were complete in all individuals and phased the data using BEAGLE 4[27] with parameters phase-its=50 and impute-its=10. We did not keep sites with missing data to avoid imputing modern alleles into the ancient individuals. We used both unlinked (-k 1000) and linked modes (estimating -n and -M by sampling 10% of individuals). We combined ChromoPainter output for chromosomes 1-22 using ChromoCombine[29]. We carried out a PCA of the co-ancestry matrix using fineSTRUCTURE[29].

Extended Data Figure 1

Photographs of analyzed ancient samples.

(A) Loschbour skull; (B) Stuttgart skull, missing the lower right M2 we sampled; (C) excavation at Kanaljorden in Motala, Sweden; (D) Motala 1 in situ.

Extended Data Figure 2

Pairwise Sequential Markovian Coalescent (PSMC) analysis.

(A) Inference of population size as a function of time, showing a very small recent population size over the most recent period in the ancestry of Loschbour (at least the last 5–10 thousand years). (B) Inferred time since the most recent common ancestor from the PSMC for chromosomes 20, 21, 22 (top to bottom); Stuttgart is plotted on top and Loschbour at bottom.

Extended Data Figure 3

ADMIXTURE analysis (K=2 to K=20).

Ancient samples (Loschbour, Stuttgart, Motala_merge, Motala12, MA1, and LaBrana) are at left.

Extended Data Figure 4

ANE ancestry is present in both Europe and the Near East but WHG ancestry is restricted to Europe, which cannot be due to a single admixture event.

(x-axis) We computed an f4-statistic that measures whether MA1 shares more alleles with a test population than with Stuttgart. It is positive for most European and Near Eastern populations, consistent with ANE (MA1-related) gene flow into both regions. (y-axis) We computed an f4-statistic that measures whether Loschbour shares more alleles with a test sample than with Stuttgart. Only European populations show positive values of this statistic, providing evidence of WHG (Loschbour-related) admixture only in Europeans.

Extended Data Figure 5

MA1 is the best surrogate for ANE for which we have data.

Europeans share more alleles with MA1 than with Karitiana, as we see from the fact that in a plot of the two corresponding f4-statistics (one for MA1, one for Karitiana), the European cline deviates in the direction of MA1, rather than Karitiana (the slope is >1 and European populations are above the line indicating equality of these two statistics).

Extended Data Figure 6

The differential relatedness of West Eurasians to Stuttgart (EEF), Loschbour (WHG), and MA1 (ANE) cannot be explained by two-way mixture.

We plot on a West Eurasian map an f4-statistic contrasting A1 and A2, where A1 and A2 are a pair of the three ancient samples representing the three ancestral populations of Europe. (A) In both Europe and the Near East/Caucasus, populations from the south have more relatedness to Stuttgart than those from the north where ANE influence is also important. (B) Northern European populations share more alleles with Loschbour than with Stuttgart, as they have additional WHG ancestry beyond what was already present in EEF. (C) We observe a striking contrast between Europe west of the Caucasus and the Near East in degree of relatedness to WHG. In Europe, there is a much higher degree of allele sharing with Loschbour than with MA1, which we ascribe to the 60–80% WHG/(WHG+ANE) ratio in most Europeans that we report in SI14. In contrast, the Near East has no appreciable WHG ancestry but some ANE ancestry, especially in the northern Caucasus. (Jewish populations are marked with a square in this figure to assist in interpretation as their ancestry is often anomalous for their geographic regions.)

Extended Data Table 1

West Eurasians genotyped on the Human Origins array and key f-statistics.

Note: Zdiff is the number of standard errors of the difference between the lowest f3-statistic over all reference pairs and the lowest f3-statistic for a subset of reference pairs. Abbreviations used: Stu: Stuttgart; Los: Loschbour; LaB: LaBrana.

Extended Data Table 2

Confirmation of key findings on transversions and on whole genome sequence data.
  63 in total

1.  A "Copernican" reassessment of the human mitochondrial DNA tree from its root.

Authors:  Doron M Behar; Mannis van Oven; Saharon Rosset; Mait Metspalu; Eva-Liis Loogväli; Nuno M Silva; Toomas Kivisild; Antonio Torroni; Richard Villems
Journal:  Am J Hum Genet       Date:  2012-04-06       Impact factor: 11.025

2.  Illumina sequencing library preparation for highly multiplexed target capture and sequencing.

Authors:  Matthias Meyer; Martin Kircher
Journal:  Cold Spring Harb Protoc       Date:  2010-06

3.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

4.  CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure.

Authors:  Mattias Jakobsson; Noah A Rosenberg
Journal:  Bioinformatics       Date:  2007-05-07       Impact factor: 6.937

5.  Diet and the evolution of human amylase gene copy number variation.

Authors:  George H Perry; Nathaniel J Dominy; Katrina G Claw; Arthur S Lee; Heike Fiegler; Richard Redon; John Werner; Fernando A Villanea; Joanna L Mountain; Rajeev Misra; Nigel P Carter; Charles Lee; Anne C Stone
Journal:  Nat Genet       Date:  2007-09-09       Impact factor: 38.330

6.  Gene flow from North Africa contributes to differential human genetic diversity in southern Europe.

Authors:  Laura R Botigué; Brenna M Henn; Simon Gravel; Brian K Maples; Christopher R Gignoux; Erik Corona; Gil Atzmon; Edward Burns; Harry Ostrer; Carlos Flores; Jaume Bertranpetit; David Comas; Carlos D Bustamante
Journal:  Proc Natl Acad Sci U S A       Date:  2013-06-03       Impact factor: 11.205

7.  Inference of human population history from individual whole-genome sequences.

Authors:  Heng Li; Richard Durbin
Journal:  Nature       Date:  2011-07-13       Impact factor: 49.962

8.  Population structure and eigenanalysis.

Authors:  Nick Patterson; Alkes L Price; David Reich
Journal:  PLoS Genet       Date:  2006-12       Impact factor: 5.917

9.  The complete genome sequence of a Neanderthal from the Altai Mountains.

Authors:  Kay Prüfer; Fernando Racimo; Nick Patterson; Flora Jay; Sriram Sankararaman; Susanna Sawyer; Anja Heinze; Gabriel Renaud; Peter H Sudmant; Cesare de Filippo; Heng Li; Swapan Mallick; Michael Dannemann; Qiaomei Fu; Martin Kircher; Martin Kuhlwilm; Michael Lachmann; Matthias Meyer; Matthias Ongyerth; Michael Siebauer; Christoph Theunert; Arti Tandon; Priya Moorjani; Joseph Pickrell; James C Mullikin; Samuel H Vohr; Richard E Green; Ines Hellmann; Philip L F Johnson; Hélène Blanche; Howard Cann; Jacob O Kitzman; Jay Shendure; Evan E Eichler; Ed S Lein; Trygve E Bakken; Liubov V Golovanova; Vladimir B Doronichev; Michael V Shunkov; Anatoli P Derevianko; Bence Viola; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo
Journal:  Nature       Date:  2013-12-18       Impact factor: 49.962

10.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

  427 in total

Review 1.  Origin of ethnic groups, linguistic families, and civilizations in China viewed from the Y chromosome.

Authors:  Xueer Yu; Hui Li
Journal:  Mol Genet Genomics       Date:  2021-05-26       Impact factor: 3.291

2.  Deep History of East Asian Populations Revealed Through Genetic Analysis of the Ainu.

Authors:  Choongwon Jeong; Shigeki Nakagome; Anna Di Rienzo
Journal:  Genetics       Date:  2015-10-23       Impact factor: 4.562

3.  Inference of biogeographical ancestry across central regions of Eurasia.

Authors:  O Bulbul; G Filoglu; T Zorlu; H Altuncul; A Freire-Aradas; J Söchtig; Y Ruiz; M Klintschar; S Triki-Fendri; A Rebai; C Phillips; M V Lareu; Á Carracedo; P M Schneider
Journal:  Int J Legal Med       Date:  2015-08-20       Impact factor: 2.686

4.  Coevolution of genes and languages and high levels of population structure among the highland populations of Daghestan.

Authors:  Tatiana M Karafet; Kazima B Bulayeva; Johanna Nichols; Oleg A Bulayev; Farida Gurgenova; Jamilia Omarova; Levon Yepiskoposyan; Olga V Savina; Barry H Rodrigue; Michael F Hammer
Journal:  J Hum Genet       Date:  2015-11-26       Impact factor: 3.172

5.  Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome.

Authors:  Lara M Cassidy; Rui Martiniano; Eileen M Murphy; Matthew D Teasdale; James Mallory; Barrie Hartwell; Daniel G Bradley
Journal:  Proc Natl Acad Sci U S A       Date:  2015-12-28       Impact factor: 11.205

6.  Testing for Ancient Selection Using Cross-population Allele Frequency Differentiation.

Authors:  Fernando Racimo
Journal:  Genetics       Date:  2015-11-23       Impact factor: 4.562

7.  Admixture, Population Structure, and F-Statistics.

Authors:  Benjamin M Peter
Journal:  Genetics       Date:  2016-02-08       Impact factor: 4.562

Review 8.  Recent advances in the study of fine-scale population structure in humans.

Authors:  John Novembre; Benjamin M Peter
Journal:  Curr Opin Genet Dev       Date:  2016-09-20       Impact factor: 5.578

9.  Human variation in the shape of the birth canal is significant and geographically structured.

Authors:  Lia Betti; Andrea Manica
Journal:  Proc Biol Sci       Date:  2018-10-24       Impact factor: 5.349

10.  Ancestral Origins and Genetic History of Tibetan Highlanders.

Authors:  Dongsheng Lu; Haiyi Lou; Kai Yuan; Xiaoji Wang; Yuchen Wang; Chao Zhang; Yan Lu; Xiong Yang; Lian Deng; Ying Zhou; Qidi Feng; Ya Hu; Qiliang Ding; Yajun Yang; Shilin Li; Li Jin; Yaqun Guan; Bing Su; Longli Kang; Shuhua Xu
Journal:  Am J Hum Genet       Date:  2016-08-25       Impact factor: 11.025

