Umberto Esposito, Ranajit Das, Syakir Syed, Mehdi Pirooznia, Eran Elhaik.
Abstract
The rapid accumulation of ancient human genomes from various areas and time periods potentially enables the expansion of studies of biodiversity, biogeography, forensics, population history, and epidemiology into past populations. However, most ancient DNA (aDNA) data were generated through microarrays designed for modern-day populations, which are known to misrepresent the population structure. Past studies addressed these problems by using ancestry informative markers (AIMs). Yet it is unclear whether AIMs derived from contemporary human genomes can capture ancient population structures, and whether AIM-finding methods are applicable to aDNA, given that the high missingness rates in ancient (and oftentimes haploid) DNA can also distort the population structure. Here, we define ancient AIMs (aAIMs) and develop a framework to evaluate established and novel AIM-finding methods in identifying the most informative markers. We show that aAIMs identified by a novel principal component analysis (PCA)-based method outperform all of the competing methods in classifying ancient individuals into populations and identifying admixed individuals. In some cases, predictions made using the aAIMs were more accurate than those made with a complete marker set. We discuss the features of the ancient Eurasian population structure and strategies to identify aAIMs. This work informs the design of single nucleotide polymorphism (SNP) microarrays and the interpretation of aDNA results, enabling population-wide testing of primordialist theories.
Keywords: admixture mapping; primordialism; ancient DNA; ancient ancestry informative markers; population structure; principal component analysis
Year: 2018 PMID: 30545160 PMCID: PMC6316245 DOI: 10.3390/genes9120625
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
Figure 1. Geographic distribution of the highly differentiated rs7896530 in modern-day (A) and ancient (B) populations. The geographic distributions of the T (black) and G (yellow) alleles were obtained from the Geography of Genetic Variants Browser [33] and Table S1, respectively.
Figure 2. A workflow to identify ancient ancestry informative markers (aAIMs) and to evaluate the accuracy of aAIM-finding algorithms against each other and against the complete single nucleotide polymorphism (SNP) set (CSS). We adopted four criteria to evaluate how well the aAIM candidates captured the population structure depicted by the CSS. First, we qualitatively compared the dispersal of genomes obtained from a principal component analysis (PCA) to that of the CSS. Second, we compared the Euclidean distances between the admixture proportions of each genome and those obtained from the CSS. To avoid inconsistencies between the SNP sets, we used admixture components obtained through a supervised ADMIXTURE analysis (see Methods). Third, we tested which aAIMs classified individuals to populations most accurately. Finally, we evaluated the ability of the top-performing method to identify admixed individuals against the CSS. aDNA: ancient DNA.
Figure 3. Geographical locations of the ancient genomes. The shapes designate the country of origin of the genomes and their colors designate the era. The total number of ancient genomes from each era is noted. Insets show densely sampled regions.
Figure 4. Violin plots comparing the Euclidean distances (Δ) between the admixture proportions of the ancient genomes obtained from the CSS and those obtained from the aAIM sets. Shorter distances indicate higher similarity between the admixture proportions obtained using two different SNP sets.
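The distance Δ in Figure 4 can be illustrated with a minimal sketch. It assumes that each genome is summarized by a vector of admixture proportions and that Δ is the ordinary Euclidean distance between the vector obtained from the CSS and the vector obtained from an aAIM set. The function name and the proportions below are hypothetical and are not taken from the study.

```python
import math

def admixture_distance(css_props, aaim_props):
    """Euclidean distance between two admixture-proportion vectors."""
    if len(css_props) != len(aaim_props):
        raise ValueError("vectors must have the same number of components")
    return math.dist(css_props, aaim_props)

# Hypothetical proportions for one genome over three admixture components.
css = [0.70, 0.20, 0.10]          # proportions inferred from the complete SNP set
aaim_close = [0.65, 0.25, 0.10]   # an aAIM set that preserves the structure
aaim_far = [0.10, 0.20, 0.70]     # an aAIM set that distorts it

delta_close = admixture_distance(css, aaim_close)
delta_far = admixture_distance(css, aaim_far)
print(f"{delta_close:.3f} {delta_far:.3f}")
```

Under these assumptions, an aAIM set that recovers nearly the same proportions as the CSS yields a Δ close to zero, which is what the narrow, low violins in such a plot would indicate.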
Accuracy in classifying individuals to populations using the aAIM candidates. The total number of individuals (n) per population is reported in column two. The remaining columns show, for each SNP set, the number of individuals correctly assigned to their population and, in brackets, the corresponding percentage of the population. The Rand10k and Rand15k columns effectively represent random sets of 10,000 and 15,000 SNPs, respectively. The mean and standard error for each SNP set are provided in the last row.
| Population | n | CSS | PD |  | Infocalc | Admixture1 | Admixture2 | Rand10k | Rand15k |
|---|---|---|---|---|---|---|---|---|---|
| Britain Iron Saxon | 10 | 10 (100) | 4 (40) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 1 (10) | 3 (30) |
| Caucasus Chalcolithic Bronze | 22 | 21 (95) | 8 (36) | 0 (0) | 12 (55) | 6 (27) | 4 (18) | 13 (59) | 9 (41) |
| Caucasus Mesolithic Neolithic | 9 | 6 (67) | 7 (78) | 0 (0) | 6 (67) | 1 (11) | 7 (78) | 4 (44) | 4 (44) |
| Central EU Early Neolithic | 26 | 17 (65) | 14 (54) | 4 (15) | 18 (69) | 4 (15) | 5 (19) | 14 (54) | 18 (69) |
| Central EU Late Neolithic Bronze | 57 | 16 (28) | 17 (30) | 19 (33) | 19 (33) | 13 (23) | 21 (37) | 25 (44) | 21 (37) |
| Central EU Mid Neolithic Chalc | 6 | 2 (33) | 3 (50) | 0 (0) | 3 (50) | 3 (50) | 3 (50) | 2 (33) | 2 (33) |
| Central North EU Late Neol Bronz | 20 | 18 (90) | 9 (45) | 0 (0) | 6 (30) | 0 (0) | 5 (25) | 4 (20) | 6 (30) |
| Central Western EU Mesolithic | 3 | 3 (100) | 2 (67) | 0 (0) | 3 (100) | 0 (0) | 0 (0) | 1 (33) | 3 (100) |
| Italy Mid Neolithic Chalcolithic | 4 | 4 (100) | 3 (75) | 0 (0) | 1 (25) | 1 (25) | 0 (0) | 1 (25) | 1 (25) |
| Jordan Bronze | 3 | 3 (100) | 2 (67) | 0 (0) | 0 (0) | 2 (67) | 3 (100) | 1 (33) | 2 (67) |
| Levant Epipaleolithic Neolithic | 19 | 7 (37) | 6 (32) | 0 (0) | 9 (47) | 8 (42) | 7 (37) | 4 (21) | 7 (37) |
| Russia Chalcolithic | 3 | 2 (67) | 3 (100) | 0 (0) | 1 (33) | 0 (0) | 2 (67) | 1 (33) | 1 (33) |
| Russia Early Mid Bronze | 19 | 19 (100) | 15 (79) | 0 (0) | 10 (53) | 0 (0) | 18 (95) | 10 (53) | 14 (74) |
| Russia Late Chalcolithic | 9 | 6 (67) | 6 (67) | 0 (0) | 5 (56) | 0 (0) | 1 (11) | 3 (33) | 3 (33) |
| Russia Mesolithic | 3 | 2 (67) | 2 (67) | 0 (0) | 2 (67) | 0 (0) | 1 (33) | 2 (67) | 2 (67) |
| Russia Mid Late Bronze | 22 | 15 (68) | 16 (73) | 0 (0) | 7 (32) | 0 (0) | 0 (0) | 4 (18) | 6 (27) |
| Spain Early Neolithic | 6 | 4 (67) | 5 (83) | 0 (0) | 6 (100) | 4 (67) | 4 (67) | 4 (67) | 5 (83) |
| Spain Mid Neolithic Chalcolithic | 18 | 7 (39) | 6 (33) | 0 (0) | 7 (39) | 5 (28) | 3 (17) | 5 (28) | 5 (28) |
| Sweden Mesolithic | 8 | 8 (100) | 8 (100) | 0 (0) | 7 (88) | 4 (50) | 1 (13) | 6 (75) | 7 (88) |
| Sweden Mid Neolithic | 4 | 4 (100) | 1 (25) | 1 (25) | 2 (50) | 1 (25) | 0 (0) | 4 (100) | 2 (50) |
| Turkey Neolithic | 24 | 23 (96) | 18 (75) | 0 (0) | 12 (50) | 3 (13) | 6 (25) | 8 (33) | 11 (46) |
| Mean ± SE (%) |  | 76 ± 5 | 61 ± 5 | 3 ± 2 | 50 ± 6 | 21 ± 5 | 33 ± 7 | 42 ± 5 | 50 ± 5 |
EU: Europe. CSS: Complete single nucleotide polymorphism (SNP) set; PD: Principal component analysis (PCA)-derived.
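The summary row can be checked with a short calculation. The sketch below assumes that the mean is the unweighted average of the per-population percentages and that the standard error is the sample standard deviation divided by the square root of the number of populations; the source does not state the formula, so this is an assumption. The input values are the bracketed percentages of the CSS column, and the helper name is hypothetical.

```python
import math
import statistics

# Bracketed percentages from the CSS column, one value per population,
# in the order the populations appear in the table.
css_percent = [100, 95, 67, 65, 28, 33, 90, 100, 100, 100, 37,
               67, 100, 67, 67, 68, 67, 39, 100, 100, 96]

def mean_and_se(values):
    """Unweighted mean and standard error (sample SD / sqrt(n)) of the values."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se

mean, se = mean_and_se(css_percent)
print(f"{mean:.0f} ± {se:.0f}")
```

With this definition the CSS column gives 76 ± 5, matching the value in the last row. Because the average is unweighted, a population with three individuals counts as much as one with 57.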
Accuracy of inferring hybrid individuals using the PD’s aAIMs. The six parental populations and the number of hybrid individuals generated from them are shown. Each hybrid was represented by three datasets: CSS, PD aAIMs, and a random SNP set. The mean genetic distances (d) between the admixture components of these datasets per population are shown. Short distances indicate high genetic similarity.
| Parental Population A | Parental Population B | # Hybrids |  |  |  |
|---|---|---|---|---|---|
| Britain Iron Saxon | Britain Iron Saxon | 6 | 0.026 | 0.212 | 0.208 |
| Britain Iron Saxon | Russia Late Chalcolithic | 9 | 0.009 | 0.610 | 0.601 |
| Britain Iron Saxon | Sweden Mesolithic | 9 | 0.051 | 0.344 | 0.337 |
| Britain Iron Saxon | Turkey Neolithic | 9 | 0.003 | 0.428 | 0.431 |
| Britain Iron Saxon | Spain Early Neolithic | 9 | 0.108 | 0.221 | 0.241 |
| Russia Late Chalcolithic | Russia Late Chalcolithic | 6 | 0.009 | 0.443 | 0.448 |
| Russia Late Chalcolithic | Sweden Mesolithic | 9 | 0.062 | 0.578 | 0.561 |
| Russia Late Chalcolithic | Turkey Neolithic | 9 | 0.063 | 0.661 | 0.633 |
| Russia Late Chalcolithic | Spain Early Neolithic | 9 | 0.101 | 0.520 | 0.491 |
| Sweden Mesolithic | Sweden Mesolithic | 6 | 0.000 | 0.384 | 0.384 |
| Sweden Mesolithic | Turkey Neolithic | 9 | 0.055 | 0.567 | 0.522 |
| Spain Early Neolithic | Sweden Mesolithic | 9 | 0.108 | 0.402 | 0.377 |
| Turkey Neolithic | Turkey Neolithic | 6 | 0.001 | 0.627 | 0.626 |
| Spain Early Neolithic | Turkey Neolithic | 9 | 0.092 | 0.483 | 0.493 |
| Spain Early Neolithic | Spain Early Neolithic | 6 | 0.041 | 0.197 | 0.172 |
CSS: Complete single nucleotide polymorphism (SNP) set; PD: Principal component analysis (PCA)-derived.