Andrew J Pakstis, William C Speed, Usha Soundararajan, Haseena Rajeevan, Judith R Kidd, Hui Li, Kenneth K Kidd.
Abstract
The benefits of ancestry informative SNP (AISNP) panels can best accrue and be properly evaluated only as sufficient reference population data become readily accessible. Ideally the set of reference populations should approximate the genetic diversity of human populations worldwide. The Kidd and Seldin AISNP sets are two panels that have separately accumulated thus far the largest and most diverse collections of data on human reference populations from the major continental regions. A recent tally in the ALFRED allele frequency database finds 164 reference populations available for all the 55 Kidd AISNPs and 132 reference populations for all the 128 Seldin AISNPs. Although much more of the genetic diversity in human populations around the world still needs to be documented, 81 populations have genotype data available for all 170 AISNPs in the union of the Kidd and Seldin panels. In this report we examine admixture and principal component analyses on these 81 worldwide populations and some regional subsets of these reference populations to determine how well the combined panel illuminates population relationships. Analyses of this dataset that focused on Native American populations revealed very strong cluster patterns associated with many of the individual populations studied.
Year: 2019 PMID: 31827153 PMCID: PMC6906462 DOI: 10.1038/s41598-019-55175-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. STRUCTURE population bar plots showing estimated cluster membership values for each of 81 populations at K = 10 and K = 12, displaying the highest-likelihood run out of 20 runs at each K.
Comparing cluster membership value estimates (CMVE) of individuals (via STRUCTURE analyses) for different AISNP panels and within panels.
| Panel | K (* = best K) | Count <60% | Count ≥60% | Count ≥70% | Count ≥80% | Count ≥90% | Percent <60% | Percent ≥60% | Percent ≥70% | Percent ≥80% | Percent ≥90% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 170 SNPs | K = 8 | 412 | 3521 | 3213 | 2833 | 2184 | 10.5% | 89.5% | 81.7% | 72.0% | 55.5% |
| | K = 9 | 446 | 3487 | 3218 | 2849 | 2165 | 11.3% | 88.7% | 81.8% | 72.4% | 55.0% |
| | *K = 10 | 502 | 3431 | 3153 | 2755 | 2035 | 12.8% | 87.2% | 80.2% | 70.0% | 51.7% |
| | K = 11 | 646 | 3287 | 2949 | 2543 | 1809 | 16.4% | 83.6% | 75.0% | 64.7% | 46.0% |
| | K = 12 | 729 | 3204 | 2844 | 2388 | 1656 | 18.5% | 81.5% | 72.3% | 60.7% | 42.1% |
| 55 Kidd SNPs | K = 8 | 619 | 3414 | 3088 | 2660 | 1969 | 15.3% | 84.7% | 76.6% | 66.0% | 48.8% |
| | *K = 9 | 696 | 3337 | 2987 | 2571 | 1843 | 17.3% | 82.7% | 74.1% | 63.7% | 45.7% |
| | K = 10 | 905 | 3128 | 2749 | 2279 | 1595 | 22.4% | 77.6% | 68.2% | 56.5% | 39.5% |
| | K = 11 | 1088 | 2945 | 2550 | 2074 | 1387 | 27.0% | 73.0% | 63.2% | 51.4% | 34.4% |
| | K = 12 | 1215 | 2818 | 2361 | 1867 | 1164 | 30.1% | 69.9% | 58.5% | 46.3% | 28.9% |
| 128 Seldin SNPs | *K = 8 | 762 | 3164 | 2795 | 2346 | 1633 | 19.4% | 80.6% | 71.2% | 59.8% | 41.6% |
| | K = 9 | 1014 | 2912 | 2543 | 2079 | 1404 | 25.8% | 74.2% | 64.8% | 53.0% | 35.8% |
| | K = 10 | 944 | 2982 | 2504 | 2113 | 1343 | 24.0% | 76.0% | 63.8% | 53.8% | 34.2% |
| | K = 11 | 1059 | 2867 | 2462 | 1937 | 1180 | 27.0% | 73.0% | 62.7% | 49.3% | 30.1% |
| | K = 12 | 1174 | 2752 | 2367 | 1881 | 1150 | 29.9% | 70.1% | 60.3% | 47.9% | 29.3% |
The results are for the highest likelihood runs at/near optimal cluster (K) values within each dataset.
There is some variation in the total number of individuals for the 81 populations across the three AISNP sets analyzed because individuals with excessive numbers of missing typings were excluded. An individual was excluded from the analysis of a panel when >33% of SNP typings were missing.
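The exclusion rule and the threshold tallies in the table can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that missing typings are represented as `None` and that an individual's CMVE is the largest of its estimated cluster membership values; the function names `keep_individual`, `tally_cmve`, and `as_percent` are hypothetical.

```python
from typing import Dict, Optional, Sequence

MISSING_LIMIT = 0.33               # table note: exclude when >33% of typings are missing
THRESHOLDS_PCT = (60, 70, 80, 90)  # CMVE thresholds used in the table columns


def keep_individual(genotypes: Sequence[Optional[str]]) -> bool:
    """Return True when at most 33% of SNP typings are missing (None)."""
    missing = sum(g is None for g in genotypes)
    return missing / len(genotypes) <= MISSING_LIMIT


def tally_cmve(memberships: Sequence[Sequence[float]]) -> Dict[str, int]:
    """Count individuals by largest cluster membership (assumed CMVE definition)."""
    top = [max(row) for row in memberships]
    counts = {"<60%": sum(t < 0.60 for t in top)}
    for pct in THRESHOLDS_PCT:
        counts[f">={pct}%"] = sum(t >= pct / 100 for t in top)
    return counts


def as_percent(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Convert counts to percentages of all retained individuals, one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

For example, with 170 SNPs an individual missing 56 typings (32.9%) is kept, while one missing 57 (33.5%) is excluded. Applied to the first table row, `as_percent({"<60%": 412, ">=60%": 3521}, 3933)` gives 10.5 and 89.5, matching the reported percentages.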
Figure 2. PCA results for 81 populations showing strong clustering by continental regions. Zoom-in view of central clusters (61 of the populations). Population groupings for sub-Saharan Africa (11 populations including African Americans) and the Americas (9 populations) are off screen.
Figure 3. Region: Americas. Initial differentiation stages into North and South American clusters with low levels of admixture from European and African sources. Individual bar plots from STRUCTURE analysis for K = 2 to 5 based on 170 AISNPs. Analysis includes 9 Native American populations and 3 outlier populations (Yoruba, European Americans, and Outer Mongolians). Displaying best of 10 runs at each K.
Figure 4. Region: Americas. Individual bar plots from STRUCTURE analyses at K = 11 and 12 showing that, at higher K levels, clusters correspond increasingly to particular populations for seven of the nine Native American groups. However, the Maya and Quechua remain more complex, with multiple cluster affiliations, and the predominant light blue cluster is more difficult to see for the small number of Guihiba. The extra image inserted below K = 12 displays an expanded view of these three populations, with individuals that have similar cluster membership patterns at K = 12 sorted together. A rather specific gray cluster also appears for about one-third of the Ticuna at K = 12.