Igor Gorin1,2,3, Oleg Balanovsky1,3,4, Oleg Kozlov1, Sergey Koshel3,5, Elena Kostryukova6, Maxat Zhabagin7, Anastasiya Agdzhoyan1,3, Vladimir Pylev3,4, Elena Balanovska1,3,4.
Abstract
Currently available genetic tools effectively distinguish between different continental origins. However, North Eurasia, which constitutes one-third of the world's largest continent, remains severely underrepresented. The dataset used in this study represents 266 populations from 12 North Eurasian countries, including most of the ethnic diversity across Russia's vast territory. A total of 1,883 samples were genotyped using the Illumina Infinium Omni5Exome-4 v1.3 BeadChip. Three principal components were computed for the entire dataset, with three iterations of outlier removal. This analysis allowed the 266 populations to be merged into larger groups while maintaining intragroup homogeneity, yielding 29 ethnic geographic groups (EGGs) that were genetically distinguishable enough to trace individual ancestry. Several feature selection methods, including the random forest algorithm, were tested to estimate the number of genetic markers needed to differentiate between the groups; 5,229 ancestry-informative SNPs were selected. We tested various classifiers that support multiple classes and produce an output value for each class that can be interpreted as a probability. Logistic regression was chosen as the best mathematical model for predicting ancestral populations. The machine learning algorithm for inferring an ancestral ethnic geographic group was implemented in the original software "Homeland", which comprises an interface module, a prediction module, and a cartographic module. Examples of geographic maps showing the likelihood of geographic ancestry for individuals from different regions of North Eurasia are provided. Validation showed that the groups predicted with near-perfect accuracy and sensitivity were concentrated in South and Central Siberia, the Far East, and Kamchatka. The overall accuracy of assigning an individual to one of the 29 ethnic geographic groups reached 71%.
The proposed method can be employed to predict ancestry for individuals from the populations of Russia and its neighboring states. It can be used for the needs of forensic science and genetic genealogy.
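The abstract mentions classifiers whose per-class outputs can be read as probabilities. In multinomial logistic regression, this is usually achieved by applying a softmax to one linear score per class. The following is a minimal sketch of that idea only, not the Homeland implementation: the group names, weights, biases, and the coding of genotypes as allele counts (0, 1, or 2) are all hypothetical values chosen for illustration.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(
    genotype: List[int],
    weights: Dict[str, List[float]],
    biases: Dict[str, float],
) -> Dict[str, float]:
    """Return one probability per group for a single genotype vector."""
    labels = list(weights)
    scores = [
        biases[label] + sum(w * g for w, g in zip(weights[label], genotype))
        for label in labels
    ]
    return dict(zip(labels, softmax(scores)))


# Hypothetical toy model: 3 groups, 3 SNPs (the study uses 29 groups
# and 5,229 SNPs; these numbers are made up for the example).
weights = {
    "group_A": [0.9, -0.4, 0.1],
    "group_B": [-0.3, 0.8, 0.2],
    "group_C": [0.0, 0.1, -0.6],
}
biases = {"group_A": 0.0, "group_B": 0.1, "group_C": -0.2}

sample = [2, 0, 1]  # assumed coding: allele counts per SNP
probs = predict_proba(sample, weights, biases)
best = max(probs, key=probs.get)

for label, p in probs.items():
    print(f"{label}: {p:.3f}")
print("most likely group:", best)
```

Per-group probabilities of this kind are what a cartographic step could then place on a map, with each group's probability assigned to the area that group occupies.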
Keywords: ancestral origin; ancestry prediction; gene geography; human population genetics; machine learning
Year: 2022 PMID: 35651934 PMCID: PMC9149316 DOI: 10.3389/fgene.2022.902309
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. A map of the 266 populations of North Eurasia used for the analysis. Notes. Dot colors on the map indicate the languages spoken by representatives of the studied populations (the color legend is provided at the top of the map).
Populations and sizes of ethnic geographic groups (EGGs).
| No | EGG | Populations | Size |
|---|---|---|---|
| 1 | Amur_Nanais&Nivkhs&Orochi&Ulchi | Nanais, Nivkhs, Ulchi, Orochs | 55 |
| 2 | Bashkirs | Bashkirs | 44 |
| 3 | Buryats&Khamnegan&Yakuts | Buryats, Khamnegan, Yakuts | 59 |
| 4 | Chechens&Ingush | Chechens, Ingush | 39 |
| 5 | Chukchi&Koryaks&Itelmen | Koryaks, Itelmens, Kamchadals, Chukchi | 75 |
| 6 | Dagestan | Avars, Kubachins, Dargins, Tabasarans, Laks, Lezgins, Rutuls | 74 |
| 7 | Evenks&Evens | Evens, Evenks | 49 |
| 8 | Karelians&Veps | Karelians, Veps | 38 |
| 9 | Kazakh&Karakalpak&Uigur&Nogais | Karakalpaks, Nogais_Astrakhan, Nogais_Stavropol, Uyghurs, Kazakhs | 33 |
| 10 | Khakass&AltaiSouth | Khakass, Altaians | 46 |
| 11 | Khanty&Mansi&Nenets | Khanty, Nenets, Mansi | 53 |
| 12 | Komi&Udmurts | Komi Permyaks, Komi Zyrians, Udmurts, Besermyan | 84 |
| 13 | Kyrghyz | Kyrghyz | 43 |
| 14 | Mari&Chuvash | Chuvashes, Mari | 53 |
| 15 | Mongols&Kalmyks | Mongols, Kalmyks | 127 |
| 16 | Mordovians | Mordovians Moksha, Mordovians Erzya, Mordovians Shoksha | 41 |
| 17 | Ossets | Ossetians | 36 |
| 18 | Russians_North | Russians, Izhora, Vod | 81 |
| 19 | Russians_Southern | Russians, Belorussians | 240 |
| 20 | Russians_VeryNorth | Russians | 35 |
| 21 | Shors&AltaiNorth | Shors, Altaians | 37 |
| 22 | Siberian Tatars | Tatars Siberian | 68 |
| 23 | Tajiks&Pomiri&Yaghnobi | Pomiri, Tajiks, Yaghnobi | 72 |
| 24 | Tatars | Tatars Kryashen, Tatars Kazan, Tatars Mishar, Tatars from Bashkortostan, Tatars Astrakhan | 60 |
| 25 | Transcaucasia&Crimea | Armenians, Azeri, Tatars_Crimean, Karaites, Turks, Kurds, Ezids, Georgians | 113 |
| 26 | Tuvinians&Tofalars | Tuvinians, Mongols, Tofalars | 64 |
| 27 | Ukrainians | Ukrainians | 79 |
| 28 | Uzbeks&Turkmens | Turkmens, Uzbeks | 55 |
| 29 | West_Caucasus | Adyghe, Kabardinians, Shapsug, Karachays, Abkhazians, Circassians, Abazins, Balkars | 87 |
FIGURE 2. North Eurasia divided into genetically distinguishable ethnic geographic groups. Notes. Colored zones on the map designate the areas occupied by the identified ethnic geographic groups. Groups are numbered according to their geographic coordinates. Black stars represent local populations (the same populations as in Figure 1).
FIGURE 3. A plot of the first and the second principal components based on the entire 4.5 M SNP panel.
FIGURE 4. Example of the map generated by the Homeland software.
Prediction metrics for each EGG.
| EGG | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Amur_Nanais&Nivkhs&Orochi&Ulchi | 1.00 | 1.00 | 1.00 | 12 |
| Bashkirs | 0.71 | 0.77 | 0.74 | 13 |
| Buryats&Khamnegan&Yakuts | 0.79 | 0.88 | 0.83 | 17 |
| Chechens&Ingush | 1.00 | 0.63 | 0.77 | 8 |
| Chukchi&Koryaks&Itelmen | 1.00 | 1.00 | 1.00 | 20 |
| Dagestan | 0.90 | 0.90 | 0.90 | 20 |
| Evenks&Evens | 1.00 | 1.00 | 1.00 | 14 |
| Karelians&Veps | 1.00 | 0.73 | 0.84 | 11 |
| Kazakh&Karakalpak&Uigur&Nogais | 0.75 | 0.30 | 0.43 | 10 |
| Khakass&AltaiSouth | 1.00 | 0.92 | 0.96 | 13 |
| Khanty&Mansi&Nenets | 0.94 | 1.00 | 0.97 | 16 |
| Komi&Udmurts | 0.96 | 0.88 | 0.92 | 25 |
| Kyrghyz | 0.83 | 0.50 | 0.63 | 10 |
| Mari&Chuvash | 0.84 | 1.00 | 0.91 | 16 |
| Mongols&Kalmyks | 0.82 | 0.95 | 0.88 | 38 |
| Mordovians | 0.80 | 0.67 | 0.73 | 12 |
| Ossets | 0.86 | 0.55 | 0.67 | 11 |
| Russians_North | 0.80 | 0.35 | 0.48 | 23 |
| Russians_Southern | 0.75 | 0.93 | 0.83 | 59 |
| Russians_VeryNorth | 1.00 | 0.90 | 0.95 | 10 |
| Shors&AltaiNorth | 1.00 | 1.00 | 1.00 | 10 |
| Siberian Tatars | 1.00 | 0.65 | 0.79 | 20 |
| Tajiks&Pomiri&Yaghnobi | 0.81 | 0.95 | 0.88 | 22 |
| Tatars | 0.60 | 0.38 | 0.46 | 16 |
| Transcaucasia&Crimea | 0.92 | 0.96 | 0.94 | 25 |
| Tuvinians&Tofalars | 1.00 | 1.00 | 1.00 | 17 |
| Ukrainians | 0.57 | 1.00 | 0.73 | 24 |
| Uzbeks&Turkmens | 0.86 | 0.86 | 0.86 | 14 |
| West_Caucasus | 0.75 | 0.81 | 0.78 | 26 |
| — | — | — | — | — |
| accuracy | — | — | 0.84 | 532 |
| macro avg | 0.87 | 0.81 | 0.82 | 532 |
| weighted avg | 0.85 | 0.84 | 0.83 | 532 |
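To make the columns of the table above concrete, the sketch below computes precision, recall, F1-score, and support for each class from a confusion matrix, along with the overall accuracy and the macro and weighted averages of F1. The three class names and all counts are hypothetical; they are not taken from the study. The example is arranged so that class "B" attracts misclassified samples from the other classes, which gives it high recall but lower precision, the same pattern the table shows for Ukrainians.

```python
from typing import Dict, List


def per_class_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """confusion[i][j] = samples of true class i predicted as class j."""
    n = len(labels)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        support = sum(confusion[k])                         # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return metrics


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
labels = ["A", "B", "C"]
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 3, 17],
]

metrics = per_class_metrics(confusion, labels)
total = sum(m["support"] for m in metrics.values())

# Accuracy: correctly classified samples over all samples.
accuracy = sum(confusion[k][k] for k in range(len(labels))) / total
# Macro average: every class counts equally.
macro_f1 = sum(m["f1"] for m in metrics.values()) / len(metrics)
# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(m["f1"] * m["support"] for m in metrics.values()) / total

for label, m in metrics.items():
    print(f"{label}: P={m['precision']:.2f} R={m['recall']:.2f} "
          f"F1={m['f1']:.2f} support={m['support']}")
print(f"accuracy={accuracy:.2f} macro F1={macro_f1:.2f} "
      f"weighted F1={weighted_f1:.2f}")
```

Because the weighted average follows class sizes, large classes dominate it, whereas the macro average is pulled down more strongly by small, poorly predicted classes.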