Population inference based on mitochondrial DNA control region data by the nearest neighbors algorithm.

Fu-Chi Yang, Bill Tseng, Chun-Yen Lin, Yu-Jen Yu, Adrian Linacre, James Chun-I Lee.

Abstract

Population and geographic assignment are frequently undertaken using DNA sequences from the mitochondrial genome. Assignment to broad continental populations is common, but finer resolution to subpopulations can be less accurate because genetic ancestry is shared at a local level and members of different ancestral subpopulations cohabit the same geographic area. This study reports the accuracy of population and subpopulation assignment obtained by applying the K-nearest neighbors (KNN) algorithm to sequence data from 3070 mitochondrial genomes. These data served as the training samples for continental and population assignment and comprised 1105 Europeans (including Austria, France, Germany, Spain, England, and Caucasian countries), 374 Africans (including North and East Africa and a non-specific area (Pan-Africa)), and 1591 Asians (including Japan, Philippines, and Taiwan). The subpopulation data were 1153 mitochondrial DNA (mtDNA) control region sequences from 12 subpopulations in Taiwan (Han, Hakka, Ami, Atayal, Bunun, Paiwan, Puyuma, Rukai, Saisiyat, Tsou, Tao, and Pingpu). Control region sequences from a further 50 samples, obtained from the Sigma Company and then amplified and sequenced, served as the "testing samples" to verify the accuracy of population assignment. Using genetic distance as the metric, we applied the KNN algorithm and the K-weighted-nearest neighbors (KWNN) algorithm, in which neighbors are weighted by genetic distance, to classify individuals into continental populations and into subpopulations within the same continent, and we obtained the accuracy of both algorithms at each level. For the training sample set, overall accuracy of assignment to continental populations ranged from 99% to 82% for K values from 1 to 101. Subpopulation assignment with K values from 1 to 5 reached an accuracy of 77% to 54%. Four of the 12 Taiwanese subpopulations returned an assignment accuracy of over 60%: Ami (66%), Atayal (67%), Saisiyat (66%), and Tao (80%). For the testing sample set, continental assignment with the recommended K values of 5, 10, and 35, chosen from the training-set results, achieved an overall accuracy of 100% to 94%. This study provides an accurate method of population assignment for both continental populations and subpopulations, which can be useful in forensic and anthropological studies.
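The difference between plain KNN voting and distance-weighted (KWNN) voting can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the abstract does not state which genetic distance or weighting formula was used, so the sketch uses the proportion of differing sites between aligned sequences (p-distance) and inverse-distance weights, and the function names, toy sequences, and labels are hypothetical.

```python
from collections import defaultdict


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences.

    Assumption: a simple p-distance stands in for the study's genetic distance.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_assign(query, training, k=5, weighted=False, eps=1e-9):
    """Assign `query` to a label by voting among its k nearest training sequences.

    training: list of (sequence, label) pairs.
    weighted=False -> plain KNN, one vote per neighbor.
    weighted=True  -> KWNN-style vote, each neighbor weighted by 1 / (distance + eps).
                      (Inverse-distance weighting is an assumption.)
    """
    # Sort by distance; equal distances fall back to alphabetical label order.
    neighbors = sorted((p_distance(query, seq), label) for seq, label in training)[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + eps) if weighted else 1.0
    # On a tied vote, the label of the nearest neighbor among the tied ones wins.
    return max(votes, key=votes.get)


# Hypothetical toy data: two "pop_A" sequences and three "pop_B" sequences.
training = [
    ("ACGTACGTAC", "pop_A"),
    ("ACGTACGTAA", "pop_A"),
    ("TTGTACGGAC", "pop_B"),
    ("TTGTACGGAA", "pop_B"),
    ("TTGTACGGCC", "pop_B"),
]
query = "ACGTACGTAC"

print(knn_assign(query, training, k=1))                 # nearest neighbor only
print(knn_assign(query, training, k=5))                 # majority vote over all five
print(knn_assign(query, training, k=5, weighted=True))  # distance-weighted vote
```

In this toy case, a large K lets the more numerous but more distant group outvote an exact match under plain KNN, while the weighted vote favors the close neighbors. This is one plausible reason for comparing the two algorithms across a range of K values.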

Keywords: Control region; Nearest neighbors algorithm; Population inference; mtDNA

Year: 2021; PMID: 33586030; DOI: 10.1007/s00414-021-02520-3

Source DB: PubMed; Journal: Int J Legal Med; ISSN: 0937-9827; Impact factor: 2.686

