Ensemble learning algorithms for classification of mtDNA into haplogroups.

Carol Wong, Yuran Li, Chih Lee, Chun-Hsi Huang.

Abstract

Classifying mitochondrial DNA (mtDNA) samples into their respective haplogroups helps address a range of anthropological and forensic questions. mtDNA is distinctive in its abundance and its non-recombining, uniparental mode of inheritance; consequently, mutations are the only changes observed in the genetic material. Samples are assigned to cladistic haplogroups on the basis of these individual mutations, which allows different branch points in the evolution of humans (and other organisms) to be traced. Because the number of samples is large, the classification process must be automated. Using 5-fold cross-validation, we investigated two classification techniques on the consented database of 21,141 samples published by the Genographic project. The support vector machines (SVM) algorithm achieved a macro-accuracy of 88.06% and a micro-accuracy of 96.59%, while the random forest (RF) algorithm achieved a macro-accuracy of 87.35% and a micro-accuracy of 96.19%. In addition to being faster and more memory-efficient at prediction time, SVM and RF match or exceed the prediction accuracy of the nearest-neighbor method employed by the Genographic project.
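The abstract summarizes 5-fold cross-validation results as two figures, macro-accuracy and micro-accuracy. The Python sketch below illustrates that evaluation protocol under several stated assumptions. It takes micro-accuracy to be the fraction of all samples classified correctly and macro-accuracy to be the unweighted mean of per-haplogroup accuracies; these are the usual definitions, assumed here rather than taken from the paper. It encodes each sample as a hypothetical 0/1 vector of mutation presence. It also uses a 1-nearest-neighbor classifier with Hamming distance as a simple stand-in. This stand-in is not the SVM or RF models studied, nor necessarily the Genographic project's exact method. All function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical sketch: samples are assumed to be 0/1 vectors marking which
# mutations (relative to some reference sequence) each mtDNA sample carries.


def hamming(a, b):
    """Count positions where two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_predict(train_X, train_y, x):
    """1-nearest-neighbor stand-in classifier (not the paper's SVM or RF)."""
    best = min(range(len(train_X)), key=lambda i: hamming(train_X[i], x))
    return train_y[best]


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def micro_macro_accuracy(y_true, y_pred):
    """Micro: overall fraction correct. Macro: unweighted mean of per-class accuracies."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    micro = sum(correct.values()) / len(y_true)
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro


def cross_validate(X, y, k=5, seed=0):
    """Predict each sample using only the other k-1 folds, then score the pooled predictions."""
    y_pred = [None] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            y_pred[i] = nearest_neighbor_predict(train_X, train_y, X[i])
    return micro_macro_accuracy(y, y_pred)


if __name__ == "__main__":
    # Imbalanced toy labels: always predicting "H" looks good on micro-accuracy only.
    print(micro_macro_accuracy(["H", "H", "H", "L3"], ["H", "H", "H", "H"]))  # → (0.75, 0.5)
```

With three samples of haplogroup H and one of L3, predicting H for everything gives a micro-accuracy of 0.75 but a macro-accuracy of 0.5. This shows how poorly predicted rare haplogroups can pull macro-accuracy below micro-accuracy, which is consistent with the gap between the two figures reported above. Pooling out-of-fold predictions is only one way to summarize cross-validation; averaging per-fold scores is another.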

Year:  2010        PMID: 20203074      PMCID: PMC3030810          DOI: 10.1093/bib/bbq008

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


References

1.  Counting target molecules by exponential polymerase chain reaction: copy number of mitochondrial DNA in rat tissues.

Authors:  R J Wiesner; J C Rüegg; I Morano
Journal:  Biochem Biophys Res Commun       Date:  1992-03-16       Impact factor: 3.575

2.  Mitochondrial DNA and aging (review).

Authors:  Mikhail F Alexeyev; Susan P Ledoux; Glenn L Wilson
Journal:  Clin Sci (Lond)       Date:  2004-10       Impact factor: 6.124

3.  The Genographic Project public participation mitochondrial DNA database.

Authors:  Doron M Behar; Saharon Rosset; Jason Blue-Smith; Oleg Balanovsky; Shay Tzur; David Comas; R John Mitchell; Lluis Quintana-Murci; Chris Tyler-Smith; R Spencer Wells
Journal:  PLoS Genet       Date:  2007-06       Impact factor: 5.917