mitoDataclean: A machine learning approach for the accurate identification of cross-contamination-derived tumor mitochondrial DNA mutations.

Liping Su, Shanshan Guo, Wenjie Guo, Xiaoying Ji, Yang Liu, Huanqin Zhang, Qichao Huang, Kaixiang Zhou, Xu Guo, Xiwen Gu, Jinliang Xing.

Abstract

Next-generation sequencing (NGS) of mitochondrial DNA (mtDNA) has widespread applications in aging and cancer studies. However, cross-contamination of mtDNA constitutes a major concern. Previous methods for the detection of mtDNA contamination mainly focus on haplogroup-level phylogeny, but neglect haplotype-level differences, leading to limited sensitivity and accuracy. In our study, we present mitoDataclean, a random-forest-based machine learning package for accurate identification of cross-contamination, evaluation of contamination levels and detection of contamination-derived variants in mtDNA NGS data. Comprehensive optimization of mitoDataclean revealed that training simulation with mixtures of small haplogroup distance and low polymorphic difference was critical for optimal modeling. Compared to existing methods, mitoDataclean exhibited significantly improved sensitivity and accuracy for the detection of sample contamination in simulated data. In addition, mitoDataclean achieved area under the curve values of 0.91 and 0.97 for discerning genuine and contamination-derived mtDNA variants in a simulated Western dataset and private sequencing contamination data, respectively, suggesting that this tool may be applicable for different populations and samples with different sources of contamination. Finally, mitoDataclean was further evaluated in several private and public datasets and showed a robust ability for contamination detection. Altogether, our study demonstrates that mitoDataclean may be used for accurate detection of contaminated samples and contamination-derived variants in mtDNA NGS data.
© 2022 UICC.
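
The abstract reports area under the curve (AUC) values for separating genuine from contamination-derived variants. As a reminder of what that metric measures, here is a minimal sketch in plain Python. It is not code from mitoDataclean: the function name `roc_auc`, the label encoding (1 = contamination-derived, 0 = genuine), and the toy scores are all assumptions made for illustration. The AUC is computed as the fraction of (contamination-derived, genuine) pairs in which the contamination-derived variant receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for contamination-derived, 0 for genuine (hypothetical encoding).
    scores: classifier scores, higher meaning more likely contamination-derived.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy, made-up scores for illustration only (not data from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # prints "AUC = 0.889"
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported values of 0.91 and 0.97 indicate that contamination-derived variants were ranked above genuine ones in the large majority of pairs.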

Keywords:  machine learning; mitochondrial DNA; next-generation sequencing; sample cross-contamination

Year:  2022        PMID: 35001369     DOI: 10.1002/ijc.33927

Source DB:  PubMed          Journal:  Int J Cancer        ISSN: 0020-7136            Impact factor:   7.396

