Hao Chen1, Yan Lu1,2, Dongsheng Lu1, Shuhua Xu3,4,5,6,7. 1. Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China. 2. School of Life Sciences, Fudan University, Shanghai, 200433, China. 3. Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China. 4. School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China. 5. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China. 6. Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, 450052, China. 7. Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai, 200438, China. Correspondence: xushua@picb.ac.cn.
Abstract
BACKGROUND: Y-chromosome DNA (Y-DNA) has been used for tracing paternal lineages and offers a clear path from an individual to a known, or likely, direct paternal ancestor. Advances in next-generation sequencing (NGS) technologies continue to improve the resolution of the non-recombining region of the Y-chromosome (NRY). However, a lack of suitable computational tools hinders the use of NGS data in Y-DNA studies. RESULTS: We developed Y-LineageTracker, a high-throughput analysis framework that not only uses state-of-the-art methodologies to automatically determine NRY haplogroups and identify Y-chromosome microsatellite variants on a fine scale, but also optimizes comprehensive Y-DNA analysis methods for NGS data. Notably, Y-LineageTracker integrates the NRY haplogroup and Y-STR analysis modules with recognized strategies to robustly suggest an interpretation of paternal genetics and evolution. The NRY haplogroup module mainly covers haplogroup classification, clustering analysis, phylogeny construction, and divergence time estimation of NRY haplogroups; the Y-STR module mainly includes Y-STR genotyping, statistical calculation, network analysis, and estimation of the time to the most recent common ancestor (TMRCA) based on Y-STR haplotypes. A performance comparison indicated that Y-LineageTracker outperformed existing Y-DNA analysis tools, combining high performance with satisfactory visualization. CONCLUSIONS: Y-LineageTracker is an open-source, user-friendly command-line tool that provides multiple functions to efficiently analyze Y-DNA from NGS data at both the Y-SNP and Y-STR levels. Additionally, Y-LineageTracker supports various input data formats and produces high-quality figures suitable for publication. Y-LineageTracker is written in Python3, supports Windows, Linux, and macOS platforms, and can be installed manually or via the Python Package Index (PyPI).
The source code, examples, and manual of Y-LineageTracker are freely available at https://www.picb.ac.cn/PGG/resource.php or CodeOcean (https://codeocean.com/capsule/7424381/tree).
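To make the idea of estimating TMRCA from Y-STR haplotypes concrete, the following is a minimal Python sketch of one widely used generic estimator, the average squared distance (ASD) method under a stepwise mutation model. It is an illustration only and is not taken from the Y-LineageTracker source: the function names, the use of the modal haplotype as a stand-in for the founder, the example repeat counts, the per-locus mutation rate of 0.002, and the 30-year generation time are all assumptions chosen for the example.

```python
from collections import Counter
from statistics import mean


def modal_haplotype(haplotypes):
    """Return the most common repeat count at each locus.

    Assumption: the modal haplotype approximates the founder haplotype.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def asd_tmrca(haplotypes, mutation_rates, founder=None, generation_years=30.0):
    """Estimate TMRCA with the average squared distance (ASD) method.

    haplotypes: list of equal-length lists of Y-STR repeat counts.
    mutation_rates: per-locus mutation rates per generation (illustrative values).
    Returns (generations, years).
    """
    if founder is None:
        founder = modal_haplotype(haplotypes)
    # Mean squared difference from the founder allele, computed per locus.
    per_locus = [
        mean((h[i] - founder[i]) ** 2 for h in haplotypes)
        for i in range(len(founder))
    ]
    asd = mean(per_locus)
    # Under a stepwise mutation model, expected ASD = mutation rate * generations.
    generations = asd / mean(mutation_rates)
    return generations, generations * generation_years


# Hypothetical data: four haplotypes typed at three Y-STR loci.
samples = [[14, 12, 30], [14, 13, 30], [15, 12, 30], [14, 12, 29]]
rates = [0.002, 0.002, 0.002]
generations, years = asd_tmrca(samples, rates)
print(f"TMRCA: about {generations:.0f} generations ({years:.0f} years)")
```

The estimate is sensitive to the chosen mutation rates and generation time, so in practice these would be taken from published locus-specific values rather than the placeholders used here.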
Keywords: NGS; NRY haplogroup; Population genetics; Y-STR; Y-chromosome DNA