| Literature DB >> 34654896 |
Víctor García-Olivares1, Adrián Muñoz-Barrera1, José M Lorenzo-Salazar1, Carlos Zaragoza-Trello2, Luis A Rubio-Rodríguez1, Ana Díaz-de Usera1, David Jáspez1, Antonio Iñigo-Campos1, Rafaela González-Montelongo1,3, Carlos Flores4,5,6,7.
Abstract
The mitochondrial genome (mtDNA) is of interest for a range of fields including evolutionary, forensic, and medical genetics. Human mitogenomes can be classified into evolutionarily related haplogroups that provide ancestral information and pedigree relationships. Because of this and the advent of high-throughput sequencing (HTS) technology, there is a diversity of bioinformatic tools for haplogroup classification. We present a benchmarking of the 11 most salient tools for human mtDNA classification using empirical whole-genome (WGS) and whole-exome (WES) short-read sequencing data from 36 unrelated donors. We also assessed the best performing tool in third-generation long noisy read WGS data obtained with nanopore technology for a subset of the donors. We found that, for short-read WGS, most of the tools exhibit high accuracy for haplogroup classification irrespective of the input file used for the analysis. However, for short-read WES, Haplocheck and MixEmt were the most accurate tools. Based on the performance shown for WGS and WES, and the accompanying qualitative assessment, Haplocheck stands out as the most complete tool. For third-generation HTS data, we also showed that Haplocheck was able to accurately retrieve mtDNA haplogroups for all samples assessed, although only after following assembly-based approaches (either based on a reference-based assembly or a hybrid de novo assembly). Taken together, our results provide guidance for researchers to select the most suitable tool to conduct the mtDNA analyses from HTS data.
Year: 2021 PMID: 34654896 PMCID: PMC8519921 DOI: 10.1038/s41598-021-99895-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
List of software assessed for human mtDNA haplogroup classification. The tools assessed classify mtDNA sequences according to the PhyloTree nomenclature, most of them using the latest version, Build 17. Two of them, MitoTool and Phy-Mer, were outdated with regard to the PhyloTree build when this study was conducted.
| Tool | User interface | Release year | Version | Latest release | Database version | Input options | Execution time$ (seconds) |
|---|---|---|---|---|---|---|---|
| James Lick's (a) | Web | 2010 | 0.19a | 2013.04.08 | PhyloTree 17 | FASTA | 20–27 |
| Haplogrep (b) | Web / CLI* | 2011 | 2.2.4 | 2016.07.08 | PhyloTree 17 | VCF, FASTA | 0.1 (VCF), 1–3 (FASTA) |
| MitoTool (c) | App | 2011 | 1.1.2 | 2013.11.02 | PhyloTree 16 | FASTA | 2–3 |
| Haplofind (d) | Web | 2013 | – | 2013.01.12 | PhyloTree 17 | FASTA | 11–15 |
| EMMA (e) | Web | 2013 | 13 | 2019.11.28 | PhyloTree 17 | FASTA | 10–38 |
| Phy-Mer (f) | Web / CLI* | 2015 | 1.0.0 | 2016.01.14 | PhyloTree 16 | BAM, FASTA | 8–298 (BAM), 51–54 (FASTA) |
| MitoSuite (g) | App | 2017 | 1.0.9 | 2017.06.06 | PhyloTree 17 | BAM | 10–340 |
| MixEmt (h) | CLI | 2017 | 0.1 | 2017.05.09 | PhyloTree 17 | BAM | 79–5034 |
| Haplocheck (i) | Web / CLI* | 2019 | 1.1.0 | 2019.11.17 | PhyloTree 17 | BAM, VCF | 3–14 (BAM), 1–2 (VCF) |
| Haplotracker (j) | Web | 2020 | – | 2020.04.23 | PhyloTree 17 | FASTA | 30–32 |
| HaploGrouper (k) | CLI | 2020 | – | 2020.08.17 | PhyloTree 17 | VCF | 0.1 |
*CLI: command-line interface. $A range is provided given the differences among samples in the size of the input files.
a: https://dna.jameslick.com/mthap/; b: https://haplogrep.i-med.ac.at [16]; c: http://mitotool.org [24,45]; d: https://haplofind.unibo.it [46]; e: https://empop.online [28]; f: https://github.com/MEEIBioinformaticsCenter/phy-mer [27]; g: https://mitosuite.com [25]; h: https://github.com/svohr/mixemt [30]; i: https://github.com/genepi/haplocheck [31]; j: https://haplotracker.cau.ac.kr [26]; k: https://gitlab.com/bio_anth_decode/haploGrouper [47].
Figure 1. Circular plot of the depth of coverage for short-read and long-read sequencing in the mtDNA of an exemplar sample. Short-read WGS and WES data are colored in green and red, respectively. Long-read WGS data are shown in blue.
Figure 2. Predicted probability values (and 95% confidence intervals) of the GLMM estimated for each tool for (a) WGS and (b) WES datasets. In light grey, the raw data from the haplogroup classification results. JML: James Lick's; HPG: Haplogrep; MTO: MitoTool; HPF: Haplofind; EMA: EMMA; PHY: Phy-Mer; MTS: MitoSuite; MIX: MixEmt; HPC: Haplocheck; HPT: Haplotracker; HPR: HaploGrouper.
Qualitative assessment of mtDNA haplogroup classification tools. Performance of each tool is evaluated across different features and represented on a color scale based on the level of performance: green triangle pointing up for good, orange square for fair, and red triangle pointing down for low performance.
Haplogroup classification accuracy was categorized into three ranges based on the predicted probability, for each tool, of incorrectly classifying a haplogroup: below 1% was rated as good performance, between 1.01% and 10% as fair performance, and above 10.01% as low performance. For computation time, three intervals were established: tools that classified samples in less than a minute were rated as good performance, those taking from 1 to 5 min as fair performance, and those requiring more than an hour as low performance. Regarding the PhyloTree database, tools not updated to the latest version, Build 17, were rated as low performance, and those supporting the latest version as good performance. The multi-sample function was evaluated based on the possibility of cohort analysis: tools that support it directly were rated as good performance, tools that can process several samples only by looping on the command line as fair performance, and tools without this ability as low performance. For supported file formats, two categories were established: tools that accept several input formats were rated as good performance, and those that accept a single format as low performance. For the user interface, tools offering a web or desktop application were rated as good performance, and tools available exclusively as a CLI as low performance. For tool maintenance, tools that are continually updated or were recently released were rated as good performance, and tools that have not been updated in recent years as low performance.
The last feature is the presence of additional functions: tools that implement other functions were rated as good performance, and those without them as low performance.
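The accuracy and computation-time thresholds in this legend can be expressed as a small lookup, sketched here in Python. The function names are hypothetical and this is not the authors' code. Two choices below are assumptions: the cut-points are treated as continuous (so a value of exactly 1% counts as fair), and run times between 5 minutes and 1 hour, which the legend does not assign to any category, are returned as "unspecified".

```python
def accuracy_category(p_misclassified_pct: float) -> str:
    """Rate a tool from its predicted probability (in %) of misclassifying a haplogroup.

    Legend thresholds: below 1% good, up to 10% fair, above 10% low.
    Assumption: boundaries are treated as continuous cut-points.
    """
    if p_misclassified_pct < 1.0:
        return "good"
    if p_misclassified_pct <= 10.0:
        return "fair"
    return "low"


def time_category(seconds: float) -> str:
    """Rate a tool from its per-sample execution time in seconds.

    Legend thresholds: under 1 min good, 1 to 5 min fair, over 1 h low.
    Assumption: the legend leaves 5 min to 1 h unassigned, so it is
    reported as "unspecified" rather than guessed.
    """
    if seconds < 60:
        return "good"
    if seconds <= 300:
        return "fair"
    if seconds > 3600:
        return "low"
    return "unspecified"


if __name__ == "__main__":
    # Upper bounds of the execution-time ranges listed for two of the tools.
    for tool, upper_bound_s in [("Haplogrep", 3), ("MixEmt", 5034)]:
        print(tool, time_category(upper_bound_s))
```

Rating a tool by the upper bound of its time range is one possible convention; the legend does not state which end of the range was used.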
Long-read (ONT) sequencing summary and haplogroup classification results for the samples. The results of the short-read whole-genome sequencing (WGS) mtDNA classification are also shown.
| Samples | ONT sequencing: mapped reads | ONT sequencing: depth of coverage | Variant-calling: variants | Variant-calling: haplogroup | Reference-based assembly: variants | Reference-based assembly: haplogroup | De novo assembly: variants | De novo assembly: haplogroup | Hybrid de novo assembly: variants | Hybrid de novo assembly: haplogroup | Short-read WGS: variants | Short-read WGS: haplogroup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 3,065 | 581 | 14 | H1ao1 | 69 | H1ao1 | 119 | H1ao1 | 16 | H1ao1 | 14 | H1ao1 |
|  | 2,981 | 566 | 36 | J2a2d | 141 | J2a2d | 154 | J2a2d | 46 | J2a2d | 44 | J2a2d |
|  | 2,343 | 444 | 15 | H6a1b2 | 132 | H6a1b2 | 175 | H6a1b | 17 | H6a1b2 | 14 | H6a1b2 |
|  | 2,126 | 358 | 33 | U4c1 | 66 | U4c1 | 602 | U4c1 | 37 | U4c1 | 37 | U4c1 |
|  | 4,209 | 821 | 10 | H2a2 | 59 | H+16189 | 41 | H+16189 | 15 | H+16189 | 12 | H+16189 |
|  | 1,737 | 532 | 29 | K1a1b1 | 69 | K1a1b1 | 78 | K1a1b1 | 37 | K1a1b1 | 35 | K1a1b1 |
|  | 1,776 | 398 | 20 | U6b1a1 | 67 | U6b1a1 | 119 | U6b1a1 | 28 | U6b1a1 | 26 | U6b1a1 |
|  | 4,256 | 910 | 34 | L3f1b1a | 90 | L3f1b1a | 174 | L3f1b1a | 42 | L3f1b1a | 43 | L3f1b1a |
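As a rough check of the table, the haplogroup calls of each long-read approach can be compared with the short-read WGS calls. The Python sketch below transcribes the haplogroup columns in row order and counts exact matches. Exact string matching is an assumed concordance criterion and the variable and function names are hypothetical; this is not the authors' evaluation procedure. The haplogroup of the fifth sample is written as `H+16189`.

```python
# Haplogroup calls transcribed from the table, one list per approach,
# in table row order (sample identifiers are not available here).
short_read_wgs = ["H1ao1", "J2a2d", "H6a1b2", "U4c1",
                  "H+16189", "K1a1b1", "U6b1a1", "L3f1b1a"]

long_read_calls = {
    "variant_calling": ["H1ao1", "J2a2d", "H6a1b2", "U4c1",
                        "H2a2", "K1a1b1", "U6b1a1", "L3f1b1a"],
    "reference_based_assembly": ["H1ao1", "J2a2d", "H6a1b2", "U4c1",
                                 "H+16189", "K1a1b1", "U6b1a1", "L3f1b1a"],
    "de_novo_assembly": ["H1ao1", "J2a2d", "H6a1b", "U4c1",
                         "H+16189", "K1a1b1", "U6b1a1", "L3f1b1a"],
    "hybrid_de_novo_assembly": ["H1ao1", "J2a2d", "H6a1b2", "U4c1",
                                "H+16189", "K1a1b1", "U6b1a1", "L3f1b1a"],
}


def concordance(calls: list[str], reference: list[str]) -> float:
    """Fraction of samples whose call exactly matches the reference call."""
    if len(calls) != len(reference):
        raise ValueError("call lists must have the same length")
    matches = sum(c == r for c, r in zip(calls, reference))
    return matches / len(reference)


# Concordance of each long-read approach against short-read WGS.
results = {name: concordance(calls, short_read_wgs)
           for name, calls in long_read_calls.items()}

if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name}: {value:.3f}")
```

Under this criterion, the reference-based and hybrid de novo assemblies match short-read WGS in 8 of 8 samples, while variant-calling and de novo assembly each match in 7 of 8, which agrees with the abstract's statement that accurate classification required assembly-based approaches.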