| Literature DB >> 35350246 |
Eddie K K Ip1,2, Michael Troup1, Colin Xu3, David S Winlaw4, Sally L Dunwoodie1,2, Eleni Giannoulatou1,2.
Abstract
Mitochondrial DNA (mtDNA) mutations contribute to human disease across a range of severity, from rare, highly penetrant mutations causal for monogenic disorders to mutations with milder contributions to phenotypes. mtDNA variation can exist in all copies of mtDNA or in a percentage of mtDNA copies and can be detected with levels as low as 1%. The large number of copies of mtDNA and the possibility of multiple alternative alleles at the same DNA nucleotide position make the task of identifying allelic variation in mtDNA very challenging. In recent years, specialized variant calling algorithms have been developed that are tailored to identify mtDNA variation from whole-genome sequencing (WGS) data. However, very few studies have systematically evaluated and compared these methods for the detection of both homoplasmy and heteroplasmy. A publicly available synthetic gold standard dataset was used to assess four mtDNA variant callers (Mutserve, mitoCaller, MitoSeek, and MToolBox), and the commonly used Genome Analysis Toolkit "best practices" pipeline, which is included in most current WGS pipelines. We also used WGS data from 126 trios and calculated the percentage of maternally inherited variants as a metric of calling accuracy, especially for homoplasmic variants. We additionally compared multiple pathogenicity prediction resources for mtDNA variants. Although the accuracy of homoplasmic variant detection was high for the majority of the callers with high concordance across callers, we found a very low concordance rate between mtDNA variant callers for heteroplasmic variants ranging from 2.8% to 3.6%, for heteroplasmy thresholds of 5% and 1%. Overall, Mutserve showed the best performance using the synthetic benchmark dataset. The analysis of mtDNA pathogenicity resources also showed low concordance in prediction results. We have shown that while homoplasmic variant calling is consistent between callers, there remains a significant discrepancy in heteroplasmic variant calling. We found that resources like population frequency databases and pathogenicity predictors are now available for variant annotation but still need refinement and improvement. With its peculiarities, the mitochondria require special considerations, and we advocate that caution needs to be taken when analyzing mtDNA data from WGS data.Entities:
Keywords: benchmarking; heteroplasmic; homoplasmic; mitochondrial DNA; variant‐caller; whole-genome sequencing
Year: 2022 PMID: 35350246 PMCID: PMC8957813 DOI: 10.3389/fgene.2022.692257
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
A descriptive summary of mtDNA variant callers selected for comparative analyses.
| Features | Mutserve v1.1.17 | MitoSeek v1.3 | mitoCaller v1.0 | MToolBox v1.2 | GATK v3.7 and v4.2- HaplotypeCaller |
|---|---|---|---|---|---|
| Version from 19/02/19 | Version from 13/05/2019 | Version from 16/01/2018 | Version from 15/11/2021 | Versions from 12/12/2016 and 18/0/2021 | |
| Process Location | Local, command line | Local, command line | Local, command line | Local, command line | Local, command line |
| Input File format | BAM | BAM | BAM | BAM | BAM |
| Mitochondrial alignment | rCRS mitochondrial reference | rCRS mitochondrial reference | “double alignment” strategy, with rCRS mitochondrial reference and a shifted rCRS reference | RSRS and rCRS mitochondrial reference | rCRS mitochondrial reference |
| Heteroplasmic variant detection | Yes | Yes | Yes | Yes | Yes |
| Homoplasmic variant detection | Yes | No | Yes | Yes | Yes |
| Default heteroplasmic threshold | 1% | 5% | Based on likelihood model | 20% | Based on likelihood model |
| Altering heteroplasmic threshold | --level parameter | -hp parameter | a custom script for threshold detection was used in the study | hf_min parameter in config file | a custom script for threshold detection was used in the study |
| Description of the variant-calling algorithm | Maximum likelihood model for detecting heteroplasmic variants | One-tail Fisher’s exact test for detecting heteroplasmic variants | Maximum likelihood-based model for both heteroplasmic and homoplasmic variants detection | Identification of mismatches in the newly assembled mitochondrial genome (or differences in the SAM CIGAR string for INDELs) | Bayesian model for both heteroplasmic and homoplasmic variants detection |
FIGURE 1Accuracy comparison of mtDNA variant callers at heteroplasmy threshold of 1%, across different Taq polymerases and DNA extraction protocols using a synthetic benchmark dataset. (A) F1 scores of four variant callers using Clontech Taq polymerase and total DNA extraction. (B) F1 scores of four variant callers using Herk Taq polymerase and total DNA extraction. (C) F1 scores of four variant callers using NEB Taq polymerase and total DNA extraction. (D) F1 scores of four variant callers using Clontech Taq polymerase and PCR products. (E) F1 scores of four variant callers using Herk Taq polymerase and PCR products. (F) F1 scores of four variant callers using NEB Taq polymerase and PCR products.
mtDNA variants called in CHD trio dataset by variant callers using their default parameters (1% heteroplasmy threshold for Mutserve, 5% for Mitoseek, 0 for mitoCaller, and 20% for MToolBox).
| Caller | Comparing child with parents (126 trios) | Comparing mother with father | |||||
|---|---|---|---|---|---|---|---|
| # Variants in child | # Shared with mother | # Shared with father | # Variants in mother | # Shared with father | |||
| GATK Haplotypecaller | Homoplasmy | 3,357 | 3,339 (99.46%) | 1,320 (39.32%) | 3,348 | 1,320 (39.43%) | |
| Heteroplasmy | 62 | 36 (58.06%) | 6 (9.68%) | 62 | 6 (9.68%) | ||
| mitoCaller | Homoplasmy | 2,630 | 2,297 (87.34%) | 980 (37.26%) | 2,518 | 966 (38.36%) | |
| Heteroplasmy | 5,284 | 2,315 (43.81%) | 1,905 (36.05%) | 5,551 | 1,839 (33.13%) | ||
| Mutserve | Homoplasmy | 3,112 | 2,968 (95.37%) | 1,230 (39.52%) | 3,043 | 1,227 (40.32%) | |
| Heteroplasmy | 1,099 | 485 (44.13%) | 485 (44.13%) | 1,441 | 506 (35.11%) | ||
| MitoSeek | Homoplasmy | NA | NA | NA | NA | NA | |
| Heteroplasmy | 160 | 80 (50.00%) | 52 (32.50%) | 131 | 46 (35.11%) | ||
| MToolBox | Homoplasmy | 457 | 280 (61.27%) | 11 (2.41%) | 403 | 8 (1.99%) | |
| Heteroplasmy | 661 | 517 (78.21%) | 60 (9.08%) | 701 | 64 (9.13%) | ||
mtDNA variants called by variant callers using a 5% heteroplasmy threshold.
| Caller | Comparing child with parents (126 trios) | Comparing mother with father | ||||
|---|---|---|---|---|---|---|
| #Variants in child | #Shared with mother | #Shared with father | #Variants in mother | #Shared with father | ||
| GATK Haplotypecaller | Homoplasmy | 3357 | 3339 (99.46%) | 1320 (39.32%) | 3348 | 1,320 (39.43%) |
| Heteroplasmy | 62 | 36 (58.06%) | 6 (9.68%) | 62 | 6 (9.68%) | |
| mitoCaller | Homoplasmy | 3147 | 3126 (99.33%) | 1186 (37.69%) | 3139 | 1,188 (37.85%) |
| Heteroplasmy | 341 | 312 (91.50%) | 244 (71.55%) | 352 | 248 (70.45%) | |
| Mutserve | Homoplasmy | 3246 | 3195 (98.43%) | 1249 (38.48%) | 3215 | 1,249 (38.85%) |
| Heteroplasmy | 75 | 29 (38.67%) | 0 (0.00%) | 90 | 3 (3.33%) | |
| MitoSeek | Homoplasmy | NA | NA | NA | NA | NA |
| Heteroplasmy | 160 | 80 (50.00%) | 52 (32.50%) | 131 | 46 (35.11%) | |
| MToolBox | Homoplasmy | 449 | 272 (60.58%) | 11 (2.45%) | 394 | 8 (2.03%) |
| Heteroplasmy | 830 | 662 (79.76%) | 164 (19.76%) | 887 | 164 (18.49%) | |
FIGURE 2Allelic distributions of mitochondrial variants according to four variant calling methods. (A) Allelic fraction distributions of heteroplasmic variants with default detection threshold. The heteroplasmic variant allele fractions for Mutserve, MitoSeek, and MToolBox are at the minimum and equal to their default heteroplasmy thresholds of 1%, 5%, and 20%, respectively. mitoCaller has no threshold, which is reflected in the low VAF values in the plot. With GATK Haplotypecaller, which is not a specific mitochondrial caller, the allelic fraction represents values typical of a heterozygous call from autosomal genomic variant calling (median values: GATK Haplotypecaller—29.9%; mitoCaller—0.5%; Mutserve—1.5%; MitoSeek—4.7%; MToolBox—99.7%). (B) Allelic fraction distributions for heteroplasmic variants at a detection threshold of 1% (median values: GATK Haplotypecaller—29.9%; mitoCaller—2.2%; Mutserve—1.5%; MitoSeek—1.5%; MToolBox—1.9%). (C) Allelic fraction distributions for heteroplasmic variants at a detection threshold of 5% (median values: GATK Haplotypecaller—9.9%; mitoCaller—54%; Mutserve—37.3%; MitoSeek—14.7%; MToolBox—99.6%). (D) Allelic fraction distributions for homoplasmic variants with default detection thresholds and excluding MitoSeek, which does not call homoplasmic variants. As expected for homoplasmic variants, the allelic fractions are almost all at the 100% level. (E) Allelic fraction distributions for homoplasmic variants at a detection threshold of 1%. There is no change for GATK Haplotypecaller and Mutserve calls. mitoCaller now includes homoplasmic variants that are in the 99% range (100%—1% heteroplasmy detection threshold). (F) Allelic fraction distributions for homoplasmic variants at a heteroplasmy detection threshold of 5%, showing an allele fraction change for mitoCaller to include 95% range.
FIGURE 3Concordance of mitochondrial variants between the variant callers, Mutserve, MitoSeek, mitoCaller, and GATK Haplotypecaller. (A) Concordance of heteroplasmic variants called by the various callers using their default parameters. The first value in the diagram represents the number of mitochondrial variants, and the second value is the percentage of total. (B) Concordance of homoplasmic variants called by the various callers using their default parameters, excluding MitoSeek that does not identify homoplasmic variants. (C) Concordance of heteroplasmic variants called by the various callers using a heteroplasmic threshold of 5%. (D) Concordance of homoplasmic variants called by the various callers using a heteroplasmic threshold of 5%.
The number of variants called by the mtDNA variant callers for dataset, by variant type, and population frequency.
| Heteroplasmy detection threshold | Variant caller | Total variants (300 samples) | Homoplasmic (rare, common) | Heteroplasmic (rare, common) |
|---|---|---|---|---|
| default | GATK Haplotypecaller | 8,019 | 7,856 (1,456, 6,400) | 163 (94, 69) |
| mitoCaller | 19,387 | 5,943 (1,029, 4,914) | 13,444 (13, 120, 324) | |
| MitoSeek | 360 | NA | 360 (358, 2) | |
| Mutserve | 10,490 | 7,202 (1,499, 5,703) | 3,288 (3,233, 55) | |
| MToolBox | 2,688 | 979 (603, 376) | 1,709 (1,662, 47) | |
| 1% | GATK Haplotypecaller | 8,019 | 7,856 (1,456, 6,400) | 163 (94, 69) |
| mitoCaller | 9,738 | 7,267 (1,342, 5,925) | 2,470 (2,260, 210) | |
| MitoSeek | 2,549 | NA | 2,549 (2,541, 8) | |
| Mutserve | 10,490 | 7,202 (1,499, 5,703) | 3,288 (3,233, 55) | |
| MToolBox | 8,495 | 970 (602, 368) | 7,525 (7,215, 310) | |
| 5% | GATK Haplotypecaller | 8,019 | 7,856 (1,456, 6,400) | 163 (94, 69) |
| mitoCaller | 8,200 | 6,358 (1,391, 5,967) | 842 (796, 46) | |
| MitoSeek | 360 | NA | 360 (358, 2) | |
| Mutserve | 7,791 | 7,567 (1,595, 5,972) | 224 (204, 20) | |
| MToolBox | 3,113 | 960 (596, 364) | 2,153 (2,083,70) |
Note: Population frequency determined using helixMTdb; rare denote variants ≤ population allele frequency of 1%. Common denote variants > population allele frequency of 1%.