Kimberly Sturk-Andreaggi1,2,3, Joseph D Ring2,3, Adam Ameur1, Ulf Gyllensten1, Martin Bodner4, Walther Parson4,5, Charla Marshall2,3,5, Marie Allen1.
Abstract
Whole-genome sequencing (WGS) data present a readily available resource for mitochondrial genome (mitogenome) haplotypes that can be utilized for genetics research including population studies. However, the reconstruction of the mitogenome is complicated by nuclear mitochondrial DNA (mtDNA) segments (NUMTs) that co-align with the mtDNA sequences and mimic authentic heteroplasmy. Two minimum variant detection thresholds, 5% and 10%, were assessed for the ability to produce authentic mitogenome haplotypes from a previously generated WGS dataset. Variants associated with NUMTs were detected in the mtDNA alignments for 91 of 917 (~8%) Swedish samples when the 5% frequency threshold was applied. The 413 observed NUMT variants were predominantly detected in two regions (nps 12,612-13,105 and 16,390-16,527), which were consistent with previously documented NUMTs. The number of NUMT variants was reduced by ~97% (400) using a 10% frequency threshold. Furthermore, the 5% frequency data were inconsistent with a platinum-quality mitogenome dataset with respect to observed heteroplasmy. These analyses illustrate that a 10% variant detection threshold may be necessary to ensure the generation of reliable mitogenome haplotypes from WGS data resources.
Keywords: NUMTs; heteroplasmy; massively parallel sequencing; mitochondrial DNA; next-generation sequencing; nuclear elements of mtDNA; whole-genome sequencing
Year: 2022 PMID: 35216360 PMCID: PMC8876724 DOI: 10.3390/ijms23042244
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
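The two detection thresholds compared in the abstract can be illustrated with a minimal sketch. It assumes a mixed position is called when the second most common base at a site reaches the threshold; the function names, the per-site base counts, and the use of `>=` rather than `>` are illustrative assumptions, not the authors' pipeline. The last lines reproduce the reported reduction (400 of 413 NUMT variants).

```python
from typing import Mapping


def minor_frequency(base_counts: Mapping[str, int]) -> float:
    """Fraction of reads carrying the second most common base at one site."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    ranked = sorted(base_counts.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    return second / total


def is_mixed(base_counts: Mapping[str, int], threshold: float) -> bool:
    """Call a mixed position when the minor base reaches the threshold (assumed >=)."""
    return minor_frequency(base_counts) >= threshold


# Hypothetical site: 7% of reads carry a minor base, e.g. from a co-aligned NUMT.
site = {"A": 930, "G": 70}
called_at_5 = is_mixed(site, 0.05)   # reported as mixed at the 5% threshold
called_at_10 = is_mixed(site, 0.10)  # filtered out at the 10% threshold

# Reduction reported in the abstract: 400 of the 413 NUMT variants removed.
numt_variants_at_5 = 413
removed_at_10 = 400
reduction = removed_at_10 / numt_variants_at_5
print(f"{reduction:.1%} of NUMT variants removed at the 10% threshold")
```

A low-frequency minor base such as the one in this hypothetical site is the kind of signal that passes a 5% filter but not a 10% filter.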
Average analysis metrics for each sample category based on the mitochondrial DNA (mtDNA) analysis of the SweGen dataset. The passing category includes complete (all positions with at least 100X read depth) and nearly complete (one to four positions below 100X) haplotypes that met all quality control criteria. Incomplete haplotypes (five or more positions below 100X), possible mixtures and related samples were excluded from the analysis. The “% Mapped Reads” is the proportion of total reads that mapped to the GRCh37 and the “% mtDNA Reads” is the proportion of the mapped reads that aligned to the mtDNA reference genome.
| Category | Subclassification | Count | Total Reads | % Mapped Reads | % mtDNA Reads | Average Read Depth |
|---|---|---|---|---|---|---|
| Passing | Complete | 858 | 865,518,368.5 | 99.16% | 0.036% | 2365.9 |
| Passing | Nearly Complete | 59 | 833,866,169.5 | 99.20% | 0.010% | 652.5 |
| Excluded | Incomplete | 17 | 784,241,829.7 | 99.32% | 0.010% | 592.0 |
| Excluded | Mixed | 7 | 762,731,157.9 | 98.72% | 0.019% | 1191.1 |
| Excluded | Related | 1 | 869,537,450 | 99.52% | 0.029% | 1870.7 |
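The completeness rule in the table caption (positions below 100X read depth) can be written as a short sketch. The function name and the list-of-depths input are assumptions for illustration; the cutoffs (zero, one to four, five or more low-depth positions) are taken from the caption. Mixture and relatedness checks are not modeled here.

```python
from typing import Sequence

MIN_DEPTH = 100  # read-depth cutoff stated in the table caption


def classify_completeness(depths: Sequence[int], min_depth: int = MIN_DEPTH) -> str:
    """Classify a haplotype by how many positions fall below the depth cutoff."""
    low_positions = sum(1 for depth in depths if depth < min_depth)
    if low_positions == 0:
        return "complete"
    if low_positions <= 4:
        return "nearly complete"
    return "incomplete"


# Hypothetical per-position depths for three toy samples.
print(classify_completeness([250, 180, 100, 400]))  # no position below 100X
print(classify_completeness([250, 99, 100, 400]))   # one position below 100X
print(classify_completeness([50] * 5 + [300] * 5))  # five positions below 100X
```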
Figure 1. Scatterplots of the correlation between (a) total reads (billions) and (b) the proportion of mapped reads that mapped to the mtDNA reference genome and the average mtDNA read depth. Samples are plotted based on the category and subclassification: green = complete with less than 1.15 billion total reads (n = 785), dark green = complete with more than 1.15 billion total reads (n = 73), light green = nearly complete (n = 59), gray = incomplete (n = 17), yellow = mixed (n = 7), and blue = related (n = 1).
Figure 2. Classification of mixed positions that were observed in the mitochondrial genome (mitogenome) haplotypes from the SweGen whole-genome sequencing data. Mixed positions (white) were identified during initial variant detection using a 5% minimum minor nucleotide frequency threshold (light gray). These 833 mixed positions were detected with a 5% minor nucleotide frequency threshold and classified as either nuclear mtDNA segment (NUMT) variants or point heteroplasmies (PHPs) during multiple assessments (gray gradient). The 413 NUMT variants are shown in the inner plot (orange; scale 0–60 observations) and the 420 PHPs are displayed in the outer plot (green; scale 0–35 observations).
Figure 3. Distribution of mixed positions across the circular mitogenome, including the hypervariable segments 1 and 2 (HVS1 and HVS2, respectively; dark gray) of the mtDNA control region and the entire mtDNA coding region (light gray). These 833 mixed positions were detected with a 5% minor nucleotide frequency threshold and classified as either a NUMT variant or PHP. The 413 NUMT variants are shown in the inner plot (orange; scale 0–60 observations) and the 420 PHPs are displayed in the outer plot (green; scale 0–35 observations).
Figure 4. Distribution of the frequency (>5%) of the minor nucleotide for the 420 PHPs (green) included in the SweGen haplotypes and the 413 NUMT variants (orange) identified in the mitogenomes. The average for each classification is shown as an “×” and outliers are shown as single data points.
Figure 5. Distribution of average read depth based on the detection of NUMT interference at either a 2% frequency threshold or 5% frequency threshold. The sample count per NUMT variant observation category is noted in parentheses for each category. The average for each classification is shown as an “×” and outliers are shown as single data points.
Average analysis metrics of passing samples depending on the observation of NUMT variants in the mtDNA data. The “% Mapped Reads” is the proportion of total reads that mapped to the GRCh37 and the “% mtDNA Reads” is the proportion of the mapped reads that aligned to the mitochondrial reference genome.
| NUMT Variant Frequency | Count | Total Reads | % Mapped Reads | % mtDNA Reads | Average Read Depth |
|---|---|---|---|---|---|
| None | 561 | 869,724,465.0 | 99.17% | 0.044% | 2919.1 |
| 2–5% | 265 | 854,695,905.9 | 99.16% | 0.021% | 1390.1 |
| ≥5% | 91 | 855,009,718.4 | 99.16% | 0.011% | 686.1 |
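Both tables describe the same 917 passing samples, so their count-weighted mean read depths should agree. The sketch below is an illustrative cross-check using only the counts and averages printed in the tables; the helper name is an assumption. Both groupings come out near 2,256X, and the small difference is consistent with rounding of the tabulated averages.

```python
from typing import Iterable, Tuple


def weighted_mean(groups: Iterable[Tuple[int, float]]) -> float:
    """Count-weighted mean of per-group averages given (count, average) pairs."""
    groups = list(groups)
    total_count = sum(count for count, _ in groups)
    return sum(count * average for count, average in groups) / total_count


# (count, average read depth) for passing samples, by completeness.
by_completeness = [(858, 2365.9), (59, 652.5)]

# (count, average read depth) for passing samples, by NUMT observation category.
by_numt_category = [(561, 2919.1), (265, 1390.1), (91, 686.1)]

depth_a = weighted_mean(by_completeness)
depth_b = weighted_mean(by_numt_category)
print(f"by completeness: {depth_a:.1f}X, by NUMT category: {depth_b:.1f}X")
```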