Charla Marshall1,2,3, Kimberly Sturk-Andreaggi1,2,4, Joseph D Ring1,2, Arne Dür5, Walther Parson3,6.
Abstract
Given the enhanced discriminatory power of the mitochondrial DNA (mtDNA) genome (mitogenome) over the commonly sequenced control region (CR) portion, the scientific merit of mitogenome sequencing is generally accepted. However, many laboratories remain beholden to CR sequencing due to privacy policies and legal requirements restricting the use of disease information or coding region (codR) information. In this report, we present an approach to obviate the reporting of sensitive codR data in forensic haplotypes. We consulted the MitoMap database to identify 92 mtDNA codR variants with confirmed pathogenicity. We determined the frequencies of these pathogenic variants in literature-quality and forensic-quality databases to be very low, at 1.2% and 0.36%, respectively. The observed effect of pathogenic variant filtering on random match statistics in 2488 forensic-quality mitogenome haplotypes from four populations was nil. We propose that pathogenic variant filtering should be incorporated into variant calling algorithms for mitogenome haplotype reporting to maximize the discriminatory power of the locus while minimizing the disclosure of sensitive genetic information.
Keywords: EMPOP; coding region; haplogroup; haplotype; mitochondrial DNA; mitochondrial genome; pathogenic variants; population statistics; random match probability; variant filtering
Year: 2020 PMID: 32998193 PMCID: PMC7599696 DOI: 10.3390/genes11101140
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
List of the 92 confirmed pathogenic variants described in MitoMap [8], separated by homoplasmic state (n = 37) or heteroplasmic state (n = 83). Note that there are two multi-nucleotide variants affecting multiple nucleotide positions: the dinucleotide deletion at T9205-A9206 and the sequence inversion causing A3902G C3904A T3905A T3906G C3908T. Conversely, there are five nucleotide positions with two different pathogenic variants each (e.g., T9176C and T9176G). Thus, the 92 confirmed pathogenic variants affected 92 unique coding region (codR) positions. Variants that are confirmed to be pathogenic in both homoplasmic and heteroplasmic states are underlined.
| Homoplasmic Variants with Confirmed Pathogenicity | Heteroplasmic Variants with Confirmed Pathogenicity |
|---|---|
| G583A, |
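The counts in the caption above can be reconciled with a little arithmetic. The following is a minimal sketch, assuming that the two multi-nucleotide variants and the five doubly affected positions do not overlap, and that every one of the 92 variants is counted in at least one of the two states; the variable names are illustrative only.

```python
# Counts quoted in the table caption.
total_variants = 92
homoplasmic = 37
heteroplasmic = 83

# The dinucleotide deletion (T9205-A9206) is one variant spanning 2 positions;
# the inversion (3902, 3904, 3905, 3906, 3908) is one variant spanning 5 positions.
extra_positions = (2 - 1) + (5 - 1)

# Five positions each carry two different variants (e.g., T9176C and T9176G),
# so each of them contributes one position fewer than its variant count.
doubly_affected_positions = 5

unique_positions = total_variants + extra_positions - doubly_affected_positions

# Inclusion-exclusion: variants confirmed pathogenic in both states
# (assumes each variant appears in at least one state).
both_states = homoplasmic + heteroplasmic - total_variants

print(unique_positions, both_states)
```

Under these assumptions the 92 variants map to 92 unique positions, and 28 variants would be confirmed pathogenic in both the homoplasmic and heteroplasmic state, which is why the two state counts sum to more than 92.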
Figure 1. Circular depiction of the 92 confirmed pathogenic variants affecting 92 coding region (codR) positions across the mitochondrial genome as described in MitoMap [8]. Homoplasmic state (n = 37) pathogenic variants are visible in the inner circle in red, and heteroplasmic state (n = 83) pathogenic variants are presented in the outer circle in blue. Pathogenic variants are represented by a line at the corresponding nucleotide position. The height of the line corresponds to the number of pathogenic variants observed at each nucleotide position, up to a maximum of two for both homoplasmic and heteroplasmic states. The control region is shown in purple, transfer RNA (tRNA) regions are orange, and codR genes are green and labeled accordingly with the corresponding gene. RNR1 = 12S RNA; RNR2 = 16S RNA; ND1 = NADH dehydrogenase 1; ND2 = NADH dehydrogenase 2; COX1 = cytochrome c oxidase 1; COX2 = cytochrome c oxidase 2; ATP8 = ATP synthase 8; ATP6 = ATP synthase 6; COX3 = cytochrome c oxidase 3; ND3 = NADH dehydrogenase 3; ND4L = NADH 4L dehydrogenase; NADH4 = NADH dehydrogenase 4; ND5 = NADH dehydrogenase 5; NADH6 = NADH dehydrogenase 6; CYTB = cytochrome b. This figure was generated using the circlize package v0.4.10 in R [12].
The 29 confirmed pathogenic variants found in literature-quality mitogenomes (n = 26,013) in the EDNAP mtDNA Population Database (EMPOP) [13]. Heteroplasmic states are indicated in parentheses with the corresponding IUPAC code. Pathogenic variants that were observed in both homoplasmic and heteroplasmic states are underlined.
| Count of Observations | Observed Frequency | Confirmed Pathogenic Variants Observed |
|---|---|---|
| 1 | 0.004% | T616C, |
| 2 | 0.008% | |
| 3 | 0.012% | C1494T, T8993G, T9176C, |
| 4 | 0.015% | G3635A, C7471CC, T14674C |
| 6 | 0.023% | C14568T |
| 25 | 0.096% | |
| 37 | 0.142% | A1555G |
| 55 | 0.211% | |
| 131 | 0.504% | |
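The observed frequencies in the table above appear to be each observation count divided by the 26,013 literature-quality mitogenomes named in the caption. A short sketch under that assumption follows; the function name is hypothetical and the three-decimal rounding is chosen only to match the table's presentation.

```python
# Number of literature-quality mitogenomes, taken from the table caption.
N_LITERATURE = 26_013

def observed_frequency_percent(count, total=N_LITERATURE):
    """Return count / total expressed as a percentage, rounded to 3 decimals."""
    return round(100.0 * count / total, 3)

# Observation counts listed in the first column of the table.
counts = [1, 2, 3, 4, 6, 25, 37, 55, 131]
frequencies = {c: observed_frequency_percent(c) for c in counts}

for count, freq in frequencies.items():
    print(f"{count:>4} observations -> {freq:.3f}%")
```

For example, A1555G with 37 observations gives 37 / 26,013, or about 0.142%, in agreement with the tabulated value.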
The change in the number of unique haplotypes, observed random match probability (RMP), and haplotype diversity estimates when considering point heteroplasmies (i.e., literal search mode) for forensic-quality mitochondrial genome datasets after filtering confirmed pathogenic variants.
| Population | Samples | Original Dataset: Unique Haplotypes | Original Dataset: RMP | Original Dataset: Haplotype Diversity | Filtered Dataset: Pathogenic Variants Filtered | Filtered Dataset: Change in Unique Haplotypes | Filtered Dataset: Change in RMP | Filtered Dataset: Change in Haplotype Diversity |
|---|---|---|---|---|---|---|---|---|
| West Eurasian | 623 | 575 (30 shared) | 0.21% | 0.9995 | 1 | 0 | 0% | 0 |
| African | 613 | 597 (15 shared) | 0.17% | 0.9999 | 3 | 0 | 0% | 0 |
| East Asian | 630 | 557 (45 shared) | 0.22% | 0.9994 | 4 | 0 | 0% | 0 |
| Hispanic/Native American | 622 | 568 (43 shared) | 0.20% | 0.9996 | 1 | 0 | 0% | 0 |
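The statistics in this table can be illustrated with a small sketch. It assumes the conventional definitions, which the caption does not spell out: the observed RMP as the sum of squared haplotype frequencies, and haplotype diversity as n / (n - 1) × (1 - RMP). The function names and the toy dataset are hypothetical.

```python
from collections import Counter

def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())

def diversity_from_rmp(n, rmp):
    """Haplotype diversity with the n / (n - 1) small-sample correction."""
    return n / (n - 1) * (1.0 - rmp)

def haplotype_diversity(haplotypes):
    return diversity_from_rmp(len(haplotypes), random_match_probability(haplotypes))

# Toy sample: four individuals, one haplotype shared by two of them.
toy = ["hapA", "hapA", "hapB", "hapC"]
toy_rmp = random_match_probability(toy)   # 0.5**2 + 0.25**2 + 0.25**2
toy_hd = haplotype_diversity(toy)

# Consistency check against the tabulated (rounded) RMP values.
table_rows = {"West Eurasian": (623, 0.0021), "African": (613, 0.0017),
              "East Asian": (630, 0.0022), "Hispanic/Native American": (622, 0.0020)}
table_hd = {pop: round(diversity_from_rmp(n, rmp), 4) for pop, (n, rmp) in table_rows.items()}
print(toy_rmp, round(toy_hd, 4), table_hd)
```

Under these definitions the rounded RMP values reproduce the tabulated haplotype diversities. Because both statistics depend only on how samples group into identical haplotypes, removing a variant changes them only if two previously distinct haplotypes become identical, which is consistent with the zero changes reported for all four populations.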
Figure 2. Example bioinformatics workflow showing the addition of pathogenic variant filtering to the analytical process for forensic mitochondrial genome (mitogenome) haplotype reporting. Sequence data are produced, processed, and aligned to the revised Cambridge Reference Sequence (rCRS) [11] for variant detection. Then, pathogenic variant filtering is performed automatically before the mitochondrial DNA (mtDNA) profile is reported. Here, the pathogenic variants are depicted in red (homoplasmic) and blue (heteroplasmic), and both are filtered. Our results indicate that less than 1% of mtDNA haplotypes will require pathogenic variant filtering and the remaining more than 99% of mtDNA haplotypes will remain unchanged because they lack pathogenic variants.
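The filtering step in this workflow could be sketched as follows. This is an illustrative simplification, not the authors' implementation: the blocklist holds only a handful of the confirmed pathogenic variants named in this record rather than the full list of 92, it ignores the homoplasmic/heteroplasmic distinction and IUPAC-coded heteroplasmies, and the example haplotype and function name are made up.

```python
# Illustrative subset of confirmed pathogenic variants named in this record.
# A real filter would load the complete, state-aware list.
PATHOGENIC_SUBSET = frozenset({"A1555G", "C1494T", "T8993G", "T9176C", "T9176G"})

def filter_pathogenic(haplotype, blocklist=PATHOGENIC_SUBSET):
    """Split a haplotype (list of variant strings relative to the rCRS)
    into the variants to report and the variants withheld from the report."""
    reported = [v for v in haplotype if v not in blocklist]
    withheld = [v for v in haplotype if v in blocklist]
    return reported, withheld

# Hypothetical haplotype containing one blocklisted variant.
example_haplotype = ["A263G", "A1555G", "A8860G", "A15326G"]
reported, withheld = filter_pathogenic(example_haplotype)
print("reported:", reported)
print("withheld:", withheld)
```

A haplotype without any blocklisted variant passes through unchanged, which corresponds to the large majority of profiles described in the caption.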