Nicolas Dierckxsens1, Patrick Mardulyn1,2, Guillaume Smits1,3,4.
Abstract
Heteroplasmy, the existence of multiple mitochondrial haplotypes within an individual, has been studied across different scientific fields. Mitochondrial genome polymorphisms have been linked to multiple severe disorders and are of interest to evolutionary studies and forensic science. Before the development of massively parallel sequencing (MPS), most studies of mitochondrial genome variation were limited to short fragments and to heteroplasmic variants present at a relatively high frequency (>10%). With ultra-deep sequencing, it has now become possible to uncover previously undiscovered patterns of intra-individual polymorphisms. Despite these technological advances, it is still challenging to determine the origin of the observed intra-individual polymorphisms. We therefore developed a new method that not only detects intra-individual polymorphisms within mitochondrial and chloroplast genomes more accurately, but also looks for linkage among polymorphic sites by assembling the sequence around each detected polymorphic site. Our benchmark study shows that this method is capable of detecting heteroplasmy more accurately than any method previously available and is the first tool that is able to completely or partially reconstruct the sequence for each mitochondrial haplotype (allele). The method is implemented in our open source software NOVOPlasty, which can be downloaded at https://github.com/ndierckx/NOVOPlasty.
Year: 2019 PMID: 33575563 PMCID: PMC7671380 DOI: 10.1093/nargab/lqz011
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 1. Workflow of heteroplasmy detection with NOVOPlasty. For simplicity, the representation of the workflow is limited to unidirectional extension. (A) During consensus calling, all positions with a polymorphism frequency higher than the given minor allele frequency (MAF) are marked for further analysis. (B) Each marked position is subjected to additional filtering steps: bases with low quality scores and over-represented reads (resulting from duplication during PCR) are removed. (C) The remaining polymorphisms are subjected to an additional assembly step to link them to other mutation sites or to exclude NUMT sequences.
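Step (A) of the workflow can be illustrated with a minimal Python sketch. It is not NOVOPlasty's implementation: the function names, the `pileup` structure (a mapping from position to per-base read counts) and the example counts are all hypothetical, and the sketch assumes that the frequency of the second most common base at a position is compared against the MAF threshold.

```python
from collections import Counter


def minor_allele_frequency(base_counts):
    """Frequency of the second most common base at one position (0.0 if none)."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    ranked = sorted(base_counts.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    return second / total


def flag_polymorphic_positions(pileup, maf=0.01):
    """Return the sorted positions whose minor allele frequency exceeds `maf`.

    `pileup` is a hypothetical input: dict of position -> Counter of bases.
    """
    return sorted(
        pos for pos, counts in pileup.items()
        if minor_allele_frequency(counts) > maf
    )


# Hypothetical read counts for three positions.
pileup = {
    100: Counter({"A": 980, "G": 20}),  # 2.0% minor allele, above threshold
    101: Counter({"C": 1000}),          # no minor allele
    102: Counter({"T": 997, "C": 3}),   # 0.3% minor allele, below threshold
}
flagged = flag_polymorphic_positions(pileup, maf=0.01)
print(flagged)
```

In this sketch, the flagged positions would then go on to the filtering and linkage steps described in (B) and (C), which are not modelled here.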
Summary of the characteristics of each dataset
| | F1&F2 | mtDNA-server | MToolBox | | SRR2098263 | ERR3243159 | ERR1395547 |
|---|---|---|---|---|---|---|---|
| Origin | WGS | Targeted | Simulated mtDNA | WGS | WGS | WGS | WGS |
| Platform | HiSeqX | HiSeq | / | HiSeq 2500 | HiSeq 2500 | NovaSeq 6000 | HiSeq 2000 |
| Read length | 151 bp | 101 bp | 101 bp | 251 bp | 100 bp | 150 bp | 100 bp |
| Insert size | 420 bp | 265 bp | 500 bp | 520 bp | 300 bp | 450 bp | 325 bp |
| Mitochondrial coverage | 3650 | 59 500 | 100–2000 | 4200 | 4000 | 20 000 | 5000 |
| Fraction of dataset | 0.04% | 97% | 99% | 0.13% | 0.08% | 0.25% | 0.08% |
Figure 2. Detailed view of the control region from the Circos output of the mtDNA-Server dataset. Positions linked by blue lines are on the same sequence, while positions linked by red lines originate from different sequences. Position 16,129 is one of the five sample-specific heteroplasmic mutations and has no linkage with the other polymorphisms. The red circle indicates the SNP in the low-complexity region that was missed by direct detection and recovered only through the linkage module.
Benchmark results of the heteroplasmy analysis of the mtDNA-Server and MToolBox datasets
| | NOVOPlasty | NOVOPlasty | mtDNA-Server | mtDNA-Server | LoFreq | LoFreq | MToolBox | MToolBox | MToolBox |
|---|---|---|---|---|---|---|---|---|---|
| mtDNA-Server | 27/28* | 0 | 24/28 | 0 | 26/28 | 30 | 28/28 | 741 | 92 |
| mtDNA-Server (intra-individual) | 5/5 | 0 | 3/5 | 0 | 5/5 | 30 | 5/5 | 741 | 92 |
| MToolBox (2000×, 25%) | 5/5 | 0 | 1/5 | 0 | 5/5 | 1 | 5/5 | 2 | 1 |
| MToolBox (2000×, 0.5%) | 0/5 | 0 | 0/5 | 0 | 2/5 | 0 | 5/5 | 0 | 0 |
| MToolBox (500×, 25%) | 5/5 | 0 | 1/5 | 0 | 5/5 | 0 | 5/5 | 0 | 0 |
| MToolBox (500×, 0.5%) | 1/5 | 0 | 0/5 | 0 | 1/5 | 0 | 0/5 | 0 | 0 |
| MToolBox (100×, 25%) | 3/3 | 0 | 1/3 | 0 | 3/3 | 0 | 3/3 | 0 | 0 |
| MToolBox (100×, 0.5%) | 0/3 | 0 | 0/3 | 0 | 0/3 | 0 | 0/3 | 0 | 0 |
*The missing heteroplasmic position was indirectly detected by the linkage module.
Benchmark results of the heteroplasmy analysis of 9 human WGS datasets
| | NOVOPlasty | NOVOPlasty | NOVOPlasty | mtDNA-Server | LoFreq | MToolBox | MToolBox (SNPs > 0.6%) |
|---|---|---|---|---|---|---|---|
| F1-P1 (Distant relative) | 3 | 2 | 0.75–0.87% | 1535 | 1460 | 1372 | 119 |
| F1-P2 (Mother) | 4 | 3 | 0.7–6.4% | 1601 | 1372 | 465 | 182 |
| F1-P3 (Son) | 1 | 1 | 1.10% | 1073 | 1236 | 2225 | 89 |
| F2-P1 (Daughter) | 75 | 9 | 0.8–1.4% | 3045 | 2731 | 248 | 186 |
| F2-P2 (Distant relative) | 6 | 6 | 0.66–52% | 978 | 817 | 1711 | 116 |
| F2-P3 (Father) | 71 | 3 | 0.6–1.2% | 3245 | 2855 | 192 | 165 |
| SRR2098263 | 3 | 3 | 0.7–67% | 1160 | 275 | 88 | 31 |
| ERR3243159 | 4 | 4 | 0.89–33% | / | 63 | 397 | 71 |
| ERR1395547 | 21 | 20 (6*) | 0.6–1.1% | / | 196 | 161 | 72 |
*After inspection of the linkage assemblies, 14 additional heteroplasmic positions were identified as NUMTs.
Figure 3. Circos output of the COI region generated by NOVOPlasty for the Gonioctena intermedia dataset with a MAF of 1%. All SNPs detected in this region are indicated by their position in the complete mitochondrial genome. The six positions confirmed by Sanger sequencing in a previous study (29) are encircled in blue; the additional SNPs detected by SAMtools (MAF > 10%) in the same study are encircled in red. (A) Only fully linked SNPs, that is, SNPs that are always found together, are connected with each other. (B) Fully linked (blue) and partially linked (yellow) SNPs are shown. Partially linked SNPs are found together in some sequences, but not all.
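The distinction between fully and partially linked SNPs can be made concrete with a small Python sketch. This is a simplification under stated assumptions, not the way NOVOPlasty derives linkage (which relies on assembling the sequence around each polymorphic site): each sequence is represented as the set of positions at which it carries the minor allele, and `classify_linkage`, the positions and the example sequences are hypothetical.

```python
def classify_linkage(sequences, pos_a, pos_b):
    """Classify two SNP positions as 'full', 'partial' or 'unlinked'.

    `sequences` is a hypothetical input: a list of sets, each holding the
    positions at which one sequence carries the minor allele.
    """
    both = sum(1 for s in sequences if pos_a in s and pos_b in s)
    either = sum(1 for s in sequences if pos_a in s or pos_b in s)
    if both == 0:
        return "unlinked"  # the two minor alleles never co-occur
    if both == either:
        return "full"      # whenever one is present, so is the other
    return "partial"       # together in some sequences, but not all


# Hypothetical sequences spanning four SNP positions.
sequences = [
    {3100, 3150, 3200},
    {3100, 3150},
    {3300},
    set(),
]
print(classify_linkage(sequences, 3100, 3150))  # always found together
print(classify_linkage(sequences, 3100, 3200))  # together in one sequence only
print(classify_linkage(sequences, 3100, 3300))  # never found together
```

In the terms of the figure, a pair classified as `full` would correspond to a blue link and a pair classified as `partial` to a yellow link.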