Alisa Morshneva1,2, Polina Kozyulina1,2, Elena Vashukova1,2, Olga Tarasenko1,2, Natalia Dvoynova2, Anastasia Chentsova2, Olga Talantova1, Alexander Koroteev3,4, Dmitrii Ivanov3, Elena Serebryakova1, Tatyana Ivashchenko1, Aitalina Sukhomyasova5,6, Nadezhda Maksimova6, Olesya Bespalova1, Igor Kogan1, Vladislav Baranov1, Andrey Glotov1,2.
Abstract
Clinical tests based on whole-genome sequencing (WGS) generally follow a single-task approach, testing one or several parameters, although WGS provides large data sets that can be used for many supportive analyses. Despite low genome coverage, data from WGS-based non-invasive prenatal testing (NIPT) contain fully sequenced mitochondrial DNA (mtDNA). This mtDNA can be used for variant calling, ancestry analysis, population studies and other approaches that extend NIPT functionality. In this study, we analyse the mtDNA pool from 645 cell-free DNA (cfDNA) samples of pregnant women from different regions of Russia, explore the effects of transportation and storage conditions on mtDNA content, analyse the effects, frequency and location of mitochondrial variants called from the samples, and perform haplogroup analysis, revealing the most common mitochondrial superclades. We show that, despite the relatively low sequencing depth of unamplified mtDNA from cfDNA samples, mtDNA analysis in these samples is still an informative instrument suitable for research and screening purposes.
Keywords: ClinVar; NIPT; SNPs; breast cancer; cfDNA; foetal fraction; mitochondrial diseases; mitochondrial variants; mtDNA; mtDNA haplogroups; population studies; transportation
Year: 2021 PMID: 34069212 PMCID: PMC8156457 DOI: 10.3390/genes12050743
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. The quantitative analysis of the mtDNA pool in samples. MtDNA content (A), mtDNA coverage (B) and sequencing depth (C) distribution in samples collected in EDTA (dark-grey) and Streck (light-grey) sample tubes.
The number of point mutations (SNPs) and indels before and after two-step filtration.
| Filtering Step | Total Number of Variants | Indels | SNPs |
|---|---|---|---|
| Total | 32,962 | 21,416 | 11,546 |
| Depth filtering | 22,017 | 17,035 | 4982 |
| Homopolymer filtering (≥4) | 14,681 | 10,479 | 4202 |
| Homopolymer filtering (≥3) | 8969 | 6450 | 2519 |
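The homopolymer filtering rows above can be illustrated with a minimal sketch. The record does not say how homopolymer context was determined, so the rule below is an assumption: a variant is dropped when its position lies inside a run of identical reference bases at least `min_run` long. The function names and the toy reference string are hypothetical and are not taken from the study.

```python
def homopolymer_run_length(reference: str, pos: int) -> int:
    """Length of the run of identical bases in `reference` containing index `pos` (0-based)."""
    base = reference[pos]
    start = pos
    while start > 0 and reference[start - 1] == base:
        start -= 1
    end = pos
    while end + 1 < len(reference) and reference[end + 1] == base:
        end += 1
    return end - start + 1


def filter_homopolymer_variants(reference, positions, min_run=4):
    """Keep only positions that are NOT inside a homopolymer run of length >= min_run."""
    return [p for p in positions if homopolymer_run_length(reference, p) < min_run]


# Toy reference (not real mtDNA): four C's at indices 3-6, three A's at indices 8-10.
toy_reference = "ATGCCCCTAAAG"
candidates = [1, 4, 9, 11]  # hypothetical variant positions

kept_4 = filter_homopolymer_variants(toy_reference, candidates, min_run=4)
kept_3 = filter_homopolymer_variants(toy_reference, candidates, min_run=3)
```

Lowering `min_run` from 4 to 3 removes more candidates, which mirrors the drop in counts between the last two rows of the table.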
Figure 2. Variant distribution: frequency and location in the mitochondrial genome. (A) Distribution of distinct variants according to their population frequency (by the percentage of samples carrying the particular variant). (B) Distribution of variants throughout the mitochondrial genome. Figures outside the circle mark the position in the mitochondrial genome (the 16 kb mtDNA has been divided into 16 sections of 1 kb each); figures in the inner circle represent the number of variants in every section.
Top-5 most frequent non-ancestral point variants. Columns from left to right: mtDNA sequence variant, number of patients carrying the variant and its frequency.
| mtDNA Variant | Patients |
|---|---|
| m.15301G>A | 43 (6.7%) |
| m.489T>C | 36 (5.6%) |
| m.10400C>T | 36 (5.6%) |
| m.14783T>C | 34 (5.3%) |
| m.15452C>A | 29 (4.5%) |
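The percentages in this table are consistent with dividing each carrier count by the 645 samples stated in the abstract. A small sketch of that arithmetic follows; rounding to one decimal place is an assumption about how the published figures were produced, and the helper name is hypothetical.

```python
TOTAL_SAMPLES = 645  # cohort size stated in the abstract


def carrier_frequency(carriers: int, total: int = TOTAL_SAMPLES) -> float:
    """Percentage of samples carrying a variant, rounded to one decimal place."""
    return round(100 * carriers / total, 1)


# Carrier counts copied from the table above.
top_snps = {
    "m.15301G>A": 43,
    "m.489T>C": 36,
    "m.10400C>T": 36,
    "m.14783T>C": 34,
    "m.15452C>A": 29,
}

frequencies = {variant: carrier_frequency(n) for variant, n in top_snps.items()}
```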
Top-5 most frequent non-ancestral indels. Columns from left to right: mtDNA sequence variant, number of patients and frequency.
| mtDNA Variant | Patients |
|---|---|
| m.9906delG | 81 (12.6%) |
| m.10151delA | 44 (6.8%) |
| m.9916delC | 38 (5.6%) |
| m.2193delT | 28 (4.3%) |
| m.9808insT | 27 (4.2%) |
Top-5 most frequent non-ancestral SNPs present in the ClinVar database. Columns from left to right: mtDNA sequence variant, variant ID (rs), variant ID (ClinVar), clinical significance (ClinVar), diagnosis (ClinVar), number of patients and frequency.
| mtDNA Variant | rs ID | ClinVar ID | Clinical Significance | Diagnosis (ClinVar) | Number of Patients |
|---|---|---|---|---|---|
| m.15301G>A | 193302991 | 140591 | Conflicting interpretations of pathogenicity | Familial cancer of breast | 43 (6.7%) |
| m.14783T>C | 193302982 | 140588 | Conflicting interpretations of pathogenicity | Familial cancer of breast | 34 (5.3%) |
| m.15452C>A | 193302994 | 143925 | Benign | Neoplasm of ovary/Leigh syndrome | 29 (4.5%) |
| m.3010G>A | 3928306 | 441149 | Drug response | Not provided | 28 (4.3%) |
| m.13708G>A | 28359178 | 9696 | Benign | Leber’s optic atrophy/Leigh syndrome | 10 (1.6%) |
Figure 3. The comparison between mtDNA pools of buffy coat and cfDNA from blood plasma in samples with different transportation conditions (transported in EDTA (blood plasma) or Streck (whole blood) sample tubes). The bars represent the mean values for the buffy coat-cfDNA pairs (n = 4); error bars indicate the standard error of the mean (SEM). (A) Average mtDNA content (the ratio of mtDNA reads to nuclear DNA reads averaged over all samples), the average number of called mitochondrial variants (homopolymer variants were filtered out) and the average percentage of heteroplasmy in buffy coat and cfDNA samples. (B) The average number of mitochondrial variants shared by the mtDNA pools of buffy coat and cfDNA (homopolymer variants were filtered out).
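The two quantities named in this caption, mtDNA content as the ratio of mtDNA reads to nuclear DNA reads and the standard error of the mean, can be sketched as follows. The read counts are hypothetical and purely illustrative, the function names are not from the study, and the use of the sample standard deviation (n - 1 denominator) for the SEM is an assumption.

```python
import math
import statistics


def mtdna_content(mt_reads: int, nuclear_reads: int) -> float:
    """mtDNA content as the ratio of mtDNA reads to nuclear DNA reads."""
    return mt_reads / nuclear_reads


def mean_and_sem(values):
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical (mtDNA reads, nuclear reads) for n = 4 samples; not data from the study.
read_counts = [(200, 1_000_000), (400, 1_000_000), (600, 1_000_000), (800, 1_000_000)]

contents = [mtdna_content(mt, nuc) for mt, nuc in read_counts]
mean_content, sem_content = mean_and_sem(contents)
```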
Figure 4. MtDNA content depending on storage conditions and transportation. Statistical significance was measured with the Kruskal–Wallis test. (A) Distribution of samples according to time of transportation (days). (B) Distribution of samples with low, medium and high mtDNA content within each time period (percentage). (C) Foetal fraction scaled to gestational age in samples shipped under or over 7 days. No statistically significant difference was found (p = 0.09). (D) Foetal fraction scaled to gestational age in samples grouped by mtDNA rate. The difference is statistically significant (p = 0.007). (E) Temporal dynamics of mtDNA content in the cfDNA fraction extracted from samples stored in EDTA (blood plasma) or Streck (whole blood) sample tubes. The difference is significant for Streck samples (p = 0.0014).
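For readers unfamiliar with the Kruskal–Wallis test named in this caption, the sketch below computes its H statistic from ranks, with the standard tie correction. It is a textbook formulation written in plain Python, not the authors' analysis code, and the example groups are hypothetical. It returns only H; a p-value would come from comparing H with a chi-square distribution with (number of groups - 1) degrees of freedom, which `scipy.stats.kruskal` can do if SciPy is available.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction, using average ranks.

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # Sum over groups of (rank sum squared / group size).
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = Counter(pooled)
    correction = 1 - sum(t**3 - t for t in ties.values()) / (n_total**3 - n_total)
    return h / correction


# Hypothetical groups (e.g. mtDNA content at three time points); not data from the study.
h_separated = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
h_with_ties = kruskal_wallis_h([1, 1], [2, 2])
```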
Figure 5. Geographic origin of the explored samples. (A) Distribution of samples according to geographic region (Others include the Southern, Far Eastern and North Caucasian regions, each represented by a small fraction of samples). (B) A Venn diagram representing the intersection between sets of variants in four regions: Northwestern, Central, Volga-Ural and Yakutia. Figures represent the number of distinct variants in every section and the percentage of the total number of variants. (C) Distribution of samples in every region according to mtDNA content. (D) Distribution of samples in every region by the number of variants, normalised to mtDNA content.
Top-3 non-ancestral variants that are frequent in Yakutia and rare in other regions. Columns from left to right: mtDNA sequence variant, the frequency of the variant in each region (% of the samples in that region), the total number of patients carrying the variant.
| DNA Seq Variant | Central (%) | Northwestern (%) | Volga-Ural (%) | Yakutia (%) | Others (%) | Number of Patients |
|---|---|---|---|---|---|---|
| m.15301G>A | 1.81 | 8.00 | 0.0 | 79.17 | 0.00 | 54 (8.4%) |
| m.10400C>T | 0.00 | 2.67 | 2.5 | 75.00 | 0.00 | 41 (6.4%) |
| m.12704TC>T | 1.36 | 3.33 | 0.0 | 47.92 | 0.00 | 31 (4.8%) |
Figure 6. Haplogroup analysis with HaploGrep2. (A) Distribution of samples according to their superclade. (B) Distribution of samples from different regions according to their superclade. Others include the Volga-Ural, Southern, Far Eastern and North Caucasian regions. The bars indicate the proportion of each superclade (percentage of the total number of samples in each region).
Figure 7. Filtration of the variants. (A) The first filtration step: variants with the lowest depths (less than 5) are filtered out; the vertical line marks the threshold. (B) The second filtration step: variants with a low number of reads carrying the alternative allele are filtered out; the vertical line marks the threshold at 4 reads. (C) The distribution of variants after the two filtration steps. (D) The distribution of filtered variants by the number of reads supporting the reference allele: homoplasmic (all reads carry the alternative allele) and heteroplasmic (reads carry both the alternative and the reference allele).
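The two thresholds and the homoplasmic/heteroplasmic split described in this caption can be sketched in a few lines. Several details are assumptions rather than facts from the source: depth is taken as reference reads plus alternative reads, the 4-read threshold is treated as inclusive (variants with 4 or more alternative reads pass), and the `Variant` class, function names and example records are hypothetical.

```python
from dataclasses import dataclass

MIN_DEPTH = 5      # step 1: caption states that depths "less than 5" are removed
MIN_ALT_READS = 4  # step 2: threshold of 4 reads; inclusive comparison is an assumption


@dataclass
class Variant:
    name: str
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternative allele

    @property
    def depth(self) -> int:
        # Assumption: depth is the sum of reference and alternative reads.
        return self.ref_reads + self.alt_reads


def passes_filters(v: Variant) -> bool:
    """Apply the depth filter, then the alternative-allele read filter."""
    return v.depth >= MIN_DEPTH and v.alt_reads >= MIN_ALT_READS


def classify(v: Variant) -> str:
    """Homoplasmic if no read supports the reference allele, otherwise heteroplasmic."""
    return "homoplasmic" if v.ref_reads == 0 else "heteroplasmic"


# Hypothetical variant records; not data from the study.
variants = [
    Variant("var_a", ref_reads=0, alt_reads=9),  # passes, homoplasmic
    Variant("var_b", ref_reads=6, alt_reads=5),  # passes, heteroplasmic
    Variant("var_c", ref_reads=1, alt_reads=3),  # fails the depth filter
    Variant("var_d", ref_reads=8, alt_reads=2),  # fails the alternative-read filter
]

kept = {v.name: classify(v) for v in variants if passes_filters(v)}
```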
Figure 8. Clinical significance of detected SNPs. (A) All clinical diagnoses associated with the analysed variants, in descending order of the number of occurrences. Cases where the diagnosis was the only one provided for a particular variant are shown in dark grey; cases where the diagnosis was provided in combination with other possible interpretations are shown in light grey. (B) In total, a clinical diagnosis was provided for 68.7% of the analysed variants. (C) Distribution of detected SNPs by their clinical significance according to ClinVar. Light grey represents likely benign variants; dark grey represents pathogenic variants.