Maria Angela Diroma, Angelo Sante Varvara, Marcella Attimonelli, Graziano Pesole, Ernesto Picardi.
Abstract
Mitochondria host multiple copies of their own small circular genome that has been extensively studied to trace the evolution of the modern eukaryotic cell and discover important mutations linked to inherited diseases. Whole genome and exome sequencing have enabled the study of mtDNA in a large number of samples and experimental conditions at single nucleotide resolution, allowing the deciphering of the relationship between inherited mutations and phenotypes and the identification of acquired mtDNA mutations in classical mitochondrial diseases as well as in chronic disorders, ageing and cancer. By applying an ad hoc computational pipeline based on our MToolBox software, we reconstructed mtDNA genomes in single cells using whole genome and exome sequencing data obtained by different amplification methodologies (eWGA, DOP-PCR, MALBAC, MDA) as well as data from single cell Assay for Transposase Accessible Chromatin with high-throughput sequencing (scATAC-seq) in which mtDNA sequences are expected as a byproduct of the technology. We show that assembled mtDNAs, with the exception of those reconstructed by MALBAC and DOP-PCR methods, are quite uniform and suitable for genomic investigations, enabling the study of various biological processes related to cellular heterogeneity such as tumor evolution, neural somatic mosaicism and embryonic development.
Keywords: mtDNA; scWGS; single-cell
Year: 2020 PMID: 32403285 PMCID: PMC7290567 DOI: 10.3390/genes11050534
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Computational pipeline for mtDNA analysis in single cell sequencing data. Our computational pipeline comprises three main steps: (1) download from the SRA database and preprocessing of raw data from .SRA format to .FASTQ format, followed by quality check and trimming of adapters and low-quality reads; (2) first read alignment onto the rCRS reference sequence and subsequent realignment onto the hg19 (including rCRS) genome to obtain genuine mtDNA reads, using BWA and GSNAP (invoked by MToolBox); (3) mtDNA sequence assembly, variant detection and annotation by MToolBox. Further variant processing was performed with BCFtools and VCFlib.
Figure 2. Lorenz curves of coverage uniformity for reconstructed mtDNA genomes. The cumulative fraction of sequence reads against the cumulative fraction of genome covered is shown for each amplification protocol (eWGA, DOP-PCR, MALBAC and MDA) followed by WGS in HT-29 cells, for the unamplified HT-29 bulk and for scATAC-seq data. The diagonal line represents perfectly uniform coverage. The plot was generated using the Lc function in the R package ineq.
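The construction of a Lorenz curve from per-position read depth can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code (the figure itself was produced with `Lc` from the R package ineq); the function names `lorenz_curve` and `gini` and the toy depth vectors are assumptions made for the example.

```python
def lorenz_curve(depths):
    """Return (xs, ys): cumulative fraction of positions vs. cumulative
    fraction of total depth, with positions sorted by increasing depth."""
    ordered = sorted(depths)
    total = sum(ordered)
    n = len(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one position with non-zero depth")
    xs, ys = [0.0], [0.0]
    running = 0
    for i, depth in enumerate(ordered, start=1):
        running += depth
        xs.append(i / n)            # fraction of the genome considered so far
        ys.append(running / total)  # fraction of all depth it accounts for
    return xs, ys


def gini(depths):
    """Gini coefficient: 1 minus twice the trapezoid area under the curve."""
    xs, ys = lorenz_curve(depths)
    area = sum(
        (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
        for i in range(1, len(xs))
    )
    return 1 - 2 * area


# Hypothetical depth vectors (not data from the paper).
uniform = [10, 10, 10, 10]   # every position equally covered
skewed = [0, 0, 0, 40]       # all reads pile up on one position
print(gini(uniform), gini(skewed))
```

A perfectly uniform profile lies on the diagonal, so its Gini coefficient is 0; the further a curve bows below the diagonal, the less uniform the coverage.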
Numbers of mtDNA variants detected in pairwise comparisons of HT-29 cells according to the following thresholds: depth of 30 reads and minimum allele frequency of 3%.
| Protocol | All | Common | Hom | Het | %Common | Unique | Confirmed | %Confirmed |
|---|---|---|---|---|---|---|---|---|
| DOP | 41 | 37 | 31 | 6 | 90.24 | 4 | 33 | 89.19 |
| eWGA | 44 | 40 | 37 | 3 | 90.91 | 4 | 36 | 90 |
| MALBAC | 3 | 3 | 2 | 1 | 100 | 0 | 2 | 66.67 |
| MDA | 43 | 38 | 36 | 2 | 88.37 | 5 | 36 | 94.74 |
Protocol: amplification protocol; All: all variants detected in each pair; Common: number of variants shared by each pair; Hom: number of homoplasmic variants; Het: number of heteroplasmic variants; %Common: percentage of variants shared by each pair; Unique: number of private variants; Confirmed: number of variants confirmed by the bulk; %Confirmed: percentage of variants confirmed by the bulk.
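The derived columns of this table can be recomputed from the counts. The sketch below assumes, based on the reported values rather than on an explicit statement in the legend, that %Common is Common divided by All and that %Confirmed is Confirmed divided by Common; the function name `pair_summary` is hypothetical.

```python
def pair_summary(all_variants, common, confirmed):
    """Recompute the derived columns for one pairwise comparison.

    Assumed definitions (inferred from the table values):
      Unique     = All - Common
      %Common    = 100 * Common / All
      %Confirmed = 100 * Confirmed / Common
    """
    return {
        "Unique": all_variants - common,
        "%Common": round(100 * common / all_variants, 2),
        "%Confirmed": round(100 * confirmed / common, 2),
    }


# DOP row: All = 41, Common = 37, Confirmed = 33
print(pair_summary(41, 37, 33))
```

For the DOP row this reproduces the tabulated 4 unique variants, 90.24% common and 89.19% confirmed.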
Variants detected in nine HT-29 cells sequenced using the eWGA protocol according to the following thresholds: depth of 30 reads and minimum allele frequency of 3%.
| Variant | inWGS | inWES | IsInBulk | mAF_WGS | mAF_WES | mAF_Bulk |
|---|---|---|---|---|---|---|
| m.73A > G | 5 | 1 | 1 | 1.00 | 1.00 | 1.00 |
| m.73A > T | 5 | 1 | 1 | 1.00 | 1.00 | 1.00 |
| m.114C > T | 5 | 2 | 1 | 1.00 | 1.00 | 1.00 |
| m.263A > G | 6 | 3 | 1 | 1.00 | 1.00 | 1.00 |
| m.310T > C | 4 | 1 | 0 | 0.38 | 0.68 | 0 |
| m.497C > T | 5 | 2 | 1 | 0.99 | 0.99 | 1.00 |
| m.750A > G | 7 | 2 | 1 | 1.00 | 0.98 | 1.00 |
| m.1189T > C | 5 | 2 | 1 | 1.00 | 0.97 | 1.00 |
| m.1352C > T | 2 | - | 0 | 0.06 | - | 0 |
| m.1413T > C | 6 | 2 | 1 | 1.00 | 1.00 | 1.00 |
| m.1438A > G | 6 | 2 | 1 | 1.00 | 1.00 | 1.00 |
| m.1811A > G | 9 | 2 | 1 | 1.00 | 1.00 | 1.00 |
| m.2706A > G | 9 | 3 | 1 | 1.00 | 0.91 | 1.00 |
| m.3480A > G | 9 | 3 | 1 | 1.00 | 0.99 | 1.00 |
| m.4769A > G | 4 | 1 | 1 | 1.00 | 1.00 | 1.00 |
| m.5591G > A | 2 | - | 0 | 0.08 | - | 0 |
| m.7028C > T | 4 | 1 | 1 | 1.00 | 1.00 | 1.00 |
| m.8860A > G * | 4 | 1 | 0 | 1.00 | 1.00 | 1.00 |
| m.9055G > A | 9 | 6 | 1 | 1.00 | 1.00 | 1.00 |
| m.9510T > C * | 6 | 4 | 0 | 0.31 | 0.36 | 0.37 |
| m.9698T > C | 8 | 4 | 1 | 0.99 | 0.99 | 0.97 |
| m.10398A > G * | 9 | 2 | 0 | 0.99 | 1.00 | 1.00 |
| m.10550A > G | 9 | 3 | 1 | 0.99 | 1.00 | 1.00 |
| m.10978A > G | 7 | 4 | 1 | 0.99 | 0.98 | 1.00 |
| m.11145C > A | 3 | - | 0 | 0.06 | - | 0 |
| m.11299T > C | 9 | 7 | 1 | 1.00 | 0.99 | 1.00 |
| m.11467A > G | 9 | 5 | 1 | 1.00 | 1.00 | 1.00 |
| m.11470A > G | 9 | 5 | 1 | 1.00 | 1.00 | 1.00 |
| m.11719G > A | 8 | 3 | 1 | 1.00 | 1.00 | 1.00 |
| m.11914G > A | 7 | 3 | 1 | 1.00 | 1.00 | 1.00 |
| m.12308A > G | 7 | 2 | 1 | 1.00 | 1.00 | 1.00 |
| m.12372G > A * | 7 | 4 | 0 | 1.00 | 1.00 | 0.96 |
| m.12954T > C | 5 | 6 | 1 | 1.00 | 1.00 | 1.00 |
| m.13831C > A * | 5 | 5 | 0 | 0.37 | 0.34 | 0.22 |
| m.14167C > T | 7 | 9 | 1 | 1.00 | 1.00 | 1.00 |
| m.14766C > T | 9 | 4 | 1 | 1.00 | 0.99 | 1.00 |
| m.14798T > C | 8 | 6 | 1 | 1.00 | 0.99 | 0.96 |
| m.15289T > C | 2 | - | 0 | 0.08 | - | 0 |
| m.15326A > G | 9 | 3 | 1 | 1.00 | 1.00 | 1.00 |
| m.15924A > G | 9 | 3 | 1 | 1.00 | 1.00 | 1.00 |
| m.16224T > C | 7 | 6 | 1 | 0.99 | 1.00 | 1.00 |
| m.16234C > T | 7 | 6 | 1 | 1.00 | 1.00 | 1.00 |
| m.16311T > C | 8 | 5 | 1 | 1.00 | 1.00 | 1.00 |
| m.16519T > C | 5 | 1 | 1 | 1.00 | 0.97 | 1.00 |
Variant: type and genomic location of each variant in the format m.[POS][REF] > [ALT] (m: chrM; POS: genomic position; REF: reference base; ALT: alternative base); inWGS: number of cells sequenced by WGS in which the variant is detected; inWES: number of cells sequenced by WES in which the variant is detected; IsInBulk: a flag indicating if a variant is detected (1) or not detected (0) in the bulk; mAF_WGS: mean allele frequency in cells sequenced by WGS; mAF_WES: mean allele frequency in cells sequenced by WES; mAF_Bulk: allele frequency in the bulk (reported also for sites supported by fewer than 30 reads). mAF_Bulk values of 0 indicate that the site is not covered by reads. * indicates variants apparently not supported by the bulk because the site is covered by fewer than 30 reads.
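The variant notation and the two reporting thresholds used in these tables can be illustrated with a short sketch. It is not part of MToolBox; the names `Variant`, `parse_variant` and `passes_thresholds` are hypothetical, and treating both thresholds as inclusive is an assumption.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int   # genomic position on chrM
    ref: str   # reference base
    alt: str   # alternative base


# Matches labels such as "m.73A > G"; spaces around ">" are optional.
_PATTERN = re.compile(r"^m\.(\d+)([ACGT])\s*>\s*([ACGT])$")


def parse_variant(label):
    """Parse 'm.[POS][REF] > [ALT]', ignoring a trailing '*' footnote mark."""
    match = _PATTERN.match(label.replace("*", "").strip())
    if match is None:
        raise ValueError(f"unrecognised variant label: {label!r}")
    return Variant(int(match.group(1)), match.group(2), match.group(3))


def passes_thresholds(depth, alt_reads, min_depth=30, min_af=0.03):
    """Keep a call only if the site has at least min_depth reads and the
    alternative allele frequency is at least min_af (assumed inclusive)."""
    if depth < min_depth:
        return False
    return alt_reads / depth >= min_af


print(parse_variant("m.8860A > G *"))
print(passes_thresholds(depth=29, alt_reads=29))   # too shallow
print(passes_thresholds(depth=100, alt_reads=2))   # allele frequency of 2%
```

Under these assumptions, a site covered by 29 reads is rejected even when every read carries the alternative allele, which is the situation the asterisk flags for the bulk sample.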
Variants detected in 48 TF-1 cells sequenced using the ATACseq protocol according to the following thresholds: depth of 30 reads and minimum allele frequency of 3%.
| Variant | # Cells | IsInBulk | mAF | mAF_Bulk |
|---|---|---|---|---|
| m.73A > G | 43 | 1 | 1.00 | 1.00 |
| m.150C > T | 44 | 1 | 1.00 | 1.00 |
| m.199T > C | 45 | 1 | 1.00 | 1.00 |
| m.263A > G | 44 | 1 | 1.00 | 1.00 |
| m.489T > C | 47 | 1 | 1.00 | 0.06 |
| m.750A > G | 47 | 1 | 1.00 | 1.00 |
| m.1438A > G | 44 | 1 | 1.00 | 1.00 |
| m.2706A > G | 46 | 1 | 1.00 | 1.00 |
| m.3572T > C * | 9 | 0 | 0.04 | 1.00 |
| m.4048G > A | 43 | 1 | 1.00 | 0.01 |
| m.4071C > T | 40 | 1 | 1.00 | 1.00 |
| m.4164A > G | 34 | 1 | 1.00 | 1.00 |
| m.4769A > G | 1 | 1 | 1.00 | 1.00 |
| m.5351A > G | 28 | 1 | 0.99 | 1.00 |
| m.5460G > A | 47 | 1 | 0.99 | 1.00 |
| m.6455C > T | 47 | 1 | 1.00 | 1.00 |
| m.6680T > C | 43 | 1 | 1.00 | 1.00 |
| m.7028C > T | 38 | 1 | 1.00 | 1.00 |
| m.7684T > C | 42 | 1 | 1.00 | 1.00 |
| m.7853G > A | 46 | 1 | 1.00 | 1.00 |
| m.8552T > C | 43 | 1 | 1.00 | 1.00 |
| m.8563A > C * | 1 | 0 | 0.08 | 1.00 |
| m.8684C > T * | 1 | 0 | 0.18 | 1.00 |
| m.8701A > G | 40 | 1 | 1.00 | 1.00 |
| m.8860A > G | 1 | 1 | 1.00 | 1.00 |
| m.9540T > C | 47 | 1 | 1.00 | 1.00 |
| m.9627G > A | 46 | 1 | 0.18 | 0.02 |
| m.9824T > C | 47 | 1 | 1.00 | 0.99 |
| m.10345T > C | 47 | 1 | 1.00 | 1.00 |
| m.10398A > G | 45 | 1 | 1.00 | 1.00 |
| m.10400C > T | 45 | 1 | 1.00 | 0.99 |
| m.10873T > C | 44 | 1 | 1.00 | 1.00 |
| m.11284C > T | 36 | 1 | 0.08 | 0.99 |
| m.11719G > A | 44 | 1 | 0.98 | 1.00 |
| m.12405C > T | 47 | 1 | 1.00 | 1.00 |
| m.12705C > T | 44 | 1 | 1.00 | 1.00 |
| m.12811T > C | 47 | 1 | 1.00 | 1.00 |
| m.12906C > T * | 1 | 0 | 0.20 | 1.00 |
| m.13239C > T * | 1 | 0 | 0.07 | 1.00 |
| m.14766C > T | 39 | 1 | 1.00 | 1.00 |
| m.14783T > C | 42 | 1 | 1.00 | 1.00 |
| m.15043G > A | 47 | 1 | 1.00 | 1.00 |
| m.15301G > A | 46 | 1 | 1.00 | 1.00 |
| m.15326A > G | 47 | 1 | 1.00 | 0.01 |
| m.16129G > A | 46 | 1 | 0.99 | 1.00 |
| m.16189T > C | 39 | 1 | 1.00 | 0.99 |
| m.16223C > T | 46 | 1 | 1.00 | 1.00 |
| m.16297T > C | 47 | 1 | 1.00 | 0.17 |
| m.16298T > C | 47 | 1 | 1.00 | 1.00 |
Variant: type and genomic location of each variant in the format m.[POS][REF] > [ALT] (m: chrM; POS: genomic position; REF: reference base; ALT: alternative base); # Cells: number of cells in which the variant is detected; IsInBulk: a flag indicating if a variant is detected (1) or not detected (0) in the bulk; mAF: mean allele frequency; mAF_Bulk: allele frequency in the bulk (reported also for sites supported by fewer than 30 reads). * indicates variants apparently not supported by the bulk because the site is covered by fewer than 30 reads.
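The "# Cells" and mAF columns summarise per-cell calls. A minimal sketch of that aggregation is given below, assuming that the mean is taken only over cells in which the variant passed the thresholds; the function name `summarize_calls`, the input layout and the toy records are hypothetical and are not taken from the paper's data.

```python
from collections import defaultdict


def summarize_calls(calls):
    """calls: iterable of (cell_id, variant_label, allele_frequency).

    Returns {variant_label: (n_cells, mean_allele_frequency)}, where the
    mean is taken over the cells in which the variant was detected.
    """
    per_variant = defaultdict(dict)
    for cell_id, variant, af in calls:
        per_variant[variant][cell_id] = af  # one value per cell and variant
    return {
        variant: (len(by_cell), round(sum(by_cell.values()) / len(by_cell), 2))
        for variant, by_cell in per_variant.items()
    }


# Hypothetical per-cell calls that already passed the depth and AF filters.
toy_calls = [
    ("cell_1", "m.73A > G", 1.00),
    ("cell_2", "m.73A > G", 0.98),
    ("cell_1", "m.9627G > A", 0.20),
]
print(summarize_calls(toy_calls))
```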