Mariangela Santorsola, Claudia Calabrese, Giulia Girolimetti, Maria Angela Diroma, Giuseppe Gasparre, Marcella Attimonelli.
Abstract
Assigning a pathogenic role to mitochondrial DNA (mtDNA) variants and unveiling the potential involvement of the mitochondrial genome in diseases are challenging tasks in human medicine. Assuming that rare variants are more likely to be damaging, we designed a phylogeny-based prioritization workflow to obtain a reliable pool of candidate variants for further investigation. The workflow relies on exhaustive functional annotation through the mtDNA extraction pipeline MToolBox and combines three criteria: Macro Haplogroup Consensus Sequences, used to filter out fixed evolutionary variants and report rare or private ones; the nucleotide variability reported in HmtDB; and a disease score based on several predictors of pathogenicity for non-synonymous variants. Cutoffs for both the disease score and the nucleotide variability index were established with the aim of discriminating sequence variants that contribute to defective phenotypes. The workflow was validated on mitochondrial sequences from individuals affected by Leber's Hereditary Optic Neuropathy, successfully identifying 23 variants, including the majority of the known causative ones. Applying the prioritization workflow to cancer datasets made it possible to trim down the number of candidates for subsequent functional analyses and revealed a high percentage of somatic variants among them. The prioritization criteria were implemented in both the standalone ( http://sourceforge.net/projects/mtoolbox/ ) and web ( https://mseqdr.org/mtoolbox.php ) versions of MToolBox.
Year: 2015 PMID: 26621530 PMCID: PMC4698288 DOI: 10.1007/s00439-015-1615-9
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
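The three criteria named in the abstract can be read as a single predicate over annotated variants. The following is a minimal sketch under stated assumptions, not MToolBox's actual code: the `Variant` record, its field names, and the functions `is_prioritized` and `prioritize` are hypothetical, and the strict inequalities are an assumption. The two cutoffs are the values given in the Fig. 1 caption (disease score threshold 0.4311, nucleotide variability cutoff 0.0026). Only the first candidate uses values from the LHON table of this article; the others are invented to exercise each filter.

```python
from dataclasses import dataclass

DISEASE_SCORE_THRESHOLD = 0.4311  # DST, as reported in the Fig. 1 caption
NT_VARIABILITY_CUTOFF = 0.0026    # NVC, as reported in the Fig. 1 caption


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one annotated non-synonymous variant."""
    allele: str                     # e.g. "11778A"
    in_haplogroup_consensus: bool   # True if fixed in the macro-haplogroup consensus
    nt_variability: float           # site-specific nucleotide variability
    disease_score: float            # aggregated pathogenicity score


def is_prioritized(v: Variant) -> bool:
    """Keep rare/private variants with low variability and a high disease score."""
    return (
        not v.in_haplogroup_consensus
        and v.nt_variability < NT_VARIABILITY_CUTOFF   # assumption: strictly below
        and v.disease_score > DISEASE_SCORE_THRESHOLD  # assumption: strictly above
    )


def prioritize(variants):
    """Return the variants that pass all three filters, in input order."""
    return [v for v in variants if is_prioritized(v)]


candidates = [
    Variant("11778A", False, 0.0025, 0.8534),          # values from the LHON table
    Variant("hypothetical-1", True, 0.0001, 0.9000),   # invented: haplogroup-defining
    Variant("hypothetical-2", False, 0.0100, 0.9000),  # invented: too variable
    Variant("hypothetical-3", False, 0.0001, 0.2000),  # invented: low disease score
]
print([v.allele for v in prioritize(candidates)])  # prints ['11778A']
```

Frameshifts and premature stop codons, which carry no disease score in the tables of this article, are deliberately left out of this sketch because the text does not say how the workflow scores them.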
List of 53 non-synonymous variants composing the training dataset
| Non-synonymous variant | Locus | Non-synonymous variant | Locus | Non-synonymous variant | Locus |
|---|---|---|---|---|---|
| T9185C | MT-ATP6 | T3931C | MT-ND1 | G10573A | MT-ND4L |
| T9176C | MT-ATP6 | G3392A | MT-ND1 | G13042A | MT-ND5 |
| T9176G | MT-ATP6 | G3733A | MT-ND1 | T12706C | MT-ND5 |
| G8839A | MT-ATP6 | T3949C | MT-ND1 | T13540C | MT-ND5 |
| T8993C | MT-ATP6 | G3697A | MT-ND1 | G13513A | MT-ND5 |
| T8993G | MT-ATP6 | T3679C | MT-ND1 | A13514G | MT-ND5 |
| C6567T | MT-CO1 | G3922A | MT-ND1 | G13178A | MT-ND5 |
| T6210C | MT-CO1 | G4831A | MT-ND2 | T12797C | MT-ND5 |
| T15843C | MT-CYB | G4975A | MT-ND2 | T13847C | MT-ND5 |
| T15813G | MT-CYB | T10158C | MT-ND3 | T13271C | MT-ND5 |
| T15209C | MT-CYB | T10191C | MT-ND3 | C14568T | MT-ND6 |
| G3700A | MT-ND1 | G10197A | MT-ND3 | C14482A | MT-ND6 |
| C4171A | MT-ND1 | G12056A | MT-ND4 | C14482G | MT-ND6 |
| G3460A | MT-ND1 | T11613C | MT-ND4 | T14484C | MT-ND6 |
| T4222C | MT-ND1 | C11777A | MT-ND4 | A14495G | MT-ND6 |
| G3635A | MT-ND1 | G11778A | MT-ND4 | G14459A | MT-ND6 |
| G4148A | MT-ND1 | G11475A | MT-ND4 | T14487C | MT-ND6 |
| G3890A | MT-ND1 | T10663C | MT-ND4L | | |
The table lists non-synonymous variants, with their loci, that were previously validated as affecting protein function and were included in the training dataset used to define the disease score
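The variants in this table use the common reference-position-alternate notation for single-nucleotide substitutions (for example, G11778A is a G replaced by A at position 11778). As a small illustration, the sketch below splits such labels into their parts. The `Substitution` type and `parse_substitution` function are hypothetical helpers introduced here, and the pattern handles only simple substitutions, not insertions or deletions.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """Hypothetical container for one single-nucleotide substitution."""
    ref: str       # reference nucleotide
    position: int  # 1-based mtDNA position
    alt: str       # variant nucleotide


# One reference base, a run of digits, one alternate base.
_SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_substitution(label: str) -> Substitution:
    """Parse labels such as 'T9185C'; raise ValueError for anything else."""
    match = _SUBSTITUTION.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    ref, position, alt = match.groups()
    if ref == alt:
        raise ValueError(f"reference and alternate are identical: {label!r}")
    return Substitution(ref, int(position), alt)


print(parse_substitution("G11778A"))
```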
Fig. 1 a Histogram of the bimodal distribution of disease scores associated with the 1872 non-synonymous variants observed in mtDNA sequences from healthy individuals and stored in HmtDB (May 2014). The solid lines indicate the two Gaussian components of the mixture model (McLachlan and Peel 2000) (46 and 54 %, respectively). The first component, with the lowest disease score values, includes the most benign non-synonymous variants. The vertical dashed line marks the selected Disease Score Threshold (DST) of 0.4311; non-synonymous variants with a disease score above 0.4311 may therefore be considered as potentially affecting function. b Box plot of the disease scores of non-synonymous variants predicted as ‘Neutral’ or ‘Disease’ by all six pathogenicity predictors implemented in MToolBox (disease scores range from 0.05 to 0.4311 and from 0.6565 to 0.9162 for the two classes, respectively). Circles represent outliers. c Empirical cumulative distribution function of the nucleotide variability associated with the 816 non-synonymous variants with a disease score above the established DST. The dashed vertical line indicates the nucleotide variability cutoff, NVC = 0.0026, defined as the third quartile of this distribution. Non-synonymous variants with variability values below the NVC are retained by the variant prioritization workflow
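Panel c defines the nucleotide variability cutoff as the third quartile of an empirical distribution. The sketch below shows one way such a cutoff and the corresponding empirical cumulative distribution function could be computed. It is only an illustration: the variability values are invented, the function names are hypothetical, and the quartile estimator (the inclusive interpolation method of Python's `statistics.quantiles`) is an assumption, since the caption does not say which estimator was used.

```python
from bisect import bisect_right
from statistics import quantiles


def third_quartile(values):
    """Return Q3; quantiles(..., n=4) yields the cut points [Q1, Q2, Q3]."""
    return quantiles(values, n=4, method="inclusive")[2]


def ecdf(values):
    """Build the empirical CDF: F(x) = fraction of observations <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cumulative(x):
        return bisect_right(ordered, x) / n

    return cumulative


# Invented nucleotide variability values, for illustration only.
sample = [0.0, 0.0, 0.0003, 0.0007, 0.0016, 0.0024, 0.0050, 0.0120, 0.0300]

cutoff = third_quartile(sample)
cumulative = ecdf(sample)
retained = [v for v in sample if v < cutoff]  # sites less variable than the cutoff

print(cutoff, cumulative(cutoff), len(retained))
```

With real data, `sample` would hold the variability of every non-synonymous variant whose disease score exceeds the DST, and `cutoff` would play the role of the NVC.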
Fig. 2 The stepwise prioritization workflow and the number of mitochondrial variants filtered at each step, applied to the full lists of variants detected and annotated in the (A) LHON and (B) ovarian cancer datasets from Sanger sequencing
List of prioritized non-synonymous variants in LHON samples
| No. samples | Variant allele | Locus | Nt Var | AA change | AA Var | Disease score | Mitomap | 1000 genomes |
|---|---|---|---|---|---|---|---|---|
| 1 | 10747A | MT-ND4L | 0.0000 | L93Q | 0 | 0.8781 | | |
| 1 | 6448A | MT-CO1 | 0.0000 | P182H | 0.0026 | 0.8325 | | |
| 1 | 7042C | MT-CO1 | 0.0000 | V380A | 0.0047 | 0.8044 | | |
| 1 | 15156G | MT-CYB | 0.0003 | Q137R | 0.0005 | 0.9044 | | |
| 1 | 7632C | MT-CO2 | 0.0003 | I16T | 0.0018 | 0.4579 | | |
| 1 | 9104C | MT-ATP6 | 0.0007 | F193S | 0.0075 | 0.5168 | | |
| 1 | 14249A | MT-ND6 | 0.0016 | A142V | 0.0121 | 0.4498 | | |
| 1 | 8551C | MT-ATP6 | 0.0018 | F9L | 0.0042 | 0.7620 | | 0.0008 |
| 1 | 3890A | MT-ND1 | 0.0000 | R195Q | 0 | 0.8184 | PE/LS/OA | |
| 2 | 3733A | MT-ND1 | 0.0000 | E143K | 0 | 0.8360 | LHON Top 14 | |
| 9 | 3635A | MT-ND1 | 0.0000 | S110N | 0 | 0.7977 | LHON Top 14 | |
| 1 | 3733C | MT-ND1 | 0.0000 | E143Q | 0 | 0.8677 | LHON | |
| 1 | 3922A | MT-ND1 | 0.0000 | E206K | 0 | 0.8939 | Head/neck tumor | |
| 1 | 14495G | MT-ND6 | 0.0000 | L60S | 0 | 0.8616 | LHON Top 14 | |
| 1 | 10663C | MT-ND4L | 0.0000 | V65A | 0 | 0.5776 | LHON Top 14 | |
| 1 | 14841G | MT-CYB | 0.0000 | N32S | 0 | 0.8360 | LHON helper mut. | 0.0012 |
| 1 | 9655A | MT-CO3 | 0.0005 | S150N | 0.0029 | 0.7259 | Thyroid tumor | 0.0008 |
| 1 | 14459A | MT-ND6 | 0.0006 | A72V | 0.0183 | 0.8655 | LDYT/LS/LHON Top 14 | 0.0008 |
| 4 | 14568T | MT-ND6 | 0.0009 | G36S | 0.0079 | 0.7311 | LHON Top 14 | |
| 15 | 3460A | MT-ND1 | 0.0014 | A52T | 0.0015 | 0.7629 | LHON Top 14 (95 %) | |
| 2 | 4171A | MT-ND1 | 0.0016 | L289M | 0.0107 | 0.6809 | LHON Top 14 | |
| 2 | 14482A | MT-ND6 | 0.0024 | M64I | 0.0333 | 0.7923 | LHON Top 14 | |
| 41 | 11778A | MT-ND4 | 0.0025 | R340H | 0.0516 | 0.8534 | LHON Top 14 (95 %)/PDY | 0.0004 |
Non-synonymous variants identified in LHON-derived mtDNAs and prioritized according to the established criteria. Each variant allele is reported with the number of samples harboring it (No. samples), the mtDNA locus (Locus), the site-specific nucleotide variability (Nt Var), the amino acid change and its variability (AA change and AA Var, respectively), the Disease Score, annotations from Mitomap (Lott et al. 2013) and the frequency in 1000 Genomes [as implemented in (Calabrese et al. 2014)]. Frameshifts and premature stop codons are also reported in the ‘AA change’ field. None of the LHON variants is involved in haplogroup assignment. For the full list of variants in LHON samples, see Supplementary Table 1
LDYT Leber’s hereditary optic neuropathy and Dystonia, LS Leigh syndrome, OA optic atrophy, PE progressive encephalomyopathy, PDY progressive dystonia
List of prioritized mtDNA non-synonymous variants in ovarian cancer samples
| Sample | Variant allele | HF | Locus | Nt Var | AA change | Tumor-specific | AA Var | Disease score | 1000 genomes |
|---|---|---|---|---|---|---|---|---|---|
| EOC5 | 3380A | 0.8 | MT-ND1 | 0.0003 | R25Q | + | 0.00 | 0.8764 | 0.0004 |
| EOC40 | 14969C | 0.5 | MT-CYB | 0.0003 | Y75H | + | 0.00 | 0.8526 | 0.0004 |
| EOC16 | 9837A | 0.5 | MT-CO3 | 0.0000 | G211S | + | 0.00 | 0.8379 | 0.0004 |
| EOC20 | 15255C | 0.8 | MT-CYB | 0.0000 | V170A | + | 0.00 | 0.8195 | 0.0004 |
| EOC20 | 10696T | 0.8 | MT-ND4L | 0.0000 | A76V | + | 0.01 | 0.7810 | |
| EOC14 | 6121C | 0.5 | MT-CO1 | 0.0007 | I73T | + | 0.00 | 0.7054 | 0.0004 |
| EOC5 | 8412C | 1.0 | MT-ATP8 | 0.0023 | M16T | − | 0.03 | 0.6587 | 0.0008 |
| EOC32 | 14249A | 1.0 | MT-ND6 | 0.0020 | A142V | − | 0.02 | 0.4498 | |
| EOC37 | 6691.A | 0.5 | MT-CO1 | 0 | Frameshift | + | 0 | | |
Tumor-specific and germline variants identified in ovarian cancer-derived mtDNAs and prioritized according to the established criteria. Each variant allele is reported with the sample identifier (Sample), heteroplasmic fraction (HF), mtDNA locus (Locus), site-specific nucleotide variability (Nt Var), amino acid change and its variability (AA change and AA Var, respectively), somatic (+) or germline (−) nature (Tumor-specific), Disease Score and frequency in 1000 Genomes [as implemented in (Calabrese et al. 2014)]. For the full list of variants in ovarian cancer samples, see Supplementary Table 2 (AllVariants)
Fig. 3 The stepwise prioritization workflow and the number of mitochondrial variants filtered at each step, applied to the full lists of (A) tumor-specific and (B) germline variants detected and annotated in the COAD dataset from Whole Exome Sequencing (WXS). The number of blood-specific variants is also shown
List of tumor-specific non-synonymous variants prioritized in COAD samples
| Sample | Variant allele | HF | Locus | Nt Var | AA change | AA Var | Disease score | Mitomap | 1000 genomes |
|---|---|---|---|---|---|---|---|---|---|
| A6665101A21D183510 | 11390A | 0.861 | MT-ND4 | 0.0002 | Premature Stop Codon | 0.01 | | | |
| A6665201A11D177110 | 3380A | 0.967 | MT-ND1 | 0.0003 | R25Q | 0.00 | 0.8800 | MELAS | 0.0004 |
| AU377901A01D171910 | 10863A | 0.909 | MT-ND4 | 0.0003 | S35N | 0.00 | 0.7600 | | |
| CM474301A01D171910 | 14918A | 0.814 | MT-CYB | 0.0003 | D58N | 0.00 | 0.7100 | | 0.0004 |
| | 14985A | 0.881 | MT-CYB | 0.0000 | R80H | 0.00 | 0.9000 | Colorectal tumor | 0.0017 |
| CM534401A21D171910 | 11552C | 0.819 | MT-ND4 | 0.0000 | S265P | 0.00 | 0.8900 | | |
| CM586101A01D165010 | 12814A | 0.979 | MT-ND5 | 0.0011 | A160T | 0.00 | 0.6900 | | 0.0004 |
| CM586401A01D165010 | 10854C | 0.971 | MT-ND4 | 0.0000 | L32P | 0.00 | 0.8500 | | 0.0008 |
| CM616401A11D165010 | 15243A | 0.874 | MT-CYB | 0.0000 | G166E | 0.00 | 0.9000 | HCM | 0.0004 |
| CM616501A11D165010 | 3946A | 0.935 | MT-ND1 | 0.0001 | E214K | 0.00 | 0.9100 | MELAS | 0.0012 |
| D5653501A11D171910 | 9645A | 0.861 | MT-CO3 | 0.0000 | A147T | 0.00 | 0.8100 | | 0.0004 |
| D5654101A11D171910 | 4810A | 0.9 | MT-ND2 | 0.0000 | Premature Stop Codon | 0.00 | | | |
| D5693001A11D192410 | 6798A | 0.866 | MT-CO1 | 0.0000 | V299M | 0.00 | 0.7700 | | |
| DMA0X901A11DA1521 | 3380A | 0.799 | MT-ND1 | 0.0003 | R25Q | 0.00 | 0.8800 | MELAS | 0.0004 |
| | 9790T | 0.949 | MT-CO3 | 0.0000 | S195L | 0.00 | 0.8300 | | |
| DMA1D001A11DA15210 | 4537A | 0.925 | MT-ND2 | 0.0000 | S23N | 0.00 | 0.8200 | | |
| DMA1DA01A11DA15210 | 8243A | 0.954 | MT-CO2 | 0.0000 | E220K | 0.00 | 0.7300 | | 0.0008 |
| DMA28501A11DA16V10 | 10233A | 0.935 | MT-ND3 | 0.0000 | A59T | 0.00 | 0.7700 | | 0.0004 |
| DMA28C01A11DA16V10 | 3380A | 0.976 | MT-ND1 | 0.0003 | R25Q | 0.00 | 0.8800 | MELAS | 0.0004 |
| DMA28G01A11DA16V10 | 7623T | 0.967 | MT-CO2 | 0.0000 | T13I | 0.00 | 0.7700 | LHON | |
| G4629401A11D180610 | 4222C | 0.95 | MT-ND1 | 0.0000 | S306P | 0.00 | 0.7800 | | |
| G4629501A11D171910 | 6744A | 0.792 | MT-CO1 | 0.0000 | G281S | 0.00 | 0.7700 | | |
| G4631501A11D171910 | 11711A | 0.792 | MT-ND4 | 0.0003 | A318T | 0.00 | 0.7800 | | 0.0037 |
| G4632001A11D171910 | 9384A | 0.893 | MT-CO3 | 0.0000 | D60N | 0.00 | 0.7800 | | 0.0008 |
| G4658801A11D177110 | 4004C | 0.768 | MT-ND1 | 0.0000 | M233T | 0.00 | 0.4800 | | |
Tumor-specific variants identified in COAD-derived mtDNAs and prioritized according to the established criteria. Each variant allele is reported with the sample identifier (Sample), heteroplasmic fraction (HF), mtDNA locus (Locus), site-specific nucleotide variability (Nt Var), amino acid change and its variability (AA change and AA Var, respectively), Disease Score, annotations from Mitomap (Lott et al. 2013) and frequency in 1000 Genomes [as implemented in (Calabrese et al. 2014)]. For the full list of tumor-specific variants in COAD samples, see Supplementary Table 3 (TumorSpecific)
HCM hypertrophic cardiomyopathy, MELAS mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes
List of prioritized germline non-synonymous variants in COAD samples
| Individual ID | Variant allele | HF blood/tumor | Locus | Nt Var | AA change | AA Var | Disease score | 1000 genomes |
|---|---|---|---|---|---|---|---|---|
| G4629410A | 9447C | 1/1 | MT-CO3 | 0.0000 | Y81H | 0.0155 | 0.7772 | 0.0012 |
| AY619710A | 9106G | 1/0.88 | MT-ATP6 | 0.0004 | T194A | 0.0009 | 0.6682 | |
| F4680710A | 8861T | 1/1 | MT-ATP6 | 0.0017 | T112M | 0.0276 | 0.5456 | |
| A6565710A | 15434T | 0.98/0.99 | MT-CYB | 0.0015 | L230F | 0.0016 | 0.7115 | 0.0008 |
Non-synonymous germline variants identified in COAD-derived mtDNAs and prioritized according to the established criteria. Each variant allele is reported with the individual identifier (Individual ID), heteroplasmic fraction in blood and tumor tissues (HF blood/tumor), mtDNA locus (Locus), site-specific nucleotide variability (Nt Var), amino acid change and its variability (AA change and AA Var, respectively), Disease Score and frequency in 1000 Genomes [as implemented in (Calabrese et al. 2014)]. For the full list of germline variants in COAD samples, see Supplementary Table 3 (Germline)