Antonio Martín-Navarro, Andrés Gaudioso-Simón, Jorge Álvarez-Jarreta, Julio Montoya, Elvira Mayordomo, Eduardo Ruiz-Pesini.
Abstract
BACKGROUND: Several methods have been developed to predict the pathogenicity of missense mutations, but none has been specifically designed to classify variants in mtDNA-encoded polypeptides. Moreover, no curated dataset of neutral and damaging mtDNA missense variants is available to test the accuracy of predictors. Because mtDNA sequencing of patients with mitochondrial diseases is revealing many missense mutations, candidate substitutions need to be prioritized for further confirmation. Predictors can be useful as screening tools, but their performance must be improved.
Keywords: Classifier; Missense mutation; Mitochondrial DNA; Pathogenicity; Protein multiple sequence alignment; SVM
Year: 2017 PMID: 28270093 PMCID: PMC5341421 DOI: 10.1186/s12859-017-1562-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Amino acid (AA) relative frequency (%) and conservation index (CI)
| | IM | | | | | TM | | | | | M | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AA | N | % | CI | tM | D | N | % | CI | tM | D | N | % | CI | tM | D |
| A | 37 | 5.1 | 50.4 | 37 | 0 | 192 | 8.3 | 57.4 | 177 | 7 | 26 | 3.5 | 44.6 | 27 | 1 |
| C | 3 | 0.4 | 67.4 | 1 | 0 | 14 | 0.6 | 59.2 | 8 | 0 | 5 | 0.7 | 64.5 | 4 | 0 |
| D | 28 | 3.8 | 73.2 | 25 | 0 | 21 | 0.9 | 69.8 | 9 | 2 | 17 | 2.3 | 72.4 | 16 | 0 |
| E | 18 | 2.5 | 57.5 | 7 | 0 | 42 | 1.8 | 78.4 | 23 | 2 | 28 | 3.8 | 71.0 | 15 | 2 |
| F | 30 | 4.1 | 60.7 | 22 | 0 | 154 | 6.7 | 69.5 | 104 | 1 | 32 | 4.3 | 58.3 | 18 | 1 |
| G | 46 | 6.3 | 73.9 | 25 | 0 | 128 | 5.5 | 80.8 | 60 | 4 | 38 | 5.1 | 77.0 | 15 | 0 |
| H | 18 | 2.5 | 64.6 | 12 | 0 | 51 | 2.2 | 74.0 | 22 | 1 | 28 | 3.8 | 55.9 | 26 | 0 |
| I | 45 | 6.2 | 44.4 | 44 | 0 | 234 | 10.1 | 46.0 | 318 | 0 | 40 | 5.4 | 35.5 | 55 | 0 |
| K | 11 | 1.5 | 58.2 | 1 | 0 | 39 | 1.7 | 66.2 | 9 | 0 | 45 | 6.0 | 45.3 | 17 | 0 |
| L | 98 | 13.4 | 50.1 | 44 | 1 | 457 | 19.7 | 60.4 | 205 | 12 | 89 | 11.9 | 49.0 | 46 | 1 |
| M | 31 | 4.3 | 52.5 | 20 | 2 | 140 | 6.1 | 38.9 | 134 | 3 | 37 | 5.0 | 34.1 | 36 | 0 |
| N | 49 | 6.7 | 48.4 | 49 | 0 | 65 | 2.8 | 47.7 | 59 | 0 | 50 | 6.7 | 44.0 | 70 | 0 |
| P | 61 | 8.4 | 70.9 | 31 | 0 | 90 | 3.9 | 65.1 | 41 | 2 | 68 | 9.1 | 65.2 | 41 | 0 |
| Q | 24 | 3.3 | 55.9 | 8 | 0 | 35 | 1.5 | 63.2 | 11 | 1 | 31 | 4.2 | 54.1 | 20 | 0 |
| R | 10 | 1.4 | 86.6 | 2 | 0 | 33 | 1.4 | 80.5 | 14 | 1 | 20 | 2.7 | 82.6 | 13 | 3 |
| S | 58 | 8.0 | 39.6 | 55 | 0 | 159 | 6.9 | 53.1 | 117 | 2 | 57 | 7.6 | 42.2 | 61 | 2 |
| T | 82 | 11.2 | 36.3 | 73 | 0 | 207 | 8.9 | 40.6 | 201 | 0 | 62 | 8.3 | 33.4 | 79 | 0 |
| V | 33 | 4.5 | 49.7 | 32 | 0 | 114 | 4.9 | 50.7 | 147 | 3 | 20 | 2.7 | 42.0 | 18 | 0 |
| W | 20 | 2.7 | 89.0 | 1 | 0 | 64 | 2.8 | 79.9 | 19 | 1 | 20 | 2.7 | 71.5 | 7 | 0 |
| Y | 27 | 3.7 | 67.1 | 17 | 1 | 75 | 3.2 | 64.9 | 36 | 0 | 33 | 4.4 | 43.2 | 31 | 1 |
| Total | 729 | | | 506 | 4 | 2314 | | | 1714 | 42 | 746 | | | 615 | 11 |
IM, TM and M denote the intermembrane, transmembrane and matrix domains, respectively. N is the total number of positions with this amino acid; tM and D are the numbers of total and damaging mutations, respectively
Comparison of predictors on the validation dataset of 1,100 mutations (23 damaging + 1,077 neutral)
| | MITOCLASS.1 | POLYPHEN-2 | PROVEAN | MUTPRED |
|---|---|---|---|---|
| Sensitivity | 95.7 | 91.3/94.7 | 91.3/87.7 | 60.9/57.9 |
| Specificity | 58.7 | 47.7/46.9 | 60.4/59.2 | 85.6/87.3 |
| TP | 22 | 21/54 | 21/50 | 14/33 |
| TN | 623 | 514/1303 | 650/1646 | 922/2426 |
| FP | 454 | 563/1475 | 427/1132 | 155/352 |
| FN | 1 | 2/3 | 2/7 | 9/24 |
For PolyPhen-2, Provean and Mutpred, the complete mdmv.1 dataset has also been analyzed (numbers after the slash). PolyPhen-2 is unable to predict the phenotype of 10 missense mutations because much of the initial and final sequence of p.MT-ND5 is non-alignable due to large stretches of repeats and/or high compositional biases, as noted by its authors. For the sake of comparison, we consider these unknown predictions as neutral variants. TP, TN, FP and FN refer to true positives, true negatives, false positives and false negatives, respectively. Sensitivity is estimated as [TP/(TP + FN)] and specificity as [TN/(TN + FP)]
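The two formulas in the note above can be checked against the table with a few lines of Python. This is only an illustrative sketch: the `ConfusionCounts` class and its method names are hypothetical helpers introduced here, not part of any predictor, and the counts are the validation-dataset values for Provean and MutPred copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the TP/TN/FP/FN counts of one predictor."""
    tp: int
    tn: int
    fp: int
    fn: int

    def sensitivity(self) -> float:
        # Sensitivity = TP / (TP + FN), expressed as a percentage
        return 100.0 * self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Specificity = TN / (TN + FP), expressed as a percentage
        return 100.0 * self.tn / (self.tn + self.fp)


# Validation-dataset counts (values before the slash in the table)
provean = ConfusionCounts(tp=21, tn=650, fp=427, fn=2)
mutpred = ConfusionCounts(tp=14, tn=922, fp=155, fn=9)

for name, counts in (("PROVEAN", provean), ("MUTPRED", mutpred)):
    print(f"{name}: sensitivity={counts.sensitivity():.1f} "
          f"specificity={counts.specificity():.1f}")
```

For both predictors, TP + FN equals the 23 damaging mutations and TN + FP equals the 1,077 neutral ones, so the counts are consistent with the dataset size given in the table title.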
Analysis of features for false negative (FN) predictions of Provean, PolyPhen-2 and Mitoclass.1 in validation dataset
| AA substitution (polypeptide) | FN | CI | F1 | F2 | F3 |
|---|---|---|---|---|---|
| p.A132T (p.MT-ND1) | PolyPhen-2 | 72.82 | 123.98 | 0.30 | 12.59 |
| p.L289M (p.MT-ND1) | Provean | 29.72 | 78.64 | 1.99 | 16.07 |
| p.S34P (p.MT-ND3) | PolyPhen-2 and Provean | 10.15 | 60.53 | 0.34 | 5.23 |
| p.V65A (p.MT-ND4L) | Mitoclass.1 | 24.72 | 103 | 35.06 | 6.78 |
CI refers to the conservation index for each position. F1, F2 and F3 refer to the numerical values of the three attributes considered by the Mitoclass.1 classifier
Feature values for rare missense mutations without clear evidence of pathogenicity that Mitoclass.1 classifies as damaging
| rCRS Mut | AA subs/PP/Dom | F1 | F2 | F3 | Freq | Ho/He | DamPre |
|---|---|---|---|---|---|---|---|
| m.4633C > G | p.A55G/p.MT-ND2/TM | 127.5 | 2.33 | 12.66 | 0 | Ho | 4 |
| m.4648T > C | p.F60S/p.MT-ND2/TM | 113.7 | 0.02 | 4.22 | 0 | Ho | 4 |
| m.5244G > A | p.G259S/p.MT-ND2/TM | 138.9 | 1.01 | 23.65 | 0 | He | 4 |
| m.6742T > C | p.I280T/p.MT-CO1/TM | 99.7 | 0.02 | 9.22 | 0 | He | 3 |
| m.8528T > C | p.W55R/p.MT-ATP8/M | 102.6 | 0.04 | 14.99 | 0 | He | 4 |
| m.8795A > G | p.H90R/p.MT-ATP6/TM | 73.8 | 0 | 0.66 | 0 | He | 4 |
| m.9972A > C | p.I256L/p.MT-CO3/IM | 112.3 | 0.97 | 25.85 | 1 | He | 1 |
| m.10543A > G | p.H25R/p.MT-ND4L/TM | 150.5 | 2.57 | 0.66 | 0 | He | 4 |
| m.10591T > G | p.F41C/p.MT-ND4L/TM | 134.9 | 0.02 | 1.08 | 0 | He | 3 |
| m.12848C > T | p.A171V/p.MT-ND5/TM | 164.3 | 0.16 | 8.93 | 0 | He | 3 |
| m.13051G > A | p.G239S/p.MT-ND5/TM | 163.9 | 0.02 | 23.65 | 0 | Ho | 4 |
| m.13511A > T | p.K392M/p.MT-ND5/TM | 98.9 | 0.02 | 3.24 | 0 | He | 4 |
| m.13849A > C | p.N505H/p.MT-ND5/TM | 61.8 | 0.27 | 5.24 | 0 | Ho | 2 |
| m.14430A > G | p.W82R/p.MT-ND6/M | 99.2 | 0.19 | 14.99 | 0 | Ho | 3 |
| m.14498T > C | p.Y59C/p.MT-ND6/TM | 143.9 | 0.04 | 2.97 | 0 | He | 3 |
| m.15243G > A | p.G166E/p.MT-CYB/IM | 115.6 | 0 | 11.92 | 0 | He | 4 |
| Mean | | 118.8 | 0.48 | 10.25 | | | |
| Mean of neutral variants from validation dataset | | 80.1 | 7.90 | 12.70 | | | |
rCRS Mut, AA subs, PP, Dom, F1, F2, F3, Freq, Ho/He and DamPre denote the position of the mutation according to the revised Cambridge Reference Sequence, amino acid substitution, polypeptide, domain, Feature 1–3 scores, frequency, homoplasmy/heteroplasmy, and the number of predictors that consider this amino acid substitution damaging, respectively
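The "Mean" row of the table above can be reproduced from the 16 per-variant feature values. The sketch below is illustrative only: the `features` list and variable names are hypothetical, and the values are the F1, F2 and F3 columns copied from the table in row order.

```python
from statistics import mean

# (F1, F2, F3) for the 16 rare variants, in table order
features = [
    (127.5, 2.33, 12.66), (113.7, 0.02, 4.22), (138.9, 1.01, 23.65),
    (99.7, 0.02, 9.22), (102.6, 0.04, 14.99), (73.8, 0.0, 0.66),
    (112.3, 0.97, 25.85), (150.5, 2.57, 0.66), (134.9, 0.02, 1.08),
    (164.3, 0.16, 8.93), (163.9, 0.02, 23.65), (98.9, 0.02, 3.24),
    (61.8, 0.27, 5.24), (99.2, 0.19, 14.99), (143.9, 0.04, 2.97),
    (115.6, 0.0, 11.92),
]

# Transpose rows into one column per feature, then average each column
mean_f1, mean_f2, mean_f3 = (mean(column) for column in zip(*features))

print(round(mean_f1, 1), round(mean_f2, 2), round(mean_f3, 2))
```

Compared with the neutral-variant means listed in the last row (80.1, 7.90 and 12.70), these variants have a higher mean F1 and lower mean F2 and F3.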
Percentage of confirmed and predicted pathologic mutations per polypeptide/complex (A) or domain (B)
A

| Complex | Polypeptide | AA | % | MUT (confirmed) | % (confirmed) | MUT (predicted) | % (predicted) |
|---|---|---|---|---|---|---|---|
| CI | | 2214 | 55.8 | 36 | 63.2 | 7190 | 47.8 |
| | p.MT-ND1 | 318 | 8.4 | 15 | 26.3 | 1300 | 8.6 |
| | p.MT-ND2 | 347 | 9.2 | 1 | 1.8 | 1032 | 6.9 |
| | p.MT-ND3 | 115 | 3.0 | 2 | 3.5 | 420 | 2.8 |
| | p.MT-ND4 | 459 | 12.1 | 3 | 5.3 | 1689 | 11.2 |
| | p.MT-ND4L | 98 | 2.6 | 1 | 1.8 | 377 | 2.5 |
| | p.MT-ND5 | 603 | 15.9 | 7 | 12.3 | 2008 | 13.3 |
| | p.MT-ND6 | 174 | 4.6 | 7 | 12.3 | 364 | 2.4 |
| CIII | | 380 | 10.0 | 2 | 3.5 | 1721 | 11.4 |
| | p.MT-CYB | 380 | 10.0 | 2 | 3.5 | 1721 | 11.4 |
| CIV | | 1001 | 26.4 | 4 | 7.0 | 5146 | 34.2 |
| | p.MT-CO1 | 513 | 13.5 | 1 | 1.8 | 2803 | 18.6 |
| | p.MT-CO2 | 227 | 6.0 | 2 | 3.5 | 1021 | 6.8 |
| | p.MT-CO3 | 261 | 6.9 | 1 | 1.8 | 1322 | 8.8 |
| CV | | 294 | 7.8 | 15 | 26.3 | 992 | 6.6 |
| | p.MT-ATP6 | 226 | 6.0 | 15 | 26.3 | 876 | 5.8 |
| | p.MT-ATP8 | 68 | 1.8 | 0 | 0 | 116 | 0.8 |

B

| Domain | AA | % | MUT (confirmed) | % (confirmed) | MUT (predicted) | % (predicted) |
|---|---|---|---|---|---|---|
| Total | 3889 | 100 | 57 | 100 | 15049 | 100 |
| IM | 747 | 19.2 | 4 | 7.0 | 2883 | 19.1 |
| TM | 2376 | 61.1 | 42 | 73.7 | 9294 | 61.8 |
| M | 766 | 19.7 | 11 | 19.3 | 2872 | 19.1 |
Complex and polypeptide refer to OXPHOS complexes and mtDNA-encoded polypeptides. AA and the % that follows it give the number of amino acids and their percentage in a particular polypeptide or domain; MUT and the % that follows it give the number of damaging mutations and their percentage in a particular polypeptide or domain. Confirmed and predicted refer to confirmed damaging mutations and to damaging mutations predicted by Mitoclass.1. IM, TM and M denote the intermembrane, transmembrane and matrix domains, respectively