Tommaso Orioli, Mauno Vihinen.
Abstract
BACKGROUND: Membrane proteins constitute up to 30% of the human proteome. These proteins have special properties because the transmembrane segments are embedded in the lipid bilayer while the extramembranous parts are in different environments. Membrane proteins have several functions and are involved in numerous diseases. A large number of prediction methods have been introduced to predict protein subcellular localization as well as the tolerance or pathogenicity of amino acid substitutions.
Keywords: Benchmark; Benchmarking; Disease-causing variant; Membrane protein; Method performance; Mutation; Variation interpretation
Year: 2019 PMID: 31307390 PMCID: PMC6631444 DOI: 10.1186/s12864-019-5865-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Numbers of proteins in test sets
| Method | MP1289 positive | MP1289 negative | MP508 positive | MP508 negative | mpHTP positive | mpHTP negative |
|---|---|---|---|---|---|---|
| BUSCA | 1040 | 986 | 393 | 361 | 4550 | 4362 |
| CELLO | 1223 | 1180 | 474 | 449 | 5045 | 5194 |
| DeepLoc1.0 | 1285 | 931 | 505 | 337 | 4669 | 4629 |
| LocTree3 | 1286 | 1270 | 507 | 497 | 5358 | 5289 |
| MultiLoc2 | 1106 | 1193 | 408 | 464 | 4654 | 4999 |
| SubCons | 852 | 744 | 309 | 292 | 4055 | 3882 |
| Wolf PSORT | 1289 | 1288 | 508 | 507 | 5414 | 5362 |
Subcellular localization predictors
| Method | Description |
|---|---|
| BUSCA | Metapredictor for localization-related protein features |
| CELLO | Two-layer SVM |
| DeepLoc | Deep neural network |
| LocTree3 | SVM |
| MultiLoc2 | SVM |
| SubCons | RF |
| Wolf PSORT | Converts amino acid sequences into numerical vectors that are grouped with a weighted k-nearest neighbor classifier |
Performance of subcellular localization predictors on MP1289
| Measure | BUSCA | CELLO | DeepLoc1.0 | LocTree3 | MultiLoc2 | SubCons | Wolf PSORT |
|---|---|---|---|---|---|---|---|
| TP | 316 | 348 | 447 | 344 | 108 | 242 | 516 |
| FP | 112 | 158 | 102 | 75 | 50 | 42 | 218 |
| TN | 763 | 1073 | 987 | 1203 | 1144 | 1152 | 1071 |
| FN | 724 | 875 | 838 | 942 | 998 | 610 | 773 |
| Sensitivity | 0.36 | 0.28 | 0.41 | 0.27 | 0.09 | 0.2 | 0.4 |
| Specificity | 0.87 | 0.87 | 0.91 | 0.94 | 0.96 | 0.96 | 0.83 |
| PPV | 0.74 | 0.69 | 0.81 | 0.82 | 0.68 | 0.85 | 0.7 |
| NPV | 0.51 | 0.55 | 0.54 | 0.56 | 0.53 | 0.65 | 0.58 |
| ACC | 0.56 | 0.58 | 0.6 | 0.6 | 0.54 | 0.68 | 0.62 |
| MCC | 0.21 | 0.19 | 0.3 | 0.28 | 0.11 | 0.35 | 0.26 |
| OPM | 0.23 | 0.21 | 0.28 | 0.26 | 0.18 | 0.3 | 0.25 |
Performance of subcellular localization predictors on MP508
| Measure | BUSCA | CELLO | DeepLoc1.0 | LocTree3 | MultiLoc2 | SubCons | Wolf PSORT |
|---|---|---|---|---|---|---|---|
| TP | 126 | 136 | 218 | 166 | 49 | 110 | 227 |
| FP | 18 | 28 | 20 | 10 | 9 | 6 | 40 |
| TN | 282 | 438 | 350 | 493 | 437 | 440 | 468 |
| FN | 267 | 338 | 287 | 341 | 359 | 199 | 281 |
| Sensitivity | 0.42 | 0.29 | 0.59 | 0.33 | 0.11 | 0.25 | 0.45 |
| Specificity | 0.94 | 0.94 | 0.95 | 0.98 | 0.98 | 0.99 | 0.92 |
| PPV | 0.88 | 0.83 | 0.92 | 0.94 | 0.84 | 0.95 | 0.85 |
| NPV | 0.51 | 0.56 | 0.55 | 0.59 | 0.55 | 0.69 | 0.62 |
| ACC | 0.59 | 0.61 | 0.65 | 0.65 | 0.57 | 0.73 | 0.68 |
| MCC | 0.32 | 0.30 | 0.42 | 0.41 | 0.20 | 0.47 | 0.42 |
| OPM | 0.29 | 0.27 | 0.38 | 0.34 | 0.22 | 0.37 | 0.35 |
Performance of subcellular localization predictors on mpHTP
| Measure | BUSCA | CELLO | DeepLoc1.0 | LocTree3 | MultiLoc2 | SubCons | Wolf PSORT |
|---|---|---|---|---|---|---|---|
| TP | 3766 | 3101 | 4242 | 3877 | 1187 | 2776 | 4016 |
| FP | 206 | 153 | 257 | 179 | 104 | 103 | 280 |
| TN | 4156 | 5041 | 4372 | 5110 | 4895 | 3779 | 5082 |
| FN | 784 | 1944 | 427 | 1481 | 3467 | 1279 | 1398 |
| Sensitivity | 0.86 | 0.60 | 0.93 | 0.73 | 0.24 | 0.71 | 0.75 |
| Specificity | 0.95 | 0.97 | 0.94 | 0.97 | 0.98 | 0.97 | 0.95 |
| PPV | 0.95 | 0.95 | 0.94 | 0.96 | 0.92 | 0.96 | 0.93 |
| NPV | 0.84 | 0.72 | 0.91 | 0.78 | 0.58 | 0.75 | 0.78 |
| ACC | 0.89 | 0.79 | 0.93 | 0.84 | 0.63 | 0.83 | 0.84 |
| MCC | 0.78 | 0.63 | 0.85 | 0.71 | 0.34 | 0.68 | 0.70 |
| OPM | 0.72 | 0.53 | 0.80 | 0.62 | 0.30 | 0.60 | 0.62 |
Overall performance of tolerance predictors on MP variants
| Method | TP | FP | TN | FN | Sensitivity | Specificity | PPV | NPV | ACC | MCC | OPM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CADD | 725 | 670 | 641 | 18 | 0.98 | 0.49 | 0.66 | 0.95 | 0.73 | 0.53 | 0.44 |
| DANN | 737 | 1006 | 305 | 6 | 0.99 | 0.23 | 0.56 | 0.97 | 0.61 | 0.35 | 0.30 |
| Eigen | 582 | 253 | 1058 | 161 | 0.78 | 0.81 | 0.80 | 0.79 | 0.80 | 0.59 | 0.50 |
| Eigen-PC | 580 | 266 | 1045 | 163 | 0.78 | 0.80 | 0.79 | 0.78 | 0.79 | 0.58 | 0.49 |
| FATHMM | 556 | 230 | 1053 | 183 | 0.75 | 0.82 | 0.81 | 0.77 | 0.79 | 0.57 | 0.49 |
| FATHMM-MKL | 705 | 434 | 877 | 38 | 0.95 | 0.67 | 0.74 | 0.93 | 0.81 | 0.64 | 0.55 |
| FitCons | 423 | 571 | 740 | 320 | 0.57 | 0.56 | 0.57 | 0.57 | 0.57 | 0.13 | 0.18 |
| GenoCanyon | 605 | 313 | 998 | 138 | 0.81 | 0.76 | 0.77 | 0.80 | 0.79 | 0.58 | 0.49 |
| LRT | 625 | 205 | 904 | 91 | 0.87 | 0.82 | 0.83 | 0.87 | 0.84 | 0.69 | 0.60 |
| M-CAP | 707 | 104 | 152 | 13 | 0.98 | 0.59 | 0.71 | 0.97 | 0.79 | 0.62 | 0.53 |
| MetaLR | 615 | 76 | 1235 | 128 | 0.83 | 0.94 | 0.93 | 0.85 | 0.88 | 0.77 | 0.70 |
| MetaSVM | 629 | 59 | 1252 | 114 | 0.85 | 0.95 | 0.95 | 0.86 | 0.90 | 0.81 | 0.74 |
| MutationAssessor | 278 | 36 | 999 | 96 | 0.74 | 0.97 | 0.96 | 0.79 | 0.85 | 0.73 | 0.64 |
| MutationTaster2 | 694 | 251 | 1060 | 49 | 0.93 | 0.81 | 0.83 | 0.92 | 0.87 | 0.75 | 0.67 |
| MutPred | 624 | 62 | 1240 | 117 | 0.84 | 0.95 | 0.95 | 0.86 | 0.90 | 0.80 | 0.73 |
| PolyPhen HDIV | 668 | 213 | 925 | 37 | 0.95 | 0.81 | 0.84 | 0.94 | 0.88 | 0.77 | 0.69 |
| PolyPhen HVAR | 633 | 151 | 1046 | 53 | 0.92 | 0.87 | 0.88 | 0.92 | 0.90 | 0.80 | 0.73 |
| PON-P2 | 699 | 76 | 1235 | 48 | 0.94 | 0.94 | 0.90 | 0.96 | 0.94 | 0.87 | 0.82 |
| PROVEAN | 646 | 263 | 1040 | 92 | 0.88 | 0.80 | 0.71 | 0.92 | 0.83 | 0.65 | 0.56 |
| REVEL | 650 | 56 | 1255 | 93 | 0.87 | 0.96 | 0.95 | 0.88 | 0.92 | 0.83 | 0.77 |
| SIFT | 659 | 330 | 979 | 84 | 0.89 | 0.75 | 0.78 | 0.87 | 0.82 | 0.64 | 0.55 |
| VEST3 | 641 | 79 | 1232 | 102 | 0.86 | 0.94 | 0.93 | 0.87 | 0.90 | 0.80 | 0.73 |
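The measures in the table above can be derived from the four confusion-matrix counts. The following is a minimal sketch, not the authors' code. Two points are assumptions that the text here does not state: the OPM formula used, (PPV + NPV)(sensitivity + specificity)(ACC + nMCC) / 8 with nMCC = (1 + MCC) / 2, and the optional rescaling of the negative class to the size of the positive class. The function name `performance` and its `normalize` flag are hypothetical. With normalization enabled, the sketch reproduces the MetaSVM row to two decimals; other rows may have been computed differently.

```python
from math import sqrt


def performance(tp, fp, tn, fn, normalize=False):
    """Compute binary-classification measures from confusion-matrix counts."""
    if normalize:
        # Assumption: rescale negatives so both classes have equal size.
        scale = (tp + fn) / (tn + fp)
        fp, tn = fp * scale, tn * scale

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Assumed OPM definition: MCC is first mapped from [-1, 1] to [0, 1].
    nmcc = (1 + mcc) / 2
    opm = (ppv + npv) * (sensitivity + specificity) * (acc + nmcc) / 8

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "acc": acc,
        "mcc": mcc,
        "opm": opm,
    }


# MetaSVM counts taken from the table (TP, FP, TN, FN).
metasvm = performance(629, 59, 1252, 114, normalize=True)
for name, value in metasvm.items():
    print(f"{name}: {value:.2f}")
```

The sketch does not guard against empty classes; a method with no positive or no negative predictions would need explicit handling of the zero denominators.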
Fig. 1 Visualisation of six performance measures for tolerance predictors. The methods are organized according to their increasing performance for each of the measures
Performance of tolerance predictors by membrane protein region
| Method | Sensitivity (TM) | Sensitivity (I) | Sensitivity (O) | Specificity (TM) | Specificity (I) | Specificity (O) | PPV (TM) | PPV (I) | PPV (O) | NPV (TM) | NPV (I) | NPV (O) | ACC (TM) | ACC (I) | ACC (O) | MCC (TM) | MCC (I) | MCC (O) | OPM (TM) | OPM (I) | OPM (O) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CADD | 0.99 | 0.96 | 0.98 | 0.46 | 0.50 | 0.49 | 0.65 | 0.66 | 0.66 | 0.99 | 0.93 | 0.96 | 0.73 | 0.73 | 0.73 | 0.54 | 0.52 | 0.54 | 0.45 | 0.43 | 0.45 |
| DANN | 1.00 | 0.99 | 0.99 | 0.20 | 0.24 | 0.24 | 0.56 | 0.57 | 0.56 | 1.00 | 0.97 | 0.95 | 0.60 | 0.62 | 0.61 | 0.33 | 0.35 | 0.34 | 0.30 | 0.31 | 0.30 |
| Eigen | 0.78 | 0.89 | 0.70 | 0.73 | 0.83 | 0.82 | 0.74 | 0.84 | 0.79 | 0.77 | 0.88 | 0.73 | 0.76 | 0.86 | 0.76 | 0.51 | 0.71 | 0.52 | 0.43 | 0.63 | 0.44 |
| Eigen-PC | 0.78 | 0.88 | 0.70 | 0.74 | 0.80 | 0.81 | 0.75 | 0.82 | 0.79 | 0.77 | 0.87 | 0.73 | 0.76 | 0.84 | 0.76 | 0.52 | 0.69 | 0.51 | 0.44 | 0.60 | 0.43 |
| FATHMM | 0.63 | 0.76 | 0.81 | 0.86 | 0.78 | 0.84 | 0.82 | 0.77 | 0.84 | 0.70 | 0.76 | 0.82 | 0.75 | 0.77 | 0.83 | 0.51 | 0.54 | 0.66 | 0.43 | 0.45 | 0.57 |
| FATHMM-MKL | 0.97 | 0.93 | 0.96 | 0.63 | 0.67 | 0.68 | 0.73 | 0.74 | 0.75 | 0.95 | 0.90 | 0.94 | 0.80 | 0.80 | 0.82 | 0.64 | 0.62 | 0.66 | 0.54 | 0.53 | 0.57 |
| FitCons | 0.44 | 0.69 | 0.55 | 0.72 | 0.55 | 0.53 | 0.61 | 0.60 | 0.54 | 0.56 | 0.64 | 0.54 | 0.58 | 0.62 | 0.54 | 0.17 | 0.24 | 0.08 | 0.20 | 0.24 | 0.16 |
| GenoCanyon | 0.78 | 0.84 | 0.81 | 0.72 | 0.77 | 0.77 | 0.74 | 0.78 | 0.78 | 0.76 | 0.83 | 0.80 | 0.75 | 0.81 | 0.79 | 0.50 | 0.61 | 0.58 | 0.42 | 0.52 | 0.49 |
| LRT | 0.94 | 0.85 | 0.85 | 0.75 | 0.82 | 0.83 | 0.79 | 0.83 | 0.83 | 0.93 | 0.85 | 0.85 | 0.85 | 0.84 | 0.84 | 0.70 | 0.67 | 0.68 | 0.62 | 0.59 | 0.59 |
| M-CAP | 0.99 | 0.98 | 0.98 | 0.55 | 0.57 | 0.64 | 0.69 | 0.69 | 0.73 | 0.98 | 0.96 | 0.97 | 0.77 | 0.77 | 0.81 | 0.60 | 0.60 | 0.66 | 0.50 | 0.50 | 0.56 |
| MetaLR | 0.81 | 0.81 | 0.85 | 0.92 | 0.93 | 0.96 | 0.91 | 0.92 | 0.95 | 0.83 | 0.83 | 0.87 | 0.86 | 0.87 | 0.91 | 0.73 | 0.74 | 0.82 | 0.65 | 0.66 | 0.75 |
| MetaSVM | 0.85 | 0.83 | 0.86 | 0.93 | 0.95 | 0.97 | 0.92 | 0.94 | 0.97 | 0.86 | 0.85 | 0.87 | 0.89 | 0.89 | 0.91 | 0.78 | 0.78 | 0.83 | 0.70 | 0.71 | 0.77 |
| MutationAssessor | 0.89 | 0.60 | 0.78 | 0.92 | 0.98 | 0.97 | 0.91 | 0.97 | 0.96 | 0.89 | 0.71 | 0.82 | 0.90 | 0.79 | 0.88 | 0.80 | 0.63 | 0.77 | 0.73 | 0.53 | 0.69 |
| MutationTaster2 | 0.99 | 0.92 | 0.91 | 0.78 | 0.81 | 0.82 | 0.82 | 0.83 | 0.83 | 0.99 | 0.91 | 0.90 | 0.89 | 0.87 | 0.86 | 0.79 | 0.74 | 0.73 | 0.71 | 0.65 | 0.65 |
| MutPred | 0.87 | 0.78 | 0.88 | 0.91 | 0.97 | 0.95 | 0.91 | 0.96 | 0.95 | 0.87 | 0.81 | 0.89 | 0.89 | 0.87 | 0.92 | 0.78 | 0.76 | 0.83 | 0.70 | 0.68 | 0.77 |
| PolyPhen HDIV | 0.96 | 0.94 | 0.94 | 0.70 | 0.83 | 0.83 | 0.76 | 0.85 | 0.85 | 0.95 | 0.93 | 0.94 | 0.83 | 0.89 | 0.89 | 0.69 | 0.78 | 0.78 | 0.59 | 0.70 | 0.70 |
| PolyPhen HVAR | 0.95 | 0.90 | 0.92 | 0.75 | 0.90 | 0.89 | 0.80 | 0.90 | 0.89 | 0.94 | 0.90 | 0.92 | 0.85 | 0.90 | 0.91 | 0.72 | 0.80 | 0.81 | 0.63 | 0.73 | 0.74 |
| PON-P2 | 0.94 | 0.93 | 0.94 | 0.90 | 0.94 | 0.96 | 0.90 | 0.88 | 0.92 | 0.94 | 0.96 | 0.97 | 0.92 | 0.93 | 0.95 | 0.85 | 0.85 | 0.89 | 0.78 | 0.80 | 0.85 |
| PROVEAN | 0.91 | 0.83 | 0.89 | 0.70 | 0.81 | 0.81 | 0.75 | 0.82 | 0.83 | 0.89 | 0.83 | 0.88 | 0.81 | 0.82 | 0.85 | 0.63 | 0.65 | 0.71 | 0.54 | 0.56 | 0.62 |
| REVEL | 0.90 | 0.85 | 0.88 | 0.93 | 0.96 | 0.97 | 0.93 | 0.95 | 0.96 | 0.91 | 0.86 | 0.89 | 0.92 | 0.90 | 0.92 | 0.84 | 0.81 | 0.85 | 0.77 | 0.74 | 0.79 |
| SIFT | 0.94 | 0.86 | 0.88 | 0.65 | 0.74 | 0.78 | 0.73 | 0.77 | 0.80 | 0.91 | 0.84 | 0.87 | 0.80 | 0.80 | 0.83 | 0.62 | 0.60 | 0.67 | 0.52 | 0.52 | 0.58 |
| VEST3 | 0.95 | 0.86 | 0.82 | 0.90 | 0.94 | 0.95 | 0.90 | 0.94 | 0.94 | 0.95 | 0.87 | 0.84 | 0.92 | 0.90 | 0.88 | 0.85 | 0.81 | 0.77 | 0.79 | 0.74 | 0.70 |
Abbreviations: I, inside the membrane; O, outside the membrane; TM, transmembrane
Fig. 2 Visualization of the performance of tolerance predictors for MP variants. The graphs indicate the performance of each measure as well as how balanced the methods are. Good predictors are balanced and predict both positive and negative cases equally well
Fig. 3 Venn diagram for the congruence of the five best performing tools. The results are shown for the 2002 variants that all the tools predicted. All five methods correctly predicted 1508 (75.3%) of the variants
Statistics for predicted variants in human membrane proteome
| Statistic | Total | Outer | Transmembrane | Inner |
|---|---|---|---|---|
| Number of proteins | 5422 | – | – | – |
| Number of predicted proteins | 5070 | – | – | – |
| Predicted proteins (%) | 93.51 | – | – | – |
| Number of amino acids | 2,983,503 | 1,458,196 | 456,186 | 1,069,121 |
| Number of predicted amino acids | 2,850,519 | 1,367,843 | 434,745 | 1,047,931 |
| Predicted amino acids (%) | 95.54 | 93.80 | 95.30 | 98.02 |
| Number of possible variants in all proteins/region | 56,686,557 | 27,705,724 | 8,667,534 | 20,313,299 |
| Number of possible variants in predicted proteins/region | 54,159,861 | 25,989,017 | 8,260,155 | 19,910,689 |
| Number of predicted variants | 53,310,412 | 25,558,804 | 8,169,606 | 19,581,983 |
| Predicted variants (% of possible) | 98.43 | 98.34 | 98.90 | 98.35 |
| Number of variants predicted as neutral | 21,343,305 | 10,529,555 | 3,141,354 | 7,672,377 |
| Neutral variants (%) | 40.04 | 41.20 | 38.45 | 39.18 |
| Average number of neutral variants per protein | 4197.31 | 2070.71 | 617.77 | 1508.83 |
| Median number of neutral variants per protein | 2471 | 727 | 234 | 781 |
| Number of variants predicted as pathogenic | 9,760,571 | 4,367,037 | 1,702,467 | 3,691,067 |
| Pathogenic variants (%) | 18.31 | 17.09 | 20.84 | 18.8 |
| Average number of pathogenic variants per protein | 1919.48 | 858.81 | 334.80 | 725.87 |
| Median number of pathogenic variants per protein | 173 | 19 | 2 | 20 |
| Number of variants predicted as unknown | 22,206,536 | 10,662,212 | 3,325,785 | 8,218,539 |
| Unknown variants (%) | 41.66 | 41.72 | 40.71 | 41.97 |
| Average number of unknown variants per protein | 4367.07 | 2096.80 | 654.04 | 1616.23 |
| Median number of unknown variants per protein | 3013 | 714 | 290 | 616 |
| Ratio of pathogenic and neutral variants | 0.46 | 0.41 | 0.54 | 0.48 |
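The percentage and ratio rows of this table follow directly from the variant counts. The sketch below recomputes them as a consistency check. The counts are copied from the table; the names `COUNTS` and `summarize` are hypothetical and only serve the illustration.

```python
# Per region: (predicted variants, neutral, pathogenic, unknown), from the table.
COUNTS = {
    "Total": (53_310_412, 21_343_305, 9_760_571, 22_206_536),
    "Outer": (25_558_804, 10_529_555, 4_367_037, 10_662_212),
    "Transmembrane": (8_169_606, 3_141_354, 1_702_467, 3_325_785),
    "Inner": (19_581_983, 7_672_377, 3_691_067, 8_218_539),
}


def summarize(predicted, neutral, pathogenic, unknown):
    """Return class percentages and the pathogenic-to-neutral ratio."""
    if neutral + pathogenic + unknown != predicted:
        raise ValueError("class counts do not add up to the predicted total")
    return {
        "neutral_pct": 100 * neutral / predicted,
        "pathogenic_pct": 100 * pathogenic / predicted,
        "unknown_pct": 100 * unknown / predicted,
        "pathogenic_to_neutral": pathogenic / neutral,
    }


summary = {region: summarize(*counts) for region, counts in COUNTS.items()}
for region, stats in summary.items():
    print(region, {key: round(value, 2) for key, value in stats.items()})
```

In each region the three predicted classes sum exactly to the number of predicted variants, so `summarize` raises no error for any of the four columns.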