Ilia Korvigo, Andrey Afanasyev, Nikolay Romashchenko, Mikhail Skoblov.
Abstract
Many automatic classifiers have been introduced to aid inference of the phenotypic effects of uncategorised nsSNVs (nonsynonymous Single Nucleotide Variations) in theoretical and medical applications. Lately, several meta-estimators have been proposed that combine different predictors, such as PolyPhen and SIFT, to integrate more information into a single score. Although many advances have been made in feature design and in the machine learning algorithms used, the shortage of high-quality reference data, along with the bias towards intensively studied in vitro models, calls for improved generalisation ability in order to further increase classification accuracy and handle records with insufficient data. Since a meta-estimator basically combines different scoring systems with highly complicated nonlinear relationships, we investigated how deep learning (supervised and unsupervised), which is particularly efficient at discovering hierarchies of features, can improve classification performance. While it is believed that one should only use deep learning for high-dimensional input spaces and other models (logistic regression, support vector machines, Bayesian classifiers, etc.) for simpler inputs, we still believe that the ability of neural networks to discover intricate structure in highly heterogeneous datasets can aid a meta-estimator. We compare the performance with various popular predictors, many of which are recommended by the American College of Medical Genetics and Genomics (ACMG), as well as available deep learning-based predictors. Thanks to hardware acceleration, we were able to use a computationally expensive genetic algorithm to stochastically optimise hyper-parameters over many generations. Overfitting was hindered by noise injection and dropout, limiting coadaptation of hidden units.
Although we stress that this work was not conceived as a tool comparison, but rather as an exploration of the possibilities of applying deep learning in ensemble scores, our results show that even relatively simple modern neural networks can significantly improve both prediction accuracy and coverage. We provide open access to our best model via the website: http://score.generesearch.ru/services/badmut/.
Year: 2018 PMID: 29538399 PMCID: PMC5851551 DOI: 10.1371/journal.pone.0192829
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Prediction inconsistency.
A heatmap of Spearman correlation between rank-transformed output values of different deleteriousness scoring systems. 1000F—allele frequency according to the 1000 Genomes project. Greater absolute correlation means greater consistency.
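The quantity behind such a heatmap can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: it rank-transforms each score (ties share their mean rank) and computes the Pearson correlation of the ranks, which is the Spearman correlation. The score names and values in `toy` are hypothetical.

```python
from math import sqrt

def rank(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson(rank(x), rank(y))

def correlation_matrix(scores):
    """Pairwise Spearman correlation between named score vectors."""
    names = list(scores)
    return {a: {b: spearman(scores[a], scores[b]) for b in names} for a in names}

# Hypothetical toy scores for five variants (not data from the study).
toy = {
    "score_a": [0.1, 0.4, 0.35, 0.8, 0.9],
    "score_b": [0.2, 0.5, 0.3, 0.7, 0.95],
    "score_c": [0.9, 0.6, 0.7, 0.2, 0.1],
}
matrix = correlation_matrix(toy)
```

In this toy input `score_a` and `score_b` order the variants identically and `score_c` reverses that order, so the absolute correlation is maximal in both cases, which matches the caption's reading that greater absolute correlation means greater consistency.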
The fraction of nsSNVs with no predictions made by popular deleteriousness scores and the MetaLR meta-score.
| Dataset | PolyPhen-2 | SIFT | FATHMM | MutationTaster | MetaLR |
|---|---|---|---|---|---|
| Exome | 0.09 | 0.1 | 0.14 | 0.02 | 0.08 |
| Test I | 0.02 | 0.03 | 0.06 | 0.01 | 0.004 |
| Test II | 0.01 | 0.03 | 0.04 | 0.003 | 0.006 |
*The fractions are estimated by querying a random subset of 1 × 10^6 SNVs from dbNSFP v3.2 [9].
**Our testing datasets I and II (described in the Materials and methods), comprising variations with experimental evidence of phenotype.
Fig 2. Network types.
Schematic representation of basic deep learning models used in this study. (a) A multilayer perceptron (MLP). (b) A shallow denoising autoencoder (dAE). (c) Connecting dAEs into a stacked denoising autoencoder (sdAE); notice that each individual dAE learns to reconstruct the latent representation from the previous one (data stream is represented by arrows). Colours encode layer functions (combinations are possible): blue—input, light-red—latent, dark-red—dropout (noise), purple—output, hollow—discarded.
Fig 3. Nonlinearities.
The sigmoid (a) and hyperbolic tangent (b) iteratively applied 3 times. Observe how repeated application of the sigmoid function quickly makes the gradient vanish completely.
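The effect can be checked numerically with the chain rule. The sketch below is an illustration under simple assumptions (scalar input, no weights or biases between applications, helper names chosen here): it applies a nonlinearity repeatedly and multiplies the local derivatives along the way. Since the sigmoid's derivative never exceeds 0.25, three applications bound the gradient by 0.25^3, while the hyperbolic tangent keeps a derivative of 1 at the origin.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2

def iterate_with_gradient(f, df, x, times):
    """Apply f `times` times; return (value, d value / d x) via the chain rule."""
    grad = 1.0
    for _ in range(times):
        grad *= df(x)  # local derivative at the current input
        x = f(x)
    return x, grad

# Gradients after three applications, starting from the same input.
_, sigmoid_grad = iterate_with_gradient(sigmoid, d_sigmoid, 0.5, 3)
_, tanh_grad = iterate_with_gradient(math.tanh, d_tanh, 0.5, 3)
_, tanh_grad_at_zero = iterate_with_gradient(math.tanh, d_tanh, 0.0, 3)
```

Starting from 0.5, the sigmoid gradient is already more than an order of magnitude smaller than the tanh gradient after three applications.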
ROC curve AUC score with 95% confidence intervals.
| Score | Test I | Test II |
|---|---|---|
| CADD | 0.85 (0.79–0.90) | 0.78 (0.78–0.79) |
| DANN | 0.84 (0.79–0.89) | 0.75 (0.74–0.75) |
| Eigen | 0.86 (0.81–0.91) | 0.67 (0.66–0.68) |
| FATHMM | 0.84 (0.78–0.89) | 0.91 (0.90–0.91) |
| GERP++ | 0.79 (0.73–0.84) | 0.68 (0.67–0.68) |
| LRT | 0.85 (0.80–0.90) | 0.73 (0.72–0.74) |
| MLP | 0.90 (0.86–0.94) | 0.94 (0.94–0.95) |
| MetaLR | 0.90 (0.86–0.94) | 0.93 (0.93–0.94) |
| MetaSVM | 0.90 (0.86–0.94) | 0.92 (0.92–0.93) |
| MutationAssessor | 0.78 (0.71–0.83) | 0.77 (0.77–0.78) |
| MutationTaster | 0.91 (0.87–0.94) | 0.76 (0.75–0.77) |
| PROVEAN | 0.82 (0.77–0.88) | 0.77 (0.77–0.78) |
| PolyPhen HDIV | 0.79 (0.73–0.85) | 0.77 (0.76–0.77) |
| PolyPhen HVAR | 0.80 (0.74–0.86) | 0.79 (0.78–0.79) |
| SIFT | 0.77 (0.71–0.82) | 0.78 (0.77–0.78) |
| SiPhy 29-way | 0.82 (0.76–0.87) | 0.70 (0.70–0.71) |
| phastCons 100-way | 0.81 (0.76–0.86) | 0.69 (0.69–0.70) |
| phyloP 100-way | 0.89 (0.85–0.93) | 0.75 (0.74–0.76) |
| sdAE | 0.92 (0.88–0.95) | 0.92 (0.92–0.93) |
Fig 4. ROC curves.
MLP, MetaLR, MetaSVM, sdAE and MutationTaster produced the largest area under the curve.
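For readers who want to reproduce this kind of summary, the sketch below computes a ROC AUC and a 95% confidence interval. It is a minimal illustration under stated assumptions: the AUC is computed as the probability that a random positive outranks a random negative, and the interval comes from a percentile bootstrap, which is an assumption here because this section does not say how the intervals were obtained. The toy labels and scores are hypothetical.

```python
import random

def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed method, not taken from the source)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lost one of the classes
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1 = deleterious, 0 = neutral.
labels = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.5]
auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores, roc_auc)
```

The pairwise AUC is quadratic in the number of variants, which is fine for a sketch but would be replaced by a rank-based formula on datasets the size of test II.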
Average precision score with 95% confidence intervals.
| Score | Test I | Test II |
|---|---|---|
| CADD | 0.79 (0.70–0.87) | 0.60 (0.58–0.61) |
| DANN | 0.84 (0.77–0.90) | 0.55 (0.53–0.56) |
| Eigen | 0.81 (0.73–0.89) | 0.57 (0.56–0.58) |
| FATHMM | 0.83 (0.75–0.90) | 0.83 (0.82–0.84) |
| GERP++ | 0.77 (0.69–0.85) | 0.44 (0.43–0.45) |
| LRT | 0.87 (0.82–0.92) | 0.64 (0.63–0.65) |
| MLP | 0.91 (0.82–0.94) | 0.89 (0.88–0.91) |
| MetaLR | 0.88 (0.81–0.94) | 0.87 (0.87–0.89) |
| MetaSVM | 0.91 (0.87–0.95) | 0.87 (0.86–0.88) |
| MutationAssessor | 0.78 (0.70–0.85) | 0.67 (0.66–0.68) |
| MutationTaster | 0.91 (0.87–0.95) | 0.67 (0.67–0.68) |
| PROVEAN | 0.76 (0.67–0.85) | 0.61 (0.59–0.62) |
| PolyPhen HDIV | 0.80 (0.73–0.86) | 0.67 (0.66–0.68) |
| PolyPhen HVAR | 0.77 (0.69–0.84) | 0.66 (0.65–0.68) |
| SIFT | 0.79 (0.72–0.85) | 0.67 (0.66–0.68) |
| SiPhy 29-way | 0.75 (0.65–0.84) | 0.47 (0.45–0.48) |
| phastCons 100-way | 0.86 (0.81–0.90) | 0.66 (0.66–0.67) |
| phyloP 100-way | 0.89 (0.83–0.94) | 0.56 (0.54–0.57) |
| sdAE | 0.92 (0.86–0.96) | 0.87 (0.86–0.87) |
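Average precision summarises the precision-recall curve in a single number. The sketch below uses the common step-wise definition (the mean of the precision values at each rank where a true positive is retrieved); whether the study used exactly this variant is an assumption. Tied scores are kept in input order here, which is a simplification.

```python
def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive is found."""
    # Sort by descending score; Python's sort is stable, so ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    true_positives = 0
    total = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            total += true_positives / k  # precision at this recall step
    return total / n_pos

# Hypothetical toy data: positives are retrieved at ranks 1, 3 and 4.
ap = average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.5])
```

Unlike ROC AUC, this measure depends on class balance, which is one reason the two tables can rank the same predictors differently.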
Maximum average values of threshold-sensitive performance measures, evaluated for test II.
Numbers in parentheses represent corresponding cutoffs.
| Score | F1-score | MCC | Accuracy |
|---|---|---|---|
| CADD | 0.64 (0.50) | 0.43 (0.58) | 0.74 (0.73) |
| DANN | 0.60 (0.51) | 0.35 (0.53) | 0.71 (0.85) |
| Eigen | 0.61 (0.55) | 0.43 (0.66) | 0.77 (0.80) |
| FATHMM | 0.79 (0.83) | 0.67 (0.87) | 0.85 (0.88) |
| GERP++ | 0.57 (0.38) | 0.30 (0.38) | 0.67 (0.99) |
| LRT | 0.61 (0.49) | 0.37 (0.51) | 0.71 (0.68) |
| MLP | 0.83 (0.68) | 0.75 (0.69) | 0.89 (0.70) |
| MetaLR | 0.82 (0.83) | 0.74 (0.83) | 0.88 (0.88) |
| MetaSVM | 0.81 (0.81) | 0.72 (0.86) | 0.88 (0.87) |
| MutationAssessor | 0.64 (0.73) | 0.47 (0.81) | 0.78 (0.85) |
| MutationTaster | 0.63 (0.45) | 0.41 (0.47) | 0.72 (0.80) |
| PROVEAN | 0.63 (0.55) | 0.42 (0.60) | 0.74 (0.79) |
| PolyPhen HDIV | 0.63 (0.55) | 0.42 (0.74) | 0.75 (0.88) |
| PolyPhen HVAR | 0.64 (0.59) | 0.44 (0.59) | 0.75 (0.77) |
| SIFT | 0.64 (0.58) | 0.44 (0.66) | 0.76 (0.72) |
| SiPhy 29-way | 0.59 (0.44) | 0.33 (0.46) | 0.67 (0.78) |
| phastCons 100-way | 0.59 (0.39) | 0.33 (0.68) | 0.67 (0.78) |
| phyloP 100-way | 0.60 (0.51) | 0.37 (0.62) | 0.73 (0.73) |
| sdAE | 0.81 (0.69) | 0.72 (0.79) | 0.88 (0.79) |
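A table of this shape can be produced by scanning candidate cutoffs and keeping, for each measure, the best value and the cutoff that achieves it. The sketch below is a minimal illustration with hypothetical data and helper names; it evaluates a single dataset and does not reproduce the averaging mentioned in the caption.

```python
from math import sqrt

def confusion(labels, scores, cutoff):
    """Counts (tp, fp, tn, fn) when predicting positive for score >= cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= cutoff
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def f1_score(tp, fp, tn, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def best_cutoff(labels, scores, metric, cutoffs):
    """Return (best metric value, cutoff); ties resolve to the larger cutoff."""
    return max((metric(*confusion(labels, scores, c)), c) for c in cutoffs)

# Hypothetical toy data and a coarse grid of cutoffs.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
grid = [i / 10 for i in range(1, 10)]
best_f1 = best_cutoff(labels, scores, f1_score, grid)
best_mcc = best_cutoff(labels, scores, mcc, grid)
best_acc = best_cutoff(labels, scores, accuracy, grid)
```

As in the table, the optimal cutoff can differ between measures for the same predictor, so the values in parentheses are not interchangeable.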
MLP’s and sdAE’s ROC curve AUC with 95% confidence intervals evaluated on subsets of SNVs from the training dataset II that could not be processed by other predictors.
| Score | Missing predictions | MLP | sdAE |
|---|---|---|---|
| Eigen | 1175 | 0.97 (0.94–0.98) | 0.95 (0.92–0.97) |
| FATHMM | 898 | 0.80 (0.75–0.85) | 0.34 (0.28–0.42) |
| LRT | 1772 | 0.94 (0.93–0.95) | 0.80 (0.77–0.84) |
| MetaLR | 118 | 0.76 (0.68–0.85) | 0.59 (0.49–0.70) |
| MetaSVM | 118 | 0.76 (0.67–0.85) | 0.59 (0.48–0.69) |
| MutationAssessor | 843 | 0.90 (0.88–0.92) | 0.72 (0.67–0.77) |
| PROVEAN | 426 | 0.85 (0.81–0.90) | 0.47 (0.39–0.55) |
| PolyPhen HDIV | 286 | 0.85 (0.80–0.89) | 0.53 (0.46–0.60) |
| PolyPhen HVAR | 286 | 0.84 (0.80–0.89) | 0.53 (0.45–0.61) |
| SIFT | 514 | 0.89 (0.85–0.92) | 0.59 (0.52–0.66) |
MLP’s and sdAE’s average precision with 95% confidence intervals evaluated on subsets of SNVs from the training dataset II that could not be processed by other predictors.
| Score | Missing predictions | MLP | sdAE |
|---|---|---|---|
| Eigen | 1175 | 1.00 (0.99–1.00) | 0.99 (0.99–1.00) |
| FATHMM | 898 | 0.52 (0.42–0.62) | 0.18 (0.11–0.25) |
| LRT | 1772 | 0.83 (0.79–0.87) | 0.74 (0.69–0.78) |
| MetaLR | 118 | 0.81 (0.70–0.90) | 0.63 (0.51–0.75) |
| MetaSVM | 118 | 0.81 (0.71–0.90) | 0.63 (0.51–0.75) |
| MutationAssessor | 843 | 0.83 (0.78–0.87) | 0.72 (0.67–0.77) |
| PROVEAN | 426 | 0.69 (0.59–0.78) | 0.41 (0.31–0.50) |
| PolyPhen HDIV | 286 | 0.82 (0.75–0.87) | 0.60 (0.52–0.68) |
| PolyPhen HVAR | 286 | 0.81 (0.74–0.87) | 0.60 (0.52–0.68) |
| SIFT | 514 | 0.79 (0.72–0.86) | 0.59 (0.51–0.66) |
Average classification success rate across GO terms and the number of significantly enriched terms.
The number of terms enriched in the misclassified subset is given in parentheses.
| Score | Average success rate | Significantly enriched |
|---|---|---|
| | 0.864 | 139 (0) |
| MetaLR | 0.857 | 139 (0) |
| | 0.850 | 139 (0) |
| MetaSVM | 0.849 | 139 (0) |
| FATHMM | 0.828 | 139 (0) |
| MutationAssessor | 0.725 | 136 (0) |
| PolyPhen HVAR | 0.720 | 134 (0) |
| Eigen | 0.720 | 134 (0) |
| PROVEAN | 0.714 | 133 (1) |
| SIFT | 0.713 | 134 (1) |
| PolyPhen HDIV | 0.711 | 135 (0) |
| CADD | 0.707 | 131 (1) |
| MutationTaster | 0.681 | 127 (2) |
| phyloP 100-way | 0.671 | 126 (2) |
| DANN | 0.668 | 126 (1) |
| LRT | 0.665 | 122 (3) |
| SiPhy 29-way | 0.642 | 120 (6) |
| phastCons 100-way | 0.631 | 115 (10) |
| GERP++ | 0.615 | 111 (14) |
Average deviation of FP/FN rates from equilibrium (imbalance) across all GO terms in the misclassified subsection of the test dataset II and the number of terms significantly enriched in either the FP or FN subsets of the misclassified variations.
| Score | Average imbalance | Significantly enriched |
|---|---|---|
| | 0.130 | 70 |
| FATHMM | 0.131 | 81 |
| PolyPhen HVAR | 0.137 | 93 |
| SIFT | 0.138 | 87 |
| | 0.143 | 82 |
| MetaLR | 0.145 | 78 |
| MetaSVM | 0.147 | 85 |
| PolyPhen HDIV | 0.153 | 100 |
| MutationAssessor | 0.161 | 99 |
| Eigen | 0.169 | 104 |
| PROVEAN | 0.170 | 100 |
| phyloP 100-way | 0.183 | 113 |
| LRT | 0.200 | 116 |
| CADD | 0.213 | 122 |
| DANN | 0.223 | 123 |
| SiPhy 29-way | 0.251 | 128 |
| MutationTaster | 0.266 | 129 |
| phastCons 100-way | 0.291 | 135 |
| GERP++ | 0.310 | 136 |