Vanessa E Gray, Ronald J Hause, Jens Luebeck, Jay Shendure, Douglas M Fowler.
Abstract
Large datasets describing the quantitative effects of mutations on protein function are becoming increasingly available. Here, we leverage these datasets to develop Envision, which predicts the magnitude of a missense variant's molecular effect. Envision combines 21,026 variant effect measurements from nine large-scale experimental mutagenesis datasets, a hitherto untapped training resource, with a supervised, stochastic gradient boosting learning algorithm. Envision outperforms other missense variant effect predictors both on large-scale mutagenesis data and on an independent test dataset comprising 2,312 TP53 variants whose effects were measured using a low-throughput approach. This dataset was never used for hyperparameter tuning or model training and thus serves as an independent validation set. Envision prediction accuracy is also more consistent across amino acids than other predictors. Finally, we demonstrate that Envision's performance improves as more large-scale mutagenesis data are incorporated. We precompute Envision predictions for every possible single amino acid variant in human, mouse, frog, zebrafish, fruit fly, worm, and yeast proteomes (https://envision.gs.washington.edu/).
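For readers unfamiliar with the term, the following is a minimal sketch of stochastic gradient boosting for regression, not the Envision implementation. The abstract gives no features or hyperparameters, so everything here is an assumption made for illustration: the toy data, the use of one-split decision stumps as base learners, the squared-error loss, and the function names `fit_stump`, `fit_sgb`, and `predict`. The "stochastic" part is that each round fits its base learner to a random subsample of the training rows.

```python
import random

def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must leave rows on both sides
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def stump_predict(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean

def fit_sgb(X, y, n_rounds=50, learning_rate=0.1, subsample=0.5, seed=0):
    """Boost stumps on residuals, drawing a fresh row subsample each round."""
    rng = random.Random(seed)
    base = sum(y) / len(y)          # start from the mean response
    preds = [base] * len(y)
    stumps = []
    k = min(len(y), max(1, int(subsample * len(y))))
    for _ in range(n_rounds):
        idx = rng.sample(range(len(y)), k)   # the stochastic step
        Xs = [X[i] for i in idx]
        rs = [y[i] - preds[i] for i in idx]  # residuals under squared loss
        stump = fit_stump(Xs, rs)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

# Made-up toy data: the first feature drives the score, the second is noise.
X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [1.0 if row[0] > 0.5 else 0.2 for row in X]
model = fit_sgb(X, y)
```

Subsampling rows in each round decorrelates the base learners and tends to reduce overfitting relative to fitting every round on the full training set; the learning rate shrinks each learner's contribution so that many small corrections accumulate.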
Keywords: large-scale mutagenesis; machine learning; variant effect prediction
Year: 2017 PMID: 29226803 PMCID: PMC5799033 DOI: 10.1016/j.cels.2017.11.003
Source DB: PubMed Journal: Cell Syst ISSN: 2405-4712 Impact factor: 10.304