Stephen Cole, Sudhakaran Prabakaran.
Abstract
Phosphorylation sites often have key regulatory functions and are central to many cellular signaling pathways, so mutations that modify them have the potential to contribute to pathological states such as cancer. Although many classifiers exist for prioritization of coding genomic variants, to our knowledge none of them explicitly accounts for the alteration or creation of kinase recognition motifs, which can alter protein structure, function, regulation of activity, and interaction networks by modifying the pattern of phosphorylation. We present a novel computational pipeline that uses a random forest classifier to predict the pathogenicity of a variant according to its direct or indirect effect on local phosphorylation sites and the predicted functional impact of perturbing a phosphorylation event. We call this classifier PhosphoEffect and find that it compares favorably, with increased accuracy, to the existing classifier PolyPhen 2.2.2 when tested on a dataset of known variants enriched for phosphorylation sites and their neighbors.
Keywords: Biochemistry; Omics
Year: 2020 PMID: 32712465 PMCID: PMC7387813 DOI: 10.1016/j.isci.2020.101321
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. Enrichment of Mutations in the Vicinity of Phosphorylation Sites in Human Cancers
The solid red line shows the expected number of mutations at each site, and the dotted lines mark the lower and upper 2.5% bounds of the corresponding hypergeometric distribution.
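As a rough illustration of how an expected count and 2.5% bounds can be read off a hypergeometric distribution, the sketch below uses only the Python standard library. The parameterization is an assumption, not taken from the source: `N` is a hypothetical total number of residues, `K` the number of mutated residues among them, and `n` the number of residues at a given offset from a phosphosite.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items without replacement from N, K of which are 'successes'."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def expected_and_bounds(N, K, n, tail=0.025):
    """Return (mean, lower, upper): the mean and the quantiles at `tail` and 1 - `tail`.

    N, K, n are hypothetical inputs (total residues, mutated residues,
    residues at the offset of interest); the source does not give them.
    """
    mean = n * K / N
    k_min, k_max = max(0, n - (N - K)), min(n, K)
    cdf = 0.0
    lower = upper = None
    for k in range(k_min, k_max + 1):
        cdf += hypergeom_pmf(k, N, K, n)
        if lower is None and cdf >= tail:
            lower = k   # smallest k with cumulative probability >= tail
        if upper is None and cdf >= 1 - tail:
            upper = k   # smallest k with cumulative probability >= 1 - tail
            break
    return mean, lower, upper

mean, lower, upper = expected_and_bounds(N=1000, K=100, n=50)
```

An observed count outside `[lower, upper]` would then suggest enrichment or depletion at that offset, under the stated assumptions.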
Enrichment of Mutations in the Vicinity of Phosphorylation Sites in the COSMIC Cancer Mutation Database
| Site Relative to Phosphosite | Fold Enrichment of Mutations | BH-Corrected p Value |
|---|---|---|
| −5 | 1.02 | 0.126 |
| −4 | 1.06 | 6.86 × 10^-6 |
| −3 | 1.14 | 2.42 × 10^-8 |
| −2 | 1.03 | 1.79 × 10^-2 |
| −1 | 1.08 | 3.08 × 10^-8 |
| 0 | 0.94 | 1.60 × 10^-5 |
| 1 | 1.05 | 3.54 × 10^-4 |
| 2 | 1.03 | 4.74 × 10^-2 |
| 3 | 1.01 | 0.632 |
| 4 | 1.01 | 0.662 |
| 5 | 1.01 | 0.296 |
All hypergeometric test p-values are Benjamini-Hochberg corrected for multiple testing.
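For readers unfamiliar with the Benjamini-Hochberg procedure, a minimal standard-library sketch follows. The input p-values are made up for illustration; the uncorrected p-values behind the table are not given in the source.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, not from the paper.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```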
Figure 2. Performance of PolyPhen on Known Variants
The false-positive (A) and false-negative (B) rates of the PolyPhen classifier on phosphorylated residues, residues neighboring phosphorylation sites, and all others. The error bars indicate standard errors of the proportion. The difference in false-positive rates between classes was significant (p = 2.43 × 10^-9) and the difference in false-negative rates was marginally significant (p = 0.0455). All p-values, including those for ad hoc pairwise testing, are in Table S1.
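The error bars can be understood through the usual binomial standard error of a proportion, sqrt(p(1 − p)/n). The sketch below assumes that this is the formula meant by "standard errors of the proportion"; the counts are hypothetical.

```python
from math import sqrt

def proportion_with_se(errors, total):
    """Return (rate, standard error) for `errors` misclassifications out of `total` variants."""
    rate = errors / total
    return rate, sqrt(rate * (1 - rate) / total)

# Hypothetical counts: 20 false positives among 100 benign variants.
rate, se = proportion_with_se(20, 100)
```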
Figure 3. NetPhorest Outputs for p53 with Point Mutations at Ser20
(A) Wild-type sequence; (B) mutant S20Y; (C) mutant S20P.
Features Used to Train the Classifier
| Feature | Source |
|---|---|
| Molecular weight of wild-type residue (Da) | |
| Molecular weight of mutant residue (Da) | |
| Chemical property of wild-type residue (acidic, basic, polar, nonpolar) | |
| Chemical property of mutant residue | |
| PolyPhen score | PolyPhen 2.2.2 |
| Secondary structure (helix, sheet, disordered, other) | RING |
| Wild-type residue phosphorylated? | PhosphoSitePlus |
| Number of phosphorylated neighbors (±5) | PhosphoSitePlus |
| Impact on phosphorylation level | Derived from NetPhorest |
| Network perturbation score | Derived from iPTMnet |
| Number of internal contacts (of neighboring phosphorylated residues) | RING |
| RAPDF (of neighboring phosphorylated residues) | RING |
RAPDF, residue-specific all atom-dependent conditional probability distribution function; RING, residue interaction network generator.
Hyperparameters of the Random Forest Classifier
| Hyperparameter | Value |
|---|---|
| Bootstrap | True |
| Criterion | Gini |
| Maximum leaf nodes | None |
| Minimum impurity decrease | 1 × 10^-5 |
| Minimum impurity split | None |
| Minimum samples per leaf | 1 |
| Minimum samples per split | 2 |
| Number of estimators | 100 |
The optimal hyperparameters, selected by grid search cross-validation, for the random forest classifier are shown.
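The hyperparameter names in the table resemble the keyword arguments of scikit-learn's `RandomForestClassifier`, so a configuration along the following lines may reproduce the setup. This is an assumption, as this record does not name the library; `build_classifier` and `random_state` are hypothetical additions for illustration. "Minimum impurity split" is left out because it is `None` in the table and the corresponding `min_impurity_split` argument was removed in scikit-learn 1.0.

```python
# Values copied from the hyperparameter table; keyword names assume scikit-learn.
RF_PARAMS = {
    "bootstrap": True,
    "criterion": "gini",
    "max_leaf_nodes": None,
    "min_impurity_decrease": 1e-5,
    "min_samples_leaf": 1,
    "min_samples_split": 2,
    "n_estimators": 100,
}

def build_classifier(random_state=0):
    """Build a random forest with the tabulated settings (requires scikit-learn)."""
    # Imported lazily so the parameter dictionary can be inspected without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier(random_state=random_state, **RF_PARAMS)
```

Calling `build_classifier().fit(X, y)` on a feature matrix `X` and labels `y` would then train the model; the search grid used to select these values is not given here.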
Figure 4. Performance of PhosphoEffect Compared with PolyPhen on the Test Dataset
Receiver operating characteristic (ROC) curves for (A) the PhosphoEffect classifier, (B) PolyPhen, and (C) PhosphoEffect without the PolyPhen feature. Their respective areas under the curve are 0.884, 0.859, and 0.747.
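The area under a ROC curve equals the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one, which gives a simple way to compute it. The sketch below uses this pairwise definition on made-up labels and scores; it is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = pathogenic, 0 = benign) and classifier scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of variants; a rank-based formulation would scale better on large test sets.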
Figure 5. Feature Importances
The relative contribution to the model predictions of each feature on which the classifier was trained; note the log scale of the y axis.