Sirawit Ittisoponpisan, Suhail A Islam, Tarun Khanna, Eman Alhuzimi, Alessia David, Michael J E Sternberg.
Abstract
Knowledge of protein structure can be used to predict the phenotypic consequence of a missense variant. Since structural coverage of the human proteome can be roughly tripled to over 50% of the residues if homology-predicted structures are included in addition to experimentally determined coordinates, it is important to assess the reliability of using predicted models when analyzing missense variants. Accordingly, we assess whether a missense variant is structurally damaging by using experimental and predicted structures. We considered 606 experimental structures and show that 40% of the 1965 disease-associated missense variants analyzed have a structurally damaging change in the mutant structure. Only 11% of the 2134 neutral variants are structurally damaging. Importantly, similar results are obtained when 1052 structures predicted using the Phyre2 algorithm were used, even when the model shares low (<40%) sequence identity to the template. Thus, structure-based analysis of the effects of missense variants can be effectively applied to homology models. Our in-house pipeline, Missense3D, for structurally assessing missense variants was made available at http://www.sbg.bio.ic.ac.uk/~missense3d.
Keywords: Phyre2 protein structure prediction; missense variants; protein structure prediction; structure-based prediction; variant effect prediction
Year: 2019 PMID: 30995449 PMCID: PMC6544567 DOI: 10.1016/j.jmb.2019.04.009
Source DB: PubMed Journal: J Mol Biol ISSN: 0022-2836 Impact factor: 5.469
Fig. 1. Pipeline to analyze the structural impact of missense variants in experimental and predicted structures.
Structural features evaluated in this study and used by the Missense3D web server
| Feature assessed as a structural impact | Description |
|---|---|
| Disulfide bond breakage | The substitution breaks a disulfide bond that was in the wild-type. The maximum S–S length for the bond is 3.3 Å. |
| Buried Pro introduced | The substitution introduces a buried proline. |
| Clash | The mutant structure has a MolProbity clash score ≥ 30 and the increase in clash score is > 18 compared to the wild type. |
| Buried hydrophilic introduced | The substitution replaces a buried hydrophobic residue with a hydrophilic residue. |
| Buried charge introduced | The substitution replaces a buried uncharged residue with a charged residue. |
| Buried charge switch | The substitution switches the charge (+/−) of the buried residue. |
| Secondary structure altered | The substitution results in a change in the DSSP secondary structure assignment at the variant position. |
| Buried charge replaced | The substitution replaces a buried charged residue with an uncharged residue. |
| Disallowed phi/psi | The mutant residue is in an outlier region, while the wild-type residue is in the favored or allowed regions in both the 3D coordinates input file and the WT. |
| Buried Gly replaced | The substitution replaces a buried glycine. |
| Buried H-bond breakage | The substitution breaks all side-chain/side-chain H-bond(s) and/or side-chain/main-chain bond(s) formed by the wild-type residue which was buried. The maximum H-bond N–O length is 3.9 Å. |
| Buried salt bridge breakage | The substitution breaks a salt bridge formed by the wild-type residue which was buried. The maximum N–O bond length is 5.0 Å. |
| Cavity altered | The substitution leads to an expansion or contraction of the cavity volume of ≥ 70 Å³. Cavity also refers to a pocket on the surface. |
| Buried/exposed switch | The substitution results in a change between buried and exposed state of the target residue. (RSA < 9% for buried and the difference between RSA has to be at least 5%.) |
| Cis Pro replaced | The substitution replaces a proline, which was in cis configuration in the wild type. |
| Gly in a bend | The substitution replaces a glycine, which is located in a bend curvature (reported “S” in DSSP). |
| Exposed hydrophobic introduced (evaluated but not used) | The substitution replaces an exposed hydrophilic residue with a hydrophobic residue (not employed as a feature in Missense3D). |
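
The numeric thresholds in the table can be read as simple predicates. The following is a minimal sketch, not the Missense3D implementation: the `ResidueState` container, its field names and the function names are hypothetical, and the values are assumed to have been computed upstream (for example clash scores from MolProbity and RSA from a solvent-accessibility tool). Only the cut-offs themselves are taken from the table.

```python
from dataclasses import dataclass

# Cut-offs quoted in the feature table.
CLASH_SCORE_MIN = 30.0      # mutant MolProbity clash score must be >= 30
CLASH_INCREASE_MIN = 18.0   # and the increase over the wild type must be > 18
CAVITY_DELTA_MIN = 70.0     # cavity volume change of >= 70 cubic angstroms
BURIED_RSA_MAX = 9.0        # RSA < 9% counts as buried
RSA_DIFF_MIN = 5.0          # RSA must differ by at least 5 percentage points


@dataclass
class ResidueState:
    """Hypothetical container for precomputed values at the variant position."""
    clash_score: float     # MolProbity clash score of the structure
    cavity_volume: float   # cavity (or surface pocket) volume in cubic angstroms
    rsa: float             # relative solvent accessibility in percent


def is_buried(rsa: float) -> bool:
    return rsa < BURIED_RSA_MAX


def clash(wt: ResidueState, mut: ResidueState) -> bool:
    increase = mut.clash_score - wt.clash_score
    return mut.clash_score >= CLASH_SCORE_MIN and increase > CLASH_INCREASE_MIN


def cavity_altered(wt: ResidueState, mut: ResidueState) -> bool:
    # Expansion and contraction are treated alike, as in the table.
    return abs(mut.cavity_volume - wt.cavity_volume) >= CAVITY_DELTA_MIN


def buried_exposed_switch(wt: ResidueState, mut: ResidueState) -> bool:
    state_changed = is_buried(wt.rsa) != is_buried(mut.rsa)
    return state_changed and abs(mut.rsa - wt.rsa) >= RSA_DIFF_MIN
```

Keeping the cut-offs as named constants makes it easy to see which numbers come from the table and which parts of the sketch are illustrative.
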
Fig. 2. Performance of structural analysis on experimental structures. For each feature, the TPR on disease-associated (Disease) and the FPR on neutral (Neutral) variants are plotted as bars. The ratios of the true-positive to false-positive rates (TPR/FPR) are given, and for ease of viewing, these are connected by a line. The overall TPR and FPR on the entire data set are also reported. Significance at P < 0.01 (denoted by **) is evaluated in a one-tailed test of the difference of two proportions. Panel a shows the results from the set of high-quality X-ray structures (resolution < 2.0 Å) from MolProbity's Top8000 database [37]. Panel b shows the results on a second independent data set of 855 structures with lower resolution (< 2.5 Å) from the PDB.
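
The TPR/FPR ratio and the one-tailed test of the difference of two proportions mentioned in the caption can be sketched as follows. This is an illustration under assumptions: the caption does not state which form of the test was used, so a pooled-variance z-test is assumed here, and the function names are hypothetical.

```python
import math


def tpr_fpr_ratio(tp: int, n_disease: int, fp: int, n_neutral: int):
    """Return (TPR, FPR, TPR/FPR) from counts of positive predictions."""
    tpr = tp / n_disease
    fpr = fp / n_neutral
    ratio = tpr / fpr if fpr > 0 else math.inf
    return tpr, fpr, ratio


def one_tailed_two_proportion_test(x1: int, n1: int, x2: int, n2: int):
    """z statistic and one-tailed P value for H1: p1 > p2.

    Assumes a pooled standard error (a common textbook choice, not
    confirmed by the source).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("test undefined when the pooled proportion is 0 or 1")
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value
```

For the overall figures in the abstract, the counts would have to be reconstructed from the rounded percentages (roughly 40% of 1965 and 11% of 2134), so any value computed that way is approximate.
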
Fig. 3. The distribution of the RMSD (Å) between the Phyre2-predicted models and the true experimental coordinates binned according to the % sequence identity between query and template. The central line in a box is the median RMSD with its value reported. The upper and lower box boundaries are the upper and lower quartiles. The whiskers extend up an additional 1.5 × the difference between the median and the upper quartile and down an additional 1.5 × the difference between the median and the lower quartile.
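
The whisker rule in this caption differs from the more common 1.5 × interquartile-range convention, so a small sketch may help. It assumes the quartiles are computed with Python's `statistics.quantiles` default method, which the source does not specify; the function name is hypothetical.

```python
import statistics


def box_and_whiskers(rmsds):
    """Return (lower_whisker, q1, median, q3, upper_whisker) for RMSD values.

    Whiskers follow the caption: each extends 1.5 x the distance between
    the median and the corresponding quartile beyond that quartile.
    """
    q1, median, q3 = statistics.quantiles(rmsds, n=4)
    upper = q3 + 1.5 * (q3 - median)
    lower = q1 - 1.5 * (median - q1)
    return lower, q1, median, q3, upper
```

Because each whisker depends only on its own half of the box, the two whiskers can have different lengths when the distribution is skewed.
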
Fig. 4. Performance of structural analysis on predicted models at different sequence identities. For each sequence identity bin, the fractions of positive predictions for the disease-associated and neutral variants are shown for both the predicted and the corresponding experimental structures. 95% confidence intervals on the positive rates are shown as lines.
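
A 95% confidence interval on a positive rate can be computed in several ways, and the caption does not say which was used. The sketch below assumes the normal-approximation (Wald) interval purely for illustration; the function name is hypothetical.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96):
    """Normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a two-sided 95% interval. The result is
    clipped to the range [0, 1].
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

For small bins or rates close to 0 or 1, an interval such as Wilson's is usually preferred over this approximation.
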
Fig. 5. Histogram of the relative frequencies of RMSD (Å) of predicted models grouped according to whether the variant was a TP or an FP. The bin labeling shows the upper bound; for example, 0.5 denotes the range 0.0 Å ≤ RMSD < 0.5 Å. The last bin is RMSD ≥ 6 Å. The relative percentage displayed on the Y axis is the fraction of true positives and similarly the fraction of false positives in each bin range.
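
The binning convention in this caption (labels give the upper bound, with an open-ended last bin) can be sketched as below. The uniform 0.5 Å bin width is an assumption inferred from the single example label in the caption, and the function names are hypothetical.

```python
import math
from collections import Counter

BIN_WIDTH = 0.5        # assumed uniform width, inferred from the "0.5" label
LAST_BIN_START = 6.0   # RMSD >= 6 angstroms falls into the final bin


def bin_label(rmsd: float) -> str:
    """Label a value by the upper bound of its half-open bin [lower, upper)."""
    if rmsd < 0:
        raise ValueError("RMSD cannot be negative")
    if rmsd >= LAST_BIN_START:
        return ">=6"
    upper = (math.floor(rmsd / BIN_WIDTH) + 1) * BIN_WIDTH
    return f"{upper:.1f}"


def relative_frequencies(rmsds):
    """Percentage of values in each bin, computed within one group (TP or FP)."""
    total = len(rmsds)
    if total == 0:
        return {}
    counts = Counter(bin_label(r) for r in rmsds)
    return {label: 100.0 * count / total for label, count in counts.items()}
```

Calling `relative_frequencies` separately on the RMSD values of the TP group and of the FP group gives the two series plotted in the histogram.
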
Fig. 6. Structural analysis of p.His107Tyr in the experimental and predicted structures of carbonic anhydrase (CA2). (a) His 107 in the wild-type (WT) (PDB ID: 2FOS), (b) Tyr 107 in the mutant (MUTANT) modeled from the WT, (c) Phyre2-predicted structure of the wild type with His 107 (PREDICTED WILD TYPE) and (d) predicted structure of the Tyr 107 mutant based on the Phyre2-predicted wild-type structure (PREDICTED MUTANT). In all four panels, the Cα traces of the structures analyzed by Missense3D are shown in gray. In the predicted structures (c and d), the Cα trace and side-chain positions shown in panels a and b are shown in pink. The side chains of His 107, Tyr 107 and Glu 117 are shown colored by chemical type (green for non-polar, blue for positive and red for negative). Figures were generated using PyMOL [54].
Fig. 7. Structural analysis of variant p.Cys52Arg in the crystal and predicted structures of alpha-galactosidase. (a) Cys 52 in the wild type, (b) Arg 52 in the mutant modeled from the wild type, (c) Phyre2-predicted wild-type structure with Cys 52, using a model with RMSD to the crystal structure of 1.4 Å (PREDICTED WILD TYPE) and (d) Phyre2-predicted wild-type structure with Cys 52, using a model with RMSD to the crystal structure of 2.8 Å (PREDICTED WILD TYPE (2)). Color scheme for the four panels as in Fig. 6. In addition, the SG atoms are shown in yellow.