Yasaman Karami, Tristan Bitard-Feildel, Elodie Laine, Alessandra Carbone.
Abstract
Characterizing a protein's mutational landscape is a very challenging problem in biology. Many disease-associated mutations do not seem to produce any effect on the global shape or motions of the protein. Here, we use relatively short all-atom biomolecular simulations to predict mutational outcomes, and we quantitatively assess the predictions on several hundred mutants. We perform simulations of the wild type and 175 mutants of PSD95's third PDZ domain in complex with its cognate ligand. By recording residue displacement correlations and interactions, we identify "communication pathways" and quantify them to predict the severity of the mutations. Moreover, we show that by exploiting simulations of the wild type, one can detect 80% of the positions highly sensitive to mutations with a precision of 89%. Importantly, our analysis describes the role of these positions in the inter-residue communication and dynamical architecture of the complex. We assess our approach on three different systems using data from deep mutational scanning experiments and high-throughput exome sequencing. We refer to our analysis as "infostery", from "info" (information) and "steric" (arrangement of residues in space). We provide a fully automated tool, COMMA2 (www.lcqb.upmc.fr/COMMA2), that can be used to guide medicinal research by selecting important positions/mutations.
Year: 2018 PMID: 30382169 PMCID: PMC6208415 DOI: 10.1038/s41598-018-34508-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Sequence evolution-structural dynamics-function relationship and protein infostery. (a) Methods have been developed toward systematically assessing the link between the functional outcome of mutations and protein sequence evolution (arrow in black). Here, we investigate the link between functional outcome and protein structural dynamics (arrow in orange). (b) A protein is depicted as a grey shape and some residues are indicated by dots. Our approach relies on the identification of communication pathways (black edges between residues) and dynamical units (regions of the protein colored in red and blue). Top left: 3 overlapping communication pathways. The first and last residues of each pathway are colored the same way (yellow, red and magenta). Top right: 4 protein residues in direct communication with the protein’s ligand (green thick segment). Bottom left: 3 residues belonging to different types of dynamical units. Bottom right: 2 pairs of residues bridging two sub-regions of a dynamical unit. The more pronounced color of the two sub-regions indicates that they contain many pathways (dense communication).
Figure 2. Infostery analysis of the wild-type PSD95-CRIPT peptide complex and two deleterious mutants. WT: wild-type. MU: H372A mutant. MU: A347F mutant. Pathway properties are mapped onto conformations averaged over 5 × 15 ns MD simulations. (a) Communication pathways (>3 residues) are displayed as segments linking residues’ C-α atoms. The thickness of each segment is proportional to the number of pathways linking the residue pair. (b) Pathway concentration is displayed as spheres centered on residues’ C-α atoms. The size of each sphere is proportional to the number of pathways crossing the residue.
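The two quantities mapped in panels (a) and (b) are tallies over the set of pathways. The sketch below shows one way such tallies could be computed. It assumes each pathway is available as an ordered list of residue labels and that a pair is "linked" when its two residues are consecutive in a pathway. Both are assumptions, and the function name and the toy pathways are hypothetical, not part of COMMA2.

```python
from collections import Counter

def pathway_counts(pathways, min_residues=4):
    """Tally pathways per residue and per consecutive residue pair.

    Only pathways with at least `min_residues` residues are kept,
    mirroring the ">3 residues" filter mentioned in the caption.
    """
    kept = [p for p in pathways if len(p) >= min_residues]
    per_residue = Counter()  # drives the sphere size in panel (b)
    per_pair = Counter()     # drives the segment thickness in panel (a)
    for path in kept:
        # Count each residue and each pair at most once per pathway.
        per_residue.update(set(path))
        per_pair.update({frozenset(pair) for pair in zip(path, path[1:])})
    return per_residue, per_pair

# Toy pathways with made-up residue labels (not data from the paper).
toy = [["r1", "r2", "r3", "r4"], ["r2", "r3", "r4", "r5"], ["r1", "r2"]]
per_residue, per_pair = pathway_counts(toy)
print(per_residue["r3"], per_pair[frozenset(("r2", "r3"))])
```

The third toy pathway is discarded by the length filter, so it contributes to neither tally.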
Figure 3. Effect of single-point mutations on pathway concentration in the PSD95-CRIPT peptide complex. (a) Number of pathways longer than 3, 4, 5 or 6 residues. (b) Number of residues crossed by >50 to >120 pathways. The curves are colored according to the experimentally measured effects of the mutations: beneficial in pink, neutral in grey tones and deleterious in blue tones. (c,d) Inverse cumulative distribution functions of the number of pathways (>3 residues long) (c) and of the number of highly connected residues (>70 pathways) (d) for 175 mutations: 45 neutral (in grey), 71 deleterious (in light blue) and 59 highly deleterious (in dark blue). Each y value corresponds to the percentage of neutral, deleterious or highly deleterious mutations displaying a number of pathways (log) or a number of highly connected residues higher than the x value. The orange and red lines (superimposed on the plots) indicate the largest differences between the grey and dark blue curves and between the grey and light blue curves, respectively.
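The inverse cumulative distributions of panels (c,d), and the "largest difference" between two of them, can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the function names and the toy values are hypothetical, and the largest difference is taken here as the maximum absolute gap over a user-supplied grid of x values, which may differ from how the lines in the figure were obtained.

```python
def inverse_cdf(values, xs):
    """Percentage of `values` strictly greater than each x in `xs`."""
    n = len(values)
    return [100.0 * sum(v > x for v in values) / n for x in xs]

def largest_gap(values_a, values_b, xs):
    """Return (x, gap) where the two inverse CDFs differ the most."""
    curve_a = inverse_cdf(values_a, xs)
    curve_b = inverse_cdf(values_b, xs)
    gaps = [abs(a - b) for a, b in zip(curve_a, curve_b)]
    best = max(range(len(xs)), key=gaps.__getitem__)  # first maximum wins
    return xs[best], gaps[best]

# Toy per-mutant counts (made up, not data from the paper).
neutral = [1, 2, 3, 4]
highly_deleterious = [3, 4, 5, 6]
grid = list(range(7))
print(inverse_cdf(neutral, grid))
print(largest_gap(neutral, highly_deleterious, grid))
```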
Table 1. Performance of the number of highly connected residues as a predictor of experimental mutational outcome.
| Coef | All (45 neutral + 59 highly del.) | | | | | | Filtered (15 neutral + 41 highly del.) | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Sens | Spe | Pre | Acc | F1 | MCC | Sens | Spe | Pre | Acc | F1 | MCC |
| 1 | 97 | 27 | 63 | 66 | 77 | 34 | 95 | 40 | 81 | 80 | 88 | 44 |
| | 93 | 31 | 64 | 66 | 76 | 32 | | | | | | |
| 1.4 | 86 | 40 | 65 | 66 | 74 | 30 | 85 | 47 | 81 | 75 | 83 | 34 |
| 1.6 | 78 | 47 | 66 | 64 | 71 | 26 | 73 | 60 | 83 | 70 | 78 | 31 |
| 1.8 | 75 | 60 | 71 | 68 | 73 | 35 | 71 | 67 | 85 | 70 | 77 | 34 |
| 2.0 | 75 | 62 | 72 | 69 | 73 | 37 | 71 | 67 | 85 | 70 | 77 | 34 |
| | | | | | | | 66 | 67 | 84 | 66 | 74 | 29 |
| 2.4 | 66 | 71 | 75 | 68 | 70 | 37 | 61 | 67 | 83 | 62 | 70 | 25 |
The values of sensitivity (Sens), specificity (Spe), precision (Pre), accuracy (Acc), F1-score (F1) and Matthews correlation coefficient (MCC) are reported for different threshold values. The substitutions predicted as highly deleterious are those displaying a number of highly connected residues , where x is the coefficient reported in the first column of the table and is the value computed for the wild-type complex. The “Filtered” set comprises only neutral mutations occurring frequently in homologous sequences and highly deleterious mutations occurring rarely or never. For each set of mutations, the line displaying the best MCC is highlighted in bold.
Table 2. Detection of highly sensitive positions in the PSD95-CRIPT peptide complex by infostery analysis of the wild-type form.
| Strategy | Sens | PPV | Spe | Acc | True positives | False positives |
|---|---|---|---|---|---|---|
| path- and clique-based units (a) | 25 | 100 | 100 | 82 | G324, I341, H372, A376, L379 | |
| direct communication w. ligand (b) | 15 | 75 | 98 | 78 | F325, I327, H372 | N326 |
| isolated direct communication (c) | 65 | 93 | 98 | 90 | L323, I327, G329, G330, I336, I341, A347, L353, V362, L367, H372, A375, L379 | G356 |
| all criteria (20 ns) | 80 | 89 | | | L323, G324, F325, I327, G329, G330, I336, I341, A347, L353, V362, L367, H372, A375, A376, L379 | N326, G356 |
| all criteria (50 ns) | 85 | 89 | 97 | 94 | L323, G324, F325, I327, G329, G330, I336, I338, I341, A347, L353, V362, L367, H372, A375, A376, L379 | N326, I337 |
The performance values, sensitivity (Sens), precision or positive predictive value (PPV), specificity (Spe) and accuracy (Acc), are given in percentages. They are computed for the set of 20 highly sensitive positions given in Materials and Methods. (a) Residues detected in both a pathway-based dynamical unit and a clique-based dynamical unit with very high confidence. (b) Residues forming direct communications with the ligand. (c) Residues forming isolated direct communications between them (see Materials and Methods). The first three lines correspond to the analysis of 5 replicates of 20 ns, while the last line corresponds to the analysis of the 5 replicates extended to 50 ns.
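Sensitivity and PPV can be recomputed directly from the residue lists in the table and the stated total of 20 highly sensitive positions. The short check below does this for the "all criteria (20 ns)" row; the variable names are illustrative. Specificity and accuracy are not recomputed because they also require the number of non-sensitive positions considered, which this record does not state.

```python
# Residue lists copied from the "all criteria (20 ns)" row of the table.
true_positives = (
    "L323, G324, F325, I327, G329, G330, I336, I341, "
    "A347, L353, V362, L367, H372, A375, A376, L379"
).split(", ")
false_positives = ["N326", "G356"]
n_sensitive = 20  # size of the reference set of highly sensitive positions

# Sens = TP / (all sensitive positions); PPV = TP / (TP + FP).
sens = 100.0 * len(true_positives) / n_sensitive
ppv = 100.0 * len(true_positives) / (len(true_positives) + len(false_positives))
print(round(sens), round(ppv))
```

The rounded values agree with the 80% detection rate and 89% precision quoted in the abstract.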
Figure 4. Network of residues in direct communication in the wild-type PSD95-CRIPT peptide complex. (a) Each node corresponds to a residue and each edge corresponds to a direct communication, detected either as isolated within the PDZ domain, or between PDZ and its ligand. Residues in bold are deleterious hotspots. The connected components extracted from the subnetwork where the nodes and edges associated to the ligand are removed are encircled in different colors. (b) The residues involved in communications within PDZ are shown as sticks and colored according to the connected component to which they belong. (c) The residues from the ligand (in black) and from PDZ (in slate) in direct communication are shown as sticks. The communications are displayed as black lines.
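The connected components described in panel (a) can be obtained by dropping every edge that touches a ligand node and running a breadth-first search on what remains. The sketch below assumes the network is given as a list of undirected edges; the function name and the node labels are made up for illustration and do not reproduce the actual network of the figure.

```python
from collections import defaultdict, deque

def components_without(edges, removed):
    """Connected components after removing the nodes in `removed`.

    Nodes whose only edges led to removed nodes end up with no
    neighbours and are not reported.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u in removed or v in removed:
            continue  # skip edges associated with a removed (ligand) node
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first traversal of one component
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components

# Toy network with hypothetical labels; "lig1" stands for a ligand residue.
toy_edges = [("r1", "r2"), ("r2", "lig1"), ("lig1", "r3"),
             ("r3", "r4"), ("r5", "lig1")]
print(components_without(toy_edges, removed={"lig1"}))
```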
Figure 5. Dotplot representing direct and indirect communication between PSD95 residues. Upper triangle: default communication propensity threshold. Lower triangle: threshold corresponding to the 65% quantile of the communication propensity distribution. Each dot stands for the existence of a communication pathway linking the 2 residues indicated on the x- and y-axes. If the 2 residues are less than 4 residues away in the protein sequence, the dot is colored in grey. Otherwise, if the 2 residues are adjacent in a pathway (direct communication), the dot is in black. If they are not adjacent (indirect communication), the dot is colored according to the pathway-based unit to which the residues belong (red or pink, same color code as in Supplementary Fig. S7, on the left). Isolated direct communications are encircled in blue. The secondary structures are also indicated (size of the circles proportional to the persistence of the secondary structure along the MD trajectories). On the left, two communication motifs are mapped onto the 3D structure of PDZ, represented as a cartoon. The pathways (>3 residues) linking the residues in the motifs are displayed as black solid lines. The C-α atoms of the residues belonging to the motif are represented as grey spheres (black smaller spheres outside the motif). Dashed red lines indicate indirect communications.
Table 3. Predictive performance of other sequence- and structure-based methods.

Prediction of mutational outcomes
| Method/Strategy | | Set of mutations | Sens | Spe | PPV | Acc | F1 | MCC |
|---|---|---|---|---|---|---|---|---|
| Structural Dynamics | ENCoM (a) | All: 45 neu. + 59 highly del. | 92 | 31 | 64 | 65 | 75 | 29 |
| | | Filtered: 15 neu. + 41 highly del. | 88 | 33 | 78 | 73 | 83 | 24 |

| Method/Strategy | | Sens | PPV | Spe | Acc | True positives |
|---|---|---|---|---|---|---|
| Structural Dynamics Analysis | STRESS (b) | 25 | 33 | 84 | 70 | I338, L353, V362, L367, A375 |
| | PRS-CG (c) | 75 | 44 | 78 | 73 | I327, I328, G329, G330, I336, I338, I341, A347, L353, I359, V362, L367, H372, A375, L379 |
| | PRS-REMD (d) | 70 | 42 | 70 | 72 | F325, I327, I328, G329, G330, I336, I338, I341, I359, V362, L367, A375, L379, I388 |
| | RIP (e) | 50 | 56 | 87 | 81 | L323, F325, I336, A347, L353, I359, V362, L367, A375, L379 |
| | CARDS (f) | 45 | 36 | 75 | 67 | L323, I327, I328, I338, I341, L353, L367, H372, L379 |
| Sequence Analysis | JET (g) | 85 | 65 | 86 | 86 | L323, G324, F325, I327, G329, G330, I336, |
| | SCA (h) | 75 | 75 | 92 | 88 | L323, F325, I327, G329, G330, I336, |
| | MST (h) | 80 | 64 | 86 | 84 | L323, G324, I327, G329, G330, I336, I341, |
| | DCA (h) | 70 | 70 | 94 | 86 | L323, G324, I327, G329, G330, I336, I338, |
The performance values, sensitivity (Sens), precision or positive predictive value (PPV), specificity (Spe) and accuracy (Acc), are given in percentages. On top, they are computed for two selected sets of mutants (“all” and “filtered”, compare with Table 1). At the bottom, they are computed for the set of 20 highly sensitive positions given in Materials and Methods (compare with Table 2). (a) Performance obtained from ΔΔG values computed by combining Elastic Network Contact Model (ENCoM)[63] and FoldX[91], as described in[64]. Mutations predicted as highly deleterious are those with ΔΔG > 0. (b) Residues identified as interior-critical by STRucturally identified ESSential residues (STRESS)[49]. (c) Residues identified by perturbation response scanning (PRS) using a coarse-grained model (elastic network model)[40]. (d) Residues identified by perturbation response scanning (PRS) using all-atom restrained-replica exchange molecular dynamics (REMD)[40]. (e) Residues identified as forming buried tertiary couplings, defined based on rotamerically induced perturbation (RIP)[38]. (f) Residues displaying strong correlation between their rotameric states along MD simulations and those of all other residues in the protein, as computed by CARDS[56]. Residues in the top 30% of the distribution are considered. (g) Highly conserved residues (see Materials and Methods for a definition of the conservation measure used here). (h) Co-evolved residues detected by three different methods. (i) Residues exposed to the solvent are not considered.