Xiang Gao, Xiaoqun Dong, Xuanxuan Li, Zhijie Liu, Haiguang Liu.
Abstract
Disulfide bonds are covalent bonds between the sulfur atoms of cysteine pairs in protein structures. Because disulfide bonds are important for protein folding and structural stability, artificial disulfide bonds are often engineered by cysteine mutation to enhance protein stability. To facilitate the experimental design, we implemented a neural network based method that predicts amino acid pairs suitable for cysteine mutations that form engineered disulfide bonds. The neural network was trained with high-resolution structures curated from the Protein Data Bank. The testing results show that the proposed method recognizes 99% of natural disulfide bonds. In the test with engineered disulfide bonds, the algorithm achieves accuracy similar to that of other state-of-the-art algorithms on a published dataset, and better performance for two comprehensively studied proteins, with 70% accuracy, demonstrating potential applications in protein engineering. The neural network framework allows the full set of features in distance space to be exploited, and therefore improves the accuracy of disulfide bond engineering site prediction. The source code and a web server are available at http://liulab.csrc.ac.cn/ssbondpre.
Year: 2020 PMID: 32587353 PMCID: PMC7316719 DOI: 10.1038/s41598-020-67230-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The disulfide bond and training dataset derivation. (a) A disulfide bond observed in a protein (PDB ID: 1IL8); a negative sample is generated by finding the nearest neighbors between the residues adjacent to the cysteine residues. (b) The average distances between atoms of disulfide-bonded cysteine residues, shown as a heatmap. (c) The average distances between atoms of amino acid pairs in the negative sample set.
The training and testing datasets from non-redundant databases.
| Database | No. of protein chains | No. of disulfide bonds | No. of negative data | Training data | Testing data |
|---|---|---|---|---|---|
| VAST | 14,647 | 12,496 | 12,496 | 18,992 | 6,000 |
| PISCES | 15,139 | 5,122 | 5,122 | 8,244 | 2,000 |
Figure 2. The fully connected neural network architecture. The input layer is a 45-dimensional vector, followed by two hidden layers whose nodes are activated by the ReLU function.
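The architecture in the caption can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the hidden layer widths (`HIDDEN_1`, `HIDDEN_2`), the random weight initialization, and the sigmoid output unit are hypothetical choices that the caption does not specify; only the 45-dimensional input and the two ReLU hidden layers come from the source.

```python
import math
import random

def relu(values):
    # Element-wise rectified linear unit.
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # One fully connected layer: each row of `weights` feeds one output node.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def sigmoid(z):
    # Numerically stable logistic function (assumed output activation).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_layer(n_in, n_out, rng):
    # Random placeholder weights; a trained model would load learned values.
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(features, layers):
    # ReLU on every hidden layer, sigmoid on the single output node.
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    out_weights, out_biases = layers[-1]
    return sigmoid(dense(hidden, out_weights, out_biases)[0])

rng = random.Random(0)
HIDDEN_1, HIDDEN_2 = 32, 16  # hypothetical widths, not taken from the paper
layers = [
    init_layer(45, HIDDEN_1, rng),
    init_layer(HIDDEN_1, HIDDEN_2, rng),
    init_layer(HIDDEN_2, 1, rng),
]
features = [rng.uniform(3.0, 8.0) for _ in range(45)]  # placeholder distances
score = forward(features, layers)
```

With untrained weights the score is meaningless; the point is only the data flow from a 45-value input through two ReLU layers to a single score.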
Figure 3. The histogram of distances between Cα atoms of disulfide-bonded cysteines.
Figure 4. The classification performance of the fully connected neural network model. (a) The receiver operating characteristic (ROC) curve on the VAST test dataset for the network trained with VAST data. (b) The ROC curve on the PISCES test data for the same network. Both ROC curves show very good performance in the prediction of disulfide bonds. The number of samples in each test dataset is shown in parentheses. The area under the curve (AUC) is 0.995 for the VAST test data and 0.998 for the PISCES test data. The high true positive rate at an extremely low false positive rate indicates excellent classification performance.
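For readers unfamiliar with the AUC values quoted in the caption, the sketch below computes the area under the ROC curve with the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive sample scores higher, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
def roc_auc(labels, scores):
    # AUC as the probability that a random positive outranks a random negative.
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Toy data: 1 = disulfide bonded pair, 0 = negative sample (made-up scores).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.99, 0.97, 0.40, 0.55, 0.10, 0.02]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is fine for small examples; for test sets of several thousand samples a sort-based rank computation gives the same value more efficiently.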
The performance on the prediction of disulfide bond engineering sites.
| PDB ID | Mutation | SSbondPre Abs.Rank* | SSbondPre Rel.Rank** | MAESTRO Abs.Rank | MAESTRO Rel.Rank | MAESTRO-Score Abs.Rank | MAESTRO-Score Rel.Rank | Salam Abs.Rank | Salam Rel.Rank |
|---|---|---|---|---|---|---|---|---|---|
| 1FG9 | Glu7:A–Ser69:A | 45 | 0.16 | 1 | 0.04 | 0 | 0 | 15 | 0.71 |
| 1LMB | Tyr88:3–Tyr88:4 | 2 | 0.03 | 10 | 0.2 | 9 | 0.18 | 32 | 0.48 |
| 1RNB | Ala43–Ser80 | 0 | 0.00 | 2 | 0.03 | 9 | 0.14 | 3 | 0.07 |
| 1RNB | Ser85–His102 | 15 | 0.41 | 33 | 0.52 | 40 | 0.63 | 0 | 0 |
| 1SNO | Gly79–Asn118*** | 45 | 0.73 | 3 | 0.04 | 4 | 0.05 | 22 | 0.34 |
| 1XNB | Ser100–Asn148 | 1 | 0.02 | 1 | 0.01 | 13 | 0.1 | 8 | 0.08 |
| 2CBA | Leu60–Ser173 | 33 | 0.33 | 66 | 0.45 | 68 | 0.46 | 55 | 0.47 |
| 2CI2 | Thr22–Val82 | 9 | 0.60 | 5 | 0.18 | 4 | 0.14 | 0 | 0 |
| 2LZM | Ile9–Leu164 | 35 | 0.45 | 31 | 0.44 | 38 | 0.54 | 13 | 0.21 |
| 2RN2 | Cys13–Asn44 | 5 | 0.12 | 14 | 0.16 | 21 | 0.24 | 25 | 0.33 |
| 2ST1 | Thr22–Ser87 | 86 | 0.67 | 37 | 0.16 | 30 | 0.13 | 44 | 0.23 |
| 3GLY | Asn20–Ala27 | 207 | 0.95 | 104 | 0.39 | 163 | 0.62 | 187 | 0.82 |
| 3GLY | Thr246–Cys320 | 54 | 0.25 | 35 | 0.13 | 34 | 0.13 | 19 | 0.08 |
| 4DFR | Pro39–Cys85 | 35 | 0.45 | 11 | | 11 | | 16 | |
| 9RAT | Ala4–Val118 | 35 | 0.65 | 8 | 0.15 | 10 | 0.19 | 25 | 0.68 |
| Average | | 40.5 | 0.39 | 24.0 | 0.20 | 30.2 | 0.24 | 30.9 | 0.31 |
| Median | | 35 | 0.41 | 14 | 0.16 | 21 | 0.18 | 19 | 0.23 |
*The absolute rank (abs. rank) was based on the scores of each method, starting with 0.
**The relative rank (rel. rank) was calculated as abs. rank / (total number of predictions − 1).
***Gly79 was substituted with alanine before the SSbondPre prediction.
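The two rank definitions in the footnotes can be written out as a short sketch. The helper names `absolute_rank` and `relative_rank` are hypothetical, and the assumption that a higher score means a better candidate is ours; the table does not list the total number of predictions per protein, so the example uses made-up scores.

```python
def absolute_rank(scores, target_index):
    # Zero-based rank: number of candidates scoring strictly higher than the
    # target. Assumes that a higher score means a better candidate.
    target = scores[target_index]
    return sum(1 for s in scores if s > target)

def relative_rank(abs_rank, total_predictions):
    # rel. rank = abs. rank / (total number of predictions - 1),
    # so the best candidate maps to 0.0 and the worst to 1.0.
    if total_predictions < 2:
        raise ValueError("relative rank needs at least two predictions")
    if not 0 <= abs_rank < total_predictions:
        raise ValueError("absolute rank must lie in [0, total_predictions)")
    return abs_rank / (total_predictions - 1)

# Made-up scores for five candidate residue pairs; index 2 is the known site.
example_scores = [0.91, 0.99, 0.95, 0.40, 0.97]
example_abs = absolute_rank(example_scores, 2)
example_rel = relative_rank(example_abs, len(example_scores))
```

Because the absolute rank starts at 0, dividing by the number of predictions minus one keeps the relative rank inside the closed interval from 0 to 1.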
Summary for prediction results for Bril and Flavodoxin.
| | Method | Bril | Flavodoxin |
|---|---|---|---|
| Number of candidate ssbonds# | | 378 | 464 |
| Number of predicted ssbonds | SSbondPre | 40 | 54 |
| | MaestroWeb | 40 | 105 |
| | DbD2 | 23 | 21 |
| Experimentally tested mutants | | 10 | 3 |
| Experimentally validated ssbonds | | 6 | 3 |
#Candidate bonds: number of amino acid pairs that passed the Cα distance criterion.
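The Cα distance criterion mentioned in the footnote acts as a pre-filter on residue pairs. The sketch below shows one way such a filter could look; the function name `candidate_pairs`, the input format, and the cutoff value used in the example are assumptions made for illustration, since the cutoff applied by the authors is not stated here.

```python
import math
from itertools import combinations

def candidate_pairs(ca_coords, max_ca_distance):
    # ca_coords maps a residue number to the (x, y, z) position of its C-alpha
    # atom, in angstroms. Returns residue pairs within the distance cutoff.
    pairs = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(ca_coords.items()), 2):
        if math.dist(xyz_a, xyz_b) <= max_ca_distance:
            pairs.append((res_a, res_b))
    return pairs

# Toy coordinates on a line; 7.0 is a placeholder cutoff, not the paper's value.
toy_ca = {
    1: (0.0, 0.0, 0.0),
    2: (3.8, 0.0, 0.0),
    3: (7.6, 0.0, 0.0),
    4: (20.0, 0.0, 0.0),
}
toy_pairs = candidate_pairs(toy_ca, max_ca_distance=7.0)
```

Only pairs that pass this inexpensive geometric check would then be scored by the classifier, which keeps the number of candidates in the hundreds for proteins of this size.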
Disulfide bond prediction results for the proteins Bril and Flavodoxin.
| Protein | Mutant | Formed disulfide bond# | SSbondPre score* | SSbondPre (rank/total) | MaestroWeb (rank/total) | DbD2** (outcome) |
|---|---|---|---|---|---|---|
| Bril | Q41C-F65C | Yes | 0.99 | 3/40 | 8/40 | Yes |
| | A20C-Q25C | Yes | 0.99 | 6/40 | 2/40 | |
| | T9C-A36C | Yes | 0.99 | 7/40 | 13/40 | Yes |
| | V16C-A29C | Yes | 0.99 | 8/40 | 11/40 | |
| | L78C-A87C | Yes | 0.99 | 9/40 | 9/40 | Yes |
| | K27C-A79C | Yes | 0.98 | 12/40 | 19/40 | |
| Flavodoxin | R125C-C102 | Yes | 0.99 | 5/54 | 3/105 | Yes |
| | N14C-C93 | Yes | 0.97 | 13/54 | 15/105 | No |
| | A43C-L74C | Yes | 0.78 | 37/54 | No | No |
#Experimentally observed bonds: Yes (bonded) or No (nonbonded).
*Score computed with the neural network model.
**DbD2 results are not ranked, so the outcomes are labelled as Yes (bonded) or No (nonbonded).
Figure 5. The web server for the prediction of disulfide bonds. (a) The input page, where users upload a PDB file or provide a PDB ID. (b) The result page, with the selected prediction highlighted in a ball-and-stick representation and the backbone shown in a cartoon representation.
Figure 6. The relevance of the distance features to the classification outcome. Out of the 45 unique distances, 20 distances have negligible influence on the classification performance. The distances between Cβ and main chain atoms of the pairing residue are important features in disulfide bond classifications.
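One way to arrive at 45 unique distances is to take all pairwise distances among ten atoms, since 10 × 9 / 2 = 45. The sketch below assumes five atoms per residue (N, Cα, C, O, Cβ), which is consistent with the caption's mention of Cβ and main chain atoms but is our assumption rather than a confirmed detail of the method. The function name `distance_features` and the coordinates are placeholders.

```python
import math
from itertools import combinations

# Assumed atom set per residue: four main chain atoms plus C-beta.
ATOMS = ("N", "CA", "C", "O", "CB")

def distance_features(residue_a, residue_b):
    # Each residue is a dict mapping an atom name to its (x, y, z) position.
    # Returns every pairwise distance among the ten atoms of the two residues.
    coords = [residue_a[name] for name in ATOMS] + [residue_b[name] for name in ATOMS]
    return [math.dist(p, q) for p, q in combinations(coords, 2)]

# Arbitrary placeholder coordinates in angstroms, not from a real structure.
res_a = {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
         "O": (1.3, 2.4, 0.0), "CB": (2.0, -0.8, 1.2)}
res_b = {"N": (5.0, 0.5, 3.0), "CA": (5.5, -0.8, 3.5), "C": (7.0, -0.8, 3.8),
         "O": (7.6, 0.2, 4.2), "CB": (4.7, -1.3, 4.7)}
toy_features = distance_features(res_a, res_b)
```

Under this assumption the vector mixes intra-residue and inter-residue distances, which would explain why some of the 45 features carry little information for the classifier.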