Abstract
Protein-RNA interactions play important roles in biological systems. Searching for regular patterns in protein-RNA binding interfaces is important for understanding how proteins and RNA recognize each other and bind to form a complex. Herein, we present a graph-mining method for discovering biological patterns in protein-RNA interfaces. We represented known protein-RNA interfaces as graphs and then discovered graph patterns enriched in the interfaces. Comparison of the discovered graph patterns with UniProt annotations showed that the graph patterns overlapped significantly with residue sites that had been shown experimentally to be crucial for RNA binding. Using 200 patterns as input features, a support vector machine was able to classify protein surface patches into RNA-binding sites and non-RNA-binding sites with 84.0% accuracy and 88.9% precision. We built a simple scoring function that counted the total number of graph patterns occurring in a protein-RNA interface. That scoring function was able to discriminate near-native protein-RNA complexes from docking decoys with a performance comparable to that of a state-of-the-art complex scoring function. Our work also revealed possible patterns that might be important for binding affinity.
Keywords: binding sites; common subgraphs; graph patterns; protein–RNA interactions; recurrent patterns; scoring functions
Year: 2016 PMID: 27892693 PMCID: PMC5220573 DOI: 10.1089/cmb.2016.0128
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479
Classification of RNA-Binding Sites Versus Nonbinding Sites Using libSVM
| Subgraph size | Patterns used | TP | FP | FN | TN | Accuracy (%) | Precision (%) | |
|---|---|---|---|---|---|---|---|---|
| Three nodes | Top 100 | 108 | 48 | 36 | 96 | 74.0 | 69.2 | 0.71 |
| | Top 200 | 94 | 28 | 50 | 116 | 77.4 | 77.0 | 0.73 |
| | Top 300 | 101 | 30 | 43 | 114 | 76.7 | 77.1 | 0.75 |
| | Top 400 | 93 | 13 | 51 | 131 | 77.8 | 87.8 | 0.78 |
| | Top 500 | 95 | 16 | 49 | 128 | 76.4 | 85.6 | 0.77 |
| Four nodes | Top 100 | 88 | 15 | 56 | 129 | 76.0 | 85.4 | 0.75 |
| | Top 200 | 112 | 14 | 32 | 130 | 84.0 | 88.9 | 0.84 |
| | Top 300 | 118 | 29 | 26 | 115 | 80.9 | 80.3 | 0.81 |
| | Top 400 | 109 | 21 | 35 | 123 | 80.9 | 83.8 | 0.81 |
FN, false negative; FP, false positive; TN, true negative; TP, true positive.
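The reported 84.0% accuracy and 88.9% precision can be reproduced from the confusion counts of the top-200 four-node row. The following is a minimal sketch, assuming the standard definitions accuracy = (TP + TN) / total and precision = TP / (TP + FP), and assuming the four counts in that row appear in the order TP, FP, FN, TN. The function name `accuracy_and_precision` is illustrative and does not come from the paper.

```python
def accuracy_and_precision(tp, fp, fn, tn):
    """Return (accuracy, precision) as percentages from confusion counts."""
    total = tp + fp + fn + tn
    predicted_positive = tp + fp
    if total == 0 or predicted_positive == 0:
        raise ValueError("need at least one sample and one positive prediction")
    accuracy = 100.0 * (tp + tn) / total
    precision = 100.0 * tp / predicted_positive
    return accuracy, precision


# Counts from the top-200 four-node row (assumed order: TP, FP, FN, TN).
acc, prec = accuracy_and_precision(tp=112, fp=14, fn=32, tn=130)
print(f"accuracy = {acc:.1f}%, precision = {prec:.1f}%")
# prints "accuracy = 84.0%, precision = 88.9%"
```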
Comparison Between Different Classification Algorithms
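| Algorithm | TP | FP | FN | TN | Accuracy (%) | Precision (%) | |
|---|---|---|---|---|---|---|---|
| Decision tree (J48) | 115 | 116 | 29 | 28 | 49.6 | 49.7 | 0.50 |
| Random Forest | 73 | 7 | 71 | 137 | 72.9 | 91.2 | 0.73 |
| libSVM | 112 | 14 | 32 | 130 | 84.0 | 88.9 | 0.84 |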
Two hundred 4-node subgraphs were used to encode surface sites.
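One plausible way to encode a surface site with a fixed list of subgraph patterns is a presence/absence vector, one entry per pattern. The sketch below is only an illustration under that assumption: the source does not say whether the encoding is binary or count-based, and the pattern identifiers (`P1`, `P2`, ...) and the function `encode_site` are hypothetical. Subgraph matching itself is assumed to have been done upstream.

```python
def encode_site(site_patterns, feature_patterns):
    """Encode a surface site as a 0/1 vector over an ordered pattern list.

    site_patterns: identifiers of patterns found in the site (hypothetical).
    feature_patterns: ordered identifiers of the patterns used as features.
    """
    present = set(site_patterns)
    return [1 if pattern in present else 0 for pattern in feature_patterns]


# Toy example with four feature patterns instead of two hundred.
feature_patterns = ["P1", "P2", "P3", "P4"]
print(encode_site(["P3", "P1", "P9"], feature_patterns))  # [1, 0, 1, 0]
```

Vectors of this form could then be passed to any standard classifier, such as the three compared above.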
Overlap Between Subgraph Patterns and UniProt Regions
| PDB ID | Residues in subgraph patterns | Residues in UniProt regions | Overlapping residues |
|---|---|---|---|
| 1YVP | 14 | 165 | 12 |
| 1K8W | 17 | 29 | 4 |
| 1KNZ | 4 | 146 | 4 |
| 3MOJ | 5 | 76 | 5 |
| 4IG8 | 13 | 59 | 8 |
| 2ZKO | 10 | 73 | 10 |
| 1H4S | 4 | 30 | 0 |
| 3DH3 | 22 | 8 | 4 |
| 2A8V | 4 | 17 | 1 |
| 3FOZ | 19 | 28 | 13 |
| 3RW6 | 13 | 117 | 0 |
| 4KXT | 20 | 95 | 14 |
| 1JID | 4 | 9 | 0 |
| 2BH2 | 16 | 24 | 6 |
| 1N78 | 16 | 16 | 0 |
| 3MDI | 14 | 146 | 10 |
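Overlap Between Subgraph Patterns and UniProt Mutagens
| PDB ID | Residues in subgraph patterns | UniProt mutagenesis sites | Overlapping residues |
|---|---|---|---|
| 2F8K | 5 | 2 | 1 |
| 1FEU | 9 | 8 | 4 |
| 2XS2 | 8 | 4 | 3 |
| 2A1R | 4 | 4 | 1 |
| 2A8V | 4 | 2 | 2 |
| 2BGG | 10 | 5 | 3 |
| 3PEY | 4 | 8 | 2 |
| 3MDI | 14 | 5 | 2 |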
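Each row of the two overlap tables can be read as a set intersection. The sketch below assumes that the three numbers per structure are the count of residues covered by subgraph patterns, the count of residues annotated in UniProt, and the count of residues in both. Residues are represented as integer sequence positions; the positions shown are made up for illustration and are not taken from any listed structure.

```python
def overlap_counts(pattern_residues, annotated_residues):
    """Return (n_pattern, n_annotated, n_overlap) for two residue collections."""
    pattern = set(pattern_residues)
    annotated = set(annotated_residues)
    return len(pattern), len(annotated), len(pattern & annotated)


# Hypothetical residue positions, for illustration only.
pattern_sites = [12, 15, 16, 40, 41]
mutagenesis_sites = [15, 41, 77]
print(overlap_counts(pattern_sites, mutagenesis_sites))  # (5, 3, 2)
```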

Success rate and hit count varied with the number of subgraph patterns used in our scoring method. Both measurements peaked when 1200 patterns were used, and both rose sharply when the number of patterns increased from 700 to 800.
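The scoring method counts the graph patterns that occur in a protein-RNA interface. A minimal sketch of that idea follows, under several assumptions that are not confirmed by the source: subgraph matching has already been done, so each docking pose is given as a list of the pattern identifiers found in its interface; every occurrence is counted (rather than each distinct pattern once); and poses are ranked by descending score. All names and identifiers here are hypothetical.

```python
from collections import Counter


def pattern_score(interface_patterns, selected_patterns):
    """Count occurrences in an interface of patterns from the selected set."""
    selected = set(selected_patterns)
    counts = Counter(interface_patterns)
    return sum(n for pattern, n in counts.items() if pattern in selected)


def rank_decoys(decoys, selected_patterns):
    """Return decoy names sorted from highest to lowest pattern score."""
    scores = {
        name: pattern_score(patterns, selected_patterns)
        for name, patterns in decoys.items()
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Toy data: three selected patterns and three docking poses.
selected = ["P1", "P2", "P3"]
decoys = {
    "decoy_a": ["P1", "P1", "P9"],        # score 2
    "decoy_b": ["P1", "P2", "P3", "P3"],  # score 4
    "decoy_c": ["P8"],                    # score 0
}
print(rank_decoys(decoys, selected))  # ['decoy_b', 'decoy_a', 'decoy_c']
```

Changing the length of `selected` corresponds to varying the number of patterns used in the scoring method.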

Comparison between our scoring method and DECK-RP. Our scoring method is comparable to DECK-RP across the whole range of prediction numbers.