Shandar Ahmad, Akinori Sarai.
Abstract
BACKGROUND: Protein-RNA interactions play an important role in many biological processes such as gene regulation, replication, protein synthesis and virus assembly. Although many structures of various types of protein-RNA complexes have been determined, the mechanism of protein-RNA recognition remains elusive. We have earlier shown that the simplest electrostatic properties, viz. charge, dipole and quadrupole moments, calculated from backbone atomic coordinates of DNA-binding proteins are biased relative to other proteins, and that these quantities can be used to identify DNA-binding proteins. The closely related RNA-binding proteins are investigated in this study. In particular, discrimination between various types of RNA-binding proteins, evolutionary conservation of these bulk electrostatic features and the effect of conformational changes upon complex formation are investigated. A basic binding mechanism for a putative RNA-binding protein (HI1333 from Haemophilus influenzae) is suggested as a potential application of this study.
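As a rough illustration of the three bulk quantities named in the abstract, the following sketch computes a net charge, a dipole magnitude and the eigenvalues of a traceless quadrupole tensor from one point charge per residue. The charge assignment (+1 for Lys/Arg, -1 for Asp/Glu), the use of C-alpha positions, the choice of the geometric centre as origin and the function name `electric_moments` are all assumptions made here for illustration; the record above does not specify the authors' exact procedure or normalization.

```python
import numpy as np

# Assumption: formal charges of +1 (Lys, Arg) and -1 (Asp, Glu); all others neutral.
RESIDUE_CHARGE = {"LYS": 1.0, "ARG": 1.0, "ASP": -1.0, "GLU": -1.0}


def electric_moments(residue_names, ca_coords):
    """Return (net charge, dipole magnitude, quadrupole eigenvalues).

    residue_names: three-letter residue codes, one per residue.
    ca_coords: one (x, y, z) backbone position per residue (assumed C-alpha).
    """
    coords = np.asarray(ca_coords, dtype=float)
    charges = np.array([RESIDUE_CHARGE.get(name, 0.0) for name in residue_names])

    # Measure positions from the geometric centre (an assumed choice of origin).
    centred = coords - coords.mean(axis=0)

    net_charge = charges.sum()
    dipole = (charges[:, None] * centred).sum(axis=0)

    # Traceless quadrupole tensor: sum_i q_i * (3 r_i r_i^T - |r_i|^2 I).
    r_squared = (centred ** 2).sum(axis=1)
    outer = np.einsum("i,ij,ik->jk", charges, centred, centred)
    quadrupole = 3.0 * outer - (charges * r_squared).sum() * np.eye(3)

    # The tensor is symmetric, so eigvalsh applies; eigenvalues come back ascending.
    eigenvalues = np.linalg.eigvalsh(quadrupole)
    return net_charge, float(np.linalg.norm(dipole)), eigenvalues


# Toy four-residue "protein" with hypothetical coordinates (angstroms).
names = ["LYS", "ASP", "ALA", "GLY"]
coords = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
net_charge, dipole_magnitude, quad_eigenvalues = electric_moments(names, coords)
print(net_charge, dipole_magnitude, quad_eigenvalues)
```

Which eigenvalue counts as the "first" one, and whether the moments are scaled by protein size, is left open in this sketch.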
Year: 2011 PMID: 21284850 PMCID: PMC3048485 DOI: 10.1186/1472-6807-11-8
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Figure 1. Distribution of electric charges amongst RNA-binding proteins. Abbreviations: the first letter of each legend denotes the quantity (q: charge, p: dipole moment, Q1: first eigenvalue of the quadrupole moment), followed by the type of proteins considered (bind: all RNA-binding; dbp: DNA-binding; trna, mrna, etc.: proteins binding to tRNA, mRNA, etc.).
Figure 2. Distribution of electric dipole moments amongst RBPs.
Figure 3. Distribution of quadrupole moments amongst RBPs.
Figure 4. Scatterplot of net charge versus dipole moment of RBP, DBP and control data sets.
Table 1. Mean and standard deviation values of electric moments in each class of RNA-binding protein (for a pair-wise comparison, see Table 2).
| Mean (Charge) | Stdev | Mean (Dipole moment) | Stdev | Mean (Quadrupole moment) | Stdev | |
|---|---|---|---|---|---|---|
| 0.075 | 0.129 | 4.613 | 3.681 | 20.156 | 20.455 | |
| 0.077 | 0.105 | 6.387 | 4.183 | 25.869 | 25.199 | |
| -0.017 | 0.012 | 3.196 | 1.437 | 11.862 | 5.208 | |
| 0.192 | 0.186 | 2.138 | 1.186 | 15.036 | 11.541 | |
| 0.025 | 0.037 | 3.728 | 2.499 | 16.616 | 14.455 | |
| 0.048 | 0.057 | 2.991 | 1.817 | 20.226 | 29.061 | |
| -0.020 | 0.043 | 2.664 | 1.748 | 9.972 | 9.294 | |
Table 2. Pair-wise statistical significance (p-values) of differences between groups of RNA-binding proteins (for mean and standard deviation values in each group, see Table 1).
| Group 1 binding to | Group 2 binding to | p-value (charge) | p-value (dipole moment) | p-value (quadrupole moment) |
|---|---|---|---|---|
| | | <2.2E-016 | 4.19E-010 | 3.33E-009 |
| | | 1.04E-003 | 6.08E-003 | 2.48E-004 |
| | | 9.24E-001 | 1.30E-003 | 7.54E-002 |
| | | 2.68E-015 | 1.69E-008 | 1.01E-001 |
| | | 2.09E-002 | 4.30E-003 | 3.69E-001 |
| | | 1.63E-002 | 1.40E-002 | 9.81E-001 |
| | | 1.41E-003 | 1.58E-006 | 1.66E-005 |
| | | 9.70E-004 | 3.26E-002 | 2.88E-001 |
| | | 5.61E-002 | 2.37E-001 | 3.45E-003 |
| | | 2.05E-003 | 7.30E-001 | 2.22E-001 |
| | | 5.09E-004 | 1.91E-001 | 2.01E-001 |
| | | 6.29E-012 | 1.86E-012 | 5.45E-003 |
| | | 2.28E-002 | 1.66E-006 | 4.45E-002 |
| | | 2.27E-002 | 5.46E-007 | 1.26E-001 |
| | | 1.05E-012 | 3.28E-012 | 1.30E-007 |
| | | 2.73E-004 | 1.09E-001 | 7.19E-001 |
| | | <2.2E-016 | 2.21E-005 | 1.48E-001 |
| | | 2.40E-001 | 6.36E-002 | 6.50E-002 |
| | | 5.60E-003 | 1.44E-001 | 4.03E-001 |
| | | <2.2E-016 | 1.46E-006 | 4.53E-005 |
| | | 2.34E-004 | 4.70E-001 | 7.66E-002 |
Table 3. Neural network performance in discriminating between proteins binding to different types of RNA, based on charge, dipole and quadrupole moments*.
| Positive class binding to | Negative class binding to | Number of proteins in +ve class | Number of proteins in -ve class | AUC | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|---|---|---|---|
| | | 160 | 2441 | 0.78 | 0.37 | 0.31 | 0.45 | 0.91 |
| | | 84 | 2441 | 0.79 | 0.26 | 0.23 | 0.30 | 0.94 |
| | | 20 | 2441 | 0.42 | 0.02 | 0.01 | 1.00 | 0.03 |
| | | 17 | 2441 | 0.75 | 0.24 | 0.24 | 0.24 | 0.99 |
| | | 13 | 2441 | 0.10 | 0.01 | 0.01 | 1.00 | 0.02 |
| | | 20 | 84 | 0.70 | 0.45 | 0.32 | 0.75 | 0.64 |
| | | 13 | 84 | 0.56 | 0.30 | 0.18 | 1.00 | 0.37 |
| | | 17 | 84 | 0.44 | 0.32 | 0.19 | 1.00 | 0.28 |
| | | 13 | 2441 | 0.07 | 0.57 | 0.39 | 1.00 | 0.39 |
| | | 13 | 2441 | 0.02 | 0.60 | 0.43 | 1.00 | 0.43 |
| | | 20 | 17 | 0.19 | 0.63 | 0.46 | 1.00 | 0.46 |
| | | 143 | 2441 | 0.72 | 0.22 | 0.20 | 0.26 | 0.90 |
| | | 160 | 143 | 0.58 | 0.69 | 0.53 | 1.00 | 0.53 |
| | | 84 | 143 | 0.74 | 0.64 | 0.52 | 0.83 | 0.65 |
| | | 20 | 143 | 0.33 | 0.24 | 0.13 | 1.00 | 0.20 |
| | | 13 | 143 | 0.07 | 0.16 | 0.09 | 1.00 | 0.14 |
* AUC is the area under the ROC curve, F-measure (F1) is the highest harmonic mean of precision and recall, and accuracy is the number of correct predictions relative to all predictions at peak F-measure. In all cases, a neural network with three units in the hidden layer was trained in a leave-one-out procedure, and training was performed for a fixed number of epochs without using information from the left-out protein.
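The metrics described in this footnote can be sketched as follows, assuming one prediction score per protein has already been obtained (for example from leave-one-out predictions). F1 is taken here as the harmonic mean of precision and recall, its standard definition, and the threshold scan, the tie handling in the AUC and the function names `peak_f1_metrics` and `roc_auc` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def peak_f1_metrics(scores, labels):
    """Scan every observed score as a threshold; report metrics at the best F1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = None
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        tp = int(np.sum(predicted & labels))
        fp = int(np.sum(predicted & ~labels))
        fn = int(np.sum(~predicted & labels))
        tn = int(np.sum(~predicted & ~labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        if best is None or f1 > best["f1"]:
            best = {
                "threshold": float(threshold),
                "f1": f1,
                "precision": precision,
                "recall": recall,
                "accuracy": (tp + tn) / labels.size,
            }
    return best


def roc_auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    positives, negatives = scores[labels], scores[~labels]
    diff = positives[:, None] - negatives[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)  # ties count as half
    return float(wins / (positives.size * negatives.size))


# Hypothetical scores for six proteins (1 = positive class, 0 = negative class).
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(example_scores, example_labels))
print(peak_f1_metrics(example_scores, example_labels))
```

Because accuracy is read off at the F1-optimal threshold, it can be very low when recall is pushed to 1.00 on a heavily imbalanced set.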
Table 4. Electric moments of RNA-binding proteins as pairs of RNA-complexed and monomeric structures*.
| Detailed protein-wise comparison | | | | | ED |
|---|---|---|---|---|---|
| 30S Ribosomal protein S15 (1fjgO, 2fkxA, 100%) | 4.37 | 4.44 | 5.77 | 7.36 | rRNA binding: ED(P) = 0.7 ED(Q) = 1.6 |
| 30S Ribosomal protein S6 (1fjgF, 1louA, 99%) | 2.34 | 2.66 | 7.90 | 8.74 | |
| 30S Ribosomal protein S7 (1fjgG, 1rssA, 100%) | 4.95 | 3.30 | 22.12 | 9.56 | |
| 30S Ribosomal protein S19 (1ibmS, 1qkfA, 100%) | 5.04 | 3.83 | 12.96 | 8.43 | |
| 30S Ribosomal protein S16 (1hnwP, 1emwA, 100%) | 4.27 | 4.31 | 7.22 | 6.76 | |
| Ribosomal protein L11 (1hc8A, 2f0wA, 100%) | 1.79 | 1.84 | 6.64 | 6.41 | |
| Ribosomal protein L25 (1d6kA, 1b75A, 100%) | 3.59 | 3.35 | 10.26 | 9.90 | |
| 60S Ribosomal protein L30 (1cn8A, 1cn7A, 100%) | 2.94 | 2.41 | 4.61 | 4.97 | |
| Glutaminyl-tRNA synthetase (1euyA, 1nylA, 98%) | 1.10 | 1.15 | 14.24 | 15.46 | tRNA binding: ED(P) = 0.6 ED(Q) = 1.2 |
| Queuine tRNA-ribosyltransferase (1q2rA, 1r5yA, 100%) | 2.19 | 2.26 | 2.83 | 3.75 | |
| Glutamyl-tRNA synthetase (1g59A, 1j09A, 99%) | 1.64 | 1.50 | 13.21 | 13.77 | |
| Aspartyl tRNA-synthetase (1asyA, 1eovA, 100%) | 4.03 | 5.06 | 7.48 | 9.39 | |
| Elongation factor TU (1b23P, 2c78A, 98%) | 3.62 | 3.44 | 12.65 | 11.21 | |
| Arginyl tRNA synthetase (1f7uA, 1bs2A, 100%) | 2.22 | 2.31 | 12.44 | 13.83 | |
| Small protein B (1p6vA, 1k8hA, 98%) | 0.69 | 1.71 | 16.40 | 18.62 | |
| Pseudouridine synthase B (1k8wA, 1r3fA, 100%) | 1.65 | 1.90 | 18.31 | 17.41 | |
| Tyrosyl tRNA synthetase (1j1uA, 2ag6A, 96%) | 1.34 | 1.75 | 16.60 | 15.22 | |
| Bacteriophage coat protein MS2 (1aq3A, 1mscA, 98%) | 2.08 | 2.08 | 4.77 | 3.65 | Viral RNA binding: ED(P) = 0.7 ED(Q) = 1.9 |
| Minor core protein lambda 3 (1n1hA, 1mukA, 100%) | 1.65 | 1.42 | 11.95 | 12.44 | |
| RNA polymerase HC-J4 (1nb7A, 1gx5A, 96%) | 3.81 | 2.77 | 12.44 | 10.34 | |
| HIV-1 nucleocapsid protein (1a1tA, 1mfsA, 100%) | 4.24 | 3.65 | 12.74 | 23.69 | |
| NHP2-like protein 1 (1e7kA, 2jnbA, 100%) | 3.27 | 5.06 | 4.28 | 8.61 | Others: ED(P) = 0.9 ED(Q) = 2.0 |
| Spliceosomal U1A protein (1audA, 1fhtA, 98%) | 1.36 | 2.60 | 7.93 | 12.88 | |
| Pumilio homology domain (1m8yB, 1m8zA, 100%) | 4.00 | 4.05 | 20.01 | 20.46 | |
| Rho transcription termination factor (2a8vA, 1a62A) | 1.84 | 1.86 | 12.42 | 10.87 | |
| Transcription factor IIIA (**) (1un6B, 2j7jA, 100%) | 3.92 | 1.94 | 16.83 | 30.18 | |
| VP39 protein (1av6A, 4dcgA, 98%) | 3.18 | 3.09 | 6.53 | 6.25 | |
*P and Q stand for dipole moment and quadrupole moment (first eigenvalue), respectively. Euclidean distance (ED) here refers to the root mean squared difference between bound and unbound moments for the given pair. (**) This protein is reported to have two modes of interaction and shows a very large conformational change via domain movement (rigid-body RMSD is 9.2 Å). However, domain-wise comparison shows almost no change in moments (see Table 5).
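One plausible reading of the ED definition in this footnote is a root mean squared difference taken over the bound/unbound pairs in a group. The sketch below follows that reading; the function name `rms_difference` is hypothetical, and the assumption that the first two numeric columns of the table hold dipole moments is inferred from the magnitudes, not stated in the record. The grouping and normalization behind the published ED values are not specified here, so the output need not match them.

```python
import math


def rms_difference(first_form, second_form):
    """Root mean squared difference between paired moment values."""
    if len(first_form) != len(second_form) or not first_form:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - b) ** 2 for a, b in zip(first_form, second_form)]
    return math.sqrt(sum(squared) / len(squared))


# Assumed dipole columns for the four viral RNA-binding pairs listed in the table.
dipole_first = [2.08, 1.65, 3.81, 4.24]
dipole_second = [2.08, 1.42, 2.77, 3.65]
print(round(rms_difference(dipole_first, dipole_second), 2))
```

The measure is symmetric, so it does not matter which of the two columns holds the complexed form.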
Figure 5. Dipole moments of RNA-binding proteins in complexed structure compared with their independently solved monomeric form.
Figure 6. Quadrupole moments of RNA-binding proteins in complexed structure compared with their independently solved monomeric form.
Figure 7. Superimposed structures of pairs of RBPs in their RNA-complexed and unbound monomeric forms. The left panel shows 30S ribosomal protein S16 (complex PDB ID 1hnw_P in red, unbound PDB ID 1emw_A in blue); the right panel shows a zinc finger protein in its complexed (1un6_B, blue) and unbound (2j7j_A, red) forms. Dipole and quadrupole moment values for ribosomal protein S16 remain almost unchanged despite conformational changes (Table 4), whereas the zinc finger pair shows a significant difference between the two forms. However, this zinc finger protein is a rare example of very large conformational change in RBPs and is the only exception among the pairs compared in Table 4, where complexed and unbound structures otherwise have similar moment values. Further analysis of this exception showed that the moments of the individual domains remain largely unchanged.
Table 5. Electric moments of the three domains in the zinc finger protein, which undergoes a very large conformational change.
| Domain | Dipole moment (bound; 1un6) | Dipole moment (unbound; 2j7j) | Quadrupole moment (bound; 1un6) | Quadrupole moment (unbound; 2j7j) |
|---|---|---|---|---|
| Domain I (1-28) | 5.5 | 5.4 | 5.1 | 5.3 |
| Domain II (29-57) | 6.0 | 6.0 | 7.6 | 6.2 |
| Domain III (58-87) | 3.3 | 3.6 | 4.3 | 4.8 |
Figure 8. Distribution of positively charged (Lys and Arg) residues (blue filled surface) and negatively charged (Asp and Glu) residues (red filled surface) in HI1333, a hypothetical protein from Haemophilus influenzae (Protein Data Bank ID 1JO0). The protein has a significantly high dipole moment, and its RNA-binding region appears to be clearly separated from the negatively charged region by a vertical plane.