Huiying Zhao, Jihua Wang, Yaoqi Zhou, Yuedong Yang.
Abstract
As increasingly inexpensive sequencing techniques uncover more and more protein sequences, determining their functions has become an urgent task. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than the low-resolution two-state (binding or non-binding) prediction that most existing techniques provide. The method first predicts the protein-DNA complex structure with the template-based structure prediction technique HHblits, and then predicts binding affinity with a knowledge-based energy function (distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on the predicted DNA-binding complex structures. Its application to the human proteome yields more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in apo forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
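The reported figures can be related to one another through the standard confusion-matrix definitions. The following is a minimal sketch in Python; the counts (116 true positives, 7 false positives) are hypothetical values back-calculated from the stated percentages for 179 binders and 3797 non-binders, not numbers taken from the paper, and `binary_metrics` is an illustrative helper name.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, precision, specificity, accuracy and MCC."""
    sn = tp / (tp + fn)                      # sensitivity (recall)
    pr = tp / (tp + fp)                      # precision
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "PR": pr, "SP": sp, "ACC": acc, "MCC": mcc}


# Assumed counts (not from the source): chosen so that 116/179 is about 65%
# sensitivity and 116/123 is about 94% precision.
metrics = binary_metrics(tp=116, fp=7, tn=3790, fn=63)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts the MCC rounds to 0.77, which suggests the three headline numbers are mutually consistent; because non-binders dominate the data set, accuracy alone would say little, which is presumably why MCC is reported.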
Year: 2014 PMID: 24792350 PMCID: PMC4008587 DOI: 10.1371/journal.pone.0096694
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Performance of various methods for DNA-binding protein prediction (leave-one-out cross validation).
Performance of various methods for predicting DNA-binding proteins.
| Methods | SN(%) | PR(%) | SP(%) | ACC(%) | MCC |
|---|---|---|---|---|---|
| DBD-Hunter^c | 61 | 79 | 92 | - | 0.681 |
| DDNA3^d | 60 | 91 | 99 | 98 | 0.73 |
| DDNA3O^d | 64 | 93 | 99.8 | - | 0.76 |
| | | | | | |
| PSI-BLAST (NCBI)^e | 49 | 64 | 87 | - | 0.540 |
| PSI-BLAST (Uniprot)^e | 43 | 75 | 93 | - | 0.553 |
| | | | | | |
| Prospector^e | 53 | 74 | 91 | - | 0.609 |
| HHblits | 61 | 69 | 99 | 97 | 0.639 |
| SPARKS-X | 45 | 95 | 99 | 97 | 0.647 |
| | | | | | |
| SPARKS-X+Energy | 53 | 84 | 99 | 97 | 0.652 |
| DBD-Threader^e | 56 | 86 | 96 | - | 0.680 |
| HHblits+Energy | 65 | 94 | 99 | 98 | 0.771 |

SN, sensitivity; PR, precision; SP, specificity; ACC, accuracy; MCC, Matthews correlation coefficient. ^b Methods based on known protein structures. ^c From Ref. [47]. ^d From Ref. [53]. ^e From Ref. [48].
Detecting DBPs in 18 structural folds shared by DNA-binding and non-binding proteins.
| Fold | | | |
|---|---|---|---|
| A.38 | 5/1 | 5/0 | 5/0 |
| A.74 | 4/10 | 1/2 | 1/2 |
| C.52 | 14/4 | 3/0 | 4/0 |
| A.4 | 50/11 | 23/0 | 25/0 |
| A.6 | 2/2 | 2/0 | 2/0 |
| C.66 | 4/19 | 4/15 | 3/0 |
| C.62 | 2/10 | 2/0 | 2/0 |
| G.39 | 2/12 | 1/0 | 1/0 |
| C.37 | 5/87 | 2/5 | 2/0 |
| D.151 | 2/2 | 2/2 | 1/2 |
| A.60 | 7/1 | 4/0 | 5/0 |
| D.95 | 6/1 | 2/0 | 3/0 |
| C.55 | 8/35 | 2/0 | 1/0 |
| B.82 | 1/37 | 0/0 | 1/0 |
| C.53 | 1/5 | 1/0 | 1/0 |
| H.1 | 5/43 | 2/0 | 2/0 |
| D.129 | 3/13 | 0/0 | 1/0 |
| D.218 | 1/8 | 1/0 | 1/0 |
| Total | 122/301 | 57/24 | 61/4 |
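The per-fold rows can be checked against the Total row with a few lines of Python. This is only a sketch: the column headers are not available here, so the code treats each cell as an anonymous (first, second) pair, and the final ratio assumes, without confirmation from the source, that the last two columns count correctly flagged versus wrongly flagged proteins for two different predictors.

```python
# Each row: fold, then the three (first, second) pairs exactly as printed.
ROWS = [
    ("A.38", (5, 1), (5, 0), (5, 0)),
    ("A.74", (4, 10), (1, 2), (1, 2)),
    ("C.52", (14, 4), (3, 0), (4, 0)),
    ("A.4", (50, 11), (23, 0), (25, 0)),
    ("A.6", (2, 2), (2, 0), (2, 0)),
    ("C.66", (4, 19), (4, 15), (3, 0)),
    ("C.62", (2, 10), (2, 0), (2, 0)),
    ("G.39", (2, 12), (1, 0), (1, 0)),
    ("C.37", (5, 87), (2, 5), (2, 0)),
    ("D.151", (2, 2), (2, 2), (1, 2)),
    ("A.60", (7, 1), (4, 0), (5, 0)),
    ("D.95", (6, 1), (2, 0), (3, 0)),
    ("C.55", (8, 35), (2, 0), (1, 0)),
    ("B.82", (1, 37), (0, 0), (1, 0)),
    ("C.53", (1, 5), (1, 0), (1, 0)),
    ("H.1", (5, 43), (2, 0), (2, 0)),
    ("D.129", (3, 13), (0, 0), (1, 0)),
    ("D.218", (1, 8), (1, 0), (1, 0)),
]


def pair_totals(rows, column):
    """Sum the first and second members of the pairs in one column."""
    first = sum(row[column][0] for row in rows)
    second = sum(row[column][1] for row in rows)
    return first, second


totals = [pair_totals(ROWS, c) for c in (1, 2, 3)]
print("column totals:", totals)

# Assumption: in the last two columns the pair is (correct, incorrect) hits,
# so first / (first + second) is the share of flagged proteins that are DBPs.
for first, second in totals[1:]:
    print(f"share of first in pair: {first / (first + second):.2f}")
```

The column sums reproduce the printed Total row, so the table is internally consistent; the second member of the pair drops from 24 to 4 between the last two columns while the first member rises slightly.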
Figure 2. Matthews correlation coefficient for predicted binding residues versus the structural similarity SP-score between predicted and known structures of 116 targets.
The correlation coefficient is 0.38.
Figure 3. Comparison of predicted (red) and native (green) structures of target 1yfjD (DAM).
The native structure and its DNA are shown in green and orange, respectively; the predicted structure and its DNA are shown in red and grey. The predicted and native binding sites are shown in cyan and yellow, respectively.
Performance of SPOT-Seq on prediction of DNA-binding proteins at three resolution levels.
| Measure | DB179/NB3797 | DB82 |
|---|---|---|
| MCC | 0.77 | - |
| Accuracy | 98% | - |
| Precision | 93% | - |
| Sensitivity | 65% | 51% |
| | | |
| MCC | 0.52 | 0.64 |
| Accuracy | 88% | 93% |
| Precision | 63% | 67% |
| Sensitivity | 55% | 69% |
| | | |
| SP-score | 0.65 | 0.73 |
| RMSD (<4 Å) | 67% | 68% |
Number of annotated and predicted DBPs in the human proteome.
| | Annotated | Predicted | |
|---|---|---|---|
| Transcription factor | 1459 | 837 | 61% |
| DNA binding | 1239 | 763 | 62% |
| DNA repair | 91 | 6 | 7% |
| DNA recombination | 10 | 1 | 1% |
| DNA replication | 51 | 3 | 6% |
| DNA-related biological process | 33 | 2 | 6% |
| Total | 2883 | 1612 | 56% |
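The last column can be related to the two count columns by dividing one by the other. The sketch below assumes, based only on the table caption, that the second column holds annotated DBPs and the third holds predicted DBPs; it recomputes the ratios from the counts rather than relying on the printed percentages, which may differ for individual rows.

```python
# (annotated, predicted) counts per category, copied from the table;
# the column meaning is an assumption inferred from the caption.
counts = {
    "Transcription factor": (1459, 837),
    "DNA binding": (1239, 763),
    "DNA repair": (91, 6),
    "DNA recombination": (10, 1),
    "DNA replication": (51, 3),
    "DNA-related biological process": (33, 2),
}

annotated_total = sum(annotated for annotated, _ in counts.values())
predicted_total = sum(predicted for _, predicted in counts.values())

# Percentage of annotated entries that were also predicted, per category.
coverage = {
    name: 100 * predicted / annotated
    for name, (annotated, predicted) in counts.items()
}
overall = 100 * predicted_total / annotated_total

for name, pct in coverage.items():
    print(f"{name}: {pct:.1f}%")
print(f"Total: {predicted_total}/{annotated_total} = {overall:.1f}%")
```

The category counts add up to the printed totals, and the overall ratio rounds to the 56% sensitivity quoted for the human proteome.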
Predicted DBPs whose homologs have experimentally determined 3-dimensional structures.
| UniProt ID | Name | TPL | Homologous chain | SP-score | SeqID (%) | Lmatch |
|---|---|---|---|---|---|---|
| P13051 | Uracil-DNA glycosylase | 4skne | 1emha | 1.329 | 98.7 | 224 |
| P24855 | Deoxyribonuclease-1 | 2dnja | 4awna | 1.021 | 97.3 | 99 |
| O75909 | Cyclin-K (DNA-dependent transcription regulation) | 1c9be | 2i53a | 0.853 | 75.6 | 76 |
| P38919 | Eukaryotic initiation factor 4A-III (RNA helicase) | 2p6ra | 2j0qa | 0.808 | 91.0 | 114 |
| O95718 | Steroid hormone receptor ERR2 (DNA binding) | 1kb4a | 1lo1a | 0.799 | 93.4 | 86 |
| P30281 | G1/S-specific cyclin-D3 | 1c9be | 3g33b | 0.773 | 82.1 | 63 |
| P20248 | Cyclin-A2 | 1c9be | 2wipb | 0.773 | 80.2 | 64 |
| P24385 | G1/S-specific cyclin-D1 | 1c9be | 2w96a | 0.765 | 79.1 | 63 |
| P14635 | G2/mitotic-specific cyclin-B1 | 1c9be | 2b9ra | 0.746 | 80.7 | 116 |
| P24863 | Cyclin-C | 1c9be | 3rgfb | 0.742 | 76.3 | 115 |
| P51946 | Cyclin-H | 1c9be | 1jkwa | 0.733 | 73.6 | 56 |
| Q9UMR2 | ATP-dependent RNA helicase DDX19B | 2p6ra | 3ewsa | 0.731 | 83.9 | 223 |
| O60942 | mRNA-capping enzyme (GTP binding) | 2owoa | 3s24a | 0.615 | 75.0 | 87 |
| Q9UNQ2 | Probable dimethyladenosine transferase (rRNA binding) | 1dctb | 1zq9a | 0.562 | 82.4 | 107 |
| Q9NRR6 | 72 kDa inositol polyphosphate 5-phosphatase | 1dewb | 2xswa | 0.539 | 75.3 | 66 |
| P32019 | Type II inositol 1,4,5-trisphosphate 5-phosphatase | 1dewb | 3n9va | 0.500 | 81.3 | 41 |
| Q96LA8 | Protein arginine N-methyltransferase 6 | 2ibsa | 4hc4a | 0.492 | 81.6 | 98 |
| Q96LI5 | CCR4-NOT transcription complex subunit 6-like (Nuclease) | 1dewb | 3ngna | 0.479 | 75.0 | 38 |
| Q96AZ6 | Interferon-stimulated gene 20 kDa protein (Ribonuclease) | 2pyjb | 1wlja | 0.472 | 78.0 | 53 |
| P09234 | U1 small nuclear ribonucleoprotein C (mRNA binding) | 2i13a | 2vrda | 0.363 | 75.4 | 33 |
| Q16281 | Cyclic nucleotide-gated cation channel alpha-3 | 1cgpa | 3swya | 0.342 | 67.7 | 40 |
| Q9NRK6 | ATP-binding cassette sub-family B member 10, mitochondrial | 2o8db | 4ayta | 0.310 | 76.2 | 140 |
| Q9BW91 | ADP-ribose pyrophosphatase, mitochondrial | 1rrqa | 1q33a | 0.207 | 72.6 | 57 |