Jian Zhang1, Bo Gao1, Haiting Chai1, Zhiqiang Ma1, Guifu Yang2,3.
Abstract
BACKGROUND: DNA-binding proteins (DBPs) play fundamental roles in many biological processes. Therefore, developing effective computational tools for identifying DBPs is highly desirable.
Keywords: Binary firefly algorithm; DNA-binding proteins; Feature selection; Parameters optimization
Year: 2016 PMID: 27565741 PMCID: PMC5002159 DOI: 10.1186/s12859-016-1201-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 An example that illustrates the preferences of certain secondary structure motifs of a protein complex. Panel (a) is a TATA-binding protein (PDB ID: 1AIS_A). The binding surface is composed of strands (red) while the outer region is composed of helices (green). The general secondary structure pattern of this protein is strand-helix-strand-helix-strand-helix-strand-helix. Panel (b) is a transcription initialization protein (PDB ID: 1AIS_B) that is mainly composed of helices (green) and turns (blue)
Fig. 2 The distribution of secondary structure motifs
Fig. 3 The coding scheme for a firefly
Fig. 4 An example of calculating parameter r. Firefly X = {1 0 0 1 1 1 1 0 0 0}, Firefly Y = {1 0 1 1 0 1 1 1 0 0}. The distance (difference) is calculated by the X ⊕ Y operation and equals {0 0 1 0 1 0 0 1 0 0}. Finally, the similarity ratio between X and Y is r = 1 - (3/10) = 0.7
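The calculation in Fig. 4 can be sketched in a few lines of Python. This is a minimal illustration, assuming each firefly is an equal-length list of 0/1 values; the function name `similarity_ratio` and the variable names are hypothetical and do not come from the paper.

```python
def similarity_ratio(x, y):
    """Return 1 - (number of differing bits / length) for two bit lists."""
    if len(x) != len(y):
        raise ValueError("fireflies must have the same length")
    # Bitwise XOR marks the positions where the two fireflies differ.
    diff = [a ^ b for a, b in zip(x, y)]
    return 1 - sum(diff) / len(x)


# The two fireflies from the Fig. 4 caption.
firefly_x = [1, 0, 0, 1, 1, 1, 1, 0, 0, 0]
firefly_y = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0]

r = similarity_ratio(firefly_x, firefly_y)
print(round(r, 2))  # three differing bits out of ten, so 0.7
```

Identical fireflies give r = 1 under this definition, and fireflies that differ in every position give r = 0.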
Fig. 5 An example of movement and mutation for a firefly
Fig. 6 The flowchart of the proposed method
Comparison of BFA with different feature selection methods
| Method | SN | SP | ACC | MCC |
|---|---|---|---|---|
| BFA | 0.863 | 0.726 | 0.795 | 0.595 |
| BPSO | 0.830 | 0.710 | 0.770 | 0.544 |
| GA | 0.840 | 0.680 | 0.760 | 0.527 |
| FA | 0.720 | 0.610 | 0.665 | 0.332 |
| mRMR + IFS | 0.790 | 0.640 | 0.715 | 0.435 |
| All features | 0.760 | 0.600 | 0.680 | 0.365 |
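For readers cross-checking the table, the sketch below shows how accuracy (ACC) and the Matthews correlation coefficient (MCC) follow from sensitivity (SN) and specificity (SP) using the standard confusion-matrix definitions. It assumes a dataset with equal numbers of positive and negative samples, which this excerpt does not state; the function name `acc_and_mcc` and the `pos_fraction` parameter are hypothetical.

```python
import math


def acc_and_mcc(sn, sp, pos_fraction=0.5):
    """Derive ACC and MCC from SN and SP for a given fraction of positives."""
    # Express the confusion matrix as fractions of the whole dataset.
    tp = sn * pos_fraction
    fn = (1 - sn) * pos_fraction
    tn = sp * (1 - pos_fraction)
    fp = (1 - sp) * (1 - pos_fraction)

    acc = tp + tn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# SN and SP taken from the BFA row of the table above.
acc, mcc = acc_and_mcc(0.863, 0.726)
print(f"ACC = {acc:.4f}, MCC = {mcc:.3f}")
```

Under the balanced-dataset assumption, the BFA row reproduces to within rounding (ACC near 0.795, MCC near 0.595). The same assumption need not hold for the other datasets in this record.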
Fig. 7 ROC curves of different feature selection methods
Comparison of iDbP with existing methods on dataset PDB186
| Method | SN | SP | ACC | MCC |
|---|---|---|---|---|
| iDbP | 0.894 | 0.722 | 0.809 | 0.625 |
| DBPPred | 0.796 | 0.742 | 0.769 | 0.538 |
| iDNA-Prot | 0.677 | 0.667 | 0.672 | 0.344 |
| nDNA-Prot | 0.710 | 0.623 | 0.667 | 0.335 |
| enDNA-Prot | 0.602 | 0.699 | 0.651 | 0.303 |
| DNA-Prot | 0.699 | 0.538 | 0.618 | 0.240 |
| DNAbinder | 0.570 | 0.645 | 0.608 | 0.216 |
| DBD-Threader | 0.237 | 0.957 | 0.597 | 0.279 |
Comparison of iDbP with existing methods on dataset DNAiset
| Method | SN | SP | ACC | MCC |
|---|---|---|---|---|
| iDbP | 0.908 | 0.911 | 0.910 | 0.803 |
| Zou’s method | 0.890 | 0.828 | 0.900 | 0.753 |
| iDNA-Prot | 0.875 | 0.798 | 0.837 | 0.709 |
| nDNA-Prot | 0.779 | 0.887 | 0.851 | 0.664 |
| enDNA-Prot | 0.760 | 0.868 | 0.832 | 0.623 |
| DNAbinder | 0.717 | 0.642 | 0.863 | 0.473 |
Comparison of predictive quality on the DBP189 dataset
| Method | SN | SP | ACC | MCC |
|---|---|---|---|---|
| iDbP | 0.7619 | 0.9162 | 0.8989 | 0.5996 |
| DNA-Prot | 0.7143 | 0.9042 | 0.8830 | 0.5415 |
| iDNA-Prot | 0.6190 | 0.8563 | 0.8298 | 0.3960 |
| DNAbinder | 0.5714 | 0.8263 | 0.7979 | 0.3234 |
Number of annotated and recognized DBPs in UniProt database
| Category | Number of proteins | Proteins with complete DNA binding annotations | Number of predicted DBPs | SN |
|---|---|---|---|---|
| Human | 6,813 | 1,049 | 613 | 58 % |
| A. thaliana | 3,378 | 929 | 489 | 53 % |
| Mouse | 2,514 | 424 | 232 | 54 % |
| S. cerevisiae | 1,545 | 314 | 191 | 61 % |
| Fruit fly | 1,163 | 143 | 84 | 59 % |
| Summary | 15,413 | 2,859 | 1,609 | 56 % |
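The SN column appears to be the number of predicted DBPs divided by the number of proteins with complete DNA binding annotations. The sketch below recomputes the totals and the overall ratio from the per-species counts in the table. That reading of SN is an assumption based on the numbers rather than on a definition given in this excerpt, and the variable names are hypothetical.

```python
# (annotated, predicted) counts per species, copied from the table above.
counts = {
    "Human": (1049, 613),
    "A. thaliana": (929, 489),
    "Mouse": (424, 232),
    "S. cerevisiae": (314, 191),
    "Fruit fly": (143, 84),
}

total_annotated = sum(annotated for annotated, _ in counts.values())
total_predicted = sum(predicted for _, predicted in counts.values())
overall_sn = total_predicted / total_annotated

# Per-species ratios; these may differ from the table in the last digit
# depending on how the original values were rounded.
for species, (annotated, predicted) in counts.items():
    print(f"{species}: {predicted / annotated:.1%}")

print(f"Summary: {total_predicted} / {total_annotated} = {overall_sn:.1%}")
```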