Yi-Fan Liou, Phasit Charoenkwan, Yerukala Srinivasulu, Tamara Vasylenko, Shih-Chung Lai, Hua-Chin Lee, Yi-Hsiung Chen, Hui-Ling Huang, Shinn-Ying Ho.
Abstract
BACKGROUND: Heme binding proteins (HBPs) are metalloproteins that contain a heme ligand (an iron-porphyrin complex) as the prosthetic group. Several computational methods have been proposed to predict heme binding residues and thereby to understand the interactions between heme and its host proteins. However, few in silico methods for identifying HBPs have been proposed.
Year: 2014 PMID: 25522279 PMCID: PMC4290654 DOI: 10.1186/1471-2105-15-S16-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flowchart of the system design for prediction and analysis of heme binding proteins (HBPs).
The training dataset HBPGO-TRN1000 for designing HBP predictors and three test datasets for evaluating the predictors.
| Dataset | Sequence Identity (%) | HBP Number | Non-HBP Number | Total Number |
|---|---|---|---|---|
| HBPGO-TRN1000 | 25 | 500 | 500 | 1000 |
| HBPGO-TST494 | 25 | 247 | 247 | 494 |
| HemeBind145 | 30 | 145 | 0 | 145 |
| TargetS311 | 40 | 311 | 0 | 311 |
The performance of 10 randomly selected negative datasets
| Number | Training (%) | MCC | Sensitivity | Specificity |
|---|---|---|---|---|
| 1 | 86.60 | 0.47 | 0.68 | 0.79 |
| 2 | 84.90 | 0.43 | 0.66 | 0.77 |
| 3 | 85.30 | 0.40 | 0.67 | 0.73 |
| 4 | 85.00 | 0.41 | 0.73 | 0.68 |
| 5 | 86.00 | 0.41 | 0.67 | 0.64 |
| 6 | 85.20 | 0.41 | 0.71 | 0.70 |
| 7 | 84.40 | 0.43 | 0.70 | 0.73 |
| 8 | 84.90 | 0.41 | 0.76 | 0.65 |
| 9 | 86.00 | 0.42 | 0.75 | 0.67 |
| 10 | 83.90 | 0.44 | 0.70 | 0.73 |
| average | 85.22 | 0.42 | 0.70 | 0.71 |
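The MCC, sensitivity, and specificity columns above follow the standard confusion-matrix definitions. The sketch below is a minimal illustration of those definitions, not the authors' evaluation code. The counts in the example are hypothetical: they assume a balanced set of 247 positives and 247 negatives and were chosen only to land near the sensitivity and specificity in the first row.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true HBPs that are predicted as HBPs."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true non-HBPs that are predicted as non-HBPs."""
    return tn / (tn + fp)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts (assumption: 247 HBPs and 247 non-HBPs).
tp, fn, tn, fp = 168, 79, 195, 52
print(round(sensitivity(tp, fn), 2),
      round(specificity(tn, fp), 2),
      round(mcc(tp, fp, tn, fn), 2))
```

Because MCC uses all four cells of the confusion matrix, it is a stricter summary than accuracy when sensitivity and specificity differ.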
Figure 2. Heat map of the heme protein propensity scores of dipeptides.
The propensity scores of amino acids to be a heme binding protein (HBP) and amino acid composition (%).
| Amino acid | Heme protein Score (Rank) | Composition of HBP (%) | Composition of Non-HBP (%) | Composition difference (%) |
|---|---|---|---|---|
| F-Phe | 705.3(1) | 5.08 | 4.07 | 1.01 |
| H-His | 615.6(2) | 2.59 | 2.26 | 0.33 |
| G-Gly | 604.3(3) | 6.66 | 6.00 | 0.66 |
| W-Trp | 603.1(4) | 1.47 | 1.08 | 0.39 |
| P-Pro | 601.3(5) | 5.17 | 4.70 | 0.47 |
| M-Met | 599.6(6) | 2.69 | 2.17 | 0.52 |
| D-Asp | 574.3(7) | 5.41 | 5.33 | 0.08 |
| L-Leu | 570.1(8) | 10.28 | 9.77 | 0.51 |
| K-Lys | 568.3(9) | 5.80 | 6.19 | -0.39 |
| R-Arg | 566.1(10) | 5.56 | 5.12 | 0.44 |
| A-Ala | 558.8(11) | 7.59 | 7.21 | 0.38 |
| Y-Tyr | 541.0(12) | 3.04 | 3.17 | -0.13 |
| V-Val | 535.5(13) | 6.41 | 6.29 | 0.12 |
| I-Ile | 518.3(14) | 5.77 | 6.16 | -0.39 |
| Q-Gln | 511.2(15) | 3.61 | 4.10 | -0.49 |
| T-Thr | 498.6(16) | 5.31 | 5.63 | -0.32 |
| C-Cys | 494.9(17) | 1.40 | 1.63 | -0.23 |
| E-Glu | 494.6(18) | 5.95 | 6.49 | -0.54 |
| N-Asn | 468.7(19) | 4.01 | 4.89 | -0.88 |
| S-Ser | 401.4(20) | 6.21 | 7.73 | -1.52 |
| R | 1.00 | -0.05 | -0.30 | 0.92 |
The total number of amino acids in the used datasets of HBPs and non-HBPs is 199,772 and 200,188, respectively.
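The last row of the table (R) gives the correlation of each column with the heme protein propensity score. As a rough check, the sketch below recomputes the composition difference and its Pearson correlation with the score from the tabulated values. It assumes that R is the ordinary Pearson coefficient over the 20 amino acids; the variable names are illustrative only.

```python
import math

# Values copied from the table above, in rank order (F, H, G, ..., S).
SCORES = [705.3, 615.6, 604.3, 603.1, 601.3, 599.6, 574.3, 570.1, 568.3, 566.1,
          558.8, 541.0, 535.5, 518.3, 511.2, 498.6, 494.9, 494.6, 468.7, 401.4]
COMP_HBP = [5.08, 2.59, 6.66, 1.47, 5.17, 2.69, 5.41, 10.28, 5.80, 5.56,
            7.59, 3.04, 6.41, 5.77, 3.61, 5.31, 1.40, 5.95, 4.01, 6.21]
COMP_NON = [4.07, 2.26, 6.00, 1.08, 4.70, 2.17, 5.33, 9.77, 6.19, 5.12,
            7.21, 3.17, 6.29, 6.16, 4.10, 5.63, 1.63, 6.49, 4.89, 7.73]

# Composition difference = HBP composition minus non-HBP composition.
DIFF = [h - n for h, n in zip(COMP_HBP, COMP_NON)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(round(pearson(SCORES, DIFF), 2))
```

Under that assumption, the value computed for the composition difference agrees with the 0.92 reported in the table, which suggests the propensity score largely tracks how over- or under-represented a residue is in HBPs.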
Figure 3. The secondary structures and surface hydrophobicity of 2PBJ and 1CYO. The heme is represented as spheres. The color of the surfaces represents the level of hydrophobicity: blue, white, and brown represent low, medium, and high hydrophobicity, respectively. (a) The structure of 2PBJ, where a CP motif is located near the heme for the interaction with the heme. (b) Surface hydrophobicity of 2PBJ. The heme is represented as yellow sticks. (c) The structure of 1CYO. (d) Surface hydrophobicity of 1CYO. Protein structures are drawn using Discovery Studio 4.0.
Comparison of prediction accuracies (%) between SCMHBP and other methods.
| Method | HBPGO-TRN1000 | HBPGO-TST494 | HemeBind145 | TargetS311 | Mean test |
|---|---|---|---|---|---|
| SCMHBP* | 85.90 ± 0.42 | 72.79 ± 1.21 | 67.24 ± 2.89 | 74.69 ± 2.48 | 71.57 ± 7.83 |
| SVM-AAC | 91.00 | 64.57 | 66.21 | 75.56 | 68.78 |
| SVM-DPC | 97.70 | 78.54 | 63.45 | 74.60 | 72.20 |
| SVM-PSSM400 | 91.00 | 87.45 | 66.21 | 74.28 | 75.98 |
| SVM-AAindex | 87.60 | 76.52 | 71.72 | 80.39 | 76.21 |
| J48-AAC | 93.80 | 66.80 | 65.52 | 67.20 | 66.51 |
| J48-DPC | 97.90 | 64.78 | 60.00 | 68.81 | 64.53 |
| J48-PSSM400 | 97.70 | 75.30 | 61.38 | 62.38 | 66.35 |
| J48-AAindex | 98.60 | 68.22 | 62.76 | 70.42 | 67.13 |
| Bayes-AAC | 69.20 | 67.40 | 71.03 | 62.70 | 67.04 |
| Bayes-DPC | 67.40 | 63.77 | 73.79 | 71.38 | 69.65 |
| Bayes-PSSM400 | 57.80 | 55.30 | 85.52 | 82.64 | 74.49 |
| Bayes-AAindex | 63.60 | 63.20 | 61.38 | 55.95 | 60.18 |
| Mean accuracy | 84.55 | 69.63 | 69.02 | 72.00 | 70.22 |
* The results are the averages of the performances of 10 different scorecards from 10 independent runs.
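The "Mean test" column appears to be the unweighted mean of the three test-set accuracies (HBPGO-TST494, HemeBind145, TargetS311). The short sketch below checks that reading against the tabulated values. It is an arithmetic check only, and the dictionary layout is an assumption made for illustration.

```python
# Each entry: (HBPGO-TST494, HemeBind145, TargetS311, reported mean test).
ROWS = {
    "SCMHBP":        (72.79, 67.24, 74.69, 71.57),
    "SVM-AAC":       (64.57, 66.21, 75.56, 68.78),
    "SVM-DPC":       (78.54, 63.45, 74.60, 72.20),
    "SVM-PSSM400":   (87.45, 66.21, 74.28, 75.98),
    "SVM-AAindex":   (76.52, 71.72, 80.39, 76.21),
    "J48-AAC":       (66.80, 65.52, 67.20, 66.51),
    "J48-DPC":       (64.78, 60.00, 68.81, 64.53),
    "J48-PSSM400":   (75.30, 61.38, 62.38, 66.35),
    "J48-AAindex":   (68.22, 62.76, 70.42, 67.13),
    "Bayes-AAC":     (67.40, 71.03, 62.70, 67.04),
    "Bayes-DPC":     (63.77, 73.79, 71.38, 69.65),
    "Bayes-PSSM400": (55.30, 85.52, 82.64, 74.49),
    "Bayes-AAindex": (63.20, 61.38, 55.95, 60.18),
}


def mean_test(tst494: float, hemebind145: float, targets311: float) -> float:
    """Unweighted mean of the three test-set accuracies."""
    return (tst494 + hemebind145 + targets311) / 3


for method, (a, b, c, reported) in ROWS.items():
    print(f"{method:14s} computed={mean_test(a, b, c):.2f} reported={reported:.2f}")
```

Note that this mean weights the three test sets equally even though they differ in size (494, 145, and 311 sequences).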
The top-20 putative HBPs according to the HBP sequences.
| Rank | Name (UniProt) | UniProt ID | Function | HBP Score |
|---|---|---|---|---|
| 1 | Dermatoxin-J2 | P86622 | Antimicrobial peptide | 715.58 |
| 2 | Antifungal protein | Q08617 | This protein inhibits the growth of a variety of fungal species | 683.37 |
| 3 | Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit 4C | Q9SF57 | May be involved in N-glycosylation through its association with N-oligosaccharyl transferase | 673.21 |
| 4 | Proline, histidine and glycine-rich protein 1 | Q8K0G7 | Unknown | 671.84 |
| 5 | Eggshell protein 2A | P19469 | Unknown | 664.84 |
| 6 | Photosystem II reaction center protein Ycf12 | Q0IAK1 | A core subunit of photosystem II (PSII) | 662.62 |
| 7 | U13-ctenitoxin-Pn1c | P84018 | Acts as a neurotoxin | 660.56 |
| 8 | Histone H3 | P83864 | Core component of nucleosome | 660.43 |
| 9 | Uncharacterized protein DDB_G0295473 | B0G125 | Unknown | 657.61 |
| 10 | Vesicle-associated protein | Q06155 | May function as a multidomain RNA-binding protein | 657.01 |
| 11 | Uncharacterized protein YML007C-A, mitochondrial | Q3E7A6 | Unknown | 655.60 |
| 12 | Sperm protamine P1 | P83211 | Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis. | 653.50 |
| 13 | S-antigen protein | P13821 | S antigens are soluble heat-stable proteins present in the sera of some infected individuals. | 653.22 |
| 14 | Uncharacterized 8.4 kDa protein | P08685 | Unknown | 653.16 |
| 15 | Glycine-rich cell wall structural protein | P27483 | Responsible for plasticity of the cell wall | 651.68 |
| 16 | Defensin-1 | P84757 | Has antibacterial activity against the Gram-negative bacterium E. coli and the Gram-positive bacteria L. monocytogenes and S. aureus | 650.91 |
| 17 | Putative uncharacterized protein YKL156C-A | Q8TGN0 | Unknown | 650.84 |
| 18 | OriE replication initiation protein | D9IEI0 | Unknown | 650.41 |
| 19 | Putative uncharacterized protein YEL032C-A | Q8TGP4 | Unknown | 650.31 |
| 20 | Protein PCOTH | Q58A44 | May be involved in growth and survival of prostate cancer cells through the TAF-Ibeta pathway | 648.51 |
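For readers unfamiliar with how a scoring card yields a per-sequence score such as the "HBP Score" column, the sketch below shows the usual scoring card method formulation: a sequence score is the sum of dipeptide propensity scores weighted by the dipeptide composition of the sequence, and a sequence is called positive when the score exceeds a threshold. This is an assumption about the general technique rather than a reproduction of SCMHBP. The scoring card `TOY_CARD`, the threshold, and the function names are hypothetical, since the actual 400-entry dipeptide card is not given here.

```python
from collections import Counter


def dipeptide_composition(seq: str) -> dict:
    """Relative frequency of each overlapping dipeptide in the sequence."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return {dp: c / len(pairs) for dp, c in counts.items()}


def scm_score(seq: str, score_card: dict, default: float = 0.0) -> float:
    """Composition-weighted sum of dipeptide propensity scores."""
    comp = dipeptide_composition(seq)
    return sum(w * score_card.get(dp, default) for dp, w in comp.items())


def predict_hbp(seq: str, score_card: dict, threshold: float) -> bool:
    """Call the sequence a putative HBP if its score exceeds the threshold."""
    return scm_score(seq, score_card) > threshold


# Hypothetical toy card with three dipeptides (real cards cover all 400).
TOY_CARD = {"FH": 900.0, "HG": 700.0, "SS": 100.0}

print(scm_score("FHG", TOY_CARD))                   # FH and HG, weight 0.5 each
print(predict_hbp("SSS", TOY_CARD, threshold=500))  # only SS, low propensity
```

Because the score is a composition-weighted average, very short sequences dominated by a few high-propensity dipeptides can reach high scores, which may be worth keeping in mind when reading a ranked list of putative HBPs.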
The three physicochemical properties selected by SCM-PCPs.
| Amino acid | Heme protein Score (Rank) | SNEP660103 Score (Rank) | TAKK010101 Score (Rank) | KARP850101 Score (Rank) |
|---|---|---|---|---|
| F-Phe | 705.3(1) | 0.438(2) | 23.0(2) | 0.93(19) |
| H-His | 615.6(2) | 0.320(4) | 11.9(9) | 0.982(14) |
| G-Gly | 604.3(3) | -0.073(12) | 0.0(20) | 1.142(3) |
| W-Trp | 603.1(4) | 0.493(1) | 24.2(1) | 0.925(20) |
| P-Pro | 601.3(5) | -0.016(9) | 15.0(7) | 1.055(8) |
| M-Met | 599.6(6) | -0.041(10) | 11.9(8) | 0.947(18) |
| D-Asp | 574.3(7) | -0.285(20) | 4.9(14) | 1.033(11) |
| L-Leu | 570.1(8) | -0.008(8) | 17.0(5) | 0.967(15) |
| K-Lys | 568.3(9) | 0.049(6) | 10.5(10) | 1.093(6) |
| R-Arg | 566.1(10) | 0.079(5) | 7.3(12) | 1.038(10) |
| A-Ala | 558.8(11) | -0.110(13) | 9.8(11) | 1.041(9) |
| Y-Tyr | 541.0(12) | 0.381(3) | 17.2(4) | 0.961(16) |
| V-Val | 535.5(13) | -0.155(16) | 15.3(6) | 0.982(13) |
| I-Ile | 518.3(14) | 0.001(7) | 17.2(3) | 1.002(12) |
| Q-Gln | 511.2(15) | -0.067(11) | 2.4(19) | 1.165(2) |
| T-Thr | 498.6(16) | -0.208(18) | 6.9(13) | 1.073(7) |
| C-Cys | 494.9(17) | -0.184(17) | 3.0(17) | 0.96(17) |
| E-Glu | 494.6(18) | -0.246(19) | 4.4(15) | 1.094(5) |
| N-Asn | 468.7(19) | -0.136(14) | 3.6(16) | 1.117(4) |
| S-Ser | 401.4(20) | -0.153(15) | 2.6(18) | 1.169(1) |
| R | 1.00 | 0.60 | 0.58 | -0.56 |
SNEP660103 = Principal component III.
TAKK010101 = Side-chain contribution to protein stability (kJ/mol).
KARP850101 = Flexibility parameter for no rigid neighbors.
The amino acid scores and the average B-factors of the Cα and side chains
| Amino Acid | Score | HBPs | Non-HBPs |
|---|---|---|---|
| F-Phe | 705.30 | 29.41 ± 22.90 | 28.15 ± 18.41 |
| H-His | 615.60 | 28.38 ± 22.10 | 30.67 ± 19.56 |
| G-Gly | 604.30 | 32.00 ± 24.25 | 30.60 ± 20.17 |
| W-Trp | 603.10 | 29.48 ± 22.98 | 25.18 ± 16.77 |
| P-Pro | 601.30 | 30.60 ± 23.23 | 32.66 ± 20.87 |
| M-Met | 599.60 | 29.96 ± 22.34 | 30.20 ± 19.06 |
| D-Asp | 574.30 | 32.00 ± 23.58 | 34.04 ± 20.96 |
| L-Leu | 570.10 | 31.11 ± 23.27 | 29.04 ± 18.19 |
| K-Lys | 568.30 | 32.47 ± 22.93 | 35.76 ± 21.87 |
| R-Arg | 566.10 | 31.83 ± 24.25 | 33.24 ± 20.55 |
| A-Ala | 558.80 | 29.14 ± 21.42 | 30.85 ± 20.20 |
| Y-Tyr | 541.00 | 29.73 ± 22.51 | 26.99 ± 17.72 |
| V-Val | 535.50 | 30.19 ± 22.40 | 28.02 ± 17.71 |
| I-Ile | 518.30 | 30.53 ± 22.29 | 29.01 ± 17.89 |
| Q-Gln | 511.20 | 32.20 ± 24.31 | 33.27 ± 20.80 |
| T-Thr | 498.60 | 31.22 ± 23.47 | 30.08 ± 19.36 |
| C-Cys | 494.90 | 29.10 ± 23.30 | 26.61 ± 16.31 |
| E-Glu | 494.60 | 32.62 ± 23.35 | 35.51 ± 20.95 |
| N-Asn | 468.70 | 31.88 ± 23.42 | 32.66 ± 20.79 |
| S-Ser | 401.40 | 32.34 ± 23.93 | 32.13 ± 20.33 |
| R | 1.00 | -0.45 | -0.22 |
| p-value | | 0.02 | 0.18 |
Figure 4. The B-factor distributions of an HBP (3BL2) and two non-HBPs (1CW0 and 1E5H). High B-factors are shown in red and low B-factors in blue. The backbones are shown as traces. In the HBP, the heme group is represented with spheres. Thinner traces indicate lower B-factors in both the HBP and the non-HBPs. (a) Dehaloperoxidase A. (b) Repair endonuclease. (c) Deacetoxycephalosporin C synthase.