Min Han1, Yifan Song1, Jiaqiang Qian1, Dengming Ming2.
Abstract
BACKGROUND: Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking.
Keywords: Domain profile module; Hidden Markov model; Physicochemical interaction prediction; Protein functional site prediction; fiDPD
Year: 2018 PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Flow-chart for building the function-site- and interaction-annotated domain profile database (fiDPD) and for predicting protein function sites and PLIs using fiDPD
Fig. 2 Mapping known protein function sites and interactions to a domain-profile module. ⊗: known PFSs of domain structures; ⊙: pivotal PFSs in a profile module, with the number indicating a weight factor; *: PFSs mapped onto the query protein sequence from the profile module's pivotal sites, which, after filtering, are reduced to two points (A and B) as the final prediction output; Δ: non-conserved pivotal sites mapped onto the query protein, which are ignored because of their low conservation value
Fig. 3 Mapping the protein-ligand interactions predicted for the mimivirus sulfhydryl oxidase R596 (target T0737, PDB code 3TD7). Dashed lines represent PLIs, colored as follows: blue for electrostatic interactions, green for π-stacking interactions, gray for van der Waals interactions, and red for interactions not found by fiDPD
The prediction of protein-ligand interactions on PFSs of T0737†
| Target | Site | AA | COV | COO | ELE | HBD | HBA | π-π |
|---|---|---|---|---|---|---|---|---|
| T0737 | 41 | G | 0 | 0 | 0 | 0 | 0 | 0 |
| | 42 | T | 0 | 0 | +/0 | T | 0 | 0 |
| | 45 | W | 0 | 0 | 0 | T | 0 | T |
| | 49 | H | 0 | 0 | 0 | 0 | + | T |
| | 78 | L | 0 | 0 | 0 | 0 | 0 | 0 |
| | 83 | C | 0 | 0 | 0 | + | T | 0 |
| | 114 | Y | 0 | 0 | 0 | 0 | T | T |
| | 117 | H | 0 | 0 | T | + | – | T |
| | 118 | N | 0 | 0 | 0 | + | T | 0 |
| | 120 | V | 0 | 0 | 0 | 0 | 0 | 0 |
| | 121 | N | 0 | 0 | 0 | 0 | + | 0 |
| | 123 | K | 0 | 0 | T | T | + | +/0 |
†AA stands for amino acid, COV for covalent bond, COO for coordinate bond, ELE for electrostatic interaction, HBD for H-bond donor, HBA for H-bond acceptor, and π-π for π-stacking interaction. “0” indicates that the corresponding interaction is absent from the protein-ligand complex structure and that the fiDPD calculation also predicted no PLI of that type at the site
Ligand-binding site predictions for CASP10/11 target proteins†
| Target | PDB | Ligand | Type | Sites* | Prediction | TP | Precision | Recall | MCC |
|---|---|---|---|---|---|---|---|---|---|
| T0652 | 4HG0 | AMP | Non-metal | 11 | 17 | 6 | 0.35 | 0.55 | 0.41 |
| T0657 | 2LUL | ZN | Metal | 5 | 9 | 4 | 0.44 | 0.8 | 0.58 |
| T0659 | 4ESN | ZN | Metal | 3 | No-hit | | | | |
| T0675 | 2LV2 | ZN | Metal | 8 | 9 | 8 | 0.89 | 1 | 0.94 |
| T0686 | 4HQL | MG | Metal | 5 | 6 | 3 | 0.5 | 0.6 | 0.54 |
| T0696 | 4RT5 | NA | Metal | 6 | 3 | 1 | 0.33 | 0.17 | 0.21 |
| T0697 | 4RIT | TRS | Non-metal | 6 | 11 | 0 | 0 | 0 | 0 |
| T0706 | 4RCK | MG | Metal | 5 | 3 | 3 | 1 | 0.6 | 0.77 |
| T0720 | 4IC1 | MN/SF4 | Metal | 14 | No-hit | | | | |
| T0721 | 4FK1 | FAD | Non-metal | 29 | 3 | 3 | 1 | 0.1 | 0.31 |
| T0726 | 4FGM | ZN | Metal | 7 | No-hit | | | | |
| T0737 | 3TD7 | FAD | Non-metal | 21 | 13 | 12 | 0.92 | 0.57 | 0.71 |
| T0744 | 2YMV | FNR | Non-metal | 19 | 4 | 4 | 1 | 0.21 | 0.45 |
† Targets T0762 to T0854 were taken from CASP11; their protein-ligand interactions were well characterized in the crystal structures
*“Sites” is the number of ligand-binding sites recorded in PDB files of the target protein
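The Precision, Recall, and MCC columns can be recomputed from the counts in the table. The sketch below is illustrative rather than the authors' code: the function name `binding_site_metrics` is hypothetical, and it assumes that true negatives are counted over all residues of the target, with the sequence length taken from the "Length" column of the LIBRA and COACH tables.

```python
import math

def binding_site_metrics(length, n_true, n_pred, n_tp):
    """Residue-level precision, recall, and Matthews correlation coefficient.

    Assumption: every residue that is neither a recorded site nor a
    predicted site counts as a true negative.
    """
    fp = n_pred - n_tp             # predicted residues that are not recorded sites
    fn = n_true - n_tp             # recorded sites that were missed
    tn = length - n_tp - fp - fn   # all remaining residues
    precision = n_tp / n_pred if n_pred else 0.0
    recall = n_tp / n_true if n_true else 0.0
    denom = math.sqrt((n_tp + fp) * (n_tp + fn) * (tn + fp) * (tn + fn))
    mcc = (n_tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc

# T0737: 292 residues, 21 recorded sites, 13 predicted, 12 correct
p, r, m = binding_site_metrics(292, 21, 13, 12)
print(f"precision={p:.2f} recall={r:.2f} MCC={m:.2f}")
# prints: precision=0.92 recall=0.57 MCC=0.71
```

Under this assumption the T0737 row of the table is reproduced; whether the same true-negative convention was used for every target is not stated in the text.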
Prediction performance of LIBRA*
| Target | PDB | Length | Sites | Rank-1 Prediction | Rank-1 TP | Rank-1 Model | Rank-1 MCC | Rank-2 Prediction | Rank-2 TP | Rank-2 Model | Rank-2 MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| T0652 | 4HG0 | 292 | 11 | 7 | 1 | N | 0.08 | 8 | 7 | N | 0.74 |
| T0657 | 2LUL | 154 | 5 | 4 | 4 | Y | 0.89 | 4 | 0 | N | 0 |
| T0659 | 4ESN | 72 | 3 | 3 | 3 | Y | 1 | 3 | 0 | N | 0 |
| T0675 | 2LV2 | 74 | 8 | 4 | 4 | Y | 0.69 | 4 | 4 | N | 0.69 |
| T0686 | 4HQL | 242 | 5 | 3 | 3 | Y | 0.77 | 3 | 3 | Y | 0.77 |
| T0696 | 4RT5 | 111 | 6 | 7 | 0 | N | 0 | 5 | 0 | N | 0 |
| T0697 | 4RIT | 483 | 6 | 14 | 0 | N | 0 | 5 | 0 | N | 0 |
| T0706 | 4RCK | 217 | 5 | 3 | 0 | N | 0 | 8 | 1 | N | 0.14 |
| T0720 | 4IC1 | 202 | 8 | 4 | 4 | Y | 0.7 | 5 | 0 | N | 0 |
| T0721 | 4FK1 | 301 | 29 | 24 | 23 | N | 0.86 | 23 | 2 | N | 0.01 |
| T0726 | 4FGM | 589 | 7 | 6 | 6 | N | 0.92 | 10 | 0 | N | 0 |
| T0737 | 3TD7 | 292 | 21 | 10 | 10 | N | 0.67 | 6 | 0 | N | 0 |
| T0744 | 2YMV | 329 | 19 | 12 | 12 | Y | 0.78 | 2 | 2 | Y | 0.64 |
*LIBRA predictions were based on the PDB structures of the target proteins as input. “Sites” is the number of ligand-binding sites recorded in the PDB file of the target protein. In the “Model” column, “Y” indicates that the prediction used binding pockets in the PDB structure of the target protein itself as the template; “N” indicates that the PDB structure of the target protein was not used in the prediction
Prediction performance of COACH*
| Target | PDB | Length | Sites | Rank-1 Prediction | Rank-1 TP | Rank-1 Model | Rank-1 MCC | Rank-2 Prediction | Rank-2 TP | Rank-2 Model | Rank-2 MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| T0652 | 4HG0 | 292 | 11 | 12 | 2 | N | 0.14 | 19 | 2 | N | 0.09 |
| T0657 | 2LUL | 154 | 5 | 7 | 0 | N | 0 | 5 | 5 | Y | 1 |
| T0659 | 4ESN | 72 | 3 | 3 | 3 | N | 1 | 8 | 0 | N | 0 |
| T0675 | 2LV2 | 74 | 8 | 4 | 3 | N | 0.49 | 4 | 4 | N | 0.69 |
| T0686 | 4HQL | 242 | 5 | 4 | 3 | N | 0.66 | 13 | 0 | N | 0 |
| T0696 | 4RT5 | 111 | 6 | 5 | 4 | N | 0.72 | 3 | 1 | N | 0.2 |
| T0697 | 4RIT | 483 | 6 | 12 | 0 | N | 0 | 5 | 0 | N | 0 |
| T0706 | 4RCK | 217 | 5 | 3 | 3 | N | 0.77 | 5 | 4 | N | 0.79 |
| T0720 | 4IC1 | 202 | 8 | 5 | 4 | Y | 0.62 | 8 | 4 | Y | 0.48 |
| T0721 | 4FK1 | 301 | 29 | 32 | 24 | N | 0.76 | 19 | 2 | N | 0.01 |
| T0726 | 4FGM | 589 | 7 | 10 | 6 | N | 0.71 | 10 | 3 | N | 0.35 |
| T0737 | 3TD7 | 292 | 21 | 21 | 15 | N | 0.69 | 6 | 1 | Y | 0.05 |
| T0744 | 2YMV | 329 | 19 | 19 | 18 | Y | 0.94 | 7 | 4 | N | 0.32 |
*COACH built structures from the sequences of the target proteins, except for T0675 and T0697, for which the PDB structures of the target proteins themselves were used directly. “Sites” is the number of ligand-binding sites recorded in the PDB file of the target protein. In the “Model” column, “Y” indicates that the prediction used binding pockets in the PDB structure of the target protein itself as the template; “N” indicates that the PDB structure of the target protein was not used in the prediction
PLI predictions for CASP10/11 target proteins†
| Target | Interactions | Correct Prediction | Recall |
|---|---|---|---|
| T0652 | 60 | 36 | 60.0% |
| T0657 | 24 | 23 | 95.8% |
| T0675 | 30 | 28 | 93.3% |
| T0686 | 18 | 17 | 94.4% |
| T0696 | 18 | 15 | 83.3% |
| T0697 | 104 | 72 | 69.2% |
| T0706 | 24 | 21 | 87.5% |
| T0720 | 78 | 58 | 74.4% |
| T0721 | 60 | 50 | 83.3% |
| T0737 | 72 | 63 | 87.5% |
| T0744 | 42 | 37 | 88.1% |
| T0762 | 42 | 35 | 83.3% |
| T0764 | 60 | 52 | 86.7% |
| T0770 | 18 | 14 | 77.8% |
| T0784 | 18 | 18 | 100.0% |
| T0854 | 24 | 20 | 83.3% |
† Targets T0762 to T0854 were taken from CASP11; their protein-ligand interactions were well characterized in the crystal structures
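The Recall column above is the ratio of correctly predicted interactions to all interactions for each target. As a minimal sketch (the variable and function names are illustrative, not from the paper), the per-target values can be recomputed from the two count columns, along with two common ways of summarizing them: a pooled (micro) recall over all interactions and an unweighted (macro) mean over targets. Neither summary is claimed to be the statistic the authors report.

```python
# (target, interactions, correct predictions) copied from the table above
PLI_COUNTS = [
    ("T0652", 60, 36), ("T0657", 24, 23), ("T0675", 30, 28), ("T0686", 18, 17),
    ("T0696", 18, 15), ("T0697", 104, 72), ("T0706", 24, 21), ("T0720", 78, 58),
    ("T0721", 60, 50), ("T0737", 72, 63), ("T0744", 42, 37), ("T0762", 42, 35),
    ("T0764", 60, 52), ("T0770", 18, 14), ("T0784", 18, 18), ("T0854", 24, 20),
]

def recall_pct(total, correct):
    """Recall as a percentage: correct predictions over all interactions."""
    return 100.0 * correct / total

# Per-target recall, rounded to one decimal place as in the table
per_target = {t: round(recall_pct(n, c), 1) for t, n, c in PLI_COUNTS}

# Micro average: pool the counts, so large targets weigh more
micro = recall_pct(sum(n for _, n, _ in PLI_COUNTS),
                   sum(c for _, _, c in PLI_COUNTS))

# Macro average: mean of per-target recalls, so every target weighs equally
macro = sum(recall_pct(n, c) for _, n, c in PLI_COUNTS) / len(PLI_COUNTS)

print(per_target["T0737"], round(micro, 1), round(macro, 1))
```

The two averages differ because targets with many interactions, such as T0697, dominate the pooled figure but count only once in the per-target mean.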