Enrico Ferraro, Allegra Via, Gabriele Ausiello, Manuela Helmer-Citterich.
Abstract
BACKGROUND: The SH3 domain family is one of the most representative and widely studied examples of Peptide Recognition Modules (PRMs). The polyproline II motif PxxP that generally characterizes SH3 ligands does not capture the complex interaction spectrum of the more than 1500 different SH3 domains, and a more refined knowledge of their specificity requires appropriate experimental and theoretical strategies. Due to the limitations of current peptide synthesis technology, several experimental high-throughput approaches have been devised to elucidate protein-protein interaction mechanisms. Such approaches can take advantage of computational techniques, such as regular expressions or position-specific scoring matrices (PSSMs), to pre-process entire proteomes in the search for putative SH3 targets. A reliable inference method for reducing the sequence space of putative binding peptides is therefore a valuable aid for molecular and cellular biologists.
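As a rough illustration of the two pre-processing techniques named in the abstract, the sketch below scans a sequence for the PxxP motif with a regular expression and scores a peptide against a position-specific scoring matrix. The sequence, the matrix values, and the function names `find_pxxp` and `pssm_score` are hypothetical and are not taken from the paper.

```python
import re

# Lookahead pattern so that overlapping PxxP occurrences are all reported.
PXXP = re.compile(r"(?=(P..P))")

# Hypothetical toy PSSM: one dict of residue scores per peptide position.
TOY_PSSM = [{"P": 1.0}, {}, {}, {"P": 1.0}]


def find_pxxp(seq):
    """Return (start, match) pairs for every PxxP occurrence, overlaps included."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(seq)]


def pssm_score(peptide, pssm, default=0.0):
    """Sum the per-position scores; residues missing from a column score `default`."""
    return sum(col.get(aa, default) for aa, col in zip(peptide, pssm))


if __name__ == "__main__":
    seq = "MSAPPLPPRNRPRLGG"  # made-up sequence, for illustration only
    for start, hit in find_pxxp(seq):
        print(start, hit, pssm_score(hit, TOY_PSSM))
```

A regular expression gives a yes/no answer per position, whereas the PSSM gives a graded score that can be thresholded, which is why the two are usually treated as coarse and finer filters respectively.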
Year: 2005 PMID: 16351739 PMCID: PMC1866395 DOI: 10.1186/1471-2105-6-S4-S13
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Class-specific neural network and PSSM results. The comparison shows the substantial improvement of the machine learning method over the PSSM. The higher sensitivity and precision of the neural model indicate that it misses fewer binders and predicts fewer false positives than the PSSM. The higher specificity of the neural model also implies better filtering of non-interacting sequences and a higher performance in the detection of SH3 binders.
| Class | Number of Binders | PSSM | | | | NN | | | |
| I | 88 (13.1%) | 40 | 73 | 84 | 0.45 | 51 | 77 | 89 | 0.56 |
| II | 131 (18.5%) | 52 | 64 | 87 | 0.47 | 57 | 72 | 88 | 0.55 |
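The labels of the four values reported under each method did not survive in the table above. The caption names sensitivity, precision and specificity, and the fourth value looks like a correlation coefficient. The sketch below assumes the order is precision, sensitivity, specificity and Matthews correlation coefficient (MCC), and checks that assumption against the class I PSSM row. The confusion-matrix counts are back-calculated from the rounded percentages and the 88 binders and 584 non-binders of class I; they are illustrative and are not reported in the source.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Return (precision, sensitivity, specificity, mcc) from confusion counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, sensitivity, specificity, mcc


if __name__ == "__main__":
    # Assumed counts (not from the source): 64 + 24 = 88 binders,
    # 490 + 94 = 584 non-binders.
    p, se, sp, mcc = binary_metrics(tp=64, fp=94, tn=490, fn=24)
    print(f"precision={p:.3f} sensitivity={se:.3f} specificity={sp:.3f} mcc={mcc:.3f}")
```

Under these assumed counts the four values land close to the reported 40, 73, 84 and 0.45, which supports the assumed column order without proving it.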
Domain-specific neural network and PSSM results. The domain-specific strategy for detecting binders reveals the strong effect of data imbalance. Class I binding domains have a lower percentage of binders in their datasets, and in the corresponding results both PSSMs and neural networks perform poorly, with no clear benefit in preferring one method over the other. The results for class II binding domains with a higher percentage of binders (Rvs167, Yfr024, Ysc84) clearly show the advantage of neural networks. For Boi1 and Boi2 the evaluation of PSSM and NN is less significant because binders are scarce.
| Class | Domain | Number of Binders | PSSM | | | | NN | | | |
| I | BOI1 | 15 (2.2%) | 50 | 25 | 99 | 0.34 | 4 | 80 | 47 | 0.09 |
| | MYO5 | 35 (5.2%) | 57 | 67 | 98 | 0.60 | 38 | 53 | 97 | 0.41 |
| | RVS167 | 19 (2.8%) | 0 | 0 | 99 | -0.01 | 31 | 68 | 96 | 0.43 |
| | SHO1 | 37 (5.5%) | 70 | 64 | 98 | 0.65 | 64 | 84 | 97 | 0.71 |
| | YFR024 | 25 (3.7%) | 14 | 14 | 97 | 0.11 | 25 | 37 | 94 | 0.25 |
| | YSC84 | 12 (1.8%) | 100 | 33 | 100 | 0.57 | 10 | 80 | 81 | 0.24 |
| II | BOI1 | 16 (2.3%) | 17 | 50 | 95 | 0.27 | 19 | 38 | 97 | 0.25 |
| | RVS167 | 44 (6.2%) | 53 | 62 | 96 | 0.54 | 59 | 77 | 96 | 0.65 |
| | YFR024 | 123 (17.4%) | 47 | 56 | 87 | 0.40 | 56 | 78 | 87 | 0.58 |
| | YSC84 | 67 (9.5%) | 61 | 55 | 96 | 0.54 | 60 | 83 | 94 | 0.67 |
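The effect of data imbalance described in the caption can be made concrete with a small calculation. The sketch below holds a classifier's sensitivity and specificity fixed and shows how the expected precision changes with the fraction of binders in the dataset. The operating point (sensitivity 0.80, specificity 0.95) is a hypothetical choice for illustration; only the binder counts come from the tables.

```python
def expected_precision(sensitivity, specificity, prevalence):
    """Expected precision of a classifier at a given binder prevalence."""
    tp = sensitivity * prevalence                  # fraction of true positives
    fp = (1.0 - specificity) * (1.0 - prevalence)  # fraction of false positives
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Binder fractions taken from the domain-specific datasets.
    datasets = [("Boi1, class I", 15 / 672), ("Yfr024, class II", 123 / 707)]
    for name, prevalence in datasets:
        # 0.80 and 0.95 are assumed values, not results from the paper.
        print(name, round(expected_precision(0.80, 0.95, prevalence), 2))
```

With the same sensitivity and specificity, the dataset with about 2% binders yields a far lower precision than the one with about 17% binders, which is one reason results on the scarcest domains are hard to interpret.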
Peptide sequence distributions. Peptides are divided into the two classes of binding orientation (I and II). The second column reports the number of peptides in each class. The third and fourth columns contain the number of binders and non-binders, respectively. The fifth column lists the class I and class II SH3 domains, with the corresponding numbers and proportions of binders and non-binders in the last two columns. The latter information characterizes the domain-specific datasets used to train and test the corresponding domain-specific neural networks. The percentage of binders (3rd and 6th columns) highlights the critical imbalance, which reaches acceptable levels only in the two class-specific datasets and in three class II domains in the domain-specific datasets.
| Class | Number of Peptides | Number of Binders | Number of Non-binders | SH3 Domain | Number of Binders (%) | Number of Non-binders (%) |
| I | 672 | 88 (13.1%) | 584 (86.9%) | Rvs167 | 19 (2.8%) | 653 (97.2%) |
| | | | | Yfr024c | 25 (3.7%) | 647 (96.3%) |
| | | | | Ysc84 | 12 (1.8%) | 660 (98.2%) |
| | | | | Boi1 | 15 (2.2%) | 657 (97.8%) |
| | | | | Sho1 | 37 (5.5%) | 635 (94.5%) |
| | | | | Myo5 | 35 (5.2%) | 637 (94.8%) |
| II | 707 | 131 (18.5%) | 576 (81.5%) | Rvs167 | 44 (6.2%) | 663 (93.8%) |
| | | | | Yfr024c | 123 (17.4%) | 584 (82.6%) |
| | | | | Ysc84 | 67 (9.5%) | 640 (90.5%) |
| | | | | Boi1 | 16 (2.4%) | 691 (97.6%) |
| | | | | Boi2 | 6 (0.8%) | 701 (99.2%) |
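The proportions in this table follow directly from the counts. A minimal sketch, using the class I counts from the table and a hypothetical helper named `binder_stats`, recomputes the number of non-binders, the binder percentage, and the non-binder to binder ratio that quantifies the imbalance:

```python
CLASS_I_TOTAL = 672  # number of class I peptides, from the table

# Binder counts per class I domain, from the table.
CLASS_I_BINDERS = {
    "Rvs167": 19, "Yfr024c": 25, "Ysc84": 12,
    "Boi1": 15, "Sho1": 37, "Myo5": 35,
}


def binder_stats(n_binders, n_total):
    """Return (non-binders, binder percentage, non-binder:binder ratio)."""
    n_non = n_total - n_binders
    pct = round(100.0 * n_binders / n_total, 1)
    ratio = round(n_non / n_binders, 1)
    return n_non, pct, ratio


if __name__ == "__main__":
    for domain, n in CLASS_I_BINDERS.items():
        print(domain, *binder_stats(n, CLASS_I_TOTAL))
```

The ratio column is not part of the original table; it is added here only as a compact way to read the imbalance, for example when choosing class weights or a resampling rate.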