Abstract
G-protein-coupled receptors (GPCRs) are a class of seven-helix transmembrane proteins that are widely studied in bioinformatics as targets for drug discovery in human diseases. Although thousands of GPCR sequences have been collected, the ligand specificity of many GPCRs is still unknown, and only one crystal structure of the rhodopsin-like family has been solved. Therefore, identifying GPCR types from sequence data alone has become an important research issue. In this study, we introduce a novel technique for identifying GPCR types that combines the weighted Levenshtein distance (WLD) between two receptor sequences with the nearest neighbor method (NNM) and can handle receptor sequences of different lengths directly. In our experiments on classifying four classes (acetylcholine, adrenoceptor, dopamine, and serotonin) of the rhodopsin-like family of GPCRs, the error rates from the leave-one-out procedure and the leave-half-out procedure are 0.62% and 1.24%, respectively. These results are superior to those of the covariant discriminant algorithm, the support vector machine method, and the NNM with Euclidean distance.
Year: 2005 PMID: 16689695 PMCID: PMC5173237 DOI: 10.1016/s1672-0229(05)03036-6
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Overall and Class Error Rates and Misclassified Accession Numbers of GPCRs in the Leave-one-out Procedure
| Method | Acetylcholine [1] | Adrenoceptor [2] | Dopamine [3] | Serotonin [4] | Overall error rate |
|---|---|---|---|---|---|
| Covariant discriminant algorithm | 10/31 (32.26%) | 5/44 (11.36%) | 7/38 (18.42%) | 6/54 (11.11%) | 28/167 (16.77%) |
| NNM with Euclidean distance | 0/28 (0.00%) | 5/43 (11.63%) | 4/37 (10.81%) | 4/54 (7.41%) | 13/162 (8.02%) |
| NNM with WLD | 0/28 (0.00%) | 0/43 (0.00%) | 1/37 (2.70%) | 0/54 (0.00%) | 1/162 (0.62%) |
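The leave-one-out procedure behind the table above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `leave_one_out_error`, the toy distance `mismatch_distance`, and the toy sequences are all assumptions introduced here. Each sequence is classified by its nearest neighbor among all the other sequences, and the error rate is the fraction of sequences whose nearest neighbor carries a different class label.

```python
from typing import Callable, Sequence


def leave_one_out_error(
    seqs: Sequence[str],
    labels: Sequence[str],
    dist: Callable[[str, str], float],
) -> float:
    """Fraction of sequences misclassified by 1-nearest-neighbor
    when each sequence in turn is held out as the query."""
    errors = 0
    for i, query in enumerate(seqs):
        # Nearest neighbor among every sequence except the held-out one.
        nearest = min(
            (j for j in range(len(seqs)) if j != i),
            key=lambda j: dist(query, seqs[j]),
        )
        errors += labels[nearest] != labels[i]
    return errors / len(seqs)


def mismatch_distance(a: str, b: str) -> float:
    """Toy stand-in distance (an assumption, not the WLD):
    position-wise mismatches plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Hypothetical toy data; real inputs would be GPCR sequences and classes.
toy_seqs = ["AAAA", "AAAT", "CCCC", "CCCG"]
toy_labels = ["x", "x", "y", "y"]
toy_error = leave_one_out_error(toy_seqs, toy_labels, mismatch_distance)
```

Any distance between two strings can be passed as `dist`, so the same loop would serve for either distance compared in the table. One misclassification among 162 sequences corresponds to the reported 0.62%.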
Overall and Class Error Rates and Misclassified Accession Numbers of GPCRs in the Leave-half-out Procedure
| Method | Test set | Acetylcholine [1] | Adrenoceptor [2] | Dopamine [3] | Serotonin [4] | Overall error rate |
|---|---|---|---|---|---|---|
| SVM with 10-fold cross-validation | N/A | 0.00% | 9.09% | 5.26% | 7.49% | 5.99% |
| NNM with Euclidean distance | Set 1 | 0/14 (0.00%) | 1/22 (4.55%) | 4/18 (22.22%) | 2/27 (7.41%) | 7/81 (8.64%) |
| NNM with Euclidean distance | Set 2 | 0/14 (0.00%) | 2/21 (9.52%) | 3/19 (15.79%) | 2/27 (7.41%) | 7/81 (8.64%) |
| NNM with WLD | Set 1 | 0/14 (0.00%) | 0/22 (0.00%) | 0/18 (0.00%) | 0/27 (0.00%) | 0/81 (0.00%) |
| NNM with WLD | Set 2 | 0/14 (0.00%) | 0/21 (0.00%) | 2/19 (10.53%) | 0/27 (0.00%) | 2/81 (2.47%) |
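A leave-half-out evaluation of the kind tabulated above might look like the sketch below. How the two halves were actually drawn is not stated in this record, so the alternating split used here is an assumption, as are the names `nn_error`, `leave_half_out`, and `mismatch_distance`. Each half is classified by its nearest neighbor in the other half, giving one error rate per test set.

```python
from typing import Callable, List, Tuple

Labeled = Tuple[str, str]  # (sequence, class label)


def nn_error(
    train: List[Labeled],
    test: List[Labeled],
    dist: Callable[[str, str], float],
) -> float:
    """Error rate of 1-nearest-neighbor on `test`, using `train` as references."""
    errors = 0
    for seq, label in test:
        _, predicted = min(train, key=lambda item: dist(seq, item[0]))
        errors += predicted != label
    return errors / len(test)


def leave_half_out(
    data: List[Labeled],
    dist: Callable[[str, str], float],
) -> Tuple[float, float]:
    """Split `data` into two halves and test each half against the other.

    Assumption: an alternating split. If `data` is sorted by class,
    this keeps the class proportions roughly equal in both halves.
    """
    half_a, half_b = data[0::2], data[1::2]
    return nn_error(half_b, half_a, dist), nn_error(half_a, half_b, dist)


def mismatch_distance(a: str, b: str) -> float:
    """Toy stand-in distance (an assumption, not the WLD)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Hypothetical toy data, sorted by class so both halves contain both classes.
toy_data = [("AAAA", "x"), ("AAAT", "x"), ("CCCC", "y"), ("CCCG", "y")]
set1_error, set2_error = leave_half_out(toy_data, mismatch_distance)
```

Because only half of the sequences are available as references, this procedure is a harder test than leave-one-out.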
Summary of 162 GPCRs from Four Classes of the Rhodopsin-like Family
| Class | Number | Minimal length (aa) | Maximal length (aa) | Average length (aa) |
|---|---|---|---|---|
| Acetylcholine | 28 | 460 | 805 | 531.46 |
| Adrenoceptor | 43 | 400 | 519 | 448.09 |
| Dopamine | 37 | 363 | 539 | 441.35 |
| Serotonin | 54 | 357 | 834 | 443.69 |
Fig. 1 The computational procedure of WLD.
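Since the figure itself is not reproduced here, the following is a minimal sketch of a weighted Levenshtein distance computed by the standard dynamic-programming recurrence. The weighting scheme used in the paper is not given in this record, so the cost parameters `sub_cost` and `indel_cost` and the function name `weighted_levenshtein` are assumptions chosen for illustration only.

```python
from typing import Callable


def unit_substitution(x: str, y: str) -> float:
    """Default substitution weight: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def weighted_levenshtein(
    a: str,
    b: str,
    sub_cost: Callable[[str, str], float] = unit_substitution,
    indel_cost: float = 1.0,
) -> float:
    """Minimum total cost of insertions, deletions, and substitutions
    that turn sequence `a` into sequence `b`."""
    # prev[j] holds the cost of turning the first i-1 symbols of `a`
    # into the first j symbols of `b`; row 0 is all insertions.
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]  # turning a[:i] into "" is all deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete ca
                curr[j - 1] + indel_cost,        # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Hypothetical weighting: any mismatch costs half as much as an indel.
half_cost = weighted_levenshtein(
    "AC", "AG", sub_cost=lambda x, y: 0.0 if x == y else 0.5
)
```

The recurrence places no constraint on the lengths of `a` and `b`, which is why a distance of this kind can compare receptor sequences of different lengths directly. Keeping only two rows of the table holds memory linear in the length of `b`.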