Lei Chen, Tao Huang, Xiao-He Shi, Yu-Dong Cai, Kuo-Chen Chou.
Abstract
Given a protein-forming system, i.e., a system consisting of a certain number of different proteins, can it form a biologically meaningful pathway? This is a fundamental problem in systems biology and proteomics. During the past decade, a vast amount of information on different organisms, at both the genetic and metabolic levels, has been accumulated and systematically stored in specialized databases such as KEGG, ENZYME, BRENDA, EcoCyc and MetaCyc. These data have made it feasible to address this problem. In this paper, we analyzed known regulatory pathways in humans by extracting biological and graph features from each of 17,069 protein-forming systems, of which 169 are positive, i.e., known regulatory pathways taken from KEGG, and 16,900 are negative, i.e., they do not form a biologically meaningful pathway. Each protein-forming system was represented by 352 features, of which 88 are graph features and 264 are biological features. To analyze these features, the "Minimum Redundancy Maximum Relevance" and "Incremental Feature Selection" techniques were used to select a set of 22 optimal features for predicting whether a protein-forming system can form a biologically meaningful pathway. Cross-validation showed that the overall success rate in identifying the positive pathways was 79.88%. It is anticipated that this novel approach and its encouraging, albeit preliminary, result may stimulate extensive investigations into this important topic.
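The incremental feature selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the features have already been ranked (for example by Minimum Redundancy Maximum Relevance) and that some scoring function returns a quality value for any feature subset. The function name `incremental_feature_selection`, the feature labels, and the toy scoring function are all hypothetical and are not taken from the paper; in practice the score would be something like a cross-validated success rate.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score_subset: Callable[[List[str]], float],
) -> Tuple[List[str], List[float]]:
    """Score the nested prefixes of a ranked feature list.

    Returns the best-scoring prefix and the full score curve
    (one value per prefix length, i.e. the "IFS curve").
    """
    curve = []
    for k in range(1, len(ranked_features) + 1):
        # Subset k contains the top-k ranked features.
        curve.append(score_subset(list(ranked_features[:k])))
    # max() returns the first maximum, so ties favor the smaller subset.
    best_k = max(range(len(curve)), key=lambda i: curve[i]) + 1
    return list(ranked_features[:best_k]), curve


# Toy scoring function (an assumption for illustration only):
# reward informative features, penalize uninformative ones.
INFORMATIVE = {"f1", "f3"}


def toy_score(subset: List[str]) -> float:
    hits = sum(1 for f in subset if f in INFORMATIVE)
    noise = len(subset) - hits
    return hits - 0.25 * noise


ranked = ["f1", "f3", "f2", "f4"]  # hypothetical ranking, best first
best_subset, ifs_curve = incremental_feature_selection(ranked, toy_score)
print(best_subset)
print(ifs_curve)
```

With a real classifier, `score_subset` would train and validate a model restricted to the given features, and the peak of the resulting curve would indicate the subset size to keep.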
Year: 2010; PMID: 21076385; PMCID: PMC6259184; DOI: 10.3390/molecules15118177
Source DB: PubMed; Journal: Molecules; ISSN: 1420-3049; Impact factor: 4.411
Number of properties in feature groups 10–13.
| Property | C | T | D | Total |
|---|---|---|---|---|
| | 3 | 3 | 15 | 21 |
| | 3 | 3 | 15 | 21 |
| | 3 | 3 | 15 | 21 |
| | 3 | 3 | 15 | 21 |
| Solvent accessibility | 1 | 1 | 5 | 7 |
| Amino acid composition | 20 | --- | --- | 20 |
| Total | --- | --- | --- | 132 |
The distribution of 352 features.
| Group ID | Group Name | Number of features |
|---|---|---|
| 1 | Graph size and graph density | 2 |
| 2 | Degree statistics | 8 |
| 3 | Edge weight statistics | 4 |
| 4 | Topological change | 7 |
| 5 | Degree correlation | 6 |
| 6 | Clustering | 6 |
| 7 | Topological | 12 |
| 8 | Singular values | 3 |
| 9 | Local density change | 40 |
| 10 | Hydrophobicity, normalized van der Waals volume, polarity and polarizability | 4 × 2 × 21 = 168 |
| 11 | Solvent accessibility | 7 × 2 = 14 |
| 12 | Secondary structure | 2 × 21 = 42 |
| 13 | Amino acid compositions | 2 × 20 = 40 |
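As a quick consistency check, the group sizes in the table above can be summed and compared with the totals quoted in the abstract. The sketch below assumes that groups 1–9 are the graph features and groups 10–13 are the biological features; that split is inferred from the group names and is not stated explicitly in the table.

```python
# Number of features per group, copied from the table above.
group_sizes = {
    1: 2,            # Graph size and graph density
    2: 8,            # Degree statistics
    3: 4,            # Edge weight statistics
    4: 7,            # Topological change
    5: 6,            # Degree correlation
    6: 6,            # Clustering
    7: 12,           # Topological
    8: 3,            # Singular values
    9: 40,           # Local density change
    10: 4 * 2 * 21,  # Four physicochemical properties
    11: 7 * 2,       # Solvent accessibility
    12: 2 * 21,      # Secondary structure
    13: 2 * 20,      # Amino acid compositions
}

# Assumed split: groups 1-9 are graph features, 10-13 are biological.
graph_total = sum(n for g, n in group_sizes.items() if g <= 9)
biological_total = sum(n for g, n in group_sizes.items() if g >= 10)
feature_total = graph_total + biological_total

print(graph_total, biological_total, feature_total)
```

Under that assumption the sums match the 88 graph features, 264 biological features, and 352 features in total reported in the abstract.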
Figure 1. Illustration of the distribution of features. See the text in Section 3.1 for further explanation.
Figure 2. The IFS (incremental feature selection) curve. See the text in Section 3.2 for further explanation.
Figure 3. Distribution of the optimized 22 features. See the text in Section 3.2 for further explanation.