Tao Huang, Lei Chen, Yu-Dong Cai, Kuo-Chen Chou.
Abstract
Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? This problem is closely related to the biological function of the pathway in cells and is therefore fundamental to systems biology and proteomics. It is also extremely difficult because of its complexity. To address it, a novel approach was developed to predict query pathways among the following six functional categories: (i) "Metabolism", (ii) "Genetic Information Processing", (iii) "Environmental Information Processing", (iv) "Cellular Processes", (v) "Organismal Systems", and (vi) "Human Diseases". The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each pathway concerned is formulated as a 5570-D (dimensional) vector; (ii) each component of the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graph property, biochemical and physicochemical property, and functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A jackknife cross-validation test on a benchmark dataset of 146 regulatory pathways indicated that our method achieved an overall success rate of 78.8% in identifying query pathways among the above six classes, a promising and encouraging outcome. To the best of our knowledge, the current study represents the first effort to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area.
Year: 2011 PMID: 21980418 PMCID: PMC3182212 DOI: 10.1371/journal.pone.0025297
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The distribution of the 146 regulatory pathways.
| Pathway class | Number of pathways |
| Metabolism | 73 |
| Genetic Information Processing | 2 |
| Environmental Information Processing | 15 |
| Cellular Processes | 9 |
| Organismal Systems | 19 |
| Human Diseases | 28 |
| Total | 146 |
A breakdown of the 264 features for a pathway system by considering its biochemical and physicochemical properties.
| Properties | C | T | D | Mean category | Maximum category | Pathway system |
| Hydrophobicity | 3 | 3 | 15 | 21 | 21 | 42 |
| Normalized van der Waals volume | 3 | 3 | 15 | 21 | 21 | 42 |
| Polarity | 3 | 3 | 15 | 21 | 21 | 42 |
| Polarizability | 3 | 3 | 15 | 21 | 21 | 42 |
| Secondary structure | 3 | 3 | 15 | 21 | 21 | 42 |
| Solvent accessibility | 1 | 1 | 5 | 7 | 7 | 14 |
| Amino acid composition | 20 | N/A | N/A | 20 | 20 | 40 |
| Total | 36 | 16 | 80 | 132 | 132 | 264 |
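The C, T and D columns correspond to the composition, transition and distribution descriptors that appear in the feature names further down (for example `polarity_transition_NH_max`). The following is a minimal sketch of how the 21 values per property (3 C + 3 T + 15 D) and the 42 pathway-level values (mean and maximum over the proteins of a pathway) could be computed. The three-class hydrophobicity grouping, the inclusion of 1.0 as the fifth distribution point, and the function names are assumptions made for illustration; they are not taken from the record above. Solvent accessibility, which has a different layout (1 C, 1 T, 5 D), is not covered.

```python
# Assumed three-class grouping for hydrophobicity (not stated in the text above).
GROUPS = {
    "P": "RKEDQN",   # polar
    "N": "GASTPHY",  # neutral
    "H": "CLVIMFW",  # hydrophobic
}
CLASS_OF = {aa: label for label, members in GROUPS.items() for aa in members}
LABELS = ("P", "N", "H")
PAIRS = (("P", "N"), ("P", "H"), ("N", "H"))
# Assumed distribution points: first residue, 25%, 50%, 75% and last residue.
FRACTIONS = (0.0, 0.25, 0.5, 0.75, 1.0)


def ctd_descriptors(sequence):
    """Return 21 CTD values (3 C + 3 T + 15 D) for one protein sequence."""
    encoded = [CLASS_OF[aa] for aa in sequence if aa in CLASS_OF]
    n = len(encoded)
    if n == 0:
        raise ValueError("sequence contains no recognised residues")
    features = {}
    # Composition: fraction of residues that fall in each class.
    for label in LABELS:
        features[f"composition_{label}"] = encoded.count(label) / n
    # Transition: fraction of adjacent residue pairs that switch between two classes.
    for a, b in PAIRS:
        switches = sum(1 for x, y in zip(encoded, encoded[1:]) if {x, y} == {a, b})
        features[f"transition_{a}{b}"] = switches / (n - 1) if n > 1 else 0.0
    # Distribution: relative chain position of the residue that marks the first,
    # 25%, 50%, 75% and last occurrence of each class (0.0 if the class is absent).
    for label in LABELS:
        positions = [i + 1 for i, c in enumerate(encoded) if c == label]
        for frac in FRACTIONS:
            if positions:
                rank = max(1, int(frac * len(positions)))
                value = positions[rank - 1] / n
            else:
                value = 0.0
            features[f"distribution_{label}.{frac}"] = value
    return features


def pathway_features(sequences):
    """Aggregate per-protein CTD values into mean and max over a pathway (42 values)."""
    per_protein = [ctd_descriptors(seq) for seq in sequences]
    aggregated = {}
    for name in per_protein[0]:
        values = [f[name] for f in per_protein]
        aggregated[f"{name}_mean"] = sum(values) / len(values)
        aggregated[f"{name}_max"] = max(values)
    return aggregated
```

Calling `pathway_features` on a list of protein sequences yields 42 named values for one property, which matches the "Pathway system" column for the five three-class properties.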
A breakdown of the 5570 features.
| Categories | Group name | Number of features |
| Graph property | Graph size and graph density | 2 |
| | Degree statistics | 8 |
| | Edge weight statistics | 4 |
| | Topological change | 7 |
| | Degree correlation | 6 |
| | Clustering | 6 |
| | Topological | 12 |
| | Singular values | 3 |
| | Local density change | 40 |
| Biochemical and physicochemical property | Amino acid compositions | 40 |
| | Hydrophobicity, normalized van der Waals volume, polarity and polarizability | 168 |
| | Solvent accessibility | 14 |
| | Secondary structure | 42 |
| Functional property | Gene ontology enrichment score | 5218 |
| Total | N/A | 5570 |
The distribution of the most relevant 55 features.
| Category | Number of features |
| Graph property | 0 |
| Biochemical and physicochemical property | 32 |
| Functional property | 23 |
| Total | 55 |
Figure 1. The IFS curve.
The highest IFS accuracy (ACC), 78.8%, was obtained with 49 features and the SMO model.
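An IFS curve of this kind can be produced by adding ranked features one at a time and re-evaluating the classifier after each addition. The sketch below assumes a feature ranking is already available (for example from mRMR) and scores each subset with a jackknife (leave-one-out) test. To stay self-contained it uses a plain nearest-neighbor classifier as a stand-in for the SMO model named in the caption, so it illustrates the procedure rather than the authors' implementation. All function names and the toy data are hypothetical.

```python
def nearest_neighbor_label(train_rows, train_labels, query):
    """Label of the training row closest to `query` (squared Euclidean distance)."""
    best = min(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], query)),
    )
    return train_labels[best]


def jackknife_accuracy(rows, labels):
    """Leave-one-out test: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(train_rows, train_labels, rows[i]) == labels[i]:
            correct += 1
    return correct / len(rows)


def incremental_feature_selection(rows, labels, ranked_features):
    """Evaluate the top-1, top-2, ... feature subsets and return the curve and its peak."""
    curve = []
    for k in range(1, len(ranked_features) + 1):
        columns = ranked_features[:k]
        subset = [[row[c] for c in columns] for row in rows]
        curve.append((k, jackknife_accuracy(subset, labels)))
    # max() keeps the first maximum, i.e. the smallest subset among ties.
    best_k, best_acc = max(curve, key=lambda point: point[1])
    return curve, best_k, best_acc


# Toy data (hypothetical): features 0 and 1 separate the classes, feature 2 is noise.
toy_rows = [[0.0, 0.1, 5.0], [0.1, 0.0, -5.0], [1.0, 0.9, 4.0], [0.9, 1.0, -4.0]]
toy_labels = ["A", "A", "B", "B"]
toy_curve, toy_best_k, toy_best_acc = incremental_feature_selection(
    toy_rows, toy_labels, ranked_features=[0, 1, 2]
)
print(toy_curve, toy_best_k, toy_best_acc)
```

On the toy data the accuracy stays at its peak for the first two subsets and collapses once the noisy third feature is added, which is the kind of behavior an IFS curve is meant to expose.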
The 49 optimized features.
| Order | Feature name |
| 1 | secondary_structure_composition_P_max |
| 2 | solvent_accessibility_composition_H_mean |
| 3 | solvent_accessibility_distribution_H.0.75_max |
| 4 | GO:0043627 response to estrogen stimulus |
| 5 | GO:0045121 membrane raft |
| 6 | secondary_structure_distribution_H.0.25_max |
| 7 | AA_composition_S_mean |
| 8 | secondary_structure_distribution_N.0.25_max |
| 9 | VanDerWaal_composition_P_max |
| 10 | GO:0043330 response to exogenous dsRNA |
| 11 | VanDerWaal_distribution_H.0.75_max |
| 12 | AA_composition_T_max |
| 13 | AA_composition_D_max |
| 14 | secondary_structure_distribution_H.0.5_max |
| 15 | GO:0048519 negative regulation of biological process |
| 16 | GO:0002687 positive regulation of leukocyte migration |
| 17 | secondary_structure_composition_P_mean |
| 18 | polarity_composition_N_max |
| 19 | GO:0042088 T-helper 1 type immune response |
| 20 | polarity_transition_NH_max |
| 21 | AA_composition_S_max |
| 22 | GO:0042063 gliogenesis |
| 23 | polarizability_distribution_P.0.75_max |
| 24 | GO:0090068 positive regulation of cell cycle process |
| 25 | GO:0014829 vascular smooth muscle contraction |
| 26 | secondary_structure_distribution_H.0.75_max |
| 27 | AA_composition_Q_mean |
| 28 | GO:0030225 macrophage differentiation |
| 29 | GO:0046661 male sex differentiation |
| 30 | hydrophobicity_composition_N_max |
| 31 | solvent_accessibility_distribution_H.0.0_max |
| 32 | polarity_distribution_P.0.5_max |
| 33 | polarizability_distribution_H.0.75_max |
| 34 | GO:0031594 neuromuscular junction |
| 35 | GO:0031330 negative regulation of cellular catabolic process |
| 36 | AA_composition_P_max |
| 37 | GO:0042953 lipoprotein transport |
| 38 | GO:0048523 negative regulation of cellular process |
| 39 | GO:0030217 T cell differentiation |
| 40 | GO:0007517 muscle organ development |
| 41 | GO:0009913 epidermal cell differentiation |
| 42 | GO:0042177 negative regulation of protein catabolic process |
| 43 | GO:0048641 regulation of skeletal muscle tissue development |
| 44 | hydrophobicity_distribution_N.0.75_max |
| 45 | hydrophobicity_distribution_H.0.75_max |
| 46 | GO:0022408 negative regulation of cell-cell adhesion |
| 47 | GO:0048608 reproductive structure development |
| 48 | GO:0045638 negative regulation of myeloid cell differentiation |
| 49 | GO:0006897 endocytosis |
Figure 2. Distribution of the optimized 49 features.
Of the 49 optimized features, 25 (25/49, 51.0%) came from the biochemical and physicochemical property and 24 (24/49, 49.0%) from the functional property; no graph property feature was selected into the optimized feature set.
Hypergeometric test of overlap between KEGG pathway classes and GO terms in optimized features.
| GO term | Metabolism | Genetic Information Processing | Environmental Information Processing | Cellular Processes | Organismal Systems | Human Diseases |
| GO:0043627 response to estrogen stimulus | 0.032588 | 1 | 5.15E-16 | 1.86E-08 | 0.004826 | 2.30E-19 |
| GO:0045121 membrane raft | 0.681728 | 0.018851 | 2.68E-13 | 7.52E-15 | 1.09E-22 | 8.64E-15 |
| GO:0043330 response to exogenous dsRNA | 1 | 1 | 0.106165 | 0.003522 | 0.000117 | 0.001727 |
| GO:0048519 negative regulation of biological process | 1 | 1 | 1.86E-59 | 8.01E-39 | 4.20E-12 | 1.90E-51 |
| GO:0002687 positive regulation of leukocyte migration | 1 | 1 | 2.11E-09 | 0.001789 | 0.013702 | 0.000707 |
| GO:0042088 T-helper 1 type immune response | 1 | 1 | 3.50E-06 | 0.471266 | 0.094723 | 0.001178 |
| GO:0042063 gliogenesis | 0.993714 | 1 | 5.20E-11 | 1.30E-05 | 0.019525 | 1.32E-13 |
| GO:0090068 positive regulation of cell cycle process | 0.911776 | 1 | 9.12E-08 | 3.49E-06 | 0.024096 | 3.29E-08 |
| GO:0014829 vascular smooth muscle contraction | 1 | 1 | 0.000189 | 0.049965 | 0.023416 | 0.002415 |
| GO:0030225 macrophage differentiation | 1 | 1 | 0.003204 | 0.022913 | 0.00372 | 0.001178 |
| GO:0046661 male sex differentiation | 0.664515 | 1 | 4.00E-10 | 0.036323 | 0.938207 | 3.85E-07 |
| GO:0031594 neuromuscular junction | 1 | 1 | 0.001106 | 4.49E-06 | 1.97E-05 | 0.00224 |
| GO:0031330 negative regulation of cellular catabolic process | 1 | 1 | 0.006858 | 0.527536 | 0.137844 | 0.00224 |
| GO:0042953 lipoprotein transport | 1 | 1 | 0.127363 | 0.312566 | 0.023416 | 0.031663 |
| GO:0048523 negative regulation of cellular process | 0.999997 | 1 | 1.89E-56 | 1.93E-38 | 1.57E-08 | 4.91E-50 |
| GO:0030217 T cell differentiation | 0.957773 | 1 | 1.26E-16 | 0.023685 | 0.000397 | 1.82E-10 |
| GO:0007517 muscle organ development | 0.998366 | 1 | 6.32E-12 | 6.49E-09 | 0.32379 | 2.38E-09 |
| GO:0009913 epidermal cell differentiation | 1 | 1 | 0.123185 | 0.55964 | 0.968491 | 0.395449 |
| GO:0042177 negative regulation of protein catabolic process | 1 | 1 | 0.019214 | 0.002942 | 0.021538 | 0.001178 |
| GO:0048641 regulation of skeletal muscle tissue development | 1 | 1 | 5.03E-05 | 0.001284 | 0.447341 | 2.50E-06 |
| GO:0022408 negative regulation of cell-cell adhesion | 1 | 1 | 0.015685 | 0.040951 | 0.017213 | 0.001727 |
| GO:0048608 reproductive structure development | 0.431739 | 1 | 2.90E-16 | 0.036125 | 0.271969 | 4.81E-12 |
| GO:0045638 negative regulation of myeloid cell differentiation | 1 | 1 | 0.032936 | 0.289118 | 0.009817 | 1.09E-06 |
| GO:0006897 endocytosis | 0.995474 | 1 | 0.000121 | 0.012134 | 0.09916 | 0.006247 |
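The p-values in the table above come from a hypergeometric test of overlap. The record does not state how the gene universe or the gene sets were defined, so the following is only a generic sketch of a one-sided hypergeometric enrichment test. The parameter names (`population`, `annotated`, `sample`, `overlap`) and the numbers in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, overlap):
    """P(X >= overlap) for X ~ Hypergeometric.

    population: total number of genes in the assumed background
    annotated:  genes in the background that carry the GO term
    sample:     genes belonging to the pathway class
    overlap:    genes in the class that also carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 4 annotated, 3 in the class, 2 shared.
print(hypergeom_enrichment_pvalue(10, 4, 3, 2))
# With no shared genes the tail covers every outcome, so the p-value is 1.
print(hypergeom_enrichment_pvalue(10, 4, 3, 0))
```

Under this formulation an observed overlap of zero always gives a p-value of 1, which is one way entries equal to 1 can arise in a table of this kind.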