Bi-Qing Li, Tao Huang, Lei Liu, Yu-Dong Cai, Kuo-Chen Chou.
Abstract
One of the most important and challenging problems in biomedicine and genomics is how to identify disease genes. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) gene expression profiles and (ii) shortest path analysis of functional protein association networks. The former has long been used to select differentially expressed genes as disease genes, while the latter has been widely used to study disease mechanisms. Using the existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), we constructed a weighted functional protein association network. By means of the mRMR (Maximum Relevance Minimum Redundancy) approach, six genes were identified whose expression profiles can distinguish colorectal tumors from normal adjacent colonic tissues. Using the shortest path approach, we then found an additional 35 genes, some of which have been reported to be relevant to colorectal cancer and some of which are very likely to be relevant to it. Interestingly, the genes identified from both the gene expression profiles and the functional protein association network include more known cancer genes than the genes identified from the gene expression profiles alone. These genes also show greater functional similarity to the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. All these results indicate that the method presented in this paper is quite promising. It may become a useful tool, or at least play a complementary role to existing methods, for identifying colorectal cancer genes. It has not escaped our notice that the method can be applied to identify genes related to other diseases as well.
Year: 2012 PMID: 22496748 PMCID: PMC3319543 DOI: 10.1371/journal.pone.0033393
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. IFS curve for classifying colorectal tumors and matched normal adjacent tissue samples.
In the IFS (incremental feature selection) curve, the X-axis shows the number of probes used for classification and the Y-axis shows the prediction accuracy of the nearest neighbor algorithm (NNA), evaluated by the jackknife (leave-one-out) cross-validation test. The peak accuracy of 1 (100%) was reached with six probes, so the top six probes in the mRMR probe list formed the optimal discriminative probe set.
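The IFS procedure in the caption can be sketched in Python as below. This is a minimal illustration, not the authors' code: it assumes a samples-by-probes matrix `X`, class labels `y`, and a probe ranking `ranked` (for example, the mRMR order). It uses Euclidean distance for the nearest neighbor search, which is an assumption because this excerpt does not state the NNA's distance measure; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed choice of metric for the NNA)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loocv_1nn_accuracy(X: List[List[float]], y: List[int], cols: List[int]) -> float:
    """Jackknife (leave-one-out) accuracy of a 1-nearest-neighbor classifier
    that only looks at the feature columns listed in `cols`."""
    n = len(X)
    correct = 0
    for i in range(n):
        query = [X[i][c] for c in cols]
        best_j, best_d = -1, math.inf
        for j in range(n):
            if j == i:
                continue  # sample i is held out
            d = euclidean(query, [X[j][c] for c in cols])
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / n


def incremental_feature_selection(
    X: List[List[float]], y: List[int], ranked: List[int]
) -> Tuple[List[float], int]:
    """Score the top-1, top-2, ... features of a ranking (e.g. mRMR order)
    and return the accuracy curve and the smallest k that reaches the peak."""
    curve = [loocv_1nn_accuracy(X, y, ranked[:k]) for k in range(1, len(ranked) + 1)]
    best_k = curve.index(max(curve)) + 1
    return curve, best_k
```

On a toy matrix where feature 0 separates the classes and feature 1 is noise, the curve peaks at k = 1 and adding the noisy feature lowers the accuracy, mirroring how the paper reads the optimal set off the peak of the curve.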
mRMR top six genes.
| Order | Probe name | Symbol | Entrez ID | Protein ID |
| 1 | ILMN_1735578 | GUCA2B | 2981 | ENSP00000361662 |
| 2 | ILMN_1766264 | PI16 | 221476 | ENSP00000362778 |
| 3 | ILMN_1704294 | CDH3 | 1001 | ENSP00000264012 |
| 4 | ILMN_2143314 | SPIB | 6689 | ENSP00000270632 |
| 5 | ILMN_1755796 | BEST2 | 54831 | ENSP00000042931 |
| 6 | ILMN_2339192 | HMGCLL1 | 54511 | ENSP00000381654 |
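The mRMR ranking behind this table can be illustrated with the MID (mutual information difference) criterion. Each step picks the feature whose mutual information with the class label, minus its mean mutual information with the features already selected, is largest. This sketch assumes the expression values have already been discretized and that the MID variant was used; neither detail is given in this excerpt, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(a: Sequence, b: Sequence) -> float:
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over observed (x, y): p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def _mid_score(f: str, features: Dict[str, Sequence], relevance: Dict[str, float],
               selected: List[str]) -> float:
    """MID criterion: relevance minus mean redundancy with already selected features."""
    if not selected:
        return relevance[f]
    redundancy = sum(mutual_information(features[f], features[s]) for s in selected)
    return relevance[f] - redundancy / len(selected)


def mrmr_rank(features: Dict[str, Sequence], target: Sequence, k: int) -> List[str]:
    """Greedy mRMR ranking of up to k features (ties broken alphabetically)."""
    relevance = {f: mutual_information(v, target) for f, v in features.items()}
    selected: List[str] = []
    remaining = set(features)
    while remaining and len(selected) < k:
        best = max(sorted(remaining),
                   key=lambda f: _mid_score(f, features, relevance, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what separates mRMR from ranking by relevance alone: an exact copy of an already chosen feature is pushed down the list even if it is highly relevant on its own, while a weaker but independent feature moves up.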
Figure 2. The 15 shortest paths between the six genes identified with the mRMR method.
The 15 shortest paths between the six candidate genes were identified with Dijkstra's algorithm based on the PPI data from STRING. Yellow rounded rectangles represent the six candidate genes identified by the mRMR method; red circles represent the 35 genes lying on the shortest paths. Numbers on the edges are edge weights that quantify interaction confidence: the smaller the number, the stronger the interaction between the two nodes. See the section “Graph approach and shortest paths tracing” for how the edge weight is derived from the confidence score between the two proteins concerned.
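Tracing a shortest path between every pair of candidate genes can be sketched with Dijkstra's algorithm as follows. With six seed genes there are 6 × 5 / 2 = 15 pairs, which matches the 15 paths in Figure 2. The actual mapping from a STRING confidence score (0 to 1000) to an edge weight is defined in the paper's “Graph approach and shortest paths tracing” section, not in this excerpt; `weight = 1000 - score` below is a hypothetical stand-in that only preserves the property stated in the caption (higher confidence gives a smaller weight). Function names are hypothetical.

```python
import heapq
from itertools import combinations
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: List[Tuple[str, str, int]]) -> Graph:
    """Undirected weighted graph; weight = 1000 - score is a hypothetical mapping."""
    g: Graph = {}
    for a, b, score in edges:
        w = 1000 - score  # higher confidence -> smaller (non-negative) weight
        g.setdefault(a, {})[b] = w
        g.setdefault(b, {})[a] = w
    return g


def dijkstra_path(g: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Return (total weight, node list) of a shortest src -> dst path."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == dst:
            break
        for v, w in g.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), []  # dst unreachable from src
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def pairwise_shortest_paths(g: Graph, genes: List[str]) -> Dict[Tuple[str, str], Tuple[float, List[str]]]:
    """One shortest path for each unordered pair of seed genes."""
    return {(a, b): dijkstra_path(g, a, b) for a, b in combinations(genes, 2)}
```

Dijkstra's algorithm requires non-negative edge weights, which the hypothetical `1000 - score` mapping guarantees for scores on the 0 to 1000 scale. The interior nodes of the returned paths correspond to the kind of genes shown as red circles in Figure 2.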
Shortest paths genes.
| Order | Protein ID | Symbol | Betweenness | P-value |
| 1 | ENSP00000363822 | AR | 7 | 0 |
| 2 | ENSP00000269305 | TP53 | 6 | 0.3442 |
| 3 | ENSP00000230354 | TBP | 5 | 0.0066 |
| 4 | ENSP00000250003 | MYOD1 | 5 | 0.0006 |
| 5 | ENSP00000263253 | EP300 | 5 | 0.0598 |
| 6 | ENSP00000287936 | HMGCR | 5 | 0 |
| 7 | ENSP00000314151 | KLK3 | 5 | 0 |
| 8 | ENSP00000344456 | CTNNB1 | 5 | 0.0984 |
| 9 | ENSP00000344741 | INSIG1 | 5 | 0 |
| 10 | ENSP00000349508 | CHD4 | 5 | 0 |
| 11 | ENSP00000351363 | MSMB | 5 | 0 |
| 12 | ENSP00000354620 | FOXJ3 | 5 | 0 |
| 13 | ENSP00000362649 | HDAC1 | 5 | 0.0108 |
| 14 | ENSP00000396219 | MEF2C | 5 | 0 |
| 15 | ENSP00000417884 | TRIM27 | 5 | 0 |
| 16 | ENSP00000342470 | NR1H3 | 4 | 0.005 |
| 17 | ENSP00000354476 | SREBF2 | 4 | 0.0038 |
| 18 | ENSP00000363868 | ABCA1 | 4 | 0.0098 |
| 19 | ENSP00000361066 | NCOA3 | 3 | 0.0038 |
| 20 | ENSP00000419692 | RXRA | 3 | 0.0098 |
| 21 | ENSP00000324806 | GSK3B | 2 | 0.1016 |
| 22 | ENSP00000399968 | NCOA2 | 2 | 0.0308 |
| 23 | ENSP00000206249 | ESR1 | 1 | 0.1968 |
| 24 | ENSP00000254227 | NR0B2 | 1 | 0.0346 |
| 25 | ENSP00000262367 | CREBBP | 1 | 0.0754 |
| 26 | ENSP00000265565 | SCAP | 1 | 0.0088 |
| 27 | ENSP00000268712 | NCOR1 | 1 | 0.0176 |
| 28 | ENSP00000297146 | GPR85 | 1 | 0.0104 |
| 29 | ENSP00000304895 | IRS1 | 1 | 0.0976 |
| 30 | ENSP00000329357 | SP1 | 1 | 0.1242 |
| 31 | ENSP00000348069 | SREBF1 | 1 | 0.023 |
| 32 | ENSP00000348551 | NCOR2 | 1 | 0.0162 |
| 33 | ENSP00000348827 | THRB | 1 | 0.0082 |
| 34 | ENSP00000348986 | INS-IGF2 | 1 | 0.0898 |
| 35 | ENSP00000353483 | MAPK8 | 1 | 0.1194 |
P-values < 0.05 are considered significant.
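The Betweenness column can be read as the number of the 15 shortest paths that pass through a gene, and the P-value column as an empirical estimate from randomization. The sketch below assumes both readings, which this excerpt does not spell out: betweenness counts each gene as an interior node of a path, and the P-value is the fraction of randomized runs (a hypothetical `null_counts` list, e.g. betweenness values obtained from random seed-gene sets) that reach at least the observed value.

```python
from collections import Counter
from typing import Iterable, Sequence


def path_betweenness(paths: Iterable[Sequence[str]]) -> Counter:
    """For each node, count how many paths pass through it as an interior node
    (the two endpoints of each path are the seed genes themselves)."""
    counts: Counter = Counter()
    for path in paths:
        counts.update(set(path[1:-1]))
    return counts


def empirical_p_value(observed: int, null_counts: Sequence[int]) -> float:
    """Fraction of randomized runs whose betweenness is >= the observed value."""
    return sum(c >= observed for c in null_counts) / len(null_counts)
```

Under this reading, a P-value of 0 means no randomized run reached the observed betweenness, so it is bounded by the number of runs rather than being exactly zero.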
MaxRel table genes KEGG enrichment.
| Term | KEGG ID | Count | Percentage | P-value | Benjamini-adjusted P-value |
| Fatty acid metabolism | 00071 | 11 | 1.2 | 8.4E-5 | 1.5E-2 |
| Pentose and glucuronate interconversions | 00040 | 7 | 0.8 | 3.0E-4 | 2.7E-2 |
| Starch and sucrose metabolism | 00500 | 10 | 1.1 | 6.6E-4 | 3.8E-2 |
Count: the number of genes belonging to a given pathway.
Percentage: the percentage of all genes submitted to KEGG pathway analysis that belong to a given pathway.
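Enrichment P-values like those in the KEGG tables are commonly computed with a one-sided hypergeometric (Fisher's exact) test. The excerpt does not name the tool or the test variant, so the sketch below is a plain hypergeometric tail probability with hypothetical parameter names: `M` background genes, `n` of them in the pathway, `N` genes in the submitted list, and `k` of those in the pathway. The Percentage column is `100 * Count / list size`.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N genes drawn from a background of M,
    of which n belong to the pathway (one-sided Fisher's exact test)."""
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / total


def percentage(count: int, list_size: int) -> float:
    """Share of the submitted gene list that falls in the pathway, in percent."""
    return 100.0 * count / list_size
```

Enrichment tools may apply a modified or more conservative version of this test, so values from this sketch need not match the table exactly.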
mRMR top six genes and shortest path genes KEGG enrichment.
| Term | KEGG ID | Count | Percentage | P-value | Benjamini-adjusted P-value |
| Prostate cancer | 05215 | 8 | 19.5 | 3.80E-08 | 2.40E-06 |
| Pathways in cancer | 05200 | 10 | 24.4 | 2.60E-06 | 8.00E-05 |
| Wnt signaling pathway | 04310 | 6 | 14.6 | 3.00E-04 | 6.30E-03 |
| Huntington's disease | 05016 | 6 | 14.6 | 6.70E-04 | 1.10E-02 |
| Notch signaling pathway | 04330 | 4 | 9.8 | 8.80E-04 | 1.10E-02 |
| Cell cycle | 04110 | 5 | 12.2 | 1.50E-03 | 1.60E-02 |
| Insulin signaling pathway | 04910 | 5 | 12.2 | 2.00E-03 | 1.80E-02 |
| Colorectal cancer | 05210 | 4 | 9.8 | 4.70E-03 | 3.60E-02 |
| Thyroid cancer | 05216 | 3 | 7.3 | 6.20E-03 | 4.20E-02 |
| Melanogenesis | 04916 | 4 | 9.8 | 7.40E-03 | 4.60E-02 |
Count: the number of genes belonging to a given pathway.
Percentage: the percentage of all genes submitted to KEGG pathway analysis that belong to a given pathway.
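The percentages in this table are consistent with a list of 41 genes (the 6 mRMR genes plus the 35 shortest path genes): for example, 8 / 41 ≈ 19.5% and 10 / 41 ≈ 24.4%. The Benjamini column is presumably a Benjamini-Hochberg false discovery rate adjustment; a minimal sketch follows. The procedure has to be applied to the P-values of every pathway tested, so the adjusted values in the table cannot be reproduced from the displayed rows alone. The function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Walking from the largest P-value downward and keeping a running minimum enforces the monotonicity of the adjusted values.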
Overlap between the 41 genes identified by each of three methods and 742 cancer genes.
| Gene set | Overlap with 742 cancer genes | P-value |
| Our 41 genes | 8 | |
| Top 41 mRMR genes | 4 | 0.03965 |
| Top 41 t-test genes | 2 | 4.923e-05 |
Functional similarity of each set of 41 genes with known cancer genes and known colorectal cancer genes.
| Gene set | Cancer genes | Colorectal cancer genes |
| Our 41 genes | 0.606068 | 0.491953 |
| Top 41 mRMR genes | 0.163112 | 0.244468 |
| Top 41 t-test genes | 0.203573 | 0.269548 |
Values are Pearson correlation coefficients between functional profiles.
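The similarity scores above are Pearson correlation coefficients between functional profiles. This excerpt does not describe how the profiles were built. The sketch below assumes, hypothetically, that a profile is the fraction of genes in a set annotated with each term in a fixed term list (for example GO or KEGG terms), and then correlates two such profiles. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def functional_profile(genes: Sequence[str], annotations: Dict[str, Set[str]],
                       terms: Sequence[str]) -> List[float]:
    """Hypothetical profile: for each term, the fraction of genes annotated with it."""
    return [sum(t in annotations.get(g, set()) for g in genes) / len(genes) for t in terms]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if a profile is constant
```

Comparing a candidate gene set's profile with the profile of a reference set (such as known colorectal cancer genes) yields one coefficient per cell of the table under this assumed construction.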