Mei-Ling Hou, Shu-Lin Wang, Xue-Ling Li, Ying-Ke Lei.
Abstract
Selecting reliable cancer biomarkers is crucial for precise, gene expression profile-based diagnosis of cancer type and for successful treatment. However, current studies face overfitting and the curse of dimensionality in tumor classification, and false positives in the identification of cancer biomarkers. Here, we developed a novel gene-ranking method based on neighborhood rough set reduction for molecular cancer classification from gene expression profiles. Compared with other methods such as PAM, ClaNC, the Kruskal-Wallis rank sum test, and Relief-F, our method achieves higher tumor classification accuracy with only a few top-ranked genes. Moreover, although the selected genes are not typical known oncogenes, a search of the scientific literature and an analysis of their protein interaction partners indicate that they play a crucial role in tumor occurrence; they may therefore serve as candidate cancer biomarkers.
Year: 2010 PMID: 20625410 PMCID: PMC2896865 DOI: 10.1155/2010/726413
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1. The framework of our analysis method. (a) An ensemble classifier was constructed from the gene subsets selected by HBFSNRS with a specific threshold value δ. (b) Another ensemble classifier was constructed from the classification results for each δ value. (c) sig denotes the significance of a gene, as defined in (6).
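The caption describes combining several base classifiers into an ensemble but does not state the combination rule. The following is a minimal sketch that assumes simple majority voting; the function name `majority_vote`, the tie-breaking rule, and the toy labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier predictions into one label per sample.

    predictions: list of lists, one inner list per base classifier,
    each holding one predicted label per test sample.
    Assumption: ties go to the label predicted by the earliest
    classifier among the tied labels.
    """
    combined = []
    for labels in zip(*predictions):  # labels for one sample, in classifier order
        counts = Counter(labels)
        top = max(counts.values())
        combined.append(next(label for label in labels if counts[label] == top))
    return combined


# Hypothetical outputs of three base classifiers on four test samples
votes = [
    ["tumor", "normal", "tumor", "normal"],
    ["tumor", "tumor", "normal", "normal"],
    ["normal", "tumor", "tumor", "normal"],
]
print(majority_vote(votes))
```

In this reading, each base classifier would correspond to one selected gene subset in panel (a), or to one δ value in panel (b).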
Algorithm 1. A heuristic breadth-first search algorithm based on neighborhood rough sets (HBFSNRS).
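The algorithm body is not reproduced here, so the sketch below only illustrates the general neighborhood rough set idea it builds on: a sample belongs to the positive region when every sample within distance δ of it shares its class, and the dependency of a gene subset is the fraction of such samples. The search shown is a plain greedy forward selection, not the breadth-first heuristic of HBFSNRS, and the function names, Euclidean distance, and toy data are assumptions made for illustration.

```python
import math


def neighborhood_dependency(samples, labels, genes, delta):
    """Fraction of samples whose delta-neighborhood is class-pure.

    samples: list of expression vectors (assumed scaled to [0, 1]).
    genes:   indices of the genes used to measure distance.
    """
    def dist(a, b):
        return math.sqrt(sum((a[g] - b[g]) ** 2 for g in genes))

    consistent = 0
    for i, x in enumerate(samples):
        neighbors = [j for j, y in enumerate(samples) if dist(x, y) <= delta]
        if all(labels[j] == labels[i] for j in neighbors):
            consistent += 1
    return consistent / len(samples)


def greedy_gene_selection(samples, labels, delta, max_genes):
    """Greedy forward selection (a simplification, not HBFSNRS itself)."""
    selected, current = [], 0.0
    while len(selected) < max_genes:
        best_gene, best_dep = None, current
        for g in range(len(samples[0])):
            if g in selected:
                continue
            dep = neighborhood_dependency(samples, labels, selected + [g], delta)
            if dep > best_dep:  # keep only strict improvements
                best_gene, best_dep = g, dep
        if best_gene is None:  # no gene increases the dependency
            break
        selected.append(best_gene)
        current = best_dep
    return selected, current


# Toy data: gene 0 separates the two classes, gene 1 does not
toy_samples = [[0.1, 0.5], [0.2, 0.4], [0.8, 0.5], [0.9, 0.4]]
toy_labels = [0, 0, 1, 1]
print(greedy_gene_selection(toy_samples, toy_labels, delta=0.3, max_genes=2))
```

On the toy data the search stops after picking gene 0, because adding gene 1 does not raise the dependency any further.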
The division of training set and test set in our experiments.
| No. | Dataset | Training set | Test set | No. of genes | No. of classes |
|---|---|---|---|---|---|
| 1 | ALL | 148 | 100 | 12625 | 6 |
| 2 | Breast cancer | 30 | 22 | 19802 | 2 |
| 3 | Colon | 42 | 20 | 2000 | 2 |
| 4 | Prostate cancer | 102 | 34 | 12600 | 2 |
Classification accuracy, sensitivity and specificity on all the test datasets by the ensemble classifier.
| Dataset | δ value (number of selected genes) | | | | | |
|---|---|---|---|---|---|---|
| ALL | 0.32(8) | 0.35(9) | 0.44(13) | 0.47(14) | 0.66(20) | integration |
| Accuracy | 89.00 | 92.00 | 93.00 | 94.00 | 93.00 | 95.00 |
| Breast | 0.04(2) | 0.21(2) | 0.29(2) | 0.30(2) | 0.69(3) | integration |
| Accuracy | 86.36 | 90.91 | 90.91 | 90.91 | 95.45 | 90.91 |
| Sensitivity | 100.00 | 100.00 | 100.00 | 100.00 | 93.33 | 100.00 |
| Specificity | 57.14 | 71.43 | 71.43 | 71.43 | 100.00 | 71.43 |
| Colon | 0.03(2) | 0.04(2) | 0.82(6) | 0.92(3) | 0.13(2) | integration |
| Accuracy | 70.00 | 75.00 | 75.00 | 80.00 | 75.00 | 75.00 |
| Sensitivity | 75.00 | 75.00 | 75.00 | 83.33 | 75.00 | 75.00 |
| Specificity | 62.50 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 |
| Prostate | 0.13(4) | 0.20(5) | 0.26(5) | 0.57(5) | 0.62(5) | integration |
| Accuracy | 94.12 | 91.18 | 88.24 | 88.24 | 97.06 | 91.18 |
| Sensitivity | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Specificity | 92.00 | 88.00 | 84.00 | 84.00 | 96.00 | 88.00 |
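As a sanity check on how the three measures in the table relate, the sketch below recomputes accuracy, sensitivity, and specificity from confusion-matrix counts. The counts are not given in the source: they are inferred here from the reported percentages for the 22-sample Breast test set (15 positive and 7 negative samples), and both the counts and the choice of positive class should be read as assumptions.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) in percent, rounded to 2 places."""
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    sensitivity = 100.0 * tp / (tp + fn)  # true positive rate
    specificity = 100.0 * tn / (tn + fp)  # true negative rate
    return tuple(round(v, 2) for v in (accuracy, sensitivity, specificity))


# Inferred counts (assumption) for the Breast column labeled 0.04(2)
print(classification_metrics(tp=15, fn=0, tn=4, fp=3))

# Inferred counts (assumption) for the Breast column labeled 0.69(3)
print(classification_metrics(tp=14, fn=1, tn=7, fp=0))
```

With these counts the function reproduces the Breast entries 86.36 / 100.00 / 57.14 and 95.45 / 93.33 / 100.00, which shows that a gain in specificity can outweigh a small loss in sensitivity when accuracy is computed over all 22 samples.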
Figure 2. Comparison of classification accuracy with different numbers of top-ranked genes on the four test datasets for HBFSNRS, Relief-F, and the Kruskal-Wallis rank sum test (KWRST).
The comparison with the ClaNC method in classification accuracy.
| Method | Dataset | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ClaNC | Number of genes selected per subclass | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| ClaNC | ALL(×6) | 86.00 | 95.00 | 97.00 | 99.00 | 98.00 | 99.00 | 99.00 | 99.00 | 99.00 | 98.00 |
| ClaNC | Breast(×2) | 50.00 | 40.91 | 45.45 | 45.45 | 40.91 | 40.91 | 40.91 | 40.91 | 40.91 | 40.91 |
| ClaNC | Colon(×2) | 65.00 | 65.00 | 65.00 | 70.00 | 70.00 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 |
| ClaNC | Prostate(×2) | 73.53 | 85.29 | 79.41 | 76.47 | 76.47 | 79.41 | 79.41 | 76.47 | 76.47 | 79.41 |
| HBFSNRS | Number of all genes selected | 1 | 2 | 3 | 4 | 6 | 8 | 12 | 18 | 24 | 30 |
| HBFSNRS | ALL | 41.00 | 71.00 | 73.00 | 82.00 | 87.00 | 94.00 | 94.00 | 96.00 | 96.00 | 97.00 |
| HBFSNRS | Breast | 100.00 | 95.45 | 86.36 | 86.36 | 86.36 | 90.91 | 86.36 | 77.27 | 86.36 | 86.36 |
| HBFSNRS | Colon | 80.00 | 70.00 | 80.00 | 70.00 | 85.00 | 80.00 | 75.00 | 75.00 | 75.00 | 75.00 |
| HBFSNRS | Prostate | 97.06 | 91.18 | 82.35 | 82.35 | 79.41 | 82.35 | 88.24 | 85.29 | 85.29 | 88.24 |
The 10 top-ranked genes selected for the four datasets.
| ALL | | Breast cancer | | Colon cancer | | Prostate cancer | |
|---|---|---|---|---|---|---|---|
| Gene symbol | sig | Gene symbol | sig | Gene symbol | sig | Gene symbol | sig |
| LRMP | 0.0801 | CTHRC1 | 0.1212 | DES | 0.0895 | HPN | 0.174 |
| TCFL5 | 0.0569 | PDLIM4 | 0.0476 | MYH9 | 0.0834 | MAF | 0.1248 |
| CD99 | 0.0526 | KRT17 | 0.0321 | C3 | 0.062 | ABL1 | 0.0457 |
| MPP1 | 0.0483 | SFRP1 | 0.0292 | FUCA1 | 0.0538 | GSTP1 | 0.0225 |
| CD72 | 0.0399 | COL3A1 | 0.0261 | CSRP1 | 0.0427 | KIAA0430 | 0.0216 |
| NONO | 0.0377 | PI15 | 0.0258 | MT2A | 0.0421 | WWC1 | 0.0192 |
| DNTT | 0.0345 | ACTG2 | 0.0241 | TSPAN7 | 0.0346 | JUNB | 0.0164 |
| PLXNB2 | 0.0329 | TFPI2 | 0.0217 | SEPT2 | 0.0294 | PEX3 | 0.0153 |
| ECM1 | 0.0325 | SERPINB5 | 0.0203 | FXN | 0.0236 | RND3 | 0.0151 |
| SMARCA4 | 0.0296 | FN1 | 0.0186 | PMP22 | 0.0214 | P4HB | 0.0146 |
Figure 3. The protein-interaction network associated with the ten top-ranked genes for prostate cancer.