Yong Liang¹, Hua Chai², Xiao-Ying Liu², Zong-Ben Xu³, Hai Zhang³, Kwong-Sak Leung⁴.
Abstract
BACKGROUND: One of the most important objectives of clinical cancer research is to diagnose cancer more accurately based on patients' gene expression profiles. Both the Cox proportional hazards model (Cox) and the accelerated failure time model (AFT) have been widely adopted for classifying patients into high-risk and low-risk groups or for predicting their survival time to guide clinical treatment. Nevertheless, two main dilemmas limit the accuracy of these prediction methods. First, the small sample size and censored data remain a bottleneck for training a robust and accurate Cox classification model. Second, tumours with similar phenotypes and prognoses can actually be completely different diseases at the genotypic and molecular levels. Thus, the utility of the AFT model for survival time prediction is limited when such biological differences between the diseases have not been identified beforehand.
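To make the role of censoring concrete, here is a minimal sketch (not the authors' implementation) of the negative Cox partial log-likelihood with Breslow's handling of tied times, plus a hypothetical `risk_groups` helper that splits patients into high- and low-risk groups at the median risk score. The median cut-off is an assumption; the abstract does not state the threshold. Censored patients (`event == 0`) add no term of their own and only enlarge the risk sets of earlier events, which is why heavy censoring in a small sample leaves little signal for fitting the coefficients. The AFT model, by contrast, regresses log survival time directly on the covariates.

```python
import numpy as np


def cox_neg_partial_log_likelihood(beta, X, time, event):
    """Negative Cox partial log-likelihood (Breslow handling of tied times).

    beta: (p,) coefficients; X: (n, p) expression matrix;
    time: (n,) observed times; event: (n,) 1 = death observed, 0 = censored.
    """
    eta = X @ beta  # linear predictor x_i' beta for every patient
    # at_risk[i, j] is True when patient j is still under observation at time t_i
    at_risk = time[None, :] >= time[:, None]
    log_risk_set = np.log(np.sum(np.where(at_risk, np.exp(eta)[None, :], 0.0), axis=1))
    # Only observed events contribute a term; censored rows are multiplied by 0
    return -np.sum(event * (eta - log_risk_set))


def risk_groups(beta, X):
    """Hypothetical helper: 1 = high risk, 0 = low risk, split at the median score."""
    score = X @ beta
    return (score > np.median(score)).astype(int)


# Toy example: three patients, the last one censored
X = np.array([[1.0], [0.0], [-1.0]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 0])
print(cox_neg_partial_log_likelihood(np.zeros(1), X, time, event))  # equals log(3) + log(2)
print(risk_groups(np.array([0.5]), X))  # [1 0 0]
```

With `beta = 0` every patient has the same hazard, so the two observed deaths contribute log 3 and log 2 (the sizes of their risk sets), and the censored patient contributes nothing directly.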
Year: 2016; PMID: 26932592; PMCID: PMC4774162; DOI: 10.1186/s12920-016-0169-6
Source DB: PubMed; Journal: BMC Med Genomics; ISSN: 1755-8794; Impact factor: 3.063
Fig. 1. Workflow for the development and evaluation of the semi-supervised learning framework for survival analysis
Fig. 2. Percentage of each type of sample in the original datasets and in the datasets processed by our semi-supervised learning approach
Fig. 3. Integrated Brier score (IBS) obtained by the Cox and AFT models with and without the semi-supervised learning approach on the four gene expression datasets
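The IBS averages the time-dependent Brier score over a follow-up window; with censored data it is commonly estimated with inverse-probability-of-censoring weights (IPCW), as proposed by Graf et al. (1999). The sketch below assumes that estimator. The function names, the evaluation grid and the `surv_fn` interface (returning every patient's predicted S(t | x_i)) are hypothetical, and the paper's exact IBS settings are not given here. Lower IBS is better.

```python
import numpy as np


def km_survival(time, event, t, left_limit=False):
    """Kaplan-Meier estimate of P(T > t); with left_limit=True, of P(T >= t)."""
    s = 1.0
    for u in np.unique(time[event == 1]):  # distinct event times, ascending
        if u > t or (left_limit and u == t):
            break
        n_at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & (event == 1))
        s *= 1.0 - deaths / n_at_risk
    return s


def brier_score(t, surv_at_t, time, event):
    """IPCW Brier score at time t; surv_at_t[i] is the predicted S(t | x_i)."""
    censored = 1 - event  # censoring is the "event" of the censoring distribution G
    total = 0.0
    for i in range(len(time)):
        if time[i] <= t and event[i] == 1:
            # died by t: ideal prediction is S = 0, weight 1 / G(T_i-)
            total += surv_at_t[i] ** 2 / km_survival(time, censored, time[i], left_limit=True)
        elif time[i] > t:
            # still event-free at t: ideal prediction is S = 1, weight 1 / G(t)
            total += (1.0 - surv_at_t[i]) ** 2 / km_survival(time, censored, t)
        # censored before t: contributes 0 (IPCW reweights the remaining patients)
    return total / len(time)


def integrated_brier_score(surv_fn, time, event, grid):
    """Trapezoidal average of the Brier score over the time grid."""
    scores = np.array([brier_score(t, surv_fn(t), time, event) for t in grid])
    area = np.sum((scores[1:] + scores[:-1]) / 2.0 * np.diff(grid))
    return area / (grid[-1] - grid[0])


# Toy check: a constant prediction of 0.5 scores about 0.25 when nobody is censored
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 1, 1])
grid = np.linspace(0.5, 3.5, 7)
print(integrated_brier_score(lambda t: np.full(4, 0.5), time, event, grid))
```

A model that always outputs 0.5 is uninformative, so an IBS well below 0.25 indicates useful survival predictions.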
Performance of the Cox and AFT models with and without the semi-supervised learning approach in the simulated experiment (means over 50 runs, with standard deviations in parentheses)
| Cor. | Size | Cox: Correct | Cox: Selected | Cox: Precision | Semi-Cox: Correct | Semi-Cox: Selected | Semi-Cox: Precision |
|---|---|---|---|---|---|---|---|
|  | 100 | 4.06 (1.39) | 24.44 (4.65) | 0.166 (0.044) | 6.58 (1.41) | 16.96 (6.41) | 0.388 (0.080) |
|  | 200 | 5.62 (1.64) | 28.22 (6.16) | 0.199 (0.031) | 8.68 (1.56) | 17.84 (5.72) | 0.487 (0.078) |
|  | 300 | 8.02 (1.43) | 35.18 (5.81) | 0.228 (0.029) | 9.76 (0.98) | 19.02 (5.41) | 0.513 (0.087) |
|  | 100 | 3.90 (1.43) | 24.38 (5.83) | 0.159 (0.041) | 6.46 (1.37) | 17.08 (6.05) | 0.378 (0.075) |
|  | 200 | 5.68 (1.42) | 29.64 (6.19) | 0.192 (0.035) | 8.62 (1.11) | 17.86 (5.45) | 0.483 (0.074) |
|  | 300 | 7.84 (1.55) | 35.86 (5.96) | 0.219 (0.037) | 9.42 (0.68) | 18.54 (5.10) | 0.508 (0.082) |

| Cor. | Size | AFT: Correct | AFT: Selected | AFT: Precision | Semi-AFT: Correct | Semi-AFT: Selected | Semi-AFT: Precision |
|---|---|---|---|---|---|---|---|
|  | 100 | 5.02 (1.61) | 38.74 (6.27) | 0.130 (0.029) | 6.84 (1.37) | 35.52 (6.17) | 0.192 (0.031) |
|  | 200 | 7.12 (1.30) | 46.68 (6.03) | 0.152 (0.025) | 8.84 (1.18) | 42.16 (5.38) | 0.210 (0.039) |
|  | 300 | 8.90 (0.99) | 56.54 (6.85) | 0.157 (0.019) | 9.86 (0.46) | 50.84 (5.49) | 0.194 (0.027) |
|  | 100 | 4.74 (1.19) | 39.54 (5.88) | 0.120 (0.030) | 6.72 (1.43) | 35.84 (6.43) | 0.188 (0.033) |
|  | 200 | 6.98 (1.50) | 47.02 (6.32) | 0.148 (0.024) | 8.78 (1.02) | 44.96 (6.95) | 0.195 (0.031) |
|  | 300 | 8.80 (1.02) | 56.82 (6.30) | 0.155 (0.022) | 9.78 (0.50) | 49.31 (5.86) | 0.198 (0.034) |
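In the simulation table above, the Precision column closely tracks Correct/Selected (for example 4.06/24.44 ≈ 0.166 for Cox and 6.58/16.96 ≈ 0.388 for Semi-Cox in the first row). This suggests that Precision is the fraction of selected genes that are truly relevant. Small differences in the third decimal are expected if precision is averaged per run rather than computed from the averaged counts; that averaging scheme is an assumption. A minimal sketch of the metric for a single run, with hypothetical gene indices:

```python
def selection_precision(selected, true_genes):
    """Return (correct, selected, precision) for one run of gene selection."""
    selected, true_genes = set(selected), set(true_genes)
    correct = len(selected & true_genes)  # truly relevant genes that were picked
    precision = correct / len(selected) if selected else 0.0
    return correct, len(selected), precision


# Hypothetical run: 4 genes selected, 2 of them among the true signal genes
print(selection_precision([3, 7, 12, 40], [3, 12, 15]))  # (2, 4, 0.5)

# Ratio of the averaged counts in the first Cox row of the table
print(round(4.06 / 24.44, 3))  # 0.166
```

Under this reading, the semi-supervised variants gain precision both by recovering more true genes and by selecting fewer genes overall.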
Details of the four real gene expression datasets used in the experiments
| Datasets | No. of genes | No. of samples | No. of censored samples |
|---|---|---|---|
| DLBCL (2002) | 7399 | 240 | 102 |
| DLBCL (2003) | 8810 | 92 | 28 |
| Lung cancer | 7129 | 86 | 62 |
| AML | 6283 | 116 | 49 |
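The censoring counts in the table above are easier to compare as proportions. This short computation uses only those counts. It shows that the lung cancer dataset is the most heavily censored (about 72% of samples) and DLBCL (2003) the least (about 30%).

```python
# Censoring proportion of each dataset: (no. of samples, no. of censored samples)
datasets = {
    "DLBCL (2002)": (240, 102),
    "DLBCL (2003)": (92, 28),
    "Lung cancer": (86, 62),
    "AML": (116, 49),
}
censoring_pct = {name: round(100 * censored / n, 1) for name, (n, censored) in datasets.items()}
for name, pct in censoring_pct.items():
    print(f"{name}: {pct}% censored")
```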
Fig. 4. Concordance index (CI) obtained by the Cox and AFT models with and without the semi-supervised learning approach on the four gene expression datasets
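In survival analysis the CI is usually Harrell's concordance index. A pair of patients is comparable when the one with the shorter observed time actually had an event. The index is the fraction of comparable pairs in which the model assigns the higher risk to the patient who failed first. The sketch below assumes that definition: ties in the risk score count as 0.5, and pairs with tied times are skipped for simplicity. For an AFT model, which predicts survival time, the negated predicted time can serve as the risk score. A CI of 0.5 corresponds to random ordering and 1.0 to perfect ordering.

```python
def harrell_c_index(time, event, risk):
    """Harrell's C: share of comparable pairs ordered correctly by the risk score."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if event[i] != 1:
            continue  # a censored patient cannot be the earlier failure in a pair
        for j in range(len(time)):
            if time[j] > time[i]:  # j was still event-free when i failed
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


time = [1.0, 2.0, 3.0, 4.0]
event = [1, 0, 1, 1]
print(harrell_c_index(time, event, risk=[4, 3, 2, 1]))  # 1.0 (higher risk failed earlier)
print(harrell_c_index(time, event, risk=[1, 2, 3, 4]))  # 0.0
```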
Fig. 5. Number of genes selected by the Cox and AFT models with and without the semi-supervised learning approach on the four gene expression datasets
Fig. 6. Survival curves of the Cox model with and without the semi-supervised learning method for the AML dataset
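Survival curves of this kind are typically Kaplan-Meier estimates, computed separately for the patients a model assigns to the high-risk and low-risk groups. The caption does not say how the groups were formed, so the 0/1 `groups` labels below are an assumption, as are the helper names. A minimal product-limit sketch:

```python
import numpy as np


def kaplan_meier(time, event):
    """Product-limit estimate; returns step times and S(t) just after each step."""
    times, surv, s = [0.0], [1.0], 1.0
    for u in np.unique(time[event == 1]):  # distinct death times, ascending
        n_at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & (event == 1))
        s *= 1.0 - deaths / n_at_risk
        times.append(float(u))
        surv.append(s)
    return np.array(times), np.array(surv)


def curves_by_group(time, event, groups):
    """One Kaplan-Meier curve per risk-group label (e.g. 0 = low, 1 = high)."""
    return {int(g): kaplan_meier(time[groups == g], event[groups == g])
            for g in np.unique(groups)}


time = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
event = np.array([1, 1, 0, 1, 0, 1, 0, 0])
groups = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # hypothetical high/low-risk labels
for g, (t, s) in curves_by_group(time, event, groups).items():
    print(g, t, np.round(s, 3))
```

Clear separation between the two curves indicates that the risk grouping is prognostically meaningful. A log-rank test is the usual way to quantify that separation.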
Fig. 7. Percentage of each type of data processed by the semi-supervised learning model in the simulated experiment
Fig. 8. Percentages of correct and erroneous classifications obtained by our proposed semi-supervised learning model in the simulated experiment