Hongya Zhao, Christopher J Logothetis, Ivan P Gorlov, Jia Zeng, Jianguo Dai.
Abstract
Predicting disease progression is one of the most challenging problems in prostate cancer research. Adding gene expression data to prediction models that are based on clinical features has been proposed to improve accuracy. In the current study, we applied a logistic regression (LR) model combining clinical features and gene co-expression data to improve the accuracy of the prediction of prostate cancer progression. The top-scoring pair (TSP) method was used to select genes for the model. The proposed models not only preserved the basic properties of the TSP algorithm but also incorporated the clinical features into the prognostic models. Based on statistical inference with iterative cross-validation, we demonstrated that LR prediction models that included genes selected by the TSP method provided better predictions of prostate cancer progression than those using clinical variables only and/or those that included genes selected by the one-gene-at-a-time approach. Thus, we conclude that TSP selection is a useful tool for feature (and/or gene) selection in prognostic models and that our model provides an alternative for predicting prostate cancer progression.
Year: 2013 PMID: 24367394 PMCID: PMC3866878 DOI: 10.1155/2013/917502
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
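The abstract names the top-scoring pair (TSP) method but does not spell out the scoring rule. The sketch below assumes the standard TSP definition, in which a gene pair is scored by how much the frequency of "gene a is expressed below gene b" differs between the two outcome classes. The function names, the toy data, and the use of a binary pair indicator as a regression feature are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

def tsp_score(gene_a, gene_b, labels):
    """|P(a < b | class 1) - P(a < b | class 0)| for two expression vectors."""
    def freq(cls):
        hits = [a < b for a, b, l in zip(gene_a, gene_b, labels) if l == cls]
        return sum(hits) / len(hits)  # assumes both classes are non-empty
    return abs(freq(1) - freq(0))

def top_scoring_pairs(expr, labels, k=1):
    """expr maps gene name -> list of expression values (one per sample).
    Returns the k highest-scoring (score, gene_a, gene_b) tuples."""
    scored = [(tsp_score(expr[a], expr[b], labels), a, b)
              for a, b in combinations(sorted(expr), 2)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]

def pair_indicator(expr, gene_a, gene_b):
    """Binary feature per sample: 1 if gene_a is expressed below gene_b.
    Using this as a model covariate is an assumption of this sketch."""
    return [int(a < b) for a, b in zip(expr[gene_a], expr[gene_b])]

# Toy data invented for illustration: 4 samples, label 1 = progression.
expr = {"G1": [1, 1, 2, 3], "G2": [2, 3, 1, 1], "G3": [5, 5, 5, 5]}
labels = [1, 1, 0, 0]
best = top_scoring_pairs(expr, labels, k=1)
feature = pair_indicator(expr, best[0][1], best[0][2])
```

Because the score depends only on the within-sample ordering of two genes, it is unchanged by any monotone per-sample normalization, which is presumably one of the "basic properties" the abstract refers to.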
Table 1: Logistic regression models that included TSP-selected gene pairs and different combinations of clinical variables.
| Model number | Patient's age | Gleason score | Tumor percentage | Fusion ERG arrangement | TSP genes |
|---|---|---|---|---|---|
| 1.1 | X | ||||
| 1.2 | X | X | |||
| 1.3 | X | X | |||
| 1.4 | X | X | |||
| 1.5 | X | X | |||
| 1.6 | X | X | X | ||
| 1.7 | X | X | X | ||
| 1.8 | X | X | X | ||
| 1.9 | X | X | X | ||
| 1.10 | X | X | X | ||
| 1.11 | X | X | X | ||
| 1.12 | X | X | X | X | |
| 1.13 | X | X | X | X | |
| 1.14 | X | X | X | X | |
| 1.15 | X | X | X | X | |
| 1.16 | X | X | X | X | X |
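The row pattern of the table above (one model with a single variable, then four, six, four, and one models with two to five variables) is consistent with taking every subset of the four clinical variables and always adding the TSP genes. A minimal sketch of that enumeration follows; the variable names are hypothetical labels, and the order in which subsets are generated is an assumption that need not match the model numbering in the table.

```python
from itertools import combinations

# Hypothetical short names for the four clinical columns of the table.
CLINICAL = ["age", "gleason_score", "tumor_percentage", "fusion_erg"]

def enumerate_models(clinical=CLINICAL):
    """Every subset of the clinical variables, each combined with the TSP genes."""
    models = []
    for size in range(len(clinical) + 1):
        for subset in combinations(clinical, size):
            models.append(list(subset) + ["tsp_genes"])
    return models

models = enumerate_models()
n_with_gleason = sum("gleason_score" in m for m in models)
```

Under this reading there are 2^4 = 16 models, and exactly half of them contain any given clinical variable, which matches the eight Gleason-score models highlighted in Figure 2.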
Figure 1: AUC boxplots for 100 tenfold cross-validations of the 16 models that include TSP-selected genes. The x-axis is the index of the 16 models listed in Table 1, and the y-axis is the AUC value. The red star denotes the corresponding AUC on the validation dataset, obtained with the best logistic regression model from the learning dataset. (a) Models that included one pair of TSP-selected genes and (b) those that included two such gene pairs.
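The resampling protocol behind these boxplots, 100 repetitions of tenfold cross-validation with one AUC per repetition, can be sketched as follows. This is an illustration under stated assumptions: the record does not say whether AUC was pooled over folds or averaged per fold (pooling is assumed here), stratified fold assignment is assumed, and `fit` is a hypothetical callable standing in for whatever logistic regression routine is used.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, dealing each class out in turn."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds

def repeated_cv_auc(X, y, fit, k=10, repeats=100, seed=0):
    """Return one pooled out-of-fold AUC per repetition.
    fit(X_train, y_train) must return a function mapping a row to a score."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        out_of_fold = [None] * len(y)
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            predict = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                out_of_fold[i] = predict(X[i])
        aucs.append(auc(out_of_fold, y))
    return aucs

# Toy usage with a stand-in "model" that scores a row by its first feature.
X = [[i] for i in range(40)]
y = [0] * 20 + [1] * 20
toy_fit = lambda X_train, y_train: (lambda row: row[0])
aucs = repeated_cv_auc(X, y, toy_fit, k=10, repeats=5)
```

The list returned for each model is what a boxplot like the one described would summarize; a real run would pass a `fit` that trains a logistic regression on the clinical variables plus the TSP pair features selected within each training fold.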
Figure 2: The AUCs of the 16 best models from the validation dataset. The x-axis is the index of the 16 models listed in Table 1, and the y-axis is the AUC value. The blue line shows the AUCs from the models with one TSP-selected gene pair, and the black line shows those from models with two TSP-selected gene pairs. The points circled in red are the AUCs of the 8 models that included the Gleason score as a variable.
Table 2: Comparison of the performance of our logistic regression models with that of the nine models evaluated by Sboner et al. [5], using the same number of genes.
| Model number | Patient's age | Gleason score | Tumor percentage | Fusion ERG | Number of genes | AUC in ref. [5] | AUC of our model |
|---|---|---|---|---|---|---|---|
| 2.1 | X | 18 | 0.672 | 0.769 | |||
| 2.2 | X | X | 9 | 0.708 | 0.732 | ||
| 2.3 | 18 | 0.713 | 0.736 | ||||
| 2.4 | X | X | 21 | 0.726 | 0.793 | ||
| 2.5 | X | 11 | 0.730 | 0.712 | |||
| 2.6 | X | X | X | 3 | 0.738 | 0.806 | |
| 2.7 | X | X | X | 12 | 0.745 | 0.804 | |
| 2.8 | X | 16 | 0.749 | 0.813 | |||
| 2.9 | X | X | 12 | 0.750 | 0.788 |
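As a quick arithmetic check on the two AUC columns above, the paired differences can be tabulated directly. The values are copied from the table; the variable names are illustrative, and no significance test is implied.

```python
# (model, AUC in ref. [5], AUC of the TSP-based model), copied from the table.
ROWS = [
    ("2.1", 0.672, 0.769), ("2.2", 0.708, 0.732), ("2.3", 0.713, 0.736),
    ("2.4", 0.726, 0.793), ("2.5", 0.730, 0.712), ("2.6", 0.738, 0.806),
    ("2.7", 0.745, 0.804), ("2.8", 0.749, 0.813), ("2.9", 0.750, 0.788),
]

# Positive difference means the TSP-based model had the higher AUC.
diffs = {model: round(ours - ref, 3) for model, ref, ours in ROWS}
n_higher = sum(d > 0 for d in diffs.values())
mean_diff = sum(diffs.values()) / len(diffs)
lower = [model for model, d in diffs.items() if d < 0]
```

On these figures the TSP-based model has the higher AUC in eight of the nine comparisons, with model 2.5 the exception, and the mean difference is roughly 0.047.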