Takeshi Emura, Yi-Hau Chen, Hsuan-Yu Chen.
Abstract
Survival prediction from a large number of covariates is a current focus of statistical and medical research. In this paper, we study a methodology known as compound covariate prediction, performed under univariate Cox proportional hazards models. We demonstrate via simulations and real data analysis that the compound covariate method generally competes well with ridge regression and Lasso methods, both already well-studied methods for predicting survival outcomes with a large number of covariates. Furthermore, we develop a refinement of the compound covariate method by incorporating likelihood information from multivariate Cox models. The new proposal is an adaptive method that borrows information contained in both the univariate and multivariate Cox regression estimators. We show that the new proposal has a theoretical justification from statistical large-sample theory and is naturally interpreted as a shrinkage-type estimator, a popular class of estimators in the statistical literature. Two datasets, the primary biliary cirrhosis of the liver data and the non-small-cell lung cancer data, are used for illustration. The proposed method is implemented in the R package "compound.Cox", available on CRAN at http://cran.r-project.org/.
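To make the idea of compound covariate prediction concrete, the following is a minimal Python sketch, not the implementation in "compound.Cox". It assumes the usual reading of the method: each covariate gets its own one-covariate Cox fit, and the resulting coefficients are combined into a single linear predictor (PI). The function names (`univariate_cox_beta`, `compound_covariate`, `prognostic_index`) and the toy data are hypothetical, and ties are handled in the Breslow manner as a simplifying assumption.

```python
import math


def univariate_cox_beta(times, events, x, max_iter=50, tol=1e-10):
    """Maximise the one-covariate Cox partial likelihood by Newton-Raphson.

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    x      : values of a single covariate
    """
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # negative second derivative
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored subjects contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_k, x_k in zip(times, x):
                if t_k >= t_i:  # subject k is still at risk at time t_i
                    w = math.exp(beta * x_k)
                    s0 += w
                    s1 += w * x_k
                    s2 += w * x_k * x_k
            mean = s1 / s0
            score += x_i - mean
            info += s2 / s0 - mean * mean
        if info <= 0.0:
            break  # flat likelihood (e.g. constant covariate): keep current beta
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def compound_covariate(times, events, X):
    """Return one univariate Cox coefficient per column of X (a list of rows)."""
    p = len(X[0])
    return [univariate_cox_beta(times, events, [row[j] for row in X])
            for j in range(p)]


def prognostic_index(betas, row):
    """Linear predictor: sum over covariates of coefficient times value."""
    return sum(b * v for b, v in zip(betas, row))


# Toy data, made up for illustration: six subjects, two binary covariates.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
X = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]]

betas = compound_covariate(times, events, X)
pis = [prognostic_index(betas, row) for row in X]
```

Because every coefficient comes from a separate one-dimensional fit, the sketch never has to invert a p × p information matrix, which is why this kind of predictor remains computable when p is close to or larger than n. The shrinkage refinement described in the abstract is not reproduced here.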
Year: 2012 PMID: 23112827 PMCID: PMC3480451 DOI: 10.1371/journal.pone.0047627
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The proposed shrinkage scheme applied to the compound covariate method.
Simulation results under sparse cases with p = 100 and n = 100 based on 50 replications.
| | | CC | CS | Ridge | Lasso | CC | CS | Ridge | Lasso |
|---|---|---|---|---|---|---|---|---|---|
| Scenario 1, | LR-test | −5.89 | −5.88 | −4.99 | −10.59 | −4.71 | −4.55 | −4.75 | −8.76 |
| | Cox-test | −8.41 | −8.26 | −7.32 | −13.80 | −6.76 | −7.06 | −6.95 | −11.73 |
| | Deviance | 66.63 | 45.62 | −29.48 | −76.92 | 75.34 | 56.30 | −25.75 | −60.50 |
| | c-index | 0.772 | 0.768 | 0.752 | 0.859 | 0.750 | 0.751 | 0.750 | 0.825 |
| | | / | 0.25 | 74.54 | 7.06 | / | 0.28 | 68.81 | 6.59 |
| Scenario 2 | LR-test | −8.88 | −9.35 | −7.01 | −12.39 | −6.38 | −6.74 | −6.30 | −11.40 |
| | Cox-test | −12.16 | −12.35 | −9.64 | −14.51 | −9.27 | −9.94 | −8.77 | −14.21 |
| | Deviance | −17.25 | −26.02 | −43.04 | −95.39 | −4.63 | −11.32 | −36.79 | −84.14 |
| | c-index | 0.828 | 0.833 | 0.790 | 0.879 | 0.785 | 0.790 | 0.770 | 0.864 |
| | | / | 0.30 | 37.88 | 6.90 | / | 0.30 | 50.91 | 6.17 |

| | | CC | CS | Ridge | Lasso | CC | CS | Ridge | Lasso |
|---|---|---|---|---|---|---|---|---|---|
| Scenario 1, | LR-test | −3.88 | −4.31 | −4.21 | −6.64 | −2.28 | −2.45 | −2.40 | −1.90 |
| | Cox-test | −6.18 | −6.19 | −6.04 | −9.47 | −3.03 | −3.03 | −3.01 | −2.86 |
| | Deviance | 80.59 | 56.87 | −21.44 | −43.22 | 145.95 | 97.88 | −9.28 | −7.85 |
| | c-index | 0.725 | 0.722 | 0.722 | 0.790 | 0.659 | 0.656 | 0.652 | 0.649 |
| | | / | 0.28 | 79.85 | 6.89 | / | 0.275 | 101.77 | 8.44 |
| Scenario 2 | LR-test | −13.71 | −13.69 | −11.38 | −14.52 | −9.67 | −9.34 | −8.86 | −9.65 |
| | Cox-test | −15.18 | −15.22 | −14.04 | −15.48 | −12.68 | −12.65 | −11.34 | −12.24 |
| | Deviance | −23.91 | −34.13 | −77.63 | −107.14 | 8.563 | −0.559 | −55.62 | −67.93 |
| | c-index | 0.886 | 0.885 | 0.862 | 0.889 | 0.843 | 0.835 | 0.822 | 0.838 |
| | | / | 0.33 | 33.34 | 6.66 | / | 0.29 | 47.22 | 6.86 |
NOTE: For Scenario 1, each informative covariate is correlated with s non-informative covariates. For Scenario 2, the covariates for the right panel have two gene pathways and those for the left panel have one gene pathway. In each setting, q is the number of informative covariates (covariates with non-zero coefficients).
Simulation results under less sparse cases with p = 100 and n = 100 based on 50 replications.
| | | CC | CS | Ridge | Lasso | CC | CS | Ridge | Lasso |
|---|---|---|---|---|---|---|---|---|---|
| Scenario 1, | LR-test | −1.99 | −1.83 | −1.88 | −1.41 | −1.22 | −1.28 | −1.29 | −0.39 |
| | Cox-test | −3.34 | −3.34 | −3.32 | −2.22 | −1.68 | −1.69 | −1.70 | −0.45 |
| | Deviance | 75.15 | 62.99 | −10.09 | −5.65 | 100.77 | 88.78 | −3.79 | 0.000 |
| | c-index | 0.655 | 0.657 | 0.659 | 0.628 | 0.595 | 0.591 | 0.596 | 0.538 |
| | | / | 0.20 | 125.01 | 10.39 | / | 0.225 | 173.64 | 12.03 |
| Scenario 2 | LR-test | −15.80 | −14.84 | −13.71 | −14.80 | −10.35 | −9.49 | −9.33 | −9.11 |
| | Cox-test | −15.35 | −15.30 | −15.05 | −15.57 | −13.23 | −12.98 | −12.30 | −12.01 |
| | Deviance | 59.54 | 48.07 | −92.79 | −103.80 | 114.48 | 75.17 | −63.92 | −60.30 |
| | c-index | 0.898 | 0.895 | 0.875 | 0.890 | 0.852 | 0.843 | 0.839 | 0.832 |
| | | / | 0.35 | 39.56 | 7.07 | / | 0.41 | 53.37 | 7.42 |

| | | CC | CS | Ridge | Lasso | CC | CS | Ridge | Lasso |
|---|---|---|---|---|---|---|---|---|---|
| Scenario 1, | LR-test | −1.10 | −1.02 | −0.95 | −0.55 | −0.55 | −0.61 | −0.61 | −0.40 |
| | Cox-test | −1.35 | −1.27 | −1.43 | −0.42 | −0.68 | −0.66 | −0.62 | −0.22 |
| | Deviance | 73.02 | 71.99 | −1.20 | 0.000 | 96.21 | 89.26 | −0.01 | 0.000 |
| | c-index | 0.601 | 0.598 | 0.605 | 0.529 | 0.552 | 0.548 | 0.559 | 0.501 |
| | | / | 0.15 | 263.23 | 12.54 | / | 0.14 | 346.62 | 13.07 |
| Scenario 2 | LR-test | −12.27 | −11.84 | −11.40 | −11.41 | −7.93 | −6.80 | −6.67 | −6.05 |
| | Cox-test | −12.87 | −12.82 | −12.77 | −12.73 | −10.55 | −9.83 | −9.65 | −8.79 |
| | Deviance | 291.82 | 177.76 | −74.42 | −71.46 | 326.63 | 141.46 | −46.02 | −38.22 |
| | c-index | 0.873 | 0.865 | 0.854 | 0.850 | 0.810 | 0.790 | 0.794 | 0.778 |
| | | / | 0.45 | 60.36 | 8.33 | / | 0.53 | 84.43 | 8.42 |
NOTE: For Scenario 1, each informative covariate is correlated with s non-informative covariates. For Scenario 2, the covariates for the right panel have two gene pathways and those for the left panel have one gene pathway. In each setting, q is the number of informative covariates (covariates with non-zero coefficients).
Performance of the five methods based on the primary biliary cirrhosis of the liver data.
| | CC | CS | MultiCox | Ridge | Lasso |
|---|---|---|---|---|---|
| LR-test (log10 P-value) | −7.95 | −7.00 | −6.35 | −6.98 | −7.11 |
| Cox-test (log10 P-value) | −12.49 | −11.18 | −10.71 | −10.89 | −10.71 |
| c-index | 0.846 | 0.829 | 0.825 | 0.843 | 0.834 |
| Deviance | 101.8 | −39.9 | −39.2 | −49.4 | −45.9 |
| | / | 0.875 | / | 22.75 | 7.32 |
Performance of the five methods based on the non-small-cell lung cancer data of Chen et al. [6].
| | CC (97 genes) | CS (97 genes) | Ridge (97 genes) | Lasso (97 genes) | CC (16 genes) |
|---|---|---|---|---|---|
| LR-test (log10 P-value) | −1.12 | −0.75 | −0.04 | −0.15 | −0.84 |
| Cox-test (log10 P-value) | −0.19 | −0.78 | −0.03 | −0.12 | −0.16 |
| c-index | 0.581 | 0.606 | 0.535 | 0.544 | 0.584 |
| Deviance | 1520.3 | 68.4 | 15.2 | 15.8 | 439.5 |
| | / | 0.70 | 11.58 | 2.66 | / |
| Computation time (sec) | 0.41 | 895.9 | 2.12 | 3.05 | 0.06 |
NOTE: Smaller values of the LR-test (log10 P-value), Cox-test (log10 P-value) and Deviance, and larger values of the c-index correspond to more accurate prediction performance.
If good and poor groups are separated by the median PI in the training set, the LR-test has P-value = 0.034 (log10 P-value = −1.47) with n = 28 in the good and n = 34 in the poor groups (the same result as Figure 1C of Chen et al. [6]).
Methods compared: CC = compound covariate (using 97 or 16 genes), CS = compound shrinkage, Ridge = ridge regression, and Lasso = Lasso.
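As an illustration of two of the measures named in the notes above, here is a short Python sketch. It assumes that the c-index is Harrell's concordance for right-censored data, which is one common definition and may differ in detail from the variant used for the tables. The helper names `c_index` and `log10_p` and the toy values are hypothetical. The last line checks the conversion quoted in the note, where P-value = 0.034 corresponds to log10 P-value = −1.47.

```python
import math


def c_index(times, events, pi):
    """Harrell-type concordance between a prognostic index and survival.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter time than subject j. The pair is concordant when the
    subject who failed earlier has the larger PI; tied PIs count one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the shorter time must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if pi[i] > pi[j]:
                    concordant += 1.0
                elif pi[i] == pi[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def log10_p(p_value):
    """Express a P-value on the log10 scale used in the tables."""
    return math.log10(p_value)


# Toy data, made up for illustration: the last subject is censored.
times = [1, 2, 3, 4]
events = [1, 1, 1, 0]

perfect = c_index(times, events, [4, 3, 2, 1])   # higher PI fails earlier
reversed_ = c_index(times, events, [1, 2, 3, 4])  # higher PI fails later
uninformative = c_index(times, events, [0, 0, 0, 0])

check = round(log10_p(0.034), 2)  # the conversion quoted in the note
```

A value of 0.5 corresponds to a predictor with no discriminating ability, which is a useful yardstick when reading c-index values near 0.5 in the tables.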
Figure 2. Kaplan-Meier curves for the 62 patients in the lung cancer data of Chen et al. [6].
Good (blue) and poor (red) groups are determined by the median of the PIs in the test dataset.
Figure 3. Kaplan-Meier curves for the 62 patients in the lung cancer data of Chen et al. [6].
Good (blue), medium (black), and poor (red) groups are determined by the tertiles of the PIs in the test dataset.
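The grouping rule used in these Kaplan-Meier figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a larger PI is taken to mean a higher hazard, and therefore a poorer prognosis, and the cut points come from `statistics.quantiles` with its default interpolation, which may not match the convention used to draw the figures. The function `risk_groups` and the PI values are hypothetical.

```python
import statistics

# Group labels ordered from lowest to highest PI (assumed: higher PI = higher risk).
LABELS = {2: ["good", "poor"], 3: ["good", "medium", "poor"]}


def risk_groups(pi, n_groups):
    """Assign each subject to a risk group by quantiles of the PI.

    n_groups = 2 splits at the median, n_groups = 3 splits at the tertiles.
    """
    cuts = statistics.quantiles(pi, n=n_groups)  # n_groups - 1 cut points
    labels = LABELS[n_groups]
    # The group index is the number of cut points lying strictly below the PI.
    return [labels[sum(v > c for c in cuts)] for v in pi]


# Made-up PI values for six test-set subjects.
pi_test = [0.4, -1.2, 2.0, 0.1, -0.3, 1.1]

two = risk_groups(pi_test, 2)    # median split
three = risk_groups(pi_test, 3)  # tertile split
```

Survival curves would then be estimated separately within each label. That step is omitted here.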
Figure 4. The c-index assessments of the four methods under varying numbers of top genes (p = 16 ∼ 124) in the lung cancer data of Chen et al. [6], where “top genes” refers to the most strongly associated genes passing a univariate pre-filter for inclusion in the linear predictor (PI).