Hung-Chia Chen, James J Chen.
Abstract
BACKGROUND: The two most important considerations in evaluating survival prediction models are 1) predictability, the ability to predict survival risks accurately, and 2) reproducibility, the ability to generalize to samples generated from different studies. We present approaches for assessing the reproducibility of survival risk score predictions across medical centers.
Year: 2013 PMID: 23425000 PMCID: PMC3598915 DOI: 10.1186/1471-2288-13-25
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Definitions of key terms
| Term | Definition |
| Predictability (Predictive performance) | Ability of a model to predict risk scores of patients that match their survival risks (not survival times). |
| Generalizability (Reproducibility) | Ability of a model to predict risk scores of patients generated from different studies (different locations or different times). |
| Consistency | Agreement between two centers in predicting the risk scores of a targeted center. |
| Transferability | Agreement between one center and the targeted center in predicting the risk scores of the targeted center. |
| Internal validation | An assessment of the predictive performance of a model in which the available data are divided into a training set and a test set; the model is developed on the training set and applied to the test set. |
Performance evaluation using single-group analysis for the lung cancer data (training data: UM and HLM; test data: MSK and DFCI)
| Model | Dxy | HR | P-value | R2 |
| A | −0.420 | 3.34 | 1.50E-8 | 0.169 |
| B | −0.244 | 1.59 | 2.19E-4 | 0.059 |
| C | −0.093 | 1.04 | 0.629 | 0.001 |
| D | −0.050 | 1.05 | 0.655 | 0.001 |
| E | −0.196 | 1.19 | 0.031 | 0.026 |
| F | −0.265 | 1.51 | 8.46E-4 | 0.062 |
| G | −0.170 | 1.15 | 0.083 | 0.017 |
| H | −0.333 | 1.71 | 2.50E-5 | 0.083 |
Columns 3–5 are the estimates from fitting a Cox model using the predicted risk scores as an independent variable.
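The rank-correlation column of the single-group analysis can be illustrated with a minimal sketch of Somers' Dxy for right-censored data. The function name `somers_dxy` and the toy data are hypothetical. The rule for usable pairs (the subject with the shorter time must have an observed event) follows the usual convention for censored concordance and is an assumption here, not something taken from the authors' software.

```python
def somers_dxy(scores, times, events):
    """Somers' Dxy between risk scores and right-censored survival times.

    A pair is usable only if the subject with the strictly shorter time
    had an observed event (event flag 1). Negative values mean that
    higher scores go with shorter survival.
    """
    concordant = discordant = usable = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            # i must have the strictly shorter time and an observed event
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if scores[i] < scores[j]:
                    concordant += 1   # lower score, shorter time
                elif scores[i] > scores[j]:
                    discordant += 1   # higher score, shorter time
    if usable == 0:
        raise ValueError("no usable pairs")
    return (concordant - discordant) / usable


# Hypothetical toy data: higher score always goes with shorter survival.
scores = [2.1, 1.4, 0.3, -0.5, -1.2]
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]          # 1 = event observed, 0 = censored
print(somers_dxy(scores, times, events))   # prints -1.0
```

With this sign convention, a risk score that ranks patients well produces a negative Dxy, which is the direction seen in the tabulated values.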
Performance evaluation using two-group comparison for the lung cancer data (training data: UM and HLM; test data: MSK and DFCI)
| Model | HR | P-value (Cox) | P-value (log-rank) |
| A | 3.78 | 1.18E-6 | 1.81E-7 |
| B | 2.03 | 0.003 | 0.002 |
| C | 1.34 | 0.264 | 0.261 |
| D | 0.94 | 0.808 | 0.811 |
| E | 1.99 | 0.006 | 0.005 |
| F | 1.91 | 0.009 | 0.007 |
| G | 1.57 | 0.068 | 0.066 |
| H | 2.37 | 0.005 | 0.004 |
The high-risk and low-risk groups for test data were segregated based on the median of the training scores. These are results from fitting a Cox model using the risk group as an independent variable and from the log-rank test.
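The two-group comparison can be sketched in plain Python as follows. The function names are hypothetical, and treating a score strictly above the training median as high-risk is an assumption, since the handling of scores equal to the median is not stated. The log-rank statistic uses the standard hypergeometric variance, and the p-value is the upper tail of a chi-square distribution with 1 degree of freedom.

```python
import math
from statistics import median


def split_by_training_median(train_scores, test_scores):
    """Label test subjects high-risk (1) if their score exceeds the
    median of the training scores, otherwise low-risk (0)."""
    cutoff = median(train_scores)
    return [1 if s > cutoff else 0 for s in test_scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square, p-value) on 1 df."""
    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one event occurred
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(at_risk)                      # at risk in group 1
        d = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic undefined (zero variance)")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square(1) upper tail
    return chi2, p_value


# Hypothetical toy data
groups = split_by_training_median([0.1, 0.4, 0.6, 0.9], [0.95, 0.8, 0.3, 0.2])
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], groups)
```

The cutoff is taken from the training scores only, so the test data play no part in defining the risk groups.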
Performance evaluation using single-group analysis for the lung cancer data (training data: MSK and DFCI; testing data: UM and HLM)
| Model | Dxy | HR | P-value | R2 |
| A | −0.303 | 1.98 | 9.37E-11 | 0.149 |
| B | −0.230 | 1.61 | 2.18E-9 | 0.109 |
| C | −0.183 | 1.31 | 0.021 | 0.020 |
| D | −0.119 | 1.11 | 0.177 | 0.007 |
| E | −0.342 | 1.70 | 3.61E-11 | 0.150 |
| F | −0.237 | 1.30 | 1.20E-5 | 0.067 |
| G | −0.269 | 1.51 | 4.49E-8 | 0.101 |
| H | −0.258 | 1.37 | 5.19E-7 | 0.086 |
Performance evaluation using two-group comparison for the lung cancer data (training data: MSK and DFCI; testing data: UM and HLM)
| Model | HR | P-value (Cox) | P-value (log-rank) |
| A | 2.40 | 6.00E-8 | 2.37E-8 |
| B | 1.74 | 5E-4 | 4.21E-4 |
| C | 1.53 | 0.132 | 0.129 |
| D | 1.33 | 0.138 | 0.137 |
| E | 2.10 | 3.99E-5 | 2.66E-5 |
| F | 1.64 | 0.002 | 0.002 |
| G | 1.74 | 0.002 | 0.001 |
| H | 1.74 | 6.43E-4 | 5.48E-4 |
Estimates of correlation between predicted risk scores from a center’s own training model and predicted risk scores using the training model of another center (training center) are given in the first three rows of each table (model transferability)
| UM | HLM | 0.74 | 0.62 | 0.56 | 0.33 | 0.58 | 0.43 | 0.56 | 0.42 |
| | DFCI | 0.77 | 0.56 | 0.56 | 0.14 | 0.71 | 0.32 | 0.65 | 0.27 |
| | MSK | 0.66 | 0.47 | 0.41 | 0.25 | 0.49 | 0.34 | 0.47 | 0.31 |
| | HLM and DFCI | 0.83 | 0.78 | 0.54 | 0.19 | 0.46 | 0.4 | 0.59 | 0.39 |
| | HLM and MSK | 0.58 | 0.38 | 0.39 | 0.22 | 0.34 | 0.23 | 0.33 | 0.18 |
| | DFCI and MSK | 0.5 | 0.36 | 0.2 | 0 | 0.3 | 0.11 | 0.35 | 0.15 |
| HLM | UM | 0.65 | 0.63 | 0.49 | 0.09 | 0.63 | 0.25 | 0.53 | 0.27 |
| | DFCI | 0.83 | 0.77 | 0.5 | 0.21 | 0.53 | 0.34 | 0.64 | 0.37 |
| | MSK | 0.53 | 0.18 | 0.27 | 0.37 | 0.38 | 0.4 | 0.28 | 0.33 |
| | UM and DFCI | 0.68 | 0.66 | 0.5 | 0.25 | 0.75 | 0.47 | 0.65 | 0.47 |
| | UM and MSK | 0.72 | 0.46 | 0.34 | 0.12 | 0.54 | 0.29 | 0.54 | 0.2 |
| | DFCI and MSK | 0.44 | 0.27 | 0.2 | 0.03 | 0.47 | 0.24 | 0.49 | 0.19 |
| DFCI | UM | 0.84 | 0.62 | 0.6 | 0.29 | 0.72 | 0.54 | 0.65 | 0.43 |
| | HLM | 0.86 | 0.87 | 0.53 | 0.35 | 0.56 | 0.43 | 0.67 | 0.48 |
| | MSK | 0.56 | 0.41 | 0.19 | 0.34 | 0.23 | 0.38 | 0.31 | 0.36 |
| | UM and HLM | 0.72 | 0.62 | 0.52 | 0.37 | 0.61 | 0.55 | 0.6 | 0.51 |
| | UM and MSK | 0.7 | 0.54 | 0.44 | 0.34 | 0.51 | 0.38 | 0.51 | 0.28 |
| | HLM and MSK | 0.61 | 0.34 | 0.41 | 0.27 | 0.38 | 0.28 | 0.3 | 0.17 |
| MSK | UM | 0.7 | 0.5 | 0.35 | 0.35 | 0.45 | 0.46 | 0.46 | 0.36 |
| | HLM | 0.66 | 0.19 | 0.5 | 0.39 | 0.45 | 0.3 | 0.32 | 0.29 |
| | DFCI | 0.54 | 0.21 | 0.41 | 0.06 | 0.32 | 0.11 | 0.41 | 0.11 |
| | UM and HLM | 0.75 | 0.55 | 0.42 | 0.14 | 0.55 | 0.32 | 0.47 | 0.27 |
| | UM and DFCI | 0.76 | 0.52 | 0.7 | 0.35 | 0.71 | 0.42 | 0.6 | 0.39 |
| | HLM and DFCI | 0.88 | 0.77 | 0.59 | 0.28 | 0.48 | 0.52 | 0.62 | 0.49 |
The last three rows (consistency) display the correlation between predicted risk scores of the “center” using the training models developed by the “training centers.”
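The two kinds of correlation described here can be sketched as follows. This is only an illustration under stated assumptions: risk scores are taken to be linear (sum of coefficient times covariate), the correlation is taken to be Pearson's (the type of correlation is not stated in this record), and every coefficient vector and covariate value below is hypothetical.

```python
import math


def risk_scores(coefs, rows):
    """Linear risk score for each patient: sum of coefficient * covariate."""
    return [sum(b * x for b, x in zip(coefs, row)) for row in rows]


def pearson(a, b):
    """Pearson correlation between two equally long score vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical covariates of the targeted center (rows = patients) and
# hypothetical coefficient vectors from three training models.
target_rows = [[0.2, 1.1], [1.5, 0.3], [0.9, 0.8], [2.0, 0.1]]
own_model = [0.8, -0.5]      # trained on the targeted center itself
model_b = [0.6, -0.2]        # trained on another center
model_c = [0.7, -0.9]        # trained on a third center

own = risk_scores(own_model, target_rows)
from_b = risk_scores(model_b, target_rows)
from_c = risk_scores(model_c, target_rows)

transferability = pearson(own, from_b)   # own model vs. one other center
consistency = pearson(from_b, from_c)    # two other centers vs. each other
```

Both quantities are computed on the same patients of the targeted center; only the training source of the model changes.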
Performance evaluation of the colon cancer data for eight prediction models
| Model | Dxy | HR | 95% CI lower | 95% CI upper | P-value | R2 |
| A | −0.520 | 3.85 | 2.27 | 6.51 | 5.04E-7 | 0.283 |
| B | −0.488 | 5.13 | 3.09 | 8.51 | 2.46E-10 | 0.251 |
| C | −0.227 | 2.48 | 1.52 | 4.04 | 2.78E-4 | 0.043 |
| D | −0.126 | 1.33 | 0.83 | 2.11 | 0.233 | 0.021 |
| E | −0.446 | 4.18 | 2.47 | 7.06 | 8.99E-08 | 0.169 |
| F | −0.210 | 1.87 | 1.16 | 3.00 | 0.01 | 0.049 |
| G | −0.563 | 4.52 | 2.66 | 7.67 | 2.25E-08 | 0.293 |
| H | −0.361 | 2.29 | 1.42 | 3.68 | 6.5E-4 | 0.149 |
Somers’ correlation (Dxy), the hazard ratio (HR) with the 95% confidence limits (CI) and p-value of significance (P-value), and the coefficient of determination (R2) are given for the single-group analysis.
Performance evaluation of the colon cancer data for eight prediction models
| Model | HR | 95% CI lower | 95% CI upper | P-value | R2 |
| A | 3.73 | 2.25 | 6.16 | 2.90E-7 | 0.152 |
| B | 5.13 | 3.09 | 8.51 | 2.46E-10 | 0.220 |
| C | NA | NA | NA | NA | NA |
| D | 1.73 | 0.93 | 3.22 | 0.083 | 0.019 |
| E | NA | NA | NA | NA | NA |
| F | 2.12 | 1.21 | 3.69 | 0.008 | 0.044 |
| G | NA | NA | NA | NA | NA |
| H | 2.68 | 1.23 | 5.85 | 0.013 | 0.044 |
The hazard ratio (HR) with the 95% confidence limits (CI) and p-value of significance (P-value), and the coefficient of determination (R2) are given for the two-group comparison.
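The relationship between a hazard ratio, its 95% confidence limits and a p-value can be checked with a short sketch. It assumes a Wald-type interval that is symmetric on the log scale with a critical value of 1.96; that is an assumption about how the tabulated intervals were computed, and the function name `wald_from_hr_ci` is hypothetical.

```python
import math


def wald_from_hr_ci(hr, lower, upper, z_crit=1.96):
    """Recover the log-HR standard error, the Wald z statistic and the
    two-sided p-value from a hazard ratio and its 95% confidence limits."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return se, z, p_value


# Model B of the two-group comparison: HR 5.13, 95% CI 3.09 to 8.51
se, z, p = wald_from_hr_ci(5.13, 3.09, 8.51)
print(f"SE={se:.3f}, z={z:.2f}, p={p:.2e}")
```

For model B the recovered p-value is on the order of 10^-10, the same order as the tabulated 2.46E-10; small differences are expected because the tabulated HR and limits are rounded.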
Figure 1. Box plots of the correlation coefficients between risk scores of Moffitt Cancer Center (MCC) test data predicted by the models developed from the Moffitt and Vanderbilt data using re-sampling techniques based on 1,000 repetitions. For each statistical model, three correlations are computed: 1) ρ1 (model transferability): consistency of risk scores developed from the VMC and MCC centers, 2) ρ2 (signature transferability): consistency of risk scores predicted by two centers using the same MCC signature, and 3) ρ3 (signature transferability): consistency of risk scores predicted from MCC data using different signatures.