Christian A Bannister, Julian P Halcox, Craig J Currie, Alun Preece, Irena Spasić.
Abstract
BACKGROUND: Genetic programming (GP) is an evolutionary computing methodology capable of identifying complex, non-linear patterns in large data sets. Despite the potential advantages of GP over more conventional frequentist statistical methods, its application to survival analysis is rare. The aim of this study was to determine the utility of GP for the automatic development of clinical prediction models.
Year: 2018 PMID: 30180175 PMCID: PMC6122798 DOI: 10.1371/journal.pone.0202685
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Baseline characteristics of patients in the SMART cohort.
| Predictor | Unit | N | Test set | Training set | Test statistic |
|---|---|---|---|---|---|
| | | 3873 | 11% (147) | 12% (313) | |
| | | 3873 | 25% (320) | 25% (656) | |
| | | 3873 | 52 60 68 | 52 60 68 | F(1,3871) = 0.03, P = 0.86 |
| | | 3873 | 18% (235) | 18% (458) | |
| | | | 69% (885) | 71% (1826) | |
| | | | 12% (158) | 11% (286) | |
| | | | 1% (13) | 0% (12) | |
| | years | 3852 | 5.2 18.2 33.8 | 6.1 19.5 34.5 | F(1,3850) = 0.79, P = 0.38 |
| | | 3873 | 20% (255) | 19% (496) | |
| | | | 11% (141) | 10% (267) | |
| | | | 69% (885) | 70% (1804) | |
| | | | 1% (10) | 1% (15) | |
| | kg/m² | 3870 | 24 26 29 | 24 26 29 | F(1,3868) = 3, P = 0.084 |
| | | 3873 | 76% (983) | 78% (2004) | |
| | | | 23% (294) | 21% (552) | |
| | | | 1% (14) | 1% (26) | |
| | mm Hg | 2650 | 127 140 155 | 127 139 153 | F(1,2648) = 1.4, P = 0.23 |
| | mm Hg | 2652 | 73 79 86 | 73 79 86 | F(1,2650) = 0.01, P = 0.9 |
| | mm Hg | 2375 | 128 140 158 | 125 139 155 | F(1,2373) = 3.8, P = 0.052 |
| | mm Hg | 2374 | 75 82 90 | 74 82 90 | F(1,2372) = 0.2, P = 0.65 |
| | mmol/L | 3855 | 4.4 5.2 5.9 | 4.3 5.1 5.9 | F(1,3853) = 2.6, P = 0.11 |
| | mmol/L | 3843 | 0.95 1.15 1.40 | 0.97 1.18 1.43 | F(1,3841) = 3.8, P = 0.05 |
| | mmol/L | 3657 | 2.5 3.1 3.8 | 2.4 3.0 3.8 | F(1,3655) = 3.2, P = 0.073 |
| | mmol/L | 3845 | 1.1 1.6 2.3 | 1.1 1.5 2.2 | F(1,3843) = 4.1, P = 0.042 |
| | | 3873 | 30% (387) | 29% (760) | |
| | | 3873 | 56% (724) | 56% (1436) | |
| | | 3873 | 24% (308) | 24% (632) | |
| | | 3873 | 10% (134) | 11% (282) | |
| | (μ)mol/L | 3410 | 10 13 16 | 10 13 16 | F(1,3408) = 2.5, P = 0.11 |
| | (μ)mol/L | 3854 | 5.3 5.8 6.5 | 5.3 5.7 6.5 | F(1,3852) = 0.94, P = 0.33 |
| | mL/min | 3856 | 78 89 102 | 78 89 101 | F(1,3854) = 0.62, P = 0.43 |
| | | 3873 | 75% (969) | 75% (1928) | |
| | | | 17% (221) | 17% (434) | |
| | | | 3% (33) | 3% (81) | |
| | | | 5% (68) | 5% (139) | |
| | mm | 3775 | 0.75 0.88 1.05 | 0.75 0.88 1.07 | F(1,3773) = 0.24, P = 0.63 |
| | | 3873 | 79% (1020) | 79% (2038) | |
| | | | 18% (236) | 19% (486) | |
| | | | 3% (35) | 2% (58) | |
Values given as a b c are the lower quartile, the median, and the upper quartile for continuous variables. N is the number of non-missing values. Numbers in parentheses after a percentage are frequencies. NA denotes a missing value. Tests used: (1) Pearson test; (2) Wilcoxon test.
Fig 1. The final model developed by genetic programming, presented as a binary tree.
Example of survival data in the counting process format.
| Patient | Time | Event | | … | |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | … | 0 |
| 1 | 2 | 0 | 1 | … | 1 |
| 2 | 1 | 0 | 0 | … | 1 |
| 2 | 2 | 0 | 0 | … | 0 |
| 2 | 3 | 1 | 0 | … | 1 |
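The layout above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `to_counting_process` and the covariate name `x1` are hypothetical, time is assumed to be measured in whole intervals, and only a time-fixed covariate is shown (the table also contains a covariate that changes between intervals).

```python
def to_counting_process(patient_id, follow_up, event, covariates):
    """Expand one patient into one row per time interval.

    follow_up  -- number of whole intervals the patient was observed
    event      -- True if the patient had the event in the last interval
    covariates -- dict of time-fixed covariates (names are hypothetical)
    """
    rows = []
    for t in range(1, follow_up + 1):
        rows.append({
            "patient": patient_id,
            "time": t,
            # The event indicator is 1 only in the interval where the event occurs.
            "event": int(event and t == follow_up),
            **covariates,
        })
    return rows


# Patient 1: censored after 2 intervals; patient 2: event in interval 3.
rows = (to_counting_process(1, 2, False, {"x1": 1})
        + to_counting_process(2, 3, True, {"x1": 0}))
for row in rows:
    print(row)
```

Each patient contributes as many rows as intervals observed, so a censored patient never has a row with `event` equal to 1.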
Cox regression coefficients.
| Predictor | Variable | Full | Stepwise |
|---|---|---|---|
| | AGE | 0.0011 | 0.0011 |
| | ALBUMIN = Macro | 0.5289 | 0.5371 |
| | ALBUMIN = Micro | 0.5227 | 0.5184 |
| | ALCOHOL = Current | 0.0234 | |
| | ALCOHOL = Former | −0.1854 | |
| | BMI | −0.0383 | −0.0359 |
| | CREAT | 0.5992 | 0.5282 |
| | DIABETES | 0.0783 | |
| | HDL | −0.4619 | −0.4096 |
| | HISTCAR2 | 0.2980 | 0.2895 |
| | HOMOC | 0.0169 | 0.0182 |
| | IMT | 0.5145 | 0.5879 |
| | SEX = Female | 0.1754 | |
| | SMOKING = Current | 0.0798 | |
| | SMOKING = Former | 0.0427 | |
| | STENOSIS | 0.1815 | |
| | SYSTH | 0.0037 | 0.0041 |
Association of predictors with cardiovascular events.
| | Low | High | Δ | Effect | S.E. | Lower | Upper |
|---|---|---|---|---|---|---|---|
| | 52.00 | 68.0 | 16.00 | 0.32 | 0.08 | 0.16 | 0.48 |
| Hazard ratio | 52.00 | 68.0 | 16.00 | 1.38 | | 1.18 | 1.61 |
| | 24.03 | 28.7 | 4.69 | −0.15 | 0.08 | −0.31 | 0.00 |
| Hazard ratio | 24.03 | 28.7 | 4.69 | 0.86 | | 0.73 | 1.00 |
| | 127.00 | 156.0 | 29.00 | 0.11 | 0.07 | −0.04 | 0.25 |
| Hazard ratio | 127.00 | 156.0 | 29.00 | 1.11 | | 0.96 | 1.29 |
| | 0.96 | 1.4 | 0.47 | −0.18 | 0.09 | −0.34 | −0.01 |
| Hazard ratio | 0.96 | 1.4 | 0.47 | 0.84 | | 0.71 | 0.99 |
| | 1.00 | 5.0 | 4.00 | 1.05 | 0.27 | 0.52 | 1.59 |
| Hazard ratio | 1.00 | 5.0 | 4.00 | 2.87 | | 1.67 | 4.91 |
| | 10.50 | 15.9 | 5.40 | 0.09 | 0.05 | −0.02 | 0.19 |
| Hazard ratio | 10.50 | 15.9 | 5.40 | 1.09 | | 0.98 | 1.21 |
| | 78.00 | 101.0 | 23.00 | 0.12 | 0.05 | 0.04 | 0.21 |
| Hazard ratio | 78.00 | 101.0 | 23.00 | 1.13 | | 1.04 | 1.24 |
| | 0.75 | 1.1 | 0.32 | 0.17 | 0.07 | 0.04 | 0.30 |
| Hazard ratio | 0.75 | 1.1 | 0.32 | 1.19 | | 1.04 | 1.35 |
| | 1.00 | 2.0 | | 0.47 | 0.14 | 0.21 | 0.74 |
| Hazard ratio | 1.00 | 2.0 | | 1.60 | | 1.23 | 2.09 |
| | 1.00 | 3.0 | | 0.49 | 0.24 | 0.02 | 0.96 |
| Hazard ratio | 1.00 | 3.0 | | 1.63 | | 1.02 | 2.61 |
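Each "Hazard ratio" row in the table is consistent with exponentiating the effect (a log hazard ratio) and its confidence limits. The sketch below reproduces the first pair of rows from the reported effect and standard error. It assumes a normal-approximation 95% interval (z = 1.96), which matches the published limits but is not stated explicitly in the source; the function name is illustrative.

```python
import math


def hazard_ratio_with_ci(effect, se, z=1.96):
    """Convert a log hazard ratio and its standard error to HR with a CI.

    z = 1.96 assumes a normal-approximation 95% interval (an assumption).
    """
    lower = effect - z * se
    upper = effect + z * se
    return math.exp(effect), math.exp(lower), math.exp(upper)


# First row of the table: effect 0.32, S.E. 0.08 (low 52 vs high 68).
hr, hr_lower, hr_upper = hazard_ratio_with_ci(0.32, 0.08)
print(f"{hr:.2f} ({hr_lower:.2f}, {hr_upper:.2f})")  # prints "1.38 (1.18, 1.61)"
```

Because the inputs are already rounded to two decimals, other rows may differ from the published values in the last digit.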
The calibrated final Cox model.
Fig 2. Average survival curves for the Cox regression and GP models.
The error bars represent ±2 standard errors of the KM estimates.
C-statistic.
| Time | Cox regression | GP |
|---|---|---|
| | 0.66 | 0.59 |
| | 0.70 | 0.69 |
| | 0.70 | 0.64 |
Values estimated by the two models at t = 1, 3 and 5 years.
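The record does not say which estimator of the C-statistic was used. As a rough illustration of what the measure captures, the sketch below assumes a Harrell-style concordance index restricted to events observed up to a horizon t: among pairs of patients where one is known to have had the event first, it reports the fraction in which that patient also received the higher predicted risk. The function and argument names are hypothetical.

```python
def concordance_index(times, events, risk_scores, horizon=None):
    """Harrell-style C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event
    (by `horizon`, if given) and patient j was followed for longer.
    Ties in predicted risk count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a pair
        if horizon is not None and times[i] > horizon:
            continue  # event happened after the horizon of interest
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: a higher score should mean an earlier event.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [0.9, 0.7, 0.4, 0.2]))  # perfectly ordered
print(concordance_index(times, events, [0.2, 0.4, 0.7, 0.9]))  # perfectly reversed
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the tabulated values should be read.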
Fig 3. C-statistic estimates by model for t = 1, 3 and 5 years.
Fig 4. Calibration plots for the Cox regression and GP models at t = 1, 3 and 5 years.
χ2 statistic.
| Time (years) | Cox regression χ2 | Cox regression p-value | GP χ2 | GP p-value |
|---|---|---|---|---|
| | 7.93 | 0.541 | 5.18 | 0.818 |
| | 4.89 | 0.844 | 9.99 | 0.352 |
| | 10.32 | 0.325 | 16.17 | 0.063 |
A comparison between observed and expected (according to the model) number of events in groups of patients defined according to the predicted 1 − S(t) at t = 1, 3 and 5 years.
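The exact form of the statistic is not given in this record. The sketch below shows one common way to build such a comparison, under loud assumptions: patients are split into equal-sized groups by predicted risk 1 − S(t), the outcome is treated as a simple binary indicator at time t (censoring is ignored, which a full survival analysis would not do), and a Hosmer-Lemeshow-style sum is used. All names are hypothetical.

```python
def calibration_chi2(predicted_risk, observed_event, n_groups=10):
    """Hosmer-Lemeshow-style chi-square over groups ordered by predicted risk.

    predicted_risk -- predicted 1 - S(t) per patient
    observed_event -- 1 if the event occurred by time t, else 0
                      (assumption: censoring is ignored in this sketch)
    """
    pairs = sorted(zip(predicted_risk, observed_event))
    n = len(pairs)
    chi2 = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        expected = sum(p for p, _ in group)  # expected number of events
        observed = sum(e for _, e in group)  # observed number of events
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance > 0:
            chi2 += (observed - expected) ** 2 / variance
    return chi2


# Two groups of two patients: low-risk group has no events, high-risk has two.
print(calibration_chi2([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1], n_groups=2))
```

Small values indicate that observed and expected event counts agree within each risk group; the p-values in the table come from comparing the statistic with a χ2 distribution, a step omitted here.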