Paul P Fahey, Andrew Page, Glenn Stone, Thomas Astell-Burt.
Abstract
BACKGROUND: For epidemiological research, cancer registry datasets often need to be augmented with additional data. Data linkage is not feasible when there are no cases in common between datasets. We present a novel approach to augmenting cancer registry data by imputing pre-diagnosis health behaviour and estimating its relationship with post-diagnosis survival time.
Keywords: Alcohol drinking; Cancer registries; Exercise; Obesity; Oesophageal neoplasms; Tobacco smoking
Year: 2020 PMID: 32487049 PMCID: PMC7268470 DOI: 10.1186/s12885-020-06990-3
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Estimated proportion with each health behaviour, phi coefficient between the paired imputed values, and estimated excess matches, by analysis
| Behaviour 5 years before diagnosis | N | Estimated proportion with behaviour, median | Estimated proportion with behaviour, 95% CI | Estimated phi coefficient (φ), median | Estimated phi coefficient (φ), 95% CI | Estimated excess matches, median | Estimated excess matches, 95% CI |
|---|---|---|---|---|---|---|---|
| Current smoking | |||||||
| Overall | 27,835 | 0.159 | 0.157,0.162 | 0.071 | 0.059,0.084 | 262.2 | 220.1,312.2 |
| ESCC | 8914 | 0.166 | 0.162,0.170 | 0.077 | 0.061,0.097 | 94.8 | 74.5,120.7 |
| EAC | 15,726 | 0.157 | 0.153,0.159 | 0.066 | 0.052,0.081 | 137.0 | 107.4,169.5 |
| Binge drinking | |||||||
| Overall | 27,750 | 0.100 | 0.098,0.102 | 0.060 | 0.049,0.077 | 150.5 | 121.5,192.1 |
| ESCC | 8891 | 0.086 | 0.082,0.089 | 0.060 | 0.042,0.086 | 42.2 | 29.8,61.1 |
| EAC | 15,673 | 0.109 | 0.106,0.111 | 0.058 | 0.042,0.079 | 88.6 | 63.6,120.3 |
| Heavy drinking | |||||||
| Overall | 27,749 | 0.048 | 0.047,0.050 | 0.011 | 0.002,0.025 | 14.3 | 2.7,32.0 |
| ESCC | 8888 | 0.046 | 0.043,0.049 | 0.015 | −0.002,0.036 | 5.7 | −0.7,14.2 |
| EAC | 15,676 | 0.050 | 0.048,0.052 | 0.008 | −0.004,0.028 | 6.0 | −3.0,20.8 |
| Physical activity | |||||||
| Overall | 27,830 | 0.737 | 0.734,0.740 | 0.034 | 0.026,0.046 | 185.1 | 139.4,247.4 |
| ESCC | 8912 | 0.716 | 0.709,0.721 | 0.036 | 0.016,0.056 | 64.7 | 29.6,100.2 |
| EAC | 15,724 | 0.750 | 0.746,0.754 | 0.031 | 0.013,0.047 | 91.4 | 40.0,138.4 |
| Obese | |||||||
| Overall | 27,796 | 0.257 | 0.254,0.261 | 0.030 | 0.020,0.042 | 160.2 | 108.4,226.8 |
| ESCC | 8898 | 0.262 | 0.255,0.268 | 0.045 | 0.024,0.061 | 77.0 | 41.4,104.6 |
| EAC | 15,709 | 0.256 | 0.251,0.261 | 0.023 | 0.012,0.041 | 67.8 | 35.0,122.4 |
| Current smoking with regular drinking | |||||||
| Overall | 27,735 | 0.034 | 0.033,0.035 | 0.022 | 0.009,0.038 | 19.8 | 8.0,34.2 |
| ESCC | 8883 | 0.031 | 0.029,0.033 | 0.024 | −0.000,0.049 | 6.2 | −0.0,13.5 |
| EAC | 15,670 | 0.035 | 0.034,0.037 | 0.021 | 0.004,0.042 | 11.5 | 2.1,22.4 |
Estimated proportion with behaviour = proportion of imputed values where the health behaviour is present
Estimated phi coefficient (φ) = correlation between the pairs of imputed values (calculated as the phi coefficient)
Estimated excess matches = excess number of correct matches over the number expected by chance alone
Median = median of 100 repetitions of the imputation algorithm
95% CI = empirical 95% confidence interval formed from the 2.5th and 97.5th percentiles of 100 repetitions of the imputation algorithm
N = number of SEER oesophageal cancer cases receiving data from two donor records from the BRFSS health behaviour datasets
ESCC = oesophageal squamous cell carcinoma
EAC = oesophageal adenocarcinoma
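The phi coefficient and excess matches in the table above summarise how often a case's two imputed values agree beyond chance. A minimal Python sketch of both quantities, computed from the 2×2 cross-tabulation of the paired values, follows. It assumes that "excess matches" means pairs in which both imputed values show the behaviour, beyond the count expected if the two imputations were independent; the source does not give the formula, and the function name `phi_and_excess` is hypothetical.

```python
import math


def phi_and_excess(first, second):
    """Proportion with behaviour, phi coefficient and excess joint-positive matches.

    first, second: equal-length sequences of 0/1 values, one pair per case
    (e.g. the behaviour imputed from each of a case's two donor records).
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    n11 = sum(1 for a, b in zip(first, second) if a == 1 and b == 1)
    r1 = sum(first)          # cases whose first imputed value has the behaviour
    c1 = sum(second)         # cases whose second imputed value has the behaviour
    r0, c0 = n - r1, n - c1
    denom = math.sqrt(r1 * r0 * c1 * c0)
    if denom == 0:
        raise ValueError("phi is undefined when a margin is all 0 or all 1")
    # Standard 2x2 phi, (n11*n00 - n10*n01) / sqrt(r1*r0*c1*c0),
    # rewritten using n11*n00 - n10*n01 == n*n11 - r1*c1.
    phi = (n * n11 - r1 * c1) / denom
    # ASSUMPTION: excess matches = observed joint positives minus the
    # number expected if the two imputed values were independent.
    excess = n11 - r1 * c1 / n
    # Share of all 2n imputed values where the behaviour is present.
    proportion = (r1 + c1) / (2 * n)
    return proportion, phi, excess


if __name__ == "__main__":
    print(phi_and_excess([1, 1, 0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 1, 1, 0, 0]))
```

Under this reading, excess = N·φ·√(p₁(1−p₁)p₂(1−p₂)), which for equal margins reduces to N·φ·p(1−p). Plugging in the current-smoking medians (27,835 × 0.071 × 0.159 × 0.841 ≈ 264) lands close to the reported 262.2, which is consistent with, but does not confirm, this interpretation.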
Results of simulation-based testing of whether the imputation can be used to predict relative risk
| Target RR | Simulated data RR, median | Simulated data RR, 95% CI | Imputed RR, median | Imputed RR, 95% CI | Impossible results, frequency | Estimated true RR, median | Estimated true RR, 95% CI b |
|---|---|---|---|---|---|---|---|
| Current smoking | |||||||
| RR = 0.5 | 0.501 | 0.475,0.521 | 0.964 | 0.934,0.993a | 0 | 0.519 | 0.163,0.904a |
| RR = 0.66 | 0.660 | 0.635,0.683 | 0.973 | 0.944,0.999a | 0 | 0.638 | 0.300,0.985a |
| RR = 0.80 | 0.799 | 0.771,0.823 | 0.983 | 0.952,1.017 | 0 | 0.753 | 0.375,1.226 |
| RR = 1.00 | 1.001 | 0.976,1.026 | 0.997 | 0.967,1.027 | 0 | 0.957 | 0.577,1.444 |
| RR = 1.25 | 1.249 | 1.220,1.287 | 1.017 | 0.989,1.048 | 0 | 1.254 | 0.856,1.793 |
| RR = 1.50 | 1.499 | 1.465,1.528 | 1.032 | 1.005,1.059a | 0 | 1.486 | 1.069,1.947a |
| RR = 2.00 | 2.000 | 1.974,2.034 | 1.064 | 1.034,1.092a | 0 | 2.047 | 1.542,2.532a |
| Binge drinking | |||||||
| RR = 0.5 | 0.501 | 0.474,0.526 | 0.967 | 0.940,0.996a | 0 | 0.478 | 0.087,0.927a |
| RR = 0.66 | 0.659 | 0.624,0.692 | 0.976 | 0.945,1.015 | 1 | 0.629 | 0.173,1.316 |
| RR = 0.80 | 0.798 | 0.758,0.830 | 0.988 | 0.959,1.025 | 0 | 0.805 | 0.341,1.448 |
| RR = 1.00 | 0.997 | 0.963,1.033 | 0.999 | 0.971,1.032 | 0 | 0.981 | 0.518,1.492 |
| RR = 1.25 | 1.245 | 1.213,1.278 | 1.016 | 0.984,1.054 | 0 | 1.271 | 0.739,2.029 |
| RR = 1.50 | 1.499 | 1.463,1.534 | 1.030 | 0.990,1.068 | 0 | 1.517 | 0.831,2.246 |
| RR = 2.00 | 1.999 | 1.978,2.028 | 1.058 | 1.021,1.093a | 0 | 2.014 | 1.352,2.717a |
| Heavy Drinking | |||||||
| RR = 0.5 | 0.500 | 0.450,0.548 | 0.995 | 0.945,1.046 | 40 | failed | failed |
| RR = 0.66 | 0.661 | 0.606,0.697 | 0.995 | 0.946,1.046 | 34 | failed | failed |
| RR = 0.80 | 0.799 | 0.746,0.847 | 0.997 | 0.944,1.053 | 43 | failed | failed |
| RR = 1.00 | 0.997 | 0.949,1.045 | 0.998 | 0.940,1.041 | 32 | failed | failed |
| RR = 1.25 | 1.251 | 1.210,1.300 | 1.003 | 0.959,1.053 | 22 | failed | failed |
| RR = 1.50 | 1.497 | 1.459,1.535 | 1.012 | 0.956,1.059 | 24 | failed | failed |
| RR = 2.00 | Not possible | Not possible | |||||
| Physical activity | |||||||
| RR = 0.5 | 0.500 | 0.491,0.509 | 0.974 | 0.951,0.997a | 0 | 0.504 | 0.319,0.901a |
| RR = 0.66 | 0.659 | 0.645,0.671 | 0.983 | 0.959,1.006 | 0 | 0.632 | 0.367,1.231 |
| RR = 0.80 | 0.800 | 0.782,0.818 | 0.993 | 0.971,1.017 | 0 | 0.833 | 0.449,1.907 |
| RR = 1.00 | 1.002 | 0.976,1.022 | 1.001 | 0.978,1.021 | 0 | 1.025 | 0.488,2.092 |
| RR = 1.25 | 1.250 | 1.219,1.276 | 1.006 | 0.977,1.030 | 0 | 1.206 | 0.541,2.961 |
| RR = 1.50 | 1.499 | 1.455,1.549 | 1.013 | 0.987,1.037 | 2 | 1.514 | 0.722,4.078 |
| RR = 2.00 | 2.003 | 1.939,2.083 | 1.021 | 1.002,1.047a | 3 | 2.127 | 1.055,10.987a |
| Obese | |||||||
| RR = 0.5 | 0.499 | 0.485,0.517 | 0.983 | 0.960,1.008 | 1 | 0.550 | 0.028,1.322 |
| RR = 0.66 | 0.660 | 0.634,0.680 | 0.989 | 0.962,1.016 | 2 | 0.665 | 0.114,1.772 |
| RR = 0.80 | 0.802 | 0.777,0.823 | 0.995 | 0.967,1.015 | 1 | 0.846 | 0.316,1.676 |
| RR = 1.00 | 1.002 | 0.981,1.024 | 0.999 | 0.980,1.024 | 0 | 0.962 | 0.461,2.067 |
| RR = 1.25 | 1.250 | 1.222,1.274 | 1.009 | 0.989,1.030 | 0 | 1.335 | 0.601,2.300 |
| RR = 1.50 | 1.500 | 1.468,1.534 | 1.014 | 0.987,1.039 | 0 | 1.440 | 0.606,2.796 |
| RR = 2.00 | 2.002 | 1.961,2.041 | 1.025 | 0.997,1.044 | 0 | 1.995 | 0.886,3.234 |
| Current smoking with regular drinking | |||||||
| RR = 0.5 | 0.504 | 0.441,0.550 | 0.988 | 0.931,1.034 | 37 | failed | failed |
| RR = 0.66 | 0.660 | 0.600,0.713 | 0.997 | 0.932,1.066 | 31 | failed | failed |
| RR = 0.80 | 0.797 | 0.744,0.863 | 0.991 | 0.928,1.052 | 34 | failed | failed |
| RR = 1.00 | 0.996 | 0.943,1.049 | 1.001 | 0.940,1.059 | 25 | failed | failed |
| RR = 1.25 | 1.250 | 1.183,1.298 | 1.009 | 0.954,1.059 | 16 | failed | failed |
| RR = 1.50 | 1.497 | 1.454,1.545 | 1.000 | 0.958,1.065 | 19 | failed | failed |
| RR = 2.00 | Not possible | Not possible | |||||
Target RR – the relative risk we attempted to achieve in the simulated data
Simulated data RR – the relative risk which was actually achieved between the first imputed value and the simulated one-year survival status
Imputed RR – the relative risk calculated using the second imputed data point as the imputed behaviour
Impossible result – number of instances (among the 100 repetitions) where the estimated true relative risk was impossible (a negative value)
Estimated true RR – the estimated true relative risk derived from the imputed relative risk and the calibration parameters
Median – median of 100 repetitions of the imputation algorithm
95% CI – empirical 95% confidence interval formed from the 2.5th and 97.5th percentiles of 100 repetitions of the imputation algorithm
a 95% confidence interval excludes no association (i.e. excludes a relative risk of 1)
b Excludes impossible results
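In the simulation table above, imputed RRs stay close to 1 even when the simulated RR is far from it, the pattern expected from non-differential misclassification. The Python sketch below illustrates this under an assumed model that the source does not spell out: the second imputed value copies the first with probability φ and is otherwise an independent draw with the same prevalence p, so the two values have phi coefficient φ. The functions `expected_imputed_rr`, `relative_risk` and `simulate` and the baseline survival probability `p0` are hypothetical.

```python
import random


def expected_imputed_rr(rr, phi, p):
    """Expected RR when survival depends on the first imputed value but the RR
    is computed from the second (assumed copy-with-probability-phi model)."""
    a = phi + (1 - phi) * p   # P(first = 1 | second = 1)
    b = (1 - phi) * p         # P(first = 1 | second = 0)
    return (1 + (rr - 1) * a) / (1 + (rr - 1) * b)


def relative_risk(exposure, outcome):
    """Risk of the outcome (here, surviving one year) in exposed vs unexposed."""
    exposed = [o for e, o in zip(exposure, outcome) if e]
    unexposed = [o for e, o in zip(exposure, outcome) if not e]
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))


def simulate(n, rr, phi, p, p0, seed=1):
    """Return (RR using first imputed value, RR using second imputed value)."""
    if not (0 < p0 <= 1 and 0 < p0 * rr <= 1):
        raise ValueError("p0 and p0 * rr must both be probabilities")
    rng = random.Random(seed)
    first, second, survived = [], [], []
    for _ in range(n):
        x1 = 1 if rng.random() < p else 0
        # Second value copies the first with probability phi, else is redrawn.
        x2 = x1 if rng.random() < phi else (1 if rng.random() < p else 0)
        survival_prob = p0 * rr if x1 else p0
        first.append(x1)
        second.append(x2)
        survived.append(1 if rng.random() < survival_prob else 0)
    return relative_risk(first, survived), relative_risk(second, survived)


if __name__ == "__main__":
    print(round(expected_imputed_rr(2.0, 0.071, 0.159), 3))  # prints 1.062
    print(simulate(100_000, 2.0, 0.071, 0.159, p0=0.3))
```

With a target RR of 2, φ = 0.071 and p = 0.159 (the current-smoking medians from the first table), `expected_imputed_rr` gives about 1.06, near the median imputed RR of 1.064 reported for that scenario. This is a plausibility check of the assumed model, not a reproduction of the authors' simulation.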
Estimated relative risks of 1-year survival derived from imputed pre-diagnosis behaviours for SEER oesophageal cancer cases, 2006–2014; unadjusted and age-adjusted
| Behaviour / cases | Imputed RR, median | Imputed RR, 95% CI | Impossible results, frequency | Estimated true RR, median | Estimated true RR, 95% CI | Age-adjusted imputed RR, median | Age-adjusted imputed RR, 95% CI | Impossible results (age-adjusted), frequency | Age-adjusted estimated true RR, median | Age-adjusted estimated true RR, 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|
| Current smoking | ||||||||||
| All | 0.986 | 0.954,1.009 | 0 | 0.806 | 0.380,1.130 | 1.051 | 1.014,1.078 | 0 | 1.794 | 1.215,2.357a |
| ESCC | 1.025 | 0.981,1.067 | 0 | 1.349 | 0.733,2.142 | 1.064 | 1.016,1.111 | 0 | 1.990 | 1.240,3.117 a |
| EAC | 0.959 | 0.914,1.000 a | 5 | 0.478 | 0.039,1.003 b | 1.038 | 0.985,1.085 | 0 | 1.613 | 0.785,2.571 |
| Binge drinking | ||||||||||
| All | 0.933 | 0.900,0.964 | 49 | failed | failed | 0.997 | 0.961,1.032 | 1 | 0.951 | 0.445,1.539 b |
| ESCC | 0.998 | 0.936,1.059 | 4 | 0.991 | 0.167,1.995 b | 1.033 | 0.968,1.101 | 0 | 1.515 | 0.440,2.754 |
| EAC | 0.914 | 0.863,0.961 a | 72 | failed | failed | 0.989 | 0.935,1.046 | 3 | 0.818 | 0.181,1.890 b |
| Heavy drinking | ||||||||||
| All | 0.981 | 0.932,1.028 | 61 | failed | failed | 1.010 | 0.963,1.060 | 23 | failed | failed |
| ESCC | 0.995 | 0.912,1.066 | 48 | failed | failed | 1.012 | 0.929,1.088 | 36 | failed | failed |
| EAC | 0.974 | 0.907,1.039 | 66 | failed | failed | 1.011 | 0.938,1.077 | 35 | failed | failed |
| Physical activity | ||||||||||
| All | 0.954 | 0.934,0.978 a | 0 | 0.319 | 0.165,0.564 a | 0.974 | 0.956,1.001 | 0 | 0.507 | 0.307,1.030 |
| ESCC | 0.959 | 0.925,0.991 | 2 | 0.345 | 0.073,0.811 a,b | 0.971 | 0.933,1.003 | 1 | 0.452 | 0.102,1.071b |
| EAC | 0.957 | 0.929,0.986 a | 1 | 0.311 | 0.109,0.675 a,b | 0.984 | 0.954,1.013 | 0 | 0.627 | 0.285,2.180 |
| Obese | ||||||||||
| All | 0.969 | 0.946,0.993 a | 24 | failed | failed | 1.008 | 0.983,1.036 | 0 | 1.262 | 0.559,2.931 |
| ESCC | 1.000 | 0.968,1.039 | 0 | 1.004 | 0.134,2.378 | 1.027 | 0.992,1.068 | 0 | 1.733 | 0.834,4.167 |
| EAC | 0.949 | 0.917,0.987 a | 76 | failed | failed | 0.996 | 0.960,1.035 | 8 | failed | failed |
| Current smoking with regular drinking | ||||||||||
| All | 0.987 | 0.930,1.058 | 40 | failed | failed | 1.044 | 0.986,1.120 | 2 | 3.254 | 0.771,11.843 b |
| ESCC | 1.044 | 0.946,1.146 | 12 | failed | failed | 1.076 | 0.973,1.180 | 11 | failed | failed |
| EAC | 0.963 | 0.861,1.052 | 60 | failed | failed | 1.032 | 0.919,1.123 | 13 | failed | failed |
Imputed RR – the relative risk calculated using the imputed behaviour
Impossible result – number of instances (among the 100 repetitions) where the estimated true relative risk was impossible (a negative value)
Estimated true RR – the estimated true relative risk derived from the imputed relative risk and the calibration parameters
Median – median of 100 repetitions of the imputation algorithm
95% CI – empirical 95% confidence interval formed from the 2.5th and 97.5th percentiles of 100 repetitions of the imputation algorithm
a 95% confidence interval excludes no association (i.e. excludes a relative risk of 1)
b Excludes impossible results
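The estimated true RR in the table above reverses the attenuation of the imputed RR using calibration parameters that are not named in this extract. Assuming the imputed value equals the true value with probability φ and is otherwise an independent draw with prevalence p, the attenuation can be solved as RR = 1 + (R̂ − 1)/(a − R̂·b), with a = φ + (1 − φ)p and b = (1 − φ)p, where R̂ is the imputed RR. The solution can come out negative (an "impossible result") when R̂ is further from 1 than φ and p can explain. The sketch below applies this per repetition, counts impossible values, and summarises the rest by the median and a 2.5th/97.5th percentile interval. It is not the authors' code; `estimated_true_rr`, `percentile` and `summarise` are hypothetical, and the linear-interpolation percentile rule is our choice because the source does not state one.

```python
import math
import statistics


def estimated_true_rr(imputed_rr, phi, p):
    """Invert the assumed attenuation model for one repetition.

    ASSUMPTION: imputed value equals the true value with probability phi,
    otherwise an independent draw with prevalence p. Returns NaN when the
    equation has no finite solution.
    """
    a = phi + (1 - phi) * p   # P(true = 1 | imputed = 1) under the assumed model
    b = (1 - phi) * p         # P(true = 1 | imputed = 0) under the assumed model
    denom = a - imputed_rr * b
    if denom == 0:
        return math.nan
    return 1 + (imputed_rr - 1) / denom


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarise(repetitions):
    """repetitions: iterable of (imputed_rr, phi, p), one per imputation run.

    Returns (impossible_count, median, (lower, upper)). The median and interval
    exclude impossible results and are None if every repetition was impossible.
    """
    estimates = [estimated_true_rr(*rep) for rep in repetitions]
    valid = sorted(e for e in estimates if not math.isnan(e) and e >= 0)
    impossible = len(estimates) - len(valid)
    if not valid:
        return impossible, None, None
    interval = (percentile(valid, 0.025), percentile(valid, 0.975))
    return impossible, statistics.median(valid), interval


if __name__ == "__main__":
    print(round(estimated_true_rr(0.986, 0.071, 0.159), 3))  # prints 0.808
    print(estimated_true_rr(0.933, 0.060, 0.100) < 0)        # prints True
```

Plugging in the medians for all cases, current smoking (imputed RR 0.986, φ 0.071, p 0.159) gives about 0.81, close to the reported median of 0.806, while binge drinking (0.933, 0.060, 0.100) gives a slightly negative value, in line with its 49 impossible results and "failed" entry. Medians plugged into a nonlinear formula are only a rough check of the assumed model.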