Nora Eisemann, Annika Waldmann, Alexander Katalinic.
Abstract
BACKGROUND: Missing data on tumour stage are a common problem in population-based cancer registries. Statistical analyses at the level of tumour stage may be biased if missing data are not handled adequately. To identify a useful way of treating missing tumour stage data, we examined different imputation models for multiple imputation with chained equations, analysing the stage-specific numbers of cases of malignant melanoma and female breast cancer.
Year: 2011 PMID: 21929796 PMCID: PMC3184281 DOI: 10.1186/1471-2288-11-129
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Description of the observed and the simulated data sets for breast cancer and malignant melanoma patients
| Variable | Category | Breast cancer: observed | Breast cancer: simulated* | Malignant melanoma: observed | Malignant melanoma: simulated* |
|---|---|---|---|---|---|
| Number of cases | | 21,428 | 17,162 | 5,520 | 1,685 |
| Sex (in %) | Female | 100 | 100 | 45.8 | 45.1 |
| | Male | 0 | 0 | 54.2 | 54.9 |
| Age | Median | 62.0 | 61.0 | 61.0 | 59.0 |
| T-stage (in %) | 1 | 47.3 | 49.2 | 36.9 | 36.9 |
| | 2 | 34.0 | 34.3 | 11.8 | 12.4 |
| | 3 | 5.5 | 5.1 | 8.0 | 7.6 |
| | 4 | 7.3 | 5.7 | 4.7 | 4.4 |
| | Unknown | 6.0 | 5.7 | 38.6 | 38.7 |
| N-stage (in %) | 0 | 53.4 | 55.1 | 29.5 | 32.5 |
| | 1 | 25.4 | 25.2 | 1.7 | 1.4 |
| | 2 | 6.1 | 5.9 | 0.6 | 0.4 |
| | 3 | 3.7 | 3.5 | 0.2 | 0.1 |
| | Unknown | 11.4 | 10.3 | 67.9 | 65.6 |
| M-stage (in %) | 0 | 77.9 | 80.8 | 31.0 | 33.5 |
| | 1 | 5.6 | 3.7 | 2.2 | 0.7 |
| | Unknown | 16.5 | 15.4 | 66.8 | 65.8 |
| UICC-stage (in %) | I | 32.0 | 29.7 | 23.9 | 7.4 |
| | II | 33.2 | 30.0 | 3.3 | 0.7 |
| | III | 11.4 | 10.4 | 2.8 | 0.8 |
| | IV | 5.6 | 3.7 | 0.5 | 0.7 |
| | Unknown | 17.8 | 26.1 | 69.5 | 90.4 |
| Survival time (days) | Median | 1279 | 1279 | 1552 | 1765 |
| Censoring (in %) | Censored | 84.6 | 88.5 | 88.1 | 90.5 |
| Year of diagnosis | 2000 | 10.2 | 9.7 | 10.8 | 15.1 |
| | 2001 | 10.7 | 10.2 | 11.5 | 14.4 |
| | 2002 | 11.1 | 10.6 | 10.0 | 10.9 |
| | 2003 | 10.8 | 10.9 | 14.3 | 12.2 |
| | 2004 | 10.7 | 10.7 | 12.8 | 14.6 |
| | 2005 | 10.7 | 10.9 | 10.7 | 8.9 |
| | 2006 | 11.5 | 11.6 | 10.0 | 10.8 |
| | 2007 | 11.6 | 12.1 | 9.7 | 5.0 |
| | 2008 | 12.7 | 13.4 | 10.3 | 8.1 |
| Grading (in %) | 1 | 10.6 | 10.6 | 3.4 | 5.7 |
| | 2 | 54.2 | 53.7 | 0.3 | 0.2 |
| | 3 | 30.0 | 30.1 | 0.2 | 0.2 |
| | 4 | 0.2 | 0.1 | < 0.1 | 0.1 |
| | Unknown | 5.0 | 5.5 | 96.1 | 93.8 |
| Radiotherapy (in %) | Yes | 66.4 | 80.2 | 1.4 | 0.5 |
| | No | 17.8 | 15.3 | 52.2 | 64.5 |
| | Unknown | 15.8 | 4.5 | 46.4 | 35.0 |
| Chemotherapy (in %) | Yes | 46.1 | 47.9 | 1.8 | 1.4 |
| | No | 37.4 | 36.7 | 51.8 | 63.9 |
| | Unknown | 16.6 | 15.5 | 46.5 | 34.8 |
| Hormone therapy (in %) | Yes | 60.6 | 62.3 | 0.0 | 0.0 |
| | No | 19.1 | 18.1 | | |
| | Unknown | 20.3 | 19.6 | | |
| Morphology (in %) | Infiltrating duct carcinoma | 69.0 | 70.6 | | |
| | Lobular carcinoma | 12.3 | 11.9 | | |
| | Infiltrating duct and lobular carcinoma | 7.4 | 7.9 | | |
| | Nodular melanoma | | | 12.4 | 13.6 |
| | Lentigo maligna melanoma | | | 5.3 | 5.1 |
| | Superficial spreading melanoma | | | 41.4 | 48.4 |
| | Others and NOS | 10.5 | 9.6 | 41.0 | 32.8 |
| Topography (in %) | Central portion of breast | 5.3 | 5.1 | | |
| | Upper-inner quadrant of breast | 9.3 | 9.6 | | |
| | Lower-inner quadrant of breast | 4.6 | 4.7 | | |
| | Upper-outer quadrant of breast | 35.7 | 36.3 | | |
| | Lower-outer quadrant of breast | 6.3 | 6.3 | | |
| | Axillary tail of breast | 0.2 | 0.1 | | |
| | Overlapping lesion of breast | 8.9 | 8.2 | | |
| | Trunk | | | 32.2 | 35.9 |
| | Extremity | | | 46.7 | 50.1 |
| | Head/Neck | | | 13.5 | 11.3 |
| | NOS | 29.9 | 29.7 | 7.6 | 2.7 |
* For T-, N-, M- and UICC-stage, the mean frequencies of the five simulated data sets are provided. The distributions of the other variables are identical in all five data sets.
Concordance rates of imputed with observed T- and UICC-stages for breast cancer and malignant melanoma
| | Breast cancer: PR | Breast cancer: PMM | Breast cancer: RF | Breast cancer: Prop | Malignant melanoma: PR | Malignant melanoma: PMM | Malignant melanoma: RF | Malignant melanoma: Prop |
|---|---|---|---|---|---|---|---|---|
| T-stage | | | | | | | | |
| Concordance | 48.7 | 48.0 | 31.4 | 39.4 | 47.7 | 47.2 | 40.6 | 42.5 |
| Dislocation by 1 stage | 37.8 | 38.4 | 55.4 | 41.5 | 32.0 | 32.3 | 30.8 | 31.0 |
| Dislocation by 2 stages | 10.1 | 10.3 | 8.2 | 11.4 | 15.4 | 15.5 | 19.9 | 18.1 |
| Dislocation by 3 stages | 3.4 | 3.4 | 5.0 | 7.7 | 4.9 | 5.0 | 8.7 | 8.4 |
| UICC-stage | | | | | | | | |
| Concordance | 79.5 | 79.1 | 58.8 | 74.2 | 80.6 | 80.8 | 77.9 | 79.5 |
| Dislocation by 1 stage | 17.3 | 17.6 | 25.0 | 19.9 | 11.1 | 10.8 | 13.3 | 11.5 |
| Dislocation by 2 stages | 2.9 | 2.9 | 14.2 | 4.8 | 6.2 | 5.7 | 8.1 | 7.4 |
| Dislocation by 3 stages | 0.3 | 0.3 | 1.9 | 1.1 | 2.1 | 2.7 | 0.6 | 1.6 |
PR Polytomous regression
PMM Predictive mean matching
RF Random forests
Prop Proportional sampling
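The concordance and dislocation rows above can be read as the distribution of the absolute distance between a case's imputed stage and its true stage. The following is a minimal Python sketch of that reading, assuming stages are coded as integers 1 to 4 (UICC I to IV mapped to 1 to 4); the function name `dislocation_rates` and the example data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def dislocation_rates(true_stages, imputed_stages, max_shift=3):
    """Percentage of cases whose imputed stage is off by 0, 1, ..., max_shift stages.

    A shift of 0 is concordance; a shift of k is a "dislocation by k stages".
    Stages are assumed to be integer-coded (an assumption of this sketch).
    """
    if len(true_stages) != len(imputed_stages):
        raise ValueError("both sequences must have the same length")
    if not true_stages:
        raise ValueError("at least one case is required")
    shifts = Counter(abs(t - i) for t, i in zip(true_stages, imputed_stages))
    n = len(true_stages)
    return {k: 100.0 * shifts.get(k, 0) / n for k in range(max_shift + 1)}


# Hypothetical toy data: eight cases with known and imputed T-stage.
true_t = [1, 1, 2, 3, 4, 2, 1, 1]
imputed_t = [1, 2, 2, 1, 4, 3, 1, 4]
rates = dislocation_rates(true_t, imputed_t)
print(rates)  # {0: 50.0, 1: 25.0, 2: 12.5, 3: 12.5}
```

With several imputed data sets, one plausible approach is to compute these rates per data set and average them; whether the table was produced this way is not stated here.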
Observed T- and UICC-stage-specific numbers of cases, and the numbers predicted by different multiple imputation methods, for breast cancer and malignant melanoma
| Stage | Measure | Breast cancer: observed | Breast cancer: PR | Breast cancer: PMM | Breast cancer: RF | Breast cancer: Prop | Malignant melanoma: observed | Malignant melanoma: PR | Malignant melanoma: PMM | Malignant melanoma: RF | Malignant melanoma: Prop |
|---|---|---|---|---|---|---|---|---|---|---|---|
| T-stage | | | | | | | | | | | |
| 1 | N | 8,909.0 | 8,903.2 | 8,903.5 | 8,788.2 | 8,950.5 | 1,017.0 | 1,009.8 | 1,014.5 | 962.3 | 1,013.7 |
| 2 | N | 6,235.0 | 6,236.7 | 6,239.5 | 6,394.9 | 6,228.6 | 338.0 | 341.8 | 340.2 | 332.7 | 341.2 |
| 3 | N | 944.0 | 944.5 | 946.7 | 944.9 | 936.4 | 214.0 | 212.1 | 209 | 242.1 | 209.0 |
| 4 | N | 1,074.0 | 1,077.6 | 1,072.3 | 1,034.1 | 1,046.5 | 116.0 | 121.3 | 121.3 | 147.9 | 121.1 |
| MAD | | | 59.8 | 57.8 | 341.9 | 104.1 | | 48.4 | 55.4 | 409.2 | 49.2 |
| UICC-stage | | | | | | | | | | | |
| I | N | 6,859.0 | 6,865.7 | 6,856.7 | 6,096.1 | 6,642 | 1,321.0 | 1,269.8 | 1,276.8 | 1,213.7 | 1,267.8 |
| II | N | 7,123.0 | 7,119.5 | 7,133.9 | 6,891.8 | 7,295.9 | 216.0 | 204.4 | 208.4 | 271.1 | 238.6 |
| III | N | 2,371.0 | 2,361.7 | 2,351.9 | 3,206.1 | 2,462.6 | 122.0 | 149.9 | 124.5 | 173.4 | 143.6 |
| IV | N | 809.0 | 815.1 | 819.5 | 968.0 | 761.5 | 26.0 | 60.9 | 75.2 | 26.9 | 35.0 |
| MAD | | | 68.5 | 72.7 | 2182.4 | 528.8 | | 133.9 | 126.9 | 344.4 | 107.5 |
N Number of cases
SD Standard deviation
PR Polytomous regression
PMM Predictive mean matching
RF Random forests
Prop Proportional sampling
MAD Mean absolute difference between the predicted number of cases in all 50 imputations and the observed number of cases.
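To illustrate why a MAD value can be much larger than the gap between the mean predicted count and the observed count, here is a small Python sketch for a single stage. It assumes the MAD averages the absolute difference over the individual imputations; how the differences are combined across stages is not specified in this record, so the sketch leaves that out. The function name and the counts are hypothetical.

```python
def mean_absolute_difference(predicted_counts, observed_count):
    """Average of |predicted - observed| over all imputed data sets for one stage."""
    if not predicted_counts:
        raise ValueError("at least one imputation is required")
    total = sum(abs(p - observed_count) for p in predicted_counts)
    return total / len(predicted_counts)


# Hypothetical counts for one stage from four imputed data sets
# (the study used 50 imputations).
predicted = [8950, 8870, 8912, 8899]
observed = 8909

mad = mean_absolute_difference(predicted, observed)
mean_gap = abs(sum(predicted) / len(predicted) - observed)

print(mad)       # 23.25
print(mean_gap)  # 1.25
```

Over- and underestimates cancel in the mean of the predictions but not in the MAD, so the MAD also reflects the variability between imputations.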
Figure 1. T-stage-specific survival curves for female breast cancer. The predicted survival curves are based on the 50 completed data sets.
Figure 2. T-stage-specific survival curves for malignant melanoma. The predicted survival curves are based on the 50 completed data sets.
Figure 3. UICC-stage-specific survival curves for female breast cancer. The predicted survival curves are based on the 50 completed data sets.
Figure 4. UICC-stage-specific survival curves for malignant melanoma. The predicted survival curves are based on the 50 completed data sets.