Teresa A'mar, Jessica Chubak, Ruth Etzioni, J David Beatty, Catherine Fedorenko, Daniel Markowitz, Thomas Corey, Jane Lange, Stephen M Schwartz, Bin Huang.
Abstract
BACKGROUND: There is a need for automated approaches to incorporate information on cancer recurrence events into population-based cancer registries.
Keywords: breast cancer; cancer recurrence event; cancer registries; data mining; medical claims; medical informatics; statistical learning
Year: 2020 PMID: 32804084 PMCID: PMC7459434 DOI: 10.2196/18143
Source DB: PubMed Journal: JMIR Cancer ISSN: 2369-1999
Figure 1. Sample plots for a hypothetical case showing a typical pattern of recorded claims each month before and after a second breast cancer event (SBCE).
Top 20 features identified by the gradient boosting algorithm.
| Order | Description |
| 1 | Fraction of prior months with diagnosis code for secondary malignant neoplasm of other specified sites |
| 2 | Fraction of prior months with procedure codes for biopsy or excision of lymph nodes |
| 3 | Months since last procedure code for needle biopsy |
| 4 | Fraction of prior months with diagnosis codes for secondary malignant neoplasm of respiratory and digestive systems |
| 5 | Months since last procedure code for bone scan |
| 6 | Months since last procedure code for other tumor markers |
| 7 | Months since last diagnosis code for carcinoma in situ of breast and genitourinary system |
| 8 | Fraction of prior months with diagnosis code for cancer of breast |
| 9 | Time until next diagnosis code for secondary malignant neoplasm of respiratory and digestive systems |
| 10 | Fraction of prior months with procedure code for fine needle aspirate |
| 11 | Number of instances of diagnosis code for cancer of breast in the current month |
| 12 | Months since procedure code for biopsy or excision of lymph nodes |
| 13 | Fraction of prior months with procedure code for chemotherapy |
| 14 | Months since diagnosis |
| 15 | Months since last procedure code for chest computed tomography |
| 16 | Fraction of prior months with procedure code for bone scan |
| 17 | Age in current month |
| 18 | Fraction of prior months with diagnosis code for benign mammary dysplasias |
| 19 | Time until next diagnosis code for secondary malignant neoplasm of other specified sites |
| 20 | Time until next diagnosis code for cancer of other and unspecified sites |
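Most of the features in the table above fall into three families computed per month from a claims history: the fraction of prior months with a code, the months since the last occurrence of a code, and the time until the next occurrence. The following is a minimal sketch of how such features could be derived from a monthly indicator sequence. The function names, the 0-indexed month convention, and the choices about whether the current month counts are assumptions made for illustration; the source does not specify them.

```python
from typing import List, Optional


def fraction_of_prior_months(flags: List[bool], month: int) -> float:
    """Share of months strictly before `month` in which the code appeared.

    Assumption: with no prior months the feature is set to 0.0.
    """
    if month == 0:
        return 0.0
    return sum(flags[:month]) / month


def months_since_last(flags: List[bool], month: int) -> Optional[int]:
    """Months elapsed since the most recent occurrence at or before `month`.

    Returns None when the code has not been seen yet (assumed convention).
    """
    for m in range(month, -1, -1):
        if flags[m]:
            return month - m
    return None


def months_until_next(flags: List[bool], month: int) -> Optional[int]:
    """Months until the next occurrence strictly after `month`, or None."""
    for m in range(month + 1, len(flags)):
        if flags[m]:
            return m - month
    return None


# Hypothetical claims history: True means the code was recorded that month.
bone_scan = [False, True, False, False, True, False]
print(fraction_of_prior_months(bone_scan, 3))  # one of three prior months
print(months_since_last(bone_scan, 3))         # last seen in month 1
print(months_until_next(bone_scan, 3))         # next seen in month 4
```

Note that the "time until next" features look forward in time, so they can only be computed retrospectively, once later claims are available.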
Figure 2. Month-level receiver operating characteristic (ROC) curve based on the test data set corresponding to the prediction model derived using the training data set. The area under the curve (AUC) is 0.986.
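For readers unfamiliar with the AUC, it equals the probability that a randomly chosen positive month receives a higher predicted probability than a randomly chosen negative month, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The labels and scores are hypothetical, and the source does not state which software was used to obtain the reported value.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the pairwise (Mann-Whitney) definition; O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive month ranked above negative month
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical month-level labels (1 = event month) and predicted probabilities.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ordered correctly
```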
Person-level performance (sensitivity, specificity, and positive and negative predictive values) corresponding to various probability thresholds for classifying an individual as having a second breast cancer event [a].
| Threshold | Sensitivity | Specificity | Positive predictive value | Negative predictive value |
| 0.10 | 0.962 | 0.942 | 0.710 | 0.994 |
| 0.15 | 0.937 | 0.950 | 0.733 | 0.990 |
| 0.20 | 0.924 | 0.955 | 0.753 | 0.988 |
| 0.25 | 0.911 | 0.957 | 0.758 | 0.987 |
| 0.30 | 0.886 | 0.959 | 0.761 | 0.983 |
| 0.35 | 0.886 | 0.963 | 0.778 | 0.983 |
| 0.40 | 0.886 | 0.972 | 0.824 | 0.983 |
| 0.45 | 0.886 | 0.976 | 0.843 | 0.983 |
| 0.50 | 0.886 | 0.978 | 0.854 | 0.983 |
| 0.55 | 0.886 | 0.980 | 0.864 | 0.983 |
| 0.60 | 0.886 | 0.980 | 0.864 | 0.983 |
| 0.65 | 0.886 | 0.981 | 0.875 | 0.983 |
| 0.70 | 0.873 | 0.983 | 0.885 | 0.981 |
| 0.75 | 0.861 | 0.987 | 0.907 | 0.980 |
[a] For each threshold, an individual is predicted to have a second breast cancer event if at least one of the monthly predicted probabilities exceeds the threshold. There were 538 cases without and 79 cases with a second breast cancer event in the test set.
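The person-level rule in the footnote and the four metrics in the table can be sketched as follows. The function names are illustrative. For the worked check at threshold 0.50, the 70 true positives come from the timing table and the 9 false negatives follow from the 79 cases with an event; the counts of 12 false positives and 526 true negatives are not reported in the source and are inferred here from the rounded specificity, so treat them as an assumption.

```python
from typing import Dict, Sequence


def person_has_event(monthly_probs: Sequence[float], threshold: float) -> bool:
    """Flag a person if any monthly predicted probability exceeds the threshold."""
    return any(p > threshold for p in monthly_probs)


def person_level_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics at the person level."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Threshold 0.50: tp and fn from the source; fp and tn inferred (assumption).
m = person_level_metrics(tp=70, fn=9, fp=12, tn=526)
print({k: round(v, 3) for k, v in m.items()})
# -> {'sensitivity': 0.886, 'specificity': 0.978, 'ppv': 0.854, 'npv': 0.983}
```

Because a single month above the threshold is enough to flag a person, lowering the threshold can only add flagged individuals, which is why sensitivity never decreases as the threshold falls.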
Accuracy of the predicted timing of a second breast cancer event at each of a set of threshold probabilities [a].
| Threshold | Predicted number of second breast cancer events | Mean difference in months | Median difference in months | Minimum difference in months | Maximum difference in months |
| 0.10 | 76 | –1.5 | 0 | –36 | 19 |
| 0.15 | 74 | –0.8 | 0 | –27 | 19 |
| 0.20 | 73 | –0.3 | 0 | –24 | 19 |
| 0.25 | 72 | –0.3 | 0 | –24 | 19 |
| 0.30 | 70 | –0.5 | 0 | –24 | 5 |
| 0.35 | 70 | –0.5 | 0 | –24 | 5 |
| 0.40 | 70 | –0.3 | 0 | –24 | 5 |
| 0.45 | 70 | –0.2 | 0 | –24 | 5 |
| 0.50 | 70 | –0.04 | 0 | –24 | 5 |
| 0.55 | 70 | 0.01 | 0 | –24 | 5 |
| 0.60 | 70 | 0.1 | 0 | –24 | 5 |
| 0.65 | 70 | 0.1 | 0 | –24 | 5 |
| 0.70 | 69 | 0.3 | 0 | –24 | 9 |
| 0.75 | 68 | 0.4 | 0 | –24 | 9 |
[a] The table shows the mean, median, maximum, and minimum of the difference between the observed and predicted time of a second breast cancer event given the threshold for each of the individuals correctly predicted to have a second breast cancer event. A negative value indicates that the predicted time of a second breast cancer event precedes the observed time. For each threshold, an individual is determined to have had a second breast cancer event if at least one of the monthly predicted probabilities exceeds the threshold. There are 79 individuals with a second breast cancer event in the test data.
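The sign convention in the footnote corresponds to computing predicted time minus observed time. The sketch below illustrates this with hypothetical data. It assumes the predicted month is the first month whose probability exceeds the threshold; the source text here does not state how the predicted time is chosen, so that rule and all names are assumptions.

```python
from statistics import mean, median
from typing import Dict, List, Optional, Sequence


def predicted_month(monthly_probs: Sequence[float], threshold: float) -> Optional[int]:
    """First 0-indexed month above the threshold (assumed rule), else None."""
    for month, p in enumerate(monthly_probs):
        if p > threshold:
            return month
    return None


def timing_summary(diffs: List[int]) -> Dict[str, float]:
    """Summaries of (predicted - observed) month differences."""
    return {
        "mean": mean(diffs),
        "median": median(diffs),
        "min": min(diffs),
        "max": max(diffs),
    }


# Hypothetical individuals: (monthly probabilities, observed event month).
cases = [
    ([0.1, 0.7, 0.9, 0.9], 3),   # predicted month 1, two months early
    ([0.0, 0.1, 0.8], 2),        # predicted month 2, on time
    ([0.2, 0.3, 0.4, 0.6], 2),   # predicted month 3, one month late
]
diffs = []
for probs, observed in cases:
    pred = predicted_month(probs, threshold=0.5)
    if pred is not None:          # only correctly flagged cases contribute
        diffs.append(pred - observed)

print(diffs)
print(timing_summary(diffs))
```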
Figure 3. Accuracy of predicted timing of recurrence expressed via a comparison of Kaplan-Meier curves for observed (red) versus predicted (blue) time to SBCE among test set cases with an SBCE, where the predicted time to SBCE is based on a threshold probability of 0.5. Cases for whom no SBCE is predicted (monthly predicted probabilities never exceed 0.5) are censored at their last follow-up time. SBCE: second breast cancer event.
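A Kaplan-Meier curve of the kind described in the caption can be sketched with the product-limit estimator below. The data are hypothetical and the function name is illustrative. For a predicted-time curve, a case whose monthly probabilities never exceed the threshold would enter with `event = 0` at its last follow-up month, matching the censoring described in the caption.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[int], events: Sequence[int]) -> List[Tuple[int, float]]:
    """Product-limit estimate; returns (time, survival) at each event time.

    events[i] is 1 if the event occurred at times[i], 0 if censored there.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all individuals sharing the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored cases both leave the risk set
    return curve


# Hypothetical months to SBCE; the second case at month 3 is censored.
for t, s in kaplan_meier([2, 3, 3, 5], [1, 1, 0, 1]):
    print(t, round(s, 3))
```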