Jacqueline Birks1, Clare Bankhead2, Tim A Holt2, Alice Fuller2, Julietta Patnick3.
Abstract
Earlier detection of colorectal cancer greatly improves prognosis, largely through surgical excision of neoplastic polyps. These include benign adenomas which can transform over time to malignant adenocarcinomas. This progression may be associated with changes in full blood count indices. An existing risk algorithm derived in Israel stratifies individuals according to colorectal cancer risk using full blood count data, but has not been validated in the UK. We undertook a retrospective analysis using the Clinical Practice Research Datalink. Patients aged over 40 with full blood count data were risk-stratified and followed up for a diagnosis of colorectal cancer over a range of time intervals. The primary outcome was the area under the receiver operating characteristic curve for the 18-24-month interval. We also undertook a case-control analysis (matching for age, sex, and year of risk score), and a cohort study of patients undergoing full blood count testing during 2012, to estimate predictive values. We included 2,550,119 patients. The area under the curve for the 18-24-month interval was 0.776 [95% confidence interval (CI): 0.771, 0.781]. Performance improves as the time interval reduces. The area under the curve for the age-matched case-control analysis was 0.583 [0.574, 0.591]. For the population risk-scored in 2012, the positive predictive value at 99.5% specificity was 8.8% with negative predictive value 99.6%. The algorithm offers an additional means of identifying risk of colorectal cancer, and could support other approaches to early detection, including screening and active case finding.
Keywords: Blood cell count; colorectal neoplasms; early detection of cancer; electronic health records; machine learning; risk assessment
Year: 2017 PMID: 28941187 PMCID: PMC5633543 DOI: 10.1002/cam4.1183
Source DB: PubMed Journal: Cancer Med ISSN: 2045-7634 Impact factor: 4.452
Description of the population included in the primary analysis
| Sex | No CRC diagnosis: number | No CRC diagnosis: age, mean (SD), range | No CRC diagnosis: score, mean (SD), range | CRC diagnosis: number | CRC diagnosis: age, mean (SD), range | CRC diagnosis: score, mean (SD), range |
|---|---|---|---|---|---|---|
| Female | 1240666 | 60.8 (14.7) 40–108 | 48.3 (28.7) 0–100 | 2410 | 73.4 (11.0) 40–98 | 75.4 (21.1) 0–100 |
| Male | 979442 | 60.2 (13.2) 40–110 | 55.6 (29.0) 0–100 | 2731 | 72.1 (9.9) 40–97 | 82.5 (17.3) 6–100 |
| Total | 2220108 | 60.5 (14.0) 40–110 | 51.5 (29.0) 0–100 | 5141 | 72.7 (10.5) 40–98 | 79.1 (19.5) 0–100 |
Results of logistic analysis for a diagnosis of colorectal cancer with score as the predictor, at five time intervals before diagnosis
| Time before diagnosis (months) | Total number | Cases | Noncases | AUROC (95% CI) | Specificity (%) when sensitivity = 50% (95% CI) | Score cut‐off at sensitivity = 50% | Sensitivity (%) when specificity = 99.5% (95% CI) | Score cut‐off at specificity = 99.5% |
|---|---|---|---|---|---|---|---|---|
| 3–6 | 2484699 | 5935 | 2478764 | 0.844 (0.839, 0.849) | 92.50 (92.46, 92.53) | 96.16 | 14.2 (13.3, 15.1) | 99.82 |
| 6–12 | 2436324 | 6821 | 2429503 | 0.813 (0.809, 0.818) | 86.98 (86.94, 87.02) | 89.63 | 9.3 (8.6, 10.0) | 99.81 |
| 12–24 | 2334380 | 5744 | 2328636 | 0.791 (0.786, 0.796) | 84.98 (84.94, 85.03) | 86.04 | 6.2 (5.6, 6.9) | 99.79 |
| **18–24** | **2225249** | **5141** | **2220108** | **0.776 (0.771, 0.781)** | **82.73 (82.68, 82.78)** | **83.47** | **3.91 (3.40, 4.48)** | **99.78** |
| 24–36 | 2110307 | 7360 | 2102947 | 0.751 (0.746, 0.756) | 79.41 (79.36, 79.47) | 80.22 | 2.5 (2.2, 2.9) | 99.77 |
The primary analysis was for the 18–24‐month interval (shown in bold).
Figure 1. The receiver operating characteristic curve for the 18–24‐month time interval (primary analysis).
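The two operating points reported for each interval (specificity at 50% sensitivity, sensitivity at 99.5% specificity) and the AUROC can all be read off the score distributions of cases and noncases. The following is a minimal sketch of that idea, not the authors' code. The scores are hypothetical toy values, the function names are illustrative, and the rule that a score strictly above the cut-off counts as positive is an assumption.

```python
def auroc(case_scores, noncase_scores):
    """Probability that a random case scores higher than a random noncase.

    Ties count as one half (the Mann-Whitney formulation of the AUROC).
    """
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def sensitivity_at_specificity(case_scores, noncase_scores, target_specificity):
    """Return (cut_off, sensitivity, specificity) for the lowest observed
    cut-off whose specificity reaches the target.

    Assumption: a score strictly above the cut-off is a positive result.
    """
    for cut_off in sorted(set(case_scores) | set(noncase_scores)):
        true_negatives = sum(s <= cut_off for s in noncase_scores)
        specificity = true_negatives / len(noncase_scores)
        if specificity >= target_specificity:
            true_positives = sum(s > cut_off for s in case_scores)
            return cut_off, true_positives / len(case_scores), specificity
    raise ValueError("target specificity cannot be reached")


# Hypothetical toy scores on the 0-100 scale (not study data).
toy_noncases = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95]
toy_cases = [55, 85, 92, 97]

toy_auc = auroc(toy_cases, toy_noncases)
toy_cut_off, toy_sens, toy_spec = sensitivity_at_specificity(
    toy_cases, toy_noncases, target_specificity=0.9
)
print(toy_auc, toy_cut_off, toy_sens, toy_spec)
```

Raising the target specificity pushes the cut-off toward the top of the score range and lowers sensitivity, which is the trade-off visible across the columns of the table above.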
Results of the case–control sensitivity analysis
Descriptive statistics
| Sex | No CRC diagnosis: number | No CRC diagnosis: age, mean (SD), range | No CRC diagnosis: score, mean (SD), range | CRC diagnosis: number | CRC diagnosis: age, mean (SD), range | CRC diagnosis: score, mean (SD), range |
|---|---|---|---|---|---|---|
| Female | 291273 | 73.5 (10.7) 40–98 | 71.5 (20.0) 0–100 | 2410 | 73.4 (11.0) 40–98 | 75.4 (21.1) 0–100 |
| Male | 222827 | 71.6 (10.1) 40–98 | 78.6 (18.4) 2–100 | 2731 | 72.1 (9.9) 40–97 | 82.5 (17.3) 6–100 |
| Total | 514100 | 71.6 (10.1) 40–98 | 78.6 (18.4) 2–100 | 5141 | 72.7 (10.5) 40–98 | 79.1 (19.5) 0–100 |
Description of patients included in the cohort who had a score measured in 2012 and with follow‐up for 24 months
| Patient group | Number | Mean age in 2012 (SD) | Mean score (SD) |
|---|---|---|---|
| Patients lost to follow‐up within 2 years without a diagnosis | 160814 | 63.5 (14.1) | 58.1 (28.0) |
| Patients who had died within 2 years without a diagnosis | 36032 | 79.0 (11.6) | 86.4 (15.9) |
| Patients in analysis set | 600273 | 62.9 (13.5) | 56.8 (27.6) |
| Total | 797119 | 63.8 (13.9) | 58.4 (28.0) |
Results of the 2012 cohort analysis. The threshold for case identification is a risk score cut off of 99.84 (associated with 99.5% specificity)
| Total number | Cases of CRC | Noncases | True positive | False positive | False negative | True negative | AUROC (95% CI) | Sensitivity (%) when specificity = 99.5% (95% CI) | Score cut‐off |
|---|---|---|---|---|---|---|---|---|---|
| 600273 | 2454 | 597819 | 280 | 2893 | 2174 | 594926 | 0.781 (0.772, 0.791) | 11.4 (10.2, 12.7) | 99.84 |
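The predictive values quoted in the abstract follow directly from the four cell counts in the 2012 cohort table above. As a worked check, the short sketch below recomputes sensitivity, specificity, and the positive and negative predictive values from those counts. The function name `diagnostic_metrics` is illustrative and is not taken from the paper.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures, returned as proportions."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # noncases correctly not flagged
        "ppv": tp / (tp + fp),          # flagged patients who are cases
        "npv": tn / (tn + fn),          # unflagged patients who are noncases
    }


# Cell counts from the 2012 cohort table (score cut-off 99.84).
metrics = diagnostic_metrics(tp=280, fp=2893, fn=2174, tn=594926)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
# sensitivity: 11.4%
# specificity: 99.5%
# ppv: 8.8%
# npv: 99.6%
```

The low positive predictive value at very high specificity reflects the low prevalence of colorectal cancer in the cohort (2454 cases among 600273 patients).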