Wei-Miao Wu, Kai Gu, Yi-Hui Yang, Ping-Ping Bao, Yang-Ming Gong, Yan Shi, Wang-Hong Xu, Chen Fu.
Abstract
BACKGROUND: An optimal risk-scoring system enables more targeted offers for colonoscopy in colorectal cancer (CRC) screening. This analysis aims to develop and validate scoring systems using parametric and non-parametric methods for average-risk populations.
Keywords: colorectal cancer; data mining; risk model; risk score; screening
Year: 2022 PMID: 35274820 PMCID: PMC9089226 DOI: 10.1002/cam4.4576
Source DB: PubMed Journal: Cancer Med ISSN: 2045-7634 Impact factor: 4.711
Characteristics and screening results of participants in the Shanghai colorectal cancer screening program
| Variables | CRC cases | Non‐cases | p value | FDR‐adjusted p value |
|---|---|---|---|---|
| Demographic characteristics | ||||
| Age at screening (years), mean (SD) | 64.2 (5.7) | 61.9 (6.0) | <0.001 | <0.001 |
| Sex, n (%) | | | <0.001 | <0.001 |
| Men | 1487 (53.0) | 314785 (39.1) | ||
| Women | 1319 (47.0) | 490104 (60.9) | ||
| Educational level, n (%) | | | 0.001 | 0.001 |
| No formal education | 224 (8.0) | 63332 (7.9) | ||
| Primary school | 760 (27.1) | 230400 (28.6) | ||
| Middle school | 1498 (53.4) | 436897 (54.3) | ||
| High school | 319 (11.4) | 72666 (9.0) | ||
| College or above | 5 (0.2) | 1594 (0.2) | ||
| Marital status, n (%) | | | 0.050 | 0.053 |
| Married | 2522 (89.9) | 731574 (90.9) | ||
| Unmarried | 88 (3.1) | 21419 (2.7) | ||
| Divorced | 28 (1.0) | 8675 (1.1) | ||
| Widowed | 145 (5.2) | 34573 (4.3) | ||
| Unknown | 23 (0.8) | 8648 (1.1) | ||
| Occupation, n (%) | | | 0.007 | 0.009 |
| Office workers | 217 (7.7) | 53517 (6.7) | ||
| Enterprise workers | 1190 (42.4) | 324926 (40.4) | ||
| Farmers | 768 (27.4) | 243836 (30.3) | ||
| Self‐employed | 64 (2.3) | 19439 (2.4) | ||
| Unemployed | 119 (4.2) | 33301 (4.1) | ||
| Others | 448 (16.0) | 129870 (16.1) | ||
| Resident areas, n (%) | | | 0.175 | 0.175 |
| Downtown | 1026 (36.6) | 284435 (35.3) | ||
| Suburb | 1780 (63.4) | 520454 (64.7) | ||
| Factors for risk stratification, n (%) | ||||
| Chronic diarrhoea | 252 (9.0) | 43204 (5.4) | <0.001 | <0.001 |
| Chronic constipation | 228 (8.1) | 54980 (6.8) | 0.007 | 0.009 |
| Mucus or bloody stool | 167 (6.0) | 17186 (2.1) | <0.001 | <0.001 |
| Chronic appendicitis/appendectomy | 310 (11.1) | 79242 (9.9) | 0.033 | 0.037 |
| Chronic cholecystitis/cholecystectomy | 298 (10.6) | 74440 (9.3) | 0.012 | 0.015 |
| Serious unhappy life events | 86 (3.1) | 17991 (2.2) | 0.003 | 0.005 |
| Colorectal polyps | 65 (2.3) | 11468 (1.4) | <0.001 | <0.001 |
| Diagnosis of any cancer | 97 (3.5) | 16238 (2.0) | <0.001 | <0.001 |
| CRC in first‐degree relatives | 166 (5.9) | 25174 (3.1) | <0.001 | <0.001 |
| Stratified as high risk | 566 (20.2) | 91196 (11.3) | <0.001 | <0.001 |
| Qualitative FIT positive, n (%) | ||||
| One‐sample | 1450 (51.7) | 66980 (8.3) | <0.001 | <0.001 |
| Two‐sample | 1821 (64.9) | 103045 (12.8) | <0.001 | <0.001 |
Abbreviations: CRC, colorectal cancer; FDR, false discovery rate; FIT, faecal immunochemical test.
p values for t‐tests or chi‐square tests.
p values after FDR correction for multiple comparisons.
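The source does not name the FDR procedure. As a minimal sketch, assuming the Benjamini–Hochberg step-up method applied across the 18 rows that carry a p value, the adjusted column can be approximated from the rounded p values above. The function name `benjamini_hochberg` and the `PLACEHOLDER` value used for entries reported only as "<0.001" are illustrative assumptions, not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

PLACEHOLDER = 0.0005  # assumption: stands in for entries reported as "<0.001"

# Rounded p values as printed in the table (exact values are not available).
raw_p = {
    "Age at screening": PLACEHOLDER,
    "Sex": PLACEHOLDER,
    "Educational level": 0.001,
    "Marital status": 0.050,
    "Occupation": 0.007,
    "Resident areas": 0.175,
    "Chronic diarrhoea": PLACEHOLDER,
    "Chronic constipation": 0.007,
    "Mucus or bloody stool": PLACEHOLDER,
    "Chronic appendicitis/appendectomy": 0.033,
    "Chronic cholecystitis/cholecystectomy": 0.012,
    "Serious unhappy life events": 0.003,
    "Colorectal polyps": PLACEHOLDER,
    "Diagnosis of any cancer": PLACEHOLDER,
    "CRC in first-degree relatives": PLACEHOLDER,
    "Stratified as high risk": PLACEHOLDER,
    "One-sample FIT positive": PLACEHOLDER,
    "Two-sample FIT positive": PLACEHOLDER,
}

adjusted = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))
for name in ("Marital status", "Occupation", "Resident areas"):
    print(f"{name}: {adjusted[name]:.3f}")
```

Under these assumptions the sketch gives 0.053, 0.009 and 0.175 for marital status, occupation and resident areas, in line with the table. Because the inputs are rounded, other rows may differ in the last digit.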
Discrimination and calibration of risk predictive models for colorectal cancer in the derivation and validation sets
| Model | Derivation set: AUC (95% CI) | Derivation set: p value | Validation set: AUC (95% CI) | Validation set: p value |
|---|---|---|---|---|
| Incorporating risk factors only | ||||
| LR model | 0.648 (0.634–0.661) | 0.991 | 0.645 (0.630–0.661) | 0.186 |
| ANN model | 0.651 (0.638–0.664) | <0.001 | 0.647 (0.632–0.663) | <0.001 |
| Incorporating risk factors and one‐sample FIT results | ||||
| LR model | 0.777 (0.764–0.790) | 0.800 | 0.786 (0.771–0.801) | 0.374 |
| ANN model | 0.779 (0.766–0.791) | <0.001 | 0.787 (0.772–0.802) | <0.001 |
| Incorporating risk factors and two‐sample FIT results | ||||
| LR model | 0.809 (0.798–0.821) | 0.503 | 0.811 (0.797–0.825) | 0.891 |
| ANN model | 0.811 (0.800–0.823) | <0.001 | 0.813 (0.799–0.826) | <0.001 |
Abbreviations: ANN, artificial neural network; AUC, area under the receiver operating characteristic curve; CI, confidence interval; FIT, faecal immunochemical test; LR, logistic regression.
p values for calibration based on the Hosmer–Lemeshow goodness‐of‐fit tests.
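For readers unfamiliar with the test, the following is a rough sketch of how a Hosmer–Lemeshow statistic is commonly computed. The source gives no implementation details, so the ten equal-sized groups, the degrees of freedom (groups minus 2), the function names and the toy data are all assumptions made for illustration. A small p value indicates that predicted and observed event counts disagree, that is, poor calibration.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, p value) comparing observed and expected events."""
    # Sort by predicted probability only; the sort is stable for ties.
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat, chi2_sf_even_df(stat, groups - 2)

# Toy data (hypothetical): 100 subjects, half of whom have the event.
outcomes = [i % 2 for i in range(100)]
stat_good, p_good = hosmer_lemeshow([0.5] * 100, outcomes)  # predictions match
stat_bad, p_bad = hosmer_lemeshow([0.2] * 100, outcomes)    # risk underestimated
print(stat_good, p_good)
print(stat_bad, p_bad)
```

In the toy data the matching predictions give a statistic of 0 and a p value of 1, while the underestimated risks give a large statistic and a very small p value.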
Scoring algorithm to calculate the point values in the derivation set
| Variable | Reference value (Wij) | Risk factors only: β (95% CI) | Risk factors only: LR score | Risk factors only: ANN score | One‐sample FIT: β (95% CI) | One‐sample FIT: LR score | One‐sample FIT: ANN score | Two‐sample FIT: β (95% CI) | Two‐sample FIT: LR score | Two‐sample FIT: ANN score |
|---|---|---|---|---|---|---|---|---|---|---|
| Age at screening | | 0.06 (0.05–0.07) | | | 0.06 (0.05–0.07) | | | 0.06 (0.05–0.06) | | |
| Age group | ||||||||||
| 50–54 | 52 (Wref) | — | 0 | 0 | — | 0 | 0 | — | 0 | 0 |
| 55–59 | 57 | — | 1.0 | 0.75 | — | 1.0 | 0.50 | — | 1.0 | 0.375 |
| 60–64 | 62 | — | 2.0 | 1.50 | — | 2.0 | 1.00 | — | 2.0 | 0.750 |
| 65–69 | 67 | — | 3.0 | 2.25 | — | 3.0 | 1.50 | — | 3.0 | 1.125 |
| 70–74 | 72 | — | 4.0 | 3.0 | — | 4.0 | 2.0 | — | 4.0 | 1.5 |
| Sex | ||||||||||
| Women | 0 (Wref) | — | 0 | 0 | — | 0 | 0 | — | 0 | 0 |
| Men | 1 | 0.46 (0.36–0.55) | 2.0 | 2.0 | 0.39 (0.29–0.49) | 1.0 | 2.0 | 0.39 (0.29–0.48) | 1.0 | 1.5 |
| Chronic diarrhoea | ||||||||||
| Never | 0 (Wref) | — | 0 | 0 | — | 0 | 0 | — | 0 | 0 |
| Ever | 1 | 0.35 (0.17–0.55) | 1.0 | 1.0 | 0.26 (0.09–0.44) | 1.0 | 2.0 | 0.24 (0.06–0.42) | 1.0 | 2.0 |
| Mucus or bloody stool | ||||||||||
| Never | 0 (Wref) | — | 0 | 0 | — | 0 | 0 | — | 0 | 0 |
| Ever | 1 | 0.91 (0.71–1.12) | 3.0 | 2.0 | 0.74 (0.53–0.94) | 2.0 | 2.5 | 0.70 (0.49–0.91) | 2.0 | 2.0 |
| Diagnosis of any cancer | ||||||||||
| Never | 0 (Wref) | — | 0 | 0 | — | 0 | 0 | — | 0 | 0 |
| Ever | 1 | 0.51 (0.26–0.76) | 2.0 | 2.0 | 0.49 (0.24–0.74) | 2.0 | 2.0 | 0.48 (0.22–0.73) | 2.0 | 2.5 |
| CRC in first degree relatives | ||||||||||
| No | 0 (Wref) | — | 0 | 0 | — | 0 | 0 | — | 0 | 0 |
| Yes | 1 | 0.57 (0.36–0.78) | 2.0 | 4.0 | 0.49 (0.28–0.70) | 2.0 | 3.5 | 0.47 (0.25–0.68) | 2.0 | 2.5 |
| Qualitative FIT | ||||||||||
| Negative | — | — | — | — | — | 0 | 0 | — | 0 | 0 |
| Positive | — | — | — | — | 2.39 (2.30–2.49) | 8.0 | 6.0 | 2.48 (2.37–2.58) | 8.0 | 8.0 |
| Overall score | — | — | 0–14 | 0–14 | — | 0–20 | 0–20 | — | 0–20 | 0–20 |
Abbreviations: ANN, artificial neural network; CI, confidence interval; CRC, colorectal cancer; FIT, faecal immunochemical test; LR, logistic regression.
β (95% CI) derived from multivariable LR model.
LR‐based risk score = β × (Wij − Wref)/B, where the constant B is the number of regression units equivalent to 1 point in the final risk score. B was calculated by multiplying the β for age (0.06) by 5 (0.06 × 5 = 0.30), so that each 5‐year increase in age corresponds to 1 point. Using this age‐standardized method, the point values of the other variables were obtained by dividing their regression coefficients by 0.30 and rounding to the nearest whole number. For example, in the LR‐based scoring system with risk factors only: 0.46/0.30 ≈ 1.5, rounded to 2 for sex; 0.35/0.30 ≈ 1.2, rounded to 1 for chronic diarrhoea; 0.91/0.30 ≈ 3.0, giving 3 for mucus or bloody stool; 0.51/0.30 = 1.7, rounded to 2 for prior diagnosis of any cancer; and 0.57/0.30 = 1.9, rounded to 2 for CRC in first degree relatives.
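The point assignment described in this footnote can be sketched in a few lines. The coefficients are the risk-factors-only β values from the table. The function and variable names are hypothetical, and the example subject is invented purely to show how a total is formed.

```python
BETA_AGE = 0.06   # β per year of age (risk-factors-only model)
B = BETA_AGE * 5  # regression units equivalent to 1 point (0.30)
AGE_REF = 52      # reference value Wref: midpoint of the 50-54 group

# β for each binary risk factor (Wij = 1 when present, Wref = 0).
BETAS = {
    "men": 0.46,
    "chronic_diarrhoea": 0.35,
    "mucus_or_bloody_stool": 0.91,
    "any_cancer": 0.51,
    "crc_first_degree_relative": 0.57,
}

def lr_points(beta, w, w_ref=0):
    """Points = β × (Wij − Wref) / B, rounded to the nearest whole number."""
    return round(beta * (w - w_ref) / B)

def age_points(age_midpoint):
    return lr_points(BETA_AGE, age_midpoint, AGE_REF)

POINTS = {name: lr_points(beta, 1) for name, beta in BETAS.items()}

def total_score(age_midpoint, present):
    """Sum age points and the points of every risk factor that is present."""
    return age_points(age_midpoint) + sum(POINTS[f] for f in present)

max_score = age_points(72) + sum(POINTS.values())
# Hypothetical subject: a man aged 65-69 (midpoint 67) with chronic diarrhoea.
example = total_score(67, ["men", "chronic_diarrhoea"])
print(POINTS, max_score, example)
```

The maximum of 14 matches the overall score range in the table. The hypothetical subject scores 6, which would meet the "LR‐based scoring ≥6" cut-off listed in the screening-performance table.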
ANN score (risk factors only): computed by multiplying the total score of the LR‐based scoring system (14 points) by the contribution of each predictor to the outcome in the ANN model: age (20.5%), sex (17.0%), chronic diarrhoea (10.4%), mucus or bloody stool (11.7%), history of any cancer (13.3%) and CRC in first‐degree relatives (27.1%).
ANN score (incorporating one‐sample FIT results): computed by multiplying the total score of the LR‐based scoring system (20 points) by the contribution of each predictor to the outcome in the ANN model: age (9.9%), sex (10.7%), chronic diarrhoea (10.5%), mucus or bloody stool (12.3%), history of any cancer (10.2%), CRC in first‐degree relatives (18.4%) and one‐sample FIT (28.0%).
ANN score (incorporating two‐sample FIT results): computed by multiplying the total score of the LR‐based scoring system (20 points) by the contribution of each predictor to the outcome in the ANN model: age (7.6%), sex (7.3%), chronic diarrhoea (10.7%), mucus or bloody stool (9.7%), history of any cancer (11.8%), CRC in first‐degree relatives (13.5%) and two‐sample FIT (39.4%).
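A short sketch of the ANN-based allocation follows, using the totals and percentage contributions quoted in the footnotes. The source does not state how the raw products were rounded, so the code reports unrounded values. The dictionary and function names are assumptions for illustration.

```python
LR_TOTAL = {"risk_factors_only": 14, "one_sample_fit": 20, "two_sample_fit": 20}

# Percentage contribution of each predictor in the ANN model (from the footnotes).
CONTRIB = {
    "risk_factors_only": {
        "age": 20.5, "sex": 17.0, "chronic_diarrhoea": 10.4,
        "mucus_or_bloody_stool": 11.7, "any_cancer": 13.3,
        "crc_first_degree_relative": 27.1,
    },
    "one_sample_fit": {
        "age": 9.9, "sex": 10.7, "chronic_diarrhoea": 10.5,
        "mucus_or_bloody_stool": 12.3, "any_cancer": 10.2,
        "crc_first_degree_relative": 18.4, "fit": 28.0,
    },
    "two_sample_fit": {
        "age": 7.6, "sex": 7.3, "chronic_diarrhoea": 10.7,
        "mucus_or_bloody_stool": 9.7, "any_cancer": 11.8,
        "crc_first_degree_relative": 13.5, "fit": 39.4,
    },
}

def raw_ann_points(system):
    """Total LR score multiplied by each predictor's share (unrounded)."""
    total = LR_TOTAL[system]
    return {name: total * pct / 100 for name, pct in CONTRIB[system].items()}

for system in LR_TOTAL:
    points = raw_ann_points(system)
    print(system, {k: round(v, 2) for k, v in points.items()})

# For the risk-factors-only system, whole-number rounding matches the table.
rounded_rf = {k: round(v) for k, v in raw_ann_points("risk_factors_only").items()}
```

For the risk-factors-only system, rounding the products to whole numbers reproduces the tabulated maxima (3, 2, 1, 2, 2 and 4). The systems that include FIT use half-point values in the table, and the rounding rule behind them is not given in the source.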
Discrimination and calibration of risk scoring systems for colorectal cancer in the derivation and validation sets
| Scoring system | Derivation set: AUC (95% CI) | Derivation set: p value | Validation set: AUC (95% CI) | Validation set: p value |
|---|---|---|---|---|
| Incorporating risk factors only | ||||
| LR‐based scoring system | 0.642 (0.629–0.655) | 0.967 | 0.641 (0.626–0.657) | 0.938 |
| ANN‐based scoring system | 0.639 (0.626–0.652) | 0.053 | 0.640 (0.624–0.655) | 0.109 |
| Incorporating risk factors and one‐sample FIT results | ||||
| LR‐based scoring system | 0.774 (0.761–0.787) | 0.998 | 0.786 (0.771–0.800) | 0.923 |
| ANN‐based scoring system | 0.763 (0.751–0.776) | <0.001 | 0.781 (0.766–0.796) | 0.002 |
| Incorporating risk factors and two‐sample FIT results | ||||
| LR‐based scoring system | 0.808 (0.796–0.819) | 0.871 | 0.811 (0.798–0.825) | 0.860 |
| ANN‐based scoring system | 0.805 (0.793–0.817) | 0.001 | 0.809 (0.795–0.823) | 0.124 |
Abbreviations: ANN, artificial neural network; AUC, area under the receiver operating characteristic curve; CI, confidence interval; FIT, faecal immunochemical test; LR, logistic regression.
p values for calibration based on the Hosmer–Lemeshow goodness‐of‐fit tests.
Performance of selected initial screening methods using risk factors and FIT results in the derivation and validation sets
| Initial screening methods | Derivation set: No. of high‐risk subjects | Derivation set: No. of CRC covered (%) | Derivation set: Sensitivity (95% CI), % | Derivation set: Specificity (95% CI), % | Validation set: No. of high‐risk subjects | Validation set: No. of CRC covered (%) | Validation set: Sensitivity (95% CI), % | Validation set: Specificity (95% CI), % |
|---|---|---|---|---|---|---|---|---|
| Incorporating risk factors only | ||||||||
| Pre‐defined risk stratification | 54862 | 335 (0.61) | 19.9 (18.1–21.9) | 88.7 (88.6–88.8) | 36835 | 231 (0.63) | 20.6 (18.3–23.1) | 88.6 (88.5–88.7) |
| LR‐based scoring ≥6 | 42836 | 341 (0.80) | 20.3 (18.4–22.3) | 91.2 (91.1–91.3) | 28644 | 221 (0.77) | 19.7 (17.5–22.2) | 91.2 (91.1–91.3) |
| ANN‐based scoring ≥5 | 53782 | 401 (0.75) | 23.8 (21.9–25.9) | 88.9 (88.9–89.0) | 36033 | 259 (0.72) | 23.1 (20.7–25.6) | 88.9 (88.8–89.0) |
| Qualitative FIT only | ||||||||
| One‐sample | 40946 | 866 (2.11) | 51.5 (49.1–53.9) | 91.7 (91.6–91.8) | 27421 | 581 (2.12) | 51.8 (48.9–54.7) | 91.7 (91.6–91.8) |
| Two‐sample | 62830 | 1094 (1.74) | 65.0 (62.7–67.3) | 87.2 (87.1–87.3) | 41949 | 724 (1.73) | 64.6 (61.7–67.3) | 87.2 (87.1–87.3) |
| Incorporating risk factors and one‐sample FIT | ||||||||
| LR‐based scoring ≥6 | 50339 | 915 (1.82) | 54.4 (52.0–56.8) | 89.8 (89.7–89.8) | 33496 | 611 (1.82) | 54.5 (51.6–57.4) | 89.8 (89.7–89.9) |
| ANN‐based scoring ≥6 | 56287 | 936 (1.66) | 55.7 (53.3–58.0) | 88.5 (88.4–88.6) | 37701 | 630 (1.67) | 56.2 (53.2–59.0) | 88.5 (88.4–88.6) |
| Incorporating risk factors and two‐sample FIT | ||||||||
| LR‐based scoring ≥7 | 65753 | 1110 (1.69) | 66.0 (63.7–68.2) | 86.6 (86.5–86.7) | 43848 | 734 (1.67) | 65.5 (62.7–68.2) | 86.6 (86.5–86.7) |
| ANN‐based scoring ≥7 | 64344 | 1102 (1.71) | 65.5 (63.2–67.8) | 86.9 (86.8–87.0) | 42950 | 730 (1.70) | 65.1 (62.2–67.8) | 86.9 (86.8–87.0) |
| Parallel use of scoring system with one‐sample FIT | ||||||||
| Pre‐defined risk stratification | 89212 | 1018 (1.14) | 60.5 (58.2–62.8) | 81.7 (81.6–81.8) | 59828 | 683 (1.14) | 60.9 (58.0–63.7) | 81.6 (81.5–81.8) |
| LR‐based scoring ≥6 | 79136 | 1001 (1.26) | 59.5 (57.2–61.8) | 83.8 (83.7–83.9) | 52912 | 685 (1.29) | 61.1 (58.2–63.9) | 83.8 (83.6–83.9) |
| ANN‐based scoring ≥5 | 89058 | 1034 (1.16) | 61.5 (59.1–63.8) | 81.8 (81.7–81.9) | 59600 | 708 (1.19) | 63.1 (60.2–65.9) | 81.7 (81.6–81.8) |
| Parallel use of scoring system with two‐sample FIT | ||||||||
| Pre‐defined risk stratification | 107718 | 1208 (1.12) | 71.8 (69.6–73.9) | 77.9 (77.8–78.1) | 72060 | 805 (1.12) | 71.8 (69.1–74.4) | 77.9 (77.7–78.0) |
| LR‐based scoring ≥6 | 98651 | 1186 (1.20) | 70.5 (68.3–72.6) | 79.8 (79.7–79.9) | 65834 | 801 (1.22) | 71.5 (68.7–74.0) | 79.8 (79.7–79.9) |
| ANN‐based scoring ≥5 | 107995 | 1213 (1.12) | 72.1 (69.9–74.2) | 77.9 (77.8–78.0) | 72138 | 817 (1.13) | 72.8 (70.1–75.3) | 77.8 (77.7–78.0) |
Abbreviations: ANN, artificial neural network; CI, confidence interval; CRC, colorectal cancer; FIT, faecal immunochemical test; LR, logistic regression; No., number.
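The percentage shown beside "No. of CRC covered" appears to be the share of high-risk subjects who were CRC cases. The sketch below checks that reading against a few derivation-set rows copied from the table. The helper names are hypothetical, and the "subjects flagged per CRC" figure is a derived illustration rather than a quantity reported in the source.

```python
# (high-risk subjects, CRC covered) in the derivation set, copied from the table.
DERIVATION = {
    "Pre-defined risk stratification": (54862, 335),
    "LR-based scoring >=6 (risk factors only)": (42836, 341),
    "ANN-based scoring >=5 (risk factors only)": (53782, 401),
    "One-sample FIT only": (40946, 866),
    "Two-sample FIT only": (62830, 1094),
}

def yield_pct(high_risk, crc_covered):
    """CRC cases as a percentage of subjects flagged as high risk."""
    return 100.0 * crc_covered / high_risk

def flagged_per_crc(high_risk, crc_covered):
    """Number of high-risk subjects flagged for each CRC case covered."""
    return high_risk / crc_covered

for method, (high_risk, crc) in DERIVATION.items():
    print(f"{method}: {yield_pct(high_risk, crc):.2f}% "
          f"({flagged_per_crc(high_risk, crc):.0f} flagged per CRC)")
```

The computed percentages (0.61, 0.80, 0.75, 2.11 and 1.74) agree with the tabulated values for these rows.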