Hyo-Joon Yang¹, Chang Woo Cho², Jongha Jang², Sang Soo Kim², Kwang-Sung Ahn³, Soo-Kyung Park¹, Dong Il Park¹.
Abstract
BACKGROUND/AIMS: We aimed to develop a deep learning model to predict the risk of advanced colorectal neoplasia (ACRN) in asymptomatic adults, so that colorectal cancer screening could be customized to individual risk.
Keywords: Big data; Colorectal neoplasms; Deep learning; Mass screening; Risk assessment
Year: 2020 PMID: 33092313 PMCID: PMC8273821 DOI: 10.3904/kjim.2020.020
Source DB: PubMed Journal: Korean J Intern Med ISSN: 1226-3303 Impact factor: 2.884
Figure 1. Deep learning model development process. (A) Conventional logistic regression process, deep neural network (DNN) model development, and conventional machine learning methods. (B) Cross-validation of DNN models. (C) Flow of the study population. SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting; AUC, area under the receiver operating characteristic curve; CRC, colorectal cancer; IBD, inflammatory bowel disease.
Demographics and clinical characteristics of the study participants
| Characteristic | Development group (n = 56,269) | Validation group (n = 14,067) | p value |
|---|---|---|---|
| Age, yr | 41.6 ± 8.3 | 41.6 ± 8.4 | 0.920 |
| Male sex | 39,063 (69.4) | 9,747 (69.3) | 0.761 |
| Current smoker | 15,930 (28.3) | 4,044 (28.8) | 0.303 |
| Alcohol consumption, times/wk | 2 (1–3) | 2 (1–3) | 0.660 |
| Regular exercise ≥ 4 times/wk | 30,666 (54.5) | 7,661 (54.5) | 0.935 |
| Family history of CRC | 2,202 (3.9) | 566 (4.0) | 0.547 |
| Hypertension | 9,386 (16.7) | 2,307 (16.4) | 0.424 |
| Diabetes | 2,827 (5.0) | 708 (5.0) | 0.965 |
| BMI, kg/m² | 23.8 ± 3.1 | 23.8 ± 3.1 | 0.360 |
| Waist circumference, cm | 83.2 ± 8.7 | 83.1 ± 8.6 | 0.393 |
| Systolic BP, mmHg | 113.3 ± 13.1 | 113.2 ± 13.0 | 0.453 |
| Diastolic BP, mmHg | 72.6 ± 9.6 | 72.5 ± 9.5 | 0.589 |
| Glucose, mg/dL | 93.5 ± 14.6 | 93.9 ± 15.5 | 0.007 |
| HbA1c, % | 5.7 ± 0.5 | 5.7 ± 0.5 | 0.981 |
| Total cholesterol, mg/dL | 199.8 ± 34.7 | 199.7 ± 34.7 | 0.638 |
| HDL-C, mg/dL | 55.1 ± 13.8 | 55.1 ± 13.8 | 0.601 |
| Triglyceride, mg/dL | 95 (67–141) | 96 (67–141) | 0.920 |
| LDL-C, mg/dL | 124.9 ± 32.0 | 124.9 ± 31.9 | 0.875 |
| Insulin, μU/mL | 4.5 (2.8–7.1) | 4.6 (2.8–7.1) | 0.946 |
| hsCRP, mg/L | 0.1 (0.0–0.1) | 0.1 (0.0–0.1) | 0.038 |
| WBC, × 10³/mm³ | 6.2 ± 1.7 | 6.2 ± 1.6 | 0.740 |
| RBC, × 10⁶/mm³ | 4.9 ± 0.4 | 4.9 ± 0.4 | 0.417 |
| Hemoglobin, g/dL | 14.9 ± 1.5 | 14.9 ± 1.5 | 0.642 |
| Hematocrit, % | 43.7 ± 4.0 | 43.7 ± 3.9 | 0.586 |
| Platelet, × 10³/mm³ | 248.2 ± 52.8 | 248.2 ± 52.5 | 0.990 |
| Ferritin, ng/mL | 139.6 (66.0–225.2) | 139.1 (64.8–221.9) | 0.142 |
| CEA, ng/mL | 1.4 (1.0–2.0) | 1.4 (1.0–2.0) | 0.683 |
| ACRN | 775 (1.4) | 185 (1.3) | 0.570 |
| ACRN for age ≥ 50 yr | 328/8,459 (3.9) | 86/2,161 (4.0) | 0.827 |
Values are presented as mean ± SD, number (%), or median (interquartile range).
CRC, colorectal cancer; BMI, body mass index; BP, blood pressure; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; hsCRP, high-sensitivity C-reactive protein; WBC, white blood cell; RBC, red blood cell; CEA, carcinoembryonic antigen; ACRN, advanced colorectal neoplasia.
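The p values for the categorical rows above can be approximated from the counts alone. The sketch below assumes a Pearson chi-square test with 1 degree of freedom and no continuity correction; the source does not state which test was used, so this is an illustration rather than the authors' procedure. The function name `two_proportion_chi2` is hypothetical.

```python
import math

def two_proportion_chi2(events_a, n_a, events_b, n_b):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    Assumed test: the source does not name the test behind its p values.
    """
    pooled = (events_a + events_b) / (n_a + n_b)
    diff = events_a / n_a - events_b / n_b
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    chi2 = diff ** 2 / variance
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts taken from the table: development group vs validation group.
rows = {
    "ACRN": (775, 56_269, 185, 14_067),
    "Male sex": (39_063, 56_269, 9_747, 14_067),
}
for name, counts in rows.items():
    chi2, p = two_proportion_chi2(*counts)
    print(f"{name}: chi2 = {chi2:.3f}, p = {p:.3f}")
```

Under that assumption, both rows land close to the reported p values (0.570 and 0.761), which is consistent with the development and validation groups being well balanced on these characteristics.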
Figure 2. Receiver operating characteristic curve and area under the receiver operating characteristic curve (AUC) of the prediction models for advanced colorectal neoplasia. LR, logistic regression; DNN, deep neural network; CI, confidence interval.
Performance of the DNN model at high-sensitivity operating points for detecting advanced colorectal neoplasia in colorectal cancer screening
| Screening strategy | Sensitivity, % | Specificity, % | No. of colonoscopies | ACRNs detected, n | NNScope, n | Reduction of NNScope, % | p value |
|---|---|---|---|---|---|---|---|
| Target sensitivity, 80% | | | | | | | |
| LR | 78.9 | 51.7 | 6,855 | 146 | 47.0 | Reference | Reference |
| DNN | 79.5 | 58.2 | 5,948 | 147 | 40.5 | 13.8 | < 0.001 |
| Target sensitivity, 90% | | | | | | | |
| LR | 89.2 | 26.5 | 10,364 | 165 | 62.8 | Reference | Reference |
| DNN | 89.7 | 41.0 | 8,356 | 166 | 50.3 | 19.9 | < 0.001 |
| Target sensitivity, 95% | | | | | | | |
| LR | 92.4 | 14.5 | 12,041 | 171 | 70.4 | Reference | Reference |
| DNN | 94.6 | 19.9 | 11,293 | 175 | 64.5 | 8.4 | < 0.001 |
DNN, deep neural network; ACRN, advanced colorectal neoplasia; NNScope, number needed to colonoscope to detect one ACRN; LR, logistic regression.
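The derived columns in this table follow from the raw counts. The sketch below recomputes sensitivity, specificity, NNScope, and the relative reduction in NNScope, assuming that every colonoscopy corresponds to a model-positive subject and that the denominators are the validation group (14,067 subjects, 185 with ACRN). The function and variable names are illustrative and do not come from the source.

```python
# Validation-group totals from the demographics table (assumed denominators).
N_VALIDATION = 14_067
N_ACRN = 185

# (target sensitivity %, model) -> (colonoscopies, ACRNs detected), from the table.
STRATEGIES = {
    (80, "LR"): (6_855, 146),
    (80, "DNN"): (5_948, 147),
    (90, "LR"): (10_364, 165),
    (90, "DNN"): (8_356, 166),
    (95, "LR"): (12_041, 171),
    (95, "DNN"): (11_293, 175),
}

def screening_metrics(colonoscopies, detected, n_total=N_VALIDATION, n_acrn=N_ACRN):
    """Return sensitivity (%), specificity (%), and NNScope for one strategy."""
    false_positives = colonoscopies - detected  # scoped subjects without ACRN
    non_acrn = n_total - n_acrn
    sensitivity = 100 * detected / n_acrn
    specificity = 100 * (1 - false_positives / non_acrn)
    nnscope = colonoscopies / detected  # colonoscopies per ACRN detected
    return sensitivity, specificity, nnscope

def nnscope_reduction(target):
    """Relative reduction (%) in NNScope of DNN versus LR at one target."""
    _, _, lr = screening_metrics(*STRATEGIES[(target, "LR")])
    _, _, dnn = screening_metrics(*STRATEGIES[(target, "DNN")])
    return 100 * (lr - dnn) / lr

for (target, model), counts in STRATEGIES.items():
    sens, spec, nns = screening_metrics(*counts)
    print(f"target {target}% {model:>3}: sens {sens:.1f}, spec {spec:.1f}, NNScope {nns:.1f}")
for target in (80, 90, 95):
    print(f"target {target}%: NNScope reduction {nnscope_reduction(target):.1f}%")
```

Under these assumptions the recomputed values agree with the table to one decimal place; for example, at the 90% target, 10,364/165 ≈ 62.8 for LR and 8,356/166 ≈ 50.3 for DNN, a relative reduction of about 19.9%.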
Figure 3. Receiver operating characteristic curve and area under the receiver operating characteristic curve (AUC) of various prediction models for advanced colorectal neoplasia. (A) Logistic regression (LR) and deep neural network (DNN) models with 9 and 26 variables. (B) LR, support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and DNN models using 26 variables. (C) DNN model compared with fecal immunochemical testing (FIT) and combined FIT and clinical risk score. CI, confidence interval.
Comparison of subjects with advanced colorectal neoplasia (ACRN) according to detection models
| Characteristic | ACRNs detected by both LR and DNN (n = 157) | ACRNs detected only by LR model (n = 7) | ACRNs detected only by DNN model (n = 8) | p value |
|---|---|---|---|---|
| Age, yr | 51.5 ± 10.4 | 32.4 ± 4.8 | 43.4 ± 3.2 | 0.001 |
| Male sex | 138 (87.34) | 7 (100.0) | 2 (25.0) | < 0.001 |
| Current smoker | 65 (41.1) | 2 (28.6) | 3 (37.5) | 0.791 |
| Alcohol consumption, times/wk | 2 (1–3) | 1 (0–2) | 1.5 (1–2.5) | 0.214 |
| Regular exercise ≥ 4 times/wk | 88 (55.7) | 4 (57.4) | 3 (37.5) | 0.597 |
| Family history of CRC | 9 (5.7) | 0 (0) | 0 (0) | 0.637 |
| Hypertension | 35 (22.2) | 0 (0) | 2 (25.0) | 0.364 |
| Diabetes | 18 (11.4) | 0 (0) | 0 (0) | 0.385 |
| BMI, kg/m² | 24.6 ± 2.8 | 25.4 ± 5.1 | 22.1 ± 2.1 | 0.031 |
| Waist circumference, cm | 86.7 ± 7.4 | 86.4 ± 11.2 | 78.1 ± 4.8 | 0.100 |
| Systolic BP, mmHg | 113.7 ± 12.5 | 93.0 ± 5.2 | 108.5 ± 13.8 | 0.106 |
| Diastolic BP, mmHg | 72.6 ± 9.6 | 60.8 ± 5.0 | 70.1 ± 10.4 | 0.241 |
| Glucose, mg/dL | 99.2 ± 16.8 | 91.3 ± 11.4 | 86.8 ± 8.0 | 0.053 |
| HbA1c, % | 5.9 ± 0.6 | 5.5 ± 0.1 | 5.5 ± 0.2 | 0.065 |
| Total cholesterol, mg/dL | 209.8 ± 34.6 | 197.9 ± 28.8 | 172.8 ± 26.6 | 0.593 |
| HDL-C, mg/dL | 50.9 ± 12.3 | 50.6 ± 11.4 | 60.3 ± 16.4 | 0.510 |
| Triglyceride, mg/dL | 125 (93–191) | 100 (67–152) | 67 (52–92.5) | 0.004 |
| LDL-C, mg/dL | 134.7 ± 32.8 | 133.0 ± 27.9 | 100.0 ± 25.8 | 0.645 |
| Insulin, μU/mL | 5.5 (3.3–7.8) | 7.3 (2.5–10.6) | 3.5 (1.5–3.9) | 0.088 |
| hsCRP, mg/L | 0.1 (0.0–0.1) | 0.0 (0.0–0.1) | 0.1 (0.0–0.1) | 0.031 |
| WBC, × 10³/mm³ | 6.9 ± 2.0 | 5.4 ± 0.8 | 7.4 ± 1.6 | 0.046 |
| RBC, × 10⁶/mm³ | 4.9 ± 0.4 | 5.1 ± 0.2 | 4.4 ± 0.3 | 0.110 |
| Hemoglobin, g/dL | 15.4 ± 1.3 | 15.8 ± 0.6 | 13.9 ± 1.3 | 0.134 |
| Hematocrit, % | 45.2 ± 3.4 | 45.7 ± 2.3 | 41.3 ± 2.8 | 0.424 |
| Platelet, × 10³/mm³ | 252.9 ± 58.1 | 223.9 ± 54.3 | 270.4 ± 38.8 | 0.433 |
| Ferritin, ng/mL | 154.2 (98.6–216.3) | 203.8 (140.9–326.8) | 81.9 (34.9–116.2) | 0.034 |
| CEA, ng/mL | 1.8 (1.2–2.5) | 1.2 (1.1–1.5) | 1.7 (1.1–2.8) | 0.295 |
Values are presented as mean ± SD, number (%), or median (interquartile range).
LR, logistic regression; DNN, deep neural network; CRC, colorectal cancer; BMI, body mass index; BP, blood pressure; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; hsCRP, high-sensitivity C-reactive protein; WBC, white blood cell; RBC, red blood cell; CEA, carcinoembryonic antigen.