Meng-Hsuen Hsieh, Li-Min Sun, Cheng-Li Lin, Meng-Ju Hsieh, Kyle Sun, Chung-Y Hsu, An-Kuo Chou, Chia-Hung Kao.
Abstract
OBJECTIVES: Observational studies have suggested that patients with type 2 diabetes mellitus (T2DM) have a higher risk of developing colorectal cancer (CRC). This study aims to build a deep neural network (DNN) to predict the onset of CRC in patients with T2DM.
Keywords: colorectal cancer; deep neural network; receiver operating characteristic; the national health insurance database; type 2 diabetes mellitus
Year: 2018 PMID: 30213141 PMCID: PMC6162847 DOI: 10.3390/jcm7090277
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.241
Distribution of training and test sets.
| All Patients | Training Set | Test Set |
|---|---|---|
| 1,349,640 | 1,315,899 | 337,410 |
Baseline characteristics of T2DM patients with and without colorectal cancer.
| Variable | No colorectal cancer, n | No colorectal cancer, (%) | Colorectal cancer, n | Colorectal cancer, (%) | p-value |
|---|---|---|---|---|---|
| Age group (year) | | | | | <0.001 |
| ≤49 | 420,354 | 31.5 | 1737 | 11.7 | |
| 50–64 | 515,804 | 38.6 | 5950 | 40.0 | |
| 65 + | 398,615 | 29.9 | 7180 | 48.3 | |
| Mean (SD) (year) † | 56.2 | 14.2 | 63.7 | 11.2 | <0.001 |
| Gender | | | | | <0.001 |
| Women | 633,366 | 47.5 | 6259 | 42.1 | |
| Men | 701,407 | 52.6 | 8608 | 57.9 | |
| Urbanization level # | | | | | 0.001 |
| 1 (highest) | 387,470 | 29.0 | 4374 | 29.4 | |
| 2 | 397,750 | 29.8 | 4383 | 29.5 | |
| 3 | 223,753 | 16.8 | 2337 | 15.7 | |
| 4 (lowest) | 325,800 | 24.4 | 3773 | 25.4 | |
| Occupation | | | | | <0.001 |
| White collar | 640,808 | 48.1 | 6695 | 45.0 | |
| Blue collar | 554,764 | 41.6 | 6577 | 44.2 | |
| Others ‡ | 139,201 | 10.4 | 1595 | 10.7 | |
| Underlying disease | |||||
| Hypertension | 984,221 | 73.7 | 11,707 | 78.7 | 0.001 |
| Hyperlipidemia | 899,397 | 67.4 | 9102 | 61.2 | <0.001 |
| Stroke | 259,808 | 19.5 | 2940 | 19.8 | 0.34 |
| Congestive heart failure | 183,790 | 13.8 | 2076 | 14.0 | <0.001 |
| Colorectal polyps | 58,952 | 4.42 | 1562 | 10.5 | <0.001 |
| Obesity | 71,119 | 5.33 | 452 | 3.04 | <0.001 |
| COPD | 375,331 | 28.1 | 4654 | 31.3 | <0.001 |
| CAD | 510,862 | 38.3 | 6264 | 42.1 | <0.001 |
| Asthma | 259,565 | 19.5 | 2859 | 19.2 | 0.51 |
| Smoking | 50,660 | 3.80 | 324 | 2.18 | <0.001 |
| Inflammatory bowel disease | 49,295 | 3.69 | 575 | 3.87 | 0.26 |
| Irritable bowel syndrome | 182,951 | 13.7 | 2781 | 18.7 | <0.001 |
| Alcohol-related illness | 142,265 | 10.7 | 1107 | 7.45 | <0.001 |
| CKD | 856,446 | 64.2 | 8314 | 55.9 | <0.001 |
| Diabetes complication (components of the aDCSI) | |||||
| Retinopathy | 262,293 | 19.7 | 2423 | 16.3 | <0.001 |
| Nephropathy | 479,819 | 36.0 | 4659 | 31.3 | <0.001 |
| Neuropathy | 398,979 | 29.9 | 3871 | 26.0 | <0.001 |
| Cerebrovascular | 354,430 | 26.6 | 3741 | 25.2 | <0.001 |
| Cardiovascular | 769,763 | 57.7 | 8887 | 59.8 | <0.001 |
| Peripheral vascular disease | 365,797 | 27.4 | 3406 | 22.9 | <0.001 |
| Metabolic | 60,532 | 4.54 | 434 | 2.92 | <0.001 |
| Mean aDCSI score (SD) † | |||||
| Onset | 1.55 | 1.67 | 1.55 | 1.62 | 0.74 |
| End of follow-up | 3.03 | 2.35 | 2.75 | 2.15 | <0.001 |
| Medications | |||||
| Statin | 706,079 | 52.9 | 6351 | 42.7 | <0.001 |
| Insulin | 437,994 | 32.8 | 3506 | 23.6 | <0.001 |
| Sulfonylureas | 770,838 | 57.8 | 8432 | 56.7 | <0.001 |
| Metformin | 856,446 | 64.2 | 8314 | 55.9 | <0.001 |
| TZD | 223,650 | 16.8 | 1767 | 11.9 | <0.001 |
| Other antidiabetic drugs | 365,662 | 27.4 | 3071 | 20.7 | <0.001 |
| Mean follow-up for endpoint, y (SD) † | 6.86 | 3.87 | 4.73 | 3.33 | <0.001 |
#: The urbanization level was categorized by the population density of the residential area into 4 levels, with level 1 as the most urbanized and level 4 as the least urbanized. ‡: Other occupations included primarily retired, unemployed, or low-income populations. aDCSI = adapted Diabetes Complication Severity Index. p-values are from the chi-square test, except †: t-test, comparing subjects with and without colorectal cancer.
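The chi-square comparisons in the baseline table can be reproduced from the printed counts. The following is a minimal sketch, not the authors' code: it assumes an uncorrected Pearson chi-square test on a 2×2 table, takes the group totals (1,334,773 without and 14,867 with colorectal cancer) as the sums of the age-group rows, and uses a hypothetical helper name, `chi_square_2x2`.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Table layout (rows = outcome group, columns = exposure present/absent):
        a  b
        c  d
    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Group sizes, assumed to be the sums of the three age-group rows.
no_crc_total = 420_354 + 515_804 + 398_615   # 1,334,773
crc_total = 1_737 + 5_950 + 7_180            # 14,867

# Stroke row of the baseline table: counts with the condition in each group.
stroke_no_crc, stroke_crc = 259_808, 2_940

chi2, p = chi_square_2x2(
    stroke_no_crc, no_crc_total - stroke_no_crc,
    stroke_crc, crc_total - stroke_crc,
)
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```

Under these assumptions the stroke row gives a p-value of about 0.34, in line with the value printed in the table. Rows with several categories (age group, urbanization, occupation) would need the general r×c form of the test rather than this 2×2 shortcut.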
Performance of DNN across all data, the training set, and the test set.
| Dataset | F1 | Precision | Recall | AUROC | AUROC 95% CI | AUROC SE |
|---|---|---|---|---|---|---|
| All data | 0.931 | 0.982 | 0.889 | 0.738 | 0.734–0.742 | 0.002 |
| Training set | 0.931 | 0.982 | 0.889 | 0.739 | 0.735–0.743 | 0.002 |
| Test set | 0.929 | 0.980 | 0.886 | 0.700 | 0.674–0.727 | 0.013 |
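The relationship between the AUROC, its standard error, and the 95% confidence interval in the table above can be checked with a few lines of arithmetic. This sketch assumes a normal-approximation (Wald) interval, AUROC ± 1.96 × SE; the source does not state how the intervals were computed, and `wald_ci` is a hypothetical helper name.

```python
def wald_ci(auroc, se, z=1.96):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    return auroc - z * se, auroc + z * se


# (AUROC, SE) pairs as printed in the DNN performance table.
reported = {
    "All data": (0.738, 0.002),
    "Training set": (0.739, 0.002),
    "Test set": (0.700, 0.013),
}

for name, (auroc, se) in reported.items():
    low, high = wald_ci(auroc, se)
    print(f"{name}: {low:.3f} to {high:.3f}")
```

For the full data and the training set this reproduces the printed intervals. For the test set the result differs from the printed interval in the third decimal place, which would be expected if the interval was computed from an unrounded standard error.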
The receiver operating characteristic of aDCSI.
| Dataset | AUROC | AUROC 95% CI | AUROC SE |
|---|---|---|---|
| All data | 0.492 | 0.487–0.497 | 0.003 |
| Training set | 0.492 | 0.487–0.498 | 0.003 |
| Test set | 0.498 | 0.466–0.530 | 0.016 |
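An AUROC near 0.5, as reported for aDCSI, means the score ranks a randomly chosen patient with CRC above a randomly chosen patient without CRC about half the time, which is no better than chance. The sketch below illustrates that interpretation using the pairwise (Mann-Whitney) definition of AUROC on toy data. The function name `auroc` and all data values are illustrative assumptions, not the study's data or code.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive case outscores a negative one.

    Ties count as half a win. This is the O(n_pos * n_neg) pairwise form,
    suitable only for small illustrative inputs.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: a score that separates the classes perfectly.
print(auroc([1, 2, 3, 4], [0, 0, 1, 1]))

# Toy example: a score that carries no information about the label.
print(auroc([1, 1, 1, 1], [0, 1, 0, 1]))
```

The first call returns 1.0 and the second returns 0.5, the two reference points against which the reported values of roughly 0.70 to 0.74 for the DNN and about 0.49 to 0.50 for aDCSI can be read.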
Figure 1. The ROC curve of the DNN model and aDCSI model in predicting CRC.