Yingyin Feng¹, Qi Ding¹, Chen Meng², Wenfeng Wang³, Jingjing Zhang³, Huixiu Lian⁴.
Abstract
In this paper, we use random forest and a broad learning system (BLS) to predict rectal cancer. A total of 246 participants with computed tomography (CT) image records were enrolled. In the training set, the total model (combining imaging and clinical indicators) gave the best prediction, with an area under the curve (AUC) of 0.999 (95% confidence interval (CI): 0.996–1.000) and an accuracy of 0.990 (95% CI: 0.976–1.000). In the testing set, Model 3, the total model, also gave the best prediction, with an AUC of 0.962 (95% CI: 0.915–1.000) and an accuracy of 0.920 (95% CI: 0.845–0.995). The random forest predictions were compared with the BLS predictions, and no statistically significant difference was found between them. The prediction model that incorporates image features performs well, and the image feature ranks as the most important of all features. Consequently, rectal cancer can be successfully predicted by combining clinical indicators with comprehensive indicators of CT image characteristics from four phases (plain scan, venous, arterial, and excretory).
Year: 2021 PMID: 34917137 PMCID: PMC8670968 DOI: 10.1155/2021/4662061
Source DB: PubMed Journal: Comput Intell Neurosci
Sensitivity analysis of variables before and after missing-value imputation.
| Variables | Missing values, n (%) | After imputation | Before imputation | Statistics | P |
|---|---|---|---|---|---|
| Total cholesterol | 11 (4.47) | 4.52 ± 1.28 | 4.51 ± 1.30 | | 0.978 |
| Triglycerides | 11 (4.47) | 1.30 (0.98, 1.76) | 1.30 (0.97, 1.76) | | 0.950 |
| Low-density lipoprotein cholesterol | 12 (4.88) | 2.91 (2.30, 3.56) | 2.91 (2.25, 3.60) | | 0.963 |
| High-density lipoprotein cholesterol | 12 (4.88) | 1.04 (0.86, 1.26) | 1.04 (0.83, 1.26) | | 0.852 |
| Carcinoembryonic antigen | 3 (1.22) | 2.66 (1.81, 5.29) | 2.65 (1.74, 5.17) | | 0.921 |
| Alpha-fetoprotein | 6 (2.44) | 2.53 (1.88, 3.68) | 2.50 (1.87, 3.66) | | 0.808 |
| Carbohydrate antigen 19-9 | 13 (5.28) | 9.88 (2.78, 22.28) | 9.73 (2.51, 22.28) | | 0.820 |
Values are mean ± SD for total cholesterol and M (Q1, Q3) for the other variables; missing values are given as n (%) of 246 participants.
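The table above reports, for each variable, how many values were missing and whether filling them changed the distribution. The imputation method is not stated in this excerpt, so the sketch below assumes simple median imputation purely for illustration; it also shows how the n (%) missing column is computed (for example, 11 missing out of 246 participants gives 4.47%). The function names and the sample column are hypothetical.

```python
import statistics
from typing import List, Optional, Tuple


def missing_summary(values: List[Optional[float]]) -> Tuple[int, float]:
    """Count missing entries (None) and their percentage, rounded to 2 decimals."""
    n_missing = sum(1 for v in values if v is None)
    return n_missing, round(100 * n_missing / len(values), 2)


def median_impute(values: List[Optional[float]]) -> List[float]:
    """Fill each missing entry with the median of the observed entries (assumed method)."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical laboratory column with two missing entries.
column = [2.0, None, 4.0, None, 3.0]
print(missing_summary(column))   # (2, 40.0)
print(median_impute(column))     # [2.0, 3.0, 4.0, 3.0, 3.0]
```

Median filling leaves the median unchanged and only narrows the spread slightly when few values are missing, which is the kind of before/after stability the table checks; other imputation methods would need their own sensitivity check.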
Figure 1. Data preprocessing workflow.
Figure 2. Flow chart of prediction model development and validation.
Comparison of characteristics between the training and testing sets.
| Variables | Total (n = 246) | Training set (n = 196) | Testing set (n = 50) | Statistics | P |
|---|---|---|---|---|---|
| Gender, n (%) | | | | | 0.206 |
| Male | 143 (58.13) | 110 (56.12) | 33 (66.00) | | |
| Female | 103 (41.87) | 86 (43.88) | 17 (34.00) | | |
| Past history of hypertension, n (%) | | | | | 0.659 |
| No | 126 (51.22) | 99 (50.51) | 27 (54.00) | | |
| Yes | 120 (48.78) | 97 (49.49) | 23 (46.00) | | |
| Past history of diabetes, n (%) | | | | | 0.825 |
| No | 174 (70.73) | 138 (70.41) | 36 (72.00) | | |
| Yes | 72 (29.27) | 58 (29.59) | 14 (28.00) | | |
| Family cancer history, n (%) | | | | Fisher | 0.208 |
| No | 229 (93.09) | 180 (91.84) | 49 (98.00) | | |
| Yes | 17 (6.91) | 16 (8.16) | 1 (2.00) | | |
| History of intestinal inflammatory disease, n (%) | | | | Fisher | 1.000 |
| No | 244 (99.19) | 194 (98.98) | 50 (100.00) | | |
| Yes | 2 (0.81) | 2 (1.02) | 0 (0.00) | | |
| Smoking history, n (%) | | | | | 0.080 |
| Never smoked | 145 (58.94) | 121 (61.73) | 24 (48.00) | | |
| Current smoker | 80 (32.52) | 62 (31.63) | 18 (36.00) | | |
| Former smoker | 21 (8.54) | 13 (6.63) | 8 (16.00) | | |
| Drinking history, n (%) | | | | | 0.309 |
| Never drank | 188 (76.42) | 154 (78.57) | 34 (68.00) | | |
| Current drinker | 51 (20.73) | 37 (18.88) | 14 (28.00) | | |
| Former drinker | 7 (2.85) | 5 (2.55) | 2 (4.00) | | |
| Hemoglobin, mean ± SD | 126.62 ± 28.66 | 125.71 ± 28.25 | 130.20 ± 30.24 | | 0.324 |
| Total cholesterol, mean ± SD | 4.52 ± 1.28 | 4.50 ± 1.27 | 4.59 ± 1.34 | | 0.637 |
| Triglycerides, M (Q1, Q3) | 1.30 (0.98, 1.76) | 1.29 (0.97, 1.70) | 1.38 (1.01, 1.94) | | 0.341 |
| Low-density lipoprotein, M (Q1, Q3) | 2.91 (2.30, 3.56) | 2.89 (2.32, 3.56) | 2.99 (2.09, 3.79) | | 0.926 |
| High-density lipoprotein, M (Q1, Q3) | 1.04 (0.86, 1.26) | 1.04 (0.86, 1.26) | 1.03 (0.87, 1.27) | | 0.961 |
| Fecal occult blood test, n (%) | | | | | 0.501 |
| No | 148 (60.16) | 120 (61.22) | 28 (56.00) | | |
| Yes | 98 (39.84) | 76 (38.78) | 22 (44.00) | | |
| Carcinoembryonic antigen, M (Q1, Q3) | 2.66 (1.81, 5.29) | 2.66 (1.88, 5.17) | 2.65 (1.55, 7.01) | | 0.629 |
| Alpha-fetoprotein, M (Q1, Q3) | 2.53 (1.88, 3.68) | 2.51 (1.88, 3.70) | 2.57 (1.88, 3.32) | | 0.723 |
| Carbohydrate antigen 19-9, M (Q1, Q3) | 9.88 (2.78, 22.28) | 11.28 (2.81, 23.14) | 7.34 (2.51, 20.89) | | 0.229 |
| Rectal cancer, n (%) | | | | | 0.809 |
| No | 161 (65.45) | 129 (65.82) | 32 (64.00) | | |
| Yes | 85 (34.55) | 67 (34.18) | 18 (36.00) | | |
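For the categorical rows not marked Fisher, the test symbol was lost in extraction. A Pearson chi-square test without continuity correction is an assumption here, but it does reproduce the reported values: the gender rows of the training/testing comparison give P ≈ 0.206 and the hypertension rows give P ≈ 0.659. The minimal sketch below computes the statistic for any r × c table, while the P-value helper (a hypothetical name) only covers 1 degree of freedom, that is, 2 × 2 tables.

```python
import math
from typing import Sequence, Tuple


def pearson_chi2(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic (no continuity correction) and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_p_value_df1(stat: float) -> float:
    """Upper-tail P value for 1 degree of freedom: P(chi2 > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


# Gender rows: columns are training set and testing set.
gender = [[110, 33],   # male
          [86, 17]]    # female
stat, df = pearson_chi2(gender)
print(f"chi2 = {stat:.3f}, df = {df}, P = {chi2_p_value_df1(stat):.3f}")
# chi2 = 1.597, df = 1, P = 0.206
```

The 1-df shortcut works because a chi-square variable with one degree of freedom is the square of a standard normal; for the three-level smoking and drinking rows a general chi-square survival function would be needed instead.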
Comparison of characteristics between the rectal cancer and non-rectal cancer groups in the training set.
| Variables | Total (n = 196) | Non-rectal cancer (n = 129) | Rectal cancer (n = 67) | Statistics | P |
|---|---|---|---|---|---|
| Gender, n (%) | | | | | 0.002 |
| Male | 110 (56.12) | 62 (48.06) | 48 (71.64) | | |
| Female | 86 (43.88) | 67 (51.94) | 19 (28.36) | | |
| Past history of hypertension, n (%) | | | | | 0.341 |
| No | 99 (50.51) | 62 (48.06) | 37 (55.22) | | |
| Yes | 97 (49.49) | 67 (51.94) | 30 (44.78) | | |
| Past history of diabetes, n (%) | | | | | 0.010 |
| No | 138 (70.41) | 83 (64.34) | 55 (82.09) | | |
| Yes | 58 (29.59) | 46 (35.66) | 12 (17.91) | | |
| Family cancer history, n (%) | | | | | <0.001 |
| No | 180 (91.84) | 128 (99.22) | 52 (77.61) | | |
| Yes | 16 (8.16) | 1 (0.78) | 15 (22.39) | | |
| History of intestinal inflammatory disease, n (%) | | | | Fisher | 0.548 |
| No | 194 (98.98) | 127 (98.45) | 67 (100.00) | | |
| Yes | 2 (1.02) | 2 (1.55) | 0 (0.00) | | |
| Smoking history, n (%) | | | | | 0.501 |
| Never smoked | 121 (61.73) | 81 (62.79) | 40 (59.70) | | |
| Current smoker | 62 (31.63) | 38 (29.46) | 24 (35.82) | | |
| Former smoker | 13 (6.63) | 10 (7.75) | 3 (4.48) | | |
| Drinking history, n (%) | | | | Fisher | 0.044 |
| Never drank | 154 (78.57) | 107 (82.95) | 47 (70.15) | | |
| Current drinker | 37 (18.88) | 18 (13.95) | 19 (28.36) | | |
| Former drinker | 5 (2.55) | 4 (3.10) | 1 (1.49) | | |
| Hemoglobin, mean ± SD | 125.71 ± 28.25 | 124.85 ± 29.84 | 127.36 ± 25.05 | | 0.557 |
| Total cholesterol, mean ± SD | 4.50 ± 1.27 | 4.54 ± 1.34 | 4.42 ± 1.12 | | 0.530 |
| Triglycerides, M (Q1, Q3) | 1.29 (0.97, 1.70) | 1.35 (0.96, 1.86) | 1.21 (1.00, 1.56) | | 0.154 |
| Low-density lipoprotein, M (Q1, Q3) | 2.89 (2.32, 3.56) | 2.90 (2.26, 3.52) | 2.88 (2.38, 3.56) | | 0.652 |
| High-density lipoprotein, M (Q1, Q3) | 1.04 (0.86, 1.26) | 1.02 (0.81, 1.22) | 1.10 (0.91, 1.26) | | 0.134 |
| Fecal occult blood test, n (%) | | | | | <0.001 |
| No | 120 (61.22) | 114 (88.37) | 6 (8.96) | | |
| Yes | 76 (38.78) | 15 (11.63) | 61 (91.04) | | |
| Carcinoembryonic antigen, M (Q1, Q3) | 2.66 (1.88, 5.17) | 2.45 (1.71, 3.93) | 4.71 (2.54, 20.24) | | <0.001 |
| Alpha-fetoprotein, M (Q1, Q3) | 2.51 (1.88, 3.70) | 2.44 (1.77, 3.68) | 2.73 (2.04, 3.74) | | 0.314 |
| Carbohydrate antigen 19-9, M (Q1, Q3) | 11.28 (2.81, 23.14) | 9.30 (0.97, 17.38) | 19.85 (6.57, 58.40) | | <0.001 |
Predictive performance of the prediction models in the training set.
| Metric | Clinical demographics | Radiomic | Total model |
|---|---|---|---|
| Cutoff | 0.426 | 0.265 | 0.557 |
| Sensitivity (95% CI) | 0.910 (0.842–0.979) | 0.970 (0.929–1.000) | 0.970 (0.929–1.000) |
| Specificity (95% CI) | 0.884 (0.828–0.939) | 0.907 (0.857–0.957) | 1.000 (1.000–1.000) |
| PPV (95% CI) | 0.803 (0.713–0.892) | 0.844 (0.763–0.925) | 1.000 (1.000–1.000) |
| NPV (95% CI) | 0.950 (0.911–0.989) | 0.983 (0.960–1.000) | 0.985 (0.964–1.000) |
| AUC (95% CI) | 0.938 (0.903–0.972) | 0.980 (0.966–0.994) | 0.999 (0.996–1.000) |
| Accuracy (95% CI) | 0.893 (0.850–0.936) | 0.929 (0.893–0.965) | 0.990 (0.976–1.000) |
Compared with the total model, the difference is statistically significant. CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; AUC: area under the curve.
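Every metric in the performance tables is a proportion from a 2 × 2 confusion matrix. The paper reports only proportions, so the counts used below are an assumption back-calculated from them and from the 67 rectal cancer and 129 non-rectal cancer cases in the training set (for the clinical demographics model: 61 of 67 cases detected, 114 of 129 non-cases correctly excluded). With those counts, a normal-approximation (Wald) 95% interval clipped to [0, 1] reproduces every value and interval in the clinical demographics column; the interval method is otherwise not stated, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def wald_ci(successes: int, total: int) -> Tuple[float, float, float]:
    """Proportion with a Wald 95% confidence interval clipped to [0, 1]."""
    p = successes / total
    half = Z_95 * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, Tuple[float, float, float]]:
    """Sensitivity, specificity, PPV, NPV and accuracy, each with its Wald CI."""
    return {
        "sensitivity": wald_ci(tp, tp + fn),
        "specificity": wald_ci(tn, tn + fp),
        "ppv": wald_ci(tp, tp + fp),
        "npv": wald_ci(tn, tn + fn),
        "accuracy": wald_ci(tp + tn, tp + fn + tn + fp),
    }


# Assumed counts for the clinical demographics model in the training set.
for name, (est, lo, hi) in diagnostic_metrics(tp=61, fn=6, tn=114, fp=15).items():
    print(f"{name}: {est:.3f} ({lo:.3f}-{hi:.3f})")
```

The Wald interval collapses to 1.000–1.000 whenever a proportion equals 1, as in the total model's specificity and PPV; Wilson or Clopper–Pearson intervals avoid that degenerate width.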
Predictive performance of the prediction models in the testing set.
| Metric | Clinical demographics | Radiomic | Total model |
|---|---|---|---|
| Cutoff | 0.426 | 0.265 | 0.557 |
| Sensitivity (95% CI) | 0.944 (0.839–1.000) | 0.778 (0.586–0.970) | 0.778 (0.586–0.970) |
| Specificity (95% CI) | 0.844 (0.718–0.970) | 0.844 (0.718–0.970) | 1.000 (1.000–1.000) |
| PPV (95% CI) | 0.773 (0.598–0.948) | 0.737 (0.539–0.935) | 1.000 (1.000–1.000) |
| NPV (95% CI) | 0.964 (0.896–1.000) | 0.871 (0.753–0.989) | 0.889 (0.786–0.992) |
| AUC (95% CI) | 0.911 (0.834–0.989) | 0.903 (0.821–0.985) | 0.962 (0.915–1.000) |
| Accuracy (95% CI) | 0.880 (0.790–0.970) | 0.820 (0.714–0.926) | 0.920 (0.845–0.995) |
CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; AUC: area under the curve.
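The cutoffs in the testing-set table are identical to those in the training-set table, which suggests each cutoff was fixed on the training data and then reused. How the cutoffs were chosen is not stated in this excerpt; maximizing Youden's J (sensitivity + specificity − 1) is a common convention and is only an assumption here. The sketch also computes the empirical AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half). The scores and function names are hypothetical.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Empirical AUC: share of (case, control) pairs where the case scores higher."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def youden_cutoff(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J; predict positive when score >= cutoff."""
    best_cutoff, best_j = float("nan"), -1.0
    for c in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s >= c for s in pos_scores) / len(pos_scores)
        spec = sum(s < c for s in neg_scores) / len(neg_scores)
        j = sens + spec - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical model scores for four cases and four controls.
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
print(auc_mann_whitney(cases, controls))  # 0.875
print(youden_cutoff(cases, controls))     # (0.4, 0.75)
```

Choosing the cutoff on the training set and applying it unchanged to the testing set, as the repeated cutoff values imply, keeps the testing-set sensitivity and specificity free of optimistic bias from threshold tuning.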
Figure 3. ROC curves of the prediction models.
Comparison of the total model and the BLS model in the training set.
| Metric | Total model | BLS model |
|---|---|---|
| Cutoff | 0.557 | 0.461 |
| Sensitivity (95% CI) | 0.970 (0.929–1.000) | 0.985 (0.956–1.000) |
| Specificity (95% CI) | 1.000 (1.000–1.000) | 0.977 (0.951–1.000) |
| PPV (95% CI) | 1.000 (1.000–1.000) | 0.957 (0.908–1.000) |
| NPV (95% CI) | 0.985 (0.964–1.000) | 0.992 (0.977–1.000) |
| AUC (95% CI) | 0.999 (0.996–1.000) | 0.999 (0.997–1.000) |
| Accuracy (95% CI) | 0.990 (0.976–1.000) | 0.980 (0.960–0.999) |
CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; AUC: area under the curve.
Comparison of the total model and the BLS model in the testing set.
| Metric | Total model | BLS model |
|---|---|---|
| Sensitivity (95% CI) | 0.778 (0.586–0.970) | 0.889 (0.744–1.000) |
| Specificity (95% CI) | 1.000 (1.000–1.000) | 0.906 (0.805–1.000) |
| PPV (95% CI) | 1.000 (1.000–1.000) | 0.842 (0.678–1.000) |
| NPV (95% CI) | 0.889 (0.786–0.992) | 0.935 (0.849–1.000) |
| AUC (95% CI) | 0.962 (0.915–1.000) | 0.965 (0.924–1.000) |
| Accuracy (95% CI) | 0.920 (0.845–0.995) | 0.900 (0.817–0.983) |
CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; AUC: area under the curve.
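The BLS model in these comparison tables is named in the abstract but not specified in this excerpt. As a minimal sketch of the general BLS structure, not the authors' implementation, the class below maps inputs to feature nodes with random linear maps, builds enhancement nodes with tanh of a random map of the feature nodes, and solves the output weights in closed form with ridge regression, W = (AᵀA + λI)⁻¹AᵀY. Two simplifications are assumptions: the feature-node weights stay random instead of being fine-tuned with a sparse autoencoder, and a fixed 1/√fan-in scaling stands in for the original shrinkage step. The class name, parameters, and defaults are hypothetical, and NumPy is required.

```python
import numpy as np


class SimpleBLS:
    """Simplified broad learning system for binary labels (hypothetical sketch)."""

    def __init__(self, n_groups=5, nodes_per_group=10, n_enhancement=50, reg=1e-3, seed=0):
        self.n_groups = n_groups
        self.nodes_per_group = nodes_per_group
        self.n_enhancement = n_enhancement
        self.reg = reg  # ridge penalty (lambda)
        self.seed = seed

    @staticmethod
    def _with_bias(M):
        # Append a column of ones so every random map includes a bias term.
        return np.hstack([M, np.ones((M.shape[0], 1))])

    def _hidden(self, X):
        # Feature nodes: random linear maps of the inputs, one block per group.
        Z = np.hstack([self._with_bias(X) @ W for W in self.feature_weights])
        # Enhancement nodes: nonlinear expansion of the feature nodes.
        H = np.tanh(self._with_bias(Z) @ self.enhancement_weights)
        return np.hstack([Z, H])

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        d = X.shape[1]
        self.feature_weights = [
            rng.uniform(-1.0, 1.0, size=(d + 1, self.nodes_per_group))
            for _ in range(self.n_groups)
        ]
        fan_in = self.n_groups * self.nodes_per_group + 1
        # Scaled so tanh inputs stay mostly out of saturation.
        self.enhancement_weights = rng.uniform(
            -1.0, 1.0, size=(fan_in, self.n_enhancement)) / np.sqrt(fan_in)
        A = self._hidden(X)
        Y = np.asarray(y, dtype=float).reshape(-1, 1)
        # Ridge-regularized least squares for the output weights.
        self.output_weights = np.linalg.solve(
            A.T @ A + self.reg * np.eye(A.shape[1]), A.T @ Y)
        return self

    def decision_function(self, X):
        return (self._hidden(X) @ self.output_weights).ravel()

    def predict(self, X, cutoff=0.5):
        return (self.decision_function(X) >= cutoff).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    model = SimpleBLS().fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

Because the output weights come from a single linear solve rather than iterative backpropagation, training is fast; the trade-off is that the decision value is an unbounded least-squares score rather than a calibrated probability, so any cutoff (the 0.5 default here is arbitrary) has to be chosen on that scale.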
Figure 4. ROC curves of the prediction models.
Figure 5. Importance of variables.