Hui Li¹, Jianmei Lin¹, Yanhong Xiao¹, Wenwen Zheng¹, Lu Zhao¹, Xiangling Yang¹,², Minsheng Zhong³, Huanliang Liu¹,².
Abstract
Background: Colonoscopy and sigmoidoscopy, the current diagnostic methods for colorectal cancer (CRC), are invasive and complex procedures with possible complications. This study aimed to develop CRC identification models based on minimally invasive, affordable, portable, and accurate screening variables.
Keywords: clinical laboratory techniques; colorectal cancer; diagnosis; logistic regression; machine learning
Year: 2021 PMID: 34806496 PMCID: PMC8606732 DOI: 10.1177/15330338211058352
Source DB: PubMed Journal: Technol Cancer Res Treat ISSN: 1533-0338
Figure 1. Flowchart of the colorectal cancer (CRC) identification model.
Characteristics of the Patients.
| Baseline characteristic | CRCs (n = 582) | Controls (n = 582) |
|---|---|---|
| Sex (male) | 355 (61%) | 355 (61%) |
| Age (years) | 52 ± 9 | 52 ± 9 |
| CEA (ng/mL) | 4.61 (2.38-13.59)* | 1.70 (1.13-2.59) |
| α-fetoprotein (ng/mL) | 2.52 (1.93-3.36) | 3.49 (2.85-4.16) |
| Alanine transaminase (U/L) | 13.76 (9.97-18.83)** | 20.72 (15.57-29.69) |
| Aspartate transaminase (U/L) | 17.02 (13.96-21.44)* | 20.66 (17.64-24.72) |
| γ-glutamyltransferase (U/L) | 20.93 (14.84-31.47)* | 27.23 (17.50-42.98) |
| Triglycerides (mmol/L) | 1.18 (0.92-1.58)** | 1.45 (1.05-2.14) |
| Total cholesterol (mmol/L) | 5.04 ± 1.12** | 5.43 ± 1.05 |
| HDL (mmol/L) | 1.19 ± 0.30** | 1.36 ± 0.31 |
| LDL (mmol/L) | 3.29 ± 0.83** | 3.63 ± 0.82 |
| ApoA1 (g/L) | 1.16 ± 0.21** | 1.35 ± 0.20 |
| ApoB (g/L) | 1.00 ± 0.24 | 0.90 ± 0.24 |
| Lipoprotein (a) (mg/L) | 176.49 (101.62-342.76)** | 103.15 (52.31-206.46) |
| hs-CRP (mg/L) | 2.27 (0.82-1.13)** | 0.88 (0.49-1.86) |
| Red blood cells (10¹²/L) | 4.50 ± 0.62** | 4.86 ± 0.52 |
| Hemoglobin (g/L) | 120.64 ± 22.03** | 141.56 ± 15.14 |
| White blood cells (10⁹/L) | 6.76 ± 2.32** | 6.02 ± 1.56 |
| Neutrophils (10⁹/L) | 4.26 ± 2.03** | 3.44 ± 1.14 |
| Lymphocytes (10⁹/L) | 1.72 ± 0.63** | 1.92 ± 0.55 |
| Monocytes (10⁹/L) | 0.54 ± 0.23** | 0.45 ± 0.15 |
| Eosinophils (10⁹/L) | 0.20 ± 0.17 | 0.18 ± 0.15 |
| Platelets (10⁹/L) | 273.02 ± 97.58** | 233.13 ± 57.25 |
| Early colon cancer | 101 (17.4%) | - |
| Late colon cancer | 164 (28.2%) | - |
| Early rectal cancer | 102 (17.5%) | - |
| Late rectal cancer | 215 (36.9%) | - |
Abbreviations: CRC: colorectal cancer; CEA: carcinoembryonic antigen; HDL: high-density lipoprotein; LDL: low-density lipoprotein; hs-CRP: high-sensitivity C-reactive protein.
*P < .05 versus the control group, **P < .001 versus the control group.
Figure 2. Matrix of Spearman correlation coefficients. All pairs of variables included in the models were tested for Spearman correlation. For each pair with a correlation coefficient >0.5, the variable with the smaller weight coefficient in the principal component analysis (PCA) was removed from the feature set.
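The pruning rule in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: the 0.5 threshold is applied to the absolute value of Spearman's ρ, pairs are processed from most to least correlated, and each variable's PCA weight (for example, its absolute loading on the first principal component) has already been computed and is passed in as `weights`. The function names are hypothetical. Spearman's ρ is computed as the Pearson correlation of average ranks, which handles ties.

```python
from itertools import combinations
from math import sqrt


def rank(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def prune_correlated(features, weights, threshold=0.5):
    """Hypothetical helper: for each pair with |rho| > threshold, drop the
    variable with the smaller externally supplied weight (assumed PCA weight)."""
    pairs = sorted(
        ((abs(spearman(features[a], features[b])), a, b)
         for a, b in combinations(features, 2)),
        reverse=True,  # assumption: handle the most correlated pairs first
    )
    kept = set(features)
    for rho, a, b in pairs:
        if rho > threshold and a in kept and b in kept:
            # On equal weights this keeps `a`; the caption does not specify a tie rule.
            kept.discard(a if weights[a] < weights[b] else b)
    return [name for name in features if name in kept]
```

For example, with `features = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [2, 5, 1, 4, 3]}` and made-up weights `{"a": 0.9, "b": 0.5, "c": 0.3}`, variables `a` and `b` are perfectly rank-correlated, so `b` (the lower weight) is dropped and `["a", "c"]` is returned. A greedy pass like this is order-dependent; other orderings can keep a different subset.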
Performance Comparison of Different Machine Learning Models for Colorectal Cancer (CRC) Identification (vs Healthy Controls).
| Model | AUC | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| Logistic Regression | 0.865 (0.857-0.877) | 0.895 (0.880-0.903) | 0.835 (0.817-0.851) | 0.844 (0.835-0.859) | 0.889 (0.875-0.898) |
| Random Forests | 0.848 (0.840-0.857) | 0.873 (0.857-0.891) | 0.823 (0.80-0.840) | 0.832 (0.817-0.848) | 0.867 (0.852-0.885) |
| Support Vector Machine | 0.865 (0.857-0.874) | 0.901 (0.880-0.914) | 0.830 (0.806-0.851) | 0.842 (0.827-0.859) | 0.894 (0.879-0.903) |
| K-Nearest Neighbor | 0.816 (0.797-0.831) | 0.879 (0.851-0.897) | 0.754 (0.737-0.771) | 0.781 (0.771-0.796) | 0.863 (0.830-0.883) |
| Naive Bayes | 0.839 (0.831-0.849) | 0.928 (0.914-0.943) | 0.749 (0.731-0.766) | 0.788 (0.777-0.796) | 0.913 (0.903-0.923) |
Abbreviations: CEA: carcinoembryonic antigen; AUC: area under the curve; PPV: positive predictive value; NPV: negative predictive value.
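Sensitivity, specificity, PPV, and NPV are all ratios of confusion-matrix counts taken at one decision threshold. The sketch below shows those definitions; it assumes binary labels (1 = CRC, 0 = control) and a hypothetical probability cut-off of 0.5, since the threshold the authors used is not stated here.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count (TP, FP, TN, FN), predicting CRC when score >= threshold (assumed cut-off)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_crc = s >= threshold
        if predicted_crc and y == 1:
            tp += 1
        elif predicted_crc:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic ratios from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of CRC cases flagged
        "specificity": tn / (tn + fp),  # share of controls cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are CRC
        "npv": tn / (tn + fn),          # share of negative calls that are controls
    }
```

With made-up counts of 90 true positives, 10 false negatives, 80 true negatives, and 20 false positives, `diagnostic_metrics(90, 20, 80, 10)` gives a sensitivity of 0.9 and a specificity of 0.8.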
Figure 3. Weight coefficients of the logistic regression model (the model with the highest accuracy) for colorectal cancer (CRC) diagnosis. The four most heavily weighted features were carcinoembryonic antigen (CEA), hemoglobin (HGB), lipoprotein (a) (Lp(a)), and high-density lipoprotein (HDL).
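A logistic regression model converts a weighted sum of the inputs into a probability, P(CRC | x) = 1 / (1 + exp(-(b₀ + Σ wᵢxᵢ))), and ranking features by the absolute size of wᵢ is one way to read a plot like Figure 3. The sketch below assumes the inputs are standardized (z-scores), because raw coefficients are not comparable across variables measured in different units. The coefficient values are invented for illustration and are NOT the fitted weights from this study.

```python
from math import exp

# HYPOTHETICAL coefficients for standardized (z-scored) inputs.
# Invented for illustration only; not the study's fitted model.
INTERCEPT = 0.0
WEIGHTS = {"CEA": 1.20, "HGB": -0.90, "Lp(a)": 0.60, "HDL": -0.50, "ALT": -0.20}


def standardize(value, mean, sd):
    """Convert a raw lab value to a z-score using training-set mean and SD."""
    return (value - mean) / sd


def predict_proba(z_scores, weights=WEIGHTS, intercept=INTERCEPT):
    """Logistic model: 1 / (1 + exp(-(b0 + sum_i w_i * z_i)))."""
    logit = intercept + sum(weights[name] * z_scores[name] for name in weights)
    return 1.0 / (1.0 + exp(-logit))


def rank_features(weights=WEIGHTS):
    """Order features by |coefficient|, largest first."""
    return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
```

With these placeholder values, a patient whose every input sits at the training mean (all z-scores 0) gets a probability of 0.5. The sign of each coefficient indicates whether a higher value pushes the prediction toward CRC (positive) or toward control (negative).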
Figure 4. Receiver operating characteristic (ROC) curves for colorectal cancer (CRC) diagnosis using logistic regression models built on CEA alone; CEA + hemoglobin (HGB) + Lp(a); CEA + HGB + Lp(a) + HDL; and CEA + HGB + Lp(a) + HDL + alanine transaminase (ALT).
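The AUC values in the tables summarize ROC curves like those in Figure 4. One standard way to compute AUC is the Mann-Whitney formulation: the probability that a randomly chosen CRC case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below assumes binary labels and continuous model scores. `roc_points` is a hypothetical helper that lists the (false-positive rate, true-positive rate) pairs traced as the threshold is lowered. The authors' exact AUC and confidence-interval procedure is not described here.

```python
def roc_auc(labels, scores):
    """AUC as P(score_case > score_control), counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(FPR, TPR) at each distinct score used as a '>=' threshold, high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

The pairwise loop is O(cases × controls), which is fine for a sketch; a rank-based formula scales better for large samples. The trapezoidal area under the points from `roc_points` equals the value from `roc_auc`.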
Performance of the Logistic Regression Model Based on CEA, Hemoglobin, HDL, and Lp(a) Across Colorectal Cancer Subgroups (vs Healthy Controls).
| Group | AUC | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| All CRC | 0.849 (0.840-0.860) | 0.883 (0.874-0.903) | 0.815 (0.794-0.834) | 0.828 (0.815-0.840) | 0.875 (0.866-0.888) |
| Early colon cancer | 0.801 (0.785-0.820) | 0.859 (0.833-0.900) | 0.743 (0.710-0.774) | 0.771 (0.750-0.784) | 0.844 (0.821-0.880) |
| Late colon cancer | 0.905 (0.889-0.929) | 0.921 (0.898-0.959) | 0.888 (0.878-0.898) | 0.892 (0.880-0.906) | 0.920 (0.896-0.956) |
| Early rectal cancer | 0.745 (0.742-0.758) | 0.792 (0.774-0.839) | 0.698 (0.645-0.742) | 0.728 (0.711-0.758) | 0.773 (0.739-0.815) |
| Late rectal cancer | 0.862 (0.852-0.883) | 0.886 (0.862-0.908) | 0.838 (0.800-0.877) | 0.849 (0.819-0.875) | 0.881 (0.857-0.898) |
Abbreviations: CRC: colorectal cancer; CEA: carcinoembryonic antigen; AUC: area under the curve; PPV: positive predictive value; NPV: negative predictive value.
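Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of CRC in the tested population. The cohort here is matched 1:1 (582 cases, 582 controls), so assuming the evaluation data keep that balance, the reported values correspond to a prevalence of about 50%. The sketch below applies Bayes' rule to recompute PPV and NPV from sensitivity, specificity, and an assumed prevalence.

```python
def ppv(sens, spec, prev):
    """Positive predictive value from Bayes' rule."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))


def npv(sens, spec, prev):
    """Negative predictive value from Bayes' rule."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
```

Plugging in the All CRC row (sensitivity 0.883, specificity 0.815) at 50% prevalence gives a PPV of about 0.827 and an NPV of about 0.874, close to the tabulated 0.828 and 0.875. At a purely hypothetical prevalence of 1%, the same sensitivity and specificity give a PPV below 0.05, which is why case-control PPV figures do not carry over directly to a screening population.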
Comparison of Performance Between the Proposed Model and Models From Other Studies.
| Author | Detections | Algorithms | AUC | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Long | Multi-platform transcriptomics | RF | 0.998 (0.995-0.999) | 99.8% | 99.9% |
| Nakajima | Urinary polyamine biomarker panel | ADTree | 0.961 (0.937-0.984) | N/A | N/A |
| Wan | Whole-genome sequencing | LR + SVM | 0.92 (0.91-0.93) | 85% | 85% |
| Hornbrook | Complete blood count | ColonFlag® | 0.80 (0.79-0.81) | N/A | N/A |
| Kinar | Complete blood count | GBM + RF | 0.82 | 50% | 87% |
| Zhao | Age, BMI, gut bacteria | LR + SVM | 0.942 | 93.3% | 80.7% |
| Proposed | CEA, hemoglobin, HDL, and Lp(a) | LR | 0.849 (0.840-0.860) | 88.3% | 81.5% |
Abbreviations: ADTree: alternating decision tree; AUC: area under the curve; BMI: body mass index; BP: backpropagation; CEA: carcinoembryonic antigen; GBM: gradient boosting model; LR: logistic regression; RF: random forests; SVM: support vector machine.