Okechinyere J Achilonu1, June Fabian2,3, Brendan Bebington3,4, Elvira Singh1,5, M J C Eijkemans6, Eustasius Musenge1,7.
Abstract
Background: South Africa (SA) has the highest incidence of colorectal cancer (CRC) in Sub-Saharan Africa (SSA). However, there is limited research on CRC recurrence and survival in SA. CRC recurrence and overall survival are highly variable across studies. Accurate prediction of patients at risk can enhance clinical expectations and decisions within the South African CRC patient population. We explored the feasibility of integrating statistical and machine learning (ML) algorithms to achieve higher predictive performance and interpretability in findings.
Keywords: cancer; colorectal; filter feature selection; machine learning; prediction; recurrence; survival
Year: 2021 PMID: 34307286 PMCID: PMC8292767 DOI: 10.3389/fpubh.2021.694306
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Figure 1. Graphical representation of the modelling approach to predicting CRC recurrence and survival.
Characteristics of the WDGMC population based on the selected features from information gain (IG), OneR, and LASSO.

| Feature / category | Summary | Recurrence: IG | Recurrence: OneR | Recurrence: LASSO | Survival: IG | Survival: OneR | Survival: LASSO |
|---|---|---|---|---|---|---|---|
| Age at the time of first visit | 57 (13) | 0.022 | 0.621 | -0.010 | | | |
| Race | | | | | 0.042 | 0.567 | 0.009 |
| Black | 356 (51.1) | | | | | | |
| White | 246 (35.3) | | | | | | |
| Others | 95 (13.6) | | | | | | |
| Histology | | | | | 0.127 | 0.755 | -2.527 |
| Adenocarcinoma | 430 (61.7) | | | | | | |
| Others | 267 (38.3) | | | | | | |
| Cancer-related complication | | 0.036 | | | 0.098 | 0.708 | -0.732 |
| No | 310 (44.5) | | | | | | |
| Yes | 387 (55.5) | | | | | | |
| Did patient undergo any procedure listed | | 0.031 | 0.641 | -0.006 | 0.087 | 0.715 | 0.232 |
| Yes | 410 (58.8) | | | | | | |
| No | 287 (41.2) | | | | | | |
| Study site of recruitment | | 0.018 | 0.621 | -0.111 | 0.075 | 0.659 | 0.284 |
| Private | 248 (35.6) | | | | | | |
| Public | 449 (64.4) | | | | | | |
| What is your home language | | | | | 0.060 | 0.645 | |
| English | 241 (34.6) | | | | | | 0.000 |
| Indigenous African language | 326 (46.8) | | | | | | 0.680 |
| Others | 130 (18.7) | | | | | | 1.316 |
| Assessment of stage of malignancy | | 0.141 | 0.781 | | 0.062 | 0.673 | |
| Unable to stage | 80 (11.5) | | | 0.000 | | | 0.794 |
| Stage I and II | 157 (22.5) | | | 0.000 | | | 0.000 |
| Stage III | 240 (34.4) | | | -0.023 | | | 0.783 |
| Stage IV | 220 (31.6) | | | 2.055 | | | 1.472 |
| Did patient cancer recur after the follow-up | | | | | 0.037 | 0.648 | 1.700 |
| Recurrence | 433 (62.1) | | | | | | |
| Non-recurrence | 264 (37.9) | | | | | | |
| Receipt of chemotherapy | | 0.051 | 0.683 | 0.884 | | | |
| Yes | 246 (35.3) | | | | | | |
| No | 451 (64.7) | | | | | | |
| Treatment decision, MDT1 | | 0.049 | 0.686 | 0.475 | | | |
| Chemotherapy | 214 (30.7) | | | | | | |
| No chemotherapy | 483 (69.3) | | | | | | |
| What previous treatment was given for this patient's colorectal cancer prior to recruitment | | 0.028 | 0.666 | 1.205 | | | |
| Surgical | 68 (10.0) | | | | | | |
| Non-surgical | 629 (90.0) | | | | | | |
| Was this colorectal cancer diagnosed prior to recruitment | | 0.018 | 0.651 | 0.594 | | | |
| Yes | 112 (16.1) | | | | | | |
| No | 585 (83.9) | | | | | | |
| Colonoscopy done prior to first visit to the colorectal unit | | 0.023 | 0.634 | 0.157 | | | |
| Yes | 451 (64.7) | | | | | | |
| No | 246 (35.3) | | | | | | |

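
The IG and OneR columns above are filter scores computed one feature at a time against the outcome. As a minimal sketch of what such scores measure, assuming IG is the entropy reduction in the outcome and OneR is the accuracy of a one-rule classifier (the source does not spell out either definition), the following uses hypothetical toy data, not study data. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Outcome entropy minus its expected entropy after splitting on the feature."""
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[x].append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def one_r_accuracy(feature, labels):
    """Accuracy of a one-rule classifier: predict the majority class per feature level."""
    groups = defaultdict(Counter)
    for x, y in zip(feature, labels):
        groups[x][y] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(labels)

# Hypothetical toy data (NOT from the study): stage vs. recurrence (1 = recurred).
stage = ["I-II", "I-II", "I-II", "III", "III", "III", "IV", "IV", "IV", "IV"]
recur = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]

ig = information_gain(stage, recur)
acc = one_r_accuracy(stage, recur)
print(f"information gain = {ig:.3f} bits, OneR accuracy = {acc:.3f}")
```

Both scores rise as a feature separates the outcome classes more cleanly, which is why a strongly prognostic feature such as stage would be expected to score high on each.
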
Risk factor ranking in descending order showing the relative importance of each feature to modelling WDGMC CRC recurrence, as ranked by each predictive model.

| Rank | | | | | | |
|---|---|---|---|---|---|---|
| 1 | Radiologic stage | Radiologic stage | Prior colonoscopy | Radiologic stage | Radiologic stage | Age at 1st visit |
| 2 | Chemotherapy | Chemotherapy | Radiologic stage | Age at 1st visit | Chemotherapy | Chemotherapy |
| 3 | Hospital | Treatment decision | Age at 1st visit | Chemotherapy | Treatment decision | Radiologic stage |
| 4 | Treatment decision | Procedure | Chemotherapy | Treatment decision | Procedure | Procedure |
| 5 | Age at 1st visit | Age at 1st visit | Treatment decision | Hospital | Age at 1st visit | CRC prior to recruit |
| 6 | Prior colonoscopy | Prior colonoscopy | Procedure | Procedure | Prior colonoscopy | Prior colonoscopy |
| 7 | Procedure | CRC prior to recruit | Prior CRC treatment | Prior CRC treatment | CRC prior to recruit | Treatment decision |
| 8 | Prior CRC treatment | Prior CRC treatment | Hospital | Prior colonoscopy | Prior CRC treatment | Hospital |
| 9 | CRC prior to recruit | Hospital | CRC prior to recruit | CRC prior to recruit | Hospital | Prior CRC treatment |

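
One way to summarise a table like the recurrence ranking above is to average each feature's rank across the six columns. The sketch below does this with the rankings copied column by column from that table. Mean-rank aggregation is an illustration only and is not described in the source as the authors' method; the columns are left unnamed because their model labels are not preserved here.

```python
from statistics import mean

# Each list is one column of the recurrence ranking table, best feature first.
columns = [
    ["Radiologic stage", "Chemotherapy", "Hospital", "Treatment decision",
     "Age at 1st visit", "Prior colonoscopy", "Procedure",
     "Prior CRC treatment", "CRC prior to recruit"],
    ["Radiologic stage", "Chemotherapy", "Treatment decision", "Procedure",
     "Age at 1st visit", "Prior colonoscopy", "CRC prior to recruit",
     "Prior CRC treatment", "Hospital"],
    ["Prior colonoscopy", "Radiologic stage", "Age at 1st visit", "Chemotherapy",
     "Treatment decision", "Procedure", "Prior CRC treatment",
     "Hospital", "CRC prior to recruit"],
    ["Radiologic stage", "Age at 1st visit", "Chemotherapy", "Treatment decision",
     "Hospital", "Procedure", "Prior CRC treatment",
     "Prior colonoscopy", "CRC prior to recruit"],
    ["Radiologic stage", "Chemotherapy", "Treatment decision", "Procedure",
     "Age at 1st visit", "Prior colonoscopy", "CRC prior to recruit",
     "Prior CRC treatment", "Hospital"],
    ["Age at 1st visit", "Chemotherapy", "Radiologic stage", "Procedure",
     "CRC prior to recruit", "Prior colonoscopy", "Treatment decision",
     "Hospital", "Prior CRC treatment"],
]

# Rank = 1-based position within a column; average it over all columns.
mean_rank = {
    feature: mean(col.index(feature) + 1 for col in columns)
    for feature in columns[0]
}

# Sort features from most to least important on average.
consensus = sorted(mean_rank.items(), key=lambda item: item[1])
for feature, rank in consensus:
    print(f"{rank:4.2f}  {feature}")
```

On these rankings, radiologic stage has the lowest mean rank (1.5) and chemotherapy the second lowest (2.5), which matches the visual impression that these two features lead most columns.
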
Risk factor ranking in descending order showing the relative importance of each feature to modelling WDGMC CRC survival, as ranked by each predictive model.

| Rank | | | | | | |
|---|---|---|---|---|---|---|
| 1 | Histology | Histology | Histology | Histology | Histology | CRC complications |
| 2 | Recurrence status | CRC complications | Hospital | Hospital | CRC complications | Radiological stage |
| 3 | Hospital | Procedure | Radiological stage | Radiological stage | Procedure | Histology |
| 4 | Radiological stage | Hospital | Recurrence status | CRC complications | Hospital | Hospital |
| 5 | Language | Radiological stage | Language | Procedure | Radiological stage | Recurrence status |
| 6 | CRC complications | Language | CRC complications | Recurrence status | Language | Race |
| 7 | Procedure | Recurrence status | Race | Language | Recurrence status | Procedure |
| 8 | Race | Race | Procedure | Race | Race | Language |

Figure 2. Forest plots from logistic regression showing the effect of each feature on WDGMC CRC (A) recurrence and (B) survival. Features with significant effects are marked with asterisks. Features whose effect values are written in red decrease the odds of CRC recurrence or CRC survival.
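
A forest plot of this kind typically displays, for each feature, an odds ratio with a confidence interval derived from a logistic regression coefficient. The sketch below shows that conversion, assuming Wald-type 95% intervals (the source does not state how its intervals were computed). The coefficient and standard error are hypothetical, and `odds_ratio_ci` is an illustrative helper name.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald interval from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error (NOT taken from the study).
odds_ratio, lower, upper = odds_ratio_ci(beta=-0.7, se=0.2)

# An odds ratio below 1 means the feature decreases the odds of the outcome.
decreases_odds = odds_ratio < 1
# The effect is significant at the 5% level when the interval excludes 1.
significant = not (lower <= 1 <= upper)

print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"decreases odds: {decreases_odds}, significant: {significant}")
```
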
AU-ROC performance scores (with confidence interval) examining the consistency of the predictive models from the WDGMC CRC recurrence data and across the three simulated datasets used for model validation.

| Model | WDGMC data | Simulated dataset 1 | Simulated dataset 2 | Simulated dataset 3 |
|---|---|---|---|---|
| LR | 0.861 (0.840–0.899) | 0.941 (0.919–0.964) | 0.923 (0.917–0.930) | 0.927 (0.922–0.932) |
| NB | 0.854 (0.819–0.890) | 0.932 (0.908–0.965) | 0.925 (0.917–0.933) | 0.925 (0.921–0.929) |
| C5.0 | 0.867 (0.831–0.903) | 0.929 (0.904–0.954) | 0.937 (0.931–0.943) | 0.945 (0.943–0.948) |
| RF | 0.863 (0.828–0.898) | 0.931 (0.905–0.957) | 0.933 (0.925–0.941) | 0.945 (0.941–0.949) |
| SVM | 0.867 (0.833–0.900) | 0.940 (0.918–0.963) | 0.923 (0.916–0.930) | 0.930 (0.907–0.963) |
| ANN | 0.870 (0.835–0.905) | 0.955 (0.940–0.971) | 0.947 (0.942–0.951) | 0.953 (0.949–0.958) |

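
Each cell in the AU-ROC table pairs a point estimate with a confidence interval. The source does not say how the intervals were obtained, so the following is only a sketch of one common approach: a rank-based AUC with a percentile bootstrap interval. The scores and labels are hypothetical, and the function names are illustrative.

```python
import random

def auc(scores, labels):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]   # resample cases with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):                 # both classes are needed for an AUC
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical predicted probabilities and outcomes (NOT from the study).
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.8, 0.9]

point = auc(scores, labels)
lower, upper = bootstrap_ci(scores, labels)
print(f"AUC = {point:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Intervals built this way narrow as the sample grows, which is consistent with the tighter intervals reported for the simulated datasets than for the WDGMC data.
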
AU-ROC performance scores (with confidence interval) examining the consistency of the predictive models from the WDGMC CRC survival data and across the three simulated datasets used for model validation.

| Model | WDGMC data | Simulated dataset 1 | Simulated dataset 2 | Simulated dataset 3 |
|---|---|---|---|---|
| LR | 0.816 (0.776–0.856) | 0.912 (0.893–0.930) | 0.907 (0.897–0.916) | 0.911 (0.905–0.918) |
| NB | 0.811 (0.771–0.850) | 0.907 (0.891–0.923) | 0.904 (0.893–0.914) | 0.907 (0.901–0.914) |
| C5.0 | 0.811 (0.771–0.855) | 0.902 (0.886–0.917) | 0.906 (0.897–0.914) | 0.911 (0.904–0.918) |
| RF | 0.806 (0.769–0.843) | 0.893 (0.876–0.909) | 0.900 (0.890–0.910) | 0.907 (0.900–0.914) |
| SVM | 0.806 (0.734–0.847) | 0.910 (0.893–0.927) | 0.907 (0.897–0.916) | 0.911 (0.904–0.917) |
| ANN | 0.818 (0.781–0.856) | 0.911 (0.893–0.929) | 0.909 (0.900–0.918) | 0.913 (0.907–0.920) |