Hailun Liang, Lei Yang, Lei Tao, Leiyu Shi, Wuyang Yang, Jiawei Bai, Da Zheng, Ning Wang, Jiafu Ji.
Abstract
OBJECTIVE: Prevention and early detection of colorectal cancer (CRC) can increase the chances of successful treatment and reduce disease burden. Various data mining technologies have been used to strengthen the early detection of CRC in primary care, but evidence synthesis on the effectiveness of these models is scant. This systematic review synthesizes studies that examine the effect of data mining on improving risk prediction of CRC.
Keywords: Systematic review; colorectal cancer; data mining; disease detection
Year: 2020 PMID: 32410801 PMCID: PMC7219096 DOI: 10.21147/j.issn.1000-9604.2020.02.11
Source DB: PubMed Journal: Chin J Cancer Res ISSN: 1000-9604 Impact factor: 5.087
Study characteristics
| Author | Year | Country | Data source | Data mining methods | Sample | Input | Output | Validation method | Performance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kinar | 2016 | Israel | Maccabi Healthcare Services (MHS) and the United Kingdom Health Improvement Network (THIN) | DT, GB, RF | 451,535 samples aged 50−75 years old | Age, sex, CBC data | CRC risk scores | 10-fold cross-validation | AUC: 0.81 |
| Hoogendoorn | 2016 | Netherlands | Primary care dataset in the region of Utrecht, the Netherlands | CART, LR, RF, NLP | 90,000 samples aged over 30 years old | Age, gender, consults, lab tests, medication, referrals, and consultation notes | Occurrence of CRC | 5-fold stratified cross-validation | AUC: 0.900 (95% CI: 0.886−0.914) |
| Kop | 2016 | Netherlands | EMR dataset from three urban regions in the Netherlands | CART, LR, RF | 263,879 samples aged over 30 years old | Age, gender, general practitioner (GP) consultations, drug prescriptions, specialist or additional diagnostic procedure referrals, and lab tests | Occurrence of CRC | 5-fold stratified cross-validation | AUC: 0.891 (95% CI: 0.879−0.903) |
| Kinar | 2017 | Israel | MHS electronic medical records (EMRs) and Israel Cancer Registry | DT, GB | 112,584 samples aged 50−75 years old | Age, sex, and CBC reports | CRC risk scores | 10-fold cross-validation | Sensitivity at 1% percentile cutoff: 17.3% (23/135) |
| Kop | 2015 | Netherlands | EMR dataset from the Utrecht region in the Netherlands | CART, LR, RF | 219,447 samples aged over 30 years old | Age, gender, GP consults, drug prescriptions, specialist referrals, comorbidity, and lab test outcomes | Occurrence of CRC | 5-fold stratified cross-validation | AUC: 0.881 (95% CI: 0.864−0.898) |
| Birks | 2017 | UK | Clinical Practice Research Datalink (CPRD) from the UK | RF | 2,914,589 samples aged over 40 years old | Sex, year of birth, and CBC results | CRC risk scores | 2-fold cross-validation | OR for a diagnosis of CRC at 99.84 points cutoff: 26.5 (95% CI: 23.3−30.2) |
| Hornbrook | 2017 | USA | Kaiser Permanente Northwest Region (KPNW) electronic medical record system and KPNW Tumor Registry | DT | 17,095 samples aged 40−89 years old | Gender, year of birth, and CBC | CRC risk scores | 10-fold cross-validation | AUC: 0.80 (95% CI: 0.79−0.82) |

DT, decision trees; GB, gradient boosting; CBC, complete blood count; CRC, colorectal cancer; OR, odds ratio; 95% CI, 95% confidence interval; AUC, area under the curve; RF, random forests; CART, classification and regression tree; NLP, natural language processing.
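Most of the studies in the table report performance as AUC. As a reading aid, the following is a minimal sketch of how an AUC can be computed from risk scores and observed outcomes, assuming binary labels (1 = CRC, 0 = no CRC). The function name `auc` and the toy data are illustrative assumptions and are not taken from any of the reviewed studies, which may have used different software and estimators.

```python
def auc(scores, labels):
    """Estimate AUC as the probability that a randomly chosen case
    receives a higher risk score than a randomly chosen non-case.
    Tied scores count as half a win (Mann-Whitney formulation)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: six people, three of whom developed CRC.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
print(f"AUC on toy data: {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to scores that rank cases no better than chance, and 1.0 to scores that rank every case above every non-case. In a k-fold cross-validation setting such as those listed in the table, a function like this would typically be applied to the held-out fold of each split.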