| Literature DB >> 31857990 |
Abstract
Background: Breast Cancer (BC) is a known global crisis. The World Health Organization reports a global 2.09 million incidences and 627,000 deaths in 2018 relating to BC. The traditional BC screening method in developed countries is mammography, whilst developing countries employ breast self-examination and clinical breast examination. The prominent gold standard for BC detection is triple assessment: i) clinical examination, ii) mammography and/or ultrasonography; and iii) Fine Needle Aspirate Cytology. However, the introduction of cheaper, efficient and noninvasive methods of BC screening and detection would be beneficial. Design and methods: We propose the use of eight machine learning algorithms: i) Logistic Regression; ii) Support Vector Machine; iii) K-Nearest Neighbors; iv) Decision Tree; v) Random Forest; vi) Adaptive Boosting; vii) Gradient Boosting; viii) eXtreme Gradient Boosting, and blood test results using BC Coimbra Dataset (BCCD) from University of California Irvine online database to create models for BC prediction. To ensure the models' robustness, we will employ: i) Stratified k-fold Cross- Validation; ii) Correlation-based Feature Selection (CFS); and iii) parameter tuning. The models will be validated on validation and test sets of BCCD for full features and reduced features. Feature reduction has an impact on algorithm performance. Seven metrics will be used for model evaluation, including accuracy. Expected impact of the study for public health: The CFS together with highest performing model(s) can serve to identify important specific blood tests that point towards BC, which may serve as an important BC biomarker. Highest performing model(s) may eventually be used to create an Artificial Intelligence tool to assist clinicians in BC screening and detection. ©Copyright: the Author(s), 2019.Entities:
Keywords: Breast cancer; biomarkers; blood tests; cancer screening; machine learning
Year: 2019 PMID: 31857990 PMCID: PMC6902303 DOI: 10.4081/jphr.2019.1677
Source DB: PubMed Journal: J Public Health Res ISSN: 2279-9028
Figure 1.Global traditional versus proposed breast cancer screening and detection methods, indicating that the best performing model(s) can eventually be used as basis to create an Artificial Intelligence (AI) tool as a first approach to identify breast cancer in patients.
Figure 2.Proposed hybrid framework of the modelling process to predict breast cancer using the online breast cancer Coimbra Dataset (BCCD) and various machine learning algorithms: Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM) and eXtreme Gradient Boosting (XGBoost).