Abstract
Objective 1 of this study was to investigate trends in publications on breast cancer (BC) prediction using machine learning (ML) by analysing country, first author, journal, institutional collaborations and co-occurrence of author keywords. Objective 2 was to provide a review of studies on BC prediction using ML and a blood analysis dataset (Breast Cancer Coimbra Dataset [BCCD]), and objective 3 was to provide a brief review of studies on BC prediction using ML and patients' fine needle aspirate cytology data (Wisconsin Breast Cancer Dataset [WBCD]). The study design was as follows: for objective 1, bibliometric analysis, data source: PubMed (2015-2019); for objective 2, systematic review, data source: Google and Google Scholar (2018-2019); for objective 3, systematic review, data source: Google Scholar (2016-2019). The inclusion criteria for objective 1 were all publications returned by the searches. All English papers that had a 'PDF' option in the search results were included for objective 2. A sample of the English papers with a 'PDF' option was included for objective 3. All 116 female patients from the BCCD, consisting of 64 positive BC patients and 52 controls, were included in the study for objective 2. For the WBCD, all 699 female patients, comprising 458 with a benign BC tumour and 241 with a malignant BC tumour, were included for objective 3. All 2928 publications were included for objective 1. The results showed that the United States of America (USA) produced the highest number of publications (n=803). In total, 2419 first authors contributed towards the publications. Breast Cancer Research and Treatment was the highest ranked journal. Institutional collaborations mainly occurred within the USA. The use of ML for BC screening and detection was the most researched topic. A total of 19 distinct papers were included for objectives 2 and 3. The findings from these studies were never presented to clinicians for validation.
In conclusion, the use of ML for BC screening and detection is promising. ©Copyright: the Author(s).
Keywords: Breast cancer; blood tests; cancer screening; fine needle aspiration; machine learning
Year: 2020 PMID: 32642458 PMCID: PMC7330506 DOI: 10.4081/jphr.2020.1772
Source DB: PubMed Journal: J Public Health Res ISSN: 2279-9028
Summary list of studies focused on BC prediction using ML classification with the BCCD and the WBCD. The contents of this table are ordered first by dataset, and then by summary measure, from highest to lowest. Boldface denotes the highest result per dataset and summary measure.
| Reference | Dataset | Country* | Sampling strategy | ML algorithm | Summary measure (in %) |
|---|---|---|---|---|---|
| Hernández-Julio et al. [34] | BCCD | Colombia | 10-fold CV | Clusters + pivot table | |
| Singh [26] | BCCD | India | 67-33 training-testing | K-NN | 92.11 (Accuracy) |
| Polat and Senturk [27] | BCCD | Turkey | 10-fold CV | AdaBoost | 91.37 (Accuracy) |
| Akben [28] | BCCD | Turkey | 10-fold CV | DT | 90.52 (Accuracy) |
| Islam and Poly [35] | BCCD | Taiwan (China) | 10-fold CV | K-NN | 86.00 (Accuracy) |
| Araújo et al. [33] | BCCD | Brazil | 70-30 training-testing; 10-fold CV | NN | 80.67 (Accuracy) |
| Aslan et al. [31] | BCCD | Turkey | 80-20 training-testing | ELM | 80.00 (Accuracy) |
| Livieris [32] | BCCD | Greece | 10-fold CV | K-NN | 62.00 (Accuracy) |
| Patrício et al. [20] | BCCD | Portugal | MCCV | SVM | |
| Li and Chen [30] | BCCD | United Kingdom | 70-30 training-testing | RF | 78.50 (AUC) |
| Hung et al. [25] | BCCD | Vietnam | 80-20 training-testing | DT | |
| Abdar and Makarenkov [43] | WBCD | Canada | 50-50 training-testing | CWV-BANN-SVM (an ensemble of ANN + SVM) | |
| Elgedawy [41] | WBCD | Saudi Arabia | 75-25 training-testing | RF | 99.42 (Accuracy) |
| Hernández-Julio et al. [34] | WBCD | Colombia | 10-fold CV | Clusters + pivot table | 99.40 (Accuracy) |
| Chaurasia et al. [42] | WBCD | India | Stratified 10-fold CV | NB | 97.36 (Accuracy) |
| Asri et al. [36] | WBCD | Morocco | 10-fold CV | SVM | 97.13 (Accuracy) |
| Alzubaidi et al. [38] | WBCD | United Kingdom | LOOCV | SVM (quadratic-linear kernel); K-NN (Minkowski and Euclidean distance measures) | 97.00 (Accuracy); 97.00 (Accuracy) |
| Islam et al. [40] | WBCD | Bangladesh | 10-fold CV | SVM | 97.00 (Accuracy) |
| Chaurasia and Pal [39] | WBCD | India | 10-fold CV | SMO (SVM) | 96.20 (Accuracy) |
| Bazazeh and Shubair [37] | WBCD | United Arab Emirates | 10-fold CV | RF | |
| Li and Chen [30] | WBCD | United Kingdom | 70-30 training-testing | RF | 98.90 (AUC) |
*Country is based on the first author’s affiliation. BC, breast cancer; ML, machine learning; WBCD, Wisconsin Breast Cancer Dataset; BCCD, Breast Cancer Coimbra Dataset; CV, cross-validation; SVM, support vector machine; RF, random forest; AUC, area under the receiver operating characteristic curve; LOOCV, leave-one-out cross-validation; K-NN, K-nearest neighbors; SMO, sequential minimal optimization; NB, naïve Bayes; MCCV, Monte Carlo cross-validation; CI, confidence interval; AdaBoost, adaptive boosting; ELM, extreme learning machine; DT, decision tree; NN, neural network; CWV-BANN-SVM, confidence-weighted voting-boosting artificial neural network-support vector machine.
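The sampling strategies listed in the table (k-fold CV versus a single training-testing split) and the class counts given in the abstract can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the function names `majority_baseline`, `k_fold_indices` and `holdout_indices` are hypothetical, the shuffling seed is arbitrary, and no classifier is trained. It only shows how the samples would be partitioned and what accuracy a trivial "always predict the larger class" rule would reach, which may be a useful reference point when reading the accuracies above.

```python
import random

# Class counts as stated in the abstract; the dictionary layout is an assumption.
DATASETS = {
    "BCCD": {"BC patients": 64, "controls": 52},
    "WBCD": {"benign": 458, "malignant": 241},
}

def majority_baseline(counts):
    """Accuracy (in %) of always predicting the most frequent class."""
    total = sum(counts.values())
    return 100.0 * max(counts.values()) / total

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k near-equal test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th index goes to the same fold, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def holdout_indices(n_samples, train_fraction=0.7, seed=0):
    """Single shuffled split, e.g. 70-30 training-testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_samples)
    return indices[:cut], indices[cut:]

if __name__ == "__main__":
    for name, counts in DATASETS.items():
        n = sum(counts.values())
        folds = k_fold_indices(n, k=10)
        train, test = holdout_indices(n, 0.7)
        print(f"{name}: n={n}, "
              f"majority baseline={majority_baseline(counts):.2f}%, "
              f"10-fold test sizes={sorted({len(f) for f in folds})}, "
              f"70-30 test size={len(test)}")
```

With the counts from the abstract, the majority-class baseline works out to about 55.17% for the BCCD (64 of 116) and about 65.52% for the WBCD (458 of 699). In 10-fold CV every sample is used for testing exactly once, whereas a single 70-30 split tests on only about 35 BCCD samples, so results obtained under different sampling strategies are not directly comparable.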
Figure 1. PRISMA 2009 flow diagram used for the screening and selection of breast cancer (BC) prediction studies focused on the use of machine learning and BC blood analysis data (the BC Coimbra Dataset [BCCD]). (http://prisma-statement.org/PRISMAStatement/FlowDiagram).
Figure 2. Top ten publishing countries for breast cancer prediction using machine learning publications, 2015-2019.
Figure 3. Top ten first authors for publications focused on breast cancer prediction using machine learning from 2015 to 2019.
Figure 4. Top ten journals in which publications on breast cancer prediction using machine learning were most commonly published, 2015-2019.
Figure 5. Institutional collaboration network of breast cancer prediction using machine learning-related publications, 2015-2019. The size of a node indicates the number of collaborative publications: the higher the number of collaborative publications, the larger the node, and vice versa. The distance between two nodes is inversely proportional to the number of collaborations between the two institutions, so shorter distances indicate more collaboration. There are two clusters: the red and green clusters.
Figure 6. Author keywords co-occurrence network of breast cancer prediction using machine learning-related publications from 2015 to 2019. The size of a node depicts the frequency of the keyword: the larger the node, the higher the frequency, and vice versa. The distance between two nodes is inversely proportional to the number of co-occurrences of the two keywords, so shorter distances indicate greater co-occurrence. There are 14 clusters, each represented by a different colour in the figure.