Yifeng Dou, Wentao Meng.
Abstract
Breast cancer is one of the cancers to which women are most vulnerable. Its incidence in China is increasing by about 3% per year, and patients are being diagnosed at younger ages. It is therefore necessary to study breast cancer risk, including the causes of the disease and risk prediction based on historical data. Data-based statistical learning is an important branch of modern computational intelligence, and using machine learning to predict and classify unseen data offers a new approach to breast cancer diagnosis. This paper proposes an improved optimization algorithm (GSP_SVM) that combines a genetic algorithm, particle swarm optimization, and simulated annealing with the support vector machine algorithm. In the experiments, classification accuracy, MCC, AUC, and the other indicators all reach a high level. Comparison with other optimization algorithms shows that the method can effectively support decision-making in computer-aided breast cancer diagnosis, thus significantly improving the diagnostic efficiency of medical institutions. Finally, the paper makes a preliminary exploration of applying the algorithm to detecting and classifying breast cancer in different periods, and discusses its application to multi-class classification by comparing it with other algorithms.
Keywords: breast cancer; classification; computer-aided diagnosis; machine learning; optimization; support vector machine
Year: 2021 PMID: 34291042 PMCID: PMC8287651 DOI: 10.3389/fbioe.2021.698390
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
FIGURE 1 Flow chart of the GSP_SVM algorithm.
Contingency table for binary classification problems.
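| Actual | Predicted: judged as the class | Predicted: not judged as the class |
| The record belongs to the class | True positive (TP) | False negative (FN) |
| The record does not belong to the class | False positive (FP) | True negative (TN) |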
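The indicators reported in the result tables (precision, recall, accuracy, MCC) are all derived from the four cells of a binary contingency table. The following is a minimal sketch using the standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper, and F1 is included only as a commonly reported companion measure.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute common indicators from the four cells of a 2x2 contingency table.

    tp, fn, fp, tn are counts of true positives, false negatives,
    false positives, and true negatives. Ratios with a zero denominator
    are reported as 0.0.
    """
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient (MCC), ranging from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not data from the paper).
metrics = binary_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often reported alongside accuracy when the two classes are unbalanced.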
Experimental results.
| Evaluating indicator | Proportion of training data | SVM | PCA_SVM | GA_SVM | GS_SVM | PSO_SVM | GSP_SVM |
| Precision | 50% | 0.9853 | 0.9417 | 0.9906 | 0.9804 | 0.9450 | 0.9716 |
| 60% | 0.9695 | 0.9519 | 0.9708 | 0.9711 | 0.9586 | ||
| 70% | 0.9841 | 0.9697 | 0.9853 | 0.9924 | 0.9699 | ||
| 80% | 0.9878 | 0.9667 | 0.9865 | 0.9667 | 0.9778 | ||
| 90% | 0.9773 | 0.9524 | 0.9722 | 0.9778 | 0.9778 | ||
| Recall | 50% | 0.9526 | 0.9713 | 0.9251 | 0.9524 | 0.9716 | |
| 60% | 0.9578 | 0.9700 | 0.9595 | 0.9711 | 0.9701 | ||
| 70% | 0.9612 | 0.9771 | 0.9710 | 0.9489 | 0.9577 | ||
| 80% | 0.9529 | 0.9667 | 0.9359 | 0.9670 | 0.9655 | ||
| 90% | 0.9556 | 0.9756 | 0.9722 | 0.9565 | 0.9762 | ||
| G | 50% | 1.0009 | 0.9061 | 0.9934 | 0.9172 | 0.9677 | |
| 60% | 0.9740 | 0.9112 | 0.9698 | 0.9640 | 0.9483 | ||
| 70% | 0.9928 | 0.9562 | 0.9841 | 1.0053 | 0.9525 | ||
| 80% | 1.0039 | 0.9508 | 1.0159 | 0.9429 | 0.9717 | ||
| 90% | 0.9785 | 0.9374 | 0.9825 | 0.9760 | 0.9673 | ||
| 50% | 0.9687 | 0.9611 | 0.9567 | 0.9662 | 0.9626 | ||
| 60% | 0.9636 | 0.9648 | 0.9651 | 0.9711 | 0.9643 | ||
| 70% | 0.9725 | 0.9734 | 0.9701 | 0.9749 | 0.9773 | ||
| 80% | 0.9701 | 0.9667 | 0.9605 | 0.9775 | 0.9724 | ||
| 90% | 0.9663 | 0.9639 | 0.9722 | 0.9670 | 0.9778 | ||
| MCC | 50% | 0.9209 | 0.8933 | 0.8828 | 0.9146 | 0.9008 | |
| 60% | 0.9082 | 0.8921 | 0.9058 | 0.9211 | 0.9073 | ||
| 70% | 0.9273 | 0.9252 | 0.9336 | 0.9150 | 0.9222 | ||
| 80% | 0.9226 | 0.9014 | 0.9122 | 0.9355 | 0.9176 | ||
| 90% | 0.9030 | 0.9077 | 0.9410 | 0.9009 | 0.9343 | ||
| AUC | 50% | 0.9707 | 0.9652 | 0.9561 | 0.9695 | 0.9681 | |
| 60% | 0.9592 | 0.9761 | 0.9688 | 0.9705 | 0.9652 | ||
| 70% | 0.9737 | 0.9648 | 0.9758 | 0.9573 | 0.9752 | ||
| 80% | 0.9812 | 0.9708 | 0.9651 | 0.9711 | 0.9731 | ||
| 90% | 0.9729 | 0.9512 | 0.9679 | 0.9605 | 0.9720 |
Classification accuracy of the algorithms.
| Proportion of training data | SVM | PCA_SVM | GA_SVM | GS_SVM | PSO_SVM | GSP_SVM |
| 50% | 0.9619 | 0.9501 | 0.9443 | 0.9589 | 0.9531 | 0.9648 |
| 60% | 0.9560 | 0.9524 | 0.9560 | 0.9634 | 0.9560 | |
| 70% | 0.9657 | 0.9657 | 0.9608 | 0.9657 | ||
| 80% | 0.9630 | 0.9559 | 0.9559 | 0.9706 | 0.9632 | |
| 90% | 0.9559 | 0.9559 | 0.9706 | 0.9559 | 0.9706 | |
| Avg | 0.9605 | 0.9560 | 0.9595 | 0.9619 | 0.9617 |
Evaluation results for the multi-class indicators.
| Evaluating indicator | Proportion of training data | SVM | PCA_SVM | GA_SVM | GS_SVM | PSO_SVM | GSP_SVM |
| Accuracy_score | 50% | 0.8768 | 0.8261 | 0.8551 | 0.8478 | 0.8261 | 0.8957 |
| 60% | 0.9455 | 0.9273 | 0.9364 | 0.8455 | 0.8273 | ||
| 70% | 0.9518 | 0.9157 | 0.8193 | ||||
| 80% | 0.9455 | 0.8909 | 0.9455 | ||||
| 90% | 0.9630 | 0.9630 | 0.9630 | 0.9630 | 0.8889 | ||
| Precision_score | 50% | 0.8768 | 0.8261 | 0.8551 | 0.8478 | 0.8261 | |
| 60% | 0.9455 | 0.9273 | 0.9364 | 0.8455 | 0.8273 | ||
| 70% | 0.9518 | 0.9157 | 0.8193 | ||||
| 80% | 0.9455 | 0.8909 | 0.9455 | ||||
| 90% | 0.9630 | 0.9630 | 0.9630 | 0.9630 | 0.8889 | ||
| Recall_score | 50% | 0.8768 | 0.8261 | 0.8551 | 0.8478 | 0.8261 | |
| 60% | 0.9455 | 0.9273 | 0.9364 | 0.8455 | 0.8273 | ||
| 70% | 0.9518 | 0.9157 | 0.8193 | ||||
| 80% | 0.9455 | 0.8909 | 0.9455 | ||||
| 90% | 0.9630 | 0.9630 | 0.9630 | 0.9630 | 0.8889 | ||
| F1_score | 50% | 0.8768 | 0.8261 | 0.8551 | 0.8478 | 0.8261 | |
| 60% | 0.9455 | 0.9273 | 0.9364 | 0.8455 | 0.8273 | ||
| 70% | 0.9518 | 0.9157 | 0.8193 | ||||
| 80% | 0.9455 | 0.8909 | 0.9455 | ||||
| 90% | 0.9630 | 0.9630 | 0.9630 | 0.9630 | 0.8889 | ||
| Hamming_loss↓ | 50% | 0.1232 | 0.1739 | 0.1449 | 0.1522 | 0.1739 | |
| 60% | 0.0545 | 0.0727 | 0.0636 | 0.1545 | 0.1727 | ||
| 70% | 0.0482 | 0.0361 | 0.0361 | 0.0843 | 0.1807 | ||
| 80% | 0.0482 | 0.0361 | 0.0361 | 0.0843 | 0.1807 | ||
| 90% | 0.0370 | 0.0370 | 0.0370 | 0.0370 | 0.1111 | ||
| Cohen_kappa_score | 50% | 0.8053 | 0.7170 | 0.7647 | 0.7533 | 0.7205 | |
| 60% | 0.9132 | 0.8896 | 0.8952 | 0.7553 | 0.7149 | ||
| 70% | 0.9224 | 0.9412 | 0.9397 | 0.8602 | 0.7107 | ||
| 80% | 0.9403 | 0.9369 | 0.9142 | 0.8291 | 0.9127 | ||
| 90% | 0.9444 | 0.9429 | 0.9330 | 0.9363 | 0.8273 | ||
| Jaccard_score | 50% | 0.7899 | 0.7108 | 0.7452 | 0.7358 | 0.7006 | |
| 60% | 0.8966 | 0.8648 | 0.8809 | 0.7264 | 0.6987 | ||
| 70% | 0.9357 | 0.9293 | 0.8478 | 0.6901 | 0.9348 | ||
| 80% | 0.9300 | 0.8961 | 0.8110 | 0.8978 | 0.9340 | ||
| 90% | 0.9444 | 0.9383 | 0.9288 | 0.9290 | 0.8008 |
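In the table above, the Accuracy_score, Precision_score, Recall_score, and F1_score rows carry identical values. This is what micro-averaging produces for single-label multi-class predictions, where Hamming loss also reduces to 1 minus accuracy. The sketch below illustrates that relationship with standard definitions. That the source used micro-averaging is an assumption, the averaging behind its Jaccard_score is not stated (macro-averaging is used here purely for illustration), and the function name and labels are hypothetical.

```python
from collections import Counter


def multiclass_report(y_true, y_pred):
    """Single-label multi-class indicators using standard definitions."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / n
    # With exactly one label per sample, Hamming loss is the error rate.
    hamming = 1.0 - accuracy

    # Micro-averaging pools counts over classes: every wrong prediction is
    # one false positive (for the predicted class) and one false negative
    # (for the true class), so precision, recall, and F1 equal accuracy.
    tp, fp, fn = correct, n - correct, n - correct
    micro_p = tp / (tp + fp)
    micro_r = tp / (tp + fn)
    micro_f1 = (2 * micro_p * micro_r / (micro_p + micro_r)
                if (micro_p + micro_r) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (accuracy - p_e) / (1.0 - p_e) if p_e != 1.0 else 0.0

    # Macro-averaged Jaccard: mean over classes of TP / (TP + FP + FN).
    jaccards = []
    for c in labels:
        tp_c = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp_c = pred_counts[c] - tp_c
        fn_c = true_counts[c] - tp_c
        jaccards.append(tp_c / (tp_c + fp_c + fn_c))
    macro_jaccard = sum(jaccards) / len(jaccards)

    return {
        "accuracy": accuracy, "precision": micro_p, "recall": micro_r,
        "f1": micro_f1, "hamming_loss": hamming, "kappa": kappa,
        "jaccard": macro_jaccard,
    }


# Hypothetical labels, for illustration only (not data from the paper).
y_true = ["A", "A", "B", "B", "C", "C", "C", "A"]
y_pred = ["A", "B", "B", "B", "C", "C", "A", "A"]
report = multiclass_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Under this reading, the four identical rows carry the same information as accuracy alone, so Cohen_kappa_score and Jaccard_score are the rows that add something new when comparing the algorithms.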
Optimal parameters found by each algorithm.
| Parameters | GA_SVM | GS_SVM | PSO_SVM | GSP_SVM |
| bestc | 0.84731 | 0.0625 | 77.691 | 0.1 |
| bestg | 4.0526 | 0.7579 | 0.01 | 1.0164 |
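The `bestc` and `bestg` values above are presumably the SVM penalty parameter and the kernel parameter found by each optimizer. As a rough illustration of how such a search can work, the sketch below runs a particle swarm whose personal-best update uses a simulated-annealing (Metropolis) acceptance rule. It is not the authors' GSP_SVM: the genetic-algorithm component is omitted, all names and default settings are hypothetical, and `surrogate_error` is a made-up stand-in for the cross-validated SVM error that a real search would evaluate.

```python
import math
import random


def surrogate_error(log_c, log_g):
    """Hypothetical stand-in for cross-validated error; minimum at (0, -1)."""
    return (log_c - 0.0) ** 2 + (log_g + 1.0) ** 2


def pso_sa_search(objective, bounds, n_particles=20, n_iter=100, seed=0,
                  w=0.7, c1=1.5, c2=1.5, t0=1.0, cooling=0.95):
    """Minimize objective(*x) within bounds using PSO with SA acceptance."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(*p) for p in pos]
    best_i = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[best_i][:], pbest_val[best_i]
    temp = t0
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO velocity update: inertia + cognitive + social.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(*pos[i])
            delta = val - pbest_val[i]
            # Metropolis rule: always accept an improvement; accept a worse
            # point with probability exp(-delta / temp) to escape local optima.
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                pbest[i], pbest_val[i] = pos[i][:], val
            # The global best is the best point ever seen, so it never worsens.
            if val < gbest_val:
                gbest, gbest_val = pos[i][:], val
        temp *= cooling  # cool down so that acceptance becomes stricter
    return gbest, gbest_val


search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]  # assumed log10 ranges
best, best_err = pso_sa_search(surrogate_error, search_bounds)
print("log10(c), log10(g):", best, "error:", best_err)
```

Searching in log space is a common choice because useful penalty and kernel parameters often span several orders of magnitude, as the spread of `bestc` values in the table (0.0625 to 77.691) suggests. In a real run, the objective would train and cross-validate an SVM at `c = 10 ** log_c` and `g = 10 ** log_g`.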
FIGURE 2 Visualization of the iterative optimization.
FIGURE 3 (A–D) Classification results with 60, 70, 80, and 90% of the data used for training, respectively.