Nosayba Al-Azzam (1), Ibrahem Shatnawi (2). (1) Department of Physiology and Biochemistry, Faculty of Medicine, Jordan University of Science and Technology, Irbid, 22110, Jordan. (2) Independent Researcher in Data Analytics, Jordan.
Abstract
BACKGROUND: Breast cancer is the most common cancer in US women and the second leading cause of cancer death among women. OBJECTIVES: To compare and evaluate the performance and accuracy of key supervised and semi-supervised machine learning algorithms for breast cancer prediction. MATERIALS AND METHODS: We used nine machine learning classification algorithms for supervised learning (SL) and semi-supervised learning (SSL): 1) logistic regression; 2) Gaussian naive Bayes; 3) linear support vector machine; 4) RBF support vector machine; 5) decision tree; 6) random forest; 7) XGBoost; 8) gradient boosting; 9) k-nearest neighbors (KNN). The Wisconsin Diagnosis Cancer dataset was used to train and test these models. To ensure the robustness of the models, we applied K-fold cross-validation and optimized the hyperparameters. We evaluated and compared the models using accuracy, precision, recall, F1-score, and ROC curves. RESULTS: All models performed well under both SL and SSL. SSL achieved high accuracy (90%-98%) with just half of the training data. The KNN model for SL and logistic regression for SSL achieved the highest accuracy, 98%. CONCLUSION: The accuracies of the SSL algorithms are very close to those of the SL algorithms. The accuracies of all models are in the range of 91-98%. SSL is a promising and competitive approach to this problem. With a small sample of labeled data and low computational power, SSL is fully capable of replacing SL algorithms in diagnosing tumor type.
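The evaluation measures and the K-fold procedure named in the methods can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: the function names `classification_metrics` and `k_fold_indices`, the label coding (1 = malignant, 0 = benign), and the toy labels are all assumptions made for illustration. A real analysis would normally shuffle or stratify the samples before splitting.

```python
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels.

    Assumed coding (hypothetical): 1 = malignant (positive), 0 = benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold. No shuffling
    or stratification is done here, to keep the sketch short.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


# Toy labels, invented for illustration only (not data from the study).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
toy_folds = k_fold_indices(n_samples=10, k=3)
```

In a cross-validated comparison, each model would be fitted on the `train` indices of a fold, scored on the `test` indices with `classification_metrics`, and the per-fold scores averaged.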