Md Manjurul Ahsan, Shahana Akter Luna, Zahed Siddique.
Abstract
Globally, there is a substantial unmet need to diagnose various diseases effectively. The complexity of different disease mechanisms and the underlying symptoms of patient populations present massive challenges in developing early diagnostic tools and effective treatments. Machine learning (ML), an area of artificial intelligence (AI), enables researchers, physicians, and patients to address some of these issues. Based on relevant research, this review explains how ML is being used to help in the early identification of numerous diseases. First, a bibliometric analysis of publications is carried out using data from the Scopus and Web of Science (WOS) databases. This bibliometric study of 1216 publications was undertaken to determine the most prolific authors, countries, and organizations, as well as the most cited articles. The review then summarizes the most recent trends and approaches in machine-learning-based disease diagnosis (MLBDD), considering the following factors: algorithm, disease type, data type, application, and evaluation metrics. Finally, we highlight key results and provide insights into future trends and opportunities in the MLBDD area.
Keywords: COVID-19; artificial neural networks; convolutional neural networks; deep learning; deep neural networks; diabetes; disease diagnosis; heart disease; kidney disease; machine learning; review
Year: 2022; PMID: 35327018; PMCID: PMC8950225; DOI: 10.3390/healthcare10030541
Source DB: PubMed; Journal: Healthcare (Basel); ISSN: 2227-9032
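The evaluation metrics named in the abstract, and reported throughout the result tables of this review, can all be derived from the four counts of a binary confusion matrix (true/false positives and negatives). The sketch below is a minimal illustration, assuming a binary task in which label 1 means "disease present"; the names `Metrics` and `binary_metrics` and the toy labels are hypothetical and not taken from any reviewed study.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float          # positive predictive value
    recall: float             # sensitivity / true positive rate
    specificity: float        # true negative rate
    f1: float                 # harmonic mean of precision and recall
    balanced_accuracy: float  # mean of recall and specificity


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true: list[int], y_pred: list[int]) -> Metrics:
    """Compute common diagnostic metrics, treating label 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    return Metrics(
        accuracy=_safe_div(tp + tn, len(y_true)),
        precision=precision,
        recall=recall,
        specificity=specificity,
        f1=_safe_div(2 * precision * recall, precision + recall),
        balanced_accuracy=(recall + specificity) / 2,
    )


if __name__ == "__main__":
    # Hypothetical labels: 4 diseased and 6 healthy subjects.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(binary_metrics(y_true, y_pred))
```

For the toy labels, the sketch gives accuracy 0.70, precision 0.60, recall 0.75, and specificity and F1 of about 0.67. Balanced accuracy is included because several reviewed studies report results on imbalanced data, where plain accuracy can look optimistic.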
Figure 1. Different types of machine learning algorithms.
Figure 2. Some of the most well-known CNN models, along with their development time frames.
Figure 3. Illustration of machine learning and deep learning algorithms development timeline.
Figure 4. MLBDD article selection procedure used in this study.
Figure 5. Distribution of articles by subject area.
Figure 6. Bibliometric map representing co-occurrence analysis of keywords in network visualization.
Figure 7. Publications of machine-learning-based disease diagnosis (MLBDD) by year.
Figure 8. Publications by journals.
Top ten most cited MLBDD papers published between 2012 and 2021, based on the Scopus and WOS databases.
| Author(s) | Article Titles | Citation |
|---|---|---|
| [ | Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis | 257 |
| [ | Random forest-based similarity measures for multi-modal classification of Alzheimer’s disease | 248 |
| [ | Effective Heart disease prediction Using hybrid Machine Learning techniques | 214 |
| [ | Deep Convolutional Neural Network based medical image classification for disease diagnosis | 155 |
| [ | Detection of subjects and brain regions related to Alzheimer’s disease using 3D MRI scans based on Eigenbrain and Machine Learning | 147 |
| [ | Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of Heart failure subtypes | 139 |
| [ | DWT based detection of Epileptic Seizure From EEG signals using Naive Bayes and k-NN classifiers | 134 |
| [ | Random Forest ensembles for detection and prediction of Alzheimer’s disease with a good between-cohort robustness | 129 |
| [ | ECG Arrhythmia classification based on optimum-path forest | 111 |
| [ | Gaussian process classification of Alzheimer’s disease and mild cognitive impairment from resting-state fMRI | 107 |
Figure 9. Top ten countries that contributed to the MLBDD literature.
Top ten authors based on total number of publications.
| Author | Total Articles |
|---|---|
| Kim, J. | 20 |
| Wang, Y. | 19 |
| Li, J. | 18 |
| Liu, Y. | 18 |
| Chen, Y. | 17 |
| Kim, H. | 16 |
| Kim, Y. | 15 |
| Lee, S. | 15 |
| Li, Y. | 15 |
| Wang, L. | 15 |
Referenced literature that considered machine-learning-based heart disease diagnosis.
| Study | Contributions | Algorithm | Dataset | Data Type | Performance Evaluation |
|---|---|---|---|---|---|
| [ | Predict coronary heart disease | Gaussian NB, Bernoulli NB, and RF | Cleveland dataset | Tabular | Accuracy—85.00%, 85.00%, and 75.00%, respectively |
| [ | Predicting heart diseases | RF, CNN | Cleveland dataset | Tabular | RF (Accuracy—80.327%, Precision—82%, Recall—80%, F1-score—80%), CNN (Accuracy—78.688%, Precision—80%, Recall—79%, F1-score—78%) |
| [ | Heart disease classification | SVM | Cleveland database | Tabular | Accuracy—73–91% |
| [ | Heart disease classification | Back-propagation NN, LR | Cleveland dataset | Tabular | Accuracy (BNN—85.074%, LR—92.58%) |
| [ | ECG arrhythmia for heart disease detection | SVM and Cuckoo search optimized NN | Cleveland dataset | Tabular | Accuracy (SVM—94.44%) |
| [ | Intelligent scoring system for the prediction of cardiac arrest within 72 h | SVM | Privately owned | Tabular | Specificity—78.8%, Sensitivity—62.3%, Positive predictive value—10%, Negative predictive value—98.2% |
| [ | Automatically identify 5 different categories of heartbeats in ECG signals | CNN | MIT-BIH | Tabular | Accuracy—94% (balanced data), Accuracy—89.07% (imbalanced data) |
| [ | Novel heartbeat recognition method | SVM | MIT-BIH | Tabular | Accuracy—97.77% (imbalanced data), Accuracy—97.08% (noise-free ECGs) |
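The cardiac-arrest scoring row pairs a high negative predictive value (98.2%) with a low positive predictive value (10%). Unlike sensitivity and specificity, predictive values depend on how common the outcome is in the evaluated population. The sketch below applies Bayes' theorem to show this effect; the function name `predictive_values` is hypothetical, and the prevalence values are illustrative assumptions, since the table does not report the cohort's prevalence.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test with the given operating point, via Bayes' theorem."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity), ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    # Expected fractions of the whole population falling in each confusion-matrix cell.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    return ppv, npv


if __name__ == "__main__":
    # Sensitivity and specificity from the cardiac-arrest row; prevalences are assumed.
    for prevalence in (0.036, 0.10, 0.30):
        ppv, npv = predictive_values(0.623, 0.788, prevalence)
        print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the row's sensitivity (62.3%) and specificity (78.8%), an assumed prevalence of 3.6% gives PPV of about 0.099 and NPV of about 0.982, close to the reported figures; raising the assumed prevalence increases PPV and lowers NPV.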
Referenced literature that considered machine-learning-based kidney disease diagnosis.
| Study | Contributions | Algorithm | Dataset | Data Type | Performance Evaluation |
|---|---|---|---|---|---|
| [ | Analysis of Chronic Kidney Disease | NB, DT, and RF | Chronic kidney disease dataset | Tabular | Accuracy—100% (RF) |
| [ | Kidney disease detection and segmentation | ANN and kernel KMC | 100 collected patient ultrasound images | Image | Accuracy—99.61% |
| [ | Classification of Chronic kidney disease | LR, Feedforward NN and Wide DL | Chronic kidney disease dataset | Tabular | Feedforward NN (F1-score—99%, Precision—97%, Recall—99%, and AUC—99%) |
| [ | Chronic kidney disease | CNN-SVM | Privately owned dataset | Tabular | Accuracy—97.67%, Sensitivity—97.5%, Specificity—97.83% |
| [ | Detection and localization of kidneys in patients with autosomal dominant polycystic kidney disease | CNN | Privately owned data | Image | Accuracy—95% |
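Naive Bayes variants (NB, Gaussian NB) appear in the heart and kidney disease studies above. As an illustration of the technique on tabular data, here is a minimal from-scratch Gaussian naive Bayes classifier. It assumes continuous features that are conditionally independent and normally distributed within each class; the class name, the `var_smoothing` parameter, and the toy feature values are hypothetical, and this is not the configuration used by any cited study.

```python
import math
from collections import defaultdict


class GaussianNB:
    """Minimal Gaussian naive Bayes for continuous tabular features."""

    def __init__(self, var_smoothing: float = 1e-9) -> None:
        self.var_smoothing = var_smoothing

    def fit(self, X: list[list[float]], y: list[int]) -> "GaussianNB":
        groups: dict[int, list[list[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.classes_ = sorted(groups)
        self.priors_: dict[int, float] = {}
        self.means_: dict[int, list[float]] = {}
        self.vars_: dict[int, list[float]] = {}
        for c in self.classes_:
            rows = groups[c]
            columns = list(zip(*rows))  # per-feature values for this class
            means = [sum(col) / len(rows) for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / len(rows) + self.var_smoothing
                for col, m in zip(columns, means)
            ]
            self.priors_[c] = len(rows) / len(X)
            self.means_[c] = means
            self.vars_[c] = variances
        return self

    def _log_joint(self, row: list[float], c: int) -> float:
        # log P(c) + sum_j log N(x_j | mean_cj, var_cj); the evidence P(x) is the
        # same for every class, so it can be dropped when taking the argmax.
        total = math.log(self.priors_[c])
        for x, m, v in zip(row, self.means_[c], self.vars_[c]):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    def predict(self, X: list[list[float]]) -> list[int]:
        return [max(self.classes_, key=lambda c: self._log_joint(row, c)) for row in X]


if __name__ == "__main__":
    # Hypothetical two-feature samples: 0 = healthy, 1 = disease present.
    X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 5.2], [3.8, 4.8]]
    y = [0, 0, 0, 1, 1, 1]
    model = GaussianNB().fit(X, y)
    print(model.predict([[1.0, 2.1], [4.1, 5.0]]))
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied, and the small `var_smoothing` constant keeps a feature with identical values in one class from causing division by zero.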
Referenced literature that considered machine-learning-based breast cancer disease diagnosis.
| Study | Contributions | Algorithm | Dataset | Data Type | Performance Evaluation |
|---|---|---|---|---|---|
| [ | Breast cancer | NB, BN, RF and DT (C4.5) | BCSC | Image | ROC—0.937 (BN) |
| [ | Classification of breast density and mass | SVM | Mini-MIAS, INBreast | Image | Mini-MIAS: Accuracy—99%, AUC—0.9325 |
| [ | Classify vector features as malignant or non-malignant | SVM | IRMA, DDSM | Image | IRMA: Sensitivity—99%, Specificity—99%, DDSM: Sensitivity—97%, Specificity—96% |
| [ | Classification of breast cancers by tumor size | LR-ANN | 156 privately owned cases | Image | Accuracy—81.8%, Sensitivity—85.4%, Specificity—77.8%, AUC—0.855 |
| [ | CAD tumor | Binary-LR | 18 privately owned cases | Image | Accuracy—80.39% |
| [ | Differentiating malignant and benign masses | NB, LR-AdaBoost | 246 privately owned images | Image | Sensitivity—90%, Specificity—97.5%, AUC—0.98 |
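Several breast cancer studies above report the area under the ROC curve (AUC), for example 0.855 and 0.98. The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so it can be computed from ranks alone (the Mann-Whitney U statistic). A minimal sketch, assuming binary labels with 1 as malignant and real-valued classifier scores; the function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    n = len(y_true)
    if n != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Assign 1-based ranks in ascending score order, averaging over tied groups.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts positive/negative pairs where the positive is ranked higher.
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]` the sketch returns 0.75: of the four positive/negative pairs, three are ranked correctly. Tied scores receive average ranks, so each tie counts as half a correctly ranked pair.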
Referenced literature that considered machine-learning-based diabetic disease diagnosis.
| Study | Contributions | Algorithm | Dataset | Data Type | Performance Evaluation |
|---|---|---|---|---|---|
| [ | Diabetes and hypertension | DPM | Privately owned | Tabular | Accuracy—96.74% |
| [ | Type 1 diabetes | RF | DIABIMMUNE | Tabular | AUC—0.80 |
| [ | Diabetes classification | KNN | Privately owned (4900 samples) | Tabular | Accuracy—99.9% |
| [ | Predict diabetic retinopathy and identify interpretable biomedical features | SVM, DT, ANN, and LR | Privately owned | Tabular | SVM (Accuracy—79.5%, AUC—0.839) |
| [ | Diabetes classification | PSO and MLPNN | Privately owned | Tabular | Accuracy—98.73% |
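The KNN classifier listed in the diabetes table (and again in the Parkinson’s disease table) labels a new sample by majority vote among its k closest training samples. A minimal sketch, assuming numeric feature vectors, Euclidean distance, and an odd k; the function name `knn_predict` and the data points are hypothetical, and the reviewed studies' distance metric, k, and preprocessing are not given in the table.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Return the majority label among the k training rows closest to `query`."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training rows")
    # Pair each training row's Euclidean distance to the query with its label,
    # then sort by distance only (labels are never compared).
    neighbours = sorted(
        ((math.dist(row, query), label) for row, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical, already-standardized two-feature samples: 0 = non-diabetic, 1 = diabetic.
    train_x = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -0.9), (1.0, 0.9), (1.1, 1.2), (0.8, 1.0)]
    train_y = [0, 0, 0, 1, 1, 1]
    print(knn_predict(train_x, train_y, (-0.7, -1.0), k=3))
    print(knn_predict(train_x, train_y, (0.9, 0.7), k=3))
```

Because KNN relies on raw distances, features measured on very different scales (for example, glucose concentration versus age) should usually be standardized first; otherwise the largest-scale feature dominates the vote.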
Referenced literature that considered machine-learning-based Parkinson’s disease diagnosis.
| Study | Contributions | Algorithm | Dataset | Data Type | Performance Evaluation |
|---|---|---|---|---|---|
| [ | Parkinson’s disease | KMC and DT | Privately owned | Speech | Accuracy—95.56% |
| [ | Parkinson’s disease subtype classification | DT, LR | PPMI | Tabular | Accuracy—98.3%, Sensitivity—98.41%, and Specificity—99.07% |
| [ | Parkinson’s disease identification | KNN and ANN | Parkinson’s UCI machine learning dataset | Tabular | ANN (Accuracy—96.7%) |
| [ | Diagnosis system for Parkinson’s disease | ANN, KMC | Parkinsons dataset | Speech and sound | Accuracy—99.52% |
| [ | Identify Parkinson’s disease | SVM | NIHS | Speech and sound | Accuracy—83.33%, True positive rate—75%, False positive rate—16.67% |
Referenced literature that considered machine-learning-based COVID-19 disease diagnosis.
| Study | Contributions | Algorithm | Dataset | Data Type | Performance Evaluation |
|---|---|---|---|---|---|
| [ | COVID-19 disease detection | CNN | Mixed dataset | Image | Accuracy—90% |
| [ | COVID-19 disease detection | CNN | Mixed dataset | Image | Accuracy—98.5% |
| [ | COVID-19 disease detection | CNN | Mixed dataset | Image | Accuracy—86% |
| [ | COVID-19 disease detection | CNN | Cohen’s dataset | Image | Accuracy—94.1% |
| [ | COVID-19 disease detection and image segmentation | CNN | Cohen’s dataset | Image and Tabular | Accuracy—95.38% |
Referenced literature that considered machine-learning-based Alzheimer’s disease diagnosis.
| Study | Contributions | Algorithm | Dataset | Data Type | Performance Evaluation |
|---|---|---|---|---|---|
| [ | Automatic diagnosis of Alzheimer’s disease and mild cognitive impairment | CNN+SVM | ¹⁸F-FDG PET | Image | Accuracy—74–90% |
| [ | Predicting transition from mild cognitive impairment to Alzheimer’s disease | LR, ARN, DT | 1913 privately owned cases | Tabular | Accuracy—89.52 ± 0.36%, AUC-ROC—92.08 ± 0.12, Sensitivity—82.11 ± 0.42%, Positive predictive value—75.26 ± 0.86% |
| [ | Automatic classification of Alzheimer’s disease | DNN+RF |  | Tabular | Accuracy—67% |
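Entries such as 89.52 ± 0.36% usually summarize a score as mean ± standard deviation over repeated evaluations, such as cross-validation folds; the table does not state the exact protocol used. The sketch below shows one common approach, assuming shuffled k-fold splitting and the sample standard deviation; `kfold_indices`, `summarize`, and the per-fold accuracies are hypothetical.

```python
import random
import statistics


def kfold_indices(n_samples: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Shuffle sample indices and split them into k (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split; sizes differ by at most 1
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def summarize(scores: list[float]) -> str:
    """Format fold scores as 'mean ± sample standard deviation' in percent."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return f"{100 * mean:.2f} ± {100 * sd:.2f}%"


if __name__ == "__main__":
    splits = kfold_indices(n_samples=20, k=5)
    # Hypothetical per-fold accuracies; in practice each would come from training
    # a model on `train` and scoring it on `test` for one split.
    fold_acc = [0.90, 0.88, 0.91, 0.89, 0.90]
    print(len(splits), summarize(fold_acc))
```

For the hypothetical fold accuracies shown, `summarize` returns `89.60 ± 1.14%`. Class-stratified splitting is often preferred for imbalanced diagnostic data but is omitted here for brevity.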
Referenced literature that considered machine learning for diagnosing various diseases.
| Study | Contributions | Algorithm | Dataset | Data Type | Performance Evaluation |
|---|---|---|---|---|---|
| [ | Classify pediatric colonic inflammatory bowel disease subtype | RF | 74 Privately owned cases | Image | Accuracy—100% |
| [ | Classification of liver diseases | SVM | ILPD and BUPA | Tabular | Accuracy—90–92%, Sensitivity—89–91%, F1-score—94–94.3% |
| [ | Hypertension | LR and ANN | BRFSS | Tabular | Accuracy—72%, AUC > 0.77 |
| [ | Brain tumor diagnostic | CNN | Brain tumor challenge websites and MRI centers | Image | Accuracy—90–99% |
| [ | Brain tumor segmentation for multi-modality MRI | RF | MICCAI, BraTS 2013 | Image | 88% Dice overlap |
| [ | Melanoma detection with dermoscopic images | SVM with color and feature extractor | PH2 | Image | Accuracy—96% |
| [ | Melanoma skin cancer detection | NB, DT, and KNN | MED-NODE | Image | DT (Accuracy—82.35%) |
| [ | Skin cancer detection with infrared thermal imaging | Ensemble learning and DL |  | Image | Precision—0.9665, Recall—0.9411, F1-score—0.9536, ROC-AUC—0.9185 |
| [ | Hepatocellular carcinoma | InceptionV3 | Genomic Data Commons databases | Image | Accuracy—89–96% |
| [ | Identification of liver cancer | Watershed Gaussian-based DL (WGDL) | Privately owned | Image | Accuracy—99.38% |
| [ | Hepatocellular carcinoma (HCC) postoperative death outcomes | RF, Gradient boosting, Gbm, LR, DT | BioStudies database | Tabular | AUC—0.803 (RF) |
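Segmentation results, such as the 88% overlap reported for the BraTS 2013 brain tumor row, are commonly expressed with the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), where A and B are the predicted and reference masks. A minimal sketch, assuming binary masks given as equally shaped nested lists of 0/1 values; the function name `dice_coefficient` and the toy masks are hypothetical.

```python
def dice_coefficient(pred: list[list[int]], truth: list[list[int]]) -> float:
    """Dice similarity between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    pred_flat = [v for row in pred for v in row]
    truth_flat = [v for row in truth for v in row]
    intersection = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    total = sum(1 for v in pred_flat if v) + sum(1 for v in truth_flat if v)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2 * intersection / total


if __name__ == "__main__":
    pred = [[0, 1, 1],
            [0, 1, 0]]
    truth = [[0, 1, 0],
             [0, 1, 1]]
    print(f"Dice = {dice_coefficient(pred, truth):.2f}")
```

For the toy masks, each contains three foreground pixels and two of them overlap, so Dice = 2·2 / (3 + 3), about 0.67. Returning 1.0 when both masks are empty is a convention, chosen here so that correctly predicting "no lesion" is not penalized.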
Figure 10. Word cloud of the most frequently used ML algorithms in MLBDD publications.
Most widely used disease diagnosis datasets and their URLs, along with the referenced literature (accessed on 16 December 2021).
| Study | Disease | Dataset | URL |
|---|---|---|---|
| [ | Heart disease | Cleveland database |  |
| [ | Kidney disease | Chronic kidney disease dataset |  |
| [ | Diabetes | Pima diabetic dataset |  |
| [ | Parkinson’s disease | Parkinsons Dataset |  |
| [ | Breast cancer | WDBC dataset |  |
| [ | COVID-19 | Covid-chest X-ray dataset |  |