Eman Yahia Alqaissi, Fahd Saleh Alotaibi, Muhammad Sher Ramzan.
Abstract
Controlling infectious diseases is a major health priority because they can spread and infect humans, thus evolving into epidemics or pandemics. Therefore, early detection of infectious diseases is a significant need, and many researchers have developed models to diagnose them in the early stages. This paper reviewed research articles for recent machine-learning (ML) algorithms applied to infectious disease diagnosis. We searched the Web of Science, ScienceDirect, PubMed, Springer, and IEEE databases from 2015 to 2022, identified the pros and cons of the reviewed ML models, and discussed the possible recommendations to advance the studies in this field. We found that most of the articles used small datasets, and few of them used real-time data. Our results demonstrated that a suitable ML technique depends on the nature of the dataset and the desired goal. Moreover, heterogeneous data could ensure the model's generalization, while big data, many features, and a hybrid model will increase the resulting performance. Furthermore, using other techniques such as deep learning and NLP to extract vast features from unstructured data is a powerful approach to enhancing the performance of ML diagnostic models.
Year: 2022 PMID: 35693267 PMCID: PMC9185172 DOI: 10.1155/2022/6902321
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.809
Comparison of related systematic reviews and our systematic review.
| Reference | Focus | Differences |
|---|---|---|
| [ | (i) Diagnosing COVID-19 and predicting severity and mortality risks | (i) The review is based only on clinical and laboratory data. |
| [ | (i) Diagnosis and prognosis of COVID-19 from prediction models. | (i) The review focuses more on preprints |
| [ | (i) Diagnosing hepatitis | (i) It covers only clinical tests. |
| [ | (i) Detecting pneumonia | (i) It is based only on signs and symptoms |
| [ | (i) Diagnosing tuberculosis | (i) It covers diverse AI approaches using clinical signs and symptoms and radiological images. |
| [ | (i) Diagnosing pulmonary tuberculosis | (i) It covers AI methods based on chest X-ray images. |
| [ | (i) Diagnosing tuberculous meningitis | (i) It is based only on clinical and laboratory data. |
| [ | (i) Predicting phenotypic characteristics of influenza virus | (i) It is based on genomic or proteomic input. |
| [ | (i) Diagnosing HIV, HCV, and chlamydia | (i) It implements different digital technology but does not include any kind of AI technique. |
| [ | (i) Diagnosing COVID-19, hepatitis, sepsis, malaria, Lyme disease, and tuberculosis | (i) It covers data coming from EMR. |
| [ | (i) Automatic diagnosis of several infections such as sepsis, general infections, and | (i) It covers papers based on physiological data. |
| [ | (i) Diagnosing infectious and noninfectious diseases through ML | (i) It explains in detail all reviewed ML algorithms but does not mention datasets or performance measures. |
| Our review | (i) ML diagnosis of all available human infectious disease papers | (i) It covers different kinds of ML techniques, several types of datasets, and performance measures. |
HIV: human immunodeficiency virus; HCV: hepatitis C virus; EMR: electronic medical record.
Figure 1: PRISMA flow diagram.
CHARMS checklist for abstracts in the selected articles.
| Criteria | Description |
|---|---|
| “Type of diagnosis model” | Only ML algorithms that classify infectious diseases |
| “Type of dataset” | Dataset specified and declared |
| “Evaluation of the model” | The performance metrics of the ML model |
Figure 2: Data acquisition methods for the 14 selected articles.
Dataset specifications for the reviewed articles.
| Reference | Size | Data acquisition method | Infectious disease |
|---|---|---|---|
| [ | Time-series of physiological data | Wearable sensors to collect multivariate physiological data | Tetanus and HFMD |
| [ | 60 individuals | Medical sensors, hub, and Android-based app to collect vital signs | Skin and soft tissue infection, urinary tract infection, and acute respiratory infection |
| [ | 49,721 users | App-based symptom tracker | SARS-CoV-2 (COVID-19) |
| [ | 88 individuals | Cellular phone voice recordings | SARS-CoV-2 (COVID-19) |
| [ | 37,599 tweets | Social media messages | Latent infectious diseases |
| [ | 1,317,018 classes, 7,731,914 axioms, and 1,269,340 inheritance relations | Multiple medical ontologies | 507 infectious diseases |
| [ | 31,268 reports | NLP tool (Topaz) to extract influenza-related findings | Influenza |
| [ | 52,306 patients | Routine blood tests | SARS-CoV-2 (COVID-19) |
| [ | 1391 patients | Routine laboratory results | SARS-CoV-2 (COVID-19) |
| [ | 295 patients | Clinical records | Candidemia |
| [ | 1118 patients | EHR | CDI |
| [ | 152 patients | Clinical records and chest CT images | SARS-CoV-2 (COVID-19) |
| [ | 2482 images | CT scan images | SARS-CoV-2 (COVID-19) |
| [ | 56,081 patients | Historical and real-time data | 25 infectious diseases |
SARS-CoV-2: severe acute respiratory syndrome coronavirus 2; EHR: electronic health record; HFMD: hand, foot, and mouth disease; CDI: Clostridium (Clostridioides) difficile infection; CT: computed tomography.
Features used in the diagnosis models.
| Reference | Features |
|---|---|
| [ | ECG, PPG, and IP |
| [ | EDA, SPO2, body temperature, systolic and diastolic blood pressure, and heart rate |
| [ | Age, gender, fever, nausea, shortness of breath, diarrhea, coryza, cough, myalgia, loss of smell or anosmia, and living with a confirmed case |
| [ | Age, gender, cough, short speech utterances, counting, and nonspeech voicing |
| [ | User IDs, timestamps, geospatial information, and textual information of each message |
| [ | Patient's data, body temperature, infection site, symptoms, and signs |
| [ | Nasal swab, lab-confirmed flu, influenza-like illness, suspected flu, viral syndrome, myalgias, rhinorrhea, viral, fever, coughing, chills, sore throat, malaise, arthralgia, pneumonia, wheezing, hoarseness, cervical lymphadenopathy, headache, hemoptysis, fatigue, diarrhea, conjunctivitis, dyspnea, anorexia, nausea, chest pain, cyanosis, pain with eye movement, photophobia, and abdominal cramps |
| [ | Mean corpuscular hemoglobin concentration, eosinophils count, albumin, prothrombin international normalized ratio, prothrombin activity%, eosinophils%, lymphocyte %, monocyte %, gamma-glutamyltransferase, erythrocyte count, creatinine, alkaline phosphatase, leukocyte count, bilirubin total, aspartate aminotransferase, hematocrit, mean platelet volume, hemoglobin, basophils count, glucose, urea, alanine aminotransferase, age, neutrophils pH count, monocyte count, thrombocytes count, mean corpuscular volume, lymphocyte count, sodium in serum, potassium in serum, neutrophils %, mean corpuscular hemoglobin, bilirubin direct, basophils %, and erythrocyte distribution width |
| [ | Bilirubin, ALT, AST, LDH, CRP, lymphocytes, creatinine, monocytes, neutrophils, red cell distribution width, platelets, eosinophils, mean corpuscular hemoglobin, leukocytes, mean corpuscular volume, hemoglobin, hematocrit, basophils, mean corpuscular hemoglobin concentration, and RBCs |
| [ | Age, gender, Charlson's score, previous hospital admission, hospital admission from home, hospital admission from long-term care facility, hospital admission from medical wards, hospital admission from surgical wards, sepsis, fever, previous antifungal therapy, previous antibiotic therapy, in-hospital antibiotic therapy, in-hospital MHIA therapy, steroids during hospitalization, in-hospital immunosuppressants, concomitant infection, previous CDI, PICC, NGT, PN, UC, CVC, recent abdominal surgery, recent nonabdominal surgery, coronary heart disease, heart failure, COPD, diabetes, chronic kidney disease, dialysis, liver disease, pancreatitis, peripheral vascular disease, cerebrovascular disease, dementia, hemiplegia, connective tissue disease, peptic ulcer, leukemia or lymphoma, solid cancer, and metastatic cancer |
| [ | Age, gender, White race, Charlson-Deyo's score, prior CDI, healthcare-associated CD, immunosuppression, solid organ transplant, metastatic cancer, hypertension, congestive heart failure, diabetes mellitus, chronic kidney disease, depression, concurrent antibiotic use, prior fluoroquinolone use, proton pump inhibitor use, fever, systolic blood pressure, mechanical ventilation, sodium, creatinine, albumin, total bilirubin, white blood cell count, hemoglobin, platelets, ribotypes, positive stool toxin by enzyme immunoassay, polymerase chain reaction cycle threshold, 30-day ICU admission, attributable 30-day ICU admission, 30-day colectomy, attributable 30-day colectomy, 30-day mortality, attributable 30-day mortality, and severe CDI |
| [ | History, demographics, and clinical data (gender, age, BMI, past medical history of comorbidities, weight, height, history of smoking, level of consciousness, initial vital signs, RR, O2Sat, PR, SBP, DBP, and temperature) |
| [ | 100 prominent features |
| [ | Age, gender, fever, fatigue, cough, WBC count |
ECG: electrocardiogram signal; PPG: photoplethysmogram; IP: impedance pneumography; EDA: electrodermal activity; SPO2: peripheral oxygen saturation; LDH: lactate dehydrogenase; CRP: C-reactive protein; RBCs: red blood cells; ALT: alanine aminotransferase; AST: aspartate aminotransferase; CDI: Clostridium difficile infection; PICC: peripherally inserted central catheter; NGT: nasogastric tube; PN: parenteral nutrition; UC: urinary catheter; CVC: central venous catheter; COPD: chronic obstructive pulmonary disease; MHIA: microbiome highly impacting antimicrobials; LOS: length of stay; RR: respiratory rate; O2Sat: O2 saturation; PR: pulse rate; SBP: systolic blood pressure; DBP: diastolic blood pressure; BMI: body mass index; WBC: white blood cells; pH: venous blood gas analysis of acidity; PCO2: carbon dioxide concentration; ALP: alkaline phosphatase; HCO3: bicarbonate concentration; Plt: platelet count; Cr: blood creatinine level; BUN: blood urea nitrogen; PT: prothrombin time; PTT: partial thromboplastin time; INR: prothrombin time normalized with the international normalized ratio; PCT: procalcitonin levels; ICU: intensive care unit.
Figure 3: ML algorithms used in the reviewed articles.
Comparison of the reviewed articles concerning the applied ML techniques and the resulting performance.
| Reference | ML technique | Model performance |
|---|---|---|
| [ | SVM (Gaussian kernels) | For HFMD dataset (AC: 70.9%, precision: 60.6%, SP: 78.0%, F1 score: 55.7%, and recall: 55.9%) |
| [ | NB, filtered classifier (FC), and RF | Success ratio with the weight of each medical variable (vital signs) that affects the prediction |
| [ | LR | AUC: 0.68, recall: 60%, specificity: 75%, precision: 25%, NPV: 93%, F1 score: 35%, and MCC: 25% |
| [ | SVM | PFA: 30%, F1 score: 74%, precision: 71%, and recall: 78% |
| [ | Unsupervised sentiment analysis | Precision: 77.3%, recall: 68%, F1 score: 72.4% |
| [ | NB | ROC: 89.91%, SN: 47%, and SP: 37% |
| [ | XGBoost algorithm | AUC: 0.97, SN: 81.9%, SP: 97.9% |
| [ | SVM | AC: 91.18%, SN: 100%, SP: 84.21%, PPV: 83.33%, F1 score: 90.91%, and AUC: 0.958 |
| [ | RF | AUC: 0.847, SN: 84.24%, SP: 91%, HLT statistics: 12.779, and HLT |
| [ | EHR-based model (the study did not mention the used algorithm) | SN: 41.7%, SP: 96.7%, and PPV: 41.7% |
| [ | NB and LR | AUC: 0.93, BSS: all classifiers achieve positive BSS scores |
| [ | XGBoost algorithm | AUC: 0.95, AC: 88%, SN: 88%, and SP: 89% |
| [ | Ensemble learning model | AC: 99.73%, precision: 99.46%, recall: 100%, F1 score: 99.73%, and AUC: 0.9973 |
| [ | NB | AUC greater than: 98%, SN: 44.44% for hepatitis E and 96.67% for measles, SP: 96.36% for dengue fever and 100% for 5 diseases, median of total accuracy: 97.41%, and |
SN: sensitivity = recall; AUC: area under the curve; SP: specificity; PPV: positive predictive value; AC: accuracy; PFA: probability of false alarm; BSS: Brier skill score; HLT p: the p value of the Hosmer–Lemeshow test; ROC: receiver operating characteristic.
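Most of the measures abbreviated above derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not code from any reviewed study. The function name `diagnostic_metrics` is hypothetical, and PFA is assumed here to mean the false positive rate (1 − SP). The example counts (TP = 15, FP = 3, TN = 16, FN = 0) are an inferred illustration: they are one confusion matrix that is arithmetically consistent with the AC, SN, SP, PPV, and F1 values reported for the second SVM row of the table, and are not data taken from that study.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic measures from confusion-matrix counts.

    Returns fractions in [0, 1]. No guard against empty classes is included,
    except for MCC, so a zero denominator elsewhere raises ZeroDivisionError.
    """
    sn = tp / (tp + fn)                   # sensitivity = recall
    sp = tn / (tn + fp)                   # specificity
    ppv = tp / (tp + fp)                  # positive predictive value = precision
    npv = tn / (tn + fn)                  # negative predictive value
    ac = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    pfa = fp / (fp + tn)                  # assumed: false positive rate = 1 - SP
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"SN": sn, "SP": sp, "PPV": ppv, "NPV": npv,
            "AC": ac, "F1": f1, "PFA": pfa, "MCC": mcc}


# Illustrative counts only (inferred, not taken from the study).
example = diagnostic_metrics(tp=15, fp=3, tn=16, fn=0)
for name, value in example.items():
    print(f"{name}: {value * 100:.2f}%")
```

A check of this kind is useful when comparing the rows of the table, because it shows how a perfect sensitivity can coexist with a noticeably lower specificity and PPV on a small test set.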
Comparison of pros and cons of the reviewed articles.
| Reference | Pros | Cons |
|---|---|---|
| [ | (i) The proposed method provides efficient hospital resources | (i) The manual encoding of features used to encode the waveform dynamics in time and frequency domains is time-consuming and may have errors |
| [ | (i) The study shows an accessible, easy to use, flexible, ubiquitous, and cost-effective eHealth system for diagnosing infectious diseases from vital signs. | (i) The short period of sampling affects the classification results, and more accuracy is needed |
| [ | (i) The proposed model shows that a combination of symptoms assisted with the prediction of COVID-19 infection | (i) Some studies criticize the use of symptoms for classifying COVID-19 because of the existence of other respiratory coinfections and the nonspecific nature of some symptoms |
| [ | (i) The study shows that voice-based screening for COVID-19 is possible | (i) The study uses a small set of vocal-input types that were self-recorded |
| [ | (i) The study is helpful in diagnosing latent infectious diseases in early stages without prior training data and in a short period. | (i) There is a need to improve the performance of the proposed model and to include accuracy measures when considering social media user information (e.g., age, gender, and posting frequency). |
| [ | (i) It is more comprehensive compared to other existing works | (i) Symptoms are not weighed to distinguish syndromes and signs |
| [ | (i) The study determines the most useful routine blood parameters for COVID-19 diagnosis from a large number of patients > 5000 | (i) The proposed model might be inefficient at the stage where there are no systemic effects |
| [ | (i) The proposed ML model can be used as a decision support system tool | (i) The study patients' comorbidities are not available in the dataset |
| [ | (i) The use of ML to predict Candidemia improves decision-making for appropriateness in antifungal and antibiotic therapies | (i) There is a need for a large number of patients |
| [ | (i) The EHR-based model can be used as clinical decision support to predict complicated cases of CDI on the day of diagnosis | (i) The performance metrics that are used are not enough to evaluate the model |
| [ | (i) Machine-learning classifiers perform better than expert-constructed classifiers using the NLP extraction tool | (i) The study focuses on the data of only one health system |
| [ | (i) The study shows that the combination of clinical data and radiomic features, including all measures in the optimal model, has the highest performance and can effectively predict survival in COVID-19 | (i) The used dataset is small, and there is a lack of an external validation dataset |
| [ | (i) Diagnosing COVID-19 is faster and more accurate than other traditional methods applied to the same CT image dataset. | (i) None of the COVID-19 variants is included in the study |
| [ | (i) Various types of predictors are utilized | (i) The validation dataset is available for only 12 out of 25 infectious diseases |
Figure 4: Percentages of dataset types and ML algorithms from the reviewed articles.