Clement Yaw Effah1, Ruoqi Miao1, Emmanuel Kwateng Drokow2, Clement Agboyibor3, Ruiping Qiao4, Yongjun Wu1, Lijun Miao4, Yanbin Wang5.
Abstract
Background: Pneumonia is an infection of the lungs that is characterized by high morbidity and mortality. The use of machine learning systems to detect respiratory diseases via non-invasive measures such as physical and laboratory parameters is gaining momentum and has been proposed to decrease the diagnostic uncertainty associated with bacterial pneumonia. Herein, this study conducted several experiments using eight machine learning models to predict pneumonia based on biomarkers, laboratory parameters, and physical features.
Keywords: decision support system (DSS); electronic health records (EHR); machine learning; non-invasive measures; pneumonia
Year: 2022 PMID: 35968461 PMCID: PMC9371749 DOI: 10.3389/fpubh.2022.938801
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Figure 1. Confusion matrix.

Performance evaluation metrics equations.

| Metric | Equation |
|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) |
| Recall | TP / (TP + FN) |
| Precision | TP / (TP + FP) |
| F-measure | 2 × (Precision × Recall) / (Precision + Recall) |
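The four metrics above follow directly from the confusion-matrix counts. A minimal sketch in plain Python (the counts below are illustrative, not taken from the paper):

```python
# Compute the four evaluation metrics from confusion-matrix counts
# (TP, TN, FP, FN). Illustrative sketch only; not the authors' code.

def evaluation_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return accuracy, recall, precision and F-measure as fractions."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)        # sensitivity / true-positive rate
    precision = tp / (tp + fp)     # positive predictive value
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_measure": f_measure}

# Illustrative counts only
m = evaluation_metrics(tp=80, tn=60, fp=10, fn=10)
print({k: round(v, 3) for k, v in m.items()})
```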
Figure 2. Target data (LRTI) distribution before and after applying SMOTE. The label '0' denotes pneumonia and '1' denotes bronchitis. (A) Imbalanced data. (B) Balanced data.
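SMOTE balances the classes by synthesizing new minority-class samples along line segments between a minority sample and one of its nearest minority neighbours. A minimal NumPy sketch of that interpolation step (synthetic toy data; the paper presumably used a library implementation such as imbalanced-learn's `SMOTE`):

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: for each synthetic point, pick a random
    minority sample and interpolate toward one of its k nearest
    minority neighbours. Illustrative only."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from sample i to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# Toy minority class with four 2-D samples
X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.2]])
X_new = smote_oversample(X_minority, n_new=4, k=2, rng=0)
print(X_new.shape)  # (4, 2)
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies within the minority class's bounding box.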
LR prediction results of the feature selection methods on the original dataset.

| Method | Accuracy (%) | Recall (%) | Precision (%) | F-measure (%) |
|---|---|---|---|---|
| LV | 80.4 | 83.7 | 84.4 | 84.0 |
| UFS | 82.6 | 85.8 | 85.9 | 85.8 |
| L1 | 75.9 | 79.0 | 82.5 | 80.7 |
| L2 | 77.9 | 82.3 | 81.6 | 81.8 |
| Tree-based | 83.0 | 85.7 | 86.8 | 86.2 |
| PCA | 81.1 | 84.5 | 84.7 | 84.6 |
LR prediction results of the feature selection methods on the balanced dataset.

| Method | Accuracy (%) | Recall (%) | Precision (%) | F-measure (%) |
|---|---|---|---|---|
| LV | 83.6 | 85.4 | 81.3 | 83.4 |
| UFS | 82.2 | 83.3 | 80.9 | 82.0 |
| L1 | 77.3 | 78.2 | 75.4 | 77.1 |
| L2 | 79.1 | 81.5 | 75.2 | 78.0 |
| Tree-based | 82.0 | 83.1 | 80.3 | 81.6 |
| PCA | 85.4 | 86.6 | 83.0 | 84.7 |
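Two of the tabulated strategies, univariate feature selection (UFS) and tree-based importance, can be sketched with scikit-learn. This uses synthetic data; the paper's actual features, hyperparameters, and the k chosen are not reproduced here:

```python
# Hedged sketch of univariate (UFS) and tree-based feature selection
# on synthetic data; illustrative only, not the authors' pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# Univariate selection: keep the 4 features with the highest ANOVA F-score
ufs = SelectKBest(f_classif, k=4).fit(X, y)
print("UFS keeps features:", sorted(ufs.get_support(indices=True)))

# Tree-based selection: rank features by random-forest importance
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top4 = sorted(range(10), key=lambda i: rf.feature_importances_[i])[-4:]
print("RF top-4 features:", sorted(top4))
```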
Machine learning model prediction results on the original dataset.

| Model | Accuracy (%) | Recall (%) | Precision (%) | F-measure (%) |
|---|---|---|---|---|
| LR | 81.4 | 82.7 | 84.2 | 84.3 |
| NB | 59.8 | 89.6 | 39.2 | 53.7 |
| SVM | 80.7 | 82.8 | 86.5 | 84.5 |
| ADT | 90.1 | 91.3 | 92.7 | 91.9 |
| KNN | 72.1 | 87.3 | 63.8 | 73.5 |
| RF | 92.0 | 91.3 | 96.0 | 93.6 |
| XGBoost | 90.8 | 92.6 | 92.3 | 92.4 |
| MLP | 79.4 | 83.7 | 82.5 | 82.9 |
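The evaluation loop behind a table like this can be sketched with scikit-learn: fit each model on a training split and report the four metrics on a held-out split. The sketch below covers two of the eight models (LR and RF) on synthetic data, not the paper's cohort or hyperparameters:

```python
# Hedged sketch: train/score two of the tabulated models on synthetic
# data with the paper's four metrics. Illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("RF", RandomForestClassifier(random_state=0))]:
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name,
          round(accuracy_score(y_te, y_pred), 3),
          round(recall_score(y_te, y_pred), 3),
          round(precision_score(y_te, y_pred), 3),
          round(f1_score(y_te, y_pred), 3))
```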
Figure 3. Confusion matrix of XGBoost and random forest on the original dataset. (A) XGBoost. (B) RF.
Figure 4. ROC curves of XGBoost and random forest on the original dataset. (A) XGBoost. (B) RF.
Figure 5. Feature importance according to the XGBoost model on the original dataset.
Figure 6. Feature importance according to the RF model on the original dataset.
Machine learning model prediction results on the balanced dataset.

| Model | Accuracy (%) | Recall (%) | Precision (%) | F-measure (%) |
|---|---|---|---|---|
| LR | 83.6 | 84.9 | 81.2 | 83.1 |
| NB | 68.4 | 75.8 | 54.4 | 62.7 |
| SVM | 81.1 | 83.0 | 77.2 | 80.1 |
| ADT | 91.0 | 91.2 | 90.1 | 90.9 |
| KNN | 75.0 | 91.9 | 54.8 | 68.4 |
| RF | 92.2 | 93.0 | 91.2 | 92.0 |
| XGBoost | 91.2 | 91.1 | 91.6 | 91.2 |
| MLP | 81.4 | 81.9 | 83.2 | 82.4 |
Figure 7. Confusion matrix of XGBoost and RF on the SMOTE dataset. (A) XGBoost. (B) RF.
Figure 8. ROC curves of XGBoost and random forest on the SMOTE dataset. (A) XGBoost. (B) RF.
Figure 9. Feature importance according to the XGBoost model on the SMOTE dataset.
Figure 10. Feature importance according to the RF model on the SMOTE dataset.
AUCs of the various models before and after SMOTE.

| Model | AUC before SMOTE (%) | AUC after SMOTE (%) | p-value |
|---|---|---|---|
| LR | 89 | 91 | 0.032 |
| NB | 82 | 76 | 0.019 |
| SVM | 89 | 86 | 0.221 |
| ADT | 91 | 94 | 0.071 |
| KNN | 79 | 84 | 0.016 |
| RF | 96 | 97 | 0.050 |
| XGBoost | 97 | 97 | 0.314 |
| MLP | 80 | 86 | 0.005 |
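AUC values like those in the table are computed from a model's predicted class probabilities on held-out data. A minimal scikit-learn sketch on synthetic data (illustrative only; not the paper's data or settings):

```python
# Hedged sketch: computing a held-out ROC AUC with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=1)

rf = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
# AUC is computed on the positive-class probability, not the hard labels
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print(round(auc, 2))
```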
Figure 11. Decision boundaries of the models on the original dataset.
Figure 12. Decision boundaries of the models on the balanced dataset.
External validation results from the best models.

| Model | Accuracy (%) | Recall (%) | Precision (%) | F-measure (%) |
|---|---|---|---|---|
| RF | 88.6 | 84.8 | 95.6 | 89.7 |
| XGBoost | 88.7 | 86.4 | 93.1 | 89.3 |
Figure 13. AUROC curves for the external validation dataset. (A) XGBoost. (B) RF.
Comparing prediction performance from various studies that used non-invasive measures.

| Models | Disease | Performance | Reference |
|---|---|---|---|
| DT, SVM, LR | Pneumonia | Accuracy: 84, 82, 83 | |
| RF, LightGBM, SVM, DT | COVID-19 | Accuracy: 89, 88, 84, 82 | |
| LogitBoost, RF, DT | Blood diseases | Accuracy: 98.2, 97.1, 97 | |
| XGBoost, LightGBM | | Accuracy: 93, 91 | |
| LR | COVID-19 | Specificity: 0.95; AUC: 0.971; Sensitivity: 0.82 | |
| RF, XGBoost | Pneumonia | Accuracy: 92, 90.8; AUCs: 0.96, 0.97 | This study |