
Deep Learning-Based Available and Common Clinical-Related Feature Variables Robustly Predict Survival in Community-Acquired Pneumonia.

Ding-Yun Feng¹, Yong Ren², Mi Zhou³, Xiao-Ling Zou¹, Wen-Bin Wu¹, Hai-Ling Yang¹, Yu-Qi Zhou¹, Tian-Tuo Zhang¹.

Abstract

BACKGROUND: Community-acquired pneumonia (CAP) is a leading cause of morbidity and mortality worldwide. Although there are many predictors of death for CAP, there are still some limitations. This study aimed to build a simple and accurate model based on available and common clinical-related feature variables for predicting CAP mortality by adopting machine learning techniques.
METHODS: This was a single-center retrospective study. The data used in this study were collected from all patients (≥18 years) with CAP admitted to research hospitals between January 2012 and April 2020. Each patient had 62 clinical-related features, including clinical diagnostic and treatment features. Patients were divided into two outcome groups, and by using TensorFlow 2.4.1 as the modeling framework, a three-layer fully connected neural network (FCNN) was built as a base model for classification. For a comprehensive comparison, seven classical machine learning methods and their integrated stacking patterns were introduced to model and compare the same training and test data.
RESULTS: A total of 3997 patients with CAP were included; 205 (5.12%) died in the hospital. Using deep learning methods, this study established an ensemble FCNN model based on 12 FCNNs. Compared with seven classical machine learning methods, the ensemble FCNN achieved the highest area under the curve (0.975) when classifying poor versus good prognosis based on available and common clinical-related feature variables. The predicted outcome was poor prognosis if the ProgNet's poor prognosis score was greater than the cutoff value of 0.50. To check the plausibility of the ensemble FCNN model, this study analyzed the weight of random forest features and found that mainstream prognostic features still held weight, although the model was refined by integrating other factors considered less important by previous studies.
CONCLUSION: This study used deep learning algorithms to classify prognosis based on available and common clinical-related feature variables in patients with CAP with high accuracy and good generalizability. Every clinical-related feature is important to the model.


Keywords: community-acquired pneumonia; deep learning; mortality; predictor

Year: 2021 PMID: 34512057 PMCID: PMC8427836 DOI: 10.2147/RMHP.S317735

Source DB: PubMed Journal: Risk Manag Healthc Policy ISSN: 1179-1594

Background

With the development of medical technology, technologies and opinions related to the diagnosis, treatment, and prognosis of community-acquired pneumonia (CAP) are continually developing. However, according to recent data, CAP is still one of the most common lung diseases worldwide and remains a major clinical and public health problem globally.1 A retrospective survey of the global burden of disease covering 195 countries and territories worldwide for more than 30 years showed that lower respiratory tract infections affected 471.8 million people and caused 2.6 million deaths in 2017 alone.2 Sun et al conducted a retrospective analysis of CAP incidence using the Chinese Urban Basic Medical Insurance database of 23 provinces and revealed a relatively high level (7.13 per 1000 person-years) of CAP incidence in China.3 Ceccato et al analyzed a prospective observational cohort and found that the mortality of CAP ranged from 4.8% to 7.6%.4 In China, the mortality of CAP caused by non-influenza respiratory viruses in adults was 3.1%.5 Although the development and updating of antibiotics have been essential in treating pneumonia, the reduction in the pneumonia-related death rate has been relatively limited.1 Therefore, it is still of great clinical significance to study the prognostic factors of pneumonia. Cataudella et al found that the neutrophil-to-lymphocyte ratio was an emerging marker predicting prognosis in elderly adults with CAP.6 Guo found that serum serial C-reactive protein and procalcitonin levels had moderate predictive values for hospitalized CAP prognosis.7 Mendez revealed that lymphopenic community-acquired pneumonia is associated with a dysregulated immune response and increased severity and mortality.8 However, different studies selected different populations with pneumonia, targeted different predictive indicators, and established different predictive models. 
This approach has several limitations, based on the study design, type of tests used, and subsequent statistical testing.9 Artificial intelligence (AI) is gradually changing medical practice. Stripped of its science-fiction trappings and ambitions, AI at its core is a branch of computer science that attempts to understand and construct intelligent entities represented by software programs.10 AI is good at dealing with complex nonlinear relations and can overcome the shortcomings of traditional models. It is characterized by self-learning, associative memory, self-adaptation, fault tolerance, and highly parallel processing, and it has great potential in disease prediction.11 Deep learning is an important component of AI.12 Kuo used machine learning to predict the occurrence of hospital-acquired pneumonia in 185 patients with schizophrenia, and the results revealed that, of the seven machine learning algorithms, random forest and decision tree had better prediction accuracy than the other algorithms.13 Therefore, this study aimed to build a simple and accurate model based on available and common clinical-related feature variables for predicting CAP mortality by adopting machine learning techniques.

Methods

Training and Test Cohort

Two different cohorts were used to achieve a broad patient representation and improve the ability to generalize the results to other cohorts: patients with CAP between January 2012 and April 2020 at the Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China. Common inclusion criteria for the two cohorts were diagnosis of CAP, including the presence of a new pulmonary infiltrate associated with at least one of the following:14 new or increased cough with or without purulent tracheobronchial secretion or new pathogenic bacteria isolated from sputum or tracheal aspirate culture with ≥10⁴ colony-forming units/mL, fever (>37.8°C) or hypothermia (<35.6°C), leukocytosis, left shift, or leukopenia based on local normal values. Because severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread around the world since early 2020, from then on all CAP patients in the general ward were screened for SARS-CoV-2 before hospitalization. Patients with SARS-CoV-2, acquired immunodeficiency syndrome, interstitial lung disease, or missing key data were excluded. Patients were labeled as having distinct prognoses depending on the follow-up data. Patients were assigned to the good outcome group if they had no record of CAP-related death during hospitalization. The poor outcome group consisted of patients who died during hospitalization. A total of 3977 patients were eventually included in the training and internal test sets, including 3772 patients with good prognosis and 205 patients with poor prognosis. Compared to the number of patients with a good prognosis, few patients had a poor prognosis. To balance the number of patients with the two outcomes, which makes the subsequent modeling more reliable, a downsampling method for negative samples (with a good prognosis) was adopted. The 3772 negative samples were randomly divided into 12 non-overlapping subsets, each containing 250 negative samples. 
Simultaneously, the positive samples were replicated into 12 sets by up-sampling. A negative subset and a positive set constituted a model dataset, and the model training set and internal test set were randomly divided in a ratio of 4:1. Thus, 12 datasets with balanced positive and negative sample sizes were used to train 12 models.
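The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `build_balanced_datasets`, the round-robin split, the fixed seed, and the choice to up-sample positives with replacement until they match each negative subset are all assumptions. In particular, subset sizes here follow from an even split of the negatives rather than a fixed count per subset.

```python
import random

def build_balanced_datasets(negatives, positives, n_subsets=12, test_fraction=0.2, seed=0):
    """Pair each disjoint negative subset with up-sampled positives, then split 4:1."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Disjoint negative subsets; round-robin slicing keeps sizes within one of each other.
    neg_subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    base_pos = list(positives)
    datasets = []
    for neg in neg_subsets:
        pos = list(base_pos)
        if len(pos) < len(neg):
            # Up-sample positives with replacement to match the negative count (assumption).
            pos += rng.choices(base_pos, k=len(neg) - len(pos))
        samples = [(x, 0) for x in neg] + [(x, 1) for x in pos]
        rng.shuffle(samples)
        n_test = int(len(samples) * test_fraction)  # 4:1 train/test ratio
        datasets.append({"train": samples[n_test:], "test": samples[:n_test]})
    return datasets

# Hypothetical patient identifiers standing in for real records.
negatives = list(range(3772))
positives = [f"p{i}" for i in range(205)]
datasets = build_balanced_datasets(negatives, positives)
```

In this sketch the up-sampling happens before the train/test split, so copies of one positive sample can land in both portions of a dataset; whether the original pipeline behaves the same way is not stated in the text.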

Data Preparation

Each patient had not only a good or poor outcome, but also 62 clinical-related features, including clinical diagnostic features and clinical treatment features. Features with continuous values were normalized to the range of 0 to 1, while categorical features were transformed into one-hot encoding. When a feature had a missing value, it was filled with the feature's mean (for continuous features) or the level with the highest frequency (for categorical features). Feature A contains only diagnostic information, and Feature B contains diagnostic and treatment information.
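The preparation steps above can be illustrated with a small standard-library Python sketch. The helper names `min_max_scale` and `one_hot`, the use of `None` for missing values, and the example values are hypothetical; the text does not say whether imputation was done before or after scaling, so imputing first is an assumption.

```python
from collections import Counter

def min_max_scale(values):
    """Fill missing entries (None) with the mean, then rescale to [0, 1]."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    filled = [mean if v is None else v for v in values]
    lo, hi = min(filled), max(filled)
    span = hi - lo
    # A constant column carries no information; map it to 0.0 to avoid dividing by zero.
    return [0.0 if span == 0 else (v - lo) / span for v in filled]

def one_hot(values):
    """Fill missing entries with the most frequent level, then one-hot encode."""
    present = [v for v in values if v is not None]
    mode = Counter(present).most_common(1)[0][0]
    filled = [mode if v is None else v for v in values]
    levels = sorted(set(filled))
    return levels, [[1 if v == level else 0 for level in levels] for v in filled]

# Hypothetical values for one continuous and one categorical feature.
scaled = min_max_scale([30.0, None, 40.0, 20.0])
levels, encoded = one_hot(["yes", "no", None, "no"])
```

Note that one-hot encoding widens the input, so the number of model inputs can exceed the number of clinical features.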

Base Classification Model

Using TensorFlow 2.4.1 as the modeling framework, a three-layer fully connected neural network (FCNN) was built as a base model for the classification. Twenty neurons were set in the first layer and ten neurons in the second layer, and “ReLU” was selected as the activation function for both layers. For the last layer, two neurons corresponding to “good prognosis” and “poor prognosis” were set, and the activation function was set to “Softmax.” The “Adam” optimizer was adopted when the model was compiled, with the learning rate set to 0.0008, and the training loss function was binary cross-entropy.
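To make the layer structure concrete, the following framework-free Python sketch runs a single forward pass through a 20-10-2 network with ReLU and softmax and evaluates a binary cross-entropy loss. It is an illustration only, not the authors' TensorFlow code: the weights are random placeholders, training with Adam is omitted, the input width of 62 and the output order (good, poor) are assumptions, and the loss is averaged over the two outputs, which is one common convention.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (placeholder initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Layers 1 and 2 use ReLU; layer 3 applies softmax over the two classes."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(dense(x, w1, b1))
    h2 = relu(dense(h1, w2, b2))
    return softmax(dense(h2, w3, b3))

def binary_cross_entropy(probs, one_hot_label):
    """Mean binary cross-entropy over the output neurons."""
    eps = 1e-12  # guards against log(0)
    terms = [t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
             for p, t in zip(probs, one_hot_label)]
    return -sum(terms) / len(terms)

rng = random.Random(0)
n_features = 62  # assumption: input width; one-hot encoding may widen it
layers = [init_layer(n_features, 20, rng), init_layer(20, 10, rng), init_layer(10, 2, rng)]
x = [rng.random() for _ in range(n_features)]
good_prob, poor_prob = forward(x, layers)  # assumed output order: (good, poor)
loss = binary_cross_entropy([good_prob, poor_prob], [0, 1])
```

Because the last layer is a softmax, the two outputs sum to 1, so the second output can be read directly as a probability of poor prognosis.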

Prognostic Network

Twelve models based on FCNN were trained on the 12 datasets and internal test cohorts with the patients’ distinct outcomes as ground truth. Twelve scores for each patient were predicted by the 12 models. The scores were between 0 and 1; a score closer to 1 indicated a high probability of poor prognosis, whereas a score closer to 0 indicated a high probability of good prognosis. By averaging the 12 selected models’ scores for a patient, an ensemble model, named “ProgNet”, was created and used to predict the patient’s final poor prognosis score (PPS). PPS is a continuous value that ranges from 0 to 1. Based on Features A and B, two different prognostic networks and their corresponding PPS could be obtained.
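The averaging step can be written out as a short Python sketch. The function names and the twelve example scores are hypothetical; the 0.50 cutoff is the value reported for this study, and the strict "greater than" comparison follows the decision rule stated in the text.

```python
def poor_prognosis_score(model_scores):
    """Ensemble score (PPS): the plain mean of the per-model scores."""
    return sum(model_scores) / len(model_scores)

def predict_outcome(pps, cutoff=0.50):
    """Poor prognosis only when the PPS is strictly greater than the cutoff."""
    return "poor prognosis" if pps > cutoff else "good prognosis"

# Hypothetical scores from the 12 base models for one patient.
scores = [0.9] * 6 + [0.3] * 6
pps = poor_prognosis_score(scores)
outcome = predict_outcome(pps)
```

A plain mean gives every base model the same weight, so a single poorly calibrated model shifts the PPS by at most one twelfth of its error.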

Traditional Machine Learning Methods

For a comprehensive comparison, seven classical machine learning methods were introduced to model and compare the same training and test data: logistic regression, support vector machine, K-nearest neighbor, Gaussian naive Bayes, decision tree, and random forest, together with a stacking classifier that integrates them.

Model Performance Evaluation Metrics

For the binary classification tasks of the base model in the internal cohort, the confusion matrix and receiver operating characteristic (ROC) curve were estimated. A confusion matrix was used to visually evaluate the performance of the deep learning algorithms. Each row of the matrix represented an instance of a real label, and each column represented an instance of the prediction label. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated using the confusion matrix. The ROC curve was depicted by plotting the true positive rate (TPR, sensitivity) versus the false positive rate (FPR, 1-specificity) at various threshold settings. Performance was measured using the area under the ROC curve (AUC) and the F1 score. To determine a suitable threshold for dichotomizing ProgNet’s predicted probability of poor prognosis, we computed TPR-FPR for the dichotomized ProgNet prediction at thresholds of 0.01, 0.02, and so on up to and including 0.99. The threshold that maximized TPR-FPR was selected as the cutoff value for ProgNet. The predicted outcome was poor prognosis if the ProgNet PPS was greater than the cutoff value; otherwise (PPS less than or equal to the cutoff value), the predicted outcome was good prognosis.
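The metric definitions and the threshold scan described above can be sketched in plain Python. Function names, the tie-breaking rule (the lowest threshold wins when several reach the same TPR-FPR), and the toy labels and scores are assumptions for illustration; label 1 stands for poor prognosis.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score > threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def safe_div(numerator, denominator):
    return numerator / denominator if denominator else 0.0

def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics."""
    sens = safe_div(tp, tp + fn)   # sensitivity = TPR
    spec = safe_div(tn, tn + fp)   # specificity = 1 - FPR
    ppv = safe_div(tp, tp + fp)
    npv = safe_div(tn, tn + fn)
    acc = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * ppv * sens, ppv + sens)
    return {"SENS": sens, "SPEC": spec, "PPV": ppv, "NPV": npv, "ACC": acc, "F1": f1}

def best_cutoff(labels, scores):
    """Scan thresholds 0.01 to 0.99 and keep the first that maximizes TPR - FPR."""
    best_t, best_j = None, -1.0
    for i in range(1, 100):
        t = i / 100
        tp, fp, tn, fn = confusion_counts(labels, scores, t)
        j = safe_div(tp, tp + fn) - safe_div(fp, fp + tn)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Toy data: three poor-prognosis (1) and three good-prognosis (0) patients.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.405, 0.2, 0.1]
cutoff, youden_j = best_cutoff(labels, scores)
at_half = metrics(*confusion_counts(labels, scores, 0.5))
```

The quantity TPR-FPR is commonly known as Youden's J statistic; maximizing it weights sensitivity and specificity equally.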

Results

A total of 3997 patients with CAP were included at the Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China; 205 (5.12%) died in the hospital. First, using TensorFlow 2.4.1 as the modeling framework, a three-layer FCNN was built as a base model for classification. Twelve models based on FCNN were trained on the 12 datasets in the training and internal test cohorts with the patients’ distinct outcomes as ground truth (Figure 1). The AUC of the 12 models ranged from 0.882 to 0.989 (Figure 2A), with an accuracy rate ranging from 0.793 to 0.990. PPV and NPV were satisfactory. To improve the prediction accuracy of the model, this study averaged the scores of the 12 selected models for each patient and established an ensemble model (ensemble FCNN). The AUC of the ensemble FCNN was 0.975 (Figure 2B), with an accuracy rate of 0.952 (Table 1).

Figure 1

Deep learning flowchart.

Figure 2

(A, B) Area under the ROC curve of each model and method.

Table 1

Comparison of Performances of Each Model and the Ensemble Model in Internal and External Test

Model	AUC	ACC	PPV	NPV	SENS	SPEC	F1
FCN – Sampling 1	0.952	0.905	1	0.841	0.813	1	0.904
FCN – Sampling 2	0.945	0.916	0.941	0.874	0.852	0.953	0.903
FCN – Sampling 3	0.938	0.99	0.923	0.962	0.971	0.893	0.941
FCN – Sampling 4	0.945	0.901	0.974	0.863	0.804	0.983	0.904
FCN – Sampling 5	0.941	0.99	0.933	0.954	0.963	0.902	0.941
FCN – Sampling 6	0.921	0.869	0.892	0.854	0.824	0.911	0.878
FCN – Sampling 7	0.922	0.893	0.942	0.864	0.824	0.964	0.893
FCN – Sampling 8	0.915	0.882	0.974	0.824	0.794	0.982	0.884
FCN – Sampling 9	0.921	0.869	0.971	0.813	0.744	0.984	0.874
FCN – Sampling 10	0.939	0.918	0.933	0.874	0.864	0.933	0.894
FCN – Sampling 11	0.882	0.793	0.784	0.802	0.744	0.831	0.794
FCN – Sampling 12	0.989	0.941	0.911	0.974	0.982	0.903	0.944
Ensemble FCN (Based on 12 FCNs)	0.975	0.952	0.954	0.954	0.951	0.952	0.952

Abbreviations: AUC, area under the ROC curve; ACC, accuracy rate; PPV, positive predictive value; NPV, negative predictive value; SENS, sensitivity; SPEC, specificity; F1, F1 score.
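As a quick consistency check on the table, the F1 value of the ensemble row can be recomputed from its PPV and sensitivity. The sketch assumes the standard definition of F1 as the harmonic mean of PPV (precision) and sensitivity (recall); the function name is hypothetical and the inputs are the rounded values printed in the ensemble row.

```python
def f1_from_ppv_sens(ppv, sens):
    """F1 as the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sens / (ppv + sens)

# Ensemble row of Table 1: PPV 0.954, SENS 0.951.
ensemble_f1 = f1_from_ppv_sens(ppv=0.954, sens=0.951)
print(round(ensemble_f1, 3))  # prints 0.952
```

The result agrees with the F1 of 0.952 reported for the ensemble model.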

To demonstrate the superiority of the ensemble FCNN model, we compared it with seven classical machine learning methods based on the same training and test data, including logistic regression, support vector machine, K-nearest neighbor, Gaussian naive Bayes, decision tree, and random forest, and their integrated stacking patterns. The results showed that the ensemble FCNN model had the best AUC, achieving 0.975, with the best accuracy rate of 0.952 (Table 2).

Table 2

Comparison of Performances of the Ensemble Model and Seven Classical Machine Learning Methods in Internal and External Test

Model	AUC	ACC	PPV	NPV	SENS	SPEC	F1
Logistic Regression	0.801	0.847	0.862	0.764	0.724	0.881	0.804
Support Vector Machine	0.837	0.835	1.00	0.754	0.673	1.00	0.842
K Nearest Neighbor	0.778	0.776	0.854	0.734	0.671	0.882	0.785
Gaussian Naive Bayes	0.813	0.812	0.914	0.753	0.702	0.933	0.814
Decision Tree	0.835	0.835	0.824	0.851	0.862	0.811	0.843
Random Forest	0.824	0.824	0.851	0.803	0.794	0.862	0.822
Stacking Classifier	0.825	0.824	0.911	0.764	0.721	0.932	0.823
Ensemble FCN (Based on 12 FCNs)	0.975	0.952	0.954	0.954	0.951	0.952	0.952

Abbreviations: AUC, area under the ROC curve; ACC, accuracy rate; PPV, positive predictive value; NPV, negative predictive value; SENS, sensitivity; SPEC, specificity; F1, F1 score.

The predicted outcome was poor prognosis if the ProgNet’s poor prognosis score was greater than the cutoff value of 0.50. To check the plausibility of the ensemble FCNN model, we also analyzed the weight of random forest features (Figure 3) and found that many mainstream prognostic features still held important weight, but the model was refined by integrating other factors considered less important by previous studies.

Figure 3

Weight of random forest features based on the ensemble FCNN model.

Discussion

In this study, each patient had 62 clinical-related features, including clinical diagnostic and treatment features. Based on the available and common clinical-related feature variables, this study obtained a precise model for predicting CAP mortality by adopting deep learning techniques. Deep learning has been widely used in medicine because of its computational power and the availability of massive new datasets. Given the sheer volume of data being generated and the increasing proliferation of medical devices and digital record systems, deep learning is well suited to healthcare and medicine.15,16 Presently, only a few studies have reported the application of deep learning in pneumonia-related studies. Li et al used AI to detect coronavirus disease 2019 (COVID-19) and CAP based on pulmonary computed tomography and found that a deep learning model can accurately detect COVID-19 and differentiate it from CAP and other lung conditions.17 Wang et al aimed to develop and test an efficient and accurate deep learning scheme based on chest X-ray that assists radiologists in automatically recognizing and localizing COVID-19 and revealed that, compared to the radiologists’ discrimination and localization results, the accuracy of COVID-19 discrimination using the discrimination-DL was 98.71%, while the accuracy of localization using localization-DL was 93.03%.18 Another study compared CAP, secondary pulmonary tuberculosis, and healthy control images and found that the deep learning model was a good model for distinguishing COVID-19 from other lung infectious diseases.19 However, most previous studies used deep learning based on imaging data to distinguish pneumonia, while this study predicted the prognosis of pneumonia based on commonly used clinical test data, which was a notable innovation. 
FCNN is a novel neural network with inherent features characterized by automatic feature extraction and classification steps.20 This study found that the best AUC of the 12 FCNN models was 0.989, with a high accuracy rate. PPV and NPV were satisfactory. To improve the prediction accuracy of the model, this study established an ensemble FCNN based on the 12 selected models. The ensemble FCNN model predictions were good. Other studies have suggested that FCNN has similar advantages in predicting the death of patients with CAP and other diseases.21,22 To further test whether this model was superior to those established by traditional statistical methods, it was compared with seven other classical machine learning methods based on the same training and test data. The results showed that the ensemble FCNN model had the best AUC of 0.975, with the best accuracy rate of 0.952. This was similar to previous studies, confirming the superiority of FCNN.23,24 Meanwhile, similar methods of validation have been used in other studies. Chandak et al used machine learning to improve ensemble docking for drug discovery and compared it with other classical machine learning methods.25 In the current study, the predicted outcome was poor prognosis if the ProgNet’s poor prognosis score was greater than the cutoff value of 0.50. This made the ensemble FCNN model easier to use. It was convenient for medical institutions, especially grassroots medical and health institutions. As clinicians, we believe that many researchers, like the authors, are interested in the composition of the ensemble FCNN model, that is, in the weight of each parameter in the model. Would factors that were considered important predictors, such as treatment regimens, still play a key role in the ensemble FCNN model? What about previously overlooked factors? From the weight of random forest features based on the ensemble FCNN model, we found that each variable played a role. 
The top five variables were β-lactam plus macrolide treatment, other antibiotics treatment, blood albumin level, fluoroquinolone treatment alone, and blood urea nitrogen/albumin ratio. β-lactam plus macrolide treatment is a common therapeutic schedule for CAP. Horita et al reviewed and analyzed published trials and observational studies and found that β-lactam plus macrolide treatment might decrease all-cause death in CAP.26 Ceccato et al found that β-lactam plus macrolide treatment was helpful for patients with pneumococcal CAP and patients with a high systemic inflammatory response.4 In the current study, other antibiotic treatments were defined as carbapenems combined with vancomycin or linezolid. In other words, the patients for whom clinicians chose the other antibiotic treatment scheme may have been critically ill; it is therefore plausible that the model associated this scheme with high mortality. This is understandable and acceptable. There seem to be few studies on albumin levels and the prognosis of pneumonia. However, the albumin level is an important indicator of nutritional status and is related to the prognosis of many diseases. Low serum albumin levels are associated with an increased risk of death in patients with severe sepsis.27 Low serum albumin level is a powerful predictor of all-cause mortality in patients with acute coronary syndrome.28 Fluoroquinolone treatment alone is also a common therapeutic strategy for CAP. It plays an important role in the prognosis of CAP. There was no difference in the 30-day readmissions between patients with CAP who received fluoroquinolone monotherapy and those who received β-lactam plus macrolide combination therapy.29 The blood urea nitrogen/albumin ratio is a simple parameter. Akyil et al found that the blood urea nitrogen/albumin ratio was related to the prognosis of patients hospitalized with CAP.30 In summary, the top five variables of the model are consistent with mainstream research. 
This model has considerable reliability and applicability. However, there were some limitations to the current study. First, as a retrospective study, there may be bias in the selection of medical records. Second, as a single-center study, the sample size was limited, and the study lacked external validation data. In the future, we plan to conduct a prospective multicenter large-sample study to further confirm the value of the model.

Conclusion

The present study used deep learning algorithms to classify prognosis based on available and common clinical-related feature variables in patients with CAP with high accuracy and good generalizability.

28 in total

1. DeepVOG: Open-source pupil segmentation and gaze estimation in neuroscience using deep learning.

Authors: Yuk-Hoi Yiu; Moustafa Aboulatta; Theresa Raiser; Leoni Ophey; Virginia L Flanagin; Peter Zu Eulenburg; Seyed-Ahmad Ahmadi
Journal: J Neurosci Methods Date: 2019-06-06 Impact factor: 2.390

2. Pneumonia is a neglected problem: it is now time to act.

Authors: Stefano Aliberti; Charles S Dela Cruz; Giovanni Sotgiu; Marcos I Restrepo
Journal: Lancet Respir Med Date: 2018-11-12 Impact factor: 30.700

3. Predictive Value of Serum Albumin Level for the Prognosis of Severe Sepsis Without Exogenous Human Albumin Administration: A Prospective Cohort Study.

Authors: Mei Yin; Lei Si; Weidong Qin; Chen Li; Jianning Zhang; Hongna Yang; Hui Han; Fan Zhang; Shifang Ding; Min Zhou; Dawei Wu; Xiaomei Chen; Hao Wang
Journal: J Intensive Care Med Date: 2016-12-26 Impact factor: 3.510

4. Effect of Combined β-Lactam/Macrolide Therapy on Mortality According to the Microbial Etiology and Inflammatory Status of Patients With Community-Acquired Pneumonia.

Authors: Adrian Ceccato; Catia Cilloniz; Ignacio Martin-Loeches; Otavio T Ranzani; Albert Gabarrus; Leticia Bueno; Carolina Garcia-Vidal; Miquel Ferrer; Michael S Niederman; Antoni Torres
Journal: Chest Date: 2018-11-22 Impact factor: 9.410

Review 5. Artificial intelligence in radiology.

Authors: Ahmed Hosny; Chintan Parmar; John Quackenbush; Lawrence H Schwartz; Hugo J W L Aerts
Journal: Nat Rev Cancer Date: 2018-08 Impact factor: 60.716

6. Accelerating cardiovascular model building with convolutional neural networks.

Authors: Gabriel Maher; Nathan Wilson; Alison Marsden
Journal: Med Biol Eng Comput Date: 2019-08-24 Impact factor: 2.602

7. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980-2017: a systematic analysis for the Global Burden of Disease Study 2017.

Authors:
Journal: Lancet Date: 2018-11-08 Impact factor: 79.321