Massimiliano Greco1, Giovanni Angelotti2, Pier Francesco Caruso3, Alberto Zanella4, Niccolò Stomeo1, Elena Costantini2, Alessandro Protti1, Antonio Pesenti4, Giacomo Grasselli4, Maurizio Cecconi1. 1. Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072 Pieve Emanuele, Milan, Italy; IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy. 2. IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy. 3. Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072 Pieve Emanuele, Milan, Italy; IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy. Electronic address: pierfrancesco.caruso@humanitas.it. 4. Dipartimento di Anestesia, Rianimazione ed Emergenza-Urgenza, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy; Department of Pathophysiology and Transplantation, University of Milan, Milan, Italy.
Abstract
PURPOSE: COVID-19 disease frequently affects the lungs leading to bilateral viral pneumonia, progressing in some cases to severe respiratory failure requiring ICU admission and mechanical ventilation. Risk stratification at ICU admission is fundamental for resource allocation and decision making. We assessed performances of three machine learning approaches to predict mortality in COVID-19 patients admitted to ICU using early operative data from the Lombardy ICU Network. METHODS: This is a secondary analysis of prospectively collected data from Lombardy ICU network. A logistic regression, balanced logistic regression and random forest were built to predict survival on two datasets: dataset A included patient demographics, medications before admission and comorbidities, and dataset B included respiratory data the first day in ICU. RESULTS: Models were trained on 1484 patients on four outcomes (7/14/21/28 days) and reached the greatest predictive performance at 28 days (F1-score: 0.75 and AUC: 0.80). Age, number of comorbidities and male gender were strongly associated with mortality. On dataset B, mode of ventilatory assistance at ICU admission and fraction of inspired oxygen were associated with an increase in prediction performances. CONCLUSIONS: Machine learning techniques might be useful in emergency phases to reach good predictive performances maintaining interpretability to gain knowledge on complex situations and enhance patient management and resources.
Towards the end of 2019, a novel strain of coronavirus, named Severe Acute Respiratory Syndrome coronavirus-2 (SARS-CoV-2), was identified as the causative agent of an outbreak of bilateral pneumonia in the city of Wuhan in China [1]. The clinical picture related to SARS-CoV-2 infection was subsequently named COVID-19 disease, and is frequently characterized by severe bilateral pneumonia. The epidemic spread outside mainland China to an increasing number of countries, and on March 11th, 2020, it was declared a pandemic [2].
The Lombardy region in Italy was the epicentre of the first outbreak of COVID-19 in the Western World. In Lombardy, the first cases were recognized at the end of February, and the number of Intensive Care Unit admissions rose substantially in the following weeks [3].
The outcomes of patients admitted to ICU for COVID-19 disease are severe, and comparable with those of patients with severe Acute Respiratory Distress Syndrome (ARDS), with mortality up to 50% in patients requiring mechanical ventilation [4], [5], [6], [7]. Several factors have been associated with a negative outcome, including age, male gender, previous comorbidities, and level of respiratory support at ICU admission [8], [4], [9].
Machine learning algorithms are increasingly employed in clinical medicine due to their potential for analysing large amounts of information with reduced human supervision, resulting in high predictive performance [10], [11]. This kind of model can similarly help to hasten data cleaning and fine-tuning of predictive models, a process which would normally require weeks of data cleaning and exploratory analysis, while time and human resources are scarce during an emergency. By contrast, a purely data-driven approach applied through artificial intelligence could yield good predictive performance while using fewer resources and less time [12].
Better use of scarce resources through artificial intelligence could be useful both for the healthcare system, to enhance the allocation of resources, and for patients, to rapidly target the best therapeutic strategies with realistic goals. We tested the feasibility and performance of a purely data-driven machine learning model to predict mortality using emergency operative data.
Methods
This is a secondary analysis of data collected during the COVID-19 Lombardy outbreak from February 2020 to April 2020 using operational and clinical data from the Lombardy ICU network, as described in previous studies [3], [4], [13]. The aim of this study is to predict survival at 7, 14, 21 and 28 days from ICU admission, using several different supervised learning frameworks and comparing them to baseline models. Data on patient baseline characteristics, including medications, comorbidities and baseline ventilation parameters, are included in the analysis. Data are described as mean (standard deviation) or frequency (percentage), as appropriate.
We first conducted univariate analysis of the data, testing associations with the Chi-square test for categorical variables and the Mann-Whitney U test for continuous variables. We chose non-parametric tests because the data did not follow a Gaussian distribution according to the Shapiro-Wilk test for normality. Survival analysis was conducted by plotting Kaplan-Meier curves. We built two different models (model A and model B) to predict survival at four different timepoints: 7-day, 14-day, 21-day and 28-day mortality from ICU admission.
Missing data were a small minority (<1.2%) in our categorical variables and were imputed using the Simple Imputer; due to the categorical nature of the features, we opted for a median imputation.
Patients whose data from the ICU were missing were excluded, as reported in the inclusion and exclusion criteria presented in Fig. 1. To assess the type of missingness of those data, we trained a multivariate model to test the hypothesis that patients with missing data belonged to a different cohort compared to those without missing data. The model, validated with 10-fold cross-validation and trained on baseline covariates such as age, gender, and comorbidities, was not able to differentiate the two groups, with an average Area Under the Curve (AUC) of 0.5. These results may indicate that missing data are not a result of clinical severity, thus pointing to a missing at random (MAR) distribution.
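The reading of an AUC of 0.5 as "unable to differentiate the two groups" can be illustrated with a minimal sketch. It relies on the standard rank interpretation of the AUC: the probability that a randomly chosen member of one group receives a higher score than a randomly chosen member of the other. The function name and the scores below are hypothetical and are not taken from the study data.

```python
from itertools import product


def auc_from_scores(pos_scores, neg_scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for pos, neg in product(pos_scores, neg_scores):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: the model assigns the same scores to both groups,
# so it cannot tell "missing data" patients from the others.
uninformative = auc_from_scores([0.2, 0.5, 0.8], [0.2, 0.5, 0.8])

# Hypothetical scores: every positive outranks every negative.
separating = auc_from_scores([0.7, 0.9], [0.1, 0.3])

print(uninformative, separating)
```

With identical score distributions the sketch returns 0.5, the value reported for the missingness model, while perfectly separated scores return 1.0.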
Fig. 1
Inclusion criteria diagram of our cohort population. Model A was built using data from 1484 patients, while Model B was trained on a reduced cohort (929 patients).
Model A included only baseline patient data (age, gender, home medications and comorbidities); Model B included baseline data (the same data used to create model A) and ventilation parameters from the first 24 h in ICU. The list of training variables is presented in Table 1.
Description of covariates characteristics according to survival analysis. IQR: ‘InterQuartile Range’, SD: ‘Standard Deviation’, COPD: ‘Chronic Obstructive Pulmonary Disease’, CKD: ‘Chronic Kidney Disease’, ACE: ‘Angiotensin Converting Enzyme’, ARBs: ‘Angiotensin receptor blockers’, PEEP: ‘Positive End-Expiratory Pressure’, FiO2: ‘Fraction of inspired oxygen’, PaO2: ‘Partial pressure of oxygen’, P/F: ‘PaO2/FiO2’, CPAP: ‘Continuous positive airway pressure’, IMV: ‘Invasive Mechanical Ventilation’, NIV: ‘Non Invasive Ventilation’, SB: ‘Spontaneous Breathing’.
We created four subsequent models with increasing complexity. A baseline model was created using a uniform classifier (UNIF), a simple classifier commonly employed as a baseline in ML models [14]. Our first prediction model was a logistic regression (LR). The second model (BAL-LR) combined logistic regression with an over- and under-sampling strategy to balance our classes, known as SMOTE-Tomek [15].
Lastly, we trained a random forest classifier with balanced class weights (RF) [14].
Each technique was used to build two different models (A and B), and every model was trained using nested cross-validation (10-fold each). Hyperparameters were optimized to maximize the out-of-fold F1-score on a randomized grid space. The main scores used to track model performance were F1-score and area under the curve (AUC), and all the scores presented in the results section are on test sets.
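The structure of nested cross-validation can be sketched with index bookkeeping alone. The sketch below is illustrative and is not the authors' pipeline: the helper names are hypothetical, the splits are not stratified by outcome, and model fitting and hyperparameter search are omitted. It only shows that the inner folds are drawn from the outer training set, so the outer test fold never influences hyperparameter selection.

```python
import random


def kfold_indices(indices, k, seed=0):
    """Shuffle the indices and yield k (train, test) index lists."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_cv_splits(n_samples, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits partitions outer_train only; it is where hyperparameters
    would be tuned. outer_test is kept aside for the final score.
    """
    for outer_train, outer_test in kfold_indices(range(n_samples), outer_k, seed):
        inner_splits = list(kfold_indices(outer_train, inner_k, seed + 1))
        yield outer_train, outer_test, inner_splits


# 1484 is the Model A cohort size reported in the text.
splits = list(nested_cv_splits(n_samples=1484))
print(len(splits), len(splits[0][2]))
```

In a full pipeline, each inner split would score one hyperparameter candidate (here, by F1-score), and the best candidate would then be refitted on the outer training set and evaluated once on the outer test fold.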
Results
Table 1 [16] describes our cohort, which included a total of 1484 patients. Overall mortality at 28 days was 49% (n = 741). Survivors were significantly younger and suffered fewer comorbidities at admission compared to non-survivors (p < 0.001). 44% of survivors had no pre-existing comorbidities, compared to 25% of non-survivors (p < 0.001). Male gender was associated with increased mortality (p = 0.002). All comorbidities were more common in non-survivors except for hepatic disease. Among home medications, all drugs analyzed were more common in non-survivors (p < 0.005) except for immunosuppressors (p = 0.058). Concerning respiratory data, survivors were more often treated with Continuous Positive Airway Pressure (CPAP) and less often with invasive mechanical ventilation on the first day of their admission (p < 0.005).
Fig. 2 represents the Kaplan-Meier analysis stratified by age. Age was strongly associated with mortality, with 7-day survival ranging from 64% in the oldest age group to 93% in the 30–40 years group, and with differences progressively increasing at 14 and 28 days. In the supplemental material, the overall Kaplan-Meier analysis shows a reduction in survival to 70% at 10 days and to 55% at 20 days from ICU admission, continuing as a plateau thereafter (Supplemental Figure 1). When considering gender in the Kaplan-Meier analysis, female gender was associated with higher survival (Supplemental Figure 2).
Fig. 2
Kaplan-Meier curve stratified by age.
Comparisons of F1-score and area under the curve for models A and B of the four techniques are reported in Fig. 3. Comparisons of models A and B of each technique against the baseline UNIF model are reported in Supplemental Figure 3. Pairwise comparisons between the models, for both A and B, are reported in Fig. 4.
Fig. 3
Comparison of F1-score and Area Under the Curve between model A and model B for every model proposed. The point represents the mean, while the vertical line stands for +/- 1 standard deviation across all cross-validation folds. UNIF: ‘Uniform Dummy Classifier’, LR: ‘Logistic Regression’, BAL-LR: ‘Balanced Logistic Regression’, RF: ‘Random Forest Classifier’.
Fig. 4
Differences between Logistic Regression (LR) and the other two models (Balanced Logistic Regression (BAL-LR) and Random Forest classifier (RF)). There was no significant difference between BAL-LR and RF. P-values refer to differences in results between LR and the second model (BAL-LR and RF, respectively). The point represents the mean, while the vertical line stands for +/- 1 standard deviation across all cross-validation folds.
All three LR, BAL-LR and RF models performed better than the baseline UNIF models according to F1-score and AUC (p < 0.01) (Table 2). The only exception was the F1-score of the LR model at 7 days, which was poor due to low recall.
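The effect of low recall on the F1-score follows from its definition as the harmonic mean of precision and recall. The sketch below uses hypothetical precision and recall values, not figures from Table 2, to show how a single low component pulls the score down.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: balanced precision and recall.
balanced = f1_score(0.75, 0.75)

# Hypothetical values: high precision cannot compensate for low recall.
low_recall = f1_score(0.80, 0.10)

print(balanced, low_recall)
```

Because the harmonic mean is dominated by the smaller term, a model that rarely flags non-survivors scores poorly on F1 even when most of its positive predictions are correct.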
Table 2
Nested cross-validated performances of the models on the test set. UNIF is the baseline model, LR is the first logistic regression, BAL-LR is the composed balanced logistic regression and RF is a random forest classifier. F1-score is the harmonic mean of precision and recall. SD: ‘Standard Deviation’, AUC: ‘Area Under the Curve’; precision is also known as Positive Predictive Value (PPV), recall is also known as sensitivity.
BAL-LR and RF had higher performance than LR when the outcome was set to 7 days (p < 0.01) and to 14 days for both model A (p < 0.001) and model B (p < 0.03). After that timepoint the three models performed without any statistically significant difference (Fig. 4) and, as time intervals increased, F1-score and Area Under the Curve progressively increased as well for all three models in both A and B.
At 7 days, the BAL-LR model yielded an F1-score of 0.35 (model A) and 0.45 (model B) and an AUC of 0.68 (model A) and 0.76 (model B) (Table 2). The performance increased at 14 days, with average F1-score 0.56 (model A) and 0.61 (model B) and AUC 0.72 (model A) and 0.75 (model B), and again at 21 days. The greatest predictive performance was reached at 28 days, with average F1-score of 0.71 (model A) and 0.75 (model B) and AUC of 0.77 (model A) and 0.80 (model B). Precision and recall were balanced to maximize performance in all our models, with only two exceptions: LR at 7 and 14 days, where recall was very low compared to precision.
Fig. 5a reports the odds ratios of the BAL-LR model: all values higher than 1 were positively associated with death in ICU, while those below 1 were considered protective by our models.
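The link between a logistic regression coefficient and its odds ratio can be sketched briefly. In a logistic regression, the odds ratio for a feature is the exponential of its coefficient, so a positive coefficient gives a value above 1 and a negative coefficient a value below 1. The feature names and coefficients below are hypothetical and do not reproduce the values in Fig. 5a.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a feature: exp(coefficient)."""
    return math.exp(coefficient)


def direction(or_value, tol=1e-9):
    """Read an odds ratio as in the text: above 1 risk, below 1 protective."""
    if or_value > 1 + tol:
        return "associated with death"
    if or_value < 1 - tol:
        return "protective"
    return "neutral"


# Hypothetical coefficients, chosen only to show the three cases.
hypothetical_coefficients = {
    "feature_risk": 0.5,
    "feature_protective": -0.5,
    "feature_neutral": 0.0,
}

readings = {
    name: direction(odds_ratio(beta))
    for name, beta in hypothetical_coefficients.items()
}
print(readings)
```

Each odds ratio refers to a one-unit change in the corresponding feature, so the magnitudes of features measured on different scales are not directly comparable.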
Fig. 5a
Odds Ratio of BAL-LR models for 28 days predictions. CPAP: ‘Continuous positive airway pressure’, P/F: ‘PaO2/FiO2’, NIV: ‘Non Invasive Ventilation’, IMV: ‘Invasive Mechanical Ventilation’, ARBs: ‘Angiotensin receptor blockers’, PaO2: ‘Partial pressure of oxygen’, PEEP: ‘Positive End-Expiratory Pressure’, COPD: ‘Chronic Obstructive Pulmonary Disease’, ACE: ‘Angiotensin Converting Enzyme’, CKD: ‘Chronic Kidney Disease’, FiO2: ‘Fraction of inspired oxygen’.
Fig. 5b reports the average feature importance of the RF model. In contrast with logistic regression models, it is not possible in the RF model to reconstruct how the model considered each included variable, due to the nature of Random Forest. In the RF model, a numeric value was assigned to each variable to rank its importance in the decision process of the model.
Fig. 5b
Mean feature importance of RF models for 28 days predictions. Unlike an odds ratio, feature importance does not indicate the direction of the association with the outcome; it only represents how relevant each feature was for the model.
Discussion
In this study we demonstrated the potential of a purely data-driven machine learning approach to predict relevant clinical outcomes, reaching good predictive performance.
Machine learning techniques have the advantage of automatic variable selection and model development with reduced human interaction. This can be applied not only in highly controlled settings with clean, high-quality data (the best setting for machine learning performance) but can also be considered in an emergency setting, based on operational data, with good results.
We tested this approach with different models and techniques (LR, BAL-LR, RF) and compared them with a baseline model (UNIF) to evaluate the prediction capability of the models. We defined an outcome and set four different timepoints: 7, 14, 21 and 28 days. Lastly, we decided to evaluate two different data frameworks with every model: the first framework (model A) was trained on a dataset with baseline characteristics of patients admitted to ICU, while the second framework (model B) also included respiratory data gathered during the first day of admission. We asked every model 8 different questions, combining the 4 timepoints (7, 14, 21 and 28 days) and the two data frameworks (A and B).
When we compare the recall of the two LR models (A and B) at the first two timepoints with our baseline model (which classifies patients similarly to a coin toss), we can conclude that LR models have lower sensitivity than a coin toss. This happens because imbalanced data classes can affect the predictive capability of methods like logistic regression (LR): this kind of model tends to optimize the overall accuracy without considering the relative distribution of the classes [17]. Accordingly, after 21 days LR seems to be reliable because the classes of the population considered happen to be balanced.
On the contrary, techniques which allow balancing of data, such as BAL-LR (based on the SMOTE-Tomek technique), or models that do not suffer from the class imbalance problem, like RF, can represent an advantage in prediction models, allowing increased performance regardless of the distribution of the classes. Thus, we can conclude that BAL-LR and RF outperform UNIF at every timepoint, while LR models are reliable only after 21 days (as confirmed by Fig. 4).
Models that were trained on baseline characteristics and respiratory data (model B) had higher overall performance than their baseline counterpart (model A), as shown in Fig. 3. The p-values calculated over the nested cross-validation were almost always statistically significant for the F1-score. Thus, the analysis of the respiratory condition at admission might be clinically relevant to establish survival predictions and correctly classify COVID-19 patients.
RF models were trained with very tight rules to avoid overfitting, as the amount of training data was limited. Since Random Forests often work better for the analysis of big data, we can suppose that a larger amount of data would probably have allowed random forests to outperform the balanced logistic regression (BAL-LR).
In conclusion, as we are proposing a method that can be applied in any emergency setting with operational data, we would suggest training both models and modifying the parameters of the forest according to the size of the dataset.
An analysis of how decisions were taken when predicting mortality at 28 days was performed (Fig. 5a, Fig. 5b): BAL-LR and RF models confirm that age is the strongest predictor of ICU survival in COVID-19 patients.
In both models A and B, age was the most relevant feature and the most associated with mortality.
The strong association between age and mortality is a constant finding in COVID-19 literature [4], [18].
Chronic kidney disease (CKD) is highly correlated with mortality in our data (as shown in Table 1), and this may be related to several factors. First, CKD affects older patients [19] who, as widely demonstrated [3] and confirmed by our models, are particularly fragile when hospitalized for COVID-19. Secondly, all stages of CKD are associated with an increased risk of premature mortality from all causes [20]. Thirdly, CKD is associated in up to two-thirds of cases with diabetes and hypertension [21], a proxy for older, multi-morbid patients [4]. COVID-19 disease is also associated with new-onset acute kidney injury, which may further worsen previous kidney dysfunction, leading to organ failure [22], [23].
Regarding type B ICU admission models, the level of oxygen therapy (FiO2) was highly correlated with the outcome, as it represents a proxy of severity. To a minor extent, there was a positive association with positive end-expiratory pressure (PEEP), while conversely, an increased P/F ratio and an initial admission with continuous positive airway pressure (CPAP), non-invasive ventilation (NIV) or spontaneous breathing were associated with survival. All these features are a proxy of reduced severity, because these patients needed ICU hospitalization but were less critical than the rest of the population in the ward.
Male gender is a negative predictive factor, a finding confirmed by previous studies [24]. An etiological justification might be linked to a difference between the sexes in cellular immunity, as males present poorer T-cell activation and an increase in proinflammatory cytokines, but further studies are required on the topic [25].
Chronic therapy with ACE inhibitors was associated with higher mortality.
Initial reports linked the possible pharmacodynamics of this class of drugs to an up-regulation of ACE2 expression [26] and a consequent increase in the availability of target molecules for SARS-CoV-2 [27]. This association has been proven wrong by Mancia et al. [28], who performed a large population-based case-control study demonstrating that use of ACE inhibitors and ARBs was more frequent among COVID-19 patients due to their higher prevalence of cardiovascular disease, without evidence linking those drugs to a higher risk of infection by SARS-CoV-2.
According to BAL-LR, we found an association between chronic obstructive pulmonary disease (COPD) and mortality. COPD patients have both an increased risk of COVID-19 disease and a poorer prognosis, with higher rates of hospitalization and mortality [29]. COPD is an independent predictor of mortality in patients admitted to ICU for COVID-19 pneumonia [4].
Diabetes mellitus is associated with mortality in our results. Type-2 diabetes mellitus is more frequent in older patients and in males, and is part of the metabolic syndrome with hypertension and obesity, which was previously demonstrated to have a strong association with COVID-19 outcomes [30]. The association of diabetes and survival has been questioned by other studies, where the association was lost after controlling for other factors [4], [31].
A higher number of comorbidities was associated with increased risk of death. As most of the comorbidities were associated with an increased risk, it is not surprising that their sum leads to an increased overall risk. In concordance with another work performed on the COVID-19 Lombardy population [4], most of the comorbidities we analyzed are associated with an increased risk of death, with very few exceptions (e.g., hepatic diseases).
Limitations
The study presents several limitations. First, it is an observational study based on operative data collected during an emergency crisis by a regional coordination center, which lacks the data quality assured by a research-targeted database. Despite this limiting factor, our aim was to assess the ability of machine learning models on operational/emergency data collected during the escalation phase of the spread of SARS-CoV-2, where a hold-out validation could not be retrieved. Some variables that could be useful to increase the predictive performance of the model were not collected, including more specific data about comorbidities (e.g., CKD stage, hypertension severity stage) and other physiological parameters (weight, body mass index, more complete ventilatory data, patient frailty). Availability of more data could have improved predictive performance in this population.
The number of patients and data included in this study is not comparable with big data analysis, where machine learning techniques really shine. However, with this study we were able to demonstrate that a machine learning approach may be used even with smaller datasets in an emergency setting and reach high predictive performance.
Conclusions
Supervised machine learning models with a completely data-driven approach may be employed in emergency settings to assess the major risk factors of critical COVID-19 patients, despite sub-optimal size and clean-up of the datasets. We propose a four-step machine learning approach which can be used in similar settings to gain knowledge on complex situations and enhance patient management and resources, sparing resources and time compared to classic statistical techniques.
Summary Table
What was already known on the topic?
COVID-19 patients admitted to ICU have very high mortality.
Age, gender, and previous comorbidities are associated with negative outcome.
Resource allocation during surges would be helpful for both patients and healthcare systems.
What did this study add to our knowledge?
Supervised machine learning predictive models perform robustly in predicting mortality at ICU admission for COVID-19 patients.
Operational/emergency datasets collected during escalation phases can create robust machine learning predictive models even in smaller datasets.
Easy-to-deploy machine learning pipelines should be created in advance, so that, during emergency phases, faster insights on the patients admitted to ICU could be retrieved in a completely data-driven manner.
Ethics approval and consent to participate
The institutional ethics board of Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan, approved this study and waived the need for informed consent from individual patients owing to the retrospective nature of the study. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
Authors' contributions
On behalf of the Lombardy ICU Network:
Concept and design: MG, GA, PFC.
Acquisition, analysis, or interpretation of data: MG, GA, PFC, AZ.
Drafting of the manuscript: MG, GA, PFC, NS.
Critical revision of the manuscript for important intellectual content: AZ, EC, Alessandro Protti, Antonio Pesenti, GG, MC.
Supervision: Antonio Pesenti, GG, MC.
CRediT authorship contribution statement
Massimiliano Greco: . Giovanni Angelotti: . Pier Francesco Caruso: . Alberto Zanella: . Niccolò Stomeo: . Elena Costantini: . Alessandro Protti: . Antonio Pesenti: Supervision. Giacomo Grasselli: Supervision. Maurizio Cecconi: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.