
Machine Learning Based Prediction of COVID-19 Mortality Suggests Repositioning of Anticancer Drug for Treating Severe Cases.

Thomas Linden^1,2, Frank Hanses^3,4, Daniel Domingo-Fernández¹, Lauren Nicole DeLong^1,2, Alpha Tom Kodamullil¹, Jochen Schneider⁵, Maria J G T Vehreschild⁶, Julia Lanznaster⁷, Maria Madeleine Ruethrich⁸, Stefan Borgmann⁹, Martin Hower¹⁰, Kai Wille¹¹, Torsten Feldt¹², Siegbert Rieg¹³, Bernd Hertenstein¹³, Christoph Wyen¹⁴, Christoph Roemmele¹⁵, Jörg Janne Vehreschild⁶, Carolin E M Jakob¹⁶, Melanie Stecher¹⁷, Maria Kuzikov⁴, Andrea Zaliani⁴, Holger Fröhlich^1,2.

Abstract

Despite available vaccinations, COVID-19 case numbers around the world are still growing, and effective medications against severe cases are lacking. In this work, we developed a machine learning model that predicts mortality for COVID-19 patients using data from the multi-center 'Lean European Open Survey on SARS-CoV-2-infected patients' (LEOSS) observational study (>100 active sites in Europe, primarily in Germany), resulting in an AUC of almost 80%. We showed that molecular mechanisms related to dementia, one of the relevant predictors in our model, intersect with those associated with COVID-19. Most notably, among these molecules was tyrosine kinase 2 (TYK2), a protein that has been patented as a drug target in Alzheimer's Disease and has also been genetically associated with severe COVID-19 outcomes. We experimentally verified that the anti-cancer drugs Sorafenib and Regorafenib showed a clear anti-cytopathic effect in Caco2 and VERO-E6 cells and can thus be regarded as potential treatments against COVID-19. Altogether, our work demonstrates that interpretation of machine learning based risk models can point towards drug targets and new treatment options, which are strongly needed for COVID-19.

Entities: Chemical

Keywords: Covid19; Drug repositioning; Explainable ai; Machine learning; Precision medicine

Year: 2021 PMID: 34988543 PMCID: PMC8677630 DOI: 10.1016/j.ailsci.2021.100020

Source DB: PubMed Journal: Artif Intell Life Sci ISSN: 2667-3185

Introduction

As of October 2021, the ongoing SARS-CoV-2 pandemic had led to almost 5 million reported deaths worldwide according to data from the US Institute for Health Metrics and Evaluation (https://covid19.healthdata.org/). In addition, economic costs are estimated to reach the order of several trillion dollars for the USA alone [11]. While effective vaccinations are now available, there is still a considerable number of infected people worldwide. Moreover, effective medications for treating severe cases are still scarce. Remdesivir, a drug originally developed against the Ebola virus, is currently the only approved COVID-19 drug in the European Union, and evidence suggests that it has little effect on the overall survival of COVID-19 patients [5]. Several studies have revealed general risk factors for a poor disease outcome, such as age, male gender, and low platelet count [20,42,46,65]. In addition, machine learning (ML) models have been published to predict mortality risk for individual patients, primarily based on data from Intensive Care Units and electronic health records from the US and UK [3,6,19,31,48,53,57] as well as a few other countries [33,39]. Notably, the 4C mortality score developed by Ali et al., based on data from the UK, has recently been validated within an independent study in Canada [31]. None of these models have resulted in a change of clinical routine or the identification of new treatment options so far. In this work, we specifically investigated data from nearly 5700 PCR or rapid test confirmed SARS-CoV-2 patients recruited in more than 100 European active sites, primarily all over Germany. For these patients, disease symptoms, vital parameters, biomarkers from urine and blood, and diagnosed comorbidities were available. Using these data and ML, we first developed a model that can predict mortality with an area under the receiver operating characteristic curve (AUC) of almost 80% up to 60 days in advance. 
One of the relevant predictors in our model was a prior diagnosis of dementia, which increases the mortality risk by about 15%. Based on this finding, we explored the overlap between COVID-19, Alzheimer's Disease (AD), and Parkinson's Disease (PD) molecular disease mechanisms, which pointed us to tyrosine kinase 2 (TYK2) as a potential new drug target. Finally, our experimental data with Caco2 and VERO-E6 cells suggest that Sorafenib and Regorafenib, two approved anti-cancer drugs, could be repositioned for treating severe COVID-19 cases.

Results

Overview about LEOSS data

The Lean European Open Survey on SARS-CoV-2 infected patients (LEOSS, https://leoss.net/) is an observational, multi-center study focusing on PCR or rapid test confirmed patients. Study centers are primarily University Medical Centers, but also include other hospitals, institutes, and medical practices. Active sites cover several European countries but have a primary focus on Germany. They are thought to generate representative data of (primarily hospitalized) COVID-19 cases, at least for Germany. In order to ensure anonymity in all steps of the analysis process, an individual LEOSS Scientific Use File (SUF) was created, which is based on the LEOSS Public Use File (PUF) principles described in [29]. The baseline data from more than 100 active sites, collected at the time of a positive test or diagnosis, comprise patient demographics, disease symptoms, vital parameters, biomarkers from urine and blood, and comorbidities. Follow-up information, including survival, was available for patients between 18 and 85 years. These data were further filtered to only include patients with less than 50% missing data at baseline, resulting in n = 5679 patients (Table 1). Out of those 5679 patients, 5225 (92%) were inpatients, and 569 (10.0%) were reported death cases within a follow-up period of up to 78 days. Among them, 430 (76% of 569) patients were reported death cases within the first 20 days (Fig. 1).

Table 1

Overview of patient demographics in LEOSS.

Age
18 - 25 years	181
26 - 35 years	472
36 - 45 years	540
46 - 55 years	907
56 - 65 years	1125
66 - 75 years	981
76 - 85 years	1231
missing	242

Gender

Male	3229
Female	2218
missing	232

Ethnicity

Caucasian	4225
missing	1195
Asian & Pacific Islander	155
African & African American	98
Hispanic or Latino	6

Country

Germany	5411
Turkey	65
Belgium	40
Czechia	33
Latvia	27
Other	26
United Kingdom	23
Italy	19
Spain	15
France	11
Austria	9

Fig. 1

Kaplan-Meier plot of COVID-19 patients in LEOSS. The plot shows the estimated survival function according to the well-known product limit estimator, see section “Methods” [32]. The gray area depicts the 95% confidence interval.


Machine learning can predict mortality with high accuracy

We implemented and compared a broad panel of time-to-event machine learning models to predict patient survival using only LEOSS baseline data: (i) elastic net penalized Cox proportional hazards regression [10,66,71]; (ii) elastic net penalized Weibull accelerated failure time regression [35,62,71]; (iii) DeepSurv, a neural network approach using a loss function derived from a Cox proportional hazards model [34]; (iv) Random Survival Forests [28]; and (v) XGBoost Survival Embeddings, a popular stochastic gradient boosting algorithm using a loss function derived from a Weibull regression [58]. Notably, all these models account for the right censoring of the data (see details in section “Methods”). We evaluated models via a five-fold cross-validation (CV). In other words, we split the entire dataset into five outer folds and successively left out one of these folds for testing the model, while the rest of the data was used for model training and tuning. Notably, splitting of the data was performed in a stratified manner, such that the number of events was balanced across all folds. We tuned the hyper-parameters within the CV loop using an extra level of inner five-fold CV (see Section 4.2 for details). We employed Uno's C-index as a metric to assess prediction performance [56]. A C-index of 50% indicates chance level, whereas a C-index of 100% would reflect a perfect concordance of model predictions and observed death cases in the test data (see Section 4.3 for details). Overall, elastic net penalized Weibull regression achieved the best discrimination performance with ∼77% C-index (Fig. 2a) and a low calibration error (Integrated Brier Score, IBS) of 0.12 (Fig. 2b, Supplementary Table 1). Furthermore, a stable prediction performance of ∼80% AUC was found up to ∼60 days after disease diagnosis (Fig. 2c). 
Therefore, elastic net penalized Weibull regression was subsequently used to train a final model on the entire dataset, using the previously described approach for hyper-parameter tuning.
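
The stratified outer split described above can be illustrated with a minimal sketch. It is not taken from the authors' code: the function name `stratified_folds` and the toy cohort are assumptions made for illustration, and the inner tuning loop is only indicated by a comment.

```python
import random

def stratified_folds(events, n_folds=5, seed=0):
    """Assign each patient index to a fold so that events (1) and
    censored cases (0) are spread as evenly as possible across folds."""
    rng = random.Random(seed)
    fold_of = [None] * len(events)
    for label in (0, 1):
        idx = [i for i, e in enumerate(events) if e == label]
        rng.shuffle(idx)
        # Deal shuffled indices round-robin into the folds.
        for pos, i in enumerate(idx):
            fold_of[i] = pos % n_folds
    return fold_of

# Hypothetical toy cohort: 500 patients, 10% observed deaths.
events = [1] * 50 + [0] * 450
fold_of = stratified_folds(events)

for k in range(5):
    test_idx = [i for i, f in enumerate(fold_of) if f == k]
    train_idx = [i for i, f in enumerate(fold_of) if f != k]
    # An inner five-fold split of train_idx would be used here for tuning.
    n_test_events = sum(events[i] for i in test_idx)
    print(k, len(train_idx), len(test_idx), n_test_events)
```

Dealing events and censored cases into folds separately is one simple way to keep the event count per fold balanced; library implementations of stratified splitting follow the same idea.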

Fig. 2

(a) Model prediction performance measured via Uno's C-index on held out test sets (COX = elastic net penalized Cox proportional hazards regression; WEI = elastic net penalized Weibull accelerated failure time regression; XGBSE = XGBoost Survival Embeddings; RSF = Random Survival Forest; DEEPSURV = DeepSurv); (b) model calibration error measured via Integrated Brier Score (IBS) on held out test sets; (c) model prediction performance as function of time on held out test sets with 95% confidence interval, with integrated AUC (iAUC) denoting the mean (standard error) AUC over time. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Diagnosis of dementia as a relevant predictor

The final model was further explored with respect to the impact of the most relevant predictors using Shapley Additive Explanations (SHAP) [38]. Briefly, SHAP is an approach from cooperative game theory to decompose the overall prediction of the model into a sum of individual feature contributions (see details in 4.4). In total, the final model comprised 160 features. A complete list can be found in Supplementary File 1. Fig. 3a shows the most influential features according to SHAP, while Fig. 3b summarizes the influence of entire feature modalities, indicating that lab measures were the most relevant type of features (23.5% cumulative importance). Disease symptoms ranked second (20.5%) and comorbidities third (13.2% cumulative importance). Age, gender, platelet count as well as elevated troponin and ferritin concentrations were among the top predictors in the model, which are all known risk factors [20,42,46,65]. The prognostic significance of hemoglobin level and autoimmune hemolytic anemia for an unfavorable disease outcome has been discussed in [2]. C-reactive protein (CRP) is a well-known infection and inflammation marker, which has been used as an indicator and prognostic marker of the severity of COVID-19 infection [54]. Muscle pain is an often observed symptom of the infection [63], and its extent has been associated with the likelihood of a more unfavorable prognosis of hospitalized COVID-19 patients [12]. Comorbidity associated predictors included hypertension, acute kidney injury, diabetes and dementia (Supplementary Table 2, Supplementary File 2). Again, this is concordant with the current literature [8,16,43].

Fig. 3

Feature importance using absolute SHAP values: (a) top 10 predictors; (b) cumulative influence per feature modality. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 4 displays partial dependence plots for the previously discussed predictors, describing the quantitative relationship between individual feature attributes and their impact on estimated hazard ratios. Accordingly, an asymptomatic COVID-19 infection (Fig. 4b) resulted in an ∼35% lower mortality risk compared to more severe disease symptoms, and for patients with low hemoglobin level (Fig. 4c) or low oxygen saturation (Fig. 4d) mortality risk was even increased by 50%. Prior diagnosis of dementia (Fig. 4d) results in an ∼15% increased mortality risk after SARS-CoV-2 infection (hazard ratio dementia vs. non-dementia: ∼1.15; 95% CI: [1.08, 1.24]). Notably, there are different possible explanations for this finding: (a) dementia might be a proxy for age; (b) dementia might, independently of age, trigger biological, physiological and psychological mechanisms that contribute to an unfavorable disease outcome.

Fig. 4

Partial dependence plots for most influential predictors. Boxplots show the distribution of patient specific hazard ratios per variable category. The red horizontal line defines the reference. The hazard ratio describes by which factor the median lifetime is expected to change compared to reference. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Commonly affected molecular mechanisms between neurodegenerative disorders and COVID-19

We aimed for a more in-depth exploration of potential overlaps of neurodegeneration and COVID-19 disease mechanisms. Notably, there has been increasing evidence that SARS-CoV-2 can enter the central nervous system [7,36,41], raising the question of potential interactions with dementia disease pathologies. In this context, an overlap of transcriptionally dysregulated biological pathways was recently reported in a very limited number of patients with Alzheimer's Disease (AD) and COVID-19 [70]. Here, we focused more broadly on shared molecular mechanisms linking COVID-19 with AD as well as Parkinson's Disease (PD), another major neurodegenerative disorder, which has previously been associated with an increased risk for an unfavorable outcome of a SARS-CoV-2 infection [50,59]. By looking at the intersection between AD and PD cause-and-effect models (referred to as knowledge graphs, KGs) and the corresponding COVID-19 KG, we found a series of mechanisms that were shared between all three disease etiologies (Supplementary Table 3). Firstly, one of the mechanisms identified by our approach is related to three proteins involved in the innate immune system (i.e., DDX58, MAVS, and IFIH1), and more specifically in the detection of and response to viruses. These proteins are involved in both indications. For example, MAVS interacts with the RNA helicase RIG-I/MDA-5 after the dsRNA of the virus is recognized, leading to the initiation of the antiviral signaling cascade [69]. Related to this process is the second shared mechanism, which corresponds to the activation of the inflammasome and the subsequent triggering of caspase activation through cytokine secretion. This mechanism has been strongly linked with both AD [21] and PD [61] as well as COVID-19 [44]. 
In the context of neurodegeneration, the activation of the inflammasome leads to the secretion of inflammatory cytokines and cell death through pyroptosis, to which both AD and PD are associated via tangle and plaque formation and death of dopamine neurons, respectively [4]. Similarly, in the context of COVID-19, the inflammasome is activated by the proteins of the SARS-CoV-2 virus, which in turn leads to the production of inflammatory molecules, and in some cases leads to hyperinflammation [44]. Thirdly, TYK2 is also present in all three KGs. It is known to be implicated in the regulation of apoptosis in the amyloid cascade of AD [60] as well as in α-synuclein-induced neuroinflammation and dopaminergic neurodegeneration [47]. In addition, IL-6 and IL-10 are two of the interleukins secreted after inflammasome activation, one of the shared mechanisms between these pathologies, and their increased expression has been shown to be predictive of COVID-19 severity [14]. Furthermore, the interaction between two other proteins (i.e., DDIT3 and BCL2L11) involved in the regulation of apoptosis is also suggested as a common mechanism across these indications [18,26].

Sorafenib and regorafenib as potential treatments against COVID-19

In the following, we specifically focused on TYK2, which is a protein involved in the amyloid cascade. TYK2 inhibition results in effective regulation of IFNα, IL-10, IL-12, and IL-23 [23], which has specifically been reported in neurodegenerative disorders [45]. TYK2 has been patented as a drug target in AD (CN102112879B, China, [27]). In addition, genetic variants in TYK2 have recently been associated with COVID-19 disease severity [9]. Moreover, we found several kinase inhibitors active against SARS-CoV-2 in a cellular screen for anti-cytopathic effect (anti-CPE) in two different cellular environments: Caco2 [17] and VERO-E6 [67]. The results of these screens have been made public on ChEMBL (https://www.ebi.ac.uk/chembl/document_report_card/CHEMBL4303101/ and https://www.ebi.ac.uk/chembl/document_report_card/CHEMBL4495565/, respectively). We pretreated VERO-E6 cells with compounds from the Fraunhofer Repurposing Library (5632 compounds, https://www.itmp.fraunhofer.de/en/innovation-areas/drug_screening_repurposing.html), the EUOS Bioactives library (∼2500 compounds, https://ecbd.eu/compound/#lib{value='2′}lib{value='2′}), and a proprietary “Safe in Man” library of compounds having passed phase I clinical trials (∼600 compounds), and then challenged them with SARS-CoV-2. Regarding the phenotypic assay with Caco2 cells to determine compound antiviral activity, we adapted a previously published protocol [37]. Compounds were added to confluent layers of Caco-2 cells in MEM supplemented with 1% FBS in 96-well plates. For the primary screen, the final compound concentration was 10 µM (0.1% DMSO final) in singlicates. Dose response profiling of selected priority compounds was performed with a range of eight different concentrations in three independent replicates (maximum 20 µM, minimum 20 nM, half log dilution factor, 0.1% DMSO final). Following the addition of compounds, cells were immediately infected with SARS-CoV-2 at MOI 0.01. 
Control wells (+ virus and - virus) also contained DMSO at 0.1% final concentration. After 48 h, cells were fixed using 3% PFA in PBS, and the plates were sealed and disinfected to inactivate SARS-CoV-2. Quantification of viral inhibition (based upon Caco-2 cell viability relative to controls) was performed using high content imaging (PerkinElmer, Operetta CLS). For the assay on VERO-E6 cells, we used essentially the same protocol, but the readout was taken after 96 h instead of 48 h due to the different infection kinetics in these cells. In VERO-E6 cells, only Regorafenib showed a clear antiviral CPE potency with an IC50 of around 3 to 5 µM. In Caco2 cells, Sorafenib and Regorafenib demonstrated a similar antiviral CPE potency with an IC50 of around 1 µM for both molecules (Fig. 5). Both compounds are reported to be non-selective JAK/TYK2 inhibitors [25]. While the involvement of the JAK kinase family in inflammatory cytokine modulation is well-known, the extent to which TYK2 (a JAK family member) could be responsible for the observed CPE effect remains to be determined with more selective drug candidates. Such TYK2 selective preclinical compounds are currently not part of our screened libraries, because we focused on repurposing marketed kinase inhibitors.

Fig. 5

Regorafenib (panels A and C) and Sorafenib (panels B and D) activities measured in different cell lines (Vero-E6 cells upper panels; Caco2 cells lower panels) as percentage inhibition of viral cytopathic effect normalized to Remdesivir as positive control (100%). Cells in wells were treated with SARS CoV-2 virus, and drugs were administered after 48 or 96 h after infection. Subsequently, cells were stained, washed and counted if alive. Some signs of toxicity on Caco2 cells (lower panels) started to surface at higher drug concentrations and this might be the reason for the higher observed variance of triplicates. The slightly negative relative inhibition shown in panel D is caused by plate control differences within plates.

Conclusion

As of October 2021, the rates of completely vaccinated individuals in many Western countries are stagnating between 60 and 70%, while the fraction of vaccinated individuals is globally only around 36% [40]. Correspondingly, case numbers in many countries around the world are still increasing. Hence, there is an unmet need for effective and cost-efficient medications against severe cases. In this work, we first developed a highly predictive ML model for predicting COVID-19 mortality on an individual patient basis using deep observational data from LEOSS, primarily covering the inpatient situation in Germany (95% of patients). To our knowledge, this is the first ML based mortality model based on such (notionally) representative German data. Notably, ML models predicting alternative endpoints using LEOSS have been published recently [30,64]. Our ML model demonstrates similar prediction performance to the well-known 4C mortality score, which has been developed based on representative data from the UK [3]. However, a direct comparison between both models is not possible, because the 4C model is formulated as a classifier predicting all-cause in-hospital mortality, whereas our model is formulated as a time-to-event model predicting all-cause time dependent mortality risk after COVID-19 diagnosis. Our model thus considers censoring of survival times after patients have left hospital or other medical facilities. Our mortality model was built on a set of patients that is thought to be primarily representative of German hospitals. Whether there are unknown selection biases remains an open question; such biases were not under our control. Moreover, it is unclear whether our model would be predictive for patients in other countries. We showed that dementia, as one of the relevant predictors in our model, intersects on a molecular mechanism level with COVID-19. Together with evidence from recent GWAS studies, this pointed us to TYK2 as a potential drug target for COVID-19. 
Using a cellular screening assay for anti-cytopathic effect, we identified the anti-cancer drugs Regorafenib and Sorafenib as potential drug candidates against COVID-19. Notably, the known association of JAK family inhibitors like Regorafenib and Sorafenib with cellular inflammatory cytokines can be further characterized by investigating transcription dynamics within the first 12 h after SARS-CoV-2 viral infection compared to mock control [55]. Based on such data, Stukalov et al. [55] tested both compounds in the A549-ACE2 cell line and reported increased virus growth after treatment. Other authors recently reported Sorafenib to be a potent STING inhibitor effectively stopping virus growth in THP1 cells and thus suggested paying more attention to COVID-19 treatment strategies that address the dysregulation of cytokines [13]. Since the cell lines used in both cases were different from ours, results are not directly comparable. Hence, we see a need for further tests with Regorafenib and Sorafenib in other cell systems. In addition to further experimental validation of Regorafenib and Sorafenib, it could be interesting to explore in large scale clinical real-world data whether SARS-CoV-2 infected patients treated with Regorafenib or Sorafenib demonstrate a lower mortality than other SARS-CoV-2 patients. Overall, our work demonstrates that interpretation of an ML based risk model trained on rich data can point towards drug targets and new treatment options, which are strongly needed for COVID-19.

Methods

Kaplan-Meier estimator

The Kaplan-Meier product limit estimator is a classical non-parametric statistic to estimate a survival function [32]: Let $t_i$ denote a time point at which at least one event (death) happened. The number of events (deaths) at $t_i$ is denoted as $d_i$, and the number of individuals known to have survived up to $t_i$ is $n_i$. Then the Kaplan-Meier estimator of the survival function $S(t)$ (representing the probability that life is longer than $t$) is given by:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

and $\hat{S}(t) = 1$ for $t < t_1$. $\hat{S}(t)$ is a right-continuous step function with jumps at the event times $t_i$. Censoring at certain time points affects the estimate only by reducing the number of individuals that are at risk for a subsequent event.
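
As a small worked illustration of the product limit estimator, the following sketch multiplies, at each distinct event time, the running survival estimate by one minus the fraction of at-risk individuals who died. The function name `kaplan_meier` and the toy follow-up data are hypothetical and not taken from LEOSS.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed follow-up times
    events : 1 if death was observed at that time, 0 if right censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, curve = 1.0, []
    for t in event_times:
        # Individuals still under observation just before t.
        n_at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / n_at_risk
        curve.append((t, surv))
    return curve

# Hypothetical toy data: 7 patients, 4 observed deaths, 3 censored.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Note how the patient censored at time 3 does not cause a jump, but is no longer counted as at risk at time 5.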

Machine learning models for predicting COVID-19 mortality

We compared five different machine learning algorithms, as outlined in Section 2.2. Here, we only elaborate on the best performing one, namely the elastic net penalized Weibull regression: The elastic net is a regularization and variable selection method that shrinks coefficients using a linear combination of $\ell_1$ and $\ell_2$ penalties. The Weibull regression is an accelerated failure time (AFT) model, which means that covariates act multiplicatively on (survival) time. It is used if the proportional hazards assumption of the Cox model is not satisfied. AFT models allow direct estimation of (the effect of covariates on) expected failure times, where the time until failure is the duration of survival. Let $i$ denote the $i$-th patient with covariate vector $x_i$ and observed follow-up time $t_i$. Furthermore, let $\delta_i$ be an event indicator (0 = right censored, 1 = uncensored) at $t_i$. The true and potentially unobserved survival time is $T_i$, and the censoring time is $C_i$. That means $t_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$. The censoring is supposed to be non-informative about the true survival time [49]. In a Weibull AFT model, we assume $T_i \sim \mathrm{Weibull}(\lambda_i, k)$ with scale $\lambda_i = \exp(\beta^\top x_i)$ and shape $k$, i.e. the hazard function has the form

$$h(t \mid x_i) = \frac{k}{\lambda_i} \left(\frac{t}{\lambda_i}\right)^{k-1}.$$

Parameters of a standard Weibull AFT model can be estimated by maximizing the likelihood [68]:

$$L(\beta, k) = \prod_{i=1}^{n} h(t_i \mid x_i)^{\delta_i}\, S(t_i \mid x_i),$$

where $S$ denotes the survival function $S(t \mid x_i) = \exp\left(-(t/\lambda_i)^{k}\right)$. To account for overfitting, in our case coefficients were additionally penalized via the elastic net penalty:

$$\gamma \left( \rho\, \lVert \beta \rVert_1 + \frac{1-\rho}{2}\, \lVert \beta \rVert_2^2 \right).$$

Hyperparameters (i.e. $\gamma$ and $\rho$) were tuned with Bayesian hyperparameter optimization using the Optuna package [1] within the inner loop of the nested cross-validation. Early stopping was used if applicable (DeepSurv, GBM, XGBSE), and the best candidate model was subsequently selected. We chose the 5-fold cross-validated Harrell's C-index [22] as the objective for the hyperparameter tuning. We ran the optimization for twenty initial rounds, adapted the search space where reasonable, and then ran it for another twenty rounds. 
Thus, forty hyperparameter sets were evaluated and the best combination was selected based on the highest objective function value. Using this hyperparameter set, we subsequently trained a model on the entire training data and evaluated it on the held-out test set.
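
To make the objective concrete, the sketch below evaluates a penalized negative log-likelihood for a Weibull AFT model under right censoring. It is an illustration under stated assumptions, not the authors' implementation: the parameterization (scale exp(beta·x), shape k), the penalty form strength · (l1_ratio · ‖beta‖₁ + (1 − l1_ratio)/2 · ‖beta‖₂²), and all names are chosen here for illustration. For simplicity the sketch also penalizes every coefficient, whereas an intercept is usually left unpenalized.

```python
import math

def weibull_aft_penalized_nll(beta, k, X, t, delta, strength, l1_ratio):
    """Negative log-likelihood of a Weibull AFT model under right
    censoring, plus an elastic net penalty on the coefficients."""
    nll = 0.0
    for x_i, t_i, d_i in zip(X, t, delta):
        # Covariates act multiplicatively on time through the scale.
        scale = math.exp(sum(b * x for b, x in zip(beta, x_i)))
        log_hazard = math.log(k / scale) + (k - 1) * math.log(t_i / scale)
        log_surv = -((t_i / scale) ** k)
        # Events contribute hazard * survival, censored cases survival only.
        nll -= d_i * log_hazard + log_surv
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return nll + strength * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)

# Hypothetical toy data: two patients, one death and one censored case.
X = [[1.0, 0.0], [1.0, 1.0]]
t = [2.0, 3.0]
delta = [1, 0]
baseline = weibull_aft_penalized_nll([0.0, 0.0], 1.0, X, t, delta,
                                     strength=0.1, l1_ratio=0.5)
print(baseline)
```

With all coefficients at zero and shape 1 the model reduces to a unit-rate exponential, so the value is simply the sum of the follow-up times; this makes a convenient sanity check before handing the function to a numerical optimizer.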

Uno's concordance-index

The prediction performance of time-to-event models can be evaluated with respect to discriminating between subjects with different event times via Uno's C-index [56]: The C-index (concordance index) is a generalization of the area under the receiver operating characteristic curve (AUC) for time-to-event models [24,51]. A value of 100% means perfect discriminative performance, and 50% is comparable to random predictions. In essence, Uno's C is a rank correlation between the risk predictions and the observed event times. The C-index measures the concordance across all pairs of patients $(i, j)$. A pair is classified as concordant if the predicted risk is higher for the patient with the lower survival time. Uno's C-index was developed as an alternative to Harrell's C-index in settings with high censoring rates and leads to consistent concordance estimates under the general random censoring assumption. Uno's C-index uses an inverse probability of censoring weighting (IPCW) approach [56]:

$$\hat{C}_\tau = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \delta_i\, \hat{G}(t_i)^{-2}\, I(t_i < t_j,\ t_i < \tau)\, I(\hat{\eta}_i > \hat{\eta}_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} \delta_i\, \hat{G}(t_i)^{-2}\, I(t_i < t_j,\ t_i < \tau)}$$

The numerator counts the concordant pairs and the denominator the valid pairs, respectively. For patients $i, j$, $\delta_i$ is 1 if an event (death) was observed and otherwise 0, $\hat{G}$ is the Kaplan-Meier estimator for the censoring distribution for IPCW, $\hat{\eta}_i$ is the risk prediction of the $i$-th patient, $t_i$ is the observed time and $\tau$ is a stability parameter; for further details see [56].
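
The weighting scheme can be sketched in a few lines of plain Python. This is a simplified illustration, not a reference implementation: the names `censoring_km` and `uno_c` are hypothetical, ties in predicted risk are counted as non-concordant, and the censoring distribution is evaluated just before each event time (a common implementation choice that is an assumption here).

```python
def censoring_km(times, events):
    """Kaplan-Meier estimate G of the censoring distribution, obtained
    by treating censored observations (event == 0) as the 'events'."""
    cens_times = sorted({t for t, e in zip(times, events) if e == 0})
    steps, g = [], 1.0
    for c in cens_times:
        at_risk = sum(1 for u in times if u >= c)
        n_cens = sum(1 for u, e in zip(times, events) if u == c and e == 0)
        g *= 1.0 - n_cens / at_risk
        steps.append((c, g))

    def G(t):
        # Value just before t: only censoring times strictly before t count.
        value = 1.0
        for c, g_c in steps:
            if c < t:
                value = g_c
        return value

    return G

def uno_c(times, events, risk, tau):
    """IPCW concordance: weighted concordant pairs / weighted valid pairs."""
    G = censoring_km(times, events)
    num = den = 0.0
    for i in range(len(times)):
        # Only patients with an observed event before tau anchor a pair.
        if events[i] != 1 or times[i] >= tau:
            continue
        w = G(times[i]) ** -2
        for j in range(len(times)):
            if times[i] < times[j]:
                den += w
                if risk[i] > risk[j]:
                    num += w
    return num / den

# Hypothetical toy data: shorter survival paired with higher predicted risk.
times = [1, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1]
perfect = uno_c(times, events, risk=[5, 4, 3, 2, 1], tau=10)
reversed_ = uno_c(times, events, risk=[1, 2, 3, 4, 5], tau=10)
print(perfect, reversed_)
```

Events that occur after many patients have already been censored receive larger weights, which compensates for the pairs that censoring removed from view.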

Feature importance using SHAP

Shapley Additive Explanations [38] are a model-agnostic approach from coalitional game theory. The assumption of this framework is that individuals (feature attributes) are cooperating as a team (patient feature vector) for a joint outcome (model prediction). SHAP's goal is to estimate those individual contributions to the outcome. Key properties are a) the solution is unique; b) local exactness, which means the sum of feature contributions matches the output; c) if a feature has no impact, then its SHAP value is zero. Mathematically, additivity and property b) can be described as:

$$f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i\, x'_i$$

with $f$ being the original model and $g$ the explanation model defined on simplified inputs $x'$. Moreover, $h_x$ is a function relating the input $x$ to the simplified input $x'$ via $x = h_x(x')$. $\phi_i$ is the SHAP value of the $i$-th feature for the model input vector $x$, and $\phi_0$ denotes the expectation value of $f(x)$. In other words: the SHAP values quantify how much a particular feature pushes the prediction away from the population average $\phi_0$. SHAP values are computed as follows:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right) \right]$$

In other words, SHAP values are defined as a weighted (binomial coefficient) sum of the differences between (in square brackets) “prediction including the feature” minus “prediction excluding the feature”, for any subset $S$ in the power set of $F \setminus \{i\}$, where $F$ is the set of all features. $f_{S \cup \{i\}}$ denotes the model trained with feature $i$ included and $f_S$ the model trained without it. Similarly, $x_{S \cup \{i\}}$ denotes the feature subset with feature $i$ included and $x_S$ without it.
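
For a handful of features, the weighted sum over feature subsets can be evaluated exactly. The sketch below does so under one simplifying assumption that differs from retraining a model per subset: features outside a subset are replaced by values from a background dataset and the model output is averaged. The function name `shapley_values`, the toy model, and the data are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley values of model f at input x.

    Features outside a subset are filled in from each background row
    and the model output is averaged (an assumption of this sketch)."""
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            z = [x[i] if i in subset else row[i] for i in range(n)]
            total += f(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for subsets S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                s = set(S)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return value(set()), phi

# Hypothetical toy model: uses features 0 and 1, ignores feature 2.
def model(z):
    return 2.0 * z[0] + z[0] * z[1]

background = [[0.0, 0.0, 0.0], [1.0, 2.0, 5.0], [2.0, 1.0, 3.0]]
x = [3.0, 1.0, 4.0]
phi0, phi = shapley_values(model, x, background)
print(phi0, phi)
```

The result illustrates properties b) and c): the base value plus all contributions reproduces the model output at `x`, and the ignored feature receives a contribution of zero. The cost grows exponentially with the number of features, which is why practical SHAP implementations rely on approximations.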

Confidence intervals for hazard ratios

To construct a confidence interval for the hazard ratio of “dementia vs. non-dementia” we performed a bootstrap: We resampled 100,000 times with replacement a pair of a demented and non-demented patient. We then calculated the ratio of the SHAP values for the feature “prior dementia diagnosis” for both patients.
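
A minimal sketch of such a pairwise bootstrap is given below. Several details are assumptions made for illustration: the per-patient values are taken to be positive contributions on a ratio scale, the interval is a simple percentile interval, and the data are synthetic rather than derived from the model. The name `bootstrap_ratio_ci` is hypothetical.

```python
import random

def bootstrap_ratio_ci(group_a, group_b, n_boot=100_000, level=0.95, seed=0):
    """Percentile interval for the ratio of a randomly drawn value from
    group_a to a randomly drawn value from group_b."""
    rng = random.Random(seed)
    # Each bootstrap draw pairs one patient from each group.
    ratios = sorted(rng.choice(group_a) / rng.choice(group_b)
                    for _ in range(n_boot))
    lower = ratios[int((1.0 - level) / 2.0 * n_boot)]
    upper = ratios[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lower, upper

# Synthetic, hypothetical per-patient values (not LEOSS results).
rng = random.Random(1)
with_dementia = [rng.uniform(1.05, 1.30) for _ in range(200)]
without_dementia = [rng.uniform(0.95, 1.05) for _ in range(200)]
ci = bootstrap_ratio_ci(with_dementia, without_dementia, n_boot=10_000)
print(ci)
```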

Identification of common molecular mechanisms between COVID-19 and neurodegenerative diseases

To identify the shared molecular mechanisms between COVID-19, AD, and PD, we leveraged several resources listed in Supplementary Table 3. These were combined into two independent Knowledge Graphs (KGs) following the harmonization procedure described in our previous work [52] and [15]. By doing so, we combined disease specific molecular interactions pertaining to COVID-19 and two neurological indications (i.e., AD and PD) into graph structures: one for COVID-19 and one for AD and PD. Subsequently, we calculated the intersection of these graphs. Supplementary File 2 contains the corresponding shared mechanisms as an Excel table.
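
Conceptually, once both graphs use harmonized identifiers, the intersection reduces to a set operation on their edges. The sketch below represents each edge as a (subject, relation, object) triple; the representation, the function name `shared_mechanisms`, and the placeholder triples are assumptions for illustration and do not reflect the content of the actual KGs.

```python
def shared_mechanisms(kg_a, kg_b):
    """Return edges present in both graphs, plus the nodes they touch."""
    shared_edges = kg_a & kg_b
    shared_nodes = {n for s, _, o in shared_edges for n in (s, o)}
    return shared_edges, shared_nodes

# Placeholder triples with harmonized (hypothetical) identifiers.
covid_kg = {
    ("protein_A", "increases", "process_X"),
    ("protein_B", "decreases", "process_Y"),
    ("protein_C", "increases", "process_Z"),
}
neuro_kg = {
    ("protein_A", "increases", "process_X"),
    ("protein_C", "increases", "process_Z"),
    ("protein_D", "increases", "process_X"),
}

edges, nodes = shared_mechanisms(covid_kg, neuro_kg)
print(sorted(edges))
print(sorted(nodes))
```

Exact matching of triples only works after harmonization, since the same protein or relation named differently in two sources would otherwise be missed.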

Author contributions

Drafted the manuscript: HF, TL, DDF, AZ; initiated and guided the project: HF; implemented machine learning models: TL; computational drug target identification: DDF, LDL, ATK; experimental validation: MK, AZ. Other authors: acquisition and preparation of LEOSS data

Funding

This work has been funded via the ‘COPERIMOplus’ initiative and supported by the Fraunhofer ‘Internal Programs’ under Grant No. Anti-Corona 840266. The LEOSS registry was supported by the German centre for Infection Research (DZIF) and the Willy Robert Pitzer Foundation.

Ethics approval

LEOSS is registered at the German Clinical Trials Register (DRKS, S00021145) and was approved by the leading Ethics Committee No. 20–600 “Ethikkommission des Fachbereichs Humanmedizin der Johann-Wolfgang-Goethe-Universität Frankfurt am Main, 60590 Frankfurt, Germany”. For the anonymization procedure see [29].

Code availability

The source code of the analyses presented in this paper is available at https://github.com/thomasmooon/leoss-cov19.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
