Literature DB >> 34139154

A time-resolved proteomic and prognostic map of COVID-19.

Vadim Demichev¹, Pinkus Tober-Lau², Oliver Lemke³, Tatiana Nazarenko⁴, Charlotte Thibeault², Harry Whitwell⁵, Annika Röhl³, Anja Freiwald³, Lukasz Szyrwiel⁶, Daniela Ludwig³, Clara Correia-Melo⁶, Simran Kaur Aulakh⁶, Elisa T Helbig², Paula Stubbemann², Lena J Lippert², Nana-Maria Grüning³, Oleg Blyuss⁷, Spyros Vernardis⁶, Matthew White⁶, Christoph B Messner⁸, Michael Joannidis⁹, Thomas Sonnweber¹⁰, Sebastian J Klein⁹, Alex Pizzini¹⁰, Yvonne Wohlfarter¹¹, Sabina Sahanic¹⁰, Richard Hilbe¹⁰, Benedikt Schaefer¹², Sonja Wagner¹², Mirja Mittermaier¹³, Felix Machleidt², Carmen Garcia², Christoph Ruwwe-Glösenkamp², Tilman Lingscheid², Laure Bosquillon de Jarcy², Miriam S Stegemann², Moritz Pfeiffer², Linda Jürgens², Sophy Denker¹⁴, Daniel Zickler¹⁵, Philipp Enghard¹⁵, Aleksej Zelezniak¹⁶, Archie Campbell¹⁷, Caroline Hayward¹⁸, David J Porteous¹⁷, Riccardo E Marioni¹⁹, Alexander Uhrig², Holger Müller-Redetzky², Heinz Zoller¹², Judith Löffler-Ragg¹⁰, Markus A Keller¹¹, Ivan Tancevski¹⁰, John F Timms²⁰, Alexey Zaikin²¹, Stefan Hippenstiel²², Michael Ramharter²³, Martin Witzenrath²², Norbert Suttorp²², Kathryn Lilley²⁴, Michael Mülleder²⁵, Leif Erik Sander²², Markus Ralser²⁶, Florian Kurth²⁷.

Abstract

COVID-19 is highly variable in its clinical presentation, ranging from asymptomatic infection to severe organ damage and death. We characterized the time-dependent progression of the disease in 139 COVID-19 inpatients by measuring 86 accredited diagnostic parameters, such as blood cell counts and enzyme activities, as well as untargeted plasma proteomes at 687 sampling points. We report an initial spike in a systemic inflammatory response, which is gradually alleviated and followed by a protein signature indicative of tissue repair, metabolic reconstitution, and immunomodulation. We identify prognostic marker signatures for devising risk-adapted treatment strategies and use machine learning to classify therapeutic needs. We show that the machine learning models based on the proteome are transferable to an independent cohort. Our study presents a map linking routinely used clinical diagnostic parameters to plasma proteomes and their dynamics in an infectious disease.

Entities: Chemical

Keywords: COVID-19; biomarkers; clinical disease progression; disease prognosis; longitudinal profiling; machine learning; patient trajectories; physiological parameters; proteomics

Mesh：

Substances：
Biomarkers
Proteome

Year: 2021 PMID： 34139154 PMCID： PMC8201874 DOI： 10.1016/j.cels.2021.05.005

Source DB: PubMed Journal: Cell Syst ISSN： 2405-4712 Impact factor: 10.304

Introduction

The coronavirus disease 2019 (COVID-19) has created unprecedented societal challenges, particularly for public health and the global economy (Alwan et al., 2020; Blumenthal et al., 2020; Rosenbaum, 2020). Efficient management of these challenges is hampered by the variability of clinical manifestations, ranging from asymptomatic infection with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) to death, despite maximum intensive care. Biomarkers and molecular signatures enabling accurate prognosis of future disease courses are needed to optimize resource allocation and personalize treatment strategies. Patients likely to progress to severe disease and organ failure and those likely to remain stable could be identified early, which is particularly valuable in scenarios where health care systems reach capacity limits. Prognostic panels would also optimize the monitoring of novel treatments, thereby accelerating clinical trials (Phua et al., 2020; Saxena, 2020; Wu et al., 2020). Knowledge of factors that differentiate recovery from deterioration throughout the disease will further enhance our understanding of the inflammatory host response as well as the underlying pathophysiology and provide new therapeutic targets. A number of biomarkers that classify COVID-19 severity have recently been described. These are based on clinical chemistry, enzyme activities, immune profiling, single-cell sequencing, proteomics, and metabolomics (D’Alessandro et al., 2020; Laing et al., 2020; Liu et al., 2020b; Messner et al., 2020; Overmyer et al., 2020; Schulte-Schrepping et al., 2020; Shen et al., 2020; Shu et al., 2020; Wynants et al., 2020). As severity classifiers, the molecular signatures recorded in blood, serum, plasma, or immune cells characterize the COVID-19 pathology and host responses. Furthermore, markers of dysregulated coagulation, inflammation, and other organ dysfunction have been established as risk factors for severe illness, including low platelet count, elevated levels of D-dimer, C-reactive protein (CRP), interleukin 6 (IL-6), ferritin, troponin, and markers of kidney injury (Danwang et al., 2020; Henry et al., 2020). Proteomic investigations that characterize the comprehensive host response have revealed the activation of the complement cascade and acute phase response, both of which center around IL-6-driven pathways. In turn, these systematic studies have revealed that other common antiviral pathways, such as type I interferons (IFN), do not dominate the early response to COVID-19, probably reflecting evasion of the IFN system by SARS-CoV-2 and the subsequent activation of inflammatory cascades (Hadjadj et al., 2020; Yang et al., 2020). Furthermore, proteomic data and diagnostic parameters have pointed to underlying pathological mechanisms and possible therapeutic targets. For instance, using high-throughput proteomics, we reported a decline in plasma levels of gelsolin (GSN) in patients with severe COVID-19 in a previous study (Messner et al., 2020), and recombinant human GSN is currently undergoing clinical testing for COVID-19 pneumonia in a phase II trial (ClinicalTrials.gov identifier: NCT04358406). The severity of the disease, and the biomarker signatures that indicate severity, correlate with the outcome, but the highest diagnostic need is to stratify within therapeutically homogeneous patients. For instance, to identify those among the mildly affected individuals with the highest risk for deterioration, or among the most severely affected, those with the highest chance to respond positively to an augmentation of therapy. Predicting future trajectories on an individualized basis would also help accelerate therapeutic developments to judge the impact of the treatment on an individual disease course. To obtain a comprehensive picture of how the molecular COVID-19 phenotype develops over time, we deeply phenotyped a group of 139 COVID-19 inpatients at 687 sampling points. On the one hand, we measured a compendium of 86 clinical parameters, routine diagnostic markers, and clinically established risk scores using gold standard accredited clinical tests. On the other hand, we captured the patient’s molecular phenotype by measuring plasma proteomes in an untargeted fashion. For this, we made use of liquid chromatography coupled with tandem mass spectrometry, using a recently developed platform technology that utilizes analytical flow rate chromatography, data-independent acquisition mass spectrometry (SWATH-MS), and deep-neural network-based data processing (Demichev et al., 2020; Messner et al., 2020) (Figure S1). By combining the compendium of diagnostic parameters with the proteomes in a time- and patient-resolved fashion, we obtained a comprehensive molecular picture that captures changes in the patient’s molecular phenotype as they depend on the severity, age, and disease progression. We identify prognostic biomarkers and depict their distinct trajectories. We exemplify the power of our resource by showing that the biomarker profiles and diagnostic parameters classify treatment requirements, in particular, the need for mechanical ventilation. Furthermore, we report the future prediction of recovery time in mildly ill patients as well as the individual risk of clinical deterioration. Our study demonstrates the predictability of COVID-19 disease trajectories based on the molecular phenotype of the early disease stage.

Results

Covariation of clinical diagnostic parameters and the plasma proteome characterizes the host response to COVID-19

We longitudinally phenotyped 139 patients admitted to Charité University Hospital, Berlin, Germany, between March 01, 2020, and June 30, 2020, due to PCR-confirmed SARS-CoV-2 infection (Figure S2). The patients exhibited highly variable disease courses, graded according to the World Health Organization (WHO) ordinal scale for clinical improvement (Table S1), which reflects the treatment that the patient is receiving as a measure of disease severity. The patients included in our study range from WHO grade 3, which includes patients who require inpatient care without supplemental oxygen therapy, to WHO grade 7, which includes patients with severe COVID-19 who require invasive mechanical ventilation and additional organ support therapies such as renal replacement therapy (RRT) and extracorporeal membrane oxygenation (ECMO) (WHO, 2020). In total, 23 out of 139 (17%) patients in the WHO grade 3 category were stable, without requiring supplemental oxygen therapy and could be discharged after a median of 7 days of inpatient care (Table S2 and Figure S2); 47 (34%) patients required either low-flow or high-flow supplemental oxygen therapy; 69 (50%) patients either presented with severe COVID-19 (WHO grade 6 or 7, i.e., requiring invasive mechanical ventilation) or deteriorated and required invasive mechanical ventilation during their hospitalization; 46 patients (33%) required RRT; and 22 (16%) were treated with ECMO. A total of 20 (13%) patients died, including three patients with do not intubate/do not resuscitate (DNI/DNR) orders in place and one patient who died due to a non-COVID-19-related cause. Common risk factors for severe COVID-19 were reflected in the outcomes: patients with a severe course of disease were older than those with mild disease (49 years [IQR 35–70] for WHO grade 3 versus 62 years [IQR 53–72] for WHO grade 7, p = 0.02), and an age of 65 years or older was associated with a higher risk of death -(OR 4.1 [95% CI 1.5–11.5]). Our cohort further reflected that men and individuals with a high BMI have an increased likelihood to be hospitalized upon a COVID-19 infection; 68% of the patients were men, and the median BMI was 27.8 (IQR 24.7–31.9). However, we noted that within the group of patients hospitalized with COVID-19, sex and BMI were not further associated with disease severity or an increased risk of death. The median duration of hospitalization was 20 days (IQR 9–48) and correlated with severity (7 days for WHO grade 3 versus 46 days for WHO grade 7). The median time from admission to death despite receiving maximum treatment was 28 days (IQR 16–46). To capture the diverse disease trajectories on a molecular and biochemical level, we systematically collected 86 clinical and accredited diagnostic parameters as measured with certified tests. Moreover, we monitored the development of risk scores such as the “sequential organ failure assessment” (SOFA) score, blood gas analyses, blood cell counts, enzyme activities, and inflammation biomarkers (Table S3). To complement these parameters with an untargeted analysis, we employed a recently developed high-throughput proteomics platform (Messner et al., 2020). This platform makes use of the data-independent acquisition technique SWATH-MS (Gillet et al., 2012), a sample preparation pipeline designed to ISO13485 reporting standards, which is optimized for reducing batch effects, high-flow rate chromatography to provide highly consistent peptide separation in large sample series, and uses DIA-NN (Data-Independent Acquisition by Neural Networks) to analyze proteomics data recorded with 5-min chromatography (Messner et al., 2020; Demichev et al., 2020) (Figure S1 for a detailed overview of the proteomic workflow). In total, we measured 1,169 plasma proteome samples to determine 687 human proteomes, in which we quantified 321 plasma protein groups. Owing to the nature of the high-flow proteomics platform, data completeness was high; thus, we decided against the use of imputation strategies in the analysis of differential protein abundance. Total data completeness was 75%, with 200 proteins consistently quantified with 98% completeness, and 189 proteins with 99% completeness (Figure S1). To identify interdependencies of the diagnostic parameters that are routinely used in clinical decision making and the plasma proteomes, we characterized their covariation and present a direct correlation map (Figures 1B and S3–S4; Tables S4, S5, and S6). We report a robust positive or negative correlation of IL-6 levels and other inflammatory markers (CRP, procalcitonin) with acute phase proteins (APPs) (APOA2, APOE, CD14, CRP, GSN, ITIH3, ITIH4, LYZ, SAA1, SAA2, SERPINA1, SERPINA3, and AHSG; the protein names corresponding to the gene identifiers are provided in Table S3), coagulation factors and related proteins (FGA, FGB, FGG, F2, F12, KLKB1, PLG, and SERPINC1), and the complement system (C1R, C1S, C8A, C9, CFB, CFD, and CFHR5). Our data, therefore, link the prominent role of the IL-6 response in COVID-19 (D’Alessandro et al., 2020) to coagulation and the complement cascade. Consistently, in our data, markers of cardiac (troponin T, NT-proBNP) and renal (creatinine, urea) function, as well as anemia and dyserythropoiesis (hemoglobin, hematocrit, erythrocytes, and red blood cell distribution width) correlate with various APPs (APOA2, APOE, CD14, GSN, LYZ, SAA1, SAA2, and SERPINA3; Figure 1B and Table S4) supporting the role of inflammation in COVID-19-related organ damage and its impact on erythropoiesis.

Figure 1

Interdependence of clinical, diagnostic, physiological and proteomic parameters during the clinical progression of COVID-19

(A) Study design. Schematic of the cohort of 139 patients with PCR-confirmed SARS-CoV-2 infection treated at Charité University Hospital Berlin. Plasma proteomics and accredited diagnostic tests were applied at 687 sampling points to generate high-resolution time series data for 86 routine diagnostic parameters and 321 protein quantities (see also Figures S1 and S2).

(B) Covariation map for plasma proteins and routine diagnostic and physiological parameters. Statistically significant correlations (Spearman; p < 0.05) are colored. Dots indicate statistical significance after row-wise multiple-testing correction (false discovery rate [FDR] < 0.05), black rectangles—column-wise. The panel on the right of the heatmap provides manual functional annotation for the proteins (see also Figures S3 and S4, and Tables S4, S5, and S6).

(C) Covariation of key diagnostic parameters and plasma protein markers (log2-transformed) in COVID-19 (see also Tables S4, S5, and S6). Dots colors correspond to the WHO grade of the patient, see Figure 2B.

Interdependence of clinical, diagnostic, physiological and proteomic parameters during the clinical progression of COVID-19 (A) Study design. Schematic of the cohort of 139 patients with PCR-confirmed SARS-CoV-2 infection treated at Charité University Hospital Berlin. Plasma proteomics and accredited diagnostic tests were applied at 687 sampling points to generate high-resolution time series data for 86 routine diagnostic parameters and 321 protein quantities (see also Figures S1 and S2). (B) Covariation map for plasma proteins and routine diagnostic and physiological parameters. Statistically significant correlations (Spearman; p < 0.05) are colored. Dots indicate statistical significance after row-wise multiple-testing correction (false discovery rate [FDR] < 0.05), black rectangles—column-wise. The panel on the right of the heatmap provides manual functional annotation for the proteins (see also Figures S3 and S4, and Tables S4, S5, and S6). (C) Covariation of key diagnostic parameters and plasma protein markers (log2-transformed) in COVID-19 (see also Tables S4, S5, and S6). Dots colors correspond to the WHO grade of the patient, see Figure 2B.

Figure 2

The molecular phenotype of patients with COVID-19 and its dependency on severity and age

(A) Plasma proteome and clinical diagnostic parameters in dependency of COVID-19 severity irrespective of age. The patients are grouped according to the maximum clinical treatment received (WHO ordinal scale), used as an indicator of disease severity (Table S1). 113 proteins and 55 routine diagnostic parameters vary significantly (FDR < 0.05) between patients of the different WHO groups upon accounting for age as a covariate using linear modeling (Ritchie et al., 2015). A fully annotated heatmap is provided in Figure S5 (see also Figure S6 and Table S3).

(B) Selected protein markers and routine diagnostic parameters (log2-transformed) plotted against the WHO ordinal scale.

(C) Selected proteins differentially abundant depending on age (FDR < 0.05). Left, colored: this data set (log2-transformed levels; statistical testing was performed by accounting for the WHO grade as a covariate Ritchie et al., 2015 and STAR methods; for visualization only, the data were corrected for the WHO grade); right, black: general population (log2-transformed levels; Generation Scotland cohort).

Increased levels of neutrophils and the occurrence of immature granulocyte precursors as markers of emergency myelopoiesis have been linked to severe COVID-19 (Schulte-Schrepping et al., 2020). Our data reveal covariation between neutrophil counts and the levels of two inhibitors of neutrophil serine proteases, SERPINA1 and SERPINA3 (Figure 1C). These two proteins show the highest correlation (0.72 and 0.79 Spearman R, respectively) with the neutrophil-to-lymphocyte ratio (NLR), a prognostic marker for COVID-19 (Lian et al., 2020; Liu et al., 2020a). We further report a strong correlation (Figure 1C) of alkaline phosphatase and gamma-glutamyl transferase activities, both characteristic of biliary disorders (Poynard and Imbert-Bismut, 2012), with plasma levels of the polymeric immunoglobulin receptor (PIGR). We notice that cholangiocytes (bile duct epithelium cells) express ACE-2 and can be directly infected with SARS-CoV-2 (Zhao et al., 2020), potentially leading to host viral response-induced expression of PIGR and cell destruction (Schneeman et al., 2005; Turula and Wobus, 2018).

A map of plasma proteins and diagnostic parameters that depend on age and disease severity

113 proteins and 55 accredited diagnostic parameters responded in accordance to an increase in the WHO score as a measure of progressing COVID-19 severity (Figures 2, S5, and S6; STAR methods). To the best of our knowledge, more than 30 of these proteins have not been associated with COVID-19 severity previously (Table S3). The proteins that change dependent on disease severity include mediators of inflammation and immune response (CD44, B2M, PIGR, and A2M), components of the complement cascade (CFD, and CFHRs), and apolipoproteins (APOA2, APOC3, APOD, APOE, and APOL1). Furthermore, numerous markers of organ dysfunction (cardiac: NT-proBNP, troponin T; renal: creatinine, urea; liver: aspartate aminotransferase, alanine aminotransferase, gamma-glutamyl transferase, and total bilirubin) and, inversely, markers of anemia (hemoglobin, erythrocytes, and hematocrit) were correlated with the WHO grade of the patient. In order to further dissect the proteomic signatures of the most severely ill patients requiring maximum treatment (WHO grade 7), we specifically characterized the impact of organ support treatments (RRT and ECMO) on the patients’ molecular phenotype (Figures S7 and S8). We showed, for instance, that HP and HPX are reduced in patients on RRT and ECMO as a sign of hemolysis in the extracorporeal circuit, whereas elevated SERPINC1 levels mirror substitution of antithrombin during ECMO. We discuss these findings in Note S1. The molecular phenotype of patients with COVID-19 and its dependency on severity and age (A) Plasma proteome and clinical diagnostic parameters in dependency of COVID-19 severity irrespective of age. The patients are grouped according to the maximum clinical treatment received (WHO ordinal scale), used as an indicator of disease severity (Table S1). 113 proteins and 55 routine diagnostic parameters vary significantly (FDR < 0.05) between patients of the different WHO groups upon accounting for age as a covariate using linear modeling (Ritchie et al., 2015). A fully annotated heatmap is provided in Figure S5 (see also Figure S6 and Table S3). (B) Selected protein markers and routine diagnostic parameters (log2-transformed) plotted against the WHO ordinal scale. (C) Selected proteins differentially abundant depending on age (FDR < 0.05). Left, colored: this data set (log2-transformed levels; statistical testing was performed by accounting for the WHO grade as a covariate Ritchie et al., 2015 and STAR methods; for visualization only, the data were corrected for the WHO grade); right, black: general population (log2-transformed levels; Generation Scotland cohort). A total of 61 proteins and 18 diagnostic parameters varied with patients’ age (Figure S9). Out of these, 37 proteins do not change with age in a pre-COVID-19 general population baseline (Generation Scotland cohort [Smith et al., 2006]), for which proteomes have been measured with the same proteomic technology (Messner et al., 2020) (Figure S10). We observed that a number of markers that increase with age in COVID-19 patients also correlated with a high WHO grade (Figures S6 and S9). To identify markers that are upregulated or downregulated in older patients in comparison with younger patients with a comparable therapy need, i.e. WHO grade, we tested the relationship between omics feature levels and age by accounting for WHO severity grade as a covariate using linear modeling (Ritchie et al., 2015) (STAR methods). This analysis identified 36 proteins and 12 clinical laboratory markers that are up- or are downregulated with age in COVID-19 patients within the same level of care, i.e., one WHO grade (Figures 2C and S11, summarized in Figure 5). Out of these, 20 proteins do not change with age in the pre-COVID-19 population baseline (Generation Scotland cohort proteome data, Messner et al., 2020; Figure S10), or show the opposite correlation with age in the general population (e.g., APOC2, Figure 2C). These proteins that only show an age-dependency in COVID-19 patients but not in the general population point toward age-dependent differences in host response patterns to SARS-CoV-2, and include markers involved in inflammation (SERPINA3, ITIH4, SAA1, SAA1, SAA2, ITIH3, CFB, C7, and AHSG), lipid metabolism (APOC1, APOC2, APOC3, APOB, and APOD), and coagulation (KLKB1, and FBLN1). We consider the implications of these findings in Note S2.

Figure 5

Summary: association of individual plasma proteins, routine diagnostic and physiological parameters with severity, necessary therapy, and progression of COVID-19.

For each statistical test considered (association with WHO grade, prediction of the remaining time in hospital for patients at WHO grade 3, prediction of worsening, i.e., progression to a higher WHO grade in the future, the trend during the peak period of the disease, association with RRT, association with ECMO and association with higher patient age), measurements, which show significant differences are highlighted, with the color indicating the trend, e.g., red for CST3 in the “Association with COVID-19 severity” test indicates higher levels of CST3 in severely ill patients. Proteins for which MRMAssayDB (Bhowmick et al., 2018) lists that a targeted proteomic assay has been developed are indicated with a black bar at the top. Proteins which change significantly with age in the Charité COVID-19 cohort (FDR < 0.05) but do not change significantly (p < 0.05) with age in the general population (Generation Scotland cohort), are highlighted with a white circle in the 7th row (“Association with age”). See also Figures S6–S8, S10, S11, S14, S17, and S18, and Table S3.

Time-dependent alleviation of severity indicators highlights the role of the early host response in COVID-19 progression

The time-resolved nature of our study facilitated a covariation analysis of protein levels and accredited diagnostic parameters along the patient trajectory over time (Figure S12; Table S7). Correlating the dynamics of omics features during the peak period of the disease (STAR methods), we noted covariation of inflammatory markers, APPs, fibrinogen precursor proteins, and the NLR. The correlation between APPs and the markers of cardiac and renal impairment observed across different patients at the earliest time points (Figure 1B; Table S4) was not reflected as a trend over time (Figure S13). To further dissect the dynamics of the patients’ molecular phenotype during the course of COVID-19, we determined the longitudinal trend for all protein and diagnostic parameters during the peak period of the disease (i.e., while receiving maximum treatment; STAR methods). In total, 89 proteins and 37 clinical parameters significantly changed over time (Figure 3B, trends across all time points at the maximum WHO grade are provided in Figure S14; STAR methods). In general, we found that most proteins and diagnostic parameters that correlate with disease severity return toward baseline during the peak period of the disease. Many of these were most prominently changed in the early samples (Figure S6) but alleviated with time, irrespective of the outcome (Figure S14; summarized in Figure 5). For example, components of the coagulation cascade with known acute phase activity, such as fibrinogen, and many complement factors, significantly decreased over time. Proteins indicative of inflammatory response (e.g., ORM1, SERPINA1 and SERPINA3, SAA1, SAA2 [Luo et al., 2015; Sack, 2018; Wu et al., 2015]) and markers of inflammation, such as CRP or IL-6, also declined over time. Conversely, extracellular matrix (ECM) proteins, such as ECM1, LUM, and immunoregulatory factors (e.g. AHSG, A2M Rehman et al., 2013, and HRG Wakabayashi, 2013) and proteins involved in lipid metabolism (e.g., APOC1, APOD, APOM, GPLD1, and PON1), and negative APPs (e.g., ITIH1, Figure 3C), which are downregulated in severe COVID-19 (Figure S6, summarized in Figure 5), increased over time, approaching the baseline. This general alleviation of the initial molecular phenotype of COVID-19 was consistently detected in both mildly and severely ill patients (outlier trajectories discussed in Note S3). Indeed, only 13 proteins showed differences in trend depending on the WHO score (Figure S15). We provide visualization of individual trajectories for all omics features measured between the first and the last time points sampled at the peak of the disease (Figures 3C and S16).

Figure 3

The progression of the COVID-19 molecular patient phenotype over time

(A) Schematic: each patient is followed during inpatient care by repetitive sampling, and the “trajectory” of each of the proteins and the routine diagnostic features is analyzed (points of different colors at each time point) (see also Figure S2).

(B) Protein levels and routine diagnostic parameters that change significantly (FDR < 0.05) over time during the peak of the disease, shown for individual patients stratified by their maximum treatment received (WHO grade): 89 proteins, 37 clinical diagnostic markers show time dependency during the disease course (illustrated as log2-fold changes or absolute value changes, as indicated). The panel to the right of the heatmap provides manual functional annotation for the proteins. Known associations with COVID-19 severity are indicated (blue - downregulated in severe COVID-19, and red - upregulated) (D’Alessandro et al., 2020; Laing et al., 2020; Messner et al., 2020; Shen et al., 2020). Below the heatmap, the time span between the first and the last sampling time point at the peak of the disease is indicated as well as the remaining time until the discharge (see also Figures S14 and S15, and Table S3).

(C) Trajectories (change of log2-transformed levels with time) for selected proteins. Sampling points during the peak period of the disease (STAR methods) are considered. x axis: 0 – first time point measured at the peak of the disease, 1 – last. The y axis reflects the change relative to the first valid measurement during the peak of the disease. Loess approximations are shown for patients, which did (blue), and did not (orange), require invasive mechanical ventilation. See also Figure S16.

The progression of the COVID-19 molecular patient phenotype over time (A) Schematic: each patient is followed during inpatient care by repetitive sampling, and the “trajectory” of each of the proteins and the routine diagnostic features is analyzed (points of different colors at each time point) (see also Figure S2). (B) Protein levels and routine diagnostic parameters that change significantly (FDR < 0.05) over time during the peak of the disease, shown for individual patients stratified by their maximum treatment received (WHO grade): 89 proteins, 37 clinical diagnostic markers show time dependency during the disease course (illustrated as log2-fold changes or absolute value changes, as indicated). The panel to the right of the heatmap provides manual functional annotation for the proteins. Known associations with COVID-19 severity are indicated (blue - downregulated in severe COVID-19, and red - upregulated) (D’Alessandro et al., 2020; Laing et al., 2020; Messner et al., 2020; Shen et al., 2020). Below the heatmap, the time span between the first and the last sampling time point at the peak of the disease is indicated as well as the remaining time until the discharge (see also Figures S14 and S15, and Table S3). (C) Trajectories (change of log2-transformed levels with time) for selected proteins. Sampling points during the peak period of the disease (STAR methods) are considered. x axis: 0 – first time point measured at the peak of the disease, 1 – last. The y axis reflects the change relative to the first valid measurement during the peak of the disease. Loess approximations are shown for patients, which did (blue), and did not (orange), require invasive mechanical ventilation. See also Figure S16. Overall, the molecular patient phenotype reflected an initial spike in the systemic inflammatory response, which alleviated gradually, followed by a protein signature indicative of tissue repair, metabolic reconstitution, and immunomodulation. This was observed in both mildly and severely ill patients, highlighting the early disease phase as a major molecular determinant of the COVID-19 phenotype.

Proteomes and diagnostic clinical markers allow for prediction of disease severity by machine learning

Using a machine learning algorithm based on gradient boosted trees (STAR methods), we first evaluated the extent to which diagnostic parameters and proteomes characterize treatment requirements, as reflected by the WHO grade. Both proteomes and clinical diagnostic parameters were highly discriminative of the patient receiving invasive mechanical ventilation (WHO grade 6 or 7, clinical laboratory values AUROC = 0.97, proteomic data AUROC = 0.98, combined data AUROC = 0.99; Figure 4C). The machine learning models significantly outperformed the predictive scores derived from established COVID-19 risk factors such as age, BMI, Charlson comorbidity index (CCI), or molecular predictors such as CRP or IL-6 levels (Figure 4C). In order to assess the transferability of the proteomic predictors, we tested our model in an independent cohort of 99 hospitalized patients with COVID-19 from another hospital in a different healthcare system (Innsbruck cohort, STAR methods). The proteomic model trained on the main Charité cohort demonstrated a comparably high patient stratification performance when applied to this validation cohort (Figure 4D; AUROC = 0.97). Scores reflecting the contribution of individual proteins and clinical parameters to the machine learning model are provided in Table S3. Of note, we were able to establish machine learning models that not merely classified patients based on severity but were able to predict the current WHO severity grade from the proteome, from clinical measurements, and both (Figure 4E). Again, combined proteomic and clinical laboratory data performed best.

Figure 4

Predicting COVID-19 treatment requirement and future disease progression from the early molecular phenotype by using machine learning.

(A) Selected proteins that are predictive (FDR < 0.05) of the future clinical deterioration of the disease (that is progression to a higher WHO grade in the future; STAR methods). Illustrated are the log2-transformed levels of the proteins at the first sampling point upon correction (for visualization only) for the impact of the WHO grade and age as covariates (Ritchie et al., 2015) (see also Figure S17).

(B) Selected proteins and routine diagnostic parameters predictive (FDR < 0.05) of the remaining time in hospital for patients receiving mild treatment (WHO grade 3). Statistical testing was performed by including patient’s age as a covariate (STAR methods). Illustrated are the log2-transformed levels of the proteins (upon correction for age as a covariate, for visualization only) at the first sampling point, plotted against the remaining time in hospital (days) (see also Figure S18).

(C) Left: performance of a machine learning model characterizing the need for invasive mechanical ventilation, based on either the proteomic data, accredited diagnostic parameters, or both. Right: comparison of the performance of a machine learning model characterising the need for invasive mechanical ventilation based on proteomic data to established prognostic parameters.

(D) Prediction performance, based on the proteome, visualized as boxplots. Cross-validation predictions on the Charité cohort are shown in black, predictions of a model trained on the Charité data and then applied to an independent cohort from another hospital (Innsbruck cohort) are shown in red.

(E) Prediction of the WHO grade itself using machine learning (cross-validated, first time point at the maximum treatment level for each patient is used, STAR methods), based on either the proteome, clinical diagnostic parameters, or both. The performance of the proteomic model trained on the Charité cohort and applied to the Innsbruck cohort is also shown.

(F) A machine learning model was trained to predict the level of necessary treatment (WHO grade) using the data (proteomics, clinical, or both) from the first time point measured for each patient. Derived predictions for patients who did not receive supplemental oxygen at this time point (WHO = 3) were plotted against the remaining time (days) in hospital for these patients.

Predicting COVID-19 treatment requirement and future disease progression from the early molecular phenotype by using machine learning. (A) Selected proteins that are predictive (FDR < 0.05) of the future clinical deterioration of the disease (that is progression to a higher WHO grade in the future; STAR methods). Illustrated are the log2-transformed levels of the proteins at the first sampling point upon correction (for visualization only) for the impact of the WHO grade and age as covariates (Ritchie et al., 2015) (see also Figure S17). (B) Selected proteins and routine diagnostic parameters predictive (FDR < 0.05) of the remaining time in hospital for patients receiving mild treatment (WHO grade 3). Statistical testing was performed by including patient’s age as a covariate (STAR methods). Illustrated are the log2-transformed levels of the proteins (upon correction for age as a covariate, for visualization only) at the first sampling point, plotted against the remaining time in hospital (days) (see also Figure S18). (C) Left: performance of a machine learning model characterizing the need for invasive mechanical ventilation, based on either the proteomic data, accredited diagnostic parameters, or both. Right: comparison of the performance of a machine learning model characterising the need for invasive mechanical ventilation based on proteomic data to established prognostic parameters. (D) Prediction performance, based on the proteome, visualized as boxplots. Cross-validation predictions on the Charité cohort are shown in black, predictions of a model trained on the Charité data and then applied to an independent cohort from another hospital (Innsbruck cohort) are shown in red. (E) Prediction of the WHO grade itself using machine learning (cross-validated, first time point at the maximum treatment level for each patient is used, STAR methods), based on either the proteome, clinical diagnostic parameters, or both. The performance of the proteomic model trained on the Charité cohort and applied to the Innsbruck cohort is also shown. (F) A machine learning model was trained to predict the level of necessary treatment (WHO grade) using the data (proteomics, clinical, or both) from the first time point measured for each patient. Derived predictions for patients who did not receive supplemental oxygen at this time point (WHO = 3) were plotted against the remaining time (days) in hospital for these patients. Having observed clear time trajectories for many proteins and diagnostic parameters, we hypothesized that the molecular signature of the initial host response can be exploited for the prediction of the future disease course. We started by investigating the potential of using the levels of proteins and diagnostic parameters for prediction of future clinical worsening, defined as progression to a higher severity grade on the WHO scale, i.e., a requirement for supplemental low-flow oxygen therapy, high-flow oxygen therapy, or invasive mechanical ventilation. Upon using a linear model to account for current therapy (WHO grade) and age as covariates, 11 proteins and 9 clinical laboratory markers were identified as predictors of future worsening of the clinical condition, across all treatment groups (STAR methods) (Figures 4A and S17; Box 1). Increased or decreased plasma levels of these proteins functioning in inflammation (CRP, ITIH2, SERPINA3, AHSG, and B2M), coagulation (HRG, and PLG), and complement activation (C1R, and CFD), as well as levels of AGT and CST3, were predictive of future clinical deterioration.

High levels indicative of poor prognosis

AGT: Angiotensinogen: Conversion via angiotensin-converting enzymes ACE and ACE2 produces AngI/AngII (pro-inflammatory, vasoconstrictive, pro-fibrotic) and Ang1-7/Ang1-9 (anti-inflammatory, vasodilative, anti-fibrotic), respectively (Turner, 2015; Zhang et al., 2020). Increased AGT likely reflects increased AngI/AngII due to SARS-CoV-2 mediated inactivation of ACE2 (Tay et al., 2020) and subsequently predominant conversion of AGT to AngII. AngII correlates with viral load (Liu et al., 2020c) and has tissue damaging effects (Zhang et al., 2020). B2M: Beta-2-microglobulin: Component of major histocompatibility complex (MHC I) on all nucleated cells and platelets. Released abundantly by activated platelets leading to pro-inflammatory M1-like macrophage polarization (Hilt et al., 2019). Increase of B2M has been associated with death in patients with chronic kidney disease (Makridakis et al., 2020). C1R: Complement C1r: Initiator of the classical complement pathway (Hajishengallis et al., 2017). CFD: Complement Factor D: Initiator of the alternative complement pathway by cleaving Factor B (CFB) to form the C3bBb alternative pathway convertase (Volanakis and Narayana, 1996). CRP: C-reactive protein: Acute phase protein, strongly upregulated in inflammation and infection, including COVID-19. CST3: Cystatin C: Biomarker of kidney function (Peralta et al., 2011). SERPINA3: Alpha-1-antichymotrypsin: Protease inhibitor of neutrophil cathepsin G (Benarafa, 2015). When cleaved at reactive site loop, it becomes stable to degradation and becomes a strong neutrophil chemoattractant (Banda et al., 1988; Potempa et al., 1991)

Low levels indicative of poor prognosis

AHSG: Alpha-2-HS glycoprotein (Fetuin-A): Negative acute phase protein attenuating macrophage activation and neutrophil degranulation (Ombrellino et al., 2001). HRG: Histidine-rich glycoprotein: Negative acute phase protein, regulator of inflammation and immune response, clearance of pathogens and cell detritus as well as coagulation and fibrinolysis through a variety of interactions (Poon et al., 2011; Wakabayashi, 2013). ITIH2: Inter-alpha-trypsin inhibitor heavy chain H2: Covalently linked to bikunin (AMBP), the complex binds to hyalarunan (HA) to form serum-derived hyaluronan-associated protein (SHAP) which has matrix-stabilizing and immunomodulatory effects (Fries and Blom, 2000; Zhuo et al., 2004). PLG: Plasminogen, Plasmin: Mediator of fibrinolysis (Chapin and Hajjar, 2015). More recently, immunological functions including neutrophil attenuation as well as macrophage efferocytosis and polarization from pro-inflammatory M1 to tissue-repairing M2 phenotype have been identified (Heissig et al., 2020). Next, we investigated the predictability of the remaining time needed in the hospital for mildly ill patients with maximum WHO grade 3. We identified 26 protein biomarkers and 14 routine diagnostic markers (Figures 4B and S18) that correlate with the time between the first sampling point and discharge from inpatient care. The proteomic signature associated with a longer need for inpatient treatment is characterized by proteins of the complement system (C1QA, C1QB, and C1QC) and reflects altered coagulation (KLKB1, PLG, and SERPIND1) and inflammation (CD14, B2M, SERPINA3, CRP, GPLD1, PGLYRP2, and AHSG). As most of these proteins are also predictors of the required treatment (Figure S6; Table S3), we hypothesized that the time of inpatient care for mild (WHO grade 3) cases correlates with the severity of the disease in these patients. To test this hypothesis, we generated machine learning models for WHO grade prediction, similar to those shown in Figure 4E, but trained the model only on the first time point data measured for each patient (to avoid using any future information with respect to that time point). We observed that the predictions derived from the first time point data correlated with the remaining time in the hospital (Figure 4F). We conclude that machine learning allows us to finely distinguish between more and less severe patients within a single treatment group, i.e. WHO grade.

Discussion

Upfront clinical decision making is essential for optimum treatment allocation to patients as well as for efficient resource management within the hospital. For instance, early referral to intensive care treatment units has been shown to improve prognosis and outcome for patients with severe COVID-19 (Sun et al., 2020). One of the peculiarities of COVID-19 is that the examinable clinical conditions of patients often do not reflect the true severity of the disease, e.g., with respect to respiratory insufficiency. In contrast to patients with severe bacterial pneumonia, patients with COVID-19 often clinically appear to be only slightly affected, despite being in severe respiratory failure, a phenomenon termed “happy hypoxemia” (Stawicki et al., 2020). Clinical decisions therefore need to be supported by objective, molecular diagnostics. These diagnostic analyses help further in the monitoring of therapies and clinical trials as they allow for determining the extent to which a given patient has deviated from the disease trajectory that would be achieved without therapy. Several recent investigations have identified protein biomarkers and clinical parameters that classify patients with COVID-19 according to disease severity and/or received treatment (D’Alessandro et al., 2020; Laing et al., 2020; Liu et al., 2020b; Messner et al., 2020; Overmyer et al., 2020; Schulte-Schrepping et al., 2020; Shen et al., 2020; Shu et al., 2020; Wynants et al., 2020). In other studies, the potential prognostic value of several established and newly discovered markers for predicting the future course of the disease was investigated, e.g., for IL-6, ferritin, or resistin (Grifoni et al., 2020; Meizlish et al., 2020). Yet, it remained challenging so far, to put their prognostic value in relation to patient age and current level of care, the two most important apparent characteristics for prognosis in COVID-19. For instance, a patient at WHO grade 5 who requires high-flow oxygen therapy is significantly more likely to progress to mechanical ventilation and subsequently die than an inpatient at WHO grade 3 that does not require oxygen support. Likewise, a 90-year-old patient at WHO grade 3 is significantly more likely to progress to more severe disease and to stay in the hospital for a longer period of time than a 20-year-old patient at the same WHO grade. To identify (1) which proteomic markers and laboratory parameters correlate with each other by being attributed to a common biological or physiological response, and (2) which markers reflect disease trajectories, we longitudinally phenotyped 139 individuals admitted at Charité University Hospital, Berlin, Germany, between March 01, 2020, and June 30, 2020, due to PCR-confirmed SARS-CoV-2 infection (Figure S2). We recorded a large panel of 86 parameters with accredited diagnostic tests. These tests capture the compendium of analytical parameters that are available for the clinical routine. In parallel, we record plasma proteomes using a recently developed mass spectrometry platform. This platform introduced the use of analytical (high-flow rate) chromatography to routine proteomics in order to increase throughput and measurement precision to the scale of clinical trials (Messner et al., 2020). The platform reaches a similar proteomic depth as other contemporary mass spectrometry technologies that address undepleted human plasma that is constrained by its huge dynamic range (Anderson and Anderson, 2002) (Box 2 for the resources generated). We provide deep and time-resolved resources that characterize COVID-19 at the level of plasma proteomes and established diagnostic parameters. We demonstrate the extent to which proteomes and diagnostic parameters interdepend, in initial response to the disease and in dynamics during the disease course. We show how they change with age, differ depending on the disease severity, reflect the therapy received and evolve over time. Our data have been acquired for COVID-19 patients’ samples and analyzed in the context of general population proteomics (Generation Scotland) for which samples have been measured with the same proteomic technology (Messner et al., 2020), but we also expect it to be of high value as a reference for studies of other types of viral pneumonia as well as any investigations involving both routine clinical phenotyping and plasma proteomics. Summary of the resource data generated in the study. Covariation maps. We provide a covariation map between plasma proteins measured with at least 3 peptides and clinical laboratory measurements (Figure 1B; Table S4). In addition, we provide a full covariation map between all features measured in the study (Figure S3; Table S5) as well as a COVID-19 specific protein-protein covariation map (Figure S4; Table S6). Finally, we also provide a correlation map for the changes of different omics features with time (Figure S12; Table S7). A map of plasma protein levels and clinical laboratory measurements depending on disease severity (Figures S5 and S6; Table S3). Characterization of age-dependency of plasma protein levels and clinical laboratory measurements in COVID-19, and in comparison with the general population (Figures 2C and S9–S11; Table S3). Characterization of the dynamics of plasma protein levels and clinical laboratory measurements during the course of COVID-19 (Figures 3B, S14, and S15; Table S3). Characterization of the predictive power of plasma protein levels and clinical laboratory measurements in COVID-19 (Figures 4, S17, and S18; Table S3). Proteomic and clinical signatures observed in severe COVID-19 patients undergoing RRT and ECMO (Figures S7 and S8; Table S3). The comprehensive and time-resolved molecular phenotyping of this patient cohort is complemented by a comparison with a healthy population baseline (Generation Scotland [Smith et al., 2006]) measured with the same proteomic platform (Messner et al., 2020), and the characterization of an independent cohort from an unrelated health care system (Innsbruck cohort, Austria) for validating the created predictors. The measurements were performed on samples collected during the early period of COVID-19, i.e., before immunomodulatory treatments such as dexamethasone became standard of care for severe COVID-19 (RECOVERY Collaborative Group, 2020). Our data thus reflect treatment-naive trajectories, which are of major value as baseline data for future studies. We report an initial spike in the early inflammatory host response as a determinant for the future course of the disease. As our results indicate, the patients in our cohort showed molecular marker signatures of higher basal inflammation with increasing age, which might be partially responsible for the higher risk of severe COVID-19 in older individuals. While several approaches of targeted anti-inflammatory treatment have not been successful in preventing clinical deterioration in COVID-19 so far (Stone et al., 2020), our study indicates that this special population of older patients might benefit particularly from treatments that mitigate the inflammatory host response. We report numerous interdependencies between clinical laboratory markers and alterations in proteomes, linking, for example, clinical inflammatory markers to components of the complement cascade and the coagulation system. Using machine learning, we show that both plasma proteomes and the compendium of established diagnostic parameters can be used for accurate characterization of disease severity, significantly outperforming established individual risk markers, such as CRP or IL-6 levels. Of note, the combination of proteomic features and clinical laboratory markers repeatedly showed the best performance in the machine learning models. Furthermore, the models generated could be transferred for prediction in an independent cohort from another hospital and healthcare system, highlighting the robustness of this approach and its translational potential. We observed a considerable overlap between prognostic markers and those that classify treatment according to COVID-19 severity (Figure 5). Out of 49 prognostic markers, 41 correlated with the WHO severity score. As an example, SERPINA3 (Alpha-1 antichymotrypsin) can be used for both the classification of severity and prediction of future disease course. Both SERPINA3 and SERPINA1, another classifier of severity, possess anti-inflammatory properties and are involved in the protection of tissues from neutrophil elastase- and cathepsin G-mediated tissue damage (Benarafa, 2015). Our data show a strong correlation of both serpins with levels of neutrophils and NLR in peripheral blood. SERPINA1 is mainly produced by the liver but it is also produced in epithelial cells, pulmonary alveolar cells, tissue macrophages, blood monocytes, and granulocytes. Hence, this finding presumably reflects a systemic response to the increased NLR. After binding to effector enzymes, SERPIN-proteinase complexes are normally rapidly cleared from the blood but become resistant to degradation when cleaved at the reactive site loop (Gettins and Olson, 2016). Cleaved SERPINA1 and SERPINA3 have been shown to act as strong neutrophil chemoattractants (Banda et al., 1988; Potempa et al., 1991). The observed increase in levels of SERPINA1 and SERPINA3 might therefore partly reflect the more stable, chemoattractant, pro-inflammatory cleaved forms, rather than the short-lived tissue-protective proteins in severe COVID-19. Given the prominent role of neutrophil activation in severe COVID-19 (Schulte-Schrepping et al., 2020), this finding merits further investigation. Summary: association of individual plasma proteins, routine diagnostic and physiological parameters with severity, necessary therapy, and progression of COVID-19. For each statistical test considered (association with WHO grade, prediction of the remaining time in hospital for patients at WHO grade 3, prediction of worsening, i.e., progression to a higher WHO grade in the future, the trend during the peak period of the disease, association with RRT, association with ECMO and association with higher patient age), measurements, which show significant differences are highlighted, with the color indicating the trend, e.g., red for CST3 in the “Association with COVID-19 severity” test indicates higher levels of CST3 in severely ill patients. Proteins for which MRMAssayDB (Bhowmick et al., 2018) lists that a targeted proteomic assay has been developed are indicated with a black bar at the top. Proteins which change significantly with age in the Charité COVID-19 cohort (FDR < 0.05) but do not change significantly (p < 0.05) with age in the general population (Generation Scotland cohort), are highlighted with a white circle in the 7th row (“Association with age”). See also Figures S6–S8, S10, S11, S14, S17, and S18, and Table S3. Our data also highlight angiotensinogen (AGT) as a marker for future worsening. Activation of angiotensinogen occurs via the protease renin and the endogenous angiotensin-converting enzymes ACE or ACE2. ACE converts angiotensin I (AngI) to pro-inflammatory, vasoconstrictive, and pro-fibrotic angiotensin II (AngII) (Zhang et al., 2020). ACE2, in contrast, mediates conversion of angiotensins I and II to anti-inflammatory, vasodilative, anti-fibrotic, and anti-oxidant angiotensins 1–9 (Ang1-9) and 1–7 (Ang1-7) (Turner, 2015). SARS-CoV-2 invades host cells of the lung, heart, kidneys, and other organs via ACE2, resulting in the internalization and downregulation of ACE2 (Hoffmann et al., 2020; Tay et al., 2020; Zhang et al., 2020). Subsequently, angiotensinogen is converted predominantly via ACE to AngII and is less degraded by ACE2, resulting in AngII accumulation (Batlle et al., 2012; Silhol et al., 2020). We can thus assume that the higher plasma levels of AGT gene products in severely ill patients, as measured in our study, mainly reflect the higher levels of AngII. Importantly, we observed a strong correlation of AGT with markers of acute kidney injury (AKI; creatinine, urea; Figures S3 and S13; Table S5), a frequent complication of COVID-19 and a risk factor for poor prognosis and fatal outcome (Fu et al., 2020). Aggravated by the absence of tissue-protective Ang1-7, elevated levels of AngII lead to activation of the renin-angiotensin-system (RAS) and contribute to hypoxic kidney injury (Kasal et al., 2020). Of note, apart from tissue damaging effects, AngII has been shown to linearly correlate with viral load and lung injury in SARS-CoV-2 infection (Liu et al., 2020c). Overall, many of the markers that are both classifiers and predictors of the future disease course are initiators of the inflammatory response. This group includes some of the key initiators of the complement cascade: C1QA, C1QB, C1QC, C1R, and CFD. In contrast, severity markers without prognostic value largely include downstream effectors of inflammation-associated damage, such as GSN and the circulating actins ACTBL2 and ACTB, ACTG1, and ECM1. Thus, this high-precision, high-throughput approach can help us understand mechanisms of immune-mediated organ damage on a molecular basis. Despite the high resolution and high throughput of the mass spectrometry platform deployed in our study, the direct translation of our results into clinical practice will require the development of a clinical assay according to FDA or EMA standards. We further note that the use of machine learning is currently not a certified method to inform clinical decisions. However, in addition to multiple works that have successfully used machine learning for clinical prognosis previously (see recent reviews [Kelly et al., 2019; Lee and Lee, 2020; Nagendran et al., 2020; Shah et al., 2019; Vollmer et al., 2020]), our results bear a strong implication of the future potential of machine learning for clinical applications, including personalized medicine. This calls for a worldwide effort aimed at developing procedures, which would allow reliable clinical validation of machine learning predictors, their approval, and their routine deployment in the clinic. In summary, by following a deeply phenotyped COVID-19 patient cohort over time at the level of the proteome and established diagnostic biomarkers and physiological parameters, we have created a rich data resource for understanding the extent and progression of COVID-19. We have shown that an early spike in the inflammatory response is a key determinant of COVID-19, and that future disease progression is predictable by using panels of accredited diagnostic parameters as well as proteomic measurements from early time point samples. By using machine learning, we demonstrated that the proteome allows to precisely characterize the patients’ phenotype and that the resulting machine learning models are robust and perform accurately when applied to samples from a different hospital and healthcare system. Our study provides comprehensive information about the key determinants of the varying COVID-19 trajectories as well as marker panels for early prognosis that can be exploited for clinical decision making, to devise personalized therapies, as well as for monitoring the development of much needed COVID-19 treatments.

Consortia

Malte Kleinschmidt, Katrin M. Heim, Belén Millet, Lil Meyer-Arndt, Ralf H. Hübner, Tim Andermann, Jan M. Doehn, Bastian Opitz, Birgit Sawitzki, Daniel Grund, Peter Radünzel, Mariana Schürmann, Thomas Zoller, Florian Alius, Philipp Knape, Astrid Breitbart, Yaosi Li, Felix Bremer, Panagiotis Pergantis, Dirk Schürmann, Bettina Temmesfeld-Wollbrück, Daniel Wendisch, Sophia Brumhard, Sascha S. Haenel, Claudia Conrad, Philipp Georg, Kai-Uwe Eckardt, Lukas Lehner, Jan M. Kruse, Carolin Ferse, Roland Körner, Claudia Spies, Andreas Edel, Steffen Weber-Carstens, Alexander Krannich, Saskia Zvorc, Linna Li, Uwe Behrens, Sein Schmidt, Maria Rönnefarth, Chantip Dang-Heine, Robert Röhle, Emma Lieker, Lucie Kretzler, Isabelle Wirsching, Christian Wollboldt, Yinan Wu, Georg Schwanitz, David Hillus, Stefanie Kasper, Nadine Olk, Alexandra Horn, Dana Briesemeister, Denise Treue, Michael Hummel, Victor M. Corman, Christian Drosten, and Christof von Kalle

STAR★Methods

Key resources table

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Markus Ralser (markus.ralser@charite.de).

Materials availability

This study did not generate new unique reagents.

Data and code availability

The processed proteomic and clinical source data is available in this paper’s supplemental information. The raw mass spectrometry proteomics source data for the quality control plasma and serum acquisitions has been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (Perez-Riverol et al., 2019) with the dataset identifier PXD025752. This paper does not report original code. The machine learning scripts used to generate the figures reported in this paper are available in this paper’s supplemental information. Any additional information required to reproduce this work is available from the Lead Contact.

Experimental model and subject details

Charité patient cohort and clinical data

Patients were recruited within the Pa-COVID-19 study conducted at Charité - Universitätsmedizin Berlin, a prospective observational cohort study on the pathophysiology of COVID-19. The study protocol has been described in detail before (Kurth et al., 2020). All patients with PCR-confirmed SARS-CoV-2 infection were eligible for inclusion. Refusal to provide informed consent by the patient or a legal representative and any condition prohibiting supplemental blood collection for serial biosampling were exclusion criteria. Patients were treated according to current national and international guidelines. Three patients had Do Not Intubate and Do Not Resuscitate (DNI/DNR) orders in place, declining mechanical ventilation and other organ support or cardiopulmonary resuscitation. In 4 further cases, limitation of therapy was decided at a later time point according to the patient’s presumed wish (“secondary DNR”) and predictably unfavorable outcome. All other patients received maximum intensive care treatment including organ replacement therapies at the discretion of the responsible physicians. Biosampling for proteome measurement was performed 3 times per week after inclusion. The WHO ordinal scale for clinical improvement (Table S1) was used to assess disease severity. ARDS was defined according to the Berlin ARDS criteria (ARDS Definition Task Force et al., 2012). Sepsis was defined according to sepsis-3 criteria (Singer et al., 2016). The study was approved by the ethics committee of Charité - Universitätsmedizin Berlin (EA2/066/20) and conducted in accordance with the Declaration of Helsinki and guidelines of Good Clinical Practice (ICH 1996). The study is registered in the German and the WHO international registry for clinical studies (DRKS00021688). Clinical data was captured in a purpose built electronic case report form data using the capture system SecuTrial®. All routine laboratory parameters were analyzed in accredited laboratories at Charité - Universitätsmedizin Berlin. Pseudonymized data exported from SecuTrial® were processed using JMP Pro 14 (SAS Institute Inc., Cary, NC, USA). If a laboratory value was missing for a given day, values from up to two preceding days were used for the analysis.

Innsbruck Patient cohort and clinical data

Serum samples from 99 patients admitted to the intensive care unit at the Department of Medicine, University Hospital of Innsbruck for the treatment of respiratory failure due to severe COVID-19 were collected within the first days (median 7.5, IQR 5-12) after admission. Written informed consent was either obtained before sampling or retrospectively after recovery, if patients were mechanically ventilated at the time of sampling. COVID-19 was diagnosed on the basis of a (i) positive SARS-CoV2 PCR within the last 7 days prior to study inclusion, (ii) respiratory failure defined as a partial pressure of oxygen < 60 mmHg on arterial blood gas analysis or a peripheral oxygen saturation of < 90% and (iii) typical infiltrates on computed tomography scanning of the chest. Patients were treated according to national guidelines. The study was approved by the local ethics research committee EK-Nr. 1107/2020, and EK-Nr. 1103/2020 for follow-up.

Method details

Materials

Water for chromatography (LC-MS Grade, LiChrosolv®, Merck; 115333), Acetonitrile (LC-MS Grade Optima; A955-212) and Methanol (LC-MS Grade, Optima; A456-212) were purchased from Fisher Chemicals. DL-Dithiothreitol (BioUltra, 43815), Iodoacetamide (BioUltra, I1149) and Ammonium Bicarbonate (Eluent additive for LC-MS, 40867) were purchased from Sigma Aldrich. Urea (puriss. P.a., reag. Ph. Eur., 33247H) and Formic Acid (Eluent additive for LC-MS, 85178) were purchased from Thermo Scientific. Trypsin (Sequence grade, V511X) was purchased from Promega. Control samples were prepared from Human Serum (Sigma Aldrich, S7023-50MB) and Human Plasma (EDTA, Pooled Donor, Genetex GTX73265).

Mass spectrometry

Mass spectrometry-based proteomics analysis was performed as described previously (Messner et al., 2020) with minor adjustments to the workflow (Figure S1). Semi-automated sample preparation was performed in 96-well format, using in advance prepared stock solution plates stored at -80°C. Briefly, 5μl of thawed plasma samples were transferred to the pre-made denaturation/reduction stock solution plates (55μl 8M Urea, 100mM ammonium bicarbonate (ABC), 50mM dithiothreitol) resuspended and incubated at 30°C for 60 minutes. 5μl was then transferred from the iodoacetamide stock solution plate (100mM) to the sample plate and incubated in the dark at 23°C for 30 minutes before dilution with 100mM ABC buffer (340μl). 220μl of this solution was transferred to the pre-made trypsin stock solution plate (12.5μl, 0.1μg/μl) and incubated at 37°C for 17 h (Benchmark Scientific Incu-Mixer MP4). The digestion was quenched by addition of formic acid (10% v/v, 25μl). The digestion mixture was cleaned-up using C18 96-well plates (BioPureSPE Macro 96-Well, 100mg PROTO C18, The Nest Group) and redissolved in 60μl 0.1% formic acid with shaking. Insoluble particles were removed by centrifugation and the samples transferred to a new plate. Each 96-well plate contained 8 plasma and 4 serum sample preparation controls, and the acquisition workflow included a pooled quality control sample every ~10 injections. Liquid chromatography was performed using the Agilent 1290 Infinity II system coupled to a TripleTOF 6600 mass spectrometer (SCIEX) equipped with IonDrive Turbo V Source (Sciex). A total of 5μl was injected, and the peptides were separated in reversed phase mode using a C18 ZORBAX Rapid Resolution High Definition (RRHD) column 2.1mm x 50mm, 1.8μm particles or Infinitylab Poroshell 120 EC-C18, 2.1 x 50mm 1.9 μm particles. A linear gradient was applied which ramps from 1% B to 40% B in 5 minutes (Buffer A: 0.1% FA; Buffer B: ACN/0.1% FA) with a flow rate of 800μl/min. For washing the column, the organic solvent was increased to 80% B in 0.5 minutes and was kept for 0.2 minutes at this composition before going back to 1% B in 0.1 min. The mass spectrometer was operated in the high sensitivity mode. The DIA/SWATH method consisted of an MS1 scan from m/z 100 to m/z 1500 (20 ms accumulation time) and 25 MS2 scans (25ms accumulation time) with variable precursor isolation width covering the mass range from m/z 450 to m/z 850 (Messner et al., 2020). An IonDrive Turbo V Source (Sciex) was used with ion source gas 1 (nebulizer gas), ion source gas 2 (heater gas) and curtain gas set to 50, 40 and 25, respectively. The source temperature was set to 450 and the ion spray voltage to 5500V. System suitability was evaluated using synthetic peptides (Sciex 5045759, Biognosys Ki-30002-b) and human protein extracts (Promega V6951).

Quantification and statistical analysis

Data analysis

The data were processed with DIA-NN (Demichev et al., 2020), an open-source software suite for DIA / SWATH data processing (https://github.com/vdemichev/DiaNN, commit 4498bd7) using a two-step spectral library refinement procedure as described previously (Messner et al., 2020), with filtering at precursor level q-value (1%), library q-value (0.5%) and gene group q-value (1%). Highly hydrophobic peptides (reference retention time > 110 on the iRT scale) were discarded. Batch correction was performed at the precursor level as described previously (Messner et al., 2020), using linear regression for intra-batch correction (for each MS batch) and control samples for inter-plate correction. Protein quantification was subsequently carried out using the MaxLFQ algorithm (Cox et al., 2014; Pham et al., 2020) as implemented in the DIA-NN R package (https://github.com/vdemichev/diann-rpackage). One of the 96-well plates (#12) featured technical replicates of a number of samples that were also analysed on other plates: in an extra batch correction step, the median log2-protein levels across these replicates on plate 12 were matched to the respective median log2-levels (across the same biological samples) throughout other plates, to correct protein levels on plate 12. Further batch correction was performed for Innsbruck data, to match the mean log2-transformed protein levels in the respective control samples to log2-transformed protein levels in control samples acquired for the Charité cohort. The Generation Scotland cohort proteomics raw data, which we described previously (Messner et al., 2020), have been reanalyzed using the updated software pipeline, to ensure comparability. Exclusion of precursors or proteins based on the data completeness was not performed. Statistical testing was performed in the R environment for statistical computing, version 3.6.0 (R core team, www.R-project.org). All protein and clinical laboratory measurements (except for standard and actual base excess, oxyhemoglobin and sO2) were first log2-transformed, to ensure optimal performance of linear models assuming Gaussian errors, as well as to reduce the impact of outliers. Imputation of the data was not performed, as all the statistical tests applied can accommodate missing values. Likewise, no data filtering based on missing value rates was applied. For differential abundance testing, only protein groups matched to at least three different unmodified peptide sequences were considered. Significance testing for a zero median (for analysing trajectories) or against binary variables (worsening, death) was performed using the Wilcoxon W test or Mann-Whitney U test, respectively, as implemented in the “wilcox.test” function of the “stats” R package. Testing against a continuous variable (e.g. when determining significance of pairwise correlations) was performed using the Kendall Tau test, with the slope estimated using the Theil-Sen method, as implemented in the “kendallTrendTest” function of the “EnvStats” (Millard, 2014) package. When covariates had to be taken into account, we used linear modelling with the “limma” (Ritchie et al., 2015) R package, with P-values obtained using “eBayes” (Smyth, 2004). Modelling with “limma” was likewise used to correct for these covariates for visualisation purposes. WHO grade was considered as a “factor-type” covariate (resulting in a “limma” design matrix with one-hot encoding for different WHO grades). Multiple-testing correction was performed using the Benjamini-Hochberg false discovery rate controlling procedure (Benjamini and Hochberg, 1995) as implemented in the “p.adjust” function of the “stats” R package. The adjusted p-values below 0.05 were considered significant. Multiple-testing correction for differential abundance analysis was performed separately for proteins, for which MRMAssayDB lists a targeted assay (Bhowmick et al., 2018), the rest of proteins measured, the clinical laboratory measurements and the clinical factors (age, Charlson score, BMI, Horowitz index and FiO2, SOFA score), to ensure that the false discovery rate stayed below 0.05 for each of these categories of features. Likewise, when determining the significance of correlations in correlation matrices, correction was performed for each row or each column separately, to ensure less than 5% false discoveries in each row or column, respectively. For correlation map visualisations, black points were used to indicate row-wise significant correlations, and black rectangles at the border of the respective cell - column-wise significant correlations. Quantities of gene products corresponding to open reading frames named IGxx (i.e. different types of immunoglobulin chains) were summed together to generate quantities representative of the overall levels of immunoglobulin classes (IGHVs, IGLVs, etc). This does not affect any conclusions of this work and was done purely to improve visualization and simplify the interpretation of the heatmaps and correlation maps. Full protein level tables, including levels of individual immunoglobulin gene products, are provided in supplementary materials. For visualisation, different WHO grades were color-coded throughout the manuscript (see Figure S2).

Markers of the disease severity

The first time point measured at the maximum WHO grade was chosen for each patient. For each omics feature, its values (log2-transformed when necessary, as described above) were tested for a trend depending on the WHO grade. Age was included as a covariate in the linear model as described above.

Markers varying with age in COVID-19

The first sampling time point measured was chosen per patient. For each omics feature, its values (log2-transformed when necessary, as described above) were tested for a trend depending on age. The test was performed either using the Kendall Tau test (as described above; Figures S9 and S10), or by accounting for WHO grade as a covariate in the linear model (as described above; Figures S11 and 5).

Markers of RRT and ECMO

For each omics feature, the P-value was calculated using the Mann-Whitney test, comparing between the median levels (log2-transformed when necessary, as described above) across all sampling time points at WHO grade 7 in patients who did not receive the therapy and the median levels (log2-transformed when necessary, as described above) across all sampling time points at WHO grade 7 after initiation of the respective therapy in patients who did.

Markers predictive of time in hospital

Patients, for which the first sampling time point before the outcome corresponded to the WHO = 3 severity grade (that is the patient did not require supplemental oxygen on that day), were considered. Thus, no correction for disease severity was necessary. Testing of levels (log2-transformed when necessary, as described above) of each omics feature (measured for the first sampling time point) vs the remaining time in hospital (days) was performed by including age as a covariate in the linear model as described above.

Markers predictive of disease worsening

The first sampling time point measured was chosen per patient. Future disease worsening was defined as a future increase in the WHO grade (for patients at WHO grade < 7) or death (for patients at WHO grade 7). For each omics feature, its levels (log2-transformed when necessary, as described above) were compared between patients who did not worsen and patients who did, with age and current WHO grade (as factor) included as covariates in the linear model as described above.

Peak period of the disease definition

When studying the dynamic changes in omics values during the disease course, we focused on the time points sampled when the disease was the most severe for a particular patient. This allowed us to look at molecular changes over time without the need to take into account the potential impact of changes in disease severity and the level of treatment. For each patient, we thus defined the “peak period of the disease” as the time when the patient was receiving the most intensive treatment during their stay in hospital, that is the time when the patient was at WHO grade 6 or 7, for patients who received invasive mechanical ventilation at some point, or otherwise at their maximum WHO grade (3, 4 or 5).

Markers changing during the peak of disease

Only patients with at least two days between the first and last sampling time points at the peak of the disease (as defined above) were considered. For each omics feature, a linear regression model was fitted for its levels (log2-transformed when necessary, as described above) vs the day number (with the slope estimated using the nonparametric Theil-Sen method, as implemented in the “kendallTrendTest” function of the “EnvStats” (Millard, 2014) R package), and the quantity slopeadj = (regression slope) ∗ (number of days between first and last time points) was calculated. A non-parametric approach was chosen because of its superior robustness to outliers. A Wilcoxon W test was then applied to compare the median of slopeadj to zero. The values of slopeadj for each feature are visualised in Figure S14. The non-parametric approach was chosen here due to its robustness with respect to outliers.

Correlation maps

General correlation maps were generated using the values (log2-transformed when necessary, as described above) of features at the first time point measured at the maximum WHO grade for each patient. The correlation map between feature changes during the peak of the disease (as defined above) was generated by correlating the slopeadj values (as defined above). The map of significant protein correlations not detected in the general population was generated by excluding all correlations which were either significant (P <= 0.05, without multiple-testing correction) with the same trend in the Generation Scotland cohort, or could not be calculated reliably therein (less than 20 valid points).

Prediction of current mechanical ventilation

To reflect the power of omics measurements in characterising the phenotype, a classifier was constructed to predict mechanical ventilation (WHO grade > 5) at the present time point using the proteomic and/or accredited diagnostic data. For the proteomic data only proteins characterized by at least 3 peptides were taken into account. The first time point measured at the maximum WHO grade was selected per patient. We used a gradient boosted tree algorithm implemented in the XGBoost 1.2.0 (Chen and Guestrin, 2016) under Python 3.8.1. The classifier was constructed using leave-one-out cross-validation. To circumvent overfitting a subsampling of 0.5 of the training data per boosting step and an L2 regularization term “lambda” of 20 were applied. For the assessment of classifier performance, the leave-one-out method was applied in the following way: the prediction was made for each sample separately, by excluding (withholding) this sample from the dataset, training the classifier on the remaining (independent) samples and then predicting the withheld sample using the trained model. The source code is provided in supplementary materials. For the determination of the feature importances, one classifier was trained on all data points using the same setup as described above. The feature importances were then extracted directly from the trained classifier. For the validation of the trained models, samples from an independent cohort (Innsbruck) were used. A model was trained on the data collected at the Charité using the same setup and parameters as described above and the proteins that were characterized in both cohorts. The evaluation was performed on the Innsbruck cohort that was not used for training. ROC-curves and AUC were calculated using scikit-learn 0.23.2 (Pedregosa et al., 2011). The machine learning scripts are provided in Data S1.

WHO grade prediction

For the prediction of the WHO grade an elastic net was applied as implemented in scikit-learn 0.23.2. The WHO grade was predicted for the first time point at maximum WHO grade per patient using a leave-one-out cross-validation procedure. A training/prediction based on proteomic (proteins with at least 3 peptides) and/or accredited diagnostic data from the Charité cohort was performed. Additionally, the proteomic model was validated using proteomic data set from the Innsbruck cohort that was not included in the training. Features with more than 10% missing values were removed. All data were log2-transformed when necessary (as described above), standardized and kNN-imputed (5 neighbors). The latter two steps were fitted on the training data only. For the elastic net an “l1_ratio” of 0.05 was used coupled to a 5-fold cross-validated recursive feature-elimination algorithm (“step” = 10, “min_features” = 20). Calculations of metrics were performed using scikit-learn 0.23.2 and scipy 1.5.2 (Virtanen et al., 2020). The machine learning scripts are provided in Data S1.

Prediction of the remaining time in hospital

For the prediction of the remaining time in hospital a WHO grade predictor as described above was trained on the first data points for every patient. The predicted WHO grades for every patient at WHO grade 3 who stayed in hospital for at least 1 day after sample time were correlated to the remaining time in hospital. The Spearman correlation was calculated using scipy 1.5.2. The machine learning scripts are provided in Data S1.

Supplementary Note 1. Diagnostic parameters and Proteome signatures that indicate therapeutic interventions

We investigated to what extent specific organ replacement therapies in severely ill patients, (renal replacement therapy (RRT) and extracorporeal membrane oxygenation (ECMO)) were reflected in the proteome and at the level of accredited diagnostic parameters. HP and HPX were reduced in patients on RRT and ECMO, reflecting hemolysis in the extracorporeal circuits (Figures S7 and S8). Elevated SERPINC1 (Antithrombin III) levels mirror substitution of antithrombin during ECMO. The reason for elevated levels of APOE in patients with ECMO is unclear, but is in line with reports on increased levels of APOE in pediatric patients after cardiopulmonary bypass (Aĝirbaşli et al., 2015). The proteins increased in patients receiving RRT mainly reflect impaired kidney function and have been associated with RRT before (AMBP, B2M, CST3, LYZ, RBP4, Figure S7) (Shao et al., 2015). Of note, increased levels of AMBP, B2M and LYZ have been associated with death in chronic kidney disease (Makridakis et al., 2020). Levels of CFD and APOH, both involved in the complement system, were also increased (McDonnell et al., 2020; Volanakis and Narayana, 1996). CFD is eliminated renally and accumulates in end stage renal disease, possibly leading to enhanced complement activation via the alternative pathway (Pascual et al., 1988). In contrast, levels of APOH have even been reported to be slightly lower following high-flux hemodialysis (Han et al., 2018). We note that the analysis of the effect of treatments on the proteome has two limitations. First, some of the markers identified might be prognostic for the treatment rather than reflect its effect. Age and the Charlson comorbidity index belong to this category: patients receiving ECMO were significantly younger and had a lower number of pre-existing chronic conditions than those who did not. Second, the results might be partially confounded by the time elapsed from the onset of the disease, as we have shown (Figure 3) that omics signature changes with time in COVID-19 patients while on invasive mechanical ventilation.

Supplementary Note 2. Age-specific response to COVID-19 in the context of severity markers

Older age is one of the most significant risk factors for severe disease and adverse outcome in COVID-19. Enhanced understanding of underlying mechanisms for the age-specific response to SARS-CoV-2 infection is therefore important and needed for the development of effective age-specific strategies for prevention and treatment. Furthermore, dissecting the age-specific components of the host response will improve our knowledge of the pathogenicity of similar viruses, making the world better prepared for future pandemics. Current theories characterizing the link between the higher age and risk for severe disease include immunosenescence, elevated baseline inflammation, or altered protein glycosylation landscape leading to impaired antiviral response or reduced immune tolerance (Franceschi et al., 2018; Goronzy and Weyand, 2013; Rea et al., 2018; Tay et al., 2020). However, a detailed and mechanistic understanding of the relation between COVID-19 and aging is lacking. In this work, we leverage the large size and high precision of the proteomic data acquired to map the age-related response to COVID-19, to provide a reference dataset (Figures 2C, 5, and S11) for future studies addressing this problem. We report elevation of several inflammatory and acute phase proteins such as SERPINA3, ITIH4, SAA1, and ITIH3 in older patients with COVID-19. SAA1 has been shown to induce macrophage polarization to the M2-type which promotes tissue repair but also possesses pro-fibrotic properties involved in the pathogenesis of pulmonary fibrosis (Liu et al., 2014; Page et al., 2012; Wermuth and Jimenez, 2015). Moreover, SAA1 mediates displacement of APOA1 from HDL leading to loss of the cardio- and vasoprotective properties of high density lipoprotein (HDL) (Gordon, 2014). SERPINA3, as discussed above, has an ambivalent role as a neutrophil proteinase inhibitor but also a powerful neutrophil chemoattractant. Upregulation of SERPINA3 with age in COVID-19, along with the higher neutrophil-to-lymphocyte ratio, suggests that excessive neutrophil response is one of the aggravating factors in older COVID-19 patients. Taken together, our findings point toward a disproportionately dysregulated inflammatory response to SARS-CoV-2 with age, which may be explained by an increased baseline inflammation and immunosenescence in older patients (Chung et al., 2019; Ferrucci et al., 2005; Soysal et al., 2016). Age-dependent increase of FBLN1 and decrease of KLKB1 reflect alterations in blood coagulation which may aggravate this effect by predisposing older patients to thromboembolic events, one of the key clinical characteristics of severe COVID-19. Interestingly, a number of apolipoproteins displayed a strong age-specific signature in COVID-19. For instance, APOC2, a component of chylomicrons, very low density lipoprotein (VLDL) and high density lipoprotein (HDL), and activator of lipoprotein lipase involved in triglyceride metabolism (Ramasamy, 2014), was downregulated with age in COVID-19, but upregulated with age in the general population (Harris et al., 2017; Peters et al., 2015) (Figure 2C). Dysregulation of apolipoproteins has been observed in community acquired pneumonia and associated with unfavourable outcome (Sharma et al., 2017). Remarkably, contrary to the general trend, APOD, APOC3 and APOE show opposite trends in older COVID-19 patients and in severe disease (Figure 5). APOD is expressed by many tissues, including the brain (Dassati et al., 2014). An increase in APOD has been previously observed in ischemic stroke and CNS inflammation and may reflect (subclinical) involvement of the central nervous system especially in older patients with more severe inflammation and more comorbidities (Muffat and Walker, 2010). Conversely, high levels of APOD have been shown to temper coronavirus-mediated encephalitis in mice, indicating its role as a marker of CNS damage as well as tissue protection and repair (Carmo et al., 2008). APOE, involved in inflammation, immune response and lipid metabolism, is upregulated in severe COVID-19 but downregulated with age in this cohort. APOE typically mediates anti-inflammatory effects by downregulation of NFκB and inhibition of macrophage response to IFNy and TLR3, both mediators of viral immune response. Moreover, it neutralizes bacterial LPS and enhances the adaptive immune response by facilitating antigen presentation (Figueroa et al., 2019). Downregulation with age may reflect a compromised immune response leading to over-activation of NFκB and insufficient pathogen clearance in older patients. Finally, APOE has been described to reduce proliferation of myeloid progenitor cells (Murphy et al., 2011) and to reduce myeloid derived suppressor cell (MDSC) survival in mice (Tavazoie et al., 2018). Thus, lower levels of APOE in the elderly may favor expansion of immature and dysfunctional neutrophils that have been described as a hallmark in severe COVID-19 (Schulte-Schrepping et al., 2020). This broad involvement of APOE merits further investigation in future studies.

Supplementary Note 3. Diverging trends at the proteome level during the disease peak in individual patients

Some patients (59, 90, 96, 123) who died exhibited protein concentration trajectories distinctly similar to “typical” survivors (Figure 3B). Two of them (59, 90) had a prolonged ICU stay with repeated septic episodes and finally defined limitations of therapy according to presumed patients’ wishes (“secondary DNR”). Their protein signatures probably reflect the phenomenon of immune paralysis that can follow bacterial sepsis associated with a prolonged ICU treatment (Patricio et al., 2019). One patient (96) was receiving ongoing immunosuppressive therapy for an autoimmune disorder, and a fourth patient (123) had a history of kidney transplantation, both died of septic shock. Whether the particular group of solid organ recipients shows a distinct protein signature associated with the outcome requires further investigation. We also note that some surviving patients do not show a trajectory characteristic of the typical ‘alleviation’ of the proteomic phenotype (WHO = 4: 58, 106, 153; WHO = 6 or 7: 43, 80). Specifically, the proteomic response in patients 106, 153 and 141 was indicative of overall ‘worsening’ of the proteome (Figure 3B). In contrast, patients 43 and 80 exhibited the overall ‘alleviation’ of the proteome, except for the spike in the levels of CRP and serum amyloid (Figure 3B). Shorter time spans between sampling days may explain these observations in four of these patients (43, 58, 80, 106), indicating that the host inflammatory response requires a certain time to resolve, especially in more severely ill patients, and some of the markers of systemic inflammation might linger, whereas a typical alleviation of the proteomic signature can be observed even within a few days in moderate disease courses. The unusual pattern of patient 153 was likely confounded by a skin infection that subsequently required antibiotic treatment.

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Biological samples

Human Serum	Sigma-Aldrich	Cat# S7023-50MB
Human Plasma (EDTA, Pooled Donor)	Genetex	Cat# GTX73265

Chemicals, peptides, and recombinant proteins

Water for chromatography (LC-MS Grade) LiChrosolv®	Merck	Cat# 115333
Acetonitrile (Acetonitrile, Optima™ LC/MS Grade, Fisher Chemical™ )	Fisher Scientific	Cat# A955-212
Methanol (Optima LC-MS Grade, Fisher Chemical)	Fisher Scientific	Cat# A456-212
DL-Dithiothreitol (BioUltra)	Sigma-Aldrich	Cat# 43815
Iodoacetamide (BioUltra)	Sigma-Aldrich	Cat# I1149
Ammonium Bicarbonate (Eluent additive for LC-MS)	Sigma-Aldrich	Cat# 40867
Urea (puriss. P.a., reag. Ph. Eur.)	Honeywell Research Chemicals	Cat# 33247H
Formic Acid, LC-MS Grade (Eluent additive for LC-MS)	Thermo Scientific™Pierce™	Cat# 85178
Trypsin (Sequence grade)	Promega	Cat# V511X
Mass Spec-Compatible Human Extract	Promega	Cat# V6951
Retention time peptides Biognosys iRT kit	Biognosys	Cat# Ki-30002-b
MS synthetic peptide calibration kit	SCIEX	Cat# 5045759

Deposited Data

Raw mass spectrometry proteomics data (commercial plasma and serum control samples)	This study	PXD025752

Software and algorithms

Proteomics data analysis via Deep NeuralNetworks, DIA-NN	Demichev et al., 2020	https://github.com/vdemichev/DiaNN
DIA-NN R package	Demichev et al., 2020	https://github.com/vdemichev/diann-rpackage
ComplexHeatmap R package	(Gu et al., 2016)	https://github.com/jokergoo/ComplexHeatmap
EnvStats R package	(Millard, 2014)	https://CRAN.R-project.org/package=EnvStats
Limma R package	(Ritchie et al., 2015)	https://bioconductor.org/packages/limma/
eBayes R package	(Smyth, 2004)	https://github.com/cran/limma/blob/master/R/ebayes.R
XGBoost 1.2.0 Python package	(Chen and Guestrin, 2016)	https://pypi.org/project/xgboost/1.2.0/
scikit-learn 0.23.2 Python package	(Pedregosa et al., 2011)	https://scikit-learn.org/0.23/
scipy 1.5.2 Python package	(Virtanen et al., 2020)	https://pypi.org/project/scipy/1.5.2/

Other

Zorbax RRHD Eclipse Plus 95A C18, 2.1 x 50mm, 1.8 um, 1200 bar	Agilent	Cat# 959757-902
Infinitylab Poroshell 120 EC-C18, 2.1x50mm 1.9um	Agilent	Cat# 699675-902
BioPureSPE Macro 96-Well,100mg PROTO 300 C18	The Nest Group, Inc.	HNS S18V-L

31 in total

1. Investigating phenotypes of pulmonary COVID-19 recovery: A longitudinal observational prospective multicenter trial.

Authors: Thomas Sonnweber; Piotr Tymoszuk; Sabina Sahanic; Anna Boehm; Alex Pizzini; Anna Luger; Christoph Schwabl; Manfred Nairz; Philipp Grubwieser; Katharina Kurz; Sabine Koppelstätter; Magdalena Aichner; Bernhard Puchner; Alexander Egger; Gregor Hoermann; Ewald Wöll; Günter Weiss; Gerlig Widmann; Ivan Tancevski; Judith Löffler-Ragg
Journal: Elife Date: 2022-02-08 Impact factor: 8.140

Review 2. A Comprehensive Review of Machine Learning Used to Combat COVID-19.

Authors: Rahul Gomes; Connor Kamrowski; Jordan Langlois; Papia Rozario; Ian Dircks; Keegan Grottodden; Matthew Martinez; Wei Zhong Tee; Kyle Sargeant; Corbin LaFleur; Mitchell Haley
Journal: Diagnostics (Basel) Date: 2022-07-31

3. Virus-induced senescence is a driver and therapeutic target in COVID-19.

Authors: Soyoung Lee; Yong Yu; Jakob Trimpert; Fahad Benthani; Mario Mairhofer; Paulina Richter-Pechanska; Emanuel Wyler; Dimitri Belenki; Sabine Kaltenbrunner; Maria Pammer; Lea Kausche; Theresa C Firsching; Kristina Dietert; Michael Schotsaert; Carles Martínez-Romero; Gagandeep Singh; Séverine Kunz; Daniela Niemeyer; Riad Ghanem; Helmut J F Salzer; Christian Paar; Michael Mülleder; Melissa Uccellini; Edward G Michaelis; Amjad Khan; Andrea Lau; Martin Schönlein; Anna Habringer; Josef Tomasits; Julia M Adler; Susanne Kimeswenger; Achim D Gruber; Wolfram Hoetzenecker; Herta Steinkellner; Bettina Purfürst; Reinhard Motz; Francesco Di Pierro; Bernd Lamprecht; Nikolaus Osterrieder; Markus Landthaler; Christian Drosten; Adolfo García-Sastre; Rupert Langer; Markus Ralser; Roland Eils; Maurice Reimann; Dorothy N Y Fan; Clemens A Schmitt
Journal: Nature Date: 2021-09-13 Impact factor: 69.504

4. New Insights into Clinical and Mechanistic Heterogeneity of the Acute Respiratory Distress Syndrome: Summary of the Aspen Lung Conference 2021.

Authors: Thomas R Martin; Rachel L Zemans; Lorraine B Ware; Eric P Schmidt; David W H Riches; Lisa Bastarache; Carolyn S Calfee; Tushar J Desai; Susanne Herold; Catherine L Hough; Mark R Looney; Michael A Matthay; Nuala Meyer; Samir M Parikh; Troy Stevens; B Taylor Thompson
Journal: Am J Respir Cell Mol Biol Date: 2022-09 Impact factor: 7.748

5. A multiplex protein panel assay for severity prediction and outcome prognosis in patients with COVID-19: An observational multi-cohort study.

Authors: Ziyue Wang; Adam Cryar; Oliver Lemke; Pinkus Tober-Lau; Daniela Ludwig; Elisa Theresa Helbig; Stefan Hippenstiel; Leif-Erik Sander; Daniel Blake; Catherine S Lane; Rebekah L Sayers; Christoph Mueller; Johannes Zeiser; StJohn Townsend; Vadim Demichev; Michael Mülleder; Florian Kurth; Ernestas Sirka; Johannes Hartl; Markus Ralser
Journal: EClinicalMedicine Date: 2022-06-09

Review 6. The role of multi-omics in the diagnosis of COVID-19 and the prediction of new therapeutic targets.

Authors: Jianli Ma; Yuwei Deng; Minghui Zhang; Jinming Yu
Journal: Virulence Date: 2022-12 Impact factor: 5.428

Review 7. Advances and Utility of the Human Plasma Proteome.

Authors: Eric W Deutsch; Gilbert S Omenn; Zhi Sun; Michal Maes; Maria Pernemalm; Krishnan K Palaniappan; Natasha Letunica; Yves Vandenbrouck; Virginie Brun; Sheng-Ce Tao; Xiaobo Yu; Philipp E Geyer; Vera Ignjatovic; Robert L Moritz; Jochen M Schwenk
Journal: J Proteome Res Date: 2021-10-21 Impact factor: 5.370

8. Hyperinflammatory environment drives dysfunctional myeloid cell effector response to bacterial challenge in COVID-19.

Authors: Srikanth Mairpady Shambat; Alejandro Gómez-Mejia; Tiziano A Schweizer; Markus Huemer; Chun-Chi Chang; Claudio Acevedo; Judith Bergada-Pijuan; Clément Vulin; Daniel A Hofmaenner; Thomas C Scheier; Sanne Hertegonne; Elena Parietti; Nataliya Miroshnikova; Pedro D Wendel Garcia; Matthias P Hilty; Philipp Karl Buehler; Reto A Schuepbach; Silvio D Brugger; Annelies S Zinkernagel
Journal: PLoS Pathog Date: 2022-01-10 Impact factor: 6.823

9. Circulating proteomic panels for risk stratification of intracranial aneurysm and its rupture.

Authors: Yueting Xiong; Yongtao Zheng; Yan Yan; Jun Yao; Hebin Liu; Fenglin Shen; Siyuan Kong; Shuang Yang; Guoquan Yan; Huanhuan Zhao; Xinwen Zhou; Jia Hu; Bin Zhou; Tao Jin; Huali Shen; Bing Leng; Pengyuan Yang; Xiaohui Liu
Journal: EMBO Mol Med Date: 2022-01-03 Impact factor: 12.137

10. Validation of the Performance of A1HPV6, a Triage Blood Test for the Early Diagnosis and Prognosis of SARS-CoV-2 Infection.

Authors: Pauline Maisonnasse; Thierry Poynard; Mehdi Sakka; Sepideh Akhavan; Romain Marlin; Valentina Peta; Olivier Deckmyn; Nesrine Braham Ghedira; Yen Ngo; Marika Rudler; Sylvie van der Werf; Stephane Marot; Dominique Thabut; Harry Sokol; Chantal Housset; Alain Combes; Roger Le Grand; Patrice Cacoub
Journal: Gastro Hep Adv Date: 2022-02-07