Vadim Demichev1, Pinkus Tober-Lau2, Oliver Lemke3, Tatiana Nazarenko4, Charlotte Thibeault2, Harry Whitwell5, Annika Röhl3, Anja Freiwald3, Lukasz Szyrwiel6, Daniela Ludwig3, Clara Correia-Melo6, Simran Kaur Aulakh6, Elisa T Helbig2, Paula Stubbemann2, Lena J Lippert2, Nana-Maria Grüning3, Oleg Blyuss7, Spyros Vernardis6, Matthew White6, Christoph B Messner8, Michael Joannidis9, Thomas Sonnweber10, Sebastian J Klein9, Alex Pizzini10, Yvonne Wohlfarter11, Sabina Sahanic10, Richard Hilbe10, Benedikt Schaefer12, Sonja Wagner12, Mirja Mittermaier13, Felix Machleidt2, Carmen Garcia2, Christoph Ruwwe-Glösenkamp2, Tilman Lingscheid2, Laure Bosquillon de Jarcy2, Miriam S Stegemann2, Moritz Pfeiffer2, Linda Jürgens2, Sophy Denker14, Daniel Zickler15, Philipp Enghard15, Aleksej Zelezniak16, Archie Campbell17, Caroline Hayward18, David J Porteous17, Riccardo E Marioni19, Alexander Uhrig2, Holger Müller-Redetzky2, Heinz Zoller12, Judith Löffler-Ragg10, Markus A Keller11, Ivan Tancevski10, John F Timms20, Alexey Zaikin21, Stefan Hippenstiel22, Michael Ramharter23, Martin Witzenrath22, Norbert Suttorp22, Kathryn Lilley24, Michael Mülleder25, Leif Erik Sander22, Markus Ralser26, Florian Kurth27. 1. Charité Universitätsmedizin Berlin, Department of Biochemistry, 10117 Berlin, Germany; The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London NW11AT, UK; The University of Cambridge, Department of Biochemistry and Cambridge Centre for Proteomics, Cambridge CB21GA, UK. 2. Charité Universitätsmedizin Berlin, Department of Infectious Diseases and Respiratory Medicine, 10117 Berlin, Germany. 3. Charité Universitätsmedizin Berlin, Department of Biochemistry, 10117 Berlin, Germany. 4. University College London, Department of Mathematics, London WC1E 6BT, UK; University College London, Department of Women's Cancer, EGA Institute for Women'S Health, London WC1E 6BT, UK. 5. National Phenome Centre and Imperial Clinical Phenotyping Centre, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW72AZ, UK; Lobachevsky University, Department of Applied Mathematics, Nizhny Novgorod 603105, Russia; Imperial College London, Section of Bioanalytical Chemistry, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, London SW7 2AZ, UK. 6. The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London NW11AT, UK. 7. Lobachevsky University, Department of Applied Mathematics, Nizhny Novgorod 603105, Russia; University of Hertfordshire, School of Physics, Astronomy and Mathematics, Hatfield AL10 9AB, UK; Sechenov First Moscow State Medical University, Department of Paediatrics and Paediatric Infectious Diseases, Moscow 119435, Russia. 8. Charité Universitätsmedizin Berlin, Department of Biochemistry, 10117 Berlin, Germany; The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London NW11AT, UK. 9. Medical University Innsbruck, Division of Intensive Care and Emergency Medicine, Department of Internal Medicine, 6020 Innsbruck, Austria. 10. Medical University of Innsbruck, Department of Internal Medicine II, 6020 Innsbruck, Austria. 11. Medical University of Innsbruck, Institute of Human Genetics, 6020 Innsbruck, Austria. 12. Medical University of Innsbruck, Christian Doppler Laboratory for Iron and Phosphate Biology, Department of Internal Medicine I, 6020 Innsbruck, Austria. 13. Charité Universitätsmedizin Berlin, Department of Infectious Diseases and Respiratory Medicine, 10117 Berlin, Germany; Berlin Institute of Health, 10178 Berlin, Germany. 14. Charité Universitätsmedizin Berlin, Medical Department of Hematology, Oncology & Tumor Immunology, Virchow Campus & Molekulares Krebsforschungszentrum, 13353 Berlin, Germany; Berlin Institute of Health, 10178 Berlin, Germany. 15. Charité Universitätsmedizin Berlin, Department of Nephrology and Internal Intensive Care Medicine, 10117 Berlin, Germany. 16. The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London NW11AT, UK; Chalmers Tekniska Högskola, Department of Biology and Biological Engineering, SE-412 96 Gothenburg, Sweden. 17. University of Edinburgh, Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, Edinburgh EH4 2XU, UK; University of Edinburgh, Usher Institute, Edinburgh EH16 4UX, UK. 18. University of Edinburgh, MRC Human Genetics Unit, Institute of Genetics and Cancer, Edinburgh EH4 2XU, UK. 19. University of Edinburgh, Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, Edinburgh EH4 2XU, UK. 20. University College London, Department of Women's Cancer, EGA Institute for Women'S Health, London WC1E 6BT, UK. 21. University College London, Department of Mathematics, London WC1E 6BT, UK; University College London, Department of Women's Cancer, EGA Institute for Women'S Health, London WC1E 6BT, UK; Lobachevsky University, Laboratory of Systems Medicine of Healthy Ageing, Nizhny Novgorod 603105, Russia. 22. Charité Universitätsmedizin Berlin, Department of Infectious Diseases and Respiratory Medicine, 10117 Berlin, Germany; German Centre for Lung Research, 35392 Gießen, Germany. 23. Bernhard Nocht Institute for Tropical Medicine, Department of Tropical Medicine, and University Medical Center Hamburg-Eppendorf, Department of Medicine, 20359 Hamburg, Germany. 24. The University of Cambridge, Department of Biochemistry and Cambridge Centre for Proteomics, Cambridge CB21GA, UK. 25. Charité - Universitätsmedizin Berlin, Core Facility - High-Throughput Mass Spectrometry, 10117 Berlin, Germany. 26. Charité Universitätsmedizin Berlin, Department of Biochemistry, 10117 Berlin, Germany; The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London NW11AT, UK. Electronic address: markus.ralser@charite.de. 27. Charité Universitätsmedizin Berlin, Department of Infectious Diseases and Respiratory Medicine, 10117 Berlin, Germany; Bernhard Nocht Institute for Tropical Medicine, Department of Tropical Medicine, and University Medical Center Hamburg-Eppendorf, Department of Medicine, 20359 Hamburg, Germany.
Abstract
COVID-19 is highly variable in its clinical presentation, ranging from asymptomatic infection to severe organ damage and death. We characterized the time-dependent progression of the disease in 139 COVID-19 inpatients by measuring 86 accredited diagnostic parameters, such as blood cell counts and enzyme activities, as well as untargeted plasma proteomes at 687 sampling points. We report an initial spike in a systemic inflammatory response, which is gradually alleviated and followed by a protein signature indicative of tissue repair, metabolic reconstitution, and immunomodulation. We identify prognostic marker signatures for devising risk-adapted treatment strategies and use machine learning to classify therapeutic needs. We show that the machine learning models based on the proteome are transferable to an independent cohort. Our study presents a map linking routinely used clinical diagnostic parameters to plasma proteomes and their dynamics in an infectious disease.
COVID-19 is highly variable in its clinical presentation, ranging from asymptomatic infection to severe organ damage and death. We characterized the time-dependent progression of the disease in 139 COVID-19 inpatients by measuring 86 accredited diagnostic parameters, such as blood cell counts and enzyme activities, as well as untargeted plasma proteomes at 687 sampling points. We report an initial spike in a systemic inflammatory response, which is gradually alleviated and followed by a protein signature indicative of tissue repair, metabolic reconstitution, and immunomodulation. We identify prognostic marker signatures for devising risk-adapted treatment strategies and use machine learning to classify therapeutic needs. We show that the machine learning models based on the proteome are transferable to an independent cohort. Our study presents a map linking routinely used clinical diagnostic parameters to plasma proteomes and their dynamics in an infectious disease.
The coronavirus disease 2019 (COVID-19) has created unprecedented societal challenges, particularly for public health and the global economy (Alwan et al., 2020; Blumenthal et al., 2020; Rosenbaum, 2020). Efficient management of these challenges is hampered by the variability of clinical manifestations, ranging from asymptomatic infection with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) to death, despite maximum intensive care. Biomarkers and molecular signatures enabling accurate prognosis of future disease courses are needed to optimize resource allocation and personalize treatment strategies. Patients likely to progress to severe disease and organ failure and those likely to remain stable could be identified early, which is particularly valuable in scenarios where health care systems reach capacity limits. Prognostic panels would also optimize the monitoring of novel treatments, thereby accelerating clinical trials (Phua et al., 2020; Saxena, 2020; Wu et al., 2020). Knowledge of factors that differentiate recovery from deterioration throughout the disease will further enhance our understanding of the inflammatory host response as well as the underlying pathophysiology and provide new therapeutic targets.A number of biomarkers that classify COVID-19 severity have recently been described. These are based on clinical chemistry, enzyme activities, immune profiling, single-cell sequencing, proteomics, and metabolomics (D’Alessandro et al., 2020; Laing et al., 2020; Liu et al., 2020b; Messner et al., 2020; Overmyer et al., 2020; Schulte-Schrepping et al., 2020; Shen et al., 2020; Shu et al., 2020; Wynants et al., 2020). As severity classifiers, the molecular signatures recorded in blood, serum, plasma, or immune cells characterize the COVID-19 pathology and host responses. Furthermore, markers of dysregulated coagulation, inflammation, and other organ dysfunction have been established as risk factors for severe illness, including low platelet count, elevated levels of D-dimer, C-reactive protein (CRP), interleukin 6 (IL-6), ferritin, troponin, and markers of kidney injury (Danwang et al., 2020; Henry et al., 2020). Proteomic investigations that characterize the comprehensive host response have revealed the activation of the complement cascade and acute phase response, both of which center around IL-6-driven pathways. In turn, these systematic studies have revealed that other common antiviral pathways, such as type I interferons (IFN), do not dominate the early response to COVID-19, probably reflecting evasion of the IFN system by SARS-CoV-2 and the subsequent activation of inflammatory cascades (Hadjadj et al., 2020; Yang et al., 2020). Furthermore, proteomic data and diagnostic parameters have pointed to underlying pathological mechanisms and possible therapeutic targets. For instance, using high-throughput proteomics, we reported a decline in plasma levels of gelsolin (GSN) in patients with severe COVID-19 in a previous study (Messner et al., 2020), and recombinant human GSN is currently undergoing clinical testing for COVID-19 pneumonia in a phase II trial (ClinicalTrials.gov identifier: NCT04358406).The severity of the disease, and the biomarker signatures that indicate severity, correlate with the outcome, but the highest diagnostic need is to stratify within therapeutically homogeneous patients. For instance, to identify those among the mildly affected individuals with the highest risk for deterioration, or among the most severely affected, those with the highest chance to respond positively to an augmentation of therapy. Predicting future trajectories on an individualized basis would also help accelerate therapeutic developments to judge the impact of the treatment on an individual disease course. To obtain a comprehensive picture of how the molecular COVID-19 phenotype develops over time, we deeply phenotyped a group of 139 COVID-19 inpatients at 687 sampling points. On the one hand, we measured a compendium of 86 clinical parameters, routine diagnostic markers, and clinically established risk scores using gold standard accredited clinical tests. On the other hand, we captured the patient’s molecular phenotype by measuring plasma proteomes in an untargeted fashion. For this, we made use of liquid chromatography coupled with tandem mass spectrometry, using a recently developed platform technology that utilizes analytical flow rate chromatography, data-independent acquisition mass spectrometry (SWATH-MS), and deep-neural network-based data processing (Demichev et al., 2020; Messner et al., 2020) (Figure S1). By combining the compendium of diagnostic parameters with the proteomes in a time- and patient-resolved fashion, we obtained a comprehensive molecular picture that captures changes in the patient’s molecular phenotype as they depend on the severity, age, and disease progression. We identify prognostic biomarkers and depict their distinct trajectories. We exemplify the power of our resource by showing that the biomarker profiles and diagnostic parameters classify treatment requirements, in particular, the need for mechanical ventilation. Furthermore, we report the future prediction of recovery time in mildly ill patients as well as the individual risk of clinical deterioration. Our study demonstrates the predictability of COVID-19 disease trajectories based on the molecular phenotype of the early disease stage.
Results
Covariation of clinical diagnostic parameters and the plasma proteome characterizes the host response to COVID-19
We longitudinally phenotyped 139 patients admitted to Charité University Hospital, Berlin, Germany, between March 01, 2020, and June 30, 2020, due to PCR-confirmed SARS-CoV-2 infection (Figure S2). The patients exhibited highly variable disease courses, graded according to the World Health Organization (WHO) ordinal scale for clinical improvement (Table S1), which reflects the treatment that the patient is receiving as a measure of disease severity. The patients included in our study range from WHO grade 3, which includes patients who require inpatient care without supplemental oxygen therapy, to WHO grade 7, which includes patients with severe COVID-19 who require invasive mechanical ventilation and additional organ support therapies such as renal replacement therapy (RRT) and extracorporeal membrane oxygenation (ECMO) (WHO, 2020). In total, 23 out of 139 (17%) patients in the WHO grade 3 category were stable, without requiring supplemental oxygen therapy and could be discharged after a median of 7 days of inpatient care (Table S2 and Figure S2); 47 (34%) patients required either low-flow or high-flow supplemental oxygen therapy; 69 (50%) patients either presented with severe COVID-19 (WHO grade 6 or 7, i.e., requiring invasive mechanical ventilation) or deteriorated and required invasive mechanical ventilation during their hospitalization; 46 patients (33%) required RRT; and 22 (16%) were treated with ECMO. A total of 20 (13%) patients died, including three patients with do not intubate/do not resuscitate (DNI/DNR) orders in place and one patient who died due to a non-COVID-19-related cause. Common risk factors for severe COVID-19 were reflected in the outcomes: patients with a severe course of disease were older than those with mild disease (49 years [IQR 35–70] for WHO grade 3 versus 62 years [IQR 53–72] for WHO grade 7, p = 0.02), and an age of 65 years or older was associated with a higher risk of death -(OR 4.1 [95% CI 1.5–11.5]). Our cohort further reflected that men and individuals with a high BMI have an increased likelihood to be hospitalized upon a COVID-19 infection; 68% of the patients were men, and the median BMI was 27.8 (IQR 24.7–31.9). However, we noted that within the group of patients hospitalized with COVID-19, sex and BMI were not further associated with disease severity or an increased risk of death. The median duration of hospitalization was 20 days (IQR 9–48) and correlated with severity (7 days for WHO grade 3 versus 46 days for WHO grade 7). The median time from admission to death despite receiving maximum treatment was 28 days (IQR 16–46).To capture the diverse disease trajectories on a molecular and biochemical level, we systematically collected 86 clinical and accredited diagnostic parameters as measured with certified tests. Moreover, we monitored the development of risk scores such as the “sequential organ failure assessment” (SOFA) score, blood gas analyses, blood cell counts, enzyme activities, and inflammation biomarkers (Table S3). To complement these parameters with an untargeted analysis, we employed a recently developed high-throughput proteomics platform (Messner et al., 2020). This platform makes use of the data-independent acquisition technique SWATH-MS (Gillet et al., 2012), a sample preparation pipeline designed to ISO13485 reporting standards, which is optimized for reducing batch effects, high-flow rate chromatography to provide highly consistent peptide separation in large sample series, and uses DIA-NN (Data-Independent Acquisition by Neural Networks) to analyze proteomics data recorded with 5-min chromatography (Messner et al., 2020; Demichev et al., 2020) (Figure S1 for a detailed overview of the proteomic workflow). In total, we measured 1,169 plasma proteome samples to determine 687 human proteomes, in which we quantified 321 plasma protein groups. Owing to the nature of the high-flow proteomics platform, data completeness was high; thus, we decided against the use of imputation strategies in the analysis of differential protein abundance. Total data completeness was 75%, with 200 proteins consistently quantified with 98% completeness, and 189 proteins with 99% completeness (Figure S1).To identify interdependencies of the diagnostic parameters that are routinely used in clinical decision making and the plasma proteomes, we characterized their covariation and present a direct correlation map (Figures 1B and S3–S4; Tables S4, S5, and S6). We report a robust positive or negative correlation of IL-6 levels and other inflammatory markers (CRP, procalcitonin) with acute phase proteins (APPs) (APOA2, APOE, CD14, CRP, GSN, ITIH3, ITIH4, LYZ, SAA1, SAA2, SERPINA1, SERPINA3, and AHSG; the protein names corresponding to the gene identifiers are provided in Table S3), coagulation factors and related proteins (FGA, FGB, FGG, F2, F12, KLKB1, PLG, and SERPINC1), and the complement system (C1R, C1S, C8A, C9, CFB, CFD, and CFHR5). Our data, therefore, link the prominent role of the IL-6 response in COVID-19 (D’Alessandro et al., 2020) to coagulation and the complement cascade. Consistently, in our data, markers of cardiac (troponin T, NT-proBNP) and renal (creatinine, urea) function, as well as anemia and dyserythropoiesis (hemoglobin, hematocrit, erythrocytes, and red blood cell distribution width) correlate with various APPs (APOA2, APOE, CD14, GSN, LYZ, SAA1, SAA2, and SERPINA3; Figure 1B and Table S4) supporting the role of inflammation in COVID-19-related organ damage and its impact on erythropoiesis.
Figure 1
Interdependence of clinical, diagnostic, physiological and proteomic parameters during the clinical progression of COVID-19
(A) Study design. Schematic of the cohort of 139 patients with PCR-confirmed SARS-CoV-2 infection treated at Charité University Hospital Berlin. Plasma proteomics and accredited diagnostic tests were applied at 687 sampling points to generate high-resolution time series data for 86 routine diagnostic parameters and 321 protein quantities (see also Figures S1 and S2).
(B) Covariation map for plasma proteins and routine diagnostic and physiological parameters. Statistically significant correlations (Spearman; p < 0.05) are colored. Dots indicate statistical significance after row-wise multiple-testing correction (false discovery rate [FDR] < 0.05), black rectangles—column-wise. The panel on the right of the heatmap provides manual functional annotation for the proteins (see also Figures S3 and S4, and Tables S4, S5, and S6).
(C) Covariation of key diagnostic parameters and plasma protein markers (log2-transformed) in COVID-19 (see also Tables S4, S5, and S6). Dots colors correspond to the WHO grade of the patient, see Figure 2B.
Interdependence of clinical, diagnostic, physiological and proteomic parameters during the clinical progression of COVID-19(A) Study design. Schematic of the cohort of 139 patients with PCR-confirmed SARS-CoV-2 infection treated at Charité University Hospital Berlin. Plasma proteomics and accredited diagnostic tests were applied at 687 sampling points to generate high-resolution time series data for 86 routine diagnostic parameters and 321 protein quantities (see also Figures S1 and S2).(B) Covariation map for plasma proteins and routine diagnostic and physiological parameters. Statistically significant correlations (Spearman; p < 0.05) are colored. Dots indicate statistical significance after row-wise multiple-testing correction (false discovery rate [FDR] < 0.05), black rectangles—column-wise. The panel on the right of the heatmap provides manual functional annotation for the proteins (see also Figures S3 and S4, and Tables S4, S5, and S6).(C) Covariation of key diagnostic parameters and plasma protein markers (log2-transformed) in COVID-19 (see also Tables S4, S5, and S6). Dots colors correspond to the WHO grade of the patient, see Figure 2B.
Figure 2
The molecular phenotype of patients with COVID-19 and its dependency on severity and age
(A) Plasma proteome and clinical diagnostic parameters in dependency of COVID-19 severity irrespective of age. The patients are grouped according to the maximum clinical treatment received (WHO ordinal scale), used as an indicator of disease severity (Table S1). 113 proteins and 55 routine diagnostic parameters vary significantly (FDR < 0.05) between patients of the different WHO groups upon accounting for age as a covariate using linear modeling (Ritchie et al., 2015). A fully annotated heatmap is provided in Figure S5 (see also Figure S6 and Table S3).
(B) Selected protein markers and routine diagnostic parameters (log2-transformed) plotted against the WHO ordinal scale.
(C) Selected proteins differentially abundant depending on age (FDR < 0.05). Left, colored: this data set (log2-transformed levels; statistical testing was performed by accounting for the WHO grade as a covariate Ritchie et al., 2015 and STAR methods; for visualization only, the data were corrected for the WHO grade); right, black: general population (log2-transformed levels; Generation Scotland cohort).
Increased levels of neutrophils and the occurrence of immature granulocyte precursors as markers of emergency myelopoiesis have been linked to severe COVID-19 (Schulte-Schrepping et al., 2020). Our data reveal covariation between neutrophil counts and the levels of two inhibitors of neutrophil serine proteases, SERPINA1 and SERPINA3 (Figure 1C). These two proteins show the highest correlation (0.72 and 0.79 Spearman R, respectively) with the neutrophil-to-lymphocyte ratio (NLR), a prognostic marker for COVID-19 (Lian et al., 2020; Liu et al., 2020a). We further report a strong correlation (Figure 1C) of alkaline phosphatase and gamma-glutamyl transferase activities, both characteristic of biliary disorders (Poynard and Imbert-Bismut, 2012), with plasma levels of the polymeric immunoglobulin receptor (PIGR). We notice that cholangiocytes (bile duct epithelium cells) express ACE-2 and can be directly infected with SARS-CoV-2 (Zhao et al., 2020), potentially leading to host viral response-induced expression of PIGR and cell destruction (Schneeman et al., 2005; Turula and Wobus, 2018).
A map of plasma proteins and diagnostic parameters that depend on age and disease severity
113 proteins and 55 accredited diagnostic parameters responded in accordance to an increase in the WHO score as a measure of progressing COVID-19 severity (Figures 2, S5, and S6; STAR methods). To the best of our knowledge, more than 30 of these proteins have not been associated with COVID-19 severity previously (Table S3). The proteins that change dependent on disease severity include mediators of inflammation and immune response (CD44, B2M, PIGR, and A2M), components of the complement cascade (CFD, and CFHRs), and apolipoproteins (APOA2, APOC3, APOD, APOE, and APOL1). Furthermore, numerous markers of organ dysfunction (cardiac: NT-proBNP, troponin T; renal: creatinine, urea; liver: aspartate aminotransferase, alanine aminotransferase, gamma-glutamyl transferase, and total bilirubin) and, inversely, markers of anemia (hemoglobin, erythrocytes, and hematocrit) were correlated with the WHO grade of the patient. In order to further dissect the proteomic signatures of the most severely ill patients requiring maximum treatment (WHO grade 7), we specifically characterized the impact of organ support treatments (RRT and ECMO) on the patients’ molecular phenotype (Figures S7 and S8). We showed, for instance, that HP and HPX are reduced in patients on RRT and ECMO as a sign of hemolysis in the extracorporeal circuit, whereas elevated SERPINC1 levels mirror substitution of antithrombin during ECMO. We discuss these findings in Note S1.The molecular phenotype of patients with COVID-19 and its dependency on severity and age(A) Plasma proteome and clinical diagnostic parameters in dependency of COVID-19 severity irrespective of age. The patients are grouped according to the maximum clinical treatment received (WHO ordinal scale), used as an indicator of disease severity (Table S1). 113 proteins and 55 routine diagnostic parameters vary significantly (FDR < 0.05) between patients of the different WHO groups upon accounting for age as a covariate using linear modeling (Ritchie et al., 2015). A fully annotated heatmap is provided in Figure S5 (see also Figure S6 and Table S3).(B) Selected protein markers and routine diagnostic parameters (log2-transformed) plotted against the WHO ordinal scale.(C) Selected proteins differentially abundant depending on age (FDR < 0.05). Left, colored: this data set (log2-transformed levels; statistical testing was performed by accounting for the WHO grade as a covariate Ritchie et al., 2015 and STAR methods; for visualization only, the data were corrected for the WHO grade); right, black: general population (log2-transformed levels; Generation Scotland cohort).A total of 61 proteins and 18 diagnostic parameters varied with patients’ age (Figure S9). Out of these, 37 proteins do not change with age in a pre-COVID-19 general population baseline (Generation Scotland cohort [Smith et al., 2006]), for which proteomes have been measured with the same proteomic technology (Messner et al., 2020) (Figure S10). We observed that a number of markers that increase with age in COVID-19 patients also correlated with a high WHO grade (Figures S6 and S9). To identify markers that are upregulated or downregulated in older patients in comparison with younger patients with a comparable therapy need, i.e. WHO grade, we tested the relationship between omics feature levels and age by accounting for WHO severity grade as a covariate using linear modeling (Ritchie et al., 2015) (STAR methods). This analysis identified 36 proteins and 12 clinical laboratory markers that are up- or are downregulated with age in COVID-19 patients within the same level of care, i.e., one WHO grade (Figures 2C and S11, summarized in Figure 5). Out of these, 20 proteins do not change with age in the pre-COVID-19 population baseline (Generation Scotland cohort proteome data, Messner et al., 2020; Figure S10), or show the opposite correlation with age in the general population (e.g., APOC2, Figure 2C). These proteins that only show an age-dependency in COVID-19 patients but not in the general population point toward age-dependent differences in host response patterns to SARS-CoV-2, and include markers involved in inflammation (SERPINA3, ITIH4, SAA1, SAA1, SAA2, ITIH3, CFB, C7, and AHSG), lipid metabolism (APOC1, APOC2, APOC3, APOB, and APOD), and coagulation (KLKB1, and FBLN1). We consider the implications of these findings in Note S2.
Figure 5
Summary: association of individual plasma proteins, routine diagnostic and physiological parameters with severity, necessary therapy, and progression of COVID-19.
For each statistical test considered (association with WHO grade, prediction of the remaining time in hospital for patients at WHO grade 3, prediction of worsening, i.e., progression to a higher WHO grade in the future, the trend during the peak period of the disease, association with RRT, association with ECMO and association with higher patient age), measurements, which show significant differences are highlighted, with the color indicating the trend, e.g., red for CST3 in the “Association with COVID-19 severity” test indicates higher levels of CST3 in severely ill patients. Proteins for which MRMAssayDB (Bhowmick et al., 2018) lists that a targeted proteomic assay has been developed are indicated with a black bar at the top. Proteins which change significantly with age in the Charité COVID-19 cohort (FDR < 0.05) but do not change significantly (p < 0.05) with age in the general population (Generation Scotland cohort), are highlighted with a white circle in the 7th row (“Association with age”). See also Figures S6–S8, S10, S11, S14, S17, and S18, and Table S3.
Time-dependent alleviation of severity indicators highlights the role of the early host response in COVID-19 progression
The time-resolved nature of our study facilitated a covariation analysis of protein levels and accredited diagnostic parameters along the patient trajectory over time (Figure S12; Table S7). Correlating the dynamics of omics features during the peak period of the disease (STAR methods), we noted covariation of inflammatory markers, APPs, fibrinogen precursor proteins, and the NLR. The correlation between APPs and the markers of cardiac and renal impairment observed across different patients at the earliest time points (Figure 1B; Table S4) was not reflected as a trend over time (Figure S13).To further dissect the dynamics of the patients’ molecular phenotype during the course of COVID-19, we determined the longitudinal trend for all protein and diagnostic parameters during the peak period of the disease (i.e., while receiving maximum treatment; STAR methods). In total, 89 proteins and 37 clinical parameters significantly changed over time (Figure 3B, trends across all time points at the maximum WHO grade are provided in Figure S14; STAR methods). In general, we found that most proteins and diagnostic parameters that correlate with disease severity return toward baseline during the peak period of the disease. Many of these were most prominently changed in the early samples (Figure S6) but alleviated with time, irrespective of the outcome (Figure S14; summarized in Figure 5). For example, components of the coagulation cascade with known acute phase activity, such as fibrinogen, and many complement factors, significantly decreased over time. Proteins indicative of inflammatory response (e.g., ORM1, SERPINA1 and SERPINA3, SAA1, SAA2 [Luo et al., 2015; Sack, 2018; Wu et al., 2015]) and markers of inflammation, such as CRP or IL-6, also declined over time. Conversely, extracellular matrix (ECM) proteins, such as ECM1, LUM, and immunoregulatory factors (e.g. AHSG, A2M Rehman et al., 2013, and HRG Wakabayashi, 2013) and proteins involved in lipid metabolism (e.g., APOC1, APOD, APOM, GPLD1, and PON1), and negative APPs (e.g., ITIH1, Figure 3C), which are downregulated in severe COVID-19 (Figure S6, summarized in Figure 5), increased over time, approaching the baseline. This general alleviation of the initial molecular phenotype of COVID-19 was consistently detected in both mildly and severely ill patients (outlier trajectories discussed in Note S3). Indeed, only 13 proteins showed differences in trend depending on the WHO score (Figure S15). We provide visualization of individual trajectories for all omics features measured between the first and the last time points sampled at the peak of the disease (Figures 3C and S16).
Figure 3
The progression of the COVID-19 molecular patient phenotype over time
(A) Schematic: each patient is followed during inpatient care by repetitive sampling, and the “trajectory” of each of the proteins and the routine diagnostic features is analyzed (points of different colors at each time point) (see also Figure S2).
(B) Protein levels and routine diagnostic parameters that change significantly (FDR < 0.05) over time during the peak of the disease, shown for individual patients stratified by their maximum treatment received (WHO grade): 89 proteins, 37 clinical diagnostic markers show time dependency during the disease course (illustrated as log2-fold changes or absolute value changes, as indicated). The panel to the right of the heatmap provides manual functional annotation for the proteins. Known associations with COVID-19 severity are indicated (blue - downregulated in severe COVID-19, and red - upregulated) (D’Alessandro et al., 2020; Laing et al., 2020; Messner et al., 2020; Shen et al., 2020). Below the heatmap, the time span between the first and the last sampling time point at the peak of the disease is indicated as well as the remaining time until the discharge (see also Figures S14 and S15, and Table S3).
(C) Trajectories (change of log2-transformed levels with time) for selected proteins. Sampling points during the peak period of the disease (STAR methods) are considered. x axis: 0 – first time point measured at the peak of the disease, 1 – last. The y axis reflects the change relative to the first valid measurement during the peak of the disease. Loess approximations are shown for patients, which did (blue), and did not (orange), require invasive mechanical ventilation. See also Figure S16.
The progression of the COVID-19 molecular patient phenotype over time(A) Schematic: each patient is followed during inpatient care by repetitive sampling, and the “trajectory” of each of the proteins and the routine diagnostic features is analyzed (points of different colors at each time point) (see also Figure S2).(B) Protein levels and routine diagnostic parameters that change significantly (FDR < 0.05) over time during the peak of the disease, shown for individual patients stratified by their maximum treatment received (WHO grade): 89 proteins, 37 clinical diagnostic markers show time dependency during the disease course (illustrated as log2-fold changes or absolute value changes, as indicated). The panel to the right of the heatmap provides manual functional annotation for the proteins. Known associations with COVID-19 severity are indicated (blue - downregulated in severe COVID-19, and red - upregulated) (D’Alessandro et al., 2020; Laing et al., 2020; Messner et al., 2020; Shen et al., 2020). Below the heatmap, the time span between the first and the last sampling time point at the peak of the disease is indicated as well as the remaining time until the discharge (see also Figures S14 and S15, and Table S3).(C) Trajectories (change of log2-transformed levels with time) for selected proteins. Sampling points during the peak period of the disease (STAR methods) are considered. x axis: 0 – first time point measured at the peak of the disease, 1 – last. The y axis reflects the change relative to the first valid measurement during the peak of the disease. Loess approximations are shown for patients, which did (blue), and did not (orange), require invasive mechanical ventilation. See also Figure S16.Overall, the molecular patient phenotype reflected an initial spike in the systemic inflammatory response, which alleviated gradually, followed by a protein signature indicative of tissue repair, metabolic reconstitution, and immunomodulation. This was observed in both mildly and severely ill patients, highlighting the early disease phase as a major molecular determinant of the COVID-19 phenotype.
Proteomes and diagnostic clinical markers allow for prediction of disease severity by machine learning
Using a machine learning algorithm based on gradient boosted trees (STAR methods), we first evaluated the extent to which diagnostic parameters and proteomes characterize treatment requirements, as reflected by the WHO grade. Both proteomes and clinical diagnostic parameters were highly discriminative of the patient receiving invasive mechanical ventilation (WHO grade 6 or 7, clinical laboratory values AUROC = 0.97, proteomic data AUROC = 0.98, combined data AUROC = 0.99; Figure 4C). The machine learning models significantly outperformed the predictive scores derived from established COVID-19 risk factors such as age, BMI, Charlson comorbidity index (CCI), or molecular predictors such as CRP or IL-6 levels (Figure 4C). In order to assess the transferability of the proteomic predictors, we tested our model in an independent cohort of 99 hospitalized patients with COVID-19 from another hospital in a different healthcare system (Innsbruck cohort, STAR methods). The proteomic model trained on the main Charité cohort demonstrated a comparably high patient stratification performance when applied to this validation cohort (Figure 4D; AUROC = 0.97). Scores reflecting the contribution of individual proteins and clinical parameters to the machine learning model are provided in Table S3. Of note, we were able to establish machine learning models that not merely classified patients based on severity but were able to predict the current WHO severity grade from the proteome, from clinical measurements, and both (Figure 4E). Again, combined proteomic and clinical laboratory data performed best.
Figure 4
Predicting COVID-19 treatment requirement and future disease progression from the early molecular phenotype by using machine learning.
(A) Selected proteins that are predictive (FDR < 0.05) of the future clinical deterioration of the disease (that is progression to a higher WHO grade in the future; STAR methods). Illustrated are the log2-transformed levels of the proteins at the first sampling point upon correction (for visualization only) for the impact of the WHO grade and age as covariates (Ritchie et al., 2015) (see also Figure S17).
(B) Selected proteins and routine diagnostic parameters predictive (FDR < 0.05) of the remaining time in hospital for patients receiving mild treatment (WHO grade 3). Statistical testing was performed by including patient’s age as a covariate (STAR methods). Illustrated are the log2-transformed levels of the proteins (upon correction for age as a covariate, for visualization only) at the first sampling point, plotted against the remaining time in hospital (days) (see also Figure S18).
(C) Left: performance of a machine learning model characterizing the need for invasive mechanical ventilation, based on either the proteomic data, accredited diagnostic parameters, or both. Right: comparison of the performance of a machine learning model characterising the need for invasive mechanical ventilation based on proteomic data to established prognostic parameters.
(D) Prediction performance, based on the proteome, visualized as boxplots. Cross-validation predictions on the Charité cohort are shown in black, predictions of a model trained on the Charité data and then applied to an independent cohort from another hospital (Innsbruck cohort) are shown in red.
(E) Prediction of the WHO grade itself using machine learning (cross-validated, first time point at the maximum treatment level for each patient is used, STAR methods), based on either the proteome, clinical diagnostic parameters, or both. The performance of the proteomic model trained on the Charité cohort and applied to the Innsbruck cohort is also shown.
(F) A machine learning model was trained to predict the level of necessary treatment (WHO grade) using the data (proteomics, clinical, or both) from the first time point measured for each patient. Derived predictions for patients who did not receive supplemental oxygen at this time point (WHO = 3) were plotted against the remaining time (days) in hospital for these patients.
Predicting COVID-19 treatment requirement and future disease progression from the early molecular phenotype by using machine learning.(A) Selected proteins that are predictive (FDR < 0.05) of the future clinical deterioration of the disease (that is progression to a higher WHO grade in the future; STAR methods). Illustrated are the log2-transformed levels of the proteins at the first sampling point upon correction (for visualization only) for the impact of the WHO grade and age as covariates (Ritchie et al., 2015) (see also Figure S17).(B) Selected proteins and routine diagnostic parameters predictive (FDR < 0.05) of the remaining time in hospital for patients receiving mild treatment (WHO grade 3). Statistical testing was performed by including patient’s age as a covariate (STAR methods). Illustrated are the log2-transformed levels of the proteins (upon correction for age as a covariate, for visualization only) at the first sampling point, plotted against the remaining time in hospital (days) (see also Figure S18).(C) Left: performance of a machine learning model characterizing the need for invasive mechanical ventilation, based on either the proteomic data, accredited diagnostic parameters, or both. Right: comparison of the performance of a machine learning model characterising the need for invasive mechanical ventilation based on proteomic data to established prognostic parameters.(D) Prediction performance, based on the proteome, visualized as boxplots. Cross-validation predictions on the Charité cohort are shown in black, predictions of a model trained on the Charité data and then applied to an independent cohort from another hospital (Innsbruck cohort) are shown in red.(E) Prediction of the WHO grade itself using machine learning (cross-validated, first time point at the maximum treatment level for each patient is used, STAR methods), based on either the proteome, clinical diagnostic parameters, or both. The performance of the proteomic model trained on the Charité cohort and applied to the Innsbruck cohort is also shown.(F) A machine learning model was trained to predict the level of necessary treatment (WHO grade) using the data (proteomics, clinical, or both) from the first time point measured for each patient. Derived predictions for patients who did not receive supplemental oxygen at this time point (WHO = 3) were plotted against the remaining time (days) in hospital for these patients.Having observed clear time trajectories for many proteins and diagnostic parameters, we hypothesized that the molecular signature of the initial host response can be exploited for the prediction of the future disease course. We started by investigating the potential of using the levels of proteins and diagnostic parameters for prediction of future clinical worsening, defined as progression to a higher severity grade on the WHO scale, i.e., a requirement for supplemental low-flow oxygen therapy, high-flow oxygen therapy, or invasive mechanical ventilation. Upon using a linear model to account for current therapy (WHO grade) and age as covariates, 11 proteins and 9 clinical laboratory markers were identified as predictors of future worsening of the clinical condition, across all treatment groups (STAR methods) (Figures 4A and S17; Box 1). Increased or decreased plasma levels of these proteins functioning in inflammation (CRP, ITIH2, SERPINA3, AHSG, and B2M), coagulation (HRG, and PLG), and complement activation (C1R, and CFD), as well as levels of AGT and CST3, were predictive of future clinical deterioration.
High levels indicative of poor prognosis
AGT: Angiotensinogen: Conversion via angiotensin-converting enzymes ACE and ACE2 produces AngI/AngII (pro-inflammatory, vasoconstrictive, pro-fibrotic) and Ang1-7/Ang1-9 (anti-inflammatory, vasodilative, anti-fibrotic), respectively (Turner, 2015; Zhang et al., 2020). Increased AGT likely reflects increased AngI/AngII due to SARS-CoV-2 mediated inactivation of ACE2 (Tay et al., 2020) and subsequently predominant conversion of AGT to AngII. AngII correlates with viral load (Liu et al., 2020c) and has tissue damaging effects (Zhang et al., 2020).B2M: Beta-2-microglobulin: Component of major histocompatibility complex (MHC I) on all nucleated cells and platelets. Released abundantly by activated platelets leading to pro-inflammatory M1-like macrophage polarization (Hilt et al., 2019). Increase of B2M has been associated with death in patients with chronic kidney disease (Makridakis et al., 2020).C1R: Complement C1r: Initiator of the classical complement pathway (Hajishengallis et al., 2017).CFD: Complement Factor D: Initiator of the alternative complement pathway by cleaving Factor B (CFB) to form the C3bBb alternative pathway convertase (Volanakis and Narayana, 1996).CRP: C-reactive protein: Acute phase protein, strongly upregulated in inflammation and infection, including COVID-19.CST3: Cystatin C: Biomarker of kidney function (Peralta et al., 2011).SERPINA3: Alpha-1-antichymotrypsin: Protease inhibitor of neutrophil cathepsin G (Benarafa, 2015). When cleaved at reactive site loop, it becomes stable to degradation and becomes a strong neutrophil chemoattractant (Banda et al., 1988; Potempa et al., 1991)
Low levels indicative of poor prognosis
AHSG: Alpha-2-HS glycoprotein (Fetuin-A): Negative acute phase protein attenuating macrophage activation and neutrophil degranulation (Ombrellino et al., 2001).HRG: Histidine-rich glycoprotein: Negative acute phase protein, regulator of inflammation and immune response, clearance of pathogens and cell detritus as well as coagulation and fibrinolysis through a variety of interactions (Poon et al., 2011; Wakabayashi, 2013).ITIH2: Inter-alpha-trypsin inhibitor heavy chain H2: Covalently linked to bikunin (AMBP), the complex binds to hyalarunan (HA) to form serum-derived hyaluronan-associated protein (SHAP) which has matrix-stabilizing and immunomodulatory effects (Fries and Blom, 2000; Zhuo et al., 2004).PLG: Plasminogen, Plasmin: Mediator of fibrinolysis (Chapin and Hajjar, 2015). More recently, immunological functions including neutrophil attenuation as well as macrophage efferocytosis and polarization from pro-inflammatory M1 to tissue-repairing M2 phenotype have been identified (Heissig et al., 2020).Next, we investigated the predictability of the remaining time needed in the hospital for mildly ill patients with maximum WHO grade 3. We identified 26 protein biomarkers and 14 routine diagnostic markers (Figures 4B and S18) that correlate with the time between the first sampling point and discharge from inpatient care. The proteomic signature associated with a longer need for inpatient treatment is characterized by proteins of the complement system (C1QA, C1QB, and C1QC) and reflects altered coagulation (KLKB1, PLG, and SERPIND1) and inflammation (CD14, B2M, SERPINA3, CRP, GPLD1, PGLYRP2, and AHSG). As most of these proteins are also predictors of the required treatment (Figure S6; Table S3), we hypothesized that the time of inpatient care for mild (WHO grade 3) cases correlates with the severity of the disease in these patients. To test this hypothesis, we generated machine learning models for WHO grade prediction, similar to those shown in Figure 4E, but trained the model only on the first time point data measured for each patient (to avoid using any future information with respect to that time point). We observed that the predictions derived from the first time point data correlated with the remaining time in the hospital (Figure 4F). We conclude that machine learning allows us to finely distinguish between more and less severe patients within a single treatment group, i.e. WHO grade.
Discussion
Upfront clinical decision making is essential for optimum treatment allocation to patients as well as for efficient resource management within the hospital. For instance, early referral to intensive care treatment units has been shown to improve prognosis and outcome for patients with severe COVID-19 (Sun et al., 2020). One of the peculiarities of COVID-19 is that the examinable clinical conditions of patients often do not reflect the true severity of the disease, e.g., with respect to respiratory insufficiency. In contrast to patients with severe bacterial pneumonia, patients with COVID-19 often clinically appear to be only slightly affected, despite being in severe respiratory failure, a phenomenon termed “happy hypoxemia” (Stawicki et al., 2020). Clinical decisions therefore need to be supported by objective, molecular diagnostics. These diagnostic analyses help further in the monitoring of therapies and clinical trials as they allow for determining the extent to which a given patient has deviated from the disease trajectory that would be achieved without therapy.Several recent investigations have identified protein biomarkers and clinical parameters that classify patients with COVID-19 according to disease severity and/or received treatment (D’Alessandro et al., 2020; Laing et al., 2020; Liu et al., 2020b; Messner et al., 2020; Overmyer et al., 2020; Schulte-Schrepping et al., 2020; Shen et al., 2020; Shu et al., 2020; Wynants et al., 2020). In other studies, the potential prognostic value of several established and newly discovered markers for predicting the future course of the disease was investigated, e.g., for IL-6, ferritin, or resistin (Grifoni et al., 2020; Meizlish et al., 2020). Yet, it remained challenging so far, to put their prognostic value in relation to patient age and current level of care, the two most important apparent characteristics for prognosis in COVID-19. For instance, a patient at WHO grade 5 who requires high-flow oxygen therapy is significantly more likely to progress to mechanical ventilation and subsequently die than an inpatient at WHO grade 3 that does not require oxygen support. Likewise, a 90-year-old patient at WHO grade 3 is significantly more likely to progress to more severe disease and to stay in the hospital for a longer period of time than a 20-year-old patient at the same WHO grade.To identify (1) which proteomic markers and laboratory parameters correlate with each other by being attributed to a common biological or physiological response, and (2) which markers reflect disease trajectories, we longitudinally phenotyped 139 individuals admitted at Charité University Hospital, Berlin, Germany, between March 01, 2020, and June 30, 2020, due to PCR-confirmed SARS-CoV-2 infection (Figure S2). We recorded a large panel of 86 parameters with accredited diagnostic tests. These tests capture the compendium of analytical parameters that are available for the clinical routine. In parallel, we record plasma proteomes using a recently developed mass spectrometry platform. This platform introduced the use of analytical (high-flow rate) chromatography to routine proteomics in order to increase throughput and measurement precision to the scale of clinical trials (Messner et al., 2020). The platform reaches a similar proteomic depth as other contemporary mass spectrometry technologies that address undepleted human plasma that is constrained by its huge dynamic range (Anderson and Anderson, 2002) (Box 2 for the resources generated).We provide deep and time-resolved resources that characterize COVID-19 at the level of plasma proteomes and established diagnostic parameters. We demonstrate the extent to which proteomes and diagnostic parameters interdepend, in initial response to the disease and in dynamics during the disease course. We show how they change with age, differ depending on the disease severity, reflect the therapy received and evolve over time. Our data have been acquired for COVID-19 patients’ samples and analyzed in the context of general population proteomics (Generation Scotland) for which samples have been measured with the same proteomic technology (Messner et al., 2020), but we also expect it to be of high value as a reference for studies of other types of viral pneumonia as well as any investigations involving both routine clinical phenotyping and plasma proteomics.Summary of the resource data generated in the study.Covariation maps. We provide a covariation map between plasma proteins measured with at least 3 peptides and clinical laboratory measurements (Figure 1B; Table S4). In addition, we provide a full covariation map between all features measured in the study (Figure S3; Table S5) as well as a COVID-19 specific protein-protein covariation map (Figure S4; Table S6). Finally, we also provide a correlation map for the changes of different omics features with time (Figure S12; Table S7).A map of plasma protein levels and clinical laboratory measurements depending on disease severity (Figures S5 and S6; Table S3).Characterization of age-dependency of plasma protein levels and clinical laboratory measurements in COVID-19, and in comparison with the general population (Figures 2C and S9–S11; Table S3).Characterization of the dynamics of plasma protein levels and clinical laboratory measurements during the course of COVID-19 (Figures 3B, S14, and S15; Table S3).Characterization of the predictive power of plasma protein levels and clinical laboratory measurements in COVID-19 (Figures 4, S17, and S18; Table S3).Proteomic and clinical signatures observed in severe COVID-19 patients undergoing RRT and ECMO (Figures S7 and S8; Table S3).The comprehensive and time-resolved molecular phenotyping of this patient cohort is complemented by a comparison with a healthy population baseline (Generation Scotland [Smith et al., 2006]) measured with the same proteomic platform (Messner et al., 2020), and the characterization of an independent cohort from an unrelated health care system (Innsbruck cohort, Austria) for validating the created predictors. The measurements were performed on samples collected during the early period of COVID-19, i.e., before immunomodulatory treatments such as dexamethasone became standard of care for severe COVID-19 (RECOVERY Collaborative Group, 2020). Our data thus reflect treatment-naive trajectories, which are of major value as baseline data for future studies.We report an initial spike in the early inflammatory host response as a determinant for the future course of the disease. As our results indicate, the patients in our cohort showed molecular marker signatures of higher basal inflammation with increasing age, which might be partially responsible for the higher risk of severe COVID-19 in older individuals. While several approaches of targeted anti-inflammatory treatment have not been successful in preventing clinical deterioration in COVID-19 so far (Stone et al., 2020), our study indicates that this special population of older patients might benefit particularly from treatments that mitigate the inflammatory host response. We report numerous interdependencies between clinical laboratory markers and alterations in proteomes, linking, for example, clinical inflammatory markers to components of the complement cascade and the coagulation system. Using machine learning, we show that both plasma proteomes and the compendium of established diagnostic parameters can be used for accurate characterization of disease severity, significantly outperforming established individual risk markers, such as CRP or IL-6 levels. Of note, the combination of proteomic features and clinical laboratory markers repeatedly showed the best performance in the machine learning models. Furthermore, the models generated could be transferred for prediction in an independent cohort from another hospital and healthcare system, highlighting the robustness of this approach and its translational potential.We observed a considerable overlap between prognostic markers and those that classify treatment according to COVID-19 severity (Figure 5). Out of 49 prognostic markers, 41 correlated with the WHO severity score. As an example, SERPINA3 (Alpha-1 antichymotrypsin) can be used for both the classification of severity and prediction of future disease course. Both SERPINA3 and SERPINA1, another classifier of severity, possess anti-inflammatory properties and are involved in the protection of tissues from neutrophil elastase- and cathepsin G-mediated tissue damage (Benarafa, 2015). Our data show a strong correlation of both serpins with levels of neutrophils and NLR in peripheral blood. SERPINA1 is mainly produced by the liver but it is also produced in epithelial cells, pulmonary alveolar cells, tissue macrophages, blood monocytes, and granulocytes. Hence, this finding presumably reflects a systemic response to the increased NLR. After binding to effector enzymes, SERPIN-proteinase complexes are normally rapidly cleared from the blood but become resistant to degradation when cleaved at the reactive site loop (Gettins and Olson, 2016). Cleaved SERPINA1 and SERPINA3 have been shown to act as strong neutrophil chemoattractants (Banda et al., 1988; Potempa et al., 1991). The observed increase in levels of SERPINA1 and SERPINA3 might therefore partly reflect the more stable, chemoattractant, pro-inflammatory cleaved forms, rather than the short-lived tissue-protective proteins in severe COVID-19. Given the prominent role of neutrophil activation in severe COVID-19 (Schulte-Schrepping et al., 2020), this finding merits further investigation.Summary: association of individual plasma proteins, routine diagnostic and physiological parameters with severity, necessary therapy, and progression of COVID-19.For each statistical test considered (association with WHO grade, prediction of the remaining time in hospital for patients at WHO grade 3, prediction of worsening, i.e., progression to a higher WHO grade in the future, the trend during the peak period of the disease, association with RRT, association with ECMO and association with higher patient age), measurements, which show significant differences are highlighted, with the color indicating the trend, e.g., red for CST3 in the “Association with COVID-19 severity” test indicates higher levels of CST3 in severely ill patients. Proteins for which MRMAssayDB (Bhowmick et al., 2018) lists that a targeted proteomic assay has been developed are indicated with a black bar at the top. Proteins which change significantly with age in the Charité COVID-19 cohort (FDR < 0.05) but do not change significantly (p < 0.05) with age in the general population (Generation Scotland cohort), are highlighted with a white circle in the 7th row (“Association with age”). See also Figures S6–S8, S10, S11, S14, S17, and S18, and Table S3.Our data also highlight angiotensinogen (AGT) as a marker for future worsening. Activation of angiotensinogen occurs via the protease renin and the endogenous angiotensin-converting enzymes ACE or ACE2. ACE converts angiotensin I (AngI) to pro-inflammatory, vasoconstrictive, and pro-fibrotic angiotensin II (AngII) (Zhang et al., 2020). ACE2, in contrast, mediates conversion of angiotensins I and II to anti-inflammatory, vasodilative, anti-fibrotic, and anti-oxidant angiotensins 1–9 (Ang1-9) and 1–7 (Ang1-7) (Turner, 2015). SARS-CoV-2 invades host cells of the lung, heart, kidneys, and other organs via ACE2, resulting in the internalization and downregulation of ACE2 (Hoffmann et al., 2020; Tay et al., 2020; Zhang et al., 2020). Subsequently, angiotensinogen is converted predominantly via ACE to AngII and is less degraded by ACE2, resulting in AngII accumulation (Batlle et al., 2012; Silhol et al., 2020). We can thus assume that the higher plasma levels of AGT gene products in severely ill patients, as measured in our study, mainly reflect the higher levels of AngII. Importantly, we observed a strong correlation of AGT with markers of acute kidney injury (AKI; creatinine, urea; Figures S3 and S13; Table S5), a frequent complication of COVID-19 and a risk factor for poor prognosis and fatal outcome (Fu et al., 2020). Aggravated by the absence of tissue-protective Ang1-7, elevated levels of AngII lead to activation of the renin-angiotensin-system (RAS) and contribute to hypoxic kidney injury (Kasal et al., 2020). Of note, apart from tissue damaging effects, AngII has been shown to linearly correlate with viral load and lung injury in SARS-CoV-2 infection (Liu et al., 2020c).Overall, many of the markers that are both classifiers and predictors of the future disease course are initiators of the inflammatory response. This group includes some of the key initiators of the complement cascade: C1QA, C1QB, C1QC, C1R, and CFD. In contrast, severity markers without prognostic value largely include downstream effectors of inflammation-associated damage, such as GSN and the circulating actins ACTBL2 and ACTB, ACTG1, and ECM1. Thus, this high-precision, high-throughput approach can help us understand mechanisms of immune-mediated organ damage on a molecular basis.Despite the high resolution and high throughput of the mass spectrometry platform deployed in our study, the direct translation of our results into clinical practice will require the development of a clinical assay according to FDA or EMA standards. We further note that the use of machine learning is currently not a certified method to inform clinical decisions. However, in addition to multiple works that have successfully used machine learning for clinical prognosis previously (see recent reviews [Kelly et al., 2019; Lee and Lee, 2020; Nagendran et al., 2020; Shah et al., 2019; Vollmer et al., 2020]), our results bear a strong implication of the future potential of machine learning for clinical applications, including personalized medicine. This calls for a worldwide effort aimed at developing procedures, which would allow reliable clinical validation of machine learning predictors, their approval, and their routine deployment in the clinic.In summary, by following a deeply phenotyped COVID-19 patient cohort over time at the level of the proteome and established diagnostic biomarkers and physiological parameters, we have created a rich data resource for understanding the extent and progression of COVID-19. We have shown that an early spike in the inflammatory response is a key determinant of COVID-19, and that future disease progression is predictable by using panels of accredited diagnostic parameters as well as proteomic measurements from early time point samples. By using machine learning, we demonstrated that the proteome allows to precisely characterize the patients’ phenotype and that the resulting machine learning models are robust and perform accurately when applied to samples from a different hospital and healthcare system. Our study provides comprehensive information about the key determinants of the varying COVID-19 trajectories as well as marker panels for early prognosis that can be exploited for clinical decision making, to devise personalized therapies, as well as for monitoring the development of much needed COVID-19 treatments.
Consortia
Malte Kleinschmidt, Katrin M. Heim, Belén Millet, Lil Meyer-Arndt, Ralf H. Hübner, Tim Andermann, Jan M. Doehn, Bastian Opitz, Birgit Sawitzki, Daniel Grund, Peter Radünzel, Mariana Schürmann, Thomas Zoller, Florian Alius, Philipp Knape, Astrid Breitbart, Yaosi Li, Felix Bremer, Panagiotis Pergantis, Dirk Schürmann, Bettina Temmesfeld-Wollbrück, Daniel Wendisch, Sophia Brumhard, Sascha S. Haenel, Claudia Conrad, Philipp Georg, Kai-Uwe Eckardt, Lukas Lehner, Jan M. Kruse, Carolin Ferse, Roland Körner, Claudia Spies, Andreas Edel, Steffen Weber-Carstens, Alexander Krannich, Saskia Zvorc, Linna Li, Uwe Behrens, Sein Schmidt, Maria Rönnefarth, Chantip Dang-Heine, Robert Röhle, Emma Lieker, Lucie Kretzler, Isabelle Wirsching, Christian Wollboldt, Yinan Wu, Georg Schwanitz, David Hillus, Stefanie Kasper, Nadine Olk, Alexandra Horn, Dana Briesemeister, Denise Treue, Michael Hummel, Victor M. Corman, Christian Drosten, and Christof von Kalle
STAR★Methods
Key resources table
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Markus Ralser (markus.ralser@charite.de).
Materials availability
This study did not generate new unique reagents.
Data and code availability
The processed proteomic and clinical source data is available in this paper’s supplemental information.The raw mass spectrometry proteomics source data for the quality control plasma and serum acquisitions has been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (Perez-Riverol et al., 2019) with the dataset identifier PXD025752.This paper does not report original code.The machine learning scripts used to generate the figures reported in this paper are available in this paper’s supplemental information.Any additional information required to reproduce this work is available from the Lead Contact.
Experimental model and subject details
Charité patient cohort and clinical data
Patients were recruited within the Pa-COVID-19 study conducted at Charité - Universitätsmedizin Berlin, a prospective observational cohort study on the pathophysiology of COVID-19. The study protocol has been described in detail before (Kurth et al., 2020). All patients with PCR-confirmed SARS-CoV-2 infection were eligible for inclusion. Refusal to provide informed consent by the patient or a legal representative and any condition prohibiting supplemental blood collection for serial biosampling were exclusion criteria. Patients were treated according to current national and international guidelines. Three patients had Do Not Intubate and Do Not Resuscitate (DNI/DNR) orders in place, declining mechanical ventilation and other organ support or cardiopulmonary resuscitation. In 4 further cases, limitation of therapy was decided at a later time point according to the patient’s presumed wish (“secondary DNR”) and predictably unfavorable outcome. All other patients received maximum intensive care treatment including organ replacement therapies at the discretion of the responsible physicians.Biosampling for proteome measurement was performed 3 times per week after inclusion. The WHO ordinal scale for clinical improvement (Table S1) was used to assess disease severity. ARDS was defined according to the Berlin ARDS criteria (ARDS Definition Task Force et al., 2012). Sepsis was defined according to sepsis-3 criteria (Singer et al., 2016). The study was approved by the ethics committee of Charité - Universitätsmedizin Berlin (EA2/066/20) and conducted in accordance with the Declaration of Helsinki and guidelines of Good Clinical Practice (ICH 1996). The study is registered in the German and the WHO international registry for clinical studies (DRKS00021688). Clinical data was captured in a purpose built electronic case report form data using the capture system SecuTrial®. All routine laboratory parameters were analyzed in accredited laboratories at Charité - Universitätsmedizin Berlin. Pseudonymized data exported from SecuTrial® were processed using JMP Pro 14 (SAS Institute Inc., Cary, NC, USA). If a laboratory value was missing for a given day, values from up to two preceding days were used for the analysis.
Innsbruck Patient cohort and clinical data
Serum samples from 99 patients admitted to the intensive care unit at the Department of Medicine, University Hospital of Innsbruck for the treatment of respiratory failure due to severe COVID-19 were collected within the first days (median 7.5, IQR 5-12) after admission. Written informed consent was either obtained before sampling or retrospectively after recovery, if patients were mechanically ventilated at the time of sampling. COVID-19 was diagnosed on the basis of a (i) positive SARS-CoV2 PCR within the last 7 days prior to study inclusion, (ii) respiratory failure defined as a partial pressure of oxygen < 60 mmHg on arterial blood gas analysis or a peripheral oxygen saturation of < 90% and (iii) typical infiltrates on computed tomography scanning of the chest. Patients were treated according to national guidelines. The study was approved by the local ethics research committee EK-Nr. 1107/2020, and EK-Nr. 1103/2020 for follow-up.
Method details
Materials
Water for chromatography (LC-MS Grade, LiChrosolv®, Merck; 115333), Acetonitrile (LC-MS Grade Optima; A955-212) and Methanol (LC-MS Grade, Optima; A456-212) were purchased from Fisher Chemicals. DL-Dithiothreitol (BioUltra, 43815), Iodoacetamide (BioUltra, I1149) and Ammonium Bicarbonate (Eluent additive for LC-MS, 40867) were purchased from Sigma Aldrich. Urea (puriss. P.a., reag. Ph. Eur., 33247H) and Formic Acid (Eluent additive for LC-MS, 85178) were purchased from Thermo Scientific. Trypsin (Sequence grade, V511X) was purchased from Promega. Control samples were prepared from Human Serum (Sigma Aldrich, S7023-50MB) and Human Plasma (EDTA, Pooled Donor, Genetex GTX73265).
Mass spectrometry
Mass spectrometry-based proteomics analysis was performed as described previously (Messner et al., 2020) with minor adjustments to the workflow (Figure S1). Semi-automated sample preparation was performed in 96-well format, using in advance prepared stock solution plates stored at -80°C. Briefly, 5μl of thawed plasma samples were transferred to the pre-made denaturation/reduction stock solution plates (55μl 8M Urea, 100mM ammonium bicarbonate (ABC), 50mM dithiothreitol) resuspended and incubated at 30°C for 60 minutes. 5μl was then transferred from the iodoacetamide stock solution plate (100mM) to the sample plate and incubated in the dark at 23°C for 30 minutes before dilution with 100mM ABC buffer (340μl). 220μl of this solution was transferred to the pre-made trypsin stock solution plate (12.5μl, 0.1μg/μl) and incubated at 37°C for 17 h (Benchmark Scientific Incu-Mixer MP4). The digestion was quenched by addition of formic acid (10% v/v, 25μl). The digestion mixture was cleaned-up using C18 96-well plates (BioPureSPE Macro 96-Well, 100mg PROTO C18, The Nest Group) and redissolved in 60μl 0.1% formic acid with shaking. Insoluble particles were removed by centrifugation and the samples transferred to a new plate.Each 96-well plate contained 8 plasma and 4 serum sample preparation controls, and the acquisition workflow included a pooled quality control sample every ~10 injections. Liquid chromatography was performed using the Agilent 1290 Infinity II system coupled to a TripleTOF 6600 mass spectrometer (SCIEX) equipped with IonDrive Turbo V Source (Sciex). A total of 5μl was injected, and the peptides were separated in reversed phase mode using a C18 ZORBAX Rapid Resolution High Definition (RRHD) column 2.1mm x 50mm, 1.8μm particles or Infinitylab Poroshell 120 EC-C18, 2.1 x 50mm 1.9 μm particles. A linear gradient was applied which ramps from 1% B to 40% B in 5 minutes (Buffer A: 0.1% FA; Buffer B: ACN/0.1% FA) with a flow rate of 800μl/min. For washing the column, the organic solvent was increased to 80% B in 0.5 minutes and was kept for 0.2 minutes at this composition before going back to 1% B in 0.1 min. The mass spectrometer was operated in the high sensitivity mode. The DIA/SWATH method consisted of an MS1 scan from m/z 100 to m/z 1500 (20 ms accumulation time) and 25 MS2 scans (25ms accumulation time) with variable precursor isolation width covering the mass range from m/z 450 to m/z 850 (Messner et al., 2020). An IonDrive Turbo V Source (Sciex) was used with ion source gas 1 (nebulizer gas), ion source gas 2 (heater gas) and curtain gas set to 50, 40 and 25, respectively. The source temperature was set to 450 and the ion spray voltage to 5500V. System suitability was evaluated using synthetic peptides (Sciex 5045759, Biognosys Ki-30002-b) and human protein extracts (Promega V6951).
Quantification and statistical analysis
Data analysis
The data were processed with DIA-NN (Demichev et al., 2020), an open-source software suite for DIA / SWATH data processing (https://github.com/vdemichev/DiaNN, commit 4498bd7) using a two-step spectral library refinement procedure as described previously (Messner et al., 2020), with filtering at precursor level q-value (1%), library q-value (0.5%) and gene group q-value (1%). Highly hydrophobic peptides (reference retention time > 110 on the iRT scale) were discarded. Batch correction was performed at the precursor level as described previously (Messner et al., 2020), using linear regression for intra-batch correction (for each MS batch) and control samples for inter-plate correction. Protein quantification was subsequently carried out using the MaxLFQ algorithm (Cox et al., 2014; Pham et al., 2020) as implemented in the DIA-NN R package (https://github.com/vdemichev/diann-rpackage). One of the 96-well plates (#12) featured technical replicates of a number of samples that were also analysed on other plates: in an extra batch correction step, the median log2-protein levels across these replicates on plate 12 were matched to the respective median log2-levels (across the same biological samples) throughout other plates, to correct protein levels on plate 12. Further batch correction was performed for Innsbruck data, to match the mean log2-transformed protein levels in the respective control samples to log2-transformed protein levels in control samples acquired for the Charité cohort. The Generation Scotland cohort proteomics raw data, which we described previously (Messner et al., 2020), have been reanalyzed using the updated software pipeline, to ensure comparability. Exclusion of precursors or proteins based on the data completeness was not performed.Statistical testing was performed in the R environment for statistical computing, version 3.6.0 (R core team, www.R-project.org). All protein and clinical laboratory measurements (except for standard and actual base excess, oxyhemoglobin and sO2) were first log2-transformed, to ensure optimal performance of linear models assuming Gaussian errors, as well as to reduce the impact of outliers. Imputation of the data was not performed, as all the statistical tests applied can accommodate missing values. Likewise, no data filtering based on missing value rates was applied. For differential abundance testing, only protein groups matched to at least three different unmodified peptide sequences were considered. Significance testing for a zero median (for analysing trajectories) or against binary variables (worsening, death) was performed using the Wilcoxon W test or Mann-Whitney U test, respectively, as implemented in the “wilcox.test” function of the “stats” R package. Testing against a continuous variable (e.g. when determining significance of pairwise correlations) was performed using the Kendall Tau test, with the slope estimated using the Theil-Sen method, as implemented in the “kendallTrendTest” function of the “EnvStats” (Millard, 2014) package. When covariates had to be taken into account, we used linear modelling with the “limma” (Ritchie et al., 2015) R package, with P-values obtained using “eBayes” (Smyth, 2004). Modelling with “limma” was likewise used to correct for these covariates for visualisation purposes. WHO grade was considered as a “factor-type” covariate (resulting in a “limma” design matrix with one-hot encoding for different WHO grades). Multiple-testing correction was performed using the Benjamini-Hochberg false discovery rate controlling procedure (Benjamini and Hochberg, 1995) as implemented in the “p.adjust” function of the “stats” R package. The adjusted p-values below 0.05 were considered significant. Multiple-testing correction for differential abundance analysis was performed separately for proteins, for which MRMAssayDB lists a targeted assay (Bhowmick et al., 2018), the rest of proteins measured, the clinical laboratory measurements and the clinical factors (age, Charlson score, BMI, Horowitz index and FiO2, SOFA score), to ensure that the false discovery rate stayed below 0.05 for each of these categories of features. Likewise, when determining the significance of correlations in correlation matrices, correction was performed for each row or each column separately, to ensure less than 5% false discoveries in each row or column, respectively. For correlation map visualisations, black points were used to indicate row-wise significant correlations, and black rectangles at the border of the respective cell - column-wise significant correlations.Quantities of gene products corresponding to open reading frames named IGxx (i.e. different types of immunoglobulin chains) were summed together to generate quantities representative of the overall levels of immunoglobulin classes (IGHVs, IGLVs, etc). This does not affect any conclusions of this work and was done purely to improve visualization and simplify the interpretation of the heatmaps and correlation maps. Full protein level tables, including levels of individual immunoglobulin gene products, are provided in supplementary materials. For visualisation, different WHO grades were color-coded throughout the manuscript (see Figure S2).
Markers of the disease severity
The first time point measured at the maximum WHO grade was chosen for each patient. For each omics feature, its values (log2-transformed when necessary, as described above) were tested for a trend depending on the WHO grade. Age was included as a covariate in the linear model as described above.
Markers varying with age in COVID-19
The first sampling time point measured was chosen per patient. For each omics feature, its values (log2-transformed when necessary, as described above) were tested for a trend depending on age. The test was performed either using the Kendall Tau test (as described above; Figures S9 and S10), or by accounting for WHO grade as a covariate in the linear model (as described above; Figures S11 and 5).
Markers of RRT and ECMO
For each omics feature, the P-value was calculated using the Mann-Whitney test, comparing between the median levels (log2-transformed when necessary, as described above) across all sampling time points at WHO grade 7 in patients who did not receive the therapy and the median levels (log2-transformed when necessary, as described above) across all sampling time points at WHO grade 7 after initiation of the respective therapy in patients who did.
Markers predictive of time in hospital
Patients, for which the first sampling time point before the outcome corresponded to the WHO = 3 severity grade (that is the patient did not require supplemental oxygen on that day), were considered. Thus, no correction for disease severity was necessary. Testing of levels (log2-transformed when necessary, as described above) of each omics feature (measured for the first sampling time point) vs the remaining time in hospital (days) was performed by including age as a covariate in the linear model as described above.
Markers predictive of disease worsening
The first sampling time point measured was chosen per patient. Future disease worsening was defined as a future increase in the WHO grade (for patients at WHO grade < 7) or death (for patients at WHO grade 7). For each omics feature, its levels (log2-transformed when necessary, as described above) were compared between patients who did not worsen and patients who did, with age and current WHO grade (as factor) included as covariates in the linear model as described above.
Peak period of the disease definition
When studying the dynamic changes in omics values during the disease course, we focused on the time points sampled when the disease was the most severe for a particular patient. This allowed us to look at molecular changes over time without the need to take into account the potential impact of changes in disease severity and the level of treatment. For each patient, we thus defined the “peak period of the disease” as the time when the patient was receiving the most intensive treatment during their stay in hospital, that is the time when the patient was at WHO grade 6 or 7, for patients who received invasive mechanical ventilation at some point, or otherwise at their maximum WHO grade (3, 4 or 5).
Markers changing during the peak of disease
Only patients with at least two days between the first and last sampling time points at the peak of the disease (as defined above) were considered. For each omics feature, a linear regression model was fitted for its levels (log2-transformed when necessary, as described above) vs the day number (with the slope estimated using the nonparametric Theil-Sen method, as implemented in the “kendallTrendTest” function of the “EnvStats” (Millard, 2014) R package), and the quantity slopeadj = (regression slope) ∗ (number of days between first and last time points) was calculated. A non-parametric approach was chosen because of its superior robustness to outliers. A Wilcoxon W test was then applied to compare the median of slopeadj to zero. The values of slopeadj for each feature are visualised in Figure S14. The non-parametric approach was chosen here due to its robustness with respect to outliers.
Correlation maps
General correlation maps were generated using the values (log2-transformed when necessary, as described above) of features at the first time point measured at the maximum WHO grade for each patient. The correlation map between feature changes during the peak of the disease (as defined above) was generated by correlating the slopeadj values (as defined above). The map of significant protein correlations not detected in the general population was generated by excluding all correlations which were either significant (P <= 0.05, without multiple-testing correction) with the same trend in the Generation Scotland cohort, or could not be calculated reliably therein (less than 20 valid points).
Prediction of current mechanical ventilation
To reflect the power of omics measurements in characterising the phenotype, a classifier was constructed to predict mechanical ventilation (WHO grade > 5) at the present time point using the proteomic and/or accredited diagnostic data. For the proteomic data only proteins characterized by at least 3 peptides were taken into account. The first time point measured at the maximum WHO grade was selected per patient. We used a gradient boosted tree algorithm implemented in the XGBoost 1.2.0 (Chen and Guestrin, 2016) under Python 3.8.1. The classifier was constructed using leave-one-out cross-validation. To circumvent overfitting a subsampling of 0.5 of the training data per boosting step and an L2 regularization term “lambda” of 20 were applied.For the assessment of classifier performance, the leave-one-out method was applied in the following way: the prediction was made for each sample separately, by excluding (withholding) this sample from the dataset, training the classifier on the remaining (independent) samples and then predicting the withheld sample using the trained model. The source code is provided in supplementary materials. For the determination of the feature importances, one classifier was trained on all data points using the same setup as described above. The feature importances were then extracted directly from the trained classifier.For the validation of the trained models, samples from an independent cohort (Innsbruck) were used. A model was trained on the data collected at the Charité using the same setup and parameters as described above and the proteins that were characterized in both cohorts. The evaluation was performed on the Innsbruck cohort that was not used for training. ROC-curves and AUC were calculated using scikit-learn 0.23.2 (Pedregosa et al., 2011). The machine learning scripts are provided in Data S1.
WHO grade prediction
For the prediction of the WHO grade an elastic net was applied as implemented in scikit-learn 0.23.2. The WHO grade was predicted for the first time point at maximum WHO grade per patient using a leave-one-out cross-validation procedure. A training/prediction based on proteomic (proteins with at least 3 peptides) and/or accredited diagnostic data from the Charité cohort was performed. Additionally, the proteomic model was validated using proteomic data set from the Innsbruck cohort that was not included in the training. Features with more than 10% missing values were removed. All data were log2-transformed when necessary (as described above), standardized and kNN-imputed (5 neighbors). The latter two steps were fitted on the training data only. For the elastic net an “l1_ratio” of 0.05 was used coupled to a 5-fold cross-validated recursive feature-elimination algorithm (“step” = 10, “min_features” = 20). Calculations of metrics were performed using scikit-learn 0.23.2 and scipy 1.5.2 (Virtanen et al., 2020). The machine learning scripts are provided in Data S1.
Prediction of the remaining time in hospital
For the prediction of the remaining time in hospital a WHO grade predictor as described above was trained on the first data points for every patient. The predicted WHO grades for every patient at WHO grade 3 who stayed in hospital for at least 1 day after sample time were correlated to the remaining time in hospital. The Spearman correlation was calculated using scipy 1.5.2. The machine learning scripts are provided in Data S1.
Supplementary Note 1. Diagnostic parameters and Proteome signatures that indicate therapeutic interventions
We investigated to what extent specific organ replacement therapies in severely ill patients, (renal replacement therapy (RRT) and extracorporeal membrane oxygenation (ECMO)) were reflected in the proteome and at the level of accredited diagnostic parameters. HP and HPX were reduced in patients on RRT and ECMO, reflecting hemolysis in the extracorporeal circuits (Figures S7 and S8). Elevated SERPINC1 (Antithrombin III) levels mirror substitution of antithrombin during ECMO. The reason for elevated levels of APOE in patients with ECMO is unclear, but is in line with reports on increased levels of APOE in pediatric patients after cardiopulmonary bypass (Aĝirbaşli et al., 2015). The proteins increased in patients receiving RRT mainly reflect impaired kidney function and have been associated with RRT before (AMBP, B2M, CST3, LYZ, RBP4, Figure S7) (Shao et al., 2015). Of note, increased levels of AMBP, B2M and LYZ have been associated with death in chronic kidney disease (Makridakis et al., 2020). Levels of CFD and APOH, both involved in the complement system, were also increased (McDonnell et al., 2020; Volanakis and Narayana, 1996). CFD is eliminated renally and accumulates in end stage renal disease, possibly leading to enhanced complement activation via the alternative pathway (Pascual et al., 1988). In contrast, levels of APOH have even been reported to be slightly lower following high-flux hemodialysis (Han et al., 2018).We note that the analysis of the effect of treatments on the proteome has two limitations. First, some of the markers identified might be prognostic for the treatment rather than reflect its effect. Age and the Charlson comorbidity index belong to this category: patients receiving ECMO were significantly younger and had a lower number of pre-existing chronic conditions than those who did not. Second, the results might be partially confounded by the time elapsed from the onset of the disease, as we have shown (Figure 3) that omics signature changes with time in COVID-19 patients while on invasive mechanical ventilation.
Supplementary Note 2. Age-specific response to COVID-19 in the context of severity markers
Older age is one of the most significant risk factors for severe disease and adverse outcome in COVID-19. Enhanced understanding of underlying mechanisms for the age-specific response to SARS-CoV-2 infection is therefore important and needed for the development of effective age-specific strategies for prevention and treatment. Furthermore, dissecting the age-specific components of the host response will improve our knowledge of the pathogenicity of similar viruses, making the world better prepared for future pandemics. Current theories characterizing the link between the higher age and risk for severe disease include immunosenescence, elevated baseline inflammation, or altered protein glycosylation landscape leading to impaired antiviral response or reduced immune tolerance (Franceschi et al., 2018; Goronzy and Weyand, 2013; Rea et al., 2018; Tay et al., 2020). However, a detailed and mechanistic understanding of the relation between COVID-19 and aging is lacking. In this work, we leverage the large size and high precision of the proteomic data acquired to map the age-related response to COVID-19, to provide a reference dataset (Figures 2C, 5, and S11) for future studies addressing this problem.We report elevation of several inflammatory and acute phase proteins such as SERPINA3, ITIH4, SAA1, and ITIH3 in older patients with COVID-19. SAA1 has been shown to induce macrophage polarization to the M2-type which promotes tissue repair but also possesses pro-fibrotic properties involved in the pathogenesis of pulmonary fibrosis (Liu et al., 2014; Page et al., 2012; Wermuth and Jimenez, 2015). Moreover, SAA1 mediates displacement of APOA1 from HDL leading to loss of the cardio- and vasoprotective properties of high density lipoprotein (HDL) (Gordon, 2014). SERPINA3, as discussed above, has an ambivalent role as a neutrophil proteinase inhibitor but also a powerful neutrophil chemoattractant. Upregulation of SERPINA3 with age in COVID-19, along with the higher neutrophil-to-lymphocyte ratio, suggests that excessive neutrophil response is one of the aggravating factors in older COVID-19 patients. Taken together, our findings point toward a disproportionately dysregulated inflammatory response to SARS-CoV-2 with age, which may be explained by an increased baseline inflammation and immunosenescence in older patients (Chung et al., 2019; Ferrucci et al., 2005; Soysal et al., 2016). Age-dependent increase of FBLN1 and decrease of KLKB1 reflect alterations in blood coagulation which may aggravate this effect by predisposing older patients to thromboembolic events, one of the key clinical characteristics of severe COVID-19.Interestingly, a number of apolipoproteins displayed a strong age-specific signature in COVID-19. For instance, APOC2, a component of chylomicrons, very low density lipoprotein (VLDL) and high density lipoprotein (HDL), and activator of lipoprotein lipase involved in triglyceride metabolism (Ramasamy, 2014), was downregulated with age in COVID-19, but upregulated with age in the general population (Harris et al., 2017; Peters et al., 2015) (Figure 2C). Dysregulation of apolipoproteins has been observed in community acquired pneumonia and associated with unfavourable outcome (Sharma et al., 2017). Remarkably, contrary to the general trend, APOD, APOC3 and APOE show opposite trends in older COVID-19 patients and in severe disease (Figure 5). APOD is expressed by many tissues, including the brain (Dassati et al., 2014). An increase in APOD has been previously observed in ischemic stroke and CNS inflammation and may reflect (subclinical) involvement of the central nervous system especially in older patients with more severe inflammation and more comorbidities (Muffat and Walker, 2010). Conversely, high levels of APOD have been shown to temper coronavirus-mediated encephalitis in mice, indicating its role as a marker of CNS damage as well as tissue protection and repair (Carmo et al., 2008). APOE, involved in inflammation, immune response and lipid metabolism, is upregulated in severe COVID-19 but downregulated with age in this cohort. APOE typically mediates anti-inflammatory effects by downregulation of NFκB and inhibition of macrophage response to IFNy and TLR3, both mediators of viral immune response. Moreover, it neutralizes bacterial LPS and enhances the adaptive immune response by facilitating antigen presentation (Figueroa et al., 2019). Downregulation with age may reflect a compromised immune response leading to over-activation of NFκB and insufficient pathogen clearance in older patients. Finally, APOE has been described to reduce proliferation of myeloid progenitor cells (Murphy et al., 2011) and to reduce myeloid derived suppressor cell (MDSC) survival in mice (Tavazoie et al., 2018). Thus, lower levels of APOE in the elderly may favor expansion of immature and dysfunctional neutrophils that have been described as a hallmark in severe COVID-19 (Schulte-Schrepping et al., 2020). This broad involvement of APOE merits further investigation in future studies.
Supplementary Note 3. Diverging trends at the proteome level during the disease peak in individual patients
Some patients (59, 90, 96, 123) who died exhibited protein concentration trajectories distinctly similar to “typical” survivors (Figure 3B). Two of them (59, 90) had a prolonged ICU stay with repeated septic episodes and finally defined limitations of therapy according to presumed patients’ wishes (“secondary DNR”). Their protein signatures probably reflect the phenomenon of immune paralysis that can follow bacterial sepsis associated with a prolonged ICU treatment (Patricio et al., 2019). One patient (96) was receiving ongoing immunosuppressive therapy for an autoimmune disorder, and a fourth patient (123) had a history of kidney transplantation, both died of septic shock. Whether the particular group of solid organ recipients shows a distinct protein signature associated with the outcome requires further investigation.We also note that some surviving patients do not show a trajectory characteristic of the typical ‘alleviation’ of the proteomic phenotype (WHO = 4: 58, 106, 153; WHO = 6 or 7: 43, 80). Specifically, the proteomic response in patients 106, 153 and 141 was indicative of overall ‘worsening’ of the proteome (Figure 3B). In contrast, patients 43 and 80 exhibited the overall ‘alleviation’ of the proteome, except for the spike in the levels of CRP and serum amyloid (Figure 3B). Shorter time spans between sampling days may explain these observations in four of these patients (43, 58, 80, 106), indicating that the host inflammatory response requires a certain time to resolve, especially in more severely ill patients, and some of the markers of systemic inflammation might linger, whereas a typical alleviation of the proteomic signature can be observed even within a few days in moderate disease courses. The unusual pattern of patient 153 was likely confounded by a skin infection that subsequently required antibiotic treatment.
REAGENT or RESOURCE
SOURCE
IDENTIFIER
Biological samples
Human Serum
Sigma-Aldrich
Cat# S7023-50MB
Human Plasma (EDTA, Pooled Donor)
Genetex
Cat# GTX73265
Chemicals, peptides, and recombinant proteins
Water for chromatography (LC-MS Grade) LiChrosolv®
Authors: Thomas Sonnweber; Piotr Tymoszuk; Sabina Sahanic; Anna Boehm; Alex Pizzini; Anna Luger; Christoph Schwabl; Manfred Nairz; Philipp Grubwieser; Katharina Kurz; Sabine Koppelstätter; Magdalena Aichner; Bernhard Puchner; Alexander Egger; Gregor Hoermann; Ewald Wöll; Günter Weiss; Gerlig Widmann; Ivan Tancevski; Judith Löffler-Ragg Journal: Elife Date: 2022-02-08 Impact factor: 8.140
Authors: Soyoung Lee; Yong Yu; Jakob Trimpert; Fahad Benthani; Mario Mairhofer; Paulina Richter-Pechanska; Emanuel Wyler; Dimitri Belenki; Sabine Kaltenbrunner; Maria Pammer; Lea Kausche; Theresa C Firsching; Kristina Dietert; Michael Schotsaert; Carles Martínez-Romero; Gagandeep Singh; Séverine Kunz; Daniela Niemeyer; Riad Ghanem; Helmut J F Salzer; Christian Paar; Michael Mülleder; Melissa Uccellini; Edward G Michaelis; Amjad Khan; Andrea Lau; Martin Schönlein; Anna Habringer; Josef Tomasits; Julia M Adler; Susanne Kimeswenger; Achim D Gruber; Wolfram Hoetzenecker; Herta Steinkellner; Bettina Purfürst; Reinhard Motz; Francesco Di Pierro; Bernd Lamprecht; Nikolaus Osterrieder; Markus Landthaler; Christian Drosten; Adolfo García-Sastre; Rupert Langer; Markus Ralser; Roland Eils; Maurice Reimann; Dorothy N Y Fan; Clemens A Schmitt Journal: Nature Date: 2021-09-13 Impact factor: 69.504
Authors: Thomas R Martin; Rachel L Zemans; Lorraine B Ware; Eric P Schmidt; David W H Riches; Lisa Bastarache; Carolyn S Calfee; Tushar J Desai; Susanne Herold; Catherine L Hough; Mark R Looney; Michael A Matthay; Nuala Meyer; Samir M Parikh; Troy Stevens; B Taylor Thompson Journal: Am J Respir Cell Mol Biol Date: 2022-09 Impact factor: 7.748
Authors: Ziyue Wang; Adam Cryar; Oliver Lemke; Pinkus Tober-Lau; Daniela Ludwig; Elisa Theresa Helbig; Stefan Hippenstiel; Leif-Erik Sander; Daniel Blake; Catherine S Lane; Rebekah L Sayers; Christoph Mueller; Johannes Zeiser; StJohn Townsend; Vadim Demichev; Michael Mülleder; Florian Kurth; Ernestas Sirka; Johannes Hartl; Markus Ralser Journal: EClinicalMedicine Date: 2022-06-09
Authors: Eric W Deutsch; Gilbert S Omenn; Zhi Sun; Michal Maes; Maria Pernemalm; Krishnan K Palaniappan; Natasha Letunica; Yves Vandenbrouck; Virginie Brun; Sheng-Ce Tao; Xiaobo Yu; Philipp E Geyer; Vera Ignjatovic; Robert L Moritz; Jochen M Schwenk Journal: J Proteome Res Date: 2021-10-21 Impact factor: 5.370
Authors: Srikanth Mairpady Shambat; Alejandro Gómez-Mejia; Tiziano A Schweizer; Markus Huemer; Chun-Chi Chang; Claudio Acevedo; Judith Bergada-Pijuan; Clément Vulin; Daniel A Hofmaenner; Thomas C Scheier; Sanne Hertegonne; Elena Parietti; Nataliya Miroshnikova; Pedro D Wendel Garcia; Matthias P Hilty; Philipp Karl Buehler; Reto A Schuepbach; Silvio D Brugger; Annelies S Zinkernagel Journal: PLoS Pathog Date: 2022-01-10 Impact factor: 6.823