Literature DB >> 35690068

Multiomic analysis reveals cell-type-specific molecular determinants of COVID-19 severity.

Sai Zhang¹, Johnathan Cooper-Knock², Annika K Weimer³, Minyi Shi³, Lina Kozhaya⁴, Derya Unutmaz⁴, Calum Harvey², Thomas H Julian², Simone Furini⁵, Elisa Frullanti⁶, Francesca Fava⁷, Alessandra Renieri⁷, Peng Gao³, Xiaotao Shen³, Ilia Sarah Timpanaro⁸, Kevin P Kenna⁸, J Kenneth Baillie⁹, Mark M Davis¹⁰, Philip S Tsao¹¹, Michael P Snyder¹².

Abstract

The determinants of severe COVID-19 in healthy adults are poorly understood, which limits the opportunity for early intervention. We present a multiomic analysis using machine learning to characterize the genomic basis of COVID-19 severity. We use single-cell multiome profiling of human lungs to link genetic signals to cell-type-specific functions. We discover >1,000 risk genes across 19 cell types, which account for 77% of the SNP-based heritability for severe disease. Genetic risk is particularly focused within natural killer (NK) cells and T cells, placing the dysfunction of these cells upstream of severe disease. Mendelian randomization and single-cell profiling of human NK cells support the role of NK cells and further localize genetic risk to CD56bright NK cells, which are key cytokine producers during the innate immune response. Rare variant analysis confirms the enrichment of severe-disease-associated genetic variation within NK-cell risk genes. Our study provides insights into the pathogenesis of severe COVID-19 with potential therapeutic targets.

Entities: Chemical

Keywords: COVID-19; GWAS; Mendelian randomization; NK cell; gene discovery; genome-wide association study; machine learning; network analysis; rare variant analysis; single-cell multiomic profiling

Mesh：

Substances：
CD56 Antigen
Cytokines

Year: 2022 PMID： 35690068 PMCID： PMC9163145 DOI： 10.1016/j.cels.2022.05.007

Source DB: PubMed Journal: Cell Syst ISSN： 2405-4712 Impact factor: 11.091

Introduction

Infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) giving rise to coronavirus disease 2019 (COVID-19) has caused a global pandemic with almost unprecedented morbidity and mortality (Dong et al., 2020). Severity of COVID-19 is markedly variable ranging from an asymptomatic infection to fatal multiorgan failure. Severity correlates with age and comorbidities (Shang et al., 2020) but not exclusively (Li et al., 2020). Host genetics is thought to be an essential determinant of severity (The COVID-19 Host Genetics Initiative, 2020), but this is poorly understood. Improved tools to identify individuals at risk of severe COVID-19 could facilitate life-saving precision medicine. Profiles of the immune response associated with severe COVID-19 have been largely observational; they have produced conflicting conclusions and struggled to assign causality. Studies have variously linked severity to CD8 T cells (Mathew et al., 2020), CD19 B cells (Sosa-Hernández et al., 2020), eosinophils (Lucas et al., 2020), and myeloid cells (Arunachalam et al., 2020), and even specific subtypes of immune cells such as adaptive natural killer (NK) cells (Maucourant et al., 2020). Single-cell omics profiling has demonstrated the differential function of various immune cell types in severe disease (Delorey et al., 2021a; Liao et al., 2020; Melms et al., 2021; Ren et al., 2021; Stephenson et al., 2021; Zhang et al., 2020a), but this work has focused on the transcriptome profiled during acute COVID-19, which is likely to be modified by the ongoing infection and is not necessarily an upstream determinant of outcome. We have focused on host genetics so as to circumvent this problem because genetic variation is largely fixed at conception. There have been several efforts to address the genetic basis of COVID-19 severity (Severe COVID-19 GWAS Group et al., 2020; Shelton et al., 2021), including large-scale genome-wide association studies (GWASs) (COVID-19 Host Genetics Initiative, 2021; Pairo-Castineira et al., 2021) and rare variant approaches (Benetti et al., 2020; Kosmicki et al., 2021; Novelli et al., 2020; Wang et al., 2020b). However, the discovery power of those studies is limited, and the biological interpretation of those identified loci has been difficult, partially because of the confounding effects of patient age and comorbidities (Huang et al., 2020). To avoid a similar problem, we have integrated genetic signals with single-cell multiome profiling of human lungs so as to map cell-type-specific functions. A primary cause of morbidity and mortality in COVID-19 is respiratory disease and specifically, a hyperinflammatory response within the lung that occurs in an age-independent manner (Brodin, 2021). This is the basis of a number of interventions based on immunosuppression (Mehta et al., 2020), which have repurposed treatments used for other diseases, particularly autoimmune diseases. Efficacy and side-effect profiles are likely to be improved by a COVID-19-specific immunomodulatory approach. To understand the genomic basis of COVID-19 severity and gain insights into its molecular mechanisms, we sought to combine the genetic variation associated with severe COVID-19 together with single-cell-resolution functional profiling of human lungs. Using RefMap, a machine learning algorithm, we recently developed for genetic and epigenetic integration (Zhang et al., 2022), we identified over 1,000 genes associated with critical illness across 19 cell types, which account for 77% of SNP-based heritability for severe COVID-19; this represents a 5-fold increase over traditional approaches (COVID-19 Host Genetics Initiative, 2021). Analysis of single-cell transcriptomic profiling of respiratory tissues revealed downregulation of our risk genes in corresponding cell types in severe COVID-19 patients. Network analysis identified multiple protein-protein interaction (PPI) modules enriched with risk genes, unveiling additional cell-type-specific mechanisms underlying severe COVID-19. Heritability analysis and Mendalian randomization (MR) confirmed an important role for NK cells, specifically CD56bright NK cells, in driving severe disease that extends previous literature (Maucourant et al., 2020) and adds causal inference. Rare variant analysis provided orthogonal evidence to further support the association of NK-cell risk genes with severe disease. Altogether, our study unravels a genomic landscape of COVID-19 severity and provides a better understanding of the disease pathogenesis, with potential for new prevention strategies and therapeutic targets.

Results

Mapping cell-type-specific risk genes for severe COVID-19

We used the RefMap machine learning model (Zhang et al., 2022; STAR Methods) to identify the genomic regions and genes associated with severe COVID-19. Briefly, RefMap is a Bayesian network that combines genetic signals (e.g., SNP Z scores) with functional genomic profiling (e.g., ATAC-seq and ChIP-seq) to map risk regions associated with complex diseases. With RefMap, we can search the genome for functional regions within which disease-associated genetic variation is significantly shifted from the null distribution. This reduces the size of the search space and increases the statistical power. Rather than testing SNPs one by one, RefMap models the genetic architecture (i.e., all of the SNPs) of diseases using a unified probabilistic model that captures more complex genetic structures and avoids a multiple testing correction that would limit the statistical power. The power of the RefMap model for gene discovery and recovery of missing heritability has been demonstrated in our recent work (Zhang et al., 2022). Here, to achieve cell-type-specific resolution within multicellular tissue, we modified RefMap and integrated single-cell multiome profiling of human lungs with COVID-19 GWAS data (Figure 1 ). In particular, we obtained summary statistics (COVID-19 Host Genetics Initiative [COVID19-hg], European [EUR], Release 5, phenotype definition A2; 5,101 cases versus 1,383,241 population controls) from the largest GWAS study of COVID-19 (COVID-19 Host Genetics Initiative, 2021), where age, sex, and 20 first principal components were included in the analysis as covariates. Severe COVID-19 was defined by the requirement for respiratory support or death attributed to COVID-19. Human lung single-cell multiome profiling, including single nucleus RNA sequencing (snRNA-seq) and single nucleus assay for transposase-accessible chromatin using sequencing (snATAC-seq), was retrieved from a recent study of healthy individuals (Wang et al., 2020a). Nineteen cell types were identified in both snATAC-seq and snRNA-seq profiles, including epithelial (alveolar type 1 [AT1], alveolar type 2 [AT2], club, ciliated, basal, and pulmonary neuroendocrine [PNEC]), mesenchymal (myofibroblast, pericyte, matrix fibroblast 1 [matrix fib. 1], and matrix fibroblast 2 [matrix fib. 2]), endothelial (arterial, lymphatic, capillary 1 [cap1], and capillary 2 [cap2]), and hematopoietic (macrophage, B cell, T cell, NK cell, and enucleated erythrocyte) cell types. We adopted these 19 cell types as the reference set within lung tissue throughout our study. Based on snATAC-seq peaks called in one or more of the 19 cell types, we used RefMap to identify disease-associated genomic regions from the COVID-19 GWAS summary data, resulting in 6,662 1-kb regions that passed the 5% significance threshold (referred to as RefMap COVID-19 regions hereafter; STAR Methods). By examining the intersection of identified regions with open chromatin in individual cell types based on corresponding snATAC-seq peaks, we derived cell-type-specific RefMap regions (mean per cell type = 1,733, standard deviation [SD] = 624; Figure 2 A).

Figure 1

Schematic of the study design

(A-H) The COVID-19 GWAS and human lung single-cell multiome (A) are integrated by the RefMap model shown in (B), where gray nodes represent observations, green nodes are local hidden variables, and pink nodes indicate global hidden variables (STAR Methods). Cell-type-specific risk genes are mapped using single-cell multiome profiling (C). Heritability analysis (D), Mendelian randomization (E), transcriptome analysis (F), and network analysis (G) together characterize the functional importance of RefMap genes, particularly for NK cells, in severe COVID-19. Rare variant analysis (H) orthogonally supports the role of NK cells in severe disease. cCRE, candidate cis-regulatory element.

RefMap COVID-19 genes highlight known disease biology

The RefMap COVID-19 gene list contains known driver genes for severe COVID-19. For instance, a recent study mapped the 3p21.31 risk locus of severe COVID-19 (Pairo-Castineira et al., 2021) to altered expression of LZTFL1 in lung epithelial cells and in ciliated epithelial cells (Downes et al., 2021). Extensive functional follow-up analysis demonstrated that the 3p21.31 risk genotype is associated with increased expression of LZTFL1 by ciliated epithelial cells and inhibition of epithelial-mesenchymal transition (EMT). EMT is a component of the innate immune response that is thought to be beneficial in the context of SARS-CoV-2 infection via repair of damaged tissues and reduced expression of ACE2 and TMPRSS2, which may limit intracellular viral uptake (Downes et al., 2021). Consistent with this prior evidence, LZTFL1 is also identified as a RefMap gene and is specifically associated with ciliated epithelial cells (Table S1). In addition, excessive activity of ADAM9, a metalloprotease, has been associated with severe COVID-19 (Carapito et al., 2021). ADAM9 promotes the conversion of monocytes and macrophages to multinucleated giant cells as part of the immune response (Chou et al., 2020). Notably, ADAM9 is identified as a RefMap gene expressed by cell types associated with the extracellular matrix (ECM), including matrix fibroblasts, myofibroblasts, and pericytes (Table S1). ADAM proteins, including ADAM9, are typically localized to the ECM (Chou et al., 2020), and hence, our finding is consistent with known biology.

Heritability enrichment within COVID-19 risk genes

Linkage disequilibrium score regression (LDSC) (Bulik-Sullivan et al., 2015) has been used to measure the total SNP-based heritability (h 2) of severe COVID-19 (COVID-19 Host Genetics Initiative, 2021). Here, we examined the partitioning of SNP-based heritability for severe COVID-19 within RefMap genes (STAR Methods). We discovered that the heritability of severe COVID-19 (COVID19-hg, EUR, Release 5, phenotype definition A2) is significantly enriched within RefMap genes (proportion of SNP-based heritability = 77%, OR = 4.6, SE = 0.78, p = 1.55e−7; Figure 3 A; Table S3). We also compared RefMap with several widely used methods for genetic association study (STAR Methods), including naive GWAS (COVID-19 Host Genetics Initiative, 2021), MAGMA (Delorey et al., 2021b), PAINTOR (Kichaev et al., 2014), and Pascal (Lamparter et al., 2016), and observed superior performance of RefMap in terms of the proportion of recovered heritability (up to 5-fold increase; Table S3). The proportion of SNP-based heritability for hospitalized COVID-19 (COVID19-hg, EUR, Release 5, phenotype definition B2) within RefMap genes is 62% and for COVID-19 independent of severity (COVID19-hg, EUR, Release 5, phenotype definition C2) it is 52% (Table S3). In both cases, the improvement in captured heritability based on RefMap compared with traditional methods is up to 3-fold. Consistent with the design of our model, the recovered heritability is highest for severe COVID-19. As further confirmation, we calculated recovered heritability based on the GenOMICC GWAS. RefMap genes are highly enriched with heritability for severe COVID-19 in this replication cohort (proportion of SNP-based heritability = 50%, OR = 3.0, SE = 0.48, p = 5.12e−6; Table S3).

Figure 3

Severe-COVID-19-associated common variants are linked to multiple cell types

(A) Heritability enrichment estimated by LDSC for different cell types. Enrichment was calculated as the proportion of total SNP-based heritability adjusted for SNP number. The Benjamini-Hochberg (BH) procedure was used to calculate FDRs throughout the study. ∗: FDR < 0.1.

(B) Heritability enrichment for different subsets of T cells and NK cells.

(C–E) Significant Mendelian randomization results for three exposures linked to severe COVID-19, including blood counts of CD335+ CD314− (C), CCR7− CD314− (D), and CD314+ (E), NK cells; points indicate effect size (β) and standard errors for each SNP-outcome relationship.

(F) Sensitivity analyses and robust tests for MR analyses (STAR Methods).

(G) Comparative gene expression analysis of RefMap NK-cell genes in NKG2D+ and NKG2D− NK cells. Fold change was calculated as the ratio of gene expression levels in NKG2D+ NK cells to NKG2D− NK cells. The transcriptome was defined by all the expressed genes (with at least one unique molecular identifier [UMI]) in NK cells. Violin plots show the distributions of fold change values within each group, and boxplots indicate the median, interquartile range (IQR), Q1 − 1.5 × IQR, and Q3 + 1.5 × IQR. Distributions were compared by one-tailed Wilcoxon rank-sum test.

See also Figures S1 and S2 and Table S3.

Severe-COVID-19-associated common variants are linked to multiple cell types (A) Heritability enrichment estimated by LDSC for different cell types. Enrichment was calculated as the proportion of total SNP-based heritability adjusted for SNP number. The Benjamini-Hochberg (BH) procedure was used to calculate FDRs throughout the study. ∗: FDR < 0.1. (B) Heritability enrichment for different subsets of T cells and NK cells. (C–E) Significant Mendelian randomization results for three exposures linked to severe COVID-19, including blood counts of CD335+ CD314− (C), CCR7− CD314− (D), and CD314+ (E), NK cells; points indicate effect size (β) and standard errors for each SNP-outcome relationship. (F) Sensitivity analyses and robust tests for MR analyses (STAR Methods). (G) Comparative gene expression analysis of RefMap NK-cell genes in NKG2D+ and NKG2D− NK cells. Fold change was calculated as the ratio of gene expression levels in NKG2D+ NK cells to NKG2D− NK cells. The transcriptome was defined by all the expressed genes (with at least one unique molecular identifier [UMI]) in NK cells. Violin plots show the distributions of fold change values within each group, and boxplots indicate the median, interquartile range (IQR), Q1 − 1.5 × IQR, and Q3 + 1.5 × IQR. Distributions were compared by one-tailed Wilcoxon rank-sum test. See also Figures S1 and S2 and Table S3.

Prioritizing cell types by heritability partitioning

Next, we used cell-type-specific RefMap COVID-19 genes to determine which cell types are relatively more important in the development of severe COVID-19. Specifically, we calculated the partitioned heritability per cell type within the severe COVID-19 GWAS (A2) and also within GWAS for hospitalized versus non-hospitalized COVID-19 (B2) and COVID-19 versus population (C2) (STAR Methods). For severe COVID-19, of all 19 cell types tested, NK-cell and T cell genes are the most enriched with SNP-based heritability (NK cells: OR = 8.87, SE = 3.68, p = 0.016; T cells: OR = 8.64, SE = 3.28, p = 0.005; Figure 3A; Table S3). The same is also true for hospitalized COVID-19 (NK cells: OR = 10.57, SE = 4.95, p = 0.039; T cells: OR = 8.67, SE = 4.13, p = 0.041), but this enrichment is not statistically significant for COVID-19 irrespective of severity (NK cells: OR = 5.74, SE = 3.09, p = 0.077; T cells: OR = 4.16, SE = 2.56, p = 0.18; Table S3). In the GenOMICC GWAS, we obtained a similar result, whereby NK cells and T cells are the most enriched with SNP-based heritability for severe COVID-19 (NK cells: OR = 5.13, SE = 1.68, p = 0.01; T cells: OR = 4.28, SE = 1.32, p = 0.01; Table S3) compared with other cell types. We extended our analysis of T cells by dividing them into CD4+ and CD8+ subtypes based on fluorescence-activated cell sorting (FACS) purified single cell RNA sequencing (scRNA-seq) profiling of the human lung (Travaglini et al., 2020). We first examined the relative expression of RefMap T cell genes within CD4+ and CD8+ T cells but discovered no significant difference between these two cell types (p = 0.34, two-tailed Wilcoxon rank-sum test). Furthermore, when we split RefMap T cell genes into CD4+ and CD8+ specific genes based on relative overexpression (fold change [FC] > 1.5 between subtypes; STAR Methods), we observed no significant difference in the heritability enrichment for severe COVID-19 between these two gene sets (CD4+: OR = 6.95, CD8+: OR = 6.23; Figure 3B; Table S3), suggesting that both subtypes may contribute to the immune response leading to severe disease. We observed similar results (CD4+: OR = 5.40, CD8+: OR = 4.66; Table S3) using the GenOMICC GWAS.

Dissecting NK-cell subtypes using MR and single-cell multiome profiling

NK cells have diverse biological functions, and therefore, we sought to understand which of these functions is a determinant of COVID-19 severity. Two-sample MR facilitates identification of a causal relationship between an exposure and an outcome (Smith, 2010). We examined whether NK-cell populations measured in the blood are causally related to severe COVID-19. In total, 26 GWAS measures of NK-cell subtypes were identified based on a previous study in which flow cytometry was used to quantify both counts of immune cell subsets and immune-cell-surface protein expression (Roederer et al., 2015; STAR Methods). After harmonizing exposure and outcome genetic instruments, we excluded tests with less than five SNPs that are likely to be underpowered (STAR Methods). With MR, three exposures were shown to be causally related to severe COVID-19 after correcting for multiple testing (p < 2e−3, multiplicative random effects [MREs] inverse-variance weighted [IVW] estimate). All three exposures relate to the proportion of NK cells expressing NKG2D/CD314 on the cell surface (Roederer et al., 2015), which is the major receptor responsible for NK-cell activation (Raulet, 2003). A higher proportion of NKG2D/CD314− cells is linked to severe COVID-19 (A2) (Figures 3C and 3D), but a higher proportion of NKG2D/CD314+ cells is protective (Figure 3E). Checks for genetic pleiotropy (MR Egger intercept not significantly different from zero, p > 0.05; Figure 3F) and instrument heterogeneity (p > 0.05, Cochran’s Q test, and I 2 GX > 0.95; Figure 3F) were satisfactory. Moreover, robust measures are significant for all three exposures (Figure 3F). Furthermore, MR revealed that the proportion of NKG2D/CD314− cells is also causally associated with hospitalized COVID-19 (B2) and with COVID-19 independent of severity (C2) (Figure S1), but in each case, the effect size is reduced compared with severe COVID-19 (A2). To validate these findings, we repeated the MR analysis of NKG2D/CD314 cell count in the GenOMICC GWAS; again, a higher proportion of NKG2D/CD314− cells is linked to severe COVID-19 (p = 0.035; Figure S2). Inspired by our MR analysis, we tested if the expression of RefMap NK-cell genes reflects a functional difference between NKG2D/CD314+ and NKG2D/CD314− cells. We examined gene expression using scRNA-seq data from healthy lungs (Travaglini et al., 2020); we discovered that RefMap NK-cell genes are expressed at a higher level within NKG2D/CD314+ cells compared with NKG2D/CD314− cells (p = 0.036, one-tailed Wilcoxon rank-sum test; Figure 3G). We conclude from these analyses that NK-cell activation via NKG2D/CD314 receptors may be protective against severe COVID-19. We note that the NKG2D/CD314 signaling pathway is not specific with respect to diverse NK-cell functions. To achieve a finer characterization of NK-cell subtypes, we isolated NK cells from an uninfected human donor and performed single-cell multiome profiling (STAR Methods). NKG2D/CD314+ NK cells and various functional subtypes including CD56bright, CD16+, and NKG2A+ NK cells were identified based on relative expression and open chromatin over the gene body and promoter (Figure 4 A; STAR Methods). As expected, most NK cells are NKG2D/CD314+ (Figure 4A), and moreover, RefMap NK-cell genes that are relatively overexpressed in NKG2D/CD314+ compared with NKG2D/CD314− cells (FC > 1.5; STAR Methods) are not significantly enriched with heritability of severe COVID-19 compared with the total set of RefMap NK-cell genes (OR = 9.2 for NKG2D/CD314+ NK-cell genes versus OR = 8.9 for all NK-cell genes; Figure 3B; Table S3). The major functional subdivision of NK cells is between CD56bright NK cells, which are responsible for cytokine production and immunomodulation, and CD56dim NK cells, which are directly cytotoxic (Michel et al., 2016). A prominent mechanism for activation of cytotoxic NK cells is via CD16 ligand crosslinking, and therefore, it is expected that CD56bright NK cells are largely non-overlapping with CD16+ NK cells (Romee et al., 2013) as we observed (Figure 4A). Interestingly, RefMap NK-cell genes are relatively overexpressed in CD56bright cells compared with CD56dim cells (FC = 1.3, p = 0.02, Student's t test). Moreover, RefMap NK-cell genes overexpressed in CD56bright NK cells (FC > 1.5) are highly enriched with heritability for severe COVID-19 compared with the total set of RefMap NK-cell genes (OR = 24.5 for CD56bright NK-cell genes versus OR = 8.9 for all NK-cell genes; Figure 3B; Table S3). This result was replicated in the GenOMICC GWAS (OR = 16.5 for CD56bright NK-cell genes versus OR = 5.1 for all NK-cell genes; Table S3). This difference in heritability enrichment is statistically significant when considering the two datasets together (FC = 2.93, p = 0.045, Student’s t test). To further develop this observation, we used MR to test whether NK-cell surface expression of CD56 (Roederer et al., 2015) is causally linked to severe COVID-19; the test was relatively underpowered at the chosen significance threshold (STAR Methods), but higher expression of CD56 was associated with lower risk of severe COVID-19 (MRE IVW p = 0.01). We conclude that severe COVID-19 may be associated with the function of CD56bright NK cells; both our MR analysis and transcriptome study suggest that reduced function of these cells may be an upstream cause of SARS-CoV-2 infection. The observed heritability enrichment of severe COVID-19 associated with CD56bright NK cells is significantly larger than that of any other profiled cell type including subtypes of T cells (Figure 3B). This has functional implications because CD56bright cells are responsible for production of IFN-γ and other immunomodulatory cytokines important for the innate immune response (He et al., 2004). Our interpretation is that deficiency of the NK-cell-derived IFN-γ defense may precipitate uncontrolled viral replication.

Figure 4

Transcriptomic signature of RefMap COVID-19 genes in different cell types

(A) UMAP of iterative latent semantic indexing (LSI) for combined gene expression and open chromatin over the gene body and promoter. Cells are colored by relative expression of NKG2D, CD56, CD16, and NKG2A, respectively. Expression is quantified as log2(normalized gene counts + 1); yellow/orange cells have relatively high expression of each marker.

(B) Enriched TF motifs of RefMap COVID-19 regions across 19 cell types. Relative enrichment per cell type is indicated by circle size and significant enrichment (HOMER, Q value < 0.1) is annotated with black circle; TF expression is indicated by color according to log(normalized gene counts + 1). Only highly expressed (expression level in the top 95% in corresponding cell types) TFs were considered.

(C) Gene expression analysis of RefMap genes across different cell types in healthy lungs. The transcriptome was defined as the total set of expressed genes for each cell type. Violin plots show the distributions of log expression levels within each group, and point plots indicate the median and IQR.

(D and E) Comparative gene expression analysis of cell-type-specific RefMap genes in severe COVID-19 patients versus moderately affected patients based on scRNA-seq datasets from Ren et al. (D) and Liao et al. (E), respectively. The score of Wilcoxon rank-sum test was used to indicate the change of gene expression between severe and moderate patient groups, where a positive value means higher gene expression in severe patients. Violin plots show the distribution of gene expression changes within each group, and boxplots indicate the median, IQR, Q1 − 1.5 × IQR, and Q3 + 1.5 × IQR. ∗: 0.01 ≤ FDR < 0.1. +: FDR < 0.01. Distributions were compared by one-tailed Wilcoxon rank-sum test.

Functional profiling of severe COVID-19 risk genes

We sought to systematically characterize the functional roles of RefMap COVID-19 regions and genes across the 19 cell types in both healthy and diseased contexts. We first performed motif enrichment analysis (Heinz et al., 2010) (STAR Methods) for RefMap regions, which identified four transcription factors (TFs), including CUX1, TCF12, ZEB1, and ZEB2, whose binding motifs are enriched in at least one of 19 cell types (HOMER, Q value < 0.1 and expression percentile >95; Figure 4B). Interestingly, although T cell and NK-cell RefMap gene lists were equally enriched with heritability for severe COVID-19 (Figure 3A), only NK-cell risk regions were enriched with a TF binding motif: ZEB2. ZEB2 is an essential driver of NK-cell maturation (van Helden et al., 2015), and immature NK cells from ZEB2-null mice perform deficient immunosurveillance in vivo even when in vitro functions are maintained (Bi and Wang, 2020). We conclude that deficient NK-cell function leading to severe COVID-19 may be a result of failed maturation. Next, we performed functional enrichment analysis for RefMap genes using Enrichr (Kuleshov et al., 2016; Tables S4 and S5). We observed that RefMap NK-cell genes are enriched with pathways and gene ontology (GO) terms related to intra- and inter-cellular signaling important for NK-cell activation, including “Phospholipase D signaling pathway” (Balboa et al., 1992), “Antigen processing and presentation,” “regulation of small-GTPase-mediated signal transduction” (GO:0051056) (Watzl and Long, 2010), and “regulation of intracellular signal transduction” (GO:1902531) (adjusted p < 0.1; Tables S4 and S5). This is consistent with the hypothesis that COVID-19 severity is determined by failed activation of NK cells. Other cell-type-specific RefMap gene lists are also enriched with relevant biological pathways. For example, AT2-cell genes are linked to pathways associated with viral infection such as “human papillomavirus infection” and “viral carcinogenesis” (adjusted p < 0.1; Table S5), which is consistent with the established role of AT2 cells as the initial site of SARS-CoV-2 entry into host cells (Hoffmann et al., 2020). T cell genes are enriched with “IL-17-signaling pathway” (adjusted p = 0.021; Table S5), which is interesting in light of previous literature highlighting the production of IL-17 by T cells from COVID-19 patients as a potential therapeutic target (De Biasi et al., 2020). We investigated the baseline expression pattern of RefMap genes in healthy lungs. In particular, we calculated mean expression levels of genes in different cell types based on lung snRNA-seq data from Wang et al., 2020a), and then compared the expression of RefMap genes with the total set of expressed genes in each cell type. Interestingly, although the gene expression level was not an input to the RefMap model, RefMap genes are expressed at a higher level compared with expressed genes in all 19 cell types, including immune and epithelial cells (false discovery rate [FDR] < 0.1, one-tailed Wilcoxon rank-sum test; Figure 4C) but with the exception of pericytes (FDR = 0.11, Z score = 1.25); notably, pericytes may be downstream in the pathogenesis of COVID-19 because they are protected by an endothelial barrier (He et al., 2020). This supports the functional significance of RefMap genes across multiple cell types in healthy human lungs. As a negative control, we performed a similar expression comparison between non-developmental genes and all expressed genes in lungs, which yielded no significant difference (Figure S3; STAR Methods). Finally, given that RefMap COVID-19 genes were identified via their associated regulatory elements (i.e., snATAC-seq peaks), we examined whether there is any expression change for RefMap genes in the context of SARS-CoV-2 infection. We obtained scRNA-seq data from the respiratory system for a large COVID-19 cohort (Ren et al., 2021), including 12 bronchoalveolar lavage fluid (BALF) samples, 22 sputum samples, and 1 sample of pleural fluid mononuclear cells (PFMCs) from 27 severely and 8 mildly affected patients. Severity was classified based on the World Health Organization (WHO) guidelines (https://www.who.int/publications/i/item/WHO-2019-nCoV-clinical-2021-1). For individual cell types, we compared the expression level of RefMap genes in severe patients versus moderately affected patients (STAR Methods). Compared with the background transcriptome, we observed that RefMap genes are expressed at a lower level in corresponding cell types from severe patients compared with moderate patients (FDR < 0.01, one-tailed Wilcoxon rank-sum test; Figure 4D), supporting the functional significance of RepMap genes in severe COVID-19. As a replication experiment, we carried out a similar analysis based on an independent COVID-19 scRNA-seq dataset (Liao et al., 2020), including 9 BALF samples from 6 severe patients and 3 moderate patients (STAR Methods). The lower expression of RefMap genes in severe patients is consistent across multiple cell types (FDR < 0.01, one-tailed Wilcoxon rank-sum test; Figure 4E). Altogether, these transcriptome-based orthogonal analyses are consistent with the hypothesis that identified cell-type-specific RefMap genes are functionally linked to COVID-19 severity.

Identification of gene modules associated with severe COVID-19

To delineate cell-type-specific molecular mechanisms underlying severe COVID-19, we next mapped RefMap COVID-19 genes to the global PPI network and then inspected functional enrichment of COVID-19-associated network modules. In particular, we extracted high-confidence (combined score >700) PPIs from STRING v11.0 (Szklarczyk et al., 2019), which include 17,161 proteins and 839,522 protein interactions. To eliminate the bias of hub genes (Krishnan et al., 2016), we performed the random walk with restart algorithm over the raw PPI network to construct a smoothed network based on edges with weights in the top 5% (STAR Methods). Next, this smoothed PPI network was decomposed into non-overlapping subnetworks using the Leiden algorithm (Traag et al., 2019). This process yielded 1,681 different modules (Table S6) in which genes within modules are densely connected but sparsely connected with genes in other modules. Six modules including M62 (n = 370; mesenchymal cells), M148 (n = 364; mesenchymal cells), M546 (n = 90; epithelial cells), M750 (n = 281; endothelial and mesenchymal cells), M1164 (n = 396; endothelial, epithelial, hematopoietic, and mesenchymal cells), and M1540 (n = 226; hematopoietic cells) are significantly enriched with at least one cell-type-specific RefMap gene list (FDR <0.1, hypergeometric test; Figures 5 A–5C; Table S6). In particular, RefMap genes specific to ciliated epithelial cells are enriched in module M546 (FDR < 0.1, hypergeometric test; Figure 5A; Table S6), which is enriched with biological functions including “hippo signaling” (GO:0035329) (adjusted p<0.1; Figure 5D). Hippo signaling is involved in the EMT response (Lei et al., 2008), and hence, this result is consistent literature linking the RefMap risk gene LZTFL1 to severe COVID-19 via increased expression within ciliated epithelial cells and reduced EMT. RefMap genes expressed by several epithelial cell types including AT1, AT2, basal, and ciliated cells are enriched within module M1164 (FDR < 0.1, hypergeometric test; Figure 5B; Table S6), which is linked to infection (e.g., “Bacterial invasion of epithelial cells”) and intracellular signaling (e.g., “regulation of small-GTPase-mediated signal transduction” (GO:0051056) and “Rho protein signal transduction” (GO:0007266)) (adjusted p < 0.1; Figure 5E). Moreover, NK-cell and T cell RefMap genes are specifically enriched within module M1540 (FDR < 0.1, hypergeometric test; Figure 5C; Table S6), which is linked to interferon signaling (e.g., “interferon-gamma-mediated signaling pathway” (GO:0060333) and “positive regulation of NK-cell cytokine production” (GO:0002729)) (adjusted p < 0.1; Figure 5F). Interestingly, M1540 is also enriched with gene expression linked to CD56+ NK cells (adjusted p < 0.1, Human Gene Atlas).

Figure 5

PPI modules enriched with COVID-19 genes and their functional characterization

(A–C) Three PPI network modules, including M546 (A), M1164 (B), and M1540 (C), are significantly enriched with ciliated-cell gene, epithelial-cell genes, and immune-cell genes, respectively. Blue nodes represent RefMap COVID-19 genes and yellow nodes indicate other genes within each module. Edge thickness is proportional to STRING confidence score (>400).

(D–F) Gene functions that are significantly enriched (Fisher’s exact test, adjusted p < 0.1) in modules M546 (D), M1164 (E), and M1540 (F). GOBP, GO biological process.

(G) Gene expression analysis of module genes in NK and T cells. The transcriptome was defined as the total set of expressed genes in NK and T cells, respectively.

(H and I) Comparative gene expression analysis of module genes in severe COVID-19 patients versus moderate patients based on scRNA-seq datasets from Ren et al. (H) and Liao et al. (I), respectively. The Z score of Wilcoxon rank-sum test was used to indicate the change of gene expression between severe and moderate patient groups. Violin plots show the distribution of gene expression changes within each group, and boxplots indicate the median, IQR, Q1 − 1.5 × IQR, and Q3 + 1.5 × IQR. Distributions were compared by one-tailed Wilcoxon rank-sum test.

Rare variant association analysis of COVID-19 risk genes

To verify the set of RefMap genes, we utilized a rare variant analysis that is orthogonal to our methods described to this point, which rely on common variant analysis. Rare variants were used to construct gene-level mutation burden in a large meta-analysis, including 4,964 severe COVID-19 patients and 570,461 population controls (STAR Methods). Rare variants were identified by minor allele frequency (MAF) < 1% (STAR Methods). Only loss-of-function (LoF) mutations were considered, including nonsense mutations, frameshift mutations, and splice-site mutations (STAR Methods). A random-effect (Dersimonian-Laird) meta-analysis was performed to combine results from multiple independent cohorts. Of NK-cell RefMap genes, 225 are present within this dataset where 18 are significantly enriched (p < 0.05, REGENIE; Mbatchou et al., 2021) with rare variants associated with severe COVID-19. This enrichment is statistically significant (FDR < 0.1, permutation test; Figures 6A and 6B; STAR Methods). These 18 genes are significantly enriched with biological functions including “negative regulation of cell adhesion” (GO:0007162) (adjusted p = 2.1e−4, OR = 82.5), which is consistent with a role in recruitment of NK cells to an area of infection. One of these 18 genes is APOBEC3G (Figure 6B), which is a cytidine deaminase implicated in the immune defense against Coronaviridae (Milewska et al., 2018; Wang and Wang, 2009) and in enhancement of NK-cell antiviral function (Norman et al., 2011). Moreover, the mean p value and median p value of NK-cell RefMap genes in the rare variant test are significantly lower than expected by chance (p = 0.02 and p = 0.03, respectively, permutation test; STAR Methods). The enrichment of rare variants associated with severe COVID-19 within NK-cell RefMap genes is greater than for RefMap genes overall (λGC = 1.09 for NK-cell genes versus λGC = 0.89 for all RefMap genes; Figure 6C; STAR Methods) or any other cell-type-specific RefMap gene list, again, consistent with a role of NK cells in severe COVID-19.

Figure 6

Rare variant analysis supports the association of NK cells with severe COVID-19

(A) Enrichment analysis of cell-type-specific RefMap COVID-19 genes based on rare variant associations. ∗: Q value < 0.1, permutation test.

(B) Overlapping genes between the RefMap NK-cell gene set and the rare-variant-associated gene set (p < 0.05, REGENIE). MAC, minor allele count.

(C) Q-Q plot of rare variant association test for RefMap NK-cell genes and all RefMap genes, including expected p values based on the null hypothesis and observed p values by REGENIE.

Discussion

The COVID-19 pandemic is a global health crisis (Dong et al., 2020). Vaccination efforts have led to early successes (Shilo et al., 2021), but the prospect of evolving variants capable of immune-escape (Darby and Hiscox, 2021) highlights the importance of efforts to better understand the COVID-19 pathogenesis and to develop effective treatments. Host genetic determinants of disease severity have been investigated (Severe Covid-19 GWAS Group et al., 2020; Benetti et al., 2020; COVID-19 Host Genetics Initiative, 2021; Kosmicki et al., 2021; Novelli et al., 2020; Pairo-Castineira et al., 2021; Shelton et al., 2021; Wang et al., 2020b), but the findings and functional interpretations so far have been limited (Huang et al., 2020). In contrast, studies of the immune response accompanying severe COVID-19 (Arunachalam et al., 2020; Lucas et al., 2020; Mathew et al., 2020; Sosa-Hernández et al., 2020) have struggled to establish causality leading to a diverse array of candidates and little consensus. Our contributions are an integrated analysis of common and rare host genetic variation causally linked to severe COVID-19, together with biological interpretations via single-cell omics profiling of lung tissue and identification of >1,000 risk genes explaining the majority of SNP-based heritability of severe COVID-19. Our RefMap analysis, which integrates GWAS summary statistics with epigenetic profiles (Zhang et al., 2022), uncovers a landscape of cellular dysfunction within lung tissue leading to severe COVID-19. Our findings are consistent with previous literature, but our focus on host genetics allows us to make conclusions about upstream causation, which has been missing from previous studies. In particular, we highlight the failure of cytokine production by CD56bright NK cells. Our MR analyses revealed that genetic predisposition to lower counts of mature NK cells is associated with increased risk of severe COVID-19 and NK-cell-specific RefMap risk genes are enriched for binding motifs for ZEB2, a TF involved in NK-cell maturation. Our rare variant analysis revealed that LoF variants, which impair NK-cell function and potentially NK-cell recruitment to a site of infection, increase risk of severe COVID-19. Interestingly, we showed that LoF variants within APOBEC3G increase the risk of severe COVID-19, which is consistent with previous literature linking this protein to the NK-mediated immune response to Coronaviridae (Milewska et al., 2018; Norman et al., 2011; Wang and Wang, 2009). Previous work has implicated both the IFN-γ response and the importance of CD56bright NK cell function in development of severe COVID-19. Our contribution is that we have arrived at this result through study of host genetics, which are fixed at conception and so necessarily upstream of SARS-CoV-2 infection. This also helps to explain an apparent contradiction between our results that predict a failure of the IFN-γ response in severe COVID-19 and studies that have positively correlated serum IFN-γ with COVID-19 mortality (Gadotti et al., 2020). In a purely observational study, it is impossible to distinguish cause and effect, and individuals with mild disease will likely have a lower viral load and therefore an attenuated IFN-γ response. The key comparison is the capacity of NK cells to produce a suitable immune response to SARS-CoV-2 at the earliest stages of infection or even before infection has actually occurred. Given that experimentally introducing viruses to healthy individuals is not feasible, we believe that our genetics-based approach is an optimal method to determine which events are upstream and truly causal in the development of severe COVID-19. Other observational studies are entirely consistent with our own observations; for example, NK-cell counts at the time of admission predict the rate of decline in viral load (Witkowski et al., 2021), which is exactly what we would predict from our MR results. The same study observed a relationship between low blood counts of CD56bright NK cells in the first week after COVID-19 symptom onset and increased likelihood of severe disease, which directly mirrors our own conclusions regarding this NK-cell subtype. Another study of immune cell profiles during active infection observed a relative contraction of NKG2D+ and CD56bright NK-cell counts in patients with poor COVID-19 outcomes (Varchetta et al., 2021). These studies are of course important, but our work now shows that these findings are dependent on host factors and not determined by viral properties. Ultimately, this could enable prediction of COVID-19 risk in uninfected individuals. In the cancer field, NK-cell stimulation has been postulated as a therapeutic strategy (Hu et al., 2019). We propose that this strategy could protect at-risk individuals in future waves of COVID-19. We highlight the importance of T cells and NK cells in particular. Other immune cell types, such as B cells, carry less genetic risk for severe COVID-19. This is consistent with previous literature demonstrating robust immune responses to infection with SARS-CoV-2, even in patients with dramatic B cell depletion due to hematological malignancy (Bange et al., 2021). Our study has several limitations. Our genetic discovery data are largely focused on European ancestry, which may limit widespread applicability. In part, this was because we did not have access to individual-level data, and therefore, we necessarily performed out-sample LD estimation, which can lead to false positive results (Benner et al., 2017). To mitigate this, both our transcriptome analyses and rare variant burden testing encompassed several population groups. We also analyzed the gene intolerance to LoF variants using probability of loss of function intolerance (pLI) and observed/expected ratio (o/e) scores (Karczewski et al., 2020; Lek et al., 2016), which measure the excess of LoF variants for individual genes compared with the genome-wide expected baseline. The pLI score and o/e scores are calculated based on ExAC and gnomAD exome cohorts, respectively, which involve multiple populations. Interestingly, we observed that our RefMap NK-cell genes have significantly larger pLI scores and smaller o/e scores compared with the total set of protein-coding genes (pLI: p = 3.16e−16, Wilcoxon rank-sum test; o/e: p = 1.33e−16, Wilcoxon rank-sum test; Figure S5), suggesting that the RefMap NK-cell genes are intolerant to LoF mutations, regardless of populations and demonstrating their functional significance. The validation of our genes in orthogonal, and independent data from diverse ancestries indicate the population transposability of our findings. In addition, in the absence of tissue-specific QTL data, we mapped risk genes from regulatory regions using the “closest-gene” method, which has been widely used in the absence of 3D genome profiling (e.g., Hi-C) (Buniello et al., 2019; Mountjoy et al., 2020). Indeed, in the context of mapping disease risk loci, assigning the cis-regulatory elements (CREs) to their closest neighbors yields comparable results compared with more advanced methods (Nasser et al., 2021). Using decaying weights (e.g., exponentially) based on the distance from CREs to the transcription start sites (TSSs) in predicting gene expression also gives competitive results (Zhou et al., 2018). These previous studies indicate that the closest-gene method is a reasonable method when lacking other orthogonal molecular profiling. Next, our analysis did not have sufficient resolution to detect all potentially relevant cellular subtypes, and some rare cell types, such as ionocytes (Montoro et al., 2018), went completely undetected in the profiling data we used. Finally, we have not performed any in vitro experimental follow-up studies; we expect that cultured NK cells derived from individuals carrying an at-risk genotype would demonstrate reduced expression of RefMap COVID-19 genes and reduced ability to control viral replication in co-culture with an appropriate cell type, such as Calu3 cells. We note that use of this system by others confirmed our prediction that NKG2D signaling is necessary for NK cell control of viral replication (Witkowski et al., 2021). However, there are limitations of such in vitro systems that fail to capture all the interactions between the numerous cell types present in vivo. A recent study revealed that manipulating the activation of NK cells in a mouse model resulted in a significantly higher viral burden (Wang et al., 2021), but there are limitations to animal models that do not necessarily translate to human disease. We suggest that our genetic profile could be used to guide a prospective study and even a protective intervention trial in human patients. In conclusion, we have uncovered a genetic architecture of severe COVID-19 integrated with single cell-resolution biological functions. Both common and rare variant analyses have highlighted NK-cell activation as a key determinant of disease severity. Our framework can be applied to decipher the genetic and biological basis of other complex diseases.

STAR★Methods

Key resources table

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Micheal P. Snyder (mpsnyder@stanford.edu).

Materials availability

All unique/stable reagents generated in this study are available from the lead contact without restriction.

Method details

The RefMap model

The general idea of RefMap is to model the genetic associations within individual genomic regions whose function is informed by epigenetic features. In particular, in RefMap allele Z-scores were calculated as Z=b/se, where b and se are effect size and standard error, respectively, as reported by the COVID-19 GWAS (COVID-19 Host Genetics Initiative, 2021) (COVID-19 Host Genetics Initiative, Release 5, phenotype definition A2, EUR only) where the sample age, sex, and ancestry information were included as covariates. Given Z-scores and lung snATAC-seq peaks, we aim to identify functional genomic regions in which the Z-score distribution is significantly shifted from the null distribution, informing the disease association. Suppose we have K 1Mb linkage disequilibrium (LD) blocks, where each LD block contains J (k=1,..., K) 1kb regions and each region harbors I (j=1,..., J , I >0) SNPs, the Z-scores follow a multivariate normal distribution, i.e., where the Z-score of the i-th SNP in the j-th region of the k-th block is denoted as z (i=1,..., I ) and are the effect sizes that can be expressed as in Equation 1 represents the in-sample LD matrix comprising of the pairwise Pearson correlation coefficients between pairs of SNPs within the k-th block, and I is the total number of SNPs given by . Here, since we have no access to the individual-level data, we used EUR samples from the 1000 Genomes Project (Phase 3) to estimate , which yields the out-sample LD matrix. A modified Cholesky algorithm (Cheng and Higham, 1998) was used to get a symmetric positive definite (SPD) approximation of the LD matrix. Further, we assume u (i=1,..., I ) are independent and identically distributed (i.i.d.), following a normal distribution given by in which the precision follows a Gamma distribution, i.e., Furthermore, to characterize the shift of the expectation in Equation 3 from the null distribution, we model m by a three-component Gaussian mixture model (GMM) given by in which the precisions follow and v -1 and v +1 are non-negative variables measuring the absolute values of effect size shifts for the negative and positive components, respectively. To impose non-negativity over v -1 and v +1, we adopt the rectification nonlinearity technique proposed previously (Harva and Kabán, 2007). In detail, we assume v -1 and v +1 follow in which the rectified Gaussian distribution is defined via a dumb variable. We then define v -1 and v +1 by which guarantees that v -1 and v +1 are non-negative. The dump variable r -1 and r +1 follow the Gaussian distributions given by where and follow the Gaussian-Gamma distributions, i.e., Note that an advantage of the rectification nonlinearity is it is tractable in the variational inference framework (Harva and Kabán, 2007). The indicator variables in Equation 5 denote whether that region is disease-associated or not. Indeed, we define the region to be disease-associated if or , and to be non-associated otherwise. To simplify the analysis, we put a symmetry over and , and define the distribution by Furthermore, to incorporate the functional information into the modeling, we define the probability parameter in Equation 15 as where is the sigmoid function, is the vector of epigenetic features for the j-th region in the k-th LD block, and the weight vector follows a multivariate normal distribution, i.e., and follows In this study, the epigenetic feature was calculated as the overlapping ratios of that region with the snATAC-seq peaks detected in any of the cell types in healthy human lungs. All priors in RefMap are defined based on the conjugacy rule to make the variational inference tractable. Based on the model defined in (Equation 1), (Equation 2), (Equation 3), (Equation 4), (Equation 5), (Equation 6), (Equation 7), (Equation 8), (Equation 9), (Equation 10), (Equation 11), (Equation 12), (Equation 13), (Equation 14), (Equation 15), (Equation 16), (Equation 17), (Equation 18), we are interested in calculating the posterior probability p( | , ), where the mean-field variational inference (MFVI) and local variational method (Blei et al., 2017) were adopted to solve the intractability. Here, we used variational inference, an approximate inference framework, because of its superiority in convergence rate compared to sampling methods. More technical details, including a coordinate ascent-based inference algorithm, can be found in Zhang et al. (2022). In this study, we ran the MFVI algorithm per chromosome to further accelerate the computation. The Q +- and Q --scores were defined as and , respectively, and we also defined the Q-score as Q=Q ++Q -. RefMap regions were identified by Q +- or Q --score >0.95.

Mapping cell-type-specific genes from RefMap regions

For each cell type within lung tissue, we defined cell-type-specific RefMap regions as the overlap between RefMap regions and the total set of snATAC-seq peaks detected in that cell type. Cell-type-specific RefMap genes were then identified if the extended gene body (i.e., the region up to 10kb either side of the annotated gene body) overlapped with any of the cell-type-specific regions (Turro et al., 2021). To get the final gene lists, non-expressed genes in corresponding cell types were excluded from RefMap genes in the downstream analysis. In addition, we note that there are non-adult samples (∼30 weeks gestation and ∼3 years) sequenced in the single cell profiling data (Wang et al., 2020a). To remove the bias towards lung development, we first calculated the fold change of gene expression levels between the adult sample (∼30 years) and non-adult ones, and defined non-developmental genes (nDG) as those with FC>1.5. Only RefMap genes that were identified as expressed and non-developmental in each cell type were kept for downstream analysis.

Validation of RefMap results in 23andMe and GenOMICC datasets

We calculated the overlap of total RefMap regions and of cell-type-specific RefMap regions with genomic regions shown to contain COVID-19-associated SNPs (p < 1e-04) based on the GWAS of an independent cohort recruited by 23andMe (Shelton et al., 2021). To determine whether the observed overlap is statistically significant, we examined the average overlap with ten sets of control regions of equivalent length to RefMap regions. Control regions were +/-1Mb-5Mb distant from the RefMap regions (Quinlan and Hall, 2010). The same procedure was performed for the GenOMICC dataset (Kousathanas et al., 2021). SNP association with COVID-19 in the 23andMe and GenOMICC datasets was calculated as previously described (Kousathanas et al., 2021; Pairo-Castineira et al., 2021).

Heritability analysis

We used LD score regression (LDSC) (Bulik-Sullivan et al., 2015) to calculate the SNP-based heritability based on different GWAS and different gene sets in this study. Partitioned heritability was calculated as previously described (Finucane et al., 2015). Briefly, for all gene lists, we examined the proportion of total SNP-based heritability carried by SNPs +/-100kb from the transcription start site (TSS) of each gene in the list. Enrichment was calculated by comparing the ratio of partitioned heritability to the quantity of genetic material.

Implementation details of MAGMA, PAINTOR and Pascal

MAGMA (v1.08) (de Leeuw et al., 2015) and Pascal (Lamparter et al., 2016) were applied using default settings. Input consisted of summary statistics for all SNPs genome-wide as measured in the COVID-19 GWAS (COVID-19 Host Genetics Initiative, 2021). We employed PAINTOR (v3.0) following the guidance provided in Kichaev et al. (2014) and https://github.com/gkichaev/PAINTOR_V3.0/. The genome was annotated based on the snATAC-seq peaks detected in human lungs (Wang et al., 2020a). We ran the algorithm in the MCMC mode. All other parameters in PAINTOR were left to be default. In all cases, we estimated the LD structure using EUR samples from the 1000 Genomes Project phase 3.

Mendelian randomization

In total, 46 GWAS measures of NK cell subtypes were identified from the IEU Open GWAS Project, including "met-b-124", "met-b-245", "met-b-242", "met-b-237", "met-b-258", "met-b-246", "met-b-249", "met-b-140", "met-b-240", "met-b-123", "met-b-250", "met-b-239", "met-b-120", "met-b-154", "met-b-247", "met-b-251", "met-b-238", "met-b-243", "met-b-244", "met-b-153", "met-b-248", "met-b-152", "met-b-122", "met-b-121", "met-b-252", and "met-b-241" (Roederer et al., 2015). Exposure SNPs or instrumental variables (IVs) are chosen based on an arbitrary P-value cutoff (Choi et al., 2019; Wootton et al., 2018). A cutoff that is too low will lose informative instruments, but a cutoff that is too high could introduce non-informative instruments. We chose to set the cutoff at 5e-06 in line with our previous work (Julian et al., 2021). We employed a series of sensitivity analyses to ensure that our analysis was not confounded by invalid IVs. Identified SNPs were clumped for independence using PLINK clumping in the TwoSampleMR tool (Purcell et al., 2007). A stringent cutoff of R 2≤0.001 and a window of 10,000kb were used for clumping within a European reference panel. Where SNPs were in LD, those with the lowest P-value were retained. SNPs that were not present in the reference panel were excluded. Where an exposure SNP was unavailable in the outcome dataset, a proxy with a high degree of LD (R 2≥0.9) was identified in LDlink within a European reference population (Machiela and Chanock, 2015). Where a proxy was identified to be present in both datasets, the target SNP was replaced with the proxy in both exposure and outcome datasets in order to avoid phasing issues (Hartwig et al., 2016). Where a SNP was not present in both datasets and no SNP was available in sufficient LD, the SNP was excluded from the analysis. The effects of SNPs on outcomes and exposures were harmonized in order to ensure that the beta values were signed with respect to the same alleles. For palindromic alleles, those with minor allele frequency (MAF) > 0.42 were omitted from the analysis in order to reduce the risk of errors due to strand issues (Hartwig et al., 2016). The MR measure with the greatest power is the inverse-variance weighted (IVW) method, but this is contingent upon the exposure IV assumptions being satisfied (Burgess and Thompson, 2017). With the inclusion of a large number of SNPs within the exposure IV, it is possible that not all variants included are valid instruments and therefore, in the event of a significant result, it is necessary to include a range of robust methods which provide valid results under various violations of MR principles at the expense of power (Burgess et al., 2019). Robust methods applied in this study include MR-Egger, MR-PRESSO, weighted median, weighted mode, and MR-Lasso. With respect to the IVW analysis, a fixed-effects (FE) model is indicated in the case of homogeneous data, whilst a multiplicative random effects (MRE) model is more suitable for heterogeneous data. Burgess et al. recommended that an MRE model be implemented when using GWAS summary data to account for heterogeneity in variant-specific causal estimates (Burgess et al., 2019). In the interest of transparency, we calculated both results but present the MRE in the text. MR analyses should include evaluation of exposure IV strength. In order to achieve this, we provided the F-statistic, MR-Egger intercept, MR-PRESSO global test, Cochran's Q test, and I 2 for our data. The F-statistic is a measure of instrument strength with >10 indicating a sufficiently strong instrument (Burgess et al., 2011). We provided F-statistics for individual exposure SNPs and the instrument as a whole. Cochran's Q test is an indicator of heterogeneity in the exposure dataset and serves as a useful indicator that horizontal pleiotropy is present as well as directing decisions to implement FE or MRE IVW approaches (Bowden et al., 2018). The MR-Egger intercept test determines whether there is directional horizontal pleiotropy. The MR-PRESSO global test determines if there are statistically significant outliers within the exposure-outcome analysis (Verbanck et al., 2018). I 2 was calculated as a measure of heterogeneity between variant specific causal estimates, with a low I 2 indicating that Egger is more likely to be biased towards the null (Bowden et al., 2016). Finally, we performed a leave-one-out analysis using the method of best fit for each exposure SNP within the IV in order to determine if any single variants were exerting a disproportionate effect upon the results of our analysis (Burgess et al., 2019). The same MR analysis was conducted for COVID19-hg and GenOMICC GWAS independently.

Single-cell multiome profiling of NK cells

To obtain the NK single-cell multiome data, human PBMCs were isolated from a 33 year old male and sorted including CD14 exclusion (Renoux et al., 2015) to isolate NK cells (as utilized in Gauthier et al. (2019). NK cells were isolated by a series of monoclonal antibodies conjugated to magnetic beads (Miltenyi Biotech). The final step was sorting (AutoMACS pro separator, Miltenyi Biotech) to achieve CD14 exclusion (monocytes). A detailed nuclei isolation protocol can be found at https://www.encodeproject.org/experiments/ENCSR710NDM/. After nuclei isolation the 10x multiome protocol (Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Reagent Bundle, PN-1000283) was followed according to manufacturer's instructions (10X Genomics, USA) and can be accessed at https://assets.ctfassets.net/an68im79xiti/1MrGvRY2vJF0Fc1lcn3BPW/d24d1099ee82656f5c230cd08bdaec8b/CG000338_ChromiumNextGEM_Multiome_ATAC_GEX_User_Guide_RevD.pdf or https://www.encodeproject.org/experiments/ENCSR710NDM/. Libraries were sequenced on a NovaSeq6000 instrument. The scATAC-seq libraries were sequences as follows: Read 1N, 50 cycles; i7 Index, 8 cycles; i5 Index, 24 Cycles; Read 2N, 49 cycles. The snRNA-seq libraries were sequenced as follows: Read 1, 28 cycles; i7 Index, 10 cycles; i5 Index, 10 Cycles; Read 2, 90 cycles. 12,834 cells were profiled; 5,046 ATAC fragments and 858 expressed genes were identified per cell. TSS enrichment score for the scATAC-seq data was 11. Raw data was analyzed using CellRanger ARC (10X Genomics).

Deriving cell-type-specific genes for T-cell and NK-cell subtypes

We derived CD4+ and CD8+ T-cell genes from RefMap T-cell genes based on their relative expression (Travaglini et al., 2020). In particular, we defined CD4+ T-cell genes as those genes higher expressed in CD4+ T cells compared to CD8+ T cells (FC>1.5). The CD8+ T-cells genes were defined similarly. Identified gene sets were then analyzed in downstream analysis such as heritability partitioning as described above (see section “heritability analysis”). NK-cell multiome data was analyzed using ArchR (Granja et al., 2021). NK-cell subtypes were identified based on relative expression and open chromatin over the gene body and promoter, respectively. In downstream analysis doublets (a single droplet containing multiple cells) were excluded. Missing data was imputed using MAGIC (Markov affinity-based graph imputation of cells) (van Dijk et al., 2018). Dimensionality reduction was performed by iterative latent semantic indexing (LSI) and data was visualized using uniform manifold approximation and projection (UMAP). RefMap NK-cell genes were assigned to NK-cell subtypes based on relative expression within NK multiome data. For each subtype, a suitable control group provided a comparison to identify genes overexpressed (FC>1.5) in the subgroup of interest: CD56bright NK cells were compared to CD56null NK cells; NKG2D+, CD16+ and NKG2A+ NK cells were compared to NKG2D-, CD16- and NKG2A- NK cells respectively. Whereas the latter three groups were identified based on presence/absence of expression, CD56bright/dim NK cells were identified by first excluding cells with no CD56 expression, and then dividing the remaining cells based on the overall median expression of CD56. The method applied here is analogous to that used to assign RefMap T-cell genes to CD4+/CD8+ T-cells.

Motif enrichment analysis

Motif enrichment analysis was performed using HOMER v4.11 (Heinz et al., 2010). Specifically, to focus on RefMap regions and remove enrichment bias from the general chromatin accessibility, we used the original snATAC-seq peaks as the background in the analysis. Other parameters were left to be default.

Transcriptome analysis

Four single-cell RNA-seq datasets were used in the transcriptome analyses, including human healthy lungs (Travaglini et al., 2020; Wang et al., 2020a) and COVID-19 patients (Liao et al., 2020; Ren et al., 2021). Data after quality control (QC) was acquired for each study. Only samples from the respiratory system were considered in the analyses. For the healthy lung data, only expressed genes were considered in the analysis. For the disease samples, we removed the overlap of severe patients between the two cohorts (Liao et al., 2020; Ren et al., 2021). In the comparative expression analysis of severe versus moderate patients, to stabilize the analysis we estimated the change of gene expression levels using the Z-score estimated from Wilcoxon rank-sum test, wherein a positive Z-score indicates a higher expression level in severe patients and a negative value suggests the lower expression. The Benjamini-Hochberg (BH) procedure was used for multiple testing correction throughout the study.

Network analysis

We first downloaded the human PPIs from STRING v11, including 19,567 proteins and 11,759,455 protein interactions. To eliminate the bias caused by hub proteins, we first carried out the random walk with restart algorithm (Wang et al., 2015) over the PPI network, wherein the restart probability was set to 0.5, resulting in a smoothed network after retaining the top 5% predicted edges. To decompose the network into different subnetworks/modules, we performed the Leiden algorithm (Traag et al., 2019), a community detection algorithm that searches for densely connected modules by optimizing the modularity. After the algorithm converged, we obtained 1,681 modules with an average size of 9.98 nodes (SD=53.35; Table S6).

Rare-variant burden testing

Rare-variant burden testing was performed to determine whether any genes were differentially enriched with rare variants between severe COVID-19 patients and population controls. MAF were checked against gnomAD/ESP, as well as a pooled common variant list obtained from 19 of the 21 participating cohorts; variants with MAF >1% were removed. We utilized an additive model in which a score of 0 was assigned if there were no deleterious variants identified in a particular gene; 1 if at least one deleterious variant, but all heterozygous; and 2 if at least one homozygous deleterious variant. All burden tests were performed using REGENIE v2 (Mbatchou et al., 2021) with firth correction. LoF variants were identified by HIGH impact in the Ensembl annotations from VEP (McLaren et al., 2016). Meta-analysis was performed in two steps: first an inverse-variance weighted fixed-effect meta-analysis was performed to obtain summary statistics of the same ancestry. Then a random-effect (Dersimonian-Laird) meta-analysis was applied across the resulting mono-ancestry summary statistics. To avoid inflation of test statistics we removed all genes for which the number of controls was <5,000 or the MAC<10. A QQ-plot confirmed that there was no significant genomic inflation (λGC=0.825; Figure S4). Significant enrichment of rare LoF variants associated with severe COVID-19 for each cell-type-specific RefMap gene set was determined by comparing with 10,000 gene sets of the same size randomly selected from genes which passed all stages of QC.

Reagent or Resource	Source	Identifier
Critical Commercial Assays

NK Cell isolation kit	Miltenyi Biotec	#130-092-657
Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Reagent Bundle	10X Genomics	PN-1000283

Experimental Models: Cell Lines

Homo sapiens male adult (33 years) CD14-negative NK primary cell nuclear fraction	The Jackson Laboratory	N/A

Software and Algorithms

RefMap	https://github.com/szhang1112/refmap	DOI: 10.5281/zenodo.5774249
ArchR	https://github.com/GreenleafLab/ArchR	N/A
CellRanger ARC	10X Genomics	N/A
SKAT-O	https://cran.r-project.org/web/packages/SKAT/index.html	N/A
R v4.0.1	https://cran.r-project.org/mirrors.html	N/A
snpStats	https://www.bioconductor.org/packages/release/bioc/html/snpStats.html	N/A
VariantAnnotation	https://www.bioconductor.org/packages/release/bioc/html/VariantAnnotation.html	N/A
VAutils	https://github.com/oyhel/vautils/	N/A
PLINK V1.90	http://zzz.bwh.harvard.edu/plink/download.shtml	N/A
PRISM 7	GraphPad	N/A
IGV v2.4.16	https://software.broadinstitute.org/software/igv/	N/A
MATLAB R2018b	MathWorks	N/A
MAGMA v1.08	https://ctg.cncr.nl/software/magma	N/A
Pascal	https://www2.unil.ch/cbg/index.php?title=Pascal	N/A
PAINTOR v3.0	https://github.com/gkichaev/PAINTOR_V3.0	N/A
LD Score Regression	https://github.com/bulik/ldsc	N/A
HOMER v4.11	http://homer.ucsd.edu/homer/	N/A

Data

NK single-cell multiome data	encodeproject.org	https://www.encodeproject.org/experiments/ENCSR710NDM/

89 in total

Review 1. Roles of the NKG2D immunoreceptor and its ligands.

Authors: David H Raulet
Journal: Nat Rev Immunol Date: 2003-10 Impact factor: 53.106

Review 2. Mendelian Randomization for Strengthening Causal Inference in Observational Studies: Application to Gene × Environment Interactions.

Authors: George Davey Smith
Journal: Perspect Psychol Sci Date: 2010-10-11

3. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic.

Authors:
Journal: Eur J Hum Genet Date: 2020-05-13 Impact factor: 4.246

4. Signals of hope: gauging the impact of a rapid national vaccination campaign.

Authors: Smadar Shilo; Hagai Rossman; Eran Segal
Journal: Nat Rev Immunol Date: 2021-04 Impact factor: 53.106

5. Integrating functional data to prioritize causal variants in statistical fine-mapping studies.

Authors: Gleb Kichaev; Wen-Yun Yang; Sara Lindstrom; Farhad Hormozdiari; Eleazar Eskin; Alkes L Price; Peter Kraft; Bogdan Pasaniuc
Journal: PLoS Genet Date: 2014-10-30 Impact factor: 5.917

6. Genomewide Association Study of Severe Covid-19 with Respiratory Failure.

Authors: David Ellinghaus; Frauke Degenhardt; Luis Bujanda; Maria Buti; Agustín Albillos; Pietro Invernizzi; Javier Fernández; Daniele Prati; Guido Baselli; Rosanna Asselta; Marit M Grimsrud; Chiara Milani; Fátima Aziz; Jan Kässens; Sandra May; Mareike Wendorff; Lars Wienbrandt; Florian Uellendahl-Werth; Tenghao Zheng; Xiaoli Yi; Raúl de Pablo; Adolfo G Chercoles; Adriana Palom; Alba-Estela Garcia-Fernandez; Francisco Rodriguez-Frias; Alberto Zanella; Alessandra Bandera; Alessandro Protti; Alessio Aghemo; Ana Lleo; Andrea Biondi; Andrea Caballero-Garralda; Andrea Gori; Anja Tanck; Anna Carreras Nolla; Anna Latiano; Anna Ludovica Fracanzani; Anna Peschuck; Antonio Julià; Antonio Pesenti; Antonio Voza; David Jiménez; Beatriz Mateos; Beatriz Nafria Jimenez; Carmen Quereda; Cinzia Paccapelo; Christoph Gassner; Claudio Angelini; Cristina Cea; Aurora Solier; David Pestaña; Eduardo Muñiz-Diaz; Elena Sandoval; Elvezia M Paraboschi; Enrique Navas; Félix García Sánchez; Ferruccio Ceriotti; Filippo Martinelli-Boneschi; Flora Peyvandi; Francesco Blasi; Luis Téllez; Albert Blanco-Grau; Georg Hemmrich-Stanisak; Giacomo Grasselli; Giorgio Costantino; Giulia Cardamone; Giuseppe Foti; Serena Aneli; Hayato Kurihara; Hesham ElAbd; Ilaria My; Iván Galván-Femenia; Javier Martín; Jeanette Erdmann; Jose Ferrusquía-Acosta; Koldo Garcia-Etxebarria; Laura Izquierdo-Sanchez; Laura R Bettini; Lauro Sumoy; Leonardo Terranova; Leticia Moreira; Luigi Santoro; Luigia Scudeller; Francisco Mesonero; Luisa Roade; Malte C Rühlemann; Marco Schaefer; Maria Carrabba; Mar Riveiro-Barciela; Maria E Figuera Basso; Maria G Valsecchi; María Hernandez-Tejero; Marialbert Acosta-Herrera; Mariella D'Angiò; Marina Baldini; Marina Cazzaniga; Martin Schulzky; Maurizio Cecconi; Michael Wittig; Michele Ciccarelli; Miguel Rodríguez-Gandía; Monica Bocciolone; Monica Miozzo; Nicola Montano; Nicole Braun; Nicoletta Sacchi; Nilda Martínez; Onur Özer; Orazio Palmieri; Paola Faverio; Paoletta Preatoni; Paolo Bonfanti; Paolo Omodei; Paolo Tentorio; Pedro Castro; Pedro M Rodrigues; Aaron Blandino Ortiz; Rafael de Cid; Ricard Ferrer; Roberta Gualtierotti; Rosa Nieto; Siegfried Goerg; Salvatore Badalamenti; Sara Marsal; Giuseppe Matullo; Serena Pelusi; Simonas Juzenas; Stefano Aliberti; Valter Monzani; Victor Moreno; Tanja Wesse; Tobias L Lenz; Tomas Pumarola; Valeria Rimoldi; Silvano Bosari; Wolfgang Albrecht; Wolfgang Peter; Manuel Romero-Gómez; Mauro D'Amato; Stefano Duga; Jesus M Banales; Johannes R Hov; Trine Folseraas; Luca Valenti; Andre Franke; Tom H Karlsen
Journal: N Engl J Med Date: 2020-06-17 Impact factor: 91.245

7. Single-cell multiomic profiling of human lungs reveals cell-type-specific and age-dynamic control of SARS-CoV2 host genes.

Authors: Allen Wang; Joshua Chiou; Olivier B Poirion; Justin Buchanan; Michael J Valdez; Jamie M Verheyden; Xiaomeng Hou; Parul Kudtarkar; Sharvari Narendra; Jacklyn M Newsome; Minzhe Guo; Dina A Faddah; Kai Zhang; Randee E Young; Justinn Barr; Eniko Sajti; Ravi Misra; Heidie Huyck; Lisa Rogers; Cory Poole; Jeffery A Whitsett; Gloria Pryhuber; Yan Xu; Kyle J Gaulton; Sebastian Preissl; Xin Sun
Journal: Elife Date: 2020-11-09 Impact factor: 8.140

8. Phospholipase D activation in human natural killer cells through the Kp43 and CD16 surface antigens takes place by different mechanisms. Involvement of the phospholipase D pathway in tumor necrosis factor alpha synthesis.

Authors: M A Balboa; J Balsinde; J Aramburu; F Mollinedo; M López-Botet
Journal: J Exp Med Date: 1992-07-01 Impact factor: 14.307

Review 9. Molecular Regulation of NK Cell Maturation.

Authors: Jiacheng Bi; Xuefu Wang
Journal: Front Immunol Date: 2020-08-11 Impact factor: 7.561

10. A molecular cell atlas of the human lung from single-cell RNA sequencing.

Authors: Kyle J Travaglini; Ahmad N Nabhan; Lolita Penland; Rahul Sinha; Astrid Gillich; Rene V Sit; Stephen Chang; Stephanie D Conley; Yasuo Mori; Jun Seita; Gerald J Berry; Joseph B Shrager; Ross J Metzger; Christin S Kuo; Norma Neff; Irving L Weissman; Stephen R Quake; Mark A Krasnow
Journal: Nature Date: 2020-11-18 Impact factor: 49.962

2 in total

1. Low expression of EXOSC2 protects against clinical COVID-19 and impedes SARS-CoV-2 replication.

Authors: Tobias Moll; Valerie Odon; Calum Harvey; Mark O Collins; Andrew Peden; John Franklin; Emily Graves; Jack Ng Marshall; Cleide Dos Santos Souza; Sai Zhang; Lydia Castelli; Guillaume Hautbergue; Mimoun Azzouz; David Gordon; Nevan Krogan; Laura Ferraiuolo; Michael P Snyder; Pamela J Shaw; Jan Rehwinkel; Johnathan Cooper-Knock
Journal: Life Sci Alliance Date: 2022-10-14

Review 2. Neurological consequences of COVID-19.

Authors: Waldemar Brola; Maciej Wilski
Journal: Pharmacol Rep Date: 2022-09-30 Impact factor: 3.919

2 in total