Transcriptome-wide summary data-based Mendelian randomization analysis reveals 38 novel genes associated with severe COVID-19.

Suhas Krishnamoorthy, Gloria H-Y Li, Ching-Lung Cheung.

Abstract

Severe COVID-19 has a poor prognosis, while the genetic mechanism underlying severe COVID-19 remains largely unknown. We aimed to identify genes that are potentially causally associated with severe COVID-19. We conducted a summary data-based Mendelian randomization (SMR) analysis using expression quantitative trait loci (eQTL) data from 49 different tissues as the exposure and three COVID-19-phenotypes (very severe respiratory confirmed COVID-19 [severe COVID-19], hospitalized COVID-19, and SARS-CoV-2 infection) as the outcomes. SMR using multiple SNPs was used as a sensitivity analysis to reduce false positive rate. Multiple testing was corrected using the false discovery rate (FDR) q-value. We identified 309 significant gene-trait associations (FDR q value < 0.05) across 46 tissues for severe COVID-19, which mapped to 64 genes, of which 38 are novel. The top five most associated protein-coding genes were Interferon Alpha and Beta Receptor Subunit 2 (IFNAR2), 2'-5'-Oligoadenylate Synthetase 3 (OAS3), mucin 1 (MUC1), Interleukin 10 Receptor Subunit Beta (IL10RB), and Napsin A Aspartic Peptidase (NAPSA). The potential causal genes were enriched in biological processes related to type I interferons, interferon-gamma inducible protein 10 production, and chemokine (C-X-C motif) ligand 2 production. In addition, we further identified 23 genes and 5 biological processes which are unique to hospitalized COVID-19, as well as 13 genes that are unique to SARS-CoV-2 infection. We identified several genes that are potentially causally associated with severe COVID-19. These findings improve our limited understanding of the mechanism of COVID-19 and shed light on the development of therapeutic agents for treating severe COVID-19.
© 2022 Wiley Periodicals LLC.

Keywords:  COVID-19; Mendelian randomization; eQTL; transcriptome

Year:  2022        PMID: 36127160      PMCID: PMC9538104          DOI: 10.1002/jmv.28162

Source DB:  PubMed          Journal:  J Med Virol        ISSN: 0146-6615            Impact factor:   20.693


INTRODUCTION

Coronavirus disease 2019 (COVID‐19) is a highly contagious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), which has resulted in a global pandemic and public health crisis. As of 12 March 2022, there have been over 452 million confirmed cases of COVID‐19 and over 6 million deaths. COVID‐19 infections display extensive variability in symptoms, prognosis, and severity among individuals, with some infections being asymptomatic and others lethal. As with other infectious diseases, host genetics affect susceptibility to COVID‐19 infection, severity, and prognosis. The COVID‐19 Host Genetic Initiative (HGI) is a collaborative effort that generates and shares data on the genetic determinants of COVID‐19 outcomes, and it has previously identified several risk loci associated with the disease. These findings provide important insights into how host genetics may influence COVID‐19 infection and progression. However, causality cannot be inferred from a genome‐wide association study (GWAS) alone. Thus, there is an urgent need to better understand the causal factors of COVID‐19 infection and progression, especially the mechanism underlying severe COVID‐19.
Although GWAS per se does not infer causality, combining summary statistics from multiple GWAS can support causal inference via the Mendelian randomization (MR) framework. MR has also been shown to be a valid approach for drug repurposing, drug identification, and clinical management. Expression quantitative trait loci (eQTL) are genetic variants that affect the expression level of a gene. Combining summary statistics from transcriptome‐wide eQTL studies and GWAS of COVID‐19 under the MR approach is a powerful way to identify genes whose expression levels are causally associated with COVID‐19 outcomes. Previous studies have identified 2 and 14 putatively causal genes of COVID‐19. These studies, which were conducted using earlier releases of HGI data, provide valuable insights into the pathological mechanism of the disease. However, these studies used only one genetic instrument without sensitivity analysis, which may lead to biased findings and inflated false positive rates. To better understand whether host genes, and hence their expression, could affect susceptibility to severe COVID‐19, we performed a summary data‐based Mendelian randomization (SMR) analysis using the top single‐nucleotide polymorphism (SNP) in each eQTL study as the instrument and severe COVID‐19, hospitalized COVID‐19, and SARS‐CoV‐2 infection from release 6 of the COVID‐19 HGI data as the outcomes. SMR analysis using summary statistics from multiple SNPs (multi‐SNPs SMR) was performed as the sensitivity analysis to reduce the false positive rate.

MATERIALS AND METHODS

Study design and data sources

In the current study, we evaluated the causal effect of transcriptome‐wide gene expression (exposure) on COVID‐19 outcomes using SMR analysis. Summary statistics from eQTL studies provide information on the effect of genetic variants on gene expression levels. As expression levels vary by tissue type, eQTL studies are generally conducted for a specific tissue. In each tissue, probes are used to measure a gene's expression, and the cis‐eQTLs for that gene are any SNPs located within 1 Mb of the gene probe that are significantly associated with the gene's expression, defined as P_eQTL < 5E−8. The cis‐eQTLs were obtained from 48 different tissues in v7 of the Genotype‐Tissue Expression (GTEx) project, with sample sizes ranging from 80 to 491, as well as from the blood cis‐eQTL data provided by the eQTLGen Consortium, with a sample size of 31 684. Details of the tissues and sample sizes are provided in Table 1. We utilized the expression data sets from all 49 tissues because gene expression in various tissues could have a systemic effect, especially when the gene codes for a secreted protein. COVID‐19 has previously been found to affect virtually all organs, and the virus binds to ACE2, which is present in nearly all tissues. Moreover, some genes are tissue‐specific, with expression detected in only a few tissues. To obtain the broadest coverage of genes, we therefore studied 49 tissues in total.
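The instrument‐selection rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the record type `EqtlRecord`, the function `top_cis_eqtl`, and the toy positions and p values are hypothetical, and only the 1 Mb window and the P_eQTL < 5E−8 threshold come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

CIS_WINDOW_BP = 1_000_000   # 1 Mb cis window, as stated in the text
P_EQTL_THRESHOLD = 5e-8     # significance threshold stated in the text


@dataclass(frozen=True)
class EqtlRecord:
    """One SNP-probe association (hypothetical layout, same chromosome assumed)."""
    snp: str
    snp_pos: int     # base-pair position of the SNP
    probe_pos: int   # base-pair position of the gene probe
    p_value: float   # eQTL association p value


def top_cis_eqtl(records: Iterable[EqtlRecord]) -> Optional[EqtlRecord]:
    """Return the most significant cis-eQTL for a probe, or None if none qualifies."""
    eligible = [
        r for r in records
        if abs(r.snp_pos - r.probe_pos) <= CIS_WINDOW_BP
        and r.p_value < P_EQTL_THRESHOLD
    ]
    # The strongest signal is the eligible record with the smallest p value.
    return min(eligible, key=lambda r: r.p_value, default=None)


# Toy usage with made-up SNPs around a probe at position 5,000,000.
toy = [
    EqtlRecord("rsA", 5_200_000, 5_000_000, 1e-12),  # cis and significant
    EqtlRecord("rsB", 4_900_000, 5_000_000, 1e-20),  # cis and most significant
    EqtlRecord("rsC", 7_500_000, 5_000_000, 1e-30),  # outside the 1 Mb window
    EqtlRecord("rsD", 5_100_000, 5_000_000, 1e-4),   # cis but not significant
]
best = top_cis_eqtl(toy)
```

A probe with no eligible SNP returns `None`, which mirrors the rule that only genes with at least one significant cis‐eQTL enter the analysis.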
Table 1

List of tissues used as exposures and their respective sample size

| Tissue | Number of samples | Tissue | Number of samples |
| --- | --- | --- | --- |
| Brain – Substantia nigra | 80 | Colon – Sigmoid | 203 |
| Brain – Spinal cord (cervical c‐1) | 83 | Esophagus – Gastroesophageal Junction | 213 |
| Minor Salivary Gland | 85 | Pancreas | 220 |
| Brain – Amygdala | 88 | Testis | 225 |
| Uterus | 101 | Stomach | 237 |
| Vagina | 106 | Colon – Transverse | 246 |
| Brain – Hypothalamus | 108 | Breast – Mammary Tissue | 251 |
| Brain – Anterior cingulate cortex (BA24) | 109 | Heart – Atrial Appendage | 264 |
| Brain – Hippocampus | 111 | Artery – Aorta | 267 |
| Brain – Putamen (basal ganglia) | 111 | Heart – Left Ventricle | 272 |
| Cells – EBV‐transformed lymphocytes | 117 | Cells – Transformed fibroblasts | 300 |
| Brain – Frontal Cortex (BA9) | 118 | Adipose – Visceral (Omentum) | 313 |
| Ovary | 122 | Esophagus – Muscularis | 335 |
| Small Intestine – Terminal Ileum | 122 | Skin – Not Sun Exposed (Suprapubic) | 335 |
| Brain – Cerebellar Hemisphere | 125 | Esophagus – Mucosa | 358 |
| Brain – Nucleus accumbens (basal ganglia) | 130 | Nerve – Tibial | 361 |
| Prostate | 132 | Whole Blood | 369 |
| Brain – Cortex | 136 | Lung | 383 |
| Brain – Caudate (basal ganglia) | 144 | Adipose – Subcutaneous | 385 |
| Spleen | 146 | Artery – Tibial | 388 |
| Artery – Coronary | 152 | Thyroid | 399 |
| Liver | 153 | Skin – Sun Exposed (Lower leg) | 414 |
| Brain – Cerebellum | 154 | Muscle – Skeletal | 491 |
| Pituitary | 157 | Blood (eQTLgen) | 31 684 |
| Adrenal Gland | 175 | | |
Three COVID‐19 outcomes were used in the SMR analysis: severe COVID‐19 (n = 8779, controls = 1 001 875), hospitalized COVID‐19 (n = 24 274, controls = 2 061 529), and SARS‐CoV‐2 infection (n = 112 612, controls = 2 474 079). The control groups were subjects from the general population who did not show the respective phenotype. Severe COVID‐19 was considered the primary outcome since it is often associated with a poor prognosis. Hospitalized COVID‐19 and SARS‐CoV‐2 infection were the secondary outcomes. The phenotypes (Supporting Information: Table S1) were defined by the COVID‐19 HGI and have been used previously in several studies. GWAS summary data for these phenotypes were obtained from the round 6 meta‐analyses released by HGI (https://www.covid19hg.org/results/r6/). Each contributing study conducted its GWAS independently while following the HGI consortium guidelines, which suggested accounting for the following covariates: age, sex, age², age × sex, and the first 20 principal components.

SMR and HEIDI analysis

SMR is a method that integrates summary statistics from GWAS and eQTL studies under the MR framework to prioritize genes whose expression levels are potentially causally associated with an outcome trait. MR analysis uses genetic variants as instrumental variables to infer a causal relationship between an exposure and an outcome. While conventional two‐sample MR utilizes summary statistics from two independent GWAS to estimate the effect of one phenotype on another, SMR utilizes summary statistics from an independent eQTL study and a GWAS to estimate the effect of a gene's expression level on a phenotype. We adopted the SMR method (version 1.03) for our primary analysis (Figure 1A). For each gene, the cis‐eQTL with the strongest association signal was used as the single genetic instrument in the primary analysis. This method has been adopted in previous SMR studies. However, as more than one cis‐eQTL could be implicated in the expression of one gene, using a single eQTL as the instrument could lead to biased results and a potentially inflated false positive rate. In addition, because SMR uses a single variant as the genetic instrument, it cannot distinguish associations that arise from causality from those that arise from horizontal pleiotropy (i.e., the association between the genetic variant and the exposure being independent of that between the genetic variant and the outcome). Therefore, we used multi‐SNPs SMR (Figure 1B) as the sensitivity analysis to reduce such bias. By including multiple instruments, especially within the cis‐region of a probe, the likelihood of horizontal pleiotropy is diminished and the statistical power of the analysis increases. Detailed information on SMR and multi‐SNPs SMR has been described previously.
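As a rough illustration of how a single‐instrument SMR estimate is formed, the sketch below follows the formulation commonly given for the method: the effect of expression on the trait is the ratio of the SNP‐trait effect to the SNP‐expression effect, its standard error comes from the delta method, and the test statistic combines the two z scores and is compared with a chi‐square distribution with 1 degree of freedom. The function name, argument names, and input values are hypothetical; this is not the SMR software's code and the numbers are not from this study.

```python
import math


def smr_single_instrument(b_zx, se_zx, b_zy, se_zy):
    """Single-SNP SMR estimate (illustrative sketch).

    b_zx, se_zx: SNP effect on gene expression (eQTL) and its standard error.
    b_zy, se_zy: SNP effect on the trait (GWAS) and its standard error.
    Returns (b_xy, se_xy, t_smr, p_smr).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy

    # Ratio estimate of the effect of expression on the trait.
    b_xy = b_zy / b_zx

    # Delta-method variance of the ratio (covariance term assumed to be zero,
    # which is reasonable when the eQTL and GWAS samples are independent).
    var_xy = se_zy**2 / b_zx**2 + (b_zy**2 * se_zx**2) / b_zx**4
    se_xy = math.sqrt(var_xy)

    # Approximate chi-square (1 df) statistic built from the two z scores.
    t_smr = (z_zx**2 * z_zy**2) / (z_zx**2 + z_zy**2)

    # Upper tail of a 1-df chi-square: P(X > t) = erfc(sqrt(t / 2)).
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, se_xy, t_smr, p_smr


# Made-up inputs: a strong eQTL (z = 10) and a moderate GWAS signal (z = 5).
b_xy, se_xy, t_smr, p_smr = smr_single_instrument(0.5, 0.05, 0.1, 0.02)
```

Because the statistic is bounded by the weaker of the two z scores, a gene can only reach significance when both the eQTL and the GWAS signal at the instrument are strong, which is one reason larger GWAS releases increase power.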
Figure 1

Illustration of SMR and multi‐SNPs SMR. (A) Illustration of the primary analysis, SMR. For each tissue T, the most associated cis‐eQTL for each gene (SNPtop) was used as the genetic instrument to estimate the effect of the expression level of the gene on the outcome. (B) Illustration of the sensitivity analysis, multi‐SNPs SMR. For each tissue T, all independent cis‐eQTLs for a gene (r² < 0.1 with the top cis‐eQTL) which were significant (P_eQTL < 5E−8) were used as the genetic instruments to estimate the effect of the expression level of the gene on the outcome. Both analyses were repeated for each of the 49 tissues and all 3 COVID‐19 outcomes.

For SMR, we followed the standard approach. The analysis was conducted for each tissue independently. Within the cis‐region of each probe, the cis‐eQTL with the strongest association with the gene's expression was selected as the instrumental variable to evaluate the association between the gene's expression level in that tissue and the outcome; only genes with at least one cis‐eQTL with P_eQTL < 5E−8 were included. SNPs with an allele frequency <0.01 were removed, along with any SNPs having an allele frequency difference >0.2 between any of the three data sets (GWAS summary statistics, eQTL summary statistics, and linkage disequilibrium [LD] reference panel data). The analysis was repeated in all 49 tissues with available eQTL data for all 3 COVID‐19 outcomes.
The heterogeneity in dependent instruments (HEIDI) test was used to distinguish pleiotropy from linkage. HEIDI tests the null hypothesis that the association detected by the SMR test is due to pleiotropy (i.e., a single causal variant shared by expression and the trait) rather than linkage (i.e., two distinct causal variants in high LD with each other). For the HEIDI test, cis‐eQTLs in strong LD (r² > 0.9) or weak LD (r² < 0.05) with the top cis‐eQTL were excluded. The HEIDI test was only conducted if the number of cis‐eQTLs for the test was ≥3. 
Associations with P_HEIDI < 0.01 were considered to show significant heterogeneity and were hence excluded. To correct for multiple testing, the false discovery rate (FDR) q value was used, and an FDR q value < 0.05 was considered statistically significant.
In the sensitivity analysis using multi‐SNPs SMR, we used all independent cis‐eQTLs (SNPs with r² < 0.1 with the top cis‐eQTL) with p < 5E−8 within the cis‐region as instrumental variables. The originally proposed threshold for defining independent cis‐eQTLs was r² < 0.9. However, this threshold may include SNPs in high LD (defined as r² ≥ 0.1). We, therefore, adopted r² < 0.1 as the threshold for defining independent SNPs. The reference panel of 503 European individuals from the 1000 Genomes Project (phase 3) was used to compute LD estimates for the analysis. Only gene‐trait associations that were significant (FDR q value < 0.05) in both the primary and sensitivity analyses were considered potential causal genes.
Data analyses were done using SMR v1.03 (https://cnsgenomics.com/software/smr/) and R version 4.1.0 (https://www.r-project.org/). The LD reference panel was generated using BCFtools 1.11, VCFtools 0.1.15, and PLINK 1.90 (https://www.cog-genomics.org/plink/1.9/).
To gain insight into the biological function of the potential causal genes, gene ontology (GO) enrichment analysis was conducted using g:Profiler (https://biit.cs.ut.ee/gprofiler) to annotate the results. Correction for multiple testing was done using g:SCS (Set Counts and Sizes), a method developed to estimate thresholds in complex functional profiling data such as GO.
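The three‐way filter described in this section (significant in SMR, significant in multi‐SNPs SMR, and not rejected by HEIDI) can be sketched as follows. The text does not state which procedure produced the FDR q values, so the Benjamini‐Hochberg adjustment used here is an assumption, and the function names and toy numbers are hypothetical. The sketch also assumes a HEIDI p value is available for every association; how associations without a HEIDI test were handled is not specified in the text and is not modeled.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order.

    Assumption: BH is used as a stand-in for the unspecified q-value method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_candidate(q_smr, q_multi, p_heidi, q_cut=0.05, heidi_cut=0.01):
    """True if an association passes SMR, multi-SNPs SMR, and HEIDI."""
    return q_smr < q_cut and q_multi < q_cut and p_heidi >= heidi_cut


# Toy example with four made-up gene-tissue associations.
p_smr = [0.01, 0.04, 0.03, 0.005]
p_multi = [0.02, 0.60, 0.01, 0.004]
p_heidi = [0.50, 0.30, 0.002, 0.08]

q_smr = bh_adjust(p_smr)
q_multi = bh_adjust(p_multi)
kept = [
    i for i in range(len(p_smr))
    if is_candidate(q_smr[i], q_multi[i], p_heidi[i])
]
```

In this toy example the second association fails the multi‐SNPs filter and the third is removed by HEIDI, so only the first and fourth are kept.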

RESULTS

Severe COVID‐19

In the transcriptome‐wide analysis of 49 tissues with severe COVID‐19, 826 gene‐trait associations were identified by univariable SMR analysis across 46 tissues (including blood) with FDR q value < 0.05. Thirty‐nine of these associations were excluded as they were insignificant in the multi‐SNPs SMR analysis. Of the remaining 787 associations, 478 did not pass the HEIDI test, resulting in 309 significant associations. The 309 associations were spread across 46 tissues (Supporting Information: Table S2), while the associations were mapped to 64 genes (Table 2), of which 38 were protein coding. Thirty‐nine of the genes were identified in two or more tissues (Supporting Information: Table S3). The top five most associated protein‐coding genes were Interferon Alpha and Beta Receptor Subunit 2 (IFNAR2), 2′‐5′‐Oligoadenylate Synthetase 3 (OAS3), mucin 1 (MUC1), Interleukin 10 Receptor Subunit Beta (IL10RB), and Napsin A Aspartic Peptidase (NAPSA). The tissue with the highest number of significant genes was esophagus mucosa (n = 17) (Supporting Information: Table S4).
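The counts reported above follow a simple subtraction cascade, which can be checked with a short script. The function name is hypothetical; the numbers are those given in the text for severe COVID‐19.

```python
def filter_cascade(n_smr_significant, n_failed_multi_snp, n_failed_heidi):
    """Return (associations left after multi-SNPs SMR, associations left after HEIDI)."""
    after_multi_snp = n_smr_significant - n_failed_multi_snp
    after_heidi = after_multi_snp - n_failed_heidi
    return after_multi_snp, after_heidi


# Severe COVID-19: 826 SMR hits, 39 dropped by multi-SNPs SMR, 478 dropped by HEIDI.
after_multi_snp, after_heidi = filter_cascade(826, 39, 478)
print(after_multi_snp, after_heidi)
```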
Table 2

Genes that were associated with severe COVID‐19.

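β, SE, p value, and tissue are the results in the most associated tissue.

| Gene | No. of tissues | β | SE | p Value | Tissue | Protein coding |
| --- | --- | --- | --- | --- | --- | --- |
| IFNAR2 | 6 | −0.375 | 0.068 | 2.9E−08 | Blood | Yes |
| OAS3 | 1 | 0.129 | 0.024 | 4.7E−08 | Cells Transformed fibroblasts | Yes |
| MUC1 | 1 | 0.786 | 0.146 | 7.8E−08 | Blood | Yes |
| IL10RB | 3 | −0.48 | 0.091 | 1.2E−07 | Cells Transformed fibroblasts | Yes |
| NAPSA | 2 | −0.479 | 0.092 | 2.0E−07 | Blood | Yes |
| KCNC3 | 4 | 1.151 | 0.222 | 2.2E−07 | Blood | Yes |
| PLEKHM1 | 2 | 0.155 | 0.032 | 1.9E−06 | Brain Cerebellar Hemisphere | Yes |
| OAS1 | 4 | −0.384 | 0.081 | 2.1E−06 | Skin Sun Exposed Lower leg | Yes |
| ARL17A | 10 | −0.102 | 0.022 | 2.3E−06 | Brain Cortex | Yes |
| TYK2 | 4 | 0.362 | 0.077 | 2.5E−06 | Adrenal Gland | Yes |
| NUTM2B | 28 | −0.133 | 0.03 | 8.0E−06 | Skin Not Sun Exposed Suprapubic | Yes |
| WNT3 | 11 | −0.178 | 0.04 | 9.5E−06 | Pancreas | Yes |
| KANSL1 | 1 | −0.383 | 0.087 | 1.1E−05 | Cells Transformed fibroblasts | Yes |
| MUC5B | 1 | −0.324 | 0.075 | 1.6E−05 | Lung | Yes |
| TRIM23 | 3 | 0.286 | 0.068 | 2.8E−05 | Skin Sun Exposed Lower leg | Yes |
| CENPK | 6 | −0.343 | 0.084 | 4.4E−05 | Cells Transformed fibroblasts | Yes |
| ARHGAP27 | 1 | −0.234 | 0.057 | 4.4E−05 | Brain Nucleus accumbens basal ganglia | Yes |
| RMI2 | 2 | −0.189 | 0.047 | 6.1E−05 | Esophagus Mucosa | Yes |
| ICAM5 | 2 | −0.23 | 0.057 | 6.4E−05 | Cells Transformed fibroblasts | Yes |
| ZNF528 | 30 | −0.14 | 0.035 | 6.7E−05 | Colon Sigmoid | Yes |
| HLA‐DQA1 | 1 | 0.098 | 0.025 | 7.6E−05 | Blood | Yes |
| ADAMTS6 | 1 | 0.132 | 0.033 | 8.1E−05 | Testis | Yes |
| LRRC37A | 7 | −0.165 | 0.042 | 8.5E−05 | Artery Coronary | Yes |
| ELF5 | 1 | 0.427 | 0.11 | 1.0E−04 | Lung | Yes |
| CRHR1 | 1 | −0.274 | 0.071 | 1.1E−04 | Esophagus Muscularis | Yes |
| TOMM7 | 8 | 0.182 | 0.047 | 1.3E−04 | Adipose Subcutaneous | Yes |
| SLC22A31 | 1 | 0.13 | 0.034 | 1.4E−04 | Spleen | Yes |
| CCHCR1 | 1 | 0.253 | 0.066 | 1.4E−04 | Stomach | Yes |
| ZGLP1 | 1 | −0.353 | 0.094 | 1.7E−04 | Brain Cerebellum | Yes |
| ICAM3 | 2 | 0.1 | 0.027 | 1.8E−04 | Brain Amygdala | Yes |
| RASIP1 | 1 | −0.096 | 0.026 | 1.9E−04 | Cells Transformed fibroblasts | Yes |
| NSF | 1 | 0.619 | 0.168 | 2.2E−04 | Esophagus Mucosa | Yes |
| PPWD1 | 2 | 0.153 | 0.042 | 2.6E−04 | Colon Transverse | Yes |
| HLA‐DQB2 | 3 | −0.077 | 0.021 | 2.8E−04 | Brain Cortex | Yes |
| NOTCH4 | 1 | −0.087 | 0.025 | 3.9E−04 | Cells EBV‐transformed lymphocytes | Yes |
| SNX31 | 3 | 0.084 | 0.024 | 4.8E−04 | Brain Hypothalamus | Yes |
| SELE | 1 | −0.14 | 0.04 | 4.9E−04 | Liver | Yes |
| HLA‐DQB1 | 3 | 0.081 | 0.023 | 5.3E−04 | Brain Hypothalamus | Yes |
| RP11‐119F19.2 | 22 | −0.22 | 0.041 | 7.4E−08 | Blood | No |
| RP11‐259G18.3 | 10 | −0.216 | 0.042 | 3.0E−07 | Blood | No |
| NAPSB | 31 | −0.12 | 0.024 | 5.3E−07 | Whole Blood | No |
| FAM22B | 1 | −0.175 | 0.036 | 1.2E−06 | Blood | No |
| FAM215B | 6 | −0.177 | 0.037 | 1.3E−06 | Thyroid | No |
| RP11‐798G7.5 | 2 | 0.208 | 0.043 | 1.6E−06 | Blood | No |
| DND1P1 | 5 | −0.103 | 0.022 | 2.9E−06 | Blood | No |
| RP11‐259G18.2 | 10 | −0.345 | 0.075 | 4.0E−06 | Blood | No |
| KANSL1‐AS1 | 5 | −0.106 | 0.023 | 5.2E−06 | Brain Cortex | No |
| BEND3P3 | 3 | −0.307 | 0.068 | 5.7E−06 | Blood | No |
| RP11‐259G18.1 | 8 | −0.17 | 0.038 | 7.1E−06 | Esophagus Mucosa | No |
| RP11‐506M13.3 | 23 | −0.155 | 0.036 | 1.6E−05 | Nerve Tibial | No |
| RP11‐182L21.5 | 4 | −0.208 | 0.05 | 2.9E−05 | Nerve Tibial | No |
| LRRC37A17P | 1 | 0.26 | 0.062 | 3.1E−05 | Esophagus Mucosa | No |
| AC091132.1 | 2 | 0.437 | 0.106 | 3.7E−05 | Esophagus Mucosa | No |
| RP11‐707O23.5 | 2 | −0.088 | 0.021 | 4.2E−05 | Brain Substantia nigra | No |
| CTD‐2020K17.1 | 1 | 0.234 | 0.059 | 7.0E−05 | Brain Cerebellum | No |
| RPS26P8 | 2 | −0.191 | 0.049 | 8.2E−05 | Adipose Subcutaneous | No |
| CTD‐2116N20.1 | 1 | 0.152 | 0.039 | 9.1E−05 | Testis | No |
| CTC‐534A2.2 | 1 | −0.548 | 0.141 | 1.0E−04 | Blood | No |
| MAPT‐AS1 | 2 | 0.169 | 0.044 | 1.0E−04 | Brain Nucleus accumbens basal ganglia | No |
| RP11‐798G7.8 | 1 | −0.135 | 0.035 | 1.3E−04 | Brain Cerebellar Hemisphere | No |
| XXbac‐BPG299F13.17 | 2 | 0.197 | 0.052 | 1.4E−04 | Esophagus Gastroesophageal Junction | No |
| AC005682.5 | 1 | −0.201 | 0.053 | 1.6E−04 | Thyroid | No |
| RP11‐46C24.3 | 1 | −0.117 | 0.032 | 2.5E−04 | Brain Cerebellum | No |
| RP11‐707M1.1 | 1 | −0.143 | 0.039 | 2.8E−04 | Brain Caudate basal ganglia | No |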
GO enrichment analysis (Table 3) for biological processes showed the genes were significantly enriched in seven GO terms relating to immune system regulation. The associated biological processes (adjusted p < 0.05) were driven by four genes (2′‐5′‐Oligoadenylate Synthetase 1 [OAS1], Tyrosine Kinase 2 [TYK2], IFNAR2, and OAS3) and related to type I interferon, interferon‐gamma inducible protein 10 (IP‐10) production, and chemokine (C‐X‐C motif) ligand 2 production.
Table 3

GO biological process enrichment analysis of the significant genes for severe COVID‐19

| Term name | Term id | Adjusted p value | Intersections |
| --- | --- | --- | --- |
| Type I interferon signaling pathway | GO:0060337 | 0.004 | OAS1, TYK2, IFNAR2, OAS3 |
| Cellular response to type I interferon | GO:0071357 | 0.005 | OAS1, TYK2, IFNAR2, OAS3 |
| Response to type I interferon | GO:0034340 | 0.008 | OAS1, TYK2, IFNAR2, OAS3 |
| Negative regulation of IP‐10 production | GO:0071659 | 0.015 | OAS1, OAS3 |
| IP‐10 production | GO:0071612 | 0.030 | OAS1, OAS3 |
| Regulation of IP‐10 production | GO:0071658 | 0.030 | OAS1, OAS3 |
| Negative regulation of chemokine (C‐X‐C motif) ligand 2 production | GO:2000342 | 0.030 | OAS1, OAS3 |

Hospitalized COVID‐19

A total of 816 gene‐trait associations were identified by univariable SMR analysis across eQTL in 49 tissues on hospitalized COVID‐19 (FDR q value < 0.05). Fifty‐nine of these associations were excluded as they were insignificant in the multi‐SNPs SMR analysis. Of the remaining 757 associations, 462 did not pass the HEIDI test, resulting in 295 significant associations. The 295 associations were spread across 49 tissues (Supporting Information: Table S5), while the associations were mapped to 63 genes (Table 4), of which 36 were protein coding. Thirty‐eight genes were identified in two or more tissues (Supporting Information: Table S6). The top five most associated protein‐coding genes were IFNAR2, NAPSA, IL10RB, alpha 1‐3‐N‐acetylgalactosaminyltransferase and alpha 1‐3‐galactosyltransferase (ABO), and OAS3. Twenty‐three genes (14 protein coding and 9 non‐protein coding) were unique to hospitalized COVID‐19 (Supporting Information: Table S7). The tissue with the highest number of significant genes was blood (n = 11; Supporting Information: Table S4).
Table 4

Genes that were associated with hospitalized COVID‐19

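β, SE, p value, and tissue are the results in the most associated tissue.

| Gene | No. of tissues | β | SE | p Value | Tissue | Protein coding |
| --- | --- | --- | --- | --- | --- | --- |
| IFNAR2 | 5 | −0.272 | 0.044 | 5.3E−10 | Skin Sun Exposed Lower leg | Yes |
| NAPSA | 3 | −0.317 | 0.053 | 2.5E−09 | Blood | Yes |
| IL10RB | 3 | 0.329 | 0.058 | 1.9E−08 | Muscle Skeletal | Yes |
| ABO | 1 | 0.298 | 0.053 | 1.9E−08 | Artery Tibial | Yes |
| OAS3 | 3 | 0.078 | 0.014 | 2.8E−08 | Cells Transformed fibroblasts | Yes |
| ARL17A | 10 | −0.072 | 0.013 | 4.6E−08 | Brain Cortex | Yes |
| MUC1 | 1 | 0.41 | 0.076 | 5.8E−08 | Blood | Yes |
| PLEKHM1 | 4 | 0.095 | 0.019 | 5.2E−07 | Brain Cerebellar Hemisphere | Yes |
| ELF5 | 1 | 0.333 | 0.067 | 5.8E−07 | Lung | Yes |
| KANSL1 | 6 | −0.268 | 0.054 | 7.0E−07 | Cells Transformed fibroblasts | Yes |
| KCNC3 | 2 | 0.069 | 0.014 | 1.4E−06 | Spleen | Yes |
| WNT3 | 2 | −0.108 | 0.022 | 1.4E−06 | Pancreas | Yes |
| OAS1 | 4 | −0.302 | 0.063 | 1.5E−06 | Adipose Subcutaneous | Yes |
| LRRC37A | 25 | −0.091 | 0.019 | 1.9E−06 | Vagina | Yes |
| ARHGAP27 | 1 | −0.167 | 0.037 | 6.1E−06 | Brain Nucleus accumbens basal ganglia | Yes |
| MUC5B | 1 | −0.191 | 0.042 | 6.2E−06 | Lung | Yes |
| SPPL2C | 1 | −0.141 | 0.032 | 8.5E−06 | Brain Cerebellar Hemisphere | Yes |
| LRRC37A2 | 4 | −0.052 | 0.012 | 1.4E−05 | Brain Cerebellar Hemisphere | Yes |
| CCHCR1 | 5 | 0.111 | 0.026 | 1.5E−05 | Blood | Yes |
| NXPE3 | 2 | 0.08 | 0.019 | 2.0E−05 | Esophagus Muscularis | Yes |
| CRHR1 | 1 | −0.199 | 0.047 | 2.3E−05 | Esophagus Muscularis | Yes |
| ICAM3 | 11 | 0.187 | 0.044 | 2.6E−05 | Adipose Visceral Omentum | Yes |
| MAPT | 1 | −0.181 | 0.044 | 3.2E−05 | Skin Not Sun Exposed Suprapubic | Yes |
| ARL17B | 2 | 0.091 | 0.022 | 3.5E−05 | Whole Blood | Yes |
| TCF19 | 1 | 0.085 | 0.021 | 3.8E−05 | Blood | Yes |
| RAB2A | 1 | 0.252 | 0.062 | 4.5E−05 | Artery Tibial | Yes |
| CYP4B1 | 5 | 0.22 | 0.054 | 4.6E−05 | Nerve Tibial | Yes |
| TYK2 | 2 | 0.492 | 0.121 | 4.8E−05 | Skin Sun Exposed Lower leg | Yes |
| OAS2 | 1 | 0.385 | 0.096 | 5.5E−05 | Blood | Yes |
| ZNF778 | 2 | 0.054 | 0.013 | 5.6E−05 | Pancreas | Yes |
| NSF | 2 | 0.392 | 0.098 | 5.9E−05 | Esophagus Mucosa | Yes |
| SLC22A31 | 10 | 0.066 | 0.017 | 9.0E−05 | Esophagus Mucosa | Yes |
| BMP1 | 1 | 0.23 | 0.06 | 1.1E−04 | Artery Tibial | Yes |
| AVEN | 1 | 0.104 | 0.027 | 1.3E−04 | Heart Atrial Appendage | Yes |
| SCAMP5 | 2 | 0.098 | 0.026 | 1.5E−04 | Skin Sun Exposed Lower leg | Yes |
| SFTPD | 1 | 0.076 | 0.021 | 3.3E−04 | Pituitary | Yes |
| RP11‐259G18.3 | 18 | −0.072 | 0.012 | 4.0E−10 | Whole Blood | No |
| KANSL1‐AS1 | 12 | −0.076 | 0.013 | 1.5E−09 | Colon Transverse | No |
| RP11‐259G18.2 | 4 | −0.243 | 0.042 | 8.5E−09 | Blood | No |
| LRRC37A4P | 4 | 0.082 | 0.015 | 4.1E−08 | Liver | No |
| CRHR1‐IT1 | 1 | −0.115 | 0.021 | 6.6E−08 | Brain Putamen basal ganglia | No |
| DND1P1 | 5 | −0.061 | 0.011 | 1.0E−07 | Artery Coronary | No |
| NAPSB | 28 | −0.074 | 0.014 | 1.5E−07 | Whole Blood | No |
| RP11‐259G18.1 | 19 | −0.123 | 0.024 | 2.7E−07 | Esophagus Mucosa | No |
| FAM215B | 7 | −0.108 | 0.021 | 4.5E−07 | Thyroid | No |
| ATP5O | 1 | 0.154 | 0.032 | 1.4E−06 | Lung | No |
| MAPT‐AS1 | 2 | 0.195 | 0.042 | 3.9E−06 | Muscle Skeletal | No |
| RP11‐707O23.5 | 1 | −0.059 | 0.013 | 3.9E−06 | Brain Substantia nigra | No |
| AC091132.1 | 3 | 0.318 | 0.069 | 4.4E−06 | Esophagus Mucosa | No |
| RP11‐798G7.5 | 14 | 0.106 | 0.024 | 1.0E−05 | Brain Cerebellar Hemisphere | No |
| RP11‐798G7.6 | 20 | 0.07 | 0.016 | 1.2E−05 | Thyroid | No |
| CTD‐2020K17.1 | 1 | 0.161 | 0.037 | 1.8E−05 | Brain Cerebellum | No |
| RPS26P8 | 2 | −0.131 | 0.031 | 2.1E−05 | Breast Mammary Tissue | No |
| BEND3P3 | 1 | −0.153 | 0.036 | 2.2E−05 | Blood | No |
| RP11‐798G7.8 | 1 | −0.09 | 0.021 | 2.9E−05 | Brain Cerebellar Hemisphere | No |
| RP11‐119F19.2 | 1 | −0.087 | 0.022 | 6.2E−05 | Blood | No |
| PDCL3P4 | 13 | −0.109 | 0.027 | 6.3E−05 | Heart Left Ventricle | No |
| HLA‐J | 3 | −0.094 | 0.024 | 8.7E−05 | Colon Transverse | No |
| ZBTB11‐AS1 | 1 | −0.166 | 0.043 | 9.9E−05 | Muscle Skeletal | No |
| CTD‐3092A11.2 | 1 | −0.094 | 0.024 | 1.1E−04 | Lung | No |
| RP11‐322D14.1 | 1 | 0.082 | 0.021 | 1.5E−04 | Whole Blood | No |
| AL133481.1 | 2 | 0.041 | 0.011 | 1.8E−04 | Brain Cerebellum | No |
| IL10RB‐AS1 | 1 | 0.06 | 0.016 | 2.6E−04 | Brain Cerebellum | No |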
GO enrichment analysis (Table 5) for biological processes showed the genes were significantly enriched in 12 GO terms relating to immune system regulation and homeostasis. The associated biological processes were driven by seven genes (OAS1, 2′‐5′‐Oligoadenylate Synthetase 2 [OAS2], OAS3, TYK2, IFNAR2, NAPSA, and Surfactant Protein D [SFTPD]). The associated biological processes (adjusted p < 0.05) were similar to those identified in the GO enrichment analysis of severe COVID‐19, with additional broad biological processes related to nuclease regulation, surfactant and chemical homeostasis, and interferon‐beta production.
Table 5

GO biological process enrichment analysis of the significant genes for hospitalized COVID‐19

| Term name | Term id | Adjusted p value | Intersections |
| --- | --- | --- | --- |
| Type I interferon signaling pathway | GO:0060337 | 3.2E−05 | OAS1, OAS2, OAS3, IFNAR2, TYK2 |
| Cellular response to type I interferon | GO:0071357 | 3.8E−05 | OAS1, OAS2, OAS3, IFNAR2, TYK2 |
| Response to type I interferon | GO:0034340 | 6.7E−05 | OAS1, OAS2, OAS3, IFNAR2, TYK2 |
| Regulation of ribonuclease activity | GO:0060700 | 2.6E−04 | OAS1, OAS2, OAS3 |
| Surfactant homeostasis | GO:0043129 | 0.003 | OAS1, NAPSA, SFTPD |
| Chemical homeostasis within a tissue | GO:0048875 | 0.004 | OAS1, NAPSA, SFTPD |
| Regulation of nuclease activity | GO:0032069 | 0.006 | OAS1, OAS2, OAS3 |
| Negative regulation of IP‐10 production | GO:0071659 | 0.010 | OAS1, OAS3 |
| IP‐10 production | GO:0071612 | 0.020 | OAS1, OAS3 |
| Regulation of IP‐10 production | GO:0071658 | 0.020 | OAS1, OAS3 |
| Negative regulation of chemokine (C‐X‐C motif) ligand 2 production | GO:2000342 | 0.020 | OAS1, OAS3 |
| Positive regulation of interferon‐beta production | GO:0032728 | 0.048 | OAS1, OAS2, OAS3 |

SARS‐CoV‐2 infection

We identified a total of 86 gene‐trait associations which were significant in both SMR and multi‐SNPs SMR while not being rejected by the HEIDI test. These were mapped to 20 genes across 38 tissues (Supporting Information: Table S8). Among the 20 genes, 14 had a significant association in more than one tissue, 16 were protein‐coding and 13 genes were unique to SARS‐CoV‐2 infection (Supporting Information: Table S7). The top 5 protein‐coding genes were Neurexophilin And PC‐Esterase Domain Family Member 3 (NXPE3), SUMO Specific Peptidase 7 (SENP7), Centrosomal Protein 97 (CEP97), TUB Like Protein 2 (TULP2), and Netrin 5 (NTN5). No GO terms were found to be significantly enriched in the GO analysis of the significant causal genes for SARS‐CoV‐2 infection.

DISCUSSION

In this study, we conducted SMR analysis with 49 transcriptome‐wide eQTL data sets as exposures and severe COVID‐19 as the primary outcome. We identified a total of 309 gene‐tissue associations and 64 potentially causal genes for severe COVID‐19. Using hospitalized COVID‐19 as the secondary outcome, similar genes and GO enrichment were observed, suggesting that the findings from the severe COVID‐19 analysis were robust. In addition, we identified 23 genes and 5 pathways that may be specific to hospitalized COVID‐19, as well as 13 genes that may be specific to SARS‐CoV‐2 infection. These findings potentially explain the underlying mechanisms of severe COVID‐19, and the identified genes may serve as potential therapeutic targets for the treatment of COVID‐19.
A previous SMR study found no genes to be significantly associated with severe COVID‐19 but found one gene, IFNAR2, to be associated with hospitalized COVID‐19 in the eQTL data set from blood. In the current study, IFNAR2 was identified as a causal gene in the analysis of both severe and hospitalized COVID‐19. The lack of associations in the previous study could be due to its use of GWAS summary statistics from an earlier release of HGI data (release 3) and of eQTL summary statistics from blood and lung only, both of which limit statistical power. Another transcriptome‐wide SMR study used eQTL and mQTL data from lung and whole blood with the hospitalized COVID‐19 GWAS from HGI release 5 and identified associations involving seven protein‐coding genes: TYK2, IFNAR2, OAS1, OAS3, XCR1, CCR5, and MAPT. The current study found all but two (XCR1 and CCR5) of these genes to be significantly associated with hospitalized COVID‐19. Compared to these two studies, the current study had the highest power and identified the largest number of potentially causal genes for severe and hospitalized COVID‐19, using the most recent release of HGI data and 49 eQTL data sets. 
In total, we identified 64 protein‐coding genes (Supporting Information: Table S7). Of these, several have previously been found to be associated with severe COVID‐19, hospitalized COVID‐19, and SARS‐CoV‐2 infection through various experimental techniques and definitions of the COVID‐19 outcomes; these include ABO, OAS1, OAS2, OAS3, TYK2, IFNAR2, IL10RB, MAPT, ARL17A, ARL17B, CCHCR1, CEP97, LRRC37A, LRRC37A2, NXPE3, ICAM5, MUC1, NTN5, ELF5, FUT2, KANSL1, MUC5B, SFTPD, SLC22A31, SELE, SENP7, NAPSA, NOTCH4, PLEKHA4, WNT3, NSF, TOMM7, HLA‐DQB2, HLA‐DQA1, HLA‐DQB1, HLA‐DPA1, TULP2, and TCF19. In addition, our analysis identified several novel genes that are potentially causally associated with severe COVID‐19, hospitalized COVID‐19, and SARS‐CoV‐2 infection: 15 genes were associated with severe COVID‐19 (ADAMTS6, CENPK, NUTM2B, PPWD1, RMI2, RASIP1, SNX31, TRIM23, ARHGAP27, CRHR1, ICAM3, KCNC3, PLEKHM1, ZGLP1, and ZNF528), 12 with hospitalized COVID‐19 (AVEN, BMP1, CYP4B1, RAB2A, SCAMP5, SPPL2C, ARHGAP27, CRHR1, ICAM3, KCNC3, PLEKHM1, and ZNF778), and 5 with SARS‐CoV‐2 infection (HSD17B14, MED24, CLK2, RASIP1, and MAMSTR). We also identified 39 non‐protein‐coding genes (Supporting Information: Table S7). Of these, only DND1P1, KANSL1‐AS1, LRRC37A4P, LCN1P1, MAPT‐AS1, IL10RB‐AS1, PDCL3P4, ZBTB11‐AS1, and CRHR1‐IT1 have been reported previously. The remaining noncoding genes are novel.
The top 10 protein‐coding genes most strongly associated with severe COVID‐19 were also found to be associated with hospitalized COVID‐19 (IFNAR2, OAS3, MUC1, IL10RB, NAPSA, KCNC3, PLEKHM1, OAS1, ARL17A, and TYK2). Four of these, OAS1, IFNAR2, OAS3, and TYK2, were the genes that drove the enrichment of biological processes in the GO enrichment analysis. The enriched biological processes were related to type I interferon signaling, IP‐10 production, and CXCL2 production. 
IFNAR2 and TYK2 are involved in the signaling pathway of type I interferons, while OAS1 and OAS3 are known to regulate type I interferons and chemokines. Type I and III interferons have previously been implicated in COVID‐19 and may have both detrimental and beneficial effects on SARS‐CoV‐2 replication. These interferons are a central part of the innate antiviral response. However, the interferon response was found to be diminished and delayed in patients with COVID‐19 and was observed only in a small number of patients as their infection reached critical stages. It has been suggested that type I and III interferons confer early protection but late amplification of the disease. IL10RB belongs to the cytokine receptor family and has previously been implicated in COVID‐19. Two of the top 10 genes, KCNC3 and PLEKHM1, are novel genes associated with severe COVID‐19 and may play a causal role in the disease through the immune system. KCNC3 is a voltage‐gated potassium channel that may affect the immune response by inhibiting T cell activation. PLEKHM1 regulates autophagosome‐lysosome fusion, and depletion of PLEKHM1 enhances the presentation of MHC class I molecules, thereby affecting the immune response. Taken together, these genes highlight the role of immune system dysregulation in severe COVID‐19 and provide further insight into the potentially causal role of certain genes in this process.
In the supplementary analysis using SARS‐CoV‐2 infection as an outcome, we identified four genes in common with severe COVID‐19 (OAS1, IFNAR2, RASIP1, and CCHCR1) and six in common with hospitalized COVID‐19 (OAS1, IFNAR2, CCHCR1, PDCL3P4, NXPE3, and ZBTB11‐AS1). The remaining 13 genes unique to SARS‐CoV‐2 infection may be associated with infection itself, but not with a severe response as a consequence. In the GO enrichment analysis, no significant GO enrichment was found. This may be due to a lower statistical power of the SARS‐CoV‐2 infection GWAS. 
The asymptomatic nature of COVID‐19 and the self‐reported phenotype may lead to misclassification of SARS‐CoV‐2 cases in the GWAS of SARS‐CoV‐2 infection. As a result, a higher negative predictive value and lower sensitivity of SARS‐CoV‐2 infection diagnosis could be observed, resulting in lower statistical power of the GWAS. The present study also identified several novel genes associated with severe COVID‐19 that may underlie its pathogenesis. ARHGAP27, RASIP1, and CENPK are involved in signaling by the Rho GTPase pathway and may increase COVID‐19 infection through dysregulation of the pathway, affecting cytoskeletal dynamics. TRIM23 is known to induce autophagy in response to viral infections. ZNF528 and ZGLP1 are involved in transcription and may function as regulators in pathways affecting the severity of COVID‐19 infections. ICAM3 expression in several brain tissues was associated with an increased risk of severe and hospitalized COVID‐19. Previously, ICAM3 was found to increase lactate dehydrogenase levels and lower total white blood cell counts in patients with severe acute respiratory syndrome (SARS), and it may similarly affect those infected by SARS‐CoV‐2. Our study has important clinical implications. The ongoing COVID‐19 pandemic is unprecedented in modern times, and severe COVID‐19 in particular is associated with a poor prognosis. By identifying several novel genes that have a potentially causal association with severe COVID‐19, our findings provide further insight into its mechanism. Notably, some of the genes identified in the current study have been implicated in COVID‐19 infection and progression in previous studies with different study designs and definitions; these serve as positive controls and provide evidence of the robustness of our findings.
Thus, the genes identified in this study provide direction for further functional validation studies and shed light on the development of therapeutic agents for treating severe COVID‐19. Our study also has several strengths. First, we utilized 49 different tissue types, allowing a comprehensive evaluation of the association between gene expression and severe COVID‐19 across tissues. Second, we used multi‐SNP SMR as the sensitivity analysis, providing more robust evidence of association and reducing the false positive rate. Finally, we used the GWAS from release 6 of the COVID‐19 HGI data, which provides greater statistical power than previous studies. Nevertheless, there are limitations. First, the GWAS did not control for all confounders, such as diabetes, obesity, and other risk factors of severe COVID‐19, which may affect the outcome. However, because the study uses eQTLs as instruments, it is less likely to be affected by confounders. Second, although several associations appeared in only one tissue, these may not be true tissue‐specific associations, since many eQTL datasets are underpowered due to low sample size. Thus, cautious interpretation is required. Third, although we conducted an additional sensitivity analysis to reduce bias, we cannot eliminate the possibility that the associations are due to horizontal pleiotropy. For this reason, we consider the identified genes as “potentially causal.” Further study is required to validate the role of these genes in COVID‐19 outcomes. In conclusion, we identified 64 genes that are potentially causally associated with severe COVID‐19, of which 38 genes are novel. These results lead to a better understanding of the mechanism of severe COVID‐19 and point to potential therapeutic targets for the treatment and/or reduction of symptoms and mortality after SARS‐CoV‐2 infection. Further research into the relationship between the genes put forward here and severe COVID‐19 is warranted.
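For readers unfamiliar with the single‐SNP SMR test that underlies these gene‐trait associations, the following is a minimal sketch of the commonly described SMR statistic, in which the effect of expression on the trait is estimated as the ratio of the SNP‐trait effect to the SNP‐expression effect. It is not the authors' pipeline: the function name and all input values are hypothetical, and the sketch omits the multi‐SNP extension and any heterogeneity testing.

```python
import math


def smr_single_snp(b_zx: float, se_zx: float, b_zy: float, se_zy: float):
    """Single-SNP SMR estimate from summary statistics (illustrative sketch).

    b_zx, se_zx: effect of the top cis-eQTL SNP on gene expression (eQTL data).
    b_zy, se_zy: effect of the same SNP on the trait (GWAS data).
    Returns (b_xy, se_xy, t_smr, p_value).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    # Ratio estimate of the effect of expression on the trait.
    b_xy = b_zy / b_zx
    # Approximate chi-square statistic with 1 degree of freedom.
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Delta-method standard error, consistent with t_smr = (b_xy / se_xy)^2.
    se_xy = abs(b_xy) / math.sqrt(t_smr)
    # Upper tail of chi-square(1): P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, se_xy, t_smr, p_value


if __name__ == "__main__":
    # Hypothetical summary statistics, chosen only to make the arithmetic easy.
    b_xy, se_xy, t_smr, p = smr_single_snp(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02)
    print(f"b_xy={b_xy:.3f} se={se_xy:.4f} T_SMR={t_smr:.2f} p={p:.2e}")
```

Because the statistic is bounded by the weaker of the two z‐scores, an underpowered eQTL dataset limits what SMR can detect regardless of GWAS size, which is in line with the caution about single‐tissue associations above.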

AUTHOR CONTRIBUTIONS

Ching‐Lung Cheung designed the study. Suhas Krishnamoorthy gathered data, conducted the analysis, and drafted the manuscript. Ching‐Lung Cheung and Gloria H.‐Y. Li revised the manuscript for intellectual content. All authors read and approved the final manuscript.

CONFLICT OF INTEREST

The authors declare no conflict of interest.